Intel have announced the next big instruction set extension, AVX512, to be
implemented in 2015 or 2016. The details are defined in the
Intel Architecture Instruction Set Extensions Programming Reference.
There are many interesting extensions:
- The size of the vector registers is extended from 256 bits (YMM registers) to
512 bits (ZMM registers). There is room for further extensions to at least
1024 bits (what will they be called?).
- The number of vector registers is doubled to 32 registers in 64-bit mode.
There will still be only 8 vector registers in 32-bit mode.
- Eight new mask registers k0 - k7 allow masked and conditional operations.
Most vector instructions can be masked so that they operate only on selected
vector elements while the remaining vector elements are unchanged or zeroed.
This will replace the use of vector registers as masks.
- Most vector instructions with a memory operand have an option for
broadcasting a scalar operand.
- Floating point vector instructions have options for specifying the
rounding mode and for suppressing exceptions.
- There is a new addressing mode called compressed displacement. Where an
instruction has a memory operand with a pointer and an 8-bit sign-extended
displacement, the displacement is multiplied by the size of the operand. This
makes it possible to address a larger interval with just a single-byte
displacement, as long as the memory operands are properly aligned. The
instructions become smaller in some cases, which compensates for the longer
prefix.
- More than 100 new instructions.
- The 512-bit registers can do vector operations on 32-bit and 64-bit signed
and unsigned integers and single and double precision floats, but
unfortunately not on 8-bit and 16-bit integers.
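
The masking and broadcasting options in the list above are easier to see in a small model. The following is only a scalar C sketch of the semantics for 32-bit elements, not real AVX512 code and not taken from Intel's manual. The function names are hypothetical, and the 16-bit mask stands in for one of the registers k0 - k7.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16  /* a 512-bit register holds 16 elements of 32 bits */

/* Merge-masking: lanes with a 0 mask bit keep the old destination value. */
void masked_add_merge(uint32_t dst[LANES], uint16_t k,
                      const uint32_t a[LANES], const uint32_t b[LANES])
{
    for (int i = 0; i < LANES; i++) {
        if ((k >> i) & 1) dst[i] = a[i] + b[i];
    }
}

/* Zero-masking: lanes with a 0 mask bit are set to zero. */
void masked_add_zero(uint32_t dst[LANES], uint16_t k,
                     const uint32_t a[LANES], const uint32_t b[LANES])
{
    for (int i = 0; i < LANES; i++) {
        dst[i] = ((k >> i) & 1) ? a[i] + b[i] : 0;
    }
}

/* Broadcast: one scalar in memory is used as the operand for every lane.
   Combined here with merge-masking. */
void masked_add_broadcast_merge(uint32_t dst[LANES], uint16_t k,
                                const uint32_t a[LANES],
                                const uint32_t *scalar)
{
    assert(scalar != 0);
    for (int i = 0; i < LANES; i++) {
        if ((k >> i) & 1) dst[i] = a[i] + *scalar;
    }
}
```

With a mask of 0x0005, only elements 0 and 2 are computed. The merge version leaves the other 14 elements of the destination untouched, while the zeroing version clears them.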
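
The compressed displacement rule from the list above comes down to a little arithmetic. Here is a minimal sketch in C, assuming the scale factor is simply the operand size in bytes as described above. The function names are my own for illustration, and the exact scale factor for each instruction form should be checked in Intel's reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte offset that a stored 8-bit displacement stands for. */
int32_t disp8_effective(int8_t disp8, int32_t operand_size)
{
    return (int32_t)disp8 * operand_size;
}

/* Try to fit a byte offset into a single compressed displacement byte.
   Returns false if the offset is not a multiple of the operand size or
   if the scaled value does not fit in a signed byte. */
bool disp8_encode(int32_t offset, int32_t operand_size, int8_t *disp8)
{
    assert(operand_size > 0);
    if (offset % operand_size != 0) return false;   /* misaligned offset */
    int32_t scaled = offset / operand_size;
    if (scaled < -128 || scaled > 127) return false; /* needs 4 bytes */
    *disp8 = (int8_t)scaled;
    return true;
}
```

For a 64-byte operand, a single displacement byte covers offsets from -8192 to +8128 in steps of 64, where an unscaled byte covers only -128 to +127. An offset such as 100, which is not a multiple of 64, falls back to the 4-byte displacement.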
A year ago, Intel announced a similar instruction set with 512-bit registers in
the Intel Xeon Phi Coprocessor Instruction Set Architecture Reference Manual.
The two instruction sets are very similar, and both are backwards compatible,
but they are not compatible with each other. The two instruction sets differ by
a single prefix bit, even for otherwise identical instructions. I assume that
the Knights Corner or Xeon Phi instruction set will have a short life and be
replaced by AVX512.
The AVX512 instruction set uses a new 4-byte prefix named EVEX, which is
similar to the 2- or 3-byte VEX prefix, but with 62 (hexadecimal) as the first
byte. (Actually, I predicted several years ago that the 62 byte would be used
for such a prefix because it was the only remaining byte that could be used in
the same way as the VEX prefix bytes). The
extra bits in the EVEX prefix are used for doubling the number of registers, for
specifying vector size, and for the extra features of broadcasting, masking,
zeroing, specifying rounding mode, and suppressing floating point exceptions.
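
As a rough illustration of how a decoder can tell the prefixes apart by their first byte, here is a small C sketch. It assumes 64-bit mode, where the bytes C5, C4 and 62 are not valid as ordinary opcodes. In 32-bit mode a decoder must also inspect the following byte, which this sketch leaves out. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the length in bytes of a VEX or EVEX prefix starting at code[0],
   or 0 if there is no such prefix or the buffer is too short.
   Assumes 64-bit mode; the payload bits are not decoded here. */
int vector_prefix_length(const uint8_t *code, size_t len)
{
    assert(code != 0);
    if (len == 0) return 0;

    int need;
    switch (code[0]) {
    case 0xC5: need = 2; break;  /* 2-byte VEX */
    case 0xC4: need = 3; break;  /* 3-byte VEX */
    case 0x62: need = 4; break;  /* 4-byte EVEX */
    default:   return 0;         /* not a VEX or EVEX prefix */
    }
    return (len >= (size_t)need) ? need : 0;
}
```

The sketch only measures the prefix. Extracting the register extension, vector size, mask register, zeroing and rounding fields from the three bytes after 62 requires the bit layout given in Intel's reference.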
The calling conventions for the new registers are partially defined in a
draft ABI, but it is still being discussed whether the new registers should
have callee-save status; see the GNU libc-alpha mailing list.
I have commented on the AVX512 instruction set and suggested various
improvements on Intel's blog and Intel's forum.
The new instruction sets are supported by my objconv disassembler.