Earlier this month, Professor McCalpin posted a review on his blog ( sites.utexas.edu/jdm4372/2016/11/05/intel-discloses-vectorsimd-instructions-for-future-processors/ ) of the October 2016 release of Intel’s Instruction Extensions Programming Reference ( https://software.intel.com/sites/default/files/managed/26/40/319433-026.pdf ), which discloses a few new "vector+SIMD" instructions, as he calls them, because they operate on consecutive SIMD registers, i.e. they perform multiple operations that are both simultaneous (SIMD) and consecutive (vector). He gave a DGEMM example based on the new V4FMADDPS instruction, which performs 4 consecutive multiply-accumulate operations with a single 512-bit accumulator register, four different (consecutively-numbered) 512-bit input registers, and four (consecutive) 32-bit values from memory.
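To make that description concrete, here is a rough scalar sketch in C of what a single V4FMADDPS does as I read it. The names `zmm_t` and `v4fmaddps_model` are made up for illustration and are not Intel intrinsics. The sketch assumes 16 single-precision lanes per 512-bit register, ignores write masking, and uses a plain multiply-add, so it does not model the rounding behaviour of the real instruction.

```c
#include <stddef.h>

#define LANES 16  /* 512 bits / 32-bit floats */

/* Hypothetical stand-in for one 512-bit register holding 16 floats. */
typedef struct {
    float lane[LANES];
} zmm_t;

/*
 * Illustrative model only (not an Intel API):
 *   acc  - the single 512-bit accumulator register
 *   src  - the four consecutively-numbered 512-bit input registers
 *   mem  - the four consecutive 32-bit values read from memory
 * Step i multiplies every lane of src[i] by the scalar mem[i] and
 * adds the products into acc. The four steps run one after another
 * (the "vector" part); the 16 lanes of each step are the SIMD part.
 */
void v4fmaddps_model(zmm_t *acc, const zmm_t src[4], const float mem[4])
{
    for (size_t i = 0; i < 4; ++i) {
        for (size_t lane = 0; lane < LANES; ++lane) {
            acc->lane[lane] += src[i].lane[lane] * mem[i];
        }
    }
}
```

In this model one instruction amounts to 4 x 16 = 64 multiply-accumulates into the same accumulator, which is why the four steps have to be applied in sequence rather than independently.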