There are many different versions of Intel compilers and function libraries
with different CPU dispatching schemes. Some of these are fair to non-Intel
processors and some are unfair. By unfair dispatching I mean that the software
chooses a suboptimal code path when running on a non-Intel CPU even when the CPU
is compatible with a better code path. The different versions can get quite confusing, so I have
tested as many versions of Intel software products as I could
get my hands on and present an overview of the results here.
The tables below show the highest instruction set available to Intel and
non-Intel processors when running the different software products. The
instruction sets have the following, not very logical, names, listed from lowest to highest:
386 | MMX | SSE | SSE2 | SSE3 | SSSE3 | SSE4.1 | SSE4.2 | AVX
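The difference between fair and unfair dispatching can be sketched in a few lines of C++. This is only an illustrative model, not code taken from any Intel product: the names `InstrSet`, `Cpu`, `fair_dispatch`, `unfair_dispatch` and the parameter `non_intel_cap` are all hypothetical, and a real dispatcher would query the CPU with the CPUID instruction rather than receive a ready-made struct.

```cpp
// Illustrative model only. All names here are hypothetical; this is not
// code from any Intel library.

// Instruction sets in the order used in the tables, lowest to highest.
enum class InstrSet { I386, MMX, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, AVX };

struct Cpu {
    bool is_intel;       // true if the vendor string is "GenuineIntel"
    InstrSet supported;  // highest instruction set the CPU can execute
};

constexpr InstrSet lower_of(InstrSet a, InstrSet b) { return a < b ? a : b; }

// Fair: the code path depends only on what the CPU supports.
constexpr InstrSet fair_dispatch(Cpu cpu, InstrSet best_in_library) {
    return lower_of(cpu.supported, best_in_library);
}

// Unfair: a non-Intel CPU is additionally capped at a fallback level,
// no matter what it supports.
constexpr InstrSet unfair_dispatch(Cpu cpu, InstrSet best_in_library,
                                   InstrSet non_intel_cap) {
    return cpu.is_intel
        ? lower_of(cpu.supported, best_in_library)
        : lower_of(cpu.supported, lower_of(best_in_library, non_intel_cap));
}
```

For a non-Intel CPU that supports SSE4.2, a library whose best path is AVX and a cap of SSE2, `fair_dispatch` selects SSE4.2 while `unfair_dispatch` selects SSE2. An Intel CPU with the same capabilities gets SSE4.2 from both functions.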
Intel Math Kernel Library
The Math Kernel Library (MKL) contains many advanced mathematical functions.
The results in the following table do not apply to various (sub-)packages that may be bundled with
the MKL, such as the Intel Vector Math Library (VML), Intel Performance
Primitives (IPP) and Intel Threading Building Blocks (TBB).
| Library version | Intel processor, 32-bit mode | non-Intel processor, 32-bit mode | Intel processor, 64-bit mode | non-Intel processor, 64-bit mode |
|---|---|---|---|---|
| MKL 7.0, 2004 | SSE3 | 386 | n.a. | n.a. |
| MKL 8.1, 2006 | SSSE3 | SSSE3 | SSSE3 | SSSE3 |
| MKL 9.0, 2006 | SSSE3 | SSSE3 | SSSE3 | SSSE3 |
| MKL 10.2, 2008 | SSE4.2 | SSE4.2 | SSE4.2 | SSE2 |
| MKL 10.3, 2010 | SSE4.2 | SSE4.2 | AVX | SSE2 |
As we can see, versions 8 and 9 give Intel and non-Intel processors access to
the same instruction sets, while version 7 and the
64-bit builds of version 10 have unfair dispatching. MKL 7.0 has no x86-64 version.
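The classification used here can be stated mechanically: a library version is unfair in a given mode if the highest instruction set available to a non-Intel processor is lower than the one available to an Intel processor. The following sketch encodes the MKL results from the table and applies that rule. The type and function names (`Row`, `unfair32`, `unfair64`, `kMkl`) are made up for illustration, and the 64-bit fields of MKL 7.0 are placeholders because the table lists them as n.a.

```cpp
// Instruction sets in the order used in the tables, lowest to highest.
enum class InstrSet { I386, MMX, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, AVX };

// One table row (hypothetical layout, for illustration only).
struct Row {
    const char* version;
    InstrSet intel32, other32;
    bool has64;                 // false where the table says "n.a."
    InstrSet intel64, other64;  // placeholders when has64 is false
};

// Unfair means: non-Intel processors get a lower level than Intel ones.
constexpr bool unfair32(const Row& r) { return r.other32 < r.intel32; }
constexpr bool unfair64(const Row& r) { return r.has64 && r.other64 < r.intel64; }

// Values copied from the MKL table.
constexpr Row kMkl[] = {
    {"MKL 7.0",  InstrSet::SSE3,   InstrSet::I386,   false, InstrSet::I386,   InstrSet::I386},
    {"MKL 8.1",  InstrSet::SSSE3,  InstrSet::SSSE3,  true,  InstrSet::SSSE3,  InstrSet::SSSE3},
    {"MKL 9.0",  InstrSet::SSSE3,  InstrSet::SSSE3,  true,  InstrSet::SSSE3,  InstrSet::SSSE3},
    {"MKL 10.2", InstrSet::SSE4_2, InstrSet::SSE4_2, true,  InstrSet::SSE4_2, InstrSet::SSE2},
    {"MKL 10.3", InstrSet::SSE4_2, InstrSet::SSE4_2, true,  InstrSet::AVX,    InstrSet::SSE2},
};
```

Applied to these rows, `unfair32` is true only for MKL 7.0, and `unfair64` is true only for MKL 10.2 and 10.3.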
Intel Vector Math Library
The Vector Math Library (VML) contains procedures for calculating elementary
mathematical functions on vectors of arbitrary size.
| Library version | Intel processor, 32-bit mode | non-Intel processor, 32-bit mode | Intel processor, 64-bit mode | non-Intel processor, 64-bit mode |
|---|---|---|---|---|
| VML 7.0, 2004 | SSE3 | 386 | n.a. | n.a. |
| VML 8.1, 2006 | SSSE3 | SSE | SSSE3 | SSE2 |
| VML 9.0, 2006 | SSSE3 | SSE | SSSE3 | SSE2 |
| VML 10.2, 2006 | SSE4.2 | SSE2 | SSE4.2 | SSE2 |
| VML 10.3, 2010 | AVX | SSE2 | AVX | SSE2 |
As we can see, all versions have unfair dispatching. There are different
branches for Intel processors with SSE2 and non-Intel processors with SSE2. I
have not tested which of the SSE2 branches runs fastest on non-Intel processors.
Intel Performance Primitives
All the versions I have tested have fair CPU dispatching.
Intel Threading Building Blocks
This library has some CPU dispatching, but I have not tested whether it is
fair or not.
Intel standard C library and standard math library
These libraries are called automatically from code compiled with an Intel C++
compiler.
| Library version | Intel processor, 32-bit mode | non-Intel processor, 32-bit mode | Intel processor, 64-bit mode | non-Intel processor, 64-bit mode |
|---|---|---|---|---|
| 7.1, 2004 | SSE2 | 386 | n.a. | n.a. |
| 8.1, 2005 | SSE3 | 386 | n.a. | n.a. |
| 9.1, 2006 | SSE3 | 386 | SSE3 | SSE2 |
| 10.1, 2008 | SSE4.2 | 386 | SSE4.2 | SSE2 |
| 11.1, 2010 | AVX | 386 | AVX | SSE2 |
| 12.0, 2010 | AVX | 386 | AVX | SSE2 |
All versions have unfair CPU dispatching. In many cases, however, the Intel compiler can generate calls directly to the SSE2 version
of a function when compiling for the SSE2 or higher
instruction set. This also applies to non-Intel processors.
Intel Short Vector Math Library
The Short Vector Math Library (SVML) is used for elementary mathematical
functions on vector registers (XMM and YMM registers). It is called
automatically from code compiled with an Intel compiler when the SSE2 or
higher instruction set is enabled. The SVML can also be used with other
compilers such as the Gnu C++ compiler.
| Library version | Intel processor, 32-bit mode | non-Intel processor, 32-bit mode | Intel processor, 64-bit mode | non-Intel processor, 64-bit mode |
|---|---|---|---|---|
| 7.1, 2004 | SSE2 | SSE2 | n.a. | n.a. |
| 8.1, 2005 | SSE3 | SSE2 | n.a. | n.a. |
| 9.1, 2006 | SSE3 | SSE2 | SSE3 | SSE2 |
| 10.1, 2008 | SSE4.2 | SSE2 | SSE4.2 | SSE2 |
| 11.1, 2010 | AVX | SSE2 | AVX | SSE2 |
| 12.0, 2010 | AVX | SSE2 | AVX | SSE2 |
Intel C++ compiler
The Intel C++ compiler has various options that allow the programmer to
generate code for a specific instruction set or to make multiple versions of the
code for different instruction sets with automatic CPU dispatching. Non-Intel
processors will always get the generic version of the code if CPU dispatching is
used. The default level for the generic code is SSE2 for versions 11 and 12 of the
compiler, and 386 for versions 10 and earlier in 32-bit mode, as indicated in the
following table.
| Compiler version | Intel processor, 32-bit mode | non-Intel processor, 32-bit mode | Intel processor, 64-bit mode | non-Intel processor, 64-bit mode |
|---|---|---|---|---|
| 7.1, 2004 | SSE2 | 386 | n.a. | n.a. |
| 8.1, 2005 | SSE3 | 386 | n.a. | n.a. |
| 9.1, 2006 | SSE3 | 386 | SSE3 | SSE2 |
| 10.1, 2008 | SSE4.2 | 386 | SSE4.2 | SSE2 |
| 11.1, 2010 | AVX | SSE2 | AVX | SSE2 |
| 12.0, 2010 | AVX | SSE2 | AVX | SSE2 |
There is an option for setting the generic level higher or lower. For example, the options /arch:SSE3 /QaxSSE4.1,AVX
will set the generic level to SSE3 and generate three versions of the code for
the SSE3, SSE4.1 and AVX instruction sets. Non-Intel processors can only get the
generic version, which will be SSE3 in this example. Code compiled with the /Qx option, for example /QxSSE4.1,
will fail to run on non-Intel processors and on processors without the specified
instruction set.
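The behavior described above can be modeled as follows. This is a sketch of the observed outcome, not of the code the compiler actually emits: `selected_version`, `runs_with_qx`, `Cpu`, `InstrSet` and `kExampleAx` are hypothetical names, and the model assumes that the processor supports at least the generic level. It needs C++14 or later because of the loop inside a `constexpr` function.

```cpp
#include <cstddef>

// Instruction sets in the order used in the tables, lowest to highest.
enum class InstrSet { I386, MMX, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, AVX };

struct Cpu {
    bool is_intel;       // true if the vendor string is "GenuineIntel"
    InstrSet supported;  // highest instruction set the CPU can execute
};

// Code version that runs when compiling with a generic level (/arch)
// plus extra dispatched versions (/Qax). Assumes the CPU supports at
// least the generic level.
template <std::size_t N>
constexpr InstrSet selected_version(Cpu cpu, InstrSet generic,
                                    const InstrSet (&ax_targets)[N]) {
    InstrSet best = generic;
    if (cpu.is_intel) {
        // Only Intel CPUs are considered for the /Qax versions.
        for (InstrSet t : ax_targets)
            if (t <= cpu.supported && t > best) best = t;
    }
    return best;  // non-Intel CPUs always end up with the generic version
}

// Code compiled with /Qx<level> runs only on Intel CPUs that support
// the given level.
constexpr bool runs_with_qx(Cpu cpu, InstrSet level) {
    return cpu.is_intel && cpu.supported >= level;
}

// The /Qax list from the example: /arch:SSE3 /QaxSSE4.1,AVX
constexpr InstrSet kExampleAx[] = {InstrSet::SSE4_1, InstrSet::AVX};
```

With the example options, an Intel processor supporting AVX gets the AVX version, an Intel processor supporting only up to SSE4.2 gets the SSE4.1 version, and a non-Intel processor gets the SSE3 version even if it supports AVX.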
Other Intel products
The above test results were obtained with Intel C++ compilers and function
libraries for Windows and Linux. I have found no differences between the Windows
and Linux versions in the cases where I have had access to both. I have not
tested the Macintosh versions, but this is less relevant as long as no Macintosh
computers are available with AMD or VIA processors. I have not tested the Intel
Fortran compiler, but it seems to be similar to the Intel C++ compiler with
respect to CPU dispatching.
Anybody who has earlier versions of the compiler and function libraries than
the ones I have tested is welcome to contact me.