A program may contain several different versions of the same code, each optimized for a particular instruction set extension, such as SSE2, AVX, or AVX512. A so-called CPU dispatcher chooses which code version to run, depending on the CPU it is running on. The CPU dispatcher may be fair or unfair. A fair CPU dispatcher chooses the optimal code version based on which instruction set extensions the CPU supports. An unfair CPU dispatcher detects the brand of CPU and chooses the optimal code path only when it is running on an Intel CPU. It chooses the slow "generic" version when running on another brand of CPU, even if that CPU is compatible with a better version.
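The difference between the two kinds of dispatcher fits in a few lines. The following is a minimal C++ sketch, not code taken from any Intel product: the `CpuInfo` struct, the `CodePath` enum, and both function names are hypothetical, and the feature flags are assumed to have already been read from the CPUID instruction.

```cpp
#include <string>

// Code versions built into the program, from slowest to fastest.
enum class CodePath { Generic, SSE2, AVX, AVX512 };

// Hypothetical summary of what CPUID reported on this machine.
struct CpuInfo {
    std::string vendor;  // e.g. "GenuineIntel", "AuthenticAMD", "CentaurHauls"
    bool sse2;
    bool avx;
    bool avx512;
};

// Fair: the choice depends only on the instruction sets the CPU supports.
CodePath fair_dispatch(const CpuInfo& cpu) {
    if (cpu.avx512) return CodePath::AVX512;
    if (cpu.avx) return CodePath::AVX;
    if (cpu.sse2) return CodePath::SSE2;
    return CodePath::Generic;
}

// Unfair: identical feature tests, but only for one vendor string.
// Every other vendor gets the generic path, whatever it supports.
CodePath unfair_dispatch(const CpuInfo& cpu) {
    if (cpu.vendor != "GenuineIntel") return CodePath::Generic;
    return fair_dispatch(cpu);
}
```

For an AVX-capable AMD processor, `fair_dispatch(CpuInfo{"AuthenticAMD", true, true, false})` selects `CodePath::AVX`, while `unfair_dispatch` with the same input selects `CodePath::Generic`. Only the first line of `unfair_dispatch` differs, which is why such a dispatcher is easy to overlook when reading library code.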
The fact that unfair CPU dispatchers have appeared in software produced by third-party programmers, without the programmers' knowledge, has led to a lot of controversy and legal battles. We have even seen misleading benchmark comparisons between Intel and AMD processors, based on benchmark software that contained unfair CPU dispatchers.
A long discussion thread on this topic can be seen in an older version of this blog at https://www.agner.org/optimize/blog/read.php?i=49. The present post is a digest of this discussion with a review of the most important conclusions. Links and details can be found in the original thread.
When I started testing Intel's compiler several years ago, I soon found out that it had a biased CPU dispatcher. Back in January 2007 I complained to Intel about the unfair CPU dispatcher. I had a long correspondence with Intel engineers about the issue, where they kept denying the problem and I kept providing more evidence. They said that:
"The CPU dispatch, coupled with optimizations, is designed to optimize performance across Intel and AMD processors to give the best results. This is clearly our goal and with one exception we believe we are there now. The one exception is that our 9.x compilers do not support SSE3 on AMD processors because of the timing of the release of AMD processors vs. our compiler (our compiler was developed before AMD supported SSE3). The future 10.x compilers, which enter beta this quarter and release around the middle of the year, will address this now that we've had time to tune and adjust to the new AMD processors."
This sounds nice, but the truth is that the CPU dispatcher did not support higher instruction sets in AMD processors, and it still does not today (Intel compiler version 19). I later found out that others had made similar complaints to Intel and received similarly useless answers (see links in the old blog post).
I also found that the Intel CPU dispatcher did not check only the vendor ID string and the supported instruction sets. It also checked for specific processor models. In fact, it would fail to recognize future Intel processors with a family number other than 6. When I mentioned this to the Intel engineers, they replied:
"You mentioned we will not support future Intel processors with non-'6' family designations without a compiler update. Yes, that is correct and intentional. Our compiler produces code which we have high confidence will continue to run in the future. This has the effect of not assuming anything about future Intel or AMD or other processors. You have noted we could be more aggressive. We believe that would not be wise for our customers, who want a level of security that their code (built with our compiler) will continue to run far into the future. Your suggested methods, while they may sound reasonable, are not conservative enough for our highly optimizing compiler. Our experience steers us to issue code conservatively, and update the compiler when we have had a chance to verify functionality with new Intel and new AMD processors. That means there is a lag sometime in our production release support for new processors."
In other words, they claim that they are optimizing for specific processor models rather than for specific instruction sets. If true, this gives Intel an argument for not supporting AMD processors properly. But it also means that all software developers who use an Intel compiler have to recompile their code and distribute new versions to their customers every time a new Intel processor appears on the market. Three years later, I tried running a program compiled with an old version of Intel's compiler on the newest Intel processors. What happened? You guessed it: it still ran the optimal code path. But the reason is harder to guess: Intel have manipulated the CPUID family numbers on new processors in such a way that they appear as known models to older Intel software.
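The two values such a dispatcher inspects both come from the CPUID instruction: leaf 0 returns the 12-character vendor ID string in registers EBX, EDX, and ECX (in that order), and leaf 1 returns the family number in EAX, with an extended family field that is added when the base family is 15. The following C++ sketch decodes these values from raw register contents; actually executing CPUID needs compiler-specific intrinsics, which are left out. The function `old_style_fast_path_ok` is a hypothetical illustration of the model-whitelisting behavior described above, not Intel's actual code.

```cpp
#include <cstdint>
#include <string>

// Append the four bytes of a register, lowest byte first (x86 is little-endian).
static void append_reg(std::string& s, std::uint32_t reg) {
    for (int i = 0; i < 4; ++i)
        s += static_cast<char>((reg >> (8 * i)) & 0xFF);
}

// CPUID leaf 0: the vendor ID string is EBX, then EDX, then ECX.
std::string vendor_id(std::uint32_t ebx, std::uint32_t edx, std::uint32_t ecx) {
    std::string s;
    append_reg(s, ebx);
    append_reg(s, edx);
    append_reg(s, ecx);
    return s;
}

// CPUID leaf 1, EAX: base family in bits 8-11. When it is 0xF, the
// extended family in bits 20-27 is added to give the displayed family.
unsigned family_number(std::uint32_t eax) {
    unsigned family = (eax >> 8) & 0xF;
    if (family == 0xF) family += (eax >> 20) & 0xFF;
    return family;
}

// Hypothetical model-whitelisting check: the fast path requires the Intel
// vendor string AND a family number the dispatcher already knows about.
bool old_style_fast_path_ok(const std::string& vendor, unsigned family) {
    return vendor == "GenuineIntel" && family == 6;
}
```

With the register values that leaf 0 returns on an Intel processor, `vendor_id(0x756E6547, 0x49656E69, 0x6C65746E)` yields "GenuineIntel". An EAX value of 0x000906EA decodes to family 6, while 0x00800F11 (base family 0xF, extended family 8) decodes to family 23. A check like `old_style_fast_path_ok` rejects every family other than 6, so it cannot accept a processor model that did not exist when it was written, and it rejects every non-Intel processor regardless of family.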
Perhaps the initial design of Intel's CPU dispatcher was indeed intended to optimize for known processor models only, without regard for future models. If one of my students had made a design that was so poorly prepared for the future, I would consider it a serious flaw. Perhaps the Intel engineers discovered the missing support for future processors too late, so they had to design the next generation of their processors in such a way that they appeared as known models to existing Intel software.
After Intel had flatly refused to change their CPU dispatcher, I decided that the most efficient way to make them change their minds was to create publicity about the problem. I contacted several IT magazines, but nobody wanted to write about it. Sad, but not very surprising, considering that they all depend on advertising money from Intel. The only publicity was my own optimization manual, where I have described the problem in detail and given instructions on how to replace the unfair CPU dispatcher. I wonder why AMD did not create public awareness about the problem. Were they obliged to keep quiet because of an ongoing lawsuit? And what about VIA/Centaur?
It is possible to change the vendor ID string in VIA processors by using certain undocumented registers. It is also possible to change the vendor ID string in AMD processors by using virtualization instructions. In 2010, I documented that software produced with Intel tools runs faster on a VIA processor when the processor's vendor ID string is changed to "GenuineIntel".
The same effect was documented for some of the most popular mathematical software packages, including Mathematica, Mathcad, and Matlab. These packages run slower than necessary on non-Intel computers. A significant improvement in speed was observed when the vendor ID string was manipulated to fake an Intel CPU.
Perhaps AMD engineers have been unaware of the problem for several years. I discovered that a certain function library published by AMD contains an unfair CPU dispatcher originating from an Intel Fortran compiler. This function library runs faster on a non-Intel processor when the vendor ID string is manipulated to falsely indicate an Intel processor.
The fact that software produced with Intel tools contains unfair CPU dispatchers, possibly without the knowledge of the programmer or the user, has led to some serious legal battles. AMD has filed several lawsuits against Intel. An out-of-court settlement in 2009 stipulates that Intel shall not include any artificial performance impairment that is made intentionally to degrade the performance or operation of a specific AMD product.
The US Federal Trade Commission filed a complaint against Intel in 2009 on several charges of unfair competition. One of the charges concerned the unfair CPU dispatching, and my research played an important role in documenting it. An out-of-court settlement requires Intel to inform its software customers about the CPU dispatch mechanism that leads to suboptimal performance on non-Intel CPUs.
The state of affairs after these legal battles is that Intel is publishing an "Optimization Notice" on its software products stating that "Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors."
Most of Intel's function libraries now contain two different CPU dispatchers, a fair one and an unfair one. It is not clear when the fair dispatcher is used and when the unfair dispatcher is used. The decision may depend on legal technicalities that are elusive to the programmer and to the end user. Software products with unfair CPU dispatching still abound despite the settlement with AMD that prohibits artificial performance impairment.
In 2010, Intel published an article on how CPU dispatching works in the Intel Integrated Performance Primitives (IPP) function library. The article indicates fair handling of non-Intel processors in the IPP library, which agrees with my test results. What the article does not mention is the unfair CPU dispatching in several other Intel function libraries.
Workarounds
Software programmers should be aware of these problems and avoid unfair CPU dispatching in any software product that may run on non-Intel processors. Possible remedies and workarounds include:
- Avoid the Intel compiler. There are other compilers with similar or better performance. If you need an Intel compiler for a specific feature, then use the Intel compiler only for the relevant part of the program and compile the rest of the program with another compiler.
- Avoid Intel function libraries with unfair CPU dispatching, if possible. The Intel MKL and VML libraries have unfair CPU dispatching in many cases. The IPP and SVML libraries have fair CPU dispatching when used with a non-Intel compiler.
- Intel function libraries have two CPU dispatchers, a fair one and an unfair one. You may call the fair dispatcher explicitly. The MKL library also contains a function that detects whether it is running on an Intel CPU. You may override this function with another function that always returns true. Code examples for these methods are provided in https://www.agner.org/optimize/intel_dispatch_patch.zip. Note that these methods rely on undocumented features in Intel function libraries.
- Set the environment variable MKL_DEBUG_CPU_TYPE=5 when running on an AMD computer. This will improve performance on certain MKL functions.
- Use compilers and function libraries from vendors other than Intel whenever a suitable alternative exists.
- Make your own functions. For example, you may use my C++ vector class library (VCL) for making mathematical functions that use the latest instruction set extensions for parallel computation.
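For the last point, a fair dispatcher for your own function takes only a few lines. The following C++ sketch assumes GCC or Clang on x86: `__builtin_cpu_supports` and `__attribute__((target("avx2")))` are extensions of those compilers, and on any other compiler or architecture the sketch falls back to the generic version. The function names are hypothetical, and the AVX2 version is the same loop as the generic one, merely compiled with AVX2 allowed; a real library would put hand-tuned vector code there.

```cpp
#include <cstddef>
#include <cstdint>

// Generic version: must run on any x86 CPU, so build it for the baseline target.
static std::int64_t sum_generic(const std::int32_t* a, std::size_t n) {
    std::int64_t s = 0;
    for (std::size_t i = 0; i < n; ++i) s += a[i];
    return s;
}

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_AVX2_PATH 1
// Same loop, but the compiler may use AVX2 instructions in this function only.
__attribute__((target("avx2")))
static std::int64_t sum_avx2(const std::int32_t* a, std::size_t n) {
    std::int64_t s = 0;
    for (std::size_t i = 0; i < n; ++i) s += a[i];
    return s;
}
#endif

using SumFn = std::int64_t (*)(const std::int32_t*, std::size_t);

// Fair dispatch: ask whether the CPU supports AVX2, never which vendor made it.
static SumFn choose_sum() {
#ifdef HAVE_AVX2_PATH
    if (__builtin_cpu_supports("avx2")) return sum_avx2;
#endif
    return sum_generic;
}

// Public entry point. The choice is made once, on the first call
// (initialization of a function-local static is thread-safe since C++11).
std::int64_t sum_i32(const std::int32_t* a, std::size_t n) {
    static const SumFn impl = choose_sum();
    return impl(a, n);
}
```

A caller simply uses `sum_i32(data, n)`. An Intel, AMD, or VIA processor that reports AVX2 gets the same code path, because the vendor string is never consulted. Resolving the function pointer once keeps the detection cost out of every later call.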
What end users can do
- If you are using mathematical software packages such as Matlab, Mathcad, or Mathematica on an AMD computer, then you may set the environment variable MKL_DEBUG_CPU_TYPE=5. These software packages use Intel MKL functions for certain purposes. Setting the environment variable overrides the unfair CPU dispatching in the MKL functions and makes the program run faster.
- Never rely on benchmark tests unless the benchmarking code is known to be open source and compiled without using any Intel tools.
- Demand that software producers guarantee fair performance on non-Intel processors.