
Intel's "cripple AMD" function

Posted: 2019-12-30, 12:35:23
by agner
Intel produces C++ and Fortran compilers as well as several highly optimized function libraries for a variety of purposes. Programmers who use Intel's compilers and function libraries are not always aware that software produced with these tools may run slower than necessary when it is executed on an AMD or VIA processor.

The code may contain several different versions, each optimized for a particular instruction set extension, such as SSE2, AVX, or AVX512. A so-called CPU dispatcher chooses which code version to run, depending on the CPU it is running on. The CPU dispatcher may be fair or unfair. A fair CPU dispatcher will choose the optimal code version depending on which instruction set extensions the CPU supports. An unfair CPU dispatcher will detect the brand of CPU and choose the optimal code path only when it is running on an Intel CPU. It will choose the slow "generic" version when running on another brand of CPU, even if the CPU is compatible with a better version.
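The difference between the two kinds of dispatcher can be illustrated with a minimal C++ sketch. It is not Intel's actual code; the `CpuInfo` struct, the `Isa` enum, and the function names are hypothetical and only model the decision described above. In a real program the fields would be filled in from the CPUID instruction.

```cpp
#include <cassert>
#include <string>

// Code paths ordered from slowest to fastest (names taken from the text above).
enum class Isa { Generic = 0, SSE2 = 1, AVX = 2, AVX512 = 3 };

// Hypothetical description of the CPU the program is running on.
struct CpuInfo {
    std::string vendor;   // e.g. "GenuineIntel" or "AuthenticAMD"
    Isa best_supported;   // highest instruction set the CPU can execute
};

// Fair dispatcher: the decision depends only on what the CPU supports.
Isa choose_fair(const CpuInfo& cpu) {
    return cpu.best_supported;
}

// Unfair dispatcher: the brand is checked first, and every non-Intel CPU
// gets the generic path regardless of what it supports.
Isa choose_unfair(const CpuInfo& cpu) {
    if (cpu.vendor != "GenuineIntel") {
        return Isa::Generic;
    }
    return cpu.best_supported;
}
```

For a CPU described as `{"AuthenticAMD", Isa::AVX}`, `choose_fair` selects the AVX path while `choose_unfair` falls back to the generic path. For an Intel CPU with the same capabilities both functions select AVX.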

The fact that unfair CPU dispatchers have appeared in software produced by third-party programmers without their knowledge has led to a lot of controversy and legal battles. We have even seen misleading benchmark comparisons between Intel and AMD processors based on benchmark software that contained unfair CPU dispatchers.

A long discussion thread on this topic can be seen in an older version of this blog at https://www.agner.org/optimize/blog/read.php?i=49. The present post is a digest of this discussion with a review of the most important conclusions. Links and details can be found in the original thread.

When I started testing Intel's compiler several years ago, I soon found out that it had a biased CPU dispatcher. Back in January 2007 I complained to Intel about the unfair CPU dispatcher. I had a long correspondence with Intel engineers about the issue, where they kept denying the problem and I kept providing more evidence. They said that:
The CPU dispatch, coupled with optimizations, is designed to optimize performance across Intel and AMD processors to give the best results. This is clearly our goal and with one exception we believe we are there now. The one exception is that our 9.x compilers do not support SSE3 on AMD processors because of the timing of the release of AMD processors vs. our compiler (our compiler was developed before AMD supported SSE3). The future 10.x compilers, which enter beta this quarter and release around the middle of the year, will address this now that we've had time to tune and adjust to the new AMD processors.
This sounds nice, but the truth is that the CPU dispatcher did not support higher instruction sets in AMD processors and still does not today (Intel compiler version 19). I later found out that others have made similar complaints to Intel and got similarly useless answers (see links in the old blog post).

I also found that the Intel CPU dispatcher did not only check the vendor ID string and the instruction sets supported. It also checked for specific processor models. In fact, it would fail to recognize future Intel processors with a family number different from 6. When I mentioned this to the Intel engineers they replied:
You mentioned we will not support future Intel processors with non-'6' family designations without a compiler update. Yes, that is correct and intentional. Our compiler produces code which we have high confidence will continue to run in the future. This has the effect of not assuming anything about future Intel or AMD or other processors. You have noted we could be more aggressive. We believe that would not be wise for our customers, who want a level of security that their code (built with our compiler) will continue to run far into the future. Your suggested methods, while they may sound reasonable, are not conservative enough for our highly optimizing compiler. Our experience steers us to issue code conservatively, and update the compiler when we have had a chance to verify functionality with new Intel and new AMD processors. That means there is a lag sometime in our production release support for new processors.
In other words, they claim that they are optimizing for specific processor models rather than for specific instruction sets. If true, this gives Intel an argument for not supporting AMD processors properly. But it also means that all software developers who use an Intel compiler have to recompile their code and distribute new versions to their customers every time a new Intel processor appears on the market. Three years later, I tried to run a program compiled with an old version of Intel's compiler on the newest Intel processors. You guessed it: It still runs the optimal code path. But the reason is more difficult to guess: Intel have manipulated the CPUID family numbers on new processors in such a way that they appear as known models to older Intel software.

Perhaps the initial design of Intel's CPU dispatcher was indeed intended to optimize for known processor models only, without regard for future models. If any of my students had made such a solution that was not future-oriented, I would consider it a serious flaw. Perhaps the Intel engineers discovered the missing support for future processors too late so that they had to design the next generation of their processors in such a way that they appeared as known models to existing Intel software.
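For readers unfamiliar with the family number mentioned above, here is a small sketch of how it is commonly decoded from the EAX value returned by CPUID leaf 1. It follows the convention in Intel's CPUID documentation (AMD combines the model fields only when the base family is 15). The struct and function names are made up for illustration, and nothing here is taken from Intel's dispatcher.

```cpp
#include <cstdint>

// Hypothetical result type for the decoded processor signature.
struct FamilyModel {
    unsigned family;
    unsigned model;
};

// Decode display family and model from CPUID leaf 1, register EAX.
constexpr FamilyModel decode_signature(std::uint32_t eax) {
    unsigned base_model  = (eax >> 4)  & 0xF;
    unsigned base_family = (eax >> 8)  & 0xF;
    unsigned ext_model   = (eax >> 16) & 0xF;
    unsigned ext_family  = (eax >> 20) & 0xFF;

    // The extended family is added only when the base family is 15.
    unsigned family = (base_family == 15) ? base_family + ext_family
                                          : base_family;
    // The extended model supplies the high bits for base families 6 and 15.
    unsigned model = (base_family == 6 || base_family == 15)
                         ? (ext_model << 4) + base_model
                         : base_model;
    return FamilyModel{family, model};
}

// A dispatcher that only accepts family 6 would reject anything else,
// no matter which instruction sets the processor supports.
constexpr bool is_family_6(std::uint32_t eax) {
    return decode_signature(eax).family == 6;
}
```

As an example, the signature 0x000906EA decodes to family 6, model 0x9E, while 0x00870F10 decodes to family 0x17, model 0x71, so `is_family_6` returns false for the latter.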

After Intel had flatly refused to change their CPU dispatcher, I decided that the most efficient way to make them change their minds was to create publicity about the problem. I contacted several IT magazines, but nobody wanted to write about it. Sad, but not very surprising, considering that they all depend on advertising money from Intel. The only publicity was my own optimization manual where I have described the problem in detail and given instructions on how to replace the unfair CPU dispatcher. I wonder why AMD did not create public awareness about the problem. Were they obliged to keep quiet about an ongoing lawsuit? And what about VIA/Centaur?

It is possible to change the vendor ID string in VIA processors by the use of certain undocumented registers. It is also possible to change the vendor ID string in AMD processors by the use of virtualization instructions. In 2010, I documented that software produced with Intel tools runs faster on a VIA processor when the vendor ID string in the processor is changed to "GenuineIntel".
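The vendor ID string is the 12-character text returned by CPUID leaf 0 in the registers EBX, EDX, and ECX, in that order. The sketch below only shows how the three register values are assembled into the string that a dispatcher compares against. The function name is hypothetical, and reading the registers (for example with a compiler intrinsic) is left out so that the code does not depend on a particular compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <string>

// Build the vendor ID string from the register values of CPUID leaf 0.
// Each register holds four characters, lowest byte first.
std::string vendor_from_registers(std::uint32_t ebx,
                                  std::uint32_t edx,
                                  std::uint32_t ecx) {
    std::string vendor;
    for (std::uint32_t reg : {ebx, edx, ecx}) {
        for (int byte = 0; byte < 4; ++byte) {
            vendor.push_back(static_cast<char>((reg >> (8 * byte)) & 0xFF));
        }
    }
    return vendor;
}
```

With the values 0x756e6547, 0x49656e69, 0x6c65746e the function returns "GenuineIntel". With 0x68747541, 0x69746e65, 0x444d4163 it returns "AuthenticAMD".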

The same effect was documented with some of the most popular mathematical software packages, including Mathematica, Mathcad, and Matlab. These software packages are running slower than necessary on non-Intel computers. A significant improvement in speed was observed when the CPU ID string was manipulated to fake an Intel CPU.

Perhaps AMD engineers have been unaware of the problem for several years. I discovered that a certain function library published by AMD contains an unfair CPU dispatcher originating from an Intel Fortran compiler. This function library runs faster on a non-Intel processor when the vendor ID string is manipulated to falsely indicate an Intel processor.

The fact that software produced with Intel tools contains unfair CPU dispatchers, possibly without the knowledge of the programmer or the user, has led to some serious legal battles. AMD has filed several lawsuits against Intel. An out-of-court settlement in 2009 agrees that Intel shall not include any artificial performance impairment that is made intentionally to degrade the performance or operation of a specific AMD product.

The US Federal Trade Commission filed a complaint in 2009 against Intel for several charges of unfair competition. One of the charges was the unfair CPU dispatching. My research played an important role in documenting this. An out of court settlement orders that Intel must inform its software customers about the CPU dispatch mechanism that leads to suboptimal performance on non-Intel CPUs.

The state of affairs after these legal battles is that Intel is publishing an "Optimization Notice" on its software products stating that "Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors."

Most of Intel's function libraries now contain two different CPU dispatchers, a fair one and an unfair one. It is not clear when the fair dispatcher is used and when the unfair dispatcher is used. The decision may depend on legal technicalities that are elusive to the programmer and to the end user. Software products with unfair CPU dispatching still abound despite the settlement with AMD that prohibits artificial performance impairment.

In 2010, Intel published an article on how the CPU dispatching works in the Intel Performance Primitives (IPP) function library. The article indicates a fair handling of non-Intel processors in the IPP library. This is in accordance with my test results. What the article does not mention is the unfair CPU dispatching in several other Intel function libraries.

Workarounds
Software programmers should be aware of these problems and avoid unfair CPU dispatching in any software product that may run on non-Intel processors. Possible remedies and workarounds include:
  • Avoid the Intel compiler. There are other compilers with similar or better performance. If you need an Intel compiler for a specific feature, then use the Intel compiler only for the relevant part of the program and compile the rest of the program with another compiler.
  • Avoid Intel function libraries with unfair CPU dispatching, if possible. The Intel MKL and VML libraries have unfair CPU dispatching in many cases. The IPP and SVML libraries have fair CPU dispatching when used with a non-Intel compiler.
  • Intel function libraries have two CPU dispatchers, a fair one and an unfair one. You may call the fair dispatcher explicitly. The MKL library also contains a function that detects whether it is running on an Intel CPU. You may override this function with another function that always returns true. Code examples for these methods are provided in https://www.agner.org/optimize/intel_dispatch_patch.zip. Note that these methods rely on undocumented features in Intel function libraries.
  • Set the environment variable MKL_DEBUG_CPU_TYPE=5 when running on an AMD computer. This will improve performance on certain MKL functions.
  • Use other compilers and other function libraries than Intel whenever a suitable alternative exists.
  • Make your own functions. For example, you may use my C++ vector class library (VCL) for making mathematical functions that use the latest instruction set extensions for parallel computation.
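Two of the workarounds in the list above can be sketched in C++ as follows. This is not the code from the zip file linked above. The symbol name `mkl_serv_intel_cpu_true` is an assumption based on community write-ups of the override trick; it is undocumented and may differ between MKL versions. The helper names are hypothetical, `setenv` is a POSIX call (Windows would need `_putenv_s` instead), and the variable must be set before the first MKL call to have any chance of taking effect.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <stdlib.h>   // setenv (POSIX)

// Assumed, undocumented MKL symbol: if this definition is linked in ahead
// of the library's own, the "is this an Intel CPU?" check always says yes.
extern "C" int mkl_serv_intel_cpu_true() {
    return 1;
}

// Hypothetical helper: request MKL_DEBUG_CPU_TYPE=5 for the current process.
// Returns true if the environment variable was set successfully.
bool request_mkl_debug_cpu_type() {
    return setenv("MKL_DEBUG_CPU_TYPE", "5", /*overwrite=*/1) == 0;
}

// Hypothetical helper: check that the variable currently has the value "5".
bool mkl_debug_cpu_type_is_set() {
    const char* value = std::getenv("MKL_DEBUG_CPU_TYPE");
    return value != nullptr && std::strcmp(value, "5") == 0;
}
```

A program would call `request_mkl_debug_cpu_type()` at the very start of `main`, before any library function is used. Both tricks depend on library internals, so they should be verified against the specific library version in use.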

What end users can do
  • If you are using mathematical software packages such as Matlab, Mathcad, or Mathematica on an AMD computer then you may set the environment variable MKL_DEBUG_CPU_TYPE=5.
    These software packages are using Intel MKL functions for certain purposes. Setting the environment variable will override the unfair CPU dispatching in MKL functions and make the program run faster.
  • Never rely on benchmark tests unless the benchmarking code is known to be open source and compiled without using any Intel tools.
  • Demand that software producers guarantee fair performance on non-Intel processors.

Re: Intel's "cripple AMD" function

Posted: 2020-09-04, 8:27:02
by agner
Update:

The workaround trick of setting the environment variable MKL_DEBUG_CPU_TYPE=5 no longer works for Intel Math Kernel Library (MKL) version 2020.1. This is a problem for end users of mathematical programs. I have heard rumors that Matlab will stay with MKL version 2020.0 in order to better support customers with AMD computers while they are desperately looking for alternative options.

I do not know whether producers of other Math programs are also trying to find a fix. This may be a serious problem for end users who are running math programs on AMD machines. I would recommend that users ask the producer of the software whether they have a solution to the problem. If not, you could turn off automatic updates for the math software and stay with a pre-2020 version while setting the environment variable MKL_DEBUG_CPU_TYPE=5.

It is possible to make a hacker program that modifies the Intel function library files to remove the check for the Intel brand name. This is not a permanent solution, though, because we do not know whether Intel will modify their function libraries so that the hacking software no longer works.

The problem will remain as long as we are relying on compilers and software libraries produced by a CPU vendor that has an interest in making the software work better on their own CPUs than on competing CPU brands. The only sustainable solution is to use compilers and function libraries from reliable third parties. There are excellent compilers available, especially Clang and GCC, but it is more difficult to find good alternatives to Intel math function libraries. I really hope that somebody will make open source math function libraries that are optimized for the AVX2 and AVX512 instruction sets. My own vector class library includes the elementary mathematical functions and a good random number generator, but I don't have the time to make a more comprehensive math function library. The vector class library can be a useful tool, though, for optimizing a math function library for different instruction sets (https://github.com/vectorclass/version2).

Re: Intel's "cripple AMD" function

Posted: 2020-12-12, 12:51:43
by Kristine
agner wrote (2020-09-04, 8:27:02):
I really hope that somebody will make open source math function libraries that are optimized for the AVX2 and AVX512 instruction sets. My own vector class library includes the elementary mathematical functions and a good random number generator, but I don't have the time to make a more comprehensive math function library. The vector class library can be a useful tool, though, for optimizing a math function library for different instruction sets (https://github.com/vectorclass/version2)
https://github.com/shibatch/sleef is a "SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT", and it supports AVX512 along with NEON and other architectures.

Re: Intel's "cripple AMD" function

Posted: 2021-01-25, 5:50:49
by jorgbrown
In the long run, this will hurt Intel, since it will drive customers away from their compilers. Given AMD's new Zen CPUs, anyone investigating their use will switch to gcc or clang for compilation, rather than Intel. I know Google switched their C++ compilation to clang more than two years ago. (There were of course many other reasons for that switch, but by crippling compiled-code performance on AMD, Intel has essentially disqualified their compiler from consideration.)

New Clang-based Intel compiler is better

Posted: 2022-08-08, 9:13:02
by agner
There is an important update to this story. Intel have switched to a Clang-based compiler that works well with non-Intel microprocessors.

If you download the new Intel "oneAPI" Compiler, you get two versions: a legacy version named "Classic", which is a continuation of the old compiler, and a new "LLVM-based" version, which is in fact a fork of the Clang compiler. The classic version is not recommended for new projects and may soon be discontinued.

The Intel C++ compiler is now free, where previous versions required an expensive license.

Quite importantly, the legal "Optimization Notice" has disappeared. This may indicate that the new compiler is no longer crippling performance on non-Intel microprocessors.

The new compiler has various options for specifying an instruction set extension. The manual says about the /arch:INSTRUCTIONSET and -mINSTRUCTIONSET options:
Code generated with these options should execute on any compatible, non-Intel processor with support for the corresponding instruction set.
There are other options to specify the instruction set, and these options should be avoided:

The -xINSTRUCTIONSET or /QxINSTRUCTIONSET options will generate code that runs only on Intel processors. You will get an error message if you try to run the code on an AMD processor.

The -axINSTRUCTIONSET or /QaxINSTRUCTIONSET options will generate multiple versions of the user code with automatic CPU dispatching. This option works only on the legacy compiler, and it gives poor performance on non-Intel processors.

My tests confirm that code generated with the new LLVM-based compiler with the /arch or -m options performs well on non-Intel processors. The various tricks to circumvent the unfair CPU dispatcher are apparently not needed any more.
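One simple way to see which instruction set a build was compiled for is to look at the predefined macros that Clang-style compilers set in response to the -m options. The sketch below assumes the commonly used macros `__SSE2__`, `__AVX__`, `__AVX2__`, and `__AVX512F__`; the function name is hypothetical. It reports only what the compiler was told to target, not what the CPU supports at run time.

```cpp
#include <cassert>
#include <string>

// Report the highest instruction set the compiler was allowed to use for
// this translation unit, based on predefined macros.
std::string compiled_instruction_set() {
#if defined(__AVX512F__)
    return "AVX512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "baseline";
#endif
}
```

Compiling the file with an option such as -mavx2 should make the function return "AVX2". Since the code then contains AVX2 instructions unconditionally, it must only be run on processors of any brand that support AVX2.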

We may speculate why Intel has switched to a new compiler and changed its policy on CPU dispatching, and there can be many good reasons:
  • It has become too expensive to develop and maintain a compiler. There are now more than two thousand x86 instructions, and the number is still increasing. The C++ language keeps developing with new complicated features added. And optimization techniques keep getting more advanced.
  • Other compilers optimize better than Intel's
  • Nobody will use a compiler that generates code that does not work well on all brands of CPU
  • The controversy around unfair CPU dispatching has damaged Intel's image
The reason Intel gives is shorter compilation times and better optimization (link).

Intel provides many highly optimized function libraries. I have tested the CPU dispatching in some of the library functions. These libraries still have both a fair and an unfair CPU dispatcher as described in the above post. It appears that we get the fair dispatcher when the library is used in connection with the LLVM-based compiler without the -x option. We also get fair dispatching when the library is used with a non-Intel compiler. However, I have not tested all library functions. When asked about the MKL library, Intel just replies "performance on non-Intel microarchitectures can vary" (link). I am pretty sure that the SVML library has fair CPU dispatching when used with a non-Intel compiler as this has been the case for many years. There is less certainty about the MKL library.

I have tested how well the different compilers optimize, and published the results in my C++ optimization manual. The Clang and Gnu compilers are very good. Intel's LLVM-based compiler is very similar to the Clang compiler. Microsoft's compiler and the legacy Intel compiler give somewhat inferior performance.

Whether Intel's LLVM-based compiler treats Intel and AMD processors equally in all cases is still an open question. In an article about the shift to the new compiler and Intel's contribution to the general LLVM project, they write (link)
Not all our optimization techniques get upstreamed—sometimes because they are too new, sometimes because they are very specific for Intel architecture.
The same article shows benchmarks indicating that Intel's LLVM-based compiler optimizes better than Clang. However, such a difference is not seen in my tests.

My conclusion now is that Intel's LLVM-based compiler may be used for compiling code that can run on all brands of CPUs, but you may as well use the plain Clang compiler, which is almost identical and optimizes at least as well.

Intel's SVML library may be used for vector math functions. Other alternatives include the Sleef library and my Vector class library.

Other Intel function libraries should be used with care as long as there is no clear indication from Intel about the CPU dispatching in these libraries. You may do your own tests to see how well they perform on the best AMD processors.
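A test of this kind can be as simple as timing the same library call on an Intel and an AMD machine with comparable instruction set support. The following is a minimal timing sketch using only the C++ standard library. The helper name `best_time_seconds` and the stand-in workload `sum_of_squares` are hypothetical; the workload would be replaced with the library function under test.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Run `work` several times and return the shortest wall-clock time in
// seconds. Taking the minimum reduces noise from other processes.
template <typename F>
double best_time_seconds(F&& work, int repetitions = 5) {
    double best = -1.0;
    for (int i = 0; i < repetitions; ++i) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        double seconds = std::chrono::duration<double>(stop - start).count();
        if (best < 0.0 || seconds < best) {
            best = seconds;
        }
    }
    return best;
}

// Stand-in workload; replace with the library call to be measured.
double sum_of_squares(const std::vector<double>& values) {
    double sum = 0.0;
    for (double v : values) {
        sum += v * v;
    }
    return sum;
}
```

A call such as `best_time_seconds([&] { result = sum_of_squares(data); })` gives one number per machine. The result of the workload should be stored or printed so that the compiler cannot optimize the call away.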

Re: Intel's "cripple AMD" function

Posted: 2023-09-06, 7:52:16
by karalinda
In many cases, there are no good alternatives to Intel's function libraries. ... There's a difference between cripple and not-use-best. Although both AMD and Intel are descendants of Fairchild Semiconductor, the main difference between the two companies is that Intel has much stronger revenue streams and higher R&D budgets. That financial advantage, along with the efficiency and sophistication of Intel's chips, has often left AMD struggling to compete. Intel has announced an enhanced version of its 13th-Generation Core i9-13900K processor, which is its first to reach speeds of 6.0GHz without overclocking. The new Core i9-13900KS is Intel's fastest-ever CPU and the fastest one currently on the market from any manufacturer.

Re: Intel's "cripple AMD" function

Posted: 2023-09-06, 10:06:59
by agner
Karalinda wrote:
In many cases, there are no good alternatives to Intel's function libraries
Apparently, it is now possible to use the Intel function libraries without the cripple feature. See my previous post "New Clang-based Intel compiler is better"