Today, most high-end microprocessors have two or more cores. Multi-threaded applications take advantage of multi-core processors by running multiple threads simultaneously. If you are running four threads simultaneously on a processor with four cores you can, in the ideal case, get four times as much work done per unit of time as with a single thread.
Some processors take multithreading even further by running two threads in each core. This is what Intel calls hyperthreading (also called simultaneous multithreading). For example, the Intel Core i7 processor with four cores can run eight threads simultaneously - two in each core. Apparently, the more threads you can run simultaneously the more work you get done in a given time. But there is a problem here: The two threads running in the same core are competing for the same resources. If each of the two threads gets only half the amount of a limiting resource then it will run at half speed, and the advantage of hyperthreading is completely gone. Two threads running at half speed is certainly not better than a single thread running at full speed.
I have made some tests of hyperthreading to see how fast each of the two threads is running. The following resources are shared between two threads running in the same core:
- Cache
- Branch prediction resources
- Instruction fetch and decoding
- Execution units
Hyperthreading is no advantage if any of these resources is a limiting factor for the speed. But hyperthreading can be an advantage if the speed is limited by something else. To be more specific, each of the two threads will run at more than half speed in the following cases:
- If memory data are so scattered that there will be many cache misses regardless of whether each thread can use the full cache or only half of it. Then one thread can use all the execution resources while the other thread is waiting for a memory operand that was not in the cache.
- If there are many branch mispredictions and the number of branch mispredictions is not increased much by sharing the branch target buffer and branch history table between two threads. Then one thread can use all the execution resources while the other thread is waiting for the misprediction to be resolved.
- If the code has many long dependency chains that prevent efficient use of the execution units.
In these cases, each of the two threads will run at more than half speed, but less than full speed. The total performance is never doubled by hyperthreading, but it may be increased by e.g. 25%. On the other hand, if the performance is limited by any of the shared resources, for example the instruction fetcher, the memory read port, or the multiply unit, then the total performance is not increased by hyperthreading. Actually, in the worst cases the total performance is decreased by hyperthreading because some resources are wasted when the two threads compete for the same resources. A quick Google search reveals several examples of applications that run slower with hyperthreading than when hyperthreading is disabled.
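The arithmetic behind these statements can be sketched in a few lines of Python. This is only an illustration: the function name hyperthreading_gain is made up here, and the per-thread speeds 0.625 and 0.45 are assumed values chosen to reproduce a 25% gain and a loss, not measurements.

```python
def hyperthreading_gain(per_thread_speed):
    """Relative change in the total throughput of one core when it runs
    two threads, each at `per_thread_speed` (1.0 = speed of a lone thread)."""
    total = 2 * per_thread_speed  # two threads share the core
    return total - 1.0            # compared with one thread at full speed


# Illustrative per-thread speeds (assumed, not measured):
#   0.5   -> break-even, hyperthreading gives nothing
#   0.625 -> 25% more total work
#   0.45  -> total work drops because of contention
for speed in (0.5, 0.625, 0.45):
    print(f"per-thread speed {speed:.3f}: total change {hyperthreading_gain(speed):+.0%}")
```

Hyperthreading pays off only when each thread runs at more than half speed, and the gain approaches 100% only if neither thread slows down at all, which does not happen when resources are shared.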
I have tested two microprocessors with hyperthreading: the Intel Core i7 and
the Intel Atom. The Core i7 has four cores. This processor is quite powerful.
The execution units of each core are so powerful that a single thread will
rarely utilize the full potential of the processor. Therefore, it makes good
sense to run two threads in the same core. Unfortunately, the instruction fetch
unit is less powerful, and this is likely to be a bottleneck even in
single-threaded applications. With hyperthreading enabled, the Core i7 can run
eight threads simultaneously. This can give an impressive performance in
favorable cases, but how many applications are able to keep eight threads busy
at the same time?
The Intel Atom is a small low-power processor which is used in small netbook
computers and embedded applications. It has two cores capable of running two
threads each. The execution units of the Atom are much smaller than those of the i7. It
sounds like a weird idea to share the already meager execution units between two
threads. The rationale is that the Atom lacks the out-of-order capabilities of
the bigger processors. When the execution unit is waiting for an uncached memory
operand or some other long-latency event, it would have nothing else to do in
the meantime unless there was a second thread it could work on.
The details of these processors are explained in my microarchitecture manual
www.agner.org/optimize/#manuals.
Obviously, it can be quite difficult for a software programmer to predict whether
hyperthreading is good or bad for a particular application.
The only safe way of answering this question is to test it. Ideally, the programmer should
test his or her application on several different microprocessors with several different
data sets and with hyperthreading turned on and off.
This is a large burden indeed to put on software developers, and very few programmers
are willing to spend time and money on testing how hyperthreading affects their application.
If it turns out that hyperthreading is not good for a particular application then comes
the next problem of how to turn it off. Telling the user to turn off hyperthreading in the
BIOS setup is not an option. The average user may not have the skills to do so; the feature
may not be supported in the BIOS; or it may be that hyperthreading is good for one program
and bad for another program running on the same computer.
The programmer has to put the "avoid hyperthreading" feature into the program.
First the program has to detect whether the computer it is running on has hyperthreading or not.
Later versions of Windows have system functions that can give this information.
In Linux you have to read a configuration file. If hyperthreading is detected then lock
the process to use the even-numbered logical processors only.
This will make one of the two threads in each processor core idle so that there is no
contention for resources.
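A minimal sketch of this approach on Linux, written in Python, might look as follows. It rests on some assumptions: that the kernel exposes the sysfs files cpuN/topology/thread_siblings_list, and that os.sched_setaffinity is available (it is Linux-specific). The function names are hypothetical. Because the numbering of logical processors differs between systems, the sketch reads which logical processors share a core rather than assuming that the even-numbered ones belong to different cores.

```python
import glob
import os

# Assumed location of the Linux topology files (sysfs).
SIBLINGS = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"


def parse_cpu_list(text):
    """Parse a CPU list such as '0,4' or '0-3,8' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(sibling_lists, allowed=None):
    """Pick the lowest-numbered usable logical CPU of each physical core."""
    cores = {frozenset(s) for s in sibling_lists if s}
    chosen = set()
    for core in cores:
        usable = core if allowed is None else core & set(allowed)
        if usable:
            chosen.add(min(usable))
    return sorted(chosen)


def read_sibling_lists(pattern=SIBLINGS):
    """Read the sibling list of every logical CPU; [] if none can be read."""
    lists = []
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                lists.append(parse_cpu_list(f.read()))
        except (OSError, ValueError):
            pass  # unreadable or unexpected file: skip this CPU
    return lists


def avoid_hyperthreading():
    """Lock this process to one logical CPU per core.
    Returns the chosen CPUs, or None if nothing was changed."""
    sibling_lists = read_sibling_lists()
    has_smt = any(len(s) > 1 for s in sibling_lists)
    if not has_smt or not hasattr(os, "sched_setaffinity"):
        return None
    chosen = one_cpu_per_core(sibling_lists, os.sched_getaffinity(0))
    if not chosen:
        return None
    os.sched_setaffinity(0, chosen)
    return chosen
```

A program would call avoid_hyperthreading() once at startup, before creating its worker threads, and then start no more workers than the number of CPUs returned.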
Unfortunately, you cannot prevent the operating system from using the idle threads for something else. There is no way to tell the microprocessor to give one of the two threads in a core higher priority than another. Sometimes it happens that the operating system lets two threads with very different priority run in the same processor core. This has the unfortunate consequence that the low-priority thread steals resources from the high-priority thread. I have seen this happening even with the new Windows 7. It is the responsibility of the operating system to avoid putting threads with different priority into the same core. But unfortunately, operating system designers haven't fully solved this problem yet.
What the application programmer needs is a system call that tells the operating system that "This
application wants to run no more than one thread in each core and does not want to share any core with any other process". Unfortunately, current operating systems have no such system call to my knowledge.
Other microprocessor vendors use hyperthreading as well. In fact, there are rumors
that AMD will use hyperthreading in some of their processors in the future. Hyperthreading
does indeed give a measurable advantage that shows in benchmark tests. This is a
strong sales argument that may convince the confused consumer. But the
microprocessor designer should also take into account that few applications are
able to handle hyperthreading optimally. This is a technology that places a
considerable burden on software developers as well as on operating system
designers. We may ask whether the silicon space that is used for implementing
hyperthreading might be better used for other purposes.