Search found 4 matches
- 2022-05-17, 15:54:54
- Forum: Agner's CPU blog
- Topic: Intel's new Chimera: Alder Lake
- Replies: 14
- Views: 696723
Re: Intel's new Chimera: Alder Lake
With 8000 8-byte NOPs, the bottleneck is not the decoders but instruction cache misses. You can see this by looking at the L2_RQSTS.CODE_RD_HIT counter (24.C4): sudo ./nanoBench.sh -f -conf configs/cfg_AlderLakeP_all.txt -cpu 0 -basic -unroll 1000 -loop 1000 -asm "|8|8|8|8|8|8|8|8" | grep -v 0.00 ...
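As a rough sanity check on the cache-miss explanation, the code footprint of that benchmark can be estimated. The sketch below assumes the -asm string denotes eight 8-byte NOPs per iteration (consistent with the 8000 NOPs mentioned in the post) and a 32 KiB L1 instruction cache on the P cores; the helper name and the cache-size constant are illustrative assumptions, not taken from the post.

```python
# Assumption: 32 KiB L1 instruction cache on the Alder Lake P cores.
L1I_BYTES = 32 * 1024


def code_footprint_bytes(instrs_per_iteration, bytes_per_instr, unroll):
    """Size in bytes of the unrolled benchmark body (hypothetical helper)."""
    return instrs_per_iteration * bytes_per_instr * unroll


# Eight 8-byte NOPs, unrolled 1000 times (the -unroll 1000 setting).
footprint = code_footprint_bytes(8, 8, 1000)
print(footprint)              # 64000
print(footprint > L1I_BYTES)  # True: the body does not fit in 32 KiB
```

Under these assumptions the unrolled body is roughly twice the size of the L1 instruction cache, so code fetches would have to be served from L2.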
- 2022-05-16, 14:06:34
- Forum: Agner's CPU blog
- Topic: Intel's new Chimera: Alder Lake
- Replies: 14
- Views: 696723
Re: Intel's new Chimera: Alder Lake
My code is not running out of the µop cache. This can be seen from the UOPS_MITE count that is shown in the output.
- 2022-05-15, 21:04:11
- Forum: Agner's CPU blog
- Topic: Intel's new Chimera: Alder Lake
- Replies: 14
- Views: 696723
Re: Intel's new Chimera: Alder Lake
"The decoders can deliver a maximum of 4 µops per clock for a single thread." According to my tests, the decoders on the P cores can decode 6 instructions per cycle. Here is an example of a sequence of NOP instructions that requires, on average, 0.17 cycles per instruction: https://uops.info/html-tp/ADL-P/NOP-Measure...
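The link between the 0.17-cycle figure and the 6-instructions-per-cycle claim is a reciprocal. A minimal sketch of that arithmetic, using a hypothetical helper name and assuming the 0.17 value is a per-instruction average rounded to two decimals:

```python
def instructions_per_cycle(cycles_per_instr):
    """Convert average cycles per instruction into instructions per cycle."""
    return 1.0 / cycles_per_instr


# 0.17 cycles per NOP corresponds to just under 6 NOPs per cycle.
print(round(instructions_per_cycle(0.17), 2))  # 5.88

# Conversely, exactly 6 per cycle is 1/6 cycle each, which rounds to 0.17.
print(round(1 / 6, 2))  # 0.17
```

A decoder limited to 4 instructions per cycle could not go below 0.25 cycles per instruction, so a measured 0.17 is consistent with a 6-wide decode rate.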
- 2021-04-06, 19:09:04
- Forum: Agner's CPU blog
- Topic: Intel Sunny Cove
- Replies: 7
- Views: 95489
Re: Intel Sunny Cove
According to your optimization guide, inc and dec cannot be macro-fused on Tiger Lake. What do your tests for this look like? According to my tests (which are available here: https://www.uops.info/html-tp/TGL/DEC_R64-Measurements.html#macroFusion), they do macro fuse in the same way as on previous mi...