Search found 4 matches
- 2023-07-31, 10:24:05
- Forum: Agner's CPU blog
- Topic: Intel AVX10 & APX announcement
- Replies: 7
- Views: 209380
Re: Intel AVX10 & APX announcement
ARM SVE/2 can vary vector length between 128 bits and 2048 bits, in 128-bit increments. I don't see any reason to adopt this in x86 since you can just mask off the unused part of a vector when saving it. Might be worth pointing out that ARM has published an erratum (see C215) which restricts SVE vec...
- 2023-07-30, 10:32:37
- Forum: Agner's CPU blog
- Topic: Intel AVX10 & APX announcement
- Replies: 7
- Views: 209380
Re: Intel AVX10 & APX announcement
Responding to "What do you guys think?": Why would any other CPU vendor want to adopt this ISA extension? AFAIK AVX10.1 is just Sapphire Rapids-level AVX-512 renamed, with some new CPUID bits. I don't see why AMD (assuming they adopt FP16) would choose not to support it. AVX10.2 doesn't seem to cha...
- 2022-04-25, 6:48:11
- Forum: Agner's CPU blog
- Topic: Intel's new Chimera: Alder Lake
- Replies: 14
- Views: 696724
Re: Intel's new Chimera: Alder Lake
Thanks for the writeup.
If it helps, I have an AVX-512-enabled 12700K if there's some program/code you want me to run on it.
If you want to get such a chip yourself, I wrote an article regarding requirements.
- 2021-10-04, 11:35:55
- Forum: Agner's CPU blog
- Topic: Intel Floating Point Executing 3 to 4 Times Faster Than it Should. MAKES NO SENSE
- Replies: 4
- Views: 88726
Re: Intel Floating Point Executing 3 to 4 Times Faster Than it Should. MAKES NO SENSE
Pre-loading the destination register of the multiply speeds up this loop by 5 times. Loading the source register does not speed it up. There's no way that loop should be able to run faster than the 5-clock-cycle latency of the multiply instruction, and yet it does. This should be impossible. Even m...