Agner`s CPU blog

Software optimization resources | E-mail subscription to this blog | www.agner.org

Proposal now published
Author: Hubert Lamontagne Date: 2016-03-23 20:03
Really nice doc! A question and a couple suggestions:

I kinda wonder how the OS would handle the following case:
- Load program 1 to offset 0010 0000h
- Load program 2 to offset 0020 0000h
- Load program 3 to offset 0030 0000h
- The user unexpectedly loads a 300mb file in program 2, which causes a surprise 300mb allocation on the heap. Where does the OS place this allocation?

--------

One suggestion: remove "Indexing into the register file".

Rationale:

The implementation of register file indexing on an out-of-order CPU looks like this:

- Option #1: Stall hard at the register rename stage until the indexing value becomes valid and can be read. Then read the value through a special feedback path from the register file/bypass network/data cache up to the renamer. If the index is loaded from memory, this will stall for at least 3 or 4 cycles even in the best case. Resume execution.

- Option #2: Speculatively assume that the indexed register will not be the same as any other subsequent operation. Record the previous value of every single rename in a special queue or register file after this until the indexing value can be read (through the special feedback path), or alternatively backup the whole renaming register file. When the indexing resolves, check that no reads/writes have been done to the resolved register. If there are any reads, trigger a branch prediction fail and restore the CPU state to the last known valid state.

- Option #3: Have a specialized predictor that remembers which indexed register reads/writes cause branch prediction fails. Use option #2 by default, except for indexed-register-reads/writes that trigger fails which use option #1.

Even on in-order CPUs, this is a problem because this defeats the bypass network for register reads - you can't bypass a register that you can't figure out what it's going to be yet! Having to stall multiple cycles to avoid potential hazards make register-file indexing very costly, which defeats the purpose.

--------

2nd suggestion: Make operand size for tiny format 32bits, not 64bits (except mov). Consider trading tiny setbit/clearbit/xor/reversesubtract/andnot for arithmetic shift right and 64bit add/sub. Consider adding the option for 32bit signed indexes in address calculations or sign extending instead of zero extending 32bit operations to 64bit by default.

Rationale:

32bit integer operations are extremely extremely common in C/C++ code, due to how the LLP64 (win64) and LP64 (OSX+Linux) models work. Almost only pointers will be 64bits in a typical program (and the remaining cases are when size_t is used instead of int for loop indexes), and typically the only operation done on pointers are add, sub and compare. This also affects indexing - some_pointer[signed_int32_offset] tends to show up very often in code, and you probably don't want the compilers to add a sign extension operation every time.

 
thread Proposal for an ideal extensible instruction set new - Agner - 2015-12-27
replythread Itanium new - Ethan - 2015-12-28
last reply Itanium new - Agner - 2015-12-28
replythread Proposal for an ideal extensible instruction set new - hagbardCeline - 2015-12-28
last replythread Proposal for an ideal extensible instruction set new - Agner - 2015-12-28
reply Proposal for an ideal extensible instruction set new - Adrian Bocaniciu - 2016-01-04
reply Proposal for an ideal extensible instruction set new - Adrian Bocaniciu - 2016-01-04
reply Proposal for an ideal extensible instruction set new - Adrian Bocaniciu - 2016-01-04
replythread Proposal for an ideal extensible instruction set new - Adrian Bocaniciu - 2016-01-04
reply Proposal for an ideal extensible instruction set new - Adrian Bocaniciu - 2016-01-05
replythread Proposal for an ideal extensible instruction set new - John D. McCalpin - 2016-01-05
last reply Proposal for an ideal extensible instruction set new - Adrian Bocaniciu - 2016-01-06
last reply Proposal for an ideal extensible instruction set new - Ook - 2016-01-05
last reply Proposal for an ideal extensible instruction set new - acppcoder - 2016-03-27
reply Proposal for an ideal extensible instruction set new - Jake Stine - 2016-01-11
replythread Proposal for an ideal extensible instruction set new - Agner - 2016-01-12
last replythread Proposal for an ideal extensible instruction set new - Jonathan Morton - 2016-02-02
last replythread Proposal for an ideal extensible instruction set new - Agner - 2016-02-03
last replythread Proposal for an ideal extensible instruction set new - Jonathan Morton - 2016-02-12
last replythread Proposal for an ideal extensible instruction set new - Hubert Lamontagne - 2016-02-18
last replythread Proposal for an ideal extensible instruction set new - Agner - 2016-02-21
last replythread Proposal for an ideal extensible instruction set new - Hubert Lamontagne - 2016-02-22
last replythread Proposal for an ideal extensible instruction set new - Agner - 2016-02-23
replythread Proposal for an ideal extensible instruction set new - Hubert Lamontagne - 2016-02-23
last replythread Proposal for an ideal extensible instruction set new - Agner - 2016-02-24
last replythread Proposal for an ideal extensible instruction set new - asdf - 2016-02-24
last reply Proposal for an ideal extensible instruction set new - Agner - 2016-02-24
last reply Proposal for an ideal extensible instruction set new - Agner - 2016-02-25
replythread limit instruction length to power of 2 new - A-11 - 2016-02-24
last replythread limit instruction length to power of 2 new - Agner - 2016-02-24
replythread Any techniques for more than 2 loads per cycle? new - Hubert Lamontagne - 2016-02-24
last reply Any techniques for more than 2 loads per cycle? new - Agner - 2016-02-25
last replythread limit instruction length to power of 2 new - A-11 - 2016-02-25
last reply limit instruction length to power of 2 new - Hubert Lamontagne - 2016-02-25
replythread More ideas new - Agner - 2016-03-04
replythread More ideas new - Hubert Lamontagne - 2016-03-07
last reply More ideas new - Agner - 2016-03-08
last reply More ideas new - Agner - 2016-03-09
replythread Proposal for an ideal extensible instruction set new - Joe Duarte - 2016-03-07
reply Proposal for an ideal extensible instruction set new - Agner - 2016-03-08
last replythread Proposal for an ideal extensible instruction set new - Hubert Lamontagne - 2016-03-08
last replythread Proposal for an ideal extensible instruction set new - Joe Duarte - 2016-03-09
last replythread Proposal for an ideal extensible instruction set new - Agner - 2016-03-10
last replythread Proposal for an ideal extensible instruction set new - Hubert Lamontagne - 2016-03-11
last replythread Proposal for an ideal extensible instruction set new - Agner - 2016-03-11
last replythread Proposal for an ideal extensible instruction set new - anon2718 - 2016-03-13
last reply Proposal for an ideal extensible instruction set new - Agner - 2016-03-14
replythread A design without a TLB new - Agner - 2016-03-11
replythread A design without a TLB new - Hubert Lamontagne - 2016-03-11
reply A design without a TLB new - Agner - 2016-03-11
last reply A design without a TLB new - Agner - 2016-03-12
reply A design without a TLB new - Bigos - 2016-03-13
last reply A design without a TLB new - Agner - 2016-03-28
replythread Proposal now published new - Agner - 2016-03-22
last replythread Proposal now published - Hubert Lamontagne - 2016-03-23
last replythread Proposal now published new - Agner - 2016-03-24
last replythread Proposal now published new - Hubert Lamontagne - 2016-03-24
last replythread Proposal now published new - Agner - 2016-03-24
last replythread Proposal now published new - Hubert Lamontagne - 2016-03-24
last replythread Proposal now published new - Agner - 2016-03-25
last replythread Proposal now published new - Hubert Lamontagne - 2016-03-28
last replythread Proposal now published new - Agner - 2016-03-29
last replythread Proposal now published new - Hubert Lamontagne - 2016-03-30
last replythread Proposal now published new - Agner - 2016-03-30
last replythread Do we need instructions with two outputs? new - Agner - 2016-03-31
last replythread Do we need instructions with two outputs? new - Hubert Lamontagne - 2016-04-01
reply Do we need instructions with two outputs? new - Agner - 2016-04-01
replythread Do we need instructions with two outputs? new - Joe Duarte - 2016-04-02
last replythread Do we need instructions with two outputs? new - Agner - 2016-04-02
last reply Do we need instructions with two outputs? new - Joe Duarte - 2016-04-02
last replythread Do we need instructions with two outputs? new - Agner - 2016-04-02
last replythread Do we need instructions with two outputs? new - Hubert Lamontagne - 2016-04-02
last replythread Do we need instructions with two outputs? new - Agner - 2016-04-03
reply Do we need instructions with two outputs? new - Joe Duarte - 2016-04-03
last replythread Do we need instructions with two outputs? new - Hubert Lamontagne - 2016-04-03
last replythread Do we need instructions with two outputs? new - Agner - 2016-04-04
reply Do we need instructions with two outputs? new - Hubert Lamontagne - 2016-04-04
last replythread Do we need instructions with two outputs? new - Joe Duarte - 2016-04-06
last replythread Do we need instructions with two outputs? new - Hubert Lamontagne - 2016-04-07
last replythread Do we need instructions with two outputs? new - HarryDev - 2016-04-08
last reply Do we need instructions with two outputs? new - Hubert Lamontagne - 2016-04-09
replythread How about stack machine ISA? new - A-11 - 2016-04-10
last replythread treating stack ISA as CISC architecure new - A-11 - 2016-04-14
last replythread treating stack ISA as CISC architecure new - Agner - 2016-04-14
last replythread treating stack ISA as CISC architecure new - A-11 - 2016-04-17
replythread treating stack ISA as CISC architecure new - Hubert Lamontagne - 2016-04-17
last replythread stack ISA versus long vectors new - Agner - 2016-04-18
last replythread stack ISA versus long vectors new - Hubert Lamontagne - 2016-04-19
last reply stack ISA versus long vectors new - Agner - 2016-04-20
last reply treating stack ISA as CISC architecure new - A-11 - 2016-04-18
replythread Proposal for an ideal extensible instruction set new - zboson - 2016-04-11
last replythread Proposal for an ideal extensible instruction set new - Agner - 2016-04-11
last replythread Proposal for an ideal extensible instruction set new - Hubert Lamontagne - 2016-04-11
last replythread Proposal for an ideal extensible instruction set new - Agner - 2016-04-12
last reply Proposal for an ideal extensible instruction set new - Hubert Lamontagne - 2016-04-12
replythread Version 1.01 new - Agner - 2016-05-10
last replythread Version 1.01 new - Hubert Lamontagne - 2016-05-13
last replythread Version 1.01 new - Agner - 2016-05-14
last replythread Version 1.01 new - Harry - 2016-06-02
replythread Public repository new - Agner - 2016-06-02
reply Public repository new - Harry - 2016-06-02
last reply Public repository new - Harry - 2016-06-02
last reply Public repository new - Agner - 2016-06-09
replythread Rethinking DLLs and shared objects new - Agner - 2016-05-20
replythread Rethinking DLLs and shared objects new - cv - 2016-05-20
last reply Rethinking DLLs and shared objects new - Agner - 2016-05-20
replythread Rethinking DLLs and shared objects new - Peter Cordes - 2016-05-30
last replythread Rethinking DLLs and shared objects new - Agner - 2016-05-30
last replythread Rethinking DLLs and shared objects new - Joe Duarte - 2016-06-17
last replythread Rethinking DLLs and shared objects new - Agner - 2016-06-18
last reply Rethinking DLLs and shared objects new - Bigos - 2016-06-18
last replythread Rethinking DLLs and shared objects new - Freddie Witherden - 2016-06-02
last replythread Rethinking DLLs and shared objects new - Agner - 2016-06-04
last replythread Rethinking DLLs and shared objects new - Freddie Witherden - 2016-06-04
last reply Rethinking DLLs and shared objects new - Agner - 2016-06-06
replythread Is it better to have two stacks? new - Agner - 2016-06-05
reply Is it better to have two stacks? new - Hubert Lamontagne - 2016-06-07
replythread Is it better to have two stacks? new - Eden Segal - 2016-06-13
last replythread Is it better to have two stacks? new - Agner - 2016-06-13
last replythread Is it better to have two stacks? new - Hubert Lamontagne - 2016-06-14
last replythread Is it better to have two stacks? new - Agner - 2016-06-14
last replythread Is it better to have two stacks? new - Hubert Lamontagne - 2016-06-15
last replythread Is it better to have two stacks? new - Agner - 2016-06-15
last replythread Is it better to have two stacks? new - Hubert Lamontagne - 2016-06-16
last replythread Is it better to have two stacks? new - Agner - 2016-06-16
last reply Is it better to have two stacks? new - Hubert Lamontagne - 2016-06-17
last reply Is it better to have two stacks? new - Freddie Witherden - 2016-06-22
last reply Now on Github new - Agner - 2016-06-26