Agner`s CPU blog

Software optimization resources | E-mail subscription to this blog | www.agner.org

Proposal now published
Author: Hubert Lamontagne Date: 2016-03-24 12:20
| 1. Allocate more heap space somewhere else. Make an extra entry in the memory map for it.

That would work, although then the memory map would potentially need to be somewhat complex and probably have
to be cached - this is almost a TLB already, the only missing feature is the ability to remap memory blocks.

| 2. Same as 1. Use virtual address translation to keep it contiguous in order to make the heap manager simpler.
| This requires that you allocate a lot of unused virtual address space for each program as it starts. This method
| works also for the stack.

This is exactly what TLBs do!

| 3. As a last resort, when memory space has become hopelessly fragmented and you are out of memory map entries.
| Swap the data of the least used program to disk. Reorganize the fragmented data by actually moving the values to
| make them contiguous in physical memory (assuming that they were already contiguous in virtual address space).
| This will cause an annoying delay to the user, but we already have such delays in current systems when you run out
| of memory. You may signal a warning to the user: Memory low, please close some programs.

Amazingly, this approach exists on real hardware: pre-OSX Mac OS works this way. It has a memory compactor and
all memory allocations must use memory handles instead of pointers. When RAM runs out, the OS recopies the whole
RAM and removes all dead allocation and updates all memory handle addresses. The reason for this is that the first
Macs had no MMU (!) so this was the only approach that could work, but this created long lasting damage to the
platform: there was a whole system for locking down handles (when doing system calls and so forth), and the whole
thing was only fixed with OSX. 16 bit Windows also has this issue (they figured out a way to fix it in Win32).

Java also typically has memory compaction and doesn't require an MMU, but it pays a price for it: it has unavoidable
garbage collector pauses, often taking upwards of 100ms (which is one of the reasons why Minecraft is a bit jerky).
Then again, Java is typically used for server software and enterprise/government stuff, where pauses don't really matter
as much as dealing with second rate programmers that can't use malloc()/free() properly.

------

| I want to avoid complex instructions for saving and restoring all or many registers. Any suggestions for an alternative to
| writing 64 consecutive save instructions or having a complex microcoded save-all instruction?

For general purpose registers, you're going to do 32 consecutive loads/stores so it's always going to take 'some' time.
I'd suggest mandating 16-byte stack alignment in the ABI, and having store-dual and load-dual instructions that can
save/restore two registers in a single whole 128bit memory access. This is what ARM64 does. ARM already has to deal
with other instructions with 3 sources or 2 destinations anyways.

Other RISCs simply use individual instructions for each register, because normally you do register saving/restoring
when doing branches and interrupts and other moderately slow things, so the pipeline is likely to stall for all sorts of
other reasons (branch prediction miss, data cache miss, instruction cache miss, chains of indirect loads, having all
downstream operations depend on a single load so that the whole pipeline is latency-limited, etc), so the cost of
saving/restoring the regfile is lost in the wash.

For vector register save/restore, you're dealing with large vector registers which is going to take multiple memory cycles,
so it almost doesn't matter how you do it! The cost of keeping the data cache busy for multiple cycles for each vector
register load/store completely dwarfs the cost of issuing 32 loads/stores instead of less. It's also possible to have
instructions that load/store to multiple contiguous SIMD registers - Arm NEON has tons of this, including strange stuff
like interleaved loads.

------

| I am not sure if you are trying to save power by doing a 32-bit addition instead of 64 bits. The hardware may actually
| disable the upper part of the heavy carry-lookahead circuit if it can detect quickly that the values are small. Otherwise,
| I don't see the problem of using 64-bit addition on 32-bit values. The value will simply be truncated if it is later used
| in a 32-bit (not-tiny) instruction. The compiler will recognize the need to sign-extend a 32-bit signed index or optimize
| the program by replacing it with a 64-bit index variable. Or you may use an unsigned 32-bit index and use the carry
| flag for detecting end of loop if the index is counting down. I would rather keep array indexes 64-bit because setting
| a 2 GB size limit to arrays is a problem for the programming language standard, and the value may cross zero when
| a count-down loop ends.

I'm not suggesting it for power-saving reasons, I'm suggesting it for C++ compatibility reasons!

You're right: 64bit MOV, ADD, SUB, AND, OR, XOR, SHL can stand in for 32bit versions. This does not work for SHR,
arithmetic SAR and comparisons though. MUL is a bit of a wash - 64bit MUL can stand in for 32bit MUL, BUT 64bit
MUL is kinda slow to implement in hardware due to the large number of bits involved and large number of partial sums,
so a 32bit MUL still makes sense. This is why ARM ended up with a 16x16->32 MUL in its instruction set - it's strictly
there because it can run faster than 32x32->32 MUL.

The compiler will probably have to add an instruction to sign-extend all 32bit indexes. It cannot replace 32bit variables
with 64bit variables because that changes behavior - instead of 'int' being "32bits", it becomes "32bits except if the
compiler decided to make it 64bits but it can still revert to 32bits at any time depending on if the value is in a register
or in RAM". Technically you could make int 64bit, and some early 64bit cpus did this, but this is bad because it eats
twice as much data cache for no good reason.

IRL, >2GB arrays are as rare as hens teeth and if they do happen, they probably already need complex management
for other reasons. This is not really due to constraints of current memory sizes... it's more of a result of the fact that
you practically never need "2 billion of something".

 
thread Proposal for an ideal extensible instruction set new - Agner - 2015-12-27
replythread Itanium new - Ethan - 2015-12-28
last reply Itanium new - Agner - 2015-12-28
replythread Proposal for an ideal extensible instruction set new - hagbardCeline - 2015-12-28
last replythread Proposal for an ideal extensible instruction set new - Agner - 2015-12-28
reply Proposal for an ideal extensible instruction set new - Adrian Bocaniciu - 2016-01-04
reply Proposal for an ideal extensible instruction set new - Adrian Bocaniciu - 2016-01-04
reply Proposal for an ideal extensible instruction set new - Adrian Bocaniciu - 2016-01-04
replythread Proposal for an ideal extensible instruction set new - Adrian Bocaniciu - 2016-01-04
reply Proposal for an ideal extensible instruction set new - Adrian Bocaniciu - 2016-01-05
replythread Proposal for an ideal extensible instruction set new - John D. McCalpin - 2016-01-05
last reply Proposal for an ideal extensible instruction set new - Adrian Bocaniciu - 2016-01-06
last reply Proposal for an ideal extensible instruction set new - Ook - 2016-01-05
last reply Proposal for an ideal extensible instruction set new - acppcoder - 2016-03-27
reply Proposal for an ideal extensible instruction set new - Jake Stine - 2016-01-11
replythread Proposal for an ideal extensible instruction set new - Agner - 2016-01-12
last replythread Proposal for an ideal extensible instruction set new - Jonathan Morton - 2016-02-02
last replythread Proposal for an ideal extensible instruction set new - Agner - 2016-02-03
last replythread Proposal for an ideal extensible instruction set new - Jonathan Morton - 2016-02-12
last replythread Proposal for an ideal extensible instruction set new - Hubert Lamontagne - 2016-02-18
last replythread Proposal for an ideal extensible instruction set new - Agner - 2016-02-21
last replythread Proposal for an ideal extensible instruction set new - Hubert Lamontagne - 2016-02-22
last replythread Proposal for an ideal extensible instruction set new - Agner - 2016-02-23
replythread Proposal for an ideal extensible instruction set new - Hubert Lamontagne - 2016-02-23
last replythread Proposal for an ideal extensible instruction set new - Agner - 2016-02-24
last replythread Proposal for an ideal extensible instruction set new - asdf - 2016-02-24
last reply Proposal for an ideal extensible instruction set new - Agner - 2016-02-24
last reply Proposal for an ideal extensible instruction set new - Agner - 2016-02-25
replythread limit instruction length to power of 2 new - A-11 - 2016-02-24
last replythread limit instruction length to power of 2 new - Agner - 2016-02-24
replythread Any techniques for more than 2 loads per cycle? new - Hubert Lamontagne - 2016-02-24
last reply Any techniques for more than 2 loads per cycle? new - Agner - 2016-02-25
last replythread limit instruction length to power of 2 new - A-11 - 2016-02-25
last reply limit instruction length to power of 2 new - Hubert Lamontagne - 2016-02-25
replythread More ideas new - Agner - 2016-03-04
replythread More ideas new - Hubert Lamontagne - 2016-03-07
last reply More ideas new - Agner - 2016-03-08
last reply More ideas new - Agner - 2016-03-09
replythread Proposal for an ideal extensible instruction set new - Joe Duarte - 2016-03-07
reply Proposal for an ideal extensible instruction set new - Agner - 2016-03-08
last replythread Proposal for an ideal extensible instruction set new - Hubert Lamontagne - 2016-03-08
last replythread Proposal for an ideal extensible instruction set new - Joe Duarte - 2016-03-09
last replythread Proposal for an ideal extensible instruction set new - Agner - 2016-03-10
last replythread Proposal for an ideal extensible instruction set new - Hubert Lamontagne - 2016-03-11
last replythread Proposal for an ideal extensible instruction set new - Agner - 2016-03-11
last replythread Proposal for an ideal extensible instruction set new - anon2718 - 2016-03-13
last reply Proposal for an ideal extensible instruction set new - Agner - 2016-03-14
replythread A design without a TLB new - Agner - 2016-03-11
replythread A design without a TLB new - Hubert Lamontagne - 2016-03-11
reply A design without a TLB new - Agner - 2016-03-11
last reply A design without a TLB new - Agner - 2016-03-12
reply A design without a TLB new - Bigos - 2016-03-13
last reply A design without a TLB new - Agner - 2016-03-28
replythread Proposal now published new - Agner - 2016-03-22
last replythread Proposal now published new - Hubert Lamontagne - 2016-03-23
last replythread Proposal now published new - Agner - 2016-03-24
last replythread Proposal now published - Hubert Lamontagne - 2016-03-24
last replythread Proposal now published new - Agner - 2016-03-24
last replythread Proposal now published new - Hubert Lamontagne - 2016-03-24
last replythread Proposal now published new - Agner - 2016-03-25
last replythread Proposal now published new - Hubert Lamontagne - 2016-03-28
last replythread Proposal now published new - Agner - 2016-03-29
last replythread Proposal now published new - Hubert Lamontagne - 2016-03-30
last replythread Proposal now published new - Agner - 2016-03-30
last replythread Do we need instructions with two outputs? new - Agner - 2016-03-31
last replythread Do we need instructions with two outputs? new - Hubert Lamontagne - 2016-04-01
reply Do we need instructions with two outputs? new - Agner - 2016-04-01
replythread Do we need instructions with two outputs? new - Joe Duarte - 2016-04-02
last replythread Do we need instructions with two outputs? new - Agner - 2016-04-02
last reply Do we need instructions with two outputs? new - Joe Duarte - 2016-04-02
last replythread Do we need instructions with two outputs? new - Agner - 2016-04-02
last replythread Do we need instructions with two outputs? new - Hubert Lamontagne - 2016-04-02
last replythread Do we need instructions with two outputs? new - Agner - 2016-04-03
reply Do we need instructions with two outputs? new - Joe Duarte - 2016-04-03
last replythread Do we need instructions with two outputs? new - Hubert Lamontagne - 2016-04-03
last replythread Do we need instructions with two outputs? new - Agner - 2016-04-04
reply Do we need instructions with two outputs? new - Hubert Lamontagne - 2016-04-04
last replythread Do we need instructions with two outputs? new - Joe Duarte - 2016-04-06
last replythread Do we need instructions with two outputs? new - Hubert Lamontagne - 2016-04-07
last replythread Do we need instructions with two outputs? new - HarryDev - 2016-04-08
last reply Do we need instructions with two outputs? new - Hubert Lamontagne - 2016-04-09
replythread How about stack machine ISA? new - A-11 - 2016-04-10
last replythread treating stack ISA as CISC architecure new - A-11 - 2016-04-14
last replythread treating stack ISA as CISC architecure new - Agner - 2016-04-14
last replythread treating stack ISA as CISC architecure new - A-11 - 2016-04-17
replythread treating stack ISA as CISC architecure new - Hubert Lamontagne - 2016-04-17
last replythread stack ISA versus long vectors new - Agner - 2016-04-18
last replythread stack ISA versus long vectors new - Hubert Lamontagne - 2016-04-19
last reply stack ISA versus long vectors new - Agner - 2016-04-20
last reply treating stack ISA as CISC architecure new - A-11 - 2016-04-18
replythread Proposal for an ideal extensible instruction set new - zboson - 2016-04-11
last replythread Proposal for an ideal extensible instruction set new - Agner - 2016-04-11
last replythread Proposal for an ideal extensible instruction set new - Hubert Lamontagne - 2016-04-11
last replythread Proposal for an ideal extensible instruction set new - Agner - 2016-04-12
last reply Proposal for an ideal extensible instruction set new - Hubert Lamontagne - 2016-04-12
replythread Version 1.01 new - Agner - 2016-05-10
last replythread Version 1.01 new - Hubert Lamontagne - 2016-05-13
last replythread Version 1.01 new - Agner - 2016-05-14
last replythread Version 1.01 new - Harry - 2016-06-02
replythread Public repository new - Agner - 2016-06-02
reply Public repository new - Harry - 2016-06-02
last reply Public repository new - Harry - 2016-06-02
last reply Public repository new - Agner - 2016-06-09
replythread Rethinking DLLs and shared objects new - Agner - 2016-05-20
replythread Rethinking DLLs and shared objects new - cv - 2016-05-20
last reply Rethinking DLLs and shared objects new - Agner - 2016-05-20
replythread Rethinking DLLs and shared objects new - Peter Cordes - 2016-05-30
last replythread Rethinking DLLs and shared objects new - Agner - 2016-05-30
last replythread Rethinking DLLs and shared objects new - Joe Duarte - 2016-06-17
last replythread Rethinking DLLs and shared objects new - Agner - 2016-06-18
last reply Rethinking DLLs and shared objects new - Bigos - 2016-06-18
last replythread Rethinking DLLs and shared objects new - Freddie Witherden - 2016-06-02
last replythread Rethinking DLLs and shared objects new - Agner - 2016-06-04
last replythread Rethinking DLLs and shared objects new - Freddie Witherden - 2016-06-04
last reply Rethinking DLLs and shared objects new - Agner - 2016-06-06
replythread Is it better to have two stacks? new - Agner - 2016-06-05
reply Is it better to have two stacks? new - Hubert Lamontagne - 2016-06-07
replythread Is it better to have two stacks? new - Eden Segal - 2016-06-13
last replythread Is it better to have two stacks? new - Agner - 2016-06-13
last replythread Is it better to have two stacks? new - Hubert Lamontagne - 2016-06-14
last replythread Is it better to have two stacks? new - Agner - 2016-06-14
last replythread Is it better to have two stacks? new - Hubert Lamontagne - 2016-06-15
last replythread Is it better to have two stacks? new - Agner - 2016-06-15
last replythread Is it better to have two stacks? new - Hubert Lamontagne - 2016-06-16
last replythread Is it better to have two stacks? new - Agner - 2016-06-16
last reply Is it better to have two stacks? new - Hubert Lamontagne - 2016-06-17
last reply Is it better to have two stacks? new - Freddie Witherden - 2016-06-22
last reply Now on Github new - Agner - 2016-06-26