Agner's CPU blog

Do we need instructions with two outputs?

Author: Hubert Lamontagne

Date: 2016-04-03 19:00

Suppose the SP is at 0x02018, the L1 cache lines are 64 bytes in size, and you want to save a vector that's, say, 24 bytes long (6 x 32-bit floats). You also need to save the control word that tells you the size, compression etc. of the vector. Fair enough, the vector data goes to 0x02000..0x02017. Then you have to save the vector size control word, which puts you at 0x01FFC if you assume 32 bits... but this doesn't work, because then your SP is no longer 8-byte aligned for 64-bit integer and floating-point values. So you must save the vector size word to 0x01FF8 instead, with some extra padding (and the CPU either stores the amount of padding in the vector size word, or recalculates it from the SP alignment and vector size when reloading the vector).

Another possibility is to add some post-padding, so that the vector data is saved to 0x01FE8..0x01FFF and the control word goes to 0x01FE0, and the whole thing fits in a single cache line. The amount of post-padding must be saved in the vector size control word.
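To make the address arithmetic concrete, here is a rough sketch of the two layouts in Python. It is only an illustration under stated assumptions: a 4-byte control word, 8-byte stack alignment, 64-byte cache lines, and an SP that is already 8-byte aligned on entry. The names `Layout`, `push_pre_padded` and `push_post_padded` are made up for this example and are not part of any proposal.

```python
from typing import NamedTuple

CTRL_BYTES = 4     # assumed size of the vector size control word
STACK_ALIGN = 8    # SP must stay 8-byte aligned for 64-bit values
LINE_BYTES = 64    # L1 cache line size


class Layout(NamedTuple):
    new_sp: int    # address of the control word, which is also the new SP
    data_lo: int   # lowest address of the vector data
    data_hi: int   # highest address of the vector data (inclusive)
    pre_pad: int   # padding bytes between the control word and the data
    post_pad: int  # padding bytes between the data and the old SP


def align_down(addr: int, n: int) -> int:
    """Round addr down to a multiple of n (n must be a power of 2)."""
    return addr & ~(n - 1)


def layout_below(top: int, nbytes: int, post_pad: int = 0) -> Layout:
    """Place the vector just below `top`, then the control word below it."""
    data_lo = top - nbytes
    new_sp = align_down(data_lo - CTRL_BYTES, STACK_ALIGN)
    pre_pad = data_lo - (new_sp + CTRL_BYTES)
    return Layout(new_sp, data_lo, top - 1, pre_pad, post_pad)


def push_pre_padded(sp: int, nbytes: int) -> Layout:
    """First scheme: only pad below the data to keep the SP aligned."""
    return layout_below(sp, nbytes)


def push_post_padded(sp: int, nbytes: int) -> Layout:
    """Second scheme: also pad above the data so it stays in one cache line."""
    plain = layout_below(sp, nbytes)
    same_line = plain.new_sp // LINE_BYTES == (sp - 1) // LINE_BYTES
    if same_line or sp - plain.new_sp > LINE_BYTES:
        # Already in one line, or too big to ever fit in one line.
        return plain
    top = align_down(sp, LINE_BYTES)
    return layout_below(top, nbytes, post_pad=sp - top)
```

Calling `push_pre_padded(0x02018, 24)` and `push_post_padded(0x02018, 24)` gives the two layouts for the example above. Even in this simplified form, the second scheme needs a compare on cache line indices plus a second round of address calculation before the first byte can be stored.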

Yeah, it's doable. But it's a long multicycle instruction, probably microcoded. After all, it writes an unpredictable number of bytes to unpredictable offsets, often spanning 2 different cache lines, updates the SP, and involves multiple address calculations to figure out just how much pre-padding and post-padding you need to keep your stack and your data well aligned. It's also very likely to completely block memory operation reordering (i.e. act like a memory barrier), because it's too difficult for concurrent memory operations to figure out whether they will overlap with it or not.

Agner wrote:

Hardware multipliers are expensive, and divisors are even more expensive. I wonder if we need to support multiplication and division of all operand sizes, including vectors of 8-bit and 16-bit integers, if programmers are using floating point anyway?

Generally, 8-bit and 16-bit vector multiplications are provided in SIMD instruction sets to do stuff like movie decoding and software rendering (when OpenGL/DirectX are unavailable due to software constraints, such as running as a plugin). For scalars, 32*32->32 multiplies cover everything (and are common in C++ code), but some CPUs also provide 16*16->32 multiplies because they run faster (ARM).
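As a rough illustration of why narrow multiplies are enough for this kind of pixel work, here is a sketch of an alpha blend on 8-bit pixel values where every intermediate fits in a 16-bit lane. The function name `blend_u8` and the 0..256 alpha convention are assumptions made for this example, not taken from any particular codec or renderer, and the Python code only emulates what a single SIMD lane would do.

```python
def blend_u8(a: int, b: int, alpha: int) -> int:
    """Blend two 8-bit pixel values; alpha is 0..256 (256 means all of a)."""
    assert 0 <= a <= 255 and 0 <= b <= 255 and 0 <= alpha <= 256
    # Two small products, summed. The worst case is 255 * 256 = 65280.
    acc = a * alpha + b * (256 - alpha)
    # So the sum never overflows an unsigned 16-bit lane.
    assert acc <= 0xFFFF
    return acc >> 8


def blend_row(row_a, row_b, alpha):
    """Blend two rows of 8-bit pixels, one lane per pixel."""
    return [blend_u8(a, b, alpha) for a, b in zip(row_a, row_b)]
```

With 16-bit lanes, a vector register holds twice as many of these pixels as it would if each product had to be widened to 32 bits, which is the main reason such code wants 16-bit vector multiplies.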
