KREYREN has quit [Read error: Connection reset by peer]
Leopold has quit [Read error: Connection reset by peer]
zv has quit [Ping timeout: 256 seconds]
jfsimon1981 has joined #riscv
KREYREN has joined #riscv
BootLayer has joined #riscv
zv has joined #riscv
Forty-Bot has joined #riscv
zv_ has quit [Ping timeout: 268 seconds]
heat has joined #riscv
frkzoid has quit [Ping timeout: 272 seconds]
frkazoid333 has joined #riscv
billchenchina has joined #riscv
frkzoid has joined #riscv
billchenchina has quit [Remote host closed the connection]
billchenchina has joined #riscv
frkazoid333 has quit [Ping timeout: 246 seconds]
somlo has quit [Quit: Leaving]
Stat_headcrabed has joined #riscv
Andre_Z has quit [Quit: Leaving.]
Trifton has joined #riscv
billchenchina has quit [Quit: Leaving]
billchenchina has joined #riscv
coldfeet has joined #riscv
tlwoerner has quit [Ping timeout: 240 seconds]
cousteau has joined #riscv
rsjw has joined #riscv
<cousteau>
Hi! I intend to use custom instructions that use "double-width registers" (for example, ".insn r 13, 0, 0, a0, a2, a4", where "a0" actually represents a 64-bit value stored in registers a0 and a1). This is supported for example in the CV-X-IF interface for RV32 architectures.
jobol has quit [Quit: Leaving]
<cousteau>
Question: I intend to call this instruction from C using inline assembly, like asm (".insn r 13, 0, 0, %0, %1, %2" : "=r" (z) : "r" (x), "r" (y)) where x, y, z are uint64_t
<cousteau>
Can I expect the compiler to realize that x, y, z are "double-width" variables, and do the right thing regarding saving/overwriting registers?
<cousteau>
(full disclosure: I already asked this in #gcc but received no answer; my apologies for the cross-posting but I believe this might be a more topical channel)
<cousteau>
The compiler is gcc, by the way
<courmisch>
no, it won't work, because "r" means one register
<cousteau>
Dammit
<cousteau>
So you mean that "=r" will be interpreted by gcc as "this overwrites one register", correct?
<courmisch>
the GCC documentation would tell you that much
<cousteau>
Well, I failed to interpret it, or couldn't find a definite answer for that in there
<cousteau>
(meaning, I'm not lazy, I did my homework and read the docs before asking, I just suck at doing homework) :)
Revy has quit [Changing host]
Revy has joined #riscv
fuwei has joined #riscv
<courmisch>
you need a suitable constraint, or hard-coded registers
Trifton has quit [Quit: ~~~RiDiN tHe WaVeS~~~]
<cousteau>
Hard-coded registers won't do; that means writing the whole function in assembly I think. So let me have a look at that constraint thing...
tlwoerner has joined #riscv
<cousteau>
I'm reading "Constraints for Particular Machines" in the GCC docs and can't see any suitable one, but it's interesting nonetheless
<jrtc27>
zdinx just uses f, so there may well not be a constraint to use
<jrtc27>
zilsd hasn't got any patches published for either toolchain yet afaik so can't see what they're inventing
<cousteau>
Does RV32Zdinx exist at all?
<jrtc27>
yes
<jrtc27>
that's where aligned register pairs first crept into the ISA
<cousteau>
Interesting. Makes sense.
<cousteau>
Also, yep, Zilsd does exactly the kind of thing I'm trying to do as well.
Stat_headcrabed has quit [Quit: Stat_headcrabed]
tlwoerner has quit [Ping timeout: 268 seconds]
<cousteau>
So, I guess that GCC should be extended with some specific "aligned register pair" constraint for inline assembly?
tlwoerner has joined #riscv
<cousteau>
(in a future)
<cousteau>
And in the meantime, my options are: (1) write the whole function in assembly so that I have full control of the registers, (2) tell GCC that a certain register is about to be clobbered, and get everything out of it? Can that be done?
<cousteau>
Like "free registers a0 and a1, move variable x to registers a0 and a1, call .insn a0, move a0 and a1 to variable y"?
<jrtc27>
you could use tied operands for each half
<cousteau>
"Clobbers - A comma-separated list of registers or other values changed by the AssemblerTemplate, beyond those listed as outputs." Sweet!
<cousteau>
jrtc27: what are those?
<jrtc27>
register uint32_t x __asm__("a0");
<cousteau>
Oh! Interesting.
<cousteau>
I understand that writing that will be equivalent to telling the compiler "don't use this register for anything else, and if you're using it, move everything outside of it before this line of code that declares it", correct?
<rsjw>
it looks like there are some caveats that are documented in the section 6.47.5.2 Specifying Registers for Local Variables
<cousteau>
jrtc27: also, how would that work with 64-bit variables?
<jrtc27>
that's why I said "each half"
<cousteau>
Oh.
<jrtc27>
do the slicing and gluing manually and with any luck it'll be optimised to moving that half of the register pair
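A minimal sketch of the "slicing and gluing" jrtc27 describes, in portable C. The names `u32_pair`, `split_u64`, and `join_u64` are made up for illustration and do not come from the conversation. Putting the low half in the lower-numbered register is an assumption here.

```c
#include <assert.h>
#include <stdint.h>

/* Each 64-bit operand is carried as exactly two 32-bit halves. */
static_assert(sizeof(uint64_t) == 2 * sizeof(uint32_t),
              "a pair must cover the whole 64-bit value");

/* Hypothetical helper type: the two halves of a 64-bit value. */
typedef struct {
    uint32_t lo; /* bits 31..0  */
    uint32_t hi; /* bits 63..32 */
} u32_pair;

/* Slice a 64-bit value into its low and high halves. */
static inline u32_pair split_u64(uint64_t v)
{
    u32_pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

/* Glue two halves back into a 64-bit value. */
static inline uint64_t join_u64(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}
```

On RV32 a `uint64_t` already lives in two registers, so with optimisation enabled these helpers would ideally compile down to plain register moves or to nothing at all.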
ntwk has quit [Ping timeout: 246 seconds]
<rsjw>
"Defining a register variable does not reserve the register. Other than when invoking the Extended 'asm', the contents of the specified register are not guaranteed."
<jrtc27>
yes, you still need to use them as in and out operands
<jrtc27>
you just get to choose how they're allocated
<cousteau>
rsjw: which 6.47.5.2 are you looking at?
<cousteau>
Well, I was using the link with no version number in it so I assumed it'd be the latest, or the "default" one
<cousteau>
OK, thanks for the info! Gotta go now, but I'll look further into this later. My understanding is that "aligned register pairs" are not directly supported by inline assembly yet, so some manual work may need to be done. Storing the value as 2 variables may work for me.
<cousteau>
I guess it'll be like register uint32_t x_lo asm("a0"), x_hi asm("a1"), y_lo asm("a2"), y_hi asm("a3"); x_lo = ...; ... asm(".insn whatever a2, a0" ::: "a2,a3")
<cousteau>
Correct?
<cousteau>
Like "this will overwrite a2 and a3 so don't put anything cool in there"
<courmisch>
you do realise that you could try (and fail) to compile that without asking
<cousteau>
Not sure what I would have to do about a0 and a1
<courmisch>
what jrtc27 said, you still need input or output operands
<cousteau>
courmisch: successfully compiling that won't assure me that it's the right thing to do, that's why I'm asking
<courmisch>
that's a pretty bad excuse, TBH
<courmisch>
you could at least get something that compiles and whose disassembly seems correct
<courmisch>
and only then ask somebody to check that it is right and not just working by accident
<cousteau>
I mean, you're "50% right", I should at least try things that seem to work and then ask to cover the other 50%
<cousteau>
But I think that understanding this would be a better approach than "try it and see"
<cousteau>
(sorry for being so annoying BTW, you're all being a great help)
<courmisch>
doing it yourself is a much more effective way to get to understand than asking questions on IRC
<cousteau>
Well, I tried the "just pass 64-bit variables to asm()" approach on godbolt and it worked perfectly
<cousteau>
It was wrong, but the use case I tried worked fine
<cousteau>
Anyway, I'll try this later
<cousteau>
Thanks for the help! See you around :)
<courmisch>
is there no Zbb and V strnlen() anywhere? I saw many strlen() impl's but ...
<courmisch>
well ended up doing my own
KREYREN has quit [Remote host closed the connection]
<conchuod>
courmisch: I assume you mean one for systems that support both V and Zbb? There's a bitmanip only one (and strcmp iirc) in the bitmanip spec.
raym has joined #riscv
BootLayer has quit [Quit: Leaving]
<courmisch>
conchuod: one for Zbb and one for V
<courmisch>
conchuod: I think Zbb has an strlen example, not strnlen?
<conchuod>
Oh, sorry I misread your message.
fuwei has quit [Remote host closed the connection]
fuwei has joined #riscv
<courmisch>
is there a hardware reason why RVI defines SUB, rather than RSUB?
<courmisch>
I wish there was an RSUBI...
mlw has quit [Ping timeout: 260 seconds]
* cousteau
is curious what RSUB would be
heat has quit [Remote host closed the connection]
<cousteau>
Oh right, reverse SUB, like b-a instead of a-b. I get why RSUBI would be useful
jmdaemon has joined #riscv
n_crm has quit [Ping timeout: 246 seconds]
Andre_Z has joined #riscv
n_crm has joined #riscv
zkrx has quit []
cousteau_ has joined #riscv
cousteau has quit [Ping timeout: 255 seconds]
wingsorc has joined #riscv
cousteau has joined #riscv
ntwk has joined #riscv
jmdaemon has quit [Ping timeout: 260 seconds]
zkrx has joined #riscv
<cousteau>
Re: my previous question about inline assembly and registers. Do you think it's a RISC-V specific question (the RISC-V gcc implementation), or more of a generic gcc question? I would like to understand the core reason for this issue rather than keep shooting in the dark
<cousteau>
I'm from the old school of "don't just TIAS, you need to understand it; otherwise you may run into undefined behavior"
<cousteau>
So I'm willing to learn rather than keep asking or trying blindly, but I need to know where to start
<jrtc27>
what issue?
<jrtc27>
specifically
<dh`>
yeah, was about to say the same thing, there's a good bit of scroll
zBeeble42 is now known as zBeeble
<cousteau>
the one about how to deal with "pairs of registers" in inline assembly in general
<dh`>
there should be an operand constraint type for it
<cousteau>
In short: (let me write some)
<dh`>
you will need that anyway if you ever want gcc to be able to issue the instructions itself
<jrtc27>
register pairs don't exist as a concept in many ISAs, hence there is no machine-independent constraint for it
<dh`>
there probably is not one so you want to crib from another arch that has the concept, like mips or sparc32
<jrtc27>
architectures that have them should define such a constraint
<cousteau>
some RISC-V instruction extensions allow you to work with "pairs of registers", so that e.g. `instruction a4, a0` actually means `instruction {a5,a4}, {a1,a0}`
<dh`>
then once you have that it's just a matter of attaching them to the proper asm operands
<dh`>
however, adding them to gcc is not for the faint of heart, if you've never been inside gcc before I can't really recommend it
<cousteau>
dh`: there should be, but (to my understanding) there is not
<dh`>
and count your sanity points before you start
<dh`>
if you're looking for a quick workaround, there isn't one
<cousteau>
jrtc27: yeah, I didn't think there'd be a machine-independent thing for it
jmdaemon has joined #riscv
<cousteau>
or... well, not likely at least
<cousteau>
dh`: what you're proposing is modifying gcc's RISC-V implementation to add those. I was thinking of a more custom "how to make it work with what I've got" approach
<dh`>
yes, that's what I thought
<dh`>
short answer is: don't try
<cousteau>
(well, not "a more custom", more like the opposite)
<cousteau>
dh`: I'm sure an experienced-enough developer could add those instructions without much issue
<cousteau>
But, I am no such developer.
<cousteau>
jrtc27: re: "architectures that have them should define such a constraint" -- what I got from the previous conversation was that RISC-V probably *should* have it, since some minor extensions seem to use that
coldfeet has quit [Quit: Lost terminal]
<dh`>
you can in theory write an inline asm that takes its inputs in two arbitrary registers and also explicitly destroys a0/a1 and write the thing as "move a0, %0; move a1, %1; my-fancy-insn a0"
<dh`>
but that's going to generate terrible code, you're better off just writing in asm
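A rough sketch of the workaround dh` outlines: let GCC pick any registers for the halves, copy them into a fixed aligned pair inside the asm, and list the fixed registers as clobbers. The function name `pair_op_via_moves`, the custom-0 opcode value `0x0b`, and the choice of a2..a5 are assumptions for illustration. The `#else` branch is only an identity stand-in so the sketch builds on non-RISC-V hosts.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(uint64_t) == 2 * sizeof(uint32_t),
              "two 32-bit halves per operand");

/* Hypothetical one-input custom op: {a5,a4} = op({a3,a2}). */
static inline uint64_t pair_op_via_moves(uint64_t x)
{
#if defined(__riscv) && (__riscv_xlen == 32)
    uint32_t x_lo = (uint32_t)x, x_hi = (uint32_t)(x >> 32);
    uint32_t y_lo, y_hi;

    __asm__ ("mv a2, %[xlo]\n\t"               /* copy input into pair a2/a3 */
             "mv a3, %[xhi]\n\t"
             ".insn i 0x0b, 0, a4, a2, 0\n\t"  /* assumed encoding */
             "mv %[ylo], a4\n\t"               /* copy result out of a4/a5 */
             "mv %[yhi], a5"
             : [ylo] "=r"(y_lo), [yhi] "=r"(y_hi)
             : [xlo] "r"(x_lo), [xhi] "r"(x_hi)
             : "a2", "a3", "a4", "a5");        /* GCC keeps operands out of these */
    return ((uint64_t)y_hi << 32) | y_lo;
#else
    return x; /* stand-in so the sketch compiles elsewhere */
#endif
}
```

Because clobbered registers are never assigned to operands, the outputs cannot alias a2..a5. The price is up to four extra `mv` instructions per call, which is the "terrible code" dh` warns about.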
<cousteau>
"terrible code" is code that destroys variables I'm using. Code that uses unnecessary mv instructions is bad, but not terrible.
<cousteau>
Well, it's terrible, but not terrible terrible.
<cousteau>
Let's leave it at "horrible".
<dh`>
terrible horrible no good very bad
<cousteau>
(On a side note: my keyboard's C, Q, and 1 keys don't work properly, so please don't mind if I make any typo involving those)
jmdaemon has quit [Ping timeout: 268 seconds]
<cousteau>
The solution I have so far is "write the whole function in assembly" so anything beyond that is an improvement.
<cousteau>
(still beats "write the whole program in assembly")
<jrtc27>
RISC-V only has them for floats at the moment
<jrtc27>
well, language-level floats
<jrtc27>
with Zdinx
<jrtc27>
so the f constraint gets repurposed there and means a float in a GPR or pair thereof
<jrtc27>
therefore it's not *currently* needed for any ratified extensions
<cousteau>
so the way I could leverage that is if I were writing for a Zdinx arch and I used "f", right?
<cousteau>
Zdinx is not ratified?
jmdaemon has joined #riscv
<cousteau>
also, do you mean that "if it's not ratified, compilers ain't gonna bother implement it"?
<jrtc27>
Zdinx is ratified but f works and will get you GPRs
<jrtc27>
compilers implement experimental extensions too but Zilsd is not far enough along yet to have patches available for either major toolchain
<jrtc27>
also that, pedantically, it's not RISC-V until it's ratified
<cousteau>
well, if Zilsd is not "trendy enough", then no way Xif is
<sorear>
zacas also requires pair constraints
<sorear>
courmisch: mips and alpha don't have rsub, and OP-IMM is congested enough that rsubi would be a hard sell regardless; the most useful case is neg and that can be expressed with sub/x0
<dh`>
alpha does have rsub, that's one of the ways alpha is slightly better than mips
<dh`>
they call it something else of course
<rsjw>
cousteau: regarding your /* make register vars */ asm(".insn whatever a2, a0" ::: "a2,a3") example, you'd still need to tell gcc about the input variables, so it'd probably be more like asm(".insn whatever %0, %2" : "=r" (a2), "=r" (a3) : "r" (a0), "r" (a1) )
<rsjw>
err and clobbers...
<cousteau>
rsjw: oooooh
<cousteau>
yeah, I didn't realize I could just add input and output registers/vars and then just NOT use them in the assembly code
<rsjw>
/* make register vars */ asm(".insn whatever a2, a0" ::: "a2,a3") example, you'd still need to tell gcc about the input variables, so it'd probably be more like asm(".insn whatever %0, %2" : "=r" (a2), "=r" (a3) : "r" (a0), "r" (a1) : "a1", "a2")
<cousteau>
I'm totally taking a photo of that
<jrtc27>
this is one of the things the manual is actually decent at explaining
<sorear>
look at what every libc does to set up a0..a7 for syscalls
<rsjw>
cousteau: disclaimer: I'm no expert at this, I just read some documentation
* cousteau
was on his phone before and just taking screenshots of the whole convo
<rsjw>
actually, the inputs aren't clobbered... but it should be an input...
<cousteau>
well maybe, but what you said made so much sense I'm willing to believe it
Andre_Z has quit [Quit: Leaving.]
<cousteau>
yeah, you somehow need to tell the compiler that the inputs must be what they must be
<rsjw>
/* make register vars */ asm(".insn whatever a2, a0" ::: "a2,a3") example, you'd still need to tell gcc about the input variables, so it'd probably be more like asm(".insn whatever %0, %2" : "=r" (a2), "=r" (a3) : "r" (a0), "r" (a1) : "a1")
<cousteau>
register uint32_t x_lo asm("a0"), x_hi asm("a1"); ...; // just means "whenever I use `x_hi`, it must be in a1", not "a1 can only be used for x_hi"
<cousteau>
asm("whatever %0" :: "r" (x_lo)); // you somehow need to tell the compiler that, when you reach this point, x_hi *must* be loaded in a1
<rsjw>
sorry, I'm confusing myself again. there's no clobber list necessary because all the relevant registers are listed in the input and output lists. sorry for the repetitiveness
<cousteau>
because the compiler might have reused a1 for something else, which is supposed to be OK if you're "not using x_hi" as far as the compiler is concerned
<cousteau>
that last sentence had just way too many "C"s :(
<dh`>
yeah, I appear to have been wrong
<dh`>
I had thought that the way immediates were set up in the architecture was partly to specifically make them the first operand of subtract but apparently that's not actually true
<dh`>
feh.
<dh`>
so, n/m.
Tenkawa has quit [Quit: Was I really ever here?]
<cousteau>
ok so the code should look like register uint32_t x_lo asm("a2"), x_hi asm("a3"), y_lo asm("a4"), y_hi asm("a5"); x_lo = ...; x_hi = ...; asm (".insn i 13,0, %[rd], %[rs1], 0" : [rd] "=r" (y_lo), "=r" (y_hi), [rs1] "r" (x_lo), "r" (x_hi) : "a5")
<cousteau>
er, asm (".insn i 13,0, %[rd], %[rs1], 0" : [rd] "=r" (y_lo), "=r" (y_hi) : [rs1] "r" (x_lo), "r" (x_hi) : "a5") (that was meant to be a ":", not a ",")
<cousteau>
or, since we're already fixing the registers, asm (".insn i 13,0, a4, a2, 0" : "=r" (y_lo), "=r" (y_hi) : "r" (x_lo), "r" (x_hi) : "a4,a5")
<cousteau>
...would the clobber be needed at all, since I'm already telling it I'm =r'ing a4 and a5?
Trifton has joined #riscv
<rsjw>
yeah I don't think you need the clobber list
<rsjw>
I corrected myself about that above, but it must've got lost in the wall of other corrections I sent
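Pulling the thread together, the register-variable approach might look roughly like the sketch below: every half is pinned to a register with a local register variable, all halves appear as operands, and no clobber list is given. The function name `pair_op`, the custom-0 opcode value `0x0b`, and the two-input R-type shape are assumptions for illustration. The `#else` branch is an arbitrary stand-in (64-bit add) so the sketch builds on non-RISC-V hosts.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(uint64_t) == 2 * sizeof(uint32_t),
              "two 32-bit halves per operand");

/* Hypothetical custom op: {a1,a0} = op({a3,a2}, {a5,a4}). */
static inline uint64_t pair_op(uint64_t x, uint64_t y)
{
#if defined(__riscv) && (__riscv_xlen == 32)
    /* Pin each half to one register of an even/odd pair. */
    register uint32_t x_lo __asm__("a2") = (uint32_t)x;
    register uint32_t x_hi __asm__("a3") = (uint32_t)(x >> 32);
    register uint32_t y_lo __asm__("a4") = (uint32_t)y;
    register uint32_t y_hi __asm__("a5") = (uint32_t)(y >> 32);
    register uint32_t z_lo __asm__("a0");
    register uint32_t z_hi __asm__("a1");

    /* Only the even register of each pair is named in the template; the odd
       halves are listed as operands so GCC keeps them live in a1/a3/a5. */
    __asm__ (".insn r 0x0b, 0, 0, %[rd], %[rs1], %[rs2]"
             : [rd] "=r"(z_lo), "=r"(z_hi)
             : [rs1] "r"(x_lo), "r"(x_hi), [rs2] "r"(y_lo), "r"(y_hi));

    return ((uint64_t)z_hi << 32) | z_lo;
#else
    return x + y; /* stand-in so the sketch compiles elsewhere */
#endif
}
```

The register variables are only guaranteed to sit in their registers at the asm statement itself, so nothing that could disturb them (a function call, for instance) should come between the assignments and the asm. Checking the disassembly, as courmisch suggests, is still worthwhile.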
<cousteau>
yeah I'm not the best at keeping track of long conversations
<cousteau>
my buffer is small
<rsjw>
do these registers need to be on a 64-bit boundary for the 1st one?
<cousteau>
I'm like a baby. I don't have object permanence. When a message goes out of my screen, I believe it has ceased to exist.
<cousteau>
rsjw: yes; they need to be even-odd
<rsjw>
or more correctly.... does the register number need to be either even or odd?
<cousteau>
like {x5,x4} + {x11,x10}, but not e.g. {x6,x5}
<rsjw>
that's odd-even though, right?
<rsjw>
oh... low one first
<cousteau>
"CV-X-IF optionally supports implementation of (custom or standardized) ISA extensions mandating dual register file write-backs. Dual write-back is supported for even-odd register pairs (Xn and Xn+1 with n being an even number extracted from instruction bits [11:7])."
<cousteau>
(similarly so for dual-read)
<cousteau>
rsjw: yeah, my brain works mostly in little endian nowadays
<cousteau>
I blame email conversations when the reply is written before the first message.
<cousteau>
It's "even-odd" as in "X&Y where Y is the one after X, and X is even (so the one after it is odd)"
<rsjw>
heh
<cousteau>
so... it was more a matter of register order rather than endianness. But endianness too.
<rsjw>
I'll let you bang your head against those details
<sorear>
modern in-order register files are typically a large pool of latches or flip-flops with a tree of muxes for each read port
<sorear>
if you have 32 registers, the tree will have five levels; the input to the last mux is 2*XLEN bits of data, selected by a single bit of the register number
<cousteau>
rsjw: meaning "approved for a novice" or "approved by a novice"?
<sorear>
so register-pair inputs are completely free on the regfile side for a common implementation strategy only if they are aligned
<cousteau>
dh`: certain instruction extensions REQUIRE the register pairs to be that way
<cousteau>
or do you mean the HW implementation?
<sorear>
you can do something similar for outputs by separating "even write data" and "odd write data" and then activating the write enables for two registers at once, this requires a new mux to generate the second write data but has no cost which scales with the total regfile size
<cousteau>
the HW implementation... yeah, I guess it does. That way you implement the register file as a 16-entry deep, 64-bit wide memory block, with a mux at the end to go from 64 to 32
<cousteau>
otherwise, you'll need some sort of barrel shifter for realigning misaligned memory accesses (but for the small memory that is the register file, rather than the large system RAM)
<cousteau>
can be done too, but it's harder and kinda pointless
<sorear>
to my knowledge ASIC register files haven't been small memory blocks since about 2000, scaling isn't kind to SRAM edge circuitry and standard cell memories are smaller up to a couple kB
<cousteau>
I've had these mental ramblings before re: memory, but they apply to registers as well
<sorear>
GPU register files are large enough that they act more like memory in terms of the applicable tradeoffs; Intel's GPU ISA allows (byte granular) misaligned register access
<cousteau>
sorear: what are they then? Bunch of flip-flops with muxes? I believe that'd benefit from alignment as well
<sorear>
flip flops or latches in all the implementations I've seen or read about
<cousteau>
with alignment, you implement the dual register file as {reg[n&~1], reg[n|1]}
<cousteau>
without alignment, you implement the dual register file as {reg[n+1], reg[n]}
<cousteau>
and you have to include an adder for the register file access, which kinda slows things down
<cousteau>
also, bigger muxes (2x 32:1 instead of 2x 16:1)
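The aligned versus unaligned indexing cousteau describes can be modelled in software as a toy register file. This is only an illustration of the index arithmetic, not of any real implementation. The function names are made up, and treating the even register as the low half is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define NREGS 32

/* Aligned pair: bit 0 of n is simply forced to 0 and 1, so no adder is
   needed and each half only ever selects among 16 registers. */
static uint64_t read_pair_aligned(const uint32_t reg[NREGS], unsigned n)
{
    assert(n < NREGS);
    uint32_t lo = reg[n & ~1u]; /* even register */
    uint32_t hi = reg[n | 1u];  /* odd register  */
    return ((uint64_t)hi << 32) | lo;
}

/* Unaligned pair: the second index is n + 1, which needs an incrementer,
   and each half can select any of the 32 registers. */
static uint64_t read_pair_unaligned(const uint32_t reg[NREGS], unsigned n)
{
    assert(n + 1 < NREGS);
    uint32_t lo = reg[n];
    uint32_t hi = reg[n + 1];
    return ((uint64_t)hi << 32) | lo;
}
```

With the aligned version, an odd `n` reads the same pair as `n - 1`, whereas the unaligned version reads a pair that straddles two aligned slots.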
<dh`>
hmm
<cousteau>
so, bigger logic overhead
<dh`>
I guess I would have thought that the +1 vs |1 would happen far enough upstream from the actual register file access to disappear into the noise
jmdaemon has quit [Ping timeout: 268 seconds]
<dh`>
for writing back, I suppose that's only true if you actually want to be able to write back any two registers and this is a special case
<dh`>
which it probably isn't because we don't have two register output fields for a reason
<cousteau>
dh`: do you mean delay/efficiency-wise, or logic resources-wise?
<cousteau>
dh`: well, this Xif extension also allows "dual register write"
<dh`>
for reading it seems like it's all kinda moot anyway but I guess one double-width access port is cheaper than two single-width access ports
<dh`>
either, basically I was wondering if the alignment requirement really serves much purpose
<dh`>
keep in mind that I know just enough about hardware to make ignorant comments like this on irc :-)
<cousteau>
up to 3 read registers, up to 3 of the read registers can be dual, and the single write register can be dual too
<cousteau>
That's cool. I'm more on the HW side but kinda lacking on the SW part
* cousteau
knows a lot of C but still believes that compilers are magic
<dh`>
and I guess the question was partly also motivated by earlier examples of register pairs (like the sparc32 doubleword memory instructions) not actually needing more read ports than they already have to do both registers independently
<dh`>
so the pairing can (in theory at least) be only about addressing
<sorear>
a double-width access port is half the cost of two independent single-width access ports, because the top and bottom of the double-width access only have to be able to access half of the physical registers, so you need ~1024 muxes instead of ~2048
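sorear's round numbers can be reproduced with a back-of-the-envelope model: a binary tree of 2:1 muxes that selects one of N entries needs N-1 muxes per bit. The helper names below are invented, and the model ignores everything except the read-port mux trees of a 32-entry, 32-bit register file.

```c
#include <assert.h>

/* 2:1 muxes needed for a binary select tree over `entries` values,
   replicated for every bit of the data width. */
static unsigned mux2_count(unsigned entries, unsigned width_bits)
{
    assert(entries >= 1);
    return (entries - 1) * width_bits;
}

/* One ordinary read port: any of 32 registers, 32 bits wide. */
static unsigned single_port(void)
{
    return mux2_count(32, 32);
}

/* Two independent ordinary read ports. */
static unsigned two_single_ports(void)
{
    return 2 * single_port();
}

/* One aligned double-width port: each 32-bit half selects among only
   the 16 even or the 16 odd registers. */
static unsigned double_port(void)
{
    return 2 * mux2_count(16, 32);
}
```

Under this model `double_port()` is 960 and `two_single_ports()` is 1984, in line with the "~1024 instead of ~2048" estimate. `single_port()` is 992, exactly one 32-bit final mux level more than the double-width port, which is the sense in which the double-width read comes for free.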
<dh`>
right, the thing I was thinking was that at least in some cases you likely already have enough single-width access ports
<cousteau>
C code -> [magic] -> [more magic] -> assembly code -> [some minor but obscure magic] -> binary code -> HW
<dh`>
but I have no idea what the application in this extension is
<sorear>
if a single-width read port uses bit0 of the register specifier for the last level of muxing, you get a double-width read port for free
<dh`>
you left off a step: assembly code -> [minor but obscure magic] -> .o file -> [dark black magic] -> program file
<cousteau>
...oh right, the linker
<cousteau>
yeah I absolutely forgot about it, and probably for a reason
<dh`>
linking violates most of the nice simple models of compiling that people get taught in undergrad
<sorear>
renamed out of order is a bigger problem because there's no guarantee the even and odd architectural registers are in any given subset of physical registers, but given that there are no ooo core-v cores I can't see cv-x-if caring too deeply about that
<dh`>
you'd have to rename them in pairs
<cousteau>
Out of Order?
<dh`>
and that might restrict all your renaming to pairs if you aren't careful
<cousteau>
you misspelled "Magic"
<dh`>
and that would definitely suck
<cousteau>
...ah, good point. OoO and reg pairs would be crazy
<cousteau>
good thing I'm dealing with an in-order arch
<cousteau>
(well, not OoO in general, but the register renaming part)
<rsjw>
cousteau: meaning "approved by"
<cousteau>
by noobs, for noobs. Cool
<cousteau>
*Cool!
<cousteau>
stupid ! key not working either because it's in the 1 key
DesRoin has quit [Ping timeout: 268 seconds]
<rsjw>
keyboards are inexpensive
<rsjw>
also you can usually pop the caps off and clean them
DesRoin has joined #riscv
<cousteau>
rsjw: cool, where can I get an inexpensive keyboard for this laptop?
<cousteau>
which I carry with me frequently so a USB one would be impractical?
<cousteau>
(yeah that's been my issue for the last few days)
<cousteau>
anyway I already tried cleaning the thing
<cousteau>
and this is a weird laptop brand so it's hard to get spare keyboards for it
<cousteau>
and it's old as hell so it's not worth investing much more money on it
<cousteau>
but it sucks because I literally bought two 16 GB RAM sticks for it days before it broke
<cousteau>
Anyway. Thanks a million for the help with this whole inline assembly business! I think the solution should work. But I'll look a bit further into gcc inline assembly to learn more about it. Good night!
cousteau has quit [Quit: ♫ I can't forget the day I shot that network down ♫]
mlw has joined #riscv
somlo has joined #riscv
<rsjw>
cousteau_: ebay, amazon
ntwk has quit [Ping timeout: 260 seconds]
khem has quit [Quit: WeeChat 4.2.2]
pecastro has quit [Ping timeout: 246 seconds]
mlw has quit [Ping timeout: 268 seconds]
ntwk has joined #riscv
<cousteau_>
After some thought, I've decided that the proper way to implement this in gcc, were it to ever make it into the inline asm syntax, would be to just tell asm that when you pass it a 64-bit variable, it should put it in two consecutive 32-bit registers automatically. Maybe even convince assembly that RV32I has 16 64-bit registers, which simply overlap the existing 32-bit registers and happen to be called the same as the least significant 32-bit register in the pair.
<cousteau_>
Perhaps adding an extension for %H0 and %L0 meaning the high/low 32-bit register of the 64-bit variable %0.