<dramforever[m]>
rustc wouldn't let me clobber s0 and s1 on riscv so it's a bit annoying
<dramforever[m]>
it's also really annoying to have to do jalr ra, t0 like this...
test924 has joined #riscv
<sorear>
if you're switching between coroutines that have their own procedure calls, it might not be appropriate to use the RAS hint
<sorear>
actually it's rather useless unless crob_yield is #[inline(always)] because the predicted next instruction address will _always be the same_
<sorear>
I don't see why it should deny s0/s1 when the other registers are obviously fine (if it were only s0 I might suspect that it wants you to use the fp name)
vagrantc has joined #riscv
vagrantc has quit [Quit: leaving]
<dramforever[m]>
i see it being inlined alright and didn't bother putting #[inline(always)] in it, but yes it is intended to be inlined
<dramforever[m]>
if it weren't and was using the "usual" calling conventions it's no different from the usual "save all callee saved regs" kind of coroutines
coldfeet has quit [Remote host closed the connection]
<dramforever[m]>
courmisch: it would *work*, but the benefits of it being inlinable would be lost
<dramforever[m]>
i also found that at least gcc is fine with clobbering the frame pointer register with -fomit-frame-pointer, and doesn't have a "reserved register" problem, which would mean a much cleaner implementation (only tried x86 so far, but i think it should work analogously for riscv)
felixonmars has quit [Ping timeout: 268 seconds]
jfsimon1981_b has quit [Remote host closed the connection]
jfsimon1981_b has joined #riscv
BootLayer has quit [Quit: Leaving]
felixonmars has joined #riscv
jfsimon1981_b has quit [Remote host closed the connection]
jfsimon1981_b has joined #riscv
coldfeet has joined #riscv
BootLayer has joined #riscv
jfsimon1981_b has quit [Remote host closed the connection]
jfsimon1981_c has joined #riscv
zjason``` has quit [Ping timeout: 260 seconds]
luca_ has joined #riscv
luca_ is now known as OwlWizard
Noisytoot has quit [Excess Flood]
Andre_Z has joined #riscv
Noisytoot has joined #riscv
Tenkawa has joined #riscv
OwlWizard has quit [Quit: OwlWizard]
heat has joined #riscv
stolen has quit [Quit: Connection closed for inactivity]
Forty-Bot has quit [Ping timeout: 252 seconds]
test925 has joined #riscv
eightthree has quit [Ping timeout: 256 seconds]
test924 has quit [Ping timeout: 240 seconds]
eightthree has joined #riscv
BootLayer has quit [Quit: Leaving]
beber_ has quit [Quit: Gateway shutdown]
beber_ has joined #riscv
<unlord>
question for you fine people
<unlord>
vwmulu.vx v16, v8, t3 <-- this is giving me an illegal instruction when LMUL=m8
<courmisch>
mixed width is illegal with m8
<courmisch>
since it would result in EMUL=16 for the wide operand(s)
beber_ has quit [Quit: Gateway shutdown]
<unlord>
right, that is why I put it in v16
<courmisch>
maybe but EMUL>8 is illegal
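The rule courmisch describes can be sketched as a small check: a widening op such as vwmulu.vx writes its destination with EMUL = 2*LMUL, and any EMUL above 8 is reserved, so LMUL=m8 fails no matter which register is picked as the destination. The helper names below are hypothetical, and the sketch ignores the separate register alignment and overlap rules.

```python
from fractions import Fraction

# RVV 1.0 reserves any effective LMUL (EMUL) greater than 8.
MAX_EMUL = Fraction(8)


def widening_dest_emul(lmul):
    """EMUL of the double-width destination of a widening op (hypothetical helper)."""
    return 2 * Fraction(lmul)


def widening_is_legal(lmul):
    """True if the widening destination group still fits within EMUL <= 8."""
    return widening_dest_emul(lmul) <= MAX_EMUL


# LMUL values from mf2 up to m8; only the last one is rejected.
for lmul in (Fraction(1, 2), 1, 2, 4, 8):
    print(f"LMUL={lmul}: dest EMUL={widening_dest_emul(lmul)}, legal={widening_is_legal(lmul)}")
```

With this check, m4 is the largest LMUL at which a widening multiply is still encodable; an m8 source has to be split into two m4 operations.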
<unlord>
lame :)
<courmisch>
it wouldn't give any better performance than two instructions with EMUL=8 in any reasonable IP, I think
marcj has quit [Ping timeout: 272 seconds]
beber_ has joined #riscv
<courmisch>
I mean, if your hardware can parallelise that much, then it's time to double the vector length
<unlord>
it just means more special cases
<courmisch>
it's just that conventionally LMUL is for the narrow operand(s)
<courmisch>
in turn, that convention is because it makes it more likely that you won't need to change vcfg
<courmisch>
(though I'm not sure that claim is based in any reality)
<unlord>
courmisch: why is there no mf16 if there is encoding space for it?
dzaima[m] has joined #riscv
<dzaima[m]>
fractional LMUL is there for converting between different width integers while not increasing register usage; and mf8 is enough to convert 8-bit elements to a m1 register of 64-bit elements. Were 128-bit element types to be added, an mf16 would supposedly be added too
<dzaima[m]>
spec: "Implementations must provide fractional LMUL settings that allow the narrowest supported type to occupy a fraction of a vector register corresponding to the ratio of the narrowest supported type’s width to that of the largest supported type’s width."
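The ratio in that quote can be worked out directly. A minimal sketch, assuming the narrowest supported element is 8 bits and using a hypothetical helper name; the 128-bit case is only the hypothetical dzaima mentions, not an existing configuration.

```python
from fractions import Fraction


def smallest_required_lmul(sew_min, elen):
    """Smallest fractional LMUL an implementation must provide: SEW_min / ELEN."""
    return Fraction(sew_min, elen)


# 8-bit narrowest type with 64-bit widest type: mf8 is required and sufficient.
print(smallest_required_lmul(8, 64))
# Hypothetical 128-bit widest type: the same ratio would call for mf16.
print(smallest_required_lmul(8, 128))
```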
<courmisch>
unlord: in theory, as dzaima pointed out, it would only make sense for e128
<courmisch>
that being noted, at least K230 does see performance improvements from using fractional multipliers even in absence of mixed width
<courmisch>
I guess that VLMUL=0b100 is reserved for either MF16 or M16, whichever (if any) ends up making sense in the future
sakman_ is now known as sakman
<sorear>
it's annoying because (a) pipelined multipliers generate the high and low halves in the same cycle, so mul; mulh wastes half the output in each cycle (b) sifive has a double-width write port, so widening and single-width operations take the same number of cycles at a given LMUL
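Point (a) can be illustrated with plain integers: mul and mulhu each keep one half of the same full-width product, so issuing both repeats the multiplication, while a widening multiply keeps the whole result at once. This is a sketch with hypothetical function names that model 64-bit unsigned operands, not a model of any particular pipeline.

```python
BITS = 64
MASK = (1 << BITS) - 1


def mul(a, b):
    """Low half of the product, like a single-width mul."""
    return (a * b) & MASK


def mulhu(a, b):
    """High half of the unsigned product, like mulhu."""
    return (a * b) >> BITS


def widening_mul(a, b):
    """Full double-width product, like a widening unsigned multiply."""
    return a * b


# Operands are assumed to be in the unsigned 64-bit range.
a, b = 0xFFFF_FFFF_FFFF_FFFF, 0x1234_5678_9ABC_DEF0
# Two single-width multiplies are needed to rebuild what one widening multiply yields.
rebuilt = (mulhu(a, b) << BITS) | mul(a, b)
print(rebuilt == widening_mul(a, b))
```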
<sorear>
best multiplier occupancy for a bignum multiply you can get is iirc around 40%, I forgot most of the details
<sorear>
(tip: if you interleave a large vector between several vector registers, you can do a "slide by 1" by renaming the registers with only one slide instruction)
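One reading of this tip, sketched with Python lists standing in for vector registers: if logical element i lives in register i mod k at position i div k, then sliding the logical vector down by one element means register r takes over the contents of register r+1 (a pure rename), and only the old register 0 needs an actual slide before it becomes the last register. The function names and the zero fill value are assumptions for illustration.

```python
def interleave(logical, k):
    """Spread a logical vector over k registers: element i goes to register i % k."""
    return [logical[r::k] for r in range(k)]


def deinterleave(regs):
    """Rebuild the logical vector from equally sized interleaved registers."""
    k = len(regs)
    out = [None] * sum(len(reg) for reg in regs)
    for r, reg in enumerate(regs):
        out[r::k] = reg
    return out


def slide1down(reg, fill):
    """Model of a one-element slide down within a single register."""
    return reg[1:] + [fill]


def slide_logical_down_by_1(regs, fill=0):
    """Slide the whole logical vector down by one using a single register slide."""
    # Registers 1..k-1 are just renamed to 0..k-2; only old register 0 is slid.
    return regs[1:] + [slide1down(regs[0], fill)]


logical = list(range(12))          # length assumed to be a multiple of k
regs = interleave(logical, 4)
shifted = slide_logical_down_by_1(regs)
print(deinterleave(shifted) == logical[1:] + [0])
```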
Tenkawa has quit [Ping timeout: 268 seconds]
Tenkawa has joined #riscv
mlw has quit [Ping timeout: 268 seconds]
Andre_Z has joined #riscv
hightower2 has joined #riscv
Andre_Z has quit [Quit: Leaving.]
jfsimon1981_c has quit [Remote host closed the connection]
jfsimon1981_c has joined #riscv
coldfeet has quit [Remote host closed the connection]
BootLayer has quit [Quit: Leaving]
jfsimon1981_c has quit [Remote host closed the connection]
jfsimon1981_c has joined #riscv
___nick___ has quit [Ping timeout: 260 seconds]
jfsimon1981_c has quit [Read error: Connection reset by peer]
jfsimon1981_c has joined #riscv
marcj has joined #riscv
markh has quit [Remote host closed the connection]
markh has joined #riscv
luca_ has joined #riscv
luca_ has quit [Remote host closed the connection]