<jrtc27>
jfsimon: it varies based on the immediate
<jrtc27>
LLVM will apply a bunch of rules to determine what it thinks is the "best" sequence
<jrtc27>
on RV32 it's pretty simple because lui+addi does everything you need, so the only question is whether you can do it in one instruction (or multiple compressed instructions)
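To illustrate the RV32 case described above, here is a minimal sketch of how a 32-bit constant can be split into a `lui` part and an `addi` part. The function names `split_lui_addi` and `instruction_count` are hypothetical, and this models only the arithmetic, not LLVM's actual selection rules or compressed encodings.

```python
def split_lui_addi(value):
    """Split a 32-bit constant into (hi20, lo12) such that
    (hi20 << 12) + lo12 == value (mod 2**32), with lo12 in [-2048, 2047]."""
    value &= 0xFFFFFFFF
    lo12 = value & 0xFFF
    if lo12 >= 0x800:
        # addi sign-extends its 12-bit immediate, so a low part with
        # bit 11 set is treated as negative and the lui part is bumped by one.
        lo12 -= 0x1000
    hi20 = ((value - lo12) >> 12) & 0xFFFFF
    return hi20, lo12


def instruction_count(value):
    """Hypothetical helper: 1 if a single lui or addi suffices, else 2."""
    hi20, lo12 = split_lui_addi(value)
    return 1 if hi20 == 0 or lo12 == 0 else 2


def materialize(value):
    """Recombine the two parts the way lui followed by addi would."""
    hi20, lo12 = split_lui_addi(value)
    return ((hi20 << 12) + lo12) & 0xFFFFFFFF
```

For example, `0x12345FFF` splits into a `lui` of `0x12346` and an `addi` of `-1`, because the low 12 bits have the sign bit set.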
<jrtc27>
on RV64 it's a lot more complicated
<jfsimon>
indeed, I saw, thanks. I'm playing a bit with the compilers to check how they turn the pseudo-instruction into real instructions
<jrtc27>
e.g. LLVM takes advantage of the various SHxADD instructions from Zba(?) if present and useful
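As a rough illustration of why the shift-and-add instructions can help, here is a small model of their arithmetic on RV64 (`sh1add`, `sh2add`, `sh3add` compute `rs2 + (rs1 << n)`). The helper name `shnadd` is hypothetical, and which constants LLVM actually builds this way is not stated in the discussion; this only shows the identity involved.

```python
MASK64 = (1 << 64) - 1


def shnadd(n, rs1, rs2):
    """Model of sh1add/sh2add/sh3add on RV64: rd = rs2 + (rs1 << n)."""
    if n not in (1, 2, 3):
        raise ValueError("only sh1add, sh2add and sh3add exist")
    return (rs2 + (rs1 << n)) & MASK64


def times_3(x):
    return shnadd(1, x, x)  # x + 2*x


def times_5(x):
    return shnadd(2, x, x)  # x + 4*x


def times_9(x):
    return shnadd(3, x, x)  # x + 8*x
```

So a value that is 3, 5 or 9 times an already materialized value costs one extra instruction, which is one kind of shortcut a constant-materialization heuristic could look for.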
<courmisch>
Say I need to load 64-bit blobs at a regular interval. If the base address and stride are multiples of 8, I can use vlse64.v.
<courmisch>
But is there any remotely practical way to do it if not aligned? if I manually break it down into a series of vle8.v/add, it only occupies the bottom half (or less) of each vector :/
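For reference, the aligned case in the question can be modelled as follows. This is a sketch of what a strided 64-bit load produces, not real vector code; the name `vlse64_model` is hypothetical, and rejecting misaligned addresses is a conservative assumption (real hardware may either handle a misaligned element or raise an exception).

```python
def vlse64_model(mem, base, stride, vl):
    """Model of a strided 64-bit load: element i comes from the 8 bytes
    at byte address base + i*stride, read little-endian."""
    out = []
    for i in range(vl):
        addr = base + i * stride
        if addr % 8 != 0:
            # Assumption: treat misaligned element accesses as unusable.
            raise ValueError(f"element {i} at address {addr} is misaligned")
        out.append(int.from_bytes(mem[addr:addr + 8], "little"))
    return out
```

With a base of 8 and a stride of 16, the model returns the 64-bit words at byte addresses 8, 24, 40 and so on; a base of 3 is rejected, which is the situation the question is about.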
<dzaima[m]>
courmisch: as I'd imagine that a strided load would perform about the same as an indexed load (except maybe for strides of 1/0/-1) for the same element size, you could also use vluxei16.v and arrange the bytes as you like. If the base isn't aligned but the stride is still a multiple of 8 bytes, it'd probably be better to manually blend together two manually-aligned vlse64.v-s
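The indexed-load suggestion can be sketched as a byte gather: build one 16-bit byte offset per destination byte, load bytes through those offsets, then reinterpret each group of 8 bytes as a 64-bit element. The function names below are hypothetical and the code models only the data movement, assuming a little-endian layout.

```python
def byte_offsets(stride, vl, width=8):
    """Offsets for a byte-granular indexed load: blob i, byte j -> i*stride + j."""
    offsets = [i * stride + j for i in range(vl) for j in range(width)]
    # The offsets are meant to be 16-bit index elements, so they must fit.
    if offsets and max(offsets) > 0xFFFF:
        raise ValueError("offset does not fit in a 16-bit index")
    return offsets


def indexed_byte_load(mem, base, offsets):
    """Model of an indexed load with 8-bit data elements: one byte per index."""
    return bytes(mem[base + off] for off in offsets)


def gather_blobs(mem, base, stride, vl):
    """Gather vl 64-bit blobs from any base and stride, one byte at a time."""
    raw = indexed_byte_load(mem, base, byte_offsets(stride, vl))
    return [int.from_bytes(raw[8 * i:8 * i + 8], "little") for i in range(vl)]
```

Unlike the series of byte loads mentioned in the question, every byte of the destination is filled, and neither the base nor the stride needs any particular alignment.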
<courmisch>
dzaima[m]: so VLSE64 twice then VSLL, VSRL and VOR ? I guess that could work
<dzaima[m]>
yeah
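The two-load blend agreed on above can be sketched like this, assuming little-endian memory and a stride that is a multiple of 8. The names are hypothetical, and the per-element shifts and OR stand in for the vector shift and OR instructions.

```python
MASK64 = (1 << 64) - 1


def load64_aligned(mem, addr):
    """One 64-bit element of an aligned strided load (little-endian)."""
    assert addr % 8 == 0
    return int.from_bytes(mem[addr:addr + 8], "little")


def load_blobs_misaligned_base(mem, base, stride, vl):
    """Load vl 64-bit blobs when only the base is misaligned."""
    assert stride % 8 == 0
    k = base % 8            # byte misalignment of every blob
    aligned = base - k
    # First strided load: the aligned words containing the start of each blob.
    lo = [load64_aligned(mem, aligned + i * stride) for i in range(vl)]
    if k == 0:
        # Already aligned. A shift by 64 would not give zero on hardware
        # that masks the shift amount, so this case is handled separately.
        return lo
    # Second strided load: the following aligned words. Note that this reads
    # up to 8 - k bytes past the end of each blob.
    hi = [load64_aligned(mem, aligned + 8 + i * stride) for i in range(vl)]
    shr = 8 * k
    shl = 64 - shr
    # Shift right, shift left, OR.
    return [((l >> shr) | (h << shl)) & MASK64 for l, h in zip(lo, hi)]
```

The shift amounts are the same for every element because the stride is a multiple of 8, so all blobs share the same misalignment `k`.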
<courmisch>
at least it should be faster than VLUXEI8
<dzaima[m]>
right
<courmisch>
thx, let's see