lagash has quit [Remote host closed the connection]
lagash has joined #riscv
pecastro has quit [Ping timeout: 252 seconds]
jacklsw has joined #riscv
danlarkin has quit [Server closed connection]
danlarkin has joined #riscv
unnick_ has joined #riscv
unnick has quit [Ping timeout: 240 seconds]
esv_ is now known as esv
esv_ has joined #riscv
jacklsw has quit [Quit: Back to the real world]
esv_ has quit [Remote host closed the connection]
esv_ has joined #riscv
esv_ has quit [Client Quit]
esv has quit [Ping timeout: 240 seconds]
jacklsw has joined #riscv
vagrantc has quit [Quit: leaving]
esv has joined #riscv
edr has quit [Quit: Leaving]
jacklsw has quit [Quit: Back to the real world]
ntwk has quit [Ping timeout: 246 seconds]
esv has quit [Ping timeout: 252 seconds]
ntwk has joined #riscv
esv has joined #riscv
tux3_ has quit [Server closed connection]
tux3 has joined #riscv
dilfridge has quit [Server closed connection]
dilfridge has joined #riscv
<unlord>
woah, that link to a zip instruction is useful
drewj has joined #riscv
drewj has quit [Ping timeout: 255 seconds]
drewj has joined #riscv
drewj has quit [Read error: Connection reset by peer]
drewj has joined #riscv
agent314 has joined #riscv
mlw has joined #riscv
tucanae47 has quit [Server closed connection]
tucanae47 has joined #riscv
drewj has quit [Ping timeout: 264 seconds]
drewj has joined #riscv
drewj has quit [Ping timeout: 240 seconds]
drewj has joined #riscv
crabbedhaloablut has joined #riscv
davidlt has joined #riscv
maxinuxx has joined #riscv
clemens3 has quit [Server closed connection]
heat has quit [Ping timeout: 246 seconds]
clemens3 has joined #riscv
mlw has quit [Ping timeout: 255 seconds]
drewj has quit [Ping timeout: 240 seconds]
drewj has joined #riscv
drewj has quit [Read error: Connection reset by peer]
drewj has joined #riscv
mlw has joined #riscv
drewj has quit [Ping timeout: 240 seconds]
drewj has joined #riscv
agent314 has quit [Ping timeout: 252 seconds]
averymt has joined #riscv
agent314 has joined #riscv
crabbedhaloablut has quit []
crabbedhaloablut has joined #riscv
BootLayer has joined #riscv
drewj has quit [Quit: Quit]
shamoe has quit [Quit: Connection closed for inactivity]
jacklsw has joined #riscv
prabhakar has quit [Ping timeout: 255 seconds]
markh has quit [Read error: Connection reset by peer]
mark4o has joined #riscv
davidlt has quit [Remote host closed the connection]
mark4o is now known as markh
davidlt has joined #riscv
ema has quit [Ping timeout: 255 seconds]
ema has joined #riscv
<courmisch>
on simple bswap16, the vwaddu/vwmaccu interleaving trick is slower than bit shift and or
elastic_dog has quit [Ping timeout: 246 seconds]
<dzaima[m]>
just shift+or, or zext+shift+or? cause for a vector-vector interleave you need the three
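For readers following along: a minimal scalar sketch, in C, of the two per-element interleave recipes being compared here. It is not RVV code and not anyone's benchmark kernel. The function names are hypothetical, and the mapping of each C statement to an instruction (vzext.vf2, vwaddu.vv, vwmaccu.vx with the scalar 0xFF) is an assumption based on the mnemonics mentioned in the channel. The point is only that lo + hi + 255*hi equals lo | (hi << 8) for all byte pairs, which is why the add/accumulate pair can stand in for zext+shift+or.

```c
#include <stdint.h>

/* Model of zext + shift + or on one element pair. */
uint16_t interleave_shift_or(uint8_t lo, uint8_t hi)
{
    uint16_t wlo = lo;                    /* zero-extend to 16 bits */
    uint16_t whi = hi;                    /* zero-extend to 16 bits */
    return (uint16_t)(wlo | (whi << 8));  /* shift hi up, then or */
}

/* Model of a widening add followed by a widening multiply-accumulate
 * (assumed to correspond to vwaddu.vv, then vwmaccu.vx with 0xFF). */
uint16_t interleave_wadd_wmacc(uint8_t lo, uint8_t hi)
{
    uint16_t acc = (uint16_t)(lo + hi);   /* widening add: lo + hi */
    acc = (uint16_t)(acc + 0xFFu * hi);   /* acc += 255 * hi */
    return acc;                           /* lo + 256 * hi, fits in 16 bits */
}

/* Exhaustive check over all 65536 byte pairs; returns 1 if both agree. */
int interleave_models_agree(void)
{
    for (unsigned lo = 0; lo < 256; lo++) {
        for (unsigned hi = 0; hi < 256; hi++) {
            if (interleave_shift_or((uint8_t)lo, (uint8_t)hi) !=
                interleave_wadd_wmacc((uint8_t)lo, (uint8_t)hi)) {
                return 0;
            }
        }
    }
    return 1;
}
```

The sketch says nothing about speed. Which sequence wins depends on the implementation, which is the question under discussion.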
HumanG33k has quit [Quit: WeeChat 3.8]
<courmisch>
shift+shift+or
EchelonX has joined #riscv
<courmisch>
it's probably an edge case though
<courmisch>
also I can see the value of the add/acc trick if you can't spare a vector
elastic_dog has joined #riscv
aburgess has quit [Ping timeout: 246 seconds]
<dzaima[m]>
I don't find it particularly weird that this would be the case on some implementations, but on this one, going by camel-cdr's benchmarks (https://camel-cdr.github.io/rvv-bench-results/canmv_k230/index.html for those who haven't seen them), it seems to me that it should be faster? or are the benchmarks not particularly accurate for throughput (perhaps from some hazards due to no register rotation and the core being in-order)?
<dzaima[m]>
oh, misread your shift+shift+or as zext+shift+or; so presumably you're actually comparing vnsrl+vnsrl+vwaddu+vwmaccu vs shift+shift+or?
<courmisch>
for a simple bswap16, it's vlseg2e8;vwaddu;vwmaccu vs vle16+vsrl+vsll+vor
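As a rough illustration of the two sequences named above, here is a scalar C model of each, operating on a byte buffer. It is a sketch under stated assumptions, not the code that was measured. The function names are hypothetical. The 16-bit load is modelled as little-endian (as on RISC-V), and the segment load is modelled as splitting even-offset and odd-offset bytes into two fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of vle16 + vsrl + vsll + vor: load a 16-bit element
 * (little-endian), shift both ways by 8, or the halves together. */
void bswap16_shift_or(const uint8_t *src, uint16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t x = (uint16_t)(src[2 * i] | (src[2 * i + 1] << 8));
        uint16_t hi_down = (uint16_t)(x >> 8);   /* shift right */
        uint16_t lo_up = (uint16_t)(x << 8);     /* shift left */
        dst[i] = (uint16_t)(hi_down | lo_up);    /* or */
    }
}

/* Model of vlseg2e8 + vwaddu + vwmaccu: the segment load separates
 * even bytes (field 0) from odd bytes (field 1); the widening add and
 * multiply-accumulate by 0xFF then rebuild each element swapped. */
void bswap16_seg_wmacc(const uint8_t *src, uint16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t f0 = src[2 * i];                 /* field 0: low byte in memory */
        uint8_t f1 = src[2 * i + 1];             /* field 1: high byte in memory */
        uint16_t acc = (uint16_t)(f1 + f0);      /* widening add */
        acc = (uint16_t)(acc + 0xFFu * f0);      /* acc += 255 * f0 */
        dst[i] = acc;                            /* f1 + 256 * f0 */
    }
}
```

Both functions produce the same swapped values. The second one uses no temporary for the shifted halves, which matches the earlier remark about the add/acc trick being useful when a vector register cannot be spared.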
<dzaima[m]>
ah, so a segment load; how does that alone compare to the vle16?
<courmisch>
I didn't take measurements. camel-cdr seemed to imply that it's about the same?
<dzaima[m]>
oh, it's under byteswap benchmarks
<dzaima[m]>
oh, no, that's a different thing
<courmisch>
it seems he's using byteswap as a way to bench gather
<courmisch>
it doesn't make sense to use gather for this in real life, AFAIK
<dzaima[m]>
the only thing said there is about segmented stores being about as fast, nothing about loads (though it would be somewhat weird if the two were particularly different)
<dzaima[m]>
even then, the segment store version gets to avoid a vzext.vf2 and still ends up slightly slower than the alternative, so it's still looking far from on-par; I think camel-cdr's intent is that it's just closer to a vector load than scalar loads (which is the case on the C920 0.7.1 tests)
mlw has quit [Ping timeout: 240 seconds]
<dzaima[m]>
(s/is/isn't/ perhaps; C920 0.7.1 segmented stores are worse than even the scalar comparison, I mean)
<dzaima[m]>
(that s/is/isn't/ being on the second "is" in my message)
Morn_ has quit [Ping timeout: 252 seconds]
Morn_ has joined #riscv
Jackneill has joined #riscv
mlw has joined #riscv
drewfustini has quit [Server closed connection]
drewfustini has joined #riscv
davidlt has quit [Ping timeout: 260 seconds]
mlw has quit [Ping timeout: 260 seconds]
jacklsw has quit [Ping timeout: 246 seconds]
<unlord>
dzaima[m]: thanks for the link, I had not seen those benchmarks