sorear[m] changed the topic of #riscv to: Matrix users: #riscv:libera.chat will be ending operation NET Jul 25; please test #riscv:catircservices.org as a replacement | RISC-V instruction set architecture | https://riscv.org | Logs: https://libera.irclog.whitequark.org/riscv
Leopold has quit [Ping timeout: 250 seconds]
Leopold has joined #riscv
Leopold has quit [Ping timeout: 240 seconds]
Leopold has joined #riscv
Pierce[m] has joined #riscv
unsigned has joined #riscv
Andre_Z has quit [Quit: Leaving.]
jn has quit [Ping timeout: 240 seconds]
jn has joined #riscv
jn has joined #riscv
jn has quit [Changing host]
terminalpusher has quit [Ping timeout: 246 seconds]
Tenkawa has joined #riscv
Tenkawa has quit [Client Quit]
crabbedhaloablut has quit [Ping timeout: 246 seconds]
joev has quit [Ping timeout: 245 seconds]
joev has joined #riscv
Forty-Bot has joined #riscv
joev has quit [Ping timeout: 245 seconds]
joev has joined #riscv
billchenchina has joined #riscv
MarvelousWololo has quit [Ping timeout: 258 seconds]
vagrantc has joined #riscv
<sorear>
who wants to name a vector unit probably targeting Flute
awita has quit [Remote host closed the connection]
pecastro has joined #riscv
<cousteau>
dh`: oh my god
<cousteau>
brilliant
<cousteau>
(or maybe some synonym, like "Syrinx", to make it less obvious)
jmdaemon has quit [Ping timeout: 246 seconds]
<cousteau>
like how "hwacha" is a thing that launches multiple "rocket"s in parallel
joev has quit [Ping timeout: 246 seconds]
joev has joined #riscv
aburgess has joined #riscv
joev has quit [Ping timeout: 246 seconds]
joev has joined #riscv
BootLayer_ has quit [Quit: Leaving]
aburgess has quit [Ping timeout: 245 seconds]
intrigus has joined #riscv
Raito_Bezarius has quit [Read error: Connection reset by peer]
<sevan>
took 15 days to complete a Perl 5 smoke test on a D1 LicheeRV running Ubuntu 23.04 :)
<sevan>
All test suite attempts failed because the time tests fail due to the clock drift issue
<sevan>
due to the clock drift issue the dist/threads-shared/t/waithires.t tests also fail when testing against threaded perl.
Tenkawa has joined #riscv
joev has quit [Ping timeout: 245 seconds]
joev has joined #riscv
aerkiaga has joined #riscv
aburgess has joined #riscv
intrigus has quit [Ping timeout: 246 seconds]
BootLayer has joined #riscv
drmpeg has left #riscv [#riscv]
terminalpusher has joined #riscv
Andre_Z has joined #riscv
intrigus has joined #riscv
cousteau has quit [Remote host closed the connection]
cousteau has joined #riscv
Tenkawa has quit [Quit: Was I really ever here?]
awita has joined #riscv
drmpeg has joined #riscv
joev has quit [Ping timeout: 245 seconds]
joev has joined #riscv
intrigus has quit [Quit: Client closed]
psydroid[m] has joined #riscv
Noisytoot has quit [Ping timeout: 272 seconds]
BootLayer has quit [Quit: Leaving]
Noisytoot has joined #riscv
Andre_Z has quit [Quit: Leaving.]
BootLayer has joined #riscv
pecastro has quit [Read error: Connection reset by peer]
pecastro has joined #riscv
billchenchina has joined #riscv
andyc has joined #riscv
MarvelousWololo has joined #riscv
<sorear>
maybe.
awita has quit [Ping timeout: 246 seconds]
<sorear>
jrtc27 or anyone else: have you run two or more of flute, rocket, cva6 with a rv64gc configuration through a synthesis and timing analysis flow for any target, and can you comment offhand about whether they are broadly comparable or wildly different?
jacklsw has joined #riscv
kilobyte_ch has quit [Ping timeout: 252 seconds]
<jrtc27[m]>
IIRC rocket achieves a higher frequency than flute, maybe 20-50%, on a VCU118
<jrtc27[m]>
Don’t recall area
<cousteau>
I should look up what's flute
<jrtc27[m]>
That may have improved since though, early Flute was quite aggressive at not introducing cycles of latency
<cousteau>
apparently it's a musical instrument... /me adds more search parameters
<cousteau>
ok so it's a 5-stage in-order RISC-V core made by Bluespec
<cousteau>
which are the ones who made... this... something related to formal verification maybe?
fabs has quit [Remote host closed the connection]
kilobyte_ch has joined #riscv
<sorear>
it's an HDL with much more composable semantics than chisel or verilog
<sorear>
it gets used for formal verification but I'm planning to use it because it's easier to reason about for *me*
<sorear>
i still can't keep valid and ready straight
<sorear>
i briefly intended to write a clone of the compiler for personal use, but then they open-sourced it, so
pecastro has quit [Ping timeout: 272 seconds]
<cousteau>
sorear: ah right
<cousteau>
is it related to this other one... I forgot the name... the one VexRiscV is based on?
<cousteau>
SpinalHDL
<sorear>
no, spinalhdl is a fork of chisel 2
<cousteau>
oh, is it?
<sorear>
parallels chisel 3 in some ways, diverges in others
<cousteau>
I knew it was scala-based, didn't know it was chisel-based
<courmisch>
should I suppose that the C910 has a single floating-point unit? vector floating point is slower than scalar
* cousteau
looks up C910
<sorear>
you are asking if you can run scalar and vector floating point simultaneously?
* cousteau
sees that the project immediately assumes the user runs csh and automatically starts disliking the project
<courmisch>
no. I'm just saying that a fairly simple vector float loop is slightly slower than the equivalent scalar float loop
<sorear>
it's a verilog hdl project, you're lucky if it doesn't start by telling you to install tcl
pecastro has joined #riscv
<cousteau>
meh, tcl is probably already there
<sorear>
does that depend on the iteration count?
<cousteau>
and (t)csh too, just not as the default shell
<cousteau>
courmisch: maybe the vector unit has many slow FPUs and the scalar unit has one fast FPU
<cousteau>
so running unparallelizable code on the vector FPU will be slower
<cousteau>
but running parallelizable code MIGHT be faster
<sorear>
if you post the loop / test code I can come up with some weirder theories
<courmisch>
the code is not unparallelisable, since it's been vectorised
* sorear
trying to figure out which synthesis toolchain is least likely to fight a nix installation
andyc has quit [Quit: Connection closed for inactivity]
<courmisch>
I tried swapping the order of the loads, but that makes it even worse
<dh`>
hey, there's nothing wrong with tcsh, just as long as you don't start writing csh scripts
___nick___ has joined #riscv
<sorear>
courmisch: hmm, what if instead of vlseg2e32 you use vle64 + vnsrl + vnsrl
<sorear>
this is probably a bad idea in general but if it helps on c910 that tells us something
<courmisch>
that won't work on RVV 0.7.1, or rather it will require additional vsetvls
<sorear>
reasonable implementations of V either have multi-issue or multi-cycle execution of vector instructions, making vsetvls and pointer updates/counter subtraction free
jacklsw has quit [Ping timeout: 240 seconds]
<dzaima[m]>
I'd imagine with a higher LMUL that overhead should also decrease, but here it's LMUL=1
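
A rough sketch of the point about per-iteration overhead, modeled in plain Python: a strip-mined RVV loop pays for one vsetvl, the pointer updates, and one counter subtraction per iteration, so a higher LMUL means fewer iterations for the same element count. The function name, the VLEN of 128 bits, and the element count are assumptions chosen for illustration, and `min(n, vlmax)` is a simplification of what vsetvl may return.

```python
def strip_mine_iterations(n: int, vlen_bits: int, sew_bits: int, lmul: int) -> int:
    """Count loop iterations (one vsetvl each) needed to process n elements."""
    # VLMAX is the number of elements one vector register group can hold.
    vlmax = lmul * vlen_bits // sew_bits
    iterations = 0
    while n > 0:
        vl = min(n, vlmax)  # simplified model of the vl that vsetvl grants
        n -= vl             # counter subtraction
        iterations += 1     # pointer updates would also happen once per pass
    return iterations


if __name__ == "__main__":
    # Assumed values: 1024 elements of 32 bits, VLEN = 128 bits.
    for lmul in (1, 2, 4, 8):
        print(lmul, strip_mine_iterations(1024, vlen_bits=128, sew_bits=32, lmul=lmul))
```

Under these assumed numbers the iteration count halves each time LMUL doubles, which is the sense in which the loop overhead shrinks; whether that overhead is visible at all depends on the implementation.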
<courmisch>
sorear: also 32 is probably not a legal imm for vnsrl.wi :/
<courmisch>
well I can use an additional temporary
jmdaemon has joined #riscv
<courmisch>
sorear: yeah, that is a little bit faster than scalar. Which means they botched the segmented loads
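
A minimal sketch of the workaround discussed above, modeled in plain Python rather than RVV assembly. It assumes little-endian memory and two-field segments of 32-bit elements. The helpers `vlseg2e32`, `vle64`, and `vnsrl` are hypothetical stand-ins that only mimic the per-element behaviour of the instructions of the same name; they are not real intrinsics.

```python
import struct

MASK32 = 0xFFFFFFFF


def vlseg2e32(mem: bytes, vl: int):
    """Model of a two-field segment load: de-interleave vl pairs of 32-bit words."""
    words = struct.unpack_from(f"<{2 * vl}I", mem)
    return list(words[0::2]), list(words[1::2])


def vle64(mem: bytes, vl: int):
    """Model of a unit-stride load of vl 64-bit elements (little-endian)."""
    return list(struct.unpack_from(f"<{vl}Q", mem))


def vnsrl(wide, shift: int):
    """Model of a narrowing logical shift right: 64-bit in, low 32 bits out."""
    return [(x >> shift) & MASK32 for x in wide]


def deinterleave_via_narrowing(mem: bytes, vl: int):
    """Same result as vlseg2e32, built from one wide load and two narrowing shifts."""
    wide = vle64(mem, vl)
    # A shift amount of 32 does not fit the 5-bit immediate of vnsrl.wi, so real
    # code would pass it in a scalar register instead (an extra temporary).
    return vnsrl(wide, 0), vnsrl(wide, 32)


if __name__ == "__main__":
    data = struct.pack("<8I", 10, 11, 20, 21, 30, 31, 40, 41)
    assert deinterleave_via_narrowing(data, 4) == vlseg2e32(data, 4)
```

On little-endian memory the first field of each pair lands in the low half of the 64-bit element, so shifting by 0 recovers field 0 and shifting by 32 recovers field 1.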
<sorear>
the segmented loads are a PITA
kilobyte_ch has quit [Ping timeout: 245 seconds]
<sorear>
I very much doubt c910 will be the last implementation to run unit-stride segmented loads and stores at much less than full cache speed. I haven't even figured out a workable general approach for panpipe, although vlseg2 is easy enough and might be worth special-casing
kilobyte_ch has joined #riscv
cwebber` has joined #riscv
cwebber has quit [Ping timeout: 245 seconds]
cwebber` has quit [Read error: Connection reset by peer]
cwebber` has joined #riscv
billchenchina has quit [Remote host closed the connection]
BootLayer has quit [Quit: Leaving]
Tenkawa has joined #riscv
billchenchina has joined #riscv
billchenchina has quit [Remote host closed the connection]
billchenchina has joined #riscv
Bluefoxicy has quit [Ping timeout: 252 seconds]
Bluefoxicy has joined #riscv
billchenchina has quit [Quit: Leaving]
___nick___ has quit [Ping timeout: 245 seconds]
aerkiaga has quit [Remote host closed the connection]
flip214 has quit [Read error: Connection reset by peer]
flip214 has joined #riscv
<courmisch>
so writing RVV was too easy compared to NEON and SVE2; they had to add useless loopholes?
<muurkha>
heh
<courmisch>
segmented loads are super useful. Or rather they'd be if they didn't destroy performance like this.
<courmisch>
I understand that non-unit-strided loads are intrinsically slow, but if segmented loads are also slow, it really sucks
<courmisch>
though TBH, I haven't checked how NEON/SVE segmented loads behave on the competiting side
<courmisch>
competing*
cousteau has quit [Remote host closed the connection]
cousteau has joined #riscv
cwebber` has quit [Read error: Connection reset by peer]
<sorear>
courmisch: look on the bright side, if you commit the code you wrote then the next time people design a vector engine they'll observe that implementing a fast path for vlseg2 is a big advantage on ffmpeg benchmarks
<sorear>
courmisch: for my personal use: what combinations of NF and ELEN are you expecting common use of?
crabbedhaloablut has quit []
<dzaima[m]>
I've seen what I believe was NEON's equivalent of segmented load or store being unreasonably slow on both Cortex-A53 and Apple's M1
<sorear>
compared to other implementations?
<dzaima[m]>
compared to a regular load & a shuffle
cwebber`` has joined #riscv
cwebber` has quit [Ping timeout: 245 seconds]
<dzaima[m]>
could've been some other factor though (unroll amount maybe), didn't look too much into it
jekb_ has quit [Ping timeout: 240 seconds]
jekb has joined #riscv
terminalpusher has quit [Remote host closed the connection]
* cousteau
should learn RVV
matoro has joined #riscv
matoro has quit [Client Quit]
matoro has joined #riscv
<sorear>
considering that rvv was ratified with zero compilers, zero(?) implementations (_still_ zero for OSHW implementations), and close to zero accelerated apps and libraries, it's a miracle it's as usable as it is
<muurkha>
very unlike the process for RVI and RVC
<sorear>
at least most of what I want is additive, not subtractive
<sorear>
vmslidedown when :/
<muurkha>
(though in those cases ARMv7 is still more usable, just more complicated)
<sorear>
more usable as an apps developer, sure
<muurkha>
right
<sorear>
I would not say armv7 is easier to implement than rv32gc