sorear[m] changed the topic of #riscv to: Matrix users: #riscv:libera.chat will be ending operation NET Jul 25; please test #riscv:catircservices.org as a replacement | RISC-V instruction set architecture | https://riscv.org | Logs: https://libera.irclog.whitequark.org/riscv
Leopold has quit [Ping timeout: 250 seconds]
Leopold has joined #riscv
Leopold has quit [Ping timeout: 240 seconds]
Leopold has joined #riscv
Pierce[m] has joined #riscv
unsigned has joined #riscv
Andre_Z has quit [Quit: Leaving.]
jn has quit [Ping timeout: 240 seconds]
jn has joined #riscv
jn has joined #riscv
jn has quit [Changing host]
terminalpusher has quit [Ping timeout: 246 seconds]
Tenkawa has joined #riscv
Tenkawa has quit [Client Quit]
crabbedhaloablut has quit [Ping timeout: 246 seconds]
joev has quit [Ping timeout: 245 seconds]
joev has joined #riscv
Forty-Bot has joined #riscv
joev has quit [Ping timeout: 245 seconds]
joev has joined #riscv
billchenchina has joined #riscv
MarvelousWololo has quit [Ping timeout: 258 seconds]
vagrantc has joined #riscv
<sorear> who wants to name a vector unit probably targeting Flute
vagrantc has quit [Ping timeout: 272 seconds]
BootLayer has joined #riscv
elastic_dog has quit [Ping timeout: 260 seconds]
elastic_dog has joined #riscv
MarvelousWololo has joined #riscv
<dh`> panflute?
<muurkha> +1
TMM_ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM_ has joined #riscv
billchenchina has quit [Quit: Leaving]
crabbedhaloablut has joined #riscv
MarvelousWololo has quit [Read error: Connection reset by peer]
wingsorc has quit [Remote host closed the connection]
wingsorc has joined #riscv
pbsds has quit [Quit: The Lounge - https://thelounge.chat]
aburgess has quit [Ping timeout: 246 seconds]
[0x4A6F][m] has joined #riscv
BootLayer_ has joined #riscv
BootLayer has quit [Ping timeout: 245 seconds]
pbsds has joined #riscv
awita has joined #riscv
awita has quit [Remote host closed the connection]
pecastro has joined #riscv
<cousteau> dh`: oh my god
<cousteau> brilliant
<cousteau> (or maybe some synonym, like "Syrinx", to make it less obvious)
jmdaemon has quit [Ping timeout: 246 seconds]
<cousteau> like how "hwacha" is a thing that launches multiple "rocket"s in parallel
joev has quit [Ping timeout: 246 seconds]
joev has joined #riscv
aburgess has joined #riscv
joev has quit [Ping timeout: 246 seconds]
joev has joined #riscv
BootLayer_ has quit [Quit: Leaving]
aburgess has quit [Ping timeout: 245 seconds]
intrigus has joined #riscv
Raito_Bezarius has quit [Read error: Connection reset by peer]
<sevan> took 15 days to complete a Perl 5 smoke test on a D1 Lichee RV running Ubuntu 23.04 :)
<sevan> All testsuite attempts failed because the time tests fail due to the clock drift issue
<sevan> due to the clock drift issue the dist/threads-shared/t/waithires.t tests also fail when testing against threaded perl.
Tenkawa has joined #riscv
joev has quit [Ping timeout: 245 seconds]
joev has joined #riscv
aerkiaga has joined #riscv
aburgess has joined #riscv
intrigus has quit [Ping timeout: 246 seconds]
BootLayer has joined #riscv
drmpeg has left #riscv [#riscv]
terminalpusher has joined #riscv
Andre_Z has joined #riscv
intrigus has joined #riscv
cousteau has quit [Remote host closed the connection]
cousteau has joined #riscv
Tenkawa has quit [Quit: Was I really ever here?]
awita has joined #riscv
drmpeg has joined #riscv
joev has quit [Ping timeout: 245 seconds]
joev has joined #riscv
intrigus has quit [Quit: Client closed]
psydroid[m] has joined #riscv
Noisytoot has quit [Ping timeout: 272 seconds]
BootLayer has quit [Quit: Leaving]
Noisytoot has joined #riscv
Andre_Z has quit [Quit: Leaving.]
BootLayer has joined #riscv
pecastro has quit [Read error: Connection reset by peer]
pecastro has joined #riscv
billchenchina has joined #riscv
andyc has joined #riscv
MarvelousWololo has joined #riscv
<sorear> maybe.
awita has quit [Ping timeout: 246 seconds]
<sorear> jrtc27 or anyone else: have you run two or more of flute, rocket, cva6 with a rv64gc configuration through a synthesis and timing analysis flow for any target, and can you comment offhand about whether they are broadly comparable or wildly different?
jacklsw has joined #riscv
kilobyte_ch has quit [Ping timeout: 252 seconds]
<jrtc27[m]> IIRC rocket achieves a higher frequency than flute, maybe 20-50%, on a VCU118
<jrtc27[m]> Don’t recall area
<cousteau> I should look up what's flute
<jrtc27[m]> That may have improved since though, early Flute was quite aggressive at not introducing cycles of latency
<cousteau> apparently it's a musical instrument... /me adds more search parameters
<cousteau> ok so it's a 5-stage in-order RISC-V core made by Bluespec
<cousteau> which are the ones who made... this... something related to formal verification maybe?
fabs has quit [Remote host closed the connection]
kilobyte_ch has joined #riscv
<sorear> it's a HDL with much more composable semantics than chisel or verilog
<sorear> it gets used for formal verification but I'm planning to use it because it's easier to reason about for *me*
<sorear> i still can't keep valid and ready straight
<sorear> i briefly intended to write a clone of the compiler for personal use, but then they open-sourced it, so
pecastro has quit [Ping timeout: 272 seconds]
<cousteau> sorear: ah right
<cousteau> is it related to this other one... I forgot the name... the one VexRiscV is based on?
<cousteau> SpinalHDL
<sorear> no, spinalhdl is a fork of chisel 2
<cousteau> oh, is it?
<sorear> parallels chisel 3 in some ways, diverges in others
<cousteau> I knew it was scala-based, didn't know it was chisel-based
<courmisch> should I suppose that the C910 has a single floating-point unit? vector floating point is slower than scalar
* cousteau looks up C910
<sorear> you are asking if you can run scalar and vector floating point simultaneously?
* cousteau sees that the project immediately assumes the user runs csh and automatically starts disliking the project
<courmisch> no. I'm just saying that a fairly simple vector float loop is slightly slower than the equivalent scalar float loop
<sorear> it's a verilog hdl project, you're lucky if it doesn't start by telling you to install tcl
pecastro has joined #riscv
<cousteau> meh, tcl is probably already there
<sorear> does that depend on the iteration count?
<cousteau> and (t)csh too, just not as the default shell
<cousteau> courmisch: maybe the vector unit has many slow FPUs and the scalar unit has one fast FPU
<cousteau> so running unparallelizable code on the vector FPU will be slower
<cousteau> but running parallelizable code MIGHT be faster
<sorear> if you post the loop / test code I can come up with some weirder theories
<courmisch> the code is not unparallelisable, since it's been vectored
* sorear trying to figure out which synthesis toolchain is least likely to fight a nix installation
andyc has quit [Quit: Connection closed for inactivity]
___nick___ has joined #riscv
___nick___ has quit [Client Quit]
___nick___ has joined #riscv
___nick___ has quit [Client Quit]
<courmisch> I tried swapping the order of the loads, but that makes it even worse
<dh`> hey, there's nothing wrong with tcsh, just as long as you don't start writing csh scripts
___nick___ has joined #riscv
<sorear> courmisch: hmm, what if instead of vlseg2e32 you use vle64 + vnsrl + vnsrl
<sorear> this is probably a bad idea in general but if it helps on c910 that tells us something
<courmisch> that won't work on RVV 0.7.1, or rather it will require additional vsetvls
<sorear> reasonable implementations of V either have multi-issue or multi-cycle execution of vector instructions, making vsetvls and pointer updates/counter subtraction free
jacklsw has quit [Ping timeout: 240 seconds]
<dzaima[m]> I'd imagine with a higher LMUL that overhead should also decrease, but here it's LMUL=1
<courmisch> sorear: also 32 is probably not a legal imm for vnsrl.wi :/
<courmisch> well I can use an additional temporary
jmdaemon has joined #riscv
<courmisch> sorear: yeah, that is a little bit faster than scalar. Which means they botched the segmented loads
<sorear> the segmented loads are a PITA
kilobyte_ch has quit [Ping timeout: 245 seconds]
<sorear> I very much doubt c910 will be the last implementation to run unit-stride segmented loads and stores at much less than full cache speed. I haven't even figured out a workable general approach for panpipe, although vlseg2 is easy enough and might be worth special-casing
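A minimal sketch of why the vle64 + vnsrl + vnsrl alternative discussed above yields the same data as vlseg2e32, assuming little-endian memory and modelling the vector instructions as plain Python functions. The function names mirror the instruction mnemonics but are illustrative models only, not a real library or intrinsics API, and they say nothing about timing on the C910 or any other core. A shift amount of 32 does not fit the 5-bit immediate of vnsrl.wi, which is why a temporary register (vnsrl.wx) came up in the discussion.

```python
import struct

MASK32 = 0xFFFFFFFF

def vlseg2e32(mem: bytes, vl: int):
    """Model of a unit-stride, 2-field segment load of 32-bit elements.

    Field 0 of every segment goes to the first result, field 1 to the second.
    """
    words = struct.unpack_from(f"<{2 * vl}I", mem)
    return list(words[0::2]), list(words[1::2])

def vle64(mem: bytes, vl: int):
    """Model of a unit-stride load of 64-bit elements (little-endian)."""
    return list(struct.unpack_from(f"<{vl}Q", mem))

def vnsrl(wide, shift: int):
    """Model of a narrowing logical right shift: 64-bit sources, 32-bit results.

    Only the low 6 bits of the shift amount are used for 64-bit sources.
    """
    shift &= 63
    return [(x >> shift) & MASK32 for x in wide]

def deinterleave_via_shift(mem: bytes, vl: int):
    """Load each 2 x 32-bit segment as one 64-bit element, then split it."""
    wide = vle64(mem, vl)
    field0 = vnsrl(wide, 0)   # low half of each pair: first word in memory
    field1 = vnsrl(wide, 32)  # high half: second word (needs a register operand)
    return field0, field1

# Four interleaved (a, b) pairs, as in an array of two-field structs.
mem = struct.pack("<8I", 10, 20, 11, 21, 12, 22, 13, 23)
assert vlseg2e32(mem, 4) == deinterleave_via_shift(mem, 4)
```

Both paths return the a-fields in one list and the b-fields in the other; the shift-based path only works for two fields of equal width, so it is not a general replacement for segment loads.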
kilobyte_ch has joined #riscv
cwebber` has joined #riscv
cwebber has quit [Ping timeout: 245 seconds]
cwebber` has quit [Read error: Connection reset by peer]
cwebber` has joined #riscv
billchenchina has quit [Remote host closed the connection]
BootLayer has quit [Quit: Leaving]
Tenkawa has joined #riscv
billchenchina has joined #riscv
billchenchina has quit [Remote host closed the connection]
billchenchina has joined #riscv
Bluefoxicy has quit [Ping timeout: 252 seconds]
Bluefoxicy has joined #riscv
billchenchina has quit [Quit: Leaving]
___nick___ has quit [Ping timeout: 245 seconds]
aerkiaga has quit [Remote host closed the connection]
flip214 has quit [Read error: Connection reset by peer]
flip214 has joined #riscv
<courmisch> so writing RVV was too easy compared to NEON and SVE2; they had to add useless loopholes?
<muurkha> heh
<courmisch> segmented loads are super useful. Or rather they'd be if they didn't destroy perfs like this.
<courmisch> I understand that non-unit-strided loads are intrinsically slow, but if segmented loads are also slow, it really sucks
<courmisch> though TBH, I haven't checked how NEON/SVE segmented loads behave on the competing side
cousteau has quit [Remote host closed the connection]
cousteau has joined #riscv
cwebber` has quit [Read error: Connection reset by peer]
cwebber` has joined #riscv
matoro has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
Tenkawa has quit [Quit: Was I really ever here?]
<sorear> courmisch: look on the bright side, if you commit the code you wrote then the next time people design a vector engine they'll observe that implementing a fast path for vlseg2 is a big advantage on ffmpeg benchmarks
<sorear> courmisch: for my personal use: what combinations of NF and ELEN are you expecting common use of?
crabbedhaloablut has quit []
<dzaima[m]> I've seen what I believe was NEON's equivalent of a segmented load or store being unreasonably slow on both Cortex-A53 and Apple's M1
<sorear> compared to other implementations?
<dzaima[m]> compared to a regular load & a shuffle
cwebber`` has joined #riscv
cwebber` has quit [Ping timeout: 245 seconds]
<dzaima[m]> could've been some other factor though (unroll amount maybe), didn't look too much into it
jekb_ has quit [Ping timeout: 240 seconds]
jekb has joined #riscv
terminalpusher has quit [Remote host closed the connection]
* cousteau should learn RVV
matoro has joined #riscv
matoro has quit [Client Quit]
matoro has joined #riscv
<sorear> considering that rvv was ratified with zero compilers, zero(?) implementations (_still_ zero for OSHW implementations), and close to zero accelerated apps and libraries, it's a miracle it's as usable as it is
<muurkha> very unlike the process for RVI and RVC
<sorear> at least most of what I want is additive, not subtractive
<sorear> vmslidedown when :/
<muurkha> (though in those cases ARMv7 is still more usable, just more complicated)
<sorear> more usable as an apps developer, sure
<muurkha> right
<sorear> I would not say armv7 is easier to implement than rv32gc
<muurkha> certainly not!
<muurkha> nor smaller
<muurkha> on silicon
matoro has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
matoro has joined #riscv
matoro has quit [Client Quit]
matoro has joined #riscv
matoro has quit [Client Quit]
matoro has joined #riscv
<dzaima[m]> re: NEON segment load/store - appears I improperly tested it and only a scalar version ran.. so disregard that
pecastro has quit [Ping timeout: 272 seconds]
cousteau_ has joined #riscv
cousteau has quit [Ping timeout: 245 seconds]
TMM_ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
TMM_ has joined #riscv
Tenkawa has joined #riscv