#riscv on 2023-07-15 — irc logs at libera.irclog.whitequark.org

2023-07-13 23:36 sorear[m] changed the topic of #riscv to: Matrix users: #riscv:libera.chat will be ending operation NET Jul 25; please test #riscv:catircservices.org as a replacement | RISC-V instruction set architecture | https://riscv.org | Logs: https://libera.irclog.whitequark.org/riscv

00:07 Leopold has quit [Ping timeout: 250 seconds]

00:11 Leopold has joined #riscv

00:20 Leopold has quit [Ping timeout: 240 seconds]

00:24 Leopold has joined #riscv

00:39 Pierce[m] has joined #riscv

00:42 unsigned has joined #riscv

00:51 Andre_Z has quit [Quit: Leaving.]

01:57 jn has quit [Ping timeout: 240 seconds]

01:58 jn has joined #riscv

01:58 jn has quit [Changing host]

02:32 terminalpusher has quit [Ping timeout: 246 seconds]

02:33 Tenkawa has joined #riscv

02:33 Tenkawa has quit [Client Quit]

02:42 crabbedhaloablut has quit [Ping timeout: 246 seconds]

03:00 joev has quit [Ping timeout: 245 seconds]

03:00 joev has joined #riscv

03:11 Forty-Bot has joined #riscv

03:29 joev has quit [Ping timeout: 245 seconds]

03:29 joev has joined #riscv

03:43 billchenchina has joined #riscv

04:04 MarvelousWololo has quit [Ping timeout: 258 seconds]

04:11 vagrantc has joined #riscv

04:41 <sorear> who wants to name a vector unit probably targeting Flute

05:02 vagrantc has quit [Ping timeout: 272 seconds]

05:11 BootLayer has joined #riscv

05:45 elastic_dog has quit [Ping timeout: 260 seconds]

05:57 elastic_dog has joined #riscv

05:59 MarvelousWololo has joined #riscv

06:24 <dh`> panflute?

06:30 <muurkha> +1

06:38 TMM_ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

06:38 TMM_ has joined #riscv

06:44 billchenchina has quit [Quit: Leaving]

06:51 crabbedhaloablut has joined #riscv

07:28 MarvelousWololo has quit [Read error: Connection reset by peer]

07:39 wingsorc has quit [Remote host closed the connection]

07:39 wingsorc has joined #riscv

07:46 pbsds has quit [Quit: The Lounge - https://thelounge.chat]

07:48 aburgess has quit [Ping timeout: 246 seconds]

08:52 [0x4A6F][m] has joined #riscv

09:00 BootLayer_ has joined #riscv

09:03 BootLayer has quit [Ping timeout: 245 seconds]

09:14 pbsds has joined #riscv

09:28 awita has joined #riscv

09:51 awita has quit [Remote host closed the connection]

09:59 pecastro has joined #riscv

10:07 <cousteau> dh`: oh my god

10:07 <cousteau> brilliant

10:09 <cousteau> (or maybe some synonym, like "Syrinx", to make it less obvious)

10:09 jmdaemon has quit [Ping timeout: 246 seconds]

10:10 <cousteau> like how "hwacha" is a thing that launches multiple "rocket"s in parallel

10:19 joev has quit [Ping timeout: 246 seconds]

10:20 joev has joined #riscv

10:27 aburgess has joined #riscv

10:32 joev has quit [Ping timeout: 246 seconds]

10:33 joev has joined #riscv

10:46 BootLayer_ has quit [Quit: Leaving]

10:46 aburgess has quit [Ping timeout: 245 seconds]

10:51 intrigus has joined #riscv

11:02 Raito_Bezarius has quit [Read error: Connection reset by peer]

11:03 <sevan> took 15 days to complete a Perl 5 smoke test on a D1 LicheeRV running Ubuntu 23.04 :)

11:04 <sevan> All test suite attempts failed because the time tests fail due to the clock drift issue

11:05 <sevan> due to the clock drift issue the dist/threads-shared/t/waithires.t tests also fail when testing against threaded perl.

11:10 Tenkawa has joined #riscv

11:19 joev has quit [Ping timeout: 245 seconds]

11:20 joev has joined #riscv

11:27 aerkiaga has joined #riscv

11:29 aburgess has joined #riscv

11:49 intrigus has quit [Ping timeout: 246 seconds]

11:50 BootLayer has joined #riscv

12:33 drmpeg has left #riscv [#riscv]

12:35 terminalpusher has joined #riscv

12:38 Andre_Z has joined #riscv

12:50 intrigus has joined #riscv

12:57 cousteau has quit [Remote host closed the connection]

12:58 cousteau has joined #riscv

13:01 Tenkawa has quit [Quit: Was I really ever here?]

13:05 awita has joined #riscv

13:10 drmpeg has joined #riscv

13:24 joev has quit [Ping timeout: 245 seconds]

13:25 joev has joined #riscv

13:27 intrigus has quit [Quit: Client closed]

13:31 psydroid[m] has joined #riscv

13:53 Noisytoot has quit [Ping timeout: 272 seconds]

14:02 BootLayer has quit [Quit: Leaving]

14:08 Noisytoot has joined #riscv

14:35 Andre_Z has quit [Quit: Leaving.]

14:37 BootLayer has joined #riscv

14:45 pecastro has quit [Read error: Connection reset by peer]

14:55 pecastro has joined #riscv

15:05 billchenchina has joined #riscv

15:10 andyc has joined #riscv

15:10 MarvelousWololo has joined #riscv

15:47 <sorear> maybe.

15:59 awita has quit [Ping timeout: 246 seconds]

16:01 <sorear> jrtc27 or anyone else: have you run two or more of flute, rocket, cva6 with a rv64gc configuration through a synthesis and timing analysis flow for any target, and can you comment offhand about whether they are broadly comparable or wildly different?

16:07 jacklsw has joined #riscv

16:09 kilobyte_ch has quit [Ping timeout: 252 seconds]

16:16 <jrtc27[m]> IIRC rocket achieves a higher frequency than flute, maybe 20-50%, on a VCU118

16:16 <jrtc27[m]> Don’t recall area

16:17 <cousteau> I should look up what Flute is

16:17 <jrtc27[m]> That may have improved since though, early Flute was quite aggressive at not introducing cycles of latency

16:17 <cousteau> apparently it's a musical instrument... /me adds more search parameters

16:18 <cousteau> ok so it's a 5-stage in-order RISC-V core made by Bluespec

16:18 <cousteau> which are the ones who made... this... something related to formal verification maybe?

16:20 fabs has quit [Remote host closed the connection]

16:24 kilobyte_ch has joined #riscv

16:27 <sorear> it's an HDL with much more composable semantics than chisel or verilog

16:27 <sorear> it gets used for formal verification but I'm planning to use it because it's easier to reason about for *me*

16:27 <sorear> i still can't keep valid and ready straight

16:29 <sorear> i briefly intended to write a clone of the compiler for personal use, but then they open-sourced it, so

16:36 pecastro has quit [Ping timeout: 272 seconds]

16:45 <cousteau> sorear: ah right

16:45 <cousteau> is it related to this other one... I forgot the name... the one VexRiscV is based on?

16:45 <cousteau> SpinalHDL

16:45 <sorear> no, spinalhdl is a fork of chisel 2

16:46 <cousteau> oh, is it?

16:46 <sorear> parallels chisel 3 in some ways, diverges in others

16:46 <cousteau> I knew it was scala-based, didn't know it was chisel-based

16:52 <courmisch> should I suppose that the C910 has a single floating-point unit? vector floating point is slower than scalar

16:54 * cousteau looks up C910

16:54 <sorear> you are asking if you can run scalar and vector floating point simultaneously?

16:54 * cousteau sees that the project immediately assumes the user runs csh and automatically starts disliking the project

16:55 <courmisch> no. I'm just saying that a fairly simple vector float loop is slightly slower than the equivalent scalar float loop

16:55 <sorear> it's a verilog hdl project, you're lucky if it doesn't start by telling you to install tcl

16:56 pecastro has joined #riscv

16:56 <cousteau> meh, tcl is probably already there

16:56 <sorear> does that depend on the iteration count?

16:56 <cousteau> and (t)csh too, just not as the default shell

16:56 <cousteau> courmisch: maybe the vector unit has many slow FPUs and the scalar unit has one fast FPU

16:57 <cousteau> so running unparallelizable code on the vector FPU will be slower

16:57 <cousteau> but running parallelizable code MIGHT be faster

16:57 <sorear> if you post the loop / test code I can come up with some weirder theories

16:58 <courmisch> the code is not unparallelisable, since it's been vectorised

16:58 * sorear trying to figure out which synthesis toolchain is least likely to fight a nix installation

17:20 andyc has quit [Quit: Connection closed for inactivity]

17:24 ___nick___ has joined #riscv

17:27 ___nick___ has quit [Client Quit]

17:27 <courmisch> sorear: https://git.ffmpeg.org/gitweb/ffmpeg.git/blob/HEAD:/libavcodec/riscv/aacpsdsp_rvv.S#l23

17:27 <courmisch> https://git.ffmpeg.org/gitweb/ffmpeg.git/blob/HEAD:/libavcodec/aacpsdsp_template.c#l29 (the C scalar version)

17:29 ___nick___ has joined #riscv

17:29 ___nick___ has quit [Client Quit]

17:30 <courmisch> I tried swapping the order of the loads, but that makes it even worse

17:31 <dh`> hey, there's nothing wrong with tcsh, just as long as you don't start writing csh scripts

17:31 ___nick___ has joined #riscv

17:33 <sorear> courmisch: hmm, what if instead of vlseg2e32 you use vle64 + vnsrl + vnsrl

17:33 <sorear> this is probably a bad idea in general but if it helps on c910 that tells us something
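A rough scalar model of the suggestion above may help readers who do not have RVV at hand. It is a sketch in Python, not RVV assembly: the function names are hypothetical and simply mirror the mnemonics, and it assumes little-endian memory, so the first 32-bit field of each pair lands in the low half of the 64-bit element. The data in the ffmpeg loop are floats, but only the bit patterns matter for the deinterleave.

```python
import struct

MASK32 = 0xFFFFFFFF

def vle64(buf, n):
    """Model of a unit-stride load of n little-endian 64-bit elements."""
    return list(struct.unpack_from("<%dQ" % n, buf))

def vnsrl(wide, shift):
    """Model of a narrowing logical right shift: 64-bit in, low 32 bits out."""
    return [(x >> shift) & MASK32 for x in wide]

def deinterleave_via_narrowing(buf, n):
    """One wide load, then two narrowing shifts (by 0 and by 32)."""
    wide = vle64(buf, n)
    return vnsrl(wide, 0), vnsrl(wide, 32)

def deinterleave_segmented(buf, n):
    """Reference: the result a two-field segmented 32-bit load would give."""
    words = struct.unpack_from("<%dI" % (2 * n), buf)
    return list(words[0::2]), list(words[1::2])

# Four interleaved (first, second) pairs of 32-bit words.
pairs = struct.pack("<8I", 10, 11, 20, 21, 30, 31, 40, 41)
first, second = deinterleave_via_narrowing(pairs, 4)
```

Both routes produce the same two vectors; the question in the discussion is only which one a given core executes faster.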

17:34 <courmisch> that won't work on RVV 0.7.1, or rather it will require additional vsetvls

17:35 <sorear> reasonable implementations of V either have multi-issue or multi-cycle execution of vector instructions, making vsetvls and pointer updates/counter subtraction free

17:37 jacklsw has quit [Ping timeout: 240 seconds]

17:37 <dzaima[m]> I'd imagine with a higher LMUL that overhead should also decrease, but here it's LMUL=1

17:39 <courmisch> sorear: also 32 is probably not a legal imm for vnsrl.wi :/

17:39 <courmisch> well I can use an additional temporary
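The immediate concern can be checked with a small sketch. It assumes, going by the RVV 1.0 mnemonics used in the discussion, that the `.wi` form carries a 5-bit unsigned immediate, so shift amounts above 31 have to come from a scalar register via the `.wx` form. The helper names are hypothetical, and RVV 0.7.1 spells the narrowing instructions differently.

```python
# Assumption: the immediate field of the .wi form is 5 bits, unsigned.
UIMM5_MAX = (1 << 5) - 1

def fits_uimm5(shift):
    """True if the shift amount can be encoded as a 5-bit unsigned immediate."""
    return 0 <= shift <= UIMM5_MAX

def pick_vnsrl_form(shift):
    """Choose the immediate form when possible, else the scalar-register form."""
    return "vnsrl.wi" if fits_uimm5(shift) else "vnsrl.wx"

low_half_form = pick_vnsrl_form(0)    # extracting the low 32 bits
high_half_form = pick_vnsrl_form(32)  # extracting the high 32 bits
```

Under that assumption, the shift by 32 needed for the high half is the one that requires the extra temporary register.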

17:39 jmdaemon has joined #riscv

17:59 <courmisch> sorear: yeah, that is a little bit faster than scalar. Which means they botched the segmented loads

18:08 <sorear> the segmented loads are a PITA

18:23 kilobyte_ch has quit [Ping timeout: 245 seconds]

18:34 <sorear> I very much doubt c910 will be the last implementation to run unit-stride segmented loads and stores at much less than full cache speed. I haven't even figured out a workable general approach for panpipe, although vlseg2 is easy enough and might be worth special-casing

18:37 kilobyte_ch has joined #riscv

18:42 cwebber` has joined #riscv

18:43 cwebber has quit [Ping timeout: 245 seconds]

18:45 cwebber` has quit [Read error: Connection reset by peer]

18:48 cwebber` has joined #riscv

19:00 billchenchina has quit [Remote host closed the connection]

19:02 BootLayer has quit [Quit: Leaving]

19:18 Tenkawa has joined #riscv

19:26 billchenchina has joined #riscv

19:27 billchenchina has quit [Remote host closed the connection]

19:27 billchenchina has joined #riscv

19:29 Bluefoxicy has quit [Ping timeout: 252 seconds]

19:32 Bluefoxicy has joined #riscv

20:01 billchenchina has quit [Quit: Leaving]

20:04 ___nick___ has quit [Ping timeout: 245 seconds]

20:05 aerkiaga has quit [Remote host closed the connection]

20:08 flip214 has quit [Read error: Connection reset by peer]

20:08 flip214 has joined #riscv

20:52 <courmisch> so writing RVV was too easy compared to NEON and SVE2; they had to add useless loopholes?

20:53 <muurkha> heh

20:57 <courmisch> segmented loads are super useful. Or rather they'd be if they didn't destroy perfs like this.

20:58 <courmisch> I understand that non-unit-strided loads are intrinsically slow, but if segmented loads are also slow, it really sucks
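The distinction being drawn here can be illustrated with the byte addresses each kind of load touches. This is a minimal sketch with hypothetical helper names, assuming 32-bit elements and two fields per segment as in the vlseg2e32 case: a unit-stride segment load reads one contiguous block and only scatters the fields across registers, whereas a strided load reads addresses that may be far apart.

```python
def segment_load_addresses(base, n, nf=2, eew_bytes=4):
    """Byte addresses read by a unit-stride segment load, grouped per field.

    Field f of segment i sits at base + (i * nf + f) * eew_bytes.
    """
    return [[base + (i * nf + f) * eew_bytes for i in range(n)]
            for f in range(nf)]

def strided_load_addresses(base, n, stride_bytes):
    """Byte addresses read by a strided load of n elements."""
    return [base + i * stride_bytes for i in range(n)]

def is_contiguous(addrs, eew_bytes=4):
    """True if the addresses form one gap-free run of eew_bytes-sized elements."""
    ordered = sorted(addrs)
    return all(b - a == eew_bytes for a, b in zip(ordered, ordered[1:]))

fields = segment_load_addresses(0, 4)
all_segment_addrs = fields[0] + fields[1]
strided_addrs = strided_load_addresses(0, 4, 64)
```

Since the segment load covers a single gap-free range, the memory traffic is the same as for a plain unit-stride load; the extra cost, where there is one, comes from routing the fields into separate registers.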

20:58 <courmisch> though TBH, I haven't checked how NEON/SVE segmented loads behave on the competing side

21:05 cousteau has quit [Remote host closed the connection]

21:06 cousteau has joined #riscv

21:13 cwebber` has quit [Read error: Connection reset by peer]

21:16 cwebber` has joined #riscv

21:27 matoro has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

21:40 Tenkawa has quit [Quit: Was I really ever here?]

21:56 <sorear> courmisch: look on the bright side, if you commit the code you wrote then the next time people design a vector engine they'll observe that implementing a fast path for vlseg2 is a big advantage on ffmpeg benchmarks

21:57 <sorear> courmisch: for my personal use: what combinations of NF and ELEN are you expecting common use of?

22:03 crabbedhaloablut has quit []

22:23 <dzaima[m]> I've seen what I believe was NEON's equivalent of segmented load or store being unreasonably slow on both Cortex-A53 and Apple's M1

22:24 <sorear> compared to other implementations?

22:24 <dzaima[m]> compared to a regular load & a shuffle

22:24 cwebber`` has joined #riscv

22:25 cwebber` has quit [Ping timeout: 245 seconds]

22:26 <dzaima[m]> could've been some other factor though (unroll amount maybe), didn't look too much into it

22:29 jekb_ has quit [Ping timeout: 240 seconds]

22:34 jekb has joined #riscv

22:48 terminalpusher has quit [Remote host closed the connection]

22:59 * cousteau should learn RVV

23:01 matoro has joined #riscv

23:01 matoro has quit [Client Quit]

23:03 matoro has joined #riscv

23:04 <sorear> considering that rvv was ratified with zero compilers, zero(?) implementations (_still_ zero for OSHW implementations), and close to zero accelerated apps and libraries, it's a miracle it's as usable as it is

23:07 <muurkha> very unlike the process for RVI and RVC

23:08 <sorear> at least most of what I want is additive, not subtractive

23:08 <sorear> vmslidedown when :/

23:08 <muurkha> (though in those cases ARMv7 is still more usable, just more complicated)

23:09 <sorear> more usable as an apps developer, sure

23:09 <muurkha> right

23:09 <sorear> I would not say armv7 is easier to implement than rv32gc

23:09 <muurkha> certainly not!

23:09 <muurkha> nor smaller

23:10 <muurkha> on silicon

23:12 matoro has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

23:13 matoro has joined #riscv

23:16 matoro has quit [Client Quit]

23:16 matoro has joined #riscv

23:16 matoro has quit [Client Quit]

23:18 matoro has joined #riscv

23:20 <dzaima[m]> re: NEON segment load/store - appears I improperly tested it and only a scalar version ran.. so disregard that

23:29 pecastro has quit [Ping timeout: 272 seconds]

23:36 cousteau_ has joined #riscv

23:38 cousteau has quit [Ping timeout: 245 seconds]

23:40 TMM_ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

23:41 TMM_ has joined #riscv

23:51 Tenkawa has joined #riscv