#riscv on 2023-06-15 — irc logs at libera.irclog.whitequark.org

2021-08-01 01:31 sorear changed the topic of #riscv to: RISC-V instruction set architecture | https://riscv.org | Logs: https://libera.irclog.whitequark.org/riscv

00:05 sevan has quit [Ping timeout: 265 seconds]

00:18 pabs3 has quit [Quit: Don't rest until all the world is paved in moss and greenery.]

00:28 pabs3 has joined #riscv

00:39 pabs3 has quit [Quit: Don't rest until all the world is paved in moss and greenery.]

00:42 pabs3 has joined #riscv

00:42 pabs3 has quit [Remote host closed the connection]

00:48 ashtin has joined #riscv

00:51 pabs3 has joined #riscv

00:59 terminalpusher has quit [Remote host closed the connection]

01:19 handsome_feng has joined #riscv

01:33 Tenkawa has quit [Quit: Was I really ever here?]

01:46 jacklsw has joined #riscv

01:55 Pokey has quit [Server closed connection]

01:55 Pokey has joined #riscv

02:00 zBeeble42 has joined #riscv

02:02 zBeeble has quit [Remote host closed the connection]

02:17 knolle has quit [Ping timeout: 240 seconds]

02:32 <dh`> yes, that premise (except I think they were betting overall performance as well)

03:06 justache has quit [Server closed connection]

03:06 justache has joined #riscv

03:08 <muurkha> dh`: yeah. along those lines, what do you think of $ du -sh /lib/x86_64-linux-gnu/ /usr/lib/python3.8

03:09 <muurkha> 3,5G    /lib/x86_64-linux-gnu/

03:09 <muurkha> uhgh

03:09 <muurkha> 43M     /usr/lib/python3.8

03:09 <muurkha> sorry

03:09 <muurkha> https://blog.cr.yp.to/20190430-vectorize.html is what I meant to paste

03:13 <muurkha> it seems to endorse the RVV approach

03:14 <muurkha> unless I'm missing a subtle difference between ARM SVE and the RVV design

03:19 heat_ has joined #riscv

03:20 heat has quit [Ping timeout: 240 seconds]

03:25 jacklsw has quit [Quit: Back to the real world]

03:29 heat_ has quit [Remote host closed the connection]

03:30 heat_ has joined #riscv

03:32 jacklsw has joined #riscv

03:34 <muurkha> interestingly there's a slide deck on riscv.org about their PULPino RI5CY: https://riscv.org/wp-content/uploads/2016/01/Wed1315-PULP-riscv3_noanim.pdf

03:34 prabhakarlad has quit [Quit: Client closed]

03:39 raym has joined #riscv

03:48 <muurkha> it shows examples of the ungodly hacks in a RISC-V context

04:06 <muurkha> aha, and the RI5CY user manual https://pulp-platform.org/docs/ri5cy_user_manual.pdf explains how the DSP-style zero-overhead hardware looping support interacts with context switches: the loop state is exposed as CSRs. the OR10N paper didn't seem to explain that

04:14 <jrtc27> and not gated behind a feature bit, so your OS has to be aware of it otherwise bad things happen if something uses it

04:14 <jrtc27> (boom if multiple use it, plus side-channels galore)

04:17 TMM_ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

04:17 TMM_ has joined #riscv

04:17 <muurkha> yes, I think you need to use an OS specifically patched for this chip

04:18 <muurkha> also it looks like Atom finally went out-of-order last year with Tremont: https://chipsandcheese.com/2022/01/02/intels-tremont-atom-changes-course/

04:19 <muurkha> I don't think these chips support memory protection, so if one task wanted to exfiltrate information to another I think it could just write into its memory

04:20 <muurkha> so I don't think they're worried about side-channels

04:22 <muurkha> it's interesting to think about what kind of design would allow you to add per-task state like these hardware looping registers in a way that you wouldn't need to patch the OS task switching code every time you added a new one

04:26 <muurkha> oh actually Goldmont Plus, the previous Atom, was also OoO: https://en.wikichip.org/wiki/intel/microarchitectures/goldmont_plus

04:28 <muurkha> Goldmont (without Plus) had register renaming and speculative execution but maybe not OoOE?

04:36 heat_ has quit [Ping timeout: 248 seconds]

04:45 <muurkha> so you would have won your bet anyway!

05:00 BootLayer has joined #riscv

05:07 JanC has quit [Remote host closed the connection]

05:07 JanC has joined #riscv

05:07 Larhzu has quit [Server closed connection]

05:07 Larhzu has joined #riscv

05:15 JanC has quit [Remote host closed the connection]

05:15 JanC has joined #riscv

05:18 <muurkha> current top item on Character Assassination News is about ESP32-C3: https://news.ycombinator.com/item?id=36336112

05:45 anup_patel has joined #riscv

05:49 anup_patel has quit [Remote host closed the connection]

05:51 jacklsw has quit [Quit: Back to the real world]

05:53 <dh`> that link (the dan bernstein one) sounds kind of nutty as one might perhaps expect ("take this freedom away"?)

05:53 <dh`> but, I think he's unaware of any number of reasons why huge vectors might in fact slow the whole system down

05:54 <dh`> meanwhile I've been more-or-less intentionally not following the RVV stuff so I don't have an opinion

06:01 elastic_dog has quit [Ping timeout: 240 seconds]

06:05 Leopold has joined #riscv

06:07 Leopold has quit [Remote host closed the connection]

06:12 jacklsw has joined #riscv

06:13 elastic_dog has joined #riscv

06:18 <muurkha> dh`: hmm, are you aware of such reasons, or are you just speculating that they might exist?

06:19 <muurkha> this is suboptimal output from objdump -d when presented with a not-yet-relocated la:

06:19 <muurkha> aa: 00000517    auipc a0,0x0

06:19 <muurkha> ae: 00050513    mv a0,a0

06:24 <jrtc27> what's your point?

06:24 <jrtc27> that's how RISC ISAs present

06:25 <jrtc27> and x86 would instead just say mov 0(%rip), %rXX

06:26 <jrtc27> uh lea even

06:26 <muurkha> maybe objdump should say addi a0, a0, 0 in this case

06:26 <jrtc27> there's an LLVM patch to do that

06:26 <jrtc27> I'm not sure I entirely agree with it

06:27 <jrtc27> unless you can go all the way and make it say addi a0, a0, %pcrel_lo(1b)

06:27 Leopold has joined #riscv

06:27 <jrtc27> but regardless, objdump -dr and move on

06:27 <muurkha> that would be a lot better but objdump doesn't normally go around looking at relocations

06:27 <muurkha> does it?

06:27 <jrtc27> -r does

06:27 <jrtc27> no reason -d couldn't

06:27 <jrtc27> like how -d already looks up symbols

06:28 <muurkha> hmm, good point

06:28 <muurkha> I'd never used -r, thanks

06:29 <muurkha> it just seems a little unnecessarily obscurantist to use the pseudo-instruction in the disassembly output; probably I'm not the only noob who forgets that mv is actually addi...0

06:31 <muurkha> which you still have to remember if you're reading this:

06:31 <muurkha> ae: 00050513    mv a0,a0

06:31 <muurkha> ae: R_RISCV_RELAX    *ABS*

06:31 <muurkha> ae: R_RISCV_PCREL_LO12_I    .L0

06:33 <jrtc27> well it'd have to look at the relocation to decide not to emit mv

06:34 <muurkha> oh, I was thinking it could just look at the fact that the source and destination register are the same

06:35 <muurkha> although I guess they wouldn't have to be

06:35 Armand has joined #riscv

06:36 <muurkha> in theory you could auipc or lui into one register and then fix up the low 12 bits in some other register, and maybe that would even be a useful thing to do if you had two or more constants to load in the same 12-bit area, but do any compilers actually do that?

06:39 <muurkha> oh well. pretty trivial
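The point about mv being addi with a zero immediate can be checked against the two instruction words pasted above. The following is a minimal sketch, not objdump's logic: the `decode` helper is hypothetical and only handles AUIPC and ADDI, using the standard U-type and I-type field layouts.

```python
def decode(word):
    """Decode a 32-bit RISC-V instruction word (hypothetical helper; AUIPC and ADDI only)."""
    opcode = word & 0x7F
    rd = (word >> 7) & 0x1F
    if opcode == 0x17:  # AUIPC, U-type: imm[31:12] | rd | opcode
        return ("auipc", rd, word >> 12)
    if opcode == 0x13 and ((word >> 12) & 0x7) == 0:  # OP-IMM with funct3 = 000 is ADDI
        rs1 = (word >> 15) & 0x1F
        imm = word >> 20
        if imm >= 0x800:  # sign-extend the 12-bit immediate
            imm -= 0x1000
        return ("addi", rd, rs1, imm)
    return ("unknown",)

print(decode(0x00000517))  # ('auipc', 10, 0)      x10 is a0
print(decode(0x00050513))  # ('addi', 10, 10, 0)   shown by objdump as "mv a0,a0"
```

The zero immediate in the second word is only a placeholder until R_RISCV_PCREL_LO12_I is applied, which is why the mv alias is misleading in an unrelocated object.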

06:49 ldevulder has joined #riscv

07:00 Leopold has quit [Quit: No Ping reply in 180 seconds.]

07:01 Leopold has joined #riscv

07:14 JanC has quit [Read error: Connection reset by peer]

07:14 JanC has joined #riscv

07:25 JanC has quit [Ping timeout: 240 seconds]

07:26 JanC_ has joined #riscv

07:26 JanC_ is now known as JanC

07:26 Leopold has quit [Remote host closed the connection]

07:31 Leopold has joined #riscv

07:34 rsalveti has quit [Quit: Connection closed for inactivity]

07:37 pecastro has joined #riscv

07:39 ncopa has joined #riscv

07:39 ncopa has quit [Changing host]

07:54 danilogondolfo has joined #riscv

08:19 Wickram has joined #riscv

08:29 aburgess has quit [Ping timeout: 265 seconds]

08:29 JanC has quit [Ping timeout: 248 seconds]

08:34 JanC has joined #riscv

08:37 zjason` has joined #riscv

08:37 zjason has quit [Ping timeout: 240 seconds]

08:49 zjason` is now known as zjason

09:03 prabhakarlad has joined #riscv

09:04 jacklsw has quit [Quit: Back to the real world]

09:05 ema has quit [Quit: leaving]

09:08 <patersonc[m]> Sigh: https://lore.kernel.org/lkml/20230614104759.228372-1-prabhakar.mahadev-lad.rj@bp.renesas.com/T/#m2e414ba4dfda70ee311eab98c555e34acf6ed789

09:10 <patersonc[m]> Isn't the world full of "non-standard" devices?

09:12 <bjdooks> are those the special ones where they managed to make the close-coupled memory match on VA instead of PA?

09:13 frkzoid has quit [Read error: Connection reset by peer]

09:13 freakazoid332 has joined #riscv

09:14 frkzoid has joined #riscv

09:16 frkazoid333 has quit [Ping timeout: 240 seconds]

09:24 unlord has quit [Ping timeout: 240 seconds]

09:25 ema has joined #riscv

09:27 unlord has joined #riscv

09:28 MaxGanzII has joined #riscv

09:33 Leopold has quit [Quit: No Ping reply in 180 seconds.]

09:40 Leopold has joined #riscv

09:47 Wickram has quit [Ping timeout: 240 seconds]

10:05 heat_ has joined #riscv

10:23 DynamiteDan has quit [Excess Flood]

10:23 DynamiteDan has joined #riscv

10:24 DynamiteDan has quit [Excess Flood]

10:24 DynamiteDan has joined #riscv

10:29 Leopold has quit [Ping timeout: 240 seconds]

10:34 Leopold has joined #riscv

10:49 wingsorc has quit [Ping timeout: 240 seconds]

11:00 <prabhakarlad> bjdooks: "close-coupled memory match on VA instead of PA" this has nothing to do with CMO.

11:04 <prabhakarlad> patch series pointed to by patersonc[m] is to support non-coherent nonstandard systems with the use of func pointers. I vaguely remember you were working on such a platform and you wanted to have function pointers too to handle CMO?

11:17 Leopold has quit [Quit: No Ping reply in 180 seconds.]

11:21 Tenkawa has joined #riscv

11:23 Leopold has joined #riscv

11:24 uzix is now known as mahk

11:27 FL4SHK has quit [Ping timeout: 240 seconds]

11:27 FL4SHK has joined #riscv

11:31 Wickram has joined #riscv

11:43 aburgess has joined #riscv

11:51 GreaseMonkey has quit [Quit: HYDRA IRC LOL]

11:54 DynamiteDan has quit []

11:54 DynamiteDan has joined #riscv

11:55 DynamiteDan has quit [Excess Flood]

11:55 DynamiteDan has joined #riscv

11:56 prabhakarlad has quit [Quit: Client closed]

11:59 knolle has joined #riscv

12:37 Wickram has quit [Quit: WeeChat 3.8]

12:37 Wickram has joined #riscv

13:20 shoragan has quit [Server closed connection]

13:21 shoragan has joined #riscv

13:26 Leopold_ has joined #riscv

13:27 meta-coder has joined #riscv

13:29 Leopold has quit [Ping timeout: 240 seconds]

13:32 BootLayer has quit [Quit: Leaving]

13:37 sauce has quit [Server closed connection]

13:37 sauce has joined #riscv

14:07 MaxGanzII has quit [Remote host closed the connection]

14:08 MaxGanzII has joined #riscv

14:11 prabhakarlad has joined #riscv

14:24 greaser|q has joined #riscv

14:24 greaser|q has quit [Client Quit]

14:30 Wickram has quit [Quit: WeeChat 3.8]

14:35 greaser|q has joined #riscv

14:55 awita has joined #riscv

15:04 prabhakarlad has quit [Quit: Client closed]

15:18 BootLayer has joined #riscv

15:29 Armand has quit [Ping timeout: 240 seconds]

15:38 handsome_feng has quit [Quit: Connection closed for inactivity]

15:40 <Larhzu> Hello! There is a plan to add a RISC-V filter to the .xz compression format. A filter converts pc-relative addresses in immediates to absolute addresses. This makes the data more repetitive.

15:40 <Larhzu> (1) Seems that Clang/LLVM 16 can generate code where AUIPC and the paired instruction(s) are not adjacent. Many variations are possible. There can even be a conditional branch between AUIPC and its paired instruction(s).

15:40 <Larhzu> Around 99 % are adjacent pairs so filtering only those works well for now. Any guesses about future compilers? If non-adjacent pairs are expected to become much more common (like 10 % of all uses) then the filter might need to catch the simplest forms of split pairs too.

15:40 <Larhzu> (2) The spec has a table about 48-bit and longer instructions but it's not frozen. I wonder if the table is still the best guess. It can help a little if the filter can sync to the instruction stream.
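For question (2), syncing to the instruction stream comes down to reading the length from the low bits of the first 16-bit parcel. Below is a minimal sketch of that length encoding as given in the base ISA specification; the 48-bit and 64-bit rows are the part described above as not frozen, so treat them as a best guess. The name `insn_length` is hypothetical.

```python
def insn_length(parcel):
    """Return the instruction length in bytes from its first 16-bit parcel.

    Returns None for the longer/reserved encodings, which this sketch does not handle.
    """
    if parcel & 0b11 != 0b11:
        return 2                      # compressed (C extension)
    if parcel & 0b11100 != 0b11100:
        return 4                      # standard 32-bit instruction
    if parcel & 0b111111 == 0b011111:
        return 6                      # 48-bit (not frozen)
    if parcel & 0b1111111 == 0b0111111:
        return 8                      # 64-bit (not frozen)
    return None

print(insn_length(0x0517))  # 4: low parcel of an auipc
print(insn_length(0x4501))  # 2: a compressed instruction
```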

15:47 jacklsw has joined #riscv

15:47 <palmer> Larhzu: you should be able to get non-contiguous high/low relocations out of GCC as well, it's just not the default because we've found it generates slightly worse code

15:49 <Larhzu> palmer: OK. :-) I guess in the future it might depend on -mtune too.

15:50 jacklsw has quit [Client Quit]

15:50 <Larhzu> Like if the processor can fuse auipc+jalr or auipc+ld or not.

15:50 jacklsw has joined #riscv

15:53 wingsorc has joined #riscv

15:53 jacklsw has quit [Client Quit]

15:53 <jrtc27> probably most of the non-contiguous cases are things like `1: auipc a0, %pcrel_hi(foo); ld a1, %pcrel_lo(1b)(a0); addi a1, a1, 1; sd a1, %pcrel_lo(1b)(a0)`

15:53 jacklsw has joined #riscv

15:53 vagrantc has joined #riscv

15:53 <jrtc27> ie reusing the auipc rather than rematerialising it

15:54 <jrtc27> that was the main motivation in LLVM for properly modelling things

15:55 <muurkha> yeah, I was wondering last night about whether that might happen, and that's a more plausible case than what I'd thought of

15:56 <Larhzu> I had objdumped files from llvm-16_16.0.4-1~exp1_riscv64.deb and found various interesting cases. :)

15:56 <Larhzu> The load-modify-store use is among them.

15:57 <Larhzu> https://paste.debian.net/hidden/6115b346/

15:57 <muurkha> also, and I don't know how likely this is, you could imagine a superscalar platform that didn't do op fusion, so that `auipc s0,1; addi s0,s0,248; auipc s1,2; addi s1,s1,40` would be slower than if you interleaved the pairs to get more ILP

15:57 sauce has quit []

15:58 jacklsw has quit [Client Quit]

15:58 <jrtc27> the last case definitely looks like a bug to me

15:59 jacklsw has joined #riscv

15:59 <muurkha> the dead loads in the paste?

15:59 <jrtc27> they're not dead

15:59 <jrtc27> but it's spilling an auipc rather than rematerialising

15:59 <jrtc27> (auipc isn't *quite* rematerialisable, but in effect it is in the way it's used here so long as you're careful)

16:00 <Larhzu> The original auipc result goes directly to stack without any other use. Unfortunately I didn't keep the exact filenames anywhere but it's from that Debian package.

16:00 <muurkha> oh, right, because the second instruction of each pair uses the result of the previous one

16:01 <Larhzu> So I suppose the auipc should just appear later right before the ld instructions.

16:01 <muurkha> that seems like it would be better

16:02 sauce has joined #riscv

16:03 <Larhzu> From filtering point of view, auipc is a complex instruction because one needs the paired instruction too. Converting auipc alone is off-by-4096 half the time, and it's good to convert the lowest 12 bits too.
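The off-by-4096 remark follows from how a pc-relative offset is split: the low 12 bits are sign-extended, so the auipc immediate is rounded up whenever the low part is negative. Here is a minimal sketch of that arithmetic, assuming a 32-bit offset; `split_pcrel` is a hypothetical name that mirrors what %pcrel_hi and %pcrel_lo compute.

```python
def split_pcrel(offset):
    """Split a pc-relative offset into (hi20, lo12) the way %pcrel_hi/%pcrel_lo do."""
    lo12 = ((offset & 0xFFF) ^ 0x800) - 0x800   # sign-extend the low 12 bits
    hi20 = ((offset - lo12) >> 12) & 0xFFFFF    # auipc immediate, compensating for lo12
    return hi20, lo12

print(split_pcrel(0x10F8))  # (1, 248): auipc alone lands in the target's 4 KiB block
print(split_pcrel(0x1800))  # (2, -2048): auipc alone overshoots by 4096
```

With roughly uniform low bits, the second case occurs about half the time, which is why a filter that converts only the auipc needs the paired instruction's immediate to get the address right.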

16:04 <Larhzu> If there is auipc-ld-addi-sd sequence, handling just the auipc-ld part is very good already.

16:05 aburgess has quit [Ping timeout: 260 seconds]

16:05 <Larhzu> A filter is small and dumb code. ARM64 filter, without comments and empty lines, is under 40 lines. RISC-V prototypes are 75-120 lines.

16:05 <Larhzu> So simple and good enough is the goal.

16:07 <Larhzu> But if future output from compilers will have more of auipc-auipc-ld-ld or such cases then perhaps a filter should handle them. An improved filter can be made later but it's a bit annoying.

16:08 <muurkha> I wonder if you could usefully filter just the instruction with the 12-bit offset

16:08 <muurkha> l*, s*, jalr

16:09 <muurkha> because that's where most of the entropy will be

16:09 <Larhzu> No because the 12-bit offset is relative to the pc of auipc, not the pc of l* or s*.

16:09 <Larhzu> To filter the pair (load, store, jalr, addi) one has to know the pc of the related auipc.

16:10 <muurkha> that's true, but as you said, in 99% of cases it's the previous instruction

16:10 <Larhzu> And to filter auipc, one has to know the lowest 12 bits because otherwise the auipc conversion will be off-by-4096 half the time.

16:10 <Larhzu> Current filter prototypes treat auipc+inst2 as a fused pair (like 8-byte instruction) so either both are converted or neither.
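The fused-pair idea can be sketched as a scan for an auipc immediately followed by an I-type instruction that uses the auipc result as its base register. This is not the xz prototype: `find_adjacent_pairs` is hypothetical, it assumes 4-byte-aligned little-endian code with no compressed instructions, it skips stores (their immediate is laid out differently), and it truncates addresses to 32 bits. A real filter would also re-encode the pair, which is omitted here.

```python
import struct

I_TYPE_OPCODES = {0x03, 0x13, 0x67}  # LOAD, OP-IMM (e.g. addi), JALR

def sext12(value):
    """Sign-extend a 12-bit immediate."""
    return (value ^ 0x800) - 0x800

def find_adjacent_pairs(code, base_pc=0):
    """Yield (pc_of_auipc, absolute_target) for each adjacent auipc + I-type pair."""
    for off in range(0, len(code) - 7, 4):
        auipc, inst2 = struct.unpack_from("<II", code, off)
        if auipc & 0x7F != 0x17:
            continue
        rd = (auipc >> 7) & 0x1F
        if rd == 0 or (inst2 & 0x7F) not in I_TYPE_OPCODES:
            continue
        if (inst2 >> 15) & 0x1F != rd:   # inst2 must use the auipc result as its base
            continue
        hi = auipc & 0xFFFFF000
        lo = sext12(inst2 >> 20)
        yield base_pc + off, (base_pc + off + hi + lo) & 0xFFFFFFFF

# auipc s0,1 ; addi s0,s0,248 placed at pc 0x10000
pair = struct.pack("<II", 0x00001417, 0x0F840413)
print([(hex(pc), hex(target)) for pc, target in find_adjacent_pairs(pair, 0x10000)])
# [('0x10000', '0x110f8')]
```

Both instructions are needed to produce the address, so either both are converted or neither is, and a split pair is simply left alone.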

16:11 <muurkha> well, to do it correctly in every case, one has to

16:11 jacklsw has quit [Quit: Back to the real life]

16:11 <Larhzu> Adding lookahead of a few instructions is possible but the first tries didn't give good results. If filter is applied to the whole executable and not just .text then false positives in non-code data are a problem.

16:11 jacklsw has joined #riscv

16:12 jacklsw has quit [Client Quit]

16:12 guerby__ is now known as guerby

16:13 <Larhzu> Trying to filter only .text would be good but ELF section headers are at the end of the executable which isn't nice for streamable compression. Program headers fairly accurately tell the location of executable section on x86-64 but not on RISC-V or ARM64.

16:14 <muurkha> a drawback of the pre-filtering approach to improving compressibility is that you can't just decide not to use an encoding in a given case because it doesn't help compression, I guess

16:15 <Larhzu> There are ideas about smarter filtering. To use section headers, one has to buffer a lot to allow compression tool to work in pipes.

16:16 <Larhzu> Or figuring out some way to cheat, for example, detecting what is executable code and what isn't without ELF headers.

16:20 <Larhzu> Perhaps the filter development should wait a bit to see how compiler outputs evolve.

16:25 <muurkha> RISC-V is 13 years old tho

16:25 <muurkha> and one of the first things they did was get a GCC target working

16:26 <Larhzu> Linker relaxation in psABI doesn't allow auipc+ld to become auipc+c.ld. In big executables there are a few places where it would be possible (the immediate would fit) but it's not common. From filtering point of view I kind of hope such relaxation won't be allowed.

16:26 <muurkha> it's not as if the current LLVM and GCC support is an early prototype that will be replaced by something much better next week

16:26 <Larhzu> RISC-V is both old and young at the same time.

16:27 <jrtc27> who says you can't turn auipc+ld into auipc+c.ld?

16:28 <jrtc27> from a psABI perspective you can

16:28 <jrtc27> whether GNU ld bothers to is a different matter

16:28 <Larhzu> It's not explicitly listed under "Linker Relaxation Types". So perhaps my interpretation was just too strict.

16:28 <jrtc27> it already turns auipc+jalr into jal or c.jal

16:29 <jrtc27> (or c.j for the x0 rather than ra case)

16:29 <Larhzu> auipc+jalr to c.jal and to c.j *are* explicitly listed.

16:29 <jrtc27> we should probably kill that section...

16:29 <jrtc27> and just say "to a semantically-equivalent sequence" or similar

16:30 <jrtc27> enumerating every possibility, whether performed or not, is a fool's errand

16:30 <jrtc27> and only call out some of the interesting ones

16:30 awita has quit [Ping timeout: 246 seconds]

16:30 <jrtc27> like GP-based relaxation

16:30 <jrtc27> and, when added, GOT->non-GOT

16:31 <jrtc27> which are both semantically equivalent in the normal case, but do have some implications for certain use cases

16:31 <Larhzu> muurkha: It's not about them being replaced. A new processor might have best performance with different instruction scheduling, including putting something else in the middle of auipc+inst2. It would be new -mtune=foo.

16:31 <jrtc27> (former for if GP isn't set correctly, e.g. at program startup when you're trying to set GP in the first place, and latter for early-boot code of OSes that have yet to set up virtual memory)

16:33 <Larhzu> jrtc27: OK, thanks, this is useful info to me. :-)

16:48 heat_ has quit [Read error: Connection reset by peer]

16:48 heat has joined #riscv

16:56 Armand has joined #riscv

16:58 ikke has quit [Quit: WeeChat 3.8]

17:05 <dh`> muurkha: real reasons use/overuse of vectors might slow things down headline with "if you use these monster registers they have to be saved and loaded by the kernel all over the place"

17:06 Andre_Z has joined #riscv

17:06 <dh`> also, more speculatively, I'd expect that if you make your vector ops too dense other functional units end up going idle, which doesn't necessarily result in better overall throughput

17:07 <dh`> meanwhile iirc from last night djb's reasoning ignored the fact that these things usually happen in loops and there's an icache

17:10 jacklsw has joined #riscv

17:14 MaxGanzII has quit [Ping timeout: 240 seconds]

17:16 jacklsw has quit [Quit: Back to the real life]

17:21 jacklsw has joined #riscv

17:27 jacklsw has quit [Quit: Back to the real life]

17:27 MaxGanzII has joined #riscv

17:27 jacklsw has joined #riscv

17:27 meta-coder has quit [Ping timeout: 260 seconds]

17:28 jacklsw has quit [Client Quit]

17:28 jacklsw has joined #riscv

17:31 danilogondolfo has quit [Quit: Leaving]

17:39 jacklsw has quit [Quit: Back to the real life]

17:48 rsalveti has joined #riscv

17:48 aerkiaga has joined #riscv

18:07 Xark has quit [Ping timeout: 268 seconds]

18:12 Xark has joined #riscv

18:25 armand_ has joined #riscv

18:25 Armand has quit [Ping timeout: 240 seconds]

18:29 armand_ is now known as Armand

18:32 Andre_Z has quit [Quit: Leaving.]

18:41 raym has quit [Ping timeout: 268 seconds]

18:46 TMM_ has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

18:46 TMM_ has joined #riscv

18:47 raym has joined #riscv

18:50 EchelonX has joined #riscv

19:05 ldevulder has quit [Quit: Leaving]

19:22 ashtin has quit [Read error: Connection reset by peer]

19:23 vagrantc has quit [Quit: leaving]

19:25 Stat_headcrabed has joined #riscv

19:27 aurel32 has quit [Quit: leaving]

19:27 aurel32 has joined #riscv

19:30 BootLayer has quit [Quit: Leaving]

19:33 Stat_headcrabed has quit [Quit: Stat_headcrabed]

19:36 dilfridge is now known as stroopwafel

19:36 stroopwafel is now known as dilfridge

19:41 aburgess has joined #riscv

20:39 heat_ has joined #riscv

20:40 heat has quit [Ping timeout: 258 seconds]

20:45 wingsorc__ has joined #riscv

20:48 heat_ has quit [Read error: Connection reset by peer]

20:48 heat_ has joined #riscv

21:15 hometown has quit [Ping timeout: 240 seconds]

21:19 greaser|q has joined #riscv

21:19 greaser|q has quit [Changing host]

21:19 greaser|q is now known as GreaseMonkey

21:27 heat_ is now known as heat

21:30 <pierce> Oh I just saw that there’s now a CHERI SIG

21:53 <jrtc27> has been for a few months

22:26 <pierce> Oh okay, I only saw this https://lists.riscv.org/g/tech-announce/message/231

22:29 Armand has quit [Quit: Leaving]

22:29 <jrtc27> SIGs start off with acting chairs until the charter is formed and candidates formally put themselves forwards

22:45 MaxGanzII has quit [Ping timeout: 240 seconds]

22:45 raym has quit [Ping timeout: 240 seconds]

22:57 raym has joined #riscv

23:03 wingsorc__ has quit [Ping timeout: 258 seconds]

23:12 pecastro has quit [Ping timeout: 268 seconds]