<palmer>
bjdooks: we don't have glibc support for V yet, so you can't do anything that requires context save/restore. We also don't have the Linux bits to enable them, but stillson just posted some patches
\subline is now known as \sublineReadOnly
<palmer>
courmisch: you're talking about the rd!=rs1 case? that's when you're changing stacks
Andre_H has joined #riscv
davidlt has joined #riscv
epony has joined #riscv
<courmisch>
palmer: is the expectation that the µarch magically switches to a different RAS? That seems insane
<courmisch>
I'd expect that userspace context switching just breaks all RAS-related optimisations
<palmer>
RISC-V has two stacks, one that's RA-based and one that's based on some other register (x5 from the docs, I forget the ABI name)
<palmer>
so it's not for scheduling in a new thread, sorting that out is HW's problem. It's for coroutines or microcode or whatever
<courmisch>
t0 is the other link register, it's used for __riscv_save and __riscv_restore
<courmisch>
but I still don't understand how the RAS microarchitectural of "none" differs from "push then pop"
<courmisch>
+behaviour
<courmisch>
if you push the return address then pop it, it's as if you did nothing
<palmer>
I'm looking at "Yes & Yes & No & Pop, then push"
<courmisch>
yeah
<courmisch>
I must have some wrong documentation version
<courmisch>
now that makes more sense
<palmer>
ya, if it says the other way around then it's nonsense ;)
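For reference, the JALR return-address-stack (RAS) hint rules discussed above can be written out as a small C function. This is only a sketch of the table in the unprivileged ISA spec as I understand it: x1 (ra) and x5 (t0) count as link registers, and the case palmer quotes (both rd and rs1 are link registers, rd != rs1) is "pop, then push". The enum and function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the actions mirror the columns of the spec's hint table. */
enum ras_action { RAS_NONE, RAS_POP, RAS_PUSH, RAS_POP_THEN_PUSH };

/* x1 (ra) and x5 (t0) are the two registers treated as link registers. */
static bool is_link(unsigned reg)
{
    return reg == 1 || reg == 5;
}

/* RAS hint implied by a JALR with the given rd and rs1 register numbers. */
enum ras_action jalr_ras_hint(unsigned rd, unsigned rs1)
{
    assert(rd < 32 && rs1 < 32);          /* register numbers are 5 bits */

    bool rd_link = is_link(rd);
    bool rs1_link = is_link(rs1);

    if (!rd_link)
        return rs1_link ? RAS_POP : RAS_NONE;   /* e.g. "ret" pops */
    if (!rs1_link)
        return RAS_PUSH;                        /* ordinary call */
    /* Both are link registers: same register is a plain push,
     * different registers is the coroutine-style swap. */
    return rd == rs1 ? RAS_PUSH : RAS_POP_THEN_PUSH;
}
```

With this, `jalr_ras_hint(5, 1)` (jump through ra, link into t0) gives the "pop, then push" case, whereas an old document saying "push, then pop" would indeed be indistinguishable from "none".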
<courmisch>
user-mode QEMU already exposes vectors, regardless of the fact that Linux technically has not defined the ABI yet. Then again, as long as you keep off <ucontext.h>, I don't think that glibc is a concern
<courmisch>
setjmp/longjmp will clobber the vectors but that's perfectly fine since the vector ABI says all vectors are call-clobbered
<courmisch>
(thankfully so, it's such a pain with ARM NEON)
<palmer>
ff93ec3f3567510d11882691ec3dfe8b35874617
<courmisch>
right
<courmisch>
no idea how I ended up with such an old version without noticing though
<palmer>
ya, it's 2017
<courmisch>
the only thing that QEMU user does *not* yet do is set the V bit in HWCAP
<courmisch>
so you have to just assume that V is supported, or test it via SIGILL (yikes)
<palmer>
ya, that's backwards -- the idea was to match the Linux ABI, not to just pick a third one ;)
<courmisch>
considering that vector size is not configurable (unlike in ARM SVE, IIRC?), I don't see why a real interface would be needed though
<courmisch>
kernel can just trap on first use and enable vectors for the calling task from then on
<courmisch>
context switching is transparent to user mode anyhow
<palmer>
it's not transparent for signals, so userspace needs to opt into having the vector unit on if it wants to take one
<palmer>
I guess maybe we could do some tricks with sigreturn? still seems saner to just keep the state in mcontext/ucontext
<courmisch>
ah you mean between libc and kernel, yes. I meant for normal application code, there should be nothing to do other than check the bit in HWCAP
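A minimal sketch of the "check the bit in HWCAP" approach, assuming the usual RISC-V Linux convention that `AT_HWCAP` carries one bit per single-letter ISA extension (bit 0 for 'A', so bit 21 for 'V'). The helper names are hypothetical. As noted above, user-mode QEMU did not set the V bit at the time of this discussion, so this check would report no vector support there even when vectors work.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP: Linux-specific */

/* Test the bit for a single-letter ISA extension in a HWCAP word.
 * Assumes the one-bit-per-letter layout: 'A' is bit 0, 'V' is bit 21. */
bool hwcap_has_ext(unsigned long hwcap, char letter)
{
    assert(letter >= 'A' && letter <= 'Z');
    return (hwcap >> (letter - 'A')) & 1UL;
}

/* Ask the kernel (through the auxiliary vector) whether V is advertised. */
bool cpu_has_vector(void)
{
    return hwcap_has_ext(getauxval(AT_HWCAP), 'V');
}
```

A library would call `cpu_has_vector()` once at startup and pick its vector or scalar code path accordingly, without modifying any process or task state.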
<courmisch>
having applications flag themselves is a non-starter IMO. In most cases the Vector code will be in some library that should not modify the process or task state unilaterally or arbitration will be a big mess
<courmisch>
I haven't checked, but I assume QEMU just leaves vectors out of signal handling - which presumably works as long as the signal handler does not use vectors in any way
<palmer>
with stuff like autovectorization and memcpy it's pretty hard to ensure that arbitrary C code doesn't touch the V state
<palmer>
we've started to see some bugs trickle in that are possibly signal-related, but nobody's 100% sure yet (we're kind of just putting them to the side until the runtime V support is there)
dionysos is now known as [dionysos]
<courmisch>
ah right
<courmisch>
well at any rate, usermode QEMU does enable it if you set the CPU accordingly
<palmer>
yep, and that's how folks are running to find the bug -- because you can't yet turn V on in Linux, the bugs won't be seen ;)
Andre_H has quit [Quit: Leaving.]
<conchuod>
Palmer those vector patches seem unchanged since 6 months ago when the last version was posted
<conchuod>
I think your Rivos fella just wanted to tack on a single fixup
<conchuod>
(his name escapes me, Chris?)
<muurkha>
courmisch: it doesn't work if the signal handler uses <ucontext.h> or similar to preemptively change context to a different usermode thread
<muurkha>
well, I should say, I'm pretty sure it won't work in that case. I haven't tried it!
<courmisch>
muurkha: you can't use the ucontext from the signal handler
<courmisch>
muurkha: there are no ways to return out of the signal handler without going to sigreturn syscall, AFAIK
<courmisch>
I don't think you can return to the correct PC without having at least one X register clobbered without an exception return
<courmisch>
and you can't do an exception return from userspace
<muurkha>
exception?
<courmisch>
what would you use the ucontext for? if you want to print out the content for self-debugging, it's not really an issue that vectors aren't there
<courmisch>
if you want to use it to swap context, that just won't work
<courmisch>
it won't work vector or no vector
Andre_H has joined #riscv
Starfoxxes has quit [Ping timeout: 265 seconds]
Andre_H has quit [Remote host closed the connection]
Starfoxxes has joined #riscv
cwebber has joined #riscv
dor has quit [Remote host closed the connection]
<palmer>
is anyone else seeing these kasan pcpu_alloc crashes? seems like I'm getting them randomly, which is kind of a headache
Trifton has joined #riscv
<prabhakarlad>
Hi All, anyone using meta-riscv?
<prabhakarlad>
reason I ask, I am seeing issues where ldconfig (fails with illegal instruction). I was wondering if anyone else as seen such issue.
<prabhakarlad>
s/else as/else has
<conchuod>
hmm, we do use it as part of our bsp prabhakarlad but we don't usually run on the ToT so I am likely no help.
<palmer>
prabhakarlad: khem is in here, he might know. Not sure if davidlt tests it any more...
<prabhakarlad>
conchuod: no worries.
<davidlt>
I haven't used it for some time
<conchuod>
Does anyone other than you actively do anything with KASAN on RISC-V palmer ?
<conchuod>
Oh, also RCU stall detection seems to be broken on polarfire :)
<drmpeg>
Broken how?
<drmpeg>
The timeout has changed.
<conchuod>
As in, a stall that used to be detected is no longer detected
<conchuod>
And I can let the system sit an hour and nothing happens
<drmpeg>
Okay, it was the other way around for the timing thing.
<drmpeg>
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0 sets it back to the old timing.
<conchuod>
I bisected it to a change to the riscv clocksource timers
qwer has joined #riscv
qwer has quit [Ping timeout: 260 seconds]
<conchuod>
palmer: I can repro with stress-ng
<conchuod>
bah nvm, not the same oops
qwer has joined #riscv
<conchuod>
I saw KASAN complaining but it was not related, sorry :/
<conchuod>
Probably some facet of my weird ddr configuration
<palmer>
OK, I get a bunch of different stack traces but they're all in pcpu_alloc
<palmer>
everything is a small offset from 0, smells kind of like TP hasn't been initialized in some cases
<conchuod>
Ye I think I just messed up and loaded the wrong dt for what my memory map actually is so there was some overlap
<conchuod>
I saw KASAN and didn't double check before typing if it was the same