heat has quit [Read error: Connection reset by peer]
<courmisch>
the vsetvli behaviour on penultimate loop iteration is kind of annoying
<courmisch>
I have a case where I need to be sure that VL is always even and that's in the way :/
dzaima[m] has joined #riscv
<dzaima[m]>
yeah, it's very not nice; another place where it breaks things is for mask loads/stores (vlm/vsm) where otherwise you could have a loop which adds vl>>3 or whatever to a pointer on each iteration, and it'd work (assuming a high enough LMUL such that vlmax≥8)
<courmisch>
I have not yet had a use case for masking :)
<courmisch>
possibly because multimedia grew up in SIMD without masking and built a habit of not having masks
<courmisch>
I guess I have to insert a MIN before VSETVLI
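[editor's note] A minimal sketch of the "MIN before VSETVLI" idea, written as plain C so it runs anywhere. It models the RVV 1.0 rule that when VLMAX < AVL < 2*VLMAX an implementation may return any vl from ceil(AVL/2) to VLMAX, and it assumes one permitted choice (ceil(AVL/2)) for illustration. The names model_vsetvli and count_odd_vl are hypothetical, and real hardware may choose differently inside the allowed range.

```c
#include <assert.h>
#include <stddef.h>

/* Model of one *permitted* vsetvli result (an assumption, not a measurement):
 *   avl <= vlmax     -> vl = avl
 *   avl >= 2*vlmax   -> vl = vlmax
 *   otherwise        -> anything in [ceil(avl/2), vlmax]; we pick ceil(avl/2). */
static size_t model_vsetvli(size_t avl, size_t vlmax)
{
    if (avl <= vlmax)
        return avl;
    if (avl >= 2 * vlmax)
        return vlmax;
    return (avl + 1) / 2;
}

static size_t min_size(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* Strip-mine n elements and count the iterations that ran with an odd vl.
 * With clamp != 0 the requested AVL is MIN(n, vlmax), which removes the
 * implementation-defined case and makes vl exactly MIN(n, vlmax). */
size_t count_odd_vl(size_t n, size_t vlmax, int clamp)
{
    assert(vlmax > 0);
    size_t odd = 0;
    while (n > 0) {
        size_t avl = clamp ? min_size(n, vlmax) : n;
        size_t vl = model_vsetvli(avl, vlmax);
        odd += vl & 1;
        n -= vl;
    }
    return odd;
}
```

With n = 22 and vlmax = 8, the unclamped model yields vl = 8, 7, 7, while the clamped loop yields 8, 8, 6. So an even element count and an even VLMAX keep every vl even once the clamp is in place.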
<dzaima[m]>
this would be for like in-memory packed-bit-boolean arrays
<dzaima[m]>
peeling off the last iteration is another option (if the loop body isn't very large)
<courmisch>
To be clear, I don't question that it's useful in *some* cases.
<courmisch>
But it's a lot less useful than with SVE (because SVE has no tail concept)
mwette has joined #riscv
drewj has joined #riscv
mobius has quit [Server closed connection]
mobius has joined #riscv
drewj has quit [Ping timeout: 246 seconds]
<courmisch>
and now I need to eat my hat because I just got a use case for masking
<Andre_Z>
Hi, I'm currently porting Hangover to RISC-V. One thing that's annoying is that there's no __sync_bool_compare_and_swap_128 on RISC-V64. Is someone aware of preexisting code to work around that?
<courmisch>
well, you can't atomically compare-and-swap 128-bit values if the hardware can't
<courmisch>
AFAIK, you can't do that on Armv8.0 either
<courmisch>
Andre_Z: so whatever the Armv8 code does?
<Andre_Z>
there's __sync_bool_compare_and_swap_128 on ARMv8, I'll have a look at what it does. But am I really the only one running into that?
<palmer>
Andre_Z: there's a proposed ISA extension for 128-bit CAS
lagash has quit [Remote host closed the connection]
lagash has joined #riscv
<palmer>
no, missing double-word CAS breaks a bunch of stuff. That's why everyone else has it ;)
<Andre_Z>
palmer: interesting, how do I check if my CPU has that already, or is it too unlikely in the proposed state?
<heat_>
try using a spinlock/mutex
<heat_>
it's not lockless but will just work
awita has quit [Ping timeout: 246 seconds]
<courmisch>
ah nvmd, Armv8.0 has LDXP/STXP
<sorear>
do you have a link for Hangover?
<Andre_Z>
heat_: Is there some code out there? Porting Hangover is already a challenge without thinking about CAS
<Andre_Z>
I think the only similar thing for RISC-V atm is Wine-CE, but Hangover will be like 10x faster. But wrt funding, it seems only one guy is currently interested...
agent314 has quit [Ping timeout: 246 seconds]
<sorear>
armv8 has had double-word CAS since the beginning but armv6 (or was it 5?) and mips are limited to the word size
<heat_>
Andre_Z, fwiw it's entirely possible that C11 atomics will Just Work and link in libatomic
<courmisch>
well *technically*, you need Armv8.1-LSE for CAS
<courmisch>
but Armv8.0 has 128-bit load/store exclusive
<heat_>
the __sync stuff is AFAIK discouraged
<sorear>
for an emulator, I think your only real options are to either stop the world for wide atomics (qemu does this), or do software MESI with a separate address space per thread and page-sized "cache lines"
Stat_headcrabed has joined #riscv
<sorear>
load/store exclusive as it is actually implemented and CAS are equally powerful
drewj has joined #riscv
<courmisch>
heat_: I don't think you can introduce a spin lock when writing an emulator
jacklsw has quit [Ping timeout: 246 seconds]
<Andre_Z>
sorear: I'm not yet at the emulation stuff, just basic Wine as the foundation
<Andre_Z>
heat_: I'll have a look, thanks
<courmisch>
uh, basic WINE? there's no Windows on RISC-V (yet) so what does that even mean?
<sorear>
the advantages of load/store exclusive for mitigating ABA problems as found in the published literature require you to be able to load and store unrelated variables between the LL and SC, which no real architecture allows
<heat_>
courmisch, why not? they're already calling functions, it's not like this is JITted
<courmisch>
heat_: you can't realistically change the memory layout of things
<heat_>
libatomic Should(tm) just hash the address into some locks, and do the cmpxchg under them
<courmisch>
so you could use a spinlock, but it would have to be in a parking pattern or something like that
<heat_>
sure
<sorear>
the libatomic approach works for the C abstract machine but not for emulators, because a "cmpxchg16b" instruction is atomic with respect to 32-bit and 64-bit _simple_ loads and stores executed in other threads
<Andre_Z>
courmisch: check my Hangover link above :) Wine just needs to run, then I can use the WoW64 mode to plug in an emulator for only emulating the x86 application. Windows syscalls are then handled on the RISC-V side, outside of the emulator
<sorear>
(libatomic approach = hashed locks)
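[editor's note] A rough sketch of the hashed-lock approach discussed above, assuming a hypothetical 16-byte value type u128 and a hypothetical function cas128_locked. It is not libatomic's actual code. As sorear points out, it is only atomic with respect to other accesses that take the same lock, so plain 32-bit or 64-bit loads and stores from other threads can still observe or clobber a half-updated value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit value, forced to 16-byte alignment. */
typedef struct {
    _Alignas(16) uint64_t lo;
    uint64_t hi;
} u128;

#define NLOCKS 64

/* Static storage, so every lock starts out as 0 (unlocked). */
static atomic_uint locks[NLOCKS];

/* Hash the address of the 16-byte granule into the lock table. */
static atomic_uint *lock_for(const void *addr)
{
    uintptr_t granule = (uintptr_t)addr >> 4;
    return &locks[(granule ^ (granule >> 6)) % NLOCKS];
}

/* Compare-and-swap done under a spinlock selected by address.
 * Returns true if *ptr matched expected and was replaced by desired. */
bool cas128_locked(u128 *ptr, u128 expected, u128 desired)
{
    assert(((uintptr_t)ptr & 15) == 0);
    atomic_uint *lock = lock_for(ptr);

    while (atomic_exchange_explicit(lock, 1u, memory_order_acquire) != 0u) {
        /* spin until the previous holder releases the lock */
    }

    bool ok = ptr->lo == expected.lo && ptr->hi == expected.hi;
    if (ok) {
        ptr->lo = desired.lo;
        ptr->hi = desired.hi;
    }

    atomic_store_explicit(lock, 0u, memory_order_release);
    return ok;
}
```

This is adequate when every access to the object goes through the same helper, which is the C abstract machine case. For emulating an instruction such as cmpxchg16b it is not sufficient on its own, for the reason given above.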
<courmisch>
parking pattern is what you call hashed locks, I think
<Andre_Z>
heat_: thanks, that looks promising
<courmisch>
but yeah, you will have problems w.r.t. regular loads and stores
Stat_headcrabed has quit [Quit: Stat_headcrabed]
drewj has quit [Ping timeout: 252 seconds]
<heat_>
yeah... good point
drewj has joined #riscv
<courmisch>
well, emulating x86 without RVWMO is basically impossible
<sorear>
fences everywhere or software MESI...
<courmisch>
no not WMO
<heat_>
tbf cmpxchg16b is recent enough that you can try to not implement it and see what happens
<courmisch>
whatever the extension name was that brought x86 memory model
<sorear>
Ztso
<courmisch>
yes that, thx
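[editor's note] A sketch of what "fences everywhere" could look like without Ztso, expressed with C11 atomics. The helper names guest_load32 and guest_store32 are hypothetical. The mapping (acquire fence after every guest load, release fence before every guest store) is the commonly cited way to approximate x86-TSO ordering on a weaker model. It is an assumption here, not something taken from any particular emulator, and it does not cover locked instructions or MFENCE.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Guest load: the trailing acquire fence keeps later loads and stores
 * from being reordered before this load. */
uint32_t guest_load32(_Atomic uint32_t *p)
{
    assert(p);
    uint32_t v = atomic_load_explicit(p, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    return v;
}

/* Guest store: the leading release fence keeps earlier loads and stores
 * from being reordered after this store. Store-to-load reordering is still
 * allowed, which x86-TSO also permits. */
void guest_store32(_Atomic uint32_t *p, uint32_t v)
{
    assert(p);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(p, v, memory_order_relaxed);
}
```

On hardware with Ztso, plain loads and stores already have this ordering, which is why the fence cost disappears there.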
<sorear>
which for some reason isn't available in hwprobe despite currently existing hardware, openjdk is probing mvendorid instead
<courmisch>
hwprobe support in glibc any decade now
<palmer>
courmisch: I was going to merge it last night, but the tests fail
<courmisch>
can't we get the syscall wrapper merged *without* the ifunc things that can't seem to be made to work?
<palmer>
probably? you're just looking for one of the patches?
<palmer>
I've been so worried about the ifunc stuff I kind of forgot about everything else...
<courmisch>
I am looking for source compatibility that I can rely on, as opposed to second-guessing what will be merged
<courmisch>
sorear: prefetch support also not exposed, IIRC
<palmer>
OK, so is there a patch you can point me at or something?
mlw has joined #riscv
<courmisch>
and unaligned vector memory load/store, but that one has no hardware implementation
<courmisch>
palmer: I lost the link to the last version in patchwork. I only care about the plain syscall wrapper (since this goes into code bases that can't rely on IFUNC).
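[editor's note] Until a libc wrapper exists under a settled name, the raw syscall can be used directly. The sketch below assumes a Linux kernel new enough to ship <asm/hwprobe.h> and compiles to a stub returning -1 everywhere else. The function names riscv_probe_key and riscv_has_vector are hypothetical. The RISCV_HWPROBE_* constants and struct riscv_hwprobe come from the kernel's uapi header.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>

#if defined(__linux__) && defined(__riscv) && defined(__has_include)
#  if __has_include(<asm/hwprobe.h>)
#    include <asm/hwprobe.h>
#    include <sys/syscall.h>
#    include <unistd.h>
#    define HAVE_RISCV_HWPROBE 1
#  endif
#endif

/* Query one hwprobe key for all online CPUs.
 * Returns 0 and fills *value on success, -1 if the syscall is unavailable,
 * fails, or the kernel does not know the key. */
int riscv_probe_key(long long key, unsigned long long *value)
{
    assert(value != NULL);
#if defined(HAVE_RISCV_HWPROBE) && defined(__NR_riscv_hwprobe)
    struct riscv_hwprobe pair = { .key = key, .value = 0 };
    /* cpusetsize = 0 and cpus = NULL ask about all online CPUs. */
    if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0) != 0)
        return -1;
    if (pair.key < 0) /* the kernel rewrites unknown keys to -1 */
        return -1;
    *value = (unsigned long long)pair.value;
    return 0;
#else
    (void)key;
    return -1;
#endif
}

/* Returns 1 if the V extension is reported, 0 if not, -1 if unknown. */
int riscv_has_vector(void)
{
#if defined(HAVE_RISCV_HWPROBE) && defined(__NR_riscv_hwprobe)
    unsigned long long ext = 0;
    if (riscv_probe_key(RISCV_HWPROBE_KEY_IMA_EXT_0, &ext) != 0)
        return -1;
    return (ext & RISCV_HWPROBE_IMA_V) != 0;
#else
    return -1;
#endif
}
```

Callers should treat -1 as "unknown" and fall back to another detection method, since older vendor kernels and some emulation setups do not implement the syscall.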
<sorear>
that's surprising given that it's basically free on implementations where changing SEW is a no-op
<sorear>
i wasn't aware of current implementations of Zicbop
<courmisch>
well there's only one implementation of RVV 1.0 commercially "available"
<courmisch>
and that's K230, and it crashes on unaligned accesses
<heat_>
<courmisch> well, emulating x86 without RVWMO is basically impossible <-- tbf they have done it before for arm64, but it just sucks compared to the M1's x86 TSO sauce
<courmisch>
heat_: you're making my point. Yes, unless you have Apple hardware, you're toast
<courmisch>
palmer: seems right
<palmer>
OK, I'll go try and figure out what's actually failing
<courmisch>
palmer: it's dumb, but I don't even know if it's going to be called "__riscv_hwprobe" or "riscv_probe" or whatever
<palmer>
those shouldn't really be tripping anything up
<palmer>
ya, makes sense
<palmer>
let me try and figure out what's actually failing, build-many-glibcs is kind of a headache...
<courmisch>
and then ofc, QEMU user doesn't support hwprobe either. Sigh.
<courmisch>
and vendors seem to love their pre-hwprobe kernel versions
<palmer>
QEMU should support it, Robbin had some patches and IIRC they got merged