<YangyuChen[m]>
After reading this: https://github.com/llvm/llvm-project/issues/88029, I think we may need to implement the misaligned vector load/store emulation on kernel / SBI to handle the situation mentioned in the link.
alexghiti has joined #riscv
naoki has quit [Client Quit]
<jrtc27>
or don't pass -mno-strict-align?
mlw has joined #riscv
<jrtc27>
emulated vector loads/stores are going to perform horribly, you're better off not taking the traps and having instructions that handle the misalignment
paddymahoney has quit [Ping timeout: 260 seconds]
paddymahoney has joined #riscv
<courmisch>
considering that vectors are all about perfs, emulating them seems like a cure worse than the disease, yeah
<courmisch>
also emulating vectors loads and stores is not so easy in the first place
iooi_ has joined #riscv
iooi has quit [Ping timeout: 268 seconds]
iooi_ is now known as iooi
<courmisch>
you'd have to account for mask and tail
<sorear>
from a perf angle, for any core where we currently have a full scheduling model I don't see any reason not to add a bit for "elementally misaligned vector loads and stores are at least as fast as the same operation at ELEN=8"
<sorear>
from a correctness angle, I'm pretty sure that as long as Zicclsm exists and cores that trap on elementally misaligned vector access also exist, it needs to be handled in SBI. kernel is not good enough because the profile requirements apply for arbitrary S-mode code
<sorear>
alternatively we can declare that the c908 + opensbi system does not provide Zicclsm / V, and suppress one bit or the other in dtb/hwprobe
<sorear>
I don't think tail is a problem, you can do any "emulated" operation as e8, m<configured LMUL>, tu, mu, and set vl = <configured vl> * <configured ELEN/8>
<sorear>
masks are much trickier since they need to be replicated, and I haven't figured out what you'd need to do to make strides/indexes/segmentation work; just saying "hardware bug, zicclsm is missing" would be easier
<sorear>
for segmented indexed or tiny-stride operations we may have to fall back to one element at a time - rewriting them as independent strided element loads results in slightly but detectably different memory ordering
iooi_ has joined #riscv
iooi has quit [Ping timeout: 256 seconds]
iooi_ is now known as iooi
simpl_e has quit [Remote host closed the connection]
<sorear>
if there's something that needs a respin yes, I'm a bit behind on email
<alexghiti>
The respin is up to you regarding Greg's comment. I'll keep this fix on my list then, thanks.
<sorear>
I'll have to think about that
<YangyuChen[m]>
<courmisch> "considering that vectors are all..." <- I think we still need this for binary compatibility if the binary already compiled with some misaligned vector load/store exists. And we also need to add this to hwprobe to prevent some core without vector misaligned load/store support for dynamic function dispatch (like openssl already does).
KombuchaKip has joined #riscv
<courmisch>
YangyuChen[m]: SBI needs to implement it because RVA22 requires support for it. But it won't solve the existing binary compatibility problems
<courmisch>
we can't go back in time and patch SBI to fix existing firmware. and in the future, code should be fixed rather than rely on binary compat
<YangyuChen[m]>
What is "existing binary compatibility problems" here?
<YangyuChen[m]>
I mean if the misaligned vector load/store will be emulated in sbi / kernel, the compatibility is OK, but with relatively poor performance.
<courmisch>
the existence of an LLVM issue seems to imply that somebody is hitting the problem
<YangyuChen[m]>
The best way I can imagine is to provide a new hwprobe to probe vector misaligned speed and using ifunc to dynamically choose the code in the runtime.
<courmisch>
I'm not sure we want to encourage writing vector optmisations with unaligned vector memory accesses
<YangyuChen[m]>
I agree. And Zicclsm is in RVA23U64 mandatory base. It may not happens in the future CPUs.
danielinux has quit [Changing host]
danielinux has joined #riscv
<courmisch>
I don't see the point in exposing it in hwprobe if it's not to enable dedicated opts though
naoki has joined #riscv
naoki has quit [Client Quit]
<jrtc27>
or just don't claim to be compatible with a profile you're not?
alexghiti has quit [Ping timeout: 256 seconds]
alexghiti has joined #riscv
oaken-so1rce is now known as oaken-source
wingsorc has quit [Quit: Leaving]
<courmisch>
jrtc27: I don't recall Cannaan making such a claim about K230? I wouldn't be surprised if people assumed that V implies RVA20U64 even if it formally doesn't
<courmisch>
though
<sorear>
« or just don't claim to be compatible with a profile you're not? » this is the simple option I was proposing above, although the details are tricky in their own way
<sorear>
courmisch: the other side of this problem is «Misaligned accesses are supported in userspace, but they may perform poorly.» in Documentation/arch/riscv/uabi.rst, which may be taken as referring to Zicclsm. Perhaps we should clarify that guarantee to "misaligned _scalar_ accesses are supported, check Zicclsm in hwprobe before assuming misaligned vector accesses are also supported"
<sorear>
if you have V+Zicclsm you can do "vsetivli zero, 30, e4, m8, ma, ta; vle32 v0, (a1); vse32 v0, (a0)" to memcpy 120 bytes regardless of alignment. this is probably a useful optimization and I am somewhat surprised there are real cores that choke on that
<conchuod>
sorear: I think that would be a worthwhile change. Although, there's no Zicclsm hwprobe key, so maybe that specific part would change to drop wording about the information source?
<sorear>
I still think that one should have been added shortly after it was ratified in (checks riscv.org) March 2023
<sorear>
that's fine. maybe leaving it unspecified will encourage development
<conchuod>
Would "Misaligned vector access cannot be assumed unless the platform supports Zicclsm" work as the second sentence?
<sorear>
"non-scalar" or "using instructions outside of GC" might work better, since an unpatched SBI won't be emulating Zilsd or AMF loads and stores either
<conchuod>
Effectively that statement in the uabi doc just means "whatever worked on an unleashed is assumed to always work".
<conchuod>
There's a bunch of stuff that came out of the profiles spec that I've yet to see anywhere other than qemu
<conchuod>
Also, is this really all we get for Zicclsm: "Misaligned loads and stores to main memory regions with both the cacheability and coherence PMAs must be supported." What constitutes a "load" or a "store"?
<sorear>
I think most of the profiles-derived extension strings are supported for parsing by gcc and binutils, just tested zicclsm and it was accepted
<sorear>
so it's not quite only qemu
<sorear>
we know what loads, stores, and AMOs are because they're the interface specification between the ISA and the memory model. a better spec would spell this out
<conchuod>
I'd just love a reference to the source for those defnitions.
<sorear>
you mean as text in riscv-profiles.adoc? agreed
<conchuod>
Yeah, not from you but in the document.
<sorear>
I don't think we need a page of pseudocode for every single Zb* and Zc* mnemonic, but then you have stuff like this that isn't clear at all (if you check the issues tab I had to get clarification on several of them)
<conchuod>
sorear: Do you think that Zicclsm is incompatible with a platform where "elementally misaligned vector loads/stores" trap?
<conchuod>
That's probably a poor question actually.
<conchuod>
I see, 4 months ago and said note was not added.
<sorear>
this also matches my understanding of the intent - alignment isn't a dealbreaker on A-profiles and Zicclsm is how this is realized
Andre_Z has joined #riscv
<conchuod>
One thing I missed until recently was that the RVA..S.. profles include all of RVA..U.. extensions.
<conchuod>
So things like Zicclsm that were only defined in the RVA..U.. section were actually valid for S mode too - but we'd already deemed Zicclsm useless at that point as it provided nothing over the existing uabi.rst statement.
<conchuod>
I guess that's changed with the addition of vector and the arc ruling.
<courmisch>
sorear: kernel doc should clarify that it's about scalar, yes
<conchuod>
sorear: thanks for that link
Andre_Z has quit [Quit: Leaving.]
Andre_Z has joined #riscv
<courmisch>
sorear: more interesting case for me would strided loads, becausd then the element size is the row size, which is not always aligned
<sorear>
courmisch: I can't quite tell what that's a reply to
<courmisch>
sorear: hwprobe doc should clarify that it's only checking scalar accesses
stolen has joined #riscv
<somlo>
so I tried to casually rebuild the latest linus tree for my litex/rocket system, as I do every morning, and I got this:
<somlo>
drivers/irqchip/irq-riscv-imsic-early.c: In function 'imsic_ipi_domain_init':
<somlo>
drivers/irqchip/irq-riscv-imsic-early.c:52:9: error: too many arguments to function 'riscv_ipi_set_virq_range'
<cousteau>
It says the signed immediate format is simm20[20|10:1|11|19:12] (and no simm20[0]), but that looks more like the type-J bit salad.
<cousteau>
Some tests on godbolt suggest that the actual format is just simm20[19:0]
esv_ has joined #riscv
esv has quit [Remote host closed the connection]
BootLayer has joined #riscv
<cousteau>
I'm going to assume that's the expected behavior, and the documentation is buggy.
coldfeet has joined #riscv
damian101_ has joined #riscv
damian101 has quit [Ping timeout: 268 seconds]
coldfeet has quit [Remote host closed the connection]
<sorear>
cousteau: gas/doc/c-riscv.texi in the binutils-gdb repo?
damian101_ has quit [Ping timeout: 264 seconds]
frkzoid has joined #riscv
ntwk has quit [Read error: Connection reset by peer]
freakazoid332 has quit [Ping timeout: 240 seconds]
<palmer>
cousteau sorear: ya, looks like it's just a doc bug. Are you guys going to send a patch, or do you want me to?
unnick has quit [Ping timeout: 255 seconds]
stolen has quit [Quit: Connection closed for inactivity]
van3ll0pe has joined #riscv
unnick has joined #riscv
<van3ll0pe>
Hello guys, I'm making an emulator of riscv, and I want to know for instruction type with immediate data. I want to stock it in a variable, but for the U type instruction with the immediate data to the bit 31 to 12, I can get it with a binary and after that I don't need to shift the bit << 12 when I realise the instruction lui and auipc because I already get it on 31 to 12 bit.
<cousteau>
palmer: well I don't have experience sending patches there, plus I'd rather have someone who actually knows what it should say doing the patching
<cousteau>
(while you or someone else are at it, please replace every occurrence of opcode6 with opcode7)
<cousteau>
Also, the bit indexes below the ASCII art box designs is bugging me a lot; it seems to refer to the bit on the left of the division sometimes, and others the one on the right...
<palmer>
this is why I don't read docs... ;)
<cousteau>
Reading docs is for people who don't know how to use a tool!
<cousteau>
Such as me :/
Andre_Z has quit [Quit: Leaving.]
<cousteau>
I can try sending a patch though. Mailing list? (This doesn't have a "pull request" type of thing like GitHub/GitLab do, right?)
<cousteau>
van3ll0pe: yes, bits 31 to 12 of the LUI/AUIPC instruction correspond to bits 31 to 12 of the immediate word.
<cousteau>
Bits 11 to 0 of the immediate are zero, and bits 32 and above (in the case of RV64) are sign-extended.
<cousteau>
Something tells me this was done on purpose to simplify things.
<palmer>
cousteau: they just get emailed to binutils@sourceware.org
Stat_headcrabed has joined #riscv
coldfeet has joined #riscv
test925 has quit [Quit: Leaving]
<cousteau>
Thanks!
vagrantc has joined #riscv
<cousteau>
Hm. Not sure if I should do something about the numbers below the boxes being misleading as well (put all the numbers to the LEFT of the box boundary, except the final 31)
<cousteau>
Either the leftmost 31 should be a 32, and then the numbers are Python-style array delimiters, or they should clearly appear left or right of the line.
<cousteau>
I think I'll make one patch for each thing; I'm finding so many tiny "errors" this may take forever...
esvmicrosoft has joined #riscv
<cousteau>
There's also the issue that some immediates are called simm20[20:1] when I believe they should be simm21 (with bit [0] set to 0)
esvmicrosoft has left #riscv [#riscv]
<cousteau>
OK, I'll submit a patch with ONLY the U instruction change, and the opcode7 one
<palmer>
I'm trying to find someone to pickup the kernel side of things, we're kind of overloaded here right now...
<courmisch>
why does the kernel even need to care, if the emulation has to go in SBI?
<palmer>
I'd just do the emulation in the kernel, I generally try and keep stuff out of SBI where we can
<courmisch>
somebody was demonstrating that it has to be in SBI earlier
<palmer>
courmisch: can you point me to the post? I'm not really sure why we'd need it in SBI, we should be able to prevent the kernel from emitting these misaligned accesses at all (the'll kill performance, so even if we just turn off kernel-mode vector that's likely a net win)
BootLayer has quit [Quit: Leaving]
<conchuod>
palmer: I think it has to go into SBI for profile compliance.
<conchuod>
The profile requires Zicclsm for s-mode, which in turn means that it must be implement by m-mode.
<palmer>
Well, we'll still need it in the kernel for implemtations that don't claim profile compliance
<conchuod>
Why?
<conchuod>
If you dont see Zicclsm support, you dont do misaligned vector access
coldfeet has joined #riscv
<palmer>
ya, it just seems like that's going to end up being a rabbit hole of SIGILLs. This stuff is all hueristics in the compiler, so it's going to be wrong from time to time
<conchuod>
I don't think we need to be mopping up everyones implementation issues.
<conchuod>
The scalar one we "have" to do for abi reasons, vector we can avoid.
<palmer>
OK, I guess we'll just see how many bugs show up ;)
<palmer>
then we pretty much just need the hwprobe side of things so we can tell userspace how the accesses behave
<conchuod>
What we do need to do is fix uabi.rst to clarify that that's only for scalar access.
<conchuod>
palmer: we need a key for Zicclsm and I think that's sufficient?
<palmer>
nah, we need the performance side of things too
<conchuod>
Right, cos it could be implemented in fw
<palmer>
if misaligned vector loads are going to trap then we're gonig to need to avoid emiting them, regardless of whether those traps get fixed up
<conchuod>
I dunno if anyone in here volunteered to actually write the uabi.rst patch, I guess I shall but I'll probably botch the wording.
<palmer>
is Zicclsm not ratified yet?
<palmer>
I'm almost done writing it
<conchuod>
It's in the profiles spec
<conchuod>
palmer: sweet
<palmer>
ya, but a ratified spec?
<palmer>
ya, it's in a ratified one
<conchuod>
RVI screwed up the extension status pages cos of Jira integration.
<palmer>
ya, and the moved stuff around so Google 404s ;)
<palmer>
I think it's ratified...
<conchuod>
sorear said march 2023
<conchuod>
Which I think is correct cos Zicclsm is in rva20
<palmer>
it's in the github release that says "ratified", so that's good enough for me ;)
<palmer>
Ya, and if we do it in SBI it'll be harder to turn off
<conchuod>
If people implement hw like that and want profile compliance they're gonna do it in SBI.
<conchuod>
What do you do then, hope the firmware implements delegation of the trap to s-mode so that you can blow up on it in your kernel?
coldfeet has joined #riscv
<conchuod>
The other options rely on people noticing that the hwprobe keys are saying misaligned vector is emulated etc, right?
rsjw has quit [Ping timeout: 272 seconds]
jmdaemon has joined #riscv
test924 has joined #riscv
___nick___ has joined #riscv
<courmisch>
it would make sense to add a new hwprobe bit-field for this though
<courmisch>
FFmpeg already has a few cases that would benefit from unaligned vectors, though that depends on hardware actually existing
___nick___ has quit [Client Quit]
___nick___ has joined #riscv
test925 has joined #riscv
test924 has quit [Ping timeout: 240 seconds]
van3ll0pe has quit [Remote host closed the connection]
<unlord>
palmer: does Robin Dapp need access to hardware (re your link)
<palmer>
unlord: he's on Jeff's team at Ventana, so I assume he has what he needs -- though if you guys have a link to something that'll boot on that Spacemit chip that'd be awesome, we're all trying to get it booting something we can play with
<palmer>
(he's also wtw, but looks like he's not here now?)