sorear changed the topic of #riscv to: RISC-V instruction set architecture | https://riscv.org | Logs: https://libera.irclog.whitequark.org/riscv
vagrantc has quit [Quit: leaving]
pecastro has quit [Ping timeout: 252 seconds]
<dh`> on netbsd you just do build.sh -m riscv tools
geertu has quit [Ping timeout: 250 seconds]
geertu has joined #riscv
jimwilson_ has joined #riscv
frost has joined #riscv
jacklsw has joined #riscv
radu2424078 has joined #riscv
tgamblin has quit [Quit: Leaving]
<Sofia> jimwilson: I suspected so (mulhsu being basically free to implement). I've just yet to see anyone use it.
<Sofia> jimwilson: So that vadc is a pseudoinstruction then?
_whitelogger has joined #riscv
<Sofia> jimwilson: As I've said a few times here regarding adc. We really don't need it. If you're doing high performance math, you want to defer communication between limbs using a reduced radix representation. This cuts the dependency across limbs, so they may be computed in parallel by higher end cores. This could mean your 2-4 independent adds or multiplies all run in a single cycle. You do this same
<Sofia> optimization in the x86 world (where adc is simply too slow with its dependency). Ex. https://www.chosenplaintext.ca/articles/radix-2-51-trick.html
<Sofia> Once you've (analytically) done enough additions or multiplies that doing yet another would overflow in the worst case, you can apply the carries as needed. However this worst-case may not be your average case, so you might guard the communication with a fast-path check if your future adds or muls could overflow the current state. Further deferring in the average or best case. Only paying for the
<Sofia> carries if you added enough large numbers (worst case)
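A minimal Python sketch of the deferred-carry idea Sofia describes, using the radix-2^51 layout from the linked article (five 51-bit limbs, each held in a 64-bit word). The helper names (to_limbs, add_lazy, normalize) are made up for illustration, and Python integers stand in for registers, so the 64-bit word width is only modelled by an assertion.

```python
LIMB_BITS = 51                     # payload bits per limb
NLIMBS = 5                         # 5 * 51 = 255 bits per value
LIMB_MASK = (1 << LIMB_BITS) - 1
WORD_LIMIT = 1 << 64               # each limb lives in a 64-bit word


def to_limbs(x):
    """Split a non-negative integer below 2**255 into 51-bit limbs, least significant first."""
    return [(x >> (LIMB_BITS * i)) & LIMB_MASK for i in range(NLIMBS)]


def from_limbs(limbs):
    """Recombine limbs; also valid for limbs that still hold deferred carries."""
    return sum(v << (LIMB_BITS * i) for i, v in enumerate(limbs))


def add_lazy(a, b):
    """Limb-wise add with no carry between limbs, so the adds are independent."""
    out = [x + y for x, y in zip(a, b)]
    # Model the register width: the 13 spare bits must absorb the growth.
    assert all(v < WORD_LIMIT for v in out), "limb overflow: normalize earlier"
    return out


def normalize(limbs):
    """Propagate the deferred carries once; returns (limbs, carry out of the top limb)."""
    out, carry = [], 0
    for v in limbs:
        v += carry
        out.append(v & LIMB_MASK)
        carry = v >> LIMB_BITS
    return out, carry


# Sum 1000 worst-case values; only the final normalize has a serial carry chain.
x = (1 << 255) - 1
acc = to_limbs(0)
for _ in range(1000):
    acc = add_lazy(acc, to_limbs(x))
limbs, carry_out = normalize(acc)
```

With 13 spare bits per word, roughly 2^13 normalized values can be accumulated before a limb could overflow in the worst case, which is the analytic bound Sofia refers to.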
<sorear> i think he knows what reduced radix representations are
<Sofia> Easy enough to elaborate in case they don't, or others don't.
EchelonX has quit [Quit: Leaving]
<pabs3> jimwilson: s/multilib/multiarch/ :)
davidlt has joined #riscv
<dh`> mulhsu is supposed to be for the inner terms of double-width multiplies
<dh`> (this has come up here before)
<dh`> that's about all it's good for though
PyroPeter has quit [Ping timeout: 240 seconds]
PyroPeter has joined #riscv
davidlt has quit [Ping timeout: 252 seconds]
<Sofia> dh`: Can you elaborate? We do not get the instruction with i128 * i128 -> i128, nor u128 * u128 -> u128.
<The_Decryptor> https://godbolt.org/z/1Krcj69Ed < This produces it
<jimwilson_> I gave a C example to produce mulhsu earlier, basically same as the rust example.
<jimwilson_> vadc is a real instruction, produces the sum, vmadc produces the carry out, https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-integer-add-with-carry-subtract-with-borrow-instructions
riff-IRC has quit [Remote host closed the connection]
riff-IRC has joined #riscv
perro has quit [Ping timeout: 252 seconds]
<Sofia> jimwilson_: Oh I missed that line. Thanks
<Sofia> The_Decryptor: Thanks as well.
<Sofia> Expected results, make sense. :)
BOKALDO has joined #riscv
davidlt has joined #riscv
winterflaw has joined #riscv
pecastro has joined #riscv
bauruine has joined #riscv
voltron has joined #riscv
geertu has quit [Read error: Connection reset by peer]
geertu has joined #riscv
jjido has joined #riscv
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
jacklsw has quit [Quit: Back to the real world]
pecastro has quit [Ping timeout: 252 seconds]
pecastro has joined #riscv
elastic_1 has joined #riscv
elastic_dog has quit [Quit: elastic_dog]
FL4SHK has quit [Ping timeout: 250 seconds]
FL4SHK has joined #riscv
voltron has quit [Remote host closed the connection]
crabbedhaloablut has quit [Remote host closed the connection]
crabbedhaloablut has joined #riscv
tgamblin has joined #riscv
perro has joined #riscv
freakazoid343 has quit [Ping timeout: 250 seconds]
freakazoid333 has joined #riscv
freakazoid333 has quit [Remote host closed the connection]
freakazoid333 has joined #riscv
freakazoid333 has quit [Ping timeout: 240 seconds]
<Sofia> jimwilson_: Do you define the `sltnu a0 a0 a1` as `sub a0, x0, a0; sltu a0 a0 a1` ?
freakazoid333 has joined #riscv
freakazoid333 has quit [Read error: Connection reset by peer]
<Sofia> Or `sub a1, x0, a1; sltu a0 a0 a1` ?
freakazoid333 has joined #riscv
<jrtc27> write it in C and see what GCC and Clang emit..
<jrtc27> oh you mean which one is negated in such a hypothetical instruction
<Sofia> Yes
<jrtc27> ultimately it makes no difference
<Sofia> Does it? I'd expect this to have implications for a compressed variant.
<jrtc27> there is no compressed slt[i][u], let alone a negated one
<Sofia> Noted.
<Sofia> Then I guess it'd align with andn et al
<Sofia> X(rd) = X(rs1) & ~X(rs2)
<Sofia> rd = rs1 < (0 - rs2)
<Sofia> jrtc27: Doesn't it make a difference?
<Sofia> They look different to me. https://godbolt.org/z/vK1fna43o
<jrtc27> rs1 < 2^64 - rs2 <=> rs2 < 2^64 - rs1
<jrtc27> I believe
<Sofia> h => false
<Sofia> er
<Sofia> f(a, b) == g(b, a)
<Sofia> still false
jacklsw has joined #riscv
<jrtc27> yes, because it's meant to be b < 0-a
<Sofia> That isn't what I asked though?
<jrtc27> because if the instruction negates an input, you just swap the operands
<Sofia> The question is: is the left negated or is the right negated.
<jrtc27> yes
<jrtc27> and clearly it's not the exact same instruction
<jrtc27> but it's just as powerful
<Sofia> Right, but one of these fits the desired application better. I'm asking which.
<jrtc27> my point was that both are usable in the same situation
<jrtc27> since swapping which is negated just requires you to swap the operands
<jrtc27> however that's not quite true
<jrtc27> my maths only works in the integers, not mod 2^64
<jrtc27> if one of the operands is 0 then negating it still gives 0
<jrtc27> not 2^64
<jrtc27> if you do it as the one's complement it works
<Sofia> Ohh, what is this.
<jrtc27> runs the z3 solver on an encoding of llvm ir to verify the validity of transformations
* Sofia approves
<Sofia> :D
<jrtc27> often used to catch issues around poison, but also useful for more boring arithmetic checks
jjido has joined #riscv
<Sofia> I like how the bug list finds LLVM and Z3 bugs. https://github.com/AliveToolkit/alive2/blob/master/BugList.md
<Sofia> finds/shows
<Sofia> The bug has to be in one of them (or alive2..wait where is that list)
<jrtc27> I found a couple of bugs in how they handled float<->int rounding
<jrtc27> for the edge cases in the range (-1, 0]
<jrtc27> or maybe (-1, 0), I forget
<Sofia> Fun
<Sofia> I'm more interested in rule synthesis, ruleset reduction, and interactive verification (with automation tools available). https://github.com/uwplse/ruler + https://arxiv.org/abs/2108.10436 + https://yewtu.be/watch?v=5VGTXMgebOo
<Sofia> Ruler is not interactive though, it could be.
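jrtc27's correction earlier in the log (swapping the operands is not equivalent when the negation is taken mod 2^64 and an operand is 0, but it is with the one's complement) can be checked by brute force. This is only a sketch: the 8-bit width is an assumption chosen to keep the search small, and the function names are hypothetical, since sltnu is itself a hypothetical instruction.

```python
BITS = 8                       # small width so every operand pair can be tried
MASK = (1 << BITS) - 1


def slt_neg(rs1, rs2):
    """rs1 <u (0 - rs2), with the negation taken mod 2**BITS."""
    return rs1 < ((-rs2) & MASK)


def slt_not(rs1, rs2):
    """rs1 <u ~rs2, using the one's complement instead."""
    return rs1 < (~rs2 & MASK)


def asymmetric_pairs(op):
    """All (a, b) where swapping the operands changes the result."""
    return [(a, b) for a in range(MASK + 1) for b in range(MASK + 1)
            if op(a, b) != op(b, a)]


neg_failures = asymmetric_pairs(slt_neg)   # pairs where exactly one operand is 0
not_failures = asymmetric_pairs(slt_not)   # empty: rs1 + rs2 < 2**BITS - 1 is symmetric
```

In this model the negated form is asymmetric exactly when one operand is 0 and the other is not, because negating 0 gives 0 rather than 2^BITS.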
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
paddymahoney has joined #riscv
ntwk has quit [Quit: ntwk]
raghavgururajan has quit [Ping timeout: 268 seconds]
BOKALDO has quit [Quit: Leaving]
Andre_H has joined #riscv
Andre_H has quit [Client Quit]
BOKALDO has joined #riscv
winterflaw has quit [Remote host closed the connection]
<dh`> sofia: the premise is that if you multiply two pairs (h0, l0) * (h1, l1) and this is representing a signed value, the h0 * l1 and h1 * l0 products are unsigned * signed. This only matters I guess if you're doing e.g. 2x32 * 2x32 -> 4x32.
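A small Python model of dh`'s point, assuming RV32 (XLEN = 32): a signed 64x64 multiply built from 32-bit halves needs unsigned x unsigned for the low terms, signed x signed for the high terms, and signed x unsigned (mulhsu) for the two cross terms. The function mul_s64 is a hypothetical name, and the partial products are summed with Python integers rather than with modelled carry chains.

```python
XLEN = 32
MASK = (1 << XLEN) - 1


def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement value."""
    x &= MASK
    return x - (1 << XLEN) if x >> (XLEN - 1) else x


def mul(a, b):
    """Low 32 bits of the product (same for any signedness)."""
    return (a * b) & MASK


def mulhu(a, b):
    """High 32 bits of unsigned x unsigned."""
    return ((a & MASK) * (b & MASK)) >> XLEN


def mulh(a, b):
    """High 32 bits of signed x signed."""
    return ((to_signed(a) * to_signed(b)) >> XLEN) & MASK


def mulhsu(a, b):
    """High 32 bits of signed (first operand) x unsigned (second operand)."""
    return ((to_signed(a) * (b & MASK)) >> XLEN) & MASK


def mul_s64(a, b):
    """Signed 64 x 64 -> 128 product from 32-bit halves (h signed, l unsigned)."""
    l0, h0 = a & MASK, (a >> XLEN) & MASK
    l1, h1 = b & MASK, (b >> XLEN) & MASK
    ll = (mulhu(l0, l1) << XLEN) + mul(l0, l1)               # unsigned * unsigned
    hl = (to_signed(mulhsu(h0, l1)) << XLEN) + mul(h0, l1)   # signed * unsigned
    lh = (to_signed(mulhsu(h1, l0)) << XLEN) + mul(h1, l0)   # signed * unsigned
    hh = (to_signed(mulh(h0, h1)) << XLEN) + mul(h0, h1)     # signed * signed
    return (hh << (2 * XLEN)) + ((hl + lh) << XLEN) + ll
```

Using mulhu for the cross terms would give the wrong high word whenever the high half is negative: for the pattern 0xFFFFFFFF (that is, -1) times 2, mulhsu yields 0xFFFFFFFF while mulhu yields 1.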
jmdaemon has joined #riscv
jmdaemon has quit [Client Quit]
jmdaemon has joined #riscv
winterflaw has joined #riscv
freakazoid333 has quit [Ping timeout: 250 seconds]
freakazoid343 has joined #riscv
jmdaemon has quit [Client Quit]
jmdaemon has joined #riscv
jmdaemon has quit [Client Quit]
jmdaemon has joined #riscv
jmdaemon has quit [Client Quit]
freakazoid343 has quit [Remote host closed the connection]
freakazoid343 has joined #riscv
jmdaemon has joined #riscv
freakazoid343 has quit [Read error: Connection reset by peer]
freakazoid343 has joined #riscv
vagrantc has joined #riscv
jmdaemon has quit [Client Quit]
jmdaemon has joined #riscv
jmdaemon has quit [Changing host]
jmdaemon has joined #riscv
freakazoid343 has quit [Ping timeout: 250 seconds]
jmdaemon has quit [Client Quit]
jmdaemon has joined #riscv
riff-IRC has quit [Remote host closed the connection]
riff-IRC has joined #riscv
jmdaemon has quit [Client Quit]
jmdaemon has joined #riscv
jmdaemon has quit [Client Quit]
jmdaemon has joined #riscv
jacklsw has quit [Quit: Back to the real life]
mahmutov has joined #riscv
jmdaemon has quit [Quit: WeeChat 3.3]
jmdaemon has joined #riscv
jmdaemon has quit [Client Quit]
jmdaemon has joined #riscv
jmdaemon has quit [Remote host closed the connection]
jmdaemon has joined #riscv
jmdaemon has quit [Quit: WeeChat 3.3]
jmdaemon has joined #riscv
jmdaemon has quit [Client Quit]
jmdaemon has joined #riscv
geertu has quit [Read error: Connection reset by peer]
geertu has joined #riscv
nvmd has joined #riscv
jmdaemon has quit [Quit: WeeChat 3.3]
jmdaemon has joined #riscv
jmdaemon has quit [Client Quit]
Andre_H has joined #riscv
jmdaemon has joined #riscv
Ivii has joined #riscv
davidlt has quit [Ping timeout: 240 seconds]
<gordonDrogon> Hm. Well. After some head scratching and reading/searching/whatnot - failing to install a riscv32 compiler, etc. on my desktop, re-reading and learning about the linker's "emulation" flag I can finally compile a stand-alone C program into a binary file for RV32IM using the supplied riscv64 tools on my Devuan desktop. I feel it should not have been this hard, however now I can concentrate more on my emulator. Phew!
<muurkha> Congratulations!
<muurkha> have you written notes about what you had to do and what you had to learn in the process? maybe someone else would benefit from them
<gordonDrogon> history > MY_PRECIOUS
<gordonDrogon> as well as creating a Makefile that will be my future base for more experiments.
<la_mettrie> i haven't yet found the command which makes riscv-gcc compile for a 32-bit processor. the traditional -m32 switch (used in the x86 world) doesn't work with it.
<gordonDrogon> it all seems possible, just a matter of working out the right runes. Still a lot to learn but way back I learned Sparc by studying the output of the compiler, so even though my aim isn't to write/compile C (it'll be pure ASM) it's still good to look at.
<gordonDrogon> -march=rv32im -mabi=ilp32 -ffreestanding -fno-pic -nostdlib # is a good start
<gordonDrogon> linker appears to want: riscv64-linux-gnu-ld -melf32lriscv_ilp32f -nostdlib -Ttext=0x0 -N test.o -o test
<gordonDrogon> then: riscv64-linux-gnu-objcopy test -O binary test.bin
<gordonDrogon> and hexdump on test.bin agrees with objdump/disasm on test.o, so happy for now.
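The three steps gordonDrogon lists can be collected into one place. The sketch below only assembles the command lines; the flags and the ld/objcopy names are taken from the log, while the compiler name riscv64-linux-gnu-gcc, the -c/-o arguments on the compile step, and the function name are assumptions.

```python
def rv32im_build_commands(stem="test", prefix="riscv64-linux-gnu-"):
    """Return the compile, link and objcopy command lines as argv lists."""
    cc_flags = ["-march=rv32im", "-mabi=ilp32", "-ffreestanding", "-fno-pic", "-nostdlib"]
    return [
        # Compile only (-c); the gcc driver name is an assumption, the log never shows it.
        [prefix + "gcc", *cc_flags, "-c", stem + ".c", "-o", stem + ".o"],
        # Link as quoted in the log: 32-bit emulation, text at address 0, no linker script.
        [prefix + "ld", "-melf32lriscv_ilp32f", "-nostdlib", "-Ttext=0x0", "-N",
         stem + ".o", "-o", stem],
        # Strip the ELF container down to a raw memory image.
        [prefix + "objcopy", stem, "-O", "binary", stem + ".bin"],
    ]


commands = rv32im_build_commands()
```

On a machine with the cross tools installed, each list can be passed to subprocess.run(cmd, check=True) in order.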
<gordonDrogon> actually, I will do some C just for the "LoLs" ... so compiled C running on an emulator written in BCPL running on a 65c816 running at 16MHz. ... Just call me mad...
<gordonDrogon> not mad enough to try to run a Linux kernel on it though...
<gordonDrogon> (although there was that guy who did it on an ATmega running an ARM emulator...)
<sorear> kind of surprised you got ld to do anything useful without writing a linker script, OTOH I've never used the "emulation" option so maybe they achieve the same thing
<gordonDrogon> I'd never heard of it until I started to actually read the error messages...
<jrtc27> the emulation does basically nothing
<jrtc27> for lld it just sets the machine, class, endianness and, sometimes, OSABI
<jrtc27> for bfd it also sets some default multilib search paths because apparently that's the linker's job now
<jrtc27> and not the driver's, ie gcc/clang
<muurkha> gordonDrogon: may also be useful for your own learning to review and summarize. though maybe you don't really care about enhancing your learning about the binutils build system
<jrtc27> lld takes the view anything like that is gcc/clang's job
<gordonDrogon> I'm quite out of touch these days, so it's useful learning. I'll blog it all when I have something to show though.
<gordonDrogon> it may help others, but my ultimate aim is to run BCPL on risc-v, so the target audience is very small... single digits < 2 possibly...
<gordonDrogon> bcpl compiles to a bytecode so in theory, I just write the bytecode interpreter/vm and it all "just works" ...
<muurkha> but getting RV32IM working is useful for a much larger fraction of people
<gordonDrogon> I've done this once for the 65816 and grew to loathe it....
<muurkha> you didn't enjoy writing a bytecode interpreter in 65816 assembly?
<gordonDrogon> muurkha, I'm sure it will be - lots of people wanting to use things like esp32-c3 which is a candidate for me, even though they do have their own tools for you to download/compile, etc.
<gordonDrogon> the '816 is just old - it was too little, too late even when it was new. 64KB banks of RAM which makes it awkward to use.
<muurkha> correct me if I'm wrong but A is pointless for single-processor systems and FD is not worth the cost for a lot of things, so I'd think RV32IM would be maybe the #4 most common version of RISC-V after RV64G, RV32G, and RV32I?
<muurkha> but also of course there's the more general issue of "how do I configure my compiler and binutils to support weird variants of RISC-V"
<gordonDrogon> I decided to look at the '816 after building and playing with a few 6502 systems in recent years - the 6502 was the first microprocessor I used back in '78 ...
<gordonDrogon> muurkha, single precision FP is handy - sometimes - currently in my '816 system, I pass all FP operations to the board's "co-processor" which is an ATmega ... I'm sure there must be some FP libraries that work on the 32IM target though, but writing my own isn't beyond the realms of possibility.
BOKALDO has quit [Quit: Leaving]
<muurkha> yeah, FP is sometimes very useful indeed
<gordonDrogon> I just like the idea of risc-v ... the world has enough arms - and I did look at ARM way back then (owned an Acorn Archimedes) but never really got into it, then Sparc came along and it felt better, so never went back to ARM...
<muurkha> hardware FP I mean. FP in general is even more often useful
<gordonDrogon> depends on your target application - back in early BCPL days (late 60's!!!) doing a muldiv which preserved precision was faster than FP but today when you can do an FP operation in more or less a single cycle, who knows ...
<gordonDrogon> (although I think some people get lazy, but they can get off my lawn ;-)
<gordonDrogon> right. supper time here. back later :)
<muurkha> well, hardware FP costs a lot of transistors. those transistors might be better spent on a bigger cache or a second RV32I or RV32IM processor
<sorear> A is not useless if you have interrupts
<muurkha> oh interesting, I didn't realize that, sorear. thank you!
<jrtc27> yeah, RV32IMAC is a good target for microcontrollers
<jrtc27> unless you're *really* space-constrained for your cores somehow
<jrtc27> (or power)
<muurkha> why did I leave out C? does it ever make sense to leave out C except for class exercises?
<jrtc27> it does complicate instruction fetch
<jrtc27> but decompression is a big combinatorial blob
<jrtc27> (well, not that big)
<jrtc27> so not complex to implement, just area
<dh`> also easy to get wrong
<muurkha> and it saves you I$ area, no, jrtc27?
<sorear> there's also a spectrum of options for FD that I think has been inadequately explored by current foss implementations (Rocket's FPU is the size of a 16-32kB cache, but it's designed to do a 52 bit FMA _every cycle_ and you know historical chips were slower than that)
<muurkha> I can't help but read "FMA" as "fuck my accumulator"
<muurkha> in, of course, the pejorative sense, not some kind of sexual sense
<jrtc27> see also sifive's fu540 and fu740
<muurkha> haha
<sorear> there have been vague arguments about highly specialized accelerator cores that run very little total code
<muurkha> where you use the RISC-V CPU as a sort of scripting engine to configure the accelerator?
<sorear> mostly for "class exercises" though. (completely decoupling PC generation, fetch, and decode has its advantages)
<muurkha> I took apart a DVD player once and was astounded to find it was driven by an 8051
<sorear> more like using risc-v instead of a completely accelerator-specific microcode
vagrantc has quit [Quit: leaving]
<muurkha> not an Intel 8051, but a humongous BGA chip consisting mostly of video codec hardware, integrating an 8051 IP core
<dh`> seems like you ought to be able to make a very small microcoded fpu that would be horribly slow but cover you for the cases where you just want to be able to execute the occasional fp instruction
<muurkha> so I thought that might be the kind of thing you meant
<dh`> (and still cheaper than software emulation in M mode)
<muurkha> sorear: what do you mean by "instead of a completely accelerator-specific microcode"?
<sorear> like if you need a state machine to control things
<sorear> as i said vague
jmdaemon has quit [Read error: Connection reset by peer]
jmdaemon has joined #riscv
<muurkha> oh, "a state machine to control things" was what I thought I was saying
<muurkha> you could argue that the Cray-1 also sort of worked that way
smartin has quit [Remote host closed the connection]
smartin has joined #riscv
jmdaemon has quit [Ping timeout: 268 seconds]
jmdaemon has joined #riscv
jmdaemon has quit [Read error: Connection reset by peer]
jmdaemon has joined #riscv
jmdaemon has quit [Read error: Connection reset by peer]
jmdaemon has joined #riscv
jmdaemon has quit [Client Quit]
jmdaemon has joined #riscv
jmdaemon has quit [Client Quit]
jmdaemon has joined #riscv
jmdaemon has quit [Client Quit]
EchelonX has joined #riscv
jmdaemon has joined #riscv
vagrantc has joined #riscv
mahmutov has quit [Ping timeout: 260 seconds]
<gordonDrogon> I did some work in the i860 once upon a time - lost too many braincells on it, but it could do a multiply and add per cycle, but you had to sort of manually pump the pipeline. it was horrible to recover from and restart after an interrupt too.
<gordonDrogon> and re. the cost of transistors - this old (ish) timer moans at all the stuff they put on SoCs now - when I just want a simple CPU why burden the package with wi-fi, bt, AI, and who knows what else. That's progress, I'm told ...
<sorear> wi-fi is an interesting example because of the antenna but the short answer is "you'd save much more area by getting rid of the pad ring"
smartin has quit [Quit: smartin]
bauruine has quit [Remote host closed the connection]
Ivii has quit [Quit: Leaving]
Andre_H has quit [Quit: Leaving.]
<muurkha> a nice thing about grunty coprocessors is that they don't have to implement interrupts (other than reset)
<muurkha> I mean reset is an interrupt but the machine is useful even if your program can't recover from it
nvmd has quit [Quit: Later, nerds.]
nvmd has joined #riscv
jmdaemon has quit [Quit: WeeChat 3.3]
jmdaemon has joined #riscv
jmd has joined #riscv
nvmd has quit [Quit: Later, nerds.]
jmdaemon has quit [Quit: WeeChat 3.3]
jmd has quit [Quit: ZNC 1.8.2 - https://znc.in]