sorear changed the topic of #riscv to: RISC-V instruction set architecture | https://riscv.org | Logs: https://libera.irclog.whitequark.org/riscv
jmdaemon has joined #riscv
jmdaemon has quit [Changing host]
jmdaemon has joined #riscv
jmdaemon has quit [Client Quit]
<muurkha> solrize: 6.6 DMIPS per MHz is extremely impressive; that would be about 5 instructions per cycle?
<muurkha> from https://linuxgizmos.com/hifive-unmatched-sbc-showcases-new-fu740-risc-v-soc/ I had concluded that a CoreMark was about 2 Dhrystone MIPS, and this is 7.6 CoreMarks per MHz
<solrize> i dunno, it is comparable to current top arm cpu, maybe it is equivalent in 32 bit mips, so a single addition in 64 bits is like 2 or 3 instructions in the mips count
<muurkha> you mean the Apple ARMs?
<muurkha> or the weaker ARM ARMs?
<solrize> arm76 i think
<muurkha> I don't think there are a lot of 64-bit adds in Dhrystone, maybe in CoreMark
<solrize> see the cnx discussion thread
<solrize> dhrystone is very bogus
<muurkha> of course
mahmutov has quit [Ping timeout: 256 seconds]
<muurkha> the Apple M1 ARM blows all the Cortex-A* designs away in performance, doesn't it?
<solrize> dunno
<solrize> this says the new riscv thing is comparable to cortex a75
<solrize> i still want to see some benchmarks with gcc -trapv
ahs3_ has quit [Remote host closed the connection]
ahs3_ has joined #riscv
radu242407 has quit [Ping timeout: 256 seconds]
<muurkha> -ftrapv?
ahs3 has quit [Quit: Ex-Chat]
ahs3_ has quit [Remote host closed the connection]
ahs3 has joined #riscv
pecastro has quit [Ping timeout: 256 seconds]
ahs3 has quit [Remote host closed the connection]
ahs3 has joined #riscv
cwebber has quit [Ping timeout: 252 seconds]
<solrize> that traps on signed int overflow, which is undefined behaviour in C and causes a lot of program bugs and CVEs. would also like to see multi precision arithmetic benchmarks
<solrize> the idea is to test the effect of riscv not having an overflow flag, which is a recurring criticism of the architecture
<muurkha> I know, I meant "do you mean -ftrapv instead of -trapv?"
<muurkha> I think the missing carry flag is the more common criticism
<muurkha> if you made the registers 33 bits and added an ADC instruction that adds the carry bit from one register into the value in another, maybe the criticism would mute somewhat. but I think it's more that people aren't used to it than that it's actually bad
cwebber has joined #riscv
<Sofia> muurkha: I'm glad we don't have add-with-carry on normal ints. We have it for vectors, vadc, where it makes sense.
<Sofia> If you want to do high performance multi-precision arithmetic, you don't want to do carries; those are slow dependent instructions. You want to defer carry propagation. Use a smaller radix, i.e. 2^51 as linked last night.
<Sofia> We do this even on x86. adc doesn't help for perf.
<Sofia> 2^51 is 2^64 / 2^13, where 13 is the number of additions you can do between carry propagations.
<sorear> 8086 doesn't have adc "because crypto"; it has adc because you need it for 32-bit numbers, and quite a lot of variables need to be 32-bit, especially when you can address more than 64kB of memory
<Sofia> Going the other way; you can use "holes" to do computations which overflow, then mask, and combine. https://docs.rs/crate/polyval/latest/source/src/backend/soft64.rs
<Sofia> See: fn bmul64(x: u64, y: u64) -> u64 {
<Sofia> High performance multi-precision is more than just crypto.
<Sofia> If you're not doing high-performance, then you didn't need it anyway.
<sorear> it's a few things, but if you can do 32 or 64 bit natively that cuts the number of uses way down, because most numbers are small
<Sofia> Ah you're talking about 16-bit days. I forget x86 started 16-bit. :P
<Sofia> Also, aren't slt-add pairs friendly for fusing? I don't know the C details off the top of my head for these two, but that'd be a 32-bit or 64-bit fuse in the best or worst case.
<sorear> not really since you now have 3 inputs
<Sofia> Hmm
<sorear> if you're optimizing for long multiprecision adds... you've got 3 memory instructions for every 5 arithmetic, the memory probably cost more
<sorear> (as well as a kind of unavoidable loop-carried dependency)
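A minimal sketch of the long multi-precision add being discussed, written without a carry flag. It is a Python model, not real RISC-V code: the 64-bit register width, the `MASK64` constant and the `add_limbs` name are assumptions made for illustration, and each comparison stands in for one `sltu`. Per limb it uses two loads, one store and five arithmetic operations, which appears to match the 3:5 ratio mentioned above, and the `carry` variable is the loop-carried dependency.

```python
MASK64 = (1 << 64) - 1  # assumption: model a 64-bit register (XLEN = 64)

def add_limbs(a, b):
    """Add two equal-length little-endian lists of 64-bit limbs.

    Returns (sum_limbs, carry_out). Hypothetical helper for illustration.
    """
    assert len(a) == len(b)
    out = []
    carry = 0
    for x, y in zip(a, b):          # two loads per limb
        s = (x + y) & MASK64        # add
        c1 = 1 if s < x else 0      # sltu: did x + y wrap around?
        t = (s + carry) & MASK64    # add the incoming carry
        c2 = 1 if t < s else 0      # sltu: did adding the carry wrap around?
        carry = c1 | c2             # or: at most one of c1, c2 can be set
        out.append(t)               # one store per limb
    return out, carry
```

For example, `add_limbs([MASK64, MASK64], [1, 0])` carries out of both limbs and returns `([0, 0], 1)`.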
* Sofia nods
<jrtc27> ah that thread
<jrtc27> *cares about performance*
<jrtc27> *doesn't bother to allocate registers such that the adds are compressible*
<sorear> on closer inspection all of the add/sub/shift through carry instructions were taken directly from the 8080
<sorear> imagine an 8 bit arch that took riscv's approach to carries, it wouldn't work at all.
MichaelZhu has joined #riscv
MichaelZhu has quit [Remote host closed the connection]
MichaelZhu has joined #riscv
* Sofia imagines the memory availability of an 8-bit machine and the size of even RISC-V's compressed instructions...
<sorear> afaik all of them had 16 bit address spaces
<sorear> you were not expected to populate all of that, sometimes they didn't even bond out all the address pins
<Sofia> Better and yikes
<Sofia> XD
<Sofia> Intel wants to take back the rest of the 64-bits in its address space IIUC.
<Sofia> I wonder what for.
MichaelZhu has quit [Remote host closed the connection]
MichaelZhu has joined #riscv
<sorear> hmm? they have five-level paging now because 128 TiB is no longer an unreasonable amount of memory for scale-up customers
MichaelZhu has quit [Remote host closed the connection]
MichaelZhu has joined #riscv
MichaelZhu has quit [Remote host closed the connection]
MichaelZhu has joined #riscv
<solrize> amd64 has 5 levels i thought?
<solrize> muurkha, yeah -ftrapv
<solrize> sofia it's not just about multi precision, it's also about detecting errors instead of having them mess things up unnoticed
<Sofia> solrize: Which exactly? Having carry flags or adding with carry?
<Sofia> sorear: On a single socket...?
<Sofia> Detecting overflows for errors I get. slt or blt are what you want there usually.
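A sketch of the slt-based signed overflow check mentioned here, modelled in Python. The comments show the add / slti / slt / bne sequence that the RISC-V unprivileged spec suggests for checked signed addition; the `to_signed` and `add_checked` names and the fixed `XLEN = 64` are assumptions for illustration, and both inputs are assumed to already lie in the signed 64-bit range.

```python
XLEN = 64                      # assumption: model RV64
MASK = (1 << XLEN) - 1

def to_signed(v):
    """Interpret the low XLEN bits of v as a two's-complement integer."""
    v &= MASK
    return v - (1 << XLEN) if v >> (XLEN - 1) else v

def add_checked(a, b):
    """Return (wrapped_sum, overflowed) for signed XLEN-bit a + b."""
    s = to_signed(a + b)         # add   t0, t1, t2
    b_neg = b < 0                # slti  t3, t2, 0
    s_lt_a = s < a               # slt   t4, t0, t1
    return s, b_neg != s_lt_a    # bne   t3, t4, overflow
```

The idea is that the sum should be smaller than `a` exactly when `b` is negative; any disagreement means the result wrapped.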
jacklsw has joined #riscv
MichaelZhu has quit [Remote host closed the connection]
MichaelZhu has joined #riscv
MichaelZhu has quit [Read error: Connection reset by peer]
PyroPeter has quit [Ping timeout: 260 seconds]
PyroPeter has joined #riscv
MichaelZhu has joined #riscv
MichaelZhu has quit [Remote host closed the connection]
MichaelZhu has joined #riscv
MichaelZhu has quit [Read error: Connection reset by peer]
<muurkha> Sofia: oh, that's interesting! I didn't have any idea about the performance cost of ADC
<muurkha> I think it might be more reasonable to say that x86 started 8-bit
<Sofia> Either way x86 is a nightmare we shouldn't need to use...
MichaelZhu has joined #riscv
<Sofia> muurkha: See the /query I opened btw. Re graphics.
MichaelZhu has quit [Remote host closed the connection]
<muurkha> https://dercuano.github.io/notes/computer-algebras-ii.html#addtoc_4 has some of my notes on the history of the x86 instruction set
MichaelZhu has joined #riscv
<muurkha> although there are some embarrassing errors in it
<muurkha> I think actually RV32C is competitive on code density with things like the 8080 and the 6502, because on those processors you had to do a lot of fiddling around to do simple things (like, as sorear mentioned, 32-bit arithmetic, and even 16-bit in most cases)
<Sofia> "needed 16.9 ms for a division" 60Hz is already a budget of 16.66 ms. No realtime on that thing.. XD
MichaelZhu has quit [Ping timeout: 252 seconds]
MichaelZhu has joined #riscv
MichaelZhu has quit [Ping timeout: 268 seconds]
mahmutov has joined #riscv
MichaelZhu has joined #riscv
MichaelZhu has quit [Remote host closed the connection]
MichaelZhu has joined #riscv
<solrize> Sofia, adc helps for multiprec, carry flag helps for error detection, riscv omits both. once there is a carry flag there is probably also adc of course. anyway there doesn't have to be either if there is some other mechanism, but there isn't one afaict
<solrize> just some handwaving about op fusion
<solrize> i did see that gmp mailing list thread
MichaelZhu has quit [Ping timeout: 250 seconds]
<Sofia> solrize: My point there was if you're doing high performance math with larger than XLEN wide numbers, then you should be using a reduced radix representation. 51-bit limbs in 64-bit words is commonly used in cryptography, for instance. Or 28-bit limbs in 32-bit words. This defers the communication between the limbs/digits. Eventually, in this case after 13 additions, you must do the carry propagation (or at least check if you need to, and you only need to if you want to do more additions than could fit in the remaining bits)
<solrize> sofia that is only because of using float64 because of the originally slow int mult on the pentium
<Sofia> What? I didn't mention floats.
MichaelZhu has joined #riscv
MichaelZhu has quit [Remote host closed the connection]
MichaelZhu has joined #riscv
frost has joined #riscv
<solrize> the 51 bit limbs resulted from 53 bit mantissas in float doubles
<muurkha> heh, yeah, an IBM Type 650 was not going to be generating a 60Hz video signal
<muurkha> there are more reasons to use 51-bit limbs than squeezing 53-bit multiplies out of an FPU
<muurkha> like the one Sofia mentioned above
MichaelZhu has quit [Ping timeout: 256 seconds]
<solrize> if you add 13 51-bit ints you get at most a 55 bit int
MichaelZhu has joined #riscv
<Sofia> ((1<<52) - 1) * 13 = 0b11001111111111111111111111111111111111111111111111110011, equal length to 55-bit indeed. Ah. I misrecalled. It isn't 13 additions, it is 2^13 additions.
MichaelZhu has quit [Remote host closed the connection]
<Sofia> ((1<<52) - 1) << 13 = 0b11111111111111111111111111111111111111111111111111110000000000000.
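A rough sketch of the reduced-radix idea with 51-bit limbs in 64-bit words: limb-wise adds with no carry chain, followed by one deferred propagation pass. All names here (`to_limbs`, `from_limbs`, `add_lazy`, `propagate`) are hypothetical, and the model uses Python integers with an explicit check that every limb still fits in 64 bits. With 13 spare bits per word, about 2^13 fully loaded 51-bit summands fit before the headroom runs out.

```python
LIMB_BITS = 51
LIMB_MASK = (1 << LIMB_BITS) - 1
WORD_MASK = (1 << 64) - 1        # assumption: limbs live in 64-bit words

def to_limbs(n, count):
    """Split a non-negative integer into `count` 51-bit limbs, low limb first."""
    return [(n >> (LIMB_BITS * i)) & LIMB_MASK for i in range(count)]

def from_limbs(limbs):
    """Recombine limbs (which may exceed 51 bits) into one integer."""
    return sum(v << (LIMB_BITS * i) for i, v in enumerate(limbs))

def add_lazy(a, b):
    """Limb-wise add with no carry chain; every add is independent."""
    out = [x + y for x, y in zip(a, b)]
    assert all(v <= WORD_MASK for v in out), "headroom exhausted"
    return out

def propagate(limbs):
    """Deferred carry pass: push the bits above bit 51 into the next limb."""
    out, carry = [], 0
    for v in limbs:
        v += carry
        out.append(v & LIMB_MASK)
        carry = v >> LIMB_BITS
    return out, carry
```

Accumulating `1 << 13` copies of an all-ones 255-bit value with `add_lazy` stays within the 64-bit words, and a single `propagate` call afterwards restores canonical 51-bit limbs.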
MichaelZhu has joined #riscv
MichaelZhu has quit [Remote host closed the connection]
MichaelZhu has joined #riscv
MichaelZhu has quit [Remote host closed the connection]
MichaelZhu has joined #riscv
<solrize> you normally won't do 2**13 additions of such size as part of a multiplication. i will check how gmp does it, but for cryptography sized numbers usually you'd make a convolution-style product, do lots of 64*64->128 mults and accumulate into 3 registers using adc
<solrize> look at the MIRACL paper for how to do that in 32 bits on a register starved x86-32
<solrize> 386-style i mean
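A generic sketch of the convolution-style (product-scanning) multiply described above: each output column sums 64x64->128 products into a three-word accumulator, which is where an add / adc / adc chain would be used on x86. This is a Python model for illustration only, not GMP's or MIRACL's actual code; `mac3` and `mul_product_scanning` are hypothetical names.

```python
MASK64 = (1 << 64) - 1

def mac3(acc, x, y):
    """acc = (lo, mid, hi); add the 128-bit product x * y into it."""
    lo, mid, hi = acc
    p = x * y                          # 64x64 -> 128-bit product
    p_lo, p_hi = p & MASK64, p >> 64
    t = lo + p_lo                      # add
    lo, c = t & MASK64, t >> 64
    t = mid + p_hi + c                 # adc
    mid, c = t & MASK64, t >> 64
    hi = (hi + c) & MASK64             # adc with zero
    return lo, mid, hi

def mul_product_scanning(a, b):
    """Multiply two equal-length little-endian lists of 64-bit limbs."""
    n = len(a)
    assert len(b) == n
    out = []
    acc = (0, 0, 0)
    for k in range(2 * n - 1):         # one pass per output column
        for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
            acc = mac3(acc, a[i], b[k - i])
        out.append(acc[0])             # low word is the finished column
        acc = (acc[1], acc[2], 0)      # shift the accumulator down one word
    out.append(acc[0])                 # top limb of the 2n-limb result
    return out
```

Without a carry flag, each `adc` step in `mac3` would need an extra compare such as `sltu` to recover the carry.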
MichaelZhu has quit [Remote host closed the connection]
MichaelZhu has joined #riscv
<Sofia> Multiplication eats the bits a lot faster :)
<solrize> the original pentium and maybe 486 had much faster FMUL than MUL, enough that it was worth doing the MAC's in floating point
MichaelZhu has quit [Remote host closed the connection]
<solrize> but later processors fixed that
MichaelZhu has joined #riscv
MichaelZhu has quit [Remote host closed the connection]
<Sofia> Fun
riff-IRC has quit [Remote host closed the connection]
riff-IRC has joined #riscv
davidlt has joined #riscv
aburgess has quit [Ping timeout: 252 seconds]
octav1a has quit [Ping timeout: 265 seconds]
BOKALDO has joined #riscv
frost has quit [Quit: Connection closed]
frost has joined #riscv
frost has quit [Quit: Connection closed]
svendre has joined #riscv
pabs3 has quit [Ping timeout: 240 seconds]
mahmutov has quit [Ping timeout: 250 seconds]
davidlt has quit [Ping timeout: 268 seconds]
pabs3 has joined #riscv
svendre has quit [Quit: Client closed]
jjido has joined #riscv
winterflaw has joined #riscv
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
BOKALDO has quit [Quit: Leaving]
jjido has joined #riscv
jmdaemon has joined #riscv
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
mahmutov has joined #riscv
mahmutov has quit [Ping timeout: 240 seconds]
davidlt has joined #riscv
<Sofia> Do any of the RISC-V extensions write to more than a single register in a single instruction?
<Sofia> Fusions aside.
BOKALDO has joined #riscv
pecastro has joined #riscv
la_mettrie has quit [Quit: WeeChat 2.7.1]
la_mettrie has joined #riscv
jmdaemon has quit [Quit: WeeChat 3.3]
jacklsw has quit [Quit: Back to the real life]
gktrk has quit [Ping timeout: 245 seconds]
torbjornsd has joined #riscv
aburgess has joined #riscv
mahmutov has joined #riscv
jjido has joined #riscv
jacklsw has joined #riscv
octav1a has joined #riscv
cwebber has quit [Ping timeout: 252 seconds]
MichaelZhu has joined #riscv
EchelonX has joined #riscv
MichaelZhu has quit [Quit: Leaving...]
freakazoid343 has quit [Read error: Connection reset by peer]
freakazoid343 has joined #riscv
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
davidlt has quit [Ping timeout: 256 seconds]
jjido has joined #riscv
freakazoid343 has quit [Read error: Connection reset by peer]
freakazoid343 has joined #riscv
davidlt has joined #riscv
jacklsw has quit [Quit: Back to the real life]
Andre_H has joined #riscv
voltron has joined #riscv
voltron has quit [Remote host closed the connection]
radu242407 has joined #riscv
gdd has quit [Ping timeout: 252 seconds]
gdd has joined #riscv
mahmutov has quit [Ping timeout: 268 seconds]
elastic_dog has quit [Ping timeout: 250 seconds]
elastic_dog has joined #riscv
cousteau has joined #riscv
radu242407 has quit [Ping timeout: 250 seconds]
cousteau has quit [Read error: Connection reset by peer]
winterflaw has quit [Remote host closed the connection]
radu242407 has joined #riscv
radu242407 has quit [Read error: Connection reset by peer]
radu242407 has joined #riscv
BOKALDO has quit [Quit: Leaving]
radu242407 has quit [Read error: Connection reset by peer]
radu2424078 has joined #riscv
radu2424078 has quit [Read error: Connection reset by peer]
<gordonDrogon> Frustrating: /usr/riscv64-linux-gnu/include/bits/wordsize.h:28:3: error: #error "rv32i-based targets are not supported"
voltron has joined #riscv
jellydonut has quit [Quit: jellydonut]
jellydonut has joined #riscv
cousteau has joined #riscv
radu2424078 has joined #riscv
voltron has quit [Remote host closed the connection]
voltron has joined #riscv
voltron has quit [Remote host closed the connection]
voltron has joined #riscv
voltron has quit [Remote host closed the connection]
radu2424078 has quit [Quit: The Lounge - https://thelounge.chat]
radu2424078 has joined #riscv
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
radu2424078 has quit [Read error: Connection reset by peer]
davidlt has quit [Ping timeout: 256 seconds]
zjason has quit [Read error: Connection reset by peer]
zjason has joined #riscv
radu2424078 has joined #riscv
radu2424078 has quit [Read error: Connection reset by peer]
radu2424078 has joined #riscv
mahmutov has joined #riscv
radu2424078 has quit [Read error: Connection reset by peer]
radu2424078 has joined #riscv
radu2424078 has quit [Read error: Connection reset by peer]
smartin has quit [Remote host closed the connection]
smartin has joined #riscv
radu2424078 has joined #riscv
radu2424078 has quit [Read error: Connection reset by peer]
radu2424078 has joined #riscv
mahmutov has quit [Ping timeout: 250 seconds]
<jimwilson> rv64 linux distros aren't multiarch/multilib, as there is no rv64 hardware or kernel support for rv32 binaries
<cousteau> are there rv32 linux distros at all? I thought it was only rv64
<jimwilson> openembedded supports rv32, gentoo wants to support rv32 last I looked
<cousteau> also I thought it was possible to set all/some rv64 into "32-bit mode" by writing to some register
<cousteau> jimwilson: oh I see
<cousteau> didn't know
<jimwilson> the ISA supports rv32 on rv64, I don't know of any hardware that implements it
<cousteau> oh... yeah I should've supposed it was optional
radu2424078 has quit [Quit: The Lounge - https://thelounge.chat]
<cousteau> anyway, gotta go! Bye
radu2424078 has joined #riscv
cousteau has quit [Quit: ♫ I can't forget the day I shot that network down ♫]
winterflaw has joined #riscv
<jimwilson> Sofia, the ISA supports mulhsu because it was effectively free for the hardware to provide it
<jimwilson> gcc can generate it with a widening signed/unsigned multiply https://godbolt.org/z/3fW6bcrGE
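For reference, a small Python model of what mulhsu computes: the upper XLEN bits of a signed-by-unsigned product. The `mulhsu` function name and the fixed `XLEN = 64` are assumptions for illustration; this sketches the instruction's semantics and is not compiler output.

```python
XLEN = 64                      # assumption: model RV64
MASK = (1 << XLEN) - 1

def mulhsu(rs1, rs2):
    """Upper XLEN bits of signed(rs1) * unsigned(rs2), as an XLEN-bit pattern."""
    a = rs1 & MASK
    if a >> (XLEN - 1):
        a -= 1 << XLEN                 # reinterpret rs1 as signed
    b = rs2 & MASK                     # rs2 stays unsigned
    # Python's >> floors on negative values, like an arithmetic shift.
    return ((a * b) >> XLEN) & MASK
```

For example, `mulhsu(MASK, MASK)` treats the first operand as -1, so the high word comes out as all ones.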
<muurkha> jimwilson: makes sense
jjido has joined #riscv
vagrantc has joined #riscv
<jimwilson> Sofia, I asked for add-with-carry in https://github.com/riscv/riscv-bitmanip/issues/166
<jimwilson> the vector extension can write results to register pairs/quads/octuples, but that is still one result
<jimwilson> I don't know of any extension that can produce two results from one operation
<jimwilson> the V extension has add-with-carry, but it is two instructions, one to produce the sum, and one to produce the carry out
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
torbjornsd has quit [Remote host closed the connection]
winterflaw has quit [Ping timeout: 276 seconds]
<gordonDrogon> I'm just trying to cross-compile on my x86 linux desktop, but I understand it a bit more now - target is rv32i, plan is to compile some C to understand what gcc-as expects then get that to a freestanding binary which I can then use to slowly test my emulator.
cwebber has joined #riscv
<jimwilson> debian and ubuntu 21.04+ have multilib riscv-elf cross compilers, you can download precompiled toolchains from various places like sifive, codesourcery, and embecosm
radu2424078 has quit [Ping timeout: 260 seconds]
<jimwilson> or you can try building one yourself from github.com/riscv/riscv-gnu-toolchain
<la_mettrie> apt-get install gcc-riscv64-unknown-elf
<la_mettrie> (...includes rv32)
Andre_H has quit [Quit: Leaving.]
jimwilson has quit [Quit: Leaving]
aburgess_ has joined #riscv
aburgess has quit [Ping timeout: 252 seconds]
jimwilson has joined #riscv