sorear changed the topic of #riscv to: RISC-V instruction set architecture | https://riscv.org | Logs: https://libera.irclog.whitequark.org/riscv
freakazoid343 has quit [Read error: Connection reset by peer]
freakazoid12345 has quit [Ping timeout: 250 seconds]
freakazoid343 has joined #riscv
freakazoid12345 has joined #riscv
freakazoid343 has quit [Ping timeout: 250 seconds]
jwillikers has quit [Read error: Connection reset by peer]
jwillikers has joined #riscv
freakazoid343 has joined #riscv
freakazoid12345 has quit [Ping timeout: 240 seconds]
balrog_ has quit [Quit: Bye]
jwillikers has quit [Read error: Connection reset by peer]
jwillikers has joined #riscv
jacklsw has joined #riscv
balrog has joined #riscv
jwillikers has quit [Remote host closed the connection]
freakazoid12345 has joined #riscv
freakazoid343 has quit [Ping timeout: 250 seconds]
vagrantc has quit [Quit: leaving]
cousteau has quit [Quit: ♫ I can't forget the day I shot that network down ♫]
kaph has joined #riscv
freakazoid343 has joined #riscv
freakazoid12345 has quit [Ping timeout: 250 seconds]
cwebber has joined #riscv
<Sofia> jimwilson: Thanks for all that information ! <3
cwebber has quit [Remote host closed the connection]
<Sofia> jimwilson: Do you know a document which describes the versions like you have?
<jimwilson> drmpeg, I'm "retired", at least that is the story for now, until I get bored or run out of money
<jimwilson> Sofia, I reported the doc bugs to SiFive via the forums and got an ack, but it could be months before the docs are updated
<jimwilson> I don't know of any docs explaining the version numbering scheme
<drmpeg> Cool. I think we're all going to miss your insider info though.
<drmpeg> I've been retired for almost 11 years now and love it. It's not for everyone I guess. But for me, I'm happy with taking a shower as being my big accomplishment for the day.
<drmpeg> I did get a little wound up this fall and did a pretty good size project. So still trying to stay sharp. https://github.com/drmpeg/gr-atsc3
<Sofia> Okay. Thanks jimwilson
<Sofia> Can't trust the docs -> Good thing I plan to generate programs which test the timing cost functions against the target hardware. :)
<Sofia> Delta debugging style to learn the right details.
<jimwilson> the instruction scheduling code in gcc and llvm is another source of info, e.g. gcc/config/riscv/sifive-7.md in gcc, this info is sometimes tweaked to avoid secondary effects but generally should be accurate
<jimwilson> there are plenty of toolchain related projects I could work on, but for now I've been finding other things to do as I unwind from my SiFive job
<Sofia> .md, not markdown. A lispy language?
<Sofia> "machine description"
<Sofia> Thank you GitHub CSS classes.
<sorear> it's a gcc bespoke thing
<sorear> lispy because gcc history
kaph_ has joined #riscv
kaph has quit [Read error: Connection reset by peer]
kailo has quit [Quit: Leaving]
<jimwilson> rms likes lispy languages
cwebber has joined #riscv
<Sofia> Hmm. When multiply is implemented in 3 cycles and mulh-mul is fused, are both registers written to in the same cycle, or would the core only preserve the progress to shortcut the latter instruction? In other words; are both registers visible after 3 cycles, or is one visible after 3, and the other after 4?
<Sofia> Considering the RVM spec recommends this fusion, I'd expect some cores to implement the "shortcut" method for 3,4 cycles respectively.
<Sofia> To restate my question. When a core implements this fusion, **can** it decide between these strategies or do other details imply it would need to be 3-cycle dual-write?
<Sofia> If a core has multi-issue; then I'd hope the complexity of "stealing" an unused writeback would not be too great. But such a stealing strategy would either need a proof that not all execution units could be saturated or would need to be able to queue the writeback.
BOKALDO has joined #riscv
<sorear> again, you're asking implementation questions about implementations that don't publically exist
<Sofia> Kinda. I'm asking implementation space potential questions.
<sorear> if I were implementing this I would *not* do the fusion for implementations that have a pipelined multiplier (too much complexity to save 1 cycle in a case that's already quite infrequent in general code); if you have an iterative multiplier, the multiply would save the high part in a hidden register and the mulh would copy from that hidden register, as a 1 cycle move instead of repeating an 8-60 cycle iterative multiply
<sorear> (or the other way around? i forget what the recommended fusion sequence is)
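sorear's hidden-register idea for an iterative multiplier can be sketched as a small Python model. The RISC-V M chapter's recommended sequence is MULH[[S]U] rdh, rs1, rs2 followed by MUL rdl, rs1, rs2, so in this sketch the MULH does the slow multiply and stashes the low half, and a matching MUL becomes a one-cycle copy. The class name, the 32-cycle latency and the tag check are assumptions for illustration, not any shipping core's design; a real core would also have to drop the hidden value on any intervening write to rs1 or rs2.

```python
# Toy model of MULH/MUL fusion through a hidden register (hypothetical design).
XLEN = 64
MASK = (1 << XLEN) - 1


def to_signed(x: int) -> int:
    """Interpret an XLEN-bit register value as two's complement."""
    x &= MASK
    return x - (1 << XLEN) if x >> (XLEN - 1) else x


class IterativeMultiplier:
    def __init__(self, latency: int = 32):
        self.latency = latency  # assumed cycles for one full iterative multiply
        self.hidden = None      # (rs1, rs2, low half) left behind by the last MULH

    def mulh(self, regs, rd, rs1, rs2):
        """MULH rd, rs1, rs2: high half of signed*signed; stash the low half."""
        product = to_signed(regs[rs1]) * to_signed(regs[rs2])
        regs[rd] = (product >> XLEN) & MASK
        # Only keep the low half if rd did not overwrite a source (the spec's
        # recommended sequence already forbids rdh == rs1 or rdh == rs2).
        self.hidden = None if rd in (rs1, rs2) else (rs1, rs2, product & MASK)
        return self.latency

    def mul(self, regs, rd, rs1, rs2):
        """MUL rd, rs1, rs2: low half; a 1-cycle copy if it pairs with the MULH."""
        hit = self.hidden is not None and self.hidden[:2] == (rs1, rs2)
        low = self.hidden[2] if hit else (regs[rs1] * regs[rs2]) & MASK
        self.hidden = None  # the hidden value is consumed either way
        regs[rd] = low
        return 1 if hit else self.latency
```

The low half of a product is the same whether the operands are treated as signed or unsigned, which is why one stashed value can serve MUL after any MULH variant.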
<Sofia> For now at least; I'll consider the fusion latency as the maximum latency of the involved instructions. Then let this invariant be broken one day and investigate how to better model it then.
<Sofia> mulh-mul and div-rem, both per RVM. According to the fusion paper you linked.
<Sofia> So this "hidden register" you'd do would be my "shortcut" version. Sufficiently simple. At least implying the "macro-op" details here are basically just issuing the two instructions on the same execution unit such that the latter can see the internal precomputed register.
<Sofia> Or otherwise equivalent up to implementation details.
<sorear> you really don't want to write two registers in one cycle, that doubles the complexity of your writeback logic
<Sofia> Right.
<Sofia> Regarding my fusion modeling.. does this seem reasonable? Latency = max of subcomponent latencies + additional writeback registers.
<Sofia> So in this case for a 3-cycle mul; both output registers of mulh-mul are available after 4 cycles.
<Sofia> At least until an implementation proves the model insufficiently expressive.
il has left #riscv [WeeChat 3.2]
<Sofia> or 3,4, +1 per additional writeback in order.
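Sofia's rough latency model for a fused pair (finish with the slowest component, then one extra cycle per additional writeback, in order) fits in a few lines. The `writeback_ports` parameter is a hypothetical knob added here: 1 gives the "3, 4" shortcut timing, while 2 corresponds to the dual-write case sorear warned doubles writeback complexity.

```python
def fused_visible_cycles(component_latencies, writeback_ports=1):
    """Cycle on which each destination of a fused macro-op becomes visible.

    Model from the discussion: the fused op completes when its slowest
    component would have, then its results drain through `writeback_ports`
    register-file write ports in program order, one extra cycle per batch.
    """
    if writeback_ports < 1:
        raise ValueError("need at least one writeback port")
    done = max(component_latencies)
    return [done + i // writeback_ports for i in range(len(component_latencies))]
```

For a 3-cycle multiplier, `fused_visible_cycles([3, 3])` gives `[3, 4]`, and taking `max()` of that list gives the conservative "both visible after 4 cycles" figure.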
<sorear> what you're trying to do doesn't make sense to me
<Sofia> Measure latency for (potentially) fused instruction sequences?
<sorear> in the most general case a micro-op takes zero or more inputs, possibly on different cycles, and produces zero or more outputs, possibly on different cycles
<sorear> rocket generates FPU flags on a different cycle from FPU results
<sorear> most of the Arm cores, given ADD R1, R2, R3 LSL #16, allow R2 to be ready one cycle later than R3 (because it gets read after the shift has already happened)
<sorear> so in general you have latencies from one input to one output
<sorear> (which can also be data dependent)
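sorear's more general model (a latency for each input-to-output pair, rather than one number per instruction) can be sketched as a table keyed by (input, output). The cycle counts below are made-up placeholders, not measured figures for any Arm or Rocket core, and data-dependent latencies are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class MicroOp:
    name: str
    # latency[(input, output)] = cycles from that input being ready to that output
    latency: dict = field(default_factory=dict)

    def output_ready(self, input_ready: dict) -> dict:
        """Earliest cycle each output is ready, given when each input is ready."""
        ready = {}
        for (src, dst), cycles in self.latency.items():
            t = input_ready[src] + cycles
            ready[dst] = max(ready.get(dst, t), t)
        return ready


# Hypothetical numbers: the shifted operand r3 needs one more cycle than r2.
add_lsl = MicroOp("add r1, r2, r3 lsl #16", {("r2", "r1"): 1, ("r3", "r1"): 2})
# Hypothetical FPU op whose flags output lags its data result by a cycle.
fadd = MicroOp("fadd.s", {("fs1", "fd"): 4, ("fs2", "fd"): 4,
                          ("fs1", "fflags"): 5, ("fs2", "fflags"): 5})
```

With both inputs ready at cycle 0, `add_lsl` produces r1 at cycle 2; r2 can arrive a cycle late (at 1) without delaying the result, which is the Arm behavior sorear describes.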
<Sofia> Hmm
<Sofia> I don't know how to read that ARM notation.
<Sofia> Hm
<Sofia> slli r3 r3 16, add r1 r2 r3 ?
<sorear> no, because r3 isn't modified
<sorear> r1 = r2 + (r3 << 16)
<Sofia> Er. So more like: slli r1 r3 16, add r1 r1 r2
<sorear> misleading if r1 and r2 are the same
<Sofia> Perhaps but now I at least understand what that means. :)
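The Arm form ADD Rd, Rn, Rm LSL #n means Rd = Rn + (Rm << n). A small Python simulation shows why Sofia's two-instruction RISC-V translation misbehaves when the destination is also Rn, and how a scratch register avoids it. The helper names and the choice of `t0` as scratch are illustrative assumptions.

```python
MASK = (1 << 64) - 1  # RV64 register width


def run(program, regs):
    """Execute (op, rd, rs1, rs2_or_imm) tuples on a copy of a register dict."""
    regs = dict(regs)
    for op, rd, rs1, x in program:
        if op == "slli":
            regs[rd] = (regs[rs1] << x) & MASK
        elif op == "add":
            regs[rd] = (regs[rs1] + regs[x]) & MASK
        else:
            raise ValueError(f"unsupported op {op}")
    return regs


def naive(rd, rn, rm, shift):
    # slli rd, rm, shift ; add rd, rd, rn  (clobbers rn first when rd == rn)
    return [("slli", rd, rm, shift), ("add", rd, rd, rn)]


def safe(rd, rn, rm, shift, tmp="t0"):
    # Shift into a scratch register so rn is intact when the add reads it.
    # tmp must not be rn.
    return [("slli", tmp, rm, shift), ("add", rd, rn, tmp)]
```

With r2 = 5 and r3 = 1, `naive("r2", "r2", "r3", 16)` doubles the shifted value instead of adding 5, while `safe` gives the intended 5 + (1 << 16).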
* Sofia shall enjoy the variance of intervals until she one day has a cycle accurate simulator available or a fancy core with all the nice optimizations to matter.
riff-IRC has quit [Remote host closed the connection]
riff-IRC has joined #riscv
elastic_dog has quit [Quit: elastic_dog]
elastic_dog has joined #riscv
elastic_dog has quit [Ping timeout: 268 seconds]
elastic_dog has joined #riscv
gioyik has quit [Quit: WeeChat 3.3]
kaph has joined #riscv
kaph_ has quit [Read error: Connection reset by peer]
jacklsw has quit [Quit: Back to the real world]
jacklsw has joined #riscv
jimwilson has quit [Remote host closed the connection]
jimwilson has joined #riscv
bauruine has joined #riscv
kaph has quit [Remote host closed the connection]
kaph has joined #riscv
kaph has quit [Remote host closed the connection]
kaph has joined #riscv
jacklsw has quit [Read error: Connection reset by peer]
pa4wdh has joined #riscv
jjido20 has joined #riscv
kaph_ has joined #riscv
kaph has quit [Ping timeout: 256 seconds]
EchelonX has quit [Quit: Leaving]
pa4wdh has quit [Quit: Leaving]
jwillikers has joined #riscv
kaph_ has quit [Remote host closed the connection]
kaph has joined #riscv
aerkiaga has joined #riscv
kaph has quit [Read error: Connection reset by peer]
kaph has joined #riscv
Andre_H has joined #riscv
kaph_ has joined #riscv
kaph has quit [Ping timeout: 268 seconds]
jwillikers has quit [Remote host closed the connection]
bauruine has quit [Remote host closed the connection]
freakazoid12345 has joined #riscv
freakazoid343 has quit [Ping timeout: 240 seconds]
jjido20 has quit [Quit: Connection closed]
freakazoid343 has joined #riscv
aerkiaga has quit [Remote host closed the connection]
freakazoid12345 has quit [Ping timeout: 250 seconds]
BOKALDO has quit [Quit: Leaving]
organizedglobals has quit [Quit: Client limit exceeded: 20000]
crabbedhaloablut has quit [Remote host closed the connection]
crabbedhaloablut has joined #riscv
kaph_ has quit [Ping timeout: 240 seconds]
kaph has joined #riscv
kaph has quit [Read error: Connection reset by peer]
kaph has joined #riscv
freakazoid12345 has joined #riscv
<geertu> Just noticed "[PATCH 00/13] riscv: compat: Add COMPAT mode support for rv64"
organizedglobals has joined #riscv
freakazoid343 has quit [Ping timeout: 240 seconds]
<geertu> I was under the impression before that this was never gonna be supported?
kaph_ has joined #riscv
kaph has quit [Read error: Connection reset by peer]
<jrtc27> seems like a waste of complexity to me
<jrtc27> but then again I'd have said the same about rv32 support at all in the kernel
<jrtc27> if you're running a 64-bit kernel there's not really a good reason to run a 32-bit userspace
<jrtc27> outside of possibly using rv64 as a fast native builder for rv32 linux binaries, but that seems far too niche to deserve dedicated kernel support...
kaph has joined #riscv
kaph_ has quit [Read error: Connection reset by peer]
mahmutov has joined #riscv
BOKALDO has joined #riscv
<jimwilson> rv32gc/ilp32d code can be smaller than rv64gc/lp64d code, smaller pointers means smaller data, can also be faster, smaller data means less memory bandwidth and less cache pressure
<jimwilson> also, there are still people that haven't ported their code to 64-bit targets yet, seriously, I still see this in real world code
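jimwilson's "smaller pointers means smaller data" point can be made concrete by laying out C structs under both ABIs. This sketch assumes every scalar is naturally aligned (alignment equal to size), which is how we understand the RISC-V psABI; the struct shapes are hypothetical examples.

```python
# Scalar sizes in bytes; alignment is assumed equal to size for each scalar.
ILP32 = {"char": 1, "int": 4, "long": 4, "ptr": 4, "double": 8}
LP64 = {"char": 1, "int": 4, "long": 8, "ptr": 8, "double": 8}


def struct_size(fields, abi):
    """sizeof a C struct laid out with natural alignment (no packing)."""
    offset, align = 0, 1
    for ctype in fields:
        size = abi[ctype]
        offset = -(-offset // size) * size  # round up to the field's alignment
        offset += size
        align = max(align, size)
    return -(-offset // align) * align  # tail padding to the struct alignment
```

A linked-list node `{ptr next; int value;}` comes out at 8 bytes under ILP32 and 16 under LP64, since the wider pointer also forces tail padding; a binary-tree node with two pointers and an int goes from 12 to 24 bytes.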
jacklsw has joined #riscv
<geertu> I can see a use case for Linux on rv32
<jimwilson> the rv64 ISA has support for running rv32 binaries, it was only a matter of time before someone implemented it
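The hook jimwilson refers to is, as we understand the privileged spec, the UXL field of sstatus (bits 33:32 on RV64, encoded 1 = 32, 2 = 64, 3 = 128). The helpers below only decode and set that field; we have not checked how the posted patch series actually uses it, and the function names are hypothetical.

```python
UXL_SHIFT = 32                  # sstatus.UXL lives in bits 33:32 on RV64
UXL_MASK = 0b11 << UXL_SHIFT
XLEN_FROM_CODE = {1: 32, 2: 64, 3: 128}
CODE_FROM_XLEN = {v: k for k, v in XLEN_FROM_CODE.items()}


def user_xlen(sstatus: int) -> int:
    """XLEN that U-mode code runs with, as selected by sstatus.UXL."""
    code = (sstatus & UXL_MASK) >> UXL_SHIFT
    if code not in XLEN_FROM_CODE:
        raise ValueError("UXL = 0 is reserved")
    return XLEN_FROM_CODE[code]


def with_user_xlen(sstatus: int, xlen: int) -> int:
    """sstatus value a kernel might write before returning to a task.

    UXL is WARL, so the value must be read back: a hart without rv32
    support may hardwire it to 64.
    """
    return (sstatus & ~UXL_MASK) | (CODE_FROM_XLEN[xlen] << UXL_SHIFT)
```

Other sstatus bits are left untouched, so a kernel could switch UXL per task on context switch without disturbing the rest of the register.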
<geertu> as long as I cannot fit an rv64 core in the -25 ECP5 on my OrangeCrab ;-)
<la_mettrie> aren't there lots of small ARM32 embedded devices running linux still being made?
<geertu> la_mettrie: Sure there are
winterflaw has quit [Ping timeout: 276 seconds]
Noisytoot has quit [Ping timeout: 256 seconds]
freakazoid343 has joined #riscv
kaph_ has joined #riscv
kaph has quit [Read error: Connection reset by peer]
freakazoid12345 has quit [Ping timeout: 252 seconds]
jacklsw has quit [Quit: Back to the real life]
freakazoid12345 has joined #riscv
jjido has joined #riscv
freakazoid343 has quit [Ping timeout: 250 seconds]
freakazoid343 has joined #riscv
freakazoid12345 has quit [Ping timeout: 240 seconds]
gdd has joined #riscv
winterflaw has joined #riscv
kehvo has quit [Quit: WeeChat 3.2.1]
kaph_ has quit [Ping timeout: 250 seconds]
cousteau has joined #riscv
kaph has joined #riscv
kehvo has joined #riscv
gdd has quit [Ping timeout: 240 seconds]
kehvo has quit [Quit: WeeChat 3.3]
freakazoid12345 has joined #riscv
freakazoid343 has quit [Ping timeout: 260 seconds]
kaph_ has joined #riscv
kaph has quit [Read error: Connection reset by peer]
<sorear> i'm more concerned about the fact that the rv32/rv64 interactions still aren't completely specified
<sorear> like what gets written into sepc and stval
BOKALDO has quit [Quit: Leaving]
riff-IRC has quit [Read error: Connection reset by peer]
riff-IRC has joined #riscv
freakazoid343 has joined #riscv
freakazoid12345 has quit [Ping timeout: 252 seconds]
<muurkha> cousteau: "picture". branch within the current "picture"? :)
<muurkha> "picture window"
<cousteau> but that doesn't match the "one image is worth 1000 words" phrase
<cousteau> (also, don't branches reach like a 1M window?)
<cousteau> "a gallery" or "an album"
<sorear> videos?
<muurkha> cousteau: the PDP-8 referred to the "picture" as a "page" (even though it didn't have VM) because its direct addressing instructions, for both branches and operands, had a 7-bit word address which was resolved to a 12-bit absolute address, not by adding it to PC, but by replacing the low 7 bits of PC with it (or, as another option, indexing a 6502-like page 0)
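The contrast muurkha draws, a PDP-8 "page" whose high bits come from the PC versus RISC-V's window that slides with the PC, can be shown with two address calculations. The PDP-8 side assumes the usual memory-reference format (bit 0o200 selects current page vs page zero, low 7 bits are the word offset) and takes "current page" to be the page holding the instruction; the RISC-V side uses the ±4 KiB conditional-branch and ±1 MiB JAL immediate ranges. Function names are illustrative.

```python
PAGE_MASK = 0o7600         # high 5 bits of a 12-bit PDP-8 address pick the page
OFFSET_MASK = 0o177        # low 7 bits: word within a 128-word page
CURRENT_PAGE_BIT = 0o200   # 1 = current page, 0 = page zero


def pdp8_direct_address(instr_addr: int, instr: int) -> int:
    """Direct (non-indirect) operand address of a PDP-8 memory-reference instr."""
    offset = instr & OFFSET_MASK
    if instr & CURRENT_PAGE_BIT:
        return (instr_addr & PAGE_MASK) | offset  # replace the PC's low 7 bits
    return offset                                 # page zero


def riscv_pc_relative(pc: int, offset: int, imm_bits: int) -> int:
    """Target of a RISC-V PC-relative branch/jump: pc + sign-extended offset.

    imm_bits=13 for conditional branches (+-4 KiB), 21 for JAL (+-1 MiB).
    """
    lo, hi = -(1 << (imm_bits - 1)), (1 << (imm_bits - 1)) - 1
    if not lo <= offset <= hi or offset % 2:
        raise ValueError("offset out of range or misaligned")
    return pc + offset
```

Every instruction in PDP-8 page 0o1200 that encodes offset 5 reaches the same word 0o1205, while the same RISC-V offset reaches a different target from each PC.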
<muurkha> I think the usual phrase is "a picture is worth a thousand words"
<cousteau> I thought it was because a (cache) "line" had several words, and a "page" had several "lines"
<cousteau> oh, is it? ..dammit
<cousteau> sorear: I was thinking of "videos" too
<cousteau> ...ok so maybe because of translation mismatches; the Spanish phrase is "an image is worth more than a thousand words". Guess "picture" is a valid translation for "imagen" too.
<muurkha> what is it, un imagen vale mil palabras? (an image is worth a thousand words?)
<cousteau> "una imagen vale más que mil palabras" (one image is worth more than a thousand words)
<cousteau> so it was fitting that an "imagen" would be 1024 words
<muurkha> right
<cousteau> (they don't clarify how much more than 1000 it's worth, but I'm going to guess it's exactly 1024)
<muurkha> WP also mentions the Chinese proverb "Hearing something a hundred times isn't better than seeing it once" (百闻不如一见, pinyin: bǎi wén bù rú yī jiàn)
<cousteau> so... let's call it "picture"? :) Although "page" is probably more widespread anyway
<muurkha> "picture" is much better I think
<muurkha> and because on RISC-V it slides with PC, it could be a "picture window", which is a phrase meaning a large single-pane glass window (typically with a good view)
<muurkha> geertu: yeah, Linux on RV32 makes a lot of sense to me
<muurkha> I mean obviously Linux is useful on computers that are smaller than current cellphones but bigger than microcontrollers. I've used Linux on ARM32 quite a lot
<muurkha> with MMUs tho
mahmutov has quit [Ping timeout: 250 seconds]
Narrat has joined #riscv
zjason` has joined #riscv
zjason has quit [Ping timeout: 268 seconds]
shicz2 has quit [Ping timeout: 256 seconds]
shicz2 has joined #riscv
kaph_ has quit [Remote host closed the connection]
kaph_ has joined #riscv
Gravis has quit [Ping timeout: 256 seconds]
cousteau is now known as cousteau_
cousteau_ is now known as cousteau
<dh`> I've said before that I disagree with the premise that unix on rv32 isn't useful
<dh`> also speaking with my packager hat on, there are in fact quite a few things out there that aren't 64-bit clean
kaph_ has quit [Remote host closed the connection]
kaph_ has joined #riscv
freakazoid343 has quit [Read error: Connection reset by peer]
freakazoid343 has joined #riscv
Gravis has joined #riscv
kaph_ has quit [Read error: Connection reset by peer]
kaph has joined #riscv
jjido has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
aerkiaga has joined #riscv
Sofia has quit [Remote host closed the connection]
cousteau has quit [Ping timeout: 268 seconds]
kaph has quit [Read error: Connection reset by peer]
freakazoid12345 has joined #riscv
freakazoid343 has quit [Ping timeout: 245 seconds]
kaph_ has joined #riscv
Sofia has joined #riscv
freakazoid343 has joined #riscv
freakazoid12345 has quit [Ping timeout: 240 seconds]
drmpeg has quit [Quit: ChatZilla 0.9.93 [Waterfox 56.6/20211006150338]]
<muurkha> aye
<muurkha> unix on rv32 without virtual memory would be maybe a lot more useful right now
<muurkha> actually, can you do that today with qemu? either i386-linux, rv64-linux, or arm9-linux?
<muurkha> (cortex-a15-linux?)
<sorear> do what, and why arm9?
<sorear> ARM9 or armv9?
drmpeg has joined #riscv
<muurkha> armv9, sorry
<muurkha> let me try to restate more clearly so you can tell me if what I'm thinking makes sense
<muurkha> suppose, somewhat improbably, someone has an RV32 processor with enough RAM to run Linux (a few megabytes) but no MMU (like all the RV32 implementations I know of)
<muurkha> perhaps they can compile qemu to run on bare metal, i.e., without an underlying operating system, to cause the RV32 processor to emulate some other processor which already runs Linux
<sorear> there's already nommu linux though
<sorear> someone would need to get fdpic working to have a 'reasonable' native environment
<muurkha> yes, uClinux, but it doesn't provide memory protection between processes, so application bugs will crash your machine
<sorear> NOMMU on arm platforms configures the MPU automatically, should be doable on rv32 as well when someone does it
<muurkha> or is nommu linux separate from uClinux?
<muurkha> since qemu is capable of enforcing memory protection within a single Unix process, it must be doing the relevant checks dynamically, in software (?)
<sorear> uClinux was a fork that was active ~2 decades ago, it's all mainline now
<muurkha> so maybe qemu or something similar would be a practical way to get Linux with memory protection and fork() working on an RV32 with no MMU
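What muurkha describes, memory protection checked dynamically in software, amounts to routing every guest load and store through a translation step. The class below is a hypothetical toy software MMU with a per-process page table and a small TLB cache; it is not QEMU's implementation, only an illustration of why such an emulator can give each process its own address space (and hence a real fork()) without hardware paging.

```python
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT


class PageFault(Exception):
    pass


class SoftMMU:
    """Toy software MMU: every access is translated and permission-checked."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.page_table = {}  # current process: vpn -> (ppn, perms such as "rw")
        self.tlb = {}         # cache of recently used page-table entries

    def map(self, vpn: int, ppn: int, perms: str) -> None:
        self.page_table[vpn] = (ppn, perms)
        self.tlb.pop(vpn, None)  # drop any stale cached entry

    def switch_to(self, page_table: dict) -> None:
        """Context switch: install another process's page table, flush the TLB."""
        self.page_table = page_table
        self.tlb.clear()

    def translate(self, vaddr: int, access: str) -> int:
        vpn, off = vaddr >> PAGE_SHIFT, vaddr & (PAGE_SIZE - 1)
        entry = self.tlb.get(vpn) or self.page_table.get(vpn)
        if entry is None or access not in entry[1]:
            raise PageFault(f"{access} fault at {vaddr:#x}")
        self.tlb[vpn] = entry
        return (entry[0] << PAGE_SHIFT) | off

    def load8(self, vaddr: int) -> int:
        return self.memory[self.translate(vaddr, "r")]

    def store8(self, vaddr: int, value: int) -> None:
        self.memory[self.translate(vaddr, "w")] = value & 0xFF
```

Two page tables can map the same virtual page to different physical pages, so a parent and a forked child can both use address 0x10 without seeing each other's writes; the price is a lookup on every access, which is the performance cost muurkha anticipates.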
<sorear> you get memory protection and vfork/posix_spawn
<muurkha> an MPU helps with not crashing the kernel from a user process but I don't think it allows you to implement fork() (?)
<sorear> (you could of course implement fork in the _traditional_ way and just copy the entire process image somewhere else, but that won't fly on linux)
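The spawn-style process creation sorear mentions starts a new program without duplicating the parent's address space. Python's `os.posix_spawn` (3.8+, POSIX hosts only) wraps posix_spawn(3); the sketch below runs on an ordinary hosted system and only illustrates the API shape a nommu userland relies on, not how any nommu libc implements it. `spawn_and_wait` is a hypothetical helper name.

```python
import os
import sys


def spawn_and_wait(argv):
    """Run argv[0] (an absolute path) via posix_spawn and return its exit code."""
    pid = os.posix_spawn(argv[0], argv, dict(os.environ))
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)  # Python 3.9+
```

For example, `spawn_and_wait([sys.executable, "-c", "raise SystemExit(3)"])` returns 3. Shells and build tools that call fork() and then do work in the child before exec() are what need porting to this model.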
<muurkha> heh
<muurkha> how much of a userland exists today with only vfork and posix_spawn? is there a usable shell and an implementation of Make?
freakazoid12345 has joined #riscv
<sorear> I believe so but I haven't really tried it
<muurkha> I mean presumably bash, zsh, gnumake, etc., are a no-go
cousteau has joined #riscv
<muurkha> the performance cost of running everything under qemu might be pretty heavy but maybe, as you say, less than copying the entire process image every time you context switch between two processes that have forked
<sorear> it's a totally reasonable approach if you're on a PDP and data segments are limited to 8kb
freakazoid343 has quit [Ping timeout: 240 seconds]
<muurkha> but PDPs have segment registers, rv32 doesn't
<muurkha> also weren't PDPs capable of handling 64KiB data segments?
<sorear> I forget exactly what the restrictions were and what they applied to
<muurkha> -11s, I mean, not -8s, which AFAIK didn't have segment registers
<sorear> -7 and -11 are the ones relevant to unix i think
<muurkha> yeah, and I don't think Unix on the -7 had fork(), just exec()
<muurkha> programs would exit by exec()ing the shell, kind of like CP/M
<muurkha> anyway, does the qemu approach sound reasonable?
<sorear> no comment
<muurkha> heh
<muurkha> sorry
<sorear> hard for me to get excited about anything for machines with less than a watt of ram, unless you have hundreds of them in which case the criteria are very different
<sorear> or if the supply chain crisis gets exponentially worse and a watt of ram goes back to being a few kilobytes, that will also get my attention
<sorear> qemu-system has a TON of system dependencies, getting multithreaded glib to work without a mmu may be as difficult as whatever it was you were trying to avoid
vagrantc has joined #riscv
<muurkha> aha, thanks!
<muurkha> a watt can be a very significant amount; Ambiq's Apollo3 chips plausibly can run 0.3 megs of RAM on 0.1 mW, which would put a watt of RAM at 3 gigs
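muurkha's scaling from the Apollo3 figures is a single division, assuming (as the line does) that RAM per unit of power scales linearly; that linearity is an assumption, not a property of any real part.

```python
def bytes_per_watt(ram_bytes: float, power_watts: float) -> float:
    """RAM a 1 W budget would buy if RAM per watt scaled linearly."""
    return ram_bytes / power_watts


# Figures as stated in the discussion: 0.3 MB of RAM on 0.1 mW.
apollo3 = bytes_per_watt(0.3e6, 0.1e-3)  # about 3e9 bytes, i.e. 3 GB per watt
```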
<sorear> the human brain uses ~10W on average, so going from 1W for your toy to 0.5W isn't saving resources at a system level, it's just making things pointlessly more difficult
<sorear> unless, again, you have many of them per person
<muurkha> only 10W?
<muurkha> there are some significant obstacles to turning beans and rice into electrical power. the apparatuses that can do it are both much bigger and much more unreliable than a CPU, and it's pretty inconvenient
<muurkha> there's no essential reason that has to be true, but it happens to be true at the moment
<sorear> that's over a 24 hour period, total body energy use is ca. 100W, brain is 10% of that
<sorear> no idea what the peak is
<muurkha> I agree that going from 1W to 0.5W isn't very useful, though
<sorear> the ultimate input is sunlight and PV cells are ca. 200 times more efficient at capturing energy than crops (any of them)
<muurkha> https://en.wikipedia.org/wiki/Brain#Metabolism says 20-25% of basal metabolism, so it could be 10% if you're getting a substantial amount of exercise, for example because you're pulling the starter cord on your computer every 10 seconds because it uses a watt
<muurkha> yes, that is true. and solar cells are clearly a viable way to power computation, as demonstrated by numerous solar calculators over the years
<muurkha> but for your solar cell to work indoors, it needs to be amorphous, which means only 10% efficient, and indoors your total illuminance is only 1-5 W/m²
<muurkha> so a portable computer that consumes a watt and can work off ambient light indoors needs at least 2 m² of solar panels. if it instead consumes a milliwatt, it can get by on a 20 cm² PV cell
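The panel-area figure follows from area = load / (light × efficiency), using the 1-5 W/m² indoor light level and 10% amorphous-silicon efficiency given just above. This ignores storage losses and the day/night duty cycle, which would only increase the area.

```python
def pv_area_m2(load_w: float, light_w_per_m2: float, efficiency: float) -> float:
    """PV area needed so harvested power covers a continuous load."""
    return load_w / (light_w_per_m2 * efficiency)


AMORPHOUS_SI = 0.10  # efficiency figure from the discussion

one_watt_best = pv_area_m2(1.0, 5.0, AMORPHOUS_SI)     # 2 m^2 in bright indoor light
one_mw_best_cm2 = pv_area_m2(1e-3, 5.0, AMORPHOUS_SI) * 1e4  # about 20 cm^2
```

At the dim end (1 W/m²) both figures grow fivefold, to 10 m² and about 100 cm².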
<muurkha> Ambiq chips and Sharp memory LCDs require about 0.1 mW each, so milliwatt-scale computing is technically feasible
<sorear> why would I put a solar cell indoors? I have a roof and plenty of wire
<muurkha> so far Ambiq's chips are ARM, but it wouldn't be surprising if they switched to RV32IMA or something, since GD seems to have gotten a 3× power reduction from doing that, and Ambiq is all about power reduction
<muurkha> if your computer has to be connected to plenty of wire to work, it stops being portable
<sorear> that's what the battery is for
<muurkha> a battery triples the weight of a 1-watt computer and makes it fail all the time, like, typically several failures a month
<muurkha> and wears out
<muurkha> even the 0.1-watt Swindle is mostly battery, though it weighs less than other computers such as cellphones
Andre_H has quit [Quit: Leaving.]