#riscv on 2023-09-03 — irc logs at libera.irclog.whitequark.org

2023-08-11 11:05 sorear changed the topic of #riscv to: RISC-V instruction set architecture | https://riscv.org | Logs: https://libera.irclog.whitequark.org/riscv | Matrix: #riscv:catircservices.org

00:00 Starfoxxes has quit [*.net *.split]

00:00 zkrx has quit [*.net *.split]

00:00 Starfoxxes has joined #riscv

00:00 zkrx has joined #riscv

00:14 <deathmist> hmm reverting the commit didn't seem to affect anything... I'll have to come back to this tomorrow

00:47 dlan has quit [Ping timeout: 244 seconds]

00:47 Tenkawa has joined #riscv

00:48 dlan has joined #riscv

00:49 <Tenkawa> mps: mine did keep lower cpu load like yours after I tried again... it was just really slow to go down after the boot.. I wasn't waiting long enough

00:50 <Tenkawa> So I do like the load avg workaround (which works with my xfs chainload need)

00:53 Tenkawa has quit [Quit: Was I really ever here?]

01:04 Trifton has joined #riscv

01:09 <notgull> How do RiscV systems boot? Do you have access to the usual BIOS interrupts?

01:11 <muurkha> no

01:12 <muurkha> the RISC-V ISA spec itself doesn't specify that at all, and different implementations of the ISA vary a lot

01:12 <muurkha> huh, I didn't realize Western Digital was interested in running Linux on RISC-V: https://github.com/westerndigitalcorporation/RISC-V-Linux/tree/master

01:13 <muurkha> I thought their interest in RISC-V was embedded microcontrollers for their disks

01:14 <sorear> if by "usual BIOS interrupts" you mean 10h video, 13h disk, etc that is an IBM standard and not applicable outside the 16-bit x86 world

01:15 <notgull> Hmm, I see

01:15 <muurkha> there's a presentation (again from WD!) from 02019 at https://riscv.org/wp-content/uploads/2019/12/Summit_bootflow.pdf with lots of colorful slides about how this has changed over time with respect to the HiFive boards

01:17 <notgull> I see, so SBI is generally what is used?

01:17 <sorear> that's not really what SBI does

01:18 <notgull> Huh, so how do boot loaders, for instance, read from the disk before drivers are set up?

01:19 <sorear> depends on the boot stage and the platform

01:19 <sorear> u-boot has its own set of disk drivers, or you might use the UEFI disk interface

01:20 <notgull> Hmm, so there's no real standard?

01:20 <notgull> Oh, so UEFI can be used in this instance?

01:21 <sorear> there are several real standards, but trying to be all things to all people has unavoidable costs

01:22 <notgull> I get it, so I have to research which platform

01:22 <notgull> I'm developing for?

01:22 <sorear> yes

01:23 <muurkha> yeah. RISC-V is more like "8086" and less like "IBM PC"

01:23 <notgull> I get it. Thanks!

01:23 <sorear> if you want to support multiple platforms, you can take a device tree or UEFI/ACPI approach

01:23 <notgull> 👍

01:26 Narrat has quit [Quit: They say a little knowledge is a dangerous thing, but it's not one half so bad as a lot of ignorance.]

01:29 MaxGanzII_ has quit [Ping timeout: 246 seconds]

01:36 drmpeg has quit [Ping timeout: 248 seconds]

01:49 drmpeg has joined #riscv

02:02 <muurkha> how widespread is the use of x8 as a frame pointer these days?

02:35 <muurkha> I feel like the ABI's requirement (?) of always maintaining 16-byte alignment coupled with the lack of LDM/STM instructions or preincrement/postincrement addressing modes means that the regular stack pointer is pretty much a frame pointer; you can't push a register on the stack because that would break alignment. instead you have to addi to allocate a new stack frame, at which point you start storing stuff in it

02:35 <sorear> how are you operationally defining "is pretty much a frame pointer"?

02:36 guerby has joined #riscv

02:36 <muurkha> well, I mean what you push and pop are entire activation records, not register values

02:36 <sorear> the point of x8 is that you can do a precise backtrace in a dozen lines of code, i.e. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/riscv/kernel/stacktrace.c#n43

02:36 <muurkha> so if you wanted to maintain a traversable linked list of stack frames without any debugging metadata, you could do it without a separate frame pointer register

02:37 <muurkha> yeah, that's exactly what I was thinking about

02:37 <sorear> you need the link to have a linked list

02:38 <muurkha> yes, for sure
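
The frame-pointer walk discussed above can be sketched in C. This is a minimal illustration, not the kernel's code: it assumes the layout gcc and llvm produce with -fno-omit-frame-pointer, where the saved caller fp and the return address occupy the two words just below the address held in x8. The names frame_record and walk_frames, and the lo/hi stack bounds, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the two-word record located just below the
 * address held in x8/s0/fp: saved caller fp, then return address. */
struct frame_record {
    uintptr_t fp;  /* caller's frame pointer: the "link" of the list */
    uintptr_t ra;  /* return address into the caller */
};

/* Follow the fp chain starting at `fp`, storing up to `max` return
 * addresses in `out`. The walk stops at a null or misaligned fp, at an
 * fp outside [lo, hi], or when the chain stops moving to higher
 * addresses. Returns the number of entries written. */
size_t walk_frames(uintptr_t fp, uintptr_t lo, uintptr_t hi,
                   uintptr_t *out, size_t max)
{
    size_t n = 0;
    assert(out != NULL || max == 0);
    while (n < max) {
        if (fp == 0 || (fp % sizeof(uintptr_t)) != 0)
            break;
        if (fp < lo + sizeof(struct frame_record) || fp > hi)
            break;
        /* The record sits immediately below fp. */
        const struct frame_record *rec = (const struct frame_record *)fp - 1;
        out[n++] = rec->ra;
        if (rec->fp <= fp)  /* callers live at higher addresses */
            break;
        fp = rec->fp;
    }
    return n;
}
```

No debug metadata is consulted: the only inputs are the starting fp and the bounds of the stack, which is the property sorear points to. Without the saved-fp word there would be no link to follow.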

02:38 <sorear> x8 also gets used in non-ABI fashion for local variable access in functions that use VLAs or alloca

02:38 <muurkha> is struct stackframe in asm/stacktrace.h?

02:39 <muurkha> aah, thanks, I hadn't thought about that

02:39 <muurkha> I forget, the RISC-V ABI doesn't define a red zone, does it?

02:40 <muurkha> not that that would help with VLAs...

02:40 <sorear> no red zone, everything below sp is volatile

02:41 <muurkha> so you have to displace sp to alloca(), at which point you can no longer index your local vars off sp

02:41 <muurkha> also you can no longer addi to restore sp in the epilogue; you have to read the saved sp out of the stack frame

02:42 <muurkha> thanks!

02:42 <sorear> with the current implementation the saved sp *is* the stack frame, and also the dwarf cfa

02:42 <sorear> which ironically means that you do in fact use addi to restore sp, since mv sp,s0 is an alias for addi x2,x8,0

02:43 <sorear> (locals are negative(x8), stack arguments are positive(x8))
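
A small C function of the kind being described, as a sketch: it has one fixed-size local and one variable-length array, so the distance between sp and the fixed local is only known at run time. The function name sum_squares is hypothetical, and whether a given compiler actually reserves x8 here depends on the compiler and flags; that part is an assumption, not something the code can show.

```c
#include <stddef.h>

/* Sums 0^2 + 1^2 + ... + (n-1)^2 through a VLA scratch buffer. */
long sum_squares(size_t n)
{
    long total = 0;            /* fixed-size local: constant offset from fp */
    long scratch[n ? n : 1];   /* VLA: moves sp by an amount known only at run time */

    for (size_t i = 0; i < n; i++)
        scratch[i] = (long)(i * i);
    for (size_t i = 0; i < n; i++)
        total += scratch[i];

    /* On return the VLA is released. With a frame pointer the epilogue
     * can recover sp from it instead of adding a constant to sp. */
    return total;
}
```

Compiling this for RISC-V and reading the assembly is one way to check the "locals are negative(x8)" description against a real toolchain.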

02:51 <muurkha> current implementation in what? I don't know anything about dwarf unfortunately

02:51 <muurkha> is this for C variadic-supporting calling conventions where the callee may not know enough to pop its own arguments?

02:52 <sorear> i'm familiar with frame pointer discipline in gcc and llvm, which is essentially the same and depends on -fno-omit-frame-pointer

02:52 <sorear> the callee never pops its own arguments

02:53 <sorear> functions with more than 8 words of arguments are rare enough that nobody's bothered to define a PASCAL/STDCALL for riscv

02:53 <muurkha> well, in amd64 SBCL and most Pascal implementations it does

02:54 <muurkha> I don't know if SBCL supports RISC-V (yet?)

02:54 <muurkha> the note about how mv is actually addi is well-taken. I wasn't thinking about that :)

02:57 <dh`> basically it's the same as mips

02:57 <muurkha> you could say that about most things in RISC-V :)

02:57 <dh`> there are certain disadvantages to fomiting on the frame pointer by default, but mostly you come out ahead

02:58 <dh`> especially in a world that seems to have accepted the notion of having enormous unwind tables and libunwind all over the place

02:58 * muurkha towels off the frame pointer, eyeing dh` evilly

02:59 <sorear> if sbcl/risc-v is not using the standard calling convention (likely, if it needs to support fully general tail calls) it's out of scope for the ABI

02:59 <muurkha> yeah, the tradeoffs that make sense may be different in a world where development machines routinely have tens of gigabytes of RAM

02:59 <muurkha> agreed, sorear

02:59 <jrtc27> cf. GHC's STG that does its own wildly-different thing internally

02:59 <jrtc27> (full of CPS)

03:00 <muurkha> hmm, I feel like combinator graph reduction is really kind of different from CPS...

03:00 <muurkha> I don't remember whether SBCL supports fully general tail calls. CL doesn't require it

03:01 <sorear> I know V8 does some weird ABI stuff for tail call reasons

03:01 ntwk has quit [Quit: ntwk]

03:01 <muurkha> why would fully general tail calls require violating the ABI?

03:02 <sorear> more annoying is that riscv copies the arm/mips frame layout with pc and sp at the high addresses... except pc and sp are *low* registers on riscv, which means that the saved register area is in reverse register order, but they decided to make cm.push use forward register order, so it's incompatible with frame pointer ABI

03:03 <muurkha> cm.push?

03:03 <sorear> if A() calls B(x) and there are no argument registers, A needs to allocate stack space for x, but since it's a tail call A cannot deallocate stack space, which leaves B as the only option

03:03 <sorear> from Zcmp

03:03 <muurkha> aha

03:03 <jrtc27> yeah cm.push not putting ra and sp round the other way sucks

03:04 <jrtc27> (as for the frame layout, what's specified is whatever gcc did long ago...)

03:05 <muurkha> sorear: I'm not following you about A() and B(x)

03:05 <muurkha> by B(x) do you mean B(a, b, c, d, e, f, g, h, x), so you ran out of argument registers?

03:07 <muurkha> also I think you can totally deallocate stack space; it's just addi sp, sp, 16, once you're done storing whatever you needed to store on the stack for arguments

03:07 <sorear> muurkha: I said "and there are no argument registers"

03:07 <muurkha> but in the RISC-V ABI there are argument registers, 8 of them?

03:08 <muurkha> (not counting floating-point)

03:08 <dh`> yes but obviously the same thing happens if you run out

03:08 <sorear> are we not discussing hypotheticals here?

03:08 <muurkha> oh, I thought we were talking about whether the RISC-V ABI prevented general tail-call elimination?

03:09 <sorear> for the purposes of answering that question, 8 and 0 are both finite numbers and therefore equivalent

03:09 <muurkha> (SBCL always defines its own ABI on every platform and it's always weird as hell)

03:09 <muurkha> okay. so you need to allocate stack space for x, so that on entry to B, sp points at x, right?

03:09 <sorear> yes

03:10 <sorear> but when B returns to A and B's caller, 0(sp) is part of the caller's stack frame

03:11 <muurkha> can't A just allocate stack space to call B with, using addi sp, sp, -16?

03:11 <muurkha> oh, now I understand

03:11 <muurkha> that won't work if A and B's caller is responsible for deallocating it

03:11 <muurkha> as it must be in ABIs that support C varargs

03:12 <muurkha> is that what you were saying?

03:13 <sorear> [enormous unwind tables] I'm not very enthusiastic about frame pointers these days because I don't see an unsymbolized list of return addresses as particularly useful; if you have enough information to symbolize it, you can decode a stack dump and there are fewer things that can go wrong

03:13 <muurkha> since it's a tail call A cannot deallocate stack space after B returns

03:13 <muurkha> that was the part I was failing to grasp

03:14 <sorear> frame pointers were invented for ancient compilers that change sp mid-expression to handle argument pushing and can't track that in their symbol tables, and are being kept alive as a half-useful workaround for DWARF CFI being barely fit for purpose

03:15 <sorear> [after B returns] precisely

03:15 <muurkha> frame pointers are also useful for spaghetti stacks

03:16 <sorear> i'd say something about 16-bit x86 and its lack of [SP+imm] but I'm not sure if that's a chicken or an egg

03:17 <muurkha> yeah, [bp+imm] is pretty important on the 8086

03:18 <muurkha> but clearly that was designed because the designers were previously familiar with frame pointers

03:18 <muurkha> maybe due to the iAPX432's B5000 heritage? or maybe the 432 started later

03:23 <muurkha> RVC kind of swings the other way: there's c.lwsp, c.swsp, etc., but no c.lwfp and c.swfp. you can use c.lw and c.sw to index off x8 but you only get 5 bits of offset and can only access the 8 RVC registers

03:26 <muurkha> for things like Smalltalk and Scheme you'd maybe like a "self pointer" or "closure pointer" register, but RVC was optimized for C, not for Smalltalk

03:27 <muurkha> and the penalty of having to use a full-width instruction is a lot less severe than the corresponding things with Thumb-1 or 8086

03:28 <muurkha> does the ABI require you to leave gp and tp unchanged so your callees have access to them? it's not clear to me in the version I'm reading here, I'm just inferring that from their names. if so, does that also apply to interrupt handlers, or is it okay to save them, clobber them, and then restore them before you call a callee?

03:33 <dh`> it only matters if you care about the debugger being able to cope

03:35 madge has joined #riscv

03:38 vagrantc has quit [Quit: leaving]

03:38 <sorear> anything that you expect to run in a unix shared libraries environment needs to leave gp and tp unchanged at all times, because the main program might install a signal handler that accesses a _Thread_local variable

03:38 <sorear> if you control the interrupt process and can install a good gp/tp before running the interrupt handler, you have more freedom
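
The signal-handler case can be sketched in ISO C. This is an illustration under assumptions: the names hits, on_signal and count_signal_once are hypothetical, and the claim that the handler reaches its thread-local through tp describes typical RISC-V toolchains rather than anything the C code itself guarantees.

```c
#include <signal.h>

/* Per-thread counter. On typical RISC-V toolchains a variable like this
 * is addressed relative to tp, so the handler needs tp to be valid. */
static _Thread_local volatile sig_atomic_t hits;

static void on_signal(int signo)
{
    (void)signo;
    hits++;  /* thread-local access from inside the handler */
}

/* Installs the handler, raises SIGINT on the calling thread, restores
 * the previous handler, and returns how many times the handler ran
 * (or -1 on failure). */
int count_signal_once(void)
{
    void (*previous)(int) = signal(SIGINT, on_signal);
    if (previous == SIG_ERR)
        return -1;
    hits = 0;
    if (raise(SIGINT) != 0)
        return -1;
    signal(SIGINT, previous);
    return (int)hits;
}
```

Here the signal arrives at a known point, but an asynchronous signal can arrive between any two instructions, which is why code that clobbers tp even briefly is unsafe in that environment.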

03:40 kaol has quit [Server closed connection]

03:41 <sorear> [optimized for C] i would say that the more you optimize, especially type-aware and flow-aware optimizations, the more all languages converge on something that resembles RISC instructions

03:41 kaol has joined #riscv

03:42 <sorear> if it were truly "optimized for C" it'd look more like VAX with complex addressing modes and memory-memory instructions

03:46 <dh`> maybe not, generating those from a C compiler isn't exactly trivial

03:47 <dh`> anyone remember Hobbit?

04:02 billchenchina has joined #riscv

04:04 BootLayer has joined #riscv

04:04 <muurkha> the AT&T chip?

04:05 <muurkha> sorear: thanks! that's kind of what I thought

04:05 <muurkha> I think it's reasonable that everything ends up resembling RISC instructions

04:07 <muurkha> the particular thing I was talking about being optimized for C was that there's no compressed instructions for loading and storing instance variables/closure variables

04:08 <muurkha> maybe that's not really so important, since even in Smalltalk or Scheme you end up accessing local variables a lot more often than those

04:21 EchelonX has joined #riscv

04:31 madge has quit [Quit: madge]

04:31 <sorear> if it's c++ your instance pointer will normally be in a0, and you can use compressed instructions to access instance variables...

04:41 <dh`> yeah, the AT&T chip

04:42 <dh`> also re closure variables, ordinarily your closure pointer's going to be an argument...

04:45 zjason` is now known as zjason

04:45 <muurkha> a way RISC-V could be more optimized for C would be to have more i386-like or ARM-like addressing modes in its load/store instructions. Smalltalk, Java, ML, and Lisp only need simple base+offset access modes, because their stack frames and records are simple vectors

04:46 <muurkha> i386 has basereg+offsetreg(*scale)+immediate

04:46 <dh`> uh, all of those languages have arrays of some kind

04:47 <muurkha> yeah, but not embedded inside another object

04:47 <dh`> doesn't matter, offsets still aren't fixed

04:47 <muurkha> fp and sp are normally arguments (passed from the caller) and also call-preserved; that's what you'd want for instance pointers too

04:48 <dh`> to access the a1'th element of the array in a0, you do slli t1, a1, 2; add t1, t1, a0; lw t1, 0(t1)

04:49 <muurkha> hmm? I mean in C if you have an array that is a local variable, or an access to a struct field inside an array, you index it with frame pointer + immediate offset (to the beginning of the array, or to the struct field) + scaled index (for the index into the array)

04:49 <dh`> if the array is at some offset inside a struct, that offset replaces the 0 in the lw

04:49 <muurkha> yes

04:49 <dh`> (unless it's too large, but that's a different issue)

04:50 <dh`> so whether arrays are embedded in structs or not is immaterial

04:50 <muurkha> but typically adding three addends like that is a short enough path length to fit inside a clock cycle

04:50 <dh`> you save one instruction by having a lw t1, (a0 + t1) instruction, like sparc did

04:50 <dh`> and another by assimilating the shift like x86
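
The shift, add, load sequence can be mirrored step by step in C. This is a sketch for 4-byte elements only; struct with_array and load_elem are hypothetical names, and field_off plays the role of the immediate in the load.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record with an array embedded after another field. */
struct with_array {
    int32_t tag;
    int32_t elems[4];
};

/* Loads the a1'th 32-bit element of an array that starts field_off
 * bytes past a0, one C statement per instruction of the sequence. */
int32_t load_elem(const void *a0, size_t a1, size_t field_off)
{
    size_t t1 = a1 << 2;                     /* slli t1, a1, 2 */
    const char *p = (const char *)a0 + t1;   /* add  t1, t1, a0 */
    int32_t value;
    memcpy(&value, p + field_off, sizeof value);  /* lw t1, field_off(t1) */
    return value;
}
```

For example, load_elem(&s, 2, offsetof(struct with_array, elems)) reads s.elems[2], and passing field_off as 0 handles a bare array. The same three steps serve both cases, which is the point about embedding being immaterial.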

04:51 <muurkha> so on an in-order microarchitecture you can win by having an x86-like addressing mode

04:51 <dh`> I think the answer to that is supposed to be "micro-op fusion"

04:51 <muurkha> a big complicated chip can do that with micro-op fusion, sure

04:51 <sorear> this is your irregular reminder that carry-save adders exist

04:52 <muurkha> yes, that's why three addends isn't especially slower than two

04:52 <sorear> you can forward a sum into another addition or subtraction at negligible cost

04:52 <muurkha> but you still need to wait for the carries to propagate to form the effective address to put on the memory bus
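
The carry-save point can be sketched in C. A 3:2 compressor reduces three addends to a sum word and a carry word using only per-bit logic, leaving a single carry-propagating add to form the address. The names csa_pair, csa_3to2 and effective_address are hypothetical, and the code models the arithmetic, not any particular hardware.

```c
#include <stdint.h>

struct csa_pair {
    uint64_t sum;    /* per-bit sum without carries */
    uint64_t carry;  /* per-bit carries, moved up one position */
};

/* 3:2 carry-save compressor: every output bit depends only on the
 * three input bits in its own column, so no carry chain is involved. */
struct csa_pair csa_3to2(uint64_t a, uint64_t b, uint64_t c)
{
    struct csa_pair r;
    r.sum = a ^ b ^ c;
    r.carry = ((a & b) | (a & c) | (b & c)) << 1;
    return r;
}

/* base + (index << scale_log2) + disp, modulo 2^64.
 * scale_log2 is assumed to be less than 64. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale_log2, uint64_t disp)
{
    struct csa_pair r = csa_3to2(base, index << scale_log2, disp);
    return r.sum + r.carry;  /* the only carry-propagating addition */
}
```

The final addition is the wait muurkha mentions: the compressor makes the third addend nearly free, but the carries still have to ripple once before the address is usable.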

04:52 <dh`> and this is why sparc (and mips64 too) had register + register addressing

04:53 <muurkha> yeah

04:53 <dh`> the reason riscv doesn't is that it doesn't fit in the instruction word, or alternatively does but only at the cost of making it a lot more irregular

04:54 <muurkha> plausibly, yeah. it helps if you can use an instruction format with fewer registers, like RVC

04:54 Jackneill_ has joined #riscv

04:55 <muurkha> usually the biggest cost of this sort of thing is that you need three ports on the register file, which makes the bits bigger than if you only need two. but you need three ports anyway to do regular RISC-V instructions in one cycle

04:55 <muurkha> (or, for that matter, regular 8086 instructions)

04:55 <dh`> arguably, 16 registers is enough if you don't waste several of them like arm32 did

04:55 <sorear> x86 has 4 operands (segment base, base reg or pc, scaled index, displacement), arm and mips only do base reg and index OR displacement

04:56 <muurkha> the number of registers depends on what you're doing

04:56 <sorear> regular risc-v instructions are all 2R1W, which is enough for base + scaled index loads but not stores

04:56 <muurkha> yes, true!

04:56 <dh`> I can't remember the last time I saw code that had > 16 locals all live at once that didn't also need a rewrite

04:57 <muurkha> as a trivial example, emulating an arm32 can be significantly faster if you can have 16 locals all live at once

04:57 <dh`> maybe aggressive inlining changes that

04:57 <muurkha> or, say, 19 or so

04:57 <sorear> i have a suspicion it was mostly sized for dgemm

04:57 <muurkha> also I think there are cryptographic algorithms that would get a significant speedup that way

04:58 <dh`> maybe, I haven't looked in any crypto sources in a long time

04:58 <muurkha> sorear: what, ARM's 16 registers?

04:58 <dh`> by tradition they're usually encrypted after all

04:58 <sorear> four independent 2d arrays with general strides and upper/lower bounds, and if you can keep the floats separate that's great

05:01 davidlt has joined #riscv

05:01 <gurki> i hope that the hpc folks will have some nice proposals for specific instructions; as it is riscv performance is rather abysmal in comparison to "classical" architectures :S

05:02 <gurki> but then, we kinda lack hardware thats even meant to compete to begin with

05:03 <muurkha> what kind of classical architectures do you mean?

05:03 <gurki> x86, arm

05:03 <muurkha> amd64?

05:03 <muurkha> I had assumed you meant, like, Cray-1

05:03 <gurki> i consider that a part of x86 :3

05:03 <gurki> (im aware its an extension)

05:04 <muurkha> I don't think there's ever been an amd64 chip with a part number ending in "86"

05:04 <gurki> well its even worse for gpus but thats by no metric a fair or reasonable comparison so i skipped it

05:04 <dh`> there hasn't been anything with a part number ending in "86" for a good twenty years

05:05 <muurkha> I feel like mostly RISC-V performance sucks compared to things like the M1 because people aren't fabbing RISC-V parts in those process nodes

05:05 <dh`> thirty if you don't count cyrix and early amd stuff

05:05 <muurkha> I mean *also* there's microarchitecture stuff like scoreboards and branch predictors

05:05 <gurki> nah. fabbing at 5nm doesnt get you _that_much_

05:06 <dh`> actually I bet someone still makes 486s for industrial/hardened apps

05:06 crabbedhaloablut has joined #riscv

05:06 <muurkha> but I think that's sort of only useful if you have the real estate for all the functional units

05:06 <muurkha> dh`: I think you can still get an 80186

05:06 <dh`> could be

05:06 <gurki> thats the thing. nowadays cpus are fast since they kinda are isa + big blob of stuff that actually makes it fast

05:07 <gurki> thats kinda the next step for riscv imho

05:08 <muurkha> dh`: https://www.digikey.com/en/products/detail/rochester-electronics-llc/N80186/12122323 says Rochester Electronics has 7363 AMD N80186 chips in stock for US$17 each

05:08 <muurkha> 8 MHz in a 68-PLCC

05:08 <muurkha> I think the PLCC form factor actually postdates the 80186?

05:09 <muurkha> it's not RoHS compliant, though, so evidently this isn't intended for new designs

05:10 EchelonX has quit [Quit: Leaving]

05:27 _whitelogger has joined #riscv

05:36 junaid_ has joined #riscv

05:54 Jackneill_ has quit [Ping timeout: 244 seconds]

06:02 freakazoid332 has joined #riscv

06:06 frkzoid has quit [Ping timeout: 258 seconds]

06:09 agent314 has quit [Ping timeout: 246 seconds]

06:09 agent314 has joined #riscv

06:28 agent314 has quit [Ping timeout: 255 seconds]

06:48 MaxGanzII_ has joined #riscv

06:56 jmdaemon has joined #riscv

07:01 josuah has quit [Quit: zzz]

07:10 bjoto has quit [Ping timeout: 258 seconds]

07:12 bjoto has joined #riscv

07:17 junaid_ has quit [Remote host closed the connection]

07:29 billchenchina has quit [Remote host closed the connection]

07:31 junaid_ has joined #riscv

07:43 billchenchina has joined #riscv

08:04 Stat_headcrabed has joined #riscv

08:13 Stat_headcrabed has quit [Quit: Stat_headcrabed]

08:43 billchenchina has quit [Remote host closed the connection]

08:53 elastic_dog has quit [Ping timeout: 248 seconds]

09:28 josuah has joined #riscv

09:30 josuah has quit [Client Quit]

09:39 josuah has joined #riscv

10:13 somlo has quit [Server closed connection]

10:13 somlo has joined #riscv

10:42 BootLayer has quit [Quit: Leaving]

10:45 JanC has quit [Remote host closed the connection]

10:45 JanC has joined #riscv

10:49 elastic_dog has joined #riscv

11:24 stolen has joined #riscv

11:24 terminalpusher has joined #riscv

11:57 BootLayer has joined #riscv

12:29 ntwk has joined #riscv

12:53 dilfridge has quit [Server closed connection]

12:53 dilfridge has joined #riscv

13:07 MaxGanzII_ has quit [Ping timeout: 246 seconds]

13:58 MaxGanzII_ has joined #riscv

14:08 indy_ is now known as indy

14:14 stolen has quit [Quit: Connection closed for inactivity]

14:16 elastic_dog has quit [Ping timeout: 246 seconds]

14:24 elastic_dog has joined #riscv

14:38 ntwk has quit [Quit: ntwk]

14:41 davidlt has quit [Ping timeout: 255 seconds]

14:43 sevan has quit [Quit: leaving]

14:49 Jackneill_ has joined #riscv

15:01 stolen has joined #riscv

15:10 ntwk has joined #riscv

15:16 phoooo has joined #riscv

15:17 Andre_Z has joined #riscv

15:18 phoooo has quit [Client Quit]

15:49 davidlt has joined #riscv

16:00 heat has joined #riscv

16:01 Stat_headcrabed has joined #riscv

16:04 terminalpusher has quit [Ping timeout: 246 seconds]

16:05 Stat_headcrabed has quit [Client Quit]

16:16 terminalpusher has joined #riscv

16:59 ntwk has quit [Ping timeout: 245 seconds]

17:30 Andre_Z has quit [Quit: Leaving.]

18:16 MaxGanzII_ has quit [Remote host closed the connection]

18:17 MaxGanzII_ has joined #riscv

18:22 davidlt has quit [Ping timeout: 250 seconds]

18:27 elastic_dog has quit [Ping timeout: 245 seconds]

18:39 elastic_dog has joined #riscv

18:55 Jackneill_ has quit [Ping timeout: 258 seconds]

19:05 sevan has joined #riscv

19:05 crabbedhaloablut has quit []

19:11 aerkiaga has joined #riscv

19:27 vagrantc has joined #riscv

19:50 BootLayer has quit [Quit: Leaving]

20:00 prabhakarlad has joined #riscv

20:15 Andre_Z has joined #riscv

20:45 junaid_ has quit [Remote host closed the connection]

20:46 terminalpusher has quit [Ping timeout: 246 seconds]

20:47 sevan has quit [Ping timeout: 245 seconds]

20:48 junaid_ has joined #riscv

21:00 Andre_Z has quit [Quit: Leaving.]

21:31 vagrantc has quit [Quit: leaving]

21:34 stolen has quit [Quit: Connection closed for inactivity]

21:36 junaid_ has quit [Remote host closed the connection]

21:41 Bluefoxicy has quit [Ping timeout: 246 seconds]

21:49 Bluefoxicy has joined #riscv

22:23 heat has quit [Remote host closed the connection]

22:23 MaxGanzII_ has quit [Ping timeout: 246 seconds]

22:49 ntwk has joined #riscv

23:03 aerkiaga has quit [Remote host closed the connection]

23:42 sevan has joined #riscv