freakazoid333 has quit [Ping timeout: 256 seconds]
fabs has quit [Ping timeout: 240 seconds]
dlan has quit [Remote host closed the connection]
pho has joined #riscv
dlan has joined #riscv
pabs3 has quit [Ping timeout: 248 seconds]
pabs3 has joined #riscv
hendursa1 has quit [Quit: hendursa1]
hendursaga has joined #riscv
vagrantc has joined #riscv
Andre_H has joined #riscv
wgrant has quit [Ping timeout: 248 seconds]
<meowray>
dropped glibc riscv's reliance on _GLOBAL_OFFSET_TABLE_[0]
* meowray
feels good
jwillikers has quit [Remote host closed the connection]
jwillikers has joined #riscv
Narrat has joined #riscv
mahmutov has joined #riscv
solrize has quit [Ping timeout: 240 seconds]
jwillikers has quit [Remote host closed the connection]
jwillikers has joined #riscv
solrize has joined #riscv
solrize has quit [Changing host]
solrize has joined #riscv
Narrat has quit [Quit: They say a little knowledge is a dangerous thing, but it's not one half so bad as a lot of ignorance.]
Narrat has joined #riscv
davidlt has quit [Ping timeout: 240 seconds]
BOKALDO has quit [Quit: Leaving]
Xav101 has joined #riscv
<Xav101>
Hey I'm trying to write some asm here, and I need to figure out how to store a value in a register to a .hword in the .bss section. How do you do this in asm?
hendursaga has quit [Remote host closed the connection]
hendursaga has joined #riscv
<sorear>
how do you store anything in memory?
<Xav101>
sw/h/b?
<sorear>
are those applicable to this problem? why or why not?
<Xav101>
I'd assume so? bss is stored in memory.
<Xav101>
and RISC-V is load store meaning you have to use store commands to write to memory
<sorear>
which of those would you use for a .hword?
extropiator has joined #riscv
<Xav101>
sh
<sorear>
(I think you're right but I've been burned in the past by expecting size mnemonics to make sense, nobody can agree on what a "word" is)
<Xav101>
I moreso don't understand the semantics of the assembly.
<sorear>
can you give an example of something that's confusing?
<Xav101>
For example you'd have your .bss and foo: .hword
<Xav101>
how would I sh register t3 for example to foo
<sorear>
sh requires the address you store to to be in a register. so you'd need to somehow get the address of foo into a register.
<sorear>
(or another nearby address)
<Xav101>
right
<Xav101>
so what would the semantics be for that?
<sorear>
what do you know about the sh _assembly_ forms? (there's some gas magic that can help here)
<Xav101>
nothing
<Xav101>
are you talking about the pseudo-ops?
<Xav101>
Because I saw those in the manual but I have no clue what the fuck they do
<sorear>
yes
<sorear>
what do you know about ... the gas manual seems to call them "modifiers". %hi and friends
<sorear>
if you're using an assembler you will need to either use a pseudo-op or a modifier to make an address happen
<Xav101>
The "relocations"?
<Xav101>
I would prefer to just use pseudo-ops but I don't understand how those pseudo-ops work
<jrtc27>
relocations are one way in which the modifiers are implemented
<sorear>
can you give a bigger picture of what you're trying to learn and how you've set up your development environment?
<sorear>
initially this looked like a class assignment but since you mentioned trying to understand the manual this may be a bit more… freeform?
<Xav101>
yeah it's more freeform. I'm just trying to learn it
<Xav101>
my environment is more or less just binutils cross-compiled for RISC-V and a linker script to learn how those worked (eg. putting sections in memory)
extropiator has quit [Quit: Client closed]
<sorear>
so if you used the linker script to put foo at a _known_ address, say 0x1234, how would you do the store?
<sorear>
related question: how can you put 0x1234 into a register?
<sorear>
there are, broadly, two different ways to approach this problem, one that's more magical and one that gives you a direct view of what's going on, not sure which you would want to approach first
vagrantc has quit [Quit: leaving]
<Xav101>
well if I wanted to put the constant into the register I'd use li usually
<Xav101>
which breaks down into what, a lui and an addi rd, zero, imm?
<sorear>
that's the general 32-bit case yes
<sorear>
in riscv32 a lui and an addi can construct any 32-bit constant (64-bits is more complicated)
<sorear>
gas has a "la" form which works like "li" but for addresses of symbols
<xentrac>
heh, yes, more complicated
<sorear>
so you can do "la t4, foo; sh t3, 0(t4)" and get 3 instructions
<sorear>
but this is a bit wasteful, since the addi can be combined with the sh
<sorear>
so there's also a pseudo form of the load/store instructions "sh t3, foo, t4" which generates just 2 instructions (t4 is overwritten with the result of the lui)
<sorear>
(conversely "lh t3, foo"; you don't need an extra register because it uses t3 twice)
<sorear>
that's the "more magical" way. it doesn't give you control over the exact sequence of instructions, which is a problem if you're trying to do scheduling
<sorear>
you can (32-bit only, and ignoring linker complications) consider "la t4, foo" equivalent to "lui t4, %hi(foo); addi t4, t4, %lo(foo)". %hi and %lo are magic functions in the assembler that break an address into pieces you can use in immediate operands
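A minimal C sketch of the arithmetic behind %hi and %lo in the simplified absolute `lui; addi` model above. It assumes RV32 and a symbol whose final address is already known. Because addi, and the S-type immediate of `sh`, sign-extend their 12-bit field, %hi has to round up by 0x800 whenever bit 11 of the address is set. The helper names are hypothetical. As far as I know, gas actually expands `la` and the `sh t3, foo, t4` form with auipc and PC-relative %pcrel_hi/%pcrel_lo, but the same rounding applies there.

```c
#include <stdint.h>

/* Hypothetical helpers: the values %hi(addr) and %lo(addr) would resolve to
   for an absolute RV32 address. */

/* 20-bit value for lui's immediate field. Adding 0x800 first cancels the
   negative %lo produced when bit 11 of the address is set. */
static uint32_t riscv_hi(uint32_t addr) {
    return ((addr + 0x800u) >> 12) & 0xFFFFFu;
}

/* 12-bit value for addi's (or sh's) immediate, as the signed number the
   hardware sees after sign-extension. */
static int32_t riscv_lo(uint32_t addr) {
    int32_t lo = (int32_t)(addr & 0xFFFu);
    return lo >= 0x800 ? lo - 0x1000 : lo;   /* sign-extend bit 11 */
}

/* Emulate "lui rd, hi; addi rd, rd, lo" on RV32 (wraps modulo 2^32). */
static uint32_t lui_addi(uint32_t hi, int32_t lo) {
    uint32_t rd = hi << 12;          /* lui: imm20 << 12 */
    return rd + (uint32_t)lo;        /* addi: add the sign-extended imm12 */
}
```

For example, 0x12345FFF splits into %hi = 0x12346 and %lo = -1, not 0x12345 and 0xFFF.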
<sorear>
with me so far?
<xentrac>
hmm, doesn't the linker have to implement %hi and %lo as additional relocation types?
<xentrac>
not just the assembler, I mean
<sorear>
we'll get there...
<xentrac>
sorry
<sorear>
this would be easier to explain if gas had binary output. i'm pretending it does to try not to explode heads
<sorear>
i'm nearly to the point of recommending people who want to learn assembly to install dos and nasm :p 8086 machine code is tidy by the standards of everything that came after it
<xentrac>
it has more than its share of irregularities
<xentrac>
add, adc, sub, and sbb set the carry flag, but inc and dec don't. does loop set it? can you remember without checking the manual?
<sorear>
I don't think loop touches flags
<sorear>
it has to work the way it does because if you're doing multi-precision math, you need looping instructions that don't disturb CF
<sorear>
including pointer manipulation
<xentrac>
I think you're right, but I also think the RISC-V approach of not having any flags is a lot simpler
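To make the flags comparison concrete, here is a small C sketch of multi-precision addition done the flagless way. The carry out of each limb is recomputed with an unsigned compare, which is what `sltu` does on RISC-V. Because the carry lives in an ordinary register or variable, loop counter and pointer updates cannot clobber it, and that is exactly the constraint inc/dec and loop were designed around on the 8086. The function name and the little-endian limb layout are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst = a + b over n 32-bit limbs, least significant
   limb first. Returns the final carry out. */
static uint32_t add_multiword(uint32_t *dst, const uint32_t *a,
                              const uint32_t *b, size_t n) {
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t s = a[i] + b[i];   /* add                              */
        uint32_t c1 = s < a[i];     /* sltu: carry out of a + b         */
        uint32_t t = s + carry;     /* add the incoming carry           */
        uint32_t c2 = t < s;        /* sltu: carry out of adding carry  */
        dst[i] = t;
        carry = c1 | c2;            /* at most one of c1, c2 can be set */
    }
    return carry;
}
```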
<xentrac>
but definitely having 13 types of relocations in ELF is a drawback of RISC-V
<jrtc27>
be like powerpc, make flag-setting based on whether or not there's a dot after the mnemonic
<jrtc27>
just 13?
<xentrac>
I think it might be more but I felt safe in saying 13
wgrant has joined #riscv
<jrtc27>
we're up to number 58
<jrtc27>
though there are some holes
<xentrac>
58!
<jrtc27>
well, one hole
<jrtc27>
of size 4
<jrtc27>
no no just 58, not 58! :P
<sorear>
risc-v is mostly simpler as an ISA (apart from the limited immediate size) but trying to learn assembly on a modern unix with pervasive dynamic linking is folly
<xentrac>
heh, permutations of relocations
<xentrac>
I think RV32I is actually simpler than the 8080
<jrtc27>
aarch64 has about 120-odd
<jrtc27>
about because my counting may have skipped or double-counted in the odd place, and they have both lp64 and ilp32 relocations (most are lp64-only or for both, but a handful are ilp32-only)
<sorear>
am I the only one who worries about how soon we're going to run out of reloc types on riscv32
<jrtc27>
(and I just counted everything)
<jrtc27>
yes
<jrtc27>
I worry too
<jrtc27>
the current scheme is stupid, we have 64-bit-only relocations occupying precious encoding space for rv32
<jrtc27>
those should've been put beyond 255
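The encoding-space concern comes from how ELF packs the relocation type into `r_info`. In the ELF spec, `Elf32_Rel(a)` keeps only 8 bits for the type, while `Elf64_Rel(a)` keeps 32 bits. Type numbers above 255 are therefore usable by RV64-only relocations but unrepresentable in RV32 objects. The sketch below re-declares the `ELF32_R_*`/`ELF64_R_*` macro logic from `<elf.h>` as functions so that it is self-contained.

```c
#include <stdint.h>

/* Same packing as ELF32_R_INFO / ELF32_R_TYPE: type is truncated to 8 bits. */
static uint32_t elf32_r_info(uint32_t sym, uint32_t type) {
    return (sym << 8) + (uint8_t)type;
}
static uint32_t elf32_r_type(uint32_t info) { return info & 0xFFu; }

/* Same packing as ELF64_R_INFO / ELF64_R_TYPE: type gets the low 32 bits. */
static uint64_t elf64_r_info(uint64_t sym, uint64_t type) {
    return (sym << 32) + (uint32_t)type;
}
static uint32_t elf64_r_type(uint64_t info) { return (uint32_t)info; }
```

A type numbered 300 survives the round trip in ELF64 but becomes 44 (300 mod 256) in ELF32. Shaving 6 types off the 256 available would free about 2.3% of the RV32 space, which matches the "~2.5%" estimate below.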
<jrtc27>
well, riscv32 is probably fine, even allowing for the custom extension space of 64 relocations
<jrtc27>
but I worry about cheri-riscv32 when it gets standardised...
<jrtc27>
(well, if)
<jrtc27>
also why do we have TPREL32 and TPREL64 etc, should just have one TPREL that's based on the ABI...
<jrtc27>
I mean it's only about 6 relocations I could shave off if given a fresh start rather than inheriting an existing spec
<jrtc27>
but still, that's ~2.5% of the encoding space
<jrtc27>
that's also one of the reasons why I don't want to specify all these experimental things like the overlay proposal or compact code model stuff until it's been proven and definitely will be used
Xav101 has quit [Ping timeout: 252 seconds]
vagrantc has joined #riscv
<xentrac>
why based on the ABI?
<jrtc27>
those relocations are used to initialise GOT entries
<jrtc27>
if you know the ABI you know what the word size, and thus GOT entry size, is
<jrtc27>
same as how there's only one R_RISCV_RELATIVE, R_RISCV_JUMP_SLOT, R_RISCV_IRELATIVE
<jrtc27>
also don't know why R_RISCV_32 and R_RISCV_64 both exist, you only ever want the one that corresponds to your address size...
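As an illustration of "the ABI tells you the word size", here is a sketch of a loader applying an R_RISCV_RELATIVE-style relocation (value = load base + addend) whose store width comes from the target's XLEN, not from the relocation type. A single ABI-sized TPREL, as suggested above, could be applied the same way. The function name and the `xlen_bytes` parameter are hypothetical; a real loader would simply use its own pointer size.

```c
#include <stdint.h>

/* Hypothetical helper: write load_base + r_addend at image + r_offset,
   using 4 bytes on RV32 and 8 on RV64. RISC-V is little-endian, so the
   bytes are stored explicitly in that order, independent of the host. */
static void apply_relative(unsigned char *image, uint64_t load_base,
                           uint64_t r_offset, int64_t r_addend,
                           unsigned xlen_bytes) {
    uint64_t value = load_base + (uint64_t)r_addend;   /* B + A */
    for (unsigned i = 0; i < xlen_bytes; i++)
        image[r_offset + i] = (unsigned char)(value >> (8 * i));
}
```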
<xentrac>
oh, I see
<jrtc27>
(although llvm and binutils both let you write .4byte foo with rv64 and do indeed emit an R_RISCV_32... seems like a bad idea if you ask me...)
<xentrac>
I wonder if you could have gotten away without a TPREL* relocation type at all by having an init section in the .so that loops over its GOT to initialize it, if that's all you need them for
<jrtc27>
every single architecture that uses traditional ELF TLS has a TPREL relocation type
<jrtc27>
how does your magic section know which entries to initialise with what?
<jrtc27>
bearing in mind that it could be referring to symbols in a different object at run time
<xentrac>
I suppose the dynamic linker would have to invoke it with the addresses of the shared libraries in some kind of alist or something, but I haven't really thought it through at all
<xentrac>
my own prejudice is that it's desirable to avoid dependencies in the object file format on the ABI, so that people can experiment with varying the ABI without having to design their own object code format, though obviously that has its limits
<jrtc27>
all that does is invent your own object format with an ad-hoc ELF relocation equivalent
<jrtc27>
and things like TLS, GOTs and PLTs are rather tied to ELF
<jrtc27>
different object formats are associated with OSes that do things differently
<xentrac>
for example, I've been trying to figure out how to extend the ABI to support CLU iterators
<xentrac>
I think it should be doable in a way that's much more efficient than the standard function-pointer-plus-userdata approach
<jrtc27>
I don't see why you'd need any kind of dependency on the object file format
<jrtc27>
that's at a much higher level than something like TLS or function calls
<xentrac>
well, it's at the same level as function calls, really
<xentrac>
I mean the function calling convention is specifically what I'm modifying
<xentrac>
and I think that's valuable for applications like IMGUI, which is potentially a huge win for low-RAM microcontrollers
<jrtc27>
depends which part of function calls
<xentrac>
and I agree, I don't think things like that need a dependency on the object file format, even though they change the ABI
<jrtc27>
there's the calling convention part and there's the PLT+relocation part
<xentrac>
yeah
<sorear>
more efficient in what way?
<sorear>
you could use gcc nested functions :p
<xentrac>
running fewer instructions
* jrtc27
screams
<xentrac>
yeah, exactly
valentin has quit [Remote host closed the connection]
<jrtc27>
did you know riscv-tests used to have a nested function in it?
<jrtc27>
actually still does, never rebased that PR after it got approved...
<xentrac>
that's good
<xentrac>
but GCC nested functions are never going to be fast, so they're only useful for large-granularity stuff
<sorear>
why not? only a couple instructions
winterflaw has quit [Ping timeout: 244 seconds]
<xentrac>
well, consider the difference between
<xentrac>
if (button("Increment")) n++; numericTextField("Value", &n);
<xentrac>
and something using Ruby block arguments:
<xentrac>
(hmm, this is not the best example, maybe I should have used the more convincing example...)
<xentrac>
in this case `n` has to get allocated in memory by the time we call numericTextField anyway, but before that it can be in a call-preserved register in the first case, and the if... n++ can compile just to a conditional jump and an addi instruction
<jrtc27>
well gcc nested functions need an executable stack so no thank you
<xentrac>
irrelevant
<xentrac>
I mean the performance problem arises if you compile it as `struct cbctx { int *np; }; void cb(void *p) { struct cbctx *cp = p; ++*cp->np; } button("Increment", &cb, &(struct cbctx){ &n });`
<xentrac>
which doesn't require gcc's extension and might actually run faster
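A runnable version of the two shapes being compared: an immediate-mode `button()` that returns true, versus a callback form with a context struct. Neither is a real toolkit API. The extra `clicked` argument is a hypothetical stand-in for the framework's input handling. In the first form, `n` can stay in a register and `n++` compiles inline. In the second, `n` must be spilled to memory so that its address can travel through `ctx`.

```c
#include <stdbool.h>

/* Immediate-mode style: report the click and let the caller's code run. */
static bool button(const char *label, bool clicked) {
    (void)label;              /* a real toolkit would draw the label here */
    return clicked;
}

/* Callback style: n is reached through a context pointer. */
struct cbctx { int *np; };

static void cb(void *p) {
    struct cbctx *cp = p;
    ++*cp->np;
}

static void button_cb(const char *label, bool clicked,
                      void (*f)(void *), void *ctx) {
    (void)label;
    if (clicked)
        f(ctx);
}
```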
<jrtc27>
how else are you going to do it?
<xentrac>
well, basically the problem is that when button() returns it is guaranteed to have either not messed with the call-preserved register n is in, or restored it to its former glory after having clobbered it
<jrtc27>
gcc's extension is equivalent, it just doesn't have the luxury of a pair of pointers, it only gets to have the one function pointer, so needs to construct a thunk that embeds the data pointer
<xentrac>
I know
<jrtc27>
(and the struct is just the stack frame of the outer function)
<xentrac>
right
<xentrac>
(too bad we can't use stack frame types as struct types in other contexts)
<xentrac>
anyway, so when a CLU-style generator yields to a block from the caller, you would probably want it to restore the call-preserved register too. or at least some call-preserved registers
<jrtc27>
you can do that, just make yield actually return
<jrtc27>
and then have the caller re-call it if it's not done
<xentrac>
right, but then the generator doesn't have any place to efficiently store its own state
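A sketch of the "make yield actually return" shape: the generator keeps its state in a caller-provided struct, returns one value per call, and the caller re-calls it until it reports that it is done. The names are hypothetical. This shows where the objection bites: unless the compiler inlines the call, `next` and `limit` live in memory and are reloaded and stored on every iteration instead of staying in registers.

```c
#include <stdbool.h>

struct range_gen {
    int next;
    int limit;
};

/* Returns false when exhausted; otherwise "yields" *out by returning. */
static bool range_next(struct range_gen *g, int *out) {
    if (g->next >= g->limit)
        return false;
    *out = g->next++;
    return true;
}
```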
<xentrac>
preserving some or all of the call-preserved registers would allow you to compile the block { n++; } as something like addi %x9, $1; ret
<xentrac>
and if there are either normally-temporary registers the block is forbidden from clobbering, or call-preserved registers that are reserved for the generator, then the generator can use those registers to store its own state
<jrtc27>
hm, I see what you're getting at
<xentrac>
(I shouldn't say *all* of the call-preserved registers because obviously sp can't be restored while we're inside the block)
<pierce>
<jrtc27> "but I worry about cheri-..." <- What's CHERI RISC-V?
<xentrac>
a capability-secure version of RISC-V, it's super awesome
<xentrac>
my strawman assembly code for amd64 lets me "compile" `while { *s } { *t++ = *s++; }` with a `while` that is actually a library function to five instructions for `{ *s }` and five more instructions for `{ *t++ = *s++ }`, with a `while` that is 8 instructions, allowing the caller of `while` to pass just two bare code pointers to it
<xentrac>
maybe I should have used that example, but I think it's kind of an unconvincing example because making `while` a library function is sort of transparently a bad idea
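For comparison, this is roughly what `while` as a library function looks like in plain C. C cannot pass bare code pointers that share the caller's registers, so each block has to take an explicit environment pointer, and that is the overhead the amd64 strawman avoids. All names here are hypothetical.

```c
#include <stdbool.h>

struct copy_env { const char *s; char *t; };

/* The "library" while: calls cond, then body, until cond is false. */
static void while_lib(bool (*cond)(void *), void (*body)(void *), void *env) {
    while (cond(env))
        body(env);
}

static bool cond_s(void *p) {            /* { *s } */
    struct copy_env *e = p;
    return *e->s != '\0';
}

static void body_copy(void *p) {         /* { *t++ = *s++ } */
    struct copy_env *e = p;
    *e->t++ = *e->s++;
}
```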
<sorear>
this feels perilously close to what ENTER and LEAVE were designed for
<xentrac>
how so?
<sorear>
block structured languages etc
mahmutov has joined #riscv
Andre_H has quit [Quit: Leaving.]
<xentrac>
well, I suppose GCC nested functions are a C implementation of functionality common to most block-structured languages
<xentrac>
but I don't think ENTER and LEAVE really help you to write CLU-style generators
<sorear>
not as much as they could, they mostly assume functions aren't first-class
NeoCron has quit [Remote host closed the connection]
<xentrac>
block arguments, as a language feature, have a couple of advantages over general closures like GCC's nested function implementation: they are statically safe with stack allocation discipline (they don't require a garbage collector), and they can be implemented much more efficiently
<sorear>
you seem to be trying to do funargs without a heap. pure dynamic scoping would be a funny approach
<xentrac>
yes, that's exactly right
<xentrac>
because why should I have to suffer through writing everything as first-order code just because I want guaranteed static space bounds?
<sorear>
does button { x++; } create an on-screen button and save the block to call every time the button is clicked?
<xentrac>
have you looked at IMGUI toolkits like Dear Imgui or React?
<sorear>
only very cursorily. i'm more familiar with tk
<xentrac>
right, so, Tk is retained-mode; your widgets have a persistent existence in memory
<xentrac>
the idea of IMGUI is that your GUI is a function that the framework calls, which goes through and invokes a bunch of subroutines to draw the currently existing widgets, which can thus exist only on the stack while they're being drawn
<xentrac>
this means you can do a GUI on a microcontroller with a few dozen bytes of RAM
<xentrac>
so the code calling button() gets invoked periodically, maybe 60 or 120 times a second
<xentrac>
and button() returns true precisely when the framework has detected, say, a click inside the area where it's drawing the button
<xentrac>
most of the hassles of GUI programming stem from keeping the state of the GUI in sync with the state of your underlying model, and the IMGUI approach eliminates those hassles by basically eliminating the state of the GUI
<xentrac>
so no, it doesn't save the block to call every time the button is clicked; it just invokes it (in Dear Imgui, by returning true so that the enclosing if() will fire) in that moment
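A minimal immediate-mode sketch of the description above: no widget objects persist between frames. The framework calls the GUI function every frame, and `button()` returns true only when that frame's click lands inside the rectangle it would draw. `struct input`, `button()` and `frame()` are hypothetical names, and drawing is left as comments.

```c
#include <stdbool.h>

struct input { int mx, my; bool clicked; };

/* Hit-test this frame's click against the button's rectangle. */
static bool button(const struct input *in, int x, int y, int w, int h) {
    /* a real toolkit would draw the button here */
    return in->clicked &&
           in->mx >= x && in->mx < x + w &&
           in->my >= y && in->my < y + h;
}

/* The whole GUI, called by the framework e.g. 60 or 120 times a second. */
static void frame(const struct input *in, int *n) {
    if (button(in, 10, 10, 80, 20))
        ++*n;
    /* a numeric text field would draw *n here */
}
```

Only `n`, the model, persists across frames. The button's existence is re-derived from code every frame.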
<sorear>
do you know what a light pen is or how they used to work
<xentrac>
sure
<xentrac>
I mean I've read Sutherland's thesis
<xentrac>
and I've played Duck Hunt
<xentrac>
(it's probably easier to find a working NES than a working CRT TV to play Duck Hunt on nowadays)
<sorear>
Duck Hunt isn't exploring the full depth of this analogy since it uses a raster display
<sorear>
if you have a vector display, the light pen interrupt fires *during the draw call* of the pointed-to object
<xentrac>
well, Sutherland actually built a display list in half of memory
<xentrac>
but in principle, sure
<xentrac>
same idea
<xentrac>
also I think he didn't have interrupts
<xentrac>
most light pens after about 01970 used raster displays I think
jwillikers has quit [Remote host closed the connection]
rvalles has quit [Ping timeout: 248 seconds]
crabbedhaloablut has quit [Remote host closed the connection]
crabbedhaloablut has joined #riscv
mahmutov_ has joined #riscv
mahmutov has quit [Ping timeout: 252 seconds]
<dh`>
one of the things that you don't/can't have in elf relocations is a distinction between processor-specific and elf-specific operations
<jrtc27>
what do you mean?
<dh`>
e.g. "stuff an address in this auipc/lw pair" is riscv-specific and you need one for every combination of instruction fields the linker needs to be able to stuff values into
mahmutov_ has quit [Ping timeout: 248 seconds]
<dh`>
but "place an address into this word-sized location" is not, and neither is "place the GOT index here"
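A sketch of the split being described: a machine-independent "store this value in a word" step and a processor-specific "scatter this value into these instruction fields" step. The field layouts follow the RISC-V base ISA: auipc (U-type) holds imm[31:12] in bits 31:12, and lw (I-type) holds imm[11:0] in bits 31:20. The function names are hypothetical. A real linker computes `delta` from the auipc's own address, which the psABI expresses as a PCREL_HI20/PCREL_LO12_I pair.

```c
#include <stdint.h>

/* Generic: place a 32-bit value into a word-sized location (little-endian). */
static void put_word32(unsigned char *p, uint32_t v) {
    for (int i = 0; i < 4; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

/* RISC-V specific: split a PC-relative offset across an auipc/lw pair. */
static void patch_auipc_lw(uint32_t *auipc, uint32_t *lw, int32_t delta) {
    uint32_t d = (uint32_t)delta;
    uint32_t hi = (d + 0x800u) & 0xFFFFF000u;  /* rounds to cancel lo's sign */
    uint32_t lo = d & 0xFFFu;                  /* lw sign-extends this field */
    *auipc = (*auipc & 0x00000FFFu) | hi;      /* keep opcode and rd */
    *lw    = (*lw    & 0x000FFFFFu) | (lo << 20); /* keep opcode, rd, funct3, rs1 */
}
```

For example, patching `auipc t0, 0` (0x00000297) and `lw t0, 0(t0)` (0x0002A283) with an offset of 0x1FFC yields 0x00002297 and 0xFFC2A283, that is, +0x2000 from auipc and -4 from lw.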
<jrtc27>
not all architectures have GOTs
<jrtc27>
PowerPC has TOCs
<jrtc27>
also what address you want isn't always clear
<jrtc27>
e.g. on ia64 you can get either the function entry point or the function descriptor address
<jrtc27>
both use the same symbol, which you get depends on the relocation used
<jrtc27>
not sure how PE/COFF dealt with that for Windows on Itanium
<dh`>
all of this is resolvable
<dh`>
what I'm claiming basically is that the mechanism relocation set should sit on top of the physical relocation set
<dh`>
even if the mechanism relocation set isn't 100% MI, it would still make things a lot easier
<jrtc27>
I mean in practice it is
<jrtc27>
there's a set of relocations every architecture has
<dh`>
kinda
<jrtc27>
just different values
<jrtc27>
and they all need special handling anyway so there's not that much to gain from centralising it
<dh`>
and they're often not quite the same because whatever intern at sun or sgi or wherever didn't get it right the first two times
<dh`>
so it's all highly irregular and a pain in the neck
<jrtc27>
if it were interns I think we'd have ended up with something far worse :)
<dh`>
I suppose so
<xentrac>
MI?
<jrtc27>
machine-independent
<xentrac>
oh sorry
<jrtc27>
term often used in the BSD world
<xentrac>
thanks
<xentrac>
worse, like PE?
<dh`>
I dunno, based on having written a linker a few years ago I'd say relocations could be systematized a lot
<sorear>
dh`: have you looked at the musl dynamic linker?
freakazoid343 has quit [Read error: Connection reset by peer]