freakazoid334 has quit [Read error: Connection reset by peer]
freakazoid334 has joined #riscv
winterflaw has joined #riscv
freakazoid334 has quit [Read error: Connection reset by peer]
freakazoid334 has joined #riscv
somlo has quit [Quit: Leaving]
somlo has joined #riscv
choozy has joined #riscv
rvalles has quit [Read error: Connection reset by peer]
rvalles has joined #riscv
Andre_H has joined #riscv
Raito_Bezarius has quit [Ping timeout: 246 seconds]
Raito_Bezarius has joined #riscv
vagrantc has joined #riscv
jeancf has joined #riscv
<xentrac>
happy Leibniz day!
jeancf has quit [Ping timeout: 268 seconds]
iorem has quit [Quit: Connection closed]
freakazoid334 has quit [Read error: Connection reset by peer]
freakazoid334 has joined #riscv
zjason has quit [Ping timeout: 265 seconds]
devcpu has quit [Remote host closed the connection]
jeancf has joined #riscv
<meowray>
how to disable .sdata?
<jimwilson>
For gcc, -msmall-data-limit=0 should work.
<jrtc27>
-G 0
<jrtc27>
I think for clang
<jrtc27>
but depending on what you aim to achieve it may not help
<jrtc27>
symbols outside .sdata can still be relaxed to be gp-relative if they are in range, .sdata is just a way to gather them up and try to make sure they're in range
<jrtc27>
seems clang also accepts -msmall-data-limit, but gcc doesn't accept -G
<meowray>
ok, the issue i am investigating is unrelated to .sbss/.sdata; echo 'int var; void *addr(){return &var; }'>a.c; riscv64-linux-gnu-gcc -O2 -fno-pic -mcmodel=medany a.c -fuse-ld=bfd -nostdlib -no-pie -Wl,-Ttext=0xffff0000,-Tdata=0x2000
<meowray>
ld.bfd rewrites auipc/addi to lui/addi if pc cannot reach the data and the data is close to address zero
<meowray>
even in --no-relax mode
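A minimal sketch of why the PC-relative pair cannot reach `var` in the command above while an absolute pair can. It assumes the usual hi20/lo12 split (round the upper part so the low part fits a signed 12-bit immediate) and takes the code address to be the `-Ttext` value itself, which is only an approximation of where `addr()` lands. The helper names `split_hi_lo` and `fits` are made up for illustration.

```python
def split_hi_lo(value):
    """Split a value into (hi20, lo12) the way a hi/lo relocation pair does."""
    hi = (value + 0x800) >> 12   # round so that lo fits a signed 12-bit immediate
    lo = value - (hi << 12)      # always in [-2048, 2047]
    return hi, lo


def fits(value):
    """True if value is reachable with a 20-bit upper immediate plus a 12-bit addi."""
    hi, _lo = split_hi_lo(value)
    return -(1 << 19) <= hi < (1 << 19)


pc = 0xFFFF0000   # assumed code address, from -Ttext=0xffff0000
var = 0x2000      # data address, from -Tdata=0x2000

pcrel_ok = fits(var - pc)   # auipc/addi needs the offset from pc to fit
abs_ok = fits(var)          # lui/addi needs the absolute address to fit

print("auipc/addi reaches var:", pcrel_ok)
print("lui/addi reaches var:  ", abs_ok)
```

With these addresses the offset from pc is just under -4 GiB, outside the roughly ±2 GiB window, while the absolute address 0x2000 splits cleanly into hi=2, lo=0.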
Iryon has joined #riscv
jeancf has quit [Ping timeout: 272 seconds]
<jrtc27>
yeah, I've hated that reliance on relaxation for undef weak medany
<jrtc27>
it's the sole reason we have to build FreeBSD's kernel as a PIE rather than a PDE
<jrtc27>
oh so it's just a symbol that's in the low 2G (I guess a userspace address?)
<jrtc27>
having userspace symbols in the kernel is pretty gross...
<meowray>
it's something like vdso_image_base + link_time_address(__vdso_rt_sigreturn); to get link_time_address(__vdso_rt_sigreturn), we need lui/addi; however, medany only gives auipc+addi, so correctness is only fragilely provided by ld.bfd's instruction rewriting, even in --no-relax mode
<meowray>
yes, link_time_address(__vdso_rt_sigreturn) is in the low 2G, and can be treated as an absolute address
<jrtc27>
so __vdso_rt_sigreturn is really what __vdso_rt_sigreturn would have if linked with a base address of 0?
<jrtc27>
(ie what the symbol value in a .so would have?)
<meowray>
yes, the code wants to compute st_value(__vdso_rt_sigreturn)
<jrtc27>
that's awful
mahmutov has joined #riscv
<jrtc27>
my opinion of that is __vdso_rt_sigreturn should be a size_t that contains that offset
<jrtc27>
rather than abusing linker symbols
<jrtc27>
ie do the arm vdso_offset_##name thing
<jrtc27>
riscv is the only architecture that does things like that
<jrtc27>
make the linux implementation look like other architectures and not abuse linkers and your problem goes away
<meowray>
how does arch/arm64/include/asm/vdso.h vdso_offset work?
<jrtc27>
shell script generates a bunch of #define vdso_offset_foo
<jrtc27>
which is both more efficient and more reliable
<meowray>
that is nice. it works even for -pie and -shared.
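A rough sketch of the "generate a bunch of #define vdso_offset_foo" idea, written in Python rather than shell. It is not the kernel's actual script: the `nm` output, the addresses, and the `__vdso_` prefix filter are all assumptions made for illustration, and `gen_vdso_offsets` is a hypothetical helper.

```python
import re

# Hypothetical `nm` output for a vDSO image linked at base address 0.
NM_OUTPUT = """\
0000000000000800 T __vdso_rt_sigreturn
0000000000000a00 T __vdso_gettimeofday
0000000000000000 A LINUX_4.15
"""


def gen_vdso_offsets(nm_text, prefix="__vdso_"):
    """Turn `address type name` lines into C #define lines for matching symbols."""
    defines = []
    for line in nm_text.splitlines():
        m = re.fullmatch(r"([0-9a-fA-F]+) \S (\S+)", line)
        if m and m.group(2).startswith(prefix):
            name = m.group(2)[len(prefix):]
            offset = int(m.group(1), 16)
            defines.append(f"#define vdso_offset_{name} 0x{offset:x}")
    return "\n".join(defines)


header = gen_vdso_offsets(NM_OUTPUT)
print(header)
```

Because the offsets end up as plain integer constants in a header, the kernel code never asks the linker for the address of a vDSO symbol, which is why the scheme is indifferent to -pie, -shared, or the code model.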
<jrtc27>
note that the riscv approach would not work for arm64, whether using bfd or lld, for exactly the same reason that riscv lld doesn't work
<jrtc27>
yeah, it's the sane way of doing it
* meowray
runs `make -k -j 50 ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- LLVM=1 LLVM_IAS=1 O=/tmp/out/arm64 defconfig vmlinux` to better understand what vdso_offset does
<meowray>
compnerd's question: is the auipc/addi->lui/addi transformation in medany code model valid from psABI's perspective?
<jimwilson>
The auipc/addi must be fixed, and it doesn't matter how it gets fixed. I used lui/addi because it shows what happened, and it is trivial to implement; if you see that then you know that you had an undef weak reference.
<palmer>
oh, weird, someone just poked me about this on the Google chat
<palmer>
so I'm happy to take a fix that moves this away from the symbol stuff, I just ran into build system headaches last time
<meowray>
jimwilson: "The auipc/addi must be fixed" do you mean the lui/addi conversion is invalid?
<meowray>
i don't find vdso_offset in an arm64 build, with `make all`
<jrtc27>
they're #define's
<jimwilson>
in medany, auipc can't reach an address of zero, so it must be fixed, converting it to lui is the easy solution
<jrtc27>
well, that's under the assumption that generating auipc in the first place was legitimate
<jrtc27>
other architectures take the view that's bad codegen
<jrtc27>
I would argue fixing codegen is the cleaner approach
<jimwilson>
the compiler can't know that you have an undefined weak, other arches do the same conversion, though not all convert to valid instructions
<jrtc27>
no they don't
<jrtc27>
they know you have a maybe-undefined weak
<jrtc27>
aarch64 will always make that indirect
<jrtc27>
and only use pc-relative if it can see a definition in the same translation unit
<jrtc27>
that means no instruction rewriting needed
<jimwilson>
when I did the elf linker fix, I copied from other targets, so yes, there are other targets that do the same, there may also be other targets that handle this differently
<palmer>
The "undefined weak is 0" is a different problem than the VDSO symbols/offsets, it's just the same line in binutils that happens to make both work
<jrtc27>
yes
<palmer>
IMO we shouldn't be relying on the near-0 addresses, and that shouldn't be in the spec
<jrtc27>
one needs to work without C source changes, the other is "don't do that"
<jrtc27>
agreed
<palmer>
OK, well, the guy from the Google chat said he'd send a fix for the Linux problem
<palmer>
So I think we're good there
<jrtc27>
great
<palmer>
IDK if there was also an argument about undefined weak, it's kind of hard to follow all the shouting
<meowray>
i think it is valuable because it gives you direct modifier control in inline asm
freakazoid334 has quit [Read error: Connection reset by peer]
<palmer>
OK, so this is different than the VDSO thing
<palmer>
but I'd be in favor of adding some sort of "laa" (load absolute address), which does symbol resolution but always does absolute addressing
<palmer>
that is useful in many contexts, and IIUC that's what you're suggesting in that bug
<palmer>
in terms of the asm constraint, if it works then we should just address it
<palmer>
*document it
<jrtc27>
yeah, I've been meaning to add "lga"
<jrtc27>
("load global address")
<jrtc27>
"laa" doesn't exist, that's currently always lui+addi
<palmer>
sure, I don't really care what you call it
<palmer>
but having the ability to say "I want to generate an absolute address here" is useful, even if there's another way to do it for the VDSO offsets
<jrtc27>
"lga" would be the always-GOT version of la, like "lla" is the always-PC-relative version of la; currently you have la that's either GOT or PC-relative and you can only force it one way
<palmer>
ah, OK, that'd be different
<jrtc27>
kinda, though it'd work in every case you want laa
<palmer>
but ya, having the ability to explicitly control these is the right way to go
<jrtc27>
the only reason the pseudos exist currently is because of the weirdo %pcrel_lo handling
<palmer>
no, we can't always rely on a working GOT
<jrtc27>
why not?
<palmer>
because the code that sets up the GOT might not have run yet
<palmer>
that kind of stuff is usually when I want to be very explicit about my addressing scheme
<jrtc27>
laa is unlikely to help you there either unless it's a PDE
<jrtc27>
that kind of thing almost always wants to be lla
<palmer>
OK, well I don't really care that much about the assembler shortcuts
<palmer>
so are there any more issues with the undefined weak? is this issue with generating 0 via a promotion to absolute addressing a problem in LLVM land?
<palmer>
there's a lot in Farugi's blog and I'm not really sure what (if anything) needs to be fixed here
<jrtc27>
well, currently clang matches gcc codegen and will do PC-relative addressing for extern weak symbols
<jrtc27>
and lld does not do any instruction rewriting
<jrtc27>
so gcc+lld and clang+lld are both broken in the same manner
devcpu has joined #riscv
<jrtc27>
we could just implement the instruction rewriting, but it's kinda gross
<palmer>
well, I don't see a way to make LLD function without rewriting
<palmer>
the binaries already exist, so if you want to be compatible with GCC's binaries then you need to rewrite
<palmer>
IDK if we put that in the psABI spec, but we should as it's necessary to make real systems work
<meowray>
for -fno-pic -mcmodel=medany codegen, i agree with jrtc27 that auipc/addi codegen is wrong on the compiler side
<palmer>
I disagree
<palmer>
what do you want generated?
<meowray>
GOT for undefined weak
<palmer>
no, that's an unnecessary performance hit and will break real code (a lot of the embedded stuff doesn't have a GOT, as it's not otherwise necessary for non-PIC)
<palmer>
also, it doesn't fix the problem of old binaries
<palmer>
I don't understand why you're so against just generating the absolute 0
<palmer>
it's just a few more bits to set in the instruction
<palmer>
(or, I guess overwrite -- you're setting them to 0)
<meowray>
i think you misunderstand GOT
<palmer>
probably ;)
<meowray>
a GOT entry can have a constant value or a non-constant value. A constant value doesn't need GLOB_DAT. A non-constant value needs GLOB_DAT. A constant value works everywhere
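A toy model of the distinction just described, not taken from any linker's source. It assumes only two cases: the slot's value is known at link time (a defined symbol, or an undefined weak that resolves to 0), or it must be filled in at load time. `GotSlot` and `got_slot` are hypothetical names.

```python
from typing import NamedTuple, Optional


class GotSlot(NamedTuple):
    value: int            # what the linker writes into the slot
    needs_glob_dat: bool  # whether a dynamic relocation must fill it at load time


def got_slot(link_time_address: Optional[int], resolved_at_load_time: bool) -> GotSlot:
    """Decide the contents of one GOT slot in this simplified model."""
    if resolved_at_load_time:
        # Non-constant value: leave a placeholder and emit a GLOB_DAT relocation.
        return GotSlot(0, True)
    # Constant value: a defined symbol's address, or 0 for an undefined weak.
    return GotSlot(link_time_address or 0, False)


print(got_slot(None, False))     # undefined weak in a static, non-PIC link
print(got_slot(0x2000, False))   # symbol defined at a fixed address
print(got_slot(None, True))      # value only known to the dynamic loader
```

In the constant case the slot is just read-only-style data with a number in it, so nothing has to run at startup to "set up" the GOT before the load works.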
<palmer>
but we don't generate GOT-relative references now, so just sneaking them in is going to break stuff
<meowray>
i am against "linker instruction rewriting as a workaround for incorrect compiler codegen"
<palmer>
well, IMO it's correct codegen
<meowray>
instruction rewriting is an **optional feature**. relying on it for correctness is awful
<palmer>
I'm not sure what you mean there: the linker has to rewrite instructions, that's the only way to get the constants in there
<jrtc27>
it rewrites the immediate as directed by the relocation, it does nothing to the instruction
<jrtc27>
*opcode
<palmer>
you don't need to rewrite the opcode here either, just the base register
freakazoid333 has joined #riscv
<meowray>
auipc a0, %got_pcrel_hi(var); ld a0, %pcrel_lo(.LBB0_1)(a0) # the GOT-generating relocation (R_RISCV_GOT_HI20) asks ld to generate a GOT entry. ld can fill a constant value at the entry and suppresses the dynamic relocation GLOB_DAT
<jrtc27>
for the performance argument: (a) this only affects extern weak symbols which is not very common (b) aarch64 already takes this approach and nobody's complaining about that
<meowray>
if you want to be clever, you can implement **optional** GOT indirection to PC-relative linker optimization: auipc %got_pcrel_hi -> lui, but the point is: it is optional
<meowray>
(the (correct) optimization is implemented in x86-64 GOTPCRELX and ppc64 TOC16*, and Mach-O arm64 LC_LINKER_OPTIMIZATION_HINT)
<meowray>
correct to : GOT indirection to PC-relative linker optimization: I mean GOT indirection to absolute/PC-relative linker optimization