<mrvn>
I don't even get why it's doing the same request 3 times
<ddevault>
geist: hm, aight
<mrvn>
Same issue with 1.1.1.1 by the way.
<heat>
glibc right?
<mrvn>
me?
<heat>
i've seen some pretty weird behavior wrt nss when it was misconfigured
<mrvn>
normal Debian install. I only configured dhcp to leave the resolv.conf alone.
<mrvn>
It looks like the first message to 8.8.8.8 contains 2 DNS queries and only gets one reply. Waiting for a second times out and then it sends each request again separately. Do I see that right?
<heat>
that looks correct yes
<heat>
let me trace mine
<heat>
mrvn, how does your resolv look?
<mrvn>
Do you get the same when you strace "ping www.debian.org"?
<bslsk05>
www.debian.org: Debian -- The Universal Operating System
<mrvn>
"nameserver 8.8.8.8"
<heat>
ok lets see
FatAlbert has joined #osdev
<mrvn>
I'm guessing the 2 dns requests are IPv4 and IPv6
<heat>
mine only sends a single DNS request it seems
<heat>
wait, makes sense, it's for the reverse lookup of the IP addr
<heat>
so, two 32 byte msgs sent by sendmmsg, no timeouts and a 48 byte response and a 60 byte response
<heat>
everything works here
<heat>
mrvn, how does your wireshark look?
<heat>
if you don't get two responses, you may have a broken firewall or router in the way I guess
FatAlbert has quit [Ping timeout: 258 seconds]
<mrvn>
standard query for A, AAAA, response for A, 5s pause, query for A, response A, query AAAA, response AAAA
<mrvn>
It looks like the second query is lost because it's sent as a separate frame
<heat>
hm?
<heat>
it's not lost
<heat>
you're just losing the response
ethrl has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
Likorn has quit [Quit: WeeChat 3.4.1]
ethrl has joined #osdev
ethrl has quit [Client Quit]
Likorn has joined #osdev
<zid`>
It's not lost, heat just can't find it and doesn't know where it is
<zid`>
ddevault: Did you fix your EFLAGS?
<ddevault>
I will be looking into that tomorrow
<zid`>
It should just be that the stack frame you constructed that iret pops hasn't got the interrupt enable bit set in eflags, from what you said
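A minimal sketch of the fix zid` describes, assuming a 64-bit kernel that builds the initial `iretq` frame for a new task by hand. The struct and function names (`iret_frame`, `make_frame`) are hypothetical and the selector values are left to the caller; only the flag bit positions are architectural.

```c
#include <stdint.h>

/* Architectural EFLAGS/RFLAGS bits. */
#define FLAGS_RESERVED1 (1u << 1)  /* bit 1 always reads as 1 */
#define FLAGS_IF        (1u << 9)  /* interrupt enable flag */

/* Hypothetical layout of the frame popped by iretq in 64-bit mode,
 * lowest address first. */
struct iret_frame {
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
};

/* Fill in a frame for a fresh task. Selectors are parameters because
 * they depend on the kernel's GDT, which this sketch knows nothing about. */
struct iret_frame make_frame(uint64_t entry, uint64_t stack,
                             uint16_t cs, uint16_t ss)
{
    struct iret_frame f = {
        .rip = entry,
        .cs = cs,
        /* Without FLAGS_IF the task starts with interrupts masked. */
        .rflags = FLAGS_RESERVED1 | FLAGS_IF,
        .rsp = stack,
        .ss = ss,
    };
    return f;
}
```

If the frame is built with `rflags` left at zero, the task resumes with interrupts disabled, which matches the symptom being discussed.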
<ddevault>
that's probably it
ethrl has joined #osdev
citrons has joined #osdev
ethrl has quit [Read error: Connection reset by peer]
bauen1 has quit [Quit: leaving]
Likorn has quit [Quit: WeeChat 3.4.1]
floss-jas has quit [Remote host closed the connection]
nyah has quit [Quit: leaving]
<gorgonical>
Do I "have" to store the address of the per_cpu data area in a register?
<gorgonical>
I guess I do, don't I?
<mjg_>
what arch is this? amd64?
<gorgonical>
risc-v
<mjg_>
oh, no opinion :)
<gorgonical>
I'm trying to figure out how Linux does this: I think linux uses tpidr_el0 for TLS, and then tpidr_el1 for PDA, and then I forget how/where the task_struct is stored
<gorgonical>
RISC-V so far I think uses tp register for TLS, then CSR_SCRATCH for task_struct, and I have no idea where the PDA is stored
<mrvn>
Do you want to access the per_cpu data area often and fast?
<gorgonical>
Even if I didn't, wouldn't I still need some way for the CPUs to know their index in the central directory?
<mrvn>
the cpu normally has an opcode to get the ID
<gorgonical>
There *is* a uscratch register that maybe I can shove this into. That saves one dereference
<gorgonical>
My fear is that libc or something already uses uscratch
bauen1 has joined #osdev
<mrvn>
How many registers do you have that the kernel can write to but user can't?
<heat>
gorgonical, there's a tls register iirc
<gorgonical>
Yeah, usually tp, I think that's like x4
<heat>
yeah
<heat>
I use it for my percpu data in the kernel since it's, well, my tls for all intents and purposes
<mrvn>
can't userspace write to x4?
<gorgonical>
But in user-mode that's tls storage. The kernel exception handler I'm stealing from linux shoves the task_struct* into the tp register
<gorgonical>
mrvn: yes but that's why the kernel stores whatever it wants in one of these scratch registers
<heat>
you keep it in the scratch register and swap it
<heat>
i didn't read everything that went after that
<heat>
but they do their loading right there
<gorgonical>
I only sort of understand how this is done on ARM64 so my understanding of how they solved this problem is vague
<gorgonical>
heat: what do you mean about the loading?
<heat>
loading of the tp
<gorgonical>
Yeah. tp in userland points to tls. In kernel land they want it to point to the task_struct. So first insn is to swap them. CSR_SCRATCH contains the task_struct ptr. Then all the context switching is with the kernel-tp
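The swap gorgonical describes can be modeled in plain C. This is only a toy model of two registers, not real CSR access: `hart_regs`, `trap_enter` and `trap_exit` are hypothetical names, and clearing the scratch register to 0 while in the kernel is an assumption of this sketch rather than something stated in the discussion.

```c
#include <stdint.h>

/* Toy model of the two registers involved; not real hardware access. */
struct hart_regs {
    uintptr_t tp;        /* x4: user TLS pointer while in user mode */
    uintptr_t sscratch;  /* supervisor scratch CSR */
};

/* Models `csrrw tp, sscratch, tp`: exchange the two values. */
void swap_tp_sscratch(struct hart_regs *r)
{
    uintptr_t old_tp = r->tp;
    r->tp = r->sscratch;
    r->sscratch = old_tp;
}

/* Trap entry from user mode: after the swap, tp holds the kernel pointer
 * and sscratch briefly holds the user's tp, which the handler saves. */
uintptr_t trap_enter(struct hart_regs *r)
{
    swap_tp_sscratch(r);
    uintptr_t user_tp = r->sscratch;  /* would go into the trap frame */
    r->sscratch = 0;  /* assumption: 0 marks "already in the kernel" */
    return user_tp;
}

/* Return to user mode: park the kernel pointer in sscratch again and
 * restore the user's tp from the saved value. */
void trap_exit(struct hart_regs *r, uintptr_t saved_user_tp)
{
    r->sscratch = r->tp;
    r->tp = saved_user_tp;
}
```

Whether the kernel-side value is a `task_struct` pointer or a per-CPU pointer does not change the mechanics; only what the handler dereferences afterwards.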
genpaku has quit [Quit: leaving]
X-Scale has joined #osdev
<gorgonical>
I've had a busy day so maybe I overlooked it, but the thread_info struct at the start of the task_struct struct doesn't appear to contain the pda
<mrvn>
why should the task struct have anything about the per-core data?
<gorgonical>
It shouldn't, but the thread_info might point to it I suppose
<klange>
On ARM64, I tell the compiler x18 is reserved and then stick the per-core pointer in there. I stole that from geist. That's only in the kernel; x18 is swapped on context switch with everything else
<gorgonical>
klange: That might be the thing to do
<zid`>
sounds like mips where k0 is free for the kernel or whatever
<klange>
In userspace, tpidr_el0 is the thread pointer. This is fully controlled by userspace and dutifully restored by the kernel. Then gcc's built-in understanding of thread-locals takes over.
<klange>
The important thing to note is the difference between per-core and per-thread. An execution context that is per-thread does not unexpectedly change between function calls, but a per-core one definitely can if one of those function calls is a context switch.
<klange>
So trying to convince the compiler to use thread-local stuff for your per-core stuff is a no-no, as it could elide a load after a function call; it can't do that if you tell it to always reference based on the register
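The hazard klange describes can be simulated on any host. In this sketch a global variable stands in for the reserved register, and `schedule_and_migrate` is a hypothetical stand-in for a context switch that resumes the thread on another core. The "cached" function spells out by hand what a compiler might do if it treated the pointer as thread-local.

```c
struct percpu { int id; };

static struct percpu cores[2] = { { .id = 0 }, { .id = 1 } };

/* Stands in for the reserved register holding the per-core pointer. */
static struct percpu *core_reg = &cores[0];

/* Models a context switch that resumes the thread on the other core. */
void schedule_and_migrate(void)
{
    core_reg = (core_reg == &cores[0]) ? &cores[1] : &cores[0];
}

/* Load once, reuse after the call: valid for per-thread data,
 * wrong for per-core data. */
int core_id_cached(void)
{
    struct percpu *me = core_reg;
    schedule_and_migrate();
    return me->id;        /* stale: still the old core */
}

/* Re-read the "register" after the call. */
int core_id_reloaded(void)
{
    schedule_and_migrate();
    return core_reg->id;  /* the core we are actually on now */
}
```

Starting on core 0, `core_id_cached()` still reports core 0 even though the thread has moved to core 1, which is exactly the elided reload being warned about.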
<mrvn>
Another thing that's nicer without per task kernel stack. per-core never changes while inside the kernel.
<gorgonical>
I see. I mean the naive way is to use cpuid as an index, but I think that's pretty slow on riscv
<klange>
The option to tell gcc and clang that x18 is reserved is `-ffixed-x18`. You can also make that part of your ABI spec and bake it in... which Apple does in userspace on macOS for stupid legacy reasons.
<heat>
gorgonical, the proper way is to keep the percpu data pointer in sscratch and tp
<mrvn>
gorgonical: and every other arch too
<klange>
It's slow everywhere. Reserving a register is better because register-based addressing is universally faster.
<gorgonical>
heat: what do you mean?
<klange>
It's even faster than the tpidr lookups on ARM, since those still need a cycle or two to pull the msr out into a general register anyway - which is why gcc and clang will happily elide that operation when they can (*if they are doing it as part of native TLS)
<heat>
gorgonical, erm. just keep it there
<heat>
what more do you need?
<klange>
I recently did a thing on macOS to manually do TLS operations for my interpreter because macOS's default is always calling out to library functions and using (basically) GOT callbacks
<gorgonical>
and then put a ptr to the currently executing task in the percpu?
<heat>
yes
<heat>
that's what I do
<gorgonical>
I think that's the best option I have unless I discover linux does something really smart
<gorgonical>
thank you
<klange>
I put something a level above the task, but maybe that's a design mistake on my part
<heat>
i mentioned this option like 20 minutes ago xD
<gorgonical>
you did but I got confused about who mentioned what
<mrvn>
You can chain it all from bottom to top: core -> thread -> process -> group
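One way to picture the chain mrvn mentions, as a sketch with hypothetical struct and field names. Each level holds a pointer to the next, so the per-core block is the only thing that needs a dedicated register.

```c
#include <stddef.h>

struct group   { int gid; };
struct process { int pid; struct group *group; };
struct thread  { int tid; struct process *process; };
struct core    { int id;  struct thread *current; };

/* Walk the chain from the per-core block up to the owning group.
 * Returns -1 if any link is missing (for example an idle core). */
int current_gid(const struct core *c)
{
    if (c == NULL || c->current == NULL ||
        c->current->process == NULL ||
        c->current->process->group == NULL)
        return -1;
    return c->current->process->group->gid;
}
```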
<klange>
fun fact, for the userspace thread pointer macos uses tpidrro_el0 instead of tpidr_el0
<klange>
UNLIKE EVERYONE ELSE
<heat>
what's tpidrrrrrrrrrrrrrrrrrrrrrro
<klange>
which means __builtin_thread_pointer does the wrong thing
<klange>
it's like tpidr but read only (and also it's a different register)
<heat>
sounds like __builtin_thread_pointer needs to be fixed for the darwin targets
<klange>
yeah, not sure why it hasn't been fixed to return the right thing
<mrvn>
That sucks, no user space threads that way.
<klange>
I assume because it's part of a gcc compatibility thing that they only care about on linux
<klange>
it's only read-only in the direct sense
<klange>
you can still use a syscall to set it
<klange>
and they use it the same way in the end
<klange>
_except_ that they push it all behind hooks that the dynamic linker sets up
<klange>
rather than actually linking slot lookups
<klange>
so everything is slow as hell
<heat>
10 bucks on how there's a massive exploit in the silicon and they abstracted it that way so you can't set bad tp values
<bslsk05>
github.com: kuroko/vm.h at master · kuroko-lang/kuroko · GitHub
<mrvn>
kind of defeats the purpose of user space threads if you have to syscall to swap threads.
<klange>
This inlines the thread slot lookup _like every other platform does normally_, so thread-local storage is just as fast as it is on Linux, or ToaruOS.
<heat>
you almost always have to use a syscall to swap threads
<heat>
fsgsbase is super recent in the grand scheme of things
<mrvn>
heat: tls too
<heat>
fsgsbase is like 2014 recent
<heat>
not 2001 recent
<mrvn>
post c99, basically still wet paint :)
<heat>
huh
<heat>
how old is tls actually?
<heat>
like as an actual concept
<mrvn>
I would use tpidrro_el0 for the shared kernel/user thread pointer and tpidr_el0 for tls.
<heat>
i used ~linux's nptl date
<heat>
mrvn, you'll leak a kernel address that way
<bslsk05>
en.cppreference.com: C++ keywords: thread_local (since C++11) - cppreference.com
<klange>
tpidr_el0 on macOS _appears_ to be the core ID.
<mrvn>
heat: obviously. It's shared. Things like the pid and tid.
<klange>
Not the thread ID, not a _pointer_ to a core struct. Just a number for the core you are on.
<klange>
The most useless thing ever.
<mrvn>
So if userspace writes to tpidr_el0 the core suddenly is a different core to the kernel?
<klange>
tpidrro_el0 is the base of the thread-local data, which uses the descriptor slot approach; the rest of the thread struct is behind it at a fixed offset (basically it points into the middle of a struct, where a big array of pointers happens to start, which is typical)
<heat>
can't you switch tls models?
<heat>
or do they just not support it
<klange>
models? there are no models on macOS
<klange>
The only thing that has different TLS models is ELF.
<mrvn>
Another of those optimizer things. Setting the tpidrro_el0 is slow so you make it point to a pointer to thread local.
<gorgonical>
Update: it actually seems like Linux does the simple thing and just takes the processor ID as an index
<klange>
(I _wish_, initial-exec is what I want for my interpreter, of course)
<heat>
gorgonical, that's horrific
<klange>
(inlined dynamic isn't really that slow, though, if done properly)
<mrvn>
damn, I wanted to do some more work on my kernel this week and it's friday already.
<klange>
Symbol points to the descriptor, descriptor has key (index into the thread-local pointer array) + offset (because keys can be shared by many thread-locals), then you do tp[key]+offset and try to inline that as best as you can...
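The `tp[key]+offset` lookup can be written out as a small portable sketch. The descriptor layout and the names `tlv_descriptor` and `tlv_address` are hypothetical and simplified; a real platform's descriptor may carry more fields, and `tp` here is just an ordinary array standing in for the thread pointer.

```c
#include <stddef.h>

/* Hypothetical descriptor: which slot of the per-thread pointer array
 * to use, and where inside that slot's block the variable lives. */
struct tlv_descriptor {
    size_t key;
    size_t offset;
};

/* tp models the thread pointer: the base of an array of block pointers.
 * The lookup is one indexed load plus one addition. */
void *tlv_address(void *const *tp, const struct tlv_descriptor *d)
{
    return (char *)tp[d->key] + d->offset;
}
```

Because several thread-locals can share one key, the offset is what distinguishes them within the same block.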
<klange>
anyway I made my interpreter speedy on macos by abusing knowledge of how the thread local storage model works and it's great, the end
<klange>
thank you apple for at least having these parts of macos be open-source
<heat>
yes, that's the standard for dynamically linked objects in linux as well afaik
<gorgonical>
heat: so linux does what you suggested; store the cpuid that the thread is currently running on and use that
<klange>
(also for various reasons they can't change this abi, so this inlining of what dyld's hooks do is totally safe)
<mrvn>
Looking back over how long this TLS discussion is running I really have to ask: Aren't threads more trouble than they are worth?
<gorgonical>
or what mrvn suggested, I literally can't remember
<heat>
gorgonical: no, i didn't suggest that, because that's slow
<klange>
mrvn: probably
<heat>
mrvn, no?
<heat>
what's the alternative?
<klange>
more processes
<mrvn>
just imagine all the data races you avoid by not having threads :)
<klange>
"I don't see the problem with that." - CPython
<gorgonical>
heat: what's the alternative? This is faster than getting the cpuid directly, isn't it?
<mrvn>
heat: factory model with message passing works nicely.
<gorgonical>
Oh you're suggesting to directly put the ptr in, not the index
<gorgonical>
yes that is better strictly
<heat>
gorgonical, yes
<mrvn>
gorgonical: if you store the ID in a register you might as well just store a pointer
<mrvn>
one less memory access and addition
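A side-by-side sketch of the two options, assuming a hypothetical `percpu_dir` table of pointers as the "central directory". The register contents are passed in as a plain integer so the example runs on any host.

```c
#include <stdint.h>

struct percpu { int id; void *current_task; };

/* Hypothetical central directory: one pointer per core. */
static struct percpu *percpu_dir[4];

/* Register holds an index: scale it, add the table base, then load
 * the pointer from memory. */
struct percpu *percpu_from_index(uintptr_t reg)
{
    return percpu_dir[reg];
}

/* Register holds the pointer itself: no addition, no extra load. */
struct percpu *percpu_from_pointer(uintptr_t reg)
{
    return (struct percpu *)reg;
}
```

Both return the same block; the second simply skips the address arithmetic and the table load that mrvn is counting.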
<gorgonical>
Makes me wonder why linux does it this way then