<bslsk05>
wiki.osdev.org: Thread Local Storage - OSDev Wiki
<ddevault>
cc sortie
<ddevault>
if I understand this correctly, once I allocate the TLS structure, I should set both FSBASE and %fs to the address of the self-pointer?
<ddevault>
or do I set %fs to zero and FSBASE to the TLS area
<ddevault>
yeah, I think that's what I want
moon-child is now known as MOON-CHILD
scaleww has joined #osdev
MOON-CHILD is now known as moon-child
<klange>
ddevault: the mentioned IRC convo is not informative, don't worry about finding it
<ddevault>
ack
<klange>
The descriptor set in fs is meaningless once you hit long mode, so don't worry about that either
<ddevault>
I think I have everything in place
<ddevault>
the last step is preserving fs/fsbase on context switch
<klange>
Let userspace set fsbase through whatever mechanism you prefer, and then it'll be the job of a thread library or dynamic loader to set the thread base with it. If you want compatibility with standard gcc/clang thread-local stuff, there are some requirements about where it points and a dozen different ways it can be referenced in ELF relocations
<klange>
Yeah, just consider it part of the process/thread/scheduler state and set it as you switch. gsbase is more complicated with the two MSRs and the swap...
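A minimal sketch of that thread-state handling, assuming a Rust kernel with a hypothetical `Thread` struct; the MSR numbers are the architectural IA32_FS_BASE family, everything else is illustrative:

```rust
use core::arch::asm;

const IA32_FS_BASE: u32 = 0xC000_0100;

/// Hypothetical per-thread state; fsbase is saved alongside the GP registers.
struct Thread {
    fs_base: u64, // userspace TLS pointer, set via whatever syscall you expose
    // ... the rest of the saved context
}

/// Write a model-specific register (ecx selects the MSR, edx:eax is the value).
unsafe fn wrmsr(msr: u32, value: u64) {
    asm!(
        "wrmsr",
        in("ecx") msr,
        in("eax") value as u32,
        in("edx") (value >> 32) as u32,
        options(nostack, preserves_flags),
    );
}

/// Called on the context-switch path once the incoming thread is chosen.
unsafe fn switch_tls(next: &Thread) {
    wrmsr(IA32_FS_BASE, next.fs_base);
    // gsbase is the messier one klange mentions: IA32_GS_BASE (0xC0000101)
    // and IA32_KERNEL_GS_BASE (0xC0000102) trade places on `swapgs`, so the
    // kernel has to track which of the two currently holds the user value.
}
```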
<ddevault>
my kernel's init loader is setting up fsbase/fs
<ddevault>
otherwise yeah, just adding it to the task state
<ddevault>
GNU ld is starting to get annoyed with my linker scripts
<ddevault>
cool thing is that adding two more registers to my arch context struct causes a GP fault
poyking16 has quit [Read error: Connection reset by peer]
poyking16 has joined #osdev
SpikeHeron has quit [Quit: WeeChat 3.5]
SpikeHeron has joined #osdev
[itchyjunk] has joined #osdev
dude12312414 has joined #osdev
gildasio has quit [Remote host closed the connection]
gildasio has joined #osdev
foudfou has quit [Quit: Bye]
foudfou has joined #osdev
gildasio has quit [Ping timeout: 268 seconds]
gildasio has joined #osdev
scaleww has quit [Read error: Connection reset by peer]
arch_angel has joined #osdev
arch_angel is now known as arch-angel
wootehfoot has joined #osdev
nyah has quit [Quit: leaving]
SpikeHeron has quit [Quit: WeeChat 3.5]
bauen1 has quit [Ping timeout: 260 seconds]
bauen1 has joined #osdev
wootehfoot has quit [Ping timeout: 245 seconds]
hypoalgesia has joined #osdev
SpikeHeron has joined #osdev
ripmalware_ has joined #osdev
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
ripmalware has quit [Ping timeout: 272 seconds]
<junon>
Oh hey it's drew, what are you doing in here
<junon>
:D
Matt|home has quit [Read error: Connection reset by peer]
<geist>
i'm drawing a blank
<GeDaMo>
That would save on ink :|
ripmalware_ has quit [Ping timeout: 244 seconds]
<vdamewood>
Not if it were blank ink.
<gog>
mew??
* vdamewood
gives gog a fishy
* gog
eat fishy
Vercas has quit [Remote host closed the connection]
<junon>
Btw, update on the weird crash in debug mode. All of my IRQ handlers are in place, each just logs and halts for now. I've verified fault handlers are printing messages and whatnot, even out to serial.
<junon>
Debug mode still just exits without an error. Definitely a bug in QEMU.
<gog>
earthquake :O
<vdamewood>
gog: google confirms.
<geist>
junon: haha almost certainly not
<gog>
pretty far away but still made a lot of noise
<geist>
if you think it's a bug in qemu or the compiler, almost overwhelmingly it's not
<junon>
gog: iceland? Or in CA?
<gog>
heard the wave through one window then the other
<junon>
geist: I think it's a bug in TCG. The output of Qemu even says so. The exact same code works fine in release, not in debug.
<gog>
almost no quakes to constant quakes
<geist>
junon: while that may be true, it's almost certainly not
<GeDaMo>
Volcano related?
<gog>
yeh
<geist>
99.9% of the time you think it's qemu or the compiler, it's not. you have to dig deeper
<vdamewood>
gog basically lives on a volcano.
<gog>
they're predicting one to open up in the sea west-southwest of grindavík
<geist>
it does blow my mind, only like 350k people live in iceland
<gog>
geist: yeh and about 50k of us are foreigners
<junon>
geist: I have no other explanation for it. None of my fault handlers are firing. The same exact, unchanged code worked fine for a long time. It was just after a refactor that it happened. I've enabled -d and nothing looks out of the ordinary except for QEMU saying the translation failed. It performed a function call, then a few instructions, and then QEMU barks about invalid translation and exits 0.
<geist>
i've seen factories in china with more people than that for a work shift
<geist>
junon: that means you have one of the bazillion kinds of situational heisenbugs
<geist>
that happens *all the time* in osdev
<junon>
There's... not a lot going on tbh. It's pretty simple code. I have no other way of debugging this aside from what I've tried.
* vdamewood
eats the heisenbug.
<geist>
i'm not saying you're full of shit i'm just pointing out that this is a very common trap, and the first thing you need to do is not assume it's qemu's fault and think outside the box
<geist>
well, what have you tried?
<junon>
I didn't assume it was QEMU's fault at first. Of course not.
<junon>
I've booted up a gdb server, stepped through the instructions.
<geist>
and then?
<junon>
It `calls`, executes one or two, all the registers seem fine. It barks about invalid TCG translation
<junon>
then QEMU exits.
<geist>
that's the very first thing to try, but gdb and reality quite frequently get out of sync. you must not rely on gdb
<junon>
Even though I'm stepping through instructions? This is literally right at the beginning of the kernel booting
<geist>
it's nice when it works, but always assume that there can be a failure between gdb and what the hardware is doing
<geist>
yes, even though you *think* you're stepping through the instructions
<geist>
gdb is operating on a fiction that it knows a thing *should* be mapped at a location
<junon>
I have no idea how to debug this further, then.
<geist>
but it doesn't read the instructions out of memory, it's not actually looking at whats going on in the computer
<geist>
it's simply stepping the hardware through an instruction and then looking back to the original binary, assuming the computer is running the code you told it it's running
<geist>
if those ever get out of sync, you get precisely this sort of thing
<geist>
where the cpu appears to do something impossible
<geist>
that's always a strong sign you have something deeper wrong
<geist>
like memory isn't loaded right, or you have corruption somewhere that trashes the text segment, etc
<junon>
This is the first function call at the top of the kernel entry point.
<junon>
I don't even load my own memory, I'm using a bootloader crate just to get me off the ground.
<geist>
that's good. means you have less state to sort through
<geist>
this is good. the problem is simpler to debug then. it's far worse when it happens 10 minutes later after 10 million context switches
<geist>
so. here's the hard core thing i'd try: -d cpu,exec,int
<geist>
it's a huge firehose, but sometimes that yields what you want. shows you precisely what TCG was thinking it was doing
<junon>
I did -d all
<geist>
-d all is waaaaaaaay too much
<geist>
but did you find anything in it?
<junon>
Yes, it executes 4 instructions then Qemu barks about TCG having a bug and to send a bug report to the mailing list, and then exits 0.
<geist>
as a side note: precisely this sort of failure/debugging/etc is the *real* fun part of osdev. programming with no safety nets
<geist>
this is when stuff gets interesting
<geist>
can you pastebin it?
<geist>
or at least the interesting parts?
<junon>
I commented out everything except the first statements. Now it's working, whereas the first statements typically fail.
<geist>
again this is almost certainly some sort of loading issue
<geist>
i've been working with qemu for a good 15 years or so and it doesn't just randomly sprout TCG failures
<geist>
though i should ask, what architecture are you running? and what version of qemu?
<geist>
what qemu is *bad* at is giving you meaningful errors when the cpu really gets fucked up
<junon>
x86_64, 6.2.0. This happens on both WSL1's version as well as the aarch64 version - reproducible on both.
<geist>
running with the PC where the mmu is off, etc. it tends to dump non errors and stop
<danlarkin>
junon: maybe your kernel isn't being loaded where you think it is
<geist>
yah it's almost assuredly something like that
<junon>
Then it's a bug in the bootloader crate which everyone in the rust osdev world seems to use.
<junon>
Then why does everything else work normally?
<geist>
but again the first thing you gotta do is get over this notion that qemu is broken. now you have to go deeper and manually trace it
<danlarkin>
bro it's not a bug in someone else's code
<geist>
this happens a *lot*. you have to assume nothing, verify everything, manually work it out
<geist>
never rely on high level tools like gdb exclusively. use them when they help you but always be prepared to get your hands dirty
<junon>
Okay
<geist>
for example, first thing i'd do is put an infinite loop on the first instruction. now start qemu and with the qemu console (not gdb) dump the state of the cpu when it gets there
<junon>
I think my point is being missed.
<geist>
inspect the contents of all the registers, make sure it's loaded in the right spot
<geist>
then use the memory inspection stuff to verify all the pages got there. sometimes maybe it only loaded the first 512 bytes and the thing you're jumping into is at byte 513. for example
<geist>
that's a common problem
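A minimal sketch of the park-and-inspect technique geist describes, in Rust; the load address in the comment is made up, and the monitor commands (`info registers`, `x`) are standard QEMU monitor commands:

```rust
/// Park the CPU at the very first kernel instruction so the machine can be
/// inspected from the QEMU monitor rather than through gdb.
#[no_mangle]
pub extern "C" fn _start() -> ! {
    loop {}
}

// With the guest spinning, switch to the monitor (`-monitor stdio` or
// Ctrl-Alt-2) and check the state directly:
//
//   info registers              - is RIP/RSP where the linker script says?
//   x /10i 0xffffffff80100000   - (made-up address) do the bytes in guest
//                                 memory match `objdump -d` of the binary?
//
// The `-d cpu,exec,int` logging suggested above firehoses the same state
// into a log instead.
```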
<junon>
This is Rust code. I'm executing A, B, C in order. In debug mode, A causes the QEMU failure and then exits. If I comment out C, A runs fine.
<geist>
or it loaded it at the wrong address, which works fine up until it suddenly explodes
<junon>
This is why I believe it to be a codegen issue.
<geist>
that's fine, but its meaningless to me. that's precisely what heisenbugs look like
<geist>
sometimes you change the length of a string and it fails
<geist>
that's generally because the layout of the binary changes
<geist>
which then causes latent loading issues to show up in 'random' locations
<geist>
or codegen changes that cause it to switch from using a PC-relative construct to a PC-dependent one, etc
<geist>
i'd always start with a disassembly of your binary
<geist>
then you can go through it and see precisely where it goes off the rails
<geist>
like i said start by verifying the code is loaded where you thought it was and in the right state, and got completely loaded
<geist>
that would rule out a large pile of heisenbugs
<geist>
you can compare it against the disassembly
<danlarkin>
junon: the most productive thing you can do is assume your tools are correct and start trying the angles geist is suggesting
<geist>
could also be things like your stack isn't right, or the segments aren't set up right. or it uses a fpu instruction and it's disabled, etc
<junon>
danlarkin: I'm fully aware.
<geist>
especially if you're jumping into rust, and doubleplus so if you're writing your start.S code yourself
<geist>
by omission you could have left something out you didn't think you needed
<geist>
so usually you narrow the focus to precisely which instruction goes off in the weeds
<geist>
and not at the gdb level, that's far too high level
<geist>
can you trace it to precisely which instruction?
<junon>
It's a `sub` instruction in gdb. But if gdb can't even tell me what RIP is then what's the point? QEMU is useless then, isn't it?
<geist>
i'm basically going through the mental checklist i'd be going through
<geist>
again, dont rely on gdb
<geist>
it's very easy for it to get out of sync with qemu
<geist>
especially on x86-64 in early boot sequences
<junon>
but isn't GDB still getting register values from QEMU?
<geist>
i *never* use gdb + qemu
<junon>
if QEMU's `gdb-server` option is so faulty then why bother..? I'm lost
<geist>
precisely because of this, some folks do, but they understand that it has limitations. it's better at debugging things once it's all set up and running and the mode switches are far behind it
<junon>
anyway, I'll trace.
<geist>
no. you just dont get gdb. i'm not saying 'dont bother'
<geist>
i'm saying 'gdb tracing early code works nicely when it does, but if it doesn't, don't get bent out of shape about it'
<geist>
it's a tool that breaks often, so dont rely on it, but use it when it works
<geist>
get very familiar with qemu's own command line
<geist>
inserting infinite loops in the code and walking it is very useful
<geist>
also there's a --singlestep mode in qemu that might be helpful
<junon>
Right I get that, it just seemed that release code worked, debug code didn't, QEMU itself was spitting out "this is a bug, TCG and the disassembler had different output for the same bytecode, this is a bug, please file an issue at whatever@mailinglist.com".
<geist>
okay. well i've said what i'm going to say
<junon>
That was the thought process
<geist>
yes i hear you but that doesn't make your problem go away
<junon>
I'm not disagreeing with what you said, just explaining why I didn't go and trace this down.
<geist>
if you wanted to file a bug they're going to ask you all the same questions: narrow it down to precisely what's going on
<geist>
sure, but that's the past
<geist>
now you have a problem in front of you, here's the steps i'd take to figure it out
<junon>
Yes I'm doing that now.
<geist>
cool!
<geist>
bummer you hit this right off the bat, usually folks get a bit farther before hitting their first hard core heisenbug. but i haven't worked too much with trying to write kernels in rust
<geist>
so it being somewhat higher level i think you get abstracted from the hardware a bit farther a bit faster
<geist>
which is a recipe for harder to debug heisenbugs when your situation isn't entirely stable yet
<geist>
*usually* the first few lines of code i write for a kernel or firmware is some assembly to get things into a proper state
<junon>
I was hesitant on using the bootloader crate anyway for this reason. I've done this a number of times in C already, just got way further way quicker with Rust and felt more confident in the language giving me clear insight as to the memory safety situation.
<geist>
and that you end up sorting out a lot of issues there, but being assembly you know precisely whats being run
<geist>
and then C sits somewhere between assembly and higher level languages like Rust
<geist>
yeah, a bootloader crate sounds a little iffy to be honest.
<geist>
there are a million ways to do bootloaders and it may be that the author of that one has different assumptions than you do
<geist>
release vs debug smells a lot like some state isn't correct, like there's some ambiguity over whether the bootloader or you is supposed to set it up
<geist>
(stack alignment, state of the cpu at the first line of rust, etc)
<geist>
it hitting a TCG bug is a bummer, but once you figure it out i bet it's not a bug but an invalid state that qemu doesn't have a solid error message for
<geist>
i've seen that a lot. you get your emulated cpu off in the weeds and it just bombs out with what looks like a qemu bug, but really it's a 'i dunno how to continue so i'll just stop here and call the generic error message routine'
<geist>
side note are you using KVM or not?
<geist>
probably not because you said TCG
<junon>
That was my first inclination, but when GDB looked like it was working fine I assumed it was telling the truth. I didn't realize GDB wasn't actually inspecting the QEMU state as closely as it lends itself to. Otherwise I wouldn't have jumped to the conclusions I did.
<junon>
And no, not KVM.
<junon>
I think I have the windows binaries installed and might try one of the windows backends for it just to see what it does.
<junon>
Right now everything is on WSL1 though. Unfortunately.
<geist>
hrm.
<junon>
At least with KVM I could rule out TCG being the culprit.
<geist>
yah downside is KVM tends to be far more opaque about errors, since most of the state is on the cpu itself
<geist>
and your ability to trace it is somewhat diminished
<junon>
Yeah. I also worry that even if it worked, it would be masking the bug. I worry release mode is doing that already.
<junon>
And that it's not actually a debug vs release config/codegen issue but something deeper.
<junon>
Like you said.
<geist>
but yeah this sort of thing is extremely common in low level code. it's the fun stuff that really teaches you how the low level stuff works, and especially the codegen of the compiler/language
<j`ey>
junon: whats the rust code doing btw?
<geist>
this is one of my concerns about using even higher level languages than asm or C for kernel stuff. clearly folks are going to do it, and i do stuff in C++, etc
<geist>
but in lots of kernel stuff it's fairly important to know precisely what the compiler is doing
<geist>
and the higher level the language, the more work on the programmer to stay on top of it
<geist>
it's a hidden cost of higher level languages that generally doesn't matter to user space code
<geist>
but matters a lot on 'bare metal'
<clever>
things like fpu state, and light weight threads?
<junon>
The rust code I've actually written is just the entry point coming in from the bootloader crate, which sets up the serial output, then initializes IRQs, and then sets up the rasterizer if a frame buffer was allocated. Nothing else yet.
<geist>
clever: sure, or in general when is it going to use jump tables, or what sort of data structures does construct A vs B use, or does it rely on init hooks that the runtime was supposed to call, etc
<geist>
there are a bazillion things there that can be sorted out but it's hidden state to most user space programmers
<geist>
C++ global constructors are a good example of a huge hidden footgun
<mrvn>
geist: but C isn't that much different there than c++.
<geist>
i have no idea if rust has something like that
<geist>
mrvn: omg it's on a different level
<junon>
In this case, I'd wager this is more a symptom of something in the bootloader crate. Just speculating. Right now I'm not doing much "high level" stuff in Rust (it's in no_std mode anyway).
<mrvn>
geist: even with C you still have no idea what the compiler is doing
<geist>
yes at the basics no, but there are all these C++ runtime details that you have to know. you and I know them, but if you're getting started
<geist>
mrvn: sure, that's why you have to stay on top of it
<geist>
and i'm saying the staying on top of it part gets harder as more and more layers are added
<geist>
more abstraction
<mrvn>
and constructors in C and c++ are the same too
<geist>
gosh no, not at all
<mrvn>
just one is a compiler extension and the other standard
<geist>
uh. okay.
<mrvn>
geist: both ways end up with a function pointer in init_array
<geist>
but one of them is an intrinsic part of the language that you use all the time, the second one is something you have to explicitly do
<mrvn>
true, __attribute__ is a bit more obvious.
<geist>
my point is it's layers. sure you and i know about it, but you can keep extrapolating it out to more and more features
<mrvn>
but globals are evil either way.
<junon>
Rust has nothing like this btw. Rust requires you make such things explicit, too. Way moreso than c++.
<mrvn>
junon: no globals?
<geist>
a *good* kernel programmer knows what the compiler is going to emit basically
<geist>
or at least have a good idea. this construct equals basically this code
<junon>
It allows globals, but no automatic instantiation of them. They have to be completely const, or you have to be explicit about access to them if they're mutable via an `unsafe{}` block.
<geist>
the details are up to the compiler, but you should be aware of the scope of the code it'll emit
<geist>
and yeah was gonna say rust actually might be better here
<geist>
i shouldn't have mentioned specifics, i was trying to make an overall meta commentary
<mrvn>
geist: That hasn't been the case with C for a decade or so now. Too many interactions in the optimizer.
<geist>
but by mentioning a specific the ratholing began
<geist>
so i give up
<geist>
but anyway, i hope you get my overall gist
<mrvn>
What we need is a less optimizing C like language.
<geist>
mrvn: sure but the compiler wont replace your hand rolled linked list with a WAVL tree
<mrvn>
or just a more predictable compiler
<geist>
vs std::foo maybe doing this vs that
<geist>
again not saying that's good or bad, but a good programmer knows this
<junon>
Yeah in C++ you don't know if `whatever_t foo;` is just sub'ing rsp or if it's jumping to some code.
<junon>
I agree it's not ideal when you want absolute control and understanding.
<geist>
and thus the higher level the language, the more stuff, IMO, you need to research and stay on top of to use it in kernel/low level work
<mrvn>
junon: that I find the least of a problem
<geist>
so you know what you're getting
<mrvn>
Bigger problem is: foo(bla); Will that copy bla? modify bla? store bla for later use?
<geist>
right
<junon>
Right.
<j`ey>
anyway, none of this helps junon :P
<junon>
Rust makes this explicit.
<geist>
but as junon is saying rust actually might be in general better about this
<geist>
in my limited amount of rust programming i was starting to see the picture, but was spending too much time looking at the codegen
<geist>
and it was... okay.
<geist>
like, kinda what i expected
<junon>
You can't accidentally copy something in Rust, for example.
<mrvn>
In Rust the caller says how something is passed and not the callee, right?
<junon>
It has to be pretty explicit whenever you do so. Or the compiler barks at you.
<junon>
Both state it, basically.
<junon>
The callsite has to agree with the prototype, in essence.
<junon>
In many cases, at least.
<geist>
junon: before this gets too ratholed, can you paste a link to the bootloader crate? curious what the docs say
<geist>
ah interesting. yeah it doesn't really specify that much in detail precisely what the handoff looks like
<geist>
i guess it's intrinsically x86-64, so it's presumably going to set up the cpu to some sort of reasonable place
<junon>
check the `entry_point` macro.
<geist>
hell this could simply be something like it only gives you 4K of stack and you immediately blew it
<junon>
Passed off a BootInfo struct to the function.
<geist>
and thus the debug build stomps over the text segment and you trash the codegen
<j`ey>
junon: what exactly is your rust code doing?
<j`ey>
just an infinite loop?
<junon>
geist.
<junon>
You might have just solved it.
<junon>
Let me look at something lol.
<geist>
stack usage is a *huge* example of Things You Gotta Know
<junon>
I assumed a much larger stack. That was dumb of me. I didn't care to look into it because I usually set that stuff up myself.
<geist>
as part of my example before
<junon>
> Defaults to at least 80KiB if not given.
FreeFull has joined #osdev
<geist>
even in zircon we've been bitten by that over and over again. someone brings in a C++ construct in the kernel that accidentally uses a crapton of stack
<geist>
also keep in mind they might have bugs. does it align the stack properly? etc
<geist>
anyway this is something you'll probably almost immediately want to set up yourself
<geist>
also i'd look at the disasm and see if you can see how much stack your functions are using
<geist>
that'd give you a pretty hard smoking gun
<junon>
geist: it was stack size.
<junon>
Increased it to 512KiB and it runs fine on debug now.
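For reference, this is roughly what that fix looks like with the rust-osdev bootloader crate; the exact config mechanism varies by version, and this sketch assumes the newer `bootloader_api` interface with its `BootloaderConfig`:

```rust
use bootloader_api::{entry_point, BootInfo, BootloaderConfig};

// Ask the bootloader for a bigger kernel stack than the default.
pub static BOOTLOADER_CONFIG: BootloaderConfig = {
    let mut config = BootloaderConfig::new_default();
    config.kernel_stack_size = 512 * 1024; // 512 KiB
    config
};

entry_point!(kernel_main, config = &BOOTLOADER_CONFIG);

fn kernel_main(_boot_info: &'static mut BootInfo) -> ! {
    loop {}
}
```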
<geist>
geezus 80KB wasn't enough?
<geist>
...
<j`ey>
lol
<junon>
Debug info in structs, perhaps. idk.
<geist>
my kernel has 4K stacks, zircon has 8K
<mrvn>
there is something seriously wrong with your code if it needs 80k stack
<junon>
Like I said, release mode works fine.
<j`ey>
junon: still wondering what the rust code does..
<geist>
though to be fair we simply cant compile -O0 on zircon for example, precisely because of stack size
<geist>
so it may be that your debug mode rust is not realistic for kernel work
<geist>
or at leas tyou need to define a 'debug + lightweight optimization' build mode and use that
<junon>
j`ey: sets up serial, sets up IRQs, sets up rasterizer and framebuffer logger (which has a large-ish character buffer). That's it.
<mrvn>
geist: that's because -O0 means you are compiling for a stupid stack machine.
<geist>
indeed.
<j`ey>
junon: thats quite a bit of code I guess.. but still, more stack than I expected
<mrvn>
even with clang they first compile everything down to a 60s-era stack machine CPU and then optimize
<junon>
i'll grab the stack size before/after in both debug and release to see.
<junon>
curious now.
<geist>
this is something you'll have to stay on top of very very aggressively
<geist>
since you do not want huge stacks in the kernel
[com]buster has joined #osdev
<geist>
why it results in a TCG bug is still interesting, but it almost certainly is a result of the stack overflowing into the text segment and trashing the instructions
<mrvn>
geist: is circon using -O2 or -Os?
<geist>
and thus TCG is now told to run some complete garbage
<mrvn>
zircon
<geist>
mrvn: -O2 for the kernel
<geist>
user space has more options based on build type, but we explicitly build the kernel the same way independent of the overall system image type
<geist>
the only fundamental build switch we have on the kernel currently is a 'with full debug asserts + test case code' vs 'none of that'
<geist>
basically in LK it's a direct mapping to LK_DEBUGLEVEL which turns on DEBUG_ASSERTS and test code
<geist>
we just renamed all of that ZX_DEBUG_ASSERT, ZX_DEBUGLEVEL, etc
<GeDaMo>
Sinclair reboot? :P
<geist>
as a side note on modern machines it's somewhat as i thought: asserts are not that expensive. most of them are `if (unlikely(condition)) branch_to_end_of_function;`
<geist>
so in general even with full debug asserts on in the kernel we see maybe a 5% benchmark regression
<geist>
which makes it quite useful to use even in day to day stuff
<geist>
even though the size of the kernel grows by a *lot*
<geist>
just from the size of all those asserts
<geist>
generally confirms my usual policy of assert early and often, it'll pay off in dividends as the project gets larger. make the code extremely fragile so that it blows up quickly and definitively
<geist>
i say that because i've worked at places where the exact opposite was the policy. based on some notion that asserts are expensive and the system will blow up eventually anyway
<geist>
and i thought it was a bad idea
<j`ey>
asserts <3
<geist>
Though to be fair i think the asserts being mostly free is a function of modern machines. Once a piece of code has run once the branch predictors are hot and they become free
<mrvn>
geist: with speculative execution the code jumps just over the assert and if there is any register or memory dependency around the assert then it's free as you're waiting anyway.
<geist>
Yep, even if it runs all of the code leading up to the branch, it still ends up predicting the lack of branch and continues on
<mrvn>
just don't have a complex, expensive condition in the assert
<mrvn>
assert(foo != nullptr) is free, assert(compute(bla) == 10); is not.
<geist>
Yup. And absolutely make sure you mark your condition in the ASSERT macro as unlikely() (or the compiler equivalent of)
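Rust has no stable `unlikely()`, but the usual equivalent in a Rust kernel is moving the failure path into a `#[cold]` out-of-line function; a sketch, with a hypothetical `kassert!` name:

```rust
/// Out-of-line failure path. `#[cold]` tells the compiler this branch is
/// not expected to be taken, which keeps the hot path to one compare and
/// one predicted-not-taken jump, as discussed above.
#[cold]
#[inline(never)]
fn assert_failed(expr: &'static str) -> ! {
    panic!("assertion failed: {}", expr);
}

macro_rules! kassert {
    ($cond:expr) => {
        if !$cond {
            assert_failed(stringify!($cond));
        }
    };
}

fn example(ptr: *const u8) {
    kassert!(!ptr.is_null()); // cheap: a null test plus a never-taken branch
}
```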
<mrvn>
You should fix the code to leave a guard page below the stack that's not mapped.
<junon>
This is printing out $rsp at the entry point, after the bootloader has jumped to the kernel, and then directly after I allocate the frame buffer logger.
<j`ey>
maybe the framebuffer is on the stack
<junon>
It is
<mrvn>
too small for the framebuffer
<junon>
It's just not that big.
<junon>
As is, in release mode, the kernel takes 432 bytes to get to that point. In debug mode - the same code - takes 99904 bytes.
<geist>
might be some thing that makes a whole copy of the framebuffer array and then copies to the real one, etc
<junon>
Of stack.
<mrvn>
does it store a full call graph in the stack?
<junon>
You mean for debugging purposes?
<mrvn>
yes
<mrvn>
so you can get a backtrace of all functions with arguments.
<junon>
I suspect you might be right, especially given how "developer friendly" rust tries to be.
<junon>
It's just working against me in this case. lol
<junon>
I'll poke around and see what I can find. I can't be the first one that has run into this issue. There's no fancy Rust code happening here.
<mrvn>
although between kernel entry and main there shouldn't be many layers.
<junon>
In the way I was measuring it, none - I just sampled rsp right at the beginning of my rust code, did the serial+irq+frame buffer init stuff and logged out a second rsp sample in the framebuffer logger constructor, hence the numbers.
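A sketch of that measurement, assuming Rust inline asm; `#[inline(always)]` keeps the helper itself from adding a frame to what's being measured:

```rust
use core::arch::asm;

#[inline(always)]
fn current_rsp() -> u64 {
    let rsp: u64;
    unsafe { asm!("mov {}, rsp", out(reg) rsp, options(nomem, nostack)) };
    rsp
}

// let start = current_rsp();          // at the entry point
// ... serial + IRQ + framebuffer init ...
// let used = start - current_rsp();   // stacks grow down
```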
<junon>
yeah poking around a bit, this is a common issue in rust. TIL.
<junon>
debug mode stack sizes are ridiculously huge.
<geist>
again i encourage you to look at the disassembly
<geist>
get used to eyeballing how much stack a function uses by looking at the function prolog
<junon>
yeah I will. I need to set up that tooling anyway.
<junon>
Good exercise to do so.
<geist>
yah you can objdump -d it just like a C program. it looks pretty close
<geist>
i always set up my build system to generate a .lst file so i can just open it with an editor and tell it to autoreload it when it changes
<ddevault>
does anyone know what the TLS self-pointer is for
<geist>
so i get an up to date disassembly in a window
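Concretely, the number to eyeball is the immediate in the prolog's `sub rsp, ...`; an illustrative (entirely made-up) `objdump -d` excerpt:

```
ffffffff80101230 <kernel_main>:
  ffffffff80101230:  push   rbp
  ffffffff80101231:  mov    rbp,rsp
  ffffffff80101234:  sub    rsp,0x18600    # ~100 KiB of locals: the smoking gun
```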
<geist>
ddevault: if you mean having slot 0 in something fs or gs points to?
<ddevault>
I mean having fsbase point to itself
<ddevault>
and/or %fs
<geist>
yah that lets you compute the linear address
<geist>
ie, `mov fs:0, rax`
<geist>
now you know what the actual address is, instead of the 'pointed to by fs' address
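A sketch of why that works, assuming a hypothetical thread control block laid out per the SysV x86-64 TLS convention (fsbase points at the TCB and slot 0 holds the TCB's own linear address):

```rust
use core::arch::asm;

/// Hypothetical TCB: the only requirement under discussion is that
/// offset 0 holds the block's own address.
#[repr(C)]
struct Tcb {
    self_ptr: *const Tcb, // fs:0 -> this very struct
    // per-thread data follows
}

/// fs-relative loads never reveal what fsbase actually is, and older CPUs
/// have no cheap userspace "read fsbase" instruction, so the self-pointer
/// is how code recovers the linear address of its own thread block.
fn thread_ptr() -> *const Tcb {
    let tp: *const Tcb;
    unsafe { asm!("mov {}, fs:[0]", out(reg) tp, options(nostack, readonly)) };
    tp
}
```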
<ddevault>
ah, I see
<ddevault>
I'm wondering about where to store this for multiple threads
<geist>
just think of fsbase and gsbase as being additional state you swap for each thread
<ddevault>
is it better to have one fsbase for each task, or to have one fsbase per process and fs set to a bunch of TLS/thread structures concatenated into the same area of memory
<geist>
threads
<geist>
well depends on what you mean by task. but if task == thread, then it's per thread
<ddevault>
I suppose that fsbase is somewhat expensive to change
<ddevault>
task == thread, yes
<geist>
no, not really
<ddevault>
task == thread == process, actually, in my design
<geist>
it's just additional state you save on a context switch
<j`ey>
junon: weird, I did some rust debug stuff but never saw a stack that big
<ddevault>
linux goes to some trouble to avoid updating fsbase
<ddevault>
if it's not necessary
<geist>
yes, it's 'expensive' but it's in the cycle counting sense
<geist>
but dont worry about that, that's far into microoptimization territory
<ddevault>
ack
<ddevault>
so I suppose the simplest approach would be to have a different fsbase for each task
<geist>
yep.
<ddevault>
and leave %fs always set to zero
<geist>
is this for x86-64?
<ddevault>
yeah
<geist>
yes in that case you set fs gs es ds to 0
<mrvn>
doesn't fsbase change fs?
<geist>
and though user space can fiddle with it, you just set it back to 0
<geist>
mrvn: no
<mrvn>
well, the internal representation as opposed to the index of the descriptor
<geist>
have to be extremely precise here mrvn
<ddevault>
well, userspace can fiddle with it if they want, I'm not going to enforce that it's zero
<ddevault>
just a convention in userspace
<geist>
the register itself is almost completely decoupled from the internal representation on x86-64
<geist>
the register itself doesn't change automatically
<geist>
ddevault: right, not signing up to save it is fine
<mrvn>
and loading fs only sets 32bit of the base, right?
<geist>
that's what most oses do
<geist>
mrvn: no. it does nothing *asterisk
<ddevault>
I mean I will save it
<geist>
(except where it does, because of details)
<ddevault>
not really difficult to
<ddevault>
push %fs
bleb has quit [Ping timeout: 272 seconds]
<ddevault>
I just don't care if userspace wants it to be zero or 1337
<geist>
ddevault: no dont. that's actually not a good idea for the asterisk reasons
<ddevault>
hm?
<geist>
you want to explicitly whack it back to 0
<geist>
and leave it there
<ddevault>
care to elaborate?
<mrvn>
ddevault: you can't pop fs to get your fsbase back
<ddevault>
I had planned on saving fs & fsbase separately
<ddevault>
not quite as simple as push fsbase, of course
<geist>
ugh. i forget precisely the details but there are certain microarchitectures at various points in time in the AMD and intel lineup where loading fs or gs in x86-64 actually *does* modify the hidden fsgsbase
<geist>
in particular situations
<ddevault>
I do at least need to save fsbase since other tasks will have their TLS area elsewhere
<geist>
so you, the kernel, want to just put it in a known state so you, the kernel, aren't tricked into fucking it up
<ddevault>
eh, wut
<geist>
there are even quirks within linux about this
<mrvn>
geist: but the descriptor doesn't have 64bit in it
<geist>
mrvn: yes. i knpow
<ddevault>
alright, I guess I'll just pin fs to zero and only save/restore fsbase
<geist>
correct that's the right thing to do
<ddevault>
do I need to worry about any other weirdness when I start looking at gs?
<geist>
i used to know this detail, but it was an annoying pile of before and after state and precisely which cpu it was
<geist>
the simplest strategy is to not sign up to saving that state so you can basically assert it's 0
<geist>
at least once the kernel has control of it
<geist>
it was something to do with setting fs or gs to 0 when it previously was *not* zero causing the fsgsbase to get updated
<geist>
but only on some architectures
<mrvn>
hmm, if I set fsbase to 8GB and then far jump to CS32 does that mean access through %fs has access to memory above 8GB?
<geist>
mrvn: i think so, yes.
<geist>
welcome to x86 bullshit
<mrvn>
Cool, I could give a 32bit process 12GB memory, 4GB normal, 4GB fs, 4GB gs.
<geist>
it never ends. every single one of these things has some sharp edge hazard thing
<geist>
mrvn: but i might be wrong about that
<ddevault>
thanks for the tips, geist
<geist>
ddevault: anyway, i did this research when deciding what to do about it in zircon, and it turns out linux doesn't save all these segment registers for 64bit oses so it's a safe bet
<mrvn>
geist: loading fs in 32bit mode will overwrite the lower 32bit of the fsbase for legacy reasons. not sure if it will touch the upper bits.
<geist>
if linux doesn't do it then you end up with it not being in the ABI so you're safe
<geist>
mrvn: yah. dunno.
<mrvn>
another edge case
<ddevault>
navigating the linux entry code gives me a headache
<geist>
there's another silly edge case where in some very specific situation one of intel or AMD zeros out the SS automatically
<geist>
and thus user space can't realistically stuff anything in SS and expect it to be saved, even if it's not used for anything
<geist>
so you quickly get to the point where only ds and es truly have no side effects, but no real point saving those either
bleb has joined #osdev
<geist>
but iirc the issue is even in 64bit mode the cpu *still* consults the GDT and loads the internal hidden state, it just doesn't do anything with it (or generate exceptions)
* mrvn
writes %fs:%gs:rax into some opcode and watches what the cpu will do.
<geist>
so that's where you get into some amount of trouble if you save/restore user space *s registers
<geist>
user space can do it, but if you, the kernel, are going to save it for user space then you (the kernel) can be tricked into doing things in ring 0 that maybe have side effects
<geist>
so by just slamming it back to 0, which is a known state, and not restoring it, you save yourself a class of headaches
<mrvn>
isn't SS still used for the SP?
<geist>
that's the TL;DR iirc
<geist>
mrvn: no
<geist>
again you have to be precise. the *register* has no use, but whether or not the hidden part of it does is what matters
<geist>
and in the case of x86-64 64bit mode, no
<mrvn>
the one segment register that would be useful to have
<geist>
the only register that really has any use in 64bit mode is cs, and that's only because it selects the bitness of the cpu
<mrvn>
cpl too
<geist>
and i totally forget what happens if user space 64bit code tries to long jump to a 32bit cs descriptor
<geist>
probably a trap to ring0, but i dont know precisely why
<geist>
probably an obscure rule that keeps this from happening
<geist>
anyway, this is not what i want to think about today :)
<mrvn>
not sure why that should be forbidden
<geist>
dredging up bad memories
<ddevault>
I don't think I'm going to port my kernel to i686 et al
<ddevault>
protected mode is not for me
<mrvn>
ddevault: just do 64bit.
<ddevault>
aye
<mrvn>
ddevault: x32 userspace might be nice though
<ddevault>
meh
<ddevault>
not like I'm aiming for binary compatibility with anything
<geist>
yah it's a lot simpler if you simply dont support 32bit ring 3
<geist>
also remember x32 is still 64bit code, just running with smaller pointers
<geist>
so from the kernels' point of view it's the same mode
<mrvn>
with a few limits, like allocating memory below 4GB. Makes memory usage smaller, cache more efficient.
<ddevault>
unnecessary complication
<mjg>
x32 never took off though
<mjg>
and unfortunately there are real programs which don't fit the 4g space
<mjg>
see your web browser
<mrvn>
mjg: and that's about it on a desktop
<mjg>
but then you already need real 64-bit userspace to support it
<ddevault>
desktops are among the least interesting and most important use-cases
<mjg>
as for servers, see your databases et al
SGautam has joined #osdev
<mjg>
i would be all for x32, but today's progs are just too memory hungry
<ddevault>
well, not any of the good programs
* mjg
pets rexima
<geist>
yeah it's one of these cases where the vast majority of people dont really care if some thing runs 10% faster with 15% less ram
<geist>
does it matter? will it make you choose A vs B?
<ddevault>
and if you're starting a new OS you might as well only invite the good programs
<mrvn>
mjg: x32 is just a flag in the process structure so that malloc won't use more than 4GB memory. So not really much extra work.
<geist>
does the general person care or even know about this?
<mjg>
general person will be sad about weird bugs they run into when x32 talks to a full 64 bit
<geist>
mrvn: it's a huge amount of work because now all your shared libs in the system have to have two copies, and now you might overall use more memory if you have both sets
<mjg>
ye, as noted earlier, once you already need full 64 bit userspace, you are kind of hosed
<mrvn>
geist: 3 sets. you need 32bit too for reasons.
<mjg>
unless x32 is an enormous win, which it is not
<geist>
right
<jimbzy>
Howdy
<mjg>
did go even try x32?
<ddevault>
gccgo probably works on x32
<geist>
jimbzy: howdy!
<mrvn>
mjg: it's a big win when you only have 4-8GB ram.
<mjg>
quite frankly i can't tell whether that's a non-starter for G or contrary, would be great, but $reasons
<jimbzy>
How have you all been doing? I haven't been around much lately and my buffer overruneth.
bleb has quit [Ping timeout: 252 seconds]
<mrvn>
Why should I even let my browser use more than 4GB ram? Conserve memory. Seriously.
<mjg>
if you don't want your browser to use 4GB of ram use lynx
<mjg>
or don't use web :-P
<geist>
yeah i mean sure your anger is legit, but what are you gonna do?
<mjg>
i'm pretty sure this ship has sailed
<geist>
indeed
<mrvn>
Every tab is sandboxed, or should be. So it would be 4GB per tab.
<geist>
jimbzy: not too bad. it's summer. i'm not a summer person. it's hot(ish) here
<geist>
but i have AC so it's all good
<mjg>
bourgeois setup
<mjg>
no AC but i have a fan
<jimbzy>
Yeah, it has been pretty funky here, too. It's not bad in the Bat Cave, but upstairs is usually pretty warm.
<mjg>
hm
<mjg>
how about 40 bit pointers though? :)
<mrvn>
mjg: 47 bit.
<mjg>
i'm aware of the steep decline in memory savings due to alignment
<jimbzy>
Pretty nice out today. I was out in the yard working a bit and measuring my property trying to figure out where my dipole is going to go.
<mjg>
my point is that ultimately you can still do markedly less than 64
<ddevault>
have fan, do not have AC
<mjg>
without choking on your address space
<ddevault>
but the problem today is humidity
<mrvn>
mjg: you mean because your uint16_t has 16 byte alignment?
<mjg>
mrvn: i mean 2 pointers stacked together would come with a 3 byte gap
<ddevault>
mrvn: the ideal solution is not to sandbox tabs but rather not to support javascript
<mjg>
bbl
<jimbzy>
My old foreman used to always remind us about that, ddevault.
gxt__ has quit [Remote host closed the connection]
<jimbzy>
We'd be standing in 250-300F asphalt and he'd say, "It's not the heat, babe, it's the humidity..."
gxt__ has joined #osdev
<ddevault>
I grew up in a dry place and never had problems with heat
<jimbzy>
Nah, Sonny, I'm pretty sure it's just the heat...
<ddevault>
then I moved to humid places and ho boy
<jimbzy>
Yeah, it's oppressive.
<mjg>
still, would be curious what kind of perf you would get with 48 bit pointers + everything aligned to 2 bytes
<mjg>
as in allowing ints aligned to 2 bytes
<mjg>
on amd64
<mrvn>
mjg: why 2?
<mjg>
so that there is no gap between back-to-back pointers
<mrvn>
why not 1 if you throw out aligned load/store anyway?
<mjg>
aligned to 2 makes the misaligned less likely across the codebase
<mrvn>
Do you store a pointer as 32bit + 16bit? or read-modify-write 64bit?
<mjg>
amd64 tolerates this fine, but perhaps that breaks down when they are flooded with it
<mjg>
i would imagine you would do a 32 bit store + another 32 bit store 2 bytes from the beginning
<mjg>
or that 32 + 16, whatever
<mjg>
to avoid touching the other bytes
<mrvn>
urgs, no. I think that would create a memory dependency
<mjg>
and then there is the problem with applicability of sse
<mrvn>
depending on the cpu you might also have a swap opcode for the high/low 32bit that's cheaper than shifting.
<mjg>
amd64 has an addressing mode which takes care of it
<mrvn>
are you doing a lot of pointer arithmetic with SSE?
<mjg>
i figure i would still be doing a lot of copying
<mjg>
and sse requires 16 (or was it 32?) byte alignment
<mrvn>
only the older models
<mjg>
which may be getting scarce, making it impossible to inline
<mrvn>
unaligned load exists, it's just a bit slower
jafarlihi has joined #osdev
<mjg>
for sse? nice
<mjg>
hrm, this 48 bit stuff definitely breaks down when you try to do atomic_store of a pointer
<mjg>
unless you cmpxchg on the 64 bit word, which kind of demolishes performance of the op
<mjg>
as you even have to branch on it :[
<mrvn>
mjg: totally trashing it, yes.
<mjg>
x32 it is then
<mrvn>
mjg: I want a struct { void *ptr : 48; uint16_t avl; }
poyking16 has quit [Quit: WeeChat 3.6]
<mjg>
you want atomic store without fucking up avl
<mjg>
i get it, that's what i said
<mrvn>
mjg: nah, I would be fine storing the full 64bit atomically. Make the struct atomic, not the ptr.
<mrvn>
Alternatively let me use the lower bits of a pointer that's aligned to e.g. 4 byte.
<mrvn>
ptr : 48-2; is really all you need most of the time.
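A sketch of that last idea in Rust, making the whole 64-bit word the atomic unit rather than the pointer (assumes pointers whose top 16 bits are zero, i.e. ordinary canonical user-space addresses):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// 48-bit pointer + 16-bit `avl` field packed into one word, so one plain
/// 64-bit store updates both together - no cmpxchg loop, no branch.
struct TaggedPtr(AtomicU64);

impl TaggedPtr {
    fn store(&self, ptr: *mut u8, avl: u16) {
        let packed = (ptr as u64) | ((avl as u64) << 48);
        self.0.store(packed, Ordering::Release);
    }

    fn load(&self) -> (*mut u8, u16) {
        let p = self.0.load(Ordering::Acquire);
        ((p & 0x0000_ffff_ffff_ffff) as *mut u8, (p >> 48) as u16)
    }
}
```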
<mjg>
have to go afk for 2h
bleb has joined #osdev
<jafarlihi>
geist: Do you know if there's a way of getting "scope" of IPv6 address in Fuchsia?
<geist>
i dont know, but i encourage you to ask these questions on the fuchsia discord
<raggi>
That concept doesn't exist, you don't get the scope of an address, an address has a scope or it is not fully specified
<jafarlihi>
For IPv6 unicast addresses, how do you tell apart link local and global scope?
<mxshift>
fe80::/10 is link local
<jafarlihi>
So you have to check if it's in that subnet instead of checking just bits like for multicast addresses?
<mxshift>
Correct, there are no dedicated bits in a unicast ipv6 address to indicate scope. You check the prefix just like you check for ff00::/8 to check for multicast
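Both checks as a Rust sketch using the standard `Ipv6Addr` type: the link-local test is a /10 prefix comparison, while multicast really does carry a dedicated scope nibble:

```rust
use std::net::Ipv6Addr;

/// fe80::/10 - link-local unicast: compare the top 10 bits.
fn is_link_local(addr: &Ipv6Addr) -> bool {
    (addr.segments()[0] & 0xffc0) == 0xfe80
}

/// ff00::/8 - multicast; the low nibble of the second byte is a scope
/// field (2 = link-local, 0xe = global), which unicast addresses lack.
fn multicast_scope(addr: &Ipv6Addr) -> Option<u8> {
    let seg0 = addr.segments()[0];
    if seg0 >> 8 == 0xff { Some((seg0 & 0x000f) as u8) } else { None }
}
```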
<mrvn>
.oO(how is the prefix not bits?)
Burgundy has joined #osdev
<mxshift>
idk. I mean, I get what they are saying about multicast ipv6 defining a bitfield that encodes scope that isn't part of the multicast prefix
foudfou has quit [Remote host closed the connection]
opal has quit [Remote host closed the connection]
foudfou has joined #osdev
opal has joined #osdev
foudfou has quit [Remote host closed the connection]
opal has quit [Read error: Connection reset by peer]
gildasio has quit [Remote host closed the connection]
gxt__ has quit [Write error: Connection reset by peer]
opal has joined #osdev
gildasio has joined #osdev
gxt__ has joined #osdev
matrice64 has joined #osdev
foudfou has joined #osdev
<junon>
Is there a performance benefit of colocating multiple frames together? Or is locality just a matter of individual frames?
<zid>
frames of what?
<junon>
I mean, when mapping physical frames into virtual address space, is there a benefit of mapping consecutive physical frames to consecutive virtual memory?
<zid>
ah, sounds like pages
<mrvn>
junon: search for cache coloring
<junon>
pages, yes
<junon>
Thanks :) exactly what I'm after
<mrvn>
basically if you pick your pages badly they will have cache collisions.
<zid>
barring tag bit issues that might come from picking bad strides, the dram controller could potentially perfer it
<zid>
prefer*
<junon>
Gotcha, makes sense
<zid>
so I imagine "In practice no, in theory yes"
<mrvn>
zid: It's a real issue on some ARMs that do have coloring problems.
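A sketch of what "picking pages well" can mean, assuming a physically indexed cache and a hypothetical per-color free list (the cache geometry below is made up):

```rust
const PAGE_SHIFT: u64 = 12; // 4 KiB pages

// colors = bytes indexed per way / page size; e.g. a 1 MiB 4-way cache
// has 256 KiB per way -> 64 colors. Made-up numbers for illustration.
const NUM_COLORS: usize = (256 * 1024) >> PAGE_SHIFT;

fn color_of(addr: u64) -> usize {
    ((addr >> PAGE_SHIFT) as usize) % NUM_COLORS
}

/// Hypothetical allocator: bin free frames by color and satisfy each
/// mapping with a frame whose color matches the virtual page, so
/// consecutive virtual pages land in disjoint cache sets.
struct ColoredAllocator {
    free: [Vec<u64>; NUM_COLORS], // physical frame addresses per color
}

impl ColoredAllocator {
    fn alloc_for(&mut self, vaddr: u64) -> Option<u64> {
        self.free[color_of(vaddr)].pop()
    }
}
```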
<mrvn>
Does c++ have an equivalent to std::copy that uses move semantics?
<jafarlihi>
geist: Hey, do you know how to get first 2 octets from `Ipv6AddressEntry` or how to check if address is in a given subnet?
<sortie>
ddevault, ah just back from my vacation, if you still have any TLS questions, ping me tomorrow :)
<ddevault>
sortie: wb :)
jafarlihi has quit [Ping timeout: 252 seconds]
GeDaMo has quit [Quit: There is as yet insufficient data for a meaningful answer.]
<Griwes>
it's also the counterexample to give whenever someone says "std::move doesn't move" ;D
<mrvn>
ahh, way down in "see also"
<mrvn>
Griwes: ever used the ExecutionPolicy thing?
<Griwes>
personally no, but my employer has a product that uses it to accelerate things with gpus :P
<mrvn>
Do you know if that creates threads on its own or do you have to make some thread pool for it or something?
matrice64 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<Griwes>
All the execution resources for that are created by the stdlib underneath
<Griwes>
We (as in, the committee) have things in the works to let you customize how that happens at use site, but it's going to be C++26 material
<mrvn>
Let's hope it's something that allows composition.
SGautam has quit [Quit: Connection closed for inactivity]
<Griwes>
Yes, both ways, i.e. the plan is to both let you provide a custom resource to the algorithms, and to be able to reuse the resource the algorithms use for yourself (at least eventually)
<mrvn>
Griwes: if I do 2 executions in sequence I would want to only create threads once and then do both operations together.
<Griwes>
there's generally going to be a thread pool going that's initiated once, on a good quality implementation
<mrvn>
that's still overhead.
<Griwes>
there's always some form of overhead when parallelizing work
<Griwes>
a good quality implementation will also have a threshold for whether to use the parallel resource or not based on the size of the problem, and will fall back to serial for small sizes that it thinks will be faster that way
<zid>
My main overhead is having to buy extra computers
<zid>
coding that is annoying
<mrvn>
Griwes: essential for something like sort()
<mrvn>
zid: coding buying computers? I'm sure you can do that with a little shell script and curl.
socksonme_ has quit [Ping timeout: 268 seconds]
matrice64 has joined #osdev
\Test_User has joined #osdev
bleb has quit [Ping timeout: 252 seconds]
<mjg>
uh... i ran into a claim that there are arm big.little systems where *some* cores support LSE and others don't
<bslsk05>
web.archive.org: A big LITTLE Problem: A Tale of big.LITTLE Gone Wrong | by Nia Weiss | Medium
<mjg>
welp, the text disappears a few seconds after rendering
<j`ey>
yeah.. idk, maybe turn off js or something
<mrvn>
don't you wish you could just press ESC to stop the scripts in the page?
bleb has joined #osdev
matrice64 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<moon-child>
click the stop button?
<moon-child>
anyway I don't think it's inherently problematic to do big.little where the little cores have different capabilities than the big ones. You just need the requisite support from the software side. I think it came up in this channel a couple of times
<mjg>
well it is a problem when it changes what atomic ops are available
<moon-child>
I would much rather have avx512 support on alderlake big cores, eg
<mrvn>
only if there isn't a common subset for code that doesn't care where it runs.
<moon-child>
don't even need a common subset, just some notion of safepointing
<mrvn>
that really needs software support then
<mrvn>
well, atomic ops you can probably just decode the PC and single step the code past the atomic and then replace it.
<moon-child>
yes, like I said
<mrvn>
or even just catch SIGILL and replace the op every time it faults.
<moon-child>
hmm that's an interesting alternative
<mrvn>
For single opcode stuff that should work well.
<mrvn>
But do you really have cpus where the BIG doesn't support everything the LITTLE can do?
<moon-child>
well, say, cas-loop vs ll/sc-loop is going to look a bit different
<mrvn>
moon-child: for AVX you would want a virtual vector size and have the opcodes and loops be size indifferent and then safepoints where the code is outside any loop and can switch.
<moon-child>
so you might want something a bit more sophisticated in practice
<mrvn>
but the CAS would still work on BIG and you wouldn't get a fault.
<moon-child>
mrvn: interesting part of avx512 is not the bigger vectors, but the new instructions
<moon-child>
more likely on arm, I think, is little cores support only ll/sc, not cas
<mrvn>
You can switch from BIG to LITTLE and then you get faults and can fix the code. But going from LITTLE to BIG I fear nothing would fault.
<moon-child>
so you would provide alt versions of the whole instruction sequence. Maybe play virtual memory tricks where you swap out the page, instead of catching the fault. And make sure not to switch from one core type to another while in the middle of a 'critical region'
<mrvn>
But for AVX512 for example you could have function calls on little and when you switch to BIG you change the function to replace the CALL with the opcode.
<moon-child>
yeah
<mrvn>
but that's all rather hackish and limited. Replacing whole functions with maybe completely different algorithms would be better.
<moon-child>
oh wait I skipped the rest of the sentence and I thought that's what you said
<moon-child>
lol
<mrvn>
That would also work without checkpoints if you do it RCU style. During compilation, record all the calls to the functions and when migrating swap the call addresses. Add some emulation of the BIG opcodes in case code is inside a BIG function when migrating and voila.
<mrvn>
I just found 5 RPi Picos while cleaning.
<clever>
nice
Burgundy has quit [Ping timeout: 268 seconds]
<mjg>
do any armistas know how much arm cpus are shafted with the vulns?
<mjg>
i hear they had some spectre-related problems