klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
<clever> kingoffrance: ive also seen bugs, where the length of the shebang must fit within a certain number of bytes
<heat> that's not a bug
<heat> linux does that for instance
<clever> has it changed recently?
<klange> it's a security feature™
<heat> no
<heat> always has been like that
<clever> i remember something about it changing
<heat> the limit has changed over time, but that was like 10 years ago
<klange> https://www.in-ulm.de/~mascheck/various/shebang/ has some collected details on different platforms
<bslsk05> ​www.in-ulm.de: The #! magic, details about the shebang/hash-bang mechanism
<bslsk05> ​github.com: improve perl shebang lines by switching to `use lib ...;` by cleverca22 · Pull Request #55786 · NixOS/nixpkgs · GitHub
<clever> apparently, the shebang in these perl scripts was reaching 30kb
<clever> and it worked for a while, and then broke
<clever> i cant remember what changed to make it stop working
<heat> apparently it's 256 now, used to be 128
<heat> mine is 100
<heat> https://github.com/heatd/Onyx/blob/master/kernel/kernel/binfmt/shebang.cpp <-- horrible, horrible code that I wrote while looking at linux's shebang code
<bslsk05> ​github.com: Onyx/shebang.cpp at master · heatd/Onyx · GitHub
<heat> it's pretty similar but a bit more readable IMO
<clever> heat: from what i can remember, i think the error might have been that linux started enforcing that it found a \n within the first N bytes?
<clever> when previously, it would truncate, and execute whatever was left
<heat> maybe
<heat> 30kb shebangs were never a thing
<clever> for nixpkgs, the perl include path was in the shebang
<clever> which is why it turned out to be 30kb
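A minimal userspace sketch (not Linux's or Onyx's actual code) of the behavior being discussed: the kernel only examines a fixed-size window at the start of the file, so a shebang whose newline falls outside that window gets rejected, which is exactly what would kill a 30kb interpreter line:

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>

// Linux's BINPRM_BUF_SIZE is 256 today (it used to be 128).
constexpr size_t BUF_SIZE = 256;

// Returns the interpreter path, or nullopt if the line is not a valid
// shebang within the first BUF_SIZE bytes.
std::optional<std::string> parse_shebang(const std::string& file) {
    if (file.size() < 2 || file[0] != '#' || file[1] != '!')
        return std::nullopt;                 // not a script at all
    size_t window = std::min(file.size(), BUF_SIZE);
    size_t nl = file.find('\n');
    if (nl == std::string::npos || nl >= window)
        return std::nullopt;                 // no newline inside the window
    // skip blanks after "#!", take everything up to the first space/newline
    size_t start = file.find_first_not_of(" \t", 2);
    if (start == std::string::npos || start >= nl)
        return std::nullopt;                 // "#!" with nothing after it
    size_t end = file.find_first_of(" \t\n", start);
    return file.substr(start, end - start);
}
```

With this policy, `parse_shebang("#!/bin/sh\n...")` yields `/bin/sh`, while a 30kb perl include path never reaches its newline inside the window and is rejected rather than truncated.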
fkrauthan has quit [Quit: ZNC - https://znc.in]
fkrauthan has joined #osdev
fkrauthan has quit [Client Quit]
fkrauthan has joined #osdev
gildasio has quit [Ping timeout: 240 seconds]
gildasio has joined #osdev
<heat> clever, did perl just inject arguments when reading the shebang then?
orthoplex64 has joined #osdev
Ali_A has quit [Quit: Connection closed]
<clever> heat: something like that
<Clockface> does anyone here have a way to check if DOS is loaded and functioning
<Clockface> the program initially does not know if it was booted by the BIOS or loaded as a .COM file by DOS
<clever> Clockface: i think bios only loads the first sector, to 7c00 i think it was, while dos loads the entire .com file
<clever> so you could check the addr and if a magic# is present in the 2nd sector of the binary
<heat> try a dos interrupt and see if it works
<klange> clever: true for an MBR, not true for the channel's favorite meme: EL TORITO!
sikkiladho has quit [Quit: Connection closed for inactivity]
<geist> EL TORITO!
<heat> el tor
<heat> ito
<clever> klange: ah, i wasnt thinking about a bios with cdrom support
<heat> the best way to know you weren't booted by the BIOS is to get booted by UEFI
<heat> thank me later
wxwisiasdf has joined #osdev
<wxwisiasdf> ah well just threw autotools outta the window and now i am using make :D
<geist> huh the #! lore page was interesting
<heat> wxwisiasdf, tip: don't actually switch until you're sure that's what you want
<heat> rewriting build files is horrible
<heat> since I don't know what build system i'm actually going for, i'm keeping the current one for now
<heat> maybe i'll rewrite it bit by bit
<Clockface> i like the magic number
<Clockface> ill do that
<heat> but it doesn't work
<wxwisiasdf> no i already did it
<heat> that fast?
<wxwisiasdf> i just did it in one shot and it broke everything and my os now has like 2 ubsan bugs and stuff
<wxwisiasdf> but hey i am not using make
<wxwisiasdf> now*
<wxwisiasdf> fortunately i threw like a thousand assertions everywhere so hopefully i can get it back to working
<geist> well, thats still a bug for you to fix
<wxwisiasdf> yes :)
<klange> reason #532 to not use autoconf, it's too automagical and you have no idea what it's actually doing, so you have no control over your compiler
<geist> probably something to do with the order things got linked maybe (assuming the compile switches are the same)
<wxwisiasdf> worst part is that i can't use gdb because s390 is kinda sus when it comes to proper emulators
<wxwisiasdf> geist: it can be anything really - i've been told the z/arch compiler is pretty buggy
<geist> klange: oh you might be interested in this, someone submitted a patch to LK that does something i thought about but never actually tried. hypothetically it would maybe work on x86 too
<wxwisiasdf> i've already crashed the xtensa ld once :^)
<geist> basically when enabling the mmu and branching to the high kernel address
<geist> instead of having to have a unity mapped low mmu thing
<geist> set the VBAR (in the case of arm64) or the IDT to the high address, enable the mmu and then wait for the cpu to fault
<geist> then catch it
<klange> that sounds evil
<geist> i thought about it on arm64 before, but figured it'd be too risky to try
<geist> but they tried it and say it works fine on real hardware and emulator
<geist> x86 maybe would work too, though would have to think about it
<geist> it's probably technically UB though
<clever> what about prefetch and a couple opcodes having come from the phys space?
<heat> that's horrific and brilliant
<clever> maybe a `b .` to trap it in the physical domain, until it faults?
<geist> yah
<bslsk05> ​github.com: [arch][arm64] replace the trampoline translation table with a trampol… by pcc · Pull Request #327 · littlekernel/lk · GitHub
<geist> could also have a br to the virtual address just to be safe i guess
<geist> so you could i guess do the same thing on x86 by arranging for the IDT to point such that its #PF handler points to the next instruction in VA space
<geist> would have to point at a IDT entry that points to the next thing i guess
<geist> but wouldn't have to reserve a huge table, just enough to get to the 14th entry
<clever> yeah, this basically deletes the need for an identity mapping
<geist> right
<clever> and on the spectre/meltdown front, can the IDT change the paging tables upon fault?
<geist> i'm sure it's not the first time anyone has thought of it, but it does kinda simplify things if its safe to use
<geist> on x86-64 though you'd be basically simultaneously turning on the mmu, entering long mode, and faulting
<geist> so that's a real push
<clever> oh, nice
<clever> thats exactly what i originally joined #osdev for, lol
<clever> and i was cheating, by having qemu pre-create the paging tables for me
<clever> so with this trick, my asm just has to set the IDT addr, and then turn on mmu+long, and fault into the real _start
<clever> but that project has long been retired
<geist> yah though also. you probably want a temporary IDT etc
<heat> how did that qemu page table thing work?
<geist> in this case this PR above basically burns 1K of text for a temporary arm64 exception table which is grody
<clever> heat: i was modifying qemu, so it could run xen unikernels, with the xen hypercall api
<geist> but can probably arrange for it to point at something offset such that the VBAR's Nth entry goes where you want
<clever> heat: so i just populated the guest ram with a paging table, before the cpu came out of reset, and had a custom bios blob
<geist> but there are also alignment constraints for the arm vbar, etc
<clever> the idea being to spend as little time in real-mode as possible
<gamozo> Spending as little time in real-mode is probably one of my favorite hobbies
gamozo has quit [Quit: Lost terminal]
dude12312414 has joined #osdev
eryjus has quit [Remote host closed the connection]
eryjus has joined #osdev
gog has joined #osdev
sonny has joined #osdev
pretty_dumm_guy has quit [Quit: WeeChat 3.5]
terrorjack has quit [Quit: The Lounge - https://thelounge.chat]
terrorjack has joined #osdev
sonny has quit [Remote host closed the connection]
nanovad has quit [Ping timeout: 250 seconds]
nanovad has joined #osdev
smeso has quit [Quit: smeso]
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
smeso has joined #osdev
<klys> yipe a new unikernel
gog has quit [Ping timeout: 272 seconds]
<energizer> did anything end up happening with unikernel linux? i thought that was a good idea
zaquest has quit [Remote host closed the connection]
IT_WAS_ALIENS has quit [Quit: THE RAM IS TOO DAMN HIGH]
Burgundy has joined #osdev
Likorn has joined #osdev
RAMIII has quit [Ping timeout: 260 seconds]
zaquest has joined #osdev
Burgundy has quit [Ping timeout: 276 seconds]
troseman has quit [Ping timeout: 260 seconds]
DonRichie has quit [Quit: bye]
gamozo has joined #osdev
DonRichie has joined #osdev
<heat> how are you supposed to pick the number of queues and queue depth for an nvme device?
<heat> like what's the heuristic
<geist> number of queues may be more based on number of cpus than anything else
<geist> since i think it's common to at least run some number in parallel, per cpu, up to some point
<heat> and the queue depth?
<geist> i dunno that's a good question
<geist> like a lot of these things, probably just a good guess, with some ability for the sysadmin to adjust it possibly
<heat> do you just allocate a page by default? do you allocate the whole thing (the whole thing may be too much, 4MB for a single queue)
<geist> probably something more reasonable, like say 256 or 512 or so entries
<geist> or a single page yeah
<heat> when I'm done with mine I should rework fuchsia's driver
<heat> it's very limited
<geist> funny you say that, yes yes it is
<geist> and i think we need someone to work on it
<heat> single queue, single page for each queue
<geist> i literally have a machine coming in on fedex tomorrow because it is known to not work with our nvme driver
<geist> i said i'd take a look at it
<heat> ooh
<heat> do you have logs?
<heat> the driver looks fine from the spec's POV
<heat> the only totally wrong thing I found is that the timeouts may technically be too short, but 5s should still be enough
<geist> it fails some transaction early on and then falls over
<geist> i dont have them handy
<geist> it gets an unhandled error from the device i think
<geist> the device is some sort of cheapo hynix thing i think
<gamozo> Weird, it works with Linux or some other environment?
<heat> well if you need help you know where to find me on the interwebz
<geist> heat: yah it's just on my work computer which is there and not here
<heat> it is indeed not in two locations at once
<wxwisiasdf> how do i tell ubsan that a NULLPTR i am writing to is okay?
<heat> you don't, you can't do that
<heat> you remap the page and try to write there
<wxwisiasdf> rip
<heat> you could technically try to fool the compiler but the compiler is smart
<heat> so maybe do it in assembly I guess
<wxwisiasdf> oh of course gcc is likely x1000 super smarter - i guess having an asm glue won't hurt too much
Likorn has quit [Quit: WeeChat 3.4.1]
No_File has joined #osdev
<No_File> Good Morning!
No_File has left #osdev [#osdev]
<geist> okay!
<geist> morning!
heat has quit [Ping timeout: 240 seconds]
<mrvn> Never expect hardware to actually follow the specs
<gamozo> Morning @No_File!
<Mutabah> <<No such user `No_File`>>
<gamozo> RIP
<gamozo> Too used to discord at this point I guess
<gamozo> Haven't been on IRC in years
<geist> okay, stuffed in an old first gen ryzen in the server
<geist> see if it is stable now
<geist> if it is, that doesn't mean much, because this cpu draws less power
<geist> so its possible it'll not stress out the VREGs as hard
<geist> and thus is stable
wxwisiasdf has quit [Ping timeout: 240 seconds]
nanovad has quit [Quit: ZNC 1.7.5+deb4 - https://znc.in]
nanovad has joined #osdev
sikkiladho has joined #osdev
<sikkiladho> How can one implement PSCI_CPU_ON at the hypervisor for secondary cpus? The code in Trusted Firmware-A is a lot too complex to replicate. Any examples and docs would be great.
gxt has quit [Ping timeout: 240 seconds]
<clever> sikkiladho: to start with, you need to gain control of the other cores, via whatever mechanism the platform supports, dont even bother looking at PSCI until you have your code running on all 4 cores
<clever> all in hypervisor mode, with the mmu configured the same way
<clever> if you choose to run under the ATF, then you send it a normal PSCI, if you choose to run with the official arm stub then you poke the spintables and sev
gxt has joined #osdev
<sikkiladho> what if I've just booted up and other cores are in reset(or any platform-specific mode for RPi4)? Can I implement PSCI at hyp level to bring up secondary cores?
<clever> the job of PSCI is to convert the platform specific stuff into a standard api
<sikkiladho> So it's possible with spin-tables and not PSCI. I think they're different?
<clever> in the case of the rpi4, coming out of reset, all 4 cores just execute whatever is at PC=0
<clever> and you have no way to wake a core up after it has died
<clever> for the pi4, the job of ATF or a hypervisor, is to ensure a core never actually dies, and just sits in an idle loop, waiting for an inter-core message
<clever> when using the official arm stub, 3 of the cores will park themselves, and wait for an addr in the spintables
<sikkiladho> and with ATF-A?
<clever> ATF will gain control of the cores (probably by living at addr 0) on startup, and then it will park 3 of them in its own idle loop
<clever> and wait for a message from itself (sent by core0, in reaction to a PSCI cmd)
<clever> so when your hypervisor on core0 sends a PSCI command to wake core1, that just acts as a function call into ATF, forcing a switch into EL3
<clever> ATF then sends an IPI interrupt to core1, to wake the ATF thread on core1
<clever> core1 then reads the message, and executes your code in EL2 on core1
<clever> and you need to do the same when implementing a hypervisor
<sikkiladho> Thank you, I got it. Secondary cores are in control of ATF so the SMC must be forwarded to EL3(ATF).
<clever> but you cant just blindly forward the SMC
<clever> you must first gain control of those cores in hypervisor mode
<clever> and then setup the guest, the same way you did on core0
<sikkiladho> Yeah, I would trap the smc and replace the entry-point address with my own, so that core1 jumps to my address.
<sikkiladho> and preserve the one sent by linux of course.
<sikkiladho> So I should gain control of the CPUs before loading linux.
<geist> i think in general it's assumed that if you're building a hypervisor, it's a full SMP system
<geist> so basically the first thing the hypervisor needs to do is bring up the secondary cores and make them part of the hypervisor itself
<geist> you say ATF is hard to replicate, well a hypervisor is much more sophisticated
<geist> since usually they're more or less a full kernel
<geist> so really, i ask, what are you trying to do here?
nyah has joined #osdev
<sikkiladho> @geist thank you. I will try to get control of secondary cpus, before setting up the guest.
<sikkiladho> I think ATF was hard to replicate because it's for multiple platforms and my hypervisor right now is very simple, but I don't think I have to replicate it in this case. thank you.
<sikkiladho> @geist I'm building a simple hobby hypervisor for rpi4 which just loads a single linux kernel and sits underneath. At first, that's it.
<geist> ah
<geist> well in that case you'll have to be prepared for hypervisor traps from each of the cpus, so though you may not be implementing a complex hypervisor you'll probably need to implement some amount of locking or whatnot internally
<geist> so in that respect you'll have to handle effectively a SMP hypervisor, even if it's very simple
<clever> personally, i would just use LK as a base
<clever> modify the mmu code to support running in EL2 instead of EL1
<clever> and then use a core-pinned thread for each guest core
<clever> whenever the LK scheduler thinks it can, it will run that thread, which will then drop down to EL1 and run the guest
<clever> and when the guest throws an exception/smc, control returns back to that thread in EL2/lk
<clever> if you want a second guest, just spin up more threads, and let the LK scheduler deal with it
<clever> pre-empting a guest? ensure timers can force a switch back to EL2!
<clever> geist: does that all seem sound?
<clever> hardest part i can see, is just having an LK thread "resume" after it dropped to EL1, like the drop had simply returned
<geist> yah i think that'd be pretty doable
<clever> it also loosely reminds me of the linux kvm api
<clever> where you just have a "run the guest" ioctl
<clever> and when anything goes wrong and the kernel cant deal with it (hypercalls, faults), the ioctl returns, and your code is left to deal with it
<geist> yah i have always thought that'd be a fun project
<geist> just build a pure type 1 hypervisor and run other stuff in it
<clever> you could similarly implement ATF, by just modifying LK to run in EL3
<geist> that's my way to assert dominance here: run everyone's hobby OS under mine
<clever> but i think EL3 is mmu-less?
<geist> no it has its own, it just doesn't nest
puck has quit [Excess Flood]
<clever> ah
puck has joined #osdev
<clever> so you would just have to modify the mmu code to support running under EL3/EL2/EL1, and to not drop to EL1 immediately
<clever> and then compile-time configure what EL you want it to drop to and run under
<geist> yah the hard part is there is a bunch of code that accesses _EL1 fairly explicitly, so would have to at least macroize that stuff
<geist> right
<clever> and then normal thread/app stuff can deal with running guests at lower levels
<clever> and setting up secure vs non-secure guests
<geist> also IIRC EL3 and EL2 MMUs are funny: they only map the bottom part (ie, one of the two TTBRs) *Except* if you have a core that supports the EL2 extensions
<geist> so implicitly if you're EL3 or EL2 only on v8.0 you're limited to bottom half mmu
<clever> so you would have to change the kernel base
<geist> right
<clever> and change it into using TTBR0 for the kernel
<geist> right
<clever> i think lk always uses TTBR1
<clever> because it assumes its in the high half, and leaves 0 free for a userland
<geist> right
<clever> i think access to _EL1 regs will also work from EL2/EL3?
<clever> because the hypervisor/tf may want to modify EL1 state
<clever> so you cant rely on faults to tell you when you're using the wrong regs
<mrvn> One more reason to run a lower half kernel / higher half user :)
<clever> and would have to audit the output asm
<mrvn> Where user in this case would be the linux kernel
<geist> sure
<clever> mrvn: oh, random thought, a high half userland, means that null pointers are "safer", even with a +3gig offset, lol
<geist> clever: that's right (re: _EL1 access)
<clever> you would need a massive positive offset, for it to clear over the kernel, and hit userland
<geist> but yeah user space being at the bottom is a pretty standard scheme now
<geist> usual reasons
<geist> and then some arches codify it
<geist> but not the general modern ones
<clever> and thinking about it a bit, more from a malicious angle
<clever> if i wanted the hypervisor to hide itself from a linux guest
<mrvn> AArch64 seems to codify: hypervisor = lower, kernel = higher, user = lower
<clever> i would need to block access to a region of memory where the hypervisor lives, and maybe mess with dma controller commands, to stop you from using dma to peek behind the hypervisor mmu
<mrvn> clever: you can just swap address spaces when entering the hypervisor and put it anywhere
<mrvn> short of that little change-address-spaces stub
<mrvn> I think you pretty much have to do it that way on 32bit.
<clever> using nested paging tables, i should be able to ban linux from reading a 1mb chunk of ram
<clever> but i could map that to some other address, to make it less obvious
<mrvn> no nested tables in hpyervisor mode
<clever> isnt that the whole point of hypervisor mode, so you can run the kernel under a second set of tables?
<mrvn> ahh, sorry, yes, linux would be nested
<clever> EL2 sets up the nested tables, EL1 sets up its own tables, and now all translations go thru both EL1 and EL2's tables
<clever> and EL2 can use that to hide the hypervisor from linux
<mrvn> nod
<clever> at which point, how can linux detect the hypervisor?
<mrvn> only by trying to use some address space and it not working
<clever> what if i map the hypervisor's address to some other part of ram
<clever> so that 1mb block shows up at 2 addresses
<mrvn> what you would see is that you have an odd ram size.
<clever> and both are within the "no touchy" zone declared by the rpi firmware
<clever> which is already stealing 24mb of ram
<clever> i can just boot with gpu_mem=23, and now the firmware only steals 23mb
<clever> then take the extra 1mb for my hypervisor
<mrvn> yep, that hides it well
<clever> and the ram size is just as odd as without the hypervisor
<clever> the only sign it happened, is that a 1mb chunk of that 24mb "dont look here" is duplicated
<mrvn> can you ask the VC for it's ramsize?
<clever> on pi4, that is permanently pegged at 1024
<clever> the VC is only aware of the lower 1gig
<mrvn> I mean the gpu_mem
<clever> you can, but i could just hook those routines...
<clever> but, you just gave me a crazy idea
<clever> i could live inside the gpu_mem's heap!
<mrvn> there are probably some follow up problems if you mess with that
<clever> using mailbox functions, i can allocate say a 1mb object on the VC's heap
<clever> and then i can copy my hypervisor into that
<clever> now it really is "in use" by the firmware!
<mrvn> yeah, maybe better. And you could display the hypervisor memory as graphics output for fun
<clever> already tried that in another crazy idea, i wanted to dump the bootrom on the framebuffer, without bringing ram online :P
<clever> but i think the framebuffer cant be too close to 0 in ram
<clever> "ram"
<clever> so i have to bring ram online, to address further away from 0 and have it function
<mrvn> isn't the bootrom some secure memory that the graphics chip wouldn't be able to access?
<clever> much like the gameboy and xbox, its just a normal axi slave, until you set a magic flag, then it drops off the bus and that addr becomes ram
<mrvn> does the RPi4 have the secure extension?
<clever> the secure extensions in the ram controller, are wired into the VC, not the arm
<clever> so only the VC in secure mode, can access protected pages
<clever> the official firmware runs in non-secure mode by default, and has an array of trusted functions that can be ran in secure mode
<clever> and a syscall like api, to run a function by index
<clever> secure_fn_0 is used as an index lookup, you give it a function pointer, and it returns the index into that array
<clever> that index is then stored under this->fn_foo_index, and later used to call it
<clever> the VC has a 128 slot vector table, 32 slots for cpu exceptions, 32 slots for software interrupts (like int 0x80), and 64 slots for hw interrupts
<clever> each slot is just a PC to jump to, but bit0 of the value signals if the vector should be serviced in the current mode or secure mode
<clever> so storing `&irq_uart | 1` into a slot, causes the irq handler to be ran in secure mode
<clever> and the same for software interrupts
the_lanetly_052 has joined #osdev
<clever> each core (there are 2) also has a register for the base addr of that vector table (much like arm's VBAR)
<clever> alignment is enforced by the register simply not storing the lower bits, so if you read it back, its been rounded down to the nearest alignment
xenos1984 has quit [Read error: Connection reset by peer]
<clever> mrvn: nested paging tables are also taken to another level on the rpi, there is an extra mmu between "arm physical" and real ram, 64 pages of 16mb each
<clever> so you can potentially be going thru 3 paging tables, EL1, EL2, broadcom
<clever> the broadcom mmu is applied outside of the arm l1/l2 caches, so a cache-hit wont have any perf cost
graphitemaster has quit [Ping timeout: 260 seconds]
graphitemaster has joined #osdev
xenos1984 has joined #osdev
eau has quit [Quit: bleh!]
No_File has joined #osdev
sympt0 has joined #osdev
sympt has quit [Ping timeout: 252 seconds]
sympt0 is now known as sympt
diamondbond has joined #osdev
GeDaMo has joined #osdev
kingoffrance has quit [Ping timeout: 240 seconds]
sprock has quit [Quit: brb]
sprock has joined #osdev
gog has joined #osdev
the_lanetly_052_ has joined #osdev
sikkiladho has quit [Quit: Connection closed for inactivity]
the_lanetly_052 has quit [Ping timeout: 260 seconds]
eau has joined #osdev
kingoffrance has joined #osdev
Vercas9 has joined #osdev
Vercas has quit [Ping timeout: 240 seconds]
Vercas9 is now known as Vercas
<ddevault> can someone explain what the %gs register is for
<ddevault> I am utterly failing to understand its (apparently important) purpose
<GeDaMo> Thread local storage?
<gog> yes, typically %gs contains the base address for the thread's local data
<gog> %fs and %gs
<ddevault> hm
<gog> this was the convention before and since a few CPU generations ago is supported by CPU instructions
<gog> before amd64 thread-local storage was managed with the GDT, now it's managed with a pair of MSRs
<ddevault> I see
diamondbond has quit [Remote host closed the connection]
No_File has quit [Quit: Client closed]
<klys> global segment
nyah has quit [Quit: leaving]
jimbzy has quit [Ping timeout: 240 seconds]
gog has quit [Ping timeout: 248 seconds]
vinleod has joined #osdev
wand has quit [Remote host closed the connection]
vdamewood has quit [Ping timeout: 240 seconds]
wand has joined #osdev
nyah has joined #osdev
ZetItUp has joined #osdev
gog has joined #osdev
<mrvn> can one apply __attribute__((__packed__)) to a template<typename T>? gcc always says it will ignore it.
gog has quit [Ping timeout: 272 seconds]
Vercas has quit [Quit: Ping timeout (120 seconds)]
wand has quit [Ping timeout: 240 seconds]
gildasio has quit [Ping timeout: 240 seconds]
gxt has quit [Ping timeout: 240 seconds]
vdamewood has joined #osdev
the_lanetly_052_ has quit [Ping timeout: 276 seconds]
vinleod has quit [Ping timeout: 276 seconds]
heat has joined #osdev
<bauen1> mrvn: some code i have here says you can, at least gcc (10, 11, 12) isn't complaining
<bauen1> code in question is roughly: `template <typename T> struct [[gnu::packed]] Timed { T value; }`
<mrvn> bauen1: is doesn't complain, it just ignores it. Check sizeof()
<mrvn> or it complains that it will ignore it
<mrvn> bauen1: your Timed is packed but T is not packed. So overall you just changed the alignment to 1 and broke T.
<mrvn> Try struct T { char c; int i; }; the value is not packed.
<bauen1> oh i hate c++
<mrvn> The problem might be that the [[gnu::packed]] needs to be between "struct" and "Name" in the T.
<bauen1> wtf
<bauen1> no, what, https://godbolt.org/z/Pecr3hsE3 seems to work
<bslsk05> ​godbolt.org: Compiler Explorer
<mrvn> bauen1: don't forget that packed isn't recursive. A struct in a packed struct is not itself packed. You have to apply the attribute to every sub struct too.
<mrvn> yes, packing S works, but packing Timed doesn't pack the inside T.
<bauen1> oh, i don't think that will be a problem, the code that cares about packed static_asserts that alignof(T<...>) == 1
<mrvn> I even tried this: <source>:4:34: warning: attributes ignored on elaborated-type-specifier that is not a forward declaration [-Wattributes] 4 | template <struct [[gnu::packed]] T>
<mrvn> ahh, I didn't think of you asserting it's packed. thanks.
<mrvn> well, packed or only contains chars
<mrvn> s/you/your/
<bauen1> mrvn: i have written a header that asserts all kind of weird things to ensure a struct can be passed between 2 platforms without issues, except you can't entirely ensure that as someone can always 1. forget to add a static_assert(sizeof() = x) on their struct and use types that actually have a different size, e.g. `long int`
<mrvn> Anyone know what the state of introspection is for c++? Could one use that to recursively generate a "struct [[gnu::packed]] PackedT" from any given T?
<bauen1> it starts with:
<bauen1> static_assert(CHAR_BIT == 8, "Please use a reasonable platform");
<mrvn> hehe
<mrvn> You must be happy that int is now two's complement.
<mrvn> till recently you could only share unsigned types and intX_t.
<bauen1> lol we're sharing floats and doubles here ...
<heat> c++ is a prime example of stockholm syndrome
<mrvn> uhoh. what about archs without denormalized doubles?
<heat> fuck em
<bauen1> there's also some really shitty stub headers so i can compile the microcontroller firmware for linux, and get all the offsets of struct members and some other information exported into JSON
<mrvn> On alpha doubles aren't even ieee unless you add a gcc flag that makes it run half speed.
<bauen1> writing a program to use libclang was also considered, but as far as i could see libclang works on the AST and not on e.g. the final struct layout / values
<mrvn> In the future you can do that with introspection.
<bauen1> mrvn: in the future there will be rust ...
<mrvn> can rust already do introspection?
<bauen1> mrvn: not sure, but it has macros / the derived-thingy that would make this exact thing a lot easier to build i think
<heat> bauen1, you could definitely use clang libraries to do that
<heat> clangd already knows sizes and alignments and whatnot
<bauen1> heat: libclang seems to only operate on the AST, or at least I couldn't figure out how to find a list of all struct types in the entire project that fulfill certain criteria (e.g. passed to template, passed to function)
<bauen1> heat: problem is that all of this probably involves a few too many layers of templates :(
Vercas has joined #osdev
<mrvn> There is no problem that can't be made more magic by the use of more templates.
Mikaku has quit [Excess Flood]
Mikaku has joined #osdev
wand has joined #osdev
gildasio has joined #osdev
gxt has joined #osdev
nick64 has joined #osdev
Gooberpatrol_66 has joined #osdev
ptrc_ has joined #osdev
paulbarker_ has joined #osdev
psykose_ has joined #osdev
sham1_ has joined #osdev
mxshift_ has joined #osdev
seds_ has joined #osdev
merry_ has joined #osdev
dminuoso_ has joined #osdev
MiningMa- has joined #osdev
mcfrd has joined #osdev
shikhin_ has joined #osdev
night_ has joined #osdev
tomaw has quit [Killed (mercury.libera.chat (Nickname regained by services))]
froggey-1 has joined #osdev
dzwdz1 has joined #osdev
lg_ has joined #osdev
tomaw has joined #osdev
shikhin has quit [Killed (NickServ (GHOST command used by shikhin_))]
shikhin_ is now known as shikhin
kanzure_ has joined #osdev
dzwdz has quit [*.net *.split]
psykose has quit [*.net *.split]
hgoel[m] has quit [*.net *.split]
Gooberpatrol66 has quit [*.net *.split]
ptrc has quit [*.net *.split]
lg has quit [*.net *.split]
MiningMarsh has quit [*.net *.split]
kanzure has quit [*.net *.split]
mcfrdy has quit [*.net *.split]
merry has quit [*.net *.split]
dayimproper has quit [*.net *.split]
dminuoso has quit [*.net *.split]
mxshift has quit [*.net *.split]
froggey has quit [*.net *.split]
sham1 has quit [*.net *.split]
seds has quit [*.net *.split]
night has quit [*.net *.split]
warlock has quit [*.net *.split]
paulbarker has quit [*.net *.split]
unlord has quit [*.net *.split]
ptrc_ is now known as ptrc
mcfrd is now known as mcfrdy
psykose_ is now known as psykose
MiningMa- is now known as MiningMarsh
merry_ is now known as merry
mxshift_ is now known as mxshift
paulbarker_ is now known as paulbarker
seds_ is now known as seds
dayimproper has joined #osdev
No_File has joined #osdev
_xor has joined #osdev
hgoel[m] has joined #osdev
_xor has quit [Client Quit]
gog has joined #osdev
Likorn has joined #osdev
No_File has quit [Quit: Client closed]
dude12312414 has joined #osdev
kanzure_ is now known as kanzure
joe9 has joined #osdev
blockhead has joined #osdev
Likorn has quit [Quit: WeeChat 3.4.1]
froggey-1 is now known as froggey
diamondbond has joined #osdev
night_ has quit [Quit: goodbye]
night has joined #osdev
No_File has joined #osdev
dude12312414 has quit [Remote host closed the connection]
nick64 has quit [Quit: Connection closed for inactivity]
mahmutov has joined #osdev
Likorn has joined #osdev
sham1_ is now known as sham1
rustyy has quit [Quit: leaving]
rustyy has joined #osdev
gog has quit [Ping timeout: 272 seconds]
<geist> ddevault: are you using x86-64 or x86-32?
<ddevault> the former
<geist> also ugh, was responding to something 8 hours ago
<geist> oh looks like no more discussion was on it
<ddevault> I still don't fully understand %gs, but I don't really need to right now
<geist> so yeah as gog was saying gs: is largely vestigial
<ddevault> well, I understand what it *was* for
<geist> basically the *value* in gs (and fs) is irrelevant now
<ddevault> but I don't really understand what kernels still do with it
<ddevault> in any case, my code works so I'm happy enough
<geist> but you can use an override prefix to dereference something off it
<geist> ie
<geist> mov gs:4, rax or something like that
<ddevault> hm
<geist> basically take the address that is 4 off of what gs 'points to' and move into rax
<geist> and that's accomplished in the assembler via a segment override prefix byte
<geist> the way gs (and fs) 'point to' something in x86-64 is not via the GDT like it used to, but via a set of MSRs you can set
<ddevault> I see
<geist> GS_BASE FS_BASE and GS_KERNEL_BASE
<geist> *basically* it's used for thread local storage in user space. traditionally fs points to the thread local structure
<geist> and in the kernel GS usually points to something similar. a cpu specific data structure
<mrvn> iirc on x86 the use of fs/gs is reversed
<geist> in an SMP system you always want to have at least one per-cpu structure that you can anchor things off of
<geist> so it's traditional (and kinda baked into the arch in 64bit) that gs points to that inside the kernel
<geist> on non SMP it isn't really mandatory
<ddevault> I understand, that makes more sense now
<ddevault> I was not grokking that the use by convention differed from the use per the CPU manual
<ddevault> thanks :)
<geist> yah the manual wont really describe what it's for, just the mechanism
<geist> this is also where GS_KERNEL_BASE and GS_BASE and swapgs will start to make sense
<ddevault> what confused me is that it had a much more important purpose before
<ddevault> so all of the docs cover it in great detail regarding its legacy use
<geist> which initially is head scratching, but if you have both the kernel and user space use GS, those features start to make sense
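The fs-based TLS convention geist describes is visible from ordinary userspace code: on x86-64 Linux a `thread_local` variable is addressed relative to fs, whose base comes from the FS_BASE MSR rather than the GDT. A minimal sketch (names are mine, not from any kernel):

```cpp
#include <cassert>
#include <thread>

// On x86-64 Linux the compiler emits fs-relative addressing for this
// variable (e.g. `mov %fs:tls_counter@tpoff(%rip), %eax`); each thread
// sees its own copy. The *value* in fs is irrelevant -- only the base
// set via the FS_BASE MSR (or the fsgsbase instructions) matters.
thread_local int tls_counter = 0;

int bump_and_read() {
    // Touches only the calling thread's copy.
    tls_counter += 1;
    return tls_counter;
}
```

A kernel does the same trick with gs and GS_BASE/GS_KERNEL_BASE to anchor its per-cpu structure.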
<mrvn> ddevault: You mean as an actual segment descriptor?
<geist> wasn't so much important as fs and gs were just another one of the regular segment registers then (ds, es, fs, gs, ss, cs)
<ddevault> yeah
<geist> and protected mode segment stuff was somewhat more powerful
<mrvn> it's been repurposed since all segment start/limit is ignored in 64bit.
<ddevault> yeah
<geist> exactly, so in 64bit the other 4 registers are basically entirely vestigial (except cs signalling what mode you're in)
<ddevault> but I still saw kernels in the wild messing with it
<ddevault> so I was a bit unsure as to why they were bothering and if it was important
<mrvn> What is surprising is: what's up with "es"? Why isn't that used?
<geist> but they left some functionality in fs/gs, but indirectly (via the MSRS) or via the new instructions to let you set them directly (fsgsbase instructions)
<geist> mrvn: anymore or at some point?
<mrvn> geist: in 64bit mode
<geist> oh i guess AMD basically left in the bare minimum
<mrvn> Is the "es" prefix worse than fs/gs?
<geist> also es has some hard coded uses in some instructions, so i'm guessing they left it alone for that reason
<geist> otherwise you'd have to also modify those instructions to not use it, etc
<mrvn> ahh, that would explain it. Stripping it out of the instructions for 64bit mode would be complex.
<geist> yah iirc movs implicitly uses es for one of the sources? (amirite there?)
<GeDaMo> Destination, I think
<geist> or destination
<geist> yah
<geist> i dunno how the segment override prefixes work with movs. which side does/can it modify?
<GeDaMo> "For legacy mode, Move byte from address DS:(E)SI to ES:(E)DI. For 64-bit mode move byte from address (R|E)SI to (R|E)DI."
heat has quit [Ping timeout: 248 seconds]
<GeDaMo> Doesn't seem to apply in long mode
<geist> GeDaMo: yeah or it does implicitly use ds/es except those have no offset/length so effectively it disables it
<geist> also interesting question: can you use fs or gs override prefix for it in 64bit mode
<GeDaMo> "The DS segment may be overridden with a segment override prefix, but the ES segment cannot be overridden."
<bslsk05> ​www.felixcloutier.com: MOVS/MOVSB/MOVSW/MOVSD/MOVSQ — Move Data from String to String
<geist> there ya go. makes sense
<geist> or at least doesn't really make sense, but thats the answer!
<geist> as is lots of x86isms
<mrvn> If an opcode uses two segment registers then you can't override both of them. There is no "override the other segment" prefix byte.
<geist> right
<geist> that's the at least internally consistent part of it
<geist> and outside of movs i dont think too many other instructions access two pointers at the same time
<geist> i'm sure there's some other one somewhere (there always is) but i don't know of it offhand
<zid> does push [] count
<zid> also uses two selectors
<zid> (does that even exist?)
<geist> push indirectly? i dont think so
<bslsk05> ​www.felixcloutier.com: PUSH — Push Word, Doubleword or Quadword Onto the Stack
<geist> and yeah indirects or double indirects may reference more than one thing but i dont remember if x86 has a bunch of those
<mrvn> memory, register or immediate. The first would use 2 segments.
<geist> ie, indirect this and then use that word to then indirect something else
brynet has quit [Quit: leaving]
<mrvn> geist: mov (#1, ds:r2*4, es:r3), (#3, fs:r4*8, gs:r5) to the rescue.
<geist> hmm?
<mrvn> a hypothetical 4* indirect addressing opcode
<geist> ah
<geist> i was expecting you to show off a 68k opcode that does this no sweat :)
<mrvn> no braindead segments in m68k :)
<mrvn> any idea how m68k does TLS?
<geist> though i gotta say x86 limiting themselves to one memory deref per instruction in most cases really does make the microcode simpler
<geist> 68k and vax have fairly complex internal states to make sure that page faults or whatnot on the Nth operation can be unwound and restarted
<geist> good question re 68k TLS
<mrvn> They actually screwed that up in the 68020. can't recover from a bus error so they run 2 68020 (iirc) in parallel with a clock offset. If the first throws a bus error the second gets stopped before it becomes unrecoverable.
<mrvn> What a way to unwind an opcode on error
<geist> 68000 IIRC. 010 fixed that among other things
<mrvn> How is your m68k board?
<geist> from poking around the web i've seen a few references to sysv 68k abi just not having thread local storage. have to make a syscall in linux
<geist> it's doing fine, need to futz with it some more
brynet has joined #osdev
pretty_dumm_guy has joined #osdev
bliminse has quit [Quit: leaving]
No_File has quit [Quit: Client closed]
jimbzy has joined #osdev
wxwisiasdf has joined #osdev
GeDaMo has quit [Quit: There is as yet insufficient data for a meaningful answer.]
<wxwisiasdf> hello
<wxwisiasdf> how do i tell gcc to interpret printf formats with the -fexec encoding
<wxwisiasdf> i get lots of spurious warnings because i am using -fexec-charset=ibm-930, and it's very annoying because i basically have to rely on not messing up format strings in the kernel
<mrvn> no c++?
<wxwisiasdf> no it's c
<mrvn> maybe you should start there :)
<wxwisiasdf> ???
<wxwisiasdf> oh i see using automatic type deduction for formatting from c++
<mrvn> std::format
<wxwisiasdf> yeah
<wxwisiasdf> but this a kernel :)
<mrvn> even more reason to have it type safe
<Griwes> Idk what this being a kernel has to do with anything, my kernel formats stuff with std::format :P
Gooberpatrol_66 has quit [Quit: Leaving]
<mrvn> Griwes: with or without type erasure?
<Griwes> It's per std::format spec
<wxwisiasdf> okay if not the kernel then my libc also uses printf for the various *nix utilities
<mrvn> type erasure is an implementation improvement
<Griwes> Which type erasure
<Griwes> As the spec stands, you need to erase some argument types and you need to erase the iterator
<wxwisiasdf> and those are -fexec-charset ibm930 too
<mrvn> can't remember exactly but it reduces the code bloat
<Griwes> The iterator erasure, then
<Griwes> It was DR'd to be effectively required
Gooberpatrol66 has joined #osdev
mctpyt has joined #osdev
<Griwes> The thing that reduced most code bloat for me was a very careful dance of force inlining just the correct things
<mrvn> basically everything before the type erasure and nothing after
<Griwes> not... *quite*
<Griwes> it was a bit more involved
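The type-safety mrvn and Griwes are after comes from the argument types being visible at compile time. This is not std::format itself, just a minimal hedged sketch of that idea (`format_cat` and `fmt_cat` are hypothetical helpers, not any real API):

```cpp
#include <sstream>
#include <string>

// Base case: nothing left to append.
inline void fmt_cat(std::ostringstream &) {}

// Each argument is formatted via operator<< resolved against its static
// type -- no %-specifier to get wrong, so no charset-mismatch warnings.
template <typename T, typename... Rest>
void fmt_cat(std::ostringstream &out, const T &first, const Rest &...rest) {
    out << first;
    fmt_cat(out, rest...);
}

template <typename... Args>
std::string format_cat(const Args &...args) {
    std::ostringstream out;
    fmt_cat(out, args...);
    return out.str();
}
```

std::format adds a parsed format string, argument erasure, and the iterator erasure discussed above on top of this basic mechanism.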
diamondbond has quit [Quit: Leaving]
mctpyt has quit [Ping timeout: 260 seconds]
joe9 has quit [Quit: leaving]
heat has joined #osdev
<heat> sup noobs
<jimbzy> Messing around with a schematic. Und du?
<heat> nothing, just got home
<heat> i'll probably try to finish my nvme driver tonight
<jimbzy> Schweet
<mrvn> tell me more
<mrvn> ups
<heat> tell me less
<bslsk05> ​'Grease - Summer Nights HD' by Kurt Harmsworth (00:04:01)
joe9 has joined #osdev
diamondbond has joined #osdev
joe9 has quit [Quit: leaving]
wxwisiasdf has quit [Ping timeout: 276 seconds]
heat_ has joined #osdev
heat has quit [Read error: Connection reset by peer]
heat_ is now known as heat
<geist> heat: looks like someone debugged the nvme fuchsia driver problem
<heat> oh man they ruined the fun :/
<heat> what was it?
<geist> some assumption the driver had about something. will check in a sec
<geist> not at work computer
<geist> iirc it was something like the driver assumes you can build a queue this long but the device NAKed it
<heat> aha
<heat> IO queue right?
<heat> (not admin)
<geist> i'll have to check
<geist> at work right this sec
<heat> yeah i think it must be
<heat> unless the nvme is buggy
<heat> the queue limit they give you only applies to the io queue
<heat> btw I found out why PRPs and SGLs both exist
<heat> SGLs weren't a thing on spec 1.0
mahmutov has quit [Ping timeout: 256 seconds]
<heat> it can also explicitly not support SGLs
<heat> so you do need to support both PRPs and SGLs in your driver (yay complexity!)
archenoth has quit [Read error: Connection reset by peer]
<geist> ah that makes sense. i was expecting that SGLs are optional
<geist> so then the question is what subset of consumer hardware supports it
<geist> i was thinking this was similar to the complex descriptors in SDHCI which has a similar thing (in spirit)
<geist> ie a simple scheme that everything supports and the complex one
<geist> which effectively means the simple one is the one you worry about
_xor has joined #osdev
<geist> and the other one is gravy that maybe you can use
<heat> linux does seem to use SGLs by default since they're probably faster
<geist> oh sure, favor the fancy thing but fall back
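The PRP scheme being discussed is simple enough to sketch: PRP1 may carry a page offset, while every subsequent entry must be a page-aligned physical address. This is an illustrative sketch (names are mine, not from Onyx, Linux, or any real driver), covering only the physically-contiguous single-buffer case; a real driver also chains PRP list pages and falls back from SGLs on controllers that don't support them:

```cpp
#include <cstdint>
#include <vector>

constexpr uint64_t kPageSize = 4096;

// Returns the PRP entries covering the physical range [phys, phys + len).
std::vector<uint64_t> build_prps(uint64_t phys, uint64_t len) {
    std::vector<uint64_t> prps;
    prps.push_back(phys);                          // PRP1 keeps its offset
    uint64_t next = (phys & ~(kPageSize - 1)) + kPageSize;
    uint64_t end = phys + len;
    for (; next < end; next += kPageSize)
        prps.push_back(next);                      // later entries aligned
    return prps;
}
```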
archenoth has joined #osdev
<heat> see, this is where I wonder if a buddy allocator is really the best choice for a page allocator
<heat> more contiguous memory = better
<heat> if you try to do SGLs on really fragmented memory you'll end up with basically a larger PRP
<heat> like how much are you actually paying at page alloc time vs all the speed ups you can go for
<heat> hugepages too
<geist> yah agreed re buddy allocator
<geist> i'm not a huge fan, but i think to be honest it's because i'm not a fan of doing whatever linux does because they are
<gamozo> Mornin everyone!
<heat> sup gamozo
<gamozo> I do 5 hours of yard work and now apparently I need to sleep until 2pm
<gamozo> Ahaha
<heat> geist, what did your other projects use?
<geist> for pmm? the queue
<geist> just a queue of pages in whatever order
<heat> ah just the simple list?
<geist> yep. zircon does too
<geist> hard to argue with O(1)
<heat> idea: buddy allocator as the backend, cache of memory regions as a percpu thing
<j`ey> a single queue of PAGE_SIZE's?
<heat> yes
<geist> maybe multiple queues for different numa nodes, etc but the idea is the same
<geist> carve off a struct per page and toss it in a list
<geist> works quite well, lots of large systems have survived on that
<heat> if you percpu cache it, you eliminate lock contention and stop any possible yo-yo of regions when alloc/freeing
<geist> it just hurts to allocate more than one contig page
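The "queue of pages" pmm geist describes fits in a few lines: one struct per physical page, threaded through an intrusive freelist, with O(1) alloc and free. A sketch with illustrative names (not zircon's actual code):

```cpp
#include <cstddef>

// One of these exists per physical page, carved off at boot.
struct Page {
    Page *next = nullptr;
    // ... per-page state (refcount, flags, ...) would live here
};

struct PageQueue {
    Page *head = nullptr;
    size_t free_count = 0;

    void free_page(Page *p) {       // push: O(1)
        p->next = head;
        head = p;
        ++free_count;
    }

    Page *alloc_page() {            // pop: O(1) -- but single pages only,
        if (!head) return nullptr;  // which is exactly the contiguity pain
        Page *p = head;             // point raised above
        head = p->next;
        --free_count;
        return p;
    }
};
```

The per-cpu cache idea layers on top: each cpu pops a batch from this queue under the lock, then serves allocations locally without contention.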
chartreus has joined #osdev
<geist> yep. we have a per cpu cache in front of the pmm now. helped a lot
<heat> this is food for thought
<j`ey> nomnom
<heat> what does nt use?
<heat> or freebsd?
ZetItUp has quit [Ping timeout: 276 seconds]
<heat> maybe it's in the windows internals books, can't find any information about it online
<geist> i think NT is queue based
<kazinsal> believe so
<kazinsal> I'd have to crack open Windows Internals to be sure
<kazinsal> and that's a lot of dead trees that's all the way over at the other side of my apartment, a whole 10 steps away
<geist> fairly certain queues of pages is pretty much the defacto implementation for more or less everything that was conceived of <2000 or so
<heat> here's a cute detail: windows maps IO ranges with large pages if it sees it can
<heat> i'm unsure of how this plays along with that scary x86 UB for large pages with multiple PAT attributes or whatever that was that doug16k once mentioned
Likorn has quit [Ping timeout: 240 seconds]
diamondbond has quit [Quit: Leaving]
<heat> haha they got struct page'd too
<heat> it's also a horrible thing with fields that are overloaded 4 times
<zid> Getting struct page'd is a terrible affliciton :'(
heat has quit [Read error: Connection reset by peer]
heat has joined #osdev
<heat> page tables are swappable wtf
nyah has quit [Ping timeout: 260 seconds]
<mrvn> if you have nested tables nothing stops you from swapping one of them
<kazinsal> just bank switch the switched bank
thatcher has joined #osdev
<heat> they mention lists of PFNs quite a lot
<heat> but I don't know if this is the actual format of the lists
<heat> they don't really mention large pages
<heat> except the "we try to use large pages transparently" part
<geist> yah i never completely grokked what they were talking about with 'prototype page tables' or whatnot
<geist> PFN is probably just a way ot saying 'page address shifted over'
<geist> lots of systems do that
<geist> re: swappable page tables, that may have fallen out of earlier experience with VAX, which actually lets you swap page tables
<heat> PFN is also what they call their struct page
<geist> remember most of the early devs for it were ex vax folks, so they applied a lot of the same design patterns
<geist> IRQL and whatnot is 100% a vax hardware feature they brought forward and emulated in software because they were used to the model
<heat> allegedly, they get PFNs on like 6 or 7 lists (dirty, clean, zero'd, unused, etc) and then allocate from those
<geist> yep. and theres a priority scheme there
<geist> when allocating a page it walks down the lists and finds the first one in the right list
<heat> but what constitutes a "list" is unclear to me. they mention a simple linked list
<geist> and then there's machinery that tries to keep the lists balanced and whatnot
<geist> i think it's basically an array of lists, in an allocation priority order. i like that model, even if it's only conceptual
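The array-of-lists model heat and geist describe for NT can be sketched directly: pages sit on one of several lists and allocation walks them in preference order. List names loosely follow the Windows Internals description; the code itself is illustrative, not NT's:

```cpp
#include <array>
#include <cstdint>
#include <deque>

// Preference order: hand out pre-zeroed pages first, then free pages,
// then repurpose standby (clean, evictable) pages.
enum PageList { kZeroed, kFree, kStandby, kNumLists };

struct PfnDatabase {
    std::array<std::deque<uint64_t>, kNumLists> lists;

    // Walk the lists in priority order; return the first available pfn.
    bool alloc_page(uint64_t *pfn_out) {
        for (auto &list : lists) {
            if (!list.empty()) {
                *pfn_out = list.front();
                list.pop_front();
                return true;
            }
        }
        return false;   // genuinely out of memory
    }
};
```

The balancing machinery geist mentions (zeroing free pages in the background, trimming standby) just moves pfns between these lists.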
<heat> but they surely mustn't be using a simple linked list if they want to allocate large pages
<heat> this section seems... vague
<heat> maybe microsoft will drop the nt kernel sauce next month
<geist> yah dunno how large pages would work in that model
<geist> but you said before it was for io pages
<geist> dunno the extent of large page support for non io pages
<geist> or maybe it's transparent in certain ways, like specific contiguous page allocation paths and then in that scenario it'll do a large map if it works out that way
<geist> zircon has that
JanC has quit [Remote host closed the connection]
<heat> virtualalloc has a MEM_LARGE_PAGES
JanC has joined #osdev
<heat> they say they could possibly break down a 1040MB allocation into 1 huge page and 4 large pages
<heat> geist: re prototype PTEs, it seems they're like vm objects
<geist> possible. maybe they track the page assignments of pages to vmos using some sort of N level thing that they call prototype PTEs
<heat> like a shadow page table, real PTEs point to it (with the P bit 0'd)
<geist> yah
<geist> how that would precisely work i dunno, but i guess the gist is to use some sort of similar thing
<heat> linux also uses a page table-ish structure for their vmos
<heat> the so called radix tree, now renamed xarray
<geist> right
<geist> zircon uses a wavl tree of runs of 16 pages
<geist> basically arbitrarily picked to be a reasonable compromise
<heat> oh yeah for sure, the prototype page tables are like a little page table for the "vmo"
<heat> why did you go for a wavl tree?
<geist> yah whether or not you can actually use it as a real page table i dunno
<geist> because we already have a wavl tree implementation
<geist> or you mean why did i use a tree in the vmo or why a wavl tree vs some other tree?
<heat> why a tree vs the radix-tree/prototype page table thing
<geist> ah simply the former
<heat> s/tree/binary tree/
<geist> we already had it and it was expedient and it has pretty reasonable performance and size characteristics
<geist> especially given that most vmos are pretty small, and thus really only end up with a single run of pages
<heat> the most ingenious use of the page table stuff I've seen in linux is that you can kinda figure out what's dirty right from the top level of the tree
<heat> just like a page table
<geist> oh yeah?
wxwisiasdf has joined #osdev
<geist> ah
<wxwisiasdf> finally i got rid of the debug diag 8 cmd and now i use a proper ic console :D
<heat> when dirtying, they queue the dirty inodes; on writeback, they look at the radix tree and go down the branches that are dirty (literally a D bit)
<heat> then you can easily writeback large runs of pages at once
<geist> ah makes sense
<geist> yah that's a thing we wouldn't be able to do in the wavl tree because the order of the tree is not constant
<heat> yup
<heat> i was thinking about going down the radix tree route and making a dynamically growable tree (I don't know if that's how linux does it, but probably)
<heat> essentially add levels to it once they're required
<heat> small files would be trivial to look up, huge files would still be very fast
<heat> would probably keep each table PAGE_SIZE size'd for compactness' sake although getting it larger wouldn't be too bad either
<heat> i think it's theoretically way better than my binary tree in every characteristic except memory usage
<wxwisiasdf> radix trees seems interesting
<geist> yah i guess picking the right radix is interesting too
<geist> since that affects how much internal fragmentation you get
* heat nods
<heat> also if you use whole pages you can skip malloc and or its internal fragmentation
<geist> yah but thats probably pretty bad for internal fragmentation in the sense that you probably have a large set of unused page pointers
<geist> may be good to generally pick a radix or a set of radices that are relative to the size of the object or whatnot
<geist> no idea what linux does
<geist> but yeah the obvious one is one page radix
<heat> theoretically you could change the radix and restructure the tree when the levels get too deep
<heat> not that they would get too deep
<heat> at most you get 6 levels for a huge 64-bit vmo
<heat> ... at that point your main concern probably isn't the radix tree :D
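The dirty-tag trick heat describes (and the growable radix tree he's considering) can be sketched together: a fixed fanout per level, with a dirty bit on every interior slot so writeback only descends branches that actually contain dirty pages. This is a two-level illustrative sketch with made-up names, not Linux's xarray (which also grows in height and packs tags as bitmaps):

```cpp
#include <array>
#include <cstdint>
#include <memory>
#include <vector>

constexpr int kRadixBits = 6;           // fanout of 64 per level
constexpr int kFanout = 1 << kRadixBits;

struct RadixNode {
    std::array<std::unique_ptr<RadixNode>, kFanout> slots;
    std::array<bool, kFanout> dirty{};  // per-child tag, like a D bit
    uint64_t page = 0;                  // leaf payload (page "address")
};

struct RadixTree {
    RadixNode root;                     // fixed two levels for the sketch

    void insert_dirty(uint64_t index, uint64_t page) {
        int top = (index >> kRadixBits) & (kFanout - 1);
        int bot = index & (kFanout - 1);
        if (!root.slots[top]) root.slots[top] = std::make_unique<RadixNode>();
        RadixNode &mid = *root.slots[top];
        if (!mid.slots[bot]) mid.slots[bot] = std::make_unique<RadixNode>();
        mid.slots[bot]->page = page;
        mid.dirty[bot] = true;
        root.dirty[top] = true;         // propagate the tag upward
    }

    // Writeback walk: skip any subtree whose dirty tag is clear.
    std::vector<uint64_t> collect_dirty() {
        std::vector<uint64_t> pages;
        for (int t = 0; t < kFanout; ++t) {
            if (!root.dirty[t]) continue;   // whole subtree clean
            for (int b = 0; b < kFanout; ++b)
                if (root.slots[t]->dirty[b])
                    pages.push_back(root.slots[t]->slots[b]->page);
        }
        return pages;
    }
};
```

With 64-way fanout, six levels cover a 36-bit page index, matching heat's "at most 6 levels" estimate for a huge vmo.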
<geist> also this is a fun thing where using larger page sizes affects your radix and the number of pages, etc
<heat> i wonder if changing page sizes does have a measurable effect on system performance
<heat> IO even
<heat> it's very common for kernels to just size things based on pages
<geist> right. it's an interesting question
lg_ is now known as lg