klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
<clever> kingoffrance: ive also seen bugs, where the length of the shebang must fit within a certain number of bytes
<heat> that's not a bug
<heat> linux does that for instance
<clever> has it changed recently?
<klange> it's a security feature™
<heat> no
<heat> always has been like that
<clever> i remember something about it changing
<heat> the limit has changed over time, but that was like 10 years ago
<klange> https://www.in-ulm.de/~mascheck/various/shebang/ has some collected details on different platforms
<bslsk05> ​www.in-ulm.de: The #! magic, details about the shebang/hash-bang mechanism
<bslsk05> ​github.com: improve perl shebang lines by switching to `use lib ...;` by cleverca22 · Pull Request #55786 · NixOS/nixpkgs · GitHub
<clever> apparently, the shebang in these perl scripts was reaching 30kb
<clever> and it worked for a while, and then broke
<clever> i cant remember what changed to make it stop working
<heat> apparently it's 256 now, used to be 128
<heat> mine is 100
<heat> https://github.com/heatd/Onyx/blob/master/kernel/kernel/binfmt/shebang.cpp <-- horrible, horrible code that I wrote while looking at linux's shebang code
<bslsk05> ​github.com: Onyx/shebang.cpp at master · heatd/Onyx · GitHub
<heat> it's pretty similar but a bit more readable IMO
<clever> heat: from what i can remember, i think the error might have been that linux started enforcing that it found a \n within the first N bytes?
<clever> when previously, it would truncate, and execute whatever was left
<heat> maybe
<heat> 30kb shebangs were never a thing
<clever> for nixpkgs, the perl include path was in the shebang
<clever> which is why it turned out to be 30kb
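A minimal userspace sketch (not Linux's or Onyx's actual code) of the behavior being discussed: the kernel only examines a fixed-size window at the start of the file, so a shebang whose newline falls outside that window gets rejected, which is exactly what would kill a 30kb interpreter line:

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>

// Linux's BINPRM_BUF_SIZE is 256 today (it used to be 128).
constexpr size_t BUF_SIZE = 256;

// Returns the interpreter path, or nullopt if the line is not a valid
// shebang within the first BUF_SIZE bytes.
std::optional<std::string> parse_shebang(const std::string& file) {
    if (file.size() < 2 || file[0] != '#' || file[1] != '!')
        return std::nullopt;                 // not a script at all
    size_t window = std::min(file.size(), BUF_SIZE);
    size_t nl = file.find('\n');
    if (nl == std::string::npos || nl >= window)
        return std::nullopt;                 // no newline inside the window
    // skip blanks after "#!", take everything up to the first space/newline
    size_t start = file.find_first_not_of(" \t", 2);
    if (start == std::string::npos || start >= nl)
        return std::nullopt;                 // "#!" with nothing after it
    size_t end = file.find_first_of(" \t\n", start);
    return file.substr(start, end - start);
}
```

With this policy, `parse_shebang("#!/bin/sh\n...")` yields `/bin/sh`, while a 30kb perl include path never reaches its newline inside the window and is rejected rather than truncated.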
fkrauthan has quit [Quit: ZNC - https://znc.in]
fkrauthan has joined #osdev
fkrauthan has quit [Client Quit]
fkrauthan has joined #osdev
gildasio has quit [Ping timeout: 240 seconds]
gildasio has joined #osdev
<heat> clever, did perl just inject arguments when reading the shebang then?
orthoplex64 has joined #osdev
Ali_A has quit [Quit: Connection closed]
<clever> heat: something like that
<Clockface> does anyone here have a way to check if DOS is loaded and functioning
<Clockface> the program initially does not know if it was booted by the BIOS or loaded as a .COM file by DOS
<clever> Clockface: i think bios only loads the first sector, to 7c00 i think it was, while dos loads the entire .com file
<clever> so you could check the addr and if a magic# is present in the 2nd sector of the binary
<heat> try a dos interrupt and see if it works
<klange> clever: true for an MBR, not true for the channel's favorite meme: EL TORITO!
sikkiladho has quit [Quit: Connection closed for inactivity]
<geist> EL TORITO!
<heat> el tor
<heat> ito
<clever> klange: ah, i wasnt thinking about a bios with cdrom support
<heat> the best way to know you weren't booted by the BIOS is to get booted by UEFI
<heat> thank me later
wxwisiasdf has joined #osdev
<wxwisiasdf> ah well just threw autotools outta the window and now i am using make :D
<geist> huh the #! lore page was interesting
<heat> wxwisiasdf, tip: don't actually switch until you're sure that's what you want
<heat> rewriting build files is horrible
<heat> since I don't know what build system i'm actually going for, i'm keeping the current one for now
<heat> maybe i'll rewrite it bit by bit
<Clockface> i like the magic number
<Clockface> ill do that
<heat> but it doesn't work
<wxwisiasdf> no i already did it
<heat> that fast?
<wxwisiasdf> i just did it in one shot and it broke everything and my os now has like 2 ubsan bugs and stuff
<wxwisiasdf> but hey i am not using make
<wxwisiasdf> now*
<wxwisiasdf> fortunately i threw like a thousand assertions everywhere so hopefully i can get it back to working
<geist> well, thats still a bug for you to fix
<wxwisiasdf> yes :)
<klange> reason #532 to not use autoconf, it's too automagical and you have no idea what it's actually doing, so you have no control over your compiler
<geist> probably something to do with the order things got linked maybe (assuming the compile switches are the same)
<wxwisiasdf> worst part is that i can't use gdb because s390 is kinda sus when it comes to proper emulators
<wxwisiasdf> geist: it can be anything really - i've been told the z/arch compiler is pretty buggy
<geist> klange: oh you might be interested in this, someone submitted a patch to LK that does something i thought about but never actually tried. hypothetically it would maybe work on x86 too
<wxwisiasdf> i've already crashed the xtensa ld once :^)
<geist> basically when enabling the mmu and branching to the high kernel address
<geist> instead of having to have a unity mapped low mmu thing
<geist> set the VBAR (in the case of arm64) or the IDT to the high address, enable the mmu and then wait for the cpu to fault
<geist> then catch it
<klange> that sounds evil
<geist> i thought about it on arm64 before, but figured it'd be too risky to try
<geist> but they tried it and say it works fine on real hardware and emulator
<geist> x86 maybe would work too, though would have to think about it
<geist> it's probably technically UB though
<clever> what about prefetch and a couple opcodes having come from the phys space?
<heat> that's horrific and brilliant
<clever> maybe a `b .` to trap it in the physical domain, until it faults?
<geist> yah
<bslsk05> ​github.com: [arch][arm64] replace the trampoline translation table with a trampol… by pcc · Pull Request #327 · littlekernel/lk · GitHub
<geist> could also have a br to the virtual address just to be safe i guess
<geist> so you could i guess do the same thing on x86 by arranging for the IDT to point such that its #PF handler points to the next instruction in VA space
<geist> would have to point at a IDT entry that points to the next thing i guess
<geist> but wouldn't have to reserve a huge table, just enough to get to the 14th entry
<clever> yeah, this basically deletes the need for an identity mapping
<geist> right
<clever> and on the spectre/meltdown front, can the IDT change the paging tables upon fault?
<geist> i'm sure it's not the first time anyone has thought of it, but it does kinda simplify things if its safe to use
<geist> on x86-64 though you'd be basically simultaneously turning on the mmu, entering long mode, and faulting
<geist> so that's a real push
<clever> oh, nice
<clever> thats exactly what i originally joined #osdev for, lol
<clever> and i was cheating, by having qemu pre-create the paging tables for me
<clever> so with this trick, my asm just has to set the IDT addr, and then turn on mmu+long, and fault into the real _start
<clever> but that project has long been retired
<geist> yah though also. you probably want a temporary IDT etc
<heat> how did that qemu page table thing work?
<geist> in this case this PR above basically burns 1K of text for a temporary arm64 exception table which is grody
<clever> heat: i was modifying qemu, so it could run xen unikernels, with the xen hypercall api
<geist> but can probably arrange for it to point at something offset such that the VBAR's Nth entry goes where you want
<clever> heat: so i just populated the guest ram with a paging table, before the cpu came out of reset, and had a custom bios blob
<geist> but there are also alignment constraints for the arm vbar, etc
<clever> the idea being to spend as little time in real-mode as possible
<gamozo> Spending as little time in real-mode is probably one of my favorite hobbies
gamozo has quit [Quit: Lost terminal]
dude12312414 has joined #osdev
eryjus has quit [Remote host closed the connection]
eryjus has joined #osdev
gog has joined #osdev
sonny has joined #osdev
pretty_dumm_guy has quit [Quit: WeeChat 3.5]
terrorjack has quit [Quit: The Lounge - https://thelounge.chat]
terrorjack has joined #osdev
sonny has quit [Remote host closed the connection]
nanovad has quit [Ping timeout: 250 seconds]
nanovad has joined #osdev
smeso has quit [Quit: smeso]
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
smeso has joined #osdev
<klys> yipe a new unikernel
gog has quit [Ping timeout: 272 seconds]
<energizer> did anything end up happening with unikernel linux? i thought that was a good idea
zaquest has quit [Remote host closed the connection]
IT_WAS_ALIENS has quit [Quit: THE RAM IS TOO DAMN HIGH]
Burgundy has joined #osdev
Likorn has joined #osdev
RAMIII has quit [Ping timeout: 260 seconds]
zaquest has joined #osdev
Burgundy has quit [Ping timeout: 276 seconds]
troseman has quit [Ping timeout: 260 seconds]
DonRichie has quit [Quit: bye]
gamozo has joined #osdev
DonRichie has joined #osdev
<heat> how are you supposed to pick the number of queues and queue depth for an nvme device?
<heat> like what's the heuristic
<geist> number of queues may be more based on number of cpus than anything else
<geist> since i think it's common to at least run some number in parallel, per cpu, up to some point
<heat> and the queue depth?
<geist> i dunno that's a good question
<geist> like a lot of these things, probably just a good guess, with some ability for the sysadmin to adjust it possibly
<heat> do you just allocate a page by default? do you allocate the whole thing (the whole thing may be too much, 4MB for a single queue)
<geist> probably something more reasonable, like say 256 or 512 or so entries
<geist> or a single page yeah
<heat> when I'm done with mine I should rework fuchsia's driver
<heat> it's very limited
<geist> funny you say that, yes yes it is
<geist> and i think we need someone to work on it
<heat> single queue, single page for each queue
<geist> i literally have a machine coming in on fedex tomorrow because it is known to not work with our nvme driver
<geist> i said i'd take a look at it
<heat> ooh
<heat> do you have logs?
<heat> the driver looks fine from the spec's POV
<heat> the only totally wrong thing I found is that the timeouts may technically be too short, but 5s should still be enough
<geist> it fails some transaction early on and then falls over
<geist> i dont have them handy
<geist> it gets an unhandled error from the device i think
<geist> the device is some sort of cheapo hynix thing i think
<gamozo> Weird, it works with Linux or some other environment?
<heat> well if you need help you know where to find me on the interwebz
<geist> heat: yah it's just on my work computer which is there and not here
<heat> it is indeed not in two locations at once
<wxwisiasdf> how do i tell ubsan that a NULLPTR i am writing to is okay?
<heat> you don't, you can't do that
<heat> you remap the page and try to write there
<wxwisiasdf> rip
<heat> you could technically try to fool the compiler but the compiler is smart
<heat> so maybe do it in assembly I guess
<wxwisiasdf> oh of course gcc is likely x1000 super smarter - i guess having an asm glue won't hurt too much
Likorn has quit [Quit: WeeChat 3.4.1]
No_File has joined #osdev
<No_File> Good Morning!
No_File has left #osdev [#osdev]
<geist> okay!
<geist> morning!
heat has quit [Ping timeout: 240 seconds]
<mrvn> Never expect hardware to actually follow the specs
<gamozo> Morning @No_File!
<Mutabah> <<No such user `No_File`>>
<gamozo> RIP
<gamozo> Too used to discord at this point I guess
<gamozo> Haven't been on IRC in years
<geist> okay, stuffed in an old first gen ryzen in the server
<geist> see if it is stable now
<geist> if it is, that doesn't mean much, because this cpu draws less power
<geist> so its possible it'll not stress out the VREGs as hard
<geist> and thus is stable
wxwisiasdf has quit [Ping timeout: 240 seconds]
nanovad has quit [Quit: ZNC 1.7.5+deb4 - https://znc.in]
nanovad has joined #osdev
sikkiladho has joined #osdev
<sikkiladho> How can one implement PSCI_CPU_ON at the hypervisor for secondary cpus? The code in Trusted Firmware-A is a lot too complex to replicate. Any examples and docs would be great.
gxt has quit [Ping timeout: 240 seconds]
<clever> sikkiladho: to start with, you need to gain control of the other cores, via whatever mechanism the platform supports, dont even bother looking at PSCI until you have your code running on all 4 cores
<clever> all in hypervisor mode, with the mmu configured the same way
<clever> if you choose to run under the ATF, then you send it a normal PSCI, if you choose to run with the official arm stub then you poke the spintables and sev
gxt has joined #osdev
<sikkiladho> what if I've just booted up and other cores are in reset(or any platform-specific mode for RPi4)? Can I implement PSCI at hyp level to bring up secondary cores?
<clever> the job of PSCI is to convert the platform specific stuff into a standard api
<sikkiladho> So it's possible with spin-tables and not PSCI. I think they're different?
<clever> in the case of the rpi4, coming out of reset, all 4 cores just execute whatever is at PC=0
<clever> and you have no way to wake a core up after it has died
<clever> for the pi4, the job of ATF or a hypervisor, is to ensure a core never actually dies, and just sits in an idle loop, waiting for an inter-core message
<clever> when using the official arm stub, 3 of the cores will park themselves, and wait for an addr in the spintables
<sikkiladho> and with ATF-A?
<clever> ATF will gain control of the cores (probably by living at addr 0) on startup, and then it will park 3 of them in its own idle loop
<clever> and wait for a message from itself (sent by core0, in reaction to a PSCI cmd)
<clever> so when your hypervisor on core0 sends a PSCI command to wake core1, that just acts as a function call into ATF, forcing a switch into EL3
<clever> ATF then sends an IPI interrupt to core1, to wake the ATF thread on core1
<clever> core1 then reads the message, and executes your code in EL2 on core1
<clever> and you need to do the same when implementing a hypervisor
<sikkiladho> Thank you, I got it. Secondary cores are in control of ATF so the SMC must be forwarded to EL3(ATF).
<clever> but you cant just blindly forward the SMC
<clever> you must first gain control of those cores in hypervisor mode
<clever> and then setup the guest, the same way you did on core0
<sikkiladho> Yeah, I would trap the smc and replace the entry-point address with my own, so that core1 jumps to my address.
<sikkiladho> and preserve the one sent by linux of course.
<sikkiladho> So I should gain control of the CPUs before loading linux.
<geist> i think in general it's assumed that if you're building a hypervisor, it's a full SMP system
<geist> so basically the first thing the hypervisor needs to do is bring up the secondary cores and make them part of the hypervisor itself
<geist> you say ATF is hard to replicate, well a hypervisor is much more sophisticated
<geist> since usually they're more or less a full kernel
<geist> so really, i ask, what are you trying to do here?
nyah has joined #osdev
<sikkiladho> @geist thank you. I will try to get control of secondary cpus, before setting up the guest.
<sikkiladho> I think ATF was hard to replicate because it's for multiple platforms and my hypervisor right now is very simple, but I don't think I have to replicate it in this case. thank you.
<sikkiladho> @geist I'm building a simple hobby hypervisor for rpi4 which just loads a single linux kernel and sits underneath. At first, that's it.
<geist> ah
<geist> well in that case you'll have to be prepared for hypervisor traps from each of the cpus, so though you may not be implementing a complex hypervisor you'll probably need to implement some amount of locking or whatnot internally
<geist> so in that respect you'll have to handle effectively a SMP hypervisor, even if it's very simple
<clever> personally, i would just use LK as a base
<clever> modify the mmu code to support running in EL2 instead of EL1
<clever> and then use a core-pinned thread for each guest core
<clever> whenever the LK scheduler thinks it can, it will run that thread, which will then drop down to EL1 and run the guest
<clever> and when the guest throws an exception/smc, control returns back to that thread in EL2/lk
<clever> if you want a second guest, just spin up more threads, and let the LK scheduler deal with it
<clever> pre-empting a guest? ensure timers can force a switch back to EL2!
<clever> geist: does that all seem sound?
<clever> hardest part i can see, is just having an LK thread "resume" after it dropped to EL1, like the drop had simply returned
<geist> yah i think that'd be pretty doable
<clever> it also loosely reminds me of the linux kvm api
<clever> where you just have a "run the guest" ioctl
<clever> and when anything goes wrong and the kernel cant deal with it (hypercalls, faults), the ioctl returns, and your code is left to deal with it
<geist> yah i have always thought that'd be a fun project
<geist> just build a pure type 1 hypervisor and run other stuff in it
<clever> you could similarly implement ATF, by just modifying LK to run in EL3
<geist> that's my way to assert dominance here: run everyone's hobby OS under mine
<clever> but i think EL3 is mmu-less?
<geist> no it has its own, it just doesn't nest
puck has quit [Excess Flood]
<clever> ah
puck has joined #osdev
<clever> so you would just have to modify the mmu code to support running under EL3/EL2/EL1, and to not drop to EL1 immediately
<clever> and then compile-time configure what EL you want it to drop to and run under
<geist> yah the hard part is there is a bunch of code that accesses _EL1 fairly explicitly, so would have to at least macroize that stuff
<geist> right
<clever> and then normal thread/app stuff can deal with running guests at lower levels
<clever> and setting up secure vs non-secure guests
<geist> also IIRC EL3 and EL2 MMUs are funny: they only map the bottom part (ie, one of the two TTBRs) *Except* if you have a core that supports the EL2 extensions
<geist> so implicitly if you're EL3 or EL2 only on v8.0 you're limited to bottom half mmu
<clever> so you would have to change the kernel base
<geist> right
<clever> and change it into using TTBR0 for the kernel
<geist> right
<clever> i think lk always uses TTBR1
<clever> because it assumes its in the high half, and leaves 0 free for a userland
<geist> right
<clever> i think access to _EL1 regs will also work from EL2/EL3?
<clever> because the hypervisor/tf may want to modify EL1 state
<clever> so you cant rely on faults to tell you when you're using the wrong regs
<mrvn> One more reason to run a lower half kernel / higher half user :)
<clever> and would have to audit the output asm
<mrvn> Where user in this case would be the linux kernel
<geist> sure
<clever> mrvn: oh, random thought, a high half userland, means that null pointers are "safer", even with a +3gig offset, lol
<geist> clever: that's right (re: _EL1 access)
<clever> you would need a massive positive offset, for it to clear over the kernel, and hit userland
<geist> but yeah user space being at the bottom is a pretty standard scheme now
<geist> usual reasons
<geist> and then some arches codify it
<geist> but not the general modern ones
<clever> and thinking about it a bit, more from a malicious angle
<clever> if i wanted the hypervisor to hide itself from a linux guest
<mrvn> AArch64 seems to codify: hypervisor = lower, kernel = higher, user = lower
<clever> i would need to block access to a region of memory where the hypervisor lives, and maybe mess with dma controller commands, to stop you from using dma to peek behind the hypervisor mmu
<mrvn> clever: you can just swap address spaces when entering the hypervisor and put it anywhere
<mrvn> short of that little change-address-spaces stub
<mrvn> I think you pretty much have to do it that way on 32bit.
<clever> using nested paging tables, i should be able to ban linux from reading a 1mb chunk of ram
<clever> but i could map that to some other address, to make it less obvious
<mrvn> no nested tables in hpyervisor mode
<clever> isnt that the whole point of hypervisor mode, so you can run the kernel under a second set of tables?
<mrvn> ahh, sorry, yes, linux would be nested
<clever> EL2 sets up the nested tables, EL1 sets up its own tables, and now all translations go thru both EL1 and EL2's tables
<clever> and EL2 can use that to hide the hypervisor from linux
<mrvn> nod
<clever> at which point, how can linux detect the hypervisor?
<mrvn> only by trying to use some address space and it not working
<clever> what if i map the hypervisor's address to some other part of ram
<clever> so that 1mb block shows up at 2 addresses
<mrvn> what you would see is that you have an odd ram size.
<clever> and both are within the "no touchy" zone declared by the rpi firmware
<clever> which is already stealing 24mb of ram
<clever> i can just boot with gpu_mem=23, and now the firmware only steals 23mb
<clever> then take the extra 1mb for my hypervisor
<mrvn> yep, that hides it well
<clever> and the ram size is just as odd as without the hypervisor
<clever> the only sign it happened, is that a 1mb chunk of that 24mb "dont look here" is duplicated
<mrvn> can you ask the VC for it's ramsize?
<clever> on pi4, that is permanently pegged at 1024
<clever> the VC is only aware of the lower 1gig
<mrvn> I mean the gpu_mem
<clever> you can, but i could just hook those routines...
<clever> but, you just gave me a crazy idea
<clever> i could live inside the gpu_mem's heap!
<mrvn> there are probably some follow up problems if you mess with that
<clever> using mailbox functions, i can allocate say a 1mb object on the VC's heap
<clever> and then i can copy my hypervisor into that
<clever> now it really is "in use" by the firmware!
<mrvn> yeah, maybe better. And you could display the hypervisor memory as graphics output for fun
<clever> already tried that in another crazy idea, i wanted to dump the bootrom on the framebuffer, without bringing ram online :P
<clever> but i think the framebuffer cant be too close to 0 in ram
<clever> "ram"
<clever> so i have to bring ram online, to address further away from 0 and have it function
<mrvn> isn't the bootrom some secure memory that the graphics chip wouldn't be able to access?
<clever> much like the gameboy and xbox, its just a normal axi slave, until you set a magic flag, then it drops off the bus and that addr becomes ram
<mrvn> does the RPi4 have the secure extension?
<clever> the secure extensions in the ram controller, are wired into the VC, not the arm
<clever> so only the VC in secure mode, can access protected pages
<clever> the official firmware runs in non-secure mode by default, and has an array of trusted functions that can be ran in secure mode
<clever> and a syscall like api, to run a function by index
<clever> secure_fn_0 is used as an index lookup, you give it a function pointer, and it returns the index into that array
<clever> that index is then stored under this->fn_foo_index, and later used to call it
<clever> the VC has a 128 slot vector table, 32 slots for cpu exceptions, 32 slots for software interrupts (like int 0x80), and 64 slots for hw interrupts
<clever> each slot is just a PC to jump to, but bit0 of the value signals if the vector should be serviced in the current mode or secure mode
<clever> so storing `&irq_uart | 1` into a slot, causes the irq handler to be ran in secure mode
<clever> and the same for software interrupts
the_lanetly_052 has joined #osdev
<clever> each core (there are 2) also has a register for the base addr of that vector table (much like arm's VBAR)
<clever> alignment is enforced by the register simply not storing the lower bits, so if you read it back, its been rounded down to the nearest alignment
xenos1984 has quit [Read error: Connection reset by peer]
<clever> mrvn: nested paging tables are also taken to another level on the rpi, there is an extra mmu between "arm physical" and real ram, 64 pages of 16mb each
<clever> so you can potentially be going thru 3 paging tables, EL1, EL2, broadcom
<clever> the broadcom mmu is applied outside of the arm l1/l2 caches, so a cache-hit wont have any perf cost
graphitemaster has quit [Ping timeout: 260 seconds]
graphitemaster has joined #osdev
xenos1984 has joined #osdev
eau has quit [Quit: bleh!]
No_File has joined #osdev
sympt0 has joined #osdev
sympt has quit [Ping timeout: 252 seconds]
sympt0 is now known as sympt
diamondbond has joined #osdev
GeDaMo has joined #osdev
kingoffrance has quit [Ping timeout: 240 seconds]
sprock has quit [Quit: brb]
sprock has joined #osdev
gog has joined #osdev
the_lanetly_052_ has joined #osdev
sikkiladho has quit [Quit: Connection closed for inactivity]
the_lanetly_052 has quit [Ping timeout: 260 seconds]
eau has joined #osdev
kingoffrance has joined #osdev
Vercas9 has joined #osdev
Vercas has quit [Ping timeout: 240 seconds]
Vercas9 is now known as Vercas
<ddevault> can someone explain what the %gs register is for
<ddevault> I am utterly failing to understand its (apparently important) purpose
<GeDaMo> Thread local storage?
<gog> yes, typically %gs contains the base address for the thread's local data
<gog> %fs and %gs
<ddevault> hm
<gog> this was the convention before and since a few CPU generations ago is supported by CPU instructions
<gog> before amd64 thread-local storage was managed with the GDT, now it's managed with a pair of MSRs
<ddevault> I see
diamondbond has quit [Remote host closed the connection]
No_File has quit [Quit: Client closed]
<klys> global segment
nyah has quit [Quit: leaving]
jimbzy has quit [Ping timeout: 240 seconds]
gog has quit [Ping timeout: 248 seconds]
vinleod has joined #osdev
wand has quit [Remote host closed the connection]
vdamewood has quit [Ping timeout: 240 seconds]
wand has joined #osdev
nyah has joined #osdev
ZetItUp has joined #osdev
gog has joined #osdev
<mrvn> can one apply __attribute__((__packed__)) to a template<typename T>? gcc always says it will ignore it.
gog has quit [Ping timeout: 272 seconds]
Vercas has quit [Quit: Ping timeout (120 seconds)]
wand has quit [Ping timeout: 240 seconds]
gildasio has quit [Ping timeout: 240 seconds]
gxt has quit [Ping timeout: 240 seconds]
vdamewood has joined #osdev
the_lanetly_052_ has quit [Ping timeout: 276 seconds]
vinleod has quit [Ping timeout: 276 seconds]
heat has joined #osdev
<bauen1> mrvn: some code i have here says you can, at least gcc (10, 11, 12) isn't complaining
<bauen1> code in question is roughly: `template <typename T> struct [[gnu::packed]] Timed { T value; }`
<mrvn> bauen1: is doesn't complain, it just ignores it. Check sizeof()
<mrvn> or it complains that it will ignore it
<mrvn> bauen1: your Timed is packed but T is not packed. So overall you just changed the alignment to 1 and broke T.
<mrvn> Try struct T { char c; int i; }; the value is not packed.
<bauen1> oh i hate c++
<mrvn> The problem might be that the [[gnu::packed]] needs to be between "struct" and "Name" in the T.
<bauen1> wtf
<bauen1> no, what, https://godbolt.org/z/Pecr3hsE3 seems to work
<bslsk05> ​godbolt.org: Compiler Explorer
<mrvn> bauen1: don't forget that packed isn't recursive. A struct in a packed struct is not itself packed. You have to apply the attribute to every sub struct too.
<mrvn> yes, packing S works, but packing Timed doesn't pack the inside T.
<bauen1> oh, i don't think that will be a problem, the code that cares about packed static_asserts that alignof(T<...>) == 1
<mrvn> I even tried this: <source>:4:34: warning: attributes ignored on elaborated-type-specifier that is not a forward declaration [-Wattributes] 4 | template <struct [[gnu::packed]] T>
<mrvn> ahh, I didn't think of you asserting it's packed. thanks.
<mrvn> well, packed or only contains chars
<mrvn> s/you/your/
<bauen1> mrvn: i have written a header that asserts all kind of weird things to ensure a struct can be passed between 2 platforms without issues, except you can't entirely ensure that as someone can always 1. forget to add a static_assert(sizeof() = x) on their struct and use types that actually have a different size, e.g. `long int`
<mrvn> Anyone know what the state of introspection is for c++? Could one use that to recursively generate a "struct [[gnu::packed]] PackedT" from any given T?
<bauen1> it starts with:
<bauen1> static_assert(CHAR_BIT == 8, "Please use a reasonable platform");
<mrvn> hehe
<mrvn> You must be happy that int is now two's complement.
<mrvn> till recently you could only share unsigned types and intX_t.
<bauen1> lol we're sharing floats and doubles here ...
<heat> c++ is a prime example of stockholm syndrome
<mrvn> uhoh. what about archs without denormalized doubles?
<heat> fuck em
<bauen1> there's also some really shitty stub headers so i can compile the microcontroller firmware for linux, and get all the offsets of struct members and some other information exported into JSON
<mrvn> On alpha doubles aren't even ieee unless you add a gcc flag that makes it run half speed.
<bauen1> writing a program to use libclang was also considered, but as far as i could see libclang works on the AST and not on e.g. the final struct layout / values
<mrvn> In the future you can do that with introspection.
<bauen1> mrvn: in the future there will be rust ...
<mrvn> can rust already do introspection?
<bauen1> mrvn: not sure, but it has macros / the derived-thingy that would make this exact thing a lot easier to build i think
<heat> bauen1, you could definitely use clang libraries to do that
<heat> clangd already knows sizes and alignments and whatnot
<bauen1> heat: libclang seems to only operate on the AST, or at least I couldn't figure out how to find a list of all struct types in the entire project that fulfill certain criteria (e.g. passed to template, passed to function)
<bauen1> heat: problem is that all of this probably involves a few too many layers of templates :(
Vercas has joined #osdev
<mrvn> There is no problem that can't be made more magic by the use of more templates.
Mikaku has quit [Excess Flood]
Mikaku has joined #osdev
wand has joined #osdev
gildasio has joined #osdev
gxt has joined #osdev
nick64 has joined #osdev
Gooberpatrol_66 has joined #osdev
ptrc_ has joined #osdev
paulbarker_ has joined #osdev
psykose_ has joined #osdev
sham1_ has joined #osdev
mxshift_ has joined #osdev
seds_ has joined #osdev
merry_ has joined #osdev
dminuoso_ has joined #osdev
MiningMa- has joined #osdev
mcfrd has joined #osdev
shikhin_ has joined #osdev
night_ has joined #osdev
tomaw has quit [Killed (mercury.libera.chat (Nickname regained by services))]
froggey-1 has joined #osdev
dzwdz1 has joined #osdev
lg_ has joined #osdev
tomaw has joined #osdev
shikhin has quit [Killed (NickServ (GHOST command used by shikhin_))]
shikhin_ is now known as shikhin
kanzure_ has joined #osdev
dzwdz has quit [*.net *.split]
psykose has quit [*.net *.split]
hgoel[m] has quit [*.net *.split]
Gooberpatrol66 has quit [*.net *.split]
ptrc has quit [*.net *.split]
lg has quit [*.net *.split]
MiningMarsh has quit [*.net *.split]
kanzure has quit [*.net *.split]
mcfrdy has quit [*.net *.split]
merry has quit [*.net *.split]
dayimproper has quit [*.net *.split]
dminuoso has quit [*.net *.split]
mxshift has quit [*.net *.split]
froggey has quit [*.net *.split]
sham1 has quit [*.net *.split]
seds has quit [*.net *.split]
night has quit [*.net *.split]
warlock has quit [*.net *.split]
paulbarker has quit [*.net *.split]
unlord has quit [*.net *.split]
ptrc_ is now known as ptrc
mcfrd is now known as mcfrdy
psykose_ is now known as psykose
MiningMa- is now known as MiningMarsh
merry_ is now known as merry
mxshift_ is now known as mxshift
paulbarker_ is now known as paulbarker
seds_ is now known as seds
dayimproper has joined #osdev
No_File has joined #osdev
_xor has joined #osdev
hgoel[m] has joined #osdev
_xor has quit [Client Quit]
gog has joined #osdev
Likorn has joined #osdev
No_File has quit [Quit: Client closed]
dude12312414 has joined #osdev
kanzure_ is now known as kanzure
joe9 has joined #osdev
blockhead has joined #osdev
Likorn has quit [Quit: WeeChat 3.4.1]
froggey-1 is now known as froggey
diamondbond has joined #osdev
night_ has quit [Quit: goodbye]
night has joined #osdev
No_File has joined #osdev
dude12312414 has quit [Remote host closed the connection]
nick64 has quit [Quit: Connection closed for inactivity]
mahmutov has joined #osdev
Likorn has joined #osdev
sham1_ is now known as sham1
rustyy has quit [Quit: leaving]
rustyy has joined #osdev
gog has quit [Ping timeout: 272 seconds]
<geist> ddevault: are you using x86-64 or x86-32?
<ddevault> the former
<geist> also ugh, was responding to something 8 hours ago
<geist> oh looks like no more discussion was on it
<ddevault> I still don't fully understand %gs, but I don't really need to right now
<geist> so yeah as gog was saying gs: is largely vestigial
<ddevault> well, I understand what it *was* for
<geist> basically the *value* in gs (and fs) is irrelevant now
<ddevault> but I don't really understand what kernels still do with it
<ddevault> in any case, my code works so I'm happy enough
<geist> but you can use an override prefix to dereference something off it
<geist> ie
<geist> mov gs:4, rax or something like that
<ddevault> hm
<geist> basically take the address that is 4 off of what gs 'points to' and move into rax
<geist> and that's accomplished in the assembler via a segment override prefix byte
<geist> the way gs (and fs) 'point to' something in x86-64 is not via the GDT like it used to, but via a set of MSRs you can set
<ddevault> I see
<geist> GS_BASE FS_BASE and GS_KERNEL_BASE
<geist> *basically* it's used for thread local storage in user space. traditionally fs points to the thread local structure
<geist> and in the kernel GS usually points to something similar. a cpu specific data structure
<mrvn> iirc on x86 the use of fs/gs is reversed
<geist> in an SMP system you always want to have at least one per-cpu structure that you can anchor things off of
<geist> so it's traditional (and kinda baked into the arch in 64bit) that gs points to that inside the kernel
<geist> on non SMP it isn't really mandatory
<ddevault> I understand, that makes more sense now
<ddevault> I was not grokking that the use by convention differed from the use per the CPU manual
<ddevault> thanks :)
<geist> yah the manual wont really describe what it's for, just the mechanism
<geist> this is also where GS_KERNEL_BASE and GS_BASE and swapgs will start to make sense
<ddevault> what confused me is that it had a much more important purpose before
<ddevault> so all of the docs cover it in great detail regarding its legacy use
<geist> which initially is head scratching, but if you have both the kernel and user space use GS, those features start to make sense
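The fs-based TLS convention geist describes is visible from ordinary userspace code: on x86-64 Linux a `thread_local` variable is addressed relative to fs, whose base comes from the FS_BASE MSR rather than the GDT. A minimal sketch (names are mine, not from any kernel):

```cpp
#include <cassert>
#include <thread>

// On x86-64 Linux the compiler emits fs-relative addressing for this
// variable (e.g. `mov %fs:tls_counter@tpoff(%rip), %eax`); each thread
// sees its own copy. The *value* in fs is irrelevant -- only the base
// set via the FS_BASE MSR (or the fsgsbase instructions) matters.
thread_local int tls_counter = 0;

int bump_and_read() {
    // Touches only the calling thread's copy.
    tls_counter += 1;
    return tls_counter;
}
```

A kernel does the same trick with gs and GS_BASE/GS_KERNEL_BASE to anchor its per-cpu structure.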
<mrvn> ddevault: You mean as an actual segment descriptor?
<geist> wasn't so much important as fs and gs were just another one of the regular segment registers then (ds, es, fs, gs, ss, cs)
<ddevault> yeah
<geist> and protected mode segment stuff was somewhat more powerful
<mrvn> it's been repurposed since all segment start/limit is ignored in 64bit.
<ddevault> yeah
<geist> exactly, so in 64bit the other 4 registers are basically entirely vestigial (except cs signalling what mode you're in)
<ddevault> but I still saw kernels in the wild messing with it
<ddevault> so I was a bit unsure as to why they were bothering and if it was important
<mrvn> What is surprising is: what's up with "es"? Why isn't that used?
<geist> but they left some functionality in fs/gs, but indirectly (via the MSRS) or via the new instructions to let you set them directly (fsgsbase instructions)
<geist> mrvn: anymore or at some point?
<mrvn> geist: in 64bit mode
<geist> oh i guess AMD basically left in the bare minimum
<mrvn> Is the "es" prefix worse than fs/gs?
<geist> also es has some hard coded uses in some instructions, so i'm guessing they left it alone for that reason
<geist> otherwise you'd have to also modify those instructions to not use it, etc
<mrvn> ahh, that would explain it. Stripping it out of the instructions for 64bit mode would be complex.
<geist> yah iirc movs implicitly uses es for one of the sources? (amirite there?)
<GeDaMo> Destination, I think
<geist> or destination
<geist> yah
<geist> i dunno how the segment override prefixes work with movs. which side does/can it modify?
<GeDaMo> "For legacy mode, Move byte from address DS:(E)SI to ES:(E)DI. For 64-bit mode move byte from address (R|E)SI to (R|E)DI."
heat has quit [Ping timeout: 248 seconds]
<GeDaMo> Doesn't seem to apply in long mode
<geist> GeDaMo: yeah or it does implicitly use ds/es except those have no offset/length so effectively it disables it
<geist> also interesting question: can you use fs or gs override prefix for it in 64bit mode
<GeDaMo> "The DS segment may be overridden with a segment override prefix, but the ES segment cannot be overridden."
<bslsk05> ​www.felixcloutier.com: MOVS/MOVSB/MOVSW/MOVSD/MOVSQ — Move Data from String to String
<geist> there ya go. makes sense
<geist> or at least doesn't really make sense, but thats the answer!
<geist> as is lots of x86isms
<mrvn> If an opcode uses two segment registers then you can't override both of them. There is no "override the other segment" prefix byte.
<geist> right
<geist> that's the at least internally consistent part of it
<geist> and outside of movs i dont think too many other instructions access two pointers at the same time
<geist> i'm sure there's some other one somewhere (there always is) but i don't know of it offhand
<zid> does push [] count
<zid> also uses two selectors
<zid> (does that even exist?)
<geist> push indirectly? i dont think so
<bslsk05> ​www.felixcloutier.com: PUSH — Push Word, Doubleword or Quadword Onto the Stack
<geist> and yeah indirects or double indirects may reference more than one thing but i dont remember if x86 has a bunch of those
<mrvn> memory, register or immediate. The first would use 2 segments.
<geist> ie, indirect this and then use that word to then indirect something else
brynet has quit [Quit: leaving]
<mrvn> geist: mov (#1, ds:r2*4, es:r3), (#3, fs:r4*8, gs:r5) to the rescue.
<geist> hmm?
<mrvn> a hypothetical 4* indirect addressing opcode
<geist> ah
<geist> i was expecting you to show off a 68k opcode that does this no sweat :)
<mrvn> no braindead segments in m68k :)
<mrvn> any idea how m68k does TLS?
<geist> though i gotta say x86 limiting themselves to one memory deref per instruction in most cases really does make the microcode simpler
<geist> 68k and vax have fairly complex internal states to make sure that page faults or whatnot on the Nth operation can be unwound and restarted
<geist> good question re 68k TLS
<mrvn> They actually screwed that up in the 68020. can't recover from a bus error so they run 2 68020 (iirc) in parallel with a clock offset. If the first throws a bus error the second gets stopped before it becomes unrecoverable.
<mrvn> What a way to unwind an opcode on error
<geist> 68000 IIRC. 010 fixed that among other things
<mrvn> How is your m68k board?
<geist> from poking around the web i've seen a few references to sysv 68k abi just not having thread local storage. have to make a syscall in linux
<geist> it's doing fine, need to futz with it some more
brynet has joined #osdev
pretty_dumm_guy has joined #osdev
bliminse has quit [Quit: leaving]
No_File has quit [Quit: Client closed]
jimbzy has joined #osdev
wxwisiasdf has joined #osdev
GeDaMo has quit [Quit: There is as yet insufficient data for a meaningful answer.]
<wxwisiasdf> hello
<wxwisiasdf> how do i tell gcc to interpret printf formats with the -fexec encoding
<wxwisiasdf> i get lots of spurious warnings because i am using -fexec-charset=ibm-930, and it's very annoying because i basically have to rely on not messing up format strings in the kernel
<mrvn> no c++?
<wxwisiasdf> no it's c
<mrvn> maybe you should start there :)
<wxwisiasdf> ???
<wxwisiasdf> oh i see using automatic type deduction for formatting from c++
<mrvn> std::format
<wxwisiasdf> yeah
<wxwisiasdf> but this a kernel :)
<mrvn> even more reason to have it type safe
<Griwes> Idk what this being a kernel has to do with anything, my kernel formats stuff with std::format :P
Gooberpatrol_66 has quit [Quit: Leaving]
<mrvn> Griwes: with or without type erasure?
<Griwes> It's per std::format spec
<wxwisiasdf> okay if not the kernel then my libc also uses printf for the various *nix utilities
<mrvn> type erasure is an implementation improvement
<Griwes> Which type erasure
<Griwes> As the spec stands, you need to erase some argument types and you need to erase the iterator
<wxwisiasdf> and those are -fexec-charset ibm930 too
<mrvn> can't remember exactly but it reduces the code bloat
<Griwes> The iterator erasure, then
<Griwes> It was DR'd to be effectively required
Gooberpatrol66 has joined #osdev
mctpyt has joined #osdev
<Griwes> The thing that reduced most code bloat for me was a very careful dance of force inlining just the correct things
<mrvn> basically everything before the type erasure and nothing after
<Griwes> not... *quite*
<Griwes> it was a bit more involved
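The type-safety mrvn and Griwes are after comes from the argument types being visible at compile time. This is not std::format itself, just a minimal hedged sketch of that idea (`format_cat` and `fmt_cat` are hypothetical helpers, not any real API):

```cpp
#include <sstream>
#include <string>

// Base case: nothing left to append.
inline void fmt_cat(std::ostringstream &) {}

// Each argument is formatted via operator<< resolved against its static
// type -- no %-specifier to get wrong, so no charset-mismatch warnings.
template <typename T, typename... Rest>
void fmt_cat(std::ostringstream &out, const T &first, const Rest &...rest) {
    out << first;
    fmt_cat(out, rest...);
}

template <typename... Args>
std::string format_cat(const Args &...args) {
    std::ostringstream out;
    fmt_cat(out, args...);
    return out.str();
}
```

std::format adds a parsed format string, argument erasure, and the iterator erasure discussed above on top of this basic mechanism.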
diamondbond has quit [Quit: Leaving]
mctpyt has quit [Ping timeout: 260 seconds]
joe9 has quit [Quit: leaving]
heat has joined #osdev
<heat> sup noobs
<jimbzy> Messing around with a schematic. Und du?
<heat> nothing, just got home
<heat> i'll probably try to finish my nvme driver tonight
<jimbzy> Schweet
<mrvn> tell me more
<mrvn> ups
<heat> tell me less
<bslsk05> ​'Grease - Summer Nights HD' by Kurt Harmsworth (00:04:01)
joe9 has joined #osdev
diamondbond has joined #osdev
joe9 has quit [Quit: leaving]
wxwisiasdf has quit [Ping timeout: 276 seconds]
heat_ has joined #osdev
heat has quit [Read error: Connection reset by peer]
heat_ is now known as heat
<geist> heat: looks like someone debugged the nvme fuchsia driver problem
<heat> oh man they ruined the fun :/
<heat> what was it?
<geist> some assumption the driver had about something. will check in a sec
<geist> not at work computer
<geist> iirc it was something like the driver assumes you can build a queue this long but the device NAKed it
<heat> aha
<heat> IO queue right?
<heat> (not admin)
<geist> i'll have to check
<geist> at work right this sec
<heat> yeah i think it must be
<heat> unless the nvme is buggy
<heat> the queue limit they give you only applies to the io queue
<heat> btw I found out why PRPs and SGLs both exist
<heat> SGLs weren't a thing on spec 1.0
mahmutov has quit [Ping timeout: 256 seconds]
<heat> it can also explicitly not support SGLs
<heat> so you do need to support both PRPs and SGLs in your driver (yay complexity!)
archenoth has quit [Read error: Connection reset by peer]
<geist> ah that makes sense. i was expecting that SGLs are optional
<geist> so then the question is what subset of consumer hardware supports it
<geist> i was thinking this was similar to the complex descriptors in SDHCI which has a similar thing (in spirit)
<geist> ie a simple scheme that everything supports and the complex one
<geist> which effectively means the simple one is the one you worry about
_xor has joined #osdev
<geist> and the other one is gravy that maybe you can use
<heat> linux does seem to use SGLs by default since they're probably faster
<geist> oh sure, favor the fancy thing but fall back
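The PRP scheme being discussed is simple enough to sketch: PRP1 may carry a page offset, while every subsequent entry must be a page-aligned physical address. This is an illustrative sketch (names are mine, not from Onyx, Linux, or any real driver), covering only the physically-contiguous single-buffer case; a real driver also chains PRP list pages and falls back from SGLs on controllers that don't support them:

```cpp
#include <cstdint>
#include <vector>

constexpr uint64_t kPageSize = 4096;

// Returns the PRP entries covering the physical range [phys, phys + len).
std::vector<uint64_t> build_prps(uint64_t phys, uint64_t len) {
    std::vector<uint64_t> prps;
    prps.push_back(phys);                          // PRP1 keeps its offset
    uint64_t next = (phys & ~(kPageSize - 1)) + kPageSize;
    uint64_t end = phys + len;
    for (; next < end; next += kPageSize)
        prps.push_back(next);                      // later entries aligned
    return prps;
}
```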
archenoth has joined #osdev
<heat> see, this is where I wonder if a buddy allocator is really the best choice for a page allocator
<heat> more contiguous memory = better
<heat> if you try to do SGLs on really fragmented memory you'll end up with basically a larger PRP
<heat> like how much are you actually paying at page alloc time vs all the speed ups you can go for
<heat> hugepages too
<geist> yah agreed re buddy allocator
<geist> i'm not a huge fan, but i think to be honest it's because i'm not a fan of doing whatever linux does because they are
<gamozo> Mornin everyone!
<heat> sup gamozo
<gamozo> I do 5 hours of yard work and now apparently I need to sleep until 2pm
<gamozo> Ahaha
<heat> geist, what did your other projects use?
<geist> for pmm? the queue
<geist> just a queue of pages in whatever order
<heat> ah just the simple list?
<geist> yep. zircon does too
<geist> hard to argue with O(1)
<heat> idea: buddy allocator as the backend, cache of memory regions as a percpu thing
<j`ey> a single queue of PAGE_SIZE's?
<heat> yes
<geist> maybe multiple queues for different numa nodes, etc but the idea is the same
<geist> carve off a struct per page and toss it in a list
<geist> works quite well, lots of large systems have survived on that
<heat> if you percpu cache it, you eliminate lock contention and stop any possible yo-yo of regions when alloc/freeing
<geist> it just hurts to allocate more than one contig page
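The "queue of pages" pmm geist describes fits in a few lines: one struct per physical page, threaded through an intrusive freelist, with O(1) alloc and free. A sketch with illustrative names (not zircon's actual code):

```cpp
#include <cstddef>

// One of these exists per physical page, carved off at boot.
struct Page {
    Page *next = nullptr;
    // ... per-page state (refcount, flags, ...) would live here
};

struct PageQueue {
    Page *head = nullptr;
    size_t free_count = 0;

    void free_page(Page *p) {       // push: O(1)
        p->next = head;
        head = p;
        ++free_count;
    }

    Page *alloc_page() {            // pop: O(1) -- but single pages only,
        if (!head) return nullptr;  // which is exactly the contiguity pain
        Page *p = head;             // point raised above
        head = p->next;
        --free_count;
        return p;
    }
};
```

The per-cpu cache idea layers on top: each cpu pops a batch from this queue under the lock, then serves allocations locally without contention.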
chartreus has joined #osdev
<geist> yep. we have a per cpu cache in front of the pmm now. helped a lot
<heat> this is food for thought
<j`ey> nomnom
<heat> what does nt use?
<heat> or freebsd?
ZetItUp has quit [Ping timeout: 276 seconds]
<heat> maybe it's in the windows internals books, can't find any information about it online
<geist> i think NT is queue based
<kazinsal> believe so
<kazinsal> I'd have to crack open Windows Internals to be sure
<kazinsal> and that's a lot of dead trees that's all the way over at the other side of my apartment, a whole 10 steps away
<geist> fairly certain queues of pages is pretty much the defacto implementation for more or less everything that was conceived of <2000 or so
<heat> here's a cute detail: windows maps IO ranges with large pages if it sees it can
<heat> i'm unsure of how this plays along with that scary x86 UB for large pages with multiple PAT attributes or whatever that was that doug16k once mentioned
Likorn has quit [Ping timeout: 240 seconds]
diamondbond has quit [Quit: Leaving]
<heat> haha they got struct page'd too
<heat> it's also a horrible thing with fields that are overloaded 4 times
<zid> Getting struct page'd is a terrible affliciton :'(
heat has quit [Read error: Connection reset by peer]
heat has joined #osdev
<heat> page tables are swappable wtf
nyah has quit [Ping timeout: 260 seconds]
<mrvn> if you have nested tables nothing stops you from swapping one of them
<kazinsal> just bank switch the switched bank
thatcher has joined #osdev
<heat> they mention lists of PFNs quite a lot
<heat> but I don't know if this is the actual format of the lists
<heat> they don't really mention large pages
<heat> except the "we try to use large pages transparently" part
<geist> yah i never completely grokked what they were talking about with 'prototype page tables' or whatnot
<geist> PFN is probably just a way ot saying 'page address shifted over'
<geist> lots of systems do that
<geist> re: swappable page tables, that may have fallen out of earlier experience with VAX, which actually lets you swap page tables
<heat> PFN is also what they call their struct page
<geist> remember most of the early devs for it were ex vax folks, so they applied a lot of the same design patterns
<geist> IRQL and whatnot is 100% a vax hardware feature they brought forward and emulated in software because they were used to the model
<heat> allegedly, they get PFNs on like 6 or 7 lists (dirty, clean, zero'd, unused, etc) and then allocate from those
<geist> yep. and theres a priority scheme there
<geist> when allocating a page it walks down the lists and finds the first one in the right list
<heat> but what constitutes a "list" is unclear to me. they mention a simple linked list
<geist> and then there's machinery that tries to keep the lists balanced and whatnot
<geist> i think it's basically an array of lists, in an allocation priority order. i like that model, even if it's only conceptual
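The array-of-lists model heat and geist describe for NT can be sketched directly: pages sit on one of several lists and allocation walks them in preference order. List names loosely follow the Windows Internals description; the code itself is illustrative, not NT's:

```cpp
#include <array>
#include <cstdint>
#include <deque>

// Preference order: hand out pre-zeroed pages first, then free pages,
// then repurpose standby (clean, evictable) pages.
enum PageList { kZeroed, kFree, kStandby, kNumLists };

struct PfnDatabase {
    std::array<std::deque<uint64_t>, kNumLists> lists;

    // Walk the lists in priority order; return the first available pfn.
    bool alloc_page(uint64_t *pfn_out) {
        for (auto &list : lists) {
            if (!list.empty()) {
                *pfn_out = list.front();
                list.pop_front();
                return true;
            }
        }
        return false;   // genuinely out of memory
    }
};
```

The balancing machinery geist mentions (zeroing free pages in the background, trimming standby) just moves pfns between these lists.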
<heat> but they surely mustn't be using a simple linked list if they want to allocate large pages
<heat> this section seems... vague
<heat> maybe microsoft will drop the nt kernel sauce next month
<geist> yah dunno how large pages would work in that model
<geist> but you said before it was for io pages
<geist> dunno the extent of large page support for non io pages
<geist> or maybe it's transparent in certain ways, like specific contiguous page allocation paths and then in that scenario it'll do a large map if it works out that way
<geist> zircon has that
JanC has quit [Remote host closed the connection]
<heat> virtualalloc has a MEM_LARGE_PAGES
JanC has joined #osdev
<heat> they say they could possibly break down a 1040MB allocation into 1 huge page and 4 large pages
<heat> geist: re prototype PTEs, it seems they're like vm objects
<geist> possible. maybe they track the page assignments of pages to vmos using some sort of N level thing that they call prototype PTEs
<heat> like a shadow page table, real PTEs point to it (with the P bit 0'd)
<geist> yah
<geist> how that would precisely work i dunno, but i guess the gist is to use some sort of similar thing
<heat> linux also uses a page table-ish structure for their vmos
<heat> the so called radix tree, now renamed xarray
<geist> right
<geist> zircon uses a wavl tree of runs of 16 pages
<geist> basically arbitrarily picked to be a reasonable compromise
<heat> oh yeah for sure, the prototype page tables are like a little page table for the "vmo"
<heat> why did you go for a wavl tree?
<geist> yah whether or not you can actually use it as a real page table i dunno
<geist> because we already have a wavl tree implementation
<geist> or you mean why did i use a tree in the vmo or why a wavl tree vs some other tree?
<heat> why a tree vs the radix-tree/prototype page table thing
<geist> ah simply the former
<heat> s/tree/binary tree/
<geist> we already had it and it was expedient and it has pretty reasonable performance and size characteristics
<geist> especially given that most vmos are pretty small, and thus really only end up with a single run of pages
<heat> the most ingenious use of the page table stuff I've seen in linux is that you can kinda figure out what's dirty right from the top level of the tree
<heat> just like a page table
<geist> oh yeah?
wxwisiasdf has joined #osdev
<geist> ah
<wxwisiasdf> finally i got rid of the debug diag 8 cmd and now i use a proper ic console :D
<heat> when dirtying, they queue the dirty inodes; on writeback, they look at the radix tree and go down the branches that are dirty (literally a D bit)
<heat> then you can easily writeback large runs of pages at once
<geist> ah makes sense
<geist> yah that's a thing we wouldn't be able to do in the wavl tree because the order of the tree is not constant
<heat> yup
<heat> i was thinking about going down the radix tree route and making a dynamically growable tree (I don't know if that's how linux does it, but probably)
<heat> essentially add levels to it once they're required
<heat> small files would be trivial to look up, huge files would still be very fast
<heat> would probably keep each table PAGE_SIZE size'd for compactness' sake although getting it larger wouldn't be too bad either
<heat> i think it's theoretically way better than my binary tree in every characteristic except memory usage
<wxwisiasdf> radix trees seems interesting
<geist> yah i guess picking the right radix is interesting too
<geist> since that affects how much internal fragmentation you get
* heat nods
<heat> also if you use whole pages you can skip malloc and or its internal fragmentation
<geist> yah but thats probably pretty bad for internal fragmentation in the sense that you probably have a large set of unused page pointers
<geist> may be good to generally pick a radix or a set of radices that are relative to the size of the object or whatnot
<geist> no idea what linux does
<geist> but yeah the obvious one is one page radix
<heat> theoretically you could change the radix and restructure the tree when the levels get too deep
<heat> not that they would get too deep
<heat> at most you get 6 levels for a huge 64-bit vmo
<heat> ... at that point your main concern probably isn't the radix tree :D
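The dirty-tag trick heat describes (and the growable radix tree he's considering) can be sketched together: a fixed fanout per level, with a dirty bit on every interior slot so writeback only descends branches that actually contain dirty pages. This is a two-level illustrative sketch with made-up names, not Linux's xarray (which also grows in height and packs tags as bitmaps):

```cpp
#include <array>
#include <cstdint>
#include <memory>
#include <vector>

constexpr int kRadixBits = 6;           // fanout of 64 per level
constexpr int kFanout = 1 << kRadixBits;

struct RadixNode {
    std::array<std::unique_ptr<RadixNode>, kFanout> slots;
    std::array<bool, kFanout> dirty{};  // per-child tag, like a D bit
    uint64_t page = 0;                  // leaf payload (page "address")
};

struct RadixTree {
    RadixNode root;                     // fixed two levels for the sketch

    void insert_dirty(uint64_t index, uint64_t page) {
        int top = (index >> kRadixBits) & (kFanout - 1);
        int bot = index & (kFanout - 1);
        if (!root.slots[top]) root.slots[top] = std::make_unique<RadixNode>();
        RadixNode &mid = *root.slots[top];
        if (!mid.slots[bot]) mid.slots[bot] = std::make_unique<RadixNode>();
        mid.slots[bot]->page = page;
        mid.dirty[bot] = true;
        root.dirty[top] = true;         // propagate the tag upward
    }

    // Writeback walk: skip any subtree whose dirty tag is clear.
    std::vector<uint64_t> collect_dirty() {
        std::vector<uint64_t> pages;
        for (int t = 0; t < kFanout; ++t) {
            if (!root.dirty[t]) continue;   // whole subtree clean
            for (int b = 0; b < kFanout; ++b)
                if (root.slots[t]->dirty[b])
                    pages.push_back(root.slots[t]->slots[b]->page);
        }
        return pages;
    }
};
```

With 64-way fanout, six levels cover a 36-bit page index, matching heat's "at most 6 levels" estimate for a huge vmo.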
<geist> also this is a fun thing where using larger page sizes affects your radix and the number of pages, etc
<heat> i wonder if changing page sizes does have a measurable effect on system performance
<heat> IO even
<heat> it's very common for kernels to just size things based on pages
<geist> right. it's an interesting question
lg_ is now known as lg