#osdev on 2022-09-29 — irc logs at libera.irclog.whitequark.org

2021-05-23 01:57 klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books

00:07 SpikeHeron has quit [Quit: WeeChat 3.6]

00:20 <heat> is the tlb in qemu tcg a thing or what?

00:21 <heat> can I theoretically skip flushing the TLB?

00:21 <heat> is that a no-op?

00:24 <geist> i think it has some sort of cache, yes

00:36 <mrvn> heat: m[10, 20] isn't the "," operator though but some new evil hack to make that an n-ary []

00:36 <heat> I know

00:37 <mrvn> Makes me wonder how they make that work without breaking existing , operators inside []

00:39 <mrvn> vec[size = max(size, i), i] = val;

00:40 * mrvn ducks behind this horrible construct but can't of something else that could use , inside []

00:40 <mrvn> +think

00:42 \Test_User has quit [Read error: Connection reset by peer]

00:42 SpikeHeron has joined #osdev

00:47 gildasio has quit [Remote host closed the connection]

00:47 <heat> I've been unfucking my riscv port

00:48 gildasio has joined #osdev

00:48 <heat> it's pretty unfucked by now, but gcc is still acting up

00:48 <heat> I need some real CI for this...

00:51 nyah has quit [Ping timeout: 252 seconds]

00:57 <heat> I need interrupts

00:57 <heat> I do not have interrupts because I didn't bother writing a PLIC driver

00:59 Oshawott has quit [Ping timeout: 248 seconds]

01:02 <moon-child> mrvn: if it's in a statement context, there's no reason not to put the computation by itself

01:03 <moon-child> if it's in an expression context, you can pull it out: (size = max(size, i)), (vec[i] = val)
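In C, and in C++ before C++23, the comma inside `[]` is the ordinary comma operator. mrvn's construct therefore evaluates the assignment, discards its value, and indexes with `i`. moon-child's split form does the same two steps. C++20 deprecated an unparenthesised comma in a subscript, and C++23 repurposed that syntax for multi-argument `operator[]`; C is unaffected. The following is a minimal C sketch. `max_size`, `store_comma` and `store_split` are illustrative names.

```c
#include <stddef.h>

/* Helper used only by this sketch. */
static size_t max_size(size_t a, size_t b) { return a > b ? a : b; }

/* mrvn's one-liner: the comma operator evaluates the assignment to *size,
 * discards its value, then uses i as the subscript. */
static void store_comma(int *vec, size_t *size, size_t i, int val)
{
    vec[*size = max_size(*size, i), i] = val;
}

/* moon-child's rewrite: the same two steps as separate statements. */
static void store_split(int *vec, size_t *size, size_t i, int val)
{
    *size = max_size(*size, i);
    vec[i] = val;
}
```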

01:04 Gooberpatrol_66 has joined #osdev

01:10 <heat> ok so

01:10 <heat> what's the difference between a M-mode enable and an S-mode enable in the PLIC?

01:11 <heat> reading https://static.dev.sifive.com/U54-MC-RVCoreIP.pdf here

01:12 \Test_User has joined #osdev

01:13 <heat> hrm, it seems that it may be a machine detail

01:13 <heat> the generic PLIC spec seems to be specced for 1024 interrupts

01:13 <heat> the actual sifive cores seem to only have 512 and have that M-mode vs S-mode thing

01:22 <heat> the logic to just figure out what goes where is horrific

01:22 <heat> cheers, they fucked up the irq controller
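This sketch shows the register arithmetic from the generic RISC-V PLIC memory map. That map supports 1024 sources and gives each context one enable bitmap, one threshold register and one claim/complete register. The context numbering is the part that bites. The code assumes QEMU `virt`-style numbering, where each hart has an M-mode context followed by an S-mode context, so context = 2*hart + 1 for S-mode. SiFive parts such as the U54-MC differ: the E51 monitor hart has only an M-mode context. On real hardware, read the mapping from the PLIC node's `interrupts-extended` property in the device tree. All function names here are hypothetical.

```c
#include <stdint.h>

/* Offsets from the generic RISC-V PLIC memory map. */
#define PLIC_PRIORITY_OFF   0x000000u  /* 4 bytes per source */
#define PLIC_ENABLE_OFF     0x002000u  /* enable bitmaps start here */
#define PLIC_ENABLE_STRIDE  0x80u      /* 1024 bits per context */
#define PLIC_CONTEXT_OFF    0x200000u  /* threshold, then claim/complete */
#define PLIC_CONTEXT_STRIDE 0x1000u

/* Assumption: QEMU virt-style numbering, M-mode then S-mode context per hart. */
static uint32_t plic_context(uint32_t hart, int smode)
{
    return hart * 2u + (smode ? 1u : 0u);
}

static uintptr_t plic_priority_reg(uintptr_t base, uint32_t irq)
{
    return base + PLIC_PRIORITY_OFF + 4u * irq;
}

/* Address of the 32-bit enable word that holds irq's bit for this context. */
static uintptr_t plic_enable_reg(uintptr_t base, uint32_t ctx, uint32_t irq)
{
    return base + PLIC_ENABLE_OFF + PLIC_ENABLE_STRIDE * ctx + 4u * (irq / 32u);
}

static uint32_t plic_enable_bit(uint32_t irq)
{
    return 1u << (irq % 32u);
}

static uintptr_t plic_threshold_reg(uintptr_t base, uint32_t ctx)
{
    return base + PLIC_CONTEXT_OFF + PLIC_CONTEXT_STRIDE * ctx;
}

/* Read to claim the highest-priority pending IRQ; write the ID back to complete it. */
static uintptr_t plic_claim_reg(uintptr_t base, uint32_t ctx)
{
    return plic_threshold_reg(base, ctx) + 4u;
}
```

Under these assumptions, taking IRQ 10 in S-mode on hart 1 takes several steps:

1. Give the source a nonzero priority.
2. Set its bit in context 3's enable word.
3. Leave context 3's threshold at 0.
4. Enable `sie.SEIE` and `sstatus.SIE`.

The handler then claims from and completes to context 3.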

01:26 heat_ has joined #osdev

01:27 heat has quit [Read error: Connection reset by peer]

01:29 heat_ is now known as heat

01:32 <mrvn> moon-child: that isn't the point. Currently that calls operator[](size_t), in the future that might call operator[](size_t, size_t).

01:34 MelMalik is now known as AmyMalik

01:35 <heat> mrvn, I can imagine the behavior switching through -std=c++23

01:40 <mrvn> It should be that if operator[](size_t, size_t) is defined that should be preferred. Otherwise the old , operator should take hold.

01:48 <klange> Python has some similar challenges with commas - eg., where is a comma an operator that makes a tuple? Had a lot of "fun" with that in Kuroko's compiler...

01:56 <mrvn> (a,b) vs. a,b

01:56 <mrvn> f(a, b) vs. f((a, b))

02:10 archenoth has joined #osdev

02:34 gildasio has quit [Remote host closed the connection]

02:35 <moon-child> isn't (a,b) the same as a,b in python?

02:36 gildasio has joined #osdev

02:41 <\Test_User> a(a,b) and a((a,b)) is different

02:42 <\Test_User> er that was already stated... but when used in that kind of situation

02:42 <klange> as an expression on its own, yes, (a,b) and a,b are equivalent.

02:42 terrorjack has quit [Quit: The Lounge - https://thelounge.chat]

02:44 terrorjack has joined #osdev

02:47 [itchyjunk] has quit [Read error: Connection reset by peer]

02:50 <mrvn> I'm not sure f(a,b) isn't a tuple too. You can do "def foo(*args)" after all.

02:53 elastic_dog has quit [Ping timeout: 264 seconds]

02:59 elastic_dog has joined #osdev

03:05 gxt has quit [Remote host closed the connection]

03:07 gxt has joined #osdev

03:09 <klange> mrvn: it is not, but it was historically, keyword arguments necessitated entirely different parsing and that happened a long time ago

03:22 heat has quit [Read error: Connection reset by peer]

03:23 heat has joined #osdev

03:42 <mrvn> right, with kwargs you have to split it into a tuple and a dict.

03:44 <mrvn> but python is somewhat screwy there. def foo(x=0): print(x)

03:44 <mrvn> >>> foo(1)

03:44 <mrvn> 1

03:44 <mrvn> >>> foo(x=2)

03:44 <mrvn> 2

03:44 <mrvn> Is that a positional or key word arg?

03:47 <klange> positional vs. keyword is determined by the call, so it's positional in the first and keyword in the second

03:47 <mrvn> except when you do "def foo(*, x=0)"

03:48 <mrvn> eat all positional, so x must be keyword then

03:48 <klange> That's not what bare * means.

03:49 <klange> It means "all following arguments cannot be provided as positional arguments"

03:50 <mrvn> i.e. x must be given as keyword argument, the callee determined that

03:51 <klange> it mandated that, it didn't determine it

03:54 <mrvn> anyway, my point was that it must be a real pain to parse

03:54 <klange> There are ordering requirements that make it not too bad in that context.

03:54 <mrvn> can't just chuck any value without name into a tuple and any "name=val" into a dict. The position of the named args matters.

03:56 <klange> The ordering of the named args does not matter.

03:56 <mrvn> they have to be after unnamed args

03:57 <klange> Yes, and that's what makes it easier to parse...

03:58 <klange> Because, again, it absolutely is not a tuple...
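klange's point is that a call's argument list is not a tuple. Positional arguments must all come before keyword arguments, so a parser can split them in one left-to-right pass and reject any positional argument that follows a keyword one. This is a toy C sketch over pre-split argument strings. It is not how CPython or Kuroko actually parse calls. It ignores `*args`/`**kwargs` unpacking. It also treats any `name=` prefix as a keyword, which a real tokenizer would not do for something like `a == b`.

```c
#include <stddef.h>
#include <string.h>

/* Counts produced by splitting one call's arguments. */
struct call_args {
    size_t npos;  /* positional arguments seen */
    size_t nkw;   /* keyword arguments seen */
};

/* Classify each argument as "name=value" (keyword) or bare (positional).
 * Returns 0 on success, or -1 for a positional argument after a keyword one,
 * which Python rejects as "positional argument follows keyword argument". */
static int split_call_args(const char *const *args, size_t n, struct call_args *out)
{
    out->npos = 0;
    out->nkw = 0;
    for (size_t i = 0; i < n; i++) {
        const char *eq = strchr(args[i], '=');
        int is_kw = eq != NULL && eq != args[i];
        if (is_kw)
            out->nkw++;
        else if (out->nkw > 0)
            return -1;
        else
            out->npos++;
    }
    return 0;
}
```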

04:01 <mrvn> I do love the "*args" and "**kwargs" syntax though. Does anything but python have that feature?

04:02 zaquest has quit [Remote host closed the connection]

04:03 zaquest has joined #osdev

04:04 <klange> I have it in Kuroko, but then whether that is "anything but Python" or not is up to you.

04:21 <moon-child> mrvn: iirc, raku has something like it

04:29 gxt has quit [Ping timeout: 258 seconds]

04:33 <energizer> mrvn: julia has f(a...; kw...)

04:33 gxt has joined #osdev

04:43 heat has quit [Ping timeout: 250 seconds]

05:10 Ram-Z has quit [Ping timeout: 265 seconds]

05:20 dormito has quit [Ping timeout: 268 seconds]

05:35 gxt has quit [Remote host closed the connection]

05:37 gxt has joined #osdev

05:55 dormito has joined #osdev

06:05 epony has quit [Read error: Connection reset by peer]

06:30 epony has joined #osdev

06:46 <geist> aww, sadness when your favorite chromebook starts bulging. the inevitable bulging battery syndrome

06:47 <geist> well had a good run, 7 years old

06:48 <\Test_User> can always remove the battery and just use it with an outlet nearby

06:49 <\Test_User> kinda defeats the point but still more mobile than a normal PC

06:50 <geist> yeah i will probably do that

07:17 tarel2 has joined #osdev

07:24 <pitust> geist: you could replace the battery, no

07:24 <pitust> ?

07:28 m5zs7k has quit [Ping timeout: 268 seconds]

07:33 m5zs7k has joined #osdev

07:41 <zid> I'm still not sure what you're supposed to do with bulged lithiums

07:42 <zid> throwing them into the trash seems like a recipe for the garbage truck to set on fire

07:42 <zid> and I don't wanna keep in my house

07:47 <mrvn> it's toxic waste, definitely not for normal garbage

07:47 <bradd> maybe a car battery shop could dispose of them?

07:47 <mrvn> return to producer?

07:48 <zid> put them on a convenient man-hole and fire BBs at it is my guess

07:48 <mrvn> car batteries are just acid and lead. Nothing explosive/flammable there. But maybe.

07:53 <geist> i actually disposed of some recently at a local recycling place that had basically a metal container to toss used batteries

07:53 <geist> outside

07:53 <zid> a burn barrel? :P

07:53 <geist> basically

07:54 <geist> well, in case it became a spontaneous burn barrel, that is

07:54 <geist> must be fun dealing with it downstream

07:56 <zid> maybe if lithium's price goes up from all the lithium being mined out slowly, they'll end up wanting them back as recycling

07:59 Ali_A has joined #osdev

08:11 archenoth has quit [Ping timeout: 248 seconds]

08:49 GeDaMo has joined #osdev

09:11 Ram-Z has joined #osdev

09:30 maxdev has joined #osdev

09:33 <maxdev> I think I have a misconception on interrupt handling. If a timer IRQ occurs, and my ISR is executed.. will, for example a keyboard IRQ, interrupt my ISR again? Or when is the next interrupt triggered, after existing my ISR, or after writing the EOI?

09:34 <maxdev> *exiting

09:34 <Mutabah> Depends on the interrupt controller

09:35 <maxdev> x86 APIC

09:35 <Mutabah> The legacy PIC allows lower IRQs to interrupt higher ones (.. there's a quirk where the secondary one is in the middle of the primary)

09:35 <zid> Bear in mind this is just which IRQ number you end up taking

09:35 <Mutabah> I think the APIC has finer-grained priorities... or just doesn't allow nesting

09:35 <zid> whether an IRQ happens at all is still up to whether you have interrupts enabled, the PIC has been acked since last time, etc

09:36 <Mutabah> ^

09:36 <Mutabah> Assuming you've just started handling an IRQ - IF will be clear (because hopefully you used an interrupt gate)

09:36 <zid> so in practice this is just "if two things happen simultaneously, which ends up happening first?"

09:36 <zid> it doesn't change how any of it works, your IRQs won't magically start nesting

09:37 <zid> unless they were already nestable to begin with

09:40 <maxdev> is IF also cleared if it's a software interrupt going through an interrupt gate?

09:41 <zid> I don't believe so? manual will know

09:41 <Mutabah> maxdev: It's just the gate type iirc

09:42 <maxdev> interrupts confuse me a lot lol

09:42 <maxdev> I tried making my kernel reentrant at some point and it's not very easy

09:42 <zid> interrupt gates seem to clear IF
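Whether IF is cleared is a property of the IDT entry, not of the interrupt source. An interrupt gate (type 0xE) clears IF on entry, whether it was reached by a hardware IRQ or by `int n`. A trap gate (type 0xF) leaves IF unchanged. This is a sketch of filling in a 64-bit IDT entry. The struct and helper names are illustrative, and 0x08 as the kernel code selector is an assumption.

```c
#include <stdint.h>

#define GATE_INTERRUPT 0xEu  /* clears IF on entry */
#define GATE_TRAP      0xFu  /* leaves IF unchanged */

/* 16-byte long-mode IDT descriptor; the fields are naturally aligned. */
struct idt_entry {
    uint16_t offset_low;
    uint16_t selector;
    uint8_t  ist;        /* bits 0-2: IST index, rest reserved */
    uint8_t  type_attr;  /* P (bit 7) | DPL (bits 5-6) | gate type (bits 0-3) */
    uint16_t offset_mid;
    uint32_t offset_high;
    uint32_t reserved;
};
_Static_assert(sizeof(struct idt_entry) == 16, "IDT entry must be 16 bytes");

static struct idt_entry make_gate(uint64_t handler, uint16_t selector,
                                  unsigned type, unsigned dpl)
{
    struct idt_entry e;
    e.offset_low  = (uint16_t)(handler & 0xFFFFu);
    e.selector    = selector;
    e.ist         = 0;
    e.type_attr   = (uint8_t)(0x80u | ((dpl & 3u) << 5) | (type & 0xFu));
    e.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFFu);
    e.offset_high = (uint32_t)(handler >> 32);
    e.reserved    = 0;
    return e;
}
```

Calling `make_gate(addr, 0x08, GATE_INTERRUPT, 0)` yields a type byte of 0x8E. A DPL-3 trap gate, such as one used for an `int`-based syscall vector, yields 0xEF.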

09:44 <maxdev> but what is the EOI then good for?

09:44 <zid> eoi is for talking to the *pic* to acknowledge you've finished the IRQ it sent you

09:44 <zid> and it should send you another

09:44 <zid> it's nothing to do with the cpu's interrupts

09:45 <maxdev> ah i think i get it
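As zid says, the EOI is a write to the interrupt controller, not to the CPU. For the local APIC it is a write of 0 to the EOI register at offset 0xB0. For the legacy 8259 pair it is command 0x20 sent to port 0x20, preceded by the same command to port 0xA0 when the IRQ came from the slave (IRQ 8-15). This is a hosted sketch: the MMIO base and the port-write function are passed in so it can run outside a kernel, and both of those parameter shapes are assumptions.

```c
#include <stdint.h>

#define LAPIC_EOI_REG 0xB0u  /* byte offset of the local APIC EOI register */
#define PIC1_CMD      0x20u
#define PIC2_CMD      0xA0u
#define PIC_EOI       0x20u  /* non-specific EOI command */

typedef void (*port_write_fn)(uint16_t port, uint8_t value);

/* Local APIC: writing 0 to the EOI register acknowledges the in-service interrupt. */
static void lapic_eoi(volatile uint32_t *lapic_base)
{
    lapic_base[LAPIC_EOI_REG / 4] = 0;
}

/* Legacy 8259: IRQs 8-15 need an EOI on the slave and on the master (cascade on IRQ 2). */
static void pic_eoi(unsigned irq, port_write_fn outb)
{
    if (irq >= 8)
        outb(PIC2_CMD, PIC_EOI);
    outb(PIC1_CMD, PIC_EOI);
}
```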

10:10 archenoth has joined #osdev

10:24 kof123 has quit [Ping timeout: 268 seconds]

10:30 <ddevault> I am confuse

10:31 <ddevault> why would rdmsr on fsbase cause a GP fault

10:31 <ddevault> only on real hardware

10:51 <zid> does your real hw support it?

10:51 <zid> I don't have fsgsbase on my cpu

10:51 <ddevault> isn't it always available for x86_64?

10:51 <zid> some is, but there's an extension

10:51 <zid> that adds some stuff

10:51 <ddevault> the rdfsbase/wrfsbase instructions

10:51 <ddevault> not using those, using the MSRs

10:52 <zid> k

10:54 <ddevault> wrmsr works fine

10:54 <ddevault> but not rdmsr

10:55 <ddevault> looks like linux reads the MSR just fine

10:55 <ddevault> I am confuse

10:57 kof123 has joined #osdev

11:01 <ddevault> rdfsbase works fine, though

11:02 <ddevault> I guess I'll just bite the bullet and do the cpuid check for it
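This sketches the CPUID check ddevault mentions. FSGSBASE support is reported in CPUID leaf 7, subleaf 0, EBX bit 0. The instructions still raise #UD until the kernel sets CR4.FSGSBASE (bit 16). The FS/GS base MSRs, by contrast, exist on every x86-64 CPU. `__get_cpuid_count` is the helper from GCC/Clang's `<cpuid.h>` (present in reasonably recent versions), and it is only compiled on x86-64 here.

```c
#include <stdint.h>

#define CPUID7_EBX_FSGSBASE (1u << 0)
#define CR4_FSGSBASE        (1ull << 16)

/* Pure check on an already-read CPUID.(EAX=7,ECX=0):EBX value. */
static int ebx_has_fsgsbase(uint32_t ebx)
{
    return (ebx & CPUID7_EBX_FSGSBASE) != 0;
}

#if defined(__x86_64__) && defined(__GNUC__)
#include <cpuid.h>
/* Returns 1 if the CPU advertises RDFSBASE/WRFSBASE/RDGSBASE/WRGSBASE. */
static int cpu_has_fsgsbase(void)
{
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;  /* leaf 7 not supported */
    return ebx_has_fsgsbase(ebx);
}
#else
static int cpu_has_fsgsbase(void) { return 0; }
#endif
```

In a kernel, a true result would be followed by setting `CR4_FSGSBASE` in CR4 before any `rdfsbase` is executed. Otherwise the code keeps using the MSRs.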

11:08 Ali_A has quit [Quit: Client closed]

11:14 nyah has joined #osdev

11:16 <ddevault> is there any good database of cpuids which lets me identify CPUs which have a given feature?

11:19 eroux has quit [Ping timeout: 268 seconds]

11:21 <ddevault> if you have access to an older x86_64 computer running Linux, please run this command and tell me if it prints "no fsgsbase"

11:21 <ddevault> lscpu | grep fsgsbase >/dev/null && echo fsgsbase || echo "no fsgsbase"

11:28 demindiro has joined #osdev

11:29 <zid> I mean, I know I don't have it

11:29 <demindiro> ddevault: are you also setting bit 16 in CR4? I did a quick grep through your code but only see bit 9, 10 and 17 being set

11:30 <demindiro> Or ah, that's only for wrgsbase etc, nvm

11:30 <ddevault> yeah I added that

11:30 <ddevault> rdfsbase works fine for me

11:30 <ddevault> but reading the MSR does not work on real hardware, GP faults

11:30 eroux has joined #osdev

11:30 <demindiro> You're trying it from kernel space right?

11:30 <ddevault> aye

11:32 <zid> https://gist.github.com/zid/a93721230a8f6d2fbb34dd56c56497b2 lscpu is a neat tool

11:32 <bslsk05> gist.github.com: lscpu.txt · GitHub

11:32 <zid> I usually just cat /proc/cpuinfo

11:33 <ddevault> erp

11:33 <zid> Legit never used it before

11:34 janemba has quit [Ping timeout: 252 seconds]

11:34 <ddevault> zid: can you test this ISO out on your hardware? https://mirror.drewdevault.com/boot.iso

11:34 <demindiro> My guess is %ecx somehow having garbage as value, no idea what else it could be.

11:34 <ddevault> demindiro: ecx looks fine in the panic dump of registers

11:34 <ddevault> working would be the classic "thread 1" "thread 2" logging

11:34 <ddevault> not working would be GP fault

11:35 <demindiro> Does RDMSR work in e.g. the bootloader?

11:35 <ddevault> depends on what mode it's in, I bet

11:35 <zid> I mean, under virt it seems to work?

11:35 <zid> it goes 'hello from helios' for like 10 seconds, then spams thread1 / thread2

11:36 <ddevault> right, I know it works fine on virt

11:36 <zid> I'm not rebooting my desktop and buying a cd-rom drive

11:36 <ddevault> but it doesn't work on my hardware

11:36 <ddevault> can boot from USB

11:36 <zid> or a usb drive

11:36 * ddevault shrugs

11:36 <demindiro> I can try

11:36 <demindiro> 1 min

11:36 <ddevault> demindiro: run lscpu first and look for fsgsbase

11:36 <zid> Clearly what I need is.. another desktop

11:37 <demindiro> fsgsbase works on my laptop

11:37 <ddevault> cool, curious to know if the iso works as well

11:39 <demindiro> kernel page fault at non-present address 0x9 @ 0xffffffffc003a839

11:40 <ddevault> uh, that's interesting

11:40 <ddevault> I think there might be a race condition somewhere, does it happen again if you reboot?

11:40 <ddevault> err wtf is that %rip

11:41 <demindiro> Yep

11:41 <ddevault> can you send me a picture of the panic screen?

11:41 <demindiro> I'll try another laptop too

11:41 d5k has joined #osdev

11:42 <demindiro> Yes, if I can get K9 mail to work and figure out where Android stores camera pictures

11:43 epony has quit [Remote host closed the connection]

11:44 <demindiro> Ditto on other laptop btw

11:44 <demindiro> Anyways, I hate android 10, phone froze again

11:44 <ddevault> here's another one to test out when you have a sec: https://mirror.drewdevault.com/rdfsbase.iso

11:45 janemba has joined #osdev

11:45 <demindiro> That ISO works

11:45 <demindiro> Loading etc, seems to halt at /hello

11:46 <ddevault> halt at /hello is not a working result, I'm afraid

11:46 <ddevault> should spam thread 1/thread 2 over and over

11:46 <demindiro> So on one laptop it appears to halt, on the other there's thread 1/2 spam

11:47 <ddevault> which laptop does it halt on? what model?

11:47 <demindiro> Samsung 520U something IIRC

11:47 <ddevault> curious to see lscpu output on that

11:47 <demindiro> Samsung NP530U3C

11:47 <zid> sounds like a centrino N-348934893 cpu

11:53 bauen1 has quit [Ping timeout: 265 seconds]

11:54 bauen1 has joined #osdev

11:55 <demindiro> So uhm

11:56 <demindiro> Apparently that samsung laptop does not have fsgsbase

11:56 <demindiro> And I forgot that I emulate it if not present

11:56 <ddevault> that's what I wanted to know, thanks

11:56 <demindiro> rdmsr stuff works though

11:56 <ddevault> nice

11:56 <ddevault> I'm surprised that the MSRs don't work consistently

12:06 lkurusa has joined #osdev

12:21 Ali_A has joined #osdev

12:21 Ali_A has quit [Client Quit]

12:21 <maxdev> forgot to say thank you for your help earlier, Mutabah and zid

12:21 <zid> I accept hard cash and sexual favours

12:23 eroux has quit [Remote host closed the connection]

12:29 eroux has joined #osdev

12:31 eroux has quit [Remote host closed the connection]

12:31 <maxdev> xD

12:38 <sham1> What about cash favours

12:39 <maxdev> what about hard sexual?

12:43 vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

12:44 tarel2 has quit [Ping timeout: 252 seconds]

13:05 heat has joined #osdev

13:06 <heat> hello

13:06 <heat> help i dont get the irq!!!!!!!!!

13:07 <zid> I can email you one?

13:08 <heat> yes

13:08 <heat> irq 10 pls

13:09 <zid> okay, IRQ10 is send $20 paypal to zid

13:09 <zid> I'll deassert it once it shows up

13:09 <heat> what

13:09 <heat> but iz free no?

13:09 <zid> I'm reverse engineering rollercoaster tycoon cus.. someone asked me to

13:09 <zid> would you like to know what the incredible save encryption is

13:13 <heat> yes

13:14 <zid> add eax, 0x39393939; rol eax, 5;

13:14 <zid> You're welcome
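This sketch takes zid's two instructions at face value as a per-32-bit-word transform. Which words it covers, and whether there is also a checksum, is not stated here. Under that assumption, encoding is add-then-rotate-left, and decoding is the exact inverse: rotate right by 5, then subtract. The names below are hypothetical.

```c
#include <stdint.h>

#define RCT_KEY 0x39393939u

static uint32_t rotl32(uint32_t x, unsigned r) { return (x << r) | (x >> (32u - r)); }
static uint32_t rotr32(uint32_t x, unsigned r) { return (x >> r) | (x << (32u - r)); }

/* add eax, 0x39393939 ; rol eax, 5 */
static uint32_t rct_encode(uint32_t x) { return rotl32(x + RCT_KEY, 5); }

/* Inverse: undo the rotate first, then the add (both wrap modulo 2^32). */
static uint32_t rct_decode(uint32_t y) { return rotr32(y, 5) - RCT_KEY; }
```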

13:18 <heat> sgtm

13:19 <zid> suck goats, tiny munchkin

13:22 <mjg> oh so you gonna cheat in game?

13:22 <mjg> add moneyz to save

13:24 <zid> no he wants to multibox it but there's two issues

13:24 <zid> one is that there's a single save blob that each copy overwrites in full

13:24 <zid> and the second is that it pops up a "this is already running" box

13:24 <zid> I fixed the latter already

13:24 <zid> the former I was just going to write a thing that merged save files

13:25 <mjg> why not get a vm

13:25 <mjg> and bounce it

13:25 <zid> 'bounce it'?

13:25 <mjg> between machine

13:25 <mjg> s

13:25 <zid> pardon?

13:25 <heat> shut up

13:25 <mjg> there is technology(tm) to migrate vms along with the storage between different physical machines

13:25 <heat> go back to path walking flamegraph man

13:25 <zid> okay great

13:25 <zid> but how does it solve the problem

13:26 <zid> you end up with n saves, save 0 has beaten level 0, save 1 has beaten level 1, save 2 has beaten level 2

13:26 <zid> and you need a save with level 0 + 1 + 2 beaten so that level 4 unlocks

13:26 <mjg> maybe i misunderstood what the goal is

13:26 <zid> and the game doesn't partially increment the save file

13:26 <zid> it fully overwrites it all

13:26 <mjg> oooh wait, you mean the person wants to play *once* but generate several different instances of save files?

13:27 <zid> no

13:27 <mjg> as if they played the game n times?

13:27 <zid> they're playing 3 copies of the game at the same time

13:27 <zid> so they can speedrun unlocking level 4

13:28 <zid> but it won't unlock unless *one* copy beat level 1 2 and 3

13:28 <mjg> that's what i said

13:29 <zid> what did you say?

13:34 <heat> i think you should all shut the fuck up

13:34 <zid> I think heat is angry his IRQ is wedged

13:34 <heat> i am

13:34 <heat> wanna review it? yes? wonderful

13:35 <froggey> how do you multibox this if each level only gets unlocked if you've got the previous level beaten?

13:35 <zid> it's in sets, frog

13:35 <froggey> ahh

13:35 <zid> xyz start unlocked, beating each unlocks tuv

13:35 <zid> so you can triple-box xyz

13:35 <zid> then triple-box tuv

13:35 <zid> but if you *play* xyz, you just get a save game with ..z or .y. or .x. completed depending on which one saved last

13:36 <zid> err x.. :P

13:36 <zid> so I was considering writing a tool that could merge save game progress. But I think I've talked him into just starting on a save-file with every level unlocked, and just not playing tuv until xyz are beaten

13:37 <zid> scout's honor rather than "massive amounts of technical adulteration to the game files"

13:42 <ddevault> https://l.sr.ht/iqJG.png ta~

13:46 <zid> oh is that what the blips I couldn't follow were, the threadlocal x prints

13:46 <zid> I just noticed the linespacing changed sometimes

13:47 <ddevault> https://git.sr.ht/~sircmpwn/helios/tree/master/item/vulcan/cmd/init/main.ha

13:47 <bslsk05> git.sr.ht: ~sircmpwn/helios: vulcan/cmd/init/main.ha - sourcehut git

13:51 <heat> https://github.com/heatd/Onyx/pull/45/commits/c52d7c200249ea90f9d1d4632ad6d4fccbebc8a1

13:51 <bslsk05> github.com: Riscv work by heatd · Pull Request #45 · heatd/Onyx · GitHub

13:51 <heat> if you find a bug I'll suck your dick be very happy

13:51 <ddevault> bug: it's written in C++

13:53 <heat> WONTFIX

13:53 <heat> nexet

13:53 <heat> next

13:54 gildasio has quit [Quit: WeeChat 3.6]

13:56 puck has quit [Excess Flood]

13:56 puck has joined #osdev

14:01 gildasio has joined #osdev

14:02 gildasio has quit [Read error: Connection reset by peer]

14:02 <heat> per https://grok.dragonflybsd.org/xref/freebsd/sys/riscv/riscv/plic.c?r=2663ef1b I'm not doing anything wrong

14:02 <heat> i am confusion

14:02 <bslsk05> grok.dragonflybsd.org: plic.c (revision 2663ef1b) - OpenGrok cross reference for /freebsd/sys/riscv/riscv/plic.c

14:03 <maxdev> o

14:06 <maxdev> i'm calling a userspace function from my kernel, so that my driver can handle the IRQ.. is there anything special happening when using a CALL instruction on x86 and calling code in a page that's mapped as a user page? shouldn't matter I think?

14:06 <mjg> what

14:06 <mjg> it is definitely not going to work if you enable smep

14:07 <mjg> but that aside, you do realize you are still in the kernel for all practical purposes

14:07 <maxdev> yes

14:07 <mjg> so why is the func mapped there?

14:07 <maxdev> it isn't, i switch to the driver task and call it

14:07 <maxdev> so well it is at that moment

14:07 gildasio has joined #osdev

14:08 <heat> you should find a way that doesn't involve calling a random userspace function in ring0

14:08 <heat> else your pretty microkernel design will come crashing down

14:08 <maxdev> :D i had a different solution, but i found it quite messy

14:09 <heat> *this* is the less messy one? oh my!

14:11 <maxdev> well, what's a good solution? my "messy" solution is having an "interrupted state" on the task, then preparing the task to run in the handler, then switch to the task and once it's done switch back

14:11 <heat> that's the least messy solution you'll be able to find

14:11 <heat> otoh, it will be a bit slow

14:12 <maxdev> hmm it just doesn't satisfy me that i have to mess with the stack and stuff

14:12 xenos1984 has quit [Read error: Connection reset by peer]

14:14 d5k has quit [Ping timeout: 264 seconds]

14:14 <heat> you know threads need to block right?

14:15 LostFrog has quit [Read error: Connection reset by peer]

14:15 <maxdev> block in what sense?

14:15 <heat> in the sense of blocking, suspending

14:15 PapaFrog has joined #osdev

14:16 <heat> the cleanest solution from a typical perspective would be something like: "main() { int fd = open("/dev/irq/10", O_RDONLY); struct irq_data data; while (ioctl(fd, IRQFD_GET_IRQ, &data) >= 0) { do_stuff(&data); ioctl(fd, IRQFD_ACK); } }"

14:16 <heat> now, is it fast? no, you need to take two trips to the kernel

14:16 <maxdev> i have blocking/suspending logic etc

14:17 <maxdev> i wanted to make it faster with this interruption method, also i need it for signals

14:17 <maxdev> that clean solution would basically be polling

14:21 <heat> >signals

14:21 <heat> stop

14:21 <heat> mjg: !signalrant

14:21 <maxdev> :D

14:21 <heat> something something everyone was on LSD back then

14:25 heat has quit [Remote host closed the connection]

14:25 heat has joined #osdev

14:28 gildasio has quit [Remote host closed the connection]

14:29 gildasio has joined #osdev

14:31 xenos1984 has joined #osdev

14:35 <maxdev> i really lHoAvTeE osdeving

14:56 heat_ has joined #osdev

14:57 heat has quit [Read error: Connection reset by peer]

15:10 <ddevault> maxdev: the way my microkernel handles IRQs is

15:10 <ddevault> we have an IPC primitive called signals (based on seL4), which a task can block on, and which can be written to ("signalled") asynchronously, unblocking any waiting tasks

15:11 <ddevault> then there's an IRQ object which userspace can invoke to EOI after waking up on the signal

15:12 <ddevault> note: signal in this context is not the same thing as Unix signals

15:13 dude12312414 has joined #osdev

15:22 nvmd has quit [Quit: WeeChat 3.6]

15:31 <mrvn> ddevault: my kernel has mailboxes for IPC. If you want to deal with an IRQ you get a mailbox from the kernel and you send a message to it to EOI. When the IRQ happens the message comes back.

15:31 <ddevault> yeah that also works fine

15:31 <mrvn> And you can decide if you want to send the message async or wait for a reply.

15:32 <mrvn> e.g. the NIC might fetch the infoming frame, fire of the EOI message and then process the data before going back to sleep.

15:32 <mrvn> s/infoing/incoming/

15:33 <mrvn> How do you prevent the signal being sent while you are still processing the last signal?

15:35 <ddevault> as soon as the thread is unblocked the signal can be delivered again, if that happens while it's still processing then the next call to wait() does not block

15:36 <ddevault> also depends on when the interrupt is EOI'd

15:36 <mrvn> so if it doesn't wait the signal does nothing till you wait?

15:36 <ddevault> aye

15:36 <ddevault> you could also have several threads/processes waiting on the same signal, if you want to pool CPU time
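This is a hosted model of the semantics ddevault describes, loosely after seL4 notifications:

- Signalling sets bits in a pending word instead of queueing a message.
- A signal that arrives while nobody is waiting makes the next `wait` return at once.
- Several waiters may share one object.

It is built on POSIX threads purely for illustration. `struct notification` and its functions are hypothetical and are not Helios's API.

```c
#include <pthread.h>
#include <stdint.h>

struct notification {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    uint64_t        pending;  /* bits OR'ed in by signallers, cleared by wait */
};

static void notification_init(struct notification *n)
{
    pthread_mutex_init(&n->lock, NULL);
    pthread_cond_init(&n->cond, NULL);
    n->pending = 0;
}

/* Never blocks: repeated signals before a wait collapse into one pending word. */
static void notification_signal(struct notification *n, uint64_t bits)
{
    pthread_mutex_lock(&n->lock);
    n->pending |= bits;
    pthread_cond_signal(&n->cond);  /* wake one waiter, if any */
    pthread_mutex_unlock(&n->lock);
}

/* Blocks until something is pending, then consumes and returns it. */
static uint64_t notification_wait(struct notification *n)
{
    pthread_mutex_lock(&n->lock);
    while (n->pending == 0)
        pthread_cond_wait(&n->cond, &n->lock);
    uint64_t bits = n->pending;
    n->pending = 0;
    pthread_mutex_unlock(&n->lock);
    return bits;
}
```

In a driver loop, the thread would wait, service the device, then invoke the IRQ object to EOI. If the line fires again before the next wait, that wait returns immediately instead of blocking.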

15:37 <mrvn> can you pin an IRQ to a core?

15:37 <mrvn> or cpu set

15:37 <ddevault> tbd, I don't have SMP yet so

15:38 <mrvn> ahh, the old days, when things were simple

15:41 demindiro has quit [Ping timeout: 252 seconds]

16:09 puck has quit [Excess Flood]

16:09 puck has joined #osdev

16:13 epony has joined #osdev

16:20 dude12312414 has quit [Remote host closed the connection]

16:23 demindiro has joined #osdev

16:23 dude12312414 has joined #osdev

16:27 bauen1 has quit [Ping timeout: 268 seconds]

16:29 bauen1 has joined #osdev

16:30 xenos1984 has quit [Ping timeout: 260 seconds]

16:30 maxdev has quit [Remote host closed the connection]

16:30 xenos1984 has joined #osdev

16:40 antranigv has quit [Quit: ZNC 1.8.2 - https://znc.in]

16:41 antranigv has joined #osdev

16:43 dude12312414 has quit [Ping timeout: 258 seconds]

16:49 <mjg> heat_: i just got a prog at $dayjob which takes a pthread mutex from a signal handler

16:49 <mjg> heat_: you know what's funny about it?

16:59 dude12312414 has joined #osdev

17:19 sympt has joined #osdev

17:29 demindiro has quit [Ping timeout: 252 seconds]

17:43 h4zel has joined #osdev

17:44 xenos1984 has quit [Ping timeout: 250 seconds]

17:57 poyking16 has joined #osdev

18:00 xenos1984 has joined #osdev

18:11 <geist> ddevault: that rdmsr of fsbase doesn't make sense. that should *always* work

18:11 <ddevault> I know

18:11 <geist> the fsgsbase stuff is just some new feature to do it via an instruction (and faster)

18:11 <ddevault> I have no clue why it isn't

18:11 <geist> so i'd super double check if you have the right number loaded in the right register

18:11 <geist> that has to be it, there's no other explanation

18:12 d5k has joined #osdev

18:12 <ddevault> I checked it several times, cross-referenced from two sources, and checked the constant against the intel and AMD manuals and the osdev wiki

18:12 <geist> are you using inline assembly? what does it look like?

18:12 <geist> did you look at the disassembly to make sure everythings in the right register?

18:12 <ddevault> not inline, but yes assembly

18:12 <geist> can you pastebin it?

18:12 gog has joined #osdev

18:12 <ddevault> I checked the panic dump of the CPU state, has the appropriate values

18:13 <gog> meow

18:13 <ddevault> the rdmsr code is very simple https://git.sr.ht/~sircmpwn/helios/tree/master/item/arch/%2Bx86_64/asm.s#L22

18:13 <bslsk05> git.sr.ht: ~sircmpwn/helios: arch/+x86_64/asm.s - sourcehut git

18:13 <ddevault> this is called with the system-v ABI

18:13 <geist> what is passed in what arg?

18:14 <ddevault> 0xC0000100 in the first parameter

18:14 <ddevault> no other parameters

18:14 <geist> so do you see the bug then?

18:14 <ddevault> ...err

18:14 <ddevault> yeah I do

18:14 <geist> hehe

18:14 <ddevault> ty

18:14 <geist> good ol duck debugging saves the day!

18:15 <j`ey> is

18:15 <geist> anyway yeah fsgsbase is just an optimization if you're using it in the kernel, and it's kinda annoying since it can't get to the alternate gs one.

18:15 <j`ey> delete 'mov %eax, %ecx'? i dunno x86 asm

18:15 pretty_dumm_guy has joined #osdev

18:15 <ddevault> yeah I grok fsgsbase

18:15 <ddevault> only started looking at it because the msr had weird issues

18:15 <geist> *howeveR* i do remember that it is a lot faster than the MSRs, so it's annoyingly fast enough to probably need to use in the context switch routines

18:15 <ddevault> will move the code back to the msr and add fsgsbase gated behind cpuid later

18:16 <geist> (obviously optimization, etc)

18:16 <geist> yah

18:16 <j`ey> what's the fix?

18:16 <ddevault> %edi, not %eax

18:16 <geist> j`ey: edi -> ecx

18:16 <geist> the first arg is passed in edi

18:16 <j`ey> ah
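Here is the bug spelled out. In the System V AMD64 ABI, the first integer argument arrives in %rdi/%edi. `rdmsr` takes the MSR index in %ecx and returns the value split across %edx:%eax. The stub must therefore move %edi, not %eax, into %ecx. The C wrapper below uses GCC-style inline asm so the compiler chooses the registers. It only compiles on x86-64 and only runs at CPL 0, so the part that can be exercised hosted is the EDX:EAX recombination.

```c
#include <stdint.h>

#define MSR_FS_BASE 0xC0000100u

/* rdmsr returns the high half in EDX and the low half in EAX. */
static uint64_t edx_eax_to_u64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

#if defined(__x86_64__) && defined(__GNUC__)
/* Hand-written equivalent of the corrected stub:
 *     mov  %edi, %ecx     # first argument -> MSR index
 *     rdmsr               # EDX:EAX = MSR[ECX], upper halves of RAX/RDX cleared
 *     shl  $32, %rdx
 *     or   %rdx, %rax
 *     ret
 * Executing rdmsr outside ring 0 raises #GP. */
static inline uint64_t rdmsr(uint32_t msr)
{
    uint32_t lo, hi;
    __asm__ volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
    return edx_eax_to_u64(hi, lo);
}
#endif
```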

18:16 <ddevault> dealing with fsbase/gsbase in a context switch is really fucking annoying

18:16 <ddevault> 0/10

18:16 <ddevault> should have had the instruction from the start

18:17 <geist> ddevault: yeah tell me about it, you need at least a test and if statement

18:17 <geist> yah but then you also have to do a swapgs, access, swapgs back too

18:17 <geist> (to get to the gs one)

18:17 <geist> and if you get an NMI in that spot. ugh.

18:17 <geist> if it weren't so much faster than the MSRs i'd just forget the whole thing
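This sketch shows why fsgsbase is awkward for the GS half of a context switch. After the `swapgs` on kernel entry, GS.base holds the kernel's per-CPU pointer, and the user's value sits in IA32_KERNEL_GS_BASE (0xC0000102). Reading the user's value therefore takes either one MSR read or `swapgs; rdgsbase; swapgs`. An NMI that lands between those two `swapgs` interrupts kernel code while GS holds the user's base. An entry path that skips `swapgs` for kernel-mode interruptions would then use the wrong per-CPU pointer. The CPU operations are passed in as hypothetical callbacks so the control flow can run hosted; real code would inline the instructions.

```c
#include <stdint.h>

#define MSR_KERNEL_GS_BASE 0xC0000102u

/* Hypothetical hooks standing in for the real instructions. */
struct cpu_ops {
    uint64_t (*rdmsr)(uint32_t msr);
    uint64_t (*rdgsbase)(void);
    void     (*swapgs)(void);
};

/* Called in kernel mode after the entry swapgs: returns the user's GS base. */
static uint64_t save_user_gs_base(const struct cpu_ops *ops, int have_fsgsbase)
{
    if (!have_fsgsbase)
        return ops->rdmsr(MSR_KERNEL_GS_BASE);

    /* Temporarily bring the user's base into GS.base, read it, swap back.
     * An NMI between the two swapgs sees the user's GS while in kernel mode. */
    ops->swapgs();
    uint64_t base = ops->rdgsbase();
    ops->swapgs();
    return base;
}
```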

18:19 <j`ey> geist: do you do patching in fuchsia?

18:19 <geist> yup

18:19 <geist> if you mean some routines are patched in the kernel, yes

18:19 <j`ey> yeah

18:20 <geist> in this particular case (using fsgs or not) it's not really worth patching per se, it's more that the ergonomics for using it for context switch are annoying

18:20 <geist> mostly because of the swapgs thing

18:20 <geist> these instructions are primarily for user space code

18:22 <geist> also and i already forgot the specific details, there's a little bit of weirdness with loading 0 or not 0 into the fs and gs registers on AMD and intel

18:22 <geist> specifically, iirc one of them will reset the fs/gs base to 0 when you do that in some conditions, and the other wont

18:22 <geist> so there's a little bit of an order of operation if you're also resetting the ds/es/fs/gs registers back to 0 in a context switch

18:23 <geist> it intersects with when you actually save/restore the fsgsbase MSRs

18:23 * geist pets gog

18:23 * gog prr

18:23 <gog> i finished my work thing and my boss said it was ok for me to grab some smaller jobs for the rest of the week :D

18:24 <gog> closed two today

18:26 <geist> noice!

18:27 <gog> started on a third but i put in a lot of extra time over the last week and a half so i went home on time for once :P

18:27 <gog> i really love my job

18:31 <mjg> bro

18:31 <mjg> oh right, you are new at the job

18:31 <mjg> here is a general pro tip: hard work results in more hard work with on payoff

18:31 <mjg> *no* payoff

18:32 <mjg> don't be the guy always taking the shift cause why not

18:33 <mjg> people suck and if you let them treat you a certain way, they will think less of you for it (and consequently look for more ways to do it, rinse & repeat)

18:33 <mjg> soft skillz for tech people 101

18:33 <zid> My strategy is to do 0 work, I've taken it to the extreme

18:34 <mjg> you undo work?

18:34 <zid> I technically undid some work today I guess?

18:34 <zid> I stopped a program doing a thing

18:34 <heat_> mjg, i dunno about that pthread_mutex thing

18:35 <heat_> if I were to guess, I would say pthread_mutex_lock and co. aren't required to be AS-safe in POSIX

18:35 <heat_> but I can also see why this would be a bad guess

18:36 <mjg> what if you get the signal while already holding the lock

18:36 <mjg> which the code at hand makes no provision to prevent

18:36 <heat_> trylock

18:36 <mjg> best part is that there should be no threading in this prog to begin with

18:36 <mjg> someone just did not know how to write an event loop
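`pthread_mutex_lock` is not on POSIX's list of async-signal-safe functions. Locking in a handler can therefore self-deadlock in exactly the case mjg raises: the signal lands while the same thread already holds the lock. The usual event-loop alternative is for the handler to only set a `volatile sig_atomic_t` flag (or write a byte to a self-pipe), and for the main loop to do the real work. This is a minimal sketch; `on_sigusr1`, `install_handler` and `pending_work` are illustrative names.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_signal;

/* Only async-signal-safe work here: record that the signal happened. */
static void on_sigusr1(int signo)
{
    (void)signo;
    got_signal = 1;
}

static int install_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Called from the event loop, outside signal context, where taking locks is fine.
 * Repeated signals between checks coalesce into one unit of pending work. */
static int pending_work(void)
{
    if (!got_signal)
        return 0;
    got_signal = 0;
    return 1;
}
```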

18:37 <heat_> that's great

18:37 <mjg> and then what, you pretend there was no signal? :>

18:37 <heat_> btw I thought you worked at the freebsd foundation?

18:37 heat_ is now known as heat

18:37 <mjg> used to, can't be fucked to update my linkedin profile

18:38 <mjg> why are you stalking me

18:38 <heat> i didn't stalk you

18:38 <zid> I'll stalk you if you want

18:38 <mjg> zid: appreciated

18:38 <heat> it's just that you work on freebsd and are a committer so it's highly likely you're at the fbsd foundation

18:38 * zid googles mateusz guzik

18:38 <mjg> heat: well i work on freebsd at the current workplace as well

18:38 <zid> no no heat, works on freebsd, has been committed*

18:39 <heat> "Kernel development with focus on scalability issues.Kernel development with focus on scalability issues."

18:39 <heat> replace that with "haha path go open(2)"

18:39 <zid> Kernel development with a focus on memoyr odreign

18:40 <zid> that's the alpha branch

18:40 <heat> what

18:40 <heat> did you have a stroke

18:40 <zid> yes, because I ran my code on an alpha

18:41 <mjg> chill person in person <-- that's me

18:41 <zid> That's because eastern europeans all know the only possible trajectory for their life is upwards, the opposite of the UK and US

18:42 <heat> is it?

18:43 <heat> you can get invaded

18:43 <heat> or shot

18:43 <heat> or both

18:43 <mjg> you can't get cancelled

18:43 <heat> or accidentally mass genocided in ex-yugoslavia

18:43 <zid> heat: yea but now talk about places that aren't the US

18:43 <mjg> besides i take offense. i'm from central europe!

18:43 <heat> i'm not getting invaded nor shot nor mass genocided

18:44 <zid> poland is east sorry, you have an accent that sounds like russian to the untrained

18:44 <zid> that makes you eastern european

18:44 <mjg> that's just your lack of education!!

18:44 <zid> I don't make the rules

18:44 <heat> poland = weird letters = eastern europe

18:44 <zid> I just enforce them mercilessly

18:44 <mats1> is this slav on slav violence?

18:44 <mjg> seriously though, there is a common perception that anything east of germany is progressively more backwards

18:44 <zid> he says he isn't a slav

18:44 <heat> "how many consonants can you fit in a weird"

18:44 <heat> s/weird/word/

18:45 <heat> although weird is also a fitting description of polish words

18:45 <mjg> as in even poles will think less of people east of them

18:45 <zid> z with dirt on it is a vowel heat, deal with it

18:45 <mjg> zid: bro there are 2 variants!

18:45 <mjg> ź and ż

18:45 <zid> Pretend I just linked that sketch of the polish farmer giving his name to the german with the typewriter

18:45 <mjg> however, if you knew the fucked czech/slovak alphabet, you would consider poland to be very western

18:45 <zid> I'm too lazy to find it

18:46 <mjg> ye i know

18:46 <heat> you mean check

18:46 <mjg> brzęczyszczykiewicz

18:46 <heat> from the check republic

18:46 <zid> yea him

18:46 <mjg> classic

18:46 <heat> or if you're european, cheque

18:46 <zid> if your name ends in -vich, you're eastern european

18:46 <mjg> tbf his name sounds like shit even to poles though

18:46 <mjg> typical polish names are nowhere near as f-worded

18:47 <geist> does the word have not enough vowels? check!

18:47 <zid> we can give him some of germany's, as reparations

18:47 <zid> they have a lot of trailing e they can give up

18:48 <heat> poland bad germany bad france bad god save the queen pound

18:48 <zid> heat talking sense for once

18:48 <mjg> afghanistan under taliban rule is the place to be

18:48 d5k has quit [Quit: leaving]

18:49 <heat> "<mjg> afghanistan under taliban rule is the place to be"

18:49 <heat> this is a fucking quote

18:49 <heat> someone make this the topic

18:49 <geist> anyway, speaking of topics

18:49 <zid> right, back on topic guys, freebsd is shit

18:49 <mjg> i hear onyx does not scale

18:50 <heat> freebsd shit freebsd foundation shit praise mjg's new employer

18:50 <heat> WHAT

18:50 <heat> can't hear you

18:50 <mjg> heat: it's been over a year now

18:50 <heat> anyway yeah I actually had a real question

18:50 <mjg> what

18:50 <mjg> can't hear

18:51 h4zel has quit [Ping timeout: 265 seconds]

18:51 <heat> does freebsd still have slab's caching for the initialization/destruction or is that bs gone like in linux

18:51 <mjg> yes :(

18:51 <mjg> it has

18:51 <mjg> i have work in progress to whack it

18:51 <geist> idea is that sort of shenanigans is no longer useful on modern hardware?

18:51 <heat> why is it bad

18:51 <mjg> there is one nasty consumer which makes it problematic

18:51 <mjg> well let's make sure we are on the same page here

18:51 <geist> on paper it seems like a good idea

18:52 <mjg> there is initialisation at *allocation time* as in just before returning to the caller

18:52 <mjg> and there is some code to run when you import pages backing the slab, creating objects to begin with

18:52 <geist> right, such that it has magazines of partially initialized things ready to go

18:52 <mjg> freebsd supports both and the latter is fine, while the former is a bad idea

18:53 <mjg> and i intend to remove it

18:53 <geist> which to remove?

18:53 <mjg> init just prior to return

18:53 GeDaMo has quit [Quit: Physics -> Chemistry -> Biology -> Intelligence -> ???]

18:53 <mjg> i reviewed several consumers and i did not find any which benefits from it

18:53 <geist> oh sure. okay. that makes sense. i thought initially you were saying the 'partially initialized thing in the cache is bad' and i was wondering if that's one of those optimizations that end up not being worth it in the end

18:53 <mjg> and the loss is from an indirect function call

18:54 <geist> yah and i guess the consumers are still free to initialize/construct the object once you've handed it to them

18:54 <mjg> geist: it may or may not be worth it, the problem is that historical papers about it were doing stupid stuff to show the win

18:54 <geist> ie, here's 48 bytes of something

18:54 <mjg> right

18:54 <mjg> it gets largely overwritten most of the time anyway

18:54 elderK has joined #osdev

18:54 <mjg> and you still pay for branching on the existence of the init func, you waste space in the cache line to store it etc.

18:55 <geist> yah okay, makes sense. and yeah i was thinking the everything just overwrites it anyway part with modern software practices may be the reason the partially initialized stuff isn't as useful anymore

18:55 <mjg> it's all around a loss

18:55 <geist> i'd tend to think with modern machines the idea that you blat out something on the object and then immediately use it means the cache is hot, etc and may be a generally okay pattern

18:55 <geist> up to some point of course

18:55 <mjg> well let me tell you what prompted the existence of constructors on page import

18:55 <mjg> in solaris

18:55 <mjg> and it is pretty bad

18:56 <mjg> (sorry :))

18:56 <geist> yah i remember reading the same slab paper

18:56 <geist> that's kinda what i'm going on in my mind

18:56 <heat> i was reading the slab paper on my uber today

18:56 <mjg> in solaris they have funcs like mutex_init, cv_init which are weirdly expensive

18:56 <geist> seems that the upside there is you can do all the initialization while not holding any locks

18:56 <geist> whereas alloc_slab -> construct is probably holding some consumer's lock

18:56 <mjg> in linux you would literally store a few bytes, done

18:56 <mjg> in solaris they have func calls

18:57 <zid> "which is weirdly expensive" sums up solaris as a whole doesn't it

18:57 <mjg> so ye, if you shave numerous func calls because an obj has 5 mutexes and 3 condvars

18:57 <mjg> you can measure a win

18:57 <geist> different era. you generally didn't inline the fuck out of things like you do now

18:57 <geist> also because you probably had like 4KB cache, *total*

18:57 <heat> also a function call in linux

18:57 <heat> https://elixir.bootlin.com/linux/latest/source/kernel/locking/mutex.c#L46

18:57 <bslsk05> elixir.bootlin.com: mutex.c - kernel/locking/mutex.c - Linux source code (v5.19.12) - Bootlin

18:58 <geist> anyway, got it

18:58 <mjg> geist: if this was just a few stores, it would *likely* still be faster to just have them in one central func allocating stuff

18:58 <mjg> note the store is probably shorter than generating a call + recovery from it

18:58 <geist> probably. also unclear of compilers at the time even had inlining as a feature. you'd have to macro it probably

18:58 <mjg> i'm not saying everyone allocating has to inline this at their own callsite

18:58 <mjg> i'm saying: foo_alloc would contain a call to slab + do the init

18:58 <geist> anyway, i just remember from say BeOS kernel, which did none of this inlining either

18:59 <mjg> that said, there may be legitimately expensive things to init

18:59 <mjg> in which case the idea is great

18:59 <geist> yah. i think the idea though was if you freed an object it could keep a partially initialized version though

18:59 <geist> and thus as it got recycled it wouldn't need to be re-allocated

18:59 <geist> er re-initialized

19:00 <geist> so it's less of any sort of win on a 'clean' page added to the slab that then gets used, and more of an optimization for the recycle path, right?
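
A minimal sketch of the split mjg argues for: the constructor runs once per object when the backing page is imported, freed objects stay constructed so the recycle path geist asks about never re-runs it, and the cheap per-allocation stores live in the consumer's own foo_alloc() instead of behind an optional init hook in the slab. `struct foo`, `OBJS_PER_PAGE` and all functions are hypothetical; calloc() stands in for the page allocator and pages are never returned.

```c
#include <assert.h>
#include <stdlib.h>

#define OBJS_PER_PAGE 8   /* hypothetical: 8 objects per imported page */

struct foo {
    int lock_initialized;   /* "expensive" state, e.g. what mutex_init() sets up */
    int refcount;           /* cheap per-allocation state */
    int flags;
    struct foo *next_free;  /* freelist link; the ctor does not rely on it */
};

static struct foo *freelist;
static unsigned ctor_calls;   /* instrumentation for the example */

/* Constructor: runs once per object, at page import. */
static void foo_ctor(struct foo *f)
{
    f->lock_initialized = 1;
    ctor_calls++;
}

static int slab_import_page(void)
{
    struct foo *page = calloc(OBJS_PER_PAGE, sizeof *page);
    if (!page)
        return -1;
    for (int i = 0; i < OBJS_PER_PAGE; i++) {
        foo_ctor(&page[i]);
        page[i].next_free = freelist;
        freelist = &page[i];
    }
    return 0;
}

/* Hands back an already-constructed object; no per-alloc hook. */
static struct foo *slab_alloc(void)
{
    if (!freelist && slab_import_page() != 0)
        return NULL;
    struct foo *f = freelist;
    freelist = f->next_free;
    return f;
}

/* The object stays constructed, so reuse skips the ctor. */
static void slab_free(struct foo *f)
{
    f->next_free = freelist;
    freelist = f;
}

/* The consumer's allocator: slab alloc plus a few plain stores. */
static struct foo *foo_alloc(void)
{
    struct foo *f = slab_alloc();
    if (f) {
        assert(f->lock_initialized);   /* guaranteed by the ctor */
        f->refcount = 1;
        f->flags = 0;
    }
    return f;
}
```

After 100 alloc/free cycles the constructor has still only run 8 times, once per object on the single imported page.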

19:00 <mjg> i'm not disputing it's ok in the worst case even for this mutex stuff

19:00 <heat> oh yeah also, is the cache coloring still similar to the original paper? or did you get more fancy?

19:00 <mjg> i'm saying the ordeal was prompted from solaris inefficiency in the area

19:00 <geist> sure.

19:00 <geist> yeah i'm actually looking at this from a 'one of these days we'll do a slab in zircon too and have been vaguely thinking about these sort of ideas'

19:01 <heat> I don't fully understand how the coloring works and I don't know how their bus example maps to modern x86/arm64 hardware

19:01 <geist> heat: i'm not entirely sure the cache coloring matters much anymore

19:01 <geist> i think it matters a lot more on direct-mapped and/or very low associativity caches

19:01 <geist> which would have been the norm back then
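
For heat's coloring question, a sketch of the arithmetic roughly as the slab paper describes it: whatever space is left over after packing objects into a slab is used to shift the first object of each successive slab by one more alignment unit, so objects at the same index in different slabs start on different cache lines. Assumptions: the object size is a multiple of the alignment, slab metadata is ignored, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct color_state {
    size_t align;       /* e.g. the cache line size */
    size_t max_color;   /* largest usable offset */
    size_t next;        /* offset for the next slab created */
};

static void color_init(struct color_state *cs, size_t slab_size,
                       size_t obj_size, size_t align)
{
    size_t nobjs = slab_size / obj_size;
    size_t leftover = slab_size - nobjs * obj_size;
    cs->align = align;
    cs->max_color = (leftover / align) * align;
    cs->next = 0;
}

/* Offset of the first object in a new slab: 0, align, 2*align, ...
 * up to max_color, then back to 0. */
static size_t color_next(struct color_state *cs)
{
    size_t off = cs->next;
    cs->next = (cs->next + cs->align > cs->max_color) ? 0 : cs->next + cs->align;
    return off;
}
```

With a 4096-byte slab, 320-byte objects and 64-byte lines, 12 objects fit with 256 bytes left over, so five colors (0 to 256) are available. This is also why geist's point holds: with high-associativity caches, lines at the same index rarely evict each other anyway.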

19:01 <mjg> geist: for example in freebsd vnodes get added to a global lru list on alloc

19:01 <mjg> geist: as in when they are created. it would be prohibitively expensive on actual alloc/free

19:01 <geist> mjg: yah i think vnodes are the canonical example of these things

19:02 <mjg> i don't know if they are canonical, freebsd happened to not use this stuff prior to 5-ish years ago fwiw :P

19:02 <mjg> ... for vnodes

19:02 * geist nods

19:03 <geist> more of the canonical ideal object

19:03 <geist> ie, the one you would use if you wanted to describe it to a classroom of osdevers

19:03 <heat> maybe they're not canonical but red hat

19:03 <heat> ba dum tss

19:03 h4zel has joined #osdev

19:03 <mjg> anyway ye, constructor on page import == defo. an optional init function on alloc == please no kthx

19:03 <mjg> :)

19:03 <geist> since you have a fairly large and complicated structure that gets recycled a lot

19:03 <mjg> ye agreed

19:03 <geist> but i think that also ignores the recycle path

19:03 <heat> do you still have dtors?

19:04 <mjg> heat: again what's a dtor in this context

19:04 <geist> ie, free_but_keep_alive(vnode);

19:04 <geist> now it goes in the lru with a ref count of 0

19:04 <heat> mjg, as in the dtor you pass to kmem_cache_create

19:04 <geist> if asked for again it pops it out, otherwise the slab grabs it, runs dealloc() and then returns it as a new one

19:04 <heat> that gets called when a slab goes bye-bye

19:04 <mjg> you mean when you "whatever_free" or when you destroy the page

19:04 <mjg> so ye, dtors exist and make perfect sense if you have a ctor

19:04 <heat> hrm

19:05 <mjg> i unlink vnodes from lru in a dtor
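
A sketch of the vnode pattern mjg describes: the constructor links each object into a global LRU when its page is imported, and the destructor unlinks it when the slab releases the page, so ordinary alloc/free never touches the list. All names are hypothetical, and a real kernel would hold the LRU lock around both list operations.

```c
#include <assert.h>
#include <stdlib.h>

#define OBJS_PER_PAGE 8

struct vnode_like {
    struct vnode_like *lru_prev, *lru_next;   /* global LRU linkage */
};

/* Circular list head; empty when it points at itself. */
static struct vnode_like lru_head = { &lru_head, &lru_head };
static size_t lru_len;

/* ctor: runs when the page backing the object is imported. */
static void vnode_ctor(struct vnode_like *vp)
{
    /* real code: take the LRU lock here */
    vp->lru_prev = &lru_head;
    vp->lru_next = lru_head.lru_next;
    lru_head.lru_next->lru_prev = vp;
    lru_head.lru_next = vp;
    lru_len++;
}

/* dtor: runs only when the slab gives the page back. */
static void vnode_dtor(struct vnode_like *vp)
{
    /* real code: take the LRU lock here */
    vp->lru_prev->lru_next = vp->lru_next;
    vp->lru_next->lru_prev = vp->lru_prev;
    lru_len--;
}

static struct vnode_like *import_page(void)
{
    struct vnode_like *page = calloc(OBJS_PER_PAGE, sizeof *page);
    if (page)
        for (int i = 0; i < OBJS_PER_PAGE; i++)
            vnode_ctor(&page[i]);
    return page;
}

static void release_page(struct vnode_like *page)
{
    for (int i = 0; i < OBJS_PER_PAGE; i++)
        vnode_dtor(&page[i]);
    free(page);
}
```

This is the case the 2007 Linux removal of destructors gives up: without a dtor, the unlink would have to happen on every free or be tracked some other way.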

19:05 <heat> linux... does not have dtors

19:05 <mjg> no? that's weird

19:05 <heat> they have ctors, but not dtors

19:05 <geist> yah i thought a big part of the win is you can do the delayed dtor thing, and since the slab has a global view of what memory is in the system

19:05 <geist> you can allow the slab's global view to decide how to allocate memory between them

19:05 <geist> and it can globally trim slabs based on memory pressure by removing unused-but-not-dtored objects

19:05 <heat> https://lwn.net/Articles/233838/

19:05 <bslsk05> lwn.net: Slab allocators: Drop support for destructors [LWN.net]

19:06 <mjg> ye i like the general idea, but it tends to not be really utilized like it should

19:06 <geist> possible that that sounds good on paper, but something more bespoke is ultimately more flexible

19:06 <geist> yeah

19:06 <geist> like some sort of memory pressure notification to slab users: "hey trim your shit yo"

19:06 <geist> and then have the users do their own LRU
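
A sketch of the alternative geist floats: instead of the slab running destructors on its own schedule, the allocator calls back each registered user under memory pressure and lets it trim its own LRU. Linux's shrinkers are a real-world take on this idea; the registry below is a hypothetical, single-threaded illustration and not that API.

```c
#include <assert.h>
#include <stddef.h>

/* A trim callback frees up to `want` objects and reports how many it freed. */
typedef size_t (*trim_fn)(void *ctx, size_t want);

#define MAX_TRIMMERS 8

static struct { trim_fn fn; void *ctx; } trimmers[MAX_TRIMMERS];
static size_t ntrimmers;

static int register_trimmer(trim_fn fn, void *ctx)
{
    if (ntrimmers == MAX_TRIMMERS)
        return -1;
    trimmers[ntrimmers].fn = fn;
    trimmers[ntrimmers].ctx = ctx;
    ntrimmers++;
    return 0;
}

/* "hey trim your shit": ask each user in turn until enough is freed. */
static size_t memory_pressure(size_t want)
{
    size_t freed = 0;
    for (size_t i = 0; i < ntrimmers && freed < want; i++)
        freed += trimmers[i].fn(trimmers[i].ctx, want - freed);
    return freed;
}

/* Example user: a cache holding `cached` idle objects. */
struct toy_cache { size_t cached; };

static size_t toy_cache_trim(void *ctx, size_t want)
{
    struct toy_cache *c = ctx;
    size_t n = c->cached < want ? c->cached : want;
    c->cached -= n;   /* a real cache would evict n entries from its LRU tail */
    return n;
}
```

Asking for 8 objects from caches holding 5 and 10 empties the first and takes 3 from the second.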

19:06 <heat> "Taking a

19:06 <heat> spinlock in a destructor is a bit risky since the slab allocators may run

19:06 <heat> the destructors anytime they decide a slab is no longer needed"

19:07 <mjg> one little tidbit you may want to consider re dtors/ctors is batching

19:07 <mjg> the freebsd api, and likely solaris, just accepts 1 obj at a time

19:07 <geist> yah i'd also think that a slab has a better view of page utilization

19:07 <mjg> which is highly pessimal for usecases like mine

19:07 <geist> so it could decide to free these 8 objects because that makes the page free

19:08 <mjg> 8 vnodes fit in a page, so i get 8 lock/unlock cycles on each import

19:08 <mjg> it is rare enough to not be a big deal, but it is crap for no good reason
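
A sketch of the batching mjg suggests, counting lock acquisitions: a one-object-at-a-time constructor that takes the LRU lock costs 8 round trips per imported page when 8 objects fit in a page, while a batched constructor that receives all of the page's objects takes it once. The batch signature is an assumption for illustration, not the FreeBSD UMA API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define OBJS_PER_PAGE 8   /* "8 vnodes fit in a page" */

struct obj { struct obj *lru_next; };

static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;
static struct obj *lru_list;
static unsigned lock_acquisitions;   /* instrumentation only */

static void lru_lock_acquire(void)
{
    pthread_mutex_lock(&lru_lock);
    lock_acquisitions++;
}

static void lru_lock_release(void)
{
    pthread_mutex_unlock(&lru_lock);
}

/* Per-object ctor: one lock/unlock cycle per object. */
static void ctor_one(struct obj *o)
{
    lru_lock_acquire();
    o->lru_next = lru_list;
    lru_list = o;
    lru_lock_release();
}

/* Hypothetical batched ctor: every object carved from the new page at
 * once, so the lock is taken once per import. */
static void ctor_batch(struct obj *objs, size_t n)
{
    lru_lock_acquire();
    for (size_t i = 0; i < n; i++) {
        objs[i].lru_next = lru_list;
        lru_list = &objs[i];
    }
    lru_lock_release();
}
```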

19:08 <geist> well anyway, good stuff to think about

19:08 <geist> i think we're mostly stuck with one design in zircon, because C++ anyway

19:08 <geist> though you can placement new, the language itself is somewhat stricter about lifecycle of objects

19:09 <heat> also "Well, constructors are on their way out too because they don't seem to give the performance benefit they were designed for anymore" in 2007

19:09 <geist> you can't be as loosy goosey about halfway constructed objects, etc

19:09 <mjg> heat: that's basically a remark on the mutex_init et al stuff i mentioned

19:09 <geist> and yeah i suspect that's the 'cpus L1 caches are larger and cpus are really good at blatting out a relatively small amount of data in pretty close proximity'

19:09 <mjg> and does not account for less common use cases, like the vnode stuff i mentioned

19:10 <mjg> [the global lru]

19:11 <geist> side note i'm still amazed at how big the vnode cache gets on linux. i checked one time on i think a 64GB or maybe a 128GB machine after touching basically every file in my fs

19:11 <mjg> huh

19:11 <geist> and sure enough /proc/slabinfo showed like a little over 4GB of vnodes

19:11 <geist> millions of them

19:12 <mjg> well linux is pretty aggressive about populating the dentry cache

19:12 <geist> and the dentries, yeah

19:12 <mjg> i guess it is fine if you have enough ram anyway?

19:12 <geist> linux seems to go all in on dentry as a load bearing thing, i was a little surprised to recently learn this

19:12 <mjg> i would be worried if the system was running into trouble because of it

19:12 <heat> oh yeah totally

19:12 <geist> my general VFS knowledge is very BSD centric, where a dir cache is purely an accelerator cache, and not strictly necessary

19:12 <mjg> [which to be clear does happen :>]

19:13 <mjg> geist: funny you mention that, i was looking at making it mandatory :)

19:13 <heat> dentries are required if you want to go up or down the path, or if you want fast lookups

19:13 <mjg> the problem with an optional cache is that you hinder user debugging

19:13 <heat> hitting the fs is bad

19:13 <geist> yah at first i'm a little grossed out about it, but then once you think about it it's not too bad

19:13 <mjg> for example they open a file, you don't store the name

19:13 <geist> i think the hard part will be network fses, they may require a bit more thinking re:dentries being mandatory

19:13 <mjg> then you lsof or compatible and there is no entry

19:13 <mjg> sucks

19:14 <geist> what i was surprised to hear is stuff like tmpfs doesn't need to really maintain its own dir structure, since it apparently just tosses stuff in dentries

19:14 <heat> the funny part of linux dentries is that filesystems can invalidate them

19:14 <geist> and thus you get the VFS storing your data for you

19:14 <geist> oh sure. i'd *assume* the fs can invalidate them. it has to

19:14 <mjg> it does not need to do it indeed, i have plans to utilize it in freebsd

19:14 <mjg> right now it is de facto double allocating names

19:14 <geist> well hard links would generally do that

19:14 <geist> but that should be fine: you just have two dentries pointing at the same vnode

19:15 <mjg> right

19:15 <mjg> fun fact, solaris refrains from entering tmpfs vnodes into dnlc

19:15 <mjg> instead they have a local hash

19:15 <geist> i think the dentry thing also nicely solves bind mounting and recursive mounts, i think

19:15 <heat> what's dnlc

19:15 <geist> though actually i haven't resolved in my mind completely how that works

19:15 <mjg> directory name lookup cache

19:16 <mjg> that's how old unix systems call it

19:16 <heat> I do all my mounts in dentries

19:16 <geist> ie, if you mount /a and then mount something else at /a/b and then bind mount /a at /c, do you get /c/b? i think in linux you dont

19:16 <mjg> geist: i have to admit me neither :)

19:16 <heat> a mountpoint is just a dentry with DENTRY_FLAG_MOUNTPOINT

19:16 <geist> ie, bind mounts dont drag along the entire submount structure, so there must be some way to nak mountpoint dentries when traversing

19:16 <mjg> so to be clear, the idea is not linux-specific, dragonflybsd also has the general approach

19:16 <mjg> but i don't know how it differs

19:17 <mjg> geist: you use a tuple of dentry + mount point to do the lookup

19:17 <heat> geist, I would imagine you would

19:17 <geist> that's kinda what i'm thinking

19:17 <geist> heat: i think not, because it also lets you recursively bind mount, and you dont want that to go forever

19:17 <heat> sure, you have a limit to recursion

19:17 <geist> ie, /a can be mounted at /a/b and i think it only goes one level?

19:17 <heat> symloops baby

19:18 <geist> anyway yeah there must be some sort of mount id instance or something that causes it to nak traversing it the second time, etc

19:18 <heat> yes

19:18 <heat> I forgot what its called

19:18 <heat> vfsmnt?

19:19 <heat> *shrug*
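
A sketch of the (mount, dentry) lookup mjg mentions, and why a plain bind mount does not drag submounts along: the mount table is keyed on the pair, and a bind mount creates a new mount instance with no children of its own even though it shares the same dentries. Linux keys its mount hash on such a pair (the mount side being struct vfsmount); everything below is a simplified, hypothetical model.

```c
#include <assert.h>
#include <stddef.h>

struct dentry { const char *name; };

/* One mount instance; a bind mount is a new instance sharing the root dentry. */
struct mount { struct dentry *root; };

/* "(parent mount, dentry) is covered by child mount". */
struct mount_edge {
    struct mount *parent;
    struct dentry *mountpoint;
    struct mount *child;
};

#define MAX_EDGES 16
static struct mount_edge edges[MAX_EDGES];
static size_t nedges;

static void attach(struct mount *parent, struct dentry *mp, struct mount *child)
{
    assert(nedges < MAX_EDGES);
    edges[nedges++] = (struct mount_edge){ parent, mp, child };
}

/* Crossing a mountpoint: the key is the pair, not the dentry alone. */
static struct mount *lookup_mount(struct mount *parent, struct dentry *d)
{
    for (size_t i = 0; i < nedges; i++)
        if (edges[i].parent == parent && edges[i].mountpoint == d)
            return edges[i].child;
    return NULL;
}
```

Walking /c/b reaches the same dentry `b` as /a/b, but through the bind instance, whose pair has no edge, so the submount stays hidden. A recursive bind (`mount --rbind`) also copies the existing child mounts onto the new instance, which this model leaves out.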

19:19 <geist> anyway the old scheme i used to use for VFSes is to store the mount point in the vnode itself. basically a field that says 'i'm covered by <pointer to new mount instance>'

19:19 <geist> but putting it in the dentry makes sense, if the design makes it load bearing

19:20 <mjg> wow they really don't have a dtor https://elixir.bootlin.com/linux/latest/source/mm/slab.h#L205

19:20 <bslsk05> elixir.bootlin.com: slab.h - mm/slab.h - Linux source code (v5.19.12) - Bootlin

19:20 <heat> I know

19:20 <heat> I gave you the patch that removed it

19:20 <mjg> heat: in 2007

19:20 <mjg> heat: for all i know it came back since then

19:20 <heat> https://marc.info/?l=linux-kernel&m=117882364330705&w=2 is the discussion

19:20 <bslsk05> marc.info: '[RFC] Slab allocators: Drop support for destructors' - MARC

19:22 <mjg> i'll read it later, have to go afk

19:22 <heat> geist, you know, I was just thinking, what if it's ok to perform construction at slab allocation in zircon since you can just make construction lighter

19:22 <heat> particularly since google's coding style effectively suggests that

19:22 <heat> (with no exceptions, etc)

19:23 <geist> yah, would have to probably templatize a slab instance on it so it knows which constructor to use, etc

19:23 <geist> and then that's mega code bloat (but that's probably what most folks would do first)

19:23 <heat> the only drawback is that you'd stop being able to do something like "new inode{inode_num, uid, gid, ...};"

19:23 <geist> so you end up with a type safe slab

19:24 <heat> yeah

19:24 <geist> yah you'd need an Init() routine, which is actually what the style generally calls for

19:24 <heat> I have a type safe memory pool but it's ass

19:24 <geist> anything that's not trivial in the kernel generally has a ctor + Init() routine pattern

19:24 <geist> because Init can return error, etc

19:24 <heat> particularly, you can't have a list of slabs, etc

19:24 <heat> and then, yes, template bloat woohoo

19:25 <geist> yah, but if you do it right can probably avoid any function pointers, at the expense of code bloat

19:25 <geist> the fun art is carving off inner routines and making them just arg driven

19:25 <geist> like the internal memory allocation and slicing routines

19:25 k8yun has joined #osdev

19:26 <geist> but really template bloat is par for the course in Google Style. for better or worse. i'm a bit aghast at it but what can i do

19:26 <heat> avoid it? :P

19:26 <geist> "The avalanche has already started, it's too late for the pebbles to vote"

19:27 <heat> every time I look at zircon I find yet another thing that got replaced by some overengineered C++ solution

19:27 <heat> usually with templates

19:27 <heat> last time it was CPUID

19:27 <geist> agreed. imagine when it's your stuff that's being replaced. it's a bit difficult at times

19:27 <geist> oh dont get me started on the cpuid stuff.

19:27 <geist> at least the codegen on that stuff is the same, the ergonomics are just insane. and people *like it*

19:28 <geist> that's what i can't argue with. a whole class of folks just love that stuff

19:28 <geist> 'look at how unit testable it is! the fields are self descriptive! i can <insert random templatey neat thing you can do with it>'

19:28 <geist> *shrug* yeah it is 'better' i guess. i just like simple unless complexity is necessary

19:29 <heat> yeah but then the kernel doubles in compile time and maybe increases in size

19:29 <geist> yup. and i've made complaints about that before, but the general retort is 'can you prove that the increase in size is causing a problem?'

19:29 <heat> uint8_t ProcessorId::local_apic_id() const {return fbl::ExtractBits<31, 24, uint8_t>(registers_.ebx());}

19:29 <heat> just shift and mask?
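
The shift-and-mask heat suggests, next to a generic field extractor for comparison. CPUID leaf 1 returns the initial local APIC ID in EBX bits 31:24; `extract_bits()` is a hypothetical C stand-in for fbl::ExtractBits, whose exact semantics are not checked here.

```c
#include <assert.h>
#include <stdint.h>

/* Generic extractor for bits hi..lo (inclusive) of a 32-bit value. */
static inline uint32_t extract_bits(uint32_t value, unsigned hi, unsigned lo)
{
    uint32_t width = hi - lo + 1;
    uint32_t mask = (width >= 32) ? UINT32_MAX : ((1u << width) - 1);
    return (value >> lo) & mask;
}

/* CPUID.1:EBX[31:24] is the initial local APIC ID: shift and mask. */
static inline uint8_t local_apic_id(uint32_t ebx)
{
    return (uint8_t)((ebx >> 24) & 0xff);
}
```

Both compile to essentially the same shift, which matches geist's remark that the codegen is identical and only the ergonomics differ.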

19:30 <geist> sure, but folks that like that stuff point out that that solution is totally superior: no chance for mistakes

19:31 <geist> thing is that folks in the trenches that write this sort of stuff for a living in C or whatnot point out that these sort of bit level mistakes are really not that common. 95% of driver/low level stuff errors are not putting the wrong bit in the wrong place, they're overall logic errors, or misunderstandings of hardware

19:31 <geist> which no amount of templatey goop will solve

19:31 <heat> yea

19:31 <mjg> but can obfuscate enough for you to be able to claim progress :>

19:31 <mjg> refactor this, refactor that, work is being done boss

19:32 <geist> but folks that haven't written drivers for a living love looking at the 'fragile nasty C' code and building complex abstractions to solve a perceived problem

19:32 <heat> is there a divide between old school and C++ people?

19:32 <geist> but at the end of the day what you *can* do is at least make sure the codegen isn't bad

19:32 <geist> heat: yes but the c++ people outnumber the old school 10/1

19:33 <geist> so what can you do

19:33 <heat> :|

19:33 <geist> anyway, it's not as bad as i make it out. there are smart C++ people. they know what is and isn't a good idea re:codegen

19:33 <j`ey> join a linux team!

19:33 <heat> I think geist would rather die a painful death

19:33 <geist> nah, linux is way too toxic. it'd be like asking me to play CSGO

19:34 <heat> what's wrong with csgo

19:34 <geist> toxic

19:34 <heat> noooooo

19:34 <heat> rocket league is way worse

19:34 <geist> it's all shades of toxicity

19:34 <heat> if you just dick around in csgo no one will fuck with you

19:34 <geist> fine. anyway re all this i got some work to do today

19:34 <heat> in rocket league you get "What a Save!"'d 4 times a match at least

19:35 <heat> j`ey, whats your favourite directory in linux

19:36 <heat> if you say arch/arm64 I know you're lying, that's a PR answer

19:36 <heat> arch/arm is even worse because no one likes that dir

19:36 MiningMarsh has quit [Ping timeout: 252 seconds]

19:37 MiningMarsh has joined #osdev

19:39 <j`ey> heat: drivers/firmware/efi

19:39 <heat> lie

19:39 <heat> what a fucking lie

19:40 <j`ey> D:

19:41 <heat> the only correct answer is in a hipster kernel

19:41 <heat> you have like 5 BSDs to choose from, and 9front

19:42 <heat> also darwin

19:42 <heat> and *ONYX* obv

19:42 <ddevault> the rewrite is paying off https://l.sr.ht/UekL.png

19:42 <ddevault> couldn't do this in the old version

19:42 <j`ey> heat: what is it?

19:43 <heat> i don't know

19:43 <heat> you choose

19:45 <heat> linux loving is boring, stanning a hipster kernel is cool

19:45 <heat> oh, forgot haiku, silly me

19:46 <j`ey> but I spend all my time in arch/arm64 D:

19:48 nick64 has joined #osdev

19:49 <heat> also too mainstream

19:49 <heat> I think ARM should pay you to work on VAX support

19:50 <heat> what's a unix without vax support

19:50 <nick64> ring0 and ring3 in Intel aren't just software constructs, but physical indicators in the processor, right?

19:50 <heat> yes

19:50 <gog> yeah

19:51 <gog> the instructions the current process has access to will depend on the CPL
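
On gog's point about the CPL: the current privilege level is the low two bits (the RPL field) of the selector loaded in CS. A small decoder; the example values 0x10 and 0x33 are the kernel and user code selectors Linux uses on x86-64, and the names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct selector_fields {
    unsigned index;   /* bits 15:3, descriptor table index */
    unsigned ti;      /* bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;     /* bits 1:0, requested privilege level */
};

static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f = { sel >> 3, (sel >> 2) & 1u, sel & 3u };
    return f;
}

/* The CPL is the RPL field of whatever is currently in CS. */
static unsigned cpl_from_cs(uint16_t cs)
{
    return cs & 3u;
}
```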

19:51 <heat> x86 has ring 0, 1, 2 and 3, and then SMM and "theoretically" a hypervisor if you're under it

19:51 <gog> ring -1

19:51 <gog> as it's sometimes called

19:51 <heat> confusingly you can have SMM under a hypervisor as well

19:52 <heat> rings in x86 are hard

19:52 <gog> yeh x86 is a strange cpu

19:52 <heat> anyway use 0 and 3

19:52 <heat> the rest don't matter

19:52 <gog> yeah 1 and 2 aren't really useful for anything

19:52 <gog> almost nobody used them

19:52 <gog> they were supposed to be for privileged applications like kernel services for microkernels

19:52 <nick64> That is the very next question I was going to ask, are the negative rings really hardware indicators, or just indicators based on some register bits, as a software construct?

19:53 <gog> it's more like system management mode is an entirely different thing

19:53 <gog> and the programmer doesn't really have access to its inner workings

19:53 <heat> gog, funnily enough RISCV has M (machine) mode for fw and S (supervisor) mode for kernel stuff but you can't be in M mode under a hypervisor

19:53 <gog> interesting

19:54 <heat> so QEMU can't load firmware if kvm is enabled

19:54 <nick64> A virtual machine monitor in a type 1 hypervisor, from one point of view is a kernel of the type1 (ring 0), and from another point of view ring -1, since it is hypervisor. Now I am confused, what is it?

19:54 <heat> nick64, they're hardware modes

19:54 <ddevault> rdmsr/wrmsr proposed for worst x86_64 instruction

19:54 <heat> it's a fine instruction

19:54 <heat> slow as balls, but fine

19:54 <gog> use sparingly

19:55 <heat> nick64: anyway, hypervisors are funky. the "ring" they are really depends on where you are down the stack

19:55 <heat> if you're under it, -1, if you're alongside it, 0

19:55 <heat> all of this is relative and life is relative and nothing is absolute

19:56 <zid> idk why people insisted on extending the ring model beyond its means

19:56 <heat> yeah

19:56 <heat> it's just marketing at this point

19:56 <zid> we're up to like ring -8 now

19:56 <nick64> heat: I know that ring0 and ring2 are "real" and "physical". Can't wrap my head around ring -1 being physical

19:56 <zid> and we only had 0 and 3 to begin with

19:57 <heat> nick64: negative rings are just concepts around things that are "above you" in the traditional sense

19:57 <heat> hypervisors are one of them, so is SMM

19:57 <zid> I could make an interpretation for ring -1 being "physical". The silicon is cheating on you and presenting an orchestrated fake physical reality.

19:57 <heat> yes

19:57 <nick64> zid: what other way would you suggest instead of extending the rings, without making it insecure?

19:58 <zid> rings has nothing to do with security or lack thereof

19:58 <heat> zid isn't talking about "extending the rings"

19:58 <zid> it's a silly descriptor of a methodology

19:58 <heat> it's that the ring numbers are stupid and don't make sense

19:58 <zid> you don't have to remove methodology to remove a silly name

19:58 <heat> 0 - 3 really exist as rings

19:58 <ddevault> new iso should probably work should anyone care to test it for me: https://mirror.drewdevault.com/boot.iso

19:59 <heat> ddevault, doesn't work in kvm

19:59 <nick64> I see.. so you mean it should have been called ring0,1,2,3, and there should be some other naming/hardware bits to indicate the virtualization partition?

19:59 <ddevault> heat: aware

19:59 <ddevault> haven't bothered to investigate yet, patches welcome

19:59 <heat> okgreat

19:59 <ddevault> works on qemu softmmu and on my laptop, that's enough for now

20:00 <heat> nick64, arm64 and riscv levels are saner

20:00 <zid> I assume "halt" "unhandled IRQ1" is not what it's supposed to do?

20:00 <ddevault> that's fine

20:00 <ddevault> expected result is https://files.catbox.moe/h6r2g1.jpg

20:00 <gog> yeh if we wanna get really technical the ISA and the microarchitecture are not the same and what the programmer sees is a façade

20:00 <ddevault> halt is by design, unhandled IRQ1 is because there's no keyboard driver running

20:00 <zid> I didn't press anything I swear :,

20:01 <ddevault> doubt

20:01 <heat> nick64, the really important part to grasp is that every "ring level" is "physical"

20:01 <heat> as in, it exists

20:01 <nick64> heat: So say CPU is in ring 0, and decides to kick start the virtualisation stuff to spin up a guest. Does the CPU that is handling the monitor sort of "become" ring -1 (from the ring 0 it was), or is it just a visualisation, and it is still ring 0, with the VMCS indicating it is ring 0 root or ring 0 non-root?

20:01 <heat> SMM is more privileged than ring 0, hypervisors are more privileged than ring 0

20:01 <heat> last

20:02 <zid> technically

20:02 <zid> you end up with a clone of ring 0-3 again

20:02 <heat> yea

20:02 <zid> the guest has its own, the host has its own

20:02 <zid> the guest isn't supposed to "know" that this is the case

20:02 <heat> yeah

20:02 <zid> but the host can intercept things that the guest ring 0 did

20:02 <heat> TL;DR the negative ring numbering is bullshit

20:02 <nick64> Oh! So root partition has rings 0-3, and each non-root partition has its own rings 0-3?

20:02 <heat> yes

20:02 <zid> so you can say that in *practice* the host ring 0 is actually the guest's ring -1

20:03 <heat> you can even have hypervisors under hypervisors

20:03 <zid> but it doesn't make ring -1 *exist*

20:03 <heat> yeah

20:03 <nick64> So it is not a hardware thing below ring 0, just that it is a software datastructure (aka VMCS structure)?

20:03 <nick64> How accurate? ^

20:03 <heat> it's a hardware thing

20:03 <zid> the hardware implements it

20:03 <zid> else it wouldn't work at all

20:03 <zid> there has to be some silicon to let you make a 'fake' ring0, that you can control from the 'real' ring0

20:03 <heat> on a similar note, you can have SMM under a hypervisor and SMM above the hypervisor

20:04 <heat> and both can stop you

20:04 <zid> And yea, the motherboard is also doing it to you in the first place

20:04 <heat> the difference is that the SMM above the hypervisor takes you out of the vm, SMM under the hypervisor takes you to SMM inside it

20:04 <zid> Your host ring0 has its own ring -1, the motherboard, and the motherboard might have its own ring -1, the NSA :P

20:04 <nick64> Hardware implements the negative rings, or hardware implements the magic to make sense of the "software" VMCS structure on the memory?

20:04 <heat> which is -2 and which isn't? they both are

20:04 <heat> dude

20:04 <heat> negative rings don't exist

20:04 <zid> you're listening but you ain't listening

20:05 <heat> we've been through this

20:05 <zid> negative rings are an *abstraction*

20:05 <zid> they are not *real*

20:05 <nick64> Ok let me read everything all over again :D

20:05 <heat> it's just stupid marketing

20:05 demindiro has joined #osdev

20:05 <zid> It's just how you make sense of a guest client thinking it's in ring0, but not really, what 'number' do you give to the 'real' host ring0 priv level

20:05 <heat> -1 and -2 exist to make you go "woowwww there's something above ring0?????"

20:06 <nick64> How does CPU know if the ring0 it is in, is in fact the actual ring 0 or one of the VM's ring 0?

20:07 <nick64> Or in marketing terms, how does CPU actually know if it is in ring -1 or in ring 0?

20:07 <nick64> (Not asking how an application code can ask CPU this, how does the CPU itself know)

20:07 <gog> rings are fake. programming is fake

20:07 <gog> our lives are meaningless

20:08 <nick64> How does the CPU know to happily execute in/out in ring 0 of the host, but to cause a trap in ring 0 of the guest?

20:08 <gog> anyhow, there are flags for vm hypervisors

20:09 <gog> so it can know if it's in a "real" ring0 or not

20:09 <zid> of course the cpu knows?

20:09 <zid> How could it not, it's the one that implements it

20:09 <zid> if you did the "jump to guest instruction", it knows that happened

20:09 <zid> and can keep track of it in a bit somewhere

20:09 <nick64> I mean, if it "knows", then it is no longer "not real"?

20:09 <gog> vmenter and vmexit

20:09 <zid> ring -1 is not real, correct

20:09 <demindiro> Rings are a social construct and hardware state is magic

20:09 <gog> yes

20:09 <zid> there's the host ring0, and the guest ring0, that's it

20:09 Gooberpatrol_66 has quit [Remote host closed the connection]

20:10 <zid> the cpu knows which one it is in, as to whether it ends up vmexiting back to the host or not when something fun happens

20:10 <zid> that's the entire implementation and semantics

20:10 <zid> there's no special ring -1, either in the manual or in the cpu

20:11 <zid> there's just "if we vmentered, fun things should run vmexit"
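A toy model of zid's point, purely illustrative: the privilege level stays 0 in both host and guest, and a single "in guest" bit set at VM entry decides whether an intercepted instruction runs or exits back to the host. The struct, field names and the single intercept flag are hypothetical simplifications; on real Intel hardware this state lives in the VMCS and internal CPU state, and I/O interception is configured through VM-execution controls and I/O bitmaps.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: names are hypothetical, not real VMX structures. */
enum outcome { EXECUTED, VMEXIT_TO_HOST, GP_FAULT };

struct toy_cpu {
    int  cpl;          /* privilege level 0..3, used the same way by host and guest */
    bool in_guest;     /* set by "vmenter", cleared by "vmexit" */
    bool intercept_io; /* host's choice, standing in for a VMCS control field */
};

static void toy_vmenter(struct toy_cpu *c) { c->in_guest = true; }
static void toy_vmexit(struct toy_cpu *c)  { c->in_guest = false; }

/* What happens when code executes an I/O instruction such as in/out. */
static enum outcome toy_exec_io(struct toy_cpu *c)
{
    assert(c->cpl >= 0 && c->cpl <= 3);
    if (c->cpl != 0)
        return GP_FAULT;           /* ignores IOPL and the TSS I/O permission bitmap */
    if (c->in_guest && c->intercept_io) {
        toy_vmexit(c);             /* control returns to the host, still at CPL 0 */
        return VMEXIT_TO_HOST;
    }
    return EXECUTED;               /* host ring 0, or a guest the host lets through */
}
```

There is no "ring -1" anywhere in the model: the only extra state is the bit recording whether a VM entry happened.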

20:11 <nick64> Okay, bear with me for a moment to make a contradicting statement, "ring -1 exists, but it is more appropriate to call it 'ring 0 of root-partition that came into picture post vmxon' rather than the 'stupid?' marketing term?

20:11 <zid> why is it

20:11 <zid> every time you are told ring -1 exists

20:11 <nick64> How wrong am I ^

20:11 <zid> doesn't exist*

20:12 <zid> you start every subsequent summary with "ring -1 exists"

20:12 <zid> :P

20:12 <nick64> :yikes:

20:12 <zid> even if you're now trying to devil's advocate, you seemingly can't resist

20:13 <zid> It's a buzzword you *might* see on the product info of a software product, where it says "LATEST TECHnOLOGY" and "APPROVED BY 97% OF ALL DOGS"

20:13 <gog> semi-related side note but i've been taking locker 386 at the gym because it's a really easy number for me to remember :D

20:13 <zid> nice

20:14 <zid> I did something similar recently, I got 386th place in trackmania so I just stopped improving that track

20:14 <zid> why spend a bunch of time practicing how to ruin it

20:14 <gog> nice

20:14 <nick64> So ring -1 does not exist. But there exists some sort of hardware magic where CPU can tell if the ring0 is in host or guest. And in that point of view, even the virtual machine monitor is actually just ring 0 (and not some marketing ring -1 stuff), just that it is in root partition which CPU knows is more privileged

20:15 <zid> still can't resist

20:15 <zid> host, and guest, these are the terms you want

20:15 <zid> the rest you can delete

20:16 <nick64> Is VMX partitioning real?

20:16 <zid> Does vmenter and vmexit deal in vmx partitions

20:16 <zid> if yes, yes, if no no

20:18 <nick64> Okay I have a better phrased version of my first question. Let us say CPU is running in ring 0. We all agree ring 0 is real I guess. When the code that CPU is running does all that vmxon, vmptrload, vmlaunch magic, and the control flow is transferred to the Ubuntu guest machine kernel of mine, does the "real ring 0" status of the CPU remain in ring 0 itself?

20:19 <gog> the cpu is operating in a more-or less normal mode with the exception of certain instructions and access to certain regions of memory

20:20 <gog> so when it's in guest ring 0, it's in ring 0 with caveats

20:21 <gog> but the guest OS doesn't know or care

20:21 <gog> it can do whatever it wants and the hypervisor services it

20:22 <nick64> Right. And when there is a VMExit, the code that handles the exit, although the marketing folks would say is in ring-1, is actually just ring0 itself, but with some other part of the CPU not restricting those privileged instructions?

20:23 <gog> yes

20:24 <zid> so you can probably appreciate why someone *might* refer to it as ring -1, given it's a "super ring0 with privileged instructions" now, given that "guest ring0 without privileged instructions" exists

20:25 <nick64> So if the Ubuntu guest, from within it, spawns up another VM inside of it, as far as the CPU is concerned, both the guest ring 0s are of the EXACT SAME privilege, and the only difference is the address of where the exit handler function pointer and other VMCS related points to?

20:25 <zid> but it's just a cheeky abstraction to make some lengthy exposition of the situation disappear, not a real thing

20:26 <nick64> Same question simplified, as far as the CPU is concerned a VM's kernel running in ring 0, and a nested VM within that VM's kernel which is also running at 0, are of same privilege?

20:26 <zid> the only privilege it officially acknowledges are 0, 1, 2 and 3

20:27 <zid> of which software uses 0 and 3

20:27 <zid> nested vm shit is just nested vm shit

20:27 <nick64> So how does it differentiate the security privilege of VM1 that runs on host, and the VM2 that is spawned by VM1?

20:27 <zid> it makes various datasets have a host and guest relationship

20:27 <nick64> VM2 ring0 is not supposed to be able to control VM1 kernel right

20:28 <zid> no, that's the point of the encapsulation provided by the virtualization

20:28 <zid> vm2's the guest of vm1, its host

20:28 <zid> vm1 is in the same relationship and position toward vm0

20:29 <nick64> CPU keeps track of the forward-allow, backward-deny access relationship at multiple depths of nesting?

20:30 <zid> yes, it just knows where to look for the current set of flags and shit and updates accordingly at vmenter/exit

20:30 <zid> given you provide it when you set up the shit in each layer

20:30 <clever> nick64: aarch64 hypervisor api makes it far simpler to keep track of all of that and to explain it

20:33 <clever> nick64: https://i.imgur.com/MARtIZd.png

20:33 <clever> EL2 is where the hypervisor runs, and it has access to special hypervisor registers

20:33 <clever> with those registers, you can add a second set of paging tables, so the "physical address" from EL1(kernel mode) goes thru a second set of translations

20:34 <clever> and it can configure the cpu to trap any unauthorized or emulated function

20:34 <clever> EL1 then just cant access hw it isnt allowed to access, causing either a normal pagefault or a handler in EL2 to run

20:34 <clever> any time you context switch to another guest, EL2 has to re-configure all of those registers, to set the limits based on that other guest
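A rough sketch of the two-stage translation clever describes: the guest kernel's stage-1 tables turn a virtual address into an intermediate physical address (IPA), and the hypervisor's stage-2 tables turn that into a real physical address. It assumes single-level tables indexed by 4 KiB page number; real AArch64 tables are multi-level descriptors, and the stage-1 table walks are themselves translated by stage 2, which this ignores. The table type and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define NPAGES     16
#define INVALID    UINT64_MAX

/* Hypothetical single-level table: page number -> frame number, or INVALID. */
typedef struct { uint64_t frame[NPAGES]; } toy_table;

static uint64_t walk(const toy_table *t, uint64_t addr)
{
    uint64_t page = addr >> PAGE_SHIFT;
    if (page >= NPAGES || t->frame[page] == INVALID)
        return INVALID;
    return (t->frame[page] << PAGE_SHIFT) | (addr & ((1u << PAGE_SHIFT) - 1));
}

/* Stage 1 (guest, EL1): VA -> IPA. Stage 2 (hypervisor, EL2): IPA -> PA.
 * *fault_el reports who would handle a failure: 1 = guest, 2 = hypervisor. */
static uint64_t translate(const toy_table *s1, const toy_table *s2,
                          uint64_t va, int *fault_el)
{
    uint64_t ipa = walk(s1, va);
    if (ipa == INVALID) { *fault_el = 1; return INVALID; } /* ordinary guest page fault */
    uint64_t pa = walk(s2, ipa);
    if (pa == INVALID) { *fault_el = 2; return INVALID; }  /* stage-2 fault, taken to EL2 */
    *fault_el = 0;
    return pa;
}
```

For example, with stage 1 mapping page 1 to IPA page 3 and stage 2 mapping IPA page 3 to frame 7, VA 0x1234 ends up at PA 0x7234.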

20:36 <nick64> I wonder how it is restoring the trashed register states. Pushing to some virtual memory location like function stacks?

20:36 <clever> same as any other irq or exception handler, save all registers to a stack upon starting the handler

20:37 <nick64> Oh maybe intel is also doing something like that for nesting VMCS

20:37 <heat> demindiro, exactly

20:37 <heat> you're the most right person here

20:37 <clever> i think for nesting vm's, the hypervisor has to trap access to vm control registers

20:37 <clever> and then emulate the hypervisor api
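For the nested case, a hypothetical sketch of exit routing in the style of the usual nested-VMX design: only L0 (the real hypervisor) receives hardware exits, it emulates L1's VMX instructions, and it forwards an exit from L2 to L1 only when L1's own controls asked for that exit. The names and the short exit list are invented for illustration; the real exit reasons and which ones are unconditional are defined in the Intel SDM.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical exit reasons; real ones are numbered in the SDM. */
enum exit_reason { EXIT_IO, EXIT_HLT, EXIT_VMCS_ACCESS, EXIT_REASON_COUNT };
enum handler { HANDLED_BY_L0, REFLECTED_TO_L1 };

struct nested_state {
    bool running_l2;                        /* is the current guest L1's own guest? */
    bool l1_intercepts[EXIT_REASON_COUNT];  /* controls L1 wrote into its (emulated) VMCS */
};

static enum handler route_exit(const struct nested_state *s, enum exit_reason r)
{
    /* While L1 runs, every exit (including its VMREAD/VMWRITE/VMLAUNCH)
     * goes to L0, which emulates the hypervisor API for L1. */
    if (!s->running_l2)
        return HANDLED_BY_L0;
    /* L2 exited: forward to L1 only if L1 asked to see this kind of exit. */
    return s->l1_intercepts[r] ? REFLECTED_TO_L1 : HANDLED_BY_L0;
}
```

As far as the CPU is concerned, L1 and L2 are both just non-root ring 0; the "who controls whom" relationship lives in L0's bookkeeping.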

20:37 <zid> heat: can I be 3rd most right?

20:37 <heat> tes

20:37 <heat> yes*

20:37 <heat> but also tes

20:37 <zid> I figure if I go for 2nd, I might have to fight

20:38 <j`ey> clever: nested virt requires extra stuff, linux doesnt even support it yet

20:38 <heat> x86 privilege levels are stupid and magic

20:38 <zid> and while those guys are fighting, I'll snap up 3rd

20:38 <heat> i would never give you 2nd

20:38 <heat> i'm 2nd

20:39 <nick64> I think we'd need to invite Steven Crowder to settle this

20:39 <zid> so we have demindiro, stallman, zid

20:40 <heat> hello it's me, richard stallman

20:40 <heat> pleasure to meet you

20:40 <zid> how do you feel about parrots

20:40 <heat> gnu and gplv3+ and GNU plus linucks and Hurd and tivoization and pedofiles and all that

20:40 <heat> pedophiles?

20:40 <heat> probably

20:42 <heat> richard stallman, matthew garrett, undead corpse of tony blair and a literal poltergeist is all this channel comes down to in the end

20:44 <zid> which one of us is the corpse of tony blair

20:44 <heat> yes

20:45 <zid> oh

20:45 <heat> the real corpse of tony blair is the friends we make along the way

20:45 <CompanionCube> objection: for better or worse, tony blair is still alive

20:46 <heat> but is he really

20:47 <nick64> heat: is there any difference whatsoever in privilege or anything, when it comes to "the kernel thread that executed the vmx set of instructions to become the monitor for a VM" vs "any other threads spawned by the kernel of the same host machine, running on the metal"

20:47 <CompanionCube> https://www.thetimes.co.uk/imageserver/image/%2Fmethode%2Ftimes%2Fprod%2Fweb%2Fbin%2F059a1e48-a7a1-11eb-9b76-9500a3917e5f.jpg?crop=1866%2C1050%2C139%2C75 look at that face, it's bad

20:47 <CompanionCube> well, mostly the hair is bad

20:48 heat has quit [Remote host closed the connection]

20:48 heat has joined #osdev

20:48 <heat> the link crashed my irc client

20:48 heat has quit [Remote host closed the connection]

20:48 <zid> the linux

20:48 heat has joined #osdev

20:49 <CompanionCube> the curse of tony blair

20:49 <heat> oh wow I can't open that link lmao

20:49 <zid> to be fair, it's thetimes.co.uk, you probs don't want to

20:49 <CompanionCube> try it in a browser maybe?

20:49 <zid> It'll somehow make you vote conservative by declaring that red wine gives you cancer

20:50 <CompanionCube> zid: that's the daily express or mail

20:50 <zid> and the times

20:50 <heat> yeah the hair is NOT IT

20:50 <zid> and the express

20:50 <zid> and the sun

20:50 <heat> the s*n

20:50 <zid> and the sunday express

20:50 <zid> and the telegraph

20:50 <zid> and the evening telegr-

20:50 k8yun has quit [Ping timeout: 268 seconds]

20:50 <CompanionCube> yes but the two mentioned are the ones who do the cancer thing

20:50 <heat> and SKY NEWS

20:50 <heat> PRAISE BE THE MURDOCH

20:51 <zid> Turns out if you put restrictions on how much of a right-wing dipshit you can be on TV, murdoch buys all your newspapers

20:51 <zid> except the left leaning one which is owned by.. a russian oligarch!?

20:52 <CompanionCube> and since it's a picture of tony blair, rather he makes you vote New Labour and pretend people still like the Third Way.

20:52 <Ermine> Wtf is happening here

20:52 <heat> zid, the guardian is nice

20:52 <CompanionCube> zid: a russian oligarch who's buddies with boris enough to get into the lords, no less!

20:53 <CompanionCube> You wonder how he feels about that 'baron of siberia' title post-invasion.

20:53 <zid> UK media needs a reset button pushing

20:53 <\Test_User> media in general does

20:54 <Bitweasil> False! According to major media fact checking agencies, media is doing just fine, and you're the fault! MediaFactCheck++ Certified Answer!

20:54 <Bitweasil> :(

20:55 <\Test_User> XD

20:55 <Bitweasil> Now's a good time to get hobbies like "kerosene lanterns" and "off grid power systems" and "Wow, you know, being cold isn't so bad when you bundle up!" :(

20:55 <zid> I have investigated myself and found no wrongdoing

20:55 <\Test_User> being cold doesn't feel bad once you've died

20:56 k8yun has joined #osdev

20:56 <heat> mjg, how do you make slab go fassssstttttttttttt

20:58 <nick64> Is the vm monitor thread running in ring0 (which the hypervisor product marketing would call it as ring -1) somehow any different from the rest of the ring0 threads running in the same kernel, or exactly same level/mode/privilege as far as CPU is concerned?

20:58 <heat> same pri

20:58 <heat> v

20:59 <Bitweasil> The code calling VMLAUNCH/etc is just "ring 0 code," with the same ability to do stuff as other ring 0 code.

20:59 <Bitweasil> There's nothing special about it.

20:59 <heat> your only privilege is with respect to what's running under you

21:00 <nick64> Cool. I think that concretises the idea that -1 is not real, and that it is the guest side of this that is something new (ability to cause a trap on priv instrs)

21:00 <Bitweasil> Well... from whose perspective?

21:00 <heat> no ring is real in x86

21:00 <Bitweasil> ^^

21:00 <heat> it's all about perspectives

21:00 <gog> one ring to rule them all

21:00 <Bitweasil> From the *host* point of view, ring 0 is ring 0, the guest is running as a VMX guest, with its own rings.

21:00 <gog> precious

21:00 <\Test_User> aka intel me

21:00 <heat> ARM and RISCV are a lot more concrete

21:00 <Bitweasil> From the *guest* point of view, there are things "below ring 0."

21:00 <heat> \Test_User, intel ME is not a ring

21:01 <\Test_User> heat: true, but it does rule them all

21:01 <Bitweasil> Yeah, ME is a separate CPU core in the system, separate memory, etc.

21:01 <Bitweasil> Not really.

21:01 <Bitweasil> I'd argue that SMM, or the STM, are the lowest levels of x86.

21:01 <nick64> I think \Test_User meant SMM?

21:01 <heat> STM?

21:01 <Bitweasil> "Yo dawg, I heard you like hypervisors, so I put one in SMM so you can hypervise while you hypervize."

21:02 <\Test_User> lol

21:02 <demindiro> So bureaucracy

21:02 <Bitweasil> It's an obscure little corner of x86 I happen to know very well.

21:02 <zid> The lowest level of x86 is actually the superio chip

21:02 <nick64> I wonder if SMM to ME is some sort of processor to processor interrupt

21:02 <Bitweasil> No...

21:02 <zid> because I can fake keystrokes to log in as root via it, ergo it's the deepest attack vector

21:02 <heat> ME is totally separate

21:02 <nick64> Lowest level of x86 would be microcode, right?

21:02 <Bitweasil> So, SMM, system management mode, was the most privileged place in x86.

21:02 <heat> sure

21:02 <Bitweasil> And SMM handlers were a hot mess.

21:03 <Bitweasil> BIOS vendors largely said, "Well, we can't fix it, the guy who wrote that retired a decade ago."

21:03 <heat> Bitweasil, they're getting rid of a good chunk of SMM

21:03 <heat> PRM they call it

21:03 <Bitweasil> Oh? Link plz?

21:03 <Bitweasil> Anyway, SMM can violate hypervisor separation of guests, it can touch everything.

21:03 <Bitweasil> And back in the day, the High Assurance Platform (HAP) was upset about this.

21:03 <gog> SMM is the spooky ghost in the machine

21:03 <Bitweasil> So Intel solved it by adding a hypervisor over there.

21:03 <heat> let me find the whitepaper

21:04 <Bitweasil> So now, that hypervisor, the STM, logically lives "below" the executive hypervisor in the main operating space (you vmexit from ring 0 to the STM, vmlaunch from the STM to normal land ring 0).

21:04 <Bitweasil> And it can sandbox the existing legacy SMM handlers.

21:04 <Bitweasil> So now, *that* code is the most powerful code on the platform.

21:04 <Bitweasil> And if you're a tiny bit creative, you can do an awful lot of fun things from the STM with regards to introspection, snooping on other hypervisors, etc.

21:04 <heat> Bitweasil, https://uefi.org/sites/default/files/resources/Platform%20Runtime%20Mechanism%20-%20with%20legal%20notice.pdf

21:04 <\Test_User> interesting

21:05 <heat> https://github.com/tianocore/edk2/tree/master/PrmPkg for the codez

21:05 <bslsk05> github.com: edk2/PrmPkg at master · tianocore/edk2 · GitHub

21:05 <Bitweasil> Ooh, thanks!

21:05 <Bitweasil> "Problems with SMM" section is nice and to the point. It's an opaque hot mess. We've been polite, but... that's what it is.

21:06 <heat> for sure

21:06 <heat> it's a fucky area

21:06 <heat> no one really likes it

21:08 poyking16 has quit [Quit: WeeChat 3.5]

21:09 <Bitweasil> Dunno, I like it! :D

21:09 <Bitweasil> I've gotten paid for a lot of years to mess in it.

21:09 <Bitweasil> I mean, that's only because it's a poorly understood hot mess, but...

21:09 <Bitweasil> :p

21:10 <nick64> Some SMI handlers will be moved to OS context as part of PRM? Won't that dissolve all the security boundary between the OS and the SMM?

21:11 <heat> this is for code that doesn't need a security boundary

21:11 <nick64> If CPU executes in SMM privileges on code that can be modified by a malicious OS!

21:11 <Bitweasil> It depends on what you're doing. A lot of the stuff SMM does is just platform-specific stuff that happens to be shoved into SMM because legacy reasons.

21:11 <Bitweasil> Those *capabilities* need to be there, but they don't need to be executed in SMM.

21:11 <heat> you'll still need it for things like the lockbox or authenticated variables

21:11 <Bitweasil> They just happen to be, for legacy reasons.

21:11 <Bitweasil> Right, or hopefully NOR flashing.

21:11 <heat> >If CPU executes in SMM privileges

21:11 <heat> it doesn't

21:11 <heat> that's the point

21:12 <nick64> The code that is executing maybe something that does not need a security boundary, but the attacker in the OS can very well modify the code into the ones that open sesame, if the CPU fetch and execute from the OS mapped memory after getting into SMM mode, right?

21:12 <nick64> Oh got it

21:12 <Bitweasil> So, for instance, USB/PS2 emulation lives in SMM because it can be transparent. There's no reason that *needs* to be in SMM. It just... is.

21:12 <Bitweasil> Or some thermal response stuff.

21:12 <Bitweasil> Etc.

21:13 <Bitweasil> You'd be exposing those services to the OS, where it can execute those things without having to trap to SMM.

21:13 <Bitweasil> Because a multicore sync over there is hella-expensive.

21:13 <Bitweasil> (it doesn't *have* to happen, but most modern systems will lock up if you don't do that)

21:13 <nick64> I see. in that case it might actually be better for security, reducing code footprint on high privilege areas

21:13 <Bitweasil> Correct.

21:13 <Bitweasil> If you can move "the random crap that's in SMM now that doesn't actually need to be there" to somewhere not-SMM, that's an improvement.

21:14 <Bitweasil> And should perform a lot better too.

21:15 <heat> yeah

21:15 <heat> outb 0xb2 is super duper expensive

21:15 <Bitweasil> mmhmm.

21:15 <zid> what's on b2?

21:15 <Bitweasil> Super de duper mega expensive, flashing mah bling, making it rain microseconds!

21:15 <heat> register on intel chipsets that triggers an SMI

21:15 <Bitweasil> That's a traditional trap to SMI.

21:15 <zid> ah interesting

21:15 <zid> although every outb is pretty slow isn't it, because it does ISA bus speed, 1 microsecond

21:16 <Bitweasil> "These are my father's ports. Archaic interfaces, for a single-core time."

21:16 <Bitweasil> "... yeah, you still have to learn them, because stuff still uses them."

21:16 <Bitweasil> "Yes, it sucks. No, nobody's going to get rid of them. Still gotta boot DOS, after all. Now, let me tell you the story of the A20 gate..."

21:16 <heat> the A20 is not a thing anymore

21:17 <Bitweasil> Last I saw it was deprecated. Is it actually *gone gone* now?

21:17 <heat> no reset vector messes with that, qemu already comes with it enabled

21:17 <Bitweasil> (deprecated != removed)

21:17 <heat> I don't know if it's gone, but it's definitely enabled by default

21:17 <Bitweasil> But mah DOS!

21:17 <heat> actually

21:17 <heat> hasn't it always?

21:18 <Bitweasil> No, I thought it used to boot with it set to only 1MB of RAM, so non-A20 aware OSes would work properly.

21:18 <heat> I guess it's just a thing you disable when BIOS booting/CSM

21:18 <Bitweasil> That would make sense.

21:18 <heat> because there's no way you could reach the reset vector if A20 was masked

21:18 <Bitweasil> Hm? I'm pretty sure you can...

21:18 <heat> how?

21:19 <Bitweasil> Reset vector is just under 1MB, right?

21:19 <heat> no

21:19 <heat> 0xfffffff0

21:19 eck has quit [Quit: PIRCH98:WIN 95/98/WIN NT:1.0 (build 1.0.1.1190)]

21:19 <heat> it was there in 286 times

21:20 <heat> the last 64KB (IIRC) are still mirrored there for Legacy Purposes

21:20 <heat> (tm)(r) Legacy is a trademark of Intel Corporation

21:20 <Bitweasil> I thought it was set up so the segment/offset got you the right thing.

21:20 <heat> no

21:20 eck has joined #osdev

21:20 <Bitweasil> I mean, INIT-SIPI-SIPI is just to work around a bug in *one* 486 variant...

21:20 <Bitweasil> (you don't normally need the second one)

21:21 <heat> at reset your segments look 286-ish (and would theoretically point to the legacy 1MB area), but anything 386+ has a segment cache that points to the upper 4GB reset vector

21:21 <Bitweasil> Anyway, I've not actually done x86 bringup in a while, so... I will happily defer to anyone doing it currently.

21:21 <heat> this is actually pretty well defined in the SDM
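The arithmetic behind the reset-vector and A20 remarks, as a small C sketch. The constants are the power-on values the Intel SDM lists for CS (selector F000H, hidden base FFFF0000H) and EIP (FFF0H); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RESET_CS_SELECTOR 0xF000u
#define RESET_CS_BASE     0xFFFF0000u   /* hidden (cached) base, not selector << 4 */
#define RESET_EIP         0xFFF0u

/* Real-mode address the selector:offset pair would suggest (286-style view). */
static uint32_t real_mode_linear(uint16_t sel, uint16_t off)
{
    return ((uint32_t)sel << 4) + off;
}

/* The first instruction fetch after reset uses the cached base. */
static uint32_t first_fetch(void)
{
    return RESET_CS_BASE + RESET_EIP;
}

/* A20 gate masked: address line 20 is forced to zero. */
static uint32_t a20_masked(uint32_t addr)
{
    return addr & ~(1u << 20);
}
```

With these values, selector << 4 plus offset gives 0xFFFF0 (the legacy spot just under 1 MiB), the cached base gives 0xFFFFFFF0, and forcing A20 low turns 0xFFFFFFF0 into 0xFFEFFFF0, which is why a masked A20 would miss the reset vector.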

21:21 Starfoxxes has quit [Ping timeout: 265 seconds]

21:21 h4zel has quit [Ping timeout: 246 seconds]

21:22 Starfoxxes has joined #osdev

21:22 <heat> it's so theoretically 286s and 386s could boot in the same mobo

21:22 <Bitweasil> *nods*

21:23 <heat> just like itanium had an itanium entry point and an ia32 entry point in their firmware

21:23 <Bitweasil> I've mostly played in the 64-bit spaces.

21:23 <Bitweasil> UEFI is a mess, but it's a different kind. :)

21:23 <heat> this also applies to UEFI

21:23 k8yun has quit [Read error: Connection reset by peer]

21:23 <Bitweasil> Sure, I'm saying I play further along in the process.

21:24 <heat> yea

21:24 <Bitweasil> Most of my low x86 work is 64-bit hypervisors and SMM/STM.

21:24 k8yun has joined #osdev

21:24 <heat> I like PEI the most

21:24 <Bitweasil> I've done a bit of 32-bit work, but almost nothing below that.

21:24 <heat> out of UEFI

21:24 <heat> DXE is soo damn bloated

21:24 <heat> and SMM is SMM :v

21:26 <Bitweasil> Yup.

21:26 <Bitweasil> :)

21:26 <Bitweasil> Fun place to be, though.

21:27 <Bitweasil> "Awww, look at you peons way down there in ring 0! It'd be a shame if *someone* meddled!"

21:27 <heat> i find it mildly scary at best

21:27 <heat> but you do you :P

21:27 pretty_dumm_guy has quit [Ping timeout: 264 seconds]

21:27 <Bitweasil> Oh, I said it was a fun place to be. Related, I would *very very much not like* other people to be in there.

21:27 <Bitweasil> Since I know what you can do from it.

21:27 <Bitweasil> Still irked that Blackhat talk was denied. Not flashy enough, I guess.

21:27 <heat> but security in firmware is very much just a sideshow :)

21:27 <Bitweasil> I submitted a nice talk on what you can do from down there.

21:28 <Bitweasil> You ever mess with QubesOS?

21:28 <heat> no

21:28 <heat> but yeah I can imagine the stuff you can do

21:29 <heat> it's like zombocom

21:29 pretty_dumm_guy has joined #osdev

21:30 <Bitweasil> "Anything you want."

21:30 <Bitweasil> You have the full system at your fingertips and can configure it how you want.

21:31 <Bitweasil> Would you like the local APIC to deliver performance monitoring interrupts as SMIs? Sure, have at it! :D

21:31 <Bitweasil> (and then you can more or less do your own tracing, get execution every few branches, point PEBS or such at it...)

21:32 <heat> I prefer writing the firmware and *forgetting to lock SMRAM*

21:32 <heat> then get people to exploit the shit out of it

21:32 <heat> (how to have ONE FUCKING JOB and fail miserably at it)

21:33 <Bitweasil> There's that, too...

21:34 <Bitweasil> Plus that fun little race condition from... what, 12-13 years ago?

21:34 <Bitweasil> "Unlock the bit. Oh, wait, SMM goes nuh uh and locks it again!" Fine, for the single core era.

21:34 <Bitweasil> But if you have another core racing on things... yeah, help yourself.

21:34 <Bitweasil> I honestly don't trust computers anymore.

21:34 <Bitweasil> I just do them because it pays well. :(

21:34 <heat> lmao

21:35 <heat> there was also a problem with local apic relocation and SMRAM

21:35 <heat> where you could use it to reach SMM

21:36 <mjg> heat: you don't do what solaris did

21:36 <mjg> heat: ... which is locks around it

21:37 <heat> you can have percpu caches but you still need locks

21:37 <heat> and apparently it's a *tiny* bit more complex than that

21:37 <heat> given that linux has 3 slab allocators, all of them thousand-line beasts

21:38 <mjg> you don't

21:38 <heat> you don't?

21:38 <mjg> as long as you guarantee there will be no allocs from interrupt context

21:38 <mjg> you can just disable preemption around the fast path

21:39 <Bitweasil> Yeah, SMM has been a choose your own adventure story for a long while.

21:39 <mjg> you try to pop an obj, which normally succeeds, and leave

21:39 <mjg> if not, you go to the slowpath, take locks and whatnot

21:39 <Bitweasil> "If you want to try a multicore race, turn to page 49. If you want to hope the firmware didn't lock it and just directly access it, turn to page 76."

21:39 <mjg> well it is a little more complicated than that if you have 2 magazines, but the general point stands

21:40 <heat> how do you pop it atomically?

21:40 <mjg> you don't need atomics if that's what you mean

21:40 <heat> how?

21:40 <heat> 2 cpus pop the same thing, what now?

21:41 <mjg> that would be an invariant violation

21:41 <heat> why?

21:41 <heat> are you just talking about the percpu cache?

21:41 <mjg> the entire point is that each cpu has its own collection of objs it can pop without fucking with anyone else

21:41 <mjg> yes

21:41 <mjg> for refiling the cache you do need locks

21:41 <mjg> (well that's the easiest anyway)

21:41 <mjg> but the fast path -- pop or push an obj, you get away with disabling preemption

21:42 <mjg> if you size things right, the locked slowpath will be rare(tm)

21:42 <mjg> (unless you are unlucky with a workload, which does happen)
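A minimal sketch of the fast path mjg describes, assuming no allocations from interrupt context (or, as he suggests a bit later, a separate magazine for IRQs). preempt_disable/preempt_enable, this_cpu and the slow path are single-threaded stand-ins for kernel facilities, so this shows the structure, not real concurrency.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU     4
#define MAG_SIZE 8

struct magazine { size_t count; void *objs[MAG_SIZE]; };

static struct magazine percpu_mag[NCPU];
static int preempt_count;   /* stand-in for the per-thread preemption counter */
static int this_cpu;        /* stand-in for the current CPU id */
static int slowpath_calls;

static void preempt_disable(void) { preempt_count++; }
static void preempt_enable(void)  { assert(preempt_count > 0); preempt_count--; }

/* Slow path: a real kernel would take the depot lock and refill the
 * magazine here; this sketch only counts the call and fails. */
static void *slab_alloc_slow(void) { slowpath_calls++; return NULL; }

static void *slab_alloc(void)
{
    preempt_disable();              /* no migration, so this magazine is ours */
    struct magazine *m = &percpu_mag[this_cpu];
    if (m->count > 0) {
        void *obj = m->objs[--m->count];  /* plain pop, no atomics or locks */
        preempt_enable();
        return obj;
    }
    preempt_enable();
    return slab_alloc_slow();
}

static int slab_free(void *obj)
{
    preempt_disable();
    struct magazine *m = &percpu_mag[this_cpu];
    if (m->count < MAG_SIZE) {
        m->objs[m->count++] = obj;
        preempt_enable();
        return 1;
    }
    preempt_enable();
    return 0;   /* full: real code would swap magazines with the depot under a lock */
}
```

The design point is that the common case touches only CPU-local data; a lock appears only when a magazine runs empty or full.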

21:42 <heat> right

21:43 <heat> or irqs

21:43 <mjg> provide a dedicated alloc for irqs

21:43 <heat> I like to allocate in interrupts

21:43 <mjg> which should not be allocating shit anyway

21:43 <mjg> just give them a dedicated magazine

21:43 <heat> hrm

21:43 <heat> actually, do I allocate in interrupts?

21:44 <mjg> anyhow solaris takes per-cpu locks in the fast path, which is pretty slow

21:44 <mjg> and in fact their supposed fast path is very long

21:44 * mjg mumbles something about i-cache remarks

21:44 <heat> per-cpu locks?

21:44 <mjg> a lock stored in each per-cpu instance of the slab, next to the magazine

21:45 <heat> why?

21:45 <heat> so they can shuffle things around I guess?

21:45 <mjg> which part

21:45 <mjg> why not preemption?

21:45 <mjg> or if this protects against other cpus

21:45 <mjg> in the slab paper or some other one bonwick claims disabling preemption there is prohibitive

21:46 <mjg> and they don't even disable migration to other cpus either

21:46 <mjg> [i find the claim to be backwards fwiw :>]

21:46 <mjg> so in their setup they do need actual locks

21:46 <mjg> but this is self-induced, not inherent to per-cpu slab allocation

21:46 <heat> disabling preemption is prohibitive, but they need to disable it anyway?

21:46 <heat> I don't see how you would *not disable preemption*

21:46 <mjg> they don't disable it

21:47 <mjg> they just take a lock

21:47 <heat> so what if you get scheduled out

21:47 <heat> and something hits the slab allocator again

21:47 <mjg> and concede they may get off cpu, and then land on another one

21:47 <heat> do you just sit there and wait

21:47 <mjg> ye you take extra trips around sleep/wakeup

21:47 <mjg> which is why i consider his claim backwards

21:47 <mjg> anyway just disable the fucking preemption and be done with it

21:47 <heat> define extra trips

21:48 <mjg> you take the lock, go off cpu

21:48 <mjg> whoever preempted you also wants to alloc, so they take the lock... but that fails since you have it

21:48 <mjg> ... so they go off cpu and wake you up

21:48 <heat> oh shit

21:48 <heat> sleepable locks?

21:48 <mjg> oh you did not know

21:48 <mjg> YES

21:48 <heat> fuck

21:50 <heat> actually I'm way too into this to not write a slab allocator now

21:50 <heat> also riscv irqs aren't working and I can't be bothered to patch QEMU

21:51 <heat> mjg, btw is the general consensus to use the big linear page mapping or do you map things virtually?

21:52 <mjg> i don't know what you mean here

21:52 <mjg> for what specifically

21:53 <mjg> for the most part you try to slap the kernel into a collection of huge pages

21:53 <heat> slabs

21:53 <heat> are you mapping things virtually here? as in linux vmalloc or so

21:53 <mjg> there is a tech debt-induced nuance here i'm afraid

21:54 <mjg> but tl;dr apart from certain cases where you huge page, you are going to inject singular mappings

21:54 <mjg> one at a time

21:54 <mjg> so that, if need be, you can free the page and give it to userspace

21:54 <heat> what's the nuance

21:54 <mjg> you really like to read my rant, don't you

21:54 <heat> yes

21:54 <mjg> have you heard of "type stability"?

21:55 <heat> no

21:55 <heat> pls go on

21:55 <mjg> historically the old unixes were unable to ever free numerous object types

21:55 <mjg> most notably vnodes

21:55 <heat> yes

21:55 <mjg> on bsds this also included name cache entries, vm objects and more

21:55 <heat> im kinda aware

21:55 <mjg> so

21:56 <mjg> if you can't *ever* reuse this for any other purpose, you may at least damage control

21:56 <mjg> you slap a huge page and allocate from there

21:56 <mjg> [except freebsd fails to take advantage of it]

21:56 <mjg> vnodes and nc entries are freeable now, but vm objects still is not and that's probably never going to change
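A hypothetical illustration of the type-stability idea: objects of one type are carved from a chunk that is never handed back to the page allocator, so freed objects only return to that type's free list, and a stale pointer still points at an object of that type. The struct, zone names and the generation counter (a common way for lockless readers to notice reuse) are assumptions for the example, not FreeBSD's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct vnode_like {
    int refcount;
    int generation;               /* bumped on every free, so reuse is detectable */
    struct vnode_like *next_free;
};

#define ZONE_CHUNK 64             /* stand-in for "one huge page worth" of objects */

struct zone {
    struct vnode_like *free_list;
    size_t carved;                /* objects ever carved from backing chunks */
};

static int zone_grow(struct zone *z)
{
    struct vnode_like *chunk = calloc(ZONE_CHUNK, sizeof *chunk); /* never freed */
    if (!chunk)
        return 0;
    for (size_t i = 0; i < ZONE_CHUNK; i++) {
        chunk[i].next_free = z->free_list;
        z->free_list = &chunk[i];
    }
    z->carved += ZONE_CHUNK;
    return 1;
}

static struct vnode_like *zone_alloc(struct zone *z)
{
    if (!z->free_list && !zone_grow(z))
        return NULL;
    struct vnode_like *v = z->free_list;
    z->free_list = v->next_free;
    v->refcount = 1;
    v->next_free = NULL;
    return v;
}

static void zone_free(struct zone *z, struct vnode_like *v)
{
    v->generation++;
    v->refcount = 0;
    v->next_free = z->free_list;  /* back to this type's list, never to the page allocator */
    z->free_list = v;
}
```

The cost is exactly what mjg points out: memory that ever held these objects can never be reused for anything else.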

21:57 <heat> ok, that's weird

21:57 <mjg> LS... what was the letter

21:57 <heat> and linux doesn't fall into this right

21:57 <mjg> i don't know if linux ever had this problem

21:57 <mjg> i suspect not

21:57 <heat> linux likes to do all its memory stuff on the big mapping

21:57 <mjg> if you want weird you can check how 4.4bsd sorts out namecache entries

21:57 <mjg> work of fucking art

21:58 <heat> we've gone through that IIRC

21:58 <mjg> *all*?

21:58 <heat> yes

21:58 <heat> vmalloc is the last resort

21:58 <mjg> that sounds very suspicious, 2MB granularity may not be enough to recover in low memory conditions

21:59 <heat> which kind of makes sense, given the buddy allocator and whatnot

21:59 <mjg> perhaps they default to huge pages which they demote if needed?

21:59 <heat> nooo

21:59 <heat> you're getting me wrong

21:59 <heat> you know the big linear mapping they have right? I don't know if fbsd has a similar one

21:59 <mjg> you mean direct map perhaps?

21:59 <heat> yes

21:59 <mjg> that's the name man :>

22:00 <heat> the ~800MB in 32-bit and gajillion bytes in 64-bit
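What a direct map buys you, as a sketch: converting between physical and kernel virtual addresses is a single addition, with no page-table walk. The base value below is the x86-64 Linux default direct-map start with 4-level paging and no KASLR (page_offset_base); treat it as an example constant, and assume a 64-bit build.

```c
#include <assert.h>
#include <stdint.h>

/* Example base; a real kernel picks (and may randomize) this at boot. */
#define DIRECT_MAP_BASE 0xffff888000000000ull

static inline void *phys_to_virt(uint64_t pa)
{
    return (void *)(uintptr_t)(DIRECT_MAP_BASE + pa);
}

static inline uint64_t virt_to_phys(const void *va)
{
    return (uint64_t)(uintptr_t)va - DIRECT_MAP_BASE;
}
```

The flip side heat worries about is the same property: every physical page is reachable through this window.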

22:00 scaleww has joined #osdev

22:00 <mjg> ye ye ye, that's fine

22:00 <heat> it's fine but it gets me nervous

22:00 <mjg> ye direct map is great, freebsd does not fully utilize it

22:00 <heat> it's super unsafe

22:01 <heat> it also only reaaallyyy works if you have a buddy allocator or something similar that can give out big chunks of pages

22:02 <heat> particularly, in a fast way

22:02 <heat> i have a list of pages, which doesn't particularly work
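Why a buddy allocator pairs well with the direct map: blocks are power-of-two runs of contiguous frames, and the buddy of a block is found with one XOR, so splitting and coalescing are cheap. A minimal sketch of that arithmetic on page frame numbers, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Buddy of the 2^order-page block starting at page frame number pfn
 * (pfn must be aligned to 2^order). */
static uint64_t buddy_pfn(uint64_t pfn, unsigned order)
{
    return pfn ^ (1ull << order);
}

/* Start of the 2^(order+1) block formed by merging pfn with its buddy. */
static uint64_t merged_pfn(uint64_t pfn, unsigned order)
{
    return pfn & ~(1ull << order);
}
```

A plain list of single pages has no such relationship between entries, so finding N physically contiguous pages means searching instead of indexing.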

22:02 <mjg> descending into it is supposed to be rare

22:03 <heat> sure

22:03 <mjg> i would say for now just map pages as needed

22:03 <heat> but going down and allocating a slab is not something I want to be stupidly slow

22:03 <mjg> once it all works and whatnot i would revisit the direct map as an optimization

22:05 dude12312414 has quit [Remote host closed the connection]

22:06 dude12312414 has joined #osdev

22:06 demindiro has quit [Quit: Client closed]

22:48 gildasio has quit [Ping timeout: 258 seconds]

22:49 gildasio has joined #osdev

22:49 gildasio has quit [Remote host closed the connection]

22:50 gildasio has joined #osdev

23:06 gildasio has quit [Remote host closed the connection]

23:06 gildasio has joined #osdev

23:08 pretty_dumm_guy has quit [Ping timeout: 265 seconds]

23:10 pretty_dumm_guy has joined #osdev

23:25 scaleww has quit [Quit: Leaving]

23:29 les_ has quit [Quit: Adios]

23:32 les has joined #osdev

23:34 vdamewood has joined #osdev

23:35 lkurusa has quit [Quit: I probably fell asleep (or went out). Who will ever know.]

23:44 <heat> mjg, how do you implement kfree()? as in knowing what slab an object belongs to

23:45 <heat> I know how linux does it but that's not suitable here, going down the page tables to get a struct page isn't an option

23:47 gog has quit [Quit: byee]

23:48 <heat> I could try and place some slab struct inside the page itself but then I can't have variable-sized slabs
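The header-in-the-page approach heat mentions can be sketched as follows, assuming every slab is a single SLAB_SIZE-aligned block with its header at the start, so kfree finds the slab by masking the pointer. It also shows the limitation he raises: the slab size is pinned to that alignment, and objects bigger than the block minus the header do not fit. Names and the 64-byte header reservation are hypothetical; aligned_alloc stands in for the page allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SLAB_SIZE 4096u
#define SLAB_HDR  64u    /* space reserved for the header at the start of the block */

struct slab {
    size_t obj_size;
    size_t in_use;
    void  *free_list;    /* singly linked through the free objects themselves */
};
_Static_assert(sizeof(struct slab) <= SLAB_HDR, "header must fit");

static struct slab *slab_create(size_t obj_size)
{
    if (obj_size < sizeof(void *))
        obj_size = sizeof(void *);
    obj_size = (obj_size + 15) & ~(size_t)15;     /* keep objects 16-byte aligned */
    struct slab *s = aligned_alloc(SLAB_SIZE, SLAB_SIZE);
    if (!s)
        return NULL;
    s->obj_size = obj_size;
    s->in_use = 0;
    s->free_list = NULL;
    char *base = (char *)s;
    for (size_t off = SLAB_HDR; off + obj_size <= SLAB_SIZE; off += obj_size) {
        void **obj = (void **)(base + off);
        *obj = s->free_list;
        s->free_list = obj;
    }
    return s;
}

static void *slab_alloc(struct slab *s)
{
    void **obj = s->free_list;
    if (!obj)
        return NULL;
    s->free_list = *obj;
    s->in_use++;
    return obj;
}

/* The lookup kfree needs: no page-table walk, no struct page. */
static struct slab *slab_of(const void *obj)
{
    return (struct slab *)((uintptr_t)obj & ~(uintptr_t)(SLAB_SIZE - 1));
}

static void kfree_sketch(void *obj)
{
    struct slab *s = slab_of(obj);
    *(void **)obj = s->free_list;
    s->free_list = obj;
    s->in_use--;
}
```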

23:48 gjnoonan has quit [Read error: Connection reset by peer]

23:48 gjnoonan has joined #osdev

23:49 vdamewood has quit [Read error: Connection reset by peer]

23:49 Arsen has quit [Remote host closed the connection]

23:49 qookie has quit [Remote host closed the connection]

23:50 vdamewood has joined #osdev

23:50 qookie has joined #osdev

23:50 Arsen has joined #osdev

23:54 nyah has quit [Ping timeout: 248 seconds]

23:55 [itchyjunk] has joined #osdev

23:55 <heat> ok freebsd handles that problem by simply not handling that problem

23:55 <heat> excellent