#osdev on 2023-03-21 — irc logs at libera.irclog.whitequark.org

2021-05-23 01:57 klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books

00:00 gmodena has quit [Quit: ZNC - http://znc.in]

00:00 rnicholl1 has quit [Quit: My laptop has gone to sleep.]

00:05 <mrvn> The *heat* death of the universe is a few billion years in the future. Someone will survive us all.

00:11 brunothedev has joined #osdev

00:15 <AmyMalik> bingo.

00:15 <AmyMalik> I reproduced the power virus on linux.

00:15 <AmyMalik> This still isn't the hot gun that I need, but it will help me reproduce it

00:20 <brunothedev> ?

00:21 <brunothedev> i remember when i build a script that read /dev/urandom into a variable in python, i felt so smart looking the ram usage going up and up

00:24 <brunothedev> oh also: "sudo dd if=/dev/urandom of=/dev/sda"

00:25 brunothedev has quit [Quit: WeeChat 3.8]

00:28 <geist> i wonder if they just typed that..

00:31 gdd has quit [Ping timeout: 265 seconds]

00:31 gdd has joined #osdev

00:42 dutch has quit [Quit: WeeChat 3.8]

00:52 <zid> we can only hope

01:07 dutch has joined #osdev

01:09 <AmyMalik> VoidLinux exhibits the power virus until it runs /etc/runit/1. Now to single step that init script.

01:27 <sakasama> mrvn: By then the only remaining lifeform will resemble the giant spaghetti monster.

01:35 rnicholl1 has joined #osdev

01:44 [itchyjunk] has quit [Ping timeout: 255 seconds]

01:47 Left_Turn has quit [Read error: Connection reset by peer]

01:48 [itchyjunk] has joined #osdev

01:49 danilogondolfo has quit [Remote host closed the connection]

01:54 heat_ has joined #osdev

01:54 heat has quit [Ping timeout: 246 seconds]

02:04 rnicholl1 has quit [Quit: My laptop has gone to sleep.]

02:07 rnicholl1 has joined #osdev

02:08 Matt|home has joined #osdev

02:15 rnicholl1 has quit [Quit: My laptop has gone to sleep.]

02:20 heat_ is now known as heat

02:30 rnicholl1 has joined #osdev

02:39 nyah has quit [Quit: leaving]

02:41 <rnicholl1> Hi, I am confused about how to use the WFE instruction

02:41 <rnicholl1> my understanding is that this puts the CPU into a low power sleep state until woken up by SEV

02:41 <rnicholl1> The examples show ldar followed by comparison and loop

02:42 <rnicholl1> the way to wake up the CPU is SEV

02:42 <rnicholl1> but my thought is, what happens if the stlr + sev happens between the comparison and WFE?

02:43 <Mutabah> The idea of WFE is it is used when you don't have anything else to do in that thread

02:44 <Mutabah> so timing doesn't matter, you're usually doing an infinite loop with it

02:44 <rnicholl1> Right, the documentation suggests it can be used for a spinlock

02:44 <rnicholl1> "you can insert the WFE instruction into the loop so the CPUs instead of looping continuously will enter STANDBYWFE. Then the CPU who has been using the resource should execute SEV instruction after it has finished using the resource. This will wake up all other CPUs from STANDBYWFE and another CPU can then access the shared resource."

02:44 <rnicholl1> But my question is

02:44 <Mutabah> It might have a maximum wait time? or be woken by any other event (e.g. a timer interrupt)

02:45 <rnicholl1> yes, I would hope so, but I can't find this fact documented

02:46 [itchyjunk] has quit [Remote host closed the connection]

02:47 <rnicholl1> Do I need to implement reference counting on the wait flags?

02:47 <geist> yah, WFE is not like WFI

02:48 <geist> I assume you're talking about the arm64 version of WFE?

02:48 <geist> basically WFE waits for an 'event'. there are then a series of things that generate events

02:48 <Mutabah> Ah, my bad then

02:48 <geist> so SEV is the obvious thing that generates an event, but there are other cases

02:48 <Mutabah> Guess - events are enqueued until WFE clears them?

02:49 <Mutabah> so an event raised just before a WFE triggers a spurious wakeup?

02:49 <geist> correct. sev;wfe basically falls through. actually that sequence is the canonical way to clear the event bit: set it (in case it wasn't already) then wfe to consume it

02:51 <geist> so in the spinlock case, the *other* case that sets an event is when you do a load exclusive on an address, and then dont release the exclusive line

02:51 <geist> then any other instruction *on another cpu* that writes to the same cache line 'breaks your exclusive lock' on it, and that generates a local wait event

02:51 <geist> thus releasing your cpu WFE

02:52 <geist> so a general solution is a spinlock: you try to acquire it on cpu A (ie, grab an exclusive load on it), then discover it's already held, then you WFE and spin around and try again

02:52 <geist> so it's a general spinlock, except the cpu doesnt continually spin: it waits until the core that owns it writes a zero to it

02:53 <geist> https://github.com/littlekernel/lk/blob/master/arch/arm64/spinlock.S#L21

02:53 <bslsk05> github.com: lk/spinlock.S at master · littlekernel/lk · GitHub

02:53 <geist> it's subtle, but line 27 is the key, it did an exclusive load in line 26 (ldaxr), then 27 decides it's non zero, so loop back to the WFE

02:54 <geist> line 33 the owning cpu does a 'stlr' which breaks the exclusive lock on the first cpu, which falls out of WFE

02:54 <geist> and then tries again

02:54 <geist> (note line 23 does the sev/wfe sequence when first entering the spinlock code, so it clears any event state that may be set before)

02:56 <geist> so the key is the true power of it comes from multiple cpus. a cpu by itself the sev/wfe thing isn't really that useful

02:57 <rnicholl1> uh, my understanding was that sev is use to wake up other cores... why in this case is sev used by the code doing the unlock?

02:58 <rnicholl1> er, not used

02:58 <rnicholl1> So, supposing I do a load

02:58 <geist> again are you thinking about arm64 or arm32? it changed a little bit between armv7 and armv8

02:59 <rnicholl1> arm64

02:59 <geist> armv8 a store to the same address also sets an event

02:59 <geist> but tighter scoped: it only triggers the event on other cores that happen to have an exclusive monitor on that address

02:59 <geist> ie, ldaxr previously

02:59 <geist> the older armv7 version of it yes you would store a zero to the lock, and then SEV after it

03:00 <geist> but then it was global, so was less efficient than the armv8 one

03:00 <rnicholl1> supposing I do an acquire-load, the WFE instruction would be woken up if a recently executed acquire load lock is lost?

03:00 <geist> yes

03:00 <geist> monitor is the better word

03:01 <geist> basically by doing the acquire load and then storing to it the local cpu now has a monitor for that address

03:01 <geist> and then any other cpu breaking the monitor locally sets the event bit

03:01 <rnicholl1> is there a read-only version of this?

03:01 <geist> s/and then storing to it/and then *not* storing to it/

03:01 <geist> what do you mean?

03:01 <rnicholl1> For example, I want to loop on a flag that tells non-0 cores when they can start executing

03:02 <geist> sure

03:02 <rnicholl1> The flag is set by core 0, then core 0 executes sev?

03:02 <geist> just do half the spinlock: ldaxr on the address, then go back to the wfe. then on the core releasing it just do a stlr

03:02 <geist> core 0 doesn't have to execute sev if the other cores grab an exclusive monitor

03:02 <geist> but if you want to do it more cheezily without load/acquire, sure, sev works too

03:03 <rnicholl1> Is exclusive monitor different from acquire/load semantics?

03:03 <rnicholl1> I had assumed it was like seq_cst

03:03 <geist> yes those are completely different concepts

03:04 <geist> the former is the monitor stuff that you use to build atomics, the latter is the implied barriers that you can add to the atomic ops

03:04 <geist> anyway the ARMv8 ARM does mention this event stuff. look up something like implicit event generation

03:04 <geist> but SEV is the obvious event generator

03:04 <geist> note SEVL is a local event generator, only for the current cpu. doesn't broadcast it to the others

03:05 <rnicholl1> hum, I am confused how it avoids the race

03:05 <geist> in what case?

03:06 <rnicholl1> Supposing you used ldaxr, I can't imagine it sets up a permanent monitor

03:06 <geist> it does

03:06 <geist> *until* you either clrex, or something breaks it

03:06 <geist> (or a bunch of other things that can break it, like an interrupt firing)

03:07 <geist> think of it logically as the local cpu is snooping the bus and seeing what other cpus are doing

03:07 <geist> it's really more complicated than that, it's using the L1 cache coherency mechanism to track whether or not another core modifies the same cache line

03:07 <geist> but since the L1 cache stuff already works, it actually just piggybacks on top of the same hardware

03:08 <geist> but also remember this loop/wfe stuff is sloppy. it may wake more times than it needs to, you always have to double check the condition and try again if it's not met

03:09 <zid> select(ALL_EVENTS_FD_SET);

03:10 <rnicholl1> Ok, so doing ldaxr gets an exclusive state, wfe will wake when another core writes to the same exclusive state, what is the purpose of sevl in this case and how does it differ from sev?

03:10 <geist> in this loop it was because sevl;wfe (in the first iteration of the loop) clears any pending events that may have been set

03:11 <geist> it's just the canonical way to clear an event safely

03:11 <geist> since you can't read the state of the event bit, think of it as a hidden bit in the cpu. you can only directly clear it via wfe, and you can only directly set it via sevl

03:12 <geist> it 'latches'

03:13 <rnicholl1> oh, so if I set "sev", and no other core is waiting for an event, e.g. they just did a load but haven't yet done wfe

03:13 <rnicholl1> the WFE will still wake from the previous sev?

03:14 <geist> yah the event latches forever until something clears it

03:14 <rnicholl1> like, core 0 loads, finds 0, (sleep value), core 1 does SEV, core 0 then does WFE, but is woken by the previous SEV by core 1

03:14 <geist> yes

03:15 <rnicholl1> alright, that clears up the confusion a bit

03:15 <geist> so it always errs on the side of an extraneous wakeup. so if you wanted to build this 'wait for a signal from another core' flag, then it's really not much different from a regular loop on a variable, you just stick in a WFE there in the loop

03:16 <rnicholl1> so actually, there is not a race in the case of (load, wfe) + (store, sev) because even if the sequence is (load, store, sev, wfe) it will still wake up

03:16 <geist> and you can either use a ldaxr and have the signaler write to it, or just loop with a wfe + sev

03:16 <geist> riiiight. yeah
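
The ordering argument above can be sketched with a toy model. This is plain Python, not AArch64 code; the `EventRegister` class and the helper names are hypothetical, and the model assumes only what is said in the discussion: the event is a single hidden bit per core that SEV sets, that WFE consumes, and that stays latched until consumed.

```python
class EventRegister:
    """Toy model of one core's hidden, one-bit event register."""

    def __init__(self):
        self.pending = False

    def sev(self):
        # Setting the event latches it; it stays set until a WFE consumes it.
        self.pending = True

    def wfe(self):
        """Return True if WFE falls straight through, False if the core would sleep."""
        if self.pending:
            self.pending = False  # WFE consumes the latched event
            return True
        return False


def late_wfe_still_wakes():
    """Replay the (load, store, sev, wfe) interleaving from the discussion."""
    core0 = EventRegister()
    flag = {"go": 0}
    seen = flag["go"]            # core 0: load, still sees 0
    flag["go"] = 1               # core 1: store the new value
    core0.sev()                  # core 1: SEV sets core 0's event bit
    fell_through = core0.wfe()   # core 0: WFE executed after the SEV
    return seen, fell_through, flag["go"]


def clear_pending(core):
    """Model of the set-then-consume idiom used to clear a stale event."""
    core.sev()   # stands in for SEVL: make sure the bit is set
    core.wfe()   # consume it, leaving the bit clear either way
```

In this model the waiter falls through the late WFE, loops, and sees the new flag value on its next load, so the wakeup is not lost. The cost is the occasional extra wakeup, which is why the condition always has to be re-checked.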

03:16 <rnicholl1> ok, I was thinking of the classic race condition variables have without using a mutex (i.e. notify/store race)

03:17 <geist> note the ARMv8 manual does discuss this in some detail, but as is always the case with ARMv8 it talks about all the minutiae

03:17 <rnicholl1> but seems there is a core-wide flag that acts a bit like a mutex

03:17 <geist> right, but as i was saying it's actually not that common to use SEV anymore. using the SEV instruction is basically a global slam of all the cpus

03:17 <geist> there's even a bunch of flags when using virtualization that limit the scope to just one virtual machine, etc etc

03:17 <geist> so that some thread somewhere that just does SEV in a loop doesn't waste the whole system

03:18 <geist> (actually i forget, SEV may not be user accessible. SEVL probably is though, since it only is for the local core)

03:18 <geist> SEV was a v7 thing, v8 makes the scope tighter by having the implicit SEV with exclusive monitors, etc. since that's mostly where it's used

03:19 <geist> but of course for early bringup or specialized kernel cases SEV is perfectly fine

03:22 Dyskos has quit [Ping timeout: 250 seconds]

03:35 <heat> geist, is all of that bs what makes arm64 cmpxchg weak?

03:36 <geist> dunno what you exactly mean, but if you're referring to up until 8.1 you needed a multi instruction sequence to do it sure

03:36 <geist> 8.1 adds a CAS instruction to do it directly

03:36 <heat> i mean the ldaxr, etc stuff

03:36 <heat> yeah HLE right? or some similarly named extension

03:37 <geist> LSE (large system extension i think)

03:37 <heat> ah yes

03:37 <rnicholl1> I think it would spuriously fail if another core got an exclusive lock on the same cache line right?

03:37 <moon-child> yes

03:38 <geist> right, but if it did then the cache coherence system has failed, and thats the least of your worries

03:38 <moon-child> ll/sc has scalability problems

03:38 <moon-child> iirc riscv guarantees that some restricted subset of ll/sc will eventually make progress

03:38 <moon-child> idk if arm does the same

03:38 <rnicholl1> I mean, I assume ldxr can be broken by other cores right?

03:38 <heat> how does any of this work with interrupts?

03:38 <heat> is there some special state you save and restore?

03:39 <geist> for a ll/sc pair (to use that nomenclature) the whole system relies on the ability for a local cpu to get an exclusive copy of the cache line and then it only does the store part *if* it hasn't lost the access in the interim

03:39 <geist> and that is just the normal L1 cache coherency protocol that already exists

03:39 <geist> which had better darn well work

03:39 <rnicholl1> but another core could get an exclusive lock, it would just cause the next store to fail

03:39 <geist> correct

03:40 <rnicholl1> so that would be why the exchange could "weakly fail"

03:40 <geist> because the store only works if the monitor on the local core is still valid. and the monitor is broken if another core takes away the cache line

03:40 <rnicholl1> or spuriously fail

03:40 <geist> right. and to answer heat: an interrupt and definitely a context switch would break the monitor
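
A toy model of why a compare-exchange built from an exclusive load/store pair can fail spuriously. This is plain Python, not real atomics; `Cell`, `cas_weak`, `cas_strong` and the `between` hook are hypothetical names, and the model assumes one monitor per core that is lost on any other write or on an explicit clear.

```python
class Cell:
    """Toy memory word plus the set of cores currently holding a monitor on it."""

    def __init__(self, value=0):
        self.value = value
        self.monitors = set()

    def load_exclusive(self, core):
        self.monitors.add(core)        # like ldxr/ldaxr: read and arm the monitor
        return self.value

    def store_exclusive(self, core, value):
        if core not in self.monitors:  # monitor was lost: the store does nothing
            return False
        self.value = value
        self.monitors.clear()          # a successful write breaks every monitor
        return True

    def store(self, value):
        self.value = value             # plain store from some other core
        self.monitors.clear()

    def clrex(self, core):
        self.monitors.discard(core)    # e.g. what a context switch would do


def cas_weak(cell, core, expected, desired, between=None):
    """One attempt; may fail even though the value still equals `expected`."""
    old = cell.load_exclusive(core)
    if old != expected:
        cell.clrex(core)
        return False, old
    if between is not None:
        between()                      # hook: something happens before the store
    return cell.store_exclusive(core, desired), old


def cas_strong(cell, core, expected, desired, between=None):
    """Retry until the attempt succeeds or the value really differs."""
    while True:
        ok, old = cas_weak(cell, core, expected, desired, between)
        if ok or old != expected:
            return ok, old
        between = None                 # in this toy the disturbance happens once
```

Passing `between=lambda: cell.clrex(0)` stands in for an interrupt or context switch landing between the load and the store: the weak version reports failure with the value unchanged, and the strong version simply goes around again.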

03:41 <geist> at the minimum in a context switch the OS is required to issue a `clrex` instruction, which manually breaks any monitors

03:41 <geist> but there are also some other things that break monitors. iirc an eret does too, implicitly

03:41 <rnicholl1> should I do clrex after wfe?

03:41 <geist> no. you should not

03:41 <rnicholl1> is there a limit to the number of monitors?

03:41 <geist> the only place to ever put it is inside a context switch, really

03:42 <moon-child> it's 1, isn't it?

03:42 <geist> i think it's 1

03:42 <moon-child> more than 1 is general transactions

03:42 <geist> but i'm not sure hardware mandates that

03:42 <moon-child> which only intel has semi-mainstream implementations of

03:42 <rnicholl1> sounds like we could get spurious wakeups if we don't clear it

03:42 <moon-child> arm has specced transactions, though

03:42 <rnicholl1> though that could happen anyway

03:42 <geist> also note hardware doesn't specify the size of an exclusive lock. it *could* be all of ram. but in practice its a cache line

03:42 <geist> rnicholl1: yah you just dont worry about that

03:43 <moon-child> I saw some stuff in the arm manual that was _basically_ speccing in cache lines

03:43 <geist> spurious wakeups are simply part of the game. all told it's still better than not using wfe at all, so it's all an optimization

03:43 <moon-child> I don't remember exactly what it said. Nor if its 'cache lines' were 64 or 128 bits. But it was definitely there

03:44 <geist> yah there's a way to read the exclusive monitor size i think

03:44 <moon-child> *bytes

03:44 <rnicholl1> https://pastebin.com/cUYWDvMc

03:44 <bslsk05> pastebin.com: // The function waits until core_start_ptr is non-null and then jumps to it.// - Pastebin.com

03:44 <rnicholl1> So, this should be correct?

03:44 <geist> you dont need the clrex

03:45 <geist> also how is it cleared?

03:45 <geist> also the ldar i dont think does the exclusive lock. you probably want a ldaxr (the 'x' is the exclusive monitor)

03:46 <geist> or just use a sev where you write to it

03:46 <rnicholl1> https://pastebin.com/TmuTqhDM

03:46 <bslsk05> pastebin.com: // Starts the auxiliary core with the given entry point.// void aux_core_start - Pastebin.com

03:46 <geist> yah that'll work

03:46 <geist> then you dont need the x stuff

03:46 <geist> note it should also work without the wfe. it's just an optimization to park them and not use cpu

03:47 <heat> pauseeeeeee

03:47 <rnicholl1> wait actually I do need the x

03:47 <geist> (as a tiny optimization, you dont need the 'add' instructions there, because you can just put the :lo12:... inside the [] part)

03:48 <rnicholl1> thought I already fixed it, one moment

03:48 <heat> this is where I ask why you're doing a busy loop for $stuff, for what it seems to be secondary cores

03:48 <geist> basically either you do the 'x' stuff and avoid the sev, or you dont use the 'x' stuff and then use sev. the latter is probably easier to grok

03:49 <rnicholl1> heat: I haven't set up the stack for the other cores yet or the bss

03:51 <heat> geist, is any of the break-before-make stuff avoidable when breaking up the kernel's direct map?

03:51 <rnicholl1> oh, right, ok, I was a bit confused for a moment, so the x version sets a monitor, the non-x does not, other than that both are acquire loads?

03:52 <geist> right, like i said the monitor stuff and acquire/release are different things

04:01 <rnicholl1> and wow, I just read about the latest amd chip... over 1GiB of... L3 cache. wow

04:01 <geist> oh one of the uber server chips?

04:01 <zid> epyc

04:01 <zid> They had a 768MB they were demoing for fluid sims last year

04:02 <rnicholl1> yeah

04:02 <zid> Turns out power densities are so high now the heat doesn't make it to the IHS properly anyway, so you can just throw cache on top of the alus

04:06 bradd has joined #osdev

04:15 <rnicholl1> Which bootloaders should I implement support for? I understand the raspberry pi 3/4 just load kernel8.img, but there are some other bootloaders around?

04:15 <rnicholl1> like open source bootloaders

04:20 <geist> uboot is the general solution for ARM machines

04:20 <geist> i'd say unless it's some specific loader for a particular product line (ie, iboot for apple devices) or UEFI for servers, then it's probably either uboot or some sort of bespoke thing

04:24 <rnicholl1> Is GRUB not often used on ARM?

04:25 smeso has quit [Quit: smeso]

04:25 <rnicholl1> And multiboot?

04:25 <heat> no

04:25 <heat> and no

04:25 <heat> usually you either pretend to be linux (linux arm64 boot protocol, that uboot, etc load) or do EFI (eww)

04:26 <rnicholl1> What boot methods does win11 arm use?

04:27 <heat> efi

04:27 <rnicholl1> So, would uboot boot itself using EFI as well?

04:27 <heat> no

04:27 <rnicholl1> Like, EFI->uboot->linux

04:28 <rnicholl1> hum

04:28 <heat> that would make little sense

04:30 <zid> arm doesn't have efi-grub? :(

04:30 <zid> poor arm

04:31 <geist> what uboot can do is also pretend to be UEFI

04:31 <geist> or at least enough of a subset that linux can boot with it. i've seen that fairly commonly, also on riscv

04:31 <geist> i haven't looked at precisely what hooks it implements

04:32 <geist> so in that case it's uboot -> EFI -> linux

04:32 <geist> or uboot-as-efi -> linux

04:32 smeso has joined #osdev

04:32 <heat> yeah 2nd is more precise

04:32 <heat> it's still the same payload

04:32 <heat> zid, i believe there *is* support for it, for both multiboot and grub, but WHYYYYYYYYYYYYYYYYYYYYYYy

04:34 <heat> and fwiw this uboot EFI ""support"" is a fairly recent add by Google

04:34 <heat> 2020 iirc, for Reasons(tm)

04:34 <rnicholl1> oh interesting

04:34 <geist> huh yeah, didnt know that. wonder what group did it

04:34 <rnicholl1> probably zircon

04:35 <geist> note that on ARM and riscv now it's fairly standard to use device tree (or ACPI) as the descriptor for where things are, etc.

04:35 <geist> so multiboot doesn't really provide anything in that case. its too basic

04:35 <heat> yeah, it'd be great if someone from zircon were here

04:35 <heat> alas

04:35 <geist> and it's highly x86 centric anyway

04:36 <heat> I suspect that uboot efi support was added because they want to remove tianocore on android

04:36 <geist> yah, something like that. but then lots of android doesn't boot via grub. but then lots of it does, so probably either way

04:37 <heat> although I would assume that we're still far from that, since AFAIK vendors now supply the platform init code in EFI binary modules, and probably EFI PI modules (and not EFI itself)

04:37 <heat> geist: s/grub/efi/ ?

04:38 <geist> either

04:38 <geist> or...

04:39 <geist> was thinking mabe it's because of the arm gce stuff

04:40 <rnicholl1> At some point I would like to work in zircon, it seems like it would be fun to write kernel code.

04:42 <rnicholl1> python is a little bit... less interesting

04:43 <rnicholl1> it's not bad though

04:43 <geist> python is definitely less interesting

04:44 <rnicholl1> I hate that it has weak typing

04:46 <rnicholl1> I may have caused an outage by passing [args...] instead of args to a random execution function

04:46 <rnicholl1> which of course didn't fail the pytype check since

04:46 <rnicholl1> [args...] was an array of objects... which is a valid argument

04:46 <rnicholl1> yay...

04:46 <geist> yah i've definitely been hit by duck typing. i remember years ago was using some library that parses a bunch of html and returns a string

04:47 <geist> then you pass this string around in python and use it. and then it turns out after the fact that it's really passing you this functor thing that returns the string every time you query it

04:47 <geist> but otherwise it was hauling around 50MB of state

04:47 <rnicholl1> fortunately there was another outage at the same time

04:47 <rnicholl1> so nobody noticed

04:48 <rnicholl1> well, I mean, we noticed, obviously, but it wasn't that impactful as the other outage was much worse

04:48 <rnicholl1> the best part was

04:48 <rnicholl1> everyone thought we had unit tests

04:49 <rnicholl1> but it turns out, they pymock the entire class out that was modified

04:49 <rnicholl1> so it had no coverage

04:50 <rnicholl1> python allows you to do weird things

05:00 <rnicholl1> back on channel topic... when trying to figure out start and end address of ram, do I just need to scan DTB/FDT? I looked in dts e.g. https://github.com/torvalds/linux/blob/master/arch/arm/boot/dts/bcm2711-rpi-4-b.dts and I don't see RAM entries, does the bootloader set these up dynamically?

05:00 <bslsk05> github.com: linux/bcm2711-rpi-4-b.dts at master · torvalds/linux · GitHub

05:01 <rnicholl1> GPT suggests DTB, UEFI, and ACPI, but I have no idea how I'm supposed to know which one to use

05:02 <geist> it does, note that that dts includes other ones

05:02 <geist> so probably in the more basic bcm dtses it has a memory range

05:02 <klange> Python has _strong_ typing.

05:02 <klange> Do not confuse static with strong.

05:04 <rnicholl1> Should I choose between these based on the bootloader I am using?

05:04 heat has quit [Remote host closed the connection]

05:05 <geist> where is clever when you need them

05:05 <rnicholl1> e.g. I just have to know that the bootloader uses DTB

05:05 <geist> clever!!!!

05:05 <rnicholl1> Or know that I am using UEFI

05:05 <rnicholl1> And compile a different kernel based on this choice

05:05 <klange> Define "uses". The bootloader provides one, possibly one it dynamically generated - maybe even one it dynamically generated from UEFI and ACPI!

05:06 <rnicholl1> well, supposing I have my entry point, it seems on raspi 4 the firmware/bootloader sets x0 to the pointer to FDT

05:06 <rnicholl1> But how would I know if I am in UEFI instead of DTS/FDT based booting?

05:07 <rnicholl1> No way to know? just have boot_dts.asm and boot_uefi.asm separately?

05:07 <klange> If you're in EFI, you know because you're an EFI application with an EFI entrypoint.

05:07 <rnicholl1> so, boot_dts.asm, boot_efi.asm, etc

05:07 <rnicholl1> and choose which one to compile

05:07 <rnicholl1> as a kernel config parameter when compiling the kernel

05:15 * clever waves

05:15 <klange> That is an option. The approach I take, though, is to have a single common kernel binary for a given architecture and then different stubs for different boot environments - which is also typically how things work on x86: you don't build a different kernel for EFI or BIOS, you have an EFI loader and a BIOS loader, and if you want to be particularly clever you can smoosh them into your kernel binary

05:15 <klange> (together!) to have it do it all in one resulting file.
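
A rough sketch of the "one kernel, several boot stubs" layout described above, written in Python only to show the shape. `BootInfo`, `boot_info_from_fdt`, `boot_info_from_efi` and `kernel_main` are hypothetical names, not taken from any real kernel. The only outside facts assumed are that UEFI memory type 7 is conventional (usable) memory and that UEFI counts memory in 4 KiB pages.

```python
from dataclasses import dataclass

EFI_CONVENTIONAL_MEMORY = 7   # usable-RAM memory type in the UEFI memory map
EFI_PAGE_SIZE = 4096          # UEFI memory descriptors count 4 KiB pages


@dataclass
class BootInfo:
    """What the common kernel consumes, whichever stub produced it."""
    source: str
    memory: list  # [(base, size_in_bytes), ...]


def boot_info_from_fdt(ranges):
    """Stub for device-tree boots: `ranges` already holds (base, size) pairs."""
    return BootInfo("fdt", list(ranges))


def boot_info_from_efi(descriptors):
    """Stub for EFI boots: `descriptors` holds (type, physical_start, pages)."""
    usable = [
        (start, pages * EFI_PAGE_SIZE)
        for mem_type, start, pages in descriptors
        if mem_type == EFI_CONVENTIONAL_MEMORY
    ]
    return BootInfo("efi", usable)


def kernel_main(info):
    """Common entry point; here it only totals the usable memory."""
    return sum(size for _base, size in info.memory)
```

Each stub knows one boot environment and normalizes what it finds, so the kernel proper never has to ask how it was booted.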

05:15 <clever> rnicholl1: the rpi firmware updates the dtb and adds /memory nodes during boot, the exact layout varies depending on config.txt entries
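
A minimal sketch of pulling the RAM ranges out of the blob the firmware hands over, assuming the standard flattened device tree layout (big-endian header, token-based structure block, separate strings block). The function names are hypothetical. It reads `#address-cells` and `#size-cells` from the root node, falls back to 2 and 2 if they are absent (an assumption of this sketch), and only looks at direct children of the root named `memory` or `memory@...`.

```python
import struct

FDT_MAGIC = 0xD00DFEED
FDT_BEGIN_NODE, FDT_END_NODE, FDT_PROP, FDT_NOP, FDT_END = 1, 2, 3, 4, 9


def _align4(n):
    return (n + 3) & ~3


def _cells_to_int(cells):
    n = 0
    for cell in cells:          # cells are 32-bit, most significant first
        n = (n << 32) | cell
    return n


def find_memory_ranges(blob, addr_cells=2, size_cells=2):
    """Return [(base, size), ...] from the reg properties of /memory nodes."""
    magic, _total, off_struct, off_strings = struct.unpack_from(">4I", blob, 0)
    if magic != FDT_MAGIC:
        raise ValueError("not a flattened device tree")
    pos, depth, in_memory, ranges = off_struct, 0, False, []
    while True:
        (token,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        if token == FDT_BEGIN_NODE:
            end = blob.index(b"\0", pos)
            name = blob[pos:end].decode("ascii")
            pos = _align4(end + 1)
            depth += 1          # depth 1 is the root, depth 2 its children
            in_memory = depth == 2 and (name == "memory" or name.startswith("memory@"))
        elif token == FDT_END_NODE:
            depth -= 1
            in_memory = False
        elif token == FDT_PROP:
            length, nameoff = struct.unpack_from(">2I", blob, pos)
            pos += 8
            value = blob[pos:pos + length]
            pos = _align4(pos + length)
            start = off_strings + nameoff
            prop = blob[start:blob.index(b"\0", start)].decode("ascii")
            if depth == 1 and prop == "#address-cells":
                (addr_cells,) = struct.unpack(">I", value)
            elif depth == 1 and prop == "#size-cells":
                (size_cells,) = struct.unpack(">I", value)
            elif in_memory and prop == "reg":
                cells = struct.unpack(">%dI" % (length // 4), value)
                step = addr_cells + size_cells
                for i in range(0, len(cells) - step + 1, step):
                    base = _cells_to_int(cells[i:i + addr_cells])
                    size = _cells_to_int(cells[i + addr_cells:i + step])
                    ranges.append((base, size))
        elif token == FDT_NOP:
            continue
        elif token == FDT_END:
            return ranges
        else:
            raise ValueError("unknown FDT token %d" % token)
```

Because the firmware fills these nodes in at boot, the static .dts in the Linux tree is not expected to show the final values; the blob passed in x0 is the thing to parse.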

05:19 <rnicholl1> ok

05:21 <Mutabah> klange: Bonus points for supporting 32-bit and 64-bit EFI in one

05:24 slidercrank has joined #osdev

06:01 bauen1 has quit [Ping timeout: 265 seconds]

06:05 Ali_A has joined #osdev

06:14 Ali_A has quit [Ping timeout: 260 seconds]

06:16 mahk has joined #osdev

06:44 gbowne1 has quit [Quit: Leaving]

06:51 rnicholl1 has quit [Quit: My laptop has gone to sleep.]

06:51 rnicholl1 has joined #osdev

06:57 rnicholl1 has quit [Quit: My laptop has gone to sleep.]

07:33 bauen1 has joined #osdev

07:34 Ali_A has joined #osdev

07:41 pfsoe has joined #osdev

07:48 bauen1 has quit [Ping timeout: 250 seconds]

08:52 gildasio1 has joined #osdev

08:57 Ali_A has quit [Quit: Client closed]

09:10 gog has joined #osdev

09:13 mahk has quit [Ping timeout: 268 seconds]

09:27 Sos has joined #osdev

09:43 Sos has quit [Quit: Leaving]

09:46 <gog> good morning

09:46 * gog passes out bagels and coffee

09:56 GeDaMo has joined #osdev

10:12 mahk has joined #osdev

10:44 slidercrank has quit [Quit: Why not ask me about Sevastopol's safety protocols?]

11:01 dutch has quit [Quit: WeeChat 3.8]

11:04 heat has joined #osdev

11:05 * klys politely declines yet accepts the bagel

11:15 <AmyMalik> FreeBSD does not know how to manage PCIe power. The result? a CPU power virus.

11:33 bradd has quit [Ping timeout: 260 seconds]

11:35 dutch has joined #osdev

11:50 danilogondolfo has joined #osdev

12:18 nyah has joined #osdev

12:43 heat has quit [Remote host closed the connection]

12:51 heat has joined #osdev

12:53 Ali_A has joined #osdev

12:54 <heat> our fucking national postal service sucks so much ass

12:54 <heat> i'll have to fucking pray the svr4 book hasn't been sent back to fuckoffshire, england

12:55 SGautam has joined #osdev

12:55 <heat> these idiots think that ringing the doorbell is optional

12:56 elastic_dog has quit [Killed (zinc.libera.chat (Nickname regained by services))]

12:56 elastic_dog has joined #osdev

13:08 <nikolar> Kek

13:16 <SGautam> Might be slightly off topic but nevertheless this channel would know a thing or two about hard drives. Is it my anecdotal experience or is it something well known that Seagate drives are known to fail?

13:16 <SGautam> I've had 3 drives go off on me one by one, in succession.

13:28 <heat> wasn't that a headline like a few days ago?

13:28 <gog> are these newer seagate drives?

13:28 <gog> iirc they have a product line that's based on what they acquired from Maxtor and they're notoriously bad

13:28 <gog> not like the barracudas of the mid to late 2000's

13:29 <heat> https://www.securedatarecovery.com/blog/how-long-do-hard-drives-last

13:29 <bslsk05> www.securedatarecovery.com: How Long Do Hard Drives Last

13:29 <heat> yeah it was a headline a few days ago

13:35 bauen1 has joined #osdev

13:41 inegatives has joined #osdev

13:41 <heat> gog, i have an interview in a few minutes and i feel nervous, help

13:41 <gog> get drunk

13:41 <heat> damn you're my savior, thanks

13:44 anbdummy has quit [Quit: WeeChat 3.8]

13:46 <gog> anytime

13:51 <gog> i'm doing a good job at work

13:51 <gog> i'm a competent developer even if i don't always know the answer right away

13:51 <gog> i am not an imposter

13:54 gildasio1 has quit [Quit: WeeChat 3.8]

13:56 <heat> damn right

13:56 <heat> tomatoanus is proud of you

13:56 <zid> I am sus

13:56 <heat> amogus

13:57 <gog> sussy

13:57 <gog> ( ?° ?? ?°)

13:57 <gog> dang

13:58 <heat> DONG?

13:58 <heat> 𓂸

13:59 <heat> oh, dang

13:59 <heat> dang it

13:59 Ali_A has quit [Quit: Client closed]

13:59 * FireFly dangs it

13:59 <heat> gosh darnit

14:00 <gog> con sarnit

14:00 <zid> heat stop asking to see my sussy

14:00 <zid> it isn't free

14:00 <mrvn> SGautam: hard drives are widely different from model to model. Sure there were some Seagate drives that failed. In general though a drive is likely to fail early, then there is a period of low failure rates during its working life, and then the failure rate rises towards the end again. If you bought a batch of drives it's also likely for them all to fail after around the same amount of working hours.

14:00 <Mutabah> well this is a... discussion

14:00 <zid> Mutabah's already paid up in full ofc

14:02 <mrvn> Drives in a raid also like to fail during recovery because when the first fails and you replace it the raid rebuild puts so much extra strain on the drives.

14:03 danilogondolfo has quit [Ping timeout: 276 seconds]

14:03 heat_ has joined #osdev

14:04 heat has quit [Ping timeout: 246 seconds]

14:04 Ali_A has joined #osdev

14:09 <Mutabah> Plus, RAIDs tend to be made at the same time, and it's easy to get n of the same model... so likely to have the same failure mode

14:09 <gog> yes

14:10 <gog> and if any of them have a defect that wasn't detectable at the time of QA then they're all gonna be sus

14:10 <gog> hm, how would one time-accelerate testing for MTBF?

14:11 <zid> yea I was thinking about exactly that after my ssd just died

14:11 <zid> it's samsung's shitty fw that died it, so I bet if i had two of them in raid both would have died

14:12 linearcannon has quit [Remote host closed the connection]

14:12 linearcannon has joined #osdev

14:15 gildasio has joined #osdev

14:16 nvmd has joined #osdev

14:17 inegatives has quit [Quit: Client closed]

14:18 inegatives has joined #osdev

14:21 linear_cannon has joined #osdev

14:23 linearcannon has quit [Ping timeout: 250 seconds]

14:24 linearcannon has joined #osdev

14:27 linear_cannon has quit [Ping timeout: 276 seconds]

14:30 <gog> i have to support a legacy thing and a new thing and the logic to do it is annoying and complicated

14:33 inegatives has quit [Quit: Client closed]

14:38 <mrvn> Mutabah: if you have the budget you buy a few extra spares for the raid and then you replace some drives after some random times so they are all aged differently.

14:38 * Mutabah is away (Sleep)

14:40 <mrvn> gog: they tend to run the drive under worse conditions, like turn up the heat and humidity and then they have a multiplier how much longer that relates to in real life. Which is basically purely made up.

14:40 <gog> yeahhh

14:42 <mrvn> The specs on SSDs are real funny too. They have some life expectancy of 3 to 5 years but there is also a value for how much data you can write before it's expected to fail. Divide the amount by the max speed and you get less than a week.

14:42 <nortti> wait what SSDs only have a life expectancy of 3 to 5 years?

14:43 <mrvn> nortti: same as spinning disk. Bogus made up value for which the manufacturer will replace your disk.

14:46 <mrvn> With SSDs I would trust most the amount written before failure because that they can actually test easily. Just write to the same address on the flash over and over and count. They know it manages like 10'000 or 100'000 writes or whatever the current value is and from that and their wear leveling algorithm they know how many writes you can do.
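
The arithmetic behind the "less than a week" remark, as a small sketch. The function names are hypothetical and the drive figures in the usage note are made-up round numbers, not the specs of any real product; decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) are assumed.

```python
SECONDS_PER_DAY = 86_400


def estimated_tbw(capacity_gb, pe_cycles, write_amplification=1.0):
    """Rough write budget in TB: capacity x program/erase cycles / write amplification."""
    return capacity_gb * pe_cycles / write_amplification / 1000.0


def days_to_exhaust(tbw_tb, write_speed_mb_s):
    """Days of non-stop writing at full speed needed to use up a TBW budget."""
    seconds = (tbw_tb * 1e12) / (write_speed_mb_s * 1e6)
    return seconds / SECONDS_PER_DAY
```

For a hypothetical drive rated at 600 TB written that sustains 3000 MB/s, `days_to_exhaust(600, 3000)` comes out at about 2.3 days, so the rated write budget and the multi-year figure clearly describe very different usage patterns.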

14:46 elastic_dog has quit [Remote host closed the connection]

14:47 elastic_dog has joined #osdev

14:47 <zid> ssds are consumable goods yea

14:48 <zid> and they often die by just disappearing, not going read only

14:48 <zid> everybody's warranties have been getting shorter and shorter on them too

14:48 <mrvn> I recently also heard that SSDs will lose data if you put them on a shelf and don't use them. In use the data is refreshed by the wear leveling every now and then but unused it fades. Anyone heard that too?

14:48 <mrvn> No idea if that's true.

14:50 vdamewood has joined #osdev

14:53 <gog> iirc that's more of a thing with MLC than SLC

14:53 <gog> because the distance between the stored charge is smaller and in more directions than with MLC it can just do quantum weirdness and start to degrade

14:53 <gog> s/MLC/SLC

14:54 <gog> > According to that rule, MLC SSDs generally have a shorter life span than SLC SSDs, and the difference can be significant. For example, an MLC SSD may last for about 3,000 cycles, while an SLC SSD in the same application may last for 60,000.

14:55 <zid> yea more layers is a more bad ssd

14:55 <zid> generally

14:55 <zid> they sell TLC and QLC too

14:55 <gog> considering returning to optical media for long-term backups

14:55 <zid> flash is fairly shelf stable I think

14:56 <zid> it just doesn't like being written to, either the flash dies or the controller does

14:56 <mrvn> gog: how many cycles is sitting in a shelf for a year?

14:56 <gog> i'm looking for shelved data integrity but not finding much

14:56 <mrvn> gog: DAT tapes?

14:56 <mrvn> .oO(and yes, that's tape tapes)

14:57 <gog> a tape drive for really long-term storage sounds fine

14:57 <gog> it's a time-tested technology

14:57 <gog> and you can get a lot on a tape

14:57 <zid> also cheap

14:57 <mrvn> you still need to rewind them every now and then to prevent bleed through I think.

14:57 <zid> flash prices are going down a lot though

14:57 <zid> multi TB drives exist now

14:58 danilogondolfo has joined #osdev

14:58 <mrvn> Also, is there any software to run a raid algorithm over tapes? Or do you just mirror for redundancy?

14:59 <mrvn> E.g. XOR everything written to a tape to a harddisk and after 4 tapes write the harddisk contents out to a 5th tape as redundancy.
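
A minimal sketch of the XOR scheme mrvn outlines, with tape contents modelled as equal-length byte strings (an assumption; real tapes would need padding to a common length). The helper names are hypothetical.

```python
def xor_blocks(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(tapes):
    """XOR all tapes together; the result is what goes on the extra tape."""
    parity = bytes(len(tapes[0]))  # starts as all zero bytes
    for tape in tapes:
        parity = xor_blocks(parity, tape)
    return parity


def recover_lost_tape(surviving_tapes, parity):
    """Rebuild the single missing tape from the survivors plus parity."""
    return make_parity(list(surviving_tapes) + [parity])
```

XOR parity of this kind survives the loss of any one tape in the set, including the parity tape itself, but not two at once.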

15:00 <gog> ehhhh

15:00 <gog> i'd say just mirror

15:00 <gog> having blocks spanning multiple tapes seems like a good way to lose everything if one of the tapes gets damaged

15:00 <nortti> from what I see a lot of ppl don't bother using anything but mirroring with disks either nowadays

15:00 Ali_A has quit [Quit: Client closed]

15:00 <nortti> due to write amplification and such

15:00 <mrvn> gog: I don't want to stripe files across tapes. :) Just make it so a lost tape can be recovered.

15:01 <zid> I use a raid array between my ssd and a floppy disk, with --write-mostly

15:02 <mrvn> zid: must be a 9 1/2' floppy or it would be far too small for the data. :)

15:02 <mrvn> nortti: doing 3 or 4 copies with mirroring becomes expensive

15:02 <zid> I did actually do that for my firefox profile back in the day, made a raid array between a ramfs file mounted loopback, and a real file, with the real file --write-mostly

15:03 <mrvn> zid: does that help? firefox writes and syncs every 10 seconds and only reads back the data on restart. So you mostly write anyway.

15:06 inegatives has joined #osdev

15:10 <gog> mrvn: raid 6 types

15:10 <gog> tapes

15:14 Ali_A has joined #osdev

15:15 <mrvn> gog: go further. 16+4 redundancy or so.

15:16 <mrvn> Survive 4 tapes failing at 25% cost increase.

15:16 <mrvn> Once you do that your drive is probably going to fail just when you can't get a new one anymore. :)

15:17 <gog> oops

15:17 <gog> what LTO level are we up to now

15:17 * mrvn wants to dope diamond with something as permanent storage on an atomic level.

15:18 <mrvn> At least LTO-8

15:18 <mrvn> make that 9, a few hits down on google.

15:18 <mrvn> "The ninth generation of LTO Ultrium was announced in 2020 and can hold 18 TB in a cartridge of the same physical size."

15:18 <heat_> 9000

15:20 <mrvn> I didn't even notice that tapes have exceeded my harddisk sizes.

15:20 <gog> lmao "how many levels of LTO are you on my dude" "8 or 9"

15:21 <FireFly> was just thinking that too lol

15:22 <gog> fuck. i am not a competent developer

15:22 dude12312414 has joined #osdev

15:22 <gog> i think i need to learn more about react because this is clearly supposed to be easy but i fail to comprehend it

15:25 <mrvn> Tape drives are way too expensive for my taste: Qualstar Qi LTO-9 SAS EXT. Tape Drive -- 5.550,00€

15:25 <mrvn> I can buy a lot of harddisks before tapes become cheaper.

15:26 <Ermine> gog: may I pet you

15:26 <gog> yes

15:26 * Ermine pets gog

15:26 * gog prr

15:27 <gog> yeah it makes way more sense if you're backing up whole data centers

15:27 <gog> not so much if it's just your own stuff

15:27 <Ermine> Do tape drives still exist???

15:27 <gog> oh yeah

15:27 <Ermine> want one

15:28 <gog> i should pick up a storage drive for backups

15:28 <Ermine> Like those? https://www.amazon.com/LTO-5-Ultrium-3000-Internal-Drive/dp/B0067EZ52G/ref=sr_1_5?keywords=lto+tape+drive&qid=1679412492&sr=8-5

15:28 <bslsk05> www.amazon.com: LTO-5 Ultrium 3000 SAS Internal Tape Drive - Bandlaufwerk - LTO Ultrium ( 1.5 TB / 3 TB )

15:29 <Ermine> heck, those are expensive

15:35 <mrvn> Even the older models still are expensive.

15:36 <mrvn> Ermine: is there anything other than tapes that is designed for 30-60 years of storage? Only thing lasting more would be paper.

15:36 <Ermine> Ok ok

15:37 heat has joined #osdev

15:37 heat_ has quit [Read error: Connection reset by peer]

15:38 inegatives has quit [Quit: Client closed]

15:39 <SGautam> gog, Yes, one was purchased in 2018, the other in 2019 and the newest one in 2020. They all have the same design. They were all 2 TB Backup Plus line.

15:39 <zid> people still buy petabytes of tapes

15:39 <SGautam> All failed right during the time something was being transferred though. One failed while trying to install a game from Steam, the other failed when transferring photos, and the last one failed while installing ANSYS.

15:40 <SGautam> Like suddenly. You hear the Windows eject sound, boom, it's gone. Setup / Transfer errors out, and you put it back in and poof, nothing. Neither Windows nor Linux even detects it.

15:40 Left_Turn has joined #osdev

15:45 Ali_A has quit [Quit: Client closed]

15:45 bauen1 has quit [Ping timeout: 268 seconds]

15:47 linearcannon has quit [Remote host closed the connection]

15:48 linearcannon has joined #osdev

15:51 <mrvn> Remember zip drives with their click of death?

15:51 <heat> from experience, windows completely falls over once ATA transfers start failing

15:52 <heat> linux is a lot more resilient *and* you get all the ATA errors in the dmesg

15:53 <zid> that's because windows treats half the fs as important files that must not be missing

15:53 <zid> linux be like "I guess, bzImage has to be in memory?"

15:54 Ali_A has joined #osdev

15:54 <mrvn> heat: linux also easily goes into an endless reset loop and never reports a failure to the upper layers.

15:55 <mrvn> So e.g. the raid never throws out the faulty drive and retries on a different one. It just blocks forever.

15:57 slidercrank has joined #osdev

16:01 <mrvn> If anyone wants to work on the Linux ATA errors I have a fun little project that needs support. I have a bunch of relays inserted into the power connection so you can cold reset the drive on demand or even put it into cold standby on inactivity.

16:16 Ali_A has quit [Quit: Client closed]

16:23 Ali_A has joined #osdev

16:31 <SGautam> I'm really scared at doing anything with hard drives at this point. Like I thought the point of HDDs was to store large stuff.

16:32 <zid> hdds are very very reliable

16:32 <zid> (unless they are not)

16:32 <SGautam> Three of them failed one after another for me in the last month.

16:32 <zid> was there an earthquake

16:33 <SGautam> Nope, all failed when trying to transfer large amount of data.

16:33 <zid> https://cdn.discordapp.com/attachments/417023075348119556/1087776035619221685/image.png

16:33 <SGautam> Maybe the USB controller in my laptop could be an issue?

16:34 <SGautam> Because I did notice that all USB devices started to act wacky whenever an HDD failed.

16:34 <zid> 4850 days

16:34 <zid> ah if it's a portable drive then the answer to was there an earthquake is "yes", btw

16:34 <SGautam> Yes, it was an external HDD

16:34 <zid> 13.3 years my 1TB drive has been powered on for

16:35 <SGautam> Is it advisable to carry external HDDs in a backpack while travelling without some sort of a case?

16:35 <zid> when they're off they're fairly fine, the heads aren't loaded

16:35 <SGautam> I'm trying to narrow down the cause of failure here. 3 HDDs failing should mean something's definitely wrong with my system.

16:36 nvmd has quit [Quit: Connection closed for inactivity]

16:36 <SGautam> Basically all 3 drives now, once connected spin for a while, and then stop. OS doesn't detect anything.

16:36 <zid> oh today is the 40k hour anniversary of the 4TB drive

16:36 <SGautam> 40,000 hours certainly...?

16:36 <zid> certainly what?

16:36 <SGautam> Oh nvm, didn't read the "hour"

16:37 <SGautam> read it as "40k anniversary"

16:37 <zid> Yes, it's now officially part of the warhammer universe

16:41 pfsoe has left #osdev [#osdev]

16:42 <heat> mjg, idk if you've been following the thread or not, but linux is going to start returning EINVAL (like netbsd) there

16:42 <heat> 1) sweet 2) linux just broke userspace 😲😲😱😱😱😱

16:43 Ali_A has quit [Quit: Client closed]

16:43 foudfou_ has joined #osdev

16:44 foudfou has quit [Remote host closed the connection]

16:44 Ali_A has joined #osdev

16:46 inegatives has joined #osdev

16:50 linearcannon_ has joined #osdev

16:50 linearcannon has quit [Remote host closed the connection]

16:51 <geist> heat: woot your patch got accepted?

16:52 <heat> no

16:53 slidercrank has quit [Ping timeout: 276 seconds]

16:53 <heat> my patch restored the old behavior, they decided to break any past linux behavior and just EINVAL the whole thing

16:53 <heat> because it was stupid and "since no one complained for two stable releases, no one cares probably"

16:54 <heat> which is a good idea IMO

16:54 <geist> ah, so create and dir at the same time is simply EINVAL now?

16:55 <heat> yes

16:55 <heat> it's probably an open question (might ask) if there's any value in letting open create a directory in that case

16:56 <heat> but I doubt anyone on that side is inclined towards that

16:59 <heat> you know, it's pretty remarkable how linus still knows his shit

17:00 <zid> where's the lkml link

17:00 <zid> or do I have to check my own emails

17:00 <zid> oh found it, from:pedro

17:01 <geist> well sounds like someone voted for pedro

17:02 <heat> damn, i'm getting votes now?

17:02 <heat> what do I get?

17:02 <zid> executed.

17:02 <heat> shit

17:03 <geist> a t shirt: https://usercontent.irccloud-cdn.com/file/u94LPfGK/pedro.png

17:04 Perflosopher has joined #osdev

17:04 <zid> heat if you reply again, put a 'hi zid' at the bottom kthx

17:05 <heat> "hi linus, can you say hi to my friend zid? he's a big fan"

17:05 <zid> exactly

17:05 <heat> he needs a cameo account but he's still not a washed enough celebrity for it

17:06 <zid> I actually like the idea of O_CREAT | O_DIR though

17:06 <zid> it's less racey than doing that as two calls

17:07 <zid> like "I wanna make this file, but if it's a directory, open it instead"

17:07 <heat> it's a weird idea though?

17:07 <zid> gentoo does something similar for portage

17:08 <zid> it'll either open blah.conf or every file in blah.conf/

17:08 <heat> like in what case would you ever want to open *a directory*, but if its not a directory, create a regular file?

17:08 <geist> it depends on if you consider O_CREAT to be 'create a file, not anything else'

17:08 <geist> or 'O_CREAT can be combined with other attributes to specify the type of thing'

17:08 <heat> having O_CREAT | O_DIR do "open-dir-or-create-dir" atomically would be a much more interesting idea

17:09 <geist> right, i'd tend to read O_DIR being combined with O_CREAT to be a modifier to O_CREAT to tell you to make a dir instead of a file

17:09 gog has quit [Quit: Konversation terminated!]

17:09 <geist> but since there's a mkdir call, it's redundant, so the EINVAL return is probably legit

17:09 <heat> it's not redundant

17:09 <geist> but then you can argue that if it *did* make a dir, then it's not redundant

17:10 <heat> mkdir only creates, does not open, and does not give you a fd
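
The two-call sequence heat is contrasting with a hypothetical single `open` can be sketched as below, using Python's thin `os` wrappers around `mkdir(2)` and `open(2)`. This assumes a platform where `os.O_DIRECTORY` is available (Linux, for example); the function names are made up for illustration.

```python
import os
import stat
import tempfile


def make_then_open_dir(path):
    """Create a directory, then open it: two separate system calls."""
    os.mkdir(path)  # only creates; no file descriptor comes back
    # Race window: something else could remove or replace `path` right here.
    return os.open(path, os.O_RDONLY | os.O_DIRECTORY)


def opens_as_directory(path):
    """True if `path` can be opened with O_DIRECTORY, False on ENOTDIR."""
    try:
        fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    except NotADirectoryError:
        return False
    os.close(fd)
    return True
```

The gap between the two calls in `make_then_open_dir` is the window an atomic create-and-open would close.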

17:10 <geist> because then it's actually useful feature: open a dir ... yeah what you said

17:10 inegatives has quit [Quit: Client closed]

17:11 <geist> i do wonder what really old unices did here. it's fairly clear that early on there was no distinction between dir and file, though maybe the dir bit was still set in the inode

17:11 <geist> so seems like the open(CREAT|DIR) would have been the original pattern

17:11 <heat> O_DIRECTORY isn't that old

17:11 <geist> oh maybe that's the diff, at some point dir as a separate type of file came along

17:11 <heat> it's in linux since 2.1

17:11 <geist> though i dunno how in the old days it was specified that something was a dir, vs a regular file

17:12 <geist> well, okay, so thats like only 20ish years or so

17:12 <heat> i mean, normally? with a file type?

17:12 <zid> heat: I like fewer calls to the fs to achieve what I want, and imo there's no drawback to *supporting* create-file-or-open-dir-in-one-call

17:12 <heat> O_DIRECTORY really is just a "make sure this is a dir"

17:12 <zid> and I can think of *a* use for it

17:13 <zid> like, if the alternative is -EINVAL then why not just make it do something

17:13 <heat> because the "something" is kind of the antithesis of O_DIRECTORY

17:13 <zid> meh, I just consider it a flag

17:13 <geist> yah depends on if you consider the flag combination to be 'limit it to this' or 'add this additionally'

17:14 <zid> flags can do flaggy things and mean things depending on other flags

17:14 <heat> O_DIRECTORY = only return directories, O_DIRECTORY | O_CREAT = open *only* directories, or create a regular file

17:14 <geist> i'd tend to consider it to be the former

17:14 <geist> ie, i passed O_DIR on this, so i only want this to deal with dirs, and nothing else

17:14 <zid> mkdir's existence at all is the weird part of all this

17:14 dutch has quit [Quit: WeeChat 3.7.1]

17:14 <geist> and in that logic it makes sense to limit the combination to at most 'create a new dir if it doesn't exist, but if it's a file, error'

17:15 <zid> I don't value consistency here, given it returns EINVAL now, and it does *something* useful.

17:15 <geist> so i guess -EINVAL is just a cop out to avoid the problem

17:15 <zid> I'd rather have the useful option available than not

17:15 <heat> i think mkdir, mknod, etc could in theory have been mostly collapsed to open()

17:15 <zid> yea idk why mkdir just isn't O_CREAT | O_DIR

17:16 <zid> to begin with

17:16 <heat> because le unix

17:16 <geist> well because mkdir only creates a dir, not opens it additionally

17:16 <zid> do I need to dissect dennis ritchie

17:16 <geist> so it's not precisely the same semantics

17:16 <heat> geist, so open(O_DIR | O_CREAT | O_EXCL)

17:16 <geist> that makes sense

17:17 <zid> I think it's probably rarer that you want to make an empty dir than you are making a dir so you can then write a file

17:17 <geist> of course mkdir() user code also returns a DIR * iirc, but that's just a layering thing. it could be built on top of open(...)

17:17 <heat> no, mkdir returns an int (0 or -1)

17:17 <geist> hmm, what is the user api that gets you a DIR *?

17:17 <zid> mkdir bad and racey and inconsistent with other api

17:17 <heat> should be the opendir stuff, etc

17:17 <heat> but those don't create I think

17:17 <geist> oh i see. mkdir is just error, opendir gets you the DIR *

17:18 <geist> yah i'm gonna guess it has something to do with when DIRs were just regular files and the contents were maintained by user space

17:18 <geist> so in that case mkdir is kinda like 'format this file with dir structures'

17:18 <zid> yea

17:18 <zid> like mkisofs

17:19 <heat> mkdir (and mknod) are kind of odd because there's a blatant pseudo-TOCTOU that no one has cared to address, ever

17:19 <heat> while open is super nice and super well designed but does not support any sort of other files because of stubbornness or something

17:19 <zid> yep

17:19 <zid> that's why my brain goes "Just make open do it"

17:20 <zid> fewer syscalls better, less racey

17:20 <Ermine> real thing

17:20 <zid> I *want* an atomic 'open file or dir or make it or whatever' because I definitely do *not* want to have to write that myself some day.

17:20 <geist> flip side is, you're dealing with posix and 50 years of legacy, so you kinda are already knee deep in crap at that point

17:20 <zid> from userspace

17:20 <geist> you kinda cant get what you want by definition

17:20 <geist> ie, if you want clean apis dont implement posix :)

17:21 <Ermine> the only thing is that dir works through DIR struct, and regular files are just a fd

17:21 <geist> which of course doesn't fix your problem, it just makes it entirely your problem that you can hypothetically fix

17:21 <geist> might be interesting to see what win32's raw dir api looks like. i dont know off the top of my head

17:21 <heat> but UNIX is perfect

17:22 <geist> i bet win32 just treats it all as a separate kind of file: OpenDirectory, ReadDirectory, etc

17:22 <heat> it's like a gordon ramsey dish but where gordon ramsey never actually went to culinary school

17:23 <zid> An application cannot create a directory by using CreateFile, therefore only the OPEN_EXISTING value is valid for dwCreationDisposition for this use case. To create a directory, the application must call CreateDirectory or CreateDirectoryEx.

17:23 <zid> is all I know about dirs on windows

17:24 <heat> yeah ok, so you can open a dir with CreateFile

17:24 <zid> yea you just can't make one

17:24 <zid> linux's open can create, windows' create can open

17:24 <zid> :D

17:24 <heat> passing note that all these win32 calls are weirdly named and CreateFile is "Create internal win32 file handle to $stuff"

17:25 <geist> yah was gonna point out that createfile is weird

17:25 <zid> yea createfile is basically createfd

17:25 <zid> and can be a file, or DEVICE_HANDLE or whatever

17:25 <zid> cus they don't go all 'files are devices'

17:25 bauen1 has joined #osdev

17:25 <zid> they just say 'everything gets a HANDLE'

17:25 <heat> yeah, they go "all files are objects in this object manager extravaganza"

17:25 <heat> which, honestly, i'm not sure which is worse

17:26 <geist> they probably have limits of what kinda handles you can point files at but that makes sense in that case: create file says no

17:26 <zid> at least linux doesn't crash if you try to open CON

17:26 <geist> sockets too being handles of a particular type

17:26 <immibis> windows also doesn't crash if you try to open CON. It just opens CON, right?

17:26 <zid> no

17:26 <immibis> you could say linux crashes when you try to open /dev/tty

17:26 <zid> windows 3 -> xp just crashed

17:26 <zid> windows supports posix socket stuff, it just caps out with an impressive MAX_FDS of 32

17:27 <geist> i definitely embrace the notion of one handle namespace for all things, which we tilted hard on in zircon and still works

17:27 <geist> and that files are just one kinda handle

17:27 <geist> so i guess in windows the fd namespace is separate, but you have to map files to it

17:27 <Ermine> Directories are a special kind of files, so I don't think you can get away without some special API

17:27 <geist> or the fd namespace is maintained in user space, maybe

17:28 <zid> \\.\GLOBALROOT\Device\ConDrv\KernelConnect is what you can open to crash modern windows apparently

17:28 <zid> same file, but bypasses the 'CON' strcmp protections

17:28 <geist> nice, though presumably you need particular ACLs to get that

17:28 <heat> <immibis> you could say linux crashes when you try to open /dev/tty <-- what?

17:29 <zid> My favourite windows crash was (or maybe still is) that two applications could debug each other, as a sort of DRM, then if you tried to attach a debugger to either it'd just bluescreen

17:29 <heat> Ermine, how so?

17:30 <Ermine> They have a structure, they are not a bag of bytes

17:30 <zid> neither are files

17:30 Ali_A has quit [Quit: Client closed]

17:30 <zid> they have spooky things like names and permissions

17:30 <zid> which you have to call special apis to query

17:31 <heat> i can imagine a world where reading through a dir fd would just give you structured records of directory entries

17:31 <Ermine> Or you need to call fdopendir after open every time

17:31 <zid> heat: I think that's where we came from

17:31 <zid> the DIR * thing

17:31 <Ermine> heat: and reading anything else than sizeof(struct direntry) is UB?

17:31 <heat> old UNIX (and some new UNIX) just reads the raw disk struct

17:32 <heat> Ermine, no, it could work more or less like getdents does

17:32 <zid> or .zip or .cab or .hpi or any other format that can handle nested files, imo

17:32 <heat> actually, current netbsd does support catting directories (which I assume is just a plain open + read())

17:32 <heat> and it gives you binary garbage, probably straight from the fs

17:33 <geist> last i checked most of the BSds let you directly read the dir if you're root. but i suspect that only makes sense on FSes where the dir itself is a flat file with a structure on it

17:33 <geist> presumably fses where the dir is a more abstract concept on disk wouldn't work

17:33 <mjg> freebsd disallows it now

17:33 <heat> yeah i wonder what happens if you do it on zfs

17:33 <geist> mjg: makes sense

17:33 <mjg> it was hysterical behavior

17:34 <mjg> afair there used to be no readdir back in the day

17:34 <geist> yah and wouldn't make sense for something like zfs

17:34 <heat> mjg, FREEBSD IS A TRAVESTY AND IS A BETRAYAL OF TRADITIONAL UNIX VALUES

17:34 Ali_A has joined #osdev

17:34 <geist> ALL HAIL TRU64

17:35 <heat> tru64 is indeed the true 64

17:35 <heat> mjg, fyi i'm now armed with 100 or so pages about STREAMS

17:36 <mjg> did ya know that AIX in the 90s was claimed to support 8 CPUs?

17:36 <heat> send me money or I'll add STREAMS to FreeBSD

17:36 <heat> this is not a threat, it's a promise

17:36 <zid> 8 cpus? nobody can afford 8 cpus.

17:36 <zid> THat's a lie

17:36 <Ermine> heat: how many?

17:36 <zid> intel's selling 4 cpus superglued together now though, for $11000, so we're almost there

17:36 <geist> 1 2 3.... 7

17:36 <geist> 8 no way

17:36 <mjg> or 9

17:36 <mjg> but 8?

17:36 <heat> Ermine, a football field worth of pages

17:36 <geist> 10 is right out

17:36 <mjg> liek gap in the market innnit

17:37 <zid> what if I promise to proceed directly to 11?

17:37 <Ermine> heat how much money?

17:37 <heat> idk whatever I can extort from those freebsd nerds

17:37 <Ermine> say no more

17:37 * Ermine runs to bank

17:38 <heat> i said freebsd, not alpine

17:38 <zid> heat: as your manager, don't forget my 30%

17:38 <heat> your musl distribution is safe, for now

17:38 <mjg> so what book is it

17:38 <mjg> solaris internals or secrets of the magic garden

17:38 <heat> secret of the magic gard

17:38 <mjg> which is 'solaris internals to be'

17:38 <mjg> :x

17:38 <Ermine> heat: you lost a chance to get some $$$

17:39 <Ermine> Btw does anything use STREAMS?

17:40 <heat> some of the old UNIXes maybe?

17:40 <geist> hmm, does any of the modern ones implement it?

17:41 <CompanionCube> geist: illumos does?

17:41 <heat> yeah erm, that's what I meant with "old UNIXes"

17:41 <geist> ah because solaris heritage

17:41 <heat> AIX, HPUX, Illumos, Solaris

17:41 <geist> i honestly never looked at it. like all things it's probably got some okay ideas but didn't catch on

17:41 <heat> basically anything coming out of the SVR4 oven

17:41 <heat> hell, UnixWare

17:41 <geist> though i guess it could just as easily have totally fatal flaws

17:42 <CompanionCube> heat: isn't current unixware freebsd-based though?

17:43 <geist> hmm, looks like poll may have come out of streams

17:43 <mjg> poll?

17:43 <nortti> the poll-syscall

17:43 <heat> you know, gotta love these UNIX function names

17:43 <heat> lookuppn

17:43 <mjg> nortti: not what i'm asking

17:43 zxrom has quit [Quit: Leaving]

17:43 <geist> poll syscall yes

17:44 <geist> just reading the wikipedia summary of STREAMS https://en.wikipedia.org/wiki/STREAMS

17:44 <geist> says in the first part poll, putmsg, and getmsg were added

17:44 <mjg> fbsd man page says The poll() function appeared in AT&T System V UNIX.

17:45 <mjg> no info how did that happen ;>

17:45 <heat> magic

17:45 <mjg> > This port added the putmsg, getmsg, and poll system calls, which are nearly equivalent in purpose to the send, recv, and select calls from Berkeley sockets.

17:45 <mjg> huh

17:45 <heat> it's where the magic garden came along, really just magic

17:45 <mjg> > FreeBSD has basic support for STREAMS-related system calls, as required by SVR4 binary compatibility layer.[21]

17:46 <mjg> hehe

17:46 <heat> o shit

17:46 <mjg> not true dawg

17:46 <heat> can't extort you idiots

17:46 <geist> yah makes sense that at least one part of it survived. poll() sounds like made the cut

17:46 <mjg> maybe it was at the time

17:46 <heat> ah, so the man page is lying again

17:46 <geist> well, basic support for the syscalls may be just returning -EINVAL

17:46 <heat> swear to god, why have man pages?

17:46 <mjg> geist: ye i'm saying i would expect poll to show up shortly after select

17:46 <mjg> being less fucking retarded 'n all

17:46 <mjg> heat: you mean why have wiki pages

17:47 <heat> geist, fyi if you want to read up on STREAMSSSSSSSSSSSSSSSSSSSSS, https://archive.org/details/magicgardenexpla0000good/

17:47 <heat> just borrow it for an hour and laugh at the oldies

17:48 <heat> "BUT THIS DOESNT SCALEEEEEEEEEEEEEEEEEE!!!!!!11111111111!!!!" - exclaimed mjg geist

17:48 <CompanionCube> there's also a programming guide released by sun that isn't a dead link now

17:49 warlock has quit [Ping timeout: 276 seconds]

17:49 inegatives has joined #osdev

17:50 gog has joined #osdev

17:51 <mjg> DOORS MOFER

17:52 Ali_A has quit [Quit: Client closed]

17:52 <heat> windows should get GATES

17:52 <heat> mjg, you still haven't properly explained why the 4.4BSD vfs sucks

17:53 <geist> heat: i see what you did there

17:53 <heat> if you go for a single lookup(bunchaoptions) do you end up getting some spaghetti-ass-code from all the special casing?

17:53 <gog> meow

17:54 <mjg> heat: i did give the gist of it

17:55 <CompanionCube> heat: i mean, technically wow64 iirc has something called 'heaven's gate'

17:55 <mjg> heat: the gist is someone wrote ufs support directly all over, then "vfs" got added in that a bunch of places got an indirect function call instead of just descending into ufs

17:55 <mjg> heat: .... all while numerous idiotisms stemming from ufs remained as the de facto api

17:56 inegatives has quit [Ping timeout: 260 seconds]

17:58 <geist> yah makes sense. NFS too: take everything that was at the vfs layer and shove it over a network

18:00 Ali_A has joined #osdev

18:02 <mjg> heat: also read about 'vnode aliasing'

18:06 rnicholl1 has joined #osdev

18:09 <heat> mjg, do you have no qualms with the lookup stuff?

18:10 Ali_A has quit [Quit: Client closed]

18:11 <mjg> i do, it's all weird at best

18:12 <mjg> perhaps you forgot of v_id

18:12 Ali_A has joined #osdev

18:13 <heat> what's v_id?

18:13 rnicholl1 has quit [Quit: My laptop has gone to sleep.]

18:13 <heat> and why is it "weird"?

18:13 zxrom has joined #osdev

18:13 <mjg> it is "we don't properly manage vnode lifecycles, so vnodes never get *actually* freed. instead, v_id gets bumped when they do, so you check it"

18:14 <heat> lol

18:22 foudfou_ has quit [Remote host closed the connection]

18:22 foudfou has joined #osdev

18:23 <mjg> again check vnode aliasing, related to device node handling

18:26 inegatives has joined #osdev

18:27 <mjg> i'm not gonna point fingers at systems which *still* do it :]

18:28 <heat> is it LibreBSD or Sunloris?

18:28 <heat> or both?

18:30 <mjg> net and open b

18:30 <mjg> here is how you 'free' a vnode https://grok.dragonflybsd.org/xref/386bsd/usr/src/kernel/kern/fs/vnode.c?r=dc8b130e#489

18:30 <bslsk05> grok.dragonflybsd.org: vnode.c (revision dc8b130e) - OpenGrok cross reference for /386bsd/usr/src/kernel/kern/fs/vnode.c

18:31 <mjg> you don't. it lands on an eternal "free list"

18:31 <mjg> here is how you alloc https://grok.dragonflybsd.org/xref/386bsd/usr/src/kernel/kern/fs/vnode.c?r=dc8b130e#104

18:31 <bslsk05> grok.dragonflybsd.org: vnode.c (revision dc8b130e) - OpenGrok cross reference for /386bsd/usr/src/kernel/kern/fs/vnode.c

18:31 <mjg> 109 if (numvnodes < desiredvnodes) {

18:31 <mjg> 110 vp = (struct vnode *)malloc((u_long)sizeof *vp,

18:31 <mjg> 111 M_VNODE, M_WAITOK);

18:32 <mjg> otherwise grab from the "free list"

18:32 <mjg> vnode once allocated persist till reboot

18:32 <mjg> cache_purge called from there bumps v_id

18:32 <heat> what would happen if you actually freed?

18:33 dutch has joined #osdev

18:34 <heat> i mildly enjoy how you just xref'd on 386BSD lol

18:34 <mjg> presumably you wuld get crashen all over

18:36 <mjg> https://grok.dragonflybsd.org/xref/386bsd/usr/src/kernel/nfs/nfs_vnops.c?r=dc8b130e#451

18:36 <bslsk05> grok.dragonflybsd.org: nfs_vnops.c (revision dc8b130e) - OpenGrok cross reference for /386bsd/usr/src/kernel/nfs/nfs_vnops.c

18:36 <mjg> 451 vpid = vdp->v_id;

18:36 <mjg> ... vget fuckery

18:36 <mjg> 470 if (vpid == vdp->v_id) {

18:36 <mjg> ops hehe not the same vnode anymore?
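
A toy model of the pattern mjg is pointing at: vnodes are never returned to the allocator, recycling one bumps its `v_id`, and anyone holding a pointer saves the id and re-checks it later. This is an illustrative sketch, not the 386BSD code; the class names and the `desired` cap are simplifications introduced here.

```python
class Vnode:
    def __init__(self):
        self.v_id = 0      # generation counter, bumped on every recycle
        self.data = None


class VnodePool:
    """Vnodes are allocated up to a cap, then only ever recycled."""

    def __init__(self, desired):
        self.desired = desired
        self.count = 0
        self.free_list = []

    def alloc(self, data):
        if self.count < self.desired:
            vp = Vnode()
            self.count += 1
        else:
            vp = self.free_list.pop(0)  # IndexError if nothing is free
            vp.v_id += 1                # invalidates stale references
        vp.data = data
        return vp

    def release(self, vp):
        # "Freeing" only parks the object on the free list.
        self.free_list.append(vp)


class CachedRef:
    """Saves the id at capture time, like `vpid = vdp->v_id` in the paste."""

    def __init__(self, vp):
        self.vp = vp
        self.vpid = vp.v_id

    def still_same_vnode(self):
        return self.vpid == self.vp.v_id
```

The scheme only works because the memory behind a stale pointer is guaranteed to still be a vnode, which is why nothing can ever really be freed.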

18:37 <heat> why does the average BSD fs routine have 50 variables

18:37 inegatives has quit [Quit: Client closed]

18:38 <heat> all poorly named too

18:38 <mjg> hands busy taking lsd

18:38 <mjg> no time to type

18:39 <heat> the USL people also had this same issue

18:39 <heat> lookuppn

18:39 <heat> compvpp, dirvpp

18:40 <mjg> i also note that by implementation, not actual real needz, all that lookup code is expensive af

18:40 <mjg> massive branchfest all over

18:41 <heat> there you go

18:41 <heat> yes, i assume it's spaghetti

18:41 <heat> that's what I wanted to hear

18:41 <heat> "doing everything in one lookup routine is a bad idea"

18:42 <heat> not like linux lookup seems to be any readable either

18:43 <heat> but it does seem like the strategy is overall a bit different

18:43 <mjg> https://grok.dragonflybsd.org/xref/freebsd/sys/ufs/ufs/ufs_vnops.c?r=191115cf#368

18:43 <bslsk05> grok.dragonflybsd.org: ufs_vnops.c (revision 191115cf) - OpenGrok cross reference for /freebsd/sys/ufs/ufs/ufs_vnops.c

18:43 <mjg> almost all real-world calls are with VEXEC

18:44 <mjg> the cpu sees 'ufs_*' and knows it's going to be in there for a while

18:44 <mjg> https://grok.dragonflybsd.org/xref/freebsd/sys/kern/vfs_subr.c?r=62a573d9#5244

18:45 <bslsk05> grok.dragonflybsd.org: vfs_subr.c (revision 62a573d9) - OpenGrok cross reference for /freebsd/sys/kern/vfs_subr.c

18:45 Ali_A has quit [Quit: Client closed]

18:45 <mjg> let's keep computing all these possible flags

18:45 <mjg> so that at the end of it one can: if ((accmode & (priv_granted | dac_granted)) == accmode) {

18:46 <mjg> meanwhile in a chad filesystem:

18:46 <mjg> if (__predict_true(accmode == VEXEC &&

18:46 <mjg> (node->tn_mode & all_x) == all_x))

18:46 <mjg> return (0);

18:46 johngammerson has joined #osdev

18:46 <mjg> there, almost all sorted out in 2 branches

18:47 <mjg> one may notice it would be better to have a dedicated routine so there is no need to branch on whether accmode == VEXEC

18:47 <heat> __predict_true is stupid

18:47 <mjg> ... which is what i did later

18:47 <mjg> how so dawg

18:47 <johngammerson> hello guys, i am truly ashamed to come back here but i have an assembly question, does loop auto return to the label it was called from?

18:47 <heat> because it's so stupidly named I probably wouldn't think of using it

18:47 <heat> likely()!

18:47 <johngammerson> i rtfm'd but still couldn't find an answer

18:47 inegatives has joined #osdev

18:48 <johngammerson> do i need to return manually after the loop?

18:48 johngammerson is now known as ilovethinking

18:48 <mjg> heat: oh the name. i don't have a strong opinion

18:48 <nortti> ilovethinking: are you talking of the x86 "loop" instruction? when you say "the label it was called from" what do you mean?

18:48 <mjg> heat: afair it was named like that to "discourage frivolous use"

18:48 <ilovethinking> nortti: can i paste somewhere?

18:48 <heat> also I would guess that doing this sort of weird micro optimization is weird

18:48 <ilovethinking> like paste.sh?

18:49 <nortti> yeah

18:49 <heat> it's like the weird prefetch()es in the middle of random linux kernel code

18:49 <gog> when the loop condition is satisfied it will continue to the next instruction after loop

18:49 <mjg> heat: how about likely or unlikely in said code

18:49 <mjg> you might have missed the part where almost all actual calls are VEXEC

18:50 <heat> yeah, they are, but will it ever change and does this make a difference?

18:50 <heat> the answer to "will it ever change" is clearly yes, since you added a completely separate routine for VEXEC

18:50 <heat> now your likely() makes no sense

18:51 <gog> for regular loop, opcode e2, it will loop until ecx is 0

18:52 <gog> for loope and loopne it'll loop until ecx is 0 or when zf is 0 or 1 respectively, which can be set within the body of the loop
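
A rough C model of what a plain x86 `loop` does, as a sketch rather than real assembly. The function name `loop_iterations` is made up for illustration, and the parameter stands in for the ECX register.

```c
#include <assert.h>
#include <stdint.h>

/* Models `label: <body> ; loop label`.
 * The body runs, ECX is decremented, and the jump back to the label is
 * taken only while ECX is still nonzero. */
uint32_t loop_iterations(uint32_t ecx)
{
    uint32_t runs = 0;

    /* With ECX == 0 on entry the decrement wraps and the body would run
     * 2^32 times, so this model refuses that case. */
    assert(ecx != 0);
    do {
        runs++;          /* the loop body */
        ecx--;           /* loop: decrement ECX */
    } while (ecx != 0);  /* loop: jump back while ECX != 0 */

    /* Nothing returns implicitly: execution just continues with whatever
     * instruction comes after `loop`. */
    return runs;
}
```

The `return` here belongs to the C function, not to `loop`; in assembly the equivalent `ret` has to be written out after the `loop` instruction.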

18:52 <heat> i don't think i've ever used loop before

18:52 <ilovethinking> https://paste.sh/M0suk_AY#mydgpuiSk3WvC0qlNk6-mVqF

18:52 <gog> me either

18:52 <gog> easier to use conditional jumps

18:52 <heat> what? you have loops?

18:52 <bnchs> hi osdeveersss

18:52 <heat> i just unroll everything

18:53 <heat> i am firmly against the concept of a loop

18:53 <gog> ilovethinking: ok so

18:53 <gog> there's no ret after loop

18:53 <gog> so it'll fall through back into _start

18:53 <gog> but you'll still have the stack frame

18:53 <gog> mov eax, 1 will never execute

18:54 <bnchs> heat: but loops save disk space on repeating :<

18:54 <heat> loops are morally wrong

18:54 <gog> you'll forever call .SetEntry

18:54 <gog> and eventually overflow your stack

18:54 <ilovethinking> hmmm... is there a way to make it continue to execute the mov eax? can i do this without calling? idk if im wrong but i saw someone do it without call, just setting the .setentry in the middle of start

18:54 <ilovethinking> that makes no sense

18:55 <nortti> "hmmm... is there a way to make it continue to execute the mov eax?" yeah, use a ret

18:55 <mjg> heat: it still makes sense. the routine added is only used by lockless lookup

18:55 <mjg> heat: should i add one for locked lookup, then this indeed should be removed

18:55 <gog> every call needs a corresponding ret

18:55 <ilovethinking> so basically ret after the loop?

18:55 <gog> yes

18:55 <mjg> heat: so the real calls which land there are still almost always VEXEC, there is just way fewer of them

18:55 <ilovethinking> thanks

18:55 <ilovethinking> appreciate it

18:56 <gog> assembly will do EXACTLY what you tell it

18:56 <gog> a label is not a function

18:56 <gog> it's just a symbol that refers to a position in code

18:56 <ilovethinking> i'm gone from here again so i don't indulge in accidentally making someone spoongf

18:56 <ilovethinking> spoonfeed me

18:57 <gog> there are very few things that will happen implicitly

18:57 <ilovethinking> gog: yeah that's kinda hard to get used to for me

18:57 <gog> you'll grow to understand it

18:57 <gog> assembly doesn't have a concept of functions or objects or anything. it's instructions and data

18:58 <gog> you can arrange the instructions to behave like a function

18:58 <gog> but it has to be explicit

18:58 <nortti> ilovethinking: if possible, I'd try to write some user-mode programs in assembly to get used to concepts

18:58 <gog> yes

18:58 <ilovethinking> good idea, what's something i could write to get used to it?

18:59 <gog> implement strlen

18:59 <gog> doesn't have to be fast or branchless

18:59 <gog> but it's a good starting point
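
A plain C version of the exercise gog suggests, useful as a reference to compare a hand-written assembly attempt against (for instance by pasting it into godbolt.org). The name `my_strlen` is chosen here to avoid clashing with the libc function; it is a straightforward byte-at-a-time sketch, not a tuned implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Walk the string one byte at a time until the terminating NUL. */
size_t my_strlen(const char *s)
{
    const char *p = s;

    assert(s != NULL);
    while (*p != '\0')
        p++;
    return (size_t)(p - s);   /* bytes before the NUL */
}
```

In assembly the same shape is one pointer register, one compare against zero, and one conditional jump back to the top.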

19:00 <gog> also godbolt.org is a good resource

19:00 <gog> you can write C code and see the assembly output

19:00 <ilovethinking> i'll implement strlen yes

19:00 <ilovethinking> good idea

19:00 <ilovethinking> thanks goggy

19:00 <ilovethinking> lub u

19:00 <ilovethinking> <3

19:00 ilovethinking has quit [Quit: WeeChat 3.8]

19:01 <bnchs> can i pat gog?

19:01 <gog> yes

19:01 * bnchs patpatpat gog

19:01 * gog prrr

19:01 <bnchs> gog is smart unlike me :3

19:02 <zid> I'm surprised you still talk to him

19:02 <gog> stop it bnchs

19:02 <gog> zid: i'm too helpful for my own good

19:02 <bnchs> gog: stop what?

19:02 <zid> I'd say it's helpful for neither of you :P

19:02 <gog> bnchs: saying you're not smart

19:02 <gog> believe it or not i _have_ cut people off from programming help

19:03 <zid> I cut heat off, he just can't understand the glory of zidlists

19:03 <gog> huehue

19:04 <gog> i was feeling generous today anyway, i got a lot done at work and i feel pretty good

19:04 <bnchs> gog: i'll stop saying it, anyway how are you?

19:04 <gog> p good today

19:04 <zid> you're very mood volatile

19:04 <gog> how's bnchs

19:04 <bnchs> the usual

19:04 <zid> just feel meh every day like a nice neutral grey person like me

19:04 vdamewood has quit [Quit: Life beckons]

19:04 <gog> i physically can't, i have bipolar disorder

19:04 <zid> except monday for 20 mins when honzuki chapters release, ofc

19:05 <zid> what's the fix for that, bandsaw?

19:05 <gog> low-dose antipsychotics work for me most of the time

19:05 <gog> :P

19:05 <lav> i think you're supposed to eat batteries

19:06 <gog> lithium does _not_ work for me

19:06 <gog> it only causes me gastrointestinal distress

19:06 <lav> mhm

19:06 * gog offer patpat to lav

19:06 <zid> try nickel or cadmium

19:06 <lav> gib

19:06 * gog patpatpatpatpat lav

19:06 <lav> uwu

19:06 * lav prr

19:07 <zid> lead acid is hard on the gastro distress i feel

19:07 <gog> yes

19:07 <gog> i prefer alkaline batteries for indigestion

19:08 <lav> personally i like gallium's chewyness

19:08 <zid> I like gallium served over aluminium

19:08 <zid> good mouthfeel

19:08 * bnchs offer patpat to lav too

19:08 * lav accepts

19:08 * bnchs patpatpatpat lav

19:09 * lav purrs more

19:10 <bnchs> cute

19:10 <lav> askjhgjfgh

19:11 <Ermine> uwu

19:16 gbowne1 has joined #osdev

19:16 inegatives has quit [Quit: inegatives]

19:16 theboringkid has joined #osdev

19:19 <lav> uwu

19:27 rnicholl1 has joined #osdev

19:34 rnicholl1 has quit [Quit: My laptop has gone to sleep.]

19:42 slidercrank has joined #osdev

19:47 inegatives has joined #osdev

19:48 <geist> pata pata pata pon

19:48 <geist> damnit, you made me remember that

19:48 <geist> i've lost the game

19:49 <bnchs> hi geist

19:50 d5k has joined #osdev

19:51 <geist> hey

19:52 <gog> uwu

19:54 <Ermine> geist: pon pon pata pon

19:55 <zid> I need a new audiobook so I can play more tetris

19:55 <moon-child> wat

19:56 <zid> You heard me

20:00 <mjg> CAN'T HURT ME BY GOGGINS

20:02 <geist> Ermine: nooooo!

20:03 rnicholl1 has joined #osdev

20:03 <geist> that game is insidious

20:05 rnicholl1 has quit [Client Quit]

20:06 rnicholl1 has joined #osdev

20:06 <d5k> hi

20:09 rnicholl1 has quit [Client Quit]

20:12 <gog> hi

20:13 SGautam has quit [Quit: Connection closed for inactivity]

20:14 <Ermine> hi

20:15 * Ermine thinks about getting psp emulator...

20:16 <bnchs> hi

20:21 FreeFull has joined #osdev

20:25 <zid> I have a nice new 5800x so naturally I am playing an opengl 1.1 game designed to run on a pentium 4

20:28 inegatives has quit [Quit: Client closed]

20:30 <gog> nice

20:31 <zid> (and it barely used any cpu% on a p4)

20:32 <GeDaMo> Which game?

20:32 <zid> tgm3

20:33 <zid> https://cdn.discordapp.com/attachments/518247259838611476/1087115303995056147/image.png Need to get this from 2:32 to 2:28 or I'm not allowed to keep playing

20:33 <zid> but I was at 2:58 when I started derusting, so, almost there

20:43 d5k has quit [Quit: leaving]

21:04 theboringkid has quit [Quit: Bye]

21:04 theboringkid1 has joined #osdev

21:06 theboringkid1 is now known as theboringkid

21:10 inegatives has joined #osdev

21:19 theboringkid has quit [Quit: Bye]

21:38 xvmt has quit [Ping timeout: 246 seconds]

21:40 GeDaMo has quit [Quit: That's it, you people have stood in my way long enough! I'm going to clown college!]

21:40 xvmt has joined #osdev

21:41 inegatives64 has joined #osdev

21:41 inegatives64 is now known as innegatives

21:45 inegatives has quit [Ping timeout: 260 seconds]

21:47 slidercrank has quit [Ping timeout: 250 seconds]

21:56 heat_ has joined #osdev

21:56 heat has quit [Read error: Connection reset by peer]

21:58 innegatives has quit [Quit: Client closed]

22:09 rnicholl1 has joined #osdev

22:11 bgs has quit [Remote host closed the connection]

22:26 rnicholl1 has quit [Quit: My laptop has gone to sleep.]

22:29 rnicholl1 has joined #osdev

22:35 inegatives has joined #osdev

22:37 inegatives has quit [Client Quit]

22:43 <mjg> check out scalability on freebsd running ubuntu jammy userspace in a chroot and doing -j 104 allyesconfig kernel

22:43 <mjg> 25667.23s user 13060.02s system 2784% cpu 23:10.71 total

22:43 <mjg> that is, about 1/3rd of the total is in the kernel, out of which about 2/3rds are spinning on locks

22:44 <mjg> and the entire thing took so long total real time because said spinning is for spinlocks to queue you up for off cpu wait

22:44 [itchyjunk] has joined #osdev

22:44 <mjg> interestingly it was not anywhere near this bad on ubuntu 18, where it took about 8-ish minutes

22:44 <mrvn> 1/3rd utilization sounds rather bad

22:44 <mjg> something has changed in glibc malloc; most of the problem i see is in mmap et al

22:45 <moon-child> it sounds very bad

22:45 <mjg> it is incredibly bad

22:45 <mjg> almost all these waits are on 'page busying'

22:45 alturmann1729 has quit []

22:46 <rnicholl1> probably moving to do more page allocation and less management in userspace?

22:46 <rnicholl1> oh

22:46 <rnicholl1> maybe also the spin issue

22:46 <rnicholl1> spinlocks are REALLY bad when scaling up the number of threads

22:47 <rnicholl1> like they scale O(T^2) in overhead, T= threads

22:48 <rnicholl1> they made more sense when you had like 1 or 2 or 4 processor cores

22:48 <rnicholl1> anything more than that though.. just sleep immediately

22:49 <mjg> you missed the part where going off cpu is so frequent that all the spinning is on the locks needed to go off cpu

22:49 <mrvn> You want a per-core malloc that pre-allocates dynamically

22:49 <rnicholl1> less an issue with the kernel, where you can guarantee the other thread is running

22:49 <mrvn> mjg: don't run significantly more threads than cores

22:49 <rnicholl1> per core? how would that work

22:50 <rnicholl1> os can reschedule threads to different core at any time

22:50 <rnicholl1> per thread is possible

22:50 <rnicholl1> but ...dumb

22:51 <mrvn> rnicholl1: you can check the core-id when entering malloc and then use that core's pool.

22:51 <rnicholl1> Hum, you still need a lock though

22:52 <rnicholl1> Any other thread could get scheduled onto the same core

22:52 <mrvn> that's less of a problem since that rarely happens. But you can have multiple threads on the same core.

22:53 <mrvn> But if the malloc lock is held then you pretty much know another thread on the core is holding it so you can just go to sleep.

22:53 <mrvn> Which then allows the owning thread to wake up and finish. And you don't have contention between cores.
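
A toy C sketch of the per-core pool idea mrvn outlines, under several assumptions: `NCORES`, `POOL_BYTES`, `struct core_pool` and `pool_alloc` are hypothetical names, the pools are simple bump arenas with no free operation, and the core id is passed in by the caller (on Linux it could come from something like `sched_getcpu()`). Where real code would put the thread to sleep on a held lock, this sketch just returns `NULL`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NCORES     4      /* hypothetical core count */
#define POOL_BYTES 4096   /* hypothetical per-core arena size */

struct core_pool {
    atomic_bool busy;     /* held only for the duration of one allocation */
    size_t used;          /* bytes handed out so far */
    _Alignas(16) unsigned char mem[POOL_BYTES];
};

/* Static storage is zero-initialized: every pool starts unlocked and empty. */
static struct core_pool pools[NCORES];

/* Allocate n bytes from the pool that belongs to `cpu`. */
void *pool_alloc(unsigned cpu, size_t n)
{
    struct core_pool *p;
    void *out = NULL;

    assert(cpu < NCORES);
    if (n == 0 || n > POOL_BYTES)
        return NULL;
    n = (n + 15u) & ~(size_t)15u;   /* keep every block 16-byte aligned */
    p = &pools[cpu];

    /* If the lock is held, the holder is most likely another thread on this
     * same core, so a real allocator would sleep here instead of spinning. */
    if (atomic_exchange_explicit(&p->busy, true, memory_order_acquire))
        return NULL;

    if (n <= POOL_BYTES - p->used) {
        out = p->mem + p->used;
        p->used += n;
    }
    atomic_store_explicit(&p->busy, false, memory_order_release);
    return out;
}
```

The lock is still needed, as rnicholl1 points out, because a thread can be preempted or migrated between reading the core id and touching the pool; the point of the design is only that threads on different cores stop contending with each other.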

22:54 <mrvn> But really, why are you using more threads than cores? That's usually stupid or a sign of a bad language that uses threads for IO.

22:58 <mrvn> Threads should have a dont-preempt flag that's writable from userspace that makes the kernel not migrate the thread and not switch to another thread from the same process on that core.

23:13 <rnicholl1> yeah that would be pretty neat

23:14 <rnicholl1> I was actually thinking of doing that if I ever get my code working

23:14 <rnicholl1> but I am a filthy casual when it comes to kernel programming

23:14 <rnicholl1> and might never build a working OS

23:15 <rnicholl1> I am starting to get the feeling that Linux is bloated though

23:15 <rnicholl1> somehow, windows is more efficient at scheduling now than linux

23:15 <kazinsal> one of these days I'll finally stop fiddling around with the half dozen dumb side projects and actually go back to working on my "main" osdev project

23:15 <rnicholl1> 10 years ago I would have laughed if someone said windows had a more efficient scheduler than linux

23:15 <rnicholl1> But the benchmarks don't lie

23:16 <rnicholl1> Microsoft has been improving windows and Windows 11 is actually decent

23:16 <rnicholl1> Meanwhile, Linux is resting on laurels, so to speak

23:17 <rnicholl1> In some ways, I think windows NT was way ahead of its time

23:18 <rnicholl1> A lot of the async stuff that windows NT had that seems silly and wasteful on a 1/2 core system is paying dividends on modern 16-core cpus

23:18 <rnicholl1> Linux approach was designed for single thread with async tacked on as an afterthought

23:19 <rnicholl1> io-uring might help a little but still has the same fundamental issue, it's a band-aid for a flawed interface

23:19 <rnicholl1> basically all linux system calls are designed for synchronous operation

23:20 <rnicholl1> There are huge gaping holes too

23:20 <rnicholl1> like no way to async-write usb devices

23:20 <rnicholl1> cant use epoll for that

23:20 <rnicholl1> blocking or go home

23:20 <immibis> you could also see windows's security model

23:20 <immibis> vs linux's

23:20 <immibis> windows lets you do what you want, and linux doesn't

23:20 <kazinsal> I think there's been changes in the ideology of release types and how much you're allowed to "go fast and break things" on both sides of the fence there

23:21 <rnicholl1> yeah, linux is stuck in the 90s, in more ways than 1

23:21 <kazinsal> since Microsoft maintains multiple different branches of the OS they have more freedom to just try weird things out on the faster release branches

23:21 <immibis> when you get into things like SACLs it becomes quite clear that microsoft had security experts to advise their security design

23:21 <kazinsal> whereas you know that the LTSB will always be a functional snapshot of a specific point in the release history but with security updates for several years

23:22 <kazinsal> for a while I was running Insider Preview builds of Windows 10 and it was interesting to see what they would just throw at the wall to see what would stick

23:22 <rnicholl1> Honestly, the only good point about linux these days is that it's open source

23:22 <immibis> linux's advantage is customizability. Imagine just writing a new scheduler for windows. You can't. Only microsoft can do that.

23:22 <rnicholl1> I think in most other ways, aside from open source/customizability, linux is worse. well maybe stability too.

23:23 <immibis> mrvn: locks and threads might be entirely the wrong paradigm. Consider message passing instead

23:23 <rnicholl1> Message passing is slower than locks for sure

23:23 <kazinsal> the fact so many appliances etc. rely on the linux kernel to be stable and provide a specific set of features with specific performance requirements without having constant regressions between releases is kind of a downer

23:23 <rnicholl1> how do you implement message passing

23:24 <rnicholl1> answer: with locks

23:24 <kazinsal> but it's a necessity

23:24 <rnicholl1> Linux is great for embedded applications

23:25 <immibis> do you necessarily? you implement it that way if you're on a system with threads and locks

23:25 <kazinsal> if 5.15.103 works fine but 5.15.104 suddenly loses 2% performance on average in kvm guests and that was just considered an acceptable risk you'd have vendors screaming from the rooftops

23:25 <immibis> why not implement message passing with a lockless ring buffer (forcing synchronization when it appears full or empty)?

23:25 <immibis> (i'm sure there's a good idea why not)

23:26 <rnicholl1> because... how do you implement a lockless ring buffer?

23:26 <rnicholl1> CAS?

23:26 <rnicholl1> that's the same as a mutex lock

23:26 <immibis> you just write the data to memory in a ring, you have a head pointer and a tail pointer
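
One possible shape for the ring immibis describes, as a C11 sketch that assumes exactly one producer thread and one consumer thread. The names `struct ring`, `ring_push`, `ring_pop` and `RING_SLOTS` are illustrative. No CAS is involved: each index is written by only one side, and acquire/release ordering publishes the slot contents.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SLOTS 8   /* power of two, so index wraparound stays consistent */

struct ring {
    _Atomic size_t head;   /* next slot to read; written only by the consumer */
    _Atomic size_t tail;   /* next slot to write; written only by the producer */
    int slots[RING_SLOTS];
};

/* Producer side. Returns false when the ring is full. */
bool ring_push(struct ring *r, int v)
{
    size_t tail, head;

    assert(r != NULL);
    tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_SLOTS)
        return false;                       /* full: caller must wait or drop */
    r->slots[tail % RING_SLOTS] = v;
    /* Release: the slot write becomes visible before the new tail does. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false when the ring is empty. */
bool ring_pop(struct ring *r, int *out)
{
    size_t head, tail;

    assert(r != NULL && out != NULL);
    head = atomic_load_explicit(&r->head, memory_order_relaxed);
    tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return false;                       /* empty: caller must wait or poll */
    *out = r->slots[head % RING_SLOTS];
    /* Release: the slot read completes before the slot is handed back. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

The two `false` returns are the "appears full or empty" cases where some form of waiting is still needed; the sketch leaves that policy to the caller.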

23:26 <rnicholl1> assuming happy path no contention

23:27 <rnicholl1> What do you think lock does?

23:27 <immibis> wait for something

23:27 <rnicholl1> only if it's already locked

23:27 <rnicholl1> GNU's mutex implementation is kinda horrible though

23:27 <rnicholl1> If you DIY it you can make much better ones

23:28 <rnicholl1> GNU literally benchmarks the mutex in a loop with multiple threads going after the same mutex

23:28 <rnicholl1> like how realistic is that

23:28 <rnicholl1> answer: not very

23:28 <immibis> so you're complaining they benchmark the contended case?

23:29 <rnicholl1> They optimized for an unrealistic scenario

23:30 <immibis> seems like one of many realistic scenarios to me

23:30 <immibis> how do you think mutices are realistically used?

23:30 <rnicholl1> The right way to make mutexes, imo, is to make them as small as possible

23:30 <rnicholl1> GNU optimized for stuff like "fairness"

23:31 <rnicholl1> Stuff that should be implemented as additional layers on top of mutexes if you need it

23:31 <rnicholl1> like userland task scheduling

23:31 <immibis> unfair mutices can create counter-intuitive problems. Fair ones "just work" more often, despite being heavier.

23:32 <rnicholl1> true but it's not worth optimizing for bad code when it makes good code slower

23:32 <rnicholl1> imo anyway

23:33 <immibis> if you can deal with a less general faster data structure you should perhaps have to request it

23:33 <rnicholl1> If you write code that spins instead of using condition variables, that's your fault. The standard library shouldn't "fix" your errors.

23:34 <rnicholl1> Or maybe they should implement things with the minimum generality specified by C and C++ standards

23:34 <rnicholl1> and if you want a more general mutex than the one specified by C / C++, you ask for it

23:34 <rnicholl1> instead, std::mutex and mtx_t are just wrappers around pthread_mutexes

23:35 <rnicholl1> despite pthread_mutex_t being a much more heavyweight mutex than what std::mutex/mtx_t is required to do

23:35 <rnicholl1> GNU got lazy

23:35 <rnicholl1> It could have been way more efficient

23:35 <immibis> write your own

23:37 <immibis> isn't that the beauty of open-source?

23:37 <rnicholl1> yeah sure, I've written mutexes before

23:37 <rnicholl1> 4-byte mutex is easy

23:37 <immibis> imagine if people were still calling non-standard OS functions like bzero

23:38 <rnicholl1> would rather std::mutex be more efficient though

23:38 <immibis> did you know what happened when memcpy worked backwards sometimes?

23:40 rnicholl1 has quit [Quit: My laptop has gone to sleep.]

23:46 rnicholl1 has joined #osdev

23:46 <rnicholl1> I wish my laptop would stay online when I closed the lid for 2 minutes heh

23:52 <moon-child> rnicholl1: there are waitfree queues

23:53 <rnicholl1> you still need to wait when there's no message to act on

23:53 <rnicholl1> unless.. god forbid

23:53 <rnicholl1> you spin when there's no messages

23:54 <rnicholl1> Besides, wait-free is just a pipe dream

23:54 <rnicholl1> it's kinda a dumb concept, imo

23:54 <rnicholl1> Based on some aversion to locks

23:54 <rnicholl1> on a theory that the code would be faster if there were no locks

23:54 <rnicholl1> This isn't actually true

23:55 <moon-child> waitfree is a model. All models are wrong, but some are useful

23:55 <rnicholl1> On a fundamental level, the hardware is using locks to implement wait-free algorithms

23:55 <moon-child> no one claims that waitfree code is universally faster than non-wait-free code because of the lack of locks

23:56 <moon-child> wait freedom means that a process is guaranteed to make progress in a bounded number of time steps. Most computers contain a constant number of processors. Therefore, they can in fact implement wait free algorithms
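
A small C11 sketch of the distinction being argued here, using a shared counter. Both function names are made up. The first version is a single atomic read-modify-write, which completes in a bounded number of steps on hardware with a native fetch-and-add (an assumption about the target, e.g. x86). The second is a CAS retry loop: some thread always succeeds, but any one caller may have to retry an unbounded number of times under contention.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* One atomic read-modify-write per call; no retry loop in the source. */
unsigned counter_add_fetch_add(atomic_uint *c)
{
    assert(c != NULL);
    return atomic_fetch_add_explicit(c, 1u, memory_order_relaxed) + 1u;
}

/* CAS retry loop: lock-free as a whole, but this caller can lose the race
 * repeatedly, so its own step count is not bounded. */
unsigned counter_add_cas(atomic_uint *c)
{
    unsigned old;

    assert(c != NULL);
    old = atomic_load_explicit(c, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(c, &old, old + 1u,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
        /* A failed CAS stores the current value into `old`; just retry. */
    }
    return old + 1u;
}
```

Neither version blocks, and neither says anything about whether the OS scheduler actually runs the thread, which is the separate objection raised next.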

23:56 <rnicholl1> Except that it isn't guaranteed to make progress because the OS scheduler might not schedule them to run

23:57 <zid> Depends on what the sched guarantees

23:57 <moon-child> you said 'CAS? that's the same as a mutex lock'. It's not. It is different and has different performance properties

23:57 <zid> paring weird locks and a weird sched is just a weird os

23:57 <zid> pairing*

23:57 <moon-child> rnicholl1: I was imagining that we were below the level of scheduling user-level processes ('infinite cpu time' abstraction)

23:57 <zid> hopefully you're not on such a system

23:58 <rnicholl1> CAS is how I implement mutex locks

23:58 <moon-child> FUN FACT

23:58 <moon-child> cas can be used to implement all sorts of things

23:59 <rnicholl1> It's the most performant way if you don't try to implement fairness in userspace

23:59 <rnicholl1> try once, call kernel on fail

23:59 <rnicholl1> that's how I implement locking
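
A minimal C11 sketch of the "try once, call kernel on fail" pattern, not rnicholl1's actual code. `struct small_mutex` and the `small_*` functions are hypothetical names. The slow path uses POSIX `sched_yield()` purely as a portable stand-in: a real implementation would block in the kernel (for example with a futex wait on Linux) and would need a third state so that unlock knows when to wake a waiter.

```c
#include <assert.h>
#include <sched.h>        /* sched_yield(), POSIX */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* 0 = unlocked, 1 = locked. A single int, so 4 bytes on common ABIs. */
struct small_mutex {
    atomic_int state;
};

/* Fast path: one CAS attempt, no loop. */
bool small_trylock(struct small_mutex *m)
{
    int expected = 0;

    assert(m != NULL);
    return atomic_compare_exchange_strong_explicit(&m->state, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

void small_lock(struct small_mutex *m)
{
    /* Uncontended: the first CAS succeeds and no system call is made. */
    while (!small_trylock(m))
        sched_yield();    /* stand-in only; real code would sleep in the kernel */
}

void small_unlock(struct small_mutex *m)
{
    assert(m != NULL);
    atomic_store_explicit(&m->state, 0, memory_order_release);
}
```

This makes no attempt at fairness: whichever thread wins the CAS after an unlock gets the lock, regardless of how long others have been waiting.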

23:59 <rnicholl1> spin locks scale reaaaaallly badly on many-core systems