klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
gmodena has quit [Quit: ZNC - http://znc.in]
rnicholl1 has quit [Quit: My laptop has gone to sleep.]
<mrvn> The *heat* death of the universe is a few billion years in the future. Someone will survive us all.
brunothedev has joined #osdev
<AmyMalik> bingo.
<AmyMalik> I reproduced the power virus on linux.
<AmyMalik> This still isn't the hot gun that I need, but it will help me reproduce it
<brunothedev> ?
<brunothedev> i remember when i built a script that read /dev/urandom into a variable in python, i felt so smart looking at the ram usage going up and up
<brunothedev> oh also: "sudo dd if=/dev/urandom of=/dev/sda"
brunothedev has quit [Quit: WeeChat 3.8]
<geist> i wonder if they just typed that..
gdd has quit [Ping timeout: 265 seconds]
gdd has joined #osdev
dutch has quit [Quit: WeeChat 3.8]
<zid> we can only hope
dutch has joined #osdev
<AmyMalik> VoidLinux exhibits the power virus until it runs /etc/runit/1. Now to single step that init script.
<sakasama> mrvn: By then the only remaining lifeform will resemble the giant spaghetti monster.
rnicholl1 has joined #osdev
[itchyjunk] has quit [Ping timeout: 255 seconds]
Left_Turn has quit [Read error: Connection reset by peer]
[itchyjunk] has joined #osdev
danilogondolfo has quit [Remote host closed the connection]
heat_ has joined #osdev
heat has quit [Ping timeout: 246 seconds]
rnicholl1 has quit [Quit: My laptop has gone to sleep.]
rnicholl1 has joined #osdev
Matt|home has joined #osdev
rnicholl1 has quit [Quit: My laptop has gone to sleep.]
heat_ is now known as heat
rnicholl1 has joined #osdev
nyah has quit [Quit: leaving]
<rnicholl1> Hi, I am confused about how to use the WFE instruction
<rnicholl1> my understanding is that this puts the CPU into a low power sleep state until woken up by SEV
<rnicholl1> The examples show ldar followed by comparison and loop
<rnicholl1> the way to wake up the CPU is SEV
<rnicholl1> but my thought is, what happens if the stlr + sev happens between the comparison and WFE?
<Mutabah> The idea of WFE is it is used when you don't have anything else to do in that thread
<Mutabah> so timing doesn't matter, you're usually doing an infinite loop with it
<rnicholl1> Right, the documentation suggests it can be used for a spinlock
<rnicholl1> "you can insert the WFE instruction into the loop so the CPUs instead of looping continuously will enter STANDBYWFE. Then the CPU who has been using the resource should execute SEV instruction after it has finished using the resource. This will wake up all other CPUs from STANDBYWFE and another CPU can then access the shared resource."
<rnicholl1> But my question is
<Mutabah> It might have a maximum wait time? or be woken by any other event (e.g. a timer interrupt)
<rnicholl1> yes, I would hope so, but I can't find this fact documented
[itchyjunk] has quit [Remote host closed the connection]
<rnicholl1> Do I need to implement reference counting on the wait flags?
<geist> yah, WFE is not like WFI
<geist> I assume you're talking about the arm64 version of WFE?
<geist> basically WFE waits for an 'event'. there are then a series of things that generate events
<Mutabah> Ah, my bad then
<geist> so SEV is the obvious thing that generates an event, but there are other cases
<Mutabah> Guess - events are enqueued until WFE clears them?
<Mutabah> so an event raised just before a WFE triggers a spurious wakeup?
<geist> correct. sev;wfe basically falls through. actually that sequence is the canonical way to clear the event bit: set it (in case it wasn't already) then wfe to consume it
<geist> so in the spinlock case, the *other* case that sets an event is when you do a load exclusive on an address, and then don't release the exclusive line
<geist> then any other instruction *on another cpu* that writes to the same cache line 'breaks your exclusive lock' on it, and that generates a local wait event
<geist> thus releasing your cpu WFE
<geist> so a general solution is a spinlock: you try to acquire it on cpu A (ie, grab an exclusive load on it), then discover it's already held, then you WFE and spin around and try again
<geist> so it's a general spinlock, except the cpu doesnt continually spin: it waits until the core that owns it writes a zero to it
<bslsk05> ​github.com: lk/spinlock.S at master · littlekernel/lk · GitHub
<geist> it's subtle, but line 27 is the key, it did an exclusive load in line 26 (ldaxr), then 27 decides it's non zero, so loop back to the WFE
<geist> line 33 the owning cpu does a 'stlr' which breaks the exclusive lock on the first cpu, which falls out of WFE
<geist> and then tries again
<geist> (note line 23 does the sev/wfe sequence when first entering the spinlock code, so it clears any event state that may be set before)
<geist> so the key is the true power of it comes from multiple cpus. a cpu by itself the sev/wfe thing isn't really that useful
<rnicholl1> uh, my understanding was that sev is used to wake up other cores... why in this case is sev used by the code doing the unlock?
<rnicholl1> er, not used
<rnicholl1> So, supposing I do a load
<geist> again are you thinking about arm64 or arm32? it changed a little bit between armv7 and armv8
<rnicholl1> arm64
<geist> armv8 a store to the same address also sets an event
<geist> but tighter scoped: it only triggers the event on other cores that happen to have an exclusive monitor on that address
<geist> ie, ldaxr previously
<geist> the older armv7 version of it yes you would store a zero to the lock, and then SEV after it
<geist> but then it was global, so was less efficient than the armv8 one
<rnicholl1> supposing I do an acquire-load, the WFE instruction would be woken up if a recently executed acquire load lock is lost?
<geist> yes
<geist> monitor is the better word
<geist> basically by doing the acquire load and then storing to it the local cpu now has a monitor for that address
<geist> and then any other cpu breaking the monitor locally sets the event bit
<rnicholl1> is there a read-only version of this?
<geist> s/and then storing to it/and then *not* storing to it/
<geist> what do you mean?
<rnicholl1> For example, I want to loop on a flag that tells non-0 cores when they can start executing
<geist> sure
<rnicholl1> The flag is set by core 0, then core 0 executes sev?
<geist> just do half the spinlock: ldaxr on the address, then go back to the wfe. then on the core releasing it just do a stlr
<geist> core 0 doesn't have to execute sev if the other cores grab an exclusive monitor
<geist> but if you want to do it more cheezily without load/acquire, sure, sev works too
<rnicholl1> Is exclusive monitor different from acquire/load semantics?
<rnicholl1> I had assumed it was like seq_cst
<geist> yes those are completely different concepts
<geist> the former is the monitor stuff that you use to build atomics, the latter is the implied barriers that you can add to the atomic ops
<geist> anyway the ARMv8 ARM does mention this event stuff. look up something like implicit event generation
<geist> but SEV is the obvious event generator
<geist> note SEVL is a local event generator, only for the current cpu. doesn't broadcast it to the others
<rnicholl1> hum, I am confused how it avoids the race
<geist> in what case?
<rnicholl1> Supposing you used ldaxr, I can't imagine it sets up a permanent monitor
<geist> it does
<geist> *until* you either clrex, or something breaks it
<geist> (or a bunch of other things that can break it, like an interrupt firing)
<geist> think of it logically as the local cpu is snooping the bus and seeing what other cpus are doing
<geist> it's really more complicated than that, it's using the L1 cache coherency mechanism to track whether or not another core modifies the same cache line
<geist> but since the L1 cache stuff already works, it actually just piggybacks on top of the same hardware
<geist> but also remember this loop/wfe stuff is sloppy. it may wake more times than it needs to, you always have to double check the condition and try again if it's not met
<zid> select(ALL_EVENTS_FD_SET);
<rnicholl1> Ok, so doing ldaxr gets an exclusive state, wfe will wake when another core writes to the same exclusive state, what is the purpose of sevl in this case and how does it differ from sev?
<geist> in this loop it was because sevl;wfe (in the first iteration of the loop) clears any pending events that may have been set
<geist> it's just the canonical way to clear an event safely
<geist> since you can't read the state of the event bit, think of it as a hidden bit in the cpu. you can only directly clear it via wfe, and you can only directly set it via sevl
<geist> it 'latches'
<rnicholl1> oh, so if I set "sev", and no other core is waiting for an event, e.g. they just did a load but haven't yet done wfe
<rnicholl1> the WFE will still wake from the previous sev?
<geist> yah the event latches forever until something clears it
<rnicholl1> like, core 0 loads, finds 0, (sleep value), core 1 does SEV, core 0 then does WFE, but is woken by the previous SEV by core 1
<geist> yes
<rnicholl1> alright, that clears up the confusion a bit
<geist> so it always errs on the side of an extraneous wakeup. so if you wanted to build this 'wait for a signal from another core' flag, then it's really not much different from a regular loop on a variable, you just stick in a WFE there in the loop
<rnicholl1> so actually, there is not a race in the case of (load, wfe) + (store, sev) because even if the sequence is (load, store, sev, wfe) it will still wake up
<geist> and you can either use a ldaxr and have the signaler write to it, or just loop with a wfe + sev
<geist> riiiight. yeah
<rnicholl1> ok, I was thinking of the classic race condition variables have without using a mutex (i.e. notify/store race)
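The latching argument above (load, store, sev, wfe still wakes because the event stays pending until a WFE consumes it) can be sketched with a toy model. The `Core` class and its method names are hypothetical, and the model covers only the one-bit latch described in the conversation, not real hardware or the other event sources.

```python
class Core:
    """Toy model of one CPU's latched event bit (hypothetical, not real hardware)."""

    def __init__(self):
        self.event = False  # hidden one-bit latch; software cannot read it directly

    def set_event(self):
        # What SEV from any core, or SEVL locally, does to this core in the model.
        self.event = True

    def wfe(self):
        """Return True if WFE falls straight through, False if the core would sleep."""
        if self.event:
            self.event = False  # WFE consumes the latched event
            return True
        return False


flag = 0
core0 = Core()

# Canonical clear on entry: sevl; wfe leaves the latch empty either way.
core0.set_event()
core0.wfe()
cleared = core0.event

# The feared ordering: load, store, sev, wfe.
seen = flag                  # core 0 loads the flag and sees 0
flag = 1                     # core 1 stores the flag ...
core0.set_event()            # ... and executes SEV before core 0 reaches WFE
fell_through = core0.wfe()   # the latched event makes WFE return immediately
seen_after = flag            # core 0 re-checks the condition and now sees 1
would_sleep = core0.wfe()    # nothing pending any more: this WFE would sleep
```

The model errs the same way the hardware is described to: an extra wakeup is possible, a missed one is not, as long as the store happens before the event is set and the waiter re-checks the condition after every wakeup.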
<geist> note the ARMv8 manual does discuss this in some detail, but as is always the case with ARMv8 it talks about all the minutiae
<rnicholl1> but seems there is a core-wide flag that acts a bit like a mutex
<geist> right, but as i was saying it's actually not that common to use SEV anymore. using the SEV instruction is basically a global slam of all the cpus
<geist> there's even a bunch of flags when using virtualization that limit the scope to just one virtual machine, etc etc
<geist> so that some thread somewhere that just does SEV in a loop doesn't waste the whole system
<geist> (actually i forget, SEV may not be user accessible. SEVL probably is though, since it only is for the local core)
<geist> SEV was a v7 thing, v8 makes the scope tighter by having the implicit SEV with exclusive monitors, etc. since that's mostly where it's used
<geist> but of course for early bringup or specialized kernel cases SEV is perfectly fine
Dyskos has quit [Ping timeout: 250 seconds]
<heat> geist, is all of that bs what makes arm64 cmpxchg weak?
<geist> dunno what you exactly mean, but if you're referring to up until 8.1 you needed a multi instruction sequence to do it sure
<geist> 8.1 adds a CAS instruction to do it directly
<heat> i mean the ldaxr, etc stuff
<heat> yeah HLE right? or some similarly named extension
<geist> LSE (large system extension i think)
<heat> ah yes
<rnicholl1> I think it would spuriously fail if another core got an exclusive lock on the same cache line right?
<moon-child> yes
<geist> right, but if it did then the cache coherence system has failed, and thats the least of your worries
<moon-child> ll/sc has scalability problems
<moon-child> iirc riscv guarantees that some restricted subset of ll/sc will eventually make progress
<moon-child> idk if arm does the same
<rnicholl1> I mean, I assume ldxr can be broken by other cores right?
<heat> how does any of this work with interrupts?
<heat> is there some special state you save and restore?
<geist> for a ll/sc pair (to use that nomenclature) the whole system relies on the ability for a local cpu to get an exclusive copy of the cache line and then it only does the store part *if* it hasn't lost the access in the interim
<geist> and that is just the normal L1 cache coherency protocol that already exists
<geist> which had better darn well work
<rnicholl1> but another core could get an exclusive lock, it would just cause the next store to fail
<geist> correct
<rnicholl1> so that would be why the exchange could "weakly fail"
<geist> because the store only works if the monitor on the local core is still valid. and the monitor is broken if another core takes away the cache line
<rnicholl1> or spuriously fail
<geist> right. and to answer heat: an interrupt and definitely a context switch would break the monitor
<geist> at the minimum in a context switch the OS is required to issue a `clrex` instruction, which manually breaks any monitors
<geist> but there are also some other things that break monitors. iirc an eret does too, implicitly
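Why a compare-exchange built from a load-exclusive/store-exclusive pair can fail spuriously can be sketched with a small model. Everything here is an assumption made for illustration: the `Memory` class, one monitor per core, and monitors tracked per address rather than per cache line (or whatever granule the hardware uses).

```python
class Memory:
    """Toy model of exclusive monitors (hypothetical; one monitor per core)."""

    def __init__(self):
        self.cells = {}     # address -> value
        self.monitors = {}  # core id -> address that core is monitoring

    def load_exclusive(self, core, addr):
        self.monitors[core] = addr          # ldxr/ldaxr arms the monitor
        return self.cells.get(addr, 0)

    def store(self, core, addr, value):
        self.cells[addr] = value
        # A write by one core breaks every other core's monitor on that address.
        for other, monitored in list(self.monitors.items()):
            if other != core and monitored == addr:
                del self.monitors[other]

    def store_exclusive(self, core, addr, value):
        if self.monitors.get(core) != addr:
            return False                    # monitor lost: the store does not happen
        del self.monitors[core]
        self.store(core, addr, value)
        return True

    def clrex(self, core):
        self.monitors.pop(core, None)       # e.g. done by the OS on a context switch


def cmpxchg_weak(mem, core, addr, expected, desired):
    """Return (success, old_value); may fail even when old_value == expected."""
    old = mem.load_exclusive(core, addr)
    if old != expected:
        mem.clrex(core)
        return False, old
    return mem.store_exclusive(core, addr, desired), old


mem = Memory()
LOCK = 0x1000  # hypothetical address
ok_uncontended, _ = cmpxchg_weak(mem, 0, LOCK, 0, 1)

old = mem.load_exclusive(0, LOCK)           # core 0 arms its monitor
mem.store(1, LOCK, old)                     # core 1 rewrites the same value
spurious = mem.store_exclusive(0, LOCK, 2)  # fails although the value never changed
value_after = mem.cells[LOCK]

mem.load_exclusive(0, LOCK)
mem.clrex(0)                                # context switch in between
after_clrex = mem.store_exclusive(0, LOCK, 3)
```

A strong compare-exchange is then just a loop that retries `cmpxchg_weak` while it fails with the expected value still in place.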
<rnicholl1> should I do clrex after wfe?
<geist> no. you should not
<rnicholl1> is there a limit to the number of monitors?
<geist> the only place to ever put it is inside a context switch, really
<moon-child> it's 1, isn't it?
<geist> i think it's 1
<moon-child> more than 1 is general transactions
<geist> but i'm not sure hardware mandates that
<moon-child> which only intel has semi-mainstream implementations of
<rnicholl1> sounds like we could get spurious wakeups if we don't clear it
<moon-child> arm has specced transactions, though
<rnicholl1> though that could happen anyway
<geist> also note hardware doesn't specify the size of an exclusive lock. it *could* be all of ram. but in practice its a cache line
<geist> rnicholl1: yah you just dont worry about that
<moon-child> I saw some stuff in the arm manual that was _basically_ speccing in cache lines
<geist> spurious wakeups are simply part of the game. all told it's still better than not using wfe at all, so it's all an optimization
<moon-child> I don't remember exactly what it said. Nor if its 'cache lines' were 64 or 128 bits. But it was definitely there
<geist> yah there's a way to read the exclusive monitor size i think
<moon-child> *bytes
<bslsk05> ​pastebin.com: // The function waits until core_start_ptr is non-null and then jumps to it.// - Pastebin.com
<rnicholl1> So, this should be correct?
<geist> you dont need the clrex
<geist> also how is it cleared?
<geist> also the ldar i dont think does the exclusive lock. you probably want a ldaxr (the 'x' is the exclusive monitor)
<geist> or just use a sev where you write to it
<bslsk05> ​pastebin.com: // Starts the auxiliary core with the given entry point.// void aux_core_start - Pastebin.com
<geist> yah that'll work
<geist> then you dont need the x stuff
<geist> note it should also work without the wfe. it's just an optimization to park them and not use cpu
<heat> pauseeeeeee
<rnicholl1> wait actually I do need the x
<geist> (as a tiny optimization, you dont need the 'add' instructions there, because you can just put the :lo12:... inside the [] part)
<rnicholl1> thought I already fixed it, one moment
<heat> this is where I ask why you're doing a busy loop for $stuff, for what it seems to be secondary cores
<geist> basically either you do the 'x' stuff and avoid the sev, or you dont use the 'x' stuff and then use sev. the latter is probably easier to grok
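The second option (plain loads in the waiters, store plus sev in the starter) has a rough analogy in Python, where `threading.Event` also latches: a `set()` that arrives before `wait()` is not lost. This is only an analogy and not the pastebin code; `core_start_ptr` is the name from the paste title, while `aux_core`, `start_aux_cores` and the per-core event list are hypothetical.

```python
import threading

NUM_AUX = 3
entry = {"ptr": None}  # stands in for core_start_ptr
events = [threading.Event() for _ in range(NUM_AUX)]  # one latch per "core"
started = []
started_lock = threading.Lock()


def aux_core(idx):
    # Park until the start pointer is published; always re-check after waking.
    while entry["ptr"] is None:
        events[idx].wait()   # like wfe: returns at once if the event is already set
        events[idx].clear()  # consuming the event
    with started_lock:
        started.append((idx, entry["ptr"]))


def start_aux_cores(ptr):
    entry["ptr"] = ptr       # store first ...
    for ev in events:        # ... then signal every core, like a broadcast sev
        ev.set()


threads = [threading.Thread(target=aux_core, args=(i,)) for i in range(NUM_AUX)]
for t in threads:
    t.start()
start_aux_cores("kernel_main")
for t in threads:
    t.join(timeout=5)
```

Dropping the `wait()`/`clear()` pair would still be correct, only busier, which matches the remark that the wfe is purely an optimization.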
<rnicholl1> heat: I haven't set up the stack for the other cores yet or the bss
<heat> geist, is any of the break-before-make stuff avoidable when breaking up the kernel's direct map?
<rnicholl1> oh, right, ok, I was a bit confused for a moment, so the x version sets a monitor, the non-x does not, other than that both are acquire loads?
<geist> right, like i said the monitor stuff and acquire/release are different things
<rnicholl1> and wow, I just read about the latest amd chip... over 1GiB of... L3 cache. wow
<geist> oh one of the uber server chips?
<zid> epyc
<zid> They had a 768MB they were demoing for fluid sims last year
<rnicholl1> yeah
<zid> Turns out power densities are so high now the heat doesn't make it to the IHS properly anyway, so you can just throw cache on top of the alus
bradd has joined #osdev
<rnicholl1> Which bootloaders should I implement support for? I understand the raspberry pi 3/4 just load kernel8.img, but there are some other bootloaders around?
<rnicholl1> like open source bootloaders
<geist> uboot is the general solution for ARM machines
<geist> i'd say unless it's some specific loader for a particular product line (ie, iboot for apple devices) or UEFI for servers, then it's probably either uboot or some sort of bespoke thing
<rnicholl1> Is GRUB not often used on ARM?
smeso has quit [Quit: smeso]
<rnicholl1> And multiboot?
<heat> no
<heat> and no
<heat> usually you either pretend to be linux (linux arm64 boot protocol, that uboot, etc load) or do EFI (eww)
<rnicholl1> What boot methods does win11 arm use?
<heat> efi
<rnicholl1> So, would uboot boot itself using EFI as well?
<heat> no
<rnicholl1> Like, EFI->uboot->linux
<rnicholl1> hum
<heat> that would make little sense
<zid> arm doesn't have efi-grub? :(
<zid> poor arm
<geist> what uboot can do is also pretend to be UEFI
<geist> or at least enough of a subset that linux can boot with it. i've seen that fairly commonly, also on riscv
<geist> i haven't looked at precisely what hooks it implements
<geist> so in that case it's uboot -> EFI -> linux
<geist> or uboot-as-efi -> linux
smeso has joined #osdev
<heat> yeah 2nd is more precise
<heat> it's still the same payload
<heat> zid, i believe there *is* support for it, for both multiboot and grub, but WHYYYYYYYYYYYYYYYYYYYYYYy
<heat> and fwiw this uboot EFI ""support"" is a fairly recent add by Google
<heat> 2020 iirc, for Reasons(tm)
<rnicholl1> oh interesting
<geist> huh yeah, didnt know that. wonder what group did it
<rnicholl1> probably zircon
<geist> note that on ARM and riscv now it's fairly standard to use device tree (or ACPI) as the descriptor for where things are, etc.
<geist> so multiboot doesn't really provide anything in that case. its too basic
<heat> yeah, it'd be great if someone from zircon were here
<heat> alas
<geist> and it's highly x86 centric anyway
<heat> I suspect that uboot efi support was added because they want to remove tianocore on android
<geist> yah, something like that. but then lots of android doesn't boot via grub. but then lots of it does, so probably either way
<heat> although I would assume that we're still far from that, since AFAIK vendors now supply the platform init code in EFI binary modules, and probably EFI PI modules (and not EFI itself)
<heat> geist: s/grub/efi/ ?
<geist> either
<geist> or...
<geist> was thinking maybe it's because of the arm gce stuff
<rnicholl1> At some point I would like to work in zircon, it seems like it would be fun to write kernel code.
<rnicholl1> python is a little bit... less interesting
<rnicholl1> it's not bad though
<geist> python is definitely less interesting
<rnicholl1> I hate that it has weak typing
<rnicholl1> I may have caused an outage by passing [args...] instead of args to a random execution function
<rnicholl1> which of course didn't fail the pytype check since
<rnicholl1> [args...] was an array of objects... which is a valid argument
<rnicholl1> yay...
<geist> yah i've definitely been hit by duck typing. i remember years ago i was using some library that parses a bunch of html and returns a string
<geist> then you pass this string around in python and use it. and then it turns out after the fact that it's really passing you this functor thing that returns the string every time you query it
<geist> but otherwise it was hauling around 50MB of state
<rnicholl1> fortunately there was another outage at the same time
<rnicholl1> so nobody noticed
<rnicholl1> well, I mean, we noticed, obviously, but it wasn't that impactful as the other outage was much worse
<rnicholl1> the best part was
<rnicholl1> everyone thought we had unit tests
<rnicholl1> but it turns out, they pymock the entire class out that was modified
<rnicholl1> so it had no coverage
<rnicholl1> python allows you to do weird things
<rnicholl1> back on channel topic... when trying to figure out start and end address of ram, do I just need to scan DTB/FDT? I looked in dts e.g. https://github.com/torvalds/linux/blob/master/arch/arm/boot/dts/bcm2711-rpi-4-b.dts and I don't see RAM entries, does the bootloader set these up dynamically?
<bslsk05> ​github.com: linux/bcm2711-rpi-4-b.dts at master · torvalds/linux · GitHub
<rnicholl1> GPT suggests DTB, UEFI, and ACPI, but I have no idea how I'm supposed to know which one to use
<geist> it does, note that that dts includes other ones
<geist> so probably in the more basic bcm dtses it has a memory range
<klange> Python has _strong_ typing.
<klange> Do not confuse static with strong.
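Both points can be seen in a few lines: wrapping the argument list in another list still satisfies a "sequence of objects" annotation, yet the runtime refuses to mix unrelated types. `build_command` is a hypothetical stand-in for the execution helper mentioned above, not the real function.

```python
from typing import List, Sequence


def build_command(args: Sequence[object]) -> List[str]:
    """Hypothetical stand-in for a helper that turns arguments into a command line."""
    return [str(a) for a in args]


args = ["ls", "-l"]
good = build_command(args)    # two arguments, as intended
bad = build_command([args])   # also passes a static check: a list holding one object

# Strong typing: the runtime does not silently coerce str and int.
try:
    "1" + 1
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Here `bad` comes out as a single argument containing the printed list, which is the kind of mistake a static checker cannot flag when the parameter type is that loose.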
<rnicholl1> Should I choose between these based on the bootloader I am using?
heat has quit [Remote host closed the connection]
<geist> where is clever when you need them
<rnicholl1> e.g. I just have to know that the bootloader uses DTB
<geist> clever!!!!
<rnicholl1> Or know that I am using UEFI
<rnicholl1> And compile a different kernel based on this choice
<klange> Define "uses". The bootloader provides one, possibly one it dynamically generated - maybe even one it dynamically generated from UEFI and ACPI!
<rnicholl1> well, supposing I have my entry point, it seems on raspi 4 the firmware/bootloader sets x0 to the pointer to FDT
<rnicholl1> But how would I know if I am in UEFI instead of DTS/FDT based booting?
<rnicholl1> No way to know? just have boot_dts.asm and boot_uefi.asm separately?
<klange> If you're in EFI, you know because you're an EFI application with an EFI entrypoint.
<rnicholl1> so, boot_dts.asm, boot_efi.asm, etc
<rnicholl1> and choose which one to compile
<rnicholl1> as a kernel config parameter when compiling the kernel
* clever waves
<klange> That is an option. The approach I take, though, is to have a single common kernel binary for a given architecture and then different stubs for different boot environments - which is also typically how things work on x86: you don't build a different kernel for EFI or BIOS, you have an EFI loader and a BIOS loader, and if you want to be particularly clever you can smoosh them into your kernel binary
<klange> (together!) to have it do it all in one resulting file.
<clever> rnicholl1: the rpi firmware updates the dtb and adds /memory nodes during boot, the exact layout varies depending on config.txt entries
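Reading the RAM ranges out of the blob the firmware hands over could look roughly like the sketch below. It is a host-side Python illustration, not kernel code, and it makes simplifying assumptions: only nodes named `memory` or `memory@...` directly under the root are considered, and `#address-cells`/`#size-cells` are taken from the root node. `memory_ranges` and `tiny_blob` are hypothetical names, and `tiny_blob` builds a made-up blob purely to exercise the parser.

```python
import struct

FDT_MAGIC = 0xD00DFEED
FDT_BEGIN_NODE, FDT_END_NODE, FDT_PROP, FDT_NOP, FDT_END = 1, 2, 3, 4, 9


def _align4(n):
    return (n + 3) & ~3


def memory_ranges(blob):
    """Return [(base, size), ...] from memory nodes directly under the root."""
    magic, _total, off_struct, off_strings = struct.unpack_from(">4I", blob, 0)
    if magic != FDT_MAGIC:
        raise ValueError("not a flattened device tree")
    cells = {"#address-cells": 2, "#size-cells": 1}  # defaults if the root omits them
    pos, depth, in_memory, regs = off_struct, 0, False, []
    while True:
        (token,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        if token == FDT_BEGIN_NODE:
            end = blob.index(b"\0", pos)
            name = blob[pos:end].decode()
            pos = _align4(end + 1)
            depth += 1
            in_memory = depth == 2 and (name == "memory" or name.startswith("memory@"))
        elif token == FDT_END_NODE:
            depth -= 1
            in_memory = False
        elif token == FDT_PROP:
            length, nameoff = struct.unpack_from(">2I", blob, pos)
            data = blob[pos + 8:pos + 8 + length]
            pos = _align4(pos + 8 + length)
            start = off_strings + nameoff
            prop = blob[start:blob.index(b"\0", start)].decode()
            if depth == 1 and prop in cells:
                cells[prop] = struct.unpack(">I", data)[0]
            elif in_memory and prop == "reg":
                regs.append(data)
        elif token == FDT_END:
            break
        elif token != FDT_NOP:
            raise ValueError("unexpected token %d" % token)
    addr_len = 4 * cells["#address-cells"]
    stride = addr_len + 4 * cells["#size-cells"]
    out = []
    for data in regs:
        for off in range(0, len(data) - stride + 1, stride):
            base = int.from_bytes(data[off:off + addr_len], "big")
            size = int.from_bytes(data[off + addr_len:off + stride], "big")
            out.append((base, size))
    return out


def tiny_blob():
    """Hand-build a minimal blob (made-up values) just to exercise the parser."""
    strings = b"#address-cells\0#size-cells\0reg\0"

    def pad(raw):
        return raw + b"\0" * (_align4(len(raw)) - len(raw))

    def prop(name, data):
        return struct.pack(">3I", FDT_PROP, len(data), strings.index(name + b"\0")) + pad(data)

    def begin(name):
        return struct.pack(">I", FDT_BEGIN_NODE) + pad(name + b"\0")

    end = struct.pack(">I", FDT_END_NODE)
    body = (begin(b"") + prop(b"#address-cells", struct.pack(">I", 2))
            + prop(b"#size-cells", struct.pack(">I", 1))
            + begin(b"memory@0") + prop(b"reg", struct.pack(">3I", 0, 0, 0x3B400000))
            + end + end + struct.pack(">I", FDT_END))
    header = struct.pack(">10I", FDT_MAGIC, 40 + len(body) + len(strings), 40,
                         40 + len(body), 0, 17, 16, 0, len(strings), len(body))
    return header + body + strings


ranges = memory_ranges(tiny_blob())
```

On real hardware the same walk would run over the blob whose address arrives in x0, after the firmware has patched in its memory nodes.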
<rnicholl1> ok
<Mutabah> klange: Bonus points for supporting 32-bit and 64-bit EFI in one
slidercrank has joined #osdev
bauen1 has quit [Ping timeout: 265 seconds]
Ali_A has joined #osdev
Ali_A has quit [Ping timeout: 260 seconds]
mahk has joined #osdev
gbowne1 has quit [Quit: Leaving]
rnicholl1 has quit [Quit: My laptop has gone to sleep.]
rnicholl1 has joined #osdev
rnicholl1 has quit [Quit: My laptop has gone to sleep.]
bauen1 has joined #osdev
Ali_A has joined #osdev
pfsoe has joined #osdev
bauen1 has quit [Ping timeout: 250 seconds]
gildasio1 has joined #osdev
Ali_A has quit [Quit: Client closed]
gog has joined #osdev
mahk has quit [Ping timeout: 268 seconds]
Sos has joined #osdev
Sos has quit [Quit: Leaving]
<gog> good morning
* gog passes out bagels and coffee
GeDaMo has joined #osdev
mahk has joined #osdev
slidercrank has quit [Quit: Why not ask me about Sevastopol's safety protocols?]
dutch has quit [Quit: WeeChat 3.8]
heat has joined #osdev
* klys politely declines yet accepts the bagel
<AmyMalik> FreeBSD does not know how to manage PCIe power. The result? a CPU power virus.
bradd has quit [Ping timeout: 260 seconds]
dutch has joined #osdev
danilogondolfo has joined #osdev
nyah has joined #osdev
heat has quit [Remote host closed the connection]
heat has joined #osdev
Ali_A has joined #osdev
<heat> our fucking national postal service sucks so much ass
<heat> i'll have to fucking pray the svr4 book hasn't been sent back to fuckoffshire, england
SGautam has joined #osdev
<heat> these idiots think that ringing the doorbell is optional
elastic_dog has quit [Killed (zinc.libera.chat (Nickname regained by services))]
elastic_dog has joined #osdev
<nikolar> Kek
<SGautam> Might be slightly off topic but nevertheless this channel would know a thing or two about hard drives. Is it my anecdotal experience or is it something well known that Seagate drives are known to fail?
<SGautam> I've had 3 drives go off on me one by one, in succession.
<heat> wasn't that a headline like a few days ago?
<gog> are these newer seagate drives?
<gog> iirc they have a product line that's based on what they acquired from Maxtor and they're notoriously bad
<gog> not like the barracudas of the mid to late 2000's
<bslsk05> ​www.securedatarecovery.com: How Long Do Hard Drives Last
<heat> yeah it was a headline a few days ago
bauen1 has joined #osdev
inegatives has joined #osdev
<heat> gog, i have an interview in a few minutes and i feel nervous, help
<gog> get drunk
<heat> damn you're my savior, thanks
anbdummy has quit [Quit: WeeChat 3.8]
<gog> anytime
<gog> i'm doing a good job at work
<gog> i'm a competent developer even if i don't always know the answer right away
<gog> i am not an imposter
gildasio1 has quit [Quit: WeeChat 3.8]
<heat> damn right
<heat> tomatoanus is proud of you
<zid> I am sus
<heat> amogus
<gog> sussy
<gog> ( ?° ?? ?°)
<gog> dang
<heat> DONG?
<heat> 𓂸
<heat> oh, dang
<heat> dang it
Ali_A has quit [Quit: Client closed]
* FireFly dangs it
<heat> gosh darnit
<gog> con sarnit
<zid> heat stop asking to see my sussy
<zid> it isn't free
<mrvn> SGautam: hard drives differ widely from model to model. Sure, there were some Seagate drives that failed. In general though a drive is most likely to fail early, then there is a long working life during which failure is unlikely, and then the failure rate rises again towards the end. If you bought a batch of drives it's also likely for them all to fail after around the same amount of working hours.
<Mutabah> well this is a... discussion
<zid> Mutabah's already paid up in full ofc
<mrvn> Drives in a raid also like to fail during recovery because when the first fails and you replace it the raid rebuild puts so much extra strain on the drives.
danilogondolfo has quit [Ping timeout: 276 seconds]
heat_ has joined #osdev
heat has quit [Ping timeout: 246 seconds]
Ali_A has joined #osdev
<Mutabah> Plus, RAIDs tend to be made at the same time, and it's easy to get n of the same model... so likely to have the same failure mode
<gog> yes
<gog> and if any of them have a defect that wasn't detectable at the time of QA then they're all gonna be sus
<gog> hm, how would one time-accelerate testing for MTBF?
<zid> yea I was thinking about exactly that after my ssd just died
<zid> it's samsung's shitty fw that died it, so I bet if i had two of them in raid both would have died
linearcannon has quit [Remote host closed the connection]
linearcannon has joined #osdev
gildasio has joined #osdev
nvmd has joined #osdev
inegatives has quit [Quit: Client closed]
inegatives has joined #osdev
linear_cannon has joined #osdev
linearcannon has quit [Ping timeout: 250 seconds]
linearcannon has joined #osdev
linear_cannon has quit [Ping timeout: 276 seconds]
<gog> i have to support a legacy thing and a new thing and the logic to do it is annoying and complicated
inegatives has quit [Quit: Client closed]
<mrvn> Mutabah: if you have the budget you buy a few extra spares for the raid and then you replace some drives after some random times so they are all aged differently.
* Mutabah is away (Sleep)
<mrvn> gog: they tend to run the drive under worse conditions, like turn up the heat and humidity and then they have a multiplier how much longer that relates to in real life. Which is basically purely made up.
<gog> yeahhh
<mrvn> The specs on SSD is real funny too. It has some life expectancy of 3 to 5 years but there is also a value for how much data you can write before it's expected to fail. Divide the amount by the max speed and you get less than a week.
<nortti> wait what SSDs only have a life expecteancy of 3 to 5 years?
<mrvn> nortti: same as spinning disk. Bogus made up value for which the manufacturer will replace your disk.
<mrvn> With SSDs I would trust most the amount written before failure because that they can actually test easily. Just write to the same address on the flash over and over and count. They know it manages like 10'000 or 100'000 writes or whatever the current value is and from that and their wear leveling algorithm they know how much writes you can do.
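The "divide the amount by the max speed" arithmetic is easy to check. The figures below (600 TB rated writes, 3000 MB/s sustained) are hypothetical round numbers, not the specification of any particular drive, and `days_to_exhaust` is a made-up helper name.

```python
def days_to_exhaust(rated_writes_tb, write_speed_mb_per_s):
    """Days of non-stop writing at full speed until the rated write volume is used up."""
    total_bytes = rated_writes_tb * 1e12
    seconds = total_bytes / (write_speed_mb_per_s * 1e6)
    return seconds / 86400


days = days_to_exhaust(600, 3000)  # hypothetical drive: 600 TB rated, 3000 MB/s
under_a_week = days < 7
```

With these assumed numbers the drive would burn through its rated writes in a bit over two days of continuous writing, consistent with the "less than a week" estimate.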
elastic_dog has quit [Remote host closed the connection]
elastic_dog has joined #osdev
<zid> ssds are consumable goods yea
<zid> and they often die by just disappearing, not going read only
<zid> everybody's warranties have been getting shorter and shorter on them too
<mrvn> I recently also heard that SSDs will lose data if you put them on a shelf and don't use them. In use the data is refreshed by the wear leveling every now and then but unused it fades. Anyone heard that too?
<mrvn> No idea if that's true.
vdamewood has joined #osdev
<gog> iirc that's more of a thing with MLC than SLC
<gog> because the distance between the stored charge is smaller and in more directions than with MLC it can just do quantum weirdness and start to degrade
<gog> s/MLC/SLC
<gog> > According to that rule, MLC SSDs generally have a shorter life span than SLC SSDs, and the difference can be significant. For example, an MLC SSD may last for about 3,000 cycles, while an SLC SSD in the same application may last for 60,000.
<zid> yea more layers is a more bad ssd
<zid> generally
<zid> they sell TLC and QLC too
<gog> considering returning to optical media for long-term backups
<zid> flash is fairly shelf stable I think
<zid> it just doesn't like being written to, either the flash dies or the controller does
<mrvn> gog: how many cycles is sitting in a shelf for a year?
<gog> i'm looking for shelved data integrity but not finding much
<mrvn> gog: DAT tapes?
<mrvn> .oO(and yes, that's tape tapes)
<gog> a tape drive for really long-term storage sounds fine
<gog> it's a time-tested technology
<gog> and you can get a lot on a tape
<zid> also cheap
<mrvn> you still need to rewind them every now and then to prevent bleed through I think.
<zid> flash prices are going down a lot though
<zid> multi TB drives exist now
danilogondolfo has joined #osdev
<mrvn> Also, is there any software to run a raid algorithm over tapes? Or do you just mirror for redundancy?
<mrvn> E.g. XOR everything written to a tape to a harddisk and after 4 tapes write the harddisk contents out to a 5th tape as redundancy.
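The XOR scheme suggested here (four data tapes plus one parity tape) can be sketched in a few lines. The tape contents are placeholder byte strings of equal length and `xor_blocks` is a hypothetical helper; a real setup would have to pad shorter tapes and process the data in chunks.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Placeholder contents for four data tapes (same length by assumption).
tapes = [b"tape-one", b"tape-two", b"tape-3!!", b"tape-4??"]
parity = xor_blocks(tapes)  # what gets written to the 5th tape

# Tape index 2 is lost: XOR the survivors with the parity to rebuild it.
survivors = [t for i, t in enumerate(tapes) if i != 2]
rebuilt = xor_blocks(survivors + [parity])
```

A single parity tape survives exactly one loss per group; the 16+4 idea mentioned later needs a proper erasure code rather than plain XOR.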
<gog> ehhhh
<gog> i'd say just mirror
<gog> having blocks spanning multiple tapes seems like a good way to lose everything if one of the tapes gets damaged
<nortti> from what I see a lot of ppl don't bother using anything but mirroring with disks either nowadays
Ali_A has quit [Quit: Client closed]
<nortti> due to write amplification and such
<mrvn> gog: I don't want to stripe files across tapes. :) Just make it so a lost tape can be recovered.
<zid> I use a raid array between my ssd and a floppy disk, with --write-mostly
<mrvn> zid: must be a 9 1/2' floppy or it would be far too small for the data. :)
<mrvn> nortti: doing 3 or 4 copies with mirroring becomes expensive
<zid> I did actually do that for my firefox profile back in the day, made a raid array between a ramfs file mounted loopback, and a real file, with the real file --write-mostly
<mrvn> zid: does that help? firefox writes and syncs every 10 seconds and only reads back the data on restart. So you mostly write anyway.
inegatives has joined #osdev
<gog> mrvn: raid 6 types
<gog> tapes
Ali_A has joined #osdev
<mrvn> gog: go further. 16+4 redundancy or so.
<mrvn> Survive 4 tapes failing at 25% cost increase.
<mrvn> Once you do that your drive is probably going to fail just when you can't get a new one anymore. :)
<gog> oops
<gog> what LTO level are we up to now
* mrvn wants to dope diamond with something as permanent storage on an atomic level.
<mrvn> At least LTO-8
<mrvn> make that 9, a few hits down on google.
<mrvn> "The ninth generation of LTO Ultrium was announced in 2020 and can hold 18 TB in a cartridge of the same physical size."
<heat_> 9000
<mrvn> I didn't even notice that tapes have exceeded my harddisk sizes.
<gog> lmao "how many levels of LTO are you on my dude" "8 or 9"
<FireFly> was just thinking that too lol
<gog> fuck. i am not a competent developer
dude12312414 has joined #osdev
<gog> i think i need to learn more about react because this is clearly supposed to be easy but i fail to comprehend it
<mrvn> Tape drives are way too expensive for my taste: Qualstar Qi LTO-9 SAS EXT. Tape Drive -- 5.550,00€
<mrvn> I can buy a lot of harddisks before tapes become cheaper.
<Ermine> gog: may I pet you
<gog> yes
* Ermine pets gog
* gog prr
<gog> yeah it makes way more sense if you're backing up whole data centers
<gog> not so much if it's just your own stuff
<Ermine> Do tape drives still exist???
<gog> oh yeah
<Ermine> want one
<gog> i should pick up a storage drive for backups
<bslsk05> ​www.amazon.com: LTO-5 Ultrium 3000 SAS Internal Tape Drive - Bandlaufwerk - LTO Ultrium ( 1.5 TB / 3 TB )
<Ermine> heck, those are expensive
<mrvn> Even the older models still are expensive.
<mrvn> Ermine: is there anything other than tapes that is designed for 30-60 years of storage? Only thing lasting more would be paper.
<Ermine> Ok ok
heat has joined #osdev
heat_ has quit [Read error: Connection reset by peer]
inegatives has quit [Quit: Client closed]
<SGautam> gog, Yes, one was purchased in 2018, the other in 2019 and the newest one in 2020. They all have the same design. They were all 2 TB Backup Plus line.
<zid> people still buy petabytes of tapes
<SGautam> All failed right during the time something was being transferred though. One failed while trying to install a game from Steam, the other failed when transferring photos, and the last one failed while installing ANSYS.
<SGautam> Like suddenly. You hear the Windows eject sound, boom, it's gone. Setup / Transfer errors out, and you put it back in and poof, nothing. Neither Windows nor Linux even detects it.
Left_Turn has joined #osdev
Ali_A has quit [Quit: Client closed]
bauen1 has quit [Ping timeout: 268 seconds]
linearcannon has quit [Remote host closed the connection]
linearcannon has joined #osdev
<mrvn> Remember zip drives with their click of death?
<heat> from experience, windows completely falls over once ATA transfers start failing
<heat> linux is a lot more resilient *and* you get all the ATA errors in the dmesg
<zid> that's because windows treats half the fs as important files that must not be missing
<zid> linux be like "I guess, bzImage has to be in memory?"
Ali_A has joined #osdev
<mrvn> heat: linux also easily goes into an endless reset loop and never reports a failure to the upper layers.
<mrvn> So e.g. the raid never throws out the faulty drive and retries on a different one. It just blocks forever.
slidercrank has joined #osdev
<mrvn> If anyone wants to work on the Linux ATA errors I have a fun little project that needs support. I have a bunch of relays inserted into the power connection so you can cold reset the drive on demand or even put it into cold standby on inactivity.
Ali_A has quit [Quit: Client closed]
Ali_A has joined #osdev
<SGautam> I'm really scared at doing anything with hard drives at this point. Like I thought the point of HDDs was to store large stuff.
<zid> hdds are very very reliable
<zid> (unless they are not)
<SGautam> Three of them failed consecutively for me in the last month.
<zid> was there an earthquake
<SGautam> Nope, all failed when trying to transfer large amounts of data.
<SGautam> Maybe the USB controller in my laptop could be an issue?
<SGautam> Because I did notice that all USB devices started to act wacky whenever an HDD failed.
<zid> 4850 days
<zid> ah if it's a portable drive then the answer to was there an earthquake is "yes", btw
<SGautam> Yes, it was an external HDD
<zid> 13.3 years my 1TB drive has been powered on for
<SGautam> Is it advisable to carry external HDDs in a backpack while travelling without some sort of a case?
<zid> when they're off they're fairly fine, the heads aren't loaded
<SGautam> I'm trying to narrow down the cause of failure here. 3 HDDs failing should mean something's definitely wrong with my system.
nvmd has quit [Quit: Connection closed for inactivity]
<SGautam> Basically all 3 drives now, once connected spin for a while, and then stop. OS doesn't detect anything.
<zid> oh today is the 40k hour anniversary of the 4TB drive
<SGautam> 40,000 hours certainly...?
<zid> certainly what?
<SGautam> Oh nvm, didn't read the "hour"
<SGautam> read it as "40k anniversary"
<zid> Yes, it's now officially part of the warhammer universe
pfsoe has left #osdev [#osdev]
<heat> mjg, idk if you've been following the thread or not, but linux is going to start returning EINVAL (like netbsd) there
<heat> 1) sweet 2) linux just broke userspace 😲😲😱😱😱😱
Ali_A has quit [Quit: Client closed]
foudfou_ has joined #osdev
foudfou has quit [Remote host closed the connection]
Ali_A has joined #osdev
inegatives has joined #osdev
linearcannon_ has joined #osdev
linearcannon has quit [Remote host closed the connection]
<geist> heat: woot your patch got accepted?
<heat> no
slidercrank has quit [Ping timeout: 276 seconds]
<heat> my patch restored the old behavior, they decided to break any past linux behavior and just EINVAL the whole thing
<heat> because it was stupid and "since no one complained for two stable releases, no one cares probably"
<heat> which is a good idea IMO
<geist> ah, so create and dir at the same time is simply EINVAL now?
<heat> yes
<heat> it's probably an open question (might ask) if there's any value in letting open create a directory in that case
<heat> but I doubt anyone on that side is inclined towards that
<heat> you know, it's pretty remarkable how linus still knows his shit
<zid> where's the lkml link
<zid> or do I have to check my own emails
<zid> oh found it, from:pedro
<geist> well sounds like someone voted for pedro
<heat> damn, i'm getting votes now?
<heat> what do I get?
<zid> executed.
<heat> shit
Perflosopher has joined #osdev
<zid> heat if you reply again, put a 'hi zid' at the bottom kthx
<heat> "hi linus, can you say hi to my friend zid? he's a big fan"
<zid> exactly
<heat> he needs a cameo account but he's still not a washed enough celebrity for it
<zid> I actually like the idea of O_CREAT | O_DIR though
<zid> it's less racy than doing that as two calls
<zid> like "I wanna make this file, but if it's a directory, open it instead"
<heat> it's a weird idea though?
<zid> gentoo does something similar for portage
<zid> it'll either open blah.conf or every file in blah.conf/
<heat> like in what case would you ever want to open *a directory*, but if its not a directory, create a regular file?
<geist> it depends on if you consider O_CREAT to be 'create a file, not anything else'
<geist> or 'O_CREAT can be combined with other attributes to specify the type of thing'
<heat> having O_CREAT | O_DIR do "open-dir-or-create-dir" atomically would be a much more interesting idea
<geist> right, i'd tend to read O_DIR being combined with O_CREAT to be a modifier to O_CREAT to tell you to make a dir instead of a file
gog has quit [Quit: Konversation terminated!]
<geist> but since there's a mkdir call, it's redundant, so the EINVAL return is probably legit
<heat> it's not redundant
<geist> but then you can argue that if it *did* make a dir, then it's not redundant
<heat> mkdir only creates, does not open, and does not give you a fd
<geist> because then it's actually a useful feature: open a dir ... yeah what you said
inegatives has quit [Quit: Client closed]
<geist> i do wonder what really old unices did here. it's fairly clear that early on there was no distinction between dir and file, though maybe the dir bit was still set in the inode
<geist> so seems like the open(CREAT|DIR) would have been the original pattern
<heat> O_DIRECTORY isn't that old
<geist> oh maybe that's the diff, at some point dir as a separate type of file came along
<heat> it's in linux since 2.1
<geist> though i dunno how in the old days it was specified that something was a dir, vs a regular file
<geist> well, okay, so thats like only 20ish years or so
<heat> i mean, normally? with a file type?
<zid> heat: I like fewer calls to the fs to achieve what I want, and imo there's no drawback to *supporting* create-file-or-open-dir-in-one-call
<heat> O_DIRECTORY really is just a "make sure this is a dir"
<zid> and I can think of *a* use for it
<zid> like, if the alternative is -EINVAL then why not just make it do something
<heat> because the "something" is kind of the antithesis of O_DIRECTORY
<zid> meh, I just consider it a flag
<geist> yah depends on if you consider the flag combination to be 'limit it to this' or 'add this additionally'
<zid> flags can do flaggy things and mean things depending on other flags
<heat> O_DIRECTORY = only return directories, O_DIRECTORY | O_CREAT = open *only* directories, or create a regular file
<geist> i'd tend to consider it to be the former
<geist> ie, i passed O_DIR on this, so i only want this to deal with dirs, and nothing else
<zid> mkdir's existence at all is the weird part of all this
dutch has quit [Quit: WeeChat 3.7.1]
<geist> and in that logic it makes sense to limit the combination to at most 'create a new dir if it doesn't exist, but if it's a file, error'
<zid> I don't value consistency here, given it returns EINVAL now, and it does *something* useful.
<geist> so i guess -EINVAL is just a cop out to avoid the problem
<zid> I'd rather have the useful option available than not
<heat> i think in theory mkdir, mknod, etc could in theory have been mostly collapsed to open()
<zid> yea idk why mkdir just isn't O_CREAT | O_DIR
<zid> to begin with
<heat> because le unix
<geist> well because mkdir only creates a dir, not opens it additionally
<zid> do I need to dissect dennis ritchie
<geist> so it's not precisely the same semantics
<heat> geist, so open(O_DIR | O_CREAT | O_EXCL)
<geist> that makes sense
<zid> I think it's probably rarer that you want to make an empty dir than you are making a dir so you can then write a file
<geist> of course mkdir() user code also returns a DIR * iirc, but that's just a layering thing. it could be built on top of open(...)
<heat> no, mkdir returns an int (0 or -1)
<geist> hmm, what is the user api that gets you a DIR *?
<zid> mkdir bad and racy and inconsistent with other api
<heat> should be the opendir stuff, etc
<heat> but those don't create I think
<geist> oh i see. mkdir is just error, opendir gets you the DIR *
<geist> yah i'm gonna guess it has something to do with when DIRs were just regular files and the contents were maintained by user space
<geist> so in that case mkdir is kinda like 'format this file with dir structures'
<zid> yea
<zid> like mkisofs
<heat> mkdir (and mknod) are kind of odd because there's a blatant pseudo-TOCTOU that no one has cared to address, ever
<heat> while open is super nice and super well designed but does not support any sort of other files because of stubbornness or something
<zid> yep
<zid> that's why my brain goes "Just make open do it"
<zid> fewer syscalls better, less racy
<Ermine> real thing
<zid> I *want* an atomic 'open file or dir or make it or whatever' because I definitely do *not* want to have to write that myself some day.
<geist> flip side is, you're dealing with posix and 50 years of legacy, so you kinda are already knee deep in crap at that point
<zid> from userspace
<geist> you kinda cant get what you want by definition
<geist> ie, if you want clean apis dont implement posix :)
<Ermine> the only thing is that dir works through DIR struct, and regular files are just a fd
<geist> which of course doesn't fix your problem, it just makes it entirely your problem that you can hypothetically fix
<geist> might be interesting to see what win32's raw dir api looks like. i dont know off the top of my head
<heat> but UNIX is perfect
<geist> i bet win32 just treats it all as a separate kind of file: OpenDirectory, ReadDirectory, etc
<heat> it's like a gordon ramsey dish but where gordon ramsey never actually went to culinary school
<zid> An application cannot create a directory by using CreateFile, therefore only the OPEN_EXISTING value is valid for dwCreationDisposition for this use case. To create a directory, the application must call CreateDirectory or CreateDirectoryEx.
<zid> is all I know about dirs on windows
<heat> yeah ok, so you can open a dir with CreateFile
<zid> yea you just can't make one
<zid> linux's open can create, windows' create can open
<zid> :D
<heat> passing note that all these win32 calls are weirdly named and CreateFile is "Create internal win32 file handle to $stuff"
<geist> yah was gonna point out that createfile is weird
<zid> yea createfile is basically createfd
<zid> and can be a file, or DEVICE_HANDLE or whatever
<zid> cus they don't go all 'files are devices'
bauen1 has joined #osdev
<zid> they just say 'everything gets a HANDLE'
<heat> yeah, they go "all files are objects in this object manager extravaganza"
<heat> which, honestly, i'm not sure which is worse
<geist> they probably have limits of what kinda handles you can point files at but that makes sense in that case: create file says no
<zid> at least linux doesn't crash if you try to open CON
<geist> sockets too being handles of a particular type
<immibis> windows also doesn't crash if you try to open CON. It just opens CON, right?
<zid> no
<immibis> you could say linux crashes when you try to open /dev/tty
<zid> windows 3 -> xp just crashed
<zid> windows supports posix socket stuff, it just caps out with an impressive MAX_FDS of 32
<geist> i definitely embrace the notion of one handle namespace for all things, which we tilted hard on in zircon and still works
<geist> and that files are just one kinda handle
<geist> so i guess in windows the fd namespace is separate, but you have to map files to it
<Ermine> Directories are a special kind of files, so I don't think you can get away without some special API
<geist> or the fd namespace is maintained in user space, maybe
<zid> \\.\GLOBALROOT\Device\ConDrv\KernelConnect is what you can open to crash modern windows apparently
<zid> same file, but bypasses the 'CON' strcmp protections
<geist> nice, though presumably you need particular ACLs to get that
<heat> <immibis> you could say linux crashes when you try to open /dev/tty <-- what?
<zid> My favourite windows crash was (or maybe still is) that two applications could debug each other, as a sort of DRM, then if you tried to attach a debugger to either it'd just bluescreen
<heat> Ermine, how so?
<Ermine> They have a structure, they are not a bag of bytes
<zid> neither are files
Ali_A has quit [Quit: Client closed]
<zid> they have spooky things like names and permissions
<zid> which you have to call special apis to query
<heat> i can imagine a world where reading through a dir fd would just give you structured records of directory entries
<Ermine> Or you need to call fdopendir after open every time
<zid> heat: I think that's where we came from
<zid> the DIR * thing
<Ermine> heat: and reading anything else than sizeof(struct direntry) is UB?
<heat> old UNIX (and some new UNIX) just reads the raw disk struct
<heat> Ermine, no, it could work more or less like getdents does
<zid> or .zip or .cab or .hpi or any other format that can handle nested files, imo
<heat> actually, current netbsd does support catting directories (which I assume is just a plain open + read())
<heat> and it gives you binary garbage, probably straight from the fs
<geist> last i checked most of the BSds let you directly read the dir if you're root. but i suspect that only makes sense on FSes where the dir itself is a flat file with a structure on it
<geist> presumably fses where the dir is a more abstract concept on disk wouldn't work
<mjg> freebsd disallows it now
<heat> yeah i wonder what happens if you do it on zfs
<geist> mjg: makes sense
<mjg> it was hysterical behavior
<mjg> afair there used to be no readdir back in the day
<geist> yah and wouldn't make sense for something like zfs
Ali_A has joined #osdev
<geist> ALL HAIL TRU64
<heat> tru64 is indeed the true 64
<heat> mjg, fyi i'm now armed with 100 or so pages about STREAMS
<mjg> did ya know that AIX in the 90s was claimed to support 8 CPUs?
<heat> send me money or I'll add STREAMS to FreeBSD
<heat> this is not a threat, it's a promise
<zid> 8 cpus? nobody can afford 8 cpus.
<zid> THat's a lie
<Ermine> heat: how many?
<zid> intel's selling 4 cpus superglued together now though, for $11000, so we're almost there
<geist> 1 2 3.... 7
<geist> 8 no way
<mjg> or 9
<mjg> but 8?
<heat> Ermine, a football field worth of pages
<geist> 10 is right out
<mjg> liek gap in the market innnit
<zid> what if I promise to proceed directly to 11?
<Ermine> heat how much money?
<heat> idk whatever I can extort from those freebsd nerds
<Ermine> say no more
* Ermine runs to bank
<heat> i said freebsd, not alpine
<zid> heat: as your manager, don't forget my 30%
<heat> your musl distribution is safe, for now
<mjg> so what book is it
<mjg> solaris internals or secrets of the magic garden
<heat> secret of the magic gard
<mjg> which is 'solaris internals to be'
<mjg> :x
<Ermine> heat: you lost a chance to get some $$$
<Ermine> Btw does anything use STREAMS?
<heat> some of the old UNIXes maybe?
<geist> hmm, does any of the modern ones implement it?
<CompanionCube> geist: illumos does?
<heat> yeah erm, that's what I meant with "old UNIXes"
<geist> ah because solaris heritage
<heat> AIX, HPUX, Illumos, Solaris
<geist> i honestly never looked at it. like all things it's probably got some okay ideas but didn't catch on
<heat> basically anything coming out of the SVR4 oven
<heat> hell, UnixWare
<geist> though i guess it could just as easily have totally fatal flaws
<CompanionCube> heat: isn't current unixware freebsd-based though?
<geist> hmm, looks like poll may have come out of streams
<mjg> poll?
<nortti> the poll-syscall
<heat> you know, gotta love these UNIX function names
<heat> lookuppn
<mjg> nortti: not what i'm asking
zxrom has quit [Quit: Leaving]
<geist> poll syscall yes
<geist> just reading the wikipedia summary of STREAMS https://en.wikipedia.org/wiki/STREAMS
<geist> says in the first part poll, putmsg, and getmsg were added
<mjg> fbsd man page says The poll() function appeared in AT&T System V UNIX.
<mjg> no info how did that happen ;>
<heat> magic
<mjg> > This port added the putmsg, getmsg, and poll system calls, which are nearly equivalent in purpose to the send, recv, and select calls from Berkeley sockets.
<mjg> huh
<heat> it's where the magic garden came along, really just magic
<mjg> > FreeBSD has basic support for STREAMS-related system calls, as required by SVR4 binary compatibility layer.[21]
<mjg> hehe
<heat> o shit
<mjg> not true dawg
<heat> can't extort you idiots
<geist> yah makes sense that at least one part of it survived. poll() sounds like made the cut
<mjg> maybe it was at the time
<heat> ah, so the man page is lying again
<geist> well, basic support for the syscalls may be just returning -EINVAL
<heat> swear to god, why have man pages?
<mjg> geist: ye i'm saying i would expect poll to show up shortly after select
<mjg> being less fucking retarded 'n all
<mjg> heat: you mean why have wiki pages
<heat> geist, fyi if you want to read up on STREAMSSSSSSSSSSSSSSSSSSSSS, https://archive.org/details/magicgardenexpla0000good/
<heat> just borrow it for an hour and laugh at the oldies
<heat> "BUT THIS DOESNT SCALEEEEEEEEEEEEEEEEEE!!!!!!11111111111!!!!" - exclaimed mjg geist
<CompanionCube> there's also a programming guide released by sun that isn't a dead link now
warlock has quit [Ping timeout: 276 seconds]
inegatives has joined #osdev
gog has joined #osdev
Ali_A has quit [Quit: Client closed]
<heat> windows should get GATES
<heat> mjg, you still haven't properly explained why the 4.4BSD vfs sucks
<geist> heat: i see what you did there
<heat> if you go for a single lookup(bunchaoptions) do you end up getting some spaghetti-ass-code from all the special casing?
<gog> meow
<mjg> heat: i did give the gist of it
<CompanionCube> heat: i mean, technically wow64 iirc has something called 'heaven's gate'
<mjg> heat: the gist is someone wrote ufs support directly all over, then "vfs" got added in that a bunch of places got an indirect function call instead of just descending into ufs
<mjg> heat: .... all while numerous idiotisms stemming from ufs remained as the de facto api
inegatives has quit [Ping timeout: 260 seconds]
<geist> yah makes sense. NFS too: take everything that was at the vfs layer and shove it over a network
Ali_A has joined #osdev
<mjg> heat: also read about 'vnode aliasing'
rnicholl1 has joined #osdev
<heat> mjg, do you have no qualms with the lookup stuff?
Ali_A has quit [Quit: Client closed]
<mjg> i do, it's all weird at best
<mjg> perhaps you forgot of v_id
Ali_A has joined #osdev
<heat> what's v_id?
rnicholl1 has quit [Quit: My laptop has gone to sleep.]
<heat> and why is it "weird"?
zxrom has joined #osdev
<mjg> it is "we don't properly manage vnode lifecycles, so vnodes never get *actually* freed. instead, v_id gets bumped when they do, so you check it"
<heat> lol
foudfou_ has quit [Remote host closed the connection]
foudfou has joined #osdev
<mjg> again check vnode aliasing, related to device node handling
inegatives has joined #osdev
<mjg> i'm not gonna point fingers at systems which *still* do it :]
<heat> is it LibreBSD or Sunloris?
<heat> or both?
<mjg> net and open b
<bslsk05> ​grok.dragonflybsd.org: vnode.c (revision dc8b130e) - OpenGrok cross reference for /386bsd/usr/src/kernel/kern/fs/vnode.c
<mjg> you don't. it lands on an eternal "free list"
<bslsk05> ​grok.dragonflybsd.org: vnode.c (revision dc8b130e) - OpenGrok cross reference for /386bsd/usr/src/kernel/kern/fs/vnode.c
<mjg> if (numvnodes < desiredvnodes) {
<mjg> vp = (struct vnode *)malloc((u_long)sizeof *vp,
<mjg> M_VNODE, M_WAITOK);
<mjg> otherwise grab from the "free list"
<mjg> vnode once allocated persist till reboot
<mjg> cache_purge called from there bumps v_id
<heat> what would happen if you actually freed?
dutch has joined #osdev
<heat> i mildly enjoy how you just xref'd on 386BSD lol
<mjg> presumably you would get crashen all over
<bslsk05> ​grok.dragonflybsd.org: nfs_vnops.c (revision dc8b130e) - OpenGrok cross reference for /386bsd/usr/src/kernel/nfs/nfs_vnops.c
<mjg> vpid = vdp->v_id;
<mjg> ... vget fuckery
<mjg> if (vpid == vdp->v_id) {
<mjg> ops hehe not the same vnode anymore?
<heat> why does the average BSD fs routine have 50 variables
inegatives has quit [Quit: Client closed]
<heat> all poorly named too
<mjg> hands busy taking lsd
<mjg> no time to type
<heat> the USL people also had this same issue
<heat> lookuppn
<heat> compvpp, dirvpp
<mjg> i also note that by implementation, not actual real needz, all that lookup code is expensive af
<mjg> massive branchfest all over
<heat> there you go
<heat> yes, i assume it's spaghetti
<heat> that's what I wanted to hear
<heat> "doing everything in one lookup routine is a bad idea"
<heat> not like linux lookup seems to be any readable either
<heat> but it does seem like the strategy is overall a bit different
<bslsk05> ​grok.dragonflybsd.org: ufs_vnops.c (revision 191115cf) - OpenGrok cross reference for /freebsd/sys/ufs/ufs/ufs_vnops.c
<mjg> almost all real-world calls are with VEXEC
<mjg> the cpu sees 'ufs_*' and knows it's going to be in there for a while
<bslsk05> ​grok.dragonflybsd.org: vfs_subr.c (revision 62a573d9) - OpenGrok cross reference for /freebsd/sys/kern/vfs_subr.c
Ali_A has quit [Quit: Client closed]
<mjg> let's keep computing all these possible flags
<mjg> so that at the end of it one can: if ((accmode & (priv_granted | dac_granted)) == accmode) {
<mjg> meanwhile in a chad filesystem:
<mjg> if (__predict_true(accmode == VEXEC &&
<mjg> (node->tn_mode & all_x) == all_x))
<mjg> return (0);
johngammerson has joined #osdev
<mjg> there, almost all sorted out in 2 branches
<mjg> one may notice it would be better to have a dedicated routine so there is no need to branch on whether accmode == VEXEC
<heat> __predict_true is stupid
<mjg> ... which is what i did later
<mjg> how so dawg
<johngammerson> hello guys, i am truly ashamed to come back here but i have an assembly question, does loop auto return to the label it was called from?
<heat> because it's so stupidly named I probably wouldn't think of using it
<heat> likely()!
<johngammerson> i rtfm'd but still couldn't find an answer
inegatives has joined #osdev
<johngammerson> do i need to return manually after the loop?
johngammerson is now known as ilovethinking
<mjg> heat: oh the name. i don't have a strong opinion
<nortti> ilovethinking: are you talking of the x86 "loop" instruction? when you say "the label it was called from" what do you mean?
<mjg> heat: afair it was named like that to "discourage frivolous use"
<ilovethinking> nortti: can i paste somewhere?
<heat> also I would guess that doing this sort of weird micro optimization is weird
<ilovethinking> like paste.sh?
<nortti> yeah
<heat> it's like the weird prefetch()es in the middle of random linux kernel code
<gog> when the loop condition is satisfied it will continue to the next instruction after loop
<mjg> heat: how about likely or unlikely in said code
<mjg> you might have missed the part where almost all actual calls are VEXEC
<heat> yeah, they are, but will it ever change and does this make a difference?
<heat> the answer to "will it ever change" is clearly yes, since you added a completely separate routine for VEXEC
<heat> now your likely() makes no sense
<gog> for regular loop, opcode e2, it will loop until ecx is 0
<gog> for loope and loopne it'll loop until ecx is 0 or zf is 0 or 1 respectively, which can be set within the body of the loop
<heat> i don't think i've ever used loop before
<gog> me either
<gog> easier to use conditional jumps
<heat> what? you have loops?
<bnchs> hi osdeveersss
<heat> i just unroll everything
<heat> i am firmly against the concept of a loop
<gog> ilovethinking: ok so
<gog> there's no ret after loop
<gog> so it'll fall through back into _start
<gog> but you'll still have the stack frame
<gog> mov eax, 1 will never execute
<bnchs> heat: but loops save disk space on repeating :<
<heat> loops are morally wrong
<gog> you'll forever call .SetEntry
<gog> and eventually overflow your stack
<ilovethinking> hmmm... is there a way to make it continue to execute the mov eax? can i do this without calling? idk if im wrong but i saw someone do it without call, just setting the .setentry in the middle of start
<ilovethinking> that makes no sense
<nortti> "hmmm... is there a way to make it continue to execute the mov eax?" yeah, use a ret
<mjg> heat: it still makes sense. the routine added is only used by lockless lookup
<mjg> heat: should i add one for locked lookup, then this indeed should be removed
<gog> every call needs a corresponding ret
<ilovethinking> so basically ret after the loop?
<gog> yes
<mjg> heat: so the real calls which land there are still almost always VEXEC, there are just way fewer of them
<ilovethinking> thanks
<ilovethinking> appreciate it
<gog> assembly will do EXACTLY what you tell it
<gog> a label is not a function
<gog> it's just a symbol that refers to a position in code
<ilovethinking> i'm gone from here again so i don't indulge in accidentally making someone spoongf
<ilovethinking> spoonfeed me
<gog> there are very few things that will happen implicitly
<ilovethinking> gog: yeah that's kinda hard to get used to for me
<gog> you'll grow to understand it
<gog> assembly doesn't have a concept of functions or objects or anything. it's instructions and data
<gog> you can arrange the instructions to behave like a function
<gog> but it has to be explicit
<nortti> ilovethinking: if possible, I'd try to write some user-mode programs in assembly to get used to concepts
<gog> yes
<ilovethinking> good idea, what's something i could write to get used to it?
<gog> implement strlen
<gog> doesn't have to be fast or branchless
<gog> but it's a good starting point
<gog> also godbolt.org is a good resource
<gog> you can write C code and see the assembly output
<ilovethinking> i'll implement strlen yes
<ilovethinking> good idea
<ilovethinking> thanks goggy
<ilovethinking> lub u
<ilovethinking> <3
ilovethinking has quit [Quit: WeeChat 3.8]
<bnchs> can i pat gog?
<gog> yes
* bnchs patpatpat gog
* gog prrr
<bnchs> gog is smart unlike me :3
<zid> I'm surprised you still talk to him
<gog> stop it bnchs
<gog> zid: i'm too helpful for my own good
<bnchs> gog: stop what?
<zid> I'd say it's helpful for neither of you :P
<gog> bnchs: saying you're not smart
<gog> believe it or not i _have_ cut people off from programming help
<zid> I cut heat off, he just can't understand the glory of zidlists
<gog> huehue
<gog> i was feeling generous today anyway, i got a lot done at work and i feel pretty good
<bnchs> gog: i'll stop saying it, anyway how are you?
<gog> p good today
<zid> you're very mood volatile
<gog> how's bnchs
<bnchs> the usual
<zid> just feel meh every day like a nice neutral grey person like me
vdamewood has quit [Quit: Life beckons]
<gog> i physically can't, i have bipolar disorder
<zid> except monday for 20 mins when honzuki chapters release, ofc
<zid> what's the fix for that, bandsaw?
<gog> low-dose antipsychotics work for me most of the time
<gog> :P
<lav> i think you're supposed to eat batteries
<gog> lithium does _not_ work for me
<gog> it only causes me gastrointestinal distress
<lav> mhm
* gog offer patpat to lav
<zid> try nickel or cadmium
<lav> gib
* gog patpatpatpatpat lav
<lav> uwu
* lav prr
<zid> lead acid is hard on the gastro distress i feel
<gog> yes
<gog> i prefer alkaline batteries for indigestion
<lav> personally i like gallium's chewyness
<zid> I like gallium served over aluminium
<zid> good mouthfeel
* bnchs offer patpat to lav too
* lav accepts
* bnchs patpatpatpat lav
* lav purrs more
<bnchs> cute
<lav> askjhgjfgh
<Ermine> uwu
gbowne1 has joined #osdev
inegatives has quit [Quit: inegatives]
theboringkid has joined #osdev
<lav> uwu
rnicholl1 has joined #osdev
rnicholl1 has quit [Quit: My laptop has gone to sleep.]
slidercrank has joined #osdev
inegatives has joined #osdev
<geist> pata pata pata pon
<geist> damnit, you made me remember that
<geist> i've lost the game
<bnchs> hi geist
d5k has joined #osdev
<geist> hey
<gog> uwu
<Ermine> geist: pon pon pata pon
<zid> I need a new audiobook so I can play more tetris
<moon-child> wat
<zid> You heard me
<geist> Ermine: nooooo!
rnicholl1 has joined #osdev
<geist> that game is insidious
rnicholl1 has quit [Client Quit]
rnicholl1 has joined #osdev
<d5k> hi
rnicholl1 has quit [Client Quit]
<gog> hi
SGautam has quit [Quit: Connection closed for inactivity]
<Ermine> hi
* Ermine thinks about getting psp emulator...
<bnchs> hi
FreeFull has joined #osdev
<zid> I have a nice new 5800x so naturally I am playing an opengl 1.1 game designed to run on a pentium 4
inegatives has quit [Quit: Client closed]
<gog> nice
<zid> (and it barely used any cpu% on a p4)
<GeDaMo> Which game?
<zid> tgm3
<zid> https://cdn.discordapp.com/attachments/518247259838611476/1087115303995056147/image.png Need to get this from 2:32 to 2:28 or I'm not allowed to keep playing
<zid> but I was at 2:58 when I started derusting, so, almost there
d5k has quit [Quit: leaving]
theboringkid has quit [Quit: Bye]
theboringkid1 has joined #osdev
theboringkid1 is now known as theboringkid
inegatives has joined #osdev
theboringkid has quit [Quit: Bye]
xvmt has quit [Ping timeout: 246 seconds]
GeDaMo has quit [Quit: That's it, you people have stood in my way long enough! I'm going to clown college!]
xvmt has joined #osdev
inegatives64 has joined #osdev
inegatives64 is now known as innegatives
inegatives has quit [Ping timeout: 260 seconds]
slidercrank has quit [Ping timeout: 250 seconds]
heat_ has joined #osdev
heat has quit [Read error: Connection reset by peer]
innegatives has quit [Quit: Client closed]
rnicholl1 has joined #osdev
bgs has quit [Remote host closed the connection]
rnicholl1 has quit [Quit: My laptop has gone to sleep.]
rnicholl1 has joined #osdev
inegatives has joined #osdev
inegatives has quit [Client Quit]
<mjg> check out scalability on freebsd running ubuntu jammy userspace in a chroot and doing -j 104 allyesconfig kernel
<mjg> 25667.23s user 13060.02s system 2784% cpu 23:10.71 total
<mjg> that is, about 1/3rd of the total is in the kernel, out of which about 2/3rds is spinning on locks
<mjg> and the entire thing took so long in total real time because said spinning is on the spinlocks that queue you up for off cpu wait
[itchyjunk] has joined #osdev
<mjg> interestingly it was not anywhere near this bad on ubuntu 18, where it took about 8-ish minutes
<mrvn> 1/3rd utilization sounds rather bad
<mjg> something has changed in glibc malloc; most of the problem i see is in mmap et al
<moon-child> it sounds very bad
<mjg> it is incredibly bad
<mjg> almost all these waits are on 'page busying'
alturmann1729 has quit []
<rnicholl1> probably moving to do more page allocation and less management in userspace?
<rnicholl1> oh
<rnicholl1> maybe also the spin issue
<rnicholl1> spinlocks are REALLY bad when scaling up the number of threads
<rnicholl1> like they scale O(T^2) in overhead, T= threads
<rnicholl1> they made more sense when you had like 1 or 2 or 4 processor cores
<rnicholl1> anything more than that though.. just sleep immediately
<mjg> you missed the part where going off cpu is so frequent that all the spinning is on the locks needed to go off cpu
<mrvn> You want a per-core malloc that pre-allocates dynamically
<rnicholl1> less an issue with the kernel, where you can guarantee the other thread is running
<mrvn> mjg: don't run significantly more threads than cores
<rnicholl1> per core? how would that work
<rnicholl1> os can reschedule threads to different core at any time
<rnicholl1> per thread is possible
<rnicholl1> but ...dumb
<mrvn> rnicholl1: you can check the core-id when entering malloc and then use that core's pool.
<rnicholl1> Hum, you still need a lock though
<rnicholl1> Any other thread could get scheduled onto the same core
<mrvn> that's less of a problem since that rarely happens. But you can have multiple threads on the same core.
<mrvn> But if the malloc lock is held then you pretty much know another thread on the core is holding it so you can just go to sleep.
<mrvn> Which then allows the owning thread to wake up and finish. And you don't have contention between cores.
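The per-core pool mrvn describes could look roughly like the C++ sketch below. It assumes Linux with glibc for `sched_getcpu()`; the names `pool_alloc`, `pool_free`, `kPools`, and `kBlockSize` are hypothetical, and `std::mutex` stands in for the "go to sleep if the lock is held" behaviour. A thread can still migrate between reading the core id and taking the lock, which is why each pool keeps a lock at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <sched.h>  // sched_getcpu(): glibc/Linux specific (assumption)

constexpr int kPools = 64;              // hypothetical upper bound on pools
constexpr std::size_t kBlockSize = 64;  // every block has this fixed size

// Header stored in front of each block; remembers the owning pool.
struct alignas(std::max_align_t) Header {
    Header* next;
    int pool;
};

struct Pool {
    std::mutex lock;             // blocks (sleeps) when contended
    Header* free_list = nullptr;
};

static Pool g_pools[kPools];

// Map the core we are currently running on to a pool index.
static int current_pool() {
    int cpu = sched_getcpu();
    return cpu < 0 ? 0 : cpu % kPools;
}

void* pool_alloc() {
    int idx = current_pool();
    Pool& p = g_pools[idx];
    Header* h = nullptr;
    {
        std::lock_guard<std::mutex> guard(p.lock);
        h = p.free_list;
        if (h) p.free_list = h->next;
    }
    if (!h) {
        // Pool is empty: fall back to the system allocator.
        h = static_cast<Header*>(std::malloc(sizeof(Header) + kBlockSize));
        if (!h) return nullptr;
    }
    h->pool = idx;
    return h + 1;  // user memory starts right after the header
}

void pool_free(void* ptr) {
    Header* h = static_cast<Header*>(ptr) - 1;
    assert(h->pool >= 0 && h->pool < kPools);
    Pool& p = g_pools[h->pool];  // return the block to the pool it came from
    std::lock_guard<std::mutex> guard(p.lock);
    h->next = p.free_list;
    p.free_list = h;
}
```

Contention only arises between threads that happen to share a core (or a pool index), so cross-core contention on the allocator lock goes away in the common case.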
<mrvn> But really, why are you using more threads than cores? That's usually stupid or a sign of a bad language that uses threads for IO.
<mrvn> Threads should have a dont-preempt flag that's writable from userspace that makes the kernel not migrate the thread and not switch to another thread from the same process on that core.
<rnicholl1> yeah that would be pretty neat
<rnicholl1> I was actually thinking of doing that if I ever get my code working
<rnicholl1> but I am a filthy casual when it comes to kernel programming
<rnicholl1> and might never build a working OS
<rnicholl1> I am starting to get the feeling that Linux is bloated though
<rnicholl1> somehow, windows is more efficient at scheduling now than linux
<kazinsal> one of these days I'll finally stop fiddling around with the half dozen dumb side projects and actually go back to working on my "main" osdev project
<rnicholl1> 10 years ago I would have laughed if someone said windows had a more efficient scheduler than linux
<rnicholl1> But the benchmarks don't lie
<rnicholl1> Microsoft has been improving windows and Windows 11 is actually decent
<rnicholl1> Meanwhile, Linux is resting on laurels, so to speak
<rnicholl1> In some ways, I think windows NT was way ahead of its time
<rnicholl1> A lot of the async stuff that windows NT had that seems silly and wasteful on a 1/2 core system is paying dividends on modern 16-core cpus
<rnicholl1> Linux's approach was designed for single thread with async tacked on as an afterthought
<rnicholl1> io-uring might help a little but still has the same fundamental issue, it's a band-aid for a flawed interface
<rnicholl1> basically all linux system calls are designed for synchronous operation
<rnicholl1> There are huge gaping holes too
<rnicholl1> like no way to async-write usb devices
<rnicholl1> cant use epoll for that
<rnicholl1> blocking or go home
<immibis> you could also see windows's security model
<immibis> vs linux's
<immibis> windows lets you do what you want, and linux doesn't
<kazinsal> I think there's been changes in the ideology of release types and how much you're allowed to "go fast and break things" on both sides of the fence there
<rnicholl1> yeah, linux is stuck in the 90s, in more ways than 1
<kazinsal> since Microsoft maintains multiple different branches of the OS they have more freedom to just try weird things out on the faster release branches
<immibis> when you get into things like SACLs it becomes quite clear that microsoft had security experts to advise their security design
<kazinsal> whereas you know that the LTSB will always be a functional snapshot of a specific point in the release history but with security updates for several years
<kazinsal> for a while I was running Insider Preview builds of Windows 10 and it was interesting to see what they would just throw at the wall to see what would stick
<rnicholl1> Honestly, the only good point about linux these days is that it's open source
<immibis> linux's advantage is customizability. Imagine just writing a new scheduler for windows. You can't. Only microsoft can do that.
<rnicholl1> I think in most other ways, aside from open source/customizability, linux is worse. well maybe stability too.
<immibis> mrvn: locks and threads might be entirely the wrong paradigm. Consider message passing instead
<rnicholl1> Message passing is slower than locks for sure
<kazinsal> the fact so many appliances etc. rely on the linux kernel to be stable and provide a specific set of features with specific performance requirements without having constant regressions between releases is kind of a downer
<rnicholl1> how do you implement message passing
<rnicholl1> answer: with locks
<kazinsal> but it's a necessity
<rnicholl1> Linux is great for embedded applications
<immibis> do you necessarily? you implement it that way if you're on a system with threads and locks
<kazinsal> if 5.15.103 works fine but 5.15.104 suddenly loses 2% performance on average in kvm guests and that was just considered an acceptable risk you'd have vendors screaming from the rooftops
<immibis> why not implement message passing with a lockless ring buffer (forcing synchronization when it appears full or empty)?
<immibis> (i'm sure there's a good reason why not)
<rnicholl1> because... how do you implement a lockless ring buffer?
<rnicholl1> CAS?
<rnicholl1> that's the same as a mutex lock
<immibis> you just write the data to memory in a ring, you have a head pointer and a tail pointer
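A head-and-tail ring like the one immibis describes can be written without CAS when there is exactly one producer and one consumer. The C++ sketch below assumes that restriction; `SpscRing`, `push`, and `pop` are hypothetical names, and returning `false` on full or empty is the point where a caller would fall back to some blocking synchronization.

```cpp
#include <atomic>
#include <cstddef>

// Single-producer / single-consumer ring buffer.
// Only the producer writes tail_, only the consumer writes head_,
// so plain acquire/release loads and stores are enough.
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2 && (N & (N - 1)) == 0, "N must be a power of two");

    T slots_[N];
    std::atomic<std::size_t> head_{0};  // next index to read
    std::atomic<std::size_t> tail_{0};  // next index to write

public:
    // Called by the producer only. Returns false if the ring is full.
    bool push(const T& value) {
        std::size_t t = tail_.load(std::memory_order_relaxed);
        std::size_t h = head_.load(std::memory_order_acquire);
        if (t - h == N) return false;  // full
        slots_[t & (N - 1)] = value;
        // Publish the slot: the consumer's acquire load of tail_ sees the data.
        tail_.store(t + 1, std::memory_order_release);
        return true;
    }

    // Called by the consumer only. Returns false if the ring is empty.
    bool pop(T& out) {
        std::size_t h = head_.load(std::memory_order_relaxed);
        std::size_t t = tail_.load(std::memory_order_acquire);
        if (h == t) return false;  // empty
        out = slots_[h & (N - 1)];
        // Release the slot back to the producer.
        head_.store(h + 1, std::memory_order_release);
        return true;
    }
};
```

With more than one producer or consumer the indices would need CAS or a lock, which is the case rnicholl1 is pointing at.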
<rnicholl1> assuming happy path no contention
<rnicholl1> What do you think lock does?
<immibis> wait for something
<rnicholl1> only if it's already locked
<rnicholl1> GNU's mutex implementation is kinda horrible though
<rnicholl1> If you DIY it you can make much better ones
<rnicholl1> GNU literally benchmarks the mutex in a loop with multiple threads going after the same mutex
<rnicholl1> like how realistic is that
<rnicholl1> answer: not very
<immibis> so you're complaining they benchmark the contended case?
<rnicholl1> They optimized for an unrealistic scenario
<immibis> seems like one of many realistic scenarios to me
<immibis> how do you think mutices are realistically used?
<rnicholl1> The right way to make mutexes, imo, is to make them as small as possible
<rnicholl1> GNU optimized for stuff like "fairness"
<rnicholl1> Stuff that should be implemented as additional layers on top of mutexes if you need it
<rnicholl1> like userland task scheduling
<immibis> unfair mutices can create counter-intuitive problems. Fair ones "just work" more often, despite being heavier.
<rnicholl1> true but it's not worth optimizing for bad code when it makes good code slower
<rnicholl1> imo anyway
<immibis> if you can deal with a less general faster data structure you should perhaps have to request it
<rnicholl1> If you write code that spins instead of using condition variables, that's your fault. The standard library shouldn't "fix" your errors.
<rnicholl1> Or maybe they should implement things with the minimum generality specified by C and C++ standards
<rnicholl1> and if you want a more general mutex than the one specified by C / C++, you ask for it
<rnicholl1> instead, std::mutex and mtx_t are just wrappers around pthread_mutexes
<rnicholl1> despite pthread_mutex_t being a much more heavyweight mutex than what std::mutex/mtx_t is required to do
<rnicholl1> GNU got lazy
<rnicholl1> It could have been way more efficient
<immibis> write your own
<immibis> isn't that the beauty of open-source?
<rnicholl1> yeah sure, I've written mutexes before
<rnicholl1> 4-byte mutex is easy
<immibis> imagine if people were still calling non-standard OS functions like bzero
<rnicholl1> would rather std::mutex be more efficient though
<immibis> did you know what happened when memcpy worked backwards sometimes?
rnicholl1 has quit [Quit: My laptop has gone to sleep.]
rnicholl1 has joined #osdev
<rnicholl1> I wish my laptop would stay online when I closed the lid for 2 minutes heh
<moon-child> rnicholl1: there are waitfree queues
<rnicholl1> you still need to wait when there's no message to act on
<rnicholl1> unless.. god forbid
<rnicholl1> you spin when there's no messages
<rnicholl1> Besides, wait-free is just a pipe dream
<rnicholl1> it's kinda a dumb concept, imo
<rnicholl1> Based on some aversion to locks
<rnicholl1> on a theory that the code would be faster if there were no locks
<rnicholl1> This isn't actually true
<moon-child> waitfree is a model. All models are wrong, but some are useful
<rnicholl1> On a fundamental level, the hardware is using locks to implement wait-free algorithms
<moon-child> no one claims that waitfree code is universally faster than non-wait-free code because of the lack of locks
<moon-child> wait freedom means that a process is guaranteed to make progress in a bounded number of time steps. Most computers contain a constant number of processors. Therefore, they can in fact implement wait free algorithms
<rnicholl1> Except that it isn't guaranteed to make progress because the OS scheduler might not schedule them to run
<zid> Depends on what the sched guarantees
<moon-child> you said 'CAS? that's the same as a mutex lock'. It's not. It is different and has different performance properties
<zid> paring weird locks and a weird sched is just a weird os
<zid> pairing*
<moon-child> rnicholl1: I was imagining that we were below the level of scheduling user-level processes ('infinite cpu time' abstraction)
<zid> hopefully you're not on such a system
<rnicholl1> CAS is how I implement mutex locks
<moon-child> FUN FACT
<moon-child> cas can be used to implement all sorts of things
<rnicholl1> It's the most performant way if you don't try to implement fairness in userspace
<rnicholl1> try once, call kernel on fail
<rnicholl1> that's how I implement locking
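The "try once, call kernel on fail" scheme can be sketched on Linux with a single CAS on the fast path and the `futex` syscall on the slow path. This is a minimal three-state variant and not rnicholl1's actual code; `FutexMutex` is a hypothetical name, and treating a `std::atomic<std::uint32_t>` as a plain 32-bit futex word is an assumption that holds on mainstream Linux toolchains.

```cpp
#include <atomic>
#include <cstdint>
#include <linux/futex.h>   // FUTEX_WAIT_PRIVATE, FUTEX_WAKE_PRIVATE
#include <sys/syscall.h>   // SYS_futex
#include <unistd.h>        // syscall()

// 4-byte mutex. State: 0 = free, 1 = locked, 2 = locked with possible waiters.
class FutexMutex {
    std::atomic<std::uint32_t> state_{0};
    static_assert(sizeof(std::atomic<std::uint32_t>) == 4,
                  "futex needs a 32-bit word");

    // Thin wrapper over the raw syscall (Linux only).
    long futex(int op, std::uint32_t val) {
        return syscall(SYS_futex, reinterpret_cast<std::uint32_t*>(&state_),
                       op, val, nullptr, nullptr, 0);
    }

public:
    bool try_lock() {
        std::uint32_t expected = 0;
        return state_.compare_exchange_strong(expected, 1,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed);
    }

    void lock() {
        // Fast path: one CAS, no syscall when uncontended.
        if (try_lock()) return;
        // Slow path: mark the lock as contended and sleep in the kernel
        // until the holder wakes us; no spinning in userspace.
        while (state_.exchange(2, std::memory_order_acquire) != 0) {
            futex(FUTEX_WAIT_PRIVATE, 2);  // returns at once if state_ != 2
        }
    }

    void unlock() {
        // Only enter the kernel if someone may be waiting.
        if (state_.exchange(0, std::memory_order_release) == 2) {
            futex(FUTEX_WAKE_PRIVATE, 1);  // wake one waiter
        }
    }
};
```

The uncontended lock/unlock pair never leaves userspace, and there is no fairness guarantee: whichever thread wins the next exchange gets the lock.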
<rnicholl1> spin locks scale reaaaaallly badly on many-core systems