#osdev on 2022-08-08 — irc logs at libera.irclog.whitequark.org

2021-05-23 01:57 klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books

00:00 <immibis> other people have done some work (but not on the dram controller), it's linux-sunxi

00:00 <clever> immibis: the ddr2 controller was RE'd and is in basically the same state: https://github.com/librerpi/rpi-open-firmware/blob/master/firmware/sdram.c

00:00 <bslsk05> github.com: rpi-open-firmware/sdram.c at master · librerpi/rpi-open-firmware · GitHub

00:01 <clever> for the ddr2, its just a bunch of magic register writes, and some of the numbers make some sense

00:01 <clever> for the bcm2711 ddr4, you need to copy several 20kb blobs into the ddr4 controller first

00:01 <clever> along with register writes

00:02 <immibis> firmware blobs are debatably maybe not actually a problem

00:02 <immibis> unless the licence says they are

00:03 <clever> immibis: yeah, thats a second issue towards what heat asked, i may not be allowed to re-distribute modified blobs that include tianocore

00:03 <heat> blobs of what?

00:03 <immibis> was it accolade vs sega where a court said that if you require a blob for your system to work, you can't enforce the trademark on that blob? If the court is feeling particularly anti-nefarious maybe they would rule the same on copyright, but I doubt it, since that would make oligarchs less rich

00:03 <heat> why do you need to modify them?

00:03 <immibis> Blobs of bytes

00:03 <heat> ty

00:03 <heat> blobs of mrc code?

00:04 <heat> or a blob of tianocore?

00:04 <clever> one sec

00:04 <zid> nintendo's attempt was to put the nintendo logo onto the cart, then hash the cart header

00:04 Vercas6 has quit [Quit: Ping timeout (120 seconds)]

00:04 <clever> -rw-r--r-- 1 clever users 47K Feb 8 18:16 bootcode.bin

00:04 <zid> which I always found kind of genius

00:04 <clever> heat: this blob runs on the VPU, it deals with initializing the ddr4, and loading the other ddr4 blobs

00:04 <clever> -rw-r--r-- 1 clever users 4.1K Feb 8 18:16 mcb.bin

00:04 <clever> -rw-r--r-- 1 clever users 14K Feb 8 18:16 memsys00.bin

00:05 <clever> there are 8 memsys files, and all of these are used in bringing ddr4 online

00:05 <clever> -rw-r--r-- 1 clever users 234K Feb 8 18:20 bootmain.elf

00:05 <clever> after the ddr4 is online, bootcode.bin runs bootmain.elf, and the sha256 of bootmain.elf is held inside bootcode.bin

00:05 <clever> so, if i want to replace bootmain, i have to fix the hash in bootcode.bin (and resign it)

00:05 <clever> and now i must ship a modified blob

00:06 <clever> and what does the license say about that?

00:06 <immibis> I don't think it was Nintendo in the court case. Btw the Nintendo one isn't hashed.

00:07 <clever> https://github.com/raspberrypi/rpi-eeprom/blob/master/LICENSE

00:07 <bslsk05> github.com: rpi-eeprom/LICENSE at master · raspberrypi/rpi-eeprom · GitHub

00:07 <clever> > Redistribution and use in binary form, without modification, are permitted provided that the following conditions are met:

00:08 <clever> heat: this implies that modifying the pieeprom.bin files in any way is not allowed, but there seems to be a problem here, because the repo includes a python script for modifying a .txt file inside the .bin

00:08 <clever> and sending those modified files is a form of support you can do

00:09 Vercas6 has joined #osdev

00:10 heat has quit [Remote host closed the connection]

00:12 <immibis> Asking for support isn't necessarily considered distribution

00:12 heat has joined #osdev

00:12 <immibis> Here we see the annoying part of an adversarial legal system: the oligarch will assert all the rights they want, regardless of which ones they are actually entitled to. The only way to find out they asserted a right they don't have is to try violating it and risk prison time.

00:13 <clever> immibis: yeah, there are 2 different levels, 1: using an official python script to embed a .txt into a .bin, and sending the result to a user who cant figure that out

00:13 <clever> and 2: replacing the code inside the .bin, to make it do something entirely different from normal

00:13 <clever> technically both are modifying the .bin file

00:14 <immibis> You didn't modify the file, you just constructed a new file containing unmodified parts of the original :)

00:14 <clever> and how much can i play that card? :P

00:16 <heat> yeah erm sounds like you're screwed

00:16 <clever> heat: i could just provide a script that modifies things for the end-user

00:17 <clever> its not "Redistribution" if you never distribute it

00:17 <heat> i guess you could pull that card

00:18 <clever> but then the end-user also needs the signing keys

00:18 <heat> it'd be great if you could find your way around that

00:18 <clever> you can read the keys from a custom start4.elf

00:18 <heat> make bootmain.elf jump to your code

00:19 <clever> bootmain.elf's job, is to load start4.elf from a supported media, SD/USB/TFTP/NVME/HTTPS

00:19 <immibis> You can play that card until they sue you, which is most likely eternity. But it's not guaranteed to be eternity.

00:19 <heat> i'm sure you could find a way to accidentally corrupt things

00:19 <clever> if i keep the official bootmain.elf, then i can only boot from one of the above sources

00:19 <clever> if i use a custom bootmain.elf, then i can boot from anything i want

00:19 <immibis> Observe how intellectual property laws oppress the common man

00:20 <heat> you're making this into a class thing

00:20 <heat> it's not a class thing

00:20 <clever> heat: yeah, ive not been looking for buffer overflow exploits in depth, more just how the code is meant to function

00:20 <heat> ddr4 controller people write MRC code, you get the blob because they're scared to share what's there

00:21 <immibis> heat: every thing is a class thing in the end. If only because it creates its own classes

00:21 <immibis> what does MRC stand for?

00:21 <heat> you get the blob with the condition that you don't modify it if redistributing, end users get it too

00:21 <heat> memory reference code

00:21 <clever> -rw-r--r-- 1 clever users 61K May 5 2020 bootcode.bin

00:21 <heat> intel term for ddr2/3/4/5 training code

00:21 <clever> heat: the older bootcode.bin, did both dram init, and loading of start4.elf from sd/usb/tftp/nvme

00:22 <clever> so its clearly being built by RPF, and is a mix of ddr4 source, and bootloader source

00:22 <clever> the design of the code also matches the older pre-ddr4 firmware

00:22 <clever> so it seems like RPF got ddr4 driver source, and integrated it into the existing bootloader

00:22 <heat> you sure?

00:22 <heat> i bet it's just blobs in blobs

00:23 <clever> the bootcode.bin has no real blobs hidden in it

00:23 <clever> its all VPU asm, strings, and small binary constants (like gpt uuid's)

00:23 <heat> you said it loaded more blobs though?

00:23 <clever> those are separate files, clearly tagged with a size&name

00:23 <immibis> How is VPU assembly not a blob?

00:23 <heat> i bet that's where the secret sauce is

00:24 <clever> and just get copied to a dedicated area and the ddr4 deals with it

00:24 <clever> immibis: because i can decompile it, so its effectively source

00:24 <immibis> so a blob is anything you can't decompile? Strange definition

00:24 <clever> well, more, that there are no non-vpu blobs mixed in with the vpu asm

00:25 <clever> so i can look at any given byte-range, and tell you what its doing

00:25 <clever> the vpu asm is still a blob, but its a blob i can understand

00:25 <heat> in intel platforms you get the Intel FSP and you integrate it with your platform code to do magical things, like setting up Top Secret platform stuff and train memory

00:25 <clever> but the mcb.bin/memsys0{0-7}.bin are unknown blobs

00:26 <heat> vendors don't get to see the FSP

00:26 <clever> heat: yeah, the VPU kind of acts like the FSP, but the entire bootcode.bin+bootmain.elf+start4.elf takes place on the VPU, and start4.elf must live on SD/USB/TFTP/NVME

00:26 <heat> the term Memory reference code comes from a time where they actually needed to share the code, but things were much stricter and no one could see it

00:26 <clever> and if you want to boot from something else, you cant

00:27 <heat> s/no one/only intel partners/

00:27 <heat> these days, it's just a blob in your blob

00:27 <heat> defined API, you call it, it works

00:28 <heat> there's even a dispatch mode where it's integrated seamlessly with the rest of your UEFI PEI

00:28 <heat> could you not see where the memory training function starts and call that?

00:29 <clever> dont really have that option on the rpi, the 47kb of VPU asm expects to be loaded to a specific addr, and i only have 128kb of ram to work with, and my own blob is loaded to the same addr!

00:30 <clever> heat: i could, but id have to manually figure out all of the relocation patching by hand, or move my own binary out of the way and load it to the right addr

00:30 <heat> well yes, but such is life :)

00:30 <clever> or just leave the bootcode.bin as the 1st stage, and have it run my code after raminit

00:30 <clever> thats far simpler

00:31 <clever> for that, i just have to replace bootmain.elf, and patch the sha256 inside bootcode.bin, and re-sign it

00:32 <heat> but that's not legally viable

00:32 <clever> the relocation and run the blob thing, has 2 different routes

00:33 <clever> 1: my initial bootcode.bin then relocates/loads to +47kb, to create a 47kb hole at the load addr, loads the original bootcode.bin, and calls into it, then continues to boot

00:34 <clever> 2: my initial bootcode.bin relocates/loads to +47kb, loads the original bootcode.bin, does the sha256 patching, and then jumps to its entry-point, it then does raminit and runs bootmain.elf

00:34 <clever> 1 would break with every update, and i have to find the right function

00:35 <clever> 2 is just patching it at runtime, and thats far simpler
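
[Editor's sketch] The runtime patching in route 2 could look roughly like the following: search the loaded first-stage image for the known SHA-256 of the stock bootmain.elf and overwrite it with the digest of the replacement. This assumes the digest is stored as 32 raw bytes somewhere in the image, which the log does not confirm; `patch_digest` and its parameters are hypothetical names, and computing the digests and re-signing are out of scope here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_LEN 32 /* size of a SHA-256 digest in bytes */

/*
 * Hypothetical helper: find the first occurrence of old_digest inside
 * image[0..image_len) and overwrite it in place with new_digest.
 * Returns the byte offset that was patched, or -1 if old_digest is absent.
 * Assumes the digest is embedded as raw bytes (not hex, not split up).
 */
long patch_digest(uint8_t *image, size_t image_len,
                  const uint8_t old_digest[DIGEST_LEN],
                  const uint8_t new_digest[DIGEST_LEN])
{
    assert(image != NULL && old_digest != NULL && new_digest != NULL);

    for (size_t off = 0; off + DIGEST_LEN <= image_len; off++) {
        if (memcmp(image + off, old_digest, DIGEST_LEN) == 0) {
            memcpy(image + off, new_digest, DIGEST_LEN);
            return (long)off;
        }
    }
    return -1; /* digest not found: image is not the expected version */
}
```

Returning -1 when the old digest is missing gives the loader a cheap way to refuse an image version it was not written for, instead of jumping into an unpatched first stage.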

00:36 <clever> in theory, i could write a loading stub in asm (very small), put it at the top of 128kb, then that copies bootcode.bin from spi->L2, patches, and runs it

00:36 <clever> then bootcode.bin is very small, and just memcpy's that stub out of the way, and runs it

00:44 <zid> immibis: nintendo logo is hashed, the gb just doesn't actually *care* about the hash

00:45 <clever> and thats more in the realm of the signing on the .bin file

00:46 <clever> from the factory, the .bin file (on sd or spi) must be signed with an hmac-sha1 key

00:47 <clever> and the only changes you can do, is to make it more restricted, enabling RSA with an unknown private

00:48 Matt|home has joined #osdev

00:49 <clever> barring exploits/bugs in the boot rom, which i havent found yet on the 2711

00:49 <clever> the only exploit ive found was a timing exploit on the 2835's sig checking, but that doesnt even check sigs on the pi0/pi1

00:53 <clever> and a minor bug in the gpio boot mode stuff, but nothing that could be exploited

00:59 <immibis> and yet you wrote your own boot rom. How does that not make security irrelevant?

00:59 <immibis> well, not rom

01:03 <clever> immibis: the entire pi0-pi3 lineup has signature checks disabled by default, so it will just run any bootcode.bin it finds on the SD card

01:04 SGautam has joined #osdev

01:04 <clever> immibis: the bcm2711 has hmac-sha1 checks enabled by default, but the start4.elf it loads later isnt verified, and can dump enough data to reconstruct the key

01:05 <immibis> By the way I worked at a place where we had the source code for the DRAM training in modified u-boot. It doesn't make it any better

01:06 <clever> immibis: my goal is less about changing the dram init, and more about changing what happens after dram init

01:06 <immibis> (We didn't need that source code, but we had it as part of the board support package. We also had a register listing of the entire SoC - still not useful as half of them only had bit names and no descriptions)

01:06 <clever> same for the rpi

01:06 <clever> https://github.com/librerpi/rpi-open-firmware/blob/master/common/broadcom/bcm2708_chip/arm_control.h#L10-L21

01:07 <bslsk05> github.com: rpi-open-firmware/arm_control.h at master · librerpi/rpi-open-firmware · GitHub

01:07 <immibis> I think they just hack the hardware and software together until it works, then ship it. The reason for no documentation is because they just ask the hardware people

01:07 <clever> immibis: addr, bit field masks, but this file is in a different format from normal

01:07 <clever> and even has typos

01:07 <clever> https://github.com/librerpi/rpi-open-firmware/blob/master/common/broadcom/bcm2708_chip/hvs.h#L4-L17

01:07 <bslsk05> github.com: rpi-open-firmware/hvs.h at master · librerpi/rpi-open-firmware · GitHub

01:07 <clever> immibis: nearly all of the files are in this format

01:08 <clever> but some peripherals are oddly blank

01:08 <clever> either they deleted something, or they just never had the definitions

01:11 <heat> is it possible to toggle 32-bit code generation for a single function? x86

01:11 <heat> or can I only do it for compilation units?

01:14 <immibis> I think only for compilation units. Doesn't it actually run a different compiler?

01:14 <immibis> well I don't know if it does, actually

01:15 <heat> /shrug

01:15 <heat> that makes things significantly yuckier

01:15 <heat> unless I switch to 64-bit first

01:15 <heat> also an option

01:23 <heat> this is going to be hairy

01:24 <clever> heat: ive heard something about some modern bios's switching to 64bit mode, before the dram is even online

01:25 <heat> yes you're correct

01:25 <heat> not sure if it has been shipped though

01:25 Vercas6 has quit [Remote host closed the connection]

01:25 <clever> if you know what address you're loading to, you could pre-create some identity paging tables in your .data

01:26 <heat> theoretically your whole PEI stage (which has premem and postmem code) can do that

01:26 Vercas6 has joined #osdev

01:26 <heat> well, this is relocation code

01:26 <clever> and then you just have to setup the page table root and enable 64bit mode?

01:26 <heat> :)

01:26 <clever> then your asm will also have to generate paging tables, at least one page, enough to run some compiled C code

01:26 <heat> I know

01:26 <heat> I need to get the load bias because there's no such thing as rip-relative addressing in 32-bit mode

01:27 <heat> :^

01:27 <heat> fucking shoot me this is horrible

01:27 <clever> abuse the call opcode?

01:27 <clever> call 1f \n 1: pop ra

01:27 <heat> i don't have a stack yet

01:28 <heat> :)

01:28 <heat> well, erm, theoretically I can use the bootloader's stack I guess?

01:28 <clever> and you dont know your load addr, so you cant point the stack to .bss

01:28 ghee has joined #osdev

01:28 <heat> yup

01:28 <clever> yeah

01:28 <clever> it likely has at least one word free

01:35 ghee has quit [Quit: EOF]

01:43 zhiayang has quit [Quit: oof.]

01:45 zaquest has quit [Remote host closed the connection]

01:46 zhiayang has joined #osdev

01:47 [itchyjunk] has joined #osdev

01:49 zaquest has joined #osdev

02:03 lawliet has quit [Quit: WeeChat 3.0]

02:07 srjek has quit [Ping timeout: 255 seconds]

02:39 chartreuse has quit [Read error: Connection reset by peer]

02:50 foudfou has quit [Remote host closed the connection]

02:50 gxt___ has quit [Read error: Connection reset by peer]

02:50 Vercas6 has quit [Write error: Connection reset by peer]

02:50 foudfou has joined #osdev

02:50 Vercas6 has joined #osdev

03:01 skipwich has quit [Ping timeout: 268 seconds]

03:02 chartreuse has joined #osdev

03:03 gog has quit [Ping timeout: 252 seconds]

03:09 [itchyjunk] has quit [Remote host closed the connection]

03:20 hyenasky has joined #osdev

03:23 <hyenasky> anyone have experience with paged pool implementation

03:23 <hyenasky> i.e. swappable kernel heap

03:32 saltd has quit [Quit: joins #libera]

03:33 SGautam has quit [Quit: Connection closed for inactivity]

03:33 <heat> https://github.com/heatd/Onyx/commit/cab4f36a0b8dd0fa0ce24f4b2d696a33f7af6ca3#diff-23043e8456f1ed18d9b64887ca1c2c984c5b3d5e58d1a26c4a2d54e566161a67R108

03:33 <bslsk05> github.com: x86: Add physical memory relocation · heatd/Onyx@cab4f36 · GitHub

03:33 <heat> lea is overpowered

03:34 <clever> yeah, lea is amazing for PIC code

03:35 <heat> adding this was actually not that hard

03:35 <heat> the hard part was figuring out where I was assuming phys = sym - virt base and virt = phys + virt base

03:35 <heat> since it was failing to take into account the load bias
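
[Editor's sketch] The fix heat describes amounts to carrying the load bias through every symbol-to-physical conversion. A minimal illustration, assuming the load bias is defined as (actual load address minus link-time load address); `KERNEL_VIRT_BASE` and both function names are made up for this example and are not taken from Onyx.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical link-time virtual base of the kernel image. */
#define KERNEL_VIRT_BASE 0xffffffff80000000ULL

/*
 * Old assumption:  phys = sym - virt_base
 * With relocation: phys = sym - virt_base + load_bias
 */
uint64_t sym_to_phys(uint64_t sym, uint64_t load_bias)
{
    assert(sym >= KERNEL_VIRT_BASE);
    return sym - KERNEL_VIRT_BASE + load_bias;
}

/* Inverse mapping: virt = phys - load_bias + virt_base. */
uint64_t phys_to_sym(uint64_t phys, uint64_t load_bias)
{
    assert(phys >= load_bias);
    return phys - load_bias + KERNEL_VIRT_BASE;
}
```

With a bias of zero both functions reduce to the original formulas, which is why the missing term only shows up once the bootloader places the image somewhere other than its link address.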

03:37 <heat> I didn't actually need to make the early mmu setup so complicated that I would need C

03:38 <heat> maybe for KASLR I'll add MMU + relocation as a separate step

03:38 <heat> so like jump to 64-bit, and then call C to set up the mmu, reload, jump

03:39 <clever> can 64bit be run with the mmu off??

03:39 <heat> no

03:39 <heat> I would need to set up some basic page tables

03:39 <clever> thats what i thought
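
[Editor's sketch] Since x86 long mode requires paging, the "basic page tables" mentioned here can be as small as three 4 KiB tables that identity-map the first 1 GiB with 2 MiB pages. The layout below is a sketch, not heat's code: `struct early_tables` and `build_identity_map` are hypothetical names, and `tables_phys` (the physical address where the tables will live) is passed in explicitly so the function does not depend on knowing the load address at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITABLE (1ULL << 1)
#define PTE_HUGE     (1ULL << 7)  /* PS bit: a PD entry maps a 2 MiB page */
#define ENTRIES      512
#define HUGE_SIZE    (2ULL << 20) /* 2 MiB */

/* Three consecutive 4 KiB tables; must be placed on a 4 KiB boundary. */
struct early_tables {
    uint64_t pml4[ENTRIES];
    uint64_t pdpt[ENTRIES];
    uint64_t pd[ENTRIES];
};

/*
 * Fill the tables so that virtual 0..1 GiB maps to physical 0..1 GiB.
 * tables_phys is the physical address of *t; it is what would later be
 * loaded into CR3.
 */
void build_identity_map(struct early_tables *t, uint64_t tables_phys)
{
    assert((tables_phys & 0xfffULL) == 0); /* tables must be page aligned */
    memset(t, 0, sizeof *t);

    uint64_t pdpt_phys = tables_phys + offsetof(struct early_tables, pdpt);
    uint64_t pd_phys   = tables_phys + offsetof(struct early_tables, pd);

    t->pml4[0] = pdpt_phys | PTE_PRESENT | PTE_WRITABLE;
    t->pdpt[0] = pd_phys   | PTE_PRESENT | PTE_WRITABLE;

    for (uint64_t i = 0; i < ENTRIES; i++)
        t->pd[i] = (i * HUGE_SIZE) | PTE_PRESENT | PTE_WRITABLE | PTE_HUGE;
}
```

After this, the entry code still has to enable PAE, load CR3, set EFER.LME and turn on CR0.PG before the far jump into 64-bit code; those steps need privileged instructions and are left out of the sketch.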

04:09 poyking16 has joined #osdev

04:26 skipwich has joined #osdev

04:34 heat has quit [Ping timeout: 240 seconds]

05:20 poyking16 has quit [Ping timeout: 245 seconds]

05:30 wxwisiasdf has joined #osdev

05:32 poyking16 has joined #osdev

05:32 wxwisiasdf has quit [Client Quit]

05:39 vai has joined #osdev

05:39 <vai> hi morning from Finland EU

05:45 mzxtuelkl has joined #osdev

06:03 vai has quit [Remote host closed the connection]

06:15 mzxtuelkl has quit [Quit: Leaving]

06:40 scaleww has joined #osdev

06:46 opal has quit [Remote host closed the connection]

06:46 opal has joined #osdev

07:10 mzxtuelkl has joined #osdev

07:20 poyking16 has quit [Ping timeout: 268 seconds]

07:21 poyking16 has joined #osdev

07:29 gmacd has joined #osdev

07:32 poyking16 has quit [Read error: Connection reset by peer]

07:33 poyking16 has joined #osdev

07:39 poyking16 has quit [Ping timeout: 252 seconds]

07:49 poyking16 has joined #osdev

08:07 puck has quit [Excess Flood]

08:07 puck has joined #osdev

08:15 hyenasky has quit [Quit: Client closed]

08:19 DanDan has quit [Ping timeout: 272 seconds]

08:21 opal has quit [Ping timeout: 268 seconds]

08:25 bauen1 has quit [Ping timeout: 268 seconds]

08:27 opal has joined #osdev

08:30 gmacd has quit [Ping timeout: 255 seconds]

09:21 bauen1 has joined #osdev

09:23 GeDaMo has joined #osdev

09:25 gxt___ has joined #osdev

09:28 DanDan has joined #osdev

09:33 poyking16 has quit [Read error: Connection reset by peer]

09:36 poyking16 has joined #osdev

10:00 DanDan has quit [Ping timeout: 245 seconds]

10:12 DanDan has joined #osdev

10:17 gxt___ has quit [Remote host closed the connection]

10:18 gxt___ has joined #osdev

10:39 socksonme_ has joined #osdev

10:57 scaleww has quit [Quit: Leaving]

10:58 scaleww has joined #osdev

11:03 MiningMarsh has quit [Ping timeout: 252 seconds]

11:16 foudfou has quit [Remote host closed the connection]

11:17 gog has joined #osdev

11:17 foudfou has joined #osdev

11:19 nyah has joined #osdev

11:20 heat has joined #osdev

11:35 gxt___ has quit [Remote host closed the connection]

11:36 gxt___ has joined #osdev

11:46 <heat> my nightlies have been broken for like a week because dropbear can't host a fucking tarball

11:51 <psykose> which mirror you using

11:51 gildasio has joined #osdev

11:51 <psykose> australia one seems to be solid

11:53 poyking16 has quit [Ping timeout: 252 seconds]

11:53 <Mutabah> heat: ... yeah... not dropbear's fault

11:53 <Mutabah> The hosting provider is having... issues

11:53 <Mutabah> psykose: Main dropbear site has been down for a week or so

11:54 <psykose> aha, that time again

11:54 <psykose> haha, infinite redirects

11:59 <Mutabah> heat: Wait, your nightlies pull the tarball from matt.ucc.asn.au every night?

12:00 gmacd has joined #osdev

12:01 <heat> yes

12:02 <heat> my package build system doesn't host tarballs

12:02 <heat> just patches

12:03 <Mutabah> huh

12:03 <heat> the build script fetches a tarball and patches it

12:03 <heat> a-la arch

12:03 <heat> and a-la alpine as well I think

12:03 <Mutabah> Not like I can complain, I used to do similar with an entire cross-compiler :)

12:03 <zid> no love for gentoo? :(

12:04 gxt___ has quit [Remote host closed the connection]

12:04 <heat> you need to compile love first

12:04 gxt___ has joined #osdev

12:13 MiningMarsh has joined #osdev

12:14 socksonme_ has quit [Remote host closed the connection]

12:22 gmacd has quit [Ping timeout: 240 seconds]

12:29 Andrew is now known as HaxCPU

12:29 HaxCPU is now known as Andrew

12:32 <GeDaMo> https://love2d.org/

12:32 <bslsk05> love2d.org: LÖVE - Free 2D Game Engine

12:34 Andrew is now known as HaxCUP

12:34 HaxCUP is now known as HaxCPU

12:34 <zid> that was what I was thinking of too GeDaMo :p

12:35 HaxCPU is now known as ANdrew

12:35 ANdrew is now known as Andrew

12:57 Vercas6 has quit [Remote host closed the connection]

12:58 Vercas6 has joined #osdev

13:07 Vercas6 has quit [Remote host closed the connection]

13:07 Vercas6 has joined #osdev

13:14 scaleww has quit [Quit: Leaving]

13:24 srjek has joined #osdev

13:24 xenos1984 has quit [Read error: Connection reset by peer]

13:34 the_lanetly_052 has joined #osdev

13:37 poyking16 has joined #osdev

13:42 xenos1984 has joined #osdev

14:07 socksonme has joined #osdev

14:08 poyking16 has quit [Ping timeout: 244 seconds]

14:10 poyking16 has joined #osdev

14:15 poyking16 has quit [Read error: Connection reset by peer]

14:18 poyking16 has joined #osdev

14:25 <junon> I used LOVE to build a game once. It's good for... simple games. Once you start getting into more complex situations, it quickly falls apart.

14:27 <mjg> there was a free engine whose name just lest my memory

14:27 <mjg> left

14:27 poyking16 has quit [Ping timeout: 268 seconds]

14:27 <mjg> afair it somehow did 2d and 3d games

14:27 <GeDaMo> Godot? Irrlicht?

14:29 carbonfiber has joined #osdev

14:29 <GeDaMo> https://en.wikipedia.org/wiki/List_of_game_engines

14:29 <bslsk05> en.wikipedia.org: List of game engines - Wikipedia

14:29 poyking16 has joined #osdev

14:29 <GeDaMo> I have experience with forgetting stuff :|

14:30 <mjg> unity

14:31 <mjg> only found it because i remembered slender: 8 pages uses it

14:31 <mjg> never used it personally, but i hear it is pretty good

14:52 <zid> heh

14:52 <zid> mjg, like half of all games on steam use it

14:52 <zid> it's like saying "I think I heard of this OS.. micro shaft? winders?"

14:55 socksonme has quit [Ping timeout: 252 seconds]

14:55 <mjg> zid: :)

14:55 <mjg> zid: the newest game i played was released in 2008

14:56 <mjg> zid: so there is that

14:56 <mjg> or slightly later, but the point remains

15:07 the_lanetly_052_ has joined #osdev

15:10 the_lanetly_052 has quit [Ping timeout: 244 seconds]

15:25 dude12312414 has joined #osdev

15:26 dude12312414 has quit [Remote host closed the connection]

15:27 dude12312414 has joined #osdev

15:42 gildasio has quit [Quit: WeeChat 3.6]

15:53 gog` has joined #osdev

15:57 gog has quit [Killed (NickServ (GHOST command used by gog`))]

15:57 gog` is now known as gog

15:57 gog` has joined #osdev

16:04 jafarlihi has joined #osdev

16:05 <jafarlihi> I'm trying to do FreeBSD kernel development but can't figure out getting IDE right for things like autocomplete. Is vim with CoC and bear best option? How do you set up things like CLion for autocomplete? What do you use?

16:06 <clever> jafarlihi: i tend to just use vim + youcompleteme, YCM auto-completes any keyword in any currently open file, so it just magically works, if the .h is open in another :tabe

16:09 <jafarlihi> clever: So I need to have the struct definition .h open in another tab to get autocomplete on fields? yuck

16:11 <bauen1> I use neovim + clangd-13 lsp to get autocomplete for my C / C++ projects, but I'm not sure how well that would work with the FreeBSD kernel

16:12 <clever> jafarlihi: or another .c file thats using it

16:12 <clever> jafarlihi: YCM probably also has better options, i just never bothered to look into how they get setup

16:12 skipwich_ has joined #osdev

16:13 skipwich has quit [Ping timeout: 252 seconds]

16:13 puck has quit [Excess Flood]

16:13 puck has joined #osdev

16:15 Raito_Bezarius has quit [Ping timeout: 255 seconds]

16:22 bauen1 has quit [Ping timeout: 245 seconds]

16:28 Raito_Bezarius has joined #osdev

16:30 Raito_Bezarius has quit [Max SendQ exceeded]

16:31 wootehfoot has joined #osdev

16:31 puck has quit [Excess Flood]

16:32 puck has joined #osdev

16:32 Raito_Bezarius has joined #osdev

16:35 Raito_Bezarius has quit [Max SendQ exceeded]

16:38 Raito_Bezarius has joined #osdev

16:38 gildasio has joined #osdev

16:40 <jafarlihi> /exit

16:40 <jafarlihi> q

16:40 jafarlihi has quit [Quit: WeeChat 3.6]

16:44 wootehfoot has quit [Ping timeout: 240 seconds]

16:47 <heat> he's on freebsd now huh

16:47 <heat> 3rd kernel in like 1 month

16:52 <mjg> chad kernel dev

16:55 <gog> dang i gotta step it up

16:55 <gog> i dev on 0 kernels

16:55 <gog> i started to have ideas last night but fell asleep instead

16:55 <gog> probably will do the same tonight

16:59 carbonfiber has quit [Quit: Connection closed for inactivity]

17:02 <heat> hi vincent van gog

17:02 <heat> I take patches

17:02 <junon> If spinlocks in the kernel are so bad because of interrupts (specifically NMIs) and reentrancy then how do you even begin to achieve a shared resource system?

17:03 <heat> they're not

17:03 <heat> spinlocks are great in the kernel

17:03 <heat> use them as widely as possible for smaller locks

17:03 <heat> larger locks (that can sleep), mutexes, rwlocks, etc

17:04 <junon> How do you handle cases where a spinlock needs to be locked in both normal code as well as during an interrupt? Can spinlocks be reentrant?

17:04 <junon> Otherwise you risk deadlocking, no?

17:04 <junon> Or you have to guarantee that you don't try to lock a spinlock from an NMI and just mask interrupts in the critical section

17:05 <junon> Unless I'm misunderstanding something?

17:06 <junon> Oh someone else has already brought this up a bit - https://forum.osdev.org/viewtopic.php?t=10363

17:06 <bslsk05> forum.osdev.org: OSDev.org • View topic - Spinlocks that disable interrupts...

17:06 <junon> It doesn't address the NMI case though
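
[Editor's sketch] The pattern junon describes (mask interrupts for the duration of the critical section so an interrupt handler on the same CPU can never spin on a lock that CPU already holds) can be sketched as below. Real kernels would save and clear the interrupt flag with privileged instructions; here `sim_irq_enabled`, `irq_save_and_disable` and `irq_restore` are simulated stand-ins so the example runs in user space, and all names are hypothetical. It does nothing about NMIs, which cannot be masked this way; as the discussion notes, NMI handlers simply must not take such locks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simulated per-CPU interrupt-enable flag (stand-in for the real CPU flag). */
static bool sim_irq_enabled = true;

/* Stand-in for "save flags, then disable interrupts". */
static bool irq_save_and_disable(void)
{
    bool was_enabled = sim_irq_enabled;
    sim_irq_enabled = false;
    return was_enabled;
}

/* Stand-in for "restore the saved interrupt state". */
static void irq_restore(bool was_enabled)
{
    sim_irq_enabled = was_enabled;
}

struct irq_spinlock {
    atomic_flag held;
};

/* Disable interrupts first, then spin; returns the state to restore. */
bool irq_spin_lock(struct irq_spinlock *l)
{
    bool was_enabled = irq_save_and_disable();
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* another CPU holds the lock */
    return was_enabled;
}

/* Release the lock, then put interrupts back the way the caller had them. */
void irq_spin_unlock(struct irq_spinlock *l, bool was_enabled)
{
    assert(!sim_irq_enabled); /* must still be masked while holding */
    atomic_flag_clear_explicit(&l->held, memory_order_release);
    irq_restore(was_enabled);
}
```

Saving and restoring the previous state, rather than unconditionally re-enabling, keeps nested lock sections correct: the inner unlock leaves interrupts off until the outermost one is released.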

17:29 bauen1 has joined #osdev

17:33 <gog> NMI via standard exceptions is an unlikely case

17:33 <gog> they're more likely these days to be handled by machine check or jump into SMM

17:34 <gog> ofc the IOAPIC can do nonmasking hmm

17:35 <mjg> heat: you convinced me to take a look at https://github.com/heatd/Onyx/blob/master/kernel/kernel/spinlock.cpp

17:35 <bslsk05> github.com: Onyx/spinlock.cpp at master · heatd/Onyx · GitHub

17:35 <mjg> heat: reading from the lock is pessimal compared to just blindly doing cmpxchg the first time around

17:35 <zid> yea that sounds slow in the fast case for no reason

17:36 <mjg> it slows down contended case as well

17:36 <mjg> at least on amd64

17:37 srjek has quit [Ping timeout: 244 seconds]

17:40 <clever> now that i think about it, i think even just a normal load/store to the same cache line will contest with mutexes on arm

17:40 <mjg> while (__atomic_load_n(&lock->lock, __ATOMIC_RELAXED) != 0)

17:40 <mjg> clipper: did you mean expected_val?

17:41 <mjg> clever: that should be a problem virtually everywhere due to coherency protocols

17:41 <clever> yep

17:41 <clever> but on arm, its somewhat worse

17:42 <clever> arm has no atomic operations

17:42 <mjg> or at least i'm unaware with anyone better than cache line

17:42 <mjg> s/with/of/

17:42 <mjg> doing*

17:42 <clever> on arm, you do load-exclusive, modify the data in a reg, then a conditional store-exclusive

17:42 <clever> if you still had exclusive ownership of the cacheline, the store happens

17:42 <clever> but if somebody stole the cacheline, the atomic fails, and you have to repeat that

17:43 <mjg> unless you have LSE

17:43 <clever> LSE?

17:44 gildasio has quit [Remote host closed the connection]

17:44 dude12312414 has quit [Remote host closed the connection]

17:45 <mjg> large system extensions

17:45 gildasio has joined #osdev

17:45 dude12312414 has joined #osdev

17:45 <mjg> most notably adds compare-and-set

17:45 <mjg> basically you no longer have to ll/sc

17:47 <clever> ahh

17:50 <clever> what i was thinking though, is where x86/lse vs arm differ, if you load during a cmpxchg, then one or the other stalls, while the L1 cache line bounces about

17:50 xenos1984 has quit [Read error: Connection reset by peer]

17:50 <clever> but on arm, if you load during a ll/sc, the atomic fails, and that whole block of opcodes has to repeat

17:51 <clever> so it can possibly be more costly, likely the driving force behind LSE

18:05 <mjg> lse definitely degrades less, but i don't remember numbers

18:08 xenos1984 has joined #osdev

18:20 <clever> mjg: yeah, with true atomic ops, the cpu can atomically do the entire operation in basically 1 clock cycle, once it claims the L1

18:20 <clever> vs the arm style, where it needs many clocks, and can possibly lose the L1 line and have to restart
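
[Editor's sketch] The retry behaviour clever describes is visible in portable C11 as a compare-exchange loop. On AArch64 without LSE, compilers typically lower the weak compare-exchange to a load-exclusive/store-exclusive pair, and the "weak" form is allowed to fail spuriously exactly because the exclusive reservation can be lost; with LSE enabled the same source can become a single atomic instruction. `fetch_add_retry` is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Atomically add delta to *p and return the previous value, written as an
 * explicit retry loop to mirror the load-exclusive / store-exclusive shape.
 */
uint32_t fetch_add_retry(_Atomic uint32_t *p, uint32_t delta)
{
    assert(p != NULL);

    uint32_t old = atomic_load_explicit(p, memory_order_relaxed);

    /* On failure, old is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak_explicit(p, &old, old + delta,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed)) {
        /* lost the race (or the reservation); loop with the new old */
    }
    return old;
}
```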

18:39 heat has quit [Remote host closed the connection]

18:39 heat has joined #osdev

18:43 kpel has joined #osdev

18:46 opal has quit [Ping timeout: 268 seconds]

18:46 opal has joined #osdev

18:52 GeDaMo has quit [Quit: A program is just a bunch of functions in a trenchcoat.]

18:54 dude12312414 has quit [Remote host closed the connection]

18:55 dude12312414 has joined #osdev

19:05 srjek has joined #osdev

19:06 kpel has left #osdev [Leaving]

19:07 gmacd has joined #osdev

19:09 socksonme has joined #osdev

19:16 _koolazer is now known as koolazer

19:27 the_lanetly_052_ has quit [Ping timeout: 252 seconds]

19:33 <heat> mjg, what's your suggestion? cmpxchg and then spin?

19:33 <heat> thanks for code reviewing my shit tho

19:33 <heat> much appreciated

19:34 <heat> I've got 120k more lines for you to review

19:34 <mjg> the most simplistic lock is this: if (cmpxchg(...)) return YAY; do { spin(); } while (atomic_read(&lock) != 0);

19:34 <mjg> or so

19:34 <mjg> and loop back that is
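
[Editor's sketch] Written out in C11, the lock mjg outlines tries the compare-exchange first and only falls back to read-only spinning while the lock is observed held, then loops back to retry. This is one reading of the one-liner above, not code from Onyx; `struct spinlock`, `cpu_relax` and the function names are hypothetical, and the x86 `pause` hint is an added assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct spinlock {
    atomic_uint locked; /* 0 = free, 1 = held */
};

/* Spin-wait hint; a no-op on architectures not handled here. */
static inline void cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();
#endif
}

/* One blind cmpxchg attempt: returns true if we took the lock. */
bool spin_trylock(struct spinlock *l)
{
    unsigned expected = 0;
    return atomic_compare_exchange_strong_explicit(&l->locked, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

void spin_lock(struct spinlock *l)
{
    for (;;) {
        if (spin_trylock(l))
            return; /* uncontended fast path: a single atomic op */

        /* Contended: spin on plain loads until the lock looks free. */
        do {
            cpu_relax();
        } while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0);
    }
}

void spin_unlock(struct spinlock *l)
{
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) != 0);
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

The read-only inner loop keeps waiters from issuing atomic writes while the lock is held; the initial blind cmpxchg avoids the extra load that mjg objected to in the uncontended case.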

19:46 <geist> yeah that'll work pretty well too, i'd start with that

19:46 <geist> it doesn't scale too well, but it's also implementable pretty much any arch, even one with simply a swap instruction

19:50 <heat> https://godbolt.org/z/GMra5ed7M this looks optimal?

19:50 <bslsk05> godbolt.org: Compiler Explorer

19:56 opal has quit [Write error: Connection reset by peer]

19:56 gildasio has quit [Remote host closed the connection]

19:56 <heat> geist, wdym doesn't scale too well?

19:57 <heat> are you talking about spinning there vs mcs locks and whatnot?

19:57 opal has joined #osdev

19:58 gildasio has joined #osdev

20:02 <geist> yeah

20:02 <geist> but it works pretty well

20:02 <heat> AIUI it's not that simple for some reason

20:03 <geist> yah what you have there should be probably okay for x86. for ARM it's a bit more complicated because of wfe/sev/atomic interactions

20:03 <geist> but arm has the example version of that up somewhere

20:03 <heat> linux has a configure option between normal spinning spinlocks and queued spinlocks (MCS locks)

20:03 <heat> so it's not like MCS is always faster I guess? or at least there's a drawback

20:03 <geist> yah. personally i'd write the spinlock in hand asm for each arch, or have a generic one (like that) and then have a per arch version

20:03 <geist> yah there are tradeoffs, i think for small number of cores a spinlock with a pause/wfe is probably more ideal

20:04 <geist> if nothing else because it uses less space, etc

20:04 <heat> how much do you gain from micro-optimizing this?

20:04 <geist> define 'micro optimize'

20:04 <heat> hand-writting it

20:04 <heat> s/tt/t/

20:04 <geist> well, for example, on arm you'd actually get the ability to stop the cores from spinning, which i think is a fairly major power/emulation win

20:05 <geist> since the cores will basically mwait/monitor while the lock is held

20:05 <mjg> mcs is not always faster

20:05 <mjg> in fact mcs tends to be fucking *terrible*

20:05 <geist> indeed. that's why i haven't just jumped into rewriting all of the zircon ones until we really have time to figure out what to switch to

20:05 <geist> it's a compromise

20:05 <mjg> what mcs guarantees is fairness

20:05 <geist> right

20:05 <heat> fucking terrible?

20:05 <mjg> but fairness can demolish performance

20:05 <heat> why?

20:06 <heat> SMT?

20:06 <mjg> no

20:06 <geist> this is where i'd encourage you to go look at how MCS works

20:06 <heat> i have

20:06 <geist> it's clever and interesting, but you can probably see why it's far more complicated and would have, if nothing else, bad cache coherency issues etc

20:06 <mjg> key to slowdown when things are contested is that there are cachelines bouncing back and forth

20:06 <geist> OTOH i haven't looked at it in some time

20:06 <geist> yah

20:06 <geist> (the cacheline stuff)

20:06 <mjg> if the lock is 100% fair, they bounce more than with a greedier lock

20:07 <mjg> this is especially visible if you have a multisocket system

20:07 <mjg> and all the cpus are pounding on the same lock

20:07 <mjg> perf is atrocious, but nobody is highly favored

20:07 <mjg> so it is a pragmatic choice when you can't have HUGE outliers

20:07 <mjg> with a greedy lock you may find someone is starved to death, so to speak

20:08 <geist> heat: re: the wfe thing i'm talking about https://fuchsia.googlesource.com/fuchsia/+/refs/heads/main/zircon/kernel/arch/arm64/spinlock.cc#14 is basically what we arrived at. it's very subtle, though that's basically the ARM reference implementation of a spinning lock without using LSE

20:08 <bslsk05> fuchsia.googlesource.com: zircon/kernel/arch/arm64/spinlock.cc - fuchsia - Git at Google

20:08 <geist> i honestly dont know how to do a WFE style lock with LSE

20:08 <mjg> that said, for most systems, a perfectly OK (not perfect!) lock would just use backoff

20:08 <heat> i'm not at a point where I can read that

20:08 <heat> but looks cute

20:08 <mjg> the standard approach is to increase spin teams

20:08 <mjg> times, geez

20:09 <mjg> 1 spin, 2, 4, and so on up to a predefined limit
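
The backoff scheme mjg outlines (1 spin, 2, 4, and so on up to a limit) could look roughly like the sketch below. The type `backoff_lock_t`, the helpers `relax_once` and `backoff_next`, and the cap `BACKOFF_MAX` are all hypothetical names and values chosen for illustration; the right cap is platform-specific and would need measuring.

```c
#include <stdatomic.h>

/* Assumed cap on the number of spins between attempts. */
#define BACKOFF_MAX 1024u

typedef struct {
    atomic_int locked; /* 0 = free, 1 = held */
} backoff_lock_t;

/* Placeholder for a CPU relaxation hint (PAUSE on x86, for example).
 * As written it is empty, so a compiler may delete the delay loop. */
static inline void relax_once(void) { }

/* Double the delay, saturating at BACKOFF_MAX. */
static inline unsigned backoff_next(unsigned delay)
{
    return delay >= BACKOFF_MAX / 2 ? BACKOFF_MAX : delay * 2;
}

static void backoff_lock(backoff_lock_t *l)
{
    unsigned delay = 1;
    for (;;) {
        int expected = 0;
        if (atomic_compare_exchange_strong_explicit(
                &l->locked, &expected, 1,
                memory_order_acquire, memory_order_relaxed))
            return;
        /* Failed: stay away from the lock word for 'delay' spins,
         * then make the next wait twice as long. */
        for (unsigned i = 0; i < delay; i++)
            relax_once();
        delay = backoff_next(delay);
    }
}

static void backoff_unlock(backoff_lock_t *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

The growing delay trades a little acquisition latency for fewer touches of the contended cache line, which is the effect discussed in the cacheline-bouncing remarks earlier in the log.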

20:09 <geist> heat: yeah i was just pointing out that a standard arch-neutral C version with intrinsics is not a perfectly optimal solution

20:09 <geist> i generally prefer to write that stuff in hand asm so that the compiler doesn't one day decide to do something stupid and tank the implementation

20:09 <mjg> :)

20:10 <mjg> heat: as for the paste https://godbolt.org/z/GMra5ed7M, if you want "optimal", you would have a fast path which just takes the lock and falls to a func call if that fails

20:10 <mjg> compilers like to do nasty stuff when faced with a loop

20:10 <geist> in that link above the real key is between line 22 and 20. if it looks at the old value, sees that it's already acquired, it immediately loops back to a WFE. the cpu will halt until it senses that it lost the exclusivity of the cache line

20:11 <heat> mjg, I tried to add an explicit cmpxchg before it but clang just makes it part of the loop

20:11 <mjg> heat: because you need a different func

20:11 <heat> gcc explicitly creates a fast path there, as you'd like

20:11 <geist> another cpu writing to a cache line that a cpu thinks it has an exclusive lock on is an implicit 'event' in arm world

20:11 <mjg> with attribute(noinline)
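
A sketch of the split mjg suggests: an inline fast path that makes a single attempt, and a separate non-inlined function for the contended case. The names `fp_lock_t`, `fp_lock`, `fp_lock_slow` and `fp_unlock` are hypothetical, and `__attribute__((noinline))` is the GCC/Clang spelling, so this assumes one of those compilers. It is not the code from the godbolt paste.

```c
#include <stdatomic.h>

typedef struct {
    atomic_int locked; /* 0 = free, 1 = held */
} fp_lock_t;

/* Contended path, kept out of line so the spin loop is not merged
 * into every caller of fp_lock(). */
__attribute__((noinline))
static void fp_lock_slow(fp_lock_t *l)
{
    for (;;) {
        /* Spin on loads until the lock looks free. */
        while (atomic_load_explicit(&l->locked,
                                    memory_order_relaxed) != 0)
            ;
        int expected = 0;
        if (atomic_compare_exchange_strong_explicit(
                &l->locked, &expected, 1,
                memory_order_acquire, memory_order_relaxed))
            return;
    }
}

/* Fast path: one cmpxchg, then a plain function call on failure. */
static inline void fp_lock(fp_lock_t *l)
{
    int expected = 0;
    if (atomic_compare_exchange_strong_explicit(
            &l->locked, &expected, 1,
            memory_order_acquire, memory_order_relaxed))
        return;
    fp_lock_slow(l);
}

static inline void fp_unlock(fp_lock_t *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

With this shape the uncontended caller sees only a cmpxchg and a conditional call, whichever way the compiler decides to lay out the loop in the slow function.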

20:11 <mjg> ye i mostly deal with clang and it keeps messing it up

20:11 <mjg> geist: do you have a bench for that wfe thing?

20:12 <geist> it's not a bench it's the fact that the cpu stops spinning

20:12 <geist> and thus uses dramatically less power

20:12 <mjg> but how does it affect performance

20:12 <geist> it may even be slower (though i'm fairly certain thats not the case)

20:12 <geist> i have no idea, but it's not even on the table to *not* do that

20:12 <geist> though i'm fairly certain it is pretty fast

20:13 <mjg> i would definitely try to quantify at least

20:13 <geist> oh sure, it's just been a few years since i looked at it

20:13 <mjg> point being *tiny* contention is everywhere

20:13 <mjg> and you don't to pay big time for being unlucky

20:13 <geist> but it's probably faster even because the other cores stop spinning on it, and bouncing the cache line around

20:13 <mjg> don't want to*

20:13 <geist> but like i said the power savings are basically mandatory

20:14 <geist> it's the equivalent of leaving out the pause instruction on these things on x86

20:14 <mjg> well you may still get them if the lock is contested for more than one or two spins, so to speak

20:14 <geist> you can do it, and it might even be faster, but you simply dont (unless there's an errata)

20:14 <mjg> not having pause will definitely make it *slower*

20:14 <mjg> as you keep fucking up the line

20:14 <mjg> even if you switch to pure reads

20:15 <geist> yah. what i dont understand is how to do the WFE trick on a LSE implementation

20:15 <mjg> well not the line itself but ownership

20:15 <geist> it may well be that you always use the ll/sc variants for spinlocks, because they're more flexible

20:15 <geist> linux is no help because they just use MCS locks on arm64 now, so there's no example to look at

20:15 <geist> and last i checked the BSDs all still use something simple like this

20:15 <geist> using ll/sc

20:16 <mjg> btw, did you mention last time that zircon has reasonably optimized string primitives? memset, memcpy and so on

20:16 <mjg> for arm64?

20:16 <geist> we use the canonical ones that ARM publishes

20:16 <mjg> did you just take cortex strings lib?

20:16 <geist> linux uses them too

20:16 <geist> cortex strings lib

20:16 <mjg> ye that's what freebsd is using

20:17 <mjg> i wonder if that stuff is truly the best you can do (without doing different tradeoffs)

20:17 xenos1984 has quit [Read error: Connection reset by peer]

20:17 <mjg> anyhow, re mcs vs wfe, have you tried asking arm? :)

20:18 <geist> have not. tis the fun part of work: juggling 27 things

20:18 <geist> i tend to try to focus on one or two things at a time

20:18 <mjg> (:

20:19 <geist> or at least it's the only way i've found i can reasonably expect to make forward progress

20:19 opal has quit [Remote host closed the connection]

20:19 <geist> so one day i or someone on the team will look at it, and then we'll see

20:19 opal has joined #osdev

20:19 <mjg> ye i just made a bunch of hacks to freebsd (not committed yet) to unfuck make

20:19 <mjg> talking about "focus"

20:19 <heat> make deserves to be fucked

20:19 <heat> like everyone

20:20 <geist> the biggest problem is we have a few spinlocks that are a source of contention, so really the first strategy is to break them up a bit

20:20 <geist> then optimize the spinlock

20:20 <mjg> this reminds me, do you have anything rcu-like?

20:20 <geist> no

20:20 <mjg> now that is going to put a damper on these efforts

20:20 <geist> there is/was a hard rule to avoid anything rcu like

20:20 <mjg> any plans to get one?

20:20 <geist> like, dont even look at it, because patents

20:20 <mjg> why?

20:21 <mjg> ok

20:21 <mjg> that's fair

20:21 <geist> i think it's all expired, but i dont know because i haven't looked

20:21 <mjg> some of it is, enough to make something which basically works

20:21 <heat> rcu-like is not rcu

20:21 <mjg> at a tolerable scale

20:21 <heat> I think mjg meant like EBR

20:21 <geist> sure, but IANAL

20:22 <mjg> i'm saying sooner than later you will need it

20:22 <heat> epoch(9) in your local freebsd manpages

20:22 <geist> yes, but that doesn't describe the implementation

20:22 <heat> if you're not using freebsd, you're a regular, normal human being

20:22 <mjg> performance aside, it helps avoid deadlocks

20:22 <mjg> as in gets rid of some of possible lock orderings by letting you not take one to begin with

20:22 <geist> indeed. that being said the zircon kernel doesn't have as many locks, etc. i think we prefer to measure and go in eyes open what we're doing

20:23 <geist> find bottlenecks, fix them, reorganize to avoid them, etc

20:23 <mjg> now i'm curious how do you do path lookups

20:23 <geist> basically at level 1 of optimization right now, haven't yet gotten into really fancy data structures yet

20:23 <geist> path lookups?

20:23 <heat> mjg, not in the kernel

20:23 <mjg> oh?

20:23 <heat> THIS IS A MICROKERNEL, BABY

20:23 <mjg> open("/foo/bar/baz", ...);

20:23 <heat> EVERYTHING IS IN USER SPACEEEEEEEEEEEEEEEEEEEEEEEEEEEEE

20:24 <geist> user space

20:24 <mjg> huh

20:24 <geist> and i dont know how they do it

20:24 <mjg> you got some plan9 devs over there?

20:24 <geist> (it's probably slow as hell)

20:24 <heat> yes

20:24 <heat> they do

20:24 <mjg> i'm not talking about pike :p

20:24 <geist> uh no. rob pike works at google, but he doesn't work on fuchsia

20:25 <heat> I'm guessing a good chunk of fuchsia isn't very scalable because it's not a server OS, for now at least

20:25 <geist> exactly

20:25 <geist> that's why it's not *that* big of a deal yet that the spinlocks dont scale, etc.

20:25 <geist> the bigger problem is fairness because of big.LITTLE

20:25 <heat> geist, dhobsd was around 9front

20:25 <heat> which is totally plan9

20:26 <geist> sure

20:26 <geist> anyway, yeah part of zircon is just avoiding a lot of the complex data structures because there's just less code in the kernel anyway

20:26 <geist> by far the most complex part of the kernel is the VM i'd say

20:27 <mjg> does the kernel do networking or is that also outsourced

20:27 <geist> user space

20:28 <geist> no fs, no networking, no drivers (except the handful needed to drive the cpu)

20:28 <mjg> aah classic http://harmful.cat-v.org/software/ (the 9 stuff reminded me of it)

20:28 <bslsk05> harmful.cat-v.org: All software sucks

20:28 <geist> one kernel per arch, etc

20:28 <geist> it's an async ipc design, so closer to mach than L4

20:29 <heat> >less harmful alternatives

20:29 <geist> extremely capability based, so no concept of user or permissions or whatnot. just handles to kernel objects with rights per handle
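
To make "handles to kernel objects with rights per handle" concrete, here is a toy model of a rights-carrying handle. Everything in it (`rights_t`, the `RIGHT_*` bits, `struct kobject`, `struct handle`, `handle_allows`, `handle_dup`) is a hypothetical name invented for this sketch; it is not Zircon's API and makes no claim about how Zircon implements handles.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rights bitmask attached to each handle. */
typedef uint32_t rights_t;
#define RIGHT_READ  (1u << 0)
#define RIGHT_WRITE (1u << 1)
#define RIGHT_DUP   (1u << 2)

/* Stand-in for some kernel object. */
struct kobject {
    int id;
};

/* A handle pairs an object reference with what the holder may do. */
struct handle {
    struct kobject *obj;
    rights_t rights;
};

/* There is no user or permission lookup: an operation is allowed
 * only if the handle itself carries every needed right. */
static bool handle_allows(const struct handle *h, rights_t needed)
{
    return (h->rights & needed) == needed;
}

/* Duplicate a handle with equal or fewer rights, never more. */
static bool handle_dup(const struct handle *src, rights_t wanted,
                       struct handle *out)
{
    if (!handle_allows(src, RIGHT_DUP) || (wanted & ~src->rights) != 0)
        return false;
    out->obj = src->obj;
    out->rights = wanted;
    return true;
}
```

In this model a process can hand a read-only duplicate to another process, and the receiver has no way to regain the write right from it.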

20:29 <heat> >sed

20:29 <heat> fuck yeah baby

20:29 <heat> erm

20:29 <heat> s/sed/ed/g

20:29 <heat> fuck yeah baby

20:30 <mrvn> mjg: files are in a graph and a path lookup walks a path through the graph.

20:31 <gog> yes

20:31 <geist> no doy

20:32 <mjg> harmful:

20:32 <mjg> > FreeBSD, NetBSD, Solaris.

20:32 <mjg> less harmful:

20:32 <mjg> OpenBSD

20:32 <mjg> (:

20:32 <mrvn> mjg: I'm still not sure RCU is all that useful because they make delete almost impossible and if you do have write contention your task will just get stuck in doing the same thing over and over because their update fails.

20:33 <mjg> ?

20:33 <mjg> deleting stuff is pretty trivial

20:33 <heat> openbsd just discredits them entirely

20:34 <mjg> say you keep a linked list of rcu-protected objects

20:34 <mrvn> mjg: you can only delete stuff once all readers are done with it. That's the hard part in RCU and where all the patents are.

20:34 <heat> "or best of all: don't use HTTP."

20:34 <mjg> in that case i have to ask what do you mean by 'delete'

20:34 <heat> what do they want? gopher?

20:34 <mjg> remove from the list

20:34 <mjg> or actually free

20:34 <mrvn> mjg: delete node

20:34 <mjg> delete in c++ parlance as in free?

20:34 <mrvn> c++ delete, free

20:35 <mjg> ye, there is crappery around it, but nothing fundamentally hard to get it to work to begin with

20:35 <mjg> really this starts getting difficult when you want this to work at scale

20:35 <mrvn> And if you aren't working at scale there is no problem with locks to begin with

20:35 xenos1984 has joined #osdev

20:36 <mjg> "scale" for me is definitely way above 100 threads

20:36 <mjg> and even below that avoiding locks is huge

20:36 <mjg> as certain objects are heavily shared, see the path lookup stuff i mentioned the last time

20:36 <mjg> plus you get a perf win single-threaded because you are doing fewer atomics

20:37 <heat> the fd table

20:38 <heat> you could get away with a rw lock but then accept4(), et al becomes slower

20:38 <mrvn> mjg: I think that's more a problem how you lock that locking. Like my file graph at the moment has multiple locks. One that covers all the linkage and everything you need to walk the graph and one to modify objects without affecting the linkage. So I can happily write to files all I want, updating their size, cache references, ... without ever blocking the path walking.

20:40 <raggi> Go's spinning in the scheduler and rwlocks will show up on profiles under some conditions and isn't super healthy, but until the os provides a cooperative way to avoid that without a syscall there's not much that can be done. A simple solution would be a thing in the vdso that helps you decide if you have enough quanta left that spinning makes sense or if you should just yield instead. It wouldn't look a whole lot more complex than gettimeofday optimizations, it

20:40 <raggi> could even share a page with it

20:41 <mrvn> raggi: isn't that what futex is about?

20:41 <mjg> mrvn: to recall from last time, both freebsd and linux have fully scalable lookups where terminal path component is different

20:42 <mrvn> mjg: both bsd and linux come from a design with a single global lock. Don't think they have it split up enough yet.

20:42 <mjg> mrvn: for example foo/bar/baz/${threadid}. doing that will *never* bounce cachelines

20:42 <mjg> mrvn: what is your code going to do

20:43 <mjg> 2 threads, one wants foo/bar/baz/thread1, another one foo/bar/baz/thread2

20:43 <mjg> there will be literally 0 bouncing of anything on freebsd and linux

20:43 <raggi> mrvn: if futex was suitable, we'd be able to persuade people to stop spinning in userspace ;)

20:43 <mjg> in your code i presume they will compete for the same lock(s)

20:44 <mrvn> mjg: On my os: They ask the filesystem service to open "threadX" for the handle "foo/bar/baz". So there is on refcount bouncing.

20:44 <mjg> raggi: what you really want is info if the lock owner is on cpu imo. this info is what's used for adaptive spinning everywhere that i have seen

20:44 <mrvn> raggi: futex spins until it doesn't make sense

20:44 <mjg> raggi: .. in kernels

20:45 <mjg> mrvn: well let me restate, how does it scale

20:45 <raggi> mjg: yeah, essentially we need to expose more scheduler information to userspace in order to get userspace to contend with it less

20:45 <mjg> mrvn: i can boot a 96-way box, have 96 threads opening foo/bar/baz/${mythreadnumber}

20:45 <mjg> mrvn: and have it scale perfectly

20:45 <mjg> (well modulo some uarch issues)

20:45 <mjg> but no SMP problems

20:46 <mrvn> mjg: creating the files will go down the crapper

20:46 <mjg> how something like this is going to look like on your kernel

20:46 <mrvn> mjg: you end up with 96 cores spinning on foo/bar/baz/ updating it over and over and failing.

20:47 <mjg> you mean in your kernel or bsd/linux

20:47 <mrvn> mjg: on bsd/linux

20:47 <mjg> also i did not ask about file creation

20:47 <mjg> just *opening* a file, which presumably already exists

20:47 <mjg> then see above

20:47 <mjg> as for perf of parallel creation in the same dir, we can talk in a minute

20:47 <mjg> let's sort out this bit first

20:48 <mrvn> In my kernel it will end up being sequential with all 96 cores doing an "atomic incr".

20:48 <mjg> which part. opening a file which already exists or creating a new one

20:48 <mrvn> opening an existing file

20:48 <mjg> well that's terrible from perf standpoint

20:48 <mjg> and wont be an issue on linux nor freebsd

20:49 <mjg> in fact i can slap a quick bench right now

20:49 <heat> slaappppppppppppp it

20:50 <mrvn> mjg: will it? Is that actually you will do over and over? Sounds more like something you do once on program start and then run with it for a long time.

20:50 <mrvn> +something

20:50 <mjg> i refer you once more to the -j 104 bzImage test

20:51 <mjg> granted it was not 'open' a lot, but 'stat'

20:51 <mjg> and then

20:51 <mjg> before: 142.96s user 1025.63s system 4924% cpu 23.731 total

20:51 <mjg> after: 147.36s user 313.40s system 3216% cpu 14.326 total

20:51 Vercas6 has quit [Ping timeout: 268 seconds]

20:52 <mjg> slight correction: i still did not commit unslowing of the open files counter (as in for file descriptors), so open per se does run into a bottleneck, just not related to path lookup

20:52 <mjg> as for pure path lookup, here is access() calls to different files within /tmp

20:52 <mrvn> mjg: But now look at the case of creating or deleting a file.

20:52 scoobydoob has joined #osdev

20:52 <mjg> ./access1_processes -t 1

20:53 <mjg> min:4139933 max:4139933 total:4139933

20:53 <mjg> min:4139242 max:4139242 total:4139242

20:53 <mjg> min:1618870 max:1690272 total:172582177

20:53 <mjg> min:1618546 max:1693889 total:172728838

20:53 <mjg> i would say scales pretty nicely

20:53 <mjg>

20:53 <mrvn> mjg: so your open performs like mine. every core needs an atomic inc of the open counter.

20:53 <mjg> so yes, parallel creation of files in the same directory runs into a lock

20:53 <mjg> what

20:53 <mjg> man

20:54 <mjg> you are missing the point

20:54 <mjg> the open at hand does not suffer any scalability problems from path lookup, which i'm talking about

20:54 poyking16 has quit [Ping timeout: 268 seconds]

20:54 <mjg> there is a bottleneck outside of it, which i'm going to fix later

20:54 <mrvn> mjg: If you unslow the open THEN you are better

20:54 gmacd_ has joined #osdev

20:54 <mjg> but the question is whether delayed memory reclamation is of any use

20:54 <mjg> not whether i got rid of the global open file counter

20:55 <mjg> and for that, see access benchmark above for an example

20:55 <mrvn> mjg: there are 2 problems with create/delete. One is the memory reclamation. The other is contention on the update.

20:55 Ram-Z_ has joined #osdev

20:55 antranigv_ has joined #osdev

20:55 <mrvn> RCUs are really really bad if you get update contention.

20:55 <mjg> they may slow things down, yes

20:56 darkstarx has joined #osdev

20:56 zhiayang_ has joined #osdev

20:56 chartreus has joined #osdev

20:56 Marsh has joined #osdev

20:56 <mjg> have you tried benchmarking your code against linux fwiw?

20:56 Patater has joined #osdev

20:56 <heat> benchmarking against linux means taking a big L

20:56 <mjg> i have some recollection they started adding support for parallel file addition

20:56 onering has joined #osdev

20:56 bleb_ has joined #osdev

20:56 <mrvn> they try to update, fail, retry, fail, retry, fail. Every loop only one core succeeds and all others spin doing the same work over and over.

20:56 seer_ has joined #osdev

20:56 Emil_ has joined #osdev

20:56 <mrvn> mjg: That's O(N^2)

20:56 <heat> mjg, I think the rename lock is now a seqlock

20:57 <mjg> what loop?

20:57 brenns107 has joined #osdev

20:57 <mrvn> mjg: the update process

20:57 gdd1 has joined #osdev

20:57 sbalmos1 has joined #osdev

20:57 Terlisimo1 has joined #osdev

20:57 <mjg> be more specific please

20:57 wereii_ has joined #osdev

20:57 DutchIngraham has joined #osdev

20:57 <mjg> update of what, global rcu state?

20:57 kaichiuchi_ has joined #osdev

20:57 __xor has joined #osdev

20:57 <mjg> so that grace periods can move forward?

20:58 <mrvn> mjg: when you want to create a file the core goes to the directory node, copies it, modifies it and then tries to compare&exchange it atomically. If the compare fails it has to do it all over again.
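
The copy, modify, compare-and-exchange update mrvn describes can be sketched as below. This only illustrates that retry pattern under stated assumptions: `struct dir`, `dir_ref_t`, `dir_copy_with` and `dir_create` are hypothetical names, a directory is modelled as a flat array of name pointers, and the replaced snapshot is deliberately never freed, because deciding when no reader still holds it is the reclamation problem raised earlier in the discussion. It is not a description of what Linux or FreeBSD do.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical immutable snapshot of a directory's entries. */
struct dir {
    size_t count;
    const char *names[]; /* flexible array of entry names */
};

/* The currently published snapshot; readers just load this pointer. */
typedef struct {
    _Atomic(struct dir *) cur;
} dir_ref_t;

/* Build a new snapshot equal to 'old' plus one extra name. */
static struct dir *dir_copy_with(const struct dir *old, const char *name)
{
    size_t n = old ? old->count : 0;
    struct dir *d = malloc(sizeof *d + (n + 1) * sizeof d->names[0]);
    if (!d)
        return NULL;
    if (n)
        memcpy(d->names, old->names, n * sizeof d->names[0]);
    d->names[n] = name;
    d->count = n + 1;
    return d;
}

/* Returns the number of attempts needed, or 0 if allocation failed. */
static unsigned dir_create(dir_ref_t *ref, const char *name)
{
    unsigned attempts = 0;
    struct dir *old = atomic_load_explicit(&ref->cur, memory_order_acquire);
    for (;;) {
        struct dir *fresh = dir_copy_with(old, name);
        if (!fresh)
            return 0;
        attempts++;
        if (atomic_compare_exchange_strong_explicit(
                &ref->cur, &old, fresh,
                memory_order_acq_rel, memory_order_acquire))
            return attempts; /* 'old' is leaked on purpose, see above */
        /* Lost the race: 'old' now holds the winner's snapshot, so the
         * copy has to be redone from scratch. */
        free(fresh);
    }
}
```

With N writers racing on one directory only one compare-and-exchange succeeds per round, so the total number of copies made across all writers can grow roughly as N squared, which is the cost mrvn is pointing at.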

20:58 sbalmos has quit [Killed (NickServ (GHOST command used by sbalmos1!sbalmos@about/java/sbalmos))]

20:58 sbalmos1 is now known as sbalmos

20:59 arminweigl_ has joined #osdev

20:59 ornitorrincos has joined #osdev

21:00 theruran_ has joined #osdev

21:00 jjuran_ has joined #osdev

21:00 brynet_ has joined #osdev

21:00 <mjg> well it has been some time since i had a look at what linux is doing there, but i'm pretty sure they take a lock

21:00 JTL1 has joined #osdev

21:00 <mjg> or at least they used to

21:00 <mjg> on the inode

21:00 <mrvn> mjg: but then it isn't RCU anymore.

21:01 <mjg> *changing* stuff is not

21:01 <mjg> they take the lock to change stuff

21:01 <heat> dentry code isn't fully RCU

21:01 <mrvn> mjg: but I think you have to. RCU are good for lots of readers but writing goes bad quickly.

21:01 dormito_ has joined #osdev

21:01 <heat> it's best case RCU, worst case you take locks

21:01 <heat> no it doesn't

21:01 <heat> that's why RCU is a thing

21:01 <heat> rw locks aren't good enough

21:01 <mjg> rcu does suck if your workload mainly modifies stuff

21:02 <mjg> but that is NOT the case for path lookups

21:02 <mrvn> heat: I consider O(N^2) really bad if N is 1000

21:02 <mjg> or several other cases where rcu is heavily used

21:02 zaquest has quit [*.net *.split]

21:02 zhiayang has quit [*.net *.split]

21:02 MiningMarsh has quit [*.net *.split]

21:02 dutch has quit [*.net *.split]

21:02 _xor has quit [*.net *.split]

21:02 gdd has quit [*.net *.split]

21:02 Ram-Z has quit [*.net *.split]

21:02 troseman has quit [*.net *.split]

21:02 wereii has quit [*.net *.split]

21:02 Terlisimo has quit [*.net *.split]

21:02 Emil has quit [*.net *.split]

21:02 bleb has quit [*.net *.split]

21:02 seer has quit [*.net *.split]

21:02 Rubikoid has quit [*.net *.split]

21:02 Beato has quit [*.net *.split]

21:02 scoobydoo has quit [*.net *.split]

21:02 Dreg has quit [*.net *.split]

21:02 valerius_ has quit [*.net *.split]

21:02 antranigv has quit [*.net *.split]

21:02 alpha2023 has quit [*.net *.split]

21:02 zhiayang_ is now known as zhiayang

21:02 bleb_ is now known as bleb

21:02 Marsh is now known as MiningMarsh

21:02 scoobydoob is now known as scoobydoo

21:02 JTL has quit [Ping timeout: 240 seconds]

21:02 xenos1984 has quit [Ping timeout: 240 seconds]

21:02 gmacd has quit [Ping timeout: 240 seconds]

21:02 chartreuse has quit [Ping timeout: 240 seconds]

21:02 Celelibi_ has joined #osdev

21:03 <mrvn> mjg: sure. there are plenty of cases where RCU will work. So far I seem to have just avoided them.

21:03 <mjg> well you got path lookups, so that's already one

21:03 <mjg> :)

21:04 brynet has quit [Killed (NickServ (GHOST command used by brynet_))]

21:04 brynet_ is now known as brynet

21:04 Celelibi has quit [Ping timeout: 240 seconds]

21:04 <mrvn> mjg: except I don't tend to walk paths but have more of an openat() design

21:04 <heat> linux's process/pid data structures is also fully rcu

21:04 xenos1984 has joined #osdev

21:05 <mjg> heat: interestingly the task stuff does not scale

21:05 <mjg> heat: most notably the tasklist_lock is heavily overused

21:05 <mrvn> fully RCU? tasklist_lock? only one of those can be true

21:06 <mjg> task stuff is not fully rcu, but various components are

21:06 <heat> maybe not fully yeah

21:06 <mjg> for example you can look up a task with rcu

21:06 <heat> my bad

21:06 <mjg> and perm check against it

21:06 darkstardevx has quit [Ping timeout: 240 seconds]

21:06 DrPatater has quit [Ping timeout: 240 seconds]

21:06 aejsmith has quit [Ping timeout: 240 seconds]

21:06 theruran has quit [Ping timeout: 240 seconds]

21:06 brenns10 has quit [Ping timeout: 240 seconds]

21:06 jjuran has quit [Ping timeout: 240 seconds]

21:06 kaichiuchi has quit [Read error: Connection reset by peer]

21:06 Starfoxxes has quit [Ping timeout: 240 seconds]

21:06 arminweigl has quit [Ping timeout: 240 seconds]

21:06 ornitorrincos_ has quit [Ping timeout: 240 seconds]

21:06 dormito has quit [Ping timeout: 240 seconds]

21:06 jjuran_ is now known as jjuran

21:06 brenns107 is now known as brenns10

21:06 kaichiuchi_ is now known as kaichiuchi

21:06 theruran_ is now known as theruran

21:06 arminweigl_ is now known as arminweigl

21:06 <mjg> what does not scale for shit on linux is task creation + destruction

21:07 <mjg> part of the problem is that they never cleaned up the old hack with threads pretending to be processes

21:07 <mrvn> mjg: One thing to know about my kernel is that everything uses message passing. So processes and drivers have a mailbox. So 96 cores trying to open a file will hammer the vfs mailbox trying to add messages to the mailbox. So you have your big serializer there that RCU can't fix.

21:07 zaquest has joined #osdev

21:08 <mjg> so for example a thread being created (or dying) in a process serializes against creation of a new process (or its destruction)

21:08 <mrvn> mjg: to scale I will have to implement some per-core mailbox design.

21:09 <mrvn> mjg: thread and process are basically the same. Just different flags what namespaces to create. :)

21:09 <mjg> to make it worse, even a toy bench where you fork n workers, each of which spawns and kills a thread in a loop

21:09 <mjg> runs into dramatic contention

21:09 Rubikoid has joined #osdev

21:09 wxwisiasdf has joined #osdev

21:09 <wxwisiasdf> Good morning

21:09 gildasio has quit [Quit: WeeChat 3.6]

21:10 <mrvn> mjg: that's why you don't do that. :)

21:10 <mjg> well real apps do create and kill tons of threads

21:10 <mjg> and you write a microbenchmark like that to get an idea where it is slow

21:10 <mjg> for example clang is notorious for it

21:10 <mrvn> mjg: have they never heard of thread pools?

21:10 <mjg> this is what they do, except the entire thing is short lived

21:11 Starfoxxes has joined #osdev

21:11 <mjg> they check how many cpus they can run on, spawn this many threads blindly

21:11 <mjg> and exit shortly after

21:11 <mrvn> Oh you mean, clang starts, creates 96 threads, reads a header and dies?

21:11 <mjg> does the work with however many threads can do the work

21:11 <mjg> and then exits

21:11 <mjg> afair it is mostly their linker

21:11 <mjg> now imagine building many different programs at the same time

21:11 <wxwisiasdf> solution is OpenTBB :-)

21:12 <mrvn> Yeah. but to be fair, linking is usually not a short job.

21:12 <mjg> ye i'm pretty sure they don't have any win from spawning all these threads for most programs

21:12 <mjg> not everything is chrome

21:12 gmacd_ has quit [Ping timeout: 240 seconds]

21:13 <mjg> i created a ticket about it, but as you can imagine there was no response

21:13 <mjg> not even a "fuck off"

21:13 <mrvn> mjg: They could check the input and estimate the workload and then create threads when needed.
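
One way to read mrvn's suggestion is a small sizing function that caps the thread count by the amount of work as well as by the CPU count. The sketch below is purely illustrative: `pick_thread_count` and the threshold `MIN_ITEMS_PER_THREAD` are made-up names and values, and nothing here reflects how clang or its linker actually choose their thread counts.

```c
#include <stddef.h>

/* Assumed tuning knob: how many work items justify one more thread. */
#define MIN_ITEMS_PER_THREAD 64u

/* Pick a thread count from the estimated workload instead of
 * blindly spawning one thread per available CPU. */
static size_t pick_thread_count(size_t work_items, size_t ncpus)
{
    size_t wanted = work_items / MIN_ITEMS_PER_THREAD;
    if (wanted < 1)
        wanted = 1; /* always keep at least one worker */
    if (ncpus < 1)
        ncpus = 1;
    return wanted < ncpus ? wanted : ncpus;
}
```

On a 96-way box a job with 10 items would then run on a single thread, while a job with 100000 items would still use all 96.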

21:13 <mjg> this is what i suggested

21:13 <mjg> in fact i may end up implementing something like this

21:13 <mjg> good news is that i have an ok base of software to test it against

21:13 Vercas6 has joined #osdev

21:13 <mrvn> Could be a big win for unit tests. They are probably the smallest link jobs you get.

21:14 <mjg> namely the entire freebsd ports collection

21:14 <mrvn> Does clang have an option to limit thread counts?

21:14 <mjg> progs of all shapes and sizes

21:14 <wxwisiasdf> The best way to minimize that is to pass all the files at once to clang so it reuses threads and doesn't repeatedly spawn stuff

21:14 <mjg> not explicitly, the best you can do is use taskset or compatible

21:14 <wxwisiasdf> but then you get absurd build times and something failing means you have to do it all again so uh

21:14 <mjg> if it finds out it can only run on n cpus, it is going to spawn n threads

21:14 <mrvn> If I build with -j I don't want the compiler to use any threads.

21:15 <mrvn> Only at the end when I link with LTO I want to use all cores.

21:15 dude12312414 has quit [Remote host closed the connection]

21:15 <wxwisiasdf> mold + clang perhaps?

21:15 <mrvn> (but then that lacks integration with make)

21:16 dude12312414 has joined #osdev

21:16 DutchIngraham is now known as dutch

21:16 <mrvn> mjg: Would be cool if clang could use the GNU make jobserver.

21:17 <mjg> ye i'm done with ranting about make for the week

21:17 <mjg> (bsd make mostly :>)

21:17 <mjg> and i know it is only monday

21:17 <mrvn> mjg: what? No complaints about "make -j4" all mixing up the output from the processes?

21:17 <mjg> make -sss

21:19 <mrvn> GNU make has an option now to buffer the output from a target and then atomically print it. Only problem with that is that it always buffers the output till the end. Now have 2 interactive jobs waiting for input.

21:20 <mrvn> it should show the output from one target and buffer the rest.

21:20 <mjg> i don't watch any output modulo errors, so i'm mostly fine here

21:20 <mjg> see -s

21:21 <mjg> what i like about linux build process is that they default to simply printing CC foo.c

21:21 <mjg> as opposed to the full blurb

21:21 <mrvn> I'm building netboot images for cluster and one step of it is running kvm, run test cases in kvm and recording what binaries get used. That can take a while and all that time make shows nothing.

21:21 foudfou has quit [Remote host closed the connection]

21:21 antranigv has joined #osdev

21:21 antranigv_ has quit [Quit: ZNC 1.8.2 - https://znc.in]

21:22 <mrvn> And I have a flag to go interactive on error. Can't use that with -j.

21:22 foudfou has joined #osdev

21:23 <mrvn> So you start the build with -j 8, it churns a bit and then you have 8 kvms running. 20 minutes later it stops with an error because one of the targets had an error right at the start but was blocked from printing it. :((

21:24 <mjg> uh

21:24 kaichiuchi has quit [Ping timeout: 240 seconds]

21:24 <mrvn> or it printed it but you didn't see it scroll by

21:25 <mjg> well bsd make will tell you stuff failed *somewhere*

21:26 <mrvn> sure, but it will still run those 8 other targets, or 7 since one failed.

21:26 <mjg> but then will have tons of other make processes claim something failed, so they are exiting

21:26 <mjg> even with -j 40 that's enough for the error to go way out

21:26 <mjg> it got to a point where i filter the output with awk

21:26 <mrvn> mjg: I often run "make -j8 || make"

21:27 kaichiuchi has joined #osdev

21:27 <mjg> https://dpaste.com/8NPDDL7M7

21:27 <bslsk05> dpaste.com: dpaste: 8NPDDL7M7

21:27 <mjg> with this i get actual errors

21:28 <mrvn> mjg: I kind of would like a make browser. Sort all the make output into a tree following how each target triggers the next and let me collapse or expand the branches.

21:28 <mjg> ye sounds nice

21:28 <mrvn> Show successful targets in green, running in yellow, failed in red.

21:29 <mjg> existing CI suites can somewhat do it fwiw

21:29 <mrvn> future targets (if it can predict them) in black

21:29 <mjg> ... not that i can honestly recommend one

21:32 <mrvn> mjg: I think my kernel makefile is rather nicely done. Per default it just shows "CC foo/bar/baz.cc", "AS boot.S" or "LD kernel.img". One line per file. If you want details you have to set VERBOSE.

21:32 <mjg> ye that's the linux way

21:32 <mjg> i like it

21:33 <mjg> but normally i build the entire thing in 30s

21:33 <mjg> (freebsd i mean)

21:33 <mjg> so i don't want anything scrolling by

21:33 <mrvn> it sucks when the slowest thing in your build is printing the commands

21:35 <mrvn> Do you use ccache?

21:36 <mjg> no

21:36 <mjg> i don't trust it and have no use for it

21:36 <mjg> (not ccache specific, i barely trust anything claiming to do incremental builds)

21:37 <mrvn> it's not incremental. It just caches the whole output of a command

21:37 <mjg> i got screwed over by .o files not recompiling after header changes

21:37 <wxwisiasdf> ccache?

21:37 <mrvn> If you have something that takes longer than 30s to build it's a huge speedup.

21:37 <mjg> i'm sure it mostly works for people, good for them

21:37 <mrvn> mjg: that has nothing to do with ccache.

21:38 <mjg> but it has to be able to detect that 'the same file compiled' actually needs to be compiled

21:38 <mrvn> mjg: ccache hashes the preprocessor output, compiler inode, ...

21:38 <mjg> well if it is reliable, that's nice

21:38 <mrvn> haven't seen it fail yet

21:38 <mjg> if i had use for this kind of a tool i would look into it

21:39 <mjg> fortunately see above

21:39 <mrvn> might get your 30s compile time down to 10s :)

21:39 <mjg> 30s is from scratch

21:39 <mjg> i mostly modify .c files and that i trust bmake to pick up for an incremental build

21:40 <mjg> in which case it is literally 2-3s

21:40 <mjg> i'm good here

21:40 <mrvn> .oO(i barely trust anything claiming to do incremental builds)

21:40 <mjg> i don't for .h files

21:40 aejsmith has joined #osdev

21:40 <mjg> so i do a fresh build each time i modify one

21:41 <mrvn> mjg: for incremental builds ccache is totally useless. But if you "make clean" after modifying a header it would help.

21:41 <mjg> i can live with the 30s

21:41 <mrvn> In my case it hugely helps when building packages because they always make clean.

21:42 <mrvn> I can just "git-build-dpkg" and it's like I do an incremental build.

21:44 wxwisiasdf has quit [Remote host closed the connection]

21:44 <mjg> you mean .debs?

21:44 <mjg> i can see how that would be of use, sure

21:55 nohit has quit [Ping timeout: 240 seconds]

21:55 bgs has quit [Ping timeout: 240 seconds]

21:55 puck has quit [Ping timeout: 240 seconds]

21:55 hl has quit [Ping timeout: 240 seconds]

21:55 Raito_Bezarius has quit [Ping timeout: 240 seconds]

21:55 pieguy128 has quit [Ping timeout: 240 seconds]

21:55 _whitelogger has quit [Ping timeout: 240 seconds]

22:57 __xor is now known as _xor