<klange>
hm, my serial console is freezing, and it doesn't look like it's a problem on the RPi side; if I restart the serial console (picocom) I get output again
heat has quit [Remote host closed the connection]
<klange>
This is not a smart link, it's just tx/rx so it's not communicating any control signals when it restarts... may be my serial adapter on the host side being finicky?
<mrvn>
what does minicom or screen do?
kingoffrance has joined #osdev
<klange>
have awful terminal emulators embedded in them? oh you mean do they have the same behavior ;)
<mrvn>
the latter
<klange>
I'll let screen run for a bit.
diamondbond has quit [Ping timeout: 256 seconds]
<eryjus>
klange, i ended up having to write my own. believe it or not, I think it was leveraged from something mrvn wrote for the original rpi
<mrvn>
raspbootcom?
<eryjus>
yes. i guess it doesn't quite qualify as my own..
<eryjus>
most of the code was rewritten by my own hand but i used it as a reference solution
<bslsk05>
eryjus/pi-bootloader - This is a serially connected bootloader for the rpi hardware (0 forks/0 stargazers/NOASSERTION)
<eryjus>
`pbl-server /dev/ttyUSB0 .` will work as a dumb serial console
Burgundy has quit [Ping timeout: 256 seconds]
[itchyjunk] has joined #osdev
theruran has joined #osdev
<geist>
klange: re serial locking up like that, I’ve seen that from time to time with screen and a usb serial adaptor
<geist>
Sometimes its outgoing data locks up until you restart screen
<geist>
I’m guessing it’s not screen per se, but it needs some sort of tty level reset that happens when you reattach it
<dmh>
i was just mentioning this problem in another channel
<dmh>
hrm
<geist>
Which may or may not result in some sort of usb level control reset for the serial adaptor
<klange>
screen just did it, and if I c-a k and restart, it's fine, so, yeah, screen and picocom both doing it
<geist>
One of these days I’ll just have a scope or probe on it and then i can tell if the problem is between screen and FTDI/etc or something else
<geist>
I’ve always wondered if its possible it could get a stray vt100 command that puts screen off in some mode where it seems to be locked up
<geist>
As you can sometimes get if you’re looking at pure garbage coming in (or binary data)
<geist>
Or perl
<klange>
relieved to hear this is not a 'me' problem
<geist>
Yah i specifically see it on Linux if i leave screen just tailing some console on a fuchsia device for a few days. Usually i come in a day or two later, hit a key and get no response, but if i restart screen the console ‘comes back’
<geist>
But it’s still receiving things. Never have looked into what the deal is, but since it’s not using HW flow control and it’s just 3-wire serial, i don’t think it can be fuchsia’s fault
<klange>
I get it after _minutes_, but this is the jankiest TTL-to-serial-to-USB setup
<dmh>
was just going to ask if using any hw rts/cts
<dmh>
which ttl chip, ft232?
<dmh>
or is it some mystery dongle
<klange>
clone max3232
<dmh>
ah
<geist>
Yah, i’ll have to check on mine as well. I generally have good luck with FTDIs
<klange>
attached to whatever RS232 USB adapter I could get my hands on, some Elecom (local big-name retail consumer electronics brand) crap I haven't opened up to see what's inside
<geist>
I suppose it could be that when you disconnect from the tty the serial chip drops the high level TX output which is the steady state
<geist>
And thus the other side sees a long BRK basically and that resets everything
<geist>
I forget the key stroke but in screen you can do a ctrl-a to send a BRK so may be worth trying that next time
<geist>
(Also good for testing uart drivers since BRKs are usually an error condition to the uart)
<mrvn>
Linux sends a brk when you open the device. That's why an arduino always reboots when you open the console.
<geist>
Yah
<clever>
i kind of looked at that, as break signaling a literal break in the serial cable
<clever>
somebody unplugged or severed the line
<klange>
The rpi's mini uart interprets breaks as nil bytes: As a result of this, a UART1_RX input line which is continuously low (a break condition or an
<klange>
error in connection or GPIO setup) causes the receiver to continuously receive 0x00 symbols.
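The datasheet behavior klange quotes falls straight out of how an 8N1 receiver samples the line. A toy decoder (hypothetical helper names, not real RPi code) shows why a continuously-low RX looks like an endless stream of 0x00:

```python
# Toy 8N1 receiver sampling one frame's worth of line levels, illustrating
# why a break (line held low) decodes as 0x00. Sketch only; names invented.

def sample_8n1(line_levels):
    """Decode one 8N1 frame from a list of 10 per-bit-time levels (0/1).
    Returns (byte, framing_ok)."""
    assert line_levels[0] == 0          # start bit must be low
    data_bits = line_levels[1:9]        # 8 data bits, LSB first
    stop_ok = line_levels[9] == 1       # stop bit should be high
    byte = 0
    for i, bit in enumerate(data_bits):
        byte |= bit << i
    return byte, stop_ok

# A line held continuously low looks like a start bit plus all-zero data,
# with a framing error -- which the mini uart reports as a 0x00 symbol:
byte, ok = sample_8n1([0] * 10)
print(hex(byte), ok)   # 0x0 False
```

A real UART would flag the bad stop bit as a framing error or break condition; the mini uart just keeps handing up the zero bytes.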
<clever>
yeah, the mini-uart is utter crap
<klange>
nice pdf ya got there, broadcom, that should not have been two lines...
<clever>
config.txt can give you the PL011 uart on the gpio header
<clever>
and either route the mini-uart to the bt, or just disable bt entirely
<klange>
I don't have the hardware to benefit from talking to a real UART anyway, this dumb little max3232 board only has tx/rx/gnd/5v
<clever>
also note, that the uart on the rpi is only 3.3v tolerant
<clever>
if you shove 5v into its pins, you will fry them
<klange>
Yes, this is a 3v ttl chip, the 5v is... to power the chip, presumably.
<clever>
yeah, that can be handy, for low-draw devices
<klange>
Or in the rs232 conversion, all I know is I had to hook up all four wires or it didn't work.
<clever>
of note, the usb-c port on a pi400 can emulate a usb device
<mrvn>
Can the rpi3/4 still be powered over the 5V gpio pin?
<clever>
ive turned my pi400 into a dumb usb keyboard
<clever>
mrvn: pi3/pi4 yes, pi400 no
<mrvn>
800mA seems a bit low for them. Unless you think those cheap Chinese TTL USB dongles don't do current limiting and give out more.
<clever>
its also up to the usb host to limit current as well
<clever>
and some dont
<mrvn>
They probably just pass the USB 5V line through.
<clever>
the uart board, yeah
<mrvn>
Then you just need a cheap USB chip that doesn't require USB3 power negotiation before allowing 2A draw.
<clever>
or the reverse, get something with a buck regulator and PD negotiation
<clever>
then you can get 2A at 50V and buck it back down to 5v, to provide mega amps :P
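Taking the joke numbers at face value, the buck-converter arithmetic is just conservation of power (ignoring converter losses unless stated):

```python
# Back-of-the-envelope power math for a buck converter: the wattage in
# (v_in * i_in) comes out at the lower voltage as proportionally more current.

def buck_output_current(v_in, i_in, v_out, efficiency=1.0):
    """Output current available when converting v_in*i_in watts down to v_out."""
    return v_in * i_in * efficiency / v_out

print(buck_output_current(50, 2, 5))        # 100 W -> 20.0 A at 5 V
print(buck_output_current(50, 2, 5, 0.9))   # a 90%-efficient buck: 18.0 A
```

(Real USB PD tops out at 20 V / 5 A in the standard range, so 50 V is firmly in joke territory, but the math is the same.)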
<klange>
Anyway, it is nice to be able to actually interact with this thing now, should make poking PCIe and XHCI a bit easier than having to repeatedly rebuild + restart...
<clever>
yeah
<clever>
i had serial working before i had any video working
<mrvn>
klange: does sending a break in screen work too?
<klange>
Did not check. In picocom it's C-a C-\ so I'll try that when it happens again there.
pretty_dumm_guy has quit [Ping timeout: 256 seconds]
elastic_dog has quit [Ping timeout: 252 seconds]
elastic_dog has joined #osdev
diamondbond has joined #osdev
ephemer0l has quit [Ping timeout: 250 seconds]
diamondbond has quit [Ping timeout: 240 seconds]
Brnocrist has quit [Ping timeout: 240 seconds]
<klange>
sending break in picocom did not fix things
<klange>
restarting it did
<clever>
one of my usb uart adapters is very buggy
<clever>
if you send it a malformed uart byte, it deadlocks
<clever>
and it doesnt recover until you close and re-open the tty in linux
Brnocrist has joined #osdev
smeso has quit [Quit: smeso]
diamondbond has joined #osdev
smeso has joined #osdev
diamondbond has quit [Ping timeout: 256 seconds]
gog has quit [Ping timeout: 240 seconds]
gog has joined #osdev
diamondbond has joined #osdev
masoudd has joined #osdev
eryjus has quit [Remote host closed the connection]
eryjus has joined #osdev
eryjus has quit [Remote host closed the connection]
eryjus has joined #osdev
diamondbond has quit [Ping timeout: 240 seconds]
gog has quit [Ping timeout: 240 seconds]
ephemer0l has joined #osdev
vdamewood has joined #osdev
vinleod has joined #osdev
vdamewood has quit [Ping timeout: 240 seconds]
bradd has quit [Remote host closed the connection]
bradd has joined #osdev
Starfoxxes has quit [Ping timeout: 260 seconds]
Starfoxxes has joined #osdev
vinleod is now known as vdamewood
Jari-- has quit [Ping timeout: 240 seconds]
ElectronApps has joined #osdev
srjek has quit [Ping timeout: 240 seconds]
Jari-- has joined #osdev
the_lanetly_052_ has joined #osdev
[itchyjunk] has quit [Read error: Connection reset by peer]
sdfgsdfg has joined #osdev
gwizon has quit [Quit: Lost terminal]
pretty_dumm_guy has joined #osdev
Belxjander has quit [Ping timeout: 256 seconds]
eroux has joined #osdev
<klange>
One of the forum regulars is in Bakhmut.
<mrvn>
wherever that is
<klange>
Uh, it's in Donetsk.
<FireFly>
:\
the_lanetly_052_ has quit [Ping timeout: 240 seconds]
<Jari-->
Hopefully it wont crash the robot though :)
<Jari-->
Anyone here born before 1980 ?
<Jari-->
Im 30.05.1980
<GeDaMo>
o/
<kazinsal>
hurd seems so incredibly superfluous at this point
<GeDaMo>
I'm 0x39
<Jari-->
GeDaMo thats decimals?
<GeDaMo>
... no :|
<Jari-->
42
<Jari-->
almost grandpa, the kids here at Pupuhuhta say
<kazinsal>
linux actually has asyncio, so does freebsd, there are other actually functional microkernels for embedded and real-time systems like qnx, and NT is a stable hybrid kernel
xenos1984 has quit [Read error: Connection reset by peer]
<kazinsal>
you do hurd things in 2022 because you're part of the cult of unhygienic MIT tenure
<Jari-->
No more subjectline robot?
<Jari-->
I thought it might download the image.
<kazinsal>
bslsk only spits out summaries for things it can parse the response for iirc
<kazinsal>
based on content-type
<klange>
Hurd never even managed to deliver on the promise of microkernel stability; it's buggier than toaru32, and device drivers take out the whole thing just as well as any Linux.
<kazinsal>
QNX is almost 40 years old now
<kazinsal>
and it still works real well
<kazinsal>
iirc BlackBerry owns it now
<kazinsal>
if you want a stable, functional, open source microkernel, just download minix
<kazinsal>
if you want a stable, functional, widespread microkernel, go get a QNX license
<kazinsal>
and if you want an open source microkernel that's worth hacking on but you're not expecting stability or functionality, learn rust and grab redox
<kazinsal>
there's no point in hurd anymore
<mlombard>
kazinsal, probably Zircon would be a better choice right now
xenos1984 has joined #osdev
<mjg>
this makes me wonder how many people even took hurd seriously back in the day
<mjg>
by the time i started playing with systems (2003 or so) linux was the de facto standard
<mjg>
with hurd already being a curiosity
<Jari-->
If you have an Intel-chipset based motherboard, there are great chances it is equipped with the Intel Management (Intel ME) unit. This is not new. And concerns regarding the privacy issue behind that little know feature were raised for several years. But suddenly, the blogosphere seems to have rediscovered the problem. And we can read many half-true or just plain wrong statements about this
<bslsk05>
itsfoss.com: The Truth About the Intel's Hidden Minix OS and Security Concerns - It's FOSS
<Jari-->
Built into many Intel® Chipset–based platforms is a small, low-power computer subsystem called the Intel® Management Engine (Intel® ME). The Intel® ME performs various tasks while the system is in sleep, during the boot process, and when your system is running.
<Jari-->
wow
<Jari-->
so basically there are even 4-bit subprocessors in your new Intel
<bslsk05>
eryjus.com: x86 and amd64 instruction reference
<gog>
aaay
<sonny>
"Singularity's approach differs in that it isolates a process's objects by memory pages, rather than allocating them in a common address space. When a process terminates, Singularity quickly reclaims the process's memory pages, rather than turning to garbage collection to reclaim memory." -- I'm reading about singularity OS but I don't get how it
<sonny>
lets a programmer handle memory if it only reclaims memory when the process is finished.
<gog>
applications don't typically give back memory to the system during runtime
<sonny>
except you don't get delete
<sonny>
in singularity you have to use something like C# so I don't understand how one would manage memory
<sonny>
gog: oh, I thought that's what memory management was for?
<mrvn>
sonny: 1) I don't see how singularity differs when it does what every OS does anyway.
<sonny>
I don't either
<sonny>
that's what I asked lol
<mrvn>
sonny: 2) c# handles memory internally and will allocate or free larger chunks
<mrvn>
What singularity are you talking about anyway? The container?
<sonny>
mrvn: "When a process terminates, Singularity quickly reclaims the process's memory pages, rather than turning to garbage collection to reclaim memory." -- The modified C# they use doesn't free stuff I think
<sonny>
mrvn: The research OS from microsoft
<GeDaMo>
It probably does garbage collection while running but at the end can discard all the allocated pages
<mrvn>
sonny: So maybe they mean that Singularity knows what memory a process has, as opposed to c# normally deducing that by garbage collection
<sonny>
oh
<mrvn>
So each process has its own GC over the process's address space and on exit the whole address space gets dumped.
<sonny>
I see
<mrvn>
just a guess though.
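That guess can be sketched concretely: each process owns a private pool of pages, a GC may run *within* that pool while the process lives, but exit just hands the whole pool back with no per-object sweep. Names here are invented for illustration, not Singularity's actual structures:

```python
# Sketch: per-process page ownership so process exit reclaims memory in bulk,
# without running garbage collection over individual objects.

class PagePool:
    def __init__(self, total_pages):
        self.free = set(range(total_pages))
        self.owner = {}                      # pfn -> owning process id

    def alloc(self, pid, n):
        pages = [self.free.pop() for _ in range(n)]
        for p in pages:
            self.owner[p] = pid
        return pages

    def reclaim_process(self, pid):
        """On exit: return every page the process owned; no object walk."""
        dead = [p for p, o in self.owner.items() if o == pid]
        for p in dead:
            del self.owner[p]
            self.free.add(p)
        return len(dead)

pool = PagePool(64)
pool.alloc(pid=1, n=10)
pool.alloc(pid=2, n=5)
print(pool.reclaim_process(1), len(pool.free))  # 10 59
```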
<mrvn>
The way I hacked this together for ocaml was to spin up separate ocaml runtimes for each cpu core (i.e. process). So I just run 4 ocamls in parallel with a little glue to IPC.
<GeDaMo>
I believe Singularity is a single address space system
<mrvn>
GeDaMo: doesn't really matter as long as allocated memory is dedicated to a process.
<GeDaMo>
Yeah
<mrvn>
Anyway, my point was that I just used the MMU to separate the ocaml instances so each had its own memory pool without me having to modify ocaml at all.
<mrvn>
Just had to provide malloc/free and some other libc functions as well as the IPC mechanism, and the normal linux ocaml runtime and binaries work bare-bones.
<sonny>
neat
<mrvn>
Singularity might do something similar and run totally normal c# runtimes as processes.
<sonny>
singularity's thing is objects go in pages
<mrvn>
every object no matter how small uses a page?
amazigh has quit [Ping timeout: 240 seconds]
<sonny>
not quite
<sonny>
but I have to go sorry, I'll explain later maybe
<mrvn>
Singularity is a stupid name. How is one supposed to google for that?
<bauen1>
there's a game called singularity about an ai achieving singularity
amazigh has joined #osdev
<mrvn>
and a movie and the general concept and a ton of other stuff
<bslsk05>
github.com: sophia/page_stack.c at restart · adachristine/sophia · GitHub
<geist>
which in the long run will end up being occupied anyway, so it's really there to delay the setup of the physical pages
<geist>
physical page structs
<mrvn>
if it's over virtual space instead of physical space ...
<gog>
i probably didn't explain it so well
<geist>
yeah now that i think about it i may have misheard
<gog>
basically i just create an anonymous memory region where the array lives
<mrvn>
so just demand page mapping
<gog>
yes
<gog>
and i have an early allocator that can grab pages from the memory map at boot time
<gog>
but it checks the primary allocator first
<geist>
to be clear: the struct is per physical page or per virtual page in an aspace?
<gog>
per physical page
<geist>
okay, then i did understand
<gog>
and the index of the struct in the array is its pfn
mahmutov has quit [Ping timeout: 252 seconds]
xenos1984 has quit [Read error: Connection reset by peer]
<gog>
ope found a bug
<gog>
well hm idk tho
<gog>
i don't think i need to refcount mmio pages though
<mrvn>
gog: it can get pretty bad when it has to recurse a bunch of times to allocate all the page tables and per-page structs just to get a free page.
<geist>
yah usually what i do with these is add a list node to the struct, and then move the page struct around between objects (or free list)
<gog>
that recursing over page tables settles down shortly after boot
<geist>
which then acts as the ownership mechanism
<geist>
mmio pages i think you may want to treat differently
<mrvn>
I would initialize the array on boot according to the memory map.
<geist>
usually i *dont* add a tracking structure since there's no reason to track it
<geist>
and you can't predict where mmio will appear anyway
<geist>
may be wayyyyy out past the end of physical ram
<geist>
though on PCs it's usually 'near' regular ram
<geist>
but i've seen some high PCIe apertures be fairly far out, even on a PC
<gog>
that might be a problem
<gog>
but i have 4GiBs of virtual space for the table
<mrvn>
you can easily have an array covering all of the address space.
<geist>
well, need to be clear: what do you mean by address space here
<mrvn>
all 40-56bits of addressable address space
<geist>
so virtual
<gog>
and i can index 41 bits of physical space with my current struct
<mrvn>
So an array of sizeof(struct per_page) << 43
<geist>
of course every time you double the size of the struct you lose a bit
<gog>
yes but that's as simple as moving the base of the array down
<gog>
er fixing it
<gog>
there's no shortage of address space below my kernel
<geist>
does have the property that if you 'probe' all of this space it'll end up mapping the zero page a ton of times
<mrvn>
4 byte is 0.1% of memory (plus more for peripherals), 8 byte is 0.2%, 40 byte is 1% of memory.
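mrvn's overhead figures check out: a per-page struct of S bytes tracking 4 KiB pages costs S/4096 of RAM, regardless of how much RAM there is.

```python
# Per-page tracking overhead: struct_size bytes of metadata per 4 KiB page.

PAGE = 4096

def overhead_pct(struct_size, page_size=PAGE):
    return 100.0 * struct_size / page_size

print(round(overhead_pct(4), 2),
      round(overhead_pct(8), 2),
      round(overhead_pct(40), 2))   # 0.1 0.2 0.98
```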
<geist>
which can chew up a fair amount of page tables
<geist>
though i guess it can be smart and try to use large pages and break them up
<geist>
mrvn: yah i think that's one of the reasons x86 will probably also eventually want to consider large base pages as average size of physical grows
<gog>
yeah i don't quite have the ability to do that but it wouldn't be terribly hard to adapt to it
<geist>
yah
<mrvn>
You don't really want a per-page struct. Handling this in ranges of memory is far more space efficient.
<mrvn>
but it's a space vs speed problem.
<geist>
over time the per page gets more and more useful though
<geist>
but increasing the size of a page has some scaling benefits
<geist>
at the expense of internal fragmentation
<mrvn>
if you merge page infos for contiguous regions it becomes easy to switch to mapping huge pages when possible.
<geist>
oh sure
<geist>
but per page can handle that if you have ways to dynamically coalesce
<gog>
yeah the way linux handles that is they have a chaining thing with their page structs
<geist>
yah something we'll eventually look into for zircon too
<gog>
so you just check a page on the alignment you need and see if its chain is valid
<mrvn>
without having to scan all 1024 pages of a huge page every time?
<geist>
that's of course the fun trick. doing that efficiently
<geist>
i can think of a few ways to do it but i haven't looked into what linux does
<gog>
coalesce in idle time
<geist>
you could, for example, set a bit on the subsequent pages that says 'i'm part of a log 2 large page' and then using math jump back to the first page in the set
<geist>
but that does mean you have to touch each subsequent pages on coalesce or break apart
<geist>
but that might not be so bad
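geist's trick can be sketched directly: tag each tail page with the log2 order of the block it belongs to, so any pfn can find its head with pure arithmetic. This is an invented layout for illustration, not Linux's or zircon's actual scheme:

```python
# Sketch: mark tail pages of a power-of-two large page with its order, so
# head lookup from any member pfn is O(1) mask arithmetic.

class Page:
    def __init__(self):
        self.tail_of_order = None   # None, or log2 order of the block it's in

def coalesce(pages, head_pfn, order):
    """Mark 2**order pages starting at head_pfn as one large page.
    Note: has to touch every tail page, as mentioned above."""
    assert head_pfn % (1 << order) == 0        # must be naturally aligned
    for pfn in range(head_pfn + 1, head_pfn + (1 << order)):
        pages[pfn].tail_of_order = order

def head_of(pages, pfn):
    """Given any pfn, find the head of its large page."""
    order = pages[pfn].tail_of_order
    if order is None:
        return pfn                              # a head, or a plain 4K page
    return pfn & ~((1 << order) - 1)            # round down to alignment

pages = [Page() for _ in range(1024)]
coalesce(pages, 512, 9)                         # 512 pages -> one 2 MiB block
print(head_of(pages, 700))                      # 700 rounds back to 512
```

Breaking the block apart is the same loop clearing the tags, which is the cost geist notes.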
<gog>
you could also keep another stack
<geist>
some sort of external tracking structure maybe
<mrvn>
My plan is to just defragment memory. If a process has a lot of small pages then move pages around to free up a continous region.
<gog>
that can work too
<geist>
that too. i was looking at what linux does in `kcompactd` and it's basically doing that sort of stuff
<geist>
you can fiddle with its parameters and cause it to go hog-wild
<gog>
one issue i have is that mapping and unmapping single pages is kinda painfully expensive over the long run
<geist>
yah it really is. you can optimize it, but really it's a fair amount of code no matter how you do it
<gog>
like i need a way to cache recently used tables
<mrvn>
With defrag being the default I can just track the address space and if a process has an aligned 2MB chunk in small pages it can make a huge one.
<geist>
which is why i usually find that lots of IPC-or-whatnot algorithms that want to map/unmap stuff quickly usually don't scale the way people assume they do
<not_not>
Phew paid my server bill
<geist>
folks tend to assume that map/unmap is 'free' so you can do some memcpy, but really, the scaling is not that way
<geist>
usually need to map something fairly large or copy from it multiple times to mitigate the overhead
<gog>
no, a page table walk on every page map/unmap is a problem and i'm aware of it
<gog>
i just don't know what i'm gonna do about it yet
xenos1984 has joined #osdev
<geist>
yah i see that as a big bag of optimization bits, but you quickly hit diminishing returns
<geist>
best thing to do is do page table manips less
<gog>
that's yet another thing i made Just Do The Thing until it causes issues
<geist>
or delay until required as much as possible
<gog>
yeah
<geist>
yep. totally. a working unoptimized thing is generally better than a not implemented optimal solution
<geist>
unless the former precludes doing the latter
<gog>
especially when it's a solution you don't really have the infrastructure for lol
<mrvn>
My alloc() interface has a flag for what the memory is for: IPC, data, IO, ...
<gog>
i do have a thing for temporary 4K page mappings
<mrvn>
Different memory types can then use different allocators
<geist>
how would you optimize a temporary mapping differently?
<gog>
it's basically the fulcrum around which everything rests lol
<mrvn>
geist: don't merge pages into huge pages. And keep them out of the address range for long term data
<gog>
i have a single-page fractal map at the very top of space
<bslsk05>
ieeexplore.ieee.org: A Practical Implementation Course of Operating Systems: Curriculum Design and Teaching Experiences | IEEE Conference Publication | IEEE Xplore
<gog>
its mapping func doesn't refcount or anything
<gog>
but it needs RAII
<mrvn>
For huge pages you can't have the virtual address space fragment.
<gog>
otherwise you can leak the mapping and it falls down the void
<mrvn>
geist: I think the most relevant optimization for temporary mappings would be to batch them. Like when you need 10 pages mapped you only pginvl() them at the end, potentially implicit with a task switch.
the_lanetly_052 has joined #osdev
<mrvn>
and temporary mappings can be per core.
<mrvn>
map_temp, bzero, unmap_temp - no IPI shootdown required.
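mrvn's batching idea in miniature: write all the temporary PTEs up front, do the work, and pay for a single flush at the end instead of one per page; per-core slots mean no cross-core shootdown IPI at all. The counters here are invented stand-ins for the real TLB-invalidate cost:

```python
# Sketch: batched temporary mappings with one deferred TLB flush per batch
# instead of one invalidate per page.

class TempMapper:
    def __init__(self):
        self.mapped = []
        self.invalidates = 0     # stand-in for pginvl()/invlpg cost

    def map(self, pfn):
        self.mapped.append(pfn)  # write the PTE; defer the invalidate

    def unmap_all(self):
        self.mapped.clear()
        self.invalidates += 1    # one flush covers the whole batch

m = TempMapper()
for pfn in range(10):
    m.map(pfn)                   # e.g. bzero each page here
m.unmap_all()
print(m.invalidates)             # 1 flush for 10 pages, and no IPI needed
```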
<gog>
yeah that was among the considerations
<gog>
never use them on any other core, thread or even scope
<gog>
i kinda violate that with my map_tableset function though. i need a better solution
<mrvn>
On the other hand you can put a bit in the per-page struct to say if a page has been faulted in on more than one core.
<geist>
yah per core is difficult because that basically now means you need multiple page table structures
<geist>
though if you carve it out at the top level of the page table it's not *too* bad
<geist>
but then you have per cpu ASIDs, etc
<geist>
starts to chew into things pretty fast
<mrvn>
geist: didn't you build them on the fly as needed and only cache the last N used tables?
<geist>
for what?
<mrvn>
for your kernel
<geist>
what is 'them' in this case?
<mrvn>
page tables
<geist>
yes then? i guess i dont understand the context of your question
<mrvn>
geist: so every core would build up the page table when a thread accesses it and they would look different on each core depending on the usage pattern
<geist>
right. and then you have to maintain N copies of the page table structure to provide a unique view per core
<geist>
and a different ASID/PCID, etc
srjek has quit [Ping timeout: 240 seconds]
<geist>
can be done of course, but there are i think substantial tradeoffs
<mrvn>
I would have one page table per ASID per core.
<geist>
and no i have not done that. zircon and anything i've built does not have per cpu page tables
<mrvn>
and synchronization for the process uses a different structure.
<geist>
i think in general 64bit makes a bunch of that less useful since you can generally just have long-lived mappings in the kernel without worrying too much about it
<mrvn>
and you have the memory to waste
<mrvn>
*sigh* page tables, there go another 0.2+% of memory.
<Bitweasil>
Linux does the "few most recently used" thing for page table PCIDs.
<Bitweasil>
It helps a lot with stuff like piped processes that are process 1, kernel, process 2, kernel, process 1, etc.
<mrvn>
you have less and less of those with the rise of more cores
<Bitweasil>
Yeah, but that's what they're doing, last time I had to untangle the behavior.
<Bitweasil>
I haven't had to trap CR3 changes for a year or so now, it's been kind of nice.
<geist>
yah i think realistically you need to handle having fewer PCIDs/ASIDs than you have processes or cores and then deal appropriately
<geist>
x86 has 12 bits, arm has 8 or 16. easy to run out
<Bitweasil>
You'll definitely have more processes than PCIDs, cores... we'll see, some of those ARM server chips are heading up in the core counts in a big hurry.
<Bitweasil>
I think some are 256T?
<Bitweasil>
And I'd expect that to increase in the years to come.
<geist>
yah also fun thing: M1 cores have 8 bits of ASID
<mrvn>
I wonder if I should set ASIDs fixed for kernel drivers and only use the left over for user processes
<geist>
which is valid according to ARM, just all the arm cores i know of are 16 bit
<Bitweasil>
A pool of kernel only ASIDs is entirely sane.
<Bitweasil>
For a while, Linux was toggling a bit to change between user and kernel PCID on x86, for the split meltdown tables.
<Bitweasil>
Kaiser? PTI? I can't recall the name they settled on.
<Bitweasil>
Early enough it was the Kaiser patches.
<geist>
yah i think they were using two per
<Bitweasil>
They were toggling between 0x0 and 0x80 or something.
<Bitweasil>
I wrote up some docs on it a while back, haven't had to think about it in a while.
<Bitweasil>
And a paired set of page tables, 8kb aligned region.
<Bitweasil>
So it was just a couple bit flips to switch between user and kernel page tables/PCIDs.
<Bitweasil>
Kind of slick, really.
<geist>
hmm, yeah that seems reasonable there
<mrvn>
With 16 bit ASIDs I would pin all the kernel threads and do user space dynamically.
<Bitweasil>
(and if you know that's in place, you can find the user or kernel page table from the other, and you know which one you're looking at)
<Bitweasil>
Downside is you need an 8kb aligned region for page tables.
<geist>
at the moment on zircon we're just allocating an ASID per process on arm (for 16bit) and on M1 it's just going to run out with more than 256
<Bitweasil>
But it beats a lot of task struct lookups on every kernel transition.
<geist>
have a bug to solve that
<mrvn>
With 8 bit there might not be a lot of ASIDs left for user space
eroux has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
the_lanetly_052 has quit [Ping timeout: 256 seconds]
vdamewood has quit [Read error: Connection reset by peer]
vdamewood has joined #osdev
gwizon has quit [Quit: Lost terminal]
dormito has quit [Quit: WeeChat 3.3]
<mrvn>
geist: wouldn't you just flush the ASIDs when you overflow?
<clever>
i was thinking it could be a kind of LRU queue, track the last 256 address spaces to be executing
<clever>
and flush the oldest when you reuse its ID
<Bitweasil>
Linux does last 6 or 8 or something.
<mrvn>
oldest has a very bad pattern if you have 257 processes. random one would work.
rustyy has quit [Remote host closed the connection]
<clever>
mrvn: probably the same reason nix garbage collection deletes random paths, and nix cant easily know the last-used stamp
<mrvn>
For each process store the last ASID used and for each ASID store the last process using it. Then when switching task if the two agree then reuse the ASID, otherwise pick a random one and flush it.
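mrvn's scheme is small enough to sketch: remember each process's last ASID and each ASID's last process; if the two still agree on a context switch, the TLB entries are still ours, otherwise steal a random ASID and flush it. A toy simulation (invented class, not any real kernel's allocator):

```python
# Sketch of the agree-or-steal ASID allocation scheme described above.

import random

class AsidAllocator:
    def __init__(self, num_asids):
        self.num_asids = num_asids
        self.proc_asid = {}                  # pid  -> last asid used
        self.asid_proc = {}                  # asid -> last pid using it
        self.flushes = 0

    def switch_to(self, pid):
        asid = self.proc_asid.get(pid)
        if asid is not None and self.asid_proc.get(asid) == pid:
            return asid                      # both sides agree: no flush
        asid = random.randrange(self.num_asids)
        self.flushes += 1                    # stealing an ASID: flush it
        self.proc_asid[pid] = asid
        self.asid_proc[asid] = pid
        return asid

alloc = AsidAllocator(256)
a = alloc.switch_to(1)
b = alloc.switch_to(2)
print(alloc.switch_to(1) == a or b == a)  # True: reuse unless pid 2 stole it
```

The random victim choice is what avoids the pathological 257-processes-round-robin case mrvn points out for strict LRU.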
GeDaMo has quit [Remote host closed the connection]
<clever>
are asid only used at the core level, or shared between cores?
<mrvn>
implementation defined
<clever>
ah, so you could track per-core, for some systems
<clever>
and then you can schedule 256 different aspaces per core, before you are forced to recycle an asid
<mrvn>
geist: on arm+aarch64 is the ASID per core or shared?
rustyy has joined #osdev
srjek has joined #osdev
sonny has joined #osdev
<moon-child>
do any FSes use perfect hashing?
<moon-child>
seems nice, maybe worthwhile to avoid extra disc seeks. But maybe not
<mrvn>
moon-child: so every time you create or delete a file you want to spend 5+ hours figuring out the perfect hash function to store the files without gaps?
PapaFrog has quit [Read error: Connection reset by peer]
LostFrog has joined #osdev
<mrvn>
If you store all the filenames (of a dir) in a chunk on disk and make a hashtable<hash, offset_of_name> then the hashtable is rather small. Like 8 byte per entry, 512 entries per 4k block. Any normal dir will only take one read for ENOENT, 2-3 reads for a match.
<clever>
mrvn: thats fine until you get a directory with 1,317,680 files in it
<mrvn>
Use 16 or 32 bytes per entry and you can do short string optimization.
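mrvn's layout can be sketched in memory: names packed into a heap ("chunk on disk") plus a hash-to-offset table whose 8-byte entries fit 512 to a 4k block, so a miss costs one table read and a hit costs one or two more for the name. This is an invented toy format, not any real filesystem's:

```python
# Sketch of an on-disk directory: a name heap plus an open-addressed
# hash -> name-offset table (8 bytes/entry, 512 entries per 4 KiB block).

def fnv1a(s):
    """FNV-1a 64-bit hash, used here as a stand-in directory hash."""
    h = 0xcbf29ce484222325
    for b in s.encode():
        h = ((h ^ b) * 0x100000001b3) & 0xFFFFFFFFFFFFFFFF
    return h

class DirBlock:
    SLOTS = 512                               # 4096 / 8 bytes per entry

    def __init__(self):
        self.names = bytearray()              # packed NUL-terminated names
        self.table = [None] * self.SLOTS      # hash slot -> name offset

    def add(self, name):
        off = len(self.names)
        self.names += name.encode() + b"\0"
        i = fnv1a(name) % self.SLOTS
        while self.table[i] is not None:      # linear probe on collision
            i = (i + 1) % self.SLOTS
        self.table[i] = off

    def lookup(self, name):
        i = fnv1a(name) % self.SLOTS
        while self.table[i] is not None:      # "read 1": the table block
            off = self.table[i]
            end = self.names.index(b"\0", off)
            if self.names[off:end].decode() == name:
                return off                    # "read 2": the name heap
            i = (i + 1) % self.SLOTS
        return None                           # ENOENT after one table read

d = DirBlock()
d.add("hello.txt")
print(d.lookup("hello.txt"), d.lookup("missing"))  # 0 None
```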
<clever>
ls says the metadata for that dir is 1.3mb
<clever>
this 3rd txg, is where the system really ground to a halt, it was stuck open for 43 minutes
<clever>
and at some point, it decided to just stop accepting new tx's, to keep its size under control
<clever>
and being open that long, made it fat enough that it took 10mins to sync
<mrvn>
You probably had the whole directory structure for the 100gig of files cached, so the delete just ran without disk access.
<clever>
it was files i hadnt touched in ages, and they had already been deleted normally, but a snapshot was holding onto it
<clever>
due to the snapshot, a new version of the metadata claiming they dont exist, was already on disk
<clever>
so it should have been a simple matter of finding every block with a refcnt==1, and updating the free-space map
<mrvn>
clever: odd. That should orphan the root of the snapshot. Then for each pointer in the block load the block and decr(refcnt).
<clever>
but the tricky part, is that snapshots dont use a refcnt
<mrvn>
Not that many blocks you can do that to in 5s.
<clever>
basically, every record in zfs, has a birth-txg# attached to it
<clever>
and a snapshot is just a txg#
<clever>
so given 3 snapshots, a/b/c, when you destroy b, you can take a given block, and see if it was created before a, between a/b, between b/c, or after c
<clever>
if the block was created before a, then a can also reference it, so it must be kept
<mrvn>
must, or not?
<mrvn>
must reference it
<clever>
yeah, must
<clever>
blocks created after c cant be found, if your traversing from b's root
<mrvn>
between b/c and after c can't happen, not in b
<clever>
exactly
<clever>
so that just leaves blocks created between a<->b
<mrvn>
you have to traverse b and c to see which blocks b and c have in common
<clever>
and when you find such a block, you must also recurse
<clever>
because it may be an indirect block, that is using a mix of refcnt=1 and refcnt>1 blocks
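The birth-txg rule clever walks through reduces to a simple predicate. For a block reachable from snapshot b (with neighbors a < b < c by txg): born before or at a means a still references it, so keep; born after a makes it a *candidate* for freeing, subject to the separate check against c for shared use that the discussion above mentions. A simplified sketch:

```python
# Sketch of the birth-txg liveness test when destroying snapshot b.
# free_candidate() gives only a necessary condition: whether snapshot c
# still references the block must be checked separately (by traversal).

def free_candidate(block_birth_txg, prev_snap_txg, this_snap_txg):
    """For a block reachable from the snapshot being destroyed:
    True iff it was born after the previous snapshot, i.e. the previous
    snapshot cannot also reference it."""
    assert block_birth_txg <= this_snap_txg   # else unreachable from b
    return block_birth_txg > prev_snap_txg

a_txg, b_txg = 100, 200
print(free_candidate(150, a_txg, b_txg))  # True: born between a and b
print(free_candidate(90,  a_txg, b_txg))  # False: born before a, must keep
```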
<clever>
there are 2 other fun things of note
<clever>
1: the free space map, isnt a bitmap, but a log
<clever>
every time you free or allocate a block range, you write a record of that to the end of a log
<clever>
and if that structure gets too big, it gets compacted automatically
<clever>
2: the txg linked list on-disk is a bit weird due to CoW rules, the last valid txg, has a pointer to where on disk the next (doesnt exist yet) txg will live
<clever>
so it's pointing to data from the future, in a block that isn't valid!
<mrvn>
clever: yeah, had to do the same.
<clever>
so you can then make it valid, without having to modify history (the old txg)
<mrvn>
My root actually has a skiplist, pointers to txg 2^n for n=1-16
<clever>
zfs has a "vdev label" (page 7) that acts as the master header for the block device, it is 256kb of size, and 4 copies are stored on each block device
<clever>
2 at the head end, and 2 at the tail end
<clever>
each label has(page 8) an 8kb "blank space" at the front, an 8kb "boot header", a 112kb name/value pairs space, and then a 128kb uberblock array
<clever>
that 8kb "blank space" could easily hold your MBR partition table, so zfs could live directly on sda, and not sda1
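The layout clever describes can be written out as offsets, assuming the standard vdev label placement (L0 at 0, L1 at 256K, L2 at size minus 512K, L3 at size minus 256K) plus the 3.5mb boot block after the front pair:

```python
# ZFS vdev label layout from the discussion above: four 256 KiB labels,
# two at each end of the device, and a 3.5 MiB boot block after the
# first pair. Sizes inside a label: 8K blank + 8K boot header +
# 112K name/value pairs + 128K uberblock array.

K, M = 1024, 1024 * 1024
LABEL = 256 * K

assert 8 * K + 8 * K + 112 * K + 128 * K == LABEL   # label internals add up

def vdev_label_offsets(dev_size):
    """Byte offsets of labels L0..L3 on a device of dev_size bytes."""
    return [0, LABEL, dev_size - 2 * LABEL, dev_size - LABEL]

boot_block = (2 * LABEL, int(3.5 * M))               # (offset, size)

print(vdev_label_offsets(100 * M))   # [0, 262144, 104333312, 104595456]
```

The redundancy (two labels at each end) is what lets the pool survive a partial overwrite of either end of the device.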
<mrvn>
clever: grub needs way more than 8KB.
<clever>
there is a solution to that too
<clever>
immediately after the first 2 copies of the label (512kb in size) is a 3.5mb "boot block" hole
bgs has quit [Ping timeout: 252 seconds]
<clever>
so you have 16kb of hole at offset 0, for your MBR stub, MBR tables, and maybe GPT tables
<clever>
and then at offset 512kb, size 3.5mb, you have a hole for the grub kernel+drivers
<mrvn>
assuming you have a bootloader that knows ZFS has a hole there and supports it
<clever>
yeah
bgs has joined #osdev
<clever>
the zfs tools also dont like doing this, `zpool create tank /dev/sda` will instead generate a GPT table, and put zfs on `/dev/sda1`
<clever>
so you cant actually take advantage of this
<clever>
its probably a legacy feature from before it had linux support
<clever>
why have grub learn this about every fs in existence? when you can just make a 1mb "bios boot partition" and problem solved
<mrvn>
plus tons of people installed on sda and then couldn't boot because the BIOS is too stupid to boot without a partition table.
<mrvn>
EFI needs the efi partition to boot as well.
<clever>
yeah
<clever>
and 3.5mb is too tiny for that
<clever>
this idea kind of worked for the MBR era
<mrvn>
yeah, except not reliable
<clever>
you could put a real MBR table in the first 8kb hole, to both make the bios happy, and hold the MBR stub
<clever>
but with the arrival of efi and gpt, it doesnt work as well
<clever>
so, with all of the above, the first 4mb of the partition, and the last 512kb, are all accounted for
<clever>
oh, another fun fact, zfs always writes ints in native byte order
<clever>
so if you write a record on a BE system, and read it on an LE system, it comes out backwards
<clever>
but, the records also have a magic# on them, that is also backwards, so you know if a swap is needed
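That swap-detection trick is easy to demonstrate: write the record in native order, and on read, if the magic comes out byte-reversed, re-read the whole record swapped. A toy record of two u64s (the magic value is ZFS's actual uberblock magic, 0x00bab10c, "oo-ba-bloc"; the record format itself is invented):

```python
# Sketch: native-endian records made portable by a magic number that
# reveals, when read back reversed, that a byteswap is needed.

import struct

MAGIC = 0x00BAB10C            # ZFS uberblock magic, "oo-ba-bloc"

def write_record(value, big_endian=False):
    fmt = ">QQ" if big_endian else "<QQ"
    return struct.pack(fmt, MAGIC, value)

def read_record(buf):
    magic, value = struct.unpack("<QQ", buf)     # assume an LE reader
    if magic == MAGIC:
        return value
    # magic came out backwards: record was written BE, re-read swapped
    magic, value = struct.unpack(">QQ", buf)
    assert magic == MAGIC, "not a valid record"
    return value

be = write_record(42, big_endian=True)
print(read_record(be), read_record(write_record(42)))   # 42 42
```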
<mrvn>
One thing I have a problem with is if you make 100 snapshots of A and then delete A. Now I have to compare A against 100 snapshots to find blocks I can free / must keep.
<moon-child>
no reference counting?
<mrvn>
moon-child: see above
adder has quit [Read error: Connection reset by peer]
adder has joined #osdev
<clever>
mrvn: also, the uberblocks, are padded to be 2^ashift in size, so how many you can fit into the uberblock array will vary
<mrvn>
moon-child: refcounting would have a similar problem. At first the root has a refcount of 1; make a snapshot and you have to update A and B to point to a copy with refcount 2. Another snapshot, you have to write A, B and C to point to a refcount of 3, ..., the 100th snapshot then has to update 101 roots.
<clever>
but that ensures that a sector you partially write, wont destroy other records in the ring
<clever>
and a given uberblock, is always written to all 4 labels, on every vdev in the pool