<klange>
hm, my serial console is freezing, and it doesn't look like it's a problem on the RPi side; if I restart the serial console (picocom) I get output again
heat has quit [Remote host closed the connection]
<klange>
This is not a smart link, it's just tx/rx so it's not communicating any control signals when it restarts... may be my serial adapter on the host side being finicky?
<mrvn>
what does minicom or screen do?
kingoffrance has joined #osdev
<klange>
have awful terminal emulators embedded in them? oh you mean do they have the same behavior ;)
<mrvn>
the latter
<klange>
I'll let screen run for a bit.
diamondbond has quit [Ping timeout: 256 seconds]
<eryjus>
klange, i ended up having to write my own. believe it or not, I think it was leveraged from something mrvn wrote for the original rpi
<mrvn>
raspbootcom?
<eryjus>
yes. i guess it doesn't quite qualify as my own..
<eryjus>
most of the code was rewritten by my own hand but i used it as a reference solution
<bslsk05>
eryjus/pi-bootloader - This is a serially connected bootloader for the rpi hardware (0 forks/0 stargazers/NOASSERTION)
<eryjus>
`pbl-server /dev/ttyUSB0 .` will work as a dumb serial console
Burgundy has quit [Ping timeout: 256 seconds]
[itchyjunk] has joined #osdev
theruran has joined #osdev
<geist>
klange: re serial locking up like that, I’ve seen that from time to time with screen and a usb serial adaptor
<geist>
Sometimes its outgoing data locks up until you restart screen
<geist>
I’m guessing it’s not screen per se, but it needs some sort of tty level reset that happens when you reattach it
<dmh>
i was just mentioning this problem in another channel
<dmh>
hrm
<geist>
Which may or may not result in some sort of usb level control reset for the serial adaptor
<klange>
screen just did it, and if I c-a k and restart, it's fine, so, yeah, screen and picocom both doing it
<geist>
One of these days I’ll just have a scope or probe on it and then i can tell if the problem is between screen and FTDI/etc or something else
<geist>
I’ve always wondered if its possible it could get a stray vt100 command that puts screen off in some mode where it seems to be locked up
<geist>
As you can sometimes get if you’re looking at pure garbage coming in (or binary data)
<geist>
Or perl
<klange>
relieved to hear this is not a 'me' problem
<geist>
Yah i specifically see it on Linux if i leave screen just tailing some console on a fuchsia device for a few days. Usually i come in a day or two later, hit a key and get no response, but if i restart screen the console ‘comes back’
<geist>
But it’s still receiving things. Never have looked into what the deal is, but since it’s not using HW flow control and it’s just 3-wire serial, i don’t think it can be fuchsia’s fault
<klange>
I get it after _minutes_, but this is the jankiest TTL-to-serial-to-USB setup
<dmh>
was just going to ask if using any hw rts/cts
<dmh>
which ttl chip, ft232?
<dmh>
or is it some mystery dongle
<klange>
clone max3232
<dmh>
ah
<geist>
Yah, i’ll have to check on mine as well. I generally have good luck with FTDIs
<klange>
attached to whatever RS232 USB adapter I could get my hands on, some Elecom (local big-name retail consumer electronics brand) crap I haven't opened up to see what's inside
<geist>
I suppose it could be that when you disconnect from the tty the serial chip drops the high level TX output which is the steady state
<geist>
And thus the other side sees a long BRK basically and that resets everything
<geist>
I forget the key stroke but in screen you can do a ctrl-a to send a BRK so may be worth trying that next time
<geist>
(Also good for testing uart drivers since BRKs are usually an error condition to the uart)
<mrvn>
Linux sends a brk when you open the device. That's why an arduino always reboots when you open the console.
<geist>
Yah
<clever>
i kind of looked at that, as break signaling a literal break in the serial cable
<clever>
somebody unplugged or severed the line
<klange>
The rpi's mini uart interprets breaks as nil bytes: As a result of this, a UART1_RX input line which is continuously low (a break condition or an
<klange>
error in connection or GPIO setup) causes the receiver to continuously receive 0x00 symbols.
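The datasheet behavior klange quotes falls straight out of how an 8N1 receiver samples the line. A toy decoder (hypothetical helper names, not real RPi code) shows why a continuously-low RX looks like an endless stream of 0x00:

```python
# Toy 8N1 receiver sampling one frame's worth of line levels, illustrating
# why a break (line held low) decodes as 0x00. Sketch only; names invented.

def sample_8n1(line_levels):
    """Decode one 8N1 frame from a list of 10 per-bit-time levels (0/1).
    Returns (byte, framing_ok)."""
    assert line_levels[0] == 0          # start bit must be low
    data_bits = line_levels[1:9]        # 8 data bits, LSB first
    stop_ok = line_levels[9] == 1       # stop bit should be high
    byte = 0
    for i, bit in enumerate(data_bits):
        byte |= bit << i
    return byte, stop_ok

# A line held continuously low looks like a start bit plus all-zero data,
# with a framing error -- which the mini uart reports as a 0x00 symbol:
byte, ok = sample_8n1([0] * 10)
print(hex(byte), ok)   # 0x0 False
```

A real UART would flag the bad stop bit as a framing error or break condition; the mini uart just keeps handing up the zero bytes.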
<clever>
yeah, the mini-uart is utter crap
<klange>
nice pdf ya got there, broadcom, that should not have been two lines...
<clever>
config.txt can give you the PL011 uart on the gpio header
<clever>
and either route the mini-uart to the bt, or just disable bt entirely
<klange>
I don't have the hardware to benefit from talking to a real UART anyway, this dumb little max3232 board only has tx/rx/gnd/5v
<clever>
also note, that the uart on the rpi is only 3.3v tolerant
<clever>
if you shove 5v into its pins, you will fry them
<klange>
Yes, this is a 3v ttl chip, the 5v is... to power the chip, presumably.
<clever>
yeah, that can be handy, for low-draw devices
<klange>
Or in the rs232 conversion, all I know is I had to hook up all four wires or it didn't work.
<clever>
of note, the usb-c port on a pi400 can emulate a usb device
<mrvn>
Can the rpi3/4 still be powered over the 5V gpio pin?
<clever>
ive turned my pi400 into a dumb usb keyboard
<clever>
mrvn: pi3/pi4 yes, pi400 no
<mrvn>
800mA seems a bit low for them. Unless you think those cheap Chinese TTL USB dongles don't do current limiting and give out more.
<clever>
its also up to the usb host to limit current as well
<clever>
and some dont
<mrvn>
They probably just pass the USB 5V line through.
<clever>
the uart board, yeah
<mrvn>
Then you just need a cheap USB chip that doesn't require USB3 power negotiation before allowing 2A draw.
<clever>
or the reverse, get something with a buck regulator and PD negotiation
<clever>
then you can get 2A at 50V and buck it back down to 5v, to provide mega amps :P
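Taking the joke numbers at face value, the buck-converter arithmetic is just conservation of power (ignoring converter losses unless stated):

```python
# Back-of-the-envelope power math for a buck converter: the wattage in
# (v_in * i_in) comes out at the lower voltage as proportionally more current.

def buck_output_current(v_in, i_in, v_out, efficiency=1.0):
    """Output current available when converting v_in*i_in watts down to v_out."""
    return v_in * i_in * efficiency / v_out

print(buck_output_current(50, 2, 5))        # 100 W -> 20.0 A at 5 V
print(buck_output_current(50, 2, 5, 0.9))   # a 90%-efficient buck: 18.0 A
```

(Real USB PD tops out at 20 V / 5 A in the standard range, so 50 V is firmly in joke territory, but the math is the same.)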
<klange>
Anyway, it is nice to be able to actually interact with this thing now, should make poking PCIe and XHCI a bit easier than having to repeatedly rebuild + restart...
<clever>
yeah
<clever>
i had serial working before i had any video working
<mrvn>
klange: does sending a break in screen work too?
<klange>
Did not check. In picocom it's C-a C-\ so I'll try that when it happens again there.
pretty_dumm_guy has quit [Ping timeout: 256 seconds]
elastic_dog has quit [Ping timeout: 252 seconds]
elastic_dog has joined #osdev
diamondbond has joined #osdev
ephemer0l has quit [Ping timeout: 250 seconds]
diamondbond has quit [Ping timeout: 240 seconds]
Brnocrist has quit [Ping timeout: 240 seconds]
<klange>
sending break in picocom did not fix things
<klange>
restarting it did
<clever>
one of my usb uart adapters is very buggy
<clever>
if you send it a malformed uart byte, it deadlocks
<clever>
and it doesnt recover until you close and re-open the tty in linux
Brnocrist has joined #osdev
smeso has quit [Quit: smeso]
diamondbond has joined #osdev
smeso has joined #osdev
diamondbond has quit [Ping timeout: 256 seconds]
gog has quit [Ping timeout: 240 seconds]
gog has joined #osdev
diamondbond has joined #osdev
masoudd has joined #osdev
eryjus has quit [Remote host closed the connection]
eryjus has joined #osdev
eryjus has quit [Remote host closed the connection]
eryjus has joined #osdev
diamondbond has quit [Ping timeout: 240 seconds]
gog has quit [Ping timeout: 240 seconds]
ephemer0l has joined #osdev
vdamewood has joined #osdev
vinleod has joined #osdev
vdamewood has quit [Ping timeout: 240 seconds]
bradd has quit [Remote host closed the connection]
bradd has joined #osdev
Starfoxxes has quit [Ping timeout: 260 seconds]
Starfoxxes has joined #osdev
vinleod is now known as vdamewood
Jari-- has quit [Ping timeout: 240 seconds]
ElectronApps has joined #osdev
srjek has quit [Ping timeout: 240 seconds]
Jari-- has joined #osdev
the_lanetly_052_ has joined #osdev
[itchyjunk] has quit [Read error: Connection reset by peer]
sdfgsdfg has joined #osdev
gwizon has quit [Quit: Lost terminal]
pretty_dumm_guy has joined #osdev
Belxjander has quit [Ping timeout: 256 seconds]
eroux has joined #osdev
<klange>
One of the forum regulars is in Bakhmut.
<mrvn>
wherever that is
<klange>
Uh, it's in Donetsk.
<FireFly>
:\
the_lanetly_052_ has quit [Ping timeout: 240 seconds]
<Jari-->
Hopefully it wont crash the robot though :)
<Jari-->
Anyone here born before 1980 ?
<Jari-->
Im 30.05.1980
<GeDaMo>
o/
<kazinsal>
hurd seems so incredibly superfluous at this point
<GeDaMo>
I'm 0x39
<Jari-->
GeDaMo thats decimals?
<GeDaMo>
... no :|
<Jari-->
42
<Jari-->
almost grandpa, the kids here at Pupuhuhta say
<kazinsal>
linux actually has asyncio, so does freebsd, there are other actually functional microkernels for embedded and real-time systems like qnx, and NT is a stable hybrid kernel
xenos1984 has quit [Read error: Connection reset by peer]
<kazinsal>
you do hurd things in 2022 because you're part of the cult of unhygienic MIT tenure
<Jari-->
No more subjectline robot?
<Jari-->
I thought it might download the image.
<kazinsal>
bslsk only spits out summaries for things it can parse the response for iirc
<kazinsal>
based on content-type
<klange>
Hurd never even managed to deliver on the promise of microkernel stability; it's buggier than toaru32, and device drivers take out the whole thing just as well as any Linux.
<kazinsal>
QNX is almost 40 years old now
<kazinsal>
and it still works real well
<kazinsal>
iirc BlackBerry owns it now
<kazinsal>
if you want a stable, functional, open source microkernel, just download minix
<kazinsal>
if you want a stable, functional, widespread microkernel, go get a QNX license
<kazinsal>
and if you want an open source microkernel that's worth hacking on but you're not expecting stability or functionality, learn rust and grab redox
<kazinsal>
there's no point in hurd anymore
<mlombard>
kazinsal, probably Zircon would be a better choice right now
xenos1984 has joined #osdev
<mjg>
this makes me wonder how many people even took hurd seriously back in the day
<mjg>
by the time i started playing with systems (2003 or so) linux was the de facto standard
<mjg>
with hurd already being a curiosity
<Jari-->
If you have an Intel-chipset based motherboard, there are great chances it is equipped with the Intel Management (Intel ME) unit. This is not new. And concerns regarding the privacy issue behind that little know feature were raised for several years. But suddenly, the blogosphere seems to have rediscovered the problem. And we can read many half-true or just plain wrong statements about this
<bslsk05>
itsfoss.com: The Truth About the Intel's Hidden Minix OS and Security Concerns - It's FOSS
<Jari-->
Built into many Intel® Chipset–based platforms is a small, low-power computer subsystem called the Intel® Management Engine (Intel® ME). The Intel® ME performs various tasks while the system is in sleep, during the boot process, and when your system is running.
<Jari-->
wow
<Jari-->
so basically there are even 4-bit subprocessors in your new Intel
<bslsk05>
eryjus.com: x86 and amd64 instruction reference
<gog>
aaay
<sonny>
"Singularity's approach differs in that it isolates a process's objects by memory pages, rather than allocating them in a common address space. When a process terminates, Singularity quickly reclaims the process's memory pages, rather than turning to garbage collection to reclaim memory." -- I'm reading about singularity OS but I don't get how it
<sonny>
lets a programmer handle memory if it only reclaims memory when the process is finished.
<gog>
applications don't typically give back memory to the system during runtime
<sonny>
except you don't get delete
<sonny>
in singularity you have to use something like C# so I don't understand how one would manage memory
<sonny>
gog: oh, I thought that's what memory management was for?
<mrvn>
sonny: 1) I don't see how singularity differs when it does what every OS does anyway.
<sonny>
I don't either
<sonny>
that's what I asked lol
<mrvn>
sonny: 2) c# handles memory internally and will allocate or free larger chunks
<mrvn>
What singularity are you talking about anyway? The container?
<sonny>
mrvn: "When a process terminates, Singularity quickly reclaims the process's memory pages, rather than turning to garbage collection to reclaim memory." -- The modified C# they use doesn't free stuff I think
<sonny>
mrvn: The research OS from microsoft
<GeDaMo>
It probably does garbage collection while running but at the end can discard all the allocated pages
<mrvn>
sonny: So maybe they mean that Singularity knows what memory a process has, as opposed to c# normally deducing that by garbage collection
<sonny>
oh
<mrvn>
So each process has its own GC over the process's address space and on exit the whole address space gets dumped.
<sonny>
I see
<mrvn>
just a guess though.
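That guess can be sketched concretely: each process owns a private pool of pages, a GC may run *within* that pool while the process lives, but exit just hands the whole pool back with no per-object sweep. Names here are invented for illustration, not Singularity's actual structures:

```python
# Sketch: per-process page ownership so process exit reclaims memory in bulk,
# without running garbage collection over individual objects.

class PagePool:
    def __init__(self, total_pages):
        self.free = set(range(total_pages))
        self.owner = {}                      # pfn -> owning process id

    def alloc(self, pid, n):
        pages = [self.free.pop() for _ in range(n)]
        for p in pages:
            self.owner[p] = pid
        return pages

    def reclaim_process(self, pid):
        """On exit: return every page the process owned; no object walk."""
        dead = [p for p, o in self.owner.items() if o == pid]
        for p in dead:
            del self.owner[p]
            self.free.add(p)
        return len(dead)

pool = PagePool(64)
pool.alloc(pid=1, n=10)
pool.alloc(pid=2, n=5)
print(pool.reclaim_process(1), len(pool.free))  # 10 59
```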
<mrvn>
The way I hacked this together for ocaml was to spin up separate ocaml runtimes for each cpu core (i.e. process). So I just run 4 ocamls in parallel with a little glue to IPC.
<GeDaMo>
I believe Singularity is a single address space system
<mrvn>
GeDaMo: doesn't really matter as long as allocated memory is dedicated to a process.
<GeDaMo>
Yeah
<mrvn>
Anyway, my point was that I just used the MMU to separate the ocaml instances so each had its own memory pool without me having to modify ocaml at all.
<mrvn>
Just had to provide malloc/free and some other libc functions as well as the IPC mechanism, and the normal linux ocaml runtime and binaries work bare-bones.
<sonny>
neat
<mrvn>
Singularity might do something similar and run totally normal c# runtimes as processes.
<sonny>
singularity's thing is objects go in pages
<mrvn>
every object no matter how small uses a page?
amazigh has quit [Ping timeout: 240 seconds]
<sonny>
not quite
<sonny>
but I have to go sorry, I'll explain later maybe
<mrvn>
Singularity is a stupid name. How is one supposed to google for that?
<bauen1>
there's a game called singularity about an ai achieving singularity
amazigh has joined #osdev
<mrvn>
and a movie and the general concept and a ton of other stuff
<bslsk05>
github.com: sophia/page_stack.c at restart · adachristine/sophia · GitHub
<geist>
which in the long run will end up being occupied anyway, so it's really there to delay the setup of the physical pages
<geist>
physical page structs
<mrvn>
if it's over virtual space instead of physical space ...
<gog>
i probably didn't explain it so well
<geist>
yeah now that i think about it i may have misheard
<gog>
basically i just create an anonymous memory region where the array lives
<mrvn>
so just demand page mapping
<gog>
yes
<gog>
and i have an early allocator that can grab pages from the memory map at boot time
<gog>
but it checks the primary allocator first
<geist>
to be clear: the struct is per physical page or per virtual page in an aspace?
<gog>
per physical page
<geist>
okay, then i did understand
<gog>
and the index of the struct in the array is its pfn
mahmutov has quit [Ping timeout: 252 seconds]
xenos1984 has quit [Read error: Connection reset by peer]
<gog>
ope found a bug
<gog>
well hm idk tho
<gog>
i don't think i need to refcount mmio pages though
<mrvn>
gog: it can get pretty bad when it has to recurse a bunch of times to allocate all the page tables and per-page structs just to get a free page.
<geist>
yah usually what i do with these is add a list node to the struct, and then move the page struct around between objects (or free list)
<gog>
that recursing over page tables settles down shortly after boot
<geist>
which then acts as the ownership mechanism
<geist>
mmio pages i think you may want to treat differently
<mrvn>
I would initialize the array on boot according to the memory map.
<geist>
usually i *dont* add a tracking structure since there's no reason to track it
<geist>
and you can't predict where mmio will appear anyway
<geist>
may be wayyyyy out past the end of physical ram
<geist>
though on PCs it's usually 'near' regular ram
<geist>
but i've seen some high PCIe apertures be fairly far out, even on a PC
<gog>
that might be a problem
<gog>
but i have 4GiBs of virtual space for the table
<mrvn>
you can easily have an array covering all of the address space.
<geist>
well, need to be clear: what do you mean by address space here
<mrvn>
all 40-56bits of addressable address space
<geist>
so virtual
<gog>
and i can index 41 bits of physical space with my current struct
<mrvn>
So an array of sizeof(struct per_page) << 43
<geist>
of course every time you double the size of the struct you lose a bit
<gog>
yes but that's as simple as moving the base of the array down
<gog>
er fixing it
<gog>
there's no shortage of address space below my kernel
<geist>
does have the property that if you 'probe' all of this space it'll end up mapping the zero page a ton of times
<mrvn>
4 byte is 0.1% of memory (plus more for peripherals), 8 byte is 0.2%, 40 byte is 1% of memory.
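mrvn's overhead figures check out: a per-page struct of S bytes tracking 4 KiB pages costs S/4096 of RAM, regardless of how much RAM there is.

```python
# Per-page tracking overhead: struct_size bytes of metadata per 4 KiB page.

PAGE = 4096

def overhead_pct(struct_size, page_size=PAGE):
    return 100.0 * struct_size / page_size

print(round(overhead_pct(4), 2),
      round(overhead_pct(8), 2),
      round(overhead_pct(40), 2))   # 0.1 0.2 0.98
```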
<geist>
which can chew up a fair amount of page tables
<geist>
though i guess it can be smart and try to use large pages and break them up
<geist>
mrvn: yah i think that's one of the reasons x86 will probably also eventually want to consider large base pages as average size of physical grows
<gog>
yeah i don't quite have the ability to do that but it wouldn't be terribly hard to adapt to it
<geist>
yah
<mrvn>
You don't really want a per-page struct. Handling this in ranges of memory is far more space efficient.
<mrvn>
but it's a space vs speed problem.
<geist>
over time the per page gets more and more useful though
<geist>
but increasing the size of a page has some scaling benefits
<geist>
at the expense of internal fragmentation
<mrvn>
if you merge page infos for contiguous regions it becomes easy to switch to mapping huge pages when possible.
<geist>
oh sure
<geist>
but per page can handle that if you have ways to dynamically coalesce
<gog>
yeah the way linux handles that is they have a chaining thing with their page structs
<geist>
yah something we'll eventually look into for zircon too
<gog>
so you just check a page on the alignment you need and see if its chain is valid
<mrvn>
without having to scan all 1024 pages of a huge page every time?
<geist>
that's of course the fun trick. doing that efficiently
<geist>
i can think of a few ways to do it but i haven't looked into what linux does
<gog>
coalesce in idle time
<geist>
you could, for example, set a bit on the subsequent pages that says 'i'm part of a log 2 large page' and then using math jump back to the first page in the set
<geist>
but that does mean you have to touch each subsequent pages on coalesce or break apart
<geist>
but that might not be so bad
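geist's trick can be sketched directly: tag each tail page with the log2 order of the block it belongs to, so any pfn can find its head with pure arithmetic. This is an invented layout for illustration, not Linux's or zircon's actual scheme:

```python
# Sketch: mark tail pages of a power-of-two large page with its order, so
# head lookup from any member pfn is O(1) mask arithmetic.

class Page:
    def __init__(self):
        self.tail_of_order = None   # None, or log2 order of the block it's in

def coalesce(pages, head_pfn, order):
    """Mark 2**order pages starting at head_pfn as one large page.
    Note: has to touch every tail page, as mentioned above."""
    assert head_pfn % (1 << order) == 0        # must be naturally aligned
    for pfn in range(head_pfn + 1, head_pfn + (1 << order)):
        pages[pfn].tail_of_order = order

def head_of(pages, pfn):
    """Given any pfn, find the head of its large page."""
    order = pages[pfn].tail_of_order
    if order is None:
        return pfn                              # a head, or a plain 4K page
    return pfn & ~((1 << order) - 1)            # round down to alignment

pages = [Page() for _ in range(1024)]
coalesce(pages, 512, 9)                         # 512 pages -> one 2 MiB block
print(head_of(pages, 700))                      # 700 rounds back to 512
```

Breaking the block apart is the same loop clearing the tags, which is the cost geist notes.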
<gog>
you could also keep another stack
<geist>
some sort of external tracking structure maybe
<mrvn>
My plan is to just defragment memory. If a process has a lot of small pages then move pages around to free up a continous region.
<gog>
that can work too
<geist>
that too. i was looking at what linux does in `kcompactd` and it's basically doing that sort of stuff
<geist>
you can fiddle with its parameters and cause it to go hog-wild
<gog>
one issue i have is that mapping and unmapping single pages is kinda painfully expensive over the long run
<geist>
yah it really is. you can optimize it, but really it's a fair amount of code no matter how you do it
<gog>
like i need a way to cache recently used tables
<mrvn>
With defrag being the default I can just track the address space and if a process has an aligned 2MB chunk in small pages it can make a huge one.
<geist>
which is why i usually find that lots of IPC-or-whatnot algorithms that want to map/unmap stuff quickly usually don't scale the way people assume they do
<not_not>
Phew paid my server bill
<geist>
folks tend to assume that map/unmap is 'free' so you can do some memcpy, but really, the scaling is not that way
<geist>
usually need to map something fairly large or copy from it multiple times to mitigate the overhead
<gog>
no, a page table walk on every page map/unmap is a problem and i'm aware of it
<gog>
i just don't know what i'm gonna do about it yet
xenos1984 has joined #osdev
<geist>
yah i see that as a big bag of optimization bits, but you quickly hit diminishing returns
<geist>
best thing to do is do page table manips less
<gog>
that's yet another thing i made Just Do The Thing until it causes issues
<geist>
or delay until required as much as possible
<gog>
yeah
<geist>
yep. totally. a working unoptimized thing is generally better than a not implemented optimal solution
<geist>
unless the former precludes doing the latter
<gog>
especially when it's a solution you don't really have the infrastructure for lol
<mrvn>
My alloc() interface has a flag for what the memory is for: IPC, data, IO, ...
<gog>
i do have a thing for temporary 4K page mappings
<mrvn>
Different memory types can then use different allocators
<geist>
how would you optimize a temporary mapping differently?
<gog>
it's basically the fulcrum around which everything rests lol
<mrvn>
geist: don't merge pages into huge pages. And keep them out of the address range for long term data
<gog>
i have a single-page fractal map at the very top of space
<bslsk05>
ieeexplore.ieee.org: A Practical Implementation Course of Operating Systems: Curriculum Design and Teaching Experiences | IEEE Conference Publication | IEEE Xplore
<gog>
its mapping func doesn't refcount or anything
<gog>
but it needs RAII
<mrvn>
For huge pages you can't have the virtual address space fragment.
<gog>
otherwise you can leak the mapping and it falls down the void
<mrvn>
geist: I think the most relevant optimization for temporary mappings would be to batch them. Like when you need 10 pages mapped you only pginvl() them at the end, potentially implicit with a task switch.
the_lanetly_052 has joined #osdev
<mrvn>
and temporary mappings can be per core.
<mrvn>
map_temp, bzero, unmap_temp - no IPI shootdown required.
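mrvn's batching idea in miniature: write all the temporary PTEs up front, do the work, and pay for a single flush at the end instead of one per page; per-core slots mean no cross-core shootdown IPI at all. The counters here are invented stand-ins for the real TLB-invalidate cost:

```python
# Sketch: batched temporary mappings with one deferred TLB flush per batch
# instead of one invalidate per page.

class TempMapper:
    def __init__(self):
        self.mapped = []
        self.invalidates = 0     # stand-in for pginvl()/invlpg cost

    def map(self, pfn):
        self.mapped.append(pfn)  # write the PTE; defer the invalidate

    def unmap_all(self):
        self.mapped.clear()
        self.invalidates += 1    # one flush covers the whole batch

m = TempMapper()
for pfn in range(10):
    m.map(pfn)                   # e.g. bzero each page here
m.unmap_all()
print(m.invalidates)             # 1 flush for 10 pages, and no IPI needed
```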
<gog>
yeah that was among the considerations
<gog>
never use them on any other core, thread or even scope
<gog>
i kinda violate that with my map_tableset function though. i need a better solution
<mrvn>
On the other hand you can put a bit in the per-page struct to say if a page has been faulted in on more than one core.
<geist>
yah per core is difficult because that basically now means you need multiple page table structures
<geist>
though if you carve it out at the top level of the page table it's not *too* bad
<geist>
but then you have per cpu ASIDs, etc
<geist>
starts to chew into things pretty fast
<mrvn>
geist: didn't you build them on the fly as needed and only cache the last N used tables?
<geist>
for what?
<mrvn>
for your kernel
<geist>
what is 'them' in this case?
<mrvn>
page tables
<geist>
yes then? i guess i dont understand the context of your question
<mrvn>
geist: so every core would build up the page table when a thread accesses it and they would look different on each core depending on the usage pattern
<geist>
right. and then you have to maintain N copies of the page table structure to provide a unique view per core
<geist>
and a different ASID/PCID, etc
srjek has quit [Ping timeout: 240 seconds]
<geist>
can be done of course, but there are i think substantial tradeoffs
<mrvn>
I would have one page table per ASID per core.
<geist>
and no i have not done that. zircon and anything i've built does not have per cpu page tables
<mrvn>
and synchronization for the process uses a different structure.
<geist>
i think in general 64bit makes a bunch of that less useful since you can generally just have long-lived mappings in the kernel without worrying too much about it
<mrvn>
and you have the memory to waste
<mrvn>
*sigh* page tables, there go another 0.2+% of memory.
<Bitweasil>
Linux does the "few most recently used" thing for page table PCIDs.
<Bitweasil>
It helps a lot with stuff like piped processes that are process 1, kernel, process 2, kernel, process 1, etc.
<mrvn>
you have less and less of those with the rise of more cores
<Bitweasil>
Yeah, but that's what they're doing, last time I had to untangle the behavior.
<Bitweasil>
I haven't had to trap CR3 changes for a year or so now, it's been kind of nice.
<geist>
yah i think realistically you need to handle having fewer PCIDs/ASIDs than you have processes or cores and then deal appropriately
<geist>
x86 has 12 bits, arm has 8 or 16. easy to run out
<Bitweasil>
You'll definitely have more processes than PCIDs, cores... we'll see, some of those ARM server chips are heading up in the core counts in a big hurry.
<Bitweasil>
I think some are 256T?
<Bitweasil>
And I'd expect that to increase in the years to come.
<geist>
yah also fun thing: M1 cores have 8 bits of ASID
<mrvn>
I wonder if I should set ASIDs fixed for kernel drivers and only use the left over for user processes
<geist>
which is valid according to ARM, just all the arm cores i know of are 16 bit
<Bitweasil>
A pool of kernel only ASIDs is entirely sane.
<Bitweasil>
For a while, Linux was toggling a bit to change between user and kernel PCID on x86, for the split meltdown tables.
<Bitweasil>
Kaiser? PTI? I can't recall the name they settled on.
<Bitweasil>
Early enough it was the Kaiser patches.
<geist>
yah i think they were using two per
<Bitweasil>
They were toggling between 0x0 and 0x80 or something.
<Bitweasil>
I wrote up some docs on it a while back, haven't had to think about it in a while.
<Bitweasil>
And a paired set of page tables, 8kb aligned region.
<Bitweasil>
So it was just a couple bit flips to switch between user and kernel page tables/PCIDs.
<Bitweasil>
Kind of slick, really.
<geist>
hmm, yeah that seems reasonable there
<mrvn>
With 16 bit ASIDs I would pin all the kernel threads and do user space dynamically.
<Bitweasil>
(and if you know that's in place, you can find the user or kernel page table from the other, and you know which one you're looking at)
<Bitweasil>
Downside is you need an 8kb aligned region for page tables.
<geist>
at the moment on zircon we're just allocating an ASID per process on arm (for 16bit) and on M1 it's just going to run out with more than 256
<Bitweasil>
But it beats a lot of task struct lookups on every kernel transition.
<geist>
have a bug to solve that
<mrvn>
With 8 bit there might not be a lot of ASIDs left for user space
eroux has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
the_lanetly_052 has quit [Ping timeout: 256 seconds]
vdamewood has quit [Read error: Connection reset by peer]
vdamewood has joined #osdev
gwizon has quit [Quit: Lost terminal]
dormito has quit [Quit: WeeChat 3.3]
<mrvn>
geist: wouldn't you just flush the ASIDs when you overflow?
<clever>
i was thinking it could be a kind of LRU queue, track the last 256 address spaces to be executing
<clever>
and flush the oldest when you reuse its ID
<Bitweasil>
Linux does last 6 or 8 or something.
<mrvn>
oldest has a very bad pattern if you have 257 processes. random one would work.
rustyy has quit [Remote host closed the connection]
<clever>
mrvn: probably the same reason nix garbage collection deletes random paths, and nix cant easily know the last-used stamp
<mrvn>
For each process store the last ASID used and for each ASID store the last process using it. Then when switching task if the two agree then reuse the ASID, otherwise pick a random one and flush it.
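mrvn's scheme is small enough to sketch: remember each process's last ASID and each ASID's last process; if the two still agree on a context switch, the TLB entries are still ours, otherwise steal a random ASID and flush it. A toy simulation (invented class, not any real kernel's allocator):

```python
# Sketch of the agree-or-steal ASID allocation scheme described above.

import random

class AsidAllocator:
    def __init__(self, num_asids):
        self.num_asids = num_asids
        self.proc_asid = {}                  # pid  -> last asid used
        self.asid_proc = {}                  # asid -> last pid using it
        self.flushes = 0

    def switch_to(self, pid):
        asid = self.proc_asid.get(pid)
        if asid is not None and self.asid_proc.get(asid) == pid:
            return asid                      # both sides agree: no flush
        asid = random.randrange(self.num_asids)
        self.flushes += 1                    # stealing an ASID: flush it
        self.proc_asid[pid] = asid
        self.asid_proc[asid] = pid
        return asid

alloc = AsidAllocator(256)
a = alloc.switch_to(1)
b = alloc.switch_to(2)
print(alloc.switch_to(1) == a or b == a)  # True: reuse unless pid 2 stole it
```

The random victim choice is what avoids the pathological 257-processes-round-robin case mrvn points out for strict LRU.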
GeDaMo has quit [Remote host closed the connection]
<clever>
are asid only used at the core level, or shared between cores?
<mrvn>
implementation defined
<clever>
ah, so you could track per-core, for some systems
<clever>
and then you can schedule 256 different aspaces per core, before you are forced to recycle an asid
<mrvn>
geist: on arm+aarch64 is the ASID per core or shared?
rustyy has joined #osdev
srjek has joined #osdev
sonny has joined #osdev
<moon-child>
do any FSes use perfect hashing?
<moon-child>
seems nice, maybe worthwhile to avoid extra disc seeks. But maybe not
<mrvn>
moon-child: so every time you create or delete a file you want to spend 5+ hours figuring out the perfect hash function to store the files without gaps?
PapaFrog has quit [Read error: Connection reset by peer]
LostFrog has joined #osdev
<mrvn>
If you store all the filenames (of a dir) in a chunk on disk and make a hashtable<hash, offset_of_name> then the hashtable is rather small. Like 8 byte per entry, 512 entries per 4k block. Any normal dir will only take one read for ENOENT, 2-3 reads for a match.
<clever>
mrvn: thats fine until you get a directory with 1,317,680 files in it
<mrvn>
Use 16 or 32 bytes per entry and you can do short string optimization.
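mrvn's layout can be sketched in memory: names packed into a heap ("chunk on disk") plus a hash-to-offset table whose 8-byte entries fit 512 to a 4k block, so a miss costs one table read and a hit costs one or two more for the name. This is an invented toy format, not any real filesystem's:

```python
# Sketch of an on-disk directory: a name heap plus an open-addressed
# hash -> name-offset table (8 bytes/entry, 512 entries per 4 KiB block).

def fnv1a(s):
    """FNV-1a 64-bit hash, used here as a stand-in directory hash."""
    h = 0xcbf29ce484222325
    for b in s.encode():
        h = ((h ^ b) * 0x100000001b3) & 0xFFFFFFFFFFFFFFFF
    return h

class DirBlock:
    SLOTS = 512                               # 4096 / 8 bytes per entry

    def __init__(self):
        self.names = bytearray()              # packed NUL-terminated names
        self.table = [None] * self.SLOTS      # hash slot -> name offset

    def add(self, name):
        off = len(self.names)
        self.names += name.encode() + b"\0"
        i = fnv1a(name) % self.SLOTS
        while self.table[i] is not None:      # linear probe on collision
            i = (i + 1) % self.SLOTS
        self.table[i] = off

    def lookup(self, name):
        i = fnv1a(name) % self.SLOTS
        while self.table[i] is not None:      # "read 1": the table block
            off = self.table[i]
            end = self.names.index(b"\0", off)
            if self.names[off:end].decode() == name:
                return off                    # "read 2": the name heap
            i = (i + 1) % self.SLOTS
        return None                           # ENOENT after one table read

d = DirBlock()
d.add("hello.txt")
print(d.lookup("hello.txt"), d.lookup("missing"))  # 0 None
```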
<clever>
ls says the metadata for that dir is 1.3mb
<clever>
this 3rd txg, is where the system really ground to a halt, it was stuck open for 43 minutes
<clever>
and at some point, it decided to just stop accepting new tx's, to keep its size under control
<clever>
and being open that long, made it fat enough that it took 10mins to sync
<mrvn>
You probably had the whole directory structure for the 100gig of files cached, so the delete just ran without disk access.
<clever>
it was files i hadnt touched in ages, and they had already been deleted normally, but a snapshot was holding onto it
<clever>
due to the snapshot, a new version of the metadata claiming they dont exist, was already on disk
<clever>
so it should have been a simple matter of finding every block with a refcnt==1, and updating the free-space map
<mrvn>
clever: odd. That should orphan the root of the snapshot. Then for each pointer in the block load the block and decr(refcnt).
<clever>
but the tricky part, is that snapshots dont use a refcnt
<mrvn>
Not that many blocks you can do that to in 5s.
<clever>
basically, every record in zfs, has a birth-txg# attached to it
<clever>
and a snapshot is just a txg#
<clever>
so given 3 snapshots, a/b/c, when you destroy b, you can take a given block, and see if it was created before a, between a/b, between b/c, or after c
<clever>
if the block was created before a, then a can also reference it, so it must be kept
<mrvn>
must, or not?
<mrvn>
must reference it
<clever>
yeah, must
<clever>
blocks created after c cant be found, if your traversing from b's root
<mrvn>
between b/c and after c can't happen, not in b
<clever>
exactly
<clever>
so that just leaves blocks created between a<->b
<mrvn>
you have to traverse b and c to see which blocks b and c have in common
<clever>
and when you find such a block, you must also recurse
<clever>
because it may be an indirect block, that is using a mix of refcnt=1 and refcnt>1 blocks
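The birth-txg rule clever walks through reduces to a simple predicate. For a block reachable from snapshot b (with neighbors a < b < c by txg): born before or at a means a still references it, so keep; born after a makes it a *candidate* for freeing, subject to the separate check against c for shared use that the discussion above mentions. A simplified sketch:

```python
# Sketch of the birth-txg liveness test when destroying snapshot b.
# free_candidate() gives only a necessary condition: whether snapshot c
# still references the block must be checked separately (by traversal).

def free_candidate(block_birth_txg, prev_snap_txg, this_snap_txg):
    """For a block reachable from the snapshot being destroyed:
    True iff it was born after the previous snapshot, i.e. the previous
    snapshot cannot also reference it."""
    assert block_birth_txg <= this_snap_txg   # else unreachable from b
    return block_birth_txg > prev_snap_txg

a_txg, b_txg = 100, 200
print(free_candidate(150, a_txg, b_txg))  # True: born between a and b
print(free_candidate(90,  a_txg, b_txg))  # False: born before a, must keep
```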
<clever>
there are 2 other fun things of note
<clever>
1: the free space map, isnt a bitmap, but a log
<clever>
every time you free or allocate a block range, you write a record of that to the end of a log
<clever>
and if that structure gets too big, it gets compacted automatically
<clever>
2: the txg linked list on-disk is a bit weird due to CoW rules, the last valid txg, has a pointer to where on disk the next (doesnt exist yet) txg will live
<clever>
so it's pointing to data from the future, in a block that isn't valid!
<mrvn>
clever: yeah, had to do the same.
<clever>
so you can then make it valid, without having to modify history (the old txg)
<mrvn>
My root actually has a skiplist, pointers to txg 2^n for n=1-16
<clever>
zfs has a "vdev label" (page 7) that acts as the master header for the block device, it is 256kb of size, and 4 copies are stored on each block device
<clever>
2 at the head end, and 2 at the tail end
<clever>
each label has(page 8) an 8kb "blank space" at the front, an 8kb "boot header", a 112kb name/value pairs space, and then a 128kb uberblock array
<clever>
that 8kb "blank space" could easily hold your MBR partition table, so zfs could live directly on sda, and not sda1
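The layout clever describes can be written out as offsets, assuming the standard vdev label placement (L0 at 0, L1 at 256K, L2 at size minus 512K, L3 at size minus 256K) plus the 3.5mb boot block after the front pair:

```python
# ZFS vdev label layout from the discussion above: four 256 KiB labels,
# two at each end of the device, and a 3.5 MiB boot block after the
# first pair. Sizes inside a label: 8K blank + 8K boot header +
# 112K name/value pairs + 128K uberblock array.

K, M = 1024, 1024 * 1024
LABEL = 256 * K

assert 8 * K + 8 * K + 112 * K + 128 * K == LABEL   # label internals add up

def vdev_label_offsets(dev_size):
    """Byte offsets of labels L0..L3 on a device of dev_size bytes."""
    return [0, LABEL, dev_size - 2 * LABEL, dev_size - LABEL]

boot_block = (2 * LABEL, int(3.5 * M))               # (offset, size)

print(vdev_label_offsets(100 * M))   # [0, 262144, 104333312, 104595456]
```

The redundancy (two labels at each end) is what lets the pool survive a partial overwrite of either end of the device.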
<mrvn>
clever: grub needs way more than 8KB.
<clever>
there is a solution to that too
<clever>
immediately after the first 2 copies of the label (512kb in size) is a 3.5mb "boot block" hole
bgs has quit [Ping timeout: 252 seconds]
<clever>
so you have 16kb of hole at offset 0, for your MBR stub, MBR tables, and maybe GPT tables
<clever>
and then at offset 512kb, size 3.5mb, you have a hole for the grub kernel+drivers
<mrvn>
assuming you have a bootloader that knows ZFS has a hole there and supports it
<clever>
yeah
bgs has joined #osdev
<clever>
the zfs tools also dont like doing this, `zpool create tank /dev/sda` will instead generate a GPT table, and put zfs on `/dev/sda1`
<clever>
so you cant actually take advantage of this
<clever>
its probably a legacy feature from before it had linux support
<clever>
why have grub learn this about every fs in existence? when you can just make a 1mb "bios boot partition" and problem solved
<mrvn>
plus tons of people installed on sda and then couldn't boot because the BIOS is too stupid to boot without a partition table.
<mrvn>
EFI needs the efi partition to boot as well.
<clever>
yeah
<clever>
and 3.5mb is too tiny for that
<clever>
this idea kind of worked for the MBR era
<mrvn>
yeah, except not reliable
<clever>
you could put a real MBR table in the first 8kb hole, to both make the bios happy, and hold the MBR stub
<clever>
but with the arrival of efi and gpt, it doesnt work as well
<clever>
so, with all of the above, the first 4mb of the partition, and the last 512kb, are all accounted for
<clever>
oh, another fun fact, zfs always writes ints in native byte order
<clever>
so if you write a record on a BE system, and read it on an LE system, it comes out backwards
<clever>
but, the records also have a magic# on them, that is also backwards, so you know if a swap is needed
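That swap-detection trick is easy to demonstrate: write the record in native order, and on read, if the magic comes out byte-reversed, re-read the whole record swapped. A toy record of two u64s (the magic value is ZFS's actual uberblock magic, 0x00bab10c, "oo-ba-bloc"; the record format itself is invented):

```python
# Sketch: native-endian records made portable by a magic number that
# reveals, when read back reversed, that a byteswap is needed.

import struct

MAGIC = 0x00BAB10C            # ZFS uberblock magic, "oo-ba-bloc"

def write_record(value, big_endian=False):
    fmt = ">QQ" if big_endian else "<QQ"
    return struct.pack(fmt, MAGIC, value)

def read_record(buf):
    magic, value = struct.unpack("<QQ", buf)     # assume an LE reader
    if magic == MAGIC:
        return value
    # magic came out backwards: record was written BE, re-read swapped
    magic, value = struct.unpack(">QQ", buf)
    assert magic == MAGIC, "not a valid record"
    return value

be = write_record(42, big_endian=True)
print(read_record(be), read_record(write_record(42)))   # 42 42
```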
<mrvn>
One thing I have a problem with is if you make 100 snapshots of A and then delete A. Now I have to compare A against 100 snapshots to find blocks I can free / must keep.
<moon-child>
no reference counting?
<mrvn>
moon-child: see above
adder has quit [Read error: Connection reset by peer]
adder has joined #osdev
<clever>
mrvn: also, the uberblocks, are padded to be 2^ashift in size, so how many you can fit into the uberblock array will vary
<mrvn>
moon-child: refcounting would have a similar problem. At first the root has a refcount of 1; make a snapshot and you have to update A and B to point to a copy with refcount 2. Another snapshot, you have to write A, B and C to point to a refcount of 3, ..., the 100th snapshot then has to update 101 roots.
<clever>
but that ensures that a sector you partially write, wont destroy other records in the ring
<clever>
and a given uberblock, is always written to all 4 labels, on every vdev in the pool