#osdev on 2022-11-14 — irc logs at libera.irclog.whitequark.org

2021-05-23 01:57 klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books

00:00 <rwxr-xr-x> i see

00:00 <\Test_User> ld is typically the linker you'll use

00:00 <zid> and to produce object files from your assembler

00:01 <zid> https://cdn.discordapp.com/attachments/417023075348119556/1041502972032598126/image.png

00:01 <zid> Makefile that runs nasm file.asm -o file.o -felf32 over each .asm file it sees, then runs ld file1.o file2.o file3.o -o test.elf ^

00:02 <heat> ld is like 3 linkers

00:02 <zid> along with some linkerscript voodoo to make it org the code to 0x7C00 and make sure boot.o's code ends up first

00:02 <\Test_User> > -o test.elf

00:03 <zid> yea, fite me irl

00:03 <\Test_User> definitely shouldn't be an elf file :P

00:03 <rwxr-xr-x> cool shit

00:03 <rwxr-xr-x> so much to learn

00:03 <zid> \Test_User: look at the commit

00:03 <rwxr-xr-x> almost overwhelming

00:04 <\Test_User> zid: it's a flat binary though, no?

00:04 <zid> What is the "it"

00:04 <gog> hi

00:04 <zid> hmm my linker script is wrong

00:04 <\Test_User> the output from ld, that you're putting into test.elf

00:05 <zid> no, it's an -melf_i386

00:05 <\Test_User> ahhh

00:05 <\Test_User> I see

00:05 <zid> 4288 bytes, test.bin, I fucked something up here..

00:05 <zid> oh it has oodles of padding

00:07 <\Test_User> any particular reason for elf_i386 when it's a mix of 16-bit and 32-bit code?

00:07 <zid> because there's no such thing as a 16bit elf?

00:07 <zid> and it has 32bit code in it

00:07 <zid> but who cares, it's just a carrier format so I can objdump it

00:07 <zid> objcopy*

00:07 <\Test_User> fair

00:08 <\Test_User> and it actually works right trying to use ld to match 32-bit addressing with 16-bit code?

00:08 <zid> calls/jumps are relative on x86, either way

00:09 <zid> and use the same instructions

00:09 <zid> the linker can't tell the difference

00:09 <zid> if I wanted to do 'dw some_32bit_symbol' i'd be in trouble, yea

00:09 <\Test_User> ah

00:10 <zid> pushed out the fixed version

00:10 <zid> properly works now (I forgot to remove the times db 512-($-$$) dw 0xAA55 part)

00:10 <heat> gog

00:10 <zid> which made the broken linker script look more correct than it was

00:10 <zid> heat: Are you beating my highscore yet? I'm up to 14af

00:11 <heat> no

00:11 <heat> im not having your malicious js running in the background

00:11 <\Test_User> does it error properly if you use more than 510 bytes of code or just pad itself again

00:11 <zid> I think it might? . = 0x200; is in there

00:12 vdamewood has joined #osdev

00:12 <zid> or it might just allocate around it and give you a bigger image idk :P

00:13 <zid> yep

00:13 <\Test_User> ld:link.ld:10 cannot move location counter backwards (from 0000000000007f29 to 0000000000007dfe) seems good

00:13 <zid> cannot move location coun-

00:13 <zid> linekrs are fnu

00:14 <\Test_User> cryptic error but not sure if much better could be done

00:15 <zid> nah cryptic is make

00:15 <zid> I swapped a $ and ( and it was complaining about 5 lines later where I used the symbol next about invalid whitespce or somthing dumb

00:16 <zid> gog: I just let the cat in, I think she had a message for you or something idk? She said Mrroworrrrw

00:17 <\Test_User> lol fair

00:17 <gog> zid: cool tell her i said "bbbbreeew"

00:19 <zid> gog: Any tips for not religiously typing objdump every time I want to dump a section, instead of objcopy?

00:19 <zid> *wall of useage text* *oh I used the wrong one again didn't I*

00:20 <zid> Thank god objdump -j is nonsense rather than "delete all your shit"

00:20 <gog> alias them

00:20 <zid> like sl?

00:20 <gog> alias copysection="objcopy -j"

00:20 <zid> I'd still type objdump

00:21 <gog> retrain yourself

00:21 <\Test_User> alias objdump='kill -9 1'

00:21 <gog> give yourself a little treat every tiem you do it right

00:21 <zid> https://cdn.discordapp.com/attachments/417023075348119556/1041508083177902140/image.png

00:21 <zid> sl works better

00:21 <gog> pat yourself and call yourself a good boy

00:21 <zid> every time you type sl instead of ls, you have to wait for a steam locomotive to go past

00:21 <klange> set up a build pipeline that doesn't suck and never have to remember any of this shit ever again

00:21 <klange> i should package sl

00:22 <zid> I need an inverse alias

00:22 <zid> so that objdump -j runs sl

00:22 <klange> unfortunately it's a curses app and I haven't actually packaged curses in years

00:22 <klange> i should write my own curses

00:30 <heat> write your own curses?

00:30 <heat> you are clinically insane

00:31 rwxr-xr-x has quit [Remote host closed the connection]

00:31 <zid> I write my own curses all the time

00:32 <zid> by which I mean am I am too lazy to use ncurses and just put some escapes into my printfs

00:33 <klange> You should see my editor.

00:36 <gog> i can write my own curses

00:36 <gog> fuck

00:36 <gog> shit

00:36 <gog> see

00:36 <gog> easy

00:36 <zid> those are mine

00:36 <zid> get your own curses

00:37 <zid> with áéúóí

00:37 <gog> helvitis

00:37 <Matt|home> evening.

00:39 <Matt|home> i seem to have some mental block.. or learning disability perhaps, with this topic. i've taken to drawing shitty little diagrams on a whiteboard which were promptly mocked by my more IT experienced relatives

00:39 <Matt|home> im going to switch to crayon instead i think.

00:40 <zid> I thought you had actual brain damage, re a discussion in asm

00:40 <zid> or was that a different matt

00:40 <Matt|home> a different person. i haven't talked in that channel in 2+ years.

00:40 <Matt|home> but thank you.

00:41 <zid> just mistaken identity

00:43 <gog> hi

00:43 <Matt|home> o\

00:45 <klange> I suggest paper and pencil, less easy for others to look at.

00:45 <klange> Actually I suggest paper and erasable ink pens, but that's splitting hairs.

00:49 gog has quit [Ping timeout: 268 seconds]

00:53 <heat> gog

00:58 <zid> it's 1am heat

00:58 <zid> keep it in your pants

00:58 <heat> bazinga

00:58 <heat> oh wait wrong person answered

00:59 <heat> fuck

01:04 <heat> TIL -mrelax-cmpxchg-loop

01:04 <zid> what's that one do

01:04 <heat> Relax cmpxchg loop by emitting an early load and compare before cmpxchg, execute pause if load value is not expected. This reduces excessive cachline bouncing when and works for all atomic logic fetch builtins that generates compare and swap loop.

01:06 <heat> it even has a typo woah

01:07 <zid> get you some cachlines

01:07 <zid> want me to ping jwakely?

01:07 <zid> I've had him fix typos before

01:08 <heat> lol

01:09 <zid> punged

01:09 <zid> doing well today, bug in qemu, bug in gcc

01:10 <heat> LITERALLY BROKEN

01:10 <zid> critical 13.0 stopping bug

01:11 <klange> this is why I only use outdated versions of gcc

01:11 <klange> definitely not because I'm too lazy to port my target support patches forward and rebuild, no sirree...

01:12 MiningMarsh has quit [Read error: Connection reset by peer]

01:13 <klange> You would think I'd have Toaru support mainlined by now in gcc+binutils...

01:14 mavhq has quit [Ping timeout: 252 seconds]

01:16 mavhq has joined #osdev

01:18 <klange> if no one else has bothered to try to upstream my patches, then I don't deserve to have them upstreamed, simple as that

01:30 <kof123> Matt|home: what is the topic that causes such a grave situation as busting out the crayons that professionals use?

01:35 MiningMarsh has joined #osdev

01:43 <geist> heat: huh interseting

01:43 <geist> would be nice if it was more opt in than that maybe

01:44 <geist> ie, a separate builtin to accesss the relaxed version

01:44 <heat> i feel the same way

01:45 <heat> the only way this works is if you were either completely careless in your cmpxchg usage, or if you do a proper test-and-test-and-set everywhere and you don't need to pass -mrelax-cmpxchg-loop

01:46 <geist> and for something like your spinlock implementation you should either reimplement it per arch or already do the right thing

01:46 <heat> if you have test-and-test-and-set everywhere except your i.e spinlocks, your spinlocks get pessimized

01:47 <geist> might be interesting to see what it does with ARM

01:47 <geist> or is it x86 specific?

01:48 <heat> x86 as far as I can see

01:48 <heat> I didn't check the aarch64 options

01:48 vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

01:49 <geist> i guess it having the instruction in the name pretty much means it's x86

01:50 <zid> it's on the x86 manual page

01:51 <zid> which doesn't stop something similar existing for other arches of course, but maybe then it'd be on the generic page instead

01:53 <Matt|home> kof123 : paging, reading up on it now.

01:54 <geist> yah lots of other arches tend to call the same thing 'cas' for compare and swap

01:54 <heat> https://godbolt.org/z/xr1xso4d6

01:54 <geist> armv8.1 included. 68k called it cas too

01:54 <bslsk05> godbolt.org: Compiler Explorer

01:54 <geist> ah thought the switch was in gcc 13+

01:55 <heat> this doesn't seem entirely like a good idea

01:55 <heat> unless that specific cmpxchg gets a lot of contention

01:56 <heat> your fast path takes a load and a cmpxchg

01:57 <Matt|home> is there a modern CPU architecture that doesn't use virtual addressing/have an MMU? apart from the very small ones like what arduino uses

01:57 <Matt|home> i mean like an actual desktop-usage computer

01:58 <Matt|home> or is it ubiquitous

01:58 <geist> Matt|home: no. not for desktop or server class stuff

01:58 <heat> i dont think so

01:58 <Mutabah> Desktop? Not that I know of

01:58 <geist> it's ubiquotous

01:58 <Matt|home> do they all function the same?

01:58 <Mutabah> There are the ARM -M variants

01:58 <heat> define same

01:58 <geist> not precisely,b ut they arrive at the same thing

01:58 rwxr-xr-x has joined #osdev

01:58 <Matt|home> i.e. the kernel sets up the page tables and that's all you have to worry about?

01:58 <geist> no they dont work the same

01:58 <Matt|home> kk

01:58 <Matt|home> so x86 is different than the others

01:58 <heat> I T A N I U M

01:58 <geist> but they have the same result: a translation of fixed size pages from virtual to physical addresses

01:58 <heat> no x86 is very similar

01:58 <geist> Matt|home: yah that's not the same thing from different fro the others

01:59 <Mutabah> x86/ARM are pretty similar in implementation

01:59 <geist> more like there are various patterns that some implementations follow. x86, arm, riscv are fairly similar

01:59 <Mutabah> (lots of little differences, but the same broad approach)

01:59 <geist> there are other strategise that other arches take.

01:59 <Mutabah> Compare to PPC where it's a software-managed TLB (iirc)

01:59 <heat> ITANIU

01:59 <heat> M

01:59 <geist> no PPC uses a hash table, but there are software managed things

01:59 <Matt|home> let me rephrase the question: if you were designing a kernel for two different architectures, is the leap between setting up paging very difficult to jump across or is it similar enough that it won't add six hours of reading material

01:59 vdamewood has joined #osdev

01:59 <klange> most of my mmu code between aarch64 and x86-64 is the same; enough that I should really consolidate it

02:00 <geist> Matt|home: very different. basically yo need to abstract the whole mmu into architecturally dependent code

02:00 <Matt|home> ok

02:00 <Matt|home> thank you

02:00 <geist> and then abstract it out

02:00 <CompanionCube> geist: i believe newer ppc has an additional radix mode

02:00 <heat> see bsd pmap

02:01 <geist> but your questions are in the right direction: ignoring the details of how each arch does the translation, what is the set of features the translations support

02:01 <geist> that's what you design your overall api for, and your virtual memory system around

02:01 <geist> and in that case it's pretty much standardized on basically the same set of features, plus or minus some

02:01 <heat> fwiw linux always assumes there are page tables, even when there's no such thing

02:02 <geist> yes and linux is the weird outlyer

02:02 <geist> that's highly page table centric and if you're not using page tables? (ie ppc) then too bad for yoy

02:02 <heat> i'm fairly sure they still maintain page tables

02:02 <heat> (on non pt architectures)

02:03 <geist> they do: you have to maintain it because it's what the upper VM does, but then at the arch level you end up translating it

02:03 <geist> into the PPC hash table, or in the case of the SW TLB you probably take a fault and then walk the upper level page table with software

02:04 <heat> https://twitter.com/itanium_guy/status/1582780728882114562

02:04 <bslsk05> twitter: <itanium_guy> Once you think you understand all about MMUs because you could figure out how self-mapping works... move to the next level: ␤ On Itanium, the VHPT Walker memory accesses (the ones that fetch PTEs) themselves go through the MMU... meaning walking is done with Virtual addresses. 🤪

02:04 <heat> what

02:05 <geist> problem with trying to unify page tables logic between x86, arm, and riscv is they eem to be prtty close, but the subtle details matter

02:05 <geist> so you possibly end up with huge piles of conditionals

02:05 <geist> heat: yep! i thought you had dug into it to figure that part out?

02:05 <heat> i don't remember this bit specifically

02:05 <geist> VHPT on ia64 is a weird beast. far as i know, aside from maybe VAX, there are no arches that do virtual page tables like that

02:06 <heat> i assume you need to insert the tlb entry for the vhpt manually

02:06 <geist> i think you do

02:06 <geist> or... you must always be able to handle a software tlb fault

02:06 <zid> wow, tick-tock-clock is now 0 A presses

02:06 <geist> and you can use that to insert the root of the VHPT

02:07 <geist> i think the idea is the sw tlb fault doseen't happen that often, so it's an acceptable outcome

02:07 <geist> i *think* in practice linux at east endedup deciding to not use the VHPT but it uses the Other Method, which i forget what its called, but it's functionally similar to POWER/PPC

02:07 heat has quit [Remote host closed the connection]

02:08 <geist> but i also think you can configure it per each 8 regions (bit [63:31] of the virtual address) so you can choose where to use either method. i think

02:08 heat has joined #osdev

02:09 <heat> we didn't deserve itanium

02:10 <geist> note VAX actually does someting kinda like this too: virtual space is statically carved up into 0-2GB (user) and 2GB+ is kernel. the kernel has a linear page table (just one flat thingm one entry per page) that lives in physical space

02:10 <geist> and each process has its own linear page table (0 ... length of page table) that lives in kernel space

02:11 <geist> so as the cpu is fetching the TLB for user pages it actually reads through the kernels virtual address space

02:11 <geist> so you can build sparse user page tables that way

02:11 <heat> isn't this very inefficient?

02:11 <geist> probably!

02:11 <heat> even just wrt memory usage

02:11 rwxr-xr-x has quit [Read error: Connection reset by peer]

02:12 <geist> ot really. remember this was the era when procsesses acted more in the sbrk() style. started from around 0 and grew upwards

02:12 <geist> so since when you loaded the user page table you set the base address + length, you only allocated as much table as you needed, and grew it oer time

02:12 <geist> and/or used the kernel page tables to provide a large sled of zero pages off the end of user

02:12 <geist> it is a kinda interesting problem to solve

02:12 <heat> good point

02:12 <heat> although they came up with mmap during VAX

02:12 <geist> also it was i an era when the cpu iterally had 4MB *max* memory

02:12 <geist> though later vaxen got up to 128MB or so

02:13 <heat> hmmmmm

02:13 <geist> so i think the vast majority of prcesses were very small, and clustered around 0

02:13 <geist> i think the big mistake they made with VAX was page size was 512 bytes, which i think quickly became too small

02:14 <heat> through my shitty calculations I'm getting 2MiB just for the kernel's page table

02:14 <geist> but hey, 1977. was pretty sophisticated at the time

02:14 <geist> heat: depens o how big you made the kernel. you also sized the kernel based on how much size yo uthink you needed

02:14 <geist> iirc in netbsd it'ssome percentage of total system memory

02:15 <geist> but functionally POWER/PPC has that problem too in a different way: you burn N bytes of physical memory to store the one page table, which must be physically contigous at boot

02:17 <geist> so in that sense intels radix tree when they added page tables was one of the few things they didn't outright copy from VAX. OTOH i dont think they invented it. other arches were doing that style too

02:17 <geist> btween like 1977 (vax) and 1985 (i386)

02:20 <Matt|home> "in x86 the kernel will load itself somewhere near the beginning of memory but it'll map itself closer to the end of virtual space" <-- is there a reason for this? seems more convenient to have a 1 to 1 mapping imo

02:20 <heat> i guess doing something like this for the kernel would be pretty smart?

02:20 <heat> sparse page table covering the whole range

02:20 <heat> Matt|home, yes, abi stability

02:21 <Matt|home> what's that mean, just cuz everyone else does it you should too?

02:21 <heat> if you map yourself at 1MiB and a user program wants to load itself at 3MiB, it means that you can't grow over 2MiB or you broke the ABI

02:21 <geist> Matt|home: well more ike they usually do it for a reason. so you figure out what the reason is

02:22 <geist> many times it's convenience and speed, or soetimes it's less obvious until later

02:22 <geist> but yes putting the kernel 'up high' on a 32bit system is basically standardized, for the reason heat described

02:24 [itchyjunk] has quit [Ping timeout: 260 seconds]

02:24 [_] has joined #osdev

02:38 * Matt|home slams his head on the desk

02:42 * heat slams his desk on the hand

02:44 Phytolizer has joined #osdev

02:44 Phytolizer has quit [Client Quit]

02:46 Phytolizer has joined #osdev

02:46 * klange smacks Matt|home around a bit with a large trout.

02:46 <Matt|home> thank you..

02:49 * Matt|home sticks a sharp pencil in his eye and pushes.. okay.

02:49 <Matt|home> "each process is given it's own individual page table" <--true or false..

02:50 Phytolizer has quit [Client Quit]

02:54 <heat> semi-true

02:54 <heat> some processes share page tables sometimes

02:54 <heat> but yes, 99% of processes have separate page tables

02:55 <Matt|home> e

02:55 <Matt|home> page directory

02:55 <Matt|home> directory not table

02:55 <Matt|home> see im already fucking stupid

02:55 <heat> (linux, and most other operating systems, have the possibility to create processes which share more or less stuff, one of those being the address space. but in practice most processes don't share anything)

02:55 <heat> hm?

02:57 <Matt|home> ... if i paypal you fifty dollars will you walk me through this until i understand it..

02:57 <heat> i dont want $50

02:57 <Mutabah> Just keep asking here, people will help

02:58 <heat> mostly because we don't use USD here

02:58 <Matt|home> i assure you that wherever on earth you are 50 USD is worth more than 50 of whatever currency you use..

02:58 <Matt|home> thanks inflation..

02:59 <Matt|home> okay look.. so here's what im sussing out.

03:00 <Matt|home> for x86.. each process is given it's own unique page directory, according to my notes (presumably im talking about linux). so if linux has a hard limit on the number of processes, which iirc is 512, that means there are 512 unique page directory entries correct?

03:00 <heat> sure thing but we still don't use usd here

03:00 <heat> linux doesn't have a hard limit on the number of processes

03:00 <heat> no

03:00 <Matt|home> i can link you to a paper i am currently reading that says otherwise

03:00 <heat> if you have 512 processes, you'll have roughly 512 page directories

03:00 <heat> sure

03:01 <zid> unique bottom halves, like weird minotaurs

03:01 <Matt|home> https://tldp.org/LDP/tlk/kernel/processes.html <-- let me find the exact sentence

03:01 <bslsk05> tldp.org <no title>

03:01 <klange> That document is from 1999.

03:02 <heat> btw euro are more valuable than usd

03:02 <Matt|home> This means that the maximum number of processes in the system is limited by the size of the task vector; by default it has 512 entries. As processes are created, a new task_struct is allocated from system memory and added into the task vector. To make it easy to find, the current, running, process is pointed to by the current pointer.

03:02 <heat> this is not true

03:03 <Matt|home> great..

03:03 <Matt|home> more confusion..

03:03 <klange> That is very ancient information.

03:03 <heat> 1996-1999 David A Rusling

03:03 <kazinsal> holy moly that's an old piece of paper

03:04 <Matt|home> yeah it's a little frustrating when every document im trying to read up on a subject is apparently subject to being out of date with incorrect information :\

03:04 xenos1984 has quit [Read error: Connection reset by peer]

03:04 <kazinsal> "Linux is a moving target; this book is based upon the current, stable, 2.0.33 sources as those are what most individuals and companies are now using."

03:04 <heat> please ask questions

03:04 <kazinsal> mah goodness

03:04 <heat> LMAO

03:04 <Mutabah> Try the osdev wiki

03:04 <heat> i don't recommend the osdev wiki

03:04 <Mutabah> Also, linux does some questionable things in the name of speed

03:05 <heat> but it's still better than this

03:05 <Matt|home> yeah i give up. my brain just doesn't work. im going for a walk

03:05 <Mutabah> And if you want a tutorial - https://os.phil-opp.com/

03:05 <bslsk05> os.phil-opp.com: Writing an OS in Rust

03:05 <heat> okay

03:05 <Mutabah> Sure it's rust - but it's the newest/best OSDev tutorial out there

03:06 <kazinsal> the chapter on paging in that one is one of the better explanations I've seen of x86-64 paging

03:06 <heat> i should write a tutorial some day

03:16 <klange> I should write a book in the style of Tanenbaum's Minix book.

03:21 <heat> commentary on UNIX v6 but it's about Onyx

03:21 <heat> ... that would actually be an interesting book lol

03:22 xenos1984 has joined #osdev

03:28 vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

03:34 [_] has quit [Read error: Connection reset by peer]

03:34 <zid> What do you think of my processes as centaurs theory though

03:34 <kof123> (hold on zid, i was already typing) i tend to practice that the way to eat elephants is one elephant at a time. if i dont understand something, working on another component is acceptable, just so long as some progress is being made on one of the elephants

03:34 <kof123> short answer yes

03:35 <kof123> long answer: take that "kabbalah of os" graphic or whatever it was, and make a picture puzzle book lol

03:37 <heat> zid, not horrible

04:20 genpaku has quit [Remote host closed the connection]

04:24 genpaku has joined #osdev

04:26 MiningMarsh has quit [Quit: ZNC 1.8.2 - https://znc.in]

04:41 srjek|home has quit [Ping timeout: 256 seconds]

04:59 MiningMarsh has joined #osdev

05:05 gxt has quit [Remote host closed the connection]

05:06 gxt has joined #osdev

05:08 bradd has joined #osdev

05:35 chartreuse has quit [Ping timeout: 260 seconds]

05:43 heat has quit [Ping timeout: 260 seconds]

05:43 bgs has joined #osdev

05:58 MiningMarsh has quit [Quit: ZNC 1.8.2 - https://znc.in]

06:03 gxt has quit [Remote host closed the connection]

06:04 gxt has joined #osdev

06:07 bgs has quit [Remote host closed the connection]

06:13 eroux has quit [Ping timeout: 248 seconds]

06:15 MiningMarsh has joined #osdev

06:16 eroux has joined #osdev

06:22 Burgundy has joined #osdev

06:26 jjuran has quit [Ping timeout: 260 seconds]

06:31 jjuran has joined #osdev

06:36 jjuran has quit [Quit: Killing Colloquy first, before it kills me…]

06:37 bauen1 has quit [Ping timeout: 268 seconds]

06:37 jjuran has joined #osdev

06:48 Burgundy has quit [Ping timeout: 268 seconds]

07:21 <mrvn> Matt|home: Beware that linux didn't have threads. So when they added threads they did that by allowing processes to share an address space (and other namespaces). So it's all a bit confuising.

07:27 <mrvn> There are also schemes (not linux but generally) to (re)create page tables as needed. You have a fixed number of pages for page tables and you create them from the address space objects when processes page fault in an LRU fashion. Similar to swapping memory in/out you swap page tables.

07:33 bauen1 has joined #osdev

07:39 <Matt|home> thanks. i guess i just have trouble with abstract stuff

07:40 <Mutabah> Matt|home: https://os.phil-opp.com/paging-introduction/ - This is an excellent introduction to x86_64 paging

07:40 <bslsk05> os.phil-opp.com: Introduction to Paging | Writing an OS in Rust

07:40 <Matt|home> bookmarked thanks

07:41 <Mutabah> and to the concept in general

07:47 bauen1 has quit [Ping timeout: 260 seconds]

07:59 bauen1 has joined #osdev

08:32 joe9 has quit [Ping timeout: 260 seconds]

08:34 joe9 has joined #osdev

08:45 Burgundy has joined #osdev

08:48 bauen1 has quit [Ping timeout: 240 seconds]

08:50 bauen1 has joined #osdev

08:57 Burgundy has quit [Ping timeout: 240 seconds]

08:58 vdamewood has joined #osdev

09:01 vinleod has joined #osdev

09:01 SGautam has joined #osdev

09:01 vinleod is now known as vdamewood

09:01 vdamewood has quit [Killed (tantalum.libera.chat (Nickname regained by services))]

09:16 MiningMarsh has quit [Quit: ZNC 1.8.2 - https://znc.in]

09:17 bauen1 has quit [Ping timeout: 268 seconds]

09:17 bauen1 has joined #osdev

09:22 bauen1 has quit [Ping timeout: 256 seconds]

09:23 bauen1 has joined #osdev

09:34 MiningMarsh has joined #osdev

09:47 bauen1 has quit [Ping timeout: 268 seconds]

09:49 bauen1 has joined #osdev

10:22 GeDaMo has joined #osdev

10:44 bauen1 has quit [Ping timeout: 256 seconds]

10:45 bauen1 has joined #osdev

11:09 SGautam has quit [Quit: Connection closed for inactivity]

11:11 gildasio has quit [Remote host closed the connection]

11:14 gildasio has joined #osdev

11:36 gildasio has quit [Ping timeout: 255 seconds]

11:40 gxt has quit [Ping timeout: 255 seconds]

11:43 gxt has joined #osdev

11:46 bauen1 has quit [Ping timeout: 268 seconds]

11:48 bauen1 has joined #osdev

12:03 Burgundy has joined #osdev

12:07 jafarlihi has joined #osdev

12:07 <jafarlihi> How are you supposed to use gprof with app that gets OOMd? It never saves the gmon.out since it gets killed before finishing

12:07 <Mutabah> Solve the OOM first?

12:07 <jafarlihi> I need the profile info to know what to solve

12:10 <Mutabah> "cleanly" quit early?

12:10 <jafarlihi> Oh yeah, that might work, thanks

12:13 <kazinsal> if you know you get 5 minutes in before dying of OOM, quit cleanly after 4 minutes and check the memory info

12:15 bradd has quit [Ping timeout: 268 seconds]

12:23 bauen1 has quit [Ping timeout: 260 seconds]

12:23 bauen1 has joined #osdev

12:41 epony has quit [Quit: QUIT]

12:47 gxt has quit [Remote host closed the connection]

12:47 gxt has joined #osdev

12:56 jafarlihi has quit [Quit: WeeChat 3.7.1]

13:10 epony has joined #osdev

13:11 bauen1 has quit [Ping timeout: 256 seconds]

13:13 bauen1 has joined #osdev

13:17 gxt has quit [Ping timeout: 255 seconds]

13:19 <mrvn> does valgrind survive a OOM kill?

13:21 bauen1 has quit [Ping timeout: 260 seconds]

13:22 gxt has joined #osdev

13:30 MiningMarsh has quit [Quit: ZNC 1.8.2 - https://znc.in]

13:34 <zid> oh heat isn't here

13:34 <zid> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107676 for him later

13:34 <bslsk05> gcc.gnu.org: 107676 – Nonsensical docs for -mrelax-cmpxchg-loop

13:34 <mjg> lol title

13:35 <mjg> also this is bullshit

13:35 <mjg> pre-load *reduces* performance in face of multiple threads doing the owrk

13:37 <mjg> aand the intel doc linked in the commit does not corobarate what this person is saying

13:38 <mjg> while (__atomic_sub_fetch(&lock, 1, __ATOMIC_ACQUIRE) < 0) {

13:38 <mjg> do _mm_pause(); while (__atomic_load_n(&lock, __ATOMIC_ACQUIRE) != 1);

13:38 <mjg> }

13:38 <\Test_User> if you disable memory overcommit it can't be oom-killed

13:38 <mjg> it literally performs the atomic fucking op upfront

13:38 <zid> what is mm_pause btw

13:39 <mjg> presumably the pause instruction

13:39 <mjg> ye, it is

13:39 <zid> https://www.felixcloutier.com/x86/pause.html til

13:39 <bslsk05> www.felixcloutier.com: PAUSE — Spin Loop Hint

13:39 <mjg> oh?

13:39 <mjg> welp man

13:39 <mjg> not a smp person, are you

13:40 <zid> I'm not dumb enough to think I know enough to write my own high performance intrinsics

13:40 <mjg> the description above misses big part of the point of pause

13:40 <mjg> and that is to hopefully chill for long enough(tm) for whoever owns the lock to release it

13:41 <mjg> as in you try to mess with that cpu as little as possible

13:42 <zid> the one on the outside is to unpessimise?

13:42 <zid> try it, if it fails, do the pause+loop

13:42 <mjg> what

13:42 <mjg> in the linked bugzilla? that example is *wrong*

13:42 <zid> the one you just said

13:42 <zid> there is no one linked int he bugzilla

13:43 <mjg> the one i pasted does not have pause "outside" of anything

13:43 <zid> oh that's a clickable git SHA1, til

13:43 <zid> I didn't say pause outside

13:43 <mjg> > 14:42 < zid> the one on the outside is to unpessimise?

13:43 <mjg> wha'ts outside

13:43 <mjg> the atomic then?

13:43 <zid> yours is cmpxchg(); while(1){ pause(); cmpxcgh(); }

13:43 <zid> roughly

13:44 <zid> which seems to my naive face like it might be better than while(1){ pause(); cmpxchg(); } in the case where the lock isn't contested

13:44 <zid> aka unpessimise

13:44 <mjg> for single-threaded, it is WAY faster to start with cmpxchg

13:45 <zid> yes, I assumed so

13:45 <mjg> but it also happens to be faster for the multithreaded case

13:45 <zid> yes, I assumed so

13:45 <mjg> ultimately perf is mostly affected by what you do should the initial attempt fail

13:45 <zid> This seems like a lot of words to say "yes"

13:45 <mjg> bare minimum is to pause + re-read and check if it appears free

13:46 <mjg> a not higher than that is to speculatively backoff

13:46 <mjg> the mots common idea is to do exponential with an upper limit

13:46 <mjg> so 1 pause, then 2, 4 and so on

13:47 <mjg> this runs into problems of its own but it tends to beg ood enough(tm) at small (say < 50-ish threads) scale

13:47 <mjg> problems being potential starvation

13:47 MiningMarsh has joined #osdev

13:48 <mjg> where is that fucking paper

13:48 bauen1 has joined #osdev

13:48 <mjg> zid: read this motherfucker https://www.cs.rochester.edu/u/scott/papers/1991_TOCS_synch.pdf

13:49 <mjg> zid: fundamental concepts remain applicable today

13:49 <zid> no

13:49 <zid> wtf

13:49 <zid> I just asked if it unpessmised because this isn't my field, and I trusted you

13:49 <zid> but you gave me 4 pages of bullshit waffle instead, I am not your bro heat

13:50 <mjg> :[

13:50 Burgundy has left #osdev [#osdev]

13:50 <mjg> ok, how about this then

13:50 <mjg> don't trust the gcc doc

13:50 <mjg> kthxitsall

13:51 <zid> I am not trying to trust it

13:51 <zid> I am trying to write a fucking description of it

13:52 <zid> "Unpessimise cmpxcgh loops by hoisting one outside of the loop, and insert a pause inside the loop to relax cpu power use" or something idk

13:54 <mrvn> That loop could also be written as while(1){ if (cmpxchg()) break; pause(); }

13:55 <mjg> that remains highly pessimal

13:55 <mrvn> it's all the same code just written differently.

13:55 <zid> as in, it's worse than not doing it

13:55 <zid> or as in, you're a pro and you can do better

13:55 <mjg> mrvn: the code in the commit https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=4d281ff7ddd8f6365943c0a622107f92315bb8a6

13:55 <bslsk05> gcc.gnu.org: gcc.gnu.org Git - gcc.git/commit

13:55 <mjg> performs a pre-read

13:55 <mrvn> What some people do is to do a non-atomic check first

13:56 <mjg> and another ead after the pause

13:56 <mrvn> mjg: the pasted code doesn't do a read, it does a write

13:56 <mjg> movl (%rdi), %ecx

13:57 <mjg> ..

13:57 <mjg> lock cmpxchgl %edx, (%rdi)

13:57 <mjg> it totally pre-reads the target

13:57 bauen1 has quit [Ping timeout: 268 seconds]

13:57 <mrvn> I'm talking about 14:38 < mjg> while (__atomic_sub_fetch(&lock, 1, __ATOMIC_ACQUIRE) < 0) {

13:58 <mjg> ok

13:58 <mjg> i was talking about the commit

13:58 <mjg> i did note the code from the paper which the comment references *does not do* what the commit claims

13:59 <mrvn> ok, so what are you trying to say? The commit is good? The commit references a bad pdf?

14:00 <mjg> the pdf is fine, at least the part referenced in the commit

14:00 <mjg> the commit *contradicts* the pdf and is wrong

14:00 <mrvn> mjg: Not really, if the pasted code was from the commit.

14:01 <zid> as in the broken chinese contradicts it, or the actual change to gcc contradicts it and the option is worse and useless?

14:01 <mrvn> from the pdf I mean

14:01 <mrvn> gcc does:

14:01 <mrvn> while (__atomic_sub_fetch(&lock, 1, __ATOMIC_ACQUIRE) < 0) { }

14:01 <mrvn> the pdf adds the pause loop

14:01 <mjg> no, according to https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=4d281ff7ddd8f6365943c0a622107f92315bb8a6

14:01 <zid> 'no' what?

14:01 <mjg> gcc rolls wiht a load

14:01 <mjg> movl (%rdi), %ecx

14:02 <zid> oh that was probably to mrvn

14:02 <mjg> > To relax above loop, GCC should first emit a normal load, check and jump to

14:02 <mjg> .L2 if cmpxchgl may fail.

14:02 <mrvn> mjg: that isn't the problem. It always does a lock cmpxchgl

14:02 <mjg> it only does after the initial load

14:02 <zid> [14:00] <mjg> the pdf is fine, at least the part referenced in the commit

14:02 <zid> [14:00] <mjg> the commit *contradicts* the pdf and is wrong

14:02 <zid> [14:01] <zid> as in the broken chinese contradicts it, or the actual change to gcc contradicts it and the option is worse and useless?

14:02 <mrvn> mjg: yes, but it always does it

14:02 <mjg> which *hurts* performance

14:04 bauen1 has joined #osdev

14:04 <mrvn> mjg: the original gcc code always does a "lock cmpxchgl", the pdf adds a "pause" loop when cmpxchgl fails, the patch in the commit adds the pause loop but also only does a "lock cmpxchgl" when the read says it will succeed

14:04 <mrvn> So to me it seems it is one step better than the pdf.

14:04 <mjg> i keep saying this load from the get go is pessimal

14:04 <mjg> before anything happens

14:05 <mjg> and is not what the pdf recommends either

14:05 * zid refuses to get drawn in

14:05 <mjg> it is pessimal *both* single and multithreaded

14:05 <zid> as in the broken chinese contradicts it, or the actual change to gcc contradicts it and the option is worse and useless?

14:06 <mjg> the option as implemented is bad, but should it move the the standard model a'la what's seen in the pdf

14:06 <mjg> it would be fine

14:06 <mrvn> mjg: haeh? How would you avoid the read? The code does "x |= 1;". That's a read-modify-write. No way to not read.

14:06 <mjg> as is it may happen to help or slow things down

14:07 <mjg> movl %eax, %edx

14:07 <mjg> orl %esi, %edx

14:07 <mjg> the found value is *not* used when computing the vlaue to be set

14:07 <mjg> lock cmpxchgl %edx, (%rdi)

14:07 <mjg> uh, did not paste the initial read: movl (%rdi), %ecx

14:08 <mrvn> mjg: oh, in the patched code. That seems wrong.

14:08 <mrvn> cmpl %eax, %ecx <---- there it is used

14:09 <mrvn> but %eax doesn't seem to be initialized by that code.

14:09 <mjg> yes, it is used to skip cmpxchg

14:09 <mjg> i assumed %eax was initialized elsewhere

14:09 <mrvn> In the original (bad) gcc code %eax was initialized

14:09 <mjg> will ahve to write a toy sample later

14:09 gxt has quit [Ping timeout: 255 seconds]

14:10 <mrvn> "movl v(%rip), %eax" seem to have been lost

14:10 <mjg> oh, it is atomic_fetch_or et al, so they can't bts

14:11 <mjg> now i'm curious what clang is doing

14:11 <mrvn> yes, or, xore, and, nand as atomic read-modify-write. Can't avoid the read.

14:11 <mjg> will have to get back to it in 1h or so

14:11 gxt has joined #osdev

14:12 <zid> Thanks for the.. help? I guess? *cries*

14:13 bauen1 has quit [Ping timeout: 260 seconds]

14:13 bauen1 has joined #osdev

14:13 <mrvn> mjg: what I find more worrysome is that the patched code as shown is an infinite loop. If cmpxchgl returns "ne" then it tries again. Otherwise it does "rep nop" and tries again. Nowhere does it exit the loop.

14:14 <mjg> ye the code as pasted is definitley not what is normally generated

14:14 <mjg> i would hope anyway

14:15 <mrvn> I think the shown code is just a badly stitched together fragment of the generated code.

14:15 <mrvn> Should never pass the test cases as shown.

14:16 <mjg> agreed

14:17 <mrvn> check what gcc and clang actually generate as code now.

14:18 <mjg> 15:11 < mjg> now i'm curious what clang is doing

14:18 <mjg> 15:11 < mjg> will have to get back to it in 1h or so

14:48 <mrvn> "We are undercover tactical nuns."

15:04 <sbalmos> in the right hands, those wooden rulers can be deadly

15:08 bgs has joined #osdev

15:09 rwxr-xr-x has joined #osdev

15:13 <mjg> so, despite the name of the gcc opt it is apparently only of significance for the loops explicitly implemented by the compiler for atomic_fetch_* primitives

15:14 <mjg> for that particular usecase the spinlock-related reasoning does not apply

15:15 <mjg> failing cmpxchg grants you exclusive access to the cache line

15:15 <mjg> if the only thing you want to do is to slap some value into it and gtfo, pause *reduces* performance

15:17 <mjg> so tl;dr it is wrong, but for a different reason than i inititally thought

15:17 dennis95 has quit [Ping timeout: 255 seconds]

15:19 <mjg> as in the doc talks about spinlocks and i blindly assumed the patched code is spinlock-y in nature

15:19 <mjg> which it is not

15:20 <mrvn> huh? pause makes it so other threads can make progress while you are stuck in a loop

15:21 <mrvn> If you have contention the cache line will bounce around cores all the time using up 99.99% of the bandwidth

15:21 <mjg> dude

15:21 epony has quit [Ping timeout: 268 seconds]

15:21 <mjg> for a case like the above, where you want to slap a bit into it and leave, it is pause which *adds* bouncing

15:22 <mjg> because

15:22 <mjg> > 16:15 < mjg> failing cmpxchg grants you exclusive access to the cache line

15:22 <mjg> which you can immediately take advantage of

15:22 <mjg> if you pause and there is other traffic, you just lost the E status

15:22 <mrvn> you mean when you just want to set a bit you don't want to loose the cache line before you retry?

15:23 <mrvn> ok, that makes sense.

15:24 <mrvn> But consider the case of 64 cores all wanting to set a bit. They will all be failing and retrying N^2 times

15:24 <mrvn> what you would want is a random amount of delay between retries

15:26 <mjg> this is true on arch

15:26 <mjg> erm arm64

15:26 <mjg> it is not true on amd64

15:26 <mrvn> mjg: if 2 cores do cmpxchg don't you loose the E status to the second core imediately?

15:28 <mrvn> or does speculative execution happen and the pipeline sees you do the cmpxchg on the same address again and keeps the cache line locked?

15:30 <mrvn> It sounds like you are banking on the later and then each core would succeed on the second try.

15:31 <mjg> you keep losing it as others cmpxchg to some degree, but there is apparent optimizaiton concerning this in uarchs

15:31 <mjg> you get yourself a real-world loop which has to - say - inc/dec by 1 as long as before/after is not 0

15:32 <mjg> you slap pause() into it and performance goes down

15:32 MiningMarsh has quit [Quit: ZNC 1.8.2 - https://znc.in]

15:32 <mrvn> With pause() the cache line definetly bounces.

15:32 dude12312414 has joined #osdev

15:33 <mrvn> but in a bandwidth friendly way. :)

15:35 <mrvn> I could definetly agree that decision wether to pause or not pause is something that depends on more context. The lowlevel op can be used many ways and only some benefit from pause while others get hurt by it.

15:37 <mrvn> worse if you contradict the uarch optimization

15:37 <mjg> last time i benchmarked this specifically was with 80 threads

15:37 <mjg> but i don't remember the numbers, apart from a win from NOT pausing

15:38 <mrvn> the good example for pause is a lock where the pause lets other threads do their work and release the lock.

15:38 <mjg> that's right

15:39 <mjg> i confused myself with the gcc committer referencing a spinlock implementation when doing something unrelated

15:39 <mrvn> Or in general pause helps if other threads aren't doing cmpxchg at the same time.

15:40 <mrvn> slows down the cmpxchg but speeds up everything else.

15:42 <mrvn> But as with so many micro benchmarks the effect is probably less than the noise. You can certainly find enough examples for it slowing things and speeding up things.

15:43 <mrvn> At least the gcc thing is an option. You can use it where it helps and otherwise just not give the option.

15:43 <mjg> the effect is very real when the target object is refcounted in this manner for example and slapped a lot

15:43 <mjg> as in you will see it in real workloads

15:44 <mrvn> refcounted? If the uarch holds the E status then pause would slow things down.

15:45 <mjg> reference counted

15:49 <mrvn> the way I can see pause helping is when the refcount is increased and then decreased before the pause in other thread completes and therefore prevents the cache line from bouncing. You would have to hammer the refcount for that to happen. On the other hand if you have many threads accessing the refcount at the same time (e.g. woken up by a condition) but then leaving it alone then pause should hurt.

15:50 <mjg> it always hurts for this case on amd64

15:51 dennis95 has joined #osdev

15:51 <mrvn> I'm happy that I don't have threads or shared memory in my kernel and this scenario basically can't come up. At least not outside the microkernel IPC mechanism.

15:58 srjek|home has joined #osdev

16:19 nur has joined #osdev

16:21 MiningMarsh has joined #osdev

16:22 Terlisimo has quit [Quit: Connection reset by beer]

16:25 Terlisimo has joined #osdev

16:37 bgs has quit [Remote host closed the connection]

16:38 LostFrog is now known as PapaFrog

16:38 rwxr-xr-x has quit [Remote host closed the connection]

16:42 epony has joined #osdev

16:47 Dyskos has joined #osdev

16:48 heat has joined #osdev

16:55 <heat> linux just got rid of the red black tree in mm_struct

16:55 <heat> it's now a maple tree

17:05 <mrvn> What's a maple tree? Google only finds nature links.

17:05 <mrvn> (and a canadian flag)

17:06 <heat> https://lwn.net/Articles/845507/

17:06 <bslsk05> lwn.net: Introducing maple trees [LWN.net]

17:06 <heat> and https://www.youtube.com/watch?v=XwukyRAL7WQ

17:06 <bslsk05> 'The Linux Maple Tree - Matthew Wilcox, Oracle' by The Linux Foundation (00:39:02)

17:15 <mrvn> reads like it's a B-tree with locks for inner nodes and RCU for leaves.

17:16 <mrvn> So theoretically nothing new but lots of fiddly bits to implement the API needed for VMAs.

17:16 <mrvn> interesting mix of locked and lockless if I'm reading it right.

17:26 netbsduser has joined #osdev

17:29 xenos1984 has quit [Ping timeout: 256 seconds]

17:30 xenos1984 has joined #osdev

17:33 dude12312414 has quit [Remote host closed the connection]

17:35 dude12312414 has joined #osdev

17:39 <vdamewood> Maple trees are for saps.

17:39 <zid> Still waiting on Elm Tree

17:39 <vdamewood> What a nightmare.

17:49 <sbalmos> at least it's not a sweet gum tree with those spiky ball seeds

17:50 Dyyskos has joined #osdev

17:53 Dyskos has quit [Ping timeout: 260 seconds]

17:55 gog has joined #osdev

18:09 dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]

18:10 Dyyskos has quit [Quit: Leaving]

18:12 poyking16 has joined #osdev

18:17 gog has quit [Quit: byee]

18:18 xenos1984 has quit [Ping timeout: 260 seconds]

18:18 gog has joined #osdev

18:21 eroux has quit [Ping timeout: 260 seconds]

18:24 eroux has joined #osdev

18:30 poyking16 has quit [Quit: WeeChat 3.6]

18:33 xenos1984 has joined #osdev

18:59 <geist> always thought that b-tree like structures should be more used in data structures like that

19:03 <mrvn> having nodes be a cache line or two makes a lot of sense.

19:04 <mrvn> 4 sounds like a lot. But you have to balance locking with cache lines I guess.

19:06 <gog> this reminds me, i was going to improve my virtual memory allocator

19:06 <gog> not with a b-tree, but it's a tree

19:06 <gog> just like everything else in my "kernel" it has an inconsistent interface

19:07 <GeDaMo> Consistently inconsistent :P

19:07 <gog> yes

19:07 <gog> that's me in a nutshell

19:07 <gog> but also i want to hide the implementation a little better

19:07 <mrvn> The address space allocator or the mapping code?

19:07 <gog> allocator

19:08 <gog> the mapping code needs help too

19:08 <gog> it's all a big mess

19:08 <gog> my life is too so

19:11 <mrvn> gog: I'm still just searching the page table. B-tree enough?

19:12 <gog> i use an rbtree with a key of (base, length)

19:12 <gog> so it's fairly fast to see if an address is allocated

19:12 <gog> the comparator just checks if there's any overlap

19:13 <gog> if there is, then it checks the sucessor node

19:13 <gog> there's some low-hanging optimizations in there too

19:13 <gog> probably another tree with the same keys of free ranges

19:14 <gog> i think geist said something about that months ago when i was initially implementing it

19:14 <mrvn> I have the address past the last allocation and then I just check the page table for the next gap of N+1 pages.

19:14 <gog> that works too, but when the address space gets fragmented it'll slow down over time

19:15 <gog> but you can then just keep lists of free regions of 2^n pages

19:15 <mrvn> Not really. In 64bit the size of the address space is so much bigger than the ram that barely anything will be used.

19:15 <gog> oh yeah

19:15 <geist> well, not so fast. you can pretty easily fragment the crap out of a 64bit aspace

19:16 <mrvn> First time you would even notice fragmentation is when you used up 512Exabyte of ram. That's a lot of alloc and free.

19:16 <geist> especially when at leat given current hrdware it's really more like 47 or 48

19:16 <geist> also consider: ASLR

19:16 <gog> yes

19:16 <gog> i was about to point out most 64 bit impls are 48 bits or fewer

19:16 <mrvn> geist: You can have a ton of 4k pages allocated but the remaining free space will be magnitudes larger.

19:17 <geist> no doy. but that's not fragmentation

19:17 <mrvn> If your pages are close together an alloc might have to skip over all of them but then it will reach a huge hole for the next few million allocs. If the pages are further apart it will often find a hole while skipping. So I think it balances.

19:18 <mrvn> Note: I don't have mmap so you can't spam the address space with lots of mapped pages.

19:20 <mrvn> geist: Would you make allocations return randomized addresses instead of going round-robin?

19:20 <geist> depends. personally i'd like the latter, eve if doing ASLR (i think that's functionally what linux, etc does)

19:21 <geist> but in fuhsia we basically go full on random

19:21 <geist> so the aspace over time gets completely shotgunned with random allocations

19:21 <geist> for better or worse

19:22 <mrvn> The biggest drawback I've noticed is that my page tables can become big. You can leave 1 page allocated in a leaf and it will keep all 4 levels of the page tables locked in memory.

19:22 <gog> and that's when a fast way to look up (base, length) pairs comes in handy

19:23 <gog> if you've got holes everywhere in the aspace and you're just picking one at random

19:23 <mrvn> So 8-12k overhead per allocated page.

19:23 <geist> yup

19:24 <geist> though it's ot exactly 8-12k per page

19:24 <geist> but more like 8-12k per allocation, since neighboring pages mostly take advantage of thes ame page tables, etc

19:24 <mrvn> gog: you can have as many holes as allocations. With 8 GB ram that's max 2 million holes in a 47bit address space.

19:25 <geist> but yeah 64bit is so much nicer here. lots less stuff to worry about re running out of space

19:26 <mrvn> geist: the IPC moves pages between address spaces so I get a ton of single pages moving around. If you have some allocations that you keep inbetween then you could easily end up with one page per leaf. But that's the worst case.

19:26 * geist nods

19:27 <geist> you could i guess reserve a chunk of the aspace for incoming IPC buffers and then maybe mitigate it a bit

19:27 <geist> ie, this 512GB or 1GB rgion is where they come in

19:27 <mrvn> But my intention is that the libc would allocate 2MB for the heap and not single pages. And 2MB allocs can skip ahead in the page table and use a level 3 entry for a big page.

19:28 <mrvn> I also have a used_for flag in alloc, like GFX, IPC, ... so I can indeed reserve chunks for different use cases.

19:28 * geist nods

19:29 <mrvn> I've added that for the RPi so gfx memory uses pages below 1GB that the VC can access.

19:32 <mrvn> reusing addresses for IPC could also avoid having to allocate page tables and to invalidate page table walks.

19:33 <mrvn> On the other hand I like having use-after-ipc to just fail because the page will be unmapped.

19:34 cyao has joined #osdev

19:34 <heat> geist, it's not only the fact that it's a btree. being able to use RCU is a big win

19:35 <heat> i've seen the mmap rwlock get hugged to death in a big server with a >1000 threads

19:35 <heat> doing cat /proc/<pid>/maps would literally hang for seconds

19:36 <cyao> Hello, how do you implement the FILE type? is just a char* enough?

19:36 <heat> no

19:36 <heat> char * to what?

19:36 <cyao> To the file

19:37 <heat> what file?

19:37 <heat> what is a file and how is it a string

19:37 <cyao> like im just wantting to access a few small files

19:37 <cyao> read them from disk

19:37 <cyao> and access them

19:38 <gog> FILE is a complex structure

19:38 <heat> define file and how would it be a string

19:38 srjek|home has quit [Ping timeout: 268 seconds]

19:38 <cyao> so is just reading them and putting them in memory, then giving the function that needs the file the pointer to the memory good?

19:38 <gog> that's not really how it works

19:38 <heat> well, i mean

19:38 <heat> technically?

19:38 <gog> ok

19:38 <cyao> umm a file like a text file, plain text

19:39 <heat> are you doing this in the kernel?

19:39 <gog> so small files that are not sparse

19:39 <cyao> yes :P

19:39 <gog> you can just mmap those in one chunk into the address space

19:39 <heat> usually you create a VFS and read through that

19:39 <cyao> just tring to achive file reading

19:39 <gog> and then FILE just contains some pointers and implementation details

19:39 <heat> kernels don't have FILE in the C standard library sense

19:39 <gog> yes

19:40 <heat> see linux's struct file for an equivalent-ish

19:40 <cyao> you have a link to linux's file?

19:40 <cyao> couldn't quite find it

19:40 <heat> elixir.bootlin.com

19:40 <heat> search there

19:40 <cyao> i only found the one for aarch

19:40 <cyao> okk thx

19:41 <heat> the correct UNIX-like VFS approach is to have struct file, which represents a file descriptor, struct inode, which represents a filesystem inode (where you do reads), and struct dentry, which represents a directory entry (or a directory itself in case the inode is a directory)

19:41 <heat> or <insert BSD struct names>

19:43 <cyao> I just searched in the site, and it told me that the file def is just like this: typedef struct FILE { char dummy[1]; } FILE;

19:43 <cyao> Am I looking at the right one?

19:45 <heat> i told you, it's struct file, not FILE

19:45 <gog> https://elixir.bootlin.com/linux/latest/source/include/linux/fs.h#L940

19:45 <bslsk05> elixir.bootlin.com: fs.h - include/linux/fs.h - Linux source code (v6.0.8) - Bootlin

19:45 <heat> thank you goggers

19:45 <gog> all caps is the stdio library, part of the c standard library

19:45 <gog> yw heaty

19:45 <heat> gog

19:46 <gog> yes my son

19:46 <cyao> Ahh thanks a lot!

19:46 <heat> BAZINGA

19:46 <heat> *laugh track*

19:46 <gog> bazooper

19:47 GeDaMo has quit [Quit: I'm going to show these people something you don't want them to see. I'm going to show them a world without you.]

19:48 <geist> also that whole FILE is just a char[1] is not correct at all

19:49 <geist> i have no idea where you found that, but that's not what a FILE is. but FILE in particular is a structure used in user space libcs to abstract a handle to a kernel file

19:49 <geist> so it really doesn't mean much in kernel space anyway

19:49 <gog> i think that's just the opaque pointer impl

19:49 <gog> the actual structure depends on the system

19:50 <gog> if they were looking at libc or smth

19:50 <gog> i'd have to open it up but it's almost dinner time

19:51 <cyao> geist i just found it in stdio.h https://elixir.bootlin.com/linux/latest/source/tools/include/nolibc/stdio.h#L25

19:51 <geist> ah yeah probably true

19:51 <bslsk05> elixir.bootlin.com: stdio.h - tools/include/nolibc/stdio.h - Linux source code (v6.0.8) - Bootlin

19:51 <cyao> dunno why it's there

19:51 <geist> yeah that's just some user space wrapper thing

19:51 <geist> i do, but frankly it's not worth talking about

19:51 <geist> since it's just some minutae around opaque usr space handles, etc

19:51 <gog> yeah don't worry about stdio.h

19:51 <geist> anyway it's not what you're looking for

19:52 <cyao> okk

19:52 * gog waves her hand "this is not the struct you're looking for"

19:53 <cyao> woah how do you do the actions like this?

19:53 <cyao> im new to this irc thiggy

19:53 <cyao> *thingy

19:54 <\Test_User> /me <msg>; probably, client-dependent but practically always that

19:54 * cyao hello

19:54 <cyao> ahh thx

19:55 cyao has quit [Changing host]

19:55 cyao has joined #osdev

20:07 cyao has quit []

20:18 <heat> geist, you do?

20:19 <heat> oh wait, I see what it is, in context

20:19 <heat> just a dummy struct

20:20 <heat> and cuz empty structs are undefined in C, they add a dummy member

20:22 <mrvn> heat: and char dummy[] or char dummy[0] has compiler problems too on odd unix systems

20:23 <mrvn> Although I'm not sure if linux was ever able to compile with not gcc anyway.

20:23 <heat> https://elixir.bootlin.com/linux/latest/source/include/linux/compiler-intel.h

20:23 <bslsk05> elixir.bootlin.com: compiler-intel.h - include/linux/compiler-intel.h - Linux source code (v6.0.8) - Bootlin

20:24 <heat> no idea if this works though

20:24 <heat> also clang if you count it as not gcc :)

20:26 <mrvn> heat: I don't. it implements (all) the gcc extensions.

20:26 <mrvn> (not for this case)

20:27 <heat> not quite all the gcc extensions

20:27 <heat> there has been some effort to get clangbuiltlinux (see clangbuiltlinux)

20:28 <mrvn> not really relevant for that FILE dummy struct

20:28 <mrvn> that's pre c99 stuff

20:33 <heat> most of linux still compiles in gnu89 mode

20:33 <heat> (although they pass -std=gnu11 iirc now)

20:33 <heat> the linter will hurt you if you mix declarations with code

20:34 <gog> clangux

20:34 <heat> bazingux

20:34 chartreuse has joined #osdev

20:40 <mrvn> Poll: "Jason X": Horror or Commedy?

20:50 <gog> heat: do you program visual entertainments

20:50 <gog> mrvn: both

20:50 <gog> i haven't actually seen it

20:51 <gog> if it's anything like jason goes to hell then it's both

20:51 <gog> over-the-top horror is also hilarious

20:51 <heat> what's a visual entertainment and why does that sound dirty

20:51 <gog> heat: bazinga

20:51 <heat> 😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂

20:51 <heat> 😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂😂

20:54 <zid> nice black boxes

20:54 nickster has joined #osdev

20:54 <zid> gog when are you teaching me how to write an OS

20:55 <zid> mj only cares about smp locking prims, heat only cares about football and emoji

20:55 <heat> i also care about smp locking primitives

20:55 <zid> only enough to stop mj raging

20:55 <heat> it's not just football and emoji

21:00 puck has quit [Excess Flood]

21:00 <gog> zid: i don't know how to write an os

21:00 <gog> i can write pieces of a kernel

21:01 puck has joined #osdev

21:01 <zid> that's fi ne that's what I mean

21:01 <gog> oh

21:01 <zid> OS ends at kernel

21:01 <zid> I either need to write, a vfs, or I need to write an allocator

21:01 <gog> well step right up young man and welcome to gog's academy of partially-completed ideas

21:01 <zid> pick one

21:02 <zid> if you pick the former you have to write it without the latter ofc

21:02 <gog> well

21:02 <gog> i was just gonna say

21:02 <gog> how can you have a vfs if you don't have any ~~beans~~ allocator

21:03 <gog> i guess you could just statically allocate structures and only do operations one at a time and have to repopulate it every time you need to make a transaction from memory to fs

21:03 <gog> that sounds error-prone and also awesome

21:03 <gog> and vice versa

21:03 <gog> awesome and error-prone

21:04 <heat> write a slab allocator

21:04 <zid> eh just have struct blah n[MAX_FDS];

21:04 <gog> write a slub allocator

21:04 <zid> okay tell me

21:04 <heat> do you want the theory behind slab

21:04 <mjg> bonwick moment!

21:04 <gog> write a slob allocator

21:04 <heat> slob is shit and getting removed

21:04 <zid> I thought we already settled on slub

21:05 <gog> dang how is it going to allocate me then

21:05 <heat> slub is like slab but each partial slab is percpu

21:05 <mjg> write slyb

21:05 <gog> what if gog was one of us, just a slob like one of us

21:05 <zid> if I were some idiot you'd all be fawning over me trying to teach me shit

21:05 <zid> clearly you respect me too much to actually tell me things

21:05 <heat> hi idiot, im heat

21:05 <zid> you just wanna meme about sl?b

21:06 <gog> what am i supposed to teach you that you don't already know

21:06 <zid> I don't know anything about allocator design

21:06 <gog> don't you have a more advanced experiment than i do

21:06 <mjg> excrement

21:06 <zid> my kernel's 'allocator' so far is just a free list of pages

21:06 <gog> so is mine basically

21:06 <zid> so you can alloc I guess, you just can't free or do allocations that aren't 4k aligned

21:06 <gog> yeah

21:06 <gog> sounds familiar

21:06 <heat> ok so basically slab works in this way: you have caches, a cache being a collection of slabs (we'll get there) plus a ctor and dtor (optional, linux has stripped those out)

21:06 <mjg> don't let anyone gatekeep the term allocator!

21:07 <gog> if it allocates it's an allocator

21:07 <gog> doesn't have to free

21:07 <heat> a cache also has a name and an object size (so you create a cache for inodes, a cache for dentries, a cache for each kmalloc size class, etc)

21:08 <heat> a slab is a collection of objects, usually PAGE_SIZE'd, sometimes not (if the objects are individually too big, you allocate a higher order slab)

21:08 <zid> pool allocator with multiple pools okay

21:08 <gog> billiards allocator

21:08 <heat> slabs can be free (no objects being used), partial (some objs being used), free (no objs being used)

21:09 <heat> erm

21:09 <heat> *full (all objs being used)

21:09 <heat> basically on alloc you try a partial slab (if there is one), allocate

21:10 <heat> if you don't have partial slabs, you allocate a new slab and just grab an object from there

21:10 <zid> so your description of how an allocator works is step 1. Draw some rough lines. step 2. ??? step 3. Allocate.

21:10 <heat> each slab has a free list

21:10 <heat> so allocating is trivial

21:11 <heat> the advantages of this slab thing is that you have a lock per object and you *never* return slabs back to the page allocator unless you really need to (or if you got some heuristic to minimize slab memory usage)

21:11 <heat> s/object/object cache/

21:12 <heat> if you go down the shitty ctors and dtors you also get some theoretical benefits when constructing objects but I personally believe it's a stupid weird myth and it doesn't gain you much

21:12 <zid> gog: This is why I asked you

21:12 <heat> such that linux has done away with those

21:12 <dminuoso> I think the thing that is often missed with slab is the rationale. There's usually way too much talk about finer implementation details, and lack of the rough idea..

21:12 <zid> I don't give a fuck about any of this

21:12 <mrvn> your slab can also have a move callback if objects are movable. Then you can defragment slabs to free more

21:12 <heat> ...

21:12 <zid> dminuoso: exactly

21:13 <heat> great waste of my time, thanks

21:13 <heat> dminuoso, I could go on about "caches" and constructing/destroying but I don't believe in any of that

21:13 <mrvn> dminuoso: slabs have 3 benefits: 1) fast and simple, 2) efficient because it's equal sizes objects, 3) cache locality.

21:13 <clever> mrvn: palmos and the rpi firmware have a relocatable heap that isnt slab based, and doesnt use a callback either

21:13 <heat> I just believe in less lock contention, which is what you essentially get

21:14 <zid> "explain calculus" heat: "Well the derivative of speed is acceleration" dmi: "Calculating the area of a shape by cutting it into tiny slices and summing them"

21:14 <zid> guess which one is implementable

21:14 <mrvn> clever: most objects can't just be moved without fixing some pointers to them

21:14 <clever> mrvn: yeah, thats why the relocatable heap doesnt use pointers but handles, and you must lock the object to get its current addr

21:14 <zid> "ctors gives a benefit supposedly but linux got rid of them" is not

21:14 <heat> zid, I literally explained to you how a slab allocator works from top to bottom

21:14 <heat> what else do you want?

21:15 <mrvn> clever: that's just an object store, not a heap

21:15 <zid> ah so you're not even aware of the problem, no malice then

21:16 <clever> mrvn: once an object is locked, you get a physical address and it is contiguous in the physical space, that sounds heap-y to me?

21:16 <heat> what do you want to know about ctors?

21:16 <heat> time has proven them to be a shitty idea

21:16 <zid> Nothing, the question is why you even mentioned them

21:16 <zid> if you weren't going to explain them

21:17 <zid> you don't think you should so you didn't

21:17 <mrvn> clever: and once you put it back it can move around.

21:17 <zid> but what i want is to implement this, not to read a treatise on which parts are 'good'

21:18 <dminuoso> The gist of slab allocator is simple: rather than allocating pages on demand, you preallocate, and instead of unmapping you mark as free. This speeds up allocation in that you already have page table mappings. Furthermore, depending on what object type you want to store, you might group allocations together for a particular object type (say you need a THING_T often, so maybe you have a region in

21:18 <dminuoso> which you have a bunch of THING_T preallocated and perhaps even initialized, such that an allocation is just handing you a pointer - without finding a free page, mapping it, and initializing it.

21:18 <clever> mrvn: yep, once you unlock it (refcnt based), the kernel can move it to defrag the heap

21:18 <clever> mrvn: but i also recently learned, the rpi has a special flag when you unlock, to say the contents are not actually of importance

21:18 <clever> so it can skip the actual memcpy when moving

21:18 <zid> dminuoso: Nice. What's the strategy for actually allocating?

21:19 <dminuoso> zid: whatever you please, really

21:19 <mrvn> clever: so more a free than put back

21:19 <dminuoso> It's largely irrelevant

21:19 <zid> I still need an implementation of that though

21:19 <clever> mrvn: but the space is still reserved, and you will instantly get a range again next time you lock

21:19 <gog> zid: i respect you enough to know that you're not actually asking me for help

21:19 <zid> It's presumably fairly important that the actual you know, allocation bit of it works nicely, before you add a vaneer to it

21:19 <heat> dminuoso, except that has proven to be largely irrelevant

21:20 <mrvn> zid: usualy free THINGs are in a linked list and you just allocate by taking the head.

21:20 <heat> the caching part of the "object-caching kernel memory allocator" has been almost completely removed

21:21 <heat> what truly matters is the allocation algorithm - which I described in detail per bonwick 94

21:21 <mrvn> heat: how is ctor/dtor a bad idea?

21:22 <heat> you be sure in which context ctor() or dtor() is being called, and with which locks

21:22 <heat> can't be*

21:22 <dminuoso> heat: Oh okay. I think there was a misunderstanding on my part. I read "actually allocating" as "page allocation"

21:23 <dminuoso> And I meant how that is done is largely irrelevant, and if it is relevant you will know and know how to address it

21:23 <dminuoso> (e.g. do you need physically contiguous pages or not)

21:23 <heat> it also stops you from optimizing the slab layout by sticking the *next inside the actual object

21:23 <heat> dminuoso, sure. although that's also not quite irrelevant

21:24 <dminuoso> heat: well its easily replaceable and improvable

21:24 <heat> getting ptr_to_slab in an efficient way is a good idea

21:24 <mrvn> heat: depend on when you call ctor/dtor. If you do it on alloc/free, i.e. new/delete calls, then it's perfectly fine

21:24 <heat> mrvn, no, these ctor/dtor are called at slab allocation/destruction time

21:24 <mrvn> heat: when you pre-initialize then you are right about the next pointer

21:24 <dminuoso> I think there's something to be said about which parts are easy to refactor down the road, just to get a working implementation faster.

21:26 <heat> sure

21:27 <heat> but this aspect is relatively important for allocator performance

21:27 <heat> although after the vmem paper [bonwick 2001] it stops being important because the hotpath shifts significantly

21:28 <heat> well, stops being /super/ important

21:28 <mrvn> heat: calling ctor/dtor at slab construction time is a security problem too, you leak information across alloc/free

21:33 <heat> sure

21:33 <heat> (although that's a problem for every allocator)

21:33 <dminuoso> well, sometimes you explicitly want to cache it

21:33 <heat> I find this ctor pattern to be highly pessimal anyway

21:33 <dminuoso> so its more subtle to understand which cacheable construction artifacts are security sensitive and which ones are not

21:34 <heat> if you want to add all these cached objects to a list, you can't just lock once, add everything, unlock

21:34 <heat> you need to relock for every ctor() call

21:34 <dminuoso> other allocators could get away by just zero'ing out everything, so your only worry is whether that happens at all

21:34 <heat> the original big linux issue with dtor is that you can't reliably know in which context you're calling it

21:35 <heat> you could call it from IRQ context, you could call it as part of normal irqs-enabled preemption-enabled operation

21:38 <heat> my big issue with it is that not only do I not have objects that require such expensive initialization/destruction, it stops me from being able to reuse freed object space, and doesn't play well with C++ object lifetimes at all, and the dtor issue too

21:40 <heat> you'll also realistically only have 2 or 3 caches in the kernel, and those should get hand optimized

21:51 <mrvn> If you have an object that needs an expensive ctor call to initialize then why don't you need that after free and allocating it again?

21:52 <mrvn> as for the context in which you call it. How is that a problem. Don't alloc/free objects in an IRQ that are not IRQ safe and vice versa.

21:53 <heat> because per bonwick objects that get freed are still in a valid state

21:53 <heat> how is that a problem? because dtor isn't called at free() time, but at page_free() time

21:54 <mrvn> the problem is that the state they are in is pretty random. So any user needs to re-init the object to get a consistent state.

21:54 <heat> if you're allocating in a particularly complicated context (irqs off, preemption off, whatever), dtor() can be called whenever

21:54 <heat> if you're running out of memory for instance

21:55 <heat> sure, it's pretty random, which is why you need to be careful not to

21:55 <heat> the only big usage of ctor in linux I can find: https://grok.dragonflybsd.org/xref/linux/fs/inode.c?r=2e488f13&fi=inode_init_once#inode_init_once

21:55 <bslsk05> grok.dragonflybsd.org: inode.c (revision 2e488f13) - OpenGrok cross reference for /linux/fs/inode.c

21:55 <heat> it's this kind of dubious usage that tells me ctor() isn't a good idea

21:55 <mrvn> I find state that survives across free/alloc rather odd

21:55 <heat> well, that's the gist of the "caching allocator" part

21:56 <mrvn> don't you call page_free in free when you have an fully unused slab and enough free objects?

21:58 <heat> no, you call page_free when memory gets tight

21:58 <mrvn> that would imply you have a list of all slabs sorted by some metric to find the best one fro free from and such.

21:59 <heat> you essentially gather empty slabs in your cache for $indefinite amount of time, per bonwick

21:59 <heat> ofc in the real world things are a bit different

21:59 <heat> meh, that's optional

21:59 <mrvn> I don't cache free objects. I cache stuff with information in it. :)

21:59 <dminuoso> If empty slabs are going to be the reason you are running out of memory, I think you have a very different problem

22:00 <mrvn> dminuoso: if your slab never shrinks on free then it will be the cause of oom

22:00 <heat> yeah it's a valid issue

22:00 <dminuoso> I guess it is a situational problem, but a very sporadic housecleaning will take care of it.

22:01 <mrvn> dminuoso: think inode/dentry cache. That gets huge till you run out of memory.

22:01 <heat> yeah, which is why you shrink on OOM

22:01 <dminuoso> It might not run oom on the basis of never shrinking, though.

22:01 <dminuoso> Getting OOM is if you continously allocate new slabs

22:01 <dminuoso> But merely not freeing wont continuously increase memory pressure

22:01 <mrvn> But on OOM you then need to free inodes/dentries so the slab can shrink at all

22:01 <heat> one can imagine particularly big slabs, like a kmalloc-32MB slab

22:01 <dminuoso> mrvn: right

22:02 <dminuoso> at the OOM threshold the small latency is fine anyway, chances are even slab freeing is not going to be enough anyway

22:02 <mrvn> heat: is there a point of having slabs for obejcts > page size?

22:02 <dminuoso> it will at best only temporarily give you some breathing room

22:02 <heat> mrvn, yeah

22:02 <clever> mrvn: it reduces fragmentation, so you dont wind up with a pair of 16mb holes, and no 32mb hole

22:03 <heat> kmalloc is a big usage of that

22:03 <clever> the same thing the rpi's relocatable heap solves, but the rpi just moves things around after the fact

22:03 <mrvn> clever: irelevant for 64bit for consumer systems

22:04 <heat> slab is pretty much based on the fact that if you need it once, you'll probably need it again

22:04 <clever> mrvn: i think it depends on how you manage memory, if you map the entire physical ram to a contiguous range of virtual memory, the your hole fragmentation carries over

22:05 <clever> but if you dynamically change the kernel paging tables, you can assemble all of the holes on demand with the mmu

22:05 <mrvn> clever: that's then not using virtual memory in the kernel. Linux used to do that. Big problem.

22:05 <mrvn> You have an MMU. Use it.

22:05 <clever> ah, that explains why i thought linux did that

22:06 <clever> and 32bit linux with LPAE kinda needed to switch over

22:06 <clever> since you couldnt fit all ram

22:06 <heat> linux still does that

22:06 <heat> using virtual memory is slow

22:07 <heat> it allocates on top of the direct mapping

22:07 <clever> i can also see it being beneficial to map things twice

22:07 <clever> if you want a physically contiguous chunk of ram, allocate the pages with the PMM, and use the physical window in the virtual space

22:08 <clever> if you dont care about the physical view, allocate random pages with the PMM, map them somewhere, and its assembled

22:08 <clever> and adjust the protection bits as needed, so the mmu still does its job

22:08 <heat> vmalloc allocates actually virtual memory, but you should only use it for big allocations, it's pretty expensive and the allocation sizes get page aligned

22:08 <clever> yeah, thats what i was thinking of

22:08 <heat> (and not quite virtual, everything is mapped and pinned)

22:08 <heat> you also have kvmalloc which tries kmalloc and falls back on vmalloc

22:09 <clever> ah, nice

22:09 <mrvn> heat: if it's small you use a slab.

22:09 <heat> you may not be able to know if it's small though

22:09 <heat> imagine std::vector

22:10 <mrvn> heat: a vector is big and you probably want to reserve lots of space for it to grow.

22:11 <heat> a vector isn't necessarily big, and you don't want to reserve lots of space but rather 2^order (where order gets incremented when you run out of space)

22:11 <mrvn> heat: reserve, not allocate.

22:11 <heat> and you can't reserve $lots because, again, there's no virtual memory or on demand faulting

22:11 <mrvn> heat: a linux problem.

22:12 <mrvn> I have no problem reserving a big chunk of virtual address space and then later maping physical pages for it as needed.

22:12 <clever> the last time i actively wrote linux drivers, it was for a 3d core

22:12 <clever> most of the allocations where small and fixed size

22:12 <clever> but the framebuffer and textures where large

22:13 <clever> and all of them had to be physically contiguous

22:13 <heat> this is not a linux problem

22:13 <heat> it's a $every_os problem

22:13 <mrvn> clever: so you should probably have a slab for the small stuff and kvmalloc the big stuff. But with physically contiguous it's a bit different.

22:13 <heat> on demand faulting isn't something you can just do

22:14 <clever> https://github.com/cleverca22/v3d2/blob/master/v3d2.c

22:14 <bslsk05> github.com: v3d2/v3d2.c at master · cleverca22/v3d2 · GitHub

22:14 <mrvn> heat: I didn't say "on demand faulting". A std::vector knows when it resizes and can explicitly map pages as needed.

22:14 <mrvn> heat: you just don't want to move objects in the kernel so you need enough virtual address space for the vector to grow into.

22:15 <clever> mrvn: looks like i was using dma_alloc_coherent() and remap_pfn_range() to map it into userland

22:15 <heat> and how much would you reserve?

22:15 <heat> you can totally move objects, that's a non-issue

22:15 <clever> mrvn: if i was to rewrite it nowadays, i would use the new dmabuf framework to handle that half of the job

22:15 <heat> it will always be an amortized O(1) push_back

22:17 <mrvn> heat: depends on the context. $memory can't be too wrong. As for moving that invalidates all iterators (and pointers) and you either have to copy (which means using twice the old size in memory) or remap pages. Both are rather costly.

22:18 <heat> remapping doesn't work in standard C++

22:18 <mrvn> Again something you can't do in 32bit. In 64bit reserving 8GB address space on an 8GB ram RPi is no problem.

22:19 <mrvn> heat: depends on wether your object is trivially copyable.

22:19 <heat> in fact, userspace std::vector can't use realloc for this reason

22:20 <mrvn> heat: it can for trivially copyable.

22:20 <heat> I don't know if there's an optimization for that

22:20 <mrvn> not that I know of.

22:20 <heat> IIRC there wasn't last i checked

22:20 <heat> a solution for this would be to add a relocate() method

22:20 <mrvn> I missing a realloc_but_fail_if_you_have_to_move()

22:21 <mrvn> I guess in userspace the chance that realloc doesn't have to move is near zero unless you shrink.

22:21 <mrvn> heat: does std::vector realloc on shrink?

22:22 <heat> hmmmmmmmmmmmmmmmmmmmmmm, idk

22:22 <mrvn> I don't think C guarantees realloc + shrink won't copy.

22:24 <heat> i highly doubt C gives any guarantees on realloc at all

22:25 <mrvn> heat: it guarantes that the memory block from 0 to min(old_size, new_size) remains the same.

22:25 <heat> sure, except that

22:26 <mrvn> I would assume though that most OSes implement shrinking in realloc to not move the data.

22:26 <mrvn> But nothing about that in the manpage so I assume POSIX guarantees nothing there.

22:35 <klange> That cyao person showed up in my channel at 1am asking similar questions, and then got mad when I wasn't there.

22:36 <heat> timezone moment

22:36 <gog> hi

22:39 scoobydoo_ has joined #osdev

22:40 scoobydoo has quit [Ping timeout: 260 seconds]

22:40 scoobydoo_ is now known as scoobydoo

22:42 hmmmm has quit [Ping timeout: 252 seconds]

22:44 srjek|home has joined #osdev

22:45 Celelibi has quit [Ping timeout: 246 seconds]

23:13 <heat> ______________

23:13 <heat> \______ _____ /

23:13 <heat> | fuck you |

23:13 <heat> \/

23:13 <heat> .--.

23:13 <heat> |o_o |

23:13 <heat> |:_/ |

23:14 <heat> // \ \

23:14 <heat> (| | )

23:14 <heat> /`\_ _/`\

23:14 <heat> \___)=(___/

23:17 Burgundy has joined #osdev

23:21 <gog> :(

23:26 LittleFox has quit [Quit: ZNC 1.8.2+deb2+b1 - https://znc.in]

23:27 LittleFox has joined #osdev

23:37 <heat> tux has grown to be a bit of an asshole

23:44 <geist> awww that's mean

23:45 <heat> but he's smiling and everything

23:47 <geist> reminds me the guy that drew the tux logo was at my university at the tmie

23:47 <geist> i remember it being some what of a deal

23:49 <Mutabah> heat: ... why'd you post that?

23:50 <heat> https://www.reddit.com/r/linux/comments/yv76he/oc_jfchmotfsdynfetch_the_most_minimal_fetch_tool/

23:50 <bslsk05> www.reddit.com: [OC] jfchmotfsdynfetch - The MOST minimal fetch tool that fetches precisely NO information about your PC : linux

23:51 <heat> i find it very cute

23:55 <zid> wtf is up with my interwebs

23:55 <zid> did you do this heat

23:56 <heat> yes