klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
_whitelogger has joined #osdev
<heat_> LOGGER'S BACK
<heat_> WOOOOOOOOOO
kfv has joined #osdev
<zid> oh nice
<zid> but no more talking about smurfs :(
<heat_> now that we're logged, what kind of SOURCE CODE LEAKS have yall been looking at
netbsduser has quit [Ping timeout: 240 seconds]
<heat_> i cannot wait to infringe on intellectual property in a logged channel
<gog> i'm generative ai
<gog> i can't not infringe on IP
<nikolapdp> zid smurfs is a great name for a filesystem
<zid> wait, what did you think I was talking about
<zid> if not smurFS
goliath has quit [Quit: SIGSEGV]
<nikolapdp> that's the spirit
Left_Turn has joined #osdev
Left_Turn has quit [Remote host closed the connection]
navi has quit [Quit: WeeChat 4.0.4]
netbsduser has joined #osdev
bauen1 has quit [Ping timeout: 268 seconds]
kfv has quit [Quit: Textual IRC Client: www.textualapp.com]
janemba has quit [Ping timeout: 245 seconds]
heat_ is now known as heat
navi has joined #osdev
Left_Turn has joined #osdev
luke9716 has quit [Remote host closed the connection]
bauen1 has joined #osdev
frkazoid333 has joined #osdev
zxrom has joined #osdev
heat_ has joined #osdev
heat has quit [Read error: Connection reset by peer]
pretty_dumm_guy has quit [Ping timeout: 252 seconds]
pretty_dumm_guy has joined #osdev
zetef has joined #osdev
TkTech has quit [Quit: Ping timeout (120 seconds)]
TkTech has joined #osdev
randm has quit [Remote host closed the connection]
randm has joined #osdev
zetef has quit [Remote host closed the connection]
joe9 has joined #osdev
jack_rabbit has quit [Read error: Connection reset by peer]
jack_rabbit has joined #osdev
antranigv has quit [Quit: ZNC 1.8.2 - https://znc.in]
netbsduser has quit [Ping timeout: 264 seconds]
antranigv has joined #osdev
netbsduser has joined #osdev
gog has quit [Quit: Konversation terminated!]
yoo has quit [Ping timeout: 256 seconds]
yoo has joined #osdev
goliath has joined #osdev
yoo has quit [Ping timeout: 246 seconds]
zetef has joined #osdev
antranigv has quit [Quit: ZNC 1.8.2 - https://znc.in]
antranigv has joined #osdev
GeDaMo has quit [Ping timeout: 264 seconds]
Shaddox404 has joined #osdev
GeDaMo has joined #osdev
bitoff has joined #osdev
<nikolapdp> heat: how do you do physical memory allocation
Neo has quit [Ping timeout: 260 seconds]
<heat_> imagine not having tab completion
heat_ is now known as heat
<nikolapdp> kek
<heat> nikolapdp, very generic question, please explain
<nikolapdp> like how do you keep track of which physical pages you have allocated or not
<nikolapdp> do you use a slab, or bitmap or whatever
<zid> bitmap is life
<zid> bitmap is love
<nikolapdp> sure is zid
<zid> bitmap of bitmaps
<heat> i have two physical memory allocators
<heat> my bootmem allocator works pre-buddy, it's basically a list of available ranges and reserved ranges, and you carve out memory from the available ranges
<heat> it's a very simple thing
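A minimal sketch of the kind of range-carving boot allocator heat describes, not Onyx's actual code. The names (`bootmem_range`, `bootmem_add_range`, `bootmem_alloc`) and the fixed array size are hypothetical. The reserved-range list is left out for brevity, and the sketch assumes physical address 0 is never registered so that 0 can signal failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BOOTMEM_MAX_RANGES 32 /* hypothetical fixed limit */

struct bootmem_range {
    uint64_t start; /* first free byte of the range */
    uint64_t size;  /* bytes still available */
};

static struct bootmem_range avail[BOOTMEM_MAX_RANGES];
static size_t nr_avail;

/* Register a range of usable physical memory. */
static void bootmem_add_range(uint64_t start, uint64_t size)
{
    assert(nr_avail < BOOTMEM_MAX_RANGES);
    avail[nr_avail].start = start;
    avail[nr_avail].size = size;
    nr_avail++;
}

/*
 * Carve `size` bytes, aligned to `align` (a power of two), off the front
 * of the first range that fits. Returns 0 on failure.
 */
static uint64_t bootmem_alloc(uint64_t size, uint64_t align)
{
    for (size_t i = 0; i < nr_avail; i++) {
        struct bootmem_range *r = &avail[i];
        uint64_t aligned = (r->start + align - 1) & ~(align - 1);
        uint64_t waste = aligned - r->start;

        if (r->size < waste || r->size - waste < size)
            continue;

        /* Drop the alignment padding and the allocation itself. */
        r->start = aligned + size;
        r->size -= waste + size;
        return aligned;
    }
    return 0;
}
```

Alignment padding is simply thrown away here. That is usually acceptable for a boot-time allocator that only runs until the real page allocator takes over.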
<zid> boros does a linked list cus it's boring and trivial
<heat> my actual page allocator (when memory is "properly up" and I have struct page available) is a buddy allocator
<nikolapdp> makes sense
<heat> pages in the buddy allocator get marked PAGE_FLAG_BUDDY, the order is also stashed in the struct page; these two things help me coalesce pages
<heat> then as a kind of "separate layer but not really" i have a percpu cache of order-0 pages
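The coalescing step heat mentions can be sketched roughly as below. This is an illustration under assumptions, not the real implementation: `MAX_ORDER`, the tiny `pages[]` array, and `buddy_free()` are made up for the example, free lists and locking are omitted, and only the head page of a free block carries `PAGE_FLAG_BUDDY` plus its order.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ORDER 4                 /* hypothetical: largest block is 2^4 pages */
#define NR_PAGES (1u << MAX_ORDER)
#define PAGE_FLAG_BUDDY 0x1u

struct page {
    unsigned int flags;
    unsigned int order; /* only meaningful while PAGE_FLAG_BUDDY is set */
};

static struct page pages[NR_PAGES];

/*
 * Free a block of 2^order pages starting at page frame `pfn`, merging
 * with its buddy for as long as the buddy is a free block of the same
 * order. Returns the order of the resulting free block.
 */
static unsigned int buddy_free(size_t pfn, unsigned int order)
{
    while (order < MAX_ORDER) {
        size_t buddy = pfn ^ ((size_t)1 << order);
        struct page *b = &pages[buddy];

        /* The flag and stashed order tell us whether merging is legal. */
        if (!(b->flags & PAGE_FLAG_BUDDY) || b->order != order)
            break;

        b->flags &= ~PAGE_FLAG_BUDDY;
        pfn &= ~((size_t)1 << order); /* merged block starts at the lower half */
        order++;
    }

    pages[pfn].flags |= PAGE_FLAG_BUDDY;
    pages[pfn].order = order;
    return order;
}
```

The buddy of a block is found by flipping bit `order` of its page frame number, which is why both the flag and the order have to be checked before merging.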
<heat> does this answer your question?
antranigv has quit [Quit: ZNC 1.8.2 - https://znc.in]
<nikolapdp> yes it does
antranigv has joined #osdev
<heat> why do you care
<nikolapdp> just curious
<heat> oh, note that my buddy allocator is zone-based
<heat> and technically-but-not-actually NUMA-node-based
<heat> say, a node has memory for DMA32 and NORMAL (> 4GB)
zetef has quit [Remote host closed the connection]
<zid> nikolapdp: When are you adding me a proper allocator to boros?
<heat> what zone you prefer/use entirely depends on flags you pass the allocator
<nikolapdp> zid: when i am done writing my own os
<heat> it also has some initial support for kicking off page reclamation
<zid> you mean, after you're done reading honzuki
<nikolapdp> people can do two things
<nikolapdp> heat why is it numa but not really
<heat> in practice besides the basic LRU shit (which i *still* don't have) i need memory compaction in order to reliably be able to get higher order pages
joe9 has quit [Quit: leaving]
<heat> it's numa but not really because although I do have the beginnings of a struct page_node for each NUMA node, i don't instantiate any and the alloc_page() interface does not support specifying numa nodes
<heat> nor is slab numa-aware, nor is anything else
<heat> and i cant be arsed because i don't have numa hardware, so it'd be pretty hard to test nonetheless
<heat> even if i tried to add numa
<nikolapdp> lol fair enough
Neo has joined #osdev
jack_rabbit has quit [Read error: Connection reset by peer]
jack_rabbit has joined #osdev
Shaddox404 is now known as Shaddox_AFK
<heat> geist, have you seen Svvptc?
<heat> it works around the need for the "redundant" sfence.vma when mapping in a page fault
<heat> it makes stores to PTEs that set V happen-before an sret or mret
Shaddox_AFK has quit [Ping timeout: 256 seconds]
Shaddox_AFK has joined #osdev
antranigv has quit [Quit: ZNC 1.8.2 - https://znc.in]
antranigv has joined #osdev
<Ermine> time for speedrun build onyx 100%
<heat> wat
Shaddox_AFK is now known as Shaddox404
<Shaddox404> Anyone using NixOS here?
<nikolapdp> no
<heat> you're the only nixos user in the world
<heat> enjoy
<Ermine> heat: I've got a new laptop and I want to check how quickly it will build onyx
<heat> cool! onyx is very fast to build
<heat> all ports - less so
antranigv has quit [Quit: ZNC 1.8.2 - https://znc.in]
antranigv has joined #osdev
nitrix has quit [Quit: ZNC 1.8.2 - https://znc.in]
nitrix has joined #osdev
antranigv has quit [Quit: ZNC 1.8.2 - https://znc.in]
antranigv has joined #osdev
<geist> ONYXXXXX
<geist> i'm not sure i've ever built it either, need to do it
<geist> also hmm, which one is svvptc....
<geist> ah yeah, dunno. no haven't fiddled with it
<heat> yeah its very new
<heat> it wasn't ratified yet tho
gareppa has joined #osdev
<geist> does remind me i should look at the oh what is it extension (i have a whole spreadsheet at work with a list of extensions, but i'm on my personal computer right now)
<geist> it's the one that lets you split MMU flushes into separate flush and sync instructions
<geist> ie, like arm
<geist> that extension is starting to show up on things
<heat> the Owhatisit extension?
<geist> Svinval
<geist> qemu will emulate it but i'm sure it makes no difference at all, probably treats the sync as a nop
<heat> i have no idea how one is supposed to support all these extensions and differing code paths
gareppa has quit [Client Quit]
<heat> this looks like opengl extension hell, but architecture
<geist> well, in general you start adding global bools and either test at the place, start code patching, or have different virtual functions
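One of the options geist lists, picking an implementation once at boot and calling through a pointer, might look roughly like this. Everything here is hypothetical (`cpu_features`, `cpu_select_paths`, the `last_path` marker used only to make the example observable); the real instruction sequences are only named in comments, using Svinval as the example extension.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature set, filled in from whatever probing the kernel does. */
struct cpu_features {
    bool has_svinval;
};

static int last_path; /* demo only: records which variant ran */

/* Baseline path: a real kernel would issue sfence.vma here. */
static void tlb_flush_page_plain(unsigned long va)
{
    (void)va;
    last_path = 1;
}

/* Svinval path: sfence.w.inval, then sinval.vma, then sfence.inval.ir. */
static void tlb_flush_page_svinval(unsigned long va)
{
    (void)va;
    last_path = 2;
}

/* Callers always go through this pointer; it defaults to the baseline. */
static void (*tlb_flush_page)(unsigned long va) = tlb_flush_page_plain;

/* Run once at boot, after feature detection. */
static void cpu_select_paths(const struct cpu_features *f)
{
    tlb_flush_page = f->has_svinval ? tlb_flush_page_svinval
                                    : tlb_flush_page_plain;
}
```

Testing a global bool at each call site or patching the code at boot trade the indirect call for a branch or for extra boot-time machinery; which is preferable depends on how hot the path is.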
<geist> at this point it's nothing like supporting a bunch of stuff on arm64 post v8.0
<heat> is arm64 worse?
<geist> well, now that it's up through 8.7 and whatnot there are a *ton* of details that you may want to conditionalize on in the kernel
<geist> behavioral stuff
<geist> feature bits that change this or that
<geist> it's the behavioral ones i find to be more annoying, where based on feature X if you set bit Y now you need to do sequence Z instead of W
<geist> though as is usual most are optional, so you can pick and choose
<geist> like say you don't need to use x2apic vs apic kinda stuff
antranigv has quit [Quit: ZNC 1.8.2 - https://znc.in]
<heat> yeah
<heat> i guess with riscv it's *usually* like that too
<heat> except with Zicbom, that's really annoying
<geist> part of the problem is so far a few of the extensions are not opt out. like if this is present you must deal with it
<geist> there's a new extension for precisely that problem actually, but i haven't seen it in place yet
<geist> lets you turn features off
<heat> the riscv platform spec(i think?) says something like "cache coherency is not a problem and is expected for UNIX kernels. if cache maintenance is required, there will be an extension present for it"
<geist> whereas x86 and arm are worried enough about forward compatibility that they almost always hide new things behind some sort of opt in bit
<heat> which sounds /ok/, but you don't know if there's a cache extension present, except if you support it
<heat> and if you don't... silent breakage all around
<geist> yah
<heat> e.g there's Zicbom, but there's also a Theadcmo or something like that
<geist> stuff like whether or not the cpu writes back to the A/D bit: you cannot opt out of that
<geist> it either does it or not, and you must deal with both paths
<geist> that ones the most annoying to me personally so far. i'd just as soon have it fall back to exceptions and then if needed write code to scan later, but in this case you dont have a choice
<heat> i had to deal with the zicbom problem personally, and it was the most annoying shit ever
<heat> because the EDK2 people want to half ass it and deal with the real problems later
<heat> and i don't quite understand the device <-> cache coherency problem well enough to really be an authority on it
<geist> yah
Shaddox404 is now known as Shaddox_AFK
<geist> yah added zicbom to zircon recently
Shaddox_AFK is now known as Shaddox404
<geist> what's making the extension explosion not get out of hand is the RVA stuff which defines these baselines and mandatory extensions
<geist> so for the most part if you follow along there and pick up the mandatory bits as the RVAs roll forward. RV... uh, what is the A
<geist> oh profiles. A is for application stuff i think
Left_Turn has quit [Ping timeout: 256 seconds]
<bslsk05> ​github.com: riscv-profiles/profiles.adoc at main · riscv/riscv-profiles · GitHub
<nortti> do we also have RVM?
<bslsk05> ​github.com: riscv-profiles/rvm23-profile.adoc at main · riscv/riscv-profiles · GitHub
<geist> not sure it's ratified yet, but there is a microcontroller version
antranigv has joined #osdev
<heat> actually, now that you're here geist: when do you need to maintain cache coherency explicitly?
<heat> i know there's a device tree property for it
Left_Turn has joined #osdev
<geist> between cpus or between cpus and devices?
<heat> does it depend on the device? the platform? both? the architecture? all of em?
<heat> cpus and devices
<geist> yes
<geist> i dont actually know if there's a device tree thing that says if it's coherent or not
<geist> so for example the sifive hifive and visionfive class socs *are* coherent, which is why there really isn't any cache flushing you have to do
<heat> there's dma-coherent and dma-noncoherent
<geist> theres some sort of front port AXI bus that if you run your bus mastering dma device through it, the cpu gets to snoop the transfers
<geist> yah and it makes it dma coherent, and thus you dont really need to manually flush anything. basically like x86
<geist> note this is independent of i&d cache coherency. on riscv and arm (and most other arches) you have to manually sync data there, but that's known
<heat> where's it stated "this architecture is coherent by default"
<geist> it does not
<geist> it quite explicitly does not state it at all
<heat> because the device tree spec states:
<heat> "For architectures which are by default non-coherent for I/O, the dma-coherent property is used ..."
<heat> and vice-versa for the dma-noncoherent
<heat> so... how tf do you guess?
<heat> i'm assuming the device tree spec reflects reality
<geist> right. it quite possibly is Just Known, or it may be stated that you must assume it's non coherent unless specified elsewhere
<geist> depends. which spec are you reading? if it's the original spec it probably hasn't been updated in 20 years
netbsduser has quit [Remote host closed the connection]
<geist> but if you read the arm and riscv spec it may be stated somewhere that it's non coherent by default
<geist> i just cant tell you if/where that is
netbsduser has joined #osdev
<geist> however since i know it is that way because that's how it is, i dont particularly need to find it
<geist> i think what makes it more confusing is except for very high end server chips, any given ARM device is almost certainly non-dma-coherent, so it's sort of the default state: non coherent unless proven otherwise
<geist> and if you over flush stuff you're just wasting time, but it's otherwise harmless
<geist> on riscv it seems a lot of the initial cpu clusters (by sifive in general) *are* fully coherent, so it means a lot of initial code can forget about it, and then as more cores come out that are not, it gets much more messy
Shaddox404 is now known as Shaddox_AFK
<nortti> < geist> https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#rva22-profiles if you're not following along ← do you know why it's only a recommendation to have an illegal instruction exception for RVA22U64?
<geist> i've never heard of one that doesnt, so i can't say why
<geist> possible there were some existing cores that dont, so this is a recommendation to try to claim it back
<geist> this profile stuff seems to be a real attempt across the riscv world to make some order out of chaos
<geist> to make things a little more confusing, it's possible for machine mode (SBI in particular) to trap and emulate instructions transparently for you, so they may, for example, just nop something it doesn't understand
<geist> in that case it's not the cpus fault, but a firmware issue. from the app developer point of view it may appear as if nothing was raised
<geist> not saying thats the thing, but possible something like that exists somewhere and this is an attempt to 'please dont do that again'
gbowne1 has joined #osdev
<geist> i'd like to tell you some of the real world riscv mess i've had to deal with over the last 6 months but i can't, but precisely this sort of nonsense does exist right now
<geist> but usual 'bring up on <thing> which is weird and nonstandard'
antranigv has quit [Quit: ZNC 1.8.2 - https://znc.in]
antranigv has joined #osdev
Shaddox_AFK has quit [Ping timeout: 256 seconds]
Turn_Left has joined #osdev
Left_Turn has quit [Ping timeout: 255 seconds]
GeDaMo has quit [Quit: That's it, you people have stood in my way long enough! I'm going to clown college!]
antranigv has quit [Quit: ZNC 1.8.2 - https://znc.in]
antranigv has joined #osdev
Shaddox_AFK has joined #osdev
Shaddox_AFK is now known as Shaddox404
<Shaddox404> heat: nah, i use OpenSUSE
<Shaddox404> I was curious since it was termed to be "different"
<kof123> eh, ask in other channels, there are people
<Shaddox404> Sure
Shaddox404 has quit [Quit: Connection Terminated.]
antranigv has quit [Quit: ZNC 1.8.2 - https://znc.in]
antranigv has joined #osdev
<zid> heat help, my vindaloo is REALLY hot
netbsduser has quit [Ping timeout: 255 seconds]
<zid> cute bug
<zid> they added a +1 to a buffer size to fix a bug
<zid> now if sizeof(p) is 0 cus of a failed alloc, it no longer falls through to the error cases, instead allocating 0+1 bytes and if(p) succeeds
antranigv has quit [Quit: ZNC 1.8.2 - https://znc.in]
<heat> yo
<kof123> > this looks like opengl extension hell, but architecture well it sounds like /dev/duck :D
<heat> can ppl with amd zen older than 4 build https://github.com/kernelslacker/x86info and post the output of ./x86info -c
<bslsk05> ​kernelslacker/x86info - x86info : x86 processor register decoder. (20 forks/37 stargazers/GPL-2.0)
<heat> i also take new intel
<heat> like 10th gen forwards is interesting
<bslsk05> ​paste.artixlinux.org: Database Error
<nikolapdp> zen 2
<nikolapdp> heat ^
<kof123> or all the feature bits sounds like "The Thirty-Million Line Problem" .....which he argued e.g. for x86 (expansion hardware-wise, not just cpu), that was what brought innovation... this is just to say, because it is new stuff, the dust hasn't settled yet?
antranigv has joined #osdev
<heat> >L2 Instruction TLB (1G): Disabled. 0 entries.
<heat> huh, does zen3 have some weird errata?
<nikolapdp> same on zen 2: L2 Data TLB (1G): Disabled. 0 entries.
<nikolapdp> L2 Instruction TLB (1G): Disabled. 0 entries.
<heat> WOAH that's even weirder
<Mondenkind> would be interesting to bench some stuff with 1g pages and see what the actual behaviour is
<nikolapdp> can you even force 1g pages
<nikolapdp> other than writing your own os and doing it that way
<Mondenkind> if it's actually doing a full page walk for every access that would be ... bad ...
<Mondenkind> nikolapdp: pretty sure it is possible under linux. might require some hoop-jumping
<heat> yall have some really beefy fucking caches
<nikolapdp> not that i even have a gig of ram free at the moment :)
<heat> the direct map uses 1GB if possible AFAIK
<nikolapdp> heat: really
<heat> yes
<heat> here's my kabylake's
<bslsk05> ​gist.github.com: x86info-kbl · GitHub
<nikolapdp> huh differently displayed
<heat> well, the cache is different
<heat> intel CPUs (at least pre-kabylake) have a shared L2 TLB, AMD zen ones seem to be separated in iTLB and dTLB, and then separated in page size
jack_rabbit has quit [Remote host closed the connection]
jack_rabbit has joined #osdev
<nikolapdp> also mostly 8-way associative on zen
<heat> i want to see anything 10th+ gen on intel
<qookie> according to https://www.7-cpu.com/cpu/Zen2.html 1G pages use the same L2 TLB entries as 2M pages?
<bslsk05> ​www.7-cpu.com: AMD Zen2
<qookie> > 1-Gbyte pages smashed into 2-Mbyte pages in Data TLB L2: 2048 items. 16-way.
bauen1 has quit [Ping timeout: 256 seconds]
<nikolapdp> zen 3 had dedicated dTLB
<nikolapdp> realistically, who's mapping multigigabyte executables
<heat> it's important if you're running off of 1G pages in the kernel
<heat> e.g in the direct map
<nikolapdp> yeah that's true
gog has joined #osdev
<heat> qookie, ah, i guess that makes sense
<geist> heat: re: split TLB for different page sizes, seems to be the opposite
<geist> the L1 tlbs at least are all page sizes
bitoff has quit [Ping timeout: 256 seconds]
<heat> hrm
<heat> where do you see that?
<geist> I TLB L1 : 64 items. full-assoc, all page sizes
<geist> hmm, where is the D TLB L1....
<heat> what core?
<geist> oh zen 2
<zid> where DO you see that
<nikolapdp> yeah i don't see it
<geist> search what i pasted
<heat> i wonder if the 5000 line was different than the 3000
<geist> 5000 is zen3
<nikolapdp> mine is 5500u but zen2
<geist> okay, except the mobile, etc stuff
<geist> that's where the numbers get confusing, darn you AMD!
<nikolapdp> they keep changing it which is also annoying
<geist> https://en.wikichip.org/wiki/amd/microarchitectures/zen_3 is in general a better place to find this stuff
<bslsk05> ​en.wikichip.org: Zen 3 - Microarchitectures - AMD - WikiChip
<qookie> geist: the data TLB is talked about in a bit more detail below, split into sections for each page size
<geist> it shows the zen 3 as having all page sizes for L1i/d and then L2s do 4K/2MB, 1GB flattened
<heat> hmm, i wonder if there's a bug in x86info
<geist> which is what i generally remember, AMD has generally had multi page size TLBs
<heat> x86info from what i was told essentially flattens the cpuid data into pre-baked strings
<heat> so whatever they lifted, was lifted straight from a manual
<geist> probably at the expense of being harder to implement
<nikolapdp> but i am missing both iTLB and dTLB for 1g pages
<geist> yes. almost certainly
<nikolapdp> while zen 3 is missing only iTLB
<heat> in the L2, because they smash it into 2M
<geist> Zen 4 seems to have 1GB pages
<bslsk05> ​en.wikichip.org: Zen 4 - Microarchitectures - AMD - WikiChip
<geist> in the L1s at least
<heat> in any case YALL ARE FLOODED WITH LARGE PAGE TLB
<zid> 1GB pages evil on my cpu, got it
<heat> i don't get it, maybe there's something we're missing
<nikolapdp> zen 2: 1-Gbyte pages are smashed into 2-Mbyte entries in the L2 ITLB
<nikolapdp> from wikichip
<geist> but for example skylake has split TLBs at the L1 level, shared at L2
<bslsk05> ​en.wikichip.org: Skylake (client) - Microarchitectures - Intel - WikiChip
<heat> my shit kabylake from 6 years ago has a larger L1 TLB than zen 4?
antranigv has quit [Quit: ZNC 1.8.2 - https://znc.in]
<nikolapdp> was the zen 4 a mobile chip?
<heat> no
<heat> zen 4 is a microarchitecture
<nikolapdp> oh so no one sent a zen 4
<nikolapdp> ok
<heat> i have mcrod's zen 4
<geist> heat: well, your kabylake has 128 4K L1i and 64 4K L1d
<geist> but they're 8 way associative instead of being fully associative and all page size
<geist> so my guess is that's sort of similar
<geist> https://en.wikichip.org/wiki/intel/microarchitectures/kaby_lake#:~:text=Kaby%20Lake%20TLB%20consists%20of%20dedicated%20L1%20TLB%20for%20instruction%20cache%20(ITLB)%20and%20another%20one%20for%20data%20cache%20(DTLB).%20Additionally%20there%20is%20a%20unified%20L2%20TLB%20(STLB).
<bslsk05> ​en.wikichip.org: Kaby Lake - Microarchitectures - Intel - WikiChip
<heat> hmmm, cpuid does not say that
<heat> at least not x86info
<heat> Instruction TLB: 4K pages, 8-way associative, 64 entries
<geist> also interesting, it shows the DTLB being fixed partitioned between threads
<geist> which probably means 50/50
<gog> hi
<nikolapdp> hello gog
<geist> gog
<gog> geist:
<heat> gog:
<heat> bazinga
<gog> heat:
<gog> bazel
* geist shudders
<heat> do note that cpuid can be wrong as stated in the kabylake errata
<heat> it may be that amd is just really bad at describing its TLB layout through cpuid
<gog> my boss told me i'm a good programmer
<heat> congrats gog
<nikolapdp> go you gog
<heat> do you want to see windowsified linux page tables code
<heat> it's a blatant GPLv2 violation
<gog> HPAGE
<heat> i wish
<bslsk05> ​openfw.io: [edk2-devel] [PATCH v8 14/37] UefiCpuPkg: Add CpuMmuLib to UefiCpuPkg - Chao Li
<heat> they copied the whole weird layout, code is straight up copied and then converted
<heat> there's a SWAP_PAGE_DIR that they took from swapper_pg_dir
<heat> somehow the most GPL compliant chinese corporation
bitoff has joined #osdev
antranigv has joined #osdev
<zid> oh, heat's program is where you get it
<heat> that's not my program
<geist> greetings program
<zid> the one you linked, makes it yours
<bslsk05> ​torvalds/linux - Linux kernel source tree (52082 forks/165002 stargazers/NOASSERTION)
<heat> yall like my new kernel
<zid> nah cus I already knew about that
<zid> heat l2english already
<heat> heatux kernel
<nikolapdp> gnu/heatux
<zid> why not heatix
<zid> everything cool is *ix
<nikolapdp> UNIX
<heat> nix
<heat> wait a minute! nix isn't cool!
<zid> onix
<heat> overrated pokemon
<zid> overrated!?
<zid> Everyone considers it total trash
<heat> i agree then
<heat> total trash
<heat> i've had an idea for a while
<heat> one could /probably/ have a linux-ish style of page table management but still enjoy a pmap interface if need be
<zid> no, you can't eat tomatoes until you turn red
<heat> like, one thing doesn't necessarily exclude the other
<heat> my big qualm with having a per-arch pmap is that a lot of it is just copy-pasted
<zid> MASSIVE IFDEFS
<zid> hundreds, in one file
<zid> Just write every possible line, in a bunch of different orders
<zid> and piece together the impl with ifdefs
<heat> because arm64 - arm32 - x86 - x86 PAE - x86_64 - riscv32 - riscv64 are basically the same shit
<heat> but with varying numbers of levels, and some tiny differences when it comes to flags
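The "same thing, different number of levels" observation can be illustrated with a small table-driven index helper. This is only a sketch of the shared part: `pt_layout` and `pt_index()` are hypothetical names, and flags, large pages, and TLB maintenance are deliberately not covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a radix page table format. */
struct pt_layout {
    unsigned int levels;         /* number of table levels */
    unsigned int bits_per_level; /* index bits consumed per level */
    unsigned int page_shift;     /* log2 of the base page size */
};

/* x86_64 4-level paging, RISC-V Sv39, and 32-bit non-PAE x86, all with 4K pages. */
static const struct pt_layout layout_x86_64 = { 4, 9, 12 };
static const struct pt_layout layout_sv39 = { 3, 9, 12 };
static const struct pt_layout layout_x86_32 = { 2, 10, 12 };

/* Index into the table at `level` for `va`; level 0 is the leaf level. */
static size_t pt_index(const struct pt_layout *l, uint64_t va,
                       unsigned int level)
{
    unsigned int shift = l->page_shift + level * l->bits_per_level;

    return (size_t)((va >> shift) & ((1u << l->bits_per_level) - 1));
}
```

A generic walker would loop from `levels - 1` down to 0 using this helper; the per-architecture differences then live in how entries are encoded and flushed.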
<geist> indeed, however they're different enough, with different enough optimization paths, or requirements
<geist> that trying to unify them is a Bad Idea
<heat> i... i don't know
<heat> you probably know better than me
<heat> but i think there's a way to make it work
<zid> don't trust him heat
<zid> go for the combinatorial explosion of ifdefs
<heat> you dont need many ifdefs if you do it right
<heat> geist, what kind of stuff are you thinking off?
<heat> of*
<geist> well things like the precise ordering of TLB flushes. do you batch, do you do alone?
<geist> do you need to barrier here vs there
<geist> what about A/D writeback?
<geist> how many reserved bits per level? can you use intermediate page sizes?
<geist> does the cpu support combined page sizes, what about variable page size? etc etc
<gog> yes
<heat> right
<geist> stuff just starts to combinatorially explode. but the trouble is some of the patterns result in needing to do the order differently
<geist> like splitting a page table, what precise order do you need to do it in
<geist> splitting large pages that is
<heat> idk how much linux mm you've read, if any
<geist> ASID support with TLB flushing is a gigantic PITA
<heat> but you can totally offload things to helper functions
<geist> since there's no one precise pattern that works
<geist> sure i dont doubt you can completely plow through it brute force. the Linux Way
<geist> i just dont know if the result is worth it
<geist> i'd rather have N copies of highly tuned code
<heat> and even if complexity may jump up a bit, because things aren't exactly the same, you probably get a better result than having 10 pmap impls
<geist> (X) Doubt
<heat> even in a big system like freebsd most of them don't agree in the way you can go from a page to all the mappings
<geist> OTOH, it's also worth a try :)
<geist> ARM64 is the real outlier here since its page tables are so flexible
<geist> and has some extremely careful ordering of updates, primarily because of the weak memory model
<heat> absolutely
<heat> for anything that's extremely fucky to get right/properly, you could just fork the code
<heat> that's the difference between my idea and linux's. linux's doesn't even attempt to have a pmap-ish layer
antranigv has quit [Quit: ZNC 1.8.2 - https://znc.in]
<geist> yeah but it also treats the page table as a first class citizen
<geist> as an upper data structure, much to the chagrin to any arch that doesn't match that model
<geist> it's a fundamental design decision that has immense ramifications
gog has quit [Quit: byee]
* heat nods
<heat> i.. i don't know, this is hard to think about
<heat> i feel like the linux page table model came up as an accident, but a happy accident because by exposing them as a first-class citizen it ends up allowing for really fast "hacks" so to speak
<heat> it was just a "look haha my hobby system can map pages" that evolved into pgd/p4d/pud/pmd/pte go brrrrrrr
antranigv has joined #osdev
<geist> yah
<geist> but then it also is only a win on architectures where it lines up. ie, x86
<heat> linux generally throws away every other kernel's very pretty abstractions that map out very nicely on a whiteboard
<heat> and it ends up winning out because of that
<nikolapdp> geist where does it not line up
<heat> sun engineering ethos vs LINUX HACKER GPL!!!!
<nikolapdp> kek
<geist> nikolapdp: POWER/PPC comes to mind. or itanium
<heat> ppc, itanium, sparc
<geist> or arches that take explicit TLB misses, or even arm32
<nortti> 0
<heat> zero
<geist> iirc arm32 has some funny dual page table thing, where for every high level page table there's a second one
<nikolapdp> so a bunch
<heat> yes
<geist> yah but, if you notice all the modern ones basically copy x86
<geist> because they know where the bread is buttered
<geist> (not that x86 invented that strategy of page table)
<heat> fwiw windows also follows this idea somehow
<geist> yeah
<geist> prototype page tables, etc
<CompanionCube> the new POWER versions have more conventional page tables, don't they?
<heat> gosh linux was a fucking accident wasn't it
<nikolapdp> absolutely
<nikolapdp> just at the right place at the right time
<heat> the UNIX people generally have some sort of disdain for linux's abstractions
<heat> i guess this is what dave cutler talked about all along
<heat> the UNIX phds
<heat> vs the Linux... unemployed BSc's?
bauen1 has joined #osdev
<heat> vs the OpenVMS demigods of course
<nikolapdp> and who won out :)
<heat> IBM AIX
Matt|home has quit [Quit: Leaving]
<nikolapdp> SOLARIS
<heat> it's remarkably funny to read the svr4 internals book and see them justify the vnode as the end-all be-all of VFS's everywhere, but then when it comes to block devices and other special files, the vnode shits itself and needs a separate special filesystem to proxy
<nikolapdp> kek
<heat> whereas the linux jank has 3 separate structs with 3 separate method table structs
<heat> but everything Just Works(tm)
<nikolapdp> good enough(tm) always wins
<heat> yeah, the jank is there for a reason
<nikolapdp> if it works it ain't stupid i guess
<heat> there's a lot of stupid stuff *and* stuff that seems stupid but isn't
<heat> like struct page is really stupid and amazingly overloaded, but it's also the smallest of all the struct pages in UNIX
<nikolapdp> lol
<heat> there's a really great hairy trick in struct page: the mapcount field is biased to -1 (so 0 maps = -1 in mapcount)
<heat> this means that it's trivial and OPTIMAL to detect state transitions between mapped and unmapped
<heat> unmapped -> mapped = overflow to 0, mapped -> unmapped = underflow to 0xffffffff
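A simplified sketch of the biased counter trick, using a plain `int` instead of the atomic type a real kernel would need. The helper names are made up for the example; in Linux the equivalent checks are done with atomic helpers along the lines of `atomic_inc_and_test()` and `atomic_add_negative()`, as far as I recall.

```c
#include <assert.h>
#include <stdbool.h>

struct page {
    int mapcount; /* biased: -1 means "not mapped anywhere" */
};

static void page_init(struct page *p)
{
    p->mapcount = -1;
}

/* Returns true if this is the first mapping (count went -1 -> 0). */
static bool page_add_map(struct page *p)
{
    return ++p->mapcount == 0;
}

/* Returns true if the last mapping went away (count went 0 -> -1). */
static bool page_remove_map(struct page *p)
{
    return --p->mapcount < 0;
}
```

Because both transitions land on zero or on a negative value, the result of the increment or decrement already encodes the state change, so no separate compare against the old value is needed.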
<nikolar> Interesting
<nikolar> And very hacky
<heat> this is not a story the sun engineering department would tell you
antranigv has quit [Quit: ZNC 1.8.2 - https://znc.in]
<heat> solaris would have a 128-bit counter to make it mega-future-proof
<nikolar> As if we're getting 128 bit processors any time soon
<heat> matthew wilcox (from linux) estimated they would pop up around 2050/2060 IIRC
<zid> in what year did he estimate this
<zid> cus if it was anytime recent, I want what he's smoking