Vercas6 has quit [Remote host closed the connection]
Vercas6 has joined #osdev
nyah has quit [Ping timeout: 248 seconds]
[itchyjunk] has quit [Ping timeout: 255 seconds]
<heat>
how many TLB entries do you need to flush in order for a full TLB flush to be worth it?
<gog>
i guess it depends on the size of the working set
<gog>
compared to the number of TLB entries
smach has joined #osdev
scoobydoo has quit [Read error: Connection timed out]
scoobydoo has joined #osdev
<heat>
for context, I have this optimization sitting on my local branch where, when COW-ing in fork(), instead of doing a TLB shootdown per memory region as I COW it, I do a TLB shootdown of every address at the end
<heat>
I guess I could try another strategy and buffer TLB shootdowns, and then flush them all
<heat>
that's also interesting... I don't know if it has been done though
<heat>
right now my strategy for TLB shootdowns is to go through the memory range in the page tables, collect A bits and TLB shootdown ranges of pages
<zid>
you're shooting down because the pages are becoming -w to do the COW?
<heat>
yes
<heat>
I don't think I can avoid that
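(A minimal sketch in C of the A-bit scan heat describes: walk the range being COW-protected, clear the write bit, and only queue an invalidation for pages whose accessed bit is set, since only those can have a cached TLB entry. struct mm, pte_t, walk_pte() and tlb_queue_invalidate() are assumed helper names, not a real API.)

    #define PAGE_SIZE    4096ul
    #define PTE_PRESENT  (1ul << 0)
    #define PTE_WRITE    (1ul << 1)
    #define PTE_ACCESSED (1ul << 5)

    typedef unsigned long pte_t;
    struct mm;

    /* assumed helpers provided elsewhere by the VM */
    extern pte_t *walk_pte(struct mm *mm, unsigned long va);
    extern void tlb_queue_invalidate(struct mm *mm, unsigned long va);

    static void cow_protect_range(struct mm *mm, unsigned long start, unsigned long end)
    {
        for (unsigned long va = start; va < end; va += PAGE_SIZE) {
            pte_t *pte = walk_pte(mm, va);
            if (!pte || !(*pte & PTE_PRESENT))
                continue;
            *pte &= ~PTE_WRITE;                /* read-only: later writes fault and COW */
            if (*pte & PTE_ACCESSED)
                tlb_queue_invalidate(mm, va);  /* only accessed pages can be in a TLB */
        }
    }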
<zid>
I'd just invlpg as I went marking the -ws tbh
<heat>
but I can't just invlpg, I need to invlpg to everyone
<zid>
everyone?
<heat>
every other CPU executing my address space
<zid>
kinky bastard
<heat>
no u
<zid>
so I guess you INNER JOIN on your list of threads?
<heat>
fuck no that's slow
<zid>
you're going to have to tell them at some point
<heat>
I maintain a cpu set of cpus executing the address space
<zid>
also why would it be slow?
<heat>
when you ctx switch to set it, when you switch out you unset it
<heat>
i'd need to go through every thread
<heat>
that's not very gangsta
<zid>
process you're forking knows its threads, you set a bitmask for each cpu that thread is on, then for loop over your page tables sending an ipi to each set bit for each page, or you ipi to tell them all to dump the entire tlb
<zid>
s/that thread/a thread
<heat>
yes but that's already pre-computed
<zid>
okay then you already did the slow thing, grats
<heat>
also address spaces aren't really a process thing so...
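(A minimal sketch of the per-address-space CPU set heat describes: set a bit on context-switch-in, clear it on switch-out, and only IPI CPUs whose bit is set. MAX_CPUS, IPI_INVLPG and send_ipi() are assumed names, not a real API.)

    #define MAX_CPUS   64
    #define IPI_INVLPG 1

    struct aspace {
        unsigned long active_cpus;   /* bit n set => CPU n currently runs this aspace */
        /* ... page tables, cr3, ... */
    };

    extern void send_ipi(int cpu, int vector, unsigned long arg);  /* assumed */

    void aspace_switch_in(struct aspace *as, int cpu)
    {
        __atomic_or_fetch(&as->active_cpus, 1ul << cpu, __ATOMIC_RELEASE);
    }

    void aspace_switch_out(struct aspace *as, int cpu)
    {
        __atomic_and_fetch(&as->active_cpus, ~(1ul << cpu), __ATOMIC_RELEASE);
    }

    void tlb_shootdown(struct aspace *as, unsigned long va)
    {
        unsigned long targets = __atomic_load_n(&as->active_cpus, __ATOMIC_ACQUIRE);
        for (int cpu = 0; cpu < MAX_CPUS; cpu++)
            if (targets & (1ul << cpu))
                send_ipi(cpu, IPI_INVLPG, va);   /* remote invlpg on each active CPU */
    }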
<zid>
so I guess all you care about now is the ratio
<zid>
between "how many invplg" and "kill the entire tlb"
<heat>
korrect
<zid>
Let's start with 7
<zid>
and adjust it until it performs better
<heat>
not really "how many invlpg" but "how many shootdowns"
<zid>
a shootdown does an invlpg, the count is identical
<heat>
a shootdown is 20x more expensive
<zid>
??
<heat>
IPI
<zid>
You do know I am literally talking about invlpg via ipi right
<zid>
not an invlpg that won't do anything because it's on the wrong cpu
<heat>
yes
<zid>
the amount of *remote* invlpgs is the same as the number of ipis, so I can use "the number of invlpgs" just fine
<zid>
to also mean the number of ipis
<heat>
ok sure
sikkiladho has quit [Quit: Connection closed for inactivity]
<zid>
So yes, start with 7, I assume that's easy to calculate because you know the memory map in terms of ranges somewhere?
<zid>
ala /proc/pid/map or whatever
<zid>
and you'll need some crap for shm I guess
<zid>
It's sounding more and more like you'll want to just do it blindly tbh
<zid>
given how annoying it will be to actually find out the answer of *what* needs flushing
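(The ratio zid is pointing at, as a sketch: below some tuned threshold do per-page shootdowns, above it tell everyone to drop the whole TLB. The value 7 is only zid's starting guess; tlb_shootdown_page() and tlb_shootdown_all() are assumed helpers.)

    #define FULL_FLUSH_THRESHOLD 7
    #define PAGE_SIZE 4096ul

    struct aspace;
    extern void tlb_shootdown_page(struct aspace *as, unsigned long va);  /* IPI + invlpg */
    extern void tlb_shootdown_all(struct aspace *as);                     /* IPI + cr3 reload */

    static void tlb_shootdown_range(struct aspace *as, unsigned long start, unsigned long npages)
    {
        if (npages > FULL_FLUSH_THRESHOLD) {
            tlb_shootdown_all(as);    /* one IPI per CPU, kill the entire (non-global) TLB */
            return;
        }
        for (unsigned long i = 0; i < npages; i++)
            tlb_shootdown_page(as, start + i * PAGE_SIZE);
    }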
heat_ has joined #osdev
<heat_>
my router just shootdowns my connection every so often
<heat_>
its gr8
<heat_>
<heat> right now it's not easy to calculate
<heat_>
<heat> I'm iterating through every region and mprotecting it COW
<heat_>
<heat> and an mprotect of a range can invalidate N pages as well as it can invalidate 0
<zid>
[02:28] <zid> It's sounding more and more like you'll want to just do it blindly tbh
<zid>
[02:28] <zid> given how annoying it will be to actually find out the answer of *what* needs flushing
heat has quit [Ping timeout: 256 seconds]
<heat_>
yes, it's hard
<zid>
given shm + walking multiple ranges
<heat_>
a strategy would be to buffer tlb invalidations in a list and then flush that shit in the end
<zid>
Or you could just drop the tlb and not waste 10kB of icache with this code ;)
<heat_>
booo
<heat_>
let me overengineer
<zid>
I can't find the x86 version of [02:28] <zid> It's sounding more and more like you'll want to just do it blindly tbh
<zid>
[02:28] <zid> given how annoying it will be to actually find out the answer of *what* needs flushing
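(A sketch of heat's "buffer the invalidations, flush at the end" idea, assuming a small fixed-size batch that degrades to a full flush when it overflows; all names are illustrative.)

    #include <stdbool.h>
    #include <stddef.h>

    #define TLB_BATCH_MAX 32

    struct aspace;
    extern void tlb_shootdown_page(struct aspace *as, unsigned long va);  /* assumed */
    extern void tlb_shootdown_all(struct aspace *as);                     /* assumed */

    struct tlb_batch {
        struct aspace *as;
        unsigned long vaddrs[TLB_BATCH_MAX];
        size_t n;
        bool full_flush;    /* overflowed: cheaper to just drop the whole TLB */
    };

    static void tlb_batch_add(struct tlb_batch *b, unsigned long va)
    {
        if (b->n == TLB_BATCH_MAX)
            b->full_flush = true;
        else
            b->vaddrs[b->n++] = va;
    }

    static void tlb_batch_flush(struct tlb_batch *b)
    {
        if (b->full_flush) {
            tlb_shootdown_all(b->as);
        } else {
            for (size_t i = 0; i < b->n; i++)
                tlb_shootdown_page(b->as, b->vaddrs[i]);
        }
        b->n = 0;
        b->full_flush = false;
    }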
<geist>
wouldn't hot by UK cuisine standards be like a chili pepper on the same table as some peas?
<zid>
we have the largest indian immigrant population in the world though?
<zid>
It'd be like making fun of texas for not being able to handle chili
<zid>
red dwarf is almost entirely about trying to find a good hot curry
<sbalmos>
gimme some good hot harissa sauce though
<zid>
that seems nice
<geist>
well, that's true. one time on a business trip to Korea the locals all tried to make us eat hot stuff at lunch to try to show off their cuisine. now, i can tell you korea has some spicy stuff
<zid>
yea it really does
<geist>
but... living in bay area at the time there's also fairly spicy stuff around so it wasn't successful
<geist>
but i remember the koreans laughing that the week before they took some italians out for lunch and they were totally not up to it
<sbalmos>
oh yeah, a bowl of Maruchan spicy chicken
<geist>
which i guess makes sense, you dont think of italian cuisine as being particularly hot
<zid>
The spiciest thing an italian is likely to eat is black pepper
<sbalmos>
dammit, now I'm going to have to go and get some habaneros this weekend. I've got two fresh mangos left I can dice up and make fresh mango habanero sauce with
<bslsk05>
redirect -> www.amazon.com: Amazon.com : El Yucateco Green Hot Sauce Bottle, Chile Habanero, 8 Ounce : Hot Sauces : Grocery & Gourmet Food
<zid>
ah that'd be fairly hot then yea
<geist>
it's beyond my ability to handle hot
<sbalmos>
yeah, straight habanero?
<zid>
habanero is ~250K scovilles, scotch bonnet is ~450k
<zid>
similar ballpark, it's fairly log scale in practice
<geist>
yah i'm totally willing to admit i have a limit
<zid>
you can only hurt SO much before more hurt is no different
<sbalmos>
yup
<geist>
i'm more of a tasty hot, like cholula sauce
<sbalmos>
geist: that's exactly why I love harissa
<zid>
so yea, if you want an idea of what happened, I got a big blob of your nice green sauce on my tongue, and it had a lump that was especially extra hot
<zid>
then double it
<geist>
sauce is totally getting triggered left and right now
<zid>
hopefully
* geist
waves at sauce
<zid>
man, I want more samyang buldak now
<zid>
I paid over the odds for some off amazon
<zid>
they were nice, but not £1.50 a pot nice
<sbalmos>
leftover chicken pad thai this evening. with extra hot chili oil. ;)
<zid>
hottest thing I've ever eaten was some chocolate that claimed it was 4 million
<geist>
heh i think because of someone here i ended up buying off amazon a few jars of uh what was that
<geist>
it was some little sauce, not HP sauce (which is very british)
<sbalmos>
zid: is that the one that has like ground up bhut jolokia or whatever it is
<sbalmos>
?
<geist>
but some sort of weird, hard to describe thing
<geist>
like the vegemite of uk
<zid>
there's a bunch, it's cheap to make and sells at a premium
<bslsk05>
twitter: <ChevyRay> nobody ever mentions the greatest benefit of not using source control: everything can be lost completely. the true best outcome. finally, it's gone, we're free.
<geist>
ddevault: indeed, but i'm not so sure it's worth pivoting your whole design around
<ddevault>
well, I definitely need contiguous pages
<geist>
even a bitmap one you can allocate contiguous, it's just a slow operation
<ddevault>
and I'd prefer not to do a bitmap
<geist>
but since it's probably rare that's probably fine
<geist>
okay, a linked list then, that's what i have always done
<geist>
an array of per page structures, that are then also stuffed in a free list
<ddevault>
linked list also makes it hard to do contiguous allocations :<
<geist>
pop one off for a page, but to allocate contig you walk through the array
<ddevault>
hm
<heat>
i did what geist does
<heat>
it just works(tm)
<zid>
zidlist ftw
<geist>
gotta go, bbiab
<ddevault>
works if you're careful to avoid multi-page structures crossing page table boundaries
<heat>
allocating contiguous pages is shit but oh well
<zid>
a zidlist is where you allocate a linked list out of an array (or dynamic array) of structs
<heat>
I only allocate them when I really need to
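(A sketch of the scheme geist and heat describe, and of zid's "zidlist": one struct per physical page, kept both in a flat array indexed by page frame number and on a free list. Single pages pop off the list; contiguous runs are found by walking the array. Names are illustrative and no locking is shown.)

    #include <stdbool.h>
    #include <stddef.h>

    struct page {
        struct page *next_free;   /* free-list link, NULL while allocated */
        bool free;
    };

    static struct page *page_array;   /* one entry per physical page frame */
    static struct page *free_list;
    static size_t total_pages;

    size_t page_to_pfn(const struct page *p) { return (size_t)(p - page_array); }

    /* common case: pop a single page off the free list */
    struct page *alloc_page(void)
    {
        struct page *p = free_list;
        if (p) {
            free_list = p->next_free;
            p->free = false;
        }
        return p;
    }

    /* rare, slow case: scan the array for a run of free pages
       (a real PMM would also unlink each page from the free list) */
    struct page *alloc_pages_contig(size_t count)
    {
        for (size_t i = 0; i + count <= total_pages; i++) {
            size_t run = 0;
            while (run < count && page_array[i + run].free)
                run++;
            if (run == count) {
                for (size_t j = 0; j < count; j++)
                    page_array[i + j].free = false;
                return &page_array[i];
            }
            i += run;   /* resume the search past the non-free page */
        }
        return NULL;
    }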
<ddevault>
the alternative is making the kernel page tables a bit more sophisticated
<ddevault>
so that I can map non-contiguous pages contiguously
<ddevault>
but I'm not in love with that idea
<zid>
what do you need contig pages for, DMA?
<ddevault>
device memory is a much worse can of worms
<ddevault>
this is just because some of my data structures exceed a page in length and I'd like to be able to work with them through a regular old pointer
<ddevault>
and presently my kernel uses a fixed set of page tables
toluene has joined #osdev
saltd has joined #osdev
xenos1984 has quit [Read error: Connection reset by peer]
dude12312414 has joined #osdev
<heat>
gog, wtf is that program and why is that so fresh sheeeeeesh bruh on god frfr
xenos1984 has joined #osdev
the_lanetly_052 has quit [Ping timeout: 256 seconds]
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
saltd has quit [Remote host closed the connection]
<geist>
ddevault: yeah that is an issue. if you have a physmap (all of physical memory mapped in the kernel) and you allocate data structures out of it then it's hard to have them cross boundaries
<geist>
physmaps work great for things like page tables, since it doesn't matter if they're scattered around
<geist>
and you can directly access any page table in any aspace with just a simple transformation of physical -> physmap
<geist>
this is why i dont lean on things like recursive page tables, because the physmap solves it for all aspaces simultaneously on all architectures
<geist>
but... data structures that are large are a problem. for those you really gotta pull them out and make a virtual mapping
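(A sketch of the physmap transformation geist mentions: with all of physical memory mapped at one fixed kernel offset, any physical address pulled out of a page table entry becomes a usable kernel pointer with a single add, for any address space. PHYSMAP_BASE is an illustrative constant.)

    #define PHYSMAP_BASE 0xffff800000000000ul

    static inline void *phys_to_virt(unsigned long paddr)
    {
        return (void *)(PHYSMAP_BASE + paddr);
    }

    static inline unsigned long virt_to_phys(const void *vaddr)
    {
        return (unsigned long)vaddr - PHYSMAP_BASE;
    }

    /* e.g. descending one level of any aspace's page tables: mask off the flag
       bits of an entry and translate its physical address through the physmap */
    static inline unsigned long *next_level_table(unsigned long entry)
    {
        return phys_to_virt(entry & ~0xffful);
    }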
<zid>
I do both cus weird
<geist>
zircon does both too, actually
<geist>
the heap still lives in the physmap and thus still leans on contig allocations
saltd has joined #osdev
<geist>
though we have a general rule that you can't malloc() > 4K data structures so the heap can expand one page at a time
<zid>
I like like how simple the page table code is if you use recursive
<zid>
s/like/just
<geist>
yeah i just cant get behind it because it's intrinsically an x86 only thing anyway
<geist>
so though it's cute i have to generically solve the problem for non x86
<geist>
and once you generically solve it, it has very little advantage outside of early bootstrapping
<geist>
also i think the TLB shootdown issues on x86 are kinda swept under the rug a bit. for an SMP machine i think it's pretty expensive to maintain these recursive mappings
<zid>
why would the recursive mapping way need more tlb shootdowns, are you likely to change the permissions on the pages in the pagetable?
<zid>
rather than the pages they're describing
<geist>
as you're mapping and unmapping inner page tables you'd have to do the TLB shootdowns on the recursive mapping itself
<zid>
yea I am asking why you would need to do the shootdowns on the recursive mapping at all
<geist>
since those end up being interpreted as leaf page table entries
<geist>
then you have to shoot those down as you remove inner page tables
<geist>
but a lot of that depends on the also complicated problem of how one does recursive mappings in an SMP world. does each cpu get a slot?
<geist>
do you have to pin the current thread on a cpu while fiddling with it?
<geist>
when you stop fiddling with it and free the slot do you have to do a local TLB shootdown before you can reuse the slot? etc
<geist>
it gets fairly complicated pretty quickly
<geist>
i think it tips over the threshold of not worth the trouble pretty quickly
<geist>
i think for a UP x86 system it's not a bad strategy, since none of these TLB shootdown things become that complicated. maybe you have to do a few extra TLB invalidates when removing an inner page table, but they're local
<geist>
you might have to deal with having two slots for this stuff: the current aspace and an alternate one, since it's fairly common to manipulate a 'foreign' aspace while one is active
<geist>
also a thing the physmap solution has no problem with, since there's no notion of a live aspace vs a non-active one. they're all equally visible
<zid>
(note none of this will make sense until I figure out the answer to the question I asked earlier)
<geist>
which is?
<zid>
why would you ever need to invlpg the recursive mapping across cpus to begin with
<geist>
okay, start without having multiple cpus
<geist>
if you are UP, you do know that you have to do an invlpg when you detach a page table itself right?
<zid>
sure, otherwise the TLB would give me a hit instead of walking it properly
<zid>
and it relies on it being a proper walk every time to work
<geist>
so if you're SMP and they're mapped into the kernel, you'd have to shootdown cross cpu
<zid>
you skipped the part that answers my question there though
<geist>
*unless* you make the recursive mapping intrinsically single cpu
wootehfoot has joined #osdev
<zid>
what's causing other cpus to care
<geist>
being mapped into the kernel aspace
<zid>
they have their own valid/invalid mappings
<zid>
and we're *always* invplging anyway
<geist>
well, not when fiddling with page table entries
<geist>
this is a new thing you have to do with the recursive stuff
<zid>
I need to distribute the info that the mappings changed somehow though, whether I changed it via recursion or a direct magic bullet write
<geist>
basically you want to avoid manipulating the kernel aspace like the plague if you can, because it involves global TLB invalidates
<geist>
ah no, the difference is whether or not you have to do it for *all* cpus, or just cpus that currently have the aspace mapped
<geist>
since the latter is usually far lesser than the former, it's a huge win
<zid>
heh the onion keeps growing
GeDaMo has quit [Quit: A program is just a bunch of functions in a trenchcoat.]
<geist>
in the case of unmapping/protecting a regular page in some random aspace, any serious system at least keeps a bitmap of which cpus have the aspace active
<geist>
and only shootdown on this
<zid>
Not seeing it
<geist>
but... for stuff like the recursive table which is intrinsically mapped into the kernel you now expose *all* of the cpus to the mappings
<zid>
why do cpus without these page tables loaded care?
<geist>
i think we're just not on the same page here
<geist>
probably a different set of base assumptions about how stuff works
<geist>
not saying the recursive thing cant be made to work efficiently, but if you follow its trail down to all the usual details it ends up with a few edge cases that are either nasty or require restructuring things differently than if you hadn't used recursive mappings
<zid>
one cpu starts playing snake, loads cr3 with some value where [cr3+504] == cr3
<geist>
and then, it being x86 only, you now have to deal with an x86-only hack that influences the rest of the design
<zid>
other cpu starts playing monkey island, does the same, different cr3 value
<zid>
why do they care about each other?
<geist>
what if the cpu that is playing snake wants to modify monkey island's aspace?
<zid>
Okay that's an interesting thought
<geist>
that's part of this 'starts to influence the whole design' thing. now it may be that the whole upper level VM has to start knowing about 'the current aspace' vs 'not currently active' and treating them differently
<geist>
i prefer designs where current and not current are equally available
<zid>
actually I don't understand the question, what are you meaning by 'aspace' here, modifying *its* page tables remotely? That's just some random physical memory I have to map into my process then write to, isn't it?
<geist>
you can, for example, swap to the target aspace, do some mmu ops, then swap back, but then that's a cr3 reload
<geist>
address space. think of it as the per process object that holds all of the page tables for that process, etc
<geist>
pmap in BSD parlance
<zid>
yea I got what it abbreviated for, just not necessarily what you meant by it, the actual page tables, or some tracking objects or whatever
<geist>
kinda both. i use aspace as the object that among other things holds the cr3
<zid>
It doesn't seem important to me in the least
<zid>
like.. how many times am I running CreateRemoteThread?
<geist>
it's a shortcut for saying 'that whole thing'
<zid>
when cheating at battlefield or whatever is all I can think of
<geist>
and this is where i think we have different assumptions about what operations are important
<geist>
but that's what i was trying to express a bit a while ago
<zid>
it's important to microkernels?
<zid>
cus of all the message passing stuff?
<geist>
it's important to complicated VMs that are scanning page tables that are non-active
<geist>
for example. it's also important to different designs that may or may not be posixy or win32y
<zid>
big iron processes vs hippie free love processes :P
<geist>
for zircon for example it's *extremely* important, but we build the whole design around it
<zid>
yea well if you have to optimize for it you have to optimize for it
<geist>
anyway, fine. i've said my piece. i just wanted you to see that it's not always a win-win
<geist>
but you're not interested so i should get back to work
<zid>
Saying "It's important in zircon to be able to easily access another process' page tables, and you can't even really USE recursive mapping to help you there" would have clued me in a lot better than talking about all cpu shootdowns or whatever
<zid>
what
<zid>
the fuck are you talking about geist
<zid>
when have I ever dismissed anything you said for any other reason than that I didn't understand
<geist>
well, no i was explicitly trying not to mention zircon at all
<zid>
because as you mentioned, it isn't my wheelhouse
<geist>
you dragged it in and then wrote off the whole explanation as 'oh this is some zircon thing'
<geist>
which was *not* my intention
<zid>
I didn't write anything off, I said oh, is this on your mind and important to you, because of your important projects
<zid>
that seems 100% reasonable to me
<geist>
most OSes dont use this recursive thing because it has downsides
<geist>
i was trying to generically describe some of the downsides
<geist>
and plus it's x86 specific *anyway*
<geist>
without trying to bring zircon into it, because i'd assume you'd say what you did
<zid>
right but to actually *get* to those downsides you have to come up with a pretty specific example, and I was trying to figure out *when* that example would be important
<geist>
we haven't really described how it'd work otherwise in an SMP situation
<geist>
is it that each cpu intrinsically has the local aspace's recursive tables mapped into the same 'slot' because it's specific to the CR3?
<zid>
well you'd need a second code path regardless right?
<geist>
if so, does that mean you can only intrinsically manipulate the current aspace?
<geist>
well that's the question, what does the second code path look like?
<zid>
the point of the recursive mapping is to abuse the MMU to do the walking code for you
<zid>
so if you're talking about a process that isn't loaded, it doesn't work
<zid>
so you'd need a 'walk the page tables like a normal human' version for remote ops
<geist>
or for example a VM worker thread that's cycling through and evicting page tables
<geist>
it could just keep setting one active, manipulating it, moving on
<zid>
cute, loading cr3 blows the tlb though sans global bits
<zid>
and I don't think you want global bits set on your recursive mapping anyway
<geist>
ah but you have to walk them in the recursive case too, you just dont have to do the physical -> virtual translation at every level
<zid>
so is that a problem?
<geist>
you can't 'jump to the end'
<geist>
so all you're really saving is the physical -> virtual lookup on every page table
<zid>
jump to the end of what? what jumping and what end?
<geist>
well you said 'the mmu doing the walking code for you'
<geist>
but you can't let it walk all 4 levels, you have to still iterate each level, you're just doing it in the recursive mapping
<zid>
you're generating virtual-virtual-addresses that let you modify virtual-addresses, by using the hw which does virtual->physical one step removed
<zid>
and tricking it into treating phys as virt
<geist>
yes but still have to do it per level, is my point
<geist>
you can't skip directly to the last level, which would be a good win
<geist>
*unless* you absolutely knew it was mapped already
<zid>
yea that'd be a useful optimization sometimes I imagine
<geist>
otherwise you'll get a page fault
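(For reference, a sketch of the recursive-mapping trick being discussed, for x86-64 4-level paging with an assumed recursive slot 510 in the PML4: the MMU's own walk turns the page tables themselves into addressable memory, but as geist notes, every intermediate table must already be present or the access simply faults. The slot number and names are illustrative.)

    #define RECURSIVE_SLOT 510ul

    /* canonicalize: copy bit 47 into bits 63:48 */
    static inline unsigned long canonical(unsigned long addr)
    {
        return (addr & (1ul << 47)) ? (addr | 0xffff000000000000ul) : addr;
    }

    /* virtual address of the level-1 PTE mapping 'va' in the *current* aspace:
       the top-level index becomes the recursive slot and the remaining indices
       of 'va' shift down one level, with the old PT index becoming the offset */
    static inline unsigned long *pte_vaddr(unsigned long va)
    {
        unsigned long addr = (RECURSIVE_SLOT << 39) | ((va >> 9) & 0x7ffffffff8ul);
        return (unsigned long *)canonical(addr);
    }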
<geist>
anyway, gotta get back to work
<geist>
i've managed to not convince another user of the recursive tables that it has issues :(
<zid>
It obviously has issues
<zid>
everything has issues
<geist>
i've tried this again and again over the years, but most users of the recursive tables will hold onto them until the grave
<zid>
the question is whether those are the *important* issues, to your workload/design
<geist>
or eventually they figure it out
<geist>
i think it falls into the same category as 'bitmap to track your page allocation'
<zid>
for my workload the pagetables basically never change so whatever works tbh
<zid>
so whatever's easiest to write into C is the best approach
<geist>
useful for the first few stages of osdev but it eventually ends up being more trouble than it's worth
<geist>
sure. i just wanted you to see where some issues would happen
<zid>
right you just skipped a step and really confused me
<geist>
anyway, gotta meeting
<geist>
bbiab
<zid>
by saying you'd *have* to shoot down all cpus all the time, out of nowhere apropos of nothing
<zid>
so I had to do a deep dive into the 20 mental steps that were comfortable to you, but I'd never had to deal with in my designs
<zid>
then I think you got upset that I thought maybe it'd be useful for microkernels to not do that
<zid>
I still don't know why though
<geist>
fine.
<geist>
tired of arguing about it. forget the discussion happened
<zid>
okay but then you're just going to get upset again
<zid>
because I still don't know what upset you
<geist>
bummer.
<zid>
"babe what's wrong?" "Nothing" *oh fuck*
<geist>
both of us have 'get in the last word' syndrome
<geist>
so we keep dragging it along
<zid>
No you're just being a child
<geist>
no you!
<geist>
doodoohead!
<zid>
I was being courteous and asking you nicely what I did to upset you and how I could resolve it
<zid>
you told me it was my fault and I had issues
<zid>
go away
<geist>
again can we just drop it?
<dh`>
looking at the scroll I can't even tell what you're fighting about :-(
<geist>
i just wanted a 10 minute discussion and it tipped into something personal for accidental reasons so i just pulled the plug
<geist>
that's all. just a misfire. life will go on. /me buys zid a beer
<mjg>
can i have the last word?
<mjg>
SOLARIS
<mjg>
</thread>
<geist>
XENIX FOREVER
<geist>
(see, getting in the last word)
<mjg>
sounds like 2, to nitpick!
<j`ey>
aarch64
<geist>
speaking of: one of the really nice things about arm64: its page tables are very fragile. if you corrupt one, boom you get an exception that is well detailed
<geist>
so, for example, when some device dmas all over ram the kernel is frequently the canary in a bad page fault
<j`ey>
I think I found a bug in qemu cos my tables were wrong but it didnt blow up
<geist>
not sure qemu is as picky as real hardware is
<zid>
moar nochain
<zid>
nochain solves all ills
<j`ey>
I had to dive into qemu code
<zid>
It's an absolute panacea, an elixir of bug resolvement
<geist>
a common thing that you get in the real world, especially without an mmu, is dma trashing physical ram
<j`ey>
and by dive, I mean add printfs
<geist>
having the kernel be very aggressive about asserting things is good. just had a bug filed internally with a stray dma caught with our PMM checker logic (expensive, not on all the time)
<geist>
but basically it writes a known pattern to freed memory and scans for corruptions
<geist>
catches a fair amount of things
<geist>
looks like 24 bit pattern. probably RGB data
<j`ey>
nice
<j`ey>
DMAsan
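(A sketch of the PMM checker geist describes: fill freed pages with a known pattern and verify it before the page is handed out again, so a stray DMA or any other wild write into free memory gets caught. The pattern and names are illustrative; per the discussion this is an optional, fairly expensive debug feature rather than something left on all the time.)

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define PMM_FREE_PATTERN 0xdeadbeefdeadbeeful

    /* called when a page is freed */
    static void pmm_poison_page(void *page, size_t page_size)
    {
        uint64_t *p = page;
        for (size_t i = 0; i < page_size / sizeof(*p); i++)
            p[i] = PMM_FREE_PATTERN;
    }

    /* called (or run by a scanner) before the page is reused */
    static bool pmm_page_is_intact(const void *page, size_t page_size)
    {
        const uint64_t *p = page;
        for (size_t i = 0; i < page_size / sizeof(*p); i++)
            if (p[i] != PMM_FREE_PATTERN)
                return false;   /* someone scribbled on a free page */
        return true;
    }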
<geist>
yah the proper solution is to have an iommu, but sadly most consumer ARM hardware still doesn't seem to have one
<geist>
i think this is partially ARM's fault for making it non-mandatory, and then their design (SMMU) is hella complicated and probably expensive in hardware
<geist>
and thus vendors dont stuff it
<zid>
expensive devkit version with an iommu for debugging sounds useful though
<mjg>
geist: so you recently linked your memcpy or memmove for 32-bit arm
<mjg>
would be curious to bench it against the crap in freebsd
<mjg>
[although, not an arm person, person that was 64 bit?]
<mjg>
s/person/perhaps/
<geist>
yeah that ended up being the darwin version (i wrote it when i was at apple)
<geist>
but arm64 has a completely different set of constraints, so arm64 memcpys look totally different
<geist>
and i wasn't at apple at the time the 64bit world came along
<mjg>
do you have any board which can run that code?
<geist>
oh probably. but also i think that was born in an era when unaligned stuff mattered more than it does now
<geist>
so if i ran it on say a modern cortex-a53 or a72 in 32bit mode, it may be effectively 'too powerful'
<geist>
and just to be clear which code are you talking about? the arm32 assembly?
<mjg>
yea
<mjg>
bcopy.s
<geist>
as a side note they have an arm64 version right next to it, up and over a level in the file hierarchy
<geist>
i didn't write it, but it looks like it's been tuned at least
<mjg>
maybe that should take cortex strings
<mjg>
interestingly they have copy pasted the *terrible* bsd routines
<mjg>
for amd64
<geist>
yah i didn't look too closely at it
<geist>
maybe they runtime patch it
<dh`>
I'm glad someone looks after micro-optimizations because I usually just can't bring myself to care
<dh`>
:-p
<mjg>
one could argue that's passé in the current cpu vuln landscape
srjek has joined #osdev
scoobydoo has quit [Read error: Connection timed out]
scoobydoo has joined #osdev
<geist>
yeah, and/or modern cpus tend to be pretty fast at plain ass code
<geist>
though there's still room to eke out a bit for memcopies and whatnot