klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
<mjg> if the page is going away
<mjg> otherwise i want it to stay on the lru list
<heat> yeaaaaaaaaaaaaaaaaaaaa idunno
<heat> slabs are bad anyway
<heat> long live the slubs
<mjg> long live buddy allocator
<heat> life is always better with a friend
<mjg> i have to stress, as much as slab is an ok idea, i don't think really crediting bonwick here is all that great
<mjg> i think the key takeaway from the paper which was not 100% obvious was converting malloc to use it
<mjg> but past that even the paper explicitly states people were hacking up their own caching layers in subsystems
<mjg> and his is just making it general
<mjg> which is not that much of a stroke of genius
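The per-type caching layer being discussed can be sketched in a few lines: a cache that hands back pre-constructed objects from a freelist instead of hitting the general allocator every time. This is a toy sketch; `obj_cache` and friends are made-up names, and a real slab allocator also carves objects out of whole pages and keeps per-slab metadata.

```c
/* Toy sketch of the slab idea: a per-type cache of pre-constructed
 * objects. All names here are illustrative, not from any real kernel. */
#include <assert.h>
#include <stdlib.h>

struct obj_cache {
    size_t obj_size;
    void (*ctor)(void *);   /* run once per object, not once per alloc */
    void **freelist;
    size_t nfree, cap;
};

static struct obj_cache *cache_create(size_t size, void (*ctor)(void *)) {
    struct obj_cache *c = malloc(sizeof *c);
    c->obj_size = size;
    c->ctor = ctor;
    c->cap = 64;
    c->nfree = 0;
    c->freelist = malloc(c->cap * sizeof(void *));
    return c;
}

static void *cache_alloc(struct obj_cache *c) {
    if (c->nfree)                   /* fast path: reuse a constructed object */
        return c->freelist[--c->nfree];
    void *o = malloc(c->obj_size);  /* stand-in for grabbing a fresh slab */
    if (c->ctor)
        c->ctor(o);
    return o;
}

static void cache_free(struct obj_cache *c, void *o) {
    if (c->nfree < c->cap)          /* keep it constructed for next time */
        c->freelist[c->nfree++] = o;
    else
        free(o);
}
```

The point mjg credits the paper for is exactly this fast path: the object comes back still constructed, so repeated alloc/free of the same type skips both construction and the general-purpose allocator.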
SpikeHeron has joined #osdev
<heat> sure
<mjg> basically a number of other people would have done it if they could be fucked
<heat> otoh, it is different
<heat> it's like a funky object pool
MrBonkers has joined #osdev
<heat> but funnily enough it lost most of its funkyness in linux
<heat> it's just a pool now
<heat> a particularly named and organized pool
<mjg> "pool" is how they refer to it in the theo land
<heat> .theo
<mjg> although! it is plausible the pool thing is just for the per-cpu layer
<mjg> i don't remember now
<heat> The world doesn't live off jam and fancy perfumes - it lives off bread and meat and potatoes. Nothing changes. All the big fancy stuff is sloppy stuff that crashes. I don't need dancing baloney - I need stuff that works. That's not as pretty, and just as hard.
<heat> theo dissing you
<mjg> theocracy
[itchyjunk] has quit [Ping timeout: 260 seconds]
<mjg> look man
<mjg> unscrew that vfs bench
<mjg> then i'm gonna take a look at patching obsd to retain the lead
<heat> lmao
<heat> which one
<heat> open3?
<mjg> whichever one you are behind
<mjg> i think it was open3
<mjg> but i'm happy to take any other
<mjg> as long as you are slower there now
<heat> I'm gonna take a little while here
<heat> so, my problem
<heat> I have the generic virtual address space allocator
<heat> it's kind of inefficient and holds a big, sleepable lock around the address space
<heat> i need a non-sleepable thing for vmalloc
<heat> I was going to take my time and get another tree impl
[itchyjunk] has joined #osdev
<mjg> heat: and unsingle-list your namecache entries, ffs!
Test_User has joined #osdev
<heat> I think linux uses a hashtable
<mjg> everyone uses a hash table
<mjg> except for net and open
<heat> I don't know how that looks in contention
gorgonical has quit [Ping timeout: 265 seconds]
<mjg> it's great because you don't take locks when you do the lookup
<heat> you still do no?
<mjg> no
<mjg> R C U
<mjg> well hash table is not *inherently* scalable, but it is pretty good with a delayed memory reclamation mechanism
<mjg> like rcu
\Test_User has quit [Ping timeout: 268 seconds]
<heat> do your lookups use ebr?
<mjg> smr
Test_User is now known as \Test_User
<heat> oh yeah
<heat> That other rcu thing
<mjg> it's a freebsd-specific variation
<mjg> for the sake of this convo you may as well assume ebr
<mjg> key point is 0 ping pong against others accessing the same chain
<mjg> i'm confident i pasted benchmark results
<mjg> from not messing with locks for lookup
<heat> yea
<mjg> then wtf is up with the question dawg
<heat> i forgot dawg
gxt has quit [Remote host closed the connection]
<mjg> search for kern_statat
<mjg> from a major bottleneck to a blip
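The lockless-lookup shape mjg is describing (hash table plus RCU/EBR-style delayed reclamation) looks roughly like this: readers traverse a chain with plain acquire loads and never write to shared memory, writers serialize among themselves and publish with a release store. This is only a sketch of the structure; the reclamation side (freeing entries after all readers drain) is assumed to exist elsewhere, and the spinlock here just stands in for whatever writer serialization a real kernel uses.

```c
/* Sketch: hash table where lookups take no locks. Assumes an EBR/RCU-style
 * scheme elsewhere delays freeing of removed entries; names are made up. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64

struct entry {
    const char *name;
    void *val;
    struct entry *next;
};

static struct entry *buckets[NBUCKETS];
static int write_lock;          /* toy spinlock for writers only */

static unsigned name_hash(const char *s) {
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Readers: no locks, no stores to shared lines -> zero ping-pong
 * against other CPUs walking the same chain. */
static void *lookup(const char *name) {
    struct entry *e = __atomic_load_n(&buckets[name_hash(name)], __ATOMIC_ACQUIRE);
    for (; e; e = __atomic_load_n(&e->next, __ATOMIC_ACQUIRE))
        if (!strcmp(e->name, name))
            return e->val;
    return NULL;
}

/* Writers serialize among themselves; publication is one release store,
 * so concurrent lock-free readers always see a consistent chain. */
static void insert(const char *name, void *val) {
    struct entry *e = malloc(sizeof *e);
    e->name = name;
    e->val = val;
    unsigned b = name_hash(name);
    while (__atomic_exchange_n(&write_lock, 1, __ATOMIC_ACQUIRE))
        ;                       /* spin: writers are assumed rare */
    e->next = buckets[b];
    __atomic_store_n(&buckets[b], e, __ATOMIC_RELEASE);
    __atomic_store_n(&write_lock, 0, __ATOMIC_RELEASE);
}
```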
gxt has joined #osdev
<heat> oooh nice
<heat> geist, i'm takin trusty's wavl bst
<heat> it looks n o i c e
<mjg> hm what's rb vs wavl perf?
<heat> according to geist wavl is slightly better for vm region stuff
<heat> which is why zircon uses it
<mjg> no opinion
<mjg> i do have to note there is more than one way to implement rb
<mjg> fwiw i thought the foregone opinion was that one should radix vm
<mjg> sigh
<mjg> i googled for wavl perf, found a suspicious site
<mjg> dude has sub-second results for everything
<mjg> and then it turns out he implemented all of it in python
<mjg> f-word
<heat> if it's a suspicious site always type your online banking login
<heat> mjg, why do your flamegraphs look drastically nicer than mine?
<heat> your fbsd ones
<mjg> in what way
frkzoid has quit [Ping timeout: 260 seconds]
\Test_User has quit [Ping timeout: 264 seconds]
<heat> less nested and with nicer colors
nick64 has quit [Quit: Connection closed for inactivity]
<heat> I haven't figured out what the colors mean tbh
<mjg> colors are random
<heat> huh
<mjg> and i don't know if you are joking or what
\Test_User has joined #osdev
<heat> so it just autumns all over my beautiful stacks
<heat> no, I'm not joking
<mjg> but the random short columns are bugs in dtrace stack unwinder
<mjg> for example cache_fplookup most definitely is not being called from amd64_syscall
<mjg> and i know cache_fplookup's caller did not tail call into it either
<heat> yeah but things look definitely less nested
<heat> your call depth doesn't look as high as mine
<mjg> oh that
<mjg> well it would be taller if it was not for the above
<mjg> taller than it is
<mjg> but past that, i suspect some inlines
<mjg> and
<mjg> your code sux0rz more than freebsd
freakazoid332 has joined #osdev
<heat> noo
<mjg> if you want a funny flamegraph https://people.freebsd.org/~mjg/lynx4.svg
<mjg> there used to be a global rw lock for the namecache
<mjg> and this is how it looked like :S
<heat> ugh
<heat> see I don't suck that bad
<heat> when was this?
<mjg> 2015
<heat> i'm better than 2015 fbsd
<heat> good enough for me
<mjg> actually you are not
<mjg> this is -j 40
<mjg> 2 sockets
<hmmmm> didn't linux have this fixed several decades ago with RCU?
<mjg> yes and no
<heat> linux has several tens of times the manpower of freebsd
<hmmmm> yeah pretty much the reason why i couldn't stay using freebsd
<mjg> funny case lookup started to scale way prior of course
<mjg> but rest assured there was tons of really nasty locking in there for a long time past that
* mjg might have ran into it
<heat> what do you think of a per-filesystem rename lock?
<mjg> fwiw it was performing ok enough on meh hardware at the time
vdamewood has joined #osdev
<heat> like linux had
<mjg> rename is a "fuck this shit" problem and a rename lock is an ok way to acknowledge it
<heat> I know they do seqlock voodoo now
<mjg> fighting rename races is a losing battle
<mjg> so i support it 100%
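The "seqlock voodoo" heat mentions, in miniature: readers optimistically read and retry if the sequence counter changed underneath them, writers bump the counter to odd before and back to even after. This is the generic seqlock pattern, not Linux's `rename_lock` specifically.

```c
/* Generic seqlock pattern (illustrative, single value protected). */
#include <assert.h>
#include <stdatomic.h>

static _Atomic unsigned seq;     /* odd = writer in progress */
static int protected_value;

static unsigned read_begin(void) {
    unsigned s;
    while ((s = atomic_load(&seq)) & 1)
        ;                        /* writer active: wait for even */
    return s;
}

static int read_retry(unsigned s) {
    atomic_thread_fence(memory_order_acquire);
    return atomic_load(&seq) != s;   /* changed -> reread */
}

static void write_value(int v) {
    atomic_fetch_add(&seq, 1);   /* go odd: readers will retry */
    protected_value = v;
    atomic_fetch_add(&seq, 1);   /* back to even: stable */
}

static int read_value(void) {
    unsigned s;
    int v;
    do {
        s = read_begin();
        v = protected_value;
    } while (read_retry(s));
    return v;
}
```

Readers never write shared state, which is what makes this attractive for hot paths like path walks; the cost is that a racing rename forces a retry (or a fallback to the locked path) rather than blocking the reader.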
<mjg> heat: you may want to say hi to dh` on this one
<heat> hi dh`
<heat> how are you?
<heat> I have no context for this but I'll roll with it
<heat> I feel tempted to use the old vm allocation trick of having two trees, one with used regions and one with free regions, keyed by length
ThinkT510 has quit [Ping timeout: 250 seconds]
<heat> wait, no, how does this ever work? if you have two free regions with the same length
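One conventional answer to the question above: the free tree isn't keyed on length alone but on the (size, base) pair, so two free regions of the same length still compare as distinct keys, and best-fit becomes "smallest entry with size >= want". A sketch, with a sorted array standing in for the tree:

```c
/* Free-region ordering for a two-tree VA allocator: key on (size, base)
 * so same-sized regions stay unique. Sorted array stands in for the tree. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct free_region {
    uintptr_t base;
    size_t size;
};

/* Tree comparator: size first, base as the tie-breaker. */
static int region_cmp(const void *a, const void *b) {
    const struct free_region *x = a, *y = b;
    if (x->size != y->size)
        return x->size < y->size ? -1 : 1;
    if (x->base != y->base)
        return x->base < y->base ? -1 : 1;
    return 0;
}

/* Best fit: first region in sorted order whose size covers the request
 * (in a real tree this is one lower-bound descent, not a scan). */
static struct free_region *best_fit(struct free_region *r, size_t n, size_t want) {
    for (size_t i = 0; i < n; i++)
        if (r[i].size >= want)
            return &r[i];
    return NULL;
}
```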
ThinkT510 has joined #osdev
mavhq has quit [Ping timeout: 246 seconds]
zid has quit [Ping timeout: 265 seconds]
zid has joined #osdev
mavhq has joined #osdev
<heat> geist, really like how simple lk is, super easy to take nice bits from it!
<heat> binary_search_tree is pretty decent, it gets my LittleKernel(r) Integration Ready(tm) stamp of approval
<heat> only had to add a min function
<heat> could also be a bit nicer when it comes to callbacks, could just take callables and the functions could be templates, but those things are trivially addable
<heat> actually pretty decent is underrating it
<heat> it gets a proper Bri'ish Brilliant
<bslsk05> ​github.com: [PATCH] x86_64: prefetch the mmap_sem in the fault path · torvalds/linux@a9ba9a3 · GitHub
<heat> this is the kind of inquisitive prefetches I subscribe to
<heat> microbench slow? prefetch the lock!
saltd has quit [Read error: Connection reset by peer]
epony has quit [Quit: QUIT]
saltd has joined #osdev
<geist> heat: ah yeah that tree is nice. i need to use it for more stuff, only recently integrated it
<geist> wanted to write some more unit tests for it too
<geist> re: wavl trees vs rb trees, i honestly don't know. i didn't implement the wavl tree implementation in zircon
<geist> but it has some properties that are nice, but i forget off the top of my head
<geist> something about O(1) removals, or something, which is surprising, but i've been told it's the case
<geist> i haven't fully grokked how that's supposed to work on a balanced tree
heat has quit [Ping timeout: 260 seconds]
heat has joined #osdev
<heat> mjg, squeezed an extra 20k on open3 with my still-crappy slab allocator for struct files
wootehfoot has quit [Ping timeout: 265 seconds]
<mjg> heat: right on
<heat> I still have big contention in my locks as you can imagine, without the magazine
xenos1984 has quit [Read error: Connection reset by peer]
<heat> also I figured out why I never read bonwick's paper talking about the magazines last night
<heat> it's literally the same one as vmem
<heat> and there's big inefficiencies in my unlock routines for sleepable locks
<heat> as in, they always spin_lock(), try to wake up threads, spin_unlock
<heat> which is, as you would probably describe, "super pessimal"
<mjg> this is how you normally do it though
<mjg> well depends on other factors
<mjg> key point being actual sleeping should be rare
<mjg> :>
<mjg> oh i misread. if your actual unlock always starts with a spin lock, then ye
<mjg> that's turbo pessimal
<heat> no
<heat> well, in mutexes, more or less so
<heat> but for instance
<heat> if (__atomic_sub_fetch(&lock->lock, 1, __ATOMIC_RELEASE) == 0) rw_lock_wake_up_thread(lock);
<heat> rw_lock_wake_up_thread ends up taking a spinlock
<mjg> that's ok
<geist> ah yeah, usually the trick is to encode in the atomic if something is waiting
<geist> so you can avoid that trip in the optimal case
<heat> yea
<geist> it's not too difficult, worth a quick couple hours to bash it together and do it
<geist> i need to go back and retrofit the LK mutexes actually. did it on zircon, but the lk mutexes are still spinlocky
<mjg> 's what solaris is doing.... :-P
<mjg> and really everyone
<mjg> one note is to align your threads to 16 or so so you have plenty of bits to spare
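The trick geist describes (encoding "someone is waiting" in the lock word so the unlock fast path can skip the wakeup spinlock entirely) looks roughly like this. The `LOCKED`/`WAITERS` names and the bare counter standing in for the real wakeup path are illustrative, not from LK or zircon.

```c
/* Sketch: waiters bit in the lock word so uncontended unlock is one CAS. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define LOCKED  1u
#define WAITERS 2u

static _Atomic unsigned lock_word;
static int wakeups;              /* stands in for the slow wakeup path */

static bool mutex_trylock(void) {
    unsigned expected = 0;
    return atomic_compare_exchange_strong(&lock_word, &expected, LOCKED);
}

static void mutex_unlock(void) {
    unsigned expected = LOCKED;
    /* Fast path: no waiters recorded, one CAS and we are done --
     * no spinlock, no wakeup attempt. */
    if (atomic_compare_exchange_strong(&lock_word, &expected, 0))
        return;
    /* Slow path: the waiters bit was set by a blocking acquirer. */
    atomic_store(&lock_word, 0);
    wakeups++;                   /* real code: spin_lock, wake, spin_unlock */
}
```

A blocking acquirer sets `WAITERS` before sleeping, so the unlock side only pays for the spinlocked wakeup when there is actually someone to wake, which fixes exactly the "unlock always starts with a spin lock" pattern called turbo pessimal above.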
<geist> debating if i was going to keep both paths, since on a UP machine, or an embedded one, just entering a critical section and then doing the mutex stuff is probably faster
<mjg> if you really care about UP you can hotpatch yourself at boot time
<geist> especially on machines where an atomic is really just a critical section
<geist> no, i can't. not on those kinda machines
<geist> since they'd literally be running in ROM
<mjg> oh that
<geist> but i can #ifdef it. that's what i really mean, do i want to ifdef the older, UP version, per arch
<mjg> i was thinking something which can happen to boot to either
<geist> or just toss it and move to an atomic based solution
<mjg> you find out there is only 1 cpu, there are savings to be made
<heat> > lk mutexes are still spinlocky
<heat> isn't that the point?
<heat> mjg
<geist> hmm?
<heat> mjg says you should heavily spin for all kinds of sleepable locks
<heat> and that they "make or break performance"
<mjg> yep
<heat> i don't know if thats the kind of "spinlocky" you're talking about
<geist> sure you can do that too but that's a level 2 optimization, and highly centric on the workload, etc
<geist> at some point you have to start tailoring what you're doing to the workload. for highly contentious, short mutexes on a SMP machine it may make sense to spin for a period of time before giving up and blocking
<geist> we do that now in the zircon mutexes too, helps, but only for certain kinds of locks
xenos1984 has joined #osdev
k0valski1889 has joined #osdev
<mjg> i'm yet to see a real-world case which ultimately loses from it. i also note even linux tries to spin on all of its locks
<mjg> including semaphores
<mjg> the do or don't factor is whether the lock owner is still running
<mjg> well there is one degenerate pattern which does lose, but the answer to that is "don't employ the degenerate pattern"
<mjg> and that's multiple cpus taking the same locks one after another
<mjg> each cpu: for (i = 0; i < n; i++) { lock(obj[i]); ... ; unlock(obj[i]); }
isaacwoods has joined #osdev
nyah has quit [Quit: leaving]
<mjg> geist: do i read it right zircon mutex code does not track the owner?
<mjg> // Stop spinning if it looks like we might be running on the same CPU which
<mjg> // was assigned to the owner of the mutex.
<geist> it should
<mjg> what's up with this bit then
<geist> seems fairly self explanatory
<mjg> well let me restate
<mjg> can you reliably check from that code if the lock owner is running?
nyah has joined #osdev
<mjg> i don't see this bit but maybe i'm blind here
<geist> possibly not. i dont have it in front of me
<geist> it tracks who the owner is, but i dont know if it goes back and marks it as preempted
<geist> if the owner gets preempted, etc
<mjg> ye no
<mjg> // It looks like the mutex is most likely contested (at least, it was when we
<mjg> // just checked). Enter the adaptive mutex spin phase, where we spin on the
<mjg> // mutex hoping that the thread which owns the mutex is running on a different
<mjg> // CPU, and will release the mutex shortly.
<mjg> if this is the state you are dealing with, speculatively spinning, then i'm not surprised you see wins if you decide to stop
<mjg> this needs to be patched to check the owner
<mjg> and not guess
* geist nods
<mjg> then you will see consistent wins
<geist> there's reasons it's not easy to do that
<mjg> i presume this was not done because you have no means to safely access owner's struct
<geist> correct
<mjg> i got ya covered without rcu
<mjg> :)
<geist> there's work underfoot to restructure all of that, which will unlock the ability to do that
<mjg> one way, which i don't recommend, is what freebsd did: threads never get actually freed
<mjg> solaris added a special barrier: should you end up releasing pages backing threads, you issue a special per-cpu barrier
<mjg> to make sure whoever is spinning buggers off
* geist nods
<mjg> i think it's an ok approach
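The adaptive spin mjg is arguing for reduces to one check: keep spinning only while the lock owner is observably on a CPU, otherwise go to sleep. A minimal sketch; `owner_running` and the `on_cpu` flag are stand-ins for whatever scheduler hook a real kernel provides, and safely dereferencing the owner assumes one of the schemes discussed above (never-freed thread memory, SMR, or a barrier on reclaim).

```c
/* Sketch: spin only while the mutex owner is running on another CPU. */
#include <assert.h>
#include <stdbool.h>

struct thread {
    bool on_cpu;                 /* maintained by the scheduler */
};

struct mutex {
    struct thread *owner;        /* NULL when unlocked */
};

static bool owner_running(struct mutex *m) {
    struct thread *o = __atomic_load_n(&m->owner, __ATOMIC_ACQUIRE);
    /* Deref is only safe if thread memory cannot vanish under us. */
    return o && o->on_cpu;
}

/* Contended-acquire policy: true = keep spinning, false = block. */
static bool should_spin(struct mutex *m, int spins, int max_spins) {
    return spins < max_spins && owner_running(m);
}
```

The win over speculative spinning is that an owner which got preempted (or went to sleep itself) immediately stops everyone else from burning cycles, instead of each spinner waiting out an arbitrary timeout.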
<heat> swear to god, unixes never freed shit, did they?
<geist> none of this code i 'own' anymore, so i can send it to the correct folks
<mjg> heat: normally they did not
<geist> but they'll nod at me and say 'yes we know, this is something we want to get to eventually'
<mjg> well i'm negatively surprised here
<mjg> a lot of machinery to timeout a spin and whatnot
<geist> aaaaaand once again this is why i dont post links to fuchsia code here anymore
<geist> because then it just turns into a mjg shit on it fest
<mjg> welp
<geist> and frankly i dont want to deal with that right now.
<mjg> np
<mjg> anyway my recommendation to heat is to do an adaptive spin
<heat> :(
<heat> no negativity guys
<mjg> which is what everyone(tm) is doing
<mjg> heat: i would say show your unix roots and make the thread-backing slab never free the pages
<mjg> for the time being
<mjg> no point adding complexity to handle it at this stage
<heat> you realize I just look at the thread and yolo right?
<mjg> well yolo is not even looking
<mjg> which i'm not going to say what system is apparently doing
<heat> yo
<heat> chill with the hostilities
<heat> banter is opt-in, not opt-out
<mjg> for real though, all other things equal, you would beat openbsd just by not having pessimal locking behavior
<mjg> which they do have a lot
xenos1984 has quit [Ping timeout: 250 seconds]
xenos1984 has joined #osdev
<mjg> [unless they noticed it's a huge deal now that they got flamegraphs a little while back]
frkzoid has joined #osdev
<mjg> heat: fun fact, until 2005 or so solaris would walk all per-cpu state to check who is running there
<mjg> heat: instead of derefing threads
<heat> im not particularly interested in un-pessimizing locks since, erm, it doesn't matter much
<heat> if I have bad IO performance or poor VM code some nanoseconds shaved off the mutex_lock won't matter
<geist> right. optimizing without real load is generally a way to get side tracked and not get anything done
<geist> and honestly again, i'd rather us not get into a This Vs That style discourse here
<geist> i'd like folks to remember this is about everyone developing their own os, learning as they go, etc
<geist> lets try not to throw up seemingly artificial barriers that we have to compete with other things, etc
<geist> that might scare folks off
freakazoid332 has quit [Ping timeout: 264 seconds]
<mjg> i would like to note that locks were top of the profile in onyx
* geist gives up
<heat> mjg, they were top of the profile because I have big locks in a few places
<heat> don't forget we're looking at 4 threads, not 40
<mjg> well i'm not gonna flame about it today
<mjg> i would say an osdev article about realities of locking would be nice
<mjg> meanwhile if you find time you can read the mcs paper to get a general idea
<heat> <p>very herd</p>
<heat> post it on the wiki
<mjg> you should keep up the perf work, just ping me when you beat openbsd in a fair bench
<mjg> with a flamegraph
<mjg> :)
<mjg> and if possible find out how to grab one for their kernel
<heat> you know
<mjg> it is plausible you are going to need to recompile it, which may be too much hassle
<heat> I've been compiling with ubsan all this time
<mjg> oh?
<heat> yeah
<heat> it may already be faster
<mjg> well i do note obsd is partially intentionally self-shafted single-threaded
<mjg> due to mitigations to stuff
<mjg> which you don't have
<mjg> (other self-shafting is them just being slow though)
<mjg> i tried hotpatching the kernel once to not employ meltdown mitigations et al
<mjg> but then it crashed
<mjg> :)
<mjg> i don't think you can disable retpoline stuff either and i don't think you are going to add it for yourself
<mjg> so tell you what, get the fastest multithreaded result you can
<mjg> and if that beats openbsd we will think
<mjg> sounds good?
<mjg> being able to run on more threads would be a great nullifier for that factor
<heat> ez
xenos1984 has quit [Ping timeout: 264 seconds]
<mjg> how much was open?
<heat> 200k
<mjg> that's 2 mln vs 0.2 mln?
<heat> no, it's 201k vs 200k
<heat> i win
<heat> ggez
<mjg> i see 7 digits
<mjg> can you boot up and test open again
<heat> shoot
<mjg> look man 2 mln at -j 4 is pretty pathetic still, so there is that
<mjg> :)
<heat> oh fuck off
<heat> it's decent
<mjg> oh wait that *same* file?
<mjg> ye that's ok then
<heat> yes open same file
<mjg> can you open2 instead please
<heat> sure
<heat> slightly slower for me
<heat> around 1.9M
<mjg> and vs open?
<mjg> bsd
<heat> let me get a flamegraph first
<heat> i'm curious
<mjg> +1
<heat> oh I think I see why, I'm always hitting the /tmp lock
<heat> instead of a hashtable
<mjg> but that bit should be read-locked
<heat> but it's a linked list
<mjg> i'm going to tell you something later which will make your day
<heat> ugh. it's impossible to see through this
<heat> noubsan just makes it inline more
<mjg> do you have an explicit inline keyword spread?
<mjg> there was a hack in the bsd land to use __inline instead
<mjg> which you can redefine as needed
<heat> doesn't matter
<heat> gcc just inlines
<mjg> i would say -O0 just to get the graph is not a tragic option
<heat> that + C++ functions in classes on headers get an implicit inline (ish, I don't know what the std calls it)
<mjg> not the best either, but there is a sacrifice, innit
dude12312414 has quit [Remote host closed the connection]
xenos1984 has joined #osdev
dude12312414 has joined #osdev
<heat> most hit instruction in path opening is the list iteration
<heat> in the*
<heat> I guess it's literally just caching issues
<mjg> can you make sure openbsd is doing 200k
<mjg> and not 2 mln
<heat> actually wtf
<heat> it's doing 200k
<zid> what was that old kernel you guys were talking about RECENTLY, SHIT
<heat> old kernel and shit? probably freebsd
<mjg> sounds about right
<zid> *bashes his keyboard a bit*
<zid> I think it was some old microkernel that has been ported forwards a bunch of times
<heat> mach
<zid> no you were talking about it recently, had like 10 names for various incarnations
<zid> like.. a week ago
<heat> unix?
<heat> sorry, microkernel
<mjg> windows nt
<zid> if it were one anyone had heard of
<zid> I wouldn't have to grep logs for words I think might have been in the conversation
<heat> mjg, soooo I've been relying on their negative cache working right. this suggests it's either not working right or they really are bad
<zid> then give up and ask
<mjg> heat: EPARSE
<heat> where's the syntax error
<mjg> undefined variable "their"
<heat> open
<mjg> you ported their negative entry caching?
<heat> no
<heat> wtf
<mjg> how have you been relying on it
<mjg> where
<heat> I'm relying on in wrt the tests having any meaning
<mjg> this particular bnech is not running into negative entries
<mjg> open3
<heat> right
<heat> the path walk should be done entirely in-memory right?
<mjg> if anything, their existence can make things slower for open, since they may need to traverse them as part of RB
<mjg> yes
<heat> ok so they do have meaning
<mjg> if there are negative entries in the relevant rb tree AND are in the way to the target vnode, then yes
<heat> I'm still using tmpfs which is literally just implemented linux-like
<rb> tfw no wb trees
<zid> µITRON was my final answer
<heat> yeah i'm 10x faster than them wtf
<heat> ???
<mjg> that you can check with a flamegraph
<mjg> i find it very plausible they are still giant locking vfs entry points
<heat> so erm
<heat> either I misread at the beginning and I was already quite a bit faster than them, or I accidentally 10x'd my perf in the meanwhile
<heat> I'm willing to bet on the first
<mjg> you should be able to trivially test that though?
<heat> ugh
<heat> the gang checks out an old commit and rebuilds
<mjg> once you get to test make sure to run open1
<heat> on the old one?
<mjg> apart from open3
frkzoid has quit [Write error: Connection reset by peer]
<mjg> ye
<heat> ok mom
frkzoid has joined #osdev
<mjg> how long do you take to build anyway?
<heat> 3 minutes for the base system
<heat> more or less
<heat> but then I need to make a disk image, etc which takes a bit more
<heat> and install the development packages
<mjg> i would say snapshot the old build for future comparisons
<heat> (a run-once thing if I accidentally delete the sysroot)
<heat> OH
<heat> hahahaha
<heat> I was already almost 10x faster than openbsd
<heat> LMAO
<mjg> :)
<heat> this is fucking hilarious
<mjg> what about open1
<heat> around 1.6M as well
<mjg> give me a minute
dasabhi has joined #osdev
<mjg> do you have stat4 in there?
<heat> no
<mjg> can you easily compile it now?
<heat> yes-ish?
<bslsk05> ​dpaste.com <no title>
<mjg> that open1 was no -j or -j 4?
<mjg> would be nice if you ran both
<heat> -t 4
<mjg> ye ye
<mjg> add -t 1 variant please
<heat> open1 does around 1M -t 1
<mjg> ok
<heat> so, it's not super scalable right now
<heat> but, /decent/?
<gog> hi
<heat> hello goggers
heat has quit [Read error: Connection reset by peer]
heat has joined #osdev
<heat> mjg, 1.1-1.2M on stat4
<heat> around 600K on stat4 T=1
<gog> poggers
<gog> i'm making pasta
<heat> highly poggers
<mjg> heat: ok. note this is for separate terminal files
<mjg> with R C^W^Wforbidden tech this can just straight up scale
<heat> I don't even lock inodes when stating
<heat> I can't *find the actual stat* in the flamegraph
<mjg> lol
<heat> ah sorry
<heat> stat_vfs matches 0.1
<heat> %
<heat> dentry_open_cached is still the bottleneck
<mjg> ok
<mjg> so how does this compare to top of the tree?
<heat> testing now
aleamb has joined #osdev
<heat> 1.4-1.5M
<mjg> can you prefix results with -t
kpel has joined #osdev
<heat> -t4 1.4-1.5M -t1 670K
<mjg> so still better
<mjg> right on
<heat> you know something super funny
<heat> it was trivial to OOM my system back then
<heat> since these objects were on the malloc brk thing
<heat> this is still pre-slab-everything btw
<heat> mjg, time in open_cached doubles when -t 4
<heat> what does this mean magic performance man
<mjg> watcha doing in open cached
<mjg> is that the final step?
<mjg> im guessing you are doing shit atomics in there
<heat> not much, u
<heat> ?
<heat> nope
<mjg> lemme find it
<mjg> have you pushed everything?
<bslsk05> ​github.com: Onyx/dentry.cpp at master · heatd/Onyx · GitHub
<heat> yes
<mjg> dentry_get(d);
<mjg> __atomic_add_fetch(&d->d_ref, 1, __ATOMIC_ACQUIRE);
<mjg> as i said
<heat> shit atomics?
<mjg> well ok, it is "just" atomics :)
<mjg> i presume that's for the open benches
<heat> well I can't avoid that right
<heat> stat4
<heat> my stat4 is naive for now
<heat> it's just an open + stat + close
<heat> s/stat4/stat
<mjg> stat4 is stat(2), no open or cloes past initial creation
<heat> yes
<heat> i'm talking about sys_stat
<mjg> assuming you can't do R C....
<Griwes> you mean s/you/U/
<mjg> first, the perf loss is from multiple cpus pounding on the same line
<mjg> the loss will always be there if you have to do it
<mjg> but there is a hack
<mjg> in principle you can leapfrog all the way to the end using rw locks without refing/unrefing in the meantime
<heat> oh daddy pound my cache line 🤤
<mjg> this will keep ping ponging lines for rw locks, so it still wont scale, but some of the overhead will disappear
<mjg> so it will overall be better
* Griwes bonks heat
<heat> Griwes, let me guess, horny jail?
<Griwes> indeed
<zid> mjg: Where should I get all the SCALE I need to add to my code from? Kettle? Bathtub?
<zid> Is there a specific mineral makeup that works best?
<mjg> zid: solaris fanboy tears work great when you run it on more than 8 cores
<mjg> oh no, did i shit on that system again
<heat> yo
<zid> tears don't leave limescale, just salt crystals
<heat> have you ran this on windows?
<heat> we're missing a bit of a crucial data point
<heat> even though they do all that flushing fuckery in CloseFile
<mjg> albeit i would be curious
<heat> i think one of the main takeaways you would get is that Windows is not UNIX
<mjg> not bieng unix does not inherently mean path ops suck
<mjg> ...which i hear they do on windows
<bslsk05> ​'"NTFS really isn't that bad" - Robert Collins (LCA 2020)' by linux.conf.au (00:48:04)
<mjg> ye ye
<mjg> also note solaris is a unix, even a Real One Motherfucker
<mjg> ... and they do suck there
frkzoid has quit [Ping timeout: 244 seconds]
<mjg> actually would be funny if windows outperformed illumos on this one
<heat> it won't
<heat> please click the link and click play
<mjg> that's too bad
<mjg> i have seen a slight bit of that video, will have to watch tomorrow as i'm going afk soon(tm)
<heat> closing files in windows does all sorts of crazy things
<heat> like writeback!
<mjg> wait
<mjg> every time?
<heat> yes
<mjg> even if there is someone which still has it open?
<heat> they like their data safety, I can't blame them
<heat> I believe so
<mjg> then it's not a fair test to begin with
<heat> <heat> i think one of the main takeaways you would get is that Windows is not UNIX
<mjg> maybe a read-only open would be better
<mjg> fucking meltdown et al, you can't easily compare anything now
<heat> sure you can
<heat> real workloads
<mjg> real workloads like to have many moving parts
<mjg> which make it quite problematic to do a fair test
<mjg> unless you do something like tar up linux userspace and literally run it under freebsd
<heat> getting GCC + GNU make and compiling something is fair-ish
<heat> the only moving part there is libc.so
<mjg> not true, paths tend to differ
<mjg> most notably they will be longer on the bsds
<mjg> i was evaluating build.sh from netbsd
<mjg> since it cross compiles from linux and other systems
<mjg> but it turns out to do different stuff here and there
freakazoid332 has joined #osdev
sinity has joined #osdev
dude12312414 has quit [Remote host closed the connection]
CryptoDavid has joined #osdev
<saltd> re
dude12312414 has joined #osdev
<heat> re
GeDaMo has quit [Quit: Physics -> Chemistry -> Biology -> Intelligence -> ???]
* saltd looks for someones used clothes
<zid> pervert alert
kpel has quit [Quit: .]
srjek has quit [Read error: Connection reset by peer]
<gog> you rang
<zid> no I was just alerting you to my presense
<gog> oh
<heat> mjg, looks like I r e g r e s s e d when moving to slab-backed malloc
<heat> what if this is a TLB thing?
<heat> since each object is spread out more
<mjg> don't speculate
<mjg> i would start with narrowing down easiest case which shows the problem
<mjg> so for one -t 1
<heat> -t 1 is faster
<mjg> wait
<mjg> you still don't do per-cpu caching
<mjg> do you
<heat> no
<mjg> well there it is
<heat> but it makes no sense that I bled 100k out of nowhere
<heat> unless it really is TLB
<mjg> you get yourself differential flamegraphs
<heat> if I enable direct map slabs for lower sizes it seems to help out a little
<mjg> you definitely want to validate there is indeed a difference
<mjg> if there is a speed up single threaded i doubt it is tlb
<mjg> rather this looks like the typical case of pesky atomics
<heat> i don't have more atomics
<mjg> get fgs man
<heat> in fact, they had a good bit of atomics in their code for the malloc-y stuff
<mjg> where is the slab
<heat> the allocator?
<mjg> origin/slab
<heat> wait, not updated
<mjg> still what's the fg for the slower case
<heat> it looks similar to what I had
<heat> 20% spinnies
<mjg> show it to me
<mjg> i can't help but note you got other stuff apart from slab in that branch
<heat> yes
<mjg> sounds like you may be in position to bisect?
<heat> no
<heat> why would I wtf
<mjg> ok so where is the fg
<bslsk05> ​gist.github.com: onyx-open3-slab.svg · GitHub
<mjg> well you keep contending on alloc and free
<heat> yes
<mjg> just get that per-cpu caching
<heat> the thing is
<heat> why?
<heat> why did I lose that 5%
<heat> i dont understandadad
<heat> fuck it
<heat> we're going to 2001 memory allocation now
<mjg> you want to know why get a differential flamegraph
<mjg> here is a likely scenario, and it comes down to locks again
<mjg> your locks are bad and degrade under contention more than they should
<mjg> meaning if some part of the kernel got faster, there is more contention elsewhere, and that can make it slower in total
<mjg> should you get a diff flamegraph you can compare how much time is spent elsewhere
<mjg> but rly just add per-cpu caching kthx
<mjg> want to see this phenomenon in action, run will-it-scale on your linux host
<mjg> dup1_threads -n -t $(ncpus)
<mjg> once you get a stable number
<mjg> getppid1_processes -n -t 2
<mjg> then go to 3 etc, keep bumping
<mjg> and you will see dup1 perf going up
<mjg> how many threads you got there
<mjg> don't tell me it's 4
<mjg> :>
<heat> 8 threads 4 cores
<heat> I don't see what you mean
<heat> ah wait
<heat> yea I still don't see wtf you're on about
<heat> you're on the lsd too
<heat> mjg, can you take a quick peek at size_to_order?
<heat> I don't like it
<bslsk05> ​gist.github.com: gist:ce8bbf25d2ecb7d7998c65febf17e82b · GitHub
<heat> codegen looks a bit ass ngl
<mjg> i'm saying the lower quality of locks, the more they degrade when faced with contention
CryptoDavid has quit [Quit: Connection closed for inactivity]
<mjg> if you lessen a bottleneck somewhere, but your workload is contended, you add to said contention
<mjg> and very well may suffer a performance loss
<mjg> just get that per-cpu stuff sorted out kthx
<mjg> there are many moving parts here, but when turbo bottlenecked, like you are right now, perf may do anything from going down, to not changing, to going up a little
<heat> are you personally invested in this
<mjg> no
<mjg> it's not my code
<heat> you're lyin
<heat> you wanted me to beat openbsd
<mjg> you already beat openbsd so at this point it's whatever
<mjg> ye that i cared a little about
<mjg> :)
<mjg> thought it would be funny, and it is
<heat> lmao
<heat> what's next, dragonfly or net?
<mjg> loller
<mjg> lemme find it
<mjg> that's 2 years ago and with 1 or 24 threads
<mjg> freebsd is doing a lot of these faster than it used to
<mjg> while i suspect the rest did not improve
<mjg> wait are you legit beating openbsd with vfsmix?
<mjg> i can't stress *legit* enough
<heat> no
<mjg> welp
<heat> I was going to work on writeback remember? before we took the longest bikeshedding session ever
<heat> it may very well be fast, I don't know
<mjg> well man you do you
<mjg> the fact that they are most likely still giant locked around vfs kind of kills the fun here
<mjg> i would be willing to tackle it, but this is still so far back, i'm not getting anywhere near
<heat> does it kill the fun?
<mjg> would be willing if they just had bad smp, not de facto no smp for this one
<heat> I quite like it
<mjg> i had my laugh that they are slower now
<mjg> would get extra if i sped it up
<heat> i've been doing this for 7 years and I already beat openbsd
<mjg> and you had to beat it again
<heat> dude, 20% of my cputime is just locks
<heat> they have percpu shit
<mjg> it's more in their case
<mjg> they have shit, correct
<mjg> wanna talk smack here, find out how to get a flamegraph from their kernel
<mjg> and get one
<heat> >using openbsd
<heat> hard skip
<mjg> you already said something like that last time
<heat> i know
<heat> I used openbsd enough to figure out how to install packages and use the shell
<heat> that's where it stops
<mjg> well i'm defo not getting one
<mjg> a fun fact is that trying to beat illumos would be a problem in that their vm guest performance is atrocious
<mjg> due to frequent rdtscp all around the kernel
<mjg> you basically would need to run it on bare metal somewhere
<mjg> for anything even remotely fair
<heat> i would never haha
<mjg> bonwick revokes your right to use slab
* heat hides the two rdtscs when entering and exiting the kernel
<mjg> they literally have one on enter and one on exit
<mjg> for "microstate accounting", which you can't disable
<heat> i do that too
<mjg> there was a switch for it but they made it into a nop
<mjg> wtf for
<heat> for, erm, cpu times
<heat> ^^
<heat> listen sweetheart, I even have an atomic add right there
<heat> that one should probably be refactored out
<heat> but I don't see how I would be able to get cpu system/user times otherwise
<mjg> for what
<mjg> normally kernels are pretty sloppy concerning precision of these
<mjg> you get a tick, you check if you interrupted kernel or userspace and that's where you add the supposed time spent
<mjg> maybe linux is doing something better now, but even then there is no way they suffer rdtscp on each entry/exit
<heat> but "getting a tick" is not a valid way to measure time
<mjg> it is how it used to be done
<heat> particularly if you're dynticks or noticks
<mjg> nobody claims it is perfect
<mjg> sure
<mjg> now i'm curious what linux is doing
<mjg> > To implement it, this new cputime infrastructure plugs into
<mjg> kernel/user/guest boundaries to take snapshots of cputime and
<mjg> flush these to the stats when needed.
<mjg> wut
<heat> see
<mjg> wtf
<heat> i'm not stupid
<mjg> is this optional
<heat> neither is bryan cantrill
<heat> no
<heat> guess what sched_clock is
<bslsk05> ​elixir.bootlin.com: tsc.c - arch/x86/kernel/tsc.c - Linux source code (v5.19.12) - Bootlin
<mjg> well the use is conditioned on CONFIG_VIRT_CPU_ACCOUNTING_GEN
<mjg> aka optional
<mjg> i see
<mjg> /boot/config-5.15.0-47-generic:CONFIG_TICK_CPU_ACCOUNTING=y
<mjg> so i'm pretty sure the default, as shipped with ubuntu, reacts to ticks
<mjg> i know for a fact illumos likes to call rdtscp so much it absolutely floors performance
<mjg> they have an anti-idiom where they use 'gethrestime' (or so) for just about anything
<mjg> timestamps more precise than they need or random numbers
<mjg> if anything i know linux to intentionally play sloppy with timestamp precision
<mjg> presumably to save on perf
<mjg> see e.g., timestamp updates on pipe writes
<heat> why do you need timestamps on those?
<mjg> 1 ms precision so that they don't have to rdtscp
<mjg> fuck if i know, it is a thing
<heat> that reminds me
<mjg> pipe: reduce atime precision
<mjg> Sample result from will-it-scale pipe1_processes -t 1 (ops/s):
<mjg> before: 426464
<mjg> after: 3247421
<mjg> freebsd doing the same thing
<heat> i should redo my scheduler for tickless
<heat> ticking is bad
<heat> power efficiency is good
<heat> upvotes to the left
<mjg> with your perf level
<mjg> tickless should not be a concern for power usage
<heat> most of the machine is idle
<heat> usually
elastic_dog is now known as Guest8653
Guest8653 has quit [Killed (tungsten.libera.chat (Nickname regained by services))]
elastic_dog has joined #osdev
<saltd> Day changed to Sunday, 2. Oct 2022
<heat> not true
<zid> [23:29] <saltd> Day changed to Sunday, 2. Oct 2022
<zid> That's some incredible clockmanship
<heat> incredible very much in the sense that you can't believe it
* saltd calculating answer ...
* saltd calculating answer ... 100%
<gog> hi
<jimbzy> yo
<saltd> re
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
dasabhi has quit [Ping timeout: 265 seconds]
elastic_dog has quit [Killed (molybdenum.libera.chat (Nickname regained by services))]
elastic_dog has joined #osdev
MiningMarsh has quit [Read error: Connection reset by peer]
MiningMarsh has joined #osdev
gog has quit [Ping timeout: 260 seconds]