klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
<heat> ugh sorry
<heat> they do
<heat> and handle it exactly how I said
<heat> go down the page tables and get the vm_page_t
pretty_dumm_guy has quit [Ping timeout: 268 seconds]
h4zel has joined #osdev
h4zel has quit [Ping timeout: 252 seconds]
genpaku has quit [Remote host closed the connection]
genpaku has joined #osdev
archenoth has quit [Quit: Leaving]
frkzoid has joined #osdev
frkzoid is now known as freakazoid333
SGautam has joined #osdev
MiningMarsh has quit [Quit: ZNC 1.8.2 - https://znc.in]
sympt has quit [Read error: Connection reset by peer]
sympt has joined #osdev
MiningMarsh has joined #osdev
frkzoid has joined #osdev
freakazoid333 has quit [Ping timeout: 244 seconds]
gildasio has quit [Ping timeout: 258 seconds]
gildasio has joined #osdev
heat has quit [Ping timeout: 260 seconds]
[itchyjunk] has quit [Remote host closed the connection]
frkzoid has quit [Ping timeout: 244 seconds]
freakazoid332 has joined #osdev
freakazoid332 is now known as freakazoid333
h4zel has joined #osdev
darkstardevx has quit [Ping timeout: 265 seconds]
darkstardevx has joined #osdev
scoobydoo has quit [Ping timeout: 244 seconds]
darkstardevx has quit [Remote host closed the connection]
scoobydoo has joined #osdev
darkstardevx has joined #osdev
h4zel has quit [Quit: WeeChat 3.0.1]
elderK has quit [Quit: Connection closed for inactivity]
SGautam has quit [Quit: Connection closed for inactivity]
GeDaMo has joined #osdev
<nick64> What is the difference between IOPL and CPL? Or rather, when does IOPL change?
<Mutabah> A quick google (because I've not worked with it a lot) says that IOPL is in EFLAGS (CPL is in CS)
<Mutabah> IOPL controls the port IO instructions afaik
<nick64> I was more interested in how/when it changes
<Mutabah> Well, since it's in EFLAGS, you can change it whenever
<Mutabah> but I think you'd usually leave it at zero?
<nick64> I was not asking how to change it, I was asking when does it change usually
<Mutabah> (Which, afaik, won't allow anything other than CPL=0 to do in/out instructions)
<Mutabah> "when" - Whenever there's a popf/iret
<bslsk05> ​IRCCloud pastebin | Raw link: https://irccloud.com/pastebin/raw/ym19XacA
<nick64> A few examples of when CPL changes: 1. When you make a syscall, 2. When a syscall returns. Similarly, when does IOPL change?
<Mutabah> It _can_ change on any of those - anything that can load EFLAGS could change it
<Mutabah> (although, I suspect that if the modifying code is not CPL=0, it'll GPF if the bits change)
<nick64> I don't think I was able to convey what I am trying to ask here
scaleww has joined #osdev
<nick64> What is a functional use case where it changes
<Mutabah> The simple answer is "whenever you [the kernel developer] want to change it"
<froggey> are you asking when you might want to change it?
<nick64> froggey: exactly
<nick64> What is an example scenario where it changes in the real world
<Mutabah> "basically never" is my impression
<froggey> never
<froggey> right
<nick64> Why is it a thing then to be able to change it?
krychu has quit [Quit: ZNC 1.8.2+deb1+bionic2 - https://znc.in]
<Mutabah> There's a lot of legacy stuff in x86
<froggey> linux had an iopl function for changing it per-process. xorg uses it to get access to hardware io ports
<Mutabah> that said - I think there's a syscall on linux that allows you to modify the IOPL such that userland (for a single process, or even thread) can use the IO instructions
<Mutabah> I don't know off the top of my head how it interacts with the IO permissions bitmap
<froggey> but because iopl=3 also enables use of the cli/sti/hlt instructions, the implementation has changed to use the tss io bitmap instead
<Mutabah> ^ that makes sense (that the IOPB passing is ORed with the IOPL passing)
<Mutabah> Much finer grained
<froggey> yeah
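A rough model of the check being described, for illustration only (names and layout are made up, not lifted from the SDM): the CPU grants port access outright when CPL <= IOPL, and only otherwise consults the TSS I/O permission bitmap, where a set bit denies that port.

    #include <stdbool.h>
    #include <stdint.h>

    /* sketch: a set bit in the TSS I/O permission bitmap denies that port */
    bool io_port_allowed(unsigned cpl, unsigned iopl, const uint8_t *iopb, uint16_t port)
    {
        if (cpl <= iopl)
            return true;                              /* e.g. ring 0, or IOPL raised to 3 */
        return (iopb[port / 8] & (1u << (port % 8))) == 0;
    }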
* nick64 is processing
<froggey> it's a weird legacy feature basically, like rings 1 & 2
<nick64> I see
<Mutabah> For reference - Intel's manuals have a decent description in the EFLAGS register section
<nick64> Yeah I was reading that earlier and couldn't wrap my head around why that is required
<nick64> Like, if there is a way for userland to request its IOPL to be downgraded to match the CPL, why have IOPL checks in the first place? If the kernel does some sort of privilege check (maybe UID of the user thread?) during the IOPL downgrade, then it can perhaps do the same check during the IO port access time itself, and not even have an IOPL maintained
<Mutabah> There's a lot in x86 that is left over from experimentation in the transition to 32-bit
<nick64> Yeah, when you mentioned it is only for legacy, I abandoned that train of thought
k8yun has quit [Quit: Leaving]
elastic_dog has quit [Ping timeout: 244 seconds]
elastic_dog has joined #osdev
roan has joined #osdev
elastic_dog has quit [Ping timeout: 260 seconds]
<nick64> Correcting a flaw in my thought there. It would be the CPU that does the access checks and not the kernel, and CPU is not UID aware
<Mutabah> Yep
<Mutabah> The CPU only knows about the special CPU registers. The kernel checks the UID/permissions before doing something in a syscall
<nick64> Upon further reading, looks like CAP_SYS_RAWIO is the deciding factor for the kernel
<Mutabah> so the `iopl` will check permissions before actually setting IOPL
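For concreteness, a small userspace sketch of what froggey and Mutabah describe, using glibc's <sys/io.h> wrappers on x86 Linux (needs root / CAP_SYS_RAWIO; the CMOS/RTC ports are just an arbitrary example):

    #include <stdio.h>
    #include <sys/io.h>

    int main(void)
    {
        /* ioperm() grants individual ports via the TSS I/O bitmap;
           iopl(3) would instead raise IOPL (and also allow cli/sti/hlt) */
        if (ioperm(0x70, 2, 1) < 0) {
            perror("ioperm");
            return 1;
        }
        outb(0x0a, 0x70);                       /* select RTC status register A */
        printf("RTC status A: %#x\n", inb(0x71));
        return 0;
    }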
epony has quit [Remote host closed the connection]
epony has joined #osdev
elastic_dog has joined #osdev
elderK has joined #osdev
zaquest has quit [Remote host closed the connection]
zaquest has joined #osdev
scaleww has quit [Quit: Leaving]
<ddevault> ✓ multithreading
<ddevault> complete userspace implementation clocks in at about 250 lines of code
gxt has quit [Ping timeout: 258 seconds]
freakazoid333 has quit [Ping timeout: 244 seconds]
gxt has joined #osdev
gxt has quit [Remote host closed the connection]
gxt has joined #osdev
frkzoid has joined #osdev
lkurusa has joined #osdev
elastic_dog has quit [Ping timeout: 246 seconds]
bauen1 has quit [Ping timeout: 244 seconds]
bauen1 has joined #osdev
frkzoid has quit [Ping timeout: 250 seconds]
elastic_dog has joined #osdev
elderK has quit [Quit: Connection closed for inactivity]
elastic_dog has quit [Quit: elastic_dog]
elastic_dog has joined #osdev
[itchyjunk] has joined #osdev
obertisa has joined #osdev
<bslsk05> ​endchan.net: /tech/ - pci_disable_device unknown parameter error
<obertisa> help me
<bslsk05> ​gitlab.com: ⁨𐰀𐰞𐰃:𐰺𐰃𐰔𐰀:𐰚𐰀𐰾𐰚𐰃𐰤 (𐰽𐰆𐰞𐰃𐰤𐰆𐰽) / device-blocker · GitLab⁩
obertisa has left #osdev [#osdev]
opal has quit [Ping timeout: 258 seconds]
bauen1 has quit [Ping timeout: 252 seconds]
bauen1 has joined #osdev
gxt has quit [Ping timeout: 258 seconds]
nyah has joined #osdev
opal has joined #osdev
gxt has joined #osdev
gildasio has quit [Remote host closed the connection]
gildasio has joined #osdev
gildasio has quit [Remote host closed the connection]
opal has quit [Remote host closed the connection]
gxt has quit [Remote host closed the connection]
opal has joined #osdev
gildasio has joined #osdev
gxt has joined #osdev
bauen1 has quit [Ping timeout: 250 seconds]
bauen1 has joined #osdev
gildasio has quit [Remote host closed the connection]
gildasio has joined #osdev
<pitust> ddevault: it depends on how you define "complete"
<ddevault> aye, does not include synchronization primitives
<ddevault> which I think belong in a separate module
<pitust> someone had bash running (with mlibc) in 1000 lines of code
<pitust> it also depends on how you measure "userspace"
<ddevault> judge for yourself
<bslsk05> ​git.sr.ht: ~sircmpwn/helios: vulcan/rt/tls.ha - sourcehut git
<bslsk05> ​git.sr.ht: ~sircmpwn/helios: vulcan/threads/ - sourcehut git
<ddevault> re: completeness: create, destroy, suspend, resume, detach, exit, join; plus TLS support
<ddevault> everything a growing young process needs
opal has quit [Remote host closed the connection]
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
opal has joined #osdev
Starfoxxes has quit [Ping timeout: 260 seconds]
Starfoxxes has joined #osdev
<pitust> can't you implement join/detach/resume/suspend with signals and shared memory, no extra stuff required
<ddevault> I essentially already do
<ddevault> or do you mean userspace signaling
<bslsk05> ​git.sr.ht: ~sircmpwn/helios: vulcan/threads/thread.ha - sourcehut git
<bslsk05> ​git.sr.ht: ~sircmpwn/helios: vulcan/threads/thread.ha - sourcehut git
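A minimal sketch of the point pitust is making, that join needs only a shared word plus some way to sleep on it; Linux futexes stand in here for whatever blocking primitive the kernel actually exposes (this is not the helios code):

    #include <limits.h>
    #include <linux/futex.h>
    #include <stdatomic.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    struct thread { atomic_int exited; };          /* 0 = running, 1 = done */

    void thread_exit_notify(struct thread *t)      /* called by the exiting thread */
    {
        atomic_store(&t->exited, 1);
        syscall(SYS_futex, &t->exited, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
    }

    void thread_join(struct thread *t)             /* called by the joiner */
    {
        while (atomic_load(&t->exited) == 0)
            syscall(SYS_futex, &t->exited, FUTEX_WAIT, 0, NULL, NULL, 0);
    }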
<ddevault> anyone have any experience reading power details (e.g. battery capacity) from ACPI?
<ddevault> ah, found the relevant parts of the spec
Starfoxxes has quit [Ping timeout: 265 seconds]
Starfoxxes has joined #osdev
gildasio has quit [Remote host closed the connection]
xenos1984 has quit [Read error: Connection reset by peer]
gildasio has joined #osdev
freakazoid332 has joined #osdev
terminalpusher has joined #osdev
puck has quit [Excess Flood]
puck has joined #osdev
xenos1984 has joined #osdev
puck has quit [Excess Flood]
puck has joined #osdev
heat has joined #osdev
<heat> all your operating systems are shit
<heat> mine is the best
<heat> and this is straight fax
<Ermine> no u
<heat> shut up poopy head
<zid> fax is dead, long live chain emails
<zid> if you don't forward this irc message to 20 people, you owe me $10
<heat> zid, send me your credit card info pls
<heat> it's for a survey
<heat> i give u 10 doller after yes?
gildasio has quit [Remote host closed the connection]
gildasio has joined #osdev
<zid> z*dsoft+p_ypal@gmail.com
<zid> cvv is 386
<zid> ccv?
<heat> I know you're lying to me
<heat> who's this z*d fella and why are you giving me his/her cc info
<zid> if the $10 ends up with the wrong person I will forgive you
<zid> I am very benevolent
<mats1> order some ribs
gxt has quit [Remote host closed the connection]
gxt has joined #osdev
xenos1984 has quit [Ping timeout: 250 seconds]
xenos1984 has joined #osdev
vin has left #osdev [WeeChat 2.8]
vin has joined #osdev
<heat> clang is flattening a 400 line slab allocator into like 5 functions
<heat> it's pretty funny
maxdev has joined #osdev
<heat> mjg, btw is vmem freebsd's page allocator? I couldn't tell from the book
<heat> it seemed like it allocated virtual memory but I got the impression page allocation was kind of attached to it
<heat> a really nice optimization I made on my slab allocator is that I scratched having a ctor because it's semi-useless
<heat> and most importantly, I can now fit bufctls in the unused buffers
<heat> and having read their solution to the "object to slab" problem, I much prefer going down 4 or 5 page tables
<heat> I'm willing to bet their stupid hash table is way slower (and probably requires locking)
<heat> i thought about it for a bit and realized I don't really even need locking since the mapping is live
<heat> (if it isn't and we got a bad pointer, it's a whole other question that I'm not ready to solve. maybe some sanity checking along the way would be a decent idea)
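A rough sketch of the lookup heat is describing; virt_to_phys(), vm_page_array and the slab back-pointer are made-up names here, and the point is that a live mapping lets you translate object to slab without a hash table or a lock:

    struct slab;
    struct vm_page { struct slab *slab; };             /* set when a slab takes the page */

    extern struct vm_page *vm_page_array;              /* hypothetical: one entry per physical frame */
    unsigned long virt_to_phys(const void *addr);      /* hypothetical: walks the live page tables */

    struct slab *obj_to_slab(const void *obj)
    {
        unsigned long pfn = virt_to_phys(obj) >> 12;   /* assuming 4 KiB pages */
        return vm_page_array[pfn].slab;
    }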
terminalpusher has quit [Remote host closed the connection]
xenos1984 has quit [Ping timeout: 250 seconds]
opal has quit [Remote host closed the connection]
opal has joined #osdev
kof123 has quit [Remote host closed the connection]
xenos1984 has joined #osdev
gxt has quit [Remote host closed the connection]
<mjg> heat: i don't see how not having a constructor optimizes anything
<mjg> heat: codepath at hand should be very rarely used
<mjg> heat: anyway vmem is sometimes used, but it is slow as fuck
<heat> you can't store your bufctl inside the object itself if you have a ctor initializing everything when allocating slabs
<mjg> heat: i don't know if that's an artifact of how it was implemented in freebsd
<mrvn> you always have to initialize objects. if the slab doesn't have a ctor then that just means you have to call the ctor by hand.
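What "fit the bufctls in the unused buffers" amounts to, sketched with made-up names: while an object sits on the freelist its own first word is the link, which is exactly what a ctor that expects the object to stay initialized would forbid.

    struct slab { void *freelist; };                   /* first free object, or NULL */

    static void slab_free_obj(struct slab *s, void *obj)
    {
        *(void **)obj = s->freelist;                   /* reuse the object's memory as the link */
        s->freelist = obj;
    }

    static void *slab_alloc_obj(struct slab *s)
    {
        void *obj = s->freelist;
        if (obj)
            s->freelist = *(void **)obj;
        return obj;
    }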
<moon-child> heat: every time you protect a hash table with a lock, cliff click sheds a tear
<heat> s/cliff click sheds a tear/mjg rants/
<heat> :P
<mjg> ... or if vmem is indeed that slow
<mjg> i tried to use it once and it was a disaster
gxt has joined #osdev
<heat> but *what is* vmem?
<heat> is it a vmalloc() kind of thing?
<mjg> no
<mjg> vmem promises fast range allocation
lkurusa has quit [Quit: I probably fell asleep (or went out). Who will ever know.]
<mjg> range of numbers from x to y, typically a virtual address
<mjg> but you can use it for other stuff, bonwick mentions just pid allocation
lkurusa has joined #osdev
<mjg> [which solaris does not use it for btw :>]
<mjg> core idea is the same as with per-cpu slabs
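Roughly the shape of the interface the vmem paper describes (simplified; not FreeBSD's or Solaris' actual signatures): an arena hands out integer ranges, and whether those integers are virtual addresses or PIDs is up to the caller.

    typedef struct vmem vmem_t;

    vmem_t *vmem_create(const char *name, unsigned long base, size_t size, size_t quantum);
    unsigned long vmem_alloc(vmem_t *arena, size_t size);            /* returns [addr, addr + size) */
    void vmem_free(vmem_t *arena, unsigned long addr, size_t size);

    /* e.g. a PID arena: vmem_create("pid", 1, 99998, 1), then vmem_alloc(pid_arena, 1)
       hands back the next free PID and vmem_free() returns it on process exit */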
<maxdev> i hate signals and i refuse to further deal with their implementation
<heat> ugh
<mrvn> maxdev: I've avoided them so far too
<heat> the fucking book is so vague when it comes to this
<mjg> maxdev: signals hate you
<mjg> heat: :))
<heat> fuck you mckusick
<mjg> i'll relay the message
<maxdev> i have them implemented, they cause more trouble than good, i'm removing them
<mjg> anyhow read the vmem paper by bonwick
<heat> i have not
<mjg> i noticed
<mjg> i'm sayin you should do it
<heat> probably
<mrvn> mjg: I have a pointer to the end of the last allocation and any allocation just checks the next N+1 pages are free.
<mjg> just remember to not buy into the hype :>
<heat> but I don't think I'll go for vmem in any case
<heat> or percpu vmem areas
<mrvn> (using the processes page table)
<heat> that sounds... not too useful
<mjg> heat: not saying you should, but it is one of the papers i consider i must read if you do osdev
<heat> interestin
<heat> thanks
<mjg> and it's not an endorsement of bonwick, just he happened to write about it :)
<heat> I also want to read uvm and ubc one of these days
<mjg> whatever stuff you read make sure to not buy into the success story
<heat> lmao
<heat> why, because it's netbsd?
<mjg> most papers claim whatever they describe is best shit ever
<heat> :P
<mjg> not talking about uvm specifically
<mjg> i read some vfs papers and rest assured, LOL does not begin to describe it
<mrvn> mjg: most publications don't accept failures.
<mjg> all while being presented like a turing award-worthy endeavor
<mrvn> "We tried this and see how bad this was." **rejected**
<heat> where are all the papers about freebsd and linux?
<heat> i have not seen one
<heat> at least for linux
<heat> these big classics are solaris and netbsd
wootehfoot has joined #osdev
lkurusa has quit [Ping timeout: 250 seconds]
<mjg> heat: just google around, i don't have anything handy
<mjg> there is defo a paper about vfs for freebsd speed up
<mjg> which basically replaces one WTF with another
<mjg> an excellent lollery material is the solaris internals book
<GeDaMo> Maybe something here? https://papers.freebsd.org/
<bslsk05> ​papers.freebsd.org: FreeBSD Presentations and Papers :: FreeBSD Presentations and Papers
<mjg> which goes like "we had X performance problem, here is why, and now we don't!"
<mjg> example: read-write locks with reader counter protected with a mutex
<mjg> this is such a no-no i can't fathom wtf thought this is a good idea
<mjg> they replaced it with something which is not as bad, but remains pessimal
gildasio has quit [Ping timeout: 258 seconds]
<mjg> they had a global lru in the name cache, with entries requeued on each hit, fix: remove the lru
gildasio has joined #osdev
<bslsk05> ​web.mit.edu: Design and Implementation of the Second Extended Filesystem
<mjg> GeDaMo: no, the real lollers are all from the 80s
<bslsk05> ​www.kernel.org: Proceedings of the Ottawa Linux Symposium
<heat> mjg, freebsd wasn't a thing in the 80s
<heat> so, see, not freebsd
<mjg> bsd was a thing in the 80s
<heat> it's CSRG's fault
<heat> IMO, blame mckusick again
<mjg> maybe for some counterbalance i'm gonna point out you can't really trust lkml either
<heat> oh sure
<mjg> for example there is an old e-mail by torvalds that spinlocks are trivial, 'cause spinlocks, and if you add complexity there gtfo
<mjg> then he had seen the light apparently as mcs spinlocks landed
<heat> well, it's torvalds
<heat> he's an expert in linux, not in any specific area
<mjg> i know people who treat his opinion as sacred
<heat> I much prefer linux's weird heuristics
<heat> like
<heat> linux giving up invlpg'ing if they need to invalidate more than exactly 33 pages
gildasio has quit [Remote host closed the connection]
gildasio has joined #osdev
<mjg> i don't know if heuristics is the right word for this one
<heat> it is
<mjg> anyway, a lot of these old geezers know one another from different projects and have lasting personal beefs
<heat> they apparently tested it
<mjg> so when they post a benchmark and some competition is doing worse in it, chances are it was intentionally gamed
<mjg> heat: i'm not saying this is a bad idea, just the word does not seem legit
<bslsk05> ​github.com: x86/mm: Set TLB flush tunable to sane value (33) · torvalds/linux@a510247 · GitHub
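The heuristic in that commit, sketched: up to a tunable number of pages each one gets an invlpg, past it a full flush wins (the tunable name matches the commit; everything else is simplified, and note a plain CR3 reload spares global pages).

    #define PAGE_SHIFT 12
    static unsigned long tlb_single_page_flush_ceiling = 33;   /* name as in the commit */

    static inline void invlpg(unsigned long addr)
    {
        asm volatile("invlpg (%0)" :: "r"(addr) : "memory");
    }

    static inline void flush_tlb_all(void)
    {
        unsigned long cr3;                        /* NB: a CR3 reload does not flush global pages */
        asm volatile("mov %%cr3, %0; mov %0, %%cr3" : "=r"(cr3) :: "memory");
    }

    void flush_tlb_range(unsigned long start, unsigned long end)
    {
        if (((end - start) >> PAGE_SHIFT) > tlb_single_page_flush_ceiling)
            flush_tlb_all();
        else
            for (unsigned long va = start; va < end; va += (1UL << PAGE_SHIFT))
                invlpg(va);
    }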
<heat> mjg, is that wrt BSD people or does it also include linux
<mjg> literally everyone
<mjg> you even used to have devs from one project show up on lists of another project and straight up lie
<mjg> pretty funny
<mjg> my favourite saga concerns syscall performance (way pre cpu mitigations), where the bsd land claimed syscall perf over there is f4st3rz and overall better implemented
<heat> everyone grew soft and boring
<heat> except theo
<mjg> right on man
<heat> where's the drama, the lies, the flamewars?
<heat> i think theo would lie and say his performance is even worse because m'safeteh
<heat> (disregard that detail where I can't beat openbsd)
<mjg> want some juicy drama read up on grsecurity vs openbsd
GeDaMo has quit [Quit: Physics -> Chemistry -> Biology -> Intelligence -> ???]
gxt has quit [Remote host closed the connection]
gildasio has quit [Write error: Connection reset by peer]
gildasio has joined #osdev
gog has joined #osdev
gxt has joined #osdev
maxdev has quit [Remote host closed the connection]
vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
gxt has quit [Ping timeout: 258 seconds]
<mrvn> heat: don't forget global pages. reloading pages tables won't invalidate them.
<mrvn> And I would expect that "33 pages" limit to be CPU specific.
<mrvn> s/limit/break even point/
opal has quit [Remote host closed the connection]
opal has joined #osdev
gxt has joined #osdev
freakazoid332 has quit [Remote host closed the connection]
freakazoid332 has joined #osdev
<geist> oh huh the 33 thing is pretty reasoned (at least for the time)
<geist> mrvn: see the commit message heat linked before
<geist> Though of course that’s for a specific workload (compiling the kernel) but interesting nonetheless. Actually not a bad idea running some tests to figure out what the average flush batch is
<mjg> it is kind of funny though that the canonical kernel benchmark is kernel build
<mjg> ... which is not at all what people are using the kernel for
<geist> Indeed. I do worry that that would tend to optimize the whole kernel for very specific workloads. Ie, lots of processes lots of forking
<geist> Vs say a lot of large long lived heavily multithreaded processes
<geist> That seems to be somewhat of the opposite pattern
<geist> Or even medium or small sized processes with lots of threads. That’s basically where fuchsia is at right now
<mjg> i'm happy it's not a microbenchmark at least
<mjg> :)
<geist> Does remind me, now that zen3 has been out a while, has linux finally gotten support for it’s fancy TLB shoot down mechanism?
<mjg> running real workloads is hard(tm)
<geist> I was surprised that AMD didn’t dump it in instantly
<mjg> well, i'm afraid amd is notorious for not optimizing linux
<mjg> (not saying they never do it)
<geist> I worry that it’s actually not a good win, or it is a win on desktopy stuff, but since servers are where a lot of the devs are they aren’t interested in it
<geist> There are definitely things like the clzero thing being bad on multi socket machines ( i think that was you reporting that) but it’s probably fine on desktop stuff, etc
<mjg> i have not seen any numbers from it
<geist> But since the monies are all in server land…
<mjg> clzero is doing non-termporal stores
<mjg> which in *certain* cases, multisocket or not, is terrible
<geist> Sure, the issue was cross socket bandwidth, etc
<mjg> and is extra bad on multisocket
<geist> Ah. What’s interesting is that’s the default way to do it on ARM
<mjg> is it?
<geist> Absolutely. Clzero is clearly one of the AMD features that was picked up from K12
<geist> Seems that at least half of these new AMD fancy things are 1:1 an ARM feature
<geist> (The tlb shoot down is the same thing too)
<mjg> are you in position to count cache misses with this vs regular zeroing?
<mjg> when doing the famed kernel build :p
<geist> On what hardware?
<mjg> arm
<geist> Sure, but my point is the clzero arm equivalent (`dc zva`) is used *everywhere*
<geist> Like the default memset implementation uses it
<mjg> at least 2 years ago nt stores for clear_page still resulted in *more* cache misses (aka slower)
<mjg> maybe i misunderstood what you wrote here
<mjg> does arm64 clear_page use nt stores?
<geist> Basically if memset is asked to write zeros there’s a ‘fast path’ that just blats out 64 bytes at a time with the instruction
<mjg> nt stores for big enough(tm) memset and memcpy is preferable, since you are busting the cache anyway
<geist> `dc zva` is an instruction that blats out a zeroed cache line. It is defined as being non temporal (or at least cache skipping)
<mjg> but that's way past 4KB
<geist> Though the arch manual says it doesn’t always have to be, etc etc. ie, it doesn’t cache allocate
<geist> But yeah i do wonder if perf is dropped on the floor because of overuse of that in general purpose stuff
<geist> Thoughit’s possible arm is more clever about it in general, since they advocate using it more or less at every possible place
<bslsk05> ​elixir.bootlin.com: clear_page.S - arch/arm64/lib/clear_page.S - Linux source code (v5.19.12) - Bootlin
<mjg> so that's nt stores?
<mjg> 1: dc zva, x0
wootehfoot has quit [Read error: Connection reset by peer]
<geist> Yep. Dc zva
<mjg> so
<mjg> on amd64 nt stores *in this routine* result in more cache misses (== slowdown)
<mjg> and i would be seriously surprised if it was different for arm64
<geist> Also the other path (the one that doesn’t use dc zva) is using the `stnp` instruction which is also non temporal
<mjg> heh
<mjg> well, in principle, regular zeroing may be so slow that you come out ahead even with cache misses
<mjg> the point though is that you are inducing more traffic to ram, which can't be good
<gog> hi
<geist> Yah that’s my general thought. And again I think in the case of zeroing pages that’s different case from just general usage in user space in memset
<geist> Which ARM also generally does
<mjg> but perhaps arm is special casing this somehow and if the line is already in cache it does not get evicted?
<geist> Oh I’m fairly certain that’s precisely the case.
<mjg> geist: i have no issues with nt stores in memset past a certain size
<geist> That’s what i mean about it being more clever
<geist> As in the non temporal part is a strong hint
<mjg> ye, that would make 100% sense
<mjg> ... and it's not what happens on amd64
<mjg> :>
<geist> All non temporals or just clzero?
<mjg> all nt
* geist pets gog
<mjg> unless i missed some special case
* gog prrr
<geist> So all nts in x86 are defined as explicitly writing back and evicting any cache lines that intersect?
<mjg> they invalidate the lines
<mjg> and then you go to ram
<mjg> i did bench a clzero-based clear_page equivalent for freebsd. system time went down (faster zeroing), user time went up (cache misses)
<geist> Got it
<mjg> total real time basically the same
<mjg> so a net loss overall from more ram traffic
<geist> Yeah, same amount of work is done, just accounted differently
<geist> Also i guess a lot of this depends on if zeroing is done on free or alloc. On free it seems that dumping the cache as a result would be more okay than on the alloc path
<mjg> ... which will cause slowdowns when more cpus want to access it
<geist> since on the alloc path you might want to touch the page soon anyway
<mjg> i don't understand why you would zero on free
<geist> Security, etc
<mjg> user pages?
<mjg> i don't mean explicit_memset here
<mjg> :)
<geist> User pages what?
<mjg> in this case we are talking about pages being reused?
<mjg> clear_page when you get it or unmap
<geist> Yes
<mjg> right, so i don't see the point
<geist> Of what? I’m missing context here
<mjg> well in principle there may be a kernel memory disclosure bug
<mjg> ... of zeroing pages when freeing them instead of when they get faulted
<geist> Got it. Yeah i dont particularly see the point *except* from a security point of view some folks get antsy with user data sitting around in it
<mjg> i found a bug in linux once which would indeed be able to dump it
<geist> But yeah it has the whole ‘have to keep a queue of freed and of freed-and-zeroed pages’ etc floating around
<mjg> security aside, zeroing on alloc is a clear cut win from perf standpoint
<geist> Or at least it’d be a lot more complicatated than to just treat freed pages as yolo
<geist> Agreed.
<geist> One thing i would like is when running a VM host with a lot of tenants, it’d be lovely if they zeroed their pages on free
<geist> Since the VM host can dedupe them, however that is really what balloon memory reclaimation is for anyway
<geist> Since that generally pressures the clients to trim their file cache or whatnot
<mjg> i don't know if zeroing is needed
<mjg> i would expect you could tell the hvm that this is unused now
<geist> I’ve forced it on my VM by doing the whole ‘fill a tmpfs file with zeros, delete it’ method
<mjg> then it can optimize bulk zeroing as needed
<geist> Yeah having some sort of memory based TRIM like call would be nice
<mjg> and have the guests vmexit if they want said pages
<geist> Agreed, but there doesn’t seem to be anything like that in any of the VMs i know about
<mjg> not much of a vm guy myself
saltd has joined #osdev
<geist> I think it’s generally the status quo to just let the guests use their allotted mem and be done with it
<mjg> fucking guests man
<geist> And if you really want some sort of overcommit and the guests are okay with it (ie, it’s your box) youcan use balloon memory scheme for that
<mjg> there are systems i'm not gonna name which just eat cpu while idle
<mjg> and not some 1%
<geist> But i suspect that’s not en vogue because most VM guests in most situations are on something like AWS or Azure or GCE where there’s no real reason for a guest to ‘play nice’with the rest of the machine
<geist> They’re paying for N GB of ram, they can use it
<geist> But i have no idea, I’m not a vm person either, so i dont know where it’s at
rwb has quit [Ping timeout: 260 seconds]
<mjg> that's precisely the environment where i expect vendors to try to squeeze more free ram
<geist> But i have a personal box running 10 or so qemu instances that i somewhat overcommit, and generally rely on page deduping and swap to work
<geist> Anyway gotta go. Ferry is landing
gildasio has quit [Write error: Connection reset by peer]
gildasio has joined #osdev
<mrvn> I think you have to look at why you get cache misses or lack thereof.
freakazoid332 has quit [Ping timeout: 244 seconds]
<mrvn> One pattern I've seen is: struct Foo foo; memset(&foo, 0, sizeof(foo)); foo.x = 1;
<mjg> the extra misses are from accessing now evicted lines
<mrvn> If memset bypasses the cache then you write 0 to memory, get a cache miss and then write 1.
<mjg> alloc page, do work, free page, alloc page -- you are back to the same page, fully cached
<mjg> if you now zero it with nt stores you have to read it back from memory
<mjg> and for workloads like building the kernel this happens a lot
<mrvn> mjg: I would expect a nt store of cached data to update the cache.
rwb has joined #osdev
<mrvn> mjg: have you actually tested this? Does building a kernel frequently free a page and then alloc it again? I would think the libc just increases the heap till the compile step is done and never frees anything.
<mrvn> mjg: or did you mean libc malloc() reusing freed memory over and over?
<mjg> i did test it, just like other people did
<mjg> it's not necessarily literally the same page, but it's still something you have in llc
<mrvn> mjg: and you get tons of sbrk() calls or munmap/mmap?
<mjg> lemme give you an example
<bslsk05> ​cgit.freebsd.org: src - FreeBSD source tree
<mjg> page copy instead of page zero, but same concept applies
<mjg> google around, you will find linux people doing more extensive tests and reaching the same conclusion
Ali_A has joined #osdev
<mrvn> mjg: pagecopy is far from freeing and allocating the same page though.
<mjg> i noted it's not necessarily literally the same page, just something you still have in llc
<mrvn> mjg: even that isn't happening there.
<mjg> i used 'the same page' example for easier illustration of what's going on
<mjg> it is
<mjg> buildkernel is full of short lived processes
<mjg> so pages keep getting reused
<mrvn> what I see there is that copying the page will prime the cache so accessing the page after the copy generates cache hits.
<mjg> found it
<mjg> there was a time where freebsd did not do numa
<mjg> and i had a 2 socket box
<mjg> make -j 40 buildkernel:
<mjg> nt stores: 1726.98s user 554.49s system 1841% cpu 2:03.87 total
<mjg> rep movsq: 1683.30s user 550.70s system 1876% cpu 1:59.08 total
<mjg> for pagezero
<mrvn> The interesting part in your url I find is that there is 0 change in the runtime. ~25% less cache misses and 0 change in speed.
<mjg> that's because at the time the kernel was incredibly slow in general
<mjg> lemme try to find an example
<mjg> some syscalls are now 3x the speed
<heat> how many openbsds was freebsd at that time
<mrvn> I also don't get the original code. WTF is it doing there? It copies a page in blocks of 64 bytes and then loops some more blocks of 32 bytes?
<saltd> fuck your friends, we are going home
<saltd> but
<mjg> sigh i don't have numbers that old
<mrvn> Because a page isn't a multiple of 64 and surely there must be multiple blocks of 32 bytes at the end of the page that aren't a 64-byte block?
<saltd> wrnh chann
<saltd> o ops
<mrvn> oh wait, the first loop just prefetches a page and then it copies in a second loop, right?
<heat> yes
<mrvn> Wasn't the point of non temporal not to trash the cache? How does prefetching work there?
<heat> you're prefetching the source
<heat> prefetch source -> normal load from cache -> nt store to dest page
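The same prefetch-source / nt-store-destination pattern as the FreeBSD routine, sketched with SSE2 intrinsics instead of hand-written asm (illustrative only; block size and prefetch distance are arbitrary, x86-64 assumed):

    #include <emmintrin.h>   /* _mm_stream_si64, _mm_sfence */
    #include <xmmintrin.h>   /* _mm_prefetch */
    #include <stddef.h>

    #define PAGE_SIZE 4096

    void pagecopy_nt(void *dst, const void *src)
    {
        const long long *s = src;
        long long *d = dst;

        for (size_t i = 0; i < PAGE_SIZE / sizeof(long long); i += 8) {
            /* pull the next source line in with prefetchnta, ... */
            _mm_prefetch((const char *)&s[i + 8], _MM_HINT_NTA);
            /* ...regular loads from cache, non-temporal stores (movnti) to the destination */
            for (size_t j = 0; j < 8; j++)
                _mm_stream_si64(&d[i + j], s[i + j]);
        }
        _mm_sfence();        /* nt stores are weakly ordered; fence before handing the page out */
    }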
epony has quit [Ping timeout: 252 seconds]
<mrvn> urgs, NT stores are weakly ordered but prefetchNTA is fully coherent. So maybe that first loop just makes sure the memory is in sync.
frkzoid has joined #osdev
gorgonical has joined #osdev
<gorgonical> I am losing my mind with this assembler
<mrvn> heat: the prefetchnta seems to do something different on every cpu.
<heat> gorgonical, drop nasm
<heat> use gas
<gorgonical> I am using gas unfortunately
<heat> gas is good
<heat> what's your problem and why are you wrong
<mrvn> gas or gcc -S?
<gorgonical> I am writing a forth implementation in risc-v and can't abuse resetting variables to do word linking for me
epony has joined #osdev
<gorgonical> "redefined symbols can't be used in reloc"
<heat> dude
<mrvn> .oO(Tell us your problem and we will tell you why you are wrong)
<heat> you're writing riscv assembly?
<gorgonical> yes
<gorgonical> I acquired a taste for it this summer while porting a kernel
<heat> I would've blown my brains out at line 10
<mrvn> gorgonical: have you tried writing it in C and looking at the assembly output?
<heat> wait
<heat> variable?
<heat> what's a variable'
<gorgonical> I know what I want it to do, actually. I just can't tell if the assembler directive/macro system is good enough
<mrvn> gorgonical: if you use numbers for symbols you can reuse them
<gorgonical> basically each word needs a pointer to the previous one. .int link holds that. The regular forth impl resets link each time you define a word
<gorgonical> .int link
<gorgonical> .set link,\current_word or so
gxt has quit [Ping timeout: 258 seconds]
<mrvn> gorgonical: so a compile time register tracking the last allocation?
<gorgonical> yes done all by the macro system of the assembler
<mrvn> gorgonical: can you push/pop variables?
<gorgonical> I'm like 80% sure I can do this the manual way by tracking what the last word was and just inserting it manually but I don't really want to do that lol
<gorgonical> mrvn: Not that I'm aware. But I'm not that versed in gas directives/macros
rwb is now known as rb
gxt has joined #osdev
<gorgonical> by tracking I mean "remembering myself"
<mrvn> gorgonical: I tried doing macros and structures in gas but support for that is horrible. Under AmigaOS I had a Devpac assembler where you could basically define structs like C code and it would define offsets for all the members that you could use.
<mrvn> It had an actually usable macro language.
epony has quit [Ping timeout: 252 seconds]
<gorgonical> Yeah the gas macro language is ugly. I'm almost thinking it would be preferable to use the C preprocessor
<heat> why not both
<heat> I use both
<mrvn> gorgonical: that's what I always do. Just to be able to "#include" alone.
<heat> there's certain stuff you can't do in Cpp macros though
<heat> like instruction sequences, since there's no newlines in CPP
<mrvn> heat: op; op2; op3;?
<heat> does that work in gas?
<heat> I thought that was an inline assembly thing
<mrvn> not sure
<gorgonical> I can almost get away with what I want by using the \@ symbol that counts the number of times any macro has execd. But I need to insert link\@ and set link(\@+1) and that doesn't work
epony has joined #osdev
<gorgonical> And because of the way gas macros work you are limited in what you can concatenate to make a symbol name, apparently
<heat> It does!
<bslsk05> ​godbolt.org: Compiler Explorer
<bslsk05> ​godbolt.org: Compiler Explorer
<mrvn> great minds think alike :)
<heat> equally stupid minds also think alike
<mrvn> syntax highliting says it's a comment. :)
<heat> :)
<mrvn> <source>: Warning: end of file not at end of a line; newline inserted :)
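For reference, the same trick seen from the C side: the preprocessor keeps the expansion on one line and gas accepts ';' as a statement separator, so a single macro can carry a short instruction sequence (toy example, the macro name is made up):

    #define THREE_NOPS "nop; nop; nop"

    static inline void three_nops(void)
    {
        __asm__ volatile(THREE_NOPS);
    }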
<gorgonical> I think the right (read: only) thing I can do without introducing the cpp is to manually specify what word came previously. When defining words interactively some word in memory is used to remember the latest thing we defined. That's what link does at compile-time
<gorgonical> I think
saltd has quit [Remote host closed the connection]
<gog> you can use the c preprocessor for assembler files
<gog> just use .S
<gog> gcc will do the right thing with them
<gog> clang too
<gorgonical> Yeah I just don't know exactly how to get the result I want with cpp at the moment
<gog> ah hm
<gog> that is tricky
<gorgonical> Maybe like a macro that spits out the right link variable name would work
<gorgonical> At the moment gas is complaining that gp and t0 aren't valid risc-v registers, which I have never seen before
<gorgonical> oh possibly I am a dumbdumb
<gorgonical> yes I think I understand lol
<gorgonical> Yeah I forgot the memory syntax and used j where I should have used jr
saltd has joined #osdev
<gog> oops
<heat> apparently there's a bug on every gcc since 4.8
<bslsk05> ​gcc.gnu.org: 107107 – Wrong codegen from TBAA when stores to distinct same-mode types are collapsed?
saltd has quit [Read error: Connection reset by peer]
saltd has joined #osdev
Ali_A has quit [Quit: Client closed]
epony has quit [Ping timeout: 252 seconds]
<geist> Make sure you read the riscv asm manual, which talks about what pseudo instructions are available and whatnot
<geist> It’s generally assumed that you lean on those very hard, which gives the assembler a fair amount of leeway to get what you need
<geist> Ie, using call pseudo instruction instead of jr directly, etc
<geist> Jalr that is
<heat> fo sho
epony has joined #osdev
<bslsk05> ​github.com: slab: Add slab allocator by heatd · Pull Request #46 · heatd/Onyx · GitHub
<heat> it's still hella simple, no percpu magic yet
<heat> the cache's nr_objects and active_objects aren't touched because I quickly realized I'll need to add them to the percpu context
<heat> as to avoid the atomic add
<heat> yay for scalable counters I guess
<heat> ...actually it's under a lock right now
<heat> meh, don't care
<geist> Word. I’ll take a look tonight, am curious
<mjg> c->alignment = ALIGN_TO(c->alignment, 16);
<mjg> you want 8 bytes bro
<mjg> no simd
<geist> Depends on the ABI. If the ABI demands it you shouldn’t futz with that
<geist> And you can try to override it but that’s a good way to get to buggy compiler land
<heat> mjg: I like having extra bits in the address to stuff things in
<heat> but the alignment stuff isn't finished. as in, totally not finished
<heat> barely started
<mjg> well then don't get an 8 byte slab at least :p
<geist> Cmpxchg16b (and the arm equivalent) i think is a case where the 16b comes back
<mjg> i would whack null checking in kfree and demand non-null pointers
<geist> I believe we bumped into that in fuchsia. May not be x86, but arm may have the requirement for doubleword atomics
<mjg> for the special cases you can guarantee aligment
<mjg> i'm saying 8-byte sized (and smaller!) allocs do happen
<geist> Sure but problem is the compiler will assume that allocations are aligned that way, so will align their structures to assume as much
<heat> I originally tried to provide the same alignment guarantees as the original paper (as in, everything is $objsize aligned)
<heat> but aligning non-power-of-2 sizes is not trivial
<heat> the current code will also fail for > 4KB alignments
<heat> again, that area is super WIP
<mjg> i think you are giving yourself more work to do by not providing per-cpu support from the get go
<heat> yea, I'm on it
<mjg> key point being that with it you will only call down to slab for a full batch of objects
<heat> but why more work?
<mjg> or to return one
<mjg> ... batch
<mjg> no 1 obj at a time fuckery
<heat> I don't like that
<heat> i'm scared i'll just get objects stuck in the percpu queues forever
<mjg> it does not have to happen
<heat> and yes, I understand that's how fbsd rolls
<heat> i don't think linux's design does that? but I may be super wrong
<mjg> but if you cache something per-cpu, you already are susceptible to it
<heat> sure, but the problem is probably less serious there
<mjg> ?
<heat> it also avoids fucking with other cpus' queues
<mjg> let's try an example
<mjg> in the case described in the original paper you have a "magazine" with n objects cached
<mjg> what exactly are you planning to do in your caes?
<mjg> case
<heat> frees go to the percpu cache, first allocations hit the slabs
<mjg> how many objects are you wiling to store in the percpu cache
<heat> unknown
<mjg> ... and how are you going to avoid "stuck forever" problem
<heat> the idea for now will probably be to flush everything at purge time
<heat> as in, use the pcpu cache as a literal cache
<mjg> that's pretty weird imo and will definitely negatively affect your perf
<heat> it is?
<mjg> i would say, if you insist on returning per-cpu cache, and are handling the fast path with preemption disabled
<mjg> you can just ipi into it and tell the cpu to give everything back
<mjg> should you run into memory shortage
<mjg> otherwise bugger off
<heat> how do you even put it there without a lock?
<mjg> where
<mjg> oh back to slab?
<heat> no
<heat> in the magazines
<mjg> i think there is a miscommunication here, so let me pseudo code
<mjg> you get to the fast path, find you got nothing, you go to the slab layer, take whatever locks you need and grab 1 magazine of objects
<heat> oh, for you only?
<mjg> yea
<mjg> on free you get to the fast path, if there is free space in the magazine for the obj, you put it there
<mjg> now i remember why bonwick wanted 2 magazines
<mjg> if the magazine is full, you return it, again taking locks
<mjg> for the case where you want to return shit on demand, even if the cpu is chillin
<mjg> you can ipi, check it is not doing an alloc, and tell it to set the magazines aside
<mjg> e.g., you pass it a pointer to set to
<mjg> then on return you got the magazine and the cpu no longer does
<mjg> should you find it is mid-alloc, you cpu_relax and try again
<mjg> as preemption is disabled it is an invariant it will eventually finish
<mjg> in principle by the time you get there it may be doing another alloc and if that's something you are worried about i have a hack which sorts it out
<heat> I see
<mjg> remember to get 2 magazines tho
<mjg> you don't want a spot like this:
<mjg> you have a full magazine, free() comes your way
<heat> i was readin the codez and I think I was being conservative
<mjg> there is no place to put the obj, so return the magazine
<mjg> now you got 1 obj cached
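Pseudo-C of the flow mjg just walked through, with a per-cpu magazine refilled and returned a whole batch at a time under preemption-disable; every name here is made up for illustration:

    #define MAG_SIZE 32

    struct magazine { int nrounds; void *rounds[MAG_SIZE]; };
    struct kcache;

    void preempt_disable(void);
    void preempt_enable(void);
    struct magazine *this_cpu_magazine(struct kcache *c);
    void slab_refill(struct kcache *c, struct magazine *mag);   /* lock, grab a full batch */
    void slab_return(struct kcache *c, struct magazine *mag);   /* lock, give the batch back */

    void *cache_alloc(struct kcache *c)
    {
        void *obj = NULL;
        preempt_disable();
        struct magazine *mag = this_cpu_magazine(c);
        if (mag->nrounds == 0)
            slab_refill(c, mag);
        if (mag->nrounds > 0)
            obj = mag->rounds[--mag->nrounds];
        preempt_enable();
        return obj;
    }

    void cache_free(struct kcache *c, void *obj)
    {
        preempt_disable();
        struct magazine *mag = this_cpu_magazine(c);
        if (mag->nrounds == MAG_SIZE)
            slab_return(c, mag);                                /* empties the magazine */
        mag->rounds[mag->nrounds++] = obj;
        preempt_enable();
    }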
gog has quit [Quit: byee]
<heat> I dont understand the 2 magazine shizzle btw
<heat> but that's not too important
<heat> surely
<mjg> well i gave you the corner case
<mjg> you may end up with 1 cached object and having to take another trip to slab
<heat> why would I end up with 1 object in the mag instead of $MAG_SIZE
<mjg> you have 1 magazine, it is full
<mjg> free comes your way
<heat> yes
<mjg> what are you planning to do
<heat> so dump 1 object, not the whole mag
<mjg> another free comes your way
<mjg> what are you planning to do
<heat> dump another object
<mjg> so you gonna keep taking 1 obj trip to slab now?
<mjg> as long as the above degenerate case happens?
<heat> in fact, here's what the linuz does
<heat> half the entries go to the global cache
<heat> i like this solution over your 2 magazine galaxy brain solution
<mjg> it's not mine but bonwicks
<heat> i bet bonwick designed doors
<mjg> also half is quite a difference from 1 by 1
<heat> yes
<mjg> here is a trivial case where 1 by 1 fucking demolishes perf
<mjg> you rm -r a dir tree
<heat> i corrected my design by N / 2 - 1
<mjg> ok
<geist> Yeah get it working first, recognize that optimizations will come, and then move on
<geist> The much more important thing is to get the rest of the system thinking in terms of slab allocators
<heat> yeah
<heat> it works-ish right now
<mjg> makes you wonder, is slab really this good of an idea
<mjg> or do we all suck
<heat> last
<geist> Well, i have thought that thought too
<geist> Ie are slabs the end all of everything for kernels
<geist> It *seems* to generally fit the model in that you probably have a fairly limited set of objects in the overall system and thus it benefits well
<geist> Plus it can sit more or less directly on top of the PMM, etc
<heat> slabs are just object pools with a particular internal design
<mjg> one immediate problem with slab, which is not inherent to it, but is present
<mjg> is that you are going to have allocs with drastically different life spans
<heat> nt heavily does object pools I think
<mjg> which slab does not handle
<mjg> and you can keep entire pages hostage because of it
<geist> Yah there is a fair amount of loss due to external fragmentation?
<geist> Or would that be internal fragmentation. Either way, exactly
<mjg> but then is this a slab problem or a consumer problem
<heat> bonwich paper says 7% fragmentation over 1 week
<heat> wick*
<mjg> on his sparc desktop?
<heat> yes
<geist> Well a mixed heap has at least more of a shot of mixing up allocations within a page
<heat> on his sparc desktop
<mjg> that's representative... of his sparc desktop
<geist> So *probably* on the average would have less unallocated space in pages
<mjg> i don't have any other numbers, to be fair :-P
<heat> when does google release the server fleet's allocation profiles
<geist> Almost by definition you have a one page overhead per object type, but then lots of binning heaps have similar stuff
<heat> it's what they test tcmalloc on
<geist> Sure, but even that is the fleet. Big server stuff is really a different optimization than desktops/embedded/etc
<heat> yeah
gxt has quit [Ping timeout: 258 seconds]
<mjg> you i think a sparc desktop from late 90s is the middleground here!
SpikeHeron has quit [Quit: WeeChat 3.6]
<geist> This is a somewhat recurring problem in fuchsia, where we have server class algorithms/data structures/common knowledge applied to something that isn’t
<heat> but like, are you noticing 160k opens per second vs 200k opens per second in desktop/embedded?
<heat> if you're mjg and want to make fun of me, yes
<mjg> i would notice if you could make this many!
<geist> As is generally the case, having something working is infinitely more valuable than nothing working but an optimal design in the works
<geist> But it also depends on what makes you happy
<geist> Do you like making things go or do you like optimizing things. Both answers are totally valid
<geist> I’m much more in the former camp
<heat> yes
<mjg> later camp unite
* mjg is gonna sell merch
<heat> no
<heat> all camp
<mjg> OH
<geist> OPTIMIZERS RULE
<mjg> did i mention that certain perf baseline is part of correctness?
<geist> I think true engineering is designing with both in mind. Ie, getting something functional soon but make sure future designs are available
<mjg> i'm happy for you that your bubble sort works a-ok
<mjg> but i can't use it bro
gxt has joined #osdev
<geist> Trying to paint the room such that when you do fill it in, you’re not in a corner
<mjg> heat: i think it would be in good taste to add a note that slab was inspired by the paper
<mjg> heat: at the top of the file
<mjg> also you may want to add a sun-related pun to onyx now
<heat> i made up slabs
<heat> my name is jeff bonwick
<heat> pleasure to meet you
<mjg> joel spolsky
<mjg> sup man
<mjg> you should post on stackoverflow
<heat> totally should matthew
<heat> loved your firmware talk!
<mjg> "interestingly" the english version of my name *is* matthew
<heat> guzik is polandian for garrett
<mjg> central european!
<heat> check
<heat> from the check republic
<mjg> anyhow openbsd has per-cpu slabs man, you should catch up
<mjg> basically i think you could learn a lot by reading their code
<heat> must. not. lose. against. theo.
<mjg> few years back openbsd add a new syscall which was printing some crap
<mjg> they directly derefed user memory in it
<mjg> s3cur1ty
<mjg> added*
<heat> are ctors for slabs that beneficial that you're willing to throw away memory for them?
<mjg> what do you mean throw away
<heat> you can't put your bufctl inside the free object
<heat> you effectively can put less objects in a slab
<mjg> just don't add them for now
<mjg> if you have something really heavyweight to do when creating the object, that's a candidate
<mjg> few stores don't count
<heat> right
<mjg> i expect you wont have any use for constructors for quite some time
<heat> but circling back to your vnode example with the lru
<heat> you're always paying the cost
<mjg> what do you mean by always here
<mjg> and cost in terms of memory or cpu time
<heat> cpu and memory
<mjg> memory would be used anyway
<heat> although maybe you guys found a better strategy for bufctls
<mjg> and it *saves* cpu time
<mjg> :>
<heat> why?
<mjg> consider n threads all allocating a vnode at the same time
<heat> I can call 16 ctors() at once, or 1 ctor at a time
<mjg> if there is no ctor, adding to global lru has to happen at said alloc time
<mjg> so you got n threads contending to do it
<mjg> and then you have to whack it from the list if the file goes away
<heat> sure but that's the best case scenario right?
<mjg> sounds like a bad case? :)
<heat> if you have contention, you dun goofed
<mjg> thanks to the ctor i don't
<heat> yeah but what if someone outside the allocation function locks it for any reason
<heat> traversal or wtv
<mjg> that's part of the point. ctor() stuff is incredibly rarely called compared to alloc/free
<heat> hrm
<heat> riiiight
<mjg> which also means i normallly avoid locking
<mjg> if ctor happens a lot, you are caching it wrong or are suffery turbo memory pressure
<mjg> suffering
<heat> but I'm not entirely convinced given that dtors are kind of a bad idea
<mjg> i need the dtor to dequeue the vnode
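A sketch of the arrangement mjg describes: the ctor/dtor pair touches the global LRU only when the slab layer creates or reclaims objects, so the hot vnode alloc/free path never takes the LRU lock (all names illustrative):

    struct list_node { struct list_node *next, *prev; };
    struct spinlock;

    void spin_lock(struct spinlock *);
    void spin_unlock(struct spinlock *);
    void list_append(struct list_node *list, struct list_node *node);
    void list_remove(struct list_node *node);

    struct vnode {
        struct list_node lru_link;
        /* ... */
    };

    extern struct list_node vnode_lru;            /* hypothetical global LRU */
    extern struct spinlock  vnode_lru_lock;

    static int vnode_ctor(void *mem)              /* runs once, when the slab is grown */
    {
        struct vnode *vp = mem;
        spin_lock(&vnode_lru_lock);
        list_append(&vnode_lru, &vp->lru_link);
        spin_unlock(&vnode_lru_lock);
        return 0;
    }

    static void vnode_dtor(void *mem)             /* runs only when the slab reclaims the page */
    {
        struct vnode *vp = mem;
        spin_lock(&vnode_lru_lock);
        list_remove(&vp->lru_link);
        spin_unlock(&vnode_lru_lock);
    }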