#osdev on 2022-09-25 — irc logs at libera.catirclogs.org

2021-05-23 01:57 klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books

00:00 <heat> sorry

00:00 <mjg> + if (sysctlbyname("debug.flame_graph_data", buf, &bytes, NULL, 0) < 0) {

00:00 <mjg> + perror("debug.flame_graph_data");

00:00 <mjg> + exit(1);

00:00 <mjg> + }

00:00 <mjg> this is how userspace gets it

00:00 <heat> they're not using kmem, but it's kmem compatible

00:00 <heat> UNIX(r) Backwards Compatibility(tm)

00:00 <mjg> look i don't know why the macro is there

00:00 <mjg> i suspect a leftover

00:01 <mjg> uh

00:01 <mjg> + bytes = sizeof(size_t) +

00:01 <mjg> + sizeof(struct flame_graph_pcpu) +

00:01 <mjg> + sizeof(struct flame_graph_entry) * FLAME_GRAPH_NENTRIES;

00:01 <mjg> that is defo crap

00:01 <mjg> but the kinks of the interface can be worked out later

00:01 <mjg> if you don't have a general exporter from the kernel you can just temporarily add a fcntl or something

00:01 <heat> i'm adding a device file

00:01 xenos1984 has quit [Read error: Connection reset by peer]

00:01 <mjg> just to get this off the ground

00:02 <heat> as a true unix fan

00:02 <mjg> oh a man of culture

00:02 <heat> almost as good as /dev/poll

00:02 <mjg> or /dev/sliding_doors

00:02 <mjg> all this talk however reminds me to implement cheap lock profiling for freebsd

00:02 <mjg> with stacktraces and hookers

00:03 <mjg> there is this funny bit where there are 2 mechanisms, one turbo expensive based on dtrace, another kind of ok, but unable to grab stacks

00:03 <mjg> making it rather limited

00:04 <mjg> and by cheap i mean on the cpu, i defo expect to hog some ram

00:04 <mjg> and may even need to dedup stacks :[

00:05 <heat> do I need dtrace?

00:05 demindiro has quit [Quit: Client closed]

00:05 <heat> how does that even work

00:05 <mjg> you want something for instrumentation *at some point*

00:05 <heat> i'm only familiar with ebpf

00:06 <mjg> dtrace provides a dsl, but oversimplyfing one could argue they provide similar features

00:06 <mjg> [i know ebpf is more powerful :>]

00:06 <mjg> so not sure what you mean by how

00:07 <mjg> for example there is func entry/return tracing and that patches standard func prolog/exit with int3

00:07 freakazoid332 has quit [Ping timeout: 244 seconds]

00:07 <mjg> which i suspect ebpf is also doing

00:07 <mjg> (or maybe they have nops now

00:07 <heat> is dtrace also bytecode based?

00:08 <mjg> internally it has a vm of sorts afair, but you are not expected to know something like that as its user

00:08 <mjg> key design decision behind dtrace was to make suer it is safe to use in production

00:08 <mjg> to that end they implemented a dsl without loops and whatnot

00:09 <mjg> so they can prove the code finishes

00:09 <mjg> personally i like it to an extent, but it does come with limitations which are sometimes a problem

00:09 <mjg> you want to port something, port ebpf

00:09 <mjg> :>

00:10 <mjg> but i'm pretty sure that's not a weekend project, while flamegraph generation definitely is

00:11 <mjg> well if you are asking if dtrace sripts compile to direct cpu asm, then no

00:11 <heat> yeah but they compile to some bytecode yeah?

00:11 <mjg> perhaps you have been warped by systemtamp et al which create c -> asm -> actual cpu binary code

00:11 <heat> or do you just pipe strings to the kernel?

00:12 <mjg> no it has some form of a bytecode, but i never looked into it

00:13 <mjg> and i don't know who is doing the compilation

00:13 <mjg> plausibly the kernel while trying to elide or combine probes

00:14 <mjg> anyway

00:14 <mjg> + bytes = sizeof(size_t) +

00:14 <mjg> + sizeof(struct flame_graph_entry) * FLAME_GRAPH_NENTRIES;

00:14 <mjg> + sizeof(struct flame_graph_pcpu) +

00:14 <mjg> style or layering aside, this is a bug

00:14 <mjg> you most likely want to use offsetof

00:15 <heat> yeah

00:15 <mjg> or better yet sizeof(struct fg_export_entry) * entries

00:15 <mjg> i would just write a short tool which grabs how many seconds to sample for, tells the device and everyone is happy

00:15 <mjg> once it's all done the ioctl returns and your user buf is populated

00:16 <mjg> well you need to know how to much memory to alloc for it, so you could ask the device

00:20 demindiro has joined #osdev

00:21 xenos1984 has joined #osdev

00:26 <heat> legit the most horrifying code I've written

00:26 <heat> i'm taking all the shortcuts

00:26 <heat> like a true unix

00:29 <mjg> btw what would be great is lock contention flaemgraphs

00:29 <zid> sizeof (struct fg_export_entry[entries]) for life

00:29 <zid> even if it does need VLA support for the syntax

00:29 <mjg> as in you have lock1 and wait for lock2

00:30 demindiro has quit [Quit: Client closed]

00:30 <heat> mjg, i eventually want an actual tracing system like what perfetto can do

00:30 <mjg> indicate this is what happened, so that when you see crazy lock1 wait times, you know where they acame from

00:30 <heat> https://ui.perfetto.dev/

00:30 <bslsk05> ui.perfetto.dev: Perfetto UI

00:30 <heat> see the android example

00:30 <heat> it's so sweet

00:30 <mjg> never played with it

00:32 <mjg> meh

00:32 <mjg> i'm blanking on a name, but there are tools which do flamegraph0y stuff with time axis

00:33 <mjg> one thing you can easily add to have a leg up on unix

00:33 <mjg> is to track off cpu time waiting on i/o 'n shit

00:33 <mjg> and add it to time(1)

00:34 Iris_Persephone has quit [Ping timeout: 244 seconds]

00:34 <mjg> i wanted to add it on freebsd, but you can't without screwing the abi

00:34 <mjg> or adding a new variant of waitpid which exports the bigger struct

00:34 <mjg> and i don't think doing that just to get the extra numbers i can arguably obtian with dtrace is justifiable

00:35 <mjg> hmmm

00:35 <mjg> now that i said it

00:35 <mjg> wait4 takes an options flag

00:35 <mjg> i could add one which indicates "the target rusage area is actually the extended stuff"

00:36 <mjg> ye i think i'm gonna do it

00:36 <heat> that seems controversial

00:36 <mjg> have you ever seen perf stta?

00:37 <heat> yes

00:37 <mjg> https://dpaste.com/9LYHTL6R8

00:37 <bslsk05> dpaste.com: dpaste: 9LYHTL6R8

00:37 <mjg> freebsd has an equivalent

00:38 <mjg> now imagine extending that with off cpu stuff, which is blatantly missing

00:39 <mjg> ... and which i know for a fact is a major factor

00:39 gxt has quit [Read error: Connection reset by peer]

00:40 <mjg> https://reviews.freebsd.org/D24217#576110

00:40 <bslsk05> reviews.freebsd.org: ⚙ D24217 amd64 pmap: fine-grained pv list locking

00:41 gxt has joined #osdev

00:42 Matt|home has quit [Quit: Leaving]

00:54 Iris_Persephone has joined #osdev

01:00 <heat> ok i have something

01:00 <heat> lets see how it does

01:04 <mjg> did it crash? :)

01:05 elastic_dog has quit [Ping timeout: 260 seconds]

01:05 <heat> yes

01:05 <heat> my "is stack pointer out of bounds logic may be broken"

01:05 <mjg> i would say first iteration should just sample IP

01:05 <mjg> and fuck everything else

01:05 <mjg> once you know general machinery works you can start unwinding

01:06 <mjg> but you do you

01:07 <mjg> now that i mention it

01:08 <mjg> you can check if ip falls iwthin kernel range

01:08 <mjg> and if not, ignore the sample

01:08 <mjg> or write a placeholder

01:08 <mjg> as you sample

01:09 <mjg> storing a magic value like 0 could then be post processed to '[userspace]'

01:10 <mjg> bailing for the day, cheers

01:10 elastic_dog has joined #osdev

01:10 <mjg> fg repo https://github.com/brendangregg/FlameGraph.git see stackcollapse.pl

01:13 <heat> yeah I see

01:13 <heat> thanks for the helping

01:13 <heat> s/helping/help/

01:13 <mjg> now i wonder if a nasty user which just jumps to a kernle address would fuck this up

01:14 <heat> how?

01:14 <mjg> they literally jmp $something_in_kernel

01:14 <mjg> but before they crash

01:14 <mjg> yours ampling func finds it

01:15 <mjg> i would say something to worry baout later

01:15 <heat> but they're crashing immediately

01:15 <mjg> if there is no time window for this to get a false positive that's finew ith me

01:20 <mjg> haha wtf, see aix-perl.pl in the repo

01:20 <mjg> holy shit

01:21 <mjg> this has to be SO BAD

01:21 <heat> what's it doing?

01:22 * heat can't read perl

01:22 <mjg> + foreach my $pid (@proc){

01:22 <mjg> + my $command = "/usr/bin/procstack $pid";

01:22 <mjg> + print `$command 2>/dev/null`;

01:22 <mjg> + }

01:22 <mjg> you don't need to know perl for this one

01:22 <mjg> and this happens in a loop

01:22 <heat> getting stacks for userspace processes?

01:23 <mjg> getting stacks for everything i would think

01:23 <mjg> but this is so many forks and execs it has to be disfiguring the shit out of everything

01:23 <mjg> also how often can you sample

01:23 <mjg> defo not 1000 per second

01:24 Persephone has joined #osdev

01:27 Iris_Persephone has quit [Ping timeout: 252 seconds]

01:43 vdamewood has quit [Read error: Connection reset by peer]

01:45 vdamewood has joined #osdev

01:54 gog has quit [Ping timeout: 246 seconds]

02:06 frkzoid has joined #osdev

02:35 srjek has quit [Ping timeout: 244 seconds]

02:40 elastic_dog has quit [Ping timeout: 244 seconds]

02:45 elastic_dog has joined #osdev

02:53 nexalam_ has joined #osdev

02:53 Persephone is now known as Iris_Persephone

02:56 nexalam__ has quit [Ping timeout: 260 seconds]

02:56 Iris_Persephone has quit [Quit: Leaving]

02:56 freakazoid332 has joined #osdev

02:59 frkzoid has quit [Ping timeout: 244 seconds]

03:02 freakazoid332 has quit [Ping timeout: 260 seconds]

03:08 epony has quit [Ping timeout: 252 seconds]

03:13 saltd has quit [Remote host closed the connection]

03:36 saltd has joined #osdev

04:07 [itchyjunk] has quit [Remote host closed the connection]

04:32 vdamewood has quit [Read error: Connection reset by peer]

04:33 vdamewood has joined #osdev

04:36 <heat> mjg, flamegraphs need a stable frequency right?

04:36 <heat> I can't just plop a sampling point

04:36 <heat> yeah that sounds about right

04:41 <heat> ...or does it

04:42 <heat> yeah probably

04:42 <heat> else it wouldn't really correlate to time

04:47 <kof123> flamegraphs, like winamp ? j/k...but not really...

04:50 <mrvn> heat: you could divide the amounts by the time interval to normalize it. But the flames wouldn't fade out right unless you tell it the time passed too

04:50 <mrvn> you loose resolution on the flame then when you add samples too slowly.

05:41 epony has joined #osdev

05:57 moberg has quit [Quit: Disconnecting]

06:02 moberg has joined #osdev

06:31 Andrew is now known as haxcpu

06:47 GeDaMo has joined #osdev

06:57 <heat> https://gist.github.com/heatd/07ac7ba0be21e5e90a5ae1b56e969148 last flamegraph of the night (now day)

06:57 <bslsk05> gist.github.com: onyx-flame.svg · GitHub

06:57 <heat> without the staggering 70% idle

06:58 <heat> now I just need a way to get info about locks and sleeping processes :v

07:01 <heat> some of this stuff is making me real worried though

07:01 <heat> particularly pselect6 being 50%+ of the samples

07:02 <heat> and malloc + spinlocks inside malloc popping up there

07:02 <heat> I know musl's malloc was bad but this bad? just 4 threads :|

07:16 heat has quit [Ping timeout: 260 seconds]

07:17 <mrvn> it might say it's in pselect6 when it's sleeping there.

07:17 <mrvn> or rotating in a spin lock.

07:20 <mrvn> This is how a flame graph should look like: https://www.youtube.com/watch?v=jUJiULU4i0k

07:20 <bslsk05> 'XFlame: From the XScreenSaver Collection, 1999.' by yesthatjwz (00:02:00)

08:08 wootehfoot has joined #osdev

08:25 wootehfoot has quit [Quit: Leaving]

08:38 vdamewood has quit [Read error: Connection reset by peer]

08:39 vdamewood has joined #osdev

08:55 bauen1 has quit [Ping timeout: 264 seconds]

09:07 saltd has quit [Remote host closed the connection]

09:15 saltd has joined #osdev

09:31 CryptoDavid has quit [Quit: Connection closed for inactivity]

10:09 vdamewood has quit [Read error: Connection reset by peer]

10:10 vdamewood has joined #osdev

10:14 gog has joined #osdev

10:55 lkurusa has quit [Quit: I probably fell asleep (or went out). Who will ever know.]

10:57 bauen1 has joined #osdev

11:23 <zid> https://faultlore.com/blah/oops-that-was-important/

11:23 <bslsk05> faultlore.com: Compiler Optimizations Are Hard Because They Forget - Faultlore

11:28 <kazinsal> me: man the early research unix filesystem sucks, I can probably design something better off the top of my head

11:29 <kazinsal> also me: [accidentally shits out a design that looks eerily close to the minix filesystem]

11:37 <clever> > And really at what point does the cost outweigh the benefit and aaAAaaAAAaAAAAA!!!!!

11:37 <clever> lol :D

11:41 <MelMalik> kazinsal, hehe

11:42 <zid> I already designed the best possible filesystem, dw

11:42 <sham1> kazinsal: thus is the hubris

11:43 <MelMalik> you probably can, though, just don't think that you can

11:45 <kazinsal> the research unix fs(5) optimizes for fast search of free block and inode lists through the use of a block list in the superblock + additional indirect blocks whereas the minix fs(5) -- and mine, apparently -- just allocates a contiguous block bitmap on disk and uses the contiguous list of inodes themselves to signal free/used status

11:51 <MelMalik> i wish you great luck

11:52 <gog> meow

11:53 * kazinsal gives gog headpats

11:53 * gog prrr

11:54 <gog> i'm doing inadvisble stuff with c++ and it's fun

11:54 <gog> namely, using c++ at all

11:58 <clever> kazinsal: zfs doesnt store the actual state of the free space map, but rather has a log, where every allocation/free is recorded

11:59 <clever> and when loading a spacemap, it creates a set of buckets, with holes of 2^n long in ram

11:59 <kazinsal> this filesystem is intended to run on an 8088 lol

11:59 <clever> ah, if your low on ram, yeah

12:41 qookie has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

12:41 Arsen has quit [Quit: Quit.]

12:41 qookie has joined #osdev

12:41 Arsen has joined #osdev

12:45 vdamewood has quit [Read error: Connection reset by peer]

12:46 vdamewood has joined #osdev

12:47 StoaPhil has quit [Quit: WeeChat 3.6]

12:48 [itchyjunk] has joined #osdev

13:04 SpikeHeron has joined #osdev

13:26 dude12312414 has joined #osdev

13:26 srjek has joined #osdev

14:08 demindiro has joined #osdev

14:37 dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]

14:43 demindiro has quit [Ping timeout: 252 seconds]

16:15 epony has quit [Remote host closed the connection]

16:15 epony has joined #osdev

16:18 <vin> Is it possible to make writes slower (by injecting some delay) than reads to memory? For example memcpy(region2, region1, size) should be 2x faster than memcpy(region1, region2, size).

16:38 <sortie> vin, there's some caching options available e.g. through the Page Attribute Table on x86, which may be interesting or useful to you. Can you tell us more about your use case?

16:39 <sortie> E.g. write combing is usually turned on for RAM by default, so writes happen asynchronously in deduplicated combined manners. For video memory, e.g., you would often want to turn it on to make writing to it more efficient. Meanwhile you definitely don't want it for memory mapped registers.

16:39 <sortie> *write combining

16:41 <vin> Thanks I'll check it out sortie. I am trying to emulate a device where reads are faster than writes.

16:44 wootehfoot has joined #osdev

16:44 wootehfoot has quit [Remote host closed the connection]

16:47 wootehfoot has joined #osdev

16:47 <vin> Actually its not the latency that I want to change but rather the bandwidth. Where write latency should be worse of than read.

16:48 <vin> I am thinking of running parallel workloads to keep the memory controller when write operations are happening (inducing the slow down)

16:48 frkazoid333 has joined #osdev

16:49 <sortie> If you turn off write combining, you might slow them down considerably

16:49 <sortie> But idk feels a bit futile

16:50 <jimbzy> Hey sortie!

16:50 <jimbzy> How have you been?

16:51 <sortie> eyyo jimbzy!

16:51 <sortie> Oh man been out there traveling the world, partying to the best music :)

16:52 <jimbzy> Nice!

16:52 <jimbzy> Sounds like time well spent

16:52 <sortie> Plus you know working a bunch and even doing some osdev :)

16:53 <sortie> I just finished up my new init system, going through code review now!

16:53 <sortie> Only took me like half a decade to finish it

16:54 <jimbzy> Sweet!

16:54 <jimbzy> Hey, it's just sands in the hourglass, amiright?

16:55 <sortie> Wait who told you that's privileged information!

16:56 <jimbzy> XD

16:57 gxt has quit [Ping timeout: 258 seconds]

16:59 xenos1984 has quit [Ping timeout: 246 seconds]

16:59 xenos1984 has joined #osdev

17:10 wootehfoot has quit [Ping timeout: 246 seconds]

17:11 <geist> woot

17:14 <gog> woot

17:14 xenos1984 has quit [Ping timeout: 246 seconds]

17:21 <zid> ..w-woot?

17:21 <zid> I am like the dog wagging its tail when the humans get good news

17:21 <zid> just happy to be included

17:24 <GeDaMo> I bought some RAM, you can 'woot' about that if you want :P

17:25 <zid> yay, did you buy me any?

17:25 <GeDaMo> Certainly didn't

17:25 <zid> I need 2x8GB PC3-14900

17:25 <zid> I don't have any paypal dollars atm

17:25 <zid> and amazon wants triple the price

17:27 wootehfoot has joined #osdev

17:27 <zid> GeDaMo: you're supposed to brag about what you got

17:27 <GeDaMo> I bought 16GB, now I have 20GB! :P

17:28 <zid> sick

17:28 <zid> dual channel is just a scam by big ram to sell you more dimms afterall

17:28 <gog> i added 8 to my machine last week

17:29 <zid> I'll trade you 32GB of DDR4 for 4x8GB DDR3 41866MHz URDIMMs

17:29 <gog> i don't have

17:29 <zid> 1866*

17:29 <zid> yea nobody does unless they happen to own a mac pro 2013

17:29 <gog> sorry :(

17:29 <GeDaMo> This is what I bought https://www.amazon.co.uk/Timetec-PC3-12800-Unbuffered-Desktop-Computer/dp/B01N7WGQBX/ref=psdc_340832031_t1_B01NAXC8AQ

17:29 <bslsk05> www.amazon.co.uk: 503 - Service Unavailable Error

17:30 <zid> 12800? didn't even go for the 14900?

17:30 <zid> also that's a fucking lot of money or that kit

17:30 xenos1984 has joined #osdev

17:31 <zid> https://www.ebay.co.uk/itm/204082522055

17:31 <bslsk05> www.ebay.co.uk: SK Hynix 2x8GB (16GB Total) 2Rx8 PC3L 12800U Desktop Memory | eBay

17:31 <gog> £22.33 delivery

17:31 saltd has quit [Quit: joins libera]

17:31 <zid> you're in iceland

17:31 <zid> so.. I bet

17:31 <gog> oh right

17:31 <gog> might try to convince my boss to let me build a workstation

17:31 <gog> on his dime

17:32 <zid> Your workstation needs 32GB of 32GB URDIMMs btw

17:32 <gog> yes

17:32 <zid> for the.. monitor

17:32 <gog> yes the monitor memory

17:32 <zid> but you should definitely let me do a professional inspection first

17:32 <gog> so it can show images faster

17:32 wootehfoot has quit [Read error: Connection reset by peer]

17:32 <zid> free of charge

17:32 <zid> My monitor needs ECC tbh, it snows for a bit when cold

17:33 <zid> The CRT next to it is still perfect though :P

17:36 frkazoid333 has quit [Read error: Connection reset by peer]

17:57 demindiro has joined #osdev

17:57 isaacwoods has joined #osdev

18:08 heat has joined #osdev

18:09 <heat> whats up bozos

18:12 <mjg> heat: the fuq were you doing to get pselect high on the profile

18:12 <mjg> is that from a gmake run?

18:12 kof123 has quit [Ping timeout: 268 seconds]

18:12 <heat> lmao

18:12 <heat> make -j4

18:12 <heat> on will-it-scale, output all piped to devnull

18:13 <heat> there's a lot of shit I know is slow (like mmap and munmap)

18:13 <heat> my vm rb tree isn't great

18:13 <heat> needs some love

18:14 <mjg> that pselect shit is terribad

18:14 <mjg> note gmake uses a shared pipe between all children

18:14 <mjg> as in all gmake procs are pounding on it

18:15 bauen1 has quit [Ping timeout: 246 seconds]

18:15 <heat> i did find some good optimization on it

18:15 <heat> that svg is better than what I had lmao

18:15 <heat> I think malloc is really struggling there

18:16 <mjg> https://github.com/heatd/Onyx/blob/master/kernel/kernel/spinlock.cpp#L40

18:16 <bslsk05> github.com: Onyx/spinlock.cpp at master · heatd/Onyx · GitHub

18:16 <mjg> you want this instead: do { cpu_relax(); } while (....);

18:17 <mrvn> mjg: it's a token system. There are 4 tokens in the pipe and every make fork waits to read on token, compiles and writes it back.

18:17 <mjg> mrvn: i'm painfully aware

18:17 <mjg> this does not scale for shit on bigger systems

18:17 <mjg> should not be visible with 4 workers tho:-P

18:17 <mrvn> usualy make jobs don't just last for miliseconds. :)

18:18 <heat> https://gist.github.com/heatd/442b63f1f903ed97347e47878932ee83

18:18 <bslsk05> gist.github.com: onyx-flame-vfsmix.svg · GitHub

18:18 <heat> mjg, think that makes a difference?

18:18 <mjg> at this scale it wont, your spinlocks don't scale anyway

18:18 <mjg> but that's the idiom

18:18 <zid> hey, I'm not an idiom, YOU'RE an idiom

18:19 <mjg> you literally just failed the op, there is 0 reason to instantly load again

18:20 <mjg> wow, wtf

18:20 <mjg> @ vfsmix fg

18:20 <mjg> do you have a global rwlock for lookups or something?

18:21 <heat> no

18:22 <heat> rwlock for each dir

18:22 <heat> they're contending all on /tmp

18:22 <mjg> ok, all these use the same dir anywya

18:22 <mjg> but it would be of note if they also fuck each other up when reaching /tmp

18:23 <mjg> ok i see

18:23 <heat> it's really worrying to me how much it's spinning while dequeuing itself when waking up

18:23 <mjg> you instnatnly go to sleep when faced with contention

18:23 <mjg> this alone is a huge problem

18:23 <mjg> in rw_lock_tryread:

18:23 <mjg> do

18:23 <mjg> {

18:23 <mjg> l = lock->lock;

18:23 <mjg> this load should 1. use atomics 2. be moved prior to the loop

18:24 <mjg> the cmpxchg op updates the value for you

18:24 <mjg> that's minor, what you really need to change is the going off cpu stuff

18:24 <heat> i have some spinning in mutex.cpp

18:24 <heat> might be able to port it to rwlocks too

18:24 <zid> My suggestion is tr/.*/return -EAGAIN/g

18:24 <mjg> do you have means to safely access lock owner?

18:25 <mjg> to check whether it is off cpu

18:25 <mjg> oh shit

18:25 <heat> hrm, no

18:25 <mjg> rwlock_prepare_sleep

18:25 <mjg> that's what you are leading with?

18:25 <heat> mutex does tho

18:25 <mjg> that's openbsd quality man

18:25 <mjg> :)

18:26 <heat> i do a fast path?

18:26 <mjg> oh, you sneaked it on top, my bad

18:26 <mjg> fast path should not be mixed with the slow path man

18:26 <mjg> you are partially defeating the purpose

18:27 <mjg> anyway, general mechanism aside, this will mostly clean itself up if you add adaptive spinning

18:27 <heat> something that's worrying is how sched_is_preemption_disabled so expensive

18:27 <mjg> now waiting for *writers* is easy since you can know what they are

18:28 <heat> it's a fucking trivial function

18:28 <heat> 0xffffffff8019de20 <+0>: push %rbp

18:28 <heat> 0xffffffff8019de21 <+1>: mov %rsp,%rbp

18:28 <heat> 0xffffffff8019de24 <+4>: mov %gs:0x7fe624fc(%rip),%rax # 0x328 <preemption_counter>

18:28 <heat> 0xffffffff8019de2c <+12>: test %rax,%rax

18:28 <heat> 0xffffffff8019de2f <+15>: setne %al

18:28 <heat> 0xffffffff8019de32 <+18>: pop %rbp

18:28 <mjg> the real problem is readers, since you hae no idea if all of them are on cpu

18:28 <heat> 0xffffffff8019de33 <+19>: ret

18:29 <mjg> is that the real func tho?

18:29 <mjg> i presume you are not really tracing interrupts here

18:29 <heat> how is it not the real func?

18:29 <heat> wdym tracing interrupts?

18:29 <mjg> the way i see it you may be doing smething really nasty in there and your code will mistakenly sample code which got irq'ed

18:29 <heat> I trace whenever my timer fires

18:29 <mjg> do you handle interrupts on the same stack?

18:30 <heat> yes

18:30 <mjg> hm

18:30 <mjg> then it is indeed peculiar

18:30 <mjg> are you using kvm here? it is plausible perf from the host iwll be able to shed some light

18:31 <heat> yes I am and no I can't use it for some reason

18:31 <mjg> it is a hit or miss whether it works, when it does, it can provide you with ips

18:31 <mjg> right, i never found out why sometimes does not work and then it stopped being a problem for me :)

18:31 <heat> heatd about intel pt, maybe it can give me more data?

18:32 <heat> heard*

18:32 <mjg> i would say the lock stuff is a problem and has a known path forward

18:32 <mjg> so i would take care of it first

18:32 <mjg> maybe the above will make itself clear later

18:32 <heat> anyhow, I like how this case is mostly action, no bad blocking problems

18:32 <heat> the make -j4 + gcc one is nasty

18:33 <heat> 70% idle :|

18:33 <mjg> it's probably idle from you going off cpu

18:33 <mjg> :)

18:33 <mjg> with lock waiters

18:34 <heat> yea

18:34 <heat> which is why I kinda want to trace that

18:34 <mjg> i would say should not be very hard

18:34 <mjg> the general mechanism you used for sampling can be repurposed

18:35 <mjg> you store stacktraces + off cpu time

18:35 <mjg> and maybe lock addr

18:35 <mjg> then again you export this into a flamegraph

18:35 <mjg> except instead of counts you get cumulative time

18:35 <heat> + off cpu time? what do you mean

18:36 <mjg> before = timstamp(); go_off_cpu(); off_cpu_time = timestamp() - before;

18:36 <heat> yeah but where would I store this?

18:36 <heat> not in this current system I have for sure

18:36 <mjg> you copy-patse the code you have for on cpu sampling

18:37 <mjg> you can sack space for 1 frame and just replace it with the time

18:37 <mjg> and have your tooling know about

18:37 <mjg> it

18:37 <heat> hrm

18:37 <heat> i'll have a think about it

18:38 <heat> I was thinking about getting Real Tracing(tm)

18:38 <mjg> that sounds like several weeks of work

18:38 <mjg> :->

18:38 <heat> is it?

18:38 <mjg> i would say best bang for the buck right now is the above, but you do you

18:38 <heat> get a ring buffer and write events to it

18:39 <mjg> i thought you want ebpf-esque soution

18:39 <heat> noooooo

18:39 <mjg> being real tracing 'n shit

18:39 <heat> that's too fancy

18:39 <mjg> well my proposal is above, i think very simple to tackle onto your existing code

18:39 <mjg> but it's your pick

18:40 <heat> btw I don't know if you're looking at the vfsmix svg still but most of my write(2) time is in malloc

18:40 <heat> :v

18:40 <heat> I may have outgrown this garbage malloc

18:40 <mjg> i'm kind of path lookup biased

18:40 <mjg> ;)

18:41 <heat> if I went with your approach, I couldn't use flamegraphs right?

18:41 <mjg> yep looks pretty crap

18:41 <mjg> why not?

18:41 <mjg> you do realize all the flamegraph stuff is just a fancy presentation of whatever you stacked up in there

18:41 <heat> would replacing the frequency with the time work?

18:42 <mjg> the numbers are whatever teh fuck you please

18:42 <mjg> there are even flamegraphs for file sizes

18:42 <mjg> https://people.freebsd.org/~mjg/poudriere-offcpu.svg

18:43 <mjg> at the bottom i added a special 'frame' "all", the value is total sleep time

18:43 <mjg> above that is wait channel

18:43 <mjg> again, value is sleep time on that fucker

18:43 <mjg> and above that are stacktraces

18:43 <heat> that graph screams "prince"

18:43 <mjg> by "convnetion" you add --color=io and that's how itl ooks like

18:43 <heat> where do you add that? flamegraph.pl?

18:44 <mjg> the --color arg to flamegraph.pl

18:44 Terlisimo has quit [Quit: Connection reset by beer]

18:44 <mjg> all the rest is just hte input file

18:44 <mjg> lemme generate an example real quick

18:45 <mjg> well will be 5, have to boot my test box

18:50 <mjg> heat: https://people.freebsd.org/~mjg/.junk/offcpusample.txt

18:50 <mjg> heat: perl ~/mjg/FlameGraph/stackcollapse.pl | perl ~/mjg/FlameGraph/flamegraph.pl --color=io > out-off.svg

18:51 <heat> what's that name above the number?

18:51 <mjg> in most cases lock name

18:51 <mjg> in others it is wait channel

18:51 <mjg> you can just prop your lock address for the time being

18:51 <heat> also really important question

18:52 <heat> why are lots of vfs functions in freebsd all-caps

18:52 <heat> like VOP_LOCK1_APV

18:52 <mjg> i don't know how tha came to be, i suspect they started as macros

18:52 <heat> why were they not changed back?

18:52 <heat> s/back//

18:53 <mjg> that's like the smallest problem about them

18:53 <heat> yes but erm

18:53 <heat> is weird

18:53 <mjg> i did not even think to do something like that

18:53 <mjg> instead of talking shit about vfs

18:53 <heat> maybe a genius

18:53 <mjg> vmocol

18:53 <mjg> kernel`vm_object_collapse+0x13b

18:53 <mjg> the crap i complained about already visible

18:53 <mjg> the trace is from building the kernel

18:54 <mjg> heat: a safe bet is lsd

18:57 GeDaMo has quit [Quit: Physics -> Chemistry -> Biology -> Intelligence -> ???]

18:57 Terlisimo has joined #osdev

18:59 <mjg> re sched_is_preemption_disabled, i wonder if the func is fine, but you are just calling it a metric fuckton

18:59 <mjg> also why does it exist to begin with

19:00 <mjg> as in the code should know

19:03 <mjg> hm now that said it, are you sure you are allocating per-cpu memory in a manner which avoids cache bouncing with other threads?

19:04 <mjg> bare minimum you want per-cpu bufs with sizes rounded up to a multiple of 128

19:05 <mjg> oh man you use linker sets for this fuckery?

19:05 <mjg> i would argue waste of cpu time

19:05 <mjg> [not that freebsd is better :]]

19:09 <mjg> heat: ok man, your cred code needs a revamp

19:09 <mjg> heat: stock standard approach is to have copy-on-write objs

19:10 <mjg> heat: your rw lock around them is a serious problem

19:10 <mjg> heat: well will be once you fix the current stuff :)

19:21 <zid> we don't take kindly to people who know what they're talking about mjg

19:28 demindiro has quit [Ping timeout: 252 seconds]

19:33 <mrvn> Is there something worse than ordering food online, getting a ETA with count down clock, confirmation that the food is on the way and yet nothing arrives. Do they just fake all the status updates, seriously?

19:35 <gog> grubhub?

19:35 <mrvn> lieferando. They even have a marker on the map where the delivery bike is supposed to be.

19:35 <gog> hm

19:36 <gog> they might be faking it

19:36 <mrvn> Oh, the status has changed: "Moaz ist mit Deiner Bestellung bei Liki Burger auf dem Weg." Was just confirmed to be on the way before.

19:36 <gog> auf dem Weg

19:37 <gog> possible that the driver jumped the gun on confirming pickup

19:37 <gog> or bicyclist

19:38 <mrvn> now the bike is moving....

19:38 <mrvn> If only the food where free if they take too long to deliver.

19:57 bauen1 has joined #osdev

20:06 freakazoid332 has joined #osdev

20:10 C-Man has joined #osdev

20:14 gxt has joined #osdev

20:57 freakazoid332 has quit [Ping timeout: 244 seconds]

21:13 frkzoid has joined #osdev

21:15 buffet has left #osdev [The Lounge - https://thelounge.chat]

21:18 frkzoid has quit [Ping timeout: 244 seconds]

21:18 <mjg> mrvn: these statuses are mostly fake

21:37 vdamewood has quit [Read error: Connection reset by peer]

21:37 xenos1984 has quit [Read error: Connection reset by peer]

21:38 opal has quit [Ping timeout: 258 seconds]

21:38 vdamewood has joined #osdev

21:53 scoobydoo has quit [Ping timeout: 244 seconds]

21:54 scoobydoo has joined #osdev

21:55 xenos1984 has joined #osdev

21:56 freakazoid332 has joined #osdev

22:10 DanDan has quit [Ping timeout: 252 seconds]

22:24 scoobydoo_ has joined #osdev

22:24 scoobydoo has quit [Ping timeout: 265 seconds]

22:24 scoobydoo_ is now known as scoobydoo

22:37 freakazoid332 has quit [Ping timeout: 244 seconds]

22:44 <heat> mjg, sorry for ignoring you, had to sleep what I didn't sleep last night

22:46 <heat> linker sets?

22:46 <heat> you mean a linker section?

22:46 <mjg> and all this time i thought you are furiously coding off cpu tracking

22:46 * mjg is disappointed

22:46 <heat> i was furiously coding on-cpu tracking last night til 8am

22:47 <mjg> on a serous note i wrote some genuine feedback

22:47 <mjg> with the cred stuff being lowest priority, but definitely to-be-fixed

22:48 <heat> think that will screw something up?

22:49 <heat> what's the alternative, refcounted struct cred + cmpxchg on write?

22:49 <heat> i thought about an rw lock because it's genuinely a case where you're not probably writing much

22:50 <heat> hence no need to worry about writer starvation

22:50 <mjg> copy-on-write

22:50 <mjg> free access at all times, no need to synchro squat

22:50 <mjg> apart from one special case

22:50 <heat> ...which sounds exactly like what I'm thinking

22:50 <mjg> typical approach is you check at user<->kernel boundary if your creds are current

22:50 <heat> hm

22:51 <mjg> performance problems of rw locking aside, you are establishing a lock ordering

22:51 <mjg> which, if you add lock ordering verification, i guarantee will eventually sohw deadlocks

22:51 <mjg> for example if you happen to hold this across i/o which writes to something which gets a page fault

22:52 <mjg> an improvement on checking for creds specifically is recognizing there may be other COW structs to sync

22:52 <mjg> and instead having a struct cow_objs { ... } thing you have a pointer to

22:52 <heat> why deadlocks?

22:52 <mjg> or some form of a generatin counter

22:53 <mjg> ok, just trust on me on this one, if you get a big enough kernel, all possible lock orderings which show up are pretty funny

22:54 <heat> oh i know

22:55 <mjg> did you know SOLARIS, world-famed SMP kernel does not have a lock ordering verification facility?

22:55 <mjg> while linux, a hippie-written kernel, does?

22:55 <heat> lmao

22:56 <heat> damn communists

22:56 <mjg> i'm guessing solaris kernel devs just don't write deadlocks

22:56 <mjg> 's why

22:57 <heat> world-renowned SMP experts

22:58 <heat> oh yea

22:58 <heat> something I want to ask

22:58 <heat> what's with unix and genunix?

22:58 <mjg> https://www.illumos.org/issues/13243 btw

22:58 <bslsk05> www.illumos.org: Bug #13243: deadlock on ZFS during concurrent rename and mkdir - illumos gate - illumos

22:58 <heat> are they still separate in modernish bsds?

22:58 <mjg> i don't know what's up with the split

22:59 <mjg> in bsd you just have kernel

23:00 <mjg> so apparently it was already reported 6 years ago by another freebsd dev

23:00 <mjg> 2 years ago i ran into it and gave them a reproducer

23:00 <mjg> they reprod 1 year ago

23:00 <mjg> no fix in sight, at least none mentioned

23:00 <mjg> :>

23:01 <mjg> would be funny to check is solaris proper still has the problem

23:01 <heat> in the example dtrace in the flamegraph repo there's still a separation

23:01 <mjg> i know

23:01 <mjg> i don't know why they roll with it

23:01 <heat> solaris proper? does that even handle concurrency? :P

23:03 <heat> btw thanks for the help and tips

23:03 <heat> really helpful :)

23:04 <mjg> i can rant, but it is getting late

23:04 <mjg> :[

23:04 <heat> lmao

23:04 <mjg> i know of some actual fixups to smp in solaris

23:04 <mjg> after it diverged from illumos

23:04 <mjg> some of it is stuff they should have done years ago

23:04 <mjg> and other is a combination of a smart idea implemented in a stupdi manner

23:04 <mjg> :[

23:04 hmmmm has joined #osdev

23:05 <mjg> heat: want a rant here it is https://www.illumos.org/issues/13057

23:05 <bslsk05> www.illumos.org: Bug #13057: pessimal mutex behavior - illumos gate - illumos

23:06 <heat> you have rants for days don't you

23:06 <geist> yeah and honstly.. i dunno. there's mmore to life than speed

23:06 <vin> sortie: Replying to you late. Wouldn't dissabling write combining make all writes slower. Anychance writes to a particular memory region can be made slower?

23:06 nexalam__ has joined #osdev

23:06 <geist> everyone has their thing, but performane performance prformance scaling, etc is not everything

23:07 <vin> For context: I want to limit just the write bandwidth made to far numa node memory.

23:08 nexalam_ has quit [Ping timeout: 246 seconds]

23:10 <mjg> geist: well enjoy your openbsd manuals in a vm man :)

23:10 * geist shrugs

23:10 <geist> just sayin

23:10 <mjg> i have my kinks, you have yours

23:11 <mjg> look i'm happy to shut about solaris if it is seen as a problem

23:11 <mjg> believe it or not :->

23:12 <geist> well anyway

23:12 <geist> it's late!

23:13 <heat> noooo

23:13 <heat> i like this channel's diversity

23:13 <geist> okay, sorry never mind

23:14 <heat> what have you been up to geist?

23:14 <geist> oh just general home maintenance, etc

23:14 <geist> thinking of doing some of FS hackery here in a sec

23:15 <heat> nice nice

23:15 <heat> still fat32?

23:15 <geist> yeah gotta finish it up

23:16 <geist> hate write wrorking the other day, now have to wire up all the remaining ops, run some stress tests and declare it v1

23:16 <mjg> got fsx on it?

23:16 <heat> do you have some gotos for stress tests?

23:16 <geist> what is fsx?

23:16 <clever> https://github.com/librerpi/lk-overlay/blob/master/lib/cksum-helper/cksum.c#L267-L272

23:16 <bslsk05> github.com: lk-overlay/cksum.c at master · librerpi/lk-overlay · GitHub

23:16 <heat> apple's shtick

23:17 <heat> i linked it the other day

23:17 <geist> oh. then no.

23:17 <clever> geist: if you pop this module into your build, and change the fstype on 262, it should help to stress test fat

23:17 <clever> it uses psci to shutdown qemu automatically when done

23:17 <geist> sure. but.. uh frankly that's a really lame stress test

23:17 <vin> what changes are you making to the fs geist ?

23:18 <geist> oh, just implementing it

23:18 <geist> a driver, that is

23:18 <heat> i stole fsx from apple and fsstress from ltp to stress my stuff

23:18 scoobydoo_ has joined #osdev

23:18 <vin> oh a driver for a new device?

23:18 <heat> a driver for the filesystem

23:18 <geist> no, a FS driver for FAT*

23:18 scoobydoo has quit [Ping timeout: 260 seconds]

23:18 scoobydoo_ is now known as scoobydoo

23:18 <clever> geist: what might you do to improve the stress test, maybe fire up threads and sha256 in parallel?

23:19 <geist> implement write

23:19 <clever> ah yes

23:19 <geist> that's not a stress test, that's simply a validation test

23:19 <clever> yep

23:19 <geist> its useful, but not really what i'd consider something that's stressing a fs

23:19 <mjg> want to seriously stress this -- xfstests

23:19 <heat> I've thought about that but it has too many deps

23:19 <mjg> has a barrage of tests, but i don't remember how portable it is

23:20 <heat> fsstress is something I can trivially add to my src/ and make it run on every CI

23:20 <clever> yeah, i had totally forgotten about write support, and that is something i also want to get working

23:20 <geist> huh i kinda wonder if fsx is a derivative of some fs code i wrote when i was at apple on the fs team

23:20 <heat> https://github.com/apple/fstools/blob/master/src/fsx/fsx.c

23:20 <heat> take a look

23:20 <bslsk05> github.com: fstools/fsx.c at master · apple/fstools · GitHub

23:20 <geist> i remember there being basically zero stress tests, so i started building a thing and then handed it off when they pulled me into iphone

23:20 <heat> also you were on the fs team?

23:21 <heat> you've been everywhere! :P

23:21 <vin> FAT32 did not have a driver? I am trying to understand the purpose behind the driver

23:21 <geist> ah no that predates it

23:21 <heat> "gcc -arch ppc -arch i386 -arch ppc64 -arch x86_64" what are these -arch switches?

23:22 <heat> are they for gcc-disguised clang?

23:22 <geist> ah actually yeah i did have a bit to do with this

23:22 <geist> i was there are 2005, write some standalone tool that did something similar. looking at the history of it they rolled some of the machinery into this in 2006

23:23 <geist> yeah this is sort of a distant derivative of some stress test code i wrote at the time

23:23 <heat> very cool

23:23 <geist> well portions of

23:24 <geist> most of it predates it, if nothing else because the style is not mine at all

23:24 <mjg> heh solid

23:24 <mjg> well your old code can come back to haunt you in a new way then

23:24 <heat> lk isn't posix

23:24 <heat> might be hard to do

23:24 <geist> but basicall what i wrote is fairky simple. had somehting like it at Be: spawn a crapton of threads, each thread goes and does a bunch of random stuff, run until something falls over

23:25 <geist> oh gosh no am i using any apple code in my projects

23:25 <geist> *that* is a stress test. try to find all the edge conditions

23:25 <heat> :D

23:25 <geist> i remember writing this tool in 2005 and it would almost instantly fatally corrupt HFS+. a bug was found and fixed

23:26 <geist> i remember the tseting folks were like 'oh this is great!' and they added it to their stuff

23:26 <geist> but it's the usual issue i have with unit testing vs stress testing. lots of places uit test, but dont stress test, because the latter takes time and has harder to define end states

23:27 nexalam__ has quit [Quit: Leaving]

23:27 <heat> tbf fsx.c is widely considered more of a unit-test thing these days

23:27 <geist> it's easy to run a unit test every time someone makes a CL, but much harder to say run a bank of machines, beating up on the software, trying to find edge cases

23:27 <mjg> fwiw freebsd has a stress testing machinery and it is pretty good

23:27 <heat> https://github.com/google/file-system-stress-testing

23:27 <bslsk05> google/file-system-stress-testing - A tool that can be used to stress test POSIX filesystems. (26 forks/88 stargazers/Apache-2.0)

23:27 <mjg> found tons of bugs

23:28 <heat> mjg, ever used that? ^^

23:28 <heat> I don't know how good it is but it seems to be made for freebsd

23:28 <geist> mjg: yeah i think OS projects are generally more amenable to stress testing, since there's essentially an infinite number of monkeys

23:28 <geist> vs a company where time is money

23:28 <mjg> heat: no, looking

23:28 <mjg> i wanted to note though that existence of instrumanetation + things like syzkaller definitely changed the landscape

23:28 <vin> https://github.com/google/file-system-stress-testing

23:29 <mjg> heat: heh it even has a mnual for freebsd

23:29 <geist> yeah. what i like to look for in a FS implementation is all the internal locking issues, and pushing all of the ops to MT collisions, etc

23:30 <mjg> wait, that's a stale repo

23:30 <mjg> what's the current one

23:30 <geist> also generating crazy lomg journal transations that blow up internals

23:30 <geist> rename() in particular is hell

23:30 <heat> mjg, seems to have been frozen, I don't know

23:30 isaacwoods has quit [Quit: WeeChat 3.6]

23:30 <geist> ie, via a crapton of random stuff end up with huge, incredibly fragmented files, then simultaneously rename A onto B onto C while C is renamed into A, etc

23:30 <geist> that sort of stuff finds all sorts of edge cases and failure cases

23:31 <mjg> ye, i even linked to [redacted] system deadlocking with it

23:31 <heat> [TOP SECRET]

23:31 <geist> anyway, that kinda stuff you can't easily unit test, since it relies on lots of heavily threaded slams on a machine

23:32 <mjg> https://www.netbsd.org/~riastradh/tmp/dirconc.c

23:32 <geist> sometimes newer/faster hardware even hids the problems. sometimes handy to test on slow ass hardware, where races are wider

23:32 <mjg> there is the funny technique were you inject delays to artificially expand race windows

23:32 <mjg> i mean tooling is doing it for you

23:32 <geist> (qemu using TCG is actually fairly good for severe SMP racy stuff)

23:33 <mjg> heat: so does onyx survive https://www.netbsd.org/~riastradh/tmp/dirconc.c ? :)

23:33 <geist> yah single threaded qemu does by default, sicne it context switches between emulated cpus. has gigantic race windows as a result

23:33 <vin> https://github.com/utsaslab/crashmonkey this is a reasonably good crash testing tool that was published recently

23:33 <bslsk05> utsaslab/crashmonkey - CrashMonkey: tools for testing file-system reliability (OSDI 18) (27 forks/176 stargazers/Apache-2.0)

23:34 <geist> anyway yeah i should try to port one of these things to LK, or at least take inspiration. passing that is what i'd consider a V1 release of a FS

23:34 DanDan has joined #osdev

23:34 <heat> mjg, haven't tried yet

23:35 <heat> i'm tackling the budget-ass wait tracing rn

23:35 <mjg> +1

23:36 <heat> i can't stress how much I love looking at flamegraphs and tracing shit

23:36 <heat> it just looks soo good

23:36 <mjg> my man!

23:36 <heat> these svgs are particularly satisfying

23:37 <heat> you can even click shit

23:37 <vin> +1

23:37 <mjg> this is not a safe space for such claims though

23:39 <mjg> heat: you should read up on brendan gregg's blog then

23:39 <mjg> in particular there graphs for sleep and wakeup stuff

23:39 <vin> +1 https://www.brendangregg.com/flamegraphs.html

23:39 <bslsk05> www.brendangregg.com: Flame Graphs

23:40 <mjg> https://www.brendangregg.com/blog/images/2016/tar-offwaketime-flamegraph.svg

23:40 <mjg> how you go off and how you get back

23:40 <mjg> https://www.brendangregg.com/blog/2016-02-01/linux-wakeup-offwake-profiling.html

23:40 <bslsk05> www.brendangregg.com: Linux Wakeup and Off-Wake Profiling

23:42 <vin> For visualizing low latency measurements I like dick sites graphs more https://queue.acm.org/detail.cfm?id=3291278

23:42 <bslsk05> queue.acm.org: Benchmarking "Hello, World!" - ACM Queue

23:42 <mjg> oh ye the guy is great

23:43 <mjg> tracing is such a rabit hole though

23:43 <mjg> may i also recommend gil tene talking about latency

23:43 <vin> I am reading his book right now, so much of what he says is something I want to follow "measure and then build"

23:43 <heat> booooring

23:44 <heat> build and then measure 3 years later

23:44 <heat> or in my case, a solid 7 years

23:44 <vin> Haha

23:45 <mjg> overpromise and under deliver

23:45 <mjg> work for a corp for 1 year and you will know what i mean

23:45 <vin> My research has been mostly studying things carefully rather than coming up with some idea and hammiring it all the places

23:45 SpikeHeron has quit [Quit: WeeChat 3.6]

23:46 <vin> *hammering

23:46 <vin> Also, regarding my previous question. Is it possible to make just writes to far NUMA memory slow? I have a tmpfs on the other numa node and would like to make all writes to a mmaped region slow by some delay.

23:46 SpikeHeron has joined #osdev

23:47 dude12312414 has joined #osdev

23:47 <vin> I really want to mimic assymetric read write bandwidth of a device. Any thoughts?

23:47 <mjg> if you strictly control everything, i would just patch the kernel to add an artificial delay

23:48 dude12312414 has quit [Remote host closed the connection]

23:48 <heat> that seems... complicated

23:48 <vin> where exactly will I add this delay mjg

23:48 <heat> just add that in user space

23:48 <mjg> provided all the writes happen through write(2) et al, somewhere in tmpfs write

23:48 <heat> tmpfs write isn't a thing

23:48 <mjg> you spot you are in the "screw with it" area and just artificially wait

23:49 <mjg> things get harder if you mmap

23:49 <heat> rather tmpfs_writepages (but it got renamed now, it's readahead iirc?)

23:49 <heat> and also that's useless, writepages is only called when flushing

23:49 <heat> and we'd need to go into the details of linux vfs and... yeah

23:49 <heat> fuck that

23:50 <vin> I would love to just use "wait" in userspace but the thing is I am calling a library that move data around in this mmmaped region.

23:50 <vin> There are other places where I memcpy from a malloced area to the mmaped region where I can add a wait to induce this delay

23:50 <mjg> hm

23:51 <mjg> is this something you need to execute on bar metal?

23:51 <mjg> if you were playing valgrind you could inject delays at that level

23:51 <vin> mjg: Unfortunately yes. I am doing benchmakrs on how something would work on future byte addressable storage devices.

23:53 <mjg> welp good luck :)

23:54 <vin> haha, I will figure something out. Thanks for the discourse :)