klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
haxcpu is now known as ay
FreeFull has quit []
srjek|home has quit [Ping timeout: 252 seconds]
tacco has joined #osdev
srjek has joined #osdev
tacco has quit [Remote host closed the connection]
srjek|home has joined #osdev
srjek has quit [Ping timeout: 265 seconds]
[itchyjunk] has quit [Ping timeout: 252 seconds]
[itchyjunk] has joined #osdev
scoobydoo has quit [Read error: Connection timed out]
scoobydoo has joined #osdev
smach has joined #osdev
[itchyjunk] has quit [Remote host closed the connection]
srjek|home has quit [Ping timeout: 248 seconds]
smeso has quit [Quit: smeso]
smeso has joined #osdev
<geist> agreed re: not having 512 bit wide alu. glad there's not a bunch of dark silicon dedicated to it
tarel2 has joined #osdev
<mrvn> My Ryzen cpu can drive either sata or nvme, you have to pick one.
<mxshift> Huh. I'd have to go pull up the Ryzen zen4 docs to verify the SATA omission. I'm 80% confident EPYC zen4 still shows many of the PCIe lanes to be switched to SATA
smach has quit []
smach has joined #osdev
heat has quit [Ping timeout: 268 seconds]
<moon-child> disappointed at the slow gather/scatter
<moon-child> I had some code with zen2 that was way slower with gather than with scalar loads. Pretty sure next time I touch that code I'll upgrade it to avx512 and it'll be faster; 8-way gather on intel is pretty much the same throughput per load as scalar loads
<moon-child> (not on zen4, though, apparently. ¯\_(ツ)_/¯)
Goodbye_Vincent has joined #osdev
epony has quit [Ping timeout: 252 seconds]
smach has quit []
Ram-Z has quit [Ping timeout: 246 seconds]
scoobydoo has quit [Read error: Connection timed out]
scoobydoo has joined #osdev
m5zs7k has quit [Ping timeout: 250 seconds]
m5zs7k has joined #osdev
SGautam has joined #osdev
Ram-Z has joined #osdev
epony has joined #osdev
scoobydoo has quit [Read error: Connection timed out]
scoobydoo has joined #osdev
jjuran has quit [Read error: Connection reset by peer]
jjuran has joined #osdev
divine has quit [Ping timeout: 268 seconds]
divine has joined #osdev
opal has quit [Remote host closed the connection]
opal has joined #osdev
opal has quit [Remote host closed the connection]
opal has joined #osdev
jjuran has quit [Quit: Killing Colloquy first, before it kills me…]
jjuran has joined #osdev
GeDaMo has joined #osdev
zaquest has quit [Remote host closed the connection]
SGautam has quit [Quit: Connection closed for inactivity]
seer has quit [Ping timeout: 268 seconds]
vdamewood is now known as vinleod
vinleod is now known as vdamewood
opal has quit [Ping timeout: 258 seconds]
opal has joined #osdev
zaquest has joined #osdev
smach has joined #osdev
maxdev has joined #osdev
<maxdev> helloo
<sham1> Hello
<maxdev> man it's been some time since i've been here
<maxdev> does anyone know if reading the LAPIC id register has any side effects? i'm reading it a lot to identify which core i'm running on, and it's giving me a headache
Ram-Z has quit [Quit: ZNC - http://znc.in]
<sham1> Why not just read it once and then store that in some CPU-specific data structure accessed by %gs
srjek|home has joined #osdev
<maxdev> in %gs i currently write the index of my user-thread pointer segment so always 0x30 basically
<maxdev> so that the user-space thread-local stuff works.. hmmm
Ram-Z has joined #osdev
<sham1> Oh, if you're not in AMD64 then it might be more difficult
<maxdev> yeah i'm doing x86 only still
<sham1> Because in AMD64, %fs tends to be useful for thread-locals (at least with things like the System V ABI) and %gs for processor-locals
<maxdev> i guess there it's a little different
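A minimal sketch of what sham1 is suggesting, assuming x86_64 where GS_BASE points at a per-CPU block; the struct and helper names are invented for illustration and are not from maxdev's kernel:

    struct percpu {
        unsigned int cpu_id;   /* cached once at bring-up, no LAPIC read afterwards */
        /* ... other per-CPU fields ... */
    };

    static inline unsigned int get_cpu_id(void)
    {
        unsigned int id;
        /* %gs-relative load of cpu_id (offset 0 in struct percpu) */
        __asm__ volatile("movl %%gs:0, %0" : "=r"(id));
        return id;
    }

    /* during CPU bring-up, each core does roughly:
     *   block->cpu_id = read_lapic_id();          // one LAPIC read, ever
     *   wrmsr(IA32_GS_BASE, (uint64_t)block);     // MSR 0xC0000101
     */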
<maxdev> it is really weird, because the mere fact that i'm reading from the lapic causes unexpected behaviour, even if I don't use the value
<sham1> It certainly wouldn't be unheard of if reading from a hardware register would have some other kind of a side effect
Ali_A has joined #osdev
<Mutabah> I'm pretty sure there's no side-effects to it, but it might be slow
<zid> it might be serializing or something?
joe9 has quit [Quit: leaving]
scaleww has joined #osdev
joe9 has joined #osdev
scaleww has quit [Quit: Leaving]
<maxdev> @zid serializing what?
nur has quit [Remote host closed the connection]
<zid> That's just a jargon term
<zid> means to put things into order
<zid> these cpus are out of order
<zid> serializing operations make all earlier operations appear to have finished before they execute
<zid> useful for ops like say, rdtsc, to serialize
<zid> otherwise you might speculatively read it in the future, then write it back into the past
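For illustration, a hedged sketch of the kind of serialization zid means, using the common lfence-before-rdtsc idiom; this is not anyone's actual kernel code here:

    #include <stdint.h>

    static inline uint64_t rdtsc_serialized(void)
    {
        uint32_t lo, hi;
        /* the lfence keeps rdtsc from executing before earlier instructions retire */
        __asm__ volatile("lfence\n\t"
                         "rdtsc"
                         : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }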
smach has quit []
<maxdev> ahh.. :s
epony has quit [Ping timeout: 252 seconds]
<ddevault> diagramed out my kernel architecture https://l.sr.ht/WQ4q.svg
epony has joined #osdev
SGautam has joined #osdev
smach has joined #osdev
<mrvn> sham1: iirc fs and gs are swapped on x86 and x86_64 for tls or something.
<sham1> Could be, I haven't looked at x86 ABIs that much
<sham1> I really only know of AMD64 because that's my primary interest
<mrvn> On ARM it's better since you have a kernel only register, a kernel writable / user readable register and a user read/write register
<mrvn> So userspace can't mess up your per-core pointer.
<mrvn> on ARM64 you have 4 thread registers, one per ELx level.
sav_ has joined #osdev
nyah has joined #osdev
smach has quit [Remote host closed the connection]
sav_ has quit [Remote host closed the connection]
smach has joined #osdev
scoobydoo has quit [Read error: Connection timed out]
scoobydoo has joined #osdev
smach has quit []
Ali_A has quit [Ping timeout: 252 seconds]
opal has quit [Read error: Connection reset by peer]
opal has joined #osdev
opal has quit [Remote host closed the connection]
opal has joined #osdev
smach has joined #osdev
smach has quit [Remote host closed the connection]
maxdev has quit [Remote host closed the connection]
k0valski1889 has quit [Quit: Peace out !]
smach has joined #osdev
puck has quit [Excess Flood]
puck has joined #osdev
xenos1984 has quit [Ping timeout: 268 seconds]
<geist> plus a RO one at EL0, just like ARM
xenos1984 has joined #osdev
heat has joined #osdev
<heat> mjg, remember that trivial sched_is_preemption_disabled() dominating a good chunk of the CPU flamegraph?
<heat> it's a weird case of survivorship bias
<heat> it's literally the result of sti'ing and getting the pending IRQ 2 instructions later lmao
jafarlihi has joined #osdev
<jafarlihi> Hey, do you guys know how to multiply/add/divide an 18-digit precision float without using "big" libraries?
<jafarlihi> Represented as string
SGautam has quit [Quit: Connection closed for inactivity]
<jafarlihi> What is it called? Are there resources on that?
<jafarlihi> Just need to implement multiplication and such, performance is not an issue
<zid> do it per digit and add with carry
<zid> same as if you were doing it on paper
terminalpusher has joined #osdev
<zid> helps if you do it backwards
<jafarlihi> I don't know how to do it on paper, got links?
<zid> links to some paper? that'd be hard
<zid> 152.8 * 3 -> reverse to 8.251, then per digit: 8*3 = 24 (write 4, carry 2), 2*3+2 = 8, 5*3 = 15 (write 5, carry 1), 1*3+1 = 4 -> 458.4
<jafarlihi> https://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic has links to decimal algorithms, but I can't find anything for floats
<bslsk05> ​en.wikipedia.org: Arbitrary-precision arithmetic - Wikipedia
<zid> floats are decimals
<jafarlihi> Oh
<zid> it's just place value
<zid> it's easier to do it back to front if you're doing strings, then you can accumulate the value easier and not have to actually make a list, but that's an optimization and the concept is the same
<jafarlihi> Ok thanks
<heat> ok so I need some feedback
<zid> so if you had str[3] = {'8', '5', '1'} to represent 158, d1 = (str[0]-'0') * 3; str[0] = (d1%10)+'0'; d2 = (str[1]-'0') * 3 + d1/10; str[1] = (d2%10)+'0'; d3 = (str[2]-'0') * 3 + d2/10 ...
<zid> You want it that way so that you don't end up writing your carries to str[-1]
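A small self-contained sketch of the digit-at-a-time multiply zid describes, with the number stored reversed and the decimal point tracked separately; the names are illustrative:

    #include <stdio.h>

    /* digits[] holds the number least-significant digit first, as ASCII characters */
    static void mul_small(char *digits, size_t *len, size_t cap, unsigned mul)
    {
        unsigned carry = 0;
        for (size_t i = 0; i < *len; i++) {
            unsigned d = (unsigned)(digits[i] - '0') * mul + carry;
            digits[i] = (char)('0' + d % 10);
            carry = d / 10;
        }
        while (carry && *len < cap) {          /* append leftover carry digits */
            digits[(*len)++] = (char)('0' + carry % 10);
            carry /= 10;
        }
    }

    int main(void)
    {
        char num[32] = "851";                  /* 158, stored reversed */
        size_t len = 3;
        mul_small(num, &len, sizeof(num), 3);
        for (size_t i = len; i-- > 0;)         /* print most significant digit first */
            putchar(num[i]);
        putchar('\n');                         /* prints 474 = 158 * 3 */
        return 0;
    }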
<heat> I have lots of issues with holding locks and doing things for a loooong time with them held (imagine filesystem lookups, IO, etc)
<heat> what are the standard patterns for solving this?
<heat> I know linux does a lot of fuckery with flags and waiting on things futex-style
<zid> I mean, you described what you were doing, but not the actual issue?
<zid> If a long operation needs exclusion while you do it, it needs it. What's the *problem* your impl. causes?
<heat> the issue is that imagine I'm holding the lock for /home/zid
<zid> That your exclusion periods are not correctly fenceposted and you hold it for longer than you need to? That your locks are too expensive? etc
<heat> you have 2 threads doing lookups and 1 thread doing writes (which involve hitting the fs as part of O_CREAT)
<heat> the 2 threads that could do easy, quick lookups to cached dentries will end up being held back by the writer which is effectively serializing things
<zid> wouldn't you typically leave that to the reader to deal with? TOCTTOU bugs etc
<heat> the issue is that doing expensive things like IO when holding contested locks will effectively serialize things
<zid> and if they want to avoid them, they use special interfaces like rename instead of rm; write
<heat> where does TOCTTOU come into play?
<zid> that's why anyone would care about not being able to read data someone is updating
<zid> or rather, care that they can, and maybe they shouldn't be able to
<heat> ah, yes, UAFs?
<heat> and similar bugs
<heat> yes, that's a problem, which is why the lock is there
jafarlihi has quit [Ping timeout: 265 seconds]
<zid> right, I'm saying usually you leave that to the application to deal with, by making them request special primitives that are safer
<heat> but this crashes the kernel
<zid> oopsie doopsie
<zid> you never mentioned kernel crashes
<heat> if you have UAF bugs you can crash the kernel
<heat> simple
<heat> which is why the lock is required to be there
<zid> yea I hadn't figured out if you were being overzealous or underzealous yet
<zid> isn't this typically where you'd use an RCU
<zid> makes the alg lockless, but as a side-effect, also makes it.. lock safe
<zid> nobody needs to write potentially buggy locking code
<heat> my problem is that every time I hold the lock in a write-way (my dentry code uses rwlocks and not mutexes) I do something stupidly expensive
<heat> usually filesystem->open(...), or filesystem->creat(...)
<heat> you get the idea
<zid> RCU also helps with that, no?
<heat> no, AFAIK RCU requires preemption to be disabled
<zid> don't insert the new file into the dir until it's made and finished
<zid> rather than locking the dir
<zid> making the file, unlocking the dir
<heat> right, but then you have concurrent requests for the same data
<zid> there's no concurrency issue there
<heat> or concurrent creats
<heat> you *do not* want a race condition between creats, they need to be serialized
<heat> same for renames, yadda yadda
<zid> there's a gajllion lockless inserts
<heat> there's no lockless filesystem
<zid> bear in mind there's two things at play here
<zid> the bytes on the drive, and the structures in memory
<heat> anyway, linux solves these kinds of issues by just creating "incomplete" structures and waiting on some flag using a wait queue or a futex-in-the-kernel thing
<zid> you'll still want to serialize/lock the actual disk update so that two threads aren't shitting on each other, via whatever mechanism you want, dedicated worker thread or whatever, but the actual in-memory versions can have totally different semantics
<zid> cmpxchg doesn't really exists for hard disks
<heat> and their solution is OK but it seems complex
<zid> it does for cpus though
<heat> it does btw
<heat> nvme has cmpxchg
<zid> page level, or byte?
<heat> page
<zid> pretty big lock :D
<heat> unless I'm talking out of my ass here, but I specifically recall NVMe having commands for that
<heat> anyway
<zid> I'd probably use that to update the inode or whatever, but the thread organizing that, just reading from an in-memory RCU, that other threads stomp on
vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
<heat> do you get my problem?
<zid> I think you just don't believe in yourself
<zid> You think you're not capable of writing it lockless *in memory* because the disk version *does* need locks
<heat> you can create an in-memory version, but if that in-memory version requires an expensive op you just cant/shouldn't hold the lock
<heat> and if you create an incomplete version of the structure, you'll need to wait for it to be complete (which is fine)
<zid> right, and if there's no lock, there's no locking bugs
<zid> you're just adding shit to a queue for the 'serialize to disk' process to happen
<zid> and the only "hard" part is localized into one spot
<zid> the dequeuer
<zid> not "every single appender"
<heat> no
<heat> you *need* to hit the filesystem
<heat> you can't create an in-memory inode (and not hit the filesystem), and then go "oopsie, we had no inodes left"
<heat> does it need to hit the disk? no, thanks to the buffer cache, etc
<heat> but you will probably need to read or lookup the buffer cache's version of it
<heat> and this is just the traditional case
<clever> i can see multiple ways i might implement such a thing
<heat> if you go for NFS, etc you'll be 10x more fucked
<clever> first layer, might be to have a count of free inodes, and a count of allocating inodes
<clever> grab a lock, check if free > allocating, then allocating++; and drop the lock
<clever> now you can go off and build the inode, before you even know what index it is, and be guaranteed that one is reserved
<heat> but you do need to allocate
<heat> what if you fstat? and you have no st_ino?
<clever> yeah, at a later stage, you will need to grab some other lock, scan the inode table for an empty slot, and allocate it properly
<pitust> you can also do this without a lock, by using some atomic stuff
<clever> and then you need to grab 2 locks, and allocating--
<heat> yes
<heat> but all of this is part of your i_fops->creat(...)
<heat> this is filesystem stuff
<clever> pitust: with atomics, you could increment allocating, but what about the risk of 2 people incrementing it (properly atomically), but now it exceeds free!
<heat> cmpxchg
<clever> ah, yeah that can work
<clever> so you increment it in a non-atomic copy, and then only store if you won the race
<pitust> or you can get the old value, and if that exceeds the max, subtract and retry
<pitust> although if you want the old value GCC and clang still have to use a CAS loop
<pitust> (on x86)
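A hedged sketch of the reserve-then-allocate counting that clever and pitust are describing, using a C11 compare-exchange loop; all names are invented for illustration:

    #include <stdatomic.h>
    #include <stdbool.h>

    struct inode_counts {
        atomic_uint free;         /* inodes still unallocated on disk */
        atomic_uint allocating;   /* reserved but not yet given a slot */
    };

    static bool reserve_inode(struct inode_counts *c)
    {
        unsigned cur = atomic_load(&c->allocating);
        for (;;) {
            if (cur >= atomic_load(&c->free))
                return false;                 /* would over-commit, back off */
            /* only one racing thread wins the increment; a failed CAS reloads
             * the current value into cur and we retry */
            if (atomic_compare_exchange_weak(&c->allocating, &cur, cur + 1))
                return true;
        }
    }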
nvmd has joined #osdev
<clever> heat: the benefit i can see, to a separate allocating and allocated stage, is if your inode table is broken up into groups
<clever> you have a very quick cmpxchg based increment, to reserve an inode fs wide
<heat> but there's no benefit to doing that
<clever> then each core can grab a lock on a different inode group, in parallel
<heat> this is generic code
<clever> and can scan that group for a free slot, in parallel
<heat> I don't know if your filesystem is a thing
<clever> mostly just theory crafting
<bslsk05> ​gist.github.com: dentry.cpp · GitHub
<clever> but from memory, i believe ext2/3/4 and xfs have the inode tables split up into groups
<heat> THIS is my problem
scoobydoo has quit [Read error: Connection timed out]
<heat> I can reverse the cheap and expensive part, but I need some way to wait for it to be complete
<heat> which is possible, but clunky and non-standard
scoobydoo has joined #osdev
<heat> and where you see "dentry" you can also imagine your page cache or something
xenos1984 has quit [Ping timeout: 268 seconds]
<heat> parallel lookups to the same thing will need to wait for completion
<heat> and in an ideal, non-serialized world, parallel lookups to other things will be able to go concurrently because no one is holding the lock
<clever> yeah, that's why i was thinking of a per inode group lock
<heat> dude
<clever> but you could do it lockless, and just retry upon failure
<heat> that's soooooooooooooooooooooo much lower level than what I'm talking about
<heat> also unavoidable
<heat> basically all I'm asking is if there's a known pattern for this
<heat> I can't use "lock and do thing" because that is slow
<zid> I need to get someone with more authority than me to repeat the bit I said earlier :P
<heat> i dont understand what you mean
<heat> lockless is not the issue here
<zid> stop trying to consider the disk while thinking about the data structures
<heat> my point here is that the data structures will only be complete when you hit the disk
<zid> why do you need complete ones?
<heat> two parallel lookups that hit the disk will need complete ones
<zid> If I issue three writes all hitting the same directory, if it's possible for my writes not to affect each other on the final disk image, the fancy data structures should handle that
<zid> if they don't, that's a missed optimization at best
<heat> "here's the inode I found! caller: which inode? lookup: i dunno"
<heat> lets ditch the dentry example
<heat> two threads try to look up page 0 of file foo (pagecache), one allocates the page and starts the IO, the other one will need to wait for the page to become filled
<heat> if you do the IO while holding the page cache's lock, you serialize everyone, so doing slow things outside the lock is the only valid approach
FreeFull has joined #osdev
<heat> if you them outside of it, only threads trying to look up the same thing will get blocked, which is the desired behaviour
<heat> s/if you them/if you do them/
<heat> right?
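A hedged sketch of the pattern heat is describing, with a pthread condvar standing in for a kernel wait queue; the struct layout and the helpers declared extern are hypothetical, not from Onyx or Linux:

    #include <pthread.h>
    #include <stdbool.h>

    struct page {
        bool uptodate;                 /* set once IO has filled the page */
        pthread_mutex_t lock;
        pthread_cond_t done;
        /* ... page data ... */
    };

    struct cache { pthread_mutex_t lock; /* index -> page map lives here */ };

    /* hypothetical helpers, assumed to exist elsewhere */
    extern struct page *cache_lookup(struct cache *c, unsigned long index);
    extern void cache_insert(struct cache *c, unsigned long index, struct page *p);
    extern struct page *alloc_placeholder_page(unsigned long index);
    extern void read_from_disk(struct page *p);

    struct page *pagecache_get(struct cache *c, unsigned long index)
    {
        pthread_mutex_lock(&c->lock);              /* short critical section */
        struct page *p = cache_lookup(c, index);
        bool we_do_io = false;
        if (!p) {
            p = alloc_placeholder_page(index);     /* inserted with uptodate == false */
            cache_insert(c, index, p);
            we_do_io = true;
        }
        pthread_mutex_unlock(&c->lock);            /* the IO happens unlocked */

        pthread_mutex_lock(&p->lock);
        if (we_do_io) {
            read_from_disk(p);                     /* the slow part */
            p->uptodate = true;
            pthread_cond_broadcast(&p->done);      /* wake waiters on this page only */
        } else {
            while (!p->uptodate)                   /* lookups of other pages never block here */
                pthread_cond_wait(&p->done, &p->lock);
        }
        pthread_mutex_unlock(&p->lock);
        return p;
    }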
xenos1984 has joined #osdev
seer has joined #osdev
scoobydoo has quit [Read error: Connection timed out]
scoobydoo has joined #osdev
terminalpusher has quit [Remote host closed the connection]
nvmd has quit [Quit: WeeChat 3.6]
nvmd has joined #osdev
gog has joined #osdev
renopt has joined #osdev
dude12312414 has joined #osdev
SGautam has joined #osdev
<mjg> heat: so it was interrupts after all!
<heat> yeah
<heat> which kind of begs the question "why"
<heat> waking up threads isn't supposed to be slow
<mjg> that's not what 'begging the question' means
<heat> maybe it's just natural and not an anomaly
smach has quit [Remote host closed the connection]
<heat> oopsie
<heat> you get what i mean
<mjg> ye, just sayin people misuse this bit so often i repeat it to myself to not fall back :>
smach has joined #osdev
<mjg> i had a look at your code, all the write locking should be whacked man
<mjg> when collecting the graphs
<mjg> erm, you have a global lock which you read lock on each cpu, that's super pessimal
<mjg> bare minimum, still pessimal, you can implement locks per-cpu so that at least they don't interfere with each other
<mjg> then when disabling the mechanism you flip the flag to off and wait for all locks to not be taken
<heat> it was the quickest solution
<mjg> correct way requires memory barriers and whatnot and is not warranted
<mjg> dude the above can be coded in the same time + 2 minutes
<mjg> :>
<heat> :)
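A hedged sketch of the per-cpu scheme mjg is outlining: readers only touch their own CPU's counter, and disabling flips a flag and then waits for every counter to drain. Memory ordering is kept simple with seq_cst atomics and all names are made up:

    #include <stdatomic.h>
    #include <stdbool.h>

    #define MAX_CPUS 64

    static _Atomic bool facility_enabled = true;
    /* one counter per CPU; a real kernel would pad these to avoid false sharing */
    static atomic_uint in_use[MAX_CPUS];

    static bool percpu_read_enter(unsigned cpu)
    {
        atomic_fetch_add(&in_use[cpu], 1);
        if (!atomic_load(&facility_enabled)) {   /* raced with a disable */
            atomic_fetch_sub(&in_use[cpu], 1);
            return false;
        }
        return true;
    }

    static void percpu_read_exit(unsigned cpu)
    {
        atomic_fetch_sub(&in_use[cpu], 1);
    }

    static void facility_disable(void)
    {
        atomic_store(&facility_enabled, false);
        for (unsigned cpu = 0; cpu < MAX_CPUS; cpu++)
            while (atomic_load(&in_use[cpu]) != 0)
                ;                                 /* spin until that CPU drains */
    }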
<heat> anyway I've been tackling bigger issues
<heat> mainly trying to remove the dentries' rwlock
<heat> I want a rwspinlock
<mjg> R C... don't want to trigger anyone
<heat> lmao
<heat> you mean EBR
<mjg> believe it or not, rw lock there should perform just fine at the measly 4 threads you got
<Griwes> U seem to be really careful about it
<mjg> in fact it will be ok-ish until about 16
<mjg> it performs way worse than i'm describing because the implementation you have right now sucks
<mjg> dentry or not, you will keep running into it, so that should be fixed
<heat> I've switched it around a bit
<heat> i take less locks
<heat> and it seems to be similar to other kernels
<heat> except no spinning
GeDaMo has quit [Quit: Physics -> Chemistry -> Biology -> Intelligence -> ???]
<mjg> Griwes: 8)
<bslsk05> ​github.com: Onyx/rwlock.cpp at 77853fcdda34cdc256ed1a3bf5cc7daa9c950d9e · heatd/Onyx · GitHub
<mjg> it is not since you still take a spinlock just to wait for it
<heat> something funny I did notice is that vfsmix performs way better when "lets try to reschedule" code is commented out because there's a lot more idle
<mjg> this actually may be worse than openbsd :-P
<heat> mjg, seems to be how lunix does it
<heat> :)
<mjg> what
<heat> yes
<mjg> where
<mjg> rwsem?
<heat> yes
<mjg> are you sure you did not misread it
<heat> yup
<mjg> fallback, the fucking bottom, definitely does it to interlock going off cpu vs unlock
<mjg> there is also a hack where pending writers serialize on a hand-rolled mcs lock
<mjg> but that's not the same thing
<heat> what fallback?
<mjg> slowpath, call it whatever you want, i have not seen that code in 5 years
<bslsk05> ​elixir.bootlin.com: rwsem.c - kernel/locking/rwsem.c - Linux source code (v5.19.11) - Bootlin
<heat> I tried to look at freebsd but that code was bonkers
<mjg> ye
<mjg> you missed this part
<mjg> if (rwsem_can_spin_on_owner(sem) && rwsem_optimistic_spin(sem)) {
<heat> i didn't
<heat> <heat> except no spinning
<mjg> ok, miscommunicated
<mjg> the no spinning bit makes/breaks performance man
<heat> does it?
<mjg> yep
<mjg> look for the commit which introduced it
<mjg> that or the mail thread has numbers
<mjg> but wait, they don't ever spin for *readers*?
<mjg> that's defo pessimal
<mjg> but i understand why
<heat> they do
xenos1984 has quit [Read error: Connection reset by peer]
<mjg> where
<heat> sorry, not spinning
SpikeHeron has quit [Ping timeout: 250 seconds]
<heat> * Reader optimistic lock stealing.
<mjg> so the general problem with rw locks where you can go off cpu while holding them
<mjg> is that there is no sensible way to track if any of the readers if off cpu
<mjg> so then what
<mjg> [there are funny ways to try to approach it, but i'm not fond of anything i came up with and i'm unaware of anyone coming up with anything better]
<bslsk05> ​github.com: locking/rwsem: Remove reader optimistic spinning · torvalds/linux@617f3ef · GitHub
<mjg> that's not the thing, i need to clarify terminology
<mjg> there is writers spinning waiting for readers
<mjg> and there is readers spinning waiting for writers
<mjg> in the linked commit they removed the latter, while my comments were about the former
<mjg> that said, i find their result suspicious, they most likely severely pessimally thread readers
<mjg> not allowing any of them as long as there are pending writers
<mjg> s/thread/treat
<heat> if your lock is only good while spinning doesn't that mean you should use a spinlock instead? :)
<mjg> what
<mjg> let me restate man
<heat> if the make or break of rwlocks is if you have optimistic spinning, then maybe you should really use something that spins in that instance
<mjg> contrary to popular belief that "spinning is wasting time", going off cpu, putting someone else on cpu, and then going back
<mjg> tends to be significantly more expensive than just spinning
<mjg> in total
<mjg> heat: OR you need to support going off cpu as a corner case
<heat> right
<heat> but spinning is only useful if the lock is held for a short-ish amount of time
<mjg> the moment you go off cpu, you are rolling a dice
<mjg> #define short-ish
<heat> if the owner blocks, you spun for jack shit
<mjg> that's true, but what if it did not
<heat> use a rw spinlock? :v
<mjg> there is a very important effect you need to account for here
<mjg> say you own a lock and want to grab another one, but you go off cpu to do it
<mjg> then someone else who wants the first lock also goes off cpu
<mjg> you find yourself in a funny spot where the machine appears idle
<mjg> and some people confuse it for having spare cycles
jimbzy has quit [Quit: ZNC 1.7.5+deb4 - https://znc.in]
<mjg> basically this multicore stuff likes to suddenly collapse
<mjg> in terms of performance
<heat> right, but it effectively is
<heat> you're all waiting for the last lock
<mjg> and you are waiting *longer* if the owner is off cpu
<mjg> there is a huge multiplication factor here
<mjg> if that going off cpu could have been avoided, you have a dramatic win
<heat> how big a win?
<mjg> let's say the resource is contested and you have 32 cpu threads, which is not much
<mjg> 20 of which want the lock
<mjg> so whatever extra delay incurred by the lock owner is multiplied by 20
<mjg> and even then they will be serializing on each other
<mjg> you went from possibly slow but tolerable to a non-starter
<heat> right
<mjg> i can't stress enough how this likes to degrade
<heat> but if your locks are spin-happy you're also just wasting cpu time for something that may very well take a long time
<mjg> to ilustrate with a real example, there was a point where freebsd was ok-ish at 80 threads when doing buildkernel
<mjg> on a 4 socket westmere
<mjg> then it was booted on 4 socket broadwell, 128 threads
<mjg> and the same workload collapsed into oblivion
<mjg> heat: i'm not saying every single instance of spinning is good, just that in practice, spinning tends to win
<mjg> ultimately all locking is just performance damage control, the moment you contend you are already losing
jimbzy has joined #osdev
<mjg> and in fact you are losing already by having a shared lock, even if it is not contested as you are bouncing it
<gog> jimbzy: sosig
xenos1984 has joined #osdev
<jimbzy> SOSIG
<mjg> heat: all this aside, i propose a game for you
<mjg> heat: to make selected benchmarks, like vfsmix, scale better than on openbsd
<mjg> heat: you in?
<jimbzy> How are you doing, gog?
<gog> jimbzy: pretty well actually
<jimbzy> Love it!
<zid> I had sosig the other day, in a bnu
<zid> it was pig slices today
<zid> You know that noise of people running wood through a huge band saw? *nrrrrrrwwww*
<zid> Like that
<heat> mjg, sure
<heat> sounds good
<mjg> heat: right on
<mjg> heat: so i guess you shoul start with getting an openbsd vm
<heat> aw
<heat> im not in anymore
<mjg> (:
<heat> openbsd is CRINGE
<mjg> OH
<heat> oh
<mjg> good thing theo is not on the channel
<heat> what the fuck
<heat> why are there so many installation options
<mjg> they wanna fuck with you
<heat> anyway, something I want to ask you
<heat> how does fbsd do lookup when you need to hit the disk?
<heat> i assume your dentries have some sort of rwlock?
<mjg> there is a fallback to locked lookup
<heat> point being that I want to replace all my rwlocks with rw spinlocks and do the IO outside the lock
<mjg> ye that's sensible, but then you will still need a way to serialize on this
<heat> yes, I have that
<mjg> kind of a dedicated io lock, so to speak
<heat> io lock? to protect what?
<mjg> say you have 2 threads doing the same lookup and finding they need to i/o to proceed
<mjg> then what
<heat> oh yeah sure
<heat> I have a futexish thing
<heat> I'll make them wait on an address
<mjg> whatever syncs them is fine
<mjg> basically the point is to avoid repeat i/o
<mjg> and not get false negatives
<heat> I think I'll still need to repeat the lookup if it fails right?
<heat> you can't assume failure = ENOENT
<mjg> you do distinguish "we have no entry" from "there is no file like that" from "we have an entry which says there is no file like that"
<heat> i dont have negative dentries yet
<mjg> now that i wrote it, do you cache results that there is nothing named like that?
<mjg> ouch
<heat> cry path resolution man
SpikeHeron has joined #osdev
* mjg cries a river
<mjg> look if you wanna beat openbsd, yuo have to step up
<mjg> as is you are probably around their level, unless they fixed something in the last 3 years since i looked
<heat> lmao
<heat> i assume that if negative dentries existed, error != ENOENT would mean you discard the negative dentry? and then concurrent lookups would need to retry
<heat> ...or I could store the errno in the negative dentry, but I don't know how iffy that is
<mjg> if a negative entry exists, where are you getting the error from?
<mrvn> heat: you get lots of code that searches PATHs, always for similar files. Like libc.so. Seems like it would be useful not to have to read the disk every time.
<mjg> lookup succeeded without i/o
<mjg> you just return ENOENT to the caller
<heat> mrvn, i know that's what a negative dentry is
<mrvn> mjg: the first lookup produces an error. You store that and return it every time
<mjg> anyway just make sure you invalidate such entries on file creation, rename, mkdir etc
<mrvn> and have an option for the FS to disable or limit it. Like NFS.
<heat> I was assuming negative dentries would only be for non-existent files, vs lookups that errored out
<mjg> i have difficulty parsing this
<mrvn> heat: your choice. But why would you do a second lookup on EACCESS?
<mjg> you create a negative entry in the name cache when the fs told you it does not have the requested name
<heat> filesystems are not returning EACCES
<mrvn> heat: then the problem doesn't arrise. Note: NFS
<heat> imagine -EIO
<heat> do I cache that open("stupid.jpeg") returns EIO?
<heat> is that a cacheable return value?
<mrvn> questionable.
<mjg> no
<mjg> you cache when the fs tells you it got nothing, not when something failed to even find out
<mrvn> A user can easily DOS you by requesting that over and over and causing your disk and SATA controller to constantly reset.
<heat> ok, so ENOENT only
<mjg> i would say let the filesystem add an entry for now
<mrvn> or the block cache or block device. there are many places you can cache
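A hedged sketch of the rule being agreed on here: only a definitive ENOENT from the filesystem creates a cached negative entry, while transient errors like EIO are returned but never cached. The dcache_* helpers and the dentry layout are invented for illustration:

    #include <errno.h>
    #include <stdbool.h>

    struct dentry;   /* hypothetical; assume it carries a "negative" flag */

    extern struct dentry *dcache_find(struct dentry *dir, const char *name);
    extern void dcache_add_negative(struct dentry *dir, const char *name);
    extern bool dentry_is_negative(struct dentry *d);
    extern int fs_lookup(struct dentry *dir, const char *name, struct dentry **out);

    int cached_lookup(struct dentry *dir, const char *name, struct dentry **out)
    {
        struct dentry *d = dcache_find(dir, name);
        if (d) {
            if (dentry_is_negative(d))
                return -ENOENT;              /* cached "no such file", no IO at all */
            *out = d;
            return 0;
        }
        int err = fs_lookup(dir, name, out); /* may hit the disk */
        if (err == -ENOENT)
            dcache_add_negative(dir, name);  /* only a definitive miss is cached */
        return err;                          /* -EIO etc. propagate but are not cached */
    }

    /* creat/rename/mkdir on 'dir' must drop or overwrite the negative entry */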
<mrvn> Does anyone have a FS interface where you request stat() for a whole path at once and the FS then does a path walk and returns an array?
<heat> no
<heat> you /could/
<mrvn> I kind of want to keep the round trips for path walk small.
<mrvn> Maybe I should add the idea of an agent. The kernel doesn't ask the FS to stat a file but sends it an agent (function pointer basically) that then runs under the FS process to do a path walk.
<heat> "fstype: 4.2BSD"
<heat> am I supposed to be scared mjg?
<mjg> :)
<mjg> no
<mjg> note that they are going to have single-threaded slowdowns vs you due to security mitigations
<mjg> however, once multicore performance is better, you can look into disabling that bit
<mjg> mrvn: but who needs that modulo userspace realpath, which you should implement in the kernel instead
pretty_dumm_guy has joined #osdev
<Griwes> Implementing a thing _in the kernel_?! Travesty
<heat> musl used to use linux's kernel realpath but ended up rolling their own because of some issues
<heat> openbsd doesn't fucking boot
<heat> wonderful
<mjg> kvm?
<heat> yeah
<mjg> i would do a quick google, chances are decent you can flip something easily
<heat> booting from the hard drive (after the installation) said No active partition
<heat> I also picked GPT despite them saying it was possible it couldn't boot sooo
<heat> retrying with MBR
<mjg> were you fucking with the installer?
<mjg> right
<heat> no
<mjg> go all defaults man
<heat> openbsd is a fragile flower
<heat> (2022 and im using a fucking MBR)
<mjg> well it does have 2005 scalability....
<mjg> obsd kernel
<heat> so do I, but I have gpt support
<mjg> i hear they added something which lets them get flamegraph tho!
<heat> ah that was it
<heat> they don't support GPT disks
<mjg> ?
<mjg> that would be weird
<mjg> well i'm not looking into this bit
<heat> well, it doesn't boot
<heat> but the MBR installation does sooo
<mjg> you will need to install 'gmake' and use that instead of make
<heat> can you pastebin the vfsmix again?
<heat> wait, no need
<mjg> you will need your hacked main.c as well
<mjg> do you have any means to move files between onyx and the rest?
<mjg> i guess you had to, to get wis working
<heat> usually I just craft a new fs, it's the easiest
<heat> i do have a local copy soooo I'll just pastebin it myself
<mjg> you got ext2?
<heat> yes
<mjg> well you still need to patch main.c :-P
<heat> openbsd has 4.2BSD which is highly superior
smach has quit []
<mjg> although! maybe they have hwloc
<mjg> or libhwloc
<mjg> i don't know how to search for it with their tooling
<heat> who the fuck uses underscores in command names
<heat> it's like google's stupid underscored args
<mjg> :)
<heat> .........they don't have O_DIRECT
<mjg> vfsmix uses it?
<heat> some other tests do
<mjg> ye you may need to rm some stuff
<mjg> just get vfsmix workign man and roll with it
<mjg> note their /tmp is going to also be uffs
<mjg> as in *not* tmpfs
<mjg> and i don't know if you can enable the latter
<mjg> so for a fair comparison you will need to even it up
<mrvn> mjg: the kernel only handles memory, timers and irq dispatch
<mrvn> well, one timer per core
<heat> what an insane fucking system
<heat> hahahaha holy fucking shit it's so bad
<heat> I do 3x what they're doing on vfsmix
<mjg> but are you doing tmpfs or ext2
<heat> tmpfs in Onyx, mfs in OpenBSD
<mjg> that's not fair
<heat> why
<mjg> get ext2 on onyx
<heat> that's not going to work
<mjg> or tmpfs on openbsd
<mjg> why not
<mjg> you change the path in the bench, does not have to use /tmp
<heat> isn't mfs tmpfs
<mjg> make no mistake, when i was saying openbsd is bad, i'm dead serious
<mjg> no it is not
<mjg> it is memory-backed ufs
<heat> my ext2 write isn't very stable
* mjg glares
<heat> I was going to work on it before we bikeshedded along until now
<mjg> ok man, let me give you something to do in the meantime
<mjg> lemme find it
<mjg> two benchen
<bslsk05> ​dpaste.com <no title>
<mjg> if you can't remount tmp just change it to something
<mjg> like 'bench'
<mjg> or whatever
<mjg> most notably stat4 rolls with a deep fucking path
<mjg> so inefficiencies in lookup are really highlighted
<mjg> but there are no changes made
DonRichie has quit [Quit: bye]
<mjg> as long as you can safely have the file there, you are set to run
<heat> i'm kinda in the middle of something
<mjg> no rush here
DonRichie has joined #osdev
teroshan9 has quit [Quit: The Lounge - https://thelounge.chat]
teroshan9 has joined #osdev
teroshan9 has quit [Client Quit]
teroshan9 has joined #osdev
teroshan9 has quit [Client Quit]
teroshan9 has joined #osdev
jack_rabbit has quit [Ping timeout: 246 seconds]
jack_rabbit has joined #osdev
nvmd has quit [Quit: WeeChat 3.6]
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
[itchyjunk] has joined #osdev
opal has quit [Remote host closed the connection]
opal has joined #osdev
vdamewood has joined #osdev
SGautam has quit [Quit: Connection closed for inactivity]
bgs has joined #osdev
nyah has quit [Ping timeout: 248 seconds]
pieguy128 has quit [Ping timeout: 268 seconds]
pieguy128 has joined #osdev