d1rg has quit [Read error: Connection reset by peer]
<heat>
hmm let me give it a shot
<kkd>
but for pointers it makes it (void *), which is probably not great, but if you receive it into a variable of the same pointer type on the lhs it works fine.
<heat>
yeah... i kinda need void *
<heat>
i could restrict these rcu_ helpers to "type a = rcu_*()" i guess
<kkd>
Ah, so you're basically applying no deref to force usage of rcu macros? Neta.
<heat>
in any case i got sparse to work with my kernel, which is good even though the dev experience is 10x worse
<kkd>
Neat*
<heat>
yes that's what i was trying to do
<heat>
if successful, possibly upstream to linux
<kkd>
makes sense, can certainly catch a fair amount of bugs where people do dereference directly
<heat>
yeah, sparse (make C=1) can already catch them, but having it pop up in compiler warnings (and clangd!) would be much better
<kkd>
the __unqual_typeof macro is btw __unqual_scalar_typeof in Linux, and shamelessly lifted from there, but I added another condition for pointer types.
<kkd>
otherwise clang was going crazy because I can't apply address_space to deref'd value type of a pointer to create a local variable.
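For reference, the qualifier-stripping trick kkd mentions can be sketched roughly as below. This is a simplified take on the idea behind Linux's `__unqual_scalar_typeof`, not kkd's macro: the names `unqual_scalar_typeof`, `scalar_cases` and `bump_copy` are made up for the example, and the extra pointer branch kkd added is not shown in the chat, so it is left out here. It relies on `_Generic` dropping qualifiers from its controlling expression and on the GNU `__typeof__` extension.

```c
/* Sketch only: simplified from the idea behind Linux's
 * __unqual_scalar_typeof. All names here are illustrative. */
#define scalar_cases(type)                      \
        unsigned type: (unsigned type)0,        \
        signed type: (signed type)0

/* _Generic applies lvalue conversion to (x), which drops const/volatile,
 * so a qualified scalar selects the plain scalar branch. Anything that is
 * not listed (structs, pointers, ...) falls through to `default` and keeps
 * whatever type the expression x has. */
#define unqual_scalar_typeof(x) __typeof__(_Generic((x),   \
                char: (char)0,                              \
                scalar_cases(char),                         \
                scalar_cases(short),                        \
                scalar_cases(int),                          \
                scalar_cases(long),                         \
                scalar_cases(long long),                    \
                default: (x)))

/* Reads through a const volatile pointer into a writable local copy. */
int bump_copy(const volatile int *src)
{
        unqual_scalar_typeof(*src) tmp = *src; /* tmp is a plain int */

        tmp += 1; /* would not compile if tmp were still const */
        return tmp;
}
```

With a plain `__typeof__(*src)` the local would inherit `const volatile` and the increment would be rejected, which is the problem the macro is meant to avoid.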
<heat>
what do you need qspinlocks for in ebpf anyway?
<kkd>
to hold locks (that aren't shit under contention) :)
<kkd>
there's a bunch of BPF programs that do complicated things, like CPU schedulers written in BPF
<heat>
dont remind me sched_ext exists :(
<kkd>
which need to keep per-cpu / per-NUMA run queues but occasionally rebalance tasks etc.
<kkd>
but in general any kind of synchronization needs it
<heat>
there's a brand new paper where they do page cache policies in ebpf
<kkd>
yeah i've seen it
<kkd>
there was one in a workshop
<kkd>
and i heard the authors plan on submitting a full 12-pager eventually
<kkd>
i think it doesn't make too much sense (probably be better to hint immediate expiration from page cache through pread/pwrite etc.) but hey, people can do whatever they want
<heat>
that has approximately 0 chance of getting merged upstream i assume
<kkd>
I think someone already did the most compelling use case using RWF_UNCACHED
<heat>
hey there's a real patchset for immediate expiration
<heat>
yeah
<kkd>
and that's about it, if you need anything more complicated it's better to just manage your own page cache in user space
<kkd>
more flexibility than you could ever have with bpf
<kkd>
anyone doing this stuff seriously does that anyway
<heat>
maybe the idea would be to not require O_DIRECT + userspace page cache
<heat>
like postgres still defaults to buffered IO
<kkd>
I think for the cases where it matters (mostly scans polluting it) RWF_UNCACHED will be enough
<kkd>
anything more complicated or fine grained, and i think the complexity of doing it through bpf won't justify the effort
<kkd>
at that point you just do it in userspace
<kkd>
but i'm not a database expert so maybe i'm terribly wrong
<heat>
ah yes, postgres never does direct io
<heat>
this is where ebpf could possibly be used (or just fix the fuckin page reclaim)
<kkd>
yeah
<kkd>
but looking at all of this makes me think people miss the bigger picture
hwpplayer1 has quit [Remote host closed the connection]
<heat>
i think the bigger picture is that the current kernel mechanisms (scheduler, page reclamation) are currently suboptimal
<heat>
and people want to patch over it with ebpf mumbojumbo
<kkd>
yeah, that is definitely the case
<kkd>
it is more so with the scheduler
<kkd>
eevdf sucks ass for some workloads
<kkd>
and like people even complained about it when it was posted on the list
<kkd>
and then they went ahead and merged it anyway
<heat>
versus CFS?
<kkd>
yeah
<kkd>
there's like real regressions in some of the workloads, and then multiply 2% by the number of machines running in a big company's fleet
<kkd>
that's a lot of opex wastage suddenly because of upgrading the kernel
<kkd>
afaik google still carry a patch internally that enables CFS back
<heat>
that's sad
<heat>
but otoh you can't really handle every freaking case ever, can you
<kkd>
so i can totally see why people want something like sched_ext
<kkd>
yeah exactly
<kkd>
that's impossible
<kkd>
esp. if you want hyper specialization for a specific use case
<kkd>
well you can by having more schedulers in the tree
<heat>
yeah that could work
<kkd>
but apparently they want the one great scheduler to handle everything
<bslsk05>
lkml.org: LKML: Xi Wang: [PATCH 0/1] Add micro quanta scheduling class
<kkd>
it's used by google for their internal userspace network stack
<kkd>
to schedule its threads at us-scale latency
<kkd>
basically give each netstack thread 100us slices, and once it yields the rest of the us can be harvested by CFS tasks
<kkd>
the operational mode is that you have the netstack task on a cpu alone without mixing application threads, but you can use this to still run some kernel housekeeping kthreads
<kkd>
rcu callback processing and what not
<kkd>
and it just got shot down
<kkd>
and not even on the list, from what i heard the rejection was offline, f2f
<kkd>
so i just think it's accrued frustration from years of rejection that led to the current state
<kkd>
rejection or friction in upstreaming useful internal things
<heat>
ueaj
<heat>
yeah
blockhead has left #osdev [#osdev]
<heat>
fwiw im not a scheduler guy
<heat>
so i dont know if its just completely impractical to have multiple schedulers
<kkd>
you already have like 3-4 schedulers
<kkd>
there's one for realtime
<kkd>
then fair
<kkd>
they just added eevdf
<heat>
eevdf is fair
<kkd>
the core is already fairly well abstracted
<kkd>
that's exactly how sched_ext works, by just plugging callbacks written as BPF programs into ops struct
<kkd>
that could just be a kernel module supplying those callbacks
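The "callbacks plugged into an ops struct" shape kkd describes looks roughly like the toy below. This is a userspace sketch under stated assumptions, not the sched_ext interface: `struct toy_sched_ops`, `toy_register`, `toy_wake`, `toy_schedule` and the FIFO policy are all hypothetical names invented for the example, and the real ops table has far more hooks.

```c
#include <stddef.h>

#define TOY_QUEUE_MAX 16

/* Hypothetical callback table: a "scheduler" is just a set of hooks. */
struct toy_sched_ops {
        const char *name;
        void (*enqueue)(int task_id); /* task became runnable */
        int (*pick_next)(void);       /* next task id, or -1 if idle */
};

/* One possible policy: plain FIFO over a small ring buffer. */
static int fifo_queue[TOY_QUEUE_MAX];
static size_t fifo_head, fifo_tail;

static void fifo_enqueue(int task_id)
{
        if (fifo_tail - fifo_head < TOY_QUEUE_MAX) /* drop when full */
                fifo_queue[fifo_tail++ % TOY_QUEUE_MAX] = task_id;
}

static int fifo_pick_next(void)
{
        if (fifo_head == fifo_tail)
                return -1; /* nothing runnable */
        return fifo_queue[fifo_head++ % TOY_QUEUE_MAX];
}

const struct toy_sched_ops fifo_ops = {
        .name = "toy-fifo",
        .enqueue = fifo_enqueue,
        .pick_next = fifo_pick_next,
};

/* The "core" only ever talks to whichever ops table is registered. */
static const struct toy_sched_ops *active_ops;

int toy_register(const struct toy_sched_ops *ops)
{
        if (!ops || !ops->enqueue || !ops->pick_next)
                return -1;
        active_ops = ops;
        return 0;
}

void toy_wake(int task_id)
{
        if (active_ops)
                active_ops->enqueue(task_id);
}

int toy_schedule(void)
{
        return active_ops ? active_ops->pick_next() : -1;
}
```

Whether the table is filled in by BPF programs or by a module, the core side stays the same: it only calls through the registered pointers.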
<kkd>
heat: there's also the case where sometimes you want tight integration between the scheduler and the application
<bslsk05>
lwn.net: User-managed concurrency groups [LWN.net]
<kkd>
this is how they built an internal "fiber" programming model using pthreads
<kkd>
turns out when you drop the mitigations goop the context switch latency can be on the order of 50-100ns
<kkd>
or something low enough to be tolerable
<kkd>
the syscall latency is like 50ns anyway when mitigations are off
<kkd>
so it's approaching green thread switch latency
hwpplayer1 has joined #osdev
<kkd>
they just ended up building a fiber runtime on top of pthreads, there's a server thread that acts as the executor and a bunch of client threads that act as the "fiber" equivalents. You write synchronous code and when there's an event that requires suspension the framework hands over control to the server thread
<kkd>
that allows the server to then place the task on its own internal queue and schedule it back once there is an event
<kkd>
meanwhile it can schedule some other "fiber" thread that's ready to run
<kkd>
and it does so by doing a FUTEX_SWAP, i.e. on the same CPU, it will transparently context switch into the target thread
<kkd>
beginning its execution as a synchronous operation
<kkd>
and controlling its placement
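The control flow kkd describes (one server thread acting as executor, client threads acting as "fibers", only one of them running at a time) can be approximated in portable code. The sketch below is an assumption-heavy stand-in: FUTEX_SWAP is not used at all, the handoff is done with a pthread mutex and condition variable instead, and every name (`swap_to`, `fiber_main`, `run_fibers`, `fiber_trace`) is made up for the example. It shows the handoff pattern only, not the latency or CPU placement properties.

```c
#include <pthread.h>
#include <stdint.h>

#define NFIBERS 2
#define STEPS   2

static pthread_mutex_t fiber_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t fiber_cond = PTHREAD_COND_INITIALIZER;
static int fiber_owner;              /* 0 = server, n = fiber n may run */
int fiber_trace[NFIBERS * STEPS];    /* order in which fibers did work */
int fiber_trace_len;

/* Block until it is `self`'s turn to run. */
static void wait_turn(int self)
{
        pthread_mutex_lock(&fiber_lock);
        while (fiber_owner != self)
                pthread_cond_wait(&fiber_cond, &fiber_lock);
        pthread_mutex_unlock(&fiber_lock);
}

/* Hand control to `target` without waiting for it to come back. */
static void give_to(int target)
{
        pthread_mutex_lock(&fiber_lock);
        fiber_owner = target;
        pthread_cond_broadcast(&fiber_cond);
        pthread_mutex_unlock(&fiber_lock);
}

/* Stand-in for the swap: wake `target`, then sleep until resumed. */
static void swap_to(int self, int target)
{
        give_to(target);
        wait_turn(self);
}

static void *fiber_main(void *arg)
{
        int id = (int)(intptr_t)arg;

        wait_turn(id);
        for (int step = 0; step < STEPS; step++) {
                fiber_trace[fiber_trace_len++] = id; /* "synchronous" work */
                if (step + 1 < STEPS)
                        swap_to(id, 0);  /* suspend: back to the server */
        }
        give_to(0);                      /* finished: return control */
        return NULL;
}

/* Server/executor: resumes each fiber in turn, one at a time. */
int run_fibers(void)
{
        pthread_t threads[NFIBERS];

        fiber_owner = 0;
        fiber_trace_len = 0;
        for (int i = 0; i < NFIBERS; i++)
                pthread_create(&threads[i], NULL, fiber_main,
                               (void *)(intptr_t)(i + 1));
        for (int round = 0; round < STEPS; round++)
                for (int id = 1; id <= NFIBERS; id++)
                        swap_to(0, id);
        for (int i = 0; i < NFIBERS; i++)
                pthread_join(threads[i], NULL);
        return fiber_trace_len;
}
```

Each handoff here costs a wakeup plus a sleep through the regular scheduler; the point of the swap described above is to fold that into a single switch on the same CPU.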
<heat>
i mean, there's a small detail here
<heat>
google will try to upstream whatever works for them
<heat>
i dont know if this works for anyone else
<heat>
but i've seen this _multiple times_, with google in particular (but also every other big corp), where they try to hopelessly push whatever makes sense for them
<kkd>
that's definitely true, but there's a bunch of stuff already that's only used by two or three shops. as long as there's someone to maintain the code it's fine i guess.
<kkd>
and it's not as if they tried pushing it out immediately, they've been using it for what, 15 years or so now
<kkd>
the first time this was presented was back in 2012 or so iirc
<heat>
like for the new mseal stuff, they pushed it as a chrome feature. it's a chrome feature, the guy who added it has no grand vision for it. now they're adding subsequent patches while not caring about the fit
<kkd>
well yeah, but that's what you have maintainers for, to steer it into a generally usable thing
<heat>
sometimes things get automerged if you have a @google.com at the end of the email address
<heat>
and i like it when that doesn't necessarily happen
<kkd>
at least in this particular case i felt like it would have been more useful to have it merged than not
<kkd>
they proposed restartable sequences long ago for the same sort of thing that used this and that ended up getting merged eventually
<kkd>
not by them but somebody else picked it up
<kkd>
the funniest bit was that NetBSD had that restartable sequences concept almost always but it gets no credit
<heat>
yes and even with rseq there are some regrets wrt the interface
<bslsk05>
www.usenix.org: An Implementation of User-level Restartable Atomic Sequences on the NetBSD Operating System | USENIX
<heat>
such that some oracle guys (because of the glorious oracle db) are now proposing a _separate_ interface that works kind of like rseq but differently
hwpplayer1 has quit [Remote host closed the connection]
<kkd>
lol, it's like the guy sending that out never even looked into rseq
<kkd>
oh yeah, but in terms of execution speed it's not a big deal right? you take the hit only when you have the PF, and it gets handled by the exception handler and fixed up
<kkd>
so in the "correct" case you don't really have a real problem
<heat>
copy_to/from_user is very much measurable on x86 due to clac/stac
<kkd>
oh yeah i see
surabax has quit [Quit: Leaving]
eddof13 has quit [Quit: eddof13]
<kof673>
> something something late stage C yes, because it is the compiler :) but then what is it supposed to turn into?
<kof673>
</rhetorical>
vdamewood has joined #osdev
Halofreak1990 has joined #osdev
edr has quit [Quit: Leaving]
<the_oz_>
when google pushes a change upstream I trust it more when it has 1 maintainer that's not google than when it's an entire google product with a whole team behind it
<the_oz_>
because they kill things based on big company reasons that make no sense
d1rg has joined #osdev
d1rg has quit [Ping timeout: 268 seconds]
d1rg has joined #osdev
craigo has quit [Ping timeout: 248 seconds]
duckworld has quit [Remote host closed the connection]
d1rg has quit [Read error: Connection reset by peer]
d1rg has joined #osdev
Halofreak1990 has quit [Ping timeout: 268 seconds]
duckworld has joined #osdev
duckworld has quit [Remote host closed the connection]
duckworld has joined #osdev
agent314 has quit [Ping timeout: 268 seconds]
heat has quit [Ping timeout: 248 seconds]
vdamewood has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
gildasio has quit [Remote host closed the connection]
gildasio has joined #osdev
Arthuria has joined #osdev
vdamewood has joined #osdev
d1rg has quit [Ping timeout: 268 seconds]
d1rg has joined #osdev
hwpplayer1 has joined #osdev
gildasio has quit [Remote host closed the connection]
gildasio has joined #osdev
kiboneu has quit [Remote host closed the connection]
kiboneu has joined #osdev
d1rg has quit [Read error: Connection reset by peer]
* kof673
quietly takes a swig of absolut c
d1rg has joined #osdev
duckworld has quit [Remote host closed the connection]
<nikolar>
they use it at a uni my friend attends
<Lucretia>
go on...
<nikolar>
so they fixed it because they use it?
<Lucretia>
[11:45] <nikolar> lol i know why it's fixed
<Lucretia>
why?
<nikolar>
> they use it at a uni my friend attends
<Lucretia>
ok, was expecting to find out what the fix was
<nikolar>
oh no
<nikolar>
i don't know that
<nikolar>
cool
<nikolar>
funny thing is, i even helped that friend with a project using this exact xv6
<nikolar>
how did i forget about that
Halofreak1990 has joined #osdev
<Lucretia>
nikolar: check out commit 6a4d32dd7882d714315067e8919361f82f4a3137 from the pdos repo
<Lucretia>
not pdos, the os385 uic repo
<nikolar>
yeah figured
<nikolar>
-fno-pic and -nostdlib instead of -nodefaultlibs seem like the biggest changes
<Lucretia>
and linker script
<Lucretia>
oh, that's user
<nikolar>
interesting
<zid`>
istr it was dyna rel crap?
<zid`>
sounds like someone sneaked a default pie somewhere
<nikolar>
likely
<Lucretia>
the kernel doesn't show any rela.dyn sections
<zid`>
I'd hope not
hwpplayer1 has quit [Read error: Connection reset by peer]
<Lucretia>
I don't know what you mean then
<zid`>
The broken build
<Lucretia>
yeah...
<zid`>
was to do with rela stuff being produced where it shouldn't be
<zid`>
ofc the working kernel doesn't have that problem, it works
<Lucretia>
[12:25] <Lucretia> the kernel doesn't show any rela.dyn sections
<Lucretia>
that is the failing kernel
<zid`>
heisenkernel, doesn't work but works
<Lucretia>
what??
<Lucretia>
nikolar: any idea what he's on about?
<zid`>
<Lucretia> Get these errors building gcc 12, /lib64/libc.so.6: unknown type [0x13] section `.relr.dyn'
<zid`>
it was gcc
hwpplayer1 has joined #osdev
<zid`>
and relr
<Lucretia>
I don't get those on gcc14
<zid`>
yes, I expect the problem was someone accidentally a flag, that's all
TkTech has joined #osdev
<nikolar>
xv6 doesn't run on less than 256mb of ram
<nikolar>
Guess I'm not running it on my 386
<HeTo>
... what? by mb, do you mean KiB and not MiB?
<HeTo>
I mean if you really mean 256 millibits then yeah I'm not surprised it doesn't run on less than a bit
<nikolar>
Very funny
surabax has joined #osdev
msv has joined #osdev
<sortie>
I added a --version option to a program that sets the version to that version, rather than reporting the program's version. I may be going to GNU hell.
<gog>
muahaha
<sortie>
// TODO: These should be moved to upgrade_finalize.
<sortie>
// TODO: Or maybe actually the tix-collection create call above.
<sortie>
// TODO: Remove this.
<sortie>
// TODO: Except wait a moment there's the case of ports only.
<sortie>
^^^^^ You ever run into cases like this in your codebase? lol
<gog>
i don't write TODO lines
<gog>
i just don't do
<Mutabah>
Quite often :)
<sortie>
TODOs are a big part of how Sortix achieves quality. Anytime I have a quick thought or concern about the code, I immediately put in a TODO before I forget, and make sure I resolve them all before committing
Left_Turn has joined #osdev
heat has joined #osdev
hwpplayer1 has quit [Quit: I'll be back later]
<sham1>
At work we just add TODOs and then just never go back to them
<sham1>
We always intend to go back for them, but we never do
Turn_Left has joined #osdev
<heat>
as you should
<heat>
sortie, you're going to gnu hell
<heat>
it's all aclocal.m4 from here on out
<sortie>
oh no
<sortie>
There's no --help there's only the helpful manual page
Left_Turn has quit [Ping timeout: 272 seconds]
<heat>
why are you not using obstack yet??
<heat>
#include <error.h>
<Lucretia>
is the osdev.org forum admin here?
FreeFull has quit [Quit: rebooting]
<heat>
we have various mods here, including sortie and Mutabah
<Lucretia>
I can't log in and that theme is fucking blinding
<Lucretia>
as in bad
<heat>
if the hyper mega osdev forum admin is chase, you're fucked
<bslsk05>
github.com: Fix for 2025 by Lucretia · Pull Request #233 · mit-pdos/xv6-public · GitHub
<nikolar>
> NOTE: we have stopped maintaining the x86 version of xv6, and switched
<nikolar>
our efforts to the RISC-V version
<Lucretia>
I know
<nikolar>
Just saying
<Lucretia>
It's mentioned in the comment
<Lucretia>
x86 is still relevant
<heat>
hah lorenzo
<heat>
i know the guy
<Lucretia>
ah
<Lucretia>
it's a combination of all those patches which are required to get the thing compiled and run
Arthuria has joined #osdev
demindiro has joined #osdev
<demindiro>
The more I read the UEFI spec the more it feels like a microkernel-esque OS
netbsduser` has quit [Ping timeout: 276 seconds]
* Lucretia
found out today that Carl Sassenrath worked on an oo microkernel os at apple for scorpius/aquarius
Halofreak1990 has quit [Ping timeout: 268 seconds]
agent314 has joined #osdev
Left_Turn has joined #osdev
Halofreak1990 has joined #osdev
Halofreak1990 has quit [Ping timeout: 260 seconds]
frkazoid333 has quit [Ping timeout: 272 seconds]
frkzoid has joined #osdev
kpel has quit [Quit: Lost terminal]
<gog>
demindiro: it has services that are OS-like, but it has limitations that make it unsuitable for many types of applications
the_oz_ has joined #osdev
<gog>
like for example my boot code is a regular user-mode application and i wanted a consistent ABI for it to use, so i abuse the syscall mechanism a little. i have to swap GDTs and segment selectors so i have to disable interrupts during the times that i'm not making a "system call"
<gog>
that puts me in a fragile state where if the "hlt" instruction is executed, the whole system is irretrievably locked