carbonfiber has quit [Quit: Connection closed for inactivity]
wgrant has quit [Ping timeout: 252 seconds]
wgrant has joined #osdev
frkzoid has joined #osdev
gxt_ has quit [Remote host closed the connection]
gxt_ has joined #osdev
<heat>
should I discard SUID and SGID?
<heat>
security me would think so, but I think that a good chunk of UNIX revolves around it
<clever>
how else would you do things like sudo?
<clever>
or do you just never ssh in again as root?
<heat>
you could have a root daemon and run stuff from there
<heat>
I think that magisksu does that
<mrvn>
sshd doesn't need suid
<mrvn>
heat: do you mean the uid/euid/suid fields in a process or the suid bit in the filesystem?
<heat>
suid bit
<mrvn>
sometimes you want to give programs more capabilities than the caller has. Unless you want that dbus insanity from desktops
<heat>
suid is also insane
<heat>
in fact, even more
<mrvn>
at least it's simple and effective
<mrvn>
extended attributes to give capabilities would be more flexible
<heat>
it's not simple in practice
gog has quit [Ping timeout: 260 seconds]
<mrvn>
usage or implementation?
<heat>
usage
<mrvn>
I don't know. I want something to be able to access the audio devices owned by the audio group so I set sgid audio. seems simple enough.
<heat>
i don't care about sgid audio
<heat>
I care about suid root
<mrvn>
still simple. if it should run as root set suid root
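(An aside on the mechanics, since they come up repeatedly below: the suid/sgid "bit" is just a mode bit on the executable. A minimal sketch in C, with hypothetical paths; `chmod u+s` / `chmod g+s` from a shell does the same thing.)

    #include <sys/stat.h>

    int main(void)
    {
        /* suid root: assuming root owns the file, it runs with euid 0 */
        chmod("/usr/bin/frobnicate", 0755 | S_ISUID);
        /* sgid audio: assuming group audio owns it, it runs with egid audio */
        chmod("/usr/bin/beep", 0755 | S_ISGID);
        return 0;
    }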
<heat>
thank you mrvn
<heat>
you truly understand the intricacies of building a secure system around setuid
<mrvn>
Handling the uid/euid/suid correctly in code and implementing it in the kernel I find harder.
<mrvn>
heat: suid/sgid is pretty much an all-or-nothing system. Not sure you can talk about secure with that.
<mrvn>
But it's simple: You either trust the binary with your life or not. Black & white.
<mrvn>
it's easy to debug too. Ever tried to debug the dbus madness to find out why some 3rd party code gets an error trying to do something?
<heat>
it's horrible to debug
<heat>
have you seen the 30000000000000000000000 setuid privilege escalation bugs?
<zid>
I'm not sure if 'setuid' is the thing causing the bug though, or if "methods to run privileged operations" in general are just hard to do right.
<mrvn>
What has that got to do with debugging the setuid feature? That's about making it secure, which you can't. Any bug in a suid binary is probably an escalation.
<zid>
That is to say, I've also seen a lot of bugs in syscalls, security tokens, etc
<heat>
I don't give a shit about debugging the setuid features
<zid>
setuid is just one method, which will be as buggy as all the other things
<mrvn>
The same bugs in a foo server running as root and accessed via dbus have the same escalation problem.
<heat>
it's way easier
<heat>
your attack surface is so reduced
<mrvn>
you think the dbus interface has a smaller surface than the command line?
<heat>
yes
<clever>
i've seen a car before that had a dbus server running as root, listening on tcp, and it had a bloody command to just run anything as a shell command
<clever>
anybody that can connect to the wifi hotspot can abuse that
<clever>
the default WPA password, is based on the unix uptime of the first boot, before ntp has had a chance to fix the clock
<clever>
so it's just a hash of how many seconds it takes a fresh install to boot
<clever>
how much entropy is that? :P
<mrvn>
2 or 3 keys
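(Rough arithmetic behind that estimate: if a fresh install's first boot reliably lands in, say, a 30 to 35 second window, there are only about 5 or 6 candidate timestamps to hash, i.e. log2(6) ≈ 2.6 bits of entropy, so only a handful of keys to try.)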
<clever>
to protect the ECU from a malicious entertainment system, there is a dedicated MCU acting as a firewall between the 2 CAN busses
<clever>
and only when the entertainment system is put into flashing mode, can that MCU also be reflashed
<bslsk05>
blog.qualys.com: PwnKit: Local Privilege Escalation Vulnerability Discovered in polkit’s pkexec (CVE-2021-4034) | Qualys Security Blog
<heat>
et al...
<clever>
and it only reads code from the (normally) read-only flash
<clever>
except, for one spot, where it can run a shell script from the rw partition
<clever>
so a malicious payload can create that script, reboot into flashing mode, re-flash the MCU, and then attack the ECU!
<clever>
and it gets worse!
<clever>
the dbus port, is also accessible over the cellular modem....
<mrvn>
heat: aeh, polkit is that thing that you use with dbus on desktops. You've just shown that it has all the bugs of dbus plus the suid bugs.
<clever>
mrvn: i recently took a peek, and i can disable polkit on nixos, xfce will sanely detect the lack of polkit, and just disable the shutdown/reboot buttons
<clever>
and polkit has bit me a few times, i typed "reboot" into the wrong, NON ROOT shell, and it rebooted without any confirmation
<clever>
i expect non-root shells to lack root :P
<heat>
polkit is using setuid for privilege escalation
<clever>
yep
<heat>
this is not the first problem with setuid, nor will it be the last
<clever>
there was also a fuse bug
<heat>
setuid is not fucking easy, because it's impossibly hard
<mrvn>
heat: not really. polkit needs escalated privileges in some form or another. That's not the bug.
<clever>
a lot of fuse programs are setuid, because you need root to mount an fs
<clever>
and libfuse will helpfully do `modprobe fuse`, if it fails to open `/dev/fuse`
<mrvn>
polkit is buggy, letting others abuse the escalated privileges it runs under.
<heat>
i swear to fucking god
<heat>
are you trolling?
<clever>
modprobe was never designed to be setuid friendly, so modprobe respects an env var to change its config path
<clever>
the modprobe config, can remap `modprobe fuse` into a shell command
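(A hypothetical reconstruction of the pattern being described here, not libfuse's actual code: the fallback itself is reasonable, but in a setuid-root binary the inherited environment, e.g. modprobe's MODPROBE_OPTIONS, is attacker-controlled.)

    #include <fcntl.h>
    #include <stdlib.h>

    /* sketch: called while the process still has euid 0 */
    int open_fuse_dev(void)
    {
        int fd = open("/dev/fuse", O_RDWR);
        if (fd < 0) {
            /* runs modprobe as root with the *caller's* environment,
             * which can redirect it at a caller-supplied config */
            system("modprobe fuse");
            fd = open("/dev/fuse", O_RDWR);
        }
        return fd;
    }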
<heat>
"setuid is bad because its insecure and hard to use" "no setuid is great, programs that use it are buggy and bad"
<clever>
and if you exhaust the open fd count, opening /dev/fuse will fail
<zid>
I think security is just hard to use and buggy and bad :P
<zid>
disregard security
<clever>
heat: exactly, the fuse authors messed up, creating the above bug
<clever>
setuid isn't to blame, it's fuse & modprobe for making some bad assumptions
<heat>
setuid is to blame because it's an inherently flawed idea
<mrvn>
heat: no
<heat>
"lets keep everything as is, except the euid, what can go wrong?"
<clever>
heat: my understanding is that euid is there to make it easy to open things as the original user, so you don't wind up with bugs where you can `foo --config /etc/shadow` and it spits out your syntax errors
Ermine has quit [Ping timeout: 264 seconds]
<clever>
so you can temporarily drop root, but get it back, when running code you trust with root access
<mrvn>
clever: careful, you have to do that differently when using suid bits.
<mrvn>
(and that part I think is a bad design)
<clever>
without euid, you would either need to re-implement all permission checks in userland (moar bugs), or spawn a dedicated child proc to drop root, read the file, then pass you the contents out its stdout
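(A minimal sketch of that temporary drop, assuming a suid-root binary where the real uid is the invoking user; error handling elided. The saved set-user-id is what lets the process take root back afterwards.)

    #include <stdio.h>
    #include <unistd.h>

    FILE *open_as_caller(const char *path)
    {
        seteuid(getuid());           /* euid = real (calling) user */
        FILE *f = fopen(path, "r");  /* kernel checks the caller's perms */
        seteuid(0);                  /* saved-uid is still 0, so this works */
        return f;
    }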
<mrvn>
clever: you are talking about a different (although related) feature
<clever>
fuse also violates a lot of the normal rules, where root can get permission denied, and you have no way to really re-implement that
<clever>
yeah, i might be getting some names mixed up
<zid>
security is hard, disregard security
Ermine has joined #osdev
<zid>
stop letting people run commands on your machine and they won't be running them as root due to setuid bugs
<zid>
run all your software in the butt
<clever>
zid: if you trust every piece of software on the machine, and it's air-gapped, who needs security!
<mrvn>
heat: the fact remains that you will need some way to escalate privileges. Figure out something better that can't be abused through buggy code and everyone will thank you.
<mjg>
there was a script recently which you were supposed to curl | bash
<mjg>
it had a sudo invocation inside :D
<clever>
mrvn: the installer for the nix package manager does exactly that
<clever>
oops, mjg ^
<zid>
clever: I trust that the software is software, and therefore the worst it can possibly do is software things. The things I want to keep 'secure' on my machine are the least protected, my email account.
<mjg>
my assessment: LOL
<clever>
but if you have +w to /nix, it won't ask to use sudo
<zid>
software protects silly things, like the machine not being turned off by the wrong users etc
<clever>
polkit likes to violate that :P
<zid>
but polkit doesn't even ATTEMPT to secure the things I don't want others to have/see
<clever>
i have systems where the user with physical access shouldn't have such permissions
<zid>
so security is useless
<clever>
for example, i'm running a media center out of my NAS, and if you hit escape, you'll get a menu with the shutdown option
<mjg>
setenforce 0
<mjg>
first thing i do!
<zid>
If they can run commands on my machine, they can already completely ruin everything the moment they run rm -rf ~, or cat my firefox profile etc
xenos1984 has quit [Quit: Leaving.]
<zid>
whether they got root and I then have to reinstall the machine afterwards is sort of irrelevant
<mrvn>
clever: I go through all that trouble to disconnect the physical power button so users stop turning off the system and then they go and click "shutdown" in the GUI.
<clever>
mrvn: heh, ive not gone that far
gog has joined #osdev
<heat>
clever, if you dislike a specific polkit policy, you can change that
<heat>
it's all written in javascript, so not hard to pick up
<clever>
heat: yeah, the policy files are written in bloody javascript :P
<clever>
i could just `return false` everything
<heat>
it was either that, python or a domain specific language
<clever>
/etc/polkit-1/rules.d/10-nixos.rules just has a single rule for me, polkit.addAdminRule(function(action, subject) { return ["unix-group:wheel"]; });
<mrvn>
clever: group wheel? why should wheel have admin rights and adm not?
<clever>
and there is a security.polkit.adminIdentities config flag, with a default value of [ "unix-group:wheel" ]
<clever>
that polkit.addAdminRule part is always there, no way to remove it, but you can easily make it return [];
<clever>
extra rules can be added as well
<mrvn>
You could throw out the whole polkit mess and just use suid and make the binaries executable by group wheel.
<bslsk05>
github.com: nixpkgs/polkit.nix at master · NixOS/nixpkgs · GitHub
<clever>
mrvn: or just use `sudo reboot` and yeah disable polkit entirely
<heat>
i trust polkit a lot more than sudo
<clever>
so i get a password prompt and time to re-think my actions
<heat>
and certainly setuid
<mrvn>
How many thousands of lines of code does polkit add just to reimplement the user/group permissions in your case?
<clever>
heat: my problem is with `reboot` just working even when i lacked root, and having zero confirmations
<clever>
mrvn: all of spidermonkey, the JS engine that powers firefox, last i looked
<heat>
it doesn't reimplement anything
<clever>
i trust sudo over polkit
<heat>
it allows way way more fine-grained security
<gog>
i trust nothing because security is a myth
<heat>
clever, sudo???
<mrvn>
clever: do you know molly-guard?
<heat>
have you looked at it? don't forget it's running in a suid binary
<mrvn>
molly-guard - protects machines from accidental shutdowns/reboots
<clever>
mrvn: oooo
<clever>
heat: yes, i know sudo is setuid root, that's how it gets the ability to setuid to the chosen user (sudo -u foo)
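(What that boils down to is roughly the following sketch, error handling elided; the ordering matters, because once setuid() drops root the other two calls would no longer be permitted.)

    #include <grp.h>
    #include <pwd.h>
    #include <unistd.h>

    /* permanently become the target user, then run the command */
    void run_as(const struct passwd *pw, char *const argv[])
    {
        initgroups(pw->pw_name, pw->pw_gid);  /* supplementary groups first */
        setgid(pw->pw_gid);                   /* then the gid */
        setuid(pw->pw_uid);                   /* then the uid: root is gone now */
        execvp(argv[0], argv);
    }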
<mrvn>
Makes you confirm by typing in the hostname of the system you want to shutdown if you aren't on the physical console.
<clever>
mrvn: what would xterm be counted as?
<heat>
clever, sudo is huge
<heat>
doas is way better in that regard
puck has quit [Excess Flood]
<clever>
bbl
puck has joined #osdev
<mrvn>
clever: I think that works but you can install a script that checks if you are root or not and asks for confirmation easily
<mrvn>
clever: I use it for not shutting down systems via ssh
gelatram has joined #osdev
bauen1 has quit [Ping timeout: 252 seconds]
gog has quit [Ping timeout: 264 seconds]
FreeFull has joined #osdev
gelatram has quit [Quit: Ping timeout (120 seconds)]
Killy has joined #osdev
bauen1 has joined #osdev
gog has joined #osdev
gelatram has joined #osdev
bauen1 has quit [Ping timeout: 260 seconds]
gelatram has quit [Ping timeout: 252 seconds]
nyah has quit [Remote host closed the connection]
bauen1 has joined #osdev
alexander has quit [Quit: ZNC 1.8.2+deb2+b1 - https://znc.in]
alexander has joined #osdev
smach has quit [Ping timeout: 260 seconds]
wxwisiasdf has joined #osdev
<wxwisiasdf>
hello operating system development irc channel
<mjg>
hello random stranger
<wxwisiasdf>
I am trying to make an OS on a... well, rather unusual hardware board
<wxwisiasdf>
I've seen uClinux but it's apparently dead, any recommendations?
<wxwisiasdf>
i could just roll out my own OS though, but i don't feel like reinventing the wheel needlessly :P
<wxwisiasdf>
i just want to know if there is a rather "port-friendly" OS out there for embedded?
<GeDaMo>
CP/M? :P
<wxwisiasdf>
haha, no :)
<heat>
what's the arch?
<wxwisiasdf>
custom
<heat>
usually as far as I know, netbsd is the goto for super portable shit
<wxwisiasdf>
alright
gxt_ has quit [Remote host closed the connection]
gxt_ has joined #osdev
netbsduser` has joined #osdev
darkstardev13 has joined #osdev
archenoth has joined #osdev
darkstarx has quit [Remote host closed the connection]
netbsduser has quit [Read error: Connection reset by peer]
* geist
yawns
<geist>
good afternoon everyone
* mjg
yawns back
Oshawott has quit [Ping timeout: 252 seconds]
<heat>
sup geist
* mjg
looks at adding 'echo' and 'touch' keywords to bmake
<mjg>
if equivalent functionality is not already present
wxwisiasdf has quit [Quit: leaving]
<geist>
noice. not sure gmake has a touch (though it does have direct file access nowadays) but it does have echo in the form of $(info ...)
<mjg>
you can 'echo' in a similar manner as well in bmake, but it's not equivalent to regular echo
<mjg>
it is prefixed with some stuff
<mjg>
afair bmake and submake level
<mjg>
anyhow the intent is to whack some of the forks + execs which are clearly avoidable
<heat>
b-b-but that's not the unix philosophy
<mjg>
... and which happen a lot
<geist>
yah, what i never completely understood until someone that knew gnu make explained it was that gmake itself doesn't actually always start a shell for every line of a rule
<geist>
it's only 'complicated' lines that it does that for, otherwise it runs a lot of them inline. probably basically what you're doing here
<geist>
basically trivial stuff like echos without environment variable substitution it does there
<dzwdz>
i wonder if creating a frankenstein make with embedded gcc could speed up builds
<mjg>
man my poor eyes bleed when i strace bmake
<mjg>
:(
<heat>
geist, it aliases SHELL invocations when SHELL is a well known value
<mjg>
/bin/sh -e -c echo "===> License GPLv2 accepted by the user"
<mjg>
and so on
<geist>
heat: right. and then for trivial lines. if the lines are more complicated then it punts it to the shell
<mjg>
that i don't mind
<mjg>
but if the content is known at compilation time, PLZ DON'T
<geist>
so i used to try to merge lots of adjacent lines in gmake with the idea to reduce the number of forks, but it actually can backfire
<heat>
yeah
<geist>
because combining trivial lines it already wasn't going to fork actually causes it *to* fork
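(The heuristic being described is roughly this sketch, not gmake's actual code: a recipe line with no shell metacharacters can be exec'd directly, and gluing such lines together can push the result over the threshold.)

    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* does this recipe line need a real shell? */
    static int needs_shell(const char *cmd)
    {
        return strpbrk(cmd, "&|;<>()*?[]$`\\\"'#~") != NULL;
    }

    static void run_line(char *cmd)
    {
        if (fork() != 0) { wait(NULL); return; }
        if (needs_shell(cmd)) {
            execl("/bin/sh", "sh", "-e", "-c", cmd, (char *)NULL);
        } else {
            char *argv[64];
            int argc = 0;   /* naive whitespace split */
            for (char *t = strtok(cmd, " \t"); t && argc < 63; t = strtok(NULL, " \t"))
                argv[argc++] = t;
            argv[argc] = NULL;
            execvp(argv[0], argv);   /* one fork, no shell process */
        }
        _exit(127);
    }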
<mjg>
you may appreciate this bit
<mjg>
building freebsd base system -j 104: 30689.14s user 2509.58s system 6453% cpu 8:34.45 total
<geist>
noice!
<mjg>
llvm linker spawns 104 threads every single time
<mjg>
when i limit that to 1....
<mjg>
30632.01s user 2392.17s system 6384% cpu 8:37.24 total
<mjg>
iow 3 seconds of a difference for spawning 1 thread instead of 104
<mjg>
great win llvm, worth it
<heat>
lld isn't very multithreaded
<geist>
well, is it 104 llvm instances each running 104 threads?
<mjg>
at times yes
<geist>
on what appears to be a 64 core machine
<mjg>
it is 104 thread machine
<mjg>
the build process is not *that* parallel all of the time
<mjg>
plus there is some contention putting stuff off cpu
<geist>
ah yeah, okay that makes more sense
<geist>
yah 100% util on a largeass multistage build like that is a holy grail
<mjg>
general point being, they put no upper limit on how many threads they decide to spawn, apart from your thread count
<mjg>
and it gets ridiculous quite fast
ghee has joined #osdev
<mjg>
--threads=1 links the kernel in ~1.5 seconds
<geist>
yeah i can see that. i wonder if it just proactively spawns the threads but then only uses them in some sort of worker fashion
<mjg>
--threads=8 in 0.6
<geist>
such that in reality only 1 or 2 are used
<mjg>
that's literally what it is doing
<mjg>
makes me a sad panda
<geist>
yah i bet it only gets some capability to parallelize with sufficiently large programs that it can carve up
<mjg>
i created a ticket about it several months ago, no response
nyah has joined #osdev
<geist>
my guess is you just really won't see much of a win against more plain C codebases
<mjg>
right, i wanted them to estimate on input files instead of going in blindly
<mjg>
they literally spawn this for a hello world man
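(The estimate being asked for could be as simple as this sketch; the function name and the 64 MiB-per-worker figure are made up for illustration, not lld's actual API.)

    #include <stddef.h>

    static size_t pick_link_threads(size_t input_bytes, size_t hw_threads)
    {
        size_t want = input_bytes / (64u << 20);  /* ~1 worker per 64 MiB of input */
        if (want < 1)
            want = 1;                             /* hello world gets 1, not 104 */
        if (want > hw_threads)
            want = hw_threads;
        return want;
    }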
<geist>
you really need a huge expansion of code that then gets merged for lld to be doing a ton of work
<geist>
or LTO
<geist>
yah
<mjg>
i don't mind for chromium et al
<mjg>
although i don't know how many threads they can really use
<geist>
yah and the 1 vs 8 number does also bring up the point that i've seen with rustc in fuchsia: adding more threads to the LTO linker doesn't spread the same amount of cpu across multiple threads
<heat>
sounds like they should use make's jobserver?
<geist>
each new thread tends to do say 80% of the total work, so you end up burning more and more total cpu time across all of them
<geist>
reason being that each of the workers duplicates a lot of the work, since each of them carves out a chunk of the binary and starts from scratch as far as codegen and whatnot
<geist>
the wall time is probably faster as you add more threads, but the total cpu time goes up a lot
<mjg>
i don't know what they do in terms of the actual work they need to accomplish
<geist>
so in isolation (one lld on a dedicated machine) you may as well -j as much as you can, but as part of a larger build it's not a win, you almost want -j1 for large builds
<mjg>
i can tell you the llvm linker itself just does not scale re its work list management
<geist>
except then you can end up with one of the linkers being the long pole in the tent
<mjg>
for one all the spawned threads start with taking a global lock to unlink something for them to do
<geist>
like i said i suspect it's more than just that. it's also very possible the linkers are duplicating work
<mjg>
so there is tons of bouncing on and off cpu from the get go
<mjg>
i have no doubt there is waste there as well
<mjg>
just saying how it looks from the OS pov
<geist>
are you seeing a lot of demand faults? GN, our build system on fuchsia, for example *nails* the heap in really degenerate ways
<geist>
on a linux machine when it's doing a gn gen it's easily >1mil soft faults/sec
<mjg>
ye i'm doing more
<mjg>
i don't have exact numbers stored
<geist>
and on windows WSL it takes like 20 minutes because those soft faults are really badly emulated
<mjg>
part of the problem is that bmake itself forks into oblivion
<geist>
a different problem, but one of those cases where if you develop a tool on something like linux you might not notice that it'll lean on stuff that linux is well optimized for
<heat>
>and on windows WSL it takes like 20 minutes because those soft faults are really badly emulated huh?
<geist>
heat: yeah! it was interesting sleuthing
<mjg>
funny bit, but so happens parallel thread creation and destruction is way faster on freebsd than on linux :)
<geist>
note this is WSL1 vs WSL2. WSL2 being just linux in a can runs GN fine
<mjg>
(one of the few things where freebsd is legitimately faster)
<heat>
why does it have issues soft faulting?
<heat>
windows should still have it no?
<heat>
and since it's anon memory, yadda yadda
<mjg>
what fucks linux up on this is their lack of process abstraction and resulting numerous tasklist_lock acquires
<geist>
heat: it's because it's hitting the heap in a way that causes it to need to expand and shrink the heap aggressively, much past the point where the heap is asking for and freeing large chunks of address space
<clever>
that reminds me, somebody in the osdev discord was mentioning that they like the internal design of linux, but they dislike the userland and syscall api
<geist>
so basically it's wailing on the aspace, adding and removing mappings, and faulting them in
<clever>
and then i was wondering, how hard is it to have a different syscall table for each process?
<mjg>
clever: it is not. bsd can do it.
<clever>
a bit of research later, i found the answer, very hard :P (at least on linux)
<heat>
clever, bsds can do it, linux can (or could?) do it, windows can do it
<clever>
every arch in linux, implements the syscall lookup table differently!
<mjg>
freebsd is doing it a lot with linux emul
<clever>
and for the 2 arches i checked, it's a global array of handlers
<clever>
x86-64 implements the lookup in plain c
<mjg>
heat: even illumos can do it! :)
<clever>
arm32 does the lookup in raw asm!
<geist>
heat: what we also found was a) adding/removing anon regions in WSL1 is comparatively slow and b) there is some sort of internal lock contention in the WSL1 'vm'. GN is heavily multithreaded and the more threads you add to the mix while fiddling with the aspace == exponential slowdown
<clever>
so if i wanted to add this feature to linux, i would have to modify how every arch handles syscalls
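(The feature itself is conceptually small; the pain is that Linux has no single choke point for it. A sketch of what the dispatch looks like when the table hangs off the process, all names hypothetical.)

    #include <stdint.h>

    #define ENOSYS 38

    typedef long (*syscall_fn)(uintptr_t, uintptr_t, uintptr_t,
                               uintptr_t, uintptr_t, uintptr_t);

    struct process {
        const syscall_fn *sys_table;  /* per-process: an emulated ABI just
                                       * installs a different table */
        unsigned int sys_count;
    };

    static long do_syscall(struct process *p, unsigned int nr, uintptr_t a[6])
    {
        if (nr >= p->sys_count || !p->sys_table[nr])
            return -ENOSYS;
        return p->sys_table[nr](a[0], a[1], a[2], a[3], a[4], a[5]);
    }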
<mjg>
geist: so fuchsia builds on linux?
<mjg>
geist: maybe i'll bench it on freebsd under linux emul
<geist>
and all the demand faults means the whole thing ends up collapsing to single threaded speed and most of the time is spent in the kernel blocked on
<mjg>
geist: it should work(tm)
<heat>
fuchsia only builds on mac and linux
<geist>
it *used* to build under freebsd and netbsd but we rely on a crapton of prebuilts
<geist>
so yeah you'd have to use linux emul
<geist>
might just work
<mjg>
fwiw linux emul is good enough to build the linux kernel
<geist>
i'm sure there will be silly edge cases, there are lots of build scripts that use uname to determine what host it's on, etc
<geist>
so you might have to fake it out at that level
<mjg>
uname lies and claims linux 3. something
<heat>
downvote
<heat>
where's 2.6???
<mjg>
in an optional setting
<geist>
but the build *should* be pretty hermetic. FWIW the amount of toolchains and prebuilt tools fuchsia downloads is largely to be highly hermetic
<heat>
damn right
<heat>
and 2.4?
<mjg>
in linus's butt
<geist>
even 'host' tools that fuchsia builds as part of its run are built with downloaded toolchains
<mjg>
geist: the only worry here is that your stuff is using a syscall which is not implemented
<mjg>
which is not very likely
<geist>
yah it's just tools so probably not
<geist>
gn + ninja + a bunch of prebuilt toolchains
<geist>
and some python3
<mjg>
that should all be fine
<mjg>
well modulo bugs maybe ;)
<geist>
if something gives you trouble it'll probably be a linux Go or Dart binary
<geist>
Go in particular is pretty wonky
<mjg>
fwiw, believe it or not, freebsd can build linux only slightly slower than linux
<mjg>
something like +5% more total real time
<mjg>
i'm working on it
<geist>
noice
<mjg>
it is mostly off cpu time stemming from shit i have not even looked at
<geist>
yah at some point we had a grand plan to build fuchsia on fuchsia, but that got sufficiently difficult around 2018 when we started leaning more and more on prebuilt toolchain binaries
<geist>
there was a brief period where we self hosted for like a month though. back when it wasn't much more than LK + a simple but functional user space
<mjg>
there is a stupid problem where faulting on the same page gets an exclusive lock on it
<mjg>
and then you immediately go off cpu if you wait
<geist>
that was when we could still build on freebsd and netbsd and whatnot too, since it was just using the LK build system, written in gmake
<mjg>
this can be patched to use shared locking, but there are stupid tech debt thingies which need fixing first
<mjg>
thanks mach vm
<heat>
geist, self hosted?
<geist>
heat: sentence fragment?
<heat>
i would think building fuchsia early on is sufficiently hard
<geist>
actually no, was easier early on
<geist>
since we were mostly just plain C and C++ then, and the LK build system already understood this stuff
<geist>
we had very quickly built up a posix-lite environment with musl + some file systems, and we were using gcc and binutils then, so it was quite easy to get that working
<heat>
interesting
<mjg>
geist: do you happen to have numbers from building fuchsia on something > 16 threads?
<geist>
was a case of being a mid-hobby os class design in terms of being advanced, just could do it much faster because we already had 10-20 people working on it
<geist>
heat: since then the component based design and whatnot has pivoted the design somewhat away from being able to do command liney stuff like that
<geist>
which is still somewhat of an open question: how does someone really *use* fuchsia interactively
<geist>
there's always a push internally to get rid of the shell, and a fair amount of folks push back, so it's a bit of a detente
<geist>
but conflict is fine. that's the essence of engineering
<geist>
mjg: hmm, well, it's all over the place. depends on which level of fuchsia you build
<geist>
and how much server assist you get. a 16 core machine with no assist, building a 'core' build is i think in the 30-40 minute range?
<heat>
geist, yeah. building fuchsia on fuchsia sounds great though
<heat>
I guess the best chance you have is starnix? but that's probably super slow
<geist>
heat: yeah agreed. we're just so far away from that right now i dunno if we'll get back
<geist>
though it's really a topic for the discord channel
<mjg>
get rid of the shell?
<geist>
mjg: it's hard to verbalize how much fuchsia is *not* a posix system
<mjg>
you mean they don't like the posix-like shell (or whatever you got) or the concept in general?
<mjg>
even windows eventually got powershell
<geist>
yah and i'd say windows is much more aligned with posix than fuchsia is
<mjg>
i'm saying some form of usable command line is probably mandatory, it does not have to pretend to be unix
<geist>
i 100% agreed
<geist>
i am pro-command line team
<mjg>
team command line unite
<geist>
i think there's a contingent of folks that have the notion that you can build a more powerful sea of components that can be automatically started and solved for some sort of dynamic task
<geist>
which is probably pretty neat, i just don't know how that works with a user sitting in front of it
<geist>
but anyway, i shouldn't complain about it, at least here
<heat>
yeah. i don't know if a fully modular, capability-based system is super usable like that
<heat>
at some point using bash and some pipey bois to make stuff happen surpasses the need for super locked down stuff
<mjg>
worse is better
<geist>
or at least, command line is tuned for what folks like us think. i think there's some slightly different paradigms than what posix shells do that are interesting, but posix shells are kinda the lowest common denominator
<geist>
(vs say, some sort of command line job based thing you got on some other OSes at the time)
<mjg>
unix is shit, let's be real
<mjg>
but it also beats the shit out of typical GUI
<geist>
yah but it's *just* powerful enough to let you build something powerful on top of it. that's the genius of it
<geist>
the line is drawn just at the right spot
<mjg>
agreed
<heat>
how well does the shell work if you can't even .. though
dude12312414 has joined #osdev
dude12312414 has quit [Remote host closed the connection]
dude12312414 has joined #osdev
sonny has joined #osdev
<sonny>
I just thought "what if every user gets a copy of the OS" and realized that's virtualization
<dzwdz>
what, like the matrix?
<sonny>
no, like vmWare
<j`ey>
what about it?
<sonny>
the only thing that's left is the language based OS I guess
<sonny>
j`ey: nothing, I am just thinking about what is possible
<geist>
hmm, not sure i follow there
<sonny>
with a language-based OS there's no more need for an additional runtime, your server app can just be a function
<vin>
What is the policy that determines when page caches are flushed? Also how many flusher threads are spawned and when? For example if there is a cpu bound workload after a bunch of writes to the page cache, it wouldn't make sense to flush them because it would disrupt the cpu bound workload.
<vin>
Also do these flushes (from kernel pages) to disk happen through DMA?
<dzwdz>
page caches are overrated
<vin>
I am just trying to understand how linux does it dzwdz
<dzwdz>
sorry
<geist>
to answer the latter question (DMA), almost certainly
<geist>
but really it's just a matter of whatever the device driver does
<geist>
for pages that one way or another need to get flushed out to disk/storage they'll go through whatever the driver already does
<geist>
anything halfway modern uses at least some sort of DMA
<vin>
I see, so you are saying the flushing of dirty pages won't have a big impact on CPU load.
<heat>
vin, this is not a linux internals channel
<heat>
but anyway, block devices each have a flusher thread
<heat>
you dirty a page on an inode, that page gets set as dirty, that page gets added to the dirty list in the inode, the inode gets added to the bdev's dirty inode list
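(A sketch of that bookkeeping with hypothetical names; every kernel spells it differently, and locking is omitted.)

    #include <sys/queue.h>   /* BSD list macros */

    struct page {
        LIST_ENTRY(page) dirty_link;        /* on its inode's dirty list */
    };

    struct inode {
        LIST_HEAD(, page) dirty_pages;
        LIST_ENTRY(inode) dirty_link;       /* on its bdev's dirty-inode list */
    };

    struct bdev {
        LIST_HEAD(, inode) dirty_inodes;    /* what the flusher thread walks */
    };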
<vin>
heat: my bad, the purpose of using something specific (like linux) is to get the conversation started and later generalize to the best possible way to do it.
<heat>
well, these details change from kernel to kernel
<heat>
but anyway, it's done on a time limit
<klange>
We rather specifically push against using Linux as a discussion point here, as there's plenty of other places on Libera to talk Linux.
<heat>
I can't remember the sysfs knob that does it, but there's one
<klange>
Not a rule, but a community preference.
<heat>
and there's an easy way to test this
<geist>
all this aside, as a general rule most systems try to flush dirty pages as a combination of time and memory pressure
<heat>
getrusage(), write to shared file page (mmap), loop while getrusage().faults == oldrusage
<geist>
ie, a) make sure any dirty page doesn't stick around longer than N units of time (say 30 seconds)
<heat>
yeah, you can force this of course
<heat>
evicting an inode from the icache will sync it, fsync(file) will sync it, sync() will sync it
<geist>
and b) try to keep the number of total pages in the system that are dirty below some threshold
<vin>
klange: got it! My experience has only been linux and xv6, hence the question.
<geist>
there are tons of algorithms for this, but generally they are trying to do something along these lines
<geist>
a degenerate case you don't really want to get into is some super high percentage of pages in the system being dirty and waiting for writeback
<geist>
generally better in that case to apply backpressure against the process(es) that are generating pages
<geist>
while the writeback happens
<geist>
vs just purely reacting to dirty pages
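(Those two rules, plus the backpressure point, as a sketch of a flusher thread; the helpers and thresholds are hypothetical.)

    #include <unistd.h>

    struct bdev;
    void writeback_older_than(struct bdev *bd, int secs);
    int  dirty_pct(struct bdev *bd);          /* % of pages currently dirty */
    void writeback_some(struct bdev *bd);

    void flusher_thread(struct bdev *bd)
    {
        for (;;) {
            writeback_older_than(bd, 30);   /* a) bound dirty-page age */
            while (dirty_pct(bd) > 20)      /* b) bound total dirty fraction */
                writeback_some(bd);
            /* past a higher threshold, the writers themselves would be
             * throttled (backpressure) rather than handled here */
            sleep(5);
        }
    }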
<heat>
an interesting property of implementing mmap MAP_SHARED and dirtying is that flushing a page needs to mprotect every mapping of that page back to write protected
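(This is also why heat's getrusage() test above works: writeback flips the PTE back to read-only, so the next store takes another soft fault. A userland sketch, error handling elided and the file name hypothetical; whether the fault shows up as minor or major is kernel-dependent.)

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/resource.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("scratch", O_RDWR | O_CREAT, 0644);
        ftruncate(fd, 4096);
        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

        struct rusage old, now;
        *p = 1;                              /* dirty the page once */
        getrusage(RUSAGE_SELF, &old);
        do {
            *p = 1;                          /* no fault while the PTE is writable */
            usleep(10000);
            getrusage(RUSAGE_SELF, &now);
        } while (now.ru_minflt == old.ru_minflt && now.ru_majflt == old.ru_majflt);
        printf("writeback happened: the page was re-protected\n");
        return 0;
    }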
<vin>
I am curious about how the flush (by multiple threads) can potentially disturb other workloads in a multi-tenant setup
<vin>
Even if we assume the new workload is not doing any IO, the act of flusher threads being scheduled over the workload can be bad.
<geist>
sure
<geist>
but that's just a fact of life. the system has to do work so stuff can continue to do work
frkzoid has quit [Ping timeout: 244 seconds]
<geist>
in general the cpu % of threads doing these background flushes are pretty low compared to user space work, probably. definitely more so now than say 20-30 years ago
<heat>
yeah, services can expect to get interrupted in high performance systems
<heat>
at cloudflare our edge network machines run all sorts of performance-critical services at the same time
<vin>
Right, I would like to quantify this potential performance degradation. Given there will be larger page caches in the future (with CXL), background flushes might need rethinking?
<heat>
why will there be larger page caches?
<vin>
larger number of pages cached. Since per machine DRAM capacity is poised to grow drastically with the number of cores remaining pretty much the same.
<heat>
and why would that matter, really
<vin>
With this CXL era
<heat>
well, that's wrong then
<vin>
how?
<heat>
increasing memory without increasing cpus will lead to some interesting imbalances
<vin>
yup
<mjg>
please reduce memory access time kthx
<heat>
probably CXL will only impact people on the compute end
<heat>
and writeback isn't something you do frequently on compute, probably
<heat>
at least minimizing that is the goal
<heat>
at the end of the day, no sane person is running 256GB of ram on a PATA hard drive and a core-duo
<heat>
when RAM increases, you get more CPUs, faster CPUs, faster storage with more IO queues, etc
<heat>
and really, what alternative do you have to the current writeback system(s)? they're all going to be similar to what we have now
<heat>
if you assign a thread to multiple bdevs, writeback will be slower
<heat>
if you assign multiple threads to a single bdev, then that's just weird
<heat>
I guess you could theoretically have one thread per io queue?
<vin>
So far it is believed that memory capacity is the bottleneck for most apps, so you were forced to run the app on two servers communicating over the network. This is what CXL is trying to solve, avoiding the need to scale out just because you don't have enough per node memory capacity.
heat has quit [Ping timeout: 260 seconds]
heat_ has joined #osdev
sonny has left #osdev [#osdev]
Oshawott has joined #osdev
archenoth has quit [Ping timeout: 268 seconds]
<zid>
Good news, I managed to make the cd-rom drive on the playstation send me a spurious INT 0, and break every single emulator
ptrc_ has joined #osdev
eschaton_ has joined #osdev
fkrauthan_ has joined #osdev
<geist>
hah nice. they don't bother emulating cdrom int support?
jstoker has quit [Ping timeout: 268 seconds]
antranig| has joined #osdev
<geist>
is the interface to the cdrom on those things even remotely standard?
tomaw_ has joined #osdev
dayimproper has quit [Ping timeout: 244 seconds]
PotatoGim_ has joined #osdev
Patater has quit [Ping timeout: 240 seconds]
travisg_ has joined #osdev
froggey has quit [Ping timeout: 268 seconds]
ornitorrincos_ has joined #osdev
bleb_ has joined #osdev
HeTo_ has joined #osdev
nyah_ has joined #osdev
sbalmos1 has joined #osdev
pretty_d1 has joined #osdev
Emil_ has joined #osdev
sbalmos has quit [Killed (NickServ (GHOST command used by sbalmos1))]
sbalmos1 is now known as sbalmos
joe9_ has joined #osdev
shikhin_ has joined #osdev
shikhin has quit [Killed (NickServ (GHOST command used by shikhin_))]
shikhin_ is now known as shikhin
nyah has quit [*.net *.split]
pretty_dumm_guy has quit [*.net *.split]
puck has quit [*.net *.split]
joe9 has quit [*.net *.split]
elastic_dog has quit [*.net *.split]
ptrc has quit [*.net *.split]
tomaw has quit [*.net *.split]
HeTo has quit [*.net *.split]
Clockface has quit [*.net *.split]
fkrauthan has quit [*.net *.split]
ornitorrincos has quit [*.net *.split]
eschaton has quit [*.net *.split]
aejsmith has quit [*.net *.split]
stux has quit [*.net *.split]
weinholt has quit [*.net *.split]
bleb has quit [*.net *.split]
Emil has quit [*.net *.split]
travisg has quit [*.net *.split]
Raito_Bezarius has quit [*.net *.split]
PotatoGim has quit [*.net *.split]
antranigv has quit [*.net *.split]
tomaw_ is now known as tomaw
ptrc_ is now known as ptrc
travisg_ is now known as travisg
bleb_ is now known as bleb
fkrauthan_ is now known as fkrauthan
PotatoGim_ is now known as PotatoGim
elastic_dog has joined #osdev
Raito_Bezarius has joined #osdev
puck has joined #osdev
pretty_d1 has quit [Quit: WeeChat 3.5]
nyah_ is now known as nyah
dayimproper has joined #osdev
jstoker has joined #osdev
aejsmith has joined #osdev
Patater has joined #osdev
nyah has quit [Ping timeout: 268 seconds]
antranig| is now known as antranigv
<heat_>
nothing I enjoy more than debugging refcount bugs
hbag has joined #osdev
matt__ has joined #osdev
matt__ is now known as freakazoid333
heat_ is now known as heat
<ebb>
I would simply count the number of references
<heat>
great tip
<heat>
would've never come up with that on my own
<ebb>
It hasn't let me down so far
freakazoid333 has quit [Ping timeout: 255 seconds]
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
<zid>
heat: have you considered refcounting your refcounts
<zid>
then if they don't match, you have a refcount bug
* zid
preens
<heat>
refcountsan?
<zid>
refcountcountsan
<zid>
for when you want to check that your refcount count doesn't have UB