<bslsk05>
old.reddit.com: _nandi comments on How to learn synchronization in operating systems?
pandry has joined #osdev
<Mutabah>
A bit of a troll... but they're kinda not wrong
<zid>
It's just naive
<moon-child>
i am a RUST PROGRAMMER
<zid>
"I am content with x, so y is all that matters"
<moon-child>
i protect everything with LOCKS
<moon-child>
FEARLESS SCALABILITY
<zid>
some people need complicated hand tuned assembly intrinsics for their locks and lock checkers and profiling and crap
<moon-child>
plus i NEVER DEADL--
<zid>
some people.. only need rust :p
<nur>
rust is a pain in the ass
pandry has quit [Ping timeout: 245 seconds]
<nur>
writing kernels is a pain in the ass
<moon-child>
zid: those people should probably be using something higher level, then
<nur>
why should my ass take double the pain
<moon-child>
where you can express your parallelism declaratively and let the system take care of it uniformly
<moon-child>
though there is kindof a lack of good options there if you stop short of distributed-computation-level stuff, unfortunately
lg has joined #osdev
GeDaMo has joined #osdev
pandry has joined #osdev
goliath has joined #osdev
pandry has quit [Ping timeout: 268 seconds]
Left_Turn has joined #osdev
pandry has joined #osdev
pandry has quit [Ping timeout: 260 seconds]
pandry has joined #osdev
gog has joined #osdev
pandry has quit [Ping timeout: 264 seconds]
pandry has joined #osdev
gareppa has joined #osdev
gareppa has quit [Client Quit]
[_] has joined #osdev
[itchyjunk] has quit [Ping timeout: 268 seconds]
stylefish has joined #osdev
stylefish has quit [Quit: Leaving]
stylefish_ has joined #osdev
<maybemjg>
moon-child: i'm not opening that
<maybemjg>
moon-child: someone link the poor sap the perfbook
<maybemjg>
trade secret: most kernel programmers don't know almost anything about locks
<maybemjg>
or how to write code w.r.t. performance
<maybemjg>
rather they wanted to do /something/ and it is in the kernel and in c
<gog>
never lock
<gog>
locks only cause problems
<gog>
also never do two things at the same time
<gog>
concurrency only causes problems
<maybemjg>
that's parallelism
<gog>
never do parallelism
<gog>
or concurrency
<gog>
or asynchronous
<gog>
all things linear. all things serialized.
<zid>
Correct
<zid>
threads are honestly massively over-used
<gog>
never thread
<maybemjg>
threads are webdeved into oblivion
<maybemjg>
i find it amusing how people who claim to be working with event loops
<maybemjg>
also automatically want threads for them
<maybemjg>
like mofo, do you know what an event loop is
<gog>
never loop either
<maybemjg>
never say never!!!
<gog>
always say never
<gog>
never don't always never always don't
<maybemjg>
always pet mjg
* gog
petpet maybemjg
kfv has joined #osdev
* maybemjg
maybe purrs
<kof123>
> trade secret: most kernel programmers don't know almost anything about locks > or how to write code w.r.t. performance > rather they wanted to do /something/ and it is in the kernel and in c
<kof123>
i think i found my motto
<gog>
i don't want to do anything
<gog>
i want to sleep
<GeDaMo>
Mmmmm ... sleep :P
<kof123>
or, i consider locks as needing asm basically, so hw-specific....so not a focus until later for me
<zid>
we're reading how the spec specifies integer literals
<zid>
and the token 0 by itself is.. octal 0
<zid>
basically number -> octal-number or hex-number or decimal-number, and decimal number is '123456789' then optional '0123456789' repeated, octal number is 0 then optional '01234567' repeated
<zid>
so all 0s end up on the disgusting octal path
<bslsk05>
xuanwo.io: Rust std fs slower than Python!? No, it's hardware!
<gog>
0o is better
<zid>
I figured you'd be in the 0w prefix camp
edr has joined #osdev
<gog>
Ow0
<zid>
gog how could you attack my eyes like this
<gog>
oopsie woopsie i made a fucky wucky uwu
<sham1>
gog: no
<gog>
sham1: i'm working vewwy hawd to fix this
goliath has quit [Quit: SIGSEGV]
bauen1 has quit [Ping timeout: 264 seconds]
heat has joined #osdev
<heat>
gog, never lock
<heat>
determinism is boring and a scam
<heat>
if you don't lock: 1) you avoid deadlocks (scam problem, not real, wake up sheeple) 2) it's faster 3) you can get a surprise if you're lucky
<maybemjg>
ey mofz
<maybemjg>
you don't need to lock anything in node.js
<maybemjg>
all while using threading
<zid>
heat I linked you a thing
<zid>
it's about both RUST and rep movsb
<zid>
your favourites
<heat>
fsrm is my favourite thing in the world
<maybemjg>
fsrs should be favouriterer
<heat>
rex.w 66h 66h 66h rep movsb
<heat>
>It seems that rep movsb performance poorly when DATA IS PAGE ALIGNED, and perform better when DATA IS NOT PAGE ALIGNED,
<heat>
LM
<heat>
AO
<heat>
THIS IS AN AMD MOMENT
<maybemjg>
lol at this article
<maybemjg>
i barely started reading and webdev is leaking
<zid>
see I knew you'd like it
<maybemjg>
> I learned how to use strace for syscall analysis. strace is a Linux syscall tracer that allows us to monitor syscalls and understand their processes.
<maybemjg>
fuck this
<maybemjg>
i'm out
<zid>
memcpy needs per-cpu specialization at this point, given how random as fuck rep movsb's performance is
<heat>
wait
<heat>
the fuck is this dude on about
<heat>
>When (dst-src)&0xFFF is small (but non-zero), the REP MOVSB path in memcpy performs extremely poorly
<heat>
this is not page aligned
<heat>
zid, it does have per-cpu specialization
<maybemjg>
wait, does that fucking cpu even do fsrm
<heat>
apparently
<zid>
nah it's per.. something
<zid>
but it ain't per cpu
<zid>
per.. one of like 4 options someone wrote, gg
<maybemjg>
not long ago it was common for amd to not even have erms
<maybemjg>
they suddenly have fsrm?
* maybemjg
presses F to doubt
<maybemjg>
cpu from nov 2020
<maybemjg>
it should not be using rep movsb
<heat>
zid, all the x86 memcpies have tuning params that someone sets using cpuid
<heat>
so you can take those from caching info, x86 family, etc
<zid>
yes but cpuid doesn't tell you how to tune it
<zid>
you do some coarse if()s
<zid>
It needs to be per-cpu. Not per 4 things.
<maybemjg>
total webdev that it took a kernel dev to run perf
<zid>
I will struggle, unless it's about pokémon red/blue
<heat>
in your vertex shaders, you're doing all sorts of matrix calculations in the shader instead of doing it in js and passing it in as a uniform
<zid>
Correct
<zid>
who the fuck wants to implement matrix ops in js
<zid>
when I have a perfectly good graphics card that can do them natively
<zid>
I only need them cpu side if i'm doing recursive calcs
<zid>
like rotating a thing around another thing
<zid>
which that doesn't do
<heat>
but all the examples i've seen online just push projection + view + model matrices (sometimes pre-multiplied) to the shader in uniforms
<heat>
so which is the best way to do things?
<heat>
i know the gpu is really good at floating point but i've never measured any of this shite
<zid>
It's whether it needs to be recursive or not, mainly
<zid>
MVP doesn't
<zid>
so you can just upload MVP and have the gpu do that part
<zid>
either as M V and P, or MVP
<zid>
but if your scene gets more complicated than "1 thing in the middle", you'll need to be able to rotate things around things etc, and suddenly the cpu needs to be able to calculate combined rotations and translations, before the MVP matrix
<zid>
and you can't use a fixed function pipeline for that
xenos1984 has quit [Read error: Connection reset by peer]
<zid>
good luck walking a scene graph on a gpu
<heat>
sgtm
<zid>
certainly doable
<heat>
the gang branches in shaders
<zid>
but you'd end up heavily gpu loaded and cpu idle
<zid>
for no real benefit
<heat>
that's perfect for a gaming rig
<heat>
great idea!
<clever>
zid: for the v3d on the rpi, it wants a flat array of vertex data, and then a second array containing sets of 3 indexes into the 1st array, so there is no real instancing or scene graph going on
<zid>
gpus never have scene graphs
<clever>
it does have other modes that use 2 indexes per poly, but they basically work the same way
<zid>
hence "good luck walking a scene graph on a gpu"
<zid>
what are you trying to say?
<heat>
for my gaming rig i'll piiiiiiiiiick... hmm, a 3090 and a 4770k
<zid>
heat: tbh a 4770k would be plenty for most games
<zid>
you just wouldn't be able to hit 200fps
<zid>
cus you wouldn't be able to get the game running cpu side that quickly
<heat>
these days it's a bottleneck i think
<zid>
you'd hit 60fps at 8k just fine
<clever>
more, that it lacks instancing entirely, if you have 2 identical models in the scene, the cpu needs to offset/rotate the vertices in each instance
<zid>
but you would get >60fps at 1080p
<zid>
wouldn't*
<clever>
or add attributes
<gog>
i don't need a gpu i just rotate cubes in my mind
<clever>
have a transformation matrix appended to every vertex, in the vertex data
<heat>
are you high
<zid>
I have aphantasia so I need a gpu, gog
<gog>
i am not high
<heat>
hi not high
<heat>
im heat
<zid>
heat: You can play GTAV on a Q6600 even with a bad gpu that takes ages to draw the frame.
<zid>
2.4GHz of core2 power
<heat>
yeah and GTA V is an ancient 2013 vidya game
<gog>
i miss my q6600
<zid>
people only need good cpus if the gpu is taking 14ms to render the frame, so the cpu only gets 2ms to play with
<zid>
or they want >60fps
<zid>
if you have a decent gpu and only want 60fps, I bet a 4770k could play basically any game
<gog>
i haven't played gta v since i finished story mode
<zid>
gog: same, the collectibles didn't interest me
<heat>
i'm relatively sure i've seen some benchmark results where the 4770k just castrates the kickass gpus
<zid>
gta3/vc I played the ever loving shit out of the post-game
<zid>
heat: yea, lowers them from 200fps to 60fps
<heat>
gta v i probably last played in, idk 2014 or 15?
<heat>
on my XBOX 360
<heat>
POWERPC POWERPC POWERPC
<heat>
XENON CPU BEST
<heat>
TRI CORE BAYBEEEEEE
<heat>
EIEIO
<zid>
with the slick prefetcher bug
<heat>
you know what bug it did not have?
<clever>
> people only need good cpus if the gpu is taking 14ms to render the frame, so the cpu only gets 2ms to play with
<heat>
it didn't leak AVX data in registers
<heat>
that it did not.
<clever>
zid: what if the cpu was preparing the next frame in advance?
<heat>
because it didn't have AVX BECAUSE WHO NEEDS THAT WHEN YOU HAVE POWERPC POWERPC POWERPC
<heat>
EIEIO
<clever>
then you have 16ms for the cpu to prepare the next frame, and queue it up in the gpu
<clever>
and as long as the gpu can display it within 16ms, you can maintain 60fps
bauen1 has joined #osdev
<clever>
at the cost of 32ms of delay between the state becoming visible on-screen
<kof123>
hmm...i got an xbox 360 for free, but seems to be "not old enough" for the easy way to homebrew :/
<zid>
nobody likes input lag
<zid>
some games do do it that way, they're called "bad games"
<clever>
back when i was getting 7fps, i noticed a number of games only register input if you hold the key for 1 frame
<clever>
and at 7fps, that means holding a key for 142ms, lol
<zid>
all games, I hope
<zid>
nobody likes buffered stale inputs either
<clever>
now try typing in a text input box :P
<zid>
yea that's when it becomes a problem
<heat>
i like stale bread and stale inputs
<zid>
when what you're interacting with is a TTY, not a video game
<zid>
buffering is nice on a tty
<clever>
what if the video game includes its own ingame scripting system and a text editor?
<heat>
sounds like a nerdy video game, uninstall
<clever>
lol
<zid>
why
<zid>
was it running at 7fps
<clever>
fx8350 and an equally old gpu
<zid>
that still doesn't explain it
<heat>
oh man
<heat>
the fx8350
<zid>
I'm not sure I could run a zachtronics game at 7fps
<heat>
what a cpu
<zid>
8350 still has the single core freq record?
<clever>
factorio was the only game that system could run at 60fps
<zid>
impossible
<zid>
an empty factorio world maybe
zxrom has joined #osdev
<zid>
did you try actually building a factory
<clever>
yes, on multiple planets (space expansion mod)
<zid>
I don't believe you
<clever>
*digs*
<zid>
'solar only and I had it oveclocked to 8GHz'
<node1>
when we run multiple virtual machines (VMs) on a host operating system, does each VM use a combination of host memory and its own separate virtual memory?
<clever>
node1: when using /dev/kvm on linux, you're just mapping a chunk of virtual memory from userland, into the guest
<clever>
so the userland process on the host, has to first allocate that ram from the host kernel, via the normal manner
rustyy has joined #osdev
sham1 has joined #osdev
XgF has quit [Remote host closed the connection]
vdamewood has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
XgF has joined #osdev
virt has joined #osdev
stylefish_ has joined #osdev
stylefish_ is now known as stylefish
<node1>
clever thank you
heat_ has joined #osdev
heat has quit [Read error: Connection reset by peer]
zxrom has quit [Quit: Leaving]
node1 has quit [Quit: Client closed]
xenos1984 has quit [Ping timeout: 260 seconds]
gog has quit [Quit: Konversation terminated!]
xenos1984 has joined #osdev
<zid>
guys, I just remembered about the christmas arson goat
node1 has joined #osdev
stylefish has quit [Quit: stylefish]
<gorgonical>
Here's an idle thought: say your timer tick goes every 1 second. An application requests a sleep for 0.2s. Do you reconfigure your timer to fire in 0.2s and then after expiring existing timers, reconfigure for your standard period? Or do you accept that applications requesting a sleep period shorter than your scheduling period might be late?
<gorgonical>
Reconfiguring the timer means that the scheduling interval is now only *up to* the period, not exactly the period because you might cut it short with a pending timer
potash has quit [Ping timeout: 246 seconds]
<gorgonical>
But it also creates somewhat complicated bookkeeping, because what if the application's 0.2s wait spans a timer tick? Do you recompute pending timers and reconfigure the timer for the soonest tick that way?
<gorgonical>
or all this can be avoided by, instead of going into a halt, checking if you have timers that will expire before the next tick, and then busylooping instead of halting
<zid>
wake up in 0.2, then wake up in 1.0 from that.
<gorgonical>
So then at your timer tick you need to find the soonest pending timer, including your schedule tick, and schedule for that then
<gorgonical>
Which can be not so bad if you sort your timers as you make them
gog has joined #osdev
<clever>
gorgonical: that is the difference between a fixed tick interval and a dynamic timer
<gorgonical>
clever: I always assumed dynamic timer meant there wasn't a regular scheduling tick
<bslsk05>
github.com: lk-overlay/dev/timer/vc4/rules.mk at master · librerpi/lk-overlay · GitHub
goliath has joined #osdev
<clever>
its then up to the timer driver to declare if its dynamic or not, and LK can then work with both types
<gorgonical>
Yeah the reason I'm thinking about this is because I maybe have a reason to heartbeat check Linux for something from my kernel, at a much higher frequency than the scheduling timer
<gorgonical>
And currently if I want a kernel worker thread to sleep say 10us and poll again, there's no way to do that without increasing the scheduling tick frequency
<clever>
in LK, sleeps like that are handled separately, over in the scheduler
<clever>
i think
<clever>
any time you schedule a task, it checks if there are other runnable tasks, and may setup a timer to force a context switch
<zid>
even fully 'tickless' machines often have a heartbeat
<clever>
so if there is only one runnable task, nothing is set up to interrupt it and context switch later
<clever>
but now that i think about it, sleeping is not runnable...
<bslsk05>
github.com: lk/kernel/thread.c at master · littlekernel/lk · GitHub
<clever>
yep, that makes sense, it sets up a one-shot timer, that will make the thread runnable again in the future
<clever>
and the timer returning INT_RESCHEDULE, tells it to do a reschedule as it returns from the irq
<clever>
so it may switch to the freshly waken thread
<clever>
but yeah, you're limited to the resolution of the timers
<gorgonical>
how do you mean? whatever you specify as the shortest resolution of timers or something else?
<gorgonical>
because with mixed dynamic+heartbeat you aren't limited in that way
<clever>
in the case of the rpi, the timer is clocked at 1mhz
<clever>
so the hw increments a counter every 1 uSec
<gorgonical>
Indeed then you do mean the hardware resolution
<gorgonical>
My timer is 24mhz so that gives me ~500ns resolution
<clever>
unless you count once every ~76 minutes, the rpi is also not capable of ticking at a regular interval
<clever>
it has 4 compare registers, and will fire one of 4 irq's when they match a 32bit clock
<clever>
and the clock just counts up at 1mhz
zxrom has joined #osdev
<node1>
What's nested paging in virtualization?
<clever>
node1: when it first runs a virtual addr thru the guest paging tables, to create a guest physical addr, then runs it thru the hypervisor paging tables to create a host physical addr
<clever>
that lets the guest manage its own paging tables, but still restricts what the guest can do
<clever>
the other option, is where the host has to trap all attempts to modify the paging tables, then the host mangles the paging tables, changing the physical addresses
<clever>
which slows it down greatly
<gorgonical>
yeah nested paging without hardware acceleration is awful. something like up to n^2 for accesses in the guest
heat_ has quit [Remote host closed the connection]
<node1>
So this means nested paging is a technology used by hypervisors in a virtualized environment, which translates the guest physical addresses used by VMs into the actual physical addresses in the host's memory?
heat has joined #osdev
<gorgonical>
yes
<node1>
gorgonical okay thank you :)
xenos1984 has quit [Ping timeout: 252 seconds]
<node1>
Thank you clever
<gorgonical>
as an fyi people will call that technology different things. I think AMD calls it nested page tables, but intel calls it extended page tables (ept). Sometimes they are called shadow page tables when referring to software-only ones, too
<node1>
oh okay
joe9 has joined #osdev
netbsduser has joined #osdev
<heat>
gorgonical, fyi the linux high resolution timers have something called "slack" where they accumulate N events with less than <slack> ns between them
<heat>
because firing slightly late is no problem at all
<zid>
This is why you're not realtime licenced heat
<zid>
you say shit like that
<node1>
But some white paper claims that nested paging reduces overhead?
<gorgonical>
It does if you have hardware that accelerates things
<pitust>
how would nested paging work without that
<node1>
How come? with traditional Shadow Paging ??
<gorgonical>
shadow page tables where every access forces a fault the hypervisor handles and then fixes up
<zid>
ooh new monitor crash, pretty stripes!
<node1>
Like how does the guest virtual address get translated into a host physical address in a shadow page table?
<clever>
node1: i think in that case, there is 2 paging tables, the one the guest thinks is in use, and the real one the hypervisor is managing (the shadow one)
<clever>
every write to the guest paging tables triggers a pagefault
<gorgonical>
yes this exactly
<clever>
and the hypervisor then changes the phys addr, and updates the shadow table
xenos1984 has joined #osdev
<clever>
so you have a major cpu overhead (faulting into host, and running lookups) every time the guest writes to the tables
<gorgonical>
which *also* includes the guest's page tables themselves, so the guest attempting to access some memory location, say 0x1000, causes a traversal of the guest's page tables which are themselves in guest memory. a page fault into the hypervisor to check for every single layer of the page table in the guest
<clever>
and also when the userland in the guest does mmap(), it faults all the way to the host
<gorgonical>
it is horribly slow without hardware acceleration
<clever>
so just allocating memory or creating a process in the guest, causes cpu overhead in the host
<heat>
that's all very true but have you considered that
<node1>
aha i see, it looks like shadow page tables take one level of indirect addressing while nested page tables have direct addressing.
gog has joined #osdev
<gorgonical>
never forget heat that lunix was invented by linyos torovoltos
<node1>
so if we enable nested paging, the CPU's memory management directly translates guest physical addresses to host physical addresses, without the need for hypervisor intervention in the translation process? correct?
<netbsduser>
oh yeah, linux doesn't believe in filesystem/disk blocks larger than page size
<netbsduser>
in fairness to them it adds some measure of complexity
<zid>
imagine spending more than 14 seconds optimizing for swap performance
<zid>
it's swap, perf's fucked already
<netbsduser>
it's not
<netbsduser>
little-used anonymous memory is fit only to be cast out when stealing those pages for e.g. file caching would be more advantageous
<zid>
have you considered
<zid>
having more ram than storage
<maybemjg>
i had that
<zid>
256GB of ram and a 230GB SSD, perfect
<maybemjg>
installer of some crapper was adamant about creating a swap partition 2x ram
<maybemjg>
and thus failed
<netbsduser>
if you were trying to install sunos 4, it's because every committed page of anonymous memory must be backed by a corresponding page of swap
<netbsduser>
actually never mind
<netbsduser>
it's been a long day and i forgot that sunos 4 isn't typically installed on machines with 256gb of ram
<zid>
ah see there's your mistake
<node1>
Thank you all
<zid>
especially as this started as "linux smh, doesn't optimize for swap"
node1 has quit [Quit: Client closed]
<zid>
so presumably the installed OS is.. linux
<maybemjg>
you make a couple of typos and bam, sunos