<junon>
The lowest-level facility for creating build rules is the global `Rule()` function, which generates a ninja `rule` statement and returns a closure that you can call repeatedly to create ninja `build` statements.
<gog>
MelMalik: all good
<junon>
I just added the first standard library module that builds on top of that and auto-configures a C compiler for you.
<junon>
It also standardizes config params and converts them to compiler-specific flags, too.
<junon>
But the resulting config is chef's kiss. I'm really excited to start actually using it now.
<gog>
i love stuff like that
<gog>
parsimony
<junon>
I can compile a list of C files using `local cc = require 'cc'; return cc{ S'foo.c', S'bar.c', warn='strict' };`
<junon>
Once I add the linker then I can do `local link = require 'link'; return link{ out=B'my-program', cc{ S'foo.c', S'bar.c', warn='strict' } }`
<gog>
seems like it'd be pretty flexible too
<junon>
Yeah
<junon>
It's strict, so you can't access things above the configuration file's own directory.
<junon>
`B` and `S` can be prefixed to a string to resolve the path to the build and source directories, respectively
<junon>
and the build directory is always `<build_prefix>/<source_dir>`, where `<source_dir>` is the directory of the currently running config file, relative to the root config file directory
<junon>
you can also do things like `local c_path = S'foo.c'; local o_path = B(c_path):ext '.o';`
<junon>
The way I was trying to configure modules for my kernel via CMake was not clean or flexible, and the resulting code was just awful.
<junon>
I need a good documentation generator now, though.
<junon>
I like the idea of RST but it lacks the kind of flexibility I want.
cooligans has joined #osdev
<cooligans>
Does anyone know why my AMD Ryzen 5500 doesn't come with TSC-Deadline mode?
<geist>
not sure AMD has implemented it
<cooligans>
Oh
<cooligans>
it looks to me like it's easy to implement
<geist>
if it did, it's going to be in Zen 3, since i know it's not in Zen 2
<cooligans>
intel CPUs have had it for quite a while
<geist>
maybe there's patent reasons AMD didn't pick it up yet
<cooligans>
true, true
<geist>
but yeah it does seem like it'd be pretty easy
<geist>
also took a while for Zen to pick up x2apic
<cooligans>
Does QEMU have it implemented (without KVM/WHPX)
<cooligans>
using TCG
<geist>
actually perhaps that's the reason. it only makes sense if you have the full x2apic support because of the whole local apic access using MSR
<geist>
yes
<cooligans>
are you by any chance on the osdev discord
<geist>
there have been multiple
<cooligans>
the biggest one
<cooligans>
~4000 members
<geist>
i have popped in from time to time, but haven't found it compelling to continue
<cooligans>
you like the calm of IRC
<cooligans>
discord is a bit more chaotic, since so many people are talking at once
<geist>
well, more than that, i found (at least at the time) the level of discourse in the discord to be low
<geist>
ie, lots and lots of folks without a lot of knowledge giving bad advice
<geist>
i tried to help for a few months but didn't seem like anyone was interested
<cooligans>
oh
<geist>
technically i'm still there too
<geist>
looking at it now, i just muted it forever ago
<cooligans>
I mean, you seem quite a bit more knowledgeable than i am
<cooligans>
i'm sure your help would be appreciated for the more advanced topics
<geist>
sure. like i said i tried to for a while, but it didn't seem like there was the appropriate level of... discourse. hard to describe
<cooligans>
like there was someone with a PKS (Supervisor Protection Keys) problem the other day
<cooligans>
I didn't even know that was a thing
<cooligans>
till 2 days ago
<klange>
I was on the current one really early on but left. I rejoined for a bit but I just find the active userbase has strong opinions. It's like constantly being in a Hacker News comment section.
<geist>
yeah exactly
<cooligans>
lol
<geist>
that's part of it, lots of folks with bad advice and a lot of opinions
<geist>
which is fine up to a point, but after a while it's like swimming in sand
<cooligans>
i get it
<geist>
if you're trying to sort of raise the bar so to speak
<cooligans>
anyways, has anyone implemented shadow stacks
<geist>
that being said i'll try to pay a bit more attention
<cooligans>
I'm trying to add it to my operating system
<geist>
what do you precisely mean by shadow stacks?
<geist>
there are tons of things called that
<cooligans>
CR4.CET and the whole SSP set of instructions
<geist>
ah no. it's some new intel thing i haven't looked at at all
<geist>
has it made it into consumer cpus?
<cooligans>
I think
<geist>
that's the point where i start paying a bit of attention
<cooligans>
it's also in the AMD System Developer's manual
* geist
nods
<cooligans>
Wait, how did you do that
<cooligans>
nods thing
<cooligans>
i'm kinda new to IRC
<geist>
honestly i spend most of my time in ARM64 land
<klange>
Usually `/me` is the command.
<geist>
type /me <emote>
* cooligans
nods
<cooligans>
there we go
<cooligans>
I find ARM64 to be a bit too confusing
<geist>
also, a side note that i can't really get too worked up about, but the discord server seems far more x86 centric than even this channel
<cooligans>
now that's definitely true
<geist>
virtually all conversations are implicitly x86, except one channel that's 'other-isas'
<geist>
and since i spend most of my time on !x86 it's less compelling for me
<cooligans>
90% of people work on x86 based projects
<cooligans>
there are only 2 projects that aren't x86-centric afaik
<geist>
which isn't really untrue of here as well, but i think non x86 stuff is more tolerated here since enough of the regulars are also either building portable OSes or working on arm/riscv/etc stuff
<cooligans>
managarm, whose creators are in this server
<cooligans>
i think
<klange>
There's two kinds of people in that Discord. The ones that make Managarm, and the ones that think their project is as cool as Managarm and are sorely mistaken.
<cooligans>
I have one last question before I go, what IRC clients do you guys use?
<cooligans>
managarm is impressive ngl
<gog>
konversation
<klange>
irssi
<cooligans>
it's almost as impressive as zircon
<geist>
managarm? never heard of it
<cooligans>
I'm using hexchat
<cooligans>
it's on github and it runs more than 50% of unix applications
<bslsk05>
managarm/managarm - Pragmatic microkernel-based OS with fully asynchronous I/O (40 forks/728 stargazers/MIT)
<junon>
geist: I think it's because all of the "hello world" tutorials e.g. on osdev.org exclusively target x86.
<geist>
looks like x86-64 only
<geist>
but it does at least do SMP so i'll give it that
<cooligans>
check kernel/thor/arch
<cooligans>
and kernel/eir/arch
<klange>
geist: You should look into it as it's a rare gem of functionality; they're going for userspace Linux compatibility and can run Wayland.
<geist>
cool
<cooligans>
that's where the arch specific code is stored
<klange>
They've earned my respect at least.
<geist>
yeah looks like it's a thing for sure
<cooligans>
the craziest thing is that it's a microkernel
<cooligans>
most projects like this are monolithic
<geist>
sure, but microkernels aren't *that* weird
<geist>
see zircon, etc
<cooligans>
well yeah, but microkernels are rare
<cooligans>
is zircon still actively developed
<geist>
fair. the heyday was back in the 90s
<klange>
People make a big deal out of microkernels vs. not microkernels, but it's really a very minor thing. Minix is a microkernel, Hurd despite all of its stumbles is a microkernel and can run a full Debian XFCE desktop, Zircon is a microkernel.
<geist>
absolutely
<geist>
full disclosure: i'm one of the main creators of zircon
<geist>
and i still work on it at work
<klange>
I think people have gotten the wrong idea into their heads from the famed Tanenbaum debates, and even most "monolithic" OSes these days are far closer to microkernels than the ones that were relevant in the 80s.
<geist>
it's a fork of littlekernel, which is also one of my old projects
<cooligans>
does google plan to swap out chromeos with fuchsia
<cooligans>
at least that was my first impression
<junon>
geist: you mean Fuchsia's Zircon?
<geist>
cant talk about future stuff
<geist>
junon: yes
<junon>
I'm still surprised Google has kept the project. They like to kill those a lot these days.
<geist>
heh, not gonna argue with that
<geist>
naw, fuchsia has actually shipped in things at this point
<cooligans>
wow
<klange>
They've made a nest for themselves, so hopefully they'll stick around :)
<geist>
also side note littlekernel is all over Pixel 6 too
<geist>
kinda proud of that
<cooligans>
I've always had one problem with zircon though
<cooligans>
you can't build it standalone
<geist>
cooligans: you and me both
<cooligans>
there was like a bug number 35***
* klange
hopes that joke landed somewhere
<geist>
used to, can't anymore. i argued against it, but lost that particular battle
<junon>
geist: how many Fuchsia devs are in this channel, do you think?
<geist>
oh probably 5 or 6
<geist>
we have a discord server for it now if you want to ask technical questions
<cooligans>
geist: why did they vote against it
<geist>
just don't bother asking things like 'what is google going to do with it'
<cooligans>
ooh, could I get the link
<geist>
cooligans: unified build system
<klange>
Can't say we haven't tried to increase the number...
<junon>
klange are you on fuchsia as well?
* klange
is quite happy with current employment situation.
<geist>
makes sense. ease of building, one GN/ninja instance, etc
<klange>
No, but I've applied and been through interview processes, back in the before times.
<geist>
and there's some amount of cross-contamination of libraries and whatnot between user and kernel
<junon>
I see
<geist>
which i was initially against, but have since relented
<cooligans>
geist: mind sending me the discord link
<geist>
it's not super busy but if you have technical questions we'd be happy to answer
<cooligans>
thanks
<cooligans>
anyways, it's kinda late, I gotta head out for the night
<geist>
anyway, fairly proud of how well zircon has turned out
<geist>
lots of fun decisions we made early on and most of them have turned out to be decent
<cooligans>
since gn is the build system, is it possible to build fuchsia on windows
<geist>
on a posix environment yes
<cooligans>
so wsl
<cooligans>
works
<geist>
otherwise no. there's a lot of prebuilt toolchain binaries
<geist>
clang/rustc/etc
<geist>
which are linux or mac only
<cooligans>
ok
<cooligans>
but since wsl is posix, it works
<geist>
yep. WSL2 builds it fairly well
<geist>
WSL1 is a trainwreck building fuchsia for Reasons
<cooligans>
ok
<junon>
WSL1 doesn't have a sleep syscall implementation :c
<junon>
so any attempts to `sleep` and the like fail
<cooligans>
wow
cooligans has quit [Quit: Connection closed]
<junon>
At least in every case I've personally tried.
<geist>
yah we found that gn itself has a terrible antipattern of heap usage for WSL1
<geist>
that causes it to take literally 20 minutes or so to run
<geist>
then there are some Go based tools in the build that also run horribly on WSL1. probably for sleep() like reasons
<geist>
WSL1 is a pretty amazing solution, but it's always the edge cases that fall over
<geist>
WSL2, being just a VM, works pretty fine
<klange>
WSL1 amazes me, and the fact that WSL2 happened just makes WSL1's entire existence even more crazy.
<junon>
Yeah. But there are outstanding issues with WSL2 that make me nervous to switch
<geist>
yah
<junon>
Did they pull the plug on WSL1 entirely now, in terms of support?
<klange>
I do a bunch of dev under WSL2 on a Surface, got it all set up for nested virtualization so I can use KVM with QEMU.
<raggi>
geist: from what I could figure out the common cause was the ptmalloc strategy for small object allocations was hitting an extremely slow path (which is also slow on Linux, but 10x more impact on wsl1) - I'm anticipating starnix will have similar challenges to overcome eventually
<raggi>
er, glibc malloc, strictly speaking
<raggi>
jemalloc and tcmalloc with their mmap arenas operated much more efficiently
<raggi>
I think, but I didn't get around to asserting it for sure, that it was the fine-grained madvise causing the bulk of the problem, and more for threaded programs than serial ones, so the assumption is it's hitting a global or widely shared lock
<klange>
I just dd'd my ISO to a USB stick and popped it in my ThinkPad and my bootloader actually works - it's the first time I've tried that.
vin has joined #osdev
<vin>
What does loads and stores being atomic with respect to each other mean? And why isn't this supported on x86?
<vdamewood>
vin: Do you know what it means for an operation to be atomic?
<vin>
yes vdamewood
<vin>
So a load after a store should always return the stored value? Is that it?
<vdamewood>
Yep, and a load before a store should load the value before the store.
<vin>
Right but I thought this is guaranteed on x86, this is the basic memory consistency one has to support
<vdamewood>
I'm pretty sure this is guaranteed for a single core, but not for a multicore setup.
<bslsk05>
en.wikipedia.org: Peterson's algorithm - Wikipedia
<vdamewood>
No clue on my part.
<vin>
Also why isn't multi-core guaranteed? Because of different L1/L2 caches and instruction reordering? The coherency protocol invalidates a dirty cache line to ensure consistency, but a thread could do a load on it before the invalidation, making the change made by the other thread (other core) invisible?
<vin>
vdamewood: the notes section in the above wiki page sort of hints that memory reordering of sequential accesses without explicit memory barriers can break this algorithm, but then any normal sequential program without barriers would also provide no guarantees of memory consistency!
<vin>
That's absurd
<Griwes>
the algorithm seems to write and read from _different_ variables
<Griwes>
unless you use strong enough memory orderings or fences, those can be reordered with respect to each other
<Griwes>
the notes don't talk about accesses to the same address, but accesses to different addresses
<Griwes>
and those get reordered all the time
<Griwes>
your sequential code that does not have (1) fences, (2) atomics, or (3) data dependencies between instructions won't execute the way you wrote it
<Griwes>
there's also _at least_ two levels of reordering that happen to your program unless you directly write assembly: the compiler is going to reorder accesses (save for when it encounters memory order enforcement such as atomics or fences), so the assembly is not what you wrote; and the cpu is going to reorder instructions when actually executing (save for memory order enforcement such as atomics or fences)
<Griwes>
you are _probably_ safe on x86 when accessing the same variable, and in a bunch more cases because the memory model semantics of x86 are _incredibly_ strong
<Griwes>
but this looks like one case where you can easily get bitten
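A minimal sketch of the algorithm under discussion, in C++ with `std::atomic` (names and structure are illustrative, not from the conversation). With seq_cst operations, the store to the thread's own flag cannot be reordered past the load of the other thread's flag; with plain variables, exactly that reordering is what breaks it.
```cpp
// Peterson's algorithm for two threads, sketched with C++ std::atomic.
#include <atomic>

std::atomic<bool> flag[2] = {false, false};
std::atomic<int>  turn{0};

void lock(int me) {
    int other = 1 - me;
    flag[me].store(true, std::memory_order_seq_cst);
    turn.store(other, std::memory_order_seq_cst);
    while (flag[other].load(std::memory_order_seq_cst) &&
           turn.load(std::memory_order_seq_cst) == other) {
        // spin until the other thread leaves or yields its turn
    }
}

void unlock(int me) {
    flag[me].store(false, std::memory_order_seq_cst);
}
```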
<zid>
compiler barriers are fun
<zid>
that's all you need on uniprocessor though
<zid>
your barriers and fences etc can all define out to asm(""::"memory");
<Griwes>
that's the strongest compiler fence, yes
<zid>
compiler barrier, it isn't a fence
<Griwes>
I'm not convinced that there's a meaningful distinction
<zid>
There isn't any meaning to compiler fence, is all
<zid>
couple hundred google results, mostly talking about fences inside compilers
<Griwes>
C++ has a function that is called atomic_thread_fence for this, and it's definitely a compiler operation too
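For illustration, a rough sketch of the distinction being drawn here, assuming GCC/Clang inline-asm syntax and C++ `<atomic>`:
```cpp
#include <atomic>

void compiler_barrier() {
    // Forbids the *compiler* from moving memory accesses across this
    // point, but emits no instruction; the CPU can still reorder.
    asm volatile("" ::: "memory");
}

void thread_fence() {
    // Constrains the compiler *and* the CPU. On x86 this typically
    // lowers to mfence (or a locked instruction); on a uniprocessor,
    // the compiler-barrier half is the part that matters.
    std::atomic_thread_fence(std::memory_order_seq_cst);
}
```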
<vin>
Griwes: just to clarify, even if I write the algorithm in asm (avoiding compiler reordering) I expect this algorithm to not work on a single core processor. Because x86 can reorder accesses to the two different memory locations, thus breaking the logic?
<zid>
I'd like to see how any of them would change in a uniprocessor system, though
<Griwes>
vin: unsure, because I don't remember what the exact semantics on memory operations in x86 are
<Griwes>
a few years back I'd be able to answer that :P
<zid>
C's going to treat it all as a no-op unless they're actually volatile, which there's no need for them to be because it's uniprocessor, and compile it down to a nice fat nothing, given sufficient optimization
<Mutabah>
vin: You will always read back what you last wrote
<Griwes>
but I've moved safely into the realm of "I'll just do the thing that's correct from a language memory model point of view" and I'm happier not wondering about the details on the architecture
<Mutabah>
The difference is the ordering to another CPU/thread
<moon-child>
with multiple threads on the same core you retain atomicity
<moon-child>
on the instruction level
<Mutabah>
(by "thread" I meant hardware thread, aka hyper-threading)
<zid>
ye HT tosses all this out the window, that's SMP again
<MelMalik>
I really just want to be a soft animal
<vin>
Mutabah: sure yes that's the MC x86 supports, but what about reordering of accesses made to different locations on a single core? Assuming single thread -- single core
<moon-child>
yeah; meaning vcore, not phys etc.
<MelMalik>
and i want my OS to represent that
<Mutabah>
On a single thread, you will never observe the re-ordering
<Mutabah>
(unless you have some way of observing the memory bus)
<moon-child>
not even that, cuz ssb
<moon-child>
and it'll get flushed if you get preempted
<Griwes>
right. I guess it depends on what you're observing
<Griwes>
if you had a way to observe memory reads (perhaps by being an mmio device), you can observe it
<vin>
Okay so there can be reordering of accesses to different locations with 2 physical threads on a single core. Since they share the same L1/L2, any invalidations are instantly observed (the store buffer could have stores cached though right?) by either thread, thus making this algorithm work?
<Griwes>
re-reading the wiki page again, I'm not seeing any sentences that talk about single core systems
<vin>
Griwes: I am just thinking about it from different perspectives trying to explain when this would and wouldn't work on x86. Also improving my x86 MC knowledge
<geist>
hmmm
<geist>
trying to wade into this discussion
<vin>
It is clear why it wouldn't when threads are on different cores, but what if the threads are on the same core with SMT (vthreads == pthreads)
<vin>
Hi geist
<geist>
SMT hardware threads behave pretty much identically to physical cpus
<geist>
so any sort of ordering guarantees (or lack of) apply there
<geist>
hi vin
<Mutabah>
SMT cores will generally have their own L1 cache
<Mutabah>
and will definitely have their own pipeline ordering
<vin>
They share both L1 and L2, right Mutabah?
<Mutabah>
yeah... just realised that was probably not write :)
<geist>
right, there's no real interlocking between the SMT cores, except what may or may not be there as a side effect of a given implementation
<Mutabah>
Pipeline point still stands
<geist>
actually not entirely true. see Bulldozer
<geist>
it was a hybrid of SMT where later versions had dedicated L1s
<Mutabah>
... "write", what the ___ is wrong with me today
<geist>
kinda halfway between separate cores and full SMT
<geist>
i dunno, i don't particularly feel like trying to grok that algorithm right now
<vin>
Because of reordering
<geist>
but again i repeat: SMT for all practical purposes appears to be the same thing as separate cores
<Griwes>
I spend too much time thinking about SIMT synchronization these days and not enough thinking about "normal" architectures
<vin>
hmmm
<Mutabah>
If you _ever_ have multiple cores accessing memory, you need to use atomic ops at some level
<geist>
so if they do appear to be synchronized that's a side effect of the microarchitecture
<Griwes>
insert the "why can't you be normal" meme with a screeching GPU on the second panel
<geist>
also remember modern superscalar designs have *lots* of memory accesses going on in parallel, many times speculatively
<Griwes>
anyway as soon as you have two instruction streams, you need atomics
<geist>
a lot of what makes the memory model appear strong/etc is the dependency tracking of all these outstanding transactions
<geist>
in the case of SMT you end up with a bunch of outstanding transactions, just spread across multiple hw threads
<geist>
but then they won't have any explicit deps between them
<geist>
so particular barriers or barrier events or ordering events will only apply to a particular thread
<vin>
Wait aren't loads and stores atomic? So even having two instruction streams shouldn't be a problem, correct Griwes?
<Mutabah>
Depends on the architecture
<Griwes>
aligned stores and loads on x86 are atomic, yes, but that only guarantees no tearing
<vin>
x86
<geist>
note that we're talking about strongly ordered arches like x86
<Griwes>
it does not guarantee ordering
<geist>
most other arches that are still active nowadays are weakly ordered
<Griwes>
(between threads that is)
<geist>
in which case even single threaded ordering is not guaranteed
<Griwes>
it has some enforcement *within* a thread
<Griwes>
but not across
<geist>
and since other arches exist and are popular, you still have to deal with weakly ordered stuff
<Griwes>
also, if you are above assembly at any point, the language you're writing in will usually say that unsynchronized accesses are always a data race and always undefined
<geist>
unless you happen to be writing just the x86 portion of an x86 module
<geist>
but weak ordering isn't as bad as it sounds, it just means a bunch of guarantees aren't there, so you can't rely on particular behaviors and you need barriers, implied or explicit
<geist>
like, for example, an atomic variable with SEQ_CST or acquiring a mutex, etc
<Griwes>
vin: if you ever touch the same variable with two different instruction streams that have a potential to execute concurrently (and that includes on different hyperthreads), you need to use atomics. x86 allows you to avoid tearing without atomics on aligned accesses, but that's it
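A small sketch of the tearing-vs-ordering distinction Griwes describes, using C++ `std::atomic` (the variable names are illustrative):
```cpp
// Aligned plain-sized accesses on x86 won't tear, but only
// acquire/release (or stronger) atomics give cross-thread ordering.
#include <atomic>

std::atomic<int>  data{0};
std::atomic<bool> ready{false};

void producer() {
    data.store(42, std::memory_order_relaxed);    // no tearing, no ordering
    ready.store(true, std::memory_order_release); // orders the store above before it
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        // spin: the acquire load pairs with the release store
    }
    int v = data.load(std::memory_order_relaxed); // guaranteed to see 42
    (void)v;
}
```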
<geist>
to a certain extent a weakly ordered system is almost easier to reason about because the guard rails are off, so you can just imagine the cpu does what it does with less rules to constrain it
<vin>
Interesting, I thought SMT threads would be different. Wait, the only strong ordering guarantee x86 provides is within a single thread and not across threads, correct geist? like Griwes mentioned
<geist>
vin: correct
<geist>
that's what i keep saying. SMT for all practical purposes on all implementations i know of makes no guarantees about cross thread sequences
<geist>
they act as if they were separate cores
<vin>
Cool, that way one less special case to design for.
<geist>
since again modern superscalar x86s for example may have like 64 or 80 outstanding load/stores in flight, many speculatively, etc
<vin>
I mean if one cares about performance then scheduling threads on the same core would make sense -- reuse L1/L2 hot cache lines.
<geist>
the strong memory model x86 is guaranteeing is basically a complex set of interdependencies between those load/stores to ensure they appear on the 'bus' in order, but the cpu may have long since moved on, etc
<vin>
Got it!
<geist>
but that only extends to a single hardware thread. if the same core is running another thread it may just have another set of interleaved memory transactions that are only sorted relative to other transactions for that thread
<geist>
and yah, having software threads that are running code that deals with similar data can have a win with SMT for sure
<geist>
or at least less of a penalty
<Griwes>
it can have a cache benefit, but whether that will end up with perf benefit overall is something one needs to test
<Griwes>
because you are kinda getting less overall cpu time compared to scheduling on two separate cores that aren't doing anything on the hyperthread
<vin>
So to conclude, the only case this algorithm would work on x86 would be on a single core with no SMT. The performance would be abysmal because of spin-wait and a lot of context switches.
<Griwes>
depends on the system load, depends on the kind of work that the threads do (compute vs memory heavy and whatnot)
<geist>
i haven't looked at it too closely but i wonder if this peterson thing would work on a weak memory model machine
<geist>
probably, if the spinny variables are atomic
<vin>
Makes sense Griwes this depends on the workload and yes it is a tradeoff.
<Griwes>
scheduling's hard ;p
<geist>
since atomic variables (at least on arches like ARM) can/do/may have memory barriers built into them
<geist>
which then orders things before/after
<geist>
which is generally not a thing you have to worry about with x86 because effectively every load/store has an implicit barrier with it
<Griwes>
all of this reminded me of a funny (hardware) scheduling-related case of atomics doing funky stuff on a gpu
<geist>
ie, things that happened before it happen before, things that happen after happen after (even if it's basically fiction)
<Griwes>
we were testing a hashmap that did two loads, relaxed+relaxed vs seq_cst+relaxed and... seq_cst+relaxed was faster
<geist>
huh!
<geist>
question i guess is did the gpu actually implement relaxed
<Griwes>
we aren't sure why but the working theory is that hitting a global seq_cst barrier synchronized all the warps so that it eliminated divergence
<geist>
ARM for example allows a given core to 'relax' any of the lesser barriers to something stronger
<Griwes>
but it's just a working theory
<geist>
ah good point
<Griwes>
yes, our gpus implement the full C++ memory model of atomics since a few generations ago
<geist>
nice
<geist>
relaxed atomics still make my head spin
<geist>
at some point i think i was enlightened and grokked how a pipeline would allow that, but then the moment passed
<Griwes>
our std::atomic implementation kinda translates from C++ enum names (like memory_order_acquire) to instructions with a matching part (i.e. it actually says "acquire" in the public ISA)
<geist>
and of course ARM at least has a complex set of rules about whether or not a barrier applies to *all* memory transactions or just things in the same cache line, etc
<Griwes>
relaxed is just "pls no tear" ;d
<Griwes>
some time ago there was someone talking about proposing memory_order_tearing
<geist>
yeah, arm64 does too. ldr and ldar and ldtr i think
<Griwes>
which would give you _no_ guarantees, but would allow you to do a non-ub access even though you could get values never written
<geist>
ldar (acquire) and stlr (release) is it yeah
<Griwes>
not sure where that idea went
<geist>
anyway, relaxed atomics are lovely. wish x86 had them
<moon-child>
geist: are they though? Like if you don't have contention the strong stuff will be cheap, and if you are operating on the same memory concurrently, the relaxed stuff will lead to races
<geist>
they're great for things like counters
<Griwes>
relaxed needs to be used _very carefully_
<geist>
you just bump some counter and move on, but dont have to synchronize to world for it
<Griwes>
some number of years ago there was a really bad bug in one of the c++ stdlib implementations, in shared_ptr refcounting
<moon-child>
you mean like perf counters where it's fine if the value is wrong? i guess that could work. but also just make it thread-local
<Griwes>
something that needed to be release I think was relaxed and things broke badly
<geist>
yah i can see that
<geist>
we use them mostly for counters and stuff in the kernel yeah
<Griwes>
define "if the value is wrong"
<geist>
where if it's off by one that's fine
<Griwes>
if you use them right, you get the right values
<Mutabah>
or just out-of-date
<geist>
right
<moon-child>
Griwes: where what you care about is that the value is in the right ballpark, not precise value
<Griwes>
if you're using relaxed to do say a lockfree list, you're doing it wrong
<Griwes>
yeah
<Griwes>
(I mean with a lockfree list you're probably going to initially do a relaxed load to obtain the old value before you enter a cmpxchg loop but you get my point)
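A sketch of that shape in C++, using a hypothetical lock-free stack (the node type and names are made up for illustration): a relaxed load seeds the expected value, then the cmpxchg loop does the real synchronization.
```cpp
#include <atomic>

struct Node { int value; Node* next; };
std::atomic<Node*> head{nullptr};

void push(Node* n) {
    // The initial load can be relaxed; the CAS carries the ordering.
    n->next = head.load(std::memory_order_relaxed);
    while (!head.compare_exchange_weak(n->next, n,
                                       std::memory_order_release,
                                       std::memory_order_relaxed)) {
        // on failure, compare_exchange_weak reloads n->next; just retry
    }
}
```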
<geist>
for efficiency purposes we have all the kernel counters be per cpu but since you can be context switched in the middle of it we still do a relaxed atomic bump of it
<geist>
so that it at least doesn't corrupt the value
<geist>
99.9% of the time it's local to the cpu that did it so it's even pretty efficient
<moon-child>
ah yeah, that is sensible
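A rough sketch of the counter pattern geist describes, in C++ with illustrative names and an assumed fixed CPU count: relaxed atomic RMWs mean a context switch mid-update can't corrupt a slot, and a reader summing the slots gets a ballpark total, never a torn value.
```cpp
#include <atomic>
#include <cstdint>

constexpr int kMaxCpus = 64;  // illustrative
std::atomic<uint64_t> counters[kMaxCpus];

void bump(int cpu) {
    counters[cpu].fetch_add(1, std::memory_order_relaxed);
}

uint64_t total() {
    uint64_t sum = 0;
    for (auto& c : counters)
        sum += c.load(std::memory_order_relaxed); // approximate, never torn
    return sum;
}
```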
<moon-child>
language vs cpu memory model
<moon-child>
(and actually kinda coincides with vin's question)
<geist>
yah also built around the armv8.0 atomics where you have to do a multi instruction sequence. vs v8.1 atomics
<geist>
which look much more like x86. riscv also did single instruction atomics.
<geist>
a violation of the risc manifesto, but basically the best way to do it on modern machines
<moon-child>
huh. so riscv w/atomics is not actually load-and-store?
<Griwes>
I mean being pragmatic beats strictly adhering to a manifesto in engineering
<geist>
they have both actually
<Griwes>
usually
<geist>
load/store conditional and a set of atomic alu single instruction ops
<vin>
geist: a lot of modern file systems also now maintain bitmaps and inode tables per core to provide better concurrency.
<geist>
which is fairly surprising considering how bare bones riscv tends to be
<moon-child>
x86 you actually have a cmpxchg loop for atomic anything but add/sub
<geist>
hmm, never thought about it but thought you could `lock or` or whatnot as well?
<geist>
or does the lock prefix only really work on add
<Griwes>
huh, that's news to me
<Griwes>
how come they don't have something that's sufficient to implement things like fetch_or
<geist>
i honestly haven't thought about it in a while. i try to use builtins anyway
<moon-child>
ah huh; manual sez it can be applied to: add adc and btc btr bts cmpxchg dec inc neg not or sbb sub xor xadd xchg. For some reason I thought it was more restricted
<Griwes>
...huh, mystery deepens
<Griwes>
gcc + libstdc++ (gcc's stdlib) does cmpxchg loop
<Griwes>
oh
<Griwes>
mystery solved
<Griwes>
my -O flags weren't matching
<Griwes>
:'D
<Griwes>
new mystery, why does it do the cmpxchg loop at -O0
<geist>
good question
<Griwes>
clang does the same
<Griwes>
loop at -O0, lock or at -O1 and up
<Griwes>
it must be something in atomic, because the naked use of the intrinsic uses lock or at -O0 too
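For illustration, both shapes in C++; as observed above, whether a compiler emits a single `lock or` or a cmpxchg loop depends on the optimization level and on whether the old value is used:
```cpp
#include <atomic>

std::atomic<unsigned> flags{0};

void set_bit() {
    // Old value unused: optimizing compilers can emit a single `lock or`.
    flags.fetch_or(0x4, std::memory_order_relaxed);
}

unsigned set_bit_and_get() {
    // Old value needed: `lock or` can't return it, so this typically
    // becomes a `lock cmpxchg` loop.
    return flags.fetch_or(0x4, std::memory_order_relaxed);
}
```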
<MelMalik>
would it be bad to extend risc5
<klange>
Ah, that's why VGA text mode was not working on my ThinkPad, despite the kernel log showing up fine...
<klange>
Was trying to map the region write-combining, which apparently doesn't work... the kernel log and bootloader weren't doing anything like that.
<klange>
And that's why we test on real hardware~
<klange>
I'm really starting to lose my patience with the ethernet port.
<klange>
At some point the retaining ledge broke so cables don't stay in any more.
Arthuria has joined #osdev
C-Man has quit [Ping timeout: 260 seconds]
ravan has quit [Read error: No route to host]
robert_ has quit [Remote host closed the connection]
mniip has quit [Ping timeout: 633 seconds]
ravan has joined #osdev
mniip has joined #osdev
<junon>
geist: Will Fuchsia work on iPhone/iPad devices in theory? Maybe you can't speak on that in any official capacity, and I know that that area is taboo because of no right-to-repair laws etc.
<j`ey>
just as well as Linux would I think
<junon>
That's probably true, if it were feasible it probably would have already been done with Linux.
<junon>
wow wtf, why is a hello world program in clang 14kb. That's bigger than in the past, right? Or am I imagining things?
<junon>
with -O3 -g0 -s -DNDEBUG=1
<junon>
just a single puts("hello world")...
<GeDaMo>
Symbols? Dynamic loading?
<junon>
no -s strips symbols
<GeDaMo>
Ah
<junon>
it*
<junon>
dynamic loading might be in, but I thought that'd reduce the file size wouldn't it?
<junon>
-static causes it to be 788k
<junon>
wow
<Geertiebear>
you can use bloaty to find out where all that space goes
<junon>
That's a new one, have a link to bloaty?
<junon>
just tried with both gcc and clang, about the same thing, and CMake actually produces a larger executable since in release mode it doesn't strip.
<junon>
oh google's bloaty, got it
<Geertiebear>
yeah, that's the one
<junon>
It shows 9.34ki as "unmapped"
<junon>
and 1.69ki as "ELF Section Headers"
<junon>
seems... wrong
<junon>
.text seems more or less correct though, just 376 bytes
<junon>
asking in #llvm oftc right now
<junon>
Seems platform specific but they agree it's kind of large. Oh well, I don't think there's anything I can add to a release build set of flags to make the binary any smaller. It's already smaller than CMake.
<klange>
junon: btw re: hn comment, I actually had the first join to this channel on Libera, but the network was unstable and channel registration was not available, you got it after I d/c'd ;)
<junon>
Oh :D
<junon>
Yeah I remember now, the chanserv stuff was hugged to death
<junon>
People were worried andrew was going to try to retaliate against libera
<junon>
he forcibly took over ownership of most freenode channels that mentioned libera lol
<klange>
It was definitely a thing that happened. The rapidity with which everyone migrated to Libera/OFTC was quite extraordinary.
<junon>
Yes
<junon>
I connected pretty much right as the first resignation letter drafts were leaked, and my eyes were glued to the screen for the next 8 hours after that just watching it all unfold. It was incredible how fast they got everything up and running.
<junon>
cc jess :D lol
<junon>
you all did a good job IMO
<junon>
new servers, new site, managing permissions, getting everyone cloaked/registered/transferred, answering questions, doing downtime maintenance, dealing with internet drama/fallout, dealing with Andrew, dealing with the emotional end of it, all at once, the bulk of which within pretty much 48 hours from start to finish. Impressive.