klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
<junon> The lowest level facility to create build rules is the global `Rule()` function, which generates a ninja `rule` statement and returns another closure that you can call repeatedly to create ninja `build` statements.
<gog> MelMalik: allt gott
<junon> I just added the first standard library module that builds on top of that and auto-configures a C compiler for you.
<junon> It also standardizes config params and converts them to compiler-specific flags.
<junon> But the resulting config is chef's kiss. I'm really excited to start actually using it now.
<gog> i love stuff like that
<gog> parsimony
<junon> I can compile a list of C files using `local cc = require 'cc'; return cc{ S'foo.c', S'bar.c', warn='strict' };`
<junon> Once I add the linker then I can do `local link = require 'link'; return link{ out=B'my-program', cc{ S'foo.c', S'bar.c', warn='strict' } }`
<gog> seems like it'd be pretty flexible too
<junon> Yeah
<junon> It's strict, so you can't access things above the configuration file's own directory.
<junon> `B` and `S` can be prefixed to a string to resolve the path to the build and source directories, respectively
<junon> and the build directory is always `<build_prefix>/<source_dir>`, where `<source_dir>` is the directory of the currently running config file, relative to the root config file directory
<junon> you can also do things like `local c_path = S'foo.c'; local o_path = B(c_path):ext '.o';`
<junon> The way I was trying to configure modules for my kernel via CMake was just not clean or flexible and the resulting code was just awful.
<junon> I need a good documentation generator now, though.
<junon> I like the idea of RST but it lacks the kind of flexibility I want.
cooligans has joined #osdev
<cooligans> Does anyone know why my AMD Ryzen 5500 doesn't come with TSC-Deadline mode
<geist> not sure AMD has implemented it
<cooligans> Oh
<cooligans> it looks to me like its easy to implement
<geist> if it did it's going to be in Zen 3, since i know it's not in zen 2
<cooligans> intel CPUs have had it for quite a while
<geist> maybe there's patent reasons AMD didn't pick it up yet
<cooligans> true, true
<geist> but yeah it does seem like it'd be pretty easy
<geist> also took a while for Zen to pick up x2apic
<cooligans> Does QEMU have it implemented (without KVM/WHPX)
<cooligans> using TCG
<geist> actually perhaps that's the reason. it only makes sense if you have the full x2apic support because of the whole local apic access using MSR
<geist> yes
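For reference, a minimal ring-0 sketch of detecting and arming TSC-deadline mode, assuming the local APIC is already running in x2APIC mode (so the LVT timer is reachable as MSR 0x832) and GCC/Clang inline asm; bit positions and MSR numbers are as documented in the Intel SDM, and the function names are made up for the example.

    // Hedged sketch: detect and arm the local APIC timer in TSC-deadline mode.
    // Must run in ring 0; assumes the APIC is already enabled in x2APIC mode.
    #include <cstdint>

    static inline void cpuid(uint32_t leaf, uint32_t& a, uint32_t& b,
                             uint32_t& c, uint32_t& d) {
        asm volatile("cpuid" : "=a"(a), "=b"(b), "=c"(c), "=d"(d)
                             : "a"(leaf), "c"(0u));
    }

    static inline void wrmsr(uint32_t msr, uint64_t val) {
        asm volatile("wrmsr" :: "c"(msr), "a"((uint32_t)val),
                                "d"((uint32_t)(val >> 32)));
    }

    static inline uint64_t rdtsc() {
        uint32_t lo, hi;
        asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }

    constexpr uint32_t MSR_X2APIC_LVT_TIMER  = 0x832; // x2APIC alias of MMIO offset 0x320
    constexpr uint32_t MSR_IA32_TSC_DEADLINE = 0x6E0;

    // Returns false if the CPU lacks TSC-deadline support.
    bool arm_tsc_deadline(uint64_t delta_ticks, uint8_t vector) {
        uint32_t a, b, c, d;
        cpuid(1, a, b, c, d);
        (void)a; (void)b; (void)d;        // only ECX is needed for this check
        if (!(c & (1u << 24)))            // CPUID.01H:ECX[24] = TSC-deadline supported
            return false;
        wrmsr(MSR_X2APIC_LVT_TIMER, (0b10u << 17) | vector); // LVT bits 18:17 = 10b -> TSC-deadline
        wrmsr(MSR_IA32_TSC_DEADLINE, rdtsc() + delta_ticks); // fires once TSC >= deadline
        return true;
    }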
<cooligans> are you by any chance on the osdev discord
<geist> there have been multiple
<cooligans> the biggest one
<cooligans> ~4000 members
<geist> i have popped in from time to time, but haven't found it compelling to continue
<cooligans> you like the calm of IRC
<cooligans> discord is a bit more chaotic, since so many people are talking at once
<geist> well, more than that, i found at least at the time that the level of discourse in the discord was low
<geist> ie, lots and lots of folks without a lot of knowledge giving bad advice
<geist> i tried to help for a few months but didn't seem like anyone was interested
<cooligans> oh
<geist> technically i'm still there too
<geist> looking at it now, i just muted it forever ago
<cooligans> I mean, you seem quite more knowledgeable than i am
<cooligans> i'm sure your help would be appreciated for the more advanced topics
<geist> sure. like i said i tried to for a while, but it didn't seem like there was the appropriate level of... discourse. hard to describe
<cooligans> like there was someone with a PKS (Supervisor Protection Keys) problem the other day
<cooligans> I didn't even know that was a thing
<cooligans> till 2 days ago
<klange> I was on the current one really early on but left. I rejoined for a bit but I just find the active userbase has strong opinions. It's like constantly being in a Hacker News comment section.
<geist> yeah exactly
<cooligans> lol
<geist> that's part of it, lots of folks with bad advice and a lot of opinions
<geist> which is fine up to a point, but after a while it's like swimming in sand
<cooligans> i get it
<geist> if you're trying to sort of raise the bar so to speak
<cooligans> anyways, has anyone implemented shadow stacks
<geist> that being said i'll try to pay a bit more attention
<cooligans> I'm trying to add it to my operating system
<geist> what do you precisely mean by shadow stacks?
<geist> there are tons of things called that
<cooligans> CR4.CET and the whole SSP set of instructions
<geist> ah no. it's some new intel thing i haven't looked at at all
<geist> has it made it into consumer cpus?
<cooligans> I think
<geist> that's the point where i start paying a bit of attention
<cooligans> its also on the AMD System Developers manual
* geist nods
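For context on the feature being discussed: detecting the shadow-stack half of CET is just a CPUID check. The sketch below only covers detection plus the CR4 bit; actually turning it on also involves the IA32_*_CET / SSP MSRs and shadow-stack pages described in the Intel SDM and AMD APM, which is beyond a chat-sized example. Bit positions are from those manuals; the function name is made up.

    // Hedged sketch: detect CET shadow-stack support (CET_SS).
    #include <cstdint>

    bool cpu_has_cet_ss() {
        uint32_t a, b, c, d;
        // CPUID.(EAX=7,ECX=0):ECX[7] = CET_SS
        asm volatile("cpuid" : "=a"(a), "=b"(b), "=c"(c), "=d"(d)
                             : "a"(7u), "c"(0u));
        (void)a; (void)b; (void)d;
        return (c >> 7) & 1u;
    }

    // If present, CR4.CET is bit 23; the rest of the bring-up (shadow-stack
    // pages, the CET/SSP MSRs) is left out of this sketch.
    constexpr uint64_t CR4_CET = 1ull << 23;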
<cooligans> Wait, how did you do that
<cooligans> nods thing
<cooligans> i'm kinda new to IRC
<geist> honestly i spend most of my time in ARM64 land
<klange> Usually `/me` is the command.
<geist> type /me <emote>
* cooligans nods
<cooligans> there we go
<cooligans> I find ARM64 to be a bit too confusing
<geist> also a side note that i can't really get too worked up about, but the discord server seems far more x86 centric than even this channel
<cooligans> now that's definitely true
<geist> virtually all conversations are implicitly x86, except one channel thats 'other-isas'
<geist> and since i spend most of my time on !x86 it's less compelling for me
<cooligans> 90% of people work on x86 based projects
<cooligans> there are only 2 projects that aren't x86 centric afaik
<geist> which isn't really untrue of here as well, but i think non x86 stuff is more tolerated here since enough of the regulars are either building portable OSes or also work on arm/riscv/etc stuff
<cooligans> managarm, whose creators are in this server
<cooligans> i think
<klange> There's two kinds of people in that Discord. The ones that make Managarm, and the ones that think their project is as cool as Managarm and are sorely mistaken.
<cooligans> I have one last question before I go, what IRC clients do you guys use?
<cooligans> managarm is impressive ngl
<gog> konversation
<klange> irssi
<cooligans> it's almost as impressive as zircon
<geist> managarm? never heard of it
<cooligans> I'm using hexchat
<cooligans> its on github and it runs more than 50% of unix applications
<bslsk05> ​managarm/managarm - Pragmatic microkernel-based OS with fully asynchronous I/O (40 forks/728 stargazers/MIT)
<junon> geist: I think it's because all of the "hello world" tutorials e.g. on osdev.org exclusively target x86.
<geist> looks like x86-64 only
<geist> but it does at least do SMP so i'll give it that
<cooligans> check kernel/thor/arch
<cooligans> and kernel/eir/arch
<klange> geist: You should look into it as it's a rare gem of functionality; they're going for userspace Linux compatibility and can run Wayland.
<geist> cool
<cooligans> thats where the arch specific code is stored
<klange> They've earned my respect at least.
<geist> yeah looks like it's a thing for sure
<cooligans> the craziest thing is that its a microkernel
<cooligans> most projects like this are monolithic
<geist> sure, but microkernels aren't *that* weird
<geist> see zircon, etc
<cooligans> well yeah, but microkernels are rare
<cooligans> is zircon still actively developed
<geist> fair. the heyday was back in the 90s
<klange> People make a big deal out of microkernels vs. not microkernels, but it's really a very minor thing. Minix is a microkernel, Hurd despite all of its stumbles is a microkernel and can run a full Debian XFCE desktop, Zircon is a microkernel.
<geist> absolutely
<geist> full disclosure: i'm one of the main creators of zircon
<geist> and i still work on it at work
<klange> I think people have gotten the wrong idea into their heads from the famed Tanenbaum debates, and even most "monolithic" OSes these days are far closer to microkernels than the ones that were relevant in the 80s.
<geist> it's a fork of littlekernel, which is also one of my old projects
<cooligans> does google plan to swap out chromeos with fuchsia
<cooligans> at least that was my first impression
<junon> geist: you mean Fuchsia's Zircon?
<geist> cant talk about future stuff
<geist> junon: yes
<junon> I'm still surprised Google has kept the project. They like to kill those a lot these days.
<geist> heh, not gonna argue with that
<geist> naw, fuchsia has actually shipped in things at this point
<cooligans> wow
<klange> They've made a nest for themselves, so hopefully they'll stick around :)
<geist> also side note littlekernel is all over Pixel 6 too
<geist> kinda proud of that
<cooligans> I've always had one problem with zircon though
<cooligans> you can't build it standalone
<geist> cooligans: you and me both
<cooligans> there was like a bug number 35***
* klange hopes that joke landed somewhere
<geist> used to, cant anymore. i argued against it, but lost that particular battle
<junon> geist: how many Fuchsia devs are in this channel, do you think?
<geist> oh probably 5 or 6
<geist> we have a discord server for it now if you want to ask technical questions
<cooligans> geist: why did they vote against it
<geist> just dont bother asking things like 'what is google going to do with it'
<cooligans> ooh, could I get the link
<geist> cooligans: unified build system
<klange> Can't say we haven't tried to increase the number...
<junon> klange are you on fuchsia as well?
* klange is quite happy with current employment situation.
<geist> makes sense. ease of building, one GN/ninja instance, etc
<klange> No, but I've applied and been through interview processes, back in the before times.
<geist> and there's some amount of cross-contamination of libraries and whatnot between user and kernel
<junon> I see
<geist> which i was initially against, but have since relented
<cooligans> geist: mind sending me the discord link
<bslsk05> ​redirect -> discord.com: Fuchsia
<geist> i googled it.
<geist> it's not super busy but if you have technical questions we'd be happy to answer
<cooligans> thanks
<cooligans> anyways, its kinda late, I gotta head out for the night
<geist> anyway, fairly proud of how well zircon has turned out
<geist> lots of fun decisions we made early on and most of them have turned out to be decent
<cooligans> since gn is the build system, is it possible to build fuchsia on windows
<geist> on a posix environment yes
<cooligans> so wsl
<cooligans> works
<geist> otherwise no. there's a lot of prebuilt toolchain binaries
<geist> clang/rustc/etc
<geist> which are linux or mac only
<cooligans> ok
<cooligans> but since wsl is posix, it works
<geist> yep. WSL2 builds it fairly well
<geist> WSL1 is a trainwreck building fuchsia for Reasons
<cooligans> ok
<junon> WSL1 doesn't have a sleep syscall implementation :c
<junon> so any attempts to `sleep` and the like fail
<cooligans> wow
cooligans has quit [Quit: Connection closed]
<junon> At least in every case I've personally tried.
<geist> yah we found that gn itself has a terrible antipattern of heap usage for WSL1
<geist> that causes it to take literally 20 minutes or so to run
<geist> then there are some Go based tools in the build that also run horribly on WSL1. probably for sleep() like reasons
<geist> WSL1 is a pretty amazing solution, but it's always the edge cases that fall over
<geist> WSL2 being just a VM it works pretty fine
<klange> WSL1 amazes me, and the fact that WSL2 happened just makes WSL1's entire existence even more crazy.
<junon> Yeah. But there are outstanding issues with WSL2 that make me nervous to switch
<geist> yah
<junon> Did they pull the plug on WSL1 entirely now, in terms of support?
<klange> I do a bunch of dev under WSL2 on a Surface, got it all set up for nested virtualization so I can use KVM with QEMU.
gog has quit [Ping timeout: 268 seconds]
dre_ has joined #osdev
cooligans has joined #osdev
[itchyjunk] has joined #osdev
cooligans has quit [Client Quit]
cooligans has joined #osdev
cooligans has quit [Client Quit]
<raggi> geist: from what I could figure out the common cause was the ptmalloc strategy for small object allocations was hitting an extremely slow path (which is also slow on Linux, but 10x more impact on wsl1) - I'm anticipating starnix will have similar challenges to overcome eventually
<raggi> er, glibc malloc, strictly speaking
<raggi> jemalloc and tcmalloc with their map arenas operated much more efficiently
<raggi> I think, but I didn't get around to asserting it for sure, that it was the fine grained madvise causing the bulk problem, and more for threaded programs than serial ones, so assumption is it's hitting a global or widely shared lock
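A throwaway way to poke at what raggi describes: hammer the allocator with small objects from several threads and compare glibc malloc against jemalloc/tcmalloc via LD_PRELOAD. This is only a sketch that reproduces the access pattern, not raggi's actual measurement, and the library path in the comment is an assumption about a typical install.

    // Hedged sketch: multi-threaded small-object churn to compare allocators.
    // Build: g++ -O2 -pthread alloc_stress.cpp; optionally rerun with e.g.
    // LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 (path is an assumption).
    #include <chrono>
    #include <cstdio>
    #include <cstdlib>
    #include <thread>
    #include <vector>

    int main() {
        constexpr int kThreads = 8;
        constexpr int kIters   = 1000000;
        auto start = std::chrono::steady_clock::now();
        std::vector<std::thread> threads;
        for (int t = 0; t < kThreads; ++t) {
            threads.emplace_back([] {
                std::vector<void*> live(64, nullptr);
                for (int i = 0; i < kIters; ++i) {
                    size_t slot = i % live.size();
                    free(live[slot]);                    // free a small object...
                    live[slot] = malloc(16 + (i % 240)); // ...and allocate another
                }
                for (void* p : live) free(p);
            });
        }
        for (auto& th : threads) th.join();
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - start).count();
        printf("%d threads x %d alloc/free pairs: %lld ms\n",
               kThreads, kIters, (long long)ms);
    }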
m5zs7k has quit [Quit: m5zs7k]
m5zs7k has joined #osdev
Maka_Albarn has quit [Quit: Light travels faster then sound, which is why some people appear bright, until you hear them speak]
kajiryoji has joined #osdev
[itchyjunk] has quit [Remote host closed the connection]
srjek has quit [Ping timeout: 260 seconds]
Oli has quit [Quit: leaving]
kingoffrance has quit [Ping timeout: 265 seconds]
<klange> I just dd'd my ISO to a USB stick and popped it in my ThinkPad and my bootloader actually works - it's the first time I've tried that.
darkstarx has quit [Quit: Leaving]
darkstardevx has joined #osdev
darkstardevx has quit [Remote host closed the connection]
darkstardevx has joined #osdev
tacco has quit [Remote host closed the connection]
ElectronApps has joined #osdev
josuah has quit [*.net *.split]
divine has quit [*.net *.split]
Arsen has quit [*.net *.split]
buffet has quit [*.net *.split]
bleb has quit [*.net *.split]
kkd has quit [*.net *.split]
valerius_ has quit [*.net *.split]
kori has quit [*.net *.split]
mxshift has quit [*.net *.split]
gorgonical has quit [*.net *.split]
nickster has quit [*.net *.split]
mxshift has joined #osdev
bleb has joined #osdev
Arsen has joined #osdev
nickster has joined #osdev
gorgonical has joined #osdev
kori has joined #osdev
kkd has joined #osdev
valerius_ has joined #osdev
divine has joined #osdev
jstoker has quit [*.net *.split]
pitust[m] has quit [*.net *.split]
MelMalik has quit [*.net *.split]
hl has quit [*.net *.split]
gruetzkopf has quit [*.net *.split]
geist2 has quit [*.net *.split]
CompanionCube has quit [*.net *.split]
V has quit [*.net *.split]
HeTo has quit [*.net *.split]
kazinsal has quit [*.net *.split]
mhi has quit [*.net *.split]
HeTo has joined #osdev
geist2 has joined #osdev
kazinsal has joined #osdev
V has joined #osdev
jstoker has joined #osdev
gruetzkopf has joined #osdev
hl has joined #osdev
CompanionCube has joined #osdev
Ellenor has joined #osdev
Ellenor is now known as MelanieMalik
_xor has joined #osdev
MelanieMalik is now known as MelMalik
bradd_ has joined #osdev
bradd has quit [Ping timeout: 260 seconds]
xenos1984 has quit [Quit: Leaving.]
vin has joined #osdev
<vin> What does loads and stores being atomic with respect to each other mean? And why isn't this supported on x86?
<vdamewood> vin: Do you know what it means for an operation to be atomic?
<vin> yes vdamewood
<vin> So a load after a store should always return the stored value? Is that it?
<vdamewood> Yep, and a load before a store should load the value before the store.
<vin> Right but I thought this is guaranteed on x86, this is the basic memory consistency one has to support
<vdamewood> I'm pretty sure this is guaranteed for a single core, but not for a multicore setup.
ravan has joined #osdev
<vin> So does that mean one could implement mutex locks with https://en.wikipedia.org/wiki/Peterson%27s_algorithm on a single core on x86 safely?
<bslsk05> ​en.wikipedia.org: Peterson's algorithm - Wikipedia
<vdamewood> No clue on my part.
<vin> Also why isn't multi-core guaranteed? Because of different L1/L2 caches and instruction reordering? The coherency protocol invalidates a dirty cache line to ensure consistency, but a thread could do a load on it before the invalidation, making the change made by the other thread (other core) invisible?
<vin> vdamewood: the notes section in the above wiki page sort of hints that memory reordering of sequential accesses without explicit memory barriers can break this algorithm, but then any normal sequential program without barriers will also provide no guarantees of memory consistency!
<vin> That's absurd
ElectronApps has quit [Remote host closed the connection]
<Griwes> the algorithm seems to write and read from _different_ variables
<Griwes> unless you use strong enough memory orderings or fences, those can be reordered with respect to each other
<Griwes> the notes don't talk about accesses to the same address, but accesses to different addresses
<Griwes> and those get reordered all the time
<Griwes> your sequential code that does not have (1) fences, (2) atomics, or (3) data dependencies between instructions won't execute the way you wrote it
<Griwes> there's also _at least_ two levels of reordering that happen to your program unless you directly write assembly: the compiler is going to reorder accesses (save for when it encounters memory order enforcement such as atomics or fences), so the assembly is not what you wrote; and the cpu is going to reorder instructions when actually executing (save for memory order enforcement such as atomics or fences)
<Griwes> you are _probably_ safe on x86 when accessing the same variable, and in a bunch more cases because the memory model semantics of x86 are _incredibly_ strong
<Griwes> but this looks like one case where you can easily get bitten
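For reference, here is Peterson's lock written with seq_cst C++ atomics. The point Griwes is making is that the algorithm stores to the thread's own flag and then loads the other thread's flag, and it is exactly that store->load pair across two different variables that can be reordered unless something (seq_cst atomics, a fence) forbids it. A sketch, not a recommendation over a real mutex.

    // Hedged sketch of Peterson's lock for two threads (ids 0 and 1), written
    // with seq_cst atomics so the "store my flag, then load the other flag"
    // pair cannot be reordered. With plain or relaxed accesses, exactly the
    // store->load reordering discussed above breaks mutual exclusion.
    #include <atomic>

    struct PetersonLock {
        std::atomic<bool> flag[2] = {{false}, {false}};
        std::atomic<int>  turn{0};

        void lock(int me) {
            int other = 1 - me;
            flag[me].store(true, std::memory_order_seq_cst);   // I want in
            turn.store(other, std::memory_order_seq_cst);      // but you go first
            // Spin while the other thread wants in and it's its turn.
            while (flag[other].load(std::memory_order_seq_cst) &&
                   turn.load(std::memory_order_seq_cst) == other) {
                // spin
            }
        }

        void unlock(int me) {
            flag[me].store(false, std::memory_order_release);
        }
    };

Compiled for x86-64, the seq_cst stores come out as xchg (or mov plus mfence), which is what keeps the store from sitting in the store buffer past the following load; even x86's strong model permits that particular store->load reordering, which is the classic way Peterson's breaks without a barrier.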
<zid> compiler barriers are fun
<zid> that's all you need on uniprocessor though
<zid> your barriers and fences etc can all define out to asm(""::"memory");
<Griwes> that's the strongest compiler fence, yes
<zid> compiler barrier, it isn't a fence
<Griwes> I'm not convinced that there's a meaningful distinction
<zid> There isn't any meaning to compiler fence, is all
<zid> couple hundred google results, mostly talking about fences inside compilers
<Griwes> C++ has a function that is called atomic_thread_fence for this, and it's definitely a compiler operation too
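The distinction being argued about, in code: both lines below stop the compiler from reordering memory accesses across them, but only the second also emits an instruction that constrains the CPU (on x86-64 with GCC/Clang it typically compiles to an mfence). On a uniprocessor, as zid says, the first is all you need.

    // Hedged sketch: compiler barrier vs. a real fence.
    #include <atomic>

    int a, b;

    void compiler_barrier_only() {
        a = 1;
        asm volatile("" ::: "memory");  // compiler may not reorder across this,
                                        // but no instruction is emitted, so the
                                        // CPU still can
        b = 1;
    }

    void full_fence() {
        a = 1;
        std::atomic_thread_fence(std::memory_order_seq_cst); // compiler barrier AND,
                                                             // on x86-64, typically mfence
        b = 1;
    }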
<vin> Griwes: just to clarify, even if I write the algorithm in asm (avoiding compiler reordering) I expect this algorithm to not work on a single core processor. Because x86 can reorder accesses to the two different memory locations, thus breaking the logic?
<zid> I'd like to see how any of them would change in a uniprocessor system, though
<Griwes> vin: unsure, because I don't remember what the exact semantics on memory operations in x86 are
<Griwes> a few years back I'd be able to answer that :P
<zid> C's going to treat it all as a no-op unless they're actually volatile, which there's no need for them to be because it's uniprocessor, and compile it down to a nice fat nothing, given sufficient optimization
<Mutabah> vin: You will always read back what you last wrote
<Griwes> but I've moved safely into the realm of "I'll just do the thing that's correct from a language memory model point of view" and I'm happier not wondering about the details on the architecture
<Mutabah> The difference is the ordering to another CPU/thread
vai has quit [Ping timeout: 245 seconds]
<moon-child> with multiple threads on the same core you retain atomicity
<moon-child> on the instruction level
<Mutabah> (by "thread" I meant hardware thread, aka hyper-threading)
<zid> ye HT tosses all this out the window, that's SMP again
<MelMalik> I really just want to be a soft animal
<vin> Mutabah: sure yes that's the MC an x86 supports but what about reordering of access made to different locations on a single core? Assuming single thread -- single core
<moon-child> yeah; meaning vcore, not phys etc.
<MelMalik> and i want my OS to represent that
<Mutabah> On a single thread, you will never observe the re-ordering
<Mutabah> (unless you have some way of observing the memory bus)
<moon-child> not even that, cuz ssb
<moon-child> and it'll get flushed if you get preempted
<Griwes> right. I guess it depends on what you're observing
<Griwes> if you had a way to observe memory reads (perhaps by being an mmio device), you can observe it
<vin> Okay so there can be reordering of accesses to different locations with 2 physical threads on a single core. Since they share the same L1/L2, any invalidations are instantly observed by either thread (the store buffer could have stores cached though, right?), thus making this algorithm work?
<Griwes> re-reading the wiki page again, I'm not seeing any sentences that talk about single core systems
<vin> Griwes: I am just thinking about it from different perspectives trying to explain when this would and wouldn't work on x86. Also improving my x86 MC knowledge
<geist> hmmm
<geist> trying to wade into this discussion
<vin> It is clear why it wouldn't when threads are on different cores but if the threads are on same core with SMT (vthreads == pthreads)
<vin> Hi geist
<geist> SMT hardware threads behave pretty much identically to physical cpus
<geist> so any sort of ordering guarantees (or lack of) apply there
<geist> hi vin
<Mutabah> SMT cores will generally have their own L1 cache
<Mutabah> and will definitely have their own pipeline ordering
<vin> They share both L1 and L2, right Mutabah?
<Mutabah> yeah... just realised that was probably not write :)
<geist> right, there's no real interlocking between the SMT cores, except what may or may not be there as a side effect of a given implementation
<Mutabah> Pipeline point still stands
<geist> actually not entirely true. see Bulldozer
<geist> it was a hybrid of SMT where later versions had dedicated L1s
<Mutabah> ... "write", what the ___ is wrong with me today
<geist> kinda halfway between separate cores and full SMT
<vin> So are you saying the algorithm will not work on a single core with SMT geist? https://en.wikipedia.org/wiki/Peterson%27s_algorithm
<geist> i dunno, i dont particularly feel like trying to grok that algorithm right now
<vin> Because of reordering
<geist> but again i repeat: SMT for all practical purposes appears to be the same thing as separate cores
<Griwes> I spend too much time thinking about SIMT synchronization these days and not enough thinking about "normal" architectures
<vin> hmmm
<Mutabah> If you _ever_ have multiple cores accessing memory, you need to use atomic ops at some level
<geist> so if they do appear to be synchronized that's a side effect of the microarchitecture
<Griwes> insert the "why can't you be normal" meme with a screeching GPU on the second panel
<geist> also remember modern superscalar designs have *lots* of memory accesses going on in parallel, many times speculatively
<Griwes> anyway as soon as you have two instruction streams, you need atomics
<geist> a lot of what makes the memory model appear strong/etc is the dependency tracking of all these outstanding transactions
<geist> in the case of SMT you end up with a bunch of outstanding transactions, just spread across multiple hw threads
<geist> but then they wont have any explicit deps between them
<geist> so particular barriers or barrier events or ordering events will only apply to a particular thread
<vin> Wait aren't loads and stores atomic? So even having two instruction streams shouldn't be a problem, correct Griwes?
<Mutabah> Depends on the architecture
<Griwes> aligned stores and loads on x86 are atomic, yes, but that only guarantees no tearing
<vin> x86
<geist> note that we're talking about strongly ordered arches like x86
<Griwes> it does not guarantee ordering
<geist> most other arches that are still active nowadays are weakly ordered
<Griwes> (between threads that is)
<geist> in which case even single threaded ordering is not guaranteed
<Griwes> it has some enforcement *within* a thread
<Griwes> but not across
<geist> and since other arches exist and are popular, you still have to deal with weakly ordered stuff
<Griwes> also, if you are above assembly at any point, the language you're writing in will usually say that unsynchronized accesses are always a data race and always undefined
<geist> unless you happen to be writing just the x86 portion of an x86 module
<geist> but weakly ordering isn't as bad as it sounds, it just means a bunch of guarantees aren't there so you can't rely on particular behaviors and you need barriers, implied or explicit
<geist> like, for example, an atomic variable with SEQ_CST or acquiring a mutex, etc
<Griwes> vin: if you ever touch the same variable with two different instruction streams that have a potential to execute concurrently (and that includes on different hyperthreads), you need to use atomics. x86 allows you to avoid tearing without atomics on aligned accesses, but that's it
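A small example of the kind of synchronization geist and Griwes mean (an atomic used to publish data between two instruction streams): the release store pairs with the acquire load, so the payload write is guaranteed visible and there is no data race. On x86 both atomics compile to plain movs; on arm64 they map onto the stlr/ldar instructions mentioned a bit further down.

    // Hedged sketch: release/acquire handoff of data between two threads.
    #include <atomic>
    #include <cassert>
    #include <thread>

    int payload = 0;
    std::atomic<bool> ready{false};

    void producer() {
        payload = 42;                                  // plain write...
        ready.store(true, std::memory_order_release);  // ...published by a release store
    }

    void consumer() {
        while (!ready.load(std::memory_order_acquire)) // acquire pairs with the release
            ;                                          // spin
        assert(payload == 42);                         // guaranteed visible, no data race
    }

    int main() {
        std::thread t1(producer), t2(consumer);
        t1.join(); t2.join();
    }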
<geist> to a certain extent a weakly ordered system is almost easier to reason about because the guard rails are off, so you can just imagine the cpu does what it does with less rules to constrain it
<vin> Interesting, I thought SMT threads would be different. Wait, the only strong ordering guarantee x86 provides is within a single thread and not across threads, correct geist? like Griwes mentioned
<geist> vin: correct
<geist> that's what i keep saying. SMT for all practical purposes on all implementations i know of makes no guarantees about cross thread sequences
<geist> they act as if they were separate cores
<vin> Cool, that way one less special case to design for.
<geist> since again modern superscalar x86s for example may have like 64 or 80 outstanding load/stores in flight, many speculatively, etc
<vin> I mean if one cares about performance then scheduling threads on same cores would make sense -- reuse L1/L2 hot cache lines.
<geist> the strong memory model x86 is guaranteeing is basically a complex sets of interdependencies between those load/stores to ensure they appear on the 'bus' in order, but the cpu may have long since moved on, etc
<vin> Got it!
<geist> but that only extends to a single hardware thread. if the same core is running another thread it may just have another set of interleaved memory transactions that are only sorted relative to other transactions for that thread
<geist> and yah, having software threads that are running code that deals with similar data can have a win with SMT for sure
<geist> or at least less of a penalty
<Griwes> it can have a cache benefit, but whether that will end up with perf benefit overall is something one needs to test
<Griwes> because you are kinda getting less overall cpu time compared to scheduling on two separate cores that aren't doing anything on the hyperthread
<vin> So to conclude the only case this algorithm would work on x86 would be on a single core with no SMT. The performance would be abysmal because of spin-wait and a lot of context switches.
<Griwes> depends on the system load, depends on the kind of work that the threads do (compute vs memory heavy and whatnot)
<geist> i haven't looked at it too closely but i wonder if this peterson thing would work on a weakly ordered machine
<geist> probably, if the spinny variables are atomic
<vin> Makes sense Griwes this depends on the workload and yes it is a tradeoff.
<Griwes> scheduling's hard ;p
<geist> since atomic variables (at least on arches like ARM) can/do/may have memory barriers built into them
<geist> which then orders things before/after
<geist> which is generally not a thing you have to worry about with x86 because effectively every load/store has an implicit barrier with it
<Griwes> all of this reminded me of a funny (hardware) scheduling-related case of atomics doing funky stuff on a gpu
<geist> ie, things that happened before it happen before, things that happen after happen after (even if it's basically fiction)
<Griwes> we were testing a hashmap that did two loads, relaxed+relaxed vs seq_cst+relaxed and... seq_cst+relaxed was faster
<geist> huh!
<geist> question i guess is did the gpu actually implement relaxed
<Griwes> we aren't sure why but the working theory is that hitting a global seq_cst barrier synchronized all the warps so that it eliminated divergence
<geist> ARM for example allows a given core to 'relax' any of the lesser barriers to something stronger
<Griwes> but it's just a working theory
<geist> ah good point
<Griwes> yes, our gpus implement the full C++ memory model of atomics since a few generations ago
<geist> nice
<geist> relaxed atomics still make my head spin
<geist> at some point i think i was enlightened and grokked how a pipeline would allow that, but then the moment passed
<Griwes> our std::atomic implementation kinda translates from C++ enum names (like memory_order_acquire) to instructions with a matching part (i.e. it actually says "acquire" in the public ISA)
<geist> and of course ARM at least has a complex set of rules about whether or not a barrier applies to *all* memory transactions or just things in the same cache line, etc
<Griwes> relaxed is just "pls no tear" ;d
<Griwes> some time ago there was someone talking about proposing memory_order_tearing
<geist> yeah, arm64 does too. ldr and ldar and ldtr i think
<Griwes> which would give you _no_ guarantees, but would allow you to do a non-ub access even though you could get values never written
<geist> ldar (acquire) and stlr (release) is it yeah
<Griwes> not sure where that idea went
<geist> anyway, relaxed atomics are lovely. wish x86 had them
<vin> So a few months ago I read http://pages.cs.wisc.edu/~markhill/papers/primer2020_2nd_edition.pdf which was pretty interesting. I wish it covered modern protocols and guarantees used in x86 or arm
<moon-child> geist: are they though? Like if you don't have contention the strong stuff will be cheap, and if you are operating on the same memory concurrently, the relaxed stuff will lead to races
<geist> they're great for things like counters
<Griwes> relaxed needs to be used _very carefully_
<geist> you just bump some counter and move on, but dont have to synchronize to world for it
<Griwes> some number of years ago there was a really bad bug in one of the c++ stdlib implementations, in shared_ptr refcounting
<moon-child> you mean like perf counters where it's fine if the value is wrong? i guess that could work. but also just make it thread-local
<Griwes> something that needed to be release I think was relaxed and things broke badly
<geist> yah i can see that
<geist> we use them mostly for counters and stuff in the kernel yeah
<Griwes> define "if the value is wrong"
<geist> where if it's off by one that's fine
<Griwes> if you use them right, you get the right values
<Mutabah> or just out-of-date
<geist> right
<moon-child> Griwes: where what you care about is that the value is in the right ballpark, not precise value
<Griwes> if you're using relaxed to do say a lockfree list, you're doing it wrong
<Griwes> yeah
<Griwes> (I mean with a lockfree list you're probably going to initially do a relaxed load to obtain the old value before you enter a cmpxchg loop but you get my point)
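The pattern Griwes mentions in the parenthetical, sketched as a lock-free stack push: the initial load only seeds the expected value, so relaxed is fine there, and the compare-exchange carries the ordering that matters. Node and the stack itself are made-up illustration types, and a real implementation still has to deal with ABA/reclamation on the pop side.

    // Hedged sketch: relaxed load to seed `expected`, CAS doing the real work.
    #include <atomic>

    struct Node {
        int   value;
        Node* next;
    };

    std::atomic<Node*> head{nullptr};

    void push(Node* n) {
        Node* old = head.load(std::memory_order_relaxed);  // just a starting guess
        do {
            n->next = old;
        } while (!head.compare_exchange_weak(old, n,
                     std::memory_order_release,            // success: publish n->next
                     std::memory_order_relaxed));          // failure: `old` was refreshed, retry
    }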
<geist> for efficiency purposes we have all the kernel counters be per cpu but since you can be context switched in the middle of it we still do a relaxed atomic bump of it
<geist> so that it at least doesn't corrupt the value
<geist> 99.9% of the time it's local to the cpu that did it so it's even pretty efficient
<moon-child> ah yeah, that is sensible
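A sketch of the per-CPU counter arrangement geist describes: each CPU bumps its own cache-line-sized slot with a relaxed fetch_add, so there is no cross-CPU ordering or contention in the common case, and a reader sums the slots and accepts an approximate total. The 64-CPU cap and the idea that the caller passes in the current CPU id are assumptions for the example.

    // Hedged sketch: per-CPU counters bumped with relaxed atomics.
    #include <atomic>
    #include <cstddef>
    #include <cstdint>

    constexpr size_t kMaxCpus = 64;

    struct PerCpuCounter {
        // Pad each slot to a cache line so CPUs don't false-share.
        struct alignas(64) Slot { std::atomic<uint64_t> v{0}; };
        Slot slots[kMaxCpus];

        void add(size_t cpu, uint64_t delta) {
            // Relaxed is enough: even if we were just migrated off this CPU,
            // the bump is still atomic, so the value never tears or corrupts.
            slots[cpu].v.fetch_add(delta, std::memory_order_relaxed);
        }

        uint64_t read() const {
            uint64_t sum = 0;
            for (const Slot& s : slots)
                sum += s.v.load(std::memory_order_relaxed);  // approximate snapshot
            return sum;
        }
    };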
<moon-child> language vs cpu memory model
<moon-child> (and actually kinda coincides with vin's question)
<geist> yah also built around the armv8.0 atomics where you have to do a multi instruction sequence. vs v8.1 atomics
<geist> which look much more like x86. riscv also did single instruction atomics.
<geist> a violation of the risc manifesto, but basically the best way to do it on modern machines
<moon-child> huh. so riscv w/atomics is not actually load-and-store?
<Griwes> I mean being pragmatic beats strictly adhering to a manifesto in engineering
<geist> they have both actually
<Griwes> usually
<geist> load/store conditional and a set of atomic alu single instruction ops
<vin> geist: a lot of modern file systems also now maintain bitmaps and inode tables per core to provide better concurrency.
<geist> which is fairly surprising considering how bare bones riscv tends to be
<moon-child> x86 you actually have a cmpxchg loop for atomic anything but add/sub
<geist> hmm, never thought about it but thought you could `lock or` or whatnot as well?
<geist> or does the lock prefix only really work on add
<Griwes> huh, that's news to me
<Griwes> how come they don't have something that's sufficient to implement things like fetch_or
<geist> i honestly haven't thought about it in a while. i try to use builtins anyway
<Griwes> yeah
<geist> https://gcc.godbolt.org/z/x8nhxvYco ah no, you can lock or as well
<geist> and thus i assume and/xor/etc
<Griwes> ...why does std::atomic not do that
<moon-child> ah huh; manual sez it can be applied to: add adc and btc btr bts cmpxchg dec inc neg not or sbb sub xor xadd xchg. For some reason I thought it was more restricted
<Griwes> ...huh, mystery deepens
<Griwes> gcc + libstdc++ (gcc's stdlib) does cmpxchg loop
<Griwes> oh
<Griwes> mystery solved
<Griwes> my -O flags weren't matching
<Griwes> :'D
<Griwes> new mystery, why does it do the cmpxchg loop at -O0
<geist> good question
<Griwes> clang does the same
<Griwes> loop at -O0, lock or at -O1 and up
<Griwes> it must be something in atomic, because the naked use of the intrinsic uses lock or at -O0 too
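The comparison behind the godbolt link, roughly: with optimization enabled both of the functions below come out as a single lock or on x86-64 (matching what Griwes observed at -O1 and up), and the cmpxchg loop only shows up in the un-inlined std::atomic wrapper at -O0. Exact codegen varies by compiler version.

    // Hedged sketch: both compile to a single `lock or` on x86-64 at -O1+
    // with recent GCC/Clang; output may differ on other versions/targets.
    #include <atomic>

    std::atomic<unsigned> flags{0};
    unsigned raw_flags = 0;

    void set_bit_std() {
        flags.fetch_or(0x4u, std::memory_order_seq_cst);
    }

    void set_bit_builtin() {
        __atomic_fetch_or(&raw_flags, 0x4u, __ATOMIC_SEQ_CST);
    }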
ElectronApps has joined #osdev
xenos1984 has joined #osdev
m3a has quit [Quit: leaving]
dennis95 has joined #osdev
pretty_dumm_guy has joined #osdev
<MelMalik> would it be bad to extend risc5
gog has joined #osdev
<klange> Ah, that's why VGA text mode was not working on my ThinkPad, despite the kernel log showing up fine...
<klange> Was trying to map the region write-combining, which apparently doesn't work... the kernel log and bootloader weren't doing anything like that.
<klange> And that's why we test on real hardware~
<klange> I'm really starting to lose my patience with the ethernet port.
<klange> At some point the retaining ledge broke so cables don't stay in any more.
Arthuria has joined #osdev
C-Man has quit [Ping timeout: 260 seconds]
ravan has quit [Read error: No route to host]
robert_ has quit [Remote host closed the connection]
mniip has quit [Ping timeout: 633 seconds]
ravan has joined #osdev
mniip has joined #osdev
<junon> geist: Will Fuchsia work on iPhone/iPad devices in theory? Maybe you can't speak on that in any official capacity, and I know that that area is taboo because of no right-to-repair laws etc.
<j`ey> just as well as Linux would I think
<junon> That's probably true, if it were feasible it probably would have already been done with Linux.
<bslsk05> ​wiki.postmarketos.org: Apple iPhone 7/7+ (apple-iphone7) - postmarketOS
<junon> Doesn't the boot firmware verify the images against the root CA cert from Apple in modern devices?
<junon> So your image basically needs to be signed by Apple or something, right? You'd have to have their private key
<j`ey> this is why you need exploits
<junon> and the firmware can't be flashed since it's put on ROM at chip manufacture time according to this article I'm reading
<junon> Right okay, makes sense
<junon> so this is the goal of the jailbreak movement, pretty much directly, right?
<j`ey> I dont think they want to run linux
<junon> Right
GeDaMo has joined #osdev
<Mutabah> :D
<gog> nice
<junon> Perfection
<Mutabah> :( 1520 here
<gog> show me 4KB: 4000 bytes
<gog> no the real 4KB: 4096 bytes
C-Man has joined #osdev
<junon> wow wtf, why is a hello world program in clang 14kb. That's bigger than in the past, right? Or am I imagining things?
<junon> with -O3 -g0 -s -DNDEBUG=1
<junon> just a single puts("hello world")...
<GeDaMo> Symbols? Dynamic loading?
<junon> no -s strips symbols
<GeDaMo> Ah
<junon> it*
<junon> dynamic loading might be in, but I thought that'd reduce the file size wouldn't it?
<junon> -static causes it to be 788k
<junon> wow
<Geertiebear> you can use bloaty to find out where all that space goes
<junon> That's a new one, have a link to bloaty?
<junon> just tried with both gcc and clang, about the same thing, and CMake actually produces a larger executable since in release mode it doesn't strip.
<junon> oh google's bloaty, got it
<Geertiebear> yeah, that's the one
<junon> It shows 9.34ki as "unmapped"
<junon> and 1.69ki as "ELF Section Headers"
<junon> seems... wrong
<junon> .text seems more or less correct though, just 376 bytes
<junon> asking in #llvm oftc right now
EtherNet has quit [Quit: WeeChat 3.4-dev]
EtherNet has joined #osdev
<junon> Seems platform specific but they agree it's kind of large. Oh well, I don't think there's anything I can add to a release build set of flags to make the binary any smaller. It's already smaller than CMake.
<klange> junon: btw re: hn comment, I actually had the first join to this channel on Libera, but the network was unstable and channel registration was not available, you got it after I d/c'd ;)
xenos1984 has quit [Quit: Leaving.]
Arthuria has quit [Remote host closed the connection]
Belxjander has quit [Ping timeout: 260 seconds]
<junon> Oh :D
<junon> Yeah I remember now, the chanserv stuff was hugged to death
Arthuria has joined #osdev
ahalaney has joined #osdev
<junon> People were worried andrew was going to try to retaliate against libera
<junon> he forcibly took over ownership of most freenode channels that mentioned libera lol
<klange> It was definitely a thing that happened. The rapidity with which everyone migrated to Libera/OFTC was quite extraordinary.
<junon> Yes
<junon> I connected pretty much right as the first resignation letter drafts were leaked, and my eyes were glued to the screen for the next 8 hours after that just watching it all unfold. It was incredible how fast they got everything up and running.
<junon> cc jess :D lol
<junon> you all did a good job IMO
<junon> new servers, new site, managing permissions, getting everyone cloaked/registered/transferred, answering questions, doing downtime maintenance, dealing with internet drama/fallout, dealing with Andrew, dealing with the emotional end of it, all at once, the bulk of which within pretty much 48 hours from start to finish. Impressive.
dude12312414 has joined #osdev
Arthuria has quit [Remote host closed the connection]
dormito has joined #osdev
m3a has joined #osdev
Oli has joined #osdev
Arthuria has joined #osdev
Arthuria has quit [Remote host closed the connection]
dutch has quit [Ping timeout: 260 seconds]
dutch has joined #osdev
Arthuria has joined #osdev
srjek has joined #osdev
xenos1984 has joined #osdev
sprock has quit [Ping timeout: 260 seconds]
[itchyjunk] has joined #osdev
ravan has quit [Remote host closed the connection]
kingoffrance has joined #osdev
ravan has joined #osdev
makersmasher has joined #osdev
xenos1984 has quit [Quit: Leaving.]
Dreg has quit [Read error: Connection reset by peer]
Dreg has joined #osdev
tacco has joined #osdev
nostalgia has joined #osdev
nostalgia has quit [Remote host closed the connection]
psychopacket has joined #osdev
mahmutov has joined #osdev
ElectronApps has quit [Remote host closed the connection]
[itchyjunk] has quit [Remote host closed the connection]
sprock has joined #osdev
cooligans has joined #osdev
cooligans has quit [Quit: Igloo IRC: https://iglooirc.com]
dennis95 has quit [Quit: Leaving]
dormito has quit [Ping timeout: 268 seconds]
xenos1984 has joined #osdev
sprock has quit [Ping timeout: 260 seconds]
dormito has joined #osdev
dra has joined #osdev
sprock has joined #osdev
srjek has quit [Ping timeout: 268 seconds]
buffet has joined #osdev
ahlk has quit [Ping timeout: 245 seconds]
<geist> indeed
<geist> it was pretty amazing how quickly that was all in the rearview mirror. for people that didn't have to admin anything at least
<Bitweasil> Yeah.
<Bitweasil> Not-Freenode popped up in a right hurry, and it's been solid!
<GeDaMo> The hardest part is remembering it's called libera.chat, not libera.net :P
ahlk has joined #osdev
sortie has quit [Ping timeout: 264 seconds]
sortie has joined #osdev
<geist> hah yeah
sprock has quit [Ping timeout: 268 seconds]
<kazinsal> clever domain hacks are usually worth it. usually
<j`ey> usually.worth.it
sprock has joined #osdev
dude12312414 has quit [Remote host closed the connection]
dude12312414 has joined #osdev
dormito has quit [Ping timeout: 260 seconds]
dormito has joined #osdev
dormito has quit [Quit: WeeChat 3.3]
YuutaW has quit [Ping timeout: 260 seconds]
YuutaW has joined #osdev
Nselm has joined #osdev
sprock has quit [Ping timeout: 268 seconds]
GeDaMo has quit [Remote host closed the connection]
* geist resists the urge to click on that
srjek has joined #osdev
philpax_ has joined #osdev
dormito has joined #osdev
dormito has quit [Ping timeout: 260 seconds]
_eryjus has joined #osdev
adder has joined #osdev
eryjus has quit [Ping timeout: 260 seconds]
dormito has joined #osdev
<Ermine> Spoiler: this domain is for sale
eryjus has joined #osdev
_eryjus has quit [Ping timeout: 260 seconds]
ahalaney has quit [Quit: Leaving]
Arthuria has quit [Ping timeout: 268 seconds]
dra has quit [Quit: Leaving]
Nselm has quit [Quit: Client closed]
psychopacket has quit [Quit: Textual IRC Client: www.textualapp.com]
<junon> second spoiler: they don't tell you a price without an email
<junon> so now they have a domain quote request from a very childishly named test email address
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
cooligans has joined #osdev
cooligans has quit [Client Quit]
sprock has joined #osdev
kanton has joined #osdev
kanton has quit [Client Quit]
__xor has joined #osdev
_xor has quit [Ping timeout: 245 seconds]
MarchHare has joined #osdev
dutch has quit [Quit: WeeChat 3.3]
dutch has joined #osdev
pretty_dumm_guy has quit [Quit: WeeChat 3.3]
X-Scale` has joined #osdev
X-Scale has quit [Ping timeout: 260 seconds]
X-Scale` is now known as X-Scale