klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
<zid`> when a river erodes the turn it's going around and ends up straight again
<zid`> and kicks off a lake
<gog> ohhh yeah
<gog> yes
<zid`> it's like a reverse atoll
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
mctpyt has joined #osdev
nyah has quit [Quit: leaving]
Turn_Left has joined #osdev
Left_Turn has quit [Ping timeout: 252 seconds]
Burgundy has quit [Ping timeout: 246 seconds]
dutch has quit [Quit: WeeChat 3.8]
spikeheron has joined #osdev
gog has quit [Quit: byee]
<gorgonical> Question about GIC interrupt grouping: the GICD and GICR have separate registers for configuring the group and security level that an interrupt has.
<gorgonical> I'm guessing that the GICD is in charge of SPI interrupt configuration and the GICR is in charge of the per-CPU interrupts like SGI, PPI?
<gorgonical> The main question is whether there's a "hierarchy" since although SGIs originate at the CPU interface and go to the GICR, they have to go to the GICD to make it to another CPU. So then in that case the first GICR determines the type? The second one? The GICD?
bradd has quit [Ping timeout: 248 seconds]
mctpyt has quit [Ping timeout: 260 seconds]
tiggster has joined #osdev
<geist> think of the GICR as the local apic and the GICD as an ioapic, iirc
<geist> one of them is indeed per cpu, the other is more of a global thing
<geist> SGIs i think Just Happen on the other core and there's not really any real overall configuration, since the range is basically reserved
<geist> but this is just off of memory, so i might be wrong
mctpyt has joined #osdev
<gorgonical> hmm
<gorgonical> If there's no configuration then that suggests anyone can SGI a secure core right?
zxrom has joined #osdev
srjek has quit [Ping timeout: 268 seconds]
heat has quit [Ping timeout: 248 seconds]
fedorafansuper has quit [Quit: Textual IRC Client: www.textualapp.com]
mctpyt has quit [Ping timeout: 252 seconds]
gildasio has quit [Ping timeout: 255 seconds]
gildasio has joined #osdev
<geist> oh in a hypervisor situation that's a different story, but you're right i think if there's a separate core, then yeah i think there'd need to be some way to mask it off
<geist> but i dont have the spec in front of me, there may be a mechanism to configure it locally
<geist> at least some sort of local interrupt mask for sure for that SGI
<geist> but i dont think there's necessarily a way of specifying which cores can SGI which other cores
<geist> *aside* from whatever virtualization extensions EL2 may implement
joe9 has quit [Quit: leaving]
spikeheron has quit [Quit: WeeChat 3.8]
<moon-child> is it slow to send ipi, or just to receive them?
Clockface has joined #osdev
dutch has joined #osdev
<Clockface> whats the most practical way to emulate a specific piece of hardware for other kernel mode code
<Clockface> will i have to just intercept every I/O thing from everything else
<Clockface> and then replicate all of it "for real"
<Clockface> except the stuff connecting to the fake device
bradd has joined #osdev
slidercrank has joined #osdev
foudfou has quit [Ping timeout: 255 seconds]
foudfou has joined #osdev
Vercas6 has joined #osdev
Vercas has quit [Ping timeout: 255 seconds]
Vercas6 is now known as Vercas
zxrom has quit [Read error: Connection reset by peer]
mctpyt has joined #osdev
mctpyt has quit [Ping timeout: 246 seconds]
foudfou has quit [Quit: Bye]
foudfou has joined #osdev
foudfou has quit [Remote host closed the connection]
foudfou has joined #osdev
bgs has joined #osdev
jjuran has quit [Quit: Killing Colloquy first, before it kills me…]
jjuran has joined #osdev
Vercas has quit [Quit: Ping timeout (120 seconds)]
Vercas has joined #osdev
epony has joined #osdev
masoudd has joined #osdev
Vercas has quit [Remote host closed the connection]
Vercas has joined #osdev
bradd has quit [Remote host closed the connection]
danilogondolfo has joined #osdev
bradd has joined #osdev
hmmmm has quit [Remote host closed the connection]
gog has joined #osdev
slidercrank has quit [Ping timeout: 255 seconds]
Vercas9 has joined #osdev
fedorafan has joined #osdev
Vercas has quit [Ping timeout: 255 seconds]
Vercas9 is now known as Vercas
mahk has quit [Ping timeout: 260 seconds]
GeDaMo has joined #osdev
elastic_dog has quit [Read error: Connection reset by peer]
elastic_dog has joined #osdev
Burgundy has joined #osdev
les has joined #osdev
les has quit [Client Quit]
les has joined #osdev
mahk has joined #osdev
mahk has quit [Ping timeout: 248 seconds]
mahk has joined #osdev
foudfou has quit [Ping timeout: 255 seconds]
foudfou has joined #osdev
foudfou has quit [Remote host closed the connection]
foudfou has joined #osdev
slidercrank has joined #osdev
<netbsduser`> a question about unified buffer caches: in general i know pages of these get different treatment from e.g. anonymous pages, because pages of a page cache get written out to their backing store regularly (i think on linux every 30s) rather than just in response to page replacement deciding that a page has to be put back to make room for another. but nonetheless they also get put back to disk in response to typical page replacement demands too
<netbsduser`> so consider the case of certain filesystems, which have to enact invariants, like "this journal block has to be written before that metadata block is, else all hell breaks loose." i know that there are a lot of filesystems which do in fact write journals lazily. what approach is usually taken in unified buffer caches to describe such invariants and to ensure that they are not violated by normal page replacement policy?
<netbsduser`> i have considered two approaches: one is to let the `struct buf`s associated with a UBC hold dependency information. this would allow the pageout daemon to continue to enact its own policy on page replacement (if it calls a page eligible for swapout, and beholds it contains bufs which have dependencies, it would then write those dependencies out first.) another is to have it handled at the filesystem level. the page descriptions (or bufs they
<netbsduser`> contain) would be marked to say, "fs driver will handle these ones"
bradd has quit [Ping timeout: 248 seconds]
joe9 has joined #osdev
<mrvn> you write out the dependencies then throw in a barrier/flush and only then the depending blocks.
<mrvn> the kernel will not reorder I/O across barriers
<mrvn> which is also a problem. Because when you fsync() a file the updates can be stuck behind a barrier with tons of unrelated data and they can't be fast tracked because that would require crossing the barrier.
<mrvn> If you write your own IO system then having a dependency / order graph seems like an improvement over the simple queue strategy generally used.
<netbsduser`> mrvn: but who writes them by that order? would, let's say, the FS driver submit asynchronous writes to the I/O system and then it maintains the ordering information and if e.g. the pageout daemon wants to write out a page, the I/O system checks it against its queue of pending writes and orders appropriately? that might be a wiser approach than either of what i was considering
[itchyjunk] has joined #osdev
mctpyt has joined #osdev
[itchyjunk] has quit [Read error: Connection reset by peer]
heat has joined #osdev
mctpyt has quit [Ping timeout: 248 seconds]
[itchyjunk] has joined #osdev
heat has quit [Remote host closed the connection]
heat has joined #osdev
[itchyjunk] has quit [Read error: Connection reset by peer]
[_] has joined #osdev
craigo has quit [Ping timeout: 252 seconds]
dutch has quit [Quit: WeeChat 3.8]
<mrvn> netbsduser`: each IO layer writes their queue in the order the barriers enforce
<mrvn> There is also no checking. The I/O layers simply perform the IO they are told to do. If you write out a page twice it gets written out twice if there is a barrier between them. Maybe even always.
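A minimal sketch of the dependency-graph alternative mentioned above (netbsduser`'s first approach, and what mrvn suggests for a custom I/O system), as opposed to the barrier queue. Everything here is hypothetical: `struct wbuf`, `dev_write` and `writeback` are illustration names, not NetBSD's `struct buf` or any real kernel API, and the "device" only records the order of writes. The idea shown is that whichever buffer the pageout daemon picks, its dirty dependencies are written first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPS 4
#define MAX_LOG  8

/* Hypothetical buffer descriptor carrying ordering information. */
struct wbuf {
    const char  *name;
    bool         dirty;
    bool         visiting;        /* set while on the recursion path (cycle guard) */
    struct wbuf *deps[MAX_DEPS];  /* buffers that must reach disk before this one */
    size_t       ndeps;
};

/* Order in which buffers reached the simulated device. */
const char *write_log[MAX_LOG];
size_t write_count;

/* Stand-in for a synchronous device write: just records the order. */
void dev_write(struct wbuf *b)
{
    assert(write_count < MAX_LOG);
    write_log[write_count++] = b->name;
    b->dirty = false;
}

/* Write b only after all of its dirty dependencies.
 * Returns false if a dependency cycle is found; b then stays dirty. */
bool writeback(struct wbuf *b)
{
    if (!b->dirty)
        return true;              /* already on disk, nothing to order */
    if (b->visiting)
        return false;             /* cycle: the invariant cannot be satisfied */
    assert(b->ndeps <= MAX_DEPS);
    b->visiting = true;
    bool ok = true;
    for (size_t i = 0; ok && i < b->ndeps; i++)
        ok = writeback(b->deps[i]);
    if (ok)
        dev_write(b);
    b->visiting = false;
    return ok;
}

/* Pageout picks the metadata buffer first; the journal block still goes
 * out ahead of it. Returns the number of writes issued. */
size_t demo_writeback(void)
{
    struct wbuf journal = { .name = "journal", .dirty = true };
    struct wbuf meta = { .name = "metadata", .dirty = true,
                         .deps = { &journal }, .ndeps = 1 };
    write_count = 0;
    writeback(&meta);     /* writes journal, then metadata */
    writeback(&journal);  /* no-op: already clean */
    return write_count;
}
```

Unlike a barrier, this only orders buffers that actually depend on each other, so an fsync() of an unrelated file would not have to wait behind them. A real implementation would also need asynchronous completion and locking, which this sketch leaves out.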
fedorafan has quit [Ping timeout: 248 seconds]
dutch has joined #osdev
fedorafan has joined #osdev
srjek has joined #osdev
aoei is now known as Stella
<kaichiuchi> hi
<heat> hai
<gog> hi
masoudd has quit [Remote host closed the connection]
masoudd has joined #osdev
bauen1_ has joined #osdev
bauen1 has quit [Ping timeout: 252 seconds]
<heat> "However, the modularity of UEFI also makes it easier for HP to innovate. HP DayStarter is a simple value-add to the system allowing users to have access to productivity information while waiting for the system to boot"
<heat> oh my fucking god
<gog> this is not what uefi is for but it's the inevitable consequence of making pre-boot application development easier
<gog> good job
<gog> we heard you liked operating systems so we put an operating system into your firmware
gog has quit [Quit: Konversation terminated!]
<heat> late stage capitalism EFI
<kof123> late stage osdev
<kof123> devours its children
Vercas has quit [Remote host closed the connection]
Vercas has joined #osdev
<sakasama> Thank you HP DayStarter. Without this innovative technology I may never have known that useful fact about Chuck Norris.
<heat> i hope you all realize this is done in SMM
knusbaum has quit [Ping timeout: 248 seconds]
knusbaum has joined #osdev
<sakasama> I've heard of that! It's kind of like BDSM but participants need double the masochism.
masoudd has quit [Remote host closed the connection]
masoudd_ has joined #osdev
<heat> no, that is BSD
xenos1984 has quit [Ping timeout: 248 seconds]
xenos1984 has joined #osdev
Turn_Left has quit [Ping timeout: 252 seconds]
dude12312414 has joined #osdev
<clever> heat: isnt that just a clone of a minimal linux env in the flash? or does it run along side the os??
<clever> oh, checking the screenshot, it looks more like an odd overlay, after the bootloader has run??
<clever> but where is it getting that data from
<heat> clever, The benefits to the customers are the instant-on user experience with user productivity information (such as calendar, to-do list and customizable information) available for display before and while Windows is booting. The main technology behind it is for the UEFI BIOS to locate the proper JPEG images and use the System Management Mode (SMM) to update the frame buffer content until Windows is ready for system login. At OS runtime, HP
<heat> implements an Outlook plug-in to capture the calendar information.
<heat> it uses fucking SMM
<heat> i hope they do jpeg decoding in SMM for the big funny
<clever> heat: windows already has a cheat for instant on, they renamed hibernate to shutdown :P
<clever> so when you think youve turned it off, it just went into hibernate
<heat> yes, this was in 2011
<clever> ah
<heat> imagine how much better DayStarter is these days!
<clever> smm also explains most of my questions
<clever> now it can be just as annoying as the HUD on my tv, getting in the way and covering up valuable UI elements
<clever> until it times out
<heat> modern daystarter should play youtube vids in SMM :v
<clever> or, you know, just boot faster :P
<heat> hmm, good point
<heat> there's room for a tiktok or two
<clever> but i have had a similar idea in the past, with that display on the apple keyboard
<clever> where they replaced the F1-F12 row, with what is basically an ipad
<clever> fully self-contained computer
<mats2> outlook in uefi
<mats2> amazing innovation
<clever> why not allow that to run on the keyboard, with the system off?
<clever> give it access to email, and calendar
<mats2> who needs windows when you have uefi
<bslsk05> ​linux.slashdot.org: A Web Browser in Your BIOS? - Slashdot
<acidx> with a web browser in the bios, who needs an operating system?
xenos1984 has quit [Ping timeout: 248 seconds]
<mrvn> heat: oh how I would laugh to prevent crying when the SMM shows a popup that flash has to be updated.
xenos1984 has joined #osdev
<clever> acidx: thats basically what that linux in the bios did
<heat> linux in the bios is more alive than ever
<heat> i think google has been deploying LinuxBIOS at scale
<acidx> when I had Linux in the BIOS, I used it mostly as a makeshift "secure" bootloader
<heat> sorry, not linuxbios, linuxboot
<acidx> the kernel was even built without networking and whatnot
xenos1984 has quit [Ping timeout: 260 seconds]
xvmt has quit [Remote host closed the connection]
xvmt has joined #osdev
xvmt has quit [Remote host closed the connection]
xvmt has joined #osdev
xenos1984 has joined #osdev
gog has joined #osdev
srjek has quit [Read error: Connection reset by peer]
xvmt_ has joined #osdev
xvmt has quit [Read error: Connection reset by peer]
xvmt_ is now known as xvmt
fedorafan has quit [Ping timeout: 256 seconds]
fedorafan has joined #osdev
<gorgonical> how's everyone's fridays going?
<clever> its friday? lol
<gorgonical> unless my calendar is really wrong
<gorgonical> But I am in fact acutely aware of what day it is because of diet
<slidercrank> the day depends on the country
<gorgonical> yes I suppose for people like klange it is already Saturday
<gorgonical> And maybe Russians are far enough forward?
<heat> no
<heat> maybe in asia
<clever> cd
<heat> cd ~/clever
<gorgonical> yeah I'm -5 here and I don't know what russia is. They'd have to be +4
<gorgonical> According to a map almost nobody is just +4
<slidercrank> gorgonical, in part of Russia it's Saturday, in the other - still Friday
<gorgonical> It seems maybe the caucasus countries and oman are the only national +4
<gorgonical> wow this timezone map is awful. So many places completely misaligned with the longitudinal demarcation of the zone they're in
<heat> gmt 4 life
<gorgonical> gog what is the meaning of iceland being gmt
<gorgonical> the westfjords should even be in -2 based on position
<heat> if iceland shifts to -2 the brits will invade them again
<gorgonical> in other news my forth interpreter is getting pretty close to being "done" and I'll just have to write the rest in forth itself
<gorgonical> After catching a whole bunch of switched a0/a1 registers and memory alignment bugs it now actually runs whole words
<heat> you are disgusting
<gorgonical> i still don't know if riscv asm can do indirect jumps
<gorgonical> because I used a syntax that one manual says will do an indirect jump but it definitely did not in qemu
<heat> which one?
<gorgonical> jalr zero, (a0) should do it
<gorgonical> but for qemu that seems to just be equivalent to jalr zero, a0
<gorgonical> this one manual implied adding the memory access parens would suggest an indirect jump
<heat> yeah gcc doesn't seem to have anything of sorts
<bslsk05> ​godbolt.org: Compiler Explorer
<GeDaMo> Can riscv do memory indirect or do you have to load to a register first?
<heat> wait, wrong example
<heat> GeDaMo, load afaik
<gorgonical> GeDaMo: load yeah
<gorgonical> I had to change it to ld a0, (a0); jr a0
<heat> ld a0, 0(a0)
<heat> jalr a0
<heat> so that answers your question
<heat> if jalr zero, 0(a0) was ever a thing, it's syntactic sugar for ld + jalr
<gorgonical> must have been
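For reference: in the base RISC-V ISA, `jalr rd, offset(rs1)` jumps to the address `rs1 + offset` and does not read memory, so `jalr zero, (a0)` and `jalr zero, a0` behave the same, which matches what gorgonical saw in qemu. A jump through a pointer stored in memory needs an explicit load first. A minimal C sketch of that load-then-jump pattern, written as a Forth-style threaded dispatch loop; `word_fn`, `run_thread` and the sample words are made-up names for illustration, not gorgonical's interpreter:

```c
#include <assert.h>
#include <stddef.h>

/* A "word" is just a pointer to code. */
typedef void (*word_fn)(void);

int acc;                       /* toy interpreter state */
void inc(void) { acc += 1; }
void dbl(void) { acc *= 2; }

/* Run a NULL-terminated thread of code pointers. */
void run_thread(const word_fn *ip)
{
    while (*ip != NULL) {
        word_fn w = *ip++;     /* the explicit load (ld on RV64) */
        w();                   /* the register-indirect jump (jalr) */
    }
}

/* inc, inc, dbl applied to 0 gives (0 + 1 + 1) * 2. */
int demo_thread(void)
{
    const word_fn thread[] = { inc, inc, dbl, NULL };
    acc = 0;
    run_thread(thread);
    return acc;
}
```

Compiling `run_thread` for an RV64 target would typically show the two steps as separate instructions, a load into a register followed by `jalr` on that register.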
<bslsk05> ​godbolt.org: Compiler Explorer
<heat> meanwhile chad x86
<gorgonical> don't taunt me
<gorgonical> though personally it does make programming directly in asm a lot easier
<mjg> who is highlighting me
<gorgonical> I have been writing a lot of aarch64 asm and I'm furious about it usually
<heat> wait
<heat> wtf is it doing
<heat> why is it saving %rax
<gorgonical> mjg: i don't see any mentions
<mjg> > chad
<mjg> that was it
<gorgonical> lmao
<heat> func: # @func
<heat> pushq %rax
<heat> callq *(%rdi)
<heat> addl $20, %eax
<heat> popq %rcx
<heat> retq
<heat> am I going cray-cray or does this make no sense?
<GeDaMo> Aligning the stack?
<heat> for int func(int(**f)(void)) { return (*f)() + 20; }
<heat> ooooooooh
<heat> maybe so
<gorgonical> does the stack need alignment on x86?
<heat> yes
<mjg> yes and no
<heat> GeDaMo, great one! seems to be it
<heat> gcc just does sub and add
<heat> now this makes me wonder, why does clang seem to codegen crap here?
<GeDaMo> 16 bytes
<heat> push %rax makes it depend on %rax
<heat> cc chad
<mjg> again with the highlights
<GeDaMo> The return address pushed by the call misaligns it
<gorgonical> i wasn't aware that the stack wanted/needs to be 16-byte aligned
<heat> gorgonical, yeah, it's there on sysv at least cuz of SSE
<mjg> gorgonical: that's only true if you use simd
<gorgonical> oooh
slidercrank has quit [Ping timeout: 248 seconds]
<heat> I think it's still true on -mgeneral-regs-only
<gorgonical> because I'm used to this on arm64, hence ldp/stp instructions and stuff
<heat> mjg, but seriously mr chad doesn't that make like 0 sense
<mjg> dude i'm running on negative brainpower today
<heat> unless you did something like xor %eax, %eax; push %rax to break the dependency
<GeDaMo> You can directly alter the stack pointer too
<heat> yes, gcc does that
Brnocrist has quit [Ping timeout: 268 seconds]
<gorgonical> then it is a good question why clang just pushes garbage
<mjg> lol it has tendra
Brnocrist has joined #osdev
masoudd_ has quit [Quit: Leaving]
<GeDaMo> The only reason that comes to mind is instruction size
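heat's test function as a self-contained unit, with the alignment argument from the discussion written out in comments. `twenty_two` and `demo_func` are added here for illustration only. The statement about the stack pointer follows the SysV x86-64 calling convention: `rsp` is 16-byte aligned at the point of a `call`, so on entry to the callee it is off by the 8-byte return address.

```c
#include <assert.h>

/* On entry to func, rsp is 8 mod 16: the caller had it 16-byte aligned
 * and the call pushed an 8-byte return address. Before func makes its
 * own call it has to move rsp by another 8 bytes. Per the discussion,
 * clang does that with a throwaway push (and a pop into a scratch
 * register afterwards), while gcc adjusts rsp with sub/add. */
int func(int (**f)(void))
{
    return (*f)() + 20;   /* memory-indirect call: callq *(%rdi) */
}

/* Illustration-only callee. */
int twenty_two(void) { return 22; }

int demo_func(void)
{
    int (*fp)(void) = twenty_two;
    return func(&fp);     /* 22 + 20 */
}
```

The value pushed is never read back, which is why any register will do for the padding push.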
dude12312414 has quit [Remote host closed the connection]
dude12312414 has joined #osdev
elastic_dog is now known as Guest218
elastic_dog has joined #osdev
<gog> gorgonical: my hypothesis is that it's to keep us more in line with business time in most of europe
<gog> particularly banking and securities trading
<gog> and that this is owing to iceland's recent history as a dubious and probably corrupt financial player
<heat> GeDaMo, would make little sense considering I passed -O3 and not -Os
<gog> and in the case of our infamous finance minister Bjarni Benediktsson, plainly corrupt
<GeDaMo> Pfft! You can't expect compilers to make sense :P
<zid`> clang pushes garbage because iceland is corrupt, got it
* zid` paying attention
<heat> Big Iceland controls the toolchains
<geist> re push vs add, i'm guessing it's a combination of instruction size and/or various optimizations for various microarches where sometimes pushes vs direct stack instructions are faster. if you're not specifying a -march it may be up to whatever each compiler thinks they're tuning for
<geist> i do remember there was a lot of back and forth on fiddling with stack pointer via anything other than push/pop being slow/fast/maybe
<heat> yeah but in this case you do not care about what you're pushing
<heat> so doing a mindless pushq %rax can stall the pipeline no?
<geist> right, and thus it's just there to align the stack
<geist> i doubt it, stack stuff is optimized out the wazoo
<geist> flip side is in some microarches, fiddling with SP directly may stall, because it may have to synchronize the stack engine, etc
<heat> you think the cpu will notice you never look at it?
<geist> the push probably not, the pop maybe?
<geist> as someone else mentioned, arm64 has a lot of these trash push/pops to keep alignment
<geist> via ldp/stp and sometimes using xzr as one of the regs
divine has quit [Quit: Lost terminal]
<heat> wait, how much can the CPU optimize the stack?
<zid`> I bet it doesn't matter unless eax isn't "settled" by the point of the push
<heat> if you do e.g 1: push %rax; pop %rax; jmp 1b, is %rax ever written to the stack?
<heat> can it do something really smart and e.g only write if you read that memory region from another thread? or if you get interrupted?
<bslsk05> ​en.wikichip.org: Skylake (client) - Microarchitectures - Intel - WikiChip
<zid`> depends how good the uop optimization bits are I guess
<zid`> I doubt that has a fuse though
<zid`> zen2/4 might be able to do it
<geist> yeah there's a ton of optimizations around the stack. it's one of the reason arm moved the SP out of the main register file as well
<heat> this asks for a benchmark doesn't it
<geist> i think it's fairly standard practice to have a cached copy of the SP floating around fairly early in the pipeline, outside of the general register file so it can be fast forwarded between stages to remove any interdependencies between instructions
divine has joined #osdev
<geist> historically i remember this meant something like if you did a bunch of push/pops in a row and then tried to read the ESP you'd get a stall because it'd have to 'write back' the cached SP to the main register file first
<zid`> heat we playing dark souls instead of this?
<heat> no
<zid`> even though I was *promised* dark souls? wow
hmmmm has joined #osdev
<heat> pushpop 3.46 ns 3.45 ns 203580329
<heat> mov 0.968 ns 0.966 ns 704777624
<zid`> try it on zen2/4
<heat> benchmark of 11 push %rax; pop %rbx vs mov %rax, %rbx
<zid`> and you definitely didn't straddle an icache line, and you put some gumpf before so the decode was nice and old etc?
<bslsk05> ​gist.github.com: mov-vs-push.cpp · GitHub
<heat> that's all I did
<zid`> doesn't account for many conflating effects then
<zid`> I imagine it's still slower though
<heat> push %rax; pop %rbx is actually smaller than mov %rax, %rbx
<heat> lol
<bslsk05> ​gist.github.com: mov-vs-push2.cpp · GitHub
<heat> with src and dst constantly swapped
<geist> well, yeah i mean of course the mov is faster
<heat> pushpop 13.6 ns 13.6 ns 51406861
<heat> mov 1.50 ns 1.50 ns 464182520
<geist> that just register renames
<zid`> stack renaming is also a thing sometimes though
<heat> yes, I was wondering if an x86 core could also rename that
<geist> ah
<zid`> zen2/zen4 is your best bet, it can do memory renaming for sure
<geist> yeah re: the original thing the question is 'silly push/pop vs add/sub to rsp'
<zid`> and it may see the push and pop as a [rsp] that it renames
<heat> let me bench that as well
<zid`> no stop it
<heat> also no zid no dark soul
<geist> but even that might be hard to bench because it would be the interlocking of other stuff going around it at the time
<geist> ie, add to rsp when there is a call right before/after it (which also fiddles with the stack)
<zid`> why no dark soul, you did a promise
<heat> i am lie
<heat> we are doing science here
<zid`> your science is rudimentary and flawed and also boring
<heat> i find it fascinating
<heat> can't debate with the other 2 though
<zid`> stick to locomotives like the rest of us
GeDaMo has quit [Quit: That's it, you people have stood in my way long enough! I'm going to clown college!]
<bslsk05> ​gist.github.com: pushpopvssubadd.cpp · GitHub
<heat> in my kabylake
<heat> pushpop 13.7 ns 13.6 ns 51325676
<heat> pushpop2 3.47 ns 3.45 ns 201544501
<heat> subadd 6.54 ns 6.52 ns 107096910
<heat> i suspect I successfully got pipeline stalls in pushpop
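A rough sketch of how the push/pop versus sub/add comparison could be reproduced without Google Benchmark, which heat's gists used. This is a swapped-in approach with `clock_gettime`, not heat's code; `pad_pushpop`, `pad_subadd` and `ns_per_iter` are made-up names. It assumes GCC or Clang inline asm on x86-64 (on other targets the two functions are empty), and both variants first step over the 128-byte SysV red zone so the push cannot clobber compiler-owned stack data.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Stack padding the way clang does it: one push, one pop. */
void pad_pushpop(void)
{
#if defined(__x86_64__)
    __asm__ volatile(
        "lea -128(%%rsp), %%rsp\n\t"   /* step over the red zone */
        "push %%rax\n\t"               /* value is irrelevant, only rsp moves */
        "pop  %%rcx\n\t"
        "lea 128(%%rsp), %%rsp"
        : : : "rcx", "memory");
#endif
}

/* Stack padding the way gcc does it: adjust rsp directly. */
void pad_subadd(void)
{
#if defined(__x86_64__)
    __asm__ volatile(
        "lea -128(%%rsp), %%rsp\n\t"
        "sub $8, %%rsp\n\t"
        "add $8, %%rsp\n\t"
        "lea 128(%%rsp), %%rsp"
        : : : "cc", "memory");
#endif
}

/* Average wall-clock nanoseconds per call of fn over iters calls. */
double ns_per_iter(void (*fn)(void), uint64_t iters)
{
    struct timespec t0, t1;
    assert(iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (uint64_t i = 0; i < iters; i++)
        fn();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}
```

Calling `ns_per_iter(pad_pushpop, n)` and `ns_per_iter(pad_subadd, n)` with the same `n` gives two numbers to compare. The call overhead and the two `lea` instructions are common to both variants, so only the difference between them is meaningful, and as geist and moon-child point out, an isolated loop like this says little about how either sequence behaves next to real calls and other stack traffic.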
<zid`> can I have that last binary
<heat> ok, discording you
<zid`> what is libbenchmark
<zid`> and why is it not an .a
<heat> oh shoot
<heat> it's google benchmark
<zid`> I am not a google mainframe sadly
<heat> let me see if I can get a static
<heat> nope
<heat> i'll give you the so
<zid`> so my PC is better at moving but worse at pushing
<heat> intel pt sampling on pushpop, pushpop2, subadd
<heat> i don't fully understand whats going on here but it seems interesting
Left_Turn has joined #osdev
<heat> my cpu does not seem to have a stalled cycles pmc
<mrvn> Why do you have a 5 opcode function? Why isn't that inlined? Embrace LTO and your whole benchmark becomes artificial.
<mrvn> What's the stack alignment on aarch64? 128bit?
<heat> yes 16b
<geist> also fun thing that you can enable, and virtually all systems do: there's two control bits that you can set for EL0 and EL1 that cause it to instantly throw an exception if SP is ever for any reason unaligned to 16B
<mrvn> Other than the double register load/store does it even matter?
<heat> simd
<heat> probably perf
<mrvn> heat: I throw in simd load/store with double register load/store. Anything above 8 byte.
<moon-child> wtf is this benchmark
<moon-child> like what is it even trying to measure
<heat> sub add vs push pop
<moon-child> but why?
<moon-child> no one does just subs and adds or just pushes and pops
<heat> clang appears to
<mrvn> moon-child: except gcc vs. clang
<heat> for alignment stuff
<moon-child> yea they do that for stack alignment
<moon-child> and then they go and do other stuff
<mrvn> moon-child: The question remains though why one compiler prefers to push an extra reg while the other adds 8 to keep the alignment.
<moon-child> code size. push is better. But this doesn't demonstrate that because literally all it's doing is pushing and popping
<heat> why does that mean push is better?
<mrvn> heat: he means it's smaller.
<heat> no, he means better
<heat> "push is better"
<mrvn> prefixed by "code size"
dutch has quit [Ping timeout: 256 seconds]
dutch has joined #osdev
danilogondolfo has quit [Remote host closed the connection]
bgs has quit [Remote host closed the connection]
mctpyt has joined #osdev
elastic_dog has quit [Ping timeout: 264 seconds]
elastic_dog has joined #osdev
mctpyt has quit [Ping timeout: 256 seconds]