dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
mctpyt has joined #osdev
nyah has quit [Quit: leaving]
Turn_Left has joined #osdev
Left_Turn has quit [Ping timeout: 252 seconds]
Burgundy has quit [Ping timeout: 246 seconds]
dutch has quit [Quit: WeeChat 3.8]
spikeheron has joined #osdev
gog has quit [Quit: byee]
<gorgonical>
Question about GIC interrupt grouping: the GICD and GICR have separate registers for configuring the group and security level that an interrupt has.
<gorgonical>
I'm guessing that the GICD is in charge of SPI interrupt configuration and the GICR is in charge of the per-CPU interrupts like SGI, PPI?
<gorgonical>
The main question is whether there's a "hierarchy": SGIs originate at the CPU interface and go through the GICR, but they have to go through the GICD to make it to another CPU. So in that case, does the sending GICR determine the type? The receiving one? The GICD?
bradd has quit [Ping timeout: 248 seconds]
mctpyt has quit [Ping timeout: 260 seconds]
tiggster has joined #osdev
<geist>
think of the GICR as the local apic and the GICD as an ioapic, iirc
<geist>
one of them is indeed per cpu, the other is more of a global thing
<geist>
SGIs i think Just Happen on the other core and there's not really any real overall configuration, since the range is basically reserved
<geist>
but this is just off of memory, so i might be wrong
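(A minimal sketch of the register split being described, assuming GICv3 and hypothetical gicd_base/gicr_sgi_base mappings: SGIs and PPIs (INTID 0-31) are grouped per CPU through that core's Redistributor, SPIs (INTID 32 and up) through the Distributor.)

    #include <stdint.h>

    #define GICD_IGROUPR(n)  (0x0080 + 4 * (n))  /* Distributor group bits, SPIs       */
    #define GICR_IGROUPR0    0x0080              /* Redistributor group bits, SGI/PPI  */

    static volatile uint32_t *gicd_base;      /* hypothetical mapped Distributor          */
    static volatile uint32_t *gicr_sgi_base;  /* hypothetical SGI frame of this CPU's GICR */

    static uint32_t rd32(volatile uint32_t *b, uint32_t off)             { return b[off / 4]; }
    static void     wr32(volatile uint32_t *b, uint32_t off, uint32_t v) { b[off / 4] = v; }

    /* Set (group1 != 0) or clear the Group 1 bit for one INTID. */
    void gic_set_group1(uint32_t intid, int group1)
    {
        volatile uint32_t *b   = intid < 32 ? gicr_sgi_base : gicd_base;
        uint32_t           off = intid < 32 ? GICR_IGROUPR0 : GICD_IGROUPR(intid / 32);
        uint32_t           bit = 1u << (intid % 32);
        uint32_t           val = rd32(b, off);

        wr32(b, off, group1 ? (val | bit) : (val & ~bit));
    }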
mctpyt has joined #osdev
<gorgonical>
hmm
<gorgonical>
If there's no configuration then that suggests anyone can SGI a secure core right?
<geist>
oh in a hypervisor situation that's a different story, but you're right, i think if there's a separate core then yeah there'd need to be some way to mask it off
<geist>
but i dont have the spec in front of me, there may be a mechanism to configure it locally
<geist>
at least some sort of local interrupt mask for sure for that SGI
<geist>
but i dont think there's necessarily a way of specifying which cores can SGI which other cores
<geist>
*aside* from whatever virtualization extensions EL2 may implement
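(Following the point about a local mask: a sketch, again assuming GICv3 and reusing the hypothetical gicr_sgi_base/wr32 helpers above, of a core disabling a given SGI for itself via its own Redistributor, so that being able to send an SGI does not mean the target actually takes it.)

    #define GICR_ICENABLER0  0x0180   /* write 1 to a bit to disable that SGI/PPI locally */

    void gic_disable_local_sgi(uint32_t sgi_id)   /* sgi_id in 0..15 */
    {
        wr32(gicr_sgi_base, GICR_ICENABLER0, 1u << sgi_id);
    }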
joe9 has quit [Quit: leaving]
spikeheron has quit [Quit: WeeChat 3.8]
<moon-child>
is it slow to send IPIs, or just to receive them?
Clockface has joined #osdev
dutch has joined #osdev
<Clockface>
what's the most practical way to emulate a specific piece of hardware for other kernel-mode code
<Clockface>
will i have to just intercept every I/O thing from everything else
<Clockface>
and then replicate all of it "for real"
<Clockface>
except the stuff connecting to the fake device
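(One way to read the "intercept every I/O thing" idea, as a sketch with hypothetical names: route device accesses through a per-region handler table, pass most regions through to real MMIO, and send only the faked device's region to an emulation handler.)

    #include <stdint.h>
    #include <stddef.h>

    struct mmio_region {
        uint64_t base, size;
        uint32_t (*read32)(uint64_t off);
        void     (*write32)(uint64_t off, uint32_t val);
    };

    /* Hypothetical emulated device: one 32-bit "status" register at offset 0. */
    static uint32_t fake_status;
    static uint32_t fake_read32(uint64_t off)              { return off ? 0 : fake_status; }
    static void     fake_write32(uint64_t off, uint32_t v) { if (!off) fake_status = v; }

    /* Everything else is replicated "for real": plain MMIO. */
    static uint32_t real_read32(uint64_t addr)              { return *(volatile uint32_t *)(uintptr_t)addr; }
    static void     real_write32(uint64_t addr, uint32_t v) { *(volatile uint32_t *)(uintptr_t)addr = v; }

    static struct mmio_region regions[] = {
        { 0xfe000000, 0x1000, fake_read32, fake_write32 },  /* the fake device        */
        { 0,          ~0ull,  real_read32, real_write32 },  /* catch-all pass-through */
    };

    uint32_t emu_read32(uint64_t addr)
    {
        for (size_t i = 0; i < sizeof(regions) / sizeof(regions[0]); i++)
            if (addr - regions[i].base < regions[i].size)
                return regions[i].read32(addr - regions[i].base);
        return 0;
    }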
bradd has joined #osdev
slidercrank has joined #osdev
foudfou has quit [Ping timeout: 255 seconds]
foudfou has joined #osdev
Vercas6 has joined #osdev
Vercas has quit [Ping timeout: 255 seconds]
Vercas6 is now known as Vercas
zxrom has quit [Read error: Connection reset by peer]
mctpyt has joined #osdev
mctpyt has quit [Ping timeout: 246 seconds]
foudfou has quit [Quit: Bye]
foudfou has joined #osdev
foudfou has quit [Remote host closed the connection]
foudfou has joined #osdev
bgs has joined #osdev
jjuran has quit [Quit: Killing Colloquy first, before it kills me…]
jjuran has joined #osdev
Vercas has quit [Quit: Ping timeout (120 seconds)]
Vercas has joined #osdev
epony has joined #osdev
masoudd has joined #osdev
Vercas has quit [Remote host closed the connection]
Vercas has joined #osdev
bradd has quit [Remote host closed the connection]
danilogondolfo has joined #osdev
bradd has joined #osdev
hmmmm has quit [Remote host closed the connection]
gog has joined #osdev
slidercrank has quit [Ping timeout: 255 seconds]
Vercas9 has joined #osdev
fedorafan has joined #osdev
Vercas has quit [Ping timeout: 255 seconds]
Vercas9 is now known as Vercas
mahk has quit [Ping timeout: 260 seconds]
GeDaMo has joined #osdev
elastic_dog has quit [Read error: Connection reset by peer]
elastic_dog has joined #osdev
Burgundy has joined #osdev
les has joined #osdev
les has quit [Client Quit]
les has joined #osdev
mahk has joined #osdev
mahk has quit [Ping timeout: 248 seconds]
mahk has joined #osdev
foudfou has quit [Ping timeout: 255 seconds]
foudfou has joined #osdev
foudfou has quit [Remote host closed the connection]
foudfou has joined #osdev
slidercrank has joined #osdev
<netbsduser`>
a question about unified buffer caches: in general i know pages of these get different treatment from e.g. anonymous pages, because pages of a page cache get written out to their backing store regularly (i think on linux every 30s) rather than only in response to page replacement deciding that a page has to be evicted to make room for another. but nonetheless they also get put back to disk in response to typical page replacement demands too
<netbsduser`>
so consider the case of certain filesystems, which have to enforce invariants like "this journal block has to be written before that metadata block is, else all hell breaks loose." i know that there are a lot of filesystems which do in fact write journals lazily. what approach is usually taken in unified buffer caches to describe such invariants and to ensure that they are not violated by the normal page replacement policy?
<netbsduser`>
i have considered two approaches: one is to let the `struct buf`s associated with a UBC hold dependency information. this would allow the pageout daemon to continue to apply its own page replacement policy (if it deems a page eligible for swapout, and finds it contains bufs which have dependencies, it would then write those dependencies out first.) another is to have it handled at the filesystem level. the page descriptors (or the bufs they
<netbsduser`>
contain) would be marked to say, "fs driver will handle these ones"
bradd has quit [Ping timeout: 248 seconds]
joe9 has joined #osdev
<mrvn>
you write out the dependencies, then throw in a barrier/flush, and only then the depending blocks.
<mrvn>
the kernel will not reorder I/O across barriers
<mrvn>
which is also a problem. Because when you fsync() a file the updates can be stuck behind a barrier with tons of unrelated data, and they can't be fast-tracked because that would require crossing the barrier.
<mrvn>
If you write your own IO system then having a dependency / order graph seems like an improvement over the simple queue strategy generally used.
<netbsduser`>
mrvn: but who writes them in that order? would, let's say, the FS driver submit asynchronous writes to the I/O system, which maintains the ordering information, so that if e.g. the pageout daemon wants to write out a page, the I/O system checks it against its queue of pending writes and orders it appropriately? that might be a wiser approach than either of what i was considering
[itchyjunk] has joined #osdev
mctpyt has joined #osdev
[itchyjunk] has quit [Read error: Connection reset by peer]
heat has joined #osdev
mctpyt has quit [Ping timeout: 248 seconds]
[itchyjunk] has joined #osdev
heat has quit [Remote host closed the connection]
heat has joined #osdev
[itchyjunk] has quit [Read error: Connection reset by peer]
[_] has joined #osdev
craigo has quit [Ping timeout: 252 seconds]
dutch has quit [Quit: WeeChat 3.8]
<mrvn>
netbsduser`: each I/O layer writes its queue in the order the barriers enforce
<mrvn>
There is also no checking. The I/O layers simply perform the I/O they are told to do. If you write out a page twice, it gets written out twice if there is a barrier between the two writes. Maybe even always.
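(A sketch of the dependency bookkeeping discussed above, with hypothetical names: before the pageout path writes a dirty buffer, it first flushes and waits for whatever that buffer depends on, e.g. the journal block that must reach disk before the metadata block, then issues a barrier.)

    #include <stdbool.h>
    #include <sys/queue.h>   /* assumes BSD-style list macros are available */

    struct buf;

    struct buf_dep {
        struct buf *must_write_first;       /* e.g. the journal block          */
        LIST_ENTRY(buf_dep) link;
    };

    struct buf {
        bool dirty;
        LIST_HEAD(, buf_dep) deps;          /* blocks that must hit disk first */
    };

    /* Provided elsewhere in this hypothetical kernel. */
    void bwrite_async(struct buf *bp);      /* queue a write                   */
    void bwait(struct buf *bp);             /* wait for that write to complete */
    void io_barrier(void);                  /* later I/O won't pass this point */

    /* Called when page replacement decides bp has to go back to disk. */
    void pageout_write(struct buf *bp)
    {
        struct buf_dep *d;

        LIST_FOREACH(d, &bp->deps, link)
            if (d->must_write_first->dirty)
                bwrite_async(d->must_write_first);
        LIST_FOREACH(d, &bp->deps, link)
            bwait(d->must_write_first);

        io_barrier();                       /* dependencies are durable now    */
        bwrite_async(bp);                   /* the depending block may follow  */
    }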
fedorafan has quit [Ping timeout: 248 seconds]
dutch has joined #osdev
fedorafan has joined #osdev
srjek has joined #osdev
aoei is now known as Stella
<kaichiuchi>
hi
<heat>
hai
<gog>
hi
masoudd has quit [Remote host closed the connection]
masoudd has joined #osdev
bauen1_ has joined #osdev
bauen1 has quit [Ping timeout: 252 seconds]
<heat>
"However, the modularity of UEFI also makes it easier for HP to innovate. HP DayStarter is a simple value-add to the system allowing users to have access to productivity information while waiting for the system to boot"
<gog>
this is not what uefi is for but it's the inevitable consequence of making pre-boot application development easier
<gog>
good job
<gog>
we heard you liked operating systems so we put an operating system into your firmware
gog has quit [Quit: Konversation terminated!]
<heat>
late stage capitalism EFI
<kof123>
late stage osdev
<kof123>
devours its children
Vercas has quit [Remote host closed the connection]
Vercas has joined #osdev
<sakasama>
Thank you HP DayStarter. Without this innovative technology I may never have known that useful fact about Chuck Norris.
<heat>
i hope you all realize this is done in SMM
knusbaum has quit [Ping timeout: 248 seconds]
knusbaum has joined #osdev
<sakasama>
I've heard of that! It's kind of like BDSM but participants need double the masochism.
masoudd has quit [Remote host closed the connection]
masoudd_ has joined #osdev
<heat>
no, that is BSD
xenos1984 has quit [Ping timeout: 248 seconds]
xenos1984 has joined #osdev
Turn_Left has quit [Ping timeout: 252 seconds]
dude12312414 has joined #osdev
<clever>
heat: isn't that just a clone of a minimal linux env in the flash? or does it run alongside the os??
<clever>
oh, checking the screenshot, it looks more like an odd overlay, after the bootloader has run??
<clever>
but where is it getting that data from
<heat>
clever, The benefits to the customers are the instant-on user experience with user productivity information (such as calendar, to-do list and customizable information) available for display before and while Windows is booting. The main technology behind it is for the UEFI BIOS to locate the proper JPEG images and use the System Management Mode (SMM) to update the frame buffer content until Windows is ready for system login. At OS runtime, HP
<heat>
implements an Outlook plug-in to capture the calendar information.
<heat>
it uses fucking SMM
<heat>
i hope they do jpeg decoding in SMM for the big funny
<clever>
heat: windows already has a cheat for instant on, they renamed hibernate to shutdown :P
<clever>
so when you think you've turned it off, it just went into hibernate
<heat>
yes, this was in 2011
<clever>
ah
<heat>
imagine how much better DayStarter is these days!
<clever>
smm also explains most of my questions
<clever>
now it can be just as annoying as the HUD on my tv, getting in the way and covering up valuable UI elements
<clever>
until it times out
<heat>
modern daystarter should play youtube vids in SMM :v
<clever>
or, you know, just boot faster :P
<heat>
hmm, good point
<heat>
there's room for a tiktok or two
<clever>
but i have had a similar idea in the past, with that display on the apple keyboard
<clever>
where they replaced the F1-F12 row, with what is basically an ipad
<clever>
fully self-contained computer
<mats2>
outlook in uefi
<mats2>
amazing innovation
<clever>
why not allow that to run on the keyboard, with the system off?
<gorgonical>
though personally it does make programming directly in asm a lot easier
<mjg>
who is highlighting me
<gorgonical>
I have been writing a lot of aarch64 asm and I'm furious about it usually
<heat>
wait
<heat>
wtf is it doing
<heat>
why is it saving %rax
<gorgonical>
mjg: i don't see any mentions
<mjg>
> chad
<mjg>
that was it
<gorgonical>
lmao
<heat>
func: # @func
<heat>
pushq %rax
<heat>
callq *(%rdi)
<heat>
popq %rcx
<heat>
addl $20, %eax
<heat>
retq
<heat>
am I going cray-cray or does this make no sense?
<GeDaMo>
Aligning the stack?
<heat>
for int func(int(**f)(void)) { return (*f)() + 20; }
<heat>
ooooooooh
<heat>
maybe so
<gorgonical>
does the stack need alignment on x86?
<heat>
yes
<mjg>
yes and no
<heat>
GeDaMo, great one! seems to be it
<heat>
gcc just does sub and add
<heat>
now this makes me wonder, why does clang seem to codegen crap here?
<GeDaMo>
16 bytes
<heat>
push %rax makes it depend on %rax
<heat>
cc chad
<mjg>
again with the highlights
<GeDaMo>
The return address pushed by the call misaligns it
<gorgonical>
i wasn't aware that the stack wants/needs to be 16-byte aligned
<heat>
gorgonical, yeah, it's there on sysv at least cuz of SSE
<mjg>
gorgonical: that's only true if you use simd
<gorgonical>
oooh
slidercrank has quit [Ping timeout: 248 seconds]
<heat>
I think it's still true on -mgeneral-regs-only
<gorgonical>
because I'm used to this on arm64, hence ldp/stp instructions and stuff
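(Why the SysV rule exists, as a sketch assuming x86-64 SSE: the compiler is allowed to spill a __m128i local to a stack slot with an aligned movaps/movdqa, and that store only works if every caller kept %rsp 16-byte aligned at its call sites.)

    #include <emmintrin.h>

    __m128i demo(__m128i a, __m128i b)
    {
        /* 'volatile' forces tmp into a stack slot; compilers typically use an
         * aligned movaps/movdqa for that spill, which faults if some caller
         * left %rsp misaligned at the call. */
        volatile __m128i tmp = _mm_add_epi32(a, b);
        __m128i r = tmp;
        return _mm_sub_epi32(r, a);
    }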
<heat>
mjg, but seriously mr chad doesn't that make like 0 sense
<mjg>
dude i'm running on negative brainpower today
<heat>
unless you did something like xor %eax, %eax; push %rax to break the dependency
<GeDaMo>
You can directly alter the stack pointer too
<heat>
yes, gcc does that
Brnocrist has quit [Ping timeout: 268 seconds]
<gorgonical>
then it is a good question why clang just pushes garbage
<mjg>
lol it has tendra
Brnocrist has joined #osdev
masoudd_ has quit [Quit: Leaving]
<GeDaMo>
The only reason that comes to mind is instruction size
dude12312414 has quit [Remote host closed the connection]
dude12312414 has joined #osdev
elastic_dog is now known as Guest218
elastic_dog has joined #osdev
<gog>
gorgonical: my hypothesis is that it's to keep us more in line with business time in most of europe
<gog>
particularly banking and securities trading
<gog>
and that this is owing to iceland's recent history as a dubious and probably corrupt financial player
<heat>
GeDaMo, would make little sense considering I passed -O3 and not -Os
<gog>
and in the case of our infamous finance minister Bjarni Benediktsson, plainly corrupt
<GeDaMo>
Pfft! You can't expect compilers to make sense :P
<zid`>
clang pushes garbage because iceland is corrupt, got it
* zid`
paying attention
<heat>
Big Iceland controls the toolchains
<geist>
re push vs add, i'm guessing it's a combination of instruction size and/or various optimizations for various microarches where sometimes pushes vs direct stack instructions are faster. if you're not specifying a -march it may be up to whatever each compiler thinks it's tuning for
<geist>
i do remember there was a lot of back and forth on fiddling with the stack pointer via anything other than push/pop being slow/fast/maybe
<heat>
yeah but in this case you do not care about what you're pushing
<heat>
so doing a mindless pushq %rax can stall the pipeline no?
<geist>
right, and thus it's just there to align the stack
<geist>
i doubt it, stack stuff is optimized out the wazoo
<geist>
flip side is in some microarches, fiddling with SP directly may stall, because it may have to synchronize the stack engine, etc
<heat>
you think the cpu will notice you never look at it?
<geist>
the push probably not, the pop maybe?
<geist>
as someone else mentioned, arm64 has a lot of these trash push/pops to keep alignment
<geist>
via ldp/stp and sometimes using xzr as one of the regs
divine has quit [Quit: Lost terminal]
<heat>
wait, how much can the CPU optimize the stack?
<zid`>
I bet it doesn't matter unless eax isn't "settled" by the point of the push
<heat>
if you do e.g 1: push %rax; pop %rax; jmp 1b, is %rax ever written to the stack?
<heat>
can it do something really smart and e.g only write if you read that memory region from another thread? or if you get interrupted?
<zid`>
depends how good the uop optimization bits are I guess
<zid`>
I doubt that has a fuse though
<zid`>
zen2/4 might be able to do it
<geist>
yeah there's a ton of optimizations around the stack. it's one of the reasons arm moved the SP out of the main register file as well
<heat>
this calls for a benchmark, doesn't it
<geist>
i think it's fairly standard practice to have a cached copy of the SP floating around fairly early in the pipeline, outside of the general register file, so it can be fast-forwarded between stages to remove any interdependencies between instructions
divine has joined #osdev
<geist>
historically i remember this meant something like: if you did a bunch of push/pops in a row and then tried to read ESP you'd get a stall, because it'd have to 'write back' the cached SP to the main register file first
<zid`>
heat we playing dark souls instead of this?
<heat>
no
<zid`>
even though I was *promised* dark souls? wow
hmmmm has joined #osdev
<heat>
pushpop 3.46 ns 3.45 ns 203580329
<heat>
mov 0.968 ns 0.966 ns 704777624
<zid`>
try it on zen2/4
<heat>
benchmark of 11 push %rax; pop %rbx vs mov %rax, %rbx
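(A sketch of the kind of microbenchmark being quoted, assuming x86-64 and GNU C inline asm; the ns/iter numbers above came from heat's own harness, and the loop body and iteration count here are illustrative, not what heat ran.)

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    static uint64_t now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
    }

    int main(void)
    {
        const long iters = 100000000;
        uint64_t t0, t1;

        t0 = now_ns();
        for (long i = 0; i < iters; i++)
            /* balanced push/pop, so %rsp is unchanged after the statement */
            __asm__ volatile("push %%rax\n\tpop %%rbx" ::: "rbx", "memory");
        t1 = now_ns();
        printf("push/pop: %.3f ns/iter\n", (double)(t1 - t0) / iters);

        t0 = now_ns();
        for (long i = 0; i < iters; i++)
            __asm__ volatile("mov %%rax, %%rbx" ::: "rbx", "memory");
        t1 = now_ns();
        printf("mov:      %.3f ns/iter\n", (double)(t1 - t0) / iters);

        return 0;
    }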
<zid`>
and you definitely didn't straddle an icache line, and you put some gumpf before so the decode was nice and old etc?
<heat>
intel pt sampling on pushpop, pushpop2, subadd
<heat>
i don't fully understand what's going on here but it seems interesting
Left_Turn has joined #osdev
<heat>
my cpu does not seem to have a stalled cycles pmc
<mrvn>
Why do you have a 5-opcode function? Why isn't that inlined? Embrace LTO and your whole benchmark becomes artificial.
<mrvn>
What's the stack alignment on aarch64? 128-bit?
<heat>
yes 16b
<geist>
also a fun thing that you can enable, and virtually all systems do: there are two control bits you can set for EL0 and EL1 that cause it to instantly throw an exception if SP is ever, for any reason, unaligned to 16B
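(The control bits in question, as a sketch assuming AArch64 EL1 and GNU C inline asm: SCTLR_EL1.SA (bit 3) and SCTLR_EL1.SA0 (bit 4) make any load/store through a 16-byte-misaligned SP fault immediately at EL1 and EL0 respectively.)

    #include <stdint.h>

    static inline void enable_sp_alignment_checks(void)
    {
        uint64_t sctlr;

        __asm__ volatile("mrs %0, sctlr_el1" : "=r"(sctlr));
        sctlr |= (1ull << 3) | (1ull << 4);        /* SA (EL1) | SA0 (EL0) */
        __asm__ volatile("msr sctlr_el1, %0\n\tisb" :: "r"(sctlr));
    }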
<mrvn>
Other than the double register load/store does it even matter?
<heat>
simd
<heat>
probably perf
<mrvn>
heat: I lump SIMD load/store in with double-register load/store. Anything above 8 bytes.
<moon-child>
wtf is this benchmark
<moon-child>
like what is it even trying to measure
<heat>
sub add vs push pop
<moon-child>
but why?
<moon-child>
no one does just subs and adds or just pushes and pops
<heat>
clang appears to
<mrvn>
moon-child: except gcc vs. clang
<heat>
for alignment stuff
<moon-child>
yea they do that for stack alignment
<moon-child>
and then they go and do other stuff
<mrvn>
moon-child: The question remains though why one compiler prefers to push an extra reg while the other adjusts the stack pointer by 8 to keep the alignment.
<moon-child>
code size. push is better. But this doesn't demonstrate that because literally all it's doing is pushing and popping
<heat>
why does that mean push is better?
<mrvn>
heat: he means it's smaller.
<heat>
no, he means better
<heat>
"push is better"
<mrvn>
prefixed by "code size"
dutch has quit [Ping timeout: 256 seconds]
dutch has joined #osdev
danilogondolfo has quit [Remote host closed the connection]