klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
trufas has quit [Ping timeout: 258 seconds]
trufas has joined #osdev
<vai> doug16k: whats your project home website URL? if any? or github would be good, as well.. interested
<bslsk05> ​doug65536/dgos - Operating System (13 forks/99 stargazers/GPL-3.0)
<doug16k> see also qemu-rom - it's a toy BIOS project that supports multiple cpus
<doug16k> it's kind of osdev
<clever> doug16k: the original goal of me joining #osdev, was to write a quick&dirty "bios" rom, that would go from realmode to 64bit paging, in as few steps as possible, for use exclusively in qemu
<clever> but i never got that working right
<bslsk05> ​github.com: qemu-rom/entry_arch.S at master · doug65536/qemu-rom · GitHub
<doug16k> clever, my qemu-rom project does exactly that
<clever> i was also cheating, and had qemu generate the full paging table from the host side, so its already in the addr space, and you just set whatever cr it was
<clever> the original need for all of that is long gone though, so not much point in finishing it now
<doug16k> yeah, I considered putting the page tables in ROM. I want to map 16GB and only need 2MB pages, so it is not even close to fitting
<clever> my goal was to run a xen PV guest under plain qemu
<clever> i got surprisingly far, with an entirely userland based solution
tacco has quit []
<vai> doug16k: my bootloader is huge.. takes 4 kilos of static assembly compiled binary
<doug16k> vai, my bios bootloader is about 100KB
<vai> doug16k: so on all systems it loads up a single track full of boot sector content
<bslsk05> ​github.com: xen-mock/main.cpp at master · cleverca22/xen-mock · GitHub
<clever> doug16k: basically, i just loaded the guest (an elf file) kernel into ram, and blindly ran the entry point, with a hypercall page setup
<clever> and it worked surprisingly well, able to do hypercalls and print debug logging
<clever> the problem, is that the guest assumed a certain order for stack/heap/.text, and crashed horribly because of pthread putting the stack in the wrong place
<doug16k> vai, my bootloader can boot all 6 combinations of {bios|efi}-{disk|cd|lan}
<klange> > 7.7K Jun 17 23:34 cdrom/boot.sys
<doug16k> the elf code can't even tell it is on tftp
<klysm> doug16k, have you been interested in doing hypercalls?
<doug16k> I have paravirtualized EOI
<doug16k> can eoi by writing an int to regular ram, instead of vmexit
<klysm> end of interrupt
<doug16k> yeah
<bslsk05> ​github.com: xen-mock/main.cpp at master · cleverca22/xen-mock · GitHub
<doug16k> telling LAPIC "ok, done that one, gimme next lower priority one"
gog has joined #osdev
<clever> klysm: for XEN, there is a "hypercall page" containing an array of 32 byte functions, each one for doing a different job, and the xen hypervisor pre-fills it with whatever opcode is best suited for the cpu (such as vmexit)
<clever> klysm: the guest then just blindly treats that as a set of functions, and calls it with the normal calling convention
<doug16k> EOI happens thousands of times per second
<doug16k> under heavy I/O
<clever> actually, it might be 128 byte functions
<doug16k> if you paravirtualize exactly one thing, make it the EOI
<clever> no, 32, misread this old code
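A minimal sketch of the layout clever describes, assuming a 4 KiB hypercall page cut into 32-byte stubs, so 128 slots fit in one page. The names `HYPERCALL_STUB_SIZE` and `hypercall_stub` are made up for illustration; a real guest would cast the result to a function pointer and call it, rather than just compute the address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HYPERCALL_PAGE_SIZE 4096u
#define HYPERCALL_STUB_SIZE 32u   /* per the discussion: 32-byte functions */
#define HYPERCALL_SLOTS     (HYPERCALL_PAGE_SIZE / HYPERCALL_STUB_SIZE)

/* Return the address of the stub for hypercall number nr,
 * or NULL if nr does not fit in the page. */
static const void *hypercall_stub(const void *page, unsigned nr)
{
    assert(page != NULL);
    if (nr >= HYPERCALL_SLOTS)
        return NULL;
    return (const uint8_t *)page + (size_t)nr * HYPERCALL_STUB_SIZE;
}
```

Since the hypervisor fills in the stub bodies, the guest never needs to know which trapping instruction sits inside them.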
<klysm> paravirtual means a contrast to hardware virtual, what else does paravirtual mean?
<bslsk05> ​github.com: linux/xen.h at master · torvalds/linux · GitHub
<doug16k> paravirtualizing means, making the guest aware of the host and do a special thing that the host provides to avoid an expensive vmexit trap
<clever> klysm: in the case of xen, a paravirtual VM, will lack a lot of legacy/real hw, no bios, no emulating an ata controller, everything must be done with proper virtualized drivers
<klysm> clever, then there must be a mixed environment, also
lg has quit [Quit: leaving]
<clever> klysm: xen still allows the PV ops on a non-pv based VM
lg has joined #osdev
<doug16k> normally, eoi would trap and it would exit, and kvm would do some work just figuring out what happened, then it would do some insignificant thing and resume the guest
<clever> so a guest can boot via the legacy stuff (such as windows), but then switch to using PV based drivers (once you install them, and the boot has progressed that far)
<doug16k> the paravirtualized eoi has you doing a single cmpxchg in the guest and kvm peeks at it some other time
<doug16k> no vmexit
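A rough sketch of the guest side of a paravirtualized EOI, loosely modelled on how KVM's PV EOI is usually described: the host sets a flag in ordinary guest RAM when the EOI may be skipped, and the guest clears it with one atomic operation. The names `pv_eoi_word`, `lapic_write_eoi` and `guest_eoi` are hypothetical, and the LAPIC write is simulated with a counter so the example can run in userspace.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Word in ordinary guest RAM that the hypervisor also looks at. */
static _Atomic uint32_t pv_eoi_word;

/* Counts simulated trapping EOIs; stands in for the real LAPIC register write. */
static unsigned slow_eoi_count;

static void lapic_write_eoi(void)
{
    slow_eoi_count++;   /* on real hardware this access would cause a vmexit */
}

/* Signal end-of-interrupt. Returns true if it was done without the slow path. */
static bool guest_eoi(void)
{
    uint32_t expected = 1;

    /* If the host left a 1 here, clear it and return: no trap needed.
     * The host notices the cleared flag whenever it next looks. */
    if (atomic_compare_exchange_strong(&pv_eoi_word, &expected, 0))
        return true;

    /* Flag was not set, so fall back to the ordinary EOI. */
    lapic_write_eoi();
    return false;
}
```

The fast path touches only memory the guest already owns, which is why it is so much cheaper than a trap at thousands of interrupts per second.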
<klysm> is vmexit an instruction?
<clever> doug16k: ahh, so the guest can finish servicing an irq, and return to running guest kernel/userland code, and not bother context switching back to the host?
<doug16k> no, vmexit is the cpu freaking out because the hypervisor configured the cpu to not allow that instruction in the guest
lg has quit [Client Quit]
<doug16k> it stops running the guest and the hypervisor gets control
lg has joined #osdev
<klysm> yeah I was thinking it was a reference to an instruction that was supposed to trap, from context
<doug16k> that instruction or operation or access or whatever
<klysm> next question is how do you choose the opcode that's supposed to trap?
<klysm> this would be for x86
<doug16k> there are bits in the vm state / registers where you set what traps and what is just allowed
<klysm> so does the vm state have a list of opcodes?
<doug16k> you can make it so they see the real MSRs and they are allowed to really change them, or you can set it to trap it
<doug16k> there is a list of things you can allow/trap
<klysm> the list of things you can allow/trap, is it in the manual?
<doug16k> yeah let me see what term they use
<doug16k> vol 3 entire chapter 27
<doug16k> if you mean intel
<doug16k> AMD system programming manual appendix B lists a pile of bitflags that decide whether to trap things
<doug16k> vol 2
nyah has quit [Ping timeout: 268 seconds]
<klysm> looking at 24.7 now
<klysm> and 27.2.1
<doug16k> you could make a hypervisor that lets the real machine "leak through" to the guest, and the guest is affecting the real machine, all the way down to the hypervisor not seeing anything real at all, nothing but emulated devices
<doug16k> sorry, the hypervisor not letting the guest see anything real at all
<klysm> that makes a bit more sense
<doug16k> letting everything leak through is super fast. trapping everything is slowest
<klysm> "VM exits due to the following causes: debug exceptions; page-fault exceptions; start-up IPIs (SIPIs); system-management interrupts (SMIs) that arrive immediately after the retirement of I/O instructions; task switches; ...
<klysm> ...INVEPT; INVLPG; INVPCID; INVVPID; LGDT; LIDT; LLDT; LTR; SGDT; SIDT; SLDT; STR; VMCLEAR; VMPTRLD; VMPTRST; VMREAD; VMWRITE; VMXON; XRSTORS; XSAVES; control-register accesses; MOV DR; I/O instructions; MWAIT; accesses to the APIC-access page (see Section 29.4); EPT violations; EOI virtualization"
<doug16k> yeah you can't let everything through. I mean letting as much as you can through
CryptoDavid has quit [Quit: Connection closed for inactivity]
<klysm> so those are instructions you can cause to trap
<doug16k> there are things that trap no matter what, and there are things that trap if you like
<klysm> can you make a trap for "IRET," then?
<doug16k> and most normal instructions can't be trapped
<klysm> okok
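To make the "bits that decide what traps" idea concrete, here is a small sketch using a few bits from Intel's primary processor-based VM-execution controls. The bit positions are quoted from memory of the SDM and should be checked there before use; the helpers `traps` and `set_trap` are invented for this example. A real hypervisor would write the resulting word into the VMCS and would also have to respect the allowed settings reported by the capability MSRs.

```c
#include <stdbool.h>
#include <stdint.h>

/* A few "exit on this" control bits (positions to be verified against the SDM). */
#define CPU_BASED_HLT_EXITING        (UINT32_C(1) << 7)
#define CPU_BASED_INVLPG_EXITING     (UINT32_C(1) << 9)
#define CPU_BASED_MWAIT_EXITING      (UINT32_C(1) << 10)
#define CPU_BASED_MOV_DR_EXITING     (UINT32_C(1) << 23)
#define CPU_BASED_UNCOND_IO_EXITING  (UINT32_C(1) << 24)

/* Does the given control word ask the CPU to exit on this event? */
static bool traps(uint32_t controls, uint32_t bit)
{
    return (controls & bit) != 0;
}

/* Return a copy of controls with one exit condition switched on or off. */
static uint32_t set_trap(uint32_t controls, uint32_t bit, bool on)
{
    return on ? (controls | bit) : (controls & ~bit);
}
```

There is one bit per trappable event rather than a list of opcodes, which is why most ordinary instructions cannot be made to trap at all.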
<geist> hmm, interesting. was piddling around on my AMD bios and it has two options for TPM: discrete (in case you have something plugged in which i dont) and 'firmware tpm'
<klange> There's something satisfyingly nostalgic about spinning up VGA text mode...
<geist> the firmware TPM seems to be some sort of emulation style thing that the AGESA implements. curious how that works
<doug16k> geist, security processor provides it
<geist> oooh makes sense
<geist> ah yes, rebooted into linux and now i see a 'TPM2' ACPI table
<geist> presumably describes it
<doug16k> yeah, my understanding is, at cold start, that same arm security processor brings up ram and pci, then it fires up the zen2 cores
<doug16k> or whatever zen
<geist> yah makes sense. okay that explains how it's 'built in' to the cpu
<geist> i wonder if linux has any drivers or whatnot for it? this particular build doesn't seem to say anything in dmesg spew about it
<doug16k> it happens before the rom. they actually do a thing a lot like uboot
<doug16k> the rom doesn't even have to start at fffffff0
<doug16k> you can tell it to put you in ram and jump into you wherever
<geist> yah obviously linux has drivers and whatnot for it, but doesn't seem like this distro is trying to use it
<doug16k> it's a completely generic tpm
<doug16k> it should just work like a keyboard
<doug16k> right?
<doug16k> no such thing as installing tpm drivers afaik
<doug16k> that would be like installing PIC drivers no?
<geist> there is a /dev/tpm0
<geist> i just saw no boot spew and no module for it. could be built into the kernel
flx has quit [Quit: Leaving]
<doug16k> I am very glad that you don't use the drivers that came with a tpm. as if I trust some random driver
<doug16k> I think the completely generic nature of them is part of the security
<doug16k> you can use a thoroughly proven driver
<klysm> what is the "tpm driver" and what does it allow you to do to your tpm?
<doug16k> it is for storing secrets, like your full-disk-encryption key
<geist> well FWIW if i boot into windows it sees the tpm (tpm.msc has a little status window)
<klysm> so if you know your keys, you have access to all the secrets?
<geist> presumably there's a public/private thing there that lets you authenticate the storer/asked
<geist> asker
<clever> i believe TPM's also have a "measured boot" mode for unlocking
<geist> dont think it'd be very useful if you could just store a bag of bits with the name "foo" and then ask for "foo" later
<clever> every executable chunk of code in the boot chain, will report the hash of the next chunk to the TPM, before passing on control
<geist> i guess i really should read the spec. it comes up at work enough that eventually i should really know what it is
<clever> and if you play the right sequence of hashes, the TPM will unlock itself
<clever> so if say grub.efi was modified by an evil maid, the hash will be wrong, and the TPM wont unlock
<clever> and there is no way to reset the TPM, without also resetting the cpu and giving the bios/efi control
<geist> yah
<clever> so the firmware on the board will start playing the hashes back again, and only the hashed code has control over what to do with the keys
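The hash chain clever describes can be sketched as an "extend" operation: the new register value is a hash of the old value concatenated with the measurement of the next stage. The sketch below uses 64-bit FNV-1a purely as a stand-in so it stays short; a real TPM uses a cryptographic hash such as SHA-256, and the names `pcr_extend` and `measure_chain` are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FNV_OFFSET 14695981039346656037ULL
#define FNV_PRIME  1099511628211ULL

/* Toy hash (FNV-1a), continuing from state h. NOT cryptographically secure. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
    const uint8_t *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* pcr_new = H(pcr_old || digest): the register can only be extended, never set. */
static uint64_t pcr_extend(uint64_t pcr, uint64_t digest)
{
    uint64_t h = fnv1a(FNV_OFFSET, &pcr, sizeof pcr);
    return fnv1a(h, &digest, sizeof digest);
}

/* Each boot stage measures the next one before handing over control. */
static uint64_t measure_chain(const char *const *stages, size_t n)
{
    uint64_t pcr = 0;   /* value after a platform reset */
    for (size_t i = 0; i < n; i++) {
        uint64_t digest = fnv1a(FNV_OFFSET, stages[i], strlen(stages[i]));
        pcr = pcr_extend(pcr, digest);
    }
    return pcr;
}
```

Because every step folds in the previous value, changing or reordering any stage changes the final value, and the only way back to the starting value is a reset that hands control to the firmware again.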
<geist> or at least if you do reset it you lose everything that was stored in it
<clever> yeah, i would expect a factory-reset to still be available
<geist> there's even a button for it here in the TPM control panel in windows
<klysm> and the arm security processor is the first thing that executes on a cold boot? that's the tpm right?
<doug16k> yeah
<doug16k> zen bioses don't even need to do cache as ram or anything. ram and pci works at bios entry
<clever> ive mostly heard of the TPM as being a self-contained module plugged into the motherboard
<klysm> except the arm security processor locks out all modifiable code?
<geist> it says it stores the secrets in some special storage. dunno if that's flash on chip or just something in EFI storage that's encrypted out the wazoo with a private key
<clever> but i can see how modern x86 "backdoor" stuff can emulate a TPM securely
<geist> clever: yah the BIOS setting here on this machine is literally 'use the emulated TPM or use the plugged in one'
gog has quit [Ping timeout: 265 seconds]
<geist> curiously there's no 'no TPM'
<geist> though it says basically if you tell it to use the plugged in one and there isn't one present, its the same thing
<doug16k> mine has no tpm
<geist> and that was the default
<doug16k> ...setting
<geist> interesting
<geist> is it hard enabled all the time?
<geist> maybe if you didn't have a socket on your mobo they dont bother giving you the option
<doug16k> how can I check?
<doug16k> to make sure I don't have a tpm
<geist> do you have windows?
<doug16k> linux
<geist> i guess the presence of a /dev/tpm0 and/or some ACPI table TPM2?
flx has joined #osdev
<doug16k> ls: cannot access '/dev/tpm*': No such file or directory
<doug16k> it might be in "amd cbs" setting
<klysm> geist, zgrep TPM /proc/config.gz
<geist> yah mine is fairly high level in the list, but it's possibly under that
<geist> also dunno if you've updated your bios, etc
<geist> mine is very new, updated it last week, because of the x570 stability fix that's supposedly in
<doug16k> microcode: CPU31: patch_level=0x08701021
<doug16k> zen2 btw
<geist> yah we have the same 3950x
<geist> oh side note, FWIW 5950xes appear to finally be available for more than 30 minutes at a time
<doug16k> I have my 3950x in my B350-plus and my 2700x in my x470 pro lol
<doug16k> apparently it's lackwards band
<kazinsal> I think once 5950Xes show up in reality up here in the no longer frozen and actually rapidly melting wastes north of the 49th I'll probably pick up a gently used 3900X or similar to replace my 2xE5-2620v0 machine
<doug16k> oh and my really good cooler is on my cool running 2700x and my tired cooler is on my 3950x
<kazinsal> my desktop is an 8700K and frankly anything better just isn't going to exist in this country for another half a decade
<doug16k> is there any DDR4-3200 unbuffered ECC memory available?
<doug16k> looking forward to ddr5
<doug16k> if they don't call it qdr5 I am going to have a fit
<doug16k> it's not double data rate!
<geist> kazinsal: well they're starting to show up on newegg and B&H at least
<geist> at MSRP basically
<geist> dunno if that means it's avail in .ca
<geist> interesting the wikipedia article on TPM says that hypervisor based TPMs are a legit thing
<kazinsal> yeah, we generally have a 10-15% markup on stuff before conversion
<kazinsal> it's pretty bad
<geist> so my guess is thats how win 11 will work in a VM. the hypervisor will just provide it
<geist> it's just not incredibly secure
<kazinsal> it looks like other 5x00 series are in stock though
<kazinsal> 5900X at $719 CAD
<kazinsal> ~585 USD
<kazinsal> 5950X is in stock, wow
<moon-child> meh, zen 3 is cool but my 3960x serves me very well
<kazinsal> showing as a "regular price" of 1149 CAD but a "sale" price of 1019 CAD
<doug16k> ya, the top zen2 stuff is close enough to zen3
<geist> yah i had some plan to upgrade my 3950x to a 5950x maybe, but now that it's here i just cant say ive been sad about the 3950x
<kazinsal> so yeah, wow, I'm wrong, that's basically MSRP
<geist> like i'm sure it's 20% faster or so, but not worth the $800
<geist> the main advantage would be i'd roll it down to my server and then get more cores there
<kazinsal> GPUs are unobtanium still of course
<doug16k> I hardly care about the singlethread increase. my workload is mostly embarrassingly parallel
<geist> but... also haven't been really stressing out over that either
<doug16k> if I cared more about the singlethread, I'd be all over zen3
<kazinsal> there's a single RTX 3070 in stock at memoryexpress but you have to buy it in a whole system
<kazinsal> annoyingly it's listed in the video cards section and not the prebuilts section because you're actually just buying all the components for a 5800X + RTX 3070 build
<geist> yah though be careful there. i've heard that some of the OEM geforces are seriously nerfed
<geist> like yeah it's technically a RTX XXXX but it's underclocked, etc
<doug16k> maybe worst bin?
<doug16k> defaults don't even work
Yukara is now known as meisaka
<kazinsal> yeah, I'm just going to wait for the 4000s to come out and the shitcoin market to collapse further so 3080s become available on the used market
<kazinsal> my 1080 Ti does 99% of what I want it to and that last 1% is just silly RTX AI demos
martm193 has joined #osdev
<geist> oh totally. 1080 tis are like gold now
<geist> it's a super trooper
<kazinsal> yeah, the most stressful thing I'll be using it for is the next battlefield game
<kazinsal> and DICE has been historically great at making those run really well on slightly older machines
<graphitemaster> If you don't mind, you can pay a scalper :P
<graphitemaster> I mean yeah it'll be over-priced, but since it's impossible to get one otherwise.
<moon-child> https://xkcd.com/606/ an alternate strategy
<bslsk05> ​xkcd - Cutting Edge
<graphitemaster> If no one bought high end gaming gpus or cpus when they came out we'd be worse off though
<graphitemaster> Like whales are important for you to even participate in a five year lag
<graphitemaster> They're subsidizing the cost for you in a way
<doug16k> they are extremely high volume though, they are totally gouging for video card prices right now, even before the shortage
<doug16k> my 2060 super was a ripoff, before the shortage
<martm193> So this bus vulnerability was very long talked about by me, i was very honest about it, it's an inherent arch of pipelining thing either partial or full, or even no pipelining, on immediates used to load instruction cache no units can not be behind mtrr immediates i.e i-cache protection latches, and not behind mpu protection either from dma firmware, cause this is hw issue without needing any os sides, slow pipeline mode and not the best performing and
<martm193> lowest power and electron accurate in their phases with no powerline given due to no load on the wires -- this voids any license and patents for companies, cause all the software is ust like that reverse engineered, with the help of hw bugs. Ther will be never a way to block that locally, not even the on the secure pipeline mode, it's common sense guys, tht is why google gives their code away by default also, this zero day tmining vuln. research from
<martm193> zero-labs on google is just misleading trolling and they know it as well as every real computer engineer like doug16k .
gog has joined #osdev
<martm193> the company itself is a big success though, and the best they can do is might be in place, never tried to get into google servers with exploits, there is away only to not to let this happen same thing with satellites, if code is electron perfect and out of phase in that hw, tcp will either always time out, or the phase is out and you are cut out of service with your exlpoit, i.e it never executes.
martm193 was kicked from #osdev by geist [martm193]
<Mutabah> I'm too nice :(
<Mutabah> Every time he returns I think "maybe this is interesting to someone, and he's not being insulting"
<geist> yeah he's sending pages of random at me now in privmsg
<kazinsal> unsurprising
<Mutabah> Props to him for finding the new channel
gog has quit [Quit: byee]
<clever> could have probably hid better if we went to #os-dev, lol
<clever> make it a little less obvious
<Mutabah> Nah... we (well.. I) try to be open to anyone looking for help
<Mutabah> the above is why we have mods
<klange> clever: Don't want to hide over one consistent troll :)
<clever> klange: could still update the wiki, and hope the troll doesnt look?
<clever> but yeah
<klange> I was going to wait for an actually-offensive comment, new network and all...
<Mutabah> They found us on a new network, despite the "old" channel still existing - they looked at the wiki
<kazinsal> it would be easier and more effective to stage a coup in their home country with the intent of introducing a universal mental healthcare system than it would be to attempt to hide an IRC channel from one obsessive poster
<clever> kazinsal: lol
aquijoule_ has joined #osdev
richbridger has quit [Ping timeout: 268 seconds]
<doug16k> I think people could find it if it were named #0a18a189-760d-4e5a-a297-6c9970419cc7
<klysm> which reminds me I've been writing an irc bot "obot" for my irc server at call.cbu.net which invites users to uuid-based channels and keeps local comments in a database for static content.
sts-q has quit [Ping timeout: 265 seconds]
<geist> yah i thought about not kicking him, for about 3 seconds
<geist> but figured, look we know where this is going
<kazinsal> this story only ends one way
ElectronApps has joined #osdev
<geist> spent a part of the day going through the rust book and getting it set up with visual studio code
<geist> actually a pretty nice experience
sts-q has joined #osdev
<Mutabah> Yeah, I was originally vim only for my rust programming (doing legacy osdev)
<Mutabah> but started using vscode+RA for some work-adjacent stuff, and wow
<geist> yah i'm not fully sold if using a gigantic external project that it may or may not be able to index
<geist> but... it's not a terrible experience. and with the vim style key bindings it's pretty easy to switch between
<bslsk05> ​gist.github.com: rubberduck.c · GitHub
<doug16k> info tlb elision (just 4KB pages so far)
<doug16k> contiguous pages with the same flags just say ...
<doug16k> once
<doug16k> anyone care to codereview that psycho bit of state machine I made to decide whether to print each line? :P
<doug16k> I suppose last_pte shouldn't have present bit set at start
<doug16k> so nonsensical -1 pte won't mislead it
<doug16k> gotta add check for reserved bit set in here too
<doug16k> it doesn't really care right now
<doug16k> it can only omit 510/512 though, due to it showing transition to next pd entry
<doug16k> 99.6% removal
<doug16k> 99.8% on 32 bit paging
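The elision idea can be sketched as follows. This is not the state machine from the gist, only a simplified version under the assumption that a "run" is a sequence of 4KB entries where each one equals the previous entry plus 4096 (same flag bits, next physical page). The first and last entry of each run are printed and the interior collapses to a single `...`, which gives 510 of 512 entries omitted (99.6%) for a full 64-bit page table and 1022 of 1024 (99.8%) for a 32-bit one. The names `continues_run` and `dump_ptes` are made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096u

/* True if cur maps the physical page right after prev with identical flag bits. */
static int continues_run(uint64_t prev, uint64_t cur)
{
    return cur == prev + PAGE_SIZE;
}

/* Print n page table entries, eliding run interiors. Returns lines printed. */
static size_t dump_ptes(const uint64_t *pte, size_t n, FILE *out)
{
    size_t lines = 0;
    int elided = 0;   /* already printed "..." for the current run? */

    for (size_t i = 0; i < n; i++) {
        int after_prev  = i > 0 && continues_run(pte[i - 1], pte[i]);
        int before_next = i + 1 < n && continues_run(pte[i], pte[i + 1]);

        if (after_prev && before_next) {
            /* Interior of a run: say "..." once, then stay quiet. */
            if (!elided) {
                fputs("...\n", out);
                lines++;
                elided = 1;
            }
            continue;
        }

        /* Start of a run, end of a run, or an isolated entry. */
        elided = 0;
        fprintf(out, "%4zu: 0x%016llx\n", i, (unsigned long long)pte[i]);
        lines++;
    }
    return lines;
}
```

Keeping a count of skipped entries next to the `elided` flag would be a small extension if the output should say how many pages each `...` stands for.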
<bslsk05> ​gist.github.com: rubberduck.c · GitHub
<bslsk05> ​gist.github.com: rubberduck.c · GitHub
<doug16k> screwy UB MTRR overlapping large page = nonsense
<doug16k> makes that whole GB UC
<doug16k> if TLB miss that filled it was inside MTRR. otherwise, it falsely makes whole GB WB, in conflict with MTRR
<doug16k> intel doesn't guarantee that to work
<doug16k> AMD doesn't even say it works below 2MB paddr
<doug16k> I thought it would print a UB message for that
<doug16k> must have missed that case
<doug16k> would be awesome if it said which MTRR eh? :D
<doug16k> and show its base and size
<doug16k> maybe for UB scolding output I'll show the exact MTRR base/range vs page base/range
<doug16k> line 104 is an example of one that isn't elided because flags changed
<doug16k> do you think I should keep count and print how many pages were elided, instead of just ...?
srjek_ has quit [Ping timeout: 250 seconds]
drewlander has quit [Quit: ZNC 1.7.2+deb3 - https://znc.in]
drewlander has joined #osdev
<bslsk05> ​gist.github.com: gist:3078354a8b0ceb1df0dc4c4ca7820747 · GitHub
<klange> nice
<doug16k> found a memory corruption in my bootloader I think
<doug16k> whenever I have problems with my kernel, I should work on qemu until qemu tells me what's wrong with my kernel, instead of debugging my kernel :P
<doug16k> basically Q's algorithm. if you have to lift something heavy, just change the gravitational constant of the universe!
<doug16k> you can modify the reality your kernel runs within to make it find your bugs
Izem has joined #osdev
<doug16k> fixed weird redundant entry in eliding-at-end-of-page boundary condition
Izem has left #osdev [#osdev]
fconti has joined #osdev
tenshi has joined #osdev
Mooncairn has quit [Ping timeout: 252 seconds]
Matt|home has joined #osdev
sortie has joined #osdev
flx has quit [Ping timeout: 268 seconds]
sortie has quit [Ping timeout: 268 seconds]
dennis95 has joined #osdev
<doug16k> how can it be truncated? https://www.godbolt.org/z/65qPMGeWc
<doug16k> fixed some nonsense, didn't help https://www.godbolt.org/z/3Ed85G9M8
<doug16k> it's not possible to be truncated with that code, right?
<moon-child> doug16k: it does the same thing if you say restore_path_len+imgname_len+100, I think it's just ignoring that part
<doug16k> why would it ignore it?
<doug16k> the new warnings are garbage
<doug16k> it warns about complete impossibilities
<j`ey> doug16k: fixed https://www.godbolt.org/z/sejr86nrb :P
<doug16k> warning, this never ever happens
<moon-child> I mean, I kinda feel like if you're using snprintf you're signing yourself up for truncated messages when the inputs are excessively long
<doug16k> more like if (ret < 0) __builtin_unreachable();
<moon-child> so even if it _did_ truncate they shouldn't warn you about it
<doug16k> no I didn't
<doug16k> look at the code
<doug16k> strlen strlen compute size, only if it fits, then snprintf
<doug16k> if the compiler can't see that, then it can't do that warning
<moon-child> doug16k: I'm saying that completely aside from the fact that the warning is wrong in this case, the whole idea that you should warn about this kind of situation is wrong
<j`ey> moon-child: it's a warning that you dont check the return value, if you check the return value the warning goes away
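The pattern under discussion looks roughly like the sketch below: compute the needed size from two `strlen` calls, bail out if it does not fit, then call `snprintf`. The function and parameter names are hypothetical (the code behind the godbolt links is not reproduced here). Checking the return value is the workaround j`ey points out; whether the warning fires at all depends on compiler version and optimization level.

```c
#include <stdio.h>
#include <string.h>

/* Join dir and name into buf as "dir/name".
 * Returns 0 on success, -1 if the result would not fit. */
static int join_path(char *buf, size_t bufsz, const char *dir, const char *name)
{
    /* One byte for the '/' and one for the terminating NUL. */
    size_t need = strlen(dir) + 1 + strlen(name) + 1;
    if (need > bufsz)
        return -1;

    int ret = snprintf(buf, bufsz, "%s/%s", dir, name);

    /* Cannot happen after the size check above, but testing the result
     * is what tells the compiler truncation has been considered. */
    if (ret < 0 || (size_t)ret >= bufsz)
        return -1;
    return 0;
}
```

Returning an error here is the cautious choice; replacing the second check with `__builtin_unreachable()` trades that safety net for a promise to the optimizer.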
<doug16k> stupid warning
<moon-child> j`ey: I understand. I think that's a dumb warning is all
SGautam has joined #osdev
<doug16k> it is pointless to check that return value
<moon-child> 99% of the time I don't care if snprintf truncated
<doug16k> it is 100% guaranteed to succeed
<doug16k> I am annoyed about these broken warnings, while I am also glad to know a workaround
<doug16k> buffer overrun static analysis must not diagnose code that can't overrun
<j`ey> if (ret < 0) __builtin_unreachable();, is kinda nice to keep anyway, in case you messed up and put an off by 1 error in the if condition :P
<doug16k> if it does, then it needs to hear the boy who cried wolf
<doug16k> j`ey, yeah, ubsan would make it an assert
<doug16k> ubsan traps reached unreachable
<bslsk05> ​github.com: dgos/ubsan.cc at master · doug65536/dgos · GitHub
<doug16k> just look at backtrace
<doug16k> ubsan calls that if you execute builtin_unreachable() line
<j`ey> huh
<j`ey> and if ubsan isnt enabled?
<j`ey> 'ud'?
<doug16k> then it is UB
<j`ey> ud2
<doug16k> na, it hardly ever emits that
<doug16k> it just lets it fall through
<j`ey> o
<doug16k> whatever is next, who cares
<doug16k> seriously, I checked
<moon-child> yeah, it assumes that any path that reaches there won't be taken and just strips it out of the graph
<doug16k> it implicitly believes you
<j`ey> oh right, i was thinking of rust's unreachable!(), which inserts a panic!()
<moon-child> I think sometimes it does ud2 instead of ret if you reach the end of a function
<doug16k> it could emit ud2, but I hardly ever see it
<j`ey> moon-child: someone here had no ret and no ud2
<doug16k> I think clang does it more
<j`ey> so it just executed the next function in the binary lol
<j`ey> doug16k: ah, maybe that's why ive seen it, since i used clang a decent amount
<doug16k> yeah or worse
<doug16k> it might fall into some else or something
<doug16k> from the if body
<doug16k> true block
<doug16k> it didn't need to jump over else block right? it's unreachable
<doug16k> and the register allocation might not be the same there
<doug16k> it reordered the blocks all funny because of unlikely or something
<doug16k> so it could go completely insane
<doug16k> but run a while
<j`ey> do whatever it wants!
<doug16k> it's pretty close to worst case scenario. you want to fail fast, not run all screwy for a while with mostly valid variables
<doug16k> it's bad when you luckily have a bunch of valid pointer values in the registers and execute some code that thinks they are something else
<j`ey> (which is why I like that rust's unreachable actually panics!)
<doug16k> it'll keep going and not stop until it is useless to look at
<doug16k> it does if you have ubsan on
<doug16k> rust probably has a form of ubsan on forever
<moon-child> if you want fail fast, then you don't wanna __builtin_unreachable in the first place, you want to assert or w/e
<j`ey> heh, kinda
<doug16k> I have the option to turn all my assert(e) into assume(e) which makes them if (!(e)) __builtin_unreachable();
<doug16k> you can make assert go to one extreme or the other if you want
<moon-child> that's basically NDEBUG
<doug16k> no
<doug16k> it makes the optimizer assume every assert can't possibly fail
z_is_stimky has quit [Read error: Connection reset by peer]
<doug16k> and gives it really aggressive clues
z_is_stimky has joined #osdev
dormito has quit [Ping timeout: 268 seconds]
<moon-child> yeah. But from semantics perspective
<doug16k> yeah you could make it NDEBUG driven
<doug16k> usually NDEBUG just means leave it not said whether the assert is true or not. if you assume the asserts, you tell the optimizer the asserts can't possibly fail, so use that to make sweeping assumptions in value analysis
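A small sketch of the assert-versus-assume switch, assuming GCC or Clang for `__builtin_unreachable()`. The macro names `CHECK` and `ASSERT_IS_ASSUME` are invented here and are not the ones used in dgos. With the switch off, a failed check aborts as usual; with it on, the condition becomes a promise to the optimizer and a violation is undefined behaviour, unless something like `-fsanitize=unreachable` is enabled to trap it.

```c
#include <assert.h>

/* Define ASSERT_IS_ASSUME to turn every check into an optimizer hint. */
#ifdef ASSERT_IS_ASSUME
#define CHECK(e) do { if (!(e)) __builtin_unreachable(); } while (0)
#else
#define CHECK(e) assert(e)
#endif

/* Shift right by a caller-supplied amount; shifts of 32 or more are invalid
 * on the 32-bit unsigned type this sketch assumes. */
static unsigned shift_right(unsigned x, unsigned shift)
{
    CHECK(shift < 32);   /* assert build: abort; assume build: "cannot happen" */
    return x >> shift;
}
```

This differs from plain `NDEBUG`, which simply removes the check and tells the optimizer nothing about the condition.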
<moon-child> hahha, cute
<j`ey> lol
GeDaMo has joined #osdev
SGautam has quit [Ping timeout: 265 seconds]
isaacwoods has joined #osdev
sortie has joined #osdev
<doug16k> ubsan verifies them: https://www.godbolt.org/z/nbMW4Pz95
<j`ey> same link
<j`ey> that's better
<kingoffrance> thats pretty sweet, now just to embed a compiler and jit on the fly lol
<moon-child> that was synthesis os
<kingoffrance> yep, thats my understanding
<doug16k> what are you going to jit that you couldn't aot
<doug16k> why not just aot compile the bytecode for this cpu
<doug16k> aot = ahead of time
<doug16k> gcc is aot
<moon-child> https://wiki.c2.com/?SynthesisOs has some examples
<bslsk05> ​wiki.c2.com <no title>
<kingoffrance> i didnt say its a good idea :) any function where you know arg values and local variable values (or can eliminate some things, even if you dont exactly know) and can make an "optimized" variant; for many things and modern cpus i think it will be a loss, although i think more complicated code even tiny gains can add up, e.g. a( b( c() ); d(); ) what moon-child said too (havent seen it, presumably shows some real example)
<kingoffrance> i see it as a middle thing
<kingoffrance> some code that is somewhat complicated, calls other functions, but you need to be able to compile quick to make it worthwhile
<kingoffrance> i mean, i guess multicores, you could dedicate some to background "optimize"
<bslsk05> ​en.wikipedia.org: Partial evaluation - Wikipedia
<doug16k> I found an old jit thing I made for qsort callback long ago, lol
<moon-child> yea cache effects turn the scales slightly further against jit these days, but only slightly
<doug16k> it generated a thunk that gave you the extra parameter
<moon-child> doug16k: have definitely done that
SGautam has joined #osdev
dormito has joined #osdev
<kingoffrance> im not sure the specific differences jit versus aot; wikipedia says one context aot is opposite of jit, but jit page JIT compilation is a combination of the two traditional approaches to translation to machine code (ahead-of-time compilation (AOT), and interpretation) and combines some advantages and drawbacks of both
<moon-child> it's a somewhat poorly-defined term
Arthuria has joined #osdev
<kingoffrance> i almost see it as time/space tradeoff. in sense of, all these variant functions floating around, might be shorter, but more of them versus one "generic" one
<kingoffrance> *each individual might be shorter
<kingoffrance> if you say run a program 20 times
<kingoffrance> 20 of the same proc running at same time, each with different variants for whatever functions
<kingoffrance> i mean, nothing is free
<kingoffrance> *same program, different processes
Arthuria has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
X-Scale has quit [Ping timeout: 268 seconds]
<kingoffrance> maybe what you want is something like: -Oquick do any optimization that doesnt take too long to compile
<doug16k> -O is "quick"
<doug16k> without a number makes a big difference
<doug16k> -O basically does exactly what you said with not bad codegen
<kingoffrance> yeah, just embed it now :)
<kingoffrance> i do agree that does seem kind of how gcc -O is
<doug16k> if you said for (int i = 0; i < 20; ++i) it will precisely set the thing to 0, and check each time if it is less than 20, and increment the thing, but it will try to do that well
<doug16k> it won't transform anything
<doug16k> it varies across targets though
<doug16k> on x86 -O is extremely obedient
<doug16k> on riscv it says ya right and forces a call to something
<kingoffrance> the obvious of course, more "space" because you have to keep "source" around too
<doug16k> (where x86 would have just copied each int like you said)
<doug16k> there is a gcc jitter you know
<doug16k> I'm sure you heard of it
<bslsk05> ​gcc.gnu.org: JIT - GCC Wiki
<doug16k> it's more like a "don't care what time compiler"
<doug16k> best of both worlds. can precompile for instant cold start, or jit for smallness
Arthuria has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
ElectronApps has quit [Ping timeout: 265 seconds]
ElectronApps has joined #osdev
<bslsk05> ​gcc.gnu.org: Tutorial part 1: “Hello world” — libgccjit 12.0.0 (experimental ) documentation
Arthuria has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
<doug16k> because very little of the code actually gets called, just jitting each thing as it is first used is way smaller
<doug16k> lots of the code runs rarely and briefly
<doug16k> usually I mean
<doug16k> that jit api looks great though
<doug16k> just have to transpile to that and it compiles
Arthuria has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
SGautam has quit [Ping timeout: 252 seconds]
SGautam has joined #osdev
<kingoffrance> no i hadnt heard of it, i dont follow things, but yes that does look like youd just need to transpile basically
gog has joined #osdev
<kingoffrance> i wonder what magic goes on so that you can call greet(). isnt a modern os not gonna let you just put some machine code at some address and execute it?
<kingoffrance> anyhow, at least from c pov, looks transparent. greet is just a normal function pointer
<GeDaMo> mprotect
<kingoffrance> yeah i figured, i just dont know if distros lock that down for a normal user and only allow it for certain programs etc.
<sham1> mprotect? It does work for normal users
<sham1> A lot of JITs use that to change between writing to the "machine code buffer" and making it executable again because W^X
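The write-then-flip pattern sham1 describes can be sketched in C as below. This is a minimal illustration, not how any particular JIT does it: it assumes Linux (or another POSIX system with mmap/mprotect) on x86-64, the helper name jit_return_42 is made up for this example, and on other architectures the sketch skips the JIT step entirely. Hardened configurations may still refuse the mprotect call, in which case the helper returns -1.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: emit a tiny function that returns 42, then call it.
 * Returns 42 on success, -1 if mmap or mprotect was refused. */
int jit_return_42(void)
{
#if defined(__x86_64__)
    /* x86-64 machine code for: mov eax, 42 ; ret */
    static const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    assert(sizeof code <= len);

    /* Phase 1: the buffer is writable but not executable. */
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    memcpy(buf, code, sizeof code);

    /* Phase 2: flip to executable but not writable (W^X). */
    if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, len);
        return -1;
    }

    /* Turn the data pointer into a function pointer without a direct cast. */
    int (*fn)(void);
    void *p = buf;
    memcpy(&fn, &p, sizeof fn);

    int result = fn();
    munmap(buf, len);
    return result;
#else
    /* This sketch has no code generator for other architectures. */
    return 42;
#endif
}
```

From the caller's side the generated code is just a normal function pointer, which matches what kingoffrance observed about greet().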
SGautam has quit [Ping timeout: 268 seconds]
Arthuria has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
<bslsk05> ​'The Dirty Way Manufacturers are Downgrading Your PC' by Linus Tech Tips (00:16:53)
<clever> doug16k: basically, its the ram density, number of chips changing, but total storage capacity the same
ElectronApps has quit [Read error: Connection reset by peer]
ElectronApps has joined #osdev
ahalaney has joined #osdev
ElectronApps has quit [Read error: Connection reset by peer]
ElectronApps has joined #osdev
nyah has joined #osdev
Arthuria has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
CryptoDavid has joined #osdev
tenshi has quit [Ping timeout: 268 seconds]
tenshi has joined #osdev
janemba has quit [Ping timeout: 258 seconds]
Mooncairn has joined #osdev
ElectronApps has quit [Remote host closed the connection]
ElectronApps has joined #osdev
xenos1984 has quit [Ping timeout: 250 seconds]
vai has quit [Ping timeout: 258 seconds]
xenos1984 has joined #osdev
ElectronApps has quit [Ping timeout: 258 seconds]
ElectronApps has joined #osdev
janemba has joined #osdev
ElectronApps has quit [Read error: Connection reset by peer]
mniip has quit [Quit: This page is intentionally left blank.]
Arthuria has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
Arthuria has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
Arthuria has quit [Read error: Connection reset by peer]
mniip has joined #osdev
BadQuanta has joined #osdev
Arthuria has joined #osdev
vdamewood has joined #osdev
Arthuria has quit [Read error: Connection reset by peer]
mahmutov has joined #osdev
Arthuria has joined #osdev
Arthuria has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
mahmutov has quit [Ping timeout: 258 seconds]
flx has joined #osdev
srjek_ has joined #osdev
Arthuria has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
Arthuria has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
Arthuria has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
Arthuria has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
dennis95 has quit [Remote host closed the connection]
Arthuria has quit [Read error: Connection reset by peer]
dennis95 has joined #osdev
dennis95 has quit [Remote host closed the connection]
dennis95 has joined #osdev
dennis95 has quit [Remote host closed the connection]
dennis95 has joined #osdev
Arthuria has joined #osdev
srjek_ has quit [Ping timeout: 250 seconds]
Arthuria has quit [Read error: Connection reset by peer]
archenoth has quit [Remote host closed the connection]
Arthuria has joined #osdev
archenoth has joined #osdev
asymptotically has joined #osdev
Arthuria has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
xenos1984 has quit [Remote host closed the connection]
xenos1984 has joined #osdev
Arthuria has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
dennis95 has quit [Read error: Connection reset by peer]
Arthuria has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
tacco has joined #osdev
mahmutov has joined #osdev
dh` has quit [Remote host closed the connection]
dh` has joined #osdev
mctpyt has quit [Ping timeout: 250 seconds]
mctpyt has joined #osdev
Arthuria has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
MiningMarsh has quit [Quit: ZNC 1.8.2 - https://znc.in]
MiningMarsh has joined #osdev
zoey has joined #osdev
Arthuria has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
<doug16k> clever, linus never heard of dram banks before
<doug16k> he should go look up what an xor gate does
<doug16k> he's too clueless to see that AMD gets more performance out of additional banks than intel does
Arthuria has quit [Read error: Connection reset by peer]
<doug16k> that was noticed the day zen3 was released
Arthuria has joined #osdev
<clever> doug16k: in the video, he swapped the ram between 2 laptops, and the benchmark results almost entirely swapped
<doug16k> yes, last year this was news
<doug16k> recycled
<doug16k> he has taken zen3 taking better advantage of bank concurrency, and turned it into a pile of bullshit that implies that the intel is better somehow
<doug16k> he pisses me off. he doesn't even know how computers work
<doug16k> he thinks laptops are not the same as desktops
<doug16k> somehow dram performance is magically not the same because it is a laptop
<clever> i also couldnt make full sense out of which ram was better for which cpu, from that vid
<clever> i think the main point he was making though, was that OEMs were selling hw with the "worse" ram, and not listing the ram specs that matter
<doug16k> the reason that switching the ram helped amd is, he put the ram with more banks in the amd, and ram with fewer banks in the intel, and amd suddenly sped up because AMD is capable of more concurrency
<doug16k> bank concurrency
<doug16k> the intel doesn't benefit from the extra banks
<clever> yeah, i can see how that would help, so the AMD chip is better designed, but the OEM paired it up with the "wrong" ram, and didnt give that in the spec sheets
<doug16k> what about the zen3 supporting 3200 memory. what's the max non-OC clock on that intel?
<doug16k> not 3200 right?
<doug16k> the amd memory controller runs circles around that intel one
<doug16k> not even fair. intel using older process
<doug16k> just compare intel igpu against amd igpu. guess which one hammers the memory controller drastically harder and succeeds?
<clever> changing topics slightly...
<clever> i can see how a concurrent capable dram controller, might help in the rpi, given that it almost has 3 brains fighting over the ram
mctpyt has quit [Ping timeout: 252 seconds]
<clever> but from what ive seen of the design, it only has a single bus, so it can only do one transfer at a time
<doug16k> bank concurrency is expected from all memory controllers
<doug16k> even EE student homework memory controllers do it :D
<clever> i think i'm mixing up 2 different concepts
<doug16k> zen3 just took it to an extreme
<clever> 1: accessing one bank while another is doing open/refresh/commit
<clever> 2: accessing 2 entirely separate chips, on separate busses, in parallel
<clever> i dont think the pi is capable of 2
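Concept 1 above (keeping several banks busy at once) depends on how physical addresses are spread across banks. The sketch below is one possible reading of doug16k's xor remark, not a description of any real controller: the bit widths, the function names bank_plain and bank_xor, and the choice of which row bits to fold in are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry: 10 column bits, 3 bank bits (8 banks), rows above that. */
enum { COL_BITS = 10, BANK_BITS = 3 };
#define BANK_MASK ((1u << BANK_BITS) - 1u)

static_assert(COL_BITS + 2 * BANK_BITS <= 32, "fields must fit in a 32-bit address");

/* Plain mapping: the bank index is taken directly from the address bits
 * just above the column bits. */
unsigned bank_plain(uint32_t addr)
{
    return (addr >> COL_BITS) & BANK_MASK;
}

/* XOR mapping: the low row bits are folded into the bank index, so
 * addresses that differ only in the row land in different banks. */
unsigned bank_xor(uint32_t addr)
{
    unsigned bank = (addr >> COL_BITS) & BANK_MASK;
    unsigned row  = (addr >> (COL_BITS + BANK_BITS)) & BANK_MASK;
    return bank ^ row;
}
```

With the plain mapping, a stride of 1 << (COL_BITS + BANK_BITS) bytes hits the same bank every time, so each access waits on the previous row being closed. With the XOR mapping the same stride rotates through the banks, which is what lets a controller overlap the open and precharge work.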
<doug16k> it can simultaneously access I/O and memory right?
<doug16k> like a pc
<clever> internal IO, yep
<doug16k> axi is separate concurrent interface isn't it?
<clever> i believe axi is a shared bus (with fifo's at entry/exit), to connect every master to every slave
<clever> so the axi bus routes a request to either the ram controller, or the mmio, based on the addr
<clever> and its packet based, so while a read is in progress, axi can send another read-request to a diff slave
<clever> the fifo's on both ends, let you shove a read request into the master port, without having to wait for your turn
<clever> and lets axi transfer it over to the slave port, without having to wait for the slave to be ready
<clever> whenever the slave is ready, it will pop a request off its fifo, act on it, and push a reply on a fifo facing the other way
<clever> doug16k: in theory, the ram controller can have its own internal fifo, where it will move requests, and then process them out of order, and concurrently
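The fifo-on-both-ends behaviour clever describes can be modelled roughly as follows. This is a toy software model under stated assumptions, not the AXI protocol itself: the struct names, the fixed depth of 4, and the idea that a request is just a word index into a small memory are all invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 4  /* hypothetical depth */

struct fifo {
    uint32_t slot[FIFO_DEPTH];
    unsigned head;   /* index of the oldest entry */
    unsigned count;  /* number of queued entries */
};

/* Queue a value; returns false if the fifo is full. */
bool fifo_push(struct fifo *f, uint32_t v)
{
    if (f->count == FIFO_DEPTH)
        return false;
    f->slot[(f->head + f->count) % FIFO_DEPTH] = v;
    f->count++;
    return true;
}

/* Take the oldest value; returns false if the fifo is empty. */
bool fifo_pop(struct fifo *f, uint32_t *out)
{
    if (f->count == 0)
        return false;
    *out = f->slot[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}

/* A slave port: requests flow in, replies flow back the other way. */
struct slave_port {
    struct fifo requests;
    struct fifo replies;
    const uint32_t *mem;  /* backing storage the slave reads from */
    unsigned words;       /* number of words in mem */
};

/* One "slave is ready" step: pop a request, act on it, push the reply.
 * Returns false if there was nothing to do or no room for the reply. */
bool slave_step(struct slave_port *s)
{
    uint32_t addr;
    if (s->replies.count == FIFO_DEPTH)
        return false;
    if (!fifo_pop(&s->requests, &addr))
        return false;
    assert(addr < s->words);
    return fifo_push(&s->replies, s->mem[addr]);
}
```

The master side only ever calls fifo_push on the request queue, so it never blocks on the slave being ready, and the slave drains the queue at its own pace. A controller that reorders requests would replace the strict oldest-first pop with a scan over the queued entries.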
Arthuria has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
kwilczynski has joined #osdev
<bslsk05> ​github.com: rpi-open-firmware/sdram.c at master · librerpi/rpi-open-firmware · GitHub
tenshi has quit [Quit: WeeChat 3.2]
<clever> doug16k: when i was analyzing the existing source, i had found that the 1gig model of pi, has the same density of ram as the 512mb model, but half the bus width
<clever> what i think is happening there, is that they just shoved a pair of 512mb chips into a single package, and wired them up in parallel, to act as a single bigger chip, each taking up half the data bus
fkrauthan has quit [Quit: ZNC - https://znc.in]
fkrauthan has joined #osdev
mahmutov has quit [Ping timeout: 268 seconds]
mahmutov has joined #osdev
mctpyt has joined #osdev
fconti has quit [Quit: Leaving]
immibis has joined #osdev
<moon-child> doug16k: gccjit is garbage
<moon-child> ditto llvmjit
<moon-child> you don't want to actually use something like that for runtime compilation
<moon-child> webkit started out using llvm, and then they realised that was a horrible idea
<moon-child> llvm/gcc are set up for batch compilation. They're never going to be fast enough, and they're not set up to take advantage of things you can only do when jitting
GeDaMo has quit [Quit: Leaving.]
asymptotically has quit [Quit: Leaving]
Arthuria has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
aerona has joined #osdev
dormito has quit [Ping timeout: 268 seconds]
Arthuria has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
<Bitweasil> ARMv7 starts in supervisor mode, secure, right?
<Bitweasil> Ok, yeah, found the reference.
<clever> Bitweasil: i believe arm always starts in the most powerful mode the cpu supports
<clever> so if the core has supervisor, it will start in supervisor
<clever> once you're in kernel mode, there is basically no way to know that the core ever has supervisor support, unless an active supervisor reveals itself
<clever> same for 64bit support
srjek_ has joined #osdev
dormito has joined #osdev
Arthuria has quit [Read error: Connection reset by peer]
BadQuanta has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
zoey has quit [Quit: Leaving]
aerona has quit [Quit: Leaving]
<geist> clever: armv7 can't not support supervisor mode. it's a different model than the EL stuff in armv8
<geist> yes, it does
sortie has quit [Quit: Leaving]
Mooncairn has quit [Quit: Quitting]
xenos1984 has quit [Remote host closed the connection]
xenos1984 has joined #osdev
<Bitweasil> er.
<Bitweasil> Supervisor is EL1.
<Bitweasil> Are you thinking HYP/Monitor?
<Bitweasil> It appears to start in secure EL1, not monitor mode.
<Bitweasil> Though it's trivial to drop down into monitor mode, looking at the Pi armstubs.
<clever> Bitweasil: my memory says EL3 is supervisor? EL2 is hypervisor, EL1 is kernel, and EL0 is userland
<Bitweasil> EL3 is Monitor.
<clever> ah, getting those 2 mixed up
<Bitweasil> Supervisor and System are both EL1 modes.
<clever> i also had to implement dropping down a level in LK
<j`ey> clever: supervisor ~= kernel
<clever> j`ey: ahh
<j`ey> supervises the users, hypervisor supervises the supervisors :P
<bslsk05> ​github.com: [arch][arm] fix booting when in HYP mode · littlekernel/lk@71687b4 · GitHub
<clever> Bitweasil: when LK was first ported to the rpi, the firmware left HYP mode for you, and everything just worked
<clever> Bitweasil: but then somebody wanted HYP on linux, and the firmware was modified to launch linux in HYP mode, which entirely broke LK
<clever> LK would set the MMU enable flag for supervisor/kernel mode, and being in HYP mode, that had zero effect
<Bitweasil> *nods*
<clever> then it would jump to the virt addr, and *fault*
<Bitweasil> yeah, this all sounds familiar.
<Bitweasil> I'm on the other end of it, but... yes.
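The failure clever describes comes down to which mode the core is in when the kernel starts. On ARMv7 the current mode is the low five bits of the CPSR, so the check can be sketched in C as below. The mode encodings are the architectural ones; the function names and the idea of passing the CPSR value in as a plain integer are assumptions for this example (real boot code would read it with an MRS instruction in assembly).

```c
#include <assert.h>
#include <stdint.h>

#define CPSR_MODE_MASK 0x1Fu  /* CPSR.M[4:0] */

/* ARMv7-A processor mode encodings (a subset). */
enum arm_mode {
    ARM_MODE_USR = 0x10,
    ARM_MODE_SVC = 0x13,  /* supervisor: where a kernel normally runs */
    ARM_MODE_MON = 0x16,  /* secure monitor */
    ARM_MODE_HYP = 0x1A,  /* hypervisor */
    ARM_MODE_SYS = 0x1F
};

static_assert(ARM_MODE_HYP <= CPSR_MODE_MASK, "mode values must fit in M[4:0]");

/* Extract the mode field from a CPSR value. */
enum arm_mode cpsr_mode(uint32_t cpsr)
{
    return (enum arm_mode)(cpsr & CPSR_MODE_MASK);
}

/* Hypothetical boot-time decision: returns 1 if the entry code was
 * started in HYP mode and has to drop to SVC before it enables the
 * MMU through the supervisor-level control register. */
int must_leave_hyp(uint32_t cpsr)
{
    return cpsr_mode(cpsr) == ARM_MODE_HYP;
}
```

For example, a CPSR of 0x1DA decodes to HYP mode, while 0x1D3 decodes to SVC.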
ahalaney has quit [Remote host closed the connection]
<clever> Bitweasil: and i'm also running into similar problems on the open-firmware side, because the arm stubs are entirely omitted
Arthuria has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
<doug16k> moon-child, yeah, I wouldn't expect anything to do magical JVM-style recompile-reoptimize stuff
Arthuria has quit [Read error: Connection reset by peer]
Arthuria has joined #osdev
<moon-child> that stuff aside, you want fast and probably tracing, neither of which gccjit can do
<doug16k> you could probably generate code faster, but unlikely you could generate faster code
Arthuria has quit [Read error: Connection reset by peer]
<doug16k> I don't even know how it is possible for gcc to be so fast
Arthuria has joined #osdev
<doug16k> tens of milliseconds to compile most of my files
<doug16k> it's over 100 files per second when I compile qemu
<doug16k> way over
<doug16k> in my rom project, I am often checking the build window to see if it really built something. seemed like it did nothing. it did it
<doug16k> it's weird when clean build is 40ms
<clever> heh
<doug16k> ah, has grown since then. up to 100ms on -j32, but the compiles finish so instantly that it could only keep 4 cpus going
<doug16k> real: 100ms, user, 452ms
<doug16k> imagine how fast gcc would be if it didn't need a new process for each file?
<doug16k> funniest thing, that 100ms, half that is page faults, half of that half is malloc calls, and a tiny blip of it is compiling
<doug16k> probably 10ms of actual compiling in there
<moon-child> ehh I don't buy that
<doug16k> don't have to. see for yourself in perf
<moon-child> tcc is 10-20x faster than gcc. It can't be 90% overhead
<doug16k> gcc spends more time in page fault than anything else
<doug16k> 2nd place is malloc
<doug16k> in a make -j32
<moon-child> well--alright
<moon-child> but you don't have to do page faults nor malloc
<moon-child> if you don't build graphs, everything is flat, access patterns are predictable
<clever> doug16k: sqlite does a thing where it concats every .c into one big fat .c file
<clever> doug16k: you can also `gcc foo.c bar.c -o baz`
<doug16k> that defeats parallelism though
<moon-child> ^
<clever> yeah
<doug16k> LTO could extract some back out of it
<clever> ghc can get parallelism, while also recursively finding all modules on its own
<moon-child> I actually wish compilers could do that kinda parallelism, though
<clever> but it works in a very different way, with no .h files
<moon-child> you could cut down on i/o that way too, only read headers once
<clever> moon-child: there is the pre-compiled headers thing
<clever> where you turn the .h files into a binary form, that will parse more quickly
<clever> i stumbled upon it in my early days, when i blindly ran gcc on every source file
<moon-child> not the parse, you have to do separate parse anyway (unless you do clever caching), the i/o
<clever> comments would also be gone from the compiled headers, so it may need less IO
<moon-child> and actually, you do want to do clever caching, but you want that in the compiler, not the build system, because the compiler will be less likely to get it wrong. So another argument for doing it all in one process
<clever> yep, thats what ghc is doing
<clever> it has a hashmap for other modules
<moon-child> it's harder to do correctly for c, though. You have to check that all the same macros are pre-defined before loading your cached copy
<clever> the weird thing with haskell, is that there are no header files
<moon-child> except obviously some macros are going to be different, so you instead have to check that all the macros that were actually _used_ are the same
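The check moon-child describes (compare only the macros a header actually used) might look like this in C. It is a sketch under assumptions: the struct layout, the function names, and the convention that a NULL value means "not defined" are all invented here and do not describe how any real compiler caches headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One macro definition; value == NULL means the macro is not defined. */
struct macro {
    const char *name;
    const char *value;
};

/* Find a macro's current value in the environment, or NULL if undefined. */
static const char *macro_lookup(const struct macro *env, size_t n_env,
                                const char *name)
{
    for (size_t i = 0; i < n_env; i++)
        if (strcmp(env[i].name, name) == 0)
            return env[i].value;
    return NULL;
}

/* `used` records every macro the header tested or expanded when it was
 * first parsed, with the value seen at that time. The cached parse can be
 * reused only if each of those macros has the same state now. Macros the
 * header never looked at are allowed to differ. */
bool cache_entry_valid(const struct macro *used, size_t n_used,
                       const struct macro *env, size_t n_env)
{
    assert(used != NULL || n_used == 0);
    for (size_t i = 0; i < n_used; i++) {
        const char *now = macro_lookup(env, n_env, used[i].name);
        if ((now == NULL) != (used[i].value == NULL))
            return false;  /* defined then but not now, or the reverse */
        if (now != NULL && strcmp(now, used[i].value) != 0)
            return false;  /* defined both times, different expansion */
    }
    return true;
}
```

Recording "was undefined" matters as much as recording a value, since an #ifdef that took the false branch is also a use of the macro.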
<moon-child> clever: right, exactly. That kinda caching is way easier with proper modules
<clever> it will basically parse every .c file you're linking against, and extract the type information from it
<clever> but it also auto-generates .hi files, which contain that type info
<clever> in a binary form
<clever> moon-child: this is also where performance problems come in, the hashmap contains lazy objects, so the compiler only has to compute the value if it actually wants to read it
<clever> moon-child: the problem is 2 fold, first its losing that value and turning it back into lazy every time a module completes building
<clever> moon-child: second, a performance tunable called lazy blackholing, causes it to run that computation multiple times in parallel, wasting cpu time
<clever> so building with -j4, causes it to use 4*4 times as much cpu
Arthuria has quit [Read error: Connection reset by peer]
<clever> (rough estimate)
Arthuria has joined #osdev
<clever> it can be solved, now that its known, but its a tricky edge case
netbsduser``` has joined #osdev
netbsduser has quit [Remote host closed the connection]
Arthuria has quit [Ping timeout: 272 seconds]
<ZetItUp> https://wiki.osdev.org/PCI_IDE_Controller the ide_read_buffer bug, is there a fix for it?
<bslsk05> ​wiki.osdev.org: PCI IDE Controller - OSDev Wiki
buffet0 has joined #osdev
opios2 has quit [Ping timeout: 244 seconds]
buffet has quit [Read error: Connection reset by peer]
buffet0 is now known as buffet
Griwes has quit [Ping timeout: 244 seconds]
Griwes has joined #osdev
CryptoDavid has quit [Quit: Connection closed for inactivity]
mahmutov has quit [Ping timeout: 272 seconds]