klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
<mrvn> if you later store the flag in memory then "m" should optimize that. But like this I have no idea what clang is thinking.
<mrvn> does that even work right? The pushf modifies the stack pointer so the "-8(%rsp)" could be the wrong offset
<heat> yes, it works right, it's 100% defined
<heat> <nbjoerg> it just prefers memory for complicated histerical reasons
<heat> from #llvm
<heat> which explains absolutely nothing
<mrvn> so popq first increments the %rsp and then -8(%rsp) is computed?
<gog> pop from memory to memory
<mrvn> https://godbolt.org/z/Kos1nGG6c still doing the right thing?
<bslsk05> ​godbolt.org: Compiler Explorer
<heat> that doesn't work
<heat> you can't move the stack pointer and hope clang notices
<mrvn> You can if the compiler doesn't throw in SP relative memory references into your arguments
<mrvn> I don't think pushf with "m" works.
<heat> it does
<heat> believe me
gxt has quit [Ping timeout: 255 seconds]
<heat> * "=rm" is safe here, because "pop" adjusts the stack before
<heat> * it evaluates its effective address -- this is part of the
<heat> * documented behavior of the "pop" instruction.
<heat> */
<bslsk05> ​elixir.bootlin.com: irqflags.h - arch/x86/include/asm/irqflags.h - Linux source code (v6.0.9) - Bootlin
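A minimal sketch of the pattern under discussion, modeled on the Linux native_save_fl() linked above (GCC-style inline asm, x86_64 assumed):

    /* Save RFLAGS. With "=rm" the compiler may pick either a register
     * or a stack slot for %0; popping straight to memory is safe
     * because "pop" adjusts %rsp before computing the effective
     * address of its operand. */
    static inline unsigned long save_flags(void)
    {
        unsigned long flags;
        asm volatile("pushf ; pop %0" : "=rm"(flags) : : "memory");
        return flags;
    }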
<mrvn> heat: yeah, in this special case
<mrvn> It would also work if clang would use a frame pointer instead of RSP
gxt has joined #osdev
<mrvn> heat: that code would normally be compiled with -mno-red-zone: https://godbolt.org/z/xdT1dsfjh
<bslsk05> ​godbolt.org: Compiler Explorer
<mrvn> looks even worth then
<mrvn> worse even
<heat> omg wtf is going on in f1
<heat> why is it saving %rax
<mrvn> heat: no red zone, so it first has to make space for "flags" on the stack.
<mrvn> same as "subq $8, %rsp"
<heat> wtf
<heat> so just sub?
<mrvn> sub is 4 bytes, push is 1
<heat> sub doesn't store, push does
<heat> gcc agrees with me
<bslsk05> ​godbolt.org: Compiler Explorer
<mrvn> probably because gcc keeps the stack 16 byte aligned
<mrvn> clang lines 5+6 could be just "popq %rax"
<mrvn> https://godbolt.org/z/nb78s6MWT What is going on in g? Why isn't it passing the memory address to the inline asm?
<bslsk05> ​godbolt.org: Compiler Explorer
<heat> if you static inline f you'll get cleaner code, but still not ideal
<mrvn> not really. it just removes the output for f()
<mrvn> I would expect "g()" to be "pushfq; popq 305419896"
<mrvn> isn't that possible?
<heat> that I'm not sure
<heat> you could do mov $0xdeadbeef, %rdi; pop (%rdi)
epony has quit [Quit: QUIT]
<mrvn> 8F /0 : POP r/m64, valid in 64-bit mode
<bslsk05> ​godbolt.org: Compiler Explorer
<heat> there we go
<mrvn> but that's not the optimizer doing the work
<mrvn> With the original the "m" makes no sense. If the optimizer isn't optimizing out the return and stores directly into the target memory then "m" can never be faster.
<heat> m can be faster if 1) the optimizer sees it 2) doing "r" spills a register for a return you're probably not actively using atm
<mrvn> it will always remain: pop to stack, read from stack into reg, store reg in memory
<mrvn> So you already need to spill the reg and then popping into that reg is faster
<mrvn> That was my test with "g": Does the optimizer see the store to memory and eliminate the temp register
<heat> right, but it's super possible you're just saving eflags, cli + disable stuff, and do something, restore eflags
<heat> in that case, storing to the stack is faster
<heat> (if you need to spill)
<heat> and it does indeed do it correctly on the stack
<mrvn> heat: but you aren't. You're popping into a temp place, reading into a reg, ..., storing reg into temp place, pushing temp place.
<heat> but you can also get depressingly bad codegen like
<heat> ffffffff801b945a: 9c pushf
<heat> ffffffff801b945b: 8f 45 d0 pop -0x30(%rbp)
<heat> ffffffff801b945e: 44 8b 65 d0 mov -0x30(%rbp),%r12d
<mrvn> heat: I don't see the optimizer eliminating that "return flags" that loads into a reg anywhere.
<heat> well, keep in mind this func gets inlined
<mrvn> yeah, badly :)
<heat> in other news, omg this codegen is horrible
<mrvn> sometimes compiler really disappoint.
<bslsk05> ​gist.github.com: pushwtf · GitHub
<heat> per objdump -d kernel/vmonyx-unstripped | grep -C 2 pushf
<heat> 99% of the inlines are pushf; pop (somewhere in the stack); mov (somewhere in the stack), <reg>
<heat> like WTF???
<geist> hmm, i dunno, that's strange, but it's hard to do the same thing with fewer instructions
<mrvn> heat: that's clang?
<heat> yup
<mrvn> geist: gcc just uses "pop reg" saving the extra opcode
<geist> sure but it might still need to save it to the stck for some other reason elsewhere
<heat> a quick grep on gcc-built vmlinux sees a lot of pop <reg> effectively
<geist> it's popping it off the stack onto the local frame, maybe something else needs it
<geist> but i could see rewriting it as 'pushf; pop reg; move reg,-x20(rbp)'
<mrvn> geist: clang does too. They both end up using a register, just clang shoves the value onto the stack and then into a reg.
<geist> yah but question is does that stack value get used anywhere else
<geist> maybe it does later in the routine
<heat> it's not that clang has an all seeing eye and knows i'm storing it for later
<geist> also it *could* be a misread of the inline asm
<mrvn> geist: you mean the register is just a copy that gets trashed and then the original gets read again? Not in any of my tests.
<geist> that i have seen
<heat> it's that clang literally just stores to the memory
<geist> okay.
* geist shrugs
<heat> clang for some reason prefers memory to registers and no one knows why
<geist> i do bet a modern cpu will elide that second mov and see it's a store and then load from the same address
<geist> i think for some time now modern intel and amd machines have that sort of load-after-store optimizations
<heat> having a standalone x86_save_flags(): sub $8, %rsp; pushf; pop (%rsp); mov (%rsp), %rax; ret is just really poor codegen
<mrvn> .oO(Our compilers are dumb, lets fix that in the hardware)
<geist> of course mrvn
<heat> geist, how does that play with smp and fences?
<\Test_User> just make CPUs that run on common programming languages already if you're going to do that :P
<heat> JAZELLE
<mrvn> I thought exactly such cases were the point of having SSA form. So this load-after-store would become trivial to recognize and optimize out.
<geist> i think it'd be fine right? a store and then immediately load back from it is fine as long as it's cached. doubleplus so in a weakly ordered machine
<geist> yah i think the pipeline could just elide that to a register move
<heat> i guess, since we're not using atomics?
Lumia has joined #osdev
<mrvn> geist: depends on how smart the cpu is. It's a register and memory dependency so it could totally serialize the operations and stall the pipeline
<geist> and it's cached. if it was uncached memory, ie an mmio register, that's absolutely not okay
<geist> yah i just know i've seen talk recently of much more sophisticated load/store eliding than this, so i suspect this is already a given
<mrvn> the cpu has to detect it as load-after-store and optimize it away
<geist> Zen 2 in particular had something somewhat more powerful than this, in particular
<geist> (though it was removed in zen 3 for some reason)
<heat> actually
<heat> movl $0x0,(%rbx)
<heat> this is atomic
<heat> with a release C11 memory ordering
<mrvn> I guess this is something useful if you have code that calls tiny functions a lot. registers get spilled to the stack, function call, ret, restore registers. If the function is small enough the cpu would elide the push/pop completely.
<heat> movl $0x0, (%rbx), <something else stores>, movl (%rbx), %rax <-- what value can %rax be?
<mrvn> 0 or something else
<heat> unless you need a proper atomic when writing with an acquire C11 memory ordering
<mrvn> atomic really only makes a difference when you have a second observer.
<geist> ugh my thermostat at home is starting to die i think
<mrvn> battery low?
<geist> it's more than once in the last few days suddenly in mid day read much higher temp than it is
<heat> mrvn, you do have two observers in this case
<geist> so it doesn't turn on, so then it gets colder and colder in the house
<mrvn> heat: a single thread is just one observer
<geist> but it dosn't seem to consistently read too high
<heat> to be clear, the <something else stores> is supposed to be another core
<mrvn> ahh, why don't you say so
<heat> I don't understand how you can ever elide that store
<heat> erm, load
<heat> particularly if the other cpu does a proper atomic cmpxchg or whatever
<mrvn> heat: there is no synchronizing event so the order os the movl and something else is undefined
<mrvn> if the other core does atomics then it's synchronizing. Could still execute as if the second "movl" was before something else.
<heat> I think movl $0x0, (%rbx); lock cmpxchg <...>; movl (%rbx), %rax is entirely defined
<mrvn> heat: when is the second movl decoded, added to the pipeline, optimized by the hardware, accesses the cache, ...
<heat> to be clear, you don't need any special fence to get that release memory ordering
<mrvn> If you execute that "lock cmpxchg" at just the right time then it will work. If you do it a quarter cycle later it might not.
<mrvn> mixing atomic (one core) and non-atomic access (other core) will have variable success.
<heat> but this is atomic
<mrvn> the 2 movl are not atomic. The lock can happen before both, in the middle or after.
epony has joined #osdev
<heat> the 1st mov is absolutely atomic
<mrvn> What x86 guarantees is that you won't get %rax filled half before the lock and half after.
<mrvn> each movl is atomic but the pair is not.
clever has quit [Ping timeout: 260 seconds]
<mrvn> for your code to be deterministic you would have to turn off interrupts on both cores, synchronize them and then execute all opcodes with perfect knowledge of the timing so the "lock cmpxchg" on the second core executes right in the middle of the two "movl" on the first core.
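For reference, the two-core scenario in C11 atomics (a sketch; x stands in for the location in %rbx):

    #include <stdatomic.h>

    atomic_int x;

    int core0(void)
    {
        /* plain movl $0x0,(%rbx): an atomic store, release on x86/TSO */
        atomic_store_explicit(&x, 0, memory_order_release);
        /* plain movl (%rbx),%rax: reads 0 or whatever core1 stored */
        return atomic_load_explicit(&x, memory_order_relaxed);
    }

    void core1(void)
    {
        int expected = 0;
        /* lock cmpxchg: sequentially consistent, a full barrier on x86 */
        atomic_compare_exchange_strong(&x, &expected, 1);
    }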
Lumia has quit [Remote host closed the connection]
dude12312414 has quit [Remote host closed the connection]
gildasio has quit [Remote host closed the connection]
gildasio has joined #osdev
clever has joined #osdev
<geist> i wonder. are there *any* circumstances by which multiple SMT threads are allowed to share TLB entries between cores, or are they all intrinsically tagged with the thread that generated them (when the TLB is being shared between them)
<geist> on x86 you could say that if the page is marked global then it could be, however that would be assuming the architecture states that you can only have one set of global pages, ie, they're intrinsically going to be referenced on all cores. not running multiple kernels with multiple sets for example
<geist> PCID is right out on x86, but perhaps ARM's ASIDs are allowed to share, because the arch pretty much states that all cores within an inner shared domain must have the same asids
<geist> same set that is
<geist> if so that might have a complication if you truly wanted to run multiple kernels side by side without using virtualization
<geist> ie, statically allocating cores to individual kernels
terrorjack has quit [Ping timeout: 256 seconds]
<Mondenkind> about the store thingy (movl $0x0, (%rbx); lock cmpxchg <...>; movl (%rbx), %rax), x86 has total store order. That means the initial store _has to_ happen before the cas
<Mondenkind> and you can't fold away the load. Suppose you write a zero. Somebody else reads the zero, writes something else in its place, and then writes something to the location you cas to
<Mondenkind> your store happens-before his load happens-before his store happens-before your cas happens-before your second load
terrorjack has joined #osdev
k8yun has quit [Ping timeout: 252 seconds]
<heat> i've been writing a qemu pflash driver tonight
<heat> kinda cute
<heat> but god is this all really poorly documented
<heat> there's no proper cfi spec I can see
<heat> there are some docs but they're all shitty and super incomplete
<zid> what's a pflash
<zid> it just sounds like someone who can't say the letter f right
<clever> zid: parallel flash
<clever> vs serial (spi) flash
<heat> pflash is qemu's flash devices
<heat> they speak CFI which is some kind of standard-ish language flash devices speak
<zid> imagine having more than 8 pins
smeso has quit [Quit: smeso]
smeso has joined #osdev
epony has quit [Read error: Connection reset by peer]
k8yun has joined #osdev
k8yun has quit [Ping timeout: 248 seconds]
wxwisiasdf has joined #osdev
LostFrog has joined #osdev
chartreuse has quit [Remote host closed the connection]
PapaFrog has quit [Ping timeout: 240 seconds]
bradd has joined #osdev
heat has quit [Ping timeout: 256 seconds]
<wxwisiasdf> iapx 432 osdev
smach has quit [Ping timeout: 260 seconds]
epony has joined #osdev
acidx has quit [Remote host closed the connection]
<geist> i always thought it'd be kind of interesting to actually try to do something with it, but iirc the documentation for it is incomplete
<geist> something like the programming docs are higher level and dont tell you precisely how the low level OO stuff works on the processor
<geist> so it's insufficient to implement an emulator or whatnot
<wxwisiasdf> it is, unfortunately
<wxwisiasdf> even though it was a dumpster fire it had some cool ideas
<geist> yah
<geist> and you can kinda see how some of it bled over into 286 i think
LostFrog is now known as PapaFrog
<wxwisiasdf> 286 TSS time
<geist> and in general the whole handle to segments with bytewise size, etc
<wxwisiasdf> i think the most weird thing is how instructions were bitstreams rather than bytestreams
<wxwisiasdf> like imagine getting an offset wrong
<geist> heh
<wxwisiasdf> whole program dead
<geist> i suppose it could resync fairly quickly, but depends on what the opcode layout is
<wxwisiasdf> i assume they just used nibbles instead of allowing 6 bit insn
<geist> yah good question. also depends on what units they address memory
<geist> i was fairly certain it was at least a full 32bit machine
acidx has joined #osdev
Burgundy has joined #osdev
wxwisiasdf has quit [Quit: leaving]
clever has quit [Ping timeout: 256 seconds]
Burgundy has quit [Ping timeout: 252 seconds]
simpl_e has quit [Read error: Software caused connection abort]
simpl_e has joined #osdev
acidx_ has joined #osdev
bauen1 has quit [Ping timeout: 256 seconds]
acidx_ has quit [Remote host closed the connection]
acidx has quit [Remote host closed the connection]
acidx has joined #osdev
wxwisiasdf has joined #osdev
MarchHare has joined #osdev
knusbaum has quit [Ping timeout: 248 seconds]
doppler has quit [Ping timeout: 248 seconds]
doppler has joined #osdev
knusbaum has joined #osdev
k8yun has joined #osdev
bauen1 has joined #osdev
k8yun has quit [Quit: Leaving]
Benjojo has quit [Read error: Software caused connection abort]
Benjojo has joined #osdev
gxt has quit [Remote host closed the connection]
gxt has joined #osdev
gxt has quit [Remote host closed the connection]
gxt has joined #osdev
gxt has quit [Remote host closed the connection]
gxt has joined #osdev
bauen1 has quit [Ping timeout: 255 seconds]
gildasio has quit [Remote host closed the connection]
gildasio has joined #osdev
bauen1 has joined #osdev
Vercas6 has quit [Quit: Ping timeout (120 seconds)]
gog has quit [Quit: byee]
wxwisiasdf has quit [Ping timeout: 256 seconds]
clever has joined #osdev
Vercas6 has joined #osdev
carbonfiber has joined #osdev
nyah has joined #osdev
gaze___ has quit [Read error: Software caused connection abort]
gaze___ has joined #osdev
ElementW has quit [Quit: -]
SanchayanMaity has quit [Read error: Software caused connection abort]
SanchayanMaity has joined #osdev
bauen1 has quit [Ping timeout: 252 seconds]
bauen1 has joined #osdev
diamondbond has joined #osdev
ElementW has joined #osdev
bauen1 has quit [Ping timeout: 260 seconds]
bauen1 has joined #osdev
Mutabah has quit [Ping timeout: 248 seconds]
Mutabah has joined #osdev
GeDaMo has joined #osdev
Stella is now known as theWeaver
heat has joined #osdev
Maja[m] has quit [Quit: Bridge terminating on SIGTERM]
Irvise_ has quit [Quit: Bridge terminating on SIGTERM]
chibill has quit [Quit: Bridge terminating on SIGTERM]
sakasama has quit [Quit: Bridge terminating on SIGTERM]
identitas has quit [Quit: Bridge terminating on SIGTERM]
Maja[m] has joined #osdev
Burgundy has joined #osdev
diamondbond has quit [Ping timeout: 260 seconds]
identitas has joined #osdev
Irvise_ has joined #osdev
chibill has joined #osdev
sakasama has joined #osdev
heat has quit [Remote host closed the connection]
heat has joined #osdev
<heat> your operating systems are mid at best
<heat> mine is super good numba one best operating system ever
Burgundy has left #osdev [#osdev]
eroux has quit [Ping timeout: 260 seconds]
Bitweasil has quit [Remote host closed the connection]
Bitweasil has joined #osdev
eroux has joined #osdev
GeDaMo has quit [Read error: Connection reset by peer]
CryptoDavid has joined #osdev
GeDaMo has joined #osdev
Burgundy has joined #osdev
bradd has quit [Ping timeout: 252 seconds]
gildasio has quit [Ping timeout: 255 seconds]
gildasio has joined #osdev
Vercas6 has quit [Remote host closed the connection]
Vercas6 has joined #osdev
<mjg> hue
<mjg> consider the following: you implement scalable inode number allocation for tmpfs
<mjg> 's all per-cpu 'n shit
<mjg> and then you run into a funny corner case: / has to have ino 2
<mjg> but your scheme does not *guarantee* you will have that
<heat> my inode allocation scheme for tmpfs is cur_inode_num++
<clever> i believe x86 allows that to be atomic with just 1 opcode and maybe stalling? but it also steals the entire cache line the var is in?
<mjg> well it is lock xadd now
<mjg> the point is for it to NOT be this way
<heat> yeah I use atomics here
<mjg> just sayin a funny problem
<mjg> one which probably would not show on a 2 thread vm for the test suite
<clever> but if you had per-cpu counts, and each count was in its own cache line, it wouldnt stall or need atomics
<mjg> would it
<mjg> clever: that is the entire point, yes
<clever> you could just `(core << 20) | counts[core]++` basically
<heat> are you just dividing the 64-bit inode space into NR_CPUs partitions?
<heat> I can see that working
<clever> heat: i was thinking, just slot the corenr into the upper bits, dont bother trying to partition it up better
<mjg> no. conceptually for_each_cpu(c) { per_cpu(c)->ino = c; }
<mjg> then allocation is you add MAXCPU
<mjg> to whatever per-cpu var you got
<clever> 8bit corenr, toss it into the top 8 bits, that leaves you with 56 bits for each cpu to count its own inodes
<mjg> guaranteed lack of conflict, but also huge gaps, which should not be a problem
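A sketch of that scheme in C (illustrative names, not the real tmpfs code): each CPU's counter starts at its own index and strides by MAXCPU, so no two CPUs can ever produce the same ino and no locking is needed, assuming callers stay pinned to their CPU.

    #define MAXCPU 256

    static unsigned long percpu_ino[MAXCPU];   /* boot: percpu_ino[c] = c */

    static unsigned long ino_alloc(unsigned int cpu)
    {
        /* first result for CPU c is c + MAXCPU, so inos 0..MAXCPU-1
         * (including 2 for /) are never produced here and can be
         * special-cased at mount time */
        return percpu_ino[cpu] += MAXCPU;
    }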
<heat> odd scheme
<heat> make CPU2 start at 2 + MAXCPU ig
<heat> while you're at it, maybe also reserve 0 and 1
<heat> well, in any case you'll need the allocation code to detect these reserved inode numbers and skip them
<clever> in my scheme, inodes 0/1 effectively belong to core 0
genpaku has quit [Read error: Connection reset by peer]
<clever> and if you want those to be for things like the root dir, you can just pre-initialize core-0's inode table and counter, before you allow other cores into the ball-pit
genpaku has joined #osdev
<clever> if core-0 cant possibly be touching the state, then you're free to initialize core-0's state however you like
<clever> you're going to initialize all of the other cores anyways, from the wrong core
<heat> yes, that layout also works well
<heat> cpu0 just skips the first 3 inodes
<mjg> there is 0 difficulty skipping some range in my scheme
<mjg> the entire point tho was that you may need to allocate something which goes against the general scheme
<mjg> and thus needs to be special-cased
<clever> oh, i just had another idea, just reserve core-255's range for special stuff
<clever> when do you expect to see 256 cores? :P
<clever> but that would drop the special ones near the top of the 64bit range
<mjg> there is no difficulty here per se
<mjg> just fucking around to get it done
<heat> it's mildly annoying to skip in your scheme given a low enough MAXCPU
<heat> does anything actually depend on root ino = 2?
<heat> that seems............ depressing
<clever> the only time ive ever had issues with the inode number being weird, was 64bit inodes and a 32bit userland
gildasio has quit [Remote host closed the connection]
Vercas6 has quit [Remote host closed the connection]
<clever> the 32bit readdir() returns EOVERFLOW if any inode is over 32bits in length
<mjg> heat: that's to make it cpu hotplug-proof
<clever> xfs spreads the inodes over the whole disk, and picks an inode near the first data block, so if the first data block is >2tb into the fs, the inode# is over 32bits
<mjg> heat: one can roll with current cpu count without conceptually changing anything
Vercas6 has joined #osdev
<clever> and much to my surprise, a number of core linux utils, used in initrd stuff, dont check the readdir() return code
<mjg> heat: but you may notice MAXCPU would be a macro known at compilation time, while current cpu count would have to be read every time
<clever> if it returns -1, it must be EOF!
<clever> so basic linux utils, claim file not found, when the file exists!!
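For reference, the check those utilities skip looks roughly like this; readdir() returns NULL for both end-of-directory and errors such as EOVERFLOW, and only errno tells them apart (a sketch):

    #include <dirent.h>
    #include <errno.h>
    #include <stdio.h>

    int list_dir(DIR *d)
    {
        struct dirent *de;

        errno = 0;
        while ((de = readdir(d)) != NULL) {
            printf("%s\n", de->d_name);
            errno = 0;
        }
        if (errno != 0) {       /* e.g. EOVERFLOW: 64-bit ino, 32-bit caller */
            perror("readdir");
            return -1;
        }
        return 0;               /* genuine end of directory */
    }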
<mjg> clever: is there no magic to misrepresent inos which don't fit?
<mjg> i would expect gnu to have a hack big time
<clever> mjg: you're supposed to rebuild your program with large file support, and libc will then change the size of off_t&friends, and call readdir64 behind the curtain
<mjg> e.g., masking off the overflowing part
<clever> and then everything just works
<heat> GNU's big time hack is a 64-bit version of getdents
<mjg> for example linux in kernel had a scalable scheme to allocate 32-bit inos
<heat> which is actually not a big time hack but a normal time hack
<mjg> except it just assumed there would be no duplicates
<mjg> or even if you get one, it wont matter
<clever> one extra complication in my case, was that the fs itself, was xfs on a 64bit machine, running an nfs server
<mjg> linux in a nutshell
gildasio has joined #osdev
<clever> the client was then purely 32bit, and an nfs client
<mjg> clever: you lose by exporting nfs man
<clever> mjg: what else would you use for sharing files?
<mjg> ootb i don't see good alternatives on linux
<mjg> assuming you need to pretend posixy
<clever> yeah, thats why i use nfs
<clever> i discovered this bug, when working on nfs based net-boot for an rpi
<clever> my work-around was iscsi, export a whole block device, run the fs client side
<clever> also, due to the block dev being smaller, the inode table isnt big enough to cause the problem in the first place
<clever> but iscsi is limited to a single client, so it cant replace nfs
<mjg> and you are stuck with 32-bit inos on the client no matter what?
<clever> it was running the arm in 32bit mode
<clever> and due to limitations in dozens of linux packages, i couldnt use readdir64()
<clever> so yes
<mjg> linux
<mjg> no such problems on onyx amiright
<clever> i was able to apply overrides to the packages, to force them to build with large file support
<clever> which did technically fix it
<clever> but it was like a game of whack-a-mole
<clever> fix one, and 2 more come out
<heat> ONYX BEST OPERATING OF SYSTEM
<mjg> solaris on powerpc == $$$
<clever> i used gentoo on a sparc machine as my nas for a while
<mjg> wut
<clever> but one day, it just randomly stopped booting and i couldnt figure out why
<clever> and then i discovered, xfs is super lazy, the journal is in native byte order only
<mjg> you know, i do find it disheartening when people boot !solaris on sparc
<clever> and there is no cross endian journal replay
<clever> so an LE machine cant replay a BE journal
<mjg> and the endiannes
<mjg> having BE/LE branchfest to translate as needed was the shit
<mjg> s/and/ah/
<clever> zfs has a neat trick to solve most endian issues, write all records in native order, include a magic# in the header
<clever> if the magic# is backwards, you're on a different endianness, swap all fields
<mjg> bugreport: magic is all 0s
<clever> if you dont change the host endianness, it will always be operating in native mode and never byteswap
<clever> but if you do mess with the host, it can still read things, but all new data will be in native order
<clever> however, there are a number of surprise fields, that are BE only
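A sketch of the trick clever describes (the magic value is ZFS's uberblock magic, quoted from memory; treat the code as illustrative):

    #include <stdbool.h>
    #include <stdint.h>

    #define UB_MAGIC 0x00bab10cULL   /* "oo-ba-bloc" */

    /* Records are written in the writer's native order; a byte-swapped
     * magic means the record came from the other endianness. */
    static bool check_magic(uint64_t on_disk, bool *swap)
    {
        if (on_disk == UB_MAGIC)                    { *swap = false; return true; }
        if (on_disk == __builtin_bswap64(UB_MAGIC)) { *swap = true;  return true; }
        return false;   /* neither: corrupt header */
    }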
bauen1 has quit [Quit: leaving]
* mjg adds midgetendian
<clever> one thing i want to get back to at some point, is BE linux on aarch64
<clever> both arm32 and aarch64 can run in BE mode
<clever> its just a config register in the cpu
<mjg> can you fuck with it any time?
<clever> on 32bit arm, yes, and i think it has even caused some bugs
<clever> on 64bit arm, i think its part of the EL switch system, so when you drop from say hypervisor to kernel, or kernel to userland, you can also change bit width, and endianness
<mjg> now that makes sense
<clever> but linux isnt able to handle a userland with a differing endianness
<clever> so kernel and userland must match
<clever> i dont know of any that allow it, but a hypervisor could support both LE and BE guests
<clever> a few months back, i was helping some guys on the rpi forums, at booting BE linux
<clever> and they then discovered, half of the drivers arent doing proper bit-flips
<mjg> :]]
<clever> while linux will automatically byte-swap MMIO writes
* mjg == shocked
<clever> the control structures you dump in ram, and then point at, dont get magically swapped
<mjg> i'm positively surprised with the half which does
<clever> and those kind of edge cases, are why i want a BE CI machine
<mjg> i'm guessing some of the drivers only do it some of the time
<mjg> ;>
<clever> yeah
<clever> another whacky case ive been involved in, is the pistorm guys
<mjg> there was a cpu which had errata: writing to a reg is expected LE, but reading gives BE
<mjg> LS
<mjg> :S
<clever> basically, emu68 is a JIT'ing emulator, to run m68k (BE) code, on an rpi3, in BE aarch64 mode
<clever> by keeping the arm in BE mode, it never has to deal with byte-swaps when emulating
<clever> it can just directly translate an m68k load into an aarch64 load, and not care about address or endiannes
<clever> but where it gets whacky, is that the rpi MMIO window, is mapped into the guest address space
<clever> you then write drivers for the rpi hardware, with the BE->LE byteswaps everywhere, compile the driver for m68k, then JIT it into aarch64 at runtime
gxt has quit [Ping timeout: 255 seconds]
<clever> mjg: the other crazy part, is that emu68, is mostly only emulating the cpu, you socket the whole rpi into the cpu socket of an amiga, and it bit-bangs the entire motherboard, lol
gxt has joined #osdev
Burgundy has quit [Ping timeout: 252 seconds]
<heat> yeah but like, realistically who cares about big endian these days? at least on modern-ish not-embedded stuff
<clever> heat: and thats why BE stuff is always broken :P
eroux has quit [Ping timeout: 252 seconds]
eroux has joined #osdev
Burgundy has joined #osdev
Vercas6 has quit [Quit: Ping timeout (120 seconds)]
Vercas6 has joined #osdev
Vercas6 has quit [Remote host closed the connection]
Vercas6 has joined #osdev
xenos1984 has joined #osdev
<heat> today's saga: rdrand detection code
<heat> or "god oh god why can't amd get this right"
<zid> check if the cpuid name contains a z
<heat> z for what
<heat> brokenz rdrandz
<zid> but no double l, actually
<zid> to check for ryzen, or zen, but not bulldozer
<heat> the other ones also have broken shit
<heat> amd can't get rdrand right
diamondbond has joined #osdev
epony has quit [Ping timeout: 268 seconds]
<bslsk05> ​lore.kernel.org: [PATCH] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h - Lendacky, Thomas
<heat> "There have been reports of RDRAND issues after resuming from suspend on some AMD family 15h and family 16h systems."
<heat> this is a crying emoji moment
<zid> yea amd has a LOT of bugs with suspend, exception etc recovery just from.. not being that mature as a setup
<zid> lots of bios bugs, lots of cpu bugs
<heat> how hard can rdrand be
<heat> 1. get entropy 2. magic cryptography 3. ??? 4. put it in a register
diamondbond has quit [Ping timeout: 268 seconds]
<heat> you'll notice there's no step called "screw up rdrand for 11 years"
<kof123> this is some really good gourmet coffee heat
gildasio has quit [Ping timeout: 255 seconds]
<heat> thanks
<heat> i try my hardest to give you the best gourmet coffee
diamondbond has joined #osdev
<heat> btw re: that clang thing: https://github.com/llvm/llvm-project/issues/20571 cc mrvn, geist
<bslsk05> ​github.com: inline asm "rm" constraint lowered "m" when "r" would be preferable · Issue #20571 · llvm/llvm-project · GitHub
<heat> <jyknight> heat: This is a really long-standing issue which is not easy to fix. At the time the decision about whether to use "r" or "m" is currently made, it doesn't yet know that a register can be allocated at all. That's why it always falls back to memory -- the alternative is sometimes failing to compile. I believe https://github.com/llvm/llvm-project/issues/20571 is the canonical issue for this.
gildasio has joined #osdev
diamondbond has quit [Remote host closed the connection]
diamondbond has joined #osdev
Matt|home has joined #osdev
gildasio has quit [Remote host closed the connection]
gildasio has joined #osdev
wxwisiasdf has joined #osdev
xenos1984 has quit [Ping timeout: 246 seconds]
xenos1984 has joined #osdev
CryptoDavid has quit [Quit: Connection closed for inactivity]
diamondbond has quit [Ping timeout: 256 seconds]
xenos1984 has quit [Ping timeout: 252 seconds]
gog has joined #osdev
xenos1984 has joined #osdev
xvmt has quit [Read error: Connection reset by peer]
<gog> hi
<Ermine> hi gog!
<gog> :)
xvmt has joined #osdev
heat_ has joined #osdev
heat has quit [Read error: Connection reset by peer]
wootehfoot has joined #osdev
crm has joined #osdev
orthoplex64 has quit [Ping timeout: 255 seconds]
<wxwisiasdf> Hii
epony has joined #osdev
dude12312414 has joined #osdev
elastic_dog is now known as Guest5023
Guest5023 has quit [Killed (zirconium.libera.chat (Nickname regained by services))]
elastic_dog has joined #osdev
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
heat_ has quit [Read error: Connection reset by peer]
heat_ has joined #osdev
kof123 has quit [Ping timeout: 268 seconds]
GeDaMo has quit [Quit: There is no spoon.]
kof123 has joined #osdev
<jbowen> o/
immibis_ has quit [Ping timeout: 260 seconds]
immibis_ has joined #osdev
<geist> heat_: ah so it was the lowering thing
<geist> now that i think about it i think i've seen something like this in zircon, grumbled about it as bad clang codegen, and moved on
<mrvn> heat_: premature optimization is the root of all evil
<mrvn> heat_: Also the point isn't whether a register is available but whether the result is going to go to memory.
gxt has quit [Ping timeout: 255 seconds]
gildasio has quit [Ping timeout: 255 seconds]
gxt has joined #osdev
gildasio has joined #osdev
<geist> but yeah i only late discovered you can *have* multiple constraints like that on inline asm, but then was dissapoint when i discovered it on zircon, using clang, since it's mostly broken there
<geist> now it makes sense why
<mrvn> It should not lower the asm until all the inputs and outputs have been determined or multiple options are viable.
heat_ is now known as heat
<heat> yeah
<heat> gcc handles it fine tbf
<heat> praise be the GNU operating system
<mrvn> heat: does it? doesn't it just default the other way?
<heat> no
<geist> yah gcc i've seen do it quite well. very pleasing when it handles in/out instructions with a constant, for example
<geist> look at the disassembly and all is well
<heat> maybe this warrants a #ifdef __clang__ #define RM_CONSTRAINT "=r" #else #define RM_CONSTRAINT "=rm" #endif
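Spelled out, that workaround would look something like this (RM_CONSTRAINT is hypothetical, not an existing macro):

    /* clang lowers "rm" to memory (llvm issue 20571), so only give it "r" */
    #ifdef __clang__
    #define RM_CONSTRAINT "=r"
    #else
    #define RM_CONSTRAINT "=rm"
    #endif

    static inline unsigned long arch_save_flags(void)
    {
        unsigned long flags;
        asm volatile("pushf ; pop %0" : RM_CONSTRAINT(flags) : : "memory");
        return flags;
    }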
<mrvn> heat: I think yesterday gcc did pop into a register and then mov to memory for the "to memory" test case
<heat> well, that's not what I'm complaining about
<geist> i mean all else held equal if you can't have both constraints, just keep the 'r' one
<geist> since it's probably cheaper to pop into that and push it, and there's some chance you'll want to use it anyway
<heat> pushf; pop <something> <-- I expect gcc to pop into a register OR spill onto the stack
<mrvn> clang is worse in that when you store to memory it will pop to the stack, load into a register and then store that to memory.
<geist> constants and R for inputs also
<geist> really i mean goddamnit intel and/or amd: add a stupid instruction to load flags into an integer
<geist> why have they not fixed this damn bug after all these years
<heat> becuz
<geist> they have done far more intrusive ISA fixes in the intervening years
<mrvn> geist: and why do they have so many pops? reg16, reg32, reg64, mem16, mem32, mem64,
<heat> CISC ftw
<geist> well the multi pops make sense
<mrvn> especially where mem is relative to a register
<mrvn> geist: sure, 16, 32, 64bit pop is needed. but all the address modes?
<geist> do you really mean 'why do they have pop to memory?"
<mrvn> ARM is better there. pop is just a mov with the "store" flag
<mrvn> geist: yeah
<geist> that's just risc vs cisc
<geist> why x86 has pop to memory i dunno, but since it *does* have a memory operand i think it supports most of the usual addressing modes, so that just comes along with the territory
<geist> though honestly i haven't looked at it. it may be a special snowflake
<geist> part of the very few instructions that can implicitly access two memory opeands at a time
<mrvn> I didn't even know "m" would mean picking a SP relative address.
<geist> but really mrvn, as a 68k person i'm quite dissapointed in you
<geist> i'd expect the opposite: why is pop so limited in its addressing modes!
<heat> yeah but this is not two memory operands, just one
<mrvn> geist: that too. either it should have one addressing mode or all
<geist> heat: yeah fair. one operand, two memory accesse
<heat> just like rep movsb has 0 memory operands
<heat> makes total sense lgtm ok heat@ reviewed-by: heat <heat@irc.libera.chat>
<mrvn> some are just implicit because 8bit opcodes just aren't big enough
<geist> but yeah even in that, there are only a handful of x86 instructions that can do two memory accesses, but none of them have two full memory operands i guess
<mrvn> pop probably uses the same logic path as mov
<geist> oh i seriously doubt it. i think push/pop is highhhhly specialized on x86
<mrvn> maybe nowadays
<heat> x86 really is a high quality very consistent architecture
<geist> though whether or not it always was
<mrvn> But really push/pop is just a mov with one argument being the SP and storing the address modification
<heat> I do really like that part where storing to the lower 16-bit half doesn't zero the upper bits but storing to the lower 32 bits zeroes the upper part
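A small sketch of that quirk (GCC inline asm; %w and %k are the 16- and 32-bit operand-size template modifiers):

    #include <stdint.h>

    uint64_t partial_regs(void)
    {
        uint64_t r;
        asm("movq $-1, %0\n\t"   /* r = 0xffffffffffffffff           */
            "movw $0,  %w0\n\t"  /* 16-bit write: upper 48 bits kept  */
            "movl %k0, %k0"      /* 32-bit write: upper 32 bits zeroed */
            : "=r"(r));
        return r;                /* 0x00000000ffff0000 */
    }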
<geist> i can't believe i have mostly forgotten the details of 68k and multiple operands. iirc it's far more consistent, but still limited to one operand except in a few cases (MOVEM comes to mind)
<mrvn> movem is 040 extension I think
<mrvn> or 030
<geist> 040 i think
<mrvn> Didn't make sense before you had cache I guess
<geist> however 68k has the indirect addressing modes though, so lots of stuff can implicitly do more than one access
<geist> my brain tends to smear 68k and VAX together, since they're clearly cut from the same cloth. in terms of ISA layout and addressing modes it's like VAX-lite
<geist> so i tend to lose track unless i fiddle with it a bit
<mrvn> 68k is quite regular except for some opcodes that limit regs to A
<geist> right. but for example you can't ust have both operands be any addressing mode, right?
<mrvn> think so
<geist> so there tends to be multiple opcodes for mem to reg or reg to mem
<geist> and then of course opcodes that implicitly deal with A or D regs (which is admittedly the weirdest quirk of the arch)
<geist> if there's one weird quirk that didn't work, it's the two sets of registers
<geist> but what does make those addressing modes much more powerful than x86 is theres so many of them
<geist> pre-post increment/decrement, indirects, indirects with pre-post increment/decrement, etc
<geist> all told i think there are 16 different addressing modes
<geist> (this is why there's no push/pop on 68k, since you can accomplish it with a move indirect with pre or post inc/dec)
<mrvn> same on ARM
<geist> yah until arm64 in which case you can't
<geist> one of the core, critical differences between arm32 and arm64
<geist> re: push/pop on x86. there are also some silly differences, like it always moves the stack in units of <defined by segment, mode, etc>
<geist> and it always implicitly uses the SS: segment
<mrvn> I guess they saw that languages don't use push/pop except potentially in the function entry and exit. But most often you have push + sub #x, %sp. Might as well drop the logic for pre/post inc/decrement and just make "x" larger.
<geist> (moves with BSP do too)
<geist> mrvn: well, i mean you can do push/pop on arm64, it's just more limited. basically you dont have the move multiple, but you have the 'load store pair' instruction
<geist> so you generally move 16 bytes, two registers, at a time
<geist> it does have pre/post inc/dec so basically you can do what you want, it's just more limited
<geist> so i guess in that case it's not really fundamentally different from arm32, just more limited
<mrvn> I like the move multiple.
<geist> yah those are the first thing you ditch when making high speed impls. they clearly wanted to remove any microcoded style instructions everywhere
<geist> and explicitly make things like PC hard to access and SP a special case so it can be treated specially. all tricks for high speed impls
<mrvn> geist: isn't that kind of similar to loading/storing a full cacheline? Same as SIMD registers doing large load/store?
<geist> sort of. mostly its because it fits with their model of 'stack pointer *must* be 16 byte aligned at all times'
<geist> so to make that useful they added double register load/stores
<mrvn> You could even limit it to the caller saved regs, or 4/8 regs.
<geist> sure but then you can't easily fit that in the instruction, since they have 32bit wide instructions and there are 31 regs
<mrvn> arm32 has double register load/store already. The register just got bigger.
<geist> one nice thing ldp/stp does in arm64 is it takes any two register pairs
<geist> including the same register
<mrvn> geist: first reg + log(size)
<geist> so it's not exactly the same thing as the arm32 double load/store
<geist> mrvn: yah that's what PPC does
<geist> you get a linear run of any registers, but only a run of them
dude12312414 has joined #osdev
<mrvn> which is exactly what a language needs
* geist nods
<geist> anyway you really should do some arm64
<mrvn> ahh, yeah, the good old time when they made RISC to implement just what a compiler needs.
<geist> yah POWER/PPC is clearly cut from a different cloth. its risc, but at its core it just has a different viewpoint, i guess
<geist> you can see that it has some heritage somewhere completely different
<mrvn> None of that fancy "solve this quadratic equation" opcode that you need once in every 10th program.
<geist> probably if i knew ibm 360 or whatnot i'd see the resemblance
<geist> yeah that's a distraction. i hate it when folks bring up the quadratic equation instruction, since a) it's not really what you think and b) it was a weird outlier even at the time
<mrvn> hence why they bring it up
<gog> wait solve quadratic equation instruction
<geist> to me the essence of CISC is more interesting and subtle
<geist> if you're *actually* interested in it i forget what it's called but can look it up
<geist> IIRC it's actually some sort of 'do math and do a table lookup' instruction
<gog> oh
<mrvn> If you comapre m68k and ARM I don't think there is that much difference in the addressing modes. The CISC/RISC line is totally blurred.
<geist> clearly intended to accelerate something, iirc
<geist> mrvn: except all the indirect addressing modes. none of those exist in ARM
<gog> is it in the main isa or is it x87?
<geist> but yes, minus the indirect stuff, it's clear that the ARM stuff took their inspiration from the motorola world.
<mrvn> geist: you mean base reg + offset reg + immediate?
<geist> gog: oh this is some famous VAX instruction
<gog> oh
<heat> aaa best instruction
<geist> mrvn: i mean 'base + something' then take that as an address and read the thing at that address
<mrvn> heat: better than sex?
<heat> yes
<geist> ie a double indirect
<heat> also better than eieio
<mrvn> geist: oehm, did m68k have that?
<geist> er i didn't phrase it right, but 68k has variations of most of the modes that double indirect
<geist> yeah, i think mostly 020+?
<mrvn> geist: sure it's not VAX?
<heat> gog, u still using x87
<geist> mrvn: VAX definitely has that
<heat> have you heard of streaming simd extensions
<heat> it's this new thing for floating point and simd
<mrvn> can't remember having double indirection on m68k but I learned 68000 asm and only little bits for 020+
<gog> i will never do floating point math
<mrvn> and it's been decades since I used it
<geist> FWIW the VAx instruction that mrvn was referring to is https://documentation.help/VAX11/op_POLY.htm
<heat> reject floating - it's the devil's way
<geist> mrvn: yeah and in fact i couldn't easily get the compiler to even emit it
<heat> oh wow, very cool
<bslsk05> ​www.thedigitalcatonline.com: The Digital Cat - Motorola 68000: addressing modes
<gog> oh neat
<geist> may be the vax stuff bleeding through
<mrvn> That POLY opcode is really powerful. You need that for all the trigonometric functions for example.
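What POLY computes is essentially Horner's rule over a coefficient table; a C sketch of the same evaluation:

    /* Evaluate c[0] + c[1]*x + ... + c[degree]*x^degree, one
     * multiply-add per step -- the core loop of libm-style
     * sin/cos polynomial approximations. */
    double poly_eval(double x, const double *c, int degree)
    {
        double r = c[degree];
        for (int i = degree - 1; i >= 0; i--)
            r = r * x + c[i];
        return r;
    }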
<heat> *heat
<gog> when you've programmed on too many architectures
<geist> in general vax has pretty much similar stuff to 68k there but then you can also take the result of it and use that as an address you then indirect once more
<geist> ie, compute this address as a table entry of pointers and then read whatever is at the pointer we just computed in memory
<mrvn> I like (d8,Dn,PC). You have an array of structs on the stack and access it with (off+label, array index, pc)
<mrvn> -on the stack
<geist> ah no. actually turns out 040 has more
<mrvn> or (d8,Dn,An) for the more general case
<geist> i think that may be where i was looking at it. trying to find a better list
<mrvn> geist: Not much more it can have. Everything up to 111 100 is in use.
<mrvn> So only 4 more to go
<geist> yeah in wikipedia article on 020:
<geist> Addressing modes added scaled indexing and another level of indirection"
<geist> yah that's what i remember seeing
<geist> i forget how it's encoded
<geist> but those seem highly exotic and special cased, probably removed in 060 and coldfire and whatnot
<mrvn> maybe opcode specific
<geist> possibly
<geist> for the indirect stuff you just need one more bit somewhere
<mrvn> The 040 was kind of a branch. When they made the 060 superscalar they went back a step and never implemented everything the 040 had.
<geist> yah though in this case these additional modes were added in 020
<heat> kinda offtopic but does arm64's RNDR also have a history of shitty impls?
wootehfoot has quit [Read error: Connection reset by peer]
<mrvn> like having a pattern?
scoobydoo_ has joined #osdev
scoobydoo has quit [Ping timeout: 260 seconds]
scoobydoo_ is now known as scoobydoo
<heat> having predictable results, or plain depressing impls like accidentally always returning all-1s
<gog> heh, oops, all 1's
<geist> heat: hmm, arm64s?
<geist> what machine are you on?
<mrvn> heat: bit 3 ^ bit 7 == 1
<heat> geist, none. just wondering
<geist> in general i dont think a lot of ARM machines have it yet. it's a new extension that i think is optional in most case
<geist> i haven't personally seen it on any machine. though M1 maybe?
<heat> hm
<j`ey> I dont think m1 has it either
<heat> I've just spent all day battling rdrand and negligence and was just wondering
<geist> j`ey: yeah i think it's probably going to show up on servers first is my guess
<mrvn> people don't trust hardware RND
<geist> gotcha
<bslsk05> ​github.com: edk2/RdRand.c at master · tianocore/edk2 · GitHub
<heat> look at this shit
<geist> wat!
<geist> all that aside i mean i think you're supposed to use it with care
<heat> a new patch wants to hook up this stuff automatically in OVMF
<heat> which means broken implementations return broken unsafe results and cpus that don't support it will just crash
<mrvn> .oO(and how is that different from the test function returning False?)
<heat> if the test function returns false you don't use compromised RNG OR crash
<geist> heat: guess it depends on precisely when it was added in the intel and AMD line
<geist> and does TCG properly implement it? i'd be worried about running a build of OVMF in a VM where it's not passed through
<heat> i don't know, does TCG even implement it?
<geist> right exactly. OTOH arguably even a broken implementation that returns all 1s is still entropy, it's just bad
<geist> one shouldn't use the instruction for a random number, but as entropy to feed into a pool
<geist> (i think at least, i usually consult experts when it comes to this stuff)
<heat> yeah except having a literal EFI_RNG_PROTOCOL tempts you to just use it for your early boot KASLR or whatever
<heat> and even if you mix entropy, you just got yourself a lot of predictable entropy. yay?
<mrvn> no, entropy can never get worse, only better
<mrvn> random bits XOR all-1s doesn't make it less random
<heat> I would genuinely prefer a rdtsc fallback over "100 shades of broken" rdrand
<mrvn> unless the rdrand is influenced by the existing pool it can't make it worse
<geist> anyway re: ARM and RNDR, it hasn't been as big of a deal since *most* soc hardware has a hardware block to read random from
<mrvn> just wastes cpu if it's really bad
<geist> or at least it's fairly common. i suspect the instruction spec is mostly for server stuff
<geist> lets a VM get entropy when it's super abstracted from hardware or whatnot
<geist> and also a way to standardize it
<mrvn> hardware block?
<geist> yah like a device
<mrvn> A one-time pad or a random register?
<geist> latter
<geist> and yes i know what you're about to say now
<geist> you're going to try to punch a bunch of holes in that. i am well aware
<geist> same thing: if some random vendor implements a hardware RNG is it any good? Good question! No Fucking Clue
<mrvn> no, wasn't going there. I thought it rather sensible to have an MMIO register for random numbers.
<geist> oh haha
<mrvn> waste of a perfectly good opcode to have that as an opcode.
<geist> okay. well yeah most reasonable ARM hardware i've seen has some sort of RNG device you can read from
<geist> However that means 'how good is the implementation if random vendor can make it?' no idea
<mrvn> it's usually just a diode or open gate that you read and it flutters.
<mrvn> And you are supposed to mix it into an entropy pool. It might be totally biased like giving only 1% 1s but as long as they have no pattern that adds entropy.
<geist> the hard part is something like getting a random number 1ms after booting
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
<mrvn> Physically it's not that challenging to build a random bit generator.
<geist> since you haven't had time to stir the pot. and in this case heat is saying EFI will potentially give you garbage
<heat> yup
<heat> particularly on AMD machines I guess
<mrvn> You probably need to init the pool by stiring it a few times first.
<mrvn> and only then pick your random number for KASLR
<heat> since they're saying this is already hooked up on Real World Implementations(tm), I'd be curious to see if there indeed is an RNG_PROTOCOL and if it returns entirely predictable results
<heat> for e.g Ryzen zen3
<geist> there is a random instrution on some AMD machines, i just dont know where it got started
<mrvn> heat: boot a kernel 1000 times and plot the random bits?
<geist> i do remember reading a good whitepaper from AMD on how their hardware works
<heat> AMD RDRAND has been broken since bulldozer
<geist> oh?
<heat> every family since then has had a broken rdrand!
<geist> how so?
<bslsk05> ​lore.kernel.org: [PATCH] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h - Lendacky, Thomas
<heat> "There have been reports of RDRAND issues after resuming from suspend on some AMD family 15h and family 16h systems."
<geist> that's family 15/16h
<geist> what about 17 and 19? those are zens
<heat> wikipedia says 15h is bulldozer and a few others
<geist> yes. zen 1-2 is 17, zen 3-4 is 19h
<mrvn> heat: sounds like a bios problem, not stiring the hardware pool on resume
<heat> 17h and 19h also had their issues, zen2 I think suffers from the same suspend problem and zen3 just plain returns all-1s ALWAYS (without the microcode fix)
<heat> firmware doesn't stir anything on resume, this is entirely a CPU thing
* geist nods
<mrvn> heat: see the comment onyour url
<mrvn> heat: returning all-1s must be a different problem
<heat> it sounds like "firmware didn't do this hacky solution so now resume is always broken"
<geist> well thats from 2019
<geist> it is from an amd employee, so they seemed to be just functionally giving up
<heat> per the spec rdrand with the proper flag set should return you cryptography-grade randomness
<geist> or at least at that snapshot in time, they might had a quirk MSR bit later that says 'actually it's okay' and then reenable it
<geist> which they have done from time to time
<geist> but not on bulldozers
<geist> since those are dead
<heat> oh yeah geist you might like this small thread: https://twitter.com/aionescu/status/1393728057005920263
<bslsk05> ​twitter: <aionescu> Playing around with my first AMD Ryzen system. Turns out the "AMD PCI Driver" isn't actually a PCI Driver... at all. ␤ ␤ Here's a few fun facts: ␤ ␤ 1) It registers a process creation notify routine, and checks all process names against a list of 19 hashed names.
<mrvn> heat: I assume the CPU has an entropy pool that rdrand returns values from and that only works if the pool is random to begin with. So the firmware is supposed to generate a million random numbers to stir the pool on boot to get it randomized.
<geist> oh haha will have to unblock twitter and read that
<geist> alas right now gonna do some work
<heat> The DRBG is re-seeded frequently from
<heat> an on-chip non-deterministic entropy source to guarantee data returned by RDRAND is statistically uniform, non-
<heat> periodic and non-deterministic.
xenos1984 has quit [Read error: Connection reset by peer]
<heat> In order for the hardware design to meet its security goals, the random number generator continuously tests itself
<heat> and the random data it is generating.
<heat> per Intel SDM volume 1 7.3.17.1
<heat> if amd rdrand requires this then they built a broken rdrand
<heat> s/this/manual stirring/
<mrvn> heat: see, and the firmware is supposed to make sure such a re-seeding happens at boot
<heat> no it's not
<geist> oh ugh. i even have that stupid AMD PCI driver installed
<heat> did you even read it? it's seeded on-chip, re-seeded on-chip, tested on-chip
<mrvn> heat: which would take time. It doesn't pop up random at boot.
<heat> rdrand can fail
<heat> it's explicitly defined that a rdrand "invocation" should call it up to 10 times to get a result
<mrvn> Maybe it's not what intel expected but I can totally see AMD saying firmware has to stir the pool at boot a bit for randomness to happen.
<mrvn> And by stir the pool I mean call rdrand a million times so the re-seeding and self testing and such triggers.
<heat> the AMD manual says nothing. maybe there's something in the confidential FW docs, but I doubt it
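The retry convention heat cites, as a sketch (up to 10 attempts, CF reports success; GCC-style inline asm assumed):

    #include <stdbool.h>
    #include <stdint.h>

    bool rdrand64(uint64_t *out)
    {
        for (int i = 0; i < 10; i++) {
            uint64_t val;
            bool ok;
            /* setc captures CF: 1 means val holds valid random data */
            asm volatile("rdrand %0 ; setc %1"
                         : "=r"(val), "=q"(ok) : : "cc");
            if (ok) {
                *out = val;
                return true;
            }
        }
        return false;   /* persistently failing RNG -- don't trust it */
    }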
netbsduser` has joined #osdev
k4m1_ has joined #osdev
TkTech6 has joined #osdev
<heat> AMD PCI driver is named like actual spyware, it's amazing
torresjrjr_ has joined #osdev
geist_ has joined #osdev
fkrauthan_ has joined #osdev
antranigv_ has joined #osdev
les has joined #osdev
pie__ has joined #osdev
vancz_ has joined #osdev
ElementW_ has joined #osdev
Effilry has joined #osdev
lanodan_ has joined #osdev
vin1 has joined #osdev
sprocket has joined #osdev
outfox_ has joined #osdev
Mutabah_ has joined #osdev
eck_ has joined #osdev
joe9_ has joined #osdev
<mrvn> heat: so any results on that tweet? Has anyone tried benchmarking the games with the original and changed name to see what it does?
kof123 has quit [Ping timeout: 268 seconds]
JTL1 has joined #osdev
<heat> *shrug*
identitas has quit [*.net *.split]
Mutabah has quit [*.net *.split]
ElementW has quit [*.net *.split]
fkrauthan has quit [*.net *.split]
antranigv has quit [*.net *.split]
vin has quit [*.net *.split]
pie_ has quit [*.net *.split]
TkTech has quit [*.net *.split]
vancz has quit [*.net *.split]
les_ has quit [*.net *.split]
qubasa has quit [*.net *.split]
lanodan has quit [*.net *.split]
geist has quit [*.net *.split]
JTL has quit [*.net *.split]
k4m1 has quit [*.net *.split]
torresjrjr has quit [*.net *.split]
sprock has quit [*.net *.split]
eck has quit [*.net *.split]
netbsduser has quit [*.net *.split]
joe9 has quit [*.net *.split]
FireFly has quit [*.net *.split]
outfox has quit [*.net *.split]
torresjrjr_ is now known as torresjrjr
geist_ is now known as geist
TkTech6 is now known as TkTech
fkrauthan_ is now known as fkrauthan
<zid> heat stop causing netsplits
<zid> your shrugs are too powerful
<geist> yeah heat!
<heat> thanos snap but shrug
<mrvn> "We can rebuild him. We have the technology."
elastic_dog has quit [Ping timeout: 246 seconds]
elastic_dog has joined #osdev
xenos1984 has joined #osdev
<zid> My friend made a db of intel cpus and is showing me silly stats :P
<zid> skylake has 7 sockets, LGA2011 has 114 SKUs, the cpu with the most FMA units is Intel® Core™ i7-11850HE Processor with 8
<zid> which is a mobile cpu
<mrvn> float-multiply-add?
<geist> oh as a side note i have found as definitive a list of microarches to cpuids as anywhere else i have seen: https://en.wikichip.org/wiki/intel/cpuid
<bslsk05> ​en.wikichip.org: CPUID - Intel - WikiChip
<bslsk05> ​en.wikichip.org: CPUID - AMD - WikiChip
<geist> in case were interested FWIW
eck_ is now known as eck
<zid> Sandy Bridge (Server) E, EN, EP 0 0x6 0x2 0xD Family 6 Model 45
<zid> checks out
<zid> I am a 0x6:0x2D
<heat> fused-multiply-add i guess
<mrvn> a*b+c is an extremly useful operation
<zid> So if you want some sick avx512 perf, get one of those mobile cpus, add an LN2 pot to the top
<zid> and get crunching
<geist> i remember there was some other weirdness where the early Bonnell atoms (first gen) were totally unbalanced and actually had far more vector processing than they could fill
<geist> one of those weird quirks of design
<zid> intel were doing weird things when they couldn't get 10nm working, saying "only comet lake mobiles get 10nm" and stuff
<zid> which I think is why the laptop cpus have all the avx
<zid> yield issues
kof123 has joined #osdev
<zid> smaller laptop dies in lower volumes = yes, huge 32 core desktop dies in high volumes = no
<mrvn> did intel ever build in more units than needed and then burn some fuses for the units that don't work?
maksy_ has joined #osdev
identitas has joined #osdev
sprocket is now known as sprock
Effilry is now known as FireFly
gxt has quit [Remote host closed the connection]
maksy_ has quit [Ping timeout: 246 seconds]
gxt has joined #osdev
maksy_ has joined #osdev
antranigv_ is now known as antranigv
<zid> so the results are in
<zid> buy W-3375 for your webserver, a 13900k to play dwarf fortress on, and a W-2225 for your desktop
gildasio has quit [Remote host closed the connection]
gxt has quit [Remote host closed the connection]
gxt has joined #osdev
gildasio has joined #osdev
Burgundy has quit [Ping timeout: 255 seconds]
[itchyjunk] has joined #osdev
Mutabah_ is now known as Mutabah
Ellenor has quit [Quit: Bye Open Projects!]
<kaichiuchi> clang scares me now
Vercas62 has joined #osdev
<kaichiuchi> https://godbolt.org/z/Tbcaqcsda <- switch compiler #2 to gcc and observe that the code generated will be the same as compiler #1, but observe now that clang generates what looks to be shit code for this
<bslsk05> ​godbolt.org: Compiler Explorer
Vercas6 has quit [Ping timeout: 255 seconds]
Vercas62 is now known as Vercas6