<bauen1>
nomagno: encoding, opcode, magic values everything is there
ckie has quit [Quit: off I go~]
ckie has joined #osdev
<nomagno>
bauen1: I can't find an appendix anywhere
<nomagno>
I'm looking at the latest unprivileged instruction set PDF. Am I supposed to look somewhere else?
<bauen1>
nomagno: riscv-spec-20191213.pdf page 129, apparently it's not an appendix, but rather chapter 24
<nomagno>
Found it. Thanks!
<nomagno>
That was WAY too deep into the document
<nomagno>
I've been looking for it casually for months...
<nomagno>
I kept taking a look and being like "ugh, where?"
dormito has quit [Ping timeout: 256 seconds]
<bauen1>
nomagno: yes, i was implementing an rv32i cpu in vhdl and was very confused what the actual values for e.g. OPCODE-ALU were until i found that page ...
ElectronApps has joined #osdev
dormito has joined #osdev
joe9 has joined #osdev
melonai has joined #osdev
<geist>
well, kinda makes sense. its because they introduce all the subsets of the arch
<geist>
if you put it up front it'd be even less sensical
heat has joined #osdev
masoudd has quit [Read error: Connection reset by peer]
rorx has joined #osdev
<klange>
really feeling spoiled by x87 here
<klange>
need to implement a pow(), which is a complicated mess generally done through log... which is then a complicated mess
<klange>
oh boy and with data tables
<klange>
maybe libm should not be part of my core and I should just package a good one
<klange>
but then my compositor really wants some of these things...
<Griwes>
yeah libm is absolutely one of those things I'm going to just "steal" from somewhere
<klange>
I would like to actually learn some of these neat bit fiddling tricks for doing things with doubles
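A minimal sketch of the kind of bit fiddling klange mentions, assuming IEEE 754 binary64 doubles. The helper names (`double_to_bits`, `decompose`, `scale_by_pow2`) are made up for illustration and are not taken from musl, sleef, or any other libm. It reinterprets a double as a 64-bit integer, splits it into sign, exponent, and mantissa fields, and multiplies by a power of two by editing the exponent field directly.

```python
import struct

MANT_BITS = 52      # width of the fraction field in a binary64 double
EXP_MASK = 0x7FF    # 11-bit exponent field
EXP_BIAS = 1023     # exponent bias for binary64


def double_to_bits(x):
    """Reinterpret a double as its raw 64-bit pattern (like a memcpy in C)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_double(bits):
    """Inverse of double_to_bits."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def decompose(x):
    """Return (sign, unbiased exponent, raw mantissa field) of a double."""
    bits = double_to_bits(x)
    sign = bits >> 63
    exponent = ((bits >> MANT_BITS) & EXP_MASK) - EXP_BIAS
    mantissa = bits & ((1 << MANT_BITS) - 1)
    return sign, exponent, mantissa


def scale_by_pow2(x, k):
    """Multiply x by 2**k by adding k to the exponent field.

    Assumption for this sketch: x is a normal number and the result is
    also normal; zeros, subnormals, infinities and NaNs are rejected.
    """
    bits = double_to_bits(x)
    field = (bits >> MANT_BITS) & EXP_MASK
    new_field = field + k
    if field in (0, EXP_MASK) or not 0 < new_field < EXP_MASK:
        raise ValueError("only normal inputs and normal results are handled")
    bits = (bits & ~(EXP_MASK << MANT_BITS)) | (new_field << MANT_BITS)
    return bits_to_double(bits)
```

For example, `decompose(8.0)` gives sign 0, exponent 3, and an all-zero mantissa field, since 8.0 is exactly 1.0 × 2^3.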
<moon-child>
I borrowed sleef, but I kinda want complex support
<moon-child>
which sleef doesn't have
<moon-child>
soooo at some point I'll have to learn complex analysis, I guess
<heat>
musl has a good libm
<klange>
Most of musl's libm is the same Sun code that's been floating around since the early 90s.
<moon-child>
oh they used that too?
<moon-child>
I used it at one point via openlibm
<heat>
yes
<heat>
90% of libms come from that
<heat>
it turns out dark arts are harder than what harry potter makes it seem
<heat>
so they just copy sun :P
<moon-child>
ms has a different lineage, I bet
<klange>
it turns out computers are actually really bad at math
<moon-child>
and I think dyalog apl too
<moon-child>
one of its trig functions had a different branch cut than all the popular ones
<heat>
imagine someone asks for a timer event in 100ns, how do I stop my timers from possibly never firing? in case the timer surpasses the future timestamp before I program it
<clever>
heat: yeah, i think i would have to figure out the min safe hw timer, and then busy-wait or just ignore delays below n
<clever>
the rpi hw timers are == based, rather than > based, so if you miss it by even 1 uSec (it's only a 1mhz reference), the timer won't fire for ~70 minutes
<heat>
most timers are yeah
<heat>
but if they are 64-bit, they won't fire for like 200 years
<clever>
so i would set the min delay at maybe 2 or 3 uSec
<heat>
(actually linux for instance treats a wait of ~0UL as a sleep forever)
<clever>
there is a 64bit counter for uptime, but the timer irq's are based on 32bit compares, so the upper 32bits arent part of the irq
<clever>
so a 64bit counter that counts at 1mhz, but only 4 32bit compare registers, each giving a unique irq
<clever>
and with only a 1uSec resolution, a 100nSec delay isnt possible with the VPU timers
<clever>
but i think the arm generic timer can reference off the crystal, 19.2mhz
<heat>
"A machine timer interrupt becomes pending whenever mtime contains a value greater than or equal to mtimecmp" <-- riscv solves this woohoo
<klange>
I do all of my timing functions based on absolute timestamps based on whatever high-resolution continuous counter I can get
<clever>
checking the math, a 19.2mhz signal, has a period of ~52 nSec
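The timer discussion above can be sketched roughly as follows. This is a model, not driver code: the function names are hypothetical, the 3-tick minimum delay is only the "2 or 3 uSec" guess from the chat, and a deadline more than 2^32 ticks away would need re-arming, which is not shown. It contrasts an equality-compare timer (32-bit compare against a 1 MHz counter) with a greater-or-equal compare like the quoted mtime/mtimecmp rule.

```python
COUNTER_HZ = 1_000_000   # 1 MHz reference, as described in the chat
MASK32 = 0xFFFFFFFF      # only the low 32 bits take part in the compare
MIN_DELTA_TICKS = 3      # assumed safety margin ("2 or 3 uSec")


def tick_period_ns(hz):
    """Period of one counter tick in nanoseconds."""
    return 1e9 / hz


def equal_compare_value(now, deadline, min_delta=MIN_DELTA_TICKS):
    """Pick a compare value for an == based timer.

    now and deadline are absolute 64-bit tick counts. The deadline is
    pushed out to at least now + min_delta so the counter cannot pass
    the compare value before the register has been written.
    """
    target = max(deadline, now + min_delta)
    return target & MASK32


def fires_equal(counter, compare):
    """== based hardware: fires only on an exact match of the low 32 bits."""
    return (counter & MASK32) == compare


def fires_greater_equal(mtime, mtimecmp):
    """>= based hardware: a deadline already in the past still fires."""
    return mtime >= mtimecmp


def missed_compare_delay_minutes(hz=COUNTER_HZ):
    """How long a missed == compare takes to come around again."""
    return (1 << 32) / hz / 60
```

With these numbers a missed equality compare comes around again after about 71.6 minutes, and a 19.2 MHz reference gives a tick of roughly 52.08 ns, in line with the figures quoted in the chat.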
rorx has quit [Ping timeout: 256 seconds]
ravan has joined #osdev
lg has quit [Ping timeout: 240 seconds]
lg has joined #osdev
<klys>
this allwinner nezha board, I got an image onto my hdmi monitor, except the resolution is still too narrow. it came with a script called ~sipeed/test_lcd.sh which enables the video. I've tried making some xorg.conf additions, yet they occupy the left 1/3 of my (loud) monitor. I tried to run this program: https://github.com/hglm/a10disp/blob/master/README ;except it seems to require a newer driver at
<bslsk05>
github.com: a10disp/README at master · hglm/a10disp · GitHub
<klys>
linux/drivers/gpu/drm/sun4i/ than was supplied with kernel v5.4.61. to complicate matters, the debian uboot and linux-image packages are not installed here. they're part of an 8-partition gpt mockup on the supplied 32gb microsdcard. partition 8 is 20gb of "data," so I'd better back that up before formatting. partition 7 is linux, and the kernel and uboot scripts appear in other partitions.
<heat>
i think clang started generating bad code because of an infinite loop
rorx has joined #osdev
<geist>
yesss infinite loops are UB now
<geist>
you can't rely on clang doing anything with them. it is free to just disregard them.
<geist>
i have learned this the hard way
sdfgsdfg has joined #osdev
<heat>
yeah I know they're UB I just never expect the compiler to act on it
<heat>
gcc didn't, but clang started triggering a random ubsan call
<geist>
yep. it's fairly new IIRC
<heat>
added an inline wfi into the loop and everything went back to normal
<heat>
but hey i have timing now
<heat>
just need IRQ and then im probably done with the CSRs I can interface with lol
<geist>
which is IMO completely broken. it's one thing to say 'its UB' but another one to literally yolo it
<heat>
the "j ." completely disappears
<geist>
yep. i dont see why that's okay, especially since it doesn't even warn about it
<heat>
and it probably falls into some random code and triggers a random ubsan out of nowhere
<geist>
yah, or simply falls through to the next function
<heat>
they should add a sanitizer call honestly, or a ud2/unimp
<geist>
exactly. i saw it on arm at work and was saved by padding between functions being BKPT
<geist>
but if it was x86, the paddings are usually nops
<heat>
you can choose
<heat>
I think clang is very int3 happy
<clever>
would something like `for(;;) { __asm__ __volatile__ ("sleep" :::); printf("sleep interrupted!\n"); }` still be considered UB by clang?
<heat>
clever, no
<geist>
no, the instant it has anything actually in it it's fine
<geist>
even an empty asm is fine
<clever>
ah, *looks*
<heat>
I did while(true) { __asm__ __volatile__("wfi"); }
<geist>
side note: if it has no in/outs you dont need volatile
<geist>
though it's obviously fine to leave it in. just one of those things i learned
<heat>
something I started doing right now: compiling the kernel with full LTO regularly while deving
<heat>
my x86 kernel (or the full kernel, really) has issues with LTO and this is always helpful
<clever>
yeah, even my old hang_cpu had a nop in it
<heat>
geist: something I just thought about: is it possible for TCG to be faster than the current RISCV boards? did you ever test it?
<geist>
yes, and no it's not
<geist>
well, i mean it might be faster than say a 20mhz dev board
<geist>
my experience is on my fairly high end ryzen a TCG emulated anything is about say 100Mhz equivalent
<geist>
though it depends obviously on the work load
<heat>
i'm looking at phoronix's benchmarks on the unmatched and it's a couple of times slower than an rpi 400
<geist>
yah that's about right. it's roughly cortex-a53 class
<geist>
and a rpi 400 is a72
<geist>
U74 cores from sifive are basically in order dual issue cpus, much like cortex-a53
<heat>
hey I have an a53
<heat>
it was 15 euro :P
<clever>
random semi-ontopic question, can you even detect an out-of-order cpu from measuring performance alone?
<clever>
if its doing its job properly, it should be functionally identical to an in-order core
<clever>
dual-issue is at least observable, when 2 opcodes take the same time as 1 opcode
<heat>
does superscalar not imply ooo?
<heat>
they call it superscalar
<clever>
the wikipedia page for superscalar looks more like dual-issue than ooo
<heat>
probably
<heat>
what's stopping sifive from making a high performance (lets say a72 to be nice) cpu? wouldn't the only difference be the ISA?
<heat>
(as you may have noticed, I know almost jack shit about CPU design)
<klange>
I was going to say licensing, but ARM massively rolled back their license and royalty fees specifically because of riscv.
<clever>
heat: one fun problem to try and wrap your head around, is register renaming, and scheduling for values from the future
<klange>
it's probably still licensing, tho
<clever>
heat: basically, if you use r0 for time 0-3, and then you immediately overwrite r0 with a different value for t4-6
<clever>
the cpu could internally rename r0 to 2 different internal registers (lets call them a and b)
<clever>
and then do both of those computations in parallel
<heat>
yes but my question is: why can't sifive apply already existing knowledge (because those problems have been solved) to their CPU designs? is it just licensing?
<clever>
heat: but also, if you load into r0, act on it, then overwrite r0, ooo + reg renaming, would allocate reg-a to hold the load from ram, and reg-b to do the 2nd computation
<clever>
and then the reg-a part can stall out, and reg-b can compute out of order
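A toy model of the renaming clever describes, as a sketch only. The instruction format `(dest, srcs)`, the register counts, and the function name `rename` are all invented here; physical registers are never returned to the free list, which a real core would do when instructions retire. Every write to an architectural register gets a fresh physical register, so two unrelated uses of r0 stop colliding.

```python
from collections import deque


def rename(program, num_arch=4, num_phys=8):
    """Rename architectural registers to physical ones.

    program is a list of (dest, srcs) tuples using architectural
    register numbers. Returns the same list in physical registers.
    """
    # Start with architectural register i living in physical register i.
    mapping = list(range(num_arch))
    free = deque(range(num_arch, num_phys))
    renamed = []
    for dest, srcs in program:
        # Sources read whichever physical register currently holds the value.
        phys_srcs = tuple(mapping[s] for s in srcs)
        if not free:
            raise RuntimeError("no free physical register; a real core would stall")
        # Every write gets a fresh physical register.
        phys_dest = free.popleft()
        mapping[dest] = phys_dest
        renamed.append((phys_dest, phys_srcs))
    return renamed


# r0 = f(r1); r2 = f(r0); r0 = f(r3); r1 = f(r0)
example = [(0, (1,)), (2, (0,)), (0, (3,)), (1, (0,))]
```

Running `rename(example)` puts the first r0 in physical register 4 and the second in physical register 6, so the third instruction no longer has to wait for the first two to finish with r0.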
<clever>
yeah, i think the real question there, is the verilog for such things out in the public?
<geist>
heat: superscalar simply means it can potentially run > 1 instruction per cycle
<geist>
ie, the IPC > 1 or CPI < 1
<geist>
so dual or triple or whatnot issue, but in order can still be superscalar
<geist>
OOO is simply a technique to get farther
<clever>
just barely in-order, parallel is not yet crossing the border of ooo
<geist>
right, parallel has really nothing to do with in or out of order
<geist>
ooo just gives you more options
<clever>
superscalar feels like executing 1, 23, 45, 6, 7
<clever>
while ooo is more like 2, 1, 3, 5, 4, 7, 6
<geist>
what are those numbers?
<clever>
the order of the opcodes in memory
<geist>
uh. well that's literally not correct
<clever>
and each , is a clock cycle
<geist>
superscalar simply means it can achieve > 1 instruction per cycle. doesn't say how it does it
<clever>
yeah, thats what the 23 and 45 are, 2 opcodes in the same clock
<geist>
oooh i was thinking it was like opcode 23
<geist>
in your example it would be like
<clever>
and some of them have data dependencies and are falling back to 1 per clock
<geist>
1 23 45 6 7 and for ooo it'd be like 1 25 367 4
<clever>
yeah, the ooo was non-superscalar to make it simpler
<clever>
you can also combine both, to make it even more muddy
<geist>
anyway. to heats question: why can't sifive just do all the best things
<geist>
answer is, they are, it takes time
<geist>
it's like writing a compiler from scratch. you can know where you wanna get, but you can't just do everything the first time
<geist>
so they're basically crawling up through the tech tree and 'catching up'
<clever>
yeah
<geist>
as ARM was for years, and say apple was for the last 10-12 years, etc
<geist>
the 8x series and whatnot that sifive has announced start to get into cortex-a7x range
<geist>
and yes it seems a bit fishy that they chose a numbering convention close to ARM and specifically chose to inflate it by one notch
<geist>
ie, sifive 7x stuff is about like arm 5x, sifive 5x is about like arm 3x, etc
<heat>
i was imagining it would be something like writing a kernel where if you've written a fast, complete VM subsystem you can write another fast, complete VM subsystem relatively quickly
<heat>
since you already know the inner workings of one
<clever>
i was thinking, you can simulate a given stream of opcodes on the verilog, and see how many clocks it takes
<clever>
and just not turn it into silicon until you meet some arbitrary goal
<geist>
well, it's a series of techniques that you get to achieve more parallelism
<geist>
i think what's hard to envision without knowing the details is how much harder it is to produce N+1
<geist>
the complexity of modern cpu designs shoots up by almost an order of magnitude over time
<geist>
at least crawling from simple in order designs to dual issue to superscalar is a huuuge jump of complexity
<geist>
of course something like riscv should be relatively easy to implement, so at least it has that going for it
<clever>
for dual-issue, it seems simple enough to compare the operands of 2 opcodes, and determine if they are conflicting or not
<geist>
yep
<clever>
so you test opcode 1/2, and if they are conflicting, you run just 1, then test 2/3 on the next cycle
<geist>
hence why that sort of tech has been around since probably late 70s/80s
<clever>
and the sliding window will sometimes consume 2 opcodes at a time
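The pairing rule clever describes can be sketched like this. It is a simplified model, not how any particular core works: instructions are `(dest, srcs)` tuples, the function names are invented, and the only conflicts checked are the second instruction reading or writing the first one's destination.

```python
def conflicts(first, second):
    """True if `second` cannot issue in the same cycle as `first`.

    Checked hazards: second reads first's result (read after write) or
    both write the same register (write after write).
    """
    first_dest, _ = first
    second_dest, second_srcs = second
    return first_dest in second_srcs or first_dest == second_dest


def issue_groups(program):
    """Slide over the program in order, issuing one or two instructions per cycle.

    Returns one tuple of instruction indices per clock cycle.
    """
    groups = []
    i = 0
    while i < len(program):
        if i + 1 < len(program) and not conflicts(program[i], program[i + 1]):
            groups.append((i, i + 1))   # dual issue
            i += 2
        else:
            groups.append((i,))         # fall back to single issue
            i += 1
    return groups


# Seven instructions; the 2nd depends on the 1st and the 7th on the 6th.
example = [
    (1, (0,)), (2, (1,)), (3, (0,)), (4, (0,)),
    (5, (0,)), (6, (0,)), (7, (6,)),
]
```

For `example` this yields five cycles for seven instructions, an IPC of 1.4, grouped as 1, 2+3, 4+5, 6, 7 when counting instructions from one.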
<clever>
but when you get into triple and quad-issue, comparing that many at once becomes more complex
<heat>
isn't out of order in x86 since the pentium or so?
<geist>
i think i remember reading that say a CDC6400 supercomputer was superscalar in a fairly similar way
<clever>
and you may need register renaming, to prevent collisions
<geist>
heat: ppro. pentium was dual issue. u and v pipeline
<geist>
ppro was the real jump forward, though it had some issues that weren't really resolved until pentium 2
<geist>
and about the same time K5 and K6 from AMD
<geist>
K5 had great ideas but poor execution, but K6 was a solid design
<clever>
i feel like the cpu would need to dynamically translate an opcode stream into SSA form on the fly
<clever>
and for each assignment, pick a new register from a ring-buffer of registers
<geist>
if you're really curious how it works, look up Tomasulo algorithm
<clever>
and manage dependency chains, for if a given reg doesnt have a result yet, and forms a dep chain
<geist>
it's AFAIK still pretty much the basis for how register allocation and collision detection is done on modern designs
<geist>
invented in 1967
<geist>
if you hear things like 'reservation stations' and whatnot, that's part of tomasulo
<geist>
but i think what happens is as you have larger and larger register files and the number of instructions you're willing to track goes up you get this exponential increase in the size of silicon needed to pull it off
<geist>
hence why modern OOO designs can be orders of magnitude larger than simpler ones
<geist>
hence why you get big.LITTLE stuff, since you can fit the 'little' cores in a quarter of the size of a big one and get a corresponding drop of power usage, etc
<mxshift>
POWER4 was 5 issue with something like 250 instructions in flight. Wasn't a surprise IBM went back to in order for a few generations
<geist>
yep. and that's where we're at now, though we have new cores like M1 that are ridiculously parallel
<clever>
geist: ah, so your big.LITTLE design, could both be implementing the exact same isa, with the same msr's, but one is more ooo, which runs more opcodes/clock, but uses more gates
<geist>
right
<geist>
so the little cores run fewer instructions per cycle (IPC is lower) but they do it at much lower power
<geist>
so you get a bit of both worlds
<clever>
would you only use big or little, or would it be more like 2 little and 4 big, and you run on all 6 at once?
<geist>
so maybe the little core is say 2x slower than a big core but uses 1/3 the power
<clever>
and the kernel has to schedule to the right core?
<geist>
the latter
<heat>
aren't you bound to be more parallel if instructions are simpler?
<clever>
i did also hear about how some x86 big.little stuff has been breaking drm
<clever>
because the drm gets upset when the cpuid changes in the middle of running a thread
<geist>
heat: possibly, though of course the whole 'break cisc instructions into smaller riscy ones' has been going on forever
<geist>
which mitigates that greatly
<geist>
but it still requires more transistors to decode the complex isa
<geist>
ie, the 'front end' is more complicated
<geist>
by the time you get to the mid and back end of the cpu the ISA is mostly irrelevant and it's the same thing
<geist>
but.. yes on a simpler cpu design, the cost of the ISA starts to kick in more, since you're spending more and more transistors relatively speaking for instruction decoding
<geist>
and if you're not just transcoding the instructions internally into riscy uops, then you have to deal with the complex isa all the way through
<heat>
clever, I don't think the x86 big.little cores have different cpuids
<heat>
but I might be wrong
<clever>
might be something else, dont remember the exact details
<geist>
i need to actually buy an alder lake so i can verify it
<kingoffrance>
so some cpus might be illegal under the dmca </sarcasm>
<geist>
i think the answer is cpuid does change between cpus, but it always has, it just changes in areas that you usually dont care about (apic id, etc)
<geist>
what should be the same is the feature set of the cpu
<geist>
same with ARM: big.LITTLE systems mandate that the cores must have the same feature set
sprock has joined #osdev
<heat>
"Denuvo is mistaking the E-cores as being from a different system altogether, resulting in crashes and failures to load properly. "
<geist>
yah probably just some code that has to be updated
<heat>
I remember reading some lkml patches where some vendor(intel? arm?) wanted to have CPUs that can't execute 32-bit compat code and the kernel had to carefully select the core each 32-bit thread would run on
<bslsk05>
www.kernel.org: Asymmetric 32-bit SoCs — The Linux Kernel documentation
<geist>
aaah
<geist>
yep got it. makes sense. i think the newer armv9 cores are 64bit only, but could possibly still pair it with an armv8 a55? hmm, that wouldn't make sense
<geist>
must be some armv9 cores that are that way
<heat>
any consumer armv9 yet?
<geist>
not that i know of
<geist>
but the pipeline takes a few years to make it out to users
<geist>
for the most part it's my understanding that armv9 is *mostly* a snapshot of armv8.5 + mandated SVE vectors
<mxshift>
Wasn't there talk about x86 big.LITTLE where only big had AVX512
<geist>
yep. that's precisely the case with alder lake right now
<geist>
some amount of controversy
<geist>
since i think officially the AVX512 is disabled because the littles (E cores, efficiency) dont have it
<geist>
but apparently in some of the vendors mobos you can disable the E cores and then you get an option to enable AVX512
<geist>
so basically the hardware is present in the P cores, but disabled by bios
<mxshift>
Ah
<mxshift>
Otherwise you'd have different feature flags in cpuid
<geist>
right, that's a prime example of the kernel needing to deal with it. it'd have to detect an illegal opcode on an E core, then force migrate the thread to a P core and set a bit that causes it to schedule there
Jari-- has joined #osdev
<heat>
oh no, not avx 512, what will we do without avx512!
<geist>
of course, that'd be totally doable.
<geist>
i mean i'd be happy writing code on zircon or whatnot that did that. sounds fun
<heat>
it'd be fun until you realise you need to decode x86 instructions :P
<Jari-->
morning.com *.EXE
<geist>
heh that's true. though i think AVX instructions are pretty easy
<heat>
and pair them with a cpuid bit
<geist>
that being said the P cores on alder lake are pretty quick. intel definitely did a pretty good job there
<mxshift>
Meh. Pretty often kernels end up needing an instruction decoder for emulating instructions in various cases
<geist>
yep
<geist>
now i do wonder if it's easy to mask off AVX512 in particular
<geist>
though of course there's like 10 separate subsets of AVX512 now
<geist>
and i guess you'd have to do a proper subset test
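A sketch of the subset test and the scheduling decision discussed above. The feature bits here are hypothetical placeholders, not real CPUID bit positions, and the function names are invented. It shows what a kernel could compare once it has a feature mask per core: the set common to all cores, and which cores can run a thread that needs a given set.

```python
# Hypothetical feature bits, for illustration only (not real CPUID layout).
FEAT_SSE2 = 1 << 0
FEAT_AVX2 = 1 << 1
FEAT_AVX512F = 1 << 2
FEAT_AVX512VL = 1 << 3


def is_subset(required, available):
    """True if every required feature bit is present in `available`."""
    return (required & ~available) == 0


def common_features(per_core):
    """Features every core supports: the AND of all per-core masks."""
    if not per_core:
        raise ValueError("need at least one core")
    common = per_core[0]
    for features in per_core[1:]:
        common &= features
    return common


def eligible_cores(required, per_core):
    """Indices of cores a thread needing `required` may be scheduled on."""
    return [i for i, features in enumerate(per_core) if is_subset(required, features)]


# Two "P" cores with the AVX512 bits, two "E" cores without them.
P_CORE = FEAT_SSE2 | FEAT_AVX2 | FEAT_AVX512F | FEAT_AVX512VL
E_CORE = FEAT_SSE2 | FEAT_AVX2
cores = [P_CORE, P_CORE, E_CORE, E_CORE]
```

Advertising only `common_features(cores)` corresponds to the "disable AVX512 everywhere" approach, while `eligible_cores` corresponds to the migrate-on-illegal-opcode idea, where a thread that used AVX512 is pinned to the cores that have it.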
<heat>
and who knows what differences might exist in the future
<heat>
totally breaks forwards compat with future cpus
<geist>
yah though i think you could opt into enabling them. i think
<geist>
there are control bits to disable avx512 or so, though i dont know at what granularity you can
<geist>
at first glance it looks like avx512 stuff is mostly in the EVEX space and whatnot, which is its own prefix
<geist>
makes sense, because they doubled the number of regs from 16 to 32
* moon-child
still has to learn avx512 encodings
<moon-child>
avx2 one was clever
<moon-child>
though i didn't get why you have to invert the aux register number
Jari-- has quit [Remote host closed the connection]
ThinkT510 has joined #osdev
GeDaMo has joined #osdev
dormito has joined #osdev
nyah has joined #osdev
sdfgsdfg has quit [Quit: ayo yoyo ayo yoyo hololo, hololo.]
lg has quit [Ping timeout: 272 seconds]
qubasa_ has joined #osdev
lg has joined #osdev
qubasa_ is now known as qubasa
xenos1984 has quit [Remote host closed the connection]
xenos1984 has joined #osdev
<g1n>
hello
dennis95 has joined #osdev
<mrvn>
I think cpuid needs to be extended for asymmetric cores, preferably with a user configurable part. Something to say: We have avx512 but don't want to use it just for fun.
<mrvn>
If linux handles asymmetric cores well maybe I can port it to my Amiga: MC68060 + PPC604e. 2 really asymmetric cores.
vdamewood has quit [Read error: Connection reset by peer]
vdamewood has joined #osdev
srjek|home has joined #osdev
[itchyjunk] has joined #osdev
ElectronApps has quit [Remote host closed the connection]
srjek has quit [Ping timeout: 240 seconds]
niedzejkob[m] has joined #osdev
<niedzejkob[m]>
I'm looking to collect some data on various osdev-relevant BIOS and vendor idiosyncrasies. What would you measure, apart from the CS:IP combination used to jump to 7C00, what A20 methods work, and whether a partition table is necessary in the MBR to boot?
<zid`>
much more work but, e820 map
<zid`>
usb boot
<zid`>
hybrid 9660
<niedzejkob[m]>
I'm not too familiar with 9660, what do you mean by that?
<zid`>
cd-rom
<niedzejkob[m]>
yeah, but "hybrid"?
<zid`>
some bioses will boot ISO9660 filesystems off hard drives, some will let you boot a floppy image embedded into a cd-rom, blah blah
<niedzejkob[m]>
also, I assume usb boot means "can it boot from usb"?
<zid`>
yea
<zid`>
that's obviously a lot more code, it became de-facto around 2005
<bslsk05>
wiki.syslinux.org: Isohybrid - Syslinux Wiki
mahmutov has joined #osdev
<mrvn>
EFI data would be more relevant today: can it still BIOS boot, can secure boot be disabled, can you enter your own keys, does the EFI brick itself if the key store overflows?
<mrvn>
Can it boot FAT32 from USB and FAT16 from SATA?
<mrvn>
Support for M2.key
Terlisimo has quit [Quit: reboot]
the_lanetly_052 has joined #osdev
the_lanetly_052_ has quit [Ping timeout: 252 seconds]
Terlisimo has joined #osdev
vdamewood has quit [Read error: Connection reset by peer]
FreeFull has joined #osdev
vdamewood has joined #osdev
xenos1984 has quit [Read error: Connection reset by peer]
terminalpusher has joined #osdev
<Bitweasil>
... if I never have to burn another floppy image to a CD-ROM to update a BIOS, it'll be too soon.
<mrvn>
Better than having to hot-plug the bios chip to fix a bricked bios
<Bitweasil>
Yeah, won't argue there.
<zid`>
Thankfully my ME can do it over USB
<zid`>
So unless the ME is bricked my bios is safe
<mrvn>
It's unfortunate that so few boards have a ROM and flash and a jumper to pick
<mrvn>
or a mini bios that allows flashing and then the real one
<zid`>
Good ol 2011 technology, external ME :P
<gog>
i've had dual-bios rescue a seemingly bricked computer before
<mrvn>
Firmware on SD like the RPi has is nice too
<gog>
got it for nothing because the previous owner thought it was totally dead
<zid`>
I had an ancient gigabyte with QUAD BIOS
<zid`>
because it'd eat its bios so often
<gog>
iirc this was a gigabyte mobo
<gog>
and it later corrupted its main bios again somehow
<zid`>
Quad BIOS is a feature unique to GIGABYTE that includes DualBIOS™ and Express BIOS Rescue Technology. This combination delivers a safety assurance mechanism that sports a total of 4 copies of BIOS distributed between the Flash ROM, hard-disk and driver CD.
<zid`>
* (because the bios software will continually corrupt itself and we couldn't fix it, so we spent literal dollars adding more hardware)
<Bitweasil>
lol, yeah... at some point, you have to wonder *why* some of this stuff is being done.
dennis95 has quit [Quit: Leaving]
<Bitweasil>
I totally get dual BIOS and a toggle if you're doing hardcore overclocking.
<gog>
yeah
<gog>
that's a sensible thing, but having to have a backup of the backup of the backup "just in case" seems like uh
<gog>
wtf you doin
* zid`
holds down the usb flashback button instead
<zid`>
I kept meaning to write tetris-from-cache and burn it to my bios chip but never got around to it because I cbf to reboot my desktop
<zid`>
so the machine just POSTS into tetris instead
<gog>
lol nice
masoudd has joined #osdev
the_lanetly_052 has quit [Ping timeout: 272 seconds]
<gog>
> The systems programmer has read the kernel source, to better understand the deep ways of the universe, and the systems programmer has seen the comment in the scheduler that says “DOES THIS WORK LOL,”
<gog>
big gog energy
<GeDaMo>
"There is nothing funny to print when you have a misaligned memory access, because your machine is dead and there are no printers in the spirit world."
<zid`>
No but hell is full of them
heat has joined #osdev
<mrvn>
GeDaMo: why would a misaligned memory access make the machine dead?
<GeDaMo>
It's a joke
<mrvn>
don't give up your day job
<GeDaMo>
I didn't write it, I was quoting
terminalpusher has quit [Remote host closed the connection]
<sham1>
mrvn: at least older ARM chips and other arches like Motorola 68ks didn't like unaligned accesses at all
<sham1>
And it could indeed kill your machine in the worst case
<bslsk05>
en.wikipedia.org: Killer poke - Wikipedia
<gog>
halt and catch fire
<heat>
GeDaMo, the efivars thing has nothing to do with the rest lol
<heat>
i know it's technically against the UEFI spec but it's like deleting your whole root partition and being surprised it doesn't boot
<mrvn>
sham1: unlike 68000 - 68020 on basically any other cpu an unaligned access is perfectly defined and recoverable. You just need the exception handler.
<mrvn>
heat: the UEFI paragraph is also missing a much more surprising bug: If you store efivars and it runs out of space the system doesn't even power up enough to enter efi anymore.
sprock has joined #osdev
<heat>
mrvn, never heard of that one
<heat>
which boards?
<zid`>
I've heard of similar bugs
<zid`>
bios boot it and reflash uefi firmware to not suck I guess
<heat>
you can't bios boot it
<heat>
CSM works on top of UEFI
<mrvn>
heat: can't remember
<Bitweasil>
That doesn't surprise me in the slightest.
matrice64 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
matrice64 has joined #osdev
<heat>
sounds like crap IBV code to me
<mrvn>
Infectious bronchitis virus?
matrice64 has quit [Client Quit]
<heat>
independent bios vendor
<heat>
the guys that write more crap on top of tianocore before it reaches you
<heat>
AMI for instance
<Bitweasil>
Is anyone else just amazed computers even boot anymore? :/
<Bitweasil>
With all the... poorly written kludges piled on top of everything else?
<mrvn>
works-for-me works pretty well for bioses
<gog>
if it works on one end leave it alone and just kludge atop or alongside it
<gog>
or you risk breaking it entirely
<gog>
the balance of every ecosystem is fragile
<mrvn>
Gog: I'm more and more on the side of: break it entirely (and do it right)
<gog>
b-b-but muh backward compatibility
<gog>
"backward" indeed :|
<heat>
actually it's not that fragile if you test it
<mrvn>
b-b-ut make clean; make all
<heat>
a nice chunk of firmware is unit tested for example
<heat>
your (linux) kernel is fuzzed and unit tested and stress tested too
<mrvn>
and yet for 20+ years the error recovery in scsi/sata has been totally broken in most cases
matrice64 has joined #osdev
matrice64 has quit [Client Quit]
xenos1984 has quit [Remote host closed the connection]
xenos1984 has joined #osdev
immibis has quit [Ping timeout: 256 seconds]
sdfgsdfg has joined #osdev
sdfgsdfg has quit [Client Quit]
Jari-- has quit [Remote host closed the connection]
<heat>
mrvn, maybe you can help then :)
Raito_Bezarius has quit [Ping timeout: 260 seconds]
<heat>
testing efforts in linux besides the "hey it works on a bunch of 24/7 systems so it's totally solid" is relatively new and is an ongoing effort
eroux has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<mrvn>
heat: to fix it I would have to start at the hardware level.
<mrvn>
The kernel should be able to power cycle harddisks on hard errors.
<heat>
you can power cycle the controller cant you?
<mrvn>
nope
<heat>
what does "reset the controller" mean then?
<mrvn>
you send a command to the controller telling it that it should reset itself
<mrvn>
the command then gets a timeout because, guess what, the firmware on the drive has crashed. deadlock
<geist>
ahci has a fair amount of ability to whack the device though
<geist>
at least there
<geist>
though yeah you can't drop hard power to it
<mrvn>
from experience the reset port command does absolutely nothing. If a drive goes into an error state it just stays stuck there.
* geist
nods
<mrvn>
Not sure if it's even meant to fix any more than the local SATA controller
<geist>
oh eep, when i set up my new mac laptop i forgot to mark the src subvolume as case-sensitive
<geist>
probably just to be difficult there are a few files within the linux source tree in the same dir that differ with just their case
<heat>
ah what
<j`ey>
yes :|
<heat>
aren't they all lowercase except docs and makefiles?
<geist>
nope
<geist>
and worse there are some files that literally have two separate cases in the same dir
<geist>
i just wiped over it so i can't tell you what the name was
<geist>
something like xt_CONN... i think
joe9 has quit [Ping timeout: 240 seconds]
<j`ey>
some netfilter stuff
Raito_Bezarius has joined #osdev
<geist>
yah looking now. some of those headers in the uapi dir i think
<geist>
xt_CONNMARK.h and xt_connmark.h
<geist>
and they're actually different. *shrug*
<geist>
anyway if you git clone that onto a case insensitive fs you always get a differing file
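The collision geist ran into can be checked for ahead of time with something like the sketch below. It assumes the input is the list of names in one directory, and it uses Python's `str.casefold()` as a stand-in for the filesystem's folding rules, which may differ (for example in Unicode normalization). The function name is invented.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names from one directory that differ only by letter case.

    Returns a sorted list of sorted groups; names without a clash are
    left out.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


listing = ["xt_CONNMARK.h", "xt_connmark.h", "xt_mark.h", "Makefile"]
```

Here `case_collisions(listing)` reports the `xt_CONNMARK.h` / `xt_connmark.h` pair, which is the pair that cannot both exist on a case-insensitive volume.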
<mrvn>
geist: shouldn't git notice the file exists?
<geist>
yah but it gets upset about it, since only one instance ends up on the fs
<mrvn>
clone should fail
<geist>
*shrug*
<geist>
go fix git
<geist>
i mean it does notice it. it fucks up,but it has no solution to it
<kazinsal>
having not actually done an ahci driver, does ahci support PIO at all or is it purely DMA?
<mrvn>
obviously.
<kazinsal>
mostly thinking for a "the box is extremely hosed and I can't even trust that interrupts will work right" scenario where you still want to blat some logs or core or whatever to disk
<geist>
you cannot PIO pull data out of AHCI, no. the transfers are intrinsically DMA, descriptor based
<kazinsal>
gotcha
<geist>
you can queue PIO commands to the device itself, but it seems that AHCI knows this and does it for you
<geist>
but you can spin on the hardware if you want. you dont need irqs to send data
<kazinsal>
right, will put that on the to-do list, thanks
<heat>
fun fact: tianocore is interrupt-less
<geist>
yah i need to finish my driver. i got it basically working, sends ID command, but then i got side tracked adding more features to the PCI bus so i could test it on ARM
<kazinsal>
once in a while I forget that I've mostly given up on actually doing any "modern" osdev and have a thought that pops into my head like "ooh this could be useful"
<geist>
all in all the basic AHCI transfer mechanism is fairly straightforward i can now say with some authority
<bslsk05>
www.theregister.com: Intel to license x86 CPU cores for use in custom processors • The Register
<geist>
oooh maybe they'll finally make a 64bit only variant
<mrvn>
geist: would that make much of a difference?
<geist>
if i were licensing it for my project i'd say 'can you remove all that old trash and just start the cpu in 64bit mode' also 'give me a way to run in 64bit mode with no MMU enabled'
<geist>
i dunno!
<kazinsal>
it'd at least reduce the boot time complexity somewhat
<geist>
but if you were using it for some sort of embedded controller thing a 64bit only x86 would be nice
<mrvn>
I'm all for starting it in long mode but dropping the rest probably causes tons of bugs
<geist>
exactly
<geist>
dunno what it'd actually net in terms of silicon savings
<geist>
probably not substantial for a large core but may be actually useful for a smaller core
<mrvn>
You would have to create some identity page tables for the MMU on powerup
<geist>
or add a new mode that lets you run without paging enabled
<mrvn>
==> bugs
<geist>
well, sure, but there'd be bugs alone from just mucking with ripping out old parts
<geist>
i'd kinda hope intel would try to test it
<mrvn>
geist: keep the old parts, change just a bit to start in long mode with the MMU using page tables from rom.
<geist>
and yes yes intel has made buggy hardware. but at some point you hope they have the basic ability to deliver a product if you're going to pay for it
* geist
shrugs
<geist>
just a wish list
<geist>
i'm sure they wont do it
<mrvn>
geist: when hell freezes over
<geist>
likely. requiring page tables for 64bit i still dont completely understand precisely why AMD mandated it
<kazinsal>
yeah, it's a bit of an odd one
<mrvn>
geist: saves silicon. you never want to run without page tables at all other than boot and 16/32bit handles that.
<geist>
yeah i guess
<geist>
seems like they coulda just done it that you set the long mode bit and then the next call into a 64bit CS does it
<mrvn>
geist: They could have supported mega huge pages in L4 tables though. So you only need one page to make a mapping.
<geist>
but you have the long mode as the arm bit, and then the PG bit as the activation, which is strange
<geist>
yeah, though i dont remember if with 5 level paging they let 4th level pages work
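For a sense of how small the mandatory boot-time page tables can be, here is a rough sketch of an identity map built from 1 GiB pages: one PML4 page plus one PDPT page cover the first 512 GiB. The function name and parameters are hypothetical, and it assumes the CPU supports 1 GiB pages (not all x86-64 parts do) and that both tables are zeroable 4 KiB-aligned pages.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT  (1ull << 0)
#define PTE_WRITABLE (1ull << 1)
#define PTE_HUGE     (1ull << 7)   /* PS bit: a PDPT entry maps a 1 GiB page */
#define ENTRIES      512

/* Hypothetical helper: fill two 4 KiB tables so that virtual == physical
 * for the first 512 GiB. pdpt_phys is the physical address of pdpt. */
void build_identity_map(uint64_t *pml4, uint64_t *pdpt, uint64_t pdpt_phys)
{
    assert((pdpt_phys & 0xfff) == 0);
    for (int i = 0; i < ENTRIES; i++) {
        pml4[i] = 0;
        /* entry i maps the i-th gigabyte onto itself */
        pdpt[i] = ((uint64_t)i << 30) | PTE_PRESENT | PTE_WRITABLE | PTE_HUGE;
    }
    /* only the first PML4 slot is populated; it covers 512 GiB */
    pml4[0] = pdpt_phys | PTE_PRESENT | PTE_WRITABLE;
}
```

With huge pages allowed at the top level, as mrvn suggests, the second table would not be needed at all.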
<mrvn>
geist: I believe that's backward compatibility, squeezing the feature in along the existing cruft.
<mrvn>
geist: remember also that you are supposed to go 16 -> 32 -> 64. You can go 16 -> 64 but that's not official.
<geist>
seems like they were using the long mode bit as the arm, and PG as the trigger. but it begs the question why you needed two stages
<geist>
but whatever not a thing i really want to waste brain cycles on today
<mrvn>
Might also have something to do with going back to running 16/32bit code from long mode.
<geist>
possibly
<mrvn>
The trigger part mirrors what happens when you return from 32bit code
<geist>
so my mission for today: try to actually start implementing EL1 stack overflow
<geist>
stack overflow detection. it's surprisingly tricky and AFAIK only linux solves it
<mrvn>
geist: don't you just insert a guard page?
<geist>
see above: surprisingly tricky. that would be *far* too easy
<geist>
think about it a bit. what happens if you fault on the guard page?
<mrvn>
geist: thinking ARM32 here: the mode switches and you run with shadowed registers.
<geist>
right. arm64 is different
<mrvn>
does arm64 dump something on the stack?
<geist>
no
heat_ has joined #osdev
<mrvn>
anyway, detection should still just be a guard page. Handling it then is the tricky part.
GeDaMo has quit [Remote host closed the connection]
heat has quit [Read error: Connection reset by peer]
<geist>
it saves the exception syndrome in ESR_EL1, the return address in ELR_EL1, and the fault address in another register which escapes my mind
<geist>
aaaand switches to SP_EL1 <- this is the key
<mrvn>
so you then have to check the stack register without corrupting any other register.
<geist>
right
<geist>
linux does this clever trickery where they essentially do the xor trick to swap SP_EL1 with x0, and then do math on x0 to see if it overflowed based on implicit alignment of the stack
<mrvn>
in my kernel I save registers via a per-thread register into the task structure instead of the stack. So I really don't need the stack to handle exceptions or interrupts.
<bslsk05>
github.com: linux/entry.S at master · torvalds/linux · GitHub
<geist>
ie, all stacks are say 8K long, and must be 8K aligned in a 16K region so you can do a bit test to see if it overflowed
<geist>
but i dont like that solution so i've concocted something different, involving SP_EL0
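A sketch of the alignment trick described above, modelled in plain C. It assumes stacks are `THREAD_SIZE` (8K) long and aligned to twice that (16K), so any SP inside the stack has bit `THREAD_SHIFT` clear and an SP that has run off the bottom into the next 8K has it set. The second function models getting at SP with a single general register and no memory, using an add/sub variant of the swap trick; the names are made up for illustration and this is not a copy of the linked entry.S.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define THREAD_SHIFT 13                      /* 8K stacks, as in the example */
#define THREAD_SIZE  (1ull << THREAD_SHIFT)
#define THREAD_ALIGN (2 * THREAD_SIZE)       /* stack base is 16K aligned */

/* Valid SPs lie in [base, base + THREAD_SIZE), where bit THREAD_SHIFT is 0.
 * After overflowing by less than THREAD_SIZE, the bit reads 1. */
bool stack_overflowed(uint64_t sp)
{
    return (sp >> THREAD_SHIFT) & 1;
}

/* Models the entry sequence: inspect sp using only x0 as scratch, then
 * restore both. Unsigned wraparound makes the arithmetic exact. */
bool check_sp_in_place(uint64_t *sp, uint64_t *x0)
{
    *sp = *sp + *x0;            /* sp = sp + x0 */
    *x0 = *sp - *x0;            /* x0 = original sp */
    bool overflow = stack_overflowed(*x0);
    *x0 = *sp - *x0;            /* x0 = original x0 */
    *sp = *sp - *x0;            /* sp = original sp */
    return overflow;
}
```

The bit test only catches overflows smaller than `THREAD_SIZE`; a jump further than that past the base lands in a region where the bit looks valid again, which is one reason to pair it with a guard page.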
<geist>
mrvn: you'd still have to recover your per thread structure
<mrvn>
geist: do you have 2 thread registers to spare?
<geist>
depends. define what a thread register is
<geist>
are you talking about a MSR or a real integer register?
<mrvn>
geist: the special registers arm32 has on each core where you can store your per-cpu struct or other useful things.
<geist>
sure. okay problem with those is the only way to use them is to copy them into a regular register
<geist>
ie, `mrs x0, TPIDR_EL1`
<mrvn>
geist: hence the need for 2. Save one register in thread1, load thread2 into register
ThinkT510 has quit [Changing host]
ThinkT510 has joined #osdev
<geist>
ah, well no you only have one
<mrvn>
bummer
<geist>
but one solution is to have that point to something that has scratch space
<mrvn>
On ARM32 I get away with it due to banked registers.
<geist>
save a few regs, then do your math, then put them back and save it
<geist>
not sure arm32 works in all cases. the banked part is still difficult, since you have to detect a fault while faulting itself
<geist>
but it does give you more choices
<mrvn>
I just use the banked registers as scratch registers so I can do some shuffling around. Point is that they don't overwrite anything I still need.
<mrvn>
I just must not fault until I've finished the shuffling and the scratch registers are free again.
<geist>
yeah that's somewhat doable if you're armv6+ because you can fiddle with banked stuff without switching modes. probably
<geist>
via cps and whatnot
<mrvn>
I think if my per-cpu pointer in the thread register gets corrupted I'm utterly lost.
<mrvn>
just fails in a loop
dormito has quit [Quit: WeeChat 3.3]
<mrvn>
On x86/x86_64 you can set a separate stack for the stack fault handler.
<geist>
yeah you can use the double fault trap via an IST or a TSS task switch
<mrvn>
faulting in a stack fault will underflow the stack eventually.
<geist>
one of the cases where the x86 stuff is very straightforward
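For reference, a sketch of what the IST route looks like on x86_64: each 64-bit IDT gate carries a 3-bit IST index, and a nonzero index makes the CPU load RSP from that slot of the TSS before pushing the exception frame. The struct follows the architectural 16-byte gate layout; `make_gate` and the constant names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit IDT gate descriptor (16 bytes, naturally aligned, no padding). */
struct idt_gate {
    uint16_t offset_low;   /* handler address bits 0..15 */
    uint16_t selector;     /* kernel code segment selector */
    uint8_t  ist;          /* bits 0..2: IST index, 0 = stay on current stack */
    uint8_t  type_attr;    /* present bit, DPL, gate type */
    uint16_t offset_mid;   /* handler address bits 16..31 */
    uint32_t offset_high;  /* handler address bits 32..63 */
    uint32_t reserved;
};
_Static_assert(sizeof(struct idt_gate) == 16, "IDT gates are 16 bytes");

#define GATE_INTERRUPT      0x8E  /* present, DPL 0, 64-bit interrupt gate */
#define DOUBLE_FAULT_VECTOR 8     /* #DF */

/* Hypothetical helper: build a gate that switches to TSS.IST[ist]
 * (ist in 1..7) or keeps the current stack (ist == 0). */
struct idt_gate make_gate(uint64_t handler, uint16_t selector, uint8_t ist)
{
    assert(ist <= 7);
    struct idt_gate g = {
        .offset_low  = (uint16_t)(handler & 0xffff),
        .selector    = selector,
        .ist         = ist,
        .type_attr   = GATE_INTERRUPT,
        .offset_mid  = (uint16_t)((handler >> 16) & 0xffff),
        .offset_high = (uint32_t)(handler >> 32),
        .reserved    = 0,
    };
    return g;
}
```

Typical use would be something like `idt[DOUBLE_FAULT_VECTOR] = make_gate(handler_addr, kernel_cs, 1)` with IST1 in the TSS pointing at a dedicated known-good stack, so a kernel stack overflow that escalates to a double fault still has somewhere to push its frame.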
<geist>
anyway, mostly using this as a rubber duck debugging / page it back in my brain
<geist>
explaining it here sort of refreshes my mind
<geist>
anyway, the trick that i'm settling on relies on the fact that there *is* a banked SP in the form of SP_EL0 and SP_EL1
<geist>
you can run the kernel in SP_EL0 (CurrentSP set to SP_EL0) and then when you fault the cpu always switches to SP_EL1, so you have a free reg there
<mrvn>
geist: I'm not sure what you want to do after a stack underflow in EL1 though. Can you do more than dump register and stack content and die?
<geist>
and more importantly there are separate entry vectors that fire depending on if you were using SP_EL0 or SP_ELx
<geist>
basically yes
<geist>
but that beats spinning forever in an exception loop
<mrvn>
so you can have a valid SP_EL1, save the SP_EL0, reset it to a backup stack and run the "dump and die" code in SP_EL0 mode again?
<mrvn>
If that faults again you survive a few loops till SP_EL1 underflows too
<mrvn>
Maybe a good idea would be to detect recursive faults and just stop after 2.
<geist>
enter the exception with SP_EL1 enabled (as the cpu does), copy SP_EL0 to SP_EL1 and push some regs, if it faults then the cpu will reenter the kernel via a different exception (EL1 w/SP_EL1 set) and then you fail. basically treat that particular exception as a double fault
<mrvn>
yep
<geist>
but in the normal case once you've pushed everything, switch back to SP_EL0
<geist>
something like that. the trick is to treat the pushing of regs as a critical section and if you trap inside that you reenter via a *different* exception vector
<geist>
and then you know you're in a bad place
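The "different exception vector" part falls out of the AArch64 vector table layout: the table at VBAR_EL1 has four groups of four 0x80-byte entries, and which group fires depends on where the exception came from and which SP was selected. A small model of that layout, plus the classification the scheme above relies on. The enum and function names are made up for illustration, and the classifier assumes the kernel normally runs on SP_EL0 and only uses SP_EL1 while saving registers.

```c
#include <assert.h>
#include <stdbool.h>

enum vec_source {
    CUR_EL_SP0   = 0,  /* taken from the current EL while using SP_EL0 */
    CUR_EL_SPX   = 1,  /* taken from the current EL while using SP_ELx */
    LOWER_EL_A64 = 2,  /* taken from a lower EL running AArch64 */
    LOWER_EL_A32 = 3,  /* taken from a lower EL running AArch32 */
};

enum vec_kind { VEC_SYNC = 0, VEC_IRQ = 1, VEC_FIQ = 2, VEC_SERROR = 3 };

/* Byte offset of a vector from VBAR_EL1. */
unsigned vector_offset(enum vec_source src, enum vec_kind kind)
{
    return (unsigned)src * 0x200 + (unsigned)kind * 0x80;
}

/* If the kernel only ever runs on SP_EL1 inside the register-save window,
 * a synchronous exception arriving through the "current EL, SPx" group
 * means the save itself faulted: treat it like a double fault. */
bool is_fault_during_entry(unsigned offset)
{
    return offset == vector_offset(CUR_EL_SPX, VEC_SYNC);
}
```

An ordinary kernel fault, including the first hit on the guard page, arrives at offset 0x000 with SP_EL1 freshly selected; only a fault while pushing registers onto the overflowed stack shows up at 0x200, where the handler can switch to an emergency stack, dump state and stop.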
<mrvn>
nic if the hardware supports that
<mrvn>
nice
heat_ has quit [Read error: Connection reset by peer]
heat_ has joined #osdev
<sortie>
PSA: If you've been doing osdev for a couple years, and expect to be around in a couple of years, and have had to patch config.sub often, just go ahead and submit an upstream entry in config.guess and config.sub in the GNU config project NOW NOW NOW NOW NOW. Thank me in two years.
<klange>
why, they won't accept it
diamondbond has joined #osdev
<klange>
maybe if you send the patches from a @google.com email address
<sortie>
Obviously they will? They did? They accepted lots of junk entries in that file already
<sortie>
Did GNU config reject anyone? :O first I hear of this
<heat_>
they have something called "fiwix"
<sortie>
I certainly did not use a corp address for this (heck even predates that by a year)
<heat_>
also secbsd (which has no source code as far as I can see), serenityOS
<sortie>
This in theory ought to be the easiest GNU code to contribute to since they force it on everyone to register their OS
<klange>
remember? he still posts on the forum as recently as a week or two ago
<sortie>
rdos is still around? Neat
dh` has quit [Ping timeout: 240 seconds]
<Griwes>
the threads he's in really evoke the same spirit of the osdev forums from 10 years ago, if you catch my meaning
<sortie>
^_^
<heat>
the osdev forums were a mistake
<klange>
helps keep people from arguing in talk pages on the wiki
mahmutov has quit [Ping timeout: 240 seconds]
<kazinsal>
I wonder if rdos is still pining for flat segmentation
<moon-child>
oh is that the pdos guy?
<klange>
fucking hell i just waited two hours for a video to process on youtube only to find that macos's screen recorder doesn't record audio which was literally the point of the video
<klange>
no rdos is the rdos guy, the pdos guy is "kerravon"
<heat>
couldn't you have played the video beforehand
<klange>
I had just assumed it would not be stupid, so the mistake was mine.
<zid`>
dubbable or will you need to entirely remake it with something else?
<mrvn>
I dub thee sir zid.
<klange>
The audio is critical to the video, so it needs to be remade.
<klange>
It's Doom.
<zid`>
Unless you wanna make machine gun noises with your mouth
<klange>
Apparently this is just not a thing that macOS supports, at all.
<mrvn>
phew, phew, phew.
<zid`>
'phew' is a pretty weak machine gun
<mrvn>
ratatatatatata
<heat>
macOS is going for feature parity with linux
<klange>
Pulse supports this perfectly well.
<zid`>
it's easy peasy on linux, just erm.. dmix.. something something
<zid`>
(but yea, pulse can do it, if everything is set up to output to pulse)
<heat>
can't screen share with sound
<zid`>
I have stereo mix so that's also an option for me
<heat>
discord can't, and neither could zoom IIRC
<zid`>
just record that in audacity
<zid`>
discord can do sound, at random
<klange>
Discord is a horrible webapp in a box.
<zid`>
it can't do desktop sound but if you attach to a window, some of the time it can hook the sound api and get sound too
<klange>
Apparently I need a third-party extension to provide a loopback interface, and the only free one, despite being FOSS, is only available from a sponsor site that is demanding an email address and my Full Legal Name to get a prebuilt version.
<klange>
This despite the fact the thing is on GitHub, which has a perfectly good and free CDN-backed binary releases feature.
<zid`>
sounds like it would defo be easier to do it by hooking the sound api for that one program
<klange>
in what?
<klange>
There is no way to do that.
<zid`>
hmm?
<klange>
OBS can't do it, macOS screen record can not do that.