<bslsk05>
jtsylve.blog: 2022 APFS Advent Challenge – Joe T. Sylve, Ph.D. – Digital Forensic Researcher and Educator
<geist>
oh interesting, looks like an interesting read
<zid>
sizeof(uint32_t);
<zid>
#define FOUR 4
<heat>
I do sizeof(uint32_t)
<heat>
deal with it
<zid>
yea it's legit if you're using it as abstraction for 4
heat has quit [Ping timeout: 252 seconds]
<geist>
well, if you're on a 16bit machine then the sizeof(uint32_t) would be 2
<geist>
ie, a TI Piccolo, which i dealt with in 2014 and still exists
<geist>
and yeah i know you're not likely to hit it, but that kinda stuff *does* still exist. i had to fiddle with some of my code when i took some bits out of LK and ran it on a piccolo
<geist>
notably things that assumed a char was 8 bit, and/or memcpy(..., N) copied N bytes
<klange>
Those are for analog sampling? Not _technically_ audio equipment, but basically the same idea?
<geist>
actually TI piccolo is used in a lot of motor controller stuff
<geist>
but it's basically a wonky 16 bit arch, where the smallest unit of addressable memory is 16 bits
<geist>
which is of course perfectly legal in C
<geist>
well, actually 32bit arch, now that i looked at it, but it didn't have a huge address space
<zid>
geist: The code would break if CHAR_BIT was not 8
<geist>
yup
<zid>
That's why I said it's just an overly complicated way of writing 4
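A minimal sketch of the point being argued, assuming a hosted C11 compiler. The helper names `u32_bits` and `chars_for_octets` are made up for illustration. `sizeof` counts chars, not octets, so on a target where `CHAR_BIT` is 16, `sizeof(uint32_t)` is 2 and `memcpy(..., N)` moves N 16-bit units.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Width of uint32_t in bits. The type has no padding bits, so this is 32
 * on every implementation that provides uint32_t at all. */
static size_t u32_bits(void)
{
    return sizeof(uint32_t) * CHAR_BIT;
}

/* Number of chars (the unit sizeof and memcpy count in) needed to hold
 * `octets` 8-bit values packed tightly. Equals `octets` only when
 * CHAR_BIT is 8. */
static size_t chars_for_octets(size_t octets)
{
    return (octets * 8 + CHAR_BIT - 1) / CHAR_BIT;
}

/* Code that writes a literal 4 where it means sizeof(uint32_t) can at
 * least state the assumption it relies on. */
static_assert(CHAR_BIT == 8, "this code assumes 8-bit chars");
```

On a target like the one geist describes, the `static_assert` would fail at compile time, which is arguably the useful outcome.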
gog` has quit [Ping timeout: 264 seconds]
TkTech has joined #osdev
dude12312414 has joined #osdev
dude12312414 has quit [Client Quit]
zaquest has joined #osdev
Turn_Left has joined #osdev
Left_Turn has quit [Ping timeout: 260 seconds]
Turn_Left has quit [Ping timeout: 252 seconds]
bgs has quit [Remote host closed the connection]
<ddevault>
yeah it was the memory barriers
<zid>
It's always the memory barriers
craigo has joined #osdev
x8dcc has joined #osdev
craigo has quit [Quit: Leaving]
<x8dcc>
I am adding an IDT now and although I have some ideas on how to do it (initialize it) I am not sure about some stuff
<x8dcc>
first of all, I would need a full 256 entry idt, right? and I would need to initialize it as 0 (at least the present bit)
<x8dcc>
and on top of that, fill the entries I want to use/handle
<x8dcc>
I thought about declaring the idt bytes (for the 256 entries) in assembly, but I am not sure if it's the best idea because afaik it will be easier to fill it from a C function
craigo has joined #osdev
bauen1 has quit [Ping timeout: 256 seconds]
danilogondolfo has joined #osdev
<zid>
you don't need all 256 entries
<zid>
but you might as well leave the space for it regardless
<zid>
so that you don't need to keep moving it around as it grows
<x8dcc>
yeah I see
<x8dcc>
I thought I did
<zid>
the IDTR contains the size in bytes (less 1) of the IDT
<x8dcc>
still, I am not sure where to declare it because the IRS will be in C
<zid>
IRS?
<zid>
oh interrupt service routines?
<x8dcc>
so I don't know which way would be easier
<x8dcc>
ISR*
<x8dcc>
yeah yeah
<zid>
easier for me is C, idk what you find easy
<x8dcc>
I see, thanks
<x8dcc>
and yeah, I know it's the size - 1, noticed from the gdt :)
<bslsk05>
github.com: boros/interrupt.c at master · zid/boros · GitHub
<zid>
64bit so the shifts and stuff will be different, but you get the idea
<x8dcc>
yeah, I will have a look now. Thanks!
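For the 32-bit case being discussed, a rough sketch of the table, the gate packing, and the `lidt` operand might look like the following. This is not taken from zid's linked code; the names `idt_entry`, `idt_ptr`, `idt_set_gate`, `idt_make_ptr` and `GATE_INT32` are illustrative, and the base address is passed in as a plain integer so the sketch also builds on a host machine.

```c
#include <assert.h>
#include <stdint.h>

/* One 32-bit protected-mode IDT gate: 8 bytes. */
struct idt_entry {
    uint16_t offset_low;   /* handler address bits 0..15 */
    uint16_t selector;     /* code segment selector from the GDT */
    uint8_t  zero;         /* reserved, must be 0 */
    uint8_t  type_attr;    /* present bit, DPL and gate type */
    uint16_t offset_high;  /* handler address bits 16..31 */
} __attribute__((packed));

/* Memory operand of lidt: 16-bit limit followed by a 32-bit base. */
struct idt_ptr {
    uint16_t limit;        /* table size in bytes, minus 1 */
    uint32_t base;
} __attribute__((packed));

static_assert(sizeof(struct idt_entry) == 8, "gate must be 8 bytes");
static_assert(sizeof(struct idt_ptr) == 6, "lidt operand must be 6 bytes");

#define IDT_ENTRIES 256
#define GATE_INT32  0x8E   /* present, DPL 0, 32-bit interrupt gate */

/* Static storage is zero-initialized, so every gate starts not-present. */
static struct idt_entry idt[IDT_ENTRIES];

static void idt_set_gate(unsigned vec, uint32_t handler,
                         uint16_t selector, uint8_t type_attr)
{
    assert(vec < IDT_ENTRIES);
    idt[vec].offset_low  = (uint16_t)(handler & 0xFFFF);
    idt[vec].selector    = selector;
    idt[vec].zero        = 0;
    idt[vec].type_attr   = type_attr;
    idt[vec].offset_high = (uint16_t)(handler >> 16);
}

/* In a 32-bit kernel `base` would be the address of idt itself. */
static struct idt_ptr idt_make_ptr(uint32_t base)
{
    struct idt_ptr p = { .limit = (uint16_t)(sizeof(idt) - 1), .base = base };
    return p;
}
```

With 256 entries of 8 bytes each the limit works out to 256 × 8 − 1 = 2047. Leaving room for all 256 gates costs 2 KiB and avoids moving the table later, as zid suggests.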
fedorafan has joined #osdev
<ddevault>
ESR == 0x2000000
<ddevault>
whelp
vexmane has joined #osdev
<ddevault>
on... mov x8, x0?
<ddevault>
nothing in ARMARM D17.2.37 seems to be relevant
<ddevault>
this code worked yesterday :<
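As a worked example of reading that value, here is a small sketch that splits an AArch64 ESR_ELx value into its fields. The helper names are made up for illustration. The field positions are EC in bits [31:26], IL in bit 25, and ISS in bits [24:0].

```c
#include <assert.h>
#include <stdint.h>

/* Exception Class: bits [31:26]. */
static unsigned esr_ec(uint64_t esr)
{
    return (unsigned)((esr >> 26) & 0x3F);
}

/* Instruction Length: bit 25 (1 means a 32-bit instruction). */
static unsigned esr_il(uint64_t esr)
{
    return (unsigned)((esr >> 25) & 0x1);
}

/* Instruction Specific Syndrome: bits [24:0]. */
static uint32_t esr_iss(uint64_t esr)
{
    return (uint32_t)(esr & 0x1FFFFFF);
}

/* 0x2000000 has only bit 25 set, so EC decodes to 0. */
static_assert(((0x2000000u >> 26) & 0x3F) == 0, "EC field of 0x2000000 is 0");
```

For 0x2000000 this gives EC = 0, IL = 1, ISS = 0. EC 0 is the "Unknown reason" class, so the syndrome itself carries no further detail about the cause.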
<x8dcc>
zid: why do you exactly need to remap the IRQs? I saw that in other projects as well but I am not sure what that does
bauen1 has joined #osdev
<ddevault>
well, it wasn't the data or instruction caches
<ddevault>
that's the only lead I could find
nyah has joined #osdev
<ddevault>
hm, maybe SPSR is wrong
<zid>
x8dcc: 0+ is where cpu exceptions are
<zid>
you won't be able to tell IRQ1 apart from #DIV
<zid>
etc
<ddevault>
sure would be nice if firefox would load the end of ARMARM so I could get to the register index
<ddevault>
nope, looks fine, only thing set is the carry flag
<ddevault>
what the hell
gog has joined #osdev
<gog>
computer?
<zid>
no
<zid>
no computer
<gog>
zid did you sleep
<zid>
no
<gog>
do you work
<zid>
no
<gog>
fair enough
<gog>
you want money
<zid>
yes please
<x8dcc>
zid: I don't understand what you mean by "0+"
<zid>
2 sacks
<zid>
x8dcc: rather than 32+
<zid>
the numbers larger than 31
<x8dcc>
I still don't know what you are talking about, sorry
<zid>
are you aware of the number line
<zid>
or do you have a ruler handy idk
<zid>
centimeters or inches, both are fine
<x8dcc>
sure
<ddevault>
this does not make any sense ;_;
<zid>
Okay so imagine interrupt 0 is at 0cm, interrupt 1 is at 1cm, etc etc
<zid>
divide by zero exception is interrupt 0, interrupt 1 is debug exception, etc
<x8dcc>
yeah
<zid>
It's best to sliiide those IRQs over, so that they don't overlap
<zid>
to say, 32cm onwards
<x8dcc>
why? overlap with what?
<zid>
>divide by zero exception is interrupt 0, interrupt 1 is debug exception, etc
<ddevault>
exceptions
<ddevault>
the CPU generates interrupts on its own when invalid scenarios arise, such as divide by zero
<ddevault>
by default they overlap with IRQs, which are nominal cases
<ddevault>
you don't want to mix nominal and exceptional, so you remap the IRQs
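The remap itself is a short sequence of port writes to the two legacy 8259 PICs. The sketch below only builds that sequence as data so it can be inspected on any machine. `pic_remap_sequence` and `struct port_write` are hypothetical names, and a kernel would replay the list with its own port-output routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PIC1_CMD  0x20   /* master PIC command port */
#define PIC1_DATA 0x21   /* master PIC data port */
#define PIC2_CMD  0xA0   /* slave PIC command port */
#define PIC2_DATA 0xA1   /* slave PIC data port */

#define PIC_REMAP_WRITES 8

struct port_write {
    uint16_t port;
    uint8_t  value;
};

/* Fill `out` with the initialization writes that move the master PIC to
 * `master_base` and the slave PIC to `slave_base`. Returns the count. */
static size_t pic_remap_sequence(uint8_t master_base, uint8_t slave_base,
                                 struct port_write out[PIC_REMAP_WRITES])
{
    /* The PIC supplies the low 3 bits of the vector, so bases must be
     * multiples of 8. */
    assert((master_base & 7) == 0 && (slave_base & 7) == 0);

    const struct port_write seq[PIC_REMAP_WRITES] = {
        { PIC1_CMD,  0x11 },        /* ICW1: begin init, ICW4 follows */
        { PIC2_CMD,  0x11 },
        { PIC1_DATA, master_base }, /* ICW2: vector base */
        { PIC2_DATA, slave_base },
        { PIC1_DATA, 0x04 },        /* ICW3: slave attached on IRQ2 */
        { PIC2_DATA, 0x02 },        /* ICW3: slave cascade identity */
        { PIC1_DATA, 0x01 },        /* ICW4: 8086 mode */
        { PIC2_DATA, 0x01 },
    };
    for (size_t i = 0; i < PIC_REMAP_WRITES; i++)
        out[i] = seq[i];
    return PIC_REMAP_WRITES;
}
```

Calling it with bases 0x20 and 0x28 puts IRQ 0..15 on vectors 32..47, clear of the 32 vectors reserved for CPU exceptions. After the sequence the interrupt masks still need to be written to the two data ports.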
<x8dcc>
oh, I didn't know that
<zid>
so you just weren't aware of cpu exceptions? huh
<x8dcc>
I didn't know they overlapped
<zid>
Well I explained it, twice
<zid>
that you should make it so they shouldn't
<ddevault>
your metaphor was awful zid
<zid>
by moving the IRQs to 32
<zid>
it wasn't a metaphor?
<ddevault>
number line/ruler metaphor
<zid>
It wasn't a metaphor
<x8dcc>
I understand it now anyway
<ddevault>
I am fairly certain that the interrupt handlers are not literally 1cm apart
<ddevault>
if only because my CPU is narrower than 32cm
<zid>
a nm ruler is too hard to see so I scaled it up
<ddevault>
ah, naturally
* FireFly
ponders the distance to system memory in this analogy
<x8dcc>
well I have to go now, but I will mess with this a bit and probably ask more stuff when I get back ^^'
<FireFly>
take care!
<x8dcc>
thank you both anyway :D
<ddevault>
tfw thought I was done with bullshit problems
<ddevault>
reminds me of when my x86_64 raised a machine check
<ddevault>
always nice when you ask the manual what went wrong and the official answer is "I dunno lol"
<zid>
how the fuck did you get an ME
<ddevault>
writing to invalid physical memory addresses, at the time
<zid>
oh that works
<zid>
maybe? idk
x8dcc has quit [Ping timeout: 268 seconds]
<ddevault>
bah
<ddevault>
whatever
<ddevault>
I can't narrow it down
<ddevault>
not consistent enough
<ddevault>
I want to say it's cache related but I'm not sure
* ddevault
pulls up the cache chapter
<ddevault>
no, perhaps it's not cache related
<ddevault>
disabling the caches entirely in SCTLR_EL1 does not fix the bug
GeDaMo has joined #osdev
<ddevault>
I wonder if my rpi is just overheated, given that it spins the CPU when my test code finishes
* ddevault
puts it in the fridge
<ddevault>
can you tell I'm grasping at straws here
<zid>
I'm going to go out on a limb
<zid>
and suggest that maybe your code is involved
<epony>
remember when I told you about waste heat being a major problem in SoCs
<ddevault>
well
<epony>
glue a radiator to it, it would help the convection cooling
<epony>
(thermal paste or similar)
<ddevault>
each build consistently causes an unknown fault at the same location, mov x0, x8, in the userspace syscall entry points
<ddevault>
but different builds fail at different times depending on magic, if I add more logging it fails in a different syscall
<zid>
what does qemu do? :P
<ddevault>
qemu works fine -_-
<zid>
what does linux do on the pi?
<ddevault>
I don't even know where to start looking to answer that question
<zid>
run linux on it?
<ddevault>
well I mean linux runs fine
<epony>
sounds like a halt and catch fire
<epony>
could be a HW bug?
<zid>
yea you did it, naughty boy
<ddevault>
I don't know
<zid>
maybe some timer or irq or something happens cus it's a real device
<epony>
ask others to validate / reproduce
<ddevault>
the manual suggests a number of possible causes related to this exception
<ddevault>
none of them seem likely
<ddevault>
the closest one mentions writing to SP_EL0 when SPsel = 0, but SPsel = 1
<ddevault>
yeah, if anyone has a raspberry pi I can share a test case
<ddevault>
raspberry pi 4, to be specific
<ddevault>
fridge strat did not work :(
fedorafan has quit [Ping timeout: 248 seconds]
vexmane has quit [Quit: bye bye]
<ddevault>
j`ey: would appreciate being able to pick your brain on this issue, as the resident ARM wizard
fedorafan has joined #osdev
smach has joined #osdev
<j`ey>
just skimmed scrollback.. not sure I have anything to suggest
<ddevault>
oh well :/
<epony>
ask on the qemu mailing list too
<ddevault>
it's not a qemu problem
<epony>
then the rpi lists
<ddevault>
yeah that may be better
<j`ey>
ddevault: have you looked at the unknown reason.. reasons?
<epony>
or run some functional testing / benchmarking tooling on it
<ddevault>
j`ey: you mean the part of the manual which enumerates possible causes?
<epony>
(or a memtest)
<j`ey>
yeah ddevault
<ddevault>
yeah, I have
<ddevault>
none of them seem likely, but I'll read it again
<epony>
eMMC can cause problems..
<epony>
could be the memory going unstable etc
<ddevault>
fwiw I'm not accessing the eMMC
<ddevault>
once the bootloader is done it's not touched again
<ddevault>
and I would expect "general instability" to not manifest this consistently
<epony>
if you're not randomising the layout in memory..
<epony>
things end up in the same places
<ddevault>
well, true
<ddevault>
I could see what happens if I move my stuff around a bit
<epony>
so a memory unreliable chip could be causing it
<epony>
yep
<ddevault>
putting init somewhere else in physical memory does not correct the issue
<epony>
so it's a CPU bug hypothesis
<epony>
hypertenzis
xenos1984 has quit [Read error: Connection reset by peer]
<ddevault>
>This message has been submitted successfully, but it will need to be approved by a moderator before it is publicly viewable. You will be notified when your post has been approved.
<ddevault>
-_-
<klange>
one sec
<ddevault>
rpi forums, not osdev
<klange>
oh :(
<klange>
you got my hopes up there for a moment
<klange>
thought I could actually be useful for once
<klange>
i had a lot of "fun" with my rpi400 bringup
<klange>
good times
<ddevault>
here's the full write-up I (tried to) post on the rpi forums: https://l.sr.ht/a2DF.png
<epony>
the next generation of the raspberry pi is going to be called rping
<ddevault>
every time I have a stupid idea to rule out I really hope it can't be ruled out
<ddevault>
it can always be ruled out
xenos1984 has joined #osdev
<kaichiuchi>
hi
<ddevault>
difficult to resist the temptation to burn this device in a pyre and start a farm
<ddevault>
oh my god it was the caches
<ddevault>
I was invalidating them incorrectly, AGAIN
<bslsk05>
'"APPLE PRODUCT LAUNCH" — A Bad Lip Reading' by Bad Lip Reading (00:04:16)
jjuran has quit [Ping timeout: 264 seconds]
epony has quit [Quit: QUIT]
gog has joined #osdev
Left_Turn has joined #osdev
epony has joined #osdev
<x8dcc>
I am back. I managed to add the IDT, but I feel like I messed something up. I just added ~21 ISRs to the IDT for printing the exceptions and panicking
<x8dcc>
what's a good way of checking if it's loaded correctly?
<x8dcc>
I didn't remap the IRQs yet, by the way
[itchyjunk] has joined #osdev
<epony>
"there is still some time left for yak shaving with the hairy parts of the kernel splicer for continuation resumption in the on chip system debugger console" --Chewie "Altdawg" Chewbacca
dude12312414 has joined #osdev
<gog>
i'm not wroking
<gog>
x8dcc: you will need to re-map the IRQs or mask all of them
<gog>
because if say, the timer IRQ fires, it's gonna show up as a double fault i think
<x8dcc>
so if they are in the same position they fire both? I see
<gog>
yes
<gog>
by default the PC bios remaps the first pic to an offset of 8
<gog>
so IRQ 0, the timer irq, will vector to 8
<gog>
but it's not actually a double fault
<x8dcc>
I see
<gog>
it's just that your cpu was vectored to it wrongly
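The collision gog describes can be checked with a little arithmetic. This is a sketch with made-up helper names. The master base of 8 comes from the discussion above, while the slave base of 0x70 is the conventional PC BIOS value and is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPU_RESERVED_VECTORS 32  /* vectors 0..31 belong to CPU exceptions */

/* Vector delivered for a legacy IRQ line, given the two PIC base offsets. */
static uint8_t irq_vector(uint8_t master_base, uint8_t slave_base, unsigned irq)
{
    assert(irq < 16);
    return (uint8_t)(irq < 8 ? master_base + irq
                             : slave_base + (irq - 8));
}

/* True when a hardware IRQ would land on an exception vector. */
static bool collides_with_exception(uint8_t vector)
{
    return vector < CPU_RESERVED_VECTORS;
}
```

With the defaults, `irq_vector(0x08, 0x70, 0)` is 8, the double fault vector, so a timer tick runs the #DF handler. With bases 0x20 and 0x28 the same IRQ arrives on vector 32 and nothing overlaps.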
<x8dcc>
well I tried manually interrupting from assembly to see if the idt was loaded correctly, but it reboots
<gog>
that's a triple fault
<x8dcc>
its what I thought, but I was not sure why that happened
<gog>
faults on looking up ISR, faults on looking up double fault ISR, triple fault is reset
<gog>
or rather
<gog>
#GP due to missing isr, fault due to missing GP isr, then triple
<gog>
or malformatted isr etc
<x8dcc>
okay so for example what I did now was add the idt struct, and create the 256-entry idt. then fill ~21 entries with the right type flags, and offsets for the isr, and the isr's should just panic with the current exception
<x8dcc>
because it compiles, I committed and pushed to a different branch, I can send the link if you want, but I am sure I am missing something (besides the IRQ remapping)
<gog>
show me the link pls
<gog>
the irq remapping won't affect what you've done with the ISR
<bslsk05>
github.com: sophia/descriptor.h at main · adachristine/sophia · GitHub
<bslsk05>
github.com: sophia/cpu.c at main · adachristine/sophia · GitHub
<gog>
yeah interrupt gates
<gog>
this is for 64 bit but it's basically the same
<x8dcc>
yeah
<x8dcc>
so what exactly did you want to see with the IDTR? the state when the kernel_main runs is different from before
<gog>
just that it's the size and address you expect
<gog>
presumably somewhere above 1MiB
<gog>
and 159 bytes of limit
<x8dcc>
the gdt for example is 100050, which looks pretty normal to me
<x8dcc>
but the idt is 447c0011 00004060
<gog>
that is way off
<gog>
well no
<gog>
it should be 2047
<x8dcc>
well it's one of the things that makes me think I'm missing something
<x8dcc>
what exactly should be 2047?
<x8dcc>
I mean, what does that number represent
<gog>
the limit of your idtr
<gog>
since you declared it with 256 entries
<gog>
and it's 8 bytes
<gog>
its limit will be 2047
<gog>
8 bytes per descriptor
<x8dcc>
the size of the struct is 8, sizeof(idt_entry)
<x8dcc>
I looked that up before with a simple printf
<x8dcc>
the limit is set in idt_init as well
<x8dcc>
but the first 447c... part should be fine? that also seems off
<gog>
no it should be near your gdt
<gog>
in the 1MiB area
<gog>
100050, 0x50 1MiB + 80 bytes
<x8dcc>
:/
<x8dcc>
well at least it's a start, I know *what* is wrong
fedorafan has quit [Ping timeout: 260 seconds]
<gog>
so basically your lidt instruction is wrong
<gog>
but i don't understand why yet
<x8dcc>
I will try to debug the stack when calling idt_load later
<x8dcc>
I have to leave for a while again... thanks for having a look, gog
<j`ey>
ddevault: getting 502 errors on sr.ht
<ddevault>
aware
<ddevault>
our tech just arrived at the datacenter
<gog>
i should really try to do some work
<j`ey>
ddevault: I hope he has his hammer
fedorafan has joined #osdev
<ddevault>
j`ey: back online
terminalpusher has joined #osdev
<j`ey>
ddevault: I see, and congrats on fixing the cache issue
<ddevault>
thanks!
<ddevault>
and on the outage <_<
<ddevault>
hardware failure on the main database server
epony has quit [Remote host closed the connection]
epony has joined #osdev
lucartc has joined #osdev
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
gog has quit [Quit: Konversation terminated!]
epony has quit [Remote host closed the connection]
<kaichiuchi>
nice.
<kaichiuchi>
we bypassed read out protection on our equipment
epony has joined #osdev
<kaichiuchi>
STM32F* is ridiculous
gog has joined #osdev
jjuran has joined #osdev
xenos1984 has quit [Ping timeout: 256 seconds]
xenos1984 has joined #osdev
<geist>
ridiculous how? that there are a bazillion of em?
invalidopcode has quit [Remote host closed the connection]
SGautam has quit [Quit: Connection closed for inactivity]
invalidopcode has joined #osdev
gog has quit [Quit: Konversation terminated!]
terminalpusher has quit [Remote host closed the connection]
terminalpusher has joined #osdev
xenos1984 has quit [Ping timeout: 260 seconds]
fedorafan has quit [Ping timeout: 256 seconds]
danilogondolfo has quit [Remote host closed the connection]
fedorafan has joined #osdev
xenos1984 has joined #osdev
<Bitweasil>
Hm. PMSA on ARM, R-series. It's set by the SCRs. This implies the mappings are per-CPU-core, not global, correct?
<Bitweasil>
I can't see how it would be any other way.
<geist>
Yah almost certainly is per core
<geist>
Or more generically, i don’t think in pretty much any situation is that sort of thing global, unless it’s a separate peripheral (like GIC, etc)
<Bitweasil>
Right.
<Bitweasil>
Ok, thanks. Need more coffee or something this morning.
<geist>
ddevault: grats! Yeah, just as a suggestion, you really need to be super careful with cache/tlb stuff, since it can show up as Heisenbugs all over the place. Conversely if you see a lot of weird randomness, first instinct should be to suspect cache maintenance
<geist>
Less so on x86, but on ARM/riscv/etc it’s a bit more of a thing to worry about
<Bitweasil>
It really, really is. :/ The manual says, "Thou Shalt Do Updates This Way," and if Thou Doesn't, well, good luck.
<Bitweasil>
Because it's absolutely heisenbugs all the way down.
<Bitweasil>
We had a bug in the emulator where a TLB invalidate wasn't working properly in some cases, and it took a couple of us north of a week to run down.
<geist>
Also to be clear and a bit pedantic, the DSB after TLB ops is not really a memory barrier as much as a strong device barrier. That’s kinda the difference between DMBs and DSBs. DSBs are stronger than DMBs and synchronize any other outstanding things (like cache flushes)
<Bitweasil>
Because the kernel was getting a ton of "impossible" errors.
<geist>
Yeah. If there’s one place that’s worth getting some peer review and whatnot it’s your fundamental cache routines because if you dont have them right you’ll be tearing out your hair forever
<Bitweasil>
That's one of those places where inner-shared vs outer-shared matters, right?
<geist>
Yeah, though in general outer doesn’t really mean anything, in any reasonable SOC. So usually inner is sufficient when you’re talking about synchronizing things within your OS
<geist>
Ie, all cpus and all memory you’re dealing with for your OS image are basically by definition part of the inner domain
<geist>
Outer is hypothetically there so you could have some sort of loosely coupled cores somewhere else in the system running some other os and would let you be a little more fine grained about synchronization
<geist>
But i think in practice nothing uses it
<geist>
But there’s also ‘SY’ domain which covers all of inner/outer
<Bitweasil>
Hm, ok. That would explain why I can't seem to find much in the way of definitions for when it matters!
<geist>
And i think it’s the default domain if you dont specify any flags on the DMB/DSB instructions
<geist>
Yeah, it’s mentioned somewhere in the arm manual that all cores/memory/etc that a single OS image runs on must be within the same inner domain
<Bitweasil>
Ah, ok.
<geist>
Doesn’t mean stuff outside of the os image (ie, GPUS, other cpus, etc) cant also be within the same inner domain, however
<geist>
And that’s how it’s generally done in most docs
<Bitweasil>
So outer might be relevant for something like those "One A core running something, two R cores running something different" hybrid SoCs?
<geist>
Yah totally
gog has joined #osdev
<Bitweasil>
Ok, that's useful. Thanks!
<geist>
That’s precisely what the idea is, but i think in practice nothing really implements that
<geist>
So as a result the rules are basically ‘use inner domain for synchronizing between your cpus and stuff on the cpus (TLB/caches)’ and ‘use sy domain for synching with hardware’
<geist>
Ie, a DMB ISH is good as a memory barrier to make sure all other cpus in your kernel ‘see’ what you just did
<geist>
A DMB SY would be overkill
<geist>
But a cache flush + DSB SY would ensure that whatever you wrote makes it out to memory
<geist>
Also note that individual cores are allowed to elevate a more fine grained barrier to something stronger, since strictly speaking larger domains encompass the smaller one
<geist>
Ie SY > outer > inner. And in fact IIRC the a53 manual says ‘actually everything is SY’ or something like that, but dont quote me on it
<geist>
This is another example of the ARM manual giving everyone (hardware and software) the rules, but not telling you what to do, you’re supposed to interpret the rules
<geist>
Definitely more cognitive load
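The rules of thumb above can be sketched roughly as follows, assuming GCC or Clang. On AArch64 the macros emit the real instructions; elsewhere they fall back to a C11 fence so the sketch still builds. `publish`, `try_consume` and `clean_to_memory` are illustrative names, not from any particular kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__)
#define dmb_ish() __asm__ volatile("dmb ish" ::: "memory")
#define dsb_sy()  __asm__ volatile("dsb sy" ::: "memory")
#else
/* Fallback for other hosts: a full C11 fence stands in for both. */
#define dmb_ish() atomic_thread_fence(memory_order_seq_cst)
#define dsb_sy()  atomic_thread_fence(memory_order_seq_cst)
#endif

static uint32_t payload;
static atomic_int ready;

/* CPU-to-CPU handoff: the inner shareable domain is enough. */
static inline void publish(uint32_t value)
{
    payload = value;
    dmb_ish();   /* order the payload store before the flag store */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

static inline int try_consume(uint32_t *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return 0;
    dmb_ish();   /* order the flag load before the payload load */
    *out = payload;
    return 1;
}

/* Handing data to hardware: clean the line, then wait for completion
 * across the full system domain. */
static inline void clean_to_memory(const void *addr)
{
#if defined(__aarch64__)
    __asm__ volatile("dc cvac, %0" : : "r"(addr) : "memory");
#else
    (void)addr;
#endif
    dsb_sy();
}
```

The split mirrors geist's summary: `dmb ish` between CPUs, a cache clean followed by `dsb sy` when a device needs to see the data. As noted above, a core is free to treat the narrower barrier as a wider one.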
Left_Turn has quit [Ping timeout: 260 seconds]
bauen1 has quit [Ping timeout: 268 seconds]
<zid>
oh x8dcc came back while I was asleep
<zid>
[15:27] <x8dcc> but the idt is 447c0011 00004060
<zid>
leading zeros suggests *hex* for the length too
<zid>
so it's set to some weird value above 16 thousand :P
<zid>
ah you fucked up the asm
<x8dcc>
hey
<zid>
you accidentally the pointer
wootehfoot has joined #osdev
<heat>
man I sure hate it when i accidentally the pointer
<x8dcc>
I was having a look now, the idt descriptor address is fine when calling the assembly function
<zid>
yes
<zid>
it's the asm that's broken
<x8dcc>
I moved the idt_init call (not idt_load) to another part of the code and the address in "info registers" changed like a lot
<x8dcc>
now its 00000011 0004060
<zid>
yes, it's the asm that's broken
<x8dcc>
what's broken? I removed the pushes and the mov's and now its just lidt [esp + 4]
<zid>
pushing it to the stack makes it a void **, [esp+8] dereferences it once to get a void *
<zid>
so you loaded the actual value of the shit on your stack as the IDTR
<x8dcc>
huh? a void* as arg is a void**?
<zid>
no, a void * you take the address of is a void **
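To see why the dump looked the way it did, the sketch below simulates what `lidt [esp+4]` reads when the stack slot holds a pointer to the descriptor rather than the descriptor itself. It assumes the descriptor lived at 0x00114060 and that the next stack word was 0x0000447c. Both values are inferred from the dumps above, not stated by anyone in the log, and the helper names are made up.

```c
#include <assert.h>
#include <stdint.h>

struct idtr_image {
    uint16_t limit;
    uint32_t base;
};

/* Decode 6 little-endian bytes the way lidt reads its memory operand:
 * a 16-bit limit first, then a 32-bit base. */
static struct idtr_image decode_lidt_operand(const uint8_t m[6])
{
    struct idtr_image r;
    r.limit = (uint16_t)(m[0] | (m[1] << 8));
    r.base  = (uint32_t)m[2] | ((uint32_t)m[3] << 8) |
              ((uint32_t)m[4] << 16) | ((uint32_t)m[5] << 24);
    return r;
}

/* Lay out a 32-bit pointer argument followed by the next stack word, as
 * they would sit at [esp+4] and [esp+8] inside the callee. */
static void fake_stack(uint8_t out[8], uint32_t arg, uint32_t next_word)
{
    for (int i = 0; i < 4; i++) {
        out[i]     = (uint8_t)(arg >> (8 * i));
        out[4 + i] = (uint8_t)(next_word >> (8 * i));
    }
}
```

Decoding that fake stack yields limit 0x4060 and base 0x447c0011, matching the first dump: the low half of the pointer became the limit and the high half became the bottom of the base. The usual fix is to load the argument into a register first, for example `mov eax, [esp+4]` followed by `lidt [eax]`.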