<klange>
would be nice, I use __seg_gs for x86_64; for extra fun, it's static const = 0;
<klange>
I think gcc is producing very similar code against x18 for me, so I've been content
<mrvn>
klange: as long as you always access stuff through the pointer. I would like to make x18 part of the type so pointers in structs will be relative to x18.
<mrvn>
`auto *p = &this_core->foo;` should e.g. be 8, and `*p` should compile to 8(x18)
[itchyjunk] has quit [Ping timeout: 240 seconds]
[itchyjunk] has joined #osdev
blockhead has joined #osdev
Matt|home has joined #osdev
<klange>
mrvn: I think it does, but I'm away from my desk - I'll check when I get home.
<klange>
oh I misread, yeah, especially not in C. But yes, I always access through the pointer, and anything else is bound to be nonsensical; if I want other references, like pointers to the values within the structs, those end up as refs through the array that the thread pointers point into
<heat>
im so confused
<heat>
everything is fucked
<mrvn>
klange: I think I will try to trick the compiler into doing this in C++ with some operator overloads, making pointer like objects. I want the peripheral base address in a register on the RPi.
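A minimal C++ sketch of that pointer-like object, assuming a hypothetical global `g_core_base` stands in for x18 (actually pinning the base to a register needs compiler-specific tricks that are not shown). The object stores only an offset, so `&this_core->foo` becomes the number 8 and dereferencing adds the current base:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the per-core base register (x18 / gs).
// A real kernel would read the register instead of this global.
static std::uintptr_t g_core_base = 0;

// Pointer-like object that stores an offset from the per-core base
// instead of an absolute address.
template <typename T>
class CoreRel {
public:
    explicit constexpr CoreRel(std::size_t offset) : offset_(offset) {}

    T* get() const { return reinterpret_cast<T*>(g_core_base + offset_); }
    T& operator*() const { return *get(); }
    T* operator->() const { return get(); }
    std::size_t offset() const { return offset_; }

private:
    std::size_t offset_;  // e.g. 8 for this_core->foo, i.e. 8(x18)
};

struct PerCore {
    std::uint64_t id;   // offset 0
    std::uint64_t foo;  // offset 8
};

// Equivalent of `&this_core->foo`: only the field offset is stored.
constexpr CoreRel<std::uint64_t> core_foo{offsetof(PerCore, foo)};
```

Repointing `g_core_base` at another `PerCore` retargets every such object at once, which is what keeping the base in x18 would give you.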
heat has quit [Remote host closed the connection]
<clever>
mrvn: mostly, i've just made it a compile-time constant; i can make the entire pi0-pi3 range have the same value
<clever>
but the MMU can also be used to solve the model differences
<mrvn>
pre-mmu stuff.
<clever>
yeah, that becomes a bit more of an issue
<mrvn>
are you tweaking the config to make them all use the same base address? How?
<clever>
there is a broadcom custom mmu between "arm physical" and the real bus
<clever>
which can remap 16mb pages
<mrvn>
I know, but how do I change the mapping?
<clever>
you would need to use the open firmware
<mrvn>
or do you mean your own VC firmware?
<mrvn>
ahh, ok
<clever>
the closed firmware doesnt give you the choice
<clever>
and picks a random number out of a hat every time you jump models :P
<mrvn>
I like getting the address from the device tree. Have to do that on other archs too.
<clever>
if you're respecting DT properly, every single peripheral gets its own base address, and you have a dozen variables, one for the base of each hw block
<clever>
so you're always doing local-base + reg-offset
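A sketch of the "local-base + reg-offset" pattern, with hypothetical names: each hardware block has its own base (in a DT-respecting loader it would come from that node's `reg` property), and each access adds a register offset. A plain array stands in for MMIO so it can run on a host:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-block base addresses, filled in at boot
// (from the device tree, or hard-coded when DT is ignored).
struct PeripheralBases {
    std::uintptr_t uart;
    std::uintptr_t gpio;
};

static PeripheralBases g_bases{};

// Every access is base-of-this-block + register offset.
inline std::uint32_t reg_read(std::uintptr_t base, std::uint32_t offset) {
    return *reinterpret_cast<volatile std::uint32_t*>(base + offset);
}

inline void reg_write(std::uintptr_t base, std::uint32_t offset, std::uint32_t value) {
    *reinterpret_cast<volatile std::uint32_t*>(base + offset) = value;
}
```

Since `g_bases.uart` is only known at run time, the compiler cannot turn it into an immediate, which is mrvn's point about a fixed base register gaining nothing here.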
<mrvn>
hmm, true.
<clever>
but if you're ignoring DT like i am, you can hard-code the right addr, and it just becomes a 32-bit constant to load
<mrvn>
the local-base won't be a compile-time constant, so there is no gain in basing it off a fixed register.
<clever>
the compiler may cheat, load a common base, and then do short offsets
<mrvn>
*sigh* there goes that idea.
<clever>
the register idea only works if you're not respecting DT
<mrvn>
*nod*
<clever>
LK solves this whole issue, by just not doing mmio until the mmu has been setup
<mrvn>
using a special pointer-like object will be useful though: it can encapsulate the volatile access and such automatically.
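One possible shape for such an object, as a hedged sketch (`MmioReg` and its methods are hypothetical): the address is private and every access goes through a `volatile` pointer, so callers cannot forget the qualifier:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register handle: it owns the address and forces every
// read and write through a volatile access.
template <typename T>
class MmioReg {
public:
    explicit MmioReg(std::uintptr_t addr) : addr_(addr) {}

    T read() const { return *reinterpret_cast<volatile T*>(addr_); }
    void write(T value) const { *reinterpret_cast<volatile T*>(addr_) = value; }

    // Read-modify-write helpers so callers never touch the raw pointer.
    void set_bits(T mask) const { write(static_cast<T>(read() | mask)); }
    void clear_bits(T mask) const { write(static_cast<T>(read() & ~mask)); }

private:
    std::uintptr_t addr_;
};
```

`set_bits` and `clear_bits` are read-modify-write sequences, not atomic operations; a register shared with an interrupt handler still needs locking.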
<clever>
linux mostly does the same thing, but early-printk can either compile a phys addr into the kernel, or set a phys addr in the kernel cmdline
<clever>
rpi-open-firmware is doing things in a few layers
<mrvn>
It's for raspbootin, my chain loader to load the real kernel over serial. So I don't want to change anything in the hardware I don't have to before the real kernel starts. But I need serial IO.
<clever>
for VPU side code, that is the real address, so you just #define HW_REGISTER_RW(addr) (*(volatile uint32_t *)(addr))
<mrvn>
I want a single raspbootin image that boots on all my arm devices no matter what.
<clever>
for ARM baremetal, you have to translate things, so you:
<clever>
mrvn: this assumes you have an mmiobase variable in scope, and references everything off that!
Brnocrist has quit [Ping timeout: 240 seconds]
<clever>
this lets me access MMIO from linux userland (mmap /dev/mem) or what you're wanting, just set a var after you detect the model
<mrvn>
but as you said, every peripheral base address is variable, coming from the DT
<klange>
what if peripheral base... as externally defined symbol, with self-relocations? fill it in on load... :thinking-emoji:
<clever>
heh
* gog contemplates becoming one of the ARM kool kids
<klange>
we've got delicious pi
<gog>
that is alluring ngl
<clever>
mrvn: u-boot doesnt even try to solve the problem your doing, they just compile a new copy for each model, and the config.txt syntax can be told to load the right one
Brnocrist has joined #osdev
<mrvn>
The way this currently works is that my peripherals have a placement new, where the argument is the base address of the peripheral. The constructor then initializes the peripheral.
<mrvn>
auto uart = new(dt->find_base_address("serial")) Uart(115200);
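A sketch of the placement-new approach, assuming a made-up register layout, a 48 MHz UART clock, and a `clock / (16 * baud)` divisor; none of those match a specific real UART. Member order mirrors the register map, and the constructor performs the hardware initialization at whatever address the object is placed:

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical register layout; a real UART (PL011, mini-UART, ...)
// has its own offsets, taken from its datasheet.
class Uart {
public:
    explicit Uart(std::uint32_t baud) {
        ctrl_ = 0;                           // disable while reconfiguring
        baud_div_ = kClockHz / (16 * baud);  // hypothetical divisor formula
        ctrl_ = kEnable;
    }

    void write_byte(char c) { data_ = static_cast<std::uint8_t>(c); }

    // Exposed only so the host-side check can inspect the register.
    std::uint32_t baud_div() const { return baud_div_; }

private:
    static constexpr std::uint32_t kClockHz = 48000000;  // assumed UART clock
    static constexpr std::uint32_t kEnable = 1;

    // Member order mirrors the (hypothetical) register map at 0x0..0xC.
    volatile std::uint32_t data_;
    volatile std::uint32_t status_;
    volatile std::uint32_t baud_div_;
    volatile std::uint32_t ctrl_;
};
```

On hardware the address comes from something like `dt->find_base_address("serial")`; on a host, a local buffer can stand in for the registers.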
<clever>
mrvn: but the pi has between 2 and 6? uarts, so you want something more loop based
<mrvn>
clever: only one serial alias
<clever>
serial0 and serial1 aliases last i looked
<mrvn>
but yeah, the real code has some fallbacks looking for different things till it finds a serial.
<clever>
and you could wind up needing either driver
<clever>
serial0 is always the one on the gpio header
<clever>
serial1 is always the one on the bt controller
<clever>
but, which one is the PL011 and which is the mini-uart, varies, depending on config.txt entries
<clever>
so you need to read the compatible, and spawn the right driver
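A sketch of picking the driver from the `compatible` property, which is a NUL-separated list with the most specific entry first. The two strings are the ones Raspberry Pi device trees commonly use for the PL011 and the mini-UART, but treat them as assumptions and check your DT; `UartDriver` and `pick_uart_driver` are hypothetical:

```cpp
#include <cassert>
#include <cstring>

enum class UartDriver { None, Pl011, MiniUart };

// Walk every NUL-terminated entry of a "compatible" property
// (`len` includes the final NUL) and return the first driver we know.
UartDriver pick_uart_driver(const char* compat, std::size_t len) {
    for (std::size_t pos = 0; pos < len;) {
        const char* entry = compat + pos;
        if (std::strcmp(entry, "arm,pl011") == 0) return UartDriver::Pl011;
        if (std::strcmp(entry, "brcm,bcm2835-aux-uart") == 0) return UartDriver::MiniUart;
        pos += std::strlen(entry) + 1;  // skip past this entry's NUL
    }
    return UartDriver::None;
}
```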
<mrvn>
ideally the raspbootin should listen on every single one because I can't know where I will connect on the next ARM board I will try this on.
<mrvn>
I want something where I can buy a brand new ARM board, flash raspbootin on it as is, and have it give me a prompt when I power it up, no matter what.
<clever>
in my most recent fun, i have been doing this:
<clever>
[nix-shell:~/apps/rpi/lk-overlay]$ make PROJECT=bootcode-fast-vga && ls -lh build-bootcode-fast-vga/lk.bin
<clever>
-rwxr-xr-x 1 clever users 26K Feb 18 21:22 build-bootcode-fast-vga/lk.bin
<clever>
that produces a custom bootcode.bin file (a symlink renames it), and then ships it off to a pi0 in usb-device mode (like DFU on other devices)
<clever>
so, i just plug the pi0 in (or trigger a reset), and it runs the latest build
<clever>
no fussing with SD cards
<mrvn>
nice
<clever>
once i had it doing the desired job, i realized it was only using ~60kb of the binary
<clever>
that leaves plenty of L2 cache to spare, so i just deleted the dram driver :P
<clever>
and now its down to 26kb
<clever>
libc and kernel are the biggest chunks of fat now
<clever>
80000c68 000007da T __adddf3
<clever>
and checking the `vc4-elf-nm build-bootcode-fast-vga/lk.elf -S --size-sort | tail` report, i can see that floating point routines are a large cost
<clever>
which i'm using to compute the pixel clocks
<clever>
but, i have hw float, why am i using double?
<klange>
i am a wee bit bigger than that
<clever>
platform/bcm28xx/dpi/dpi.c:103:45: error: implicit conversion from ‘float’ to ‘double’ to match other operand of binary expression [-Werror=double-promotion] desired_divider = (int)(desired_divider + 0.5);
<clever>
hmmm, how do i tell gcc that 0.5 is a float, not a double?
<klange>
put an f after it?
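klange's fix as a standalone sketch (`round_divider` is a hypothetical name based on the line in the error message). With the `f` suffix the addition stays in single precision, so `-Wdouble-promotion` is satisfied and no double-precision helper is pulled in:

```cpp
#include <cassert>

// 0.5 is a double literal, so `x + 0.5` promotes a float x to double.
// 0.5f keeps the whole expression in float.
// Rounds half up; only meaningful for non-negative inputs.
inline int round_divider(float desired_divider) {
    return static_cast<int>(desired_divider + 0.5f);
}
```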
<clever>
-rwxr-xr-x 1 clever users 23K Feb 18 21:54 build-bootcode-fast-vga/lk.bin
<clever>
perfect, that shakes another 6kb off!
<clever>
__divdf3 is now the next victim
<clever>
-rwxr-xr-x 1 clever users 21K Feb 18 21:57 build-bootcode-fast-vga/lk.bin
<klange>
-rwxrwxr-x 1 klange klange 17M Feb 18 18:44 kernel8.img
<klange>
you are rounding error to my payload :(
<clever>
:D
<clever>
i also just saved a div, without even removing code
<clever>
`static const uint32_t xtal_freq = CRYSTAL;` this was an extern in a .h, and defined in a .c
<klange>
(to be fair... there's a lot of JPGs and PNGs and shit in that 17M...)
<clever>
but it's a constant, so why not declare it as such? now `(double)xtal_freq/1000/1000` gets computed at compile time
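A sketch of why the `static const` change helps, with a hypothetical `CRYSTAL` value (19.2 MHz is used only for illustration). When a header only declares an `extern` constant, other files see just a symbol; with the initializer visible, the conversion and divisions can be folded at compile time. In C++ this can even be checked with `constexpr`:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical crystal frequency; the real CRYSTAL value is board-specific.
#define CRYSTAL 19200000u

// Definition visible in every file that includes the header,
// instead of `extern const uint32_t xtal_freq;`.
static const std::uint32_t xtal_freq = CRYSTAL;

// Evaluated entirely at compile time: no runtime division helpers.
constexpr double xtal_mhz = static_cast<double>(xtal_freq) / 1000 / 1000;
static_assert(xtal_mhz > 19.0 && xtal_mhz < 19.5, "folded at compile time");
```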
<clever>
i only support tga files currently
<clever>
__udivdi3 is now coming up...
<klange>
dividing integers is for weenies
<clever>
i think thats 64bit division, cant really avoid that
<clever>
i have a 32bit div opcode, but no 64bit div opcode
<klange>
i don't even get out of bed unless the divisor is a power of two
<clever>
my hw clock ticks in uSec, but LK wants msec
<clever>
so i need to /1000 things
<klange>
f (and not the float suffix one)
<clever>
_printf_engine is my next biggest cost, and this binary has no way of emitting text
<clever>
if only i had an easy way to disable printf globally...
<klange>
#define printf(...) (void)0;
<klange>
er, minus the semicolon I guess
<klange>
or if you know all your printfs are statement function calls, just `#define printf(...)` and yolo it
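A sketch of the macro approach, placed after `<cstdio>` (or `<stdio.h>`) so the real declarations are not mangled. It expands to `(0)` rather than `((void)0)` because `printf` returns `int`, so code that uses the return value still compiles; the arguments are dropped unevaluated, so side effects inside them disappear too:

```cpp
#include <cassert>
#include <cstdio>

// Compile out every unqualified printf/vprintf call from here on,
// e.g. from a global config header included after <cstdio>.
#define printf(...) (0)
#define vprintf(...) (0)
```

Qualified calls such as `std::printf(...)` break under this macro, so it suits C-style codebases best.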
<clever>
:D
<mrvn>
clever: do you have a printf that does "uint64_t / 10"?
<clever>
mrvn: that is why __umoddi3 is in the binary, longlong_to_string.constprop.0
<clever>
and why i'm trying to put printf on the chopping block :P
<gog>
aaay clang caught a bug for me
<mrvn>
clever: I replaced that with a few shifts and bit ops for a hand crafted / 10. No need to pull in libgcc for that.
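mrvn's own code isn't in the log; as an illustration, this is the shift-and-add `divu10` from Hacker's Delight for 32-bit unsigned values. A 64-bit version, which is what `uint64_t / 10` needs, requires at least one more shift-add step and should be re-verified before use:

```cpp
#include <cassert>
#include <cstdint>

// Unsigned 32-bit division by 10 with shifts, adds and one compare.
// q approximates n * 0.8 (binary 0.110011001100...), then >> 3 gives
// roughly n / 10; the remainder check corrects the truncation error.
inline std::uint32_t divu10(std::uint32_t n) {
    std::uint32_t q = (n >> 1) + (n >> 2);
    q += q >> 4;
    q += q >> 8;
    q += q >> 16;
    q >>= 3;
    std::uint32_t r = n - (((q << 2) + q) << 1);  // n - q * 10
    return q + (r > 9);
}
```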
<gog>
using || when i needed &&
<klange>
null dereference? foo || foo->bar?
<gog>
nah overlapping comparisons
<klange>
oh that's neat... I don't think gcc has anything for those
<gog>
it apparently doesn't because that code's been there for about 30 commits back
<klange>
i'm sure it figures them out as part of optimization, but haven't seen a warning or whatever and I know I've done it once or twice...
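An illustrative example of the kind of overlapping comparison gog describes (gog's actual code isn't shown): with `||` one side is always true, which clang can report via `-Wtautological-overlap-compare`:

```cpp
#include <cassert>

// Bug: every char satisfies at least one side, so this is always true.
bool is_lower_buggy(char c) { return c >= 'a' || c <= 'z'; }

// Fix: both bounds must hold, so use &&.
bool is_lower(char c) { return c >= 'a' && c <= 'z'; }
```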
<bslsk05>
github.com: lk/debug.c at master · littlekernel/lk · GitHub
<clever>
panic also depends on printf!
<clever>
and that module isnt optional
<clever>
so yeah, i would have to neuter printf and vprintf globally, via some header
<clever>
klange: in terms of my binary being a rounding error for you, i dont even need the ram to be online, lol, and its not even using 20% of the L2 cache
<clever>
but it's not quite enough to have a full framebuffer
<clever>
i could maybe get a 320x320 palette based framebuffer
<bslsk05>
stackoverflow.com: math - Divide by 10 using bit shifts? - Stack Overflow
<clever>
mrvn: ive also seen gcc doing similar with high-side mults and bit masking
<clever>
its pretty weird
<clever>
oh, what happens if i hit the LTO button?
<mrvn>
clever: mult is something different and needs higher precision for the mult. x/10 ≈ (x * ((1 << 32) / 10)) >> 32
<mrvn>
clever: sometimes you need other shifts than 32 but 32 is the best for 32bit cpus.
<clever>
yeah
<mrvn>
gcc and clang are pretty good at finding inverse constants so the multiply actually is true for all inputs but they don't always find the same constants.
<mrvn>
The bit shifts and adds (I had) are derived from multiplying with the inverse. You just do the mult with shifts and adds basically.
<mrvn>
1/10 has a nice binary pattern so that works out great.
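A sketch of the multiply-by-inverse version, which also shows why the shift matters. The constant `0xCCCCCCCD` is `ceil(2^35 / 10)`; with a shift of 35 the result is exact for every `uint32_t`, while the truncated `2^32 / 10` with a shift of 32 is already wrong at `x = 10`. The 32x32 to 64-bit multiply is the "high-side mult" clever mentions:

```cpp
#include <cassert>
#include <cstdint>

// x / 10 for every 32-bit x: multiply by ceil(2^35 / 10), keep the top bits.
inline std::uint32_t div10_mul(std::uint32_t x) {
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(x) * 0xCCCCCCCDu) >> 35);
}

// Truncated 2^32 / 10 with a plain >> 32 lacks precision: 10 -> 0.
inline std::uint32_t div10_naive(std::uint32_t x) {
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(x) * (0x100000000ull / 10)) >> 32);
}
```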
<heat>
holy fucking shit I was looking at the wrong thing
<heat>
it was a context switching bug
<clever>
been there, got the t-shirt
<heat>
i got no t-shirt
<heat>
:(
<clever>
in my case, i wrote a bit of critical-section code, that would disable irq's, then enable irq's
<clever>
it got run by the scheduler when irqs were off
<clever>
so it turned irq back on, in the middle of the context switching routine
<clever>
which then context switched in the middle of context switching
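A host-side sketch of that bug and the usual fix, saving and restoring the previous interrupt state instead of unconditionally re-enabling. The global flag and helper functions are hypothetical stand-ins for the arch-specific instructions:

```cpp
#include <cassert>

// Hypothetical stand-ins for the CPU interrupt-enable flag.
static bool g_irqs_enabled = true;

inline bool irq_disable_save() {
    bool was = g_irqs_enabled;
    g_irqs_enabled = false;
    return was;
}

inline void irq_restore(bool was) { g_irqs_enabled = was; }

// Bug: turns IRQs back on even if the caller (e.g. the scheduler in the
// middle of a context switch) had them off.
inline void critical_buggy(int& counter) {
    g_irqs_enabled = false;
    ++counter;
    g_irqs_enabled = true;
}

// Fix: restore exactly the state the caller had.
inline void critical(int& counter) {
    bool was = irq_disable_save();
    ++counter;
    irq_restore(was);
}
```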
<heat>
i was getting issues with corruption of *everything* in the stack
<heat>
looked at the mmu a bunch, no dice
<heat>
(although I did find a cute little bug)
<heat>
it turns out the bug was that I forgot a user thread could be inside the kernel
<heat>
so I was setting the scratch tp register wrong when switching back to user threads in kernel space
<clever>
ah
<heat>
and it wasn't faulting, just wreaking havoc everywhere
<clever>
for me, it faulted, but only if an irq happens at the wrong time in context switching
<clever>
so it was fine until i tried to do irq heavy tasks
<heat>
the havoc really reminded me of TLB problems, so I looked at that a bunch (since I wasn't doing TLB stuff, but I also wasn't really touching memory such that I would need to shootdown stuff)
<heat>
I also tried porting a new allocator just to check
<heat>
now I have a half-finished port of scudo
<heat>
oh also this was somehow triggered by a malloc call in fpu code
<heat>
malloc corrupts everything turned out to be a context switching bug
<heat>
scudo might be a good allocator though
<heat>
dunno about the perf though
<heat>
i know it's supposed to be more secure, and that android, fuchsia and trusty all use it
<heat>
I'm using the old musl malloc, which is just *not good*
ElectronApps has joined #osdev
masoudd has quit [Ping timeout: 272 seconds]
srjek_ has quit [Read error: Connection reset by peer]
<klange>
mine's pretty shit, too, predates all the rest of my OS
<klange>
one of my uni+apple friends helped me with it, smart guy
mahmutov has quit [Ping timeout: 240 seconds]
nyah has quit [Ping timeout: 272 seconds]
heat has quit [Read error: Connection reset by peer]
heat has joined #osdev
heat_ has joined #osdev
heat has quit [Read error: Connection reset by peer]
gog has quit [Ping timeout: 240 seconds]
bradd has joined #osdev
k8yun has joined #osdev
k8yun has quit [Quit: Leaving]
elastic_dog has quit [Ping timeout: 240 seconds]
elastic_dog has joined #osdev
eroux has joined #osdev
xenos1984 has quit [Remote host closed the connection]
<bslsk05>
twitter: <a13xp0p0v> Finished my new security research:   Hacking Zircon microkernel of Fuchsia OS developed by @GoogleOSS .   Sharing the screenshot of the PoC exploit demo!   I'll publish a detailed write-up. https://pbs.twimg.com/media/FL6BIyDX0AElQob.jpg
lkurusa has quit [Client Quit]
ThinkT510 has quit [Quit: WeeChat 3.4]
ThinkT510 has joined #osdev
mahmutov has quit [Ping timeout: 256 seconds]
<klange>
rpi crashed last night; started it up this morning, it ran for 12 hours fine, deployed a new kernel it froze a few minutes in... worried I've got a really unlikely deadlock...
<klange>
gonna run the hvf vm in a similar config and see if I can get the same result where I can attach a debugger...
GeDaMo has joined #osdev
vin has quit [Remote host closed the connection]
<froggey>
I have a magic debug button: press the special key combo and backtraces for all threads get dumped to the serial port
<froggey>
Extremely useful for debugging deadlocks like that on real hardware
<mrvn>
would be nice to run kernel code through some thread sanitizers.
kleinweby has quit [Quit: ZNC 1.6.6+deb1ubuntu0.2 - http://znc.in]
<catern>
is there a name for the observed phenomenon that the CPUs and memory have gotten much further away from disks in speed than they used to be?
<GeDaMo>
Time dilation? :P
<mjg>
i wonder, do you have machinery in your kernels to detect lock ordering issues?
<mjg>
without having to run into actual deadlocks
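One common answer is a lock-order checker in the style of Linux lockdep or FreeBSD's WITNESS. This minimal, single-threaded sketch (class and method names are hypothetical) records "A held while acquiring B" edges and flags an acquisition that would close a cycle, so an inverted order is caught the first time it occurs, before the two orders ever race:

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

class LockOrderChecker {
public:
    // Returns false if acquiring `lock` now contradicts an earlier ordering.
    bool acquire(int lock) {
        bool ok = true;
        for (int held : held_) {
            if (held == lock) continue;
            if (reachable(lock, held)) ok = false;  // lock -> ... -> held exists
            edges_.insert({held, lock});
        }
        held_.push_back(lock);
        return ok;
    }

    void release(int lock) {
        for (auto it = held_.begin(); it != held_.end(); ++it) {
            if (*it == lock) {
                held_.erase(it);
                return;
            }
        }
    }

private:
    // Depth-first search over the recorded ordering edges.
    bool reachable(int from, int to) const {
        std::vector<int> stack{from};
        std::set<int> seen;
        while (!stack.empty()) {
            int n = stack.back();
            stack.pop_back();
            if (n == to) return true;
            if (!seen.insert(n).second) continue;
            for (const auto& e : edges_)
                if (e.first == n) stack.push_back(e.second);
        }
        return false;
    }

    std::vector<int> held_;                // locks currently held
    std::set<std::pair<int, int>> edges_;  // (held, acquired) pairs seen so far
};
```

Real implementations track held locks per thread and group locks into classes rather than individual IDs.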
gog has joined #osdev
<mrvn>
catern: has it actually? With SSDs and M.2 drives, disk speed has leaped. 2GB/s compares much better to memory/cpu speed than the 160MB/s of rotating disks.
<kazinsal>
being an order of magnitude off instead of two orders of magnitude off is definitely an improvement, yeah
<mrvn>
cpu speed also hasn't improved in the last years. You only got more cpus. (basically)
<catern>
i mean since 1970
<mrvn>
do you have data for all that time?
nyah has joined #osdev
orthoplex64 has quit [Ping timeout: 256 seconds]
orthoplex64 has joined #osdev
[itchyjunk] has joined #osdev
mahmutov has joined #osdev
X-Scale` has joined #osdev
X-Scale has quit [Ping timeout: 256 seconds]
X-Scale` is now known as X-Scale
masoudd has joined #osdev
ElectronApps has quit [Read error: Connection reset by peer]
X-Scale` has joined #osdev
X-Scale has quit [Ping timeout: 272 seconds]
X-Scale` is now known as X-Scale
troseman has joined #osdev
Bonstra has joined #osdev
xenos1984 has quit [Remote host closed the connection]
xenos1984 has joined #osdev
myon98 has quit [Ping timeout: 250 seconds]
dude12312414 has joined #osdev
simpl_e has quit [Remote host closed the connection]
zaquest has quit [Ping timeout: 252 seconds]
zaquest has joined #osdev
rustyy has quit [Quit: leaving]
MiningMarsh has quit [Ping timeout: 240 seconds]
rustyy has joined #osdev
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
wootehfoot has joined #osdev
[itchyjunk] has quit [Remote host closed the connection]
Bonstra has quit [Quit: Pouf c'est tout !]
GeDaMo has quit [Remote host closed the connection]
k8yun has joined #osdev
MiningMarsh has joined #osdev
Bonstra has joined #osdev
heat_ has joined #osdev
heat_ is now known as heat
<heat>
updog
<gog>
what's opdog
<heat>
its all good
<heat>
hehehehehe
<heat>
hehehehe
<heat>
hehe
<heat>
he
<j`ey>
at
<heat>
hat
<heat>
i feel compelled to start my arm64 port
heat has quit [Remote host closed the connection]
heat has joined #osdev
matrice64 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<heat>
asm goto in GCC doesn't support output constraints
<heat>
your outputs need to be passed as inputs, and if you change memory, you add memory as a clobber, forcing the compiler to reload everything from memory
<heat>
yuck
<klange>
13 minutes in, instruction abort on a ret... 40 minutes in, instruction abort on a ret...
<mrvn>
why do you want to asm goto?
<heat>
get_user without doing it all in assembly
<mrvn>
Why would that need any asm at all?
<heat>
to be able to take a page fault and recover
<j`ey>
I had a look at linux's 'extable' stuff for that recently, it's cool
<heat>
yes
<heat>
i do something like that
<j`ey>
my coworker refactored the arm64 impl
<j`ey>
to use less asm
<heat>
define a label at the instruction that accesses user memory, .pushsection eh_table and fill out the struct, .popsection
<klange>
if this turns out to be that I needed one instruction before this other instruction imma be mad... i hate these bugs where running a full-blast VM for an hour or more is what it takes to reproduce them...
<bslsk05>
github.com: linux/extable.c at master · torvalds/linux · GitHub
<j`ey>
(unsigned long)&ex->fixup + ex->fixup
<heat>
it's PC relative?
<j`ey>
yeah
<j`ey>
well, relative to the fixup itself
<heat>
that's funny
<heat>
so you get a smaller exception table?
<j`ey>
I assume that's part of it yet
<j`ey>
*yeah
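A sketch of a self-relative exception table, modeled on the `&ex->fixup + ex->fixup` expression from Linux's `extable.c`, with an explicit integer cast so the offset is added in bytes (plain pointer arithmetic on an `int` field would scale it by 4). Each entry is two 32-bit offsets instead of two 64-bit pointers. `ExEntry`, `ex_lookup` and `rel_to` are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

// Each field holds "target minus the address of this field".
struct ExEntry {
    std::int32_t insn;   // faulting instruction, relative to &insn
    std::int32_t fixup;  // recovery address, relative to &fixup
};

inline std::uintptr_t ex_insn(const ExEntry* ex) {
    return reinterpret_cast<std::uintptr_t>(&ex->insn) + ex->insn;
}

inline std::uintptr_t ex_fixup(const ExEntry* ex) {
    return reinterpret_cast<std::uintptr_t>(&ex->fixup) + ex->fixup;
}

// Linear search for the faulting PC; real kernels sort the table
// and binary-search it.
inline const ExEntry* ex_lookup(const ExEntry* table, int count, std::uintptr_t pc) {
    for (int i = 0; i < count; ++i)
        if (ex_insn(&table[i]) == pc) return &table[i];
    return nullptr;
}

// What the assembler/linker normally computes: target relative to field.
inline std::int32_t rel_to(const std::int32_t* field, std::uintptr_t target) {
    return static_cast<std::int32_t>(static_cast<std::intptr_t>(target) -
                                     reinterpret_cast<std::intptr_t>(field));
}
```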
<mrvn>
heat: why don't you check the memory region against your memory information for the task? You don't want the user process accessing bits of memory where only the kernel has access.
<heat>
i check for the limit address
<klange>
30 minutes in, nothing yet...
<klange>
it is absolutely pouring rain outside, tho
<heat>
but checking in the vm tree if it's mapped (and the perms) doesn't work and is slow
<heat>
way faster to optimise for the common case (access goes through) and works with threads
<mrvn>
heat: you don't have a range based VM structure?
<heat>
yes
<heat>
lock_address_space(); do_access(uptr); unlock_address_space(); is slow
<heat>
it also runs into horrible issues when you take a page fault
<heat>
the pf handler needs to somehow know you were holding the lock and doing a user access, as to avoid running into a deadlock
<mrvn>
you have to check the range, not just the pointer. And keep the lock while copying the memory.
<heat>
as I was saying, that is slow
<mrvn>
read lock, so the PF handler can read lock too
<mrvn>
From my point of view the memcpy is the bigger problem, takes longer.
<heat>
the atomic operation for a read lock is still a hell of a lot slower than just doing a cute load
<mrvn>
So your solution is to just copy blindly and hope the MMU pagefaults when the user tries something bad?
<heat>
yes my kernel expects the MMU to work
<mrvn>
except you are in kernel mode, so you have access to kernel only mapped memory.
<heat>
except I literally told you I check for the user address limit
<mrvn>
not sure what is supposed to tell me
<mrvn>
+that
<heat>
if (uptr < USER_ADDRESS_LIMIT) return -EFAULT;
<heat>
er, >
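A sketch of the range check mrvn is asking for, assuming a hypothetical 64-bit `USER_ADDRESS_LIMIT`: it validates the whole `[uptr, uptr + len)` range and cannot be fooled by a length that wraps the end around. Whether the pages are actually mapped is still left to the page-fault path and the exception table, as heat describes:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical exclusive top of user space on a 64-bit target.
constexpr std::uintptr_t USER_ADDRESS_LIMIT = 0x0000800000000000ull;

// Check the whole range, not just the start pointer.
// Because uptr <= limit here, `limit - uptr` cannot underflow, and
// comparing len against it avoids computing an overflowing uptr + len.
inline bool user_range_ok(std::uintptr_t uptr, std::size_t len) {
    if (uptr > USER_ADDRESS_LIMIT) return false;
    return len <= USER_ADDRESS_LIMIT - uptr;
}
```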
<j`ey>
heat: you know aarch64 has some 'unprivileged' instructions for this