klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
<klange> would be nice, I use __seg_gs for x86_64; for extra fun, it's static const = 0;
<klange> I think gcc is producing very similar code against x18 for me, so I've been content
<mrvn> klange: as long as you always access stuff through the pointer. I would like to make x18 part of the type so pointers in structs will be relative to x18.
<mrvn> auto *p = &this_core->foo; should e.g. be 8 and *p compile to 8(x18)
[itchyjunk] has quit [Ping timeout: 240 seconds]
[itchyjunk] has joined #osdev
blockhead has joined #osdev
Matt|home has joined #osdev
<klange> mrvn: I think it does, but I'm away from my desk - I'll check when I get home.
<klange> oh I misread, yeah, especially not in C, but also yes I always access through the pointer and anything else is bound to be nonsensical; if I want other references, like pointers to the values within the structs, those end up as refs through the array that the thread pointers point into
<heat> im so confused
<heat> everything is fucked
<mrvn> klange: I think I will try to trick the compiler into doing this in C++ with some operator overloads, making pointer like objects. I want the peripheral base address in a register on the RPi.
heat has quit [Remote host closed the connection]
<clever> mrvn: mostly, ive just made it a compile time constant, i can make the entire pi0-pi3 range have the same value
<clever> but the MMU can also be used to solve the model differences
<mrvn> pre-mmu stuff.
<clever> yeah, that becomes a bit more of an issue
<mrvn> are you tweaking the config to make them all use the same base address? How?
<clever> there is a broadcom custom mmu between "arm physical" and the real bus
<clever> which can remap 16mb pages
<mrvn> I know, but how do I change the mapping?
<clever> you would need to use the open firmware
<mrvn> or do you mean your own VC firmware?
<mrvn> ahh, ok
<clever> the closed firmware doesnt give you the choice
<clever> and picks a random number out of a hat every time you jump models :P
<mrvn> I like getting the address from the device tree. Have to do that on other archs too.
<clever> if you're respecting DT properly, every single peripheral gets its own base-addr, and you have a dozen variables, for the base of each hw block
<clever> so you're always doing local-base + reg-offset
<mrvn> hmm, true.
<clever> but if you're ignoring DT like i am, you can hard-code the right addr, and it just becomes a 32bit constant to load
<mrvn> the local-base won't be a compile time constant so there is no gain in basing it off a fixed register.
<clever> the compiler may cheat, load a common base, and then do short offsets
<mrvn> *sigh* there goes that idea.
<clever> the register idea only works if you're not respecting DT
<mrvn> *nod*
<clever> LK solves this whole issue, by just not doing mmio until the mmu has been setup
<mrvn> using a special pointer like object will be useful though. Encapsulate the volatile access and such automatically.
<clever> linux mostly does the same thing, but early-printk can either compile a phys addr into the kernel, or set a phys addr in the kernel cmdline
<clever> rpi-open-firmware is doing things in a few layers
<clever> #define SCALER_DISPCTRL HW_REGISTER_RW( 0x7e400000 )
<clever> first, you define a register like this
<mrvn> It's for raspbootin, my chain loader to load the real kernel over serial. So I don't want to change anything in the hardware I don't have to before the real kernel starts. But I need serial IO.
<clever> for VPU side code, that is the real address, so you just #define HW_REGISTER_RW(addr) (*(volatile uint32_t *)(addr))
<mrvn> I want a single raspbootin image that boot on all my arm devices no matter what.
<clever> for ARM baremetal, you have to translate things, so you:
<clever> # define HW_REGISTER_RW(addr) (*(volatile uint32_t *)(VC4_TO_ARM_PERIPH(addr)))
<clever> #define VC4_TO_ARM_PERIPH(addr) ((addr - VC4_PERIPH_BASE) + ARM_PERIPH_BASE)
<clever> but, i added more magic:
<clever> # define HW_REGISTER_RW(addr) (*(volatile uint32_t *)(mmiobase + (addr & 0x00ffffff)))
<clever> mrvn: this assumes you have an mmiobase variable in scope, and references everything off that!
Brnocrist has quit [Ping timeout: 240 seconds]
<clever> this lets me access MMIO from linux userland (mmap /dev/mem) or what you're wanting, just set a var after you detect the model
<mrvn> but as you said every peripheral base address is variable, coming from the DT
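The mmiobase variant of the macro can be tried on a host machine. This is only a sketch under stated assumptions: `mmiobase` is taken to be a byte pointer set once at startup, and the mask assumes every register sits inside one 16 MiB peripheral window, as the 0x00ffffff in the macro above implies.

```c
#include <stdint.h>

/* Assumption: set once at startup, after detecting the board model
 * (or after mmap()ing /dev/mem from Linux userland). */
static volatile uint8_t *mmiobase;

/* Keep only the offset inside the 16 MiB peripheral window of the VC4 bus
 * address, then add it to the runtime base. */
#define HW_REGISTER_RW(addr) \
    (*(volatile uint32_t *)(mmiobase + ((addr) & 0x00ffffffu)))

/* Register definition as quoted from rpi-open-firmware in the log. */
#define SCALER_DISPCTRL HW_REGISTER_RW(0x7e400000)
```

With this shape, `SCALER_DISPCTRL = value;` compiles to one load of `mmiobase` plus a store at a constant offset, so the same binary can run wherever the peripheral window ends up.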
<klange> what if peripheral base... as externally defined symbol, with self-relocations? fill it in on load... :thinking-emoji:
<clever> heh
* gog contemplates becoming one of the ARM kool kids
<klange> we've got delicious pi
<gog> that is alluring ngl
<clever> mrvn: u-boot doesnt even try to solve the problem you're solving, they just compile a new copy for each model, and the config.txt syntax can be told to load the right one
Brnocrist has joined #osdev
<mrvn> The way this currently works is that my peripherals have a placement new, where the argument is the base address of the peripheral. The constructor then initializes the periphery.
<mrvn> auto uart = new(dt->find_base_address("serial")) Uart(115200);
<clever> mrvn: but the pi has between 2 and 6? uarts, so you want something more loop based
<mrvn> clever: only one serial alias
<clever> serial0 and serial1 aliases last i looked
<mrvn> but yeah, the real code has some fallbacks looking for different things till it finds a serial.
<clever> and you could wind up needing either driver
<clever> serial0 is always the one on the gpio header
<clever> serial1 is always the one on the bt controller
<clever> but, which one is the PL011 and which is the mini-uart, varies, depending on config.txt entries
<clever> so you need to read the compatible, and spawn the right driver
<mrvn> ideally the raspbootin should listen on every single one because I can't know where I will connect on the next ARM board I will try this on.
<mrvn> I want something where I can buy a brand new ARM board, flash raspbootin on it as is and have it give me a prompt when I power it up no matter what.
<clever> in my most recent fun, i have been doing this:
<clever> [root@amd-nixos:/home/clever/apps/rpi/usbboot]# ./rpiboot -d /home/clever/apps/rpi/lk-overlay/build-bootcode-fast-vga/
<clever> [nix-shell:~/apps/rpi/lk-overlay]$ make PROJECT=bootcode-fast-vga && ls -lh build-bootcode-fast-vga/lk.bin
<clever> -rwxr-xr-x 1 clever users 26K Feb 18 21:22 build-bootcode-fast-vga/lk.bin
<clever> that produces a custom bootcode.bin file (a symlink renames it), and then ships it off to a pi0 in usb-device mode (like DFU on other devices)
<clever> so, i just plug the pi0 in (or trigger a reset), and it runs the latest build
<clever> no fussing with SD cards
<mrvn> nice
<clever> once i had it doing the desired job, i realized it was only using ~60kb of the binary
<clever> that leaves plenty of L2 cache to spare, so i just deleted the dram driver :P
<clever> and now its down to 26kb
<clever> libc and kernel are the biggest chunks of fat now
<clever> 80000c68 000007da T __adddf3
<clever> and checking the `vc4-elf-nm build-bootcode-fast-vga/lk.elf -S --size-sort | tail` report, i can see that floating point routines are a large cost
<clever> which i'm using to compute the pixel clocks
<clever> but, i have hw float, why am i using double?
<klange> i am a wee bit bigger than that
<clever> platform/bcm28xx/dpi/dpi.c:103:45: error: implicit conversion from ‘float’ to ‘double’ to match other operand of binary expression [-Werror=double-promotion] desired_divider = (int)(desired_divider + 0.5);
<clever> hmmm, how do i tell gcc that 0.5 is a float, not a double?
<klange> put an f after it?
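The suffix fix looks like this in isolation. It is a minimal sketch: `round_divider` is a hypothetical helper, not the code from dpi.c. An unsuffixed `0.5` has type double, so the float operand gets promoted (which is what -Wdouble-promotion complains about) and the addition is done in double; `0.5f` keeps the whole expression in float.

```c
/* Round a non-negative float to the nearest int without leaving float.
 * Hypothetical helper for illustration only. */
static int round_divider(float desired_divider)
{
    /* 0.5f is a float literal; a plain 0.5 would be a double and would
     * pull in the soft-double add routine on targets without hw double. */
    return (int)(desired_divider + 0.5f);
}
```

On a target with hardware float but software double, keeping the literal in float is what lets the linker drop routines such as `__adddf3`.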
<clever> -rwxr-xr-x 1 clever users 23K Feb 18 21:54 build-bootcode-fast-vga/lk.bin
<clever> perfect, that shakes another 6kb off!
<clever> __divdf3 is now the next victim
<clever> -rwxr-xr-x 1 clever users 21K Feb 18 21:57 build-bootcode-fast-vga/lk.bin
<klange> -rwxrwxr-x 1 klange klange 17M Feb 18 18:44 kernel8.img
<klange> you are rounding error to my payload :(
<clever> :D
<clever> i also just saved a div, without even removing code
<clever> `static const uint32_t xtal_freq = CRYSTAL;` this was an extern in a .h, and defined in a .c
<klange> (to be fair... there's a lot of JPGs and PNGs and shit in that 17M...)
<clever> but its a constant, so why not declare it as such? now `(double)xtal_freq/1000/1000` gets computed at compile time
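A sketch of the extern-to-constant change, assuming a hypothetical 19.2 MHz value for `CRYSTAL` (the real define lives in the platform headers). With an `extern` the compiler cannot see the value and has to divide at run time; with a `static const` initialised in the same translation unit, an optimising compiler will normally fold the division away, although C does not strictly require it to.

```c
#include <stdint.h>

/* Assumption: placeholder value, not taken from the log. */
#define CRYSTAL 19200000u

/* Visible to the compiler in every file that includes this definition. */
static const uint32_t xtal_freq = CRYSTAL;

/* With optimisation enabled this usually becomes a single constant load. */
static double xtal_mhz(void)
{
    return (double)xtal_freq / 1000 / 1000;
}
```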
<clever> i only support tga files currently
<clever> __udivdi3 is now coming up...
<klange> dividing integers is for weenies
<clever> i think thats 64bit division, cant really avoid that
<clever> i have a 32bit div opcode, but no 64bit div opcode
<klange> i don't even get out of bed unless the divisor is a power of two
<clever> my hw clock ticks in uSec, but LK wants msec
<clever> so i need to /1000 things
<klange> f (and not the float suffix one)
<clever> _printf_engine is my next biggest cost, this binary has zero way of emitting text
<clever> if only i had an easy way to disable printf globally...
<klange> #define printf(...) (void)0;
<klange> er, minus the semicolon I guess
<klange> or if you know all your printfs are statement function calls, just `#define printf(...)` and yolo it
<clever> :D
<mrvn> clever: do you have a printf that does "uint64_t / 10"?
<clever> mrvn: that is why __umoddi3 is in the binary, longlong_to_string.constprop.0
<clever> and why i'm trying to put printf on the chopping block :P
<gog> aaay clang caught a bug for me
<mrvn> clever: I replaced that with a few shifts and bit ops for a hand crafted / 10. No need to pull in libgcc for that.
<gog> using || when i needed &&
<klange> null dereference? foo || foo->bar?
<gog> nah overlapping comparisons
<klange> oh that's neat... I don't think gcc has anything for those
<gog> it apparently doesn't because that code's been there for about 30 commits back
<klange> i'm sure it figures them out as part of optimization, but haven't seen a warning or whatever and I know I've done it once or twice...
<bslsk05> ​github.com: lk/printf.c at master · littlekernel/lk · GitHub
<gog> i don't think it does i have all warnings turned on
<clever> mrvn: yeah, its the %10, not the /10 that does the __umoddi3
<bslsk05> ​github.com: lk/debug.c at master · littlekernel/lk · GitHub
<clever> panic also depends on printf!
<clever> and that module isnt optional
<clever> so yeah, i would have to neuter printf and vprintf globally, via some header
<clever> klange: in terms of my binary being a rounding error for you, i dont even need the ram to be online, lol, and its not even using 20% of the L2 cache
<clever> but its not quite enough to have a full framebuffer
<clever> i could maybe get a 320x320 palette based framebuffer
<clever> or RGB332
<mrvn> clever: can't find my code for uint64_t div10 but it's like this: https://stackoverflow.com/questions/5558492/divide-by-10-using-bit-shifts 2nd answer
<bslsk05> ​stackoverflow.com: math - Divide by 10 using bit shifts? - Stack Overflow
<clever> mrvn: ive also seen gcc doing similar with high-side mults and bit masking
<clever> its pretty weird
<clever> oh, what happens if i hit the LTO button?
<mrvn> clever: mult is something different and needs higher precision for the mult. x/10 == (x * ((1 << 32) / 10)) >> 32, up to rounding of the constant
<mrvn> clever: sometimes you need other shifts than 32 but 32 is the best for 32bit cpus.
<clever> yeah
<mrvn> gcc and clang are pretty good at finding inverse constants so the multiply actually is true for all inputs but they don't always find the same constants.
<mrvn> The bit shifts and adds (I had) are derived from multiplying with the inverse. You just do the mult with shifts and adds basically.
<mrvn> 1/10 has a nice binary pattern so that works out great.
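A shift-and-add divide by 10 along these lines can be written without any call into libgcc. This is not mrvn's lost code; it is a sketch of the well-known shift-and-add pattern, widened to 64 bits by one extra step (`q >> 32`). The shifts build up n * 0.8 from the repeating binary pattern of 1/10, and one final correction fixes the estimate when it comes out one short.

```c
#include <stdint.h>

/* Unsigned 64-bit divide by 10 using only shifts, adds and one compare. */
static uint64_t div10_u64(uint64_t n)
{
    uint64_t q, r;

    q = (n >> 1) + (n >> 2);        /* q ~= n * 0.75 */
    q += q >> 4;
    q += q >> 8;
    q += q >> 16;
    q += q >> 32;                   /* q ~= n * 0.8 */
    q >>= 3;                        /* q ~= n / 10, never too large */
    r = n - (((q << 2) + q) << 1);  /* r = n - q * 10 */
    return q + (r > 9);             /* add 1 if the estimate was one short */
}
```

The remainder for a printf-style digit loop falls out of the same values: it is `r` when `r <= 9`, and `r - 10` otherwise, so no separate modulo is needed.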
pretty_dumm_guy has quit [Quit: WeeChat 3.4]
pretty_dumm_guy has joined #osdev
heat has joined #osdev
pretty_dumm_guy has quit [Client Quit]
<bslsk05> ​gcc.gnu.org: LTO Overview (GNU Compiler Collection (GCC) Internals)
orthoplex64 has joined #osdev
<heat> holy fucking shit I was looking at the wrong thing
<heat> it was a context switching bug
<clever> been there, got the t-shirt
<heat> i got no t-shirt
<heat> :(
<clever> in my case, i wrote a bit of critical-section code, that would disable irq's, then enable irq's
<clever> it got ran by the scheduler when irq's were off
<clever> so it turned irq back on, in the middle of the context switching routine
<clever> which then context switched in the middle of context switching
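The usual way out of this class of bug is to save the previous interrupt state and restore it, rather than enabling unconditionally. The sketch below only models the idea on a host: `irq_enabled` is a plain variable standing in for the CPU interrupt flag, and all names are hypothetical.

```c
#include <stdbool.h>

/* Stand-in for the CPU interrupt-enable flag (assumption for illustration). */
static bool irq_enabled = true;

typedef bool irq_state_t;

/* Disable interrupts and report whether they were enabled before. */
static irq_state_t irq_save_disable(void)
{
    irq_state_t old = irq_enabled;
    irq_enabled = false;
    return old;
}

/* Put the flag back exactly as the caller found it. */
static void irq_restore(irq_state_t old)
{
    irq_enabled = old;
}

static int counter;

/* Critical section that is safe to call with interrupts on or off. */
static void bump_counter(void)
{
    irq_state_t state = irq_save_disable();
    counter++;
    irq_restore(state);
}
```

Called from the scheduler with interrupts already off, `bump_counter` leaves them off, so the context switch cannot be re-entered halfway through.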
<heat> i was getting issues with corruption of *everything* in the stack
<heat> looked at the mmu a bunch, no dice
<heat> (although I did find a cute little bug)
<heat> it turns out the bug was that I forgot a user thread could be inside the kernel
<heat> so I was setting the scratch tp register wrong when switching back to user threads in kernel space
<clever> ah
<heat> and it wasn't faulting, but just wreaking havoc everywhere
<clever> for me, it faulted, but only if an irq happens at the wrong time in context switching
<clever> so it was fine until i tried to do irq heavy tasks
<heat> the havoc really reminded me of TLB problems, so I looked at that a bunch (since I wasn't doing TLB stuff, but I also wasn't really touching memory such that I would need to shootdown stuff)
<heat> I also tried porting a new allocator just to check
<heat> now I have a half-finished port of scudo
<heat> oh also this was somehow triggered by a malloc call in fpu code
<heat> malloc corrupts everything turned out to be a context switching bug
<heat> scudo might be a good allocator though
<heat> dunno about the perf though
<heat> i know it's supposed to be more secure, and that both android, fuchsia and trusty use it
<heat> I'm using the old musl malloc, which is just *not good*
ElectronApps has joined #osdev
masoudd has quit [Ping timeout: 272 seconds]
srjek_ has quit [Read error: Connection reset by peer]
<klange> mine's pretty shit, too, predates all the rest of my OS
<klange> one of my uni+apple friends helped me with it, smart guy
mahmutov has quit [Ping timeout: 240 seconds]
nyah has quit [Ping timeout: 272 seconds]
heat has quit [Read error: Connection reset by peer]
heat has joined #osdev
heat_ has joined #osdev
heat has quit [Read error: Connection reset by peer]
gog has quit [Ping timeout: 240 seconds]
bradd has joined #osdev
k8yun has joined #osdev
k8yun has quit [Quit: Leaving]
elastic_dog has quit [Ping timeout: 240 seconds]
elastic_dog has joined #osdev
eroux has joined #osdev
xenos1984 has quit [Remote host closed the connection]
xenos1984 has joined #osdev
fkrauthan has quit [Quit: ZNC - https://znc.in]
fkrauthan has joined #osdev
heat_ has quit [Ping timeout: 245 seconds]
mahmutov has joined #osdev
eroux has quit [Ping timeout: 272 seconds]
chronon has left #osdev [#osdev]
[itchyjunk] has quit [Read error: Connection reset by peer]
copier_ has joined #osdev
copier_ has left #osdev [#osdev]
lkurusa has joined #osdev
<bslsk05> twitter: <a13xp0p0v> Finished my new security research: Hacking Zircon microkernel of Fuchsia OS developed by @GoogleOSS. Sharing the screenshot of the PoC exploit demo! I'll publish a detailed write-up. https://pbs.twimg.com/media/FL6BIyDX0AElQob.jpg
lkurusa has quit [Client Quit]
ThinkT510 has quit [Quit: WeeChat 3.4]
ThinkT510 has joined #osdev
mahmutov has quit [Ping timeout: 256 seconds]
<klange> rpi crashed last night; started it up this morning, it ran for 12 hours fine, deployed a new kernel it froze a few minutes in... worried I've got a really unlikely deadlock...
<klange> gonna run the hvf vm in a similar config and see if I can get the same result where I can attach a debugger...
GeDaMo has joined #osdev
vin has quit [Remote host closed the connection]
<froggey> I have a magic debug button, press the special key combo and back traces for all threads get dumped to the serial port
<froggey> Extremely useful for debugging deadlocks like that on real hardware
<mrvn> would be nice to run kernel code through some thread sanitizers.
kleinweby has quit [Quit: ZNC 1.6.6+deb1ubuntu0.2 - http://znc.in]
<mjg> linux has kcsan
<j`ey> now on arm64 too!
<mjg> oh?
<bslsk05> ​github.com: arm64: Enable KCSAN · torvalds/linux@dd03762 · GitHub
<mjg> nice
pretty_dumm_guy has joined #osdev
<catern> is there a name for the observed phenomenon that the CPUs and memory have gotten much further away from disks in speed than they used to be?
<GeDaMo> Time dilation? :P
<mjg> i wonder, do you have machinery in your kernels to detect lock ordering issues?
<mjg> without having to run into actual deadlocks
gog has joined #osdev
<mrvn> catern: has it actually? With SSDs and M2.key disk speed has leaped. 2GB/s compares much better to memory/cpu speed than the 160MB/s of rotating disks.
<kazinsal> being an order of magnitude off instead of two orders of magnitude off is definitely an improvement, yeah
<mrvn> cpu speed also hasn't improved in the last years. You only got more cpus. (basically)
<catern> i mean since 1970
<mrvn> do you have data for all that time?
nyah has joined #osdev
orthoplex64 has quit [Ping timeout: 256 seconds]
orthoplex64 has joined #osdev
[itchyjunk] has joined #osdev
mahmutov has joined #osdev
X-Scale` has joined #osdev
X-Scale has quit [Ping timeout: 256 seconds]
X-Scale` is now known as X-Scale
masoudd has joined #osdev
ElectronApps has quit [Read error: Connection reset by peer]
X-Scale` has joined #osdev
X-Scale has quit [Ping timeout: 272 seconds]
X-Scale` is now known as X-Scale
troseman has joined #osdev
Bonstra has joined #osdev
xenos1984 has quit [Remote host closed the connection]
xenos1984 has joined #osdev
myon98 has quit [Ping timeout: 250 seconds]
dude12312414 has joined #osdev
simpl_e has quit [Remote host closed the connection]
zaquest has quit [Ping timeout: 252 seconds]
zaquest has joined #osdev
rustyy has quit [Quit: leaving]
MiningMarsh has quit [Ping timeout: 240 seconds]
rustyy has joined #osdev
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
wootehfoot has joined #osdev
[itchyjunk] has quit [Remote host closed the connection]
Bonstra has quit [Quit: Pouf c'est tout !]
GeDaMo has quit [Remote host closed the connection]
k8yun has joined #osdev
MiningMarsh has joined #osdev
Bonstra has joined #osdev
heat_ has joined #osdev
heat_ is now known as heat
<heat> updog
<gog> what's opdog
<heat> its all good
<heat> hehehehehe
<heat> hehehehe
<heat> hehe
<heat> he
<j`ey> at
<heat> hat
<heat> i feel compelled to start my arm64 port
heat has quit [Remote host closed the connection]
heat has joined #osdev
matrice64 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<heat> asm goto in GCC doesn't support output constraints
<heat> your outputs need to be passed as inputs, and if you change memory, you add memory as a clobber, forcing the compiler to reload everything from memory
<heat> yuck
<klange> 13 minutes in, instruction abort on a ret... 40 minutes in, instruction abort on a ret...
<mrvn> why do you want to asm goto?
<heat> get_user without doing it all in assembly
<mrvn> Why would that need any asm at all?
<heat> to be able to take a page fault and recover
<j`ey> I had a look at linux's 'extable' stuff for that recently, it's cool
<heat> yes
<heat> i do something like that
<j`ey> my coworker refactored the arm64 impl
<j`ey> to use less asm
<heat> define a label at the instruction that accesses user memory, .pushsection eh_table and fill out the struct, .popsection
<klange> if this turns out to be that I needed one instruction before this other instruction imma be mad... i hate these bugs where running a full-blast VM for an hour or more is what it takes to reproduce them...
<bslsk05> ​github.com: linux/extable.c at master · torvalds/linux · GitHub
<j`ey> &ex->fixup + ex->fixup
<heat> it's PC relative?
<j`ey> yeah
<j`ey> well, relative to the fixup itself
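The self-relative encoding can be sketched as follows. The struct layout and helper names are assumptions for illustration, not the kernel's definitions; the point is only that each field stores a signed 32-bit distance from its own address, so the table needs no absolute 64-bit pointers.

```c
#include <stdint.h>

/* Hypothetical entry: both fields are offsets relative to themselves. */
struct ex_entry {
    int32_t insn;   /* offset to the instruction that may fault */
    int32_t fixup;  /* offset to the recovery code */
};

/* Decode: address of the field plus the offset stored in it. */
static uintptr_t ex_fixup_addr(const struct ex_entry *ex)
{
    return (uintptr_t)&ex->fixup + ex->fixup;
}

/* Encode: only valid when the target lies within +/- 2 GiB of the field. */
static void ex_set_fixup(struct ex_entry *ex, uintptr_t target)
{
    ex->fixup = (int32_t)(intptr_t)(target - (uintptr_t)&ex->fixup);
}
```

Two 32-bit offsets per entry take half the space of two 64-bit pointers, and the table stays valid if the whole image is loaded at a different address.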
<heat> that's funny
<heat> so you get a smaller exception table?
<j`ey> I assume that's part of it yet
<j`ey> *yeah
<mrvn> heat: why don't you check the memory region against your memory information for the task? You don't want the user process accessing bits of memory where only the kernel has access.
<heat> i check for the limit address
<klange> 30 minutes in, nothing yet...
<klange> it is absolutely pouring rain outside, tho
<heat> but checking in the vm tree if it's mapped (and the perms) doesn't work and is slow
<heat> way faster to optimise for the common case (access goes through) and works with threads
<mrvn> heat: you don't have a range based VM structure?
<heat> yes
<heat> lock_address_space(); do_access(uptr); unlock_address_space(); is slow
<heat> it also runs into horrible issues when you take a page fault
<heat> the pf handler needs to somehow know you were holding the lock and doing a user access, as to avoid running into a deadlock
<mrvn> you have to check the range, not just the pointer. And keep the lock while copying the memory.
<heat> as I was saying, that is slow
<mrvn> read lock, so the PF handler can read lock too
<mrvn> From my point of view the memcpy is the bigger problem, takes longer.
<heat> the atomic operation for a read lock is still a hell of a lot slower than just doing a cute load
<mrvn> So your solution is to just copy blindly and hope the MMU pagefaults when the user tries something bad?
<heat> yes my kernel expects the MMU to work
<mrvn> except you are in kernel mode, so you have access to kernel only mapped memory.
<heat> except I literally told you I check for the user address limit
<mrvn> not sure what is supposed to tell me
<mrvn> +that
<heat> if (uptr < USER_ADDRESS_LIMIT) return -EFAULT;
<heat> er, >
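Combining the limit check with mrvn's point about checking the whole range gives something like the sketch below. The limit value and the function name are assumptions, not heat's code; the comparison is written so that `uptr + len` is never computed and therefore cannot wrap around.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: first address past user space on a 48-bit layout. */
#define USER_ADDRESS_LIMIT 0x0000800000000000ull

/* True if [uptr, uptr + len) lies entirely below the user limit. */
static bool user_range_ok(uint64_t uptr, uint64_t len)
{
    return uptr <= USER_ADDRESS_LIMIT &&
           len <= USER_ADDRESS_LIMIT - uptr;
}
```

A caller would return -EFAULT when this check fails and otherwise do the access, relying on the page fault recovery path for addresses that pass the check but are not mapped.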
<j`ey> heat: you know aarch64 has some 'unprivileged' instructions for this
<j`ey> LDTR etc
<heat> cool
<bslsk05> ​developer.arm.com: Documentation – Arm Developer
<heat> riscv and x86 have turn on, turn off user access instructions
<mrvn> heat: this is all so much simpler if you don't have threads
<klange> 42 minutes, doing fine...
<heat> mrvn, nah this is pretty simple
<bslsk05> ​github.com: Onyx/usercopy.cpp at master · heatd/Onyx · GitHub
<mrvn> klange: "Absence of evidence is not evidence of absence."
<heat> then I have a piece of code that gets a recovery PC if I take an exception and it exists
<klange> please, my blood pressure is already over 150, I don't need reminders that this situation is f***ed
<heat> it's like C++ exceptions but way simpler
<mrvn> klange: you have a special get 64bit from user?
<mrvn> heat^^
<heat> yes
<mrvn> heat: would have thought the compiler optimizes the generic path to that if the size is 8.
<heat> the compiler can't see into assembly, and it doesn't technically work for x86
<heat> copy_from/to_user use rep movsb, and those have weird caching properties
<mrvn> heat: as discussed yesterday that's slower for <256 byte.
<heat> I don't have SIMD
<heat> this is the kernel
<mrvn> hence 256 and not 8k or what it was for simd
<heat> no it's 256 for SIMD
<mrvn> must be remembering it wrong then, it was 4 in the morning the other day.
<mrvn> heat: where is the comparison between rep and just movs?
<heat> it's not
<heat> but "Using ERMSB always delivers better performance than using REP MOVSD+B"
<mrvn> I think I will try this: define a fixed size byte array in a struct of the right size, cast the pointers, assign it.
<mrvn> Let the compiler insert the best code to copy a block of memory of known size.
<mrvn> s/byte/alignment appropriate type/
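The struct-assignment idea can be sketched like this, assuming a 64-byte block of `uint64_t` as the alignment appropriate type (both the size and the names are made up for illustration). Because the size is fixed at compile time, the compiler is free to pick whatever copy sequence it considers best for the target.

```c
#include <stdint.h>

/* Fixed-size block: 8 x uint64_t = 64 bytes, uint64_t-aligned. */
struct block64 {
    uint64_t words[8];
};

/* Copy one block by struct assignment. Both pointers must be suitably
 * aligned and point at uint64_t-compatible storage (or the code must be
 * built with -fno-strict-aliasing, as kernels commonly are). */
static void copy_block64(void *dst, const void *src)
{
    *(struct block64 *)dst = *(const struct block64 *)src;
}
```

This does nothing about page faults on its own; for user memory it would still need the same fault recovery as any other access.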
<klange> 1h10m on this vm running a workload with a lot of yield calls and the suspected fix in place, so far so good, also spinning up the rpi again
<mrvn> klange: does your fix include a check for the condition that was a bug and log "Fix triggered"?
<klange> the fix would happen thousands of times a second, so no