<bslsk05>
github.com: linux/binfmt-misc.rst at master · torvalds/linux · GitHub
<mrvn>
heat: says who?
<heat>
the kernel
<clever>
where the kernel will pre-open a filehandle to the binfmt-misc interpreter, and then execve by handle not path
<clever>
which allows executing a binary from outside a chroot env
<mrvn>
heat: you can't change the flags between pages but nothing really stops me from having: .init, .plt, .text with 16 byte alignment and all executable.
<clever>
or is that "fix binary", they both use similar wording, bit confusing when you skim read
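For reference, binfmt-misc.rst describes a registration string of the form `:name:type:offset:magic:mask:interpreter:flags`, and the `F` (fix binary) flag in the last field is the one that makes the kernel open the interpreter at registration time. A minimal sketch of building such a string; the rule name, extension and interpreter path are made up for illustration:

```python
def binfmt_rule(name, kind, magic, interpreter, flags="", offset="", mask=""):
    """Build a binfmt_misc registration string.

    kind is "M" (match by magic bytes) or "E" (match by file extension).
    For "E" rules the magic field holds the extension and offset/mask stay empty.
    """
    if kind not in ("M", "E"):
        raise ValueError("kind must be 'M' or 'E'")
    fields = [name, kind, offset, magic, mask, interpreter, flags]
    for field in fields:
        if ":" in field:
            raise ValueError("fields may not contain the ':' delimiter")
    return ":" + ":".join(fields)


# Hypothetical rule: run *.demo files through /usr/bin/demo-interp, with "F"
# so the interpreter is opened when the rule is registered.
rule = binfmt_rule("demo", "E", "demo", "/usr/bin/demo-interp", flags="F")
print(rule)
```

On a real system the string would be written to `/proc/sys/fs/binfmt_misc/register`; the sketch only formats it.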
<mrvn>
heat: or stuffing some of the elf headers into the holes
<clever>
there is also a blog post i have related to this question...
<bslsk05>
dbp-consulting.com: Linux x86 Program Start Up
<clever>
there it is
<mrvn>
clever: which part of that big pretty picture is supposed to be kernel?
<clever>
ok, lets see, so execution begins at _start, which i believe is in the dynamic binary, even when using ld.so, and you call out to a symbol in it
<mrvn>
clever: when you get to _start all the kernel and loader is already done
<clever>
yeah, so that makes me wonder what ran the preinit array...
<mrvn>
clever: we need everything before "loader"
<clever>
> But first, how do we get to _start?
<clever>
> To summarize, it will set up a stack for you, and push onto it argc, argv, and envp. The file descriptions 0, 1, and 2, (stdin, stdout, stderr), are left to whatever the shell set them to.
<mrvn>
I know you can do "loader /path/binary" and it runs the program. So my guess would be that the kernel just maps the loader and stuffs the original binary into args[1]
<clever>
it also goes on to explain the auxv, which is in the stack
<clever>
mrvn: except you can execute 1 binary but set argv[0] to point to something completely different
<mrvn>
clever: so? nobody said anything about arg[0]
<clever>
ah, you said args, not argv
<mrvn>
typo
<clever>
try `LD_SHOW_AUXV=1 ls`
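The vector that `LD_SHOW_AUXV=1` prints is a run of (tag, value) word pairs ending in `AT_NULL`. A rough sketch of decoding one from raw bytes, assuming a 64-bit little-endian layout by default; the sample addresses are invented:

```python
import struct

# Auxiliary vector tags from <elf.h>.
AT_NULL, AT_PHDR, AT_PHENT, AT_PHNUM, AT_BASE, AT_ENTRY = 0, 3, 4, 5, 7, 9


def parse_auxv(raw, word_size=8, little_endian=True):
    """Decode a raw auxv blob into {tag: value}, stopping at AT_NULL."""
    fmt = ("<" if little_endian else ">") + ("QQ" if word_size == 8 else "II")
    usable = len(raw) - len(raw) % (2 * word_size)  # drop any trailing partial pair
    entries = {}
    for tag, value in struct.iter_unpack(fmt, raw[:usable]):
        if tag == AT_NULL:
            break
        entries[tag] = value
    return entries


# Synthetic blob with made-up values, laid out as (tag, value) pairs.
sample = struct.pack("<8Q", AT_PHDR, 0x400040, AT_PHENT, 56, AT_PHNUM, 13, AT_NULL, 0)
auxv = parse_auxv(sample)
print(hex(auxv[AT_PHDR]), auxv[AT_PHENT], auxv[AT_PHNUM])
```

On Linux the same function can be fed the contents of `/proc/self/auxv` to look at the running process's own vector.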
<clever>
> The AT_PHDR is the location of the ELF program header that has information about the location of all the segments of the program in memory and about relocation entries, and anything else a loader needs to know.
<heat>
mrvn, ok so that possibility you talked about isn't a problem
<mrvn>
clever: inconclusive. Is that loaded by the kernel or the ld.so?
<clever>
3450: 00000000000278c0 33 FUNC LOCAL DEFAULT 15 __libc_main
<clever>
csu/version.c:__libc_main (void)
<mrvn>
clever: I don't know if any other lib has this too
<heat>
musl does
terrorjack has joined #osdev
<mrvn>
or if they even can
<heat>
yes they can
<mrvn>
heat: any non libc lib
<heat>
you can always set your shared object's entry
<clever>
it just calls a thin (inlinable) function, that just write()'s a single blob and exits, zero logic
<heat>
clever, anyway, yes shared libraries can be executable
<heat>
there's no such thing as a shared lib really
<heat>
ELF classifies your binaries as DYN or EXEC
<clever>
heat: i ran into trouble before, when i tried to dlopen an EXEC
<heat>
DYN need to have their base relocated (PIE or shared lib)
<heat>
EXEC don't
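The DYN/EXEC distinction is a single 16-bit field, `e_type`, at offset 16 of the ELF header. A small sketch of reading it; `fake_header` is a helper invented here so the example runs without a real file:

```python
import struct

ET_NAMES = {0: "NONE", 1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}


def elf_type(header):
    """Return the e_type of an ELF file given at least its first 18 bytes."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return ET_NAMES.get(e_type, hex(e_type))


def fake_header(e_type):
    """Build a minimal 64-bit little-endian e_ident plus e_type for the demo."""
    e_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    return e_ident + struct.pack("<H", e_type)


print(elf_type(fake_header(3)))  # PIE executables and shared objects both land here
print(elf_type(fake_header(2)))
```

`readelf -h` reports the same field as "Type:".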
<clever>
it wouldnt let me, so i had to implement my own loader
<heat>
the kernel does 0 relocations
<clever>
that also makes my future plans of a PIE kernel simpler, its more obvious now that i can just compile it in DYN/-shared mode
<heat>
if you're a static PIE or an elf interpreter or whatever (any DYN), you need to relocate yourself
<heat>
hence the need for AT_PHDR, _DYNAMIC, etc
<heat>
a PIE kernel is wasteful
<clever>
ah yes, you cant even find your own sections, because youve been loaded to the wrong addr and not relocated properly
<clever>
and the AT_PHDR is at the top of the stack
kof123 has quit [*.net *.split]
yuiyukihira has quit [*.net *.split]
fluix has quit [*.net *.split]
graphitemaster has quit [*.net *.split]
alexander has quit [*.net *.split]
fluix has joined #osdev
<clever>
why is a PIE kernel wasteful?
alexander has joined #osdev
ornx has joined #osdev
graphitemaster has joined #osdev
<heat>
because PIE is designed to leave .text intact
<clever>
isnt that kind of needed for things like kaslr? (not what i want though)
<clever>
ah for reuse between processes
<heat>
and I'm talking about ELF PIE here, toolchain PIE
<clever>
but for a kernel, you dont care, and can freely modify the .text
<heat>
yes
<clever>
is that called something else?
<heat>
not really
<heat>
relocatable I guess?
<clever>
how would the toolchain args differ?
<heat>
essentially you end up relocating the kernel
<heat>
ld -r
<heat>
anything that keeps relocation data
<mrvn>
the hack to put all modifications together in fewer pages is only useful if you map the same binary at multiple addresses. The kernel isn't mapped so you can just modify the .text.
<heat>
i actually support that on a toy os
cheapie has quit [Quit: Local host tripped over the cable]
<bslsk05>
github.com: Carbon/relocation.c at master · heatd/Carbon · GitHub
<heat>
linux doesn't do -r anymore
<clever>
where all .o's in a dir get linked into a dir.o
<heat>
it didn't work well with LTO
kof123 has joined #osdev
<clever>
ah, yeah, i can see that messing up
<clever>
LTO wants to do everything in the final link
MiningMarsh has quit [Read error: Connection reset by peer]
<clever>
doing 20 LTO passes is just silly, and partially linking without breaking LTO could be tricky
<mrvn>
heat: I do that with LTO per dir. Each directory can choose whether to do LTO at that point or push it to the top dir.
<heat>
linux uses thin ar archives these days
<clever>
ah, so its more like mini static libs
<heat>
those archives literally just have paths
<heat>
no file copying, no nothing
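A thin archive is distinguishable from a regular one by its 8-byte global header alone. A quick sketch of that check; the sample byte strings are truncated stand-ins, not real archives:

```python
AR_MAGIC = b"!<arch>\n"    # regular ar archive: member contents are embedded
THIN_MAGIC = b"!<thin>\n"  # GNU thin archive: members are referenced by path


def archive_kind(data):
    """Classify an ar file by its 8-byte global header."""
    if data.startswith(THIN_MAGIC):
        return "thin"
    if data.startswith(AR_MAGIC):
        return "regular"
    return "not an archive"


print(archive_kind(b"!<thin>\n//  ..."))
print(archive_kind(b"!<arch>\nfoo.o/ ..."))
```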
<mrvn>
clever: you don't do 20 passes. Once you LTO something it loses the LTO chunk in the .o file and just gets copied at the next stage.
<clever>
oh
<heat>
ar T I think
<clever>
mrvn: ah, so you cant LTO between the dirs, unless you just put it off
<mrvn>
ar files / .a libs are something else again
MiningMarsh has joined #osdev
<mrvn>
clever: exactly.
cheapie has joined #osdev
<clever>
and if the linker starts to fuse .text's together, would that break LTO? or does that just get thrown out when LTO re-generates .text from the compiler state?
<mrvn>
clever: you get a bunch of .o files with binary data and some with lto data and the LTO linker plugin optimizes the LTO chunks and merges that with the binary chunks and then links.
<mrvn>
Note: .S files don't have any LTO chunks. So you already have to merge in a kernel.
<clever>
but isnt the .o containing a mix of .data, .text, .rodata, and LTO data?
<mrvn>
clever: that's a compiler option. You can have both.
<clever>
and the LTO contains data for recreating .text (and more?) after you merge several LTO chunks and re-optimize?
<clever>
ah
Affliction has quit [*.net *.split]
night has quit [*.net *.split]
arminweigl has quit [*.net *.split]
sjs has quit [*.net *.split]
les_ has quit [*.net *.split]
Patater has quit [*.net *.split]
dza has quit [*.net *.split]
<clever>
so you could omit the .text (and others?) entirely, and generate an LTO only .o file?
<heat>
yes, that's what happens
<clever>
i can see that being faster
night has joined #osdev
<heat>
unless you pass -ffat-lto-objects
sjs has joined #osdev
<heat>
if you run file on a LTO'd .o you'll see GIMPLE
arminweigl has joined #osdev
Patater has joined #osdev
<mrvn>
clever: the LTO doesn't contain info to recreate the .text section. If you link an .S file and .cc file with LTO together and then LTO it later you lose the .S file.
<clever>
yeah, .S files cant be LTO'd, that makes sense
<mrvn>
clever: that part is kind of tricky.
les has joined #osdev
<clever>
since LTO is compiler (c/c++) internal state
<clever>
this also leads me to another idea
dza has joined #osdev
<clever>
if i use inline asm within a .c file, could LTO inline it directly at the call-site, as-if it was an inlineable function in a .h?
<clever>
and then it could perform better than calling a function in .S files?
<mrvn>
clever: yes
<mrvn>
You should really avoid calling functions in .S files that are trivial.
<clever>
yeah, i try to use inline asm any time i can
scoobydoo has quit [Read error: Connection reset by peer]
<clever>
the main reason i use .S files, is mainly when stack manipulation comes into play
scoobydoo has joined #osdev
<clever>
_start creating the start, irq handlers, context switching
<mrvn>
I only have boot.S to bootstrap, entry.S for IRQs and one switch_task function that swaps stacks and program counter.
<clever>
yep, exactly what i listed
corecode has quit [*.net *.split]
ornitorrincos has quit [*.net *.split]
remexre has quit [*.net *.split]
Maja[m] has quit [*.net *.split]
mjg_ has quit [*.net *.split]
rb has quit [*.net *.split]
Ameisen has quit [*.net *.split]
kkd has quit [*.net *.split]
jleightcap has quit [*.net *.split]
linkdd has quit [*.net *.split]
<clever>
s/creating the start/creating the stack/
patwid has quit [*.net *.split]
matthews has quit [*.net *.split]
patwid has joined #osdev
ornitorrincos has joined #osdev
jleightcap has joined #osdev
<mrvn>
Making that a function call has the benefit that the compiler will save state for you for most things.
matthews has joined #osdev
rwb has joined #osdev
remexre has joined #osdev
corecode has joined #osdev
<mrvn>
As in, it will save what it needs to save so you don't have to save all regs.
kkd has joined #osdev
<clever>
it also means you're free to do whatever with the stack, as long as you undo it when returning
Ameisen has joined #osdev
mjg has joined #osdev
<mrvn>
clever: you could do that in inline asm.
<clever>
but you're going to have a hard time context switching with inline asm
<mrvn>
With inline asm you would have to save all regs or mark all regs as clobber.
<clever>
you dont know exactly what gcc was storing on the stack
<clever>
and when creating a new thread, you need to falsify that saved state, so you can "restore" it when first spawning the thread
<mrvn>
To be correct you actually should be saving/restoring all regs so you don't leak any data from one task to another.
<clever>
yeah
<clever>
my context switch routine saves all regs to the stack, saves the final SP, then loads the new SP, and restores all regs
<mrvn>
One should also do that on the syscall border.
<clever>
so creating a new thread, just involves creating a fake stack frame with a set of "saved" registers
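The fake-frame trick can be modelled without any real registers. In this sketch the stack is a dict of word addresses, the register count is an arbitrary assumption, and `restore` stands in for the tail end of a switch routine (pop the saved registers, then return):

```python
NUM_REGS = 8  # hypothetical count of registers the switch routine saves


def make_initial_stack(memory, stack_top, entry):
    """Lay out a frame that looks like a thread suspended inside the switch routine.

    memory is a dict of word-address -> value; the stack grows downwards.
    Returns the stack pointer to store in the new thread's control block.
    """
    sp = stack_top
    sp -= 1
    memory[sp] = entry          # "return address" the switch will jump to
    for _ in range(NUM_REGS):
        sp -= 1
        memory[sp] = 0          # zeroed registers: nothing leaks into the new thread
    return sp


def restore(memory, sp):
    """Model the second half of a context switch: pop registers, then return."""
    regs = [memory[sp + i] for i in range(NUM_REGS)]
    sp += NUM_REGS
    pc = memory[sp]
    sp += 1
    return regs, pc, sp


mem = {}
saved_sp = make_initial_stack(mem, stack_top=0x1000, entry=0x8000)
regs, pc, sp = restore(mem, saved_sp)
print(regs, hex(pc), hex(sp))
```

The first "restore" of the new thread ends up at its entry point with an empty stack, which is all the switch routine needs to believe the thread had been running before.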
<mrvn>
(or zero out regs on exit)
Maja[m] has joined #osdev
<clever>
mrvn: so if i was to use -r to create a relocatable kernel, either i pass the whole .o file to a suitable loader, or i make it self-relocating, but then how do i ensure the relocation data survives an objcopy to .bin?
<clever>
can the linker script say to preserve it? and put it at some relative offset from .text?
<clever>
or perhaps use the right PIC opcodes to find that relative offset
dh` has quit [Remote host closed the connection]
wereii has quit [*.net *.split]
\Test_User has quit [*.net *.split]
Goodbye_Vincent has quit [*.net *.split]
Irvise_ has quit [*.net *.split]
ebb has quit [*.net *.split]
<mrvn>
clever: your linker script says what sections to keep and what to discard
<mrvn>
clever: and if you have any addresses in your C code then you need those relocation infos to make the kernel self relocate or have an extra loader.
dh` has joined #osdev
wereii has joined #osdev
<clever>
i know of at least one opcode in VPU that is properly PIC, when you try to load a symbol into a reg, it gets encoded as a PC relative offset
<clever>
so i can use that to find the start of the relocation data, in a custom _start
<mrvn>
clever: you have to mark the right sections so they are marked LOAD and kept in the objcopy to binary
<mrvn>
clever: doesn't work on the .data section
<clever>
yeah, i would want to tag them as LOAD, and ensure they dont cause huge holes that objcopy would null back-fill
<clever>
dont put .text at 1mb and relocations at 512mb, lol
<mrvn>
say you have "struct Task init_task; struct Task *current_task = &init_task;" then you need to relocate "current_task" on boot
<clever>
the relative spacing will be preserved in the .bin, and youll end up with a 511mb .bin file
<clever>
i would just relocate patch everything before the C code is even touched
<clever>
that should patch all of the initial .data values, right?
<mrvn>
hehe, don't do that. You can play with LMA and VMA to get everything close together even when you map it wide apart
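The 511mb figure falls out of how a flat image is produced: `objcopy -O binary` starts at the lowest load address it copies and pads every gap up to the highest one. A sketch of that arithmetic; the section sizes are invented, only the 1mb and 512mb load addresses come from the discussion:

```python
MiB = 1024 * 1024


def flat_binary_size(sections):
    """Size of a raw image spanning the lowest to the highest load address.

    sections is a list of (name, load_address, size) for the loadable sections.
    """
    start = min(lma for _, lma, _ in sections)
    end = max(lma + size for _, lma, size in sections)
    return end - start


# Made-up section sizes; the second layout packs the load addresses together.
far_apart = [(".text", 1 * MiB, 64 * 1024), (".rela", 512 * MiB, 4 * 1024)]
packed = [(".text", 1 * MiB, 64 * 1024), (".rela", 1 * MiB + 64 * 1024, 4 * 1024)]

print(flat_binary_size(far_apart) // MiB)   # size in MiB, almost all padding
print(flat_binary_size(packed) // 1024)     # size in KiB
```

Keeping the load addresses (LMA) adjacent while leaving the run-time addresses (VMA) wherever they need to be is what shrinks the image.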
<clever>
ah yes
Irvise_ has joined #osdev
<mrvn>
and yes, you relocate before _start so the C code just sees everything in place.
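The `current_task = &init_task` case is the simplest kind of relocation: the word already holds a link-time address, so the loader only adds the difference between where the image was linked and where it ended up. A toy sketch on a bytearray; the offsets and the load address are made up, and real relocation records carry more fields than a bare offset:

```python
import struct


def apply_relative_relocs(image, reloc_offsets, link_base, load_base):
    """Patch every listed 32-bit word by the difference between load and link address."""
    slide = load_base - link_base
    for off in reloc_offsets:
        (value,) = struct.unpack_from("<I", image, off)
        struct.pack_into("<I", image, off, (value + slide) & 0xFFFFFFFF)
    return image


# Toy image linked at 0: init_task lives at offset 0x10, and the word at
# offset 0x20 plays the role of `current_task = &init_task`.
LINK_BASE = 0x0
image = bytearray(0x40)
struct.pack_into("<I", image, 0x20, LINK_BASE + 0x10)

LOAD_BASE = 0x0800_0000  # hypothetical 128 MiB load address
apply_relative_relocs(image, [0x20], LINK_BASE, LOAD_BASE)
print(hex(struct.unpack_from("<I", image, 0x20)[0]))
```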
<clever>
There are no program headers in this file.
<clever>
i believe this binary was created with -r
<clever>
"file" says it is: ELF 32-bit LSB relocatable, Broadcom VideoCore III, version 1 (SYSV), with debug_info, not stripped
<clever>
i do also see a lot of .rela.* stuff in here, i assume that is what i would want the linker script to preserve?
<clever>
with `readelf -a` instead, i can see an entry-point of 0, so that implies the headers do have room for it, and i could still use this as a kernel binary, if i just pass the right flags at link-time
<clever>
ah, and there is the type, REL (Relocatable file)
ebb has joined #osdev
<mrvn>
All those relocation stuff depends on the arch and pic, PIC, pie, PIE flags. It's a bit of a mess.
<clever>
Relocation section '.rela.text.cmd_gpio_mode' at offset 0x9984 contains 23 entries: Offset Info Type Sym.Value Sym. Name + Addend
<mrvn>
clever: Why do you care for the kernel anyway? It runs in virtual space so just pick a fixed address. The only part where relocatable / position independent would be relevant is for boot.S
<heat>
KASLR
<clever>
mrvn: the VPU lacks an MMU, it doesnt run in virtual space!
<mrvn>
clever: ok, you are screwed. :)
<mrvn>
heat: why would PIC/PIE be overkill?
<clever>
i cant load to 0, because the arm reset vector is 0, and they would clash some, and remapping the arm would cause bigger problems
<clever>
i cant load to anything under 128mb, because linux and uboot like assuming the low parts of ram are available, until they parse DT
<heat>
mrvn, because you're the kernel, you can patch .text
<mrvn>
clever: you can link to any address, that isn't a problem
<heat>
you don't need a GOT, nor a PLT
einkoder has quit [*.net *.split]
stux has quit [*.net *.split]
brynet has quit [*.net *.split]
corank_ has quit [*.net *.split]
Andrew has quit [*.net *.split]
Arsen has quit [*.net *.split]
gdd has quit [*.net *.split]
woky_ has quit [*.net *.split]
bleb has quit [*.net *.split]
HeTo has quit [*.net *.split]
HeTo has joined #osdev
<mrvn>
heat: i believe there are options for avoiding GOT and PLT
<clever>
mrvn: some arm kernels also assume ram is a single linear chunk, so i would need to load to the top of ram, to stay out of the way
Arsen has joined #osdev
bleb has joined #osdev
<clever>
but top of ram is a moving target
<clever>
so i need relocation...
<heat>
i don't think you can avoid the got
<heat>
PIE will not generate text relocs
<dh`>
if you don't have an mmu, you're kinda screwed with standard tools
<dh`>
best option is probably fdpic but it's pretty gross
<mrvn>
clever: the VPU can only access 1GB ram. How is that supposed to work at all?
<clever>
mrvn: the dram controller on the pi0-pi3 lineup, is also limited to 1gig of ram
Andrew has joined #osdev
<clever>
but there are models with 256 and 512 mounted
woky has joined #osdev
<mrvn>
clever: and on an 8GB model the vpu won't be at the end of ram.
einkoder has joined #osdev
corank_ has joined #osdev
<clever>
thats the pi4, totally different dram controller, the VPU is loaded to the top of the lowest 1gig
<clever>
and the extra 7gig is a second segment in device-tree
<clever>
that is basically arm-only
<mrvn>
clever: and any kernel that assumes you only have 1 segment will blow up
<clever>
the legacy api's only report the lower 1gig
<clever>
so any dumb kernel will just not see the other 7gig
<clever>
i suppose i could do the same, just load to 128mb, and claim 128mb of ram via the legacy api
<clever>
if you want more, use the device-tree
gdd has joined #osdev
<mrvn>
clever: or learn how the REL entries work and use a loader stub.
<mrvn>
or PIC/PIE
<clever>
the official firmware, has its own horrid solution
<clever>
the top of ram, the gpu_mem config value, and fixup.dat are mixed together (no idea how), to create a binary patch against the whole damn elf
<clever>
it doesnt relocate .text, it patches the entire ELF, headers and all
<mrvn>
or use the solution geist has: During build you scan the elf file for the relocation info and generate your own data structure for where to start the _start address and glue that to the kernel image.
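A rough sketch of that build-time approach, not the actual format any existing kernel uses: the list of offsets is assumed to have been collected already from the ELF's relocation sections, and the `RELT` marker and table layout are invented here:

```python
import struct

TABLE_MAGIC = b"RELT"  # made-up marker so the boot stub can find the table


def append_reloc_table(kernel_bin, reloc_offsets):
    """Return kernel image + compact table: magic, count, then sorted u32 offsets."""
    offsets = sorted(set(reloc_offsets))
    table = TABLE_MAGIC + struct.pack("<I", len(offsets))
    table += struct.pack("<%dI" % len(offsets), *offsets)
    return bytes(kernel_bin) + table


def read_reloc_table(blob, kernel_size):
    """What the boot stub would do: locate the table right after the kernel."""
    if blob[kernel_size:kernel_size + 4] != TABLE_MAGIC:
        raise ValueError("relocation table missing")
    (count,) = struct.unpack_from("<I", blob, kernel_size + 4)
    return list(struct.unpack_from("<%dI" % count, blob, kernel_size + 8))


kernel = bytes(0x100)  # stand-in for the objcopy'd kernel
blob = append_reloc_table(kernel, [0x20, 0x8, 0x20, 0x40])
print(len(blob) - len(kernel), read_reloc_table(blob, len(kernel)))
```

Because the table is glued onto the image in the same build step, the kernel and its relocation data cannot drift out of sync.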
<clever>
thats kinda what the official firmware does
<mrvn>
which just shows that the ELF relocation methods are so bad you don't want to do that during boot.
<dh`>
there are kernels that relocate themselves via PIC at boot time, I've seen it done
<clever>
arm32 linux kinda does that, the decompression stub
<clever>
it has a PIC asm blob, that relocation patches the decompression (compiled C) stub
<dh`>
but it also makes for exciting problems if you are trying to e.g. use address constants in trap handling
<clever>
then it decompresses the real payload to some other addr, which also deals with loading it to a more sane addr
<clever>
then it exits C, and fills the mmu config in, and jumps to virtual
<mrvn>
On x86 where you need a 32bit stub to get to 64bit having a loader stub also makes a lot of sense. Just have a ld32.so before your kernel.elf and that maps the kernel into 64bit address space and jumps to it.
<mrvn>
The loader stub you can write in asm as truly position independent code
<clever>
aarch64 kinda put its foot down and just banned self decompressing stubs
<clever>
so the aarch64 entry point only sets up the mmu and thats it
<mrvn>
how would it prevent that?
<clever>
the kernel just doesnt have the code to decompress itself anymore
<clever>
its not an option in `make menuconfig` either
<mrvn>
but you could.
<mrvn>
linux just chooses not to
<clever>
yeah
<clever>
if i was making my own aarch64 kernel, i dont have to follow the same rules
<mrvn>
if your firmware uncompresses your kernel for you then there isn't much point in decompressing yourself.
<clever>
yeah, thats the rule linux set on aarch64
<clever>
if you want a compressed kernel, the bootloader has to undo it first
<mrvn>
it's much easier to decompress before you set up the mmu.
<clever>
which arm32 linux did do, but it was kinda a mess
<mrvn>
as in decompress the kernel to anywhere and map it to the virtual address it expects.
<clever>
some PIC asm will patch the decompression code, so it can decompress with the MMU off
<clever>
but if you drop decompression support, you dont have to patch bunzip to run without the MMU
dh` has quit [Quit: brb, client is hosed]
dh` has joined #osdev
gog has quit [Ping timeout: 272 seconds]
<clever>
mrvn: one reason i kinda want to use what elf has, is that i dont want to create the problem of having to keep the elf and relocation data in sync
<clever>
if you upgrade just the elf, the old relocation file will just shred the new code
<clever>
having both bundled in one file makes it more idiot proof
<clever>
but a fixed load addr like 128mb can also do that, and be far simpler
<mrvn>
clever: deprecating the legacy API unless they only want 128MB sounds perfectly fine to me.
<clever>
yeah
<clever>
i was loading to 64mb, but i recently discovered a nasty surprise in u-boot
<mrvn>
if you can use a fixed address that is worlds simpler to build.
<clever>
it assumes the arm has full control of the 0-128mb range, and relocates itself to the top of 128mb
<clever>
i'm already using 64mb as a fixed addr, but now see that 128mb seems like a safer bet, then u-boot and the firmware would sit perfectly on either side of the 128mb boundary
<mrvn>
one problem might be to load a kernel+initrd that is big. Things like uboot might not support loading it to the second memory segment.
<mrvn>
but then how does that work now if uboot puts itself at the end of 128MB?
<clever>
uboot comes in ~2 stages, the SPL is typically linked for a fixed load addr (but can be PIC), and copies uboot proper to a different fixed addr (top of 128mb), before it parses any DT i think
<mrvn>
Does it only support kernel up to 120MB and loads the initrd into the second memory segment?
<clever>
uboot proper then has a real malloc, and can parse DT and load things anywhere
<clever>
i assume it can load the initrd after uboot, but uboot being in the middle kinda fragments your free space
<clever>
i think that 128mb is also a compile-time constant
<clever>
and can be tuned to whatever makes the most sense for the platform
<mrvn>
maybe it's just the first stage that's at 128MB and the uboot proper can be anywhere
<clever>
if the platform has predictable ram, you can put uboot at the top of ram
<clever>
last time i was debugging u-boot, one of its memory allocation routines glitched hard, and allocated some state to ~+3gig physical
<clever>
totally unmapped space
<clever>
and to my horror, i discovered that doesnt cause a bus fault
<clever>
writes are discarded, reads return zero
<clever>
so all state was silently zero'd out upon readback
<mrvn>
yeah, I don't get why the bus doesn't get a timeout and the cpu fault
<clever>
its either a config flag i missed (should intentionally do that under the official firmware) or a serious oversight in the axi interface design
brynet has joined #osdev
<clever>
its also complicated by the custom mmu involved
justache has joined #osdev
<clever>
for the pi0-pi3, the arm's "physical" address space, is made up of 64 pages of 16mb each
<clever>
each page can be mapped to any addr (2mb alignment i think) on the real bus
<clever>
so i could totally scramble the arm->bus address mappings, if i wanted to wreak havoc with dma :P
<mrvn>
that would require a big huge device tree range list
<clever>
64 entries max
<clever>
but that could also let both sides load to 0
<clever>
and just dont map 0 to 0
<clever>
i could even do light, page 0 (0-16mb) is mapped weirdly, but all other pages are mapped normally, and you just dont dma from page 0
<clever>
DT would be simple then, and both sides can load to 0, the first 16mb is isolated
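The 64-entry mapping described above amounts to a table lookup on the top bits of the address. A sketch of the "only page 0 is remapped" variant; the redirect target is invented, and the alignment constraint mentioned earlier is not modelled:

```python
PAGE_SHIFT = 24            # 16 MiB pages
PAGE_SIZE = 1 << PAGE_SHIFT
NUM_PAGES = 64             # 64 * 16 MiB = 1 GiB of ARM "physical" space


def arm_to_bus(addr, page_table):
    """Translate an ARM physical address through the 64-entry page table."""
    page = addr >> PAGE_SHIFT
    if not 0 <= page < NUM_PAGES:
        raise ValueError("address outside the 1 GiB window")
    return page_table[page] + (addr & (PAGE_SIZE - 1))


# Identity mapping, except page 0 is redirected to a made-up target so that
# both sides could load at their own address 0 without clashing.
table = [i * PAGE_SIZE for i in range(NUM_PAGES)]
table[0] = 0x3F00_0000

print(hex(arm_to_bus(0x0000_1000, table)))
print(hex(arm_to_bus(0x0100_2000, table)))
```

Every page other than page 0 keeps arm address equal to bus address, so only buffers inside the first 16mb would need special handling for dma.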
<mrvn>
still would need an extra range statement for the DMA mapping
<clever>
but, i would need to toss in some L1/L2 flushing in my bootloader
<clever>
because the bootloader is also loaded to 0
<clever>
you already need that range statement in DT, because dma has to start from the 0xc000_0000 addr
<clever>
all you would be doing, is +16mb to both the parent and child addr
<clever>
this is also a problem the pi1 created, and DT later solved
<clever>
the pi1, only has an L1 arm cache, 16kb i believe
<clever>
no L2
<clever>
to get any kind of reasonable performance, all 64 pages of the arm, are mapped via the VPU L2 cache, 128kb
<clever>
and to make dma coherent, you flush the arm L1, then tell dma to read via the VPU L2
<clever>
there is a config.txt to disable that, but you also had to compile linux specially to dma right
<mrvn>
and you have to map pages as 16k chunks or the page coloring craps out
<clever>
DT automates that mess
<clever>
page coloring, is that related to what bank of dram things map to?
<heat>
cache
<mrvn>
bits of the address are used to pick a cache slot or might even not be remapped, can't quite remember. But if you map 4k chunks of a 16k block randomly you get problems in the cache.
<clever>
ah yeah
<clever>
a certain bit range of the phys addr is used to index into the cache, and then the tag (the other bits) is checked in parallel
<clever>
4-way cache, having 4 cache lines at a given index
<bslsk05>
docs.freebsd.org: Design elements of the FreeBSD VM system | FreeBSD Documentation Portal
<mrvn>
The CPU was basically designed to only run with 16k pages.
<clever>
so if you only use the first 4k out of every 16k chunk, you may have an abnormally high cache wastage
<mrvn>
clever: i.e. you only get 25% of the cache.
<clever>
according to my notes, the L1 cache on the bcm2835 was 4-way, with 8 words per line, 16kb total
<clever>
8 words, would be 32 bytes, 4 way means a given index can hold 128 bytes, and with 16kb in total, that means there are 128 of those slots
<clever>
so part of the addr, is turned into a 0-127 index? and then it compares the rest of the addr, against the 4 tags, to select one of the 32 byte lines?
<clever>
and if its using bits of the addr that conflict with your allocations, you may only ever get an index in the 0-31 range?
<clever>
with a 32 byte cache line, that means the lower 5 bits (4:0) are an index into the line, so those can be ignored
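The arithmetic above can be checked mechanically. A sketch that derives the set count and bit fields from the quoted figures (16kb, 4-way, 32-byte lines) and splits an address accordingly; it assumes every quantity is a power of two and says nothing about whether the index is taken from virtual or physical bits:

```python
def cache_geometry(total_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache."""
    sets = total_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits


def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# Figures quoted above: 16 KiB, 4-way, 8 words (32 bytes) per line.
sets, offset_bits, index_bits = cache_geometry(16 * 1024, 4, 32)
print(sets, offset_bits, index_bits)
print(split_address(0x1234, offset_bits, index_bits))
```

With these numbers the set index comes from address bits 11:5 and everything from bit 12 upwards is tag.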
<mrvn>
clever: I think the cache was also buggy with multiple mappings. If you map a 4K page at 0 and 4096, write to one and read from the other you get garbage.
<clever>
i believe arm uses physical addresses for all cache logic
<bslsk05>
www.reddit.com: Beginner question: Do you guys use software to simulate an "empty computer" when developing an OS? : osdev
<clever>
so that cant happen on arm
<mrvn>
clever: not supposed to but bcm is screwy
<heat>
i feel like this person has just discovered emulators and i find it lovely that it happened on r/osdev
<clever>
mrvn: i have heard, that the bcm2835 axi port is ultra buggy with async and reordering
<clever>
mrvn: if you fire out 2 reads to different peripherals, due to all of the axi fifo's, the answers can come back out of order
<clever>
and the arm's axi master, cant deal with the re-ordering
<mrvn>
clever: yep.
<clever>
so the results get swapped
<clever>
and they recommend you barrier any time you switch between peripherals
<clever>
i assume that was fixed on the bcm2836, because you cant coordinate 4 arm cores to barrier properly when switching peripherals
<mrvn>
indeed, with SMP that's impossible to do
<clever>
another thing a properly functioning axi port should do, is dynamically change the transfer width
<clever>
for example, (going off memory), the bcm2835 arm axi port, is only 32bits wide
<clever>
so it can only ever move 32bits per clock, but you can have a 4 clock burst, sending 256 bits in total
<clever>
128*, math is hard :P
<clever>
but, other internal busses are 128bit wide, so it can translate that into a single 128bit transfer
<clever>
mrvn: however, some of the ports (peripherals) assume you only ever do a 32bit transaction, and do implementation defined funky things when you violate that, and dont translate into a burst
<clever>
if you do a 64bit load, covering reg1+reg2, youll just get reg1 repeated twice i believe
heat has quit [Remote host closed the connection]
<mrvn>
don't do a double register load/store to MMIO
<clever>
exactly
<mrvn>
which is why you have to declare every MMIO volatile
heat has joined #osdev
<clever>
the VPU also cant even do a 64bit load/store in scalar mode
<clever>
all regs are 32bits max
<clever>
you need vector mode to even do that, which does 16 consecutive addresses, 8/16/32bits each
<clever>
so 16/32/64 bytes in a burst
<clever>
oh right, forgot, the vpu had load-multiple, that can trigger it
<clever>
mrvn: but another fun bug, is mis-aligned 8bit loads
<clever>
basically, there is a switch-case block in hardware, that assumes you're only ever giving it 32bit aligned addresses
<clever>
if something isnt 32bit aligned, its not a valid register
<clever>
except on the sdhci peripheral, that has 16bit registers, with bugs
<clever>
consecutive access to both halves of a 32bit reg corrupt the transfer
<zid>
Disregard pages of clevertext, aquire pencilcases
<clever>
lol
dza6 has joined #osdev
les has quit [Ping timeout: 260 seconds]
les has joined #osdev
scoobydoo has quit [Read error: Connection reset by peer]
scoobydoo has joined #osdev
dza has quit [Ping timeout: 260 seconds]
night has quit [Ping timeout: 260 seconds]
dza6 is now known as dza
night_ has joined #osdev
matthews has quit [Ping timeout: 260 seconds]
matthews has joined #osdev
mjg has quit [Ping timeout: 260 seconds]
heat has quit [Ping timeout: 246 seconds]
zaquest has quit [Remote host closed the connection]
bleb has quit [Ping timeout: 260 seconds]
woky_ has joined #osdev
bleb has joined #osdev
woky has quit [Ping timeout: 260 seconds]
zaquest has joined #osdev
GeDaMo has joined #osdev
scoobydoo has quit [Read error: Connection reset by peer]
scoobydoo has joined #osdev
poisone has joined #osdev
poisone has quit [Read error: Connection reset by peer]
gildasio1 has quit [Ping timeout: 255 seconds]
gildasio1 has joined #osdev
ZombieChicken has quit [Quit: WeeChat 3.6]
poisone has joined #osdev
<geist>
oh oh are you ready for the fallback?
<geist>
awww yeah
<poisone>
i
<geist>
1am again!
Stella is now known as Stella[OotC]
<poisone>
UPTIME: 0 days, 0 hours, 33 minutes
Goodbye_Vincent has joined #osdev
rwb is now known as rb
Burgundy has joined #osdev
<Jari-->
completed watching Terminator Salvation+Genisys on Netflix
<Jari-->
now watching Deep Space Nine
<Jari-->
interesting computers, operating system on those
<Jari-->
16 million colours, possible, and say, you want to use monochrome logos
<Jari-->
mainstream OS logos, etc.
<Jari-->
I have epilepsy, and never got attacks on colorful logos, etc. flickers
<Jari-->
made JTMOSDEV (google it) system under epilepsy attacks
scoobydoo_ has joined #osdev
scoobydoo has quit [Ping timeout: 260 seconds]
scoobydoo_ is now known as scoobydoo
wootehfoot has joined #osdev
Burgundy has quit [Ping timeout: 260 seconds]
Burgundy has joined #osdev
linkdd has joined #osdev
Ali_A has joined #osdev
vdamewood has quit [Read error: Connection reset by peer]
vdamewood has joined #osdev
vdamewood has quit [Read error: Connection reset by peer]
vdamewood has joined #osdev
genpaku has quit [Remote host closed the connection]
genpaku has joined #osdev
gog has joined #osdev
[itchyjunk] has joined #osdev
alexander has quit [Quit: ZNC 1.8.2+deb2+b1 - https://znc.in]
alexander has joined #osdev
dormito has quit [Ping timeout: 255 seconds]
eau has joined #osdev
wootehfoot has quit [Ping timeout: 252 seconds]
genpaku has quit [Ping timeout: 260 seconds]
genpaku has joined #osdev
elastic_dog has quit [Ping timeout: 276 seconds]
elastic_dog has joined #osdev
dormito has joined #osdev
gog` has joined #osdev
gog has quit [Killed (NickServ (GHOST command used by gog`))]
gog` is now known as gog
terminalpusher has joined #osdev
spikeheron has quit [Quit: WeeChat 3.7.1]
spikeheron has joined #osdev
terminalpusher has quit [Remote host closed the connection]
srjek_ has joined #osdev
<Jari-->
hi
\Test_User has joined #osdev
eau has quit [Ping timeout: 260 seconds]
eau has joined #osdev
ckie has quit [Quit: *poof*]
ckie has joined #osdev
Arthuria has joined #osdev
terminalpusher has joined #osdev
terminalpusher has quit [Remote host closed the connection]
poisone has quit [Read error: Connection reset by peer]
wootehfoot has joined #osdev
Arthuria has quit [Remote host closed the connection]
srjek|home has joined #osdev
srjek_ has quit [Ping timeout: 260 seconds]
wootehfoot has quit [Ping timeout: 248 seconds]
romzx has quit [Ping timeout: 272 seconds]
linearcannon has joined #osdev
d5k has joined #osdev
bauen1 has quit [Ping timeout: 255 seconds]
bauen1 has joined #osdev
romzx has joined #osdev
<yuu_>
I/
<yuu_>
o/
<gog>
I/O is very important
<GeDaMo>
I/O/yuu_ :P
<yuu_>
Hehe
<yuu_>
Hello, I noticed I was back in here, glad to be here
<bslsk05>
www.quora.com: Which hypervisor's provide FULL hardware support for MS-DOS, Windows 95, and Windows 98? I have hundreds of old games I can no longer play because of OS and hardware incompatibility. - Quora
<Jari-->
sorry, wrong chan
<Jari-->
but was like looking for any MS-DOS DJGPP then CWSDPMI-based hypervisor which runs actually under MS-DOS environment
<Jari-->
more ideal than running XEN
<Jari-->
not meaning DOS boot up for Linux loadlin
<Jari-->
not much of MS-DOS left, it would be actual own individual separate operating system
<Jari-->
well, cwsdpmi is an OS or not ?
<gog>
no
<gog>
it's an implementation of DPMI
<Jari-->
gog: yeah BIOS reliancy calculates it is MS-DOS I guess
<gog>
it runs inside of MSDOS
<Jari-->
yeah because it preserves the first megabyte
<gog>
and it depends on DOS interrupt handlers for system calls
<gog>
it's not a standalone OS
<Jari-->
gog yeah so its VM86 it basically just compatibility mode with MS-DOS reliancy
<zid>
I am also a DPMI implementation
<Jari-->
zid: no clues on DPMI interface calls, etc. insights, dunno if many have this info
* gog
gives zid a bagel
<zid>
I've never had a bagel
<gog>
whaaaaaaaa
<zid>
I'm not jewish enough I guess
<Jari-->
football anyone?
<heat>
depends
<heat>
what side of the pond
<gog>
neither kind of football interests me anymore
<heat>
booooooooooo
<zid>
we need to get heat on HRT so he shuts up about it too
<heat>
lol
* gog
hands heat cat ears and pink stripy thigh-highs
<gog>
you'll need these
<heat>
UwU
<heat>
what's this
<gog>
OwO
<zid>
PooOoo
xenos1984 has quit [Ping timeout: 255 seconds]
xenos1984 has joined #osdev
<Jari-->
This program requires Microsoft Windows
<Jari-->
I had this on my Commodore 64, programmed with turbo assembler
<heat>
winnie the pooh moment
<Jari-->
a PC simulator
<gog>
zid: what kind of pizza should i get
<zid>
jala + mush
<Ermine>
gog: may I pet you?
<gog>
Ermine: yes
* Ermine
pets gog
* gog
prr
<heat>
has it escaped everyone how winnie the pooh is fucking yellow
<heat>
bears are not yellow
<gog>
he's got hepatitis
<zid>
some are
<zid>
for example, winnie the pooh
* Ermine
is less anxious now and hopes so is gog
<gog>
:)
<heat>
gog, winnie went to too many gay clubs without protection 😳
<gog>
oh bother
<gog>
gotta safeguard your health, winnie
<zid>
bouncing's what tiggers do best
<zid>
tigger's the gay one
<gog>
help me pick a pizza
<zid>
I already did
<gog>
your choice was bad
<gog>
try again
<kof123>
spinach, mushroom, ...
<zid>
fine, if what I think is good is bad, then what I think is bad must be good
<zid>
sweetcorn and pineapple
<gog>
a less terrible choice
<zid>
okay so gog is a non-person
<zid>
some kind of weird alien
<gog>
i usually get blaze
<gog>
chili, garlic, jalapeno, pepperoni, pepper cheese, black pepper
<bslsk05>
lore.kernel.org: [PATCH] drm/atomic: do not branch based on the value of current->comm[0] - Jason A. Donenfeld
<kof123>
it includes a yamaha midi software synthesizer thing, and presumably a soundfont
<kof123>
that's right, midi
<heat>
j`ey, aw wtf
<heat>
how was that ever merged
<zid>
gog: It works nicely, but has 1 silly achievement to max your gil
<zid>
which takes a few hours of grinding movers and selling all materia buuut ff7 has a bug where if you gain too much materia it deletes other stuff to make room
<j`ey>
heat: i guess torvalds missed it go by
<zid>
guess who lost the underwater materia and had to roll back
Raito_Bezarius has quit [Ping timeout: 255 seconds]
<j`ey>
heat: congrats tho, now cp onyx/* linux/* and send that as a patch
<heat>
a good chunk of the codebase would be much improved
<j`ey>
a good chunk would be deleted
<heat>
particularly all the arm and arm64 code
<heat>
it wouldn't be there! how delightful
<j`ey>
:<
<gog>
:D
<heat>
hey
<heat>
there's no CoW for you to break if ARM isn't even supported
wootehfoot has joined #osdev
<heat>
there, saved you some trouble
<j`ey>
:<
<gog>
i decided on blaze
<heat>
should've gone with bazel
<heat>
blaze really is... an odd build system to choose
<gog>
i meant pizza
<heat>
pizza? is that a new build system?
<gog>
yes
<Ermine>
heat: congrats on patch!
<Jari-->
"synthetic DNA molecules are now considered as serious candidates for this new kind of storage"
<heat>
thanks
<Jari-->
new SSDs
<heat>
i did slightly fuck up because I forgot to rebase on kees's tree
<heat>
hopefully that doesn't matter? or I'll just send a rebased v2
<zid>
wow heat
<zid>
pissing off morpheus
<heat>
*torvalds angrily writes an email calling me INCOMPETENT and STUPID and DUMB*
<zid>
nah you're a rank amateur
<zid>
that == 'X' thing is dumb as shit
<zid>
but wtf is that left hand side
<zid>
current process's name is called 'comm' ?
<heat>
yeah
<heat>
see htop
<zid>
That's why nobody noticed it, anyway
<zid>
the true fix would be to rewind time to change that to be process_name_do_not_strcmp_to_block_things_wtf
Ali_A has quit [Quit: Client closed]
<heat>
or as BSD would call it, pndnstbtw
<gog>
BaSeD
bauen1 has quit [Ping timeout: 248 seconds]
<gog>
is that why you were asking if geist had any ppc64 machines
<heat>
yes
<gog>
nice
<heat>
geist is the likeliest person to have random architectures lying around
<gog>
true
<geist>
heh yep
<geist>
but yeah i haven't run the g5 in a while. primary reason is it's a power hog
Raito_Bezarius has joined #osdev
<heat>
i bet you don't have a loongarch yet
linearcannon has quit [Read error: Connection reset by peer]
<geist>
surprisingly a lot of my other old exotic machines really aren't. (the vax, sparcstations, etc). we forget that modern stuff since about 2000 draws a shitton of power. prior to that you could get away with just a heatsink or maybe a heatsink and a little fan on your 486 or pentium
bauen1 has joined #osdev
<geist>
so an early 2000s era machine pulling 200W to do what a modern machine can do in 5W feels wrong enough that i generally don't keep them on much
xenos1984 has quit [Ping timeout: 248 seconds]
<geist>
whereas prior to 2000 they do even less work but maybe pull 35 or 40 so i feel better about that
<heat>
dig on zid who uses a 10 year old xeon
<geist>
yah but they explicitly downclock it to use less
<geist>
gets a pass
<zid>
hey it's on 0.8V and uses 30W
<geist>
exactly
<gog>
just turn down the thermostat, use it to heat your house
<zid>
It's overclocked and undervolted at the same time because sandy is amazing
Raito_Bezarius has quit [Max SendQ exceeded]
<geist>
gog well honestly yes, once it starts getting cold in the room and its either some space heater in the form of some computer equipment or it's a space heater in terms of a heat pump, it's not too bad
<geist>
though the heat pump is hypothetically more efficient
<zid>
I wish people didn't go fucking *insane* on bitcoin and start using gpus and shit so that I could mine on my cpu to heat my room
<zid>
and.. have a greater than 1 in heatdeath chance of actually getting a hit
<heat>
bitcoin hasn't used GPUs in a loooooooong time
<heat>
all about ASICs baby
<zid>
yes I was describing the progression
<zid>
gpus, then shit (like asics)
<geist>
yah i'm not much on regrets, but back in 2010 when i fiddled with bitcoin for lulz, i had spent a few weekends mining bitcoins back when you actually could
<gog>
the heat pump is not only hypothetically more efficient, it literally is :P
<zid>
heat pumps are black magic
<geist>
and earned like 2.4 of them. but i stopped after like weeks because it was heating up the room and it was summer
<zid>
stealing warm from outside when it's already cold outside
heat is now known as _Heat
<geist>
gog: yeah, though then depends on the particular model, how cold it is outside, etc
<geist>
but yeah
<geist>
last year when it got substantially below freezing my main house heat pump switched to pure resistive mode and burned a *crapton* of power over the course of 2 or 3 days
<zid>
I think like, chicago, which is kinda famous for getting cold, only gets too cold for the refrigerant used in a heat pump for like 1 day a year
<zid>
if you get a modern one with good.. fluids
<geist>
yah it was really odd here that it got that cold, hence the heat pump not being designed to deal with it
<geist>
but omg i was burning power like crazy. i think for 3 days i was pulling about 250kWh a day
<geist>
which actually lines up: the resistive aux heater in the house system is like 8kW
<zid>
how did your house not melt
<geist>
and it was just running it flat out
<geist>
because 8kW os
<zid>
is your house actually a swimmingpool
<geist>
isn't enough. even with that running flat out the temp inside was like 60F
<zid>
8kW would get my entire house like.. hot to the touch :P
<_Heat>
250kwh is insane
<zid>
the bricks would start to glow
<geist>
250 lines up though: 24 hours in the day, about 8 or 9kWh per hour
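A rough check of those figures, as a sketch: the only inputs are the 8 kW aux heater rating and the roughly 250 kWh/day quoted above; the function names are purely illustrative.

```python
def daily_energy_kwh(power_kw, hours=24.0):
    """Energy drawn by a constant load of power_kw over `hours`, in kWh."""
    return power_kw * hours


def average_power_kw(energy_kwh, hours=24.0):
    """Average power implied by an energy total spread over `hours`, in kW."""
    return energy_kwh / hours


# Aux heater running flat out for a full day.
print(daily_energy_kwh(8.0))
# Average draw implied by the quoted 250 kWh/day.
print(round(average_power_kw(250.0), 1))
```

Under these assumptions the aux heater alone accounts for about 192 kWh/day, so the quoted 250 kWh/day implies an average draw a bit over 10 kW, with the remainder presumably coming from other loads in the house.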
<geist>
that was very exceptional. it rarely gets below like, -2C or so here
<geist>
this was more like -10, beyond the lock out temp of the compressor, etc
<geist>
and then it just stayed -10 continually for days on end
<GeDaMo>
Polar vortex bulging out
<geist>
the house is oddly efficient/inefficient in terms of insulation
<geist>
my house is a log cabin, so there's no actual insulation in the walls per se
<zid>
mine's double hulled brick with fibreglass between so yea
<zid>
8kW would literally cook me
<geist>
so it's odd in that it doesn't have a traditional r factor
<GeDaMo>
My house is granite, no insulation either
<gog>
my house is concrete and well insulated
<gog>
but it also does not breathe
<geist>
so i think the model is that it has technically fairly low r factor, but it also has a strong 'memory'
<zid>
Summer this year was >40C indoors :D
<geist>
so in general if it dips or gets hot for short periods of time (day or two) it doesn't really respond much to it
linearcannon has joined #osdev
<geist>
but eventually it'll normalize the wall temperatures to the outside and then you're in trouble
<zid>
I wanna fill my walls with water when it's hot
<zid>
and keep it all year
<geist>
so i think that was the problem with the -10 last year. once the walls normalized to being cold to the touch (i think they were like 40F for a while) then the heat pump just has to run flat out to stay ahead
<geist>
but yeah basically log cabins: do not recommend for very cold temps unless you have a lot of heat source (ie a fireplace)
<zid>
mmm particulates
<zid>
my favourite
<geist>
but they work nicely for temperate zones where there's a fairly large daily temp swing: they act as a nice thermal battery over the day
SGautam has joined #osdev
ZombieChicken has joined #osdev
xenos1984 has joined #osdev
<geist>
_Heat: oh i guess that's to C
<geist>
took me a minute to figure out why they're just now talking about it
<_Heat>
yeah, it's in C23
<zid>
make the type of nullptr incomplete
<_Heat>
clang HEAD now supports it
<zid>
doesn't that stop it being embeddable into structs
<_Heat>
why would you want to embed nullptr_t
<zid>
I like how all of their rationale is that
<zid>
NULL can be defined to 0 or (void *)0 the former of which most people view as a fucked up bug
<zid>
so their solution is to do the C++ thing and add another incompatible thing on top
<zid>
and all the code examples are in C++ style too as a bonus
<_Heat>
nullptr works fairly well
Raito_Bezarius has joined #osdev
<mrvn>
I wouldn't mind losing the conversion from nullptr to bool
<geist>
yah from a very quick glance at the page before i got bored it seemed to state that most of the problem is broken NULL #defines that *don't* just define it as ((void *)0)?
<geist>
ie, the void * one is basically okay its just not standardized and thus broken impls exist
<zid>
spec says 0 and (void *)0 are the two alternatives
<zid>
you just need to delete the former from the spec
<zid>
and bam, no more problems
Raito_Bezarius has quit [Ping timeout: 252 seconds]
<mrvn>
geist: I see a lot of "If NULL has integer type" in the rationale.
srjek_ has joined #osdev
* geist
nods
wootehfoot has quit [Ping timeout: 260 seconds]
<mrvn>
"In memory, nullptr is represented with the same bit-pattern as a null pointer constant of type void*." So really, as geist said, drop the "0" and only leave "(void*)0"
srjek|home has quit [Ping timeout: 260 seconds]
<mrvn>
one step further would be: make "NULL" a keyword. Although having c++ compatibility with nullptr is nice.
<mrvn>
"nullptr is permitted as argument to ..., as long as the function interprets it as pointer to void or character type." Urgs. Thinking about it it makes sense. But have you ever seen any code that casts their nullpointers to the right pointer type on function calls?
mjg has joined #osdev
<geist>
hmm i thought the whole point was it could cast to any pointer type, much like a (void *)0 would
<geist>
ie, it's a null pointer of any pointer type
<geist>
or are they attempting to tighten that up?
<mrvn>
geist: but with "..." what would be the type of the pointer to cast to?
<geist>
fair point: and you can have short and wide pointers or whatnot
<mrvn>
No, you can't. storage size must be the same.
<geist>
well, i guess in C that'd be an extension. it's C++ that has the doublewide pointers (pointers to virtual members)
<mrvn>
But casting between types can change the bit representation.
romzx has quit [K-Lined]
<geist>
i guess in a case where you're using say x86 far pointers one defines what the null pointer bit pattern is
<geist>
presumably 0:0
<mrvn>
and then all pointers would have to be far
<geist>
and not just 'something that ends up equating to 0
<geist>
'
Raito_Bezarius has joined #osdev
<mrvn>
That "nullptr in ..." thing really makes no sense I think. What it really comes down to is that you can only pass nullptr to ... for void* and char*. Every other case is UB.
<mrvn>
(and that is already the state of C)
<mrvn>
Are there actually real archs where the bit representation for pointers of different (data) types differ?
<_Heat>
hwasan :)
nyah has joined #osdev
<geist>
yah actually brings up the question what the bit pattern of a hw tagged null pointer is
<geist>
ie, are you obligated to strip the tag before comparing to 0 (probably)
<mrvn>
geist: same as void* and char*. So they must have the same bit pattern.
<geist>
yah but with hw tagging you have this whole thing dealing with also comparing the tags when comparing with another pointer, i guess.
<mrvn>
Would a compare convert the nullptr to the other type or the other type to nullptr_t?
<geist>
though maybe not. depends on what sw does with the tags really
<geist>
or i guess also kinda specifically: would you ever tag a nullpointer or do they always implicitly have tag 0
<mrvn>
geist: the specs say it has the bit pattern of "(void*)0" or "(char*)0"
<mrvn>
s/or/and/
<geist>
i'm thinking specifically of software based tagging schemes that use something like ARMs TBI (top byte ignore) feature. the top 8 bits are simply ignored by hardware so it's a sw problem
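To make the tagged-null question concrete, here is a minimal sketch of a software tagging scheme on top of a top-byte-ignore feature. It models pointers as plain 64-bit integers, assumes user-space addresses whose top byte is otherwise zero, and all helper names are hypothetical; it does not describe any particular implementation such as hwasan.

```python
TAG_SHIFT = 56                      # the tag lives in bits 63..56
ADDR_MASK = (1 << TAG_SHIFT) - 1    # low 56 bits carry the address


def with_tag(ptr, tag):
    """Return ptr with its top byte replaced by tag."""
    return (ptr & ADDR_MASK) | ((tag & 0xFF) << TAG_SHIFT)


def tag_of(ptr):
    """Extract the 8-bit tag from the top byte."""
    return (ptr >> TAG_SHIFT) & 0xFF


def strip_tag(ptr):
    """Drop the tag, leaving only the address bits."""
    return ptr & ADDR_MASK


def is_null(ptr):
    """Null test that ignores the tag; a raw `ptr == 0` would miss a tagged null."""
    return strip_tag(ptr) == 0


tagged_null = with_tag(0, 0x5C)
print(tagged_null == 0, is_null(tagged_null))
```

In this model a tagged null compares unequal to a plain 0, which is why software would either have to strip tags before comparing or agree never to tag null pointers.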
* geist
nods
<mrvn>
i.e. a sane NULL and nullptr must have the same bit pattern
romzx has joined #osdev
<geist>
anyway if i didn't report it: proxmox VM binary review: 1
<geist>
seems like a nice system that's doing things in a straightforward way
<geist>
there's still some parts that you have to open a shell to do (like mount a NFS share to locate vms) and whatnot but it seems like a pretty straightforward to use system
<geist>
additional things i can do with it easily that i haven't really figured out how to do manually with all my scripts to run qemu instances is it automatically supports virtio memory balloons
<geist>
and it seems to actually work. i created 2 or 3 ubuntu instances and way oversubscribed and then observed what happened, and though it worked slowly it did eventually balloon all of them to de-oversubscribe the system
romzx has quit [K-Lined]
<mrvn>
geist: "After WG14 refused a specification for a simple macro with value (void*)0 ..." They tried your way.
<bslsk05>
en.wikipedia.org: Kermode bear - Wikipedia
<heat>
those are white
<heat>
it even says so in the article
<zid>
'white'
<zid>
it's actually a black bear.
<heat>
yellow bears are not a thing
<zid>
so ergo, winnie the pooh is actually black and you're just racist
<heat>
much less so yellow bears that wear a fucking shirt
<zid>
sorry you had to find out this way
<heat>
why does he wear a shirt
<heat>
and no pants??
<heat>
is pooh a perv?
<zid>
it makes you MORE naked if you do that, true fact
<zid>
shirt and no pants > nothing at all > pants and no shirt > pants and shirt
<heat>
programming socks and nothing else >>
<GeDaMo>
Might be a bit chilly :|
<Jari-->
reiserfs stable?
<Jari-->
rating 5 317 reviews - but this is "recovery software"
<geist>
reiserfs 3 was pretty stable at the time
<geist>
had fairly bad failure modes if externally corrupted, but for a while it was hella fast and stable
<Jari-->
xfs ftw
<geist>
ah interesting, apparently reiserfs3 is declared deprecated
<Jari-->
well a years back, hosting companies defaulted on reiserfs
<Jari-->
because of saving resources hell a lot
<Jari-->
heck
<geist>
yeah that's what i mean. at the time the general de facto linux fs was ext3, and it would generally outperform it a lot
<geist>
its heyday was like 2002
<Jari-->
:) no it was like 2015
<Jari-->
or
<Jari-->
and I got a corruption lost my longest project
<Jari-->
Perl code..
<Jari-->
Facebook+YouTube => killed like 10 million small companies on the fly
<Jari-->
by eliminating the competition
<geist>
huh?
<geist>
what was in 2015?
Piraty has quit [Quit: -]
<gog>
you also can't have a reiserfs image on a reiserfs filesystem iirc
<geist>
presumably due to fsck reasons
<zid>
(plus it murders your wife, huhu)
<gog>
yes
<geist>
that being said rfs3 was the first fs i had really seriously looked at where the general idea is to just toss everything into one massive btree
Piraty has joined #osdev
<geist>
it's pretty simple by 'toss everything into one btree' fs standards
<geist>
but it's a fairly good example of what you get when you just follow that logical path
<geist>
i think that was the general origin of the issues. a failure of the btree was a serious issue since reconstructing it was really really difficult
<Jari-->
Commander X-16 implements VFAT
<geist>
so i think the 'rfs image can't be contained in rfs' thing was something to do with the fsck detecting data blocks as metadata blocks
<Jari-->
so VFAT is still valid market FS
<geist>
which is a general problem if you let metadata and data mix it up in the same area of the device, without a clear distinction between zones
<Jari-->
USB sticks still ship with exfat ?
<Jari-->
10 terabyte ones etc. ?
<Jari-->
NTFS no ?
<geist>
i remember BFS had the same problem: since inodes could be allocated anywhere the fsck utility could easily pick up something in a file itself as an inode
<mjg>
hehe
<Jari-->
oh well uefi defaults on ntfs - what else do we have ?
<geist>
ext* fses avoid it by having the inodes in dedicated spots, and btrfs and presumably xfs avoid it by having clear distinction between metadata and data stripes
<gog>
uefi defaults on vfat
<Jari-->
uefi with ext4 support would be preferable though, would please the marketing adopt please ext4
<geist>
Jari--: hmm? yes of course. FAT and exfat are still the de facto (and actually specced) fses for usb sticks
<mjg>
hm i think there is no such problem on ufs
<mjg>
go bsd
GeDaMo has quit [Quit: You are becoming what we French call 'Le Fruitcake'.]
<geist>
mjg: same reason i'm sure: the inode table is allocated up front, thus inodes can only exist on certain spots on the disk
<Jari-->
Google : exFAT's maximum file size limit is 16EiB (Exbibyte). exFAT is compatible with more devices than NTFS, making it the system to use when copying/sharing large files between OSes.
<Jari-->
?
<geist>
correct
<Jari-->
oh
<mrvn>
geist: ext4 finally added a variable for how much of the inode table has been initialized and creates them on demand.
<Jari-->
basically exfat is a major leap, I thought it would suck, but this means 128 PB is the maximum size of a disk for exfat
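One way to arrive at the 128 PB figure, as a back-of-the-envelope sketch: it assumes a 32-bit cluster index and a 32 MiB maximum cluster size, and ignores the handful of reserved cluster values. The constant names are illustrative.

```python
MAX_CLUSTER_BYTES = 32 * 2**20   # assumption: largest cluster is 32 MiB
MAX_CLUSTERS = 2**32             # assumption: 32-bit cluster index, reserved values ignored
PIB = 2**50


def max_volume_bytes(cluster_bytes=MAX_CLUSTER_BYTES, clusters=MAX_CLUSTERS):
    """Upper bound on volume size: number of clusters times bytes per cluster."""
    return cluster_bytes * clusters


print(max_volume_bytes() // PIB, "PiB")
```

Under those assumptions the bound works out to 2^57 bytes, i.e. 128 PiB.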
<geist>
functionally speaking exfat is fat64, though it's different enough from the previous FAT file systems that it actually gets a new name
<mrvn>
And sane FSes added a filesystem ID to inodes so they wouldn't detect a FS image on the disk as part of its own FS on recovery.
<Jari-->
is it possible to boot up an exfat ? custom roms available to boot up an exfat drive? for qemu?
<geist>
mrvn: yeah so there must be a clear marker of 'not initialized'
<geist>
Jari--: sure. here's the problem: exfat is i think patent encumbered
<geist>
so in general it hasn't been a good idea to use it
<mrvn>
geist: something as simple as "largest used inode" works well
<geist>
i think linux only has gotten support for it in the kernel recently, and that was because MSFT bequeathed it to the kernel
<geist>
but i think the licensing terms are still not fully open
<geist>
mrvn: yeah and since it's per allocation group, you can just keep a counter locally there
<geist>
if there's say 32 allocation groups, each with say 10k inodes, you can just store 32 local counters of the highest initialized inode, etc
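The per-group counter idea can be sketched roughly as follows. This is a toy in-memory model with hypothetical names (`AllocationGroup`, `initialized`), not ext4's actual on-disk layout; the 32 groups of 10k inodes are the numbers from the message above.

```python
class AllocationGroup:
    """One allocation group with a lazily initialized inode table."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.initialized = 0   # slots [0, initialized) have been written at least once
        self.free = []         # initialized slots that were later released

    def alloc(self):
        """Hand out an inode slot, initializing a new one only when needed."""
        if self.free:
            return self.free.pop()
        if self.initialized == self.capacity:
            raise RuntimeError("allocation group is out of inodes")
        slot = self.initialized
        self.initialized += 1   # bump the high-water mark
        return slot

    def release(self, slot):
        """Return a previously allocated slot to the group."""
        self.free.append(slot)

    def may_hold_inode(self, slot):
        """fsck-style check: slots past the high-water mark were never inodes."""
        return 0 <= slot < self.initialized


# 32 groups of 10k inodes each, one counter per group.
groups = [AllocationGroup(10_000) for _ in range(32)]
```

A recovery scan only has to look at slots below each group's `initialized` mark, which is the clear "not initialized" marker being discussed.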
<mrvn>
geist: that would require initializing allocation groups. Better to keep it in the superblock
<geist>
sure but the allocation groups already have a header. that's easy to put out
<geist>
frankly the whole splitting the fs into zones and having a local allocation group header, etc is archaic, but it was inherited from FFS, etc
<mrvn>
it's so you reduce seek times
<geist>
made sense when you had old style spinning media, but nowadays nothing really does that anymore
<mrvn>
Even with spinning disks isn't most of the seek time now rotating the disk to the start of the sector and not moving the head?
<geist>
right, vs having all the inodes crammed to the front of the disk it was a local win. more modern designs either don't care because SSD or the metadata can live anywhere on the disk so you can optimize it however you see fit over time
<geist>
good question: i guess you can fairly easily calculate what the rotational latency is, since it's basically fixed on RPM
<mrvn>
except that's variable too :)
<geist>
right except not really. there are only a finite number of RPMs
SGautam has quit [Quit: Connection closed for inactivity]
<geist>
ie, 4200 5400 7200 10k. so you can fairly easily compute it there
<mrvn>
7200RPM = 0.008333s per rotation
<geist>
anyway that's the worst case rotational delay + whatever worst case seek delay
<mrvn>
seek time is what? 12ms?
<geist>
there you go up to 8.3ms for the platter to come by the sector
<mrvn>
Should be <5ms on average :)
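The rotational part of that estimate is easy to compute from the spindle speed alone. A small sketch, using the RPM values listed above; seek time is deliberately left out since it depends on the drive.

```python
def rotation_time_ms(rpm):
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def avg_rotational_latency_ms(rpm):
    """On average the wanted sector is half a revolution away."""
    return rotation_time_ms(rpm) / 2.0


for rpm in (4200, 5400, 7200, 10_000):
    print(f"{rpm:>6} RPM: worst {rotation_time_ms(rpm):.2f} ms, "
          f"average {avg_rotational_latency_ms(rpm):.2f} ms")
```

For 7200 RPM this gives about 8.33 ms worst case and about 4.17 ms on average, in line with the figures quoted in the conversation.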
<geist>
that's more variable, depends on the drive. dunno what modern times are, but i think in general you hear average latencies of like 12ms or so, so i think that's taking into account average seek + average rotational
<geist>
though i dont know what the distribution of that is
<geist>
and of course the drive is free to start reading whatever sector it finds as soon as the head gets to it and start filling in a track cache buffer
<mrvn>
I always wondered if they offset sectors on neighbouring tracks so that a sequential read can seek and get the start of the sector right away.
<geist>
so subsequent sector reads off the same track are probably going to be either ready because it already spun by, or will soon be there (within 8ms)
<geist>
traditionally with stuff like floppy disks you actually can interleave the sectors, like 2:1 or 3:1
<geist>
so you get say sector 0 3 6 9 ... and then 1 4 7 10 ...
<geist>
that way as the sectors go by the host/chipset has time to decide if it wants to read the next logical sector
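A minimal sketch of how such an interleave table could be generated. The function name and the 12-sector track are assumptions made for illustration; real formats pick their own sector counts and collision rules.

```python
def interleave_slots(num_sectors, factor):
    """Return a list where entry i is the physical slot holding logical sector i.

    Consecutive logical sectors are placed `factor` slots apart; when the
    target slot is already taken after wrapping around, the next free slot
    is used instead.
    """
    slots = []
    used = [False] * num_sectors
    pos = 0
    for _ in range(num_sectors):
        while used[pos]:                    # slide forward to a free slot
            pos = (pos + 1) % num_sectors
        used[pos] = True
        slots.append(pos)
        pos = (pos + factor) % num_sectors  # skip ahead for the next sector
    return slots


print(interleave_slots(12, 3))
```

With a 3:1 interleave on 12 sectors, logical sectors 0 to 3 land in slots 0, 3, 6, 9 and the next four in slots 1, 4, 7, 10, so the host gets two sector times of slack between consecutive logical sectors.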
<mrvn>
true. Seek, wait for a "start of block" marker and start reading a complete revolution. Then shuffle the blocks around.
<geist>
now of course how the disk controller lays out the sectors and what tracks are where is totally up to the disk
<geist>
but at least with old things like floppy disks its still easy to understand the logic
<mrvn>
That reminds me of the floppy for the C64. The CPU is too slow (or so they thought) to decode the 10:8 encoding while reading the data so you would read a block, decode a block, read a block, decode a block. Half speed. Only like last decade someone figured out to do it interleaved.
<geist>
i asume even with SSDs there's read-ahead logic that starts to see where the next N pages are via the translation table and starts prefetching data from those flash chips + banks
<mrvn>
even DRAM does that
<geist>
mrvn: yep. apple2 did the same thing. it was even more basic since it didn't have a 6502 on board. since the host cpu itself was basically directly looking at the data under the head i think it generally used a 3:1 interleave so the cpu had enough time to process what it had just seen before
<geist>
totally soft sectored
<mrvn>
and that's just incrementing the row/column address for the next block of data.
srjek|home has joined #osdev
<geist>
that was woz's great invention re: the apple 2 floppy drive. it was totally dump, offloaded all the work to the host cpu which was basically written in hand assembly to have the timing precisely right
<geist>
as a result the disk drives were very cheap compared to almost all the other 8 bit micros at the time
<geist>
s/dump/dumb
<mrvn>
I wonder if I have my old C128+floppy somewhere in my parents basement.
<geist>
yah i have an 1570 drive over here somewhere
<geist>
last i fiddled with it it works fine
<mrvn>
I had a 1571. No turning over the disk.
<geist>
noice
<geist>
a C128 i'd actually like to have, but they're getting pricey. much more rare than a c64
<mrvn>
But it had that rotation lock mechanism that easily breaks and not the push down lever.
<mrvn>
geist: 6502 + Z80 cpu.
<geist>
yeah there are some great youtube vids about the history of it. including talking to some of the guys that designed it
<geist>
it was a fascinating tale. they were just throwing everything at it and surprised they could even make it
<geist>
commodore was already spiralling out of control at the time
srjek_ has quit [Ping timeout: 260 seconds]
ZombieChicken has quit [Ping timeout: 255 seconds]
JudgeChicken has joined #osdev
sympt7 has quit [Ping timeout: 246 seconds]
<heat>
block groups still make some sense
<heat>
vs having a big table of inodes and shit
wootehfoot has quit [Ping timeout: 260 seconds]
<heat>
particularly if expanding/shrinking a filesystem, or checksumming stuff
<heat>
Jari--, tianocore has ext4 support, its up to the platform builder to enable it
<heat>
those nvidia SBCs have it
wootehfoot has joined #osdev
JudgeChicken is now known as ZombieChicken
wootehfoot has quit [Client Quit]
<mrvn>
heat: on the other hand inode tables make not much sense.
<heat>
sure they do, for expansion
<mrvn>
heat: inode tables, not block groups
<geist>
well it's the whole preallocating the inodes and thus having a finite number of them on fixed locations that i think is obsolete
<heat>
why does an inode table not make sense?
<heat>
finding an inode is O(1)
<mrvn>
because it limits you to a fixed inode per block ratio
<geist>
it makes sense, it's just obsolete
* heat
rants on fancy pants filesystems cough cough zfs and btrfs
<mrvn>
using the block location as inode number has O(1) finding too
<geist>
right, that's exactly the strategy bfs did
<mrvn>
and xfs and btrfs
<geist>
well those are different, because an inode is part of one or more btrees
<geist>
i consider that a completely different class. there's a tree, the tree has all these data structures spread across it. an inode is an amalgam of those data structures, and part of the index is the inode #
<mrvn>
using the block location makes it trivial to pick a number and making sure it's unique and tells you how to find it all in one simple algorithm.
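Both lookups being compared are O(1); the difference is where the constraint sits. A small sketch with assumed sizes (4096-byte blocks, 256-byte inodes) and hypothetical function names:

```python
BLOCK_SIZE = 4096   # assumption for illustration
INODE_SIZE = 256    # assumption for illustration


def table_lookup(ino, table_start_block, inodes_per_block=BLOCK_SIZE // INODE_SIZE):
    """Fixed inode table: the inode number indexes into a preallocated array.

    Returns (block number, byte offset within that block).
    """
    block = table_start_block + ino // inodes_per_block
    offset = (ino % inodes_per_block) * INODE_SIZE
    return block, offset


def location_lookup(ino):
    """Location-as-number: the inode number is simply the block address."""
    return ino, 0


print(table_lookup(17, table_start_block=100))
print(location_lookup(12345))
```

The table variant needs the table sized up front, which fixes the inode-to-block ratio; the location variant can put an inode in any free block, at the cost of inodes being harder to tell apart from file data during recovery.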
<geist>
right. however as i was stating before it has a serious flaw: when doing a fsck to try to fix it you can't easily tell an inode from file data
<geist>
so you need a solid way to determine the difference, and such you need another mechanism to differentiate metadata blocks from data blocks
<mrvn>
yeah, that's the drawback. But easily solved by including an FS uuid
<geist>
so you're kinda back to square one, unless you also have a data structure that describes that
<geist>
the FS uuid doesn't easily solve it at all. you can still easily maliciously include some inode in your file data
<geist>
and then next fsck picks it up
<mrvn>
true, it only helps against accidental inclusion
<mrvn>
damn those hackers trying to insert a SUID root bash into the FS.
<geist>
xfs and btrfs at least have layers of allocation schemes, so you already have a fairly clear notion of which stripe/allocation group/etc is dedicated to metadata or file data
<mrvn>
zfs has block groups and each group can be data, metadata or in case of emergency: mixed.
<geist>
i dunno how say XFS solved it. NTFS solves it by having all FILE records exist inside a metadata file itself. it's recursive, because the metadata file ($MFT) is also described by the MFT
<geist>
but there are mechanisms there to help
<mrvn>
that doesn't help in the NTFS case when the metadata file is corrupted
<mrvn>
And that's the case where data blocks get picked up as metadata. When you can't follow the metadata to all files anymore.
<geist>
right but there are mechanisms there to avoid it
<mrvn>
Otherwise you just start at / and scan all directories.
<geist>
notably the first N blocks of the MFT are guaranteed to be in one spot (described by the superblock) and those file records are used to describe the MFT itself
<geist>
so it's pretty hard to corrupt the MFT's description of where itself is located
<mrvn>
Where you pick up data by accident is when scanning for lost / unconnected files.
<geist>
right
<mrvn>
or deleted files.
<geist>
in NTFS you'd just walk the MFT and look for all FILE records that say they are active, but appear to be unlinked
<geist>
OTOH NTFS is also journalled, so in general it's pretty hard to get it too corrupted
<geist>
as are any systems that have a functional journal
<mrvn>
I always like to have multiple ways to verify files
<geist>
yah and this is why more modern stuff has checksums of stuff
<heat>
ufs and ext2 ideal filesystems
<geist>
so interesting: playing with PCI passthrough on proxmox (and thus qemu/kvm)
<mjg>
normie
<geist>
so there are limitations. turns out for some dumb reason both of my server like machines with multiple on board nics put all of the nics in the same iommu group
<heat>
j`ey, so arm64 linux has every kmalloc 128-byte aligned
<heat>
wtf??
<geist>
so trouble is you can't pull one device out of an iommu group and pass it through
<geist>
it takes all the rest of the devices out
<j`ey>
heat: yeah, hence the series
<geist>
seems to be in both cases all of the motherboard nics are in the same group. both an intel xeon nehalem machine and a more modern ryzen
<heat>
how has this not resulted in crippling fragmentation
<geist>
of course any nics in slots end up in a different group
<heat>
8 byte allocations get 120 bytes of alignment
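The padding cost is simple rounding arithmetic. A sketch, taking the 128-byte minimum alignment from the discussion and the 8-byte value from the patch series title; the helper names are illustrative.

```python
def align_up(size, alignment):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def padding(size, min_align):
    """Bytes wasted when an allocation of `size` is rounded up to min_align."""
    return align_up(size, min_align) - size


for size in (8, 16, 64, 96, 128):
    print(f"{size:>4} bytes: {padding(size, 128):>3} wasted at 128, "
          f"{padding(size, 8)} wasted at 8")
```

An 8-byte request rounded up to 128 wastes 120 bytes, matching the number above, while the same request with an 8-byte minimum wastes nothing.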
<geist>
apparently there are some things you can do with it (something called ACS) but in general if it puts things in the same group seems you're outta luck
<heat>
arm32 aligns to the L1 cache line size
<heat>
which is also horrible but less so
<geist>
what do you mean 128? that's the largest L1 cache line size in arm
<geist>
thats why
<heat>
yeah, they're aligning every kmalloc allocation to the L1 cache line size
xenos1984 has quit [Read error: Connection reset by peer]
<mjg>
heat: *every*?
<geist>
now a slightly better thing is to run time look up the L1 cache line size, which is almost always 64
<heat>
mjg, yes
<geist>
but i know of at least one ARM machine that is 128
<bslsk05>
lore.kernel.org: [PATCH v3 00/13] mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
<mjg>
not saying this means linux would not do it
<geist>
sure, i mean sometimes you gotta crack some eggs
<heat>
it looks like this is very much a thing on most architectures
<geist>
but yeah my guess is they are noting the fact that folks may be wanting to allocate things that they're DMAing into out of kmalloc (which is perhaps the real issue)
<geist>
but if you do, you almost always gotta arrange for your buffer to exist in its own cache line(s) and not aliased with anything else
<geist>
yay no cache coherency
<heat>
yeah
<geist>
i have some cheezass mechanism in LK to allocate +cache_line-1 and then align within that boundary or something
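The over-allocate-then-align trick can be sketched with plain integer addresses. This is an illustration of the general idea only, not LK's actual code; the names are hypothetical and a 64-byte cache line is assumed.

```python
def round_up(n, line):
    """Round n up to a multiple of line (line must be a power of two)."""
    return (n + line - 1) & ~(line - 1)


def raw_bytes_needed(size, line=64):
    """How much to request from an allocator with no alignment guarantee."""
    return round_up(size, line) + line - 1


def carve_aligned(raw_addr, size, line=64):
    """Pick the cache-line-aligned region inside the over-sized raw allocation.

    Returns (start address, length); the length is padded to whole cache
    lines so that nothing else can share a line with the buffer.
    """
    return round_up(raw_addr, line), round_up(size, line)


start, length = carve_aligned(raw_addr=1000, size=100)
print(start, length, raw_bytes_needed(100))
```

The extra `line - 1` bytes guarantee that an aligned start exists inside the raw allocation wherever it happens to begin, and padding the length keeps unrelated data out of the buffer's last cache line.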
<geist>
these sort of dma buffers and whatnot are a big issue when porting cache coherency naive code to something like ARM
<mjg>
heat: i would expect this is for some form of aligned alloc
<mjg>
not straight up kmalloc
<geist>
my guess is there's some api that bottoms out in kmalloc, so they had to pass it down
smach has joined #osdev
<heat>
 * kmalloc and friends return pointers aligned to both ARCH_KMALLOC_MINALIGN
<heat>
 * and ARCH_SLAB_MINALIGN, but here we only assume the former alignment.
<heat>
 */
<heat>
 * kmem_cache_alloc and friends return pointers aligned to ARCH_SLAB_MINALIGN.
<geist>
ah there you go
<heat>
arch_slab_minalign is alignof(unsigned long long)
<geist>
thankfully we avoided basically all of these sort of shenanigans in zircon kernel by not doing any of that in the kernel
<mjg>
heat: wut
<mjg>
utter nonsense
<heat>
this may not be a big deal
<heat>
depends on the kernel's allocation pattern
<geist>
maybe kmalloc is less of a concern, since most stuff is slab allocated
<heat>
hmm idk about that
<geist>
guess that's easy enough to figure out by looking at the slab info
<heat>
common stuff like file, vm_area_struct, etc, sure
<heat>
but idk if those represent the most allocations
<geist>
and heh yeah. if you look at /proc/slabinfo on an arm64 machine, the lowest bucket is kmalloc-128
<heat>
see, this is why arm64 is a shit shit architecture lol risc loser go back to mips x86 rulez intel amd oh yeah
<mjg>
here is ONE WEIRD TRICK: fix whatever fucking api they are using to kmalloc_aligned or whatever it is
<mjg>
how the fuck did something like this go in
<heat>
mjg, OR
<geist>
mjg: we just established it. it's because of DMA cache coherency on ARM
<heat>
dma-kmalloc-16
<heat>
isn't there a flag for this? GFP_DMA or something
<mjg>
geist: except most kmalloc'ed bufs are not used for dma, are they
<geist>
that's the point!
<geist>
they are. and that's the problem
<mjg>
what?
<geist>
probably because no one gives a shit on x86 so they built the api like that
<geist>
and then on architectures where there's a problem, the 'temporary' hack was to raise the minimum bucket to the maximum cache line size (128)
<mjg>
they can still fix it?
<mjg>
i see 0 justification in the patchset as to why they don't do that instead
<geist>
i dunno, but that's how solutions are made. bandaid over it, fix it later. not all things are perfect all the time
<heat>
GFP_DMA is a thing man
<heat>
i think they could just use it
<mjg>
geist: that's how crap lasting decades is made
<geist>
making it locally work on your new architecture (arm64) is a lot easier than fixing the underlying problem. it's the essence of Real Engineering
<geist>
you can't always fix everything perfectly the first time
<mjg>
cue you in 10 years claiming well it clearly made sense bro, not a hack!
<geist>
*shrug*
<geist>
i mean something working sub-optimally is far better than not working at all
<heat>
yeah i mean
<geist>
what would make me more worried is a hack like this actually improving performance, such that it's harder to remove later
<heat>
this is unix
<mjg>
if they demonstrate how infeasible it is to fix
<mjg>
then sure
<geist>
by having less cache lines alias, etc etc
<mjg>
but i see nothing of the sort
<mjg>
so far looks like a lazy cop out, webdev style
<geist>
most likely the global, architecturally neutral api expects kmalloc to work fine
xenos1984 has joined #osdev
<geist>
and so fixing it involves rethinking/retooling that
<geist>
and so the local (arm64) hack is this
<heat>
this is defo not an arm64 only thing
<geist>
you should look at something like arch/ppc and how it has to deal with page tables in linux
<mjg>
or *maybe* replacing the relevant kmalloc call sites with kmalloc_aligned(..., ARCH_DMA_ALIGN)
<geist>
it's a total clusterfuck. but linux.
<mjg>
or similar
<mjg>
which would change nothing on x86
<geist>
heat: probably anything not x86 that doesn't have dma cache coherency
<mjg>
but again 0 analysis performed
<mjg>
this is the kind of bullshit i normally expect in the bsd land
<mjg>
:]
<geist>
but anyway i have no real idea. we're seriously armchair quarterbacking this thing
<bslsk05>
twitter: <marcan42> OMG. So it turns out Ampere Altra botched their PCIe controller in a way that makes it unable to use (e)GPUs just like Macs.   So what did they do?   They put an ARM64 load/store emulator into the kernel. <github.com/Tencent/Tencen… https://t.co/HS6NOK4Y90> <github.com/Tencent/Tencen… https://t.co/2UFPBLMmcf>
<mjg>
what's the point of linux existing if you can't shit on it
<mjg>
:thinkingface:
<geist>
heat: heh i've actually seen worse than that in vendor local linux trees
<geist>
there was a tree years ago where $vendor just inserted memory barriers all over the place, including in some core linux macros
<geist>
fixed the problem
<j`ey>
more barriers less problems
<heat>
there you have it
<heat>
fixed
<mjg>
:]
<heat>
🚢it
<mjg>
have fun removing them
<geist>
i eventually dug into it, turns out there was a serious memory controller bug that would occasionally reorder instruction fetches in front of data fetches
<mjg>
geist: lol
<geist>
so it was still a bug, but the barrier was really only needed when reading data into a block you were going to run
<mjg>
gotta love the hw bugs
<geist>
but their hack had been to just sprinkle isbs all over the kernel until the problem went away
<geist>
with of course no explanation
<mjg>
:]
<heat>
have you seen an unmoveable printk
<geist>
turns out they knew about the bug, so they could have at least more correctly fixed it.
<heat>
/ DONT REMOVE OR ELSE IT BREAKS
<geist>
heh yeah
<mjg>
where
<mjg>
i did not
<geist>
anyway gonna go take a walk. toodles
<heat>
i guess we'll wait
<mjg>
so how are things in the onyx land
<mjg>
support wifi yet?
<heat>
no
<mjg>
then i'm sticking to serenityos for my daily driver
<heat>
i found a bug in my KASAN quarantine code and i'm trying to track down the problem
<mjg>
started listened to can't hurt me yet?
<mjg>
listening
<heat>
serenity supports wifi?
<heat>
(X) Doubt
<mjg>
fuck if i know
<mjg>
does not stop me from picking it over onyx
<heat>
well sure
<heat>
they have a web browser, i have flamegraphs
<heat>
you choose
<mjg>
:thinkingface:
<heat>
enjoy their LibJS, i'll enjoy my flamegraph.pl
<mjg>
need to sleep on it
<j`ey>
heat: did you see one of their libjs dudes is on the js community in some form now
<heat>
nope
<heat>
i have limited brain power
<geist>
also just as i was leaving i saw a video about a guy that made a 3 meter long concrete sarcophagus with a bag of Flamin' Hot Cheetos in it and buried it, not to be opened for 10000 years
<mjg>
i'll set a calendar event for it
<mjg>
is this stored next to nuclear waste
<heat>
hopefully serenityos doesn't have the 2038 epoch issue