klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
danilogondolfo has quit [Remote host closed the connection]
nyah has quit [Quit: leaving]
Burgundy has quit [Ping timeout: 260 seconds]
Vercas has quit [Quit: Ping timeout (120 seconds)]
Vercas has joined #osdev
smach has quit [Ping timeout: 252 seconds]
Matt|home has quit [Quit: Leaving]
gog has quit [Ping timeout: 255 seconds]
Brnocrist has quit [Ping timeout: 252 seconds]
Brnocrist has joined #osdev
craigo has joined #osdev
xvmt has quit [Ping timeout: 264 seconds]
xvmt has joined #osdev
joe9 has quit [Quit: leaving]
Vercas has quit [Quit: Ping timeout (120 seconds)]
Vercas has joined #osdev
wxwisiasdf has joined #osdev
<wxwisiasdf> Hiiiiii
<wxwisiasdf> today is the day we consume RISCV 64 and embrace the greatness of RISCV 128
smach has joined #osdev
fedorafansuper has joined #osdev
fedorafan has quit [Ping timeout: 252 seconds]
masoudd has quit [Quit: Leaving]
heat has quit [Ping timeout: 256 seconds]
smach has quit [Read error: Connection reset by peer]
sugarbeet has joined #osdev
sugarbeet has left #osdev [#osdev]
CryptoDavid has quit [Quit: Connection closed for inactivity]
fedorafansuper has quit [Quit: Textual IRC Client: www.textualapp.com]
small_ has quit [Quit: Konversation terminated!]
<mrvn> You went from riscv 32 to riscv 64 and it wasn't enough. What makes you think doing the same again will be any better? Come on, go up to the next operand. 64 * 64 = riscv 4096
<mrvn> 64^2
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
srjek has quit [Ping timeout: 268 seconds]
smeso has quit [Quit: smeso]
zxrom has quit [Quit: Leaving]
Vercas has quit [Quit: buh bye]
Vercas has joined #osdev
smeso has joined #osdev
<wxwisiasdf> mrvn: riscv 96-bit
wxwisiasdf has quit [Ping timeout: 264 seconds]
<geist> i think there is a prototype riscv128 in work though, i should dig up infos on it to see
bradd has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
bradd has joined #osdev
<sham1> What reason would there be
<sham1> What would be the use of a 128 bit ISA? I mean, okay, arithmetic, but other than that
<geist> well, the arithmetic could be a thing
<moon-child> I thought it was just about addr space
<moon-child> I mean, you could do just 128 bit arithmetic and smaller address space. That would be fine imo
<moon-child> (though not too useful in practice--multiword arithmetic is fine when you need it)
* geist nods
<geist> but a 64bit aspace we're fairly close to exhausting in some extreme situations, so i can see extending that out in a natural way to be a thing to consider
<moon-child> which conditions?
<moon-child> I mean, you could have more than 2^64 bytes of data
<geist> big ass machines
<moon-child> but it's not clear to me that you can practically exceed a 64-bit address space
<geist> also mapping large storage things into the aspace
<moon-child> and you don't want to double the size of your regular pointers
<geist> it's already enough that arm and x86 are extending from 48 to 57, etc
<AttitudeAdjuster> moon-child: bring back weird segment addressing of the old days maybe?
<geist> i'm not saying it's something anyone needs right now, but in 10-20 years easy
<AttitudeAdjuster> 16bit segment pointer with 64bit addr pointer
<moon-child> that's my point; the considerations for large storage things are different than for main memory. And doubling the size of all pointers seems like a bad tradeoff
<moon-child> AttitudeAdjuster: pls
<geist> so the riscv folks at least left in a nice forward compatibility mechanism
* geist shrugs
<AttitudeAdjuster> moon-child: fine i'll see myself out :'(
AttitudeAdjuster has left #osdev [dirty imposter]
AttitudeAdjuster has joined #osdev
<AttitudeAdjuster> jk
<moon-child> I wonder how many captchas on shady sites are fronts for captcha solving services
<moon-child> considering they get to sell a solved captcha and still get the benefit of a regular captcha
slidercrank has joined #osdev
truy has joined #osdev
<sham1> moon-child: doesn't even need to be a shady site. reCaptcha and thus nowadays Google does it even now to get training data
<moon-child> obviously. That's different
potash has quit [Quit: ZNC 1.8.2 - https://znc.in]
micttyl has joined #osdev
lockna has joined #osdev
bradd has quit [Ping timeout: 252 seconds]
bgs has joined #osdev
lockna has quit [Quit: lockna]
potash has joined #osdev
<epony> that implies that GOOG is not a shady business operation.. but it is
<epony> nothing that happens on the inside is well understood, verifiable, or evaluated for validity by the public; it does some things that people speculate about and that's it
foudfou has quit [Quit: Bye]
foudfou has joined #osdev
<epony> the primary purpose of that is to rate limit the concurrency overloaded (enum) n:M (many) problem of servers that everyone uses, but that "concentration" is not really natural or meaningful, it's artificial (and not very intelligent)
<epony> it only obstructs regular users, not intentional violators of policies and limitations, nor business nor criminals, not mechanised and serviceable solvers and bypasses, as with copy protection and copyright and patents (and other "intellectual property") in general.. and GOOG steals secrets from your computers, that's why it's banned in research and development institutions and facilities outside USA (for example in German universities and other places)
Left_Turn has joined #osdev
<geist> can you just stop
<epony> yes
<geist> then please do
<epony> ok
Turn_Left has quit [Ping timeout: 268 seconds]
bradd has joined #osdev
truy has left #osdev [#osdev]
valerius_ is now known as valerius
potash has quit [Read error: Connection reset by peer]
smach has joined #osdev
<dinkelhacker> does anyone know how to make qemu start at EL3?
<dinkelhacker> nvm, found it: -machine virt,secure=on,virtualization=on
<geist> bingo. yep
<geist> also means it won't emulate PSCI or whatnot; that's now your job (if you want to)
<dinkelhacker> as I don't know what it is I think I don't need it right now :D
<geist> yeah, if you just use virtualization=on you start at EL2 though
<geist> with PSCI emulated at a pseudo EL3
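A sketch of the two invocations being described (QEMU virt machine options; kernel.elf is a placeholder):
```
# secure=on implements EL3 -- no PSCI emulation, bringing up cores is your job
qemu-system-aarch64 -M virt,secure=on,virtualization=on -cpu cortex-a53 \
    -kernel kernel.elf -nographic

# virtualization=on alone starts you at EL2, with PSCI emulated at a pseudo EL3
qemu-system-aarch64 -M virt,virtualization=on -cpu cortex-a53 \
    -kernel kernel.elf -nographic
```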
fedorafan has joined #osdev
craigo has quit [Ping timeout: 255 seconds]
danilogondolfo has joined #osdev
DonRichie has quit [Quit: bye]
smach has quit []
foudfou has quit [Remote host closed the connection]
foudfou has joined #osdev
gog has joined #osdev
gxt has quit [Remote host closed the connection]
gxt has joined #osdev
unimplemented has joined #osdev
unimplemented has quit [Read error: Connection reset by peer]
<ddevault> is this slide comprehensible https://l.sr.ht/0MNQ.png
<zid`> not really
<zid`> Usually you'd put blank lines in and make a sort of diagram showing the switches
<ddevault> blank lines?
<zid`> making y the time axis
<ddevault> Y is the time axis here
<zid`> so like your (blocked) lines
<ddevault> but, hm
<ddevault> maybe a table is better than an enumeration
<zid`> your thing does two things at the same time on multiple rows, so it isn't cpu time on y
<ddevault> fixed some of the timing issues https://l.sr.ht/2Ke9.png
<ddevault> could be multiple cores, the key is not CPU time but task states
<zid`> I'd say just outright remove (blocked)
<zid`> it's just making the screen busier
<zid`> -- at best
<zid`> I'd get rid of line 12 for similar reasons
<zid`> and is task 1 line 6 doing anything?
<zid`> seems like it could be folded into 5, and remove another 'two things on same line' case
<ddevault> latest https://l.sr.ht/d8L8.png
<ddevault> not sure what these numbers refer to after several edits
<zid`> same
<ddevault> any better? https://l.sr.ht/ccq5.png
<zid`> much nicer
<zid`> I prefer the old colours I think though
<zid`> no idea what they were trying to express
<zid`> but they were prettier
<ddevault> orange is kernel, black is userspace
<ddevault> to be explained in narration
nyah has joined #osdev
<ddevault> here's the whole slide deck, still not done expanding it for the full hour slot https://l.sr.ht/04Q9.pdf
GeDaMo has joined #osdev
<dinkelhacker> So if one compiled with -fpic for pa space and you switch to va space, you can't just write the va offset to the pc and sp, right? I mean it works as long as you don't have any static function pointer arrays, which will contain the pa addresses...
<zid`> well if it's pic it's pic
<zid`> if it's not pic it's not
<zid`> tautology best ology
<zid`> If it's PIC, you can.. position it wherever you want, if it's not, you cannot
<dinkelhacker> I'll have to check later but I think I compiled with -fpic
bradd has quit [Ping timeout: 248 seconds]
<dinkelhacker> zid`: even if a global array contains pointers to functions? I mean the addresses stored in memory can only be one value?
<zid`> you'd need to process the GOT for that
<zid`> and do relocations
<dinkelhacker> Hmm.. seems like it would be much easier if the bootloader already sets up the vaspace and you directly compile the kernel for that?
<zid`> I get it easy because I use two binaries
<zid`> I turn the mmu on and jump to the pre-prepared kernel binary built to run at a specific VA
<zid`> achievable with a linker script fine though
<zid`> even as a single binary
<dinkelhacker> as a single binary? How? Tell the linker that this one part of the code is at pa and the rest at va?
<zid`> it's just two sections with two different virtual addresses
<zid`> . = 1M; .text.low : { bootstrap.o } . = -2GB; .text.high : { kernel.o } or such
<dinkelhacker> bootstrap.o would be at a physical address, then you turn on the mmu and jump to kernel.o which is at a virtual address? I don't get the "two different _virtual_ addresses" part.
<zid`> VA = PA
<zid`> you can still consider it a virtual address
<zid`> it's just identity mapped until the mmu is on
<zid`> your code doesn't give a shit about the physical address, just which virtual address things are visible through
<dinkelhacker> okay but wouldn't the image grow if they are far apart?
<zid`> We're only changing the virtual addressing
<zid`> the physical is still the load address of the ELF (1MB for me, text.low would be at like 0x1001000 and text.high would be at 0x1002000)
<dinkelhacker> and what exactely tells the linker that you are changing the virtual address?
<zid`> . =
bradd has joined #osdev
<zid`> I made a test setup I can show you
<bslsk05> ​zid/test_va - Example (0 forks/0 stargazers)
<zid`> There
<zid`> f() and g() both know which address they will be running from, as shown by the disassembly
<zid`> you can also use AT() to disjoint what ends up in the program headers, if needed
xenos1984 has quit [Read error: Connection reset by peer]
<zid`> or >
fedorafan has quit [Ping timeout: 256 seconds]
<dinkelhacker> thx! I'll take a look. I thought I did that at some point and it ended up growing my image a lot. But now that you explained it I don't know why it should.
<zid`> you did . inside the {}
<zid`> so you had 'start of section at x, end of section at y'
<zid`> so it had to pad it
fedorafan has joined #osdev
<zid`> That's a weird binary that says .text.low will be in physical memory at 10M but expects to run from 1M, and .text.high will be in physical memory at 20M but expects to run at 128M
<zid`> I have a 1M = 1M, and a 1.1M = 510TB for my actual thing, the 1M=1M low code runs with paging disabled, I use it to set up the 510TB -> 1.1MB mapping, then jump to 510TB
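Spelled out, the low/high split zid` is describing looks roughly like this (a sketch; the object names and addresses are illustrative, not his actual script):
```
ENTRY(_start)

SECTIONS
{
    /* pre-mmu code: VA == PA, runs with paging off */
    . = 1M;
    .text.low : { bootstrap.o(.text*) }

    /* kernel proper, linked to run high once the mmu is on; AT() keeps the
       load address packed right behind .text.low so a flat binary stays small */
    . = 0xFFFFFFFF80000000;
    .text.high : AT(LOADADDR(.text.low) + SIZEOF(.text.low))
    {
        kernel.o(.text*)
    }
}
```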
Burgundy has joined #osdev
<dinkelhacker> oh okay! I think I got it
xenos1984 has joined #osdev
Burgundy has left #osdev [#osdev]
bauen1 has quit [Ping timeout: 256 seconds]
truy has joined #osdev
smach has joined #osdev
gxt has quit [Remote host closed the connection]
gog has quit [Ping timeout: 246 seconds]
gxt has joined #osdev
heat has joined #osdev
elastic_dog has quit [Ping timeout: 252 seconds]
elastic_dog has joined #osdev
smach has quit [Read error: Connection reset by peer]
<dinkelhacker> zid`: thx, btw ;)
<ddevault> final slide deck https://l.sr.ht/Lw4Y.pdf
bauen1 has joined #osdev
bauen1 has quit [Ping timeout: 255 seconds]
bauen1 has joined #osdev
dutch has quit [Quit: WeeChat 3.8]
fedorafan has quit [Ping timeout: 252 seconds]
<dinkelhacker> zid`: ok so I've actually done it like you mentioned. But when I create a binary I use objcopy -O binary out.elf out.img. The -O binary actually makes it bigger
Gooberpatrol_66 has joined #osdev
<dinkelhacker> at least when I have sections with addresses far apart. Without that my binary is actually smaller (30k instead of 250k)
TkTech7 has joined #osdev
xvmt_ has joined #osdev
sebonirc_ has joined #osdev
Patater has joined #osdev
pounce_ has joined #osdev
puck__ has joined #osdev
fedorafan has joined #osdev
zhiayang_ has joined #osdev
samis has joined #osdev
childlikempress has joined #osdev
nyah_ has joined #osdev
outfox_ has joined #osdev
doppler_ has joined #osdev
corank has joined #osdev
nyah has quit [*.net *.split]
xvmt has quit [*.net *.split]
sortie has quit [*.net *.split]
TkTech has quit [*.net *.split]
DrPatater has quit [*.net *.split]
CompanionCube has quit [*.net *.split]
corank_ has quit [*.net *.split]
_koolazer has quit [*.net *.split]
Gooberpatrol66 has quit [*.net *.split]
ebb has quit [*.net *.split]
puck has quit [*.net *.split]
Clockface has quit [*.net *.split]
sebonirc has quit [*.net *.split]
mahk has quit [*.net *.split]
outfox has quit [*.net *.split]
stux has quit [*.net *.split]
doppler has quit [*.net *.split]
zhiayang has quit [*.net *.split]
moon-child has quit [*.net *.split]
pounce has quit [*.net *.split]
xvmt_ is now known as xvmt
sebonirc_ is now known as sebonirc
pounce_ is now known as pounce
TkTech7 is now known as TkTech
zhiayang_ is now known as zhiayang
sortie has joined #osdev
koolazer has joined #osdev
<heat> dinkelhacker, how does your linker script look?
<heat> for a regular elf if you start jumping around the vaddr when objcopying to binary you are forced to have padding
<heat> so PHDR [1MiB, 2MiB], PHDR [4MiB, 4MiB + 4] will objcopy to ~3MiB + 4 bytes
<heat> sorry, not vaddr but probably paddr
ebb has joined #osdev
<zid`> unrelated
<zid`> honestly the phys field in an elf loader is *incredibly* rarely useful
<dinkelhacker> heat: it looks like so https://pastebin.com/N4fQ6rX5
<bslsk05> ​pastebin.com: ENTRY(_start)__stack_core_0 = 0x160000 - 0x10000;__stack_core_1 = 0x1600 - Pastebin.com
<dinkelhacker> so if I `objcopy -O binary` that I get roughly 30k. Without -O binary I have 250k.. just tried running that on the pi which did not work
<zid`> show readelf -l
<zid`> end is a bit of a mess btw
<dinkelhacker> I mean probably because without that it's just an ELF file right? The pi expects a binary?
<zid`> . = align(4096); . = align(4096); bss_end = .; end = .;
<heat> I don't know. maybe?
<heat> they usually expect a flat binary
<heat> but idk about the pi
<zid`> idk what pi expects, qemu can probably deal with elf at least
<zid`> but, show readelf -l
<heat> btw, let me guess, your elf has debug info/syms
<heat> :))
<zid`> /DISCARD/ ho
<heat> btw, quick linker script tips: you can ALIGN(4096) when declaring your sections (like .text ALIGN(0x1000) : ...), you should do *(.data*), *(.text*) because the compiler sometimes generates stuff like .text.hot, etc
<zid`> .text.startup
<zid`> is a classic
<zid`> and arm type devices always have a bunch of weird shit
<dinkelhacker> yeah qemu can but not the pi.. so that's my problem: I can't compile it in a way where I have some code in the pa space and some in the va space to get around the problem I had when switching to va space
<zid`> like .data.constpool.rel8
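heat's two tips condensed into one sketch:
```
/* page-align each output section and catch the compiler's .text.hot,
   .text.startup, .text.unlikely, .data.constpool.* variants */
.text   ALIGN(0x1000) : { *(.text .text.*) }
.rodata ALIGN(0x1000) : { *(.rodata .rodata.*) }
.data   ALIGN(0x1000) : { *(.data .data.*) }
```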
<zid`> dinkelhacker: readelf -l plskthx
<heat> dinkelhacker, why can't you
<zid`> You just need to make the rom be what the elf would be sans header, which will probably just be.. to do nothing besides move . around
<dinkelhacker> heat: bc. the binary would be huge?
<heat> it would not
<zid`> no, you're confusing file offsets with virtual addresses
<heat> you just need to do it properly
<zid`> file offsets should be linear and packed, you use the mmu to map some of that file into memory at high addresses
<heat> ELF supports vaddr != paddr
gildasio has quit [Ping timeout: 255 seconds]
<zid`> paddr doesn't even matter here, heat
<zid`> we won't be using a physical loader for the elf
<heat> in the linker script you can use AT(...) to set up the paddr for your sections
<zid`> if paddr mattered, which it won't
<heat> wdym "we won't be using a physical loader for the elf" ?
<zid`> elf will be flashed to a rom or whatever
<zid`> nobody is then going to 'load' the elf section by section into physical memory
<zid`> it'll get splatted there in an -O binary blob
<clever> but objcopy to .bin, uses paddr rather than vaddr when laying out sections
<zid`> which is why you should ignore it
<clever> so you may have a gig of gap in the vaddr, but no gap in paddr
gog has joined #osdev
<zid`> just shove everything into .text starting at 0 if you're blobbing
<heat> sure, but you need to do it properly to get a usable ELF you can easily objcopy or use for debugging, etc
<zid`> ignore paddr, let the linker sort it out
gildasio has joined #osdev
<zid`> yea that tracks
<clever> zid`: the difference matters most in XIP targets, where you want the linker to put .data in ram, but objcopy to put .data into the ROM with .text
<bslsk05> ​pastebin.com: Elf file type is EXEC (Executable file)Entry point 0x160000There is 1 progra - Pastebin.com
<zid`> clever: that's a loader
<zid`> jesus lol
<zid`> oh, -ffunction-sections?
<heat> what the fuck
<heat> why are you putting everything in one phdr?
epony has quit [Read error: Connection reset by peer]
<zid`> I mean, that's what I do, unless like you said, I need to run it through other tools
<zid`> like deboogers
<clever> looks like the linker script didnt merge .text.* into .text
<heat> it did not
<zid`> but I use the wildcard :p
<heat> dinkelhacker, btw your load address is bogus
<zid`> .text : { *.o (.text*); }
<zid`> --wide exists also btw
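For reference, the inspection loop zid` keeps asking for (standard binutils flags):
```
readelf -lW kernel.elf     # -W/--wide: show the program headers untruncated
objcopy -O binary kernel.elf kernel.img
ls -l kernel.img           # spans lowest..highest paddr, padding included
```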
<dinkelhacker> guys I can't follow anymore >D
<kaichiuchi> hi
<zid`> sections go in, sections go out
<heat> ok everyone shut the fuck up
<heat> including kaichiuchi
<heat> fuck you
<kaichiuchi> fuck you too
<zid`> oh I have him ignored, makes sense
<heat> <3
<kaichiuchi> zid`: me!
<kaichiuchi> ?
<heat> dinkelhacker, pressing concerns: your load address is nothing a rpi will ever load
<zid`> pfft I checked logs he was being fine
<bslsk05> ​github.com: documentation/boot.adoc at develop · raspberrypi/documentation · GitHub
<kaichiuchi> wonder why zid` would ignore me
dude12312414 has joined #osdev
bauen1 has quit [Ping timeout: 252 seconds]
<kaichiuchi> i’d love to target an OS to rpi
<heat> dinkelhacker, I've also heard "0x80000 for older 64-bit kernels ("arm_64bit=1" set, flat image)"
<gog> hi
<kaichiuchi> hi
<heat> hell
<gog> did i mss some darama
<heat> no
<kaichiuchi> no
<gog> boring
<clever> heat: you can also just set kernel_address= in config.txt to force a certain load addr, as long as it doesnt conflict with other parts
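As a sketch, the config.txt knobs mentioned here (documented rpi firmware options; the image name is a placeholder):
```
# config.txt on the boot partition
arm_64bit=1              # 64-bit flat images load at 0x80000 by default
kernel=bootloader.img
kernel_address=0x80000   # force the load addr; must match the link addr
```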
<heat> dinkelhacker, TLDR your load address makes no sense and that would explain it if your thingy doesn't work
<heat> clever, any insight into the load address spaghetti fuckery?
<clever> i would need to see the linker script
<dinkelhacker> heat: The binary the pi will load is a small bootloader I have on the sd card which allows me to send the actual binary via uart. However, this bootloader has the same load address. I mean, the pi just loads the binary to address 0x80000 and jumps to it.
<dinkelhacker> and it works fine ... i don't think the load address matters at all for the pi?
<clever> dinkelhacker: it's more that if you're not writing PIC code, and your binary is loaded to a different address from where you linked it, things malfunction in fun ways
<heat> yes, the address you link your binary to run at needs to more or less match or here be dragons
<clever> but if the bootloader is loading your binary to 0x160000, it should be fine
<heat> s/more or less//
<dinkelhacker> yeah that is what I'm saying. The bootloader is linked to 0x80000, the pi loads that and executes it and loads the binary to 0x160000
<sham1> PIC without IP-relative addressing seems "fun". I hope that ARM has that
<clever> dinkelhacker: what is not working?
<dinkelhacker> but I still don't get how I should have one linker script where one code section is at 0x160000 and one at 0x40000000, and objcopy that to a flat binary without it being like 1GiB in size?
Vercas has quit [Ping timeout: 255 seconds]
<clever> dinkelhacker: thats what AT and paddr is for, to tell objcopy how to layout the things in the .bin, something else (mmu or memcpy) then has to move them to the "right" addr later
bradd has quit [Ping timeout: 260 seconds]
<heat> sham1, i think arm and riscv are mostly PIC
<clever> ive recently been looking into the encoding more, `b label` is always PC-relative, but bits0/1 of the addr are missing, because the target must be 32bit aligned
<clever> but if you're jumping to something that might be thumb, you need `bx r0`, and now you need to get the addr into r0 first, `ldr` is typical, but that's not usually PIC
<clever> and i vaguely remember an `adr` opcode, that is basically just `r0 = pc + offset`
<bslsk05> ​github.com: Onyx/linker.ld at master · heatd/Onyx · GitHub
<heat> this linker script has code at 16MiB and -2GiB (almost +256TiB)
<heat> as you may guess, I don't get a 256TiB blob :))
antranigv has quit [Quit: ZNC 1.8.2 - https://znc.in]
<dinkelhacker> I thought AT was irrelevant? o.O
<dinkelhacker> but that makes a lot more sense ^^
<clever> objcopy uses the paddr, AT sets the paddr
antranigv has joined #osdev
<dinkelhacker> Ok... that makes sense... THANK YOU
<sham1> I dislike this immensely. Why would you physically link your kernel at 16MiB heat
bauen1 has joined #osdev
<sham1> Just make a separate thing that puts your kernel at -2GiB vaddr from the outset
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
mlombard has joined #osdev
<zid`> 16MB gives you lots of space for activities underneath
<zid`> like, 128 stacks
valerius has quit [Killed (NickServ (GHOST command used by theophilus!~corvus@user/theophilus))]
valerius_ has joined #osdev
<clever> dinkelhacker: also, a handy trick, if you pass qemu a .elf file, it will respect the load addresses (i forget if its paddr or vaddr) and i think the entry-point
<clever> dinkelhacker: so you could skip your bootloader when in qemu
<dinkelhacker> I do that.
antranigv has quit [Quit: ZNC 1.8.2 - https://znc.in]
epony has joined #osdev
antranigv has joined #osdev
Vercas has joined #osdev
Vercas has quit [Client Quit]
<dinkelhacker> how do you actually set the load address?
Vercas has joined #osdev
<bslsk05> ​github.com: gba-template/linker.ld at master · cleverca22/gba-template · GitHub
<clever> this defines various regions of memory that the linker should know about
<clever> and the > later on, says which region a section belongs in
<clever> .data has a vaddr within iwram, but a paddr within rom
<clever> any time c/asm refers to a symbol in .data, it will get the vaddr of the symbol
<clever> but objcopy -O binary, uses the paddr
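Condensed, the MEMORY/AT> pattern being walked through (a sketch in the spirit of the gba-template, with GBA-like addresses, not the file itself):
```
MEMORY
{
    rom   (rx)  : ORIGIN = 0x08000000, LENGTH = 32M
    iwram (rwx) : ORIGIN = 0x03000000, LENGTH = 32K
}

SECTIONS
{
    .text : { *(.text*) } > rom           /* runs in place, vaddr == paddr */

    /* .data: vaddr in iwram, paddr in rom -- symbols resolve to iwram,
       objcopy -O binary stores the bytes in the rom image, and startup
       code must copy them across before C code runs */
    .data : { *(.data*) } > iwram AT> rom
}
```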
Burgundy has joined #osdev
Vercas has quit [Client Quit]
Vercas has joined #osdev
<dinkelhacker> So my load address was weird because I never set a paddr in the linker script?
<clever> there was probably a default load addr somewhere
<clever> or you're using that trick others do, where they just shove 16mb of zeros into the .o file, via .space
<clever> and praying it all lines up
<dinkelhacker> where could that default load address be?
uzix has joined #osdev
<clever> somewhere in the binutils source
uzix is now known as mahk
<dinkelhacker> kk
<dinkelhacker> and that load address would normally be used by a proper loader?
<clever> when using a .bin file, the load address is basically lost
<clever> objcopy just gives you a binary, that spans the lowest addr to the highest addr
<clever> and it's your responsibility to ensure it's loaded at the addr the linker was expecting
<clever> dinkelhacker: another option, is to just implement elf in the bootloader, and send it that
<clever> then the bootloader will respect the elf headers
<dinkelhacker> Yeah that lines up with what I knew. I think I just got completely confused bc. I didn't know about the paddr/vaddr/objcopy thing.
srjek has joined #osdev
<dinkelhacker> Maybe once I have usb running. sending more than a couple of KB via uart is so slow
<dinkelhacker> heat: thx for the link of your linker file. That helps!
<zid`> Imagine having more than a couple of kb of project
<clever> i implemented xmodem a while back, for loading an entire .elf
<clever> it wound up taking 2 minutes to load
* zid` hides his 3.2MB kernel image
<clever> so i went back to using the official netboot
<zid`> It's water weight I swear
<clever> dinkelhacker: which reminds me, you can just have your rpi boot kernel.img over tftp, 100mbit or more!
<zid`> It *might* be the giant background bmp.
<sham1> I wouldn't imagine XModem being particularly fast
<dinkelhacker> zid`: I bet you don't send it via uart^^
dutch has joined #osdev
<zid`> who needs a uart when you have a background bmp
<clever> sham1: yeah, all it added was error detection and retry
<clever> i was also only running at 115200 baud
<zid`> lto breaks --wide even, stupid lto
<clever> but i have ran at 1,000,000 baud before, and could have tried that
<zid`> .gnu.lto_ayame.0.9b1d301769837a9b
<zid`> good section name
Vercas has quit [Ping timeout: 255 seconds]
<sham1> Thanks .gnu
<clever> dinkelhacker: have you looked into the netboot on the pi yet? it works on every model
<clever> (that has ethernet)
<dinkelhacker> no i haven't
<clever> it lets you just throw start(4).elf + kernel.img onto a tftp server, whack reset on the pi, and boom, its running
<clever> no need to swap SD cards, no need to wait on uart
<dinkelhacker> seems like I tend to set life difficulty to `hard`....
<dinkelhacker> "wack reset" please don't tell me it has a reset button...
<clever> it has a reset pin
<clever> in the past, ive wired it to a giant arcade console button
<clever> so i can just smack it every time the build is done
<clever> but lately, ive wired reset to a pin on my uart adapter
<dinkelhacker> I've set up an interrupt on one of the gpios which i trigger through openocd and then let the watchdog time out
<clever> openocd could also just halt the arm, and then write to the watchdog
<clever> and with the arm halted, it cant fight back!
<dinkelhacker> funny you say that ... i realized that today and tried it, which seems to segfault my openocd version
<clever> that sounds like a bug in openocd
<clever> also beware of the arm mmu, you might need to turn it off, if you're not sure where the mmio is mapped
<clever> also, with just 3 opcodes (and knowing which registers can be clobbered), you can write a single byte to the uart
<dinkelhacker> Follow-up question on the linker topic: So now I get how I can compile the code so that one portion uses pa and the other va. If one version uses a function that is normally in the other world, that won't work? Or will it with PIC?
small has joined #osdev
<clever> in the past, ive made a putc ASM macro, so i could just print a char anywhere, to debug things
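A sketch of the three-opcode putc clever describes (aarch64 flavour; the address is the pi 3's PL011 data register and is board- and mmu-state-specific):
```
.equ UART_DR, 0x3F201000      /* placeholder mmio address, depends on board/mmu */

.macro putc c
    ldr  x9, =UART_DR         // only x9/w10 clobbered
    mov  w10, #\c
    strb w10, [x9]            // fire and forget: no fifo-status check
.endm
```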
<zid`> It's all VA.
<zid`> Just sometimes the mmu is disabled, such that PA=VA
<zid`> (or identity mapped)
<dinkelhacker> okay right
<clever> the linker always acts on vaddr, and assumes the vaddr is always right
<clever> so when the mmu is off, you need to ensure the binary is loaded to that vaddr (or exclusively use PIC asm)
<clever> when the mmu is on, you then need to ensure the binary is mapped to that vaddr
<clever> 2023-01-24 10:09:07 < heat> this linker script has code at 16MiB and -2GiB (almost +256TiB)
<clever> this example, has 2 chunks of code, a pre-mmu code, with an addr that is in valid phys memory
<clever> and some post mmu code, that lives at the top of the virt addr space
<dinkelhacker> and you can't call the post mmu code from the pre mmu code
<clever> if the mmu is on, and youve mapped both to the respective addresses, you can call back and forth
<dinkelhacker> yeah of course
<dinkelhacker> well that might be the way to go
<clever> but typically, the pre-mmu half is only mapped for a short time, until you jmp to the post-mmu code
masoudd has joined #osdev
<clever> then the pre-mmu half is discarded
<zid`> va -> what address I want to jump to to run this code
<zid`> if your mmu is off at the time, that locks you into "it has to be the same as the physical address it is loaded to", if it's on, it can be whatever you like
<zid`> You know what the case is
<dinkelhacker> Right. So the first thing the pre-mmu part would do is map the post-mmu part. Now nothing can go wrong at this point and you can jump wherever. After that I jump to post-mmu code and disable the pre-mmu mapping
<dinkelhacker> Is that more or less what you would do?
<clever> yep
<clever> yep
<zid`> <zid`> I have a 1M = 1M, and a 1.1M = 510TB for my acutal thing, the 1M=1M low code runs with paging disabled, I use it to set up the 510TB -> 1.1MB mapping, then jump to 510TB
<zid`> and we're full circle again :p
Vercas has joined #osdev
<dinkelhacker> Yeah.. I'm a bit slow today.. woke up at 5 bc our central heating died
Vercas has quit [Quit: Ping timeout (120 seconds)]
Vercas has joined #osdev
slidercrank has quit [Ping timeout: 265 seconds]
bauen1 has quit [Ping timeout: 260 seconds]
<mrvn> dinkelhacker: I found it becomes far easier to understand and implement if you separate the pre-mmu and post-mmu parts fully. Build your kernel to run in virtual address space and make a blob of that. Then make a tiny loader that has a bit of ASM code and the kernel blob and just activates the MMU, sets the page table and then calls into the actual kernel.
<mrvn> there shouldn't really be any shared code between the two.
<clever> that split design also makes it far simpler to have a pre-mmu printf, and a post-mmu printf
<clever> and you can just printf() from either, and it will call the right variant
<zid`> If you're really disgusting it can be the same printf twice
<zid`> using cool ifdefs to stop it including the wrong headers, yum yum
<clever> another option (little-kernel for example), is to hand write the pre-mmu part as PIC asm
<mrvn> not like the pre-mmu stuff should need a full printf. a puts() and put_hex() at most.
<clever> zid`: headers shouldnt matter, it can even be the same printf.o, its purely what the linker script does
<clever> mrvn: that as well
<clever> with LK, the pre-mmu part is as dumb as a brick, and i dont think it even has a stack
<clever> and because it's PIC, the load addr can be "wrong"
<clever> and it will just configure the mmu to fix that
<mrvn> that's the ideal case.
<zid`> clever: Depends where your printf is
<clever> yep
<zid`> mine's in basically "string/stdlib except malloc.o"
<zid`> so you'd need to massage the source a small amount with some light ifdefs to stop it trying to pull in the rest of my kernel
<dinkelhacker> mrvn: I was thinking about that but wouldn't I end up with 2 binaries ?
<clever> zid`: ive had trouble getting newlib to work on my latest project, so i just grabbed the old rpi-open-firmware printf
<bslsk05> ​github.com: gba-template/xprintf.c at master · cleverca22/gba-template · GitHub
<mrvn> dinkelhacker: sort of. You build kernel.elf -> kernel.blob and that you link into the loader.
<clever> zid`: this basically just turns into a xprintf.o with a .text, and in theory, the linker could then include that in both the pre-mmu and post-mmu binaries
<zid`> https://github.com/zid/bootstrap/blob/master/boot/print.c I just wrote one, ignore the ghetto as fuck ega text parts :p
<bslsk05> ​github.com: bootstrap/print.c at master · zid/bootstrap · GitHub
<zid`> but I started the kernel one by just copy pasting this file
<dinkelhacker> mrvn: okay so only one binary in the end?
<zid`> so really I could have just done stupid incestuous linking
<clever> dinkelhacker: yeah, the post-mmu binary gets baked into the second binary
<clever> either with cat, or .incbin
<zid`> lame
<mrvn> dinkelhacker: yes. In many cases you only have the option of a kernel and initrd. With multiboot you can do loader, kernel, initrd, other-blobs, ... but that is rare.
<bslsk05> ​github.com: lk-overlay/payload.S at master · librerpi/lk-overlay · GitHub
<clever> dinkelhacker: here is a .incbin example, where i'm taking the objcopy output of another build, and including it into the .rodata
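The same trick in miniature (a sketch; payload.bin stands in for the objcopy -O binary output of the post-mmu build):
```
.section .rodata
.align 12                     /* page-align so the mmu can map it directly */
.global payload_start
payload_start:
.incbin "payload.bin"
.global payload_end
payload_end:
```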
<mrvn> dinkelhacker: On some hardware you even have to attach the initrd to the loader/kernel for a single file altogether.
<clever> xen under grub, abuses the initrd api, to pass the true kernel to the xen "bootloader" kernel
<dinkelhacker> clever: so you just branch to bcm2835_payload_start and - abracadabra - you are in the other binary?
<clever> dinkelhacker: you would want to configure the mmu, so something like -2GiB maps to bcm2835_payload_start
<clever> and then turn on the mmu and jump to -2GiB
<clever> .align can be used, to ensure bcm2835_payload_start is page-aligned
<mrvn> dinkelhacker: in the simplest case the included binary just starts with the entry point and you just jump to it. But you can also have a blob that contains structured data telling you where the .text, .rodata, .data, .bss section of the kernel is. Where the entry point is. A whole lot of relocation data so you can do address space randomization. But just calling the payload_start is a good beginning.
<dinkelhacker> Okay... man I just wanted to run the thing in qemu, which made me realize all kinds of things have just worked by accident because of quirks of the pi and now I'm basically back to the start >D But that's good I feel like the picture gets much clearer.
<zid`> ye I rewrote my boot setup shit several times
<zid`> until I got something I was only vaguely unhappy with
<dinkelhacker> Haha yeah sometimes it's one step forward, 2 miles back
<clever> some things i need to look into in the future
<clever> 1: usb-device bootloader, for the device capable models
<clever> 2: usb-host bootloader, with msd/tftp support
<clever> 3: fixing u-boot
<clever> 4: implementing psci
stux has joined #osdev
Matt|home has joined #osdev
gildasio has quit [Ping timeout: 255 seconds]
gildasio has joined #osdev
Vercas has quit [Quit: Ping timeout (120 seconds)]
Vercas has joined #osdev
gildasio has quit [Ping timeout: 255 seconds]
joe9 has joined #osdev
gildasio has joined #osdev
Vercas has quit [Ping timeout: 255 seconds]
<dinkelhacker> a lot to do
pretty_dumm_guy has joined #osdev
Vercas has joined #osdev
terminalpusher has joined #osdev
bauen1 has joined #osdev
<bslsk05> ​www.phoronix.com: Trying Out The BSDs On The Intel Core i9 13900K "Raptor Lake" - Phoronix Forums
<kaichiuchi> you forgot to highlight me as well
<kaichiuchi> :(
<kaichiuchi> since i am a bsd fan
masoudd_ has joined #osdev
masoudd has quit [Read error: Connection reset by peer]
<mjg> well in short it is already resolved, just not present in the release they tested
<kaichiuchi> thanks
<mjg> and it was not even a freebsd bug per se
small has quit [Ping timeout: 252 seconds]
<mjg> even so, makes you wonder how come openbsd did not have the problem
craigo has joined #osdev
craigo has quit [Read error: Connection reset by peer]
<zid`> That's exactly as working as I expected freebsd to be
gog has quit [Quit: Konversation terminated!]
<mjg> :)
craigo has joined #osdev
truy has left #osdev [#osdev]
craigo has quit [Ping timeout: 252 seconds]
masoudd_ has quit [Quit: Leaving]
craigo has joined #osdev
linearcannon has joined #osdev
terminalpusher has quit [Remote host closed the connection]
terminalpusher has joined #osdev
masoudd has joined #osdev
small has joined #osdev
xenos1984 has quit [Ping timeout: 246 seconds]
xenos1984 has joined #osdev
craigo has quit [Ping timeout: 246 seconds]
Vercas has quit [Quit: Ping timeout (120 seconds)]
Vercas has joined #osdev
joe9 has quit [Quit: leaving]
craigo has joined #osdev
childlikempress is now known as moon-child
Vercas has quit [Ping timeout: 255 seconds]
Vercas has joined #osdev
small has quit [Ping timeout: 260 seconds]
fedorafan has quit [Read error: Connection reset by peer]
fedorafan has joined #osdev
gog has joined #osdev
joe9 has joined #osdev
Vercas has quit [Quit: Ping timeout (120 seconds)]
Vercas has joined #osdev
xenos1984 has quit [Ping timeout: 246 seconds]
<ddevault> back to EFI grief
<ddevault> yeah this ain't it
<ddevault> $ git add boot
<ddevault> $ git commit -m "some garbage that doesn't work"
<ddevault> $ git checkout master
<gog> taht's programming
xenos1984 has joined #osdev
<ddevault> would be nice if someone wrote a good linker
<ddevault> a halfway decent linker that can build hare programs and/or helios is probably only a few weeks of work
<ddevault> hmm...
genpaku has quit [Read error: Connection reset by peer]
vexmane has joined #osdev
fedorafansuper has joined #osdev
genpaku has joined #osdev
fedorafan has quit [Ping timeout: 252 seconds]
<zid`> ddevault: Yea I've considered a quick and dirty linker as a fun project
micttyl has quit [Quit: leaving]
Vercas has quit [Quit: Ping timeout (120 seconds)]
Vercas has joined #osdev
<sham1> Replace the GNU ecosystem from your OS build process one by one
<sham1> Where GNU of course is Giant, Nasty and Unavoidable
<mjg> and BSD is Bad, Stale and Dead
<sham1> Right. That's why we should all just use TempleOS
Vercas has quit [Quit: Ping timeout (120 seconds)]
Vercas has joined #osdev
fedorafansuper has quit [Read error: Connection reset by peer]
fedorafansuper has joined #osdev
<heat> sortie, linker when????
<mjg> sortild
<mjg> not a good name
<sham1> sortie-link
<sham1> Could also say that it's an exit of some kind
<mjg> i would totally use Elon Musk linker
<mjg> would probably be named linkex
<heat> sortie-link is very microsoft
<heat> ...perfect for MAXSISTRING
<heat> CONST STATIC MAXSI_STRING gOutputName
<heat> mjg: mjg's object link editor
<heat> mold for short
<mjg> you are just jelly onyx does not run on a toaster
<heat> NetOnyx when
srjek has quit [Ping timeout: 260 seconds]
<mjg> here is a historical lolfact concerning netbsd
<mjg> when they decided to larp as a smp-capable os a bunch of code showed up which required the CAS instruction
<mjg> around 2009 or so
<mjg> apparently, however the instruction is implemented on *VAX*, it sucks terribly over there
<mjg> and some dude started protesting the smp effort because of it
<heat> they said it runs everywhere
<heat> they did not say it runs everywhere, *well*
<mjg> "of course it ruins netbsd"
<mjg> the official slogan misses a letter by accident
<heat> lol
<heat> still can't believe none of you idiots have /bin/python3
<zid`> I have a /usr/bin/python3
<zid`> does that help
<heat> no
<heat> the bsd idiots don't
fedorafa_ has joined #osdev
fedorafansuper has quit [Ping timeout: 252 seconds]
<zid`> if you want it in /bin you need to root me first
* zid` passwd -L heat
<heat> i don't run bsd
<heat> what the fuck do you think I am
<zid`> heat did you ever figure out how to use mkisofs
<heat> no
<heat> i can't connect my xbox one controller to linux thru bluetooth
<heat> thank you desktop linux
<heat> look at this shit
<mrvn> heat: bin is a link to usr/bin in most modern linuxes so a lot of people have it
<heat> the best part about using linux is that everything is fucking broken
<sham1> It's not broken, if you define it as not broken
bgs has quit [Remote host closed the connection]
<heat> ok so apparently I need to boot to windows to fix this shit
<heat> poggers
<heat> kill me now
heat has quit [Remote host closed the connection]
<sortie> what u do to our heat
<sham1> Made 'em launch Windows
<zid`> You can't leave a conga line, only form a rival conga line that is in competition with the original
<gog> i run bsd
<gog> just kidding i don't hate myself
<zid`> too busy leading a rival conga line to run bsd
elastic_dog has quit [Ping timeout: 252 seconds]
elastic_dog has joined #osdev
puck__ is now known as puck
<kaichiuchi> sometimes being a programmer is annoying
<kaichiuchi> definitely feels like you can’t write a hello world without 500,000 lunatics criticizing it
xenos1984 has quit [Read error: Connection reset by peer]
<mrvn> kaichiuchi: you are missing punctuation. :)
<kaichiuchi> :)
<jimbzy> Constructive criticism doesn't bother me.
<kaichiuchi> that’s fine
<kaichiuchi> there’s nothing wrong with that
<kaichiuchi> it’s when you get completely shit on no matter what you do
<kaichiuchi> not that i’m a victim of that
<jimbzy> Yeah, I give those people a standard, ":D" response and go about my business.
<kaichiuchi> but I saw something at work that I did not want to see
Vercas has quit [Quit: Ping timeout (120 seconds)]
Vercas has joined #osdev
danilogondolfo has quit [Remote host closed the connection]
<geist> yah, also jokingly shitty comments bug me sometimes too
<jimbzy> ?
Vercas has quit [Client Quit]
Vercas has joined #osdev
<kaichiuchi> essentially, there is an intern who is legitimately trying to learn and get better
<kaichiuchi> but his boss is completely shitting all over him
<kaichiuchi> not a good look
<gog> definitely not
<gog> the point of an internship is to learn, not to get beat up
<gog> and if the boss is just beating up somebody who has no power in the arrangement then the boss is a massive jerk
<gog> if the internship is unpaid double my condemnation
<jimbzy> Unpaid internships should be illegal.
xenos1984 has joined #osdev
<gog> agreed
<gog> they are in many places anyway
GeDaMo has quit [Quit: That's it, you people have stood in my way long enough! I'm going to clown college!]
Vercas has quit [Ping timeout: 255 seconds]
pretty_dumm_guy has quit [Quit: WeeChat 3.5]
vexmane has quit [Quit: Leaving]
Vercas has joined #osdev
<immibis_> while watching emerge update my system I wonder why some kind of throughput scheduler isn't more common. Instead of running `make -j5` the system should have a queue of all remaining work, and it should pick the next item from the list whenever the CPU is idle.
<immibis_> it shouldn't be make's job to guess how many concurrent processes to run. It should queue them all as soon as they are ready to run, and the system decides when to start them
<immibis_> this scales properly when make runs make (or emerge runs make) without the need for a "job server"
ptrc has quit [Remote host closed the connection]
<sham1> Wouldn't the kernel in that case count as a job server
<immibis_> recursive make normally uses a "job server" process which just hands out "concurrent process tokens" so that you get 5 concurrent processes instead of 25
ptrc has joined #osdev
AttitudeAdjuster is now known as MorallyFlexible
<immibis_> sham1: only if you consider it to already be a job server since it already schedules processes
<mrvn> "This is the time when you run."
<immibis_> the time when I run is when a velociraptor is chasing me.
<mrvn> immibis_: If you start a new build whenever the cpu is idle then every time the compiler waits for a file to load from disk a new compiler spawns. You end up with all files being built in parallel.
MorallyFlexible is now known as EthicsGradient
<mrvn> Better would be to put all jobs into a group and always run the lowest PID in running state when a cpu is idle.
<mrvn> Picking the job that will run longest would be even better. Otherwise you end up with all jobs finished except one that takes forever.
EthicsGradient is now known as AttitudeAdjuster
<mrvn> and jobs that block many other jobs.
Vercas has quit [Ping timeout: 255 seconds]
<\Test_User> could you just start the next one ahead of time and wait() for the previous?
<\Test_User> or would that totally break if one of 'em took an absurd amount of time
<moon-child> immibis_: see discussion of a few days ago. Kernel has limited knowledge of what userspace is actually doing
<\Test_User> or actually, have make itself be multithreaded wait()ing on stuff, then fork and start the next when the last ends
<\Test_User> waitpid*
<\Test_User> actually no, generic wait would do from a single thread bc it'd detect if any exit, so yeah
<mrvn> \Test_User: how many can you start before you run out of resources?
<\Test_User> ...but it should already be doing that, so where's the extra delay..
<mrvn> "start the next one ahead of time and wait()" is kind of what "make -j5" does. Every fork does a read() on the jobserver pipe instead of your wait but that's basically the same.
Vercas has joined #osdev
<mrvn> just fewer resources invested before the read()
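The token protocol mrvn is describing, reduced to a shell sketch (real GNU make inherits the token pipe via MAKEFLAGS --jobserver-auth rather than a fifo):
```
mkfifo /tmp/jobserver
exec 3<>/tmp/jobserver
printf 'xxxx' >&3             # 4 tokens + the implicit one = -j5

run_job() {
    IFS= read -r -n1 tok <&3  # block until a token is free
    ( "$@"; printf x >&3 ) &  # run the job, return the token on exit
}
```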
<\Test_User> make -j5 runs 5 actively at once though, so more ram eaten
<mrvn> as it should. But all the extra ones wait on a read()
<\Test_User> though yeah... why would read be delaying long enough for an extra thread to make the difference is the question
<mrvn> The read blocks till one of the running 5 writes back a token.
<mrvn> Only then the new process starts up and allocates resources.
<\Test_User> and it's not writing as soon as it's done? or...
<clever> mrvn: that jobserver stuff might explain that weird bug ive noticed, where make sometimes hangs
<clever> but its been years since i saw it happen
<mrvn> it is. The only difference is that the resource allocation is after the write; instead, with your start+wait idea it would be before
<clever> if i just whack the process with a non-fatal signal, it unhangs
<mrvn> clever: you should never lose tokens so make should never hang.
<clever> hence it being a bug
<clever> i never got good details on it, because it was so rare
<mrvn> kernel bug then, the read()s should wake up with pending data.
<clever> and now that i mention it, i realize i havent seen the fault in years
<mrvn> clever: did it maybe happen when 2+ processes finished and then you only wake up one read() even though that only processes 1 byte?
<clever> dont remember
<clever> i just know that make had no children, and wasnt using any cpu
<clever> its safe to assume its been fixed by now
<immibis_> \Test_User: starting the next one ahead of time and then waiting, seems equivalent to just running a certain number in parallel, like make already does
<immibis_> yes, RAM usage is a problem
<immibis_> CPU and I/O throughput are in some sense queue-able resources; if they are not available now, you can delay the task and get it later. Memory does not work that way.
<immibis_> of course this is a well-known fact in scheduler design
<mrvn> except it kind of does. you can swap out processes and run fewer compilers in parallel when ram gets tight.
<immibis_> What if the compiler was segregated into input/process/output phases - you could start a new input or output phase whenever the disk drive wasn't busy, and a processing phase whenever the CPU wasn't busy. With limits on the number of pending tasks in each state.
<mrvn> immibis_: use c++. I/O is quite irrelevant then.
<immibis_> you can swap processes out, but it seems slower than not having started them to begin with
<immibis_> mrvn: segregating the I/O phase avoids the problem of starting a new processing phase whenever a processing phase does I/O
<\Test_User> immibis_: having more waiting rather than running means less process switching
<mrvn> immibis_: only when you have to swap. if you have enough ram then running one compiler per core is worth it.
<mrvn> swapping is just to recover when you guessed wrong
<\Test_User> also removes the chance of enough ending at the same time
<immibis_> running more than one compiler per core can be better if they are I/O bound
fedorafa_ has quit [Read error: Connection reset by peer]
<immibis_> or rather, partially I/O bound. If they are fully I/O bound you might want to run one per disk drive :)
fedorafan has joined #osdev
* immibis_ 's system currently has 7 disk drives attached
<clever> that reminds me, twice now (on both linux and macos), ive seen bugs where not calling fsync on a file, and then copying it with cp, pokes giant holes in the file
<clever> the linux case, was a zfs bug
<bslsk05> ​github.com: Fix lseek(SEEK_DATA/SEEK_HOLE) mmap consistency by behlendorf · Pull Request #12724 · openzfs/zfs · GitHub
<clever> i dont know how macos had nearly the identical bug
<mrvn> In most cases the whole thing is a non-issue anyway. Just run one compiler per core. They have enough ram and all file I/O will just use caches or close enough with ssd.
<immibis_> that's not a terrible heuristic. I tend to configure N+1 parallel processes.
<clever> but in both cases, the hole detection api lied, and then cp copied around the fake hole
<clever> resulting in giant nulls in a file
<immibis_> either way the kernel should still be responsible for the parallel processing limit
<immibis_> or at least for avoiding extra context switches of processes tagged for throughput
<mrvn> clever: asking the FS where holes are and then copying around them is race prone.
<immibis_> if I start 5 compilers on 4 cores, and they all want to use the CPU, suspend whichever one is last, until one of the earlier ones does I/O
<immibis_> copying a file that's currently being written to is race-prone
<mrvn> immibis_: yes, you should have a process group like that
<clever> mrvn: in the zfs case, the problem is that after you close() a file, while it only exists in the journal, the kernel reports holes where data actually exists
<mrvn> clever: if you don't fsync() then there is no sequence point. So I would say user error
<immibis_> close, or rather munmap, should probably clear up such inconsistencies
<immibis_> if it doesn't I'd say that's a bug
<clever> yeah, fsync or even plain /bin/sync was enough to mask the problem
<immibis_> if you are copying the file while still mapped, that's user error
<mrvn> "A successful close does not guarantee that the data has been success‐ fully saved to disk, as the kernel uses the buffer cache to defer writes.
<clever> immibis_: in both cases, it occured after the file was close()'d
<mrvn> " just close without sync is not enough
<immibis_> mrvn: but the kernel cache should be consistent
<clever> if you closed the file, then immediately copied with cp, it had chunks missing
<mrvn> immibis_: it should.
<clever> but if you closed the file, `sleep 120`, then cp, it didnt have chunks missing
<immibis_> apparently the ZFS bug is that ZFS did not update holes immediately on close/munmap
<clever> yeah
<mrvn> which is totally fine if the cp is not in the same process
<immibis_> no, it's not fine, because all processes use the same kernel cache
<mrvn> fine as in by specs
gildasio has quit [Remote host closed the connection]
<immibis_> close/munmap (whichever one it was) should behave as a sequence point. anything else is crazy
<clever> macos is more of a black box, and bisection pointed to a commit where coreutils had sparse support re-added
<mrvn> immibis_: yeah. but the specs explicitly say it's not
<clever> which implied macos was always broken, and just removing sparse support from cp fixed it
gildasio has joined #osdev
<immibis_> mrvn: the specs are stupid then. It's excusable for cache to not be written back on close, but it's not excusable for the cache itself to be inconsistent
<clever> yeah, i agree with that
<clever> if read() says there is data at a given offset
<mrvn> immibis_: might not be kernel cache but per process IO buffers
<clever> then lseek should not claim there is a hole at that offset
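The loop that trips over this, roughly as sparse-aware cp does it (a C sketch; copy_range() is a hypothetical helper):
```c
#define _GNU_SOURCE
#include <unistd.h>

void copy_range(int in, int out, off_t off, off_t len);  /* hypothetical helper */

/* walk the data extents; the bug meant SEEK_DATA/SEEK_HOLE reported a hole
   over not-yet-synced data, so the copy skipped real bytes and left nulls */
static void copy_sparse(int in, int out)
{
    off_t pos = 0, end = lseek(in, 0, SEEK_END);
    while (pos < end) {
        off_t data = lseek(in, pos, SEEK_DATA);  /* next extent with data */
        if (data < 0)
            break;                               /* only holes remain */
        off_t hole = lseek(in, data, SEEK_HOLE); /* where that extent ends */
        copy_range(in, out, data, hole - data);
        pos = hole;
    }
}
```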
<immibis_> mrvn: per-process I/O buffers after closing and unmapping?
<mrvn> immibis_: sure. they take time to flush
<clever> mrvn: userland buffers were not the issue, it was basically a bash script that did: ghc foo.hs -o foo ; cp foo $out/bin/foo
<clever> and random holes appeared in the file
<immibis_> mrvn: explain where these per-process I/O buffers are implemented?
<mrvn> immibis_: anywhere between your code and the disk.
<mrvn> clever: in that case the process ending is a sequence point
<immibis_> mrvn: and where is that?
<mrvn> immibis_: in hypothetical land
fedorafan has quit [Read error: Connection reset by peer]
fedorafansuper has joined #osdev
<mrvn> immibis_: close can also fail before data is flushed to the FS.
<mrvn> (but has already closed the FD, so don't close it again)
<mrvn> Fun fact: If close() is interrupted by a signal that is to be caught, it shall return -1 with errno set to EINTR and the state of fildes is unspecified.
<clever> but in this case, it hasnt failed, because just running sync in a shell between ghc and cp fixes it
<mrvn> clever: obviously your case was a bug
<clever> yeah
<mrvn> clever: as said the process ending (the shell running waitpid) and starting the cp is a sequence/synchronization point.
<immibis_> there's another OS design problem here about flushing in general: how to square the desire that a process has really finished when it thinks it's finished, with the conflicting desire for efficiency when the file is temporary
<immibis_> when I run `cp -r ~/homework /mnt/usb/` I would like the command to finish when the copying has really finished
<immibis_> but when I run `cp foo.o build/foo.o` I would like the command to finish immediately so the command stream can run ahead. In fact I don't even care if the data is ever on the disk as I can remake it
<mrvn> immibis_: sync on close on removable drives?
<clever> immibis_: in the past, i wasnt aware of how much usb will buffer the crap out of things, and often thought "oh it crashed again" and forcibly removed the usb
<clever> i still dont know why usb lets the dirty memory hit 500mb+, while a hdd doesnt
<immibis_> probably because your hdd is faster to write back
<immibis_> because it's just faster
<mrvn> The "eject USB device" should show a popup with progress bar showing the amount of buffers to be written.,
<clever> immibis_: na, ive seen cp take 10 minutes to run before
<clever> its definitely blocking on the writes, and refusing to get dirty, heh
<mrvn> clever: dirty memory is kind of broken in linux. You get some 30% and then the data is flushed. While that happens you rack up gigabytes of more dirty data for the USB stick without it getting blocked.
<mrvn> clever: but not at first. Takes a few dirty/flush cycles before that happens. At first it blocks future writes correctly.
<clever> ah
<mrvn> Happens with USB sticks or NFS.
<mrvn> Somehow I don't see it with local disks, they might just be fast enough.
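For reference, the knobs that govern this behaviour on linux (a sketch, not a fix; 8:16 is a hypothetical device's bdi):
```
sysctl vm.dirty_ratio vm.dirty_background_ratio
# per-device writeback limits exist too, capping how much one slow stick may dirty
echo 1 > /sys/class/bdi/8:16/max_ratio
```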
<clever> ive not seen it happen on nfs
<mrvn> write a few TB to NFS.
<clever> the cp on the nfs client always blocks for me
<clever> but ive not tried copying TB
<mrvn> Always worked for me for some 100GB and then suddenly it flips and has no limit.
<clever> the nfs server is also configured as async, for that client
<clever> so it should just lie and take everything
<clever> but i have noticed the write speed varies based on free space
<mrvn> that's likely the FS at fault if you are talking >90% full
<clever> 6gig free, out of ~8tb
<mrvn> any reserved free space?
<clever> just the usual zfs slop space
<clever> *looks*
<mrvn> zfs definetly has that slowdown when it gets full
<clever> [root@nas:~]# cat /sys/module/zfs/parameters/spa_slop_shift
<clever> 5
<mrvn> ext3/4 reserves 5% by default that only root can use and that isn't included in the free stats.
<clever> i forget the math, but this tunes how much zfs reserves, so the CoW doesnt hard jam from a full disk
<clever> if i echo an 8 into there, i suddenly have 105gig free
<clever> because i told it to reserve less
<mrvn> both ext and zfs slow down towards the end. zfs gets really slow.
<clever> so df may claim i have 6gig free, but its actually over 100gig
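Rough arithmetic for the knob being tuned here (hedged: recent OpenZFS also caps the reservation, which is why the larger term saturates):
```
# slop ~= pool_size >> spa_slop_shift
#   8 TiB >> 5 = 256 GiB (clamped down by the cap)
#   8 TiB >> 8 =  32 GiB
# hence ~100 GiB of "free" space appearing when the shift goes 5 -> 8
echo 8 > /sys/module/zfs/parameters/spa_slop_shift
```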
<mrvn> .oO(Gives you time to buy more disks before it fails :)
<clever> in the zfs case, the major slowdown is the spacemap histograms
<clever> and zfs_metaslab_try_hard_before_gang being turned on
<clever> each metaslab (like an ext block group) has its own free space list, and they are rather memory costly
<clever> so zfs only has a few loaded at once
Vercas has quit [Ping timeout: 255 seconds]
<clever> if zfs_metaslab_try_hard_before_gang is enabled, and zfs cant find a big enough hole, it will "try hard" (load more metaslab spacemaps) to find a properly sized hole
<mrvn> buy more disks and make some big holes.
<clever> without that, it can give up early (faster) and create a fragmented record, which harms performance more down the road
<clever> i also wrote a patch to zfs, that lets me generate these graphs cheaply
<clever> if that orange line hits zero, then even with zfs_metaslab_try_hard_before_gang, it will fragment most writes
bxh7 has joined #osdev
terminalpusher has quit [Remote host closed the connection]
<immibis_> cp over NFS reminds me of yet another OS design problem: how to efficiently hand work off to external hardware or other computers that can do it more efficiently
xenos1984 has quit [Read error: Connection reset by peer]
srjek has joined #osdev
<immibis_> you could expect to tell an NFS server "copy this byte range to that byte range" without downloading the entire byte range and uploading it again
<immibis_> and maybe NFS has that ability, and maybe it's even supported in cp, but it's all special-cased
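NFS did in fact grow that ability: NFSv4.2 added server-side copy, reachable on Linux through copy_file_range(2), which, given kernel and server support, moves the bytes without round-tripping them through the client. A minimal sketch; the helper name and loop are illustrative:
```c
/* Sketch: ask the kernel to copy len bytes from in_fd to out_fd.
 * On NFSv4.2 (or filesystems with reflink/clone support) this can
 * avoid downloading and re-uploading the data. */
#define _GNU_SOURCE
#include <unistd.h>

ssize_t copy_range(int in_fd, int out_fd, size_t len)
{
    ssize_t copied, total = 0;
    while ((size_t)total < len) {
        copied = copy_file_range(in_fd, NULL, out_fd, NULL,
                                 len - total, 0);
        if (copied <= 0) return copied;  /* 0 = EOF, -1 = error */
        total += copied;
    }
    return total;
}
```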
<clever> i also have had other fun bombs go off with nfs
<immibis_> there's absolutely no ability for e.g. gcc -E to rewrite the unchanged segments of include files through that special case
<mrvn> immibis_: NFS doesn't but some filesystems have smart links for that
<clever> my server was graphing free disk space, and that involved running df in cron
<immibis_> and it would be completely absurd to expect gcc to write special-case code for it
<clever> the "server" had the laptop mounted over nfs (as an nfs client)
<clever> when i left for a trip, the laptop went with me
<clever> df then hung, because the nfs server was missing
<clever> cron kept forking out new df's, and swap just ate them all harmlessly
<clever> then the laptop returned....
<immibis_> on Windows, that would get you an ERROR_NETNAME_DELETED I think
<clever> every single df woke up at once, and they all demanded a share of the cpu and ram
<immibis_> the decision of which errors to return to clients and which to attempt to paper over has no universal right answers
<clever> immibis_: that's what the soft vs hard mount flag controls in nfs
<immibis_> I believe in MS-DOS, you could simulate a dual-drive system with a single drive. When accessing B: after accessing A:, the system would pause the running "process" and ask you to swap disks.
<clever> hard means retry forever
<mrvn> clever: I know that behavior. Takes a while but everything eventually recovers just fine
<clever> soft means give an io error if there is network problems
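For concreteness, those are literally mount options. A hedged sketch of a soft mount from C; in practice mount.nfs(8) does extra negotiation before the raw syscall, and the export path, address, and timeouts here are made up:
```c
/* Sketch: soft-mount an NFS export so I/O fails with an error
 * instead of hanging forever when the server goes away.
 * The option string is illustrative. */
#include <sys/mount.h>

int mount_nfs_soft(void)
{
    return mount("server:/export", "/mnt/nfs", "nfs", 0,
                 "soft,timeo=100,retrans=3,addr=192.168.1.10");
}
```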
<immibis_> such emulation seems rather useful in odd cases and anti-useful in others
<clever> immibis_: ah, i had seen that on YT recently, to get 3 floppy drives working on 1 machine
<mrvn> immibis_: AmigaOS has disk names, so you can open "fonts:bla.ttf" and it will access whatever drive that has the fonts floppy in it, or ask you to insert it.
<clever> he had physical switches to re-route the drive select lines
<clever> and enabled that DOS feature, and then re-routed things manually
<mrvn> immibis_: you can even remove a floppy during write operations and reinsert it in another drive and it will just keep going.
<clever> neat!
<immibis_> every problem can be solved by adding more abstraction except the problem of too much abstraction
<immibis_> Linux also has this ability, if you were to set up something to automount floppies, but mount them at consistent paths - but it wouldn't block on access, you'd probably need something like FUSE for that
<clever> zfs can recover from a block device going missing mid-write, but only if it comes back at the same /dev/ path
<immibis_> the Linux behaviour of "an unmounted drive is an empty folder" is not a particularly sensible default
<clever> renaming or symlinks can fool it enough to work; it's a limitation of the userland tooling
<mrvn> immibis_: no it doesn't. You can't umount and remount a device and have open files continue to work
<immibis_> it just falls out of the design of how Linux mounts go over the top of existing folders
<immibis_> mrvn: as we see with clever's df thing, hanging the process until the drive comes back isn't always a good idea either
<mrvn> immibis_: starting a cron job again while the previous is still running is just plain broken.
<mrvn> cron should never do that as default.
<clever> systemd timers don't do that!
Vercas has joined #osdev
<immibis_> also not universally true
<clever> because it's less of a cron job and more of a service that starts (if not already running) on a schedule
<immibis_> and I bet if it was a flag, clever would've had a 50% chance of setting it to the wrong value because why would you even think to consider that?
<mrvn> immibis_: hence the "as default"
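A conventional userland guard for mrvn's point, whatever cron's default happens to be: take a non-blocking lock at startup and bail out if the previous run still holds it. A sketch; the lock path is arbitrary:
```c
/* Sketch: skip this run if the previous invocation is still alive.
 * flock() is released automatically when the process dies, even if
 * it hung on a dead NFS mount and got killed much later. */
#include <fcntl.h>
#include <sys/file.h>

int main(void)
{
    int fd = open("/run/lock/df-poll.lock", O_CREAT | O_RDWR, 0644);
    if (fd < 0) return 1;
    if (flock(fd, LOCK_EX | LOCK_NB) < 0)
        return 0;            /* previous run still going: do nothing */
    /* ... run the actual df/polling work here ... */
    return 0;
}
```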
<clever> the df cronjob, was part of the cacti polling setup
craigo has quit [Ping timeout: 252 seconds]
<clever> but i've since moved to prometheus-based graphing, which doesn't spawn a new process on every poll
<clever> so it would never fork-bomb the same way
<clever> more likely to just hang the entire exporter
<immibis_> instead it would just freeze the entire graphing system until the drive came back?
<mrvn> also bad
<clever> yeah, for that one machine
<clever> but not DoS level bad
<mrvn> you want to create a thread per resource so all other graphs still process
<immibis_> what we all want is a highly abstracted system, so everything is very flexible, with no abstractions so everything is very efficient
<clever> and thats why i just soft-mount everything now
<immibis_> the Cheetah/XOK webserver stores your static HTML files as pre-formatted TCP packets on disk
<mrvn> immibis_: sendfile to the rescue
<immibis_> sendfile is not the same level of abstractionlessness
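For contrast, the sendfile() path mrvn means looks roughly like this: the kernel streams the file pages into the socket without copying through userspace, but TCP segmentation and checksums still happen in the stack. A minimal sketch assuming an already-accepted socket:
```c
/* Sketch: serve a static file over a connected TCP socket.
 * The kernel moves the pages; userland never touches the bytes. */
#include <sys/sendfile.h>
#include <sys/stat.h>

int serve_file(int sock, int file_fd)
{
    struct stat st;
    off_t off = 0;
    if (fstat(file_fd, &st) < 0) return -1;
    while (off < st.st_size)
        if (sendfile(sock, file_fd, &off, st.st_size - off) < 0)
            return -1;
    return 0;
}
```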
<clever> mrvn: the exporter is basically just an http endpoint that returns all of the metrics
<clever> the central graphing server, has http timeouts, so it wont 100% die
mlombard has quit [Read error: Connection reset by peer]
<clever> it will just consider that 1 host as down
<immibis_> even being able to "store your HTML as TCP packets" requires cutting through a lot of abstractions and writing code that only works for the specific case of serving static files over TCP
<immibis_> and then you have to rebuild your static pages folder if the link MTU changes
<mrvn> how does that even work? You need the right sequence number
<clever> immibis_: oh, that reminds me, you can configure some http servers to send a foo.html.gz file, but slap a content-encoding header on it
<immibis_> I assume it filled in the dynamic fields at runtime
<clever> so the client will decompress it on the fly
<clever> that then saves you cpu cycles on the server, having to re-compress the file for every request
<immibis_> some network cards might support TCP Segmentation Offload, and then you have code that not only works only for TCP, but only for your specific card and DMA controller; on the other hand it's very fast, because it DMAs directly from the disk to the network card
<mrvn> immibis_: using writev() seems like a better way. Splice the ip packets together from the header and chunks of the file.
<clever> i've looked at the genet (bcm2711 ethernet) driver before; the tx ring is a big array of addr+size+flag sets, and the flags include "start of packet" and "end of packet"
<immibis_> depending on the segment size the kernel might do more work to process a writev than it would cost to just copy the bytes
<immibis_> now, if you could store a DMA descriptor chain on disk...
<clever> so scatter-gather dma is just giving it multiple addr+length pairs
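mrvn's writev() splicing as a userspace sketch: one iovec for a protocol header, one for a chunk of file data, gathered by a single call (and, on suitable hardware, by a scatter-gather ring like the one clever describes). The `msg_hdr` struct is hypothetical:
```c
/* Sketch: send header + payload with one gather write. */
#include <stdint.h>
#include <sys/uio.h>

struct msg_hdr { uint32_t len; uint32_t seq; };  /* hypothetical header */

ssize_t send_chunk(int fd, uint32_t seq, const void *data, uint32_t len)
{
    struct msg_hdr hdr = { .len = len, .seq = seq };
    struct iovec iov[2] = {
        { .iov_base = &hdr,         .iov_len = sizeof hdr },
        { .iov_base = (void *)data, .iov_len = len },
    };
    return writev(fd, iov, 2);  /* kernel gathers both pieces */
}
```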
<mrvn> immibis_: splice should be able to directly DMA stuff
<immibis_> I don't know consumer ethernet drivers but I worked on some kind of industrial router and this sometimes involved peeking under the hood. The engine takes a linked list of DMA descriptors.
Vercas has quit [Quit: Ping timeout (120 seconds)]
<clever> so your writev() call could prepend a few buffers for packet data, and gather-dma pieces out of both kernel and userland ram
<clever> but, if userland modifies the buffer mid-write, your checksums won't be right
<immibis_> you would, of course, want it to gather the same packet header over and over, while gathering different data pieces
Vercas has joined #osdev
<immibis_> maybe you add some kind of modification thing to the DMA chain, telling it how to increment sequence numbers and decrement checksums
<mrvn> clever: that's why my OS doesn't even allow that at all. If you write a buffer you give up rights to the buffer. You can't modify it. Having to COW every buffer on write is too much work.
<immibis_> or you hardcode logic in the network card telling it how to generate TCP packets from a plain old data stream
gabi-250 has quit [Ping timeout: 255 seconds]
<immibis_> mrvn: how's the overhead of handling many smallish buffers?
<clever> mrvn: yep, that sounds like a valid solution
<mrvn> immibis_: On modern cards you write 64k frames to the NIC and it internally splits it into MTU chunks and generates all the right headers for it.
<immibis_> that would be called TCP Segmentation Offload
<mrvn> immibis_: horrible. 1 page for the message, 1 page for the buffer. Two INVLPGs. I'm not writing my OS to be fast, just simple.
gabi-250 has joined #osdev
xenos1984 has joined #osdev
<immibis_> that seems to be a common problem in message passing systems (and what is the difference between a message and a buffer?)
<mrvn> immibis_: and everything is a process. So every subsystem the message passes through is another INVLPG fest. You should not use an external buffer for short stuff. Better to include it in the message itself.
<clever> mrvn: i would say, if the buffer is under some size threshold (maybe under 1 page), just copy it
<immibis_> a silly thought: maybe messages should be passed in XMM/YMM registers
<mrvn> clever: that would require allocating pages inside interrupts. not a possibility.
<clever> or that, include the data in the same page as the message
<mrvn> clever: that's what I do.
<clever> the zfs journal does similar
<clever> for small writes, the data is in the journal itself
<clever> but for large writes, the data goes to its usual final destination, and the journal just holds the pointer
<immibis_> in XMM registers you have 256 bytes, with direct access from CPU instructions, that won't be stomped on by context switching code and in fact not stomping on them makes the context switch *faster*
<mrvn> If you write under 4000 bytes then just include it in the message itself.
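A sketch of that inline-vs-out-of-line split, the same trade-off the ZFS journal makes below; every name here is hypothetical:
```c
/* Sketch: small payloads ride inside the message page itself;
 * large ones are referenced by physical address + size and mapped
 * by the receiver only if it actually needs the bytes. */
#include <stddef.h>
#include <stdint.h>

#define MSG_INLINE_MAX 4000  /* fits in one 4 KiB message page with the header */

struct message {
    uint32_t type;
    uint32_t payload_len;
    union {
        uint8_t inline_data[MSG_INLINE_MAX];          /* small: data in place */
        struct { uintptr_t phys; size_t size; } buf;  /* large: handle only  */
    };
};
```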
<clever> so it can put off the more costly updates, until later
Turn_Left has joined #osdev
<clever> and the journal is enough of a promise to userland that the data is safe
<immibis_> mrvn: software interrupts or real interrupts?
<clever> immibis_: that sounds like lazy fpu context switching
<clever> defer the fpu context switch, and set access control registers so it faults upon any access
<mrvn> Another thing I plan to do is to have buffers attached to a message but not map them. You just get a handle for the buffer and can pass that around and if you actually want to access the buffer you have to map it.
<mrvn> immibis_: yes
<immibis_> even lazier. deliberately leaking FPU context between processes as message passing
<clever> mrvn: that sounds like linux dma_buf api
<clever> but in linux, each buffer is a separate fd handle
<clever> so you need to be passing whole fd's around, potentially 1-3 per video frame
<mrvn> clever: for me it would be addr+size specifying a bunch of physical pages that then get turned into some VM handle.
<clever> immibis_: heh, that's one way, just read it, and context switch to the right destination
Left_Turn has quit [Ping timeout: 256 seconds]
<immibis_> can Memory Protection Keys be used to switch between 16 different processes without changing page tables?
<mrvn> immibis_: ARM has 256 ASIDs you can switch between
<mrvn> x86_64 has 4096, right?
<immibis_> the ideal message-passing context switch is like "ASID = new_proc->ASID; jmp new_proc->MessageReceiver;"
<clever> immibis_: now you've reminded me of the centurion cpu6: it has 16 banks of registers, each with its own mmu config, and can switch between them freely; an irq will also force a switch to a specific set (each irq is bound to a diff one)
<immibis_> mrvn: don't know
<mrvn> immibis_: you still load a new page table but you don't lose any TLB or cache content.
<immibis_> oh well that's good. Last time I knew about context switches, page table flushing was the main overhead.
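On ARMv8-A that cheap switch is concrete: TLB entries are tagged with the ASID carried in TTBR0_EL1, so changing address space is one register write and no flush. A hedged sketch assuming TCR_EL1.A1 = 0 and 16-bit ASIDs:
```c
/* Sketch: switch address spaces on ARMv8-A without a TLB flush.
 * TTBR0_EL1 layout: ASID in bits [63:48], table base address below. */
#include <stdint.h>

static inline void switch_aspace(uint64_t table_phys, uint64_t asid)
{
    uint64_t ttbr = (asid << 48) | (table_phys & 0x0000fffffffffffeULL);
    __asm__ volatile("msr ttbr0_el1, %0; isb" :: "r"(ttbr) : "memory");
}
```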
gildasio has quit [Remote host closed the connection]
<bslsk05> ​github.com: Instructions · Nakazoto/CenturionComputer Wiki · GitHub
<mrvn> immibis_: with a microkernel you should definitely look into ASIDs.
<clever> i should get off to bed, its getting late here
<mrvn> then you still have time. it's not early yet.
<clever> lol
gildasio has joined #osdev
<kaichiuchi> I cannot believe CMake doesn't have reasonable line break support