<bslsk05>
github.com: toaruos/start.S at master · klange/toaruos · GitHub
<heat>
mrvn, all code is usually PIC if you don't reference global symbols
<geist>
usually and always are not the same thing
<geist>
depends on how much risk you want to take
<heat>
of course, you need to be careful
<geist>
i have also learned this the hard way
<mrvn>
heat: hard to set up the page tables without accessing global symbols.
<geist>
thus i dont really want to risk it on LK because i cant control what compilers people use
<heat>
but for bootstrapping code I've found that I don't need much other than a map_pages function
<clever>
geist: thats where i can see some value in having java like tagging on functions, similar to how exceptions propagate up the call stack
<clever>
if you can tag a method as being true PIC, or needing relocations
<heat>
mrvn: pass the page tables through an argument, get the address in assembly
<clever>
and the linker will propagate that up as you reference functions
<geist>
though funny this came up, i noticed the trusty folks literally did what heat is talking about recently
<geist>
they moved the early mmu setup code into C
<klange>
And then I jump to C code that sets up framebuffer, sets the CPU clock, exits EL2 because my kernel assumes it's running EL1 and yes I know that's going to bite me when I want to do hypervisor stuff in the far future, does an initial MMU setup, and loads the actual kernel: https://github.com/klange/toaruos/blob/master/kernel/arch/aarch64/rpi400/main.c#L386
<bslsk05>
github.com: toaruos/main.c at master · klange/toaruos · GitHub
<dh`>
there are machines where ordinary code has absolute jumps in it unless you take precautions
<geist>
but... they also control precisely what compilers they use and can add build time tests that it doesn't suddenly generate something bad
<clever>
and then you can make it a fatal error to call a non-pic symbol from an asm file that has declared that rule
<heat>
geist: because of KASLR I bet?
<mrvn>
heat: that's a function call into a different translation unit ==> global symbol, non PIC jump
<geist>
because of KASLR yes. they just added support for it in trusty
<mrvn>
heat: or did you mean inline asm?
<heat>
yeah KASLR starts to make page table bringup non trivial
<mrvn>
heat: it's rather easy to get some code that needs relocations into your C compiler output by accident.
<geist>
we also did this for zircon (move the initial mmu code into C) but it has caused some problems
<geist>
because of the 'must enforce it is actually PIC even if the compiler some day decides to screw it up'
<geist>
so i've been burned by precisely that
<heat>
mrvn: call instructions are usually PIC
<heat>
dunno about arm64
<mrvn>
didn't zircon have a small asm stub that does relocations?
<heat>
but you could probably do a hack if it's not
<dh`>
there are a lot of platforms where call instructions are not PIC by default
<mrvn>
heat: PIC or GOT. and you can't know which.
<geist>
it's moving towards that with a new thing called physboot
<geist>
which is turning into a pre-kernel loader that actually loads the zircon kernel as ELF
<geist>
but up until now, it's start.S (from LK basically) where the MMU setup code is in C
<geist>
basically as heat is suggesting
<geist>
and it has worked, but.... been a bit fragile
<clever>
that reminds me, arm32 linux has ~3 stages
<clever>
1: the initial asm stub, to apply relocations to the decompressor
<heat>
fwiw linux also seems to do as I say in most main archs
<clever>
2: a small chunk of c code to decompress the real kernel
<clever>
3: some more asm, to turn the mmu on, and run the real kernel
<clever>
4: the non-relocated kernel, that can now run because the MMU placed it at the right addr
<geist>
if i hadn't been burned by it i'd actually do it again on LK on some of the arches
<clever>
1/3 are all hand-written and fully pic, 2 is compiled and has to be patched
<geist>
OTOH, you can just write the mmu setup code in assembly. it's just harder to do
<mrvn>
The advantage of a separate loader stage is that you can move the kernel around in memory and then map it with e.g. 2M pages to where it wants to be.
<geist>
but you can make sure it works forever
<clever>
LK on arm, is just 3/4
<dh`>
turning the mmu on from C code is scary
<geist>
clever: yeah right now we have physboot in zircon that does 1 and 2 as well
<dh`>
at least for x86, maybe arm is more friendly?
<geist>
but will also subsume 3
<clever>
dh`: ive done it, but only when i plan to run from identity mapping and never switch out
<geist>
dh`: i'd set up the page tables and then go back to asm to actually turn it on
<geist>
key is no one really wants to write page table setup code in asm.
<mrvn>
dh`: you make an ASM stub for that, which you call with the address to jump to
<geist>
but at the end of the day it's all the same thing, you just can do it in a nicer way in C but then you have to deal with the risk of the compiler generating non PIC
<geist>
so it's a compromise
<dh`>
really? page table setup code is not a big deal
<geist>
dh`: or you just do it in ASM. yes
<mrvn>
dh`: In the end it's nothing different than a task switch you use millions of times later on.
<geist>
but it's a bit more difficult if the tables are purely dynamic
<geist>
ie, KASLR, physical and virtual
<clever>
the rpi firmware has a dedicated register it uses as a base, to minimize how much relocation patching is done
<clever>
but there is still a decent amount of absolute addressing sprayed around the binary
<mrvn>
dh`: how about uncompressing an xz file for the real kernel image? Still want to do it in asm?
<clever>
mrvn: aarch64 just put its foot down, and declared that the bootloader must undo the xz before even starting linux
<dh`>
what's this "real kernel image" business and why are you committed to the silly linux way of doing things? :-)
<clever>
mrvn: so that mess doesnt exist anymore
<geist>
personally i dont think compression is a big deal, but i think once you have some sort of C based pre-loader it is basically free
<mrvn>
so sad :)
<geist>
you just toss in a decompressor
<dh`>
in a sane environment the n'th stage bootloader reads /kernel.xz or whatever and unpacks it as needed, and it's had enough setup to be able to run C code
<geist>
i dont think it's a goal unto itself, it's just a nice to have
<clever>
dh`: yep, thats how aarch64 linux works
<geist>
yah at some point we're talking about the same thing, it's just how it's packaged, where it comes from, etc
<heat>
i wasn't at ease writing riscv asm when starting out, and I was hitting issues, so I wrote the initial page table setup in C
<mrvn>
geist: For KASLR it's definitely the way to go I think. Keep the bootstrap code and kernel code completely separate.
<heat>
even if it's still trivial code
<geist>
if you prepend it to the kernel such that the last N stages are 'part of the kernel' then it looks like the kernel does it
<geist>
but really it's just the last stage. that's what we have with physboot on zircon nowadays
<dh`>
for riscv the initial pagetable setup is really easy...
sonny has quit [Ping timeout: 256 seconds]
<clever>
i think the only thing that gets "simpler" is efi, doesnt that launch your kernel with the mmu on?
<clever>
and you can define the mappings?
<klange>
undefined
<mrvn>
clever: only way to get 32bit mode.
<heat>
dh`, it's super easy unless you don't know how to write riscv assembly
<dh`>
true
<geist>
clever: depends on the arch, and anyway you have to redo the tables immediately
<heat>
unfortunately, I was starting out ;)
<klange>
efi says mmu is either off or is set up with a 1:1 mapping
<geist>
yah it just so happens with x86-64 you *have* to have it on or you can't be in 64bit mode, but that's an arch detail
<clever>
geist: yeah, but if you can map your kernel to a high addr like 0xffff.0000.0000.0000, you dont need to be PIC, so when you redo the tables, the code doesnt move
<geist>
i think for arm64 and riscv and probably x86-32 it doesn't have to be on
<mrvn>
geist: does physboot just setup dummy tables that the kernel has to replace right away?
<clever>
klange: ah dang, so identity mapping rules out placing the kernel at an arbitrary high addr
<dh`>
for i386 you have to be in protected mode to do anything useful, but you don't have to turn paging on
<klange>
I recall virtualbox was previously not turning it on on 32-bit, and then in an update their implementation had switched to enabling it with PAE
<geist>
mrvn: at the moment it doesn't do that *yet* but it will set up the tables the kernel uses to map itself
<heat>
does virtualbox not use ovmf?
<geist>
ie, it'll pick a spot, set up the kernel address space and then the kernel gets its entry point in C
<klange>
heat: it's based on ovmf, but has a lot of downstream modifications
<clever>
oh right, but wont pe32 also include relocation data, and the firmware can patch you up?
<geist>
that's the next phase of physboot: turn zircon kernel into a pure ELF binary that is loaded by physboot and run basically like a user app
Burgundy has quit [Ping timeout: 256 seconds]
<heat>
klange: I had the impression OVMF always turned on the MMU
<geist>
but since physboot and the kernel are compiled at the same time, and always paired, there's no risk of them getting out of sync
<mrvn>
geist: My ARMv6 stub sets up the page tables with .text, .rodata, .data, .bss in mind and the kernel can then extend that as needed. Nothing needs to be redone though.
<geist>
so we can pass data across that barrier however we see fit
<klange>
my aarch64 set up is that my actual kernel is a static, non-PIC, high-loaded ELF, and I do all the bullshit somewhere else before loading it
<geist>
right
<klange>
heat: turning it off may have been a modification they made and then reverted later
<geist>
yah we're slowly morphing it in that direction. since the dna of zircon is based on LK, it has generally up until now assumed nothing from the prior stage, and sets up its own world
<mrvn>
I even pass the VIRT_TO_PHYS variable as extra argument to kernel_main() so it can compute physical addresses.
<klange>
I should make my x86 setup work the same way.
<geist>
which i think works in an environment that LK thrives in, but less so for 'big' OSes where you can build your bootloader and control your universe a bit better
<mrvn>
Do you need more than the ELF header of the kernel.elf to setup the environment for the kernel?
<geist>
if you assume the entry point is in C then you need to do the usual set up the cpu for C things
<geist>
but otherwise no
<geist>
for ARM you might want to define how the cache is handed off
<mrvn>
You need to have a convention that the main core calls kernel_main() and other cores call kernel_secondary_main() or something. But otherwise kernel is just like an APP.
<geist>
a non issue for x86, but depends on what you define the state to be: mmu on? off? d cache on? off? completely cleaned and invalidated?
<geist>
the latter was actually a bug that LK had that we fixed recently
Oli has joined #osdev
<mrvn>
Sure. But your loader and kernel are from the same source. Easy to make them agree on that.
<geist>
sure. in that case yep
<geist>
i was just saying you have to define that
<mrvn>
Unlike linux that can get started in millions of ways.
<geist>
yes but linux has rules too. my mistake in LK was not to follow linux rules
<mrvn>
hehe
<geist>
because all the loaders in the world assume you're linux, and thus you have to act like linux
<geist>
the rules specifically are: d cache is disabled, kernel image has been cache cleaned
<geist>
which is subtle: means that the kernel can immediately turn on the icache (with dcache disabled) and still work
<geist>
but... the inverse of it is the problem. it means that data *outside* of the kernel image may be stale
<mrvn>
klange: In your aarch64 start.S what is line 9 doing?
<geist>
and after 10 years of LK's start.S we finally found a case where that was an issue
<clever>
how was it an issue?
<geist>
means the dcache (which is disabled) may have modified + unwritten data for any page that isn't in the kernel image
<clever>
ah, yeah
<geist>
which means as LK starts to set itself up, specifically the page tables its constructing, if they happen to coincide with some dcache data when you turn the dcache back on it suddenly overwrites the page tables
<mrvn>
So when you turn it on it starts writing out stuff?
<clever>
and you may get an unexpected cache hit, when you try to access outside of the image?
<clever>
like to ram thats becoming heap?
<geist>
right, or you may bypass the cache and then if you turn it on it overwrites things
<klange>
mrvn: you mean the Raspberry Pi start.S that I linked?
<clever>
yeah, messy!
<mrvn>
klange: yes.
<geist>
so then the answer is: okay, invalidate the dcache before you turn it on
<klange>
mrvn: moving x4 into x1
<mrvn>
klange: duh. Why?
<geist>
but... that *also* means you may actually toss out real data (ie, the initrd) that is also sitting in the dcache
<klange>
x4 is the base address from the rpi armstub
<clever>
geist: i think on arm, you can get a fault if you have a cache-hit, when the cache was off? and you must flush before you turn off?
<klange>
and I want it as the second argument going into the C code
<mrvn>
klange: base address of what?
<clever>
i would just flush the initrd to ram as well, if i was making such a bootloader
<geist>
so the solution in LK is to invalidate each page table page before they're used
<klange>
almost definitely pointless as it should also be the address of `start` which I already know
<geist>
then by the time the mmu and the dcache are back on, should be coherent
simpl_e has joined #osdev
<klange>
(I also don't even use it beyond printing it)
<geist>
looks like klange is just moving the arg at x4 into x1 (arg slot 2) before calling into C
<mrvn>
klange: oh, but I don't know the address of "start". I only know the link address of "start"
<geist>
klange: if you move label 3 down one line it should avoid doing a double test on every loop
<mrvn>
On ARM I do: sub r12, pc, #8; ldr r3, =_start; sub r12, r12, r3 to get the offset between where I am and where I want to be
<clever>
mrvn: i saw a near solution to that in the rpi-open-firmware start.S, there is an opcode to get the address of a nearby symbol, that is always pc-relative
<geist>
it's fairly easy in arm64: you can use an 'adr' instruction to compute the address of a nearby label
<clever>
mrvn: so if you load _start twice, using both absolute(linker) and pc-relative, and just look at the delta
<mrvn>
clever: that's what the sub + ldr do.
<clever>
mrvn: i think i saw something in the linux src, where it just does a function call at a known point, and uses the lr for it
<geist>
yah in the x86-32 days that was the *only* way to get the current EIP
<geist>
since there was no PC relative anything
<geist>
well, addressing modes, or ways to get to the IP
<mrvn>
yeah, x86 has no way to read out the PC.
<klange>
an important thing to keep in mind in most of my code is that I generally have no idea what I'm doing and I stumble into everything, so expect plenty of cargo culting
<geist>
a big major fix in x86-64
<geist>
klange: yah it looks pretty good otherwise. clean loop there
<geist>
note: the ldr = stuff is generally deprecated in ARM64
<mrvn>
Is "x4 == address of start" a general aarch64 thing or just RPi?
<klange>
just the rpi
<geist>
rpi, and i suspect it's an accident: my guess is the previous instruction did a `br x4` so it happens to be there
<klange>
yes, it's not in any spec they provide, just happens to be there
<bslsk05>
github.com: lk/start.S at master · littlekernel/lk · GitHub
<geist>
that's the sequence you want to compute a PC relative address anywhere in the binary
<geist>
the gist is ADR is a fairly close range PC relative thing, has a reach of a MB or so (20 bits i think)
<klange>
if you can permit me a brief rant, `adrp` makes objdump output utterly unreadable
<geist>
ADRP can reach anywhere within 4GB I believe, but returns a page aligned value
<geist>
and that's what the `:lo12:` prefix on the symbol means for the second add instruction
<mrvn>
20 + 12 == 32, makes sense.
<klange>
so much "load address of <this symbol that has nothing to do with what the code is actually looking at but happens to be at that page>"
<geist>
it asks it to provide the low 12 bits (ie, within the 4K page) ... yeah
<geist>
the compiler will use ADRP all over the place for PIC code, and it's quite efficient, *especially* since most ldr/str instructions also take a 12 bit offset
<geist>
so you can many times do a `adrp base_reg, symbol; ldr reg, [base_reg + :lo12:symbol]`
<mrvn>
klange: can't see how objdump could do any better. The opcode only has the upper bits.
<geist>
yah riscv has something extremely similar. works well, but it does make objdump hard to read
<geist>
since it can't figure out what symbol you're referring to without being smarter
<mrvn>
aren't your page tables 4k aligned?
<geist>
...
<klange>
mrvn: true, objdump output on arm is just unreadable mess regardless :D
<geist>
i can tell you something worse: llvm-objdump
<geist>
it doesn't even try to assist you. it just gives you raw numbers
<klange>
"good luck"
<geist>
also note that adrp/add two instruction sequence on modern ARM cores is usually collapsed into a single uop
<mrvn>
Does aarch64 have the feature like ARM where I can limit the size of the user/kernel page tables? Doesn't make as much sense with the 4 level setup. But does it?
<geist>
the optimization guides give you a bunch of instruction sequences like that that are collapsed. like compare and branch
Goodbye_Vincent has joined #osdev
<geist>
mrvn: limit the size in what way (the answer is probably yes, but i want to know what you're asking precisely)
<geist>
also remember arm64 cores still have all the features that arm32 had under the hood because most of them also still let you run arm32 kernels as well
<geist>
so if there's some underlying feature its probably available in arm64
<mrvn>
Number of entries. On ARM you can open up a hole that's unreachable around 0x80000000
<geist>
yes you can
<geist>
in arm64 when you load a page table into the TTBR you can specify how big it is
<geist>
in powers of 2
<geist>
ie, i want a 39 bit address space, and then the cpu will dynamically limit the number of page tables to 3
<klange>
that reminds me of another annoying thing about the arm manuals
<geist>
or 38, it'll be 3 page tables with half of the top one unused
<klange>
tons of pages on the web version where the first half is "if you're in aarch32 mode" and then waaay down the page (with no index/toc!) is the 64-bit version
<geist>
it's independent based on each half of the address space too
<klange>
and a bunch of the regs are differently organized between the two
<geist>
TTBR0 and TTBR1 independently can specify their page sizes, and size of aspace, etc
<geist>
klange: yeah no kidding. i wish you could just make a version of the manual with no 32bit stuff
<geist>
same with x86 to be honest
<mrvn>
any good reason to use that?
<geist>
mrvn: use what?
<mrvn>
geist: limit the sizes of the page tables.
<geist>
if you limit it such that there are less levels, it's hypothetically faster
<geist>
since less page tables to walk through
<geist>
also with larger page granules (16K, etc) you can get to 3 or 2 page tables 'quicker' since the numbers add up differently
<mrvn>
I can see that. Userspace won't need to map more than 8GB so less levels save space.
<geist>
16k: 14 + 11 + 11 + 11 = 47 i think?
<geist>
so right off the bat you can get to 3 levels by limiting the aspace to 47
<klange>
on a scale from 1-10 how evil is setting TTBR0 and 1 to the same thing
<geist>
should be fine
<geist>
whatever you map at 0 will also map at 0xffff.0000.0000.0000 (assuming full 48 bit + 4K)
<klange>
I do it because a lot of my MMU bring up is based on marcan's m1n1 code where he was doing that for simplicity
<mrvn>
On ARM I played with putting the level 0 and level 1 page tables, the process struct, the stack and a mini heap all into a single 4K page. I don't see how that could work on aarch64 though.
<geist>
yep. the LK thing also sets up a temp mapping in TTBR0 so it can get up to the kernel and then turns it off later
<geist>
the split PT thing is *really* nice
<mrvn>
It doesn't have the 1k sub tables in 64bit mode.
<geist>
one of those things you get really used to. was sad to see that riscv doesn't have it
<geist>
mrvn: yeah 1k pages have been gone a while
<geist>
i think armv7 hard deprecated it, armv8 removed the feature
<mrvn>
On amd64 I don't really miss the split tables. I just limited user space to the first entry in the L4 table.
<geist>
makes sense
<geist>
512GB is good enough for anybody
<mrvn>
512GB are enough for any user. :)
<geist>
actually an interesting reason to enable 5 level paging right?
<geist>
since a single entry in the top of a 5 level structure is the size of the entire 4 level structure
<geist>
hmmmmm!
<mrvn>
And then I have servers at work with 1TB memory.
<geist>
never thought about it until now
<clever>
funny you should say that, haskell always uses a 1tb mmap for its heap
<clever>
double the limit you're setting :P
<mrvn>
clever: it's a hardware limit. 1TB would need 2 table entries.
<clever>
and then madvise is used to allocate/free memory within that 1tb mapping
<geist>
riscv is pretty regular like this too: when you enable the mmu you basically just specify if you want 32, 39, or 48 bit addressing (and 57 is i think reserved already)
<clever>
mrvn: is it really a hw limit, or is it just a choice of how your implementing context switching?
<geist>
though of course different implementations have different levels of support, so it's not a free for all
<mrvn>
clever: hardware. A single page table entry covers 512GB.
<clever>
mrvn: what is stopping you from just using 2 slots in the L4 table?
<mrvn>
clever: nothing.
<clever>
then its not actually a hw limit?
<clever>
you're just choosing to not use a 2nd slot?
<geist>
what i haven't had time to do is really benchmark the difference between adding/removing a level
<mrvn>
You would need 4 entries though, since a bit is already used for the haskell program and it wants a full 1TB.
<clever>
yeah
<geist>
the cpu should have to walk more page tables to satisfy a TLB miss but modern cpus *also* have a page table cache that's generally fairly transparent
<clever>
its showing as 1025.1g virtual in top
<mrvn>
The idea was to simulate the 2 page table registers by having slot 0 be the user page table.
<geist>
so in many of those cases the page table cache lets hardware skip part of the walk
<geist>
so it's really the subset of cases where it TLB misses and has to walk the whole table
<energizer>
why is it good to have a 8MB stack limit instead of like 1TB?
<mrvn>
fault early, fault fast
<clever>
mrvn: yeah, i see, you're just abusing slot-0 in a global pagetable directory, as if it was TTBR0, and updating it upon context switch
<clever>
and using a 2nd slot, would add to the overhead
<geist>
technically nothing keeps you from building an OS that simply treats the entire aspace as a big pile of zeros, and just demand faults them in as they're touched
<mrvn>
clever: yep, that was the idea. So 2 slots would ruin that a bit.
<geist>
but i think there's a lot of value to leaving hoes
gjnoonan has quit [Changing host]
gjnoonan has joined #osdev
<geist>
haha holes that is
<heat>
energizer, big stacks are bad
<heat>
also 1TB is a lot of address space
<energizer>
heat: my question was why are they bad
<heat>
if you have a lot of threads you'll end up consuming a lot of address space
<geist>
right you're chewing up the aspace that you can also use for othe things, like memory mapping files, libraries, etc
<heat>
well, your stack isn't meant to be used like that
<clever>
heat: if the 1tb allocation fails, haskell will fall back to just asking for pages on-demand via mmap
<geist>
1TB is 40 bits of aspace. so in a 47 bit aspace like x86-64 you only have 127 1TB slots
<mrvn>
energizer: 1TB stack would limit you to < 256 threads.
<geist>
oh that too!
<clever>
1tb just gives it far more freedom over what virtual addr it places things at
<geist>
47 bits is a lot, but not *that* much
<energizer>
mrvn: that's fine, i only have 24
<zid>
It's good enough for the girls I go out with
<clever>
so the kernel doesnt make the choice on where the next free page lands
<heat>
energizer, software threads, not cpu threads
<heat>
you have thousands of OS threads running, I guarantee you
<energizer>
ok what about 1GB stack
<zid>
I have.. 1272
<geist>
30 bits. it's smaller
<heat>
what would a 1GB help you in?
<heat>
1GB stack*
<geist>
8MB is arbitrary too. the general idea is dont use more than you need. but if you need more, ask for it
<zid>
I'd be under a thousand if I closed bloody discord :p
<geist>
there's no hard and fast rule, but 1TB is a bit much
<heat>
8MB is already seen as pretty big by some people
<mrvn>
geist: If you think about it, the idea to give every process the whole address space from the start and just fault in pages as needed has a lot going for it. You're doing that already for anything malloced and the whole syscall overhead for allocating memory is basically pointless.
<geist>
yah. but it's also generally not fixed. usually apps/languages can ask for different amounts
<zid>
yea 8MB is immense for the code I run, but I imagine random stack languages would want it
<geist>
mrvn: sure. it's virtual memory in the purest sense
<energizer>
zid: is that ps aux | wc -l ?
<zid>
no it's taskmgr.exe :p
<heat>
also the 8MB aren't already mapped, that's just the max size
<geist>
right i think everyone is assuming this is all demand faulted
<geist>
there'd be a bad problem if you had a huge stack (even if you never intended to use it)
<geist>
imagine a program accidentally recurses forever
<mrvn>
The 8MB is probably also a leftover from 32bit. Asking for 1TB on 32bit archs? hehe
<geist>
it'd start chewing up memory, push the system into swap, etc would be a Bad Time
<geist>
whereas if it was 8MB or so it'd hit that and explode pretty quickly
<heat>
thankfully linux handles swapping very well!
<heat>
:P
<geist>
so i think the general idea is ask for what you need and maybe a little more for safety
<heat>
note that you probably don't need 8MB of stack
<mrvn>
Does linux record the maximum stack usage anywhere in the process stats?
<geist>
heat: yeah totally. did i tell you my M1 mac keeps leaking and chewing up GB of ram? I have a process that i killed a while ago that was using 30GB (16GB machine). 29.5 of it was compressed to 7GB which was swapped out
<geist>
it actually handled it totally fine. i just check every few days and kill it
<heat>
mrvn, i don't think so but you could find that out using pmap
<kazinsal>
eight (thousand) megs and constantly swapping
<zid>
I tried to switch torrent client this week and I asked it to re-check my torrents to get them to 100% completed rather than 0% and it decided to use as much ram as it could until malloc failed to do so. That was a fun desktop experience.
<mrvn>
hint: it's not emacs that's eating 30GB ram.
<zid>
gotta love a random process using 30GB of ram because I told it to check a 30GB file
<heat>
geist: I've had windows' dwm.exe (window manager) leak and use up *all memory* because of buggy graphics drivers
<geist>
zid: rtorrent? i think i've seen it do exactly that
<zid>
qbittorrent
<geist>
like it loads some sort of huge index
<heat>
it would leak silently until you literally couldn't use the thing because it's busy swapping
<mrvn>
geist: rtorrent uses mmap
* geist
nods
<zid>
I think maybe it did mapviewoffile or something and then.. touched every page it could
<klange>
So happy I have a dev environment on my M1 now.
<zid>
which just immediately consumed all of my ram
<mrvn>
(for checking at least)
<geist>
klange: yah right? did you by the way manage to build current gcc? I haven't really debugged it, but gcc still doesn't build out of the box stock
<kazinsal>
there's an electron app that I regrettably use for stat tracking and stuff for League of Legends and it leaks about ten gigs of swap per day in the background
<geist>
but i've just been using x86 toolchains on my M1 and it works fine
<zid>
I still prefer utorrent 2.1 and it's still better, but it's getting REALLY untenable to actually use, with how many outstanding CVEs etc there are
<mrvn>
zid: Do you know what really helps kill those things quickly? no swap.
<klange>
geist: yeah, it was a pain and required some patches
<geist>
ah yeah i've just been waiting for someone to fix it upstream
<zid>
You should use the JVM strategy, wrap the allocator
<heat>
no swap is great until you run out of memory
<zid>
free() old memory
<mrvn>
heat: and then it's still great because firefox just dies instead of making your system unusable.
<bslsk05>
rfjakob/earlyoom - earlyoom - Early OOM Daemon for Linux (118 forks/1905 stargazers/MIT)
<heat>
sorry but I don't have binary swap
<klange>
2020 Air
<geist>
noice. typing here on a M1 mini basically the first thing they sold. i got it to hack on but i've used it more and more as my main desktop
<heat>
my swap isn't either empty or full
<geist>
it's such a lovely machine, except the leaking process
<heat>
it's a bit more nuanced than that
<mrvn>
If you get beyond maybe 1GB of swap space that's actually used then your system just freezes.
<mrvn>
If you just have unused pages in there then no problem.
<klange>
After the last several years of using a Surface, I've missed having a clamshell hinged keyboard.
<zid>
swap is either empty, because my ram is mostly used for page cache, or 100% full because something leaked 100TB of ram
<heat>
mrvn: depends. linux? sure, because linux avoids swapping like the plague
<zid>
I've not experienced anything inbetween since ddr2
<geist>
i usually set /proc/sys/vm/swappiness to 100 just to try to coax it into using it a bit
<heat>
but my windows installation can get like 300MB of swap usage pretty easily
<geist>
after some time i usually end up with a few hundred MB
<mrvn>
heat: OS independent. With 1GB actively used in swap you are just waiting for the drive.
<geist>
which i assume is generally just unused pages in long running apps
<zid>
yea windows xp was the complete opposite
<zid>
its idea of "long running app" was "not currently focused"
<heat>
mrvn, no.
<klange>
Finally going to do some travel to the US next month, and having the Air should be great. Don't need to lug my old ThinkPad, don't need to struggle with keeping the Surface upright on the seatback table in a plane...
<zid>
you'd constantly have completely unused ram, not just 'a copy is also in swap'
<heat>
there's a lot that can get swapped out before you're swapping out things you're using
<geist>
i read what that tunable does, it basically is a threshold of which linux will choose to reclaim a page from either a file cache or an old anonymous page
<geist>
with the default settings it'll avoid swapping something out like the plague
<zid>
problem with swap is that if you can afford a large capacity SSD to put swap on it, you can afford the RAM not to have to :P
<mrvn>
heat: note the "actively used". I can swap out 100GB of unused data no problem. The problem comes from swapping stuff back in all the time.
<geist>
but what i dont think linux ever does is preemptively write pages out so they can then be chosen to be evicted, which i think is basically what systems like NT and mac do more
<heat>
mrvn, yes.
<zid>
yea a *copy* in swap is okay
<heat>
I've never said the opposite
<mrvn>
geist: linux does that but verry little.
<zid>
then you can treat the page in memory as page-cache
<geist>
right
<clever>
geist: but there is the SwapCached stat
<clever>
SwapCached: 2426176 kB
<zid>
XP would *delete the original*
<clever>
this is not pre-writing to swap, but rather keeping things in swap when it reads them back out
<zid>
mindblowingly awful in practice, had to disable swap on xp
<mrvn>
SwapCached I believe is just data that was read back from swap and not modified since.
<clever>
yeah
<geist>
yah
<clever>
so it can "return" back to swap instantly, by just discarding the ram copy
<geist>
my VM server, however, is very healthily using 5.4GB of swap here
<heat>
windows also swaps to compressed memory
<zid>
Maybe someone will eventually drop a dual mapping swap backend on linux, I'd use it if I had an ssd I didn't mind killing
<geist>
but i have 10 VMs running and swappiness set to 100 so its probably moving VM pages out
<mrvn>
Geist's idea is to take pages that have not been modified in a while, write them to swap and make them SwapCached.
<heat>
linux as well if you enable zswap/zram
<geist>
mrvn: right then they can be moved a bit higher up the priority queue to be reclaimed
<geist>
though i think in modern compression-based systems you probably compress those pages first
<geist>
then over time move them into swap
<heat>
that's how zswap works
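geist's compress-first, trickle-to-swap idea can be sketched as a toy two-tier eviction queue. Everything below — the 2:1 ratio, the names, the FIFO policy — is invented for illustration; real zswap is far more involved:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy model: evicted pages are first "compressed" into a bounded
// in-memory pool; when the pool is full, the oldest compressed page
// is written out to swap. Sizes and the 2:1 ratio are made up.
struct Page { int id; std::size_t bytes; };

class CompressedTier {
    std::size_t capacity_, used_ = 0;
    std::deque<Page> pool_;
public:
    std::vector<Page> swap; // pages pushed on to the backing store
    explicit CompressedTier(std::size_t cap) : capacity_(cap) {}

    void evict(Page p) {
        p.bytes /= 2;                         // pretend 2:1 compression
        while (used_ + p.bytes > capacity_ && !pool_.empty()) {
            swap.push_back(pool_.front());    // trickle oldest to swap
            used_ -= pool_.front().bytes;
            pool_.pop_front();
        }
        used_ += p.bytes;
        pool_.push_back(p);
    }
    std::size_t in_pool() const { return pool_.size(); }
};
```

With a 4 KB pool, four 4 KB evictions leave the two newest pages compressed in memory and push the two oldest out to swap.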
<mrvn>
I don't think keeping a compressed page in memory is such a good idea.
<geist>
i do wonder though: does it work by just having a swap file it pretends is a disk but is actually compressed memory?
<mrvn>
what is a disk?
<geist>
or is it truly a new level? because i think compression from memory to memory is almost a new level in a hierarchy
<geist>
in the case of M1, for example, it has instructions to compress/decompress pages, which really incentivizes the VM to compress early and often
<clever>
geist: i think its just the priority column in /proc/swaps
<geist>
probably essentially as soon as it's unmapped (inactive pages) may as well get it on the fast path to compression
<mrvn>
How is the compressed swap set up? Is it N MB of memory where it stores compressed blobs, or is it N compressed blobs whose memory need varies over time?
<geist>
clever: right, so if linux wasn't really going to preemptively swap in the first place, then all you're doing is when it finally does it'll get some compression while its at it
<clever>
mrvn: i think its a ramdisk, where the uncompressed side is of a fixed size (only way to make mkswap/swapon happy)
<clever>
and how much actual ram it uses, depends on compression ratio
<geist>
so it's only as effective as linux was already going to try to swap
<geist>
which is not a lot
<clever>
ive also had fun bugs many years ago, when i set swap priority on my only swap partition
<clever>
due to bugs with the migration from hda -> sda, my hdd appeared in /dev/ twice
<heat>
it looks like zswap is a whole other layer
<clever>
swapon by uuid, turned on sda2, and hda2, the same partition
<heat>
see frontswap in linux
<mrvn>
geist: how does uncompressing a page compare to reading it from an M.2 drive?
<clever>
for most users, that only crashes when sda2 fills up, and it begins to overwrite it via the hda2 alias
<geist>
if you have an instruction i guess it's probably far cheaper, time and power wise
<geist>
firing up a nvme drive is i think quite expensive power wise
<clever>
but for me, with the same priority, it striped between sda2, and hda2, and overwrote things immediately
<clever>
*boom*
<geist>
side note: my NAS box is using SSD-in-front-of-raid caching and it's quite slick
<mrvn>
clever: that shouldn't have worked. opening sda2 should have locked hda2
<geist>
i forget what linux calls it, but it's clearly using it a lot
<heat>
geist, what do the M1 instructions compress as?
<clever>
mrvn: thats why it was a bug :P
<geist>
heat: dunno
<geist>
probably something fast, but since it's also compressing 16k pages it probably has a fair amount of data to work with
<mrvn>
a very un-RISC opcode, that "compress a page" one.
<geist>
heh yeah
<geist>
i honestly don't know how it works, but someone reverse engineered it online
<geist>
haven't really looked into it, and probably shouldn't
<heat>
can't wait for a libzstd implementation in microcode
<mrvn>
Would be interesting to benchmark it to see how it compares to the 2GB/s or so an M.2 drive manages.
<geist>
if it's quick it may make sense to always compress just before writing it out to swap *anyway*
<geist>
since that'd reduce the bandwidth
<geist>
but from what i understand flash nvme is not power cheap in general, so you really want to avoid it if you're a laptop or whatnot
<mrvn>
Totally. I was more thinking about how much keeping compressed pages in ram is worth it.
<geist>
so if you can compress mem to mem N percentage of the time that causes it to swap M percentage less that's probably a net win
<geist>
depending on what the costs and N and M are
<mrvn>
(assuming you are not on battery power)
<geist>
sure yeah
<geist>
anyway gotta go. have fun everyone
<clever>
i should also get to bed
<mrvn>
The drawback of having compressed pages in ram is that you have less ram. You might not even have to swap at all without compressed swap.
<klange>
i still need to write a gzip compressor...
<mrvn>
me too.
<heat>
mrvn, compressed pages only exist on demand
<mrvn>
klange: lzx, lha, runlength encoding?
<heat>
if you swap, you needed to swap
<heat>
if you swap to disk, you ran out of compressed memory space
<mrvn>
heat: but do you compress 20 pages to get enough free space for an extra page or do you compress one, save it to disk and reuse the page?
<heat>
first option
<mrvn>
then any working set that needs one of the 20 pages will have to constantly compress/uncompress pages.
<heat>
zswap even decompresses memory when writing to the disk
<klange>
I have a gzip decompressor, so I need a compressor to go with it
<heat>
mrvn, if your swapping algorithm is good, that's not a problem
<mrvn>
swapping algorithms are never good. :) They always have a worst case and you always hit it.
[itchyjunk] has joined #osdev
<mrvn>
Well, I get a "Hello, World64" output. Time for sleep.
immibis has joined #osdev
rsjw has joined #osdev
Jari-- has joined #osdev
vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
dude12312414 has joined #osdev
dude12312414 has quit [Remote host closed the connection]
heat has quit [Ping timeout: 250 seconds]
ElectronApps has joined #osdev
sonny has joined #osdev
sonny has quit [Quit: Client closed]
Oli has quit [Quit: leaving]
nyah has quit [Ping timeout: 260 seconds]
<klys>
klange, I have a 32-bit ToaruOS VM running and want to set the IP address. is there help?
<klange>
yeah, don't use a 32-bit ToaruOS :)
<klange>
though I don't have user-facing options on 2.0 either (but it is a fairly standardized ioctl-based process)
rsjw has left #osdev [#osdev]
<klange>
1.x only supported dhcp, 2.x has a completely new network implementation that can support manual address assignment, but I haven't added anything in my `ifconfig` to do that - `dhclient` has the relevant code
<klys>
ok thanks that makes it a lot easier
<klange>
Stupid answer for 32-bit/1.x: You can write a module that uses `get_default_network_interface` and set ->source to a host-ordered integer IPv4 address.
<klys>
thanks for that, figuring that 1.x is probably deprecated today, glad you would oblige.
<klange>
i realize if you need to set a static ip then you probably don't have a network with which to obtain a toolchain and compile that locally... ;)
freakazoid333 has quit [Ping timeout: 272 seconds]
<klys>
i was passing the following to qemu and running: brctl addif eth0 tap1; at startup: -netdev tap,id=nd0,br=eth0,script=no,downscript=no -device pcnet,netdev=nd0,mac=00:12:34:56:78:90
<klys>
note that you need a unique mac addy for each vm instance
<klange>
The network stack in 1.x was really janky. The network stack in 2.x is only kinda janky, but isn't that much better, though the latter got a lot more testing against a real DHCP server.
<klange>
Note I don't support 'pcnet' anymore / at the moment, switch to e1000 or e1000e.
<klange>
And/or file an issue report reminding me to port the driver forward...
<klange>
Same for rtl8139, that also has yet to be recovered. It's not all that involved, I've just been lazy as everything I've been testing on supports e1000(e).
<klange>
And I should also just say "just use slirp" if that's an option... not like I have any server stuff to run yet...
<zid>
I had a gigabyte realtek in my P35 machine, I should figure out how to revive that board
<geist>
yah i think between realtek 8111 and e1000 you've got pretty much most of the stock ethernet
<zid>
Q35 actually, thinking about it, but, same shit
netbsduser has quit [Read error: Connection reset by peer]
[itchyjunk] has quit [Remote host closed the connection]
the_lanetly_052 has joined #osdev
freakazoid343 has quit [Remote host closed the connection]
eryjus has quit [Read error: Connection reset by peer]
netbsduser has joined #osdev
eryjus has joined #osdev
masoudd has quit [Ping timeout: 272 seconds]
xenos1984 has quit [Read error: Connection reset by peer]
bauen1 has quit [Ping timeout: 272 seconds]
sonny has quit [Quit: Client closed]
pretty_dumm_guy has joined #osdev
pretty_dumm_guy has quit [Client Quit]
pretty_dumm_guy has joined #osdev
nyah has joined #osdev
xenos1984 has joined #osdev
masoudd has joined #osdev
bauen1 has joined #osdev
the_lanetly_052 has quit [Ping timeout: 256 seconds]
epony has quit [Read error: Connection reset by peer]
lkurusa has joined #osdev
marshmallow has joined #osdev
dormito has quit [Quit: WeeChat 3.3]
not_not has quit [Ping timeout: 240 seconds]
GeDaMo has joined #osdev
not_not has joined #osdev
<not_not>
Hoi
<not_not>
We have a language Dev Chanel?
<geist>
Programming language? On Libera irc? Dunno
<klange>
"we"? There is no language dev channel affiliated with #osdev, but I'm sure there's at least one on Libera.
<not_not>
Ahh ye meant libera when i Said we
<not_not>
Ahhh
<not_not>
Autocorrect off its giving me ulcers
<GeDaMo>
#proglangdesign
<not_not>
So what emu should i use? Planning on just doing basic bootloader + kmain and screw around in ring 0
<not_not>
Stuck on protected mode on my current loader im a noob
<not_not>
Gedamo ty
<GeDaMo>
:)
<Mutabah>
qemu
<not_not>
Oki
dormito has joined #osdev
lkurusa has quit [Quit: I probably fell asleep (or went out). Who will ever know.]
dormito10 has joined #osdev
dormito has quit [Ping timeout: 256 seconds]
not_not has quit [Read error: Connection reset by peer]
the_lanetly_052 has joined #osdev
lkurusa has joined #osdev
lkurusa has quit [Read error: Connection reset by peer]
Jari-- has quit [Ping timeout: 240 seconds]
xenos1984 has quit [Remote host closed the connection]
<mrvn>
Say I'm afraid of the compiler allocating memory somewhere, e.g. for a closure. What are the ways to get an error during compile instead of an undefined symbol during linking?
xenos1984 has joined #osdev
dennis95 has joined #osdev
<GeDaMo>
Statically allocated memory? Or you want to limit where the dynamically allocated memory could be?
<mrvn>
purely dynamic non-stack memory.
<mrvn>
I don't have malloc/new. Only some placement new.
<GeDaMo>
Ah, this is C++?
<mrvn>
yes. Thought that was clear from closure.
<mrvn>
aka lambda captures
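For the closure case specifically, there is a compile-time check available: a lambda capture is just an unnamed struct living on the stack, so its size is a constant you can `static_assert` on; dynamic allocation only enters when the closure is type-erased into something like `std::function`. A minimal sketch (the exact `sizeof` holds on mainstream ABIs, which is an assumption):

```cpp
#include <cstdint>
#include <type_traits>

// A capturing lambda is an anonymous struct holding its captures
// inline; no heap allocation is involved. Allocation only appears
// once you type-erase it (e.g. into std::function, whose small-buffer
// size is implementation-defined).
int example() {
    std::uint64_t a = 1, b = 2;
    auto f = [a, b] { return a + b; };
    static_assert(sizeof(f) == 2 * sizeof(std::uint64_t),
                  "captures are stored inline, on the stack");
    static_assert(std::is_trivially_copyable_v<decltype(f)>);
    return static_cast<int>(f());
}
```

If the asserts compile, the closure provably never touches the allocator — an error at compile time rather than an undefined `operator new` at link time.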
terrorjack has quit [Quit: Ping timeout (120 seconds)]
terrorjack has joined #osdev
MiningMarsh has quit [Read error: Connection reset by peer]
MiningMarsh has joined #osdev
dormito10 is now known as dormito
<mrvn>
Given this code: void foo(bool x) { if (x) throw Bla{}; } void blub(bool x) { try { foo(x); } catch (const Bla&) { ... } } where Bla is one of the std exceptions, can the compiler optimize the exception away? I.e. should one have an interface with optional to avoid the cost of exceptions?
<clever>
mrvn: if the compiler is in-lining things, i suspect it may detect that unreachable path, and just eliminate it
<mrvn>
it's all reachable
<kazinsal>
from a point of view, assuming bool x is a single bit, statistically you have a 50% chance of !x being 0
<clever>
but it could be translated into just `if (x) { ... } else { ... }`
<clever>
and not even use an exception
<kazinsal>
realistically some large number is going to get cast down to bool x
<clever>
or a function call
<mrvn>
clever: exactly. That's the desired optimization.
<kazinsal>
which makes the chance of !x being 1/(2^x)
<kazinsal>
but some dipshit is going to pass a null to foo() so do the safe thing.
<mrvn>
kazinsal: you are talking about the wrong thing
<clever>
that is also why ive seen some code doing foo(!!x)
<clever>
to normalize the bool
<kazinsal>
mrvn: I always assume when writing function interfaces that some dipshit is going to pass some functionally invalid parameter
<mrvn>
kazinsal: and you can't.
<mrvn>
any value of x is valid.
<kazinsal>
then you need to fail in a manner that stops worse things from happening.
<mrvn>
nothing fails. Every exception is caught.
<kazinsal>
catch a !x and crash then
<mrvn>
nothign crashes.
<kazinsal>
for a non-const x you process. assuming that non-const x is valid for a dereference or whbatever
<kazinsal>
for a const x you can simplify your check
<mrvn>
kazinsal: no. The x is checked and if it is true the code does one thing and if it is false it does another.
<kazinsal>
is your foo(x) externally linked?
<mrvn>
No. it's right there or the compiler couldn't optimize.
<kazinsal>
then best case you can statically optimize out your !x paths and worst case you can't
<kazinsal>
in the best case your !x path gets committed to the binary
<kazinsal>
er,
<mrvn>
no. no code path can be optimized away.
<kazinsal>
in the best case it doesn't
<kazinsal>
in the worst case it does
<kazinsal>
so in foo(x), x is not const?
<mrvn>
The question was if the throw/catch pair gets optimized out. That's the only thing that can be removed.
<kazinsal>
if x is not const then it should not be optimized out
<mrvn>
kazinsal: it has no observable effect.
<mrvn>
The Bla() constructor might, but not the throw.
<kazinsal>
if you can't guarantee the truthiness of x for foo(x) then you can't make any optimizations based on the truthiness
<mrvn>
The code is equivalent to: if (x) { Bla(); ... catch code ... } else { ... try code ... }
<j`ey>
mrvn: why dont you compile it and see :P
<mrvn>
because I don't have every modern compiler in the world
<kazinsal>
good news, everyone who matters compiles on gcc or clang
<j`ey>
mrvn: godbolt :P
<kazinsal>
icc is used for dipshit benchmarking code
<j`ey>
icc is based on clang now
<kazinsal>
same with amd's weird bespoke fork of gcc or clang or whatever amd's optimizing c compiler is
<kazinsal>
if your gcc or clang output is non-deterministic then your inputs are non-deterministic
<mrvn>
So now the question is: why is there a __cxa_allocate_exception?
<kazinsal>
because it's indeterminate what gets passed to blub(x)
<mrvn>
kazinsal: irrelevant
_eryjus has joined #osdev
eryjus has quit [Ping timeout: 250 seconds]
<kazinsal>
having thrown some statics in there and an int main() I can attest that the answer is "the compiler isn't optimizing for the perfect case"
<mrvn>
kazinsal: you do see that in this simple case blub() could be optimized to just return, right?
<kazinsal>
it's a worst case scenario of "the compiler isn't pre-optimizing every case perfectly".
<mrvn>
unless we miss something that says exceptions can be observed and therefore can't be optimized away.
<kazinsal>
this is not entirely unlike the problem that Itanium had as an explicit architecture -- the compiler could not sufficiently optimize impossible paths out
<kazinsal>
possibly the root cause is "we were expecting the compiler to be magic, and it wasn't"
<kazinsal>
which is quite literally the problem that Itanium had
<mrvn>
Itanium had many hardware problems in itself too. Compiler optimization just takes time and it died before it could really take off.
the_lanetly_052_ has joined #osdev
the_lanetly_052 has quit [Ping timeout: 256 seconds]
lkurusa has joined #osdev
<mrvn>
Back to the exception thing: throw() has side effects: std::current_exception() and std::uncaught_exceptions(). But they aren't used so I think optimizing the exception away is legal but compilers just aren't smart enough.
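mrvn's example, spelled out as compilable C++ (with `Bla` swapped for `std::runtime_error`). As the discussion above suggests, nothing observable escapes `blub()`, yet compilers today still emit the `__cxa_allocate_exception`/`__cxa_throw` machinery here rather than folding the pair into a plain branch:

```cpp
#include <stdexcept>

// foo() throws; blub() immediately catches. The throw/catch pair
// could in principle be lowered to an if/else, but in practice the
// full Itanium-ABI unwind path is generated.
static void foo(bool x) {
    if (x) throw std::runtime_error("bla");
}

int blub(bool x) {
    try {
        foo(x);
        return 0;   // "try code" path
    } catch (const std::runtime_error &) {
        return 1;   // "catch code" path
    }
}
```

Either way the observable behavior is just a branch on `x`, which is what makes the elision look legal.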
Payam63 has joined #osdev
ElectronApps has quit [Remote host closed the connection]
blockhead has quit []
toulene has joined #osdev
_eryjus has quit [Read error: Connection reset by peer]
_eryjus has joined #osdev
Payam63 has quit [Quit: Client closed]
Payam5 has joined #osdev
heat has joined #osdev
Payam5 has quit [Quit: Client closed]
freakazoid12345 has joined #osdev
[itchyjunk] has joined #osdev
lkurusa has quit [Quit: I probably fell asleep (or went out). Who will ever know.]
lkurusa has joined #osdev
lkurusa has quit [Client Quit]
lkurusa has joined #osdev
lkurusa has quit [Client Quit]
freakazoid12345 has quit [Ping timeout: 256 seconds]
<immibis>
if it was implementation-defined to eject nasal demons that would also be a valid implementation bug
<mrvn>
I would only consider it a bug if there is no way to annotate the bounds for the literal pointer.
X-Scale` has joined #osdev
X-Scale has quit [Ping timeout: 272 seconds]
X-Scale` is now known as X-Scale
<heat>
it's not possible to annotate bounds for literals
<heat>
it's a bug
<heat>
how would you do MMIO to a fixed address otherwise?
pretty_dumm_guy has quit [Ping timeout: 268 seconds]
bauen1 has joined #osdev
bauen1 has quit [Ping timeout: 240 seconds]
bauen1 has joined #osdev
xenos1984 has quit [Remote host closed the connection]
xenos1984 has joined #osdev
<immibis>
heat: you can't, it's non-standard C
<immibis>
undefined behaviour which the compiler has no reason to support, if you are the kind of person who complains about such things, which I would hope people in this channel are not
<heat>
doesn't matter, it's not the point
<heat>
how would you do it?
<heat>
standard C is simply not a thing you can ever support
<immibis>
you would tell the person asking that it's impossible
<heat>
breaking something essential for low level programs (like kernels, bootloaders) and saying "well, that was UB anyway, we don't need to support it" is useless if you're breaking every low level program out there
<immibis>
clearly you are unfamiliar with language-lawyer-think
<heat>
if the compiler writers think so, their compiler is useless
<heat>
truth be told, this is probably a regression in trunk
<heat>
I don't think GCC trunk is stable
<heat>
(clang's is though)
k8yun has joined #osdev
dennis95 has quit [Quit: Leaving]
<mrvn>
heat: for example: (volatile int *)0x1234 __attribute__((bounds=1));
freakazoid12345 has joined #osdev
maksy_ has quit [Quit: WeeChat 3.1]
masoudd has quit [Ping timeout: 272 seconds]
Bonstra has quit [Quit: Pouf c'est tout !]
Bonstra has joined #osdev
wolfshappen has quit [Ping timeout: 256 seconds]
wolfshappen has joined #osdev
dude12312414 has joined #osdev
dude12312414 has quit [Remote host closed the connection]
<geist>
yah the bounds warning on new gcc is annoying
<geist>
i haven't seen this particular one, but i've seen warnings like it, started with 10 i believe
<mrvn>
This one only happens in trunk
gareppa has joined #osdev
<geist>
oooh. interesting that i don't see it on godbolt
<mrvn>
Anyone use attribute zero_call_used_regs on their syscalls?
the_lanetly_052_ has quit [Ping timeout: 250 seconds]
<geist>
oh fun. it's marked as dup, but the original bug it's a dup of has a pretty good argument going on
<geist>
hmnm no (re: zere call used regs). usually just zero the regs on the way out of the kernel
<mrvn>
yeah, the attribute just automates this.
<geist>
figure that can be put in the syscall exception logic, and is thus shared and can be made efficient
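The attribute in question is `zero_call_used_regs` (GCC 11+, and recent Clang). A guarded sketch, so it still builds where the attribute is unsupported — the function body is a stand-in, not a real syscall:

```cpp
// Wipe call-clobbered registers on return so a syscall doesn't leak
// stale kernel register contents to userspace. "used-gpr" zeroes only
// the general-purpose registers the function actually used.
#if defined(__has_attribute)
#  if __has_attribute(zero_call_used_regs)
#    define ZERO_CALL_REGS __attribute__((zero_call_used_regs("used-gpr")))
#  endif
#endif
#ifndef ZERO_CALL_REGS
#  define ZERO_CALL_REGS
#endif

ZERO_CALL_REGS long sys_getpid(void) {
    return 1234;  // stand-in for the real syscall body
}
```

As geist says, doing it once in the shared syscall-exit path is the alternative; the attribute just lets the compiler do it per function.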
rorx has quit [Ping timeout: 250 seconds]
rorx has joined #osdev
blockhead has joined #osdev
dude12312414 has joined #osdev
tomaw has quit [Quit: Quitting]
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
tomaw has joined #osdev
elastic_dog has quit [Ping timeout: 256 seconds]
dennis95 has joined #osdev
<geist>
side note: re talking about swap and whatnot in linux the other day, searched around and found the `smem` utility
<geist>
it's nice, gives you a good view of where things are
<mrvn>
package?
<mrvn>
I have a UART class that initializes the uart in the constructor and has placement new taking a DT entry to construct. What should the destructor do? Turn off IRQs probably. Drain the FIFO and turn it off? Set the BAUD to something default?
<geist>
package?
<mrvn>
geist: package that contains smem
<geist>
oh here, let me consult my spreadsheet that contains a list of linux distributions that everyone is using that i've been curating
<geist>
so that i can give you a personalized response
<geist>
it's in the SMEM_V2::RAD_VERSION that my specialized distro is on
<geist>
(hint the answer is 'smem')
<geist>
on ubuntu. i have no clue what you use
<mrvn>
There is realistically only debian and rh. So smem it is. Not sure I like the output.
<geist>
per process. which is what i really wanted
<mrvn>
Ahh, "Shared memory is divided evenly among processes sharing that memory." That part is interesting and probably why it's so slow.
<geist>
also seems to be written in python
<mrvn>
When I first started it I thought: Hey, top tells me the same and it manages to update many times a second and this takes like 20 seconds.
<geist>
if top had a column for swap i'd be happy
<geist>
i just wanted to see what processes consume swap
<mrvn>
36. SWAP -- Swapped Size (KiB)
<mrvn>
The formerly resident portion of a task's address space written to the swap file when physical memory becomes over committed.
<mrvn>
You just have to configure top to show that column.
<geist>
oh yeah, huh. good
<mrvn>
"nMaj -- Major Page Fault Count" is also interesting to see what is actively driving your harddisk to die.
<geist>
yah
elastic_dog has quit [Ping timeout: 252 seconds]
<mrvn>
or vMj for just the delta
<geist>
and of course W to save it
<geist>
excellent
<mrvn>
enjoy.
GeDaMo has quit [Remote host closed the connection]
<geist>
fun to watch rustc processes fire 214k vMn
<geist>
in a second
elastic_dog has joined #osdev
epony has joined #osdev
<mrvn>
one per page it initializes
<geist>
yep
<mrvn>
shouldn't it use something like mmap MAP_POPULATE?
elastic_dog has quit [Ping timeout: 240 seconds]
nyah has quit [Quit: leaving]
nyah has joined #osdev
<geist>
probably just the heap expanding
<geist>
like any other large program
masoudd has joined #osdev
dennis95 has quit [Quit: Leaving]
mahmutov has quit [Ping timeout: 245 seconds]
sprock has quit [Quit: brb]
elastic_dog has joined #osdev
dormito has quit [Quit: WeeChat 3.3]
elastic_dog has quit [Read error: Connection reset by peer]
sprock has joined #osdev
sprock has quit [Client Quit]
sprock has joined #osdev
vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
<mrvn>
geist: does LK have code to shutdown an UART?
elastic_dog has joined #osdev
<clever>
mrvn: not directly, but the PL011 init code does shut down the uart before changing the config, so it could be copied out, but it also depends on which uart you're using
<mrvn>
whatever is in the DT. I probably never shut down the console but you could load and unload the Bluetooth.
<mrvn>
The RPi specs also said to wait 150 cycles at some stages in the init. Is that still a thing and how do I do that with the MMU online and future proof?
<mrvn>
My RPi driver waits 1 count of the 1MHz counter. But that's RPi specific.
k8yun has quit [Quit: Leaving]
<geist>
yah no
<geist>
generally you dont really need to shut down a uart though i guess you can disable the RX and TX function and mask the IRQs
<geist>
but if you want to shutdown, you should mask all the IRQs at the irq controller anyway
<clever>
i intentionally leave the uart configured when running the next stage
<clever>
so i can do uart prints before the platform_init has ran
<clever>
makes debug far easier
<mrvn>
IRQs definitely have to be shut down if the driver with the IRQ handler is to unload.
<geist>
yah
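The lifecycle mrvn describes can be sketched like this: construct the driver with placement new over caller-owned storage, and have the destructor mask IRQs and then disable the block before the module unloads. The register layout and names are invented (loosely PL011-shaped), and it is backed by plain RAM here so it runs anywhere:

```cpp
#include <new>

// Invented register block; on real hardware this would be the
// MMIO-mapped peripheral from the DT entry.
struct UartRegs {
    unsigned cr;    // control: bit 0 = enable
    unsigned imsc;  // interrupt mask set/clear
};

class Uart {
    volatile UartRegs *r_;
public:
    explicit Uart(volatile UartRegs *r) : r_(r) {
        r_->imsc = 0x7ff;   // unmask IRQs
        r_->cr   = 1;       // enable the block
    }
    ~Uart() {               // teardown order: IRQs first, then disable
        r_->imsc = 0;
        r_->cr   = 0;
    }
};

// The driver framework owns the storage; no heap involved.
alignas(Uart) unsigned char uart_slot[sizeof(Uart)];
```

Usage is `Uart *u = new (uart_slot) Uart(&regs);` and, on unload, an explicit `u->~Uart();` — masking at the interrupt controller as well, per geist, before the handler goes away.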
heat has quit [Remote host closed the connection]
biosfood has joined #osdev
heat has joined #osdev
Vercas has quit [Quit: buh bye]
srjek has joined #osdev
masoudd has quit [Ping timeout: 245 seconds]
<vin>
Why are random reads/writes typically shown in IOPS whereas sequential is usually GB/s?
<mrvn>
because 0.000000001GB/s is ridiculous.
dormito has joined #osdev
<zid>
because they're completely different and measure completely different things, so the units are different?
Vercas has joined #osdev
<mrvn>
I've seen plenty of graphs with random read/write in B/s. It just depends on what you want to measure.
<bslsk05>
ark.intel.com: Intel Optane SSD DC P5800X Series 1.6TB 2.5in PCIe x4 3D XPoint Product Specifications
<vin>
But you are right it will be harder to make sense for slower devices.
<vin>
"make sense" of the numbers
<mrvn>
You think 6MB/s is not bad? My 1995 harddisk does 9.8MB/s, which is basically the limit of the SCSI bus.
<zid>
iops is a measure of latency
<zid>
GB/s is bandwidth
<vin>
yes zid my question was why choose different units for random vs sequential
<zid>
because they're measuring different things
<zid>
one is a measure of latency
<zid>
one is a measure of bandwidth
<vin>
mrvn: random read? Or am I making a mistake in my conversion for p5800x
<mrvn>
When you want to see how well the SSD manages to scatter gather reads and writes you would do random read/write and measure MB/s, not IOPS. But as said, it depends on what you want to measure.
<zid>
mechanical drives have *incredibly* poor random performance, because of the latency
<mrvn>
vin: In my case sequential because seek times on a rotating disk will kill you. But an SSD doesn't seek, so that should be fair.
<vin>
mrvn: the sequential read on that is 7.2 GB/s
<mrvn>
vin: and that should tell you that 6MB/s is not good by any measure of speed. It's 3 orders of magnitude less.
<mrvn>
vin: the data is not the bottleneck, the number of operations is.
<vin>
but I get your point that one should consider random read in MB/s when you are doing 'gather' rather than a single seek to a random block
<vin>
Yes you are right mrvn
<mrvn>
If you measure how well the disk can scatter gather or how different block sizes affect the throughput then MB/s can be relevant.
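The two metrics convert into each other once a block size is fixed, which is why either can be quoted — e.g. roughly 1.5M random-4K IOPS works out to about 6 GB/s (numbers illustrative, not from a datasheet):

```cpp
#include <cstdint>

// bytes/s = IOPS * bytes per operation
constexpr std::uint64_t throughput_bps(std::uint64_t iops,
                                       std::uint64_t block_bytes) {
    return iops * block_bytes;
}

// ...and back again
constexpr std::uint64_t iops_for(std::uint64_t bps,
                                 std::uint64_t block_bytes) {
    return bps / block_bytes;
}
```

The conversion is trivial; the reason the units differ is that for random I/O the interesting limit is per-operation latency (IOPS), while for sequential it is the transfer rate.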
eddof13 has joined #osdev
<mrvn>
Or do random read/write of 4K blocks and then plot file size vs speed.
<vin>
So I was thinking more about the drawbacks of IO cost (num of ops * cost of the op) as a metric for an algorithm (with IO). It gives a good lower bound but does not consider concurrency or amount of data transferred.
<vin>
So the amount of data transferred divided by bandwidth gives an upper bound imo, since bandwidth depends on the amount of concurrency at play.
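The two quantities vin mentions can be written down roofline-style (symbols are mine, not from the thread): with $n$ I/O operations of per-operation cost $c$, concurrency $p$, $D$ bytes transferred and device bandwidth $B$,

```latex
T \;\ge\; \max\!\left(\frac{n \cdot c}{p},\; \frac{D}{B}\right)
```

The first term is the op-count/latency bound (the "IO cost"), the second the bandwidth bound; whichever is larger dominates the runtime of the algorithm.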
<clever>
vin: ive also hit an IOPS limit on SD cards, when i was doing 512 byte sequential reads
<clever>
changing the code to do 1024 byte reads (and more) gave measurable performance improvements
<clever>
the part i was too lazy to figure out though, was read-ahead and request merging
<clever>
so instead, i modified the fat32 driver, to read a whole cluster at a time, and then just formatted the card with bigger clusters
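clever's 512-byte-vs-bigger-reads observation falls out of a simple model: every request pays a fixed setup latency, so total time is requests × latency plus bytes / bandwidth, and doubling the request size halves the latency term. The numbers below are illustrative, not measured from any card:

```cpp
// Time (seconds) to read `total` bytes in `req`-byte requests, given a
// fixed per-request setup latency `lat` (s) and bandwidth `bw` (B/s).
double read_time(double total, double req, double lat, double bw) {
    double requests = total / req;
    return requests * lat + total / bw;
}
```

With, say, 1 ms setup per request, reading 1 MiB in 512 B chunks is dominated by the 2048 setup delays; the same read in 4 KiB chunks pays only 256 of them — the same win clever got from cluster-sized reads.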
dude12312414 has joined #osdev
biosfood has quit [Ping timeout: 240 seconds]
<mrvn>
Say I have a vector<string> v; { string str{"foo"}; v.push_back(str); } can the compiler use move semantic because it deduces the lifetime of str to be over at that point?
eddof13 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<Griwes>
no
<mrvn>
why?
<Griwes>
because it's not allowed
<Griwes>
and it's observable
<mrvn>
for std::string?
<Griwes>
yes, because the default allocator uses ::operator new, so you can observe move vs copy when you are beyond the small internal buffer by replacing the global allocation function
<mrvn>
right. stupid new.
<mrvn>
I hate having to write std::move every time.
<Griwes>
there is probably an argument that for the small buffer it's not observable, but i don't think there's a real difference between a move and a copy there
<mrvn>
If the compiler knows the string so it knows it's small then it can inline everything and optimize it all away.
<mrvn>
As soon as it hits new though the optimizer would stop.
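The copy-vs-move difference Griwes calls observable can be made visible without replacing `::operator new`, using an instrumented type (the counters are mine, for illustration): pushing an lvalue copies, pushing with `std::move` steals the contents.

```cpp
#include <string>
#include <utility>
#include <vector>

// Instrumented payload: counts copies and moves so the difference
// between push_back(str) and push_back(std::move(str)) is observable.
struct Payload {
    static inline int copies = 0, moves = 0;   // C++17 inline variables
    std::string s;
    explicit Payload(std::string v) : s(std::move(v)) {}
    Payload(const Payload &o) : s(o.s) { ++copies; }
    Payload(Payload &&o) noexcept : s(std::move(o.s)) { ++moves; }
};

int count_copies_and_moves() {
    std::vector<Payload> v;
    v.reserve(2);                   // avoid reallocation noise
    Payload p1{"foo"}, p2{"bar"};
    v.push_back(p1);                // lvalue: must copy
    v.push_back(std::move(p2));     // explicit move
    return Payload::copies * 10 + Payload::moves;
}
```

Because the copy constructor is a legitimate side-effect site like the allocator, the compiler cannot silently turn the first `push_back` into a move even when it could prove `p1` is dead afterwards — hence writing `std::move` by hand.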
<vin>
clever: I think it also depends on the internal buffer size, doing misaligned access will always cause amplification.
<mrvn>
vin: try reading random bytes
<clever>
vin: i think the main problem i had was the pipelined nature, where a large sequential read is easy for the SD firmware to deal with, but there is an X ms delay to start each new read
<clever>
and even doing sequential 512 byte reads, but with a pause between each, is enough to slow it down
<clever>
maybe a lack of read-ahead in that specific cards firmware
<mrvn>
memory controllers have autoincrement so sequential reads just set row, set col, read, read, read, read, ....
<mrvn>
it probably forgets about where it is and maybe powers stuff down when you have a pause.
<clever>
the SD protocol also has 3 ways to read
<Griwes>
mrvn, there are some cases where new expressions can be optimized away, but I think none of those extend to explicit calls to ::operator new
<clever>
read-single, read-multiple(without count), and read-multiple(with count)
elastic_dog has quit [Ping timeout: 250 seconds]
elastic_dog has joined #osdev
troseman has joined #osdev
elastic_dog has quit [Ping timeout: 240 seconds]
<vin>
Have you ever come across concurrency being modelled mathematically? Just like runtime/IO complexity. I want to represent concurrent policies that do IO as a set of equations.