#osdev on 2022-03-06 — irc logs at libera.irclog.whitequark.org

2021-05-23 01:57 klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books

00:00 <mrvn> clever: no, you need to 0 out the ints.

00:00 <mrvn> moon-child: same thing if you have longs in your code.

00:00 <clever> mrvn: all ints are 64bit ints

00:00 <clever> even on a 32bit platform

00:00 <mrvn> clever: but they are likely <4GB, which is where all the allocated memory is.

00:00 <klange> my ints are 48-bit, please send help

00:01 <clever> *looks*

00:01 <moon-child> mrvn: I use a tiny fraction of the 64-bit address space

00:01 <moon-child> and it starts well above the 4gb mark

00:01 <clever> $ grep heap /proc/self/maps

00:01 <clever> 009a8000-00a13000 rw-p 00000000 00:00 0 [heap]

00:01 <mrvn> moon-child: lucky you. Linux doesn't.

00:01 <mrvn> (except on alpha)

00:01 <clever> moon-child: ah you're right, my heap is very low in the virtual space, so it could collide with an arbitrary 32bit int

00:02 <clever> haskell searches in a similar manner, but every object has a pointer to its type description, that says exactly where to find the pointers

00:02 <clever> so it cant mis-interpret an int as a pointer

00:02 <mrvn> in ocaml pointers have bits 0/1 == 0, ints have bit 0 == 1 and are only 31/63bit.

00:02 <moon-child> http://ix.io/3RrX seems pretty high

00:03 <mrvn> 10 and 11 are used for other special values.
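A minimal C sketch of the bit-0 tagging mrvn describes for OCaml. Assumptions: one machine word per value, heap pointers at least 2-byte aligned so bit 0 is free, and ints losing one bit of range to the tag. The names (`value`, `tag_int`, `untag_int`, `tag_ptr`) are illustrative, not OCaml's runtime API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative tagged word: bit 0 set means "int", bit 0 clear means "pointer". */
typedef intptr_t value;

#define VAL_INT_MAX (INTPTR_MAX / 2)   /* ints lose one bit to the tag */
#define VAL_INT_MIN (INTPTR_MIN / 2)

static inline bool is_int(value v) { return (v & 1) != 0; }

static inline value tag_int(intptr_t n) {
    assert(n >= VAL_INT_MIN && n <= VAL_INT_MAX);
    return n * 2 + 1;                  /* shift left by one, set bit 0 */
}

static inline intptr_t untag_int(value v) {
    assert(is_int(v));
    return (v - 1) / 2;                /* exact division, also correct for negatives */
}

static inline value tag_ptr(void *p) {
    assert(((uintptr_t)p & 1) == 0);   /* alignment guarantees bit 0 is clear */
    return (value)(intptr_t)p;
}
```

With this layout a GC scanner only has to test bit 0 to know whether a word is worth following, which is why a precise collector never mistakes an int for a pointer.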

00:03 <clever> https://github.com/NixOS/nix/blob/master/src/libexpr/eval.cc#L876

00:03 <bslsk05> github.com: nix/eval.cc at master · NixOS/nix · GitHub

00:04 <clever> mrvn: boehm has its own allocation routines, which could enforce alignment, so bit0/1 are always 0, and it may have extra metadata before the object

00:04 <clever> so a pointer into the middle of an object wont count
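A sketch of the kind of check a conservative scanner runs on each word, assuming a hypothetical heap with 16-byte granules and an allocator-maintained object-start map. This strict variant only accepts pointers to an object's first byte; as mrvn points out next, real programs can hold only an interior pointer (e.g. into the middle of a string), so a real conservative collector often has to accept those too. This is not Boehm's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap: 32 KiB carved into 16-byte granules. */
#define GRANULE     16
#define HEAP_BYTES  (4096 * sizeof(uintptr_t))
#define NGRANULES   (HEAP_BYTES / GRANULE)

static _Alignas(GRANULE) unsigned char heap[HEAP_BYTES];
static bool object_starts[NGRANULES];   /* set by the allocator for each object */

/* Could this word be a pointer to the start of an allocated object? */
static bool looks_like_object_pointer(uintptr_t word) {
    uintptr_t lo = (uintptr_t)heap, hi = lo + HEAP_BYTES;
    if (word < lo || word >= hi)    return false;  /* outside the heap: plain integer */
    if ((word - lo) % GRANULE != 0) return false;  /* misaligned: interior or integer */
    return object_starts[(word - lo) / GRANULE];   /* must land on a real object start */
}
```

An integer that happens to equal an object's address still passes, which is exactly the false retention clever and mrvn are worried about when the heap sits low in the address space.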

00:04 eddof13 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

00:04 <moon-child> sbcl also has a tagging system, but does conservative stack scanning. Cuz sometimes you wanna spill unboxed values, and making stackmaps is a pain

00:05 <mrvn> clever: but it doesn't have control over where you point to. You can point to the middle of a string.

00:05 <moon-child> (however, scanning of everything aside from the stack is precise)

00:05 Oli_ has quit [Ping timeout: 272 seconds]

00:05 <klange> y'all seem to know a lot about gcs

00:05 <klange> why y'all ain't helpin me make mine suck less

00:05 <mrvn> moon-child: DWARF2 already has the stackmap made by the compiler.

00:05 Burgundy has quit [Ping timeout: 260 seconds]

00:05 <moon-child> sbcl does not use DWARF

00:06 <mrvn> klange: I don't think there is a GC for multi-core that doesn't suck. They just manage to suck about the same as malloc/free.

00:06 Oli has joined #osdev

00:07 <moon-child> just compacting wins you major points vs malloc/free

00:08 <mrvn> moon-child: but costs you time.

00:08 <moon-child> it pays for itself

00:08 <moon-child> but also: lots of gcs do parallel mark, and there are a few concurrent gcs

00:08 <mrvn> and they all suck in different ways

00:08 <moon-child> yes. Everything sucks! Such is life :)

00:09 <moon-child> (also, on the malloc/free front, interesting recent developments in mimalloc and snmalloc)

00:09 <mrvn> the biggest problem is how to modify values, because you have to somehow atomically flag them to the GC to be scanned again.
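One common way to "flag" a modified object is an object-remembering write barrier: every pointer store goes through a helper that marks the object dirty and logs it once for rescanning. The sketch below assumes a made-up object layout with four pointer fields. It glosses over the ordering needed between the field store, the log entry and a collector running concurrently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object layout: a dirty flag plus four pointer fields. */
typedef struct obj {
    atomic_bool dirty;          /* set once the mutator has written into this object */
    struct obj *fields[4];
} obj;

#define LOG_CAP 1024
static obj *mod_log[LOG_CAP];   /* objects the GC must rescan */
static atomic_size_t mod_log_len;

/* Write barrier: store the pointer, then log the object the first time it is dirtied. */
static void write_field(obj *o, size_t i, obj *val) {
    assert(i < 4);
    o->fields[i] = val;
    /* atomic_exchange guarantees only one thread sees the clean->dirty transition */
    if (!atomic_exchange(&o->dirty, true)) {
        size_t slot = atomic_fetch_add(&mod_log_len, 1);
        assert(slot < LOG_CAP);
        mod_log[slot] = o;
    }
}
```

The fast path after the first write is a single atomic exchange that finds the flag already set, which is roughly where the small barrier overheads in the tweet below come from.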

00:10 <moon-child> https://twitter.com/stevemblackburn/status/1494240906006110209 quotes 1% overhead from barriers

00:10 <bslsk05> twitter: <stevemblackburn> Some of the things I learned: ␤ ␤ Overhead now as low as 0.8% on a modern AMD CPU (was 1.8%). ␤ Intel CPUs are less able to hide the overhead. ␤ Field barrier overheads ~= object barriers on all but the AMD CPU. ␤ ␤ I'm so proud of these students. 🥰 ␤ ␤ 2/2

00:11 <mrvn> Luckily in functional languages nearly all values are immutable and mutable ones are the rare exception. So you pay little for the overhead. Trying to GC something like C/C++ you pay a lot for it.

00:11 <moon-child> indeed

00:12 FatAlbert has quit [Ping timeout: 256 seconds]

00:13 <mrvn> Did boehm GC use mprotect to catch writes to already scanned memory?

00:14 <clever> mrvn: at least in the way nix uses it, its single-threaded, so scanning cant be interrupted or raced against

00:14 <mrvn> urgs, horrible. Stop-the-world-for-ages technology. :)

00:15 <mrvn> Modern GC do a little bit of work frequently and then have a tiny stop-the-world part on multi-core where all cores need to synchronize

00:16 <clever> boehm might be cheating, and doing scans in another thread, while nix does everything from 1 thread

00:16 <clever> ive not reviewed its source

00:16 <mrvn> unlikely. boehm is quite old.

00:19 <mrvn> I've seen other GCs use generational heaps and mprotect. The idea is that you create a new heap every now and then and make the old ones read-only. Then if no page fault occurred you know the old heap has no pointer into the new heap and doesn't have to be scanned.

00:21 <clever> haskell uses a copy collector for its stuff, with a nursery section as well

00:22 <clever> recently created objects are in the nursery, which is a smaller region, and gets scanned more often

00:22 <clever> if the object survives infant-mortality, it gets moved into the big-boys heap, where its scanned less often

00:23 <clever> the copy collector works by just copying every object in the entire heap, while chasing references, and somehow rewriting all pointers as it copies

00:23 <clever> and then it just marks the source heap as free

00:23 <clever> any objects it didnt copy were not referenced

00:24 <clever> that also improves locality, an object is going to be closer to its siblings in the reference graph, because they got copied together
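The copy-and-rewrite step clever describes can be sketched as a Cheney-style semispace collector. This is a simplified illustration, not GHC's collector. Assumptions: every object holds only pointer fields, there is a single generation with no nursery, and allocation simply asserts instead of collecting when from-space runs out. Each copied object leaves a forwarding pointer behind, and to-space doubles as the breadth-first work queue. That breadth-first order is what places siblings next to each other.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Object layout assumed for the sketch: all fields are pointers. */
typedef struct cell {
    struct cell *forward;        /* non-NULL once copied to to-space */
    size_t nfields;
    struct cell *fields[];
} cell;

#define SEMI_BYTES 4096
static _Alignas(max_align_t) unsigned char space_a[SEMI_BYTES], space_b[SEMI_BYTES];
static unsigned char *from_space = space_a, *to_space = space_b;
static size_t alloc_top, to_top;

static size_t cell_size(size_t n) { return sizeof(cell) + n * sizeof(cell *); }

static cell *gc_alloc(size_t nfields) {
    size_t sz = cell_size(nfields);
    assert(alloc_top + sz <= SEMI_BYTES);     /* sketch: no automatic collection */
    cell *c = (cell *)(from_space + alloc_top);
    alloc_top += sz;
    c->forward = NULL;
    c->nfields = nfields;
    for (size_t i = 0; i < nfields; i++) c->fields[i] = NULL;
    return c;
}

/* Copy one object to to-space (at most once) and return its new address. */
static cell *evacuate(cell *c) {
    if (c == NULL) return NULL;
    if (c->forward) return c->forward;        /* already copied: follow forwarding ptr */
    size_t sz = cell_size(c->nfields);
    assert(to_top + sz <= SEMI_BYTES);
    cell *copy = (cell *)(to_space + to_top);
    memcpy(copy, c, sz);
    to_top += sz;
    c->forward = copy;
    return copy;
}

/* Copy everything reachable from the roots, rewrite pointers, swap spaces. */
static void gc_collect(cell **roots, size_t nroots) {
    to_top = 0;
    for (size_t i = 0; i < nroots; i++) roots[i] = evacuate(roots[i]);
    for (size_t scan = 0; scan < to_top; ) {  /* to-space is the BFS queue */
        cell *c = (cell *)(to_space + scan);
        for (size_t i = 0; i < c->nfields; i++) c->fields[i] = evacuate(c->fields[i]);
        scan += cell_size(c->nfields);
    }
    unsigned char *tmp = from_space; from_space = to_space; to_space = tmp;
    alloc_top = to_top;                       /* everything not copied is now free */
}
```

The cost is proportional to the live data, not to the garbage, which is why a nursery full of short-lived objects is cheap to collect.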

00:27 <mrvn> that's generational heaps with compaction. But haskell knows about modify operations from the compiler and doesn't have to use mprotect()

00:28 <clever> nix and haskell are also both functional, objects almost never get modified, at a value level

00:28 <clever> but thunks can mutate from a function-ptr+arg, into a concrete value, when you look at them

00:29 <mrvn> When you talk to c++ people about performance they always say: avoid indirections, never ever use linked lists. They will kill you. Ocaml, haskell, ... are full and full and full of indirections and still the code runs comparable to optimized c/c++. The linked lists and indirections are just so localized that they are basically always in cache.

00:30 <clever> ive seen a talk before about how to not have fps jitter in android games, without native code

00:30 <clever> and it basically boiled down to a few things

00:30 <mrvn> run GC::sweep() after every frame

00:30 <clever> 1: dont create any garbage in your render routines, reuse field members on a constant object, instead of local vars

00:31 <clever> 2: disable gc entirely during rendering, and force a gc scan when the fps doesnt matter, like a loading screen

00:31 <mrvn> urgs, horrible.

00:31 <clever> if you keep garbage creation to a minimum, you can go an entire level without a single gc

00:31 <clever> i think the root problem, is that a gc may take more than 1/60th of a second

00:32 <mrvn> I create tons of garbage while rendering in anything with fps. All in the nursery if I set it up right and at the end of rendering the GC frees it all in one sweep.

00:32 Oli has quit [Read error: Connection reset by peer]

00:32 <clever> and if it does, you miss a frame, and the frame rate now has jitter

00:32 <clever> the old dalvik engine may not have had a nursery

00:32 <mrvn> All the GC has to do is copy the 10 or so objects that are still alive (e.g. new rockets fired by the player) from the nursery into the long term heap and done.

00:32 <clever> pretty sure that talk also predates the dalvik->native llvm converter

00:33 Oli has joined #osdev

00:34 <mrvn> clever: For games you want a GC where you set it up to use a fixed (or given) amount of time to do work each frame. E.g. use 1/60 - <time spent rendering>.

00:35 eddof13 has joined #osdev

00:35 <mrvn> So if one frame takes longer to render you do less GC work and next frame you catch up again when less happens.
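A sketch of the frame-budgeted GC loop mrvn describes: the GC gets whatever is left of the 1/60 s frame after rendering, and a slow frame simply leaves the backlog for later frames. `clock_fn` and `gc_step_fn` are hypothetical hooks standing in for a monotonic clock and for one small unit of incremental collector work.

```c
#include <assert.h>

/* Hypothetical hooks: monotonic time in seconds, and one small unit of
   incremental GC work that returns 0 once the collector has caught up. */
typedef double (*clock_fn)(void);
typedef int (*gc_step_fn)(void);

/* Time left for GC this frame: frame period minus render time, never negative. */
static double gc_budget(double frame_seconds, double render_seconds) {
    double left = frame_seconds - render_seconds;
    return left > 0.0 ? left : 0.0;
}

/* Run GC steps until the budget is spent or there is no work left.
   Returns how many steps were run. */
static int gc_run_budgeted(double budget, clock_fn now, gc_step_fn step) {
    assert(budget >= 0.0);
    double deadline = now() + budget;
    int steps = 0;
    while (now() < deadline) {
        steps++;
        if (!step()) break;      /* collector caught up early */
    }
    return steps;
}
```

A typical call per frame would be `gc_run_budgeted(gc_budget(1.0 / 60.0, render_time), clock, step)`. The granularity of one step bounds how far the loop can overshoot the deadline, which is mrvn's point about cutting GC work into small enough pieces.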

00:35 <clever> yeah, being able to tell the gc to do less work could do

00:35 <mrvn> You could even set up the GC to run until the vblank IRQ fires.

00:37 <mrvn> Some years back they profiled and benchmarked malloc/free in C/c++ and the GC in ocaml and found that programs generally spend the same amount of time in malloc/free as they do in the GC.

00:37 <mrvn> So you just have to make sure the GC work is cut into small enough pieces to not get jitter.

00:38 <clever> i feel like free() should be pretty darn fast

00:39 <clever> you just need a structure at a negative offset from the object being freed, and maybe a doubly linked list to similar structures forward/back

00:39 <clever> so you just need to flag that slot as free space, and optionally merge it with contiguous free space blocks

00:39 vdamewood has joined #osdev

00:39 <clever> and maybe update some free-space lookup tables, if you have them

00:40 <clever> malloc feels like the costly one, having to search for a hole that is big enough, and then dividing it down into an object+hole pair

00:40 <mrvn> free needs to merge holes. It's ~400 lines of code in glibc from what I read.

00:40 <clever> !?

00:41 <clever> merging holes seems pretty simple, just check the previous and next item

00:41 <clever> you only have 4 cases to deal with, object/hole/object, hole/hole/object, object/hole/hole, and hole/hole/hole

00:41 <mrvn> check canaries, check if you could do sbrk() to free memory or munmap(), ...

00:42 <clever> and the only difference is if your new hole is 1/2/3 objects long
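The four cases clever lists fall out of checking each physical neighbour once. Below is a sketch assuming a header just before each block, with the headers linked in address order (the "negative offset" structure plus a doubly linked list from earlier). It deliberately leaves out the canaries, sbrk/munmap trimming, heap-membership checks and locking that mrvn says make glibc's free much longer. `carve` is a hypothetical helper that only sets up a demo layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Header stored just before each block; blocks are linked in address order. */
typedef struct block {
    struct block *prev, *next;  /* physical neighbours */
    size_t size;                /* payload bytes following this header */
    bool free;
} block;

static block *header_of(void *p) { return (block *)p - 1; }

/* Merge b->next into b (both are free holes). */
static void absorb_next(block *b) {
    block *n = b->next;
    b->size += sizeof(block) + n->size;
    b->next = n->next;
    if (n->next) n->next->prev = b;
}

/* free() with O(1) coalescing: object/hole/object, hole/hole/object,
   object/hole/hole and hole/hole/hole all reduce to two neighbour checks. */
static void block_free(void *p) {
    block *b = header_of(p);
    assert(!b->free);
    b->free = true;
    if (b->next && b->next->free) absorb_next(b);        /* merge following hole */
    if (b->prev && b->prev->free) absorb_next(b->prev);  /* merge into preceding hole */
}

/* Demo helper: lay out n allocated blocks of `payload` bytes back to back. */
static block *carve(void *mem, size_t n, size_t payload) {
    unsigned char *cur = mem;
    block *prev = NULL;
    for (size_t i = 0; i < n; i++) {
        block *b = (block *)cur;
        *b = (block){ .prev = prev, .next = NULL, .size = payload, .free = false };
        if (prev) prev->next = b;
        prev = b;
        cur += sizeof(block) + payload;
    }
    return mem;
}
```

The payload size passed to `carve` must keep headers aligned (a multiple of `_Alignof(block)`); a real allocator enforces that when it rounds up requests.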

00:42 <milesrout> rubbish collection is much less necessary when you have less rubbish to collect, yes.

00:42 <mrvn> check if the address is in a heap of malloc/free at all

00:42 <mrvn> locking to protect against other threads

00:43 <clever> now i'm wondering how lk does things...

00:43 <clever> all of my builds are using miniheap

00:43 <mrvn> doesn't glibc also have extra heaps for small objects?

00:43 <clever> https://github.com/littlekernel/lk/blob/master/lib/heap/miniheap/miniheap.c

00:43 <bslsk05> github.com: lk/miniheap.c at master · littlekernel/lk · GitHub

00:44 <milesrout> the glibc malloc/free implementation needs to work for basically every program out there. of course you could come up with an algorithm that will work better for your program, but can you come up with an algorithm that will work better for every program? thankfully you don't need to when you're writing your own allocator

00:44 <mrvn> milesrout: I can, others have too.

00:44 eddof13 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

00:44 <milesrout> there are good general purpose allocators out there and some of them may even be better than glibc malloc, i won't deny that

00:45 <mrvn> milesrout: especially in C++ you can gain a lot for allocations of known size.

00:45 nyah has quit [Ping timeout: 245 seconds]

00:45 <milesrout> but often you see people say things like "oh well I replaced malloc with my own memory allocator for this program and it's faster therefore glibc malloc is bad" which is obviously not true

00:45 <clever> lets see, miniheap has a linked list of free space, and initially just adds the entire defined heap range as one free chunk

00:45 <clever> heap_insert_free_chunk also deals with merging with neighboring chunks automatically

00:45 <milesrout> plus the whole malloc/free interface where free doesn't take a size is inherently less efficient

00:46 <mrvn> milesrout: not really. You can't trust the size given by bad code anyway.

00:47 <milesrout> and that's another reason: the standard library allocator needs to be more defensive, while most custom allocators are less defensively written :)

00:47 <clever> size_t is usually the same size as void* right?

00:48 <mrvn> clever: usually but not necessarily

00:48 <clever> LK's miniheap uses a 3 pointer wide struct to track free space, and just casts the ptr of your free'd object into that i think

00:49 <clever> so that implies objects have a minimum size of 3 pointers

00:49 <clever> reading more...

00:49 <mrvn> clever: isn't it 3 pointer + canary? Or does it not care about alignment rules?

00:49 <klange> I need to improve my malloc, it is very bad at freeing. It doesn't release memory at all.

00:49 <klange> It's pretty good on benchmarks otherwise.

00:50 <milesrout> to be honest that is probably a good thing for some programs...

00:51 <mrvn> clever: You have to store the size of every block before the block and looking at x86 memory must be 16 byte aligned. So a minimum block would be 32 bytes where 16 bytes are usable.

00:51 <milesrout> there are some C++ programs out there that hang when they quit because they call delete a million times on all those RAII objects only for the kernel to free all that memory when they quit anyway

00:51 <klange> It's surprisingly less important to release memory from free'd objects than one might think, but I have a lot of potentially long-running stuff that could end up holding onto megabytes of RAM it's not using and I want to deal with that.

00:51 <mrvn> milesrout: if they have RAII then they do something more than free in the destructor.

00:52 <milesrout> yes RAII manages resources other than memory

00:52 <mrvn> klange: I don't free memory, I buy more. Works a long long time.

00:52 <clever> mrvn: let me double-check what miniheap_alloc does exactly...

00:52 <mrvn> void *miniheap_alloc(size_t size, unsigned int alignment) {

00:52 <milesrout> they also call close on file descriptors that will be closed automatically by the kernel on process exit too :P of course SOME things do need to be done even on quit, like deleting lock files

00:53 <mrvn> clever: looks like it can place chars closer together than avx512 registers.

00:53 <klange> It's not that I don't have a free(), I have a very reasonable free(), and if your allocation patterns involve lots of the same size stuff my malloc is _excellent_.

00:53 <milesrout> but most of what is done in the average destructor of a C++ object doesn't actually need to be done atexit

00:53 <mrvn> milesrout: AmigaOS does none of that. Write good code and close/free everything properly please.

00:53 <klange> It's when you ask for a big thing... then ask for a _bigger_ thing, then free the smaller thing, and then never ask for anything of similar size again that there's problems.

00:53 <klange> Which I do sometimes when... resizing windows!

00:54 <klange> Some merge + divide for large allocations will help, too.

00:54 <milesrout> yeah no thanks, i have no interest in my processes literally hanging my computer when I quit them for seconds so they can do a whole lot of stuff that will be done anyway, just for the sake of the possibility of it one day being ported to a long-dead operating system that is fundamentally broken

00:54 <mrvn> milesrout: the average destructor in c++ does nothing. Destructors that do something are the exception.

00:54 <clever> mrvn: ah, line 58/63/64, neat, the heap will just omit all of the safety fields if you build without debug support

00:55 <milesrout> i am including in the work done by the destructor the automatic destruction of all the object's members

00:55 <clever> so, assuming no debug, that is 2 pointers of overhead prepended to each object, with debug it becomes ~5 pointers?

00:55 <mrvn> clever: what's the void *ptr for?

00:55 <clever> ah, but certain debug also adds a 64 byte padding between objects

00:55 <mrvn> clever: previous block?

00:56 <clever> mrvn: i think thats the hole its inspecting, and possibly returning back to the user

00:58 <mrvn> clever: no, the ptr is for alignment. It gets a chunk, adds a bit to get aligned and then ptr points back to the start of the chunk: line 253

00:59 Matt|home has quit [Quit: Leaving]

01:00 <mrvn> Line 118 is the merging code. bad bad bad

01:00 <moon-child> mrvn: re nursery, I wish I could tell gc when I blit, and it can tune the nursery size to 99%ile (or w/e) per-frame allocation size

01:01 <moon-child> combine with pretenuring and you get the same performance as manual regions, and way better safety

01:01 <moon-child> same applies to e.g. web programming

01:01 <mrvn> moon-child: bigger doesn't hurt. It knows how much is used.

01:02 <moon-child> it doesn't know when the ticks happen, though. Best-case scenario, gc happens every tick, and copies almost no objects between semispaces

01:02 <mrvn> moon-child: In ocaml the GC has a public interface with stats. You can read out the amount of allocations each frame and calculate the delta. Then set the nursery to 1.5*delta.

01:02 <moon-child> if you gc in the middle of a frame, you're gonna do a bunch of extraneous copying

01:02 mahmutov has quit [Ping timeout: 240 seconds]

01:02 <moon-child> ah that's cool

01:02 * moon-child hasn't really used ocaml

01:02 <mrvn> moon-child: and you call Gc.minor () to force a nursery collection

01:03 <mrvn> I think such a public interface to the GC is essential if you want to tune it dynamically to a game.

01:03 <clever> mrvn: ah, line 183 is a bit tricky, if debug is off, the alloc_struct_begin is only 2 pointers in size, but free_heap_chunk is 3 pointers in size, so an malloc(1) gets bumped up to malloc(sizeof(void*))

01:05 <mrvn> pathological case. You have to alloc something less than 4, set alignment to less than 4 and the roundup to sizeof(void*) needs to not increase the size. So the only case I can see where free_heap_chunk won't fit is malloc(0, 1)

01:06 <mrvn> s/4/8/ for 64bit archs.

01:06 <clever> another topic, is mixing both malloc and relocatable heaps

01:06 <clever> palmos and the official rpi firmware both do that

01:06 <mrvn> what else would you do?

01:06 <clever> small objects go to the normal malloc heap

01:07 <clever> large objects go to a special relocatable heap, where alloc returns an opaque token, not an addr

01:07 <clever> lock returns the current addr, and unlock unlocks!

01:07 <clever> while unlocked, the os is free to move objects around, to fix free space fragmentation
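A sketch of the handle-based relocatable heap clever describes: allocation returns an opaque token, `lock` yields the current address, and compaction may slide any unlocked block. The arena size, handle table and function names are assumptions, not PalmOS's or the RPi firmware's API. This compactor keeps locked blocks pinned in place, so free space below a pinned block is reclaimed but the block itself never moves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_BYTES 1024
#define MAX_HANDLES 16

typedef int handle;                       /* opaque token, not an address */

static unsigned char arena[ARENA_BYTES];
static struct { size_t off, size; int locks, used; } table[MAX_HANDLES];
static size_t arena_top;

static handle rh_alloc(size_t size) {
    if (size == 0 || arena_top + size > ARENA_BYTES) return -1;
    for (handle h = 0; h < MAX_HANDLES; h++) {
        if (!table[h].used) {
            table[h].off = arena_top; table[h].size = size;
            table[h].locks = 0; table[h].used = 1;
            arena_top += size;
            return h;
        }
    }
    return -1;
}

static void *rh_lock(handle h)   { table[h].locks++; return arena + table[h].off; }
static void  rh_unlock(handle h) { assert(table[h].locks > 0); table[h].locks--; }
static void  rh_free(handle h)   { assert(table[h].locks == 0); table[h].used = 0; }

/* Slide unlocked blocks down over freed space, in address order. */
static void rh_compact(void) {
    size_t dst = 0;
    for (;;) {
        int next = -1;                    /* lowest-addressed live block not yet placed */
        for (int h = 0; h < MAX_HANDLES; h++)
            if (table[h].used && table[h].off >= dst &&
                (next < 0 || table[h].off < table[next].off))
                next = h;
        if (next < 0) break;
        if (table[next].locks == 0 && table[next].off != dst) {
            memmove(arena + dst, arena + table[next].off, table[next].size);
            table[next].off = dst;        /* only the table changes; the handle stays valid */
        }
        dst = table[next].off + table[next].size;
    }
    arena_top = dst;
}
```

Because callers only ever hold handles between lock/unlock pairs, the compactor never has to find and rewrite user pointers. That is also why, as mrvn notes next, the data inside a relocatable block must not contain raw pointers.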

01:07 <moon-child> huh neat

01:07 <mrvn> oh, that's a totally different interface then.

01:07 eddof13 has joined #osdev

01:07 <mrvn> so relocatable objects must not have any pointers, only indexes into arrays.

01:08 <clever> yeah

01:08 <clever> its typically used for image buffers

01:08 <clever> the pointers and object-tokens are all in the standard malloc heap, where things are small

01:08 <clever> the relocatable heap then holds multi-mb images

01:08 <moon-child> lock/unlock also smells a lot like gpu apis

01:08 <moon-child> so makes sense

01:09 <clever> palmos also did it, because there was no mmu, and every active process had to share an address space

01:09 <mrvn> glibc malloc uses mmap at a certain size giving you individually mapped regions of memory.

01:09 <clever> mrvn: but the VPU on the rpi also lacks an mmu, so it cant just cobble random pages together in a random order

01:09 <mrvn> In functional languages you have copying/compacting GCs where the GC will move objects around and fix up all the pointers automatically. That's really neat.

01:10 <moon-child> (also java)

01:12 <mrvn> Going back to miniheap. I would add a back pointer to allocations so that on free() you can access the chunk before and after and do an O(1) merge.

01:12 <clever> ah, now that i look at it, yeah, ouch

01:13 <clever> the header on any object, only has a ptr and a size, but what is that ptr for...

01:13 <moon-child> mrvn: funny, I just rewrote my allocator, using that design

01:13 <clever> line 248...

01:14 <clever> ptr is just "chunk" ...

01:14 <mrvn> clever: to undo the alignment.

01:14 <clever> ah

01:14 <clever> is the alignment always going to just waste enough space to align?

01:15 <mrvn> Looks like it. So alignment=4096 wastes a lot of space

01:15 <mrvn> it should check if the hole at the front is large enough for struct free_heap_chunk

01:15 <clever> i would have chosen to slice a free-chunk up, so it ends at alignment-sizeof(alloc_struct_begin)

01:15 <mrvn> size = ROUNDUP(size, sizeof(void *));

01:15 <clever> so the unused space remains free
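A sketch of the placement arithmetic clever and mrvn are converging on: align the payload, and if that leaves a gap in front of the header, make sure the gap is either empty or large enough to stay on the free list as its own chunk. The header and minimum-chunk sizes below are hypothetical stand-ins loosely modelled on the 2- and 3-pointer structs discussed above, not miniheap's real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout constants (not LK's actual values). */
#define HDR_SIZE       (2 * sizeof(void *))   /* allocation header before the payload */
#define MIN_FREE_CHUNK (3 * sizeof(void *))   /* smallest hole that can hold a free-list node */

static uintptr_t align_up(uintptr_t x, uintptr_t a) { return (x + a - 1) & ~(a - 1); }

/* Place an aligned allocation inside the free chunk [start, start + len).
   The gap in front of the header is either zero or big enough to be kept as
   a free chunk, so no space is silently lost. Returns false if it won't fit. */
static bool place_aligned(uintptr_t start, size_t len, size_t size, size_t align,
                          uintptr_t *payload, size_t *front_gap) {
    assert(align != 0 && (align & (align - 1)) == 0);   /* power of two */
    uintptr_t p = align_up(start + HDR_SIZE, align);
    while (p - HDR_SIZE - start != 0 && p - HDR_SIZE - start < MIN_FREE_CHUNK)
        p += align;                       /* gap too small to track: try the next slot */
    if (p + size > start + len) return false;
    *payload = p;
    *front_gap = p - HDR_SIZE - start;
    return true;
}
```

The caller would then split the original chunk into "front gap (free)", "header + payload (used)" and "tail (free)", which is the slicing clever suggests instead of wasting the alignment padding.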

01:16 <mrvn> The free chunks are 4/8 byte aligned but memory is 16 byte aligned per default.

01:17 <clever> the 3d code is the only time ive had to align things at runtime: https://github.com/librerpi/lk-overlay/blob/master/platform/bcm28xx/v3d/v3d.c

01:17 <bslsk05> github.com: lk-overlay/v3d.c at master · librerpi/lk-overlay · GitHub

01:17 pretty_dumm_guy has joined #osdev

01:17 <clever> memalign() is the keyword to search for

01:18 <clever> 16 and 256 are my only alignment requirements at runtime currently

01:18 <mrvn> the code is odd. Check out line 193-199

01:18 <clever> the shader (line 143) must also be 32 byte aligned, but the linker can do that

01:18 FatalNIX has quit [Quit: Lost terminal]

01:18 <clever> the binner and render control-lists are a custom bytecode

01:18 <mrvn> Without alignment the size is rounded up to sizeof(void*). With alignment it's at least 16 bytes.

01:18 <Jari--> good morning friends

01:19 <clever> and thats what the original hello-world did to generate it

01:19 <clever> mrvn: https://docs.broadcom.com/doc/12358545 page 65

01:19 <mrvn> Then on line 245 the alignment happens.

01:20 <clever> line 198 of v3d.c, you have a 112, so you look that up in the pdf, page 71

01:20 <Jari--> Enjoy OSDEV: Physical Memory Mapped I/O Memory Space Windows !!!

01:20 <mrvn> So miniheap_alloc(32, 0) will give memory that is 4/8 byte aligned and miniheap_alloc(32, 1) will give memory that is 16 byte aligned and wastes 16 extra bytes.

01:20 <clever> opcode 112 is "tile binning mode configuration", it is then followed by 120 bits of config (15 bytes)

01:21 <Jari--> Can you talk to the PCI bridge to realign the memory mapped devices to some other more proper memory space?

01:21 <Jari--> Do today's display adapters support 32-bit PCs?

01:22 <mrvn> Jari--: yes and no

01:22 <Jari--> mrvn, would you draw filled polygons with HW acceleration or actually with directly poking to the VGA RAM?

01:22 <Jari--> Whether which is faster.

01:23 <mrvn> yes

01:23 <clever> mrvn: i could use a struct with bit-fields, to create that entire opcode-112, and its 120 bits of payload, but is there then any elegant way to dynamically create a blob containing a random mix of structs?

01:23 <Jari--> mrvn, so many Radeons on the market.. buying a VGA driver for Radeon would make so many devices functional

01:23 <mrvn> clever: a factory returning a pointer to a base type.

01:24 <mrvn> clever: or templated factory

01:25 <clever> mrvn: but how would i create a byte-array, which just has the raw bytes of 4 different structs concated back2back, where the types are only known at runtime?

01:25 <clever> https://github.com/librerpi/lk-overlay/blob/master/platform/bcm28xx/v3d/v3d.c#L302-L330

01:25 <bslsk05> github.com: lk-overlay/v3d.c at master · librerpi/lk-overlay · GitHub

01:25 <mrvn> by returning a pair of pointer and size.

01:25 <mrvn> or std::string

01:26 <clever> in this example, i need a 7 byte object (line 303), then a variable number of 3byte + 5byte objects + 1byte objects

01:27 <mrvn> should that be returned by value or emplaced in some bigger arrays?

01:27 <clever> emplaced into a bigger array, so you dont waste memory bandwidth copying it there later

01:28 <mrvn> so pass in the pointer to the current position as reference and modify to point after the opcode on exit.

01:28 <clever> and ideally, that bigger array should have some pre-allocation logic, like std::string does

01:28 <mrvn> void emplace(void **pos, enum Op op, ...) { }

01:28 <clever> yep, and thats exactly what the addbyte(&p, 123) is doing

01:29 <clever> about all i would be changing, is making it into a nicer tileBinningConfig(&p, arg1, arg2, arg3);

01:29 <mrvn> If you want it safe add a pointer to the end or use a struct { void *start, *pos, *end; }

01:30 <clever> i can just increment by sizeof(foo) i think

01:30 <mrvn> basically a bounds checked iterator of the big array.
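A sketch of the bounds-checked cursor mrvn suggests, in the spirit of clever's `addbyte(&p, 123)`. Multi-byte fields are written one byte at a time, so struct padding and unaligned stores never come into it. Assumptions: little-endian field encoding and a sticky overflow flag instead of aborting. Only the NOP opcode value (1) comes from the discussion. `cl_cursor` and the `cl_*` helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bounds-checked cursor into a control-list buffer: { start, pos, end }. */
typedef struct {
    uint8_t *start, *pos, *end;
    bool overflow;              /* sticky: set if any write would run past `end` */
} cl_cursor;

static cl_cursor cl_init(uint8_t *buf, size_t len) {
    return (cl_cursor){ buf, buf, buf + len, false };
}

static void cl_u8(cl_cursor *c, uint8_t v) {
    if (c->pos >= c->end) { c->overflow = true; return; }
    *c->pos++ = v;
}

/* Little-endian multi-byte fields (assumed encoding), emitted byte by byte. */
static void cl_u16(cl_cursor *c, uint16_t v) { cl_u8(c, v & 0xff); cl_u8(c, v >> 8); }
static void cl_u32(cl_cursor *c, uint32_t v) { cl_u16(c, v & 0xffff); cl_u16(c, v >> 16); }

static void cl_nop(cl_cursor *c) { cl_u8(c, 1); }   /* opcode 1 is a NOP */

static size_t cl_used(const cl_cursor *c) { return (size_t)(c->pos - c->start); }
```

An opcode-specific helper such as clever's proposed `tileBinningConfig(&p, ...)` would then be a short sequence of `cl_u8`/`cl_u32` calls, with sub-byte fields combined by shifting before they are emitted.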

01:30 <clever> in its current state, the only thing that is actually dynamic though, is the resolution

01:30 wolfshappen has joined #osdev

01:31 <clever> but if i want support for a variable number of shaders, or changing shader type, it will get more complex

01:31 <mrvn> If you have a struct Op121 { uint8_t op{121}; uint16_t arg1{0}; ...} then watch the padding and alignment.

01:32 <clever> the hardware doesnt expect any special alignment on these, it just uses however many bytes it uses

01:32 <clever> but opcode-1 is a nop, so it could be used as padding

01:32 <mrvn> but the compiler will align it and pad it. and BOOM

01:32 <mrvn> probably a use case for attribute packed.
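A small demonstration of the padding problem and the `packed` fix, using a hypothetical 1-byte opcode followed by a 16-bit argument. `__attribute__((packed))` is a GCC/Clang extension. Packing removes the pad byte but does not fix endianness, and taking the address of a packed member can yield a misaligned pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command: 1-byte opcode followed by a 16-bit argument. */
struct op_natural {
    uint8_t  op;
    uint16_t arg1;     /* aligned to _Alignof(uint16_t): a pad byte follows `op` */
};

struct __attribute__((packed)) op_packed {   /* GCC/Clang extension */
    uint8_t  op;
    uint16_t arg1;     /* sits at offset 1, possibly misaligned */
};

_Static_assert(sizeof(struct op_packed) == 3, "packed layout matches a 3-byte wire format");
```

For the V3D records clever links, several fields are only 1 or 2 bits wide, so even a packed struct would still need shifting for those; emitting bytes by hand sidesteps both issues.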

01:32 <clever> i could memset the entire buffer to 1 initially, so the padding turns into nop's

01:32 <clever> or that

01:33 <mrvn> The above would have 1 byte padding between op and arg1. That can't work.

01:33 <clever> mrvn: but look at page 71 of https://docs.broadcom.com/doc/12358545

01:34 <clever> youll see that its not actually a big 120-bit blob, but rather, a 32, 32, 32, 8, 8, 1, 1, 1, 2, 2, and 1bit field

01:34 <mrvn> I would just make helper function to emplace the different opcodes with nice argument lists and then place each byte into the stream manually.

01:34 <clever> yeah

01:34 <mrvn> Unless you are going to read out those lists a lot it's just not worth it.

01:35 <clever> these control lists are write-only

01:35 <clever> once i create the list, i pass it off to hardware, wait for an irq, then free it

01:35 <mrvn> You can define bitfields internally to move the arguments into place but just shifting things around is probably simpler.

01:36 <clever> ah, correction, i dont free it, i just overwrite it on the next frame

01:38 <mrvn> some years back I tried defining bitfields for the ARM page tables and MMIO registers for peripherals. But the compiler wouldn't optimize them nicely so that setting a bunch of bits would get combined into a single 32bit write all the time.

01:38 <mrvn> Just shifting and masking gets optimized all the time.

01:40 <clever> that topic also came up on the subject of MMIO a few days ago

01:40 <clever> some of the more sensitive registers, require you to OR 0x5a000000 into every write you do

01:40 <clever> if you try to modify a single bit using bitfields, the read/modify/write wont put the 5a back in

01:40 wolfshappen_ has joined #osdev

01:40 <clever> so the hardware then ignores the write entirely
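A sketch of a read-modify-write helper that re-applies the password on every store, which a plain bitfield assignment would not do. Assumption: the password occupies bits 31:24, matching the 0x5a000000 value clever quotes. Whatever that field reads back as is masked off before the password is ORed in. `cm_modify` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Password in the top byte (assumed bits 31:24). */
#define CM_PASSWD      0x5a000000u
#define CM_PASSWD_MASK 0xff000000u

/* Read-modify-write a password-protected register. */
static void cm_modify(volatile uint32_t *reg, uint32_t clear_bits, uint32_t set_bits) {
    uint32_t v = *reg;                     /* single volatile read */
    v &= ~(clear_bits | CM_PASSWD_MASK);   /* drop the requested bits and the old top byte */
    v |= set_bits & ~CM_PASSWD_MASK;
    *reg = v | CM_PASSWD;                  /* single volatile write, password included */
}
```

Doing the whole update in one local variable also gives exactly one read and one write, which addresses mrvn's point that volatile bitfield accesses cannot be combined.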

01:40 wolfshappen has quit [Ping timeout: 256 seconds]

01:41 <mrvn> yeah, but there we talked about the need to volatile them. volatile kills any read/write combining.

01:41 <clever> fuchsia i think, uses a c++ class system, where you control the load/store

01:41 <clever> so you can load from mmio->local var, but then use setters/getters to mutate it

01:42 <clever> and then store it back to mmio when you choose to

01:43 <mrvn> I have an abstraction layer for that somewhere using templates. I have Bit<i>, Bits<i0,i1,i2,i3,...>, MBZ, MBO, ... templates and a register is a collection of those. That then all gets shifted, masked, ored together with a volatile read/write for hardware access.

01:44 <geist> yah that's basically the scheme that fuchsia uses

01:44 <clever> https://github.com/librerpi/rpi-open-firmware/blob/master/common/broadcom/bcm2708_chip/cpr_clkman.h#L815-L848

01:44 <bslsk05> github.com: rpi-open-firmware/cpr_clkman.h at master · librerpi/rpi-open-firmware · GitHub

01:44 <geist> it's like MakeReg().GetValue().SetBit().ClearBit().SetField().Write();

01:44 <clever> the originally released rpi headers, just use a big heap of #define's

01:44 <geist> i dont particularly like it, but it does control precisely when you read/write the register

01:44 <clever> to give you a starting bit, and some pre-made bit masks

01:45 <mrvn> The Bit and Bits types are views into the register. There isn't actually a local copy of the value unless you make one and every access picks out the right bits.

01:45 <clever> so you have 0xfffff9ff to clear a value, 0x00000600 to select the value, and 9 to << or >> for inserting or extracting

01:45 <clever> all named as CLR, SET, and LSB

01:46 <clever> but you then need to: CM_GP1CTL = (CM_GP1CTL & CM_GP1CTL_MASH_CLR) | ((mash << CM_GP1CTL_MASH_LSB) & CM_GP1CTL_MASH_SET);

01:46 <clever> and now it gets ugly :P
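The repetitive expression above can be hidden behind a pair of helpers that take the CLR/SET/LSB triple. The mask values below are the ones quoted in the chat (a 2-bit field at bits 10:9); the `DEMO_MASH` prefix and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted above: clear mask, select mask, and shift for a 2-bit field. */
#define DEMO_MASH_CLR 0xfffff9ffu
#define DEMO_MASH_SET 0x00000600u
#define DEMO_MASH_LSB 9

static inline uint32_t field_insert(uint32_t reg, uint32_t clr, uint32_t set,
                                    unsigned lsb, uint32_t value) {
    return (reg & clr) | ((value << lsb) & set);   /* out-of-range bits are masked off */
}

static inline uint32_t field_extract(uint32_t reg, uint32_t set, unsigned lsb) {
    return (reg & set) >> lsb;
}

/* Token pasting picks up the *_CLR / *_SET / *_LSB names for a field prefix. */
#define FIELD_INSERT(reg, F, v) field_insert((reg), F##_CLR, F##_SET, F##_LSB, (v))
#define FIELD_EXTRACT(reg, F)   field_extract((reg), F##_SET, F##_LSB)
```

The call site then reads as `r = FIELD_INSERT(r, DEMO_MASH, mash);`. For a password-protected clock register the password still has to be ORed in on the final store.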

01:47 <mrvn> geist: how about: Reg << ~GLOBAL << PRESENT << KERNEL_RO;

01:47 <mrvn> streams get combined into one read, modify, write.

01:49 <mrvn> With your code what happens if you forget the .Write() at the end?

01:49 <mrvn> [[nodiscard]] error?

01:50 wolfshappen_ has quit [Ping timeout: 256 seconds]

01:50 ElectronApps has joined #osdev

01:51 <mrvn> clever: those masks and shifts break down on x86_64 with pagetables where the address is split up and stored at different locations.

01:52 <clever> yeah

01:53 Matt|home has joined #osdev

01:53 <mrvn> for x86_64 I had Bit, Bits, Span and Spans

01:54 <mrvn> Bits being random bits, Span a sequence of bits and Spans random Span combined into one value.

01:56 <mrvn> But with what geist mentioned the other day about needing to have barriers between access to different peripherals I think I will refactor my code to use a lock_guard approach. First you lock a peripheral into use and then registers if you want multiple accesses to a reg.

01:57 <clever> mrvn: thats a thing on the bcm2835/pi0/pi1, something about how the arm axi master cant deal with replies coming back out of order

01:58 <mrvn> clever: not just there. On those you get corrupt reads/writes, which is worse. But in general you get reordering of writes.

01:58 <clever> my understanding of the bug, is that the re-ordering causes values to be swapped

01:58 <mrvn> yes, that was the big bug.

01:59 <clever> i assume the pi2 and up has fixed it, because with SMP, you cant control what the other cores are doing

02:00 <mrvn> But on any ARM the peripherals on the AXI are in a tree as geist described it. When you write, the data goes to the root and then gets pushed onto the right child along the tree. Now say you write 20 things to the UART: they all get pushed down the left side of the root AXI node and backlog there. Then you write to the GIC which goes to the left of the root AXI node and gets written out immediately.

02:01 <clever> yeah

02:01 <mrvn> aeh, GIC on the right side.

02:01 <clever> same even happens in the rp2040 MCU

02:01 <mrvn> Each register does get the correct content (unlike with the bcm2835/pi0/pi1 bug) but the timing is out-of-order.

02:01 <clever> to limit the size of the axi's matrix, all of the "slow" peripherals are on a second axi tree, under a single slave port

02:02 <clever> and because of how slow uart is, you only need a write every n clocks

02:02 <clever> so contention between multiple masters fighting over that 1 slave, isnt as much of an issue

02:03 <mrvn> Now consider this: You turn off IRQs in the UART. Then you turn on IRQs in the GIC. If the GIC gets the write first you might suddenly get an IRQ from the UART because it hasn't turned off yet.
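One common way to close that window is to read back from the UART after the masking write, then barrier before touching the GIC. Accesses to the same device stay in order, so the read cannot complete until the write has landed. The barrier keeps the GIC write behind that read. The register pointers and the IRQ bit below are hypothetical placeholders, and the non-ARM branch only exists so the sketch compiles elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical MMIO handles; on real hardware these map device memory. */
typedef struct {
    volatile uint32_t *uart_imsc;   /* UART interrupt mask register */
    volatile uint32_t *gic_enable;  /* GIC set-enable word covering the UART IRQ */
    uint32_t uart_irq_bit;
} irq_regs;

static inline void device_barrier(void) {
#if defined(__aarch64__) || defined(__ARM_ARCH_7A__)
    __asm__ volatile("dsb sy" ::: "memory");   /* wait for outstanding accesses */
#else
    __sync_synchronize();                      /* portable full fence for the sketch */
#endif
}

/* Mask UART interrupts and make sure that took effect before enabling the GIC. */
static void uart_quiet_then_enable_gic(const irq_regs *r) {
    *r->uart_imsc = 0;          /* mask every UART interrupt */
    (void)*r->uart_imsc;        /* read-back from the same peripheral flushes the write */
    device_barrier();           /* keep the GIC write behind the completed read */
    *r->gic_enable = r->uart_irq_bit;
}
```

This is the lock_guard-style ordering mrvn mentions earlier, reduced to a single transition between two peripherals.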

02:04 <clever> mrvn: https://i.imgur.com/9RQET8Z.png the rp2040 axi tree

02:05 <clever> the main crossbar, is a full 4:10, and can allow all 4 masters to be doing a transfer in the exact same clock cycle, as long as its to 4 different slaves

02:05 <clever> but that kind of thing is expensive, and making it a full 4:21 would probably be costly

02:06 <clever> so all of the "slow" stuff was shoved onto a secondary 1:11, far simpler with only 1 master

02:06 pretty_dumm_guy has quit [Quit: WeeChat 3.4]

02:06 <clever> but its still running at the same clock speed

02:06 <mrvn> And I challenge you to write multithreaded code that can access the same slave (other than APB Bridge) at the same time and not blow up

02:07 <clever> another kind of funky thing, is the flash xip, is on 2 slave ports

02:07 <clever> the second port under that other splitter, is only for the bulk flash->ram copying fifo

02:07 <mrvn> is one for boot and one for later?

02:08 <clever> the direct axi-slave, is the main XIP flash window, a 16mb address range, directly mapped to spi flash

02:08 <clever> that way, one master can have a cache hit on the XIP flash window (via the flash-xip block)

02:08 <clever> while a second master is doing a bulk copy from flash->ram, with the dedicated bulk-copy fifo (which has a lower priority, and wont trash the cache)

02:09 <clever> during boot, the XIP isnt enabled, the rom just drives the QSPI controller manually, to copy things into ram

02:10 <clever> there is also a priority system for the axi masters

02:10 <clever> so you can choose who winds when there is contention over a given slave

02:10 <mrvn> And flash doesn't change at runtime so reading either way and caching never gets a bad result.

02:10 <clever> wins*

02:10 <clever> flash can change at runtime, but you have to disable the XIP first

02:11 <clever> and there are routines to flush the cache as well

02:11 <clever> https://github.com/raspberrypi/pico-sdk/blob/master/src/rp2_common/hardware_flash/flash.c#L86-L106

02:11 <bslsk05> github.com: pico-sdk/flash.c at master · raspberrypi/pico-sdk · GitHub

02:11 <clever> __no_inline_not_in_flash_func tells the linker to place this function in ram, so it can survive XIP being turned off

02:12 <clever> flash_init_boot2_copyout will pre-copy the XIP configuration block from flash->ram

02:12 <clever> flash_exit_xip turns XIP off, so you can issue plain SPI commands to the chip

02:12 <clever> flash_range_program and flash_flush_cache do what the name says

02:13 <clever> flash_enable_xip_via_boot2 then turns xip back on, using the boot2 that was previously copied out
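The ordering clever walks through can be modelled with stand-in functions. This is a simplified sketch: the `stub_*` functions are hypothetical and only track state (the real pico-sdk path does a few more steps, and several of its routines are bootrom functions), but it shows why the boot2 copy has to be made before XIP is switched off, and why nothing may run from flash until XIP is back on.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy state standing in for the hardware. */
static bool xip_enabled = true;
static bool boot2_saved = false;
static uint8_t fake_flash[256];

static void stub_init_boot2_copyout(void) { boot2_saved = true; }
static void stub_exit_xip(void) { xip_enabled = false; }

static bool stub_range_program(uint32_t off, const uint8_t *data, size_t n)
{
    /* Plain SPI commands only work once XIP is off. */
    if (xip_enabled || off > sizeof fake_flash || n > sizeof fake_flash - off)
        return false;
    memcpy(fake_flash + off, data, n);
    return true;
}

static void stub_flush_cache(void) { /* real code invalidates the XIP cache */ }

static bool stub_enable_xip_via_boot2(void)
{
    if (!boot2_saved)       /* needs the copy taken while flash was readable */
        return false;
    xip_enabled = true;
    return true;
}

/* In the SDK this kind of function is placed in RAM
   (__no_inline_not_in_flash_func) because flash code cannot execute while
   XIP is off. */
bool program_flash_sketch(uint32_t off, const uint8_t *data, size_t n)
{
    stub_init_boot2_copyout();
    stub_exit_xip();
    bool ok = stub_range_program(off, data, n);
    stub_flush_cache();
    return stub_enable_xip_via_boot2() && ok;
}
```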

02:14 wolfshappen has joined #osdev

02:16 masoudd has joined #osdev

02:18 <clever> its getting late, i should get off to bed

02:20 kingoffrance has joined #osdev

02:21 <mrvn> it's past late, it's getting early.

02:21 <mrvn> n8

02:23 * kingoffrance stares at 10h ago 2h and 3.7G. I wonder if the code will ever compile or if the source is a halting problem.

02:23 <gog> noite

02:24 <kingoffrance> i can either check the logs and assume it finished, or assume it took 10h and is still going

02:36 eddof13 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

02:38 [itchyjunk] has quit [Ping timeout: 260 seconds]

02:38 <moon-child> AGHHHH

02:38 <moon-child> I thought __builtin_clz was polymorphic like __builtin_popcount and co

02:38 <moon-child> apparently nope

02:39 <moon-child> need to explicitly __builtin_clzll
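For the record, `__builtin_popcount` is not type-generic either: in GCC and Clang the plain versions of both families take `unsigned int`, and there are `l` and `ll` variants for wider types, so a 64-bit argument to the plain version is quietly truncated. A minimal sketch of width-safe wrappers, assuming `unsigned long long` is 64 bits (true on mainstream targets); the wrapper names are made up.

```c
#include <stdint.h>

/* __builtin_clz takes unsigned int, so a uint64_t argument would lose its
   high 32 bits; __builtin_clzll takes unsigned long long. Every clz builtin
   is undefined for 0, so 0 is handled explicitly. */
static inline int clz64(uint64_t x)
{
    return x ? __builtin_clzll((unsigned long long)x) : 64;
}

/* Same pattern for population count. */
static inline int popcount64(uint64_t x)
{
    return __builtin_popcountll((unsigned long long)x);
}
```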

02:39 [itchyjunk] has joined #osdev

02:47 <gog> consistent interfaces? idk what that is

02:47 <gog> :D

02:50 [itchyjunk] has quit [Ping timeout: 240 seconds]

02:57 [itchyjunk] has joined #osdev

04:27 Oli has quit [Ping timeout: 272 seconds]

04:28 <Clockface> i was thinking that libs could "hook" a string in my kernel, where a program submits a string, and if anything "hooked" it, the program is returned the address to call to access that library

04:28 <Clockface> is this a decent way of doing things?

04:29 Oli has joined #osdev

04:29 <Clockface> how do dynamically linked libs normally tell the program what to call?

04:30 <moon-child> Clockface: why do you want to be able to do that?

04:31 <Clockface> i want to get a way to build my kernel out of modules early on

04:31 <Clockface> since a modular kernel sounds nice

04:31 <Clockface> and its about time i decide on a way to link stuff together

04:32 <moon-child> a kernel module is not generally considered a program

04:33 <moon-child> do you want to use this as a mechanism for kernel modules to communicate with each other, or for userspace to communicate with userspace, or for userspace to communicate with the kernel?

04:33 <Clockface> kernel-kernel

04:34 <Clockface> module calls module manager, gives it a pointer to "mydependancy"

04:34 <Clockface> mydependancy is hooked to the string "mydependancy"

04:34 <Clockface> so it is returned the address of mydependancy

04:34 <Clockface> so it can call it

04:34 <moon-child> ok. So the idea is that 2 modules can independently define an interface for communication, which the kernel knows nothing about?

04:35 <Clockface> yeah

04:36 <Clockface> the programmer of the module defines how to work with it

04:36 <moon-child> ok. That is doable, but somewhat fraught

04:36 <moon-child> basically recreating the issue of service discovery :P

04:36 <Clockface> exactly!

04:37 <moon-child> I think time has shown unified architecture works better

04:37 <moon-child> if you really want it, though, I suggest uuids rather than strings

04:37 <moon-child> (cf uefi)

04:37 <moon-child> to avoid collisions and permit versioning
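A minimal sketch of the module-to-module lookup being discussed, keyed by a 128-bit ID instead of a string. Everything here (`iface_id`, `iface_register`, `iface_lookup`, the fixed-size table) is hypothetical. Comparing all 16 bytes with `memcmp` means zero bytes inside an ID are harmless, and versioning falls out naturally: an incompatible revision of an interface simply gets a new ID.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } iface_id;

struct iface_entry {
    iface_id id;
    const void *ops;   /* e.g. a struct of function pointers */
};

#define MAX_IFACES 64
static struct iface_entry registry[MAX_IFACES];
static size_t registry_len;

/* Publish an interface. Returns 0 on success, -1 if the ID is already
   taken or the table is full. */
int iface_register(const iface_id *id, const void *ops)
{
    for (size_t i = 0; i < registry_len; i++)
        if (memcmp(&registry[i].id, id, sizeof *id) == 0)
            return -1;
    if (registry_len == MAX_IFACES)
        return -1;
    registry[registry_len].id = *id;
    registry[registry_len].ops = ops;
    registry_len++;
    return 0;
}

/* Look up an interface by ID; NULL if nobody registered it. */
const void *iface_lookup(const iface_id *id)
{
    for (size_t i = 0; i < registry_len; i++)
        if (memcmp(&registry[i].id, id, sizeof *id) == 0)
            return registry[i].ops;
    return NULL;
}
```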

04:37 <Clockface> im hoping strings result in fewer collisions

04:38 <Clockface> since then modules can be given a unique name

04:38 <Clockface> so 2 people dont decide on the same number for 2 different things

04:38 <moon-child> ermm no

04:38 <moon-child> rngs are very good. 128-bit uuid is not gonna have collisions

04:39 <moon-child> names will totally have collisions

04:39 <CompanionCube> there's a standard form of UUID, you can try those

04:39 <moon-child> (no one is picking uuids by hand :P)

04:39 <Clockface> oh cool

04:40 <CompanionCube> moon-child: iirc the GPT partition uuid for GRUB was hand picked

04:41 <moon-child> :<

04:42 vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

04:44 <Clockface> well, 2^128 possible ID's and one billion already existing unique things has an acceptably low collision chance of 1 in 340282366921000000000000000000 according to my calculator

04:44 <Clockface> im convinced

04:44 <CompanionCube> it's a memorable example because in ascii its "Hah!IdontNeedEFI"

04:45 vdamewood has joined #osdev

04:45 <Clockface> that has a suspect number of 0's in it

04:45 <Clockface> but its still massive

04:45 <Clockface> i dont need an exact number to be convinced

04:45 <moon-child> that's not quite right

04:45 <moon-child> due to birthday paradox it is lower

04:45 <moon-child> but still high enough

04:46 <Clockface> what is the birthday paradox?

04:47 <moon-child> it's not enough to consider the probability that I collide with an existing uuid when I generate a new one

04:48 <moon-child> you have to consider the probability that _any_ newly created uuid collides with _any_ existing uuid

04:48 <moon-child> it is so-called because of the apparently-paradoxical fact that, in a room of 23 people, there is a 50% chance that 2 of them have the same birthday

04:57 <moon-child> s/-//
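To put numbers on the birthday effect: the chance that at least two of n values drawn uniformly from N possibilities are equal is 1 - (1 - 0/N)(1 - 1/N)...(1 - (n-1)/N), and for huge N the union bound n(n-1)/(2N) is a tight upper bound. The sketch below checks the classic 23-person threshold and moon-child's point: for a billion random 128-bit IDs the all-pairs collision chance comes out around 1.5e-21, far higher than the single-new-ID figure above but still negligible. Function names are made up.

```c
/* Exact birthday-problem probability: chance that at least two of n values,
   each drawn uniformly from N possibilities, are equal. */
double collision_exact(unsigned n, double N)
{
    double p_all_distinct = 1.0;
    for (unsigned i = 0; i < n; i++)
        p_all_distinct *= (N - i) / N;
    return 1.0 - p_all_distinct;
}

/* Union bound: there are n(n-1)/2 pairs, each colliding with probability
   1/N. Very tight when the result is small, and cheap even for N = 2^128. */
double collision_bound(double n, double N)
{
    return n * (n - 1.0) / (2.0 * N);
}
```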

05:02 <Clockface> ah

05:02 <Clockface> well bigger number than i can count to on my fingers regardless

05:02 <Clockface> im convinced

05:08 <Clockface> acshually, i can do a hybrid approach

05:08 <Clockface> because everyone loves doing both solutions at the same time despite common sense!

05:09 <Clockface> i will SAY you should have it be a 16 byte long randomly generated integer

05:09 <Clockface> but it will be null terminated

05:09 <Clockface> so you can use a normal string as well if you're stupid

05:09 <Clockface> >:)

05:10 <gog> i can count to 1023 on my hands

05:10 <CompanionCube> inb4 instant null pointer exploit
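There is a concrete problem behind the jab: random bytes include zeros. A quick check of how often a 16-byte random ID contains at least one 0x00 byte, which would cut it short if it were handled as a NUL-terminated string. The answer is about 6% of the time. The function name is made up.

```c
/* Probability that k independent, uniformly random bytes include at least
   one 0x00 byte: 1 - (255/256)^k. */
double p_contains_nul(unsigned k)
{
    double p_no_zero = 1.0;
    for (unsigned i = 0; i < k; i++)
        p_no_zero *= 255.0 / 256.0;
    return 1.0 - p_no_zero;
}
```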

05:10 <gog> 1048575 if i use my feet too

05:11 <moon-child> Clockface: if you do not force people to not be stupid, they will be stupid

05:11 <moon-child> and if you do not care whether people are stupid, then it doesn't seem worth bothering at all

05:12 Oli has quit [Ping timeout: 272 seconds]

05:13 <Clockface> but some might be good too

05:13 xenos1984 has quit [Read error: Connection reset by peer]

05:13 <Clockface> its a hybrid approach

05:13 <Clockface> :)

05:14 <Clockface> hybrid and modular everything

05:14 <Clockface> but yes, thank you for suggesting UUID's

05:21 <Jari--> https://en.wikipedia.org/wiki/Phantom_OS

05:21 <bslsk05> en.wikipedia.org: Phantom OS - Wikipedia

05:23 <Jari--> "Persistence – Application code does not see OS restarts and could live forever—this makes the concept of a file obsolete and any variable or data structure could be stored forever and at the same time be available directly through a pointer. Differently from hibernation that is done in other OSs, persistence lies in the very core principles of the Phantom OS core. It is done transparently

05:23 <Jari--> for applications; in most cases it requires no reprogramming of an application. Persistence stays even if the computer crashes."

05:24 <gog> that's pretty cool

05:25 Oli has joined #osdev

05:25 <moon-child> EROS&co did something like that too iirc

05:26 <Jari--> It really changes lot of the implemntation too, right?

05:26 <Jari--> Application-wise.

05:27 <gog> said it's transparent

05:27 <moon-child> hmmm probably depends

05:27 <moon-child> like, say you wanna make a word processor

05:27 <moon-child> you no longer have to make a custom serialization format ... unless you wanna be able to send documents to other people

05:28 <moon-child> (this is why I go referentially transparent; no pointers, so you can redistribute all the objects with no problem)

05:28 <gog> and while i can think of a number of sticky points involved with restoring state from a poweroff, it seems no different than when a process' context is suspended during a task switch

05:28 <moon-child> I have also been thinking about durability guarantees wrt ipc/transactions. But I don't know if they aim for fault tolerance

05:31 xenos1984 has joined #osdev

05:32 <moon-child> (if you don't force sequentially consistent ordering of storage, and fault/restore, maybe app A says to app B 'can you give me a token' and app B says 'but I gave you the token already')

05:32 <Clockface> my goal for mine is that everything that interfaces with non-kernelspace is kind of like a DOS extender: no syscalls or interrupts are hooked unless you install something that does so

05:32 <moon-child> (presumably you maintain coherence within a given application, but it could be prohibitive to do so across multiple apps)

05:34 <Clockface> my DOS extender analogy is mostly intended to convey that the OS itself can yield complete control to the "extender", which runs programs of its own

05:35 <Clockface> everything runs in kernelspace until you run something to run userspace stuff

05:35 <Clockface> thats why i consider kernel modules a high priority

05:37 epony has quit [Read error: Connection reset by peer]

05:37 epony has joined #osdev

05:38 <gog> i'm working towards having modularity. its already feasible with a little more work

05:39 <Clockface> is it much trouble to port real mode assembly to 32 bit assembly?

05:40 <Jari--> Miss OS/2, but hellish, this is Microsoft's OS/2 I am using. Windows 10!

05:40 <Clockface> seems like it should not be, ill just be wasting some room in the registers

05:41 <Clockface> well to be honest, i plan on using virtual 8086 mode for the protected mode version, and my assembly interpreter for long mode

05:42 <Clockface> this is getting ugly

05:42 * Jari-- dreams his OS will be as stable as Windows 95 one day

05:42 <Jari--> Only ceremonial intended errors.

05:43 <Clockface> i plan on making windows look like a good option for a mainframe running something involving banking

05:43 <CompanionCube> huh?

05:43 <Clockface> windows: stable as a rock

05:43 <Clockface> comparatively at least

05:44 <Clockface> my design has slowly morphed into a 16/32/64 bit monstrosity

05:45 <Jari--> Clockface, remember FARPTR* ?

05:45 <Jari--> The joy of DOS.

05:45 <graphitemaster> Everyone talks about nearptr and farptr but no one has had to deal with hereptr or thereptr

05:46 * gog slaps roof of memory

05:46 <gog> this bad boy can hold so many pointers

05:48 <CompanionCube> for sheer reliability, something like VMScluster or IBM parallel sysplex would be interesting if probably v difficult.

05:48 <Clockface> i misread that as FARTPTR* admittedly

05:49 <Clockface> i proceeded to look that up to see if there was some sort of joke surrounding it

05:49 <graphitemaster> All farts have fartptrs, it's when someone points their finger and says "it was them"

05:50 <Clockface> lol

05:51 epony has quit [Remote host closed the connection]

05:52 <CompanionCube> huge fartptrs

05:52 <Jari--> enormous

05:54 <graphitemaster> stink up your code today, add a bunch of fartptrs

05:55 <graphitemaster> this bad boy has so many cache misses

05:56 <kingoffrance> hither thither whither hence thence....fseek() fseek() lseek() BUGS this document's use of whence is incorrect english, but is maintained for historical reasons. there is a whence...

05:56 <kingoffrance> "from what place" ...kind of like come from :)

05:56 <gog> from whence you came you shall remain until you are complete again

05:59 <kingoffrance> s/fseek/&o/

06:21 epony has joined #osdev

06:24 vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

07:04 FatAlbert has joined #osdev

07:04 <FatAlbert> what's the difference between stty and termios.h? i guess the latter provides more granularity?

07:05 <klange> stty is a utility, and termios.h is one of a couple of headers defining the values used in interfaces that stty, among other things, would use to configure terminal stuff

07:13 <FatAlbert> i know the definitions .. i guess most of the guys wouldn't write a C program to define terminal stuff unless they need a very specific set of values ( which i guess can't be handled with stty ) ?

07:16 gog has quit [Remote host closed the connection]

07:16 gog has joined #osdev

07:18 Jari-- has quit [Remote host closed the connection]

07:26 GeDaMo has joined #osdev

07:35 <klange> `stty` is generally implemented as little more than a call to tcgetattr+tcsetattr, maybe ioctl(..., TIOCGWINSZ, ...) and TIOCSWINSZ, and then some string parsing / generation to understand / explain settings
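A minimal sketch of the calls klange lists, assuming a POSIX system that also provides the widespread `TIOCGWINSZ` ioctl (not part of POSIX itself, but available on Linux and the BSDs). `set_echo` and `get_size` are made-up names; roughly, they correspond to `stty -echo`/`stty echo` and `stty size`.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <sys/ioctl.h>
#include <termios.h>
#include <unistd.h>

/* Read the current settings, flip one flag, write them back.
   Returns -1 with errno set (e.g. ENOTTY) if fd is not a terminal. */
int set_echo(int fd, int on)
{
    struct termios t;
    if (tcgetattr(fd, &t) == -1)
        return -1;
    if (on)
        t.c_lflag |= ECHO;
    else
        t.c_lflag &= ~(tcflag_t)ECHO;
    return tcsetattr(fd, TCSANOW, &t);
}

/* Ask the terminal driver for its window size in rows and columns. */
int get_size(int fd, unsigned short *rows, unsigned short *cols)
{
    struct winsize ws;
    if (ioctl(fd, TIOCGWINSZ, &ws) == -1)
        return -1;
    *rows = ws.ws_row;
    *cols = ws.ws_col;
    return 0;
}
```

Everything else in an `stty` implementation is mostly parsing flag names and printing the `termios` fields back out.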

07:38 <FatAlbert> gog: don't hear you ..

07:38 <gog> mew

07:39 <FatAlbert> that's my girl

07:39 <gog> o:

07:39 <klange> I used to ship minix's stty prior to 2018, for a "real" example: https://github.com/klange/toaruos/blob/v1.0.0/userspace/extra/stty.c#L132-L190

07:39 <bslsk05> github.com: toaruos/stty.c at v1.0.0 · klange/toaruos · GitHub

07:39 <klange> and my own implementation these days: https://github.com/klange/toaruos/blob/master/apps/stty.c

07:39 <bslsk05> github.com: toaruos/stty.c at master · klange/toaruos · GitHub

07:41 Oli has quit [Ping timeout: 272 seconds]

07:55 [itchyjunk] has quit [Read error: Connection reset by peer]

08:05 zaquest has quit [Remote host closed the connection]

08:07 zaquest has joined #osdev

08:12 mahmutov has joined #osdev

08:22 amazigh has quit [Ping timeout: 240 seconds]

08:31 amazigh has joined #osdev

08:32 epony has quit [Ping timeout: 240 seconds]

08:40 epony has joined #osdev

08:41 <klange> Another weekend has gone by in which I did a bunch of stuff _except_ the thing I was "planning" on doing.

08:43 <gog> don't feel bad, i planned on doing things and i ended up playing factorio

08:43 <gog> ¯\_(ツ)_/¯

08:43 <klange> At least I did do useful things....

08:44 <klange> Active window resizing, reimplemented signals, and made my userspace malloc actually release memory. All of these were related.

08:48 <FireFly> usually my weekends go by planning to deal with $personal_infra_technical_debt or work on $hobby_project or so, and just end up watching youtube videos and lazing about

08:48 <FireFly> so I mean, comparatively you're more productive :p

08:49 <klange> I also played a lot of pokémon

08:49 <FireFly> neat

08:49 <FireFly> this weekend for me is just trains

09:02 <moon-child> I reimplemented most of my allocator

09:02 <gog> choo choo

09:02 <moon-child> also played a bunch of celeste

09:02 <klange> I deleted a bunch of my allocator.

09:02 <moon-child> and watched hbomberguy video

09:03 <gog> the deus ex one?

09:03 <moon-child> yeah

09:03 <gog> same

09:03 <moon-child> well, I watched part of it

09:03 <moon-child> really long!

09:03 <gog> i watched all of it. i uh, i haven't slept lol

09:03 <moon-child> :D

09:04 <klange> > 3h33m33s

09:04 <FireFly> oh there's a new hbomberguy video?

09:04 <FireFly> cool I guess I have something to watch during the train ride :p

09:04 <FireFly> and now I'm even in the part of the ride with decent network uplink, heh

09:05 <gog> yeah it's about deus ex: human revolution

09:05 <moon-child> youtube-dl while you have the chance

09:05 <gog> primarily

09:05 * FireFly nods

09:05 <gog> but he talks about the original game a lot too which i appreciated because that was a good game

09:05 <FireFly> moon-child: yeah, fired up a ytdl, we'll see if it takes forever

09:05 <gog> i should reinstall it

09:06 <moon-child> I have never played any deus ex game

09:06 <gog> i highly recommend the original

09:06 <gog> very well made, top-tier worldbuilding

09:06 <moon-child> might take a look later. Though I look at the list of games I have yet to play and ... ...

09:07 <gog> yeah i have a whole steam library of games i have never even played. waiting for my wife to get sick of skyrim before i try to take a turn :p

09:08 <gog> she spent a solid 3 weeks playing new vegas nearly every moment of her free time lol

09:08 <gog> finally got bored of that, took a run at life is strange, then got into skyrim just as i thought she was getting gamed out

09:09 <moon-child> I tried to play skyrim and fallout, never could get too deeply into them

09:09 <moon-child> probably played maybe 10 hours of each

09:09 <moon-child> it was fun, but didn't really grab me

09:10 <gog> fair

09:11 <gog> idk what i want to play next. i wish rimworld had a non-steam version i could download because factorio does and it plays so much better without steam

09:14 <GeDaMo> «If you want to run RimWorld without Steam, I think you can delete 'SteamAPI.dll' from the game's directory, and all Steam functionality will cease.» https://old.reddit.com/r/RimWorld/comments/86z9yb/how_can_i_play_this_game_without_steam_and_offline/dw95ph1/

09:14 <bslsk05> old.reddit.com: DrCubed comments on How can I play this game without steam and offline

09:15 <gog> hm

09:17 <graphitemaster> whence should've been named relative_to

09:17 <graphitemaster> But the C people were terrified of underscores for some reason

09:17 <FireFly> it's on gog as well (heh), so should be playable without steam

09:18 <FireFly> I should play rimworld at some point, it's just that I know it'd nerdsnipe me and distract me for a while and idk if I want to spend the time for that yet

09:20 <gog> yeah it'll do that for sure

09:20 <gog> it's a very complicated game and it's frustrating and engaging and just a wild ride

09:26 Starfoxxes has quit [Ping timeout: 240 seconds]

09:37 Starfoxxes has joined #osdev

09:57 X-Scale` has joined #osdev

09:58 X-Scale has quit [Ping timeout: 256 seconds]

09:58 X-Scale` is now known as X-Scale

10:02 Starfoxxes has quit [Ping timeout: 240 seconds]

10:03 valerius_ has joined #osdev

10:22 <geist> been playing a bunch of horizon forbidden west

10:22 <geist> thus far a solid sequel to an already excellent original

10:30 <clever> geist: ive been reading the miniheap code, and the alignment thing bothers me a bit, what about cutting the free chunk into 3 chunks, a before/target/after, where target is aligned correctly and has no waste?

10:31 <clever> is the mini refering to the metadata or the .text size of the implementation?

10:31 <geist> uh

10:31 <geist> lemme see

10:32 <clever> https://github.com/littlekernel/lk/blob/master/lib/heap/miniheap/miniheap.c#L219

10:32 <bslsk05> github.com: lk/miniheap.c at master · littlekernel/lk · GitHub

10:32 <clever> at this point, it can turn a chunk that is too large, into a target+after pair, and put after back into the free_list

10:33 <geist> yes

10:33 <clever> but alignment works by just making the allocation too big (the alignment size), and then shifting the object within that allocated area, to align it

10:33 <geist> that's what it does

10:33 <geist> yes

10:34 <clever> what if you also shifted the alloc_struct_begin object as well, and put the waste from before it, into a new free_heap_chunk

10:34 <clever> and then you dont allocate more than you needed

10:34 <geist> yes
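clever's before/target/after idea, reduced to the address arithmetic. This is a hypothetical helper, not miniheap code: given a free chunk, it computes the aligned payload address and how much free space is left on each side of it. It uses `%` so that non-power-of-two alignments, like the 10 in clever's later example, also work.

```c
#include <stdint.h>

struct split {
    uintptr_t before_len;  /* free space left in front of the payload */
    uintptr_t target;      /* aligned payload address */
    uintptr_t after_len;   /* free space left behind the payload */
};

/* Carve `size` bytes at `align` out of the free chunk [start, start + len).
   Returns 0 and fills *out on success, -1 if the chunk is too small. */
int split_chunk(uintptr_t start, uintptr_t len, uintptr_t size,
                uintptr_t align, struct split *out)
{
    uintptr_t rem = start % align;
    uintptr_t target = rem ? start + (align - rem) : start;
    uintptr_t pad = target - start;
    if (pad > len || len - pad < size)
        return -1;
    out->before_len = pad;
    out->target = target;
    out->after_len = len - pad - size;
    return 0;
}
```

With clever's numbers (a 50-byte window starting at 35, a 40-byte object, 10-byte alignment) this gives a 5-byte piece on each side. In a real heap, a leftover piece smaller than a free-chunk header cannot go back on the free list and has to stay attached to the allocation.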

10:34 <geist> these are all things. yes.

10:34 <geist> miniheap is designed to be minimal. simple. does the basics, and nothing more

10:35 <geist> it's not optimal, but it's small and simple

10:35 <geist> and what you are talking about only really happens when using allocations with large alignments

10:35 <clever> yeah

10:35 <clever> 256 is the largest alignment i use

10:36 <geist> it simply overallocates and then aligns within the result. basically memalign
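The overallocate-and-align approach geist describes, sketched on top of plain `malloc` so it can run on its own. The helper names are hypothetical and this is not miniheap's code; in this version the pointer `malloc` returned is stashed just below the aligned block so it can be freed later. The up-to `align - 1` bytes skipped at the front, plus whatever is unused at the tail, are exactly the waste being discussed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* align must be a power of two; small values are raised to sizeof(void *). */
void *sketch_memalign(size_t align, size_t size)
{
    if (align < sizeof(void *))
        align = sizeof(void *);
    if (align & (align - 1))
        return NULL;                                /* not a power of two */
    if (size > SIZE_MAX - align - sizeof(void *))
        return NULL;                                /* request would overflow */
    unsigned char *raw = malloc(size + align - 1 + sizeof(void *));
    if (!raw)
        return NULL;
    uintptr_t p = (uintptr_t)(raw + sizeof(void *));
    p = (p + align - 1) & ~(uintptr_t)(align - 1);  /* round up to align */
    ((void **)p)[-1] = raw;                         /* remember malloc's pointer */
    return (void *)p;
}

void sketch_free_aligned(void *p)
{
    if (p)
        free(((void **)p)[-1]);
}
```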

10:36 <clever> and thats plenty of room to create a free_heap_chunk

10:36 <geist> yes.

10:36 <geist> it could, you are correct. but that's what the other heaps do better

10:36 <geist> cmpctmalloc and dlmalloc are more complex, and handle this sort of thing

10:37 <clever> and lk also has cmpctmalloc, i should study it as well, you did mention i should try changing heaps a while back

10:37 <geist> miniheap is designed to be extremely simple, not very much code gen

10:37 <geist> most of the 'big' arches default to cmpctmalloc

10:37 <geist> zircon actually uses it, almost unmodified

10:38 <clever> ah, i already see improvements only 20 lines in, multiple free lists!

10:40 <geist> miniheap is basically useful for lower end embedded stuff that doesn't really use a heap that much. maybe a handful of allocs and nothing more. on some cortex-m style environments it works great since it allocates little chunks from the novm in units of 1k or so

10:40 <geist> basically minimum overhead

10:40 <clever> ah yeah, i have seen convos on the rpi forums about the mcu line, and how you should basically never use malloc

10:40 <clever> statically allocate everything, then you cant run out of ram

10:41 <geist> yah

10:41 <clever> and my bootcode.bin case is even more restrictive, 128kb for me, 264kb for the rp2040 mcu

10:42 <clever> and i must share that limit between both data and code, so its more like 20kb

10:42 <clever> while the rp2040 has a full 16mb xip window into spi flash

10:43 <clever> i need to dig into l1/l2 stuff more, things get unstable if i try to do anything new

10:53 <clever> https://github.com/littlekernel/lk/blob/master/lib/heap/cmpctmalloc/cmpctmalloc.c#L64

10:53 <bslsk05> github.com: lk/cmpctmalloc.c at master · littlekernel/lk · GitHub

10:54 <clever> ah, and i can kinda see a trick here, ->left is the previous, but maybe this + ->size == next?
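clever's guess matches the usual boundary-tag trick. A simplified, hypothetical layout (not copied from cmpctmalloc): each header stores the chunk's total size, header included, plus a pointer to the left neighbour. The right neighbour needs no pointer, because it starts exactly `size` bytes after the header.

```c
#include <stddef.h>

typedef struct chunk_hdr {
    struct chunk_hdr *left;  /* previous chunk in address order, or NULL */
    size_t size;             /* total chunk size, header included */
} chunk_hdr;

/* The next chunk in address order is implied by the size field. */
static chunk_hdr *right_of(chunk_hdr *h)
{
    return (chunk_hdr *)((char *)h + h->size);
}

/* Lay out n >= 1 chunks of the given sizes back to back in an arena,
   linking each to its left neighbour. Sizes must be at least
   sizeof(chunk_hdr) and multiples of the header's alignment. */
chunk_hdr *carve(void *arena, const size_t *sizes, size_t n)
{
    chunk_hdr *prev = NULL;
    char *cur = arena;
    for (size_t i = 0; i < n; i++) {
        chunk_hdr *h = (chunk_hdr *)cur;
        h->left = prev;
        h->size = sizes[i];
        prev = h;
        cur += sizes[i];
    }
    return (chunk_hdr *)arena;
}
```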

10:55 <klange> After that conversation the other day, I'm thinking of putting some effort into improving my memory management systems.

11:00 <clever> geist: oh, another interesting thing i note, miniheap is using LK's linked list code, while cmpctmalloc is managing the prev/next directly

11:00 <clever> its more self-contained, and can work outside of lk

11:06 pretty_dumm_guy has joined #osdev

11:17 Burgundy has joined #osdev

11:46 <clever> https://github.com/littlekernel/lk/blob/master/lib/heap/cmpctmalloc/cmpctmalloc.c#L737-L763

11:46 <bslsk05> github.com: lk/cmpctmalloc.c at master · littlekernel/lk · GitHub

11:47 <clever> geist: ah, thats simpler than i was imagining, it over-allocates by both the alignment size, and the structs for managing free space, using the regular malloc

11:47 <clever> then it re-grabs the mutex, and messes with internal metadata, to cut it up into smaller pieces, after having allocated, and return the extra back to free

11:48 <clever> so instead of finding a hole that covers a region meeting my requirements, it just finds a hole thats too big, and then sub-divides it after allocating

12:04 ElectronApps has quit [Remote host closed the connection]

12:17 elastic_dog has quit [Ping timeout: 250 seconds]

12:23 elastic_dog has joined #osdev

12:34 Starfoxxes has joined #osdev

12:54 Burgundy has quit [Ping timeout: 272 seconds]

13:05 nyah has joined #osdev

13:06 elastic_dog has quit [Ping timeout: 252 seconds]

13:10 wootehfoot has joined #osdev

13:12 elastic_dog has joined #osdev

14:15 the_lanetly_052 has joined #osdev

14:17 the_lanetly_052_ has quit [Ping timeout: 272 seconds]

14:39 bauen1 has quit [Ping timeout: 256 seconds]

15:15 [itchyjunk] has joined #osdev

15:27 Oli has joined #osdev

15:29 <mrvn> geist: concerning miniheap I see some inconsistencies: 1) If you use alignment=0 then you get 4/8 byte aligned data because you always keep the size a multiple of void*. This is insufficient to do a double register load.

15:30 <mrvn> 2) If you ask for alignment then the alignment gets bumped up to 16. No way to get 8 byte aligned data. And alignment always wastes alignment many bytes no matter what.

15:31 <clever> mrvn: cmpctmalloc doesnt have that waste issue

15:33 <mrvn> 3) You DEBUG_ASSERT len % sizeof(void *) and len > sizeof(struct free_heap_chunk) but then in the code you deal with those cases.

15:35 <mrvn> 4) alignment should I think use ROUNDUP(size, alignment), not size += alignment.

15:36 <clever> mrvn: if the user wants a 40 byte object, that is 10 byte aligned, and you round-up, that will remain as a 40byte size, the malloc can then pick addr 35, and it has no wiggle room to fix things

15:36 <mrvn> clever: DEBUG_ASSERT len % sizeof(void *)

15:36 masoudd has quit [Ping timeout: 272 seconds]

15:36 <clever> the idea there, is to add 10 to the size, so you now have a 50 byte window, where that 40 byte object can land

15:37 <clever> allowing you to shift its addr by 10 bytes forward

15:37 <clever> so moving it from 35 to 40, creates a 5 byte hole on either side of the actual payload

15:37 <mrvn> clever: you are half right. it might have to waste space at the front and back.

15:37 <clever> yeah

15:38 <clever> but you cant know how much you need to waste on the front, until youve already picked a hole in the free space

15:38 <mrvn> Yeah, I take that part back.

15:38 <clever> cmpctmalloc solves that problem, by mutating the heap metadata AFTER youve allocated, to turn the before/after waste back into free space objects

15:40 <mrvn> The wastage problem comes later. When the chunk is too big the unused part is returned to the free list. But because the original size is lost the extra waste reserved for alignment purposes can't be returned too.

15:42 <mrvn> In line 219 the code should compute how much space is left after the allocation after alignment and using the original size.

15:42 <mrvn> but maybe that counts as too complex.

15:43 <mrvn> anyway, the important case is 1) as that could unexpectedly fault.

15:46 vdamewood has joined #osdev

15:50 <mrvn> what should miniheap_alloc(0, 0) return?

15:50 <mrvn> or general malloc(0)?

15:50 <mrvn> Specs say: "If size is 0, then malloc() returns either NULL, or a unique pointer value that can later be successfully passed to free()."
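Because both answers are allowed, portable callers cannot treat NULL from `malloc(0)` as out-of-memory. One way a wrapper can commit to a single behaviour, sketched here with a hypothetical name: never pass 0 through, so every call yields a unique pointer that `free` accepts.

```c
#include <stdlib.h>

/* Always return a unique, freeable pointer (barring real OOM),
   even for a zero-byte request. */
void *malloc_nonzero(size_t n)
{
    return malloc(n ? n : 1);
}
```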

15:56 masoudd has joined #osdev

16:03 <mrvn> Am I remembering that issue with double register loads wrong: https://godbolt.org/z/ds6531q8j

16:03 <bslsk05> godbolt.org: Compiler Explorer

16:04 <mrvn> Line 16 shows Pair being only 4 byte aligned but bar(Pair&) does a double register load.

16:05 <mrvn> And what is the compiler doing in the asm in lines 2,3,4,6?

16:06 <mrvn> Make space on the stack, get a pointer to the original address, store the Pair and never ever use it.

16:07 <j`ey> use clang its a betterer compiler!!

16:09 dude12312414 has joined #osdev

16:10 <mrvn> https://godbolt.org/z/coc6WaP3P

16:10 <bslsk05> godbolt.org: Compiler Explorer

16:10 <mrvn> better in some, worse in others

16:10 <j`ey> weird it doesnt do the ldrd

16:10 <j`ey> in bar

16:11 <mrvn> clang does two ldr, which I believe is required.

16:11 <j`ey> required for?

16:11 <mrvn> alignment. iirc ldrd will fault when the address isn't a byte aligned.

16:11 <mrvn> 8 byte aligned

16:12 <mrvn> let me check that ...

16:15 <mrvn> hmm, seems I remember that wrong: https://developer.arm.com/documentation/ddi0406/cb/Application-Level-Architecture/Instruction-Details/Alphabetical-list-of-instructions/LDRD--immediate- It does a 64bit read if possible, otherwise 2 32bit reads.

16:15 <bslsk05> developer.arm.com: Documentation – Arm Developer
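Whatever the exact LDRD rules in the ARM manual linked above, C code can sidestep the question when the data may sit at any byte offset. A minimal sketch (the function name is made up): dereferencing a misaligned `uint64_t *` is undefined behaviour in C, whereas copying through `memcpy` promises the compiler nothing about alignment, and GCC and Clang normally lower the fixed-size copy to ordinary loads that are valid for that alignment.

```c
#include <stdint.h>
#include <string.h>

/* Load 8 bytes from p without assuming any alignment. */
uint64_t load_u64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```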

16:33 <mrvn> Look what happened when I tune it for the RPi1: https://godbolt.org/z/Yo16hYnnT

16:33 <bslsk05> godbolt.org: Compiler Explorer

16:35 <mrvn> It's odd. Now gcc uses ldmia in bar but 2 ldr in baz while clang uses ldr in bar but ldmib in baz.

16:35 <mrvn> Is there a tool that shows cpu cycle times for ASM code?

16:52 Ermine is now known as Santurysim

16:52 Santurysim is now known as Ermine

16:52 Ermine is now known as Santurysim

16:52 Santurysim is now known as Ermine

17:04 <sortie> My new ports system prototype is now fully functional and merged to my volatile builds :)

17:05 <sortie> It's BSD ports style. /src/ports contains a subdirectory for each port containing metadata with links to upstream releases and patches to apply to them. The build system automatically downloads the appropriate files when first used

17:06 <sortie> The ports system is part of the main repository and versioned together. All ports are built by default but can be overridden using the PACKAGES variable

17:07 <sortie> In other words: You can literally just boot up my latest volatile iso and type "cd /src && make" and it will rebuild EVERYTHING natively OUT OF THE BOX

17:08 <sortie> This is how easy I wanted to make developing my OS. Just download it. Bam instant dev environment. No need to build cross-compilers. It's all there. You can optionally install it if you want a persistent environment.

17:08 <mjg> congratulations. you reached templeos level :)

17:09 <mrvn> he still lacks god

17:09 <zid> now rename it to stage3.tar.gz

17:09 <zid> and you're at gentoo level

17:09 <sortie> I probably exceed the other systems because my WHOLE ports system cross-compiles cleanly out of the box too

17:10 <mjg> templeos recompiles everything it can run afair

17:10 <mjg> so..

17:10 <sortie> It's quite impressive how big a tech loop I've built :)

17:10 <sortie> It's even connected to IRC for some reason

17:17 underscoreanne has joined #osdev

17:22 Burgundy has joined #osdev

17:25 the_lanetly_052 has quit [Ping timeout: 240 seconds]

17:34 the_lanetly_052 has joined #osdev

17:39 brynet has quit [Quit: leaving]

17:40 the_lanetly_052 has quit [Ping timeout: 256 seconds]

17:56 not_not has joined #osdev

17:56 <Clockface> GNU/purgatory is GNU/heaven you must compile from source

17:56 <not_not> Ahahaha

17:57 <not_not> Im doing lfs right now actually

17:57 xenos1984 has quit [Read error: Connection reset by peer]

17:59 <not_not> But its heaven if U dont like sleep

17:59 <not_not> And if you truly hate the Sun and everyone who walks below it

18:09 vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

18:14 <Clockface> eh, i decided ill do randomly generated null-terminated sequences

18:15 <Clockface> so its variable length, in case i feel like saving memory or 600 trillion pieces of software end up getting made

18:15 xenos1984 has joined #osdev

18:16 underscoreanne has quit [Ping timeout: 256 seconds]

18:20 brynet has joined #osdev

18:25 <zid> TIL nvidia is open source

18:28 <gog> lol

18:28 <gog> yes

18:53 <geist> mrvn: you're right, the miniheap stuff may not deal with minimum alignment properly,but i think it alwyas gets you at least a word alignment

18:53 <geist> which on the 32 bit arches it's used for is sufficient

18:53 <geist> but i dont think that's good for x86-64 or whatnot

18:53 <geist> which i think has a general 16 byte alignment (2 words)

18:54 <geist> like i said it's only really used on microcontrollers or whatnot

18:54 <geist> the bigger arches default to cmpctmalloc or dlmalloc

18:54 <geist> but... i should at least write some docs at the top of the file or whatnot saying this

18:55 dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]

18:55 <geist> also while looking at it, can probably simplify it a bit and use a single linked list for free, since it always walks the free list in order anyway

18:56 <geist> i honestly haven't looked at that code much in the last 12 years or so. it's been a workhorse so i dont think there's any overt bugs with it, just optimizations not taken

19:00 <mrvn> geist: I was under the impression that ldrd would fault when the address is not aligned but the docs say it just does 2 32bit loads in that case. So void* alignment is sufficient.

19:00 <geist> yes, it does void alignment

19:00 <geist> but you're right, the ABI may want you to do 2 word alignment

19:01 <geist> or, sort of more specifically, the compiler may *assume* the heap does a particular alignment and optimize accordingly

19:01 <mrvn> geist: gcc and clang both do and don't depending on $random.

19:01 <geist> i think we found that on cmpctmalloc in zircon. it was giving us 8 byte aligned pointers, but the compiler was assuming 16

19:01 <geist> and then if you really wanted something 16 byte aligned, called memalign, it would actually call malloc

19:01 <geist> etc

19:02 <geist> this is actually worth double checking in LK with all the arches

19:02 <geist> note there's a compiler switch to override the assumed heap alignment

19:02 <mrvn> What I would change is the check for minimum alignment >= 16. Nothing wrong with asking for alignment = 8.

19:03 <mrvn> or 4

19:03 <geist> well depends on 32 vs 64

19:03 <geist> and arch, so actually i think a really good solution is a bit more complicated

19:03 <mrvn> not really. If you can't handle 4 byte alignment then don't ask for it.

19:03 <geist> but, in this case, with miniheap it basically takes any align < sizeof(void *) and rounds it up

19:04 <geist> eh, i disagree. if you ask for lower alignment than the minimum, then give the minimum

19:04 <mrvn> the roundup to sizeof(void*) is a different case. That's ok and for the internal sanity.

19:04 <geist> especially since it's potentially arch specific

19:04 <mrvn> Which means you already get 4/8 byte aligned depending on the arch.

19:04 <geist> right

19:04 <mrvn> But when the user specifically asks for an alignment then you bump that up to 16.

19:05 <geist> hmm why?

19:05 <mrvn> lines 193-195

19:06 <geist> hmm, trying to grok that

19:07 <mrvn> I see no reason to increase the alignment beyond that sizeof(void*) requirement. If the user asks for 1 byte aligned data then there is no reason to align to 16 byte.

19:07 <geist> hmm, trying to look through the git history to see why that's the case

19:09 <mrvn> Just seems odd to force something on the user when they specifically asked for less.

19:10 <geist> hmm, drilling through history that line was last touched in 2008

19:10 <mrvn> As for "the compiler was assuming 16". That's also an issue. Depending on the arch malloc() is assumed to return aligned data for good speed. E.g. on x86 you get 16 byte aligned for SSE registers to work.

19:10 <geist> 'initial commit' so yeah there's no prior history

19:11 <geist> yes. most likely that's what i was observing at the time and simply set the minimum alignment to 16 in that function

19:11 <mrvn> maybe add a DEBUG_ASSERT to see if it even triggers anywhere in the code.

19:11 <geist> and then literally haven't thought about it

19:11 <geist> since min alignment 16 is pretty much the max min alignment i've observed on all arches

19:11 <mrvn> then you messed that up. :) The default is alignment == 0

19:11 <geist> alignment == 0 simply means 'i have no opinion'

19:11 <geist> the fact that the heap code itself then rounds it up to 16

19:12 <geist> is an implementation detail. does the heap need to do that? probably not, depends on the arch

19:12 <mrvn> except in that case it doesn't.

19:12 <mrvn> I would change the code to: if (alignment == 0) alignment = 16;

19:13 <geist> ah i see. yes it de facto aligns to sizeof(void *) because that's the alignment of free slots

19:14 <geist> ah i see. no it's more complicated than that. i think it WAI, but it's implicit

19:14 <geist> alignment = 0 is okay, it just then goes through a slightly implicit path in the allocator

19:14 <geist> it's just not that clear

19:15 <mrvn> if alignment==0 then in line 244 you don't align. So you get the natural void* alignment of the free block.

19:15 <geist> right, which is WAI

19:15 <mrvn> And I think the compiler would assume 16 byte aligned on many arches in that case.

19:15 <geist> because later on when it's searching for a block it doesn't then try to round up the pointer

19:15 <geist> well, *that* is a different problem, yes

19:16 <geist> and you're right, except this is only ever run on arches where void * alignment is okay

19:16 <geist> though i should double check that

19:16 <geist> and/or statically assert it

19:16 <geist> *or* override the compiler flag for minimum alignment if this heap is selected

19:16 <mrvn> So my implementation would be: alignment == 0 ==> 16 byte, alignment < sizeof(void*) ==> sizeof(void*), else as requested.

19:16 <geist> okay

19:17 <mrvn> or instead of 16byte whatever the arch has as assumed alignment.

19:17 <geist> i dont really agree, but i think there's a solution in there

19:17 <geist> yes. i think the per arch thing is important
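A minimal sketch of the alignment policy mrvn proposes: 0 means "use the arch's assumed malloc alignment", anything below sizeof(void *) is rounded up to it, and anything else is honored as requested. `HEAP_DEFAULT_ALIGN`, `normalize_alignment` and `align_up` are hypothetical names, not miniheap's actual code, and 16 is only an assumed default; per geist's point, a real port would choose it per arch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed alignment the compiler expects from malloc() on this arch.
 * 16 is typical for x86-64 and arm64; a real port would set it per arch. */
#ifndef HEAP_DEFAULT_ALIGN
#define HEAP_DEFAULT_ALIGN 16
#endif

/* Proposed policy:
 *   alignment == 0              -> HEAP_DEFAULT_ALIGN ("no opinion")
 *   alignment <  sizeof(void *) -> sizeof(void *) (heap's internal minimum)
 *   otherwise                   -> exactly what the caller asked for
 */
size_t normalize_alignment(size_t alignment)
{
    /* callers may pass 0 or a power of two only */
    assert((alignment & (alignment - 1)) == 0);

    if (alignment == 0)
        return HEAP_DEFAULT_ALIGN;
    if (alignment < sizeof(void *))
        return sizeof(void *);
    return alignment;
}

/* Round addr up to a power-of-two alignment. */
uintptr_t align_up(uintptr_t addr, size_t alignment)
{
    return (addr + alignment - 1) & ~(uintptr_t)(alignment - 1);
}
```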

19:19 <mrvn> Is the miniheap used a lot in LK or just during bootstrap?

19:20 <geist> it's used in embedded builds of it

19:20 <geist> usually cortex-m. stuff where the heap is in the 10s of K

19:20 <geist> it doesn't scale at all for larger heaps, or lots of alloc/frees since it's entirely O(N)

19:20 <mrvn> it has O(n) performance but with 10s of K that's probably still ok.

19:20 <geist> exactly

19:21 <geist> it performs *horribly* in a highly alloc/free environment (C++ with lots of new/frees)

19:21 <geist> new/deletes

19:21 <mrvn> on the other hand with that little memory a best-fit strategy might be useful.

19:21 <geist> possibly? best-fit has a tendency to create the smallest possible holes

19:22 <geist> also it first fits which *probably* generates pretty good GCing

19:22 <geist> since it can return unused blocks to the system allocator

19:22 <mrvn> yeah, there was some flavour that used perfect holes and otherwise picked holes to split smarter.

19:22 <geist> it has a self trim mechanism

19:22 demindiro has joined #osdev

19:23 <geist> for builds like cortex-m what LK has is a 'novm' VM. basically it carves up all the free space after the text/bss/etc to the end of RAM into fixed sized pages. up to the build, but maybe 512 bytes or 1K or whatnot. uses a bitmap (because small)

19:23 MiningMarsh has quit [Quit: ZNC 1.8.2 - https://znc.in]

19:23 <geist> the heap grabs chunks out of that to expand itself, but otherwise code is free to allocate directly from the novm

19:23 <geist> basically lets larger allocs skip the heap if it wants

19:24 <geist> or run some sort of second heap (like some garbage collected, language heap)
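For illustration, a bitmap page allocator in the spirit of the 'novm' layer geist describes: RAM after the image is split into fixed-size pages, one bit per page, and callers (a heap growing itself, or a large allocation skipping the heap) take contiguous runs. The page size, page count and function names are assumptions; this is not LK's actual novm implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes: 64 KiB of "RAM" split into 1 KiB pages. */
#define NOVM_PAGE_SIZE 1024u
#define NOVM_NPAGES    64u

static uint8_t  novm_arena[NOVM_PAGE_SIZE * NOVM_NPAGES];
static uint32_t novm_bitmap[(NOVM_NPAGES + 31) / 32]; /* 1 bit per page, 1 = used */

static int page_used(size_t i) { return (novm_bitmap[i / 32] >> (i % 32)) & 1u; }

static void page_set(size_t i, int used)
{
    if (used)
        novm_bitmap[i / 32] |= (1u << (i % 32));
    else
        novm_bitmap[i / 32] &= ~(1u << (i % 32));
}

/* Allocate `count` contiguous pages, lowest address first. NULL on failure. */
void *novm_alloc(size_t count)
{
    if (count == 0 || count > NOVM_NPAGES)
        return NULL;
    size_t run = 0;
    for (size_t i = 0; i < NOVM_NPAGES; i++) {
        run = page_used(i) ? 0 : run + 1;   /* length of the free run ending at i */
        if (run == count) {
            size_t first = i + 1 - count;
            for (size_t j = first; j <= i; j++)
                page_set(j, 1);
            return &novm_arena[first * NOVM_PAGE_SIZE];
        }
    }
    return NULL;
}

/* Return `count` pages starting at `ptr` to the pool. */
void novm_free(void *ptr, size_t count)
{
    size_t first = (size_t)((uint8_t *)ptr - novm_arena) / NOVM_PAGE_SIZE;
    assert(first + count <= NOVM_NPAGES);
    for (size_t j = first; j < first + count; j++)
        page_set(j, 0);
}
```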

19:24 eddof13 has joined #osdev

19:24 <geist> so miniheap is pretty good about being frugal with its free list (stores the free list in the free pages itself) and fairly aggressive about returning free chunks to the novm
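A sketch of a first-fit heap whose free list lives inside the free chunks themselves, roughly the shape described for miniheap: address-ordered list, O(n) search, neighbours merged on free. This is not LK's code: the `chunk` layout, the 256-unit arena and all function names are hypothetical, and handing empty pages back to a page allocator is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header at the start of every chunk; free chunks also use `next`. */
struct chunk {
    size_t        size;  /* bytes in this chunk, header included */
    struct chunk *next;  /* next free chunk, in address order */
};

static struct chunk  heap_mem[256];  /* backing store, aligned for struct chunk */
static struct chunk *free_list;

void heap_init(void)
{
    free_list = heap_mem;
    free_list->size = sizeof heap_mem;
    free_list->next = NULL;
}

/* First fit: take the first free chunk that is big enough. */
void *heap_alloc(size_t n)
{
    const size_t unit = sizeof(struct chunk);
    size_t need = ((n + unit - 1) / unit + 1) * unit;  /* payload + header */

    for (struct chunk **link = &free_list; *link; link = &(*link)->next) {
        struct chunk *c = *link;
        if (c->size < need)
            continue;
        if (c->size - need >= 2 * unit) {
            /* split: the tail stays on the free list */
            struct chunk *rest = (struct chunk *)((char *)c + need);
            rest->size = c->size - need;
            rest->next = c->next;
            *link = rest;
            c->size = need;
        } else {
            *link = c->next;  /* hand out the whole chunk */
        }
        return c + 1;         /* payload starts after the header */
    }
    return NULL;
}

/* Insert in address order and merge with adjacent free chunks. */
void heap_free(void *p)
{
    if (!p)
        return;
    struct chunk *c = (struct chunk *)p - 1;
    struct chunk **link = &free_list;
    while (*link && *link < c)
        link = &(*link)->next;
    c->next = *link;
    *link = c;

    if (c->next && (char *)c + c->size == (char *)c->next) {
        c->size += c->next->size;          /* merge with the following chunk */
        c->next = c->next->next;
    }
    if (link != &free_list) {
        /* link points at prev->next, so recover prev from it */
        struct chunk *prev = (struct chunk *)((char *)link - offsetof(struct chunk, next));
        if ((char *)prev + prev->size == (char *)c) {
            prev->size += c->size;         /* merge with the preceding chunk */
            prev->next = c->next;
        }
    }
}
```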

19:25 <mrvn> If you have a lot of malloc/free it really helps to have separate free lists for 8, 16, 32, 64 byte objects.

19:25 <geist> so i think in general being an aggressive first-fit would tend to cluster allocs at the start of the heap

19:25 <geist> mrvn: yah that's precisely what cmpctmalloc does

19:25 <geist> its a standard binning heap

19:25 <mrvn> but speed vs. complexity :)

19:25 <geist> more overhead because more lists, etc, but the performance is far better

19:26 <geist> so for builds like arm64, x86, riscv, etc usually use cmpctmalloc

19:26 <geist> or even dlmalloc, which is also in the tree, but my testing is cmpctmalloc does as good of a job and is at least a little bit easier to read
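The separate-free-lists idea can be sketched like this: small requests are rounded up to a size class and served from that class's list in O(1), instead of walking one mixed-size list. The four classes mirror the sizes mrvn mentions; real binning heaps such as cmpctmalloc use many more bins, and every name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size classes, as listed in the chat. */
#define NBINS 4
static const size_t bin_size[NBINS] = { 8, 16, 32, 64 };

/* Smallest bin that fits n bytes, or -1 if the request is too big for any
 * bin (such requests would go to a general-purpose path). */
int size_to_bin(size_t n)
{
    for (int i = 0; i < NBINS; i++)
        if (n <= bin_size[i])
            return i;
    return -1;
}

/* Each bin is a LIFO list of free objects of exactly bin_size[i] bytes,
 * linked through the objects themselves. */
struct free_obj { struct free_obj *next; };
static struct free_obj *bins[NBINS];

void bin_push(int bin, void *obj)
{
    assert(bin >= 0 && bin < NBINS);
    struct free_obj *o = obj;
    o->next = bins[bin];
    bins[bin] = o;
}

void *bin_pop(int bin)
{
    assert(bin >= 0 && bin < NBINS);
    struct free_obj *o = bins[bin];
    if (o)
        bins[bin] = o->next;
    return o;  /* NULL: bin empty, refill from a larger chunk (not shown) */
}
```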

19:32 <geist> but i should also write some unit tests for this stuff, etc

19:33 <mrvn> can I make qemu-system-arm print all cpu exceptions and interrupts?

19:33 <geist> `-d int` probably?

19:33 <gog> yes

19:33 * geist pets gog

19:34 <gog> if it's enabled in your build

19:34 * gog prrrrs

19:34 <mrvn> thanks.

19:34 <geist> hmm that's true, it may be possible to disable it, but i think you'd have to have explicitly done so

19:34 <mrvn> any way to get a register dump when one happens?

19:34 <gog> it dumps every time

19:34 <mrvn> Exception return from AArch32 mon to svc PC 0x448

19:34 <gog> you're gonna get a buttload of terminal spam

19:34 <mrvn> doesn't tell me much.

19:34 <gog> oh

19:34 <gog> okay then

19:35 <geist> i think it may be per arch

19:35 <gog> yeah x86_64 spits out the whole state

19:35 <geist> if you have something that's crashing very early you can get away with mega spam with

19:35 <geist> `-d int,cpu,exec`

19:35 <geist> one of cpu or exec, i forget, dumps the state

19:35 <mrvn> thx

19:35 <geist> but it dumps it also on every block of code, so it's more than just interrupts

19:36 <geist> but since the int is a forced jmp essentially i think you can line up the nearby cpu state block with the irq

19:36 <geist> useful if you have an early crash, or you're trying to debug your x86 real mode to long mode code or whatnot

19:37 <mrvn> trying to see why my code stops after enabling the MMU

19:38 <geist> yah and sadly there's no `info mmu` for arm

19:39 <mrvn> Taking exception 4 [Data Abort]

19:39 <mrvn> ...with DFSR 0x7 DFAR 0x3f201018

19:39 <mrvn> That's the UART0 status register

19:39 <geist> oooh, trying to write to a register that's no longer mapped?

19:39 <j`ey> is it mapped?

19:39 <mrvn> if only I had "info mmu" to check

19:40 knusbaum has quit [Ping timeout: 256 seconds]

19:41 MiningMarsh has joined #osdev

19:42 <geist> gosh i wish apple would fix this. seems that on my M1 mini every few days the 'systemstats' process grows and grows, leaking ram such that it fills up the swap

19:42 <geist> it deals with it admirably, but eventually i think starts slowing the system down

19:42 <geist> in this case it was 30GB virtual space, 29.5GB of compressed memory, using 8GB of swap

19:49 eddof13 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

19:49 <mrvn> oops, I set the size of the peripherals to map in r5 instead of r6. So it mapped total garbage.

19:51 FatAlbert has quit [Quit: WeeChat 3.4]

19:53 <mrvn> I'm still annoyed that "qemu -M raspi2" loads the kernel at 0x10000 instead of 0x8000 where a real one does.

19:53 <mrvn> or is that some old firmware vs. new firmware thing?

19:56 rwb is now known as rb

20:08 <geist> i have vague recollections that the firmware changed somewhere, it broke LK at some point too

20:14 sonny has joined #osdev

20:15 <sonny> floppy disk boot sounds so much easier :-)

20:16 <geist> than what?

20:16 <gog> i watch retro computer youtubers just to hear the seek test

20:16 <gog> bwwwmm-bwm-brrrrmp

20:16 <geist> hah yes

20:16 <geist> was watching some guy fix a RX02, but that's a bit more noisy

20:16 <gog> o:

20:16 <geist> with a nice CLICK sound every time the head goes inactive

20:16 <gog> nice

20:16 <gog> did you see adrian's latest

20:16 <gog> i'm disappointed in our man

20:17 <geist> there's apparently a little solenoid to pull the head back

20:17 <gog> he said 28mins was a "long" video

20:17 <geist> hmm, which one? i'm on his patreon and he just posted a vid of his basement

20:17 <geist> he has a pretty rad man cave

20:17 <gog> he rescued a pet

20:17 <gog> it needs rehab

20:17 <geist> ah yes, haven't watched the latest

20:17 <gog> i wanted to know what was wrong with it and he stopped the video!

20:18 <geist> oh i thought he just fixed it? haven't watched that one yet

20:18 <gog> nah brand new pet

20:18 <gog> 4120

20:18 <geist> the one with the bad memory detect

20:18 <gog> yes

20:18 <geist> ah

20:18 <gog> wait did he post a new one? /me checks

20:18 <geist> maybe patreon

20:18 <geist> usually posts them there a few days early, i lose track

20:18 <gog> ah fair

20:19 <gog> the only patreon i subscribe to is wtypp

20:19 <geist> this one is pretty reasonable. $4 gets you the goods

20:19 <geist> but yeah i patreon a few of em

20:19 <gog> i pay $2 to listen to jaded people explain why everything is awful and we can't have nice things

20:19 <gog> it's therapeutic

20:20 <geist> heh, if that works for you!

20:20 <gog> :D

20:20 * geist is defragging a few egregiously fragged files

20:20 <gog> oops

20:20 <geist> not really therapeutic though, with no sound

20:21 <gog> if a file gets fragmented in the forest but nobody hears it did it really get fragmented?

20:21 <geist> yah and does it make any performance difference? no. but in MY DAY we cared about fragmentation

20:21 <geist> i can just think about all the spillover FILE records to track a file with 50k extents

20:22 <geist> does it matter? nope.

20:22 <gog> lol

20:22 <gog> i mean it did matter

20:22 <geist> and technically the cpu is doing a bit more work there to seek through the extent list

20:22 <FireFly> oh I just finished the latest wtypp vid, heh

20:23 <gog> i've been listening to true crime shows all day

20:23 <sonny> geist uefi

20:23 <geist> sometimes just listening to Steve at gamers nexus blab on about benchmarks for 30 minutes is good background noise

20:23 <geist> i think his brain works too fast for his mouth

20:23 <sonny> I just watched https://youtu.be/xD14SLU2u2k

20:23 <bslsk05> 'Pushing the limits of floppy disk boot sectors: sectorLISP' by The Oldskool PC (00:14:23)

20:24 <gog> nice

20:24 <sonny> yeah, I didn't think it would be so simple

20:24 <geist> sonny: ah yes. UEFI is definitely a designed by committee mess. OTOH despite its ergonomics, i do rather like the overall structure

20:24 <geist> ie, giving you an api/environment to write your 'whatever you want to do' bootloader is fairly nice

20:25 <sonny> yeah that is cool

20:25 <geist> and, and this is really important to me, it's portable across arches

20:25 <gog> ILikeUefiButTheFunctionNamesAreSoBad()

20:25 <sonny> lol

20:25 <geist> so for things that support it, you end up with a single loader which is very nice

20:25 <sonny> more than amd64 support uefi?

20:25 <geist> yah, arm, riscv

20:25 <gog> ^

20:25 <sonny> oh cool

20:26 <sonny> yeah that's very helpful

20:26 <geist> yah same API, etc. just compile your code differently

20:26 <gog> and ia32

20:26 <geist> now, obviously the issue is different ARM boards, etc dont *have* to support UEFI, but i think the trend is slowly moving that way

20:27 <sonny> so no more device tree stuff?

20:27 <j`ey> lots of those boards can get uefi via uboot

20:27 <geist> device tree can coexist with UEFI

20:27 <j`ey> (but UEFI+DT, not acpi)

20:27 <geist> i have to actually try it, but there's a UUID to get the DT blob out of UEFI

20:27 <geist> GetSystemTable or something

20:28 <gog> SystemTable->ConfigurationTable

20:28 <gog> same as acpi

20:28 <gog> you get a pointer to the blob with the UUID

20:28 <geist> yah

20:28 <geist> so based on how you wanna roll, you can use ACPI or DT (or both) to configure your drivers
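A sketch of the ConfigurationTable lookup gog describes: the firmware publishes an array of (GUID, pointer) pairs and the loader scans it for the GUID it wants, the DT blob or the ACPI RSDP. The structs below are minimal stand-ins for `EFI_GUID` and `EFI_CONFIGURATION_TABLE` (a real loader would use its UEFI headers), and the table is passed as a plain array plus count for testability; the DTB and ACPI 2.0 GUID values are the ones used by edk2 and Linux.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal stand-ins for the UEFI types (16-byte GUID, no padding). */
typedef struct {
    uint32_t data1;
    uint16_t data2;
    uint16_t data3;
    uint8_t  data4[8];
} efi_guid;

typedef struct {
    efi_guid vendor_guid;
    void    *vendor_table;
} efi_config_table;

static const efi_guid DTB_TABLE_GUID =
    { 0xb1b621d5, 0xf19c, 0x41a5, { 0x83, 0x0b, 0xd9, 0x15, 0x2c, 0x69, 0xaa, 0xe0 } };
static const efi_guid ACPI_20_TABLE_GUID =
    { 0x8868e871, 0xe4f1, 0x11d3, { 0xbc, 0x22, 0x00, 0x80, 0xc7, 0x3c, 0x88, 0x81 } };

/* Scan SystemTable->ConfigurationTable (here: `tables`, `count` entries)
 * and return the table registered under `guid`, or NULL if absent. */
void *find_config_table(const efi_config_table *tables, size_t count,
                        const efi_guid *guid)
{
    for (size_t i = 0; i < count; i++)
        if (memcmp(&tables[i].vendor_guid, guid, sizeof *guid) == 0)
            return tables[i].vendor_table;
    return NULL;
}
```

A loader that prefers ACPI could try `ACPI_20_TABLE_GUID` first and fall back to `DTB_TABLE_GUID`, or the other way round; the chat leaves that choice open.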

20:29 <sonny> I looked at the uefi hello world and the string "hello world" wasn't there D:

20:29 <geist> windows uses ACPI, and i think linux tends to favor DT, though it's unclear precisely what uses ACPI and what uses DT when both are present

20:29 <geist> that's because it's probably stored as UTF16

20:29 <geist> so look for h\0e\0....

20:29 <geist> because MSFT
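Following geist's hint: UEFI strings (CHAR16) are UTF-16, little-endian on x86 and ARM, so grepping the binary for plain ASCII finds nothing. A small sketch that searches a buffer for the UTF-16LE form of an ASCII string (`h\0e\0l\0...`); `find_utf16le_ascii` is a hypothetical helper and only handles 7-bit ASCII needles.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the byte offset of the first occurrence of `ascii` encoded as
 * UTF-16LE (each character followed by a zero byte), or -1 if absent. */
long find_utf16le_ascii(const unsigned char *buf, size_t len, const char *ascii)
{
    size_t n = strlen(ascii);
    if (n == 0 || 2 * n > len)
        return -1;
    for (size_t off = 0; off + 2 * n <= len; off++) {
        size_t i = 0;
        while (i < n && buf[off + 2 * i] == (unsigned char)ascii[i]
                     && buf[off + 2 * i + 1] == 0)
            i++;
        if (i == n)
            return (long)off;
    }
    return -1;
}
```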

20:30 <sonny> https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Application/HelloWorld/HelloWorld.c

20:30 <bslsk05> github.com: edk2/HelloWorld.c at master · tianocore/edk2 · GitHub

20:30 <geist> yeah?

20:31 <geist> i dunno what that PcdGetPtr stuff is, but that's what documentation is for

20:31 <geist> maybe some way to build per language strings or something

20:31 <geist> but indeed does not seem like a very friendly way to build a hello world

20:31 <sonny> ok, I'll investigate that later

20:32 <geist> but the real gist is it calls Print() with a string

20:32 <geist> so thats hello world

20:32 <sonny> fair

20:37 <gog> sonny: https://github.com/adachristine/efitutorial/tree/main/loader

20:37 <bslsk05> github.com: efitutorial/loader at main · adachristine/efitutorial · GitHub

20:37 <gog> the best uefi hello world

20:38 <sonny> ah, thanks a lot!

20:38 <kingoffrance> dont see it on sonny's link, i think one can gather most of the C "hello world"s into an "unnecessary use of printf()" page, since puts() suffices for most of them

20:38 <geist> heh well the compiler will probably replace it anyway

20:38 <sonny> yeah, just that there puts() seemed convoluted

20:38 <sonny> s/there/their/

20:39 <geist> so can counter that as an 'unnecessary microoptimization that the compiler will do for you'

20:39 masoudd has quit [Ping timeout: 272 seconds]

20:39 <kingoffrance> eh, that doesn't bother me, just seems like messing with noobs right from the start

20:40 <geist> what i dont like about puts is it puts in the \n at the end

20:40 <geist> which i think folks should generally be aware of right off the bat

20:40 <geist> though other languages tend to have two variants of their printf that inserts or not

20:40 <geist> so that may just be old skool me

20:42 <gog> i'm the 83rd most active dev known to be in iceland :D

20:43 <sonny> congrats, is that stat via github?

20:43 <gog> https://github.com/gayanvoice/top-github-users/blob/main/markdown/public_contributions/iceland.md

20:44 <bslsk05> github.com: top-github-users/iceland.md at main · gayanvoice/top-github-users · GitHub

20:44 <gog> 212 if you count all contributions rather than public :p

20:44 <mrvn> Urgs, my kernel puts() is broken, it doesn't add a newline. :)

20:44 <gog> i gotta up my game here

20:45 <gog> a commit for every changed line

20:45 <geist> also fputs *doesnt* so you can't just implement puts as an alias of fputs(stdout)

20:45 <geist> annoying it is
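The newline asymmetry as a sketch: `puts` must append the '\n' that `fputs` leaves out, so it cannot be a plain alias for `fputs(s, stdout)`. This matters for a hand-rolled kernel libc because, as noted in the chat, with builtins enabled the compiler may rewrite `printf("text\n")` into `puts("text")`. `my_puts` and `my_puts_to` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* puts() semantics on an arbitrary stream: the string, then a newline. */
int my_puts_to(const char *s, FILE *f)
{
    if (fputs(s, f) == EOF || putc('\n', f) == EOF)
        return EOF;
    return 1;  /* any nonnegative value signals success */
}

int my_puts(const char *s) { return my_puts_to(s, stdout); }
```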

20:45 <mrvn> kingoffrance: printf("Hello, world!\n") or printf("%s\n", "Hello, world!")?

20:45 <sonny> I don't use any web service long enough to get on those lists :(

20:46 <gog> i do not like implicit \n if i wanted \n i would have put it in the string

20:46 <geist> mrvn: if your puts doesn't do it and you have -fbuiltin, you'll find out pretty fast, since the compiler will happily replace printfs with puts for you

20:46 <geist> gog: exactly

20:46 <mrvn> I have putc, puts, puti, putx from the days before printf.

20:46 <gog> nice

20:48 <not_not> Nice

20:48 <not_not> Gog same

20:55 <sonny> computer memory doesn't have a [signed] representation right? It's just bits?

20:56 <GeDaMo> Yes, just bits

20:57 mahmutov has quit [Ping timeout: 272 seconds]

20:57 <mrvn> the signed bit comes from the opcodes

20:57 <gog> it's an array of (minimum addressable unit)

20:57 <mrvn> .oO(9 bit)

20:57 <gog> hehe

21:01 pretty_dumm_guy has quit [Quit: WeeChat 3.4]

21:14 Teukka has quit [Read error: Connection reset by peer]

21:15 xenos1984 has quit [Remote host closed the connection]

21:16 xenos1984 has joined #osdev

21:17 Teukka has joined #osdev

21:20 <vin1> :q

21:20 vin1 has quit [Quit: WeeChat 2.8]

21:20 vin has joined #osdev

21:23 GeDaMo has quit [Remote host closed the connection]

21:23 <kingoffrance> mrvn: first one should be puts() :) second is unnecessary unless the output is configurable/changeable

21:24 <kingoffrance> < a commit for every changed line of course, that's how you get persistency and rollback features

21:27 <mrvn> run "make" in the pre-commit hook and reject on error :)

21:27 <kingoffrance> sonny, there was sign and magnitude, but i'm not sure how that was defined i.e. could programmer toggle it, or was it a "side effect" of instructions. nevertheless, at some point, was probably still just another bit

21:28 <mrvn> that's called 1's complement and is just a different interpretation of the bits in memory.

21:30 <sonny> kingoffrance I'm looking at the 704 and that stuff seems to only be defined for numbers

21:31 FreeFull has joined #osdev

21:32 <kingoffrance> mrvn, https://en.wikipedia.org/wiki/Signed_number_representations#Sign-and-magnitude_method

21:32 <bslsk05> en.wikipedia.org: Signed number representations - Wikipedia

21:32 eddof13 has joined #osdev

21:34 <kingoffrance> sonny: Fixed-point numbers are stored in binary sign/magnitude format. :) from https://en.wikipedia.org/wiki/IBM_704

21:34 <bslsk05> en.wikipedia.org: IBM 704 - Wikipedia

21:35 <sonny> yeah

21:36 <mrvn> The format is no longer supported in the latest C/C++ standards.

21:37 <kingoffrance> ^ believe it is all twos complement

21:37 <mrvn> sign/magnitude is ones complement

21:37 <mrvn> i believe

21:38 <sonny> sign/mag doesn't use a complement iirc

21:39 <kingoffrance> ^ wikipedia example: 10000010 = −125 in 8 bit "ones complement"; for sign & mag, should just i believe be 1 <sign bit somewhere> 1111101

21:39 <kingoffrance> i just dont know if say from a c program, you can access that magic "sign bit"

21:39 <kingoffrance> or is it "transparent"

21:40 <kingoffrance> somewhere i have in my notes such a system c compiler, but have not used it

21:40 <kingoffrance> *system +

21:40 <mrvn> you can always access it.

21:40 eddof13 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

21:41 <mrvn> bool negative = x < 0;

21:42 <kingoffrance> yeah, but thats what seems to me "transparent"

21:42 <kingoffrance> i mean with a mask, say a uchar

21:42 <mrvn> or (unsigned)x >> bits;

21:42 <mrvn> I highly doubt the cpu would hide the sign bit for unsigned ints.
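To make the representations under discussion concrete, a sketch that encodes the same 8-bit value under sign-magnitude, ones' complement and two's complement. They are three distinct schemes: 10000010 is -2 in sign-magnitude and -125 in ones' complement. The function names are hypothetical; `sign_bit32` shows the `(unsigned)x >> bits` idea with the shift count spelled out as 31 for a 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-magnitude: top bit is the sign, low 7 bits are |v|. */
uint8_t enc_sign_magnitude(int v)
{
    assert(v >= -127 && v <= 127);
    return v < 0 ? (uint8_t)(0x80 | -v) : (uint8_t)v;
}

/* Ones' complement: a negative value is |v| with every bit inverted. */
uint8_t enc_ones_complement(int v)
{
    assert(v >= -127 && v <= 127);
    return v < 0 ? (uint8_t)~(uint8_t)(-v) : (uint8_t)v;
}

/* Two's complement: inverted plus one; conversion to unsigned is mod 256.
 * This is the only representation C23 and C++20 still allow. */
uint8_t enc_twos_complement(int v)
{
    assert(v >= -128 && v <= 127);
    return (uint8_t)v;
}

/* Sign bit of a 32-bit value, read via an unsigned shift. */
unsigned sign_bit32(int32_t x) { return (uint32_t)x >> 31; }
```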

21:43 <kingoffrance> can you toggle padding bits?

21:46 JanC has quit [Remote host closed the connection]

21:46 JanC has joined #osdev

21:56 <mrvn> geist: Are ARMv8 with only a single VA range common?

21:56 <geist> VA range how?

21:57 <j`ey> you mean PA range?

21:57 <mrvn> "For a translation stage that supports a single VA range, a 48-bit VA width gives a VA range of 0x0000000000000000 to 0x0000ffffffffffff."

21:57 <geist> physical address range?

21:57 <geist> aah no. that's referring to EL2 and whatnot

21:57 <mrvn> No, virtual address

21:57 <geist> in EL2 without the type 2 extensions enabled (i forget the acronym) EL2 only supports a single VA range

21:58 <geist> hence why that verbiage exists

21:58 <j`ey> VHE

21:58 <geist> yah VHE

22:03 <j`ey> with !VHE there's only TTBR0_EL2

22:03 <mrvn> Can you explain the address tagging in AArch64 state in a few words?

22:03 <geist> the ASID stuff?

22:04 <geist> (there are a few things with tagging and whatnot)

22:04 <mrvn> If that's ASID then yes. I thought it might be but the wording is horrible.

22:04 <geist> there's also a bit you can set that lets you put 8 bits of whatever you want in the top 8 bits of the address

22:04 <geist> i think that's also called tagging

22:04 <mrvn> that's the one I'm reading

22:04 <kingoffrance> the good news is, if clang, llvm, whatever else, keeps abstracting into arbitrary-sized integers, eventually they will abstract far enough these things will be a thing again

22:04 eddof13 has joined #osdev

22:05 <geist> that's a different bit. lets you (per address space) tell the cpu to ignore bits 63:56 i think

22:05 <geist> and then bit 55 becomes the one that determines user and kernel

22:05 <geist> can use it for various schemes

22:05 <geist> we're actually enabling it in zircon for user space, like right now

22:05 <mrvn> Only thing the docs say is that the top bits get ignored for <list of things>. Are those bits used for anything?

22:05 <j`ey> TBI top byte ignore

22:05 <geist> yah TBI

22:06 <geist> when TBI is set, no

22:06 <geist> that's the point, lets code use the top bits for their own purposes

22:06 <mrvn> What's the use case for that?

22:06 <j`ey> MTE uses the top bits

22:06 <geist> we're enabling it in fuchsia for similar stuff. i forget the exact thing

22:06 <geist> but basically user space pointer tagging

22:07 <j`ey> (MTE = memory tagging extension)

22:07 <mrvn> j`ey: No, MTE is something else.

22:08 <j`ey> "MTE is built on top of the ARMv8.0 virtual address tagging TBI (Top Byte Ignore) feature"

22:08 <j`ey> https://www.kernel.org/doc/html/latest/arm64/memory-tagging-extension.html

22:08 <bslsk05> www.kernel.org: Memory Tagging Extension (MTE) in AArch64 Linux — The Linux Kernel documentation
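A sketch of pointer tagging under AArch64 TBI as described above: with TBI enabled for a range (TCR_EL1.TBI0/TBI1), VA bits [63:56] are ignored by translation, so software can keep an 8-bit tag there, and bit 55 selects the lower (TTBR0) or upper (TTBR1) range. This is plain integer arithmetic on addresses, not an MTE API; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_SHIFT 56
#define TAG_MASK  (UINT64_C(0xff) << TAG_SHIFT)

/* Replace the top byte of va with `tag`. */
uint64_t ptr_set_tag(uint64_t va, uint8_t tag)
{
    return (va & ~TAG_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

uint8_t ptr_get_tag(uint64_t va) { return (uint8_t)(va >> TAG_SHIFT); }

/* Untagged form: sign-extend from bit 55, i.e. bits [63:56] all equal bit 55. */
uint64_t ptr_strip_tag(uint64_t va)
{
    return (va & (UINT64_C(1) << 55)) ? (va | TAG_MASK) : (va & ~TAG_MASK);
}

/* 1 if bit 55 selects the upper (TTBR1, usually kernel) range. */
int ptr_is_upper_range(uint64_t va) { return (int)((va >> 55) & 1); }
```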

22:09 eddof13 has quit [Client Quit]

22:11 sonny has quit [Quit: Client closed]

22:13 lkurusa has joined #osdev

22:17 knusbaum has joined #osdev

22:22 <mrvn> ARM always has so many config registers and options. The docs are nearly unreadable. I'm trying to see the page table structure and I'm 10+ pages of "if this then that else that" into the chapter and not one bit about what a page table looks like.

22:22 <mrvn> but I have 4 levels.

22:22 <klange> haha yes, it's utter garbage

22:23 <mrvn> I want a "How do I use this" document instead of a "How do I build this".

22:23 <geist> yah guess i kinda have a stockholm syndrome with it

22:23 <klange> try this https://armv8-ref.codingbelief.com/en/chapter_d4/d43_1_vmsav8-64_translation_table_descriptor_formats.html

22:23 <bslsk05> armv8-ref.codingbelief.com: D4.3.1 VMSAv8-64 translation table level 0 level 1 and level 2 descriptor formats · ARM Architecture Reference Manual for ARMv8-A

22:23 <geist> been using that doc so long it's hard to remember how hard it is

22:24 <mrvn> geist: In my mind the whole document is backward. It first describes all the millions of exceptions before giving even the base format. The docs should start with a graphic showing the required layout all cpus must have.

22:25 <geist> yah i suppose. trouble is there really are a bunch of variants

22:25 <sortie> I just merged a links(1) port made by a contributor :)

22:25 <geist> there really isn't a base format, because the base format is 'you chose these options, this is what you get'

22:25 <sortie> I can now $LC_SEARCH :)

22:26 <mrvn> In the AMD docs it has 3 pictures of address translation for the various options. You look at it and then you can understand what the text below each is talking about.

22:26 <geist> yeah. the pic you seek is there, you just have to know which variant it is

22:26 <geist> it's pretty much what klange linked

22:27 <mrvn> yeah, it's there somewhere. But by the time you find it you are utterly confused.

22:27 <geist> it's the fact that it's so configurable that makes it hard. ie, there's no 4 level paging. it's paging can go up to 4 levels, dynamically

22:27 <geist> for example

22:28 <geist> i'm certain this is why info mmu doesn't exist on qemu. the amount of work to parse all the options is a lot more than x86 and especially riscv which is almost comically simple compared to the ARM versions

22:29 <mrvn> That's bad code design in qemu. The MMU should have an abstraction layer so that the implementation of the MMU also generates the info mmu output for free.

22:29 <klange> but qemu _implements_ the mmu, so surely the work was already done and it's just laziness...

22:29 <klange> yeah what mrvn said

22:30 <geist> right, yeah

22:30 <geist> i looked at the existing code for these and its in no way shared with the rest

22:30 <mrvn> Each CPU should just define "this is how a pagetable looks like" and then generic page walking code and info mmu takes over or similar.

22:31 <mrvn> With a few exceptions there should be a ton of code different cpus can share there.

22:31 <geist> well... that's probably not a good idea for speed purposes

22:31 <geist> trying to make a generic page walker routine that works on all arches is a recipe for a slow implementation

22:32 <mrvn> .oO(Isn't that what the TLB is for :)

22:32 <geist> sure but the TLB is also per arch too

22:32 <mrvn> qemu has a TLB?

22:32 <geist> i actually dunno how qemu handles translations

22:32 <klange> TCG has a generic TLB

22:32 <geist> it almost certainly has some sort of fast cache yes

22:33 <mrvn> I don't think it emulates the actual TLB.

22:33 <klange> last I checked it was not per-arch

22:33 <klange> and does not emulate the arch-specific TLB behaviors

22:33 <geist> right. some sort of generic translation cache indeed

22:33 <geist> and since it's compiled again and again it can have some #defines to change its behavior

22:36 <mrvn> In the page table format a "block entry" is a huge page, right?

22:36 nyah has quit [Quit: leaving]

22:37 <geist> yah

22:37 <geist> basically a terminal page table entry

22:37 <mrvn> what about 4k pages? Are those blocks too?

22:37 <geist> not at the last level

22:38 <geist> they are, but i think they're just not called block entries

22:38 <geist> probably for legacy reasons. i *think* they act exactly identically if it's an entry at L0 or higher up the stack

22:39 <mrvn> Look identical apart from the number of reserved bits

22:39 <geist> i think in arm32 they were called block entries yes

22:39 <geist> also there are 'contiguous pages' which are different

22:39 <geist> that's when you mark say 8 or 16 pages in the final page table entry as combined with the nearby ones

22:40 <geist> ie, how you get 64k pages in a 4K page granule table
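A sketch of the contiguous hint with a 4 KiB granule, where it applies to a run of 16 adjacent level-3 entries covering a 64 KiB, 64 KiB-aligned region of both VA and PA (other granules use different run lengths). The contiguous bit is descriptor bit 52; `attrs` and the helper names are assumptions, and a real mapping needs the access flag and memory attributes in `attrs`.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_4K   UINT64_C(0x1000)
#define CONT_ENTRIES   16
#define CONT_SIZE      (CONT_ENTRIES * PAGE_SIZE_4K)   /* 64 KiB */
#define PTE_CONTIGUOUS (UINT64_C(1) << 52)

/* Can [va, va+len) -> pa be mapped with at least one contiguous run? */
int can_use_contiguous(uint64_t va, uint64_t pa, uint64_t len)
{
    return len >= CONT_SIZE && (va % CONT_SIZE) == 0 && (pa % CONT_SIZE) == 0;
}

/* Fill 16 level-3 page descriptors for one 64 KiB chunk. All entries must
 * carry identical attributes plus the contiguous bit. */
void fill_contiguous_run(uint64_t *pte, uint64_t pa, uint64_t attrs)
{
    assert(pa % CONT_SIZE == 0);
    for (int i = 0; i < CONT_ENTRIES; i++)
        pte[i] = (pa + (uint64_t)i * PAGE_SIZE_4K) | attrs | PTE_CONTIGUOUS
               | 0x3;  /* bits [1:0] = 0b11: page descriptor at level 3 */
}
```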

22:40 <mrvn> hah, not the same. A block ends in 01. A page ends in 11 (like a table entry)

22:40 <geist> well, yeah not *identically*. they're marked differently for sure

22:40 <mrvn> And 01 on the lowest level is reserved. Why did they do that?

22:40 * geist shrugs

22:41 <mrvn> Now I need 2 descriptions for the same bit. Level 0,1,2 it's Block vs. Table and level 3 it's reserved vs. page.

22:42 <geist> right, was going to say if you're going to try to build some sort of bitfield struct for it, you're going to have a bad time

22:42 <geist> there's a fair amount of 'if this then interpret this as that'

22:42 <geist> in the upper PTE attributes especially
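Since a bitfield struct fits poorly, a sketch that decodes the descriptor type with masks instead, following the reading of bits [1:0] above: at levels 0 to 2, 0b01 is a block and 0b11 a table; at level 3, 0b11 is a page and 0b01 is reserved. It assumes a 4 KiB granule, stage 1, 48-bit output addresses and no FEAT_LPA2 (so a level-0 block is treated as invalid); the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum desc_kind { DESC_INVALID, DESC_BLOCK, DESC_TABLE, DESC_PAGE };

/* Classify a descriptor from its low two bits and its lookup level. */
enum desc_kind classify_desc(uint64_t desc, int level)
{
    assert(level >= 0 && level <= 3);
    switch (desc & 0x3) {
    case 0x3:
        return level == 3 ? DESC_PAGE : DESC_TABLE;
    case 0x1:
        if (level == 1 || level == 2)
            return DESC_BLOCK;
        return DESC_INVALID;  /* reserved at level 3; no level-0 blocks here */
    default:
        return DESC_INVALID;  /* bit 0 clear: invalid at every level */
    }
}

/* Output address of a page or table descriptor: bits [47:12]. */
uint64_t desc_output_addr(uint64_t desc)
{
    return desc & UINT64_C(0x0000fffffffff000);
}
```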

22:42 nyah has joined #osdev

22:43 <geist> i know its not your style but if you need some reference i've already flattened most to https://github.com/littlekernel/lk/blob/master/arch/arm64/include/arch/arm64/mmu.h#L172

22:43 <bslsk05> github.com: lk/mmu.h at master · littlekernel/lk · GitHub

22:43 <mrvn> I guess nobody at ARM likes the fractal/recursive page mapping trick from x86.

22:43 <geist> yah does not work, i do not think

22:43 <geist> but then i dont either, so no loss for me
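For reference, the x86-64 recursive ("fractal") trick in a sketch: point one PML4 slot back at the PML4 itself, and the PTE (or PDE) for any virtual address then appears at a fixed, computable virtual address. It assumes 4-level paging and uses slot 510 only as an example; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend bit 47 into bits [63:48] to form a canonical address. */
static uint64_t canonical(uint64_t a)
{
    return (a & (UINT64_C(1) << 47)) ? (a | UINT64_C(0xffff000000000000)) : a;
}

/* One trip through recursive slot r: va's four table indices shift down one
 * level, so the final "page" reached is the page table mapping va. */
uint64_t recursive_pte_addr(uint64_t va, unsigned r)
{
    return canonical(((uint64_t)r << 39) | ((va >> 9) & UINT64_C(0x7ffffffff8)));
}

/* Two trips through slot r land on the page directory entry for va. */
uint64_t recursive_pde_addr(uint64_t va, unsigned r)
{
    return canonical(((uint64_t)r << 39) | ((uint64_t)r << 30) |
                     ((va >> 18) & UINT64_C(0x3ffffff8)));
}
```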

22:47 <klange> just seems fragile and inflexible to me

22:48 <geist> agreed and as i've pointed out before it starts to get very hard to manage in a multi-cpu environment

22:48 <geist> also has some issues with seeing data before it's available, etc. basically in production stuff it gets hard to deal with

22:49 <klange> Really need to focus on USB stuff this week... I'm starting to face a deadline...

22:49 <mrvn> You always have that on the kernel level. When you map a page it's there but uninitialized.

22:50 <geist> right which is why you can't map a page before initialized

22:51 <geist> also means you can't add a page to the page table structure before it's initialized and memory barriered

22:52 <mrvn> I don't see where a fractal mapping is any different there. You map it to a temp location, initialize, map it to the real location.

22:52 <geist> you just hit it right on the head

22:52 <geist> you map it to one place, then unmap it and map it again

22:52 <geist> that's a *lot* of extra work

22:53 <mrvn> same amount of extra work with and without fractal mapping. you always do that.

22:53 <geist> not if you keep all pages mapped

22:53 <geist> then you dont have to map/unmap to access the page tables
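The alternative geist prefers is to keep every physical page permanently mapped, often called a direct map or physmap. Page-table pages can then be edited through ordinary pointers, with no temporary map/unmap. This is a sketch that assumes a hypothetical `PHYSMAP_BASE`, 48-bit physical addresses and a 64-bit build.

```c
#include <stdint.h>

#define PHYSMAP_BASE 0xFFFF800000000000ULL  /* hypothetical direct-map base */

/* All RAM is mapped once at PHYSMAP_BASE, so translation is plain arithmetic. */
static inline void *phys_to_virt(uint64_t pa) { return (void *)(uintptr_t)(PHYSMAP_BASE + pa); }
static inline uint64_t virt_to_phys(const void *va) { return (uint64_t)(uintptr_t)va - PHYSMAP_BASE; }

/* Follow a table descriptor to the next-level table without mapping anything. */
static inline uint64_t *next_table(uint64_t desc)
{
    return (uint64_t *)phys_to_virt(desc & 0x0000FFFFFFFFF000ULL);
}
```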

22:53 <mrvn> same with fractal mapping.

22:53 <geist> anyway i've had this discussion like 27 times over the years, dont want to get into it now

22:53 <mrvn> you can have fractal mapping and all pages mapped if you want.

22:53 <geist> there are a lot of reasons it becomes sub-optimal such that the utility of it does not outweigh the disadvantages

22:54 <geist> but they usually only show up once the design has reached enough sophistication that it causes trouble

22:54 <mrvn> Where I found it got complex was when copying data between different address spaces, e.g. 2 processes.

22:54 <geist> so it's great for hobby stuff, but over time becomes a crutch

22:54 <geist> and anyway it's x86-specific, which is just one grain of sand on the beach

22:55 <mrvn> What I like about it is that you can define a page table entry once and then use it at every level.

22:55 <geist> sure, it's nice while it works

22:55 <geist> not saying there's no utility in it, but that's moot. a thing that's nice but becomes a hindrance is something you just have to move past

22:56 <geist> it's like a simple library/API that you hang onto far past when you should have moved onto something more complicated and powerful

22:56 <geist> human nature is to keep trying to fit the square peg you're used to into the round hole

22:58 vdamewood has joined #osdev

22:58 <mrvn> I need an App for the page tables. Something that lets me set all the bits in control registers and then shows me just the docs pertaining to that config.

22:58 <geist> anyway, sorry about being salty about this one

22:58 <geist> i've just had this argument before

22:58 <geist> mrvn: yeah years ago someone i know wrote a python script to decode page tables and it was really neat
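In the spirit of that decoding script, here is a small C decoder for the common fields of an arm64 stage-1 page or block descriptor. It is a hypothetical sketch that assumes the 4KB granule and 48-bit output addresses. It does not handle the level-dependent cases discussed earlier.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

struct pte_fields {
    uint64_t out_addr;  /* bits [47:12]: output (physical) address */
    unsigned attr_indx; /* bits [4:2]: index into MAIR_EL1 */
    unsigned ns;        /* bit 5: non-secure */
    unsigned ap;        /* bits [7:6]: access permissions */
    unsigned sh;        /* bits [9:8]: shareability */
    unsigned af;        /* bit 10: access flag */
    unsigned ng;        /* bit 11: not global (ASID-tagged) */
    unsigned pxn;       /* bit 53: privileged execute-never */
    unsigned uxn;       /* bit 54: unprivileged execute-never */
};

static struct pte_fields decode_pte(uint64_t d)
{
    struct pte_fields f;
    f.out_addr  = d & 0x0000FFFFFFFFF000ULL;
    f.attr_indx = (unsigned)(d >> 2) & 0x7;
    f.ns        = (unsigned)(d >> 5) & 0x1;
    f.ap        = (unsigned)(d >> 6) & 0x3;
    f.sh        = (unsigned)(d >> 8) & 0x3;
    f.af        = (unsigned)(d >> 10) & 0x1;
    f.ng        = (unsigned)(d >> 11) & 0x1;
    f.pxn       = (unsigned)(d >> 53) & 0x1;
    f.uxn       = (unsigned)(d >> 54) & 0x1;
    return f;
}

static void print_pte(uint64_t d)
{
    struct pte_fields f = decode_pte(d);
    printf("PA=0x%" PRIx64 " attr=%u ns=%u ap=%u sh=%u af=%u ng=%u pxn=%u uxn=%u\n",
           f.out_addr, f.attr_indx, f.ns, f.ap, f.sh, f.af, f.ng, f.pxn, f.uxn);
}
```

For example, `print_pte(0x0060000040000707ULL)` prints `PA=0x40000000 attr=1 ns=0 ap=0 sh=3 af=1 ng=0 pxn=1 uxn=1`.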

22:58 <mrvn> no problem. I'm just testy because I'm having information overload here.

22:59 <geist> yah

22:59 <gog> now what about using only the page table recursively

22:59 <geist> i totally get that

23:00 <geist> it's like every time i try to dig into memory barriers or memory order on ARM it's like <sigh> and then half the day is wasted digging through the manual

23:00 <mrvn> If I could just mask out all the bits about EL2/EL3 the docs would be half the size already.

23:00 <klange> much ugh, very bleh

23:02 <klange> I still have lingering issues that I think are related to instruction caches on return from fork()

23:02 <mrvn> on fork()? Not exec()?

23:03 <klange> the child process on return from fork() _without_ Cow.

23:03 <klange> Bovines might fix it.

23:03 <klange> Or at least work around it.

23:03 <mrvn> but the instructions should be identical for parent and child.

23:03 <klange> But the child starts on a core that was running something else previously.

23:03 bradd has quit [Ping timeout: 256 seconds]

23:04 <geist> are you using ASIDs when context switching?

23:04 <klange> what's an asid

23:04 <geist> so you're not. okay. so lemme think

23:04 <mrvn> ahh. So you get instructions from a previous random process.

23:04 vinleod has joined #osdev

23:04 <mrvn> Doesn't reloading the page table register flush the instruction cache?

23:05 <klange> doesn't seem to, no

23:05 <geist> hmm, that is strange

23:05 <geist> so I and D cache on arm64 is supposed to be VIPT, or at least behave that way

23:05 <mrvn> fork() and multitasking should be the same there. If fork() fails the multitasking should fail too.

23:06 <geist> so unless you're reusing the same physical pages (which may be your problem) you shouldn't get a bad alias

23:06 vdamewood has quit [Killed (erbium.libera.chat (Nickname regained by services))]

23:06 vinleod is now known as vdamewood

23:06 <geist> however. if you had a physical page A with some instructions in it that the cpu ran some time in the past

23:06 bradd has joined #osdev

23:06 <geist> and it gets recycled, mapping a *different* piece of code that you then run without dumping the icache

23:06 <geist> you can run old code that was there

23:06 <klange> ^ The physical pages are almost definitely reused things with other code

23:06 <geist> so possibly during fork you're cycling through some pages

23:07 <mrvn> Are you copying the pages for the .text segments on fork()?

23:07 <geist> okay, so basically what you need to do here is whenever you allocate a page for the purposes of mapping code and go to map it, you need to dump the icache for it

23:07 <geist> this can happen even without fork, if you were demand faulting in stuff from .text and pulling in one page at a time

23:07 <geist> you have to make sure there's no stale icache entries that cover it

23:08 <geist> of course you also need to flush the icache whenever you modify the data in the page too

23:08 <mrvn> or when migrating a process to another core

23:08 <geist> which you can kinda think of is also what happens when you copy or fill a page with code

23:08 <geist> no, not when migrating

23:08 <geist> or at least if you follow the rules above, then multicore doesn't matter

23:08 <mrvn> if you haven't flushed the other cores' icaches when loading the .text then you have to do it on migrate.

23:09 <geist> but that's not how flushes work on armv8

23:09 <geist> you haven't gotten to that particular pile of complexium yet

23:09 <geist> TL;DR if you alway use global (broadcast) flushes then when you flush locally you flush on all cores simultaneously

23:09 <geist> and there's almost no reason to not do that all the time

23:09 <geist> *especially* for icache flushes
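geist's point about broadcast maintenance also applies to the TLB. The Inner Shareable forms of TLBI reach every core in the domain without IPIs. In the sketch below the function names are hypothetical. The operand packing, with VA[55:12] in bits [43:0] and the ASID in bits [63:48], is the TLBI VAE1IS format.

```c
#include <stdint.h>

/* Pack VA[55:12] into bits [43:0] and the ASID into bits [63:48]. */
static inline uint64_t tlbi_va_operand(uint64_t va, uint16_t asid)
{
    return ((va >> 12) & 0xFFFFFFFFFFFULL) | ((uint64_t)asid << 48);
}

/* Invalidate one page's TLB entries on every core in the inner shareable domain. */
static inline void tlb_flush_page_all_cpus(uint64_t va, uint16_t asid)
{
#if defined(__aarch64__)
    uint64_t op = tlbi_va_operand(va, asid);
    __asm__ volatile("dsb ishst\n\t"       /* make the PTE update visible first */
                     "tlbi vae1is, %0\n\t" /* broadcast invalidate by VA + ASID */
                     "dsb ish\n\t"         /* wait for completion on all cores */
                     "isb"
                     :: "r"(op) : "memory");
#else
    (void)va; (void)asid;                  /* not an arm64 build */
#endif
}
```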

23:09 <mrvn> ahh, nice. that's one headache gone.

23:10 <geist> yah TLB and cache maintenance is much more straightforward (though of course complex) on armv8

23:10 <geist> than v7 and especially x86

23:11 <geist> klange: so yeah back to the basic first principle: before you allow a process to run code on a page that has been modified since the last time code *may* have run on it, you must flush the icache for that page (or globally)

23:11 <geist> if you always follow that rule you should be okay
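Here is geist's first principle applied to one range of freshly written code, as a sketch. It cleans the D-cache to the Point of Unification, invalidates the I-cache by VA, then synchronizes, reading the line sizes from CTR_EL0. The function names are hypothetical. Invalidating by VA through one alias may not catch every alias on all I-cache types. That is one reason geist suggests a global flush for now.

```c
#include <stddef.h>
#include <stdint.h>

/* CTR_EL0.DminLine [19:16] and IminLine [3:0] are log2 of line size in 4-byte words. */
static inline size_t ctr_dline(uint64_t ctr) { return (size_t)4 << ((ctr >> 16) & 0xF); }
static inline size_t ctr_iline(uint64_t ctr) { return (size_t)4 << (ctr & 0xF); }

static void sync_icache_range(uintptr_t start, size_t len)
{
#if defined(__aarch64__)
    uint64_t ctr;
    __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
    uintptr_t end = start + len;
    size_t dl = ctr_dline(ctr), il = ctr_iline(ctr);
    /* Clean new instructions out of the D-cache to the PoU. */
    for (uintptr_t p = start & ~(uintptr_t)(dl - 1); p < end; p += dl)
        __asm__ volatile("dc cvau, %0" :: "r"(p) : "memory");
    __asm__ volatile("dsb ish" ::: "memory");
    /* Drop any stale I-cache lines covering the range. */
    for (uintptr_t p = start & ~(uintptr_t)(il - 1); p < end; p += il)
        __asm__ volatile("ic ivau, %0" :: "r"(p) : "memory");
    __asm__ volatile("dsb ish\n\tisb" ::: "memory");
#else
    (void)start; (void)len;  /* not an arm64 build */
#endif
}
```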

23:13 Oli has quit [Quit: leaving]

23:17 <klange> I think I see what I did wrong. I was actually trying to clear icache, but I was calling a function I had set up for exec + ld.so and it does a page walk to check if the address given is accessible to userspace...

23:17 <klange> And I was running it on a temporary kernel mapping

23:17 <geist> yah for now i'd do a global i cache flush

23:17 <geist> it's a single instruction iirc

23:17 <geist> there are Reasons for that (VIVT vs VIPT)

23:18 <geist> doing a global flush is sub optimal but it ensures that all aliases to the page are got

23:18 <geist> (there's a bit in CTR_EL0 that tells you if you can do this or not, but a global flush is safer for now)
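The global alternative is a single instruction plus barriers. This sketch assumes EL1, since IC IALLUIS is not available at EL0, and that the D-cache was already cleaned to the PoU. `ctr_l1ip` and `ctr_dic` are hypothetical helpers for two CTR_EL0 fields relevant to how much I-cache maintenance is needed. Which exact bit geist has in mind is not stated in the log.

```c
#include <stdint.h>

/* CTR_EL0.L1Ip [15:14]: L1 I-cache indexing/tagging policy. */
enum l1ip { L1IP_RESERVED_OR_VPIPT = 0, L1IP_AIVIVT = 1, L1IP_VIPT = 2, L1IP_PIPT = 3 };

static inline enum l1ip ctr_l1ip(uint64_t ctr) { return (enum l1ip)((ctr >> 14) & 3); }

/* CTR_EL0.DIC (bit 29): 1 means I-cache invalidation is not required for coherence. */
static inline int ctr_dic(uint64_t ctr) { return (int)((ctr >> 29) & 1); }

/* Invalidate all instruction caches on every core in the inner shareable domain (EL1 only). */
static inline void icache_invalidate_all(void)
{
#if defined(__aarch64__)
    __asm__ volatile("ic ialluis\n\tdsb ish\n\tisb" ::: "memory");
#endif
}
```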

23:19 <geist> these are all things i've learned the hard way over the last few years

23:19 <mrvn> Are there ARMv8 that don't support 4kB granule?

23:20 <geist> not that i know of. but 16K and 64k are not guaranteed

23:20 <geist> if there's gonna be a core that ditches the 4k granule it'll be apple with their M2 or whatnot

23:20 <geist> because OSX runs with 16k granules. i suspect they only have 4k around because of x86 emulation

23:21 <geist> most newer ARM cores, post about cortex-a75 or so support all 3
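Granule support can be checked at runtime rather than assumed, by decoding the TGran fields of ID_AA64MMFR0_EL1 (read with `mrs` at EL1). This is a sketch with hypothetical names. Note the inverted sense of TGran16, where 0 means not supported.

```c
#include <stdint.h>

struct granules { int g4k, g16k, g64k; };

/* TGran4 [31:28]:  0b1111 = not supported, otherwise supported (0b0001 adds 52-bit).
 * TGran64 [27:24]: 0b1111 = not supported, 0b0000 = supported.
 * TGran16 [23:20]: 0b0000 = NOT supported, 0b0001/0b0010 = supported. */
static struct granules decode_granules(uint64_t mmfr0)
{
    unsigned t4  = (unsigned)(mmfr0 >> 28) & 0xF;
    unsigned t64 = (unsigned)(mmfr0 >> 24) & 0xF;
    unsigned t16 = (unsigned)(mmfr0 >> 20) & 0xF;
    struct granules g = { t4 != 0xF, t16 != 0, t64 != 0xF };
    return g;
}
```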

23:22 <klange> I have no solid way of testing this beyond running `sysinfo` (does a bunch of forks to call shells that call utilities) over and over and see if anything crashes

23:22 <klange> so far so good

23:23 <geist> yah that's the hard part. this is a toughy to test, or even reproduce reliably

23:24 <klange> unrelated, but ever since I moved a couple of cables around on my desk my serial console has been stable :)

23:24 <clever> when i was getting linux to boot on the pi3 i think it was, i had a real nasty problem, where as soon as i got into userland, i had major data corruption

23:24 <clever> i eventually tracked it down to linux lacking permission to flush the arm L2 cache

23:24 xenos1984 has quit [Read error: Connection reset by peer]

23:24 <klange> I think it was electrical interference from charger.

23:24 <clever> so any dma was not being coherent with the caches

23:25 <clever> i think it was ACTLR that controlled that?

23:25 <clever> but linux ran surprisingly well in that mess, until you hit userland

23:28 <klange> My startup process also launches about 60 or so processes before getting to a desktop, so booting at all is at least somewhat of a stress test.

23:30 <geist> yah though fresh processes will never trigger a stale thing

23:30 <geist> it's when you start cycling through things

23:31 <mrvn> if you don't reuse pages immediately you can run quite a bit before a page gets reused and by then the cache is probably clear.

23:31 bauen1 has joined #osdev

23:32 <klange> I think my current physical page allocation pretty aggressively reuses pages.

23:32 <mrvn> or if you have ASID that probably extends the time till you can get a collision.

23:32 <geist> indeed, though cache entries aren't tagged by ASID

23:32 [itchyjunk] has quit [Remote host closed the connection]

23:32 <geist> ASID just lets you keep from excessively flushing TLB entries
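For reference, this is what using ASIDs looks like at context-switch time. The ASID goes into the top bits of TTBR0_EL1 next to the table address, so other processes' TLB entries can stay resident. It affects the TLB only, as geist says. The sketch assumes TCR_EL1.A1 = 0 (ASID taken from TTBR0), and the helper names are hypothetical.

```c
#include <stdint.h>

/* BADDR in bits [47:1] (bit 0 is CnP, left clear here), ASID in bits [63:48]. */
static inline uint64_t make_ttbr0(uint64_t table_phys, uint16_t asid)
{
    return (table_phys & 0x0000FFFFFFFFFFFEULL) | ((uint64_t)asid << 48);
}

/* Switch user address space without flushing the TLB (EL1 only). */
static inline void switch_ttbr0(uint64_t table_phys, uint16_t asid)
{
#if defined(__aarch64__)
    __asm__ volatile("msr ttbr0_el1, %0\n\tisb" :: "r"(make_ttbr0(table_phys, asid)) : "memory");
#else
    (void)table_phys; (void)asid;  /* not an arm64 build */
#endif
}
```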

23:33 <mrvn> my page allocation is FILO or LIFO too.

23:34 <klange> I have a dumb bitmap allocator with a floating index to the next place it thinks has available space. Whenever something is freed that is lower than that index, the index will be moved down; whenever any allocation happen, the index moves to the page after that; the bitmap gets walked from that index until an available page is reached; if it hits the end, it resets the index to the bottom and tries again

23:34 <klange> before panicking.
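Here is klange's allocator sketched in C under stated assumptions: a fixed `NPAGES`, one bit per page (1 = used) and hypothetical `page_alloc`/`page_free` names. The hint moves down on free and just past each allocation. A scan that hits the end restarts once from the bottom before giving up.

```c
#include <stddef.h>
#include <stdint.h>

#define NPAGES 1024
static uint8_t page_bitmap[NPAGES / 8];
static size_t next_hint;               /* where the next free page probably is */

static int page_used(size_t i) { return (page_bitmap[i / 8] >> (i % 8)) & 1; }

static void set_used(size_t i, int used)
{
    if (used)
        page_bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
    else
        page_bitmap[i / 8] &= (uint8_t)~(1u << (i % 8));
}

static long page_alloc(void)
{
    for (int pass = 0; pass < 2; pass++) {
        for (size_t i = next_hint; i < NPAGES; i++) {
            if (!page_used(i)) {
                set_used(i, 1);
                next_hint = i + 1;     /* move the hint past this page */
                return (long)i;
            }
        }
        next_hint = 0;                 /* hit the end: retry from the bottom */
    }
    return -1;                         /* out of memory: a kernel would panic here */
}

static void page_free(size_t i)
{
    set_used(i, 0);
    if (i < next_hint)
        next_hint = i;                 /* freed below the hint: move it down */
}
```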

23:35 <mrvn> I just have a stack. No memory wasted.
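mrvn's stack can cost no extra memory if the free list is threaded through the free pages themselves. This is a sketch that assumes free pages are directly addressable, with hypothetical names. Because it is LIFO, the most recently freed page is reused first. That is exactly the aggressive reuse that makes stale I-cache lines more likely to show up.

```c
#include <stddef.h>
#include <stdint.h>

/* Each free page stores the link to the next free page in its first bytes. */
struct free_page { struct free_page *next; };
static struct free_page *free_head;

static void page_push(void *page)
{
    struct free_page *p = page;
    p->next = free_head;
    free_head = p;
}

static void *page_pop(void)
{
    struct free_page *p = free_head;
    if (p)
        free_head = p->next;
    return p;                          /* NULL when out of memory */
}
```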

23:35 <klange> I can fairly easily find arbitrarily sized contiguous segments.
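The bitmap's advantage for contiguous ranges can be sketched as a scan for `count` free pages starting on an `align` boundary. For example, 512 pages would be one 2MB block with 4KB pages. The bit layout (1 = used, LSB first) and the names are assumptions, and `align` must be nonzero.

```c
#include <stddef.h>
#include <stdint.h>

static int bit_used(const uint8_t *bm, size_t i) { return (bm[i / 8] >> (i % 8)) & 1; }

/* Returns the first page index of an aligned free run of `count` pages, or -1. */
static long find_free_run(const uint8_t *bm, size_t npages, size_t count, size_t align)
{
    for (size_t start = 0; start + count <= npages; start += align) {
        size_t n = 0;
        while (n < count && !bit_used(bm, start + n))
            n++;
        if (n == count)
            return (long)start;
    }
    return -1;
}
```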

23:35 <mrvn> assuming you have any. :)

23:35 <mrvn> I decided that when I need it I can defrag physical memory.

23:37 <mrvn> do you support huge pages?

23:37 <klange> Not for anything I give to userspace.

23:37 <mrvn> for anything that may allocate later or just during boot?

23:38 <klange> Just during boot.

23:38 heat has joined #osdev

23:38 <mrvn> That's easy then. You always have long ranges free at boot.

23:39 <heat> updog

23:39 eddof13 has joined #osdev

23:40 <mrvn> My thought process was like this: Assume the system has been running a month. Now something wants a 2MB page. What's the chance the physical memory is not totally fragmented and there is a contiguous 2MB chunk?

23:40 <mrvn> Probably 0.

23:40 <zid> what's up dog

23:41 <mrvn> wuff wuff

23:41 <heat> it's all good

23:41 <heat> wbu

23:41 <zid> friendo isn't here to play scrabble with me tonight, sadge

23:42 sonny has joined #osdev

23:42 <mrvn> I once tried to make a scrabble AI that played optimally. It's rather hard (as in it takes ages to make a move).

23:42 xenos1984 has joined #osdev

23:42 <heat> friends that miss scrabble night are bad friends

23:42 <mrvn> friends should play scrabble online to avoid covid.

23:43 <nanovad> good news, covid isn't a valid scrabble word

23:44 <mrvn> Note to self: Build a pair of scrabble bots that synchronize a board over the internet.

23:44 <mrvn> nanovad: lol

23:45 <kingoffrance> consider all the words that slowly change meaning, and spelling ^ nanovad i think it trends towards halting problem

23:45 <mrvn> and by scrabble bot I mean: https://www.youtube.com/watch?v=pCBufhnrbDE

23:45 <bslsk05> 'ITRI's Scrabble-bot first look at CES 2018' by Engadget (00:01:49)

23:45 <kingoffrance> or at least, some kind of fractal jurassic park thing, chaos theory

23:45 <zid> of course it is

23:45 <kingoffrance> if php is a fractal of bad design, what is english?

23:46 <kingoffrance> words cannot describe

23:46 <mrvn> minus the AI. just placement of tiles.

23:46 eddof13 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

23:47 <klange> kingoffrance: English is a pickpocketer that has pilfered phonetics and philology from parents and pals

23:48 <zid> English corners languages in dark alleys for loose words and grammar

23:50 <clever> english isnt a language, its 5 languages in a trench-coat :P

23:51 <heat> tbf english is pretty simple when compared with other european languages for instance

23:51 <clever> "i before e except after c" was it?

23:51 <zid> no

23:51 <clever> how many exceptions are there to that rule?

23:51 <zid> that has never once been a rule, just a bad joke

23:53 <zid> English is complicated by just.. baggage: none of the vowels line up with how they're *presently* pronounced (which will again change), it has more vowels than it has vowel letters by approximately 25, and it variously re-spells and doesn't re-spell words of foreign origin, which eliminates things like phonotactic rules etc

23:54 <kazinsal> English is the Habsburg jaw of languages

23:54 <mrvn> I think I will use my favourite way to write asm to implement aarch64 boot.S: gcc -S

23:54 <mrvn> Urgs, that's a new warning: kernel/main.cc:27:10: warning: array subscript 0 is outside array bounds of ‘volatile unsigned int [0]’ [-Warray-bounds] 27 | while (*UART0_FR & (1 << 5) ) { }

23:54 <zid> incredibly free form grammar also makes incredible nuance often present

23:54 FreeFull has quit []

23:54 <zid> but at least it isn't japanese.

23:54 <mrvn> Since when can't I dereference a pointer?

23:55 <geist> dunno, how did you declare the UART0_FR?

23:55 <geist> if it's some [0] stuff that's probably not okay anymore

23:55 <mrvn> volatile unsigned int *UART0_FR = (volatile unsigned int *)(UART0_BASE + 0x18);

23:55 <zid> oh no, french uarts

23:56 <geist> huh. that's odd

23:56 <mrvn> zid: I think it's the Flag Register
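On a PL011, UARTFR sits at offset 0x18, and bit 5 is TXFF (transmit FIFO full), which the loop in the warning above spins on. The sketch below routes MMIO through small inline accessors so all device accesses live in one place. Whether that also avoids this particular -Warray-bounds diagnostic depends on the GCC version. `base` stands for whatever UART0_BASE is on the board, and the function names are hypothetical.

```c
#include <stdint.h>

#define PL011_DR      0x00        /* data register */
#define PL011_FR      0x18        /* flag register */
#define PL011_FR_TXFF (1u << 5)   /* transmit FIFO full */

static inline uint32_t mmio_read32(uintptr_t addr) { return *(volatile uint32_t *)addr; }
static inline void mmio_write32(uintptr_t addr, uint32_t v) { *(volatile uint32_t *)addr = v; }

static void pl011_putc(uintptr_t base, char c)
{
    while (mmio_read32(base + PL011_FR) & PL011_FR_TXFF)
        ;                                          /* wait for room in the TX FIFO */
    mmio_write32(base + PL011_DR, (uint32_t)(unsigned char)c);
}
```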

23:56 <heat> mrvn, you can always use C

23:56 <heat> no need to -S

23:56 <mrvn> heat: boot.S. Have to setup the stack and MMU first.

23:57 <klange> stack yes, mmu not really I do my mmu setup in C

23:57 <heat> yes, C still works

23:57 <heat> all you need to give it is a stack

23:57 <heat> and that's trivial

23:57 <geist> not really. no

23:57 <mrvn> heat: nope, the code isn't 100% position independent. Have to map it to a fixed address first.

23:57 <geist> in this case there's a fair amount of setup before you can run C on arm64

23:57 <heat> how?

23:57 <geist> stack, control registers, position independent code, etc

23:58 <clever> PIC is the biggest problem i can see there

23:58 <geist> but you dont have to write a *lot* of asm, you just can't get away with doing it in pure C

23:58 <mrvn> heat: any static initialization involving an address of something is not PIC
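One common way to live with such static initializers is a different technique from mrvn's `gcc -S` inspection. The kernel is linked as PIE, and early boot code applies its own `R_AARCH64_RELATIVE` relocations, storing load base + addend at each listed offset. This is a sketch assuming the image was linked at address 0 and that the relocation loop itself is position independent. `struct rela` mirrors `Elf64_Rela`, and the linker symbols that would delimit `.rela.dyn` are not shown.

```c
#include <stddef.h>
#include <stdint.h>

#define R_AARCH64_RELATIVE 1027

struct rela { uint64_t r_offset; uint64_t r_info; int64_t r_addend; };

/* Apply RELATIVE relocations for an image linked at 0 and loaded at `base`. */
static void apply_relative_relocs(uintptr_t base, const struct rela *r, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if ((uint32_t)r[i].r_info != R_AARCH64_RELATIVE)  /* ELF64_R_TYPE */
            continue;
        uint64_t *where = (uint64_t *)(base + r[i].r_offset);
        *where = base + (uint64_t)r[i].r_addend;          /* B + A */
    }
}
```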

23:58 <geist> there are some control regs you need to set up pretty quickly because of assumptions of C

23:58 <geist> SCTLR bits basically
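geist doesn't list which bits he means, but the following is the kind of setup that has to happen before compiled C runs at EL1. Which bits a given kernel sets is a design choice. It is written in C for readability, with hypothetical names. In practice it lives in boot.S or is compiled with `-mgeneral-regs-only`. Unaligned accesses to Device memory fault regardless of SCTLR.A, and with the MMU off all data accesses are treated as Device memory. That is one more reason early C code is fragile.

```c
#include <stdint.h>

#define CPACR_FPEN_MASK (3ULL << 20)  /* CPACR_EL1.FPEN [21:20] = 0b11: don't trap FP/SIMD */
#define SCTLR_A         (1ULL << 1)   /* SCTLR_EL1.A: alignment fault checking */
#define SCTLR_SA        (1ULL << 3)   /* SCTLR_EL1.SA: SP 16-byte alignment check */

/* Compilers emit SIMD for memcpy, struct copies, etc., so FP/SIMD must not trap. */
static inline uint64_t cpacr_enable_fp(uint64_t cpacr) { return cpacr | CPACR_FPEN_MASK; }

/* Allow unaligned Normal-memory accesses; keep stack alignment checking. */
static inline uint64_t sctlr_for_c(uint64_t sctlr) { return (sctlr & ~SCTLR_A) | SCTLR_SA; }

static void early_cpu_setup(void)  /* EL1 only */
{
#if defined(__aarch64__)
    uint64_t v;
    __asm__ volatile("mrs %0, cpacr_el1" : "=r"(v));
    __asm__ volatile("msr cpacr_el1, %0\n\tisb" :: "r"(cpacr_enable_fp(v)) : "memory");
    __asm__ volatile("mrs %0, sctlr_el1" : "=r"(v));
    __asm__ volatile("msr sctlr_el1, %0\n\tisb" :: "r"(sctlr_for_c(v)) : "memory");
#endif
}
```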

23:58 <heat> well, yes, I meant that you write a small _start that pretty much starts calling C code

23:58 <geist> not a big deal, just gotta do it before getting into C

23:59 <mrvn> heat: and I do that in C, g++ -S, check and fix PIC issues.

23:59 <geist> yah or godbolt, good way to learn the arch