#osdev on 2023-03-29 — irc logs at libera.irclog.whitequark.org

2021-05-23 01:57 klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books

00:04 rnicholl1 has quit [Quit: My laptop has gone to sleep.]

00:07 elastic_dog is now known as Guest497

00:07 elastic_dog has joined #osdev

00:25 <geist> yay

00:26 <heat> overcomplicating things for the sake of performance is always fun

00:27 <heat> believe me!

00:28 <bnchs> so i'm thinking of a data structure

00:29 <heat> wait, what's this "memory map" for?

00:29 <bnchs> heat: memory map for an emulated CPU

00:30 <bnchs> i'm literally trying to design one in a drawing

00:30 <heat> you could totally have an rb tree here

00:31 <heat> I think, I don't know how fragmented your thing is

00:45 <bnchs> heat: i read my OS's documentation, it says fragmentation is an issue in their implementation

00:45 <bnchs> sooo, let's just say, i'll implement that

00:46 <bnchs> and call it a feature

00:47 Coldberg has joined #osdev

00:48 C-Man has quit [Read error: Connection reset by peer]

00:48 <bnchs> also thanks for the idea, i was thinking of something similar to that, but i forgot the name

00:51 rnicholl1 has joined #osdev

01:16 <heat> bnchs, implementation of what?

01:16 <bnchs> heat: implementation of their memory allocation system

01:16 <heat> but you're not supposed to allocate your emulator's memory chunk by chunk

01:17 <heat> you get a big chunk of memory thru mmap or something, and it Just Works

01:19 <heat> in all honesty since using an actual tree probably sucks, you may want some other scheme

01:20 <heat> the stupid simple one is to have from 0 to N as DRAM, and then from N to infinity and beyond as MMIO

01:20 <heat> this is how x86 PCs work

01:21 <heat> except those weird fucking bits in the legacy space and the stuff right under 4GB, but those are just warts in x86 (who would've guessed, warts in x86)

01:24 <netbsduser> the virtio spec is a bit of a joke

01:25 <netbsduser> its virtio-fs section basically says "lmao we have not described the protocol, here is a link to a linux header file"

01:25 <heat> that's also a problem with the virtio-gpu stuff

01:25 <heat> particularly the 3D bits

01:26 <netbsduser> the header alone is so far from adequate, at the least they could say what replies you get for different opcodes - i eventually gave up on one of them because there was no figuring it out (i would have to traipse through the linux kernel to figure it out)

01:27 <netbsduser> it turned out i made a mistake with one of them and was getting junk replies, which i finally fixed (and now gcc runs: https://i.imgur.com/8kSp4LH.png)

01:29 <heat> nice

01:29 <heat> did you check one of the BSDs?

01:29 <heat> fun fact: they check linux as docs all the time

01:29 <netbsduser> yeah, i used a mixture of free and openbsd

01:30 <heat> there's a good handful of stuff on the ext2/3/4 stuff that would be impossible to figure out unless you purposefully checked the linux code

01:30 <heat> also suspiciously similar code, etc

01:30 <heat> all very funny stuff

01:31 <heat> netbsduser, https://i.imgur.com/piCENbO.png im so fucking annoyed now

01:31 <netbsduser> virtio-fs follows the fuse wire protocol exactly while openbsd rolled their own

01:31 <zid`> ext_two_II_2_also_too

01:32 <netbsduser> heat: oh, it doesn't mount it

01:32 <netbsduser> it just found that gpt partition

01:32 <heat> yes its just

01:32 <heat> the name

01:33 <heat> ext_extended_2_two_second

01:33 <netbsduser> ext4 is also poorly documented, the article on it on some linux wiki or other was clearly given up very quickly

01:33 <heat> it's not

01:33 <gog> hih

01:33 <heat> there's a full spec over at kernel.org

01:33 <zid`> goggles

01:33 <gog> ziddles

01:34 <netbsduser> >"NOTE: This is a work in progress, based on notes that the author (djwong) made while picking apart a filesystem by hand"

01:34 <zid`> gog, may I turn you into paste and spread you on toast

01:35 <gog> yeah

01:35 <heat> netbsduser, ok but it's entirely correct

01:35 <heat> and decently complete

01:35 <gog> what are you going to season me with

01:35 <netbsduser> heat: that's good to hear

01:36 <netbsduser> if i ever try ext4 at least i will have something to look at

01:36 <heat> i should complete/clarify some missing bits

01:36 <heat> but basically the big difference between ext4 and ext2 is just the extent stuff

01:37 <bnchs> heat: mmap in a 32-bit environment

01:37 <heat> and those are basically just a basic btree of {logical start, physical start, length}

01:37 <bnchs> now lemme get this straight, the address has to be within 0x0 to 0x00FFFFFF

01:37 <bnchs> also i'm not allocating my emulator's memory chunk by chunk

01:37 <heat> very, very standard stuff. ofc they don't really describe it as such, which is IMO one of the docs's shortcomings
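
The {logical start, physical start, length} description can be sketched as a lookup over one sorted run of extents, roughly what a single leaf of the tree holds. This is a minimal sketch, not the ext4 on-disk layout: `struct extent` and `extent_lookup` are hypothetical names, and the extents are assumed to be sorted by logical start and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory extent; NOT the on-disk ext4 structure. */
struct extent {
    uint32_t logical;   /* first logical block covered */
    uint64_t physical;  /* first physical block backing it */
    uint32_t len;       /* number of blocks in the run */
};

/*
 * Binary search over extents sorted by logical start.
 * Returns 0 and stores the physical block on success,
 * -1 if the logical block falls in a hole.
 */
static int extent_lookup(const struct extent *ext, size_t n,
                         uint32_t lblk, uint64_t *pblk)
{
    size_t lo = 0, hi = n;

    assert(pblk != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (lblk < ext[mid].logical) {
            hi = mid;
        } else if (lblk - ext[mid].logical >= ext[mid].len) {
            lo = mid + 1;
        } else {
            *pblk = ext[mid].physical + (lblk - ext[mid].logical);
            return 0;
        }
    }
    return -1;
}
```

A real driver would additionally walk index nodes down to the leaf and convert the little-endian on-disk fields before searching.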

01:38 <heat> the only thing I have not really looked at in depth is just the htree stuff. there are also some new features for uninit stuff (inode tables, bitmaps, and extents), mainly for virtualization purposes, etc. so your driver needs to handle those if writing

01:39 <heat> you also only need the htree stuff if writing, because it's backwards compatible

01:41 <heat> erm, the uninitialized extent stuff is not really for virtualization/DISCARD purposes but rather for block allocation

01:41 <heat> oh and RE journal, yeah you're kind of screwed there AFAIK

01:44 frkazoid333 has quit [Ping timeout: 240 seconds]

01:44 <bnchs> like this is a 32-bit processor

01:45 <heat> if you're not allocating chunk by chunk, why do you need an rb tree?

01:45 <heat> why is it fragmented?

01:45 <bnchs> i need a rb tree for the memory map

01:45 <bnchs> and the memory allocation system is separate from it

01:45 <heat> you do not

01:45 <bnchs> what do i need?

01:45 <heat> i told you what you need

01:46 <heat> a basic way to separate mmio accesses from memory accesses, memory gets all mapped straight up using mmap

01:46 <bnchs> again, how can mmap help me here

01:46 <bnchs> huh

01:46 <heat> user does ./myemu -m 1G

01:46 <heat> you mmap a 1GB chunk of memory, contiguous memory

01:49 <bnchs> how can mmap help make memory mappings that's.. basically like MMIO

01:49 <zid`> If only the cpu had a memory map built into its hw that you could use

01:49 <bnchs> when a program tries to read from it, it does a page fault, but only the OS can handle the page fault

01:49 <heat> why do you want to handle the page fault?

01:50 <clever> bnchs: if the OS cant handle the fault, it runs the SEGV handler, which is free to fake the answer and resume execution

01:50 <heat> (also, technically no, userfaultfd, but that's beside the point)

01:50 <bnchs> for virtualized memory-mapped input output

01:50 <heat> m8

01:50 <heat> can u read?

01:50 <bnchs> just like you said

01:50 <heat> <heat> i told you what you need

01:50 <heat> <heat> a basic way to separate mmio accesses from memory accesses, memory gets all mapped straight up using mmap

01:50 <clever> mmio can just be not mapped, trap the fault, then emulate the access in software

01:50 <heat> then on mmio you obviously need a separate way (rb tree or something should be fine, not perf critical)

01:51 <bnchs> clever: yes that's what i mean

01:51 <heat> but what matters here is that the stupidly common operation of reading and writing to DRAM Just Works
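
The scheme described here (one contiguous mmap'd chunk for guest RAM from 0 to N, everything above N treated as MMIO) might look roughly like the sketch below. All names (`struct guest`, `guest_init`, `mmio_read8`, and so on) are hypothetical, and the MMIO handlers are placeholders where a real emulator would consult its tree or table of devices.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical guest: RAM occupies [0, ram_size), anything above is MMIO. */
struct guest {
    uint8_t *ram;
    size_t   ram_size;
};

/* Reserve one contiguous, zero-filled chunk for guest RAM. 0 on success. */
static int guest_init(struct guest *g, size_t ram_size)
{
    void *p = mmap(NULL, ram_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    g->ram = p;
    g->ram_size = ram_size;
    return 0;
}

/* Placeholder device model: reads return 0xff, writes are dropped. */
static uint8_t mmio_read8(uint64_t addr) { (void)addr; return 0xff; }
static void mmio_write8(uint64_t addr, uint8_t v) { (void)addr; (void)v; }

static uint8_t guest_read8(const struct guest *g, uint64_t addr)
{
    assert(g->ram != NULL);
    if (addr < g->ram_size)
        return g->ram[addr];        /* common case: plain memory */
    return mmio_read8(addr);        /* rare case: device access */
}

static void guest_write8(struct guest *g, uint64_t addr, uint8_t v)
{
    assert(g->ram != NULL);
    if (addr < g->ram_size)
        g->ram[addr] = v;
    else
        mmio_write8(addr, v);
}
```

The point of the split is that the RAM path is a single bounds check and an array access, so the slower device lookup only runs for addresses outside RAM.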

01:52 <bnchs> also this is overkill and possibly not portable for something that doesn't even use x86

01:52 <heat> what is?

01:52 <bnchs> mmap method

01:52 <heat> how is this overkill, whats your idea? how is this not portable?

01:52 <clever> the mmap method can be used on any arch

01:52 <heat> it has been portable since fucking 4.1BSD or something

01:53 <bnchs> this program is not meant to only run in Linux

01:53 <heat> windows has VirtualAlloc

01:53 <heat> every UNIX used in 2023 as a desktop OS has mmap

01:53 <klange> i swear toaru will have it soon

01:53 <klange> at least for this usecase; file mappings tbd

01:53 <zid`> when is unix getting mmapEx

01:54 <zid`> like windows has MapViewOfFileEx and VirtualAllocEx

01:54 <heat> sorry, you mean mmap2

01:54 <heat> this is unix m8

01:54 <klange> mmap2 is already a thing

01:54 <heat> is it?

01:54 <klange> (In Linux, and has been since 2.3!)

01:54 <heat> I know linux has mmap_pgoff or whatever they call it

01:54 <zid`> mmap2 is no good, I want an ex

01:54 <zid`> bonus plus: plus alpha

01:55 <klange> It's actually _probably_ the syscall your libc `mmap` is calling, too!

01:55 <heat> oh, mmap_pgoff is mmap2

01:55 <heat> cool

01:55 <bnchs> heat: the CPU emulator has memory read/write functions, which is how it accesses the memory

01:55 <moon-child> there's futex2 at least. And lseek64 or so

01:55 <heat> damn.

01:55 <moon-child> .oO( if mmap2 is so good how come there's no mmap2 2? )

01:55 <heat> we went through 63 lseeks before this last version

01:55 <heat> these linux people don't know how to build APIs do they

01:55 <zid`> google says mmap2 is bytes/4096, so that you can do 2^44 not 2^32 in 32bit

01:56 <clever> mmap64 just takes a 64bit byte count instead

01:56 <heat> bnchs, yes. it accesses what memory exactly?

01:56 <clever> but both of those, are hacks to allow a 32bit userland to access larger files

01:56 <heat> what's your idea here? malloc(1GB)?

01:56 <clever> a 64bit userland, just always has a 64bit byte offset

01:56 <bnchs> heat: the emulated OS's memory (which executes a function), and the executable itself can request memory

01:56 <heat> I guess that also works but is wasteful and will also use mmap

01:57 <heat> ok so the emulated OS's memory is very vague

01:57 rnicholl1 has quit [Quit: My laptop has gone to sleep.]

01:57 <heat> requesting memory does start to get into memory ballooning territory or something, so i'm out

01:58 <bnchs> when the executable accesses the emulated OS' memory, the emulator runs a function to give it a version of the OS memory that is compatible with the executable

01:58 <bnchs> (endian differences)

01:58 <klange> _what_

01:58 <heat> ... so this isn't a normal emulator?

01:58 <bnchs> no

01:58 <heat> facepalm.gif

01:58 <heat> <bnchs> heat: memory map for an emulated CPU

01:58 <heat> for the record

01:58 <bnchs> yes

01:59 <bnchs> the CPU is the only emulated part

01:59 <bnchs> the rest is implemented as a compatibility layer

01:59 <heat> are you reinventing qemu-user

01:59 <bnchs> this is literally not running linux

02:00 <zid`> heat can you emulate a version of heat who doesn't speak english for a couple of hours so I can concentrate

02:00 <clever> i think we have to go back to step 1, what is the host cpu? what is the guest cpu?

02:00 <bnchs> the executable is NOT for linux

02:00 <heat> why do you think this is for linux

02:00 <clever> linux is not a cpu

02:00 <bnchs> because qemu-user assumes that it is

02:01 <heat> could you take what I say a little less literally?

02:01 <bnchs> alright

02:01 <heat> it's like, the whole fucking idea of qemu-user. make thing run on other thing, but userspace

02:02 <heat> and in this case, if this is userspace, why do you need mmio?

02:02 <bnchs> heat: to try to give the executable a translated version of the OS memory structure

02:02 <bnchs> for endian and pointer size differences

02:02 <clever> bnchs: what kind of executable? what guest cpu?

02:02 <heat> wha

02:03 <bnchs> clever: it's m68k

02:03 <bnchs> also i'm just responding to heat's suggestion to using mmap

02:03 <clever> bnchs: what OS are these executables normally talking to?

02:03 <bnchs> an emulated OS-9 environment

02:04 <bnchs> kinda like wine

02:04 <clever> and i assume the host cpu is never going to be m68k?

02:04 <bnchs> no

02:04 <clever> wine is not an emulator

02:04 <clever> its right in the anme

02:04 <clever> name*

02:04 <bnchs> yes i mean the design of the OS layer is kinda like wine

02:04 <bnchs> not the emulated CPU

02:05 <clever> so you basically have 2 choices

02:05 <clever> 1: emulate the cpu fully in software, just fetch an opcode, decode it, execute it, all ram access can just go into a big old byte-array, all mmio thru a function, convert endian as needed

02:06 <bnchs> congrats, you just stated the obvious, and what i already did

02:06 <clever> 2: JIT the m68k into host asm, let blocks of it run natively, and call pre-written functions when doing mmio
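
For option 1, the "big old byte-array" with endian conversion could be sketched like this. The guest is big-endian (m68k), and composing values byte by byte keeps the accessors independent of the host's endianness. The function names are hypothetical, and alignment checks are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 16-bit value from guest RAM held as a byte array. */
static uint16_t be_read16(const uint8_t *ram, size_t size, uint32_t addr)
{
    assert(size >= 2 && addr <= size - 2);
    return (uint16_t)((ram[addr] << 8) | ram[addr + 1]);
}

/* Read a big-endian 32-bit value; most significant byte comes first. */
static uint32_t be_read32(const uint8_t *ram, size_t size, uint32_t addr)
{
    assert(size >= 4 && addr <= size - 4);
    return ((uint32_t)ram[addr] << 24) |
           ((uint32_t)ram[addr + 1] << 16) |
           ((uint32_t)ram[addr + 2] << 8) |
           (uint32_t)ram[addr + 3];
}

/* Write a 32-bit value in big-endian byte order. */
static void be_write32(uint8_t *ram, size_t size, uint32_t addr, uint32_t v)
{
    assert(size >= 4 && addr <= size - 4);
    ram[addr]     = (uint8_t)(v >> 24);
    ram[addr + 1] = (uint8_t)(v >> 16);
    ram[addr + 2] = (uint8_t)(v >> 8);
    ram[addr + 3] = (uint8_t)v;
}
```

Because the bytes stay in guest order, differently sized or overlapping accesses need no extra bookkeeping.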

02:06 <bnchs> i guess i could do that for later

02:07 _whitelogger has joined #osdev

02:07 <clever> https://github.com/michalsc/Emu68 this is a baremetal m68k JIT based emulator

02:07 <bslsk05> michalsc/Emu68 - M68K emulation for AArch64/AArch32 (32 forks/245 stargazers/MPL-2.0)

02:07 <clever> it turns the m68k asm directly into aarch64 asm, it avoids the endian problem by just running in big-endian aarch64 mode

02:08 <clever> in this case, it runs the original amiga os under the emulation

02:08 <clever> and the crazy part, is that it maps the host peripherals directly into the guest, so drivers compiled to m68k, can interface with host peripherals (after doing a byte-swap)

02:09 <bnchs> this is not ummm

02:09 <bnchs> related to the original question

02:09 <heat> you do not need mmio

02:09 <heat> period

02:09 <clever> yeah, ive not seen any need for it

02:10 Arthuria has joined #osdev

02:10 <clever> just break out of the emulator upon any syscalls, and translate the syscall args

02:10 <clever> then handle the syscall however you want

02:10 <zid`> a lot of emulators for be on le just.. run the memory in the wrong endian

02:10 <zid`> and fix it up later

02:10 <zid`> if they type pun then it secretly emits the wrong instructions for it

02:11 <moon-child> meh

02:11 <zid`> makes writing cheat codes pretty weird

02:11 <moon-child> le cpus have free byteswaps

02:11 <zid`> le cpus have free puns

02:11 <moon-child> I would keep the memory in be

02:11 <heat> how do you hide it?

02:11 <bnchs> clever: now, tell me this, if a device filesystem driver requires accessing the system global variables (which the OS gives it by writing it in a CPU register)

02:11 <bnchs> it's going to access it without a syscall

02:11 <zid`> you can only detect it if you pun

02:12 <zid`> be needs +2 and le needs +0 or whatever if you try to dword -> short

02:12 <heat> bnchs, who called drivers into this?

02:12 <heat> what's a system global variable?

02:12 <bnchs> heat: this is literally a part of the emulator

02:12 <moon-child> zid`: sure, but you're not really going to need to do much (if any) of that

02:12 <zid`> that's why you keep it LE

02:12 <moon-child> since you're going to be operating on behalf of the emulated code

02:12 <moon-child> so keep it be

02:12 <zid`> because it's rare compared to "write it to memory" or "read from memory"

02:13 <moon-child> that way you never have to do any fixups for the emulated code

02:13 <zid`> the BE program has no idea its memory is all in the wrong order

02:13 <zid`> and it runs at full speed

02:13 <moon-child> you don't even have to try to remember

02:13 <bnchs> heat: system global variables are basically the kernel variables, which the filesystem driver can read and also write to

02:13 <clever> zid`: you're assuming a BE program never does 8bit access to a 32bit int

02:13 <bnchs> it's important for some stuff like trying to change the behavior of the kernel

02:13 <moon-child> if you keep the memory in le, then you have to do extra bookkeeping so you can do the fixup when the program does differently-sized accesses

02:13 <moon-child> what if the emulated program does a memcpy?

02:14 <clever> yep

02:14 <zid`> clever: my assuming? no. I specifically said you need to do fixups for punning.

02:14 <zid`> Several times

02:14 <zid`> repeatedly

02:14 <clever> ah, skimming

02:15 <zid`> moon-child: memcpy using byte writes?

02:15 <bnchs> trying to say "i don't need mmio" while the program can access the OS's own memory at any time

02:15 <zid`> that's an edge-edge case :P

02:15 <moon-child> zid`: memcpy using any size writes

02:15 <zid`> irl riscy BE memcpy uses dwords

02:15 <moon-child> say memcpy uses 8-byte writes, and you've got a 4-byte int in memory

02:15 <zid`> the same size you're tracking memory at

02:15 <moon-child> whatever

02:15 Coldberg has quit [Ping timeout: 255 seconds]

02:16 <moon-child> or memcpy does misaligned accesses, assuming the architecture allows it

02:16 <zid`> like, I didn't just make this up

02:16 <zid`> lots of actual real life emulators do it

02:16 <zid`> that's why I started with "lots of emulators .."
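
The "run the memory in the wrong endian and fix it up" approach can be sketched as below, assuming guest memory is stored as host-order 32-bit words and that 16-bit and 32-bit accesses are naturally aligned. The `ws_*` names are hypothetical. Shifts are used so the sketch behaves the same on any host; on a little-endian host the fixup amounts to indexing byte `addr ^ 3` (or halfword `addr ^ 2`), which is the "+2 versus +0" adjustment mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Aligned 32-bit accesses need no fixup: the word is already host-order. */
static uint32_t ws_read32(const uint32_t *words, uint32_t addr)
{
    assert((addr & 3) == 0);
    return words[addr >> 2];
}

static void ws_write32(uint32_t *words, uint32_t addr, uint32_t v)
{
    assert((addr & 3) == 0);
    words[addr >> 2] = v;
}

/* Guest offset 0 in a big-endian word is the HIGH half, offset 2 the low. */
static uint16_t ws_read16(const uint32_t *words, uint32_t addr)
{
    unsigned shift = (addr & 2) ? 0 : 16;

    assert((addr & 1) == 0);
    return (uint16_t)(words[addr >> 2] >> shift);
}

/* Guest byte offset 0 is the most significant byte of the word. */
static uint8_t ws_read8(const uint32_t *words, uint32_t addr)
{
    unsigned shift = 8 * (3 - (addr & 3));

    return (uint8_t)(words[addr >> 2] >> shift);
}
```

Misaligned accesses and copies wider than the tracked word size are the cases that would need the extra handling argued about above; this sketch does not cover them.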

02:17 <bnchs> now did you all get confused by that?

02:22 gog has quit [Ping timeout: 265 seconds]

02:30 heat has quit [Ping timeout: 248 seconds]

02:41 [itchyjunk] has quit [Read error: Connection reset by peer]

03:03 jtbx has joined #osdev

03:20 jtbx has quit [Quit: jtbx]

03:21 jtbx has joined #osdev

03:25 pmaz has quit [Ping timeout: 248 seconds]

03:35 tiggster has quit [Ping timeout: 240 seconds]

03:43 vdamewood has joined #osdev

03:56 Arthuria has quit [Remote host closed the connection]

04:08 xenos1984 has quit [Quit: Leaving.]

04:33 vdamewood has quit [Quit: Life beckons]

04:50 jtbx has quit [Quit: jtbx]

04:51 jtbx has joined #osdev

05:05 bliminse has quit [Ping timeout: 240 seconds]

05:06 bliminse has joined #osdev

05:45 slidercrank has joined #osdev

05:55 xenos1984 has joined #osdev

06:24 <mrvn> You can negate all your addresses and then your le memory will look be.

06:26 <mrvn> So the emulator translates any pointer access as "high - addr".

06:26 <moon-child> thanks, I hate it

06:27 <moon-child> also does that work? I'm not sure if it does

06:27 <moon-child> say I do a 4-byte access, and then another 4 byte access which overlaps 2 bytes of the other access, I don't think that gives the right results

06:29 <mrvn> moon-child: a 4-byte access specifies the end of the variable so you access "high - addr - 3" and then it works or something.

06:30 <mrvn> s/3/4/
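
A minimal sketch of the address-negation idea, assuming a little-endian host: guest byte `addr` lives at host index `size - 1 - addr`, so a big-endian guest word at `addr` becomes an ordinary little-endian word starting at host index `size - 4 - addr`. Whether the adjustment reads as 3 or 4 depends on whether "high" means the last byte or one past it. The `mir_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store one guest byte at its mirrored host position. */
static void mir_write8(uint8_t *host, size_t size, uint32_t addr, uint8_t v)
{
    assert(addr < size);
    host[size - 1 - addr] = v;
}

static uint8_t mir_read8(const uint8_t *host, size_t size, uint32_t addr)
{
    assert(addr < size);
    return host[size - 1 - addr];
}

/*
 * Read a big-endian guest 32-bit value with a plain native load.
 * ASSUMPTION: the host is little-endian; on a big-endian host this
 * would return the bytes reversed.
 */
static uint32_t mir_read32(const uint8_t *host, size_t size, uint32_t addr)
{
    uint32_t v;

    assert(size >= 4 && addr <= size - 4);
    memcpy(&v, host + (size - 4 - addr), sizeof v);
    return v;
}
```

Since the mapping is a one-to-one reversal of byte positions, overlapping accesses of different sizes see consistent data.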

06:30 <mrvn> Do all AArch64 support BE mode?

06:32 bgs has joined #osdev

07:07 <moon-child> 'NOTE: If the processor is executing code from the same memory area that is being used for the paging structures, the setting of these flags may or may not result in an immediate change to the executing code stream.'

07:07 <moon-child> good to know

07:11 <sakasama> mrvn: That's elegant enough, though then your memory accesses tend to involve decreasing addresses, which doesn't sound good for implicit prefetch.

07:12 GeDaMo has joined #osdev

07:12 <moon-child> sakasama: prefetcher handles descending accesses just fine

07:13 <sakasama> Write-combining too. :/

07:13 jtbx has quit [Quit: jtbx]

07:13 <moon-child> hrm, I assume that's also handled with aplomb, but don't actually know

07:14 <sakasama> moon-child: On architectures that support BE mode, I'd expect that, but not when emulating a BE arch on LE-only hardware.

07:15 <moon-child> prefetching is uarchitectural

07:15 <moon-child> and implementations of le archs handle it fine

07:17 <clever> mrvn: i think all aarch64 cores support BE mode, ive looked into it before, and at least with qemu and a linux guest, you must execute linux in LE mode, it will switch itself to BE, and faults if you try to run a BE kernel in BE mode initially

07:21 <sakasama> moon-child: Hmm... it seems Intel handles this case smoothly, but you get fewer prefetch streams for decreasing addresses so it's not quite symmetrical. The details appear to be model dependent.

07:25 <moon-child> interesting

07:26 <moon-child> is it that there are some ascending-only streams and some descending-only? Or some bidirectional and some ascending-only?

07:29 <sakasama> The documentation I've found sucks, so I can't tell. :/

07:30 <moon-child> what documentation?

07:34 <sakasama> Here, enjoy this rubbish: https://community.intel.com/t5/Software-Tuning-Performance/When-L1-Adjacent-line-prefetchers-starts-prefetching-and-how/m-p/1166311

07:34 <bslsk05> community.intel.com: When L1 Adjacent line prefetchers starts prefetching and how many cache lines it prefetch ? - Intel Communities

07:35 <moon-child> oh john mccalpin

07:36 <sakasama> There seems to be very little hard documentation on this, but here's another interesting one.

07:36 <sakasama> https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Hardware-prefetch-and-shared-multi-core-resources-on-Xeon/m-p/1074003

07:36 <bslsk05> community.intel.com: Hardware prefetch and shared multi-core resources on Xeon - Intel Communities

07:37 <sakasama> The last response especially.

07:37 gog has joined #osdev

07:38 <moon-child> 'I haven't encountered the number of prefetch streams as being as significant a limiting factor as the number of fill buffers (10 per physical core, ever since Woodcrest).' soo true

07:41 bliminse has quit [Quit: leaving]

07:42 <sakasama> Okay, I should stop. I can imagine myself wasting far too much time reading about this. I may as well watch The Hobbit on repeat.

07:45 <moon-child> lol

07:46 <moon-child> here is a question: for an sasos, do 1gb pages make sense? They technically let you fit more address space into your tlb, but depend on locality; I think the tlb only has space for 4 1g pages

07:46 bliminse has joined #osdev

07:47 <kazinsal> sitting here cackling like a madman because I've spent the past half hour debugging why something wouldn't work only to realize it was because I had an off-by-two in my cdecl math

07:48 <moon-child> I think riscv will let you cover basically all your physical memory (assuming remotely sane quantities of physical memory) with one massive page. Alas, not x86

07:51 <sakasama> My mind has purged most of my memories of x86 page tables. Now I am merely a depraved compiler developer.

07:52 <moon-child> I've tried, but I can't; I must play for all the teams

07:54 gog has quit [Ping timeout: 276 seconds]

07:55 <sakasama> I sometimes have a terrifying impulse to create my own mixed ternary architecture and intermittently hack in it during lapses of sanity, but otherwise I keep these urges under control.

07:58 <moon-child> ooh that sounds fun

08:04 bauen1 has quit [Ping timeout: 250 seconds]

08:20 hmmmmm has joined #osdev

08:23 hmmmm has quit [Ping timeout: 252 seconds]

08:26 <mrvn> moon-child: Are you sure that 4 1GB pages isn't a myth? With 32bit that was all you ever could have but with 64bit and todays ram having just 4 1GB pages seems rather limited for the TLB.

08:26 <mrvn> sakasama: you should make a quaternary architecture

08:29 <moon-child> mrvn: https://0x0.st/HoDu.26.png seems they are up to 8 on icelake

08:29 <moon-child> pretty sure it was 4 prior to that, though

08:30 <moon-child> oh this is interesting

08:30 <moon-child> actually

08:30 <moon-child> more complicated than that

08:30 <mrvn> you have 2 levels though.

08:31 <moon-child> it seems like it might be optimal to have a small number of 1gb pages and the rest 2mb pages?

08:31 slidercrank has quit [Quit: Why not ask me about Sevastopol's safety protocols?]

08:32 <mrvn> moon-child: You have 32 2MB entries, that's 64MB. Not really anything that can replace a 1GB entry.

08:32 <moon-child> yes, but the entries can be spread all over the address space

08:32 <moon-child> so the question is whether applications exhibit enough spatial locality for the 1gb things to make sense

08:32 <sakasama> mrvn: It's mixed by using balanced ternary arithmetic but with quaternary logical instructions.

08:33 <mrvn> second level seems to be split 1024/1024. But that's just 2 1GB pages you can replace with 2MB entries at the cost of blocking all 4k pages.

08:33 <moon-child> yes--don't care about 4k pages

08:33 <mrvn> Then use 1GB pages for the phys mapping. You probably don't have 1TB of memory.
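
The coverage numbers being compared here are just entries times page size. A small sketch, with `tlb_reach` as a hypothetical helper and the entry counts taken from the discussion rather than from any datasheet:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of address space a TLB level can cover with a given page size. */
static uint64_t tlb_reach(uint64_t entries, uint64_t page_size)
{
    /* Page sizes are powers of two. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return entries * page_size;
}
```

With these inputs, 32 entries of 2 MiB cover 64 MiB, 4 entries of 1 GiB cover 4 GiB, and 1024 entries of 1 GiB would cover 1 TiB.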

08:33 <moon-child> I can't tell if this is saying the l2 tlb can store 1024 different 1gb pages

08:34 <mrvn> that's how I read it. But 4k pages would compete for the slots and evict entries.

08:35 <mrvn> One question is how pages map to entries. If you map 1TB memory will they all go to separate indexes or will you have collisions and holes?

08:36 <moon-child> you mean like if they actually hash the address or just use the high bits directly?

08:37 <mrvn> They certainly do hash in some way. The question is how. It's 8 way so I assume each address can go to one of 8 slots. But if you have no competing 4k entries you should be able to get all 1024 slots filled with unique 1G pages.

08:38 <moon-child> https://0x0.st/HoDS.31.png yeah skylake only 4 slots for 1g pages in the l1 tlb

08:38 <moon-child> l2 doesn't share 1g pages with 4k pages, but also only has 16 slots for them

08:38 <mrvn> It's interesting that there is cache for loads and stores separately.

08:39 <moon-child> indeed

08:39 <mrvn> What if you write to a page you just read from? Does it have to fetch it from L2 again or can it use the "loads" entry?

08:39 <moon-child> I wonder if it has to do with the coherency protocol?

08:40 <mrvn> likely

08:40 <moon-child> like maybe it simplifies some of that logic

08:41 <mrvn> simplifies the dirty bit and write back too I bet

08:41 <moon-child> oh yeah

08:41 <moon-child> https://pvk.ca/Blog/2014/02/18/how-bad-can-1gb-pages-be/

08:41 <bslsk05> pvk.ca: How bad can 1GB pages be? - Paul Khuong: some Lisp

08:41 <mrvn> that's nearly a decade old

08:50 <mrvn> interesting read though, the numbers will be different nowadays so you have to benchmark again but it's a good idea to look at the best and worst case for each pagesize.

08:51 <mrvn> He mentions that huge pages leave more memory because page tables are smaller. But at 0.2% needed to map everything as 4k pages is that bit of extra memory really relevant?
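
The roughly 0.2% figure can be reproduced with a short calculation, assuming x86-64 style 4-level tables with 8-byte entries, 512 entries per 4 KiB table, and one contiguous mapping. `page_table_bytes` is a hypothetical helper for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of page tables needed to map mem_bytes contiguously with 4 KiB
 * pages. ASSUMPTIONS: 4 levels, 512 eight-byte entries per 4 KiB table.
 */
static uint64_t page_table_bytes(uint64_t mem_bytes)
{
    uint64_t total = 0;
    uint64_t entries = mem_bytes / 4096; /* one leaf entry per page */

    assert(mem_bytes % 4096 == 0);
    for (int level = 0; level < 4; level++) {
        uint64_t tables = (entries + 511) / 512;

        total += tables * 4096;
        entries = tables; /* each table needs one entry one level up */
    }
    return total;
}
```

The leaf level dominates: 8 bytes of entry per 4096 bytes mapped is about 0.195%, and the upper levels add only a few more pages.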

08:54 <clever> mrvn: there is also the cost those page tables have on the d-cache

08:54 <clever> (i assume the tables partially live in both d-cache and tlb)

08:55 <clever> d-cache, because the cpu read something, but it can expire without harming the tlb

08:56 <clever> with 4 layers to the paging tables and 4k pages, a TLB miss will involve 4 cache lines becoming live (either refreshing the LRU, or causing a cache-miss&fetch)
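
The four memory touches per miss follow from how a virtual address is split. A sketch for x86-64 4-level paging with 4 KiB pages, where each level consumes 9 bits above the 12-bit page offset; `pt_index` is a hypothetical helper and the level numbering (0 for the last-level table, 3 for the top) is a choice made for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Index into the page table at a given level for virtual address va.
 * Level 0 is the last-level table, level 3 the top-level table.
 */
static unsigned pt_index(uint64_t va, int level)
{
    assert(level >= 0 && level <= 3);
    return (unsigned)((va >> (12 + 9 * level)) & 0x1ff);
}
```

A full walk reads one entry at each of the four indices, so it can touch up to four different cache lines.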

08:57 <mrvn> The page walk has cache too

08:57 <clever> separate from the i-cache and d-cache?

08:57 <mrvn> no idea

08:57 <geist> yup. think of it as TLB entries that jump the page walker to the end

08:57 <clever> ive never heard of one before

08:58 <geist> ie it can say 'for pages within virtual range X through Y, the terminal page table is located at physical address Z'

08:58 <clever> but i can see how that might work, you can just cache every node in the tree as you walk

08:58 <clever> and can skip to whatever node is the best, and resume the walk

08:58 <geist> no, that's not at all how it works. it just saves the terminal

08:58 <clever> ah

08:58 <geist> you didn't read the arm manual deep enough then, it's described somewhat there. even a53 has it

08:58 <mrvn> How does that work with a mix of 1G/2M/4k pages? The terminal for 2M isn't the terminal for 4k.

08:59 <geist> you actually have to maintain it, since there's a bit in the TLB flush instruction that says whether or not you additionally flush the page table walker cache

08:59 <clever> ahhh, was wondering if it was just hidden under all of the tlb flush operations

08:59 <geist> mrvn: good question. i think it's best effort, so it stands to reason the walker cache only works for 4K terminal entries

08:59 <clever> manual control means you have to document it better

09:00 <geist> x86 has the same thing, but it's completely transparent. it's mentioned in both intel and AMD manuals, but by default when you invlpg it also invalidates the page walker cache that may cover that range

09:00 <geist> AMD has a feature bit you can enable that lets you take over direct control of it, but i dont think anything really uses it

09:00 <clever> ah, so if you dont move the tables, you could keep that cache intact?

09:00 <geist> but with armv8 you absolutely have to be aware of it and it will absolutely bite you in the ass if you arent

09:01 <clever> and speed up walking after a tlb invalidate

09:01 <mrvn> clever: if you just change a page in the table you don't invalidate the page walker.

09:01 <clever> yeah

09:01 <geist> right. for ARM you explicitly invalidate page table entries like normal, but then if you move the page table you *additionally* want to invalidate the pt walker cache

09:01 <geist> so it's ideal, you choose to flush it when it actually changes

09:01 <mrvn> any change of a page directory would need to clear the page walker

09:01 bauen1 has joined #osdev

09:01 <clever> geist: but couldnt you also just say to wipe everything (tlb and walk cache) and ignore the problem?

09:01 <geist> right, that's what x86 does by default

09:02 <mrvn> clever: reloading the page table should do that

09:02 <geist> invlpg also invalidates the pt walker cache for that page table

09:03 <geist> if you dig into say the a53 manual, it'll describe how many pt walker entries there are, etc. it talks a bit about how the TLB sram array is carved up

09:03 <geist> and how many entries are for what. the tlb walker stuff is basically a separate list of entries, iirc, with its own tags

09:03 <geist> it's implementation defined how it works, but a lot of the ARM cores do it more or less the same way, with varying sizes and implementation details

09:03 <geist> maybe they handle > smallest page granule PTs

09:04 <geist> would have to track at what level its a cache entry for, etc

09:04 <clever> ah, i think i see the issue, 600 page document for a53

09:04 <geist> that's really how i figured out how it works. the ARMARM only generically describes it, but then the specific core manuals basically tell you precisely what is stored in it, so you can figure out how it works

09:05 <clever> a few years back, i picked a random arm arm (forget which core), and just started reading it from page 1 while walking outside

09:05 <clever> either i was on the wrong core, or i didnt get deep enough in

09:05 gog has joined #osdev

09:05 vdamewood has joined #osdev

09:05 <clever> i do remember one bit about dumping the tags for the caches, i think?

09:06 <geist> section 5.2.4 Walk cache RAM

09:06 <clever> ah, was trying to ctrl+f for it

09:06 <geist> yep, if you read the part about the format of the cache dump you can infer a lot about the internal structure

09:06 <clever> that helps, and now that i look at the section headers, its incredibly obvious

09:06 <geist> • 4-way set-associative 64-entry walk cache.

09:07 <geist> ah here's the large page answer: "The walk cache RAM holds the result of a stage 1 translation up to but not including the last level. If the stage 1 translation results in a section or larger mapping then nothing is placed in the walk cache."

09:07 <clever> one slightly odd thing i noticed with the rp2040, if you disable the XIP cache, you can reuse its data ram as regular ram, its just mapped into the addr space

09:07 <clever> but its tag ram isnt mapped

09:07 <geist> yah the tag ram may not have standard ram layout maybe

09:07 <clever> so while you might be able to cheat, and peek at the cache's data ram, you dont know where that came from

09:07 <clever> so you cant abuse that as a debug method

09:08 <clever> but arm has proper debug access, via the co-processor web

09:08 <clever> though, arm doesnt allow arbitrary rw access to the cache ram, so you cant repurpose it as more normal ram

09:09 <geist> Table 6-15 Walk cache descriptor fields is the one you want

09:09 <geist> describes the 128 bit tag ram for the walk cache basically

09:09 <geist> well, 117 bit actually

09:11 <moon-child> it feels like with this kind of stuff the arch has a tendency to get overfit to the uarch

09:12 <clever> section 5.2.1 says there are 10 micro tlb's, on each of the instruction and data "sides", what exactly is a "side", a core?

09:13 <clever> or are they just referring to the 2 halves of the L1 cache, L1i and L1d as 2 sides?

09:13 <geist> yeah i suspect that's what they're saying

09:14 <geist> it's not strictly speaking a half, since it's possible (though not in this case) for there to be a dissimilar amount of space dedicated to i or d caches/tlbs/etc

09:14 <geist> usually larger amount of icache and perhaps a correspondingly higher amount of micro itlb

09:14 <geist> though seems in this case it's symmetric

09:15 <clever> *checks notes*, pi3 is cortex-a53, it has 32kb of L1i and L1d, i think its 0x80 sets for d-cache, but i-cache doesnt have a valid set count?

09:15 <geist> but it'd be wrong to call it a 'half'

09:15 <clever> L1d had 0x700fe01a in its description, and L1i 0x201fe00a

09:15 <geist> dunno, it should describe it

09:15 <clever> yeah, i'm just not sure i'm decoding the description right, so i logged the raw encoding too

09:15 <geist> it is only 2 way set associative on the i cache

09:16 <geist> so maybe that causes it to have more sets

09:16 <geist> see section 6.1

09:16 nyah has joined #osdev

09:16 <clever> my notes say assoc is 3 and 1, but i think everything is stored as n-1

09:16 <geist> yah

09:17 <clever> ah found it, section 4.3.22 and its relatives

09:17 <clever> thats where those 32bit numbers i pasted came from

09:17 <clever> cache size id register

09:18 <geist> yup

09:19 <clever> L1i claims to have an assoc of 0xff+1

09:19 <clever> but that seems like an escape hatch for N/A

09:19 <clever> sets*

09:20 <clever> 6.1 says its 2-way

09:20 <clever> ah, but thats assoc, not sets, hmmm

09:21 <clever> where did i put my notes on caches

09:21 <geist> yah youd need 2x sets if your assoc is 1/2 i think
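
The two raw values pasted above can be decoded by hand. What follows is a minimal sketch, assuming they are 32-bit cache size ID register values in the classic layout (LineSize in bits [2:0], associativity minus one in [12:3], set count minus one in [27:13]). The struct and function names are made up for illustration and do not come from any ARM header.

```cpp
#include <cstdint>

// Decoded cache geometry. Names are illustrative only.
struct CacheGeometry {
    std::uint32_t line_bytes;
    std::uint32_t ways;
    std::uint32_t sets;
    constexpr std::uint32_t total_bytes() const { return line_bytes * ways * sets; }
};

// Assumed field layout of the 32-bit register:
//   [2:0]   LineSize      = log2(bytes per line) - 4
//   [12:3]  Associativity = ways - 1
//   [27:13] NumSets       = sets - 1
constexpr CacheGeometry decode_ccsidr(std::uint32_t v) {
    return CacheGeometry{
        1u << ((v & 0x7u) + 4),     // line size in bytes
        ((v >> 3) & 0x3FFu) + 1,    // ways (stored as n-1)
        ((v >> 13) & 0x7FFFu) + 1,  // sets (stored as n-1)
    };
}

constexpr CacheGeometry l1d = decode_ccsidr(0x700fe01a);
constexpr CacheGeometry l1i = decode_ccsidr(0x201fe00a);

static_assert(l1d.line_bytes == 64 && l1d.ways == 4 && l1d.sets == 128, "L1d: 4-way, 0x80 sets");
static_assert(l1i.line_bytes == 64 && l1i.ways == 2 && l1i.sets == 256, "L1i: 2-way, 0xff+1 sets");
static_assert(l1d.total_bytes() == 32 * 1024 && l1i.total_bytes() == 32 * 1024, "both 32kb");
```

Under these assumptions the 0xff in the L1i value reads as 256 sets rather than an N/A marker: same 32kb, half the ways, twice the sets.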

09:21 <geist> anyway, going to bed. have fun!

09:21 <clever> its 6am, why am i not sleeping? :D

09:21 * moon-child tosses pillow at clever

09:22 * clever re-reads http://csillustrated.berkeley.edu/PDFs/handouts/cache-3-associativity-handout.pdf

09:23 <mrvn> clever: doesn't every ARM core boot with the cache as ram and then you configure the DRAM?

09:23 <mjg> clever: you not sleeping *yet* or *anymore*

09:23 * clever points at rpi

09:23 * mjg woke up around 4 am

09:23 <mrvn> (any normal ARM, RPi is not normal :)

09:24 <clever> mjg: not yet, i expect to pass out around 10am

09:24 <mrvn> clever: doesn't rpi do that too though? configuring the DRAM though becomes using the VCs ram.

09:24 <clever> mrvn: yeah, i think a normal arm starts in rom, and just turns the cache on blind (ramless)

09:24 <clever> mrvn: yeah, but with the VPU cache, not the arm cache

09:25 <clever> there is a tiny bit of sram, unknown why, the rom then uses vector-stores to write nulls to the L2 cache, and zero out the whole thing

09:25 <clever> then it runs from that L2 cache

09:25 <mrvn> I always wanted to write a mini OS that keeps running just in cache. Who needs ram anyway?

09:25 <clever> i suspect its using a vector-store, to avoid triggering a line-fill

09:25 <clever> the way the code acts, it feels like its not a proper cache-as-ram mode

09:26 <clever> its more, dont miss or evict, and the cache will never catch on

09:26 <mrvn> wouldn't that fail on the first eviction though?

09:26 bauen1 has quit [Ping timeout: 265 seconds]

09:26 <clever> yep

09:27 <mrvn> you better be careful you don't alias any pages. :)

09:27 <clever> the rom uses an invalidate control op, to just reset the entire L2, and then vector-stores to fill it without causing a miss

09:27 <clever> then stage1 can live entirely within a 128kb range, starting at 0

09:28 bauen1 has joined #osdev

09:28 <mrvn> Does the ARM on the rpi actually have ROM or is it getting that from the VC as read-only mapped memory?

09:28 <clever> no arm rom, and its not even ro mapped

09:28 <clever> the arm just begins with PC=0, which is plain old ram

09:29 <clever> so the VPU has to drop some arm asm at the front of ram first

09:29 <clever> for backwards compatibility, the arm is always forced to start in 32bit mode

09:29 <mrvn> the VPU could map it read-only though, map a bit of actual ROM to 0

09:30 <clever> but the VPU can opt-in to aarch64, if its aware of that, and the hw supports it

09:30 <gog> mew?

09:30 <clever> mrvn: i have yet to find a way to map things into the arm physical space, in a read-only manner

09:30 <mrvn> gog: rpi silliness

09:30 * moon-child pets gog

09:30 <clever> but there are several unused bits in the broadcom mmu

09:31 <mrvn> clever: you map it readonly in the VPU and then the ARM gets a "bus error"?

09:31 <clever> the VPU has no mmu on its end

09:31 <clever> physical or go home!

09:32 <mrvn> It has those 16 (was it?) memory regions you can map

09:32 <clever> re-reading section 6.1 of the a53 docs, the L1i is 2-way, 64 byte cache lines, and my notes say 32kb, so thats 512 cache lines, broken up into 256 pairs

09:32 * gog prr

09:33 <clever> mrvn: 64 pages, of 16mb each, but that mmu only impacts what the arm thinks the physical space is

09:33 <clever> the VPU is entirely unaffected by that

09:33 <clever> and its unknown if it has any permission flags

09:33 <mrvn> ahh, that way around. I knew there was a 16 in there

09:33 <clever> from how the bit-shifting is done, you have a 2mb resolution, on the target of each page

09:34 <clever> ive not tested if 16mb alignment is required

09:34 <gog> in c++ abi are reference parameters always just pointers or are there implementations that don't do that?

09:34 <mrvn> is it used for more than moving the peripherals around?

09:34 <clever> gog: i always assumed references are just pointers hidden behind a simpler syntax

09:34 <gog> that's what i'm finding too, but there were allusions to this being implementation-defined

09:34 <mrvn> gog: references are pointers that can't be nullptr.

09:34 <clever> mrvn: yep, i have booted a pi1 with peripherals at 0x3f00, and a pi3 with peripherals at 0x2000

09:35 <clever> basically, rpi didnt want to fragment ram, so they put peripherals just after ram

09:35 <clever> but ram kept growing, and they didnt want to break compat with a firmware update

09:35 <mrvn> and that is kind of a moving target

09:36 <clever> and now mmio is a moving target, what model are you on?

09:36 <mrvn> both problems are basically solved by the DT

09:36 <clever> i solved them in an even more fun way

09:36 <clever> i have a #define for where the peripherals live

09:36 <moon-child> gog: I don't know shit about c++, but I think that if a caller has int *x; f(*x), and the callee is void f(int& x) { &x }, the callee's &x has to match the caller's x

09:36 <moon-child> so it has to be a pointer

09:37 <clever> the VPU side of the firmware, reads that, to configure the mmu

09:37 <clever> the arm bootloader reads that, to offset all peripheral access

09:37 <clever> and the arm bootloader patches the DT automatically, based on that

09:37 <gog> that makes sense

09:37 <clever> so i can just put peripherals in any of the 64 pages (except page 0), recompile, and it magically works

09:37 <mrvn> moon-child: it has to eventually point to the same address. But you could have different memory representation for pointer and reference

09:38 <moon-child> I suppose
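
The address-identity point can be checked directly. This is a small sketch with made-up names; it only shows that taking the address of a reference yields the address of the object the caller bound to it. It says nothing about how a given ABI represents the reference, which the language itself leaves unspecified.

```cpp
// Returns true when the callee's view of the referenced object has the same
// address that the caller passed in separately.
constexpr bool same_object(const int& ref, const int* expected) {
    return &ref == expected;
}

// Namespace-scope object, so its address can be used in a constant expression.
constexpr int shared_value = 42;

static_assert(same_object(shared_value, &shared_value),
              "the reference aliases the caller's object");
```

Because the callee can always recover the caller's address this way, passing the address is the obvious implementation when the call is not inlined, but the observable guarantee is only the aliasing shown here.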

09:38 <clever> https://github.com/librerpi/lk-overlay/blob/master/platform/bcm28xx/arm/arm.c#L331-L350

09:38 <bslsk05> github.com: lk-overlay/arm.c at master · librerpi/lk-overlay · GitHub

09:38 <mrvn> gog: potentially a pointer could be struct { void *current, void *start; size_t len; } while a reference is always a single object so start/len make no sense.

09:39 <clever> mrvn: line 333 will map every page to the framebuffer, as a sort of alarm, any writes will be visible, 338 then re-maps the lower 64mb to ram, 342 maps some highmem, 346/7 map mmio twice, and 350 map the framebuffer

09:39 <moon-child> what if the caller says int x[2]; f(x[0]) and the callee is void f(int& x) { (&x)[1] = 27 }

09:40 <clever> part of that is just playing around with options, and i could just do a linear map of all ram, plus mmio at the tail, for every model

09:40 <moon-child> is that legal? I don't know c++, again, but it seems like it should be

09:40 <mrvn> moon-child: then I would think that's UB but works.

09:40 <moon-child> see also containerof and such like

09:40 <clever> but i could also randomize every page of ram, and update dma-ranges on boot

09:40 <mrvn> moon-child: breaks aliasing rules I bet

09:40 <clever> respect dma-ranges, or fail

09:40 <clever> no cheating, it changes on every boot!!

09:40 <moon-child> it doesn't seem like it should but idk

09:41 <moon-child> hmmmm. I think some stuff supports hotplug ram. Which seems fairly marginal but I'm sure is helpful sometimes. But I wonder--if a kernel has support for that, the same abstraction can be reused to let vm guests share memory dynamically with the host (and thence other guests); do any kernels support this or is there an established protocol for it?

09:41 <mrvn> moon-child: the contract says you get a reference to a single int. Accessing the int past that violates the contract.

09:41 <mrvn> moon-child: int y; f(y); would be bad

09:42 <clever> the price for avoiding pointer syntax, is that you cant treat it as a pointer and increment the addr

09:42 <clever> you must treat it as a regular variable

09:42 <moon-child> mrvn: obviously, but I don't see what that has to do with anything. If I write void f(int *x) { x[1] = 27; }, then int x[2]; f(x) is fine, and int x; f(&x) is not. It doesn't seem obvious to me that references would be different. It does seem plausible that the standard would distinguish, but it doesn't seem at all obvious
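
One way to probe the `int x[2]; f(x[0])` case is to run it in a constant expression, since a conforming compiler has to reject core-language undefined behaviour there. The sketch below uses hypothetical function names and needs C++14 or later; treat a passing `static_assert` as evidence rather than a standards citation.

```cpp
// The callee is handed "a reference to one int" but steps to the next element.
constexpr void poke_next(int& x) {
    (&x)[1] = 27;
}

// The caller binds the reference to element 0 of a two-element array.
constexpr int poke_demo() {
    int x[2] = {1, 2};
    poke_next(x[0]);
    return x[1];
}

// Accepted by the constant evaluator: the write lands inside the same array.
static_assert(poke_demo() == 27, "write landed in x[1]");
```

Binding the reference to a lone `int` instead would make the same write fall outside any array, which is the case the contract argument is about.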

09:43 <mrvn> you should really forget about pointers in C++.

09:43 <moon-child> 'you should really forget about ... C++'

09:43 <moon-child> yes, I agree with that

09:43 <clever> yeah, references let you basically just ignore pointers

09:43 <moon-child> you can't put a reference in a struct, can you?

09:43 <moon-child> I don't like c++ references because they are not explicit

09:43 <moon-child> which means you have less local reasoning

09:44 <moon-child> gog: https://imgur.com/gallery/UZCmI

09:44 <clever> moon-child: ah yeah, structs/classes kinda need pointers, since references are args only
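
For what it's worth, C++ does accept reference members in a struct or class; they just have to be bound when the object is created and cannot be reseated afterwards. A minimal sketch with a made-up type:

```cpp
// A struct holding a reference member. It must be initialised at construction.
struct Bumper {
    int& target;
    // const member: the binding never changes, but the referenced int may.
    constexpr void bump() const { ++target; }
};

constexpr int bump_twice() {
    int n = 0;
    Bumper b{n};  // binds b.target to n
    b.bump();
    b.bump();
    return n;
}

static_assert(bump_twice() == 2, "the member reference aliases n");
```

The cost is that such a type loses its default copy assignment, which is one reason pointers or smart pointers are still the usual choice for members.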

09:45 <mrvn> clever: use smart pointers

09:45 <clever> if you want anything more complex than static references

09:45 <clever> and yeah, smart pointers avoid a lot of the old problems with dumb pointer

09:45 <clever> s

09:45 <mrvn> moon-child: https://godbolt.org/z/aEcYKf9qY No warnings or errors. But that code really bugs me.

09:45 <bslsk05> godbolt.org: Compiler Explorer

09:46 <mrvn> In modern C++ a pointer should also just be a single object. Otherwise use spawn or ranges.

09:46 <mrvn> -w

09:47 <mrvn> moon-child: I hate that too. When you see f(x); you never know if x is passed by value or by reference. Will it be modified or not?

09:48 <mrvn> worse if the function has multiple overloads.

09:48 <moon-child> hahaha can you have one overload that takes a reference and one that doesn't

09:48 <moon-child> where just the former mutates

09:48 <moon-child> and it picks the former if it's an lvalue?

09:48 <moon-child> this is why I don't touch c++ :P

09:48 <mrvn> moon-child: f(int x), f(int &x), f(int &&x)

09:49 <mrvn> template<T> f(T x)

09:49 <moon-child> is the latter equivalent to the former?
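
The overload question has a concrete answer for the two reference forms. Here is a sketch with a hypothetical function `which`: an lvalue argument selects the `int&` overload, while a temporary or a `std::move`'d value selects `int&&`.

```cpp
#include <utility>

// Hypothetical overload pair: one for lvalues, one for rvalues.
constexpr int which(int&)  { return 1; }  // binds to named, modifiable ints
constexpr int which(int&&) { return 2; }  // binds to temporaries and moved-from ints

constexpr int overload_demo() {
    int x = 0;
    // lvalue -> 1, literal -> 2, std::move(x) -> 2
    return which(x) * 100 + which(5) * 10 + which(std::move(x));
}

static_assert(overload_demo() == 122, "lvalue picks int&, rvalues pick int&&");
```

Adding a by-value `which(int)` next to these is allowed as a declaration, but calls then become ambiguous, so the by-value and by-reference forms cannot usefully coexist for the same argument type.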

09:50 <clever> mrvn: just double-checked the math on the custom broadcom mmu, the bus address is >>21'd before going into the MMU, so you have 2mb resolution as i remember, but you also have 21 "unused" bits in the pagetable, that could potentially contain flags?
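
The bit counting behind that can be written out. This sketch only restates the figures from the discussion; the 32-bit bus address and the 32-bit page-table entry width are assumptions, and all constant names are invented.

```cpp
#include <cstdint>

constexpr unsigned kBusAddrBits = 32;              // assumed bus address width
constexpr unsigned kEntryBits   = 32;              // assumed page-table entry width
constexpr unsigned kTargetShift = 21;              // bus address is >> 21 before being stored
constexpr unsigned kPageCount   = 64;              // 64 pages...
constexpr std::uint64_t kPageSize = 16ull << 20;   // ...of 16mb each

constexpr std::uint64_t kResolution = 1ull << kTargetShift;    // granularity of a target
constexpr unsigned kTargetBits = kBusAddrBits - kTargetShift;  // bits needed for the target
constexpr unsigned kSpareBits  = kEntryBits - kTargetBits;     // bits left over per entry

static_assert(kResolution == (2ull << 20), "2mb resolution");
static_assert(kTargetBits == 11, "11 bits hold the shifted target");
static_assert(kSpareBits == 21, "21 bits are left over, possibly for flags");
static_assert(kPageCount * kPageSize == (1ull << 30), "the pages cover a 1gb window");
```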

09:50 vdamewood has quit [Quit: Life beckons]

09:50 <mrvn> With templates you usually get a bunch of overloads for const and non-const that differ.

09:50 <clever> i once used templates and inline asm, to make some pretty crazy code

09:51 <clever> basically, you give the function an 8bit, 16bit, or 32bit int, and it will then dynamically generate asm based on what type you picked

09:51 <mrvn> in/out macros?

09:51 <clever> mostly constexpr

09:51 <clever> let me find the code...

09:52 <mrvn> inb/outb for 8bit, 16bit 32bit I meant

09:52 <clever> ah no, VPU vector opcodes

09:53 <mrvn> ahh yes, writing your own vector __builtins basically.

09:53 <clever> exactly

09:53 <clever> for (int i=0; i<16; i++) { int temp = a[i] * b[i]; if (store) c[i] = temp; if (accumulate) accumulator[i] += temp; }

09:53 <clever> the VPU can do this entire operation in just 2 clock cycles

09:53 <clever> for mult, a/b can be 8bit or 16bit, and c can be 8/16/32, accumulator is always 48bit

09:54 <clever> now, so you want to write a variant of the function, for every combination of (dont)store, (dont)accumulate, 8/16/32bit op1, 8/16/32 op2, 8/16/32/null dest, and every actual ALU op?

09:55 diamondbond has joined #osdev

09:55 <clever> or do you want the compiler to just do its job and build from a template!
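
The "build from a template" idea can be sketched in portable C++17 without any VPU asm. Everything below is hypothetical (the `Row` type, `vmul`, the 64-bit stand-in for the 48-bit accumulator) and is not taken from the gist linked below; it only mirrors the 16-lane loop quoted above, with the store and accumulate choices resolved at compile time.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 16;

// One row of 16 lanes. A plain array keeps it readable from a debugger.
template <typename T>
struct Row {
    T lane[kLanes];
};

// One instantiation per (store, accumulate, operand types) combination.
template <bool Store, bool Accumulate, typename A, typename B, typename C>
constexpr void vmul(const Row<A>& a, const Row<B>& b, Row<C>& c, Row<std::int64_t>& acc) {
    static_assert(sizeof(A) <= 2 && sizeof(B) <= 2, "multiply operands are 8 or 16 bit");
    for (std::size_t i = 0; i < kLanes; ++i) {
        const std::int32_t temp =
            static_cast<std::int32_t>(a.lane[i]) * static_cast<std::int32_t>(b.lane[i]);
        if constexpr (Store) c.lane[i] = static_cast<C>(temp);
        if constexpr (Accumulate) acc.lane[i] += temp;  // 64-bit stand-in for the 48-bit accumulator
    }
}

constexpr std::int64_t vmul_demo() {
    Row<std::int16_t> a{}, b{};
    Row<std::int32_t> c{};
    Row<std::int64_t> acc{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        a.lane[i] = static_cast<std::int16_t>(i + 1);
        b.lane[i] = 2;
    }
    vmul<true, true>(a, b, c, acc);   // store and accumulate
    vmul<false, true>(a, b, c, acc);  // accumulate only, c is left alone
    return c.lane[3] * 1000 + acc.lane[3];
}

static_assert(vmul_demo() == 8016, "lane 3: c = 4*2 = 8, acc = 8 + 8 = 16");
```

Each distinct set of template arguments yields its own function body with the unused branches removed, so the combinations never have to be written out by hand.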

09:55 Coldberg has joined #osdev

09:56 <clever> https://gist.github.com/cleverca22/79143cb23a50d572b9d527c9ea479492#f

09:56 <bslsk05> gist.github.com: simple-test.cpp · GitHub

09:56 <clever> mrvn: and here is the cursed code i wrote!

09:56 <mrvn> it's nice nowadays where you can use constexpr if. With template specialization that stuff becomes a nightmare.

09:57 <clever> i ran into trouble inserting H/HX/HY (the bit size specifier) with inline asm, there doesnt seem to be any way, beyond just writing it 3 times and using a constexpr if on sizeof

09:58 <clever> but luckily, the assembler also accepts H8, H16, and H32 as well

09:58 <clever> so i lied to gcc, and claimed thats an immediate it has to insert into the asm

09:58 <mrvn> hehe

09:58 <mrvn> I was wondering about that

09:58 <clever> also, take every combination i gave earlier, and double it some more

09:59 <clever> it supports both horizontal and vertical modes

09:59 <mrvn> I didn't think you could use the asm arguments to construct the mnemonic. But in the end the compiler just inserts the text for the argument in the asm output and lets the assembler deal with it.

10:00 <clever> yep

10:00 <clever> its basically just a glorified compile-time printf statement

10:00 <mrvn> indeed

10:00 <clever> the only problem, is that you cant lookup a string, and inject that

10:00 <mrvn> At least you used %[name]

10:01 <clever> yeah, with this many args, and the order being all over the place, its the only way to make it manageable

10:01 <clever> but, half the power of this example code, is vpu-support-purec.h

10:01 <mrvn> I hate it when people have 20 line asm() statements with %0, %1, %2, ...

10:01 <clever> re-implementing the same functions, in plain old C

10:02 <clever> so you could write an algo on x86, and work within the same restrictions as the real hw

10:02 <clever> and once you work out all the bugs, like only having 16bit*16bit->32bit mults, you can migrate to the VPU, and not have to rewrite it all

10:03 <clever> purec mode, also allows plain old gdb to read the matrix

10:03 <clever> because its a regular old C array

10:04 <clever> but the purec method, also counts how many clocks it would have taken on real hw, so you can still kinda benchmark it

10:04 <clever> though, the dual-issue scalar and single-issue vector, still give opportunities to improve the code further

10:05 <clever> the slowest computation opcode i know of, is doing 1024 mults, and it takes 128 clock cycles to complete

10:05 <clever> but, once you start that operation, the cpu can go off and do normal scalar opcodes in parallel

10:05 <clever> it only blocks, if you try to do another vector operation too early

10:07 <GeDaMo> Can you get interrupts when a vector operation completes?

10:07 <clever> i suspect you can

10:07 <clever> as long as the ISR never touches a vector opcode, it just wont care

10:07 <GeDaMo> Thinking about a "job queue" for vector operations

10:08 <clever> something i need to investigate more, is interrupt latency, and how vector operations impact that

10:09 <clever> the scalar side is "dual issue", so the instruction decoder/scheduler can submit 2 opcodes to it in the same clock cycle, and they can both complete at once, but only certain combinations

10:09 <clever> and i suspect the decoder block can only do vector or scalar, so it cant issue orders to both sides at once

10:09 <clever> vector opcodes can also be rather huge, up to 80bits long, including all the operands

10:11 <clever> my rough understanding of the pipeline, is that you can issue a vector opcode, issue some scalar opcodes, service an irq, and potentially even return from irq before the vector opcode completes

10:12 <clever> but if you issue a vector opcode too soon after the previous, it stalls, does that stall block irq handling? or has it not really ran anything, and can abort trying to execute it?

10:12 <clever> things i need to test

10:14 <clever> another thing to keep in mind, is that there is no known way to fault on vector access

10:14 <clever> so you cant implement lazy context switching there

10:14 <clever> and the vector state is over 4kb in size

10:16 <clever> the official firmware solves this with a dumb old mutex

10:16 <clever> context switching can happen freely, but only 1 thread is allowed to use vector opcodes at any time

10:16 <clever> no save/restore, so expect it to be trashed next time you get the lock

10:20 tanto has quit [Quit: Adios]

10:20 pie_ has quit []

10:20 vancz has quit []

10:20 Bitweasil has quit [Ping timeout: 246 seconds]

10:20 tanto has joined #osdev

10:20 pie_ has joined #osdev

10:20 vancz has joined #osdev

10:21 tanto has quit [Client Quit]

10:21 pie_ has quit [Client Quit]

10:21 vancz has quit [Client Quit]

10:23 <moon-child> agh

10:23 <moon-child> my efi bootloader is now too big so it's making calls to chkstk

10:24 <clever> there should be a gcc flag to disable that?

10:24 tanto has joined #osdev

10:24 vancz has joined #osdev

10:24 pie_ has joined #osdev

10:24 <moon-child> oh no I was just being stupid

10:25 <clever> ah

10:25 <moon-child> stack allocated the structure with the memory map (fixed 256 entries inline)

10:25 <GeDaMo> There's no flag for that :P

10:26 <clever> thatll use up a lot of stack

10:26 <clever> there is a gcc flag for helping with that

10:26 <clever> -fstack-usage

10:26 <clever> genet.c:185:13:genet_init 4 static

10:27 <clever> boom, this function uses up 4 bytes of stack

10:27 <clever> mandelbrot.c:19:40:dump_entire_matrix 4112 static

10:27 <clever> oh god, and this one uses over 4kb!

10:27 <clever> partition.c:51:5:partition_publish 268 dynamic

10:28 <clever> hmmm, why was this dynamic....

10:28 <clever> https://github.com/littlekernel/lk/blob/master/lib/partition/partition.c#L65 STACKBUF_DMA_ALIGN(buf, dev->block_size); ah!

10:28 <bslsk05> github.com: lk/partition.c at master · littlekernel/lk · GitHub

10:29 <clever> its allocating room for 1 sector on the stack
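
A rough illustration of why a fixed 256-entry map hurts as a local variable. The layout is hypothetical (field names and sizes are invented, not moon-child's actual structure). Compiling a file like this with `g++ -fstack-usage` writes a `.su` file with one line per function in the format pasted above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical memory-map entry, for illustration only.
struct MapEntry {
    std::uint64_t base;
    std::uint64_t length;
    std::uint32_t type;
    std::uint32_t attributes;
};

struct MemoryMap {
    MapEntry entries[256];  // fixed 256 entries inline
    std::size_t count;
};

// As a local, the whole structure may land in this function's stack frame.
inline std::size_t count_on_stack() {
    MemoryMap map{};
    return map.count;
}

// With static storage the frame stays small; the map lives in .bss instead.
inline MemoryMap& shared_map() {
    static MemoryMap map{};
    return map;
}

static_assert(sizeof(MemoryMap) >= 256 * sizeof(MapEntry), "the map alone is several kb");
```

With 24-byte entries that is at least 6kb in one frame, which is the kind of size that makes a toolchain start emitting stack probe calls.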

10:29 Bitweasil has joined #osdev

10:32 diamondbond has quit [Quit: Leaving]

10:40 <kof123> " do any kernels support this" [hot plug ram] solaris? im sure i missed the context. think giant box that cannot suffer downtime maybe. of course, things would arguably be better redundant. but i guess "live hw upgrade" is an exception

10:41 <kof123> let me rephrase: sooooooooooooolaris

10:45 <kof123> what i mean by "arguably be better redundant" presumably another reason is swapping out something bad. so...what happened to those programs/data using that ram? vanished?

10:46 <kof123> its different if you planned it i guess, or got warnings first and manually could disable the part you think will fail first

10:52 <kof123> i would guess it was mainly or only for planned hw switch

10:55 <mjg> i thought linux can do memory hotplug, in a vm tho

10:55 <mjg> i would not put any bets on bare metal

11:02 theboringkid has joined #osdev

11:02 <kof123> and all the same for cpus

11:28 <kof123> i know nothing, talking out my...asterisk. feel free to tell me what is wrong here: it makes me think imaginary hw i might want some area (cpu(s)+ram) that is redundant for kernel/drivers, and also could be hot swappable (but, depends on however many are needed to reach "quorum"; so like raid, can survive some "copies" dying) .... and then a separate area, for userland programs, where the cpus and ram are hot swappable, but redundancy

11:28 <kof123> is optional. basically, kernel and drivers get strong guarantee, programs are supposed to checkpoint or whatever else if they are concerned. this would be a hybrid, versus requiring everything be redundant.

11:28 * kof123 <-- knows nothing about hw

11:28 bnchs has quit [Read error: Connection reset by peer]

11:29 <kof123> just seems cheaper, and not having to guess "was that cpu or ram that died running something i cant recover from?"

11:31 <moon-child> yeah I'm sure there are all sorts of asterisks for doing it on hardware

11:31 <moon-child> I just wanna do it in a vm

11:36 <kof123> well, giant mainframe stuff...nonstop, tandem...fancy stuff probably figured it all out. just no idea if *everything* was redundant or not

11:36 <mjg> lmao https://www.youtube.com/watch?v=cwyH59nACzQ

11:36 <bslsk05> www.youtube.com: Why 111-1111111 is a valid Windows 95 key - YouTube

11:47 theboringkid has quit [Ping timeout: 252 seconds]

11:50 <kof123> bit of a strawman but also hilarious https://blog.davetcode.co.uk/post/21st-century-emulator/

11:50 <bslsk05> blog.davetcode.co.uk: Bringing emulation into the 21st century - David Tyler's Blog

11:51 <kof123> As with all modern design it’s crucial to adhere to the model of “make it work then make it fast” In 1974 when the 8080 was released it achieved a staggering 2MHz. Our new modern, containerised, cloud first design doesn’t quite achieve that in its initial iteration. One of the many beautiful things about a microservice architecture is that, because function calls are now HTTP over TCP, we’re no longer

11:51 <kof123> limited to a single language in our environment

11:51 <kof123> MOV Swift Moves data from one register to another 257MB 4.68ms and so forth

12:01 danilogondolfo has joined #osdev

12:16 Left_Turn has joined #osdev

12:17 theboringkid has joined #osdev

12:19 xenos1984 has quit [Quit: Leaving.]

12:22 ThinkT510 has quit [Quit: WeeChat 3.8]

12:25 ThinkT510 has joined #osdev

13:08 Left_Turn has quit [Ping timeout: 252 seconds]

13:12 Left_Turn has joined #osdev

13:21 <nikolar> That's wonderfully cursed

13:30 <kof123> Estimated People Required (organic) 6.666796

13:34 * sakasama only eats organic people.

13:39 danilogondolfo has quit [Ping timeout: 260 seconds]

13:41 danilogondolfo has joined #osdev

13:42 bauen1 has quit [Ping timeout: 265 seconds]

13:42 bauen1 has joined #osdev

14:02 <gog> y'all wanna hear me complain about how much i hate the codebase i have to work with

14:03 <gog> blah blah blah tech debt blah blah blah

14:03 <gog> most of this shit was written before any of us on the current team started

14:03 <mjg> :D

14:03 <gog> i'm so sick of fighting the bad practices of our forebears

14:03 <mjg> brah

14:03 <mjg> looks like you done goofed

14:04 <gog> i didn't goof, i write perfect code all the time

14:04 <mjg> not being handed out a piece of garbage is new employment 101

14:04 <gog> i need a money

14:04 <gog> and i like this company

14:04 <mjg> rest assured the twats who wrote the code took all the rewards

14:04 <mjg> now that it does not work it is your fault

14:04 <gog> one of them is a member of the board lmao

14:04 <mjg> as i said

14:04 <gog> our sales frontend is limping along

14:05 <mjg> this probably used to "provide great customer value"

14:05 <mjg> or so he was able to claim

14:05 <gog> there's so much copypaste in this codebase

14:05 <gog> there's so much useless redundancy

14:05 <gog> it's bad for performance, it's costing us money

14:05 <mjg> are you reading thedailywtf yet?

14:05 <gog> lol yeh

14:06 <gog> i asked my boss for a sales front bug bash and feature freeze

14:06 <gog> he's hesitant

14:06 <mjg> here is a funny story

14:06 <gog> but istg this thing is going to crash and burn severely

14:06 <mjg> years back i was in a company which had a flagship product drowning in tech debt

14:06 <gog> it only continues to work by the grace of God

14:06 <mjg> anytime i had to touch it i was leaving nasty comments inside

14:07 <mjg> so i came up with a plan how to unfuck it and presented it to my boss

14:07 <mjg> he said no

14:07 <mjg> so i quit

14:07 <mjg> [one of the reasons]

14:07 <gog> i totally understand

14:07 <mjg> then they let me unfuck it on the way out [lol]

14:07 <gog> i really don't want to quit about this

14:07 <gog> i want to fix it for real

14:07 <mjg> turns out the unfucking came with a funny bug which i don't remember

14:07 <gog> instead of these constant bandages

14:07 <mjg> but which took some time to find :D

14:08 <mjg> i guess it would have been better if the unfucking was done by someone who stays there

14:08 <gog> all of this coincides with yandex and majestic and petal starting to index us

14:08 <gog> we have some critical flaw or MVC has some critical flaw

14:08 <gog> arguably the latter

14:08 <gog> but i can't find it in this fucking mess

14:14 <kof123> i think my 2 (web) coding jobs were: "we're throwing this all away eventually (production)" and "we're throwing this all away eventually (maintenance)"

14:15 <gog> i think i need to push harder for throwing the baby and the bathwater out

14:15 <gog> it's so fucked

14:15 <gog> i can't figure out why this one thing doesn't display

14:15 <kof123> *maintenance == legacy production, leftovers barely touched except when needed, somewhat separate

14:16 <gog> but the CEO is absolutely not going to approve that

14:16 <gog> so the other option is pushing for some kind of technical process where we do a static analysis, async/await figuring outs

14:16 <gog> etc

14:24 [itchyjunk] has joined #osdev

14:44 theboringkid has quit [Quit: Bye]

14:44 theboringkid1 has joined #osdev

14:47 theboringkid1 is now known as theboringkid

14:51 theboringkid has quit [Quit: Bye]

14:53 <mrvn> kof123: I believe Linux on s390 can physically and kernel wise hot-plug memory.

14:53 <mrvn> kof123: and ballooning on VMs does it.

14:57 <mrvn> gog: what language is it in?

14:59 <mrvn> A great way to unfuck legacy c++ code is to look for all "new" calls and make them smart pointers. Then keep fixing compiler errors till it builds again. Repeat by going through all raw pointers in classes. And last annotate any raw pointer.

15:00 <mrvn> Once you update just that to c++11/17/23 best practices you basically have gone through all the code.

15:01 theboringkid has joined #osdev

15:08 <gog> mrvn: c#

15:08 <gog> asp.net mvc

15:08 <gog> and an unholy mess of javascript, typescript, shitscript

15:16 theboringkid has quit [Ping timeout: 248 seconds]

15:16 <gog> i think one of our problems is a mismatched await/async

15:17 <gog> i need to take some time and use this vs plugin that'll analyze that

15:18 theboringkid has joined #osdev

15:20 bnchs has joined #osdev

15:27 <sakasama> The only truly unholy mess in this world is my genetic code. Oh, and... whatever hell is being invoked when attempting to resolve data dependencies with paraconsistent logic.

15:29 <gog> true

15:29 <gog> according to my boss it's way better than it was when he started in 2017

15:29 <gog> like it was unusably bad when he started

15:29 <sakasama> That's terrifying nonetheless.

15:30 <sakasama> May Baphomet relieve you of your suffering.

15:30 <gog> takk

15:30 <gog> i have yet another bandaid submitted

15:31 <bnchs> sakasama: hi

15:31 <gog> maybe tonight i'll actually do osdev

15:31 <gog> or i'll play factorio

15:32 theboringkid has quit [Ping timeout: 255 seconds]

15:32 <bnchs> i had a power outage, but my laptop was running on battery

15:32 <sakasama> bnchs: Greetings. I have been attempting programming: it is not proceeding smoothly.

15:33 <bnchs> what did i do to spend the time? play games on emulators, fml

15:33 <bnchs> sakasama: what are you programming?

15:34 theboringkid has joined #osdev

15:34 <sakasama> A programming language to control an expert system intended to rewrite me into a more elegant form.

15:35 theboringkid has quit [Client Quit]

15:36 theboringkid1 has joined #osdev

15:36 theboringkid1 has quit [Client Quit]

15:36 <bnchs> really?

15:36 <sakasama> Yes. I have no life.

15:36 <bnchs> but i thought people were still reverse engineering the brain for centuries at this point

15:37 <sakasama> Irrelevant. I have achieved absolute desperation.

15:39 <sakasama> On the bright side, I can claim I'm still an operating systems developer, with myself as the architecture.

15:40 <bnchs> black-box reverse engineering a brain for centuries doesn't seem so exciting

15:41 <sakasama> I'm not reverse engineering any brains. If all I succeed at is recreating my own intelligence, or that of any human, the project shall have been an abysmal failure.

15:44 <sakasama> For instance, here is an approximate illustration of the median level of my cognitive functions: https://i.imgur.com/TGhAdkQ.jpeg

15:45 <gog> ah yes relating to marceline

15:45 <gog> mood

15:49 Dyskos has joined #osdev

15:55 <zid`> https://cdn.discordapp.com/attachments/232377444877729792/1090665334161277079/fCylyDIl.jpg

15:56 <gog> fair

16:08 Jari-- has quit [Ping timeout: 276 seconds]

16:24 slidercrank has joined #osdev

16:29 Dyskos has left #osdev [Leaving]

16:39 xenos1984 has joined #osdev

16:44 <Ermine> gog: may I pet you

16:46 heat has joined #osdev

16:46 heat has quit [Remote host closed the connection]

16:47 heat has joined #osdev

16:48 <heat> gog, omg omg you're doing osdev tonight?????

16:49 <heat> SOMEONE'S ON TOPIC, FINALLY

16:50 rnicholl1 has joined #osdev

16:50 joe9 has joined #osdev

16:51 <zid`> no fair he has a bat waifu

16:55 bauen1 has quit [Ping timeout: 255 seconds]

16:57 bauen1 has joined #osdev

17:01 * Ermine . o O (gog waifu)

17:21 <gog> i'm already somebody's waifu

17:21 * gog prr

17:23 xenos1984 has quit [Ping timeout: 248 seconds]

17:23 <mjg> heat: i got a developer response to lld threading

17:23 <mjg> heat: check it out https://discourse.llvm.org/t/avoidable-overhead-from-threading-by-default/69160/2

17:23 <bslsk05> discourse.llvm.org: Avoidable overhead from threading by default - #2 by tschuett - LLD - LLVM Discussion Forums

17:24 <kazinsal> drake meme no: have a waifu / drake meme yes: become the waifu / drake meme activated eyes: become your waifu's waifu

17:30 gog has quit [Quit: Konversation terminated!]

17:32 rnicholl1 has quit [Quit: My laptop has gone to sleep.]

17:37 xenos1984 has joined #osdev

17:51 sortie has quit [Remote host closed the connection]

17:53 rnicholl1 has joined #osdev

17:56 bnchs has quit [Read error: Connection reset by peer]

17:57 rnicholl1 has quit [Ping timeout: 248 seconds]

17:59 sortie has joined #osdev

18:00 joe9 has quit [Quit: leaving]

18:07 frkzoid has joined #osdev

18:09 <mrvn> mjg: I want the ld.lld to use the makeserver. It should select on the token pipe and spawn an extra thread for every token it gets. Up to maybe 8 given how malloc degrades.

18:15 gog has joined #osdev

18:16 bauen1 has quit [Ping timeout: 255 seconds]

18:29 danilogondolfo has quit [Ping timeout: 255 seconds]

18:30 danilogondolfo has joined #osdev

18:31 bnchs has joined #osdev

18:33 <gog> hi

18:34 <lav> a gog!

18:34 <Ermine> hi gog

18:34 <gog> hi gog

18:35 <Ermine> gog

18:35 <gog> Ermine:

18:50 vdamewood has joined #osdev

19:20 valshaped has quit [Read error: Connection reset by peer]

19:22 valshaped has joined #osdev

19:28 linearcannon has quit [Read error: Connection reset by peer]

19:38 bauen1 has joined #osdev

19:43 DrinkThePoison has joined #osdev

19:44 vdamewood has quit [Quit: Life beckons]

19:57 Vercas6 has quit [Remote host closed the connection]

19:57 Vercas6 has joined #osdev

20:02 heat has quit [Read error: Connection reset by peer]

20:02 heat has joined #osdev

20:09 DrinkThePoison has left #osdev [#osdev]

20:13 gildasio has joined #osdev

20:27 slidercrank has quit [Ping timeout: 265 seconds]

20:36 <heat> mjg, weird mixed response

20:39 <mjg> i'm gonna respond rather negatively tomorrow

20:40 <heat> maskray's going to be on vacation tomorrow lol

20:40 <mjg> lul

20:40 <mjg> in that case i'm gonna slam him in few h

20:40 <heat> so, glhf

20:41 <mjg> ultimately it is pretty apparent the insanity is not going to change

20:41 <heat> i think adding GNU make jobserver support is actually maybe a really good idea here

20:41 <mjg> so i'm going to patch it locally

20:41 <mjg> the *concept* would be great here

20:41 <mjg> but the reality of the job server would make it suck even more

20:42 <mjg> also note that wankers would spawn these threads anyway

20:42 <mjg> they would just add extra overhead to figure out how many can run

20:42 <mjg> the job server problem is the globally shared pipe

20:43 <mjg> which is a massive source of contention on non-handwatch-scale systems

20:45 <geist> THREEEEEEEADS

20:45 heat has quit [Read error: Connection reset by peer]

20:45 heat has joined #osdev

20:45 foudfou has quit [Remote host closed the connection]

20:46 foudfou has joined #osdev

20:52 <heat> you have to wonder what Solaris /bin/ld did here

20:52 <heat> OR IS IT PESSIMAL

20:52 <heat> does it scale mjg

20:52 <heat> did it spawn negative threads? a blackhole of threads? did it spawn 10000 threads?

20:52 <mjg> brendan gregg himself said that

20:52 <mjg> do you know that the venerable solaris scheduler has a perf bug where it keeps migrating threads for no reason?

20:52 <mjg> pretty funny

20:52 <mjg> while still at joyent

20:52 <heat> did you know that the venerable solaris was used by the nazis in world war 2?

20:53 <mjg> that was still sunos at the time

20:53 <mjg> get your facts straight

20:58 <mjg> heat: https://www.youtube.com/watch?v=6TYC5h4yz1o

20:58 <bslsk05> www.youtube.com <no title>

21:07 Coldberg has quit [Read error: Connection reset by peer]

21:07 Coldberg has joined #osdev

21:16 dude12312414 has joined #osdev

21:21 bnchs has quit [Remote host closed the connection]

21:42 bgs has quit [Remote host closed the connection]

21:47 Coldberg has quit [Ping timeout: 248 seconds]

21:47 bnchs has joined #osdev

21:48 GeDaMo has quit [Quit: That's it, you people have stood in my way long enough! I'm going to clown college!]

21:54 bnchs has quit [Remote host closed the connection]

21:56 <heat> mjg, https://www.youtube.com/watch?v=7-_rK0KkB6k

21:56 <bslsk05> www.youtube.com: In Defense of Inefficiency - YouTube

22:04 heat_ has joined #osdev

22:04 heat has quit [Read error: Connection reset by peer]

22:05 heat_ is now known as heat

22:11 <sham1> That's something an inefficient person might say

22:29 <zid`> skill issue

22:39 danilogondolfo has quit [Remote host closed the connection]

22:40 Vercas6 has quit [Remote host closed the connection]

22:40 Vercas6 has joined #osdev

22:41 <gog> i'm not a very good programmer

22:41 <gog> :(

22:41 <zid`> what did you do now

22:42 <gog> idk yet

22:42 <gog> but it was something

22:43 <gog> or nothing

22:43 <gog> it's impossible to know the true nature of things

22:43 <moon-child> oh no

22:43 <moon-child> have you tried not doing that

22:43 <moon-child> or, alternately, doing it

22:43 <gog> i get the same result both times

22:43 <moon-child> you must not have done it hard enough

22:43 <moon-child> or maybe you did it too hard

22:44 <gog> exactly

22:44 <zid`> just remember that if it's broken, you weren't the last one to touch it

22:44 * moon-child touches zid`

22:45 <gog> maybe it's not the code that's broken, but me

22:45 dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]

22:46 Left_Turn has quit [Read error: Connection reset by peer]

22:46 <zid`> who touched you last? we blame them, problem solved

22:48 * moon-child pets gog

22:48 * gog prr

22:56 <heat> gog blame

22:59 <gog> yes

22:59 <gog> everything is my fault

22:59 fkrauthan has quit [Quit: ZNC - https://znc.in]

23:00 fkrauthan has joined #osdev

23:01 <heat> thats not the output

23:02 <heat> gog blame is git blame but for gog

23:02 <heat> calling it got would also work but the fucking openbsd weirdos took that name

23:19 dutch has quit [Quit: WeeChat 3.8]

23:26 dutch has joined #osdev

23:27 <kazinsal> even though I know my april fools osdev project isn't actually going to be done in time for april fools it's sent me down a rabbit hole of looking at custom 5.25" floppy low-level formats

23:28 <kazinsal> because clearly if I'm dumb enough to do a unix for a 5150, I may as well start fiddling with the sector gap length and squeeze out another 80K per disk...

23:57 <klange> I don't really have enough ready for a new PonyOS release and I'm going to be out of town this weekend.