#osdev on 2022-08-31 — irc logs at libera.irclog.whitequark.org

2021-05-23 01:57 klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books

00:08 kof123 has joined #osdev

00:15 opal has quit [Ping timeout: 258 seconds]

00:20 gog has joined #osdev

00:22 <rpnx> Is it harder to program for arm64 or x64?

00:22 <rpnx> For kernel mode

00:23 <klange> that might be a controversial question; imo, arm is harder because of the memory model

00:25 <rpnx> I understand the C++11/C11 memory model, is that what you are talking about? or something related to mmu?

00:26 <rpnx> armv8 atomics map pretty well to C++ atomic memory orders

00:26 netbsduser has quit [Remote host closed the connection]

00:28 opal has joined #osdev

00:29 <geist> i'd say they're about the same for different reasons. x86 is probably simpler because it's a more well trodden path

00:29 <geist> but i think it'd be uncontroversial to say that riscv64 would be simpler than both

00:32 rpnx has quit [Quit: Quit]

00:32 rpnx_ has joined #osdev

00:42 <zid> I'll stick to my 2 protection levels with no gimmicks and really strong memory ordering

00:42 <moon-child> zid: that is 100% factually correct

00:42 <moon-child> unix thompson told me so himself

00:55 epony has quit [Remote host closed the connection]

01:05 gog has quit [Quit: byee]

01:07 rpnx_ has quit [Quit: This computer has gone to sleep]

01:08 xenos1984 has quit [Read error: Connection reset by peer]

01:11 <heat> zid, surely you mean 6 protection levels and a co-processor

01:13 rpnx_ has joined #osdev

01:19 <rpnx_> Are there any good examples of setting up virtual page tables and so on for ARM/raspberry pi?

01:20 <heat> rpnx_, see geist's lk

01:20 <heat> https://github.com/littlekernel/lk

01:20 <bslsk05> littlekernel/lk - LK embedded kernel (533 forks/2660 stargazers/MIT)

01:20 <zid> heat: if that's true, someone's short changed me out of 4 levels and a co-processor

01:20 <zid> those bastards

01:21 <heat> they're all there

01:21 <clever> rpnx_: its just normal arm, so it should work the same on any arm device

01:21 <heat> ring 0 - 3, ring -1, ring -2, intel ME

01:21 <zid> nobody uses ring 1 or 2, ever, and the rest are abstractions and aren't real

01:22 <rpnx_> I don't actually know the ARM instruction set that well outside of some userspace stuff, it's mostly that I wanted to use a raspberry pi since the SD card is easy to replace.

01:23 <heat> nah the rest are really there

01:23 <clever> rpnx_: the rpi is also able to network boot, which makes development far faster

01:23 <heat> SMM is definitely a separate protection level, and so are hypervisors

01:23 <heat> and *you* don't use ring 1 or 2 because *you* are poopy and smelly

01:23 gog has joined #osdev

01:24 <heat> we'd all be living in 3022 if we unlocked the full power of ring 1 and 2

01:25 * CompanionCube expects that even the VMS port doesn't use ring 1 and 2 and VMS actually has multiple kernel privilege levels doesn't it?

01:26 xenos1984 has joined #osdev

01:26 <\Test_User> exactly, no one will bother to make use of them till 3022

01:28 <rpnx_> There's a lot of code in littlekernel

01:28 <heat> fax

01:28 <clever> rpnx_: i have read it and am familiar with the mmu setup, do you want 32bit or 64bit arm?

01:29 <rpnx_> 64bit for sure, I'm trying to figure out how it works, for example, if there is some register that is set to provide a jump to a function that translates the virtual address to physical address or something like that

01:30 <clever> rpnx_: its the same as on x86, you create a multi-level paging table, that maps virtual to physical, and put the physical addr of that table into a special register

01:31 <geist> CompanionCube: that's a good question. x86-64 effectively doesn't have ring 1 and 2, so they have to make it work with just 2 privilege levels

01:31 <geist> though ia64 i dont think did either, i think, so they probably already did it a while back

01:31 <rpnx_> clever: so this register is just an array of pointers? One thing I am a little confused by is that I understand linux can add fields to the page table

01:31 <geist> or alpha for that matter

01:31 <heat> IA64 baby!

01:31 <rpnx_> That's different from the hardware page table? Or the same?

01:31 <clever> rpnx_: the register is a single pointer, to a big array in memory

01:31 <clever> that array typically then has pointers to more arrays

01:32 <clever> and some bits of those slots are flags instead of address bits

01:32 <rpnx_> Ok, how would I set the size of this array?

01:32 <clever> for aarch64, there is a register that sets how large the array is, i believe

01:33 <rpnx_> Ok, so there is 1 register that sets the page table and 1 register that sets the size of it

01:33 <clever> https://github.com/littlekernel/lk/blob/master/arch/arm64/start.S#L82-L300

01:33 <bslsk05> github.com: lk/start.S at master · littlekernel/lk · GitHub

01:34 <rpnx_> So, are the things pointed by the table able to be customized at all?

01:34 <clever> its a bit hard to read (one big block of asm), but this is the code that generates the initial paging tables

01:34 <heat> ok you're explaining this in a confusing way

01:34 <clever> https://github.com/littlekernel/lk/blob/master/arch/arm64/start.S#L303-L338

01:34 <bslsk05> github.com: lk/start.S at master · littlekernel/lk · GitHub

01:34 <clever> rpnx_: and this is what actually tells the hardware to use the tables

01:35 <clever> rpnx_: https://wiki.osdev.org/ARM_Paging also goes over everything

01:35 <bslsk05> wiki.osdev.org: ARM Paging - OSDev Wiki

01:36 <clever> ah, but that page is 32bit only

01:36 <heat> let's assume a 4-level page table layout (which is kind of the standard for 64-bit platforms with 4KB pages): You have L0 (u64 entries[PAGE_SIZE/8];), which point to L1 (u64 entries[PAGE_SIZE];), yadda yadda, until L3, which points to an actual page

01:37 <heat> sorry, PAGE_SIZE/8 in L1 as well, etc. it's always a page full of pointers(therefore 8 bytes) plus some other encoded information

01:37 <clever> and each tier is assumed to be aligned to something (i forget what), which means the lower N bits of that pointer are assumed to be 0

01:37 <heat> they're page aligned

01:37 <clever> which leaves those lower N bits in the entries array, free to be used for that other information

01:38 <heat> yes

01:38 <clever> for 4k pages, N=12 then

01:38 <heat> so, to map something you would need to create a path from the L0 table to the L3 table, where you would create your 4KB mapping - this is your base case (and the most common one)

01:38 <clever> on arm, that is where the cacheability is encoded, along with the r/w/x flags
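
The layout heat and clever describe above (four levels, 4KB pages, a page full of 8-byte entries per table) can be sketched as plain bit arithmetic. This is only an illustration under those assumptions: a 48-bit virtual address, 9 index bits per level, 12 offset bits. The names table_index and page_offset are made up for this sketch and are not from lk or any ARM header.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                          /* 4KB pages: low 12 bits are the offset */
#define INDEX_BITS 9u                           /* 4096 / 8 = 512 entries per table */
#define INDEX_MASK ((1u << INDEX_BITS) - 1u)

/* Index into the table at `level` (0 = top level, 3 = last level). */
static inline unsigned table_index(uint64_t vaddr, unsigned level)
{
    assert(level <= 3u);
    unsigned shift = PAGE_SHIFT + INDEX_BITS * (3u - level);
    return (unsigned)((vaddr >> shift) & INDEX_MASK);
}

/* Byte offset inside the final 4KB page. */
static inline unsigned page_offset(uint64_t vaddr)
{
    return (unsigned)(vaddr & ((1u << PAGE_SHIFT) - 1u));
}
```

Calling table_index(vaddr, 0) through table_index(vaddr, 3) gives the slot to follow at each level, and page_offset(vaddr) is what gets added to the page's physical base at the end.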

01:39 <heat> architectures then usually have a hardware register where you keep the L0 table, plus possibly some other bits

01:39 <heat> ARM64 has <TO BE FILLED, CANT REMEMBER>, x86 has cr3

01:39 <clever> on 32bit arm, each entry in the L0 table covers 1mb of the virtual space, and it does support 1mb pages

01:40 <clever> so you can just do this, and ignore the other levels: https://github.com/librerpi/rpi-open-firmware/blob/master/arm_chainloader/mmu.c#L12-L32

01:40 <bslsk05> github.com: rpi-open-firmware/mmu.c at master · librerpi/rpi-open-firmware · GitHub

01:40 <clever> heat: arm has TTBR0 and TTBR1, for the lower and higher halves

01:41 <heat> TTBR? yes TTBR I think

01:41 <heat> riscv has the satp

01:41 heat has quit [Remote host closed the connection]

01:41 <clever> and there is a setting elsewhere for the size of each half

01:41 heat has joined #osdev

01:41 <heat> clever, discussing every stupid little detail doesn't help!

01:41 <clever> much older arm cores have just TTBR, for the whole space

01:41 <heat> lets keep it simple

01:42 <heat> rpnx_, following?

01:42 <rpnx_> Oh, I was reading the wiki

01:42 <rpnx_> Let me read this

01:44 sonny has joined #osdev

01:44 <heat> a decently useful detail here (when bootstrapping) is large pages

01:45 <heat> which is pretty much just a page table entry that instead of pointing to the next level, points to a large-sized, $size aligned very big page

01:46 <heat> for instance your L2 entries each correspond to a 2MB range of address space - instead of mapping things 4KB by 4KB with a lot of granularity, you could map it all in a 2MB chunk with a large page

01:47 <heat> this pretty much just saves you memory and hassle when bootstrapping; most bootstrapping code will use it
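
To put numbers on the large-page idea: with the same 4-level, 4KB layout assumed earlier in the conversation, each entry at a given level covers a fixed span of address space. The sketch below computes that span and checks whether a range could be mapped with one large entry at that level. The helper names are hypothetical, and whether a given level actually accepts large entries depends on the architecture and configuration, which this sketch does not check.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of address space covered by one entry at `level` (0 = top, 3 = last). */
static inline uint64_t entry_span(unsigned level)
{
    assert(level <= 3u);
    return 1ULL << (12u + 9u * (3u - level));
}

/* A large mapping needs both addresses aligned to the span and enough length. */
static inline int can_use_large_entry(uint64_t vaddr, uint64_t paddr,
                                      uint64_t len, unsigned level)
{
    uint64_t span = entry_span(level);
    return ((vaddr | paddr) & (span - 1u)) == 0 && len >= span;
}
```

For level 2 the span works out to 2MB, which matches the figure heat gives; level 3 is the ordinary 4KB page.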

01:47 <rpnx_> Ok, so when looking at the page table, does this lookup happen in an interrupt? Or is the page table inspected by the hardware without executing any kernel code

01:47 kof123 has quit [Read error: Connection reset by peer]

01:47 <clever> yep, thats what i used in the mmu.c i linked above

01:47 <heat> depends on the architecture

01:48 <heat> most modern ones do page walks in hardware

01:48 <heat> but I T A N I U M does it in software because it's the best architecture ever

01:48 <clever> rpnx_: on arm/x86, the hardware reads the pagetable automatically, any time there is a TLB miss, the TLB acts as a cache, so it doesnt have to translate every single virtual address

01:49 <heat> yeah, the TLB is like a cache that remembers the mapping virt -> physical and its permissions, caching, etc; it has a limited size

01:49 <heat> but that's a problem for a later date

01:50 <heat> (this is a architectural detail that isn't explicitly exposed to you apart from "invalidate tlb mapping" instructions)
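
As a rough mental model of what heat and clever describe, a TLB can be pictured as a small table keyed by virtual page number. The toy below is a software model only, assuming 4KB pages and a direct-mapped table of 16 slots. Real TLBs are hardware structures with their own organisation, and none of these names correspond to real instructions or registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_SLOTS 16u

struct tlb_entry { bool valid; uint64_t vpn; uint64_t pfn; };
struct tlb { struct tlb_entry slot[TLB_SLOTS]; };

/* Returns true on a hit and writes the translated address to *paddr. */
static inline bool tlb_lookup(const struct tlb *t, uint64_t vaddr, uint64_t *paddr)
{
    assert(paddr);
    uint64_t vpn = vaddr >> 12;
    const struct tlb_entry *e = &t->slot[vpn % TLB_SLOTS];
    if (!e->valid || e->vpn != vpn)
        return false;                      /* miss: a page walk would happen here */
    *paddr = (e->pfn << 12) | (vaddr & 0xfffu);
    return true;
}

/* Remember a translation, evicting whatever used the slot before. */
static inline void tlb_fill(struct tlb *t, uint64_t vaddr, uint64_t paddr)
{
    uint64_t vpn = vaddr >> 12;
    struct tlb_entry *e = &t->slot[vpn % TLB_SLOTS];
    e->valid = true;
    e->vpn = vpn;
    e->pfn = paddr >> 12;
}

/* Model of an "invalidate tlb mapping" operation for one page. */
static inline void tlb_invalidate(struct tlb *t, uint64_t vaddr)
{
    uint64_t vpn = vaddr >> 12;
    struct tlb_entry *e = &t->slot[vpn % TLB_SLOTS];
    if (e->valid && e->vpn == vpn)
        e->valid = false;
}
```

The point of the model is the last function: after changing a page table entry, the stale cached translation has to be thrown away, otherwise lookups keep hitting the old mapping.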

01:51 <clever> but i assume itanium instead only has a TLB and nothing else?, and software is then responsible for filling and evicting the TLB?

01:51 <rpnx_> Ok, so if I want special properties like 'this page is mapped to a device' that would be put in a secondary structure?

01:51 <heat> clever, oui

01:51 <heat> rpnx_, no

01:51 <clever> rpnx_: that would all go into the paging tables

01:51 <heat> no it doesn't

01:51 <heat> pages aren't mapped to devices

01:51 <clever> a 32bit arm example, with 1mb pages: https://github.com/librerpi/rpi-open-firmware/blob/master/arm_chainloader/mmu.c#L33-L36

01:51 <bslsk05> github.com: rpi-open-firmware/mmu.c at master · librerpi/rpi-open-firmware · GitHub

01:52 <heat> STOP LINKING SHIT

01:52 <heat> let's get the basics straight

01:52 <heat> so your platform/chipset/whatever the fuck configures devices to MMIO on a certain physical memory range

01:52 <heat> (MMIO = memory mapped IO)

01:53 <heat> the only thing you need to do wrt MMU and "device memory" is to make sure your caching is right (and coherent across all mappings of that page)

01:54 <heat> $caching_is_right depends on the architecture of course

01:54 kof123 has joined #osdev

01:55 <heat> and just so we're clear, this MMIO thing is how you talk with devices; you write some value to some location in MMIO range, and it goes to the device, and the device may do something and give the result back to you on another location

01:56 <heat> a very easy example would be a framebuffer - you write some RGBA values to the framebuffer, it goes to your GPU which pushes it to your display
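
In C, the MMIO access heat describes usually comes down to reads and writes through a volatile pointer, so the compiler does not cache or drop them. A minimal sketch follows, assuming a 32-bit-per-pixel linear framebuffer. The helper names are invented here, the real base address would come from the platform (firmware, device tree, or similar), and the mapping's caching attributes still have to be set up correctly, which this sketch does not cover.

```c
#include <assert.h>
#include <stdint.h>

/* Single register accesses; volatile keeps the compiler from eliding them. */
static inline void mmio_write32(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
}

static inline uint32_t mmio_read32(const volatile uint32_t *reg)
{
    return *reg;
}

/* Fill a linear framebuffer of width * height pixels with one colour. */
static inline void fb_fill(volatile uint32_t *fb, unsigned width,
                           unsigned height, uint32_t colour)
{
    assert(fb);
    for (unsigned i = 0; i < width * height; i++)
        fb[i] = colour;
}
```

On a hosted system an ordinary array can stand in for the device when trying this out; on real hardware the pointer would be the virtual address at which the device's physical range has been mapped.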

01:57 <heat> rpnx_, still following?

01:57 <clever> a framebuffer is also a case where you would enable write combining

01:57 <rpnx_> I think I understand the concept but I'm looking for something like a document describing the fields in the page table structure

01:57 <heat> write combining doesn't matter right now

01:58 <rpnx_> If I understand, it would be something like

01:58 <heat> rpnx_, ok, I can tell you the chapter gimme a sec

01:58 <heat> the arm arm is incredibly stupid and hard to go through

01:58 <heat> and large, very large

01:58 <rpnx_> page_table[(pointer & (part of pointer)) >> offset][(pointer & (part of pointer2)) >> offset2][(pointer & (part of pointer3)) >> offset3]

01:58 <rpnx_> Something like this?

01:59 <heat> yes kinda

01:59 <clever> rpnx_: yeah something like that, i did have a graphic at one point that showed the entire bit routing

01:59 <clever> but i have lost it

02:00 <heat> of course that won't work because the permission bits and all that

02:00 <rpnx_> That's something I will need to understand for sure before implementing it

02:00 <heat> but conceptually you're right
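
The nested-array picture from rpnx_ plus heat's caveat about the permission bits can be combined into a small software walk: index each level, check the entry, then mask off the flag bits before following the pointer. This is a sketch with a deliberately simplified entry format. ENTRY_VALID and ADDR_MASK are assumptions for illustration, not the real arm64 descriptor layout, and "physical memory" is modelled as an array of 64-bit words indexed by byte address / 8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_VALID (1ULL << 0)              /* hypothetical "present" flag */
#define ADDR_MASK   0x0000FFFFFFFFF000ULL    /* bits 47:12 carry the next address */

/* Walk 4 levels starting at the table at byte address `root` in phys_mem. */
static inline bool walk(const uint64_t *phys_mem, uint64_t root,
                        uint64_t vaddr, uint64_t *paddr)
{
    assert((root & 0xfffu) == 0);            /* tables are page aligned */
    uint64_t table = root;
    for (unsigned level = 0; level < 4; level++) {
        unsigned index = (unsigned)((vaddr >> (12u + 9u * (3u - level))) & 0x1ffu);
        uint64_t entry = phys_mem[table / 8 + index];
        if (!(entry & ENTRY_VALID))
            return false;                    /* nothing mapped: would be a fault */
        table = entry & ADDR_MASK;           /* strip flag bits before following */
    }
    *paddr = table | (vaddr & 0xfffu);       /* page base plus offset in page */
    return true;
}
```

The masking step is the difference from plain nested array indexing: the low bits of each entry hold flags, so using the raw entry as a pointer would land on the wrong address.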

02:00 epony has joined #osdev

02:01 sonny has quit [Ping timeout: 252 seconds]

02:01 <clever> heat: implementing it like that might actually explain spectre/meltdown, heh

02:01 <zid> (It's just a binary prefix tree)

02:01 <clever> check the permissions later, they dont matter!

02:01 freakazoid333 has quit [Ping timeout: 255 seconds]

02:01 <heat> a tiny detail clever was mentioning back there is that ARM64 divides the address space into two halves, so you have two CPU registers that point to two L0 tables; the first one points to the bottom half of the address space, the second points to the top half

02:02 <heat> if you go through the math you'll notice that you have a hole in between the halves where nothing can be mapped - that's normal and pretty standard

02:02 <rpnx_> One for userspace and one for kernel?

02:02 <clever> yep

02:03 <zid> oh that's neat

02:03 <heat> since no one needs 2^64 bytes of memory, people only implement like 48 or 49 bits of the address space
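
The two halves and the hole between them can be worked out from the number of implemented address bits. The sketch below assumes the common arm64 arrangement heat describes (48 bits per half): an address whose upper bits are all zero falls in the bottom half, one whose upper bits are all one falls in the top half, and anything else is in the hole. The enum and function names are made up for this example, and features that change how upper bits are treated are ignored.

```c
#include <assert.h>
#include <stdint.h>

enum va_region { VA_LOWER, VA_UPPER, VA_HOLE };

/* Classify a 64-bit virtual address given `va_bits` implemented bits per half. */
static inline enum va_region classify(uint64_t vaddr, unsigned va_bits)
{
    assert(va_bits >= 1u && va_bits <= 63u);
    uint64_t top = vaddr >> va_bits;                     /* bits above the range */
    uint64_t all_ones = (1ULL << (64u - va_bits)) - 1u;
    if (top == 0)
        return VA_LOWER;                                 /* first L0 table */
    if (top == all_ones)
        return VA_UPPER;                                 /* second L0 table */
    return VA_HOLE;                                      /* nothing can be mapped */
}
```

With 48 bits the bottom half ends at 0x0000FFFFFFFFFFFF and the top half starts at 0xFFFF000000000000; everything in between is the hole.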

02:03 <clever> c2d ~ # cat /proc/cpuinfo | grep sizes

02:03 <clever> address sizes : 36 bits physical, 48 bits virtual

02:03 <clever> for my random desktop, its 48bits

02:04 <rpnx_> Hum, so these are virtual, not physical? For example, I could map a page to the same location in user and kernel space?

02:04 <heat> rpnx_, define location

02:05 <zid> they'd have unique virtual addresses by virtue of being in different virtual memory ranges

02:05 <Mutabah> You can map the same physical page any number of times :)

02:05 <zid> but you can reuse physical memory as much as you like

02:05 <Mutabah> A common trick is to have one read-only page filled with zeros

02:05 <zid> (and will want to, for things like shared libraries)

02:05 <rpnx_> Right

02:05 <Mutabah> and map that everywhere that wants a blank page (and swap it out for a fresh one when the application tries to write)

02:06 <rpnx_> I don't plan to actually implement copy-on-write pages like that.

02:06 <rpnx_> One thing I don't like is unpredictable out-of-memory errors

02:06 <clever> ah, i found the image: https://documentation-service.arm.com/static/5e8e358afd977155116a8bb5?token=

02:07 <clever> its from https://developer.arm.com/documentation/ddi0333/h/memory-management-unit/mmu-descriptors/first-level-descriptor-address

02:07 <bslsk05> developer.arm.com: Documentation – Arm Developer

02:07 <rpnx_> Thanks

02:07 <heat> rpnx_, if you look at D4.3 you'll find the descriptor formats pretty well laid out

02:07 <clever> this shows some bits from TTBR0, and some bits from the virtual address, being combined, to form the address of a single entry in the L0 table

02:07 <heat> note: DISREGARD ALL THE STUPID COMPLEXITY OF ARM64

02:07 <clever> and then repeat down the tree

02:07 <zid> I used lazy mmap to solve a code golf thingy challenge once that was fun, I needed a trie so I just used the mmu

02:08 <heat> it has so many fucking options and realistically you just want to use 4KB pages, 48 bits (so 4 levels), two halves

02:08 <zid> I thought 64kB pages were the hotness these days

02:08 <clever> rpnx_: about your view on CoW: https://i.imgur.com/frNvbL7.png

02:08 <heat> you may want to try out some other stuff in the future, but that's super late stage shit

02:08 <zid> too much buffer bloat

02:09 <heat> also note that demand paging doesn't mean overcommit

02:09 <heat> you can feasibly manage that such that you can't crash from an OOM

02:09 <clever> yeah, you could CoW, but reserve enough space to un-CoW every single page

02:10 <heat> yes
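
The idea of reserving enough memory up front so that lazily populated or CoW pages can never fail later can be sketched as simple bookkeeping: every mapping that might eventually need a private page charges the counter when it is created, and creation fails if the charge does not fit. This is only an illustration of the accounting idea; the struct and function names are hypothetical and it is not a description of how any particular kernel does it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct commit_account {
    size_t total_pages;       /* pages that could ever be handed out */
    size_t committed_pages;   /* pages already promised to mappings */
};

/* Charge `pages` at mapping time; refuse instead of overcommitting. */
static inline bool commit_reserve(struct commit_account *a, size_t pages)
{
    if (pages > a->total_pages - a->committed_pages)
        return false;
    a->committed_pages += pages;
    return true;
}

/* Give the promise back when the mapping goes away. */
static inline void commit_release(struct commit_account *a, size_t pages)
{
    assert(pages <= a->committed_pages);
    a->committed_pages -= pages;
}
```

With this in place the failure shows up at the point where the mapping is requested, which is predictable, rather than at some later write fault.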

02:10 <clever> zfs does that with volumes and snapshots

02:10 <heat> that's how linux works

02:10 <epony> the problem with arm* is more unique hardware (no two models are alike) and it has no standards

02:10 <heat> https://www.kernel.org/doc/Documentation/vm/overcommit-accounting

02:10 <rpnx_> Most linux distributions I know turn on overcommit by default

02:10 <zid> epony: I also have no standards, which is why I use ARM

02:10 <zid> *rimshot*

02:10 <heat> epony, it does have standards!

02:11 <heat> EBBR, SBBR, SBSA

02:11 <zid> and those are just the ones that match his regex

02:11 <heat> all of these require UEFI, SBBR requires ACPI and PCIe(right?) as well

02:11 <klys_> ****

02:11 <epony> not quite, on paper maybe, on device.. surprises, and by the time you get something going, it's obsolete, expired, cancelled, eol and no more

02:12 <heat> yes, on device

02:12 <epony> so it's not a hobbyist platform, it's for big development teams and in vendor programming

02:12 <zid> arm is very standard

02:12 <heat> your server board needs to be SBBR compliant if you want to get ARM SystemReady(tm)(r) certification

02:12 <zid> I counted at least 50

02:13 <heat> the so called "x86 PC standards" are a lie btw

02:13 <heat> mostly propagated by ACPI and UEFI protecting you from the platform, plus you consuming the same family of platforms from the same vendors for the last 30 years

02:13 <rpnx_> Well, for now I am only targeting 1 device (raspberry pi 4) so those issues shouldn't matter too much for the time being

02:14 <heat> oh, also, SMM! can't forget good old SMM making sure you think you still have a PS/2 controller

02:15 <heat> the so called firmware bloat is also the thing making sure "hurr durr arm bad unstable, x86 good stable standardized" remains kind of true

02:15 <heat> which is why ARM wants it as well

02:15 <heat> although I don't think ARM processors are growing SMM any time soon

02:16 * heat knocks on wood

02:16 <klys_> wouldn't nearly prognosticate a future arm system with no firmware payloads to chuck in

02:16 <epony> arm* is just following up x86 with late delays and same mistakes done again, for short lived times and expired product lines

02:17 <epony> in much much smaller quantities and general purpose adept produce

02:18 <epony> "vendor managed" ;-) much more than x86

02:19 <heat> anyway, bed time

02:19 <klys_> good night heat.

02:19 <heat> thanks

02:19 <heat> love you klys_

02:19 <klys_> yw

02:19 <rpnx_> The arm documentation doesn't make too much sense to me, for example.

02:19 <rpnx_> 0x0 -> 1<<(32-N) that TTBR0 controls

02:19 <rpnx_> What is this notation?

02:20 <rpnx_> (0x0, 1 <<(32-n)) or [0x0, 1 <<(32-n)) or (0x0, 1 <<(32-n)] or something else?

02:20 sonny has joined #osdev

02:20 <clever> i think `0x0 -> foo`, is an address range, from 0 to foo

02:20 <klys_> the range from 0 to 1 shifted left 32-n times

02:20 <clever> and 1<<(32-N) is the formula to convert N to an address

02:20 <clever> yep, that
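
For the `0x0 -> 1<<(32-N)` range on 32-bit ARM, one way to read it is as a half-open interval [0, 1<<(32-N)): as far as I understand the scheme, an address goes through TTBR0 when its top N bits are all zero, and N = 0 means TTBR0 covers everything. The sketch below encodes that reading. The function names are invented here, and the interpretation should be checked against the manual for the specific core.

```c
#include <assert.h>
#include <stdint.h>

/* First address NOT translated through TTBR0 for a given N (0..7). */
static inline uint64_t ttbr0_limit(unsigned n)
{
    assert(n <= 7u);
    return 1ULL << (32u - n);          /* n = 0 gives 4GB: the whole space */
}

/* Nonzero if the 32-bit address falls in the range TTBR0 controls. */
static inline int uses_ttbr0(uint32_t vaddr, unsigned n)
{
    return (uint64_t)vaddr < ttbr0_limit(n);
}
```

For example N = 1 splits the space at 0x80000000 and N = 2 at 0x40000000, with everything at or above the limit going through TTBR1.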

02:21 * clever opens the 8000 page armv8 specs

02:21 <rpnx_> Uh... ok... so they invented their own notation for a range instead of using something standard...

02:21 opal has quit [Ping timeout: 258 seconds]

02:22 <epony> it is not called a reduced instruction set (of several such sets of no really reduced anything)

02:22 <epony> and it changes between models and products

02:22 <epony> and versions

02:24 heat has quit [Ping timeout: 260 seconds]

02:25 <rpnx_> Hum, it looks like this is the 32-bit code?

02:25 <geist> Yay I look away for a bit and fairly good conversations happen

02:25 opal has joined #osdev

02:26 <geist> rpnx_: think of it this way

02:26 <geist> The arm64 mmu is roughly equivalent in design to the x86 mmu and the riscv ones

02:26 <geist> the page tables work basically the same way, basically. The details matter

02:27 <geist> And arm64 has a bunch of extra knobs, but most folks configure it in the basic way (4 level paging, 4K pages)

02:27 <geist> So what you may be seeing are some of the controls that you can set, like how ‘large’ an address space can be, but usually you configure it for a full 48 bits

02:27 <geist> And that ‘N’ you’re seeing up there is i think one of the tunable knobs you usually leave in the default position

02:28 <klys_> geist, how much firmware on an internal default fuchsia setup today, curious

02:28 <geist> On x86 and riscv for example the configuring of the ‘size’ of the virtual address space is pretty coarse. You only really get to pick from 2 3 4 or 5 levels

02:28 <geist> Arm64 lets you finely control it down to the bit within a range they define
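
The per-bit control geist mentions is expressed on arm64 through the T0SZ and T1SZ fields of TCR_EL1, where each region spans 2^(64 - TxSZ) bytes. A small sketch of that arithmetic follows; the helper names are made up, and the architecture only permits a limited range of TxSZ values, which is not checked here beyond keeping the shifts valid.

```c
#include <assert.h>
#include <stdint.h>

/* TxSZ value that gives a region of `va_bits` address bits. */
static inline unsigned txsz_for_va_bits(unsigned va_bits)
{
    assert(va_bits >= 1u && va_bits <= 63u);
    return 64u - va_bits;
}

/* Size in bytes of the bottom region for a given T0SZ. */
static inline uint64_t lower_region_size(unsigned t0sz)
{
    assert(t0sz >= 1u && t0sz <= 63u);
    return 1ULL << (64u - t0sz);
}

/* Lowest address of the top region for a given T1SZ. */
static inline uint64_t upper_region_base(unsigned t1sz)
{
    assert(t1sz >= 1u && t1sz <= 63u);
    return ~0ULL << (64u - t1sz);
}
```

The usual "full 48 bits" setup corresponds to a TxSZ of 16 for both halves, which puts the start of the top region at 0xFFFF000000000000.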

02:28 <geist> klys_: hmm? I failed to parse that

02:29 <klys_> geist, supposing there was such a device prototype at your company, I ask, how much firmware is loaded on the prototype

02:30 <epony> these are SoC systems and have lots of firmware

02:30 <rpnx_> Oh, I think I understand how this works... maybe. The value of N sets the split point between the OS and userspace I think?

02:30 <rpnx_> But the limit here is 32

02:30 <geist> Yeah, also depends a lot on what you mean by firmware

02:30 <epony> in the processor there are multiple blocks not just cores

02:30 <rpnx_> So... not going to access 8GIB with this

02:30 <geist> rpnx_: yep. Also to be clear precisely what are you reading and looking at?

02:31 <geist> Is this armv8, 64bit mode paging?

02:31 <rpnx_> 6.12.1 First-level descriptor address

02:31 <klys_> giest, blobs of data you know nothing about

02:31 <rpnx_> I think this manual is for 32-bit arm

02:31 <geist> Yeah but it’s in the 64bit section in the arm … oh 32bit arm works differently

02:31 <geist> 64bit paging adds a bunch of features and looks similar but has different layouts in the page tables and whatnot

02:32 <rpnx_> https://developer.arm.com/documentation/ddi0333/h

02:32 <bslsk05> developer.arm.com: Documentation – Arm Developer

02:32 <geist> It’s different enough that I consider arm32 and arm64 different architectures

02:32 <geist> Basically 64bit is an ARM32 inspired 64bit architecture

02:32 <geist> Annoyingly arm insists on documenting both in the same manual

02:32 <rpnx_> Hum

02:33 <sonny> they did more than just extend the registers?

02:33 <geist> Much much more

02:33 <sonny> ok time to refresh on arm

02:33 <geist> The ISA is mostly different, and the system mode stuff, exception model, etc is basically completely different

02:34 <geist> That’s why I consider it a different arch. in LK for example there’s arch/arm and arch/arm64

02:34 <geist> You’ll find pretty much the same thing on BSD and Linux

02:34 <geist> The main connection is arm64 cores can (optionally) also run arm32 in various run levels

02:35 <sonny> so I never get the exception thing ... is that to support C++ or is it expected to be used by the OS programmer?

02:35 <rpnx_> Arm exception is interrupt

02:35 eryjus has joined #osdev

02:35 eryjus has quit [Client Quit]

02:35 <rpnx_> This will be very confusing when I add C++ support to the kernel :)

02:36 <geist> Yah i dont mean programming exceptions, i mean the way the cpu accepts hardware exceptions and interrupts

02:36 <sonny> doesn't arm have the stack winding thing?

02:36 <sonny> oh

02:36 <moon-child> rpnx_: better not do that, then

02:36 <geist> ARM calls traps where the cpu stops for some program order reason an ‘exception’ and hardware initiated events as interrupts

02:36 <rpnx_> moon-child, why not?

02:36 <geist> Which i think is a nice clean model

02:37 <geist> X86 mashes them all together

02:37 <sonny> ok, I get that definition

02:37 <moon-child> c++ is a foul beast which scorches the earth upon which it walks

02:37 <geist> But in arm64 they completely changed the hardware exception model from arm32

02:38 <geist> It’s much simpler and more straightforward. Arm32 exception model is frankly pretty wonky. Made sense back when everyone was hand writing assembly

02:38 <geist> But wasn’t very extensible

02:38 <rpnx_> moon-child, why did gcc switch to C++ and clang/llvm and MSVC all decide to use C++ then?

02:38 <moon-child> programmers are known for their masochism

02:38 <rpnx_> Your C compiler is written in C++ :)

02:38 <moon-child> mine?

02:38 <klys_> speaking of which

02:38 <rpnx_> If you use gcc, yes

02:38 <rpnx_> I don't know about icc

02:38 <rpnx_> Since that's proprietary

02:39 <rpnx_> No source code...

02:39 <geist> rpnx_: heh you’ll get into a lot of trouble if you start assuming folks are using particular things

02:39 <geist> You’ll find that lots of folks here are fairly fringe in terms of their choices of host OSes and toolchain and whatnot

02:40 <rpnx_> Hum, I am using cmake and clang, can't get much more boring than that I suppose.

02:40 <moon-child> https://0x0.st/op1t.txt

02:40 <moon-child> you do not need the source code to an application in order to get information about it

02:40 <rpnx_> oh

02:40 <rpnx_> Seems icc also uses c++

02:40 <rpnx_> So... hum.

02:41 <geist> Anyway, re: c++ in the kernel. It can work. I work on one at work

02:41 <moon-child> regardless: I got my stuff building under tcc (written in c). And wrote a decent chunk of my own c compiler (in c)

02:41 <geist> You just have to be very careful. If you’re an experienced c++ dev and you know what’s a good idea and what’s bad idea in a resource-starved environment like a kernel, you’ll do fine

02:41 <moon-child> (though I couldn't get tcc to link--had to use binutils stuff for that)

02:41 <rpnx_> Oh, that's neat. I forgot about TCC.

02:42 <rpnx_> geist: there is a nice YouTube video about some developers that got RTTI and C++ exceptions working in the Linux kernel

02:42 <geist> Sure. That doesn’t mean it’s a particularly good idea

02:43 <rpnx_> https://www.youtube.com/watch?v=uQSQy-7lveQ

02:43 <bslsk05> 'CppCon 2016: Rian Quinn “Making C++ and the STL Work in the Linux / Windows Kernels"' by CppCon (01:03:33)

02:43 <geist> But if you go in with eyes open, and you really really grok what’s going on under the hood, then go for it

02:43 <geist> But I really wouldn’t advise it

02:43 <rpnx_> Exceptions?

02:43 <geist> That could get extremely hairy in some deep kernel contexts

02:43 <geist> Yeah

02:43 <geist> RTTI sure, i dont personally see the big problem with it *except* it bloats the binary

02:44 <geist> And have to know that it’s not free runtime-wise, etc

02:44 <geist> But seems like in some contexts it’s really nice to have a dynamic_cast here or there

02:44 <rpnx_> I am not sure if that matters so much anymore... my raspberry pi has more memory than my old core 2 quad did

02:44 <geist> Yes and no, i think the real interesting metric there is what is the relative L1/L2 cache sizes

02:44 <geist> Sometimes ram goes up but not the low level caches

02:44 <klys_> I was debugging https://code.woboq.org/gcc/gcc/c-family/c-common.c.html#4868 and wanted a value to attach to splay_tree_node node; for reference: https://github.com/gcc-mirror/gcc/blob/master/include/splay-tree.h for now I'm using ( node -> value ) and getting values like 0xFFE1BD5CF7BFA0A0

02:45 <bslsk05> code.woboq.org: c-common.c source code [gcc/c-family/c-common.c] - Woboq Code Browser

02:45 <bslsk05> github.com: gcc/splay-tree.h at master · gcc-mirror/gcc · GitHub

02:45 <rpnx_> I remember when I upgraded to I think it was 8GB ram, on a desktop

02:45 <geist> But yes, for ‘big kernels’ i think size isn’t the biggest deal

02:45 <geist> I remember when i added the last 16K to my desktop to expand it to a full 64KB

02:45 <rpnx_> Now, even a $100 computer can have 8GiB

02:46 <geist> Now i can load integer basic *and* AppleSoft basic! At the same time!

02:46 <geist> (Apple 2 language card)

02:46 <rpnx_> I think the oldest computer I had was a windows ME laptop

02:46 <rpnx_> And that had an expansion card with 256MB storage

02:47 <geist> But anyway, like i said if you are thinking of doing C++ in your kernel, realize you are taking on a lot more work than you would if you stuck with something like C

02:47 <geist> Only because you now have to consider all of the ramifications of the extended language and be very careful what to use where

02:47 <geist> If you use it judiciously it’s very nice, in my experience

02:47 <rpnx_> Hum, it's more work in some areas, such as needing the ELF loader and eh_frame management

02:48 <geist> No i mean all these extra features in the language you should probably *not* use

02:48 <rpnx_> What do you mean?

02:48 <rpnx_> Like std::string?

02:48 <geist> But then you have to avoid them. Also things like, if you like std:: template libraries? Too bad. You probably can’t use most of those, so you’ll have to implement your own, etc

02:48 <rpnx_> Geist: The developers in that video ported the STL to Linux and Windows kernelspace, lol

02:49 <geist> And even knowing which parts of std:: are safe or not requires a fair amount of knowledge of how it all works

02:49 <geist> rpnx_: yes but they also probably are very experienced devs. Are you?

02:49 <moon-child> if I had to use a c++-alike for kernel, I would probably use d

02:49 <sonny> rpnx_ would that be considered firmware now?

02:49 <moon-child> not only because it's a more sensible language, but also there's been some serious effort done to get its runtime working in embedded contexts

02:49 <epony> sonny, which one "that"?

02:49 <rpnx_> I will probably reimplement the parts of the STL that I will use, and maybe copy some of their code because I think it is MIT licensed.

02:49 <geist> My point is you can *do* it, but knowing what to do and what not to do is the hard part. And if you dont know what you dont know, you’ll get into a lot of trouble

02:50 <geist> Which is itself a good learning thing, but keep in mind it’ll get in the way of progress if you’re just getting started

02:50 <sonny> porting STL to kernel space, it seems to be about how much runtime you want to provide for yourself

02:50 SpikeHeron has quit [Quit: WeeChat 3.6]

02:50 <epony> kernel space is programs that you load after boot, so it's not firmware

02:50 <geist> It’s like saying ‘I’m going to build a car from scratch. I have some metal and some welding torches here (C) but i think I’ll make it in carbon fiber (C++)’

02:51 <geist> Okay, sure, you can do the latter, but that’s a different class of learning curve you just added to the pile of things

02:51 <epony> firmware is running on chip (processor, hardware device, controller or asic)

02:51 <rpnx_> I will work on getting a kernel working before worrying about making it support C++

02:51 <geist> But you said you wanted to write the kernel in c++

02:51 <rpnx_> But eventually I want to have the main bits implemented in C++ so I can reuse the same libraries in kernel and userspace

02:51 <sonny> epony: yeah but say you make an interpreter for real mode, what's that called?

02:51 <epony> a program

02:52 <geist> rpnx_: that is a thing I’d highly highly recommend not doing unless you have to

02:52 <geist> Same reason: user and kernel space code has typically widely different requirements

02:52 <geist> We have to thread that needle a lot in fuchsia and it definitely requires a lot of careful thought

02:52 <epony> sonny, where you store and at which stage you run it makes it firmware or software

02:52 <geist> And we only very judiciously do it (share library code with user and kernel space)

02:52 <epony> it's software in general too, but one that is not user programmable

02:52 <sonny> epony so whatever gets read from rom for initialisation?

02:53 <rpnx_> Oh fuchsia, that's cool. I liked the way zircon looked

02:53 <epony> more like, precisely, what is stored on chip and running there without the need for software and user programming

02:53 SpikeHeron has joined #osdev

02:54 <sonny> oh

02:54 <geist> so i forget, what was the firmware question a while back?

02:54 <geist> I kinda got sidetracked

02:54 <klys_> geist, about proprietary blobs

02:54 <geist> Ah yes. For what kind of devices are you asking?

02:54 <klys_> how much blob is in the current prototype

02:54 <geist> Current prototype of what? A consumer device? A server?

02:55 <klys_> for all included devices, something for a future consumer

02:55 <geist> Consumer devices tend to have not a tremendous amount of firmware blobs. Usually the first couple stages of the bootloader and then some code left behind to run in secure mode

02:55 <geist> That’s usually from the vendor

02:55 <klys_> what about the radios

02:55 <geist> The bootloader probably too, though frequently that’s just u-boot or a fork of LK or whatnot

02:56 frkzoid has joined #osdev

02:56 <geist> If a device has a radio, yes usually it’s got some vendor provided firmware. Wi-Fi radios too

02:56 <epony> the baseband modems are chip-computer ASIC and SoC too

02:56 <geist> Maybe some blobs on the GPU

02:56 <klys_> so, seeing as the iphone 11 for example had 5+ radios

02:56 <epony> with firmware that is not open and not available to you as trade secret

02:56 <geist> But outside of that usually there’s probably less blobs of ‘firmware’ than on a typical PC

02:57 <geist> Well if you’re working on the product it’s probably available to you, even if you dont personally build it

02:57 <klys_> do you have a few megabytes to look forward to?

02:57 <rpnx_> Well, I can't even build C++ right now

02:57 <geist> Usually you and the vendor have some sort of shared git repo and can send patches back and forth

02:57 <rpnx_> When I try to

02:57 <geist> But then by the time it hits a product a lot of that code is just prebuilt

02:57 opal has quit [Ping timeout: 258 seconds]

02:57 <rpnx_> Oh wait I'm dumb

02:57 <rpnx_> I didn't even put C++ in enable_languages

02:57 <clever> epony: in the case of the rpi, there are 3 blobs that occur before the arm core runs the user supplied kernel (typically linux, but anything that can run on arm fits)

02:57 <epony> arm* are cores, but what you have as an IC is a lot more than that, it's an entire System on Chip (SoC) and even different models by the same vendor may differ so much that they require a different set of firmware and binary payloads

02:57 <clever> epony: the maskrom, the .bin stage, and the .elf stage

02:58 <rpnx_> CMake has the best error messages sometimes...

02:58 <epony> there is firmware in the processor SoC too, which you don't bother to work with, but is there

02:58 <klys_> opensbi, uboot, and...

02:59 <geist> You can bet that apple has access to all of the source on their devices

02:59 <epony> in about at least several locations, similar to the controllers on your generic PC and similar to the BIOS payloads

02:59 <geist> But that’s cause they are total control freaks

02:59 <epony> if they were, they'd have their own and not Arm* cores

03:00 <epony> I think they are vendor / implementer like the others using Arm* cores

03:00 <rpnx_> https://gist.github.com/rnichollx/82a8c803bf94fae1a3440dfa59d2a949

03:00 <bslsk05> gist.github.com: gist:82a8c803bf94fae1a3440dfa59d2a949 · GitHub

03:00 <rpnx_> Surprisingly worked quite easily

03:00 <rpnx_> Now, the exception handling and ctor/dtor stuff is still missing

03:01 <geist> Anyway like i said there’s a difference between what the device maker sees as a blob, and what the end user sees

03:01 <rpnx_> So I'd have to reimplement librt and so on.

03:01 opal has joined #osdev

03:01 <klys_> yes, there is. also, the kernel dev isn't likely the device maker either

03:03 <epony> except in maybe Apple's case and a few other similar like Microsoft and Google specific products

03:04 <klys_> so, epony, what's the current ios microkernel if you know

03:04 <epony> where maker is the company that ordered the CPU / SoC batch, not making these themselves, but getting that done with fabless foundries and device integrator companies

03:05 <epony> in that sense "Apple" does not "make" anything, but get it designed, ordered, programmed and sold / managed as services (same for the other similar software-device vendors)

03:07 <geist> Uh, i think apple would extremely beg to differ

03:07 <geist> If there’s a device maker that really designs their own shit its apple

03:08 rpnx_ has quit [Quit: This computer has gone to sleep]

03:08 <epony> yes, of course, but the ODM/OEM model for electronics is efficient and yet not "own made" https://en.wikipedia.org/wiki/List_of_ARM_processors#Designed_by_third_parties see the "variety"..

03:08 <bslsk05> en.wikipedia.org: List of ARM processors - Wikipedia

03:08 <klys_> poking around the claim is circulating that ios still uses xnu

03:08 rpnx__ has joined #osdev

03:09 <geist> Sure i mean apple doesn’t physically manufacture stuff

03:09 <geist> But that’s fairly unimportant, they do most of the hardware and software design themselves

03:10 <rpnx__> Anyway, I should stop messing around with C++ now

03:10 <geist> Anyway I’m not trying to discourage you rpnx__

03:10 <epony> especially in the Arm (post-Intel) period, previously PowerPC

03:10 <rpnx__> Is this the right manual https://developer.arm.com/documentation/ddi0333/h/?lang=en

03:10 <bslsk05> developer.arm.com: Documentation – Arm Developer

03:10 <rpnx__> For arm64

03:10 <geist> Just saying go into it with eyes wide open. As a kernel developer that deals with c++ in the kernel, it’s quite powerful but also a lot of responsibility

03:11 <geist> And i wouldn’t take it on at the same time as learning kernel stuff

03:11 <geist> rpnx__: not by a long shot

03:11 <geist> That’s the manual to a cpu from 2006

03:11 <geist> What you want is the armv8 arm architecture manual

03:11 <geist> We call it the ARM ARM

03:12 <rpnx__> Ok, that would explain why I cannot find how to set up more than 32 bits on mmu :)

03:12 <geist> And armv8 is the version of the arm architecture that added 64bit mode

03:12 <geist> So what you’ll find, and this is different from x86, is there is a core architecture manual

03:12 <geist> It explains and defines what the ARM architecture is. Basically the blueprints for how any given core implementation would implement the ARM architecture

03:12 <geist> then there are multiple cores that each have their own manual

03:13 <geist> Those manuals are much smaller because they’re basically describing how their cpu implements the ARM arch

03:13 <geist> Arm1176 is from about 2006. It was a later model implementation of armv6

03:13 <geist> Armv7 came along shortly afterwards and then armv8 came along in the early 2010s

03:14 <epony> some generic info about SoC to understand how much it's not possible to expect firmware free and vendor unencumbered products in this category https://en.wikipedia.org/wiki/System_on_a_chip#Applications

03:14 <bslsk05> en.wikipedia.org: System on a chip - Wikipedia

03:14 <geist> Armv9 was just released, but it’s basically a mod of armv8

03:14 <geist> epony: yeah and there’s no intrinsic reason they can’t be free, it’s mostly that vendors are stingy and would rather not release stuff

03:14 <geist> If there’s no business reason to they wont

03:14 <epony> yes

03:15 <epony> for competitive and secrecy and business (financial) not technical reasons

03:16 <epony> but also they claim that this is against counterfeit and theft from the Asian OEM and integrator companies

03:16 <rpnx__> Ah, I found this

03:17 <rpnx__> https://lowenware.com/blog/osdev/aarch64-mmu-programming/

03:17 <bslsk05> lowenware.com: AArch64 MMU Programming - Löwenware

03:17 <rpnx__> This seems a better way of understanding it

03:17 <geist> Be careful. Keep in mind you’re getting blogs like that for free

03:18 <geist> So many times you get exactly what you pay for

03:19 <geist> So lots of times the idea is sound, but it’s frequently written by folks that just learned the topic

03:19 <geist> And then they release stuff with errors that lead you astray

03:21 <rpnx__> Well, I've seen that a few times before.

03:22 <rpnx__> But maybe better to get a working understanding first and then learn the details later

03:25 <klys_> epony, I get the impression you think risk management is nontechnical and suspect.

03:26 <geist> rpnx__: sounds good! Welcome to arm64 hackers!

03:26 <geist> Hackery even. Silly ipad keeps autocorrecting

03:27 <klys_> epony, being able to quantify risk in terms of statistical measures is a rather basic subject of academic inquiry.

03:39 <epony> klys_, it's RISC architecture

03:40 <klys_> I have a book here called "the basic practice of statistics"

03:40 <epony> in contrast to CISC (x86 for example)

03:40 <epony> I was talking about the ISA / CPU architecture comparisons only.

03:42 <epony> https://en.wikipedia.org/wiki/Instruction_set_architecture#Classification_of_ISAs

03:42 <bslsk05> en.wikipedia.org: Instruction set architecture - Wikipedia

03:42 <klys_> and another one, "statistical techniques in business and economics." it has a leaflet at the front with four pages of helpful formulae

03:42 <klys_> :)

03:43 <klys_> when I first studied the RISC debate, rcollins was around. he didn't like intel too much, particularly because of their CISC habit.

03:44 rpnx__ has quit [Quit: This computer has gone to sleep]

03:45 <epony> see Arm* are both RISC, x86 6502 and Z80 are CISC

03:45 <epony> https://en.wikipedia.org/wiki/Comparison_of_instruction_set_architectures#Instruction_sets

03:45 <bslsk05> en.wikipedia.org: Comparison of instruction set architectures - Wikipedia

03:45 <klys_> how about 6809, pretty reduced eh

03:46 <epony> these designs are historic and go back 50 years

03:47 <klys_> hmm I didn't realize the 6809 was big endian

03:47 <epony> the popular ones are LE

03:48 <epony> for CPU the endianness optimisations lead to that, for NET the serialisation optimises for BE

03:49 <klys_> well I have a bunch of dip sockets and some copper clad board, just need some hcl and an iron and I could use the press-n-peel to put together a 6809 board

03:49 <klys_> oh and a nice drilling machine I have

03:50 fkrauthan has quit [Quit: ZNC - https://znc.in]

03:51 fkrauthan has joined #osdev

03:52 fkrauthan has quit [Client Quit]

03:52 sonny has quit [Quit: Client closed]

03:53 <geist> yah of the 8 bit era 6809 is my fave

03:53 <geist> i put together a board a while back and had fun with it

03:53 <klys_> https://github.com/travisg/6809obc/tree/master/eagle

03:53 <bslsk05> github.com: 6809obc/eagle at master · travisg/6809obc · GitHub

03:53 <epony> I got 6502 and z80 usage in the early to mid 80ies, these were fantastic PCs

03:53 <geist> note an 8 bit machine being BE or LE is usually only in the form of 'what order do bytes get stuffed on the stack when you push the PC'

03:54 <geist> but 6809 had more 16bit regs and i think a load/store 16bit so i guess it did start to matter
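To make the byte-order point concrete, here is a minimal C sketch (not from the log) of storing a 16-bit value in each order. The function names `store16_be` and `store16_le` are made up for illustration. The 6809 uses the big-endian order, while the 6502 and Z80 use the little-endian one.

```c
#include <stdint.h>

/* Big-endian: most significant byte at the lower address (6809 style). */
void store16_be(uint8_t *dst, uint16_t value)
{
    dst[0] = (uint8_t)(value >> 8);
    dst[1] = (uint8_t)(value & 0xffu);
}

/* Little-endian: least significant byte at the lower address (6502/Z80 style). */
void store16_le(uint8_t *dst, uint16_t value)
{
    dst[0] = (uint8_t)(value & 0xffu);
    dst[1] = (uint8_t)(value >> 8);
}
```

On a pure 8-bit design the only place this ordering shows up is where the hardware itself writes a 16-bit quantity, such as pushing the PC; once 16-bit loads and stores exist, every such access depends on it.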

03:54 <geist> klys_: yeah i have been meaning to make a rev 2 of that in kicad

03:54 fkrauthan has joined #osdev

03:55 <klys_> geist, do you usually use kicad on a specific desktop or laptop environment?

03:55 <geist> not really. i've found it works equally well on windows/linux/mac

03:55 <geist> i have it installed on all three

03:56 <klys_> well cool, you're looking forward to it I see

03:57 <geist> yeah it's just one of many projects really

03:58 * klys_ clones the repo, good times.

03:58 <geist> somewhere i have a rev of a straight conversion to kicad. i should check that into a branch if i haven't already

04:02 rpnx_ has joined #osdev

04:03 <rpnx_> I found the ARM ARM

04:03 <geist> recommendation: save a copy of the pdf and use locally

04:03 <geist> it's *huge*

04:03 <klys_> geist yeah I have rev2

04:03 <geist> can you paste a link?

04:04 <klys_> https://github.com/travisg/6809obc/tree/rev2/eagle

04:04 <bslsk05> github.com: 6809obc/eagle at rev2 · travisg/6809obc · GitHub

04:04 <klys_> rpnx_ the link to the arm arm pdf?

04:04 <geist> yeah that's a WIP, i totally forget what i was starting to change

04:04 <geist> rpnx_: want to make sure it's the right one

04:04 <geist> and i'm in front of a computer now so i can easily check

04:06 <klys_> hope he doesn't have to "inspect element" to get a link

04:16 <geist> nah the current arm doc site is quite good. it used to be a disaster, some sort of sharepoint like thing that was incredibly slow

04:16 <geist> but they redid it a year or two ago and now it's pretty discoverable and the links work and whatnot

04:26 <klys_> https://dweet.net/6ttfF twirling sphere demo

04:26 <bslsk05> dweet.net: dweet 6ttfF

04:27 <zid> neat

04:27 <zid> https://zid.github.io/gl.html I made a cog for my 'rendering things in js' demo

04:27 <bslsk05> zid.github.io <no title>

04:29 <zid> (and a normal viewer, for kicks)

04:32 <rpnx_> https://developer.arm.com/documentation/ddi0553/latest

04:32 <bslsk05> developer.arm.com: Documentation – Arm Developer

04:32 <rpnx_> This is the one I'm using

04:32 <rpnx_> Hopefully this is a good one

04:44 <klys_> a different guess here: https://documentation-service.arm.com/static/62ff43b0e95b0a633aff8a64?token=

04:45 <rpnx_> This will take a lot of reading.

04:45 <rpnx_> I have so many documents now

04:45 <rpnx_> Haha

04:47 rpnx_ has quit [Quit: This computer has gone to sleep]

04:48 <epony> one of them you need

04:49 rpnx_ has joined #osdev

04:49 <rpnx_> hum

04:50 <rpnx_> I wish I didn't d/c every time my laptop went into sleep mode.

04:50 <clever> rpnx_: run an irc client on another box, under screen, then ssh in and `screen -x`

04:51 <rpnx_> If I wanted to learn a terminal IRC client...

04:52 \Test_User has quit [Ping timeout: 268 seconds]

04:52 \Test_User has joined #osdev

04:53 <rpnx_> I wonder if I could make an OS with no terminal :)

04:53 <rpnx_> GUI only

04:56 <rpnx_> I think this mmu stuff will require me to just read the docs thoroughly, there don't seem to be many easily digestible summaries of how to set it up.

05:04 <rpnx_> This one looks pretty neat: https://github.com/bztsrc/raspi3-tutorial/blob/master/10_virtualmemory/mmu.c

05:04 <bslsk05> github.com: raspi3-tutorial/mmu.c at master · bztsrc/raspi3-tutorial · GitHub

05:05 Oshawott has quit [Ping timeout: 252 seconds]

05:06 <clever> rpnx_: its far simpler when using pure identity paging, because you can just start in no-mmu C, generate the tables, turn the mmu on, and basically nothing changes

05:07 <clever> the LK example i had linked earlier is more complex, because it wants to run from some high address, and the compiler/linker assumes the code is running from that address

05:07 <clever> so if its not mapped correctly, C code just implodes

05:07 <clever> which means having to generate the paging tables with asm instead

05:08 <rpnx_> right

05:09 <rpnx_> But identity paging will not allow me to set up the userspace

05:10 <clever> you then have 2 options

05:11 <clever> 1: break the code up into 2 chunks, a loader written in C and linked for a low addr, that generates the paging tables to map the 2nd chunk mapped to a high addr, and jump to it

05:11 <clever> 2: write that 1st chunk in position independent asm, so it can be included in the same build and it wont care that its initially loaded to the wrong addr

05:13 <rpnx_> So, if I understand correctly, it's using 512 pointers for the page table, and those are pointers to wherever.

05:13 <rpnx_> Or possibly 1024

05:15 <clever> 512 for that L0 table i think, this code is complicated by putting L0, L1, L2, and L3 into the same array

05:15 <clever> personally, i would have used multiple separate arrays

05:15 <clever> https://github.com/librerpi/rpi-open-firmware/blob/master/arm_chainloader/mmu.c#L12-L36

05:15 <bslsk05> github.com: rpi-open-firmware/mmu.c at master · librerpi/rpi-open-firmware · GitHub

05:15 <clever> this is how i did it on 32bit arm

05:16 <clever> for 32bit, the L0 entries each cover a 1mb chunk of the virtual space, and the hardware supports 1mb pages

05:16 <clever> so i can just skip every other layer

05:16 \Test_User has quit [Quit: e]

05:18 <JerOfPanic> morning

05:19 <rpnx_> So it seems it eventually just points to the start of a page and has some extra bits for controlling things like execute and so on

05:20 <rpnx_> So this isn't the same "page table" as I was thinking of

05:20 <rpnx_> Fully hardware, I would need some other way to keep track of e.g. what processes are using what pages and such

05:21 <clever> rpnx_: each entry in the L0 table, is pointing to the address of an L1 table, which must be page-aligned

05:21 <clever> and then each entry in the L1 table, is pointing to a new L2 table

05:22 <clever> so it forms a giant tree

05:22 <clever> at some steps, you can set a special flag, to say that you're pointing to a huge page of mapped data, rather than another table

05:22 <clever> and that terminates the tree branch early
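The tree walk described above can be sketched in C. This is an illustration only, assuming the AArch64 4 KiB translation granule with a 48-bit virtual address (four levels, 512 entries per table, 9 index bits per level). The helper names are hypothetical, and the descriptor encoding (low bits `0b11` for a table, `0b01` for a block) is my reading of the Armv8-A format, so check it against the ARM ARM before relying on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PT_ENTRIES 512u            /* entries per table with a 4 KiB granule */

/* Index into the table at `level` (0..3) for virtual address `va`.
 * Level 0 uses VA bits 47:39, level 1 bits 38:30, level 2 bits 29:21,
 * level 3 bits 20:12; the low 12 bits are the offset inside the page. */
unsigned pt_index(uint64_t va, unsigned level)
{
    assert(level <= 3);
    unsigned shift = 39u - 9u * level;
    return (unsigned)((va >> shift) & (PT_ENTRIES - 1u));
}

/* How much virtual address space one entry at `level` covers:
 * 512 GiB, 1 GiB, 2 MiB, 4 KiB for levels 0..3. */
uint64_t entry_span(unsigned level)
{
    assert(level <= 3);
    return 1ull << (39u - 9u * level);
}

/* A valid entry with bit 1 set at levels 0..2 points to the next table. */
bool desc_is_table(uint64_t desc, unsigned level)
{
    return level < 3 && (desc & 3u) == 3u;
}

/* A valid entry with bit 1 clear at level 1 or 2 maps a large block
 * directly, which ends that branch of the tree early. */
bool desc_is_block(uint64_t desc, unsigned level)
{
    return (level == 1 || level == 2) && (desc & 3u) == 1u;
}
```

Under these assumptions a block entry at level 2 maps 2 MiB and one at level 1 maps 1 GiB, which is the "huge page" case that stops the walk before level 3.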

05:23 rpnx_ has quit [Quit: This computer has gone to sleep]

05:23 kof123 has quit [Ping timeout: 268 seconds]

05:23 <JerOfPanic> I am working on FPU support for my multitasking, got any source code like this? Like e.g. storing FPU registers in variables? And returning an FPU state?

05:25 <JerOfPanic> Intel documentation is highly versable.

05:25 <JerOfPanic> I think I should do a FWAIT on FPU.

05:26 <JerOfPanic> FSAVE, FLDENV

05:26 <JerOfPanic> should I check for error conditions?

05:27 <JerOfPanic> "The contents of the x87 FPU status register (referred to as the x87 FPU status word) can be stored in memory using the FSTSW/FNSTSW, FSTENV/FNSTENV, FSAVE/FNSAVE, and FXSAVE instructions. It can also be stored in the AX register of the integer unit, using the FSTSW/FNSTSW instructions."

05:28 <JerOfPanic> this is very verbose

05:33 \Test_User has joined #osdev

05:33 xenos1984 has quit [Read error: Connection reset by peer]

05:33 <JerOfPanic> https://github.com/torvalds/linux/blob/master/arch/x86/power/cpu.c

05:33 <bslsk05> github.com: linux/cpu.c at master · torvalds/linux · GitHub

05:38 rpnx_ has joined #osdev

05:38 <rpnx_> clever, ah okay, I assumed that meant that whole cpu was configured for huge pages

05:39 <zid> fxsave is the good one

05:40 <JerOfPanic> zid: yeah, but I need to also render a new state

05:40 <JerOfPanic> for new processes threads

05:41 <JerOfPanic> clone it?

05:41 <zid> https://www.felixcloutier.com/x86/fxsave has the tables from the manual if you want a web reference

05:41 <bslsk05> www.felixcloutier.com: FXSAVE — Save x87 FPU, MMX Technology, and SSE State
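On the question of producing a state for a new thread: one common approach, sketched here as an assumption rather than anything proposed in the channel, is to build a fresh FXSAVE image in memory instead of cloning the current one, and let FXRSTOR load it on the first switch to that thread. The struct below follows the 64-bit FXSAVE layout (512 bytes, 16-byte aligned). The names `fxsave_area` and `fxsave_init_fresh` are hypothetical; `0x037F` and `0x1F80` are the documented default values of the x87 control word and MXCSR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Memory image used by FXSAVE/FXRSTOR in 64-bit mode (REX.W form). */
struct fxsave_area {
    _Alignas(16) uint16_t fcw;  /* x87 control word */
    uint16_t fsw;               /* x87 status word */
    uint8_t  ftw;               /* abridged tag word, 0 = all registers empty */
    uint8_t  reserved0;
    uint16_t fop;               /* last x87 opcode */
    uint64_t fip;               /* last x87 instruction pointer */
    uint64_t fdp;               /* last x87 data pointer */
    uint32_t mxcsr;             /* SSE control/status register */
    uint32_t mxcsr_mask;
    uint8_t  st_mm[8][16];      /* ST0-ST7 / MM0-MM7, 16 bytes each */
    uint8_t  xmm[16][16];       /* XMM0-XMM15 */
    uint8_t  reserved1[96];
};

_Static_assert(sizeof(struct fxsave_area) == 512, "FXSAVE image is 512 bytes");
_Static_assert(offsetof(struct fxsave_area, mxcsr) == 24, "MXCSR sits at offset 24");
_Static_assert(offsetof(struct fxsave_area, xmm) == 160, "XMM0 sits at offset 160");

/* Fill `area` with the state a thread would see after FNINIT plus the
 * reset value of MXCSR: all exceptions masked, round to nearest. */
void fxsave_init_fresh(struct fxsave_area *area)
{
    assert(area != NULL);
    memset(area, 0, sizeof *area);
    area->fcw = 0x037F;
    area->mxcsr = 0x1F80;
}
```

Each thread would then own one such area; the context switch saves into the outgoing thread's area and restores from the incoming one.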

05:41 <clever> rpnx_: i think there are 2 separate things, one for just making every page bigger, and then another for terminating the page-table walk early, and making even bigger "pages" from a collection of the 1st one

05:44 Clockfac1 has quit [Ping timeout: 268 seconds]

05:45 rpnx_ has quit [Quit: This computer has gone to sleep]

05:49 <JerOfPanic> thinking

05:53 xenos1984 has joined #osdev

06:01 \Test_User has quit [Ping timeout: 252 seconds]

06:10 kof123 has joined #osdev

06:41 hbag has quit [Remote host closed the connection]

06:52 \Test_User has joined #osdev

07:02 bauen1 has quit [Ping timeout: 255 seconds]

07:10 epony has quit [Remote host closed the connection]

07:38 gildasio has quit [Read error: Connection reset by peer]

07:38 foudfou has quit [Remote host closed the connection]

07:39 foudfou has joined #osdev

07:39 gildasio has joined #osdev

08:09 gildasio has quit [Ping timeout: 258 seconds]

08:11 gildasio has joined #osdev

08:13 bauen1 has joined #osdev

08:28 ZipCPU_ has joined #osdev

08:29 ZipCPU has quit [Ping timeout: 255 seconds]

08:29 ZipCPU_ is now known as ZipCPU

08:39 zaquest has quit [Remote host closed the connection]

08:40 zaquest has joined #osdev

08:51 GeDaMo has joined #osdev

09:06 smach has joined #osdev

09:19 Piraty has quit [Quit: -]

09:19 Piraty has joined #osdev

09:26 SGautam has joined #osdev

09:55 riverdc has quit [Quit: quitting]

09:56 riverdc has joined #osdev

09:59 <mjg> moon-child: https://gnats.netbsd.org/56979 btw

09:59 <bslsk05> gnats.netbsd.org: NetBSD Problem Report #56979: fork(2) fails to be signal safe

10:00 xenos1984 has quit [Read error: Connection reset by peer]

10:01 eroux has quit [Ping timeout: 252 seconds]

10:02 xenos1984 has joined #osdev

10:04 eroux has joined #osdev

10:46 frkzoid has quit [Ping timeout: 244 seconds]

11:11 Ali_A has joined #osdev

11:25 netbsduser has joined #osdev

11:27 lkurusa has joined #osdev

11:56 <mrvn> when I build spack it links against liba, libb, libc, libd. I wonder when they will get a libe.

11:58 gog` has joined #osdev

12:02 m5zs7k has joined #osdev

12:23 nyah has joined #osdev

12:24 TheSpecialist666 has joined #osdev

12:24 <TheSpecialist666> I just want to warn everyone here to stay away from Discord OSDev. I worked hard to write a 64-bit OS which is already a challenge and I went there for help on a complicated problem with the GDT. Every time they fixed a problem there was another one.

12:24 <TheSpecialist666> I don't like debugging and I am new to it. They still made me use GDB and other crap which is like using a hammer to fix a watch. When I had more problems later I asked again and all they did was bitch at me for not giving a diff for what I changed

12:24 <TheSpecialist666> Which they said was why it broke. I don't even use patchutil and diffutil so of course I don't have a fucking diff.

12:25 <TheSpecialist666> I almost had enough of 64 bit now but I would like to know why I get a triple fault https://github.com/TheSpecialist666/MiniOS on my OS. When you have figured it out made an issue on the repository.

12:25 <bslsk05> TheSpecialist666/MiniOS - Hobby OS project (0 forks/2 stargazers)

12:25 TheSpecialist666 has quit [Quit: Thanks in Advance]

12:35 kof123 has quit [Ping timeout: 268 seconds]

12:40 <Mutabah> ... well, that's an attitude

12:41 <Mutabah> While I've heard that the discord server can be a little brash... I assume they were asking for people to just fix the problem

12:41 <sham1> More than likely

12:46 SGautam has quit [Quit: Connection closed for inactivity]

12:48 <netbsduser> disappointed to read that

12:50 <netbsduser> i spent over an hour last night assisting them to figure out a triple fault, and i went to some effort to outline why we used each tool in the toolbox to figure it out

12:51 <Mutabah> yeah, that was my rough guess

12:53 gog has quit [Ping timeout: 252 seconds]

12:54 gog has joined #osdev

12:55 gog` has quit [Ping timeout: 252 seconds]

12:55 <netbsduser> others assisted as well, including the author of the bootloader they used who noticed some other problems (the bootloader Limine provides a callback for printing to the framebuffer during early boot, but you've got to preserve the expected GDT layout + keep the framebuffer identity mapped)

12:55 <netbsduser> oh well, i will just chalk it up to their frustration after a pretty rough introduction to the wonderful world of the bare metal

13:01 <Mutabah> Yeah, that's probably it

13:15 isaacwoods has joined #osdev

13:33 <sbalmos> Learning curves are steep, sniffle sniffle

13:35 <netbsduser> there is no royal road to operating systems, and only those who do not dread the fatiguing climb of its steep paths have a chance of gaining its luminous summits

13:36 gog has quit [Ping timeout: 268 seconds]

13:36 <sbalmos> if you consider a small campfire out in the wilderness luminous, fair enough

13:56 Matt|home has quit [Ping timeout: 244 seconds]

14:20 rpnx has joined #osdev

14:20 gog has joined #osdev

14:26 rpnx has quit [Ping timeout: 244 seconds]

14:27 opal has quit [Ping timeout: 258 seconds]

14:41 bxh7 has joined #osdev

14:43 bxh7 has quit [Quit: ZNC 1.8.2 - https://znc.in]

14:45 opal has joined #osdev

14:49 heat has joined #osdev

14:49 <heat> >When you have figured it out made an issue on the repository.

14:50 <heat> wow are you trying to nerdsnipe me?

14:50 rpnx has joined #osdev

14:55 <heat> fork() in a signal handler is easily one of the cursediest cursed things ever

14:57 rpnx has quit [Ping timeout: 244 seconds]

15:02 isaacwoo1 has joined #osdev

15:03 matt__ has joined #osdev

15:04 isaacwoods has quit [Ping timeout: 244 seconds]

15:05 <heat> klys_, btw, the amount of firmware on a typical machine is "a lot"

15:06 <heat> the amount of proprietary blobs in that firmware is also "a lot"

15:08 <heat> on a typical x86 system, you have the intel ME image (blob, not shared with vendors), the FSP (does a lot of the platform initialization, memory training, etc; afaik, not shared with vendors), a shit ton of proprietary modules that AMI and all the other crappy downstreams build on EDK2 (proprietary, shared with vendors), some proprietary Intel modules that are shared with vendors (like CSM), EDK2 (FOSS, although possibly modified)

15:09 <mrvn> Give a man a bugfix and he will have fixed a bug. Teach a man to use GDB and he will fix bugs for life. Except when the man is TheSpecialist666. Then he will just go on irc and command people there to fix his code.

15:10 <mrvn> heat: I have code where everything (but the start) runs in signal handlers. Only way to change the stack.

15:10 <heat> then each PCI device has a ROM, which is usually a EFI driver blob; Intel platforms' iGPU also has the VBT (video BIOS table) which is a non-free blob you have to copy (or build yourself, good luck have fun, most people just take it from the blob firmware)

15:11 <mrvn> user space multithreading in C is fun.

15:11 <heat> ACPI is, you guessed it, a bunch of blobs, although usually built dynamically in firmware

15:11 <heat> and I'm probably missing some

15:12 <heat> mrvn, no it's not?

15:12 <heat> __asm__ __volatile__("mov %0, %%rsp; call *%1" :: "r"(new_stack), "r"(func) : "memory");

15:13 <mrvn> heat: that's not C.

15:13 <gog> fix my code

15:13 <heat> yes it is

15:13 <mrvn> heat: it's a compiler extension and assembly

15:13 <heat> I don't write standard C

15:13 <heat> that's madness

15:13 <heat> gog, yes mommy

15:13 <heat> what code

15:13 <gog> idk

15:14 <gog> i don't even write code

15:14 <mrvn> heat: in POSIX / C you can use sigaltstack so the signal handlers run on their own stack, then trigger a signal and voila, new stack.

15:14 <gog> mrvn: hax

15:14 <heat> mrvn, sure, you also need to go to the fucking kernel for a signal

15:14 <heat> if you restrain yourself to standard C everything is bad and slow

15:15 <heat> and stupid anal

15:15 <heat> I should static_assert(sizeof(char) == 1 && CHAR_BIT == 8)

15:17 <mrvn> no to the first, yes to the second

15:17 <mrvn> sizeof is the size in chars
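As a small illustration of that reply: `sizeof(char) == 1` holds by definition in C, so only the `CHAR_BIT` half of the assertion can ever fail. A sketch in C11 follows; the helper name `object_bits` is made up for the example.

```c
#include <assert.h>   /* static_assert macro (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>

/* sizeof counts in units of char, so sizeof(char) is always 1 and does not
 * need checking. The width of a char is what can differ between platforms. */
static_assert(CHAR_BIT == 8, "this code assumes 8-bit chars");

/* Width in bits of an object whose sizeof result is `size_in_chars`. */
size_t object_bits(size_t size_in_chars)
{
    return size_in_chars * CHAR_BIT;
}
```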

15:18 <heat> see, everything is stupid

15:19 <mjg> using standard c only is like sticking to posix interfaces on unix platforms

15:21 <heat> see, I would like to use io_uring, but it's not standard, so aio_read it is

15:24 isaacwoo1 has quit [Quit: WeeChat 3.6]

15:32 <mrvn> heat: that's horrible, that's a rather limited number of threads emulating asynchronous IO.

15:33 <mrvn> if you only intend to run on linux then using linux-aio or uring is perfectly fine.

15:34 <heat> fyi linux-aio is also shit

15:34 <heat> it's rarely asynchronous

15:34 <mrvn> because FSes don't implement it. Basically only works on block devices.

15:34 <heat> https://blog.cloudflare.com/io_submit-the-epoll-alternative-youve-never-heard-about/

15:34 <mrvn> is uring any better?

15:34 <heat> yes, uring is way better

15:35 epony has joined #osdev

15:35 <mrvn> how? And why isn't linux aio using the same callbacks?

15:36 opal has quit [Remote host closed the connection]

15:36 <heat> because linux aio and uring are very different in design

15:37 opal has joined #osdev

15:37 <mrvn> but the aio interface isn't the problem, it's the call into the FS that blocks.

15:37 <mrvn> is uring just better at hiding that or did they fix the FSes?

15:39 <heat> https://kernel.dk/io_uring.pdf

15:39 <heat> this looks like it answers it

15:40 <mrvn> thanks

15:49 [itchyjunk] has joined #osdev

15:52 epony has quit [Remote host closed the connection]

15:52 epony has joined #osdev

16:03 pretty_dumm_guy has joined #osdev

16:16 gareppa has joined #osdev

16:18 gareppa has quit [Remote host closed the connection]

16:19 Mutabah has quit [Ping timeout: 268 seconds]

16:25 Mutabah has joined #osdev

16:41 carbonfiber has joined #osdev

16:47 terminalpusher has joined #osdev

16:49 gareppa has joined #osdev

16:49 Ali_A has quit [Quit: Connection closed]

16:58 gareppa has quit [Quit: Leaving]

16:59 Mutabah has quit [Ping timeout: 240 seconds]

17:01 Mutabah has joined #osdev

17:12 bauen1 has quit [Ping timeout: 260 seconds]

17:34 xenos1984 has quit [Ping timeout: 260 seconds]

17:44 xenos1984 has joined #osdev

17:47 gildasio has quit [Ping timeout: 258 seconds]

17:48 opal has quit [Ping timeout: 258 seconds]

17:53 lkurusa has quit [Quit: I probably fell asleep (or went out). Who will ever know.]

17:57 gildasio has joined #osdev

17:58 opal has joined #osdev

18:00 FreeFull has joined #osdev

18:10 nur has quit [Ping timeout: 268 seconds]

18:24 Matt|home has joined #osdev

18:39 bauen1 has joined #osdev

18:43 gog has quit [Ping timeout: 252 seconds]

18:44 matt__ is now known as freakazoid333

18:50 ghee has joined #osdev

18:54 ghee has quit [Client Quit]

18:56 gog has joined #osdev

19:02 terminalpusher has quit [Remote host closed the connection]

19:02 terminalpusher has joined #osdev

19:04 rpnx has joined #osdev

19:14 dude12312414 has joined #osdev

19:15 nur has joined #osdev

19:17 dude12312414 has quit [Client Quit]

19:18 nur has quit [Quit: Leaving]

19:18 nur has joined #osdev

19:27 freakazoid333 has quit [Ping timeout: 255 seconds]

19:34 lkurusa has joined #osdev

19:35 lkurusa has quit [Client Quit]

19:45 <mjg> i just came up with a funny idea

19:45 <mjg> consider a preemptible kernel and something which needs to disable preemption in the fast path

19:46 <mjg> how do you avoid the branch on re-enabling it

19:46 <mjg> ez

19:47 <mjg> in the preemption ipi handler you check rip and if that's within the preempted func, you overwrite the return address with something which preempts (and stash the real return address somewhere in pcpu)

19:48 <mjg> i think that's pretty decent, will implement probably next week
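The idea can be modelled in plain C as a decision function the preemption IPI handler would call. This is only a sketch under several assumptions: the handler already knows the interrupted instruction pointer and where the fast-path function's return address is stored, and every type and function name here (`code_range`, `percpu`, `defer_preemption`) is hypothetical. It does not cover locating the return slot on a real stack or the trampoline that later performs the preemption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct code_range { uintptr_t start, end; };   /* [start, end) of the fast path */

struct percpu {
    uintptr_t stashed_ret;   /* the real return address, saved for the trampoline */
    bool deferred;           /* a preemption is waiting for the fast path to return */
};

/* Called from the preemption IPI handler. Returns false if the interrupted
 * code may be preempted right away. Returns true if the interrupt landed in
 * the fast path; in that case the return address in *ret_slot is swapped for
 * `trampoline`, which is expected to preempt and then jump to stashed_ret. */
bool defer_preemption(uintptr_t rip, uintptr_t *ret_slot,
                      const struct code_range *fast_path,
                      struct percpu *pcpu, uintptr_t trampoline)
{
    assert(ret_slot != NULL && fast_path != NULL && pcpu != NULL);

    if (rip < fast_path->start || rip >= fast_path->end)
        return false;

    if (!pcpu->deferred) {               /* a second IPI must not stash the trampoline */
        pcpu->stashed_ret = *ret_slot;
        *ret_slot = trampoline;
        pcpu->deferred = true;
    }
    return true;
}
```

The fast path itself then contains no "was I asked to preempt?" branch: it simply returns, and lands in the trampoline only when an IPI arrived while it was running.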

19:48 <geist> problem there would likely be some sort of issue with the stack and whatnot. or at least the thing you replace the PC with would have to assume the stack is in a bad place, save all the regs, etc

19:49 <geist> question is, is it worth all the extra effort to avoid a branch, when modern machines branch predict the shit out of stuff like that

19:49 Ali_A has joined #osdev

19:49 <mjg> geist: for something like malloc or unlock i do think it's 100% worth it

19:49 <geist> okay

19:50 <mjg> it can make fast path malloc just have one branch: do you have enough in the per-cpu cache

19:51 frkzoid has joined #osdev

19:51 mrlemke has joined #osdev

19:53 pretty_dumm_guy has quit [Quit: WeeChat 3.5]

19:58 GeDaMo has quit [Quit: Physics -> Chemistry -> Biology -> Intelligence -> ???]

20:09 opal has quit [Remote host closed the connection]

20:09 gareppa has joined #osdev

20:09 opal has joined #osdev

20:12 gareppa has quit [Remote host closed the connection]

20:13 Ali_A has quit [Quit: Connection closed]

20:13 <moon-child> mjg: yep, I came up with that trick a while ago

20:13 <mjg> moon-child: fu man

20:13 <mjg> moon-child: )

20:13 <moon-child> been meaning to do a proper writeup to have something to refer to

20:13 <moon-child> lol

20:14 <mjg> on a serious note, one thing to watch out is tail call optimization

20:14 <mjg> basically need to only patch up whitelisted sites

20:14 <zid> -ftail-calls-are-banned-y

20:14 <zid> o

20:15 <moon-child> mjg: another application I came up with is preemptive ipc without ipi

20:16 <moon-child> not _quite_ sure if it works, but I think it does

20:18 <mrvn> mjg: if you disable IRQs then how would the ipi handler get called?

20:19 <moon-child> that's the point; you don't disable irqs

20:20 <moon-child> the isr itself is then responsible for ensuring that nothing untoward happens in the critical section

20:20 <mrvn> why? then anything can preempt and you need a lot of branches to work around it

20:20 demindiro has joined #osdev

20:20 <moon-child> why do you need a lot of branches?

20:21 <mrvn> is sti/cli so expensive that you'd rather branch and meddle with return addresses?

20:21 <mrvn> moon-child: to check if you are in any of the critical sections

20:22 <mrvn> and you need that in every isr

20:22 <moon-child> it's overhead that you only pay in the isr, when your performance is killed anyway

20:23 <moon-child> and it's not too bad

20:23 <mrvn> O(log(critical sections))

20:23 <demindiro> If I'm following: you're trying to avoid a branch by overwriting the return address?

20:24 <mrvn> you also lose all the tail calls as you mentioned.

20:24 <moon-child> mrvn: not exactly

20:24 <moon-child> it's O(log(# all critical sections that you might want to enter from a single isr call))

20:25 <moon-child> if in one isr I want to access shared state X, and in another I want to access shared state Y, in each of those isrs I only have to check for one critical section

20:25 <moon-child> even though there's a total of 2 critical sections in the kernel

20:26 <moon-child> common case is just malloc (was why I came up with it too); then you're only checking one thing
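The O(log n) check discussed above can be sketched as a binary search over a sorted table of code ranges. This is only an illustration under assumptions: `struct crit_range` and `crit_lookup` are hypothetical names, and the table is assumed to be sorted by start address with no overlapping ranges. In the common case mentioned above (just malloc) the table has one entry and the search is a single range compare.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: one [start, end) code range per critical section. */
struct crit_range {
    uintptr_t start; /* address of the first instruction in the section */
    uintptr_t end;   /* one past the last instruction in the section */
};

/*
 * Return the index of the range containing pc, or -1 if pc is in none.
 * The table must be sorted by start and the ranges must not overlap.
 */
int crit_lookup(const struct crit_range *tab, size_t n, uintptr_t pc)
{
    assert(tab != NULL || n == 0);

    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pc < tab[mid].start)
            hi = mid;          /* pc lies below this range */
        else if (pc >= tab[mid].end)
            lo = mid + 1;      /* pc lies above this range */
        else
            return (int)mid;   /* start <= pc < end */
    }
    return -1;
}
```

An isr that only touches shared state X would pass a table holding only X's ranges, so the cost depends on what that one isr cares about, not on the total number of critical sections in the kernel.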

20:26 <mrvn> moon-child: hmm. so X fires and then Y fires. In Y you see that you are in the middle of X so you fiddle with the return address of X so that it calls do_Y instead and delay the isr?

20:26 demindiro has quit [Quit: Client closed]

20:27 <moon-child> no; the point is that X and Y can run concurrently

20:27 <moon-child> because they're touching different pieces of shared state

20:27 <moon-child> think of it like a lock. You implicitly lock X by being in the region of code that modifies X. But locking X doesn't mean you have to lock Y too

20:27 <mrvn> then what is the case where you fiddle with the return address?

20:28 <moon-child> when you start modifying X in the main kernel, and then you get interrupted, and the isr wants to modify X too

20:30 <mrvn> If I understand you right you are implementing a very hackish wait-queue for a lock.

20:30 <moon-child> yep!

20:31 <mrvn> That will totally blow up with SMP by the way

20:31 <moon-child> yes, that's why it's only for arbitrating core-specific stuff (such as, for example, malloc)

20:31 <mrvn> Doesn't x86 also have a return address predictor that will mispredict when you fiddle with the return address?

20:31 <moon-child> mjg: he doesn't like the stack trick :\

20:32 <moon-child> mrvn: sure; the point is to avoid branches in the hot path

20:32 <mjg> i'm busy for next 15-20 mins

20:32 <mjg> will flame later

20:32 <mrvn> moon-child: for the stack trick you need a function call. I think sti/cli will be cheaper than a function call.

20:33 <moon-child> malloc is usually a function call anyway

20:33 <mrvn> only if the whole function is one big critical section

20:33 <moon-child> malloc is usually one big critical section

20:34 <moon-child> but, well, ok, say the critical section is smaller than a whole function

20:34 <moon-child> that just means that your 'scheduled' code runs a bit later than it otherwise would

20:34 <moon-child> which is fine

20:34 <mrvn> well, not here. it has process local data (the address space) and core local data (free pages for the core) and global data (reserve of free pages) that all need different locking.

20:35 <moon-child> talking about kernel malloc here (presumably, if talking about isrs rather than signals), so you'd only hit the last two, and the third case is the slow one anyway

20:35 <moon-child> again, this is about speeding up the hot path

20:35 <mrvn> yeah, the third needs a full lock for SMP.

20:36 <mrvn> The address space might need a full lock if you have threads for the process running on other cores

20:36 <mrvn> kmalloc won't have that

20:44 <mjg> so

20:44 <zid> faputa-hime so cute :(

20:44 <mjg> cli and sti are still quite slow, but i don't remember exact latency

20:45 <mjg> the current code spends a branch to figure out whether it needs to go off cpu

20:45 <mrvn> are they serializing?

20:45 biblio has joined #osdev

20:45 <mjg> fair question, i don't remember

20:46 <mjg> looks like no

20:46 <mjg> anyhow

20:47 <mjg> with the aforementioned hack the fast path does not have to check if preemption is needed

20:47 <mjg> only the comparatively rare case of getting preempted there needs to do more work

20:47 <mjg> which imo is a great tradeoff to make

20:48 <mjg> i also have a wip hack to make sleepable mutexes be unlockable without atomics

20:49 <mjg> which requires preemption to be disabled around it

20:49 <mjg> not having to check for it in the routine would be A+

20:50 <mrvn> what if the isr fires twice?

20:50 <mrvn> you can only do the stack fiddling once

20:51 <mjg> twice when

20:51 <mjg> while within the preemption disabled area?

20:51 <mrvn> yes, malloc can take some time

20:52 <mjg> that's inconsequential, consider the current code

20:52 <mjg> the preemption handler just keeps finding that preemption is disabled

20:52 <mjg> and denotes it needs to be handled once enabled

20:52 <mjg> i literally only need to do the stack fiddling once

20:53 <mrvn> but it records the original return address in some variable and replaces it on the stack. so the 2nd call overwrites the saved return address.

20:53 <mjg> and i can trivially check that i already did it -- return address is the special routine

20:53 <mjg> which would not happen without preemption
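The handler-side logic described in this exchange can be modelled in plain C. This is a user-space sketch, not anyone's actual kernel code: `struct pcpu`, `defer_preempt` and every parameter name are hypothetical, and it assumes the handler already knows where the protected function's return address lives on the stack (which is the hard part, and the reason for the tail-call and whitelisting caveats raised earlier).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU state; only the stash slot matters for this model. */
struct pcpu {
    uintptr_t saved_ret; /* real return address, 0 while nothing is stashed */
};

/*
 * rip        interrupted instruction pointer
 * ret_slot   location of the protected function's return address
 * start/end  [start, end) code range of the protected function
 * trampoline address of the routine that preempts, then jumps to saved_ret
 *
 * Returns true if this call patched the return address.
 */
bool defer_preempt(struct pcpu *cpu, uintptr_t rip, uintptr_t *ret_slot,
                   uintptr_t start, uintptr_t end, uintptr_t trampoline)
{
    assert(start < end);

    if (rip < start || rip >= end)
        return false;            /* not inside the protected function */
    if (*ret_slot == trampoline)
        return false;            /* an earlier IPI already patched it */

    cpu->saved_ret = *ret_slot;  /* stash the real return address */
    *ret_slot = trampoline;      /* the function now returns into the trampoline */
    return true;
}
```

The second check is what makes a repeated IPI harmless in this model: once the slot holds the trampoline address, later calls leave both the slot and the stash alone.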

20:53 <mrvn> but then you need a counter for how often you tried to replace the address

20:53 <mjg> i don't

20:54 <mjg> you do realize that at the end of the day i only get preempted once, with or without the hack

20:54 <mrvn> mjg: the example from moon-child wasn't about preemption but about accessing shared state.

20:54 <mjg> i did not see his example

20:55 <mjg> i'm saying the above should work for what i intend to use it for

20:55 <mjg> which is protecting per-cpu state

20:55 <mrvn> not without dropping interrupts

20:56 <mjg> if interrupt handlers are to legally mess with that state, sure

20:56 <mjg> but that's not part of the state i'm concerned with here

20:56 <mjg> for example in freebsd the internal allocator normally must NOT be called from irq handlers

20:56 <mjg> so you are guaranteed to be safe with mere preemption disablement

20:57 <mjg> my hack does not loosen anything here and removes the branch from the common case

20:57 <moon-child> what's a sleepable mutex? The kind where you say 'wake me up if it's unlocked, or if x time passes without anything happening'?

20:57 <mjg> moon-child: that's the "owner went off cpu, so i'm going off cpu too"

20:57 <mjg> moon-child: "and i'm denoting that in the lock"

20:58 <mrvn> mjg: so you only use a wait queue if the mutex is held by a non-running process?

20:58 <moon-child> so you follow the owner to its new cpu?

20:58 <mjg> mrvn: thread, yes

20:59 <mjg> moon-child: so you go off cpu and propagate your priority to the sucker if applicable

20:59 <mjg> the solaris way :>

20:59 <mrvn> having a non-preemptible microkernel makes all this so much simpler.

20:59 <mjg> mrvn: i agree, but i did not make it this way ;)

20:59 <mjg> and i can't just make it not preemptible by default

20:59 <mrvn> also not having threads

21:00 <moon-child> well

21:01 <mrvn> .oO(This process wants to malloc some memory, let's fiddle with its page table. I know no other core will access that)

21:01 <mjg> who is doing that

21:01 <mrvn> me

21:01 <mjg> lul

21:01 <mjg> that's some... interesting ideas you got there mate

21:02 <mrvn> My kernel is based on message passing. There is no shared state between processes.

21:04 <moon-child> so you have to copy everything all the time?

21:04 <moon-child> .oO( the rust methodology, applied to os design )

21:06 <mjg> speaking of

21:07 <mjg> is there a perf comparison of rust stuff vs whatever else?

21:07 <mjg> i don't mean "we implemented 2 totally different algos" which i managed to find some time ago

21:07 <mjg> :[

21:07 <moon-child> I wouldn't expect any difference in microbenchmarks (vs c)

21:08 <klange> absolute classic: beating C by writing a horrible implementation in C and a good implementation in the other language!

21:08 <moon-child> I would expect there to be some impact on overall program design, but that is much harder to measure sensibly

21:08 <mrvn> moon-child: I pass pages around.

21:09 <mjg> klange: you bet

21:09 <moon-child> can I write an audio server?

21:09 <moon-child> can't really 'pass pages' there :P

21:09 <mrvn> Does rust copy stuff around all the time? Isn't it about passing by reference and still knowing who owns the data at all time?

21:10 <mrvn> moon-child: why not? Your app fills a buffer, say 64k, with music and then passes that off to the sound card driver.

21:10 <moon-child> for music, sure

21:10 <moon-child> for low-latency stuff, less so

21:10 <mrvn> 4k sound data isn't very long

21:10 <moon-child> you wanna ring buffer which is continuously filled by the client and emptied by the server
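A ring buffer of the kind described here, with one client filling it and one server draining it, might look roughly like the following. It is a minimal single-producer/single-consumer sketch using C11 atomics; `struct ring`, `ring_push`, `ring_pop` and the tiny `RING_CAP` are all made up for illustration, and it assumes the structure sits in memory both sides can see and that atomic `size_t` is lock-free on the target.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_CAP 8u /* illustrative size; must be a power of two */
static_assert((RING_CAP & (RING_CAP - 1)) == 0, "RING_CAP must be a power of two");

struct ring {
    _Atomic size_t head;    /* total samples written; only the producer stores it */
    _Atomic size_t tail;    /* total samples read; only the consumer stores it */
    int16_t data[RING_CAP];
};

/* Producer side: returns false when the ring is full. */
bool ring_push(struct ring *r, int16_t sample)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_CAP)
        return false;
    r->data[head & (RING_CAP - 1)] = sample;
    /* Release: the sample must be visible before the new head is. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when the ring is empty. */
bool ring_pop(struct ring *r, int16_t *out)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->data[tail & (RING_CAP - 1)];
    /* Release: the slot is handed back only after it has been read. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

Neither side ever blocks or takes a lock, which is the property that matters for the low-latency case.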

21:11 <klange> 4k sound data is 23ms.

21:12 <moon-child> re rust copying: yes, rust may require you to copy where another language would allow you to share

21:12 <klange> Assuming stereo, 16-bit, 44.1kHz :) only 21 at 48!

21:12 <mrvn> can't do that at the moment. Might be worth it as an optimization for streams but it's outside the design currently.

21:12 <zid> I only use 192kHz audio from a cassette

21:12 <mjg> moon-child: i'm mostly worried about branch predictor

21:13 <klange> Only 5ms at 192kHz - that's less than a frame even at 140Hz!
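The buffer-duration figures quoted above follow from simple arithmetic, shown here as a small helper. `pcm_duration_ms` is a hypothetical name; "4k" is taken to mean 4096 bytes of interleaved 16-bit stereo PCM, as stated in the conversation.

```c
#include <assert.h>
#include <stddef.h>

/* Duration in milliseconds of `bytes` of interleaved PCM audio. */
double pcm_duration_ms(size_t bytes, unsigned channels,
                       unsigned bytes_per_sample, unsigned rate_hz)
{
    assert(channels > 0 && bytes_per_sample > 0 && rate_hz > 0);

    /* One frame here means one sample for every channel. */
    size_t frames = bytes / ((size_t)channels * bytes_per_sample);
    return 1000.0 * (double)frames / (double)rate_hz;
}
```

With 4096 bytes, 2 channels and 2 bytes per sample there are 1024 frames, giving about 23.2 ms at 44.1 kHz, 21.3 ms at 48 kHz and 5.3 ms at 192 kHz. A 140 Hz display frame lasts about 7.1 ms, so the last figure is indeed shorter than one video frame.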

21:14 <zid> don't forget 5.1

21:14 <moon-child> depends on your framerate :P

21:14 <klange> I literally said 140Hz.

21:14 <zid> moon pay attention

21:14 <zid> jesus

21:14 <moon-child> oh I thought you were conflating with video frame

21:14 <zid> moon-child more like moon-mmon amirite

21:14 <moon-child> ok cool

21:15 <moon-child> oh wait you were

21:15 <moon-child> nvm

21:15 scoobydoo_ has joined #osdev

21:15 scoobydoo has quit [Ping timeout: 252 seconds]

21:15 scoobydoo_ is now known as scoobydoo

21:15 <moon-child> is confusing because in audio context I interpret 'frame' to mean sample x channels

21:16 <klange> I just call that a stereo sample, but I come from a different audio background.

21:16 <zid> a frame is one round of a game of snooker

21:16 <zid> why are you talking about audio

21:16 <mrvn> But hey, my scheduler can handle really big process counts. I've tested it with 786432 processes, one per pixel for a 1024x768 framebuffer.

21:16 <moon-child> mjg: why?

21:16 <klange> A frame is what I never bother to put my photos in.

21:17 <zid> I wonder where this jargon comes from

21:17 <zid> a frame meaning a logical unit of something

21:17 <mrvn> A frame is that thing around the picture holding it up.

21:17 <moon-child> a frame is a context for something

21:17 <moon-child> a wrapper

21:17 <moon-child> in this case, the context is temporal--the frame is a snapshot of a bunch of states, which are connected in time

21:17 * mrvn runs a frame job on moon-child

21:18 <moon-child> kinky

21:19 <mrvn> moon-child: no kidding: https://m.media-amazon.com/images/I/81GU3ZI-aLL._SL1379_.jpg

21:19 <klange> A frame is the core structure of something (eg. the wooden frame of a house) or a wrapper (eg. a framed painting)

21:19 <moon-child> mjg: I mean, with the current setup, you'd miss that branch anyway, right?

21:20 <mjg> moon-child: i mean in rust progs vs c

21:20 <moon-child> oh, why?

21:20 <mjg> not worried in the preemption hack case

21:20 <mjg> i presume it adds some branchin'

21:20 <moon-child> like what? I mean, presumably boundschecks, but those should predict correctly

21:21 <mjg> more branches in general == more conflict potential

21:21 <mrvn> I assume the check_for_critical_section is written so the non-matching case is the predicted path and the hit branches to some cold code and does the stack fiddling.

21:21 <moon-child> mjg: sure

21:22 <moon-child> (this is why I want branch hints)

21:22 <mrvn> does x86 have any branch hints?

21:22 <moon-child> used to, then they got rid of them

21:23 <moon-child> mjg: oh, agner says that on some cpus, correctly-predicted never-taken branches won't take up a btb entry

21:23 <moon-child> (but that doesn't help if a legitimate btb entry conflicts with a lolbranch)

21:28 <epony> long jump

21:29 epony has quit [Remote host closed the connection]

21:41 terminalpusher has quit [Remote host closed the connection]

21:41 ckie has quit [Quit: *poof*]

21:42 ckie has joined #osdev

21:44 rpnx has quit [Ping timeout: 252 seconds]

21:45 rpnx has joined #osdev

21:54 epony has joined #osdev

21:58 qubasa has joined #osdev

22:05 smach has quit [Remote host closed the connection]

22:08 smach has joined #osdev

22:09 gildasio has quit [Ping timeout: 258 seconds]

22:09 MiningMarsh has joined #osdev

22:09 smach has quit [Remote host closed the connection]

22:10 smach has joined #osdev

22:10 sav_ has joined #osdev

22:11 sav_ has quit [Client Quit]

22:11 gildasio has joined #osdev

22:12 frkzoid has quit [Ping timeout: 255 seconds]

22:15 <mjg> hey armz, do you happen to know how to silence this warn:

22:15 <mjg> > warning: use of PC in the list is deprecated

22:15 <mjg> stmia r3, {r4-r12, sp, lr, pc}

22:15 <mjg> not an arm person myself, from what i hear the use of pc there is unavoidable

22:16 <zid> march= it somehow?

22:16 <mjg> i already march it, no dice

22:17 <mjg> now that i asked

22:17 <mjg> if (MI.getOperand(OI).getReg() == ARM::PC) {

22:17 <mjg> Info = "use of PC in the list is deprecated";

22:17 <mjg> thanks clang

22:18 smach has quit [Remote host closed the connection]

22:18 smach has joined #osdev

22:21 <zid> lol

22:21 rpnx has quit [Read error: Connection reset by peer]

22:21 rpnx has joined #osdev

22:27 <moon-child> #pragma gcc diagnostic ...?

22:31 Emil_ is now known as Emil

22:39 <mrvn> mjg: why would you store the pc?

22:40 <mrvn> and is that the address of the stmia or the opcode after?

22:53 <moon-child> presumably because you wanna know what the pc was

22:53 <moon-child> (eg could be for logging, to get a source location)

23:08 <heat> and how tf do you deprecate an instruction

23:11 carbonfiber has quit [Quit: Connection closed for inactivity]

23:19 <geist> mjg: which arm core are you compiling for?

23:19 <geist> and/or which arm version? (ie, armv4, armv5, etc)

23:19 <geist> but that being said, why are you storing PC there?

23:19 <geist> that's usually pretty strange thing to do

23:21 <heat> i think the real question is "why are they deprecating it"

23:21 <heat> like, its a valid instruction, what's the issue?

23:22 <heat> (and who's deprecating it? llvm? certainly not arm?)

23:23 <geist> no, i think it's deprecated because it's pseudo undefined, in some cores

23:23 <geist> but again, it's kinda an XY problem. there's very very little reason to do exactly that, so i kinda wonder why you are?

23:24 <geist> store multiple of PC is generally not what you do. it's very common to `stmia ... { ..., lr }` and then later on `ldmia ... { ..., pc }`

23:25 <geist> but the idiom of storing PC is pretty odd, since it's going to best case write the PC of the instruction 2 ahead, so the only reason you'd really want to do that is if you're going to branch in the next instruction, and then are writing something to return to the instruction after the branch

23:25 <geist> which is feasible, but would be pretty specialized case

23:27 <heat> why is storing pc undefined?

23:28 <geist> my guess is since it's a very specialized case, and stmia is a multi cycle instruction, and by definition the last bit written is the last register, it's going to play havoc with the pipeline

23:28 <geist> so probably thinking there's some cpu somewhere where that's undefined

23:28 <geist> ie 'we dont wanna deal with that specialized case'

23:31 netbsduser has quit [Remote host closed the connection]

23:33 <geist> ah no i think it's for ARM/thumb reasons

23:33 <geist> arm does straight up say PC is deprecated in the list, so clang is not full of shit

23:34 <geist> (looking in the armv8 arm arm)

23:34 <heat> deprecating a register is definitely new

23:34 <heat> will the designs just stop decoding that instruction?

23:35 <geist> here's why i think it's the case: ARM32 ISA has 16 bits in the stm/ldm instructions, so you can do it (though there are limitations but not this)

23:35 <geist> thumb1 only has 8 bits, so it can only encode r0-r7

23:35 <geist> thumb2 only has 14 bits, and one extra one for LR, but no bit for PC

23:36 sonny has joined #osdev

23:36 <geist> so, i think what's going on here is ARM expects the 'unified' assembly syntax (some old thing they rolled out years ago) to work against pretty much all modern arm32 ISAs (thumb2 and arm32)

23:37 <geist> so they basically deprecate using constructs that can't be emitted in both thumb2 and arm32

23:37 <geist> and thus clang is trying to steer you away from it
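To make the encoding argument concrete: in the ARM32 (A32) form of STM/LDM the low 16 bits of the instruction are a register list in which bit n stands for Rn, so PC is bit 15. The helpers below are a sketch with made-up names (`reglist_range`, `reglist_has_pc`) that build such a mask and test for PC. They say nothing about how clang implements its warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { REG_SP = 13, REG_LR = 14, REG_PC = 15 };

/* Set bits first..last (inclusive) of a 16-bit register list; bit n = Rn. */
uint16_t reglist_range(unsigned first, unsigned last)
{
    assert(first <= last && last <= 15);

    uint16_t mask = 0;
    for (unsigned r = first; r <= last; r++)
        mask |= (uint16_t)(1u << r);
    return mask;
}

/* True if the list names PC (R15), the case the assembler warns about. */
bool reglist_has_pc(uint16_t mask)
{
    return (mask & (1u << REG_PC)) != 0;
}
```

For the list from the warning, `{r4-r12, sp, lr, pc}`, the mask is `reglist_range(4, 12) | reglist_range(REG_SP, REG_PC)`, which is 0xFFF0 with bit 15 set. Dropping PC gives 0x7FF0.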

23:38 <geist> if you want to follow along i'm looking at DDI0487G_a_armv8_arm section F5.1.224 (the section describing STM instruction) page F5-2020

23:39 <geist> https://usercontent.irccloud-cdn.com/file/8o52h4Dk/Screen%20Shot%202022-08-31%20at%2016.38.34.png

23:39 <geist> so it's nothing to do with microarchitecture and mostly to do with ISA compatibility.

23:40 <geist> if you're writing hard asm in a .S file and you're fixing it to .arm i think you're okay, if you're doing it in inline asm or whatnot clang is warning you not to do it because it would break depending on what instruction set the compiler is targeting (and it doesn't know what you want to do there)

23:43 <geist> it's interesting a few pages before how much wiggle room ARM gives themselves for invalid encodings of these instructions

23:43 <geist> it has all sorts of things like 'if you include no registers in the stm/ldm list here's a list of things that can happen'

23:44 <geist> and undefined, nop, and more interestingly

23:44 <geist> "The instruction operates as an STM with the same addressing mode but targeting an unspecified set of registers. These registers might include R15. If the instruction specifies writeback, the modification to the base address on writeback might differ from the number of registers stored."

23:44 <mrvn> heat: as geist says the PC is incremented in parallel with the instruction and pipelining so you don't know what value you get when storing the PC. The encoding for the opcode has a bitfield of registers, same for stm and ldm. but only for ldm it's always defined what happens if you include PC.

23:45 <mrvn> geist: so you might buffer overflow your storage, nice.

23:46 <geist> my guess is they're giving themselves wiggle room for 'some schleppy thumb -> arm64 decoder and/or emulator can skip some logic for invalid encodings and just yolo it'

23:46 <geist> shows how much they care about 32bit arm nowadays

23:47 <mrvn> they should have left it undefined then they could reuse the opcode for something else if they need one.

23:48 <geist> well, undefined implies there's logic to detect it

23:48 <geist> vs this which is basically unpredictable

23:48 <mrvn> nah, not illegal opcode, just undefined

23:49 <geist> ah but 'undefined' means something extremely specific in arm world. that means triggers an undefined opcode

23:49 <mrvn> unspecified

23:49 <geist> what they're doing here is saying 'it'll yolo you in any number of ways' which is effectively what you're saying

23:49 qubasa has quit [Ping timeout: 252 seconds]

23:49 <mrvn> yeah, bad word choice there with "undefined opcode"

23:49 qubasa has joined #osdev

23:50 <mrvn> "illegal opcode" has a much harsher ring. .oO(I know what this is and I'm not doing THAT)

23:50 sonny has quit [Ping timeout: 252 seconds]