#osdev on 2023-02-05 — irc logs at libera.irclog.whitequark.org

2021-05-23 01:57 klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books

00:05 <geist> Clockface: note that when you say 'any cpu made in the last 7 years' x86 and everything not x86 are fairly different stories when talking about mode switches

00:05 <geist> x86s are a lot faster than they used to be, but mode switching overhead is somewhat higher in general than on, say, modern arm and riscv cores

00:06 <geist> per clock at least

00:06 <geist> there's just a lot more work to do on x86, and though they're fast for what they have to do, the microcode still has to do a lot

00:13 <mrvn> geist: Do you know how many clock cycles an FIQ takes on ARM or IRQ on ARM64?

00:19 mctpyt has joined #osdev

00:21 <geist> well depends highly on the core and i think there's an implicit memory barrier so it may end up stalling a bit

00:21 <geist> but in general it's like 10s of cycles

00:22 <geist> i have measured in the low ns on a reasonably modern core running at a few GHz

00:23 <geist> on OOO cores there is an implicit pipeline stall though, but taking an IRQ out of WFI might be faster

00:24 <mrvn> no need to finish the pipeline from WFI state

00:25 <geist> right

00:25 <geist> it *could* start actually implementing the front part of the interrupt (saving the state, etc) at the time the WFI starts

00:25 <mrvn> right, that would be smart

00:25 <geist> i dont think it can on modern armv8 because there's a lot of other things that can happen in a WFI but the idea is something like that

00:26 <geist> i know of some 8 bit micros that did the same thing. apparently even the original 8086 does an interrupt start at HLT

00:26 <geist> it just pauses until the actual event happens

00:26 <geist> since all the state written out to the stack is the same if it is written now or in the future

00:26 <mrvn> it probably runs that so the pipeline gets flushed and such.

00:28 <mrvn> the stack should be observable by other cores.

00:29 * geist nods

00:29 <geist> and possibly the mapping could be changed out from underneath it by another core modifying the page tables

00:29 <geist> so possibly that optimization is no longer feasible

00:30 <geist> also could trap into SMM, etc. but original 8086 didn't have this problem

00:30 <mrvn> Don't think that would work as the stack should be in the TLB.

00:31 <mrvn> You would have to manipulate it so the stack is right at a page border so the next write does a new page walk etc

00:31 <mrvn> or INVLPG the stack before HLT

00:34 * geist nods

00:34 <geist> or at least it's within the architectural spec because you didn't invlpg it

00:34 <geist> ie, it conforms to the model of what you should observe

00:34 <mrvn> it could also just prepare the write in the write back buffer or something.

00:39 mctpyt has quit [Ping timeout: 252 seconds]

00:52 wand has quit [Remote host closed the connection]

00:59 [itchyjunk] has quit [Ping timeout: 255 seconds]

01:03 [itchyjunk] has joined #osdev

01:08 wand has joined #osdev

01:17 vin has joined #osdev

01:17 <vin> /join #fosdem2023:fosdem.org

01:17 <vin> Oops wrong window!

01:23 slidercrank has quit [Ping timeout: 248 seconds]

01:28 gildasio1 has quit [Remote host closed the connection]

01:29 gildasio1 has joined #osdev

01:43 spikeheron has quit [Quit: WeeChat 3.8]

01:44 immibis_ has quit [Ping timeout: 246 seconds]

01:45 immibis_ has joined #osdev

01:50 vin has quit [Ping timeout: 252 seconds]

02:08 dutch has joined #osdev

02:10 joe9 has joined #osdev

02:19 joe9 has quit [Quit: leaving]

02:20 joe9 has joined #osdev

02:25 wootehfoot has quit [Ping timeout: 252 seconds]

02:27 joe9 has quit [Quit: leaving]

02:28 gog has quit [Ping timeout: 248 seconds]

02:29 joe9 has joined #osdev

02:48 heat has quit [Ping timeout: 260 seconds]

03:15 dude12312414 has quit [Remote host closed the connection]

03:16 dude12312414 has joined #osdev

03:42 terrorjack has quit [Quit: The Lounge - https://thelounge.chat]

03:43 fedorafan has quit [Quit: Textual IRC Client: www.textualapp.com]

03:44 terrorjack has joined #osdev

03:56 smach has joined #osdev

03:57 kiri1606 has joined #osdev

04:02 dude12312414 has quit [Remote host closed the connection]

04:03 dude12312414 has joined #osdev

04:16 dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]

04:26 epony has quit [Ping timeout: 268 seconds]

05:00 [itchyjunk] has quit [Remote host closed the connection]

05:00 Gooberpatrol66 has joined #osdev

05:09 matrice64 has joined #osdev

05:21 matrice64 has quit [Quit: Textual IRC Client: www.textualapp.com]

05:23 craigo has quit [Quit: Leaving]

05:39 Turn_Left has joined #osdev

05:43 Left_Turn has quit [Ping timeout: 265 seconds]

06:46 immibis_ has quit [Ping timeout: 248 seconds]

06:52 immibis_ has joined #osdev

06:52 Left_Turn has joined #osdev

06:54 Turn_Left has quit [Ping timeout: 252 seconds]

07:11 GeDaMo has joined #osdev

08:31 sprock has quit [Ping timeout: 252 seconds]

08:35 unimplemented has quit [Remote host closed the connection]

08:37 sprock has joined #osdev

08:46 unimplemented has joined #osdev

08:49 immibis_ has quit [Ping timeout: 248 seconds]

09:03 rb has quit [Ping timeout: 260 seconds]

09:04 rwb has joined #osdev

10:21 danilogondolfo has joined #osdev

10:21 gog has joined #osdev

10:34 fedorafan has joined #osdev

10:45 dza has quit [Quit: Ping timeout (120 seconds)]

10:56 Brnocrist has quit [Ping timeout: 248 seconds]

10:56 dza has joined #osdev

11:05 fedorafan has quit [Ping timeout: 265 seconds]

11:06 <ddevault> https://fosdem.org/2023/schedule/event/heliosuk/

11:06 <bslsk05> fosdem.org: FOSDEM 2023 - Introducing Helios Micokernel

11:06 <ddevault> 1 hour :)

11:08 fedorafan has joined #osdev

11:08 smach has quit [Ping timeout: 268 seconds]

11:11 Brnocrist has joined #osdev

11:19 slidercrank has joined #osdev

11:25 x8dcc has joined #osdev

11:25 <gog> i am stepping on my stack somehow

11:25 <gog> it's making things hard to debug

11:28 <moon-child> hmmm

11:28 <moon-child> have you tried not doing that?

11:32 Brnocrist has quit [Ping timeout: 260 seconds]

11:33 small has joined #osdev

11:41 unimplemented has quit [Read error: Connection reset by peer]

11:42 Brnocrist has joined #osdev

11:57 Brnocrist has quit [Ping timeout: 252 seconds]

11:57 Brnocrist has joined #osdev

12:00 unimplemented has joined #osdev

12:02 <gog> yes

12:02 <gog> i figured out how not to do that

12:02 <gog> now when i render a glyph it's to a bitmap that is currently on the stack but in the future i'm going to cache them

12:02 <gog> this might be a premature optimization

12:03 <gog> but it feels better than writing directly to the framebuffer

12:05 sortie has quit [Quit: Leaving]

12:07 Turn_Left has joined #osdev

12:08 sortie has joined #osdev

12:11 Left_Turn has quit [Ping timeout: 252 seconds]

12:18 <clever> gog: in some situations, it may help performance to memcpy an entire line of the glyph into the framebuffer at once, depending on how the caches and bus are configured

12:20 <gog> YES

12:20 <gog> yes

12:20 <gog> that's what i'm doing now

12:20 <gog> although i'm not sure how much it matters in an emulator
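
The row-at-a-time copy clever describes can be sketched roughly as below. This is only an illustration, assuming a 32 bpp linear framebuffer and a glyph that has already been rendered into a small bitmap; `struct fb`, its field names and `blit_glyph` are hypothetical and not taken from gog's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical framebuffer description; field names are illustrative. */
struct fb {
    uint8_t *base;   /* first byte of the visible framebuffer */
    size_t   pitch;  /* bytes per scanline, may exceed width * 4 */
    size_t   width;  /* visible pixels per scanline */
    size_t   height; /* visible scanlines */
};

/*
 * Copy a pre-rendered 32 bpp glyph bitmap into the framebuffer,
 * one memcpy per glyph row. Returns 0 on success, -1 if the glyph
 * would fall outside the visible area.
 */
int blit_glyph(const struct fb *fb, size_t x, size_t y,
               const uint32_t *glyph, size_t gw, size_t gh)
{
    if (x > fb->width || gw > fb->width - x ||
        y > fb->height || gh > fb->height - y)
        return -1;

    for (size_t row = 0; row < gh; row++) {
        /* Rows are pitch bytes apart, which is not always width * 4. */
        uint8_t *dst = fb->base + (y + row) * fb->pitch
                     + x * sizeof(uint32_t);
        memcpy(dst, glyph + row * gw, gw * sizeof(uint32_t));
    }
    return 0;
}
```

Whether one memcpy per row actually beats per-pixel stores depends on how the framebuffer is mapped (cached, write-combining, uncached), which is the caveat clever raises.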

12:21 <Ermine> gog: may I pet you

12:21 <gog> yes

12:22 * Ermine pets gog

12:22 * gog prrr

12:23 * Ermine awwws

12:37 leon has joined #osdev

12:37 tejr has joined #osdev

12:37 nikolar has joined #osdev

12:37 yyp has joined #osdev

12:37 MrPortmaster has joined #osdev

12:37 Brnocrist has quit [Ping timeout: 260 seconds]

12:38 <Ermine> We're GOAT

12:39 <MrPortmaster> FOSDEM gang

12:39 unimplemented has quit [Read error: Connection reset by peer]

12:40 fedorafan has quit [Quit: Textual IRC Client: www.textualapp.com]

12:40 tanto has joined #osdev

12:41 <nikolar> just joined from FOSDEM :)

12:41 <tanto> same :)

12:42 <Ermine> Welcome!

12:44 <MrPortmaster> Kinda sad he doesn’t like the discord tho. I personally feel like there is a lot of knowledge there too

12:44 MrPortmaster is now known as MrBonkers

12:45 <nikolar> well discord is a nightmare to sift through and find information

12:45 <nikolar> so it's not much better than irc

12:45 <nikolar> Ermine: thanks :)

12:45 <MrBonkers> It’s indeed not the best to store information yeah

12:46 <nikolar> that's why there's an osdev wiki i imagine

12:46 <tanto> matrix? :^)

12:46 <tanto> speaking of, I should start editing the wiki.

12:46 <GeDaMo> Also, this channel is logged

12:46 <tanto> hi mr fbi

12:46 <MrBonkers> the wiki do be kinda outdated at points tho

12:47 <tanto> exactly. is there a better way of documenting knowledge?

12:47 <MrBonkers> not really to my knowledge

12:48 fedorafan has joined #osdev

12:54 <Ermine> ddevault: are helios messages size unlimited?

12:55 <staceee> hey goats people, I would like to learn kernel development: would you recommend a particular resource or documentations to start with?

12:56 <MrBonkers> ddevault have you ever tried the osdev discord? Has some good people too there

13:03 <Mutabah> staceee: The osdev.org wiki has several decent guides. I also recommend phil-opp's OSdev tutorial

13:08 <ddevault> fuck discord

13:08 Brnocrist has joined #osdev

13:08 <ddevault> Ermine: yes, 64 registers and 64 caps max

13:08 <ddevault> but if one of those is a page cap...

13:10 <gog> oops https://i.imgur.com/0Fq6VJf.png

13:11 <Ermine> kaboom

13:12 Brnocrist has quit [Ping timeout: 248 seconds]

13:12 <Ermine> ddevault: ok, thank you

13:12 <nikolar> gog: spicy

13:13 Brnocrist has joined #osdev

13:13 mehdix has joined #osdev

13:14 <moon-child> you fucked up

13:19 Left_Turn has joined #osdev

13:20 <gog> i did

13:20 <gog> idk how or where or when or why or how

13:21 unimplemented has joined #osdev

13:21 <gog> oops

13:21 <gog> inverted condition

13:23 Turn_Left has quit [Ping timeout: 252 seconds]

13:24 <gog> i am not good at computer

13:25 <moon-child> oh yeah well

13:25 <moon-child> ur mum am not good at computer

13:27 <Ermine> gog: happens

13:30 Brnocrist has quit [Ping timeout: 252 seconds]

13:31 <FireFly> gog: weirdly aesthetic console :p

13:36 <gog> yes

13:39 <gog> moon-child: :<

13:46 craigo has joined #osdev

13:55 small has quit [Quit: Konversation terminated!]

13:56 Brnocrist has joined #osdev

14:03 Brnocrist has quit [Ping timeout: 248 seconds]

14:10 heat has joined #osdev

14:10 <heat> gewd afternon

14:13 <gog> heat heatheat heat basel

14:13 <Ermine> hewwo heat

14:13 <heat> good afternoon from bazel, switzerlandinga

14:14 <heat> gog operating system development now??

14:14 <gog> yes

14:14 <Ermine> ontopic happens

14:14 <gog> i'm developing raster skills

14:14 <gog> one love jah

14:15 <heat> hehehe

14:15 <heat> get it? cuz raster and rasta

14:15 <gog> yes

14:15 <heat> ngl your font drawing looks like ass

14:16 <heat> hopefully the pixel garbage isn't part of your raster skills

14:16 <Ermine> fonts are hard

14:16 <heat> they are

14:17 <moon-child> who the fuck is sophia

14:18 <heat> who the fuck is moon-child

14:18 <moon-child> not me

14:21 gildasio1 has quit [Remote host closed the connection]

14:21 gildasio1 has joined #osdev

14:23 <heat> https://i.imgur.com/4XX6ZA9.png best looking 16 yo englishman

14:24 Brnocrist has joined #osdev

14:28 <gog> the font isn't mine

14:28 <gog> i know it's terrible

14:28 <gog> but the pixel garbage is fixed

14:29 Brnocrist has quit [Ping timeout: 265 seconds]

14:29 [itchyjunk] has joined #osdev

14:29 <gog> https://i.imgur.com/PALhQkN.png

14:29 <gog> i hate the font though, it's really ugly

14:30 Brnocrist has joined #osdev

14:30 <heat> oh cool coo

14:30 <heat> l

14:30 <heat> so what's your strat again?

14:30 <heat> like cache renderings of codepoints?

14:31 <gog> eventually

14:31 <gog> rn i'm rendering each glyph for every call

14:31 <heat> i would like to do that but I realize it's impossible or maybe non-obvious for a terminal

14:31 <heat> since your bg/fg may change

14:32 <gog> true

14:32 <gog> i might render them without color and use them as masks in the case of hardware that supports mask sprites

14:32 <moon-child> hmmm I guess you could change the colour inline

14:32 <moon-child> though that gets dicy as soon as you add aa

14:32 <gog> but then you're basically just re-rendering the glyph each time anyway
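
A rough sketch of the mask idea discussed above: cache each glyph as a colourless 1 bpp mask and apply the current fg/bg only at draw time. It assumes a fixed 8-pixel-wide bitmap font with the leftmost pixel in bit 7 and 32 bpp output; `GLYPH_W` and `expand_mask_row` are made-up names for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define GLYPH_W 8 /* assumed fixed 8-pixel-wide bitmap font */

/*
 * Expand one row of a cached 1 bpp glyph mask into 32 bpp pixels.
 * Bit 7 of `mask` is the leftmost pixel; set bits take the foreground
 * colour, clear bits take the background colour.
 */
void expand_mask_row(uint8_t mask, uint32_t fg, uint32_t bg,
                     uint32_t out[GLYPH_W])
{
    for (size_t col = 0; col < GLYPH_W; col++)
        out[col] = (mask & (0x80u >> col)) ? fg : bg;
}
```

As gog notes, this still touches every pixel on each draw; what the cache saves is the work of producing the mask, not the colour expansion.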

14:33 <moon-child> well, it's less memory traffic

14:33 <moon-child> (see, this is why I don't like software rendering)

14:33 <gog> yeh

14:33 <Ermine> ddevault: I forgot what was on that slide, but does helios provide any API to map virtual addresses to physical? It's useful for DMA implementation

14:34 <heat> moon-child, no AA, this is bitmap mr moon

14:34 <moon-child> yeah, hence 'as soon as you add'

14:35 <moon-child> gog: proggy

14:35 <heat> "never"

14:35 <heat> there you go

14:35 <moon-child> :<

14:35 <gog> goggers is proggers

14:35 <heat> doing AA inside the fucking kernel sounds depressing

14:35 <gog> this is eventually going to move outside the kernel

14:35 <gog> i just need something to show me text rn

14:35 <gog> so i don't have to watch the serial console for everything

14:35 <heat> no unix??

14:35 <moon-child> didn't windows do text rendering inside the kernel

14:36 <gog> yes

14:36 <moon-child> and then had a fucktonne of vulnerabilities

14:36 <moon-child> lol

14:36 <gog> i think the TTF engine in kernel still exists?

14:36 <Ermine> gog: what about vesa/gop?

14:36 <gog> Ermine: i'm using GOP here

14:36 <gog> or rather, passing the pointer and pixel format and pitch forward to the kernel

14:36 <Ermine> ah wait, it doesn't have text mode

14:36 <gog> yeah

14:36 <Ermine> okay, okay..

14:37 <gog> so i'm forced to implement a halfass framebuffer console

14:37 <heat> right-wing gog using gop

14:37 <gog> what

14:37 <gog> oh

14:37 <gog> lol

14:38 <Ermine> heat: you mean the party?

14:38 <gog> yes he's making a pun

14:38 <gog> he's not funny

14:38 <moon-child> it's only not funny because you explained the joke

14:39 <heat> i am hilarious

14:44 <ddevault> Ermine: yes

14:45 Brnocrist has quit [Ping timeout: 265 seconds]

14:50 <Ermine> ok, maybe will take a look at it

14:56 <Ermine> well, it's too different to compare with minix imo

15:00 <zid> I like my half assed framebuffer console

15:00 <zid> I'm just not sure when I should update it

15:01 <heat> now

15:01 <zid> okay we're on frame 1 then

15:05 <phr3ak> anyone have experience how gd32 is compatible with stm32f4xx? I want to patch an stm32f429 bootloader to work on gd32f450.

15:11 Brnocrist has joined #osdev

15:13 unimplemented has quit [Read error: Connection reset by peer]

15:13 <x8dcc> my framebuffer console rocks

15:15 <zid> does yours have ayame bg

15:15 <x8dcc> mine has gimp peppers on top

15:15 <zid> yea mine is way better then

15:15 <x8dcc> because I can initialize it with position and dimension sheeeesh

15:16 <x8dcc> zid: probably. I have had a look at your os btw

15:16 <zid> 'os' :D

15:17 <x8dcc> it has been useful, but I have lots of questions

15:17 <zid> I bet

15:17 <zid> I have lots of questions and I wrote it

15:17 <clever> x8dcc: my framebuffer console is using sprites to do scrolling with zero copying, and its actually too fast, i need to throttle it on vsync or it glitches out

15:17 <x8dcc> not related to boros, its just that I am not sure what should I focus on

15:17 <x8dcc> clever: what do you mean by zero copying?

15:17 <zid> it's composited

15:17 <zid> the gpu does the scroll by moving the geometry around

15:18 <clever> yep

15:18 <zid> rather than moving pixels around in a big texture

15:18 <zid> with memcpy

15:18 <clever> exactly

15:18 <zid> (aka blitting)
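
The difference between the two scrolling styles can be illustrated in plain software. clever's console does this with GPU sprites on the rpi; the sketch below is only an analogue, assuming the display side can start reading from an arbitrary row. `struct console` and all function names here are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define ROWS 4
#define COLS 16

/* Hypothetical text console backed by a ring of rows. */
struct console {
    char   cells[ROWS][COLS];
    size_t top; /* storage row currently shown at the top */
};

/* Copying scroll (assumes top == 0): every row moves up by one. */
void scroll_copy(struct console *c)
{
    memmove(c->cells[0], c->cells[1], (ROWS - 1) * COLS);
    memset(c->cells[ROWS - 1], ' ', COLS);
}

/* Zero-copy scroll: recycle the old top row as the new bottom row. */
void scroll_ring(struct console *c)
{
    memset(c->cells[c->top], ' ', COLS);
    c->top = (c->top + 1) % ROWS;
}

/* Map a visible row number to its storage row in the ring. */
char *visible_row(struct console *c, size_t row)
{
    return c->cells[(c->top + row) % ROWS];
}
```

`scroll_copy` moves (ROWS - 1) * COLS bytes per scroll, while `scroll_ring` only clears one row and bumps an index, which is roughly why the composited version can outrun vsync.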

15:18 <x8dcc> oh, I don't mess with gpu

15:18 <zid> he's on a pi so he gets to cheat

15:18 <x8dcc> I mess with the peppers if you know what I mean

15:18 <nikolar> well that doesn't count

15:18 <zid> x8dcc: Are you going to ask any of those questions

15:19 <clever> nikolar: i still had to init the gpu from scratch!

15:19 <clever> id say that counts :P

15:19 <moon-child> didn't you have to reverse engineer it

15:19 <nikolar> let's see you write an intel/amd driver huh

15:19 Brnocrist has quit [Ping timeout: 252 seconds]

15:19 <clever> moon-child: i had to read the linux drivers and write my own docs from that

15:19 <x8dcc> zid: well more than a specific question is just that I don't know what should I do next

15:19 <zid> oh boring

15:19 <clever> moon-child: and decompile some firmware for the missing init steps

15:19 <nikolar> clever: i'll give you half points :P

15:19 <x8dcc> I made a simple shell, which is called by the kernel

15:19 <x8dcc> not a module or anything, uses libk too

15:20 <x8dcc> and I would like to make a proper userspace and what not, but I have to do some stuff before

15:20 <clever> nikolar: i still dont have hdmi working, i think the PLL is in another power domain, and linux cant bring that online

15:20 <nikolar> really

15:20 <nikolar> got to love embedded platforms

15:20 <clever> linux relies on the closed firmware to bring the PLL online

15:20 <x8dcc> I have been thinking about doing something related to ext2 or fat, which is one thing I am not sure about

15:20 <clever> so the instant you replace that firmware, the PLL just stays dead

15:21 <x8dcc> but I don't have paging yet, and I don't know if I really need it for now

15:21 <nikolar> clever: kind of leaves you with no options

15:21 <nikolar> wasn't there a project to rewrite raspi firmware as open source

15:21 <clever> nikolar: i still have 2 options, DPI (which can convert to vga) and composite (ntsc/pal)

15:21 <moon-child> is it the kind of thing where you need a key to bring it up?

15:21 <moon-child> or just no one did the legwork yet

15:22 <clever> nikolar: thats what ive been working on

15:22 <nikolar> oh nice

15:22 <clever> moon-child: i think its just an undocumented control register needing a bit flipped

15:22 <clever> but the problem is which register? which bit?

15:22 * moon-child nods

15:23 <heat> x8dcc, paging

15:23 <heat> absolutely paging

15:24 <heat> then a vfs + tmpfs

15:24 <clever> nikolar: https://ext.earthtools.ca/private/rpi/standard-gfxconsole.mp4 this is memcpy based scrolling, with the stock LK gfxconsole

15:24 <x8dcc> heat: I have been thinking about paging and I have come to the humble conclusion that I don't really know why it's useful

15:24 <clever> it goes slowly enough that you can see it scroll

15:25 <x8dcc> heat: I *think* I understand the concept, but I don't know why I need it

15:25 <heat> x8dcc, that's an easy question. 1) memory protection

15:25 <clever> nikolar: https://ext.earthtools.ca/private/rpi/faster-console-1.mp4 and this is then my re-implementation, blink and youll miss it, lol

15:25 <x8dcc> so processes don't write to other processes memory?

15:25 <heat> 2) process isolation 3) with virtual addressing you stop needing contiguous physical memory

15:26 <heat> 1) is mprotect, 2) is processes don't write to other processes' memory
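
heat's three points can be seen in a toy model. The sketch below is not any real MMU format: it assumes a single-level table with 4 KiB pages, and `struct pte` and `translate` are invented for illustration. Consecutive virtual pages can point at scattered physical frames, and an unmapped or read-only page is where real hardware would raise a page fault.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NUM_PAGES 8u

/* Toy page-table entry; real hardware formats differ. */
struct pte {
    uint32_t frame;    /* physical frame number */
    bool     present;  /* is this virtual page mapped at all? */
    bool     writable; /* are stores allowed? */
};

/*
 * Translate a virtual address through a single-level toy page table.
 * Returns true and stores the physical address in *paddr on success;
 * returns false where real hardware would raise a page fault.
 */
bool translate(const struct pte table[NUM_PAGES], uint32_t vaddr,
               bool is_write, uint32_t *paddr)
{
    uint32_t page   = vaddr / PAGE_SIZE;
    uint32_t offset = vaddr % PAGE_SIZE;

    if (page >= NUM_PAGES || !table[page].present)
        return false;                      /* unmapped: fault */
    if (is_write && !table[page].writable)
        return false;                      /* protection violation */

    *paddr = table[page].frame * PAGE_SIZE + offset;
    return true;
}
```

Giving each process its own table is what provides the isolation in point 2: an address in one process simply has no translation to another process's frames.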

15:26 <x8dcc> heat: yeah, for example with process isolation, I was not sure if I needed it before multitasking

15:26 Brnocrist has joined #osdev

15:26 <nikolar> clever: lol

15:26 <nikolar> love the commodore colourscheme too

15:26 <clever> x8dcc: the mmu also helps with load-address, a lot of binaries break if you load it to the wrong addr in ram

15:26 <clever> you can do relocation patching, but that can be tricky

15:27 <clever> nikolar: the background border even works nearly identically to how it does on the c64, the 2d core on the rpi has a background fill option, so you can change that whole border with a single register write

15:28 <x8dcc> clever: yeah, I had a look at stuff related to processes and I saw where they were loading them, and I felt like that could be a problem

15:28 <clever> you can even change it in hsync, like on the c64, and profile code with that

15:28 <gog> can qemu vga accelerate

15:28 <gog> i'm not finding a lot in the way of docs here

15:29 <heat> wdym

15:29 <Ermine> gog: wsym

15:29 <Ermine> wdym

15:29 <heat> Ermine, omg twiiiiiiiiiiinssss

15:29 <gog> like can i put sprites in it and tell it to move the sprites around without doing it in "software"

15:30 <gog> i know it's all software and it's not really accelerated, just more of a shortcut for the next things i need to do

15:30 <clever> gog: the mouse cursor is one form of that

15:30 <zid> yea vga is dumb as hell

15:30 <zid> when qemu-voodoo2

15:30 <x8dcc> heat: so for example now I am using a heap allocator I made for alloc and all that, so now I would have to change that to allocate with paging? this is the kind of stuff that confuses me

15:30 <gog> yes

15:30 <heat> gog, not in vga, but qxl and vmware and virtio-gpu support that

15:30 <gog> x8dcc: you can build allocators atop allocators

15:30 <heat> ^^

15:30 <gog> ok i'll look at virtio-gpu

15:30 <clever> gog: i think most dumb desktop gpu's are limited to 2 sprite like overlays, a small one for the cursor, and a large one for accelerated video playback, plus the main framebuffer

15:31 <clever> and the rest all goes into the 3d half of things

15:31 <x8dcc> gog: not sure I understood that :/

15:31 <gog> x8dcc: your heap allocator is going to be atop something like SLAB which is atop a page frame allocator

15:31 <gog> it's allocators all the way down
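
"Allocators all the way down" might look something like the sketch below: a page frame allocator at the bottom and a fixed-size object cache on top that refills itself one frame at a time. This is a much simplified stand-in for a real SLAB allocator, and the static arena, `frame_alloc`, `struct cache`, `cache_alloc` and `cache_free` are all assumptions made for illustration. A general-purpose heap would then sit on top of several such caches.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_SIZE 4096u
#define NUM_FRAMES 4u

/* Bottom layer: hands out whole page frames from a static arena. */
static _Alignas(16) uint8_t arena[NUM_FRAMES][FRAME_SIZE];
static size_t next_frame;

void *frame_alloc(void)
{
    if (next_frame >= NUM_FRAMES)
        return NULL;
    return arena[next_frame++];
}

/* Upper layer: fixed-size object cache that refills from frame_alloc(). */
struct cache {
    size_t obj_size;  /* at least sizeof(void *), pointer-aligned */
    void  *free_list; /* list threaded through the free objects */
};

void *cache_alloc(struct cache *c)
{
    if (c->free_list == NULL) {
        uint8_t *frame = frame_alloc();
        if (frame == NULL)
            return NULL;
        /* Carve the fresh frame into objects and chain them together. */
        for (size_t off = 0; off + c->obj_size <= FRAME_SIZE;
             off += c->obj_size) {
            void **obj = (void **)(frame + off);
            *obj = c->free_list;
            c->free_list = obj;
        }
    }
    void **obj = c->free_list;
    c->free_list = *obj;
    return obj;
}

void cache_free(struct cache *c, void *p)
{
    *(void **)p = c->free_list;
    c->free_list = p;
}
```

With paging enabled, only the bottom layer needs to change (for example to map frames into the kernel's address space); the layers above keep the same interface, so an existing heap allocator does not have to be thrown away.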

15:31 <moon-child> gpu should have more options for latency oriented stuff

15:31 <moon-child> the pipelines are too deep!

15:32 <gog> not that any of this is really necessary i just don't want to deal with moving huge bitmaps around in my cheap framebuffer console code

15:32 <clever> the video surface tends to be chroma-key'd into the main framebuffer, and accepts yuv

15:32 <Ermine> If you want to try virgl, idk if it's stable right now

15:32 <heat> virgl is super stable

15:33 <heat> it's like 80% of virtio-gpu magic

15:33 <clever> gog: with the power of a proper composition capable core, things like https://www.youtube.com/watch?v=JFmCin3EJIs use basically 0 cpu time

15:33 <zid> virgin gl vs chad dx?

15:33 <bslsk05> 'Chaos, 13 sprites randomly bouncing around' by michael bishop (00:00:12)

15:33 <heat> the rest honestly kind of sucks

15:33 <Ermine> heat: At least I didn't manage to make it work in my virtual machines

15:33 <moon-child> zid: what's the diff?

15:33 <Ermine> (or is it supposed to be laggy?)

15:33 <heat> Ermine, did you do gl=on

15:34 <Ermine> Is it qemu option? Idk then, I use libvirt

15:34 <heat> yes, qemu option

15:34 <heat> libvirt is libcringe

15:35 <Ermine> I should sit down and learn how to use qemu directly, I know

15:36 <nikolar> clever: suspiciously, my cpu runs at like 60% of a single core even though it should have hardware acceleration for video decoding

15:36 <clever> nikolar: use top to find out which pid is to blame, then `perf top -p <PID>` to see what function is to blame

15:36 <heat> hardware video accel sucks in linux

15:36 <heat> like most things suck in linux

15:37 <clever> heat: a common issue i see on the rpi, is that xorg wants rgb images, so even if the gpu can accept a yuv frame, you're wasting cpu converting yuv->rgb, to pass it off to xorg

15:37 <clever> similar issues may occur on other hardware

15:37 <clever> because xorg is equally dumb on all platforms

15:38 <zid> if your gpu only does yuv I feel bad for it

15:38 <clever> more, that the gpu can accept both rgb and yuv, so you can just feed the yuv frames in as a composition layer

15:38 <clever> but xorg doesnt allow that, and demands you convert to rgb first

15:38 <Ermine> clever: is it a problem on wayland? If the client gives you rgb buffer you still need to convert it

15:39 <zid> that seems totally fine

15:39 <zid> the alternative is that it settled on yuv, and fuck that

15:39 <clever> Ermine: i know less about wayland, but one of the wayland backends uses kms for composition, and that could fully hw accelerate things

15:39 <nikolar> clever: libavcodec.so.59.37.100, don't have debug symbols though

15:39 <clever> but kms is limited to 32 surfaces on linux i think

15:40 <clever> nikolar: definitely sounds like its not hw accel

15:40 <nikolar> yeah 60% is suspicious as i said

15:40 <nikolar> didn't expect i'd go down the hardware acceleration rabbit hole today

15:41 <clever> if i perf top a chrome renderer process for YT, but its paused, its peaking at only 4% cpu in a single function, some js stuff

15:41 <clever> if i then play the video, some changes to the top functions, but nothing over 4%

15:42 <clever> CopyRow_AVX.cfi is using about 2-3%

15:42 <nikolar> [ffmpeg] AVHWDeviceContext: Cannot load libcuda.so.1

15:43 <nikolar> [ffmpeg/video] hevc_v4l2m2m: Could not find a valid device

15:43 <nikolar> well that confirms it

15:44 <clever> MergeUVRow_SSE2 and vpx_filter_block1d16_h8_ssse3 are also near the top

15:44 <clever> so its clearly using sse2 and sse3 extensions to accelerate some things

15:45 <nikolar> ok that was an easy fix

15:45 <nikolar> needed some decode libraries for amd gpus

15:45 <clever> and if i check a different video file in a more native player, ff_h264_decode_mb_cabac is near the top

15:46 <clever> i'm not sure if this gpu can do much video accel

15:46 <nikolar> now it's all in kernel clever: 9.90% [kernel] [k] read_hpet

15:46 <nikolar> wonder why it's hammering the hpet so much

15:47 <clever> hpet is often used to pace the playback, so the fps is stable

15:47 <nikolar> but 10%

15:47 <nikolar> really?

15:48 <clever> yeah, thats kind of high

15:48 <nikolar> 30% on a higher bandwidth video

15:48 <clever> what is your cpu?

15:48 <nikolar> r5 5500u

15:49 danilogondolfo has quit [Remote host closed the connection]

15:49 <moon-child> I wonder if it's sampling error

15:49 <clever> nikolar: https://www.cpubenchmark.net/compare/4141vs1780/AMD-Ryzen-5-5500U-vs-AMD-FX-8350-Eight-Core

15:49 <bslsk05> www.cpubenchmark.net: AMD Ryzen 5 5500U vs AMD FX-8350 Eight-Core [cpubenchmark.net] by PassMark Software

15:50 <moon-child> cus it's waking up, checking time, and at the same time sampling that in the profile

15:50 <nikolar> moon-child: yeah guess it could be

15:50 <clever> nikolar: your cpu is about 2.1x faster than mine, and uses 10% of the wattage, wut?

15:50 <nikolar> why is it higher on a more demanding video

15:50 <nikolar> clever: laptop cpu from 2021 ¯\_(ツ)_/¯

15:51 <clever> nikolar: amd cpu from when people thought amd couldnt recover and would go under, lol

15:51 <clever> thats just how bad mine is :P

15:51 <nikolar> clever: lol yeah

15:51 <zid> power usage scales with like, voltage squared, and they like to drive desktop chips at higher power levels for more cycles

15:51 <nikolar> amd made massive jumps in the past few years

15:51 <zid> If I want 5GHz on my cpu rather than 4GHz I have to take it from 120W to 200W :P
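
zid's numbers can be checked against the usual first-order model for CMOS dynamic power. This is a back-of-the-envelope sketch that assumes all of the power is dynamic switching power (leakage and other static power ignored) and that the effective capacitance C stays constant; only the 120 W, 200 W, 4 GHz and 5 GHz figures come from the chat.

```latex
P_\text{dyn} \approx C\,V^2 f
\quad\Rightarrow\quad
\frac{P_2}{P_1} = \left(\frac{V_2}{V_1}\right)^2 \frac{f_2}{f_1}

\frac{200\,\text{W}}{120\,\text{W}} \approx 1.67,\qquad
\frac{5\,\text{GHz}}{4\,\text{GHz}} = 1.25
\quad\Rightarrow\quad
\frac{V_2}{V_1} \approx \sqrt{\frac{1.67}{1.25}} \approx 1.15
```

Under those assumptions, a 25% clock increase costing about 67% more power corresponds to roughly 15% more core voltage.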

15:51 <nikolar> not to mention smaller process

15:52 <nikolar> zid: moar power

15:52 <zid> need more cooling for that though

15:52 <clever> that reminds me, in the past, i could set off the UPS overload alarm, just by maxing all 8 cores

15:52 <zid> pair of £5 fans not cutting it for 5GHz

15:53 <nikolar> clever: get a better ups :P

15:53 <clever> nikolar: it was running half my monitors, and the desktop

15:53 <clever> and it didnt warn, if i turned a monitor off

15:53 <nikolar> lol that would do it

15:53 Brnocrist has quit [Ping timeout: 255 seconds]

15:53 <clever> but it has since died, i forget why, and been replaced with one that doesnt complain, but does die after just 10mins of outage

15:53 <clever> and its only running 1 monitor!

15:54 <nikolar> does it at least warn you to shut off your computer in time

15:54 <clever> not really

15:54 <clever> no low-battery alarm, just *click*, its dead

15:55 <heat> the slander!

15:55 <heat> the fx 8350 is the greatest heater that amd ever made

15:55 <nikolar> doesn't sound particularly useful as ups

15:55 <zid> fx 8350 actually really good cpu

15:56 <zid> it had the single core world record for like 15 years

15:56 <nikolar> heat: does it compete with pentium 4s though

15:56 <clever> nikolar: it covers short blips, and gives me time to do a proper shutdown, but not much more

15:56 <nikolar> but you might not even notice the power was out in time

15:56 <nikolar> if there's no indication

15:56 <moon-child> oh yeah

15:56 <moon-child> zid: I got your favourite cpu

15:56 <zid> w-2195?

15:56 <moon-child> 2295

15:56 <zid> w-1390p?

15:56 <zid> 2295 is sick af

15:56 <moon-child> it doesn't actually clock all the way up though

15:56 <zid> 1390p > 2295 though

15:57 <clever> nikolar: outages are very noticable, half my monitors die, and the fridge makes a sudden clunk noise, then the whole house goes silent

15:57 <moon-child> idk why. But it goes up to 3.7ghz, and maintains that on all cores, so I'm happy

15:57 xenos1984 has quit [Read error: Connection reset by peer]

15:57 <moon-child> zid: :<

15:57 <zid> why did you break your 2295

15:57 <nikolar> clever: well if it works ¯\_(ツ)_/¯

15:57 <moon-child> zid: that literally has two memory channels

15:57 <moon-child> and 8 cores

15:57 <zid> yea but it's so fast

15:57 <moon-child> wtf is the point of a cpu with only 8 cores and 2 memory channels

15:57 <zid> 5.3Ghz out of the box

15:58 <moon-child> can't computer anything with that

15:58 <zid> 2295 is for webserver

15:58 <zid> well, workstation

15:58 <zid> 1390p for desktop

15:58 <zid> and what was the webserver one like.. 3175? I don't remember

15:58 <zid> 2225 also very good

15:59 <zid> W-3375 is the webserver

15:59 <moon-child> 2225 has 4 cores

15:59 <moon-child> that's even less than 1390p

15:59 <zid> 4.6GHz and quad though

15:59 <zid> and 1/3rd the price

16:00 <zid> and actually exists, afaik, 1390p is paper launch

16:00 <moon-child> only 1/3?

16:00 <heat> xeon weirdos

16:00 <heat> use normal CPUs

16:01 <moon-child> but fast

16:01 <moon-child> and ecc

16:01 <zid> and 2011 is fully unlocked :D

16:01 <clever> nikolar: but recently, a big storm has blown thru, it got down to -25c, heavy wind, and 7 interrupts to the power in the last week!

16:01 <zid> why the 2011 xeons are unlocked nobody knows

16:01 <zid> but they am

16:01 <nikolar> clever: that's a lot

16:01 <zid> They're all K or X cpus as intel would name them

16:01 <moon-child> if I don't ecc, I might end up with the wrong bits

16:01 <nikolar> where does it get to -25c lol

16:01 <moon-child> WRONG BITS!

16:02 <zid> 1620 is a K, 1650 is an X

16:02 <zid> and they're $20

16:02 <clever> nikolar: some were short 20min outages, enough to kill the ups, some were planned 4h outages, and some were just a blip, under 1 second, enough to reboot everything

16:02 <moon-child> zid: I think there was some stuff in bios where I could adjust cpu voltage

16:02 <nikolar> clever: under 1s?

16:02 <zid> yea I always turn mine down usually

16:02 <clever> nikolar: NB canada

16:02 <nikolar> isn't that why you have ups

16:02 <nikolar> clever: ah makes sense

16:02 <zid> My cpu out of the box idles at 1.1V but it's stable at 0.8V

16:02 <clever> nikolar: yep, but the UPS only covers 1 machine, i have another 2 rooms with computers

16:02 <zid> saves like 40W

16:03 <clever> so i would need 2 more UPS's to cover everything

16:03 <zid> just need to up the load-line so that it doesn't die at full turbo from not enough wolts

16:03 <nikolar> clever: yeah that's unfortunate

16:03 <moon-child> load-line?

16:03 <zid> it's what they call the voltage scaling

16:03 <zid> https://www.pcgamer.com/what-is-load-line-calibration-in-my-bios-and-how-can-i-use-it/

16:03 <bslsk05> www.pcgamer.com: What is 'load line calibration' in my BIOS and how can I use it? | PC Gamer

16:03 <nikolar> clever: one possible way to minimize fs corruption is to use something like zfs i guess

16:03 <nikolar> but nothing can replace hardware protections

16:04 <zid> I have like, low med high very high obscene options

16:04 <zid> as well as pwm frequency

16:04 <clever> nikolar: i do have zfs on most of the new systems, but the router and irc client are a mix of ext4 and xfs

16:04 <zid> my bios is nuts

16:04 <nikolar> clever: yeah my laptop is on ext4 but i plan to install zfs on next os reinstall

16:05 <nikolar> whenever that happens to be

16:05 <zid> https://cdn.discordapp.com/attachments/417023075348119556/1071823753857745017/image.png

16:05 <zid> I have pages and pages of this shit

16:05 <clever> same image twice!

16:05 <zid> oh whoops

16:06 <zid> https://cdn.discordapp.com/attachments/417023075348119556/1071824185002827786/image.png

16:06 <zid> It's a stupid skinned window that isn't screenshotting properly

16:07 <clever> zid: that sounds similar to the voltage/freq ratios in arm/rpi, where you need to raise the voltage before raising the clock

16:07 <zid> I have it scale from 0.8V to 1.3V between 800MHz and 4.4GHz

16:07 <clever> thats also something i have not yet tried messing with on the open firmware

16:07 <clever> and i suspect its part of my instability

16:08 <clever> the voltage is whatever the reset-default is, and i'm probably clocking it too much for that

16:08 <zid> I can also do all this same shit on my dram ofc

16:08 <zid> all per channel

16:08 <clever> yep, the rpi has 3 separate voltage settings for the dram

16:09 <nikolar> clever: or you could lock it to the lowest frequency :P

16:09 <clever> phy, io, and core

16:09 <zid> I can make different channels run on different PWM frequencies and stuff, it's silly :D

16:09 <clever> dang!

16:09 <clever> ive only got 1 dram channel, so not much to play with there

16:10 Brnocrist has joined #osdev

16:10 <zid> I have four but they're paired sometimes for some options

16:11 <zid> slot AB and slot CD

16:11 <zid> but I used paired kits so it doesn't impact me

16:11 <zid> imagine being able to find 4 of the same dimm

16:11 <clever> my desktop came with a pair of ram sticks, and i later bought a second pair to upgrade it

16:11 <clever> but that 2nd pair has since turned faulty

16:12 <zid> I found out about a nice cheap dimm *after* I already bought different ones, like within a week, kinda sad

16:12 <zid> so now I'd need 4 new ones again, fuck that

16:13 <zid> 16GB 1066MHz urdimms exist and aren't anywhere near as expensive as I would have guessed, considering the price of 8GB 933MHz urdimms was teetering on ancient relic sealed inside a tomb prices

16:13 <zid> I found a cheap kit of the latter and thought I did really well

16:14 <nikolar> zid: what do you need 16gb 1066mhz ram sticks for

16:14 <zid> so that I can have lots of fast ram?

16:14 <zid> surely that's better than 4GB of 800MHz ram

16:14 <nikolar> well i wouldn't call 1066 fast

16:14 xenos1984 has joined #osdev

16:15 <zid> 1066 is very fast

16:15 <heat> ddr3 moment

16:16 <zid> I legit see people recommending single channel for ddr4 cus it trains so badly :D

16:17 <zid> you have to back off all the timings to make two dimms of it work in the same postcode

16:18 <nikolar> lol

16:23 <zid> https://cdn.discordapp.com/attachments/272621603836133378/1026632860502859796/unknown.png ddr4 users be like "look at my riced system"

16:24 <zid> "I got the tertiaries really tight!"

16:24 <zid> >25GB/s

16:32 srjek has joined #osdev

16:40 <mrvn> With ram why aren't we driving 4 DIMMs in parallel instead of just 2?

16:42 <mrvn> or have DIMMs with twice the number of chips on it and have the controller on the DIMM alternate them?

16:47 <moon-child> at a guess: cost ineffective vs capacity

16:47 <moon-child> servers have a shitton of channels _and_ a shitton of dimms

16:51 <mrvn> But instead of driving 8 chips at 1066MHz you could drive 16 chips at 533MHz, needing less power, producing less heat and requiring looser tolerances on the chips.

16:59 <dinkelhacker> clever: The pi sets all the interrupts to group 1 in the armstubs. Qemu seems to do the same. Do you know if this is part of a protocol? I checked the linux boot protocol and haven't found a hard requirement for that.

17:00 <mrvn> dinkelhacker: isn't that so a non-smp kernel can boot?

17:00 <mrvn> an armv6 kernel won't know how to set the group

17:01 <dinkelhacker> Hm.. I don't see what it has to do with smp? You have to be in secure world to do that. And according to the linux boot protocol the kernel can be booted in EL2 or non secure EL1. So it kinda makes sense to do that. I was just wondering if that is something you can rely on when targeting platforms that also support linux.

17:03 <mrvn> don't rely on anything.

17:03 <dinkelhacker> well some things you have to rely on^^

17:05 <clever> dinkelhacker: i think its also, because once you drop to NS mode, you cant change the groups, and the armstub doesnt have any EL3 handlers

17:05 <clever> so once you go to NS mode, secure mode is permanently lost

17:06 <dinkelhacker> Is EL2 always non secure?

17:06 <clever> EL2/hypervisor can be both secure and non-secure

17:06 <clever> its up to EL3 to decide which one it will be when EL3 does eret

17:06 <dinkelhacker> Ok. So it could just boot in secure el2

17:07 <clever> and only by trapping back to EL3 (such as the smc opcode) can you change that

17:07 <clever> dinkelhacker: i think some things expect NS mode, and will malfunction in S mode

17:10 <dinkelhacker> I did originally boot the kernel in el3 and did all that myself. But maybe I can get rid of that stuff and just expect it to be done right until I stumble on something where it is not done that way.

17:11 <clever> i think EL3 doesnt support the mmu?

17:11 <clever> and EL2 can lack a high/low split in its MMU, so supporting a userland is tricky

17:11 <clever> linux+kvm runs in EL1, but has a stub in EL2 for accessing protected registers

17:12 <dinkelhacker> I think it does.. but anyway I was just setting up the gic and dropping to el1

17:12 <clever> and any time linux+kvm context switches to a guest, it has to go EL1(host)->EL2->EL1(guest)

17:12 <clever> which is a performance cost

17:12 <clever> there is an EL2 extension, that allows EL2 to act more like EL1

17:12 <clever> all of the EL1_ registers become aliases, pointing to the respective EL2 version, and EL2 gains a high/low split

17:12 <nikolar> clever: apparently i can't make firefox use hardware acceleration

17:13 <clever> so a relatively unmodified "EL1" linux can run in EL2 instead, and then it becomes EL2(host)->EL1(guest)

17:13 <clever> saving a step

17:13 <dinkelhacker> nice

17:14 <clever> the rpi lacks that EL2 extension

17:53 <gog> mew

17:53 <nikolar> mew

17:59 xenos1984 has quit [Ping timeout: 246 seconds]

18:00 xenos1984 has joined #osdev

18:02 <moon-child> mew

18:07 <nikolar> clever: apparently it's better to watch a youtube video through mpv than firefox

18:07 <nikolar> for acceleration

18:08 <clever> nikolar: there is also software like freetube i think, which can do similar

18:09 fedorafan has quit [Ping timeout: 252 seconds]

18:09 <nikolar> i just can't get ff to do hardware acceleration

18:10 foudfou has quit [Ping timeout: 255 seconds]

18:12 <clever> nikolar: i have heard that chrome does do hw accel

18:12 fedorafan has joined #osdev

18:13 nyah has joined #osdev

18:14 <nikolar> i don't know if i need to enable something but it's at like 100% on one core

18:16 foudfou has joined #osdev

18:16 <GeDaMo> nikolar: in Firefox, try Help > More Troubleshooting Information, there's a graphics section there

18:17 <nikolar> i followed instructions on arch wiki

18:33 <mrvn> oh no, a follower. :)

18:36 srjek|home has joined #osdev

18:36 srjek has quit [Ping timeout: 252 seconds]

18:46 <nikolar> mrvn: kek

19:10 epony has joined #osdev

19:12 wootehfoot has joined #osdev

19:16 xenos1984 has quit [Ping timeout: 260 seconds]

19:31 Turn_Left has joined #osdev

19:31 srjek has joined #osdev

19:32 xenos1984 has joined #osdev

19:33 <dinkelhacker> my dtb is 1MB but only has data until ~0x2200 rest seems to be zeroed. Does anybody know why?

19:33 <mrvn> how do you know it's 1MB?

19:34 <dinkelhacker> Well it's the size of the file and also if I look into the header

19:34 srjek|home has quit [Ping timeout: 260 seconds]

19:35 Left_Turn has quit [Ping timeout: 252 seconds]

19:35 <dinkelhacker> according to the fdt spec the second uint32 is the length of the dtb which is 0x1000 in my case. After swizzeling that its 0x100000

19:36 bgs has quit [Remote host closed the connection]

19:38 <heat> probably just padding

19:39 <zid> or their build tool doesn't fill it in automatically so they just said 1MB should be plenty for forever

19:40 <dinkelhacker> Ok yeah it's the one qemu generates. Probably that is one size fits all

19:40 wootehfoot has quit [Read error: Connection reset by peer]

19:41 <mrvn> swizzeling?

19:42 <mrvn> Is 0x1000 maybe the length rounded up to the next page?

19:47 joe9 has quit [Quit: leaving]

19:55 <dinkelhacker> mrvn, changing endianness. 0x1000 le is 0x100000 be
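
The endianness mix-up above can be reproduced with a short sketch. The devicetree header stores its fields as big-endian 32-bit values, with the magic `0xd00dfeed` first and `totalsize` second. The 8-byte header prefix below is a made-up example for a blob padded to 1 MiB, and `fdt_totalsize` is a hypothetical helper name.

```python
import struct

FDT_MAGIC = 0xD00DFEED  # magic number at offset 0 of a flattened devicetree


def fdt_totalsize(blob: bytes) -> int:
    """Return the totalsize field of a flattened devicetree header."""
    # The first two header fields are big-endian uint32: magic, totalsize.
    magic, totalsize = struct.unpack_from(">II", blob, 0)
    if magic != FDT_MAGIC:
        raise ValueError("not a flattened devicetree blob")
    return totalsize


# Hypothetical header prefix: magic followed by totalsize = 1 MiB.
header = bytes.fromhex("d00dfeed" "00100000")

big = fdt_totalsize(header)                      # field read as big-endian
little = struct.unpack_from("<I", header, 4)[0]  # same bytes read as little-endian

print(hex(big), hex(little))
```

Reading the same four bytes in the host's little-endian order is what produces the small-looking 0x1000 value.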

19:59 troseman has joined #osdev

20:02 micttyl has joined #osdev

20:12 <geist> my guess is they just padded it out

20:12 <geist> i've seen that when doing dumpdtb with qemu

20:12 <geist> it's the space reserved for the largest possible dtb they could ever have

20:16 <clever> that reminds me, i padded my dtb out to 16mb a few months ago, that entirely broke linux

20:16 <clever> had to reduce it down to a more sane value

20:18 <mrvn> what did you put in there?

20:22 <geist> i suppose one could stick entire binaries in it, or other binary data

20:22 <geist> images, etc

20:23 <mrvn> is there a entry for that in the specs?

20:25 <geist> the specs are pretty ad hoc in places

20:26 GeDaMo has quit [Quit: That's it, you people have stood in my way long enough! I'm going to clown college!]

20:27 <mrvn> splash_screen = <0x00000000, 0x10101010, ... >

20:30 dutch has quit [Quit: WeeChat 3.8]

20:33 fedorafan has quit [Ping timeout: 252 seconds]

20:35 <clever> mrvn: there is a size field near the start, that tells you how big of an area the dtb covers, but thats unrelated to where the end marker actually lives

20:35 <clever> its more of a safety/helper, so you can just memcpy the whole thing without parsing it

20:35 <clever> and if you go past the end, something has gone wrong

20:35 <mrvn> obviously.

20:36 <clever> and libfdt also uses that size to enforce not writing past the end of an allocated buffer

20:36 <clever> so i just whacked it to 16mb, the size of my buffer

20:36 <mrvn> have you checked why linux blows up?

20:37 fedorafan has joined #osdev

20:38 <clever> mrvn: not fully, i assume there is some max size its willing to reserve and copy/parse

20:39 <clever> libfdt also has a shrink function, that will parse the tree, find out the minimum size, update the length field, and report that size

20:39 <clever> that entirely resolved the issue
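
A rough way to see how much of a padded blob is actually in use is to compare `totalsize` with the end of the structure and strings blocks, both of which are described in the header. This is only a sketch of that comparison, not what libfdt does internally. It assumes the version 17 header layout (ten big-endian uint32 fields) and the usual arrangement where the memory reservation map sits before the other two blocks. `fdt_sizes` and the sample header values are hypothetical.

```python
import struct

FDT_HEADER = struct.Struct(">10I")  # ten big-endian uint32 fields (version 17 layout)


def fdt_sizes(blob: bytes):
    """Return (totalsize, used) where used is the end of the last block."""
    (magic, totalsize, off_struct, off_strings, _off_rsvmap,
     _version, _last_comp, _boot_cpu, size_strings, size_struct) = \
        FDT_HEADER.unpack_from(blob, 0)
    if magic != 0xD00DFEED:
        raise ValueError("not a flattened devicetree blob")
    # Assumption: the reservation map precedes both blocks, so the end of
    # whichever block comes last is the end of the meaningful data.
    used = max(off_struct + size_struct, off_strings + size_strings)
    return totalsize, used


# Hypothetical header: 1 MiB totalsize, but only 0x178 bytes of real content.
hdr = FDT_HEADER.pack(0xD00DFEED, 0x100000, 0x38, 0x138, 0x28,
                      17, 16, 0, 0x40, 0x100)
totalsize, used = fdt_sizes(hdr)
print(f"totalsize={totalsize:#x} used={used:#x}")
```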

20:40 <mrvn> I could understand it running out of memory for page tables to map it or something.

20:40 micttyl has quit [Quit: leaving]

20:41 <clever> yeah, there might be an area of early ram, that it assumes is just available, enough to get the mmu online, and parse the dtb later

20:41 <clever> and it just blindly respects that size field

20:41 <geist> yah libfdt is probably almost more useful in scenarios where you're actually building and modifying the DTB

20:41 <geist> ie, a boot loader

20:41 <geist> more than half its routines are modifying things

20:41 <mrvn> geist: it's huge for just parsing the dtb.

20:42 <clever> yep, thats exactly what i was doing as well, loading an existing dtb file, expanding it, and then modifying some bits

20:42 <geist> yah but most of that strips out if you only call a few routines

20:42 <heat> huge?

20:42 <heat> it's pretty small

20:42 <geist> most of the routines are leaf nodes, so it collapses nicely if you're using linker gc

20:42 <mrvn> indeed.

20:42 <geist> i've looked into it for LK and it works nicely

20:42 <clever> libfdt also managed to trigger a compiler bug in the VPU fork of gcc

20:42 <mrvn> ouch

20:43 <geist> heat: oh BTW did you get what i discovered about the mem reserve stuff?

20:43 <clever> the VPU has a `switch r0` opcode, where it will then expect an array of 8bit pc-relative offsets to directly follow the opcode

20:43 <clever> so switch-case blocks can be easily handled

20:43 <heat> geist, yep

20:43 <clever> gcc decided to reuse the link register, and do `switch lr`

20:44 <clever> and apparently, that doesnt work, and the cpu malfunctions, going down the wrong branch

20:44 <geist> heat: yah didn't know if you had read that or not. interesting at least

20:44 <mrvn> clever: that sounds like someone screwed up the specs for the switch opcode.

20:44 <clever> yeah, they assumed lr was like any other general-purpose reg

20:46 <mrvn> ARM has a bunch of special cases for regs too.

20:47 <mrvn> sp and pc mostly.

20:48 <clever> yeah, the pipeline can make pc very different from how you expect it

20:48 <geist> which they removed in arm64 precisely because they wanted no special cases

20:48 <mrvn> geist: you mean they remove PC from being a general register, right?

20:49 <geist> and SP

20:49 <mrvn> that part I hate.

20:49 <mrvn> such a waste with stackless languages.

20:49 <geist> do you hate it like it hurts you or you hate it in that you find it distasteful?

20:49 <clever> how do you manage a framepointer then?, what can SP do?

20:49 <geist> ah. well you can use regular registers as stacks

20:50 <clever> can add/sub/mov still interact with SP?

20:50 <mrvn> clever: You have special opcodes that work with the SP. Like push/pop.

20:50 <geist> it's primarily because things like 'sp must be aligned on 16 bytes' and 'sp is banked' that makes sense to remove it out of the regular register file

20:50 <geist> so there's no special case

20:50 <clever> mrvn: but what if you want to create a 256 byte hole in the stack?

20:50 <clever> for a char[256] local var

20:50 <geist> you can add/sub from it

20:50 <mrvn> clever: addsp #256

20:51 <clever> ah, a dedicated opcode, that solves add/sub

20:51 <mrvn> clever: it's just not the regular add reg, reg, #imm encoding.

20:51 vin has joined #osdev

20:51 <clever> what about frame pointers, can mov still read/write sp?

20:51 <mrvn> clever: why would you need a frame pointer?

20:51 <clever> or context switching, having to swap stacks

20:52 <clever> just the first example that came to mind, on copying sp

20:52 <clever> context switching is a much more useful case

20:52 <mrvn> clever: there surely also is a mov sp, reg and move reg, sp opcode.

20:53 <mrvn> clever: or you have a link/unlink opcode like m68k has. unlink loads the old SP from the stack

20:54 <mrvn> .oO(Or did it just add to the SP? something like that that removes one stack frame)

20:54 <clever> i recently found that the centurion's CPU6 call/return opcodes, are a funky hybrid

20:55 <clever> it will push X onto the stack, store the return addr into X, and set PC to the function being called

20:55 <geist> note one use of fp thats more or less mandatory: alloca

20:55 <clever> so X behaves like a link register, but it also auto-saved the old X to the stack

20:55 <mrvn> geist: only for variable sizes alloca

20:55 <geist> at the minimum you have to create some sort of register thing to save some sort of anchor to the stack to restore it

20:55 <geist> yes. that's what i said

20:55 <clever> but where things get really funky, is that a lot of centurion code, expects immediates after the call opcode

20:56 <clever> as-in, you put immediates into your .text, after the call opcode, and the called function will increment the link-reg as it reads them

20:56 <clever> and if it doesnt consume the right number of arguments, it returns to non-opcode data

20:57 <clever> its kind of treating call like an opcode that can consume 20+ immediates
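
The inline-immediates convention can be illustrated with a toy model. This is not the real CPU6 encoding: `text` is just a Python list standing in for code memory, and `call_with_inline_args` is a hypothetical name. It only shows how the callee walks the link register over the arguments, and what goes wrong when it consumes the wrong number of them.

```python
def call_with_inline_args(text, call_site, n_args):
    """Toy model: the callee reads n_args words placed right after the call."""
    link = call_site + 1            # link register points just past the call opcode
    args = []
    for _ in range(n_args):
        args.append(text[link])     # callee reads one inline immediate...
        link += 1                   # ...and bumps the link register past it
    return args, link               # execution resumes at the final link value


# Hypothetical code stream: a call, two inline arguments, then the next opcode.
text = ["CALL print2", 7, 9, "NEXT_OPCODE"]

args, resume = call_with_inline_args(text, 0, 2)
print(args, text[resume])           # correct count: resumes on a real opcode

_, bad_resume = call_with_inline_args(text, 0, 1)
print(text[bad_resume])             # wrong count: "returns" into argument data
```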

20:57 vin has quit []

20:57 <mrvn> geist: alloca really screws you as compiler builder. It's so rarely used and yet you have to support it in all your codegen.

20:58 <nikolar> Centurion as in the 70s minicomputer?

20:58 <geist> yup

20:58 <geist> yup to mrvn that is

20:58 <clever> nikolar: yep

20:58 <nikolar> Surprisingly complicated for a cpu built out of logic chips

20:58 <mrvn> Lazy compilers just have a frame pointer. But what a waste of a register on those old cpus that have so few (*cough* x86 *cough*)

20:59 <clever> nikolar: i think this was a work-around, for having so few registers, and rather than push constants onto the stack, then pop them back out, it just put the constants inline in .text

20:59 <mrvn> clever: was .text even read-only?

20:59 <nikolar> Yeah fair

21:00 <clever> mrvn: variables went via the fancy double-indirect load opcode, you would push a constant, that points to a global pointer

21:00 <clever> so instead of printf("%d", foo);, it was more like bar = &foo; printf("%d", bar);

21:00 <clever> and then bar is statically allocated

21:01 <clever> nikolar: another crazy thing i discovered recently, is that there is a dedicated opcode, and a large chunk of microcode, for binary relocation

21:01 <nikolar> So argument passing was done through static addresses basically

21:01 <nikolar> Not through registers or stack?

21:01 <geist> note that it wasn't really successful as a minicomputer

21:01 <clever> nikolar: yep, except every function and syscall does arguments differently!

21:01 <geist> so it's also entirely possible it's generally not a good example of a cpu architecture

21:01 <geist> which are interesting in their own right

21:01 <clever> nikolar: much like the amiga, you need to know exactly where the function is expecting its args

21:01 <geist> ie, AT&T 3b2, etc

21:02 <nikolar> No common abi then

21:02 <nikolar> Very interesting

21:02 <clever> geist: some recently discovered docs say they sold i think 10k units

21:02 <nikolar> There's a lot

21:02 <nikolar> s/there/that

21:02 <geist> not really

21:02 <nikolar> Well for 70s and minicomputers

21:03 <geist> i think it was early 80s though right?

21:03 <geist> ie, a bit late to the game

21:03 <mrvn> clever: it's easy for C code. ints in Dx and pointer in Ax. But the register allocation for functions is part of the FFI interface.

21:04 <clever> mrvn: yep, there are clear rules, but when using gcc you have to define those in the .h file i think

21:04 <nikolar> geist: ah you're right

21:04 <nikolar> sort of 70s tech in early 80s

21:04 <geist> yeah, kinda. i suspect it was mostly a fairly cheapish back office kinda thing

21:04 <clever> https://github.com/Nakazoto/CenturionComputer/wiki/Timeline

21:04 <bslsk05> github.com: Timeline · Nakazoto/CenturionComputer Wiki · GitHub

21:04 <clever> > Also, honestly, reading through it - car crashes, hotel fires, counterfeiting, in-fighting with EDS, law suits out the wazoo... I'd totally watch that Netflix series

21:05 <mrvn> clever: they had like 20 compilers to build AmigaOS. it's a wonder how it all works together.

21:05 <clever> looks like the first centurion was made somewhere in 1973

21:05 <nikolar> mrvn: that's a bit ridiculous

21:05 <nikolar> clever: very intriguing

21:06 <clever> the way amiga handles relocations, seems to focus on a single register for all library state

21:06 <clever> base - offset, is a function pointer table, so you can call any exported functions

21:06 <clever> and base + offset is local vars, and then code

21:06 <nikolar> speaking of centurion - https://www.youtube.com/watch?v=c_KWEBjdc8k

21:06 <bslsk05> 'Emulating the Vacuum Tube Computer on the Centurion Minicomputer' by Usagi Electric (00:23:07)

21:06 <mrvn> clever: yes, A6 is the base register for library calls. The "this" pointer.

21:07 <clever> ive yet to find any similar api in centurion, but the relocation is pretty fancy

21:07 <clever> all executables, are made up of a series of records

21:07 <clever> each record has a type code, an 8bit length, a 16bit addr, a payload, and a checksum

21:07 <clever> type-0 records, just write up to 120 bytes to base+addr, you specify the base when loading

21:08 <clever> type-1 records, have a main addr, and a list of addresses

21:08 <clever> for each addr in the list, you read from base+addr (it got relocated), add base+mainaddr, and then write it back

21:08 <mrvn> clever: It might help to think of library calls as IPC. There is only one copy of the library in memory and all programs share the same address for it. The library gets initialized once and finalized once when the last program using it closes.

21:08 <clever> so its basically just an array of all constants, and you then just offset them

21:08 <clever> but the crazy part, is that this patching is done by microcode in the cpu

21:09 <clever> you just call a single CVX opcode, give it the base addr, and the start of a record, and it will execute that entire record

21:09 <clever> either copying 120 bytes, or modifying up to 60 16bit ints in ram
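
The two record types described above can be sketched in software. This is a loose model of the behaviour as described in the chat, not the actual microcode: it assumes the record header has already been parsed and its checksum verified, and it assumes 16-bit words are stored big-endian. `apply_record` and the sample values are hypothetical.

```python
def apply_record(mem: bytearray, base: int, rec_type: int, addr: int, payload):
    """Apply one already-parsed loader record to memory."""
    if rec_type == 0:
        # type-0: copy the payload bytes to base+addr.
        start = base + addr
        mem[start:start + len(payload)] = payload
    elif rec_type == 1:
        # type-1: payload is a list of 16-bit addresses, each naming a word
        # that was loaded earlier and now needs base+addr added to it.
        for target in payload:
            loc = base + target
            word = int.from_bytes(mem[loc:loc + 2], "big")  # assumed byte order
            word = (word + base + addr) & 0xFFFF
            mem[loc:loc + 2] = word.to_bytes(2, "big")
    else:
        raise ValueError(f"unknown record type {rec_type}")


mem = bytearray(64)
base = 16                                      # load address chosen by the loader
apply_record(mem, base, 0, 4, b"\x00\x08\xaa")  # load 3 bytes at base+4
apply_record(mem, base, 1, 0, [4])              # relocate the word at base+4
print(mem[20:23].hex())
```

After both records run, the word that was loaded as 0x0008 has become 0x0018, while the byte after it is untouched.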

21:10 <nikolar> The power of making custom hardware

21:10 <nikolar> And microcode :)

21:10 <clever> nikolar: but i also found an inefficiency, these records cant span across sectors on the hdd, so there is often 20-90 bytes wasted at the end of each sector

21:10 <clever> because a "copy 120 bytes" record didnt fit

21:11 <nikolar> The least they could do is change to 140b or something

21:11 <clever> but... if you just split that into "copy 90 bytes" and "copy 30 bytes"

21:11 <clever> then you can fill the sector up

21:11 <nikolar> Makes sense

21:12 <clever> nikolar: i think that 120 limit, comes from the FS being heavily record based, and the record length limit is ~128 bytes, and with the headers on a record, 120 gets to ~128
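
The effect of splitting records at sector boundaries can be estimated with a small simulation. The 400-byte sector and 120-byte payload limit come from the discussion; the 8 bytes of per-record overhead is an assumption based on the "120 gets to ~128" remark, and `pack` is a hypothetical helper.

```python
SECTOR = 400        # bytes per sector, per the discussion
OVERHEAD = 8        # assumed header+checksum bytes per record (120 -> ~128)
MAX_PAYLOAD = 120   # largest payload a single record carries


def pack(total_payload: int, split: bool = False):
    """Return (sectors_used, bytes_wasted_at_sector_ends)."""
    sectors, free, remaining, wasted = 1, SECTOR, total_payload, 0
    while remaining > 0:
        want = min(MAX_PAYLOAD, remaining)
        if free < OVERHEAD + want:
            if split and free > OVERHEAD:
                # Emit a shorter record that exactly fills the sector.
                want = free - OVERHEAD
            else:
                # Record does not fit: leave the tail unused, start a new sector.
                wasted += free
                sectors += 1
                free = SECTOR
                continue
        free -= OVERHEAD + want
        remaining -= want
    return sectors, wasted


plain = pack(1200)               # ten full-size records, never split
filled = pack(1200, split=True)  # split records to fill each sector
print(plain, filled)
```

With fixed 128-byte records only three fit per sector, leaving 16 bytes unused each time. Variable-length records would explain the larger 20-90 byte gaps mentioned above.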

21:12 <clever> everything, even text files, are just a series of records

21:12 <clever> text files, each line is a separate record in the file

21:12 <nikolar> Interesting

21:12 <nikolar> So no lines longer than 120?

21:12 <clever> i think so

21:12 <mrvn> but back then everything after column 65 was comment.

21:13 <mrvn> nikolar: the punch card only has 120 columns

21:13 <nikolar> mrvn: wasn't it 80

21:13 <mrvn> that wouldn't match with 120 byte records

21:14 <clever> i'm thinking it was just a char[128] buffer, plus some overheads

21:14 <clever> so 120 bytes of payload

21:17 <nikolar> interesting

21:17 <nikolar> didn't it have really weird hard drive format

21:17 <clever> all sectors are 400 bytes

21:18 <clever> i think thats a hold-over from when it was tape based

21:18 <nikolar> yeah that

21:18 <mrvn> SO you get 3 records per block and some padding?

21:18 <clever> mrvn: yep

21:19 <nikolar> also doesn't it have some chksum bytes per hd block too

21:19 <clever> yeah

21:19 <mrvn> Those people. tss. Only 2 ascii chars for the year and then they waste tons per blocks because the blocksize isn't a multiple of records.

21:19 <clever> nikolar: https://github.com/Nakazoto/CenturionComputer/wiki/CDC-Hawk-Drive#data-arrangement

21:19 <bslsk05> github.com: CDC Hawk Drive · Nakazoto/CenturionComputer Wiki · GitHub

21:20 <mrvn> 16 byte checksum?

21:20 <clever> nikolar: each sector on disk has a sync, 16bit sector addr (docs say a sector checksum, but this doesnt), a gap, a second sync, 400 bytes of data, and a 16bit checksum

21:20 <clever> mrvn: 16bit

21:20 <nikolar> That's a lot of metadata

21:21 <mrvn> 400 bytes of data is 3 records with 16 bytes left over

21:21 <clever> during writes, the controller will read that sync+addr, to confirm its on the right sector, then switch to write mode

21:21 <clever> and the gap before the 2nd sync, gives it time to switch modes

21:22 <mrvn> I wonder what happens when one of the address bits flips. The controller sees it's at the wrong position so it seeks?

21:22 <nikolar> Very interesting

21:22 vin has joined #osdev

21:22 <clever> mrvn: the sector-address is only written during a format, but if it gets the wrong addr during any operation, it will return an error code

21:23 <clever> its hard-sectored, so there is a wheel on the hdd, giving an index pulse 16 times per rotation

21:23 <clever> and it knows exactly where on the platter each sector should begin

21:23 <nikolar> so why is address necessary on the disk

21:23 <mrvn> So it's more a "hey, the motor is broken" error when the address is wrong.

21:23 <clever> nikolar: to detect errors, like the seek head being on the totally wrong track

21:23 <geist> heh this centurion history wiki page is some salacious stuff

21:24 <nikolar> Yeah it's fascinating

21:24 <clever> https://i.imgur.com/f93BgvS.png each bit on the hawk hdd, is stored in a 400 nSec window

21:25 <clever> you have a clock pulse running at 2.5mhz, and a "1" is encoded by having a second pulse between the clocks

21:25 <clever> while a "0" is just the clocks alone

21:26 <clever> each sync pattern, is 87 "0"'s, and then a single "1" bit

21:26 vin has quit [Client Quit]

21:26 <clever> because that 2.5mhz clock wont be in phase with the reader, and it has to re-align itself
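
The encoding described here (a clock pulse in every bit cell, plus an extra pulse for a "1") can be sketched as a list of pulse timestamps. The 400 ns cell follows from the 2.5 MHz clock. Placing the data pulse exactly mid-cell is an assumption, and `pulse_times` is a hypothetical name.

```python
CELL_NS = 400  # one bit cell = one period of the 2.5 MHz clock


def pulse_times(bits):
    """Return pulse timestamps in nanoseconds for a sequence of bits."""
    times = []
    for i, bit in enumerate(bits):
        t = i * CELL_NS
        times.append(t)                     # clock pulse at the start of every cell
        if bit:
            times.append(t + CELL_NS // 2)  # extra pulse between clocks encodes "1"
    return times


# Sync pattern from the discussion: 87 zero bits followed by a single one.
sync = [0] * 87 + [1]
sync_pulses = pulse_times(sync)

print(pulse_times([0, 1]))
print(len(sync_pulses), sync_pulses[-1])
```

The long run of clock-only cells gives the reader time to lock onto the clock phase, and the lone data pulse at the end marks where real data begins.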

21:27 <clever> https://i.imgur.com/FCcVMib.png the hard-sectoring wheel, each ring has a different number of sectors/track, and you move the sensor (top of frame) to select a sector/track setting

21:27 <clever> the hdd then counts that, and outputs a 4bit sector-number on the control ribbon

21:27 <clever> and the 2 quick pulses (bottom of frame) is the other index mark, to signal 1 complete rotation

21:28 immibis_ has joined #osdev

21:31 vin has joined #osdev

21:36 <vin> Hi, does anyone know how postgres or other row major databases store records on disk. Especially, when the records are of variable length and the entries of the record themselves can be of different sizes.

21:36 <vin> Is there a metadata maintained, that says at what offset new records begin and at what offset different entries in the record exists

21:36 <clever> vin: sqlite has good docs on that

21:37 Burgundy has joined #osdev

21:37 <mrvn> vin: DBs really hate variable sized records

21:37 srjek has quit [Ping timeout: 248 seconds]

21:37 <nikolar> Sqlite kind of suggests using sqlite databases as a replacement for tarballs lol

21:38 <clever> vin: https://sqlite.org/fileformat2.html section 2, schema layer

21:38 <bslsk05> sqlite.org: Database File Format

21:38 <clever> every row is a single variable length record, and then i think there is a b-tree pointing to where the record begins

21:39 <clever> behind the scenes, tables always have a primary key, if you dont assign one, it creates a rowid auto-increment column for that

21:39 <clever> and every database engine does this differently, mysql even supports multiple engines (innodb vs myisam)

21:43 <vin> Thanks clever I skimmed through the docs. So the length of each record or its entries does not seem to be explicitly stored anywhere right

21:43 <clever> for sqlite, yeah
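
One nuance, going by the SQLite file format document linked above: column lengths are not stored as separate length fields, but they can be recovered from the record header, which holds one "serial type" varint per column. The sketch below decodes only the sizes from a hand-built record. It is not SQLite's own code, the helper names are hypothetical, and the reserved serial types 10 and 11 are not handled.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a SQLite varint (1-9 bytes, big-endian); return (value, new_pos)."""
    value = 0
    for i in range(9):
        byte = buf[pos + i]
        if i == 8:
            # The ninth byte contributes all 8 of its bits.
            return (value << 8) | byte, pos + 9
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos + i + 1


# Body sizes in bytes for the fixed-width serial types 0..9.
FIXED = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 6, 6: 8, 7: 8, 8: 0, 9: 0}


def column_sizes(record: bytes):
    """Return the body size of each column, derived from the record header."""
    header_len, pos = read_varint(record, 0)
    sizes = []
    while pos < header_len:
        serial, pos = read_varint(record, pos)
        if serial >= 12:
            # Even values are blobs, odd values are text; both encode length.
            sizes.append((serial - 12) // 2)
        else:
            sizes.append(FIXED[serial])
    return sizes


# Hand-built record: (8-bit int 5, text "hi", NULL).
record = bytes([4, 1, 17, 0, 5]) + b"hi"
sizes = column_sizes(record)
print(sizes)
```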

21:44 <nikolar> so the main thing is the b-tree for metadata and record locations?

21:44 <clever> nikolar: yeah, that would tell you where every row starts

21:44 <nikolar> i should probably read about the on disk format

21:44 <vin> So for each primary key go to the appropriate leaf of B-tree and read until you encounter a NULL and then parse through the row for an entry

21:45 <nikolar> sqlite is a really neat library

21:45 <clever> section 1.6, b-tree pages, should have more

21:45 <clever> vin: i think the secondary indexes, the b-tree just holds the rowid(primary key), and then you scan that b-tree next

21:46 <clever> and once you have a record, you can find it in every index, because you know all the values

21:46 <nikolar> does it have a full on programming language interpreter for sql

21:47 <clever> nikolar: sqlite has its own bytecode, and it will compile your sql into that bytecode, then run it

21:47 <nikolar> wonder if jitting would help with speed at all

21:47 <clever> vin: near the start of the file, is the root page# for the "sqlite_master" table b-tree, which you can then query like any other table: https://gist.github.com/cleverca22/d39bbb6e06982b7e71aacce93f42e759

21:47 <bslsk05> gist.github.com: gist:d39bbb6e06982b7e71aacce93f42e759 · GitHub

21:47 <clever> that internal table, then holds the root page# for every other table

21:48 dutch has joined #osdev

21:48 <clever> along with the schema for how each table/index was made
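
The internal table can be inspected from any SQLite binding. A minimal sketch using Python's standard `sqlite3` module follows. The table `t` and index `t_b` are made-up names for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER, b TEXT)")  # hypothetical table
con.execute("CREATE INDEX t_b ON t (b)")           # hypothetical index

# sqlite_master lists every table and index with its b-tree root page
# and the SQL text that created it.
rows = con.execute(
    "SELECT type, name, tbl_name, rootpage, sql FROM sqlite_master"
).fetchall()

for row in rows:
    print(row)
```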

21:48 <clever> nikolar: an example of sqlite bytecode: https://gist.github.com/cleverca22/26c46201a9e63031add14b1b66890aba

21:48 <bslsk05> gist.github.com: gist:26c46201a9e63031add14b1b66890aba · GitHub

21:49 <nikolar> Huh interesting

21:49 <clever> updated the gist

21:49 <clever> the column command, expects a pointer to a record, as described earlier, and it copies a value from the record, to the output buffer

21:50 <clever> so when you do something like `select c,a,b from ...`, it can re-arrange the columns

21:50 <nikolar> Why the heck is chromium idling at 100% CPU usage in duo lingo

21:50 <nikolar> clever: wonder what unions look like

21:50 <mrvn> nikolar: not sure. it's hard to get it that low.

21:50 <clever> nikolar: it can also do math operations: https://gist.github.com/cleverca22/26c46201a9e63031add14b1b66890aba#file-gistfile3-txt-L9

21:50 <bslsk05> gist.github.com: gist:26c46201a9e63031add14b1b66890aba · GitHub

21:51 <vin> clever: thanks! if you need an entry of 10B in a row of 4kB this seems to be an inefficient way to store on disk

21:52 <nikolar> It's fast though

21:52 <mrvn> vin: you don't want read-modify-write ops in a DB

21:53 <clever> nikolar: a join: https://gist.github.com/cleverca22/26c46201a9e63031add14b1b66890aba#file-gistfile4-txt

21:53 <bslsk05> gist.github.com: gist:26c46201a9e63031add14b1b66890aba · GitHub

21:53 <vin> But you are reading 400x more data than what you actually want to read

21:53 <clever> vin: sqlite also operates on pages, and always loads a full page from disk, before doing anything with it

21:53 <geist> yah that's very b-tree like

21:54 <nikolar> vin: you can only ever read 4kb pages at once anyway

21:54 <nikolar> you can't read less from storage

21:54 <clever> `pragma page_size` reveals that my current database is using 1024 byte pages

21:54 <nikolar> or 512b if you have old hdds

21:54 <mrvn> nikolar: you can't mmap 512 byte

21:54 <clever> pagesize and fs cluster size dont have to match

21:55 <nikolar> clever: well now i am completely lost at that union

21:55 <clever> but when there is a mismatch, you get more RMW cycles on the hdd

21:55 <vin> nikolar: Modern storage devices offers smaller granularity such as 256B and 512B

21:55 <nikolar> do they?

21:55 <vin> Yup

21:55 <clever> nikolar: https://sqlite.org/opcode.html explains what each opcode does

21:55 <bslsk05> sqlite.org: The SQLite Bytecode Engine

21:56 <nikolar> i remember that it was explicitly mentioned to set 4kb pages for zfs

21:56 <clever> nikolar: zfs treats all files as having 128kb records by default

21:56 <nikolar> because using smaller is bad on ssds and they report having smaller pages for compatibility reasons

21:56 <nikolar> clever: i am talking about hardware pages mostly

21:57 <clever> ah, yeah, zfs calls that ashift, fat calls it cluster size

21:57 <mrvn> nikolar: no, ssds report a block size and erasure size. Block size is pretty much irrelevant.

21:57 <clever> the smallest unit you can use on-disk

21:57 <nikolar> clever: exactly, ashift=12

21:57 <clever> ssd's might internally operate on 4kb blocks, and when you do a 512b write, it has to RMW the whole 4k block

21:57 <nikolar> ie 4kb

21:57 <vin> Doing small writes is bad on SSD due to endurance, so usually it's write combined. That said there are new media types beyond flash that don't have this kind of bad endurance property

21:58 <clever> so telling zfs to do 4k writes avoids the RM and just does W

21:58 <nikolar> wonder why intel killed optane

21:58 <nikolar> seemed to be doing pretty well

21:58 <mrvn> clever: worse. they have to read 4k and copy them somewhere else because they can't erase 4k

21:58 <clever> nikolar: i heard that ntfs wasnt designed for a read-cache, and bodging it in caused major cpu overhead

21:58 <vin> nikolar: I think CXL is the answer

21:58 <nikolar> cxl?

21:58 <clever> nikolar: so intel then vendor-locked the optane drivers to certain intel cpu's, that were known to be able to handle it

21:59 <nikolar> clever: kek ms doing ms things

21:59 <clever> zfs on the other hand was designed to allow this kind of thing

21:59 <nikolar> clever: zfs really is the last word in filesystems lol

21:59 <nikolar> latest?

22:00 <clever> nikolar: with that sqlite union, the first thing it does is open both tables, and open 1 index, it then scans over validpaths (just iterating over every single row)

22:00 <nikolar> and then searches for those ids in the other table?

22:01 <clever> nikolar: i think addresses 7-13 fetch the deriver (join column) from that table, and then do an index based search on the 2nd table

22:02 <nikolar> yeah that's what i said, poorly

22:02 <clever> SeekGE's p1 is 2, that is the handle from the 3rd OpenRead, the index

22:02 <nikolar> that's actually exactly what i thought databases do

22:03 <clever> the main trick with sqlite, is that the `ResultRow` opcode, pauses the virtual machine, and returns back to the caller

22:03 <clever> then you can access the current result, and call `sqlite3_step()` again, to run the VM until it either has another ResultRow, or a Halt
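
The pause-and-resume behaviour described here maps naturally onto a generator. The following is a toy interpreter, not SQLite's code: the opcode names are borrowed from the discussion, but their operands and the `run` function are simplified assumptions made for illustration.

```python
def run(program, table):
    """Toy bytecode loop: yields at ResultRow, finishes at Halt."""
    regs = {}      # register file, like r[n] in the EXPLAIN output
    cursor = 0     # index of the current row in `table`
    pc = 0         # program counter
    while True:
        op, a, b = program[pc]
        if op == "Rewind":        # go to first row, or jump to b if table is empty
            cursor = 0
            pc = pc + 1 if table else b
        elif op == "Column":      # copy column a of the current row into register b
            regs[b] = table[cursor][a]
            pc += 1
        elif op == "ResultRow":   # hand registers a..a+b-1 to the caller and pause
            yield tuple(regs[a + i] for i in range(b))
            pc += 1
        elif op == "Next":        # advance; jump back to b while rows remain
            cursor += 1
            pc = b if cursor < len(table) else pc + 1
        elif op == "Halt":
            return


PROGRAM = [
    ("Rewind", 0, 4),
    ("Column", 0, 1),
    ("ResultRow", 1, 1),
    ("Next", 0, 1),
    ("Halt", 0, 0),
]

vm = run(PROGRAM, [("a",), ("b",), ("c",)])
print(next(vm))   # runs the VM until the first ResultRow
print(list(vm))   # keeps stepping until Halt
```

Each `next()` call plays the role of one step: the interpreter runs until it has a row to hand back, then stops exactly where it was until asked again.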

22:04 <clever> network based engines like mysql/postgresql, tend to gather many rows into a buffer, and then spit them over the network in bulk

22:04 <nikolar> Is limit done in the VM or in the c code

22:05 <clever> in the vm

22:05 <nikolar> interesting

22:05 <clever> nikolar: https://gist.github.com/cleverca22/26c46201a9e63031add14b1b66890aba#file-gistfile5-txt

22:05 <bslsk05> gist.github.com: gist:26c46201a9e63031add14b1b66890aba · GitHub

22:06 <clever> addr 1 initializes a local var to 5, DecrJumpZero will decrement it, and maybe jump

22:06 <nikolar> oh so it's nothing special

22:06 <nikolar> just a loop

22:06 <clever> kinda, its reusing the same array the result is in

22:06 <clever> and ResultRow offsets what you're looking at

22:06 <clever> the results are now in 2-10, not 1-9

22:07 <clever> so r[n] is a variable size array, that the bytecode can just store anything it wants into

22:07 <clever> Integer can put constants in, DecrJumpZero can manipulate it, Column copies from a record to r[n], ResultRow passes a slice of the array back as a result

22:09 <clever> nikolar: that loop is also there if you lack the limit clause, so the only thing DecrJumpZero is doing, is aborting when the specified count is reached
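
One way to check this yourself is to ask SQLite for the bytecode with and without the LIMIT clause and compare. A minimal sketch using Python's standard `sqlite3` module; the table `t` and column `x` are made-up names, and the exact opcodes listed depend on the SQLite version bundled with your Python.

```python
import sqlite3


def opcodes(sql):
    """Return the opcode names that EXPLAIN reports for one statement."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (x INTEGER)")  # hypothetical table
    # Each EXPLAIN row is (addr, opcode, p1, p2, p3, p4, p5, comment).
    ops = [row[1] for row in con.execute("EXPLAIN " + sql)]
    con.close()
    return ops


with_limit = opcodes("SELECT x FROM t LIMIT 5")
without_limit = opcodes("SELECT x FROM t")

print(with_limit)
print(without_limit)
# Opcodes that only show up when the LIMIT clause is present.
print(sorted(set(with_limit) - set(without_limit)))
```

On recent SQLite versions the difference should be small, matching the point that the row loop exists either way and the limit only adds a counter check.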

22:09 <nikolar> interesting

22:09 <nikolar> how is offset implemented

22:09 <clever> compare gistfile5 and gistfile2

22:09 <nikolar> does it have to go through all rows until it gets to the offset

22:10 <nikolar> clever: got it

22:10 <heat> geist, i guess the mem reserve mechanism fdt supports is kind of limited in platform description purposes

22:10 <heat> no way to fit e.g. a phandle or name

22:10 <geist> yah since it's in the header it's presumably some old mechanism that's basically deprecated by modern usages

22:10 Burgundy has left #osdev [#osdev]

22:10 <geist> would be easy for a piece of dumb firmware to fill it out, etc

22:11 <clever> nikolar: https://gist.github.com/cleverca22/26c46201a9e63031add14b1b66890aba#file-gistfile6-txt and 6 has a skip

22:11 <bslsk05> gist.github.com: gist:26c46201a9e63031add14b1b66890aba · GitHub

22:11 <clever> yeah, ive thought that memreserve was meant more for a dumb loader, that cant understand fdt

22:12 <geist> anyway, i dont even know how you fill it out using dts, since it doesn't seem to be expressed anywhere in the source

22:12 <clever> a special tag at the top, /memreserve/ i think

22:12 <clever> it should be in the dtc docs

22:13 <nikolar> so basically it has to go through all rows

22:13 <clever> but it can do that at the b-tree level, i assume

22:13 <clever> so its skipping thru leaves in the b-tree

22:14 <nikolar> if it has a where clause too, it couldn't skip i imagine

22:15 <heat> nikolar, zfs is the last word in garbage software

22:15 <nikolar> heat: why the hate lol

22:15 <heat> i call it how I see it

22:16 <heat> zfs is the most stupidly complex filesystem and filesystem driver

22:16 <heat> with very glaring flaws

22:16 <nikolar> It does its job ¯\_ (ツ) _/¯

22:16 <mrvn> heat: is there anything better though?

22:16 <nikolar> At least it never corrupted my drive like btrfs

22:16 <heat> ext4, btrfs

22:16 <heat> ntfs is decent too

22:17 <clever> nikolar: ive had btrfs go read-only, because a disk took too long to respond

22:17 <heat> i think xfs was or is nice

22:17 <mrvn> crap, more crap, bad

22:17 <heat> bull💩

22:17 <nikolar> honestly, the only fs i would trust for long term storage is zfs

22:17 <clever> heat: any fs ontop of mdadm, cant deal with re-reading the other half of the mirror when corruption is found, because the fs and raid are too isolated

22:18 <heat> ext4 is the most reliable, simple, performing fs

22:18 <nikolar> ext4 is reliable, until the bitrot sets in

22:18 <mrvn> heat: and doesn't have checksums everywhere

22:19 <nikolar> i have no issues with ext4, and it's great given how simple it is

22:19 <nikolar> but zfs is just something else

22:19 <heat> zfs has a nasty codebase, horrific design (2 journals or whatever the fuck that was?), unbounded complexity

22:19 <heat> bunch of edgecases

22:19 <clever> heat: one journal per filesystem, multiple filesystems in a pool

22:19 <nikolar> wouldn't know about the codebase

22:20 <heat> clever, sorry, whatever the hell ZIL is

22:20 <heat> it had two

22:20 <clever> the ZIL is very weird, compared to say ext4

22:20 <nikolar> and the point of having two is that you always have one valid

22:20 <nikolar> no matter what happens

22:20 <nikolar> so you can't corrupt the drive

22:20 <heat> riiiight

22:20 <clever> ext4 journal, i believe is a backup of metadata its actively modifying

22:21 <heat> what if we had some sort of, idk, thing where you could clone data?

22:21 <heat> or a way to log the operations you do on a filesystem

22:21 <mrvn> clever: by default the journal is only metadata

22:21 <clever> ive not seen anything about the ZIL being duplicated

22:21 <clever> mrvn: and when in data mode, does it write all data twice? to the journal, then the fs?

22:21 <heat> clever, no, it's about you committing shit to the ZIL, and then needing to recommit to the actual data store

22:21 <heat> it's so fucking idiotic

22:21 <clever> heat: ah, thats only done for small writes

22:21 <clever> large writes dont do that

22:22 <heat> yes, and it breaks normal filesystem semantics

22:22 <heat> I took the time to submit a bug report and I had 0 engagement from anyone in the zfs team

22:22 <clever> to do with filesize?

22:22 <heat> block count

22:22 <clever> ah yeah, that one

22:23 <heat> and *fsync()* does not actually sync

22:23 <heat> it's somehow much worse

22:23 <clever> i believe the ext3/4 journal, is purely an overlay, so reading block 42 gives you what the journal has, rather than the real block 42

22:23 <clever> because all metadata is edited in place, and thats the only way to not corrupt things

22:23 <mrvn> clever: that's kind of the point of the journal

22:23 <clever> but CoW FS's like zfs, avoid that problem entirely

22:23 <clever> the old metadata isnt modified

22:24 <clever> it just makes new metadata in free space

22:24 <mrvn> except for the ZIL

22:24 <nikolar> i'd be happy to switch when we get a filesystem that can do what zfs does, and is simpler

22:24 <clever> the ZIL is a shortcut, to commit things to disk without having to re-write the entire metadata tree

22:24 <nikolar> but at the moment, the zfs is probably the best we have

22:24 <mrvn> clever: you mean a journal? *wonder*

22:24 <clever> each ZIL block, is pointing to a future ZIL block, that is free at the time you pointed to it

22:25 <clever> and the ZIL doesnt store blocks being written, but is more of a record of write() syscalls

22:25 <heat> if by "do what zfs does" you mean store files and directories, may I introduce you to ext4?

22:25 <clever> so you can replay the last few actions, and rebuild the dirty state
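
The idea of logging operations rather than blocks, then replaying them after a crash, can be sketched in a few lines. This is a toy model and not ZFS's on-disk format: `apply_write`, `replay`, and the `(name, offset, data)` tuple shape are all assumptions made for the example.

```python
def apply_write(files, op):
    """Apply one logged write, given as (name, offset, data), to in-memory files."""
    name, offset, data = op
    buf = files.setdefault(name, bytearray())
    end = offset + len(data)
    if len(buf) < end:
        buf.extend(b"\0" * (end - len(buf)))  # grow the file with zeros
    buf[offset:end] = data


def replay(committed, intent_log):
    """Rebuild the dirty state: last committed tree plus the logged writes, in order."""
    files = {name: bytearray(content) for name, content in committed.items()}
    for op in intent_log:
        apply_write(files, op)
    return files


committed = {"a": b"hello"}                        # state as of the last full commit
log = [("a", 5, b" world"), ("b", 0, b"x")]        # writes synced since then
print(replay(committed, log))
```

The committed state is never edited in place here; the log only describes what to redo on top of it, which is the distinction being drawn against a block-level journal.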

22:25 <nikolar> no i meant snapshots, functional raid, clones, etc

22:25 <mrvn> clever: for me the point of a COW filesystem was to get away from journaling.

22:25 <heat> easy, btrfs

22:26 <mrvn> heat: btrfs doesn't perform on nearly full filesystems. So basically never.

22:26 <clever> mrvn: if you set sync=disabled, then zfs basically never touches the zil, and all writes go thru the CoW tree

22:26 <clever> but i think it just ignores sync() entirely as well

22:26 <clever> so you can lose the last few seconds-minutes of changes

22:26 <mrvn> clever: unusable

22:26 <clever> but they will at least be in-order and atomic, i believe

22:26 <nikolar> well the first time i tried using btrfs, it ate my data, so no thanks :)

22:26 <nikolar> not yet at least

22:26 <clever> i'm fuzzy on the exact semantics, need to study it more

22:27 <clever> mrvn: zfs also performs poorly on a full disk, i think most FS's do

22:27 <nikolar> yeah most do

22:27 <mrvn> clever: scales much better till you get real close to 100%.

22:28 <mrvn> and I haven't run into an ENOSPC on "rm file" yet.

22:28 <nikolar> kek

22:28 <clever> for zfs specifically, the disk is broken into groups (same as ext), and each group has its own free-space histogram and list

22:28 <clever> by default, if it cant find a big enough hole in the loaded groups, it just gives up and fragments the record

22:29 <mrvn> if you get down to the last 8 groups or so zfs slows down drastically.

22:29 <nikolar> does it never compact the existing data

22:29 <clever> zfs.zfs_metaslab_try_hard_before_gang=1 forces it to scan every group in the pool, but that slows it down more

22:29 <mrvn> nikolar: no, that would possibly corrupt that data

22:29 <clever> nikolar: nope, once a record is fragmented, its stuck that way, until you delete it

22:29 <mrvn> nikolar: it has a defrag that will copy fragmented data to new groups.

22:29 <clever> and fragmented blocks are a huge waste of space, ive seen a file take up double its size

22:29 <nikolar> i know that lfs compacted the existing data when it was running low

22:30 <nikolar> and zfs took a lot of ideas from it

22:30 <nikolar> don't know how exactly zfs does everything though

22:30 <clever> this is where ext2/3, ext4, and zfs differ some

22:30 <mrvn> you can also resilve the FS if you change the stripe count or such.

22:30 <mrvn> resilver

22:30 <clever> ext2/3 stored a file as a big array of clusters, you pick a cluster size at format time, and then it has to store the array of cluster#'s in the fs indirection tree

22:31 <clever> and if a file isnt fragmented, you waste space storing every number from 10 to 30 in a block

22:31 slidercrank has quit [Ping timeout: 246 seconds]

22:31 <nikolar> yeah which makes it a bad idea to resize the fs

22:31 <mrvn> clever: that overhead is really irrelevant.

22:31 <clever> ext4 switches to extent trees, where it just says block X of the file, starts at block Y on the disk, and is Z blocks long
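
An extent lookup along the lines described (file block X starts at disk block Y and runs for Z blocks) might look like the sketch below. It uses a flat sorted list instead of a real on-disk tree, and the extent values and the `lookup` name are invented for the example.

```python
from bisect import bisect_right

# Each extent is (first_file_block, first_disk_block, length_in_blocks),
# kept sorted by first_file_block. The numbers are made up.
EXTENTS = [(0, 1000, 8), (8, 5000, 16), (24, 200, 4)]


def lookup(extents, file_block):
    """Map a file block number to a disk block number, or None if unmapped."""
    starts = [e[0] for e in extents]
    i = bisect_right(starts, file_block) - 1   # last extent starting at or before it
    if i < 0:
        return None
    start, disk, length = extents[i]
    if file_block >= start + length:
        return None                            # falls in a hole or past the end
    return disk + (file_block - start)


print(lookup(EXTENTS, 10))
print(lookup(EXTENTS, 27))
print(lookup(EXTENTS, 28))
```

Three tuples cover 28 blocks here, where a per-block array would need 28 entries, and a sequential read only has to consult one extent per contiguous run.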

22:32 <mrvn> extents are more about speed than disk space.

22:32 <clever> so the larger your fragments, the less space you waste on metadata

22:32 <clever> yeah, extents let you handle massive files and small blocks, without worry

22:32 <mrvn> faster to find the block for an offset, easier to do a sequential read for contiguous data

22:32 <clever> zfs instead has records, a file is broken up into blocks of say 128kb

22:33 <clever> that block is then compressed, and rounded up to 2^ashift (4kb for example)

22:33 <clever> and then that chunk is put onto the disk, and a pointer to it is stored in the indirection tree
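
The compress-then-round-up step can be shown with plain arithmetic. A rough sketch only: `zlib` stands in for whatever compressor a pool is configured with, `allocated_size` is a hypothetical helper, and real ZFS has extra rules (for example about when compression is worth keeping) that are not modelled here.

```python
import zlib


def allocated_size(record, ashift=12, compress=True):
    """Bytes of disk one record would occupy after rounding up to 2**ashift."""
    payload = zlib.compress(record) if compress else record
    sector = 1 << ashift                        # ashift=12 means 4096-byte units
    return -(-len(payload) // sector) * sector  # ceiling division, then scale back


record = b"abcd" * 32768                        # one highly compressible 128 KiB record
print(len(record))
print(allocated_size(record))                   # compressed, rounded up to one 4 KiB unit
print(allocated_size(record, compress=False))   # stored as-is
```

The same 128 KiB logical record can end up occupying anywhere from one allocation unit to the full record size, which is why a record is not the same thing as a fixed 128 KiB on-disk block.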

22:33 <nikolar> *optionally compressed and encrypted

22:33 <clever> yep

22:33 <mrvn> zfs can't really do extents because of compression and checksumming

22:34 <clever> but its different from just setting the ext block size to 128kb, because a block can occupy less than 128kb

22:34 <clever> and also the whole record system

22:34 <clever> its basically 2 different block systems, stacked

22:34 <mrvn> encryption is fooie in zfs though

22:34 heat has quit [Remote host closed the connection]

22:34 <clever> the file is made of a series of 128kb blocks, but those may be compressed, and then laid down on a 4kb block disk

22:35 Burgundy has joined #osdev

22:35 <clever> but the file block size can be changed fs-wide

22:36 <clever> mrvn: as for speed, ext2/3 was nice, in that you could pre-compute your path down the indirection tree, read block X, look at offset X1, read the block from there, look at offset Y1

22:37 <clever> and then just fire off a chain of IO's, but its sequential, so you had to wait for each single read

22:37 <nikolar> mrvn: fooie?

22:37 <mrvn> nikolar: has flaws

22:37 <clever> ext4 is more cpu intensive, and you have to parse the whole block, before you know which block is next, but there are likely fewer IOs in total

22:37 <nikolar> i am aware

22:37 <nikolar> i know that metadata is never encrypted

22:37 <nikolar> natively

22:38 <mrvn> clever: you can parse a whole lot of extents in the time it takes a spinning disk to read a block.

22:38 <clever> yeah, and thats where extent trees win

22:38 <mrvn> nikolar: they also messed up the HMAC or something. Can't remember the details but you can edit the disk and it won't notice.

22:38 <nikolar> at least you can't read the data

22:39 <clever> mrvn: i dont see how thats possible, given all the checksums going on

22:39 <mrvn> clever: you replay another block which has correct checksums inside

22:39 <nikolar> the encryption layer wouldn't notice i imagine

22:39 <nikolar> not the rest

22:39 <clever> mrvn: the block holding that checksum, is also checksummed

22:39 <clever> its checksums all the way to the root!
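
"Checksums all the way to the root" is the hash-tree idea: each parent stores the hash of its children, so changing any leaf changes the root. A minimal sketch with SHA-256 and a binary tree; the real structure is much wider and stores the hashes inside block pointers, so treat `root_checksum` as an illustration only.

```python
import hashlib


def h(data):
    return hashlib.sha256(data).digest()


def root_checksum(blocks):
    """Hash every leaf, then hash pairs of child hashes until one root remains."""
    level = [h(b) for b in blocks]              # expects at least one block
    while len(level) > 1:
        level = [h(b"".join(level[i:i + 2])) for i in range(0, len(level), 2)]
    return level[0]


blocks = [b"block0", b"block1", b"block2", b"block3"]
tampered = [b"block0", b"BLOCK1", b"block2", b"block3"]

print(root_checksum(blocks).hex())
print(root_checksum(blocks) == root_checksum(tampered))
```

Note that the hashes are unkeyed: anyone able to rewrite blocks can also recompute every parent up to the root, which is the gap the following messages get into.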

22:39 <clever> so you would need a zfs aware tool to modify the whole tree

22:39 <mrvn> clever: obviously

22:40 <clever> but yeah, i can see the issue

22:40 <clever> zfs lets you send an encrypted dataset to an untrusted party that lacks the keys

22:40 <clever> and in that state, its just a series of numbered files, each made up of an array of <=128kb objects

22:40 <mrvn> but when I encrypt my FS I would expect that nobody can read my data but also that nobody can modify my data

22:41 <clever> and using zfs tooling, you can replace any record or delete any file

22:41 <clever> you just wont know what you're messing with, because the directory tree is also encrypted

22:41 <clever> i can see why you would want hmac ontop of checksum

22:42 <mrvn> clever: I want the checksum connected to the encryption. Not separate

22:42 <mrvn> if someone can alter the checksum without the encryption key then they can swap out encrypted blocks.
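
The difference between a plain checksum and one tied to a key can be sketched with the standard `hmac` module. This is a generic illustration of the property being asked for, not a description of how ZFS encryption works; `keyed_mac`, `verify`, and the idea of mixing a block id into the MAC are assumptions made for the example.

```python
import hashlib
import hmac


def plain_checksum(block):
    """Unkeyed: anyone who can rewrite the block can recompute this."""
    return hashlib.sha256(block).digest()


def keyed_mac(key, block_id, block):
    """Keyed and bound to a position, so a block cannot be replayed elsewhere."""
    msg = block_id.to_bytes(8, "big") + block
    return hmac.new(key, msg, hashlib.sha256).digest()


def verify(key, block_id, block, tag):
    return hmac.compare_digest(keyed_mac(key, block_id, block), tag)


key = b"k" * 32                     # placeholder key for the example
tag = keyed_mac(key, 1, b"ciphertext")

print(verify(key, 1, b"ciphertext", tag))        # same block, same slot
print(verify(key, 2, b"ciphertext", tag))        # same block moved to another slot
print(verify(b"x" * 32, 1, b"ciphertext", tag))  # wrong key
```

Without the key an attacker cannot produce a valid tag for a swapped or edited block, whereas `plain_checksum` can be recomputed by anyone.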

22:43 <clever> yeah, that as well

22:43 <clever> for that, you may want an hmac on the root of the dataset

22:43 <clever> hmmm, but with how indirect blocks work, that would be invalidated upon zfs send

22:43 <clever> it would need to be a custom hmac of just the hmac's

22:44 <clever> rather than the metadata

22:46 <clever> a "block pointer" in zfs, is a 128 byte object, that holds the hash of the data its pointing to, and metadata about which compression and hash algo was used, if its encrypted, if its fragmented, if its under dedup, the size before/after compression, and up to 3 pointers of where it is on-disk (certain important metadata is stored 2 or 3 times)

22:46 <clever> but encrypted records, reuse one of those pointers for crypto state, so it can only store 2 copies

22:46 <clever> an indirect block, is then just a big array of those, compressed, and stored as another record, with a new block-pointer pointing to it

22:47 <clever> so its much more like ext2/3, where you know the depth of the tree, and which index to use at each layer
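
Knowing the depth and the index at each layer comes down to splitting the block number into base-N digits, where N is the number of pointers per indirect block. A small sketch: the 128-byte pointer size is from the discussion above, while the 128 KiB indirect block size and the `indirect_path` helper are assumptions for the example.

```python
def indirect_path(block_index, pointers_per_block, depth):
    """Return the slot to follow at each level of the tree, root first."""
    path = []
    for level in range(depth - 1, -1, -1):
        # How many data blocks one slot at this level spans.
        span = pointers_per_block ** level
        path.append((block_index // span) % pointers_per_block)
    return path


# Assumed sizes: 128 KiB indirect blocks holding 128-byte block pointers.
POINTERS = (128 * 1024) // 128

print(POINTERS)
print(indirect_path(1_500_000, POINTERS, 3))
```

Because the path depends only on the block number, it can be computed up front before any IO is issued; the reads themselves still have to happen one level at a time.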

22:49 <mrvn> clever: holes reduce the tree

22:49 <clever> yeah, a sparse hole, would be all nulls in the indirect block, which then compresses down

22:49 <clever> and depending on the size of the file, entire branches of the indirect tree may be missing

22:50 <clever> size of the hole*

22:52 <mrvn> "Skip intro", the best invention since binge watching episodes.

22:52 <nikolar> kek

22:53 Gooberpatrol66 has quit [Quit: Leaving]

23:18 dude12312414 has joined #osdev

23:20 <zid> https://cdn.discordapp.com/attachments/347254691966615552/1071903147322265740/image0.jpg

23:21 dude12312414 has quit [Client Quit]

23:59 fedorafansuper has joined #osdev