klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
<clever> my nas is running low on 128kb sized holes, and fragmentation can double the size of a file
<nikolapdp> but that's only an issue if you approach 100% used space
<clever> so ive been having to reduce the recordsize to prevent that
rpnx_ has joined #osdev
<nikolapdp> makes sense
<mjg> getting yourself fucked with snapshots is partially PEBKAC
hunta987 has joined #osdev
<nikolapdp> it is
<mjg> wow, got a system old enough that 'boot disk' means a literal floppy
<nikolapdp> lol
<clever> nikolar: ah found it, the free space histograms, say my nas has 188gig free, df claims 27gig free
<clever> that 161gig difference, is the slop space
<nikolapdp> yeah, one accounts for the reserved space
<clever> that both lets things work when "full", and also stops you from making the fragmentation get too bad
<bslsk05> ​github.com: zfs/module/zfs/spa_misc.c at master · openzfs/zfs · GitHub
<clever> ah, its basically just a 1 / (1<<spa_slop_shift) fraction that gets reserved
<clever> so at 5 (the default), 3.125% is reserved
<clever> so earlier, when i changed it to 4 and got 0 free, its because i doubled the reservation, to 6.25%
<nikolapdp> yeah checks out
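A minimal sketch of the slop-space math described above, using a hypothetical 8 TB pool; the real logic is spa_get_slop_space() in spa_misc.c, which (as far as I know) also clamps the result to a minimum and maximum not shown here:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t pool_bytes = 8ULL << 40;   /* hypothetical 8 TB pool */
        for (int spa_slop_shift = 4; spa_slop_shift <= 5; spa_slop_shift++) {
            /* 1/(1<<shift) of the pool is held back as slop */
            uint64_t slop = pool_bytes >> spa_slop_shift;
            printf("shift=%d reserves %.3f%% (%llu bytes)\n",
                   spa_slop_shift, 100.0 / (1 << spa_slop_shift),
                   (unsigned long long)slop);
        }
        return 0;
    }

At the default shift of 5 that works out to 3.125%; dropping the shift to 4 doubles the reservation to 6.25%, matching the behaviour described above.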
<mjg> clever: what's your biggest zfs pool
<clever> mjg: 3 x 4tb in a raidz1 array
<clever> so 8tb of capacity, with 4tb of parity
<clever> but, those numbers are also fuzzy, because of how raidz1 stores shorter records
goliath has quit [Quit: SIGSEGV]
<mjg> rookie numbers man
<clever> lets say the block size is 4kb, and i write an 8kb record to disk, raidz1 will split into 4k+4kb of data, and 4kb of parity, and spread it over 3 disks
<nikolapdp> we're not rich here mjg
<mjg> not to throw shade!
<clever> but if i write a 4kb record, raidz1 will basically degrade down to a mirror, and store just 2 copies
<clever> so the parity overhead varies, depending on the size of the record
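A rough model of that varying parity overhead, with hypothetical numbers matching the example above; the real allocator (vdev_raidz_asize() in vdev_raidz.c) also rounds allocations up, which this sketch ignores:

    #include <stdio.h>
    #include <stdint.h>

    /* one parity sector per "row" of data sectors across the data columns */
    static uint64_t raidz1_asize(uint64_t record, uint64_t sector, int ndisks)
    {
        uint64_t data_sectors = (record + sector - 1) / sector;
        int data_cols = ndisks - 1;     /* 1 disk's worth goes to parity */
        uint64_t parity_sectors = (data_sectors + data_cols - 1) / data_cols;
        return (data_sectors + parity_sectors) * sector;
    }

    int main(void)
    {
        /* 3-disk raidz1 with 4 KiB sectors, as in the example above */
        printf("8 KiB record -> %llu KiB on disk\n",
               (unsigned long long)raidz1_asize(8192, 4096, 3) / 1024);   /* 12 KiB, 50%% overhead */
        printf("4 KiB record -> %llu KiB on disk\n",
               (unsigned long long)raidz1_asize(4096, 4096, 3) / 1024);   /* 8 KiB, mirror-like */
        return 0;
    }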
<mjg> what's that, your local nas for storing movies 'n shit?
<mjg> or a backup server
<clever> mjg: i still need to get the 3 x 16tb sas drives going
<mjg> (perhaps both?)
<clever> both and everything, lol
<clever> its got backups of a laptop i had over a decade ago
<nikolapdp> same actually lol
<clever> its got more shows than i could possibly rewatch
<clever> i think it has a corrupt ext4 volume from when lvm ate things
<clever> its also got a chunk of my windows steam library, shared over iscsi
<mjg> lol
<clever> the windows machine was full, the nas wasnt, samba hates steam, so iscsi it was :P
<nikolapdp> neat
rpnx_ has quit [Ping timeout: 256 seconds]
<clever> i also ran talos principle 2 over iscsi like that for a few days, and ouch, the LOD generation is shit
<kof673> "a most DE-LIGHTful shade" </alchemy land is almost 180 of modern things>
<heat57> clever: ext4 got "fast commit" which is basically just logging ext4 operations instead of block writes
<clever> the low LOD models are auto-generated, and look horrible
<clever> and due to the fragmentation on my nas, it can take minutes to load
<clever> then i freed up space on my nvme, and it loads so fast i cant even catch the low LOD model, lol
<mjg> i have to say the old bsd installers i used to consider shite back in the day (2005-ish)
<mjg> beat the shit out of linux installers in the same timeframe
<mjg> at least ubuntu and slackware
<mjg> erm, debian
<mjg> and slackware
<clever> ive installed LFS before, so gentoo and nixos were easy mode :P
<mjg> it's not about being easy or not, it's about being CUMBERSOME or not
<mjg> fucking xfree86 configuration vibes
<clever> oh, that reminds me, the first time i switched to wpa_supplicant, because i was away from home and was forced to go wpa, lol
<clever> and the problem, is looking up how it works, when you dont have wifi :P
<mjg> clever: there is a funny quote somewhere in freebsd's kernel developer guide
<mjg> clever: they recommend printing the docs because "it is difficult to read online documentation while single-stepping the kernel"
<clever> lol
Matt|home has quit [Quit: Leaving]
<clever> thats why i basically always install a new nixos machine over ssh
<clever> so i have a working env with copy/paste and all of my usual tools
<clever> nixos does allow making a custom iso that has all of your usual tools, but it wont have your state (chrome bookmarks and such)
<clever> and the usual problem of fitting 2 keyboards and mice on one desk :P
<clever> when you can just throw the pc in a random corner, plug in a cat5, and ssh into it
<clever> 8.5G -rw-r--r-- 1 root root 8.3G Apr 17 2017 /nas/backups/dev-server/sda.img.gz
<clever> ah, and here is an example of a fragmented file in zfs
<clever> 8.3gig of data, is taking up 8.5gig of space
<clever> 186M of overhead
<clever> [root@nas:~]# ls -lhsi /nas/backups/dev-server/sda.img.gz
<clever> 596386 8.5G -rw-r--r-- 1 root root 8.3G Apr 17 2017 /nas/backups/dev-server/sda.img.gz
<clever> if i add -i, i can see the inode#
<clever> [root@nas:~]# time zdb -ddddddddbbbbbbbb naspool/nas 596386 > fragmented-files
<clever> and this will then dump all of the metadata for that file
rpnx_ has joined #osdev
<bslsk05> ​gist.github.com: gist:0c93ea2fb28720d60b61894a8e7beda8 · GitHub
<clever> the `gang` flag in that listing, means the block is fragmented
bauen1 has joined #osdev
<clever> the record on line 30 for example, if i take the 2 sizes listed (0x188000 + 0x2000), then i get 1.5mb, but the dblk on line 4 says it should be 1mb
<kof673> the only thing i ever had an issue with on freebsd install, is i believe you need to manually glabel and gmirror ahead of time if you want those, don't know if they ever added that (or zfs now i suppose)
<clever> ah, but because of raidz1, it should be 1.5mb (exactly 50% overhead), but its actually 1.5390625mb
<clever> an extra 40kb wasted, on that record then
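The arithmetic behind those numbers, taking the two allocated sizes from the zdb listing against a 1 MiB record with 50% raidz1 parity overhead:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t asize    = 0x188000 + 0x2000;      /* the two sizes from the listing           */
        uint64_t expected = (1ULL << 20) * 3 / 2;   /* 1 MiB record + 50%% parity = 1.5 MiB      */
        printf("allocated: %.7f MiB\n", asize / 1048576.0);               /* 1.5390625           */
        printf("extra:     %llu KiB\n",
               (unsigned long long)(asize - expected) / 1024);            /* 40 KiB lost to the gang block */
        return 0;
    }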
<clever> kof673: for zfs, you can upgrade a lone disk into a mirror at any time, and downgrade it back into a lone disk
<clever> converting between single/mirror, and raid i dont think is possible, and removing a disk i dont think is possible (just replacing it with the same type, and same or larger size)
<kof673> you can do that, just you have to create the mirror on one disk first :D
rpnx_ has quit [Ping timeout: 260 seconds]
<kof673> the installer did not do that :D
Arthuria has quit [Ping timeout: 268 seconds]
<clever> ah, for zfs, you dont need that, it can seamlessly turn a single disk into a mirror at any time
<clever> all that really does, is change some top level metadata, and clone all of the data over
<kof673> ^^^ yeah so if you create them ahead of time, then you can just point the installer there
<kof673> or it is all "transparent" the installer is none the wiser
<clever> have you seen the labels in the vdev stuff?
<kof673> i have no idea what vdev stuff is like /dev/label? yes there was that, or something similar
<bslsk05> ​gist.github.com: desktop · GitHub
<clever> basically, a vdev is a disk within the pool
<clever> when doing raid, the entire raid collection is considered a single vdev
<clever> so if you do raidz1 over sda/sdb/sdc, then that whole set is a single raidz1 vdev
<clever> the gist i linked, shows the headers at the start of a single partition
<kof673> oh yeah, i just meant it makes entries like that: > path: '/dev/disk/by-id/wwn-0x50014ee2654606d2-part2'
<kof673> so fstab/whatever can use those IIRC
<clever> on my desktop, you can see the headers define a single disk (no redundancy)
<clever> the path in there, is a performance thing, when mounting the pool, you can look there first
<clever> but if the disk cant be found there, you can take the guid from desktop line 19, and just search every disk in the box
<clever> but when you start to get 100+ disks in a box, that search could become costly
<kof673> what i do know, the geom stuff is all producer/consumer so i believe it is supposed to be entirely transparent to whatever else you do (e.g. what filesystem you create on it), and arbitrary stacking of whatever components
<clever> that sounds like lvm
<kof673> so there is also an encryption thing etc.
<clever> if you look at my nas in the above gist, line 18 says the type is raidz, line 21 says 1 parity disk, and then lines 28-51 define the children, what guid to expect for each member of the raidz1
carbonfiber has quit [Quit: Connection closed for inactivity]
<kof673> yeah, i just use labels for those :D
<clever> zfs uses guid's all over the place
<clever> for my nas, lines 31/39/47 define the guid of each child of the raidz1, line 20 is then the guid of the whole raidz1 vdev, and line 13 is the "top guid", the guid for the root containing all data
<clever> in this case, the raidz1 and top guid match, so the raidz1 is the root
troseman has quit [Quit: troseman]
<clever> but, if you instead did `raidz1(sda,sdb,sdc) + raidz1(sdd,sde,sdf)`, the top guid wouldnt match
<clever> so then you need to search for other things, with the same pool_guid, and assemble the parts
<clever> the pool_guid is the closest thing to an ext4 uuid
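A simplified sketch of the relationships just described; this is not the real on-disk nvlist layout, just the fields being talked about:

    #include <stdint.h>

    struct vdev {
        uint64_t     guid;        /* this disk's (or raid group's) guid            */
        const char  *path;        /* hint like /dev/disk/by-id/..., checked first  */
        int          nparity;     /* 1 for raidz1                                  */
        struct vdev *children;    /* member disks, if this vdev is a raid group    */
        int          nchildren;
    };

    struct vdev_label {
        uint64_t    pool_guid;    /* shared by every disk in the pool (closest thing to an ext4 uuid) */
        uint64_t    top_guid;     /* guid of the top-level vdev this disk sits under                  */
        uint64_t    guid;         /* guid of this particular disk                                     */
        struct vdev vdev_tree;    /* raid layout and expected child guids                             */
    };

When a pool has more than one top-level vdev (the raidz1 + raidz1 case above), each label's top_guid says which group its disk belongs to, and import reassembles the tree from everything sharing the same pool_guid.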
carbonfiber has joined #osdev
rpnx_ has joined #osdev
rpnx_ has quit [Ping timeout: 245 seconds]
<kof673> *or, freebsd maybe i didn't even use the installer, just make your label and mirror, put an MBR at some point, make FSses, mount, then untar IIRC lol
<kof673> that is all the base system "sets" were, is tar files
rpnx_ has joined #osdev
netbsduser` has quit [Ping timeout: 256 seconds]
xelxebar has quit [Ping timeout: 240 seconds]
xelxebar has joined #osdev
Arthuria has joined #osdev
xFCFFDFFFFEFFFAF has joined #osdev
Arthuria has quit [Ping timeout: 268 seconds]
Arthuria has joined #osdev
Arthuria has quit [Killed (NickServ (GHOST command used by Guest684531))]
Arthuria has joined #osdev
vaihome- has quit [Remote host closed the connection]
<clever> kof673: gentoo is the same, the stage3 tarball is a complete working userland, but no kernel
<clever> kof673: unpack that to a rootfs, chroot in, build/install a kernel, setup the bootloader
FreeFull has quit []
FreeFull has joined #osdev
heat57 has quit [Quit: Client closed]
heat has joined #osdev
xFCFFDFFFFEFFFAF has quit [Ping timeout: 240 seconds]
edr has quit [Quit: Leaving]
Arthuria has quit [Ping timeout: 240 seconds]
xFCFFDFFFFEFFFAF has joined #osdev
<bslsk05> ​github.com: MS-DOS/v4.0/src/CMD/FDISK/FDISK.C at main · microsoft/MS-DOS · GitHub
<heat> pascal in C
<kazinsal> oh YEAH that's the kind of cursed shit I love to see
<kazinsal> same thing is used in the SELECT.EXE source
<clever> is that just { and } behind a #define??
<kazinsal> you know it
<clever> i'm surprised at how many comments are there, back when disk space was costly, but its nice to see it self-documenting so well
<hunta987> >To whoever winds up maintaining this, I will apoligize in advance. I had just learned 'C'
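The kind of macros being referenced, as a sketch rather than the actual MS-DOS header:

    #include <stdio.h>

    /* pascal-style keywords mapped onto C punctuation with #define */
    #define PROCEDURE void
    #define BEGIN     {
    #define END       }
    #define IF        if (
    #define THEN      )

    PROCEDURE greet(void)
    BEGIN
        IF 1 THEN
            printf("hello from pascal-flavoured C\n");
    END

    int main(void)
    BEGIN
        greet();
        return 0;
    END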
<clever> 403180: c3 ret
<clever> 403181: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
<clever> why might gcc be generating such large junk after the ret?
<clever> its just a no-op function, a ret and nothing more
<kazinsal> padding
<clever> why not pad with nop's?
<heat> it's a nop
<clever> and yeah, it is starting each function on 16 byte alignment
<Mutabah> it's a multi-byte nop
<kazinsal> looks cooler
<clever> lol
<heat> it's more efficient
<Mutabah> I assume that it's cheaper for the CPU to decode one long instruction than to decode lots of small ones
<heat> yep
<clever> but its after a ret, so it shouldnt even be decoding it
<heat> and CPUs recognize certain forms of the nop instructions
<heat> LOL clever you're funny
<heat> in any case yes you could do it with int3's
<clever> i know there is some prefetch and look-ahead, but to blaze right thru an unconditional ret??
<heat> yep
<heat> SLS (straight line speculation) hardening does "ret; int3"
<heat> and there's a performance impact
<clever> another thing, i was wanting to try and optimize the binary for size, but nm claims the function is 1 byte, and the padding adds 15
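A minimal way to reproduce what's being discussed, assuming gcc -O2 on x86-64 (the exact padding bytes vary by compiler and version):

    /* pad.c: nothing() compiles to a single one-byte ret; the next function is
     * aligned to 16 bytes and the gap is filled with one multi-byte nop rather
     * than fifteen single-byte ones.  Inspect with:
     *   gcc -O2 pad.c -o pad && objdump -d pad
     *   nm -S pad            (reports the 1-byte symbol size, padding excluded)
     */
    void nothing(void)
    {
    }

    int main(void)
    {
        nothing();
        return 0;
    }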
Arthuria has joined #osdev
dude12312414 has joined #osdev
xFCFFDFFFFEFFFAF is now known as ignoratio
ignoratio is now known as ignorratio
ignorratio is now known as ignORratio
ignORratio is now known as ignORratIo
dude12312414 has quit [Excess Flood]
gildasio has quit [Remote host closed the connection]
gildasio has joined #osdev
gildasio has quit [Remote host closed the connection]
gildasio has joined #osdev
\Test_User has quit [Quit: \Test_User]
\Test_User has joined #osdev
heat has quit [Ping timeout: 268 seconds]
stolen has joined #osdev
Gooberpatrol66 has quit [Ping timeout: 268 seconds]
Gooberpatrol66 has joined #osdev
volum has joined #osdev
rpnx_ has quit [Ping timeout: 255 seconds]
Arthuria has quit [Ping timeout: 268 seconds]
npc has joined #osdev
rpnx_ has joined #osdev
rpnx_ has quit [Ping timeout: 245 seconds]
npc has quit [Remote host closed the connection]
rpnx_ has joined #osdev
rpnx_ has quit [Ping timeout: 268 seconds]
ThinkT510 has quit [Quit: WeeChat 4.2.2]
ThinkT510 has joined #osdev
rpnx_ has joined #osdev
bliminse has quit [Quit: leaving]
rpnx_ has quit [Ping timeout: 260 seconds]
rpnx_ has joined #osdev
volum has quit [Ping timeout: 250 seconds]
rpnx_ has quit [Ping timeout: 268 seconds]
rpnx_ has joined #osdev
rpnx_ has quit [Ping timeout: 246 seconds]
rpnx_ has joined #osdev
stolen has quit [Quit: Connection closed for inactivity]
rpnx_ has quit [Ping timeout: 272 seconds]
\Test_User has quit [Quit: e]
\Test_User has joined #osdev
zhiayang has quit [Quit: oof.]
zhiayang has joined #osdev
gbowne1 has quit [Quit: Leaving]
rpnx_ has joined #osdev
rpnx_ has quit [Ping timeout: 260 seconds]
rpnx_ has joined #osdev
rpnx_ has quit [Ping timeout: 255 seconds]
rpnx_ has joined #osdev
rpnx_ has quit [Ping timeout: 260 seconds]
rpnx_ has joined #osdev
\Test_User has quit [Quit: \Test_User]
rpnx_ has quit [Ping timeout: 255 seconds]
GeDaMo has joined #osdev
ignORratIo has quit [Read error: Connection reset by peer]
\Test_User has joined #osdev
hunta987 has quit [Quit: Lost terminal]
rpnx_ has joined #osdev
Burgundy has joined #osdev
rpnx_ has quit [Ping timeout: 268 seconds]