#osdev on 2022-10-30 — irc logs at libera.irclog.whitequark.org

2021-05-23 01:57 klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books

00:02 thinkpol has quit [Remote host closed the connection]

00:02 moatx has quit [Quit: Leaving]

00:03 thinkpol has joined #osdev

00:09 epony has quit [Ping timeout: 252 seconds]

00:10 epony has joined #osdev

00:13 moatx has joined #osdev

00:21 [itchyjunk] has quit [Remote host closed the connection]

00:21 <geist> huh my driver has a problem with the i219 that's built into the alder lake machine

00:21 amine has quit [Quit: Ping timeout (120 seconds)]

00:21 <geist> lets see what the diff is between i210 and i217 and i219

00:21 <geist> yay

00:21 [itchyjunk] has joined #osdev

00:22 amine has joined #osdev

00:24 xalarm has quit [Quit: joins libera]

00:25 <geist> oh interesting, https://pci-ids.ucw.cz/read/PC/8086 says it's a i219-V but it's different than the raw i219 manual

00:25 <bslsk05> pci-ids.ucw.cz: PCI Devices

00:25 <geist> since it's integrated into the cpu i guess it's some embedded versions

00:30 <heat> nic integrated into the cpu?

00:30 <geist> well, on the SOC i guess

00:31 <geist> since it shows up at address 0:1f.0

00:31 <geist> otherwise it'd be on the other side of a pci bridge i guess

00:31 <heat> ah yeah

00:31 <heat> what will happen when intel runs out of pci device id space?

00:32 <heat> they're getting remarkably close

00:32 moatx has quit [Quit: Leaving]

00:33 <heat> ah I think they also own 8087

00:33 <geist> oh i see. i think because it's integrated it doesn't seem to have an EEPROM programming model

00:33 <geist> and thus the code i have that tries to read the mac address instantly locks up because it is getting nothing back

00:35 <kazinsal> ah, yeah, on chips without an eeprom I think you can just grab it from bar0+0x5400
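
A minimal sketch of what kazinsal describes, assuming the usual e1000-family layout in which BAR0+0x5400 is Receive Address Low 0 (RAL0), BAR0+0x5404 is Receive Address High 0 (RAH0), and firmware has already loaded the station address there. The helper names (`mmio_read32`, `e1000_read_mac_from_rar`) are made up for illustration, and whether a given I219 variant actually has the registers pre-filled is something to verify on hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Receive Address Low/High 0, as byte offsets from BAR0 (e1000-family layout). */
#define E1000_RAL0   0x5400u
#define E1000_RAH0   0x5404u
#define E1000_RAH_AV (1u << 31) /* "address valid" bit in RAH */

/* Hypothetical helper: 32-bit MMIO read at a byte offset from the mapped BAR0. */
static uint32_t mmio_read32(const volatile void *bar0, uint32_t offset)
{
    return *(const volatile uint32_t *)((const volatile uint8_t *)bar0 + offset);
}

/* Fills mac[6] from RAL0/RAH0; returns false if no valid address is present. */
bool e1000_read_mac_from_rar(const volatile void *bar0, uint8_t mac[6])
{
    assert(bar0 != NULL && mac != NULL);

    uint32_t ral = mmio_read32(bar0, E1000_RAL0);
    uint32_t rah = mmio_read32(bar0, E1000_RAH0);

    if (!(rah & E1000_RAH_AV))
        return false;

    /* The first four MAC bytes sit in RAL (lowest byte first), the last two in RAH. */
    mac[0] = (uint8_t)(ral >> 0);
    mac[1] = (uint8_t)(ral >> 8);
    mac[2] = (uint8_t)(ral >> 16);
    mac[3] = (uint8_t)(ral >> 24);
    mac[4] = (uint8_t)(rah >> 0);
    mac[5] = (uint8_t)(rah >> 8);
    return true;
}
```

Because the function never touches an EEPROM read register, it cannot spin waiting for a "done" bit that never arrives, which is the lockup geist mentions.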

00:36 <geist> the i219 manual is confusing as heck. it almost looks like the registers are laid out completely differently or something

00:38 <kazinsal> the one that just jumps straight into the PHY registers at the beginning of Register Definitions?

00:38 <geist> yes!

00:38 <geist> as if you're supposed to access it via 2 phy banks or whatnot

00:39 <geist> there's some leap of logic i have to make there to grok what they're talking about

00:39 <kazinsal> yeah, something is very odd about how Intel built the i219 as even on the Ark site it says the i219 is *just* a PHY

00:39 <geist> exactly, does it as a result use a different driver? (/me boots back into linux to see what it does)

00:40 <kazinsal> I think it still uses e1000e

00:40 <geist> yeah i'm wondering if everything (EEPROM aside) is really mapped the same way, they just switched to a different mechanism to describe it

00:40 <geist> in terms of PHY addresses and whatnot

00:41 <geist> yeah seems to be e1000e

00:41 <geist> https://www.irccloud.com/pastebin/5zLADT8E/

00:41 <bslsk05> IRCCloud pastebin | Raw link: https://irccloud.com/pastebin/raw/5zLADT8E

00:42 <geist> https://www.irccloud.com/pastebin/gcwUplWg/

00:42 <bslsk05> IRCCloud pastebin | Raw link: https://irccloud.com/pastebin/raw/gcwUplWg

00:42 <geist> so i think it's an i219 based on the string, though it's unclear what (17) means in this case

00:43 [itchyjunk] has quit [Ping timeout: 250 seconds]

00:43 <heat> e1000e best driver

00:43 <geist> there's another set of pci ids on the pci id site that has a (16)

00:43 <kazinsal> it's related to the revision and whether it's a -V or -LM I think

00:43 <heat> "how many nics do you want to support?" "yes"

00:43 <kazinsal> the -V is just an -LM that doesn't support VT-d passthrough

00:44 <kazinsal> my 8700K machine has an I219-V rev 00

00:44 <kazinsal> 8086:15B8

00:44 <geist> oh that makes sense. the i219 manual i found on the interwebs just describes the discrete chip version of it i think and has 0x156f and 0x15b7

00:44 <heat> what's VT-d again?

00:44 <geist> oh hah even better, a 15b8

00:45 <kazinsal> VT-d is PCI passthrough

00:45 <geist> oh yeah if you look at https://pci-ids.ucw.cz/read/PC/8086 there are a butt-ton of i219 variants, each with a (N) after it

00:45 <geist> goddamnit intel

00:46 <geist> why do they keep rolling device ids for minor changes? I guess maybe that's what you're supposed to do, but seems like if it's basically compatible you should report the same id with a features/revision field inside it

00:46 <kazinsal> looks like one new one for every PCH change, god damn

00:46 <geist> they do this with everything else too, but at least something like AHCI you can match based on a class/subclass thing
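
To illustrate the contrast geist draws, here is a rough sketch of the two matching styles. The ID table holds only the three I219 device IDs quoted in this log (0x156f, 0x15b7, 0x15b8), so it is nowhere near complete; the struct and function names are invented for the example. The AHCI values are the standard PCI class triple (mass storage 0x01, SATA 0x06, AHCI programming interface 0x01).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_VENDOR_INTEL 0x8086u

struct pci_id {
    uint16_t vendor;
    uint16_t device;
};

/* Only the I219 IDs mentioned in this log; a real driver table is far longer. */
static const struct pci_id i219_ids[] = {
    { PCI_VENDOR_INTEL, 0x156f },
    { PCI_VENDOR_INTEL, 0x15b7 },
    { PCI_VENDOR_INTEL, 0x15b8 },
};

/* Per-device matching: every new PCH generation needs a new table entry. */
bool i219_match(uint16_t vendor, uint16_t device)
{
    for (size_t i = 0; i < sizeof i219_ids / sizeof i219_ids[0]; i++) {
        if (i219_ids[i].vendor == vendor && i219_ids[i].device == device)
            return true;
    }
    return false;
}

/* Class matching: one rule covers AHCI controllers from any vendor. */
bool ahci_match(uint8_t class_code, uint8_t subclass, uint8_t prog_if)
{
    return class_code == 0x01 && subclass == 0x06 && prog_if == 0x01;
}
```

The table approach is also what makes it easy to bind only to parts that have actually been tested.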

00:46 <kazinsal> https://www.irccloud.com/pastebin/5oD11mFH/

00:46 <bslsk05> IRCCloud pastebin | Raw link: https://irccloud.com/pastebin/raw/5oD11mFH

00:47 <geist> yah so the specific one i have here is E1000_DEV_ID_PCH_ADP_I219_V17

00:47 [itchyjunk] has joined #osdev

00:48 <heat> we are skipping the fact that intel's naming makes no sense

00:48 <heat> for everything

00:48 myon98 has quit [*.net *.split]

00:48 duckworld has quit [*.net *.split]

00:48 myon98 has joined #osdev

00:48 duckworld has joined #osdev

00:50 <geist> https://github.com/torvalds/linux/blob/master/drivers/net/ethernet/intel/e1000e/netdev.c#L7794 appears to be the huge table

00:50 <bslsk05> github.com: linux/netdev.c at master · torvalds/linux · GitHub

00:50 <kazinsal> yeah

00:51 <geist> and via some indirects leads to a bunch of descriptors of stuff

00:51 <geist> and pointers to accessor routines. yay.

00:51 <geist> well okay, this driver will not be fixed today then

00:51 <heat> that's the i915 syndrome

00:52 <geist> yah and since there's enough subtle variations, it's not even safe to generally set your driver to bind to something you haven't personally seen

00:52 <kazinsal> yeah, the official e1000e driver is a bit of a mess

00:52 Affliction has quit [*.net *.split]

00:52 <geist> but then since there are like a hundred variants you never see all of them

00:52 Affliction has joined #osdev

00:52 <kazinsal> and a lot of the variants just boil down to "same shit, different coat of paint"

00:53 <heat> the freebsd one is also horrible btw

00:55 <geist> the gist at least for reading the mac address is apparently you have to read it out of a nvram. i bet it is ultimately as kazinsal says: there's some flash nvram mapped into the aperture (which is larger than i'd seen in the past, 128K) and there's some structure to it

00:55 <geist> that the i219 manual is describing. so probably you just directly read the flash structure. seems to be what these accessor routines are doing in the intel manual

00:56 duckworld has quit [*.net *.split]

00:56 Affliction has quit [*.net *.split]

00:56 duckworld has joined #osdev

00:56 duckworld has quit [Max SendQ exceeded]

00:56 Affliction has joined #osdev

00:57 Affliction has quit [Max SendQ exceeded]

00:57 duckworld has joined #osdev

00:57 Affliction has joined #osdev

00:58 <kazinsal> aha! found something relevant in the bsd em driver

00:59 <kazinsal> https://github.com/openbsd/src/blob/master/sys/dev/pci/if_em.c#L1716

00:59 <bslsk05> github.com: src/if_em.c at master · openbsd/src · GitHub

00:59 <kazinsal> https://www.irccloud.com/pastebin/usyE3VEh/

00:59 <bslsk05> IRCCloud pastebin | Raw link: https://irccloud.com/pastebin/raw/usyE3VEh

01:00 ephemer0l has quit [Remote host closed the connection]

01:10 <geist> yah the pch ones i guess are the builtins and the tcp/adp/cnp stuff is the family name

01:10 <geist> tiger lake, alder lake, etc

01:13 nyah has quit [Ping timeout: 276 seconds]

01:19 spikeheron has quit [Quit: WeeChat 3.7.1]

01:26 spikeheron has joined #osdev

01:37 vdamewood has quit [Read error: Connection reset by peer]

01:39 vdamewood has joined #osdev

01:56 gog has quit [Ping timeout: 250 seconds]

01:58 kof123 has joined #osdev

02:08 <heat> weird q: is there a problem in doing OOB reads on string functions (strlen, etc)?

02:09 <zid> yes?

02:09 <zid> Like, very a lot

02:10 <heat> I was trying to be precise in my KASAN bounds (as in making it exactly the size the caller asked for, and not the rounded up obj size) but strlen is triggering it

02:10 <heat> and it's not possible to not trigger it and still have a fast implementation

02:11 <heat> reading more has zero side effects but triggers kasan

02:11 * zid puts the string at the end of a page

02:11 <zid> zero effects other than that page fault.

02:12 <heat> which is why you align the pointer

02:12 <heat> you're never getting an unwanted page fault reading 8 bytes if your pointer is 8 byte aligned
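
heat's claim can be checked with a little arithmetic. The sketch below assumes 4096-byte pages and 8-byte words; the function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumed page size */

/* True if the byte range [addr, addr + len) touches more than one page. */
bool crosses_page(uintptr_t addr, uintptr_t len)
{
    assert(len > 0);
    return (addr / PAGE_SIZE) != ((addr + len - 1) / PAGE_SIZE);
}

/* Round an address down to the enclosing 8-byte boundary. */
uintptr_t align_down8(uintptr_t addr)
{
    return addr & ~(uintptr_t)7;
}
```

Since 4096 is a multiple of 8, an 8-byte read at an 8-byte-aligned address always stays inside one page, whereas an unaligned read such as `crosses_page(0x10000 + 4093, 8)` can straddle two. This says nothing about whether the extra bytes belong to the same object, which is zid's objection.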

02:13 <zid> okay then *puts a string on his gameboy stack*

02:13 <zid> it might be a viable optim on x86 if you can prove a bunch of stuff for sure

02:13 <zid> but it isn't.. universally legal

02:13 <heat> it's not prove, it's that you read word-sized chunks on word-sized boundaries

02:13 epony has quit [Ping timeout: 252 seconds]

02:13 <heat> you obv need to manually align the ptr in strlen itself

02:14 <heat> https://github.com/heatd/Onyx/blob/master/kernel/lib/libk/string/strlen.cpp

02:14 <bslsk05> github.com: Onyx/strlen.cpp at master · heatd/Onyx · GitHub

02:14 epony has joined #osdev

02:20 epony has quit [Ping timeout: 252 seconds]

02:26 flx has quit [Remote host closed the connection]

02:27 flx has joined #osdev

02:30 <heat> i'm just adding an ASAN exception for strlen I guess
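
A sketch of the kind of word-at-a-time strlen being discussed, with the sanitizer exception applied to the function. This is not the Onyx implementation linked above. It assumes GCC or Clang (for `may_alias` and `no_sanitize("address")`), 8-byte words, and hardware where an aligned word read cannot fault partway through; reading past the terminator is outside what ISO C guarantees, so this is a platform-specific trick rather than portable C.

```c
#include <stddef.h>
#include <stdint.h>

/* may_alias lets char data be read through a 64-bit lvalue (GCC/Clang extension). */
typedef uint64_t __attribute__((may_alias)) word_t;

#define ONES  0x0101010101010101ull
#define HIGHS 0x8080808080808080ull

/* Nonzero if and only if at least one byte of w is zero. */
static inline uint64_t has_zero_byte(uint64_t w)
{
    return (w - ONES) & ~w & HIGHS;
}

/* The attribute exempts only this function from ASAN/KASAN instrumentation. */
__attribute__((no_sanitize("address")))
size_t word_strlen(const char *s)
{
    const char *p = s;

    /* Byte loop until p is 8-byte aligned. */
    while ((uintptr_t)p % sizeof(word_t) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Aligned word reads: may read past the terminator, never past the page. */
    const word_t *w = (const word_t *)p;
    while (!has_zero_byte(*w))
        w++;

    /* Locate the exact terminator inside the final word. */
    p = (const char *)w;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

Exempting one function keeps the precise per-allocation bounds for every other caller, at the cost of not catching genuinely unterminated strings passed to it.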

02:55 vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

03:12 zaquest has quit [Remote host closed the connection]

03:14 zaquest has joined #osdev

04:32 Rubikoid has quit [*.net *.split]

04:32 weinholt has quit [*.net *.split]

04:32 Ermine has quit [*.net *.split]

04:32 eau has quit [*.net *.split]

04:32 jjuran has quit [*.net *.split]

04:32 klange has quit [*.net *.split]

04:32 merry has quit [*.net *.split]

04:32 kori has quit [*.net *.split]

04:32 DoubleJ has quit [*.net *.split]

04:32 dminuoso has quit [*.net *.split]

04:32 moberg1 has quit [*.net *.split]

04:32 colona has quit [*.net *.split]

04:32 Mutabah has quit [*.net *.split]

04:32 seer has quit [*.net *.split]

04:32 CompanionCube has quit [*.net *.split]

04:32 _xor has quit [*.net *.split]

04:32 alpha2023 has quit [*.net *.split]

04:32 Rubikoid has joined #osdev

04:32 klange has joined #osdev

04:33 jjuran has joined #osdev

04:33 CompanionCube has joined #osdev

04:33 colona has joined #osdev

04:33 DoubleJ has joined #osdev

04:33 Oshawott has joined #osdev

04:34 Ermine has joined #osdev

04:34 moberg has joined #osdev

04:34 merry has joined #osdev

04:34 seer has joined #osdev

04:36 dminuoso has joined #osdev

04:36 archenoth has quit [Ping timeout: 272 seconds]

04:37 Mutabah has joined #osdev

04:37 heat has quit [Ping timeout: 246 seconds]

04:39 Terlisimo has quit [Quit: Connection reset by beer]

04:39 sprock has quit [Remote host closed the connection]

04:39 Terlisimo has joined #osdev

04:42 darkstardevx has quit [*.net *.split]

04:42 ZipCPU has quit [*.net *.split]

04:42 ZipCPU has joined #osdev

04:42 darkstardevx has joined #osdev

04:44 Griwes has quit [Excess Flood]

04:44 sprock has joined #osdev

04:46 Griwes has joined #osdev

04:53 kaichiuchi has quit [*.net *.split]

04:53 danlarkin has quit [*.net *.split]

04:53 sham1 has quit [*.net *.split]

04:53 stephe has quit [*.net *.split]

04:53 DanDan has quit [*.net *.split]

04:53 ElementW has quit [*.net *.split]

04:53 sm2n has quit [*.net *.split]

04:53 gjnoonan has quit [*.net *.split]

04:53 exec64 has quit [*.net *.split]

04:53 alethkit has quit [*.net *.split]

04:53 milesrout has quit [*.net *.split]

04:53 alecjonathon has quit [*.net *.split]

04:53 noeontheend has quit [*.net *.split]

04:53 ddevault has quit [*.net *.split]

04:53 tom5760 has quit [*.net *.split]

04:53 ggherdov has quit [*.net *.split]

04:53 cultpony has quit [*.net *.split]

04:53 dragestil has quit [*.net *.split]

04:53 phr3ak has quit [*.net *.split]

04:53 w41 has quit [*.net *.split]

04:53 Luci-ghoule has quit [*.net *.split]

04:53 mxshift has quit [*.net *.split]

04:54 ElementW has joined #osdev

04:54 phr3ak has joined #osdev

04:55 noeontheend has joined #osdev

04:55 tom5760 has joined #osdev

04:55 alethkit has joined #osdev

04:55 gjnoonan has joined #osdev

04:55 exec64 has joined #osdev

04:55 alecjonathon has joined #osdev

04:55 ddevault has joined #osdev

04:55 cultpony has joined #osdev

04:56 dragestil has joined #osdev

04:56 sm2n has joined #osdev

04:56 milesrout has joined #osdev

04:57 sham1 has joined #osdev

05:00 [itchyjunk] has quit [Remote host closed the connection]

05:19 epony has joined #osdev

05:46 xalarm has joined #osdev

05:49 <Jari--> morning /s /w /p

05:51 Jari-- has quit [Quit: apt upgrade && reboot]

05:52 xalarm has quit [Read error: Connection reset by peer]

06:04 varad has quit [Quit: Ping timeout (120 seconds)]

06:05 varad has joined #osdev

06:35 merry has quit [Ping timeout: 252 seconds]

06:35 klange has quit [Ping timeout: 252 seconds]

06:35 klange has joined #osdev

06:36 merry has joined #osdev

06:43 Terlisimo has quit [Ping timeout: 252 seconds]

06:45 Terlisimo has joined #osdev

06:45 <geist> oh no the sortix irc server is down

06:52 GeDaMo has joined #osdev

06:52 gildasio has quit [Remote host closed the connection]

06:57 fkrauthan has quit [Quit: ZNC - https://znc.in]

06:57 dragestil has quit [Ping timeout: 252 seconds]

06:57 dragestil has joined #osdev

06:58 fkrauthan has joined #osdev

06:58 sham1 has quit [Ping timeout: 252 seconds]

06:59 fkrauthan has quit [Client Quit]

07:00 sham1 has joined #osdev

07:01 fkrauthan has joined #osdev

07:07 Oshawott has quit [Ping timeout: 250 seconds]

07:08 archenoth has joined #osdev

07:22 epony has quit [Ping timeout: 252 seconds]

07:24 epony has joined #osdev

07:29 vai has joined #osdev

07:29 vai is now known as Jari--

07:33 <Jari--> good morning everyone

07:34 <klys> ack good night (morning jari--)

07:35 <GeDaMo> Good early morning, the clocks just went back :P

07:44 FireFly has joined #osdev

07:49 <geist> oh woot. another week before the clocks reset here

07:50 <Griwes> Yeah dst is not only stupid but also stupidly shifted by a week between europe and NA

07:50 <GeDaMo> Most (all?) of Europe uses the last weekends in March and October

07:51 <GeDaMo> "It includes all countries in the European Economic Area (EEA), except Iceland." https://www.timeanddate.com/time/europe/

07:51 <bslsk05> www.timeanddate.com: Time Zones and DST in Europe

07:51 <zid> BST ENDS. STOP.

07:52 <zid> no more daylight for 6 months, rip

07:53 <kof123> i was expecting an "iceland has no concept of time, nor trees, so doesn't need time zones" joke

08:03 vdamewood has joined #osdev

08:09 elastic_dog has quit [Ping timeout: 240 seconds]

08:09 elastic_dog has joined #osdev

08:11 poisone has joined #osdev

08:20 <FireFly> I mean, effectively :p

08:21 <FireFly> I know some people set Iceland as time locale to lock their time into UTC

08:24 <zid> I'd just.. set it to UTC

08:25 <zid> I wonder if that's possible in systemd

08:26 <zid> ah yea it appears timedatectl is just making a symlink

08:26 <zid> for openrc it's echo "Europe/Brussels" > /etc/timezone or whatever, and there are UTC files you can straight use rather than country files

08:30 <zid> /usr/share/zoneinfo/UTC (or UCT if you're a frencho I guess)

08:30 <zid> no idea what GMT+0, GMT-0 and GMT0 are though..

08:30 <zid> maybe just aliases

08:33 xenos1984 has quit [Ping timeout: 246 seconds]

08:34 xenos1984 has joined #osdev

08:34 <Griwes> I'd hope those are aliases lol

08:34 <zid> the MD5s are all the same, phew

08:35 <zid> 'GB' and 'GB-Eire' are different though

08:38 weinholt has joined #osdev

08:42 <zid> might be interesting to write some code that can parse out some of the info from these files at some point

08:45 <GeDaMo> https://datatracker.ietf.org/doc/html/rfc8536 ?

08:45 <bslsk05> datatracker.ietf.org: RFC 8536 - The Time Zone Information Format (TZif)

08:45 <zid> yea

08:45 <zid> implementing tzif2
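
As a starting point for the parser zid is talking about, this sketch decodes only the fixed 44-byte TZif header as laid out in RFC 8536: the magic "TZif", one version byte, 15 reserved bytes, then six big-endian 32-bit counts. The struct and function names are invented for the example, and none of the data blocks that follow the header are handled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TZIF_HEADER_LEN 44

struct tzif_header {
    uint8_t  version;  /* 0 for version 1, otherwise ASCII '2', '3', ... */
    uint32_t isutcnt;  /* count of UT/local indicators */
    uint32_t isstdcnt; /* count of standard/wall indicators */
    uint32_t leapcnt;  /* count of leap-second records */
    uint32_t timecnt;  /* count of transition times */
    uint32_t typecnt;  /* count of local time type records */
    uint32_t charcnt;  /* bytes of time zone designations */
};

/* Decode a big-endian 32-bit value without assuming host endianness. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns false if the buffer is too short or lacks the "TZif" magic. */
bool tzif_parse_header(const uint8_t *buf, size_t len, struct tzif_header *out)
{
    if (buf == NULL || out == NULL || len < TZIF_HEADER_LEN)
        return false;
    if (memcmp(buf, "TZif", 4) != 0)
        return false;

    out->version = buf[4];
    /* Bytes 5..19 are reserved; the six counts start at offset 20. */
    const uint8_t *c = buf + 20;
    out->isutcnt  = be32(c + 0);
    out->isstdcnt = be32(c + 4);
    out->leapcnt  = be32(c + 8);
    out->timecnt  = be32(c + 12);
    out->typecnt  = be32(c + 16);
    out->charcnt  = be32(c + 20);
    return true;
}
```

In version 2 and later files a second header and 64-bit data block follow the version 1 data, so a full parser would use these counts to skip ahead.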

08:51 <zid> Oh right.. I think my books are going to come out an hour later now? :(

08:52 <zid> hopefully it's an hour earlier and I can't tell time

08:55 <GeDaMo> Books?

09:00 dragestil_ has joined #osdev

09:00 dragestil has quit [Ping timeout: 252 seconds]

09:02 dragestil_ is now known as dragestil

09:04 fkrauthan_ has joined #osdev

09:05 fkrauthan has quit [Ping timeout: 252 seconds]

09:05 fkrauthan_ is now known as fkrauthan

09:27 epony has quit [Ping timeout: 252 seconds]

09:28 wootehfoot has joined #osdev

09:29 epony has joined #osdev

09:44 k0valski1889 has joined #osdev

09:45 <Mondenkind> ohai FireFly. Didn't realise you were here

09:45 <Mondenkind> and in #tendra!

09:46 puck has quit [Excess Flood]

09:47 puck has joined #osdev

09:51 DanDan has joined #osdev

10:36 <FireFly> oh heya

10:36 <FireFly> zid: yeah, more for cases where that's not (easily) possible, like mobile phones or whatever where the interface wants you to pick a location

10:50 ss4 has joined #osdev

10:53 wootehfoot has quit [Ping timeout: 252 seconds]

10:56 ss4 has quit [Ping timeout: 246 seconds]

11:11 nyah has joined #osdev

11:15 chartreuse has quit [Ping timeout: 260 seconds]

11:22 potash has quit [Quit: ZNC 1.8.2 - https://znc.in]

11:23 poisone has quit [Remote host closed the connection]

11:27 nick64 has joined #osdev

11:27 <nick64> `__asm__ volatile ("clac");` is sufficient to disable SMAP right? Or do I have to edit the EFLAGS as well?
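
For reference on nick64's question, which nobody answers in the log: as far as the x86 manuals describe it, `clac` clears EFLAGS.AC, which is the state in which SMAP is enforced, while `stac` sets EFLAGS.AC and suspends the check for explicit supervisor accesses. Turning the feature off entirely means clearing CR4.SMAP. The sketch below models that rule as a pure function and adds thin wrappers around the two instructions; the function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR4_SMAP  (1ull << 21) /* CR4.SMAP: feature enabled */
#define X86_EFLAGS_AC (1ull << 18) /* EFLAGS.AC: set means checks suspended */

/* True when an explicit supervisor-mode data access to a user page would fault. */
bool smap_blocks_user_access(uint64_t cr4, uint64_t rflags)
{
    return (cr4 & X86_CR4_SMAP) != 0 && (rflags & X86_EFLAGS_AC) == 0;
}

#if defined(__x86_64__) || defined(__i386__)
/* Ring 0 only: both instructions raise #UD in user mode or without SMAP support. */
static inline void smap_allow_user_access(void)
{
    __asm__ volatile ("stac" ::: "cc", "memory");
}

static inline void smap_forbid_user_access(void)
{
    __asm__ volatile ("clac" ::: "cc", "memory");
}
#endif
```

A typical pattern would be to bracket a user-copy routine with the allow/forbid pair so the window in which user memory is reachable stays as small as possible.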

11:31 epony has quit [Ping timeout: 252 seconds]

11:33 epony has joined #osdev

11:39 potash has joined #osdev

11:41 poisone has joined #osdev

11:44 mahk has quit [Ping timeout: 268 seconds]

11:46 k0valski18896 has joined #osdev

11:47 k0valski1889 has quit [Ping timeout: 252 seconds]

11:47 k0valski18896 is now known as k0valski1889

12:03 * sortie debugs the way root intended, the only primitives available are to print the current registers, and to hexdump memory addresses; proceeds to manually do a stack trace, having rebuilt a similar binary with symbols, which I manually inspect with shell commands to manually calculate offsets

12:05 <sortie> I really need to invest in building a stable kernel debugging solution for my OS, but this qemu monitor fallback is at least theoretically able to provide all the information, if I behave like gdb in my mind

12:05 <sortie> It's one of those days when one of my production VMs faulted on a super rare kernel bug, so VNC'd into the qemu to debug it
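
The manual procedure sortie describes can be partly mechanised. This is a rough sketch, not Sortix code: it assumes an x86-64 kernel built with frame pointers, so that the word at `rbp` holds the caller's saved `rbp` and the word at `rbp + 8` holds the return address, and it works on a stack region already copied out of the monitor's hexdump. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A hexdumped stack region: words[i] holds the value at address base + 8*i. */
struct stack_dump {
    uint64_t base;
    const uint64_t *words;
    size_t nwords;
};

/* Reads one aligned 64-bit word from the dump; returns 0 if it is out of range. */
static int dump_read(const struct stack_dump *d, uint64_t addr, uint64_t *out)
{
    if (addr < d->base || (addr - d->base) % 8 != 0)
        return 0;
    uint64_t idx = (addr - d->base) / 8;
    if (idx >= d->nwords)
        return 0;
    *out = d->words[idx];
    return 1;
}

/* Stores up to max return addresses into ret[]; returns how many were found. */
size_t walk_frames(const struct stack_dump *d, uint64_t rbp, uint64_t *ret, size_t max)
{
    assert(d != NULL && ret != NULL);

    size_t n = 0;
    while (n < max) {
        uint64_t next_rbp, ret_addr;
        if (!dump_read(d, rbp, &next_rbp) || !dump_read(d, rbp + 8, &ret_addr))
            break;
        ret[n++] = ret_addr;
        if (next_rbp <= rbp) /* stacks grow down, so callers live at higher addresses */
            break;
        rbp = next_rbp;
    }
    return n;
}
```

Each return address still has to be mapped to a symbol plus offset using the rebuilt binary, and frames removed by tail-call optimisation will simply be missing from the chain.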

12:15 <sortie> int sys_ppoll(struct pollfd* user_fds, size_t nfds, const struct timespec* user_timeout_ts, const sigset_t* user_sigmask)

12:15 <sortie> ...

12:15 <sortie> if ( desc->poll(&ctx, node) == 0 )

12:15 <sortie> ^^^ https://gitlab.com/sortix/sortix/-/blob/b6bbc05eafba25dfba2bc8fc7fa13bdc2f3aeb85/kernel/poll.cpp#L295 ← The stack trace continues, the desc and inode call frames got tail optimized away

12:15 <bslsk05> gitlab.com: kernel/poll.cpp · b6bbc05eafba25dfba2bc8fc7fa13bdc2f3aeb85 · sortix / Sortix · GitLab

12:19 <sortie> Let's see if I can somehow untangle all of this assembly and see if I can examine the data structures

12:19 <klange> someone sent me this: https://user-images.githubusercontent.com/76218884/198864055-8bebc57a-b0aa-4dc5-9b4e-34b3ea30ea4b.png

12:19 Jari-- has quit [Ping timeout: 246 seconds]

12:21 epony has quit [Remote host closed the connection]

12:23 <sortie> Hooo boy that's cursed

12:23 epony has joined #osdev

12:27 potash has quit [Read error: Connection reset by peer]

12:30 potash has joined #osdev

12:31 <klange> yeah, kernel issues with threads that never really cropped up in my dumb tests

12:31 gog has joined #osdev

12:34 elastic_dog has quit [Remote host closed the connection]

12:35 elastic_dog has joined #osdev

12:43 bauen1 has quit [Quit: leaving]

13:00 bauen1 has joined #osdev

13:04 bauen1 has quit [Remote host closed the connection]

13:12 awita has joined #osdev

13:14 isaacwoods has joined #osdev

13:18 poisone has quit [Remote host closed the connection]

13:20 vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

13:35 epony has quit [Ping timeout: 252 seconds]

13:38 epony has joined #osdev

13:43 awita has quit [Ping timeout: 252 seconds]

13:46 spikeheron has quit [Quit: WeeChat 3.7]

13:56 dutch has joined #osdev

14:01 poisone has joined #osdev

14:21 heat has joined #osdev

14:24 [itchyjunk] has joined #osdev

14:31 vai has joined #osdev

14:31 vai is now known as Jari--

14:32 <Jari--> howdy

14:34 <heat> mfw i can't tell if it's the sanitizer that's buggy or the code that's buggy

14:35 gog` has joined #osdev

14:35 gog has quit [Ping timeout: 252 seconds]

14:37 awita has joined #osdev

14:45 bauen1 has joined #osdev

15:08 poisone has quit [Ping timeout: 272 seconds]

15:11 <zid> I have pizza, so your comment is irrelevant

15:13 _awita_ has joined #osdev

15:14 awita has quit [Read error: Connection reset by peer]

15:20 gog` has quit [Remote host closed the connection]

15:20 gog` has joined #osdev

15:21 gog` is now known as gog

15:21 poisone has joined #osdev

15:23 <gog> https://media.discordapp.net/attachments/769207507767656489/1036294542539890708/never_done_anything_wrong.gif

15:24 <zid> me_irl

15:25 <GeDaMo> # I'm as honest as the day is long; the longer the daylight, the less I do wrong # :P

15:26 <GeDaMo> https://www.youtube.com/watch?v=Ul_Kotlqi3k

15:26 <bslsk05> 'Madness - Shut Up (Official Video)' by Madness (00:03:28)

15:27 <gog> made this because we deploy tomorrow and i'm going to need it because i have a major feature going live

15:28 Jari-- has quit [Remote host closed the connection]

15:29 <gog> :P

15:31 lkurusa has joined #osdev

15:34 lkurusa has quit [Remote host closed the connection]

15:40 epony has quit [Ping timeout: 252 seconds]

15:43 epony has joined #osdev

15:48 netbsduser has joined #osdev

16:01 <heat> https://eloncodereview.com/

16:01 <bslsk05> eloncodereview.com: Elon Code Review - Print it out and Elon Musk will check it

16:17 _xor has joined #osdev

16:33 theWeaver has joined #osdev

16:34 <gog> lmao

16:34 <gog> how would you even satisfy this request

16:35 <gog> print the last n days of PR's?

16:35 <gog> elon really has no clue whatsoever how programming works

16:35 dutch has quit [Quit: WeeChat 3.7]

16:35 <theWeaver> wait wait wait

16:36 <theWeaver> did he actually tell ppl to print their code

16:36 <gog> yes

16:36 <theWeaver> smh

16:36 <gog> for him to review

16:36 <theWeaver> LOL

16:37 <GeDaMo> https://nitter.ca/caseynewton/status/1586127052767318016?s=46&t=cQeLx1_rGjRZK2XPJhlAaA

16:37 <bslsk05> twitter: <CaseyNewton> Here it is: Twitter engineers were told today to *print out* their last 30 to 60 days of code, so they could show it to Elon Musk himself. ␤ ␤ Then they were told wait, no, actually, please shred all that code you just printed out. ␤ ␤ Subscribe to read ➡️ https://www.platformer.news/p/elon-takes-over-twitter https://pbs.twimg.com/media/FgMOJdmUAAAiR0T.jpg

16:37 <bslsk05> [According to four current employees, engineers spent Friday afternoon at Twitter dutifully printing out their code in anticipation of meetings with Musk and some of his senior engineers from Tesla. Other engineers were told to prepare for “code pairing” with Musk, in which they would sit with him and review code together. ␤ ␤ Just after noon, an executive assistant asked engineers to begin preparing code to show to Musk. “Please print out

16:37 Raito_Bezarius has quit [Quit: free()]

16:37 <theWeaver> lmfao

16:37 <gog> half of my code changes are deletions, how do you represent that

16:37 <gog> or does he just want to see the current state

16:38 <gog> what does it even mean

16:38 <theWeaver> the - lines on the diffs

16:38 <theWeaver> i guess

16:38 <gog> that was my intuition too

16:38 <gog> just print out your last n days of diffs

16:38 <gog> but if it's git format then you only have 3 lines of context

16:38 <theWeaver> i knew he was a moron but i wasnt expecting this

16:38 <theWeaver> he will have to infer the broader context himself

16:38 <theWeaver> ¯\_(ツ)_/¯

16:39 <gog> anyhow, when my wife told me this story i was like "what the fuck does that even mean"

16:39 <theWeaver> it means "musk is a freaking idiot"

16:40 <gog> yes

16:41 * zid shreds all of his code too in solidarity

16:41 <zid> sorry to all 4 users

16:41 <theWeaver> lol

16:42 dude12312414 has joined #osdev

16:49 <gog> that's infinity times as many users as my code

16:49 <zid> idk how many users it has it got pirated

16:49 <zid> I gave it to 4 people

16:50 xenos1984 has quit [Read error: Connection reset by peer]

16:52 <zid> https://stackoverflow.com/questions/70669211/why-dll-is-not-imported-in-the-python-script-as-the-other-libraries

16:52 <bslsk05> stackoverflow.com: Why DLL is not imported in the python script as the other libraries? - Stack Overflow

16:52 <zid> hey there's a nonsense stackoverflow question about it even

16:52 <zid> ctrl-f zid

16:53 <zid> possibly, anyway

16:56 <zid> TIL bing indexes this channel's logs and google doesn't

16:56 fatal1ty has joined #osdev

16:57 dude12312414 has quit [Ping timeout: 258 seconds]

16:57 sikkiladho has joined #osdev

16:58 <gog> bing

16:58 <zid> I always want to say "intel inside" but with swapped to bing, so like.. bingle inside

16:58 dude12312414 has joined #osdev

17:09 xenos1984 has joined #osdev

17:12 MrBonkers has quit [Quit: ZNC 1.8.2+deb2build5 - https://znc.in]

17:14 _Agua_ has joined #osdev

17:15 MrBonkers has joined #osdev

17:15 _awita_ has quit [Read error: Connection reset by peer]

17:33 dutch has joined #osdev

17:39 * kof123 http://aminet.net/package/mods/voice/IntelOut

17:39 <bslsk05> aminet.net: Aminet - mods/voice/IntelOut.lha

17:44 epony has quit [Ping timeout: 252 seconds]

17:46 epony has joined #osdev

17:50 tomaw has quit [Quit: Quitting]

17:51 sm2n has quit [Ping timeout: 255 seconds]

17:53 sm2n has joined #osdev

17:53 mahk has joined #osdev

17:59 tomaw has joined #osdev

18:00 xenos1984 has quit [Ping timeout: 246 seconds]

18:01 xenos1984 has joined #osdev

18:13 xenos1984 has quit [Ping timeout: 276 seconds]

18:16 Raito_Bezarius has joined #osdev

18:18 Raito_Bezarius has quit [Max SendQ exceeded]

18:20 dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]

18:21 _xor has quit [Ping timeout: 252 seconds]

18:22 _xor has joined #osdev

18:28 xenos1984 has joined #osdev

18:36 Raito_Bezarius has joined #osdev

18:39 Raito_Bezarius has quit [Max SendQ exceeded]

18:47 kaichiuchi has joined #osdev

18:51 _Agua_ has quit [Quit: Leaving]

18:57 Raito_Bezarius has joined #osdev

19:01 <heat> my scheduler is ASAN unfriendly because it doesn't unwind the stack when context switching

19:01 <heat> ://

19:04 <heat> this is legitimately annoying

19:04 <heat> asan stack poisoning just poisons, but doesn't unpoison valid shadow areas

19:05 <fatal1ty> heat hi

19:05 <heat> my code assumed it could jump around with the stack without unwinding, but now it's just making valid stack areas have poisoned bytes ://
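
One common way out of the situation heat describes is to clear the shadow for the dead part of a stack whenever the scheduler switches to it without unwinding. The sketch below is not heat's fix; it uses a toy shadow array with the usual ASAN granularity of one shadow byte per 8 bytes of memory, where a shadow value of 0 means "fully addressable". The `shadow_map` struct and both function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHADOW_SCALE_SHIFT 3                     /* 8 bytes of memory per shadow byte */
#define SHADOW_GRANULE (1u << SHADOW_SCALE_SHIFT)

/* Hypothetical shadow descriptor: shadow[i] covers mem_base + 8*i .. mem_base + 8*i + 7. */
struct shadow_map {
    uintptr_t mem_base;
    uint8_t *shadow;
    size_t shadow_len;
};

/* Marks [addr, addr + len) as addressable; both ends must be granule-aligned. */
void shadow_unpoison(struct shadow_map *m, uintptr_t addr, size_t len)
{
    assert(addr >= m->mem_base);
    assert(addr % SHADOW_GRANULE == 0 && len % SHADOW_GRANULE == 0);

    size_t first = (addr - m->mem_base) >> SHADOW_SCALE_SHIFT;
    size_t count = len >> SHADOW_SCALE_SHIFT;
    assert(first + count <= m->shadow_len);
    memset(m->shadow + first, 0, count);
}

/* Clears stale poison between the lowest stack address and the saved stack pointer. */
void unpoison_stack_below(struct shadow_map *m, uintptr_t stack_base, uintptr_t sp)
{
    uintptr_t end = sp & ~(uintptr_t)(SHADOW_GRANULE - 1);
    if (end > stack_base)
        shadow_unpoison(m, stack_base, end - stack_base);
}
```

Everything below the saved stack pointer is dead by definition, so wiping its shadow cannot hide a real bug in live frames; redzones for frames above `sp` are left untouched.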

19:06 <heat> hi

19:06 <fatal1ty> poisoning, validity ...

19:06 <fatal1ty> i like you more and more by the minute

19:07 <fatal1ty> a real 2022 gangsta

19:07 <fatal1ty> im in

19:09 <fatal1ty> RofL

19:11 <fatal1ty> just to make it clear for everybody

19:11 <fatal1ty> whenever i be in a position when i need help with security stuff, i want to see all the gangsters here come with solutions

19:11 <fatal1ty> but right now i'm just brushing up on my C

19:16 jjuran has quit [Quit: Killing Colloquy first, before it kills me…]

19:17 MrPortmaster has joined #osdev

19:17 jjuran has joined #osdev

19:17 MrBonkers has quit [Read error: Connection reset by peer]

19:17 fatal1ty has quit [Ping timeout: 276 seconds]

19:19 fatal1ty has joined #osdev

19:38 nur has quit [Ping timeout: 264 seconds]

19:41 poisone has quit [Remote host closed the connection]

19:47 sikkiladho has quit [Quit: Connection closed for inactivity]

19:48 epony has quit [Ping timeout: 252 seconds]

19:51 epony has joined #osdev

19:55 gog` has joined #osdev

19:55 gog has quit [Read error: Connection reset by peer]

19:55 gog` is now known as gog

19:57 misnor has joined #osdev

19:59 chartreuse has joined #osdev

20:15 dutch has quit [Quit: WeeChat 3.7]

20:22 dutch has joined #osdev

20:25 _xor has quit [Ping timeout: 252 seconds]

20:26 _xor has joined #osdev

20:39 sprock has quit [Remote host closed the connection]

20:44 sprock has joined #osdev

20:49 GeDaMo has quit [Quit: You are becoming what we French call 'Le Fruitcake'.]

20:51 sprock has quit [Quit: brb]

20:55 divine has quit [Ping timeout: 240 seconds]

20:56 sprock has joined #osdev

20:56 * geist yawns

20:56 <geist> totally slept in today, though i think i needed it

20:57 <heat> 💯

20:57 <fatal1ty> at least you can sleep

20:57 <fatal1ty> be thankful for it

20:58 <fatal1ty> when im sleeping i regret that i got up in the morning

20:59 <fatal1ty> https://youtu.be/TjV44tRS6IY?t=181

20:59 <bslsk05> 'Prison Break(2005) S01 E03 - I don't sleep (6/6) l Netflix' by 3 Minute Drama (00:03:40)

20:59 sprock has quit [Client Quit]

21:00 epony has quit [Ping timeout: 252 seconds]

21:00 <heat> geist, have you played around with ASAN yet?

21:00 <fatal1ty> heat: shut up

21:00 <geist> not personally, aside from fuchsia

21:01 <heat> i've been reworking my implementation this weekend

21:01 <heat> the early memory mapping is... gnarly

21:01 <geist> we do however have asan and kasan for fuchsia, and there are bots that run it, etc

21:01 <geist> yeah i think it is. burns like 1/8 of ram up front for it

21:01 <geist> (for kasan that is)
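
The 1/8 figure follows directly from the shadow encoding: one shadow byte describes eight bytes of real memory. A small sketch of the arithmetic, with illustrative names and a caller-supplied shadow offset (the actual offset is a per-kernel build choice and is not stated in the log):

```c
#include <stdint.h>

#define KASAN_SHADOW_SHIFT 3 /* one shadow byte per 8 bytes of memory */

/* Shadow bytes needed to cover a region, rounded up to whole granules. */
uint64_t kasan_shadow_bytes(uint64_t covered_bytes)
{
    return (covered_bytes + 7) >> KASAN_SHADOW_SHIFT;
}

/* Classic ASAN mapping from a memory address to its shadow byte address. */
uintptr_t mem_to_shadow(uintptr_t addr, uintptr_t shadow_offset)
{
    return (addr >> KASAN_SHADOW_SHIFT) + shadow_offset;
}
```

Covering 8 GiB of RAM eagerly therefore costs 1 GiB of shadow, which is why user-space ASAN prefers to reserve the range and fault it in on demand.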

21:01 <heat> that's not the worst

21:01 <geist> for user space asan you can rely on overcommitting and lots of virtual memory to allocate and demand fill in the bitmap

21:01 <heat> I can't do that because I do ASAN for vmalloc (which dynamically allocates address space)

21:02 <geist> are we talking about user space or kernel space?

21:02 <heat> so I essentially implemented a CoW-like thing for the shadow mapping's page tables

21:02 <heat> kernel

21:03 * geist nods

21:03 <heat> if you have asan-stack=1 you're even more screwed

21:04 Burgundy has joined #osdev

21:04 <heat> not only do you need the early boot shadow zero mapping, but you need a real mapping for the stack because the compiler will emit code that writes directly to it

21:05 <heat> (you have a TODO in fuchsia)

21:05 <geist> unrelated: i read the other day that Intel Atoms from 5 or 6 years ago in the Rangeley class apparently fail over time

21:05 <heat> isn't that normal?

21:05 <geist> well, over time as in lots of them already have, after just a few years

21:05 <geist> and my firewall is one of them

21:05 <heat> oh

21:06 <geist> it's apparently some sort of manufacturing defect that causes the i think LPC bus to fail to work, which usually causes most machines to not post

21:06 fatal1ty has quit [Read error: Connection reset by peer]

21:06 <geist> since the bios can't talk to peripheral devices

21:07 <geist> something like some internal clock derivation logic fails over time prematurely

21:08 <heat> cpu or chipset?

21:08 <heat> I can never tell if things are a part of the CPU itself or the chipset

21:09 <geist> both it's a single core thing

21:09 <geist> https://ark.intel.com/content/www/us/en/ark/products/codename/60191/products-formerly-rangeley.html

21:09 <bslsk05> ark.intel.com: Products formerly Rangeley

21:09 <geist> ie, intel C2338, etc. they were an earlier rev of a line of atoms with lots of built in networking stuff

21:10 <geist> sold for exactly stuff like firewalls and whatnot. still very popular

21:10 <geist> in fact mine is literally a C2558

21:10 <gog> i cannot connect to the professional sortix network again :(

21:11 <geist> anyway what people are reporting is a very high failure rate on these

21:11 <heat> the professional sortix network had a crash

21:11 <geist> yah its still down it seems

21:11 <heat> sortie is on it

21:11 <gog> ooof

21:11 <sortie> gog, you can evacuate to #sortix on this network instead :)

21:11 <sortie> It's the evacuation assembly point

21:11 <gog> good to know :)

21:11 <sortie> In case of fire, don't use the elevators for up/down in gdb

21:12 <sortie> But yeah I managed to attach a gdb to the crashed kernel on my production VM and am right now trying to understand where the bad pointer causing a GPF came from

21:14 <gog> oops! #GP no fun

21:14 <geist> sortie: omg hasnt work told you you immediately roll the change back and fix the failure later?

21:15 <sortie> geist, would do if it hit the sortix.org front page, but irc.sortix.org is my test bed for debugging critical stability issues

21:15 <sortie> Better if it's down for a day if I debug a bad kernel crash

21:15 <geist> heh i'm kidding. i find it kinda annoying some time (that policy) but it's the rules!

21:15 <sortie> Oh it's def a policy for a reason and I do it all the time when I break infra

21:16 <geist> ie, always roll back never fix forward

21:16 <geist> yah it's just annoying when the fix forward is like a one line typo

21:16 <sortie> And then I debug the problem and reland a fixed version

21:16 <sortie> Well I do fix forward now and then

21:16 <sortie> But only if a fix is obvious

21:16 <sortie> geist, though you definitely appreciate the value of a super rare kernel crash

21:17 <sortie> You know how stable the irc server has been for months and then whoops there's a super mysterious crash

21:17 <geist> oh tell me about it. now that we have bazillion hours of zircon running in the field there's enough kernel crashes getting reported that there are some really bizarro ones to try to scratch your head over

21:17 <gog> cosmic rays

21:18 <geist> though many of them are flipped bits, or at least *have* to be, and the kernel is the canary

21:18 <gog> fatal single-bit flip caused by a high-energy particle originating deep from outer space

21:18 <geist> doesn't help that the kernel is very very highly asserted, so it tends to trip over anything slightly amiss. keeps a tight ship

21:20 MrBonkers has joined #osdev

21:20 MrPortmaster has quit [Ping timeout: 252 seconds]

21:20 <geist> actually a thing you can find is sometimes there's one or two devices in the field that clearly have bad DRAM

21:20 <geist> and just continually generate failures. you feel bad for the owner but theres nothing you can do

21:21 <heat> why are the asserts still compiled in for release?

21:21 <geist> because they find problems

21:22 <heat> but you're complaining that they're finding too many

21:22 <geist> though to be clear we have two levels of asserts: ones that are DEBUG_ASSERTS that are turned off for production devices

21:22 <geist> and then regular ASSERTs that are left on all the time
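
[Editor's sketch] The two assert levels geist describes could look roughly like the following. This is a minimal sketch under assumptions: the `DEBUG_ASSERT_ENABLED` switch, the `panic_at` hook, and `checked_div` are hypothetical names for illustration, not Zircon's actual definitions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical build switch: 1 for debug builds, 0 for production devices. */
#ifndef DEBUG_ASSERT_ENABLED
#define DEBUG_ASSERT_ENABLED 0
#endif

/* Hypothetical panic hook; a real kernel would record the message and reboot. */
void panic_at(const char *expr, const char *file, int line)
{
    fprintf(stderr, "panic: assertion failed: %s (%s:%d)\n", expr, file, line);
    abort();
}

/* Always compiled in, even on production devices. */
#define ASSERT(x) \
    do { if (!(x)) panic_at(#x, __FILE__, __LINE__); } while (0)

/* Never fires in production, but the expression is still type-checked. */
#define DEBUG_ASSERT(x) \
    do { if (DEBUG_ASSERT_ENABLED && !(x)) panic_at(#x, __FILE__, __LINE__); } while (0)

/* Example use: the cheap, critical check stays on; the extra check is debug-only. */
int checked_div(int a, int b)
{
    ASSERT(b != 0);
    DEBUG_ASSERT(a >= 0);
    return a / b;
}
```

Keeping the debug expression inside an `if` on a constant, instead of deleting it with the preprocessor, lets the compiler drop the code while still catching expressions that no longer compile.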

21:22 <geist> sure, but it's *good*

21:22 <geist> i'm complaining not complaining

21:23 <gog> compilaining

21:23 <gog> is when you bitch about gcc or clang

21:23 <gog> get it

21:23 <gog> do y'all get my hilarious pun

21:23 <heat> hehehehehe

21:23 <geist> but what i mean is when you suddenly have millions of hours on your kernel and getting live reports of kernel panics it starts to get a little intimidating

21:24 <heat> totes

21:24 <geist> even if it's only a few here and there. the stragglers are the real weird ones

21:24 <geist> for example, there's an *extremely* rare bug that we've never been able to reproduce and keep adding more code to try to catch it: when tearing down an arm64 aspace, it asserts that the page tables are empty

21:24 <heat> I don't know how it looks in the operating systems world but for distributed systems (like the one I worked on at CF) it heavily depended on the team

21:25 <geist> ie, it's mandatory that something already called Unmap() across all of the mapped things, so by the time the aspace tears down the top level page table should be empty

21:25 <gog> me every time one of our watchdogs sends me an exception report

21:25 <geist> but extremely rarely it's not

21:25 <heat> the nginx team ran an extremely tight ship. no sentry issues *ever*

21:25 <geist> so been thinking it's a cosmic ray, but it's so hard to catch we have to just add another assert and wait another month

21:26 <geist> and we've gone over the state machine that should ensure that unmap() was called, and no other thing could have mapped into the page tables in the interim before destructing the object, etc

21:27 <geist> mind you *these* are the kind of failures i'm talking about we get reports on. the kernel is pretty solid, it's lingering issues like this that we're tracking

21:27 <geist> i also just made another pass through the logic to make sure all the appropriate memory barriers are in place, etc.

21:27 <heat> yay memory barriers

21:28 <geist> tightened it up a bit, that might fix it? perhaps there's some extremely rare case where some PT entry gets written very late to the page table, but i can't imagine how

21:28 <heat> have you never reproduced it?

21:28 <geist> no.

21:28 <heat> how much info are you collecting?

21:28 <geist> that's the thing, given that we only see it extremely rarely and there are millions of hours in the field, probably daily, it's clearly very very hard to reproduce

21:28 <heat> do you have coredumps?

21:28 <geist> oh no

21:28 <geist> just a panic message

21:28 <heat> :(((

21:29 <heat> get coredumps and check again

21:29 <geist> on purpose: has to not have any user data in it, etc etc

21:29 <geist> has to be anonymized, etc

21:30 <geist> and has to survive a reboot, so by the time the kernel panics it squirrels the panic data in a little memory area (or an EFI capsule if there) and the next kernel grabs it

21:30 <geist> but the space is very limited
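
[Editor's sketch] A panic record that survives a reboot is usually validated before the next kernel trusts it. The following is a minimal sketch, assuming a small memory region that keeps its contents across a warm reboot; the struct layout, magic value, size limit, and function names are all hypothetical and not Zircon's format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CRASHLOG_MAGIC 0x43524153u /* arbitrary marker value */
#define CRASHLOG_MAX   256u        /* the space is very limited */

struct crashlog {
    uint32_t magic;
    uint32_t len;
    uint32_t checksum;
    char text[CRASHLOG_MAX];
};

/* Simple additive checksum; a real system would likely use a CRC. */
static uint32_t sum_bytes(const char *p, uint32_t len)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < len; i++)
        sum += (unsigned char)p[i];
    return sum;
}

/* Called on the panic path: truncate the message and stamp the record. */
void crashlog_store(struct crashlog *region, const char *msg)
{
    size_t len = strlen(msg);
    if (len > CRASHLOG_MAX)
        len = CRASHLOG_MAX;
    memcpy(region->text, msg, len);
    region->len = (uint32_t)len;
    region->checksum = sum_bytes(region->text, region->len);
    /* Stamped last in program order; real hardware also needs barriers and cache flushes. */
    region->magic = CRASHLOG_MAGIC;
}

/* Called by the next boot: accept the record only if it validates, then consume it. */
bool crashlog_recover(struct crashlog *region, char *out, size_t out_size)
{
    assert(out_size > 0);
    if (region->magic != CRASHLOG_MAGIC || region->len > CRASHLOG_MAX)
        return false;
    if (region->checksum != sum_bytes(region->text, region->len))
        return false;
    size_t n = region->len < out_size - 1 ? region->len : out_size - 1;
    memcpy(out, region->text, n);
    out[n] = '\0';
    region->magic = 0; /* do not report the same panic twice */
    return true;
}
```

The validation step matters because after a cold boot the region holds arbitrary bits, and a record must not be reported unless it was actually written by a panicking kernel.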

21:30 <heat> does it? i think the law is a bit more fine tuned than (no personal information ever)

21:30 <geist> what do you mean?

21:30 <heat> we had law-compliant coredumps, but they had to remain in a remote server, etc

21:31 <geist> right, i mean perhaps, but a core dump of the kernel is... a lot

21:31 <geist> or i guess maybe the other way of looking at it is we've never even considered building a core dump system into the kernel because i dont know how we'd be able to use that in the field

21:31 <geist> so we get a panic message and about 4K of the kernel log leading up to it

21:32 <geist> it's fine, that's not the problem here

21:32 <geist> the problem here is the damn bug rarely reproduces

21:32 <heat> if you have the hw (which you probably do) and it never reproduced there, maybe it really is just a cosmic ray

21:33 <geist> perhaps

21:34 <geist> that's a theory, the logic in how the PT code cleans itself up and unmaps page tables means that if there's a bit corruption somewhere in a PT it'll keep that part of the tree from being cleaned up

21:34 <geist> that's our current working theory, but we have to add more crash time logic to confirm it

21:34 <geist> ie, 'if we sense that the top level PT is not empty, walk the PT leaves until you find the deepest PT and print all of the nonzero entries'

21:34 <geist> i think that's the current logic we have in there to try to catch it
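
[Editor's sketch] The crash-time logic geist describes (if the top-level table is not empty, descend and print what is left) can be sketched on a toy table. This is a minimal sketch under assumptions: the 8-entry fan-out, the 3 levels, and the convention that a nonzero non-leaf entry is a plain pointer to a child table are all simplifications, and `dump_nonzero` is a hypothetical name, not the Zircon routine.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PT_ENTRIES 8 /* toy fan-out; arm64 with 4 KiB pages uses 512 per table */
#define PT_LEVELS  3 /* toy depth */

/* Toy table: a nonzero entry above the leaf level points at a child table. */
struct pt {
    uintptr_t entry[PT_ENTRIES];
};

/*
 * Walk the tree from `table` (at depth `level`, 0 = top) and print every
 * nonzero leaf entry. Returns how many leftover leaf entries were found.
 * Note: with real bit corruption a non-leaf entry may not be a valid
 * pointer, so production code would have to validate it before following.
 */
size_t dump_nonzero(const struct pt *table, int level)
{
    assert(level >= 0 && level < PT_LEVELS);
    size_t found = 0;
    for (size_t i = 0; i < PT_ENTRIES; i++) {
        uintptr_t e = table->entry[i];
        if (e == 0)
            continue;
        if (level + 1 < PT_LEVELS) {
            found += dump_nonzero((const struct pt *)e, level + 1);
        } else {
            printf("leftover entry at level %d, index %zu: %#" PRIxPTR "\n",
                   level, i, e);
            found++;
        }
    }
    return found;
}
```

Printing the exact leftover values is what would let someone tell a single flipped bit apart from a complete, plausible-looking mapping that was never unmapped.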

21:34 <geist> add that, wait a few months

21:35 <heat> yeah

21:36 <heat> i mean, if you need to wait a few months, it's not that serious

21:36 <heat> you can keep drinking your mojitos in the sun

21:36 <geist> sure

21:36 <geist> but that was the original point of the discussion: it's the lingering rare bugs that are really annoying

21:36 <heat> linux gets around this by not having any automated bug reports

21:37 <heat> easy

21:37 * gog gets the crushed ice and muddles some mint

21:37 <heat> i wonder how windows' bug reports look

21:37 <heat> probably way scarier

21:38 <heat> think about it, you're running a relatively controlled experiment with devices you know and that were made by Google itself

21:38 <heat> not a crappy 2006 laptop

21:38 <geist> oh totally

21:39 <geist> the anonymization is frustrating though. for example if we see a device that continually blows up due to bad dram, there's no way to tie it to a user

21:39 <geist> i'd love to get the device back and send them a new one, but there's no way to do it

21:39 <geist> i mean i am 100% for all the anonymization but this is the one case where you feel bad for the owner but can't do anything about it unless they report it

21:39 <heat> maybe you could add some bad hw detection stuff

21:39 <heat> like if the OS crashes way too much, etc

21:39 <geist> i suppose. when the kernel detects it it's hard

21:40 atilla has joined #osdev

21:40 <atilla> use risc-v

21:40 <geist> but yeah, that's probably the only real strategy. if you get too many forced reboots that are not just due to someone unplugging it, etc, then try to get them to report it

21:40 <heat> I don't mean memtest, but actual "your stable build of fuchsia is crashing way too much. you probably have bad hw, please talk with support..."

21:41 <geist> yah that's the best you can do
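
[Editor's sketch] heat's "crashing way too much" heuristic amounts to counting unexpected reboots inside a time window. The following is a minimal sketch, assuming timestamps in seconds from a clock that does not go backwards; the history size, struct, and function names are hypothetical, and the thresholds in the usage are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CRASH_HISTORY 8 /* hypothetical: remember the last 8 unexpected reboots */

struct crash_tracker {
    uint64_t when[CRASH_HISTORY]; /* ring buffer of crash timestamps, in seconds */
    size_t count;                 /* total crashes recorded so far */
};

/* Record one unexpected reboot, overwriting the oldest entry when full. */
void crash_record(struct crash_tracker *t, uint64_t now)
{
    t->when[t->count % CRASH_HISTORY] = now;
    t->count++;
}

/* True if at least `threshold` recorded crashes fall within the last `window` seconds. */
bool crash_rate_suspicious(const struct crash_tracker *t, uint64_t now,
                           uint64_t window, size_t threshold)
{
    assert(threshold <= CRASH_HISTORY);
    size_t kept = t->count < CRASH_HISTORY ? t->count : CRASH_HISTORY;
    size_t recent = 0;
    for (size_t i = 0; i < kept; i++) {
        if (t->when[i] <= now && now - t->when[i] <= window)
            recent++;
    }
    return recent >= threshold;
}
```

Because the counter lives on the device and only drives a local "please contact support" prompt, it would not need to attach any identity to the crash reports themselves.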

21:41 <atilla> that wouldnt be happening if you were using riscv

21:41 <geist> i'll bring that up with the device metrics folks, see if they're not always thinking this

21:41 vai has joined #osdev

21:41 <geist> s/always/already

21:41 <vai> geist: howdy + friends

21:41 vai is now known as Jari--

21:41 <geist> howdy Jari--

21:41 <atilla> wassup baby

21:41 <geist> every time i see vai i think Steve Vai

21:42 <Jari--> geist: Finland is hot, more than 5 celcius today

21:42 <geist> then i think of the new Polyphia song, which is growing on me

21:42 <atilla> I am thinking of building an operating system that will keep niiggers away. How do I build one? I am thinking of detecting spelling mistakes and ebonics to initiate SSD wipe sequence

21:42 <Jari--> Anyone in the osdev teams thought of walking to the Silicon Valley and convince the investors to get in?

21:42 <geist> aaah here we go

21:42 <atilla> more like gayist amirite

21:43 atilla was kicked from #osdev by geist [atilla]

21:43 <heat> took you long

21:43 <geist> was waiting for it

21:43 <heat> geist, re riscv: have you seen https://www.notebookcheck.net/Intel-demos-Horse-Creek-developer-board-with-SiFive-RISC-V-CPU-DDR5-RAM-and-PCIe-5-0-slot.661186.0.html

21:43 <bslsk05> www.notebookcheck.net: Intel demos “Horse Creek” developer board with SiFive RISC-V CPU, DDR5 RAM and PCIe 5.0 slot - NotebookCheck.net News

21:43 <heat> Jari--, we've talked about this. no investor wants to build an OS

21:44 <heat> even embedded usually reuses other things

21:44 <geist> heat: yeah am very curious about that. will try to see if i can get one via work

21:44 <geist> hopefully it wont be super secret and if i do i can talk about it

21:44 <geist> but sometimes i have to be dark about it because NDA

21:45 <heat> noooo not darth geist!

21:45 <geist> omg i'd be the worst sith lord ever

21:47 <Jari--> heat: my plan is to borrow parts from open source projects, result is an open source project OS

21:48 <Jari--> I dont believe I have much chance on proprietary markets, it simply sucks, the idea behind it

21:48 <heat> but why

21:48 <heat> no one that wants to make money will build an OS

21:48 <mjg_> :]

21:48 <heat> it really is just a "fuck you" money kind of thing

21:48 <Jari--> one day my kids can dig it up, and use it

21:49 <Jari--> :)

21:50 <heat> metaverse blockchain operating system!!!!11!1

21:58 <Jari--> click here os

21:58 gog` has joined #osdev

21:59 misnor has quit [Ping timeout: 252 seconds]

21:59 <Jari--> android partly famous for its games

21:59 gog has quit [Ping timeout: 252 seconds]

21:59 gog` is now known as gog

22:03 <heat> running candy crush for my mom is the world's most critical task

22:11 <geist> the candy must flow

22:14 <geist> today's task: rearrange some VLANs at my home network to move the main LAN traffic off the untagged network

22:15 <heat> lk/fuchsia as a router/switch when?

22:15 <geist> currently using pfsense, but one day i'll use LK as my router!

22:20 kori has joined #osdev

22:22 <heat> if I had a cool embedded OS that runs everywhere I would make sure it ran everywhere

22:22 <heat> hmm, i wonder if lwip is good for routing?

22:22 <geist> yeah it's my general lack of focus that holds it back. too many projects at once

22:23 <geist> my general issue with lwip, at least last time i looked at it, is it's pretty un-performant

22:23 <geist> iirc its general strategy is to put everything in a queue and then iterate on the queue

22:23 <heat> steal BSD's stack

22:23 <heat> as is tradition

22:23 <gog> i will not use pfsense again

22:23 <geist> makes for a simple design and fairly easy to deal with, but there's no way that scales up to routing speed

22:23 <geist> gog: oh? you've had issues?

22:24 <gog> the wireguard scandal made me question their review practices

22:24 <geist> ah

22:24 <gog> for a distro intended for secure appliances i wouldn't even consider it for a business that needs to comply with PCI-DSS

22:24 <geist> that's fair. i'm just not sure i particularly trust anything else any better

22:24 <gog> yeah

22:25 dutch has quit [Ping timeout: 252 seconds]

22:25 <gog> that's the problem isn't it

22:25 <geist> also trying to figure out why my network traffic is much higher this month

22:25 mxshift has joined #osdev

22:25 <geist> peaking around 1.9TiB this month, which is much higher than usual

22:25 <gog> eep

22:26 <geist> the router seems to indicate that it's about even in/out which is suspicious

22:26 <gog> you got a torrent running somewhere you forgot about?

22:26 <heat> can it track per-host?

22:26 <geist> sort of. i have a plugin darkstat that can, but it seems to only go back a few days

22:27 <geist> and in that case nothing sticks out, mostly just traffic between me and google

22:27 <gog> were you working from home more than usual this month?

22:27 <heat> goma?

22:27 dutch has joined #osdev

22:27 <heat> plus whatever build caching solution you have for rust

22:27 <geist> well, that does have a fair amount to do with it. i'm looking at stuff coming from my work vlan recently

22:28 <geist> and yeah, like one work host has moved 45GiB in the last few days

22:28 <geist> 32GB on the 'out' path

22:28 <geist> but this makes sense: nightly backups for work machines, though they're usually deltas

22:29 <geist> also to note: xfinity adds up both in/out traffic for that 1.9

22:30 <geist> which is about what i see here: my router is showing 755GiB in, 617GiB out, and i restarted it last around october 10th so it missed the first 10 days of the month

22:30 <geist> i'm just mostly curious about that 617

22:34 sprock has joined #osdev

22:40 <sortie> SO. I debugged my weird crash. I found out the Descriptor object was alive but the Vnode it pointed to appears to have been freed and reused.

22:40 heat has quit [Remote host closed the connection]

22:40 <sortie> The file descriptor said it was an Unix socket

22:41 <geist> !!

22:41 heat has joined #osdev

22:41 <sortie> It's crashing in my solanum server calling ppoll(2)

22:41 <sortie> So I'm thinking I got a Unix socket refcount issue when they're being passed. I'm already aware of such a bug, saw a weird crash about that the other day when casually testing
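
[Editor's sketch] The class of bug sortie suspects (a live descriptor pointing at a freed vnode) typically comes from a holder that never took its own reference. The following is a minimal sketch of the invariant, not Sortix's code: the structs and every function name are hypothetical, and the counter is deliberately non-atomic for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified vnode: only the reference count matters here. */
struct vnode {
    unsigned refcount; /* not atomic; a real kernel would need atomics or a lock */
};

struct descriptor {
    struct vnode *vnode;
};

/* Create a vnode owned by exactly one holder. */
struct vnode *vnode_create(void)
{
    struct vnode *v = calloc(1, sizeof *v);
    if (!v)
        abort();
    v->refcount = 1;
    return v;
}

/* Every additional holder must take its own reference. */
struct vnode *vnode_ref(struct vnode *v)
{
    assert(v->refcount > 0);
    v->refcount++;
    return v;
}

/* Drop one reference; returns true if this was the last one and the vnode was freed. */
bool vnode_unref(struct vnode *v)
{
    assert(v->refcount > 0);
    if (--v->refcount == 0) {
        free(v);
        return true;
    }
    return false;
}

/* Passing a descriptor creates a second holder, so it takes a second reference. */
struct descriptor descriptor_pass(const struct descriptor *src)
{
    struct descriptor d = { vnode_ref(src->vnode) };
    return d;
}

/* Closing drops this holder's reference and clears the now-unusable pointer. */
bool descriptor_close(struct descriptor *d)
{
    bool freed = vnode_unref(d->vnode);
    d->vnode = NULL;
    return freed;
}
```

If `descriptor_pass` copied the pointer without calling `vnode_ref`, the first close would free the vnode while the other descriptor still pointed at it, which matches the symptom of memory that was "freed and reused".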

22:43 <gog> aaay

22:43 <sortie> But hey super duper impressive.. I had a failing VM with just a qemu monitor and a kernel without symbols, and I was able to rebuild a matching debug kernel, attach gdb, recover the thread frame of the crashed thread, and then do stack traces and lots of inspections to get a clue about what caused this super weird corruption

22:43 <sortie> Like this is a production system, not meant for debugging at all, and I was able to inspect it really powerfully and see what's going on inside it

22:44 <sortie> I restarted my irc.sortix.org network :)

22:44 <gog> sortix sophisticated

22:44 ids1024 has joined #osdev

22:44 <heat> careful with "rebuilding matching debug kernels"

22:44 <heat> pr

22:44 <sortie> Let's see if stays up, I mean this bug is rare

22:44 <sortie> heat, hmm?

22:44 <heat> particularly if you don't have reproducible builds

22:44 <sortie> Yeah at least my builds are pretty reproducible

22:45 <sortie> I was even able to build my kernel on Linux matching the binary that I got when I built it natively on Sortix

22:45 <heat> same sha?

22:45 <sortie> Probably not, didn't check, but offsets for all the major stuff matched

22:54 moberg1 has joined #osdev

22:54 moberg has quit [Ping timeout: 255 seconds]

23:01 potash has quit [Quit: ZNC 1.8.2 - https://znc.in]

23:05 Burgundy has left #osdev [#osdev]

23:08 <heat> inlining asan instrumentation makes my kernel 1MB larger

23:09 <heat> it also makes it wayyy more usable

23:10 <nick64> What is the reason for this error? https://postimg.cc/K3L1m4vR

23:10 <bslsk05> postimg.cc: Screenshot 2022 10 31 at 3 02 33 AM — Postimages

23:11 <heat> null pointer dereference

23:11 <heat> your kernel code is buggy

23:12 <heat> compile with KASAN+UBSAN and you might get more info

23:13 <nick64> I mean, why is kernel making the null pointer reference here? It is deliberately doing it to make it crash for security from smap, but I explicitly disabled security

23:14 <heat> because you're doing it?

23:14 <nick64> I am doing it, but just to be sure we are on same page, what do you mean by "it"?

23:15 <heat> *(char*)0x0

23:15 <nick64> To clarify: I am not doing NULL deref, kernel is just deliberately doing this error condition to make it crash to prevent hacking

23:16 <heat> where? why?

23:16 <nick64> What I can't figure out is, I have disabled security on kernel, and can't seem to understand why it is still crashing

23:16 <heat> linux doesn't have "hacking" prevention

23:16 <nick64> I am clearing bit 21 of CR4 register in the code

23:16 <nick64> that is a bit to prevent hacking

23:16 <heat> unless some weird ass security module you have is doing that

23:16 <heat> no, that has nothing to do with deref'ing a NULL

23:17 <nick64> From what I understand, kernel (the kernel thread) is deliberately taking the BUG branch that is usually taken for NULL Deref, to make the thread crash, and it is not an ACTUAL null deref

23:19 <heat> what kernel thread

23:19 <nick64> Whichever thread is running the code that is accessing usermode pages from kernel

23:19 <j`ey> if you've changed the code, to do something different from how linux does it now, maybe you need to change more things?

23:19 <nick64> But the actual question is, how do I ask the OS to temporarily disable smap security

23:19 <heat> nick64, stack trace pls

23:19 <heat> you don't

23:20 <nick64> I have done what is recommended in the OSDev wiki

23:20 <heat> you omitted the stack trace, the only relevant thing here

23:20 <nick64> Let me do a dump_stack real quick

23:21 <nick64> Wait, that's not what you want, you want the stack trace on crash.. let me screenshot that

23:21 <heat> the oops should have the trace

23:22 <nick64> yeah

23:26 <nick64> heat: https://postimg.cc/DmxKj7vT

23:26 <bslsk05> postimg.cc: stacktrace — Postimages

23:28 <heat> memcpy to a null address

23:28 <heat> your patch_code is wrong

23:31 <nick64> patch_code is just invoking kernels built in text patch function, not doing anything from my own implementation there: https://postimg.cc/bDM9rG0P

23:31 <bslsk05> postimg.cc: patch code — Postimages

23:32 archenoth has quit [Ping timeout: 248 seconds]

23:33 <heat> god that looks confusing

23:34 <nick64> That text_poke is a kernel built in function, if that sounded like something I have defined somewhere else : https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/text-patching.h#L44

23:34 <bslsk05> elixir.bootlin.com: text-patching.h - arch/x86/include/asm/text-patching.h - Linux source code (v6.0.6) - Bootlin

23:35 <nick64> Oh and those function pointers I am passing to it via that wrapper function patch_code is something that The Almighty Kernel itself hinted me, so that can't be wrong, right? : https://postimg.cc/0MgNNP63

23:35 <bslsk05> postimg.cc: fnptrs — Postimages

23:37 <nick64> This wiki says that SMAP is determined by a bit in CR4 so that is what I am trying to do to avoid this error in the original screenshot, which cries out loud that it has something to do with smap. It says it is 21st bit so I am just clearing it: https://wiki.osdev.org/CPU_Registers_x86-64#CR4

23:37 <bslsk05> wiki.osdev.org: CPU Registers x86-64 - OSDev Wiki
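
[Editor's sketch] The CR4 bit nick64 refers to can be expressed as a plain mask operation. This is a minimal sketch: the bit positions (SMEP at bit 20, SMAP at bit 21) follow the x86-64 CR4 layout on the linked wiki page, while the helper names are hypothetical and the code only manipulates a value, it does not read or write the real register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CR4 feature bits relevant to the discussion. */
#define X86_CR4_SMEP (UINT64_C(1) << 20) /* Supervisor Mode Execution Prevention */
#define X86_CR4_SMAP (UINT64_C(1) << 21) /* Supervisor Mode Access Prevention */

static_assert(X86_CR4_SMAP == 0x200000, "SMAP is bit 21 of CR4");

/* True if the given CR4 value has SMAP enabled. */
bool cr4_smap_enabled(uint64_t cr4)
{
    return (cr4 & X86_CR4_SMAP) != 0;
}

/* Return the CR4 value with only the SMAP bit cleared; all other bits are kept. */
uint64_t cr4_without_smap(uint64_t cr4)
{
    return cr4 & ~X86_CR4_SMAP;
}
```

SMAP only affects supervisor accesses to user pages that are actually mapped; an access to an unmapped address faults whether or not this bit is set, so clearing it would not be expected to hide a genuine NULL dereference.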

23:37 <heat> I don't know, have you checked?

23:37 <gog> nick64: "=m" (new_cr4) should be "+m"

23:37 <heat> why are you touching smap anyway?

23:37 <gog> actually no nvm

23:37 <gog> you don't read it

23:38 <heat> smap is so you don't touch user memory in kernel space without it being deliberate

23:40 <nick64> I am not particularly doing SMAP, but the kernel seems to be complaining of SMAP, so figured it may be a side effect of modifying kernel text section from a module insmod'ed from userspace

23:41 * klange smells linux talk

23:41 <gog> no, the kernel is being modified from kernel space

23:41 <gog> the module becomes part of kernel space

23:42 <nick64> Yeah, beats me why it is complaining smap, was trying to shut the kernel up about it anyway by disabling it

23:42 <gog> what's happening is the kernel is trying to dereference a pointer at 0x0000000000000023

23:42 <gog> which is in a userspace VMA

23:42 <gog> idk how SMAP works exactly so i'm gonna read for a minute

23:42 <nick64> I assumed that error message would be generic smap fault error message

23:42 <nick64> not sure

23:43 <heat> please show all the errors you're getting

23:43 <nick64> because 23 is definitely not null

23:43 <nick64> The first screenshot has all errors

23:43 <nick64> let me find that

23:43 <heat> 0x23 is ((struct blah*)0)->field for sure
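
[Editor's sketch] heat's point is that a small nonzero fault address is usually a field offset added to a NULL base pointer. The following is a minimal sketch; `struct blah` mirrors heat's placeholder name, and its layout is invented purely so that the field lands at offset 0x23.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the padding stands in for whatever precedes the field. */
struct blah {
    char padding[0x23];
    char field;
};

static_assert(offsetof(struct blah, field) == 0x23, "field sits at offset 0x23");

/* Address the CPU would touch for base->field, computed without dereferencing. */
uintptr_t field_address(uintptr_t base)
{
    return base + offsetof(struct blah, field);
}
```

With a NULL base the computed address is 0x23, which is why a fault at a small address such as this one is normally reported and treated as a NULL pointer dereference.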

23:44 <nick64> Oh okay got it, but still think that is generic error path taking for smap failure

23:44 <nick64> This is the error message : https://postimg.cc/K3L1m4vR

23:44 <nick64> Rest all is just stacktrace and registers

23:45 <nick64> Oh I will share the error on the insmod command

23:46 <nick64> Actually nothing much there really https://postimg.cc/kRKYdN0z

23:46 <bslsk05> postimg.cc: insmod err — Postimages

23:48 archenoth has joined #osdev