klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
thinkpol has quit [Remote host closed the connection]
moatx has quit [Quit: Leaving]
thinkpol has joined #osdev
epony has quit [Ping timeout: 252 seconds]
epony has joined #osdev
moatx has joined #osdev
[itchyjunk] has quit [Remote host closed the connection]
<geist> huh my driver has a problem with the i219 that's built into the alder lake machine
amine has quit [Quit: Ping timeout (120 seconds)]
<geist> lets see what the diff is between i210 and i217 and i219
<geist> yay
[itchyjunk] has joined #osdev
amine has joined #osdev
xalarm has quit [Quit: joins libera]
<geist> oh interesting, https://pci-ids.ucw.cz/read/PC/8086 says it's a i219-V but it's different than the raw i219 manual
<bslsk05> ​pci-ids.ucw.cz: PCI Devices
<geist> since it's integrated into the cpu i guess it's some embedded versions
<heat> nic integrated into the cpu?
<geist> well, on the SOC i guess
<geist> since it shows up at address 0:1f.0
<geist> otherwise it'd be on the other side of a pci bridge i guess
<heat> ah yeah
<heat> what will happen when intel runs out of pci device id space?
<heat> they're getting remarkably close
moatx has quit [Quit: Leaving]
<heat> ah I think they also own 8087
<geist> oh i see. i think it's because it's integrated, it doesn't seem to have an EEPROM programming model
<geist> and thus the code i have that tries to read the mac address instantly locks up because it is getting nothing back
<kazinsal> ah, yeah, on chips without an eeprom I think you can just grab it from bar0+0x5400
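A minimal C sketch of kazinsal's suggestion, assuming the classic e1000 register layout in which Receive Address Low/High entry 0 (RAL0/RAH0) sit at BAR0 + 0x5400 and BAR0 + 0x5404 and firmware has already loaded the MAC there. Whether that holds on this particular Alder Lake i219 is not confirmed in the discussion; `mmio_read32`, `e1000_decode_mac` and `e1000_read_mac` are hypothetical names, not taken from any driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define E1000_RAL0   0x5400u       /* Receive Address Low, entry 0 */
#define E1000_RAH0   0x5404u       /* Receive Address High, entry 0 */
#define E1000_RAH_AV (1u << 31)    /* Address Valid bit in RAH */

/* Hypothetical helper: read a 32-bit register from an already mapped BAR0. */
static uint32_t mmio_read32(const volatile void *bar0, uint32_t offset)
{
    assert((offset & 3u) == 0);    /* registers are 32-bit aligned */
    return *(const volatile uint32_t *)((const volatile uint8_t *)bar0 + offset);
}

/* Unpack a MAC address from RAL/RAH values. Returns false if the
 * Address Valid bit is clear, i.e. nothing usable was loaded. */
bool e1000_decode_mac(uint32_t ral, uint32_t rah, uint8_t mac[6])
{
    if (!(rah & E1000_RAH_AV))
        return false;
    mac[0] = (uint8_t)(ral >> 0);
    mac[1] = (uint8_t)(ral >> 8);
    mac[2] = (uint8_t)(ral >> 16);
    mac[3] = (uint8_t)(ral >> 24);
    mac[4] = (uint8_t)(rah >> 0);
    mac[5] = (uint8_t)(rah >> 8);
    return true;
}

/* Read the MAC straight from the receive address registers instead of
 * going through the EEPROM read path that locks up on this part. */
bool e1000_read_mac(const volatile void *bar0, uint8_t mac[6])
{
    return e1000_decode_mac(mmio_read32(bar0, E1000_RAL0),
                            mmio_read32(bar0, E1000_RAH0), mac);
}
```

A driver could try this first and fall back to its EEPROM or flash path only when the valid bit is clear.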
<geist> the i219 manual is confusing as heck. it almost looks like the registers are laid out completely differently or something
<kazinsal> the one that just jumps straight into the PHY registers at the beginning of Register Definitions?
<geist> yes!
<geist> as if you're supposed to access it via 2 phy banks or whatnot
<geist> there's some leap of logic i have to make there to grok what they're talking about
<kazinsal> yeah, something is very odd about how Intel built the i219 as even on the Ark site it says the i219 is *just* a PHY
<geist> exactly, does it as a result use a different driver? (/me boots back into linux to see what it does)
<kazinsal> I think it still uses e1000e
<geist> yeah i'm wondering if everything (EEPROM aside) is really mapped the same way, they just switched to a different mechanism to describe it
<geist> in terms of PHY addresses and whatnot
<geist> yeah seems to be e1000e
<bslsk05> ​IRCCloud pastebin | Raw link: https://irccloud.com/pastebin/raw/5zLADT8E
<bslsk05> ​IRCCloud pastebin | Raw link: https://irccloud.com/pastebin/raw/gcwUplWg
<geist> so i think it's an i219 based on the string, though it's unclear what (17) means in this case
[itchyjunk] has quit [Ping timeout: 250 seconds]
<heat> e1000e best driver
<geist> there's another set of pci ids on the pci id site that has a (16)
<kazinsal> it's related to the revision and whether it's a -V or -LM I think
<heat> "how many nics do you want to support?" "yes"
<kazinsal> the -V is just an -LM that doesn't support VT-d passthrough
<kazinsal> my 8700K machine has an I219-V rev 00
<kazinsal> 8086:15B8
<geist> oh that makes sense. the i219 manual i found on the interwebs just describes the discrete chip version of it i think and has 0x156f and 0x15b7
<heat> what's VT-d again?
<geist> oh hah even better, a 15b8
<kazinsal> VT-d is PCI passthrough
<geist> oh yeah if you look at https://pci-ids.ucw.cz/read/PC/8086 there are a butt-ton of i219 variants, each with a (N) after it
<geist> goddamnit intel
<geist> why do they keep rolling device ids for minor changes? I guess maybe that's what you're supposed to do, but seems like if it's basically compatible you should report the same id with a features/revision field inside it
<kazinsal> looks like one new one for every PCH change, god damn
<geist> they do this with everything else too, but at least something like AHCI you can match based on a class/subclass thing
<bslsk05> ​IRCCloud pastebin | Raw link: https://irccloud.com/pastebin/raw/5oD11mFH
<geist> yah so the specific one i have here is E1000_DEV_ID_PCH_ADP_I219_V17
[itchyjunk] has joined #osdev
<heat> are we skipping the fact that intel's naming makes no sense
<heat> for everything
myon98 has quit [*.net *.split]
duckworld has quit [*.net *.split]
myon98 has joined #osdev
duckworld has joined #osdev
<bslsk05> ​github.com: linux/netdev.c at master · torvalds/linux · GitHub
<kazinsal> yeah
<geist> and via some indirects leads to a bunch of descriptors of stuff
<geist> and pointers to accessor routines. yay.
<geist> well okay, this driver will not be fixed today then
<heat> that's the i915 syndrome
<geist> yah and since there's enough subtle variations, it's not even safe to generally set your driver to bind to something you haven't personally seen
<kazinsal> yeah, the official e1000e driver is a bit of a mess
Affliction has quit [*.net *.split]
<geist> but then since there are like a hundred variants you never see all of them
Affliction has joined #osdev
<kazinsal> and a lot of the variants just boil down to "same shit, different coat of paint"
<heat> the freebsd one is also horrible btw
<geist> the gist at least for reading the mac address is apparently you have to read it out of a nvram. i bet it is ultimately as kazinsal says: there's some flash nvram mapped into the aperture (which is larger than i'd seen in the past, 128K) and there's some structure to it
<geist> that the i219 manual is describing. so probably you just directly read the flash structure. seems to be what these accessor routines are doing in the intel manual
duckworld has quit [*.net *.split]
Affliction has quit [*.net *.split]
duckworld has joined #osdev
duckworld has quit [Max SendQ exceeded]
Affliction has joined #osdev
Affliction has quit [Max SendQ exceeded]
duckworld has joined #osdev
Affliction has joined #osdev
<kazinsal> aha! found something relevant in the bsd em driver
<bslsk05> ​github.com: src/if_em.c at master · openbsd/src · GitHub
<bslsk05> ​IRCCloud pastebin | Raw link: https://irccloud.com/pastebin/raw/usyE3VEh
ephemer0l has quit [Remote host closed the connection]
<geist> yah the pch ones i guess are the builtins and the tcp/adp/cnp stuff is the family name
<geist> tiger lake, alder lake, etc
nyah has quit [Ping timeout: 276 seconds]
spikeheron has quit [Quit: WeeChat 3.7.1]
spikeheron has joined #osdev
vdamewood has quit [Read error: Connection reset by peer]
vdamewood has joined #osdev
gog has quit [Ping timeout: 250 seconds]
kof123 has joined #osdev
<heat> weird q: is there a problem in doing OOB reads on string functions (strlen, etc)?
<zid> yes?
<zid> Like, very a lot
<heat> I was trying to be precise in my KASAN bounds (as in making it exactly the size the caller asked for, and not the rounded up obj size) but strlen is triggering it
<heat> and it's not possible to not trigger it and still have a fast implementation
<heat> reading more has zero side effects but triggers kasan
* zid puts the string at the end of a page
<zid> zero effects other than that page fault.
<heat> which is why you align the pointer
<heat> you're never getting an unwanted page fault reading 8 bytes if your pointer is 8 byte aligned
<zid> okay then *puts a string on his gameboy stack*
<zid> it might be a viable optim on x86 if you can prove a bunch of stuff for sure
<zid> but it isn't.. universally legal
<heat> it's not prove, it's that you read word-sized chunks on word-sized boundaries
epony has quit [Ping timeout: 252 seconds]
<heat> you obv need to manually align the ptr in strlen itself
<bslsk05> ​github.com: Onyx/strlen.cpp at master · heatd/Onyx · GitHub
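The word-at-a-time approach heat describes can be sketched in C as follows. This is an illustration, not the linked Onyx code: `wordwise_strlen` and `word_has_zero_byte` are hypothetical names and an 8-byte word is assumed. The aligned loads never straddle a page, which is the point heat makes, but they can still read bytes past the terminator. ISO C does not define that, as zid notes, and it is exactly what a byte-precise KASAN flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ull
#define HIGHS 0x8080808080808080ull

/* Nonzero iff at least one byte of w is zero (classic bit trick). */
static int word_has_zero_byte(uint64_t w)
{
    return ((w - ONES) & ~w & HIGHS) != 0;
}

size_t wordwise_strlen(const char *s)
{
    const char *p = s;

    assert(s != NULL);

    /* Step byte by byte until p sits on an 8-byte boundary. */
    while ((uintptr_t)p % sizeof(uint64_t) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Aligned 8-byte loads: an aligned word never crosses a page, so
     * this cannot fault on an unmapped neighbour page, but it may read
     * up to 7 bytes beyond the terminator. */
    for (;;) {
        uint64_t w;
        memcpy(&w, p, sizeof w);
        if (word_has_zero_byte(w))
            break;
        p += sizeof w;
    }

    /* The terminator is somewhere in this word; locate it bytewise. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

Under a sanitizer, the usual options are the one heat settles on (exempt the function from instrumentation) or a plain bytewise loop in sanitized builds.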
epony has joined #osdev
epony has quit [Ping timeout: 252 seconds]
flx has quit [Remote host closed the connection]
flx has joined #osdev
<heat> i'm just adding an ASAN exception for strlen I guess
vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
zaquest has quit [Remote host closed the connection]
zaquest has joined #osdev
Rubikoid has quit [*.net *.split]
weinholt has quit [*.net *.split]
Ermine has quit [*.net *.split]
eau has quit [*.net *.split]
jjuran has quit [*.net *.split]
klange has quit [*.net *.split]
merry has quit [*.net *.split]
kori has quit [*.net *.split]
DoubleJ has quit [*.net *.split]
dminuoso has quit [*.net *.split]
moberg1 has quit [*.net *.split]
colona has quit [*.net *.split]
Mutabah has quit [*.net *.split]
seer has quit [*.net *.split]
CompanionCube has quit [*.net *.split]
_xor has quit [*.net *.split]
alpha2023 has quit [*.net *.split]
Rubikoid has joined #osdev
klange has joined #osdev
jjuran has joined #osdev
CompanionCube has joined #osdev
colona has joined #osdev
DoubleJ has joined #osdev
Oshawott has joined #osdev
Ermine has joined #osdev
moberg has joined #osdev
merry has joined #osdev
seer has joined #osdev
dminuoso has joined #osdev
archenoth has quit [Ping timeout: 272 seconds]
Mutabah has joined #osdev
heat has quit [Ping timeout: 246 seconds]
Terlisimo has quit [Quit: Connection reset by beer]
sprock has quit [Remote host closed the connection]
Terlisimo has joined #osdev
darkstardevx has quit [*.net *.split]
ZipCPU has quit [*.net *.split]
ZipCPU has joined #osdev
darkstardevx has joined #osdev
Griwes has quit [Excess Flood]
sprock has joined #osdev
Griwes has joined #osdev
kaichiuchi has quit [*.net *.split]
danlarkin has quit [*.net *.split]
sham1 has quit [*.net *.split]
stephe has quit [*.net *.split]
DanDan has quit [*.net *.split]
ElementW has quit [*.net *.split]
sm2n has quit [*.net *.split]
gjnoonan has quit [*.net *.split]
exec64 has quit [*.net *.split]
alethkit has quit [*.net *.split]
milesrout has quit [*.net *.split]
alecjonathon has quit [*.net *.split]
noeontheend has quit [*.net *.split]
ddevault has quit [*.net *.split]
tom5760 has quit [*.net *.split]
ggherdov has quit [*.net *.split]
cultpony has quit [*.net *.split]
dragestil has quit [*.net *.split]
phr3ak has quit [*.net *.split]
w41 has quit [*.net *.split]
Luci-ghoule has quit [*.net *.split]
mxshift has quit [*.net *.split]
ElementW has joined #osdev
phr3ak has joined #osdev
noeontheend has joined #osdev
tom5760 has joined #osdev
alethkit has joined #osdev
gjnoonan has joined #osdev
exec64 has joined #osdev
alecjonathon has joined #osdev
ddevault has joined #osdev
cultpony has joined #osdev
dragestil has joined #osdev
sm2n has joined #osdev
milesrout has joined #osdev
sham1 has joined #osdev
[itchyjunk] has quit [Remote host closed the connection]
epony has joined #osdev
xalarm has joined #osdev
<Jari--> morning /s /w /p
Jari-- has quit [Quit: apt upgrade && reboot]
xalarm has quit [Read error: Connection reset by peer]
varad has quit [Quit: Ping timeout (120 seconds)]
varad has joined #osdev
merry has quit [Ping timeout: 252 seconds]
klange has quit [Ping timeout: 252 seconds]
klange has joined #osdev
merry has joined #osdev
Terlisimo has quit [Ping timeout: 252 seconds]
Terlisimo has joined #osdev
<geist> oh no the sortix irc server is down
GeDaMo has joined #osdev
gildasio has quit [Remote host closed the connection]
fkrauthan has quit [Quit: ZNC - https://znc.in]
dragestil has quit [Ping timeout: 252 seconds]
dragestil has joined #osdev
fkrauthan has joined #osdev
sham1 has quit [Ping timeout: 252 seconds]
fkrauthan has quit [Client Quit]
sham1 has joined #osdev
fkrauthan has joined #osdev
Oshawott has quit [Ping timeout: 250 seconds]
archenoth has joined #osdev
epony has quit [Ping timeout: 252 seconds]
epony has joined #osdev
vai has joined #osdev
vai is now known as Jari--
<Jari--> good morning everyone
<klys> ack good night (morning jari--)
<GeDaMo> Good early morning, the clocks just went back :P
FireFly has joined #osdev
<geist> oh woot. another week before the clocks reset here
<Griwes> Yeah dst is not only stupid but also stupidly shifted by a week between europe and NA
<GeDaMo> Most (all?) of Europe uses the last weekends in March and October
<GeDaMo> "It includes all countries in the European Economic Area (EEA), except Iceland." https://www.timeanddate.com/time/europe/
<bslsk05> ​www.timeanddate.com: Time Zones and DST in Europe
<zid> no more daylight for 6 months, rip
<kof123> i was expecting an "iceland has no concept of time, nor trees, so doesn't need time zones" joke
vdamewood has joined #osdev
elastic_dog has quit [Ping timeout: 240 seconds]
elastic_dog has joined #osdev
poisone has joined #osdev
<FireFly> I mean, effectively :p
<FireFly> I know some people set Iceland as time locale to lock their time into UTC
<zid> I'd just.. set it to UTC
<zid> I wonder if that's possible in systemd
<zid> ah yea it appears timedatectl is just making a symlink
<zid> for openrc it's echo "Europe/Brussels" > /etc/timezone or whatever, and there are UTC files you can straight use rather than country files
<zid> /usr/share/zoneinfo/UTC (or UCT if you're a frencho I guess)
<zid> no idea what GMT+0, GMT-0 and GMT0 are though..
<zid> maybe just aliases
xenos1984 has quit [Ping timeout: 246 seconds]
xenos1984 has joined #osdev
<Griwes> I'd hope those are aliases lol
<zid> the MD5s are all the same, phew
<zid> 'GB' and 'GB-Eire' are different though
weinholt has joined #osdev
<zid> might be interesting to write some code that can parse out some of the info from these files at some point
<bslsk05> ​datatracker.ietf.org: RFC 8536 - The Time Zone Information Format (TZif)
<zid> yea
<zid> implementing tzif2
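As a starting point for the parser zid mentions, here is a small C sketch of decoding the 44-byte TZif header laid out in RFC 8536: the magic "TZif", a version byte, 15 reserved bytes, then six big-endian 32-bit counts. The struct and function names are hypothetical. `tzif_v1_block_size` computes the length of the version-1 data block, which is what a reader skips to reach the second (64-bit) header in a version 2 or later file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TZIF_HEADER_SIZE 44u

struct tzif_header {
    uint8_t  version;   /* 0x00, '2', '3', ... */
    uint32_t isutcnt;   /* UT/local indicators */
    uint32_t isstdcnt;  /* standard/wall indicators */
    uint32_t leapcnt;   /* leap-second records */
    uint32_t timecnt;   /* transition times */
    uint32_t typecnt;   /* local time type records */
    uint32_t charcnt;   /* bytes of time zone designations */
};

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer is too short or the magic is wrong. */
int tzif_parse_header(const uint8_t *buf, size_t len, struct tzif_header *out)
{
    assert(buf != NULL && out != NULL);
    if (len < TZIF_HEADER_SIZE || memcmp(buf, "TZif", 4) != 0)
        return -1;

    out->version = buf[4];
    const uint8_t *counts = buf + 20;   /* skip magic, version, 15 reserved bytes */
    out->isutcnt  = be32(counts + 0);
    out->isstdcnt = be32(counts + 4);
    out->leapcnt  = be32(counts + 8);
    out->timecnt  = be32(counts + 12);
    out->typecnt  = be32(counts + 16);
    out->charcnt  = be32(counts + 20);
    return 0;
}

/* Size in bytes of the version-1 data block that follows the first header. */
size_t tzif_v1_block_size(const struct tzif_header *h)
{
    return (size_t)h->timecnt * 4      /* transition times (32-bit) */
         + (size_t)h->timecnt          /* transition type indices */
         + (size_t)h->typecnt * 6      /* local time type records */
         + (size_t)h->charcnt          /* designation strings */
         + (size_t)h->leapcnt * 8      /* leap-second records (4 + 4) */
         + (size_t)h->isstdcnt
         + (size_t)h->isutcnt;
}
```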
<zid> Oh right.. I think my books are going to come out an hour later now? :(
<zid> hopefully it's an hour earlier and I can't tell time
<GeDaMo> Books?
dragestil_ has joined #osdev
dragestil has quit [Ping timeout: 252 seconds]
dragestil_ is now known as dragestil
fkrauthan_ has joined #osdev
fkrauthan has quit [Ping timeout: 252 seconds]
fkrauthan_ is now known as fkrauthan
epony has quit [Ping timeout: 252 seconds]
wootehfoot has joined #osdev
epony has joined #osdev
k0valski1889 has joined #osdev
<Mondenkind> ohai FireFly. Didn't realise you were here
<Mondenkind> and in #tendra!
puck has quit [Excess Flood]
puck has joined #osdev
DanDan has joined #osdev
<FireFly> oh heya
<FireFly> zid: yeah, more for cases where that's not (easily) possible, like mobile phones or whatever where the interface wants you to pick a location
ss4 has joined #osdev
wootehfoot has quit [Ping timeout: 252 seconds]
ss4 has quit [Ping timeout: 246 seconds]
nyah has joined #osdev
chartreuse has quit [Ping timeout: 260 seconds]
potash has quit [Quit: ZNC 1.8.2 - https://znc.in]
poisone has quit [Remote host closed the connection]
nick64 has joined #osdev
<nick64> `__asm__ volatile ("clac");` is sufficient to disable SMAP right? Or do I have to edit the EFLAGS as well?
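The question went unanswered in the channel. As far as the architecture goes, `clac` clears EFLAGS.AC, which puts SMAP enforcement back on rather than turning it off; `stac` sets AC and opens a temporary window for explicit supervisor accesses to user pages, and clearing CR4.SMAP switches the feature off entirely. The C sketch below only models that bit logic so it can run anywhere (the real instructions are usable only at CPL 0); the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR4_SMAP  (1ull << 21)   /* CR4 bit 21: SMAP enable */
#define X86_EFLAGS_AC (1ull << 18)   /* EFLAGS bit 18: alignment check / access control */

/* True when an explicit supervisor-mode data access to a user page
 * would fault because of SMAP. */
bool smap_blocks_user_access(uint64_t cr4, uint64_t rflags)
{
    return (cr4 & X86_CR4_SMAP) != 0 && (rflags & X86_EFLAGS_AC) == 0;
}

/* Model of `clac`: clear AC, so SMAP is enforced again. */
uint64_t rflags_after_clac(uint64_t rflags)
{
    return rflags & ~X86_EFLAGS_AC;
}

/* Model of `stac`: set AC, so user accesses are temporarily allowed. */
uint64_t rflags_after_stac(uint64_t rflags)
{
    return rflags | X86_EFLAGS_AC;
}
```

The usual kernel pattern is `stac`, touch user memory, `clac`, keeping the window as short as possible.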
epony has quit [Ping timeout: 252 seconds]
epony has joined #osdev
potash has joined #osdev
poisone has joined #osdev
mahk has quit [Ping timeout: 268 seconds]
k0valski18896 has joined #osdev
k0valski1889 has quit [Ping timeout: 252 seconds]
k0valski18896 is now known as k0valski1889
* sortie debugs the way root intended, the only primitives available is to print the current registers, and to hexdump memory addresses; proceeds to manually do a stack trace, having rebuilt a similar binary with symbols, which I manually inspect with shell commands to manually calculate offsets
<sortie> I really need to invest in building a stable kernel debugging solution for my OS, but this qemu monitor fallback is at least theoretically able to provide all the information, if I behave like gdb in my mind
<sortie> It's one of those days when one of my production VMs faulted on a super rare kernel bug, so VNC'd into the qemu to debug it
<sortie> int sys_ppoll(struct pollfd* user_fds, size_t nfds, const struct timespec* user_timeout_ts, const sigset_t* user_sigmask)
<sortie> ...
<sortie> if ( desc->poll(&ctx, node) == 0 )
<sortie> ^^^ https://gitlab.com/sortix/sortix/-/blob/b6bbc05eafba25dfba2bc8fc7fa13bdc2f3aeb85/kernel/poll.cpp#L295 ← The stack trace continues, the desc and inode call frames got tail optimized away
<bslsk05> ​gitlab.com: kernel/poll.cpp · b6bbc05eafba25dfba2bc8fc7fa13bdc2f3aeb85 · sortix / Sortix · GitLab
<sortie> Let's see if I can somehow untangle all of this assembly and see if I can examine the data structures
Jari-- has quit [Ping timeout: 246 seconds]
epony has quit [Remote host closed the connection]
<sortie> Hooo boy that's cursed
epony has joined #osdev
potash has quit [Read error: Connection reset by peer]
potash has joined #osdev
<klange> yeah, kernel issues with threads that never really cropped up in my dumb tests
gog has joined #osdev
elastic_dog has quit [Remote host closed the connection]
elastic_dog has joined #osdev
bauen1 has quit [Quit: leaving]
bauen1 has joined #osdev
bauen1 has quit [Remote host closed the connection]
awita has joined #osdev
isaacwoods has joined #osdev
poisone has quit [Remote host closed the connection]
vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
epony has quit [Ping timeout: 252 seconds]
epony has joined #osdev
awita has quit [Ping timeout: 252 seconds]
spikeheron has quit [Quit: WeeChat 3.7]
dutch has joined #osdev
poisone has joined #osdev
heat has joined #osdev
[itchyjunk] has joined #osdev
vai has joined #osdev
vai is now known as Jari--
<Jari--> howdy
<heat> mfw i can't tell if it's the sanitizer that's buggy or the code that's buggy
gog` has joined #osdev
gog has quit [Ping timeout: 252 seconds]
awita has joined #osdev
bauen1 has joined #osdev
poisone has quit [Ping timeout: 272 seconds]
<zid> I have pizza, so your comment is irrelevant
_awita_ has joined #osdev
awita has quit [Read error: Connection reset by peer]
gog` has quit [Remote host closed the connection]
gog` has joined #osdev
gog` is now known as gog
poisone has joined #osdev
<zid> me_irl
<GeDaMo> # I'm as honest as the day is long; the longer the daylight, the less I do wrong # :P
<bslsk05> ​'Madness - Shut Up (Official Video)' by Madness (00:03:28)
<gog> made this because we deploy tomorrow and i'm going to need it because i have a major feature going live
Jari-- has quit [Remote host closed the connection]
<gog> :P
lkurusa has joined #osdev
lkurusa has quit [Remote host closed the connection]
epony has quit [Ping timeout: 252 seconds]
epony has joined #osdev
netbsduser has joined #osdev
<bslsk05> ​eloncodereview.com: Elon Code Review - Print it out and Elon Musk will check it
_xor has joined #osdev
theWeaver has joined #osdev
<gog> lmao
<gog> how would you even satisfy this request
<gog> print the last n days of PR's?
<gog> elon really has no clue whatsoever how programming works
dutch has quit [Quit: WeeChat 3.7]
<theWeaver> wait wait wait
<theWeaver> did he actually tell ppl to print their code
<gog> yes
<theWeaver> smh
<gog> for him to review
<theWeaver> LOL
<bslsk05> ​twitter: <CaseyNewton> Here it is: Twitter engineers were told today to *print out* their last 30 to 60 days of code, so they could show it to Elon Musk himself. ␤ ␤ Then they were told wait, no, actually, please shred all that code you just printed out. ␤ ␤ Subscribe to read ➡️ https://www.platformer.news/p/elon-takes-over-twitter https://pbs.twimg.com/media/FgMOJdmUAAAiR0T.jpg
<bslsk05> ​[According to four current employees, engineers spent Friday afternoon at Twitter dutifully printing out their code in anticipation of meetings with Musk and some of his senior engineers from Tesla. Other engineers were told to prepare for “code pairing” with Musk, in which they would sit with him and review code together. ␤ ␤ Just after noon, an executive assistant asked engineers to begin preparing code to show to Musk. “Please print out
Raito_Bezarius has quit [Quit: free()]
<theWeaver> lmfao
<gog> half of my code changes are deletions, how do you represent that
<gog> or does he just want to see the current state
<gog> what does it even mean
<theWeaver> the - lines on the diffs
<theWeaver> i guess
<gog> that was my intuition too
<gog> just print out your last n days of diffs
<gog> but if it's git format then you only have 3 lines of context
<theWeaver> i knew he was a moron but i wasnt expecting this
<theWeaver> he will have to infer the broader context himself
<theWeaver> ¯\_(ツ)_/¯
<gog> anyhow, when my wife told me this story i was like "what the fuck does that even mean"
<theWeaver> it means "musk is a freaking idiot"
<gog> yes
* zid shreds all of his code too in solidarity
<zid> sorry to all 4 users
<theWeaver> lol
dude12312414 has joined #osdev
<gog> that's infinity times as many users as my code
<zid> idk how many users it has it got pirated
<zid> I gave it to 4 people
xenos1984 has quit [Read error: Connection reset by peer]
<bslsk05> ​stackoverflow.com: Why DLL is not imported in the python script as the other libraries? - Stack Overflow
<zid> hey there's a nonsense stackoverflow question about it even
<zid> ctrl-f zid
<zid> possibly, anyway
<zid> TIL bing indexes this channel's logs and google doesn't
fatal1ty has joined #osdev
dude12312414 has quit [Ping timeout: 258 seconds]
sikkiladho has joined #osdev
<gog> bing
<zid> I always want to say "intel inside" but with swapped to bing, so like.. bingle inside
dude12312414 has joined #osdev
xenos1984 has joined #osdev
MrBonkers has quit [Quit: ZNC 1.8.2+deb2build5 - https://znc.in]
_Agua_ has joined #osdev
MrBonkers has joined #osdev
_awita_ has quit [Read error: Connection reset by peer]
dutch has joined #osdev
<bslsk05> ​aminet.net: Aminet - mods/voice/IntelOut.lha
epony has quit [Ping timeout: 252 seconds]
epony has joined #osdev
tomaw has quit [Quit: Quitting]
sm2n has quit [Ping timeout: 255 seconds]
sm2n has joined #osdev
mahk has joined #osdev
tomaw has joined #osdev
xenos1984 has quit [Ping timeout: 246 seconds]
xenos1984 has joined #osdev
xenos1984 has quit [Ping timeout: 276 seconds]
Raito_Bezarius has joined #osdev
Raito_Bezarius has quit [Max SendQ exceeded]
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
_xor has quit [Ping timeout: 252 seconds]
_xor has joined #osdev
xenos1984 has joined #osdev
Raito_Bezarius has joined #osdev
Raito_Bezarius has quit [Max SendQ exceeded]
kaichiuchi has joined #osdev
_Agua_ has quit [Quit: Leaving]
Raito_Bezarius has joined #osdev
<heat> my scheduler is ASAN unfriendly because it doesn't unwind the stack when context switching
<heat> ://
<heat> this is legitimately annoying
<heat> asan stack poisoning just poisons, but doesn't unpoison valid shadow areas
<fatal1ty> heat hi
<heat> my code assumed it could jump around with the stack without unwinding, but now it's just making valid stack areas have poisoned bytes ://
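One common way around the problem heat describes is to clear the shadow for the dead part of a stack whenever the stack pointer is moved without running the function epilogues that would normally unpoison their redzones. The C sketch below models that on a tiny simulated address range, with one shadow byte per 8 bytes of memory as in ASan. `SIM_BASE`, `SIM_SIZE`, `shadow_of` and `stack_unpoison_below` are hypothetical; a real kernel would compute the shadow address from the real one and keep this routine uninstrumented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SHADOW_SHIFT       3u        /* one shadow byte covers 8 bytes */
#define SIM_BASE           0x10000u  /* hypothetical base of simulated memory */
#define SIM_SIZE           0x1000u   /* hypothetical size of simulated memory */
#define STACK_LEFT_REDZONE 0xf1u     /* shadow value ASan uses for a stack left redzone */

static uint8_t sim_shadow[SIM_SIZE >> SHADOW_SHIFT];

/* Shadow byte describing the 8-byte granule that contains addr. */
uint8_t *shadow_of(uintptr_t addr)
{
    assert(addr >= SIM_BASE && addr < SIM_BASE + SIM_SIZE);
    return &sim_shadow[(addr - SIM_BASE) >> SHADOW_SHIFT];
}

/* Mark everything in [stack_low, sp) as addressable again. The stack
 * grows down, so this range holds only dead frames once execution
 * resumes at sp. sp is rounded down to a granule boundary so shadow
 * shared with live data is left alone. */
void stack_unpoison_below(uintptr_t stack_low, uintptr_t sp)
{
    uintptr_t end = sp & ~(uintptr_t)7;

    assert(stack_low % 8 == 0 && stack_low <= end);
    assert(stack_low >= SIM_BASE && end <= SIM_BASE + SIM_SIZE);
    if (end > stack_low)
        memset(shadow_of(stack_low), 0, (end - stack_low) >> SHADOW_SHIFT);
}
```

In a scheduler, the natural call sites would be wherever a thread abandons frames: thread exit, stack reuse, or a context switch that resets the stack pointer.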
<heat> hi
<fatal1ty> poisoning, validity ...
<fatal1ty> i like you more and more by the minute
<fatal1ty> a real 2022 gangsta
<fatal1ty> im in
<fatal1ty> RofL
<fatal1ty> just to make it clear for everybody
<fatal1ty> whenever i be in a position when i need help with security stuff, i want to see all the gangsters here come with solutions
<fatal1ty> but right now i just brushing on my C
jjuran has quit [Quit: Killing Colloquy first, before it kills me…]
MrPortmaster has joined #osdev
jjuran has joined #osdev
MrBonkers has quit [Read error: Connection reset by peer]
fatal1ty has quit [Ping timeout: 276 seconds]
fatal1ty has joined #osdev
nur has quit [Ping timeout: 264 seconds]
poisone has quit [Remote host closed the connection]
sikkiladho has quit [Quit: Connection closed for inactivity]
epony has quit [Ping timeout: 252 seconds]
epony has joined #osdev
gog` has joined #osdev
gog has quit [Read error: Connection reset by peer]
gog` is now known as gog
misnor has joined #osdev
chartreuse has joined #osdev
dutch has quit [Quit: WeeChat 3.7]
dutch has joined #osdev
_xor has quit [Ping timeout: 252 seconds]
_xor has joined #osdev
sprock has quit [Remote host closed the connection]
sprock has joined #osdev
GeDaMo has quit [Quit: You are becoming what we French call 'Le Fruitcake'.]
sprock has quit [Quit: brb]
divine has quit [Ping timeout: 240 seconds]
sprock has joined #osdev
* geist yawns
<geist> totally slept in today, though i think i needed it
<heat> 💯
<fatal1ty> at least you can sleep
<fatal1ty> be thankful for it
<fatal1ty> when im sleeping i regret that i got up in the morning
<bslsk05> ​'Prison Break(2005) S01 E03 - I don't sleep (6/6) l Netflix' by 3 Minute Drama (00:03:40)
sprock has quit [Client Quit]
epony has quit [Ping timeout: 252 seconds]
<heat> geist, have you played around with ASAN yet?
<fatal1ty> heat: shut up
<geist> not personally, aside from fuchsia
<heat> i've been reworking my implementation this weekend
<heat> the early memory mapping is... gnarly
<geist> we do however have asan and kasan for fuchsia, and there are bots that run it, etc
<geist> yeah i think it is. burns like 1/8 of ram up front for it
<geist> (for kasan that is)
<heat> that's not the worst
<geist> for user space asan you can rely on overcommitting and lots of virtual memory to allocate and demand fill in the bitmap
<heat> I can't do that because I do ASAN for vmalloc (which dynamically allocates address space)
<geist> are we talking about user space or kernel space?
<heat> so I essentially implemented a CoW-like thing for the shadow mapping's page tables
<heat> kernel
* geist nods
<heat> if you have asan-stack=1 you're even more screwed
Burgundy has joined #osdev
<heat> not only do you need the early boot shadow zero mapping, but you need a real mapping for the stack because the compiler will emit code that writes directly to it
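The "CoW-like" shadow heat mentions can be pictured with a user-space model: every shadow page starts out backed by one shared all-zero page ("nothing poisoned"), and a private page is allocated only when something nonzero has to be written. This is a sketch of the idea under that assumption, not heat's implementation, which works on the shadow mapping's page tables; all names and sizes here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SHADOW_PAGE_SIZE 4096u
#define SHADOW_PAGES     16u   /* hypothetical size of the simulated shadow */

/* One read-only page of zeroes stands in for every untouched shadow page. */
static const uint8_t shared_zero_page[SHADOW_PAGE_SIZE];

/* NULL means "still backed by the shared zero page". */
static uint8_t *private_page[SHADOW_PAGES];

uint8_t shadow_read(size_t off)
{
    assert(off < (size_t)SHADOW_PAGES * SHADOW_PAGE_SIZE);
    const uint8_t *page = private_page[off / SHADOW_PAGE_SIZE];
    if (page == NULL)
        page = shared_zero_page;
    return page[off % SHADOW_PAGE_SIZE];
}

/* Returns 0 on success, -1 if a private page could not be allocated. */
int shadow_write(size_t off, uint8_t value)
{
    size_t slot = off / SHADOW_PAGE_SIZE;

    assert(off < (size_t)SHADOW_PAGES * SHADOW_PAGE_SIZE);
    if (private_page[slot] == NULL) {
        if (value == 0)
            return 0;                      /* already zero: keep sharing */
        uint8_t *fresh = calloc(1, SHADOW_PAGE_SIZE);
        if (fresh == NULL)
            return -1;
        private_page[slot] = fresh;        /* break the sharing for this page */
    }
    private_page[slot][off % SHADOW_PAGE_SIZE] = value;
    return 0;
}

/* Number of shadow pages that have their own backing memory. */
size_t shadow_private_pages(void)
{
    size_t n = 0;
    for (size_t i = 0; i < SHADOW_PAGES; i++)
        n += private_page[i] != NULL;
    return n;
}
```

With `asan-stack=1` the compiler writes to stack shadow directly, so, as heat says, that region needs real backing before any instrumented code runs; in this model that would mean pre-populating the corresponding slots.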
<heat> (you have a TODO in fuchsia)
<geist> unrelated: i read the other day that Intel Atoms from 5 or 6 years ago in the Rangeley class apparently fail over time
<heat> isn't that normal?
<geist> well, over time as in lots of them already have, after just a few years
<geist> and my firewall is one of them
<heat> oh
<geist> it's apparently some sort of manufacturing defect that causes the i think LPC bus to fail to work, which usually causes most machines to not post
fatal1ty has quit [Read error: Connection reset by peer]
<geist> since the bios can't talk to peripheral devices
<geist> something like some internal clock derivation logic fails over time prematurely
<heat> cpu or chipset?
<heat> I can never tell if things are a part of the CPU itself or the chipset
<geist> both it's a single core thing
<bslsk05> ​ark.intel.com: Products formerly Rangeley
<geist> ie, intel C2338, etc. they were an earlier rev of a line of atoms with lots of built in networking stuff
<geist> sold for exactly stuff like firewalls and whatnot. still very popular
<geist> in fact mine is literally a C2558
<gog> i cannot connect to the professional sortix network again :(
<geist> anyway what people are reporting is a very high failure rate on these
<heat> the professional sortix network had a crash
<geist> yah its still down it seems
<heat> sortie is on it
<gog> ooof
<sortie> gog, you can evacuate to #sortix on this network instead :)
<sortie> It's the evacuation assembly point
<gog> good to know :)
<sortie> In case of fire, don't use the elevators for up/down in gdb
<sortie> But yeah I managed to attach a gdb to the crashed kernel on my production VM and am right now trying to understand where the bad pointer causing a GPF came from
<gog> oops! #GP no fun
<geist> sortie: omg hasn't work told you to immediately roll the change back and fix the failure later?
<sortie> geist, would do if it hit the sortix.org front page, but irc.sortix.org is my test bed for debugging critical stability issues
<sortie> Better if it's down for a day if I debug a bad kernel crash
<geist> heh i'm kidding. i find it kinda annoying some time (that policy) but it's the rules!
<sortie> Oh it's def a policy for a reason and I do it all the time when I break infra
<geist> ie, always roll back never fix forward
<geist> yah it's just annoying when the fix forward is like a one line typo
<sortie> And then I debug the problem and reland a fixed version
<sortie> Well I do fix forward now and then
<sortie> But only if a fix is obvious
<sortie> geist, though you definitely appreciate the value of a super rare kernel crash
<sortie> You know how stable the irc server has been for months and then whoops there's a super mysterious crash
<geist> oh tell me about it. now that we have a bazillion hours of zircon running in the field there's enough kernel crashes getting reported that there are some really bizarro ones to try to scratch your head over
<gog> cosmic rays
<geist> though many of them are flipped bits, or at least *have* to be, and the kernel is the canary
<gog> fatal single-bit flip caused by a high-energy particle originating deep from outer space
<geist> doesn't help that the kernel is very very highly asserted, so it tends to trip over anything slightly amiss. keeps a tight ship
MrBonkers has joined #osdev
MrPortmaster has quit [Ping timeout: 252 seconds]
<geist> actually a thing you can find is sometimes there's one or two devices in the field that clearly have bad DRAM
<geist> and just continually generate failures. you feel bad for the owner but theres nothing you can do
<heat> why are the asserts still compiled in for release?
<geist> because they find problems
<heat> but you're complaining that they're finding too many
<geist> though to be clear we have two levels of asserts: ones that are DEBUG_ASSERTS that are turned off for production devices
<geist> and then regular ASSERTs that are left on all the time
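The two-level scheme geist describes could look roughly like this in C. The macro names follow the chat, but the bodies are a guess, not Zircon's implementation; `DEBUG_ASSERT_ENABLED`, `assert_failed` and `assert_failures` are hypothetical, and the failure handler only counts and logs where a kernel would panic.

```c
#include <stdio.h>

/* Hypothetical build switch: 1 for debug builds, 0 for production. */
#ifndef DEBUG_ASSERT_ENABLED
#define DEBUG_ASSERT_ENABLED 0
#endif

static unsigned long assert_failures;

/* Stand-in for the kernel's panic path. */
static void assert_failed(const char *expr, const char *file, int line)
{
    fprintf(stderr, "ASSERT FAILED at %s:%d: %s\n", file, line, expr);
    assert_failures++;
}

/* Always compiled in, even on production devices. */
#define ASSERT(x)                                       \
    do {                                                \
        if (!(x))                                       \
            assert_failed(#x, __FILE__, __LINE__);      \
    } while (0)

/* Compiled out of production builds. sizeof keeps the expression
 * type-checked without evaluating it, so it cannot bit-rot. */
#if DEBUG_ASSERT_ENABLED
#define DEBUG_ASSERT(x) ASSERT(x)
#else
#define DEBUG_ASSERT(x) do { (void)sizeof(x); } while (0)
#endif
```

Cheap invariants whose violation means corrupted state go in `ASSERT`; expensive or hot-path checks go in `DEBUG_ASSERT`.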
<geist> sure, but it's *good*
<geist> i'm complaining not complaining
<gog> compilaining
<gog> is when you bitch about gcc or clang
<gog> get it
<gog> do y'all get my hilarious pun
<heat> hehehehehe
<geist> but what i mean is when you suddenly have millions of hours on your kernel and getting live reports of kernel panics it starts to get a little intimidating
<heat> totes
<geist> even if it's only a few here and there. the stragglers are the real weird ones
<geist> for example, there's an *extremely* rare bug that we've never been able to reproduce and keep adding more code to try to catch it: when tearing down an arm64 aspace, it asserts that the page tables are empty
<heat> I don't know how it looks in the operating systems world but for distributed systems (like the one I worked on at CF) it heavily depended on the team
<geist> ie, it's mandatory that something already called Unmap() across all of the mapped things, so by the time the aspace tears down the top level page table should be empty
<gog> me every time one of our watchdogs sends me an exception report
<geist> but extremely rarely it's not
<heat> the nginx team ran an extremely tight ship. no sentry issues *ever*
<geist> so been thinking it's a cosmic ray, but it's so hard to catch we have to just add another assert and wait another month
<geist> and we've gone over the state machine that should ensure that unmap() was called, and no other thing could have mapped into the page tables in the interim before destructing the object, etc
<geist> mind you *these* are the kind of failures i'm talking about we get reports on. the kernel is pretty solid, it's lingering issues like this that are tracking
<geist> i also just made another pass through the logic to make sure all the appropriate memory barriers are in place, etc.
<heat> yay memory barriers
<geist> tightened it up a bit, that might fix it? perhaps there's some extremely rare case where some PT entry gets written very late to the page table, but i can't imagine how
<heat> have you never reproduced it?
<geist> no.
<heat> how much info are you collecting?
<geist> that's the thing, given that we only see it extremely rarely and there are millions of hours in the field, probably daily, it's clearly very very hard to reproduce
<heat> do you have coredumps?
<geist> oh no
<geist> just a panic message
<heat> :(((
<heat> get coredumps and check again
<geist> on purpose: has to not have any user data in it, etc etc
<geist> has to be anonymized, etc
<geist> and has to survive a reboot, so by the time the kernel panics it squirrels the panic data in a little memory area (or an EFI capsule if there) and the next kernel grabs it
<geist> but the space is very limited
<heat> does it? i think the law is a bit more fine tuned than (no personal information ever)
<geist> what do you mean?
<heat> we had law-compliant coredumps, but they had to remain in a remote server, etc
<geist> right, i mean perhaps, but a core dump of the kernel is... a lot
<geist> or i guess maybe the other way of looking at it is we've never even considered building a core dump system into the kernel because i don't know how we'd be able to use that in the field
<geist> so we get a panic message and about 4K of the kernel log leading up to it
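A minimal sketch of the "squirrel the panic data away for the next kernel" idea described above, in C. This is not Fuchsia's actual mechanism: the `crashlog_t` layout, the magic value, the 128-byte message field, and both function names are assumptions made for illustration. Only the rough 4K log budget comes from the discussion. The slot here is ordinary memory; a real kernel would place it in a region (or EFI capsule) that survives a reboot.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CRASHLOG_MAGIC     0x43524153u  /* hypothetical marker value */
#define CRASHLOG_LOG_BYTES 4096u        /* "about 4K" of trailing kernel log */

/* Hypothetical fixed-size record stored in reboot-persistent memory. */
typedef struct {
    uint32_t magic;                       /* set last, marks the record valid */
    uint32_t log_len;                     /* bytes used in log_tail */
    char     panic_msg[128];              /* NUL-terminated panic message */
    char     log_tail[CRASHLOG_LOG_BYTES];
} crashlog_t;

/* At panic time: keep the message and only the newest bytes of the log. */
static void crashlog_stash(crashlog_t *slot, const char *msg,
                           const char *log, size_t log_len) {
    assert(slot && msg && log);
    size_t keep = log_len < CRASHLOG_LOG_BYTES ? log_len : CRASHLOG_LOG_BYTES;
    memset(slot, 0, sizeof(*slot));
    strncpy(slot->panic_msg, msg, sizeof(slot->panic_msg) - 1);
    memcpy(slot->log_tail, log + (log_len - keep), keep);
    slot->log_len = (uint32_t)keep;
    slot->magic = CRASHLOG_MAGIC;
}

/* On the next boot: returns 1 and copies the record out if one is present,
 * then invalidates the slot so the same crash is not reported twice. */
static int crashlog_recover(crashlog_t *slot, crashlog_t *out) {
    if (slot->magic != CRASHLOG_MAGIC || slot->log_len > CRASHLOG_LOG_BYTES)
        return 0;
    *out = *slot;
    slot->magic = 0;
    return 1;
}
```

Writing the magic last means a panic that dies halfway through stashing leaves a slot the next boot simply ignores, at the cost of losing that report.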
<geist> it's fine, that's not the problem here
<geist> the problem here is the damn bug rarely reproduces
<heat> if you have the hw (which you probably do) and it never reproduced there, maybe it really is just a cosmic ray
<geist> perhaps
<geist> that's a theory, the logic in how the PT code cleans itself up and unmaps page tables means that if there's a bit corruption somewhere in a PT it'll keep that part of the tree from being cleaned up
<geist> that's our current working theory, but we have to add more crash time logic to confirm it
<geist> ie, 'if we sense that the top level PT is not empty, walk the PT leaves until you find the deepest PT and print all of the nonzero entries'
<geist> i think that's the current logic we have in there to try to catch it
<geist> add that, wait a few months
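The crash-time check described above ("if the top level PT is not empty, walk down and print the nonzero entries") could look roughly like the following C sketch. It is a userspace simulation under loud assumptions: a 4-level tree of 512-entry tables, a simplified descriptor format where bit 0 means valid and bit 1 means "points at a next-level table", and table pointers stored directly in the entries. The names `page_table_t` and `dump_leftovers` are made up for the example and are not the real kernel's.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PT_ENTRIES 512
#define PT_LEVELS  4        /* assumed 4-level layout */
#define PTE_VALID  0x1ull   /* simplified: entry is in use */
#define PTE_TABLE  0x2ull   /* simplified: entry points at a next-level table */

typedef struct page_table { uint64_t e[PT_ENTRIES]; } page_table_t;

/* In this simulation the table pointer lives in the entry itself,
 * with the two low flag bits masked off. */
static page_table_t *pte_to_table(uint64_t pte) {
    return (page_table_t *)(uintptr_t)(pte & ~(PTE_VALID | PTE_TABLE));
}

/* Prints every nonzero entry reachable from pt and returns how many of
 * them are leftover leaf mappings (entries that are not table pointers). */
static unsigned dump_leftovers(const page_table_t *pt, int level) {
    unsigned leaves = 0;
    assert(level >= 0 && level < PT_LEVELS);
    for (unsigned i = 0; i < PT_ENTRIES; i++) {
        uint64_t pte = pt->e[i];
        if (pte == 0)
            continue;
        printf("level %d index %u: 0x%016llx\n", level, i,
               (unsigned long long)pte);
        if ((pte & PTE_TABLE) && level + 1 < PT_LEVELS)
            leaves += dump_leftovers(pte_to_table(pte), level + 1);
        else
            leaves++;
    }
    return leaves;
}
```

Printing the raw entry values matters for the cosmic-ray theory: a single flipped bit in an otherwise zero entry looks very different from a plausible leftover mapping.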
<heat> yeah
<heat> i mean, if you need to wait a few months, it's not that serious
<heat> you can keep drinking your mojitos in the sun
<geist> sure
<geist> but that was the original point of the discussion: it's the lingering rare bugs that are really annoying
<heat> linux gets around this by not having any automated bug reports
<heat> easy
* gog gets the crushed ice and muddles some mint
<heat> i wonder how windows' bug reports look
<heat> probably way scarier
<heat> think about it, you're running a relatively controlled experiment with devices you know and that were made by Google itself
<heat> not a crappy 2006 laptop
<geist> oh totally
<geist> the anonymization is frustrating though. for example if we see a device that continually blows up due to bad dram, there's no way to tie it to a user
<geist> i'd love to get the device back and send them a new one, but there's no way to do it
<geist> i mean i am 100% for all the anonymization but this is the one case where you feel bad for the owner but can't do anything about it unless they report it
<heat> maybe you could add some bad hw detection stuff
<heat> like if the OS crashes way too much, etc
<geist> i suppose. when the kernel detects it it's hard
atilla has joined #osdev
<atilla> use risc-v
<geist> but yeah, that's probably the only real strategy. if you get too many forced reboots that are not just due to someone unplugging it, etc, then try to get them to report it
<heat> I don't mean memtest, but actual "your stable build of fuchsia is crashing way too much. you probably have bad hw, please talk with support..."
<geist> yah that's the best you can do
<atilla> that wouldnt be happening if you were using riscv
<geist> i'll bring that up with the device metrics folks, see if they're not always thinking this
vai has joined #osdev
<geist> s/always/already
<vai> geist: howdy + friends
vai is now known as Jari--
<geist> howdy Jari--
<atilla> wassup baby
<geist> every time i see vai i think Steve Vai
<Jari--> geist: Finland is hot, more than 5 Celsius today
<geist> then i think of the new Polyphia song, which is growing on me
<atilla> I am thinking of building an operating system that will keep niiggers away. How do I build one? I am thinking of detecting spelling mistakes and ebonics to initiate SSD wipe sequence
<Jari--> Anyone in the osdev teams thought of walking to the Silicon Valley and convince the investors to get in?
<geist> aaah here we go
<atilla> more like gayist amirite
atilla was kicked from #osdev by geist [atilla]
<heat> took you long
<geist> was waiting for it
<bslsk05> ​www.notebookcheck.net: Intel demos “Horse Creek” developer board with SiFive RISC-V CPU, DDR5 RAM and PCIe 5.0 slot - NotebookCheck.net News
<heat> Jari--, we've talked about this. no investor wants to build an OS
<heat> even embedded usually reuses other things
<geist> heat: yeah am very curious about that. will try to see if i can get one via work
<geist> hopefully it wont be super secret and if i do i can talk about it
<geist> but sometimes i have to be dark about it because NDA
<heat> noooo not darth geist!
<geist> omg i'd be the worst sith lord ever
<Jari--> heat: my plan is to borrow parts from open source projects, result is an open source project OS
<Jari--> I don't believe I have much chance on proprietary markets, it simply sucks, the idea behind it
<heat> but why
<heat> no one that wants to make money will build an OS
<mjg_> :]
<heat> it really is just a "fuck you" money kind of thing
<Jari--> one day my kids can dig it up, and use it
<Jari--> :)
<heat> metaverse blockchain operating system!!!!11!1
<Jari--> click here os
gog` has joined #osdev
misnor has quit [Ping timeout: 252 seconds]
<Jari--> android partly famous for its games
gog has quit [Ping timeout: 252 seconds]
gog` is now known as gog
<heat> running candy crush for my mom is the world's most critical task
<geist> the candy must flow
<geist> today's task: rearrange some VLANs at my home network to move the main LAN traffic off the untagged network
<heat> lk/fuchsia as a router/switch when?
<geist> currently using pfsense, but one day i'll use LK as my router!
kori has joined #osdev
<heat> if I had a cool embedded OS that runs everywhere I would make sure it ran everywhere
<heat> hmm, i wonder if lwip is good for routing?
<geist> yeah it's my general lack of focus that holds it back. too many projects at once
<geist> my general issue with lwip, at least last time i looked at it, is it's pretty un-performant
<geist> iirc its general strategy is to put everything in a queue and then iterate on the queue
<heat> steal BSD's stack
<heat> as is tradition
<gog> i will not use pfsense again
<geist> makes for a simple design and fairly easy to deal with, but there's no way that scales up to routing speed
<geist> gog: oh? you've had issues?
<gog> the wireguard scandal made me question their review practices
<geist> ah
<gog> for a distro intended for secure appliances i wouldn't even consider it for a business that needs to comply with PCI-DSS
<geist> that's fair. i'm just not sure i particularly trust anything else any better
<gog> yeah
dutch has quit [Ping timeout: 252 seconds]
<gog> that's the problem isn't it
<geist> also trying to figure out why my network traffic is much higher this month
mxshift has joined #osdev
<geist> peaking around 1.9TiB this month, which is much higher than usual
<gog> eep
<geist> the router seems to indicate that it's about even in/out which is suspicious
<gog> you got a torrent running somewhere you forgot about?
<heat> can it track per-host?
<geist> sort of. i have a plugin darkstat that can, but it seems to only go back a few days
<geist> and in that case nothing sticks out, mostly just traffic between me and google
<gog> were you working from home more than usual this month?
<heat> goma?
dutch has joined #osdev
<heat> plus whatever build caching solution you have for rust
<geist> well, that does have a fair amount to do with it. i'm looking at stuff coming from my work vlan recently
<geist> and yeah, like one work host has moved 45GiB in the last few days
<geist> 32GB on the 'out' path
<geist> but this makes sense: nightly backups for work machines, though they're usually deltas
<geist> also to note: xfinity adds up both in/out traffic for that 1.9
<geist> which is about what i see here: my router is showing 755GiB in, 617GiB out, and i restarted it last around october 10th so it missed the first 10 days of the month
<geist> i'm just mostly curious about that 617
sprock has joined #osdev
<sortie> SO. I debugged my weird crash. I found out the Descriptor object was alive but the Vnode it pointed to appears to have been freed and reused.
heat has quit [Remote host closed the connection]
<sortie> The file descriptor said it was a Unix socket
<geist> !!
heat has joined #osdev
<sortie> It's crashing in my solanum server calling ppoll(2)
<sortie> So I'm thinking I got a Unix socket refcount issue when they're being passed. I'm already aware of such a bug, saw a weird crash about that the other day when casually testing
<gog> aaay
<sortie> But hey super duper impressive.. I had a failing VM with just a qemu monitor and a kernel without symbols, and I was able to rebuild a matching debug kernel, attach gdb, recover the thread frame of the crashed thread, and then do stack traces and lots of inspections to get a clue about what caused this super weird corruption
<sortie> Like this is a production system, not meant for debugging at all, and I was able to inspect it really powerfully and see what's going on inside it
<sortie> I restarted my irc.sortix.org network :)
<gog> sortix sophisticated
ids1024 has joined #osdev
<heat> careful with "rebuilding matching debug kernels"
<heat> pr
<sortie> Let's see if it stays up, I mean this bug is rare
<sortie> heat, hmm?
<heat> particularly if you don't have reproducible builds
<sortie> Yeah at least my builds are pretty reproducible
<sortie> I was even able to build my kernel on Linux matching the binary that I got when I built it natively on Sortix
<heat> same sha?
<sortie> Probably not, didn't check, but offsets for all the major stuff matched
moberg1 has joined #osdev
moberg has quit [Ping timeout: 255 seconds]
potash has quit [Quit: ZNC 1.8.2 - https://znc.in]
Burgundy has left #osdev [#osdev]
<heat> inlining asan instrumentation makes my kernel 1MB larger
<heat> it also makes it wayyy more usable
<nick64> What is the reason for this error? https://postimg.cc/K3L1m4vR
<bslsk05> ​postimg.cc: Screenshot 2022 10 31 at 3 02 33 AM — Postimages
<heat> null pointer dereference
<heat> your kernel code is buggy
<heat> compile with KASAN+UBSAN and you might get more info
<nick64> I mean, why is the kernel making the null pointer reference here? It is deliberately doing it to make it crash for security from smap, but I explicitly disabled security
<heat> because you're doing it?
<nick64> I am doing it, but just to be sure we are on same page, what do you mean by "it"?
<heat> *(char*)0x0
<nick64> To clarify: I am not doing NULL deref, kernel is just deliberately doing this error condition to make it crash to prevent hacking
<heat> where? why?
<nick64> What I can't figure out is, I have disabled security on kernel, and can't seem to understand why it is still crashing
<heat> linux doesn't have "hacking" prevention
<nick64> I am clearing bit 21 of CR4 register in the code
<nick64> that is a bit to prevent hacking
<heat> unless some weird ass security module you have is doing that
<heat> no, that has nothing to do with deref'ing a NULL
<nick64> From what I understand, kernel (the kernel thread) is deliberately taking the BUG branch that is usually taken for NULL Deref, to make the thread crash, and it is not an ACTUAL null deref
<heat> what kernel thread
<nick64> Whichever thread is running the code that is accessing usermode pages from kernel
<j`ey> if you've changed the code, to do something different from how linux does it now, maybe you need to change more things?
<nick64> But the actual question is, how do I ask the OS to temporarily disable smap security
<heat> nick64, stack trace pls
<heat> you don't
<nick64> I have done what is recommended in the OSDev wiki
<heat> you omitted the stack trace, the only relevant thing here
<nick64> Let me do a dump_stack real quick
<nick64> Wait, that's not what you want, you want the stack trace on crash.. let me screenshot that
<heat> the oops should have the trace
<nick64> yeah
<bslsk05> ​postimg.cc: stacktrace — Postimages
<heat> memcpy to a null address
<heat> your patch_code is wrong
<nick64> patch_code is just invoking kernels built in text patch function, not doing anything from my own implementation there: https://postimg.cc/bDM9rG0P
<bslsk05> ​postimg.cc: patch code — Postimages
archenoth has quit [Ping timeout: 248 seconds]
<heat> god that looks confusing
<nick64> That text_poke is a kernel built in function, if that sounded like something I have defined somewhere else : https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/text-patching.h#L44
<bslsk05> ​elixir.bootlin.com: text-patching.h - arch/x86/include/asm/text-patching.h - Linux source code (v6.0.6) - Bootlin
<nick64> Oh and those function pointers I am passing to it via that wrapper function patch_code is something that The Almighty Kernel itself hinted me, so that can't be wrong, right? : https://postimg.cc/0MgNNP63
<bslsk05> ​postimg.cc: fnptrs — Postimages
<nick64> This wiki says that SMAP is determined by a bit in CR4 so that is what I am trying to do to avoid this error in the original screenshot, which cries out loud that it has something to do with smap. It says it is 21st bit so I am just clearing it: https://wiki.osdev.org/CPU_Registers_x86-64#CR4
<bslsk05> ​wiki.osdev.org: CPU Registers x86-64 - OSDev Wiki
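For reference, the bit arithmetic being discussed, as a small C sketch. SMAP is bit 21 of CR4 and SMEP is bit 20. The helper names below are made up for illustration, and the code only manipulates a plain integer: it does not read or write the real register, which needs ring 0. Note also that SMAP only governs supervisor access to mapped user pages, so clearing the bit would not by itself make an access to an unmapped address such as 0x23 succeed.

```c
#include <assert.h>
#include <stdint.h>

#define X86_CR4_SMEP (1ull << 20)  /* Supervisor Mode Execution Prevention */
#define X86_CR4_SMAP (1ull << 21)  /* Supervisor Mode Access Prevention */

static_assert(X86_CR4_SMAP == 0x200000ull, "SMAP is CR4 bit 21");

/* Hypothetical helper: the value one would write back to clear SMAP,
 * leaving every other CR4 bit untouched. */
static uint64_t cr4_without_smap(uint64_t cr4) {
    return cr4 & ~X86_CR4_SMAP;
}

/* Hypothetical helper: nonzero if the given CR4 value has SMAP enabled. */
static int cr4_has_smap(uint64_t cr4) {
    return (cr4 & X86_CR4_SMAP) != 0;
}
```

Counting bits from zero matters here: "the 21st bit" in the wiki table is bit index 21, so the mask is `1 << 21`, not `1 << 20`.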
<heat> I don't know, have you checked?
<gog> nick64: "=m" (new_cr4) should be "+m"
<heat> why are you touching smap anyway?
<gog> actually no nvm
<gog> you don't read it
<heat> smap is so you don't touch user memory in kernel space without it being deliberate
<nick64> I am not particularly doing SMAP, but the kernel seems to be complaining of SMAP, so figured it may be a side effect of modifying kernel text section from a module insmod'ed from userspace
* klange smells linux talk
<gog> no, the kernel is being modified from kernel space
<gog> the module becomes part of kernel space
<nick64> Yeah, beats me why it is complaining smap, was trying to shut the kernel up about it anyway by disabling it
<gog> what's happening is the kernel is trying to dereference a pointer at 0x0000000000000023
<gog> which is in a userspace VMA
<gog> idk how SMAP works exactly so i'm gonna read for a minute
<nick64> I assumed that error message would be generic smap fault error message
<nick64> not sure
<heat> please show all the errors you're getting
<nick64> because 23 is definitely not null
<nick64> The first screenshot has all errors
<nick64> let me find that
<heat> 0x23 is ((struct blah*)0)->field for sure
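To illustrate heat's point: a faulting address that is small but not zero is what you get when a field is accessed through a NULL struct pointer, because the address is the base (0) plus the field's offset. A hedged C sketch, where `struct blah`, its padding, and `field_address` are invented purely so that a field lands at offset 0x23 (the real structure in nick64's case is unknown):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 0x23 bytes of other members, then the field. */
struct blah {
    char header[0x23];
    char field;
};

static_assert(offsetof(struct blah, field) == 0x23,
              "field sits 0x23 bytes into the struct");

/* Address the CPU would actually touch for p->field, given p's value. */
static uintptr_t field_address(uintptr_t base) {
    return base + offsetof(struct blah, field);
}
```

So `field_address(0)` is 0x23: the pointer itself was NULL, and the oops reports the address of the member that was touched through it.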
<nick64> Oh okay got it, but still think that is generic error path taking for smap failure
<nick64> This is the error message : https://postimg.cc/K3L1m4vR
<nick64> Rest all is just stacktrace and registers
<nick64> Oh I will share the error on the insmod command
<nick64> Actually nothing much there really https://postimg.cc/kRKYdN0z
<bslsk05> ​postimg.cc: insmod err — Postimages
archenoth has joined #osdev