#osdev on 2021-07-09 — irc logs at libera.irclog.whitequark.org

2021-05-23 01:57 klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books

00:11 tacco has quit []

00:13 <Mutabah> syscall usually refers to a direct call into the kernel (via a software interrupt/ecall/syscall/sysenter/...)

00:13 zoey has quit [Ping timeout: 255 seconds]

00:13 <Mutabah> while a "system api" could be a library provided by the base OS

00:15 <moon-child> it's kinda contextual, though. I've heard e.g. posix open/read/write/etc. referred to as system calls

00:16 <klange> I would say a "system call" is a specific action, and an "API" is a general concept.

00:16 <klange> The API is the combination of the system calls, the data structure layouts, the etc.
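
A minimal sketch of the distinction drawn above, assuming Linux with glibc (neither is named in the conversation). The function names `pid_via_library` and `pid_via_raw_syscall` are made up for illustration; `getpid`, `syscall` and `SYS_getpid` are the real libc and kernel-header names.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Library route: libc's wrapper, part of the "system API". */
pid_t pid_via_library(void) {
    return getpid();
}

/* Direct route: trap into the kernel by system call number.
 * syscall() is itself a thin libc helper that issues the trap. */
pid_t pid_via_raw_syscall(void) {
    long ret = syscall(SYS_getpid);
    assert(ret > 0);
    return (pid_t)ret;
}
```

Both routes end in the same kernel entry point here; on other systems the library call might not map to a single system call at all.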

00:22 scaleww has quit [Ping timeout: 252 seconds]

00:22 <Skyz> So one is user-level and one is kernel

00:24 scaleww has joined #osdev

00:30 <gog> no

00:30 <gog> API is a concept independent of privilege level

00:34 <Skyz> That makes sense

00:38 <Skyz> I'm looking for something that generates pixels on the screen

00:41 <Skyz> I want my mouse to click on a pixel of a certain color

00:42 <Skyz> This should be able to be done in C

00:42 <gog> that's a little complicated

00:43 <gog> so you'll have to scan the frame buffer for pixels of a certain color, move the cursor to the coordinates of that pixel on screen, then have it click

00:43 <gog> but reading the frame buffer isn't necessarily easy or fast
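
The scanning step gog describes could be sketched like this, assuming the frame has already been copied into a plain 32-bit-per-pixel array. `struct pixel_hit` and `find_pixel` are hypothetical names, and how the buffer is obtained (which is the hard part) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: found == 0 means no pixel matched. */
struct pixel_hit { int found; size_t x, y; };

/* Scan a 32-bit-per-pixel buffer row by row for the first pixel equal to
 * `color`. `pitch` is the number of pixels per row in memory, which may be
 * larger than the visible width. */
struct pixel_hit find_pixel(const uint32_t *fb, size_t width, size_t height,
                            size_t pitch, uint32_t color) {
    struct pixel_hit hit = {0, 0, 0};
    assert(fb != NULL && pitch >= width);
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            if (fb[y * pitch + x] == color) {
                hit.found = 1;
                hit.x = x;
                hit.y = y;
                return hit;
            }
        }
    }
    return hit;
}
```

The returned coordinates are what would then be handed to whatever moves the cursor and clicks.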

00:44 <Skyz> I probably could find something with an ML library

00:45 <Skyz> oh here https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-getpixel?redirectedfrom=MSDN

00:45 <bslsk05> docs.microsoft.com: GetPixel function (wingdi.h) - Win32 apps | Microsoft Docs

00:46 <gog> that works too

00:53 <Skyz> This has gotten really interesting. I'm trying to do this with a game, so maybe there is a library that can read data from the client like the pixel

00:54 <Skyz> OpenGL hacking or DirectX hacking

00:55 gog has quit [Ping timeout: 246 seconds]

00:56 scaleww has quit [Quit: Leaving]

01:08 <moon-child> I kinda wish that strict aliasing rules were defined like atomics

01:08 <moon-child> like, if you have an int* and a float* that are equal, and you write to both, you don't get a guarantee about which write happens first, but neither of the writes gets dropped

01:09 <moon-child> function call can be like a barrier/fence

01:09 <moon-child> and you can opt into the nasal demons with restrict

01:15 <Skyz> I'm not sure if this was a good idea

01:16 <Skyz> It could be really useful for AI

01:17 <Skyz> moon-child any practical value to it personal or otherwise?

01:18 <moon-child> practical value to what?

01:19 <Skyz> I guess you could specify it, are there errors that happen now if they are equal?

01:19 <moon-child> if what are equal?

01:20 <Skyz> int* and float*

01:21 <moon-child> oh. Currently the behaviour of an aliased access is undefined; I want to make it defined in a limited capacity. So from the perspective of a strictly conformant program, I've not changed anything

01:21 <moon-child> idea is to still enable optimizations, while reducing the degree of WTFery the compiler is allowed to perpetrate
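
For context, a small sketch of the status quo moon-child is talking about. The function names are invented; the first function is the classic case where a compiler is allowed to assume the two pointers do not alias, and the second shows the usual well-defined workaround. The test value assumes IEEE 754 single-precision floats.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* If i and f point at the same storage, current C makes this undefined:
 * the compiler may assume they do not alias and return 1 without
 * reloading *i. Shown for contrast only; never call it with aliased
 * pointers. */
int write_both(int *i, float *f) {
    *i = 1;
    *f = 2.0f;
    return *i;
}

/* Well-defined today: copy the object representation instead of
 * reading the float through an lvalue of another type. */
uint32_t float_bits(float f) {
    uint32_t u;
    assert(sizeof u == sizeof f);
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Compilers typically turn the `memcpy` into a plain register move, so the defined version usually costs nothing.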

01:34 <geist> ah been wondering why people have been having long monologues with themselves

01:35 <geist> finally sat down to try to fix this C64 i've had floating around for a while

01:35 <geist> sure enough looks like the PLA is busted

01:35 <geist> pretty common

01:37 <moon-child> geist: o yeah good idea come to think of it

01:39 <geist> started tracing it out, and it's generating a good ntsc signal and the clocks are alive, etc

01:39 <geist> the cpu even seems to be running code basically

01:39 <geist> but... when i get to the PLA which is basically where most of the bus logic is, it's basically got the KERNAL rom in a weird half selected state (like 0 - 1V) and the R/W line is held hard low

01:41 <geist> basically the PLA seems to have lost its brains and most of the output pins are either fixed or in some sort of indeterminate state

01:43 ElectronApps has joined #osdev

01:47 CryptoDavid has quit [Quit: Connection closed for inactivity]

01:57 <gorgonical> Is it possible for the arm chip to prevent access to cnthp_ctl_el2 when not running at el2? It's not clear to me if the hardware "knows" which el a register belongs to and can/will prevent access to it

01:57 <gorgonical> I'm getting an undefined instruction error on mrs from that register, but the kernel uses that as a catch-all so it's not immediately clear to me if it's "undefined instruction" or something else

02:00 isaacwoods has quit [Quit: WeeChat 3.2]

02:14 <Skyz> Why do people downgrade their computers for linux?

02:14 <Skyz> "Freedom"

02:19 <geist> 100%

02:20 <geist> i think without exception you cannot access a *_ELn register at a higher el than you are running at

02:20 <geist> but not vice versa. EL2 can set EL1 regs in addition to EL2, but EL1 cannot read/write EL2 (or EL3) regs

02:20 <gorgonical> geist: that makes sense, i mean what is the purpose of having separate control regs if a lower el can just change them

02:20 <geist> the _ELn part is encoded into the instruction

02:21 <geist> yah

02:21 <gorgonical> so it's architectural, then?

02:21 <geist> a keystone of the design, yes

02:21 <gorgonical> I mean, because specifically these are timer registers, so it must be encoded into the instruction somehow?

02:21 <geist> right

02:21 <gorgonical> I see

02:22 <geist> that's essentially the mechanism by which it all works. by putting the control bits behind a register that can only be accessed from EL2 (and there's an EL3 version i think too) you gate it off from EL1

02:22 <geist> same with things like TCR_EL2 or SCTLR_EL2. those are banked, and affect EL2 and below. shouldn't be able to fiddle with it from EL1

02:23 <geist> similarly, that's how most of the supervisor regs are masked off from EL0. they're _EL1 regs
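
A rough sketch of what "encoded into the instruction" means in practice. The two instruction words were assembled by hand from the architectural encodings S3_4_C14_C2_1 (CNTHP_CTL_EL2) and S3_0_C4_C2_2 (CurrentEL). The op1-to-exception-level mapping below is my reading of the architecture's encoding convention, covers only four op1 values, and should be checked against the ARM ARM; the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* "mrs x0, CNTHP_CTL_EL2", i.e. S3_4_C14_C2_1 (hand-assembled). */
#define MRS_X0_CNTHP_CTL_EL2 0xD53CE220u
/* "mrs x0, CurrentEL", i.e. S3_0_C4_C2_2 (hand-assembled). */
#define MRS_X0_CURRENTEL     0xD5384240u

/* Extract the 3-bit op1 field (bits 18:16) of an MRS/MSR (register) word. */
unsigned sysreg_op1(uint32_t insn) {
    return (insn >> 16) & 0x7u;
}

/* Lowest exception level that op1 conventionally implies, or -1 for
 * op1 values this sketch deliberately does not cover. */
int sysreg_min_el(uint32_t insn) {
    switch (sysreg_op1(insn)) {
    case 0: return 1;   /* EL1 registers */
    case 3: return 0;   /* registers reachable from EL0 */
    case 4: return 2;   /* *_EL2 registers */
    case 6: return 3;   /* *_EL3 registers */
    default: return -1;
    }
}
```

An `mrs` of an op1 == 4 register executed at EL1 is what would be expected to trap as an undefined instruction, matching what gorgonical saw.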

02:26 freakazoid333 has quit [Read error: Connection reset by peer]

02:44 dragestil has quit [Ping timeout: 252 seconds]

02:46 dragestil has joined #osdev

02:51 <Skyz> How do you work on this hobby when low-level development isn't needed anymore?

02:52 <Skyz> (general question)

02:52 <klange> What do you mean?

02:52 <klange> From the perspective of OS development being an unnecessary activity, it is a _hobby_ for most of us, after all. Toy trains aren't "needed" either.

02:52 <Mutabah> Even using "high level" languages/constructs, you still have to write a lot of infrastructure

02:52 <Skyz> System development is mostly solved, and the low-level stuff I'm interested in seems to be years of work

02:53 <klange> From the perspective of how do you _continue_ with an OS project when the low-level stuff is done, there's so much more you can do beyond that.

02:54 <geist> sigh

02:54 <Skyz> I was more going for the first part, didn't realize that my "dream" was useless

02:54 sts-q has joined #osdev

02:54 <Skyz> or unprofitable

02:55 <klange> Profit can be measured in far more than dollars.

02:55 <Skyz> I'm pretty satisfied, but money is important too.

02:55 <geist> i'm actually pretty offended at the premise of his questions (unignored him and looked at the backlog)

02:56 <geist> 'low level development isn't needed anymore' my ass

02:56 <geist> if anything it's a very very lucrative job market

02:56 <geist> sounds like they're just justifying their own predicament by declaring things they aren't good at 'useless'

02:56 <Skyz> I'm not trying to be offensive but my interest is in Game Hacking and I went into it because CS is lucrative

02:57 <geist> game hacking is not

02:57 <Skyz> :|

02:57 <geist> if you're trying to make money, you're barking up the wrong tree

02:57 <klange> Games are a hell only worthwhile if you have true passion for it, and even that doesn't excuse the abuse in the industry.

02:57 <geist> right plus the fact that it's really hard to make a living doing it, unless you're exceptionally good and lucky

03:00 vin has joined #osdev

03:10 dragestil has quit [Read error: Connection reset by peer]

03:11 <klange> I make this toy train analogy a lot, and I think locomotives are an apt comparison to operating systems.

03:13 <klange> Building a steam engine of equal or better quality to one that was on the rails in the early days of trains isn't particularly hard, but you're not going to sit down in your garage and spend a few years banging a hammer and come out with a GE AC6000CW.

03:14 <klange> And even if you do build something full-scale and passenger-capable, you're gonna have a hell of a time getting anyone to ride it - most people don't even ride the trains they do by choice, but by necessity.

03:16 <geist> but, also while banging out your steam locomotive you learn a heck of a lot about how the big ones work, and that gives you a lot of knowledge about how trains go and comes in handy

03:16 <vin> How would you build a GE AC6000CW that people would ride?

03:16 <geist> vin: exactly.

03:16 <klange> Hobby OSes are like model trains, model steam engines, like building a replica of a Union Pacific 844.

03:19 <klange> "But some guy built a diesel engine in his garage in Finland and now it's one of the most widely used freight engines in the world!" Yeah, well, that guy got lucky with timing.

03:20 <klange> Got in right when there was a market, published all his designs, didn't have to deal with all the safety regulations and fuel efficiency requirements and hundreds of different track gauges...

03:24 Skyz has quit [Quit: Client closed]

03:24 <vin> Didn't Ken also start with a hobby project? Of course, a different world altogether.

03:29 <klange> Kind of, but it was after he had already been hired by Bell and been professionally involved with Multics - while working on that project, he had built a video game (Space Travel) for the systems they were using, and when Bell withdrew from the project he ported it to the PDP-7, which was a few years outdated already, and that eventually led to Unix.

03:30 <geist> also being at bell labs they were basically paid to do various researchy projects

03:30 <geist> so could officially write it off as a project, and/or a solution to help other researchers do their work (text formatting, etc)

03:31 <nur> wait wait people don't like trains?

03:32 <klange> Oh trust me, there are some people who _love_ trains. But most people hopping on the subway in New York have at best a neutral attitude towards them.

03:33 <nur> that explains why my friends looked at me funny when I was hopping up and down with joy on the T in Boston

03:34 <geist> i like trains

03:34 <nur> me too!

03:34 <geist> i'm not a total nerd over them, but i'm more impressed with the power/etc

03:34 <nur> me neither, I just like the idea of trains

03:35 <klange> And much like how many people use Windows because they need to and not because they care, it doesn't matter how much better your train is if they need to get to Church Street and yours only goes around your house.

03:35 <nur> they go through the city _and_ the country! They make train noises! You can sit down and look out the window and work on The Great American Novel and be inspired by the landscapes that zoom by!

03:35 <geist> but the locomotive analogy is interesting too: the kernel/OS is the locomotive, but apps/etc are the cars

03:35 <klange> They might even love trains. They might be super into N700S. But they're not traveling from Tokyo to Osaka. They're going to work.

03:35 <geist> so you can build an incompatible hobby locomotive for a weird track, which is fine

03:36 <geist> but..

03:36 <nur> is it a nice small bonus for your resume though

03:36 <klange> Either the track or the stations are the hardware, this is where the analogy gets a bit more muddy.

03:37 <klange> Probably track = architecture, stations = devices?

03:37 <geist> or the track is the API? though you can argue it's the connection between trains i guess

03:38 <klange> Or maybe the physical tracks are... like individual computers and the _gauge_ is the architecture? You can design a train for any gauge, but you can generally only build it for one at a time? And you install the train on a particular network of tracks and...

03:39 <klange> Aw jeeze Skyz isn't even here anymore. Choo choos fun, time for lunch.

03:39 <nur> if you want a systemsy type job it's nice to be able to say "oh I did this" even if no one uses it, right

03:39 <klange> Yes.

03:40 <geist> yah like i said building your own locomotive is at worst extremely educational

03:40 <geist> and in a lot of cases that does have real benefits

03:40 <geist> since like it or not anyone that does software development is indirectly using trains

03:40 <nur> you mean operating systems

03:41 <geist> no trains! choo choo

03:41 <nur> fun fact: I did systems dev for trains once

03:41 <klange> I've done planes. Anyone done automobiles?

03:42 <nur> ooh then we have the Steve Martin trifecta

04:05 <nur> what's this thing when you do qemu -d int : env->regs[R_EAX]=00000000

04:05 <nur> is it EAX?

04:10 <geist> probably

04:10 <nur> funky syntax though

04:10 <nur> env->regs

04:26 ^[ has quit [Read error: Connection reset by peer]

04:26 ^[ has joined #osdev

04:28 MarchHare has joined #osdev

04:28 <MarchHare> Looks like it's improved a bit here.

04:31 <MarchHare> Well, maybe still quiet

04:37 <klange> You just missed a considerable conversation about trains.

04:38 <Mutabah> Take a look at the logs if you're curious

04:41 <MarchHare> Meh, not THAT interested about trains.

04:42 <klange> It's a metaphor for osdev.

04:42 <MarchHare> I suppose I'm just a bit dumb I guess. Didn't seem like a good metaphor to me.

04:42 dragestil has joined #osdev

04:43 dragestil has quit [Client Quit]

04:43 dragestil has joined #osdev

04:51 <MarchHare> Is there an actual programming tutorial for the AC97 stuff? Like something that walks through the initialization and how it's all done? I've tried to read the official docs, but they're very inscrutable for me.

04:51 <MarchHare> Example code with basic comments would probably be best for me.

04:53 srjek|home has quit [Ping timeout: 240 seconds]

04:54 <klange> I don't even know if my driver works on real hardware, since all I have is newer HDA chips... https://github.com/klange/toaruos/blob/master/kernel/audio/ac97.c

04:54 <bslsk05> github.com: toaruos/ac97.c at master · klange/toaruos · GitHub

05:04 vdamewood has quit [Quit: Life beckons]

05:05 <MarchHare> I'll give your OS a look. I've been trying to wrap my head around the basics of AC97 for awhile, but I just keep not getting it. I think I probably also need to learn how to talk to the PCI bus, too

05:05 vdamewood has joined #osdev

05:07 nly has joined #osdev

05:13 <MarchHare> klange: You got a bunch of pull requests on nyancat =P

05:23 <klange> ac97 is a bit more straightforward because it has fixed offsets for things like PCM Out

05:23 <klange> HDA has this whole complicated connection system with nodes

05:24 <klange> HDA was meant to be a long-running future-proofed system for consumer audio hardware, so it supports doing all these crazy things at the cost of "I just want to spit audio data at speakers" being... a lot more complicated.

05:34 tenshi has joined #osdev

05:38 <klange> ugh, so many things on my plate; should write them on post-it notes, stick them on a dart board, and see what I hit

05:38 <klange> i'm terrible at darts, so it's basically random

05:44 <clever> does the hda/ac97 stuff support codecs other than pcm?

05:46 <klange> Okay, so quick explanation of some things:

05:47 <klange> - AC97 is, in audio hardware terms, a codec in itself. The most common hardware form of it was the integrated ICH version shipped with Intel hardware, but it showed up behind other interfaces as well.

05:47 ElectronApps has quit [Remote host closed the connection]

05:47 <klange> - HDA is actually an entirely different class of hardware: It's just a _controller_. It supports an arbitrary set of codecs behind it.

05:47 <clever> ahh

05:48 <klange> I believe HDA does support additional codecs on top of basic PCM ones, as I'm sure I've seen that in Intel hardware before?

05:48 ElectronApps has joined #osdev

05:48 <klange> And presumably it's some capability bit somewhere...

05:48 <clever> and that could allow shoving raw mp3 data into the sound card, and skip the cpu decoder?

05:49 <klange> This is also why AC97 is so simple, you're basically just pointing the codec at a buffer and saying "play this".

05:49 <klange> HDA you need to point a "stream" at your buffer, point a DAC at your stream, wire your DAC to an output, and turn them all on.
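
To make the AC97 side ("pointing the codec at a buffer") a little more concrete, here is a sketch of filling a buffer descriptor list. It assumes the ICH-style bus-master layout as I recall it (a 32-entry list of 8-byte descriptors with an address, a length in samples, and flag bits); the struct, macro and function names are invented and `CHUNK_SAMPLES` is an arbitrary choice, so treat it as illustration and check the datasheet or klange's driver before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BDL_ENTRIES   32       /* assumed descriptor list length */
#define BDL_FLAG_IOC  0x8000u  /* assumed "interrupt on completion" bit */
#define CHUNK_SAMPLES 4096u    /* arbitrary per-descriptor size for this sketch */

struct ac97_bdl_entry {
    uint32_t addr;     /* physical address of the sample data */
    uint16_t samples;  /* length in 16-bit samples, not bytes */
    uint16_t flags;
};

/* Describe a contiguous run of 16-bit samples starting at physical
 * address `phys`. Returns the number of descriptors used; anything that
 * does not fit in BDL_ENTRIES descriptors is simply left unqueued. */
size_t ac97_fill_bdl(struct ac97_bdl_entry *bdl, uint32_t phys,
                     size_t total_samples) {
    size_t n = 0;
    assert(bdl != NULL);
    while (total_samples > 0 && n < BDL_ENTRIES) {
        size_t chunk = total_samples < CHUNK_SAMPLES ? total_samples
                                                     : CHUNK_SAMPLES;
        bdl[n].addr = phys;
        bdl[n].samples = (uint16_t)chunk;
        bdl[n].flags = BDL_FLAG_IOC;
        phys += (uint32_t)(chunk * sizeof(int16_t));
        total_samples -= chunk;
        n++;
    }
    return n;
}
```

A driver would then write the list's physical address to the PCM Out channel's registers and start the DMA engine; those register accesses are omitted here.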

05:50 <clever> so HDA has a lot more hw accelerated mixing of multiple hw accelerated codec decoding?

05:50 <klange> Which is really flexible, especially when we have hardware with multiple separate outputs, duplex ports, loopback support...

05:50 <clever> yeah

05:50 gioyik has quit [Quit: WeeChat 3.1]

05:50 <clever> ive often heard that alsa usually lacks hw mixing, so you either need alsa soft mixing, or pulseaudio, to share an output node

05:50 <klange> But I just want to say "hey here's some 48khz s16le" so I can hear the explosions in quake...

05:51 <klange> This is part of why the Linux world moved to Pulse in the first place - it was built for this newer kind of hardware that has these audio routing mechanisms built-in.

05:51 <clever> that reminds me, of what i tried to fix in my PA config years ago

05:52 <clever> 00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA) (rev 40)

05:52 <clever> i believe thats the onboard sound card

05:52 <clever> it has 3 capture channels, each with its own 3 input mux, all able to pick any input

05:52 <klange> Yep, and a demonstration of how HDA became the standard for modern audio interfaces on x86 - even AMD does it.

05:52 <clever> but pulseaudio (by default) only exposes 1 capture channel

05:53 <clever> and the alsa knobs to adjust all of that, have identical names, with an index attribute on them

05:53 <clever> and the PA config files, didnt support an index attribute, when describing how the knobs are related

05:53 <clever> so i had to patch pulseaudio, to even be able to describe the card to PA, so it could capture from 3 inputs at once

05:54 <klange> I have an 8086:a170 for the motherboard audio, and the HDMI out on my 1080 is _also_ an HDA.

05:55 ElectronApps has quit [Ping timeout: 258 seconds]

05:55 <clever> 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Tobago HDMI Audio [Radeon R7 360 / R9 360 OEM]

05:55 <clever> thats my hdmi out

05:55 <clever> options snd_hda_intel enable=1,0

05:55 <klange> Dunno if that's an HDA, see if you have snd_hda_codec_hdmi loaded?

05:55 <clever> pretty sure its hda-intel

05:55 <clever> i had to half blacklist it like this

05:55 ElectronApps has joined #osdev

05:55 <clever> because the hdmi->dvi adapters were coming up as a default output randomly

05:56 <klange> I have codec_generic, codec_hdmi, codec_realtek...

05:56 <clever> due to the above enable flag, its likely ignoring the capabilities of the gpu sound

05:56 <clever> so it wont load those codecs

05:56 <clever> [ 7.982597] snd_hda_intel: probe of 0000:01:00.1 failed with error -2

05:56 <clever> [ 8.033124] snd_hda_codec_realtek hdaudioC0D0: autoconfig for ALC898: line_outs=4 (0x14/0x15/0x16/0x17/0x0) type:line

05:57 <klange> My HDMI is hooked up to something without audio anyway.

05:57 <clever> same

05:57 <klange> What I need to support is the one in my laptop...

05:57 <clever> thats why i disabled it

05:57 <klange> Just the speakers... just enough to hear those glorious grunts and gunshots...

05:57 <clever> PA would randomly default to that device, and not switch away

05:58 <clever> so all audio would just mysteriously vanish into /dev/null

05:58 <klange> Interesting. I do find myself twiddling with pavucontrol a lot, but often for application failures and because I have way too many capture devices.

06:00 <clever> Bus 002 Device 007: ID 046d:0a66 Logitech, Inc. [G533 Wireless Headset Dongle]

06:00 <clever> my primary audio device is usb based

06:01 <clever> bInterfaceClass 1 Audio

06:01 <clever> tSamFreq[ 7] 48000

06:01 <clever> wFormatTag 0x0001 PCM

06:01 <clever> looks like you just shove 16bit 48khz audio into the right endpoint and you're done
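
What "16bit 48khz audio" looks like as bytes, as a small sketch. The function names are invented, the square wave is just a convenient test signal, and nothing here talks to USB or any real endpoint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_RATE_HZ 48000u

/* Bytes needed for `ms` milliseconds of 16-bit PCM at 48 kHz. */
size_t pcm_bytes_for_ms(unsigned ms, unsigned channels) {
    return (size_t)SAMPLE_RATE_HZ * ms / 1000u * channels * sizeof(int16_t);
}

/* Fill `out` with `nsamples` mono samples of a square wave at `hz`,
 * stored as signed 16-bit little-endian (s16le) regardless of host
 * byte order. `out` must hold 2 * nsamples bytes. */
void square_wave_s16le(uint8_t *out, size_t nsamples, unsigned hz,
                       int16_t amp) {
    assert(out != NULL && hz > 0 && hz <= SAMPLE_RATE_HZ / 2);
    unsigned half_period = SAMPLE_RATE_HZ / (2u * hz); /* samples per half cycle */
    for (size_t i = 0; i < nsamples; i++) {
        int16_t s = ((i / half_period) % 2 == 0) ? amp : (int16_t)-amp;
        uint16_t u = (uint16_t)s;
        out[2 * i]     = (uint8_t)(u & 0xFFu);  /* low byte first */
        out[2 * i + 1] = (uint8_t)(u >> 8);
    }
}
```

Ten milliseconds of stereo at this format is 1920 bytes, which gives a feel for how small the buffers involved are.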

06:07 <clever> klange: does HDA support any kind of writeback, such as feeding mp3 into the card, and getting flac back out in ram?

06:08 <klange> The interface in general does, it's up to the hardware - you'd need a node that receives and decodes mp3 from a stream, and a node that encodes and outputs flac to another, and to be able to attach them.

06:09 <clever> ah

06:10 <clever> so in theory, it could implement a hw accelerated transcode, with lower watt usage compared to a software solution

06:10 <clever> but i feel like modern audio stacks (pulse, alsa) dont expose such codecs at all, and expect it to already be in pcm form

06:10 <clever> so all of that hw goes to waste?

06:11 Izem has joined #osdev

06:13 <nly> heyy

06:21 <Izem> hola

06:22 <geist> clever: most likely. on cell phones and whatnot i think there's still some value to hw assist

06:22 <geist> but even then it's probably just a DSP

06:22 <geist> 10-15 years ago it was practically mandatory to assist with mp3 decode and whatnot

06:23 <geist> (on battery powered things)

06:23 <clever> yeah

06:23 <clever> i suspect my old mp3 player was more dsp than asic

06:23 <clever> because you could load more codecs into it

06:23 <clever> rca lyra

06:24 <geist> that being said if you go back far enough it was just ARM again. ipod classic first few gens, for example, were basically a dual core ARM7tdmi

06:24 <geist> was before things were big enough that DSPs were being developed for that sort of thing

06:28 <clever> there was also that other ipod model, which was it...

06:28 <clever> https://en.wikipedia.org/wiki/VideoCore#Variants

06:28 <bslsk05> en.wikipedia.org: VideoCore - Wikipedia

06:28 <clever> > The VideoCore II-based VC02 / BCM2722 processor provides video capabilities for Apple's 5th generation iPod.[4]

06:29 <geist> yep

06:29 <clever> a quick glance at the features of the 4th and 5th gen, says they added video playback

06:29 <clever> makes me think it has all of the video hw accel of an rpi

06:32 <geist> something like that

06:33 <geist> early version of it at least

06:34 <clever> yep, before they added an arm core

06:34 <clever> and variables like how many macroblocks/sec it can do are likely tuned differently

06:40 nly has left #osdev [#osdev]

06:59 Robbe has quit [Remote host closed the connection]

07:14 * kingoffrance reads backlog

07:15 <kingoffrance> didnt you guys ever see "flight of the phoenix" ?

07:16 <kingoffrance> eventually the "repair based on toy" flies

07:30 <doug16k> clever, that hardware goes to waste, because historically, audio offload has been so flaky that just PCM is good enough

07:30 <doug16k> if you give me a hardware accelerated audio checkbox, I'd seriously consider unchecking it

07:31 <moon-child> hw accelerated audio usually has really dumb limits

07:31 <moon-child> latency wise mixing is not the bottleneck so it's questionable from the outset

07:32 <doug16k> it's like NIC offload. sure, that would be nice if it did the encryption for me. ya right. if it did, it would be a server NIC chip, not on the cheap ass device I got for at home

07:33 <geist> kingoffrance: heh yeah

07:33 <geist> totally forgot about that

07:34 <doug16k> CRC is about the only offload I'd trust fully. how dumb would it need to be to get that wrong

07:36 <clever> related, the h264 hw decode on the rpi, has some edge cases that complicated it when i tried to use it

07:36 <clever> the api is async, and fully handles b-frames

07:36 <clever> but ffmpeg/mplayer, expected a sync api

07:36 <clever> so mplayer gives you one frame of encoded video, and waits for a decoded frame

07:37 <clever> the rpi hw, expects several mb of encoded data, before giving the 1st frame...

07:37 <clever> b-frames also complicate it, in that both the hw accel, and mplayer, want to re-order the frames, and that just scrambles it more

07:39 <doug16k> yes, video is hard because it has to worry at the level of the differences between whole recent frames to extract the extreme redundancy

07:39 <doug16k> one frame at a time is not feasible

07:40 <clever> b-frames is a special thing, where the frames are encoded in the wrong order, because it compresses better

07:40 <doug16k> that issue with the encoding side leaks to the way the output side works, and decoding

07:40 <clever> and you then have to buffer a few, and display them out of order, to correct for that
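
A toy illustration of the reordering clever describes, assuming each decoded frame carries a presentation timestamp. `struct frame` and `to_display_order` are made-up names; a real decoder reorders within a small sliding window rather than sorting a whole array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoded frame: type is 'I', 'P' or 'B'; pts is the
 * presentation timestamp (display position). */
struct frame { char type; int pts; };

/* Rearrange frames from decode order into display order with an
 * insertion sort on pts. Fine for the handful of frames a reorder
 * buffer would hold. */
void to_display_order(struct frame *f, size_t n) {
    assert(f != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        struct frame key = f[i];
        size_t j = i;
        while (j > 0 && f[j - 1].pts > key.pts) {
            f[j] = f[j - 1];
            j--;
        }
        f[j] = key;
    }
}
```

A stream stored as I0 P3 B1 B2 (the B-frames need P3 decoded first) comes out as I0 B1 B2 P3, which is also why the decoder cannot hand back a displayable frame for every single frame it is fed.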

07:49 asymptotically has joined #osdev

09:08 MarchHare has quit [Ping timeout: 255 seconds]

09:16 asymptotically has quit [Remote host closed the connection]

09:16 asymptotically has joined #osdev

09:35 gog has joined #osdev

09:37 CryptoDavid has joined #osdev

09:40 grange_c has quit [Quit: The Lounge - https://thelounge.chat]

09:41 grange_c has joined #osdev

09:48 Robbe has joined #osdev

09:48 dormito has quit [Ping timeout: 240 seconds]

09:58 divine has quit [Ping timeout: 252 seconds]

10:00 divine has joined #osdev

10:05 GeDaMo has joined #osdev

10:16 ElectronApps has quit [Read error: Connection reset by peer]

10:19 ElectronApps has joined #osdev

10:20 scaleww has joined #osdev

10:21 dormito has joined #osdev

10:39 asymptotically has quit [Quit: Leaving]

10:42 <doug16k> oh wow, description is wrong on _mm_set_pd1 https://software.intel.com/sites/landingpage/IntrinsicsGuide/#!=undefined&text=double&expand=6085,6085,6085&techs=SSE,SSE2,SSE3

10:42 <bslsk05> software.intel.com: Intel® Intrinsics Guide

10:42 <doug16k> it doesn't broadcast it

10:42 <doug16k> it's no wonder half the intrinsic use has stupid memory punning

10:43 <doug16k> seems like most people don't even know the cvt and casts exists

10:43 <doug16k> s/s$//

10:46 <doug16k> if it broadcasts it, what does _mm_set1_pd do?

10:46 <doug16k> broadcast it?

10:48 <doug16k> ah, it'd be move_sd

10:49 <doug16k> hardly even worth touching intrinsics anymore. autovectorizer will do it from simple code

10:50 <doug16k> it will see what you mean then just generate vector code for it

10:50 <doug16k> if you say it right

10:52 <doug16k> even if you did something like create an x rotation matrix and multiply it by another matrix, it can "know" all the zeros and ones and unknown and do it really well

10:53 <doug16k> if you force intrinsics down its throat it probably will do all the muls with 1 and 0 and adds of 0
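
The kind of "simple code" doug16k seems to mean might look like the sketch below: plain loops and a rotation matrix whose zeros and ones are visible to the compiler once everything is inlined. The type and function names are invented, the cosine and sine are passed in to keep the example free of libm, and whether a given compiler actually folds the constants and vectorizes depends on the compiler and flags, so inspect the generated assembly rather than take it on faith.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float m[4][4]; } mat4;

/* Rotation about the x axis; c and s are cos(angle) and sin(angle). */
mat4 mat4_rot_x(float c, float s) {
    mat4 r = {{
        {1, 0,  0, 0},
        {0, c, -s, 0},
        {0, s,  c, 0},
        {0, 0,  0, 1},
    }};
    return r;
}

/* Straightforward triple loop; nothing here is hand-vectorized. */
mat4 mat4_mul(const mat4 *a, const mat4 *b) {
    mat4 out;
    assert(a != NULL && b != NULL);
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++)
                sum += a->m[i][k] * b->m[k][j];
            out.m[i][j] = sum;
        }
    }
    return out;
}

/* Rotate m about x: after inlining, the compiler can see which
 * multiplications are by constant 0 or 1. */
mat4 rotate_x(const mat4 *m, float c, float s) {
    mat4 r = mat4_rot_x(c, s);
    return mat4_mul(&r, m);
}
```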

11:09 zoey has joined #osdev

11:14 <doug16k> that _pd1 one must have been some inconsistently named one that was renamed

11:17 Izem has quit [Quit: Izem]

11:40 <doug16k> I can't believe linux. parallel make spends more time in clear_page_rep than anything else

11:40 <doug16k> whoever says clearing it at the last moment is good, is wrong

11:42 <doug16k> I have 33G of completely unused memory (not even used by cache) and it is blocking the paging operations zeroing pages

11:43 <mjg> that's a huge tradeoff

11:43 <doug16k> why not clear pages during idle

11:43 pony has quit [Quit: WeeChat 2.8]

11:44 <mjg> idle clearing was the go to method in bsds for years

11:44 pony has joined #osdev

11:44 <mjg> turns out the free pages get shredded immediately

11:44 <doug16k> well it is the right way now, given that a parallel c++ compile's top function is clear_page_rep

11:45 <mjg> i don't have numbers handy right now, but something as simple as kernel compile shreds through gigabytes of memory in terms of total pages used

11:45 <doug16k> try it. get something big and run sudo perf top and start make -j$(nproc). see if it's all clearing pages and malloc time

11:45 <doug16k> something with tons of ram

11:45 <mjg> well grab yourself an ebpf script or something and count how many times the routine is called to begin with in your workload

11:46 <mjg> multiply by 4k and that's how much spare memory you would have to have to avoid calling it

11:46 <mjg> i'm telling you would walk through your stash almost immediately

11:46 <mjg> and then you are back to zeroing on demand

11:46 <doug16k> it's not like I imagined that clearing pages would be slow. I am talking about perf top telling me that it spent more cycles clearing pages than anything else

11:47 <doug16k> you saying that g++ will eat through 32 GB of freed pages in seconds in CoW?

11:47 <doug16k> zeroed*

11:48 <mjg> i don't have one liners for linux

11:48 <doug16k> COW + demand

11:48 <mjg> i can get you numbers from freebsd in 10 minutes

11:48 <doug16k> it might show something else on freebsd in a profile

11:48 <mjg> i'm not talking about profile

11:49 <mjg> i'm talking about number of page zeroing calls

11:49 <mjg> and consequently memory demand

11:49 <doug16k> I am talking about number of cycles spent where

11:49 <mjg> and i'm telling you background zeroing will not help

11:49 <mjg> because whatever stash you may amass will get shredded immediately

11:49 <mjg> and then you are back to zeroing on demand

11:50 <doug16k> not really

11:50 <doug16k> let's say I did make -j1. other idling cpus would keep it full of zeroed pages and it would never block

11:50 <doug16k> or if -j isn't keeping up

11:50 <doug16k> or if the build becomes a bit sequential

11:50 <mjg> sure, you can come up with a contrived scenario where it "manages"

11:51 <mjg> i assumed parallel build

11:51 <mjg> make -j$(nproc)

11:51 <mjg> which keeps all cpus busy

11:51 <mjg> you quoted it yourself

11:51 <doug16k> yes

11:51 <doug16k> but even then, it's not perfect

11:51 <doug16k> there will be moments of idle for various reasons and it could do some proactive clearing

11:52 <doug16k> I see your point though. I would need to know the sum of all allocations during the build. if that exceeds the free pages by much it will degenerate to last moment

11:53 <doug16k> but just because it can't stay optimal the whole time, doesn't mean it won't benefit from the zeroed page pool sometimes
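
The two positions can be put side by side in a userspace toy, under loud assumptions: `malloc` stands in for the kernel's free page list, the pool capacity is arbitrary, and all names are invented. It is a sketch of the idea being argued about, not of how Linux or FreeBSD manage pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define ZP_PAGE_SIZE 4096
#define ZP_POOL_CAP  8      /* arbitrary pool size for this sketch */

struct zero_pool {
    void  *pages[ZP_POOL_CAP];
    size_t count;
    size_t hits;    /* allocations served from the pre-zeroed pool */
    size_t misses;  /* allocations that had to zero on demand */
};

/* What an idle CPU would do: zero one free page and stash it.
 * Returns 1 if a page was added, 0 if the pool is full or out of memory. */
int pool_idle_refill(struct zero_pool *p) {
    assert(p != NULL);
    if (p->count == ZP_POOL_CAP)
        return 0;
    void *page = malloc(ZP_PAGE_SIZE);  /* stand-in for a free page */
    if (page == NULL)
        return 0;
    memset(page, 0, ZP_PAGE_SIZE);
    p->pages[p->count++] = page;
    return 1;
}

/* What the page fault path would do: take a pre-zeroed page if one is
 * available, otherwise fall back to zeroing at the last moment. */
void *pool_alloc_zeroed(struct zero_pool *p) {
    assert(p != NULL);
    if (p->count > 0) {
        p->hits++;
        return p->pages[--p->count];
    }
    void *page = malloc(ZP_PAGE_SIZE);
    if (page != NULL) {
        memset(page, 0, ZP_PAGE_SIZE);
        p->misses++;
    }
    return page;
}
```

doug16k's argument is that every hit is time saved; mjg's is that under a busy parallel build the pool drains almost at once and nearly every allocation ends up a miss anyway.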

11:53 <mjg> well i got your numbers

11:53 <mjg> dtrace -n 'fbt::pagezero:entry { @[stack()] = count(); }' from building the freebsd kernel

11:54 <mjg> just from the page fault handler alone i got 36413914 calls

11:54 <mjg> that amounts to almost 139GB of memory

11:54 <mjg> total real time 52 seconds
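
The "multiply by 4k" arithmetic, written out as a small sketch. The 4096-byte page size follows mjg's "4k"; the function names are invented.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 4096u  /* "multiply by 4k" */

/* Total bytes zeroed for a given number of page-zeroing calls. */
uint64_t zeroed_bytes(uint64_t pagezero_calls) {
    assert(pagezero_calls <= UINT64_MAX / PAGE_BYTES);
    return pagezero_calls * PAGE_BYTES;
}

/* Convert bytes to GiB (2^30 bytes). */
double bytes_to_gib(uint64_t bytes) {
    return (double)bytes / (1024.0 * 1024.0 * 1024.0);
}
```

For 36413914 calls this gives 149,151,391,744 bytes, a little under 139 GiB, or roughly 2.6 GiB of freshly zeroed pages per second over the 52-second build. That is far more than a pool of 33G of idle memory could have held in advance.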

11:55 <doug16k> how close to perfect cpu usage?

11:55 <doug16k> you have user and wallclock?

11:55 <doug16k> elapsed times I mean

11:55 <mjg> that one is a little disfigured: 3023.72s user 359.25s system 6423% cpu 52.665 total

11:55 <doug16k> every configure script serializing the build would fill up the pool

11:56 <mjg> there is no configure here

11:56 <mjg> what is here is lz4 or some other compression algo which trails at low thread count as everything else is done

11:56 <doug16k> 8 thread cpu?

11:56 <mjg> 104

11:56 <doug16k> lol oh ya

11:56 <doug16k> not even close to 100%
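
How the "not even close to 100%" reading falls out of the time output, as a sketch with invented function names. It only computes averages over the whole run, so it says nothing about when the idle time occurred, which is the point mjg makes next.

```c
#include <assert.h>

/* Average number of CPUs kept busy over the run. */
double avg_busy_cpus(double user_s, double sys_s, double real_s) {
    assert(real_s > 0.0);
    return (user_s + sys_s) / real_s;
}

/* Fraction of the machine's total CPU time that was actually used. */
double utilisation(double user_s, double sys_s, double real_s,
                   unsigned ncpus) {
    assert(ncpus > 0);
    return avg_busy_cpus(user_s, sys_s, real_s) / (double)ncpus;
}
```

With 3023.72s user, 359.25s system and 52.665s elapsed, about 64.2 CPUs were busy on average (the reported 6423%), which on 104 threads is roughly 62% utilisation.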

11:57 <mjg> > 13:56 < mjg> what is here is lz4 or some other compression algo which trails at low thread count as everything else is done

11:57 <mjg> that's why

11:57 <mjg> prior to that utilisation was much closer to 100

11:57 <doug16k> right so those wasted cpus would be keeping zero page pool full

11:57 <doug16k> it would just sit full

11:57 <mjg> well they would not help much because most of the workload happened at the same time

11:58 <mjg> and shredded through gigabytes of memory

11:58 <mjg> the best you could do is slightly speed up the lz4 sucker

11:58 <mjg> which would go away if better concurrency was introduced there

12:02 <GeDaMo> It's a pity that memory wasn't slightly smarter so you could tell it to zero itself over a range

12:02 isaacwoods has joined #osdev

12:02 <doug16k> GeDaMo, that and more may be coming

12:02 <doug16k> there is a bunch of research right now into processing-in-memory

12:03 <doug16k> little coprocessor things that are right there next to the row buffer

12:03 <GeDaMo> I'm pretty sure that's been promised for a while :P

12:03 <doug16k> row copy and row clear and even vector arithmetic

12:03 <mjg> amd has a clzero instruction

12:04 <mjg> interestingly, in my tests it reduces the cost of zeroing, but it uses non-temporal stores

12:04 <mjg> so then you pay for cache misses in userspace

12:04 <mjg> getting a net loss :)

12:04 <doug16k> it's great for pre-clearing

12:04 <doug16k> doesn't wreck cache

12:05 <mjg> that's the standard argument for background zeroing

12:05 <mjg> then it turns out the pages were in llc all along

12:05 <mjg> and now you evicted them

12:05 <mjg> basically the better job you do reusing pages, the worse this gets

12:06 <doug16k> it should kick over to pre-clear when hammering clearing

12:07 <doug16k> if cpus are idling

12:08 <doug16k> how the hell can clearing pages be at the top of the profile of a compile

12:09 <doug16k> either gcc is optimized by a god or something is wrong with the memory management

12:11 <doug16k> is there a perf top for bsd?

12:11 <doug16k> perf is a linux thing, right?

12:13 <mjg> https://people.freebsd.org/~mjg/fg/flix1-r363215-buildkernel.svg is what i suspect you are looking for

12:13 <mjg> pagezero_erms

12:13 <mjg> zeroing is at top of the profile because 1. it is slow 2. it is in huge demand 3. rest of the kernel is not that bad

12:13 <mjg> the above is an old flamegraph, would have to update

12:13 <mjg> but ye, pagezero is pretty visible

12:14 <doug16k> ya. make -j$(nproc) is pretty much a lot of page fault handler with blips of compilation here and there

12:15 <doug16k> I guess this shows that modern cpus can breeze through any amount of computation, and they only have some problems with cache miss

12:15 <doug16k> and the next level cache, page table miss

12:16 <doug16k> hopefully never miss all the way to swap

12:16 <GeDaMo> Is it mmapping the source files?

12:16 <doug16k> but billions of times a day, page fault CoW

12:16 <doug16k> GeDaMo, it's the demand paging

12:17 <doug16k> touching the read only shared zeroed page and committing a page on write

12:18 <doug16k> when you allocate memory, it pretends, and you get the shared zeroed page mapped readonly. when you write it, it coffee sprays, and clears you a page right then, and maps it into your process, and returns

12:19 <doug16k> so compilers spend more time going "oh they were really going to use the page" and clearing it

12:19 <doug16k> ...than compiling

12:21 <doug16k> I want to get turbo C++ working in DOS and time it compiling 5000 source files

12:21 <doug16k> on one cpu, on 3950x

12:22 <doug16k> how many milliseconds you think?

12:22 <doug16k> 10?

12:23 <doug16k> I can't imagine it being over 100ms

12:23 <GeDaMo> Does it actually copy a zero page or does it just write zeroes out to a page? Would that make much difference?

12:23 <doug16k> it writes zeros

12:24 <doug16k> that already-zeroed page is being shared by all the other virtual memory ranges that are allocated, but not yet written

12:24 <doug16k> when you write it, it looks and realizes that you are allowed to write there, so at that moment, it actually gets you your own page, clears it, maps it there, and restarts the instruction

12:25 <doug16k> until then it just pointed at the zeroed page and was readonly, even though you are supposed to be allowed to write there

12:25 <doug16k> obviously can't let you trash the shared page
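
The demand-zero behaviour described above can be modelled in a few lines. This is a toy sketch, not how any real kernel is structured: `AddressSpace`, `allocate`, `read` and `write` are hypothetical names, an immutable `bytes` object stands in for the read-only shared zero page, and a "page fault" is just a branch in `write`.

```python
# Toy model of demand-zero paging. All names here are hypothetical.
PAGE_SIZE = 4096
ZERO_PAGE = bytes(PAGE_SIZE)    # the single shared, read-only page of zeros


class AddressSpace:
    def __init__(self):
        self.pages = {}         # virtual page number -> shared bytes or private bytearray
        self.zero_fills = 0     # pages cleared at fault time

    def allocate(self, vpn_start, count):
        # "Pretend" allocation: every page just points at the shared zero page.
        for vpn in range(vpn_start, vpn_start + count):
            self.pages[vpn] = ZERO_PAGE

    def read(self, addr):
        return self.pages[addr // PAGE_SIZE][addr % PAGE_SIZE]

    def write(self, addr, value):
        vpn, off = divmod(addr, PAGE_SIZE)
        if self.pages[vpn] is ZERO_PAGE:
            # Write fault on the shared page: get a private page, clear it, map it.
            self.pages[vpn] = bytearray(PAGE_SIZE)
            self.zero_fills += 1
        # "Restart the instruction": the write now lands in the private page.
        self.pages[vpn][off] = value


space = AddressSpace()
space.allocate(0, 1024)                 # 4 MiB "allocated", nothing cleared yet
space.write(5 * PAGE_SIZE + 7, 0xAB)    # first touch of page 5 triggers the clear
print(space.zero_fills, space.read(5 * PAGE_SIZE + 7), space.read(0))
```

Only the page that was actually written gets cleared; the other 1023 pages still alias the shared zero page, which is why the clearing cost shows up at write time rather than at allocation time.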

12:26 <doug16k> the reasoning behind clearing it right at the last moment is, it'll be in the cache from the clearing

12:27 <doug16k> my argument is Amdahl's law - move that sequential bit out and run it in parallel if you can

12:29 <doug16k> or even not parallel, concurrently during idle time would work

12:31 <doug16k> I just find it very hard to believe that you wouldn't get a gain from pooling zeroed pages, even if you did end up draining it sometimes and falling back to last-moment
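
The pooling idea argued for above can be sketched as a pool that is refilled during idle time and falls back to clearing at the last moment when it runs dry. `ZeroedPagePool` and its methods are hypothetical names, and the model only counts where pages come from; it says nothing about the cache effects raised earlier in the discussion.

```python
from collections import deque

PAGE_SIZE = 4096


class ZeroedPagePool:
    """Hypothetical pool of pre-cleared pages with a last-moment fallback."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pool = deque()
        self.from_pool = 0          # faults served from the pool
        self.zeroed_at_fault = 0    # faults that had to clear a page themselves

    def refill_when_idle(self, budget):
        """Pre-clear up to `budget` pages, e.g. from an otherwise idle CPU."""
        added = 0
        while added < budget and len(self.pool) < self.capacity:
            self.pool.append(bytearray(PAGE_SIZE))
            added += 1
        return added

    def get_zeroed_page(self):
        if self.pool:
            self.from_pool += 1
            return self.pool.popleft()
        # Pool drained: degenerate to clearing at the last moment.
        self.zeroed_at_fault += 1
        return bytearray(PAGE_SIZE)


pool = ZeroedPagePool(capacity=8)
pool.refill_when_idle(budget=8)
for _ in range(20):                 # a burst of faults with no idle time in between
    pool.get_zeroed_page()
print(pool.from_pool, pool.zeroed_at_fault)
```

With a burst larger than the pool, most faults still end up clearing at the last moment, which is the objection raised earlier: the pool only helps for the fraction of demand that idle time can cover.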

12:51 ahalaney has joined #osdev

13:22 immibis has quit [Remote host closed the connection]

13:23 aquijoule__ has quit [Remote host closed the connection]

13:23 aquijoule__ has joined #osdev

13:24 pieguy128 has quit [Quit: ZNC 1.8.2 - https://znc.in]

13:24 pieguy128 has joined #osdev

13:25 silverwhitefish has quit [Remote host closed the connection]

13:25 silverwhitefish has joined #osdev

13:26 gog has quit [Remote host closed the connection]

13:56 Izem has joined #osdev

14:06 nick8325 has joined #osdev

14:09 gog has joined #osdev

14:14 vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

14:16 nick8325 has quit [Quit: Leaving.]

14:22 nick8325 has joined #osdev

14:36 Brnocrist has quit [Ping timeout: 252 seconds]

14:46 Izem has quit [Quit: Izem]

14:49 Brnocrist has joined #osdev

14:56 andydude has joined #osdev

14:57 ElectronApps has quit [Remote host closed the connection]

15:03 freakazoid333 has joined #osdev

15:34 Izem has joined #osdev

15:34 Izem has quit [Client Quit]

15:36 Brnocrist has quit [Ping timeout: 255 seconds]

15:36 Skyz has joined #osdev

15:38 <Skyz> Interesting discussion happening here: https://forum.osdev.org/viewtopic.php?f=15&t=42259

15:38 <bslsk05> forum.osdev.org: OSDev.org • View topic - Reinventing Unix is not my problem

15:51 andydude has quit [Quit: andydude]

15:55 andydude has joined #osdev

16:20 andydude has quit [Quit: andydude]

16:23 andydude has joined #osdev

16:24 Izem has joined #osdev

16:24 andydude has quit [Client Quit]

16:26 Skyz has quit [Quit: Client closed]

16:30 MarchHare has joined #osdev

16:39 scaleww has quit [Quit: Leaving]

16:43 sts-q has quit [Ping timeout: 252 seconds]

16:47 sts-q has joined #osdev

17:01 vdamewood has joined #osdev

17:06 vinleod has joined #osdev

17:06 iorem has quit [Quit: Connection closed]

17:07 vdamewood has quit [Ping timeout: 255 seconds]

17:07 vinleod is now known as vdamewood

17:13 srjek|home has joined #osdev

17:25 tacco has joined #osdev

17:40 Brnocrist has joined #osdev

18:00 vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

18:11 vdamewood has joined #osdev

18:22 <dzwdz> i have a weird issue - iret'ing from an interrupt causes a #GP in bochs but works just fine in qemu

18:23 <dzwdz> when the iret is executed in bochs, it says that something is not a valid code segment

18:23 Izem has quit [Ping timeout: 255 seconds]

18:23 <j`ey> doug has said a few times that bochs is way more strict than qemu

18:23 <dzwdz> yup, i've noticed that already

18:23 <dzwdz> i had some issues with the GDT earlier today and it helped me catch that

18:24 <dzwdz> but i have no idea what could be causing this

18:24 <dzwdz> the interrupt handler is a single iret

18:27 <dzwdz> that should work fine, right?

18:33 Izem has joined #osdev

18:34 <gog> depends on the interrupt

18:34 <gog> some will need to have the stack popped once, some won't

18:34 <gog> for exceptions anyhow, irqs don't

18:36 <dzwdz> it works fine on qemu though

18:36 <dzwdz> if the stack pointer wasn't in the right position then both qemu and bochs would break, right?

18:37 <dzwdz> also i should probably mention that this is a software interrupt

18:38 tenshi has quit [Quit: WeeChat 3.2]

18:40 <gog> hm yeah

18:40 <gog> idk then for sure

19:05 freakazoid333 has quit [Read error: Connection reset by peer]

19:17 brynet has quit [Quit: leaving]

19:41 brynet has joined #osdev

19:46 <geist2> mjg: re: clzero whats interesting there is ARM more or less mandates it

19:46 <geist2> ie, doesn't have to use `dc zva` but pretty much all memsets i've seen in real systems just revert to it when setting zeros

19:46 <geist2> i do kinda wonder about the logic of that, but i dont think anyone has really done any tests to see if it's the best idea in general

19:52 johnjay has quit [Ping timeout: 252 seconds]

19:52 <geist> dzwdz: the error code for the #GP should have the offending code segment i believe

19:52 johnjay has joined #osdev

19:53 <geist> if it's complete nonsense then it sounds like your stack is corrupted or misaligned

19:53 <dzwdz> the bochs error message shows that it's the r0 data segment

19:53 <geist> what is r0?

19:53 <dzwdz> ring 0

19:53 <geist> ah there you go then
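
For reference, a selector-style error code such as the one #GP pushes can be decoded as below. This is a sketch: the function name is hypothetical, and the value 0x10 is an assumed example (the log never quotes the actual error code), chosen because it is what a reference to GDT entry 2 would look like.

```python
def decode_selector_error_code(code: int) -> dict:
    """Split an x86 selector error code into its fields.

    Bit 0: EXT (event was external to the program)
    Bit 1: IDT (index refers to the IDT)
    Bit 2: TI  (if IDT is clear: 0 = GDT, 1 = LDT)
    Bits 3-15: index into the selected table
    """
    external = bool(code & 0x1)
    in_idt = bool(code & 0x2)
    in_ldt = bool(code & 0x4)
    index = (code >> 3) & 0x1FFF
    if in_idt:
        table = "IDT"
    elif in_ldt:
        table = "LDT"
    else:
        table = "GDT"
    return {"external": external, "table": table, "index": index}


# 0x10 is an assumed example value, not taken from the log.
print(decode_selector_error_code(0x10))
```

An error code of 0 carries no selector information, so a non-zero value pointing at a data descriptor is a strong hint about which value iret tried to load into CS.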

19:53 <dzwdz> i don't modify the stack at all, though

19:54 <geist> doesn't matter to the computer

19:54 <dzwdz> but what else would corrupt the stack?

19:54 <geist> if it's your first iret i'd stick a breakpoint or an infinite loop just in front of the iret and manually verify the stack

19:54 <geist> who knows. root cause it first

19:55 <dzwdz> that's a good idea, idk why i haven't thought of that

19:55 <geist> like, you take an irq and dont touch the stack *at all*?

19:57 freakazoid333 has joined #osdev

19:59 <dzwdz> aren't IRQs the ones caused by hardware?

19:59 <dzwdz> it's a software interrupt

19:59 <geist> sure. same thing

19:59 <dzwdz> the IDT points to the iret

20:00 <geist> ah so it's literally a one byte irq handler?

20:00 <dzwdz> yup

20:00 <geist> well, again, gotta root cause it

20:01 <geist> that *should* work but there are a few reasons it could still get screwed up

20:01 <geist> like, for example, are you in 64bit?

20:01 <dzwdz> nope, 32bit

20:01 <geist> have to use the right form of iret (iretq for 64bit)

20:01 <dzwdz> i tried that just to be sure, but it didn't even compile

20:02 <geist> also if you are running on 'bad' segments prior to the irq it might still be okay up until you reload them via the iretq

20:02 <dzwdz> i'm pretty certain that my segments are good now

20:02 <geist> like if your CS is actually wrong, it takes the int, saves it on the stack, switches to a 'good' cs, runs iret, switches back to the bad one and faults

20:03 <dzwdz> i'm almost certain that both the code and data segments are correct

20:03 <geist> you can get that if you are running on some existing segments, reload your GDT, and dont reload your segments

20:03 <geist> what are their numbers?

20:03 <geist> did you reload all your segment registers after setting up your GDT?

20:04 <dzwdz> i've reloaded the code segment

20:04 <dzwdz> i should probably reload the other ones too

20:04 <geist> you absolutely must

20:04 <geist> what are the segment numbers?

20:05 <dzwdz> ring 0 code is 1, ring 0 data is 2, ring 3 code is 3, ring 3 data is 4

20:06 <geist> what values are you putting in the registers

20:06 <dzwdz> also, brb

20:06 <geist> sure, thanks for telling me you're brbing, btw. nothing more frustrating when someone just disappears for a while and doesn't say anything while trying to help them :)

20:28 immibis has joined #osdev

20:30 <dzwdz> ok, i'm back

20:30 <dzwdz> i'm putting the index of the ring 0 data segment * 8 into %ds

20:31 <dzwdz> it uses the same enum as the code which sets the gdt up

20:31 <dzwdz> and i'm not changing any other segments

20:34 <geist> which is what, 0x10?

20:35 <geist> and cs has 0x8 in it?

20:35 <dzwdz> i'm not putting anything in cs

20:35 <geist> then what is in it

20:35 <dzwdz> wait a sec

20:35 <geist> i'm just trying to ask what actual numbers are in your cs and ds/es/fs/gs/ss

20:36 <geist> i think i'm zeroing in on something but i always want you to check your work and make sure all your blindspots are covered

20:36 <dzwdz> cs is 8, ds is 10

20:36 <dzwdz> in hex
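
The values above follow from how a selector is laid out: the descriptor index sits in bits 3 and up, so index * 8 gives the selector for ring 0 use of a GDT entry. A small sketch, with `selector` as a hypothetical helper; the ring 3 values assume the common convention of setting RPL to 3, which the log does not state.

```python
def selector(index: int, rpl: int = 0, ldt: bool = False) -> int:
    """Build a segment selector: index in bits 3-15, TI in bit 2, RPL in bits 0-1."""
    if not 0 <= index < 8192:
        raise ValueError("descriptor index out of range")
    if not 0 <= rpl <= 3:
        raise ValueError("RPL must be 0..3")
    return (index << 3) | (int(ldt) << 2) | rpl


# GDT layout as described in the channel: indexes 1 to 4.
GDT_LAYOUT = {"ring 0 code": 1, "ring 0 data": 2, "ring 3 code": 3, "ring 3 data": 4}

for name, index in GDT_LAYOUT.items():
    rpl = 3 if name.startswith("ring 3") else 0   # RPL 3 for user segments is an assumption
    print(f"{name}: {selector(index, rpl):#04x}")
```

For the ring 0 entries this gives 0x08 and 0x10, matching the cs and ds values reported in the channel.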

20:36 <geist> good. so after you reload your GDT you did reload ds/es/fs/gs/ss?

20:36 <geist> if you did not, please do so

20:36 <geist> also want to far jmp to reload cs too

20:37 <geist> that will sniff out any bugs in your GDT

20:37 <dzwdz> i think i might know what's the issue

20:37 <dzwdz> the other segments are complete nonsense, including the stack one

20:38 <geist> segment register values?

20:38 <dzwdz> yup

20:38 <geist> yes you *must* put your house in order when you set up your own GDT

20:38 <geist> usually very first thing you do is reload all of the registers

20:38 <dzwdz> alright, thanks a lot ^^

20:38 <geist> this could be a big difference from bochs and qemu: the default state of the cpu and/or bios both of them are using

20:39 <geist> if you're relying on whatever existing state was there you'll end up with different results

20:39 <geist> side note, for the int instruction are you in ring3 or ring 0?

20:39 <dzwdz> i'm kinda sleepy already, and i know better than to work on this now, i'll fix this tomorrow

20:39 <dzwdz> ring 0

20:39 <geist> okay, so if it were ring3 it'd *also* be loading SS and whatnot, which would also probably pick up garbage and fail too

20:40 <geist> cool, glad i could help!

20:41 <dzwdz> also just to be sure: i can far jump to the same code segment that i'm in already, right?

20:42 GeDaMo has quit [Quit: Leaving.]

20:42 <geist> totally

20:42 <geist> the trick is loading the GDT doesn't set the new hidden cached values, you have to reload the segment registers on the cpu to refresh values from the GDT

20:43 <geist> or, do something that causes segments to be reloaded, like an iret instruction

20:43 <geist> so i think in this case the iret is just the canary for an earlier problem

20:43 <geist> one that you'll find if you immediately reload all your segments

20:43 <dzwdz> and if i'm far jumping then there's no reason at all to set the data segment manually before that, right?

20:43 <dzwdz> s/data/code

20:43 <geist> right

20:44 <geist> that's why you do it almost immediately in your code

20:44 <geist> get your new GDT, reload all the segment registers and now you're running on yours

20:44 <geist> otherwise you're using whatever stale segments were there from the bios/etc

20:45 <geist> plus if your GDT descriptors are busted, it'll potentially fault almost immediately

20:45 <geist> at least for CS. DS/SS will probably fault pretty quickly, or go off in the weeds, if you start accessing things

20:45 <geist> ES/FS/GS may go for some time without blowing up, since they're not used as much

20:46 <geist> depending on the type of descriptor corruption you have

20:47 dormito has quit [Ping timeout: 255 seconds]

20:55 Izem has quit [Quit: Izem]

21:12 mcs51 has joined #osdev

21:19 dormito has joined #osdev

21:20 ^[ has quit [Ping timeout: 240 seconds]

21:22 ^[ has joined #osdev

21:29 ^[ has quit [Ping timeout: 255 seconds]

21:42 Skyz has joined #osdev

21:51 mahmutov has joined #osdev

21:51 ^[ has joined #osdev

21:59 gog has quit [Quit: bye]

22:05 freakazoid333 has quit [Read error: Connection reset by peer]

22:06 ahalaney has quit [Quit: Leaving]

22:16 sortie has quit [Quit: Leaving]

22:19 zoey has quit [Remote host closed the connection]

22:22 freakazoid333 has joined #osdev

22:29 Skyz has quit [Quit: Client closed]

22:31 mcs51 has quit [Quit: WeeChat 3.0]

22:34 xenos1984 has quit [Remote host closed the connection]

22:36 xenos1984 has joined #osdev

23:19 elastic_dog has quit [Ping timeout: 246 seconds]

23:23 wolfshappen has quit [Quit: later]

23:25 wolfshappen has joined #osdev

23:27 <kazinsal> today I learned that like, half a dozen of the regulars in one ofe the community's I'm part of are colourblind. totally rethinking my approaches to highlighting in UX design now

23:27 <kazinsal> also apparently I cannot spell after a few work-approved afternoon beers damn

23:31 <geist> ah work approved afternoon beers

23:31 <geist> i do still wonder exactly what happened to the beer keg we had at the office when it shut down

23:31 <geist> far as i know no one did anything with it, and it's just been sitting there for a year and a half

23:32 <kazinsal> oof, that's going to be some raunchy brew

23:32 <kazinsal> today was one of our datacenter project guys' last day so we did a virtual sendoff party

23:32 <geist> yah and one of the taps had non alcoholic kombucha

23:32 <geist> we joke that it's probably alcoholic now

23:33 <kazinsal> ha

23:33 <kazinsal> hmm. reminds me, the local polytechnic university does beer sales on friday from their brew lab. should grab a bus over there quickly and pick some up

23:33 <geist> annoyingly the little fan on one of my rpi4s has died

23:33 <kazinsal> apparently you can get kegs as well but, uh, that's probably a bit much

23:34 <geist> or is dying. interesting that its little bearing is going, but it runs okay when i remove it from the case (and it gets to vibrate)

23:34 <geist> but as soon as i put it back on the case it barely spins. i guess it being held in a vibration free environment it ends up in some localized pattern where the vibration cancels its movement

23:35 <geist> i'm sure there's some name for that sort of thing that mechanical people know of

23:35 <geist> same effect if i stand it up on the table while spinning but push down such that it can't vibrate freely

23:37 gog has joined #osdev

23:53 srjek|home has quit [Ping timeout: 240 seconds]

23:53 <geist> yay fixed the fan

23:53 <geist> popped it out, pulled the fan out of the sleeve bearing, put some more oil on it

23:56 <geist> Singer machine oil. works great

23:59 <geist> probably will dry out again pretty quick, but now i know

23:59 <geist> i had watched someone on youtube do this. i've been throwing out sleeve bearing fans for years that start to make a lot of racket

23:59 <geist> lots of times i think it has just dried out