klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
tacco has quit []
<Mutabah> syscall usually refers to a direct call into the kernel (via a software interrupt/ecall/syscall/sysenter/...)
zoey has quit [Ping timeout: 255 seconds]
<Mutabah> while a "system api" could be a library provided by the base OS
<moon-child> it's kinda contextual, though. I've heard e.g. posix open/read/write/etc. referred to as system calls
<klange> I would say a "system call" is a specific action, and an "API" is a general concept.
<klange> The API is the combination of the system calls, the data structure layouts, the etc.
scaleww has quit [Ping timeout: 252 seconds]
<Skyz> So one is user-level and one is kernel
scaleww has joined #osdev
<gog> no
<gog> API is a concept independent of privilege level
<Skyz> That makes sense
<Skyz> I'm looking for something that generates pixels on the screen
<Skyz> I want my mouse to click on a pixel of a certain color
<Skyz> This should be able to be done in C
<gog> that's a little complicated
<gog> so you'll have to scan the frame buffer for pixels of a certain color, move the cursor to the coordinates of that pixel on screen, then have it click
<gog> but reading the frame buffer isn't necessarily easy or fast
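The scan gog describes can be sketched in C as below. This is only a minimal illustration, assuming the frame buffer has already been obtained as an array of 32-bit pixels; `find_pixel`, its parameters, and the `pitch` convention (row stride counted in pixels) are hypothetical names, not any platform's API. Getting hold of the frame buffer and moving the cursor are OS-specific and not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: linear scan of a 32-bit-per-pixel frame buffer.
 * `pitch` is the row stride in pixels (assumed; it may exceed `width`).
 * Returns 1 and stores the coordinates of the first match, else 0. */
static int find_pixel(const uint32_t *fb, size_t width, size_t height,
                      size_t pitch, uint32_t color,
                      size_t *out_x, size_t *out_y)
{
    for (size_t y = 0; y < height; y++) {
        const uint32_t *row = fb + y * pitch;
        for (size_t x = 0; x < width; x++) {
            if (row[x] == color) {
                *out_x = x;
                *out_y = y;
                return 1;
            }
        }
    }
    return 0;
}
```

The scan touches every pixel in the worst case, which is part of why gog calls reading the frame buffer "not necessarily easy or fast".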
<Skyz> I probably could find something with an ML library
<bslsk05> ​docs.microsoft.com: GetPixel function (wingdi.h) - Win32 apps | Microsoft Docs
<gog> that works too
<Skyz> This has gotten really interesting. I'm trying to do this with a game, so maybe there is a library that can read data from the client like the pixel
<Skyz> OpenGL hacking or DirectX hacking
gog has quit [Ping timeout: 246 seconds]
scaleww has quit [Quit: Leaving]
<moon-child> I kinda wish that strict aliasing rules were defined like atomics
<moon-child> like, if you have an int* and a float* that are equal, and you write to both, you don't get a guarantee about which write happens first, but neither of the writes gets dropped
<moon-child> function call can be like a barrier/fence
<moon-child> and you can opt into the nasal demons with restrict
<Skyz> I'm not sure if this was a good idea
<Skyz> It could be really useful for AI
<Skyz> moon-child any practical value to it personal or otherwise?
<moon-child> practical value to what?
<Skyz> I guess you could specify it, are there errors that happen now if they are equal?
<moon-child> if what are equal?
<Skyz> int* and float*
<moon-child> oh. Currently the behaviour of an aliased access is undefined; I want to make it defined in a limited capacity. So from the perspective of a strictly conformant program, I've not changed anything
<moon-child> idea is to still enable optimizations, while reducing the degree of WTFery the compiler is allowed to perpetrate
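For context, here is a small sketch of how things stand in C today, not of moon-child's proposal. `float_bits` and `store_both` are hypothetical names; the first shows the well-defined way to reinterpret bits (assuming `float` and `uint32_t` have the same size), the second shows the shape of code whose behaviour is undefined if both pointers refer to the same object.

```c
#include <stdint.h>
#include <string.h>

/* Defined today: copy the object representation instead of aliasing. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* assumes sizeof(float) == sizeof(uint32_t) */
    return u;
}

/* Fine when ip and fp point at different objects. If they were equal,
 * the behaviour would be undefined under strict aliasing: the compiler
 * may assume the float store cannot change *ip and return 1 directly. */
static int store_both(int *ip, float *fp)
{
    *ip = 1;
    *fp = 2.0f;
    return *ip;
}
```

Under the rule moon-child wishes for, the aliased call would still be allowed to return an unspecified one of the two stored values, but neither store could be dropped.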
<geist> ah been wondering why people have been having long monologues with themselves
<geist> finally sat down to try to fix this C64 i've had floating around for a while
<geist> sure enough looks like the PLA is busted
<geist> pretty common
<moon-child> geist: o yeah good idea come to think of it
<geist> started tracing it out, and it's generating a good ntsc signal and the clocks are alive, etc
<geist> the cpu even seems to be running code basically
<geist> but... when i get to the PLA which is basically where most of the bus logic is, it's basically got the KERNAL rom in a weird half selected state (like 0 - 1V) and the R/W line is held hard low
<geist> basically the PLA seems to have lost its brains and most of the output pins are either fixed or in some sort of indeterminate state
ElectronApps has joined #osdev
CryptoDavid has quit [Quit: Connection closed for inactivity]
<gorgonical> Is it possible for the arm chip to prevent access to cnthp_ctl_el2 when not running at el2? It's not clear to me if the hardware "knows" which el a register belongs to and can/will prevent access to it
<gorgonical> I'm getting an undefined instruction error on mrs from that register, but the kernel uses that as a catch-all so it's not immediately clear to me if it's "undefined instruction" or something else
isaacwoods has quit [Quit: WeeChat 3.2]
<Skyz> Why do people downgrade their computers for linux?
<Skyz> "Freedom"
<geist> 100%
<geist> i think without exception you cannot access a *_ELn register that belongs to a higher el than the one you are running at
<geist> but not vice versa. EL2 can set EL1 regs in addition to EL2, but EL1 cannot read/write EL2 (or EL3) regs
<gorgonical> geist: that makes sense, i mean what is the purpose of having separate control regs if a lower el can just change them
<geist> the _ELn part is encoded into the instruction
<geist> yah
<gorgonical> so it's architectural, then?
<geist> a keystone of the design, yes
<gorgonical> I mean, because specifically these are timer registers, so it must be encoded into the instruction somehow?
<geist> right
<gorgonical> I see
<geist> that's essentially the mechanism by which it all works. by putting the control bits behind a register that can only be accessed from EL2 (and there's an EL3 version i think too) you gate it off from EL1
<geist> same with things like TCR_EL2 or SCTLR_EL2. those are banked, and affect EL2 and below. shouldn't be able to fiddle with it from EL1
<geist> similarly, that's how most of the supervisor regs are masked off from EL0. they're _EL1 regs
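To make "the _ELn part is encoded into the instruction" concrete, here is a sketch of how an A64 `MRS` instruction word is assembled from the system register's `op0/op1/CRn/CRm/op2` fields. `encode_mrs` and `mrs_op1` are hypothetical helpers. The field positions follow the A64 encoding as I understand it; the encoding S3_4_C14_C2_1 for `CNTHP_CTL_EL2` used in the note below is quoted from memory of the Arm ARM and should be checked against it.

```c
#include <stdint.h>

/* Encode "MRS Xt, S<op0>_<op1>_C<crn>_C<crm>_<op2>" (A64).
 * op0 must be 2 or 3; only its low bit is stored in the instruction. */
static uint32_t encode_mrs(unsigned op0, unsigned op1, unsigned crn,
                           unsigned crm, unsigned op2, unsigned rt)
{
    return 0xD5300000u
         | ((op0 & 1u) << 19)
         | ((op1 & 7u) << 16)
         | ((crn & 15u) << 12)
         | ((crm & 15u) << 8)
         | ((op2 & 7u) << 5)
         | (rt & 31u);
}

/* Extract the op1 field (bits 18:16) from an MRS/MSR instruction word. */
static unsigned mrs_op1(uint32_t insn)
{
    return (insn >> 16) & 7u;
}
```

With these fields, `encode_mrs(3, 0, 0, 0, 0, 0)` gives `0xD5380000`, the familiar `mrs x0, midr_el1`, and `encode_mrs(3, 4, 14, 2, 1, 0)` gives the word for `mrs x0, cnthp_ctl_el2` under the assumed encoding. The `op1` field differs between the two (0 versus 4), so the hardware can tell from the instruction bits alone which group of registers is being addressed and raise an exception when the current EL is too low.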
freakazoid333 has quit [Read error: Connection reset by peer]
dragestil has quit [Ping timeout: 252 seconds]
dragestil has joined #osdev
<Skyz> How do you work on this hobby when low-level development isn't needed anymore?
<Skyz> (general question)
<klange> What do you mean?
<klange> From the perspective of OS development being an unnecessary activity, it is a _hobby_ for most of us, after all. Toy trains aren't "needed" either.
<Mutabah> Even using "high level" languages/constructs, you still have to write a lot of infrastructure
<Skyz> System development is mostly solved, and the low-level stuff I'm interested in seems to be years of work
<klange> From the perspective of how do you _continue_ with an OS project when the low-level stuff is done, there's so much more you can do beyond that.
<geist> sigh
<Skyz> I was more going for the first part, didn't realize that my "dream" was useless
sts-q has joined #osdev
<Skyz> or unprofitable
<klange> Profit can be measured in far more than dollars.
<Skyz> I'm pretty satisfied, but money is important too.
<geist> i'm actually pretty offended at the premise of his questions (unignored him and looked at the backlog)
<geist> 'low level development isn't needed anymore' my ass
<geist> if anything it's a very very lucrative job market
<geist> sounds like they're just justifying their own predicament by declaring things they aren't good at 'useless'
<Skyz> I'm not trying to be offensive but my interest is in Game Hacking and I went into it because CS is lucrative
<geist> game hacking is not
<Skyz> :|
<geist> if you're trying to make money, you're barking up the wrong tree
<klange> Games are a hell only worthwhile if you have true passion for it, and even that doesn't excuse the abuse in the industry.
<geist> right plus the fact that it's really hard to make a living doing it, unless you're exceptionally good and lucky
vin has joined #osdev
dragestil has quit [Read error: Connection reset by peer]
<klange> I make this toy train analogy a lot, and I think locomotives are an apt comparison to operating systems.
<klange> Building a steam engine of equal or better quality to one that was on the rails in the early days of trains isn't particularly hard, but you're not going to sit down in your garage and spend a few years banging a hammer and come out with a GE AC6000CW.
<klange> And even if you do build something full-scale and passenger-capable, you're gonna have a hell of a time getting anyone to ride it - most people don't even ride the trains they do by choice, but by necessity.
<geist> but, also while banging out your steam locomotive you learn a heck of a lot about how the big ones work, and that gives you a lot of knowledge about how trains go and comes in handy
<vin> How would you build a GE AC6000CW that people would ride?
<geist> vin: exactly.
<klange> Hobby OSes are like model trains, model steam engines, like building a replica of a Union Pacific 844.
<klange> "But some guy built a diesel engine in his garage in Finland and now it's one of the most widely used freight engines in the world!" Yeah, well, that guy got lucky with timing.
<klange> Got in right when there was a market, published all his designs, didn't have to deal with all the safety regulations and fuel efficiency requirements and hundreds of different track gauges...
Skyz has quit [Quit: Client closed]
<vin> Didn't Ken also start with a hobby project? Of course a different world altogether.
<klange> Kind of, but it was after he had already been hired by Bell and been professionally involved with Multics - while working on that project, he had built a video game (Space Travel) for the systems they were using, and when Bell withdrew from the project he ported it to the PDP-7, which was a few years outdated already, and that eventually led to Unix.
<geist> also being at bell labs they were basically paid to do various researchy projects
<geist> so could officially write it off as a project, and/or a solution to help other researchers do their work (text formatting, etc)
<nur> wait wait people don't like trains?
<klange> Oh trust me, there are some people who _love_ trains. But most people hopping on the subway in New York have at best a neutral attitude towards them.
<nur> that explains why my friends looked at me funny when I was hopping up and down with joy on the T in Boston
<geist> i like trains
<nur> me too!
<geist> i'm not a total nerd over them, but i'm more impressed with the power/etc
<nur> me neither, I just like the idea of trains
<klange> And much like how many people use Windows because they need to and not because they care, it doesn't matter how much better your train is if they need to get to Church Street and yours only goes around your house.
<nur> they go through the city _and_ the country! They make train noises! You can sit down and look out the window and work on The Great American Novel and be inspired by the landscapes that zoom by!
<geist> but the locomotive analogy is interesting too: the kernel/OS is the locomotive, but apps/etc are the cars
<klange> They might even love trains. They might be super into N700S. But they're not traveling from Tokyo to Osaka. They're going to work.
<geist> so you can build an incompatible hobby locomotive for a weird track, which is fine
<geist> but..
<nur> is it a nice small bonus for your resume though
<klange> Either the track or the stations are the hardware, this is where the analogy gets a bit more muddy.
<klange> Probably track = architecture, stations = devices?
<geist> or the track is the API? though you can argue it's the connection between trains i guess
<klange> Or maybe the physical tracks are... like individual computers and the _gauge_ is the architecture? You can design a train for any gauge, but you can generally only build it for one at a time? And you install the train on a particular network of tracks and...
<klange> Aw jeeze Skyz isn't even here anymore. Choo choos fun, time for lunch.
<nur> if you want a systemsy type job it's nice to be able to say "oh I did this" even if no one uses it, right
<klange> Yes.
<geist> yah like i said building your own locomotive is at worst extremely educational
<geist> and in a lot of cases that does have real benefits
<geist> since like it or not anyone that does software development is indirectly using trains
<nur> you mean operating systems
<geist> no trains! choo choo
<nur> fun fact: I did systems dev for trains once
<klange> I've done planes. Anyone done automobiles?
<nur> ooh then we have the Steve Martin trifecta
<nur> what's this thing when you do qemu -d int : env->regs[R_EAX]=00000000
<nur> is it EAX?
<geist> probably
<nur> funky syntax though
<nur> env->regs
^[ has quit [Read error: Connection reset by peer]
^[ has joined #osdev
MarchHare has joined #osdev
<MarchHare> Looks like it's improved a bit here.
<MarchHare> Well, maybe still quiet
<klange> You just missed a considerable conversation about trains.
<Mutabah> Take a look at the logs if you're curious
<MarchHare> Meh, not THAT interested about trains.
<klange> It's a metaphor for osdev.
<MarchHare> I suppose I'm just a bit dumb I guess. Didn't seem like a good metaphor to me.
dragestil has joined #osdev
dragestil has quit [Client Quit]
dragestil has joined #osdev
<MarchHare> Is there an actual programming tutorial for the AC97 stuff? Like something that walks through the initialization and how it's all done? I've tried to read the official docs, but they're very inscrutable for me.
<MarchHare> Example code with basic comments would probably be best for me.
srjek|home has quit [Ping timeout: 240 seconds]
<klange> I don't even know if my driver works on real hardware, since all I have is newer HDA chips... https://github.com/klange/toaruos/blob/master/kernel/audio/ac97.c
<bslsk05> ​github.com: toaruos/ac97.c at master · klange/toaruos · GitHub
vdamewood has quit [Quit: Life beckons]
<MarchHare> I'll give your OS a look. I've been trying to wrap my head around the basics of AC97 for awhile, but I just keep not getting it. I think I probably also need to learn how to talk to the PCI bus, too
vdamewood has joined #osdev
nly has joined #osdev
<MarchHare> klange: You got a bunch of pull requests on nyancat =P
<klange> ac97 is a bit more straightforward because it has fixed offsets for things like PCM Out
<klange> HDA has this whole complicated connection system with nodes
<klange> HDA was meant to be a long-running future-proofed system for consumer audio hardware, so it supports doing all these crazy things at the cost of "I just want to spit audio data at speakers" being... a lot more complicated.
tenshi has joined #osdev
<klange> ugh, so many things on my plate; should write them on post-it notes, stick them on a dart board, and see what I hit
<klange> i'm terrible at darts, so it's basically random
<clever> does the hda/ac97 stuff support codecs other than pcm?
<klange> Okay, so quick explanation of some things:
<klange> - AC97 is, in audio hardware terms, a codec in itself. The most common hardware form of it was the integrated ICH version shipped with Intel hardware, but it showed up behind other interfaces as well.
ElectronApps has quit [Remote host closed the connection]
<klange> - HDA is actually an entirely different class of hardware: It's just a _controller_. It supports an arbitrary set of codecs behind it.
<clever> ahh
<klange> I believe HDA does support additional codecs on top of basic PCM ones, as I'm sure I've seen that in Intel hardware before?
ElectronApps has joined #osdev
<klange> And presumably it's some capability bit somewhere...
<clever> and that could allow shoving raw mp3 data into the sound card, and skip the cpu decoder?
<klange> This is also why AC97 is so simple, you're basically just pointing the codec at a buffer and saying "play this".
<klange> HDA you need to point a "stream" at your buffer, point a DAC at your stream, wire your DAC to an output, and turn them all on.
<clever> so HDA has a lot more hw accelerated mixing of multiple hw accelerated codec decoding?
<klange> Which is really flexible, especially when we have hardware with multiple separate outputs, duplex ports, loopback support...
<clever> yeah
gioyik has quit [Quit: WeeChat 3.1]
<clever> ive often heard that alsa usually lacks hw mixing, so you either need alsa soft mixing, or pulseaudio, to share an output node
<klange> But I just want to say "hey here's some 48khz s16le" so I can hear the explosions in quake...
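As an illustration of "pointing the codec at a buffer" on the ICH flavour of AC97, the sketch below fills a buffer descriptor list (BDL). The layout (32 entries, each a 32-bit physical address plus a 32-bit word holding a 16-bit sample count and the IOC/BUP flag bits) is recalled from the Intel ICH datasheet and should be checked against it. `bdl_fill`, `phys` and `samples_per_entry` are hypothetical names; programming the BDL base address, the last valid index and the run bit is not shown.

```c
#include <stddef.h>
#include <stdint.h>

#define BDL_ENTRIES 32u            /* ICH AC'97 BDL size (assumed from the datasheet) */
#define BDL_IOC     (1u << 31)     /* interrupt on completion */
#define BDL_BUP     (1u << 30)     /* buffer underrun policy (unused here) */

struct bdl_entry {
    uint32_t addr;     /* physical address of the sample buffer */
    uint32_t ctl_len;  /* bits 15:0 = length in samples, 31 = IOC, 30 = BUP */
};

/* Split one physically contiguous s16le buffer across the whole BDL.
 * `phys` and `samples_per_entry` are supplied by the (hypothetical) caller. */
static void bdl_fill(struct bdl_entry bdl[BDL_ENTRIES], uint32_t phys,
                     uint16_t samples_per_entry)
{
    for (uint32_t i = 0; i < BDL_ENTRIES; i++) {
        bdl[i].addr = phys + i * samples_per_entry * (uint32_t)sizeof(int16_t);
        bdl[i].ctl_len = samples_per_entry | BDL_IOC;
    }
}
```

The contrast with HDA is that nothing here describes routing: there is one fixed PCM Out engine, so a list of buffers is all the hardware needs.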
<klange> This is part of why the Linux world moved to Pulse in the first place - it was built for this newer kind of hardware that has these audio routing mechanisms built-in.
<clever> that reminds me, of what i tried to fix in my PA config years ago
<clever> 00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA) (rev 40)
<clever> i believe thats the onboard sound card
<clever> it has 3 capture channels, each with its own 3 input mux, all able to pick any input
<klange> Yep, and a demonstration of how HDA became the standard for modern audio interfaces on x86 - even AMD does it.
<clever> but pulseaudio (by default) only exposes 1 capture channel
<clever> and the alsa knobs to adjust all of that, have identical names, with an index attribute on them
<clever> and the PA config files, didnt support an index attribute, when describing how the knobs are related
<clever> so i had to patch pulseaudio, to even be able to describe the card to PA, so it could capture from 3 inputs at once
<klange> I have an 8086:a170 for the motherboard audio, and the HDMI out on my 1080 is _also_ an HDA.
ElectronApps has quit [Ping timeout: 258 seconds]
<clever> 01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Tobago HDMI Audio [Radeon R7 360 / R9 360 OEM]
<clever> thats my hdmi out
<clever> options snd_hda_intel enable=1,0
<klange> Dunno if that's an HDA, see if you have snd_hda_codec_hdmi loaded?
<clever> pretty sure its hda-intel
<clever> i had to half blacklist it like this
ElectronApps has joined #osdev
<clever> because the hdmi->dvi adapters were coming up as a default output randomly
<klange> I have codec_generic, codec_hdmi, codec_realtek...
<clever> due to the above enable flag, its likely ignoring the capabilities of the gpu sound
<clever> so it wont load those codecs
<clever> [ 7.982597] snd_hda_intel: probe of 0000:01:00.1 failed with error -2
<clever> [ 8.033124] snd_hda_codec_realtek hdaudioC0D0: autoconfig for ALC898: line_outs=4 (0x14/0x15/0x16/0x17/0x0) type:line
<klange> My HDMI is hooked up to something without audio anyway.
<clever> same
<klange> What I need to support is the one in my laptop...
<clever> thats why i disabled it
<klange> Just the speakers... just enough to hear those glorious grunts and gunshots...
<clever> PA would randomly default to that device, and not switch away
<clever> so all audio would just mysteriously vanish into /dev/null
<klange> Interesting. I do find myself twiddling with pavucontrol a lot, but often for application failures and because I have way too many capture devices.
<clever> Bus 002 Device 007: ID 046d:0a66 Logitech, Inc. [G533 Wireless Headset Dongle]
<clever> my primary audio device is usb based
<clever> bInterfaceClass 1 Audio
<clever> tSamFreq[ 7] 48000
<clever> wFormatTag 0x0001 PCM
<clever> looks like you just shove 16bit 48khz audio into the right endpoint and you're done
<clever> klange: does HDA support any kind of writeback, such as feeding mp3 into the card, and getting flac back out in ram?
<klange> The interface in general does, it's up to the hardware - you'd need a node that receives and decodes mp3 from a stream, and a node that encodes and outputs flac to another, and to be able to attach them.
<clever> ah
<clever> so in theory, it could implement a hw accelerated transcode, with lower watt usage compared to a software solution
<clever> but i feel like modern audio stacks (pulse, alsa) dont expose such codecs at all, and expect it to already be in pcm form
<clever> so all of that hw goes to waste?
Izem has joined #osdev
<nly> heyy
<Izem> hola
<geist> clever: most likely. on cell phones and whatnot i think there's still some value to hw assist
<geist> but even then it's probably just a DSP
<geist> 10-15 years ago it was practically mandatory to assist with mp3 decode and whatnot
<geist> (on battery powered things)
<clever> yeah
<clever> i suspect my old mp3 player was more dsp than asic
<clever> because you could load more codecs into it
<clever> rca lyra
<geist> that being said if you go back far enough it was just ARM again. ipod classic first few gens, for example, were basically a dual core ARM7tdmi
<geist> was before things were big enough that DSPs were being developed for that sort of thing
<clever> there was also that other ipod model, which was it...
<bslsk05> ​en.wikipedia.org: VideoCore - Wikipedia
<clever> > The VideoCore II-based VC02 / BCM2722 processor provides video capabilities for Apple's 5th generation iPod.[4]
<geist> yep
<clever> a quick glance at the features of the 4th and 5th gen, says they added video playback
<clever> makes me think it has all of the video hw accel of an rpi
<geist> something like that
<geist> early version of it at least
<clever> yep, before they added an arm core
<clever> and variables like how many macroblocks/sec it can do are likely tuned differently
nly has left #osdev [#osdev]
Robbe has quit [Remote host closed the connection]
* kingoffrance reads backlog
<kingoffrance> didnt you guys ever see "flight of the phoenix" ?
<kingoffrance> eventually the "repair based on toy" flies
<doug16k> clever, that hardware goes to waste, because historically, audio offload has been so flaky that just PCM is good enough
<doug16k> if you give me a hardware accelerated audio checkbox, I'd seriously consider unchecking it
<moon-child> hw accelerated audio usually has really dumb limits
<moon-child> latency wise mixing is not the bottleneck so it's questionable from the outset
<doug16k> it's like NIC offload. sure, that would be nice if it did the encryption for me. ya right. if it did, it would be a server NIC chip, not on the cheap ass device I got for at home
<geist> kingoffrance: heh yeah
<geist> totally forgot about that
<doug16k> CRC is about the only offload I'd trust fully. how dumb would it need to be to get that wrong
<clever> related, the h264 hw decode on the rpi, has some edge cases that complicated it when i tried to use it
<clever> the api is async, and fully handles b-frames
<clever> but ffmpeg/mplayer, expected a sync api
<clever> so mplayer gives you one frame of encoded video, and waits for a decoded frame
<clever> the rpi hw, expects several mb of encoded data, before giving the 1st frame...
<clever> b-frames also complicate it, in that both the hw accel, and mplayer, want to re-order the frames, and that just scrambles it more
<doug16k> yes, video is hard because it has to worry at the level of the differences between whole recent frames to extract the extreme redundancy
<doug16k> one frame at a time is not feasible
<clever> b-frames is a special thing, where the frames are encoded in the wrong order, because it compresses better
<doug16k> that issue with the encoding side leaks to the way the output side works, and decoding
<clever> and you then have to buffer a few, and display them out of order, to correct for that
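The reordering clever describes can be sketched as follows. This is a simplification under stated assumptions: each decoded frame carries a presentation position `pts`, and a whole buffered group is sorted at once. Real decoders use a small sliding reorder window rather than a full sort, and `struct frame`, `by_pts` and `to_display_order` are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

struct frame {
    char type;   /* 'I', 'P' or 'B' */
    int  pts;    /* position in display order */
};

/* qsort comparator: ascending presentation position. */
static int by_pts(const void *a, const void *b)
{
    const struct frame *fa = a, *fb = b;
    return (fa->pts > fb->pts) - (fa->pts < fb->pts);
}

/* Reorder a buffered group of decoded frames into display order. */
static void to_display_order(struct frame *frames, size_t n)
{
    qsort(frames, n, sizeof frames[0], by_pts);
}
```

For a stream that arrives in decode order I0, P3, B1, B2 (the P frame is sent early because both B frames reference it), the result is I0, B1, B2, P3. If both the hardware and the player apply such a step, the frames get shuffled twice, which matches the scrambling clever ran into.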
asymptotically has joined #osdev
MarchHare has quit [Ping timeout: 255 seconds]
asymptotically has quit [Remote host closed the connection]
asymptotically has joined #osdev
gog has joined #osdev
CryptoDavid has joined #osdev
grange_c has quit [Quit: The Lounge - https://thelounge.chat]
grange_c has joined #osdev
Robbe has joined #osdev
dormito has quit [Ping timeout: 240 seconds]
divine has quit [Ping timeout: 252 seconds]
divine has joined #osdev
GeDaMo has joined #osdev
ElectronApps has quit [Read error: Connection reset by peer]
ElectronApps has joined #osdev
scaleww has joined #osdev
dormito has joined #osdev
asymptotically has quit [Quit: Leaving]
<doug16k> oh wow, description is wrong on _mm_set_pd1 https://software.intel.com/sites/landingpage/IntrinsicsGuide/#!=undefined&text=double&expand=6085,6085,6085&techs=SSE,SSE2,SSE3
<bslsk05> ​software.intel.com: Intel® Intrinsics Guide
<doug16k> it doesn't broadcast it
<doug16k> it's no wonder half the intrinsic use has stupid memory punning
<doug16k> seems like most people don't even know the cvt and casts exists
<doug16k> s/s$//
<doug16k> if it broadcasts it, what does _mm_set1_pd do?
<doug16k> broadcast it?
<doug16k> ah, it'd be move_sd
<doug16k> hardly even worth touching intrinsics anymore. autovectorizer will do it from simple code
<doug16k> it will see what you mean then just generate vector code for it
<doug16k> if you say it right
<doug16k> even if you did something like create an x rotation matrix and multiply it by another matrix, it can "know" all the zeros and ones and unknown and do it really well
<doug16k> if you force intrinsics down its throat it probably will do all the muls with 1 and 0 and adds of 0
zoey has joined #osdev
<doug16k> that _pd1 one must have been some inconsistently named one that was renamed
Izem has quit [Quit: Izem]
<doug16k> I can't believe linux. parallel make spends more time in clear_page_rep than anything else
<doug16k> whoever says clearing it at the last moment is good, is wrong
<doug16k> I have 33G of completely unused memory (not even used by cache) and it is blocking the paging operations zeroing pages
<mjg> that's a huge tradeoff
<doug16k> why not clear pages during idle
pony has quit [Quit: WeeChat 2.8]
<mjg> idle clearing was the go to method in bsds for years
pony has joined #osdev
<mjg> turns out the free pages get shredded immediately
<doug16k> well it is the right way now, given that a parallel c++ compile's top function is clear_page_rep
<mjg> i don't have numbers handy right now, but something as simple as kernel compile shreds through gigabytes of memory in terms of total pages used
<doug16k> try it. get something big and run sudo perf top and start make -j$(nproc). see if it's all clearing pages and malloc time
<doug16k> something with tons of ram
<mjg> well grab yourself an ebpf script or something and count how many times the routine is called to begin with in your workload
<mjg> multiply by 4k and that's how much spare memory you would have to have to avoid calling it
<mjg> i'm telling you, you would walk through your stash almost immediately
<mjg> and then you are back to zeroing on demand
<doug16k> it's not like I imagined that clearing pages would be slow. I am talking about perf top telling me that it spent more cycles clearing pages than anything else
<doug16k> you saying that g++ will eat through 32 GB of freed pages in seconds in CoW?
<doug16k> zeroed*
<mjg> i don't have one liners for linux
<doug16k> COW + demand
<mjg> i can get you numbers from freebsd in 10 minutes
<doug16k> it might show something else on freebsd in a profile
<mjg> i'm not talking about profile
<mjg> i'm talking about number of page zeroing calls
<mjg> and consequently memory demand
<doug16k> I am talking about number of cycles spent where
<mjg> and i'm telling you background zeroing will not help
<mjg> because whatever stash you may amass will get shredded immediately
<mjg> and then you are back to zeroing on demand
<doug16k> not really
<doug16k> let's say I did make -j1. other idling cpus would keep it full of zeroed pages and it would never block
<doug16k> or if -j isn't keeping up
<doug16k> or if the build becomes a bit sequential
<mjg> sure, you can come up with a contrived scenario where it "manages"
<mjg> i assumed parallel build
<mjg> make -j$(nproc)
<mjg> which keeps all cpus busy
<mjg> you quoted it yourself
<doug16k> yes
<doug16k> but even then, it's not perfect
<doug16k> there will be moments of idle for various reasons and it could do some proactive clearing
<doug16k> I see your point though. I would need to know the sum of all allocations during the build. if that exceeds the free pages by much it will degenerate to last moment
<doug16k> but just because it can't stay optimal the whole time, doesn't mean it won't benefit from the zeroed page pool sometimes
<mjg> well i got your numbers
<mjg> dtrace -n 'fbt::pagezero:entry { @[stack()] = count(); }' from building the freebsd kernel
<mjg> just from the page fault handler alone i got 36413914 calls
<mjg> that amounts to almost 139GB of memory
<mjg> total real time 52 seconds
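mjg's "multiply by 4k" estimate can be reproduced with a few lines of C. The page size of 4096 bytes is the assumption stated in the conversation; `PAGE_BYTES`, `zeroed_bytes` and `bytes_to_gib` are hypothetical names.

```c
#include <stdint.h>

#define PAGE_BYTES 4096u   /* assumed 4 KiB pages, as in "multiply by 4k" */

/* Total bytes cleared by `calls` invocations of the page-zeroing routine. */
static uint64_t zeroed_bytes(uint64_t calls)
{
    return calls * PAGE_BYTES;
}

/* Convert a byte count to GiB (2^30 bytes). */
static double bytes_to_gib(uint64_t bytes)
{
    return (double)bytes / (1024.0 * 1024.0 * 1024.0);
}
```

For the 36413914 calls counted in the page fault handler this gives 149151391744 bytes, about 138.9 GiB, matching "almost 139GB". Spread over the 52 seconds quoted, that is roughly 2.7 GiB of freshly zeroed memory per second, far more than the 33G of idle memory mentioned earlier could supply from a pre-zeroed pool for long.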
<doug16k> how close to perfect cpu usage?
<doug16k> you have user and wallclock?
<doug16k> elapsed times I mean
<mjg> that one is a little disfigured: 3023.72s user 359.25s system 6423% cpu 52.665 total
<doug16k> every configure script serializing the build would fill up the pool
<mjg> there is no configure here
<mjg> what is here is lz4 or some other compression algo which trails at low thread count as everything else is done
<doug16k> 8 thread cpu?
<mjg> 104
<doug16k> lol oh ya
<doug16k> not even close to 100%
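The "not even close to 100%" reading follows from the `time` output: CPU time consumed divided by the CPU time available on all hardware threads. A minimal sketch, with `cpu_utilisation` as a hypothetical helper:

```c
/* Fraction of available CPU time actually used:
 * (user + system) / (wall-clock time * number of hardware threads). */
static double cpu_utilisation(double user_s, double sys_s,
                              double real_s, unsigned ncpus)
{
    return (user_s + sys_s) / (real_s * ncpus);
}
```

With 3023.72 s user, 359.25 s system, 52.665 s elapsed and 104 threads, the result is a little under 0.62, so on average about 62% of the machine was busy. The 6423% printed by `time` is the same ratio before dividing by the thread count.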
<mjg> > 13:56 < mjg> what is here is lz4 or some other compression algo which trails at low thread count as everything else is done
<mjg> that's why
<mjg> prior to that utilisation was much closer to 100
<doug16k> right so those wasted cpus would be keeping zero page pool full
<doug16k> it would just sit full
<mjg> well they would not help much because most of the workload happened at the same time
<mjg> and shredded through gigabytes of memory
<mjg> the best you could do is slightly speed up the lz4 sucker
<mjg> which would go away if better concurrency was introduced there
<GeDaMo> It's a pity that memory wasn't slightly smarter so you could tell it to zero itself over a range
isaacwoods has joined #osdev
<doug16k> GeDaMo, that and more may be coming
<doug16k> there is a bunch of research right now into processing-in-memory
<doug16k> little coprocessor things that are right there next to the row buffer
<GeDaMo> I'm pretty sure that's been promised for a while :P
<doug16k> row copy and row clear and even vector arithmetic
<mjg> amd has a clzero instruction
<mjg> interestingly to my tests it reduces the cost of zeroing, but it uses non-temporal stores
<mjg> so then you pay for cache misses in userspace
<mjg> getting a net loss :)
<doug16k> it's great for pre-clearing
<doug16k> doesn't wreck cache
<mjg> that's the standard argument for background zeroing
<mjg> then it turns out the pages were in llc all along
<mjg> and now you evicted them
<mjg> basically the better job you do reusing pages, the worse this gets
<doug16k> it should kick over to pre-clear when hammering clearing
<doug16k> if cpus are idling
<doug16k> how the hell can clearing pages be at the top of the profile of a compile
<doug16k> either gcc is optimized by a god or something is wrong with the memory management
<doug16k> is there a perf top for bsd?
<doug16k> perf is a linux thing, right?
<mjg> https://people.freebsd.org/~mjg/fg/flix1-r363215-buildkernel.svg is what i suspect you are looking for
<mjg> pagezero_erms
<mjg> zeroing is at top of the profile because 1. it is slow 2. it is in huge demand 3. rest of the kernel is not that bad
<mjg> the above is an old flamegraph, would have to update
<mjg> but ye, pagezero is pretty visible
<doug16k> ya. make -j$(nproc) is pretty much a lot of page fault handler with blips of compilation here and there
<doug16k> I guess this shows that modern cpus can breeze through any amount of computation, and they only have some problems with cache miss
<doug16k> and the next level cache, page table miss
<doug16k> hopefully never miss all the way to swap
<GeDaMo> Is it mmapping the source files?
<doug16k> but billions of times a day, page fault CoW
<doug16k> GeDaMo, it's the demand paging
<doug16k> touching the read only shared zeroed page and committing a page on write
<doug16k> when you allocate memory, it pretends, and you get the shared zeroed page mapped readonly. when you write it, it coffee sprays, and clears you a page right then, and maps it into your process, and returns
<doug16k> so compilers spend more time going "oh they were really going to use the page" and clearing it
<doug16k> ...than compiling
<doug16k> I want to get turbo C++ working in DOS and time it compiling 5000 source files
<doug16k> on one cpu, on 3950x
<doug16k> how many milliseconds you think?
<doug16k> 10?
<doug16k> I can't imagine it being over 100ms
<GeDaMo> Does it actually copy a zero page or does it just write zeroes out to a page? Would that make much difference?
<doug16k> it writes zeros
<doug16k> that already-zeroed page is being shared by all the other virtual memory ranges that are allocated, but not yet written
<doug16k> when you write it, it looks and realizes that you are allowed to write there, so at that moment, it actually gets you your own page, clears it, maps it there, and restarts the instruction
<doug16k> until then it just pointed at the zeroed page and was readonly, even though you are supposed to be allowed to write there
<doug16k> obviously can't let you trash the shared page
<doug16k> the reasoning behind clearing it right at the last moment is, it'll be in the cache from the clearing
<doug16k> my argument is Amdahl's law - move that sequential bit out and run it in parallel if you can
<doug16k> or even not parallel, concurrently during idle time would work
<doug16k> I just find it very hard to believe that you wouldn't get a gain from pooling zeroed pages, even if you did end up draining it sometimes and falling back to last-moment
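The demand-zero behaviour doug16k describes can be sketched as a toy model. This is only an illustration, not how any real kernel is structured: `ToyAddressSpace`, `prezeroed_frames` and `fault_time_clears` are made-up names, and a "page fault" here is just the first write to a page. It shows untouched pages reading as zero from one shared page, the private frame being cleared at first write, and an optional pool of pre-zeroed frames taking that clear off the fault path until the pool drains.

```python
PAGE_SIZE = 4096


class ToyAddressSpace:
    """Toy demand-zero model (hypothetical, not any real kernel's design)."""

    def __init__(self, prezeroed_frames=0):
        self.shared_zero_page = bytes(PAGE_SIZE)  # immutable: cannot be trashed
        self.allocated = set()                    # pages promised to the process
        self.private = {}                         # vpn -> privately owned frame
        # Frames zeroed ahead of time, e.g. while CPUs were idle (assumption).
        self.zero_pool = [bytearray(PAGE_SIZE) for _ in range(prezeroed_frames)]
        self.fault_time_clears = 0                # clears done on the fault path

    def allocate(self, vpn):
        # Allocation only "pretends": no frame is committed yet.
        self.allocated.add(vpn)

    def read(self, vpn, offset):
        self._check(vpn)
        # Unwritten pages are backed by the shared zero page.
        return self.private.get(vpn, self.shared_zero_page)[offset]

    def write(self, vpn, offset, value):
        self._check(vpn)
        if vpn not in self.private:
            # First write: commit a private zeroed frame, then redo the write.
            self.private[vpn] = self._take_zeroed_frame()
        self.private[vpn][offset] = value

    def _check(self, vpn):
        if vpn not in self.allocated:
            raise MemoryError(f"page {vpn} is not mapped")

    def _take_zeroed_frame(self):
        if self.zero_pool:
            return self.zero_pool.pop()           # clear already paid for
        frame = bytearray(b"\xAA" * PAGE_SIZE)    # stands in for stale contents
        frame[:] = bytes(PAGE_SIZE)               # last-moment clear
        self.fault_time_clears += 1
        return frame
```

With `prezeroed_frames=1`, the first written page costs no fault-time clear and the second one falls back to clearing at the last moment, which is the drain-and-fall-back case mentioned above.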
ahalaney has joined #osdev
immibis has quit [Remote host closed the connection]
aquijoule__ has quit [Remote host closed the connection]
aquijoule__ has joined #osdev
pieguy128 has quit [Quit: ZNC 1.8.2 - https://znc.in]
pieguy128 has joined #osdev
silverwhitefish has quit [Remote host closed the connection]
silverwhitefish has joined #osdev
gog has quit [Remote host closed the connection]
Izem has joined #osdev
nick8325 has joined #osdev
gog has joined #osdev
vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
nick8325 has quit [Quit: Leaving.]
nick8325 has joined #osdev
Brnocrist has quit [Ping timeout: 252 seconds]
Izem has quit [Quit: Izem]
Brnocrist has joined #osdev
andydude has joined #osdev
ElectronApps has quit [Remote host closed the connection]
freakazoid333 has joined #osdev
Izem has joined #osdev
Izem has quit [Client Quit]
Brnocrist has quit [Ping timeout: 255 seconds]
Skyz has joined #osdev
<Skyz> Interesting discussion happening here: https://forum.osdev.org/viewtopic.php?f=15&t=42259
<bslsk05> ​forum.osdev.org: OSDev.org • View topic - Reinventing Unix is not my problem
andydude has quit [Quit: andydude]
andydude has joined #osdev
andydude has quit [Quit: andydude]
andydude has joined #osdev
Izem has joined #osdev
andydude has quit [Client Quit]
Skyz has quit [Quit: Client closed]
MarchHare has joined #osdev
scaleww has quit [Quit: Leaving]
sts-q has quit [Ping timeout: 252 seconds]
sts-q has joined #osdev
vdamewood has joined #osdev
vinleod has joined #osdev
iorem has quit [Quit: Connection closed]
vdamewood has quit [Ping timeout: 255 seconds]
vinleod is now known as vdamewood
srjek|home has joined #osdev
tacco has joined #osdev
Brnocrist has joined #osdev
vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
vdamewood has joined #osdev
<dzwdz> i have a weird issue - iret'ing from an interrupt causes a #GP in bochs but works just fine in qemu
<dzwdz> when the iret is executed in bochs, it says that something is not a valid code segment
Izem has quit [Ping timeout: 255 seconds]
<j`ey> doug has said a few times that bochs is way more strict than qemu
<dzwdz> yup, i've noticed that already
<dzwdz> i had some issues with the GDT earlier today and it helped me catch that
<dzwdz> but i have no idea what could be causing this
<dzwdz> the interrupt handler is a single iret
<dzwdz> that should work fine, right?
Izem has joined #osdev
<gog> depends on the interrupt
<gog> some will need to have the stack popped once some won't
<gog> for exceptions anyhow, irqs don't
<dzwdz> it works fine on qemu though
<dzwdz> if the stack pointer wasn't in the right position then both qemu and bochs would break, right?
<dzwdz> also i should probably mention that this is a software interrupt
tenshi has quit [Quit: WeeChat 3.2]
<gog> hm yeah
<gog> idk then for sure
freakazoid333 has quit [Read error: Connection reset by peer]
brynet has quit [Quit: leaving]
brynet has joined #osdev
<geist2> mjg: re: clzero whats interesting there is ARM more or less mandates it
<geist2> ie, doesn't have to use `dc zva` but pretty much all memsets i've seen in real systems just revert to it when setting zeros
<geist2> i do kinda wonder about the logic of that, but i dont think anyone has really done any tests to see if it's the best idea in general
johnjay has quit [Ping timeout: 252 seconds]
<geist> dzwdz: the error code for the #GP should have the offending code segment i believe
johnjay has joined #osdev
<geist> if it's complete nonsense then it sounds like your stack is corrupted or misaligned
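For reference, a selector-related #GP error code on x86 packs an external-event bit, a table indicator and a selector index into one value. A minimal sketch of decoding it follows; the function name and the returned dictionary keys are made up for illustration.

```python
def decode_selector_error_code(code):
    """Split an x86 selector error code into its fields."""
    external = bool(code & 0x1)       # bit 0: caused by an external event
    if code & 0x2:                    # bit 1: index refers to an IDT gate
        table = "IDT"
    elif code & 0x4:                  # bit 2: TI, set means LDT
        table = "LDT"
    else:
        table = "GDT"
    index = (code >> 3) & 0x1FFF      # bits 3..15: descriptor or gate index
    return {"external": external, "table": table, "index": index}
```

An error code of 0x10 would decode to index 2 in the GDT, which is the slot dzwdz later says holds the ring 0 data segment.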
<dzwdz> the bochs error message shows that it's the r0 data segment
<geist> what is r0?
<dzwdz> ring 0
<geist> ah there you go then
<dzwdz> i don't modify the stack at all, though
<geist> doesn't matter to the computer
<dzwdz> but what else would corrupt the stack?
<geist> if it's your first iret i'd stick a breakpoint or an infinite loop just in front of the iret and manually verify the stack
<geist> who knows. root cause it first
<dzwdz> that's a good idea, idk why i haven't thought of that
<geist> like, you take an irq and dont touch the stack *at all*?
freakazoid333 has joined #osdev
<dzwdz> aren't IRQs the ones caused by hardware?
<dzwdz> it's a software interrupt
<geist> sure. same thing
<dzwdz> the IDT points to the iret
<geist> ah so it's literally a one byte irq handler?
<dzwdz> yup
<geist> well, again, gotta root cause it
<geist> that *should* work but there are a few reasons it could still get screwed up
<geist> like, for example, are you in 64bit?
<dzwdz> nope, 32bit
<geist> have to use the right form of iret (iretq for 64bit)
<dzwdz> i tried that just to be sure, but it didn't even compile
<geist> also if you are running on 'bad' segments prior to the irq it might still be okay up until you reload them via the iretq
<dzwdz> i'm pretty certain that my segments are good now
<geist> like if your CS is actually wrong, it takes the int, saves it on the stack, switches to a 'good' cs, runs iret, switches back to the bad one and faults
<dzwdz> i'm almost certain that both the code and data segments are correct
<geist> you can get that if you are running on some existing segments, reload your GDT, and dont reload your segments
<geist> what are their numbers?
<geist> did you reload all your segment registers after setting up your GDT?
<dzwdz> i've reloaded the code segment
<dzwdz> i should probably reload the other ones too
<geist> you absolutely must
<geist> what are the segment numbers?
<dzwdz> ring 0 code is 1, ring 0 data is 2, ring 3 code is 3, ring 3 data is 4
<geist> what values are you putting in the registers
<dzwdz> also, brb
<geist> sure, thanks for telling me you're brbing, btw. nothing more frustrating than when someone just disappears for a while and doesn't say anything while you're trying to help them :)
immibis has joined #osdev
<dzwdz> ok, i'm back
<dzwdz> i'm putting the index of the ring 0 data segment * 8 into %ds
<dzwdz> it uses the same enum as the code which sets the gdt up
<dzwdz> and i'm not changing any other segments
<geist> which is what, 0x10?
<geist> and cs has 0x8 in it?
<dzwdz> i'm not putting anything in cs
<geist> then what is in it
<dzwdz> wait a sec
<geist> i'm just trying to ask what actual numbers are in your cs and ds/es/fs/gs/ss
<geist> i think i'm zeroing in on something but i always want you to check your work and make sure all your blindspots are covered
<dzwdz> cs is 8, ds is 10
<dzwdz> in hex
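The "index * 8" arithmetic works because the low three bits of a selector hold the requested privilege level and the table indicator. A small sketch, with `make_selector` as a hypothetical helper name and the indices taken from dzwdz's layout above:

```python
def make_selector(index, rpl=0, use_ldt=False):
    """Build an x86 segment selector from a descriptor table index."""
    if not 0 <= index < 8192:
        raise ValueError("descriptor index must fit in 13 bits")
    if not 0 <= rpl <= 3:
        raise ValueError("RPL must be between 0 and 3")
    # bits 3..15: index, bit 2: table indicator, bits 0..1: RPL
    return (index << 3) | (int(use_ldt) << 2) | rpl


RING0_CODE = make_selector(1)          # 0x08
RING0_DATA = make_selector(2)          # 0x10
RING3_CODE = make_selector(3, rpl=3)   # 0x1b
RING3_DATA = make_selector(4, rpl=3)   # 0x23
```

The ring 3 values include RPL 3 in the low bits, which is why they are not plain multiples of 8.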
<geist> good. so after you reload your GDT you did reload ds/es/fs/gs/ss?
<geist> if you did not, please do so
<geist> also want to far jmp to reload cs too
<geist> that will sniff out any bugs in your GDT
<dzwdz> i think i might know what's the issue
<dzwdz> the other segments are complete nonsense, including the stack one
<geist> segment register values?
<dzwdz> yup
<geist> yes you *must* put your house in order when you set up your own GDT
<geist> usually very first thing you do is reload all of the registers
<dzwdz> alright, thanks a lot ^^
<geist> this could be a big difference from bochs and qemu: the default state of the cpu and/or bios both of them are using
<geist> if you're relying on whatever existing state was there you'll end up with different results
<geist> side note, for the int instruction are you in ring3 or ring 0?
<dzwdz> i'm kinda sleepy already, and i know better than to work on this now, i'll fix this tomorrow
<dzwdz> ring 0
<geist> okay, so if it were ring3 it'd *also* be loading SS and whatnot, which would also probably pick up garbage and fail too
<geist> cool, glad i could help!
<dzwdz> also just to be sure: i can far jump to the same code segment that i'm in already, right?
GeDaMo has quit [Quit: Leaving.]
<geist> totally
<geist> the trick is loading the GDT doesn't set the new hidden cached values, you have to reload the segment registers on the cpu to refresh values from the GDT
<geist> or, do something that causes segments to be reloaded, like an iret instruction
<geist> so i think in this case the iret is just the canary for an earlier problem
<geist> one that you'll find if you immediately reload all your segments
<dzwdz> and if i'm far jumping then there's no reason at all to set the data segment manually before that, right?
<dzwdz> s/data/code
<geist> right
<geist> that's why you do it almost immediately in your code
<geist> get your new GDT, reload all the segment registers and now you're running on yours
<geist> otherwise you're using whatever stale segments were there from the bios/etc
<geist> plus if your GDT descriptors are busted, it'll potentially fault almost immediately
<geist> at least for CS. DS/SS will probably fault pretty quickly, or go off in the weeds, if you start accessing things
<geist> ES/FS/GS may go for some time without blowing up, since they're not used as much
<geist> depending on the type of descriptor corruption you have
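The stale-segment failure mode geist describes can be shown with a toy model. Everything here is an assumption made for illustration: `ToyCPU` only tracks CS, descriptors are reduced to the strings "code" and "data", and the boot-time layout with CS at 0x10 is invented, not something reported in this log. The point is that swapping the GDT leaves the cached descriptor alone, so the stale selector only gets checked when something such as iret reloads it.

```python
class GeneralProtectionFault(Exception):
    pass


class ToyCPU:
    """Toy model of CS and its hidden cached descriptor (illustrative only)."""

    def __init__(self, gdt, cs):
        self.gdt = gdt
        self.cs = cs
        self.cs_cached = gdt[cs >> 3]  # hidden part, filled when CS is loaded
        self.stack = []

    def lgdt(self, gdt):
        # Replacing the table does not refresh the cached descriptor.
        self.gdt = gdt

    def _load_cs(self, selector):
        index = selector >> 3
        kind = self.gdt[index] if index < len(self.gdt) else None
        if kind != "code":
            raise GeneralProtectionFault(
                f"selector {selector:#x} is not a valid code segment")
        self.cs, self.cs_cached = selector, kind

    def far_jump(self, selector):
        self._load_cs(selector)

    def software_interrupt(self, gate_selector):
        self.stack.append(self.cs)     # the old CS is saved as-is
        self._load_cs(gate_selector)   # the gate names a good code segment

    def iret(self):
        self._load_cs(self.stack.pop())  # stale selector is validated here


# Hypothetical boot-time table with code at index 2, and the kernel's own
# table using the layout from the conversation (code at 1, data at 2).
BOOT_GDT = [None, None, "code", "data"]
KERNEL_GDT = [None, "code", "data", "code", "data"]
```

In this model, calling `lgdt(KERNEL_GDT)` on a CPU still running with CS = 0x10 keeps working until the first `iret`, which then faults because index 2 is now a data segment. Doing `far_jump(0x08)` right after `lgdt` avoids that.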
dormito has quit [Ping timeout: 255 seconds]
Izem has quit [Quit: Izem]
mcs51 has joined #osdev
dormito has joined #osdev
^[ has quit [Ping timeout: 240 seconds]
^[ has joined #osdev
^[ has quit [Ping timeout: 255 seconds]
Skyz has joined #osdev
mahmutov has joined #osdev
^[ has joined #osdev
gog has quit [Quit: bye]
freakazoid333 has quit [Read error: Connection reset by peer]
ahalaney has quit [Quit: Leaving]
sortie has quit [Quit: Leaving]
zoey has quit [Remote host closed the connection]
freakazoid333 has joined #osdev
Skyz has quit [Quit: Client closed]
mcs51 has quit [Quit: WeeChat 3.0]
xenos1984 has quit [Remote host closed the connection]
xenos1984 has joined #osdev
elastic_dog has quit [Ping timeout: 246 seconds]
wolfshappen has quit [Quit: later]
wolfshappen has joined #osdev
<kazinsal> today I learned that like, half a dozen of the regulars in one ofe the community's I'm part of are colourblind. totally rethinking my approaches to highlighting in UX design now
<kazinsal> also apparently I cannot spell after a few work-approved afternoon beers damn
<geist> ah work approved afternoon beers
<geist> i do still wonder exactly what happened to the beer keg we had at the office when it shut down
<geist> far as i know no one did anything with it, and it's just been sitting there for a year and a half
<kazinsal> oof, that's going to be some raunchy brew
<kazinsal> today was one of our datacenter project guys' last day so we did a virtual sendoff party
<geist> yah and one of the taps had non alcoholic kombucha
<geist> we joke that it's probably alcoholic now
<kazinsal> ha
<kazinsal> hmm. reminds me, the local polytechnic university does beer sales on friday from their brew lab. should grab a bus over there quickly and pick some up
<geist> annoyingly the little fan on one of my rpi4s has died
<kazinsal> apparently you can get kegs as well but, uh, that's probably a bit much
<geist> or is dying. interesting that its little bearing is going, but it runs okay when i remove it from the case (and it gets to vibrate)
<geist> but as soon as i put it back on the case it barely spins. i guess it being held in a vibration free environment it ends up in some localized pattern where the vibration cancels its movement
<geist> i'm sure there's some name for that sort of thing that mechanical people know of
<geist> same effect if i stand it up on the table while spinning but push down such that it can't vibrate freely
gog has joined #osdev
srjek|home has quit [Ping timeout: 240 seconds]
<geist> yay fixed the fan
<geist> popped it out, pulled the fan out of the sleeve bearing, put some more oil on it
<geist> Singer machine oil. works great
<geist> probably will dry out again pretty quick, but now i know
<geist> i had watched someone on youtube do this. i've been throwing out sleeve bearing fans for years that start to make a lot of racket
<geist> lots of times i think it has just dried out