<doug16k>
syscall and sysret just cheat, and just know what to immediately load into the descriptor cache, instead of loading a descriptor from the gdt and validating it
<NieDzejkob>
huh, so why does intel implement sysenter in long mode?
<heat>
because they made up sysenter and not implementing it would be admitting defeat :)
<doug16k>
my cpu implements it
<doug16k>
3950x
<heat>
only on 32-bit compat right?
<doug16k>
think so
<doug16k>
why would you want sysenter?
isaacwoods has quit [Quit: WeeChat 3.2]
<doug16k>
it is just worse
<heat>
fun?
<heat>
i've personally never looked at it
<doug16k>
it's worse because you have to set another MSR every context switch
<heat>
yeah
<heat>
oh well
<doug16k>
in syscall, that can just be a store to cpu local storage, out of order, not a serializing msr write
<doug16k>
AMD realized that they don't need to "help" you, you are a grown up kernel dev, you can set up the stack
<doug16k>
because you already need cpu-local storage for 20 other reasons
<heat>
sure but this was before swapgs
mrlemke has joined #osdev
<heat>
"When the CPUID instruction is executed on the Pentium Pro processor (model 1), the processor returns a the SEP flag as set, but does not support the SYSENTER/SYSEXIT instructions."
<heat>
what why how
<doug16k>
intel botched cpuid over and over
_mrlemke_ has quit [Ping timeout: 252 seconds]
<doug16k>
imagine being so stupid that you use one bit to tell if it is a 2 thread machine?
<heat>
hyperthreaded? or straight up 2-threads?
<doug16k>
it's as if they tell all the engineers not to worry about bad decisions, we are going to throw it all out next gen, then never throw it out
<doug16k>
yeah they made the hyperthreaded capability one bit with no adjacent space
<doug16k>
it didn't even *occur* to them that there could be more than 2 someday
<heat>
it doesn't sound that crazy
<heat>
if you want to know more there's topology stuff in the CPUID I think
<gog>
yeah and if you fuck up one leaf just add another :p
<heat>
the last time I dove deep into the intel manuals I was trying to figure out how to get the current frequency
<heat>
i immediately regretted it
<heat>
option 1: read a nice msr with the frequency in khz
<gog>
well for skylake and newer it looks like it has its own leaf for base, max and bus freq in MHz
<klysm>
what's the practical difference between an amd tr4 threadripper with 24 threads and an old intel xeon with 24 threads?
<klange>
Switching everything to the TrueType lib and ditching the SDF stuff dropped 30MB of runtime memory usage, presumably because all the SDF lib users were caching scaled bitmaps...
<heat>
option 2: read msrs with two opaque values that mean literally nothing, then you do cpu_khz * aperf_delta / mperf_delta, where cpu_khz is a value you need to get through calibration
<heat>
guess what x86 does?
<doug16k>
klange, TR has drastically more I/O bandwidth
<gog>
option 2
<heat>
bingo
<klange>
klysm*
<NieDzejkob>
wtf
<klysm>
doug16k, it's me
<gog>
except on skylake and newer with level 16h
<doug16k>
oops
<gog>
but it's also missing a few orders of magnitude of relevant information
<NieDzejkob>
so there's literally no way to get the freq accurately pre-skylake?!
<heat>
gog: i doubt they update it
<heat>
that's just base frequencies and whatnot
<gog>
yes
<doug16k>
klysm, TR has drastically more cache
<heat>
if you look at your htop or /proc/cpuinfo you'll see they get the actual frequency it's running on
<NieDzejkob>
is that why /proc/cpuinfo reports, uhh... 3391.217?
<klysm>
doug16k, so the pci-express (v3?) io speed is the main difference? with cache as an additional factor
<NieDzejkob>
and not a rounder number...
<heat>
NieDzejkob: that is mostly-accurate
<doug16k>
klysm, no. even if you broke TR's legs and ran it as lower PCI version than the xeon, it would still blow away the xeon with sheer lane count
<doug16k>
TR having 2x speed PCIe per lane than xeon is nice though
<klysm>
doug16k, does your TR box have 24 threads?
<doug16k>
I don't have TR
<klysm>
oh I must have missed something
<heat>
it's a 3950x
<doug16k>
I have the highest desktop zen2 one, 16 cores
<heat>
that's "merely" ryzen 9
<doug16k>
32 threads, not even a threadripper
* klysm
starts to price those
<doug16k>
klysm, it is insanely powerful for compiling
<doug16k>
klysm, if you had extreme I/O requirements, like running 16+ drive RAID array or something, then TR would make tons of sense
<doug16k>
if you want make -j32 to destroy giant compiles, 3950x will be a lot cheaper and super good
<klysm>
doug16k, so now it's 180 for am4 1920x 3.5gHz 24threads vs. 404 for 3950x 3.5gHz 32threads
<doug16k>
3xxx will kick 1xxx ass by at least 25% though, without being faster clock
<klysm>
why 25% ?
<doug16k>
just plain way more IPC
<doug16k>
they also shortened cache latency a good chunk
<doug16k>
especially in parallel. 3xxx drastically improved core-to-core latency
<doug16k>
you would get a ton of perf from that TR for the money though
<doug16k>
it has 2x memory bandwidth of non-TR
<doug16k>
twice as many channels
<doug16k>
quad instead of dual
<klysm>
hmm so are you still thinking 25% ?
<doug16k>
if money is a concern, 180 is a bargain for that
<doug16k>
you will need to disable the uop cache for it to be stable in linux. it hardly changes the perf anyway, the decoder is really good
<klysm>
yeah I'm guessing money is a concern. and my investment advisor may still need my money. so not necessarily happening yet.
<heat>
also where do you get a 3950x for $404
<heat>
that's literally half the price
<doug16k>
klysm, it depends. the way it works for those server-class-I/O machines is, sometimes, my machine will be faster, and sometimes you will leave me in the dust. depends on how often a program is I/O or memory bound
<doug16k>
if compute bound, I pull ahead slowly, if memory or I/O bound, you go whizzing by at 2x speed
<doug16k>
or several times speed if extremely heavy I/O
<doug16k>
I'd bottleneck on my 28 lanes, you'd be laughing with 64
<bslsk05>
'[EPILEPSY WARNING] How fast should an unoptimized terminal run?' by Molly Rocket (00:51:03)
<kazinsal>
blasting gigs of crap to stdout is kind of an edge case
<moon-child>
I don't need my terminal to run at 10000fps. But the windows terminal still fucked up a bunch of escape sequences last time I tried it sooo¯\_(ツ)_/¯
<doug16k>
clever, yes I agree with that guy 100%, he echoes my thoughts exactly
<kazinsal>
the guy is right in that a gigabyte of data is nothing
<kazinsal>
but a gigabyte of data being shoved into your terminal is a lot
<kazinsal>
that's more data than you will ever be able to sanely parse using your mark one human eyeballs
<doug16k>
what is supposed to happen is, devs just crap out some code that just does it without worrying too much, THEN you profile it and optimize the important part. nowadays, they skip the second bit
<heat>
as a bad terminal writer I feel for the windows terminal devs
nyah has quit [Read error: Connection reset by peer]
<doug16k>
that guy can't believe it, that his crapped-out not-profiled code already blows away MS production code. imagine if he profiled it
<klange>
I support bracketed paste...
<kazinsal>
I find the idea of optimizing a terminal to be able to shit out a gigabyte to the screen at 10k FPS to be frankly bizarre
<moon-child>
also his colour benchmark they actually have a semi-legitimate reason for performing poorly on: subpixel antialiasing. As klange will attest, subpixel antialiasing should die, buuut...
<clever>
kazinsal: when i `ls -ltrh` in a dir with 1000 files, i dont want to wait 30 seconds
<kazinsal>
why are you dumping that straight to your terminial
<heat>
my crapped out not profiled code is really slow
<clever>
kazinsal: i have had to wait >30 seconds before, ON LINUX, because a 2d rendering bug caused multiple vsync waits per line
<heat>
in other news: non-gpu accelerated stuff is slow
<clever>
kazinsal: because i ran `ls -ltrh` in a dir, and forgot how many files were in there
<clever>
[clever@amd-nixos:~/Downloads]$ ls | wc -l
<clever>
2061
<kazinsal>
usually at that point I hit control-C, up arrow, and then " | less\n"
<clever>
kazinsal: like this one dir
<klange>
hot take: lazy rendering and taking gigabytes of output in stride is bad because when I hit ^C I should get an immediate response, not sit while those gigabytes of data get processed and lazily rendered because the whole app already ran.
<clever>
kazinsal: i have run into situations where ctrl+c doesnt stop it, because there is a huge backlog in the ssh buffers
<heat>
it takes ~1.5 seconds to see the output of ls -ltrhR on an edk2 tree
<heat>
oh no!
<heat>
(60929 lines)
<kazinsal>
a real smart terminal would detect that you've shoved a gigabyte of data into stdout and go "hey, stop it"
<kazinsal>
"did you mean: !! | less"
<clever>
kazinsal: the one in the above video, skips even trying to render, and goes into a process only mode
<clever>
so it can chew thru the data even faster
<kazinsal>
it's a neat optimization
<doug16k>
it's a classic example of the state of windows code as a whole. ingenious super fast code in some parts, unbelievably slow code in other parts
<kazinsal>
but hardly a "WOW ALL THESE OTHER TERMINALS SUCK" case
<clever>
kazinsal: he also mentioned, that the console subsystem in the KERNEL sucks, and is becoming a major bottleneck for his example term
<clever>
switching to named pipes, makes it even faster
<kazinsal>
yeah the windows console subsystem is kinda mediocre
<kazinsal>
hasn't really had much work done on it in like, 30 years
<kazinsal>
partially because it wasn't really *intended* to be a primary user interface
<kazinsal>
we just started needing to use it because of some damn finnish college kid with a fancy new 386
<clever>
say that in front of some DOS users :P
<kazinsal>
the NTVDM probably didn't suit their needs anyways
<kazinsal>
like throwing a unix greybeard into an AS/400 system and timing how long it takes for him to start whining about how he wants his PDP-11 back
<kazinsal>
reminds me I should play around with winevdm or whatever it's called these days
<kazinsal>
don't really have anything that *needs* it but it's a fun toy
<heat>
getting the vt emulator right is probably the thing they're most concerned about
<heat>
that and usability features of course
<kazinsal>
yeah, that really covers the overwhelming majority of use cases
<klange>
It takes 9 seconds to `cat /src/apps/bim.c` in my terminal with the new TrueType renderer, 2 seconds with the bitmap font, I suspect most of the time in the latter case is from scrolling.
<kazinsal>
to an extent the base console in windows sucks less at VT codes since 10
<klange>
(wc -l /src/apps/bim.c → 11673, for reference)
<clever>
on the linux side of things, the biggest bottleneck ive had lately, was screen
<heat>
not "urr durr why 1gb not fast" nor "urr durr why arabic not perfect"
<kazinsal>
my only issue with WT is that with non-antialiased truetype fonts it's not actually pixel perfect
<clever>
if i have 2 screen clients, of different size, connected to the same screen tab
<clever>
then it has to deal with word-wrapping internally, and that uses a lot more cpu
<kazinsal>
so if you tell it to render an 80x24 screen of a 10x20 font it won't actually be 800x480 pixels
<heat>
like who's writing arabic on a terminal?
<clever>
which is why i have ^a- bound, to un-select a window, and then the client temporarily doesnt get anything
<kazinsal>
this screws up some real nice shader stuff unfortunately
<klange>
Arabs, presumably.
<doug16k>
clever, that comment makes me unable to resist mentioning that I made a little SDL opengl hello world that runs at 11200 fps
<klange>
Someone who has their LC_ALL set to an Arabic script language and is getting error messages or other output in it?
<clever>
doug16k: i also got 10k fps out of an rpi once, by accident, because i had too high of a pixel clock, on a 100x100 display
<moon-child>
clever: tmux has a nicer solution to that particular problem
<heat>
klange: i don't think i've ever used a non-ascii character on a terminal and i've got plenty of those on my keyboard
<clever>
moon-child: tmux forces every client to the same size and tab, which makes it way more unusable
<moon-child>
it'll resize the window to match whichever client you touched most recently, and truncate on the other client
_mrlemke_ has joined #osdev
<doug16k>
what's even funnier: I was lazy and just used old style glBegin to skip over hand writing shaders and stuff
<kazinsal>
I've seen people's compilers spit out funky unicode
<doug16k>
so old crappy style API is 11k fps
<clever>
moon-child: due to that resizing, whenever i share a terminal with a co-worker (via tmate), i often wind up working in a tiny little box, because the remote tiling WM resized things
<klange>
heat: but as you demonstrate pretty clearly here, you have no trouble with English, a trait not shared by everyone sitting in front of a computer, even if they are into running things in terminals.
<heat>
hmm, good point
<moon-child>
doug16k: I get 23k with opengl core
<doug16k>
moon-child, nice
<moon-child>
size dependent ofc. 10k with a smaller window
<moon-child>
*larger
<moon-child>
23k was 640x480
<moon-child>
clever: ah, yeah, that's a fair use case
<doug16k>
I meant 640x480 2060 super
* moon-child
wishes he had a newer gfx card. rx570
<kazinsal>
bitcoiners should be punished with hard labour
<heat>
kazinsal: gpu's aren't used in bitcoin
<kazinsal>
abolish the prison system, except for cryptocurrency weirdos
<heat>
afaik it's ethereum that uses them
<heat>
bitcoin has been on ASICs for years
<kazinsal>
they're all bad
<heat>
yeah
<heat>
the good news is that stores are starting to have stock
<heat>
crazy prices though
<clever>
proof of stake is where its at
<doug16k>
being able to create money from electricity is destructive
<doug16k>
we need to set up the world so you can't do that
<clever>
doug16k: yeah, thats why proof of work is pointless
gog has quit [Ping timeout: 246 seconds]
<kazinsal>
global cryptocurrency power usage is like, 80 million tonnes of CO2 equivalent per year
<heat>
<donthitme> crypto is a scam and all coins are worthless </donthitme>
<moon-child>
proof of stake just strengthens the aristocracy
* CompanionCube
once read and liked a post from charles stross about wanting to burn bitcoin in a fire
<kazinsal>
that's roughly 20 million consumer vehicles doing 12000 km/year
<moon-child>
just kill crypto. It was a neat experiment, but it didn't work out
mrlemke has joined #osdev
<kazinsal>
and consumer vehicles are... honestly a pretty tiny fraction of CO2e emissions
<clever>
moon-child: look into proof of stake coins
<heat>
people just look at crypto like "oooh easy money i like money and want money"
<CompanionCube>
on that note binance is currently having regulation-related funtimes.
<moon-child>
clever: moon-child | proof of stake just strengthens the aristocracy
<kazinsal>
replace coal plants with nice safe canadian made reactors
<doug16k>
you know what's even funnier? crypto clients have one button overclock, so not only are they buying up all the new cards, they are straining the hell out of the existing ones, and aging their capacitors
<heat>
can you actually buy something with e.g bitcoin? not really, unless it's on the deep web
* kazinsal
does his best AECL sales rep impression
<CompanionCube>
heat: objection: i bought a VPS with bitcoin :p
<doug16k>
I can hardly believe defaults even works. overclocking is insane
<kazinsal>
yeah last card I overclocked was an R9 290
<kazinsal>
my 1080 Ti I just bumped the power limit up to 120% and let it boost however high it feels it should
<kazinsal>
it knows how many bins it can throw at the core clock within the power/thermal envelopes while staying stable
_mrlemke_ has quit [Ping timeout: 252 seconds]
<doug16k>
yeah, through a switching regulator whose output voltage is flopping all over the place in inverse relation to current
<doug16k>
I'm amazed they get defaults stable enough
<doug16k>
if power is 20% over normal, then most likely, the voltage overshoot when current requirement suddenly drops is also 20% higher spike
<doug16k>
so you get a pulse of aging every time it transitions from wide open to idle
<kazinsal>
the x80 Tis actually tend to have a power limit intentionally set lower than the design spec of the GPU itself
<kazinsal>
they're generally artificially limited Titan chips
<kazinsal>
so an x80 Ti will be a 250 W limited version of a 320 W chip
<kazinsal>
sometimes with a few execution units lopped off as well
<kazinsal>
or the GDDR bus shortened
<doug16k>
I guess I can't complain too much. everyone overclocking encourages them to exceed specifications by a bit so they get less RMA from overclockers, and we all end up with more stability
nyah has joined #osdev
_mrlemke_ has joined #osdev
mrlemke has quit [Ping timeout: 240 seconds]
<gorgonical>
Is there a way to bypass a kernel-central timer dispatch? On ARM64 there's a bunch of different timer IRQs for the different ELx levels, and with this hypervisor we're using you need to provide an EL2 timer handler, since an EL1 OS has to act as a scheduler, per ARM's FF-A.
<gorgonical>
Anyway, to separate the irqchip bringup from the hypervisor driver, the kernel needs to provide an interface. Could I do something like hand the irqchip bringup a function ptr, like void*(*el2_timer_tick)(void), and later when the hypervisor driver comes up, do something like set_el2_timer_tick(my_hypervisor_timer_tick)?
<gorgonical>
My gut instinct is that the initial, undefined/NULL value of el2_timer_tick is what would be used, right?
<gorgonical>
In that case, I can't really do much better than el2_timer_forward(void) { return el2_timer_tick(); } and register el2_timer_forward, can I?
ElectronApps has joined #osdev
<gorgonical>
As I'm reading the source code of the hypervisor I'm not even sure how this interrupt handoff works
<gorgonical>
The manual says that with IMO set to 0, an interrupt targeted at EL2 (which I presume the EL2 timer is) will not be taken and remain pending. That suggests to me that it shouldn't even fire at EL2. Does that mean it sticks around until EL1 happens again, and then fires?
<gorgonical>
If that's the case, why does the hypervisor even have an EL2_VBAR table? There's a stub in there that says "on an IRQ just call back to the primary VM and they'll handle it." But if the interrupt doesn't "get taken" how does that stub ever get reached?
<clever>
gorgonical: i think there is a config flag, that says if an IRQ should be serviced in EL1 or EL2
<clever>
and i think the GIC can also set that on a per-irq basis?
<gorgonical>
clever: Yeah, I think that's the HCR_EL2.imo flag. But the puzzle pieces aren't fitting together in my head, I guess
<gorgonical>
Does the el2 vbar get called, but the exception stays pending so that when el2 hands off to el1, it'll immediately re-fire?
<clever>
ive not really dealt with servicing IRQ's on arm yet, most of my arm stuff has been low-level bringup or passing control to linux
pony has joined #osdev
heat has quit [Ping timeout: 252 seconds]
ElectronApps has quit [Ping timeout: 252 seconds]
ElectronApps has joined #osdev
<gorgonical>
Well I am thoroughly confused. I don't have any room in my brain left. Hope somebody sees all that and takes pity on me lol
freakazoid333 has quit [Read error: Connection reset by peer]
netbsduser has joined #osdev
netbsduser` has quit [Remote host closed the connection]
<geist>
gorgonical: which exceptinon are you thinking about?
<geist>
if it's an exception then no, it wont stay pending because they're intrinsically edge triggered
<geist>
page fault, illegal instruction, etc
<geist>
at least all synchronous exceptions
mahmutov has joined #osdev
_mrlemke_ has quit [Read error: Connection reset by peer]
_mrlemke_ has joined #osdev
englishm has joined #osdev
<gorgonical>
geist: I mean a timer interrupt, not an exception
mahmutov has quit [Ping timeout: 252 seconds]
srjek|home has quit [Ping timeout: 252 seconds]
ElectronApps has quit [Read error: Connection reset by peer]
YuutaW has quit [Quit: WeeChat 3.1]
ElectronApps has joined #osdev
ElectronApps has quit [Remote host closed the connection]
pony has quit [Quit: WeeChat 2.8]
pony has joined #osdev
ElectronApps has joined #osdev
mrlemke has joined #osdev
_mrlemke_ has quit [Ping timeout: 252 seconds]
ElectronApps has quit [Remote host closed the connection]
ElectronApps has joined #osdev
YuutaW has joined #osdev
GeDaMo has joined #osdev
j00ru has quit [Quit: leaving]
j00ru has joined #osdev
zoey has quit [Ping timeout: 240 seconds]
vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
<klange>
I think that covers the last general outline edge case I was missing, just need to do transformed compound glyphs...
<klange>
I need to write a font selection library / something less low-level than "take this exact font and render a glyph"; I had one at some point with the old Freetype-based wrapper?
<klange>
VL Gothic is 5MB, and though it deflates well because TrueType has no compression on its own, that's still pretty big compared to the rest of my live CD, so for now I'm going to maybe just stick it in a package...
<klange>
Though, now that I can display Japanese text again, maybe it's time for my next project to be an IME...
isaacwoods has joined #osdev
netbsduser has quit [Quit: Leaving]
ahalaney has joined #osdev
_mrlemke_ has joined #osdev
mrlemke_ has joined #osdev
mrlemke has quit [Ping timeout: 265 seconds]
isaacwoods has quit [Ping timeout: 272 seconds]
_mrlemke_ has quit [Ping timeout: 240 seconds]
isaacwoods has joined #osdev
doug16k has quit [Remote host closed the connection]
doug16k has joined #osdev
ElectronApps has quit [Remote host closed the connection]
ElectronApps has joined #osdev
srjek|home has joined #osdev
air has quit [Ping timeout: 265 seconds]
_whitelogger has joined #osdev
ElectronApps has quit [Read error: Connection reset by peer]
qookie has joined #osdev
nsmb has quit [Ping timeout: 272 seconds]
heat has joined #osdev
mcs51 has joined #osdev
mrlemke has joined #osdev
mrlemke_ has quit [Ping timeout: 246 seconds]
srjek|home has quit [Ping timeout: 252 seconds]
<heat>
am I required to call _init or can I just exclusively support init_array?
<acidx>
indee/window 4
<acidx>
sorry :)
<heat>
okay i managed to find an answer: yes, if you're 100% sure you have initfini array enabled in your toolchain
<heat>
which you don't if you're cross compiling and didn't enable it explicitly
mahmutov has joined #osdev
andydude has joined #osdev
andydude has quit [Client Quit]
<mjg>
acidx: alt+4
lanodan has quit [Quit: WeeChat 3.1]
lanodan has joined #osdev
zoey has joined #osdev
<acidx>
mjg: doesn't work in my environment.
dennis95 has quit [Quit: Leaving]
<mjg>
esc+4
<acidx>
ah, that works. thanks.
YuutaW has quit [Quit: WeeChat 3.1]
iorem has quit [Quit: Connection closed]
mahmutov has quit [Quit: WeeChat 3.1]
mahmutov has joined #osdev
YuutaW has joined #osdev
shan has joined #osdev
tacco has joined #osdev
<dzwdz>
does the present flag in the IDT entires do anything?
<dzwdz>
i looked at the intel docs and i couldn't find anything on it
<sham1>
Well it indicates if the field is active and, well, "present"
<sham1>
That is, the entry is used if the present flag is set appropriately
CryptoDavid has joined #osdev
<dzwdz>
well yes, but does it do anything? e.g. when a not present interrupt gets called, do i get a GP or something?
<sham1>
Probably double faults
GeDaMo has quit [Quit: Leaving.]
<geist>
dzwdz: it's the 'is this entry valid' bit
<geist>
so yes. if you dont set it it's the equivalent of no bit
<geist>
so if you call an interrupt/exception on it, it'll probably #DF? (or will it #GP?)
<geist>
and then if that doesn't work you're in triple fault territory
<gog>
fairly certain it's #DF
<gog>
not in a spot where i can double check that tho
<Arsen>
can Iget efivars from the efi shell?
<j`ey>
`dmpstore`
<Arsen>
oh great, ty
<geist>
gog: and of corse you can't triple check it
<gog>
geist: -_-
<geist>
hmm, actually has me thinking: if a modern machine is in pure real mode, is there any way to get a #DF?
<geist>
i guess the stack pointer wrapping around can still happen. i think doug16k said that on 8086s and whatnot it would just crash
<geist>
in protected mode of course there's lots of ways to trigger a #DF because of the IDT and stuff being non present, etc
<gog>
ok now that i've gone to check maybe it's #NP?
<gog>
then if #NP isn't present, #DF
<geist>
thing is in real mode the IVT is always just there
<geist>
there's no notion of it not being present, unless a 0 in the address field means that
gioyik has joined #osdev
<gog>
yeah idk about real mode
mcs51 has quit [Quit: WeeChat 3.0]
<geist>
possible it can't happen. #DF of course didn't exist in 8086 days anyway. i haven't checked but it was almost certainly added with 286 and protected mode
<geist>
but then the question is with later features can you trip one while inside pure real mode on >=286
<bslsk05>
twitter: <NoraDotCodes> oh my gods. they literally have no shame about this.   GitHub Support just straight up confirmed in an email that yes, they used all public GitHub code, for Codex/Copilot regardless of license. https://pbs.twimg.com/media/E5sQwkZXwAE_Zgr.jpg
<bslsk05>
[An email from GitHub Support. Name of agent is censored. Relevant part reads 'Sorry about the delay in getting back to you. I reached out to the team about this. Apparently all public GitHub code was used in training. We don't distinguish by license type."]
<doug16k>
gog, since 286, stack pointer wraparound triple faults in real mode
<doug16k>
can't be stopped. there's no way to task gate in real mode
<geist>
doug16k: and no #DF in real mode?
Brnocrist has quit [Ping timeout: 265 seconds]
<doug16k>
it doesn't help. it tries to push to current stack
<geist>
it's vector #8, but i guess it's not retroactively specced in real mode?
<geist>
oooh i see. yeah
<doug16k>
the IDT entry would need to be a task gate to switch to a new stack
<geist>
right, makes sense
Brnocrist has joined #osdev
<doug16k>
the cpu will never interpret the idt in a way where a task gate is possible in real mode
<doug16k>
they are all 16:16 far pointers
<doug16k>
a clean wraparound should work. I mean a pathological wraparound that spans byte 0 and byte -1 is triple fault
gog has quit [Ping timeout: 240 seconds]
<doug16k>
the highest byte it accesses is 0, the lowest byte is 0xffff, so limit check is not sane, everything passes 0 check, but you touched 0xffff, so instead of being that stupid it faults
<doug16k>
in other words, limit check checks addr+size-1. if you push with sp=1 then limit check checks 0, but it wrote 0xffff
srjek|home has joined #osdev
_mrlemke_ has joined #osdev
<doug16k>
sp=0 wraparound is expected. say you had a 64K stack at 0x30000, if you load sp=0 and ss=0x3000 then the first 16 bit push stores to 0xfffe
<doug16k>
in this case, the wraparound isn't occurring during the store
<doug16k>
it's just wrapping sp then doing clean store
<doug16k>
0x2fffe*
<doug16k>
so be careful not to misinterpret what is meant by wraparound
<doug16k>
to be clear, example, bottom of stack at 0x30000, stack limit at 0x20000, stack memory from 0x20000-0x30000
<doug16k>
then ss=0x20000. dang it
mrlemke has quit [Ping timeout: 265 seconds]
<geist>
makes sense
<heat>
why does the stack pointer overflowing triple fault?
<doug16k>
I just explained why
<doug16k>
the limit check checks the wrapped around offset, not accounting for the start of it at fffff...
<heat>
ah
<doug16k>
so instead of being that dumb it faults
<doug16k>
faults on wraparound during the store, not wraparound during the sp update
<doug16k>
I think it's a stack fault, but can't push, so double fault, but still can't push, so reset. mode doesn't matter
<doug16k>
might be GP though. doesn't matter what it is though, you never get control
<doug16k>
funny how IBM wired up that keyboard controller reset when all they needed to do is: push $0 ; push $0 ; lidt (%sp) ; int $0
<heat>
i still dont understand how you can't do that through the actual CPU
<doug16k>
you could even drop the second push $ 0, just screwing the limit is enough
<doug16k>
you can, triple fault
<heat>
yeah but why isn't there like an instruction or something like that
<kazinsal>
At one point during the 286 engineering process, Microsoft's kernel team asked Intel to make triple faults on the 286 faster so they could drop back to real mode quickly
<heat>
they had to route a RESET signal through the keyboard controller for some reason
<doug16k>
heat, that's entirely IBM's lack of thoughtfulness
<kazinsal>
yeah, from the Intel side a triple fault puts the CPU in a shutdown state, which just happens to end with "the reset line is asserted"
<doug16k>
yeah, the motherboard sees the shutdown cycle (which is the cpu saying, "I will never run another instruction no matter what") and it asserts reset
<heat>
so that's on the motherboard?
<doug16k>
not 100% sure on the genuine AT, but since then, yes
<doug16k>
now it is back inside the cpu, masquerading as the motherboard, in the SpC
<doug16k>
triple fault does work on AT
<doug16k>
AT meaning IBM PC/AT 286
<kazinsal>
from the 286 manual:
<kazinsal>
Shutdown occurs when a severe error condition prevents further processing. Shutdown is very similar to HLT in that the 80286 stops executing instructions. The 80286 externally signals shutdown as a Halt bus cycle with A1 = 0. The NMI or RESET input will force the 80286 out of shutdown. The INTR input is ignored during shutdown.
<geist>
yah i wouldn't be surprised if there isn't just a circuit that detects that and asserts a global reset across the mobo
<geist>
like some 74ls74 that latches it for one cycle and bops the system out
<geist>
lemme see... schematic is avail
<kazinsal>
interestingly there's also a small section in the same chapter about multiprocessing!
<kazinsal>
on the 286!
<geist>
yah i forget if the 286 is where they added the lock prefix? in general cpus at the time (68k, etc) could do a lock bus cycle by just asserting an additional line
<geist>
and it was the systems' job to make sure everything else was halted at the time. fairly straightforward
<doug16k>
I wonder how they expect you to keep up with the fetches and memory accesses of two cpus then?
<kazinsal>
yep, looks like the 286 has LOCK
<kazinsal>
LOCK is a prefix that will cause the BUS LOCK signal of the 80286 to be asserted for the duration of the instruction that it prefixes. In a multiprocessor environment, this signal should be used to ensure that the 80286 has exclusive use of any shared memory while BUS LOCK is asserted.
<doug16k>
a cache would be required, or they would starve
<geist>
yah a lot of 286 was intel playing catch up with 68k, which came out after 8086 but before 286
<geist>
and 68k had CAS
<geist>
and an external lock lock signal too
<geist>
doug16k: possibly it didn't really starve without cache because i doubt the cpu could do locks truly back to back
<geist>
probably always at least one cycle of instruction fetch or whatnto, enough for another bus master to sneak in
<doug16k>
no I mean all the time. just fetch decode execute
<geist>
if you had two cpus running with lock there's probably always one cycle there where the other one can grab it
<geist>
even if the exclusive priority scheme is basically fixed
<doug16k>
funny how IBM added the NMI mask
<doug16k>
when is parity error ok?
<doug16k>
does 80286 do the NMI mask while handling NMI? that would be a second example of unnecessary hardware IBM added, but back then you wouldn't be able to IPI yourself, so not as dumb as reset one
<doug16k>
today you can software mask NMI by sending yourself an NMI and not returning
<geist>
nmi is probably edge triggered?
<doug16k>
only useful temporarily though, the next iret would unmask it
<geist>
in whch case you can't just get yourself stuck in a nmi loop easily
<geist>
unless you're continually asserting it every cycle
<doug16k>
NMI is automatically masked when dispatched, and unmasked at next iret
<geist>
ah so there's a hidden mask in the cpu?
<doug16k>
yes
<geist>
i was reading through the z80 docs and it has something kinda complicated like that too for NMI
<doug16k>
completely visible if hypervisor or SMM
<geist>
in addition NMI is edge triggered
<geist>
actually now that i think about it, i think z80 doesn't have a reenable irq on iret (or equivalent) like intel does
<geist>
it has iirc a one cycle delay on irq enable/disable, in the form of a chained latch that they document in the manual
<geist>
so it gives you an ability to reenable irq and then ret
<geist>
rte
<geist>
but when taking an irq it automatically jams the second latch to disabled, i think
<geist>
something like that
<geist>
s/one cycle/one instruction
<doug16k>
linux pthread hammers processes with SIGSTOP? every condition variable wait raises a SIGSTOP
<doug16k>
something in thread creation does too
<doug16k>
pain
<doug16k>
have to tell gdb you don't care with: handle SIGSTOP nostop
<doug16k>
makes me wonder if ASAN is trying to stop on error with it too. it doesn't stop, program exits and all I have is the printfs
<heat>
hmm
<heat>
I don't see why it would
<doug16k>
and gcc sanitizers are a fragmented mess, only ubsan has the ability to emit trap instruction
<doug16k>
guess how many times programs wait on a condition variable :P
<doug16k>
am I the first person to debug multithreaded stuff? can't be
<doug16k>
can hardly believe it
<doug16k>
everyone just does `handle SIGSTOP nostop`, puts their fingers in their ears, and says "la la la la la la" then
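[For reference, the workaround being described can live in a `.gdbinit` so every session applies it; the `nostop`/`noprint`/`pass` keywords are gdb's standard `handle` options (a sketch of the suppression, not a recommendation):]

```
# Don't stop or print when the inferior gets SIGSTOP; still deliver it.
handle SIGSTOP nostop noprint pass
```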
<heat>
i can't believe it either
<heat>
makes no sense
<heat>
i dont see how you can get SIGSTOPs like that
<heat>
NPTL doesn't use them
<doug16k>
ah... I see why now. I tried it with bare gdb and it doesn't stop
<doug16k>
still a mystery why the sigstop are there at all though
<heat>
erm what were you using
<doug16k>
gdb MI
<heat>
MI?
<doug16k>
gdb machine interface, for IDE integration
<heat>
hm
<doug16k>
it's gdb with a hideous "command line" made for use by IDEs
<doug16k>
it's easier to parse and more deterministic
<heat>
SIGSTOPs can be generated by 1) your_favourite_kill_variant(SIGSTOP); 2) ptrace + breakpoints/other things that may cause it to break
<heat>
if gdb MI breaks ptrace that sounds horribly broken
<doug16k>
you are making me think it is a bug in the way my IDE uses it, it tells it to be paranoid as hell about all signals?
<doug16k>
oddly, if you do "handle all nostop" then it still stops at breakpoints
<doug16k>
makes some sense
<heat>
hm actually breakpoints don't generate sigstops I think
<doug16k>
would be SIGTRAP
<doug16k>
but "all" includes it(?)
<doug16k>
does it really cause a real sigtrap signal when the inferior hits a breakpoint? what a hack
<doug16k>
is ptrace good? seems like a total hack from where I am, but I don't know ptrace well enough to know
<heat>
it will generate whatever is thrown when you hit an int3
<heat>
it mainly intercepts signals
<heat>
is ptrace good? i'd say no
<heat>
it works but it's very peculiar
<heat>
oh and if you're not on the "process = thread but process also = group of threads" boat then ptrace is absolutely horrific
<doug16k>
my kernel is "processes have threads" not "bunch of processes that coincidentally use same memory and handles"
<heat>
also ptrace behaviour varies between OSes
<heat>
a quick google shows (or at least heavily implies) that freebsd's ptrace attaches to every thread in a process
<heat>
linux's attach is to individual pseudo-process-thread-thingies
<heat>
not to the actual thread group (aka process)
<doug16k>
I suppose technically my kernel just has a bunch of threads, and each one points to a process, but each process has a list of threads, which is a circular reference back to all the threads
<doug16k>
...of that process
<heat>
yeah same
<gorgonical>
geist: can you shed some light on the interrupt system? I really only need clarification about the control flow; hcr_el2.[afm]o are all set to 0. This sounds like it means the interrupt won't get delivered to EL2, but EL1. But then, EL2 has an interrupt table that forwards irqs to a function that swaps the primary VM at EL1 back in. So one of two things is true: the EL2 interrupt table isn't being
<gorgonical>
used and the interrupt jumps to EL1 directly somehow, or the interrupt gets re-triggered on the switch to EL1 somehow. I suspect it's actually the latter, but I am curious about the mechanism.
<gorgonical>
"The interrupt" being explicitly the hypervisor timer on the generic timer attached via a GICv3