<sonny>
so you just set you data structures up and it can all be done at a high level
<sonny>
s/you/up
<heat>
yes
<sonny>
ok
<heat>
malloc is super high level
<moon-child>
meh you have to be careful though
<heat>
it's essentially just C/C++ without memory allocation
<heat>
it doesn't require assembly
<moon-child>
one time I did void *calloc(size_t x, size_t y) { return memset(malloc(x*y), 0, x*y); } (but with all the checks you need ofc), and the compiler turned it into void *calloc(size_t x, size_t y) { return calloc(x, y); }
<moon-child>
which is obviously not what you want
<heat>
-fno-builtin
<moon-child>
yeah
<sonny>
why did it change?
<moon-child>
similarly for the strings functions, if you implement them in c, compiler generally reserves the right to insert calls to them at will, so you have to do the same thing
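A minimal sketch of the overflow-checked calloc moon-child describes, assuming a hosted C environment. It is called my_calloc (a hypothetical name) so it doesn't clash with the C library's own calloc. Inside a real libc, where the function really is named calloc, this file would need to be built with something like -fno-builtin (or -ffreestanding) so the compiler doesn't recognize the malloc+memset pair and fold it back into a call to calloc itself.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical calloc replacement: allocate nmemb * size zeroed bytes.
 * Returns NULL if the multiplication would overflow or malloc fails. */
void *my_calloc(size_t nmemb, size_t size)
{
    /* Reject nmemb * size > SIZE_MAX before multiplying. */
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;

    size_t total = nmemb * size;
    void *p = malloc(total);
    if (p == NULL)
        return NULL;              /* never memset a NULL pointer */

    return memset(p, 0, total);   /* memset returns its first argument */
}
```

These are "all the checks you need": the overflow test and the NULL test. Leaving either out is the classic calloc bug.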
<sonny>
doesn't look like an optimization
<sonny>
ok
<heat>
because the compiler is free to optimize your standard function calls into other code
<heat>
memset doesn't call memset very often
<heat>
malloc + memset is undoubtedly slower than calloc
<moon-child>
sonny: calloc can be faster than malloc+memset, in the case that you have to mmap
<moon-child>
since you get cleared pages from the os anyway
<heat>
or in the case you expanded the heap
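An illustration of why fresh pages make calloc cheap: on POSIX systems an anonymous mmap mapping is zero-filled by the kernel, so an allocator that satisfies a large calloc straight from mmap can skip the memset entirely. This is a minimal sketch that assumes Linux/BSD-style MAP_ANONYMOUS. alloc_zeroed_pages and free_zeroed_pages are hypothetical helpers, not part of any libc.

```c
#define _DEFAULT_SOURCE   /* expose MAP_ANONYMOUS on glibc under strict -std=c11 */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: get len bytes straight from the OS.
 * Anonymous private mappings arrive zero-filled, so no memset is needed. */
void *alloc_zeroed_pages(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Release memory obtained from alloc_zeroed_pages. */
void free_zeroed_pages(void *p, size_t len)
{
    if (p != NULL)
        munmap(p, len);
}
```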
<sonny>
oh, I had no idea
<sonny>
but that makes sense
<moon-child>
also, even if you do get to reuse existing memory, you might already know the size class, so you save a couple of branches there
<heat>
memset(mem, 0, 4) -> *(int *) mem = 0;
<moon-child>
oh yeah, calloc impl also gets to round its memset
<moon-child>
that's a really annoying thing
<moon-child>
like if you calloc 7 bytes, you can clear all 8 that you actually reserved. Regular memset doesn't get to do that
<sonny>
ok, that sounds like you called calloc again on the same region
<klys_>
calloc calling itself isn't what you want obviously, right.
gog has quit [Ping timeout: 252 seconds]
<heat>
moon-child, rounding a memset only seems useful on smaller sizes
<heat>
rounding 210 -> 256 is just slower
<moon-child>
not slower, as it's the same number of cache lines
<moon-child>
but yes, most of the gain comes from not having to deal with sizes less than a word (or two words, or whatever your min allocation size is)
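The arithmetic behind this exchange, as a small sketch. It assumes 8-byte minimum allocation granules and 64-byte cache lines (typical on current x86 and many arm64 cores, but an assumption here), with the block starting on a cache-line boundary. round_up and cache_lines_touched are illustrative helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Round n up to a multiple of align (align must be a power of two). */
size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Number of cache lines touched when clearing len bytes starting at addr. */
size_t cache_lines_touched(uintptr_t addr, size_t len, size_t line)
{
    if (len == 0)
        return 0;
    uintptr_t first = addr / line;
    uintptr_t last = (addr + len - 1) / line;
    return (size_t)(last - first + 1);
}
```

round_up(7, 8) is 8, so a 7-byte calloc can clear its whole granule with a single word store. Clearing 210 bytes and clearing 256 bytes from a line-aligned start both touch 4 lines, which is moon-child's point about 210 -> 256 not being slower.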
matt__ has joined #osdev
wxwisiasdf has joined #osdev
<geist>
haha this cpu is hilarious with the stock cooler
<klys_>
ryzen? I just got my epyc 3 fan today
<geist>
sits there nice, 35-40C idle. the instant you fire up prime95 in torture mode it slams right against 100C
<geist>
like, within a minute
<geist>
er within a *second*
<geist>
then throttles itself for 2 or 3 seconds, then slams right against it again
<geist>
it's an alder lake, core i7-12700
<klys_>
cool
<geist>
seems to be highly specific to the test though. regular joe blow load seems to run up to 60C or so and generally get on with things
<geist>
but the prime95 torture test that runs within the L1 cache or so just goes open loop
<geist>
probably seriously memory starved, especially since i only have a single DIMM in it
sonny has quit [Quit: Client closed]
<geist>
klys_: also grats on the epyc fan. what were you using before?
<klys_>
this is a new system which is slowly coming together
<klys_>
I have so far, the ram, cpu, mobo, and a case (and potentially a test psu)
<klys_>
coming in the mail are: ram heat spreaders, and arctic mx-4 thermal paste
<heat>
henlo
<heat>
i have been invoked
<klys_>
=)
<heat>
did you see my response to your firmware question thing?
<klys_>
yes thank you
<heat>
np
<klys_>
now I was asking geist because he may have a new system thing coming and some of that might be preventable
<geist>
fwiw linux on the surface seems to just load up the performance cores first
<heat>
that's the general picture of firmware in x86
<heat>
a new system thing?
<geist>
ie, core 0-15 gets loaded up first, then 16-19 is spillover
<klys_>
yeah, x86 firmware blobs are pretty horrendous; arm systems could be even worse
<klys_>
heat, because he's working with new devices at google
<geist>
oh? where do you get that impression?
<heat>
new x86 devices are going to look the same as old x86 devices
<heat>
new arm devices may have a lot more firmware due to SBSA and UEFI and ACPI
<klys_>
or at least you should be in the future, working with the prototype thing I was talking about yesterday
<heat>
(and possibly option ROMs actually built for aarch64)
<geist>
i mean i'm fiddling with x86 machines... what do you mean, prototype thing?
<geist>
please don't state things like that; it tends to get picked up by the press and whatnot
<klys_>
yesterday I referred to a hypothetical prototype you may be working with
<geist>
mkay. i guess you did?
<klys_>
it was a reasonably hypothetical question about how much firmware it would have
<geist>
okay, but that's a lot different from 'geist is working on a prototype at google'
<klys_>
and heat says there could be a lot of firmware if it's an arm chip
wxwisiasdf has quit [Quit: Lost terminal]
<heat>
" how much firmware on an internal default fuchsia setup today, curious"
<heat>
"supposing there was such a device prototype at your company, I ask, how much firmware is loaded on the prototype "
<geist>
please don't make me regret being fairly free-flowing with information by trying to read some sort of stuff out of it
<geist>
that is exactly how you get told to stop talking to people on irc
<klange>
7. 7 firmware.
<heat>
I don't think anything here in the questions or answers says there's a cutting edge prototype at goog
<heat>
at least definitely not in the answers
<klys_>
well if I ask the same question enough times someone may begin to understand what I said
<geist>
yeah i was specifically referring to what folks in the industry tend to do when building consumer devices. that's all
<geist>
it has zero to do with what i'm currently doing at work
<heat>
if you want to know how cutting edge devices are doing firmware, ask coreboot or tianocore
<heat>
it's not rocket science
<klys_>
klange, so there's seven firmwarez?
<heat>
spoiler: it's the same as before, although maybe a bit more
<geist>
yeah really most of it is just mundane crap
<heat>
i think ARM is a bit fancier with the hypervisor stuff and possibly UEFI in the future
<heat>
x86 is also getting fancier by ditching SMM for PRMs
<heat>
(well, not ditching it all, just making it slimmer)
<heat>
Google folks (Ron Minnich et al) are also working on LinuxBoot
<klys_>
what's linuxboot?
xenos1984 has quit [Read error: Connection reset by peer]
<heat>
which is essentially a stripped-down UEFI image with just the bare minimum, with a linux kernel attached to it
<heat>
so it executes linux where it would usually execute DXE (the driver stage)
<klys_>
so the kernel is at least a separate file
<geist>
fwiw single threaded, on the surface this i7-12700 benchmarks a teensy bit behind (maybe 10%) a 5950x
<geist>
which is about what i expect. i think it's running at about 5GHz
<geist>
at least until it thermal throttles itself
<heat>
apparently this is already done at decent scale in google servers
<heat>
which is super interesting
<klys_>
uefi firmware like tianocore, does it run in SMM?
<heat>
no
<klys_>
so only seabios does that
<heat>
it runs in ring 0, some modules (SMM modules, SMM core) run in SMM
<heat>
no seabios also runs in ring 0
<heat>
same with coreboot
<klys_>
except the ami bios chip on my socket 7 piix box runs in SMM?
<heat>
firmware is only more privileged than you before locking the SMRAM register, because it can set itself up in SMM; after that any privilege is given exclusively by secure boot
<heat>
(in the sense that executing is already a privileged state at all)
<heat>
there have also been experiments in ring 3 UEFI
<heat>
which sounds super interesting
<klys_>
so you can run an .efi PE binary in ring 3, except that doesn't sound like it could lgdt.
<heat>
it doesn't need to lgdt
<klys_>
it just needs to play a game with the luser
<klys_>
and appear useful
<heat>
the point is to restrain modules from doing everything (because firmware code is frequently not the best, and downstream is worse)
opal has quit [Ping timeout: 258 seconds]
<klys_>
oh you mean like register hash .efi could run without privs
<heat>
no
<heat>
like, firmware modules that say they can drop to user mode get dropped to user mode, the other ones don't, and everything still works
wxwisiasdf has joined #osdev
<wxwisiasdf>
hi
<klys_>
hi
<heat>
hello mainframe person
<wxwisiasdf>
hehe
<clever>
heat: what about arm uefi, what takes the role of the SMM?
<heat>
clever, I don't think it does
<clever>
so there is no real privilege separation between the main uefi, runtime services and the os?
<heat>
I honestly haven't looked much into it, but it probably doesn't, I think it just assumes EL2
<heat>
think so
<geist>
i dont think there's an equivalent to SMM. depends on i guess the feature you're expecting to be implemented in SMM, and then how that'd be done on an ARM device
<heat>
there totally is, secure monitor
gildasio has quit [Ping timeout: 258 seconds]
<clever>
wouldnt EL3 be the closest, at least in terms of isolation and privs?
<heat>
I just don't know if they're actually using it
wxwisiasdf has quit [Client Quit]
<heat>
oh shit sorry, I meant EL1
* heat
commits ARM seppuku
<geist>
yeah, so again what precise feature do you want, and then can tell you probably where that'd be done
<clever>
heat: if uefi runs the os in EL1, then you can never use /dev/kvm
gildasio has joined #osdev
<heat>
so you're confusing me?
<geist>
it's not exactly the same thing. the question is asked backwards. it's not what technology equals the same in arm, it's how is the same feature implemented
<heat>
EL3 = user right?
<klys_>
SMM is something like 32-bit real mode which the cpu only supports in early boot ("before locking the SMRAM register"), which x86 bios roms used to initialize, for example, PCI.
<clever>
geist: as an example, secure efi vars, so certain vars can be set by an os, and other vars are protected
<geist>
clever: good question. it probably wouldn't use EL3, but EL3 could be used to facilitate a trusted OS running in secure EL1
<clever>
geist: for secure efi vars to work, you would need to block off the SPI controller from being used by non-secure states, and then have some EL3 code that can limit what changes you do
<heat>
SMM code is something like variable writing code (installed by firmware), or PS2 emulation code (chipset mandated)
<geist>
that's what i mean: it's not just a series of nested concentric circles so much as EL3 generally facilitating switching between multiple realms
<klys_>
system management mode has been in x86 systems since pentium or so
<clever>
ah yeah, you could have a trusted os in secure EL1, and EL3 is just used to context switch between non-secure EL1 and secure EL1
<geist>
clever: right so that could be done in a secure OS, which would run at secure EL1 or so
<geist>
precisely. that's why i mean it's not a direct map between the worlds. usually EL3 doesn't do much except context switch and implement low power stuff
<clever>
but you will need some help from EL3, to be able to IPC into a secure EL1
<geist>
(ie, PSCI)
<geist>
that's right
<heat>
SBBR says "EL2 or EL1"
<clever>
and your SPI controller needs to be able to reject non-secure requests
<clever>
so the os doesnt just go around your back and re-write the flash directly
<geist>
but yes, re: UEFI the boot phase usually happens in EL2 or EL1. basically the highest mode the cpu is in post EL3
<geist>
and it hands off control to the kernel in the same EL, which makes sense because it's where the .pe file for the secondary loader runs
<clever>
makes sense, run in EL2 if it's available, so the os can be a hypervisor
<geist>
right
<clever>
and fall back to EL1 when the hardware doesn't support it (or you're already a guest)
<bslsk05>
www.rcollins.org: The Secrets of System Management Mode
<geist>
fairly certain i saw that written out in the uefi spec under the arm section
<clever>
yeah, i think i saw similar
<heat>
klys_, that's all documented in the SDM, not secret
[itchyjunk] has quit [Remote host closed the connection]
<heat>
also the UEFI code is public
<klys_>
heat, back when it was written, those were
<geist>
1997! did they even have computers then? wowwwwwwww11
<heat>
i wasn't alive in 1997
<clever>
geist: as for protecting peripherals, when a non-secure access gets rejected, is your only option for the secure peripheral to just return an async external abort? no way for EL3 to trap writes to a peripheral?
<heat>
oldies
<geist>
clever: probably would be up to the system designer
<geist>
i dont think intrinsically there's a mechanism for it
xenos1984 has joined #osdev
<clever>
thats what i thought
<clever>
youve sent me a link before, showing that the axi bus has a secure/non-secure flag on it
<geist>
though in arm64 the async external aborts are a different vector, so maybe el3 can, with one of the control bits, redirect all those to it first?
<geist>
i haven't read that part of the manual that closely
<clever>
and then the system designer can wire that up, so an spi controller acts like an non-existent bus address, to non-secure parties
opal has joined #osdev
<clever>
rpi has done similar, twice
<heat>
x86 also has that AFAIK
<clever>
mmio can optionally be gated on a different axi signal, user/kernel, so userland just cant do mmio
<clever>
and certain secure peripherals have an "allow arm" flag you can set, and if it's clear, the arm just isn't trusted at all (secure and non-secure both rejected)
<geist>
okay yeah this is hilarious. building qemu it just bonks the cpus around 80-90C and they throttle every 5 seconds or so
<geist>
so not enough cooling to deal with this cpu for reals
<heat>
weren't the intel ones decent? or were they the amd ones?
<geist>
it clearly can't keep up with the heat so it throttles a lot under load
<clever>
geist: according to the diagram i just linked, the secure EL2 is also limited to 64bit only, bit strange, but the nature of secure mode, means you have more control over what even runs
<geist>
that's the reason i got it. it utterly breaks the fuchsia topology code
<clever>
i can see why
<clever>
it looks like a big.LITTLE style design?
<geist>
that's the cheapest alder lake you can get with both 'performance' and 'efficiency' cores
<heat>
looks cute
<heat>
really, the cheapest?
<geist>
all the lower end ones just have P cores
<heat>
I would think the low ends would abuse big.LITTLE a lot more
<geist>
you'd think. at least in desktop space
<Matt|home>
alright. im trying to understand this very specific confusing thingy, and what i've concluded is that it's impossible and therefore my computer doesn't exist. somebody please tell me if how i understand this is correct:
<geist>
in laptop space it's a different thing, you get more Es than Ps
<clever>
geist: and how do you make those performance cores sleep, just the regular old x86 idle opcodes like usual? if all performance cores do that, is it enough to massively reduce power draw?
<geist>
but you can't get those standalone, etc
<heat>
clever, x86 power management has evolved far beyond "haha cpu go hlt"
<clever>
heat: ACPI bytecode to poke platform regs then?
<geist>
FWIW the default ubuntu tuning is to just load up the P cores first
<geist>
then spillover to the E ones, which is really anti-useful to be honest
<heat>
ACPI bytecode, MSRs, set some bits so the CPU can manage itself, yadda yadda
<heat>
lots of options
<geist>
sort of against the spirit of it, at least, but that's a reasonable default safe thing to do
<heat>
hlt is by no means your best option really
<heat>
even monitor+mwait can do better than hlt
<geist>
also dont for... yeah monitor/mwait
<geist>
you can hint when you monitor which level of sleepy you want to go to
<geist>
for example
<clever>
ah
<Matt|home>
I'm trying to understand how alignment works, using x86 as an example (32-bit). Now, I have a 32-bit CPU therefore the word size of the computer is 32 bits or 4 bytes. This means two things: #1) If i try to read say, an 8-bit char into a register, the CPU will actually read the entire word size into the register instead. So "mov eax, 0x0" will actually read in 0x0, 1, 2, and 3 into eax
<Matt|home>
is this correct?
<klys_>
yes
<clever>
i believe EL0 can wfe, which lets locks sleep without a syscall?
<clever>
but what about x86, can userland trigger some sleep states on its own?
<heat>
no
<klys_>
and mov al,[dword 0x0] will read in 0x0 only
<Matt|home>
okay. #2: the CPU (at least x86 i believe) can NOT read in an "unaligned" address, in other words, an address not divisible by the word size - so for a 4-byte CPU, 0 is divisible by 4, and 4 is divisible by 4. if you try to read in 0x1 into eax, the CPU physically can not do that. it is impossible
<Matt|home>
is this correct?
<klys_>
presumably you meant to mov eax,[dword 0x0]
<geist>
x86 has no problem at all reading unaligned addresses and has always been able to do it
smach has quit []
<Mutabah>
Well.. the ISA can do it
<klys_>
matt|home, it'll do it in two bus cycles
<Matt|home>
my understanding is that the CPU _actually_ reads in from aligned addresses that have the chunks it requires
<geist>
the caveat is depending on the situation and the core and whatnot, it may take longer
<Mutabah>
the hardware just converts it to two bus cycles (or uses cache)
<geist>
right. it depends on the exact model you're operating under
<heat>
clever, actually sorry, umwait can affect performance levels
<geist>
the gist of it is it has no problem, and depending on which part of the 40-something years of x86 history you're on, it may resort to simple things like 'read over multiple cycles' or more modern stuff where the cache just deals with it
<Matt|home>
for example: i try to read 0x1 , the CPU will actually read in 0x0 into eax, and then the remaining data into another register or something, ebx, and then somehow stitch the relevant data together
<geist>
because it's already read in 32 or 64 bytes of cache anyway
<clever>
ahh, thats what the u prefix is for
<geist>
so it depends on how interested you are in precisely what really happens vs what the model is
<Matt|home>
okay so im confused
<geist>
the model is 'it deals with it' but that's not very satisfying
<clever>
heat: so basically, umwait lets you sleep in a defined power state, until something in a defined virtual address range is modified?
<geist>
say mid 80s era cpus are the most interesting: in that case it does basically what you think
<heat>
clever, yup, or an interrupt
<geist>
it'll read in aligned words off the bus and then pick out the parts you need
<Matt|home>
is the general concept "the address is unaligned, i must read in a bigger chunk of addresses and stitch it together" correct?
<geist>
68ks would do this, x86s of the era would do this
<geist>
and thus that's why it might take an extra cycle. correct
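A software model of what those older CPUs do. It assumes a little-endian 32-bit bus like the 80486's. The 68030 is big-endian, so its shifts would mirror these. Memory is modeled as an array of aligned 32-bit words, and load32 is a hypothetical helper that performs at most two aligned bus reads and stitches the bytes together.

```c
#include <stddef.h>
#include <stdint.h>

/* Memory modeled as aligned 32-bit words: byte address a lives in
 * word a / 4, at byte lane a % 4 (little-endian lanes). */
uint32_t load32(const uint32_t *mem, size_t addr)
{
    size_t word = addr / 4;
    unsigned shift = (unsigned)(addr % 4) * 8;

    uint32_t lo = mem[word];          /* first aligned bus read */
    if (shift == 0)
        return lo;                    /* aligned: one read is enough */

    uint32_t hi = mem[word + 1];      /* second aligned bus read: the extra cycle */
    /* Keep the upper bytes of lo and the lower bytes of hi. */
    return (lo >> shift) | (hi << (32 - shift));
}
```

With memory holding the bytes 11 22 33 44 55 66 77 88, load32 at address 1 returns 0x55443322. That value comes from two reads, not one.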
<Matt|home>
right
<Matt|home>
okay
<clever>
heat: and in the event of an interrupt, it would be serviced in kernel mode, which may also context switch out of you temporarily, and then you exit from the umwait later, with no actual event
<Matt|home>
so here's what im trying to grok. im looking at a header file for a C struct, and it says that it includes additional padding to ensure alignment
<clever>
similar rules to arm wfe i believe, where you can be woke up by unrelated things, and should re-test your atomics and maybe sleep again
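The usage pattern clever describes, as a portable sketch. Whatever the idle primitive is (arm64 wfe, x86 umwait, or a futex wait), returning from it only means "something may have changed", so the condition is re-checked in a loop. Here the idle step is a caller-supplied function pointer, a hypothetical stand-in for the real instruction, which keeps the example runnable anywhere. fake_idle is a test double that simulates spurious wakeups.

```c
#include <stdatomic.h>

/* Hypothetical idle hook: on real hardware this would be wfe, umwait,
 * a futex wait, etc. It may return for unrelated reasons. */
typedef void (*idle_fn)(void *ctx);

/* Wait until *flag becomes nonzero. Any wakeup may be spurious, so the
 * flag is re-tested after each one. Returns the number of wakeups. */
unsigned wait_for_flag(atomic_int *flag, idle_fn idle, void *ctx)
{
    unsigned wakeups = 0;
    while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
        idle(ctx);
        wakeups++;
    }
    return wakeups;
}

/* Test double: spurious wakeups until the fire_after-th call,
 * which delivers the real event by setting the flag. */
struct fake_event {
    atomic_int *flag;
    unsigned calls;
    unsigned fire_after;
};

void fake_idle(void *ctx)
{
    struct fake_event *ev = ctx;
    if (++ev->calls == ev->fire_after)
        atomic_store_explicit(ev->flag, 1, memory_order_release);
}
```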
<geist>
a 68030 or an 80486 is a pretty good example: both are 32-bit machines that can only address 32-bit addresses, but they have the ability to do the appropriate amount of unaligned accesses
<Matt|home>
... how does that help ensure anything. if you create a variable in a program, there's no way to specify that it should be given an aligned address unless you malloc it or something
<Matt|home>
here's the exact wording
<geist>
and their bus protocols have the appropriate control lines to tell the memory subsystem which bytes to write out on a write, for example
<Matt|home>
"All data structures that the file format defines follow the
<Matt|home>
"natural" size and alignment guidelines for the relevant class.
<Matt|home>
If necessary, data structures contain explicit padding to ensure
<Matt|home>
4-byte alignment for 4-byte objects, to force structure sizes to
<Matt|home>
a multiple of 4, and so on."
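What that padding looks like in practice, as a sketch. For ordinary C structs the compiler inserts the padding itself, following the platform ABI. A file format spec instead writes it out as explicit fields, so the on-disk layout doesn't depend on any particular compiler. Both structs below are made-up examples. The offsets in the comments are what common ABIs (x86-64 System V, arm64) produce for a 4-byte-aligned uint32_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Implicit padding: on common ABIs the compiler inserts 3 bytes after
 * tag so value is 4-byte aligned, plus 3 tail bytes so the size is a
 * multiple of 4 (arrays of the struct then stay aligned too). */
struct record {
    uint8_t  tag;      /* offset 0 */
    uint32_t value;    /* offset 4 */
    uint8_t  flag;     /* offset 8, struct size 12 */
};

/* Explicit padding, spelled out the way a file-format spec does it. */
struct record_disk {
    uint8_t  tag;
    uint8_t  pad0[3];
    uint32_t value;
    uint8_t  flag;
    uint8_t  pad1[3];
};
```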
<clever>
heat: could a kernel inspect the interrupted opcode, parse the umwait inputs, and then setup its own mwait, and not even schedule your thread until your condition is met?
<clever>
heat: that seems better then context-switching between a dozen threads, that all are sleeping until something happens
<geist>
this is where the language mandates it. it's in the ABI and the compiler will already align things naturally for you
<clever>
"you sleep for 20ms, then you sleep for 20ms, then you" lol
<Matt|home>
and also, doesn't having to read stuff from unaligned addresses slow things down considerably like that?
<geist>
the ABI says things like 'this size variable shall be aligned on this boundary' and then the compiler emits code as if
<clever>
a sleep scheduler!
<geist>
Matt|home: well, again, depends on precisely what kinda hardware you're on
<klys_>
if the question is how to align your stack before pushing such a structure then good point I have no idea. mebby someone will know tho
<geist>
in the 80s. yep. absolutely, probably.
<heat>
clever, why would it set up its own mwait?
<heat>
umwait already does it
<geist>
nowadays, probably not, except sometimes
<geist>
because modern machines have so much cache and readahead and out of order speculation that it basically hides all of that
<Matt|home>
alright. last question, i think i understand
<clever>
heat: but what if you have 2 threads, that are both umwait'ing in userland, and the kernel scheduler is wasting time trying to run both, and both are just sleeping for their entire time-slice?
<geist>
the model that the cpu fetches say 32 bits at a time is no longer true, since the caches are reading in at least 32 or 64 bytes at a time, at the minimum
<heat>
clever, umwait isn't a general purpose thing
<Matt|home>
from the C standard: basically it's saying a void * and char * will have the same alignment requirements. what does 'requirement' mean, that it has to reside at an aligned address?
<clever>
so the entire time-slice goes to waste
<geist>
so within that aligned run the cpu can already read at arbitrary alignments
<heat>
Matt|home, data types that aren't aligned to their natural alignment are subject to undefined behavior
<klys_>
matt|home, 32-bit alignment means your pointer's low two bits are 00
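The first function below expresses klys_'s low-bits rule as a check. The second shows the usual portable way to read a value from an address that might not be aligned: memcpy into a properly aligned local, instead of dereferencing a cast pointer, which is the undefined behavior heat mentions. is_aligned and read_u32 are illustrative names. On targets that allow unaligned access, compilers typically turn the fixed-size memcpy into a single load.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* An address is aligned to align (a power of two) when its low
 * log2(align) bits are zero, e.g. low two bits 00 for 4-byte alignment. */
bool is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Read a uint32_t from any byte address without undefined behavior:
 * copy the bytes into an aligned local instead of casting the pointer. */
uint32_t read_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```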
<geist>
that's basically the ABI per architecture and language
<geist>
it specifies precisely what the alignment requirements are
<clever>
heat: yeah, it seems like a userland library would need to balance its use of umwait, vs a syscall like futex
<geist>
in this case it means '8 bits'
<geist>
because on x86 chars are 8 bits wide
<heat>
clever, I think umwait is explicitly anti-scheduler and anti-kernel - super special purpose
<geist>
arm64 has its own ABI, riscv has its own ABI (application binary interface)
<Matt|home>
heat : so if an 'int' is 32 bits it needs to reside at a 32-bit aligned address is what that means?
<geist>
and it specifies precisely how this stuff works, so different compilers and linkers can interoperate
<geist>
yes
<heat>
yes, unless you tell the compiler it shouldn't need to (and this is compiler-specific)
<heat>
you can also tell the compiler you want stronger alignment requirements - also compiler specific
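Both directions heat describes, sketched for GCC/Clang. Weakening alignment uses the compiler-specific __attribute__((packed)). For strengthening it, C11 also has the standard _Alignas. packed_hdr, line_buf and get_line_buf are made-up names. Members of a packed struct may be misaligned, so taking a member's address and dereferencing it as a plain uint32_t * brings the undefined-behavior problem back.

```c
#include <stdint.h>

/* GCC/Clang extension: no padding at all, so value sits at offset 1
 * and the struct is 5 bytes. Accesses through the struct itself are
 * compiled as unaligned-safe loads/stores. */
struct __attribute__((packed)) packed_hdr {
    uint8_t  tag;
    uint32_t value;
};

/* Standard C11: ask for stronger-than-natural alignment, here a 64-byte
 * boundary (one cache line, assuming 64-byte lines). */
static _Alignas(64) unsigned char line_buf[64];

/* Expose the buffer so callers can inspect its address. */
unsigned char *get_line_buf(void)
{
    return line_buf;
}
```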
<geist>
for the most part, sane architectures say that variables of a particular size (usually a power of 2) should be in memory at at least a 'natural alignment'
<Matt|home>
ok. in general when im writing non-OS code do i need to give a damn about alignment outside of like, mmap or malloc or whatever?
<geist>
ie, an 8 byte word should be at 8 byte boundaries
<heat>
yes
<heat>
don't fuck around with pointers too much
<geist>
also in general i should point out that malloc in particular has a requirement in the ABI for alignment
<heat>
also be careful if you want the best performance, or are writing some optimized routines
<geist>
for example on x86-64 it should always emit pointers at least 16 byte aligned
<geist>
any more than that is irrelevant (unless you're using memalign), but you should never hand out pointers aligned to less than 16 bytes
<clever>
geist: and i suspect that kind of big alignment bit me, when porting a 3d driver
<clever>
the code worked on linux without any care for alignment
<geist>
usually the 16 byte alignment is so that you can fit a 128 bit vector without explicitly needing to memalign
<clever>
but when porting it to LK, it suddenly broke, because LK's allocs were less aligned
<clever>
ah wait no, it was in .rodata, so the linker was to blame
<geist>
but in the case of arm64 it loves 16 byte alignment, the stack pointers are 16 byte mandated, etc
<clever>
and partially malloc on linux, for giving me a false sense of stability
<geist>
x86-64 is a little looser, but it still wants 16 byte aligned mallocs
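A quick way to check the malloc guarantee geist mentions. C11 promises that malloc returns memory suitably aligned for any fundamental type, i.e. at least _Alignof(max_align_t). That comes out to 16 on x86-64 and arm64 Linux, though this is a property of the platform and C itself doesn't fix the number. For anything stronger, C11's aligned_alloc plays the role of memalign. alignment_of is an illustrative helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Largest power of two dividing the address: the alignment it actually has. */
size_t alignment_of(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    return (size_t)(a & -a);   /* isolate the lowest set bit */
}
```

Note that C11 originally required aligned_alloc's size to be a multiple of the alignment, so a request like aligned_alloc(64, 128) is the safe shape.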
<Matt|home>
jesus christ optimizers are stupid crazy smart nowadays
<klys_>
our savior
<geist>
hah i read exactly that too
terrorjack has joined #osdev
<geist>
jesus christ superoptimizer
<klys_>
see also: -fno-builtin
<klys_>
as mentioned earlier
<Matt|home>
https://pastebin.com/raw/WNuBJgqi <-- all that junk was thrown out and just replaced with printf("%d", 3); the damn thing just knew it was printing 3 at the end of the day
<clever>
Matt|home: const-expr, it can do it all at compiletime
<geist>
yeah that's always annoying when trying to get it to emit code
<geist>
but constant folding is definitely one of the best
<clever>
you can also tag certain functions as being pure and constexpr, to force it to do that more
<Matt|home>
well i had it on optimized for max speed just to see what the compiler was doing
<Matt|home>
just for fun
<geist>
or mark say mystruct as volatile
<geist>
will force it to emit all those instructions
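A sketch of geist's volatile trick. The pastebin isn't reproduced here, so the struct and values below are hypothetical stand-ins. With plain locals an optimizing compiler typically folds all of this down to the constant 3. Making the struct volatile obliges it to actually perform every member load and store written in the source.

```c
/* Hypothetical stand-in for the struct in the pastebin. */
struct mystruct {
    int a, b, c;
};

/* Without volatile, -O2 typically reduces this whole function to
 * "return 3". With volatile, each member read and write must happen. */
int sum_fields(void)
{
    volatile struct mystruct s = { 1, 2, 0 };
    s.c = s.a + s.b;   /* two volatile loads, one volatile store */
    return s.c;        /* one more volatile load */
}
```

zid's alternative (take the struct from elsewhere, e.g. as a parameter from another translation unit) achieves the same thing without volatile, since the compiler can no longer see the values.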
<heat>
you can't even disable constant folding
<heat>
it's required by the compiler
<geist>
but yeah, that's good stuff
<zid>
just return s.c instead
<zid>
and pass in the struct from elsewhere
<zid>
that's more general case advice though, vol works easier here
<Matt|home>
i am now convinced that the engineers who invented the x86 CPU are all lunatics and raving madmen. just me though
<heat>
they are
<heat>
the lead for the i486 is now intel's CEO
<heat>
raging millionaire lunatic i tell ya
<heat>
(although a very smart one tbh)
<Matt|home>
actually im kind of impressed.. this really is just a bus issue that x86 seems to get around
<heat>
x86 designs get around everything
<heat>
they've been built up over 30 years worth of CPU design
heat has quit [Ping timeout: 260 seconds]
wxwisiasdf has joined #osdev
<wxwisiasdf>
Hello, tips & advice for running 16-bit programs in long mode?
<Mutabah>
wxwisiasdf: use an emulator
<wxwisiasdf>
Mutabah: :^)
<Mutabah>
once in IA-32e mode (i.e. able to run 64-bit code), v8086 mode is no longer available
<wxwisiasdf>
yah, but won't switching from long->protected->v8086 solve this issue?
<Mutabah>
you could clear EFER.LME
<Mutabah>
but that's quite an invasive operation
<Mutabah>
What is your goal?
<wxwisiasdf>
how so?
<wxwisiasdf>
goal is to run 16-bit apps on 64-bit, as if it was some sort of MS-DOS 64-bit
<wxwisiasdf>
so you could theoretically run 1000 instances of Turbo C
<CompanionCube>
you can't, DOS needs segmentation which isn't a thing in long mode
<Mutabah>
A 32-bit kernel with PAE would probably be easier
<wxwisiasdf>
Fair enough
<wxwisiasdf>
PAE is 48 bits, right?
<Mutabah>
When you clear LME, the meaning of the GDT/IDT changes, and you cannot run 64-bit code until it's enabled again
<Mutabah>
It's the same as moving from PMode back to RMode
<wxwisiasdf>
that's fine by me
<CompanionCube>
Mutabah: for real obscure hacks there's always SVM and 'paged real mode'
<Mutabah>
So many nice hacks :)
<wxwisiasdf>
SVM?
<CompanionCube>
AMD's x86 virtualisation extensions
<wxwisiasdf>
ah
<wxwisiasdf>
well i guess i will be stuck with i686
<Mutabah>
Using virtualisation might work for you
<CompanionCube>
(though for that the manual explicitly says that if in paged real mode the hypervisor doesn't manage page faults that's UB)
<wxwisiasdf>
right...
<klys_>
can vmx page real mode?
<CompanionCube>
iirc vmx went for a different solution
rorx has quit [Ping timeout: 252 seconds]
<klys_>
so it can't? your hypervisor isn't in pmode nor long mode? naturally the vm could be in real mode
<klys_>
something wrong?
<CompanionCube>
VMX went with regular real mode iirc
<CompanionCube>
originally it couldn't do real mode *at all* and they had to add it early on
<klys_>
so what does that mean? explain
<klys_>
the hypervisor can't page while the vm is in real mode?
<geist>
i think it just means that initially vmx guests were protected mode and above. didn't allow running real mode guests
<geist>
that got fixed later
<klys_>
right just the idea that paged real mode isn't a thing with vmx seems weird
<sonny>
The context is an operating system environment. I don't understand its purpose ... is it to be distinguished from the kernel?
<moon-child>
sonny: I'm busy now, but I suggest going into #proglangdesign and asking ChoHag
<sonny>
ok
<moon-child>
the shell is a program. It exists completely in userspace
<Mutabah>
The "shell" in general terms is the top-level interface application for the OS
<klange>
"Shell" is an overloaded term. The "Unix shell" is a programming language focused around executing other programs and constructing pipelines. There are many implementations of similar languages that may or may not be compatible with the standardized POSIX shell, but perform the same general function.
<Mutabah>
E.g. In windows, the shell is provided by explorer.exe, and is the taskbar
<Mutabah>
In command-line environments, the shell is the interactive command interpreter
<Mutabah>
but then you get into "shell scripts", where it's no longer interactive
<klange>
"Shell" may also just mean "interface to the OS". ^ Sometimes Explorer and/or DWM is referred to as a "shell", and there's Gnome-Shell that stuck it in their name.
<Mutabah>
so... complex question :)
<kazinsal>
And then there's cases where the command interpreter isn't called the shell, but there is also a different component in the OS called the shell, such as Cisco IOS
<sonny>
I mean suppose you did not have a split between privileged and non-privileged code, does that change the definition too?
<kazinsal>
(the command interpreter is called the IOS Exec, then there's tclsh which is the Tcl Shell, which is completely separate and most people don't even use it)
<CompanionCube>
IOS does TCL? huh
riverdc has quit [Quit: quitting]
<klange>
Every time I learn something about IOS, I think maybe whoever at Apple wanted to cop that name might well have had the right idea in trying to erase the existing user of it...
riverdc has joined #osdev
GreaseMonkey has quit [Remote host closed the connection]
<kazinsal>
For maximum comedy, on certain newer IOS-XE platforms, you can use Tcl to spawn a docker instance running a Linux-based Tcl implementation inside that
<Mutabah>
klange: It has one good thing - it's better than CatOS
<Mutabah>
sonny: The concept still stands - the shell is the user interface
<Mutabah>
even if it's just a component of a much larger binary
<sonny>
ok
<sonny>
so I just can't shake the idea that if a programming language is an interface for an OS, you should just end up with a small kernel and be able to implement the rest at a high level of abstraction
<dminuoso>
Programming languages are not an interface for an OS, generally.
<dminuoso>
They are an interface for execution resources
<dminuoso>
Things like "standard libraries" are usually an interface for the operating system
<klange>
"interface to the OS" in the sense of "thing the user interacts with to make the OS do stuff"
<klange>
Where "stuff" is quite often "launch other applications".
wxwisiasdf has quit [Remote host closed the connection]
<sonny>
I get it now
<sonny>
language is the interface, everything underneath it is the OS
<klange>
If you want to talk about what constitutes "the OS" that is a completely different discussion.
<dminuoso>
Or, on many of today's machines, it's particularly system calls that present the interface to an operating system.
<dminuoso>
(With standard libraries often directly or indirectly using system calls)
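To make the "standard library sits on top of system calls" point concrete, here is a minimal Linux-specific sketch that makes the same output request two ways: through libc's `write()` wrapper and through libc's generic `syscall()` helper with the raw `SYS_write` number. It assumes glibc (or another libc that provides `syscall()`), and the function names are hypothetical.

```c
/* Linux-specific sketch; assumes a libc that provides syscall(). */
#define _GNU_SOURCE
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Goes through libc's named wrapper, which issues the write system call. */
ssize_t write_via_libc(int fd, const char *msg) {
    return write(fd, msg, strlen(msg));
}

/* Skips the named wrapper and traps into the kernel via the generic
 * syscall() helper using the raw system call number SYS_write. */
ssize_t write_via_raw_syscall(int fd, const char *msg) {
    return (ssize_t)syscall(SYS_write, fd, msg, strlen(msg));
}
```

Calling either one with `STDOUT_FILENO` prints the same text; the only difference is which layer you enter through.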
<dminuoso>
But yeah, it really depends on what we mean by "interface", or "OS"
<klange>
But, again, that's not the 'interface' presented to the 'user'... though... don't give me ideas for a shell that just lets you type up syscalls as the primary mechanism for doing things :)
<klange>
And... I don't think I would consider syscalls an "interface to the OS" even for programs, even if we use a wider definition of them that doesn't directly map to "the thing you do with, e.g., the 'syscall' instruction on x86-64".
<dminuoso>
"the user" is also a matter of perspective *shrugs*
<klange>
The person who sits down in front of the computer and wants it to do something is the user.
<dminuoso>
Sure, if we define that to be user then you're right.
<klange>
A shell is the thing that facilitates the user doing stuff, but isn't itself the stuff the user wants to do.
<dminuoso>
A user could also be the person implementing a programming language
<dminuoso>
I'm just pointing out different interpretations, not that mine is necessarily the correct one
<klange>
I'm not sure what you're pointing out interpretations for, though, because we're talking about shells.
<dminuoso>
Ultimately I think there are many user interfaces involved simultaneously
<dminuoso>
On a modern machine you have your physical interface (keyboard, monitor), there's X11/Wayland in between, a window manager, a terminal emulator, tty-stuff in between, then you get the shell.
<klange>
Contributors to Wiktionary have given it this definition:
<klange>
"An operating system software user interface, whose primary purpose is to launch other programs and control their interactions; the user's command interpreter. Shell is a way to separate the internal complexity of the implementation of the command from the user. The internals can change while the user experience/interface remains the same."
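A minimal POSIX C sketch of the "launch other programs" core of that definition, assuming `fork`, `execvp`, and `waitpid` are available. `run_command` is a hypothetical name, and the naive whitespace tokenizer ignores the quoting, pipes, redirection, job control, and builtins that any real shell needs.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_ARGS 32

/* Splits `line` in place on whitespace, runs the program (with PATH
 * lookup), and returns its exit status, or -1 if it could not be
 * started/waited for or did not exit normally. */
int run_command(char *line) {
    char *argv[MAX_ARGS + 1];
    int argc = 0;
    for (char *tok = strtok(line, " \t\n"); tok != NULL && argc < MAX_ARGS;
         tok = strtok(NULL, " \t\n")) {
        argv[argc++] = tok;
    }
    argv[argc] = NULL;
    if (argc == 0) return 0; /* empty line: nothing to launch */

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execvp(argv[0], argv); /* only returns on error */
        _exit(127);            /* conventional "command not found" status */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, a writable buffer holding `false` yields 1, and an unknown command name yields 127 from the child's `_exit`.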
<dminuoso>
I don't subscribe to the notion that there is always a single authoritative definition for most things - especially if that definition is owned and controllable via the wikimedia foundation.
<dminuoso>
But anyway
<kazinsal>
That is a fascinating, unique, and dramatically incorrect take on the role of the wikimedia foundation in the maintenance of wiktionary and I thank and hate you for it
sonny has quit [Quit: Client closed]
<klange>
sonny has left us, but I'm game for continuing on this topic.
<dminuoso>
kazinsal: The wikimedia foundation is responsible for the controls in place. It gives room for editors to interpret notions of notability or verifiability however they want it.
<dminuoso>
Well and more to the point, admins.
<klange>
Wiktionary has its own inclusion criteria, separate and unrelated to Wikipedia's, and they are much more clear, and, imo, forgiving.
<kazinsal>
And Wikipedia's notability criteria are important because otherwise you get TVTropes.
bauen1 has quit [Ping timeout: 252 seconds]
bauen1 has joined #osdev
zaquest has quit [Remote host closed the connection]
<sbalmos>
arm is quite the obstructionist, always throwing up barriers to productivity
<geist>
isb is basically 'let all instructions before and after this complete'
<geist>
it tends to be needed in places where on x86 it was implicitly a synchronizing instruction
<mjg>
geist: it's not my code and i noted i don't know if what it is doing makes any sense
<mjg>
geist: it is for armv7 and used around context switching
<mjg>
i only know about it because the compiler prints the warning
<mjg>
it definitely works well enough to have operational multiuser env
xenos1984 has joined #osdev
<geist>
ah yeah. in general if you see a msr/mrs instruction with an isb afterwards, it's simply because you're supposed to
<geist>
think of it this way: the msr instruction which writes to a control register is not synchronized with the pipeline on arm (unlike on x86)
<geist>
so the isb is your way of enforcing that later instructions should operate under whatever changes the msr brought about
<geist>
*most* msrs should require an isb, but some don't require it, so it lets you decide
<j`ey>
geist: like the new self synchronising counter regs!
<j`ey>
CNT(P,V)CTSS_EL0
* geist
nods
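A sketch of the read-side counterpart to the MSR-then-ISB pattern, assuming an AArch64 target where EL0 access to `CNTVCT_EL0` is enabled (Linux enables it): the `isb` keeps the counter read from being performed ahead of earlier instructions. The self-synchronising `CNTVCTSS_EL0` mentioned above (FEAT_ECV) is meant to make that `isb` unnecessary, but it is not used here because older assemblers may not accept the register name. On other architectures the hypothetical `read_counter` just falls back to `clock_gettime`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

uint64_t read_counter(void) {
#if defined(__aarch64__)
    uint64_t v;
    /* isb: the following mrs is only executed after everything before
     * the barrier has completed, so the counter is not read early. */
    __asm__ volatile("isb\n\tmrs %0, cntvct_el0" : "=r"(v) : : "memory");
    return v;
#else
    /* Fallback so the sketch builds and runs off AArch64. */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}
```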
<rpnx_>
I am getting a strange error trying to call assembly from C
<geist>
and the error is....?
<j`ey>
guess!
<zid>
Is it 7
<geist>
i'm guessing lack of extern "C"?
<rpnx_>
warning: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: archive library: libklib.a the table of contents is empty (no object file members in the library define global symbols)
<geist>
hmm, okay looks like it has some exported global symbols
<geist>
is `ar qc` the right invocation?
<geist>
i don't know off the top of my head
<klange>
I find the default Make preference is `rv`, and the `v` is just verbose, so really it's just `r` loudly.
biblio has quit [Quit: Leaving]
<geist>
looks like 'q' is basically the equivalent of r
<geist>
except it does a 'quick' append and doesn't check whether it's adding duplicates. maybe that's the issue here?
<geist>
rpnx_: can you switch the args to rv? if you already have an existing .a file maybe the build just keeps appending it and making a larger and larger .a file?
<geist>
maybe do a list on the existing .a file and see
<geist>
perhaps 'q' is just making a larger and larger .a file and then ranlib is throwing up on it
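Besides the `ar q` theory, one common cause of ranlib's "no object file members in the library define global symbols" warning when calling assembly from C is that the routine's label is not exported with `.globl`, or, on Mach-O (macOS), lacks the leading underscore that C symbol names get there. Below is a hedged single-file sketch using a top-level `asm` block; `add_asm` and the macros are hypothetical, and only x86-64 (System V ABI) and AArch64 are covered. Running `nm` on the archive is a quick way to check whether the symbol shows up as a defined text symbol (`T`).

```c
#include <stdint.h>

/* Mach-O prefixes C-visible symbol names with an underscore. */
#if defined(__APPLE__)
#define ASM_SYM(name) "_" #name
#else
#define ASM_SYM(name) #name
#endif

#if defined(__x86_64__) && !defined(_WIN32)
#define ADD_BODY "    leal (%rdi,%rsi), %eax\n    ret\n" /* SysV: args in edi/esi */
#elif defined(__aarch64__)
#define ADD_BODY "    add w0, w0, w1\n    ret\n"         /* AAPCS64: args in w0/w1 */
#else
#error "this sketch only covers x86-64 (System V) and AArch64"
#endif

/* .globl is what makes the symbol visible to the linker and to ranlib. */
__asm__(".text\n"
        ".globl " ASM_SYM(add_asm) "\n"
        ".p2align 4\n"
        ASM_SYM(add_asm) ":\n"
        ADD_BODY);

/* C-side prototype; a C++ caller would also need extern "C". */
int32_t add_asm(int32_t a, int32_t b);
```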
<wxwisiasdf>
to hw context switch i just `int <num of task gate>`, right?
<geist>
wxwisiasdf: you're using hardware task switching? why so?
<wxwisiasdf>
because it's fun
<geist>
so you explicitly know it's very much deprecated/slow/etc?
<wxwisiasdf>
yep
<geist>
okay then. yeah i think that's basically what you do. you can also put the task gate in a LDT and then switch LDTs as well, i think
<wxwisiasdf>
fair enough, just wanted to know that
<geist>
that's where the LDT stuff starts to make sense, because it lets you switch contexts to a whole new set of segments
<wxwisiasdf>
but there can only be 1 tss per processor, right?
<geist>
no not at all
<wxwisiasdf>
enlighten me, then
<kazinsal>
if you're in protected mode you can use multiple TSSes
<geist>
well, i'd kinda rather not, because it's a huuuuuuge topic
<kazinsal>
if you're in long mode you can't use hardware task switching so stop it
<geist>
but the '1 tss per processor' is the whole 'minimum thing you need if you're not using hw task switching'
<wxwisiasdf>
i am in protected mode and have no plans to do x86_64
<geist>
but if you're actually using task switching then TSS has a completely different relationship
<geist>
in that case there's a TSS per *thread*
genpaku has quit [Remote host closed the connection]
<wxwisiasdf>
geist: ah, that makes it clearer
<geist>
it's basically the save spot for the hardware to load/store the thread (task) state
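A sketch of the 32-bit protected-mode TSS layout as given in the Intel SDM, i.e. the per-task save area described above, with compile-time checks on its size and a couple of offsets. The field names and the `tss32_init_task` helper are hypothetical; the helper only fills in an initial state for a task that has never run.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct tss32 {
    uint32_t link;                   /* previous task selector (low 16 bits) */
    uint32_t esp0, ss0;              /* ring 0 stack */
    uint32_t esp1, ss1;              /* ring 1 stack */
    uint32_t esp2, ss2;              /* ring 2 stack */
    uint32_t cr3;                    /* page directory of this task */
    uint32_t eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint32_t es, cs, ss, ds, fs, gs; /* selectors in low 16 bits */
    uint32_t ldtr;                   /* LDT selector in low 16 bits */
    uint16_t trap;                   /* bit 0: debug trap on task switch */
    uint16_t iomap_base;             /* offset of the I/O permission bitmap */
};

_Static_assert(sizeof(struct tss32) == 104, "32-bit TSS is 104 bytes");
_Static_assert(offsetof(struct tss32, cr3) == 28, "cr3 at offset 0x1C");
_Static_assert(offsetof(struct tss32, iomap_base) == 102, "I/O map base at 0x66");

/* Hypothetical helper: a fresh task starts at `entry` with the given
 * stack and address space the first time it is switched to. */
void tss32_init_task(struct tss32 *t, uint32_t entry, uint32_t stack_top,
                     uint32_t page_dir, uint16_t code_sel, uint16_t data_sel) {
    memset(t, 0, sizeof *t);
    t->eip = entry;
    t->esp = stack_top;
    t->cr3 = page_dir;
    t->eflags = 0x202;               /* IF set; bit 1 always reads as 1 */
    t->cs = code_sel;
    t->ss = t->ds = t->es = t->fs = t->gs = data_sel;
    t->iomap_base = sizeof *t;       /* no I/O bitmap: offset past the limit */
}
```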
<wxwisiasdf>
per thread? including "sleeping" ones?
<geist>
yes
<wxwisiasdf>
aaah, okay makes sense now
<geist>
so one strategy is to have a huge GDT (up to 8192 entries) with a TSS per thread, that's probably the simplest thing to do, dynamically assign a GDT slot per thread, and allocate/free a TSS as you create them
<geist>
or i think you can put them in an LDT (or maybe not, i forget the details)
<geist>
possibly TSSes can't live in LDTs; there are some limitations to what you can put in an LDT
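A sketch of the GDT side of the one-TSS-per-thread strategy: encoding a 32-bit TSS descriptor and turning a GDT slot index into a selector. Per the Intel SDM, TSS descriptors may only be placed in the GDT, while task gates may sit in the GDT, an LDT, or the IDT. The function names are hypothetical, and allocating and freeing slots per thread is not shown.

```c
#include <stdint.h>

#define GDT_MAX_ENTRIES 8192 /* 13-bit selector index */

/* Type 0x9 = available 32-bit TSS (the CPU marks it busy, 0xB, while
 * running), S = 0 (system descriptor), P = 1, DPL in bits 45-46. */
uint64_t tss_descriptor(uint32_t base, uint32_t limit, unsigned dpl) {
    uint64_t access = 0x80u | ((dpl & 3u) << 5) | 0x9u;
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);               /* limit 15:0 */
    d |= (uint64_t)(base & 0xFFFFu) << 16;          /* base 15:0 */
    d |= (uint64_t)((base >> 16) & 0xFFu) << 32;    /* base 23:16 */
    d |= access << 40;                              /* P, DPL, S, type */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;    /* limit 19:16 */
    /* G/D/L/AVL flags left 0: byte-granular limit */
    d |= (uint64_t)((base >> 24) & 0xFFu) << 56;    /* base 31:24 */
    return d;
}

/* Selector for GDT slot `index` (TI = 0 selects the GDT, RPL = 0). */
uint16_t gdt_selector(unsigned index) {
    return (uint16_t)(index << 3);
}
```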
<wxwisiasdf>
fair enough
<geist>
i piddled with using hardware task switching once for kicks, mostly to time it, and it is indeed slow
genpaku has joined #osdev
<geist>
i remember on a bulldozer it was a few orders of magnitude slower, as if the microcode was serializing the hardware on every micro-op
<geist>
on a pure 386sx it was basically about as fast as the equivalent set of instructions to do precisely the same thing, as expected. something like 300 cycles, and if you added up all the existing instructions it'd be about the same
bas1l has joined #osdev
<geist>
the real slow part being the reloading of 6 segment registers and cr3, each of which on a 386 is like 25 cycles
<geist>
the big advantage of it vs a pure sw task switch is you can jump directly between two ring3 tasks without a corresponding kernel switch
<geist>
though of course you'd have to arrange for that to be allowable, but kinda neat for some sort of 'call into another process directly' style call gate
nickster7 has joined #osdev
buffet6 has joined #osdev
gjnoonan_ has joined #osdev
tom5760_ has joined #osdev
noeontheend_ has joined #osdev
fluix_ has joined #osdev
patwid_ has joined #osdev
jleightcap_ has joined #osdev
milesrout_ has joined #osdev
alethkit_ has joined #osdev
exec64_ has joined #osdev
sjs has quit [Killed (tantalum.libera.chat (Nickname regained by services))]
LostFrog has joined #osdev
onering has joined #osdev
matthews2 has joined #osdev
sjs has joined #osdev
Bitweasil- has joined #osdev
ozarker_ has joined #osdev
Piraty_ has joined #osdev
ZipCPU_ has joined #osdev
dennisschagt_ has joined #osdev
dzwdz1 has joined #osdev
dormito_ has joined #osdev
klys has joined #osdev
colona_ has joined #osdev
Yukara has joined #osdev
corank_ has joined #osdev
alethkit has quit [Ping timeout: 244 seconds]
cln has quit [Ping timeout: 244 seconds]
fluix has quit [Ping timeout: 244 seconds]
jleightcap has quit [Ping timeout: 244 seconds]
kkd has quit [Ping timeout: 244 seconds]
milesrout has quit [Ping timeout: 244 seconds]
wxwisiasdf has quit [Ping timeout: 244 seconds]
dormito has quit [Ping timeout: 244 seconds]
PotatoGim has quit [Ping timeout: 244 seconds]
mahk has quit [Ping timeout: 244 seconds]
dzwdz has quit [Ping timeout: 244 seconds]
dennis95 has quit [Ping timeout: 244 seconds]
noeontheend has quit [Ping timeout: 244 seconds]
ddevault has quit [Ping timeout: 244 seconds]
exec64 has quit [Ping timeout: 244 seconds]
tom5760 has quit [Ping timeout: 244 seconds]
LambdaComplex has quit [Ping timeout: 244 seconds]
kristinam has quit [Ping timeout: 244 seconds]
elastic_dog has quit [Ping timeout: 244 seconds]
zaquest has quit [Ping timeout: 244 seconds]
ZipCPU has quit [Ping timeout: 244 seconds]
Piraty has quit [Ping timeout: 244 seconds]
PapaFrog has quit [Ping timeout: 244 seconds]
sbalmos has quit [Ping timeout: 244 seconds]
shikhin has quit [Ping timeout: 244 seconds]
theobjectivedad has quit [Ping timeout: 244 seconds]
dennisschagt has quit [Ping timeout: 244 seconds]
catern has quit [Ping timeout: 244 seconds]
buffet has quit [Ping timeout: 244 seconds]
Bitweasil has quit [Ping timeout: 244 seconds]
colona has quit [Ping timeout: 244 seconds]
nickster has quit [Ping timeout: 244 seconds]
Beato has quit [Ping timeout: 244 seconds]
basil has quit [Ping timeout: 244 seconds]
klys_ has quit [Ping timeout: 244 seconds]
patwid has quit [Ping timeout: 244 seconds]
gjnoonan has quit [Ping timeout: 244 seconds]
ozarker has quit [Ping timeout: 244 seconds]
corank has quit [Ping timeout: 244 seconds]
matthews has quit [Ping timeout: 244 seconds]
mjg has quit [Ping timeout: 244 seconds]
meisaka has quit [Ping timeout: 244 seconds]
Matt|home has quit [Ping timeout: 244 seconds]
oldtopman has quit [Ping timeout: 244 seconds]
fluix_ is now known as fluix
alethkit_ is now known as alethkit
jleightcap_ is now known as jleightcap
milesrout_ is now known as milesrout
noeontheend_ is now known as noeontheend
exec64_ is now known as exec64
tom5760_ is now known as tom5760
nickster7 is now known as nickster
patwid_ is now known as patwid
gjnoonan_ is now known as gjnoonan
ZipCPU_ is now known as ZipCPU
sbalmos1 has joined #osdev
buffet6 is now known as buffet
sbalmos1 is now known as sbalmos
dennis95 has joined #osdev
Matt|home has joined #osdev
theobjectivedad has joined #osdev
elastic_dog has joined #osdev
wxwisiasdf has joined #osdev
zaquest has joined #osdev
ddevault has joined #osdev
LambdaComplex has joined #osdev
freakazoid333 has quit [Ping timeout: 255 seconds]
wxwisiasdf has quit [Quit: Lost terminal]
netbsduser has quit [Remote host closed the connection]
SGautam has quit [Quit: Connection closed for inactivity]
Gooberpatrol66 has quit [Quit: Leaving]
cheapie has left #osdev [#osdev]
dormito_ is now known as dormito
cheapie has joined #osdev
Yukara is now known as meisaka
Gooberpatrol66 has joined #osdev
mahk has joined #osdev
pretty_dumm_guy has joined #osdev
smach has quit []
Burgundy has joined #osdev
FreeFull has quit []
SpikeHeron has quit [Quit: WeeChat 3.6]
pretty_dumm_guy has quit [Quit: WeeChat 3.5]
nyah has quit [Ping timeout: 268 seconds]
frkzoid has joined #osdev
carbonfiber has quit [Quit: Connection closed for inactivity]