<bslsk05>
hackaday.com: It’s A 486 Computer, On A Breadboard | Hackaday
<geist>
i started to design something kinda similar to a 68030 i have floating around, but should consider adding more support chips onboard like that one does
<zid>
yea it'd be rad to get an old cpu, but one that isn't typically made into a home computer like a z80 or 6502
<zid>
and hook it up
<zid>
486 is a good pick
<geist>
yah by 486 or 030 era it starts to get fairly complicated in that there are a fair amount of control bits and whatnot, but it's still mostly just an A + D bus
<geist>
i think after that it starts getting pretty difficult to deal with the state without an ASIC level circuit to decode things
<linearcannon_>
maybe not ASIC, but at least FPGA
<linearcannon_>
i'm pretty sure you could manage with an FPGA all the way up to P3 era, at least
<zid>
dereferencing a function has no meaning though
<heat>
not that I would expect some other behavior except maybe erroring out
<zid>
they just decided it gives you back a function pointer, rather than an error, yea
<heat>
i guess the semantics of function vs function pointer are all pretty loose
<heat>
like funcptr = g; or funcptr = &g; doing the same thing
<moon-child>
in particular a function isn't an object
<moon-child>
so the implementation can kinda make it be whatever it wants
<zid>
It stops you having to put *f everywhere if 'f' is also 'dereference yourself and give your own pointer'
<moon-child>
in wasm apparently a function pointer is an index into a big global array of functions
<zid>
which also causes the ******f thing to work
<zid>
they have a special class of grammar called 'function designators' if you wanna look it up
<zid>
The unary * operator denotes indirection. If the operand points to a function, the result is a function designator
<geist>
i wonder if *funcpointer() is really trying to dereference whatever the function returns
<zid>
being the *f part
<zid>
geist: yea that's why you need the ()
<zid>
f() is shorthand for (*f)();
<zid>
not *f()
<zid>
a function designator with type "function returning type" is converted to an expression that has type "pointer to function returning type"
<zid>
which is sort of recursive and weird, if you * a fp you get a function designator, and a function designator is 'a pointer to function'
<zid>
so it just decays ten times if you do ten *
<zid>
then has a dangling 11th conversion that () eats
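
A minimal C sketch of the above (illustrative, not from the channel): every * applied to a function pointer yields a function designator, which immediately converts back to a pointer, so any number of *s is fine, and f and &f give the same pointer.

    #include <stdio.h>

    static void greet(void) { puts("hello"); }

    int main(void) {
        void (*f)(void) = greet;    /* function designator decays to a pointer */
        void (*g)(void) = &greet;   /* explicit & yields the same pointer */

        f();                /* shorthand for (*f)() */
        (*f)();             /* * gives a designator, which converts back */
        (**********f)();    /* each extra * just decays again */
        (void)g;
        return 0;
    }
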
<geist>
OCMPUERS!
<heat>
take this rust fanboys
<heat>
C!!!!!!!!!!!!!
<zid>
They could have just said that function objects can't be evaluated, and it would have all errored instead
<moon-child>
they could also have had bounds checking
<moon-child>
and garbage collection
<moon-child>
but instead they had buffer overflows and use-after-frees
<moon-child>
I don't think these c-specifying-thingy-people are very smart
<FireFly>
function pointers and object pointers also potentially might not be comparable or castable to one another, which gets fun with void pointers, IIRC
<FireFly>
unless that was changed in a more recent C spec..
<zid>
yea dlsym is illegal
<zid>
I think function pointers are allowed to go back and forth again same as void * pointers
<zid>
as long as you don't try to do anything with the fake
<moon-child>
so
<moon-child>
the c standard doesn't guarantee that you're allowed to go back and forth between function pointers and void pointers
<moon-child>
but it doesn't say you _can't_ either
<moon-child>
so it's ok for posix to mandate that you can
<moon-child>
so dlsym is legal
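
A hedged sketch of the dlsym idiom in question, assuming a glibc-style libm.so.6: ISO C leaves the void*-to-function-pointer conversion unspecified, but POSIX mandates it, so the explicit cast is fine on POSIX systems (link with -ldl):

    #include <dlfcn.h>
    #include <stdio.h>

    int main(void) {
        void *h = dlopen("libm.so.6", RTLD_LAZY);
        if (!h)
            return 1;

        /* POSIX guarantees this conversion; plain ISO C does not */
        double (*cosine)(double) = (double (*)(double))dlsym(h, "cos");
        if (cosine)
            printf("cos(0) = %f\n", cosine(0.0));

        dlclose(h);
        return 0;
    }
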
<mjg>
the only thing guaranteed by the c standard is that you are going to get shafted by it
<moon-child>
indeed
<mjg>
little known fun fact: the official standard pdf has a hidden message embedded in it
<mjg>
which says: LOL
<geist>
c++ of course has the whole virtual method pointer thing which is definitely not compatible with void *
<mjg>
is there a public regression test suite for the scheduler?
<mjg>
not asking about some adhoc benchez
<moon-child>
'the' scheduler?
<mjg>
THE motherfucker
<heat>
mjg, what would be a regression for the scheduler
<bslsk05>
marc.info: 'Re: Periodic rant about SCHED_ULE' - MARC
<mjg>
the idea would be to fix it in $magic manner, but then what about other cases
<heat>
okay so
<heat>
i'm not sure if you've heard of "real workloads" before
<heat>
it's a new bench
<heat>
essentially you run that and see if you regress
<mjg>
cool story
<mjg>
so
<heat>
in all honesty it's probably the best you'll get for a scheduler m8
<mjg>
bitch plz
<heat>
you can't write some weirdly synthetic shit and say it's better or worse
<Mutabah>
mjg: Mind cooling the language?
<mjg>
there is tons of funny corner cases you can't hope to cover by running REAL WORKLOAD by yourself
<mjg>
Mutabah: sure
<mjg>
i know linux has a bunch of tests, but i don't know of anything comprehensive
<Mutabah>
Also - I'm guessing you're talking about "the _linux_ scheduler" here
<heat>
no
<Mutabah>
Oh, hey, it's FreeBSD
<zid>
You can tell that guy's insane because he's writing example code in fortran
<mjg>
no, but i do suspect if there is a good test suite, it is for the linux one :)
<zid>
and writing csh scripts
<Mutabah>
Always worth clarifying
<mjg>
zid: mind cooling the language
<mjg>
Mutabah: now this much is implied mate :>
<zid>
what language
<zid>
is fortran a banned word in poland
<zid>
sorry I said it again
<mjg>
general question stands though
<heat>
f*rtr*n
<heat>
sorry, fartrun
<heat>
my IRC client auto-censors
<zid>
Imagine posting a bug report to lkml with an example program, but you did it in algol and /bin/ksh
<mjg>
do you remember the 'decade of wasted cores' paper?
<mjg>
i had a look at it due to the above problem and left rather disappointed
<mjg>
interestingly next to no traffic on lkml either
<heat>
anyway what kind of corner cases do you think you can't find by running a real workload
<heat>
and why do those matter?
<mjg>
for example the above is a case where there are slightly more workers than cores, *all* cpu bound
<mjg>
my usual workload is *not* like that whatsoever
<mjg>
here is another one
<mjg>
dude has n threads all with nice 20 and did make -j
<heat>
there are plenty of workloads with that shit
<mjg>
without nice
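
For what it's worth, a throwaway sketch of that first corner case, slightly more CPU-bound workers than cores (pthreads, purely illustrative; watch it with top or schedstat, link with -lpthread):

    #include <pthread.h>
    #include <unistd.h>

    static void *spin(void *arg) {
        volatile unsigned long n = 0;
        for (;;)
            n++;                            /* pure CPU-bound worker */
        return arg;
    }

    int main(void) {
        long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
        for (long i = 0; i < ncpu + 2; i++) {   /* slightly oversubscribed */
            pthread_t t;
            pthread_create(&t, NULL, spin, NULL);
        }
        pause();                            /* spin forever; kill when done */
        return 0;
    }
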
<zid>
almost all interesting workloads are that
<mjg>
ffs people
<zid>
if they weren't cpu bound you wouldn't have a scheduling problem to begin with
<heat>
i've seen servers that had 2000 threads and like 70 cores
<zid>
"Scheduler works perfect if it's at 1% cpu load!", no shit?
<heat>
(servers as in web server, not hardware server)
<mjg>
there are tons of configurations one needs to check and doing it by hand by running some workloads is not going to cut it
<zid>
mjg: There are like, three configurations anyone *cares* about, however.
<mjg>
hence i'm looking for something comprehensive which i can just run and which will cover typical stuff
<zid>
So you can focus your tests on those and hit almost all actual users
<mjg>
what those might be
<heat>
welp i'm not telling you to run every workload by hand but the main thing would be running a couple of synthetic workloads of that shit and then if you have bad regressions someone complains and you take a look
<zid>
big iron, interactive desktop running prime95, no load
<mjg>
heat: i'm trying to do a comprehensive job here dog
<mjg>
not just a bunch of stuff by hand
<heat>
but there's no comprehensive job to be done here
<mjg>
that's like the last resort
<mjg>
sure there is
<heat>
you can test throughput of something, you can test the responsiveness
<heat>
but those will still be ultra specific cases where the scheduler responded based on *your system*
<mjg>
i can run on more systems
<zid>
scheduling is also chaotic, which helps a bunch..
<heat>
the stupid throughput test is a kernel build, the responsiveness test would be like testing both uncontended and contended CPU load and seeing how long until the thread gets rescheduled back or something
<mjg>
does not bode well for existence of a suite
<heat>
but all of these are stupid because the scheduler is supposed to adapt based on your system or system load or whatever its doing or how things interact with each other
<zid>
You can write a cool synthetic benchmark, but different threads will get different distances at different times (irqs, cache, etc) and then everything ends up on a different core between runs etc and comparisons get unruly.
<mjg>
huh
<mjg>
> A full run on Mel Gorman's magic scalability test-suite would be super
<bslsk05>
gormanm/mmtests - MMTests: Benchmarking framework primarily aimed at Linux kernel testing (84 forks/189 stargazers/GPL-2.0)
<zid>
Then you're just tuning bias values until someone shows up in a few years, tells everybody the entire thing is trash and doesn't work, and replaces it with something else that needs tuning :D
<heat>
i agree
<heat>
expecting objective performance testing esp "realistic" one is super unrealistic
<zid>
Best you can do is a benevolent dictator scheduler that 'knows best' and doesn't support weird loads
<zid>
but handles big iron and interactive desktop gracefully
<kof123>
eh, mechanism not policy. benevolent dictator that gracefully steps down when proverbial gun held to his head, and gracefully retakes power when their services are required again
<mjg>
quite frankly responses here are most bizarre
<mjg>
for starters the test suite at hand may have an array of actual workloads to test, each of which stresses different parts of scheduling
<mjg>
and where it is known that optimization for one case easily causes degradation for another
<kof123>
well im not qualified. my answer is just "give me lots of switches to toggle if need be"
<mjg>
having a collection of the sort lets one make an at least somewhat informed choice regarding patchen'
<kof123>
i dont disagree with your assessment on test suite
<moon-child>
mjg: it almost feels like you'd want to somehow mock the scheduler's view of the system state
<moon-child>
which obviously doesn't account for imprecision in modeling the system state
<moon-child>
but does allow you to very precisely characterise the scheduler's behaviour
<mjg>
this is assuming i want to run a bunch of few line c progs
<mjg>
which i don't
<mjg>
again what's going on here
<mjg>
twilight zone
<geist>
hmm, frown. looks like most of the rv64 implementations i've seen *don't* do misaligned accesses
<geist>
they appear to be handled in firmware, transparently
<geist>
unclear if it's going to linux, or to opensbi
<geist>
a simple test is to write to an unaligned address 10 million times: 4.5s; aligned, 0.029s
<geist>
so clearly something is trapping and emulating it (this is on linux)
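
Roughly the kind of test being described, as a sketch (the 4.5s vs 0.029s figures above are from geist's machine; results will vary):

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    static double bench(volatile uint32_t *p, long iters) {
        struct timespec a, b;
        clock_gettime(CLOCK_MONOTONIC, &a);
        for (long i = 0; i < iters; i++)
            *p = (uint32_t)i;           /* 4-byte store, possibly misaligned */
        clock_gettime(CLOCK_MONOTONIC, &b);
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    }

    int main(void) {
        static _Alignas(8) unsigned char buf[64];
        long iters = 10 * 1000 * 1000;
        printf("aligned:   %.3fs\n", bench((volatile uint32_t *)buf, iters));
        printf("unaligned: %.3fs\n", bench((volatile uint32_t *)(buf + 1), iters));
        return 0;
    }
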
<moon-child>
opensbi?
<geist>
yah that's what i'm wondering
<moon-child>
no I mean, what's opensbi?
<geist>
oh it's the machine mode firmware that handles some low level details, even to linux kernel
<Mutabah>
A common firmware
<Mutabah>
think system management mode
* moon-child
nods
<moon-child>
what's the riscv architectural stance on unaligned accesses?
<geist>
the stance seems to be that user code can assume it works, and compiler can generate code accordingly
<geist>
but, it's allowed for the hardware to not support it and have it trapped and emulated in firmware
<geist>
which IMO is worse, because then you can just transparently have code that runs orders of magnitude slower
<moon-child>
hrm
<moon-child>
that's unfortunate
<geist>
it avoids having two different sets of incompatible binaries i guess
<moon-child>
oh yes, risc 'extension hell' v wants to avoid having incompatible binaries
<geist>
and means you should generally avoid unaligned accesses like the plague i guess
<moon-child>
very consistent
<moon-child>
:)
<geist>
word
<geist>
since the medeleg register exists, it's entirely possible for machine mode code (where SBI runs) to trap all unaligned accesses first, and if they deal with it they can transparently do it behind the kernel's back
<geist>
though there is some complexity as to how the machine mode code can access memory through supervisor paging. but i think there's a mechanism for that
<geist>
ah i see: the mstatus.MPRV bit lets machine mode temporarily operate as if it were at a lower privilege mode
<geist>
while the bit is set load/stores act as if they were at the mstatus.mpp privilege level
<geist>
where mstatus.mpp is the saved privilege level the cpu came from in the last exception
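
A hedged sketch of that MPRV dance (machine-mode RISC-V with GCC inline asm; real code such as OpenSBI also installs a temporary trap handler in case the access faults, which this omits):

    #include <stdint.h>

    #define MSTATUS_MPRV (1UL << 17)   /* mstatus bit 17: modify privilege */

    /* Load one byte through the translation of the privilege mode saved in
       mstatus.MPP, while still executing in M-mode. Assumes the access
       itself doesn't fault. */
    static inline uint8_t mmode_load_u8(const uint8_t *vaddr) {
        uintptr_t old;
        uint8_t val;
        __asm__ __volatile__(
            "csrrs %0, mstatus, %2\n\t"   /* set MPRV, remember old mstatus */
            "lbu   %1, 0(%3)\n\t"         /* load uses MPP-level translation */
            "csrw  mstatus, %0"           /* restore mstatus, clearing MPRV */
            : "=&r"(old), "=&r"(val)
            : "r"(MSTATUS_MPRV), "r"(vaddr)
            : "memory");
        return val;
    }
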
<geist>
so yeah sure enough there's some fairly convoluted code in opensbi to trap exceptions from user and supervisor mode and it'll transparently attempt to emulate the unaligned access
<bslsk05>
github.com: opensbi/sbi_unpriv.c at master · riscv-software-src/opensbi · GitHub
<geist>
gosh it'd be helpful if any of this code included like one frickin comment
<geist>
like just some sort of hint as to what the fuck it's doing
<geist>
took me staring at it for 30 minutes to figure out what's goin on
<moon-child>
comments are for quiche-eaters
<kof123>
freebsd would spit something on the console for alpha, "unaligned access @ 0xdeadbeef" IIRC
<geist>
yah in this case the higher level firmware in machine mode is just trapping it without any ability for supervisor mode code (freebsd, linux, etc) to intercept
<geist>
it would be kinda nice to just let you take this stuff into your own hands, so you can decide to not let code do it, or at least get some sort of count of the number of times it's been fixed up
<moon-child>
do riscv cpus have performance counters?
<geist>
yah that's my guess for how you're supposed to figure it out. there's an SBI extension to get access to perf counters, so probably you enumerate and watch the number of unaligned traps
<geist>
anyway, interesting if nothing else
<geist>
always fun to figure out how the sausage is made
<geist>
the whole machine mode trapping and reflecting exception model is pretty simple and powerful, and SBI uses it to great effect
<heat>
geist, fyi that's disgusting
<heat>
the trapping into fw for unaligned stuff bit
* gog
traps into firmware
<heat>
noooooooooo gog dont trap into firmware
<heat>
that's a bad idea you'll hit firmware bugs!
<Ermine>
wdym
<heat>
wdym wdym
<Ermine>
trapping into fi
<Ermine>
firmware
<heat>
welp in this case geist was saying that riscv either handles unaligned loads and stores directly or traps into firmware
<heat>
and he was saying that most/all riscv CPUs currently do not handle it directly but rather the firmware does it for them
<heat>
which is severely wrong IMO
<geist>
yah they were clearly trying to make it so that the unaligned problem is not for regular code to worry about, except it kinda is
<geist>
because it runs so slowly obviously you have to deal with it
* roan
traps into firmware
<roan>
we should really do something about this
<nortti>
is there any benefit to trapping to fw over to kernel?
<geist>
stay aligned!
<geist>
not particularly. the kernel could do it more efficiently, but in this case openSBI hard sets the medeleg bit so that the kernel has no opportunity to override it
<heat>
nortti, works transparently for everyone I guess
<heat>
geist, has no ISA extension mandated hardware unaligned accesses or something?
<heat>
cuz, you know, this seems like a big problem
<zid>
can it.. load bytes?
<geist>
possible. like the spec says it's valid for hardware to deal with unaligned natively
<zid>
or is the problem that it's 32bit load/store only
<heat>
zid, yeah it can
<geist>
but if it isn't then firmware must transparently do it, so that you dont have to worry about it at user space level
<zid>
so you can at least write code that doesn't need it then, same as the C model
<zid>
I'd prefer it crashed, tbh
<heat>
it's like x86 with #AC but the firmware handles your exception and patches it up
<heat>
yes, same
<geist>
but then clearly you want to avoid unaligned accesses because it's slower. similar to x86 and arm64 where it's generally not a good idea for performance reasons, all else held equal
<heat>
hm?
<geist>
right, my main complaint is exactly that, i can't make it crash if i wanted to
<heat>
x86 unaligned accesses are very close to ideal
<geist>
sure
<geist>
but it's an implementation detail. hardware deals with it, but there have been periods of time with various x86 microarchitectures where it was more of a hit
<heat>
in fact that whole overlapping copy stuff is all based on this
<geist>
much like someone could make a riscv core that deals with it with very little to no cost
<heat>
or all the memcpy implementations really
<geist>
that's precisely where i got started on it, wanted to sit down to build a new asm memcpy
<geist>
and was going to rely on the 'align to dest, dont worry about src' memcpy strategy
<geist>
but oh no, you better deal with both here
<geist>
glad i tested this first
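
For reference, a plain-C sketch of the 'align to dest, don't worry about src' strategy (the small memcpy calls are the standard-C spelling of an unaligned load; this is exactly the plan that trap-and-emulate firmware ruins):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    void *memcpy_align_dst(void *dst, const void *src, size_t n) {
        unsigned char *d = dst;
        const unsigned char *s = src;

        /* byte copies until the destination is word aligned */
        while (n && ((uintptr_t)d & (sizeof(uintptr_t) - 1))) {
            *d++ = *s++;
            n--;
        }
        /* word copies: stores aligned, loads possibly not */
        while (n >= sizeof(uintptr_t)) {
            uintptr_t w;
            memcpy(&w, s, sizeof w);    /* may be an unaligned load */
            memcpy(d, &w, sizeof w);    /* aligned store */
            d += sizeof w; s += sizeof w; n -= sizeof w;
        }
        while (n--)                     /* trailing bytes */
            *d++ = *s++;
        return dst;
    }
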
<heat>
it's super possible that the best way to do this atm is like glibc
<heat>
in C too
<geist>
i was basically just going to implement that logic in asm
<geist>
main reason being that i want to force it to use a fixed set of registers, so i can safely use it in multiple asm contexts
<geist>
ie, user copy, memcpy, etc
<heat>
hmm yeah
<heat>
wait does your user copy need to be in asm too?
<geist>
right now there's a silly problem in zircon where i dont know what registers it uses, so it's a full function call inside the guts of the user copy routine, so i have to basically hand roll a full setjmp/longjmp for the error case
<geist>
it's basically 'free' to do that if it's known that memcpy only uses callee-trashed regs
<bslsk05>
github.com: Onyx/usercopy.cpp at master · heatd/Onyx · GitHub
<geist>
that's the strategy we do on arm64. the user copy 'set the recovery pointer' mechanism is implemented as an asm wrapper that sets things up, calls into memcpy (which is implemented in asm), and thus can undo things
<heat>
asm goto baybeh!
<geist>
but that works because the memcpy routine on arm is explicitly written to never touch the stack or any saved regs, so it can be safely 'branched out of' in the middle of it
<geist>
i basically wanted the same thing on RV
<geist>
and there are enough registers that it's basically possible, since you have a0-a6 + t0-t6 to work with
<geist>
a7 even
<heat>
yeah
<heat>
tbf you could make it work using the frame register right?
<heat>
ah wait no, saved regs
<heat>
yuck
<geist>
yah, if it pushes any saved regs you can't recover them
<geist>
so at the moment i have a total hack that just pushes s0-s3 locally, saves sp locally, then sets it up
<geist>
and i know that the memcpy implementation actually only fiddles with s0-s2, so s3 is enough of an anchor to get it back
<geist>
but clearly that's Bad because you're relying on the compiler to generate code in a particular way
<geist>
nice thing is it'll instantly fail a unit test if the compiler tries anything else. so it's good for like, this week
<geist>
so what'd be ideal is to have a macro with a memcpy implementation to stamp out, that i just put inline
<geist>
or... a memcpy function that is known to just use a and t registers so it's safe to call it, either way
<gog>
mew
* geist
pets gog
* gog
prr
<Ermine>
gog: may I pet you too?
<gog>
yes
* Ermine
pets gpg
<Ermine>
LOOOOL
* Ermine
pets gog
<geist>
it's nice innit?
<heat>
gog geist needs your help writing a riscv memcpy
<geist>
MEMCPY
* gog
prr
<geist>
if only RUUUUUUUUST
<Ermine>
Let's get gog out of fw first
<heat>
RUUUUUUUUUUUUUUUUUUUUUUUUUST
<geist>
what, we need to stop trapping gog in firmware?
<heat>
yes
<heat>
firmware bad
<geist>
gog in the machine
<gog>
don't copy memory
<gog>
copying bad
<gog>
never copy
<Ermine>
map map map map
<geist>
dont copy that memory!
<geist>
that's not aligned with the ideals of the firmware
<Ermine>
One mremap to rule them all
<heat>
silly gog mapping memory is slow!
<Ermine>
-sponsored by memory copying gang
<geist>
welcome to the machine (mode)
<heat>
TAKE THE RED PILL, EXIT THE MACHINE
<heat>
why is riscv such a trashy arch
<geist>
oh i dunno, i think it's kinda elegant
<geist>
just limited
<heat>
i dont like that half the stuff is handled in M mode and passed down to the kernel like some sort of weird hypervisor
<sham1>
What did you expect from a RISC
<geist>
all of my complaints tend to be some variant of 'this area is underdeveloped and needs to get more complicated'
<geist>
ah yeah the SBI stuff is a mixed bag
<geist>
though to be fair it's not required for say embedded or whatnot. embedded code probably runs directly in M mode and doesn't have any SBI around
<heat>
and this "haha unaligned accesses Just Work" is another weird quirk
<heat>
like, why?
<Ermine>
Mr. Smith: *tries to copy unaligned memory* Everybody: *get into machine again*
<geist>
i may see about getting this changed via work, but trying to come up with a compelling argument
<heat>
most of this stuff doesn't really seem very well thought out
<geist>
ie, at least have some ability to set the properties of SBI via some call
<heat>
geist, performance?
<heat>
or control
<geist>
yes
<Ermine>
Btw how does it happen on other arches?
<heat>
how does what happen
<geist>
varies. some architectures disallow unaligned accesses entirely
<geist>
or they let you set a control bit that conditionally allows it
<geist>
or they allow it entirely
<heat>
yeah i mean in general on new stuff you just handle them
<geist>
tend to fall within those 3 categories
<Ermine>
unaligned memory access
<heat>
like x86 and arm64 all do unaligned
<geist>
there's a 4th category, like armv4, where it just gave you effectively garbage when you did an unaligned access
<sham1>
Unaligned accesses are ugly, slow and considered harmful
<heat>
you're ugly and slow and considered harmful
<sham1>
Right, but so are unaligned accesses
<geist>
so what riscv is trying to do is standardize the ABI to declare that unaligned is okay
<geist>
so it's fairly clear they want things to eventually arrive at unaligned is fine, but current hardware doesn't have the support
<geist>
so SBI transparently fixes it. so it's clearly not a performance choice, but a anti-fragmentation choice
<geist>
i get the idea, i'd just like a bit more control so i can turn it off and trap
<geist>
and note this is everything to do with supervisor mode level OSes. for pure embedded, machine mode, you have to deal with it yourself
<geist>
but in those cases ABI compatibility is generally not a concern
<geist>
note i haven't checked to see what qemu is doing. it's entirely possible qemu is simply doing full unaligned access, since there's probably little reason for it not to
<danlarkin>
I'm only mildly informed but I think it changed recently to faulting to M mode on an unaligned access
<heat>
cry.jpeg?
<heat>
you know, because qemu tcg isn't slow enough
<geist>
danlarkin: as in SBI didn't previously do this?
<geist>
well opensbi that is
<danlarkin>
nah qemu I mean
<geist>
oh gotcha. yah was gonna write a test in a minute, easy enough to determine by just looking at the performance of it
<geist>
heat: hmm, from what i can tell that never happened. i'm not getting unaligned traps on qemu
<heat>
oh cool
<mrvn>
geist: maybe qemu faults on unaligned access if and only if the architecture segfaults on unaligned access and qemu's segfault handler then decodes the opcode and throws unaligned access
<mrvn>
Isn't unaligned access even on x86 still slower if you cross a cache line? You also keep 2 cache lines busy that way.
* mrvn
doesn't get how people still generate unaligned access. Stop using packed.
<moon-child>
I'm p sure 2 cache lines is the same speed as 1
<moon-child>
for <=8-byte accesses, that is--wider accesses do want to be aligned
<moon-child>
even if not, though, it's just one extra cycle
<moon-child>
I think page crossing is more expensive, but even in that case is just microcoded and 10s of cycles
<heat>
moon-child, isn't avx2 movaps just slightly faster than movups
<moon-child>
no
<moon-child>
on aligned addresses, they have the same performance. On unaligned addresses, one faults, and the other might be slow
<geist>
an IMO valid use case i've seen a compiler emit is for example copying 7 bytes: the simplest code gen is to emit two 4 byte load/stores, with one of them offset so they overlap
<geist>
stuff like that i've seen a compiler do when it knows that it can do unaligned accesses with no issue. arm64 in particular loves that sort of thing
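
A sketch of that codegen trick for the 7-byte case: two 4-byte accesses, the second shifted back so they overlap on byte 3:

    #include <stdint.h>
    #include <string.h>

    /* copy exactly 7 bytes as two overlapping 4-byte accesses */
    static void copy7(void *dst, const void *src) {
        uint32_t lo, hi;
        memcpy(&lo, src, 4);                             /* bytes 0..3 */
        memcpy(&hi, (const unsigned char *)src + 3, 4);  /* bytes 3..6 */
        memcpy(dst, &lo, 4);
        memcpy((unsigned char *)dst + 3, &hi, 4);
    }
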
<moon-child>
I have at least one mathematical program that relies heavily on unaligned accesses
<moon-child>
if it's fast, no reason not to rely on it. (If not, then, well, different algorithm is called for)
<geist>
exactly
* mjg
is doing locked atomic ops across different huge pages
<mjg>
2 byte ops!
* moon-child
slaps mjg around a bit with a large trout
<mjg>
people talk about "latency bubbles" in the frontend etc.
<mjg>
what is actually happening is that the cpu sees the shit you are feeding it and facepalms, then needs some cycles to recover
<moon-child>
I read somewhere there are plans to let the cpu straight up fault on split locks
<mjg>
but muh code!
<moon-child>
poor code
<mjg>
there was a lkml thread claiming there are real *games* using this shit
<heat>
yes there are
<heat>
this was a whole saga
<moon-child>
o.o
<heat>
they had to roll back the extensive throttling they were doing to split-locking threads
<bslsk05>
lwn.net: The search for the correct amount of split-lock misery [LWN.net]
<heat>
actually, sorry, they kept the throttling but added a command line knob to the kernel
<mjg>
truegamer=1
<heat>
moon-child, btw this is an actual feature for 11th gen+ intel cpus
<heat>
you get an exception for a split lock
<moon-child>
oh cool
<moon-child>
I thought it was just planned
<mjg>
curious if RUST will end up generating code which runs into it
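
An illustrative split lock, for the curious (a sketch; the misaligned _Atomic cast is deliberately nasty and not sanctioned C; on 11th gen+ Intel with detection enabled this can raise #AC or get the thread throttled):

    #include <stdatomic.h>
    #include <stdint.h>

    int main(void) {
        /* place a 2-byte atomic so it straddles a 64-byte cache line */
        static _Alignas(64) unsigned char buf[128];
        _Atomic uint16_t *p = (_Atomic uint16_t *)(buf + 63);

        atomic_fetch_add(p, 1);   /* locked RMW spanning two lines: split lock */
        return 0;
    }
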
<mrvn>
moon-child: why would page crossing be more expensive?
<mrvn>
geist: the important word there is "knows that it can do unaligned accesses"
<moon-child>
not sure. Possibly protection stuff
<geist>
right
<mrvn>
moon-child: a second TLB lookup?
<moon-child>
but I have heard this is the case at least for apple arm and intel
<geist>
and this is what ARM64 and armv8 have mandated
<moon-child>
actually now I'm somewhat curious on x86 if this applies to a crossed 4k boundary when you have big pages
<moon-child>
considering x86 cpus tend to do a lot of 4k stuff regardless of the page size
<mrvn>
moon-child: If it does then it's doing a TLB lookup for no reason. But the part of the core that does that might not know the page size
<heat>
what does?
<mrvn>
Frankly, nobody uses huge pages so why should they optimize for that?
<moon-child>
mrvn: well--either that, or it doesn't have to do with tlb
<moon-child>
mrvn: wut
<mrvn>
moon-child: other than the phys mapping and VMs where would you use huge pages?
<heat>
mjg, RUST does not generate code, llvm does. checkmate freebsd idiot
<heat>
RUST is perfect, LLVM is a donkey
<moon-child>
any case when you have a lot of data and you don't want to die on tlb...?
<heat>
"every kernel ever"
<mrvn>
moon-child: and what app does that?
<moon-child>
most client apps 1) have comparatively small working sets and 2) don't need to manage their allocations at a low level to that degree
<mjg>
heat: you missed the part where llvm has no choice but to emit what it was asked for
<mjg>
heat: and if it was asked for a split lock op, dafaq ya gonna do
<mrvn>
NUMA and huge pages might be used in some high performance cluster stuff but all the home and desktop use cases won't be using it.
<moon-child>
however I would expect allocators to transparently take advantage of that where feasible. I know mimalloc at least does (leaving aside lulz regarding queueing...), and I would expect the nice java gcs to
<mrvn>
moon-child: does it pass the MAP_HUGE to mmap?
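
The explicit (non-THP) route mrvn is referring to looks roughly like this on Linux (a sketch; it fails unless huge pages have been reserved, e.g. via /proc/sys/vm/nr_hugepages):

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void) {
        size_t len = 2UL << 20;   /* one 2 MiB huge page */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap(MAP_HUGETLB)");
            return 1;
        }
        munmap(p, len);
        return 0;
    }
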
<heat>
dude I'm fairly sure that since a recent kernel version a lot of program stacks are getting hugepage'd
<heat>
through thp
<mjg>
stacks?
<heat>
yes
<mjg>
huh
<mrvn>
stacks? that would require actually allocating a multiple of 2MB for them.
<heat>
which they are going to try to revert because it's somewhat silly
<mjg>
not saying they should not, but i would expect fragmentation to fuck it up real quick
<mjg>
anyone have data on how much stack is normally used?
<geist>
which i've heard is actually a pretty good win if you can pull it off
<mrvn>
geist: I would be interested in running with 64k page size. It's kind of the best size for IO too.
<geist>
indeed
<mjg>
uh
<moon-child>
geist: 32 interesting, have a reference?
<geist>
i think the biggest downside is that the minimum size for a lot of things goes up, so there would probably be a nonzero amount of overhead
<geist>
moon-child: yeah it's just described in the AMD manual
<mrvn>
memory is cheap.
<geist>
if you map N pages back to back in the same way on the same alignment etc the hardware may transparently treat it as a larger TLB entry
<geist>
but 16K pages is i think a nice compromise
<mjg>
does 16K work though right now?
<mjg>
i would expect tons of hardcoded 4k sizes
<mjg>
in userspace
<moon-child>
I feel like this probably depends on priorities
<mjg>
like everything was vax
<mrvn>
geist: I would expect the overhead to be pretty minimal for userspace actually. Most memory goes towards malloc and that can just use the remaining 60k after 4k was used for the next allocation. Doesn't really make sense to ask the kernel for single pages anyway.
<moon-child>
intel prioritises hpc and server stuff so 2mb is fine for them
<geist>
mjg: i'm fairly certain linux has gotten 16K support at this point
<moon-child>
apple on client cares somewhat more about fragmentation when you have a lot of apps
<moon-child>
amd somewhere in the middle
<geist>
note i'm talking about arm64 which up front in its ABI mandated that 64K is the largest base page size
<mjg>
geist: in that spirit freebsd is doing 16k on arm64 as well
<mjg>
geist: but that already required a fair amount of rototilling the base system
<geist>
i know linux had 64k support pretty early on for arm64, but i think it only got 16k relatively recently
<mjg>
i have to expect 3rd party software not only hardcodes 4k, but that it is doing nasty stuff with it
<geist>
oh i'm sure stuff was busted, but then that software wouldn't have worked on alpha!
<geist>
or vax!
<mrvn>
I think linux got 64k support for ppc and 16k for arm.
<mjg>
i'm saying i expect the current realities to be pretty bad
<mrvn>
arm64 just inherited it
<geist>
yah
* geist
nods
<mrvn>
mjg: then it will fail because it didn't check sysconf for the page size
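
The portable check being skipped there is a one-liner (trivial sketch):

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        long psz = sysconf(_SC_PAGESIZE);   /* 4k on most x86; 16k/64k exist on arm64 */
        printf("page size: %ld bytes\n", psz);
        return 0;
    }
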
<geist>
i mean sure, yeah OTOH that same software running on mac already has to deal with it, so i suspect things sort themselves out over time
<geist>
old x86 software is i think where there's a lot more problem, since there's a basic assumption that old x86 stuff still works. so if there was some minimum page size bump there i bet a lot of shit gets broken
<geist>
ARM64 is new enough that i would think you're not going to get too upset on linux if some old binary doesn't run right
<mrvn>
I wouldn't expect things to run "not right" but outright fail.
<moon-child>
honestly I would expect concurrency stuff to cause more silent breakage
<moon-child>
'not right' yeah
<mrvn>
EINVAL We don't like addr, length, or offset (e.g., they are too large,
<mrvn>
or not aligned on a page boundary).
<mrvn>
My expectation would be that anything that does things PAGE_SIZE related will call mmap or mprotect and get the above error
<mrvn>
Urgs, just thought of something. All ELF files are built so the segments are at 4k or 2M boundaries. If you change the page size to 64k then 4k alignment won't work, and 2M you have to split into 64k chunks.
<mrvn>
So 32bit x86 is screwed with its 4k default. On AMD64 it's hit or miss whether something uses 4k or 2M.