<nikolar>
tell me you don't know c without telling me
<Matt|home>
here's an interesting basic question.. let's say you're learning a language for the first time, brand new to computers. in general, is there anything a regular beginning computer user could write during the early stages of learning coding that would break the compiler in half in terms of efficiency/not being able to optimize?
<Matt|home>
im curious how "efficient" modern compilers really are when faced with dumbassery..
<zid>
nikolar: what are you talking about? he has correct and perfectly working code
<zid>
It says so right there
<heat>
yes.
<nikolar>
kek obviously
<nikolar>
my bad
<Matt|home>
.. i haven't read the article but it sounds like you're making fun of this guy because he's assuming compilers are perfect?
<Matt|home>
there are worse things to assume you know
<heat>
we're making fun of this guy because he's hilariously wrong
<nikolar>
no, i am making fun of him because he thinks dereferencing null pointers should give you working code
<nikolar>
amongst other things
<Matt|home>
playing devil's advocate a little bit, it can be rather frustrating not understanding in the beginning when to bother and when not to care about code efficiency
<Matt|home>
as far as im aware there aren't really any clear and fast rules, it's fairly subjective. no?
<nikolar>
i mean one of the first things you learn about c ub is that null is a big no no
<nikolar>
*dereferencing null
<Matt|home>
i said "in the beginning", as in beginning coders learning programming for the first time
<heat>
>In my more than 30 years of development
<heat>
btw
<heat>
this is not a beginner coder
eddof13 has joined #osdev
<nikolar>
Matt|home: c is notorious for ub
<Matt|home>
im aware
<nikolar>
heat: i assume he hasn't been writing c that long lol
<nikolar>
no way you can go that long without noticing some of this stuff
<Matt|home>
thus my earlier question: let's assume that this is considered an 'obvious' thing not to do, don't aim for undefined behavior unless that's what you want. but most new people don't learn C
eddof13 has quit [Client Quit]
<heat>
nikolar, there are definitely loads of C programmers that don't believe in undefined behavior
<nikolar>
well you can't be mad at the compiler if you don't even put in the effort to learn the language
<nikolar>
heat: what do you mean
<heat>
they're like, UB atheists
<zid>
I know one!
<heat>
UB can't hit me if i don't believe it exists
<zid>
I posted some quotes the other day
<heat>
COMPILER BUG COMPILER BUG GCC BORKEN
<nikolar>
lol
<nikolar>
use tcc then
<nikolar>
i doubt it will ever take advantage of ub
<zid>
[00:51] <zid> "as someone who has done all this on multiple platforms and compilers, then anyone is free to disagree all they want. you can't argue with working software"
<zid>
[00:51] <zid> 'please stop doing UB' -> "you can and for a lot of software you must"
<nikolar>
and it will always do exactly what you tell it
<nikolar>
ah i remember that zi
<nikolar>
*zid
<zid>
You can't argue with working software nikolar.
<nikolar>
indeed
<nikolar>
how dumb of me
<zid>
UB is clearly fake, software works fine
<heat>
my future C compiler will delete your whole program at the sight of UB
<nikolar>
heat: at least you'll know exactly when you're ub-ing :P
<Matt|home>
hey. here's a smarter idea
<Matt|home>
let's not have undefined behavior to begin with huh?
<Matt|home>
sounds like a better idea imo
<zid>
spec isn't robust enough for that sadly
<kof673>
hyrum's law: "With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody." -- Hyrum Wright, 2012
<zid>
It's pretty fuzzy in a lot of overlap situations
<heat>
strict aliasing violation will blow you up wile-e-coyote style
<kof673>
this applies to things besides software too :D
<nikolar>
also a lot of ub is kept around for the purposes of optimization
<nikolar>
intentionally
<zid>
strict aliasing violations are one of the areas where the spec is woefully inadequate
<zid>
in fact
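For readers following the strict-aliasing point, a minimal sketch, assuming a target where `float` is 32 bits wide in IEEE-754 format. The name `float_bits` is made up for illustration. Reading a `float` through a `uint32_t *` is the aliasing violation being discussed; copying the bytes with `memcpy` is the well-defined way to get the same result.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is a 32-bit IEEE-754 type on this target. */

/* Well-defined type punning: copy the object representation.
   The aliasing-violating version would be *(uint32_t *)&f. */
static uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Mainstream optimizing compilers typically turn the `memcpy` into a plain register move, so the defined version usually costs nothing extra.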
<nikolar>
imagine having to null check every pointer deref
<heat>
Matt|home, ok. what's division by 0 supposed to do?
<Matt|home>
also let's be honest, compilers are both incredibly amazing and a f****** joke
<heat>
what's an unaligned access supposed to do?
<heat>
lots of great questions that the anti-UB crowd can't answer
<zid>
UB is holes poked into the spec to allow optimizations, mainly around not having to check error conditions, the programmer just promises never to cause it to happen.
<Matt|home>
heat - depends on the situation. the point is that regardless of how arbitrary it is, you need to clearly define these things
<nikolar>
(btw, remember that c works on practically any hardware you can get your hands on)
<Matt|home>
it could turn the computer into candy corn, but as long as it does that all the time that's what's important.
<nikolar>
zid: also portability
<zid>
So now the compiler doesn't have to wrap *every* division with checks that you're not dividing by 0, because you promised never to do that.
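As an illustration of the checks being talked about, here is a rough sketch of what a fully defined signed division would have to do on every use. The helper name `checked_div` is hypothetical; the two rejected cases (zero divisor, and `INT_MIN / -1` overflowing) are the ones C leaves undefined for `int`.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: the checks a "no UB" division needs on every use. */
static bool checked_div(int num, int den, int *quot) {
    if (den == 0) {
        return false;                 /* division by zero */
    }
    if (num == INT_MIN && den == -1) {
        return false;                 /* quotient does not fit in int */
    }
    *quot = num / den;                /* safe: both undefined cases excluded */
    return true;
}
```

With UB in the language, plain `num / den` can compile to a single divide instruction and the two branches disappear.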
<heat>
Matt|home, you cannot do that
<Matt|home>
that's great, so we're sacrificing the ease of a programmer writing code for the sake of making it more efficient for the computer. that's not the correct way to do this. the correct way to do this is to make it as simple for the programmer and the computer to make the most efficient code without having any hidden surprises
<heat>
anti-UB crowd can't answer how you'd standardize a lot of hardware behavior in a way that allows everything to be performant and efficient
<nikolar>
c is very simple actually
<heat>
some caveats to that, but sure
<heat>
simple-ish
<Matt|home>
and the correct way to do that is by having a low level language like C with NO UNDEFINED BEHAVIOR work across a multitude of platforms, and then create wrappers or simpler APIs around C
<Matt|home>
that is the correct way to approach this.
<heat>
there's already a really low level language with NO UNDEFINED BEHAVIOR that every other language uses
<Matt|home>
which?
<heat>
it's called *checks notes* "assembly"
<zid>
heat: asm has UB
<zid>
:D
<Matt|home>
... im pretty sure x86 asm has UB bro
<heat>
1) technically
<heat>
2) oh yeah? point it out
<zid>
a bunch of instructions leave flags in an unspecified state
<heat>
3) tell me exactly why there shouldn't be UB
<Matt|home>
^
<heat>
ohno!
<zid>
and they're set to different things on different models of similar cpus
<nikolar>
that's because you never care about those flags in normal code
<Matt|home>
heat - because unpredictability means unpredictable future problems down the line, plain and simple
<zid>
so you can actually code against them if you want, you're just not "supposed" to, aka, UB
<nikolar>
thus, ub is perfectly fine
<Matt|home>
and those problems can scale
<Matt|home>
that's why
<heat>
they're not unpredictable future problems, just dont be a complete idiot
* Matt|home
bangs head on keyboard
<heat>
seriously, you basically need to sniff glue to hit x86 UB
<nikolar>
heat: i don't think glue is strong enough for that
<Matt|home>
people are idiots man.. it's not unreasonable to want to make things idiot-proof..
<heat>
UB exists to allow for future expansion, future optimizations. you DO NOT want to document every bit of your system
<kof673>
at the risk of sounding "postmodern" and not meaning to say it is impossible.........people don't agree on what words mean :D
<heat>
you DO NOT want to set everything in stone
<kof673>
the perfect spec, 30 years later, the lingo will have changed
<zid>
This masks the count to a value between 0 and 31. If a count is greater than the operand size, the result is undefined.
<zid>
cl > 31 for shl
<Matt|home>
i'll admit that's a valid argument, to allow room for future optimizations to grow not being super strict is probably a good way to do that. but i feel like my argument has equal weight, where you can have critical systems get broken because of UB
<zid>
is UB
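The C side of the same issue, as a hedged sketch: in C, shifting a 32-bit value by 32 or more is undefined, while the quoted x86 text masks the count first. The wrapper below (the name `shl32` and the choice to return 0 for large counts are assumptions for illustration) shows what picking one defined answer costs: a compare on every shift.

```c
#include <stdint.h>

/* Hypothetical helper: left shift with a defined result for any count.
   Plain `x << n` is undefined in C when n >= 32 for a 32-bit operand.
   This sketch chooses "everything shifted out" (0) for those counts. */
static uint32_t shl32(uint32_t x, unsigned n) {
    return n < 32 ? x << n : 0;
}
```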
<heat>
if you formally document your system to work in a way, and then it doesn't allow for future changes, you're fucked. x86 2 electric boogaloo
<Matt|home>
and this isn't based off of anything, this is entirely subjective on my part but i feel like making it easier for people in general will lead to more positive outcomes
<heat>
read docs and don't sniff glue. generally has good outcomes
<\Test_User>
that kind of 'UB' is still defined, since it's still defined to *not* cause your CPU to hand you ring0 and such
<zid>
test has UB (AF flag undefined), aaa/aas/daa/das/sal/sar/shl/shr/imul/div/idiv/fstsw/fnstsw/fadd/faddp/fiadd
<Matt|home>
so let's go back to my previous question, which has kindly been ignored
<Matt|home>
for a beginning programmer new to computers, what are the most common things they can do to make the compiler cry without realizing it?
<zid>
\Test_User: undefined value rather than undefined behaviour, technically, mr. anti-semant
<Matt|home>
and by 'cry' i mean 'seriously affect the performance of either the program or system'
<nikolar>
Matt|home: complete beginners shouldn't be writing OSen
levitating has joined #osdev
<Matt|home>
i never said OS code. i said beginning code, this discussion is about compilers
<heat>
you keep popping in and dumping random questions in #osdev
<heat>
this is not ##c
<zid>
"How dare this chainsaw not safely cut down this tree unaided"
<\Test_User>
zid: and undefined values are perfectly fine to have around, as long as you don't decide to rely on the value thereof
<zid>
heat: They're probably sick of him, tbf
<\Test_User>
whereas undefined behavior is not
<Matt|home>
alright fine. i was under the impression that these discussions were welcome but clearly im incorrect
<zid>
\Test_User: No, both are hugely mandatory
<Matt|home>
i'll go fuck off.
<nikolar>
lol
<zid>
Please do not remove UB from c
<zid>
You could try it though, then name it after some videogame
<heat>
you can't efficiently divide in arm64 without undefined values, you can't efficiently divide in x86 without UB
<\Test_User>
zid: by 'have around' I meant as in, occurs when running it, not getting rid of the existence of UB (though true UB should *not* exist on the CPU level)
<nikolar>
zid: considering we got uwus and meows in the c standard, i wouldn't put it past them :P
<\Test_User>
well, CPU w/o protections and such would be fine to have actual UB I suppose
<zid>
\Test_User: 'true' UB yea probably not but that isn't what it means in that context, just that it isn't specified
<zid>
You're right, it'd be pretty crappy if it caught fire instead, and you could rightfully consider that a silicon bug
<zid>
this is like the "I'm not not hungry" discussion all over again :P
Matt|home has quit [Quit: Matt|home]
<zid>
heat: idk if you knew, but matt's been on freenode/libera 'learning asm/c/whatever' for about a decade now
<zid>
he at one point claimed to have been in a bad car accident and gotten brain damage
<zid>
so he just repeats the same questions over and over every 3 days
<nikolar>
zid: didn't some early 6502 completely lock up if you executed some illegal instructions
<zid>
and retains none of it
<nikolar>
that's ub for you :P
<zid>
nikolar: 6502 has really sweet UB, some of them lock up yea
<zid>
but most of them just strobe random ass control lines
<heat>
yeah im somewhat aware
<zid>
so you end up running TAX + LDA at the same time
<nikolar>
i think the lockup was fixed in the cmos versions
<zid>
and you get weird meta-instructions that do two things at once and wreck the flags
<nikolar>
so it's safe to poke around illegal instructions
<zid>
gbcpu has a halt bug
<zid>
it forgets to increment PC coming out of HALT (back from the interrupt)
<nikolar>
yikes
Turn_Left has quit [Read error: Connection reset by peer]
<ring0_starr>
maybe C doesn't define what happens, but on x86, what usually happens unless you're a MSVC 19 /O2 user, is that it'll execute the "idiv" instruction, which raises #DE. this will trigger the OS's registered "int 0" handler.
<nikolar>
eww msvc
<ring0_starr>
and yes i get that it's not portable
<heat>
fwkw
<heat>
fwiw
<heat>
the performance impact is, afaik, not that large
<ring0_starr>
but saying "see what your OS does in response to <XYZ>" is a lot nicer than saying "magical purple ponies might start coming out of your USB ports if you do this"
<nikolar>
but it's there
<ring0_starr>
look at the bloat
<nikolar>
for a blazingly fast, zero cost abstraction language
<nikolar>
so you know
<heat>
particularly they're doing an experiment in C++ containers where they enable bounds checking for every access by default, and you lose like 5% perf worst case
<heat>
and the C++ compiler isn't an all-knowing rustc god of formal proving shit
<nikolar>
neither is rustc honestly
<ring0_starr>
it sells itself as such
<heat>
should hopefully be better
<nikolar>
they just dump very bad llvm ir for llvm to sort out
<ring0_starr>
i remember that powerpc had a special thing for bounds checking
<ring0_starr>
didn't catch on i guess
<heat>
x86 also had concepts of a plan
<heat>
mpx ded
<nikolar>
concepts of a plan?
<nikolar>
mpx?
<zid>
concepts of a plan is an ameripol meme
Gooberpatrol66 has quit [Ping timeout: 260 seconds]
<nikolar>
eh?
<kof673>
*"united states"-ipol
<nikolar>
yes, that's the single part of that sentence that i understood lol
<kof673>
its what somebody would say when there is no plan :D
<Mutabah>
"concepts of a plan" is something Trump claimed during one of the debates... sums up half of his policies
<Ermine>
Seems like this is "unspecified behaviour" rather than "undefined behaviour"
<nikolar>
same difference
<Ermine>
And that's not the property of an assembler, it's property of ISA
<nikolar>
what's the difference
<nikolar>
assembly is just a textual representation of the isa instructions
rorx has joined #osdev
<Ermine>
The difference is that the assembler behaviour is pretty much defined: it converts mnemonics to binary code
<Ermine>
It won't leave your procedures out if it figured out something
<nikolar>
i think it still counts
<nikolar>
not that it matters
<Ermine>
Imo it matters from terminology side of things
<nikolar>
does it
<nikolar>
it's still undefined what happens to flags when you do certain operations
<nikolar>
whether you call it assembly or isa
<nikolar>
same outcome
<Ermine>
unspecified
<nikolar>
what's the difference between unspecified and undefined
<Ermine>
in various standards those are two different terms for some reason
<nikolar>
well what's the difference when you say it :P
<Ermine>
idk actually
<nikolar>
exactly
<Ermine>
ik that there's a reason to that
<nikolar>
so undefined is fine
<Ermine>
but you need to ask other people tho
<Ermine>
not to me
<geist>
to me i'd read unspecified as 'it does something but isn't mentioned here'
<nikolar>
what's the difference from undefined
<geist>
and undefined is more random results, but usually a *good* cpu/isa/architecture description will be much more precise
<nikolar>
well you can't say that flags are defined after a certain operation if they differ from cpu to cpu
<nikolar>
undefined/unspecified are both fine there i'd say
griddle has quit [Quit: griddle]
<geist>
ARM has much more precise definitions in their arch, and there's truly undefined and something like unspecified
<nikolar>
unless you want to list in excruciating detail some mostly useless facts
<geist>
er i mean
<geist>
well i didn't read the scrollback to see what the context is
<geist>
if you read random arch manuals from yore or yonder they'll be all over the place
<geist>
since different companies or whatnot will have different standards
<geist>
*usually* though i think undefined and unspecified i'd treat the same
<geist>
when describing say what happens to a flag after this instruction
housemate has joined #osdev
<geist>
may basically mean 'it'll do something, but different implementations may do different things, so dont rely on it'
<geist>
arm will call that IMPLEMENTATION DEFINED
<nikolar>
exactly
<nikolar>
all of those are appropriate unless rigorously defined
<nikolar>
as different
<nikolar>
like in std
<nikolar>
*std c
<geist>
vs say in ARM where it'll say UNPREDICTABLE
<nikolar>
arm vs ARM
<nikolar>
*
<geist>
sometimes UNDEFINED may be 'it's not defined now but in the future this may be'
<geist>
like for undefined instructions or whatnot
<nikolar>
> arm will call that IMPLEMENTATION DEFINED
<nikolar>
> vs say in ARM where it'll say UNPREDICTABLE
<nikolar>
eh?
<heat>
undefined behavior is really powerful and im a big fan
<geist>
oh i really mean 'in another case arm will say IMPLEMENTATION DEFINED'
<nikolar>
i agree
<geist>
i guess i shouldn't just wander in and blat out a bunch of stuff
<geist>
just came in from outside doing yard work
<heat>
arm is the arch reference manual right?
<nikolar>
lol you just said arm vs arm
<nikolar>
so i was like what
<heat>
arm the manual vs ARM the company making actual cores
<geist>
or ARM the company and ARM the architecture and whatnot
<geist>
i have a tendency to capitalize it when i remember to to make it stick out as in dont read it like it's a piece of a body
<nikolar>
heat: but how would ARM the company call some behaviour ID if not through the manuals
<nikolar>
right
<geist>
technically a few years back in like 2021 ARM, Ltd redid their logo to just be lowercase 'arm'
<heat>
yeah
<heat>
now it's a e s t h e t i c
<heat>
nikolar, so there's a difference between the arch and the cores
<heat>
arch manuals specify how the ISA generally works, cores can give you a little more information and be a little more specific
<nikolar>
sure
<geist>
but i use them as an example because in their 15k page manual they're at least very precise with their terminology
<heat>
because ofc apple can have their own core and their own behavior may be different than an ARM core
<heat>
all cores implement the ARM arm, only a single core (or family of cores) implements the arm cortex A-72 series manual or whatever
<geist>
but the apple core can only differ where the arm arch allows it
<heat>
doesn't apple have special status?
<geist>
other arches have a similar strategy: POWER, SPARC, etc have their own architecture descriptions
<nikolar>
i think they do
<nikolar>
but geist would probably know better
<geist>
but then a lot of the other ones like x86 are just a description of how the cpus work, which also functions as an arch manual
<heat>
SPARC SPARC SPARC SPARC
<geist>
apple has special status in that they pay arm a lot of money for an 'architecture licese'
<geist>
license even
<heat>
yeah cuz x86 is a lot more adhoc
<geist>
a lot of companies have done that
<zid>
I put that down to whether it's licenced or not
<nikolar>
SPARC SPARC SPARC
<heat>
x86 doesn't even have a formal memory model
<geist>
you pay them a lot more up front for the privilege of making your own implementation
<nikolar>
geist: i thought apple didn't pay a lot for the arm cores because they were something like the founder back when arm was starting
<zid>
if you're expecting someone else to make one, you want a formal description of how it 'should' work, rather than a description of how the cpu you made *does* work
<geist>
but you still have to conform to the arch spec to keep people from just doing what they want
<nikolar>
or something like that
<nikolar>
heat: yes it does?
<heat>
no it does not
<geist>
oh apple may get a good deal from arm, but who knows the details of that
<geist>
*lots* of companies have architectural license. that's what qualcomm/arm are fighting over. the terms of their agreement
<geist>
nvidia is another major arch licenser
<geist>
their Denver cores, etc
<heat>
SDM was historically super imprecise and still is super imprecise wrt memory model
<geist>
to summarize i guess: arm designs the architecture and will license it to you but will also make a bunch of implementations that you can buy probably on a per unit basis
<heat>
amd is also an arch licenser
<heat>
ZEN!
<heat>
also the amd seattle
<geist>
ie, if you're i dunno uh huawei or whatnot you may pay like $1 per chip you sell if you licensed a cortex-x1 or whatnot
<nikolar>
heat: where did you get that zen is related to arm
<geist>
i have no idea what the pricing looks like, but i'm sure that's fiercely licensed from ARM
<heat>
nikolar, they have admitted to it in the past
<geist>
i'm also sure when you get a core implementation from ARM you cannot modify it. think of it as a big .a file with a few #defines that you can set
<heat>
the K12 i think?
<geist>
(`define in verilog)
<geist>
and when you get the core manual from ARM for say a cortex-a72 it's basically the manual to that .a file
<heat>
jim keller i believe has also admitted to it
<heat>
which is why zen works suspiciously like an ARM core
<geist>
and yes AMD was working on an ARM core (K12) and then basically switched the front end to x86
air has joined #osdev
<geist>
and jim keller (who isn't at AMD anymore) confirmed it in at least one interview
<geist>
at least for Zen 1 there was a lot of microarchitectural details that act a lot like ARM if you grok the architecture
<geist>
the way the caches work, the way the TLB works, the layout of the cores, etc
<heat>
tlb fetching and caching a whole cacheline of contiguous entries :v
<heat>
tlbi and dsb ish
<heat>
sorry i mean invlpgb and tlbsync
<geist>
yeah you can imagine they designed a lot of those features into the TLB implementation and whatnot and then weren't able to access it in the x86 arch
<geist>
OTOH when you really look at it the two arches are not that different, so it's not exactly a huge thing
<geist>
you design the microarchitecture to be able to express stuff you can do in the ISA and once you squint you can see that both arches dont differ much, at least in integer space
<nikolar>
yea
<nikolar>
i mean arm even has similar addressing modes
<geist>
but things like what heat says where you can do more elaborate and clever TLB flushes and whatnot in ARM they had to literally extend the x86 ISA to get to it
<nikolar>
very ciscy for a risc arch
<geist>
right
<geist>
it still has the biggie that really helps: fixed size instructions on fixed boundaries. that's a massive boost for the decoder
<geist>
and makes it very easy to say look at 32 bytes at a time and parallel decode 8 instructions
<geist>
which lots of high end ARM implementations do
<nikolar>
to be fair, it's not like instruction decode is a major bottleneck on x86
<geist>
it is
<geist>
it's a massive PITA, but there's lots of smarts in the last 30 years to try to help (with the uop cache, etc)
<geist>
but it's really hard to parallel decode more than i think 4 or 5 or 6 at a time
<geist>
i think the widest i've seen is 6?
<heat>
one has to wonder if its worth it to be more compact with variable-length insns, versus larger, fixed size encodings
<geist>
with tons of limitations re: instructions crossing cache lines, how many prefixes, etc
<geist>
this compact vs not debate is being hotly fought in the riscv space. lots of companies that are dipping their toes into it are not happy with the 16 bit compressed instructions
<geist>
i think the defacto will emerge soon
<nikolar>
i forget, can you mix thumb and regular arm instructions or not
<nikolar>
for riscv, you can mix compressed and normal instructions
<geist>
at a branch boundary, and this is all arm32 we're talking about
<geist>
arm64 has no such thing
<geist>
for riscv it's at the instruction boundary
<nikolar>
ye
<nikolar>
oh arm64 doesn't?
<nikolar>
interesting
<geist>
ie, the first 2 or 3 bits in the instruction itself (low bits in little endian too) basically describe the length of the instruction
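A small sketch of that length rule for the common cases, assuming the standard RISC-V encoding where the low two bits distinguish 16-bit compressed instructions from everything else. The function name `rv_insn_len` is hypothetical, and the longer-than-32-bit encodings are deliberately not handled.

```c
#include <stdint.h>

/* Returns the instruction length in bytes, given the first (lowest-addressed)
   16-bit parcel. Returns 0 for the longer-than-32-bit encodings, which this
   sketch does not decode. */
static unsigned rv_insn_len(uint16_t parcel) {
    if ((parcel & 0x3u) != 0x3u) {
        return 2;   /* bits [1:0] != 11: 16-bit compressed instruction */
    }
    if ((parcel & 0x1Cu) != 0x1Cu) {
        return 4;   /* bits [4:2] != 111: standard 32-bit instruction */
    }
    return 0;       /* reserved for longer encodings */
}
```

For example, `0x4501` (`c.li a0, 0`) reports 2 bytes, and `0x0513` (the low half of `addi a0, x0, 0`) reports 4.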
<heat>
even with compressed instructions, certain x86 encodings are a lot tighter than the corresponding riscv ones
<nikolar>
true
<geist>
yeah arm64 is a completely different beast. a from-the-ground-up instruction set, far more standard risc looking
<heat>
to keep with the ameripol theme, there are many such cases
<geist>
afk a little bit
<nikolar>
x86 has more compact code generally, no?
<heat>
that's what i'd guess yes
<nikolar>
you can think of x86 having compressed instructions for accessing the first 8 register
<nikolar>
*registers
<heat>
x86 has compressed instructions for everything
<nikolar>
and then having progressively longer instructions for more registers (assuming apx doesn't die)
<nikolar>
heat: sure, just giving an example
<heat>
like, IIRC inc %eax/%ax is a single byte instruction
<nikolar>
i don't think both are
<nikolar>
i think inc %eax is or something
<nikolar>
the default should be 32bits
<heat>
depends on the processor mode ofc
<heat>
for 16-bit you probably need the override prefix
<nikolar>
yes, i am talking about x86-64
<heat>
(and for 64-bit, rex.w)
<nikolar>
> In 64-bit mode, INC r16 and INC r32 are not encodable
<nikolar>
heh
<nikolar>
> (because opcodes 40H through 47H are REX prefixes)
<heat>
huh, TIL
<nikolar>
i keep forgetting they repurposed some incs for REX
<nikolar>
> Otherwise, the instruction’s 64-bit mode default operation size is 32 bits. Use of the REX.R prefix permits access to additional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits.
<nikolar>
inc %ax is a byte longer than inc %eax
<cloudowind>
ok
<nikolar>
so yeah, 32 bit is the default
<nikolar>
and with apx, inc %r19 or whatever is 4 bytes
<nikolar>
also, lea is a very fancy instruction
<nikolar>
lets you do various kinds of math in a single instruction
<nikolar>
lea 0x8(%rax,%rdi,8),%rax in 5 bytes
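For anyone not fluent in AT&T syntax, a sketch of what that one instruction computes, written as a C function. The name `lea_example` is made up; the arithmetic (base plus index times 8 plus displacement 8, wrapping at 64 bits, no memory access) is what the addressing mode means.

```c
#include <stdint.h>

/* What `lea 0x8(%rax,%rdi,8),%rax` computes: base + index * 8 + 8.
   No memory is touched; unsigned arithmetic wraps like the hardware does. */
static uint64_t lea_example(uint64_t rax, uint64_t rdi) {
    return rax + rdi * 8 + 8;
}
```

A compiler targeting x86-64 may well emit a single `lea` for a function like this, though that depends on the compiler and flags.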
<geist>
and yes x86 code tends to be pretty dense
<geist>
thumb2 gets pretty close on average
<geist>
and riscv + compressed gets within i dunno 20% or so of x86 size
<geist>
the new APX stuff is interesting too, since it once again extends x86 with a new prefix (REX2) to 32 regs
<geist>
REX2 basically acts like REX + another bit for more registers
<geist>
but it also replaces some of the other prefixes so the code density isn't too much worse
<geist>
same as the VEX prefix, there's a field/bit that says 'this is a 0x0f or a ... instruction' which removes a byte or two out of the instruction you're adding a prefix to
<geist>
which ends up with a net of maybe one more byte, and sometimes the same
<heat>
x86 is the linux kernel of architectures
<heat>
large set of really clever hacks that look really ugly
<geist>
yah the part where they encode common prefix bytes into the VEX and VEX2 is pretty clever: Two p bits encode additional prefix bytes. The values 0, 1, 2, and 3 correspond to implied no, 0x66, 0xF3, and 0xF2 prefixes. These encode the operand type for SSE floating-point instructions: packed single, packed double, scalar single and scalar double, respectively.
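The mapping in that quote is small enough to write down as a table. A minimal sketch, with the hypothetical name `vex_pp_prefix`, using 0 to stand for "no implied prefix":

```c
#include <stdint.h>

/* Map the 2-bit VEX "pp" field to the legacy prefix byte it implies.
   0 means no implied prefix. Values follow the quoted description:
   0 -> none, 1 -> 0x66, 2 -> 0xF3, 3 -> 0xF2. */
static uint8_t vex_pp_prefix(unsigned pp) {
    static const uint8_t prefix[4] = { 0x00, 0x66, 0xF3, 0xF2 };
    return prefix[pp & 0x3u];
}
```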
cloudowind has quit [Ping timeout: 255 seconds]
cloudowind has joined #osdev
<nikolar>
heat: i think x86 is really underappreciated
<nikolar>
as in, everyone likes to crap on it
<nikolar>
geist: yeah that's pretty clever
<nikolar>
replace the whole prefix with a single bit :P
<heat>
x86 runs the world
<heat>
well, arm too
<heat>
darn you embedded and android
<nikolar>
lol
<nikolar>
what i mean is everyone likes to crap on x86, without realizing that arm or riscv aren't unconditionally better or whatever
<heat>
x86 is in many ways crap, in many ways good, in many ways disgusting
<heat>
that's just objectively true
<nikolar>
sure
<nikolar>
but most people focus on disgusting
<heat>
but yes arm and riscv aren't unconditionally better
<nikolar>
without saying anything about good
<heat>
fwiw stuff like address calculation and loading of constants and [...] are mega yucky in RISC
<heat>
you need 2 (two) instructions for a call in riscv
<nikolar>
yeah
<nikolar>
luiw and auipc or whatever it was
<heat>
auipc + jalr
<nikolar>
while in x86, it's mov $1234, %rax
<nikolar>
:P
<heat>
and those two instructions have the most yucky semantics ever like "load this address but excluding bits 22-0"
<heat>
22-bit relocations are crazy
<heat>
auipc (Add Upper Immediate to Program Counter): this sets rd to the sum of the current PC and a 32-bit value with the low 12 bits as 0 and the high 20 bits coming from the U-type immediate
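To make the "yucky semantics" concrete, here is a sketch of how a PC-relative offset gets split across the `auipc` and the following `jalr`/`addi`, assuming RV32 so everything wraps at 32 bits. The type `pcrel_pair` and both function names are hypothetical. The awkward part is that the low 12 bits are sign-extended by the second instruction, so the upper part has to be rounded up whenever bit 11 of the offset is set.

```c
#include <stdint.h>

/* Hypothetical helper type: the two immediates of an auipc + jalr/addi pair. */
typedef struct {
    uint32_t hi20;  /* U-type immediate for auipc */
    int32_t  lo12;  /* signed 12-bit immediate for the second instruction */
} pcrel_pair;

static pcrel_pair split_pcrel(uint32_t offset) {
    pcrel_pair p;
    /* Sign-extend the low 12 bits using only well-defined arithmetic. */
    p.lo12 = (int32_t)((offset & 0xFFFu) ^ 0x800u) - 0x800;
    /* Adding 0x800 first compensates for lo12 being negative when bit 11 is set. */
    p.hi20 = ((offset + 0x800u) >> 12) & 0xFFFFFu;
    return p;
}

/* What the pair computes on RV32: auipc adds hi20 << 12 to pc, then lo12 is added. */
static uint32_t pcrel_target(uint32_t pc, pcrel_pair p) {
    return pc + (p.hi20 << 12) + (uint32_t)p.lo12;
}
```

For an offset of `0x12345FFF` this gives `hi20 = 0x12346` and `lo12 = -1`, which is the off-by-one in the upper part that makes the disassembly confusing to read.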
<nikolar>
auipc is something like add upper immediate
<nikolar>
yeah
<nikolar>
very yucky
<heat>
lea sym(%rip), %rax go brrrrr
<nikolar>
i am not sure why having a 32 bit immediate was such a no no
<heat>
or mov sym(%rip), %rax even
<heat>
you can't have a 32-bit immediate if all your instructions are 32-bit
<nikolar>
well you can have a 32 bit inst and a 32 bit immediate right after :P
<heat>
we have just invented variable length encoding
<nikolar>
i know
<nikolar>
that's the point :P
<heat>
i assume all of this isn't too slow or yucky if you macro-op fuse those two
<heat>
the disasm looks yucky but it's kind of a clever way of maintaining fixed-length
Gooberpatrol66 has quit [Ping timeout: 260 seconds]
Gooberpatrol66 has joined #osdev
<geist>
that's the real key. riscv and arm rely a lot on microop fusing
<geist>
their optimization manuals (as well as x86) tell you to put certain instruction sequences next to each other
<geist>
so it can just see the whole pattern and flatten that out
<nikolar>
so instead of splitting like you might do on x86, they do fusing :P
<nikolar>
i think fusion is harder than splitting though
<geist>
depends. yeah but it's probably a huge win
<nikolar>
btw, geist, data16 prefix
<geist>
the obvious one that pretty much all big arches do is something like 'test + conditional branch'
<nikolar>
does it do anything on x86_64
<nikolar>
or is it just a nop prefix to modify the instruction length
<geist>
0x66?
<nikolar>
not sure, i just notice it occasionally in disassembly
<nikolar>
like data16 nop whatever whatever
<zid>
data16 is 0x66
<nikolar>
that one then
<geist>
yeah 0x66. that still works
<geist>
you're seeing it as a useless prefix to nop to pad out space
<nikolar>
does it do anything
<geist>
because there's a bunch of silly ways to generate variable length nops
<geist>
yes. to a real instruction you wont see it separate like that
<nikolar>
ah ok
<nikolar>
that's what that is then
<geist>
if you look at the disasm and the hex bytes you'll see it
<zid>
0: ff c0 inc eax
<zid>
5: 48 ff c0 inc rax
<zid>
2: 66 ff c0 inc ax
<nikolar>
right fair enough
<nikolar>
i wasn't paying attention to the bytes
<zid>
data16 inc eax -> 66 ff c0
<zid>
magic
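The three encodings pasted above differ only in the prefix byte, which a tiny encoder sketch makes visible. This assumes 64-bit mode and only handles `inc` on ax/eax/rax; the function name `encode_inc_acc` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode `inc` on the accumulator (ax/eax/rax) in 64-bit mode.
   Writes up to 3 bytes into buf and returns the count,
   or 0 for an operand size this sketch does not support. */
static size_t encode_inc_acc(unsigned bits, uint8_t *buf) {
    size_t n = 0;
    if (bits == 16) {
        buf[n++] = 0x66;   /* operand-size override, shown as data16 */
    } else if (bits == 64) {
        buf[n++] = 0x48;   /* REX.W promotes the operation to 64 bits */
    } else if (bits != 32) {
        return 0;          /* 32 bits is the default and needs no prefix */
    }
    buf[n++] = 0xFF;       /* opcode FF /0: INC r/m */
    buf[n++] = 0xC0;       /* ModRM: mod=11, reg=/0, rm=000 */
    return n;
}
```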
<nikolar>
i would've probably noticed that
<heat>
66 is one of the recommended nop prefixes
<heat>
intel specifies like 5 or 6 nop sequences, amd specifies like 15 no joke
<zid>
capstone or whatever does it to show you 'in the assembly' that a 16bit override has been used on an instruction where it does nothing
<nikolar>
right yeah
<geist>
yah i was just playing with it in godbolt though it kept using a 32bit lea :)
spareproject has quit [Remote host closed the connection]
op has joined #osdev
bencevans has quit [Read error: Connection reset by peer]
sidcha has joined #osdev
bencevans has joined #osdev
<Ermine>
yay, my kernel patch got merged!
bauen1 has quit [Ping timeout: 255 seconds]
the_oz_ has joined #osdev
the_oz has quit [Ping timeout: 245 seconds]
housemate has joined #osdev
<dinkelhacker>
nice! :) What was it about?
<Ermine>
cosmetic changes to virtio-gpu driver
chiselfuse has quit [Remote host closed the connection]
chiselfuse has joined #osdev
raphaelsc has joined #osdev
hwpplayer1 has joined #osdev
heat has joined #osdev
<heat>
Ermine, noiiiice
<heat>
now you have to send patchen for onyx, thats just how it goes
levitating has joined #osdev
<Ermine>
o7
<nikolar>
wait what did i miss
<Ermine>
Not a lot I guess
<dinkelhacker>
I currently am doing a bit of benchmarking. I have a cortex m7 running at 64 MHz. Doing 2000 context switches takes ~150 ms, enabling icaches takes the time down to ~50ms. I'm wondering if that is a plausible number or if I should expect more speedup from that. I guess in the end many factors go into it but does anyone have a gut-feeling about that?
goliath has quit [Quit: SIGSEGV]
griddle has quit [Quit: griddle]
FreeFull has quit []
Arthuria has quit [Ping timeout: 276 seconds]
eddof13 has quit [Quit: eddof13]
pabs3 has quit [Ping timeout: 264 seconds]
xenos1984 has quit [Ping timeout: 264 seconds]
xenos1984 has joined #osdev
eddof13 has joined #osdev
bauen1 has quit [Ping timeout: 246 seconds]
pabs3 has joined #osdev
FreeFull has joined #osdev
xenos1984 has quit [Ping timeout: 252 seconds]
Left_Turn has joined #osdev
Turn_Left has quit [Ping timeout: 276 seconds]
xenos1984 has joined #osdev
the_oz_ has quit [Read error: Connection reset by peer]
annamalai has quit [Remote host closed the connection]
<zid>
heat: does akinator know about onyx yet, that's how you know if you've made it
cloudowind has quit [Ping timeout: 246 seconds]
cloudowind has joined #osdev
GeDaMo has quit [Quit: 0wt 0f v0w3ls.]
<heat>
no but my mom does
<heat>
thats way better
eddof13 has quit [Quit: eddof13]
goliath has joined #osdev
mcfrd has joined #osdev
pie_ has quit [Ping timeout: 252 seconds]
mcfrdy has quit [Ping timeout: 276 seconds]
n3t has quit [Ping timeout: 276 seconds]
pie_ has joined #osdev
vancz_ has joined #osdev
tanto_ has joined #osdev
wereii has quit [Ping timeout: 276 seconds]
vancz has quit [Ping timeout: 276 seconds]
tanto has quit [Ping timeout: 276 seconds]
n3t has joined #osdev
mcfrd is now known as mcfrdy
wereii has joined #osdev
eddof13 has joined #osdev
npc has quit [Remote host closed the connection]
<geist>
dinkelhacker: hmm that seems a bit slow to be honest
<geist>
that means each context switch is like 100 usec, which seems pretty slow
<geist>
should be closer to tens of ns
<geist>
though it is at 64 MHz.... is the cache enabled? m7s have i and d cache
<geist>
s/ns/us
<geist>
but should be IMO about 10x as fast as that
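[editor's note] The per-switch cost follows directly from dinkelhacker's figures. Here is a rough sketch of the arithmetic, assuming the 2000 switches account for the whole measured interval (the chat does not say what else the benchmark loop does). The function name `per_switch` is made up for illustration.

```python
CPU_HZ = 64_000_000   # Cortex-M7 clock from the chat
SWITCHES = 2000       # context switches per benchmark run


def per_switch(total_ms: int) -> tuple[int, int]:
    """Return (microseconds, CPU cycles) spent per context switch."""
    us = total_ms * 1000 // SWITCHES
    cycles = us * CPU_HZ // 1_000_000
    return us, cycles


no_icache = per_switch(150)    # run without the instruction cache
with_icache = per_switch(50)   # run with the instruction cache enabled

print("no icache:   %d us, %d cycles" % no_icache)
print("with icache: %d us, %d cycles" % with_icache)
```

That works out to 75 us (4800 cycles) per switch without the icache and 25 us (1600 cycles) with it, which is the basis for the "seems a bit slow" reaction.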
<heat>
actually how fast are modern x86 context switches?
eddof13 has quit [Quit: eddof13]
<geist>
maybe 500ns?
<geist>
depends on precisely how much work is being done
<geist>
but i seem to remember something like interrupt latency is like 400ns or so with some state pushed, etc
<geist>
so it's in that ballpark
<geist>
probably a full preempt with interrupt, context switch, iret is maybe 1us?
<geist>
and those numbers are fairly moderate consumer hardware. you can scale that in either direction a few times
netbsduser` has quit [Ping timeout: 246 seconds]
<heat>
actually what do you mean with "context switch"
<heat>
cuz that's hard to define, right?
<heat>
just popping registers and iretq'ing should be pretty fast and straightforward
<heat>
locks (even uncontended ones) should be somewhat expensive - my measurements on my hardware point to around 6ns per lock cmpxchg
<heat>
mov cr3 i've never benched, wrmsr's usually have long latency
<nikolar>
yeah i wonder what exactly he's measuring
<heat>
one of these days i ought to implement x86 asids
<nikolar>
heh
<geist>
yeah i'm generally defining it as just sort of the plain operation
<heat>
arm tlb management is incredibly goated btw
<geist>
assuming no giant locks, etc
<geist>
heat: yesssss you are coming to the dark side
<geist>
once you grok it the ARM mmu is goat
<heat>
the arm mmu isn't That Hard once you go through the "wtf are all these bits" moment
<heat>
and tlbi is worth.
<heat>
well, uhh, bbm is fucking annoying
<geist>
yeah
<geist>
and once you get past v8.0 features pretty much all of the new ones can be ignored until you're ready to use them
<geist>
either because the default behavior is the same as before or you have to enable it via some bit
Turn_Left has joined #osdev
<heat>
there's this really funny linux kernel optional optimization feature that does write-protection of some kernel memory
<heat>
this was accidentally enabled on ARM until they realized it was UNPREDICTABLE
<heat>
since this is pretty rare, the solution google is proposing seems to involve stopping other cores for a while while they pull the BBM off
<heat>
also, yay google arm64 servers
Left_Turn has quit [Ping timeout: 260 seconds]
<heat>
geist, hmm is it possible for nr_asids < nr_cpus on arm64
<heat>
i was briefly thinking about ASID assignment when nr_asids < nr_processes. i guess you kinda end up with some weird logic where you need a wildcard ASID... though i guess ASID 0 is that? i don't remember all the details
<geist>
heat: yeah some cores have 8 bit asids
<geist>
notably apple cores
<heat>
i was more worried about something awful like the thunderx2 having 8 bit asids with the 300 cores
<heat>
but asid per-cpu doesn't get you anywhere anyway
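[editor's note] One way to handle nr_asids < nr_processes is a generation counter: when the ASIDs run out, bump the generation, flush the TLB, and hand ASIDs out again from the start. The toy model below is a sketch of that idea only, not what anyone in the chat describes implementing. It is single-CPU, it reserves ASID 0 as an assumption, and every name in it (`AsidAllocator`, `switch_to`, `tlb_flushes`) is hypothetical.

```python
class AsidAllocator:
    """Toy generation-based ASID allocator (single CPU, illustration only)."""

    def __init__(self, nr_asids: int):
        self.nr_asids = nr_asids
        self.generation = 1
        self.next_asid = 1      # assumption: ASID 0 is reserved in this sketch
        self.tlb_flushes = 0    # counts simulated full TLB flushes
        self.assigned = {}      # pid -> (generation, asid)

    def switch_to(self, pid: int) -> int:
        gen, asid = self.assigned.get(pid, (0, 0))
        if gen == self.generation:
            return asid         # ASID still valid, no flush needed
        if self.next_asid >= self.nr_asids:
            # Out of ASIDs: start a new generation and flush everything,
            # which invalidates every ASID handed out before.
            self.generation += 1
            self.next_asid = 1
            self.tlb_flushes += 1
        asid = self.next_asid
        self.next_asid += 1
        self.assigned[pid] = (self.generation, asid)
        return asid


alloc = AsidAllocator(nr_asids=4)           # usable ASIDs: 1, 2, 3
first = [alloc.switch_to(p) for p in (10, 11, 12)]
again = alloc.switch_to(10)                 # same generation, same ASID
rolled = alloc.switch_to(13)                # forces a rollover
stale = alloc.switch_to(10)                 # old generation, gets a new ASID
```

On a multi-core system the rollover is the hard part, since ASIDs that are live on other CPUs at that moment need extra care; this sketch ignores that entirely.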
netbsduser` has joined #osdev
eddof13 has joined #osdev
netbsduser` has quit [Ping timeout: 248 seconds]
eddof13 has quit [Quit: eddof13]
netbsduser` has joined #osdev
Left_Turn has joined #osdev
remexre has quit [Ping timeout: 252 seconds]
Turn_Left has quit [Ping timeout: 244 seconds]
remexre has joined #osdev
Left_Turn has quit [Read error: Connection reset by peer]