<heat>
i was going to ask why jmp couldnt just alias to mov ..., pc but I realize that it's a very inefficient jump opcode
[itchyjunk] has quit [Ping timeout: 268 seconds]
<geist>
Yah, there are a few arches where PC is a regular register like that; PDP-11 and arm32 are two of them
<geist>
However, arm32 still has regular branch instructions too, mostly so you can get more bits for an offset
<geist>
So you had both: b<cc>, bl <offset>, blx <register>, and a bunch of mov/add/sub/etc pc stuff
<geist>
Actually kinda a mess when you think about it, since there are a bunch of ways to do the same thing
<geist>
Neat when you’re writing assembly, but hard to optimize
dude12312414 has quit [Remote host closed the connection]
dude12312414 has joined #osdev
<geist>
Though it’s sort of a generic risc thing to do (the arm32 solution), in another way it’s sort of anti-risc because you have multiple ways to accomplish the same thing
<geist>
So it depends on how you look at it
[itchyjunk] has joined #osdev
<geist>
In the case of arm64 at least all of the regular registers have no special purpose *except* x30, which is also known as lr. It’s implicitly used in bl and ret instructions
<geist>
Riscv is sort of cleaner and worse, since none of the registers are special-cased *except* that the arch highly recommends you use certain regs for SP, RA, etc. It goes out of its way to say that if you use them in those roles, implementations may optimize for it
<geist>
Which really means they may as well be special regs. I think this was the crux of ARM's decision in arm64 to do that
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
<heat>
yeah
<heat>
also that reminds me
<heat>
how do call stacks look?
<heat>
how do you unwind the stack?
<heat>
is it assumed you're saving lr at the top of the stack before doing bl?
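For reference, one common arm64 arrangement (the AAPCS64 frame-record convention, when frame pointers are enabled) has the callee rather than the caller save lr: the prologue stores the x29/x30 pair on the stack and points x29 at it, so the saved frame pointers form a linked list. Below is a minimal sketch of walking such a chain, using a hand-built chain instead of real stack memory. `frame_record` and `unwind` are hypothetical names, not from any particular kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout of an AAPCS64 frame record: x29 points at this pair. */
struct frame_record {
    const struct frame_record *prev_fp; /* caller's x29, NULL at the outermost frame */
    uintptr_t saved_lr;                 /* return address into the caller */
};

/* Walk the chain starting at fp, storing up to max return addresses in out.
 * Returns the number of frames visited. */
static size_t unwind(const struct frame_record *fp, uintptr_t *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL || max == 0);
    while (fp != NULL && n < max) {
        out[n++] = fp->saved_lr;
        fp = fp->prev_fp;
    }
    return n;
}
```

On real hardware the starting point would be the current frame pointer (for example via the GCC/Clang builtin `__builtin_frame_address(0)`), and a robust unwinder would also validate each pointer before dereferencing it.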
<zid>
I tend to use tweezers and a slightly blunt scalpel for scraping
<dh`>
well no, a special reg is really one that's not in the general register file so you can't address it in most instructions, and that's definitely bad for the stack pointer
<dh`>
for ra, not so clear
heat has quit [Ping timeout: 240 seconds]
vai has joined #osdev
freakazoid333 has quit [Ping timeout: 255 seconds]
<geist>
dh`: depends. lets the cpu optimize it better if push/pops are always against a particular special register. key is riscv says 'the stack can be anything' but then it basically says 'it really needs to be this one'
<geist>
same with ra
<geist>
so it's competing goals of a generic ISA that's clean and pure and one that is designed for high speed implementations, but of course i'm sure these topics have been beaten to death elsewhere
<geist>
but i think really a lot of these boil down to 'i like how arm64 does it' vs 'riscv is designed to be extremely simple' so it's a give and take. both are nice in various ways
<clever>
geist: how did riscv deal with hypervisors and nested virtualization?
<geist>
nested virt i dunno, hypervisors it's kinda more similar to x86 style than ARM style
<geist>
basically in supervisor mode you can (if you're allowed to) set up a set of banked registers and then switch to virtualized mode, where the cpu appears to be running in supervisor but is really using the banked copies
<geist>
traps out as usual, nested paging as usual
<geist>
so it's kinda a side step, or maybe if you considered supervisor to be EL1, you drop to EL0.9 or something
<geist>
nested seems like it'd be kinda straightforward though you'd have to trap fiddling with those regs
<geist>
i think initially they were planning on doing hard nested levels, a-la armv8.
<geist>
there's even an unused mode bit combination for it
<geist>
note banking the control registers like that on riscv is fairly straightforward since there are only a handful of them. less than 10 or so
<geist>
so it's pretty simple, at least for now. as more state is inevitably added to the architecture, we'll see
<clever>
ah
<geist>
dunno if linux has KVM support for that yet, but qemu has emulation support for it at least
<clever>
i was talking in #arm on the osdev discord, about how to handle the traps, for my LK hypervisor idea i mentioned before
<clever>
and the basic conclusion i came to, is that you can treat it like `eret` returns, kinda
<bslsk05>
github.com: linux/arch/riscv/kvm at master · kvm-riscv/linux · GitHub
<vai>
Sane virtualization A.I. start up am planning.
<clever>
the 1st half, will save all EL2 gprs, restore the EL1 gprs, restore the spsr/elr, and eret down to EL1, leaving a bunch of saved state on the EL2 stack
<geist>
yep, that seems like a pretty standard way to do it
<clever>
the "exception from lower level" vector, will then assume the EL2 stack is in that state, save the EL1 regs, restore the EL2 regs, and then `bx lr`
<clever>
so the function from the 1st half will return, like a normal function
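The shape of that design can be sketched in portable C, with setjmp/longjmp standing in for the register save/restore and `eret`. This is only an analogy under stated assumptions, not real EL2 code: `run_guest`, `guest_trap`, and the exit codes are hypothetical names, and a real implementation would be assembly operating on the EL2 stack.

```c
#include <assert.h>
#include <setjmp.h>

enum exit_reason { EXIT_NONE = 0, EXIT_HVC = 1, EXIT_MMIO = 2 };

static jmp_buf host_ctx;              /* stands in for host state left on the EL2 stack */
static enum exit_reason pending_exit; /* why the guest stopped running */

/* "Exception from lower level" vector: record the reason, restore the
 * host state, and resume inside run_guest(). */
static void guest_trap(enum exit_reason why)
{
    assert(why != EXIT_NONE);
    pending_exit = why;
    longjmp(host_ctx, 1);
}

/* First half: save host state, then "eret" into the guest. From the
 * caller's point of view this returns like a normal function. */
static enum exit_reason run_guest(void (*guest_entry)(void))
{
    pending_exit = EXIT_NONE;
    if (setjmp(host_ctx) == 0)
        guest_entry();                /* guest runs until it traps or returns */
    return pending_exit;
}

/* Two toy guests: one traps, one simply returns. */
static void demo_guest(void) { guest_trap(EXIT_MMIO); }
static void idle_guest(void) { }
```

A host loop would then look like `switch (run_guest(entry)) { ... }`, handling each exit reason and calling `run_guest` again to re-enter the guest.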
<geist>
and you see how the EL2 virtualization feature bit makes that even simpler
<geist>
since it removes the need to bounce through EL2 for the host kernel
<clever>
yeah, just run LK entirely in EL2
<clever>
and either use that optional feature to alias EL1->EL2, or rewrite half of LK to use the EL2 names
<geist>
yah the latter would probably involve macroizing most of it
<geist>
so you could set a build flag
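One way such a build flag could work is to generate the system register name from a macro, so every access site picks up the configured exception level. A minimal sketch, assuming a hypothetical `KERNEL_EL` flag and `SYSREG` macro (neither is taken from LK):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical build flag: the exception level the kernel is built for. */
#ifndef KERNEL_EL
#define KERNEL_EL 2
#endif

_Static_assert(KERNEL_EL >= 1 && KERNEL_EL <= 3, "KERNEL_EL must be 1, 2 or 3");

/* Two-level expansion so KERNEL_EL is replaced by its value before
 * being turned into a string. */
#define SYSREG_STR_(name, el) #name "_el" #el
#define SYSREG_STR(name, el)  SYSREG_STR_(name, el)

/* Register name for the configured level, e.g. "sctlr_el2". */
#define SYSREG(name) SYSREG_STR(name, KERNEL_EL)

#if defined(__aarch64__)
/* Example access site; only meaningful when actually running at KERNEL_EL. */
static inline uint64_t read_sctlr(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, " SYSREG(sctlr) : "=r"(v));
    return v;
}
#endif

/* Returns 1 if this build uses the EL2 register names. */
static int built_for_el2(void)
{
    return strcmp(SYSREG(sctlr), "sctlr_el2") == 0;
}
```

Building with `-DKERNEL_EL=1` or `-DKERNEL_EL=3` switches every `SYSREG(...)` use at once, which is the "compile time thing" discussed here. Registers that exist at only some levels (TTBR1, for example) would still need separate handling.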
<clever>
exactly
<geist>
probably not too terribly difficult
<clever>
and once its a build flag, you could also run LK in EL3 as well
<geist>
had to basically do that for riscv so it can run in M and S mode
<geist>
indeed
<clever>
and macro the "guest" stuff a bit, and boom, you now have a secure firmware
<geist>
that's a fun project, get to it!
<geist>
and do it in lots of commits, that's much easier for me to take
<geist>
build the mechanism first, then commits to start converting parts
<clever>
i was thinking the first step, would be to just allow LK to run normally in EL2 or EL3
<clever>
without actually doing anything different while in those levels
<geist>
right
<geist>
but even more so than that, figure out the mechanism to statically build it for the different modes
<clever>
yeah
<geist>
find all the places where registers are referred and see about making it a compile time thing
<geist>
prototype a mechanism, then find all the places where it'd need to get changed and see what happens
<clever>
yeah
<geist>
EL2 is difficult to generically run in, however, just as a warning. it doesn't support all of the features that you think exist in EL1
<clever>
for example?
<geist>
or at least not without the feature that makes it look like EL1
<geist>
like TTBR1
<geist>
'high' virtual memory
<clever>
ah, for the high/low split
<clever>
kinda makes sense, you would just have a single virtual view for the whole hypervisor, and EL2 isnt meant to be dealing with EL0
<geist>
basically that EL2 feature (forget the FEAT_* name) basically 'completes' EL2 while also giving it an ability to alias EL1 registers
<clever>
and then its EL1's job to deal with EL0 and need a split
<geist>
anyhoo, gotta go. doing an escape room in a bit. should be fun!
<geist>
ttyl! unless i dont make it out
<clever>
yep, laters
<clever>
lol
gog has quit [Ping timeout: 245 seconds]
<vai>
Well, that was 30 minutes or so spent walking.
<vai>
oops :) sorry
frkzoid has joined #osdev
<vai>
ramdisk driver is not good, but having buggy block drivers is much easier to fix than core file system corruptions, no file system corruptions
[itchyjunk] has quit [Remote host closed the connection]
gildasio has quit [Remote host closed the connection]
gildasio has joined #osdev
gildasio has quit [Remote host closed the connection]
gildasio has joined #osdev
<moon-child>
froggey: follow up to your question from many months ago: https://weinholt.se/scheme/alignment-check.pdf says 'Using alignment checking does therefore not appear to incur any overhead as long as programs do not generate an excessive number of exceptions.'
the_lanetly_052_ has joined #osdev
the_lanetly_052 has quit [Ping timeout: 240 seconds]
jjuran has quit [Quit: Killing Colloquy first, before it kills me…]
jjuran has joined #osdev
vdamewood has joined #osdev
gxt___ has quit [Remote host closed the connection]
gxt___ has joined #osdev
terminalpusher has joined #osdev
Vercas6 has quit [Quit: Ping timeout (120 seconds)]
Vercas6 has joined #osdev
terminalpusher has quit [Remote host closed the connection]
<froggey>
moon-child: oh interesting, thanks!
poyking16 has quit [Ping timeout: 268 seconds]
poyking16 has joined #osdev
gog has joined #osdev
socksonme_ has joined #osdev
poyking16 has quit [Ping timeout: 268 seconds]
poyking16 has joined #osdev
poyking16 has quit [Ping timeout: 240 seconds]
Vercas6 has quit [Remote host closed the connection]
Vercas6 has joined #osdev
poyking16 has joined #osdev
[itchyjunk] has joined #osdev
poyking16 has quit [Ping timeout: 268 seconds]
poyking16 has joined #osdev
gildasio has quit [Remote host closed the connection]
gxt___ has quit [Remote host closed the connection]
gxt___ has joined #osdev
gildasio has joined #osdev
node1 has joined #osdev
<node1>
Hi
<node1>
Can we consider a snapshot to be (full + incremental) data?
poyking16 has quit [Ping timeout: 245 seconds]
<mjg>
i think you may want to provide some context for the question
poyking16 has joined #osdev
terminalpusher has joined #osdev
<node1>
sure. There is an option in the hypervisor for full and linked clones. And it says a `link clone has new snapshot will be created in the original virtual machine`
<node1>
So I would like to understand whether this snapshot is incremental copies, differential copies, or full + incremental copies?
<mjg>
that probably depends on the hv, you should ask on the related channel
<mjg>
there are funny hv-specific formats for achieving this
poyking16 has quit [Ping timeout: 268 seconds]
node1 has quit [Ping timeout: 252 seconds]
node1 has joined #osdev
node1 has quit [Ping timeout: 252 seconds]
SpikeHeron has quit [Quit: WeeChat 3.0]
vai has joined #osdev
<vai>
way back
<kazinsal>
in most hypervisor platforms a "snapshot" consists of functional metadata + the delta
<kazinsal>
in which the delta of the persistent storage is only valid assuming the source is unchanged
<kazinsal>
in dangerously common parlance, that condition not being met means that the resulting system is "undefined"
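The base-plus-delta idea can be sketched as a tiny copy-on-write overlay: writes land in the delta, reads fall through to the base for blocks the delta does not hold. This is a toy model with made-up sizes and hypothetical names (`disk`, `overlay`), not the on-disk format of any real hypervisor.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NBLOCKS    8
#define BLOCK_SIZE 4

struct disk { uint8_t blocks[NBLOCKS][BLOCK_SIZE]; };

/* Delta layer: holds only the blocks written after the snapshot. */
struct overlay {
    const struct disk *base;             /* source disk; assumed unchanged */
    uint8_t present[NBLOCKS];            /* 1 if the block lives in the delta */
    uint8_t blocks[NBLOCKS][BLOCK_SIZE];
};

static void overlay_init(struct overlay *o, const struct disk *base)
{
    memset(o, 0, sizeof(*o));
    o->base = base;
}

/* Writes never touch the base; they are recorded in the delta. */
static void overlay_write(struct overlay *o, unsigned blk, const uint8_t *data)
{
    assert(blk < NBLOCKS);
    memcpy(o->blocks[blk], data, BLOCK_SIZE);
    o->present[blk] = 1;
}

/* Reads prefer the delta and fall through to the base otherwise. */
static const uint8_t *overlay_read(const struct overlay *o, unsigned blk)
{
    assert(blk < NBLOCKS);
    return o->present[blk] ? o->blocks[blk] : o->base->blocks[blk];
}
```

Because unwritten blocks are read straight from the base, modifying the base after the overlay exists silently changes what the overlay returns, which is the "undefined" condition described above.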
<kazinsal>
nobody clones the base disk for a couple of reasons
<kazinsal>
most notably, it means hitting the snapshot button immediately doubles the storage requirements of the VM
<kazinsal>
whether or not you implement that by freezing the VM and cloning the disk on backing media then unfreezing and forking doesn't realistically matter
Piraty has quit [Quit: -]
Piraty has joined #osdev
heat has joined #osdev
Piraty has quit [Client Quit]
Piraty has joined #osdev
<vai>
Writing FAT takes a long time, so I am using a 2 MB HD image :) in QEMU.
heat has quit [Remote host closed the connection]
heat has joined #osdev
dutch has joined #osdev
Piraty has quit [Quit: -]
poyking16 has joined #osdev
Piraty has joined #osdev
terminalpusher has quit [Remote host closed the connection]
Vercas6 has quit [Ping timeout: 268 seconds]
Piraty has quit [Quit: -]
vai has quit [Ping timeout: 245 seconds]
Piraty has joined #osdev
Vercas6 has joined #osdev
Messier81 has joined #osdev
Messier81 has quit [Ping timeout: 245 seconds]
the_lanetly_052_ has quit [Ping timeout: 245 seconds]
carbonfiber has quit [Quit: Connection closed for inactivity]
gildasio has quit [Ping timeout: 268 seconds]
gildasio has joined #osdev
the_lanetly_052_ has joined #osdev
the_lanetly_052_ has quit [Ping timeout: 245 seconds]
Vercas6 has quit [Remote host closed the connection]
Vercas6 has joined #osdev
<heat>
here's a weird gcc detail: it will not use cmpxchg16b for __atomic intrinsics but will for __sync
<heat>
it's explicitly documented that way
<heat>
why?
<zid>
C++ abi has some weird peculiarities wrt primitives
<zid>
but don't ask me what they are
<heat>
what are they
* heat
is a rebel
<psykose>
i'm telling mom
gildasio has quit [Quit: WeeChat 3.6]
gildasio has joined #osdev
gildasio has quit [Client Quit]
<heat>
how usable is the rpi 400?
<heat>
for both desktop lunix and osdev
<psykose>
it's an overclocked rpi4, i'd imagine it's fine
<heat>
yes but erm, how usable are those
<heat>
like is it actually decently powerful
<psykose>
it depends on your expectations
<psykose>
no, it fucking sucks
<psykose>
i host a bunch of shit out of mine
<psykose>
but it's good for what it is and what it cost
<heat>
define "fucking sucks"
<psykose>
for me to define that you'd have to define decently powerful
<heat>
upper range is "gives you a decent desktop experience and with firefox", lower range is "can it compile things in a reasonable amount of time"
<psykose>
i can bench a compile for ya if you want
<psykose>
the former.. definitely not
<psykose>
(though i haven't tried? maybe it's magic)
<j`ey>
heat: my linux kernel build on rpi takes 30mins
<j`ey>
for a cut down kernel
<heat>
fucking what
<heat>
oh
<psykose>
yeah that sounds about right
<heat>
that's horrible
<j`ey>
and that's a specific kernel, with as much as possible turned off
<psykose>
it's like a pentium4 or so lmao, isn't it
<heat>
what's the gold standard for a relatively affordable arm64 machine then?
<psykose>
there isn't one
<psykose>
all the sbcs are in this range
<psykose>
if you want to pay more into the larger category.. probably those clearfog boards
<heat>
I don't need an sbc, just some sort of board or laptop even
<heat>
well, I don't need one, I want one
<j`ey>
m1? :P
<psykose>
>affordable
<heat>
hahahaha
<heat>
you gave me an idea
<j`ey>
im sure heat can afford that!
<heat>
correct
<heat>
otoh
<heat>
it's a macbook
<j`ey>
pinebook pro? but I dont think its much faster
<bslsk05>
www.theregister.com: Open source laptop PineBook Pro is shipping again • The Register
<heat>
j`ey, make arm64_defconfig sounds about right?
<heat>
wait, is that even a thing?
<heat>
looks like it
<heat>
ok no jk
<heat>
ah, it's make defconfig
<j`ey>
yeah that ^
<heat>
what's the kernel's target for arm64?
<heat>
it's not bzImage
<Ermine>
rpi 400 reminds me of commodore 64
<j`ey>
Image
<j`ey>
heat: ^
<heat>
yeah
<heat>
because you geniuses got rid of compression
<heat>
smh
<j`ey>
let the bootloader do it
<Ermine>
GeDaMo: UK keyboard version is out of stock already
<heat>
oh well
<heat>
i'm fucked
<heat>
I was trying to compile linux on my phone
<heat>
mission failed
<Ermine>
oom?
<heat>
no
<heat>
there's some libc/kernel header fuckery going on inside termux
<heat>
could I use an adb shell? shrug
<Ermine>
and use stuff from termux?
<heat>
no
<heat>
and just compile it
<heat>
like, can base AOSP compile linux
<Ermine>
ah, you have root access
<heat>
hm?
<Ermine>
if you want to compile linux with adb shell, you need to install gcc, and you would need to have root rights for it
<heat>
termux has a copy of LLVM
<heat>
you do not need root to install software
<Ermine>
ah
<psykose>
writing files to the disk needs root
<heat>
no it doesn't
<psykose>
reading them back needs root too
<psykose>
looking at your phone? that needs root as well
<Ermine>
on android, there's a whole bunch of partitions
<heat>
unlocking your phone needs root
<psykose>
oi, you got a root license mate?
<Ermine>
buying the phone needs root
<heat>
ok the answer seems to be "no, it's not possible"
<psykose>
see? you need root
<heat>
however, I found out I have an ld installed
<heat>
for some reason
<heat>
MCLinker
<Ermine>
cursed idea: subscription access to root
<heat>
love the idea
<heat>
ok i think i was just missing packages
<heat>
ok no im kinda fucked
<Ermine>
go to ur computer
<Ermine>
typing is tiring af
<heat>
im on my computer
<heat>
i just wanted to benchmark it
<heat>
theoretically the cpu should be pretty decent
<zid>
heat how tall are you
<gog>
meow
<gog>
good evening friendos
<zid>
I know gog is 290cm tall
<gog>
yes
<gog>
293 after yoga
<psykose>
lorge fish
poyking16 has quit [Ping timeout: 240 seconds]
poyking16 has joined #osdev
srjek has joined #osdev
srjek|home has joined #osdev
srjek has quit [Ping timeout: 244 seconds]
rorx has quit [Ping timeout: 244 seconds]
<geist>
re: rpi400. think of it as like mini netbook class stuff. hard to give you a good frame of reference
<geist>
but it's like say a very lowly clocked modern x86 with 4 cores and no SMT
<clever>
SMT?
<geist>
hyperthreading
<geist>
SMT (simultaneous multithreading) is the generic term for that
<zid>
HT is an intel trademark thingy yea
<Ermine>
I'm thinking about how to get pine64 tech in a country which they do not ship to
<psykose>
i would guess you kinda just don't, unless you know someone that can bring one or has
<psykose>
because if anyone actually did it with the intention of resale, it would be more expensive than something that already does ship
<Ermine>
or travelling to a country which is on the list
<psykose>
yea
poyking16 has quit [Ping timeout: 245 seconds]
<heat>
zid, three football fields
<psykose>
that's tiny
<heat>
geist, yeah but if it takes 30 minutes to compile a very cut down linux kernel then it's super super underpowered I'd say
<heat>
psykose, it gets big when I get hard, I swear
<psykose>
prove it
<heat>
😳
chartreuse has joined #osdev
<heat>
I thought the a72s were decent though :/
<psykose>
nah
<psykose>
i mean it's fine, you're not going to be building linux on it
<heat>
I would like something capable of building my OS
<heat>
especially under itself for the fun of it
<psykose>
i can bench it for you if you want
<psykose>
though mine is overclocked to 2ghz
<j`ey>
heat: gcc takes an hour or so to build
<heat>
yeah erm forget it lol
<heat>
theoretically building my OS would require building a toolchain, building the base system (pretty quick), and then building packages (which builds a toolchain again, plus a bunch of shit)
<geist>
heat: sure. but then that's exactly expected
<geist>
i'm not really apologizing for it or whatnot, but think of it as a mid range desktop cpu from say 2010. runs on a few watts
<geist>
more like a downclocked 2010 cpu
<heat>
yeah sure
<geist>
things like compiling linux on it are probably not a thing you should do much, or at least use ccache
<zid>
everything post 2011 is trash, anything pre 2011 is antique, change my mind
<heat>
where are you supposed to get a decent arm64 machine without buying a server?
<geist>
OTOH one of the big downsides to the rpi line is storage is achingly slow, so you also need to factor that out of it. using MMC as a root is terrible
<geist>
heat: again define decent
<heat>
modern x86-like performance
<geist>
pay for a VM
<geist>
or get a mac
<heat>
that's kinda depressing
zhiayang has quit [Quit: oof.]
<zid>
arm's never had performance
<zid>
it's just had cheap
<geist>
such are many things in life
<heat>
are the arm chromebooks that bad?
<psykose>
adding usb3 storage helps
<geist>
right. apple of course changed that completely, at least proved that you *can* get performance out of the arm arch
<geist>
it's still just a matter of money and the desire to do so and building a team that does
<zid>
yea m1 is really rad
<zid>
whatever team they assembled knocked it out of the park
<zid>
whoever was in charge really knew what they were talking about
<geist>
in the sense that building a team to build intel/amd class desktop stuff only exists in a few places
<zid>
and got all the bottleneck predictions correct
<geist>
and so the ISA is really not the biggest part. it's the money/time/team/people power
<zid>
hardest part of a big ass project like that is invariably going to be "not knowing what the issues you will run into will be, ahead of time"
zhiayang has joined #osdev
<geist>
yes. and apple spent 10 years on that
<zid>
even 10 years is amazing
<geist>
you gotta build up the team, you can't just will it into being with money
<geist>
right
<zid>
they clearly struck gold on getting the right things earmarked as future issues
<geist>
but not to turn things into a love fest, it's just that it *can* be done
<zid>
you're not going to realize what you REALLY want is a super overpowered memory controller, just looking at the ISA or whatever
<geist>
heat: but seriously if you want a non x86 desktop experience apple is it. you might be able to start getting used M1 stuff as folks upgrade to M2
<mjg>
i thought non-x86 desktop experience is that nothing works
<zid>
it sorta works
<mjg>
in which case get yourself a talos box
<geist>
and of course there's lots of arm server stuff now, though it's server stuff so you really wont have one physically, but you can get a slice
<zid>
thanks to marcan
<geist>
mjg: what do you mean 'nothing works'?
<geist>
i take somewhat of an exception to that, lots of stuff works, just depends on precisely what you mean
<mjg>
i see that was rather unclear. up until recently even apple was x86
<geist>
like 'running x86 binaries directly?' probably not so well
<geist>
mjg: sure except when it wasnt, etc
<mjg>
so any non-x86 desktop was a bizarro "pray it works"
<mjg>
people running sparcstations, powerpc and other stuff
<geist>
again i disagree with that. a rpi400 runs a pretty nice desktop, for example
<geist>
it's just not super fast
<psykose>
pretty much everything works in my experience, even web browsers etc had 99% of everything working just because phones share the same architecture
<mjg>
it does?
<geist>
sure it does
<psykose>
the main thing people complain about is that they can't run random third party programs lol
<geist>
lots of people put in a lot of time making stuff work fine on non x86
<mjg>
i'm positively surprised, i stopped following rpis a long time ago
<geist>
its a bit disingenuous to just write it off as 'pray it works'
<mjg>
well, as a counter point, i got 2 different arm boards at work and some x86
<geist>
if it's open source it probably compiles fine on non x86 right now
<mjg>
wtf errata is rampant on the former
<geist>
what kinda errata?
<geist>
like, 'dont turn it on it catches on fire' or 'the cpu has a bug that you need a workaround for?'
<mjg>
for example the cpu might fuck up its own cache if you enable l2 prefetching
<geist>
that sounds like a piece of shit soc. i can assure you that most of them are pretty good
<geist>
but there are some vendors that do their own thing and fuck up stuff like caches *cough* broadcom
<mjg>
not saying no to the first sentence :P
<mjg>
i used to work for a company doing 100% embedded work, vast majority of it arm
<geist>
also gotta remember there are a bazillion vendors and whatnot out there. it's part of the problem that ARM has been trying to standardize on
<geist>
alas it only takes a few bad apples to sour everything a bit
<mjg>
the cpu bugs & doc bugs combined shortened lives by 10 years for each month of work
* geist
nods
<mjg>
fortunately i did not have to deal with that at the time
xenos1984 has quit [Read error: Connection reset by peer]
<geist>
anyway, also depends a bit on the class of board, class of soc, how cheap they are, quality of the vendor, etc
<mjg>
my fav bug reported by coworker: you had to write to a reg as little endian
<mjg>
but you read it as big endian
<mjg>
:s
<geist>
well anyway, i wouldn't let that cloud your whole view of the ecosystem
<geist>
shitty socs have very very little to do with the ARM ISA
<mjg>
i like to rant man
<geist>
or the quality of arm cores
<mjg>
i have no opinion of arm itself, apart from not liking ll/sc
<geist>
sure but also there are folks reading this that might not be commenting, but get a bad takeaway
<geist>
so i like to make sure there's a bit of balance to the rants
<mjg>
but that is already addressed with LSE, so...
<heat>
henlo
<mjg>
geist: fair
<geist>
i'm kinda torn about the ll/sc style atomics. i first really hit it on PPC and it seems like a pretty good solution. or at least a pretty flexible solution
<geist>
main downside is it's hard to fairly do it in a big.LITTLE situation and i think it has scaling issues
<geist>
which is why arm puts the new atomics under 'LSE' which stands for 'large system extensions'
rorx has joined #osdev
<geist>
in general the trend with a lot of these things in modern designs is to make do with fewer, more powerful instructions. ie, the CISCy style view sort of wins in the end for certain things since it gives the cpu more ability to do what it needs to do
<zid>
and also hardware accelerate things directly
* geist
goes to mention some new arm thing but then wonders if it's public, so doesn't
<mjg>
well the key point is that a more powerful instruction can be optimized in microcode
<zid>
rather than having to *guess* what the high level operation being done is
<mjg>
to give you an example
<mjg>
say you atomic add in multiple cpus at the same time
<geist>
mjg: yeah but instead of doing it across the board, can do it selectively. so atomics i think are a win win, even riscv uses single instruction atomics
<mjg>
they can cooperate and agree on end value
<mjg>
without every single one doing an explicit increment
<mjg>
huge deal in numa
<geist>
yah that's *probably* not what happens, but at least it keeps the cache line exclusive stuff to a more constrained situation
<mjg>
it does happen
<geist>
and yeah the ll/sc stuff is already a problem on big.LITTLE, since the big cores win the race every time
<geist>
we actually have observed it in zircon already. spinlocks are 'won' by big cores by some large margin
<geist>
much more so than the relative performance delta of the cores
<mjg>
have you tried fair locks like mcs?
<geist>
that's on the list of things to do, for this reason
<geist>
part of the complexity is doing it with ll/sc style locks, etc
<zid>
geist: That's just overlooked in general, that high clockrates etc give you better latency
<geist>
since those tend to work better with atomics
<mjg>
so you really got a big.LITTLE board with small cores not doing LSE?
<geist>
yes
<mjg>
now that sounds like cheap shit :-P
<zid>
running a 2GHz cpu for half the time, vs a 1GHz cpu all of the time isn't *actually* the same thing in practice
<geist>
mjg: yes.
<geist>
basically all of the a53s, a57s, a72s, a73s out there, which are probably still the majority of A-class arm cores out there, are not LSE
<geist>
it's a55+ and a75+ that went LSE (v8.1+)
<mjg>
[re that atomic optimization, it's a known idea: bunch of cpus on 1 node "agree" to bump something by n on another socket and then one goes ahead to do the work, assuming there is contention]
<geist>
and though that's been out for a while, there's lots of hardware out there that's pre that
<geist>
mjg: fairly certain none of the arm designs do that
<mjg>
i don't know what arm does
<geist>
they do atomics based on cache line exclusivity guarantees, even in LSE stuff
<mjg>
i do know arm suffers greatly when faced with cmpxchg loops et al
<geist>
especially since it's far more than add/subtract that you can do with atomics, and frequently *do* with atomics
<geist>
compare and swap is generally the primary thing you do
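A typical example of why compare-and-swap is the workhorse: a reader-lock acquire that must bump a counter only if no writer holds the lock, which a plain atomic add cannot express. The sketch below uses C11 atomics with a made-up lock layout (`RW_WRITER` bit plus reader count); it is not FreeBSD's or any other kernel's actual lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RW_WRITER 0x80000000u   /* hypothetical layout: top bit = writer held */

/* Take a read lock with a compare-and-swap loop: retry until we either
 * see a writer or install old + 1 without anyone racing us. */
static bool rw_try_read_lock(_Atomic uint32_t *lock)
{
    uint32_t old = atomic_load_explicit(lock, memory_order_relaxed);

    do {
        if (old & RW_WRITER)
            return false;       /* a writer holds the lock */
        /* on CAS failure, old is refreshed with the current value */
    } while (!atomic_compare_exchange_weak_explicit(
                 lock, &old, old + 1,
                 memory_order_acquire, memory_order_relaxed));
    return true;
}

/* Dropping a read lock needs no condition, so one atomic subtract will do. */
static void rw_read_unlock(_Atomic uint32_t *lock)
{
    uint32_t old = atomic_fetch_sub_explicit(lock, 1, memory_order_release);

    assert((old & ~RW_WRITER) != 0);   /* there must have been a reader */
}
```

Under contention every failed compare-and-swap is a wasted round trip for the cache line, which is one plausible reason loops like this scale differently across implementations.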
<mjg>
(e.g., the graviton cpus on amazon cloud)
<geist>
note FWIW those will be LSE for sure
<mjg>
i know graviton is lse, it still sucks :p
<geist>
neoverse-n1 and whatnot is a derivative of either a75 or a76 i forget
<geist>
there *is* however a cmpxchg instruction in LSE at least, so if they suck i suspect it's in the way the mesh of cpus work
<geist>
and how the caches are distributed
<mjg>
i don't have numbers handy, but bottom line: there was a performance bug in freebsd where a certain rw lock would get taken/released A LOT with a cmpxchg loop
<geist>
one of the caveats with LSE is they haven't removed the ll/sc stuff, so it has to coexist
<geist>
so the underlying mechanism has to be the same
<mjg>
32 or whatever core amd64 suffered 10%-ish slowdown
<mjg>
arm around 50%
<mjg>
similar core count
* geist
nods
<geist>
yah would be interesting to see what the interconnect looked like on whatever core it is
<geist>
also keep in mind there are multiple vendors building their own implementation. Ampere, Cavium, etc
<mjg>
i google around a little, apparently it was not just me :-P
<geist>
they may have radically different response to stuff
<geist>
which itself is a total PITA for kernel people
<geist>
since you can't really optimize it for one thing, necessarily
<mjg>
ye would be curious to bench realities of top-of-the-field cpus
<mjg>
in this aspect
<geist>
but yeah moving to ticket based spinlocks is on our list of things to do
<geist>
there's a key i haven't figured out with ARM and ticket locks with and without LSE (which we have to deal with)
<geist>
how to efficiently do it. with non LSE a plain spin based spin lock has a particular power optimization
<geist>
that involves looping, then failing the acquisition, then WFEing until someone releases it, and then spinning again
<mjg>
ticket spinlocks are one type of fair lock
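For readers unfamiliar with the idea, a ticket lock hands out numbers like a deli counter, so waiters get in strictly in arrival order no matter how fast their core is. A minimal sketch in portable C11 atomics (`ticket_lock` is a hypothetical name; this version simply busy-waits and leaves out the WFE-style power optimization discussed here):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct ticket_lock {
    _Atomic uint32_t next;     /* next ticket to hand out */
    _Atomic uint32_t serving;  /* ticket currently allowed in */
};

static void ticket_lock_acquire(struct ticket_lock *l)
{
    /* Taking a ticket is one atomic add; arrival order is now fixed. */
    uint32_t me = atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);

    while (atomic_load_explicit(&l->serving, memory_order_acquire) != me) {
        /* spin; a real kernel would put a cpu-relax style hint here */
    }
}

static void ticket_lock_release(struct ticket_lock *l)
{
    uint32_t cur = atomic_load_explicit(&l->serving, memory_order_relaxed);

    /* The lock must be held: at least one ticket is outstanding. */
    assert(cur != atomic_load_explicit(&l->next, memory_order_relaxed));
    atomic_store_explicit(&l->serving, cur + 1, memory_order_release);
}
```

Every waiter spins on the same `serving` word, so each release still touches a cache line shared by all waiters; queue locks avoid that at the cost of more bookkeeping.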
<geist>
it's quite efficient, because you aren't actually spinning
<mjg>
another one is queues, like mcs
<geist>
but i dont understand precisely how to get the same optimization with the mcs or ticket locks. i'm sure it's doable, but the ARM ARM hasn't updated it
<geist>
all their examples still show ll/sc locks
<mjg>
if it is anything like "pause" from x86, mcs should come with it for free
<geist>
that part of the arch is *extremely* tricky, and if you mess it up there are fairly dire power consequences
<geist>
it's closer to mwait
<j`ey>
mjg: WFE sounds like pause
<mjg>
you queue yourself in and then chill waiting for your turn
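"Queue yourself in and wait your turn" can be sketched as a textbook MCS lock in C11 atomics: each waiter brings its own node, appends it to a tail pointer, and spins only on a flag in its own node. The names are hypothetical and this is not concurrencykit's implementation; the spin loops are plain busy-waits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;            /* true while this waiter must keep waiting */
};

struct mcs_lock {
    _Atomic(struct mcs_node *) tail;
};

static void mcs_acquire(struct mcs_lock *l, struct mcs_node *me)
{
    assert(me != NULL);
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);

    /* Queue ourselves at the tail. */
    struct mcs_node *prev =
        atomic_exchange_explicit(&l->tail, me, memory_order_acq_rel);
    if (prev == NULL)
        return;                    /* queue was empty: the lock is ours */

    /* Link behind the previous waiter, then spin on our own node only. */
    atomic_store_explicit(&prev->next, me, memory_order_release);
    while (atomic_load_explicit(&me->locked, memory_order_acquire)) {
    }
}

static void mcs_release(struct mcs_lock *l, struct mcs_node *me)
{
    struct mcs_node *succ =
        atomic_load_explicit(&me->next, memory_order_acquire);

    if (succ == NULL) {
        struct mcs_node *expected = me;
        if (atomic_compare_exchange_strong_explicit(
                &l->tail, &expected, NULL,
                memory_order_acq_rel, memory_order_acquire))
            return;                /* nobody was waiting */
        /* A waiter swapped the tail but has not linked itself yet. */
        while ((succ = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL) {
        }
    }
    /* Hand the lock directly to the next waiter in line. */
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}
```

Since each waiter spins on a different address, a release wakes exactly one core, which is the property that makes queue locks attractive on large or asymmetric systems.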
<j`ey>
'wait for event'
<geist>
right, it's closer to monitor/mwait
<geist>
but the exact sequence is very very tricky
<mjg>
but maybe there is some arm-diff getting in the way
<mjg>
i only know few tidbits here and there
<geist>
for ll/sc it's straightforward: when you do the atomic grab of the lock you gain the exclusive, which you can then WFE on
<geist>
when someone writes to it on another cpu it breaks the exclusive which is treated as an 'event' in the arm cores that are waiting on that cache line, which fall out of the WFE and then try the loop again
xenos1984 has joined #osdev
<mjg>
fwiw concurrencykit has mcs locks (bsd or similar licensed, basically you can take it if you want)
<mjg>
it is *plausible* it is doing trt on arm
<geist>
oh i'm sure linux has solved it but i dont want to look at what linux does (please dont tell me)
<geist>
but the BSDs in my experience are far behind on the ARM64 front
<mjg>
concurrencykit is not linux
<geist>
even freebsd is fairly behind every time i look at it
<mjg>
bsd as in license
<geist>
yes yes i know
<mjg>
not tied to it
<mjg>
ye freebsd is kind of crap, but i have wip patches to fix some of it
<geist>
what i mean is the bsd kernels and arm64 are generally fairly behind state of the art
<mjg>
fwiw i got the vfs layer to do faster path lookups than linux
<heat>
freebsd is kind of crap - bsd dev
<heat>
finally, we got em
<geist>
and netbsd openbsd are fairly simple ports. not knocking it, but they just haven't had a lot of work on it
<mjg>
heat: bro, i have been saying this for a decade
<geist>
there are a few places where i think even zircon is ahead of freebsd in particlar arch optimizations (though very few)
<mjg>
oh?
<geist>
but spinlocks, probably not
<mjg>
well i can't vouch for arm ports
<geist>
linux of course is the gold standard here, but it's also, well, linux
<mjg>
i can tell you some of the amd64 stuff is faster on freebsd than on linux
<geist>
yes i'm 100% talking about ARM ports
<mjg>
but mostly because they refuse to fix it
<mjg>
as opposed to me doing anything special
<mjg>
most notably memset and memcpy
<geist>
yes 100% arm
<mjg>
ye, ye, ok
<heat>
memset and memcpy?
<heat>
the userspace one?
<heat>
ones*
<mjg>
kernel
<mjg>
linux has plain ERMS use, if supported
<mjg>
that's terribly slow for small sizes
<mjg>
i don't understand why they do this
FreeFull has joined #osdev
<geist>
except for the *new* second ERMS bit that says 'use it anyway'
<geist>
that may be a case where $vendor adds patch because they added the bit
<mjg>
glibc, maintained by intel, is doing regular stores until it can do simd
<moon-child>
doesn't musl do that too?
<mjg>
it's literally intel disagreeing with itself
<geist>
that's user space though. different constraints
<geist>
can use simd, etc
<heat>
moon-child, no
<moon-child>
well, no, musl uses rep movs/stos regardless of erms :P
<mjg>
geist: small sizes which *don't* do simd
<heat>
musl's stringops are pretty non-optimal
<moon-child>
heat: that was meant as a put-down on musl
<geist>
yes, just because a libc exists doesn't mean it's already optimized everything to the wall
<heat>
moon-child, downvote
<heat>
musl is largely like a 3 or 4 people effort
<mjg>
i asked someone from intel about it, they told me to just use erms and be done with it
<mjg>
... except see above
<geist>
thing is there's a second erms bit now right?
<mjg>
fast short rep, ye
<geist>
it says 'really actually use rep movsb'
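For reference, both bits under discussion live in CPUID leaf 7, subleaf 0: ERMS is EBX bit 9 and FSRM (fast short rep movsb) is EDX bit 4. A minimal sketch of checking them, assuming a GCC or Clang toolchain that ships `<cpuid.h>`; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

struct rep_string_features {
    bool erms; /* Enhanced REP MOVSB/STOSB */
    bool fsrm; /* Fast Short REP MOVSB */
};

/* Decode the two bits from the raw leaf 7 / subleaf 0 register values. */
static struct rep_string_features decode_leaf7(uint32_t ebx, uint32_t edx)
{
    struct rep_string_features f;
    f.erms = (ebx >> 9) & 1;
    f.fsrm = (edx >> 4) & 1;
    return f;
}

/* Query the running cpu; reports both features absent on non-x86 hosts. */
static struct rep_string_features query_rep_string_features(void)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
#if defined(__x86_64__) || defined(__i386__)
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        ebx = edx = 0; /* leaf 7 not available */
#endif
    (void)eax;
    (void)ecx;
    return decode_leaf7(ebx, edx);
}
```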
<mjg>
i have not benched it yet, i do find it suspicious so far though
<geist>
trouble with intel is they're probably mostly tasked with making the newest hottest thing go fast
<geist>
since old stuff they dont sell anymore doesn't make them money
<moon-child>
there's also avx512
<moon-child>
which aside from the power stuff is kinda cheating for memcpy/memset type tasks
sprock has quit [Ping timeout: 268 seconds]
<geist>
even with zircon i have to fight a bit to get folks to care about stuff older than about 5 years
<mjg>
btw zircon is just arm64 and amd64?
<geist>
my usual get out of that argument card is 'yeah but folks run fuchsia on qemu with kvm on older machines' which will work for about a 10 year window
<geist>
mjg: correct. there's a rv64 port floating around too
<geist>
but not mainlined
<mjg>
cool
<j`ey>
what about m68k?
<mjg>
bigger than 64
<j`ey>
just copy those bits from LK, and im sure it'll work
<geist>
if it's 64bit we can probably port to it
<mjg>
motorola 68k?
<geist>
and i know you're kidding, but we do draw the line at 64bits
<geist>
which really simplifies things. gives you a fresh playing field
<geist>
at the expense that, i think, the kernel would work fairly well in certain embedded situations that we just can't even try
<geist>
i think ppc64 would be problematic though. there are probably some implicit little endian assumptions
<geist>
though i can't think of any off the top of my head in the kernel itself
<geist>
in user space, yeah i bet
<heat>
in drivers
<heat>
filesystems
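A typical implicit little-endian assumption in driver or filesystem code is reading an on-disk or on-wire field by casting a byte pointer to `uint32_t *`. A minimal sketch of the endian-neutral alternative follows; `load_le32` is a hypothetical helper name, not something taken from Zircon.

```c
#include <stdint.h>

/* Decode a 32-bit little-endian field byte by byte, so the result is the
 * same on little-endian and big-endian hosts. */
static inline uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

A direct `*(const uint32_t *)p` would return the same value only on a little-endian cpu, which is exactly the kind of assumption a ppc64 big-endian port would trip over.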
<mjg>
ppc has even weaker real world memory ordering than arm
<mjg>
as in the cpu is more likely to fuck you over if you mess up your barriers
<mjg>
for this reason alone it is worth porting to
<heat>
geist, do you want a new rv64 port? 👀
<moon-child>
mjg, have to acquire hardware in order to do that
<mjg>
see TALOS
<mjg>
:)
<moon-child>
are they still 8k a pop?
<moon-child>
:P
<heat>
brb need to reboot, kde is acting up
<heat>
year of the linux desktop
heat has quit [Remote host closed the connection]
heat has joined #osdev
sprock has joined #osdev
<heat>
ldp and stp are the most non-RISC RISC instructions I've seen
<geist>
what do you mean?
<geist>
you must not have seen ldm/stm!
<heat>
i have not
<heat>
that is arm32 right?
<geist>
yes
<geist>
a pretty standard: load/store this bitmap of registers in order style instruction
<geist>
lots of arches had it
<geist>
but it's a very non risc thing. ldp/stp is a step back from that
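To make the "bitmap of registers" idea concrete, here is a small C model of what an arm32 `ldmia`-style load does: each set bit in the register list selects a register, and registers are filled from ascending addresses, lowest register number first. This is only a simulation for illustration; `load_multiple` is a hypothetical name and the model ignores writeback and the special handling of pc.

```c
#include <stdint.h>

/* Model of "load multiple, increment after": for every bit r set in
 * reglist, load regs[r] from the next word in memory. Returns the address
 * just past the last word consumed. */
static const uint32_t *load_multiple(uint32_t regs[16], uint16_t reglist,
                                     const uint32_t *addr)
{
    for (int r = 0; r < 16; r++) {
        if (reglist & (1u << r))
            regs[r] = *addr++;
    }
    return addr;
}
```

By contrast, arm64's `ldp`/`stp` always move exactly two registers, so the hardware never has to walk a variable-length list.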
Vercas6 has quit [Remote host closed the connection]
Test_User has joined #osdev
Vercas6 has joined #osdev
\Test_User has quit [Ping timeout: 245 seconds]
carbonfiber has joined #osdev
\Test_User has joined #osdev
GeDaMo has quit [Quit: A program is just a bunch of functions in a trenchcoat.]
<carbonfiber>
does anyone here know how to understand those timing diagrams or know of a guide that explains those types of timing diagrams in more depth?
<carbonfiber>
i tried searching but am only able to find other types of timing diagrams.
<zid>
404 for me
<heat>
after a brief look, that looks irrelevant
<zid>
oh nevermind cus the link fucked up my end
<zid>
I assume it was an edge trace on the hw level that nobody cares about?
<zid>
unless they're making a controller chip
<heat>
yea
<heat>
the ATA spec (and other similar specs) have a lot of that
<heat>
even the e1000 docs have a solid 2 or 3 chapters that are pretty irrelevant
<heat>
(to us)
<zid>
I found an e1000 doc that was 50 pages of electricals
<zid>
because it was mainly the manual for how to put one onto a pci-e card
<zid>
such that the LEDs worked properly
<mrvn>
carbonfiber: those timing diagrams are relevant when you design hardware.
kkd has joined #osdev
<CompanionCube>
geist: eh, ppc64le is a thing
<zid>
humans are the way cat genes decided to colonize mars
<mrvn>
zid: cats are clearly advanced tool users
<heat>
I have just needed to input a MM/DD/YYYY date
<heat>
my day is ruined
FreeFull has quit []
<zid>
correct
<geist>
CompanionCube: true. though iirc my G5 powermac (which i'd hack on) didn't support LE mode
<geist>
i *think*
<zid>
ARM processors are the way neanderthal genes try to ruin humanity
<geist>
mkay
<heat>
riscv64 is the superior architecture
<zid>
found the gorilla
<geist>
sounds like everyone is just trying to trigger each other
<heat>
virgin arm arm with its 6000 pages vs chad riscv manual written in LaTeX with 200
<zid>
that's because you're american geist
<geist>
wow, really going for it today
<zid>
come be a chill dutch person or whatever instead
<heat>
or a sarcastic british person
<zid>
what? it's a known fact that americans have no bants or sarcasm
<geist>
mkay
socksonme_ has quit [Ping timeout: 252 seconds]
immibis has joined #osdev
[itchyjunk] has quit [Ping timeout: 255 seconds]
[itchyjunk] has joined #osdev
vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
<heat>
geist, did you know there are builtins for isb and writing to system registers?
<heat>
mind = moderately blown
<geist>
yeah
<geist>
i think they're fairly new
<geist>
we generally switched zircon code to it
<geist>
not sure precisely which toolchain they showed up in, so be a bit careful about it
<heat>
2014
<heat>
the isb one
<heat>
__builtin_arm_wsr64 seems newer?
<geist>
yeah possibly
<geist>
the isb or dsb one i can kinda take or leave. it's nice to use the builtin, but then it's also just as easy to have an inline asm thing
<geist>
but yah, use it if it's avail
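A rough sketch of the two styles side by side, assuming Clang for the builtin path (`__builtin_arm_isb` and `__builtin_arm_rsr64`) and falling back to inline asm on other arm64 compilers. `read_virtual_counter` is a hypothetical name, and `cntvct_el0` is used only because it is normally readable from EL0; `__builtin_arm_wsr64` is the write counterpart and is not exercised here since most writable system registers need EL1.

```c
#include <stdint.h>

/* Read the arm64 virtual counter after an instruction barrier. */
static inline uint64_t read_virtual_counter(void)
{
#if defined(__aarch64__) && defined(__clang__)
    __builtin_arm_isb(0xF); /* 0xF is the full-system "SY" option */
    return __builtin_arm_rsr64("cntvct_el0");
#elif defined(__aarch64__) && defined(__GNUC__)
    uint64_t value;
    __asm__ volatile("isb" ::: "memory");
    __asm__ volatile("mrs %0, cntvct_el0" : "=r"(value));
    return value;
#else
    /* Assumption: not an arm64 host; stub so the snippet still builds. */
    return 0;
#endif
}
```

Both arm64 paths should compile down to the same two instructions, which matches the point that the builtins are mostly a convenience.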
LittleFox has quit [Quit: ZNC 1.8.2+deb2+b1 - https://znc.in]
LittleFox has joined #osdev
<heat>
msr and mrs seem easy as well?
<heat>
I don't see the point in this except a e s t h e t i c s
<geist>
yah
<geist>
though if you didn't have to type them in in the first place that's nice
<geist>
so if you have new code it makes sense to use em
netbsduser` has quit [Quit: Leaving]
<heat>
geist, you have a zen 3 right?
<heat>
do you know if you have pcid, tlbsync, invlpgb?
<heat>
i think they're there but I'm not sure, and I'm not sure if they're there on desktops
<heat>
at least I haven't seen anyone use them yet
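One way to answer this on a given machine is to ask CPUID directly. A minimal sketch, assuming a GCC or Clang toolchain with `<cpuid.h>`: PCID is leaf 1, ECX bit 17, and as far as I can tell from the AMD manuals INVLPGB and TLBSYNC share a single bit, leaf 0x80000008, EBX bit 3. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

struct tlb_features {
    bool pcid;    /* process-context identifiers */
    bool invlpgb; /* INVLPGB and TLBSYNC, reported together */
};

/* Decode from raw CPUID.1:ECX and CPUID.0x80000008:EBX values. */
static struct tlb_features decode_tlb_features(uint32_t leaf1_ecx,
                                               uint32_t ext8_ebx)
{
    struct tlb_features f;
    f.pcid = (leaf1_ecx >> 17) & 1;
    f.invlpgb = (ext8_ebx >> 3) & 1;
    return f;
}

/* Query the running cpu; reports both absent on non-x86 hosts. */
static struct tlb_features query_tlb_features(void)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    uint32_t leaf1_ecx = 0, ext8_ebx = 0;
#if defined(__x86_64__) || defined(__i386__)
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        leaf1_ecx = ecx;
    if (__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx))
        ext8_ebx = ebx;
#endif
    (void)eax;
    (void)ebx;
    (void)ecx;
    (void)edx;
    return decode_tlb_features(leaf1_ecx, ext8_ebx);
}
```

Note that under a hypervisor the guest may see fewer bits than the hardware has, so results from a VM do not settle what the desktop part supports.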