klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
bradd has quit [Ping timeout: 268 seconds]
gog has quit [Ping timeout: 248 seconds]
gog has joined #osdev
smach has quit []
smach has joined #osdev
srjek has joined #osdev
[_] has joined #osdev
[itchyjunk] has quit [Ping timeout: 244 seconds]
[_] is now known as [itchyjunk]
smach has quit []
elastic_dog has quit [Ping timeout: 256 seconds]
elastic_dog has joined #osdev
gog has quit [Remote host closed the connection]
gog has joined #osdev
elastic_dog is now known as Guest2126
elastic_dog has joined #osdev
Guest2126 has quit [Ping timeout: 248 seconds]
spikeheron has joined #osdev
opal has quit [Ping timeout: 268 seconds]
gog has quit [Ping timeout: 252 seconds]
opal has joined #osdev
[_] has joined #osdev
[itchyjunk] has quit [Ping timeout: 256 seconds]
sikkiladho has quit [Quit: Connection closed for inactivity]
vin has joined #osdev
gxt has quit [Ping timeout: 268 seconds]
gxt has joined #osdev
opal has quit [Ping timeout: 268 seconds]
opal has joined #osdev
elastic_dog has quit [Ping timeout: 248 seconds]
[_] has quit [Read error: Connection reset by peer]
elastic_dog has joined #osdev
terrorjack has quit [Quit: The Lounge - https://thelounge.chat]
terrorjack has joined #osdev
gxt has quit [Remote host closed the connection]
gxt has joined #osdev
gxt has quit [Remote host closed the connection]
opal has quit [Remote host closed the connection]
gxt has joined #osdev
opal has joined #osdev
bradd has joined #osdev
brynet has quit [Ping timeout: 240 seconds]
dh` has quit [Quit: brb]
brynet has joined #osdev
dh` has joined #osdev
srjek has quit [Ping timeout: 256 seconds]
smach has joined #osdev
smach has quit []
smach has joined #osdev
smach has quit [Remote host closed the connection]
foudfou has quit [Ping timeout: 268 seconds]
foudfou has joined #osdev
Vercas66 has joined #osdev
Vercas6 has quit [Ping timeout: 268 seconds]
Vercas66 is now known as Vercas6
the_lanetly_052_ has joined #osdev
the_lanetly_052 has joined #osdev
the_lanetly_052_ has quit [Ping timeout: 256 seconds]
teroshan9 has quit [Ping timeout: 248 seconds]
terrorjack has quit [Ping timeout: 256 seconds]
puck has quit [Excess Flood]
puck has joined #osdev
teroshan9 has joined #osdev
Vercas68 has joined #osdev
terrorjack has joined #osdev
Vercas6 has quit [Remote host closed the connection]
Vercas68 is now known as Vercas6
smach has joined #osdev
smach has quit [Read error: Connection reset by peer]
smach has joined #osdev
smach has quit [Remote host closed the connection]
smach has joined #osdev
smach has quit [Client Quit]
smach has joined #osdev
sikkiladho has joined #osdev
sav_ has joined #osdev
dh` has quit [Quit: Leaving]
dh` has joined #osdev
dh` has quit [Ping timeout: 252 seconds]
scoobydoo has quit [Read error: Connection timed out]
scoobydoo has joined #osdev
dh` has joined #osdev
smach has quit []
GeDaMo has joined #osdev
Vercas6 has quit [Ping timeout: 268 seconds]
xenos1984 has quit [Read error: Connection reset by peer]
Vercas6 has joined #osdev
foudfou has quit [Remote host closed the connection]
foudfou has joined #osdev
xenos1984 has joined #osdev
sikkiladho has quit [Quit: Connection closed for inactivity]
sav_ has quit []
smach has joined #osdev
dzwdz has quit [Ping timeout: 268 seconds]
dzwdz has joined #osdev
gog has joined #osdev
the_lanetly_052_ has joined #osdev
the_lanetly_052 has quit [Ping timeout: 268 seconds]
smach has quit [Read error: Connection reset by peer]
Vercas68 has joined #osdev
Vercas6 has quit [Remote host closed the connection]
Vercas68 is now known as Vercas6
mrkajetanp has quit [Ping timeout: 240 seconds]
gildasio has quit [Quit: WeeChat 3.6]
freakazoid333 has quit [Ping timeout: 255 seconds]
Vercas6 has quit [Remote host closed the connection]
Vercas6 has joined #osdev
gildasio has joined #osdev
ghee has joined #osdev
gog has quit [Ping timeout: 248 seconds]
gog has joined #osdev
gog has quit [Ping timeout: 248 seconds]
gmacd_ has joined #osdev
gmacd_ has quit [Ping timeout: 248 seconds]
heat has joined #osdev
gog has joined #osdev
[itchyjunk] has joined #osdev
scoobydoo has quit [Read error: Connection timed out]
scoobydoo has joined #osdev
nyah has joined #osdev
vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
ghee has quit [Quit: EOF]
jjuran has quit [Quit: Killing Colloquy first, before it kills me…]
jjuran has joined #osdev
zaquest has quit [Remote host closed the connection]
ghee has joined #osdev
frkzoid has joined #osdev
zaquest has joined #osdev
srjek has joined #osdev
knusbaum has joined #osdev
smach has joined #osdev
the_lanetly_052_ has quit [Ping timeout: 256 seconds]
smach has quit [Ping timeout: 252 seconds]
smach has joined #osdev
smach has quit [Remote host closed the connection]
the_lanetly_052_ has joined #osdev
vdamewood has joined #osdev
smach has joined #osdev
smach has quit [Client Quit]
the_lanetly_052_ has quit [Ping timeout: 248 seconds]
vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
_xor has joined #osdev
gildasio has quit [Remote host closed the connection]
gildasio has joined #osdev
CYKS has quit [Quit: Ping timeout (120 seconds)]
CYKS has joined #osdev
gog has quit [Ping timeout: 248 seconds]
gildasio has quit [Ping timeout: 268 seconds]
gildasio has joined #osdev
gog has joined #osdev
<mats1> where my daddies at
ghee has quit [Quit: EOF]
terminalpusher has joined #osdev
<heat> sup
<psykose> heat isn't a daddy he's a beta
<heat> you're a fucking beta alpine mofo
<heat> why is alpine slow? because ur malloc sux
<psykose> i like how you have literally no other material than just calling me alpinesomething every time
<heat> damn right
<heat> alpinelover420
<mjg> calling someone beta is beta
<mjg> and calling someone alpinesomething is right
<psykose> it's actually not the malloc it's the __lock
<psykose> check urself
<heat> i just got roasted
<heat> fuck
smach has joined #osdev
<heat> holy shit you had a holy war in your channel
<heat> damn
<heat> mjg, hello sir, how to write fast memcpy impl??
<GeDaMo> movsb
<mjg> for which cpu and kernel or userspace
<heat> amd64
<heat> I see you did freebsd's
<mjg> i did
<Bitweasil> IIRC you just... go at it with the string rep copy stuff, and let the CPU handle it.
<mjg> it is reasonably fast for the kernel
<psykose> what holy war
<heat> Bitweasil, not for user space
<mjg> not fast for user
<Bitweasil> Oh, you do get enough to make vector regs worth it?
<mjg> albeit it does beat out glibc for certain cases
<Bitweasil> I thought the "braindead simple" approach more or less ran even with the more complex solutions once you get on the CPU and it figures out what you're doing.
<mjg> Bitweasil: rep movs, even with ERMS +_movsb, has huge startup latency which kills perf for small copies
<Bitweasil> Hm... Ok, but once it gets going, how does it compare?
<mjg> against regular movs it wins depending on uarch, somewhere north of 192 bytes
<mjg> against SIMD depends on uarch again and i don't know if there is a reasonable upper limit
<mjg> afair glibc switches to it past 2KB or so
<Bitweasil> *nods*
<mjg> one funny problem with it is that it is hugely penalized by misalignment of the target buffer
smach has quit [Remote host closed the connection]
<mjg> in my testing it really wants 32 bytes (at least anyway)
smach has joined #osdev
<mjg> things go super bad if you have a buf ending with 0x8
<mjg> which is not particularly incommon
<mjg> uncommon
<mjg> and anything worse than that really fucks you up
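A minimal sketch of the destination-alignment point above, assuming a 32-byte target alignment as in mjg's testing. The names `head_to_align32` and `copy_dst_aligned` are hypothetical, and the library `memcpy` stands in for the bulk engine (for example `rep movsb`); this is not the FreeBSD code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes needed to advance p to the next 32-byte boundary (0 if already aligned). */
static size_t head_to_align32(const void *p)
{
    return (size_t)(-(uintptr_t)p & 31u);
}

/*
 * Copy a short head with plain byte moves so that the destination becomes
 * 32-byte aligned, then hand the remainder to the bulk copy.
 * memcpy is only a stand-in for the bulk engine here (assumption).
 */
static void *copy_dst_aligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t head = head_to_align32(d);

    assert(dst != NULL && src != NULL);
    if (head > n)
        head = n;                 /* short copy: the head is the whole copy */
    for (size_t i = 0; i < head; i++)
        d[i] = s[i];
    memcpy(d + head, s + head, n - head);
    return dst;
}
```

Whether aligning first pays off depends on the microarchitecture and the copy size, as the discussion notes; for ordinary moves mjg reports that doing the misaligned stores is cheaper than aligning.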
srjek has quit [Ping timeout: 244 seconds]
<mjg> there is a newer extension which is supposed to provide "fast startup" for erms, but i have not had the opportunity to bench
<mjg> i mean i can have one anytime but can't be fucked for the time being
<mjg> measurements welcome :)
FreeFull has joined #osdev
<mjg> heat: so why do you ask if you've even seen the code
<mjg> or you just like to see me rant
smach has quit [Remote host closed the connection]
smach has joined #osdev
smach has quit [Remote host closed the connection]
smach has joined #osdev
joe9 has quit [Remote host closed the connection]
zid has quit [Ping timeout: 248 seconds]
joe9 has joined #osdev
smach has quit []
zid has joined #osdev
<geist> yah in general on other arches too it's the target buffer that should be aligned, all else held equal
<geist> makes sense, if it's aligned the cpu has a better time filling in a cache line, whereas on the read side it can start prefetching to fill any unalignment
<mjg> well for rep movsb you could argue it can align on its own
<mjg> it just does not happen
<geist> oh sure, but just saying if you see an effect, i'm not surprised it's more pronounced on the target side
<mjg> fwiw it's not bad for regular movs
<geist> under the hood it still has to move data via some block transfer, so the effects are still there
<mjg> as in it is cheaper to do the misaligned stores than to spend time aligning the buffer
<geist> i think in a regular mov that can probably be soaked up with various OOO effects of the cpu
<geist> but in a copy situation that gets saturated and it starts to get synchronously lined up with whatever the copy pipeline is doing
<mjg> the cpu can check what the buf is, do the small mov internally and only then roll with the stream
<mjg> at least i'm assuming this much, perhaps there are fucked reasons why they can't feasibly do it
terminalpusher has quit [Remote host closed the connection]
<geist> yah dunno, would be interesting to see how the microcode works in that case. of course there are i'm sure at this point a handful of implementations of that copy engine, so it's probably fairly core specific nowadays
<geist> including AMD cores
<geist> which i guess is the advantage of that scheme: tell the cpu what you mean to do and let it figure it out
<geist> even ARM is on that train, there's a new extension that basically does the same thing coming down the pike
<geist> mjg: but anyway that does arrive at the original question: given current sets of x86 machines and ERMS (and whatever the new ERMS second bit is) what would you recommend as a general in-kernel memcpy solution?
<geist> especially given that lots of copies are fairly small, and then some are probably page sized
<geist> the page sized copy can be probably called out and optimized separately
<geist> i think for zircon we've just done the usual 'if erms use rep movsb, else do some sort of combination of align and rep movsq'
<mjg> i would recommend what i did for freebsd, modulo uarch specific optimizations you may need
<mjg> lemme open it
psykose has left #osdev [#osdev]
<bslsk05> ​cgit.freebsd.org: support.S « amd64 « amd64 « sys - src - FreeBSD source tree
<mjg> the 256 byte cutoff point was chosen as a compromise, it really differs per uarch
<mjg> but you should find it demolishes a mere 'rep movsb' for small copies
<heat> sorry mjg I had to go away for a while
<heat> I was just wondering where you got all the arcane knowledge (I assume intel's optimization manual?)
<mjg> with one caveat: i did not benchmark this against supposed fast short rep mov
<heat> especially since musl's memcpy is shit
<mjg> 1. agner fog's manual
<mjg> (bits and pieces)
<geist> ah i see. i thought you had benchmarked this against a lot of hardware
<mjg> 2. same for intel's manual
<mjg> geist: i did on a bunch of intel and amd uarchs back in the day, that was the best i could come up with
<geist> ah
<geist> well when i have copious free time i should run a set of benchmarks myself for slow stuff, and also things with the new erms bit
<heat> I particularly wanted to understand how SIMD could speed it up
<mjg> fast short rep mov showed up some time after i stopped working in the area and it is plausible, but i doubt, that it beats it
<geist> i forget precisely when that showed up but i need to put in an order for an alder lake machine soon
<geist> hmm, i wonder if AMD ever picked up that bit
<mjg> erms? they did, fsrm i have no idea
gareppa has joined #osdev
<mjg> heat: simd has wider datapipes
<geist> oh interesting. my zen 2 machine doesn't show ERMS bit set
<geist> must have been zen 3
<mjg> it is plausible it is disabled in bios
<heat> there's an amd memcpy (GPLv2) that has gone around #musl
<heat> it's super super super insanely complex
<bslsk05> ​android.googlesource.com: libc/arch-x86_64/string/sse2-memmove-slm.S - platform/bionic - Git at Google
<geist> looks like.... 5950 (zen 3) has the erms bit: https://www.cpu-world.com/cgi-bin/CPUID.pl?CPUID=77605
<bslsk05> ​www.cpu-world.com: Access denied
<mjg> i'm not saying this is perfect, but it is the right ballpark
<mjg> to demonstrate the diff between simd vs regular stores
<geist> okay, so now when does the FERMS show up on intel...
<mjg> "fsrm"
<mjg> i think ice lake was the oldest to have it
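For reference, a small sketch of how the two feature bits discussed here are decoded. ERMS is reported in CPUID leaf 7, subleaf 0, EBX bit 9, and FSRM in the same leaf, EDX bit 4. The helper names `has_erms` and `has_fsrm` are hypothetical; the functions take the raw register values so the decoding can be checked on any host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0): feature bits relevant to rep movs. */
#define CPUID7_EBX_ERMS (UINT32_C(1) << 9)  /* Enhanced REP MOVSB/STOSB */
#define CPUID7_EDX_FSRM (UINT32_C(1) << 4)  /* Fast Short REP MOV */

/* ebx: EBX as returned by CPUID leaf 7, subleaf 0. */
static bool has_erms(uint32_t ebx)
{
    return (ebx & CPUID7_EBX_ERMS) != 0;
}

/* edx: EDX as returned by CPUID leaf 7, subleaf 0. */
static bool has_fsrm(uint32_t edx)
{
    return (edx & CPUID7_EDX_FSRM) != 0;
}
```

On x86 with GCC or Clang, the register values can be obtained with `__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`. As mjg points out, firmware may hide a bit, so an unset bit does not prove the core lacks the feature.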
<mjg> heat: fwiw the best trick in the area is overlapping stores
<bslsk05> ​cgit.freebsd.org: support.S « amd64 « amd64 « sys - src - FreeBSD source tree
<mjg> instead of branching to find the exact right size you let the cpu possibly overwrite some of the bytes with the same content
<mjg> while this of course has *some* penalty, it is cheaper than all the branching
<mjg> also above that there is a 32 byte at a time loop, i found no speed up from making it wider
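The overlapping-stores trick can be sketched in portable C as follows. This is an illustration of the idea only, not a transcription of the FreeBSD assembly; the function names are hypothetical. Each size class is copied with two fixed-width moves, one anchored at the start and one at the end, and the two are allowed to overlap in the middle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes, 8 <= n <= 16, using two 8-byte moves that may overlap. */
static void copy_8_to_16(void *dst, const void *src, size_t n)
{
    uint64_t head, tail;

    assert(n >= 8 && n <= 16);
    /* Fixed-size memcpy calls like these normally compile to single loads/stores. */
    memcpy(&head, src, 8);
    memcpy(&tail, (const unsigned char *)src + n - 8, 8);
    memcpy(dst, &head, 8);
    memcpy((unsigned char *)dst + n - 8, &tail, 8);
}

/* Copy n bytes, 4 <= n <= 8, using two 4-byte moves that may overlap. */
static void copy_4_to_8(void *dst, const void *src, size_t n)
{
    uint32_t head, tail;

    assert(n >= 4 && n <= 8);
    memcpy(&head, src, 4);
    memcpy(&tail, (const unsigned char *)src + n - 4, 4);
    memcpy(dst, &head, 4);
    memcpy((unsigned char *)dst + n - 4, &tail, 4);
}

/* One branch selects the size class; no branch is needed for the exact length. */
static void copy_small(void *dst, const void *src, size_t n)
{
    assert(n >= 4 && n <= 16);
    if (n >= 8)
        copy_8_to_16(dst, src, n);
    else
        copy_4_to_8(dst, src, n);
}
```

For n = 11, for example, bytes 3 through 7 are written twice with identical values, which replaces a chain of branches on the exact size.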
<mjg> bb in 30-ish
zid has quit [Read error: No route to host]
<geist> hmm, guess that site doesn't decode that bit: https://www.cpu-world.com/cgi-bin/CPUID.pl?CPUID=77185
<bslsk05> ​www.cpu-world.com: Access denied
<geist> that's an 11th gen intel, seems like it *should* have it
<geist> well anyway
zid has joined #osdev
smach has joined #osdev
GeDaMo has quit [Quit: A program is just a bunch of functions in a trenchcoat.]
smach has quit [Remote host closed the connection]
smach has joined #osdev
smach has quit [Remote host closed the connection]
smach has joined #osdev
smach has quit [Remote host closed the connection]
smach has joined #osdev
sav_ has joined #osdev
_whitelogger has joined #osdev
smach has quit [Client Quit]
sav_ has quit [Remote host closed the connection]
smach has joined #osdev
smach has quit [Remote host closed the connection]
smach has joined #osdev
smach has quit [Remote host closed the connection]
heat_ has joined #osdev
heat has quit [Read error: Connection reset by peer]
gareppa has quit [Quit: Leaving]
smach has joined #osdev
<moon-child> I'm suddenly morbidly curious
<moon-child> how do they implement string functions (aside from memset) on platforms with no unaligned memory ops?
<moon-child> are you just boned if your things aren't coaligned?
<mjg> you align them
<mjg> or decide screw it and do 1 byte ops
<mjg> if the size is small enough(tm)
smach has quit [Remote host closed the connection]
<moon-child> 'align them' how?
<moon-child> they could have different alignments from each other; can't change that
<mjg> well you also got strlen :-p
<mjg> but ye, for 2 bufs it is huge pain
ThinkT510 has quit [Quit: WeeChat 3.6]
smach has joined #osdev
<mjg> there is magic to align both and then continue
<mjg> see e.g. memmove for armv7 on freebsd
<mjg> i'm not very familiar with it, only skimmed
<mjg> i doubt it is optimal for the platform
<moon-child> 'erg - unaligned source'
<moon-child> 'This is where it gets nasty ...'
<moon-child> heartening comments...
ThinkT510 has joined #osdev
<mjg> /* Do the buffers overlap? */
<mjg> RETeq /* Bail now if src/dst are the same */
<mjg> cmp r0, r1
<mjg> now that's a waste of a branch
<zid> I bet it speeds up some dumb program somewhere and the branch predictor gets it right 99.99% of the time
<geist> yeah for alignment you basically need to align yourself
<geist> https://godbolt.org/z/cdWzPajGT is a cheezy bored one afternoon thing i wrote to see what the codegen would be like for riscv (which doesn't necessarily do unaligned)
<bslsk05> ​godbolt.org: Compiler Explorer
<geist> untested, unoptimized, but it generally tries to deal with alignment
<geist> and yeah if the source and dest are not similarly aligned it just gives up and goes to bytewise
<geist> obviously you can unroll this sort of thing, deal with unalignments better (i wrote one on arm32 years ago that does this) etc
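A minimal sketch of the approach geist describes for targets without unaligned access: copy bytes until the destination is word aligned, use word moves when source and destination share the same alignment, and otherwise fall back to bytewise. This is not the code behind the godbolt link; `copy_strict_align` is a hypothetical name, and the fixed-size `memcpy` calls stand in for the aligned word load/store a real implementation would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void *copy_strict_align(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    const uintptr_t mask = sizeof(unsigned long) - 1;

    assert(dst != NULL && src != NULL);

    /* Word moves are only possible if both pointers have the same misalignment. */
    if ((((uintptr_t)d ^ (uintptr_t)s) & mask) == 0) {
        /* Head: bytewise until the destination (and so the source) is aligned. */
        while (n > 0 && ((uintptr_t)d & mask) != 0) {
            *d++ = *s++;
            n--;
        }
        /* Body: one word at a time. Both pointers are aligned here. */
        while (n >= sizeof(unsigned long)) {
            unsigned long w;
            memcpy(&w, s, sizeof w);   /* stand-in for an aligned word load */
            memcpy(d, &w, sizeof w);   /* stand-in for an aligned word store */
            d += sizeof w;
            s += sizeof w;
            n -= sizeof w;
        }
    }

    /* Tail, or the whole copy when the buffers are not co-aligned. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```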
<mjg> zid: i happened to check for this bullshit and it virtually never happens
<mjg> zid: that said you are probably right :p
<geist> https://github.com/apple/darwin-xnu/blob/main/osfmk/arm/bcopy.s is one i wrote years ago when i was at apple that tries to handle a good chunk of the cases
<bslsk05> ​github.com: darwin-xnu/bcopy.s at main · apple/darwin-xnu · GitHub
<geist> it also handles the 'source and dest are 1, 2, or 3 byte unaligned' case by loading a word and rotating it a bit and orring it with neighboring words
<geist> which is deliciously fun to do on arm32 because of the barrel shifter
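The word-merging idea can be sketched in C like this. It is an illustration under stated assumptions, not the bcopy.s code: it uses plain shifts rather than rotates, the function names are hypothetical, and the caller is assumed to provide one extra readable source word. Only aligned 32-bit words are read and written; the wanted bytes start `k` bytes into the first source word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns nonzero on a little-endian host. */
static int is_little_endian(void)
{
    const uint32_t one = 1;
    unsigned char first;

    memcpy(&first, &one, 1);
    return first == 1;
}

/*
 * Produce nwords aligned output words from a source that begins k bytes
 * (1..3) into src_words[0]. Reads src_words[0] through src_words[nwords].
 */
static void copy_shift_merge(uint32_t *dst, const uint32_t *src_words,
                             size_t nwords, unsigned k)
{
    assert(k >= 1 && k <= 3);
    const unsigned lo = 8 * k;      /* bits to drop from the current word */
    const unsigned hi = 32 - lo;    /* bits to take from the next word */
    const int le = is_little_endian();
    uint32_t cur = src_words[0];

    for (size_t i = 0; i < nwords; i++) {
        uint32_t next = src_words[i + 1];
        /* Combine the tail of the current word with the head of the next one. */
        dst[i] = le ? (cur >> lo) | (next << hi)
                    : (cur << lo) | (next >> hi);
        cur = next;
    }
}
```

On arm32 the shift can be folded into the ORR instruction itself, which is the barrel-shifter convenience geist mentions.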
<moon-child> does anyone remotely modern _not_ have a barrel shifter?
<bslsk05> ​github.com: darwin-xnu/bcopy.s at main · apple/darwin-xnu · GitHub
<geist> moon-child: well, i mean not intrinsic to data processing instructions like it was on arm32
<geist> you could shift or rotate the 2nd argument to all data processing instructions arbitrarily
<moon-child> oh right, yeah
<geist> very powerful, though arguably a waste of instruction bits that could go elsewhere
<moon-child> I mean, keep running down that path and you get vliw
<mjg> like for example cmpxchg :S
<moon-child> mjg: hey they have cas now
<mjg> i know
smach has quit [Remote host closed the connection]
srjek has joined #osdev
<FreeFull> klange: What did you spend those last few hours doing, exactly? Fighting an evil bug?
<mjg> zid: i just remembered something absolutely crazy
<mjg> the one crapper which had src == dst was in freebsd and was copied with rep mov
<mjg> it was possible other cpus would read the buf at the same time as copy is happening
<mjg> and there were cases where *this* would result in transient BULLSHIT in said buffer
<mjg> copying the buffer onto itself
<mjg> fucking amazing
<geist> computers were a mistake
smach has joined #osdev
<klange> FreeFull: Mostly signal shit
<klange> That's dash, with working job control.
k8yun has joined #osdev
scoobydoo has quit [Read error: Connection timed out]
<klange> I guess the main thing is I fixed waitpid() to only return on _transitions_ for stopped jobs. My own shell was cheating and didn't waitpid(-1,...)
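A small sketch of the behaviour klange describes, assuming a POSIX system: a stop is reported by `waitpid()` once, when the child transitions to stopped, and a later call finds nothing new to report while the child merely stays stopped. The function name is hypothetical, and the sketch waits on a specific pid where a shell would typically use `waitpid(-1, ...)`.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 0 if the expected sequence is observed, else the failing step. */
static int demo_stop_reported_once(void)
{
    int st = 0;
    int step;
    pid_t pid = fork();

    if (pid < 0)
        return 1;
    if (pid == 0) {
        raise(SIGSTOP);            /* stop ourselves, like a job hit by ^Z */
        _exit(7);                  /* runs only after SIGCONT */
    }

    /* Step 2: the running -> stopped transition is reported. */
    step = 2;
    if (waitpid(pid, &st, WUNTRACED) != pid || !WIFSTOPPED(st))
        goto fail;

    /* Step 3: still stopped, but no new transition, so nothing to report. */
    step = 3;
    if (waitpid(pid, &st, WUNTRACED | WNOHANG) != 0)
        goto fail;

    /* Step 4: resume the child and collect its exit status. */
    step = 4;
    if (kill(pid, SIGCONT) != 0)
        goto fail;
    if (waitpid(pid, &st, 0) != pid)
        return 5;                  /* child state unknown, nothing to clean up */
    assert(WIFEXITED(st));
    return WEXITSTATUS(st) == 7 ? 0 : 6;

fail:
    kill(pid, SIGKILL);            /* do not leave a stopped child behind */
    waitpid(pid, &st, 0);
    return step;
}
```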
gildasio has quit [Ping timeout: 268 seconds]
smach has quit [Remote host closed the connection]
<mjg> i just fixed a bug which should not have happened(tm)
scoobydoo has joined #osdev
<mjg> but of course managed to run into it roughly at the time reports started coming in
<mjg> ... while i was using the patch for several days now
gildasio has joined #osdev
<mjg> software--
<FreeFull> posix signals are nasty
<mjg> posix is nasty
gildasio has quit [Remote host closed the connection]
mimmy has joined #osdev
<heat_> signals are damn nasty
heat_ is now known as heat
mimmy has quit [Client Quit]
gildasio has joined #osdev
smach has joined #osdev
gildasio has quit [Client Quit]
smach has quit [Client Quit]
manawyrm has quit [Quit: Read error: 2.99792458 x 10^8 meters/second (Excessive speed of light)]
<bslsk05> ​cve.mitre.org: CVE - CVE-2022-38392
smach has joined #osdev
manawyrm has joined #osdev
smach has quit [Remote host closed the connection]
mimmy has joined #osdev
smach has joined #osdev
mimmy has quit [Client Quit]
smach has quit [Remote host closed the connection]
gildasio has joined #osdev
mimmy has joined #osdev
mimmy has quit [Client Quit]
smach has joined #osdev
<zid> I fixed my enigma machine, I was dealing with underflow incorrectly.
<zid> Completely unrelated note: % operator is trash
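One common way this bites, assuming the underflow was a negative left operand: C's `%` truncates toward zero, so `-1 % 26` is `-1`, not `25`. A minimal sketch of a wrap-around helper follows; `mod_floor` and `shift_letter` are hypothetical names, not zid's code.

```c
#include <assert.h>

/* Remainder that is always in [0, n) for n > 0, unlike C's % for negative a. */
static int mod_floor(int a, int n)
{
    assert(n > 0);
    int r = a % n;              /* has the sign of a, or is zero */
    return r < 0 ? r + n : r;
}

/* Move a letter index (0..25) by a possibly negative offset, wrapping around. */
static int shift_letter(int letter, int offset)
{
    return mod_floor(letter + offset, 26);
}
```

With this helper, stepping back three places from index 0 gives 23 instead of the -3 that a bare `%` would produce.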
mimmy has joined #osdev
mimmy has quit [Client Quit]
smach has quit [Remote host closed the connection]
mimmy has joined #osdev
mimmy has quit [Client Quit]
smach has joined #osdev
smach has quit [Client Quit]
sikkiladho has joined #osdev
mimmy has joined #osdev
mimmy has quit [Client Quit]
matt__ has joined #osdev
matt__ is now known as freakazoid333
FreeFull has quit []
joe9 has quit [Ping timeout: 248 seconds]
frkzoid has quit [Ping timeout: 256 seconds]
mimmy has joined #osdev