smach has quit [Remote host closed the connection]
foudfou has quit [Ping timeout: 268 seconds]
foudfou has joined #osdev
Vercas66 has joined #osdev
Vercas6 has quit [Ping timeout: 268 seconds]
Vercas66 is now known as Vercas6
the_lanetly_052_ has joined #osdev
the_lanetly_052 has joined #osdev
the_lanetly_052_ has quit [Ping timeout: 256 seconds]
teroshan9 has quit [Ping timeout: 248 seconds]
terrorjack has quit [Ping timeout: 256 seconds]
puck has quit [Excess Flood]
puck has joined #osdev
teroshan9 has joined #osdev
Vercas68 has joined #osdev
terrorjack has joined #osdev
Vercas6 has quit [Remote host closed the connection]
Vercas68 is now known as Vercas6
smach has joined #osdev
smach has quit [Read error: Connection reset by peer]
smach has joined #osdev
smach has quit [Remote host closed the connection]
smach has joined #osdev
smach has quit [Client Quit]
smach has joined #osdev
sikkiladho has joined #osdev
sav_ has joined #osdev
dh` has quit [Quit: Leaving]
dh` has joined #osdev
dh` has quit [Ping timeout: 252 seconds]
scoobydoo has quit [Read error: Connection timed out]
scoobydoo has joined #osdev
dh` has joined #osdev
smach has quit []
GeDaMo has joined #osdev
Vercas6 has quit [Ping timeout: 268 seconds]
xenos1984 has quit [Read error: Connection reset by peer]
Vercas6 has joined #osdev
foudfou has quit [Remote host closed the connection]
foudfou has joined #osdev
xenos1984 has joined #osdev
sikkiladho has quit [Quit: Connection closed for inactivity]
sav_ has quit []
smach has joined #osdev
dzwdz has quit [Ping timeout: 268 seconds]
dzwdz has joined #osdev
gog has joined #osdev
the_lanetly_052_ has joined #osdev
the_lanetly_052 has quit [Ping timeout: 268 seconds]
smach has quit [Read error: Connection reset by peer]
Vercas68 has joined #osdev
Vercas6 has quit [Remote host closed the connection]
Vercas68 is now known as Vercas6
mrkajetanp has quit [Ping timeout: 240 seconds]
gildasio has quit [Quit: WeeChat 3.6]
freakazoid333 has quit [Ping timeout: 255 seconds]
Vercas6 has quit [Remote host closed the connection]
Vercas6 has joined #osdev
gildasio has joined #osdev
ghee has joined #osdev
gog has quit [Ping timeout: 248 seconds]
gog has joined #osdev
gog has quit [Ping timeout: 248 seconds]
gmacd_ has joined #osdev
gmacd_ has quit [Ping timeout: 248 seconds]
heat has joined #osdev
gog has joined #osdev
[itchyjunk] has joined #osdev
scoobydoo has quit [Read error: Connection timed out]
scoobydoo has joined #osdev
nyah has joined #osdev
vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
ghee has quit [Quit: EOF]
jjuran has quit [Quit: Killing Colloquy first, before it kills me…]
jjuran has joined #osdev
zaquest has quit [Remote host closed the connection]
ghee has joined #osdev
frkzoid has joined #osdev
zaquest has joined #osdev
srjek has joined #osdev
knusbaum has joined #osdev
smach has joined #osdev
the_lanetly_052_ has quit [Ping timeout: 256 seconds]
smach has quit [Ping timeout: 252 seconds]
smach has joined #osdev
smach has quit [Remote host closed the connection]
the_lanetly_052_ has joined #osdev
vdamewood has joined #osdev
smach has joined #osdev
smach has quit [Client Quit]
the_lanetly_052_ has quit [Ping timeout: 248 seconds]
vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
_xor has joined #osdev
gildasio has quit [Remote host closed the connection]
gildasio has joined #osdev
CYKS has quit [Quit: Ping timeout (120 seconds)]
CYKS has joined #osdev
gog has quit [Ping timeout: 248 seconds]
gildasio has quit [Ping timeout: 268 seconds]
gildasio has joined #osdev
gog has joined #osdev
<mats1>
where my daddies at
ghee has quit [Quit: EOF]
terminalpusher has joined #osdev
<heat>
sup
<psykose>
heat isn't a daddy he's a beta
<heat>
you're a fucking beta alpine mofo
<heat>
why is alpine slow? because ur malloc sux
<psykose>
i like how you have literally no other material than just calling me alpinesomething every time
<heat>
damn right
<heat>
alpinelover420
<mjg>
calling someone beta is beta
<mjg>
and calling someone alpinesomething is right
<psykose>
it's actually not the malloc it's the __lock
<psykose>
check urself
<heat>
i just got roasted
<heat>
fuck
smach has joined #osdev
<heat>
holy shit you had a holy war in your channel
<heat>
damn
<heat>
mjg, hello sir, how to write fast memcpy impl??
<GeDaMo>
movsb
<mjg>
for which cpu and kernel or userspace
<heat>
amd64
<heat>
I see you did freebsd's
<mjg>
i did
<Bitweasil>
IIRC you just... go at it with the string rep copy stuff, and let the CPU handle it.
<mjg>
it is reasonably fast for the kernel
<psykose>
what holy war
<heat>
Bitweasil, not for user space
<mjg>
not fast for user
<Bitweasil>
Oh, you do get enough to make vector regs worth it?
<mjg>
albeit it does beat out glibc for certain cases
<Bitweasil>
I thought the "braindead simple" approach more or less ran even with the more complex solutions once you get on the CPU and it figures out what you're doing.
<mjg>
Bitweasil: rep movs, even with ERMS +_movsb, has huge startup latency which kills perf for small copies
<Bitweasil>
Hm... Ok, but once it gets going, how does it compare?
<mjg>
against regular movs it wins depending on uarch, somewhere north of 192 bytes
<mjg>
against SIMD depends on uarch again and i don't know if there is a reasonable upper limit
<mjg>
afair glibc switches to it past 2KB or so
<Bitweasil>
*nods*
<mjg>
one funny problem with it is that it is hugely penalized by misalignment of the target buffer
smach has quit [Remote host closed the connection]
<mjg>
in my testing it really wants 32 bytes (at least anyway)
smach has joined #osdev
<mjg>
things go super bad if you have a buf ending with 0x8
<mjg>
which is not particularly uncommon
<mjg>
and anything worse than that really fucks you up
srjek has quit [Ping timeout: 244 seconds]
<mjg>
there is a newer extension which is supposed to provide "fast startup" for erms, but i have not had the opportunity to bench
<mjg>
i mean i can have one anytime but can't be fucked for the time being
<mjg>
measurements welcome :)
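A minimal sketch of the size-based split mjg describes above: short copies avoid `rep movsb` because of its startup latency, and larger ones use it. The names `sketch_memcpy`, `copy_small`, `copy_rep_movsb` and the 256-byte `REP_MOVSB_THRESHOLD` are hypothetical, not taken from FreeBSD or glibc; the real crossover depends on the microarchitecture and has to be measured. It assumes GCC or Clang on x86-64 and falls back to `memcpy` elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff. Below it the startup latency of rep movsb is assumed
 * to dominate; the right value is uarch-specific and must be benchmarked. */
#define REP_MOVSB_THRESHOLD 256

/* Large-copy path: let the CPU's string-copy engine do the work. */
static void copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && defined(__GNUC__)
    /* RDI = destination, RSI = source, RCX = byte count.
     * The ABI guarantees the direction flag is clear on entry. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n); /* non-x86 fallback so the sketch still builds */
#endif
}

/* Small-copy path: a plain byte loop keeps the sketch short. Production
 * code would typically use a few (possibly overlapping) word moves. */
static void copy_small(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
}

void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    if (n < REP_MOVSB_THRESHOLD)
        copy_small(dst, src, n);
    else
        copy_rep_movsb(dst, src, n);
    return dst;
}
```

The sketch does nothing about destination misalignment, which is the other penalty discussed above; a fuller version would align the destination before entering the `rep movsb` path.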
FreeFull has joined #osdev
<mjg>
heat: so why do you ask if you've even seen the code
<mjg>
or you just like to see me rant
smach has quit [Remote host closed the connection]
smach has joined #osdev
smach has quit [Remote host closed the connection]
smach has joined #osdev
joe9 has quit [Remote host closed the connection]
zid has quit [Ping timeout: 248 seconds]
joe9 has joined #osdev
smach has quit []
zid has joined #osdev
<geist>
yah in general on other arches too it's the target buffer that should be aligned, all else held equal
<geist>
makes sense, if it's aligned the cpu has a better time filling in a cache line, whereas on the read side it can start prefetching to fill any unalignment
<mjg>
well for rep movsb you could argue it can align on its own
<mjg>
it just does not happen
<geist>
oh sure, but just saying if you see an effect, i'm not surprised it's more pronounced on the target side
<mjg>
fwiw it's not bad for regular movs
<geist>
under the hood it still has to move data via some block transfer, so the effects are still there
<mjg>
as in it is cheaper to do the misaligned stores than to spend time aligning the buffer
<geist>
i think in a regular mov that can probably be soaked up with various OOO effects of the cpu
<geist>
but in a copy situation that gets saturated and it starts to get synchronously lined up with whatever the copy pipeline is doing
<mjg>
the cpu can check what the buf is, do the small mov internally and only then roll with the stream
<mjg>
at least i'm assuming this much, perhaps there are fucked reasons why they can't feasibly do it
terminalpusher has quit [Remote host closed the connection]
<geist>
yah dunno, would be interesting to see how the microcode works in that case. of course there are i'm sure at this point a handful of implementations of that copy engine, so it's probably fairly core specific nowadays
<geist>
including AMD cores
<geist>
which i guess is the advantage of that scheme: tell the cpu what you mean to do and let it figure it out
<geist>
even ARM is on that train, there's a new extension that basically does the same thing coming down the pike
<geist>
mjg: but anyway that does arrive at the original question: given current sets of x86 machines and ERMS (and whatever the new ERMS second bit is) what would you recommend as a general in-kernel memcpy solution?
<geist>
especially given that lots of copies are fairly small, and then some are probably page sized
<geist>
the page sized copy can probably be called out and optimized separately
<geist>
i think for zircon we've just done the usual 'if erms use rep movsb, else do some sort of combination of align and rep movsq'
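A rough sketch of the dispatch geist describes: use `rep movsb` when the CPU reports ERMS, otherwise align and use `rep movsq`. This is not Zircon's code. The function names are hypothetical, and aligning the destination (rather than the source) is an assumption based on the earlier remarks about target alignment. ERMS is reported in CPUID leaf 7, subleaf 0, EBX bit 9; the sketch assumes GCC or Clang and degrades to a byte loop on other targets.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && defined(__GNUC__)
#define SKETCH_X86_64 1
#include <cpuid.h>
#else
#define SKETCH_X86_64 0
#endif

/* Returns 1 if CPUID advertises ERMS (leaf 7, subleaf 0, EBX bit 9).
 * The newer "fast short rep mov" bit (FSRM, EDX bit 4) is not checked. */
int cpu_has_erms(void)
{
#if SKETCH_X86_64
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0; /* leaf 7 not supported */
    return (int)((ebx >> 9) & 1u);
#else
    return 0;
#endif
}

/* Copy n bytes, choosing the strategy explicitly (handy for benchmarking). */
void *copy_with_policy(void *dst, const void *src, size_t n, int use_erms)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    (void)use_erms;
#if SKETCH_X86_64
    if (use_erms) {
        __asm__ volatile("rep movsb"
                         : "+D"(d), "+S"(s), "+c"(n) : : "memory");
        return dst;
    }
    /* Byte-copy until the destination is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 7u) != 0) {
        *d++ = *s++;
        n--;
    }
    /* Bulk of the copy in 8-byte units; d and s are advanced by the CPU. */
    size_t qwords = n / 8;
    n %= 8;
    __asm__ volatile("rep movsq"
                     : "+D"(d), "+S"(s), "+c"(qwords) : : "memory");
#endif
    /* Remaining tail bytes (or the whole copy on non-x86 builds). */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}

/* Detect once, then dispatch. A kernel would more likely decide at boot. */
void *dispatch_memcpy(void *dst, const void *src, size_t n)
{
    static int erms = -1;
    if (erms < 0)
        erms = cpu_has_erms();
    return copy_with_policy(dst, src, n, erms);
}
```

Keeping the policy as an explicit parameter makes it easy to time both paths on the same machine, which is the kind of measurement asked for earlier in the discussion.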
<mjg>
i would recommend what i did for freebsd, modulo uarch specific optimizations you may need
<geist>
that's an 11th gen intel, seems like it *should* have it
<geist>
well anyway
zid has joined #osdev
smach has joined #osdev
GeDaMo has quit [Quit: A program is just a bunch of functions in a trenchcoat.]
smach has quit [Remote host closed the connection]
smach has joined #osdev
smach has quit [Remote host closed the connection]
smach has joined #osdev
smach has quit [Remote host closed the connection]
smach has joined #osdev
sav_ has joined #osdev
_whitelogger has joined #osdev
smach has quit [Client Quit]
sav_ has quit [Remote host closed the connection]
smach has joined #osdev
smach has quit [Remote host closed the connection]
smach has joined #osdev
smach has quit [Remote host closed the connection]
heat_ has joined #osdev
heat has quit [Read error: Connection reset by peer]
gareppa has quit [Quit: Leaving]
smach has joined #osdev
<moon-child>
I'm suddenly morbidly curious
<moon-child>
how do they implement string functions (aside from memset) on platforms with no unaligned memory ops?
<moon-child>
are you just boned if your things aren't coaligned?
<mjg>
you align them
<mjg>
or decide screw it and do 1 byte ops
<mjg>
if the size is small enough(tm)
smach has quit [Remote host closed the connection]
<moon-child>
'align them' how?
<moon-child>
they could have different alignments from each other; can't change that
<mjg>
well you also got strlen :-p
<mjg>
but ye, for 2 bufs it is huge pain
ThinkT510 has quit [Quit: WeeChat 3.6]
smach has joined #osdev
<mjg>
there is magic to align both and then continue
<mjg>
see e.g. memmove for armv7 on freebsd
<mjg>
i'm not very familiar with it, only skimmed
<mjg>
i doubt it is optimal for the platform
<moon-child>
'erg - unaligned source'
<moon-child>
'This is where it gets nasty ...'
<moon-child>
heartening comments...
ThinkT510 has joined #osdev
<mjg>
/* Do the buffers overlap? */
<mjg>
RETeq /* Bail now if src/dst are the same */
<mjg>
cmp r0, r1
<mjg>
now that's a waste of a branch
<zid>
I bet it speeds up some dumb program somewhere and the branch predictor gets it right 99.99% of the time
<geist>
yeah for alignment you basically need to align yourself
<geist>
https://godbolt.org/z/cdWzPajGT is a cheezy bored one afternoon thing i wrote to see what the codegen would be like for riscv (which doesn't necessarily do unaligned)
<bslsk05>
godbolt.org: Compiler Explorer
<geist>
untested, unoptimized, but it generally tries to deal with alignment
<geist>
and yeah if the source and dest are not similarly aligned it just gives up and goes to bytewise
<geist>
obviously you can unroll this sort of thing, deal with unalignments better (i wrote one on arm32 years ago that does this) etc
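The approach geist outlines can be sketched roughly as follows. This is not the code behind the godbolt link (its contents are not shown here); `coaligned_memcpy` and `word_t` are hypothetical names, and the `may_alias` attribute assumes GCC or Clang. If source and destination share the same misalignment, the head is copied bytewise until both are word aligned and the rest goes word by word; otherwise it gives up and copies bytewise throughout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A machine word that is allowed to alias byte buffers (GCC/Clang only). */
typedef unsigned long __attribute__((__may_alias__)) word_t;

void *coaligned_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    const uintptr_t mask = sizeof(word_t) - 1;

    assert(n == 0 || (dst != NULL && src != NULL));

    /* Same low address bits means one head copy aligns both pointers. */
    if ((((uintptr_t)d ^ (uintptr_t)s) & mask) == 0) {
        /* Head: bytewise until the (shared) alignment is reached. */
        while (n > 0 && ((uintptr_t)d & mask) != 0) {
            *d++ = *s++;
            n--;
        }
        /* Body: aligned word loads and stores only. */
        while (n >= sizeof(word_t)) {
            *(word_t *)d = *(const word_t *)s;
            d += sizeof(word_t);
            s += sizeof(word_t);
            n -= sizeof(word_t);
        }
    }

    /* Tail, or the entire copy when the buffers are not co-aligned. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Every word access here is naturally aligned, so the sketch stays valid on targets that trap on unaligned loads and stores. The cost is that differently aligned buffers get no speedup at all.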
<mjg>
zid: i happened to check for this bullshit and it virtually never happens
<bslsk05>
github.com: darwin-xnu/bcopy.s at main · apple/darwin-xnu · GitHub
<geist>
it also handles the 'source and dest are 1, 2, or 3 byte unaligned' case by loading a word and rotating it a bit and ORing it with neighboring words
<geist>
which is deliciously fun to do on arm32 because of the barrel shifter
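The shift-and-OR trick for a source that is 1, 2, or 3 bytes off from a word boundary can be illustrated like this. It is a simplified sketch, not geist's arm32 routine: `copy_shifted_words` is a hypothetical name, it assumes a little-endian machine, and it takes the source as an already aligned word array so that every load is aligned. The bytes wanted start `k` bytes into `src_words`, so each output word is built from the top of one aligned word and the bottom of the next.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Produce nwords output words holding the bytes that start k bytes into
 * src_words (k = 1, 2 or 3). Reads src_words[0..nwords], i.e. one extra
 * word, and only ever performs aligned 32-bit loads. Little-endian only. */
void copy_shifted_words(uint32_t *dst, const uint32_t *src_words,
                        size_t nwords, unsigned k)
{
    assert(k >= 1 && k <= 3);
    const unsigned lo = 8 * k;   /* drop the k low bytes of this word    */
    const unsigned hi = 32 - lo; /* make room for k bytes from the next  */

    uint32_t cur = src_words[0];
    for (size_t i = 0; i < nwords; i++) {
        uint32_t next = src_words[i + 1];
        dst[i] = (cur >> lo) | (next << hi);
        cur = next; /* each source word is loaded exactly once */
    }
}
```

On arm32 the shift can be folded into the `orr` instruction's second operand, so the merge costs little beyond the load and the store, which is presumably why the barrel shifter makes this pleasant there.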
<moon-child>
does anyone remotely modern _not_ have a barrel shifter?
<FreeFull>
klange: What did you spend those last few hours doing, exactly? Fighting an evil bug?
<mjg>
zid: i just remembered something absolutely crazy
<mjg>
the one crapper which had src == dst was in freebsd and was copied with rep mov
<mjg>
it was possible other cpus would read the buf at the same time as copy is happening
<mjg>
and there were cases where *this* would result in transient BULLSHIT in said buffer
<mjg>
copying the buffer onto itself
<mjg>
fucking amazing
<geist>
computers were a mistake
smach has joined #osdev
<klange>
FreeFull: Mostly signal shit
<klange>
That's dash, with working job control.
k8yun has joined #osdev
scoobydoo has quit [Read error: Connection timed out]
<klange>
I guess the main thing is I fixed waitpid() to only return on _transitions_ for stopped jobs. My own shell was cheating and didn't waitpid(-1,...)
gildasio has quit [Ping timeout: 268 seconds]
smach has quit [Remote host closed the connection]
<mjg>
i just fixed a bug which should not have happened(tm)
scoobydoo has joined #osdev
<mjg>
but of course managed to run into it roughly at the time reports started coming in
<mjg>
... while i was using the patch for several days now
gildasio has joined #osdev
<mjg>
software--
<FreeFull>
posix signals are nasty
<mjg>
posix is nasty
gildasio has quit [Remote host closed the connection]