klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
<mjg> and here is another trace:
<mjg> heheszek-read-bionic 62619868 cycles
<mjg> heheszek-read-erms 192876429 cycles
<mjg> heheszek-read-current 108545984 cycles
<mjg> heheszek-read-broken 33159219 cycles
<mjg> the -broken variant is a func which does nothing
<mjg> you may notice a func which does nothing is using 1/3rd of the time memset-current is
<mjg> also -erms almost twice as slow
<mjg> need to name it better
Matt|home has quit [Quit: Leaving]
<mrvn> mjg: and that still works?
<heat> stosb
<mjg> mrvn: ? the broken thing was only added for comparison purposes
<mjg> to see how much is spent on mere fact there is a func call
<mrvn> mjg: but I thought you were running some real world code. I would expect that to blow up and fail early.
airplanemodes has quit [Quit: WeeChat 3.8]
<mjg> i'm running *real* sizes, in the order they were obtained in a real workload, in a loop against various memsets
<mjg> and checking total time
<mrvn> ahh. That won't have the same cache behavior though
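The replay approach mjg describes (recorded sizes, in recorded order, looped against each candidate memset) could be sketched roughly as follows. This is only an illustration: `memset_fn`, `memset_bytewise`, `memset_noop` and `replay_trace` are made-up names, the trace is assumed to already be in memory as an array of sizes, and `clock()` stands in for the cycle counter used in the numbers above.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical signature shared by every memset candidate under test. */
typedef void *(*memset_fn)(void *dst, int c, size_t n);

/* Trivial byte-at-a-time candidate, present only as something to measure. */
static void *memset_bytewise(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    while (n--)
        *p++ = (unsigned char)c;
    return dst;
}

/* "Broken" candidate that does nothing: measures call overhead alone. */
static void *memset_noop(void *dst, int c, size_t n)
{
    (void)c;
    (void)n;
    return dst;
}

/*
 * Replay the recorded sizes in their recorded order, `rounds` times,
 * and return the elapsed CPU time in clock() ticks.
 */
static clock_t replay_trace(memset_fn fn, unsigned char *buf, size_t buf_len,
                            const size_t *sizes, size_t nsizes, unsigned rounds)
{
    clock_t start = clock();
    for (unsigned r = 0; r < rounds; r++) {
        for (size_t i = 0; i < nsizes; i++) {
            assert(sizes[i] <= buf_len); /* trace must fit the scratch buffer */
            fn(buf, 0, sizes[i]);
        }
    }
    return clock() - start;
}
```

As mrvn points out, a loop like this hits the same warm buffer on every call, so it preserves the order of sizes seen by the branch predictor but not the cache behavior of the original workload.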
[itchyjunk] has quit [Ping timeout: 255 seconds]
<zzo38> Consider adding B-Free to the list of abandoned projects. It is available on GitHub (and I have forked the project, but the only change I made is to add 64-bit types)
<mrvn> and someone should care about that why?
<zzo38> It is a FOSS implementation of BTRON. Such things are difficult to find, and someone who has interest in BTRON should hopefully try to improve it.
<mrvn> zzo38: so add a link to it on https://en.wikipedia.org/wiki/BTRON
<zzo38> Is it notable enough for Wikipedia?
[itchyjunk] has joined #osdev
<zzo38> Do you have further comments about my own operating system design? Just now, I added some more stuff to the design documentation.
<mrvn> If you have to ask for validation then you won't get it.
<zzo38> This system has a hypertext file system, capability-based security with proxy capabilities, locks and transactions on groups of objects as a unit, a common file format, and others.
<zzo38> I hope that if I have made any mistakes or left anything unclear, it can be fixed.
troseman has joined #osdev
<zzo38> I am also reading the forum to see if there is any interesting stuff mentioned in there
<heat> you should write it first
<heat> IMO not much point in designing what you haven't even tested
<zzo38> OK, that is a valid complaint. Still, I first want to write down how I intend to design it in the design documents; then an implementation can be written, and parts of the design documentation changed as needed when some things turn out to be problems. Possibly some problems can also be found before implementation, which would make the implementation less messy.
<mrvn> but if we tell you how will you ever learn?
<mrvn> what is a problem is also often subjective.
troseman has quit [Quit: troseman]
<zzo38> OK, but a collaborative design is also possible. It is true that what is a problem can be subjective, I suppose; I have found what seems to me to be problems with some other designs
troseman has joined #osdev
<zzo38> There is also the possibility of such things as unclear documentation, etc.
troseman has quit [Client Quit]
<mrvn> s/possibility/inevitability/
<geist> problem with a lot of this stuff is when you're first getting started there's lots of things you dont know you dont know. so if you try to build some comprehensive design you'll be missing large sets of things that you really should be thinking about
<geist> so the general strategy that works is to start by doing, then as you do things you'll learn more of what you dont know
<geist> it gives you a framework to start putting more knowledge on top of
<geist> over time you start to get a better handle on all the things that do and dont matter with larger designs
<geist> this is the same reason i dont like to just info dump on new folks that come in and ask questions. you have to do it in phases so they can learn the framework to attach more knowledge to later
<mrvn> also when you start with the perfect design you will need years to get anywhere and 99.999% of people give up. So it's not really worth investing time in such people. Start small and learn.
<geist> otherwise you're just dumping info on someone. like many bad professors at university
<mrvn> and with that I bid you good night.
* geist is moving around the house with laptop, trying to find the best place to keep from coughing too hard
gabi-250_ has quit [Ping timeout: 255 seconds]
<geist> have had a shitty cold for the last 5 days, i think it peaked on friday with a fever, but now it's mostly just moving towards chest congestion
<geist> and i *hate* coughing
gabi-250_ has joined #osdev
<heat> i feel you
<heat> was also under the weather a week ago
<zzo38> I think that it is worth designing both low-level and high-level stuff, and such things should also be implemented in the wiki
zzo38 has left #osdev [#osdev]
jcmdln has joined #osdev
rcvalle has quit [Ping timeout: 248 seconds]
troseman has joined #osdev
rcvalle has joined #osdev
troseman has quit [Quit: troseman]
SpikeHeron has quit [Quit: WeeChat 3.8]
troseman has joined #osdev
troseman has quit [Client Quit]
SpikeHeron has joined #osdev
troseman has joined #osdev
troseman has quit [Quit: troseman]
slidercrank has quit [Ping timeout: 248 seconds]
troseman has joined #osdev
troseman has quit [Quit: troseman]
elderK has joined #osdev
wand has quit [Remote host closed the connection]
troseman has joined #osdev
wand has joined #osdev
troseman has quit [Client Quit]
[itchyjunk] has quit [Remote host closed the connection]
<Jari--> morning all
heat has quit [Ping timeout: 248 seconds]
<geist> morn
ghostbuster has quit [Quit: WeeChat 3.6]
Vercas has quit [Remote host closed the connection]
Vercas has joined #osdev
dude12312414 has joined #osdev
dude12312414 has quit [Client Quit]
bgs has joined #osdev
Matt|home has joined #osdev
wand has quit [Ping timeout: 255 seconds]
Arthuria has quit [Remote host closed the connection]
wand has joined #osdev
<klys> user: mdasoh hostname: whatever model: gigabyte mz72-hb0 v3.0 distro: debian unstable/sid gnu/linux amd64 kernel: linux 6.0.0 uptime: up 5 days, 4:24, 8 users, load average: 0.06, 0.15, 0.26 processes: 637 kvm: 2 running virtual machines window manager: openbox desktop environment: none shell: /bin/bash terminal: konsole 4:22.12.0-2 packages: 4248 temperature: not supported by kernel cpu: AMD
<klys> EPYC 7453 28-Core Processor @ 2750.095 MHz gpu: NVIDIA Corporation GM206 [GeForce GTX 960] (rev a1), reserved to vm running ms windows 7 professional 64-bit resolution: 1366x768 via magewell usb capture hdmi ram: 242958.6 free / 257617.4 total MiB ssd: 1154342732 free / 1377099228 total disk: 12692037120 free / 13371027980 total reserved to vm running debian testing/bookworm gnu/linux amd64
vin has quit [Quit: WeeChat 2.8]
<geist> klys: grats
<klys> geist, sure, thx
AmyMalik has quit [Quit: Bye Open Projects!]
eroux has quit [Quit: Textual IRC Client: www.textualapp.com]
Ellenor has joined #osdev
Ellenor is now known as AmyMalik
wand has quit [Remote host closed the connection]
ThinkT510 has quit [Quit: WeeChat 3.8]
ThinkT510 has joined #osdev
wand has joined #osdev
<geist> well, got Lk kinda running on the vision five board. having trouble getting the uart to fire irqs, but will leave that to another day
<geist> sadly the way the plic is set up and the uart are completely undocumented on the JH7110 soc, but the device tree tells me they're pretty darn standard
<geist> so it's probably just some register i have to frob somewhere
GeDaMo has joined #osdev
<geist> oh i see what it is, it's the stupid hart 0 misaligning the contexts in the PLIC
foudfou_ has quit [Quit: Bye]
<klys> w00t
<sham1> w00t w00t
foudfou has joined #osdev
<GeDaMo> wಠಠt :|
danilogondolfo has joined #osdev
eroux has joined #osdev
<geist> sweet. seems to be working fine now
<klys> so nothing is wrong?
<geist> nah just had the mappings off. it's a stupid PLIC driver problem that i need to solve once and for all
<geist> i keep copying a PLIC driver for every riscv target and hacking it, because the mapping of cpu # (hart) to interrupt target (the plic's notion of a cpu) is not uniform
<geist> annoyingly so. the vf2 has the same 'hart 0 is a machine mode only cpu that isn't started by default' thing
<geist> so it uses up just one interrupt target on the plic, but all the other harts (1 - 4) use two
<geist> so the interrupt targets are offset by one
<geist> because for every cpu in the PLIC that has both M and S mode, it has two banks of registers, but only one bank for cpu 0
<geist> basically.
<geist> the device tree describes this which is the Real Solution to it
<geist> anyway when i copy pasted it i didn't fix it up properly here
<geist> i knew it was a problem and i thought i had tweaked it, but not sufficiently
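The hart-to-target mapping geist describes can be written down as a small helper. This is a sketch under the assumption stated in the discussion (hart 0 is M-mode only and owns one PLIC context, every other hart owns an M context followed by an S context); the function names are invented here, and a real driver would read the mapping from the device tree instead of hardcoding it. The register offsets follow the layout in the RISC-V PLIC specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed layout: context 0 = hart 0 (M-mode only),
 * then hart 1 -> contexts 1 (M) and 2 (S), hart 2 -> 3 and 4, and so on.
 */
static unsigned plic_context_for_hart(unsigned hart, bool supervisor)
{
    if (hart == 0) {
        assert(!supervisor); /* hart 0 has no S-mode context in this layout */
        return 0;
    }
    return 2 * hart - 1 + (supervisor ? 1u : 0u);
}

/* Byte offset of the first interrupt-enable word for a context. */
static uint32_t plic_enable_offset(unsigned context)
{
    return 0x2000u + 0x80u * context;
}

/* Byte offset of the claim/complete register for a context. */
static uint32_t plic_claim_offset(unsigned context)
{
    return 0x200000u + 0x1000u * context + 4u;
}
```

With this layout the S-mode context for hart 1 is 2, not 1, which is the off-by-one that a driver copied from a board without the extra M-mode-only hart would get wrong.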
<klys> that sounds p.awesome, so you have visionfive support...
<geist> yah still need to add the secondary bringup code
<geist> but i'll do that tomorrow
<geist> it's not all that exciting to be honest, i had done bringup on a proper sifive unleashed board a long time ago, and this is all in all pretty close
<geist> and have been running on qemu riscv which is also fairly close
<geist> but it's nice to see it running on physical hardware again. i gave away my unleashed board years ago so haven't had a good physical riscv rv64 machine to run on in a while
<bslsk05> www.starfivetech.com: News details page
<GeDaMo> "RISC-V SBC VisionFive 2 Officially Shipped"
<geist> yep. got mine last week. it's actually pretty decent
<GeDaMo> Neat
<geist> i mean not really worth the money from a performance point of view, but it's usable
<geist> for $100 you get a half decent riscv machine (think cortex-a53) with 2 ethernet ports and 8GB ram.
<geist> i wouldn't try running any UIs on it though. i think the gpu stuff is in terribad shape
<geist> but as a headless linux box or something to hack on it's pretty straightforward
gog has joined #osdev
nyah has joined #osdev
<zid> is there a complicated german word for "I just found the best meme but it requires knowledge of a very specific thing to be understandable and I am now sad"
slidercrank has joined #osdev
<dminuoso> Memekontextverständnissmangeltrauer
gildasio has joined #osdev
<dminuoso> Oh, drop one of the s in the middle
<dminuoso> Memekontextverständnismangeltrauer
wand has quit [Remote host closed the connection]
<dminuoso> zid: does that work for you?
<zid> Thanks dminuoso
<dminuoso> Replace trauer with kummer, that sounds a bit cuter
<dminuoso> Memekontextverständnismangelkummer
<dminuoso> Yes that.
wand has joined #osdev
gildasio has quit [Ping timeout: 255 seconds]
gildasio has joined #osdev
bauen1 has quit [Ping timeout: 268 seconds]
eroux has quit [Ping timeout: 248 seconds]
bauen1 has joined #osdev
eroux has joined #osdev
bauen1 has quit [Ping timeout: 268 seconds]
craigo has joined #osdev
craigo has quit [Read error: Connection reset by peer]
craigo has joined #osdev
Raito_Bezarius has quit [Remote host closed the connection]
theboringkid has joined #osdev
bradd has quit [Ping timeout: 248 seconds]
wand has quit [Remote host closed the connection]
gabi-250_ has quit [Remote host closed the connection]
gabi-250_ has joined #osdev
wand has joined #osdev
Vercas0 has joined #osdev
theboringkid has quit [Ping timeout: 255 seconds]
Vercas has quit [Ping timeout: 255 seconds]
Vercas0 is now known as Vercas
leon has quit [Ping timeout: 252 seconds]
gabi-250_ has quit [Ping timeout: 255 seconds]
gabi-250_ has joined #osdev
leon has joined #osdev
foudfou has quit [Quit: Bye]
elastic_dog has joined #osdev
gildasio has quit [Ping timeout: 255 seconds]
gildasio has joined #osdev
Left_Turn has joined #osdev
bauen1 has joined #osdev
smach has joined #osdev
smach has quit []
gildasio has quit [Remote host closed the connection]
gildasio has joined #osdev
gxt__ has quit [Ping timeout: 255 seconds]
gxt__ has joined #osdev
gildasio has quit [Ping timeout: 255 seconds]
gildasio has joined #osdev
ghostbuster has joined #osdev
<mjg> CMake Warning:
<mjg> Manually-specified variables were not used by the project:
<bslsk05> ​github.com: llvm-project/libc/benchmarks/automemcpy at main · llvm/llvm-project · GitHub
elastic_dog has quit [Remote host closed the connection]
elastic_dog has joined #osdev
Arthuria has joined #osdev
Left_Turn has quit [Remote host closed the connection]
<mrvn> mjg: so now you have the proven best memcpy?
<mjg> no, now i have complaints about the usual problem: a howto not working
<mrvn> Does it output source code one can still read?
<mjg> the machinery to autogen stuff does not build, see above
elastic_dog has quit [Ping timeout: 248 seconds]
elastic_dog has joined #osdev
theboringkid has joined #osdev
elastic_dog has quit [Read error: Connection reset by peer]
elastic_dog has joined #osdev
bauen1 has quit [Ping timeout: 252 seconds]
elastic_dog has quit [Ping timeout: 248 seconds]
elastic_dog has joined #osdev
heat has joined #osdev
<heat> mjg, omg its our neighbourhood freebsd developer
<heat> hi!
<mjg> edited some cmakes, got the entire thing to try to build but it fails
<heat> how tf are you building it
<mjg> > /libc/benchmarks/automemcpy/lib/Implementations.cpp:6:10: fatal error: 'src/string/memory_utils/elements.h' file not found
<mjg> there is no file named elements.h anywhere nor anything like it
<heat> also btw pcm (PECIMAL) fixes got merged
<mjg> heat: dude i literally tried to follow the instruction
<heat> i have an elements.h in my local tree
<mjg> what's your version
<heat> old, c9faea04b1f8ef658ee5367ba8f00266b2051263, dated may 6
<mjg> let's try
<mjg> maybe it is autogened?
<heat> no
<heat> it's in ./libc/src/string/memory_utils/elements.h
<heat> (and I don't do in-tree builds)
<mjg> i confirm the file is there if i go back to that commit
<mjg> see top of main
<mjg> commit 534f4bca58f856eaecfcf4a698e7e6b2470349e4
<mjg> Author: Guillaume Chatelet <gchatelet@google.com>
<mjg> Date: Tue Oct 25 11:09:59 2022 +0000
<mjg> [libc] remove mem functions dead code
<mjg> whacked in this commit
<heat> WHACKED
<heat> ok now that we finally tracked down the issue
<heat> why is spinlock_enter, etc not inline?
<mjg> geezer
<heat> I touched spinlock stuff last evening
<mjg> the code is pretty atrocious so it makes sense to not be inline
<heat> let me gist you this shit
<heat> i'm not convinced that the codegen is that good
<heat> after looking at linux, I'm probably missing some stuff
remexre has quit [Ping timeout: 252 seconds]
mctpyt has joined #osdev
<mjg> CMake Error at /root/repos/llvm-project/libc/CMakeLists.txt:116 (message): entrypoints.txt file for the target platform 'freebsd/x86_64' not found.
<mjg> sigh
<mjg> i'm done with this for the day
<bslsk05> ​gist.github.com: module_add_disasm.S · GitHub
<heat> tell me what you think
<heat> module_add is a pretty trivial hand rolled single linked list insert with a spinlock over it (which is why i'm using it as a codegen "benchmark")
<heat> doing dec and then re-fetching the value + test is pretty weird but I don't have condition code fuckery like linux does for this (yet)
xenos1984 has quit [Ping timeout: 246 seconds]
<heat> the pushf; pop; test is also bad
xenos1984 has joined #osdev
mctpyt has quit [Ping timeout: 248 seconds]
<mjg> your preemption code remains total crap
<mjg> freebsd, with all its flaws, can do one branch and that's it
<heat> ideally you could just use the preemption counter as a way to gauge if I need to/can reschedule
eroux has quit [Ping timeout: 248 seconds]
<mjg> you don't want to check if you can reschedule, normally you have to assume the counter goes to 0
<mjg> what you want is to mark somehow that shit needs to be done
<mjg> and check for that
<mjg> the easiest way is to have another var
<mjg> you would get a failing grade in autopreempt
theboringkid has quit [Ping timeout: 268 seconds]
eroux has joined #osdev
<heat> so linux stuffs a bunch of crap in preempt_count
<heat> the actual preempt count is a tiny portion of the whole var
<heat> the idea would be to stick "thread needs preempt" and "needs softirq" in the top bits
<heat> and even then, I'm not sure I need a "needs softirq" in preempt_enable code
<heat> because softirqs should only be raised in hardirq context, and if so there's only one exit point
<heat> the test for irq-on I'm not too sure I need as well
<heat> like, having a spin_lock() and then a spin_lock_irqsave is valid, and nothing will malfunction in this case. having a spin_lock_irqsave and then a spin_lock is not valid
<heat> right?
<mjg> simplifying but yes
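One way to get the single-branch `preempt_enable` discussed above is to fold the "needs reschedule" flag into the top bit of the counter, stored inverted, so that a decrement reaching zero means both "count is zero" and "reschedule requested". The sketch below is a single-CPU illustration of that idea with made-up names (`preempt_word`, `schedule_stub`); it ignores atomicity against interrupts, softirq bits and per-CPU storage, all of which a real kernel has to deal with.

```c
#include <assert.h>
#include <stdint.h>

/* Top bit set means "no reschedule pending" (the flag is stored inverted). */
#define NEED_RESCHED_INV 0x80000000u

/* Hypothetical per-CPU word: low bits are the preempt-disable depth. */
static uint32_t preempt_word = NEED_RESCHED_INV;
static unsigned resched_calls; /* counts calls into the stub scheduler */

/* Stand-in for the real scheduler entry point. */
static void schedule_stub(void)
{
    resched_calls++;
    preempt_word |= NEED_RESCHED_INV; /* request has been handled */
}

/* Called when a wakeup or timer decides the current thread should yield. */
static void set_need_resched(void)
{
    preempt_word &= ~NEED_RESCHED_INV;
}

static void preempt_disable(void)
{
    preempt_word++;
}

static void preempt_enable(void)
{
    assert((preempt_word & ~NEED_RESCHED_INV) != 0); /* unbalanced enable */
    /* One decrement, one test: zero only if depth is 0 AND resched wanted. */
    if (--preempt_word == 0)
        schedule_stub();
}
```

In the common case the word never reaches zero because the inverted flag is still set, so the fast path is a decrement plus one not-taken branch.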
gog has quit [Quit: Konversation terminated!]
<mjg> sigh!
<mjg> while i don't have the auto thing operational, it got far enough to generate Implementations.cpp
<mjg> and this is where i'm deeply disappointed with the paper
<mjg> *every* *single* *one* implementation rolls with increasing size checks
<mrvn> where are the primitives it uses to build a memcpy?
<heat> their whole premise is very far-fetched anyway
<mjg> not even one rolls with the approach found elsewhere
<heat> "lets write a memcpy in C++"
<mjg> it is definitely way less than i thought it would be
<heat> i am very much not convinced that writing a memcpy in C++ is feasible
<mrvn> heat: why not?
<mjg> i checked the asm, it is ok, provided the code is not doing the stupid
<mjg> for example if(size < 8) return splat_set<HeadTail<_4>>(dst, 0, size);
<heat> you can't legally do overlapping stores
<mjg> compiles to an ok overlapping store
<heat> how?
<mjg> it *does* compile to it
<mjg> don't ask me how, i don't speak c++
<heat> how could you even do it in C?
<mjg> in c i don't think you could
<mrvn> your code will only ever be as good as your compiler. but they are pretty good.
<heat> well C++ is the same shit
<mjg> aha!
<mjg> builtin::Memset
<heat> unless __attribute__((aligned(1))?
<mjg> et al
<mjg> so there is an explicit compiler support for it
<heat> no
<heat> builtin:: is not compiler stuff I guarantee you that
<heat> __builtin_ would be, as in C
<mjg> well then happy grepping
[itchyjunk] has joined #osdev
<mrvn> Note that gcc/clang understand and detect various forms of memset() and will replace the code with their own. So you might generate 3 different memset flavours that all just turn into the compiler's memset.
<mjg> so i'm sayin when they do this:
<mjg> if (count < 8)
<mjg> return builtin::Memcpy<4>::head_tail(dst, src, count);
<mjg> it *does* compile to an overlapping store, the way one would write in asm
<mjg> and for smaller sizes they correctly handle partial register access
<mjg> as in, perhaps modulo adding or not adding alignment to jump targets, this can generate the code i would write by hand
<mjg> so far anyway
<heat> ok, found the builtin
<heat> __builtin_memcpy_inline
<mjg> compiler support at last!
<mjg> there is also a lot of magic concerning what kind of simd to use
<bslsk05> ​godbolt.org: Compiler Explorer
<mjg> all my above comments were about gpr
<heat> i don't see how this is doing overlapping stores
<mjg> because you used sizes known at compilation time
<heat> those are the only available options
<heat> you can't give it a non-constexpr value
<mjg> take a page from the original
<mjg> if (count < 4) lolpunt();
<mjg> if (count < 8) __builtin_memcpy_inline(..., count);
<bslsk05> ​dpaste.com <no title>
<mjg> just copy some statements from there and adjust
<mjg> you literally provided no size arg :O
<heat> yeah erm no
<zid> if(count < 4) { DEBUG("Ever heard of MOV bro?"); }
<heat> I can't get it to do overlapping stores
<heat> no idea if there's more magic involved with this shit
<mjg> return builtin::Memcpy<8>::head_tail(dst, src, count);
<mjg> this magic compiles to it
<mjg> presumably they massage it to some extent
<heat> oh
<heat> they actually do the arithmetic
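The head/tail trick being discussed can be shown in plain C with fixed-size `memcpy` calls, which compilers typically lower to single loads and stores. This is a sketch of the technique, not the llvm-libc code; `copy_4_to_8` is a name invented here. For any length from 4 to 8 it copies the first 4 bytes and the last 4 bytes, letting the two stores overlap in the middle instead of branching on the exact size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes, 4 <= n <= 8, using two possibly overlapping 4-byte moves. */
static void copy_4_to_8(void *dst, const void *src, size_t n)
{
    uint32_t head, tail;

    assert(n >= 4 && n <= 8);

    /* Do both loads first so the result is right even if dst overlaps src. */
    memcpy(&head, src, 4);
    memcpy(&tail, (const unsigned char *)src + n - 4, 4);

    /* The tail offset (n - 4) is the only arithmetic that depends on n. */
    memcpy(dst, &head, 4);
    memcpy((unsigned char *)dst + n - 4, &tail, 4);
}
```

Because every load happens before any store, the same routine also works for overlapping buffers, which is the memmove-friendly property mentioned later in the discussion.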
<mjg> anyway the fucking autogen uses a retired variant
<mjg> will have to update
<bnchs> hi
<heat> hi
<mjg> hello there
<heat> general
<heat> kenobi
<bnchs> bye
<mjg> have you seen that star wars movie with subs translated from english to chinese and *back*?
<mjg> there is a fan made dub using them
<bslsk05> ​godbolt.org: Compiler Explorer
<heat> there you go
<heat> this is cool
<mjg> close to cool
<mjg> this being 2 separate statements is a little iffy
<mrvn> mjg: shì de
<mjg> i would prefer a more explicit "this is a fucking overlapping store situation"
<mrvn> 是的
<mjg> from this to that range
<heat> well this is supposed to be used by people who are bad at programming
<heat> like llvm-libc devs
gildasio has quit [Remote host closed the connection]
<mjg> lemme find that star wars movie
gildasio has joined #osdev
<heat> fyi they like interleaved loads and stores
<bslsk05> ​www.youtube.com: Star War The Third Gathers: Backstroke of the West HD (Dubbed) - YouTube
<mjg> heat: regular or simd
<heat> regular
<mjg> were.dat
<heat> let me write a gpr memcpy with this shit
<heat> give me 5
<mjg> try the movie since 24:00
<mjg> i'm not looking at any asm until you do mofer
danilogondolfo has quit [Quit: Leaving]
<heat> omg it unrolled the hot loop
<heat> cry.jpeg
<mjg> loop?
Vercas has quit [Ping timeout: 255 seconds]
<mjg> if you slap builtin::Memcpy<64>::head_tail(dst, src, count); with -mno-sse i strongly suspect it will be the *pessimal* fuckton of straight movs
<mjg> a'la unrolled
remexre has joined #osdev
<bslsk05> ​godbolt.org: Compiler Explorer
<heat> addq $32, %rdi
<heat> addq $32, %rsi
<heat> addq $-32, %rdx
<heat> cmpq $31, %rdx
<heat> ja .LBB0_2
<heat> testq %rdx, %rdx
<heat> jne .LBB0_4
<heat> jmp .LBB0_13
<heat> truly genius code
<heat> all bow to the all seeing compiler
<mjg> it does generate interleaved ops tho as you mentioned
<mjg> will add that to my bench matrix
<heat> erm remove the fno-omit-frame-pointer ofc
<mjg> it also goes highest to lowest
<heat> yep
<heat> fyi gcc does not have this
<mjg> your code is wrong though
<heat> where?
<mjg> len >= 32 does not have to be a multiple of 32
<heat> hm?
<mjg> oh it is fine, i misread
<mjg> it also generates decreasing switch so to speak
<mjg> which is what typical hand-coded routines are doing
xenos1984 has quit [Ping timeout: 246 seconds]
Vercas has joined #osdev
<mjg> and which is the exact opposite of what was handcoded in the automemcpy case
<mjg> testing these cases upfront vs what i have now is what i wanted to do anyway
<heat> i mean, I did explicitly make it decreasing
<mjg> now i got even more reason
<mjg> oh wait, you added if (*likely*)
<mjg> this may be why
<mjg> let's drop all hints
<mjg> aaand no switch upfront
<mjg> :p
<mjg> welp i'm gonna test anyway, we will see what happens
<mjg> another gripe i have with the automemcpy thing is that they just randomized all their samples
<mjg> they have actual run of what really happened
<mjg> i found there are many cases where memset is called repeatedly for the same size
<mjg> which also means branch predictor is gonna learn
<mjg> and make it faster
<mjg> this goes away if you roll with random
<heat> to be fair I think it's interesting to see what happens if you write it non-optimally
<heat> what I wrote is pretty much a translation-ish of my memcpy
<heat> so what happens if you write a bad memcpy like llvm-libc
<mjg> it is not *bad*
<heat> just a bunch of heavily abstracted if cases or something
<heat> well, the way it is coded is
<mjg> wdym
<heat> it's just praying the compiler gets it right
<mjg> afaiu the compiler was modified specifically to generate precisely what they ask for
<mjg> so i don't think there is much praying going on here
<heat> __builtin_memcpy_inline is, yes
<heat> branching, etc? no
<heat> unless there's some "if __builtin_memcpy_inline is used, this is a memcpy function so lay things out like we want to"
<mjg> the generated asm is pretty clearly what they intended
<mjg> i agree in principle things may change and it can start compiling differently
<heat> shrug
<heat> I wrote my function in a very explicit "this is how I want it to be" but it completely ignored my suggestion
gog has joined #osdev
<mjg> i don't think it did
<mjg> you got the size switch upfront, as requested with 'likely'
<mjg> and the 32 byte loop was placed prior to other stores
<heat> I didn't ask for a switch
<heat> i wrote a cascading thing like me/freebsd/linux
<mjg> if you wanted a cascade you should have written it like automemcpy
<mjg> if (size > x)
<mjg> a bunch of times
<heat> ok so there's no control in this stuff
<heat> good to know
<heat> compiler giveth, compiler taketh away
<heat> "hurr durr bad codegen send a compiler patch" does not work for most people
<mjg> now i remember why i did 32 byte read upfront
<mjg> it is so that when used as memmove i don't have to check for overlap
<heat> which branch?
<mrvn> If you have N checks of (size > x) have you considered benchmarking all the different orders for those tests and doing them linear or as tree?
<heat> all the <32 stuff you have does loads up front for that exact reason
<heat> you even commented that, pretty smart
<mjg> note none of these do the pointer comparison
<mjg> i would like to point out that for copyin/copyout use you can assume the user pointer is lower
<mjg> so perhaps codegen could reflect that
theboringkid has joined #osdev
xenos1984 has joined #osdev
<mjg> i have no clue what impact, if any, this has
gildasio has quit [Ping timeout: 255 seconds]
Left_Turn has joined #osdev
Turn_Left has joined #osdev
Left_Turn has quit [Ping timeout: 248 seconds]
gildasio has joined #osdev
<bslsk05> ​github.com: dpdk/rte_memcpy.h at master · scylladb/dpdk · GitHub
<mjg> > memcpy() simply copies data one by one from one location to another. On the other hand memmove() copies the data first to an intermediate buffer, then from the buffer to destination.
<mjg> :d
Turn_Left has quit [Ping timeout: 255 seconds]
<mjg> // Sample program to show that memmove() is better than
<mjg> // memcpy() when addresses overlap.
<mjg> :D
<mjg> too bad they don't have a chapter on writing memcpy itself
<mjg> ... they do
<mjg> fuck me
<mjg> > How to implement memmove()?
<mjg> // Create a temporary array to hold data of src
<mjg> char *temp = new char[n];
<mjg> ey heat, you can learn a thing or two here https://www.geeksforgeeks.org/write-memcpy/
<bslsk05> ​www.geeksforgeeks.org: Write your own memcpy() and memmove() - GeeksforGeeks
<mjg> fucking retarded
<mrvn> mjg: ouch, that segfaults due to stack overflow so fast you can't even blink
xvmt has quit [Ping timeout: 248 seconds]
<gog> lmao watf
<zid> memmove isn't implementable in C without a compiler loophole to define it
xvmt has joined #osdev
<zid> which is fun
<mrvn> zid: huh? Sure it is.
<Ermine> gog: may I pet you
<gog> yes
gxt__ has quit [Read error: Connection reset by peer]
* Ermine pets gog
* gog prr
<gog> it requires a check for overlap, and if the overlap is idk which way you have to do it head to base rather than base to head
<bnchs> mjg: bro, don't use geeks for geeks
<mjg> bnchs: what? why
<Ermine> depends on overlap kind
<mjg> look GREAT
<mrvn> The only problem is comparing 2 pointers that may or may not belong to the same allocation. So it's implementation defined.
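A direction-choosing memmove along the lines gog and mrvn describe might look like the sketch below. It sidesteps the pointer-comparison problem by converting both pointers to `uintptr_t`, whose result is implementation-defined; the code assumes a flat address space where those integers order the same way as the addresses. `memmove_sketch` is a name invented here and the byte loops are deliberately naive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static void *memmove_sketch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    /*
     * Unsigned difference: if dst is below src it wraps to a huge value,
     * and if dst is at least n bytes above src it is >= n. In both cases
     * a forward (base to head) copy cannot clobber unread source bytes.
     */
    if ((uintptr_t)d - (uintptr_t)s >= n) {
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* dst starts inside [src, src + n): copy head to base instead. */
        for (size_t i = n; i-- > 0; )
            d[i] = s[i];
    }
    return dst;
}
```

No temporary buffer is needed in either branch, so the auxiliary space is constant.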
<bnchs> it's slow, bloated, and has low-quality or copied answers
<gog> yeahhhh
<gog> aliasing rules
<mjg> bnchs: no wait mate
<mjg> bnchs: no way mate
<mrvn> bnchs: it also lies about the Auxiliary Space: O(1), it's O(n)
<mjg> how can this be bad if it is top google result
<mjg> basic logic mate
<Ermine> ah indeed, you cannot compare pointers...
<bnchs> because they pump their SEO
<bnchs> lol :3
<mrvn> gog: char * can always alias.
<mjg> but if it was bad people would not be using it
<mjg> i'm gonna take their memmove without credit
<Ermine> ... unless they belong to the same allocation...
gxt__ has joined #osdev
<mjg> their shell is my zsh replacement
<mrvn> If you want to write memcpy() in 100% correct C is there any other way but copying byte by byte?
<mrvn> You can't legally cast src/dst to short, int, long to get larger read/writes that I can see.
<mjg> fair q, i have no idea
* nikolar pets gog
jjuran_ has joined #osdev
[_] has joined #osdev
sinvet__ has joined #osdev
grange_c00 has joined #osdev
fr56677654rrfffu has joined #osdev
DrPatater has joined #osdev
Bonstra_ has joined #osdev
terrorjack8 has joined #osdev
gxt__ has quit [Ping timeout: 255 seconds]
dinkelha1 has joined #osdev
Arthuria has quit [Ping timeout: 248 seconds]
Patater has quit [Ping timeout: 248 seconds]
dinkelhacker has quit [Ping timeout: 248 seconds]
[itchyjunk] has quit [Ping timeout: 248 seconds]
terrorjack has quit [Ping timeout: 248 seconds]
jjuran has quit [Ping timeout: 248 seconds]
grange_c0 has quit [Ping timeout: 248 seconds]
Gooberpatrol_66 has quit [Ping timeout: 248 seconds]
nikolar has quit [Ping timeout: 248 seconds]
simpl_e has quit [Ping timeout: 248 seconds]
Bonstra has quit [Ping timeout: 248 seconds]
sinvet has quit [Ping timeout: 248 seconds]
terrorjack8 is now known as terrorjack
grange_c00 is now known as grange_c0
jjuran_ is now known as jjuran
fr56677654rrfffu has quit [Quit: Quit]
nikolar_ has joined #osdev
gxt__ has joined #osdev
nikolar_ is now known as nikolar
fr56677654rrfffu has joined #osdev
Left_Turn has joined #osdev
Arthuria has joined #osdev
Turn_Left has joined #osdev
fr56677654rrfffu has quit [Quit: Quit]
Left_Turn has quit [Ping timeout: 248 seconds]
fr56677654rrfffu has joined #osdev
fr56677654rrfffu is now known as EineNdamen
EineNdamen has quit [Changing host]
EineNdamen has joined #osdev
EineNdamen has quit [Quit: Quit]
EineNdamen has joined #osdev
EineNdamen has quit [Client Quit]
EineNdamen has joined #osdev
EineNdamen has quit [Client Quit]
EineNdamen has joined #osdev
slidercrank has quit [Ping timeout: 252 seconds]
Left_Turn has joined #osdev
gildasio has quit [Remote host closed the connection]
Turn_Left has quit [Ping timeout: 256 seconds]
gildasio has joined #osdev
EineNdamen has quit [Read error: Connection reset by peer]
dude12312414 has joined #osdev
EineNdamen has joined #osdev
tepperson has joined #osdev
elderK has quit [Quit: Connection closed for inactivity]
EineNdamen has quit [Read error: Connection reset by peer]
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
dude12312414 has joined #osdev
dude12312414 has quit [Remote host closed the connection]
EineNdamen has joined #osdev
EineNdamen has quit [Read error: Connection reset by peer]
GeDaMo has quit [Quit: That's it, you people have stood in my way long enough! I'm going to clown college!]
EineNdamen has joined #osdev
EineNdamen has quit [Remote host closed the connection]
<geist> mrvn: i dont *think* so, but then i think you might be technically forbidden from casting it to a char either
<geist> does make you wonder what memcpy looked like in environments were pointers were natively fat. ie segment:offset kinda stuff
<geist> or maybe in those environments they werent? ie, DOS large memory model
<mrvn> geist: you can cast anything to char. It's defined to give you the bit representation of an object.
<mrvn> char*
* geist nods
EineNdamen has joined #osdev
<mrvn> I guess you can do: uint16_t t = (((uint16_t)src[i]) << 8) | (src[i+1]); and hope the compiler optimizes that into a single load. But then you need to fix that for the hosts endianness and convince the compiler about alignment issues.
EineNdamen has quit [Read error: Connection reset by peer]
theboringkid has quit [Quit: Bye]
theboringkid has joined #osdev
EineNdamen has joined #osdev
EineNdamen has quit [Read error: Connection reset by peer]
bgs has quit [Remote host closed the connection]
lav is now known as yogaball
Vercas7 has joined #osdev
yogaball is now known as lav
Vercas has quit [Ping timeout: 255 seconds]
Vercas7 is now known as Vercas
wand has quit [Remote host closed the connection]
gildasio has quit [Ping timeout: 255 seconds]
gildasio has joined #osdev
EineNdamen has joined #osdev
EineNdamen has quit [Client Quit]
wand has joined #osdev
Vercas has quit [Quit: Ping timeout (120 seconds)]
Turn_Left has joined #osdev
<heat> mjg, oh my god that's a genius strategy
<heat> i'm surprised it's not in llvm-libc
Vercas has joined #osdev
<heat> hmm, can't really randomly allocate in the kernel can you
<heat> so in my case I will do the trivial optimization and allocate the temporary buffer on the stack
<mjg> right
<mjg> make sure to have multimegabyte stacks tho
<mjg> i mean you don't want limitations to memmove
<mjg> now, if they worked on automemmove
<mjg> perhaps they would stumble upon it
<mjg> lol @ folk 's what i'm sayin
Left_Turn has quit [*.net *.split]
<mrvn> Oh yeah, multimegabyte kernel stack for every thread. That will be so efficient.
<mjg> 1G stacks
<heat> real talk now im wondering if that can actually be faster than copying backwards
<mrvn> I thought we determined that the prefetcher will notice backwards sequential access and it won't be any slower
<mjg> ye i'm gonna 1. revisit the optimization docs 2. run some benchez once machinery gets operational
<heat> assuming you already have a temporary buffer, two forward copies vs one backwards
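For reference, the two strategies being compared can be sketched in C roughly as follows. This is an illustration only, not code from any libc: memmove_bounce, memmove_dir and the TMP_MAX limit are hypothetical names and values.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TMP_MAX 256 /* hypothetical limit on the stack bounce buffer */

/* Strategy A: two forward copies through a temporary buffer.
 * Returns -1 if the request does not fit in the buffer. */
static int memmove_bounce(void *dst, const void *src, size_t n)
{
    unsigned char tmp[TMP_MAX];

    if (n > TMP_MAX)
        return -1;
    memcpy(tmp, src, n);
    memcpy(dst, tmp, n);
    return 0;
}

/* Strategy B: a single copy whose direction depends on how the
 * regions overlap (forward if dst is below src, backwards otherwise). */
static void memmove_dir(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((uintptr_t)d < (uintptr_t)s) {
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else if ((uintptr_t)d > (uintptr_t)s) {
        while (n--)
            d[n] = s[n];
    }
}
```

Strategy A touches every byte twice and is bounded by the buffer size, which is the limitation the stack-size jokes are about.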
<mjg> brah
<mjg> openbsd way would be ONE buffer
<mjg> and all cpus just take turns doing memmove
<heat> one buffer and a bkl?
<heat> sgtm
<mjg> ok mjg@
<mjg> now i wonder, when i memset
<mjg> should i memset a temp buf and copy that?
<heat> geist, latest tianocore should support riscv OVMF
<mrvn> mjg: a buffer or a bunch of registers?
<mjg> what
<mjg> a buffer geeksforgeeks style
<mrvn> and you want to load that buffer into registers over and over?
<heat> obviously you use a global char protected by a mutex
<mrvn> or a buffer with the full size?
<heat> each iteration locks and unlocks
<mjg> each 1 byte access
* mrvn is still set on implementing memcpy/memset/memmove via DMA engine.
<mjg> now that i think of it i can wrap it in a zero cost abstraction for maximum performance
<heat> omg zero cost abstraction i write c++ i should know this omg omg
<mjg> here is a geezer story for you
<mjg> fucking guy claimed lock profiling has "sampling" implemented
<mjg> i tried to be nice and simply said it does not
<mjg> but he was adamant
<mjg> the "sampling" was incrementing *one* global var for every lock acquire
<mjg> so i enabled it
<mjg> perf went to shit so bad it was not even funny
<mjg> to his credit he conceded it perhaps does not work as intended
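A rough C sketch of why a single global counter hurts and what the common alternative looks like: one counter per CPU, each on its own cache line, summed only when read. NCPU, CACHE_LINE and the function names are assumptions for illustration, and the caller is assumed to pass its own CPU index; this is not the FreeBSD lock profiling code.

```c
#include <stddef.h>

#define NCPU 4        /* hypothetical CPU count */
#define CACHE_LINE 64 /* assumed cache line size in bytes */

/* One counter per CPU, aligned so that no two counters share a cache
 * line. A single shared counter would bounce one line between all CPUs
 * on every lock acquire. */
struct pcpu_counter {
    _Alignas(CACHE_LINE) unsigned long count;
};

static struct pcpu_counter lock_acq[NCPU];

/* Called on the acquire path; cpu is the index of the calling CPU. */
static void lock_acq_inc(unsigned cpu)
{
    lock_acq[cpu].count++;
}

/* Called rarely, when the statistic is actually read. */
static unsigned long lock_acq_total(void)
{
    unsigned long sum = 0;

    for (size_t i = 0; i < NCPU; i++)
        sum += lock_acq[i].count;
    return sum;
}
```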
Left_Turn has joined #osdev
CryptoDavid has joined #osdev
<heat> you like shit talking solaris so much but they literally gave you your favourite tool in the world, dtrace
slidercrank has joined #osdev
<heat> if freebsd was in charge of profiling you would still be doing lock->lock_acq++;
<mjg> to solaris credit, not only they have a memset which does not just roll with rep
<mjg> but according to at least one comment they checked real world sizes which get passed in
<mjg> that's way above average right there
Turn_Left has quit [Ping timeout: 248 seconds]
<heat> lol
<heat> actually, do you have any sort of real world kernel allocation profiles?
<mjg> you mean what sizes land in kmalloc et al?
<heat> yep
<mjg> i did, nothing i can refer to right now
<heat> i'm curious to see how those look. probably varies wildly based on kernel?
<mjg> however, freebsd is kind of nice here
<mjg> you can vmstat -mz any long running box and get the stats
<mjg> i can ask netflix to give some
<heat> oooooooooooh please do
<mjg> have you seen what vmstat -mz looks like?
<heat> no
<mjg> Type InUse MemUse Requests Size(s)
<mjg> tmpfs name 11 1K 2844 16,32,64,128
<mjg> tmpfs dir 11 1K 2805 64
<mjg> GEOM 765 124K 3842 16,32,64,128,256,512,1024,2048,8192,16384
<mjg> 2 Bucket: 32, 0, 2125, 75743, 432494, 0, 0,79248
<mjg> 4 Bucket: 48, 0, 4946, 51670, 188727, 0, 0,59944
<mjg> 8 Bucket: 80, 0, 6513, 22837, 817309, 107, 0,58590
<mjg> 16 Bucket: 144, 0, 2561, 12503, 115959, 0, 0,29264
<mjg> etc.
<heat> my slab accounting is totally broken because I don't have pcpu counters yet
<heat> oh yeah what freebsd userspace struct is 224-sized?
<mjg> struct-fucking-stat
<heat> aha
<heat> I stared into your bufsizes for a bit and I could tell that the results did change from memcpy -> copyin -> copyout
<heat> it's funny
<mjg> so one interesting bit to ponder is what kind of malloc buckets make sense
<mjg> solaris rolls with several multiples of 8
<heat> what workload did you run for bufsizes.txt?
<mjg> huge granularity
theboringkid has quit [Quit: Bye]
<mjg> at this point i don't remember mate :p
<mjg> probably building some shit
<mjg> i got some others
<mjg> including fresh ones from prod
<mjg> netflix guy says he is afk but can post stuff in 2h
<heat> is this warner losh
<mjg> no
<mrvn> mjg: I think the kernel basically always knows what size objects it needs and data should be kept local. So subsystems should make SLABs for specific sizes they need and you can optimize that far better.
<heat> half of freebsd is developed by him
<heat> ah ok
<heat> that guy and christos zoulas are both fucking omnipresent
<mrvn> mjg: One thing that doesn't fall into that case though is the name lookup cache since file names are pretty variable.
<mjg> i can give you some bufsize.txt from netflix as well if the guy agrees
<mrvn> But even if you go with the malloc buckets then you know that all memory will be 8/16 byte aligned and have a size that's a multiple of 8/16. You can optimize that nicely too.
<heat> that would be most interesting, thanks
<mjg> linux has 8, 16, 32, 64, 96, 128, 192, 256, 512
<mjg> and then * 2 from there up to 8k
<mjg> i added 768 on freebsd to appease zfs which is doing tons of funny allocs
<mjg> total ram usage went down from it
<mjg> oops it was 384
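The bucket list mjg quotes for Linux can be turned into a small lookup to see where a given request lands. A minimal sketch in C, assuming exactly the sizes named above (doubling from 512 up to 8192); size_class is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>

/* Size classes as listed in the chat: 8..512, then doubling up to 8k. */
static const size_t size_classes[] = {
    8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192
};

/* Returns the smallest class that fits n bytes, or 0 if n is too large
 * for any bucket. */
static size_t size_class(size_t n)
{
    size_t count = sizeof size_classes / sizeof size_classes[0];

    for (size_t i = 0; i < count; i++) {
        if (n <= size_classes[i])
            return size_classes[i];
    }
    return 0;
}
```

With this table a 224-byte object lands in the 256 bucket, while anything from 257 to 512 bytes is rounded up to 512, which is the gap an extra 384 bucket would cover.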
Turn_Left has joined #osdev
<heat> my only restriction right now is that sizes need to be 16 byte aligned
<mrvn> heat: you have a prev and next pointer in every free object?
Left_Turn has quit [Ping timeout: 252 seconds]
xenos1984 has quit [Read error: Connection reset by peer]
<heat> mjg, 2 qs, 1) what do you think of the linux buddy page allocator 2) how does dtrace, etc effectively get data if you can't memory allocate? just pick a buf size?
<heat> s/memory allocate/allocate memory/g <-- yoda speak moment
<mjg> i don't know linux buddy page allocator
<mjg> dtrace has a bunch of safety measures to abort
<mjg> it preallocs bunch of shit and if it does not fit, you get told there are drops
<mjg> similarly, if it uses too much cpu, it decides something is way off and aborts tracing
<mjg> "systemic unresponsiveness" or so they call it
<mjg> see the -b parameter
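The prealloc-and-drop idea can be sketched in a few lines of C: the buffer is reserved up front, and a record that does not fit is counted as a drop instead of triggering an allocation. This is only an illustration of the general technique, not how DTrace is implemented; struct trace_buf, trace_record and the 64-byte size are made up.

```c
#include <stddef.h>
#include <string.h>

/* Fixed buffer reserved before tracing starts; the size is arbitrary. */
struct trace_buf {
    unsigned char data[64];
    size_t used;         /* bytes consumed so far */
    unsigned long drops; /* records that did not fit */
};

/* Appends a record if it fits. Otherwise counts a drop and returns -1;
 * no memory is ever allocated on this path. */
static int trace_record(struct trace_buf *tb, const void *rec, size_t len)
{
    if (len > sizeof tb->data - tb->used) {
        tb->drops++;
        return -1;
    }
    memcpy(tb->data + tb->used, rec, len);
    tb->used += len;
    return 0;
}
```

The consumer can then report the drop count to the user, which matches the "you get told there are drops" behaviour described above.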
<heat> man i need bpftrace
<heat> this is GREAT
xenos1984 has joined #osdev
craigo has quit [Ping timeout: 260 seconds]
tepperson has quit [Quit: Client closed]
Left_Turn has joined #osdev
Turn_Left has quit [Ping timeout: 246 seconds]
Turn_Left has joined #osdev
Turn_Left has quit [Read error: Connection reset by peer]
Left_Turn has quit [Ping timeout: 265 seconds]
dude12312414 has joined #osdev
dude12312414 has quit [Remote host closed the connection]
jjuran has quit [Quit: Killing Colloquy first, before it kills me…]
jjuran has joined #osdev
<bslsk05> ​arstechnica.com: Stealthy UEFI malware bypassing Secure Boot enabled by unpatchable Windows flaw | Ars Technica
alexander has quit [Remote host closed the connection]
nyah has quit [Quit: leaving]
<gog> i got the boots
<gog> and i got my gun from the pigs
* sakasama blinks at gog.