klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
<geist> ETOOHARD
<klange> Don't give me ideas...
<klange> So many syscalls I could "implement" by returning such an error code ;)
<gog> i just do that for all coding tasks rn
<gog> i refuse to grow as a person and learn new things
<zid> same tbh
<heat> why is the grub decompressor so slow?
<heat> it's soo odd
<klange> Given your previous complaint, are you in TCG? I've found it's particularly bad at decompression algorithms.
<klange> I think they're unfriendly to the JIT by nature.
<heat> no, KVM
<klange> Ah, then, probably just because it's designed for size and is shit. Which format?
<heat> xz
<heat> 9MB -> 90MB approx
<heat> using the linux cmd line util is much faster
<klange> Probably a bunch of compounding factors with grub's io interfaces, a different xz implementation designed for size... and if this is bios grub that runs in protected mode, so maybe 32-bit instructions vs. 64-bit?
<geist> Maybe the cache is disabled?
<geist> Is this on x86?
<heat> yes
<heat> although I'm no longer sure on the "much" part
<heat> maybe that was just placebo
<geist> Yeah, could also be various levels of compression too, though in general xz is fast to decompress
<mrvn> can you even disable caches on x86?
<Mutabah> Yes, MTRRs
<Mutabah> but pretty rare
<moon-child> can I manually do hot/cold ordering?
<moon-child> just put stuff in .text.cold or something? (What is the section name?)
<heat> Mutabah, also CR0.CD
<heat> moon-child, attribute((cold))?
<heat> also x86 has four caching modes controlled by CR0.CD and CR0.NW
<heat> those bits + MTRR are what allow cache-as-ram
<moon-child> heat: in assembly
<heat> .section .text.cold?
<heat> sorry, .section .text.cold,"ax"
<heat> I've screwed myself over quite a few times over not specifying that stuff
<dminuoso> mjg: Curious, criticizing a company should be done *exactly* when you are there; not a good idea afterwards. *shrugs*
<heat> anyone familiar with decompression?
<heat> I'd like to know how much slower is streamed decompression vs decompressing everything at once
<heat> (generally)
<heat> I implemented zstd initrd decompression in the kernel itself yesterday (instead of relying on GRUB, which is slow and also doesn't support zstd)
<clever> heat: decompression that can be done with threads seems like a good option for gaining massive speed
<heat> the problem is that I inevitably decompress everything at once (and do it on a try by try basis)
<clever> some compression formats are based on blocks, and each block has a header detailing the compressed and uncompressed size
<moon-child> I don't think threading is incompatible with streaming; you just shard
<heat> zstd doesn't have that I think
<clever> if you want to be smarter about decompression, you need a compression format that supports seeking
<clever> while most gzip libraries (for example) don't directly support seeking, you can still add it in
<heat> moon-child, the important bit here is that I have the whole source buffer, but I don't have the whole dst buffer
* moon-child nods
<clever> if you scan the headers for each block, you can convert a byte offset into the start of a block, and an offset within that block
<clever> then you can skip to decompressing just that block
<clever> you can also do what things like zfs do, where each FS extent is a completely self-contained compressed object
<clever> so you just read the FS metadata, and decompress the right block
<heat> homie this is a compressed tarball
<heat> i'm not going around that
<clever> but in the zfs case, things are complicated: you have 128kb extents, being compressed separately, and then turned into a series of 4k blocks
<heat> at least not until I figure out squashfs
<mrvn> didn't we figure out that the SD card is slower than a single core can decompress?
<clever> ah, for a simple .tar.gz, i would recommend 2 things
<heat> ideally I would craft a squashfs image and be done with it - the problem is that the documentation isn't great, as usual
<clever> 1: do a streaming decompress of the entire file, parse the .tar headers as you go, make note of the byte-offset&filename of every file in the tar
<heat> most linux distros use squashfs images for their livecd
<clever> 2: as you're doing that, also make note of the uncompressed->compressed byte offsets for each block of gzip data
<clever> then you can resume decompression in the middle of the .tar.gz at any time, and skip ahead to the right tar entry
<heat> this is not gz, but zstd
<heat> i'm not sure I can do that
<heat> seems risky at least
<mrvn> clever: you can only start in the middle when you know the dictionary at that point
<clever> check if zstd is block based, and if you can uncompress just one block
<heat> I'm probably looking into just doing it all in one pass
<j`ey> heat: is this for the initrd or?
<heat> j`ey, ack
<mrvn> You can record all the places in a gz file where the dictionary is flushed.
<clever> mrvn: and where does gzip store the dictionary? how does `cat foo.gz bar.gz > baz.gz` work, is the dictionary a record within the stream and bar.gz updates it?
<heat> j`ey, the problem is that the problem here is when I want to generate a livecd environment
<heat> whoa, looped a bit there
<mrvn> clever: gzip builds the dictionary from your input
<clever> ah, that answers half of that
<heat> I can get huge livecd env, 200MB for instance
<clever> so you need to record the position of each dictionary, the position of each block, and how much the block expands to
<heat> by decompressing everything at once, I take up around 250MB
<clever> then you should be able to seek within gzip
<mrvn> clever: as said the dictionary is build from your input. You have to start at a place where the dictionary is flushed.
<clever> mrvn: ah, so treat the offset immediately after a flush, as a new block?
<mrvn> yep.
<clever> and then that whole chunk, up to the next flush, is one unit
<clever> and you can skip to the start of any of those units
<clever> and then just record the byte offset to the start of each, and how much it unpacks to
<mrvn> gzip also has a rsyncable option where it flushes the dictionary a few extra times when the adler32 checksum of the input is 0.
<clever> so you can convert an output offset, to an input offset pair
<mrvn> Makes it flush at the same places in a file even if the start of the file changes.
<clever> ah, neat
<clever> i assume that makes the binary diff smaller?
<mrvn> means the compressed files has identical parts.
<clever> yeah
<mrvn> more of them anyway.
<clever> zfs gets similar (assuming no insertions, only overwrites), by splitting the file into 128kb chunks first, then compressing each chunk separately
<clever> but zfs is also designed to allow seeking within a compressed object
<bslsk05> ​dr-emann.github.io: Squashfs Binary Format
<clever> heat: and now you're making me want to add squashfs to little-kernel, lol
<clever> at a glance, that looks well documented
<clever> bbl
<heat> you didn't even add ext4 yet!
<heat> also that was reverse engineered apparently
<heat> so... unclear if it's great
<clever> heat: as for why that idea popped into my head, i was recently mentioning on the rpi forums how my baremetal code can do things like a slide show you might find on an advertising sign, at very short boot times and with perfect vsync'd swaps
<clever> and the current limitations are no hdmi (solved by running on the arm side, with the official firmware)
<clever> and using an initrd for the image file payload seems like a way to simplify config
<bslsk05> ​pierrekim.github.io: 2-byte DoS in freebsd-telnetd / netbsd-telnetd / netkit-telnetd / inetutils-telnetd / telnetd in Kerberos Version 5 Applications - Binary Golf Grand Prix 3 - IT Security Research by Pierre
<gog> dang
<heat> oh no
<heat> not telnet!
<mats1> the wonders of 30y/o open sores
<gog> love me some programmatic ulcers
<heat> so everyone copied their telnet implementations from each other
<heat> and now they all have an exploit
<heat> this sounds so BIOS its not even funny
<mjg> man
<heat> man
<mjg> i was at a workplace which refused to retire telnet in 2011
<heat> swear to fucking god
<mjg> despite me pointing out the daemon is unused and it is waiting for someone to fuzz it for lulz
<heat> you need to start to name drop
<mjg> this one was leading polish webhosting provider, nazwa.pl
<heat> your employers are all getting blacklisted
<mjg> anyho
<mjg> it was all nice and dandy until someone did precisely that -- a root priv 0day dropped around that time for telnetd
<mjg> they turned something which should have been a mere curiosity into an actual threat
<mjg> make no mistake though, webhosting companies are a shithole
<clever> the only time i ever really had an interest in telnet was when i was trying to run commands remotely from another script on windows, but the non-deterministic binary junk in telnet got in the way at the time
<clever> i just made a custom tcp protocol instead
<mjg> for them security is an old african word for irrelevant
<clever> which is a better choice, since then it can only trigger the actions i approve of, and can't just run anything
<zid> I likes telnet
<mjg> heat: btw that place also used sendmail :-P
<zid> It's up at 0xFF so it's utf-8 clean, has useful but not stupidly complex commands
<mjg> i don't know what you heard about that unit of a mta
<heat> custom tcp protocol is almost as scary as custom udp protocol that implements tcp
<heat> mjg, they're very old-unix, i like it
<heat> mckusick would be proud
<mjg> would not, that was linux
<mjg> debian
<heat> debian is indeed old
<mjg> that was debian woody, which was already obsolete when i joined
<mjg> 8)
<heat> debian is obsolete from its release onwards
<mjg> wait, there is no openbsd on that list
<mjg> did they whack telnetd?
<mjg> ... yes they did
<mjg> but the client is still there
<clever> heat: i've made countless custom line-based tcp protocols, and one binary one (that was wrapped in tls)
<mrvn> "... and because *physics* ..."
<geist> TIL about -ffinite-loops
<geist> useful: nerfs the ability of gcc and clang to mark infinite loops as UB
<mrvn> somehow when I write "while(true) { }" it doesn't eliminate that
<geist> seems to only be on clang
<geist> actually -fno-finite-loops. which seems inverted polarity
<mrvn> it just seems to detect obviously left empty loops and leaves them
<geist> yep. trouble is recentlyish clang has started to treat infinite loops as UB, and usually just elides them entirely, and silently
<geist> this has caused problems in zircon in more than one occasion
<geist> usually specialized 'die here' sort of loops in the kernel or early bringup code
<geist> but the fact that it just chooses to elide them is particularly heinous
<mrvn> it's UB in C++ as programs must make progress.
<geist> right. so this switch nerfs that
<MrBonkers> LittleFox I know I'm late, but considering you mentioned mlibc the other day, any reason why you didn't pick that? We're more than happy to help fix issues that you ran into
<LittleFox> didn't know it when I started looking for a libc to use and now just stuck with newlib as it's mostly good enough for now
<MrBonkers> well if you want to switch, feel free to give us a shout if anything comes up (#managarm-mlibc on libera or the Managarm discord server)
<dzwdz> infinite loops being UB is weird
<dzwdz> i'm probably a bit ignorant, but it seems like the optimizations it enables are pretty niche
<geist> yeah also AFAICT gcc doesn't do anything with this, so unclear precisely what it's useful for
<mrvn> It neatly solves the halting problem. Every correct c++ program halts.
<geist> and further moves C++ away from being acceptable for bare metal or other non-user-space-centric uses
<\Test_User> what about when it uses external input for whether or not to halt; it may never halt if the input says not to
<\Test_User> s/whether or not/when/
<geist> it's strange too, since it seems to not admit that signals exist? AFAIK that's part of the language spec
<geist> that something like signals can come along
<mrvn> yeah, I never got that part.
<mrvn> alarm(1); while(true) { } is perfectly fine code
<geist> exactly, was going to say that
<dzwdz> geist: isn't it a thing in C too?
<dzwdz> N1528 seems to suggest that
<geist> indeed. C++ even acknowledges that stuff like signals exist because of things like thread_signal_fence and whatnot
<dzwdz> \Test_User: then that doesn't fit the definition of an infinite loop because it does io
<\Test_User> but it's still potentially infinite, so halting problem not solved
<dzwdz> eh, is it
<dzwdz> the user will die eventually
<mrvn> \Test_User: I don't think the halting problems includes infinite input
<\Test_User> so will the CPU, so therefore while (true) {} is also non-infinite?
<dzwdz> indeed
<\Test_User> mrvn: ah
<bslsk05> ​xkcd - Halting Problem
<GeDaMo> while (!heat_death()) {}
<mrvn> don't quantum effects prevent the heat death?
<heat> no
<heat> i am live
<dzwdz> i think so
<LittleFox> MrBonkers, thanks :)
<LittleFox> does it work with libc++, anything known?
<MrBonkers> unsure if that is tested, but it should. If it doesn't, we consider that a pretty major bug ngl
<mrvn> heat death is when everything is in thermodynamic equilibrium. So I could measure the temperature, and therefore the speed, of a gas, and then determine the position of an atom, but it's impossible to know both accurately.
<MrBonkers> works for sure with libstdc++ tho
<dzwdz> i don't think the laws are meant to apply at that point
<mrvn> dzwdz: because any observer is dead at that point?
<dzwdz> yeah, there's no police
<dzwdz> it's free for all
<dzwdz> i think quantum fluctuations are the thing that prevents the heat death
<dzwdz> but when a friend explained them to me it kinda went over my head, so idk
<heat> i die will never
<mrvn> " Sean M. Carroll, originally an advocate of this idea, no longer supports it,[24][25] arguing that the virtual particles produced by quantum fluctuation cannot become real particles without an external input of energy. "
<dzwdz> heat: not quite, you'll die but then appear again
<heat> like obi wan
<heat> ?
<mrvn> dzwdz: those fluctuations could create a new universe in 10^10^10^56 years but " Such a scenario, however, has been described as "highly speculative, probably wrong, [and] completely untestable". but but but I wanna test that, let me test that
<dzwdz> like, infinitely many times
<dzwdz> i'm purposefully not reading any more about it
<dzwdz> i'm probably heavily misunderstanding it and i don't want to ruin it
<bslsk05> ​en.wikipedia.org: Boltzmann brain - Wikipedia
<mrvn> GeDaMo: there is a certain probability that all the atoms in your underwear will move 1m to the right at the same time.
* mrvn turns on the probability engine.
<GeDaMo> I don't get invited to those sort of parties :|
<heat> operating system
<moon-child> party(6)
<heat> so how about those kernels eh
* moon-child grabs popcorn
<dzwdz> i got my garbage tcp stack to the point where it just barely works
<heat> thats the spirit
<dzwdz> doing a http file transfer at .5mbps is enough to pin the cpu
<heat> fucken what
<dzwdz> who needs more
<dzwdz> it was just meant to be a prototype to see how networking would fit into the overall system, i purposefully didn't care about optimizing it at all
<heat> what are you doing and why is it O(n!)
<moon-child> n‽ That's extreme. Usually you only get accidentally quadratic
<moon-child> or even just crap constant factors
<dzwdz> i'm making a lot of unneeded memory allocations
<dzwdz> i think there's one per each incoming tcp/udp packet
<moon-child> bet your memory allocator also sucks
<dzwdz> dlmalloc
<dzwdz> so, yeah
<moon-child> oh, eh, not amazing by modern standards, but dlmalloc is probably ok, esp. on one thread
<dzwdz> i'm actually running two threads
<moon-child> I was thinking tinyalloc or something
<dzwdz> and i didn't enable the thread safety thing at all
<dzwdz> lol
<moon-child> well
<heat> one per incoming packet is super standard
<moon-child> get thee some mimalloc!
<dzwdz> probably the bigger issue is that my ipc isn't that great
<dzwdz> is it bad if i manually parse the paging structures when i copy memory between different address spaces?
<heat> define manually?
<moon-child> as in you don't have your own model in-kernel of the virtual address mappings?
<dzwdz> i'd assume that the best way to do larger transfers is to map the pages i'm copying from/to in memory
<dzwdz> so i could then do a single contiguous transfer
<moon-child> ideally you would have shared memory buffers
<moon-child> and let the applications negotiate what 'transfer' means
<dzwdz> i'm also thinking about that
<heat> yeah sure that sounds fine
<moon-child> (maybe I shouldn't say 'ideally'. That is one common strategy, and if you are unsure of what to do, following orthodoxy may be a good idea)
<dh`> remapping pages as part of IPCs is expensive
<dzwdz> i have filesystems in userspace, and i was thinking about making read()/write() work on pages instead of arbitrary buffers
<dzwdz> dh`: ok then
<dh`> copying data is also expensive
<heat> well yes, depends on the size
<dh`> there's no one right answer (other than, perhaps, don't build things as microkernels)
<dzwdz> literally the only reason i'm doing osdev in the first place is that i wanted to experiment with an idea i've had for a microkernel
<dzwdz> so, no
<dh`> sure
<dh`> it just puts you on the back foot for performance
<heat> if you wanted to be on the back for performance you could've just used openbsd instead
<zid> heat forgetting solaris exists smh
<heat> cc mjg
<mjg> just copy paste my last rant
<mjg> 's what you get when you have lock-protected reference counts on vnodes
<mjg> and there are parallel lookups in the same dir
<ThinkT510> dzwdz: you are making a microkernel? do you have a repo one can look at or are you just in planning stages?
<geist> mjg: i sat down to fiddle with getting a memcpy & memset benchmark going last night. surprisingly my zen 3 has both ERMS and FRMS bits in cpuid set
<geist> i didn't know about the latter
<mjg> geist: oh?
<dzwdz> well, i do, but the readme is severely outdated
<dzwdz> lemme update it, i'll send a link
<geist> zen 2 has neither, of course
<geist> and somewhat predictably has fairly bad spin up on rep stosb as you say
<mjg> geist: are you sure about frms? does cpuid confirm it?
<mjg> i mean the cpuid tool
<mjg> it recognizes the bit
<geist> yep
<mjg> nice
<mjg> so what are the results :)
<geist> oh i dont have them handy right now, just was going to point out
<mjg> i almost got an frms capable box yesterday
<mjg> at work
<geist> still working on the tool, etc. was mostly getting it in place so i can then do side by side comparisons of memcpy and memset with various alignments/sizes, etc
<mjg> but some usb bullshit prevented it
<papaya> perhaps a dumb question, but what is the FRMS bit?
<mjg> "fast short rep mov"
<dzwdz> so do you know how complex some memcpy() functions can get? if you have FRMS you can replace your memcpy with like 4 asm instructions
<geist> yah it's the bit that says 'we *really* mean the ERMS bit now'
<heat> it's a big fat lie
<heat> dzwdz, 1) very 2) no
<dzwdz> supposedly, i don't have the hardware to test it on
<mjg> the supposedly bit is what makes me worried :)
<dzwdz> i mean
<dzwdz> you CAN, can't you?
<moon-child> you could do that anyway
<mjg> also, even if it spins up fast enough for short bufs, the question is how it handles misalignment
<dzwdz> isn't that what linux does
<moon-child> it's part of the base isa
<papaya> @mjg thank you
<bslsk05> ​github.com: Onyx/memmove.S at master · heatd/Onyx · GitHub
<mjg> dzwdz: linux has being doing plain erms for some time now for several funcs
<heat> and this is pretty simple compared to glibc
<mjg> dzwdz: *before* frms became a thing
<mjg> dzwdz: as in, believe it or not, they don't have very optimized routines
<dzwdz> heat: come on, what isn't complex in glibc
<heat> they = kernel here
<mjg> is that bionic?
<heat> dzwdz, good point
<heat> yes
<mjg> it is going to lose to erms past *some* size
<mjg> i have not benchmarked yet
<heat> believe it or not, bionic's sse memcpy doesn't really win against musl's crapshoot
<heat> it turns out my CPU probably does ERMS on rep movsq
<mjg> for what sizes adn what alignment
<heat> a whole bunch of em
<heat> we've gone through this before
<geist> also yeah there's two sets of solutions too: ones where vectors are available and ones where they aren't
<geist> the former is what i'm interested in right now
<dh`> it's always seemed to me that it would be better to write memcpy in C and teach the compiler to emit all the magic
<heat> geist, I think AVX is the super big win
<zid> That's what gc already does
<zid> gcc*
<moon-child> people have tried
<moon-child> compiler ain't good enough
<geist> heat: indeed, but again without them available it doesn't matter if it's a win
<zid> if you write out a memcpy it spits out a kilobyte of avx :p
<bslsk05> ​elixir.bootlin.com: usercopy_64.c - arch/x86/lib/usercopy_64.c - Linux source code (v5.19.3) - Bootlin
<geist> anyway i dont mean to once again drag this channel into another memcpy thing, but mjg did get me interested again in how bad zircon has pessimized things (in the kernel)
<mjg> they had a good idea: when you have reads from /dev/zero or similar, instead of copying zeroes into the target buffer, you can just zero it out
<dh`> then the compiler should be made better
<mjg> ... except the routine is so botched....
<mrvn> Is all that unrolling even saving any time? Isn't the loop instruction eliminated by the hardware through pipelining and branch prediction?
<geist> i had an idea the other day that on x86 if you take a page fault for writes, then check what instruction did it and it's a rep stosb, you could potentially emulate it in the VM if it crosses a few pages
<dzwdz> mjg: how is copying zeros different than zeroing a buffer out
<dh`> that's a neat idea
<mjg> dzwdz: for starters you don't waste cache to store the source buffer
<moon-child> mrvn: yes, unrolling is helpful
<mrvn> geist: you mean don't fault a page in from disk if it's going to overwrite it?
<geist> that would be one case yeah
<geist> or if it's just some code memsetting something large
<dzwdz> wait, so the normal way to do it is to have /dev/zero be backed by a buffer full of zeroes?
<mjg> yea
<heat> yea?
<heat> not really
<dh`> no, that's not the normal way, but it can happen that way
<mjg> so how do you do it
<dh`> depends on how you have your devices set up
<mrvn> moon-child: how so?
<heat> user_memset
<mjg> freebsd has a 4KB page mapped over a 2MB area
<mjg> user_memset is the same idea i described above
<heat> yes, because it makes sense
<mjg> i completely agree
<mjg> just saying it is not the norm
<heat> if reading always returns 0's, just memset it
<heat> that's LSD code from 4.4
<mjg> yes
<moon-child> mrvn: there is still loop overhead
<mjg> bb in 15
<moon-child> mrvn: also consider the loop counter/index; on every iteration, the value of the counter depends on its value the previous iteration, and the body of the loop will likely depend on that index
<dh`> in a BSD kernel because of the uio abstraction the path of least resistance is to call uiomove() with a buffer of zeros
<mrvn> moon-child: and those instructions are not executed in parallel with waiting for the memory to load/store?
<dh`> and I think that's how it was in 4.4
<moon-child> so that limits your parallelism
<moon-child> they are executed in parallel. But for a sequential thing, you're probably not waiting very long
<moon-child> load/store are rather fast when hot
<mrvn> moon-child: one thing I'm wondering is branch prediction. Say you have a loop with one branch. Now it gets unrolled 4 times. Doesn't that mean the loop now uses 4 branch predictors (let's assume no collision) and learns 4 times slower?
<moon-child> yes
<mrvn> I noticed clang is generally more conservative with unrolling. gcc code is often a lot longer.
<moon-child> really? I observe the opposite--gcc is usually more conservative
<heat> i've heard clang generates larger code at least
<heat> that's why they can't use it in firmware
<ThinkT510> dzwdz: found it. what made you settle on the name camellia?
<dzwdz> just picked a random flower name
<ThinkT510> neato
<dzwdz> i think the readme is from before i even wrote any code
<dzwdz> and the docs are super outdated too
<ThinkT510> any inspirations from other microkernel designs?
<mrvn> worst example I had was the destructor for a binary tree. If there is a left child, call the destructor for the left child; same for the right child. clang just makes recursive calls, while gcc inlined 4 levels of recursion before making a recursive call.
<mjg> dh`: so netbsd is not doing it?
<heat> my kernel is inspired on unix
<heat> and svr4, the best kernel ever
<mjg> unix sucks man
<heat> and linux, kinda shit
<mjg> check out "dnlc"
<heat> u succ fuk u bsd man go cry 2 berkly boohoo u dont like att i fuk u up
<mrvn> heat: 2 things were invented at Berkeley: LSD and BSD. I think there is a connection.
<kof123> where is was/att? is this like east const west const all over again
<kof123> *is/was
<heat> lerkeley software distribution
<heat> just saying
<mjg> sounds like someone is on the stuff
<heat> yes
<mrvn> mjg: not enough
<heat> i have a big will to bring in mach's VM
<heat> also share my fucking code all over
<mjg> i wanted to say dnlc does not have backpointers from vnodes to nc entries
<mjg> which has a funny result where sometimes you need to SCAN THE ENTIRE CACHE to find them
<dzwdz> ThinkT510: not really, i looked around but didn't find anything i particularly liked
<mjg> and yes, solaris is doing it
<heat> d o o r s
<heat> slightly worse than STREAMS but I'll take it
<dzwdz> i wanted a mechanism for privilege separation which i'd find easy to grok and use
<ThinkT510> dzwdz: have you looked at managarm? they made a fosdem video in 2022 which was interesting
joe9 has quit [Quit: leaving]
<dzwdz> i honestly don't remember
<mjg> heat: you are laughing at a TRUE UNIX
<dzwdz> oh actually i didn't mention, this is influenced by plan 9 quite a bit
<heat> has marvell open sourced svr4 yet?
<heat> i wanna take a look
<dzwdz> at first i was thinking about doing this on top of its codebase
<mjg> i love their asm
<mjg> also love how it made its way to golang
<heat> link???????????
<heat> pls
<mjg> which part
<heat> asm
<dzwdz> plan9 is full of great ideas
<dzwdz> each with this ONE weird quirk which'll make you hate it
<dzwdz> still, it grew on me
<mjg> just search for golang on github or whatever, for example memmove_amd64.s
<mjg> TEXT runtime·memmove<ABIInternal>(SB), NOSPLIT, $0-24
<dh`> mjg: dunno (re zeros)
<heat> is it fast?
<mjg> oh heh
<mjg> geist:
<mjg> // REP instructions have a high startup cost, so we handle small sizes
<mjg> // with some straightline code. The REP MOVSQ instruction is really fast
<mjg> // for large sizes. The cutover is approximately 2K.
<mjg> tail:
<dh`> but if you're doing something performance-critical with /dev/zero you're already in the wrong place :-)
<geist> linux and splice() and /dev/zero is pretty slick though
<mjg> geist: from golang's src/runtime/memmove_amd64.s , perhaps you can bring these people to a fuchsia meeting
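The dispatch the quoted Go comment describes (straight-line code for small sizes, REP MOVSQ above roughly 2K) can be sketched in C. `copy_sketch` is a hypothetical name, the inline asm is x86-64 only with a portable fallback, and 2048 is taken from the comment, not tuned:

```c
#include <stddef.h>
#include <string.h>

/* small sizes: a plain byte loop, dodging REP's high startup cost;
 * large sizes: REP MOVSQ, which is fast once it gets going */
static void copy_sketch(void *dst, const void *src, size_t n) {
    if (n < 2048) {
        unsigned char *d = dst;
        const unsigned char *s = src;
        while (n--) *d++ = *s++;
        return;
    }
#if defined(__x86_64__)
    size_t q = n / 8, r = n % 8;
    /* REP MOVSQ copies q quadwords from RSI to RDI, advancing both */
    __asm__ volatile("rep movsq"
                     : "+D"(dst), "+S"(src), "+c"(q)
                     :
                     : "memory");
    {
        unsigned char *d = dst;          /* pointers were advanced by the asm */
        const unsigned char *s = src;
        while (r--) *d++ = *s++;         /* tail bytes */
    }
#else
    memcpy(dst, src, n);                 /* fallback off x86-64 */
#endif
}
```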
<geist> have had pretty good luck with that
<heat> you mean vmsplice?
<heat> geist
<geist> mjg: yeah part of the problem honestly is there *is* a lot of internal research at google with regards to memcpy/memmove
<geist> but it tends to be *highly* server workload oriented
<geist> so it's a bit hard to get folks out of that mindset
<geist> ie, it's not a blank slate to have a good discussion, because there's already a lot of heavy ammo on the table
<heat> where's the avx512 memcpy
<geist> precisely
<moon-child> avx512 is like easy-mode for memcpy
<moon-child> load masked, store masked, done
<mjg> it's what erms would like to be
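moon-child's "load masked, store masked, done" is exactly what AVX-512BW's byte masks buy for the tail of a memcpy: any length under 64 bytes becomes one masked load plus one masked store, no branchy tail loop. A sketch (hypothetical `copy_tail` name, with a plain fallback when AVX-512BW isn't compiled in):

```c
#include <string.h>

#if defined(__AVX512BW__)
#include <immintrin.h>
/* copy n <= 64 bytes in one masked 64-byte load/store pair */
static void copy_tail(void *dst, const void *src, size_t n) {
    __mmask64 m = (n == 64) ? ~(__mmask64)0
                            : (((__mmask64)1 << n) - 1);  /* low n bits set */
    _mm512_mask_storeu_epi8(dst, m, _mm512_maskz_loadu_epi8(m, src));
}
#else
/* fallback when not built with AVX-512BW support */
static void copy_tail(void *dst, const void *src, size_t n) {
    memcpy(dst, src, n);
}
#endif
```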
<geist> heat: anyway i was just thinking about splice() syscall and /dev/zero
<geist> kinda fun to move a lot of data around with that
<mjg> does it work though?
<geist> pv /dev/zero is particularly nice for just generating a crapton of data on a pipe since it uses splice()
<mjg> i distinctly remember someone from G implementing reads from sparse files from tmpfs with clear_user
<mjg> .. only to find it causes a slowdown cause the routine is atrocious
* geist nods
<mjg> if you end up using it... :)
<geist> key though is as usual: what version, what sort of hardware are you on, etc etc
<geist> sometimes, especially for Big Iron stuff, it seems that copying around may have completely different crossover points than pissant consumer shit
<mjg> any hw, the routine has not been updated in over a decade and has been dodgy from the getgo
* geist nods
<mjg> i traced it a little bit with ebpf
<mjg> it is mostly used to store just a few bytes, then there is a cutoff to over 1KB
<geist> i've noticed, for example, on derpy arm hardware on linux, that splice(/dev/zero) is pretty good
<geist> but then, i have no numbers to back it up, etc etc
<mjg> oh right
<mjg> it's amd64 :)
<geist> see, thats *precisely* what i mean by 'what hardware are you on, etc'
<geist> as in, yes that shit *actually matters*
<heat> vmsplice writes to pipes don't copy
<heat> they just COW pages
<heat> its distinctly cool
<heat> sadly they do regular old memcpy on reads
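heat's point about vmsplice(2) can be shown end to end on Linux: the write side hands the user pages to the pipe without a bulk copy, while reading them back out is still an ordinary copy. `demo_vmsplice` is a hypothetical helper name:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Splice a user buffer into a pipe, then read it back.
 * Returns the byte count on success, -1 on failure. */
static ssize_t demo_vmsplice(char *out, size_t outsz) {
    int p[2];
    if (pipe(p) < 0) return -1;

    static char buf[] = "hello from user pages";
    struct iovec iov = { .iov_base = buf, .iov_len = sizeof buf };

    ssize_t spliced = vmsplice(p[1], &iov, 1, 0); /* pages referenced, not copied */
    ssize_t got = read(p[0], out, outsz);         /* the read side still copies */

    close(p[0]);
    close(p[1]);
    return (spliced == got) ? got : -1;
}
```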
<mjg> geist: well i did paste the amd64 asm for that routine higher up, so...
<geist> that's precisely the kinda thing i tend to hit at google. folks are very big on Big Iron x86 and then you have this conversation and you have to remind them not all problems look like that
<mjg> and we are talking amd64 here with all the rep stuff, so the context... ;)
<geist> sure sure
<mjg> anyhow i found the file, here is part of it https://dpaste.com/5XBVNSLLG
<bslsk05> ​dpaste.com <no title>
<mjg> the number after : is the call count
<mjg> size on the left
<mjg> it jumps from small to really big, and the latter SUCKS
<mjg> which is really weird given how easy it is to drastically improve this
<mjg> not to be confused with making it optimal for all uarchs
jafarlihi has joined #osdev
Gooberpatrol66 has joined #osdev
<jafarlihi> What do you think of single-header libraries in C++? Is it a good idea to write your library as single-header when it is 20kloc+?
<heat> its a bad idea please dont
<griddle> single header libs are just there for people who don't value their time enough to sort out integrating into your build system
<moon-child> 'single-header' is a curiosity
<mjg> sudo bpftrace -e 'kprobe:__clear_user { @[kstack(), arg1] = count(); }' # if you want to take a look at yur craplinux
<griddle> use cpp files and dont require any fancy compiler flags to work
<griddle> test your lib w/ pedantic or whatever idk :)
<kof123> "single header libs are just there " yep and sometimes people do that with C too for same reason
<heat> header-only lib in C sounds funny
<griddle> what is that c lib that implements like png, etc?
<heat> libpng
<griddle> the famous one that has a name close to stl
<griddle> its header only
<moon-child> stb?
<griddle> yeah that one
<griddle> idk, single-header in C is probably fine
<griddle> in C++ you are trading up front one-time cost for constant cost at every compile
<griddle> (the up front cost of getting library TUs setup in your project)
<geist> also in general header only libraries imply that most things are inlined
<griddle> or, you compile multiple times and then have to do symbol resolution in the linker later
<geist> indeed
<griddle> i mean, c++'s big compile-time footgun of templates makes header only libs the default in that language
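The stb-style C pattern the discussion refers to looks like the sketch below, collapsed into one file for illustration (normally the guarded region lives in its own header, `mylib.h` being a hypothetical name): every TU includes the header for declarations, and exactly one TU defines the implementation macro first so the code is compiled into only one object file.

```c
/* In a real project, exactly ONE .c file does this before the include: */
#define MYLIB_IMPLEMENTATION

/* ---- BEGIN mylib.h (hypothetical single-header library) ---- */
#ifndef MYLIB_H
#define MYLIB_H

int mylib_add(int a, int b);               /* declarations: seen by every TU */

#ifdef MYLIB_IMPLEMENTATION
int mylib_add(int a, int b) { return a + b; }  /* definitions: one TU only */
#endif

#endif /* MYLIB_H */
/* ---- END mylib.h ---- */
```

This keeps integration down to "copy one file into the tree" while avoiding the duplicate-symbol problem griddle mentions when the header is included from multiple TUs.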
joe9 has joined #osdev
<dzwdz> ThinkT510: i wrote a slightly better readme, so you can have some idea how it actually works
<ThinkT510> dzwdz: thanks
justache is now known as justDeez
MarchHare has joined #osdev
MarchHare has left #osdev [#osdev]
dude12312414 has joined #osdev
papaya has quit [Quit: Lost terminal]
frkzoid has quit [Ping timeout: 244 seconds]
pretty_dumm_guy has quit [Quit: WeeChat 3.5]
<Griwes> Hey geist, why are things like "can create a new process" job policies instead of handles to objects that carry kernel permissions in zircon? Is it "let's reduce the number of handles people need to pass around", "this is the way we initially figured it", or a deeper reason?
<geist> it's both actually
<geist> the job policy is a defense in depth, a second layer that operates under a different mechanism, meant to be used as an additional way to lock down a process
<geist> the primary mechanism for things like that is needing a handle to a thing that lets you make a thing
<geist> ie, making a new job needs a handle to a job with sufficient rights
<geist> same for processes, threads, etc
<Griwes> ...right, creating a process wasn't a good example, because it does need a handle. I guess creating a VMO is a better one
jafarlihi has quit [Quit: WeeChat 3.6]
<Griwes> So the idea is that if someone guesses a handle, they can't just use it? (your handles are global, or am I misremembering?)
<geist> they're per process
<raggi> It'd make bootstrap harder, and bootstrap is already quite painful
<raggi> Userspace bootstrap that is
bauen1 has joined #osdev
[itchyjunk] has joined #osdev
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
theruran has quit [Quit: Connection closed for inactivity]
heat has quit [Remote host closed the connection]
nyah has quit [Ping timeout: 252 seconds]
FreeFull has quit [Ping timeout: 268 seconds]
epony has joined #osdev
theruran has joined #osdev