klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
elastic_dog has quit [Ping timeout: 268 seconds]
elastic_dog has joined #osdev
nickster has quit [Quit: Ping timeout (120 seconds)]
nickster has joined #osdev
Lumia has joined #osdev
<geist> oh haha it didn't install an SMP kernel. only using cpu 0
<geist> hmm, that's not true, but for some reason the second cpu is not really running... that's odd
<klange> slightly better than using 0 cpus
<geist> oh it's running SMP, it's just... top and systat aren't really showing the second cpu in use, or the scheduler is unhappy with it or whatnot
<zid> TIL kenwood made a 72x
<mrvn> zid: does it do 72x on the inside or just the outside?
gog has quit [Ping timeout: 268 seconds]
<zid> What we need to know is what the ISP is when the CD shatters and it tries to get into orbit
nyah has quit [Quit: leaving]
nickster has quit [*.net *.split]
axis9 has quit [*.net *.split]
knusbaum has quit [*.net *.split]
sprock has quit [*.net *.split]
zaquest has quit [*.net *.split]
genpaku has quit [*.net *.split]
\Test_User has quit [*.net *.split]
Ermine has quit [*.net *.split]
moon-child has quit [*.net *.split]
darkstardevx has quit [*.net *.split]
teroshan9 has quit [*.net *.split]
DanDan has quit [*.net *.split]
k4m1 has quit [*.net *.split]
andreas303 has quit [*.net *.split]
JTL has quit [*.net *.split]
pbx has quit [*.net *.split]
ozarker_ has quit [*.net *.split]
Emil has quit [*.net *.split]
eschaton has quit [*.net *.split]
particleflux has quit [*.net *.split]
merry has quit [*.net *.split]
eau has quit [*.net *.split]
j`ey has quit [*.net *.split]
andreas303 has joined #osdev
moon-child has joined #osdev
knusbaum has joined #osdev
merry has joined #osdev
ozarker has joined #osdev
particleflux has joined #osdev
teroshan9 has joined #osdev
Ermine has joined #osdev
eschaton has joined #osdev
Emil has joined #osdev
j`ey has joined #osdev
JTL has joined #osdev
k4m1 has joined #osdev
axis9 has joined #osdev
DanDan has joined #osdev
sprock has joined #osdev
Lumia has quit [Remote host closed the connection]
Lumia has joined #osdev
IRChatter has quit [Ping timeout: 252 seconds]
hmmmm has quit [Read error: Connection reset by peer]
[spookyjunk] has quit [Ping timeout: 250 seconds]
[spookyjunk] has joined #osdev
srjek_ has quit [Ping timeout: 268 seconds]
axis9 has quit [Quit: joins libera]
smetano has joined #osdev
elastic_dog has quit [Quit: elastic_dog]
elastic_dog has joined #osdev
[spookyjunk] has quit [Remote host closed the connection]
ptrc has quit [Remote host closed the connection]
ptrc has joined #osdev
Lumia has quit [Quit: ,-]
Lumia has joined #osdev
Lumia has quit [Quit: ,-]
theruran has quit [Quit: Connection closed for inactivity]
[itchyjunk] has joined #osdev
zaquest has joined #osdev
GeDaMo has joined #osdev
jimbzy has quit [Ping timeout: 268 seconds]
elastic_dog is now known as Guest6588
Guest6588 has quit [Killed (tungsten.libera.chat (Nickname regained by services))]
elastic_dog has joined #osdev
xenos1984 has quit [Ping timeout: 260 seconds]
xenos1984 has joined #osdev
IRChatter has joined #osdev
ElectronApps has joined #osdev
xenos1984 has quit [Ping timeout: 250 seconds]
ElectronApps has quit [Quit: Leaving]
rorx has quit [Ping timeout: 260 seconds]
xenos1984 has joined #osdev
elastic_dog has quit [Quit: elastic_dog]
elastic_dog has joined #osdev
[itchyjunk] has quit [Read error: Connection reset by peer]
poyking16 has joined #osdev
smetano has quit [Read error: Connection reset by peer]
axis9 has joined #osdev
axis9 has quit [Read error: Connection reset by peer]
<GeDaMo> "1337 42" that's a setup, right? :P
<ddevault> just test values
<geist> oh woot
<Ermine> IPC?
<ddevault> IPC already worked, this is poll(2) equivalent
<Ermine> Ah
kindofwonderful has joined #osdev
<kindofwonderful> "Operating System Development
<kindofwonderful> sounds to my liking
<kindofwonderful> unfortunately i can't even C properly
<kindofwonderful> :)
<kindofwonderful> but that doesn't mean i don't like to rub shoulders with the professionals
<kindofwonderful> |heat| |GeDaMo| |mrvn| |clever| |j`ey|
<kindofwonderful> there are too many professionals
<kindofwonderful> ... i don't remember them all
<klys> wot are you doing up at this hour
<kindofwonderful> i don't sleep
<mrvn> klys: it's past noon
<klys> and why aren't you at church
* kindofwonderful rub shoulders with mrvn
* mrvn is getting ready for breakfast
wootehfoot has joined #osdev
<kindofwonderful> klys: in short ..
<bslsk05> ​'Michael got new cell mate who doesn't sleep at all - Prison Break' by PopularScenes2020 (00:02:26)
<kindofwonderful> or rather
<kindofwonderful> that
<kindofwonderful> klys: the last
<klys> okok
<klys> are you trying to swap everything out
<klys> got a bug in the hibernation
<kindofwonderful> klys: i try to poweroff but i get an input/output error
<klys> user error
<klys> of course it's your project
<klys> just commit
<kindofwonderful> i commit to nothing, i made the marriage mistake once
<klys> and that's why you don't have a github?
<kindofwonderful> yes
<klys> words, man, semantics
<klys> you don't want to give an inch, I suppose
<klys> you think someone will take a mile
<kindofwonderful> why are you blaming me for this ? lol
<kindofwonderful> i just have some peculiarities
<kindofwonderful> like ..
<kindofwonderful> I DONT SLEEP
<kindofwonderful> AT ALL
<klys> it
<kindofwonderful> and if I do with strong medication it's with open eyes ( as i have been told )
<kindofwonderful> it's affecting me socially
<kindofwonderful> the medication has strong side effects, which means i only take it once every 3 days
<kindofwonderful> other than that and the fact i lost 15 pounds of muscle im pretty normal
<kindofwonderful> give or take
<kindofwonderful> now im getting tired ...
<kindofwonderful> klys: cya :)
axis9 has joined #osdev
kindofwonderful has quit [Ping timeout: 264 seconds]
axis9 has quit [Ping timeout: 268 seconds]
axis9_ has joined #osdev
netbsduser has joined #osdev
<netbsduser> decided to go ahead with trying to do some kind of single-level-store smalltalk. lots of interesting problems to tackle to make it work
<axis9_> a fixed
<axis9_> speech off
<axis9_> touch off
<axis9_> ident: optIn
<netbsduser> the on-disk format will be one area of particular interest
<netbsduser> i am expecting to be imitating a log structured filesystem for that
<axis9_> slivation ON
<mrvn> isn't a tree better?
<mrvn> scanning a 1TB log to recover would take a long time
<mrvn> B-tree with COW and multiple roots, one per generation
<netbsduser> mrvn: that could work well, i read a bit on ZFS recently and such a design could work
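mrvn's "COW with multiple roots, one per generation" idea can be illustrated in miniature. The sketch below is an assumption-laden toy, not ZFS's or anyone's on-disk format: a binary search tree in memory instead of a wide B-tree on disk, with hypothetical names (`cow_insert`, `commit`, `roots`). An update copies only the nodes on the root-to-key path and shares every other subtree, and each committed root stays a valid snapshot. Nothing is ever freed here; a real store would need refcounting or a generation-based reclaimer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical immutable tree node; never modified after creation. */
struct node {
    int key;
    int value;
    const struct node *left, *right;
};

static const struct node *mk(int key, int value,
                             const struct node *l, const struct node *r) {
    struct node *n = malloc(sizeof *n);
    if (!n) abort();
    n->key = key;
    n->value = value;
    n->left = l;
    n->right = r;
    return n;
}

/* Copy-on-write insert: copies the path from the root down to `key` and
 * shares all untouched subtrees with the previous version. */
static const struct node *cow_insert(const struct node *t, int key, int value) {
    if (!t) return mk(key, value, NULL, NULL);
    if (key < t->key) return mk(t->key, t->value, cow_insert(t->left, key, value), t->right);
    if (key > t->key) return mk(t->key, t->value, t->left, cow_insert(t->right, key, value));
    return mk(key, value, t->left, t->right); /* overwrite = new copy of this node */
}

static const int *lookup(const struct node *t, int key) {
    while (t) {
        if (key == t->key) return &t->value;
        t = key < t->key ? t->left : t->right;
    }
    return NULL;
}

/* One root per generation (checkpoint); older roots remain readable snapshots. */
#define MAX_GEN 16
static const struct node *roots[MAX_GEN];
static int ngen;

static void commit(const struct node *root) {
    assert(ngen < MAX_GEN);
    roots[ngen++] = root;
}
```

Recovery in this model means finding the newest committed root rather than replaying a log, which is the advantage mrvn points at for a 1TB store.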
<mrvn> the biggest question is how you deal with mutable objects.
gog has joined #osdev
<netbsduser> i am expecting to cow them. for small objects this is actually a potential sore point; i am aiming at 128 bytes or under as the object table entry size (therefore the minimum size of an object), and so if a load of small objects are modified which are spread across different blocks, there is an excess of copying afterwards. large objects i want to cow in parts. but here again comes a problem
<netbsduser> any object which is big and which is subject to random-looking write distributions will quickly become fragmented
<netbsduser> if i went for something like checkpointing automatically every 30s it could be a real nightmare
<netbsduser> but that might just be a problem i have to endure. i could always add some kind of hinting mechanism, so that e.g. objects which are expected to live a long time can be marked as such, and then subject to a different strategy for cow, perhaps copying a bigger extent if possible rather than an individual block
terminalpusher has joined #osdev
wootehfoot has quit [Read error: Connection reset by peer]
wootehfoot has joined #osdev
terminalpusher has quit [Remote host closed the connection]
terminalpusher has joined #osdev
marshmallow has quit [Ping timeout: 252 seconds]
eroux has quit [Ping timeout: 252 seconds]
eroux has joined #osdev
dude12312414 has joined #osdev
xvmt has quit [Ping timeout: 260 seconds]
Lumia has joined #osdev
[itchyjunk] has joined #osdev
<mjg> geist: so how did it go? :)
gareppa has joined #osdev
gareppa has quit [Remote host closed the connection]
<mjg> mxshift: for what sizes
<mjg> mxshift: using nt stores for string ops past certain size is not a new idea
<mjg> mxshift: the question is whether nt stores are a win for the most part when employed in clear_page/pagezero/younameit, which is not at all obvious, despite the typical handwaving
<mjg> mxshift: ... and which does seem *pessimal* on cpus made in last few years
<mjg> mxshift: well the q is if this was any good on the 32 bit suckers with sse, which may or may not be true, but which was never justified one way or the other with actual tests, that i could find anyway
<mjg> mxshift: what i did find is people handwaving and rolling with it
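For anyone who wants to run the comparison mjg is asking for, here is a minimal sketch of the two candidates: zeroing a page with SSE2 non-temporal stores (`movntdq` via the `_mm_stream_si128` intrinsic) versus ordinary cached stores. The function names are hypothetical, and it assumes a 4 KiB page and a GCC/Clang-style `__SSE2__` macro; it says nothing about which one wins, since that is exactly what needs measuring under a real workload (the zeroed page is usually touched again soon, which favors keeping it in cache).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define PAGE_SIZE 4096

/* Zero one page with non-temporal (cache-bypassing) 16-byte stores when SSE2
 * is available; otherwise fall back to memset. */
static void zero_page_nt(void *page) {
    assert(((uintptr_t)page & 15) == 0); /* movntdq requires 16-byte alignment */
#if defined(__SSE2__)
    __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)page;
    for (size_t i = 0; i < PAGE_SIZE / sizeof(__m128i); i++)
        _mm_stream_si128(&p[i], zero);
    _mm_sfence(); /* NT stores are weakly ordered: fence before handing the page out */
#else
    memset(page, 0, PAGE_SIZE);
#endif
}

/* Baseline: ordinary cached stores. */
static void zero_page_cached(void *page) {
    memset(page, 0, PAGE_SIZE);
}
```

A fair benchmark would time zeroing followed by actually using the page, not zeroing alone, since the cost of non-temporal stores shows up as cache misses afterwards.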
heat has joined #osdev
<heat> computer
<mjg> so far my history digging for page zeroing shows: solaris/illumos using nt stores, already came like that with initial import
<mjg> well export :-P
<mjg> freebsd: added nt stores with no explanation in the commit message
<mjg> netbsd: added nt stores with no explanation in the commit message
<mjg> openbsd: added claiming copy-paste from freebsd and a speed up in zeroing, but no methodology was shown or if actual usage (instead of *just* zeroing) was benchmarked
<mxshift> I don't recall what sizes ended up benefiting. Apple didn't care about 32bit x86 machines for the most part. macOS Leopard/SnowLeopard would be where the results of those experiments got introduced.
<mjg> found linux, it is using sse for zeroing, but the stores used don't bypass the cache?
<mjg> " movq %%mm0, (%0)\n"
<mjg> " movq %%mm0, 8(%0)\n"
<mjg> " movq %%mm0, 16(%0)\n"
<mjg> will have to check intel docs
<heat> that's not sse
<heat> that's mmx
<heat> wow
<mjg> right
<heat> is that for i686?
<mjg> yes
<mjg> i know the shit is bad on amd64
<mjg> the question is about the real old stuff
rwb is now known as rb
<sham1> Wouldn't the use of MMX also be expensive?
<heat> yes
<heat> i'm more surprised by the use of mmx itself lol
<heat> but i guess on i686 that's what you have
<mjg> there is sse2 in later cpus
<heat> turns out those two mmx instructions in the reset vector aren't the only ones used :(
netbsduser has quit [Remote host closed the connection]
<sham1> Could always REP STOSD
<sham1> Since IIRC that does bypass caches
<heat> no?
<heat> they only get funky cache behavior on erms AFAIK
<sham1> Ah, I see
<sham1> So that's probably what I remember reading
<heat> and rep stosd is probably worse for i686 than just manually looping
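For reference, the string-instruction variant sham1 mentions can be written as GCC/Clang inline asm. This sketch uses `rep stosb` rather than `stosd`, since byte-granular `rep stosb` is the form that CPUs with ERMS (Enhanced REP MOVSB/STOSB) optimize; the function name is hypothetical. It assumes the direction flag is clear, which the SysV ABI guarantees at function entry, and falls back to `memset` off x86.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Zero n bytes with a single REP STOSB on x86; memset elsewhere. */
static void zero_rep_stos(void *dst, size_t n) {
    assert(dst != NULL || n == 0);
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(n) /* (E/R)DI = destination, (E/R)CX = count; both consumed */
                     : "a"(0)             /* AL = byte value to store */
                     : "memory");
#else
    memset(dst, 0, n);
#endif
}
```

Whether this beats a manual loop on a given i686 part is, as heat says, doubtful and would need measuring.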
netbsduser has joined #osdev
<heat> why would you use an i386 kernel on modern (erms capable), 64-bit hardware?
<sham1> Good point
<heat> they didn't even mitigate retbleed
<sham1> Even if one had a 64 bit machine with not that much memory (although I'd hazard a guess that it wouldn't qualify as modern then) you'd probably be better off using the x32 ABI for AMD64 for the smaller pointers while having all the extra registers
<heat> yes
<heat> or the 32-bit compat for less fuckery I guess
[itchyjunk] is now known as [spookyjunk]
<sham1> Compat doesn't get the extra registers though, right?
<heat> yup
<sham1> And just those alone are very appealing
<heat> i wonder how x32 affects performance
<heat> if we still had far and near pointers we could possibly enjoy reduced memory usage using an x32 + x64 hybrid
<heat> thank you for coming to my ted talk
<heat> or if we had segmentation
<heat> time to bring all these things back
<GeDaMo> You could use a 32 bit offset to a 64 bit base
<heat> but then the codegen sux
<GeDaMo> That's the compiler's problem! :P
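GeDaMo's "32 bit offset to a 64 bit base" is essentially pointer compression. A minimal sketch, assuming all objects live in one hypothetical arena (`heap_base`) smaller than 4 GiB: references are stored as 32-bit offsets and turned back into pointers with one extra add, which is the codegen cost heat objects to.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* A compressed reference: 32-bit offset from heap_base. 0 is reserved as null. */
typedef uint32_t cptr;

static unsigned char *heap_base; /* hypothetical arena, must be < 4 GiB */

static cptr compress(const void *p) {
    uintptr_t off = (uintptr_t)p - (uintptr_t)heap_base;
    assert(off <= UINT32_MAX); /* object must lie inside the arena */
    return (cptr)off;
}

static void *decompress(cptr c) {
    return heap_base + c; /* the extra add paid on every dereference */
}

/* With a raw pointer this node would be 16 bytes on x86_64; here it is 8. */
struct list_node {
    int value;
    cptr next;
};
```

The memory saving comes from every reference field halving in size; the price is the add on each load and the 4 GiB cap on the arena.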
<heat> we could also make things more compact by using 4-byte aligned u64/uptr
<heat> as is in 32-bit land
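The "4-byte aligned u64" layout heat describes can be forced on x86_64 with GCC/Clang attributes. In this sketch (struct names are hypothetical), `struct natural` gets 4 bytes of padding before `addr` on x86_64, while `struct compact` reproduces the i386 SysV layout of 12 bytes. Unaligned 8-byte loads are cheap on x86, but other ISAs may split or trap on them.

```c
#include <assert.h>
#include <stdint.h>

/* Default x86_64 layout: addr is 8-byte aligned, 4 bytes of padding, 16 bytes total. */
struct natural {
    uint32_t id;
    uint64_t addr;
};

/* i386-style layout: no padding, struct aligned to 4, 12 bytes total. */
struct __attribute__((packed, aligned(4))) compact {
    uint32_t id;
    uint64_t addr;
};

/* Read via member access; a raw &c->addr pointer would be assumed 8-byte aligned. */
static uint64_t compact_addr(const struct compact *c) {
    return c->addr;
}
```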
<sham1> We need 69-bit pointers. 64 bits for the address, 4 for a pointer tag if needed, and 1 for parity
<sham1> Makes it all very nice
<sham1> Or all 5 extra bits for checksumming
<sham1> Although a more realistic thing would be to have two 64-bit pointers acting like a base+offset pair, but that's not quite as funny
<heat> 420-bit pointers
<zid> Can we have 4 bits of parity
<zid> and a special ECC instruction to compute it
<mjg> dafuq's x32?
<sham1> It's not really parity if it's four bits
<heat> only if it's only available in xeons
<zid> x32 is 64bit regs with 32bit memory
<mjg> regular i386 + all the extra amd64 regs?
<heat> mjg, pointers are 32-bits, the rest is 64
<heat> it uses the normal x86_64 isa
<mjg> was not that x64?
<zid> IL64P32
<heat> no, x64 is x86_64 is x86-64 is amd64
<sham1> x64 is a Microsoft's name for AMD64
<heat> that is also IA-32e
<zid> x64 is itanium and you can't stop me
<heat> and Intel 64
* zid runs around ululating
<mjg> ye, what zid said is my headcanon now
<sham1> x64 for itanium would make sense
<sham1> Well, so far as the x86-style naming makes sense period
<heat> 86 should've been doubled as the bits doubled
<heat> x172
<zid> 8086 -> 80172 or 8172?
<heat> yes
<heat> why not both
<heat> 80172 in the US, 8172 in the rest of the world
<sham1> Clearly 160172
xenos1984 has quit [Read error: Connection reset by peer]
<heat> that is also a distinct possibility
<sham1> I wonder what the next address space extension would be, since we're already at PLT5 thanks to Intel
<sham1> And 52 bits
<sham1> PTL5*
<sham1> PML5**
<sham1> And actually it was 57 bits
<sham1> Man, I'm just misremembering like all the details
<heat> is PML5 already implemented by hw?
<sham1> So it leaves 7 bits for a hypothetical PML6
rorx has joined #osdev
<heat> oh yeah, icelake
<sham1> heat: since Ice lake
<sham1> Yeah
<sham1> So PML6 is going to have 128 PML6e
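The arithmetic behind the 48/57-bit figures and sham1's 128-entry PML6: x86_64 pages have a 12-bit offset and each paging level consumes 9 bits of index (512 entries per table). Four levels give 12 + 4·9 = 48 bits, five give 57, leaving 64 − 57 = 7 bits, i.e. 2^7 = 128 entries for a hypothetical sixth level. This sketch just encodes that; note that today the unused high bits must be a sign extension of the top implemented bit (canonical addresses).

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* 4 KiB pages */
#define LEVEL_BITS 9  /* 512 entries per table */

/* Virtual address width for a given number of paging levels (4 -> 48, 5 -> 57). */
static unsigned va_bits(unsigned levels) {
    return PAGE_SHIFT + LEVEL_BITS * levels;
}

/* Table index for `va` at `level` (1 = PT, 2 = PD, 3 = PDPT, 4 = PML4, 5 = PML5). */
static unsigned pt_index(uint64_t va, unsigned level) {
    return (unsigned)((va >> (PAGE_SHIFT + LEVEL_BITS * (level - 1))) & 0x1ff);
}
```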
<heat> i don't believe we're getting PML6
<heat> because address tagging
<sham1> You're not supposed to use the high bits for tags, although that is a distinct possibility
<heat> you absolutely are
<heat> that's how you do it
<bslsk05> ​en.wikichip.org: Top-byte Ignore (TBI) - ARM - WikiChip
<sham1> Usually I've seen the low order bits being used as tags across the board
<heat> and related intel and amd extensions
<sham1> Hum
<heat> which they chose to implement *separately* and in an incompatible way
<heat> you can't actually store much data in the lower bits
<heat> at most 4 bits if you have normal malloc 16-byte alignment
<sham1> Depends on where you align. For example if you align your allocation to 16 bytes, as said you get 4 tag bits and then you can do other neat things like making it so that if the lowest order bit is 0 it's a fixnum and if not, it's something else, interrogate the higher tag bits
<sham1> Which gives you either 63 or 31 bit integers depending on the width used
<heat> yeah but that's not an address is it :)
<heat> you have 4 bits
<heat> that's not a lot
<sham1> Well you could have a singular tag for "heap allocated object" which in turn would have a header
<heat> if you use the upper address range you get 16-bits to play with
<heat> upper address bit range*
terminalpusher has quit [Remote host closed the connection]
<heat> which is in practice what JITs do
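A minimal sketch of the upper-bits tagging heat describes, assuming user pointers fit in 48 bits (4-level paging, lower half), so bits 63..48 are free for a 16-bit tag. The tag must be stripped before dereferencing unless hardware ignores it (ARM TBI only ignores the top byte). The assert marks exactly the failure mode discussed for LuaJIT: it breaks once the OS hands out addresses above 2^48. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_SHIFT 48
#define ADDR_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)

static uint64_t tag_ptr(void *p, uint16_t tag) {
    uint64_t a = (uint64_t)(uintptr_t)p;
    assert((a & ~ADDR_MASK) == 0); /* fails if the address uses more than 48 bits */
    return a | ((uint64_t)tag << TAG_SHIFT);
}

static uint16_t ptr_tag(uint64_t t) {
    return (uint16_t)(t >> TAG_SHIFT);
}

static void *untag_ptr(uint64_t t) {
    return (void *)(uintptr_t)(t & ADDR_MASK); /* strip before dereferencing */
}
```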
<sham1> Well clearly not V8 at least
<heat> and why most of them have issues when you expand the available address space
<sham1> V8 tags SMIs like I just described
<sham1> SMall Integer
eau has joined #osdev
<sham1> There's some happy consequences there, like that you can add a fixnum/SMI to another and get back a new fixnum/SMI. You'd have to promote to a full width register if your "this is actually an integer and not a pointer" bit was at the top
<sham1> Subtraction and negation also work, assuming 2's complement
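The scheme sham1 describes (small integers tagged 0 as `n << 1`, heap pointers tagged 1) in a few lines. This is a sketch of the idea only, not V8's actual code: tagged add/subtract work without untagging because (a<<1) + (b<<1) = (a+b)<<1, and pointer access folds the tag into the addressing displacement (base − 1 + offset). Overflow checks are omitted, and `>>` on negatives is assumed to be an arithmetic shift (true for GCC/Clang, implementation-defined in C).

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t value; /* hypothetical tagged word */

static value smi_from(intptr_t n) { return (value)((uintptr_t)n << 1); } /* low bit 0 */
static intptr_t smi_to(value v) { return v >> 1; }
static int is_smi(value v) { return (v & 1) == 0; }

/* No untag/retag needed for add and subtract. */
static value smi_add(value a, value b) { return a + b; }
static value smi_sub(value a, value b) { return a - b; }

/* Heap pointers are at least 2-byte aligned, so the low bit is free for tag 1. */
static value ptr_box(void *p) {
    assert(((uintptr_t)p & 1) == 0);
    return (value)((uintptr_t)p | 1);
}

/* In generated code this -1 folds into the load's displacement, e.g. [reg + 8*i - 1]. */
static void *ptr_unbox(value v) { return (void *)((uintptr_t)v - 1); }
```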
<heat> python uses nanboxing
xenos1984 has joined #osdev
<sham1> Right, as does LuaJIT and IIRC also SpiderMonkey
<sham1> And that only works while your addresses and stuff stay sufficiently small
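A minimal NaN-boxing sketch showing why it depends on small addresses: doubles are stored as their raw bits, and pointers go into the 48-bit payload of a quiet NaN. The `PTR_TAG` marker bit is a hypothetical choice for this example, not LuaJIT's or SpiderMonkey's actual encoding. A real implementation must also canonicalize NaNs produced by arithmetic so none of them can look like a boxed pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QNAN    UINT64_C(0x7ff8000000000000) /* exponent all ones + quiet bit */
#define PTR_TAG UINT64_C(0x0004000000000000) /* hypothetical "this is a pointer" payload bit */
#define PAYLOAD UINT64_C(0x0000ffffffffffff) /* 48 bits left for the address */

static uint64_t box_double(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double unbox_double(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

static uint64_t box_ptr(void *p) {
    uint64_t a = (uint64_t)(uintptr_t)p;
    assert((a & ~PAYLOAD) == 0); /* the scheme breaks once addresses exceed 48 bits */
    return QNAN | PTR_TAG | a;
}

static int is_ptr(uint64_t u) { return (u & (QNAN | PTR_TAG)) == (QNAN | PTR_TAG); }
static void *unbox_ptr(uint64_t u) { return (void *)(uintptr_t)(u & PAYLOAD); }
```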
<heat> I know luajit was broken on M1-sized address spaces
<heat> at cloudflare the kernel team tried enabling larger address spaces on arm64 but luajit just wouldn't work
<heat> linux x86 deals with it by just not giving you a >48-bit address in mmap :)
<sham1> I could see something like MAP_57BITS or something being added as a Linux extension to those who really want to map even larger things
<heat> "An application that needs that much memory, and which does not play games with virtual addresses, can provide an address hint above the boundary in a call to mmap(), at which point the kernel will understand that mappings in the upper range are accessible. "
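The opt-in described in that quote looks roughly like this on 64-bit Linux: pass a hint at or above the 47-bit boundary to `mmap`. On a kernel running with 5-level paging this allows a mapping in the upper range; on a 4-level system the hint is simply not honored and an ordinary address comes back. The function name and the exact hint value are assumptions for illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Request an anonymous mapping, hinting above 2^47 (assumes 64-bit Linux). */
static void *map_high(size_t len) {
    void *hint = (void *)(uintptr_t)(UINT64_C(1) << 47);
    void *p = mmap(hint, len, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p; /* the hint may be ignored; check the result */
}
```

Without `MAP_FIXED` the hint is only advisory, so code that packs tags into high bits keeps working unless it asks for high addresses itself.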
<j`ey> heat: 52 bit VA on arm64 or?
<heat> j`ey, yeah that was probably what they tried to enable, cant remember
<zid> I've only use more than like a couple of gigs of memory once ever
<zid> and it was to use my mmu as a trie implementation
<zid> cus I was too lazy to write one
ss4 has joined #osdev
wootehfoot has quit [Ping timeout: 250 seconds]
<mrvn> sham1: for some reason pointers are usually 0-tagged and integers 1-tagged. So a + b ==> a + b - tag
<mrvn> It's odd because pointer access in hardware usually allows an offset, which you could use to remove the tag.
xenos1984 has quit [Ping timeout: 264 seconds]
<sham1> I don't see why. I mean, as I said, v8 tags (small) integers as 0 and pointers with 1, and it makes sense since then a + b is a + b. Adding and subtracting integers is so very common. Pointer access for languages like Javascript is common too, fair, but I suppose v8 takes care of that by being the absolute master of inlining and unboxing
xenos1984 has joined #osdev
<sham1> I'd certainly tag pointers with 1
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
<mrvn> the drawback is that you need a larger opcode for the pointer access to store the tag
<netbsduser> i've opted for pointers tagged 0 in my smalltalk dialect and smallintegers 1
<sham1> Fixnum addition, especially for things like loops with increments by 1, is so common that I simply don't understand that decision
<mrvn> The thing about integer arithmetic is that you often don't do that on memory. You load the values into registers, remove the tag, do lots of computations, add the tag, store in memory.
<netbsduser> (and a few other things too at some point; Characters and SmallFloats; Apple famously encoded short strings into tagged pointers, very useful for early iPhone OS)
<jeaye> https://opensource.googleblog.com/2022/10/announcing-kataos-and-sparrow.html hm, interesting to see Google doing this as well as Fuchsia.
<bslsk05> ​opensource.googleblog.com: Announcing KataOS and Sparrow | Google Open Source Blog
<mrvn> sham1: increment by 1 just becomes increment by 2
<sham1> Yeah. And then you don't even need to remove and readd the tag
<sham1> Even if done by register, that's still precious instructions
<mrvn> sham1: any literals you just remove the tag there.
<mrvn> * and / need to shift to remove the tag
<mrvn> sham1: I don't get why many GCs tag ints with 1 but they do.
<sham1> Yeah, but those are also not *quite* as common operations. They're still common, but not quite as common
<netbsduser> it would be interesting to see the two competing approaches benchmarked
<mrvn> I guess someone should implement both ways and benchmark that with modern code and modern cpus.
<sham1> You need to retag and such anyway because a multiplication will give you 128 bits and in the worst case you'll have to promote to a bignum at least if you have something like a Lisp
<netbsduser> mrvn: great minds think alike, and fools seldom differ
<mrvn> I'm sure someone did that in the deep past but a lot has changed since then.
<sham1> Same with division. You might end up with a rational number
<netbsduser> sham1: this is the behaviour of smalltalk
<mrvn> sham1: a * b ==> a / 2 * (b - 1) + 1 horrible
<sham1> Yeah, any language with a competent numeric tower like Smalltalk, CL, Scheme and so on has to do so
<mrvn> In any complex arithmetic the compiler can strip out a lot of the tagging.
<sham1> Yeah, if it can unbox it. Of course that goes for a lot of things
<mrvn> Ocaml uses the lowest bit for INT_TAG (1), and otherwise the lowest 2 bits for pointers (00) and special constants (10).
<mrvn> On 64bit it could use 3 tag bits.
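For comparison with the 0-tagged-integer sketch, here is the OCaml-style layout mrvn describes: an integer n is stored as 2n+1 (low bit 1), pointers keep low bit 0. Each operation needs a small correction, as mrvn's formulas say. One detail: his `a / 2` must be an arithmetic shift in C, because `/` truncates toward zero and gives the wrong result for negative a. Names are hypothetical; overflow is ignored, and `>>` on negatives is assumed arithmetic (GCC/Clang behavior, implementation-defined in C).

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t value; /* hypothetical tagged word */

static value val_int(intptr_t n) { return (value)(((uintptr_t)n << 1) | 1); }
static intptr_t int_val(value v) { return v >> 1; }
static int is_int(value v) { return (v & 1) != 0; }

/* (2a+1) + (2b+1) - 1 == 2(a+b) + 1 */
static value tagged_add(value a, value b) { return a + b - 1; }
/* (2a+1) - (2b+1) + 1 == 2(a-b) + 1 */
static value tagged_sub(value a, value b) { return a - b + 1; }
/* a * 2b + 1 == 2ab + 1; >> instead of / 2 so negative a rounds correctly */
static value tagged_mul(value a, value b) { return (a >> 1) * (b - 1) + 1; }
```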
xenos1984 has quit [Ping timeout: 268 seconds]
<mrvn> sham1: one thing why you might want pointers 0 tagged is access with offset and shift: r0 + 2<<3 to access an 8 byte aligned value. If you 1 tag pointers you can't shift and reduce the offset you can reach.
<mrvn> and r0 + r1<<3 - 1 needs more bytes on x86.
<sham1> Well usually that access would look more like r0 + constant
<mrvn> sham1: no, r0 + constant << shift. In most cpus.
<sham1> Why shift? I mean, if you know which field you're accessing, you could just precalculate the offset
<mrvn> sham1: it's part of the opcode. uses fewer bits to access the offset
Lumia has quit [Ping timeout: 246 seconds]
<sham1> If, say, I had a 0b111 as a heap object signature for whatever reason and I wanted to access the 2nd 8-byte word of an object, it could just calculate something like mov %r15, [%rdx + 1] or something to that effect
<sham1> Sure, you'd have to embed the offset into the opcode, but still
<sham1> And that is assuming I know that I wish to access the second word
<mrvn> sham1: try accessing the 256th.
<mrvn> 256 * 8 works but 2047 needs too many bits.
<sham1> I mean fair enough, but I do not know how often that'd be needed
<mrvn> yeah, that part you have to benchmark.
<mrvn> I think this was also intended for array access: x[i] ==> r0 + r1 << 3. But compilers either just increment r0 in the loop or increment r1 by 8.
xenos1984 has joined #osdev
<zid> .syntax intel, very_prefix strikes again
<heat> very prefix, much wow
<sham1> (mov (register r15) (deref (+ (register rdx) 1)))
<sham1> There you go, some prefixes
<heat> noooooo
<heat> register r15 is redundancy
<heat> just say register fifteen
<sham1> You need to distinguish it from xmm15
<heat> streaming single instruction multiple data register fifteen
<zid> mov r15, [rdx+1] violates my internal memory model btw, sorry
<zid> going to have to delete it
<sham1> Modern Lisp-machine inline assembly
<sham1> Back in the day, need special hardware, but now we have good enough commodity hardware
<heat> computer
<heat> math
<mrvn> a lisp cpu would be fun
<heat> sand
<sham1> A CPU of course being a piece of sand we've tricked into doing things by passing lightning through it
<sham1> Thinking about it makes the technology seem so cool
nyah has joined #osdev
<mrvn> Honey, I'm going to the beach, I need some more computronium.
<heat> no
<heat> it's horrifying
<heat> math for torture
<heat> computing is the modern day east india company
<heat> but with rocks instead of people
blockhead has joined #osdev
<zid> Programming was a mistake, they've taken us for absolute fools
Lumia has joined #osdev
<heat> :100:
<GeDaMo> The rocks used us! :P
<heat> are you victim shaming the rocks or somethin
wand has quit [Ping timeout: 258 seconds]
ZipCPU has quit [Ping timeout: 250 seconds]
wand has joined #osdev
<blockhead> rock abuse ;)
<heat> n a s t y
<bslsk05> ​hg.mozilla.org: mozilla-inbound: changeset 299679:dfaafbaaa2919a033c4c0abdd5830f4ea413bed6
<zid> it says hg.
<zid> of course it's nasty
<GeDaMo> "A Performance Study of the Unix System V Fork System Call Using Casper"
<heat> thank you human bslsk05
<heat> much appreciated
<GeDaMo> Does bslsk05 read PDFs?
<heat> no
<heat> clearly not
ephaptic has joined #osdev
Lumia has quit [Quit: ,-]
Burgundy has joined #osdev
axis9_ has quit [Quit: joins libera]
IRChatter has quit [Quit: ]
IRChatter has joined #osdev
invalidopcode has quit [Ping timeout: 248 seconds]
invalidopcode has joined #osdev
ZipCPU has joined #osdev
smach has joined #osdev
sav_ has joined #osdev
sav_ has quit [Client Quit]
<geist> mjg: well i haven't run your test, but i did get 6.4 on it
<geist> what precisely do you want me to do?
<heat> now that you have a freebsd installation you might as well daily-drive it
<heat> freebsd is to linux users what linux is to windows users
<heat> and this is fax
<bslsk05> ​git.kernel.org: kernel/git/torvalds/linux.git - Linux kernel source tree
<heat> a linux release name had emojis
<Ermine> <3
<Ermine> send me a fax
<heat> no
<heat> its 2022
<Ermine> Oh, really?
<heat> yes
frkzoid has quit [Ping timeout: 264 seconds]
<heat> fax
frkzoid has joined #osdev
<Ermine> windows server 2012 has fax server role
<geist> heat: well, problem is that 6.4 is too old
<geist> it's off the support train so a) the source repo doesn't exist in cvs anymore, so none of the cvsup/etc stuff work anymore
pbx has joined #osdev
<geist> and b) the ports/package stuff is too old, none of it seems to exist on the servers anymore
<geist> so it's pretty hard to do anything with it. i fiddled with it for a bit but couldn't install any of the handful of ports i wanted and couldn't sync any source
<geist> and a new ports dir predictably is wayyy too new for the old freebsd
<pbx> speaking of ports, i'm working on an interesting one for my OS: https://media.discordapp.net/attachments/1028035833149800510/1031288200431419512/unknown.png
<geist> this is pretty annoying. i found that at least old netbsd's can still kinda work, because the source and ports cvs are still there and still accessible, even if using a 20 year old version
<heat> pbx, oh wow
* Ermine has got another crazy idea
\Test_User has joined #osdev
<heat> have you thought of, erm, not doing that?
<Ermine> No
<mjg> geist: kldload hwpmc && pmcstat -L
<geist> neither of those work
<mjg> wut
<netbsduser> pbx: that would indeed be an interesting port to see
<geist> -L switch doesn't exist in this version
<mjg> geist: give me 5
<geist> and the kldload doesn't have that module
<netbsduser> the only OSes i am aware of with X11 ports are the major ones of the osdev discord
<netbsduser> hobby OSes rather. managarm, lyre, and aero
<mjg> geist: what exact error are you getting from kldload
<mjg> cause
<mjg> The hwpmc driver first appeared in FreeBSD 6.0.
<geist> hwpmc does, the -L switch doesn't exist on the thing
<geist> oh lemme see, gotta boot the machine up again
<geist> it doesn't do power management very well, so it seems to burn a fair amount of power constantly :)
<geist> yay pentium 4
<mjg> you got a meter attached to it?
<geist> no but you can feel the heat coming out of it
<geist> though i should find my old kill-a-watt somewhere
<clever> i need to test my new router, i can feel the heat wafting off the top of it
<geist> booting it up now. iirc there was a syslog message to the effect of 'kernel not configured with ....'
<geist> as if the hwpmc driver was not a default option in the SMP kernel
<mjg> does the following work: pmcstat -S dc-misses -O /tmp/out
<geist> booting it up...
<clever> the router PSU is rated for 12v 5a
<geist> pmcstat: ERROR: Initialization of the pmc(3) library failed: No such file or directory
<mjg> did you kldload
<geist> "this kernel has not been compiled with options HWPMC_HOOKS"
<geist> so guess i gotta do that now
<mjg> sigh :)
<mjg> dye
<mjg> don't forget the patch
<geist> fwiw the pmcstat program has less options:
<bslsk05> ​IRCCloud pastebin | Raw link: https://irccloud.com/pastebin/raw/QdE2MTwE
<mjg> i'm looking at 6.4 sources now
<mjg> static struct pmc_event_alias p6_aliases[] = {
<mjg> EV_ALIAS("dc-misses", "p6-dcu-lines-in"),
<mjg> so that should work
<mjg> you can compile & install with: make kernel
<mjg> will roll with GENERIC by default
<mjg> use KERNCONF=meh to override
<mjg> configs can be found in sys/i386/conf
<geist> yep
<geist> trying to figure out how to apply your patch now. the old bsd patch command is unfamiliar
<mjg> heh default config does not do smp
<mjg> patch -p1 < patch
<geist> yep, which isn't working... trying harder
<mjg> wut?
<mjg> i can post the patched file
<geist> oh. no space after -p
* geist shrugs
<mjg> mount -u -o noatime /
<mjg> protip of the day
<mjg> device nsp # Workbit Ninja SCSI-3
<mjg> that's some funny devices
<geist> okay, building new kernel now with the hwpmc + patch in. will take about 900s iirc
heat_ has joined #osdev
heat has quit [Read error: Connection reset by peer]
<mjg> i wish i had such hardware handy
ephaptic has quit [Quit: WeeChat 3.7]
<mjg> would defo run some benches of the old stuff
<geist> that's honestly the primary reason i keep these around
<mjg> maybe not *that* slow, but a legit 4 or 8 core 2010-ish would be great
<geist> i have the space and i'm enough of a nerd that it's fun to drag one out every once in a while and experience the old days
<mjg> :)
<geist> modern stuff is not really that fun to run old software on since it's instant and quiet
<mjg> i was thinking about reproducing benches certain projects are proud of
<mjg> for example netbsd 5 with the major smp work
<mjg> tested up to 8 cores afair
<geist> what is kinda weird is though the SMP kernel here absolutely detects both 'cpus' (this is the first gen Hyperthreading that intel did)
<geist> i dont see a lot of evidence that it's actually *using* the second core
heat_ is now known as heat
<geist> i need to construct a parallel program test to verify that it is or not
<geist> ie, top and sysstat and whatnot dont seem to show any activity on cpu 1
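One way to build the parallel test geist mentions: time N busy threads against one. If both logical CPUs are really scheduled, two threads take about as long as one; if everything lands on cpu 0, they take roughly twice as long. On first-generation Hyperthreading the two logical CPUs share execution units, so expect something in between. This is a hypothetical POSIX-threads sketch (`spin`, `run` are made-up names); build with `-pthread`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* CPU-bound busy loop; volatile keeps the compiler from deleting it. */
static void *spin(void *arg) {
    volatile uint64_t acc = 0;
    uint64_t iters = *(const uint64_t *)arg;
    for (uint64_t i = 0; i < iters; i++) acc += i;
    return NULL;
}

static double now(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Wall-clock seconds for `nthreads` concurrent copies of spin(). */
static double run(int nthreads, uint64_t iters) {
    pthread_t t[16];
    assert(nthreads > 0 && nthreads <= 16);
    double start = now();
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&t[i], NULL, spin, &iters) != 0) abort();
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    return now() - start;
}
```

Comparing `run(2, N) / run(1, N)`: a ratio near 1 means the second CPU is doing work, near 2 means it is idle.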
<mjg> if i have my history right the first non-toy smp release was 7
<geist> yah, i do remember that 5 and 6 i think were kinda the dark ages of freebsd
<mjg> everything prior to it was a tirefire
<mjg> well i had a 5.3 in production which was rock solid fwiw :-P
<geist> for SMP, but freebsd 4.x was a super solid OS
<mjg> but also unicore
<mjg> interestingly it was freebsd which worked for me and not linux
<geist> twas the era where SMP x86 machines were still kinda exotic, or at least you knew that you weren't getting a 2x speedup, but that was better than nothing
<mjg> for example in linux you still had to manually pick different /dev/dsp* (or whatever the name) devices for different programs
<mjg> so that more than one prog can emit sound (LOL)
<mjg> freebsd made it work with just 1
<heat> geist, were the CPUs themselves bottlenecked?
<heat> (for the "not getting a 2x speedup" comment)
<geist> oh i mean in terms of the state of the art, OS stuff was not as sophisticated as it is now
<mjg> kernels were turbo bottlenecked
<heat> well, yes
<mjg> it's basically giant lock everywhere, like openbsd today
<geist> also things like per cpu run queues and whatnot were not the usual thing
<mjg> heat: if you are asking if the hw was inherently running into problems, then i don't know
<heat> i'm curious if the cpu technology itself made it a smaller gain than 100%
<mjg> right
<geist> but it worked well enough, in general. there wer obviously workloads that would collapse to 1x because of bottlenecks, but on the average it was still better than one cpu
<mjg> i would expect MESI was dogshit slow
<mjg> and consequently there was an inherent drop no matter what
<geist> again, i dont think it was that bad
<geist> i think a general rule of thumb was something like 1.8x or whatnot
<geist> but totally depends on what you're doing
<heat> was there ever a serious attempt to run a kernel instance per cpu?
<geist> it's all a matter of perspective. if you have the option of adding a second cpu to your machine you weighed the cost, but at a point in time there (late 90s) before intel and amd started bifurcating their market into server and client stuff, the overhead was you paid more for a dual socket mobo
<geist> but you had the option of just sticking another cpu in
<geist> but then you needed an OS that actually supported it, but it was generally a reasonable speedup
<mjg> did windows 9x do smp?
<mjg> sound silly now that i wrote it
<heat> lol
<geist> was it optimal? probably not, but was > 1x by a pretty good margin, and worst case it'd probably collapse to about what you got with a single cpu, so it was a win no matter what
<mjg> .. looks like you needed nt
<mjg> imagine smp dos (LOL)
<geist> and yeah win 9x was very smp
<geist> very not smp
<geist> i remember in college, every time i had to reboot my SMP machine into win9x to play some game, a little bit of me died inside
<mjg> :)
<geist> i got a dual ppro in about 1996. was my main machine up until 1999 when i got the dual p3 which i still have
smach has quit []
<mjg> wut 2 cpus in 1996?
<mjg> are you from old money or something
<geist> no but i worked at compaq at the time and there was a lot of... uh, discarded hardware
<mjg> what were you running which could even take advantage of it
<geist> ie if you were okay running a A0 rev ppro 180Mhz, go for it
<geist> was going in the trash otherwise
<mjg> oh ok, hw lying around in big corp is a topic of its own
<geist> but you could get mobos for not that much more. the market hadn't yet bifurcated into workstation/server (== $$) and desktop
<geist> a dual socket board was just a bit more expensive. but once you paid the cost you could just stick another cpu in later, and cpus weren't intrinsically SMP or not SMP capable at the time
<geist> since most of it was in the chipset anyway
<mjg> so how did cache coherency work?
<geist> oh they had a MOESI already there, but that was just present in basically 486+
<geist> was part of the bus
<geist> but ppro was a lot more sophisticated though, since it pulled in an L2 onto the socket
<geist> so the bus protocol i think was more complex there
<geist> but on 486 and pentium there was an L1 cache on the cpu socket, so it had to at least be coherent there
<geist> but any L2s were on the motherboard, and a single one, so the cache coherency there was part of the chipset
<geist> okay, got the new kernel, booted. let's see...
<geist> okay, so got the hwpmc module loaded, but of course the commandline is less featureful
<heat> what are we doing again
<heat> old freebsd, old cpu, but what for?
<geist> i dunno, i was just using it as an excuse to fiddle with old freebsd
<mjg> heat: you know, someone may join this channel and ask what would you develop a custom os for
<mjg> geist: so does this guy crap out or not: pmcstat -S dc-misses -O /tmp/out
<geist> fwiw when it loaded the hwpmc hwpmc: TSC/1/0x20<REA> P4/18/0xfff<INT,USR,SYS,EDG,THR,REA,WRI,INV,QUA,PRC,TAG,CSC>
<geist> no, says it can't do dc-misses
<geist> ie, the command has a different syntax, etc
<heat> mjg, easy answer: yes
<geist> "pmcstat: ERROR: Cannot allocate system-mode pmc with specification "dc-misses": Invalid argument"
<mjg> geist: give me 5
<mjg> that means it does not see the event, but prog usage is fine
<heat> is this part of your page zeroing quest?
<geist> for funsisies
<bslsk05> ​IRCCloud pastebin | Raw link: https://irccloud.com/pastebin/raw/TTdbxxkS
<mjg> heat: yes
<heat> oooooh a pentium 4
<heat> i had one, it was my first cpu
<geist> yah it was the machine that replaced the dual P3
<heat> i wonder if it still boots...
<geist> and then was itself replaced with a dual athlon xp a year later or so
<mjg> geist: give me 5
<geist> sure thing
<geist> i think when i'm done with all of this i'm either going to replace this install with freebsd 13 or gentoo
<geist> cause why not
<mjg> :)
<mjg> dude that's with meltdown mitigation
<mjg> which splits kva
<mjg> already shitty perf in half man
<geist> *shrugs*
<heat> disable it?
<mjg> there is no switch on that arch
<geist> again, i'm not so fixated on raw perf as you are
<geist> as in a computer doing something is better than not doing anything at all
<heat> oh yeah fun thing, zen4 runs faster with mitigations on
<mjg> i'm sayin it may be so bad it's not even useful for laying around
<mjg> heat: wut?
<heat> yup
<mjg> url plz
<bslsk05> ​www.phoronix.com: With AMD Zen 4, It's Surprisingly Not Worthwhile Disabling CPU Security Mitigations - Phoronix
<geist> well, it's not *useful* as in it is a waste of power to run old hardware that chews up a lot of it just to actually compute things
<geist> that's honestly why i dont find some cute way to have all of them on all the time in the garage or whatnot
<mjg> heat: dude phoronix is a perf tabloid
<geist> which would be fun, but i also pay the power bill and care a little bit about the environment
<zid> I could believe they spent their powerbudget elsewhere though
<zid> other than speculation
<heat> mjg, doesn't matter, did you see the results?
<mjg> heat: phoronix performs fire & forget benchmarking
<geist> i was talking to some folks at work about it. basically their take is zen 4 > *
<mjg> heat: who tf knows what actually happened
<geist> and zen 3-4 > zen 1-2 by a lot on the mitigation front
<geist> ie, 1 and 2 have some fairly gnarly issues
<zid> I just have mitties off, spectre just means you can install printer drivers on my machine
<zid> if you have my user account I've already lost everything else
<geist> and yeah i usually run with mitties off, but frankly i can also just go into my bios and bump the power budget a bit and that more than makes up for it
<mjg> geist: so... i'm afraid i don't see a way to get pmcstat working (perhaps other release would, but i'm not going there)
<\Test_User> and bumping the power budget doesn't work without spectre mitigations? :P just do both
<geist> basically too old of a freebsd huh?
<geist> what about a freebsd 13 install? would it be testable?
<mjg> geist: if you give me 15 i'll give you a 6.4-friendly will-it-scale to bench page faults
<geist> oh sure. i'd be happy to compile and run a program
<mjg> ok stay tuned
<heat> STAY HARD
<geist> this little 22 year old computer wants to please!
<mjg> pet it on the front from me
GeDaMo has quit [Quit: Physics -> Chemistry -> Biology -> Intelligence -> ???]
<heat> geist, anyway i didn't quite understand what you said regarding zen4 mitigations. how would it ever be faster with them on than off?
<heat> i get that it has a lot less mitigations than older gens but...
<heat> unless they're playing some weird tricks
<geist> it wants to be your friend!
<mjg> even got a floppy?
<geist> heat: that i dont know
<mjg> nice little case
<heat> aww it's a qt
<geist> yah temporarily shuttles were really popular
<geist> they were kinda ahead of their time with the mini itx wave back in the early 2000s
<geist> it has this elaborate heat pipe to turn the heat sink 90 degrees so it dumps heat directly out the back
<heat> np i understand
<geist> re: the zen 4 being faster with mitigations, was asking someone at work about it but they dont know precisely which one it is
<mjg> so it's true?
<mjg> wut
<geist> i should ask him to pop in here sometime, he is a wealth of knowledge on that stuff
<heat> we want all the knowledge
<mjg> sounds pretty weird
<mjg> :)
<heat> when is bryan cantrill popping in too?
<mjg> dude, don't
<geist> it's probably something far more mundane like it's not that the mitigations make it faster, but the path where the mitigations aren't taken is for some reason very pessimal on that hardware
<mjg> cantrill is a great example of an old geezer
<heat> he's not that old
<mjg> tell you what
<geist> anyway work on your program
<mjg> ok, fair
<heat> why do these new uarchs still have mitigations though?
<heat> i would assume they would fix it in silicon
<geist> cause they find more, or the mitigation is an opt in thing
<geist> as in 'to avoid problems with this sort of thing, frob this when context switching'
<geist> and you can choose to not frob it
<geist> it's kinda the new thing now: things you should do when doing X or Y but dont *have* to
<bslsk05> ​www.phoronix.com: Disabling Spectre V2 Mitigations Is What Can Impair AMD Ryzen 7000 Series Performance - Phoronix
<heat> it seems to be spectrev2
<geist> soooo looks like i need hwloc
<mjg> no
<geist> hwloc.h
<mjg> the main.c i pasted does not have it
<geist> sure it is, about halfway down the include list
<mjg> oh there is caching in front of that webserver, it is plausible it has the wrong version now
<heat> mjg forgot to remove it
<heat> just delet
<mjg> lemme upload differently
<mjg> no
<geist> kk
<mjg> that's the wrong file fetched to begin with, i verified mine compiles on 6.4
<heat> oh wow you're using a radix tree in freebsd vm?
<bslsk05> ​people.freebsd.org: Index of /~mjg/.junk/64/
<mjg> you can use "fetch" to grab them
<mjg> cc main.c -o page_fault1 page_fault1.c
<geist> hmm now it's trying to find MAP_ANONYMOUS, hang on
<geist> in page fault 1 that is
<mjg> apologies
<mjg> use MAP_ANON
<heat> aha
<mjg> this one does not access any of the mmapped pages so no pagezero yet
<mjg> but will serve as a baseline
<geist> okay, so running them now
<mjg> dafuq am i even saying. this one accesses all 128MB
<mjg> sysctl debug.pagezero_style= 0/1/2
<mjg> you may need -w
<geist> so which one do you want me to run?
<geist> are there 9 permutations?
<mjg> for now stick to page_fault1, but i would like to see debug.pagezero_style= 0/1/2
<heat> what are the different styles
<geist> does it run forever?
<mjg> you can add "-s 2" to limit tries
<mjg> i would say -s 10
<mjg> lemme write a lolo script
<mjg> how did you name the binary
<geist> pf1, pf2, pf3
<geist> i got it now. no worries
<heat> ah ok
<geist> so far pagezero_style 0 is winning
<heat> is sse2 the non-temporal stuff?
<mjg> heat: ye
<heat> what's i686? mmx?
<bslsk05> ​IRCCloud pastebin | Raw link: https://irccloud.com/pastebin/raw/GAHM45Fh
<geist> also interesting i could hear the power supply start whining when running the 0 style
<bslsk05> ​dpaste.com <no title>
<geist> yah 0 is the sse2 style
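For context on what the "sse2" style means: the page is cleared with non-temporal (streaming) stores, which go through write-combining buffers instead of allocating lines in the cache, so zeroing does not evict useful data, at the cost that the next access to the page misses. A minimal userspace sketch, assuming x86 with SSE2 and 4 KiB pages; `zero_page_nt` and `zero_page_cached` are illustrative names, not FreeBSD's actual pagezero code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define ZERO_PAGE_BYTES 4096 /* assumption: 4 KiB pages */

/* Baseline ("bzero" style): ordinary stores, so the zeroed lines are
 * left in the cache for whoever touches the page next. */
void zero_page_cached(void *page)
{
    memset(page, 0, ZERO_PAGE_BYTES);
}

/* "sse2" style: MOVNTDQ streaming stores that bypass the cache. */
void zero_page_nt(void *page)
{
    assert(((uintptr_t)page & 15) == 0); /* MOVNTDQ needs 16-byte alignment */
#if defined(__SSE2__)
    __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)page;
    for (size_t i = 0; i < ZERO_PAGE_BYTES / sizeof(__m128i); i++)
        _mm_stream_si128(&p[i], zero);
    /* Streaming stores are weakly ordered: fence before the page is
     * handed to anyone else. */
    _mm_sfence();
#else
    zero_page_cached(page); /* no SSE2: fall back to ordinary stores */
#endif
}
```

The trade-off is the one being measured here: if the faulting program then touches most of the page, the cached variant has already warmed those lines, while the streaming variant forces them to be fetched again.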
<mjg> and then collect the .txt files
<geist> oh i dont have bash or anything here
<mjg> sh
<geist> seq and whatnot doesn't seem to exist
<mjg> wow motherfucker
<mjg> > for attempt in 1 2 3 4 5; do
<mjg> funny how certain tooling is missing, you would think it was there since the 80s
<mjg> anyhow, given the above, i do expect sse2 to ultimately win. i'm somewhat surprised, but not shocked
<geist> so what was the default?
<mjg> sse2
<mjg> if available
<geist> so they were onto something!
<geist> so i think this whole exercise is, they were right at the time
<mjg> it looks like so, yes
<heat> woah
<geist> the p3 wont take this route because it didn't have SSE2, i assume
<geist> because SSE had just come out on P3
<mjg> it might have
<mjg> lemme do a quick check
<mjg> which uarch is this
<geist> this is a pentium 4. which is where SSE2 was introduced
<heat> what's the point of zeroing with temporal stores though?
<geist> the precise one.... good question
<heat> s/temporal/cached/
<geist> it's a later P4, the first with HT
<heat> assuming it's faster these days, why?
<geist> northwood maybe?
<heat> maybe the caches are just big enough now?
<geist> well, i think the answer is NT stuff is a complicated problem. it trades this sort of thing for that
<geist> huh this cpu may be 2002. northwoods with HT came out in early 2002
<geist> the narrative in wikipedia was about 2002 or so they finally got the clock rate up (2.4Ghz and above) and was starting to beat the athlons at the time with sheer mhz
<geist> prior to that the p4s were kinda a joke
<mjg> oooh wait!
<mjg> wait
<mjg> this test only accesses one line from each page
<mjg> no wonder it's a win
<mjg> let's access some more
<geist> but that causes it to blat out the whole page
<mjg> it zeroes the whole page
<geist> ah i see, you mean if i read in the whole page it should start losing with NT
<mjg> but you then only pay for fetching one line
<geist> because of the lack of cache warmup
<geist> got it
<mjg> will add a parameter for how much to fetch
<geist> kk
<mjg> how big are the lines on that one
<mjg> 32?
<geist> good question
<mjg> it is plausible dmesg tells you
<geist> wikipedia says 64 bytes
<mjg> will play it safe and access every 32
<mjg> it's not smaller than that
<geist> yeah though 64 is probably it
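On x86 the line size can be queried instead of guessed. A sketch assuming GCC or Clang, whose `<cpuid.h>` provides `__get_cpuid`; it reads the CLFLUSH line size from CPUID leaf 1 (EBX bits 15:8, in 8-byte units), which is commonly used as the cache line size but is not formally the same thing. `cache_line_size` is a hypothetical helper name, and it falls back to a caller-supplied default on other architectures.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Hypothetical helper: CLFLUSH line size from CPUID leaf 1, or `fallback`
 * when not on x86 or when the CPU does not report it. */
unsigned cache_line_size(unsigned fallback)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if leaf 1 is not supported. EDX bit 19 (CLFSH)
     * says whether the EBX line-size field is valid. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) && (edx & (1u << 19))) {
        unsigned units = (ebx >> 8) & 0xff; /* in 8-byte units */
        if (units != 0)
            return units * 8;
    }
#endif
    return fallback;
}
```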
<bslsk05> ​dpaste.com <no title>
<mjg> note csh may be unfriendly to CLINES=.. ./foo
<mjg> or let me whip out a script :P
<mjg> i guess you could start with: setenv CLINES 128
<mjg> and run the new prog like you did the previous one
<heat> bsd shells suck
<mjg> heat: true!
<heat> build bash and be a happy person
<mjg> from one crapper to the next, you are relentless
<mjg> team zsh 4 life
<heat> bash is the only usable shell
<heat> aww no
<heat> you're one of them
<geist> seems to be..... exact same speed now
<geist> 0 1 and 2
<mjg> hm
<geist> 2380537
<sham1> heat: zsh has a nicer license
<sham1> It's also nicer in other ways
<geist> actually the bzero is a teensy bit slower
<geist> 2386... vs 2380...
<geist> but it's extremely consistent
<mjg> geist: so how about i give you a script to run which will produce a bunch of .txt files, i'll process it and we will call it a day
<mjg> :)
<geist> okay
terminalpusher has joined #osdev
<heat> sham1, doesn't zsh have poor posix compliance?
<sham1> Yes. Then again so does bash. Really for POSIX sh stuff you want something like dash
<heat> i have dash
<geist> hang on a sec i think there's a bug in it
<mjg> geist: how did you name it?
<heat> pbx, UGH
<heat> my condolences
<geist> it's not pushing 'c' forward, so it's reading the original page over and over again
<mjg> geist: lol
<mjg> i knew i had to brainfart somewhere
<heat> pbx, what did you need to port it?
<bslsk05> ​'USENIX ATC '21/OSDI '21 Joint Keynote Address-It's Time for Operating Systems to Rediscover Hardware' by USENIX (01:06:19)
<geist> ah now you're getting a bit more of what you wanted to see: the NT store is slower when accessing the entire page
<pbx> heat: mostly just BSD sockets, poll/select and basic POSIX stuff
<mjg> right
<geist> though the 686 one (version 1) is a lot faster, and bzero is slower than the sse ones
<heat> pbx, wanna do a port-off :eyes:
<mjg> geist: again lemme fix it all up
<geist> so seems to be on first run: bzero < sse < i686
<mjg> geist: and ship a script to test a range
<geist> fine
<heat> isn't your test kinda super unrealistic?
<pbx> heat: i did run into some issues with XCB, which ignores half the configure detections
<epony> ^ keynote above by https://people.inf.ethz.ch/troscoe/
<bslsk05> ​people.inf.ethz.ch: Timothy Roscoe (Mothy)
<heat> you don't mmap 128MiB and access it sequentially cacheline-for-cacheline
<geist> it's a microbenchmark. they are almost all super unrealistic
<mjg> heat: 1. the bigger the total area the friendlier this is for nt stores
<epony> another one of the same presenter https://www.youtube.com/watch?v=CyJ1ZCwtiRg but first look the other one
<bslsk05> ​'Unix50 - Unix Today and Tomorrow: The Kernel' by Nokia Bell Labs (00:51:33)
<mjg> heat: 2. i'm patching it up to address different sizes
<heat> you obviously need a more realistic bench
<heat> I propose: kernel building
<heat> the classic
<mjg> that wont see a diff
ss4 has quit [Quit: Leaving]
<heat> ah, it doesn't matter then
<mjg> at least i don't expect it to
<heat> checkmate
<mjg> no
<mjg> it wont because of utter single-threaded slowness all over
<mjg> which masks any difference in this area
<geist> this is functionally a single threaded machine
<geist> actually the performance numbers are all over the place here
<geist> looking at it running its 5 runs
<geist> very inconsistent between runs
<mjg> hence the script
<geist> this is with the script
<mjg> if you get total noise with the script
<mjg> that's real bad
<mjg> i would make sure to kill off any possible source
<mjg> cron
<geist> ooooh it's because it's causing the machine to swap out
<mjg> sendmail
<mjg> lol
<mjg> haha
<heat> hahahaha
<geist> how much memory are you trying to map here?
<mjg> geist: how much ram do you have there
<heat> 128MiB
<mjg> me?
<geist> 1GB
<mjg> that's the default from there: 128 mb
<geist> but for some reason it's using a shitton more here
<mjg> i think we can safely drop it to 32
<geist> oh i see why. a local fix here causes it not to unmap the old memory
<mjg> it busts all the cases anyway
<mjg> lol
<gog> hi
<geist> because i c += pgsize
<heat> gog, hello gog.com
<gog> welcome
<gog> to zombocom
<heat> welcome to gogcom
<heat> pbx, hrmm what is it using to draw to the fb?
<heat> /dev/fb0 or something stupid like that?
poyking16 has quit [Quit: WeeChat 3.5]
<geist> okay, now with it locally fixed i'm seeing extremely consistent results: nothing matters
<geist> all three are basically the same speed
<mjg> for clines 128?
<geist> yeah, maybe about 2% slower for the sse
<geist> 102k vs 106k
<mjg> can you only access half
<mjg> 64 i mean
<geist> lemme tweak it to 64
<geist> 64 lines of 64 now
<mjg> that's a full page
<geist> same numbers it seems: about 102k for SSE, 107k for the others
<geist> (using 64 byte cache lines)
<geist> ah you mean half the page, sure
<mjg> and last one: change mmapped area to 2 pages
<mjg> i promise to stop after :)
<geist> at half the page..... it's about 118k for sse, 112k/110k for the others
<geist> so based on this super simple test case: at worst the SSE stuff is a little slower, but as you touch less and less of the page in subsequent usage, it becomes more of a win
<geist> maybe the takeaway is NT clearing isn't bad because cpus are very very good at filling cache lines on subsequent touchings, if they are
<mjg> do 2 pages man
<mjg> it is basically expected the less of the area you touch the better it is for nt
<geist> yeah i'm doing it now
<geist> and you're right
<mjg> the question is what real workloads are doing, but as pmcstat does not work, we can't test it on this sucker :/
<mjg> if you can be arsed some other time, perhaps plopping linux in there would help answer it
<mjg> :)
<mjg> if their profiling tooling, whatever it was at the time, worked
<geist> 102k for sse....96k/94k
<mjg> for 2 page area?
<geist> yes
<mjg> huh
<mjg> not what i expected
<heat> lmao
<mjg> can you paste the final patched prog somewhere?
<geist> fine.
<geist> my interest in this has crossed zero and is starting to go deeply negative now
<mjg> ye just give me the prog and i'm off your back
<mjg> as promised
<heat> i think you're trying to find a difference where there isn't any
<bslsk05> ​pastebin.com: p4f# cat page_fault4.c #include <unistd.h>#include <stdlib.h>#include <sys - Pastebin.com
<mjg> thanks
<geist> yah seems that its basically 'works pretty darn good'
<geist> or 'there's no magic bullet, but on the average this is not too bad and trashes less cache'
<mjg> geist: well thanks for testing. i would return the favor if i had the hw :-P
<bslsk05> ​reviews.llvm.org: ⚙ D92465 [PowerPC] Implement intrinsic for DARN instruction
<heat> DARN
<geist> okay, going to go do something else for a while
<mjg> > Instruction darn was introduced in ISA 3.0. It means 'Deliver A Random Number'.
<mjg> heh
<heat> this is on purpose now
<heat> if eieio wasn't, this must be
<sham1> > Darn Daniel
<sham1> Death by cringe
<heat> it's very 2015
<mjg> sham1: this is about the least old meme i can reference
<sham1> Oh I bet
<mjg> well i know about gigachad!
<mjg> which is me
<heat> it's cantrill
<mjg> oh right
<mjg> back to that one
<heat> the giga-est of the chads
<bslsk05> ​queue.acm.org: Real-World Concurrency - ACM Queue
<mjg> read this. it's basically solaris smp bag of tricks and if you think that's any good, we need to have a serious convo
<heat> not right now
<heat> but skimming through it, most things seem sensible
<mjg> > Hash tables are common data structures in performance-critical systems software, and sometimes they must be accessed in parallel. In this case, adding a lock to each hash chain, with the per-chain lock held while readers or writers iterate over the chain, seems straightforward.
<mjg> you already don't scale
netbsduser has quit [Remote host closed the connection]
<mjg> for read locking a chain
<mjg> it was already true at scales solaris claimed to operate at the time
<heat> huh?
<mjg> huh what
<heat> what's the problem here?
<heat> lock per chain
<mrvn> mjg: hashtables are supposed to not have collisions. So chains should be short (or you resize) and few cores would lock the same chain.
<heat> seems simple
<heat> and effective
<mrvn> just don't access the same item on multiple cores.
<mjg> we already had this convo man
<mrvn> mjg: The big problem with hashtables is how to resize them.
<mjg> you spawn n threads, each of which looks up foo/bar/baz/quux{$thread_id}
<mjg> if you employ rcu or an equivalent and skip hash locking you scale perfectly
<mjg> otherwise all of them bounce lines on their way to the final component
<heat> but this is 2008 and no one had rcu
<heat> except... lunix
<heat> freebsd got rcu like 3 years ago
<mjg> then maybe you can concede solaris not having it could not scale
<mrvn> except when each thread deletes foo/bar/baz/quux{$thread_id} you end up with O(n^2) work
<heat> it scales worse
<mjg> mrvn: we also had this convo man
<mrvn> except when each thread deletes foo/bar/baz/quux{$thread_id} you end up with O(n^3) work, I mean
<mjg> mrvn: *modification* do write lock
<heat> but it still scales
<mjg> mrvn: so no, no O(n^2)
<heat> get a bigger hashtable, use read-write locks, etc
<mrvn> mjg: can't do write locks, that means you need read locks and your rcu is pointless
<heat> (and a good hash function)
<mjg> mrvn: no
<mjg> mrvn: again we had this convo
<mrvn> mjg: and you are still wrong
<mjg> mrvn: dude it's literally how linux and freebsd do it
<mjg> mrvn: write lock for changes, lockless for lookups
<mjg> on the same chain
<mjg> now you are telling me this does not work
<mrvn> no, I'm telling you you are missing something
<mjg> heat: no amount of hash resizing or hashing func changes is going to change the fact that foo/bar/baz/quux{$thread_id} lookups all visit the same 3 elements
<heat> mjg, yeah but erm, this is not specifically for paths?
<heat> they're talking about hashtables
<mjg> what?
<heat> and they said is sensible
<mjg> so for example their name cache is hash table
<mjg> and when you have to lock stuff to access it you run into the above?
<mjg> see zfs_lookup and higher
<mjg> they happen to have a mutex instead of a rw lock for chains, but the same problem would exist
<mjg> you fundamentally can't jump over the fact that rcu-like lookup of foo/bar/baz/quux{$thread_id} bounces nothing
<mjg> while read-write locking bounces *twice* for the first 3 components
<mrvn> Is there such a thing as a timed lock? You can acquire it for N ticks and then it reverts to unlocked. And you can check if it's already locked for >= M ticks and skip locking.
<mjg> so 6 times in total
<mjg> multiply that by n workers and you are fucked
<mrvn> mjg: for me N == 4 - 16 so totally not a problem.
y0m0n has joined #osdev
Burgundy has quit [Ping timeout: 252 seconds]
<mrvn> mjg: FYI your example is pretty bad. If you have threads using foo/bar/baz/quux{$thread_id} then first they have to create it, which is O(n^3) with RCU (which is why you need the write lock). Then they would operate on the FD they opened so no path lookup and then cleanup in O(n^3) with RCU again.
<mjg> dude
<mjg> let's agree to disagree
<mrvn> mjg: better example would be N gcc all parsing #include <stdio.h>
<mrvn> N parallel path lookups and no modifications.
<epony> it does not scale linearly without software rework
<mrvn> epony: ???
<Matt|home> memory stack is a cpu hardware feature correct?
<mrvn> mjg: what's a stack? my cpu has no stack
<mrvn> Matt|home: ^^
<Matt|home> i ask because im thinking if it's _actually_ a bad idea to treat the entire memory space as virtual addressing and only using heap memory storage..
<bslsk05> ​en.wikipedia.org: Amdahl's law - Wikipedia
<Matt|home> if that word salad made sense
<mrvn> Matt|home: like haskell does?
<epony> also you have concurrency on cache lines
<Matt|home> not familiar with haskell
<mrvn> Matt|home: doesn't really matter. Point is that it allocated "stack" frames on the heap
<Matt|home> yeah basically. the _main_ reason im thinking about it is because of an off-hand comment made earlier about the C compiler, where "eventually you'll stop thinking about memory in terms of stack and heap". like.. memory segmentation is a hardware feature it's not an abstraction
<Matt|home> so how dumb would it be for a system to act that way
<Matt|home> or am i thinking of something very dumb here..
<Matt|home> eh im thinking of something very dumb, nevermind ignore me
<mrvn> Matt|home: any language with callcc basically has to do it that way.
<mrvn> and it probably only makes sense for GC languages
<Matt|home> yeh
<mrvn> In something like C, every function descends until it reaches a leaf and then returns, so a stack makes a lot of sense.
<mrvn> the heap probably makes the least sense. You really don't want a classic heap in a modern OS
<mrvn> brk/sbrk is just plain bad.
<bslsk05> ​en.wikipedia.org: Page (computer memory) - Wikipedia
<heat> brk kinda just works
<heat> particularly in dlmalloc designs
<mrvn> heat: that's not a classic heap.
<bslsk05> ​en.wikipedia.org: Memory management (operating systems) - Wikipedia
<AmyMalik> should I write my own malloc?
<heat> yes it is?
<heat> AmyMalik, sure, you'll learn a lot
<AmyMalik> should I do so, on an existing UNIX?
Burgundy has joined #osdev
<bslsk05> ​en.wikipedia.org: C dynamic memory allocation - Wikipedia
<mrvn> heat: you are using dlmalloc to improve the classic heap
<mrvn> epony: please stop pasting random links
<heat> the "classic heap" always had something on top of it
<heat> K&R malloc isn't just sbrk(size)
<AmyMalik> mrvn, I have epony on ignore.
<epony> also, make sure you actually target some HW for your kernel, just being an application is less demanding on your designs
y0m0n has quit [Ping timeout: 248 seconds]
<kazinsal> he should be on +b
<AmyMalik> true.
<mrvn> heat: most importantly dlmalloc potentially uses mmap
<heat> it heavily prefers sbrk
<AmyMalik> on FreeBSD, sbrk doesn't exist on arm64 or riscv.
<heat> huh
<heat> weird
<AmyMalik> how?
<AmyMalik> The brk() and sbrk() functions are legacy interfaces from before the advent of modern virtual memory management. They are deprecated and not present on the arm64 or
<AmyMalik> riscv architectures. The mmap(2) interface should be used to allocate pages instead.
<heat> how what?
DanDan has quit [Ping timeout: 252 seconds]
<AmyMalik> how is it weird?
<heat> it's a fairly simple syscall that may see some usage, particularly on some mallocs
civa has joined #osdev
<heat> but I guess freebsd just uses jemalloc
<mrvn> most programs just use libc and don't bake their own malloc/free so that isn't a big problem.
<AmyMalik> yeah, FreeBSD does use JEMalloc
<heat> it's also trivially emulatable in user-space using mmap + mremap
<bslsk05> ​en.wikipedia.org: CPU cache - Wikipedia
<mrvn> Does FreeBSD libc implement (s)brk?
<heat> AmyMalik, re: reimplementing malloc, sure?
<heat> you can do it wherever you prefer
<heat> your OS, an existing one, etc
<AmyMalik> right on
<mrvn> AmyMalik: have you planted a tree, built a house and walked a road yet?
<mrvn> wrote an editor?
<heat> usually if it's a userspace memory allocator I would prefer doing it in Linux
<AmyMalik> no, no, no and probably not
<heat> because it's easier to debug if things go poopy
<heat> which they inevitably will
<heat> not quite as easy to debug your own OS :)
<heat> (yet!)
<AmyMalik> hm
<AmyMalik> I don't have my own OS yet
<epony> quote of the week: -"I worked in a number of high profiled failures.." -"You're fired."
dude12312414 has joined #osdev
terminalpusher has quit [Remote host closed the connection]
frkzoid has quit [Ping timeout: 260 seconds]
<Matt|home> i'll look at it later but what's the basic idea behind writing your own malloc implementation on an existing system? each program has its own memory space, is the idea just to use what constraints you're given or are you expected to make some system calls or what
<Matt|home> or is it just an exercise that you can design however you want
<heat> you're obviously expected to make syscalls
<heat> but there's a lot to writing a malloc
<heat> algorithms, etc
Burgundy has left #osdev [#osdev]
Gooberpatrol66 has quit [Quit: Leaving]
DanDan has joined #osdev
wand has quit [Ping timeout: 258 seconds]
SpikeHeron has quit [Quit: WeeChat 3.6]