klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
<heat> i would say designing is also super not important as a beginner
<heat> you have jack shit of an idea of how an OS works, designing now isn't important, doing is
<klange> A poorly designed OS that exists is more useful than a well designed OS that doesn't, aye.
<zzo38> Who assigns the instruction set numbers for ELF, and are there numbers for the MMIX, VAX, and TRON instruction sets?
<heat> maybe for vax
<heat> *maybe*
gog has quit [Ping timeout: 268 seconds]
<heat> zzo38, the sysv abi people do btw
<zzo38> Who are they, and how would you ask them to add more if those instruction sets are ever going to be in use?
<bslsk05> ​groups.google.com: Generic System V Application Binary Interface - Google Groups
<klange> MMIX is 80. VAX is 75. What TRON instruction set are you talking about? BTRON ran on i386s, other TRONs run on ARM.
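For reference, the numbers klange quotes are just values of the 16-bit e_machine field in the ELF header. A minimal C sketch of checking that field, assuming the Elf64_Ehdr type and EM_* constants from a glibc-style <elf.h> (treat the exact constant names as an assumption, not a quote from the spec):

    #include <elf.h>    /* Elf64_Ehdr and EM_* constants (glibc-style) */
    #include <stdio.h>

    /* Map a few e_machine values to names; EM_VAX is 75 and EM_MMIX is 80,
     * matching the numbers mentioned above. */
    static const char *machine_name(unsigned m)
    {
        switch (m) {
        case EM_VAX:    return "DEC VAX";
        case EM_MMIX:   return "Knuth MMIX";
        case EM_X86_64: return "x86-64";
        default:        return "something else";
        }
    }

    int main(void)
    {
        Elf64_Ehdr hdr;
        /* Read an ELF header from stdin, e.g. ./a.out < /bin/ls */
        if (fread(&hdr, sizeof hdr, 1, stdin) == 1)
            printf("e_machine = %u (%s)\n",
                   (unsigned)hdr.e_machine, machine_name(hdr.e_machine));
        return 0;
    }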
zns has left #osdev [#osdev]
<zzo38> There is another TRON instruction set (even though TRON can also be implemented on other computers too; the C interface is presumably independent of the instruction set). I have some information but some files seem to be missing (and it is in Japanese).
<klange> Well I speak Japanese, so feel free to share.
<heat> i had no idea how much shit used ELF
<heat> they even have PDP10 and PDP11
<zzo38> Here is some of what I had found (I also made a backup on my own computer, using a program written in Ruby): http://web.archive.org/web/20080509195830id_/www.sakamura-lab.org/TRON/CHIP/top_index.htm
<bslsk05> web.archive.org: トロン仕様チップ 標準ハンドブック (TRON Specification Chip Standard Handbook)
<zzo38> I read Japanese a little bit but I am not very good at it.
<klange> The gist I get of this is that the TRONCHIP spec was not quite complete enough to be considered an ISA on its own, and while a few implementations were made in the late 80s, they were all very different.
<klange> There's some English info here about some of the implementations: https://www.cpushack.com/CPU/cpu4.html
<bslsk05> ​www.cpushack.com: Great Microprocessors of the Past and Present (V 13.4.0)
<klange> Particularly the Toshiba Tx1 (which I can't seem to find any useful information on in any language outside of brief mentions) and the Gmicro/200.
<klange> I can at least find a picture of a Hitachi-manufactured Gmicro/200.
FreeFull has quit []
SpikeHeron has joined #osdev
<zzo38> I have heard of those but cannot find the full information about them, such as the full instruction set information. I would still want to see how it works even though it is no longer in use; I have some interest in older computer designs, including that one.
<bslsk05> web.archive.org: 第二部 トロン仕様チップ命令詳細 (Part 2: TRON Specification Chip Instruction Details)
<bslsk05> web.archive.org: 第二部 第ニ章 転送命令 (Part 2, Chapter 2: Transfer Instructions)
<bslsk05> web.archive.org: 第二部 第三章 比較命令 (Part 2, Chapter 3: Comparison Instructions)
<bslsk05> web.archive.org: 第二部 第四章 算術演算命令 (Part 2, Chapter 4: Arithmetic Instructions)
<bslsk05> web.archive.org: 第二部 第五章 論理演算命令 (Part 2, Chapter 5: Logical Operation Instructions)
<klange> Shifts, bit-ops, special instructions for bitfields, decimal operations, string operations, queue operations, jumps, a couple of locked bit set/clear instructions that are listed in a section titled "Multiprocessor Operations"...
<klange> The actual machine encoding is all in images, but they seem to be archived on the handful of instructions I actually clicked through to.
<zzo38> I found that many of the pictures seem to be missing
<klange> You might be able to find a copy of the print version of this somewhere... https://www.kinokuniya.co.jp/f/dsg-01-9784893620842
<bslsk05> www.kinokuniya.co.jp: トロン仕様チップ標準ハンドブック (TRON Specification Chip Standard Handbook) - 紀伊國屋書店 (Kinokuniya) web store | online bookstore | books, magazines, e-books
<zzo38> I am not very good at Japanese, though. I have been trying to translate some of the TRON documentation into English, although I am not very good at it. Still, I have been able to figure out a few things about the file format, file system, system calls, and a few other things, but not very well, and I probably have some things wrong.
<klange> Much of the documentation, even in Japanese, seems to just be lost, and what little remains describes only a watered down embedded RTOS kernel that is the only surviving aspect of the project.
<zzo38> Yes, that is what I had found too, but I had wanted to find the older documents too.
<heat> klange, have you been following the recent python bignum problem?
<Mutabah> Ooh, I smell drama and/or interesting technical problems
<heat> there was a CVE when parsing crazy large bignums, the python devs silently added a limit and an arcane call you could make to disable it or change the limit, a bunch of scientific/math libraries that relied on super large bignums are now upset and python broke everything
<Mutabah> Oof
<Griwes> amazing
<Griwes> wonder if there'll be ripples of this at work somehow
<bslsk05> ​lwn.net: A Python security fix breaks (some) bignums [LWN.net]
<Griwes> > a two-year-old CVE
<Griwes> *really*?
<heat> I really enjoy the vibe of "algorithm is O(n^2)? just limit n lol"
<klange> heat: very closely, and a friend in another chatroom is a core dev who had some choice words about the way things went down
<klange> the introduction of _a_ limit is fine, everything else about this situation is ass
<klange> the limit is a global, and thus not thread safe; the defaults are hilarious (the problem only really becomes evident around the 100k digit mark and they set the default limit to _4000_ish.)
<klange> there were _several_ proposals for improved conversion algorithms that would have avoided the issue entirely, eg. get things to a point where a reasonable limit would have approached "you're running out of RAM anyway" levels.
<heat> yeah someone in lwn's comments mentioned GMP does it with almost O(N)
<klange> the vast majority of the quoted 'attack vectors' should have been fixed directly
* Mutabah nukes popcorn
<Griwes> the fact that this is the solution people went with after having *two goddamn years* to figure it out is just *chef's kiss*
<klange> eg., parsing HTTP headers was a common one - most web servers limit the total size of headers... to sizes like 8K or 4K - less than the limit that was introduced.
<klange> Or the JSON parsing, but that's completely broken because JSON doesn't even have integers.
<clever> that reminds me of a nasty bug in haskell
<clever> the json parsing library, stuffs everything into a hashmap before doing further conversions
<clever> an attacker can intentionally create hash collisions on the json keys
<clever> and with a large enough json object, entirely negate the point of a hashmap and leave you cpu bound for minutes
elderK has joined #osdev
zzo38 has left #osdev [#osdev]
rcvalle has joined #osdev
terrorjack has quit [Quit: The Lounge - https://thelounge.chat]
terrorjack has joined #osdev
[itchyjunk] has quit [Remote host closed the connection]
heat has quit [Ping timeout: 260 seconds]
scoobydoob has joined #osdev
scoobydoo has quit [Ping timeout: 250 seconds]
scoobydoob is now known as scoobydoo
zaquest has quit [Remote host closed the connection]
zaquest has joined #osdev
elderK has quit [Quit: Connection closed for inactivity]
jjuran has quit [Quit: Killing Colloquy first, before it kills me…]
jjuran has joined #osdev
scoobydoob has joined #osdev
scoobydoo has quit [Ping timeout: 268 seconds]
scoobydoob is now known as scoobydoo
scaleww has joined #osdev
scaleww has quit [Remote host closed the connection]
scaleww has joined #osdev
junon has quit [Quit: You have been kicked for being idle]
gog has joined #osdev
gog has quit [Client Quit]
sprock has quit [Ping timeout: 252 seconds]
sprock has joined #osdev
heat has joined #osdev
GeDaMo has joined #osdev
_xor has joined #osdev
Matt|home has joined #osdev
<Matt|home> o\
<GeDaMo> Are you saluting? :|
<Matt|home> i am waving hello.
<GeDaMo> Hello :P
<Matt|home> dumb question.. in general what is the minimum info you'd need a process to have for your kernel to be able to handle it
<Matt|home> ID, size, etc
<zid> its memory and its registers
<Matt|home> that's it? that's not so bad
<zid> well what else do you need? That's what a process is
<GeDaMo> A stack maybe?
<zid> that's its memory (processes also don't actually need stacks)
<Matt|home> im looking at linux's process attributes. there's like 60
<zid> yea well linux has fancy things like files
<GeDaMo> Permissions and suchlike
<zid> note how all you get in userspace is an 'int fd', for each file, all the tracking info is in the kernel
<zid> so the kernel needs to attach it to the process
<heat> Matt|home, linux does a lot more than "minimum" ;)
<bslsk05> ​github.com: Onyx/process.h at master · heatd/Onyx · GitHub
<heat> mine also has a lot
<heat> file stuff, signal stuff, memory stuff, threading stuff, timing, etc
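A minimal sketch in C of what zid is describing, with hypothetical names and an x86-64-ish register set (nothing here is taken from Linux or from heat's Onyx header; it is just the "memory plus registers" core, with everything else bolted on later):

    #include <stdint.h>

    /* Hypothetical saved CPU state for one thread of execution (x86-64-ish). */
    struct regs {
        uint64_t rip, rsp, rflags;
        uint64_t gpr[15];          /* rax..r15, minus rsp */
    };

    /* Hypothetical bare-minimum process: the memory it owns and the registers
     * it runs with.  Files, signals, credentials, timers and the other ~60
     * Linux attributes would all hang off a structure like this. */
    struct process {
        int         pid;           /* convenient, but even this is optional */
        struct regs regs;          /* CPU state while not running */
        uintptr_t   page_table;    /* root of its address space, e.g. a CR3 value */
    };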
<Matt|home> jja
Maja[m] has joined #osdev
<Maja[m]> hmm, what happens when CS has RPL=3 and DS has RPL=0?
<zid> assuming there's nothing stopping it, it'd just mean cs fetches can't happen from 0 and ds can
<zid> so jmp beef would fault, but mov eax, [beef] would not
<zid> although I may be confusing rpl and cpl and dpl, I can't remember the differences
[itchyjunk] has joined #osdev
frkzoid has quit [Ping timeout: 244 seconds]
lentement has joined #osdev
lentement has quit [Remote host closed the connection]
<Maja[m]> from the definition I've read I'd assume CPL = CS.RPL
<Maja[m]> (RPL is the low bits of the selector, DPL is in the GDT)
<Maja[m]> and IIUC RPL <> DPL would cause a fault unless the segment is marked conforming?
lentement has joined #osdev
dude12312414 has joined #osdev
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
<Maja[m]> hmm though if the segment registers each control it separately then can we really have "current privilege level" as a well-defined concept?
<zid> It's just what's loaded into cs
<zid> that is to say, the cached half
<zid> not the literal number
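For reference, the three levels being juggled here: RPL is bits 0-1 of the 16-bit selector value, DPL lives in the descriptor in the GDT/LDT, and CPL is in effect the RPL of the selector currently loaded (and cached) in CS. A small C sketch of decoding a selector (the struct and function names are made up):

    #include <stdint.h>

    /* x86 segment selector layout:
     *   bits 0-1: RPL (requested privilege level)
     *   bit  2  : TI  (0 = GDT, 1 = LDT)
     *   bits 3+ : descriptor table index
     * CPL is the RPL of the selector held in CS, so a typical user-mode CS
     * of 0x1b decodes as GDT index 3 with RPL 3. */
    struct selector {
        unsigned rpl;
        unsigned ti;
        unsigned index;
    };

    static struct selector decode_selector(uint16_t sel)
    {
        struct selector s;
        s.rpl   = sel & 0x3;
        s.ti    = (sel >> 2) & 0x1;
        s.index = sel >> 3;
        return s;
    }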
lentement has quit [Remote host closed the connection]
Ermine has quit [Quit: Cya!]
Ermine has joined #osdev
archenoth has joined #osdev
lentement has joined #osdev
lentement has quit [Ping timeout: 244 seconds]
freakazoid332 has joined #osdev
vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
rpnx has quit [Ping timeout: 244 seconds]
rpnx has joined #osdev
nyah has joined #osdev
epony has quit [Remote host closed the connection]
freakazoid332 has quit [Ping timeout: 244 seconds]
xenos1984 has quit [Ping timeout: 260 seconds]
xenos1984 has joined #osdev
frkzoid has joined #osdev
<LittleFox> <zid> yea well linux has fancy things like files
<LittleFox> that's a nice quote out of context :D
opal has quit [Remote host closed the connection]
opal has joined #osdev
<zid> It's context-free
<zid> files are fancy
<LittleFox> I mean, yeah
<LittleFox> so, an OS knowing about files is fancy then? :)
<zid> yup
<LittleFox> nice
<zid> makes you more fancy than android and ios for example :P
<LittleFox> there is a LF OS branch that can already be called fancy :3
<LittleFox> x)
<mjg_> directories == next level
<pitust> zid: android and ios do have files, and both have a file system provider thingy for apps
scaleww has quit [Quit: Leaving]
<zid> pitust: it's partially a joke, but also kinda true in practice
<zid> phone UIs aren't file based, they just hide everything in what would be AppData/Roaming on windows, or ~/.blah/ on linux
<zid> and give you a gallery, or a contact list, etc
xenos1984 has quit [Ping timeout: 240 seconds]
pretty_dumm_guy has joined #osdev
xenos1984 has joined #osdev
FreeFull has joined #osdev
FreeFull has quit [Ping timeout: 260 seconds]
frkzoid has quit [Ping timeout: 268 seconds]
FreeFull has joined #osdev
netbsduser has joined #osdev
skipwich has quit [Ping timeout: 265 seconds]
nyah has quit [Quit: leaving]
freakazoid332 has joined #osdev
nyah has joined #osdev
pretty_dumm_guy has quit [Quit: WeeChat 3.5]
saltd has joined #osdev
skipwich has joined #osdev
joe9 has joined #osdev
koolazer has joined #osdev
<geist> yah i'm always amused to see how oses like that abstract fses, or more specifically how objects do or dont map to an underlying fs
<mjg_> interestingly zfs internally is an object storage with posix layer built on top
<mjg_> i don't know if it was a good idea, i do know it comes with funny code needed to access it
<geist> yeah i remember that. i assume there's a layer api that lets you just get direct access to an underlying zfs object
<geist> figure a DB might find that useful or something
<geist> OTOH i guess a file in some sort of O_DIRECT is probably basically the same thing
<mjg_> if memory serves the code was pretty weird doing strlen and whatnot
vin has joined #osdev
<vin> When did a block access become "4096" B? Is there any historical significance to this?
<zid> It's a power of two of a good size, and happens to also be the same as the block size of the MMU, so it makes OS design 83943x easier
<vin> I mean why was the block size of MMU set to 4096?
<GeDaMo> 12 bits something something PDP?
<zid> <zid> It's a power of two of a good size
<vin> And not 2048 or 1024 etc
<zid> 2k is really small and overheady, 8k is fairly big and quite fragmenty
<zid> It's just suitable for the scale of what we use memory for
<vin> zid: I am trying to understand why you call 2k small and 8k fragmenty. What decides this?
<zid> the scale of what we use memory for
<vin> But why? Is it because of some hardware that we chose 4096? Like GeDaMo suggests
<zid> If computers *only* dealt with 1MB aligned records, 4kB would not be the page size.
<vin> Yup why is the page size 4k is my question.
<zid> I keep telling you
<zid> over and over
<zid> same reason there's 8 bits in a byte, it's a good size that doesn't waste too many bits, and isn't too small to fit common things into
<vin> Is there a paper/manual this was first introduced?
<zid> some machines used more, some machines used less, 8 is a nice middleground
<zid> that suits almost all applications
<vin> Of course I get your point zid I am just trying to figure out where it started
<vin> who/which machine introduced it
<zid> Probably tens of different ones independently, in practice
<geist> vin: also it's mostly that most of the architectures that used something other than 4k aren't around anymore
<zid> first isn't going to be that useful
<geist> there are plenty of examples of non 4k mmu pages in various dead architectures
xenos1984 has quit [Read error: Connection reset by peer]
<zid> It isn't really an invention, it's a compromise based on circumstance
<geist> so in x86 it was 4K because that's what they chose. though to be fair it nicely works (12 bits + 10 + 10 == 32bits). nice clean 2 level page table
<vin> zid: who made that compromise first?
<vin> geist: so x86 introduced it?
<zid> I don't know and it's not relevant
<geist> more like x86 chose 4k, and x86 is still around
<geist> vs a bunch of arches that chose something else that aren't around
<geist> or had more flexible mmus where the page size could be variable from the get-go
<zid> (or you just don't care about)
<zid> because they're more niche
<zid> and have niche page sizes
<geist> yah and more modern arches (arm64 and riscv in particular) basically do whatever x86 does because it's the defacto standard
<pitust> not really
<pitust> arm64 does 4k, 16k and 64k
<pitust> and some may or may not be implemented afaik
<geist> sure, but it also does 4k
<pitust> geist: iPhones don't do 4k pages, like at all
<geist> and yes i know arm64 does a lot more etc etc but it also uses basically the same design, etc
<geist> pitust: that's a software choice
<vin> Byte addressable SSDs are around the corner, which made me wonder about the history of block-based devices. I was not sure if it is a NAND property that forces 4K worth of cells to be activated at once or if it's the other way around. I know hard disks have 512B sectors, so one could have built a 512B page system
<zid> iphones also don't use files, so it makes more sense :P
<GeDaMo> https://en.wikipedia.org/wiki/Atlas_(computer) "The page size was 512 words, i.e. 3072 bytes."
<bslsk05> ​en.wikipedia.org: Atlas (computer) - Wikipedia
<geist> so VAX, for example, used 512 byte pages
<zid> when the only thing you're allocating memory for is javascript and images, 4k is a little small
<geist> and 68k had a few MMUs, but they used a more variable size (at least the 030 and 040 ones)
<pitust> geist: no, on iPhones you physically cannot set a page size thats not 16k (at least to my knowledge)
<mjg_> 3072 byte page?
* mjg_ gets all iffy about non-power-of-2
<geist> pitust: M1 cpus allow 4k and 16k. so either they explicit remove that feature in A15 and A16, or it's there but it's unused
<geist> mjg_: power of 2
<geist> but since vax and 68k were very influential in the design of lots of unices and whatnot in the 70s and 80s they were definitely an exposed thing
<pitust> it's more like explicitly adding 4k pages for the m1
<geist> pitust: sure. okay.
<vin> SSDs offer 512B granularity but don't perform well at scale when used that way. So the block size is more storage device driven rather than MMU?
<geist> but other arches at the time like PPC, mips, alpha were basically all 4K AFAIK. though MIPS had this weird twin page design that i think made it 8k
<geist> and alpha might have been 8k as well?
<kof123> <reads backlog> i will just say years ago i wanted to see why "8 bits in a byte" (not contradicting this) ...ibm wanted to be compatible with someone else. whatever machine it was, the engineer wanted a 24 bit word or something, since that was a good number so [the same registers?] could hold floats of some kind too. but he was told no, and it was either that or no machine, so he went along with it. this does not deny anything; the important part of this story is that ibm wanted to be sort of "compatible" with some competitor? wow
<kof123> "whatever machine it was" then others followed this supposedly
<kof123> i dont recall details, or what that "someone else" was
<geist> yah it was fairly common to have 9 bit derivatives at the time. also where a lot of use of octal came into play iirc
<geist> since multiples of 9 (or 6) work nice for 3-bit-per-digit octal
<zid> 7 was de-facto for a long time I thought at IBM
<vin> so 4k page size is more to do with compatibility on modern hardware than efficiency?
<geist> vin: yes
<geist> there's generally very little reason to have the OS build a VM around a page size that's not the same as hardware
<vin> http://herpolhode.com/rob/utah2000.pdf Why didn't someone write a paper on architecture research then :D
<geist> that being said i'm starting to think that 4k is becoming too small for modern machines, a reason why some systems are playing with larger sizes (like apple and 16k)
<mjg_> 16k may be a little overkill though
<geist> it's a tradeoff since you probaby waste a bit more memory due to internal fragmentation
<vin> Isn't large pages only to reduce TLB misses?
<geist> but then there's less pages, less TLB pressure, etc
<zid> 4k being relatively small is kinda resolved by 2M pages though
<mjg_> vin: tlb misses is part of the story
<geist> well i wouldn't say only
<mjg_> key point being you have way fewer pages to shuffle around
<bslsk05> ​en.wikipedia.org: Page (computer memory) - Wikipedia
<mjg_> think aging with CLOCK or other algos
<zid> there's not *many* loads where you get rekt by not having 16kB pages where you can't just use 2M instead, I imagine
<geist> exactly, less general page faults, accessed/dirty bit checks, etc
<mjg_> less lock contention
<mjg_> it's all less work, except possibly in exchange for more memory waste
<geist> yah
<vin> Right, but there is this drastic change in storage where "block" size is shrinking from 4k to 512b and recently to 256B (optane)
<mjg_> and we already got browsers to do the latter ;)
<geist> and a single page still fits within an average L1 cache (or at least on apple cpus which have a monstrous L1d)
<vin> This will be pushed further next year
<mjg_> i don't know squat about storage
<geist> so you'd want to at least be able to do a page to page copy without completely spilling the L1, i'd assume
<mjg_> why are they shrinking it?
gog has joined #osdev
<GeDaMo> Hasn't Optane been dropped?
<mjg_> geist: in a better world hw would understand common things like 'zero the page out' or 'copy it'
<zid> optane is sold off yea
<geist> mjg_: nod. i think apple cpus are known to have hardware assist instructions for page compression/decompression
<geist> so it's possible they have exactly that
<vin> Because the interfaces support it; CXL will be byte addressable. The point is to reduce read amplification and move away from old data structures that force you to do sequential accesses. mjg_
<mjg_> geist: oh? nice
<geist> though arm cpus already have a clzero style thing
xenos1984 has joined #osdev
<saltd> (*  ̄3)(ε ̄ *)
<gog> hi
<saltd> ror gog,0xc4
<vin> So it looks like no one knows how 4k came into existence!
<mjg_> i don't, but if i really wanted to find out
<mjg_> i would check old literature concerning cpu design
<vin> i will try that mjg_
<geist> vin: what do you mean? i think we answered it a bunch
<geist> it's just one of many sizes and it's simply the size that you are most familiar with, probably because you're mostly using an architecture that uses 4K pages
<geist> could be 2K, could be 512, could be 8K, etc. but x86 picked 4K because it is probably a reasonable compromise for the time, and continues to be fairly reasonable
<geist> and x86 at least is fairly strict on the page size so you can't easily switch to another size
<zid> and as I said, it's not an invention, it's a compromise, relying on your current memory size, expected workload, etc
<geist> (unlike some other architectures)
<vin> geist: I know you said x86 used it because someone else did or because it exists in the present day. But I am looking for something concrete where an architecture played around with different page sizes and evaluated why 4k makes the most sense. We all know why 4k is good but who/which architecture showed it first
* geist shrugs
divine has quit [Ping timeout: 264 seconds]
<zid> It absolutely does not matter who used it first.
<geist> probably some mainframes. paging as a general solution didn't exist for a tremendously long time before x86 came along. i think the first paging machines were mid to late 60s, mainframes
<geist> dunno what sizes they chose
<zid> It is not an invention, it's the result of various compromises.
<gog> didn't vax have 4k pages?
<j`ey> pitust: seems like some iphone page size support was a bit random https://twitter.com/never_released/status/1334199003991445506
<bslsk05> ​twitter: <never_released> What does hardware in the wild actually implement? ␤ ␤ - Cortex-(A53/A57/A72): 4KB and 64KB pages ␤ - Cortex-A75 onwards: 4KB, 16KB and 64KB pages ␤ - NVIDIA Carmel: 4KB, 16KB and 64KB ␤ - Apple A7-A8: 4KB, 64KB ␤ - Apple A9-A12: 16KB, 64KB ␤ - Apple A14/M1: 4KB, 16KB ␤ ␤ No page size in common.
<geist> and like i said, VAX used 512 byte pages, which x86 borrowed a ton of design from
<kazinsal> I think the System/370 used 4K pages
<zid> You'd pick 4k on a much lower memory system if you were dealing with large records
<zid> and 4k on much higher memory systems if you were dealing with many small records
<geist> re: M1 I'm fairly certain the reason it bothered to implement 4K in hardware has everything to do with Rosetta 2 and running x86
<j`ey> yeah that's likely
<geist> since the normal ARM side of OSX is straight 16K native
<kazinsal> S/360's "page" size (not really a page, technically called a Storage Block) is 2048 bytes
<zid> I'd say x86's choice of 4k is actually very forward-looking, for the time they decided on it, and was probably fairly big. Which is why it's still not an issue.
FreeFull has quit []
<zid> we'd really be struggling on 2k pages, but it'd have been a lot more relevant to machines with 8MB of ram imo
<j`ey> geist: the IOMMU only supports 16K though! and linux doesnt support mismatched page_size vs iommu page_size yet
<vin> kazinsal: do you think these size decisions were cpu driven or storage driven? I.e. a size picked to work well with the storage device
<zid> neither, memory
<vin> :o
<zid> it's literally the size of the units in the memory subsystem
<zid> why would tape drives matter
<zid> or register size
<geist> yah and by the time x86 picked their page size it was probably fairly well established that 4K is about right
<zid> geist: imo they were premature, but in a good way
<zid> 4k is oversize slightly for a couple of megs of ram, which is why it still works for the tens of gigs we have now
<geist> VAX had been out for 8 years or so with 512, and i think it was a bit small by then. 68k had a 68851 mmu which i think started at 4K pages and it was the main thing 386 (where paging was added) competed with
<vin> zid: why would the memory subsystem have a 4k page limit? Isn't a row 64B that gets activated and not 4kB
<zid> but had they gone for 1k or 2k, we might have ditched that style of paging with x64 already and gone to 8k or 16k for it
<zid> vin: capacity, this is about how many times one goes into the other.
<geist> also 12 bits per page elegantly has a nice 2 level table for 32bits
<geist> 12 + 10 + 10
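Spelled out, that split is: bits 31-22 index the page directory, bits 21-12 index a page table, and bits 11-0 are the byte offset inside a 4 KiB page. A small C sketch (the macro names are made up):

    #include <stdint.h>

    /* Classic 32-bit x86 paging: 4 KiB pages, two-level tables.
     * virtual address = [ 10-bit PD index | 10-bit PT index | 12-bit offset ] */
    #define PAGE_SHIFT     12
    #define PAGE_SIZE      (1u << PAGE_SHIFT)               /* 4096 */
    #define PD_INDEX(v)    (((uint32_t)(v) >> 22) & 0x3ffu)
    #define PT_INDEX(v)    (((uint32_t)(v) >> 12) & 0x3ffu)
    #define PAGE_OFFSET(v) ((uint32_t)(v) & (PAGE_SIZE - 1))

    /* 1024 directory entries * 1024 table entries * 4096 bytes = 4 GiB,
     * so the 12 + 10 + 10 split exactly covers a 32-bit address space. */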
<zid> so it's swings and roundabouts, we're kinda stuck with 4k because they chose it very early and it's adequate for today even if 8k/16k might ultimately be better. If they had picked 2k we'd have already moved.
<geist> word.
<kazinsal> yeah, 4K is fine
<zid> I think 2MB pages are kinda.. a bit too big? I'd like 128kB pages for 'big' mappings I feel.
<zid> but it fits the scheme better
GeDaMo has quit [Quit: Physics -> Chemistry -> Biology -> Intelligence -> ???]
<geist> yah and arm has a nice solution for that with the 'contiguous' pages
<vin> With huge page accesses/page caches storage being more granular will not help anyway
<geist> Ryzen in theory has that, but it's apparently transparent
<zid> as in they internally collasce ranges or something?
<zid> coallesescance
<zid> for christmas can I have spellcheck?
<geist> yah. with arm64 it's more exposed: you put N entries back to back and set a bit that says they're part of a larger page
<geist> so you can easily mix say 4K and 64K pages
<zid> That seems very nice
<geist> yah it's a little ergonomically unfriendly, but seems pretty useful
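A rough sketch of what using that hint looks like, assuming the contiguous bit is bit 52 of the last-level descriptor and that a 4 KiB granule groups 16 adjacent entries into one 64 KiB TLB entry (both details are stated here as assumptions; check the ARM ARM before relying on them):

    #include <stdint.h>
    #include <stddef.h>

    #define PTE_CONTIGUOUS (1ull << 52)   /* assumed VMSAv8-64 contiguous-hint bit */
    #define CONTIG_ENTRIES 16             /* assumed group size for a 4 KiB granule */

    /* Mark 16 already-installed, adjacent 4 KiB mappings as one 64 KiB range.
     * The entries must map suitably aligned, physically contiguous memory with
     * identical attributes, otherwise the hint is architecturally invalid. */
    static void mark_contiguous(uint64_t *pt, size_t first)
    {
        for (size_t i = 0; i < CONTIG_ENTRIES; i++)
            pt[first + i] |= PTE_CONTIGUOUS;
    }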
<zid> I wonder about an mmu design that is weird and not a tree, and actually range based
<zid> like MTRRs or whatever
<zid> or the e820
<zid> I assume we have silicon to burn if we want to on the memory controller and can let it do really clever things to build its own arbitrarily shaped trees or whatever, and not have to rely on us shoehorning it into ugly trees using hacks
<vin> < zid: vin: capacity, this is about how many times one goes into the other. > Then storage devices having 4k/512B granularity is just orthogonal and has nothing to do with 4k memory page sizes?
<zid> yes
<vin> So hard to believe!
<zid> the same considerations apply though
<zid> so the answer is very likely to end up the same
<zid> people use 4k pages because the things they put in memory are roughly 4k shaped
<zid> people use 4k blocks because the things they put on disc are roughly 4k shaped
<zid> bigger means wasted storage where files are padded out to the page size, smaller means more operations / filesystem churn
FreeFull has joined #osdev
saltd has quit [Remote host closed the connection]
saltd has joined #osdev
<netbsduser> vin: you are right to have some disbelief
<netbsduser> the VAX chose a page size of 512 bytes because DEC thought that, to have a page cache, the size of a disk block and the size of a memory page must be equal
<mjg_> not a tragic idea
<mjg_> for the time period
k8yun has joined #osdev
<vin> netbsduser: Interesting, so at least VAX had 512B pages because the smallest access one could make to a disk was 512B. Is there a VAX manual or paper that describes this?
vin has quit [Remote host closed the connection]
duckworld has quit [*.net *.split]
duckworld has joined #osdev
vin has joined #osdev
duckworld has quit [*.net *.split]
duckworld has joined #osdev
duckworld has quit [Max SendQ exceeded]
duckworld has joined #osdev
<dh`> I have some doubts about _that_
<dh`> 512-byte blocks were not yet a fixed thing when the vax was invented
<dh`> I think
<kazinsal> DEC used 512 byte blocks for disks on the PDP-11
<dh`> anyway the other thing is, the default filesystem block size was 8K for a long time before limitations in linux ext2 made 4K preferable
<kazinsal> I think Unix used 512-byte blocks for dectape and magtape as well
<dh`> unix has 512-byte blocks baked into things like du(1), much to everyone's irritation
<dh`> but I _think_ that happened later also
<kazinsal> that's an early thing
<dh`> I very strongly suspect that if you look at v7 or earlier it'll just report the number of blocks, whatever size they happen to be
<kazinsal> that is the case, yes
<kazinsal> seems to be so as far back as v2
<kazinsal> v2 manual says the filesystem works in 512 byte blocks as well
<dh`> anyway my recollection of ms-dos in the 80s is that blocks were by no means necessarily 512 bytes
<dh`> but I can't remember for certain
<vin> dh`: With regards to linux ext2 limitations making 4k preferable, is there more I can read about it somewhere?
<immibis> is there really any difference between putting N pages back to back and marking them as consecutive, and doing that but not marking them as consecutive?
<immibis> other than TLB occupancy?
<mjg_> you will still know with 4k granularity which page(s) got dirtied
<mjg_> now that i said it, one huge page mapping is cheaper to install in the page table
<mjg_> it's less resources all around, modulo potential waste within these pages
<heat> vin, linux vfs requires block size <= page size due to how mpage IO and struct buffer_head's are laid out on top of struct page
<heat> you cannot mount an 8KB bsize ext2 fs in an x86_64 linux machine
<heat> because 4KB is the largest you can usually mount, it's the preferred one (large-ish blocks, compatible with most things)
<heat> realistically, ext2 uses 1024 for small filesystems, 2048 for medium-ish, 4096 for the rest
<heat> although you reach 4096-sized blocks like some gigs in
<heat> note that you could probably theoretically bypass the linux vfs restriction, but that probably makes things a good bit slower
<vin> yes heat I just read this answer https://stackoverflow.com/a/38309588 from ext2/3/4 maintainer
<bslsk05> ​stackoverflow.com: linux - Bypassing 4KB block size limitation on block layer/device - Stack Overflow
<heat> right now the common write path is that mpage stores a list of buffer_head (your "buffer cache", represents the block device's buffers/cache/wtv) in each struct page
<vin> He says it is possible by modifying the VM layer
<heat> yes sure
<heat> the mm and filesystem layers are intrinsically linked in linux
<heat> there's no mm conference, nor a fs conference for linux devs, but a mm & fs conference
<heat> tbf linux has received a substantial overhaul in the past few years to the page cache and IO in general, so maybe the situation has changed
<vin> So page size > block size is okay but not page size < block size
<vin> I don't quite understand why though
<heat> how can a page have 1 or more buffer_head's if the size of a block is larger than the size of a page
<heat> that's the big problem here. each buffer_head maps to a *single* page, each page has 1 or more buffer_head's
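The arithmetic behind that constraint, as a toy sketch rather than the real Linux structures: each cached page owns the buffers that describe its blocks, so the per-page buffer count is PAGE_SIZE / block_size, which degenerates to zero the moment a block is bigger than a page:

    #include <stddef.h>

    #define PAGE_SIZE 4096u

    struct page;

    /* Toy model of the relationship heat describes: every buffer belongs to
     * exactly one page, and a page carries one buffer per fs block it holds. */
    struct buffer {
        struct page  *owner;       /* the single page this buffer lives in */
        unsigned long block_nr;    /* which on-disk block it caches */
    };

    struct page {
        struct buffer *buffers;    /* blocks_per_page entries, or NULL */
    };

    static size_t blocks_per_page(size_t block_size)
    {
        /* 1024 -> 4, 2048 -> 2, 4096 -> 1 ... and 8192 -> 0: an 8 KiB block
         * simply cannot be described by buffers hanging off a single page. */
        return PAGE_SIZE / block_size;
    }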
FreeFull has quit [Ping timeout: 265 seconds]
<vin> I see
<heat> and this is just one example
<heat> the idea that a fs block cannot be larger than a page is something that is deeply embedded in a bunch of places in linux mm/fs
<vin> Super interesting!
<vin> So we will never know how much better HDDs could have performed if larger physical block sizes were supported, because no one put in the effort to make the vfs change.
<vin> Looks like reluctance to change might have expedited newer technologies (NAND flash)
<CompanionCube> geist: re: the original motivation for porting ZFS to linux was to run lustre over the object store layer
<mjg_> oh?
<mjg_> how is that working out?
<CompanionCube> well, since the early '10s they've got the 55PB sequoia supercomputer
<mjg_> but is it 1. on zfs 2. using the object store directly
<bslsk05> ​www.nextplatform.com: First Look at “Frontier” Supercomputer’s Storage Infrastructure
<CompanionCube> and on the lustre side: https://wiki.lustre.org/ZFS_OSD
<bslsk05> ​wiki.lustre.org: Just a moment...
<heat> klange, could you cut a new release of bim? bim 3 doesn't work with new kuroko as you may know
<klange> oh right, I should have tagged 3.1
<klange> gimme a minute to write up some release notes
<mjg_> CompanionCube: cool, thanks
<CompanionCube> related but distinct is https://blog.mayadata.io/openebs/zfs-based-cstor
<bslsk05> ​blog.mayadata.io: ZFS based cStor- the Storage Engine of OpenEBS (built on Kubernetes)
saltd has quit [Remote host closed the connection]
vin has quit [Remote host closed the connection]
k8yun has quit [Quit: Leaving]
<klange> heat: tagged 3.1, one major feature (initial regex search engine, it's an extended knuth system), handful of new highlighters, all the Kuroko C API changes, plus the C and Javascript highlighters are a lot faster now
<heat> thank klange
<klange> I had bumped the version for it a month ago, but I guess I just forgot?
<klange> Huh, it's also been long enough that `-q` is new to this release... Silliest option flag for an editor ever, it's "quiet mode", where it reads + highlights a file but doesn't print it and immediately exits. It's for performance testing highlighters.
<klange> In the early days of Kuroko-ified bim, it would take *multiple seconds* to highlight bim.c on my main workstation. Kuroko has improved enough that it now takes a quarter of a second.
[_] has joined #osdev
[itchyjunk] has quit [Ping timeout: 250 seconds]
netbsduser has quit [Remote host closed the connection]