<wikan>
well, i decided to write my own libfuse driver first, because i want my own filesystem
<ThinkT510>
I also like it when a language tries to improve itself based on lessons it learns trying to make an OS. here is one example in V: https://github.com/vlang/vinix
<bslsk05>
vlang/vinix - Vinix is an effort to write a modern, fast, and useful operating system in the V programming language (60 forks/915 stargazers/GPL-2.0)
<wikan>
thanks, i like your links ;)
<wikan>
never heard about v
<ThinkT510>
This one is rust specific but the author goes into a lot of generic background information: https://os.phil-opp.com/
<bslsk05>
os.phil-opp.com: Writing an OS in Rust
<wikan>
the best thing for me is as much information as possible about everything I have to master to be able to bring my own idea to life
<wikan>
i must collect puzzles first
<wikan>
because I have no idea, not even how to start, but the most important thing for me is to not copy blindly without understanding the details
<wikan>
this is why I start with fuse for linux.
<wikan>
will understand and get a little practice with the vfs concept
<wikan>
and of course it will help me to copy files to my own partitions of my own os in the future :)
<wikan>
or will not if I didn't understand fuse correctly
<ThinkT510>
implementing a filesystem is one piece of the puzzle
<wikan>
yea, but not sure if fuse gives me enough flexibility
<wikan>
like custom fs attributes
<ThinkT510>
your design considerations will certainly have a knock on effect on the rest of the system. monolithic kernel vs microkernel, realtime, specific vs generic, embedded etc.
<wikan>
well, and it is extremely hard for me because I want a micro or nano kernel
<vin>
When would read be better than mmap?
<gog>
when the descriptor can't be mapped for some reason
<gog>
like a socket
<GeDaMo>
Uh ... when the 'm' key doesn't work on your keyboard? :P
<vin>
lol
<vin>
when would read *perform* better than mmap though?
<gog>
probably never
<junon>
I've read that read is faster than MMAP in the case of sequential reads of block size
<junon>
I don't know if that's true or not
<junon>
Unless you mean a single call, in which case, it wouldn't.
<junon>
Probably.
<zid>
read is pretty well optimized for sequential
<zid>
mmap's nice if you care about cache and stuff
<vin>
what is "well optimized" zid? what more can read do than prefetch the next block in sequential access? I am sure linux prefetches blocks on sequential mmap accesses as well.
<zid>
You're going to eat a lot of demand faults with mmap, compared to read, but once it's faulted in it should be nice and fast
<zid>
vin: 0 copy shenanigans, no page faults, etc
<mjg>
it really depends on the access pattern
<vin>
So reading a small file sequentially would be faster with read? Just pay the syscall overhead but avoid page faults?
<mjg>
one key point with mmap is that it costs to set it up
<mjg>
and most notably if you have multiple threads you will be suffering from it to some extent
<zid>
mmap's going to churn your TLB, cause a fuck lot of page faults, and a bunch of time setting up the page tables etc
<zid>
it's basically read + brk combined
<zid>
read is just read
<klange>
mmap is good if you don't know what you want ahead of time
<zid>
mmap's perfect for "My gap has a 1GB file it keeps in memory and uses random bits of it all the time"
<zid>
gap? game.
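[editor's note: the read-vs-mmap tradeoff above can be sketched with a small, hedged example. This is an illustration of the two access paths, not a benchmark; the temp-file path and 1 MiB size are arbitrary choices, and actual performance depends on the kernel, page cache state, and access pattern, exactly as the discussion says.]

```python
# Sketch: sequential access via read() in chunks vs via mmap.
# read() pays a syscall per chunk plus a kernel->user copy;
# mmap pays setup cost and demand faults, then accesses pages directly.
import mmap
import os
import tempfile

SIZE = 1 << 20  # 1 MiB test file (arbitrary size for the sketch)

fd, path = tempfile.mkstemp()
os.write(fd, os.urandom(SIZE))
os.close(fd)

def read_sequential(path):
    # Plain read() in page-sized chunks: one syscall per chunk, and the
    # kernel copies from the page cache into our buffer each time.
    out = bytearray()
    with open(path, "rb", buffering=0) as f:
        while chunk := f.read(4096):
            out += chunk
    return bytes(out)

def mmap_sequential(path):
    # mmap: no per-chunk syscalls; demand faults populate the mapping
    # as we touch it. (The final slice is a copy into Python memory.)
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[:]

a = read_sequential(path)
b = mmap_sequential(path)
assert a == b  # both paths observe identical file contents
os.unlink(path)
```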
<junon>
I think a better question is, read+seek vs mmap
<junon>
Also mmap is inherently thread safe, is it not? whereas a read on an FD is not.
<vin>
junon: then wouldn't mmap be always better than read+seek
<klange>
probably something you could do a bunch of benchmarking for on different kernels/platforms/disks...
<junon>
I suppose I've never considered the thread safety of an mmapped region but I can't imagine it having issues.
<junon>
Yes
<junon>
vin: again it probably depends
<junon>
Also read on io_uring (on linux) from the FS is going to be faster than read syscalls.
<junon>
as io_uring was designed specifically for FS performance at facebook then was expanded out, IIUC
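[editor's note: on the read+seek point above, the thread-safety difference is concrete: read() after lseek() uses the descriptor's shared file offset, so concurrent seek-then-read pairs on one fd can interleave, while pread() takes an explicit offset and never touches the shared one. A minimal sketch using Python's `os.pread` (Unix-only):]

```python
# pread() reads at an explicit offset and leaves the fd's shared file
# offset alone; read()+seek() mutate it, which is the racy part.
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"0123456789")

# pread: explicit offset, shared offset untouched.
os.lseek(fd, 0, os.SEEK_SET)
data = os.pread(fd, 4, 6)                   # 4 bytes at offset 6
assert data == b"6789"
assert os.lseek(fd, 0, os.SEEK_CUR) == 0    # shared offset still 0

# read() after seek(): the shared offset moves, so two threads doing
# seek-then-read on the same fd can read from the wrong position.
os.lseek(fd, 6, os.SEEK_SET)
assert os.read(fd, 4) == b"6789"
assert os.lseek(fd, 0, os.SEEK_CUR) == 10   # offset advanced

os.close(fd)
os.unlink(path)
```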
<vin>
Wait, both read and mmap need to copy from a kernel buffer to a userspace buffer, correct? Even with io_uring.
<vin>
So a kernel bypass read should be better than io_uring? like spdk I guess
<klange>
No, there doesn't need to be any buffer copying in a read if the conditions are right. Nicely page-and-sector-aligned reads can easily be zero-copy...
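[editor's note: the "page-and-sector-aligned" condition shows up directly in the mmap interface itself: file offsets passed to mmap must be multiples of the allocation granularity (the page size on Linux). A small sketch; the exact exception type for a misaligned offset varies, so both are caught:]

```python
# mmap offsets must be aligned to the allocation granularity, which is
# one reason page-and-sector alignment matters for zero-copy paths.
import mmap
import os
import tempfile

gran = mmap.ALLOCATIONGRANULARITY  # typically 4096 on Linux

fd, path = tempfile.mkstemp()
os.write(fd, b"x" * (2 * gran))
os.close(fd)

with open(path, "rb") as f:
    # Aligned offset: the mapping succeeds.
    with mmap.mmap(f.fileno(), gran, access=mmap.ACCESS_READ, offset=gran) as m:
        assert len(m) == gran
    # Misaligned offset: rejected (mmap(2) returns EINVAL; CPython may
    # also raise ValueError depending on platform/version).
    try:
        mmap.mmap(f.fileno(), gran, access=mmap.ACCESS_READ, offset=gran + 1)
        raise AssertionError("expected misaligned offset to fail")
    except (ValueError, OSError):
        pass

os.unlink(path)
```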
<graphitemaster>
Self plug: I updated my incbin header hack, it now does more, like text and works on Harvard architectures
<graphitemaster>
sortie, Yeah but that requires running a separate external tool, which is sadge
<zid>
bin2o not being part of binutils is sadge
<graphitemaster>
sortie, your documentation on your site still mentions freenode! GASP
<sortie>
graphitemaster, guess I just didn't regenerate that html file yet
<heat>
re: read/mmap, mmap doesn't copy data, nor allocates memory, unless it needs to, which is only when COWing
<heat>
with mmap you never actually allocate any pages beyond those in the inode's page cache unless there's actually a need for duplication, and because of that, you don't need to copy pages too
<heat>
that's why it can be faster than read, especially since the kernel can do readahead pretty nicely when it sees you accessing the mmap region
<heat>
oh and if the kernel starts to need to reclaim memory, in the read case it will write your file's data to swap, since it's anonymous memory; with mmap, it only writes back if there are dirty pages and there's no swap involved
<geist>
it's all about the setup overhead of mmap vs the gains you get from 'zerocopy'
<geist>
if it's a one time read it's probably not faster to mmap a thing and then read/write it
<geist>
but if you are going to continually access it then it probably ends up being a win
<geist>
also this is all assuming the OS is about as smart as it can be with this stuff
<geist>
a map of a huge file that you then sequentially read is possibly pretty fast if the OS is going to observe what you're doing and start pre-fetching pages as you read through it and try to pave in front of your cursor
<geist>
and thus making the overhead of the demand faults minimized
<geist>
OTOH if you're just reading a file in then it's the same number of copies to go through a read() call, since the kernel is most likely going to just map the same backing pages on an mmap() as what it'd internally memcpy out to user space
<clever>
i can see how read-ahead with mmap would perform faster than read-ahead with read(), even ignoring the copy costs
<geist>
*potentially*, but that's only if the kernel really sees whats going on and gets ahead of the cursor, which it probably cant
<clever>
the kernel could read-ahead with a second core, and fill the paging tables in ahead, so your thread never does a single context switch
<clever>
while read() has to both context switch AND also copy
<geist>
i doubt it can keep up. the page table overhead is immense vs the speed a cpu can read through stuff
<geist>
this is of course assuming the file is already completely brought in (or it's some shared memory object that's already populated)
<clever>
yeah, it would rely on you reading at less than 100% bus capacity
<clever>
like doing compute as you read
<geist>
right
<geist>
but i think in the case where you're mapping in something you're hitting randomly and continually, makes a lot of sense
<geist>
cache files, font files, databases, etc
<clever>
i use rtorrent a lot, and its mmap based
<clever>
and torrents are a very random load
<geist>
oh yeah totally
<clever>
but until you're closing things, you dont need to sync much
<clever>
so the kernel can just flush the dirty data whenever it wants, and you dont really block on IO
<geist>
but yeah everyone here has repeated the same thing with different variants: it depends and the setup is hard
<geist>
this is a thing we see a lot in zircon actually, since the VM lets you do pretty much whatever you want by tossing a bunch of objects and tools at your feet
<geist>
and so folks have been building all sorts of things out of it, some of which may or may not be fast or efficient
<heat>
geist, I don't believe linux can do zero-copy read()
<geist>
heat: i dont either, since by definition it's a copy
<clever>
geist: does the TLB store negative results, like unmapped memory?
<heat>
it can barely even do zero copy networking (send)
<geist>
i consider the 'kernel moves data from internal buffer to user space' to be a copy
<geist>
clever: negative. that's a strong design requirement in pretty much all arches i've seen: no negative TLBs
<clever>
ah
<clever>
so you can just fill the page table in as you read from disk, and not have to flush anything
<geist>
there is even a fairly complex dance of precisely the state of a TLB when a cpu takes a page fault of various types. ie, did it just store that it couldn't do a TLB operation? did it flush the entry if it was a permission fault? etc
<geist>
heat: if you dont count the actual user copyin/copyout then read() should be zero copy since the kernel can basically move directly from the backing page cache
<heat>
ah
<heat>
well you have a copy, I wasn't counting that as zero copy :)
<geist>
but, mmap can be zero copy in the sense that you can directly manipulate the page cache. though at that point you have to consider whether or not a memcpy inside your application is a copy or not
<geist>
if you mmap a file and then directly utilize the mapping without making a private copy of the data, say for an image file, then that's definitely less copies than a read()
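[editor's note: that "use the mapping directly, no private copy" pattern can be sketched in a few lines. In Python a memoryview over an mmap object reads the file's bytes in place, and only the bytes actually needed get copied out; the file contents here are an arbitrary stand-in:]

```python
# Use an mmap'd file in place via a zero-copy view, copying only what
# we need, instead of read()ing the whole thing into a fresh buffer.
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"image-header:ABCD" + b"\x00" * 4096)
os.close(fd)

with open(path, "rb") as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    view = memoryview(m)          # zero-copy window onto the mapping
    header = bytes(view[:17])     # copy only the 17 bytes we care about
    assert header == b"image-header:ABCD"
    view.release()
    m.close()

os.unlink(path)
```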
<clever>
i was investigating memory usage of wine many many years ago, and noticed that a chunk of the .exe file was anon memory, not mmap'd
<clever>
as i dug more, i found that the LOAD commands in the .exe, wanted a chunk of the file to be mis-aligned in ram
<geist>
clever: probably like in ELF where you either COW the .data segment, or you map in an anon chunk for bss
<geist>
ah
<geist>
yeah if it's misaligned the loader either has to give up or make an anon mapping and copy
<heat>
oh no, misalignment D:
<clever>
and you cant share the read-cache with the mmap, if they have different alignments
<clever>
exactly
<heat>
i've always thought about a (userspace) filesystem driver that just mmap'd the whole disk and read everything like regular memory
<heat>
seems like a fun experiment
<clever>
heat: definitely needs 64bit there!
<heat>
yupp
<clever>
and with 48bit virtual size, thats 256tb of address space
<clever>
so as long as LVM isnt fusing devices into a larger unit, you shouldnt have any issues mapping the entire disk
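[editor's note: the 256tb figure above checks out as simple arithmetic: 48 bits of virtual address is 2^48 bytes, which is 256 TiB.]

```python
# 48-bit virtual addresses give 2**48 bytes of address space = 256 TiB.
va_bits = 48
space = 1 << va_bits          # total virtual address space in bytes
TiB = 1 << 40
assert space // TiB == 256    # matches the 256tb figure in the chat
```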
<geist>
heat: there's kinda something like that we're working with on fuchsia
<clever>
but you may have issues mapping several disks in an array like zfs
<geist>
since FS drivers are user space, and the FS drivers also act as a user pager
<geist>
for stuff like metadata you can create a VMO that represents the entire disk
<geist>
and then demand fault them in and use the user pager interface to let the kernel do writebacks and whatnot
<heat>
the issue is that you get lots of page duplication between the disk's cache and the file's cache
<geist>
also i think that's a similar interface to talk to block devices: you can have a VMO that represents the entire block device and then shuffle pages around with it
<geist>
that *is* the disk cache
<heat>
unless you're lucky enough that the partition is page aligned and the blocks are page-sized
<geist>
oh sure. yeah if you didn't then that's bad
<geist>
we at least have the advantage that we can make sure that is the case
<clever>
heat: fdisk tries to make partitions 1mb aligned, to future-proof things like that
<heat>
i've done a fun experiment with linux: create a file, write to it, read from the disk at the block's location, flush, read again from /dev/sda and again, with O_DIRECT
<zid>
1MB? go for 2MB because of 2MB pages smh
<zid>
actually, let's go for 1GB
<clever>
heat: a major problem when you dont do that is if you have say 4kb sectors with 512-byte emulation, and the partition wasnt 4k aligned, so every 4k write involves an 8k read, modify, 8k write!
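[editor's note: that read-modify-write penalty is pure alignment arithmetic: a 4 KiB write whose start isn't on a 4 KiB physical-sector boundary straddles two physical sectors, so the drive must read both, patch them, and write both back. A tiny sketch:]

```python
# A misaligned 4 KiB write straddles two 4 KiB physical sectors, forcing
# an 8 KiB read + modify + 8 KiB write inside the drive.
PHYS = 4096  # physical sector size in bytes

def sectors_touched(offset, length, sector=PHYS):
    # Count how many physical sectors a write at `offset` covers.
    first = offset // sector
    last = (offset + length - 1) // sector
    return last - first + 1

assert sectors_touched(0, 4096) == 1    # aligned: one sector, no RMW
assert sectors_touched(512, 4096) == 2  # off by one 512B logical sector:
                                        # two sectors -> read-modify-write
```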
<heat>
data stops being coherent between reads to sda, the file, and sda with O_DIRECT
<geist>
worse: classic DOS MBR would start the first partition in something like sector 63
<geist>
for legacy dumb reasons
<clever>
i think fat has a structure similar to the MBR in sector 0 of itself?
<geist>
or 62, or whatever it was, one sector before where you think the sector should be
<clever>
from the old pre-partition days
<gog>
so it didn't span cylinders i think
<clever>
and starting one too early, would put the bulk of the data where you would have expected it
<clever>
that ~1mb gap between sector 0 and partition 1, is also what grub and many other bootloaders abuse
<geist>
side note random recent game plug: metroid dread. finished it a few days ago. good fun
<clever>
a decent chunk of the bootloader gets shoved into that "unused" space
<clever>
but with GPT using that region, and banning such hacks, legacy on GPT now uses a partition of type "bios boot partition" to hold the raw executable code
<clever>
and the protective MBR in sector 0 still has the same 1st-stage as before
<gog>
i was playing simutrans but i hate it because they adamantly refuse to make the interface less clunky
<heat>
if we're plugging games I might as well do my classic "dark souls is the best game ever" plug
<gog>
openttd isn't as fun and also it hard locks my machine for some reason
<heat>
open source games are not very fun because most open source developers are bad with UX
<gog>
openttd has a slightly better ui. slightly
<gog>
but its passenger and goods transport isn't very realistic
<gog>
too easy
<Maka_Albarn>
yo. has anyone heard anything about what's going on with osdev.org? I know it's been down for a few days now, but does anyone know why?
<Maka_Albarn>
...more or less down.
<zid>
nobody paid me $200 to threaten the guy hosting it not to stop
<vdamewood>
zid: What if we paid the $200 to the hosting guy?