klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
bch has quit [Quit: quitter]
bnchs has quit [Remote host closed the connection]
bauen1 has joined #osdev
bauen1 has quit [Remote host closed the connection]
bauen1 has joined #osdev
dutch has quit [Quit: WeeChat 3.8]
dutch has joined #osdev
Matt|home has quit [Quit: Leaving]
nyah has quit [Quit: leaving]
Left_Turn has quit [Ping timeout: 264 seconds]
thinkpol has quit [Remote host closed the connection]
thinkpol has joined #osdev
gog has quit [Ping timeout: 268 seconds]
epony has joined #osdev
slidercrank has quit [Ping timeout: 265 seconds]
Burgundy has quit [Ping timeout: 260 seconds]
_xor has quit [Quit: brb]
divine has quit [Quit: leaving]
heat has quit [Ping timeout: 256 seconds]
Starfoxxes has quit [Ping timeout: 248 seconds]
Starfoxxes has joined #osdev
fedorafansuper has quit [Ping timeout: 248 seconds]
fedorafan has joined #osdev
srjek has quit [Ping timeout: 252 seconds]
<immibis> what if instead of shoehorning everything into a handful of file verbs, the correct solution is to make verbs cheaper
<zid> we decided spectre was more fun
<Mutabah> APIs are hard yo
<zid> I do think we should break out ioctl to some more general verbs though
<zid> it's hard though
<immibis> No, more specific. Windows has verbs like EnumerateNetworkAdapters
<zid> which lives in userspace
<zid> and that's what linux does too, but leaves it to 3rd party userspaces
<immibis> How to enumerate network adapters on windows: call EnumNetworkAdapters or something like that. On Linux: open a netlink socket, send a request for network adapter list, receive messages until you get the last one, ignore messages not related to your request
<zid> windows has cm/ex/hal/io/ke/mm/ob/po/tm and nt/zw
<zid> which is more than linux has to be sure
<zid> but it doesn't *actually* have 99% of the winapi as syscalls
fedorafan has quit [Ping timeout: 248 seconds]
<immibis> who said syscalls
<immibis> which should really be called kernel calls, because all those DLLs are definitely part of the system
<zid> It's a paradigm difference, windows provides its 'verbs' by bundling them with the kernel inextricably as a userspace dll
<zid> linux provides nothing
fedorafan has joined #osdev
fedorafan has quit [Client Quit]
<zid> and on linux, you just ls /proc/net
<zid> cat /proc/net/dev
<zid> that's what ifconfig does I just checked
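A minimal sketch of the "just read the file" approach zid describes: parse data lines of /proc/net/dev. After the interface name and a colon, each line carries 16 counters, 8 for receive and then 8 for transmit. The struct and function names are illustrative only. The parser takes one line as a string, so it can be tested without a live system.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct netdev_stats {
    char name[32];
    unsigned long long rx_bytes;
    unsigned long long tx_bytes;
};

/* Parse one data line of /proc/net/dev, e.g. "  eth0: 1234 5 0 ...".
   Returns 0 on success, -1 for the two header lines or malformed input. */
int parse_netdev_line(const char *line, struct netdev_stats *out)
{
    const char *colon = strchr(line, ':');
    if (!colon)
        return -1;                       /* header lines contain no ':' */
    while (isspace((unsigned char)*line))
        line++;                          /* names are right-aligned */
    size_t len = (size_t)(colon - line);
    if (len == 0 || len >= sizeof(out->name))
        return -1;
    memcpy(out->name, line, len);
    out->name[len] = '\0';

    unsigned long long v[16];            /* 8 receive + 8 transmit counters */
    const char *p = colon + 1;
    for (int i = 0; i < 16; i++) {
        char *end;
        v[i] = strtoull(p, &end, 10);
        if (end == p)
            return -1;
        p = end;
    }
    out->rx_bytes = v[0];
    out->tx_bytes = v[8];
    return 0;
}

/* Print every interface; returns how many were found, or -1. */
int print_netdevs(void)
{
    FILE *f = fopen("/proc/net/dev", "r");
    if (!f)
        return -1;
    char line[512];
    struct netdev_stats s;
    int n = 0;
    while (fgets(line, sizeof line, f))
        if (parse_netdev_line(line, &s) == 0) {
            printf("%s rx=%llu tx=%llu\n", s.name, s.rx_bytes, s.tx_bytes);
            n++;
        }
    fclose(f);
    return n;
}
```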
<Mutabah> For system information like that, files seems like a pretty appropriate method
<zid> yea files is actually really workable for that specific case
<Mutabah> (You're just reading a blob of information)
<zid> There's a whole shit load of 'desktopy' stuff that windows does provide that doesn't work that great as files though
gildasio1 has quit [Remote host closed the connection]
gildasio1 has joined #osdev
<immibis> that is an outdated interface zid. You're supposed to use netlink
<immibis> netlink is equivalent to device files but with a different interface for some reason that makes everything more complicated
<immibis> instead of opening /dev/netsystem you connect a netlink socket to the networking system using a hardcoded address number
<immibis> which goes in the protocol field and not the address field because they said so
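A minimal sketch of the netlink sequence immibis describes, using the rtnetlink constants from the Linux UAPI headers. `NETLINK_ROUTE` is passed as the protocol argument of `socket()`. The request asks for a dump of all links (`RTM_GETLINK` with `NLM_F_DUMP`). Replies whose sequence number does not match are skipped until `NLMSG_DONE` arrives. The struct and function names are made up for the example. Real code would also parse the `IFLA_*` attributes of each `RTM_NEWLINK` reply.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Request layout for an RTM_GETLINK dump: netlink header + ifinfomsg. */
struct link_dump_req {
    struct nlmsghdr nlh;
    struct ifinfomsg ifm;
};

/* Fill in a "send me every network link" request. */
void build_link_dump_req(struct link_dump_req *req, uint32_t seq)
{
    memset(req, 0, sizeof(*req));
    req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req->nlh.nlmsg_type = RTM_GETLINK;
    req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req->nlh.nlmsg_seq = seq;
    req->ifm.ifi_family = AF_UNSPEC;
}

/* Send the request and count RTM_NEWLINK replies until NLMSG_DONE.
   Returns the number of links, or -1 on error. */
int count_links(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE); /* protocol field */
    if (fd < 0)
        return -1;

    struct link_dump_req req;
    build_link_dump_req(&req, 1);
    struct sockaddr_nl kernel = { .nl_family = AF_NETLINK }; /* pid 0 = kernel */
    if (sendto(fd, &req, req.nlh.nlmsg_len, 0,
               (struct sockaddr *)&kernel, sizeof(kernel)) < 0) {
        close(fd);
        return -1;
    }

    static uint32_t buf[4096];           /* 16 KiB, 4-byte aligned */
    int links = 0;
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        int len = (int)n;
        for (struct nlmsghdr *h = (struct nlmsghdr *)buf; NLMSG_OK(h, len);
             h = NLMSG_NEXT(h, len)) {
            if (h->nlmsg_seq != req.nlh.nlmsg_seq)
                continue;                /* not a reply to our request */
            if (h->nlmsg_type == NLMSG_DONE) {
                close(fd);
                return links;
            }
            if (h->nlmsg_type == NLMSG_ERROR) {
                close(fd);
                return -1;
            }
            if (h->nlmsg_type == RTM_NEWLINK)
                links++;
        }
    }
}
```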
_xor has joined #osdev
<sham1> It's even worse than I imagined
wand has quit [Ping timeout: 255 seconds]
<zid> so the problem is that the api moved *away* from files :P
[itchyjunk] has quit [Remote host closed the connection]
<sham1> Something something "those who do not understand UNIX are bound to recreate it, badly" something
dude12312414 has joined #osdev
dude12312414 has quit [Remote host closed the connection]
wand has joined #osdev
AFamousHistorian has joined #osdev
mehdix has quit [Quit: ZNC 1.7.5+deb4 - https://znc.in]
x8dcc has quit [Ping timeout: 246 seconds]
bradd has joined #osdev
fkrauthan has quit [Quit: ZNC - https://znc.in]
fkrauthan has joined #osdev
fkrauthan has quit [Client Quit]
fkrauthan has joined #osdev
<sham1> And this is why computing has been stagnant. We peaked in the 70s
sebonirc has quit [Remote host closed the connection]
sebonirc has joined #osdev
<zid> 2011*
<sham1> UNIX isn't from 2011
<zid> no but sandy bridge is
<zid> there has to be a dip afterwards for there to be a peak, rather than a plateau
<zid> and we definitely went up until sandy, then down
<kazinsal> maybe intel did
<kazinsal> the rest of the computing world moved on
<kazinsal> no one else is to blame for the most modern intel processor continuing to be a microwaved sandy bridge with the marketing flavour of the week bolted on for 1d4 generations before being dumped
<zid> hey that's unfair, they also roll the 1d4 for how many pci-e lanes to remove and how many memory channels too
<zid> 13xxx was semi-decent though, a good tock to sandybridge
<kazinsal> I ended up just building a 7700X machine because either way I'd need to jump to DDR5 so I figured I'd go with a fresh socket that's guaranteed to get at least two generations of tangible CPU improvements
<zid> if I had to give up SB for something before 13xxx came out it would have been a 3600x or something
<immibis> zid: files were insufficiently extensible; netlink allows structured requests and extensible TLV requests and responses and update notifications
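The "extensible TLV" part that immibis mentions looks roughly like this in practice. After the fixed `struct ifinfomsg`, an `RTM_NEWLINK` reply carries a run of `struct rtattr` type-length-value records. A reader walks them with the `RTA_OK`/`RTA_NEXT` macros and skips any types it does not know. The sketch below is an assumed helper, not a library function. It pulls out `IFLA_IFNAME`.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/rtnetlink.h>

/* Walk TLV attributes and copy out IFLA_IFNAME into `name` (capacity `cap`).
   `attrs`/`len` describe the attribute area that follows struct ifinfomsg.
   Returns 0 if found, -1 otherwise. */
int find_ifname(struct rtattr *attrs, int len, char *name, size_t cap)
{
    for (struct rtattr *a = attrs; RTA_OK(a, len); a = RTA_NEXT(a, len)) {
        if (a->rta_type != IFLA_IFNAME)
            continue;                   /* unknown or uninteresting: skip */
        size_t n = (size_t)RTA_PAYLOAD(a);
        if (n == 0 || n > cap)
            return -1;
        memcpy(name, RTA_DATA(a), n);
        name[n - 1] = '\0';             /* payload normally includes the NUL */
        return 0;
    }
    return -1;
}
```

Because every record carries its own length, newer kernels can add attribute types without breaking older readers. That is the extensibility argument, in contrast to fixed-column text files.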
<immibis> isn't sandy bridge also a slight update of whatever was before that, and isn't that also the case for all processors back to the first Core 2 which was a major microarchitectural change?
<sham1> This is why devices and such need to be file trees instead of single files
<immibis> or it's why memory needs to be an SQL database
<moon-child> how many avxes does sandybridge have?
<zid> 1
<moon-child> exactly
<moon-child> you can track the quality of intel cpus by how many avxes they have
<zid> sandy is where they figured shit out
<moon-child> hence why zen4 is the best zen
<zid> nehalem is missing stuff and is tuned worse, etc.
<moon-child> I mean, sure. But skylake > broadwell > haswell > ivybridge > sandybridge. It's not like it ever got _worse_
<zid> except that's not what chips they actually released
<zid> if you look at the SKUs, all they did was remove pci-e lanes and memory channels for 10 years
<zid> you got a couple of percent ipc, though, woo
<moon-child> idk man
<moon-child> I have a bunch of pcie lanes and memory channels
<zid> what cpu?
<moon-child> skylake
<zid> what cpu?
<moon-child> actually cascadelake now
<moon-child> w-2295
<zid> lol no shit
<zid> that's one of the best cpus they ever made
<zid> and is firmly outside that 10 year window
<zid> SB got uop cache(!), dual port memory, doubled branch prediction targets, avx, integrated graphics, etc
<zid> lots of other various internal hidden buffers doubled
<zid> it's *way* better than nehalem
<zid> it also clocks better by like.. gigahertz
<sham1> Memory need not be an SQL database. A filesystem however could be
<zid> biggest missing thing between SB and skylake (actual skylake, not fucking mega-rocket-cascade-ice-lake, actual skylake) is the cache clocks got massively improved, and avx2
<moon-child> cascade lake is literally skylake
<moon-child> icelake is different
<kazinsal> yeah most improvements on the intel side since SB have been "we can get it to clock 125 MHz higher on stock voltage" every gen
<zid> cascade lake is skylake-sp++ or skylake-x++
<moon-child> well
<kazinsal> it was kinda neat just setting up my 7700X and telling Ryzen Master "yeah go nuts homie" and getting 5.3 GHz all-core without having to try
<moon-child> they also increased the number of avxes
<moon-child> which must count for something
<zid> kazinsal: I might get near that if I had enough fans :(
<zid> wtb more fans
<kazinsal> 360mm rad here
<zid> yea I'm still using the evo 212 from my q6600 :D
<kazinsal> it still maxes out at 95C but apparently they just do that
<zid> I hit 95C at like.. 4.5GHz, but I still have a shit load more voltage I can throw at it
<kazinsal> temps only matter if you can't actually dissipate the load
<zid> just don't have a cooler that'd let me
<kazinsal> and the thing just goes BRRRRRRRRRRRRRRRRRR
<zid> maybe the silicon fails at 4.55GHz idk
<zid> but it seems like it won't given how much headroom I have left
<zid> The single core OC record for this chip is like.. 6.5GHz
<immibis> sham1: wrong. Everything should be SQL. How much time have you spent writing code to maintain multiple indexes of the same data?
<zid> I agree with S
<zid> disagree with QL
<moon-child> no one likes QL
<immibis> update dudes set position=position+velocity*0.03
AFamousHistorian has quit [Remote host closed the connection]
fedorafan has joined #osdev
knusbaum has quit [Ping timeout: 248 seconds]
knusbaum has joined #osdev
LostFrog has joined #osdev
PapaFrog has quit [Ping timeout: 256 seconds]
knusbaum has quit [Ping timeout: 255 seconds]
danilogondolfo has joined #osdev
knusbaum has joined #osdev
shinbeth has joined #osdev
jjuran has quit [Ping timeout: 260 seconds]
jjuran has joined #osdev
epony has quit [Remote host closed the connection]
gog has joined #osdev
bauen1 has quit [Ping timeout: 248 seconds]
slidercrank has joined #osdev
remexre has quit [Read error: Connection reset by peer]
nyah has joined #osdev
small has joined #osdev
GeDaMo has joined #osdev
joe9 has quit [Quit: leaving]
fedorafan has quit [Ping timeout: 252 seconds]
les has quit [Quit: Adios]
les has joined #osdev
Burgundy has joined #osdev
fedorafan has joined #osdev
bauen1 has joined #osdev
bauen1 has quit [Ping timeout: 265 seconds]
bauen1 has joined #osdev
slidercrank has quit [Ping timeout: 252 seconds]
dutch has quit [Quit: WeeChat 3.8]
spikeheron has joined #osdev
gog has quit [Ping timeout: 252 seconds]
gog has joined #osdev
bradd has quit [Ping timeout: 264 seconds]
[itchyjunk] has joined #osdev
DynamiteDan has quit [Excess Flood]
DynamiteDan has joined #osdev
bgs has joined #osdev
heat has joined #osdev
Burgundy has quit [Ping timeout: 256 seconds]
shinbeth has quit [Remote host closed the connection]
heat has quit [Read error: Connection reset by peer]
heat has joined #osdev
slidercrank has joined #osdev
srjek has joined #osdev
Left_Turn has joined #osdev
srjek has quit [Ping timeout: 265 seconds]
<mrvn> *gaehn*
<nikolar> *agheh*
grange_c0 has quit [Quit: The Lounge - https://thelounge.chat]
grange_c0 has joined #osdev
Burgundy has joined #osdev
gog has quit [Quit: Konversation terminated!]
Piraty has quit [Quit: -]
Piraty has joined #osdev
zhiayang has quit [Quit: oof.]
zhiayang has joined #osdev
xenos1984 has quit [Ping timeout: 248 seconds]
xenos1984 has joined #osdev
dude12312414 has joined #osdev
bauen1 has quit [Ping timeout: 248 seconds]
bch has joined #osdev
x8dcc has joined #osdev
spikeheron has quit [Quit: WeeChat 3.8]
dutch has joined #osdev
Terlisimo has quit [Quit: Connection reset by beer]
Terlisimo has joined #osdev
xenos1984 has quit [Ping timeout: 246 seconds]
<sham1> ping
<zid> dong
janemba has joined #osdev
xenos1984 has joined #osdev
bauen1 has joined #osdev
FreeFull has joined #osdev
craigo has joined #osdev
<nikolar> bang
<mrvn> I don't like greek pop-duos
<zid> and the dirt is gone
<geist> kazinsal: oh you got a 7700x? how you liking it? do you have an opportunity to single core bench it vs a 5000 or 3000 series?
<geist> curious how the zen 4s perform single core
<zid> looking it up, very good
<geist> i can look up benchmarks of course, but always nice to see it confirmed on the street
<zid> like, I am surprised
<zid> ryzen 5 was pretty average, my SB competed
<zid> his 7700 destroys us
<nikolar> It was like 13% IPC uplift
<nikolar> Add to that significantly higher clocks
<zid> and it's running like a gigahertz faster yea
<geist> you mean ryzen 5xxx when you say ryzen 5?
<zid> 3xxx
<zid> at least
<zid> is what I checked for a comparison
<zid> I'm not that into zen3
<geist> yah that'd be zen 2 vs zen 4, good jump
<zid> zen3 looks to be roughly.. the exact midpoint
<zid> intel went SB, crap crap crap crap, 13900k, amd went zen2, zen3, zen4 in equal steps
<nikolar> 13900k was less efficient than amd
<zid> now get one of those nice epycs with 768MB of L3
<nikolar> You can store a whole os in the cache :)
small has quit [Ping timeout: 252 seconds]
<mrvn> Now I wonder how long a syscall will take on a "AMD Ryzen 5 2400G with Radeon Vega Graphics" with kvm.
<mrvn> I should measure raw, kvm and nested kvm.
danilogondolfo has quit [Remote host closed the connection]
zhiayang has quit [Quit: oof.]
zhiayang has joined #osdev
dude12312414 has quit [Remote host closed the connection]
dude12312414 has joined #osdev
epony has joined #osdev
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
xvmt has quit [Remote host closed the connection]
xvmt has joined #osdev
fedorafan has quit [Ping timeout: 248 seconds]
bgs has quit [Remote host closed the connection]
fedorafan has joined #osdev
AFamousHistorian has joined #osdev
k0valski18891 has quit [Quit: Peace out !]
AFamousHistorian has quit [Ping timeout: 248 seconds]
bnchs has joined #osdev
<bnchs> hi osdevelopers and other developers alike :3
<zid> That's fair, I make messes and stupid jokes mainly
<nikolar> Hello
<bnchs> what are you all up to?
<zid> dark souls mainly?
<kof123> a "filesystem" thing. its important to have a plan/design and stay motivated .oO( ♫ and we'll save terrance and phillip too, cuz that's what brian boitano'd do ♫ )
brocellous has left #osdev [#osdev]
<mrvn> hey, me too. I need a cachefs for fuse.
k0valski18891 has joined #osdev
<mrvn> Anyone know if copy_file_range will pre-allocate the output file?
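The question is left open in the log. A way to avoid depending on the answer is to preallocate the destination explicitly before copying. The sketch below calls `posix_fallocate()` on a best-effort basis, then loops `copy_file_range()` with explicit offsets. If the call is refused (for example `ENOSYS`, or `EXDEV` across filesystems, depending on kernel and filesystem), it falls back to a `pread()`/`pwrite()` loop. The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Fallback: ordinary pread()/pwrite() loop starting at `off`. */
static int copy_rw(int in_fd, int out_fd, off_t off, off_t size)
{
    char buf[65536];
    while (off < size) {
        ssize_t n = pread(in_fd, buf, sizeof buf, off);
        if (n < 0)
            return -1;
        if (n == 0)
            break;                               /* source shrank */
        for (ssize_t done = 0; done < n;) {
            ssize_t w = pwrite(out_fd, buf + done, (size_t)(n - done), off + done);
            if (w < 0)
                return -1;
            done += w;
        }
        off += n;
    }
    return 0;
}

/* Copy all of in_fd to the start of out_fd, preallocating the destination
   first instead of assuming copy_file_range() does. 0 on success, -1 on error. */
int copy_preallocated(int in_fd, int out_fd)
{
    struct stat st;
    if (fstat(in_fd, &st) < 0)
        return -1;
    off_t size = st.st_size;

    /* posix_fallocate() returns an error number instead of setting errno;
       failure is treated as non-fatal, since preallocation is only a hint here. */
    if (size > 0)
        (void)posix_fallocate(out_fd, 0, size);

    off_t in_off = 0, out_off = 0;               /* advanced by the kernel */
    while (in_off < size) {
        ssize_t n = copy_file_range(in_fd, &in_off, out_fd, &out_off,
                                    (size_t)(size - in_off), 0);
        if (n < 0) {
            if (errno == ENOSYS || errno == EXDEV || errno == EINVAL ||
                errno == EOPNOTSUPP)
                return copy_rw(in_fd, out_fd, in_off, size);
            return -1;
        }
        if (n == 0)
            break;                               /* source shrank */
    }
    return 0;
}
```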
<moon-child> zid: that's a lotta l3 ;o
<nikolar> kof123: wanna share some details :)
<kof123> you are assuming i am a good influence/idea. ls -d */function.c | wc -l 1624 that is one function per file
<kof123> skeletons, havent written a line. i dunno, hundreds of fields but most are optional. so code will be very branchy
<kof123> just doing docs, for my own sake, to keep track of things
<mrvn> you are doing it wrong for sure
<kof123> eh, its simple. real stuff will call down to these, these will be very simple few liners
<moon-child> meh just style stuff. I wouldn't do it like that, but if it works, eh
<kof123> and a large amount is "abstractions"
<kof123> well, that is just fascist directory layout for other reasons
<moon-child> sun libm is one function per file and no one complains about that :)
<kof123> the declaration "headers" will be autogenerated
<kof123> this is kind of a....meant to bootstrap other things. so is kind of a grab bag everything and the kitchen sink goes here
<mrvn> moon-child: those are independent and highly complex functions though.
<kof123> normally i would split into many tiny libraries
bnchs has quit [Quit: Lost terminal]
<mrvn> kof123: you should have a macro for unimplemented functions and one file just listing them all.
<kof123> i have a whole #pragma thing planned. but this code is special because it is "bootstrap" lol
<mrvn> todo.c /* All the function stubs I still need to implement */ :)
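One way mrvn's suggestion could look: a macro that expands to a stub definition that reports itself and fails with `-ENOSYS`, with all stubs collected in one todo.c. The macro and the two stub names are hypothetical. `__VA_ARGS__` carries the parameter list, so `void` must be written out for stubs that take no arguments.

```c
#include <errno.h>
#include <stdio.h>

/* Defines `int name(params)` that logs where the stub was declared and
   returns -ENOSYS. */
#define UNIMPLEMENTED(name, ...)                                   \
    int name(__VA_ARGS__)                                          \
    {                                                              \
        fprintf(stderr, "%s:%d: %s() is not implemented yet\n",    \
                __FILE__, __LINE__, #name);                        \
        return -ENOSYS;                                            \
    }

/* todo.c: all the function stubs still to be written */
UNIMPLEMENTED(fs_get_inode, void)
UNIMPLEMENTED(fs_set_inode, unsigned long ino)
```

Grepping for `UNIMPLEMENTED(` then gives the remaining to-do list, and deleting a line from todo.c is the signal that the real function exists.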
<kof123> like real code will eventually all be pragmas lol
<kof123> this is the tip of my horrible ideas lol
<mrvn> what's a #pragma?
<mrvn> kof123: I'm writing a fuse filesystem that will store up to 1TB of data in ram and has an ioctl to make a snapshot and sync that back to a network filesystem in the background.
<kof123> yeah, i need this to load a "kernel" in, so my "bootloader" will jump to this, and then can get to real "kernel" stuff.....and then try to bridge with prior "userland pseudo-oo stuff". point being, you are way ahead of me there
<moon-child> mrvn: why?
<kof123> i just have lots of scattered pieces eventually have to "merge" them all
<mrvn> moon-child: because user apps are exceedingly stupid. Like creating 4-byte files or overwriting data over and over, and that's just horribly slow over the network.
* kof123 observed bnchs left in horror
<moon-child> oh, I missed 'network'
<moon-child> so nfs can't do what you want because coherency
<mrvn> even on local disks it's painful
<moon-child> makes sense
<mrvn> moon-child: worse, lustre. so 1MB block size for files.
<moon-child> cache not aggressive enough?
<mrvn> moon-child: can't cache create(), flush() and fsync()
<moon-child> mmmm
<mrvn> because coherency
<moon-child> if I were a computer I would simply not crash
<moon-child> or lose power
<mrvn> But there are some really stupid DNA programs out there. They take a 50MB DNA sequence and split it into files that are 3 base pairs long plus 1 extra char of info. You end up with a working dir with 00000, 00001, 00002, 00003, 00004, ...
<moon-child> wat
<moon-child> why
<mrvn> exactly
<mrvn> Imagine running that on ext2 where file creation is O(n^2)
<mrvn> well, total file creation is; each file is O(n)
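The arithmetic behind the correction works out as follows. Assume each create has to scan a linear, unindexed directory holding the $i$ entries created so far. Creating $n$ files then costs

$$\sum_{i=0}^{n-1} i = \frac{n(n-1)}{2} \in O(n^2)$$

Assume further one byte per base and 50MB $= 5\times10^{7}$ bytes. The DNA example would then produce

$$n \approx \frac{5\times10^{7}}{3} \approx 1.7\times10^{7}\ \text{files}, \qquad \frac{n^{2}}{2} \approx 1.4\times10^{14}\ \text{directory-entry comparisons}$$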
<geist> yah was just thinking if there's much of an optimization you can do there but make an in memory copy of all the dirnames and use it to speed up the lookup
<geist> the standard dir_cache obviously helps for positive lookups
<geist> but for negative lookups where you want to know if something already exists you have to have guaranteed a complete cache
<geist> but all told you still have to search it to find a slot to add the new entry to
<mrvn> geist: dir_hash helps tons.
<geist> sure, but that's an on disk structure
<mrvn> once the dir is cached it's all O(1) in memory lookups.
<geist> or at least whatever the hash collision stuff is
<mrvn> true.
<geist> but yeah was thinking you could read it in and keep a fairly compressed hash of all the entries
<mrvn> But networking still kills it because the op just takes freaking long.
<geist> still basically a full cache, but just more compressed for simple hit detection
<geist> am now kinda curious exactly what the structure of the dir_hash is on ext*
<mrvn> geist: I think you have an array of pointers into an array of file names.
<mrvn> First is indexed by the hash the other just concatenation of all names with potential holes.
* geist nods
<mrvn> Not sure if it does chaining or uses the next (+hash2()) slot
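Below is a sketch of the in-memory structure discussed here. A slot array is indexed by hash, and each slot holds an offset into one pool of concatenated names. Collisions go to the next slot (linear probing). Once it is filled from the whole directory, both positive and negative lookups are answered without touching the disk, which is geist's point. This is a hypothetical illustration, not ext4's actual on-disk htree format. A more compact variant, closer to geist's "compressed hash" idea, would store only hashes and confirm hits on disk.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DIRIDX_SLOTS 1024u              /* power of two; fixed for the sketch */
#define DIRIDX_EMPTY UINT32_MAX

struct dir_index {
    uint32_t slot[DIRIDX_SLOTS];        /* offset of a name in pool, or EMPTY */
    char *pool;                         /* "name\0name\0..." concatenated */
    size_t used, cap, count;
};

static uint32_t fnv1a(const char *s)   /* FNV-1a string hash */
{
    uint32_t h = 2166136261u;
    while (*s)
        h = (h ^ (unsigned char)*s++) * 16777619u;
    return h;
}

void diridx_init(struct dir_index *d)
{
    for (size_t i = 0; i < DIRIDX_SLOTS; i++)
        d->slot[i] = DIRIDX_EMPTY;
    d->pool = NULL;
    d->used = d->cap = d->count = 0;
}

void diridx_free(struct dir_index *d)
{
    free(d->pool);
    diridx_init(d);
}

/* Slot holding `name`, or the empty slot where it would be inserted. */
static size_t diridx_probe(const struct dir_index *d, const char *name)
{
    size_t i = fnv1a(name) & (DIRIDX_SLOTS - 1);
    while (d->slot[i] != DIRIDX_EMPTY && strcmp(d->pool + d->slot[i], name) != 0)
        i = (i + 1) & (DIRIDX_SLOTS - 1);   /* linear probing */
    return i;
}

int diridx_contains(const struct dir_index *d, const char *name)
{
    return d->slot[diridx_probe(d, name)] != DIRIDX_EMPTY;
}

/* Returns 0 if added, 1 if already present, -1 if full or out of memory. */
int diridx_add(struct dir_index *d, const char *name)
{
    size_t i = diridx_probe(d, name);
    if (d->slot[i] != DIRIDX_EMPTY)
        return 1;
    if (d->count + 1 >= DIRIDX_SLOTS)
        return -1;                      /* keep one slot empty so probing ends */
    size_t len = strlen(name) + 1;
    if (d->used + len > d->cap) {
        size_t ncap = d->cap ? d->cap * 2 : 4096;
        while (ncap < d->used + len)
            ncap *= 2;
        char *p = realloc(d->pool, ncap);
        if (!p)
            return -1;
        d->pool = p;                    /* offsets stay valid after realloc */
        d->cap = ncap;
    }
    memcpy(d->pool + d->used, name, len);
    d->slot[i] = (uint32_t)d->used;
    d->used += len;
    d->count++;
    return 0;
}
```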
<kof123> this sounds less bad: ls -d hash_algorithm* | wc -l 45 (looks like 3 functions per alg.) ls -d *is_list_set_to_* | wc -l 187 (query if option is set) ls -d *set_list_to_* | wc -l 374 (set/unset toggle option) 196 "fields" (beyond header), x2 if we assume get/set function for each, some are like "arrays" or "structs" with "subfields" etc., so this is simplified . 45+187+374+400 1006
<mrvn> Some distributed filesystems can give a host "ownership" over a directory's contents. So all file creation by that host is basically local unless some other host contends for it. (Hint: never have 2 hosts use the same working/temp dir)
<kof123> in some cases i just didnt want giant argument list
<mrvn> Wasn't there some kernel module for Linux that allows running a list of syscalls with a single call?
<mrvn> or something writing syscalls to an uring?
<moon-child> io_uring?
<moon-child> :P
<moon-child> yea
<mrvn> moon-child: isn't that just IO calls?
<moon-child> I think the idea is you should be able to do any syscall. I don't know if that's been fully implemented yet, though
<mrvn> I think that was what the extra module did
<kazinsal> geist: yeah, it's great! I'm running windows on it so if you have something I can single core bench on that for a comparison for you I can run one
<mrvn> moon-child: One thing that would be really cool would be if io_uring would allow reading data into a buffer and writing it back without passing it through user space. But I see no copy_file_range support. Best you can do is add a pipe and splice.
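The "add a pipe and splice" workaround mrvn mentions can be sketched with plain syscalls, without io_uring. Data is spliced from the source file into a pipe, then from the pipe into the destination. The bytes stay in the kernel and never pass through a user-space buffer. `splice_copy` is a hypothetical name, and both fds use their current file offsets.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Copy up to `len` bytes from in_fd to out_fd via a pipe with splice(2).
   Returns bytes copied, or -1 on error. */
long long splice_copy(int in_fd, int out_fd, size_t len)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    long long total = 0;
    int failed = 0;
    while (!failed && (size_t)total < len) {
        /* file -> pipe (at most one pipe's worth per call) */
        ssize_t in = splice(in_fd, NULL, p[1], NULL, len - (size_t)total,
                            SPLICE_F_MOVE);
        if (in < 0) {
            failed = 1;
            break;
        }
        if (in == 0)
            break;                      /* EOF on the source */
        /* pipe -> file: drain everything just queued */
        while (in > 0) {
            ssize_t out = splice(p[0], NULL, out_fd, NULL, (size_t)in,
                                 SPLICE_F_MOVE);
            if (out <= 0) {
                failed = 1;
                break;
            }
            in -= out;
            total += out;
        }
    }
    close(p[0]);
    close(p[1]);
    return failed ? -1 : total;
}
```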
<moon-child> at some point, you end up writing your entire application in ebpf
* kazinsal . o ( x86 emulator in eBPF, running eBPF hello world in Linux )
<mrvn> Looks like there is some work towards offloading copy operations: https://lore.kernel.org/lkml/cd772b6c-90ae-f2d1-b71c-5d43f10891bf@nvidia.com/
<bslsk05> ​lore.kernel.org: Re: [PATCH v5 00/10] Implement copy offload support - Chaitanya Kulkarni
<mrvn> clever: Do you have some simple barebones examples that do flat shading and texture mapping in 3D for the RPi?
bch has quit [Ping timeout: 265 seconds]
<mrvn> just a rotating cube or equally simple 3D stuff.
<kof123> nikolar: https://0x0.st/s/HS9qzWb6hAPM0w2ILUIB6A/HrPG.c just a .h file with planned "fields" . the reason i am not worried, is anything i screw up, those are all optional, so if i find a better way to do something, add a new field, maybe remove one of those, etc. higher level "logic" functions will call get/set functions basically.
<kof123> then there is like: caching, locks, ....
<kof123> ask mrvn :D
<kof123> *higher level things will do all the real logic
<kof123> some things are just silly like "Inode" i think nfs wants a unique inode. it doesnt really do anything but maybe someday makes "exporting" easier
<kof123> i mean, i may only actually implement like 1/5 of that or less lol
<kof123> more of a brainstorm at this point
<clever> mrvn: texturing yes, shading based on angles, not currently
<nikolar> Oh that's a lot of defines lol
<nikolar> And very long names
<clever> mrvn: https://github.com/cleverca22/gl/blob/master/texture.s this is a fragment shader for doing texture lookup and alpha blending
<bslsk05> ​github.com: gl/texture.s at master · cleverca22/gl · GitHub
gog has joined #osdev
<clever> mrvn: it expects 5 varyings per vertex, the texture UV, and then an RGB color to mix in (the source texture in this case is just solid white on transparent)
<clever> mrvn: the hw leaves the varyings half interpolated, so each time you read the vary FIFO, you have to then add r5 to that, lines 3-6 will fetch UV, finish that add, and then 8/9 passes the UV off to the texture lookup hardware
<clever> 14/15 then blocks until the texture lookup is complete
<clever> 17-27, will pop the R/G/B tint off the varyings, and store them into r1, and 29 sets r1's alpha to 100% (opaque)
<kof123> i say "filesystem" in quotes because it is not defined where those might live (in ram, on disk...just a "stream" somewhere). thus, it's like a "build your own" kind of. you might enable some fields for something stored in RAM, other fields stored on disk (or swap or something) and link them together, etc. so really, for anything practical, i will have like "templates" and some DSL to specify at "creation" time what "fields" you want
<nikolar> That's an interesting idea
<clever> mrvn: then it gets a bit more fuzzy, it loads the existing color from the framebuffer, and does alpha blending between them, and writes back
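For reference, the blend step clever describes (read the framebuffer color, mix with the shaded texel, write back) is usually "source over destination". The CPU sketch below assumes non-premultiplied 8-bit ARGB packed as 0xAARRGGBB. The shader's exact blend equation is not stated in the log, so treat this as the generic formula, not what texture.s does.

```c
#include <stdint.h>

/* out = src*a + dst*(1-a) for one 8-bit channel, a in 0..255, rounded. */
static inline uint8_t blend_channel(uint8_t src, uint8_t dst, uint8_t a)
{
    unsigned v = src * a + dst * (255u - a);
    return (uint8_t)((v + 127) / 255);
}

/* Blend a 0xAARRGGBB texel over a framebuffer pixel; result is opaque. */
uint32_t blend_over(uint32_t src, uint32_t dst)
{
    uint8_t a = src >> 24;
    uint32_t out = 0xFF000000u;
    for (int shift = 0; shift < 24; shift += 8) {   /* B, G, R */
        uint8_t s = (src >> shift) & 0xFF, d = (dst >> shift) & 0xFF;
        out |= (uint32_t)blend_channel(s, d, a) << shift;
    }
    return out;
}
```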
<kof123> so i mean, theoretically it will have more friendliness on top. that is what i mean by "kitchen sink" too....and where the are "registers for process" there ...
<kof123> *why there are
GeDaMo has quit [Quit: That's it, you people have stood in my way long enough! I'm going to clown college!]
<gog> hi'
<clever> mrvn: each time you feed a UV pair into the texture hardware, it will also pop a uint32_t[2] off the uniform fifo, this generates that data: https://github.com/cleverca22/gl/blob/master/core.c#L499-L500
<bslsk05> ​github.com: gl/core.c at master · cleverca22/gl · GitHub
<clever> mrvn: that contains the phys addr of the texture, and the size and other params
<clever> mrvn: so the big unknown in your question, is how to do the shading/lighting, my example code entirely ignores lighting
bradd has joined #osdev
<mrvn> clever: 3D without a light source looks rather bad.
<mrvn> do I have to assemble that for the vc?
<clever> mrvn: yeah, let me find the assembler...
<clever> https://github.com/hermanhermitage/videocoreiv-qpu has some more notes/examples
<bslsk05> ​hermanhermitage/videocoreiv-qpu - Fun and Games with the Videocoreiv Quad Processor Units (34 forks/238 stargazers)
<clever> where was it...
<clever> 18 19:46:14< clever> /usr/bin/node /media/videos/4tb/rpi/videocoreiv-qpu/qpu-tutorial/qpuasm.js [--showbits] [--dumpglobals] [--dumpsymbols] [--verbose] [--ignore-errors] [--strict-match] [--in]filename
<clever> mrvn: aha, found the filename
<bslsk05> ​github.com: videocoreiv-qpu/qpuasm.js at master · hermanhermitage/videocoreiv-qpu · GitHub
<clever> mrvn: there, thats the original assembler i was using, back before mesa was properly ported
<clever> the end result of running that assembler, looks like: https://github.com/librerpi/lk-overlay/blob/master/platform/bcm28xx/v3d/v3d.c#L143-L153
<bslsk05> ​github.com: lk-overlay/v3d.c at master · librerpi/lk-overlay · GitHub
<kof123> for something very simple, you could enable, say 8.3 filename field, dos "permissions", say like 200 entries max (all these are fixed-size -- configurable, but header has sizes...many more get/set functions lol), and have like: <header> <this stuff for "entries", say 200 spots reserved> and use the rest of a 1.44M floppy to store file data. so, you might only need like a tiny amount of fields for something simple like that
<clever> https://docs.broadcom.com/doc/12358545 is where you can find VideoCoreIV-AG100-R.pdf, that tells you how the whole 3d core works
<kof123> thats not really a priority, but should be possible
<clever> mrvn: so the question then, is how does the math for shading work, how can we implement it?
<mrvn> clever: for flat shading it's just a cross product of 2 sides to get a normal (normalize by length) and then a dot product to get the angle to the light source.
tejr has quit [Remote host closed the connection]
<mrvn> clever: that gets multiplied with the color.
<clever> mrvn: does this value vary over each pixel in a polygon, or is the entire polygon sharing one value?
<mrvn> If you interpolate the normals for the 3 vertices you get smoother lighting.
<mrvn> flat shading has every polygon as uniform color
<clever> so you can either pre-compute the normal map, and feed it in as a second texture
<clever> or you can feed just a few points in as varyings on the vertices
<clever> and interpolate across the polygon
<clever> but, from just the XYZ of a vertex, how do you know what angle the polygon is at?
<mrvn> clever: you need 3 points, 2 sides and then the cross product gives you the normal
xenos1984 has quit [Read error: Connection reset by peer]
<mrvn> 90° to both sides.
<clever> i'm not sure the vertex shader can do that, so you would have to pre-compute it in cpu first
<clever> i still need to get a working demo of vertex shaders as well
<clever> https://github.com/cleverca22/gl/blob/master/core.c#L231-L244 this is a very primitive opengl implementation, where each vertex has the 5 varyings i showed earlier, UV + RGB
<bslsk05> ​github.com: gl/core.c at master · cleverca22/gl · GitHub
<mrvn> the multiply vector ALU should do it
<clever> https://github.com/librerpi/lk-overlay/blob/master/platform/bcm28xx/v3d/v3d.c#L349-L356 and a far simpler shader, with just RGB in vary and nothing else
<bslsk05> ​github.com: lk-overlay/v3d.c at master · librerpi/lk-overlay · GitHub
<clever> the vertex shader, basically loads an `uint32_t attributes[attr_count][16]` into the vector registers, and then runs your vertex shader on a 16-lane vector core, computing 16 vertices in parallel
<clever> the shader must then fill that vector register bank, with x[16], y[16], w[16]?, vary[vary_count][16]
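To make the structure-of-arrays layout concrete, here is a hypothetical CPU model of one batch. Inputs arrive as `attributes[attr][lane]`, and outputs are written as rows that are 16 lanes wide. The attribute meaning (0 = x, 1 = y, 2..4 = varyings) and the simple scale-and-offset "shader" are assumptions for illustration. This does not model the real QPU register interface.

```c
#define LANES 16

/* Hypothetical batch outputs: one row per value, LANES wide. */
struct vs_batch_out {
    float x[LANES], y[LANES];
    float vary[3][LANES];
};

/* attributes[attr][lane] holds attribute `attr` of vertex `lane`.
   The lane loop stands in for all 16 lanes executing in lockstep. */
void run_vs_batch(float attributes[][LANES], int attr_count,
                  float scale, float off_x, float off_y,
                  struct vs_batch_out *out)
{
    for (int lane = 0; lane < LANES; lane++) {
        out->x[lane] = attributes[0][lane] * scale + off_x;
        out->y[lane] = attributes[1][lane] * scale + off_y;
        for (int v = 0; v < 3; v++)     /* pass varyings through, or 0 */
            out->vary[v][lane] = (2 + v < attr_count) ? attributes[2 + v][lane] : 0.0f;
    }
}
```

Note that each lane only sees its own vertex, which is why, as clever points out, the shader cannot see the other corners of a triangle.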
<mrvn> clever: you could compute 5 normals in parallel 5 * XYZ = 15.
<clever> but the shader isnt aware of which polygons are using each vertex
<clever> and a triangle is made up of 3 vertices, which come from 3 sets of attributes
<clever> the hw scheduler expects the 16 lane vector core, to produce 16 shaded vertices
<mrvn> yeah, first you have to get the 3 vertices for each triangle in clockwise rotation. Then compute 2 sides and build the cross product and last normalize.
<clever> but due to the primitive list, the vertices can be in any order
<mrvn> Might make sense to compute 16 triangles at a time.
<bslsk05> ​github.com: lk-overlay/v3d.c at master · librerpi/lk-overlay · GitHub
<clever> a triangle is made by putting the index of 3 vertices into this primtiveList
<clever> and you can just use any 3 vertices
<mrvn> that's normal. each vertex is also part of 3 or more triangles.
<clever> i dont know what the hardware does when your polygon is fragmented over multiple "pages" of 16
<clever> it might run the shader 3 times, producing 48 vertices, and throwing 45 of them into /dev/null
<clever> and then how can the vertex shader know what the other corners are?
<clever> unless you bake that into the attributes, and dont share a vertex
<mrvn> is the index only 0-15?
<clever> *looks*
<clever> the primitive list, is passed to the renderer thread as opcode 32
bnchs has joined #osdev
<clever> https://docs.broadcom.com/doc/12358545 page 68, opcode 32
<clever> index primitive list
<clever> you give it a 32bit maximum index, so it can fault upon corrupt data
<clever> a 32bit phys addr for the primitive list
<clever> a 32bit length, and a type of either 8bit or 16bit
<clever> so the index can be a 16bit int, 0 to 65535
<clever> and you also give it a primitive mode, points, lines, line_loop, line_strip, triangles, triangle_strip, and triangle_fan
<clever> ah yes, and this happens in the binning thread, not the rendering thread
<mrvn> So you can do 21k triangles without shared vertexes at a time.
<clever> the binning thread, is going to run your coordinate shader over things (that produces XY but no varys)
<clever> the binner will then figure out which on-screen tile the polygon is in, and generate some draw commands for those tile(s)
<clever> then the renderer thread, renders 1 tile at a time, operating on a subset of the polygons
<clever> where it will vertex shade, and fragment shade
<clever> and a tuning param, controls how far ahead the vertex shader stays
<clever> too far ahead, and you have fewer vector regs available
<mrvn> Note: Normaly your shapes are fixed so you compute normals once. If your light / object isn't moving you can even compute the brightness statically.
<clever> too far behind, and it keeps stalling to vertex shade
<clever> yeah, ive heard of some games baking the normal map into a dedicated texture
<mrvn> So I guess you make each triangle have 4 vertices: v0, v1, v2, normal
<clever> so you just feed the fragment shader 2 textures and a UV
<mrvn> 16k triangles at a time then
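The triangle counts above follow from the 16-bit index range. Without shared vertices each triangle needs 3 distinct vertices, or 4 with the extra normal vertex:

$$2^{16} = 65536\ \text{vertices}, \qquad \left\lfloor \frac{65536}{3} \right\rfloor = 21845\ \text{triangles}, \qquad \frac{65536}{4} = 16384\ \text{triangles}$$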
<clever> the vertex data itself...
<bslsk05> ​github.com: lk-overlay/v3d.c at master · librerpi/lk-overlay · GitHub
<kof123> i should call it duckfs -- does this field exist? ok you can use that feature
<bslsk05> ​github.com: lk-overlay/v3d.c at master · librerpi/lk-overlay · GitHub
<mrvn> where it gets real fun is shadows and mirrors.
<clever> mrvn: so line 240-241, sets up the whole vertex array (and shaders), then 245-249 says how to make polygons from that
<clever> in theory, you could just repeat those 2 blocks, for every 16k triangles
<mrvn> Z sorted
heat has quit [Remote host closed the connection]
<clever> there is depth testing going on somewhere
<clever> but i'm fuzzy on the details
<clever> i had also heard how mirrors work in a YT vid a few weeks ago, just draw the scene from the viewpoint of the mirror, using a transformation matrix, and a stencil
<mrvn> clever: and then set that as texture. That's the general hack.
heat has joined #osdev
xenos1984 has joined #osdev
<clever> the method i saw didnt use it as a texture, but drew right into the final framebuffer
<clever> and used the stencil to limit where it could draw, and let depth testing do the rest
<clever> also, using the 3d output as a texture, requires an extra step
<mrvn> clever: that just combines the 2 passes into one then
<clever> yeah
<bslsk05> ​github.com: lk-overlay/v3d.c at master · librerpi/lk-overlay · GitHub
<mrvn> But you might want the mirror to also be a (textured) light source.
<clever> this sets up the format and physical addr of the output frame, this code is doing a linear bitmap image
<clever> but the texture core cant accept linear images!
<mrvn> or you transform all the light sources to their mirrored equivalents.
<clever> if you then look up opcode 113 in the pdf, youll find that bits 70:71 are the memory format: linear, t-format, lt-format
<clever> change that to t-format, and then you can happily use the output as a texture
<clever> oh, and there was another thing i have yet to try using...
<clever> page 72 in the pdf
<clever> "vg inline primitives"
<clever> you can skip the entire vertex index layer, and just put vertex data directly into the control list
<clever> for every 3 vertices, it makes 1 triangle, feed it vertex data until you exhaust all ram, lol
<clever> the hardware also accepts 3 types of shader records
<clever> the "gl shader state record" contains the addresses for the coordinate shader, vertex shader, fragment shader, and the strides and bitmasks for all the attributes
<mrvn> I want my blitter back from my Amiga. That was so much simpler.
<clever> the "nv shader state record" takes a fragment shader, and the shaded vertex data addr
<clever> if you just want to blit, the dma core on the rpi can already do that, it has 2d dma modes
<clever> its basically just a memcpy in a loop, copy X bytes, increment src by Y, increment dst by Z, do it I times
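The 2D DMA mode as described can be modeled as a strided copy loop. This is a software model only, not driver code. Here the per-row increments are taken as the full distance between row starts. The BCM2835 peripherals document describes its stride registers as an extra increment applied at the end of each row, so on that hardware the register value would be `step - x_bytes`.

```python
def dma_2d_copy(mem, src, dst, x_bytes, src_step, dst_step, rows):
    """Copy `rows` rows of `x_bytes` bytes each within the bytearray `mem`.

    src_step / dst_step: distance between the starts of consecutive rows.
    """
    for _ in range(rows):
        mem[dst:dst + x_bytes] = mem[src:src + x_bytes]
        src += src_step
        dst += dst_step
```

For example, copying a 2x2 tile out of a 4-byte-wide "framebuffer" into a packed 2-byte-wide buffer: `dma_2d_copy(mem, src=5, dst=16, x_bytes=2, src_step=4, dst_step=2, rows=2)`.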
<mrvn> draw the edges with bresenham, run the blitter over the rect to fill the polygon and then again to copy it into the bitmaps as the right color.
<mrvn> clever: the blitter could fill an area turning on/off every time it hits a set bit.
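The fill mrvn describes (toggle "inside" on every set bit) can be modeled per scanline as below. This is a simplified inclusive-fill sketch, not a register-level model of the Amiga blitter. It only works if each edge contributes exactly one set pixel per scanline crossing; a run of several set pixels would toggle the state several times.

```python
def fill_row(row):
    """Inclusive area fill of one scanline: the fill state toggles on every set pixel."""
    out, inside = [], 0
    for bit in row:
        if bit:
            inside ^= 1
        out.append(inside | bit)   # edge pixels stay set, interior gets filled
    return out

def area_fill(bitmap):
    """Apply the scanline fill to every row of a 1-bit bitmap."""
    return [fill_row(r) for r in bitmap]
```

An outline with edges at x=2 and x=5 fills to `[0, 0, 1, 1, 1, 1, 0]`; the filled mask can then be copied into the bitplanes as the target color.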
<clever> ah, thats something that the pi cant really do, that i know of
<nikolar> what sort of hardware acceleration is there for 2d anyway
<mrvn> nikolar: copying. everything else is 3D
<nikolar> yeah thought so
<nikolar> basically just blitting
<clever> nikolar: the 2d core on the rpi can composite a large number of sprite like layers, and do scaling, alpha blending, and pixel format conversions
<mrvn> you have DMA and GPU
<nikolar> and video decode too i imagine
<clever> but the 2d core cant do skew or rotation, only axis flips
<mrvn> nikolar: no, codecs are extra code. video is too complex.
<clever> yeah
<nikolar> clever: yeah compositing is really useful for windowing stuff
<mrvn> nikolar: not powerful enough though to composite on the fly every frame.
<nikolar> scratch that then
<mrvn> nikolar: you give every window a framebuffer and then composite them into a global framebuffer and display that.
<clever> mrvn: are you sure? https://www.youtube.com/watch?v=JFmCin3EJIs
<bslsk05> ​'Chaos, 13 sprites randomly bouncing around' by michael bishop (00:00:12)
<nikolar> i am really new to gpu acceleration so sorry for dumb questions
<mrvn> and hopefully not too much changes every frame.
<mrvn> clever: I have more than 290 windows.
<clever> mrvn: for the rpi's 2d core, there is no global framebuffer, it composites on the fly, and is constantly racing the (virtual) electron beam
<clever> ah yeah, with that many, you would need to use the offline composition and global framebuffer options
<mrvn> clever: yes, but as said, not powerful enough.
<clever> the 2d core can just be copied back to ram with dma
<clever> and then you can do multiple passes
<clever> that 290 limit is also assuming you want to pageflip between 2 frames, each of 290
<nikolar> you can use the gpu to draw only the differences i imagine
<mrvn> Usually you also don't have many windows visible and changing. So you only have to composite small rectangles of the overall screen each frame.
<nikolar> is there a way you can save the output
<clever> if your displaying 1 frame, and then doing offline composition, the limit is more like 500
<mrvn> nikolar: yes
<mrvn> 00:24 < clever> the 2d core can just be copied back to ram with dma
<clever> nikolar: thats what the offline composition is doing
<nikolar> ah sorry, missed the message
<clever> offline composition is also the only way to do a 90 degree (axis swap) rotation
<nikolar> is virtio-gpu representative of how actual hardware works
<mrvn> nikolar: you could also do it dynamically. Check out how many non overlapping (or just a few overlapping) rectangles you have and if it's < 290 you composite dynamically direct to the video out.
<nikolar> mrvn: but if it's more then you'd have to cache
<mrvn> nikolar: yes. and then you would sort windows by how much they change and combine the non-changing ones first.
<clever> mrvn: but if you want to use offline composition, you can still have the hw do ~500 layers in a single batch, and then you still have enough for 40+40 to pageflip between
<clever> so it could draw all of the idle windows in one batch, then use that like a wallpaper behind the 39 most active windows
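The policy discussed here (composite directly if everything fits, otherwise bake the idle windows into one background and keep the busiest ones live) can be sketched as follows. The 290 figure is the one quoted in the chat, not a datasheet value, and the sketch assumes the baked windows can be treated as sitting behind the live ones, as in the "wallpaper" description; real z-order would need checking.

```python
HW_LAYER_LIMIT = 290   # per-frame layer figure quoted in the chat

def plan_composition(windows, limit=HW_LAYER_LIMIT):
    """windows: list of (name, n_rects, changes_per_sec) tuples.

    Returns (background, live): windows to bake into one offline-composited
    background layer, and windows to keep as live hardware layers.
    """
    if sum(n for _, n, _ in windows) <= limit:
        return [], list(windows)          # everything fits: composite on the fly
    # Most active first; keep them live while the budget allows,
    # reserving one layer for the baked background itself.
    ordered = sorted(windows, key=lambda w: w[2], reverse=True)
    live, used = [], 1
    for w in ordered:
        if used + w[1] > limit:
            break
        live.append(w)
        used += w[1]
    return ordered[len(live):], live
```

The background only needs re-baking when one of its windows changes, which is why sorting by change rate pays off.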
vdamewood has joined #osdev
<mrvn> exactly what I said
<clever> yep
<nikolar> yeah that's what i was thinking
<mrvn> Splitting windows into exposed rectangles in some clever way probably gets you below 290 in almost all cases.
<nikolar> isn't that what xorg does
<clever> splitting like that, also drastically reduces the resource usage for drawing
<clever> drawing over a pixel multiple times is costly, and can bring the limit below 20
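Splitting windows into exposed rectangles can be done with plain rectangle subtraction. Each window minus the windows above it yields non-overlapping pieces, so no pixel is drawn twice. The resulting rectangle count is what would be compared against a hardware layer budget. This is a minimal sketch with hypothetical names; it handles opaque windows only, not transparency or shaped windows.

```python
from typing import List, NamedTuple

class Rect(NamedTuple):
    x0: int
    y0: int
    x1: int  # exclusive
    y1: int  # exclusive

def subtract(a: Rect, b: Rect) -> List[Rect]:
    """Parts of a not covered by b, as up to 4 non-overlapping rectangles."""
    if b.x0 >= a.x1 or b.x1 <= a.x0 or b.y0 >= a.y1 or b.y1 <= a.y0:
        return [a]                                          # no overlap
    out = []
    if b.y0 > a.y0:
        out.append(Rect(a.x0, a.y0, a.x1, b.y0))            # band above b
    if b.y1 < a.y1:
        out.append(Rect(a.x0, b.y1, a.x1, a.y1))            # band below b
    top, bot = max(a.y0, b.y0), min(a.y1, b.y1)
    if b.x0 > a.x0:
        out.append(Rect(a.x0, top, b.x0, bot))              # left of b
    if b.x1 < a.x1:
        out.append(Rect(b.x1, top, a.x1, bot))              # right of b
    return out

def exposed(windows: List[Rect]) -> List[List[Rect]]:
    """windows[0] is topmost; returns the visible pieces of each window."""
    result = []
    for i, w in enumerate(windows):
        pieces = [w]
        for above in windows[:i]:
            pieces = [p for piece in pieces for p in subtract(piece, above)]
        result.append(pieces)
    return result
```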
dutch has quit [Quit: WeeChat 3.8]
<mrvn> nikolar: xorg does have exposed rectangles. It tells you what parts of your window become visible.
<mrvn> But if you have transparency or shape extension this gets rather ugly.
<nikolar> yeah that's why you have xorg compositors as external things :)
<mrvn> You can easily stack 100 terminals with transparency on top of each other. Have fun rendering that.
<mrvn> .oO(if you have 50% transparency you can stop after 8 though)
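The "stop after 8" figure follows from 8-bit channels. Compositing front to back, whatever lies below n layers of 50% opacity is weighted by 0.5^n. At n = 8 that is 1/256, so it can no longer move a 0..255 channel by a full step. A small sketch of that count, assuming every layer has the same opacity:

```python
def layers_needed(alpha, max_value=255):
    """Front-to-back over a stack of layers with the same opacity `alpha`:
    count layers until whatever is still visible through the stack can no
    longer change an 8-bit channel by a full step."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    transmittance, n = 1.0, 0
    while transmittance * max_value >= 1.0:
        transmittance *= 1.0 - alpha
        n += 1
    return n
```

`layers_needed(0.5)` gives 8, matching the remark; fully opaque layers give 1.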
<nikolar> i have transparency on my terminal and there's actually a noticeable difference in battery life when it's enabled and when it's disabled
<mrvn> I find that horrible to read. Doesn't gain you anything.
<mrvn> I don't even like a background image. Too easy to reduce the contrast of the text.
<clever> the 2d core can draw 4 pixels per clock, and runs at ~500mhz, 2 billion pixels/second, 1280x1024@60 is 78 million pixels/second, so the rpi hardware can composite ~25 frames, each 1280x1024, with per-pixel alpha, and do it at 60fps
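The fill-rate arithmetic above, spelled out with the figures as quoted (4 pixels/clock, ~500 MHz; neither is a measured value here):

```python
PIXELS_PER_CLOCK = 4
CLOCK_HZ = 500_000_000                      # "~500 MHz" as quoted

fill_rate = PIXELS_PER_CLOCK * CLOCK_HZ     # 2_000_000_000 pixels/s
per_layer = 1280 * 1024 * 60                # 78_643_200 pixels/s for one full-screen layer
full_screen_layers = fill_rate / per_layer  # about 25.4
```

Overdraw eats directly into this budget: every full-screen layer that is blended rather than skipped costs another 78.6 million pixels per second.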
<nikolar> it wasn't that noticeable, and i did like how it looked, but i wanted to tune my laptop for maximum efficiency :)
<mrvn> turn down brightness. :)
<nikolar> that helps too
<clever> the only reason i have the compositor enabled, is so i dont notice the laggy redraws in chrome
<mrvn> also keep the desktop which with little black.
<nikolar> hardware decode also helps, which i didn't realise wasn't enabled
<mrvn> s/which/white/
<clever> due to every window having its own off-screen buffer, i can switch to it instantly, without having to wait for a repaint
<clever> that, and the preview alt+tab shows
<nikolar> mrvn: can't really do light theme though lol
<clever> all the fancy effects are disabled
<mrvn> nikolar: why not?
<nikolar> can't stand it lol
<mrvn> nikolar: too bad. TFTs work by having a bright light and then blocking it to make pixels darker. That blocking takes power.
<nikolar> i know that, but there are some things i just find easier to look at
<nikolar> one of those being dark themes
<mrvn> I think there are some screens that can control the LED light for regions of the screen. So if a large area is dark it dims the LEDs.
<clever> mrvn: my cellphone has an oled display, and its freaky how dark it can go
<clever> every time a loading screen makes it go black, it looks like the phone just died
<mrvn> clever: sucking in light?
<clever> there is no difference between "black screen" and "just off"
<clever> so it can be hard to tell if its even on sometimes
<mrvn> clever: are you sure black screen isn't detected and really is just off?
<mrvn> ever had a black screen with 1 white pixel?
<nikolar> maybe oled phones need a small led somewhere so you know it's not off
<mrvn> Hmm, aren't oleds actually producing light in the right color?
<clever> i believe this is just 3 LED's in each pixel, r/g/b
<clever> so black is just turning all of those LED's off
<mrvn> exactly. So black is just power off.
<clever> yep
<clever> the control circuits are still on, but you cant see those
<mrvn> On TFTs white is power off.
<clever> also on lcd, black is a dim grey
<clever> so you can tell that its on, even when its "black"
<mrvn> And the pixels can't block 100% of the light so black is actually just dark.
<clever> yep
<mrvn> When you setup the wall for the home projector or in the cinema before the film starts what color does it have?
<clever> ideally, something pure white?
<clever> and decently reflective
<mrvn> Nope. that's black. You aren't putting any color on it yet. :)
<clever> but retroreflective is bad
<clever> ah yeah, you mean when the projector light is off
<mrvn> Black in the film is just the color of your ambient light. Doesn't get really black. Just like TFTs.
<clever> and thats where you want to control the lighting in the room
<clever> avoid pointing it towards the screen
<clever> and make the floors/chairs absorb all light
<mrvn> Are there any oled cinemas yet?
<clever> not heard of any
<mrvn> What I have seen though is "green screens" for movies that are actually displays. So they display the background in real time while filming and not in post processing.
<mrvn> Avoids the border effects where the actors fade into the green screen.
<clever> i was watching a kyle hill livestream a few days ago
<clever> and i noticed a green tint to his hair
<clever> the hair is semi transparent, and the greenscreen was bleeding thru between the strands
<mrvn> and because it's a mix of hair and green it doesn't trigger the replacement.
<clever> yep
<clever> ideally, you should detect shades of green, and make those pixels partially transparent
<mrvn> or when it gets too thin it replaces and you have hair loss
<clever> but then you cant have any shades of green on the actor
<mrvn> clever: now the fern on the desk is transparent.
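The "shades of green become partially transparent" idea can be sketched as a soft chroma key. The greenness measure (green minus the larger of red and blue) and both thresholds are hypothetical choices for illustration, not taken from any particular keyer. Semi-transparent hair lands between the thresholds and blends, while a saturated green fern crosses the upper threshold and disappears, which is exactly the trade-off raised above.

```python
def key_alpha(r, g, b, low=20, high=80):
    """Opacity for an 8-bit pixel: 1.0 keeps it, 0.0 replaces it with the background.

    Pixels whose greenness lies between `low` and `high` get a linear ramp
    of partial transparency instead of a hard on/off decision.
    """
    greenness = g - max(r, b)
    if greenness <= low:
        return 1.0
    if greenness >= high:
        return 0.0
    return 1.0 - (greenness - low) / (high - low)
```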
<clever> those projector setups avoid that issue, and also fix lighting issues
<clever> you dont want the actor to have a green face, because of all of the green light the screen is reflecting
<clever> and you want the actor to be lit up by the lights in the scene
<mrvn> clever: also produce lighting issues since you see the shadows on the wall.
<mrvn> you have to be careful that you still replace the shadowed green on the floor
<clever> yeah, you need to adjust the thresholds, so it deletes all of the screen
<clever> but the underside of the giraffe also looks a tad green?
<mrvn> Bad if you get those on screens on the walls. Haven't heard of screens for the floor yet.
<clever> yeah, then you dont have to deal with it as much
<mrvn> indeed.
<mrvn> .oO(green screen was also something Amigas did in hardware.)
<clever> yep
<clever> genlock i think it was called?
<clever> ive heard of even the c64 doing it?
<mrvn> You had 6 (later 8) bitmaps forming a palette index. Or you could split that into 2 parts and one color of the first lookup is transparent.
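Splitting the 6 bitplanes into two 3-bit lookups with one transparent color behaves roughly like the Amiga's dual-playfield mode. This sketch takes the two 3-bit indices directly and assumes the back playfield's colors live at `palette[8:16]`; on the real chipset the split is by odd and even planes and playfield priority is selectable, neither of which is modeled here.

```python
def dual_playfield(front, back, palette):
    """Resolve one pixel from two 3-bit playfield indices.

    Index 0 in the front playfield is transparent, so the back playfield
    shows through; if both are 0 the background color palette[0] is used.
    """
    if front != 0:
        return palette[front]
    if back != 0:
        return palette[8 + back]
    return palette[0]
```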
<clever> i recently figured out why they are doing that
dutch has joined #osdev
<mrvn> what?
<clever> why it used bit planed images
<mrvn> easier to deal with different color depths that way.
<clever> compresses far better
<clever> https://i.imgur.com/o2Px3Qq.png this from a YT channel talking about gameboy games and how the encoding works
<mrvn> that works for image files but rather irrelevant for display hardware.
<clever> and with all of the variation in shading, that doesnt compress nicely
<clever> https://i.imgur.com/HppxKEy.png but this is then bit0 and bit1 as separate images
<clever> and now it can RLE encode much more easily
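The plane split plus a plain run-length encoder, so the runs in each plane can be inspected. This is a generic RLE for illustration, not the encoding used by any Game Boy title. Whether splitting wins overall depends on the image and on how runs are coded. In the sample row below, plane 1 collapses to a single run while plane 0 still carries the shading detail.

```python
def to_planes(pixels, depth):
    """Split packed pixel values into `depth` 1-bit planes (plane 0 = bit 0)."""
    return [[(p >> bit) & 1 for p in pixels] for bit in range(depth)]

def rle(values):
    """Run-length encode a sequence as (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

row = [3, 2, 3, 3, 2, 3, 2, 2]        # 2bpp "shading" that alternates 2 and 3
planes = to_planes(row, 2)            # planes[1] is all ones
```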
<mrvn> clever: How would you build image in memory if you have 3 bit per pixel?
<clever> 3 separate 1bpp images
<mrvn> Would be ugly to deal with (2,0) having 2 bit in the first byte and 1 bit in the second.
<mrvn> So yeah, bitmaps are far easier to deal with.
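mrvn's addressing point in code form: with packed 3bpp pixels, pixel 2 occupies bits 6..8 and straddles two bytes, while in planar layout every pixel is one bit at the same position in each plane. The bit orders below are assumptions: packed pixels fill from bit 0 upward, and planar bytes are MSB-first (leftmost pixel in bit 7), as on the Amiga.

```python
def packed_bit_span(x, bpp):
    """Byte indices touched by pixel x in a packed row of `bpp` bits per pixel."""
    first_bit = x * bpp
    last_bit = first_bit + bpp - 1
    return list(range(first_bit // 8, last_bit // 8 + 1))

def planar_location(x):
    """(byte, bit) of pixel x within each plane, MSB-first."""
    return x // 8, 7 - x % 8
```

`packed_bit_span(2, 3)` returns `[0, 1]`, the split read-modify-write case, while the planar lookup never crosses a byte boundary regardless of depth.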
<clever> do you remember the list of pixel formats the rpi 2d core accepts?
<bslsk05> ​github.com: lk-overlay/hvs.h at master · librerpi/lk-overlay · GitHub
<clever> its this whole enum, and the palette format accepts the entire 1bpp to 8bpp range
<mrvn> The Amiga hardware had another cool feature. You could set 6 bitmaps and 00xxxx would select from the palette. 01xxxx would take the last pixel and replace the red component with xxxx. 10 for green and 11 for blue.
<mrvn> Later it was 8 bitmaps with a 64-color palette for the same encoding.
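The hold-and-modify scheme described above, as a scanline decoder for the 6-bitplane version. Note that the commonly documented HAM6 assignment of the two control bits is 00 = palette, 01 = modify blue, 10 = modify red, 11 = modify green, and that is the order used here. Starting each line from a fixed `border` color is a simplification.

```python
def ham6_decode(row, palette, border=(0, 0, 0)):
    """Decode one scanline of 6-bit HAM pixels into 4-bit (r, g, b) triples.

    palette: 16 (r, g, b) entries. Bits 5-4 of each pixel are the control
    bits; bits 3-0 are either a palette index or a new component value.
    """
    out, prev = [], border
    for p in row:
        ctrl, val = (p >> 4) & 3, p & 0xF
        r, g, b = prev
        if ctrl == 0:
            prev = palette[val]        # set from palette
        elif ctrl == 1:
            prev = (r, g, val)         # hold, modify blue
        elif ctrl == 2:
            prev = (val, g, b)         # hold, modify red
        else:
            prev = (r, val, b)         # hold, modify green
        out.append(prev)
    return out
```

The 8-bitplane variant works the same way with 6 data bits and the 64-entry palette.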