klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
<klange> Oh, stb_truetype doesn't do instruction processing beyond basic shape extraction... if I can get that my lib might actually be useful outside of just me...
<geist> it has a little interpreted bytecode? i guess i knew that but never looked at it
<geist> is it a stack based thing?
<klange> yeah
<klange> i think it has some registers, and they are adorably font-specific things
<Skyz> Klange: Font is an OS specific format
<Skyz> So you have to have some way to implement it in the OS
<klange> Fonts are not an "OS specific format" and I do not have the patience this morning to explain to you just how off-base saying that to me is.
<Skyz> I'm just looking through the files in graphics right now
<vdamewood> Is 'Font' the name of a format I'm not familiar with, or is this actually about fonts such as .otf, .ttf, and such?
<Skyz> .ttf .otf and such
<vdamewood> Freetype supports both .otf and .ttf. So, if your OS supports libraries, your OS supports .ttf and .otf.
<kazinsal> I feel like I put someone on ignore and made the right decision because this conversation is missing chunks
<klange> vdamewood: somehow you've managed an even worse take than Skyz :P
<kazinsal> Oh. OH. That explains it
<vdamewood> klange: And I wasn't even trying that hard.
<vdamewood> klange: I'm actually kind of curious what's bad about my take.
<vdamewood> Did I miss something?
<klange> You missed that you were talking to me, king of NIHing :)
<klange> [hacked my ongoing work into an SDL app because waiting for VMs to boot to iterate was annoying] That's a pretty nice looking 'd', right? https://klange.dev/s/Screenshot%20from%202021-07-02%2010-22-46.png
<gog> oh
<kazinsal> FreeType also happens to be designed to cover basically every use case whereas in a hobby OS it makes more sense to cover the use cases that you and your software actually need
<klange> I want more than just basic glyph shapes (stb_truetype) but not necessarily all of the features of FreeType, and most importantly... I want to write my own.
<bslsk05> ​xkcd - The General Problem
<vdamewood> Well, I didn't mean to imply that an osdever should actually use FreeType, just point out how positively trivial it is to support the most common formats.
<gog> i feel attacked
<gog> i get nothing done because i need to cover literally every scenario
<vdamewood> gog: Want a fishy?
<gog> yes.
* vdamewood gives gog a fishy.
* gog chomps
<klange> https://klange.dev/s/Screenshot%20from%202021-07-02%2010-33-37.png okay I'm reasonably content with this so far...
<gog> snowman!
<gog> what's that green line?
<pony> klange: oh, that's pretty!
<klange> it's a drawing tool that's loading up basic glyph points from a font to start off, the green line indicates next edge to mouse cursor; right click does a move-to, left click does a line-to, I hacked up curves with fixed subdivisions for now
<klange> (normally you'd do something smart like tessellate until the midpoint is within an error range)
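The "tessellate until the midpoint is within an error range" approach can be sketched as recursive de Casteljau subdivision of a quadratic Bézier (TrueType outlines are built from quadratic curves). This is a generic illustration, not klange's code. The `pt` and `polyline` types, the squared tolerance `tol2`, and the recursion depth cap are assumptions made for the example.

```c
#include <stddef.h>

typedef struct { float x, y; } pt;

/* Hypothetical fixed-capacity output buffer for the flattened outline. */
typedef struct { pt *pts; size_t len, cap; } polyline;

static void emit(polyline *out, pt p) {
    if (out->len < out->cap) out->pts[out->len++] = p;
}

/* Flatten the quadratic Bezier p0-p1-p2 into line-to points. p0 is assumed to
 * be emitted already by the caller. tol2 is the allowed squared distance
 * between the curve midpoint and the chord midpoint; depth caps recursion. */
static void flatten_quad(polyline *out, pt p0, pt p1, pt p2, float tol2, int depth) {
    /* de Casteljau split at t = 0.5 */
    pt a = { (p0.x + p1.x) * 0.5f, (p0.y + p1.y) * 0.5f };
    pt b = { (p1.x + p2.x) * 0.5f, (p1.y + p2.y) * 0.5f };
    pt m = { (a.x + b.x) * 0.5f, (a.y + b.y) * 0.5f };     /* curve point at t = 0.5 */
    pt c = { (p0.x + p2.x) * 0.5f, (p0.y + p2.y) * 0.5f }; /* chord midpoint */
    float dx = m.x - c.x, dy = m.y - c.y;
    if (depth == 0 || dx * dx + dy * dy <= tol2) {
        emit(out, p2);              /* flat enough: one line-to */
        return;
    }
    flatten_quad(out, p0, a, m, tol2, depth - 1);  /* first half */
    flatten_quad(out, m, b, p2, tol2, depth - 1);  /* second half */
}
```

Unlike fixed subdivision, nearly straight curves collapse to a single segment while tight curves get as many as they need.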
<klange> so if I click a few times it'll insert more vertices: https://klange.dev/s/Screenshot%20from%202021-07-02%2010-38-42.png
<gog> ah neat so a crude vector drawing thing
<klange> it also selects a new random color each time
<klange> yeah
<klange> The antialiasing can still use some work, but I'm happy with it for now - grid fitting / hinting will have a far more visible effect.
<gog> :)
<klange> Then I'll need to actually parse metrics, kerning pairs, etc. and we'll have a nice little text rendering system and I can finally throw away the SDF renderer and get some semblance of basic Unicode support back~
<moon-child> sdf--signed distance fields?
<moon-child> isn't that a gpu rendering technique?
<kazinsal> iirc the GPU rendered SDF font technique was somewhat pioneered by Valve but you can do it in regular software too
<klange> It's pretty easy to implement and produces pretty solid results, so I banged one out as my first in-house antialiased text renderer.
<klange> take any of my random screenshots from the last couple of years and the text there is with the SDF renderer and some baked ASCII-only DejaVus: https://klange.dev/s/Screenshot%20from%202021-06-23%2021-04-23.png
<klange> But baking is expensive, the final textures are still much larger per-glyph than the original sources, and I don't really care to add all of the features necessary to make it a full general-purpose text system.
<klange> So it's time to sunset it and get this TrueType implementation in there instead.
<clever> klange: ignoring the scaling requirements, how would you compare one big texture like https://gallery.earthtools.ca/v3d2/arial.png ?
<klange> SDF is a significant improvement over a baked bitmap texture.
<klange> You can use a much smaller texture with distance fields to get the same quality _and_ with SDF you get viable scaling.
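A common way to turn a distance-field sample into pixel coverage is a linear ramp centred on the outline value. This is a minimal sketch, not klange's renderer. It assumes the bake stores distance remapped to 0..1 with 0.5 on the glyph edge (inside above 0.5). `sdf_coverage` and its `spread` parameter are hypothetical; a renderer would derive `spread` from the current scale factor, or use a smoothstep instead of a linear ramp.

```c
/* Map one distance-field sample to coverage in [0, 1].
 * spread: how much the stored value changes across one output pixel at the
 * current scale (smaller spread = sharper edge). */
static float sdf_coverage(float sample, float spread) {
    float a = (sample - 0.5f) / spread + 0.5f;  /* 0.5 exactly on the outline */
    if (a < 0.0f) a = 0.0f;                     /* well outside the glyph */
    if (a > 1.0f) a = 1.0f;                     /* well inside the glyph */
    return a;
}
```

Because the edge is reconstructed from distances rather than stored pixels, the same texture serves every size, which is the scaling advantage over a baked bitmap.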
<clever> yeah, vector vs bitmap
<clever> in this image, i think its 100% solid white
<clever> the only information, is in the alpha layer
<clever> which is only 1 bpp, essentially (havent confirmed the actual encoding)
<klange> let me get one of my SDF bakes into a visible format - they're PNGs but github is confused by my choice of file extension.
<klange> https://klange.dev/s/sdf_thin.png this is a horribly wasteful bake of DejaVu. It's at a slightly bigger size than that Arial bitmap, but this is the sole source of all of the different sizes of text in my last screenshot [except the "Jun 23" that's the bold font]
<klange> With SDF we can scale _down_ and still get viable shapes and do nice anti-aliasing.
<clever> for context, i found that arial.png in wowmapviewer
<clever> when your rendering multi-gigabit 3d map files, the size of a font matters much less
<clever> gigabyte*
<clever> this is a final render coming out of it
<klange> A good implementation would do rect-packing, a very good implementation would do multi-dimension distance vectors which make for super crisp hard edges, and if I were to invest time in this the next steps would have been to add glyph mappings for Unicode, x-advance and kerning tables, etc.
<klange> (The x-advance is currently a separate config file with each letter and its width for the base size written out, by hand)
<klange> (With some obvious mistakes, especially in the bold font)
<bslsk05> ​github.com: wowmapviewer/font.cpp at master · cleverca22/wowmapviewer · GitHub
<clever> looks like wowmapviewer's x-advance, was simply width + 2
<klange> That works fine if you have the width, which you do if you're rect-packing :)
<klange> My thing only did grid layout; easy lookup, but no width information stored in it.
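The simplest version of the rect-packing mentioned above is a shelf packer: glyphs are placed left to right, and a new row ("shelf") starts when the current one is full. This is an assumed illustration, not what either renderer does. It records each glyph's x/y/w/h, like the text file beside wowmapviewer's PNG, so an advance can be derived from the width; `x_advance` mirrors the width + 2 rule clever found in wowmapviewer. All type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct { uint16_t x, y, w, h; } glyph_rect;

typedef struct {
    uint16_t atlas_w, atlas_h;     /* atlas size in pixels */
    uint16_t cur_x, cur_y;         /* next free spot on the current shelf */
    uint16_t shelf_h;              /* tallest glyph on the current shelf */
} shelf_packer;

/* Place a w x h glyph. Returns 1 and fills *out on success, 0 if it won't fit. */
static int shelf_pack(shelf_packer *p, uint16_t w, uint16_t h, glyph_rect *out) {
    if (w > p->atlas_w) return 0;
    if (p->cur_x + w > p->atlas_w) {   /* row full: open a new shelf below */
        p->cur_y = (uint16_t)(p->cur_y + p->shelf_h);
        p->cur_x = 0;
        p->shelf_h = 0;
    }
    if (p->cur_y + h > p->atlas_h) return 0;
    out->x = p->cur_x; out->y = p->cur_y; out->w = w; out->h = h;
    p->cur_x = (uint16_t)(p->cur_x + w);
    if (h > p->shelf_h) p->shelf_h = h;
    return 1;
}

/* Advance rule seen in wowmapviewer: glyph width plus 2 pixels. */
static int x_advance(const glyph_rect *r) { return r->w + 2; }
```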
<clever> there is a text file, giving the xywh of each glyph in the .png
<clever> but with a vector file, its more fuzzy, and you can allow overlap
<clever> a vector file also allows you to not store gaps
<clever> with what wowmapviewer is doing, if you want more padding on a glyph, you need to include dead space in the texture and width
<doug16k> clever, I used an alpha-only texture format for my bitmap font renderer texture atlas. 8 bits per pixel
<bslsk05> ​andryblack/fontbuilder - Bitmap font generator (89 forks/393 stargazers/MIT)
<doug16k> then just instanced rendering to draw all glyphs on the screen in one call
<doug16k> makes text take a negligible amount of gpu time
<clever> thinking about the rendering cost some....
<clever> lets say we are working with a fixed-width font, like a VGA console, 8 pixels wide, 16 pixels tall
<doug16k> you can boil it down to just giving it an actual array of characters and positions, then the vertex shader looks up the texcoord and passes them down. fragment shader does trivial texture fetch and multiplies that value with the color
<clever> ahh, you're going one step further than i've done
<doug16k> or does color*pixel+existing_color*(1-pixel)
<doug16k> then the alphas are really just pixel coverage values
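doug16k's blend, color*pixel + existing_color*(1-pixel), with an 8-bit coverage value instead of a 0.0..1.0 float, looks like this in integer arithmetic. A minimal sketch; the packed 0xRRGGBB format and the function names are assumptions.

```c
#include <stdint.h>

/* out = src*a + dst*(1-a) for one 8-bit channel, a = coverage in 0..255.
 * Adding 127 before dividing by 255 rounds to nearest. */
static uint8_t blend_channel(uint8_t src, uint8_t dst, uint8_t a) {
    return (uint8_t)((src * a + dst * (255 - a) + 127) / 255);
}

/* Blend packed 0xRRGGBB colours channel by channel. */
static uint32_t blend_rgb(uint32_t src, uint32_t dst, uint8_t a) {
    uint32_t out = 0;
    for (int shift = 0; shift <= 16; shift += 8) {
        uint8_t s = (uint8_t)((src >> shift) & 0xff);
        uint8_t d = (uint8_t)((dst >> shift) & 0xff);
        out |= (uint32_t)blend_channel(s, d, a) << shift;
    }
    return out;
}
```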
<clever> if i was to do it without the gpu, then each glyph on screen, involves a 1 byte read (the char itself), then a 128 pixel copy
<moon-child> doug16k: if you're on gpu, you have unlimited memory b/w, so does squishing to 8bpp make sense?
<moon-child> I guess you can lut so it's not a big deal
<clever> lets assume i'm using RGB565, so 16bit color
<doug16k> it tells you what proportion, from 0.0 to 1.0, of a mix
<doug16k> you broadcast the value you got. why make it waste memory bandwidth?
<clever> that means each glyph needs 256 bytes of writes!!
<clever> yeah, i can see how things scale rapidly, and need gpu accel
<doug16k> do you mean software rendering?
<clever> did the math on software rendering first, so i have something to compare to
<doug16k> to go fast with software text rendering, the key is to go as far across each scanline as possible and make the stores contiguous bursts
<doug16k> let the reads be more scattered so the stores can be more contiguous
<clever> yeah, because those 256 bytes of writes are really 16 separate bursts of 16 bytes each
<doug16k> I mean go across glyphs as much as you can
<clever> i could improve that burst by doing it one scanline at a time, yeah
<doug16k> Hello would render scanline 0 of H e l l o then scanline 1 of H e l l o, etc
<clever> then i have to read each glyph from the text buffer 16 times, but the writes are better
<clever> and write combining will like that
<doug16k> yeah, the idea is to make it do bursts
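The "Hello" ordering (scanline 0 of every glyph, then scanline 1, and so on) can be sketched for the 8x16 VGA-style font and RGB565 framebuffer assumed above. The 1bpp font layout (16 bytes per glyph, bit 7 is the leftmost pixel) and the function signature are assumptions. Each glyph still writes 8 x 16 x 2 = 256 bytes, but each framebuffer row is now one contiguous run across the whole string, at the cost of re-reading the text once per scanline.

```c
#include <stddef.h>
#include <stdint.h>

#define GLYPH_W 8
#define GLYPH_H 16

/* fb: RGB565 framebuffer, pitch in pixels. font: 256 glyphs x 16 row bytes,
 * stored as font[ch * GLYPH_H + row]. */
static void draw_text_scanline_order(uint16_t *fb, size_t pitch, size_t x0, size_t y0,
                                     const uint8_t *font, const char *text, size_t len,
                                     uint16_t fg, uint16_t bg) {
    for (size_t row = 0; row < GLYPH_H; row++) {           /* outer: scanline */
        uint16_t *dst = fb + (y0 + row) * pitch + x0;
        for (size_t i = 0; i < len; i++) {                 /* inner: characters */
            uint8_t bits = font[(size_t)(uint8_t)text[i] * GLYPH_H + row];
            for (int col = 0; col < GLYPH_W; col++)        /* 8 contiguous stores */
                *dst++ = (bits & (0x80 >> col)) ? fg : bg;
        }
    }
}
```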
<clever> now, if we switch gears, to using the 3d core of the rpi (since i know its internals well)
<bslsk05> ​github.com: wowmapviewer/font.cpp at master · cleverca22/wowmapviewer · GitHub
<clever> lets assume we are using triangles, so we have to draw 2 tri's to cover an 8x16 glyph
<doug16k> ya then do what I said
<doug16k> use instanced rendering
<doug16k> earlier I mean
<clever> your technique is even more advanced than what i'm mathing out now
<clever> depending on how well you implement it, you need either 4 or 6 vertex points
<doug16k> then it is a very simple vertex shader to figure out the atlas coords
<doug16k> it passes down hello-world level texcoord+vertex
<clever> 6 is the dumb way, 4 reuses 2 vertex's between the tris
<doug16k> yeah but the idea of instancing it is, you are actually just drawing a bunch of instances of two triangles
<doug16k> and the vertex shader can tell what instance this is
<clever> id like to finish the math first, and see how costly a lack of instancing is
<doug16k> so it knows how to lookup the texture atlas coords
<doug16k> it gets the glyph
<doug16k> if you are really close to the hardware like that, just immediate mode doing simple pair of triangles is probably good enough
<doug16k> the instancing is to save API call overheads
<bslsk05> ​github.com: gl/core.c at master · cleverca22/gl · GitHub
<clever> setting the UV of a vertex involves 2 floats, but this code is cheating and storing them in a global var for later use
<bslsk05> ​github.com: gl/core.c at master · cleverca22/gl · GitHub
<clever> this then inputs the XY of the vertex, and copies the UV from the global var
<doug16k> yes, opengl is a bunch of state
<doug16k> it leaves most things unsaid, assumed you selected it and that state is set already
<bslsk05> ​github.com: gl/core.c at master · cleverca22/gl · GitHub
<clever> and ultimately, the vertex data is just a flat array of this struct
<clever> 16 bits of x, 16 bits of y, 32bits of z/w/u/v/r/g/b
<doug16k> it can do instancing though, right?
<doug16k> you can set a divisor?
<doug16k> if you use instancing, it is insanely fast
<clever> so 8 bytes per vertex, times 4 (best case), gives 32 bytes per glyph
<doug16k> it just converts everything into a texture lookup that it can do trivially easy
<doug16k> even getting the glyph and positions are using the same mechanism as texture lookup
<clever> so if i dont do instancing, and i rebuild the vertex data on every frame, its ~1/8th the writes, not counting some other misc overheads
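To check the bytes-per-glyph comparison, here is a struct matching the vertex layout described earlier (16-bit x and y, then 32-bit z, 1/w, u, v, r, g, b). Field names are illustrative, not taken from the v3d documentation. Read that way, each vertex is 32 bytes (the figure clever later uses when counting "4 vertex structs (32 bytes each)"), so a 4-vertex quad is 128 bytes: half of the 256 bytes a software RGB565 blit writes rather than one eighth. An 8-byte vertex would only hold x, y and one more 32-bit field.

```c
#include <stdint.h>

/* Assumed shaded-vertex layout, per the chat description. */
typedef struct {
    uint16_t x, y;    /* screen-space position */
    float z;          /* depth */
    float inv_wc;     /* 1/w in clip space */
    float u, v;       /* texture coordinates */
    float r, g, b;    /* tint colour */
} shaded_vertex;

_Static_assert(sizeof(shaded_vertex) == 32, "2+2 bytes of x/y plus 7 floats");

#define VERTS_PER_QUAD 4
#define BYTES_PER_QUAD (VERTS_PER_QUAD * sizeof(shaded_vertex))
#define SOFTWARE_GLYPH_BYTES (8 * 16 * 2)   /* 8x16 pixels at 2 bytes each */
```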
<doug16k> with instancing you don't keep saying the vertices
<clever> vertex shading can probably do instancing too, if i knew the GPU better
<clever> there are also tricks i could do, if i was rendering a vga text console
<doug16k> ya go ahead and immediate mode do it for sure
<clever> i can assume the tri's are stationary
<doug16k> it'll be close enough
<clever> and only update the UV in each vertex
<doug16k> instancing gives the implementation the power to do it extremely well. it's shocking how fast my nvidia drivers do instanced text render
<clever> if i was to cheat in such a matter, updating the text would just involve 8 bytes worth of writes per glyph, with a bit of scatter
<doug16k> it is using the hottest case that they optimize heavily
<clever> updating the color would involve 12 bytes of writes
<clever> and this isn't even taking into account dirty status
<doug16k> yeah which still amounts to very little compared to how much video memory store bandwidth it generates
<clever> i'm mostly thinking about the main cpu core's store bandwidth
<doug16k> can you make it a strip or fan?
<doug16k> then it would be 4 glVertex for 2 tris
<clever> its index based
<clever> you create an array of raw vertex data, then you create an array containing sets of 3 indexes into the 1st one
<doug16k> being index doesn't mean it isn't fan or strip
<clever> how does fan and strip work?
<doug16k> fan keeps repeating the 0th vertex in every triangle. the other two points are the last two vertices
<doug16k> a strip uses the last 2 vertices plus the new one for each new triangle
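The fan and strip definitions above, expanded into plain triangles. A sketch assuming OpenGL-style rules, where every odd strip triangle swaps two vertices to keep a consistent winding; the v3d's exact winding behaviour is not confirmed here, and the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint16_t a, b, c; } tri;

/* Fan: each triangle is idx[0] plus the previous and current vertex. */
static size_t expand_fan(const uint16_t *idx, size_t n, tri *out) {
    size_t t = 0;
    for (size_t i = 2; i < n; i++)
        out[t++] = (tri){ idx[0], idx[i - 1], idx[i] };
    return t;   /* n - 2 triangles */
}

/* Strip: each new vertex plus the two before it; odd triangles swap the
 * first two vertices so all triangles keep the same winding. */
static size_t expand_strip(const uint16_t *idx, size_t n, tri *out) {
    size_t t = 0;
    for (size_t i = 2; i < n; i++) {
        if (i % 2 == 0) out[t++] = (tri){ idx[i - 2], idx[i - 1], idx[i] };
        else            out[t++] = (tri){ idx[i - 1], idx[i - 2], idx[i] };
    }
    return t;   /* also n - 2 triangles */
}
```

For the indices {0, 1, 2, 3} the fan gives (0,1,2) and (0,2,3) while the strip gives (0,1,2) and (2,1,3), so the same four indices only cover a quad if its corners are ordered to suit the mode: around the perimeter for a fan, zig-zag for a strip.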
<clever> oh, i just noticed that in the docs...
<clever> > primitive mode: 0,1,2,3,4,5,6 = points, lines, line_loop, line_strip, triangles, triangle_strip, https://github.com/cleverca22/wowmapviewer/blob/master/src/font.cpp#L57-L86
<clever> ack, mis-paste
<clever> triangle_fan was the last one
<clever> so the hardware does support strip and fan modes, but i'm not using it yet
<clever> and i was thinking of simulating that, by manually reusing vertex data in the index list
<clever> in plain triangles mode, i would tell it to make 2 tris, with vertexes 0,1,2 and 1,2,3
<doug16k> ya or with fan you would go 0, 1, 2, 3 and get both tris
<clever> so i have to feed it 4 vertex structs (32 bytes each) plus 6 indexes (8bit or 16bit, depending on vertex array size)
<doug16k> strip would also work in case that simple
<doug16k> I think
<doug16k> fan definitely
<clever> i can picture why its called fan, if you are mapping out pizza slices of a circle
<doug16k> ya exactly
<clever> reusing the center vertex
<doug16k> same with clipped polygons. if you just make sure you emit all the clipped vertices in the same order around the edges, you can just toss N verts at a fan call and it does it right
<doug16k> not ideal but works
<clever> so if i change this to triangle_fan mode, then i can implement glVertex2squad better, and switch the font code over to using a quad
<doug16k> yeah
<doug16k> there is also the possibility of a "restart index"
<clever> but, all tris in the shader, must be in the same mode
<doug16k> also, you can make a degenerate triangle with two points the same to start a new strip or fan
<clever> rendering with a different mode, requires a second vertex index list, and another shader record
<doug16k> in one big index list
<clever> how would the GPU know how many slices in the fan?
<doug16k> verts - 1
<doug16k> by definition
<doug16k> er - 2
<clever> i dont see how that can work, with the v3d
<clever> there is no tri count
<doug16k> so?
<clever> you just give it an array of vertex indexes, and when in triangles mode, it knows that its sets of 3
<doug16k> why does it care?
<doug16k> fans and strips only look at the last one or two verts
<clever> but if i want to render quads, i need sets of 4, each fanning out from a new center point
<doug16k> if the verts just keep coming it just keeps using the last and second_last verts it is keeping
<clever> but then how do i reset it, to a new center point?
<doug16k> and each vert is pushed down that little sliding window of recent verts
<doug16k> you say the same vert twice
<doug16k> in a strip
<doug16k> that makes it have 0 area
<clever> ah, so it would be like 0/1/2/3/3 then 4/5/6/7/7 ?
<clever> to render 2 quads
<doug16k> just look at how the strip indexes into the history of verts, and emit the appropriate pair of identical verts to make a 0 area degenerate triangle that has a 3rd point in the new place
<doug16k> OR
<doug16k> the hardware might support a "restart index" you can set
<doug16k> then when you say that magic number in an index array, it knows to start a new strip/fan
<clever> either way, thats 5 indexes per quad
<clever> vs 6 indexes, with 2 being repeated
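Joining many quads into one strip: the usual degenerate-triangle trick repeats the last index of one quad and the first index of the next, giving 0,1,2,3,3,4,4,5,6,7 for two quads. Repeating only one index (0,1,2,3,3 then 4,5,6,7,7) still leaves the real triangle (3,4,5) bridging the two quads. With a restart index, one marker separates the quads instead. This sketch assumes zig-zag corner order and a hypothetical restart value of 0xFFFF; whether the v3d supports primitive restart is not established here.

```c
#include <stddef.h>
#include <stdint.h>

#define RESTART_INDEX 0xFFFFu   /* hypothetical primitive-restart marker */

/* Quad q uses vertices 4q..4q+3 in zig-zag order (TL, BL, TR, BR).
 * Without restart: 4 indices for the first quad, 6 for each later one.
 * With restart:    4 indices for the first quad, 5 for each later one. */
static size_t build_quad_strip(size_t nquads, int use_restart, uint16_t *out) {
    size_t n = 0;
    for (size_t q = 0; q < nquads; q++) {
        uint16_t base = (uint16_t)(4 * q);
        if (q > 0) {
            if (use_restart) {
                out[n++] = RESTART_INDEX;
            } else {
                uint16_t prev = out[n - 1];
                out[n++] = prev;   /* repeat last vertex of previous quad */
                out[n++] = base;   /* repeat first vertex of this quad */
            }
        }
        for (unsigned k = 0; k < 4; k++) out[n++] = (uint16_t)(base + k);
    }
    return n;
}
```

Adding two indices per quad keeps the index count even, so each quad's first triangle has the same strip parity and winding as the first quad's.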
<clever> i can see how fan would help more, the more slices you have
<clever> with 2, its barely a benefit
<doug16k> the true benefit is how much more it can do in one call
<doug16k> it's not primarily about saving bandwidth
<doug16k> it throws that in as a bonus
<doug16k> you want to deluge a gpu with work it can overlap
<doug16k> one thing at a time is hideous
<doug16k> you are so close to the hardware though, it should be fine to just do it manually
<doug16k> most of the cool opengl speed stuff is about saving api call overheads
<doug16k> throwing blocks of work at it
<clever> i think VBOs were about exposing this vertex array directly to the app?
<doug16k> vbo is about remembering how you configured it to look up stuff in arrays
<doug16k> how you did binding between arrays and shader variables
<clever> ah
<clever> i think ive also heard instancing being mentioned, in doing things like copy/pasting a 3d object like a tree
<clever> so you just give the xyz and rotation/scaling params for a tree, and then the ?? shader will paste a duplicate copy of the tree, with those params applied
<doug16k> sooner or later, you run into a case where, you have many copies of the same thing, but just a couple of things differ
<doug16k> maybe different color, position, orientation
<doug16k> everything else is identical
<clever> yeah, thats why my vertex has an RGB on it
<clever> the pixel shader will tint the texture to that color
<doug16k> so with instancing, you can have an array of those things that differ. an array of colors, an array of positions, an array of orientations. then on those you also set a divisor. typically 1 so each 1 instance advances to next color/position/orientation
<doug16k> then you draw it once and it renders one for each instance
<clever> and is that all handled by the vertex shader?
<doug16k> shader can be oblivious
<doug16k> you setup the array binding for the color already
<doug16k> it is bound to some vertex shader input, regardless
<doug16k> you just happened to put an instance divisor on it
<doug16k> then it means, do the whole draw, advance the instance things to the next one appropriately, respecting the divisor, and render it again
<doug16k> and again, until the instance count
<doug16k> in one draw!
<doug16k> what would have been uniform becomes an attribute
<doug16k> but that attribute behaves like a uniform - but changes with the instance id
<doug16k> it does the lookup into the array that has the divisor
<doug16k> array[instance index divided by divisor]
<doug16k> think of it like a uniform that magically sets itself across instances in one big draw call
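A CPU model of the fetch rule above, array[instance index / divisor]: the four quad corners are per-vertex data and each instance's position is per-instance data. This only illustrates the indexing; on real hardware the vertex fetch does it and the shader just sees an ordinary attribute. Types and names are hypothetical, and only the per-instance case (divisor of at least 1) is modelled.

```c
#include <stddef.h>

typedef struct { float x, y; } vec2;

/* Element of a per-instance attribute array used by a given instance. */
static size_t instance_element(size_t instance, size_t divisor) {
    return instance / divisor;   /* divisor >= 1 */
}

/* Emulate one instanced draw of a 4-vertex quad: every instance reuses the
 * same corners, offset by that instance's position. Writes 4 * count points. */
static void draw_instanced(const vec2 corners[4], const vec2 *inst_pos,
                           size_t divisor, size_t count, vec2 *out) {
    for (size_t i = 0; i < count; i++) {
        vec2 p = inst_pos[instance_element(i, divisor)];
        for (size_t v = 0; v < 4; v++)
            out[i * 4 + v] = (vec2){ p.x + corners[v].x, p.y + corners[v].y };
    }
}
```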
<doug16k> yes that is exactly what you want
<clever> that is what line 14-17 was having to generate
<clever> and i suspect the vertex shader has to create these?
<doug16k> even if you were software rendering, there is a place in the code just like that link
<doug16k> no
<doug16k> this is the output of your clipping
<doug16k> this is the edge data
<doug16k> x y are screenspace coords, 16 bit integers?
<doug16k> z is 32 bit float
<clever> yeah
<doug16k> 1/wc is the reciprocal of the 4d clip-space coordinates
<doug16k> reciprocal of w
<doug16k> you interpolate down the edge and have one of those per scanline
<doug16k> then another sequence of those for right edge
<doug16k> this isn't how you deal with it in an opengl api
<doug16k> this is you scan converting the triangles and just using fragment shader
<doug16k> this is you doing the rasterization step by hand and just letting it do shading in gpu
<clever> there is a clipping step that still happens
<clever> the v3d has a "binning" phase, where it will figure out which tri's are in a given tile
<doug16k> hideous one that only a fool would use
<clever> the v3d can only render one tile at a time
<doug16k> you must clip your homogeneous coordinates
<doug16k> it expects everything to be in the -1<x<1 range
<doug16k> and z in 0<z<1
<doug16k> you need to project them
<doug16k> there's no other way
<clever> the tiles are all 64x64 pixels
<doug16k> see that 1/wc there?
<doug16k> how you going to 1/0 ?
<doug16k> answer: you don't have to worry, clipping will make it always in range and lowest z ever is 1
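Why clipping has to come before the divide: a vertex with w <= 0 (at or behind the camera) divides by zero or comes out mirrored. A minimal sketch of Sutherland-Hodgman clipping of a convex polygon against the plane w = epsilon in clip space. The epsilon value is an assumption, and a complete clipper also handles the x, y and z planes (or relies on a guard band).

```c
#include <stddef.h>

typedef struct { float x, y, z, w; } vec4;

#define W_EPSILON 1e-5f   /* assumed smallest w allowed after clipping */

static vec4 lerp4(vec4 a, vec4 b, float t) {
    return (vec4){ a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t,
                   a.z + (b.z - a.z) * t, a.w + (b.w - a.w) * t };
}

/* Clip a convex polygon (n vertices) so every output vertex has w >= W_EPSILON.
 * out needs room for n + 1 vertices; returns the output vertex count. */
static size_t clip_near_w(const vec4 *in, size_t n, vec4 *out) {
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        vec4 cur = in[i], nxt = in[(i + 1) % n];
        int cur_in = cur.w >= W_EPSILON, nxt_in = nxt.w >= W_EPSILON;
        if (cur_in) out[m++] = cur;                 /* keep inside vertices */
        if (cur_in != nxt_in) {                     /* edge crosses the plane */
            float t = (W_EPSILON - cur.w) / (nxt.w - cur.w);
            out[m++] = lerp4(cur, nxt, t);
        }
    }
    return m;
}
```

A triangle with one vertex behind the camera comes out as a quad; one entirely behind it comes out empty.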
<clever> nextVertex->w = 1;
<doug16k> trust me, you must clip this: https://i.imgur.com/6rlXGm6.png
<clever> i never figured out what that did, and was just hard-coding it
<doug16k> if you make the input vertex be ,1 that means make the position {x/1, y/1, z/1}
<doug16k> now it should be obvious why 1
<doug16k> if you said ,2 and divided x,y,z by 2, nothing changed
<doug16k> er multiplied x,y,z by 2
<doug16k> I wish I knew that 20 years ago
<doug16k> I was in the "wtf w?" camp for quite a while :D
<clever> ah, so this would scale the xyz coords
<doug16k> it is for projection
<clever> ooo, does this handle things in the distance appearing to be a diff size?
<doug16k> the projection matrix can use that 4th column to make it divide
<doug16k> exactly
<doug16k> it ends up with a number that seems like it is 1/z but offset weirdly to account for the z input range being 1 to whatever and output z range mapping to 0 to 1
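A perspective matrix copies eye-space z into w, so the later divide by w shrinks distant geometry, and the third row remaps z so depth lands in 0..1 after that divide. This sketch assumes a camera at the origin looking down +z and the 0..1 depth range mentioned above (OpenGL's classic convention looks down -z and maps depth to -1..1). It is a generic textbook matrix, not the function in doug16k's vec.h; `focal` = 1/tan(fov_y/2) is passed in precomputed to avoid libm.

```c
typedef struct { float m[4][4]; } mat4;   /* row-major, applied as M * v */
typedef struct { float x, y, z, w; } vec4;

static mat4 perspective(float focal, float aspect, float znear, float zfar) {
    mat4 p = {{{0}}};
    p.m[0][0] = focal / aspect;
    p.m[1][1] = focal;
    p.m[2][2] = zfar / (zfar - znear);           /* depth 0 at znear ... */
    p.m[2][3] = -zfar * znear / (zfar - znear);  /* ... and 1 at zfar after / w */
    p.m[3][2] = 1.0f;                            /* w_clip = z_eye */
    return p;
}

static vec4 mul(const mat4 *a, vec4 v) {
    float in[4] = { v.x, v.y, v.z, v.w }, o[4];
    for (int r = 0; r < 4; r++)
        o[r] = a->m[r][0] * in[0] + a->m[r][1] * in[1]
             + a->m[r][2] * in[2] + a->m[r][3] * in[3];
    return (vec4){ o[0], o[1], o[2], o[3] };
}
```

With focal = 1, a point at x = 2, z = 10 ends up at x/w = 0.2: ten units away, it appears one tenth as far from the centre.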
<clever> how does rotation of the camera work then?
<doug16k> think of the three rows as three arrows
<clever> if i had a proper vertex shader, and the camera was moving in 3d space, should the XYZ's change, or does the vertex shader compute that for me?
<doug16k> they tell you "which way" each axis goes
<doug16k> for each dimension
<doug16k> so you take the input and map it onto those directions by dotting the distance along each dimension with that direction in the output
<doug16k> the matrix is actually the inverse of that though. to look right 45 degrees you actually rotate the whole world left 45 degrees
<doug16k> camera is always at 0,0 looking down z axis
<clever> my rough guess, on how this all works
<clever> is that the vertex shader will first translate every vertex, by adding a constant to the x/y/z coords (because of the camera moving)
<clever> then it will do some funky math to rotate everything around that center point
<doug16k> oh you mean how do you do it
<clever> then it will compute the W, so the gpu will scale distance objects
<clever> W being the distance to the object
<doug16k> you set one or two uniform matrices
<doug16k> then in the vertex shader, you just multiply the incoming vertex by the matrix
<doug16k> write the output to the position and set any varyings
<clever> yeah, ive heard about how matrix mult can do both the rotation and translation in one step
<clever> i still have no clue how
<doug16k> and projection
<clever> so if its doing projection, do i even need W ?
<doug16k> you can do any number of projections, scales, translations, shears, and rotations in one
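How one 4x4 matrix does rotation and translation together: the translation sits in the fourth column and is picked up because the input w is 1, and multiplying matrices composes the steps into a single matrix. Sketch with row-major storage and column vectors; cos and sin are passed in precomputed to avoid libm, and all names are illustrative.

```c
typedef struct { float m[4][4]; } mat4;   /* row-major, v' = M * v */
typedef struct { float x, y, z, w; } vec4;

static mat4 identity(void) {
    mat4 r = {{{0}}};
    for (int i = 0; i < 4; i++) r.m[i][i] = 1.0f;
    return r;
}

static mat4 translate(float tx, float ty, float tz) {
    mat4 r = identity();
    r.m[0][3] = tx; r.m[1][3] = ty; r.m[2][3] = tz;  /* scaled by w = 1 */
    return r;
}

/* Rotation about y; c = cos(angle), s = sin(angle). */
static mat4 rotate_y(float c, float s) {
    mat4 r = identity();
    r.m[0][0] = c;  r.m[0][2] = s;
    r.m[2][0] = -s; r.m[2][2] = c;
    return r;
}

static mat4 matmul(const mat4 *a, const mat4 *b) {
    mat4 r;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            r.m[i][j] = 0.0f;
            for (int k = 0; k < 4; k++) r.m[i][j] += a->m[i][k] * b->m[k][j];
        }
    return r;
}

static vec4 transform(const mat4 *a, vec4 v) {
    float in[4] = { v.x, v.y, v.z, v.w }, o[4];
    for (int i = 0; i < 4; i++)
        o[i] = a->m[i][0] * in[0] + a->m[i][1] * in[1]
             + a->m[i][2] * in[2] + a->m[i][3] * in[3];
    return (vec4){ o[0], o[1], o[2], o[3] };
}
```

Order matters: T * R rotates first, then translates. A camera (view) matrix is the inverse of the camera's own placement, which is why looking right means rotating the world left.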
<doug16k> you need w, you want w. you love w
<doug16k> without w, don't waste your time
<doug16k> it's genius
<doug16k> the clipping is easy and fast too
<doug16k> and the clipping doesn't vary - clipping is the same no matter how the engine does stuff
<doug16k> wipes all divide errors off the map
<clever> what about the gpu just ignoring tri's behind a tri?
<clever> how does that clipping happen? and how does it know if a tri is transparent or not
<doug16k> you are skipping way past what I mean
<doug16k> what if you draw a triangle that is behind the camera
<doug16k> did you think it would magically give you sensible coordinates?
<clever> yeah, those should be dropped hard
<doug16k> it will happily give you total crap
<clever> yeah
<doug16k> no
<clever> something will have to filter them out
<doug16k> if you think you can cheat out of w use, you are wasting your time
<clever> i think i see what you mean now
<doug16k> if you want to do 2d stuff, then make w 1 always, and it works
<doug16k> you can set up the projection matrix so w ends up 1 no matter what
<doug16k> that is what happens in isometric projection
<doug16k> it has no idea how far away anything is
<doug16k> you can keep the z around and see
<doug16k> but I mean, the z gets incorporated into the x y
<doug16k> and you have 1/w = 1 always, the z has no effect unless it is being used to do z fill/test
<clever> need to take a break now, but i need to re-implement this under LK some time soon
<clever> LK can render a 2d buffer to the screen, so having tris would be some insane performance boosts
<clever> doug16k: how insane would it be, for a 128kb binary, to boot an rpi to a spinning teapot, without any blobs? lol
<doug16k> this will make a projection matrix for you: https://github.com/doug65536/qemu-rom/blob/master/vec.h#L715
<bslsk05> ​github.com: qemu-rom/vec.h at master · doug65536/qemu-rom · GitHub
<doug16k> just multiply a few rotations and translations by that and use that in vertex shader
<doug16k> when I said you must clip, I meant that thing you linked you know
<clever> first step, is just re-creating this 2d render, without any blobs helping out
<clever> then i need to figure out vertex shaders
<doug16k> of course if you just drawindexed then part of pipeline is doing that clipping I said
<clever> then i can actually implement that
<doug16k> for that it is a one liner
<doug16k> two liner if texcoord
<doug16k> position = vertex * mvpMatrix
<doug16k> varyingTexcoordIMadeUp = input_texcoord
<doug16k> at hardware level, you might have to put in derivatives
<doug16k> for interpolation
<doug16k> the beauty of it is, you interpolate the 1/w, then per-pixel it is a multiply to do projection divide
<doug16k> has the right weird non-linearity to look right
<doug16k> pixels expand way out close up, and squish way down far away
<doug16k> but the more far away, the less smaller it seems to make it appear
<doug16k> that nonlinearity
<doug16k> if flying away from something at a constant rate
<doug16k> or the other way, as you get closer, the scale of it gets exponential
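The 1/w trick in code: an attribute divided by w, and 1/w itself, both vary linearly across the screen, so a rasterizer interpolates those and recovers the attribute with one divide per pixel (or one reciprocal shared by every attribute of that pixel). Minimal sketch; the function name and parameters are hypothetical.

```c
/* Perspective-correct interpolation of attribute u between two projected
 * vertices a and b. t is the linear position in screen space (0 at a, 1 at b). */
static float persp_interp(float u_a, float w_a, float u_b, float w_b, float t) {
    float inv_w    = (1.0f - t) / w_a + t / w_b;               /* lerp of 1/w */
    float u_over_w = (1.0f - t) * (u_a / w_a) + t * (u_b / w_b);
    return u_over_w / inv_w;   /* undo the divide at this pixel */
}
```

With u = 0 at w = 1 and u = 1 at w = 3, the screen-space midpoint lands at u = 0.25: the near part of the edge takes up more pixels, which is the non-linearity described above.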
<doug16k> clipping prevents numeric explosion or weird flipped projection of point behind you
<doug16k> faces behind the camera that you didn't clip will render as their backface
<doug16k> because you are dividing by negative w
<clever> there is also an option in the v3d, on which side of a face can render
<doug16k> I mean if you used https://i.imgur.com/6rlXGm6.png you must prevent negative w
<doug16k> with glvertex, sure, throw whatever at it, it does that clipping part implicitly
<doug16k> I suppose you could emit those in the vertex shader assembly
<doug16k> you scan convert in the vertex shader too? that doesn't sound right
<clever> i still need to figure out how vertex shaders work on v3d
<doug16k> could be that it keeps giving you an edge
<clever> thats one step i never got thru
<doug16k> it's a shame that you can't copy a boot image into a pi4 across usb
<doug16k> makes it borderline useless. hanging on by a thread
<clever> what do you mean?
<clever> there are several ways to boot over usb
<doug16k> I can't just update the bootloader over usb
<clever> you can, but there are 2 things blocking it
<geist> happy belated canada day
<clever> doug16k: do you want the pi4 usb to be in host or device mode?
<doug16k> geist, good day off :D
<geist> toss off, eh?
<doug16k> would need to be device to appear as serial or block storage
<clever> doug16k: 3 options then, 1: erase the spi first, 2: adjust the BOOT_ORDER to co-operate, 3: short a pin out to force it into device mode
<clever> doug16k: once those are done, it shows up as a vendor usb device, and you must use rpiboot to push firmware over
<doug16k> firmware?
<clever> doug16k: if doing 1 or 3, you can send recovery.bin over the usb port
<clever> and it will then request pieeprom.{bin,sig}, and re-flash the SPI
<doug16k> that requires me to know all about rpi4 hardware. I couldn't care less about rpi4 hardware though
<bslsk05> ​github.com: usbboot/recovery at master · raspberrypi/usbboot · GitHub
<doug16k> it would be fun to play with the opengl-like gpu though
<clever> you just run `rpiboot -d recovery` and it reflashes
<doug16k> is that just a kernel elf boot thing?
<doug16k> or is that wake up with no ram and doornail mode pci?
<clever> when using mode 1/3, no ram at all
<clever> when using 2, ram can be online, and you can boot a full linux
<bslsk05> ​www.raspberrypi.org: Raspberry Pi 4 bootloader configuration - Raspberry Pi Documentation
<clever> if BOOT_ORDER hits a 3, then it will use the rpiboot protocol to request start4.elf from a host
<doug16k> request how
<clever> and then it can pull config.txt/kernel.img/initrd over that
<clever> rpiboot has 2 protocols, dumb, and file-server
<bslsk05> ​github.com: usbboot/main.c at master · raspberrypi/usbboot · GitHub
<clever> when in file-server mode, you read a `struct file_message` from an endpoint, that contains a command and a filename
<clever> the host must then respond correctly, and wait for another file_message
<clever> it supports 3 commands, query file size, read file, quit
<doug16k> who reads
<clever> the pi4 initiates the reads
<clever> while acting as a usb device
<doug16k> what usb device
<doug16k> nothing? just some hacked together hack using libusb?
<clever> when the bootloader firmware hits a 3 in BOOT_ORDER, the usb-c port of the pi4 goes into device mode
<clever> and talks to that libusb code
<clever> when using rpiboot correctly (or my webusb re-implementation), you can push over the start4.elf/kernel.img/initrd, and boot linux up
<clever> linux can then take over the usb controller, and gadget mode anything it wants to
<doug16k> so it can't just be a compile step that updates it
<doug16k> I have to spoon feed it each time from some daemon
<clever> yeah
<doug16k> screw that
<clever> you can always make something better, if you boot something from SD
<clever> and then use a protocol of your own choosing
<geist> hmm, not too bad. wanted to benchmark how fast a arm64 core can crc32 check a buffer of moderate size
<doug16k> the only thing keeping my rpi4 out of the garbage can is its use as a 3rd fallback if none of my 3 better machines work
<geist> it just did 16GB of crc32 in units of 64k (so the cache is pretty hot) in about 5.7 seconds
<geist> so somewhere around 3GB/sec
<geist> this is a rpi4
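geist doesn't say which CRC32 implementation he timed. arm64 has dedicated CRC32 instructions, but the log doesn't say they were used. As a portable reference point, here is the common table-driven, reflected CRC-32 (polynomial 0xEDB88320), which can be fed in chunks the way a 64 KiB loop would feed it. The sketch also includes the throughput arithmetic: treating 16GB as 16 GiB, 16 GiB in 5.7 s is about 3.0 × 10^9 bytes/s.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];

/* Build the 256-entry table for the reflected CRC-32 polynomial 0xEDB88320. */
static void crc32_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
        crc_table[i] = c;
    }
}

/* Update a running CRC with len bytes; start with crc = 0. Passing the previous
 * result back in continues the same checksum, so a large buffer can be processed
 * in 64 KiB pieces. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *buf, size_t len)
{
    crc = ~crc;
    while (len--)
        crc = crc_table[(crc ^ *buf++) & 0xffu] ^ (crc >> 8);
    return ~crc;
}

/* Bytes per second for a measured run. */
static double throughput(uint64_t bytes, double seconds)
{
    return (double)bytes / seconds;
}
```

A byte-at-a-time table loop like this is usually well below the rate geist measured. Hitting numbers in that range generally takes the hardware instructions or a wider slicing table.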
<clever> doug16k: thats part of why i want to get proper source for the ddr4 controller
<clever> doug16k: so i could then put custom code into the SPI flash, and it could boot directly into a usb gpu, for example
<clever> no daemon involved
mctpyt has quit [Ping timeout: 258 seconds]
elastic_dog has quit [Ping timeout: 256 seconds]
<bslsk05> ​github.com: mesa/vc4_context.h at main · mesa3d/mesa · GitHub
<clever> doug16k: this is the source for driving the 3d pipeline on the whole pi0 to pi3 range
<clever> if you want to look at vertex shaders in more depth on that
<clever> i'm currently trying to find out where in the source it generates one..
elastic_dog has joined #osdev
<clever> found some relevant code in vc4_draw.c
<doug16k> ah you have to do stuff for tiled render too eh?
<clever> doug16k: there is a dedicated control list for the binning process, and the gpu somehow sorts tri's into tiles for you
<clever> it will generate an array of functions that does something to render a tile
<clever> the renderer control list, is then an unrolled for loop, to call each function in that generated array
<bslsk05> ​github.com: gl/core.c at master · cleverca22/gl · GitHub
<clever> doug16k: opcode 115 sets the coordinates for the destination tile, opcode 17 calls some code the binner generated, rendering one tile worth of tris, and opcode 24/25 commit that tile to ram
<clever> kernel/vc4_packet.h: VC4_PACKET_TILE_COORDINATES = 115,
<clever> aha, and thats how mesa refers to it
<clever>  * In the VC4 driver, render command list generation is performed by the
<clever>  * kernel instead of userspace.
<clever> doug16k: aha, that could explain why some code appears to be missing!
<clever> the shader code can basically write to any ram it wants to, all security is out the door
<clever> and validating that the configuration is sane is cpu intensive
<clever> far safer for the kernel to just generate a sane config, based on your requirements
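The per-tile sequence clever describes can be sketched as a generator for the unrolled render control list. Each tile sets its coordinates with opcode 115, branches into the binner-generated tile list with opcode 17, and stores the tile with opcode 24 (25 on the last tile). The opcode numbers follow mesa's vc4_packet.h. The packet sizes, the little-endian address encoding and the 32-byte per-tile list stride are assumptions of this sketch. The real kernel code also emits load/clear packets, which are omitted here.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode numbers as named in mesa's vc4_packet.h. */
enum {
    VC4_PACKET_BRANCH_TO_SUB_LIST           = 17,
    VC4_PACKET_STORE_MS_TILE_BUFFER         = 24,
    VC4_PACKET_STORE_MS_TILE_BUFFER_AND_EOF = 25,
    VC4_PACKET_TILE_COORDINATES             = 115,
};

/* Assumed distance between consecutive per-tile lists in the binner's memory. */
#define TILE_LIST_STRIDE 32u

static size_t put_u32le(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
    return 4;
}

/* Emit the unrolled loop: one coordinates/branch/store group per tile.
 * out must hold 9 * tiles_x * tiles_y bytes. Returns the bytes written. */
static size_t emit_render_cl(uint8_t *out, uint8_t tiles_x, uint8_t tiles_y,
                             uint32_t tile_list_base)
{
    size_t n = 0;
    for (unsigned y = 0; y < tiles_y; y++) {
        for (unsigned x = 0; x < tiles_x; x++) {
            int last = (y + 1 == tiles_y) && (x + 1 == tiles_x);

            out[n++] = VC4_PACKET_TILE_COORDINATES;   /* which tile we render */
            out[n++] = (uint8_t)x;
            out[n++] = (uint8_t)y;

            out[n++] = VC4_PACKET_BRANCH_TO_SUB_LIST; /* run the binned tris */
            n += put_u32le(out + n, tile_list_base + (y * tiles_x + x) * TILE_LIST_STRIDE);

            /* write the finished tile to RAM; the last store also ends the frame */
            out[n++] = last ? VC4_PACKET_STORE_MS_TILE_BUFFER_AND_EOF
                            : VC4_PACKET_STORE_MS_TILE_BUFFER;
        }
    }
    return n;
}
```

Because the kernel builds this list itself, every branch target is an address the kernel chose. That is the safety argument clever makes, compared with validating a list supplied by userspace.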
<doug16k> dma can usually write to whatever it wants
<clever> yep
<clever> in the case of the pi4, there is an extra MMU between the 3d hw and ram
<clever> which kinda acts like an IOMMU
<clever> but it was added more for ram size reasons, the 3d hw is still 32bit based, and cant address beyond 4gig
<clever> but the bcm2711 can support up to 16gig of ram
<clever> so the extra MMU lets the dma write to 64bit addresses, while only dealing with 32bit internally
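The idea behind the extra MMU on the pi4 reduces to a page-table lookup: a 32-bit device address is translated to a wider physical address. This is a deliberately hypothetical single-level table with 4 KiB pages. The valid bit and the 28-bit frame-number field are invented for illustration and are not the actual BCM2711/V3D page-table format.

```c
#include <stdbool.h>
#include <stdint.h>

#define GPU_PAGE_SHIFT 12u          /* 4 KiB pages */
#define PTE_VALID      (1u << 31)   /* hypothetical valid bit */
#define PTE_PFN_MASK   0x0fffffffu  /* hypothetical 28-bit frame number (40-bit phys) */

/* Translate a 32-bit device address through a flat table indexed by page number
 * (a full 32-bit space would need 2^20 entries). Returns false for unmapped
 * pages, where real hardware would raise a fault instead. */
static bool gpu_translate(const uint32_t *page_table, uint32_t gpu_addr, uint64_t *phys)
{
    uint32_t pte = page_table[gpu_addr >> GPU_PAGE_SHIFT];
    if (!(pte & PTE_VALID))
        return false;
    uint64_t frame = (uint64_t)(pte & PTE_PFN_MASK) << GPU_PAGE_SHIFT;
    *phys = frame | (gpu_addr & ((1u << GPU_PAGE_SHIFT) - 1u));
    return true;
}
```

With frame 0x3C0000 mapped at device page 1, device address 0x1234 lands at physical 0x3C0000234, which is 15 GiB into RAM. A bare 32-bit address could not reach that location.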
<clever> its 3am, i should get to bed now
Phibred has quit [Quit: Leaving]
klysm has quit [Quit: Lost terminal]
<doug16k> I made a few performance fixes in gnuchess
<mjg> :)
<doug16k> it's hilarious how strong it is if you let it think during your turn, with 16GB transposition table and PGO LTO build
<doug16k> it now spends more time on chess than select spin
<mjg> did it migrate to a neural network?
<geist> gosh i was getting owned by sargon chess on z80 the other day
<mjg> afair stockfish did
<geist> even downclocked it to 1mhz and still got my butt kicked
<doug16k> 3950x on heavily optimized gnu chess is a monster against me
<bslsk05> ​'Grandmaster Naroditsky Chess Speedrun Pt. 1' by Daniel Naroditsky (00:13:59)
<geist> though it may be possible that the original 8080 sargon is actually pretty good
<geist> at least for the time. looking at the page on wikipedia it was no slouch
<doug16k> if I enable post, the thinking just jumps straight to depth 18 in a flash, then it is getting hard to get to 19, 20, 21, etc
<doug16k> how can you beat that?
<mjg> ask stockfish
<doug16k> ya stockfish would destroy a grandmaster though
<doug16k> grandmaster games are weird
<doug16k> they will just put some good piece right where a pawn can take it, and if you take it, it's a guaranteed loss
<geist> ah Sargon 2.1
<doug16k> he's seeing if you are so stupid that you will take it
paulusASol has joined #osdev
<doug16k> it is a tiny bit creepy though, how you can make a move and it "knows" what you are doing and starts defending against that line or whatever
hgoel[m] has joined #osdev
medvid has joined #osdev
<doug16k> stockfish doesn't work for me
<doug16k> xboard just stops working if I try to use it
<doug16k> stockfish is uci or nothing
<mjg> lol
<mjg> xboard is incredibly buggy
<doug16k> yeah
<mjg> i used to play with a coworker using xboard
<mjg> it would crash every single time there was a checkmate
<mjg> also you could promote a pawn to a king
<doug16k> there are weird variants where you can
<doug16k> but ya not normal chess
<mjg> we played regular chess
<doug16k> suicide chess allows promotion to king
<doug16k> you win by losing all your pieces, there is no check and losing king is no big deal. if you can attack, you must attack. you get to pick which attack to do if multiple available
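The forced-capture rule doug16k describes for suicide chess amounts to filtering the legal move list: if you can attack, you must attack, and the player picks among the available captures. `struct move` and its `is_capture` flag are hypothetical. A real engine would get them from its own move generator.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical move record: squares 0..63 plus a capture flag. */
struct move { int from, to; bool is_capture; };

/* If any captures exist, compact them to the front and return their count;
 * otherwise leave the list untouched and return n. */
static size_t filter_forced_captures(struct move *moves, size_t n)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (moves[i].is_capture)
            moves[k++] = moves[i];   /* k <= i, so this never clobbers unread moves */
    return k > 0 ? k : n;
}
```

The "pick which attack" part of the rule is then just the normal move choice over the returned prefix.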
<mjg> personally i like anti-chess
<mjg> at least that's how i know it by name
<doug16k> that is gentler name
<mjg> i tried setting up leela locally but it's just too painful
<doug16k> kid friendlier
<mjg> i tried several frontends and they all keep fucking up in some manner
<doug16k> ikr? what is wrong with linux chess ui code?
<doug16k> broken!
<mjg> ye that's an example of something i assumed would be on lock
<mjg> but no
<mjg> it's fucking worse than your typical webstack
<geist> huh seems like it'd be trivial
<geist> at least the interface. isn't it just ascii based moves?
<mjg> it's better than that, there is a clear text protocol
<geist> yah that's what i figured
<mjg> which makes the entire thing even more perplexing
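On geist's point about ASCII moves: by default both UCI and the xboard protocol exchange moves in coordinate notation such as `e2e4`, with an optional promotion letter (`e7e8q`). Here is a minimal parser. It assumes squares are numbered 0 (a1) to 63 (h8) and accepts only the standard q/r/b/n promotions. Variants such as suicide chess, which allow promotion to a king, would need `k` as well.

```c
#include <stdbool.h>
#include <string.h>

struct parsed_move { int from, to; char promo; };  /* promo is 0 if none */

/* "e2" -> 12; returns -1 for anything off the board. */
static int parse_square(const char *s)
{
    if (s[0] < 'a' || s[0] > 'h' || s[1] < '1' || s[1] > '8')
        return -1;
    return (s[1] - '1') * 8 + (s[0] - 'a');
}

static bool parse_uci_move(const char *s, struct parsed_move *m)
{
    size_t len = strlen(s);
    if (len != 4 && len != 5)
        return false;
    m->from = parse_square(s);
    m->to = parse_square(s + 2);
    if (m->from < 0 || m->to < 0)
        return false;
    m->promo = 0;
    if (len == 5) {
        if (strchr("qrbn", s[4]) == NULL)   /* s[4] is never '\0' here */
            return false;
        m->promo = s[4];
    }
    return true;
}
```

Legality (whose turn it is, whether the piece can actually move there) is a separate step. This only turns the protocol text into squares.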
<mjg> eh, programming classic: write a patch in 5 seconds, fuck with tooling for 50 minutes to test it
<bslsk05> ​'Hal fixing a light bulb (from Malcolm in the Middle S03E06 - Health Scare)' by Vincent Verschuren (00:00:42)
<doug16k> where is the terminal chess client?
<doug16k> unicode has all the pieces
<mjg> on c64! :)
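doug16k's point that Unicode has all the pieces holds. U+2654 to U+2659 are the white king, queen, rook, bishop, knight and pawn, and U+265A to U+265F are the black ones. Here is a tiny terminal board printer. It assumes the board is a 64-char array of FEN piece letters (index 0 = a1) and that the terminal uses UTF-8.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* FEN letter -> chess symbol code point, 0 for empty or unknown. */
static uint32_t piece_codepoint(char p)
{
    static const char order[] = "KQRBNPkqrbnp";   /* same order as U+2654.. */
    const char *hit = (p != '\0') ? strchr(order, p) : NULL;
    return hit ? 0x2654u + (uint32_t)(hit - order) : 0;
}

/* UTF-8 for code points U+0800..U+FFFF (three bytes), enough for the pieces. */
static size_t utf8_encode3(uint32_t cp, char out[4])
{
    out[0] = (char)(0xE0 | (cp >> 12));
    out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (char)(0x80 | (cp & 0x3F));
    out[3] = '\0';
    return 3;
}

/* board[rank * 8 + file] holds FEN letters; anything else is an empty square. */
static void print_board(const char board[64])
{
    char glyph[4];
    for (int rank = 7; rank >= 0; rank--) {
        printf("%d ", rank + 1);
        for (int file = 0; file < 8; file++) {
            uint32_t cp = piece_codepoint(board[rank * 8 + file]);
            if (cp != 0) {
                utf8_encode3(cp, glyph);
                printf("%s ", glyph);
            } else {
                printf(". ");
            }
        }
        printf("\n");
    }
    printf("  a b c d e f g h\n");
}
```

Whether the glyphs look right depends on the terminal font. Some fonts draw the pieces wider than one cell, which throws off the column alignment.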
<doug16k> mjg, that video describes my OS project right now too
<bslsk05> ​'30 Weird Chess Algorithms: Elo World' by suckerpinch (00:42:35)
<mjg> the guy is a genius
paulusASol has quit [Quit: Client limit exceeded: 20000]
hgoel[m] has quit [Quit: Client limit exceeded: 20000]
tacco has joined #osdev
medvid has quit [Quit: Client limit exceeded: 20000]
tacco has quit [Client Quit]
paulusASol has joined #osdev
medvid has joined #osdev
sortie has joined #osdev
hgoel[m] has joined #osdev
dennis95 has joined #osdev
paulusASol has quit [Quit: Client limit exceeded: 20000]
_mrlemke_ has joined #osdev
scaleww has joined #osdev
mrlemke has quit [Ping timeout: 258 seconds]
medvid has quit [Read error: Connection reset by peer]
hgoel[m] has quit [Read error: Connection reset by peer]
paulusASol has joined #osdev
paulusASol has quit [Remote host closed the connection]
sortie has quit [Quit: Leaving]
mrlemke has joined #osdev
_mrlemke_ has quit [Ping timeout: 252 seconds]
GeDaMo has joined #osdev
paulusASol has joined #osdev
KidBeta has joined #osdev
medvid has joined #osdev
hgoel[m] has joined #osdev
<sahibatko> Hi, I have a question about qemu and uefi: I run qemu with ... -accel hvf -device virtio-vga ..., which results in one "virtio-vga" entry in qemu's "view" menu. I see a single window, but detect 2 GOP devices with a framebuffer (plus one "protocol" without a framebuffer, but that is expected) - where did the second framebuffer come from? Next issue is that writing to the framebuffer directly does not
<sahibatko> light up any pixel in the active qemu window, but that later on.
xenos1984 has quit [Ping timeout: 256 seconds]
xenos1984 has joined #osdev
<sahibatko> * same result with -device bochs-display btw
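For direct framebuffer writes, two frequent pitfalls are indexing rows by HorizontalResolution instead of PixelsPerScanLine, and ignoring PixelFormat. This sketch is not a diagnosis of sahibatko's issue. It only shows the addressing that the GOP mode information implies. The struct and enum are local stand-ins using the UEFI specification's field names, not the EDK2 or gnu-efi headers. The sketch assumes a little-endian CPU and a 32-bit-per-pixel mode, and leaves PixelBitMask unhandled. A mode that reports PixelBltOnly has no linear framebuffer at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins mirroring UEFI spec names; not the real firmware headers. */
typedef enum {
    PixelRedGreenBlueReserved8BitPerColor,
    PixelBlueGreenRedReserved8BitPerColor,
    PixelBitMask,
    PixelBltOnly
} gop_pixel_format;

struct gop_mode_info {
    uint32_t HorizontalResolution;
    uint32_t VerticalResolution;
    gop_pixel_format PixelFormat;
    uint32_t PixelsPerScanLine;   /* row stride in pixels, may exceed the width */
};

/* Write one pixel; fb would be Mode->FrameBufferBase cast to a pointer. */
static bool gop_put_pixel(volatile uint32_t *fb, const struct gop_mode_info *info,
                          uint32_t x, uint32_t y, uint8_t r, uint8_t g, uint8_t b)
{
    uint32_t px;
    if (x >= info->HorizontalResolution || y >= info->VerticalResolution)
        return false;
    switch (info->PixelFormat) {
    case PixelRedGreenBlueReserved8BitPerColor:   /* byte 0 = red */
        px = (uint32_t)r | ((uint32_t)g << 8) | ((uint32_t)b << 16);
        break;
    case PixelBlueGreenRedReserved8BitPerColor:   /* byte 0 = blue */
        px = (uint32_t)b | ((uint32_t)g << 8) | ((uint32_t)r << 16);
        break;
    default:   /* PixelBitMask not handled here; PixelBltOnly has no framebuffer */
        return false;
    }
    fb[(uint64_t)y * info->PixelsPerScanLine + x] = px;
    return true;
}
```

If pixels still don't appear when the addressing is right, the next thing to check is whether the GOP instance being written to is the one actually shown in the qemu window.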
janemba has quit [Ping timeout: 272 seconds]
janemba has joined #osdev
_mrlemke_ has joined #osdev
mrlemke has quit [Ping timeout: 246 seconds]
<klange> I think you need to disable the default display device, but doug16k probably knows better.
<geist> also i think you might be able to do something like -vga virtio?
<geist> instead of -device
<geist> but possibly you want -vga none -device virtio-vga
<geist> lots of combos, sometimes you gotta just try em all until something sticks
<sahibatko> will try those, as soon as I finish some refactoring, but that could be it
ElectronApps has quit [Ping timeout: 258 seconds]
ElectronApps has joined #osdev
<sahibatko> neither works, but perhaps this is setting a different thing - in the real world, there is a GPU (or more), that has some outputs and some outputs can have a monitor connected. Question is, does the UEFI firmware enumerate a dedicated GOP protocol (handle?) for each GPU? Output? Monitor? I will probably just take the first one as default and be done with it... but I don't like that attitude
silverwhitefish has quit [Ping timeout: 265 seconds]
sortie has joined #osdev
iorem has quit [Quit: Connection closed]
pieguy128 has quit [Ping timeout: 272 seconds]
pieguy128 has joined #osdev
freakazoid333 has quit [Read error: Connection reset by peer]
vai has joined #osdev
<vai> oh well, sharing int 0x80 with PIC controller and system calls
<vai> tons of lines of assembly to figure it out
<vai> which is it
<vai> definitely you don't want it to probe the PIC every time it makes a system call :))
<vai> using int 0x7F actually for system calls
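As far as vai's messages can be read, the issue is that the system-call gate must not share a vector with the remapped 8259 PICs. Otherwise the handler would have to ask the PIC whether each interrupt was really an IRQ. The sketch below produces the standard 8259A initialization writes (ICW1 to ICW4 for a master/slave pair cascaded on IRQ2) as data rather than doing port I/O. It then checks a syscall vector against both 8-vector windows. The offsets used are examples. ICW2 offsets should be multiples of 8, since the PIC supplies the low 3 bits itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct port_write { uint16_t port; uint8_t value; };

/* Standard 8259A init sequence; a real kernel would outb() these in order. */
static size_t pic_remap_sequence(uint8_t master_off, uint8_t slave_off,
                                 struct port_write out[8])
{
    const struct port_write seq[8] = {
        {0x20, 0x11}, {0xA0, 0x11},             /* ICW1: start init, ICW4 follows */
        {0x21, master_off}, {0xA1, slave_off},  /* ICW2: vector offsets */
        {0x21, 0x04}, {0xA1, 0x02},             /* ICW3: slave on IRQ2, cascade id 2 */
        {0x21, 0x01}, {0xA1, 0x01},             /* ICW4: 8086/88 mode */
    };
    for (size_t i = 0; i < 8; i++)
        out[i] = seq[i];
    return 8;
}

/* True if vec falls inside either PIC's 8-vector window. */
static bool vector_hits_pic(uint8_t vec, uint8_t master_off, uint8_t slave_off)
{
    return (vec >= master_off && vec < master_off + 8) ||
           (vec >= slave_off && vec < slave_off + 8);
}
```

With the usual remap to 0x20/0x28, both 0x7F and 0x80 are free for system calls. With a slave offset of 0x80, `int 0x80` would collide with IRQ8.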
KidBeta has quit [Ping timeout: 250 seconds]
springb0k has joined #osdev
SGautam has joined #osdev
srjek|home has joined #osdev
archenoth has joined #osdev
wootehfoot has joined #osdev
ElectronApps has quit [Read error: Connection reset by peer]
SGautam_ has joined #osdev
SGautam__ has joined #osdev
SGautam has quit [Ping timeout: 272 seconds]
SGautam_ has quit [Ping timeout: 268 seconds]
SGautam__ has quit [Quit: Leaving]
Terlisimo has quit [Quit: Connection reset by beer]
theruran has quit [Quit: Connection closed for inactivity]
mahmutov has joined #osdev
Terlisimo has joined #osdev
Skyz has joined #osdev
Skyz has quit [Client Quit]
freakazoid333 has joined #osdev
mahmutov has quit [Ping timeout: 272 seconds]
Brnocrist has quit [Ping timeout: 256 seconds]
Brnocrist has joined #osdev
Gravis has joined #osdev
Gravis has left #osdev [Murdered]
Skyz has joined #osdev
nick8325 has quit [Ping timeout: 256 seconds]
nick8325 has joined #osdev
mlugg has joined #osdev
<mlugg> Hi, I've read about a 46-bit limit on physical addresses on AMD64. From what I've seen online, this is a limitation in 4-level paging (resolved by Intel's 5-level page table proposal), but I can't for the life of me figure out where the limit actually comes from; the addresses in standard page tables (as described in vol 2 of the architecture
<mlugg> programmer's manual) all extend up to 52 bits, so where's 46 come from?
<GeDaMo> Is it not 48?
nick8325 has quit [Quit: Leaving.]
<GeDaMo> "Similarly, the 48-bit virtual address space was designed to provide 65,536 (2^16) times the 32-bit limit of 4 GB (4 × 1024^3 bytes), allowing room for later expansion and incurring no overhead of translating full 64-bit addresses." https://en.wikipedia.org/wiki/64-bit_computing#Limits_of_processors
<bslsk05> ​en.wikipedia.org: 64-bit computing - Wikipedia
<mlugg> That's virtual
<mlugg> And that's a limit I understand
<mlugg> https://www.kernel.org/doc/html/v5.6/x86/x86_64/5level-paging.html here are kernel.org docs for instance, saying "Original x86-64 was limited by 4-level paing [sic] to [...] 64 TiB of physical address space" (64 TiB = 46 bits)
<bslsk05> ​www.kernel.org: 22.4. 5-level paging — The Linux Kernel documentation
<mlugg> And I saw a few random tech news bits online also suggesting that 64 TiB was a physical limit which 5-level paging resolves although it's very possible they were just quoting those kernel.org docs
mlugg has quit [Ping timeout: 240 seconds]
asymptotically has joined #osdev
Brnocrist has quit [Ping timeout: 258 seconds]
<geist> awwthey're gone
<geist> the physical limit is fairly arbitrary and different on different cores
<geist> some are 36, lots are 40, some are 46
<geist> it's described in a ID register
<geist> same as on x86
<j`ey> they said AMD64, not ARM64 :P
<geist> ah you're right. just woke up, eyes aren't seeing well
<geist> but it works same way on x86 really
<geist> cpuid describes the size and you get up to 52. and it can vary between cores
<geist> my take is they just dont include more TLB bits than that particular market the core is designed for targets
<geist> since at the end of the day it's basically tag width
<geist> also tag width in the caches too, but those are invisible
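To expand on geist's answer: on x86-64, the implemented physical address width is reported by CPUID leaf 0x80000008 in EAX bits 7:0, and bits 15:8 give the linear address width. The page-table entry format reserves room for up to 52 bits. The sketch below decodes that register and builds the matching PTE address mask. The `__get_cpuid` wrapper from GCC/Clang's <cpuid.h> is only compiled on x86-64. As for the 64 TiB (2^46 bytes) in the kernel document mlugg quoted, it most likely reflects Linux's own 4-level layout (its direct map of physical memory is 64 TiB) rather than a limit of the page-table format. The log itself doesn't settle that.

```c
#include <stdint.h>

struct addr_widths { unsigned phys_bits, linear_bits; };

/* Decode EAX from CPUID leaf 0x80000008. */
static struct addr_widths decode_cpuid_80000008_eax(uint32_t eax)
{
    struct addr_widths w = { eax & 0xffu, (eax >> 8) & 0xffu };
    return w;
}

/* PTE bits that can hold a physical frame address for a given width
 * (valid for 12 < phys_bits <= 52): bits 12 .. phys_bits-1. */
static uint64_t pte_addr_mask(unsigned phys_bits)
{
    return ((1ull << phys_bits) - 1) & ~0xfffull;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
/* Query the running CPU; returns -1 if the extended leaf is unavailable. */
static int query_addr_widths(struct addr_widths *w)
{
    unsigned a, b, c, d;
    if (!__get_cpuid(0x80000008u, &a, &b, &c, &d))
        return -1;
    *w = decode_cpuid_80000008_eax(a);
    return 0;
}
#endif
```

On a part that reports 48 physical bits, frame-address bits 48..51 in a PTE are reserved and must be zero. The format allows 52 bits, but the core only implements what CPUID says. This is the "tag width" point geist makes.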
zoey has joined #osdev
<Skyz> I'm not sure I'm gonna continue programming, there is too much math involved and solving these problems takes a long time and it's very mentally taxing
<Skyz> I don't know how you know so much about OS now that I started programming
silverwhitefish has joined #osdev
<Skyz> Publishers working on software for an homebrew OS would be cool
<Skyz> I have ideas for an OS I just can't implement them
vai has quit [Remote host closed the connection]
dennis95 has quit [Quit: Leaving]
Brnocrist has joined #osdev
klysm has joined #osdev
tenshi has joined #osdev
Arthuria has joined #osdev
Arthuria has quit [Ping timeout: 265 seconds]
<immibis> Skyz: publishers? Publishers are people who want to make money. Homebrew OSes do not.
mahmutov has joined #osdev
<Skyz> Maybe this time around homebrewers can make money, enough to sustain the OS
<bslsk05> ​awesomekling.github.io: I quit my job to focus on SerenityOS full time – Andreas Kling – I like computers!
freakazoid333 has quit [Read error: Connection reset by peer]
<immibis> nope.
freakazoid333 has joined #osdev
freakazoid333 has quit [Read error: Connection reset by peer]
tenshi has quit [Quit: WeeChat 3.2]
freakazoid333 has joined #osdev
freakazoid333 has quit [Read error: Connection reset by peer]
dutch has quit [Quit: WeeChat 3.0.1]
dutch has joined #osdev
warlock_ has joined #osdev
warlock_ is now known as doubletoker
doubletoker has left #osdev [#osdev]
Skyz has quit [Quit: Client closed]
Skyz has joined #osdev
offlinemark has quit [Quit: Connection closed for inactivity]
Skyz has quit [Quit: Client closed]
Skyz has joined #osdev
GeDaMo has quit [Quit: Leaving.]
robert_ has joined #osdev
<kazinsal> the serenityos guy is also a dickhead who thinks that operating system hobbyist purity tests should be conducted and include ideas such as "no precompiled images because if you're not building it yourself you're not a real osdev hobbyist"
<j`ey> I still hope they'll budge on that one day heh
SanchayanMaity has quit [Ping timeout: 272 seconds]
paulbarker has quit [Read error: Connection reset by peer]
jakesyl has quit [Read error: Connection reset by peer]
<Skyz> Serenity seems pretty significant
asymptotically has quit [Quit: Leaving]
freakazoid333 has joined #osdev
<Skyz> If it only goes to show that there is interest still in people having a new pc experience
<Skyz> I think win11 will be interesting once it comes out
SanchayanMaity has joined #osdev
jakesyl has joined #osdev
paulbarker has joined #osdev
<Skyz> Seems difficult to understand all the specifications for an OS
<j`ey> yup
<Skyz> I'm looking at the APIC now, and can only grasp a small amount of it, something about IRQs
<Skyz> I see that spoken about a lot
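For the I/O APIC part of what Skyz is reading: each interrupt input pin has a 64-bit redirection table entry that says which IDT vector to raise and on which CPU. The sketch below encodes one entry, using the field layout from Intel's 82093AA I/O APIC datasheet with fixed delivery mode and physical destination mode. The helper names are hypothetical, and the MMIO writes through IOREGSEL/IOWIN are omitted. The low half of the entry for pin n is register 0x10 + 2n, and the high half is 0x11 + 2n.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build a redirection entry: fixed delivery (bits 8-10 = 000), physical
 * destination mode (bit 11 = 0). */
static uint64_t ioapic_redir_entry(uint8_t vector, uint8_t dest_apic_id,
                                   bool active_low, bool level_triggered, bool masked)
{
    uint64_t e = vector;                 /* bits 0-7: IDT vector to raise */
    if (active_low)
        e |= 1ull << 13;                 /* bit 13: pin polarity */
    if (level_triggered)
        e |= 1ull << 15;                 /* bit 15: trigger mode */
    if (masked)
        e |= 1ull << 16;                 /* bit 16: interrupt mask */
    e |= (uint64_t)dest_apic_id << 56;   /* bits 56-63: destination APIC ID */
    return e;
}

/* Register index of the low 32 bits of a pin's entry. */
static uint8_t ioapic_redir_reg_low(unsigned pin)
{
    return (uint8_t)(0x10 + 2 * pin);
}
```

ISA IRQs are usually edge-triggered and active-high, while PCI interrupts are usually level-triggered and active-low. The ACPI MADT's interrupt source overrides tell you when a particular line differs from those defaults.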
pony has quit [Quit: WeeChat 2.8]
pony has joined #osdev
CryptoDavid has joined #osdev
pony has quit [Quit: WeeChat 2.8]
pony has joined #osdev
pony has quit [Client Quit]
mrlemke has joined #osdev
_mrlemke_ has quit [Read error: Connection reset by peer]
freakazoid333 has quit [Read error: Connection reset by peer]
zoey has quit [Remote host closed the connection]