klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
<vin> I guess it's something most people want to achieve but the world is only making it harder with 100s of distractions.
Arthuria has quit [Remote host closed the connection]
sortie has quit [Quit: Leaving]
zoey has quit [Ping timeout: 255 seconds]
silverwhitefish has quit [Quit: One for all, all for One (2 Corinthians 5)]
dutch has quit [Quit: WeeChat 3.2]
dutch has joined #osdev
<heat> can osdev generally be weighed in as "relevant work experience" on a kernel/embedded/os dev job?
silverwhitefish has joined #osdev
<klysm> vin, how did you arrive at your above question? are you a disciplined person? why so/not?
<klysm> and, I am generally working on a problem until the inevitable peek at irc, which usually takes me away from my work. as an aside, others have generally not expressed an interest in solving problems that I create. I've gone towards several projects in the last several years, and done so mostly solo. Talking can be educational, just that not everyone can see things the same way. This would be required for
<klysm> collaboration.
<vin> klysm: I started reading "deep work" (I could have unconsciously picked it up because I was not happy with the amount of work I do in a day). Especially when I am alone working on a project with no deadline and no schedule to put pressure on me, I realized I constantly seek instant gratification although I really want to do a good job on the project.
<vin> klysm: Am I a disciplined person? Right now I don't think so. Which is why I believe creating silos is important. DNS blocking, throwing my phone away, are some things I am resorting to. I wonder if this normal?
Oli has joined #osdev
<vin> *this is normal
gog has quit [Ping timeout: 265 seconds]
srjek|home has quit [Ping timeout: 255 seconds]
<doug16k> omg. two triangles 640x480 32bpp 24bit-z opengl 3.3 core simplistic shader render is 11-13 microseconds per frame, lol
<doug16k> ah. it's hex. ok not that impossibly good
<doug16k> oops
<doug16k> 71-73us per frame
<doug16k> debug context though
vai has quit [Ping timeout: 255 seconds]
<klange> if any graphics masters [*cough*] want to take a poke at my shoddy little bilinear filter transformation system and help make it faster...: https://github.com/klange/toaruos/blob/master/lib/graphics.c#L764
<bslsk05> ​github.com: toaruos/graphics.c at master · klange/toaruos · GitHub
freakazoid333 has joined #osdev
ids1024 has quit [Ping timeout: 256 seconds]
ids1024 has joined #osdev
sts-q has quit [Ping timeout: 256 seconds]
sts-q has joined #osdev
ElectronApps has joined #osdev
isaacwoods has quit [Quit: WeeChat 3.2]
nyah has quit [Ping timeout: 250 seconds]
Izem has joined #osdev
<Izem> what's an interesting idea for a graphics layer if you are only interested in ascii?
<Mutabah> A framebuffer-backed terminal you mean?
<Izem> yeah, but specifically I don't want to go the terminal route with escape codes and all that
<Mutabah> You could make your API use out-of-band signalling
<Izem> graphics are not gonna be important to me for a while
<Mutabah> or be stateless (and calls that provide the rendering particulars)
<Izem> out of band signalling sounds like a terminal?
<Izem> oh right, I messed up
<Izem> I meant english text
<Izem> not ascii
<moon-child> Izem: look at bearlibterminal and libtickit
<moon-child> good libs for text-based graphics
<moon-child> tickit has a concept of a 'pen', which is associated with some set of style information (fg/bg colour, bold/italic/underline, ...), and you can say 'draw using this pen'. It keeps the good parts of palettes (a la curses) without the bad
<Izem> thanks
<klange> Curses was more of a solution to a problem that no longer exists than it was a good approach to managing TUIs.
<moon-child> at the lowest level, the simplest thing is double buffered grid of cells, where each cell contains style information and text (or a note that the current cell is the continuation of a double-width character in the previous cell). There is more you can do, but past a certain point it stops making sense to call it 'text' (or, it stops making sense to put all that in a text-only framework)
<klange> ^ This is pretty much how my terminal works.
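A minimal sketch of the double-buffered cell grid described above, written in Python for brevity. The names `Cell`, `Screen`, `put` and `flush`, and the particular style fields, are made up for illustration; they are not taken from tickit, bearlibterminal, or klange's terminal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    text: str = " "             # the character shown in this cell
    fg: int = 7                 # foreground colour (hypothetical palette index)
    bg: int = 0                 # background colour
    bold: bool = False
    continuation: bool = False  # True if this is the right half of a wide character


class Screen:
    def __init__(self, width, height):
        self.width = width
        self.height = height
        # front = what the display currently shows, back = what we want it to show
        self.front = [[Cell() for _ in range(width)] for _ in range(height)]
        self.back = [[Cell() for _ in range(width)] for _ in range(height)]

    def put(self, x, y, text, wide=False, **style):
        """Write one character into the back buffer."""
        self.back[y][x] = Cell(text=text, **style)
        if wide and x + 1 < self.width:
            # Mark the next cell as belonging to the wide character on its left.
            self.back[y][x + 1] = Cell(text="", continuation=True, **style)

    def flush(self):
        """Return the (x, y, cell) updates needed, then make front match back."""
        changes = []
        for y in range(self.height):
            for x in range(self.width):
                if self.back[y][x] != self.front[y][x]:
                    changes.append((x, y, self.back[y][x]))
                    self.front[y][x] = self.back[y][x]
        return changes
```

Only the cells returned by `flush()` need to be redrawn, whether the output is a framebuffer, VGA text memory, or a serial line. That saving is the usual reason for keeping two buffers.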
<moon-child> klange: talking specifically about curses's colour pairs; partly they're part of the hardware (afaik?), but they're also meaningful at the application level as a way to create a set of associated widgets
<moon-child> e.g. alpine uses them this way
<klange> Color pairs aren't really a thing, nothing does palettized fg/bg colors like that that I'm aware of, and frankly I think that's one of curses' biggest sins that did not adapt as terminals got better.
<klange> Now that we can throw 24 bit colors at each of the fg and bg, having a limited set of pairs to pick from is an antiquated relic of the days when memory was measured in kilobytes.
<moon-child> yes, I hate colour pairs with a passion
<Izem> conversely, have people gotten to do osdev without a ui? I can't picture that
<moon-child> even tried to get them to let me fix it for ncurses (https://lists.gnu.org/archive/html/bug-ncurses/2019-08/msg00019.html)
<bslsk05> ​lists.gnu.org: A colouring api that doesn't suck
<klange> Plenty of kernel projects that don't touch anything front-end-related, but I think without at least some attempt at a user interface you're not doing "operating system" development if you're _just_ doing a kernel with no user-facing way to do stuff with it.
<klange> As the classic GNU/Linux copy-pasta implies, a kernel alone is not an operating system.
<Izem> that made me think of a server focused os, but yeah without a ui how do you poke about? :P
<Izem> this will be interesting
<moon-child> you don't need much more than dumb tty to run a shell. Can even punt on the actual display part if you do net and expose telnet (or ssh even!) instead
<moon-child> s/much//
<klange> Serial or bust.
<Izem> do emulation tools support serial?
<klange> Of course.
<Izem> like vbox and vmware
<Izem> oh ok cool
<klange> You'll find options such as serial-over-TCP, serial-over-Unix-socket, serial to a file, and qemu has both serial on stdio and also has options for tabs in the GUI frontends though I have no idea what sort of terminal emulation it supports.
<klange> This clock reminds me that I should really implement timezone stuff so I can do UTC RTC...
<kazinsal> vmware's serial-over-tcp on ESXi requires a proper full monty vSphere license
<kazinsal> but yeah, serial over whatever you want is common
<klange> Workstation supports pretty much everything but hides virtually everything in a config file.
<klange> Quick explanation of the QEMU options in that screenshot since it's kinda odd:
<moon-child> don't you have to pay for all the versions of vmware?
<klange> No, Workstation is free-for-noncommercial-use.
<moon-child> I mean, month-long free trial (that you can reset at will with questionable legality), but
<moon-child> ah, hmm
<klange> It's a pain in the ass for actually debugging, with intentional misfeatures like having to restart the whole application to get back to the machine configuration.
<klange> `-nographic` pretty self-explanatory, `-no-reboot` quits on restart and exiting the shell triggers restart, so `exit` does what it should, `-audiodev none,id=id` shuts up pulse to keep the output clean as it inevitably complains about something
<klange> `-serial null -serial mon:stdio` this one is fun; this disables "COM1", which in this case doesn't mean much but in UEFI boot keeps OVMF from spamming crap to the terminal; the monitor and stdio serial running "COM2", which I map as /dev/ttyS1
<Izem> does COM{1,2} predate windows?
<klange> Those names for them are DOS-era and not really used in x86 Unix-likes.
<Izem> I see, I wondered at that since I remember qemu is an open project
<klange> The fw_cfg options: opt/org.toaruos.gettyargs gets passed to the 'getty' app that manages serial consoles, -a like in Linux getty means "autologin", and /dev/ttyS1 is COM2.
<Izem> thanks
<klange> opt/org.toaruos.bootmode - my bootloader parses this to pick a boot mode without the UI, supports a few different strings for quick boot.
<klange> opt/org.toaruos.term - the Makefile is actually setting this to $TERM, gets read by one of my init apps, possibly getty? I don't even remember! but ensures the hosted terminal knows what it's running on
<klange> getty also does one other fun little hack where it rams the cursor into the lower right corner and does a position report request, so it can get the size of a remote terminal.
<klange> Or more correctly, shells out to a tool that does that: https://klange.dev/s/Screenshot%20from%202021-07-14%2013-34-08.png
* kingoffrance .oO( "if any graphics masters [*cough*] " ) *coughs* and sticks magnets under graphics master
<kingoffrance> ive done all i can do
<klange> Just in case anyone thought I was a GUI hardliner, heck no, I provide first-class experience over serial and in VGA text mode.
<Izem> klange: I don't get that bit about the cursor
<klange> Terminals have a size. If you are attached directly to a terminal emulator there is a signalling mechanism where the terminal emulator can tell the TTY layer how big it is.
<klange> This is important for running any TUI app, of course.
<Izem> yeah
<klange> And if you are using ssh, and even telnet, there are mechanisms for those to pass this information between endpoints.
<klange> Serial does not have this, you have to manually configure sizes.
<Izem> but can't you do that without putting the cursor in the corner?
<klange> Weirdly, there is no standardized escape sequence for "tell me how big you are". Not sure why, just never seemed to happen. But there's a silly workaround: There are cursor movement sequences, very standard, been around for ages, and there is "cursor position report" that shoves data into the input buffer.
<Izem> oh I see
<Izem> makes sense
<Izem> kingoffrance: did you ever read graphics gems?
<klange> And the standard handling of a position that is too big is to 'trap' the cursor in the bottom right corner. So you ask for a ridiculous position like 10000,10000 and then ask where the cursor is and bam, you know the size of the terminal.
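The trick above can be sketched in Python as follows. This is an illustration, not the tool from the screenshot. It assumes the usual ANSI sequences: `ESC [ row ; col H` to move the cursor, `ESC [ 6 n` to request a position report, and a reply of the form `ESC [ row ; col R`. `PROBE` and `parse_size` are hypothetical names.

```python
import re

# Move the cursor far past any plausible screen size, then ask where it ended up.
PROBE = b"\x1b[10000;10000H" + b"\x1b[6n"

# A cursor position report looks like ESC [ <row> ; <col> R
_CPR = re.compile(rb"\x1b\[(\d+);(\d+)R")


def parse_size(reply):
    """Return (rows, cols) from a cursor position report, or None if absent."""
    m = _CPR.search(reply)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))
```

A real caller would put the tty into raw mode, write `PROBE`, read bytes until the terminating `R` arrives (with a timeout, in case the far end never answers), and then restore the cursor position. Those parts depend on the platform and are left out here.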
<moon-child> you can also use the 'cursor report' sequence to do other fun things (http://nethack4.org/blog/portable-terminal-codes.html)
<bslsk05> ​nethack4.org: Towards being able to ignore $TERM
<klange> You can also use cursor report to figure out a remote terminal's wcwidth, but it's, uh, messy.
<kingoffrance> Izem no, i know nothing, just ...theres ways to summon certain channel members...
<kingoffrance> i used to have https://www.jagregory.com/abrash-black-book/ but never went through it
<bslsk05> ​www.jagregory.com: Michael Abrash’s Graphics Programming Black Book, Special Edition
<Izem> thanks, seems to have a good bit about the vga
<kingoffrance> if i ever got that far, i will start with serial port and "at the lowest level, the simplest thing is double buffered grid of cells, where each cell contains style information and text"
ElectronApps has quit [Remote host closed the connection]
<Izem> sounds like what I'm gonna do :P
<Izem> but I'm also going to have to answer important questions about what an OS is so I don't end up making emacs
<kingoffrance> and then "client" whatever can decide how many of "style" stuff it can honour/display, else fall back is ignore them all i suppose
ElectronApps has joined #osdev
<kingoffrance> even that, "text" still means "charset" or utf or whatever, so itself needs defined
heat has quit [Ping timeout: 276 seconds]
<kingoffrance> anyhow, i have no idea if that is good idea, just i envision the "style" stuff that you could still have "client" display portions, even if it cant handle the full deal
<klange> Things like `screen` and `tmux` do that, they can take in their own particular dialect of the standard escape sequences and output through a variety of dialects.
<kingoffrance> "Implementing and Optimizing Bresenham’s Line-Drawing Algorithm" i do have a very crude that, not optimized, but i dont consider that anything except "this is how you normalize/fudge a line to square pixels"
<kingoffrance> (mentioned in book, i used that elsewhere, but not really hooked up to anything)
<kingoffrance> i think that is very basic/famous/simple, just thats maybe as far as graphics i have got
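For reference, an unoptimized sketch of the integer form of Bresenham's line algorithm in Python. This is the common all-octant textbook variant, not kingoffrance's code and not the optimized version from the Black Book. The function name is arbitrary.

```python
def bresenham(x0, y0, x1, y1):
    """Return the list of integer pixel coordinates on the line from (x0, y0) to (x1, y1)."""
    points = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1   # step direction in x
    sy = 1 if y0 < y1 else -1   # step direction in y
    err = dx + dy               # running error term, kept in integers
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:            # error says we should step in x
            err += dy
            x0 += sx
        if e2 <= dx:            # error says we should step in y
            err += dx
            y0 += sy
    return points
```

Both endpoints are included, and each iteration steps in x, in y, or in both, so the line has no gaps.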
<klange> I can report it is sunny and nice on my side of Tokyo, so these thunderstorms must be on the south side...
<klange> The little bugfix I did to my rounded rectangle renderer has a noticeable effect on the corners of these bubbly popups.
<klange> I wonder if my fuzzy unhinted text would be improved with a gamma curve or whatever it's called?
<moon-child> try it. Just do clr_val = pow(clr_val, 2.2)
<moon-child> (where the value is in [0,1])
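Spelled out as a small Python sketch: the `pow(clr_val, 2.2)` suggestion applied to a normalised value, plus a variant for 8-bit channels. The function names and the 8-bit wrapper are assumptions added for illustration. How this would hook into an actual glyph rasterizer is not shown.

```python
def apply_gamma(value, gamma=2.2):
    """Raise a normalised channel or coverage value in [0, 1] to the given power."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must be in [0, 1]")
    return value ** gamma


def apply_gamma_u8(byte, gamma=2.2):
    """Same curve for an 8-bit value: normalise, apply, scale back to 0..255."""
    return round(apply_gamma(byte / 255.0, gamma) * 255.0)
```

An exponent above 1 pushes mid-range values down and an exponent below 1 (for example `1 / 2.2`) pushes them up, while 0 and 1 stay fixed. Trying a different curve is a matter of passing another `gamma`.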
<Izem> when doing the cross compiler does that mean all the binutils have to be prepared the same way?
lucf117 has quit [Remote host closed the connection]
<klange> standard 2.2 way too wiry, but might look into other gamma curves...
Izem has quit [Quit: Izem]
MarchHare has quit [Ping timeout: 255 seconds]
MarchHare has joined #osdev
^[ has quit [Ping timeout: 276 seconds]
^[ has joined #osdev
ElectronApps has quit [Read error: Connection reset by peer]
ElectronApps has joined #osdev
MarchHare has quit [Ping timeout: 255 seconds]
vdamewood has joined #osdev
sortie has joined #osdev
Burgundy has joined #osdev
elastic_dog has quit [Ping timeout: 255 seconds]
elastic_dog has joined #osdev
mhall has joined #osdev
elastic_dog has quit [Ping timeout: 255 seconds]
elastic_dog has joined #osdev
gmacd has quit [Remote host closed the connection]
zaquest has joined #osdev
dennis95 has joined #osdev
gog has joined #osdev
z_is_stimky has quit [Read error: Connection reset by peer]
z_is_stimky_ has joined #osdev
GeDaMo has joined #osdev
elastic_dog has quit [Ping timeout: 245 seconds]
elastic_dog has joined #osdev
dormito has quit [Ping timeout: 255 seconds]
Skyz has joined #osdev
vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
<Skyz> I think it would be interesting to make an OS that starts off with only two colors, black and white
dormito has joined #osdev
<Skyz> Something that can resemble the game boy original feel
<klange> 1) That's a hardware thing, not really an OS thing. 2) The original GameBoy, rather famously, was 2-bit grayscale, and its LCD's color contrast and green backdrop meant it was really more "4 shades of olive" than even "gray". Not at all "black and white".
<Skyz> 2-bit grayscale huh
<Skyz> It could be self-imposed on a software level
<Skyz> I'm not looking to follow the tutorial to make the same thing
elastic_dog has quit [Ping timeout: 255 seconds]
elastic_dog has joined #osdev
silverwhitefish has quit [Quit: One for all, all for One (2 Corinthians 5)]
scaleww has joined #osdev
elastic_dog has quit [Ping timeout: 245 seconds]
ahalaney has joined #osdev
elastic_dog has joined #osdev
xenos1984 has quit [Remote host closed the connection]
xenos1984 has joined #osdev
vai has joined #osdev
<vai> hi all :)
<vai> sortie: hi! :D
<sortie> Hi
<Skyz> Hi
iorem has joined #osdev
isaacwoods has joined #osdev
johnjay has quit [Ping timeout: 255 seconds]
silverwhitefish has joined #osdev
heat has joined #osdev
shikhin has quit [Quit: Quittin'.]
zgrep has quit [Quit: It's a quitter's world.]
zgrep has joined #osdev
shikhin has joined #osdev
johnjay has joined #osdev
<gog> hi
Skyz has quit [Quit: Client closed]
<jimbzy> Sup gog
Skyz has joined #osdev
<Skyz> Okay, so I'm doing some research into reflection-oriented programming
<Skyz> I think if you use a language to construct the os that has a bytecode interpreter it could reflect on itself
<jimbzy> Why?
srjek|home has joined #osdev
<Skyz> self-awareness
nyah has joined #osdev
<Skyz> In computer science, reflection is the ability of a computer program to examine (see type introspection) and modify its own structure and behavior (specifically the values, meta-data, properties and functions) at runtime.[1]
<jimbzy> I know what reflection is, but why are you interested in it?
<Skyz> It can be used for hacking games
<jimbzy> That's cheating, tho.
<Skyz> ¯\_ (ツ)_/¯
<jimbzy> I never used reflection for that. I used a debugger and a hex editor.
<jimbzy> I also got an angry letter from Hasbro because they didn't like my RCT trainer.
<Skyz> I only cheat at games that you can't beat
<jimbzy> I can beat any game, except one. I have never beaten the original Battletoads on the NES.
<Skyz> I hear that game is hard to beat
<gog> i can't beat any games
<Skyz> I played frogger as a kid for ps
<jimbzy> I love frogger, too.
<jimbzy> gog, I have to play offline because the kids beat my ass and make fun of me :(
<gog> :(
<heat> i love getting told what to do in online games by Polish 9 year olds
<Skyz> lol
<heat> it's the cherry on top of the cake
<jimbzy> heat, I had a great team when CoD:BO first came out.
<jimbzy> We were all over 30 and very organized.
<jimbzy> At one point, we were ranked like 5000th overall in zombie mode on Kino Der Toten.
<jimbzy> It's funny, too, because to this day when my son sees a Nazi in uniform on TV he calls them "Zombzis"
<Skyz> I haven't played CoD since black ops
<Skyz> Never got into it
<Skyz> Was more of a fan of Halo
<jimbzy> Never tried Halo
<jimbzy> Hell, I've never actually played Half-Life.
<heat> i've never played COD in my life lol
<jimbzy> I played the hell out of that one, but that was about it.
<heat> all I play is counter strike and rocket league
<jimbzy> I started playing the original FFVII again on PS4.
<heat> the toxicity is truly part of the experience
<jimbzy> It's too much like going to a family reunion for me, heat :p
<heat> :D
<gog> i'd rather get called f-g by somebody whose face i can't see than my cousin :p
<jimbzy> That happened at the last one I went to in 2009.
<jimbzy> My cousin called my younger brother that. It didn't end well for him.
freakazoid333 has quit [Read error: Connection reset by peer]
<gog> good
<jimbzy> Yeah, they're pretty ignorant.
MarchHare has joined #osdev
<jimbzy> I think I'm going to walk down to the store and get a cup of coffee before it heats up out there. I'll catch you all later.
<gog> byee
heat has quit [Read error: Connection reset by peer]
kingoffrance has quit [Ping timeout: 252 seconds]
mahmutov has joined #osdev
iorem has quit [Quit: Connection closed]
kingoffrance has joined #osdev
kingoffrance has quit [Ping timeout: 255 seconds]
freakazoid333 has joined #osdev
kingoffrance has joined #osdev
<Skyz> What if Hollywood is right about skynet :o
scaleww has quit [Quit: Leaving]
Oli has quit [Quit: Lost terminal]
<Skyz> Well it looks like skynet is built
<bslsk05> ​en.wikipedia.org: Skynet (satellite) - Wikipedia
ElectronApps has quit [Ping timeout: 272 seconds]
mahmutov has quit [Ping timeout: 245 seconds]
<Skyz> geist: there's even a satellite called Zircon https://en.wikipedia.org/wiki/Zircon_(satellite)
<bslsk05> ​en.wikipedia.org: Zircon (satellite) - Wikipedia
<Skyz> G2G, that's some food for thought
<bslsk05> ​en.wikipedia.org: John von Neumann - Wikipedia
<Skyz> His wiki got updated significantly
Skyz has quit [Quit: Client closed]
<nur> would I be a bad osdevver if I started poking at another hardware platform while also doing another
<sortie> It will change your alignment from lawful to chaotic, but no change to your moral standing
srjek|home has quit [Ping timeout: 255 seconds]
<gog> chaotic neutral gang
<nur> I always wanted to be more Han Soloesque
<nur> what the hell
<nur> let's do this
<kingoffrance> as ive said months ago, han solo is ship of theseus "ive made some upgrades"
<kingoffrance> so, i see no conflict...
zoey has joined #osdev
<nur> is raspi4 a valid qemu-arm target machine
<nur> I feel like it's been merged yet
<nur> $ qemu-system-aarch64 -machine raspi4
<nur> qemu-system-aarch64: -machine raspi4: unsupported machine type 'raspi4'
<nur> lol nope it doesn't work
<clever> nur: what about `-machine help` ?
<nur> yeah it's not there
<nur> $ qemu-system-aarch64 --version
<nur> QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.16)
<nur> maybe it's the version
<clever> hw/arm/raspi.c
<clever> it's defined in this file of the source
<bslsk05> ​github.com: qemu/raspi.c at master · qemu/qemu · GitHub
<clever> nur: yep, 4 is missing from master
<j`ey> no rpi4
zoey has quit [Ping timeout: 255 seconds]
<nur> someday
<clever> nur: what pi4 specific feature are you wanting to emulate?
<nur> not there yet :)
<nur> just wondering if it's very different
<clever> behind the scenes, a lot has changed
<clever> but qemu didnt emulate 90% of the stuff that has changed
<clever> the only real difference qemu supports, is >1gig of ram
<clever> and pci-e support
<clever> everything else is missing from qemu, even on the pi3 machine
<nur> I guess I can start working on my arm OS on RPI3 Qemu edition and worry about it when I can afford to buy a real one
<j`ey> nur: or just target the 'virt' machine
<nur> so it'll be like a "generic ARM OS"?
<clever> nur: ive been doing baremetal on the rpi as well, but not using the arm core
<j`ey> nur: ish
<nur> It still boggles me how I can just "not worry about it too much"
<clever> nur: do you want to support the rpi specifically, or just do generic arm dev? does it have to be arm?
<nur> I want to buy a RPI so I can boot my OS on it
<nur> so yes?
<nur> it doesn't _have_ to be ARM but it has to be something right
<clever> nur: there is also the much less traveled road, of doing VPU development on the rpi
<j`ey> dont do that
<j`ey> lol
<clever> nur: the rpi has 2 separate cpu clusters in the same chip
<clever> j`ey: why not? :D
<nur> yeah I just wanna target ARM so that when I can get my hands on a real machine I can try it out
<j`ey> clever: you know why P
Skyz has joined #osdev
<bslsk05> ​github.com: lk-overlay/start.S at master · librerpi/lk-overlay · GitHub
<clever> nur: an example of VPU asm, line 12 turns IRQ's off, 13 sets up the stack, 15 may enable the uart very early, 17 clears .bss, 18 passes control to C, and 20 will loop forever if C somehow returned
<clever> from then on, you can just use C, and any rpi peripheral, same as if you were on the ARM core
<Skyz> Nur: I would do virtu or you can target the "hack" platform from NandToTetris
<bslsk05> ​onatm.dev: Anatomy of a Hack assembly program - Part 1 | Extremely random blog posts from Onat
zoey has joined #osdev
srjek|home has joined #osdev
<nur> thanks :)
<Skyz> Np
dennis95 has quit [Quit: Leaving]
<Skyz> I think you can put the hack computer on an FPGA
<bslsk05> ​hackaday.io: Nand to Tetris in Verilog Part 1 - Icarus | Hackaday.io
<immibis> hey guys, hey guys, guys, hey. What if the processor could speculate BOTH sides of the branch?
<GeDaMo> I think that some do
tacco has joined #osdev
Vercas has quit [Remote host closed the connection]
Vercas has joined #osdev
Mooncairn has joined #osdev
<Skyz> Found something working on FPGA
<Skyz> Full project is here
<bslsk05> ​gitlab.com: Michael Schröder / nand2tetris-FPGA · GitLab
freakazoid333 has quit [Read error: Connection reset by peer]
Skyz has quit [Ping timeout: 246 seconds]
<jimbzy> NAND to Tetris, eh?
<GeDaMo> NAND 2 Tetris is good
srjek|home has quit [Ping timeout: 245 seconds]
<GeDaMo> At least the hardware part is, I lost interest when it got to the software part :P
* geist yawns
<geist> good afternoon folks
<jimbzy> What hardware does it support?
<jimbzy> Hey g.
<jimbzy> What's up?
<GeDaMo> It's all done through simulators
<geist> oh not much
<jimbzy> If I'm not mistaken, Tetris was originally written for the Electronika 60, which was similar to the PDP-11.
asymptotically has joined #osdev
<gog> yes
<gog> LSI's clone
Skyz has joined #osdev
<Skyz> Yes, you're right. Tetris was made in Russia
<Skyz> I'm actually surprised how advanced Russia is with technology
<Skyz> Germany too
<gog> it was made in the soviet union, tovarisch
<j`ey> linus tovarisch
<Skyz> Well, there are a lot of places technology is being produced
<Skyz> The course was written by someone in Israel, they have some affinity with the Soviet Union
freakazoid333 has joined #osdev
immibis has quit [Killed (NickServ (GHOST command used by immibis_!~immibis@2a02:3032:404:1b60:a9d2:4a7a:60e:e127))]
immibis has joined #osdev
<moon-child> immibis: wait'll you hear about hyperthreading. If the cpu runs into a branch (or cache miss, just generally runs out of things to speculate), it can just pick some other instructions to execute instead
<Skyz> I've recently seen a lot with China, there is an interesting look on tech on Bloomberg's YT
<moon-child> ;o
<clever> moon-child: from what ive heard, x86 hyperthreading is a complex blend, of having say 2 opcode decoders, and 2 very basic opcode execution units, but then sharing more expensive things like sse and fpu cores
<bslsk05> ​'Inside China's Accelerating Bid for Chip Supremacy' by Bloomberg Quicktake (00:19:49)
<clever> moon-child: on the VPU for example, each core has 2 scalar units and 1 vector unit, so its able to cheat and run 2 scalar opcodes in the same clock cycle (if conditions are right)
<clever> moon-child: and if you start a vector opcode, but dont try to read its result right away, the scalar units can run dozens of opcodes, in parallel to the vector unit doing one opcode
<clever> x86 hyperthreading, is probably just sharing some of those units between multiple cores
<geist> clever: not really the right terminology
* sortie works on Unix socket file descriptor passing support in sendmsg/recvmsg
<geist> x86 hyperthreading is a marketing term for SMT (simultaneous multithreading). been around a while, lots of architectures have had it
<geist> think of it as a single core that is simply maintaining state of N separate hardware threads
<geist> and internally switching between instructions between threads
<sortie> Damn this convoluted 2017-era prototype code of mine scares me and there's reasons why it's not simpler and lots of unhandled edge conditions
<geist> so it's less that multiple cores are sharing hardware (except maybe AMD's bulldozer) and more that a single core holds multiple hardware thread state at the same time
<clever> geist: ah, maybe i'm thinking of something else
YuutaW has quit [Ping timeout: 240 seconds]
<geist> AMD's somewhat ill conceived CMT (bulldozer) is a different story
<clever> so hyperthreading is more about just having say double the registers, and context switching when it would have stalled?
<geist> clever: basically right
<kingoffrance> not that i know anything but another approach? https://en.wikipedia.org/wiki/Digital_signal_processor DSPs are usually optimized for streaming data and use special memory architectures that are able to fetch multiple data or instructions at the same time, such as the Harvard architecture or Modified von Neumann architecture, which use separate program and data memories (sometimes even concurrent access on multiple data buses). all i know is
<kingoffrance> , "MIPS" you must compare apples and oranges to have any semblance of meaningfulness. even that ancient black book graphics programming "As I was writing my last game, I discovered that the program ran perceptibly faster if I used look-up tables instead of shifts and adds for my calculations. It shouldn’t have run faster, according to my cycle counting, but it did. In truth, instruction fetching was rearing its head again, as it often does,
<bslsk05> ​en.wikipedia.org: Digital signal processor - Wikipedia
<kingoffrance> and the fetching of the shifts and adds was taking as much as four times the nominal execution time of those instructions." under the heading "assume nothing"
<clever> and i can see how each stage in the pipeline could be doing a different thread
<clever> so it can weave them together
<geist> there are all sorts of schemes to do decide when to context switch, but i think in the steady state most designs will just toggle back and forth
<geist> clever: right, and all of the usual dependency tracking hardware Just Works in a SMT case
<geist> since instructions from different threads are intrinsically not dependent on each other
<geist> so you already have this highly out of order cpu that just tosses a bunch of unrelated instructions at it, it is even more efficient at it
<bslsk05> ​en.wikipedia.org: Bulldozer (microarchitecture) - Wikipedia
<clever> model name : AMD FX(tm)-8350 Eight-Core Processor
<clever> and i'm on the list in that wiki page!
<clever> so my cpu is the exception to that rule, and isnt the same kind of hyperthreading?
<geist> note that bulldozer and amd's CMT was distinctly different from standard SMT, since it shares far less than the whole cpu
<geist> right. CMT turned out to be a Bad Idea
<geist> zen switched to full SMT
<clever> what was a bad idea with it?
<geist> didn't work worth a crap
<geist> or more to the point SMT simply does a better job
ccx has quit [Ping timeout: 272 seconds]
<clever> ah
<geist> hard to tell if CMT was torpedoed mostly by bad implementation or was a fundamentally bad idea
<immibis> clever: you are describing the modern out-of-order-execution system, not sure if it has a name, but "tomasulo's algorithm" may come close
<geist> but on paper SMT should generally be superior in every way
<immibis> the decoder issues instructions as fast as it can; they wait in some kind of buffer until their dependencies are satisfied; then they get allocated to any available execution unit (or wait further if none is available)
<geist> my understanding is CMT simply shares the fpu/vector back end between pairs of cores, but otherwise they're standalone
<immibis> any available execution unit which can execute that instruction*
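A toy Python model of the issue scheme immibis describes: decoded instructions wait in a buffer until their dependencies are satisfied, then go to any free execution unit of the right kind. It is a deliberately simplified sketch and not Tomasulo's algorithm (there is no register renaming, no reservation stations and no speculation). `Instr`, `schedule`, and the unit kinds `"alu"`/`"fpu"` are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    kind: str            # class of execution unit it needs, e.g. "alu" or "fpu"
    deps: tuple = ()     # names of earlier instructions whose results it needs
    latency: int = 1     # cycles spent in the execution unit


def schedule(program, units):
    """Return {instruction name: cycle at which it started executing}.

    `units` maps a unit kind to how many units of that kind exist.
    """
    seen = set()
    for instr in program:
        if units.get(instr.kind, 0) < 1:
            raise ValueError(f"no execution unit for {instr.kind!r}")
        if any(d not in seen for d in instr.deps):
            raise ValueError(f"{instr.name} depends on something not yet decoded")
        seen.add(instr.name)

    waiting = list(program)      # the buffer, in program order
    done_at = {}                 # name -> cycle at which its result is available
    started = {}
    busy_until = {kind: [0] * n for kind, n in units.items()}
    cycle = 0
    while waiting:
        for instr in list(waiting):
            ready = all(d in done_at and done_at[d] <= cycle for d in instr.deps)
            if not ready:
                continue         # keep waiting on operands
            slots = busy_until[instr.kind]
            for i, free_at in enumerate(slots):
                if free_at <= cycle:
                    slots[i] = cycle + instr.latency
                    started[instr.name] = cycle
                    done_at[instr.name] = cycle + instr.latency
                    waiting.remove(instr)
                    break        # if no unit was free, it simply waits further
        cycle += 1
    return started
```

For example, with `a` (alu), `b` (fpu, needs `a`, latency 3), `c` (alu, needs `b`) and an independent `d` (alu) on one ALU and one FPU, `d` starts before `c` even though it comes later in program order. In this toy model that is the out-of-order part.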
<geist> yah one way of thinking about modern designs is they're really a pile of little cpus all talking to each other asynchronously
<clever> immibis: and my understanding is that the bulldozer CMT, is sharing those execution units between cores?
<kingoffrance> *must compare apples and apples
<clever> geist: so if 2 cores are doing non-vector ops, they can run in parallel, and do more work than a SMT design? not counting the stalls wasting time
<geist> clever: right, thats the idea
<clever> or only one core in the pair limits itself to fpu/vector ops
<geist> OTOH that'd use more space than a SMT design, so you could potentially cram two SMTs in the same space, etc etc
<immibis> it appears yes although i hadn't heard of it until today. The FPU units are shared. Though Wikipedia's diagram also only shows one decoder, which is odd...
<clever> while SMT will have the 2nd core suffer far more, if your compiler was smart enough to avoid stalls
<geist> it's all about what tradeoffs you get. in the bulldozer case they seemed to bet that fpu/vector performance wasn't as important and they could save space by sharing one between cores
<immibis> which makes sense unless you are doing machine learning. Most code is boring. But, maybe most code that cares about performance is not so boring.
<geist> immibis: it got complicated. there were 4 cores in the family 15h: bulldozer, piledriver, steamroller, excavator
<geist> in the later revisions they split the decoders out, etc
<geist> by the end of it (excavator) it was fairly decent, but still outclassed by equivalent intel cores
<immibis> i guess you only really need one core-equivalent to run windows and word. Extra cores are for gaming and machine learning. And you'd better be able to use them for that
<geist> it was okay, just not good enough
<clever> immibis: a lot of the games i play tend to be single-threaded :(
<geist> it was a case where the whole family 15h was not a complete dumpster fire, just middling performance
<mjg> your web browser makes up for it
<geist> and thus no reason to get it
<geist> and/or they sold it as bargain bin low end, with no margins
<geist> by the end of the line they weren't even making new desktop cpus, since no one would buy it. excavator was a reasonably good design but i think it only made it to laptops, since that was the only market AMD could sell it into (at cheap prices)
<clever> > The longer pipeline allowed the Bulldozer family of processors to achieve a much higher clock frequency compared to its K10 predecessors. While this increased frequencies and throughput, the longer pipeline also increased latencies and increased branch misprediction penalties.
<geist> anyway. Zen completely destroyed it, since they went back to the drawing board and made a much more proper (re: more like intel) core design
ccx has joined #osdev
<geist> kinda recommend, there's a good interview with Jim Keller on anandtech. doesn't talk about a lot of tech details, but he was the guy they brought in to fix AMD's problem
<geist> seems like a real smart guy and a straight shooter, so to speak
YuutaW has joined #osdev
<geist> also side note he straight up talks about there being two front ends in development for what became the zen core: K12 (arm decoder) and the x86 decoder
ZetItUp has quit [Read error: Connection reset by peer]
<geist> presumably the K12 project is parked, but i've always had the strong suspicion that the zen backend has a lot of ARMisms in it because it was designed to also support arm front end
<clever> would the frontend be set in stone when fabbed, or is there any chance of context switching it at runtime?
<geist> you could, though it'd be complicated
<geist> to a certain extent apple doing what they do with the M1 (having a strongly ordered mode) is probably the right way to go about it
<geist> make it so that the cpu runs more or less the same 'way' memory order wise, then you can do a fairly straightforward binary translation between the two fully capable ISAs
<clever> from what ive read, the M1 is just an arm frontend with the memory ordering mode being toggleable
<geist> right, but that memory order thing is a Big Deal
<clever> so you still have to translate the x86->arm, in the raw binary, and fixup the addressing
<geist> since aside from ISA the two architectures do approach memory order completely differently
<clever> having 2 frontends, would eliminate the need for the translation step
<geist> sure, but translating between two ISAs like that is a solved problem
<geist> sure, but like all things it's not free, so you can make it a software problem, which can get better over time (ie, can be upgraded in the field) and you can even cache the translations, etc
<geist> plus in their case the idea is to eventually not run x86 anymore, so there's little point investing in hardware to do it
<clever> yeah
<clever> the dalvik stuff on android is doing similar
<clever> at one time, it was interpreted bytecode, with only an install-time linker patching
<geist> itanium, for example, had an x86 decoder built in (could run it in x86 compatibility mode) but it ran so terribly it was hard to use
<clever> but now its using llvm to translate it into native at install time
<geist> another one of those failures of the itanium design. they probably should have gone in with a good SW translator instead of worrying about wasting hardware on it
<geist> though it was also designed to be dropped in later itaniums, which it was IIRC
<clever> i'm also reminded of the BMOW1 and its micro-code flash chip
<clever> basically, a 4 bit micro-code PC, the 8bit opcode latch, and a condition var, are used as address lines into the micro-code flash chip
<immibis> seems about right. That's just a slightly upgraded PLA
<clever> and the raw data lines out, control all of the latches/buffers in the cpu, to route data between registers/alu
<immibis> A ROM is really just a fully decoded PLA and NOR flash is a reprogrammable ROM
<clever> but, if you just have an opcode set register as a few more addr bits
<clever> you could context switch to an entirely different microcode table, at runtime
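The addressing scheme clever describes (micro-code PC, opcode latch and condition flag wired up as ROM address lines, plus a few extra bits from an "opcode set register") can be sketched roughly as below. The bit order, the 2-bit width of the set register and all names are assumptions for illustration, not the BMOW1's actual wiring.

```python
# Hypothetical field widths; only the 4-bit micro-PC and 8-bit opcode come from the chat.
STEP_BITS = 4     # micro-code PC: up to 16 steps per opcode
OPCODE_BITS = 8   # opcode latch
COND_BITS = 1     # single condition flag
ISA_BITS = 2      # assumed "opcode set register": selects one of 4 microcode tables


def microcode_address(isa: int, cond: int, opcode: int, step: int) -> int:
    """Concatenate the fields into one ROM address: [isa | cond | opcode | step]."""
    assert 0 <= isa < (1 << ISA_BITS)
    assert 0 <= cond < (1 << COND_BITS)
    assert 0 <= opcode < (1 << OPCODE_BITS)
    assert 0 <= step < (1 << STEP_BITS)
    addr = isa
    addr = (addr << COND_BITS) | cond
    addr = (addr << OPCODE_BITS) | opcode
    addr = (addr << STEP_BITS) | step
    return addr


# Total number of control words the ROM would need with these widths.
ROM_WORDS = 1 << (ISA_BITS + COND_BITS + OPCODE_BITS + STEP_BITS)
```

Changing only `isa` moves every lookup into a different table, which is the "context switch to an entirely different microcode table" idea: the opcode and step wiring stay the same, only the high address bits change.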
<geist> oh side note, replacement PLA for my C64 came in. fixed it right up
<clever> immibis: yeah, ive seen die shots of the 6502, and it has a maskrom for the microcode
<geist> not really. that's not microcode so much
<clever> immibis: the BMOW1 is just a 6502 compatible cpu, made out of raw logic gates and wire-wrap, with a pinch of standard flash/ram added in
<immibis> wait the BMOW is 6502 compatible? heh I didn't realize that
<clever> immibis: i think it was
<bslsk05> ​www.bigmessowires.com: BMOW 1 Computer | Big Mess o' Wires
<geist> the https://gigatron.io/ is also 6502 compatible now with a new rom
<bslsk05> ​gigatron.io: Gigatron – TTL microcomputer
<clever> > The high-level instruction set that’s implemented in this microcode is a close cousin to 6502 assembly language.
<immibis> <clever> but, if you just have an opcode set register as a few more addr bits <- you can also do that with spaghetti logic, or whatever you're using, you could even power off a whole decode unit and start up a new one
<geist> similar idea. simple 8 bit microcode class cpu that is furiously running an emulator for a larger 16 bit cpu, and now an optional 6502 mode
<immibis> not sure if the gigatron really counts, isn't it using software emulation?
<geist> same thing
<geist> that's the point, microcode looks like software at a particular level
<immibis> by that logic my gameboy can be switched to the x86 instruction set
<immibis> just by changing a rom
<geist> sure. also.... remember transmeta?
<immibis> in fact I don't even need soldering because the rom comes on a user-replaceable cartridge
<clever> immibis: for the 6502 in the c64, the maskrom was inside the cpu itself
Mooncairn has left #osdev [Leaving]
<geist> the lines are blurred when you get into deep microcode. in general i think you can tend to call it microcode if it looks a particular way
<geist> ie, if the instruction indexes directly into a rom, which starts a sequence of control logic
<immibis> clever: I believe that kind of "microcode" will be tightly integrated with the CPU. In fact, even the BMOW1's probably is. It can't be reprogrammed arbitrarily, it can only run instruction sets that look enough like the one it was designed for
<immibis> which could be quite wide in the case of the BMOW1
<clever> immibis: i think it has 2 main limits
<clever> 1: the number of raw registers
<clever> 2: it relies on a design where you have 8 bits of opcode all by itself, followed by operands and immediates in the next bytes
<immibis> things like ALU flags will be hard-wired. You want a half-carry flag and you don't have one in hardware? too bad
<immibis> ah yes that too
<clever> 2: the current design is also limited to 16 steps for an opcode
<clever> 3*
<clever> 1 can easily be solved, by just throwing more registers into the design, and having a wider output from the micro-code rom
<geist> immibis: but yeah i think you're right in that the low level microcode on the gigatron is a bit more cpu like. i guess to me the real question is whether or not it runs a code that fetches the next instruction and then looks in a table
<clever> 2 gets tricky.....
<geist> vs if the microcode itself directly dispatches. it's a detail and probably not important
<clever> 3, just add more bits to the micro-code PC counter
<geist> anyway, too many conversations at once
* geist bows out
<immibis> IMO real microcode is designed with hardware integration; direct instruction dispatch is part of that. It's hardware and "software" designed together, not just software running on hardware
<immibis> if it's easy to implement and makes the CPU faster then you do it, even if it limits the microcode you can write
<clever> immibis: in the case of the bmow1, there isnt really a pipeline
<clever> so if an opcode takes 4 microcode steps to run, then it takes 4 clock cycles to run, and cant share a cycle with anything
<immibis> didn't say anything about pipelining
GeDaMo has quit [Quit: Leaving.]
<geist> immibis: you're right though, the gigatron's microcode is more of an interpreter: https://github.com/kervinck/gigatron-rom/blob/master/Core/dev.asm.py#L1458
<bslsk05> ​github.com: gigatron-rom/dev.asm.py at master · kervinck/gigatron-rom · GitHub
<geist> that seems to be the core loop for the vcpu
<clever> that sounds a lot more powerful, at the cost of spending more cycles to do a given task
<geist> it basically runs as many instructions as it can before a vblank interrupt comes along, in which case it runs the logic to bit bang the video output, and back to interpreting instructions
<clever> is the bit-banging written in the micro ops or the interpreted ops?
<geist> micro ops. what you're looking at there is the micro op assembler
<geist> they did something clever that i wouldn't have thought of: instead of writing an assembler, they simply defined all of the asm instructions as a bunch of python functions
<geist> and then implement all of their assembly as python itself
<clever> ah, so they are cheating a bit, and having the high performance (bit-banging video) stuff skip the interpreter, and run directly on the raw microops
<geist> when you run this .py it spits out the rom
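A toy version of the "instructions are Python functions, running the script emits the ROM" approach might look like this. The mnemonics and byte encodings are invented for illustration and are not the Gigatron's real instruction set or the layout of its dev.asm.py.

```python
# Toy "assembler as Python functions". Opcodes 0x01..0x03 are made up.
rom = bytearray()
labels = {}


def label(name):
    """Record the current ROM offset under a name."""
    labels[name] = len(rom)


def ld(value):
    rom.extend([0x01, value & 0xFF])


def add(value):
    rom.extend([0x02, value & 0xFF])


def jmp(name):
    # Only backward references work in this sketch: the label must already exist.
    rom.extend([0x03, labels[name]])


# The "assembly program" is ordinary Python, so loops and helpers act as macros.
label("loop")
ld(1)
for n in range(3):
    add(n)
jmp("loop")

print(rom.hex())
```

The appeal is that the host language's loops, functions and arithmetic come for free as a macro system, at the cost of forward references needing extra handling.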
<geist> yah the whole point is it's just TTL and there's no dedicated sound or video hardware
<geist> the TTL itself context switches between interpreting instructions and bit banging hardware
<clever> the original VPU assembler was just C macros, compiling and running it would generate a rom image
<geist> actually not too unlike the xerox alto
<clever> same basic idea
<geist> xerox alto did the same thing: had like 16 levels of microcode, hardware switched, and the different tasks did various hardware keeping
<geist> and then the lowest priority task just interpreted data general nova instructions which is what application code ran on
<bslsk05> ​github.com: videocoreiv/dumpbootrom.s at master · hermanhermitage/videocoreiv · GitHub
<clever> an example of that macro based assembly code
silverwhitefish has quit [Quit: One for all, all for One (2 Corinthians 5)]
Skyz has quit [Quit: Client closed]
dormito has quit [Ping timeout: 255 seconds]
<sortie> This 2017 Unix socket recvmsg/sendmsg code of mine is in dire need of comments explaining wtf is going on :)
<sortie> It's tricky because the control data also goes in the actual buffer so I need to maintain headers describing the layout and there are reference pointers (file descriptors) in there too (that can cause reference counting cycles!)
<immibis> anyone here looked at capability-based operating systems deriving from GNOSIS?
<immibis> KeyKOS, EROS, CapROS
<immibis> is L4 similar?
PapaFrog has quit [Ping timeout: 258 seconds]
<immibis> these are nanokernels where the kernel cannot even allocate memory; all kernel state is held in pages given to it by userspace and persisted alongside userspace data
<moon-child> l4 doesn't allocate because dynamic memory allocation is hard to verify :^)
<immibis> presumably it does something similar then, stores data about capabilities in pages accounted to userspace
<moon-child> the main thing that's interesting about keykos (imo) is persistence, which l4 doesn't have
<immibis> wikipedia lists fuchsia in the same category :)
<immibis> but fuchsia doesn't look like it fits into this class, it's just also a microkernel
<immibis> persistence is also interesting. Apparently they demoed it by ripping the power cord out of a running computer, then plugging it back in
<immibis> i'm not convinced a single-level store is efficient, but it certainly is interesting for that reason
<moon-child> the cpu does pretty well at synchronizing a virtually single-level store across 4-5 different levels of actual storage
<moon-child> (l1,l2,l3,ram, maybe virt. registers if your cpu is fancy enough)
<bslsk05> ​twitter: <cmuratori> This is a great diagram from Anandtech ( <anandtech.com/show/16805/amd… https://t.co/c2j2THqgcW> ). It uses color to show the relative cost of communicating between any two cores of a 64-core Threadripper. The physical layout of chips is becoming increasingly important to performance-oriented programming! https://pbs.twimg.com/media/E6RtVi1VkAIS3iS.jpg
<immibis> and yet we have all these tricks to try and trick it into being efficient. I suppose it's a standard flexibility/efficiency tradeoff. If you had to allocate cache lines, either nobody would ever bother, or you'd run out of cache lines sometimes and slow everything else down
<immibis> increasingly important? I thought NUMA was already important
<moon-child> perf beyond that can be taken care of with hints (the cpu has hints too--prefetch and such), and the programmertime/computertime tradeoff applies
<graphitemaster> Hate to say it but realtime requirements of video games are soon going to be looking into actual physical distance between CPU cores
<immibis> doubts on "soon"
<moon-child> immibis: allocating cache lines is workload-sensitive. If you did that manually, (adjusted cache allocations in response to workload) you'd basically be duplicating work in every application/lib, and get a negligible perf benefit
<immibis> this is latency, right? gamedev is moving in the direction of storing and processing big streams of data
<immibis> I think
<immibis> with minimal interactions between the different streams
<immibis> each time you join two streams you have to pay an inter-core latency, but... that's 0.1us per join and you have 16000us and not so many joins
<moon-child> monitors are getting faster, and kernels are as greedy for throughput as they've ever been. You gotta keep up with the monitor and not drop frames cause the scheduler decided to jitter you a little
<immibis> hopefully
koolazer has joined #osdev
<immibis> rather than caring about physical distance they might start caring about keeping data on the same node that processes it.
silverwhitefish has joined #osdev
<moon-child> yah, I was thinking about ways to model video games as actors
<immibis> i can see physical distance mattering if they want to parallelize further - then they want to run 8 parallel threads on the same dataset on the 8 tightly-coupled CPUs if they have some auxiliary data structure they all share
<moon-child> then you basically do graph partitioning to try to put actors that like to talk with each other on the same cores
<moon-child> similar space for numa opts
<immibis> i don't think that's new. Well it's relatively new, but you're not coming up with it right now, and they call them "systems" rather than "actors"
dormito has joined #osdev
<immibis> don't have a good reference other than "something i saw on a gdc presentation on youtube once"
<moon-child> systems as in ec/ecs?
<immibis> yes. But ECS is a vague term with many specific variants
<immibis> and many unrelated ideas some of which are incompatible with each other
<moon-child> concept is very different from that
<immibis> but one of the ideas is decomposing your game loop into transformations on arrays of components
<moon-child> I later found out that carmack had experimented with largely the same thing 10 years ago. Impossible to beat him :P. https://www.youtube.com/watch?v=1PhArSujR_A 16:25, I think
<bslsk05> ​'John Carmack's keynote at Quakecon 2013 part 4' by Kostiantyn Kostin (00:29:59)
<immibis> the trivial canonical example being `position += velocity * timestep;` -> `all_positions += all_velocities * timestep;` -> run this on the core that has all_positions and all_velocities in local memory, while another core culls bounding boxes or something
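As a minimal sketch of that transformation, the per-entity update becomes one pass over two parallel arrays. The function name and the plain-list representation are assumptions; a real engine would more likely use contiguous typed arrays and pin the work to a core.

```python
def integrate(all_positions, all_velocities, timestep):
    """Apply position += velocity * timestep to every entity, in place."""
    for i, velocity in enumerate(all_velocities):
        all_positions[i] += velocity * timestep


positions = [0.0, 10.0, -5.0]
velocities = [1.0, -2.0, 0.5]
integrate(positions, velocities, 0.5)
```

Because the step only touches these two arrays, it is easy to see which data it needs and to run an unrelated step (culling, say) on another core at the same time.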
<moon-child> everything is immutable, frame data is double buffered. Very good for multicore because you don't have any contention on your writes
<moon-child> and things like interpolation, time travel (cf multiplayer/rollback, or braid-style as a game mechanic) become much easier as an added bonus
<immibis> immutability is not necessary, but clear data flow is. Allocation and garbage collection when you know the old thing isn't used any more is a waste of time
ahalaney has quit [Remote host closed the connection]
<immibis> if you prefer, think of it as compile-time garbage collection. "This isn't used any more but we want a new buffer of the exact same size, so overwrite it"
asymptotically has quit [Quit: Leaving]
superleaf1995 has joined #osdev
<moon-child> call it sophisticated manual memory management (where malloc/free, naive manual memory management, is rarely of use)
<moon-child> immutability is the only mechanism I know of for guaranteeing clear dataflow at the architectural level
<immibis> proper prior planning prevents piss poor performance
<immibis> you update the thing. now every bit of code that accesses the thing is accessing the new thing
<moon-child> okayyyyy, but I want to evolve my designs...
<immibis> then you'd better evolve the plan
<moon-child> I can't redesign my entire application every time I think of a new feature I want to add
<immibis> but you can update your dataflow graph
<immibis> are we still talking about game loops?
<moon-child> I was about to say, I think we're talking at cross purposes :P
<moon-child> I was talking about game loops. I think I'm not quite sure what point you're making, though
<immibis> you can certainly write games in haskell, and most games have stuff at the periphery that can benefit from it, but what do you get from making the core game loop functional?
<immibis> you mentioned past states for interpolation
<immibis> but you can design that as a ping-pong or circular buffer for example. You don't need to involve the garbage collector.
<immibis> if different entities interpolate from different time points it may destroy the cache, but i'm not sure how you would fix that or how immutability would help it
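The ping-pong idea can be sketched as two preallocated buffers that swap roles each frame, so the previous frame stays available for interpolation without any allocation of frame state. The class and method names here are hypothetical.

```python
class PingPong:
    """Two fixed-size buffers: `front` is read this frame, `back` is overwritten."""

    def __init__(self, size):
        self._bufs = [[0.0] * size, [0.0] * size]
        self._front = 0

    @property
    def front(self):
        return self._bufs[self._front]

    @property
    def back(self):
        return self._bufs[1 - self._front]

    def swap(self):
        # After the swap, `back` holds the previous frame's values.
        self._front = 1 - self._front


def interpolate(previous, current, alpha):
    """Blend two frames; alpha=0 gives previous, alpha=1 gives current."""
    return [p + (c - p) * alpha for p, c in zip(previous, current)]


state = PingPong(2)
state.front[:] = [0.0, 10.0]
for i, x in enumerate(state.front):
    state.back[i] = x + 2.0          # write the next frame into the other buffer
state.swap()
blend = interpolate(state.back, state.front, 0.25)
```

Only the blended output list is allocated here; the frame buffers themselves are reused forever, which is the "overwrite the buffer you know is dead" point made earlier.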
<moon-child> ok, then let me explain from scratch, because I don't think you need to involve garbage collection
<moon-child> proposed model is that 'game state' (call it S) is a collection of entities, each of which can be transformed, updating them from one frame to the next
<moon-child> every entity has r/w access to its own state as of frame n+1, and ro access to the state of every other entity as of frame n
<moon-child> doing things this way lets you update all the entities in parallel and gets you (imo) an architecture which enforces separation of concern
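A rough sketch of that model: each entity computes its frame n+1 state from a read-only view of frame n, so all entities can be stepped independently. The update rule and the dict-based entity layout are made up for illustration, and Python threads are used only to show the structure, not to demonstrate real parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def step_entity(index, prev):
    """Return entity `index` for frame n+1, reading only frame n state."""
    me = prev[index]
    # Made-up rule: drift halfway toward the average x of all entities.
    average = sum(e["x"] for e in prev) / len(prev)
    return {"x": me["x"] + 0.5 * (average - me["x"])}


def tick(prev):
    """Build frame n+1 from frame n; nothing in `prev` is written."""
    prev = tuple(prev)  # by convention read-only for the duration of the tick
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda i: step_entity(i, prev), range(len(prev))))


frame_n = [{"x": 0.0}, {"x": 4.0}, {"x": 8.0}]
frame_n1 = tick(frame_n)
```

Since every write goes to a fresh slot of the next frame and every read goes to the old frame, there is no write contention and the order in which entities are stepped does not matter.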
nur has quit [Remote host closed the connection]
<moon-child> you don't need garbage collection in a general sense; you do need to deal with entity creation/destruction/reference somehow, but that's a general problem which gc doesn't even help with
<immibis> I am not thinking of "entities" having access to anything. Rather processing steps, or systems
<immibis> (I have tried writing a game in this style and it is practical for some systems, impractical for others. DoorSystem, really??)
<immibis> (I hear that some games have a catch-all "scripting system" for little events like "click mouse to open door")
<immibis> (on the other hand, there's a differential equation that updates the world state, and evaluating that as one step on big vectors is much better than trying to evaluate it for every grid square in parallel)
<immibis> (in sequence*)
<moon-child> yeah, physics is something I struggled with coming up with a sensible design for
<moon-child> one thought I had is--currently there's one sync point, which is 'tick', and graphics are centralized. Could add multiple sync points and do centralized physics as well. But that's getting dangerously close to an explicit dependency graph (a la make), which is something I'm trying to stay away from
<immibis> an explicit dependency graph is exactly what you want to maximize parallelism
Skyz has joined #osdev
<immibis> you can do any step as soon as all its input steps have completed
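One way to sketch "run a step as soon as its inputs are done" is with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The step names and dependencies below are hypothetical, and this version groups steps into waves rather than dispatching each one the instant it becomes ready.

```python
from graphlib import TopologicalSorter

# Hypothetical frame steps: each key depends on the steps in its set.
deps = {
    "input": set(),
    "physics": {"input"},
    "ai": {"input"},
    "animation": {"physics", "ai"},
    "render": {"animation"},
}


def schedule(deps):
    """Return waves of steps; everything within one wave could run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all steps whose inputs are complete
        waves.append(ready)
        sorter.done(*ready)                 # mark them finished, unblocking dependents
    return waves


waves = schedule(deps)
```

A real scheduler would call `done()` per step as each worker finishes, which releases dependents earlier than waiting for a whole wave.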
<Skyz> KeyKOS came out when systems research had relevance, interesting
<bslsk05> ​cap-lore.com: U.S. Patent 4,584,639
<immibis> it is odd because you were not online when I mentioned capability-based systems
<bslsk05> ​libera.irclog.whitequark.org: #osdev on 2021-07-14 — irc logs at whitequark.org
<Skyz> Yeah I looked at the logs
<immibis> apparently GNOSIS was designed for a time-sharing system, their equivalent of cloud computing
<immibis> Rather than many identical servers, they wanted to share a smaller number of computers among a large number of users, and software would be written specifically for their system
<immibis> if programs can be ported to ARM because $megacorp said so, they could be ported to KeyKOS because $othermegacorp said so
<Skyz> Not true
<Skyz> I guess it works with M$ now and windows 11
<immibis> programs have already been ported to a Mach because of $megacorp, but it's not entirely fair because they actually use the unix emulation layer
<immibis> on mac is unix built on top of mach or is it a peer with mach?
<immibis> i think i read darwin implements both unix and mach and the mach stuff is mostly vestigial
<Skyz> Mach-o? or Mach?
<Skyz> Because GNU has a project called Mach
<bslsk05> ​developer.apple.com: Mach Overview
<Skyz> I've never actually read apple's documentation
<Skyz> Mach is built on top of unix I believe
<immibis> well in any case the point is you use what $bigmegacorp says or else
<immibis> if you're a $bigmegaos programmer
<immibis> because your customers are using $bigmegaphones
<Skyz> Yeah
<Skyz> They are who pay in the end
<Skyz> So the $megacorp does what they say
<Skyz> But $megacorp can also impose new things
sortie has quit [Quit: Leaving]
Skyz has quit [Quit: Client closed]
<geist> immibis: i see what you did there with $
<immibis> users never paid for ARM hardware. That was entirely the corp's decision. They wouldn't pay for a new kernel, similarly
<immibis> well they're paying for hardware, but they don't care whether it's ARM or not
immibis_ has joined #osdev
Burgundy has quit [Ping timeout: 265 seconds]
immibis has quit [Ping timeout: 268 seconds]