klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books
<klange> And the fact that subsequent boots are immediate suggests it's not a display problem. The only consistent thing here is "write to card, get 3 minute boot before ARM code runs"
<clever> and that sounds like its a stage2 problem not stage1
<clever> so you would instead want to do uart_2ndstage=1 in config.txt on the fat32
<clever> and then watch the uart
<klange> UART the only way to get logs out of that?
<clever> you can run the closed-source vcdbg under linux to query the logs
<klange> That requires a new SD card... >_<
<clever> > There is an alternative implementation of part of vcdbg that you can try: https://git.venev.name/hristo/rpi-vc-log
<bslsk05> ​redirect -> forums.raspberrypi.com: Debugging device tree with yocto - Raspberry Pi Forums
<bslsk05> ​git.venev.name: rpi-vc-log - Unnamed repository; edit this file 'description' to name the repository.
<clever> this also claims to be an open source log dumper
<clever> if you could port that to your os, and give it the ability to read /dev/mem, it may work
<clever> oh, and on the subject of needing a 2nd card, there is a trick that is poorly documented
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
<klange> i wonder if usb boot has this problem, maybe i can just shove this image on a regular USB stick...
<clever> ~3 ways to trigger it
<clever> from linux, you can run `reboot 42` to set the boot partition
<clever> on startup, the firmware will look at the 42nd fat32 partition for all firmware
<clever> (mbr extended partitions)
<clever> using that, you can have multiple OS's, each with its own fat32 of firmware+kernel
<clever> and hot-swap between them
<mrvn> I should implement snake.
<clever> plan b: create an autoboot.txt on the 1st fat32 partition, set boot_partition=42 in there, and you can use the officially documented gpio conditionals like [gpio12=0] to change your mind
<clever> plan c: (for your os), you encode the partition# into the RSTS register (i can give docs) before you do a watchdog reset
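A rough sketch of what "plan c" could look like in C. The bit layout here (partition number spread over bits 0, 2, 4, 6, 8, 10 of RSTS, plus the 0x5a password byte) is an assumption recalled from the Linux bcm2835 watchdog driver, not something stated in this log, so check it against the docs clever offers before relying on it. The function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed constants (from memory of the Linux bcm2835 watchdog driver). */
#define PM_PASSWORD            0x5a000000u
#define PM_RSTS_PARTITION_CLR  0xfffffaaau  /* clears bits 0,2,4,6,8,10 */

/* Spread a 6-bit partition number across every second bit of RSTS. */
uint32_t rsts_encode_partition(uint32_t partition)
{
    uint32_t bits = 0;
    for (unsigned i = 0; i < 6; i++) {
        if (partition & (1u << i))
            bits |= 1u << (2 * i);
    }
    return bits;
}

/* Compute the value to write back to RSTS, given what was read from it. */
uint32_t rsts_with_partition(uint32_t old_rsts, uint32_t partition)
{
    return PM_PASSWORD
         | (old_rsts & PM_RSTS_PARTITION_CLR)
         | rsts_encode_partition(partition);
}
```

The caller would read RSTS, write the result of rsts_with_partition() back, and then trigger the watchdog reset; those MMIO accesses are left out because the register addresses are not given here.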
masoudd has quit [Remote host closed the connection]
<klange> I have a long weekend ahead and my main goal is to get PCIe initialized and try to start talking to the xhci.
<clever> oh, there are also 3 things you can do to cheat here
<clever> 1: only put firmware+kernel on usb, but keep rootfs on SD so you dont rely on pci-e just yet
<clever> 2: you can get xhci without pci-e!, the usb-c port is an xhci controller
<mrvn> Plan for next weekend: understand the relocations in ARM ELF32 and ELF64.
<clever> 3: put firmware+kernel on tftp, as you mentioned earlier
<klange> I specifically need the controller the integrated keyboard is attached to :)
<clever> ah right
<clever> thats on the pci-e vl805 one
<mrvn> clever: is that a Pi400?
<clever> yeah
<mrvn> Don't have one of those (yet)
<clever> k: and after you get pci-e online and setup the BAR's, you must tell the firmware to re-init the vl805
<clever> because RPF cheaped out and the vl805 firmware chip is missing
<klange> i read about that, they used to include the dedicated EEPROM for it, but in newer board revisions of the 4 they dropped it and now the VC needs to load the firmware, presumably from its own EEPROM?
<clever> yep
<klange> it's just a mailbox command after setting up PCIe and initializing a BAR for the xhci controller?
<clever> yep
<clever> there is a magic addr and data reg within the pci-e config space
<clever> if you put an addr into the addr reg, then the data reg acts as a proxy'd window to that addr in the vl805 addr space
<clever> the VC firmware uses that to side-load a vl805 bootloader, and poke debug registers to force a jump to it
<clever> and then the vl805 will dma its own firmware from the host ram, also arranged by the VC firmware
<clever> but it also doesnt have enough ram to hold its own firmware!
<clever> so it cant boot fully, until you configure the standard xhci scratch-space
<clever> it will then use that to steal host ram, and keep its firmware/state
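The addr/data register pair described above is an ordinary indirect-access window. A minimal sketch in C, assuming the PCIe driver supplies 32-bit config-space read/write callbacks. The register offsets and the tiny fake device are placeholders so the sketch can run on a host; none of them are real vl805 values.

```c
#include <stdint.h>

/* Hypothetical config-space accessors supplied by the PCIe driver. */
typedef uint32_t (*cfg_read32_fn)(void *ctx, uint16_t offset);
typedef void (*cfg_write32_fn)(void *ctx, uint16_t offset, uint32_t value);

struct vl805_window {
    void *ctx;
    cfg_read32_fn read32;
    cfg_write32_fn write32;
    uint16_t addr_reg;  /* placeholder offset of the address register */
    uint16_t data_reg;  /* placeholder offset of the data register */
};

/* Select an address in the device's own space, then read through the window. */
uint32_t vl805_peek(const struct vl805_window *w, uint32_t addr)
{
    w->write32(w->ctx, w->addr_reg, addr);
    return w->read32(w->ctx, w->data_reg);
}

/* Select an address in the device's own space, then write through the window. */
void vl805_poke(const struct vl805_window *w, uint32_t addr, uint32_t value)
{
    w->write32(w->ctx, w->addr_reg, addr);
    w->write32(w->ctx, w->data_reg, value);
}

/* Fake device for host-side experiments: 16 words behind the window. */
#define FAKE_ADDR_REG 0x0
#define FAKE_DATA_REG 0x4
struct fake_vl805 { uint32_t addr; uint32_t mem[16]; };

uint32_t fake_read32(void *ctx, uint16_t offset)
{
    struct fake_vl805 *d = ctx;
    return offset == FAKE_DATA_REG ? d->mem[d->addr % 16] : d->addr;
}

void fake_write32(void *ctx, uint16_t offset, uint32_t value)
{
    struct fake_vl805 *d = ctx;
    if (offset == FAKE_ADDR_REG)
        d->addr = value;
    else if (offset == FAKE_DATA_REG)
        d->mem[d->addr % 16] = value;
}
```

Passing the accessors in as callbacks keeps the window logic independent of how config space is actually reached on a given board.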
<klange> if you have any docs for this that aren't "here's linux sources" I'd love to read them :D
<clever> only what ive deduced from decompiling things: https://github.com/librerpi/rpi-open-firmware/blob/master/docs/vl805.txt
<bslsk05> ​github.com: rpi-open-firmware/vl805.txt at master · librerpi/rpi-open-firmware · GitHub
<klange> this is already going to be enough fun as it's my first time seriously poking at an xhci controller / usb in general, so getting all the rpi/bcm unfun out of the way first would be most desirable...
<clever> there is a second xhci controller you can play with, that has no firmware to deal with
<clever> if you set otg_mode=1 in config.txt, the broadcom xhci will be routed to the usb-c port, and added to your device-tree
<clever> no pci-e, no firmware, just plain xhci
<clever> but its also not able to reach the keyboard, so only external devices
<klange> that might be worth looking at in the future, but as mentioned earlier, unfortunately my end goal right now is "talk to mushy membrane keyboard" :(
<clever> and the pi400 can only be powered via usb-c (no gpio tricks)
<clever> yeah
<clever> that membrane keyboard, is on a holtek MCU
<clever> which has its firmware in some write-once flash
<clever> it deals with converting the matrix into usb, but also deals with the power button logic
<clever> via unknown means, the holtek can run when "off", and can turn the power to the system back on when you hit f10
<klange> ah, I was going to ask if there was a way I could possibly get that signal without USB
<clever> and if you hold fn+f10 for just the right amount of time, it will signal a gpio on the soc, telling linux to shutdown
<klange> docs say 2 seconds
<clever> if you hold it for too short of a time, nothing happens
<clever> if you hold it too long, the firmware forces a hard power cut
<klange> which is probably too long for a crappy one-bit emergency input device...
<clever> also, the clean shutdown, only happens if you release the key
<clever> so you have no way of knowing if you held it long enough, until your no longer holding it
<clever> one sec
<clever> # evtest --grab /dev/input/event0
<clever> with this under linux, i can see that fn+f10 just sends the standard KEY_POWER code
<clever> and --grab means systemd cant react to it
<clever> but, if i fn and tap f10, nada
<clever> but if i fn+hold f10, UPON RELEASE, i get the key-down and key-up events!
<klange> Hm, so, I'm holding this thing down for quite a while and nothing happens, so does that 'firmware forces a hard power cut' maybe need some message from the OS to enable?
<clever> it only works once start4.elf has booted
<clever> so if your at the hdmi diagnostics, your screwed
<clever> just pull the cord
<klange> I'm booted and watching a Doom demo
<clever> ive also seen some reports, that the first press of fn+f10 is always ignored
<clever> only the 2nd press works
<clever> and onward
<klange> hopefully my keyboard isn't broken ;) it was working the last time I was running a Linux on it
<bslsk05> ​al.zerostem.io: Xray
<clever> this also has hi-res photos of some rpi products
<clever> including the official keyboard, which uses the same MCU
Burgundy has quit [Ping timeout: 272 seconds]
<klange> long pi
<clever> at least remove the uSD card before you remove the board
<clever> thats how cards get beheaded :P
<klange> ha; old photo anyway, from when it arrived
<clever> also, notice the empty pads for ethernet leds on the left end?
<klange> rightmost side of the board near the ETHERNET silkscreen?
<clever> yeah
<klange> can I populate those to get my blinkies?
<clever> possibly
<clever> the current limiting resistors are also missing
<clever> just probe those pads with a voltmeter, and you can probably figure it out
<klange> I am disastrously unfamiliar with these things, despite having been an early buyer of the original Model B a decade ago.
<clever> http://ledcalc.com/ is the key
<bslsk05> ​ledcalc.com: Current limiting Resistor calculator for leds
<clever> first, look at the pads, and figure out which one is ground, put the black lead there
<clever> then use the red lead of the voltmeter, to confirm which pad the voltage comes from for each led
<clever> and what voltage it is
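Once the pad voltage is known, the calculator linked above boils down to Ohm's law for a series resistor: R = (Vsupply - Vforward) / I. A small sketch in C using millivolts and milliamps so the result comes out in ohms. The function name is made up, and any concrete numbers fed to it are illustrative rather than measured values for this board.

```c
#include <stdint.h>

/*
 * Series resistor for an LED: R = (Vsupply - Vforward) / I.
 * Inputs are in millivolts and milliamps, so the quotient is in ohms.
 * Returns 0 if the supply cannot forward-bias the LED or the current is 0.
 */
uint32_t led_resistor_ohms(uint32_t supply_mv, uint32_t forward_mv,
                           uint32_t current_ma)
{
    if (current_ma == 0 || supply_mv <= forward_mv)
        return 0;
    /* Round up so the LED current never exceeds the requested value. */
    return (supply_mv - forward_mv + current_ma - 1) / current_ma;
}
```

In practice the result gets rounded up again to the next standard resistor value.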
<klange> I mean RPis in general, and I'm not that interested in blinken lights for the Ethernet.
<klange> Need to write a damn genet driver first anyway...
<clever> heh, yeah
<klange> the bsd one is only 1500 lines, can't be that bad?
<clever> and i think the pi4 series has at least 3 RGMII PHY's to deal with
<clever> genet isnt on the pci or usb bus
<clever> so genet may be simpler than pci-e
<clever> do you happen to have dwc drivers?
<clever> from past pi models
<klange> I have never targeted a Pi before.
<clever> ah
<klange> as I said, disastrously unfamiliar with these things :)
<clever> i'm dangerously familiar with the entire rpi line, lol
<clever> i know things about the hw that even the rpi engineers arent familiar with
<klange> things i imagine they might be contractually obligated to not try to know
<clever> 6by9 has clearly answered some questions with "i choose not to answer" i think it was
<clever> but multiple users claimed a feature doesnt exist, yet i had mountains of evidence it did
<clever> and i later found the disable flag, in the headers RPF released back in the pi1 days :P
<clever> multiple engineers*
<mrvn> might be stuff that only works sometimes
<clever> mrvn: it was a security flag for the 2d video output hw
<clever> its been set to non-secure mode since the very first pi firmware ever released
<clever> and the linux drivers only work when set to non-secure
<klange> i have so many things on my todo list
<bslsk05> ​twitter: <ToAruOS> That would look a lot better if I had known I needed to ask for the full CPU speed! Despite being an early buyer of the original Model B, I’m not exactly experienced with the RPi’s. https://video.twimg.com/ext_tw_video/1493900031895941120/pu/vid/1280x720/5XgZqHPkGzmBQPDN.mp4?tag=12
<clever> i assume your not even using the 3d core yet?
<klange> nope
<clever> ive got docs and example code for the vc4 era 3d core
<clever> the bcm2711 3d core has under-documented differences though, but it does have mesa support
<bslsk05> ​github.com: lk-overlay/v3d.c at master · librerpi/lk-overlay · GitHub
<clever> when in 2d mode, you must create an array of vertex data
<klange> I have yet to support any 3D acceleration, or even 2D acceleration, anywhere, and it'll be rather ironic if the VC is the first thing I support
<bslsk05> ​github.com: lk-overlay/v3d.c at master · librerpi/lk-overlay · GitHub
<clever> and then you create an array of indexes into that vertex data, each set of 3 forms a triangle
<clever> the hw then figures out every pixel in the triangle, and runs the shader to compute each pixel's color
<clever> the hw will also interpolate the varyings between the 3 corners, giving a smooth transition from one to the other
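To make the vertex array / index array / varyings description concrete, here is a software sketch in C of what the hardware is described as doing for one pixel: pick three vertices via the index list, test whether the pixel is inside the triangle, and blend the varyings with barycentric weights. This is not the v3d interface; the struct layout and names are invented for illustration, and perspective correction is ignored.

```c
#include <stdint.h>

struct vertex {
    float x, y;        /* screen position */
    float varying[3];  /* e.g. r, g, b */
};

/* Twice the signed area of triangle (a, b, p). */
static float edge(float ax, float ay, float bx, float by, float px, float py)
{
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

/*
 * Interpolate the varyings of triangle number `tri` at pixel (px, py).
 * Each group of 3 entries in `indices` selects one triangle.
 * Returns 1 and fills out[] if the pixel is inside, 0 otherwise.
 */
int shade_pixel(const struct vertex *verts, const uint16_t *indices,
                unsigned tri, float px, float py, float out[3])
{
    const struct vertex *a = &verts[indices[tri * 3 + 0]];
    const struct vertex *b = &verts[indices[tri * 3 + 1]];
    const struct vertex *c = &verts[indices[tri * 3 + 2]];

    float area = edge(a->x, a->y, b->x, b->y, c->x, c->y);
    if (area == 0.0f)
        return 0;  /* degenerate triangle */

    /* Barycentric weights: each is 1 at its own corner, 0 on the far edge. */
    float wa = edge(b->x, b->y, c->x, c->y, px, py) / area;
    float wb = edge(c->x, c->y, a->x, a->y, px, py) / area;
    float wc = edge(a->x, a->y, b->x, b->y, px, py) / area;
    if (wa < 0.0f || wb < 0.0f || wc < 0.0f)
        return 0;  /* pixel is outside the triangle */

    for (int i = 0; i < 3; i++)
        out[i] = wa * a->varying[i] + wb * b->varying[i] + wc * c->varying[i];
    return 1;
}
```

With r/g/b as the varyings this gives the smooth corner-to-corner colour transition mentioned above.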
<mrvn> I'm waiting for a libvulkan for the VC.
<clever> mrvn: what does vulkan really do differently?
<mrvn> clever: have a nicer API (as horrible as it is) than the VC direct
<clever> got a pro/con list for each?
<mrvn> vulkan vs. vc?
radens has joined #osdev
<clever> vulkan vs mesa
<clever> or opengl?
<clever> you rarely use v3d directly
<mrvn> it's all chinese to the uninitiated
<radens> Say I have a passthrough hypervisor which I'm installing a gdb stub in. How do I pull a sibling core out of VM mode so I can poke its registers? Right now I give the guest complete access to the APIC etc.
<clever> radens: either the hypervisor has to own the irq controller and emulate one to the guest, or the irq controller needs to support VM's and route some irq to the hypervisor and others to the current guest
<mrvn> clever: vulkan is the next shiny toy. everybody has to use it
pretty_dumm_guy has quit [Quit: WeeChat 3.4]
<radens> clever: yeah that's what I was thinking but I haven't read much about apic virtualization.
<mrvn> clever: does the mesa for RPi let you set shaders?
<clever> mrvn: i believe it does
<mrvn> clever: do you have a shader compiler for the VC or do you have to write them by hand?
<clever> it compiles standard GLSL to QPU for you
<clever> the user supplies a vertex and fragment shader
<clever> but mesa then turns the vertex shader into both a coordinate and vertex shader
<clever> the coordinate shader takes a list of attributes for each vertex, and turns it into xy coords on a 2d screen, for binning
<clever> the vertex shader takes a list of attributes, turns it into xy again, but also produces a list of varyings
<clever> that then forms polygons (usually triangles)
<clever> the hw will then run the fragment shader on each pixel in the poly, and interpolate the varyings for you
<bslsk05> ​'2d and 3d demo' by michael bishop (00:00:21)
<clever> in this demo, the varyings directly set the r/g/b channels
<clever> but in a more traditional setup, the varyings supply the xy coordinate of a texture, and the fragment shader will query that pixel in the texture
<gog> https://gist.github.com/adachristine/f4eeb37dc1bdf9d4aaa584e6aa603580 am i doing an atomic increment wrong? i just do "count++" on it and this is the code that it makes
<bslsk05> ​gist.github.com: gist:f4eeb37dc1bdf9d4aaa584e6aa603580 · GitHub
<gog> but it never seems to get incremented
<gog> should it be volatile?
<mrvn> where is the source?
<moon-child> shouldn't need to be volatile
<moon-child> debugger might help
<mrvn> 5347: f0 48 0f c1 05 f0 25 lock xadd %rax,0x25f0(%rip) # 7940 <count>
<mrvn> it's "volatile"
<gog> yeah i'm stepping it rn
<mrvn> gog: are you incrementing it in the IRQ handler and reading it in normal code?
<moon-child> ohh yea that's probably it
<gog> yes
<moon-child> slap an atomic_load in the normal code
<moon-child> see if it changes
<gog> aha ok
<clever> gog: the normal code doesnt know your modifying the var, and cached the load
<mrvn> yeah, then it's a case for volatile. Except volatile is deprecated.
<clever> volatile is one solution, to forcing it into loading every time
<gog> i figured i didn't know the semantics
<moon-child> also, you can explicitly use a relaxed memory ordering
<moon-child> assuming this is per core
<mrvn> no need for atomic there actually
<moon-child> not appreciably faster but meh
<moon-child> mrvn: you do need it to be atomic, but it can be relaxed
<moon-child> cuz reentrancy
<mrvn> moon-child: not if the IRQ increments it and everyone else only reads
<moon-child> imagine you have a non-atomic rmw, and it gets preempted between the rm and the w
<moon-child> well sure
<mrvn> or, reentrancy would need atomic again
<mrvn> Is anyone using reentrant IRQs?
<moon-child> not for timer maybe but, say, event queues. I want to be able to push events both from an interrupt and from regular code
<gog> well it seems to have done the trick. now to debug the task switch. tomorrow though.
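The fix discussed above (increment in the IRQ handler, atomic_load in the normal code, relaxed ordering) can be sketched with C11 atomics. gog's actual source is not shown in this log, so the variable and function names here are invented.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Counter shared between the timer IRQ and normal code. */
static atomic_uint_fast32_t tick_count;

/* Called from the interrupt handler: an atomic read-modify-write. */
void timer_irq_handler(void)
{
    atomic_fetch_add_explicit(&tick_count, 1, memory_order_relaxed);
}

/*
 * Called from normal code. The explicit atomic load forces a real memory
 * read on every call, so the compiler cannot keep a stale copy around.
 */
uint32_t read_ticks(void)
{
    return (uint32_t)atomic_load_explicit(&tick_count, memory_order_relaxed);
}
```

Relaxed ordering is enough here only because nothing else is published through the counter; if other data had to become visible along with it, release/acquire would be the safer choice.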
gog has quit [Quit: byee]
<mrvn> moon-child: so have the event driver just wake up the driver process.
<mrvn> event interrupt
<mrvn> single reader, single writer. makes the whole code much simpler.
<clever> mrvn: there is a thing most cpu's implement, that make that way simpler than you may even imagine
<clever> basically, a given 32bit int (width varies by cpu), that is written to a size-aligned address, will never be seen partially written
<clever> so the reader will only ever see the old value or the new value, and nothing in between
<mrvn> clever: most drivers need more than 32bit for a request
<clever> a: use a ring-buffer, and use that 32bit pointer to point to the next message
<clever> b: have an "is valid" flag in the message, and the reader polls that on the next message in the array
<mrvn> So then you need a lock for the request structure. And what if the interrupt happens while the request queue is locked? Now you have to schedule in the interrupt, or have critical sections in the driver where IRQs get disabled.
<clever> write the entire structure first
<clever> if you get interrupted, the reader doesnt know it exists, no harm
<clever> once the structure is written, set an "is valid" flag with a 32bit write
<mrvn> clever: I'm talking hardware structures. You can't pick your own format for e.g. the descriptors for the NIC
<clever> boom, the reader will now obey it, and you must never modify that object again
<clever> xhci implements the exact protocol i just described
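A sketch of the "write the whole message, then set the valid flag" scheme for one writer and one reader, in C11. This is a simplified illustration, not the xhci ring format: the slot layout, sizes, and names are made up, and here the reader hands a slot back by clearing the flag.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 8u

struct message {
    uint32_t payload[3];
    atomic_uint valid;   /* written last by the writer, cleared by the reader */
};

struct ring {
    struct message slot[RING_SLOTS];
    unsigned head;       /* touched only by the writer */
    unsigned tail;       /* touched only by the reader */
};

/* Writer side: fill the slot completely, then publish it with one store. */
bool ring_push(struct ring *r, const uint32_t payload[3])
{
    struct message *m = &r->slot[r->head % RING_SLOTS];
    if (atomic_load_explicit(&m->valid, memory_order_acquire))
        return false;    /* ring is full */
    for (int i = 0; i < 3; i++)
        m->payload[i] = payload[i];
    /* Release: the payload writes become visible before the flag does. */
    atomic_store_explicit(&m->valid, 1, memory_order_release);
    r->head++;
    return true;
}

/* Reader side: a slot is ignored until its valid flag is seen. */
bool ring_pop(struct ring *r, uint32_t out[3])
{
    struct message *m = &r->slot[r->tail % RING_SLOTS];
    if (!atomic_load_explicit(&m->valid, memory_order_acquire))
        return false;    /* nothing published yet */
    for (int i = 0; i < 3; i++)
        out[i] = m->payload[i];
    atomic_store_explicit(&m->valid, 0, memory_order_release);
    r->tail++;
    return true;
}
```

If the writer is interrupted halfway through filling a slot, the reader never notices, because the flag is still clear. With two writers this breaks down, which is the objection raised just below.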
<mrvn> clever: that's nice. That still leaves you the problem that you have to try&repeat writing the 32bit pointer.
<clever> there is no try/repeat, you just write it when the message is fully formed
<mrvn> clever: and then the IRQ and normal process overwrite each other
<clever> if you get interrupted, the scheduler will resume you normally, and you will finish the message
<clever> single-writer, the normal and irq should never fight over a given queue
<mrvn> clever: can't have 2 writers without extra checks
<clever> yep
<mrvn> "able to push events both from an interrupt and from regular code" ==> 2 writers
<clever> yeah, you would need 2 queues, and the reader must read both
<clever> or use atomics
<mrvn> best to avoid it unless really necessary
<mrvn> What you can do is have the IRQ write and the process read or vice versa. Or have the process write to an in-memory queue and the IRQ transfers from memory to hardware and back.
<mrvn> But I recommend always having clearly separate jobs for the driver process and IRQ handler.
<clever> for my 2d driver, a normal thread writes to the dlist memory in the hardware, but that on its own has no effect
<clever> that normal thread then stores the index for the start of a list, in normal ram
<mrvn> I'm also a big fan of the automotive factory model of multithreading.
<clever> the vsync irq handler, then copies that index to the "start of display list" register, causing the actual page-flip
<clever> so the normal thread sets up the next frame out of irq, and queues it up via a global var
<clever> and the irq handler is then the reader, and doing the actual page-flip on v-sync
<mrvn> one writer, one reader. easy.
<clever> yep
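The page-flip arrangement described above (thread prepares the display list and leaves its index in a global, vsync IRQ copies it into the hardware register) might look roughly like this in C11. The register is passed in as a plain pointer so the sketch can run on a host; the real register address, the sentinel value, and all names are assumptions.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NO_PENDING_FLIP UINT32_MAX  /* sentinel: nothing queued */

/* Index of the next display list, handed from the thread to the IRQ. */
static atomic_uint_least32_t pending_dlist = NO_PENDING_FLIP;

/*
 * Normal thread: call this only after the display list has been fully
 * written into dlist memory. A later call replaces an unconsumed one.
 */
void queue_flip(uint32_t dlist_index)
{
    atomic_store_explicit(&pending_dlist, dlist_index, memory_order_release);
}

/* Vsync IRQ: the only place that touches the start-of-list register. */
void vsync_irq(volatile uint32_t *dlist_start_reg)
{
    uint32_t next = atomic_exchange_explicit(&pending_dlist, NO_PENDING_FLIP,
                                             memory_order_acquire);
    if (next != NO_PENDING_FLIP)
        *dlist_start_reg = next;  /* the actual page flip */
}
```

Because the expensive work happens before vsync and the IRQ does a single register write, the flip lands at the start of the frame rather than partway into it.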
<mrvn> can the VC do variable refresh rates?
<clever> there is an "idle" interrupt from the video subsystem
<clever> which implies it can generate 1 frame and then go idle
<clever> but i dont know how to configure that, or which outputs support it
<mrvn> There seem to be 2 standards for it and one you have to pay for.
<clever> vc4_regs.h:# define SCALER_DISPCTRLX_ONESHOT BIT(29)
<clever> there is a one-shot mode in the HVS
<clever> vc4_hvs.c: (oneshot ? SCALER_DISPCTRLX_ONESHOT : 0);
<mrvn> does that render a new frame each time you write a new dlist?
<clever> reading the src...
<clever> vc4_hvs_init_channel() accepts a `bool oneshot`
<mrvn> .oO(That's how I would implement it in the hardware)
<clever> and will put the channel into a one-shot mode
<clever> bool oneshot = vc4_crtc->feeds_txp;
<clever> ahh, and thats only used by the transposer
<clever> the transposer is used for rendering to ram
<clever> so one-shot makes sense there
<mrvn> It's rather stupid that we still draw a frame at 60Hz to a TFT that first stores it in memory and then draws it to the cells in its own time.
<clever> so we have no examples of variable refresh rate to actual displays
<clever> a lot of cheaper tft displays, arent capable of buffering anything
<clever> you must supply pixels at the correct refresh rate, or the lcd will decay between refreshes
<mrvn> clever: I've never seen a TFT that couldn't scale the input to fit full screen and that needs buffering.
<clever> if its capable of scaling, then it has a buffer
<clever> and thats not part of the display, thats part of the controller
<clever> i'm thinking low-level stuff like DSI and DPI
<mrvn> I'm talking desktop monitors here. The embedded TFT displays are a different matter.
<mrvn> the RPi TFTs probably don't have a buffer.
<clever> oh, and scaling only needs to buffer 2 scanlines anyways
<clever> you dont have to buffer the full frame
<clever> just store line3 and line4, and interpolate as you generate lines 3.1 3.4 3.6 3.8 4
<clever> and once you cross 4, you received 5, and can repeat
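The two-scanline idea can be sketched as a linear blend between the two buffered input lines, producing each in-between output line from a fractional position. This is a generic illustration in C with 8-bit samples and an 8.8 fixed-point fraction, not the actual scaler hardware.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Produce one output scanline between two buffered input scanlines.
 * frac is the vertical position in 1/256 steps: 0 gives `upper`,
 * 256 gives `lower`, and e.g. 154 is roughly "line 3.6" between 3 and 4.
 */
void blend_scanlines(const uint8_t *upper, const uint8_t *lower,
                     uint8_t *out, size_t width, unsigned frac)
{
    for (size_t x = 0; x < width; x++)
        out[x] = (uint8_t)((upper[x] * (256u - frac) + lower[x] * frac) >> 8);
}
```

Once the output position passes the lower line, that line becomes the new upper one and the next input line is loaded, so only two lines are ever held.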
<mrvn> anyway, even without buffering the TFT holds the picture a long time. Having the refreshes happen at irregular intervals of roughly 60Hz would always be fine.
<clever> the rpi even does that internally, for its image scaling
<mrvn> Unlike a CRT where the beam follows a fixed pattern that you can't just pause.
<clever> i should play with that one-shot flag, and see if i can make it do things
<clever> the design of the hvs is still heavily based around the concept of racing against the electron beam
<clever> after you configure a list of images to display, the HVS will composite them into a scanline, and push that into a fifo
<mrvn> Would be nice for games. Render the frame, push it out the display port, render next frame, push it out. If the rendering takes a bit longer or shorter? Who cares.
<clever> and the fifo lets the hvs do a bursty/jittery job, as dram latencies slow it down
<clever> yeah, thats a problem in the opengl api
<clever> if it takes you 1.1 frame-times to render, you may miss every 2nd vsync
<clever> and your 60fps instantly turns into 30 fps
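The arithmetic behind the 60-to-30 drop: with vsync-locked flips, every frame occupies a whole number of refresh intervals, so the render time is effectively rounded up to the next interval. A small worked version in C (the function name is made up, and it assumes every frame takes the same time to render):

```c
#include <stdint.h>

/*
 * Effective frame rate when flips only happen on vsync.
 * render_us is the time to render one frame, in microseconds.
 */
uint32_t vsync_locked_fps(uint32_t refresh_hz, uint32_t render_us)
{
    if (refresh_hz == 0)
        return 0;
    /* Number of refresh intervals one frame occupies, rounded up. */
    uint64_t intervals =
        ((uint64_t)render_us * refresh_hz + 999999u) / 1000000u;
    if (intervals == 0)
        intervals = 1;
    return (uint32_t)(refresh_hz / intervals);
}
```

At 60Hz one interval is about 16667 microseconds, so a frame that takes 1.1 intervals (about 18333 microseconds) occupies two of them and the rate falls to 30.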
<mrvn> it's a hardware problem that should have been fixed since TFTs came about.
<clever> oh, that reminds me of other weirdness
<clever> my lcd monitor, cant go much beyond 60hz
<clever> my crt monitor, can easily do 120hz
<mrvn> Also why do PCs never generate a vsync interrupt so videos don't tear?
<clever> xorg doesnt have a good api for that
<clever> you know about front and back porches?
<mrvn> The odd part is that if the video tears it ALWAYS tears at the same spot in the frame as far as I can tell. So something synchronises it exactly to the frame rate but just offset so it tears.
<clever> i had that exact bug, when i would write the whole dlist on vsync
<clever> it was taking too long, so it page-flips 10% into the frame
<clever> thats why i now pre-write the dlist out of irq, before vsync even occurs
<bslsk05> ​github.com: rpi-open-firmware/pixelvalve.md at master · librerpi/rpi-open-firmware · GitHub
<mrvn> yeah, the flip has to be atomic
<clever> i also have 10 interrupts to pick from
<clever> hsync start, h-backporch start, hactive start, h-frontporch start, vsync start, v-backporch start, vactive start, v-frontporch start, v-frontporch end, and idle!
zaquest has quit [Remote host closed the connection]
<mrvn> clever: I think there it shows that the VC was made to display DVDs. Can't have tearing in a DVD player.
<clever> the HVS can also accept yuv420 images right out of the mpeg2 decoder
<clever> so you dont have to spend any cpu on yuv->rgb conversions
<clever> oh, and the bcm2835, was in the roku2, a media streaming brick!
zaquest has joined #osdev
<mrvn> It's really sad. Back in the 90s on my Amiga I had perfectly synced video output without tearing. Hell, in the 80s on my C64 I had that too. Now we have 2022 and videos tear.
<clever> have you seen what the emu68 guys are doing?
<mrvn> no
<clever> watch this
<bslsk05> ​'Amiga 2000 playing MP3 while rendering [ PiStorm / Emu68 ]' by Simo Koivukoski (00:02:30)
<clever> that is an rpi, replacing the 68k cpu of an amiga 2000, and emulating just the cpu, and then bit-banging the amiga bus to drive the rest of the system
<mrvn> How did they interface the RPi?
<clever> the GPIO pins are basically just going thru a level shifter and into the addr/data bus
<mrvn> wow. Bit banging at 16MHz is hard.
<clever> the SMI peripheral speeds that up a lot
<mrvn> Plus you have a shared bus and have to honor the bus protocol
<clever> the rpi has a dedicated hw interface, for controlling an 18bit data bus with a 6bit addr bus
<clever> and support for external signals to stall it
xenos1984 has quit [Read error: Connection reset by peer]
<bslsk05> ​'Amiga 2000 4K UHD RTG [ PiStorm / Emu68 ]' by Simo Koivukoski (00:03:05)
<clever> they then wrote RTG drivers for the HVS with my help
<clever> so you now get 4k uhd on amigaos :P
<clever> https://www.youtube.com/watch?v=xf97KXPZ1Lo and here is an FPS demo
<bslsk05> ​'piStorm Emu68 barebone FPS showcase' by Tomaz Kac (00:13:45)
heat has joined #osdev
<zid> I just wanted to take a moment to point out how amazing the playstation docs are
<zid> 16-31 0 Not used (usually 0 for OTC, or 1 ("one block") for CDROM)
<mrvn> clever: but that's on the RPi video, right?
<zid> they're 0, not used, but also 0, and also 1
<clever> mrvn: yeah, the RTG drivers are passing bitmaps from the amigaos guest to the rpi HVS, where it then comes out hdmi
<clever> mrvn: the crazy thing, is that none of these drivers are implemented in the host, the MMIO is just directly exposed to the guest, and all rpi drivers get compiled to 68k asm, lol
<mrvn> clever: is that the normal bitmapped data or chunky like any modern hardware has?
<bslsk05> ​'Tweaking VC4 scaler engine' by Michal Schulz (00:01:55)
<clever> i pointed the emu68 dev towards the scaling kernel, and this then popped out a few days later, lol
<clever> those parameters control how it up-scales your low-res bitmap to the hdmi resolution
<clever> and let you get either fuzzy or blocky pixels
<mrvn> But from the screen mode selector that looked like chunky pixel. So basically the RPi is a separate graphics card add-on.
<clever> yeah, from the viewpoint of the amigaos, its using the RTG (retargettable graphics) api
<clever> which is what a lot of amiga add-on cards used
<clever> instead of the program directly talking to the hw, it uses a proper api, which you can then replace routines in, to create new drivers
<clever> but older amiga programs dont speak RTG, and still spit video out of the original amiga video chip
<mrvn> clever: all the GUI stuff is nice that way. But games often didn't use the OS.
<clever> exactly
<clever> there is a separate project, that clamps onto the stock video chip, and sniffs the digital rgb signals going to the DAC
<mrvn> Is there code to transform the graphics memory into a texture for the RPi?
<clever> it then plugs into the camera port of an rpi, and shows up as a video stream
<mrvn> that's cool too.
<clever> you cant just read the graphics memory, because there are chips on the motherboard modifying the bitmap without the cpu being aware
<mrvn> clever: you can read it. or snoop the bus.
<clever> the 68k cpu port cant snoop that bus
<clever> its pins dont change when another device does DMA
vin has quit [Ping timeout: 240 seconds]
<clever> and you would burn a ton of bus bandwidth fighting the onboard GPU over re-reading the graphics memory
<mrvn> oh? It only gets the bus-busy signal and doesn't see the address and data lines?
<clever> other chips on the board also change the gpu config on h-sync interrupts
<clever> so the bitmap your reading, wont even be at the same address for a full frame
vin has joined #osdev
<clever> all kinds of crazy stuff is happening
dude12312414 has joined #osdev
<mrvn> clever: not just h-sync. The Amiga has a processor that runs code synchronous to the beam. You can wait for screen positions and then modify graphics register on the fly.
<clever> exactly
<clever> and you have no easy way to know what the config is at a given pixel to re-create the image
<mrvn> well, you can read the code
<clever> you also have to emulate the code on both the 68k and the copper, to cycle-level accuracy
<clever> and at that point, you might as well just be using a full system emulator on a normal system, and ditch the amiga board
<mrvn> And if you emulate the copper and alice and all you wouldn't need to write the data to the slow graphics memory. Just keep it in RPi memory.
<mrvn> :)
<clever> those chips might be reading data the floppy controller wrote to chipram
<mrvn> not sure the floppy does DMA. but I get the point.
<clever> it does
<clever> the floppy controller writes the raw data directly to ram, a whole track at a time
<mrvn> Do you know what an emu68 interface costs?
<clever> and the copper then does demodulation
<clever> the board is called pistorm, and its for sale on ebay and other places
<clever> looks like about $37 on ebay
<mrvn> Crap, A500, A500+ and A2000 only. I could do A600, A1200 or A4000.
<clever> where did i put those notes...
<clever> ah, pistorm600 is for the A600
<mrvn> Looks like quite a bit of hardware on the PiStorm to handle the bus protocol.
<clever> for the old revision, it was just 8bit latches and thats it
<clever> the newer one uses an fpga
<clever> got a discord acct?
<mrvn> Hard to bit bang on the clock edge otherwise.
xenos1984 has joined #osdev
<mrvn> Just the other week I've been thinking about what it would take to make a Zorro III board for the amiga that connects to an arduino or RPi.
<mrvn> Taking over the cpu is even cooler though.
zid has left #osdev [#osdev]
[_] has joined #osdev
[itchyjunk] has quit [Killed (silver.libera.chat (Nickname regained by services))]
[_] is now known as [itchyjunk]
_xor has quit [Quit: brb]
ElectronApps has joined #osdev
epony has quit [Ping timeout: 240 seconds]
pounce has joined #osdev
radens has quit [Quit: Connection closed for inactivity]
Maka_Albarn has joined #osdev
<Maka_Albarn> Hey, does anyone know of a way to write multiple assembly functions at the same time with AT&T assembly? And I mean functions that do the same thing with an iterating value?
<Maka_Albarn> I'm wanting to avoid having to write individual stubs for IRQs if I can make a macro to fill in the contents.
<heat> .macro and .endm
<heat> you can also use .rept and .endr plus some .set trickery to do an effective for loop of macros
<bslsk05> ​gist.github.com: macro.s · GitHub
<geist> and/or you can compile it with gcc and use the preprocessor
<geist> #define and whatnot
<geist> yah that should work, or something like that
<Maka_Albarn> just multidefine irq x?
<geist> there's even a way to spit out 256 of them in a row with an incrementing value
<Maka_Albarn> \me's eyes widen
<bslsk05> ​github.com: lk/exceptions.S at master · littlekernel/lk · GitHub
<Maka_Albarn> stupid formatting
<geist> that's a little fancier than you need, but the .rept stuff + .set i = i + 1 lets you iterate with a different value between runs
<Maka_Albarn> hmm
<geist> i did it that way so that each of the ISR functions are a fixed size so when setting up the IDT i can just bump a pointer forward
<geist> and plus they're aligned for performance (as if it matters)
<geist> but you could easily have that spit out a isr0: isr1: isr2: etc
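The C side of the "fixed-size stubs, bump a pointer" trick could be sketched like this. The stub size of 16 bytes is a placeholder that would have to match whatever alignment the assembly actually uses, and a flat table of addresses stands in for real IDT descriptors, which split the offset across several fields.

```c
#include <stdint.h>

#define ISR_COUNT     256u
#define ISR_STUB_SIZE 16u   /* hypothetical: must match the .align in the asm */

/*
 * Record the entry address of every stub. Because the assembler emitted
 * them back to back at a fixed size, stub N lives at base + N * size.
 */
void fill_isr_table(uintptr_t stub_base, uintptr_t table[ISR_COUNT])
{
    uintptr_t stub = stub_base;
    for (unsigned vec = 0; vec < ISR_COUNT; vec++) {
        table[vec] = stub;
        stub += ISR_STUB_SIZE;  /* bump to the next stub */
    }
}
```

This avoids needing 256 separate labels exported from the assembly file; only the address of the first stub has to be visible to C.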
<Maka_Albarn> what happens in assembly if you have something like pushl $(1+3)?
<bslsk05> ​github.com: toaruos/irq.S at master · klange/toaruos · GitHub
<Maka_Albarn> does it run as pushl $4?
<geist> it should
<bslsk05> ​github.com: rpi-open-firmware/traps.S at master · librerpi/rpi-open-firmware · GitHub
<geist> assembler can solve simple arithmetic
<Maka_Albarn> I just wanted to make sure that it wouldn't read it as a memory location or something stupid
<clever> Maka_Albarn: ive also done symbol - symbol as well, to compute sizes: https://github.com/librerpi/lk-overlay/blob/master/platform/bcm28xx/arm/payload.S#L18-L19
<bslsk05> ​github.com: lk-overlay/payload.S at master · librerpi/lk-overlay · GitHub
<clever> this turns a start/end symbol, into a start/length value
<clever> and then on the c side, i claim arm_payload_array is an array of structs, so it turns into a simple structure
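A sketch of how the C side of such a start/length table might be consumed. The struct layout is a guess at what "array of structs" means here (one pointer-sized start and one pointer-sized length per entry); the field names and the helper function are invented, and the real declaration in lk-overlay may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry as the assembly is assumed to emit it: start, then end - start. */
struct payload_entry {
    const uint8_t *start;
    uintptr_t length;
};

/*
 * In a real build the table would come from the assembly file, e.g.
 *   extern const struct payload_entry arm_payload_array[];
 * Here it is passed in so the sketch can be tried on its own.
 */
size_t payload_total_bytes(const struct payload_entry *entries, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += entries[i].length;
    return total;
}
```

The point of emitting the length from the assembler is that C never has to subtract two symbol addresses itself.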
<Maka_Albarn> heat: Thank you. That worked perfectly
heat has quit [Ping timeout: 240 seconds]
<klange> this was an amusing thing to debug
<klange> first I had an issue where I wasn't even sending the wakeup signal correctly because I used one of my MMU APIs that I had forgotten considered '0' a special value _and guess what page the spin tables are in_...
<klange> Then I struggled with cache problems, as I was trying to store things in the kernel and then reference them without the mmu in the AP startup sequence
<klange> and then I was scratching my head for a while until I realized I forgot to flush _one_ variable...
<clever> ahh
<klange> I ended up debugging the whole thing by reusing the _one_ piece of data I knew was working: the spintable.
<klange> So after the asmstub code jumped to my code, I replaced the spintable value with a series of other values, which the main cpu was reading in a loop
<clever> ahh
<clever> i did that kind of debug, by just printing a char to the uart for each core
<clever> like printing out '0' + corenr
<Maka_Albarn> \me
* Maka_Albarn hums
<klange> the pi is a bit warmer, but not on fire, I'll let it spin on doom for a while
<clever> the official firmware will also do thermal throttling
<clever> so the arm freq will be changed without warning, if things start to cook
<klange> I don't have anything monitoring it, but Doom is still as smooth as it was. I'm using ~25-50% of each core according to my idle stats
<clever> oh, i can tell you how to read the temp sensor
<bslsk05> ​github.com: lk-overlay/temp.c at master · librerpi/lk-overlay · GitHub
<clever> this came from the linux source
<clever> you can entirely ignore setup_tsens, the official firmware already did that for you
<clever> so you just read TS_TSENSSTAT and do a bit of math
<clever> my code is using floats, but linux was doing fixed-point math instead
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
vdamewood has joined #osdev
<Matt|home> well that's a bit demotivating. initial research into foss contributions is showing me "yeah we don't need new coders, go document stuff for us". bleh..
<dmh> tell them to kick rocks
<moon-child> yes oss is a scam
<moon-child> fuck the church^Wcathedral
vinleod has joined #osdev
<Matt|home> im not talking about a particular project, just saying it seems like my limited skillset isn't in high demand right now. i could be very wrong
vdamewood has quit [Killed (molybdenum.libera.chat (Nickname regained by services))]
vinleod is now known as vdamewood
<kingoffrance> actually, this is a good time to note, part of the problems with the bazaar v cathedral comparisons, is grace/new testament went out with monarchies. its always been "law" "no man is above the law" for a long time. there are no cathedrals for centuries
<kingoffrance> it was always kind of an idiotic comparison
<kingoffrance> ditto corporations replaced bazaars centuries ago
<kingoffrance> people should be a lot less shallow with analogies
the_lanetly_052 has joined #osdev
Jari-- has joined #osdev
the_lanetly_052 has quit [Max SendQ exceeded]
<Jari--> morning all
<kingoffrance> one million dollars </dr evil> you know thats not a lot of money?
the_lanetly_052 has joined #osdev
<kingoffrance> its best to retire that analogy
the_lanetly_052 has quit [Max SendQ exceeded]
<klange> "Don't you think we should ask for more than a million dollars? A million dollars isn't that much money these days..."
the_lanetly_052 has joined #osdev
<klange> "Alright then... FIVE MILLION DOLLARS!" "Virtucon alone makes over nine billion dollars a year." "Oh, really? One-hundred billion dollars. OK, make it happen. Anything else?"
eroux has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
the_lanetly_052 has quit [Max SendQ exceeded]
the_lanetly_052 has joined #osdev
the_lanetly_052 has quit [Max SendQ exceeded]
the_lanetly_052 has joined #osdev
the_lanetly_052 has quit [Max SendQ exceeded]
the_lanetly_052 has joined #osdev
the_lanetly_052 has quit [Max SendQ exceeded]
the_lanetly_052 has joined #osdev
<Maka_Albarn> Ooooookaaaaaay, I set up the IRQ remapping and installed the interrupt entry points into the IDT. I turned on interrupts with "sti" and got GP fault with an error code of 0x103.
<Maka_Albarn> ummmm, what?
<klys> does your handler ack the interrupt?
<Maka_Albarn> ???
<klys> out 0x20,0x20
<Maka_Albarn> I know it happens during my main() for loop.
<Maka_Albarn> klys: yes, my handler gives my PIC the End-of-Interrupt
<klys> do you return from 32 bit code with iretd ?
<klys> if it was 64 bit code, iretq?
<Maka_Albarn> it's all 32-bit code
<Maka_Albarn> iret
<Jari--> apple juice makes brains work better.. umph umph some summer wine yesterday :-]
<klys> is your code segment executable?
<Jari--> exciting to get asynchronous code errors, never know where the fun is at
<Maka_Albarn> umm... supposedly my Data segment selector is reading as 0x100010... that's not right...
<Maka_Albarn> is the segment selector supposed to hold 3 bytes?
<Maka_Albarn> I don't think so...
<klys> only 16 bits
<klys> geist, i spent a bit less on my new nezha sbc
<klys> visionfive appears to sell for 209. advantages?
<klys> oh this has 8gb ram
<Maka_Albarn> I found the problem. I accidentally swapped the attribute and selector arguments for my idt_set_gate() function for my IRQs.
<klys> maka_albarn, good going!
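For reference, the 0x103 error code from the #GP above can be decoded with the selector error code layout used by x86 for #GP, #TS, #NP and #SS: bit 0 is EXT, bit 1 says the index refers to the IDT, bit 2 selects the LDT when bit 1 is clear, and bits 3-15 hold the descriptor index. A minimal sketch; the struct and function names are made up for illustration and do not come from anyone's kernel in this log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of an x86 selector error code (hypothetical struct name). */
struct selector_error {
    bool external;   /* bit 0: fault happened while delivering an external event */
    bool idt;        /* bit 1: index refers to a gate in the IDT */
    bool ldt;        /* bit 2: index refers to the LDT (only meaningful if !idt) */
    uint16_t index;  /* bits 3-15: descriptor or gate index */
};

/* Split a raw error code, as pushed by the CPU, into its fields. */
static struct selector_error decode_selector_error(uint32_t code)
{
    struct selector_error e;
    e.external = (code & 1u) != 0;
    e.idt = (code & 2u) != 0;
    e.ldt = (code & 4u) != 0;
    e.index = (uint16_t)((code >> 3) & 0x1FFFu);
    return e;
}
```

Fed 0x103, this gives EXT set, IDT set, index 32. That would be the IRQ0 gate if the PIC was remapped to start at vector 32 (the log does not state the offset, so that part is an assumption), which is consistent with a malformed IRQ gate descriptor.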
bradd has quit [Ping timeout: 240 seconds]
[itchyjunk] has quit [Remote host closed the connection]
_xor has joined #osdev
<Maka_Albarn> time to go. got work tomorrow.
Maka_Albarn has left #osdev [#osdev]
<geist> klys: yah they might have a 4GB version at some point, but it seems to be greyed out
<geist> main reason i'd generally stay away from the nezha is its using the allwinner D1 which uses a C906 core which is pretty ancient
<geist> and ancient as in it's using an older privileged spec, etc. ie 'nonstandard'
<geist> does it matter much? probably not, but it's a detail that has to be dealt with
eroux has joined #osdev
vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
JanC has quit [Remote host closed the connection]
JanC has joined #osdev
<kingoffrance> update: hercules flipped the ferryman's oar back on him. hercules, doesnt need no stinking $ in makefiles
masoudd has joined #osdev
the_lanetly_052 has quit [Ping timeout: 256 seconds]
GeDaMo has joined #osdev
matrice64 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
Jari-- has quit [Ping timeout: 256 seconds]
<kingoffrance> i changed my mind GeDaMo -- hercules flipped the boatman's oar, no need to use $ in makefiles if you are hercules
<GeDaMo> Depends who the boatman is, I suppose :P
pretty_dumm_guy has joined #osdev
Burgundy has joined #osdev
gog has joined #osdev
dormito has quit [Quit: WeeChat 3.3]
Irvise_ has quit [Quit: Bridge terminating on SIGTERM]
nopenope[m] has quit [Quit: Bridge terminating on SIGTERM]
paulusASol has quit [Quit: Bridge terminating on SIGTERM]
junon has quit [Quit: Bridge terminating on SIGTERM]
niedzejkob[m] has quit [Quit: Bridge terminating on SIGTERM]
Matt|home has quit [Ping timeout: 256 seconds]
Irvise_ has joined #osdev
nopenope[m] has joined #osdev
paulusASol has joined #osdev
junon has joined #osdev
dormito has joined #osdev
rwb is now known as rb
elastic_dog has joined #osdev
<bslsk05> ​twitter: <ChartreuseK> Little project teaser. 6800 @ 921kHz, dual 6850 UARTs, 2 or 4kB of ROM, and up to a whopping 8kB of static RAM. https://pbs.twimg.com/media/FL339Z1UcAEVXGk.png [Fennec 6800 (homebrew) PCB render]
<GeDaMo> For a second I read that as 921MHz :P
<j`ey> overclocked!
<GeDaMo> "Place iceberg here" :P
[itchyjunk] has joined #osdev
eroux has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
Brnocrist has quit [Ping timeout: 256 seconds]
Brnocrist has joined #osdev
epony has joined #osdev
lkurusa has joined #osdev
eroux has joined #osdev
lkurusa has quit [Read error: Connection reset by peer]
lkurusa has joined #osdev
k8yun has joined #osdev
sheb has joined #osdev
sheb has quit [Remote host closed the connection]
xenos1984 has quit [Remote host closed the connection]
xenos1984 has joined #osdev
dude12312414 has joined #osdev
<mrvn> j`ey: How do people always manage with so few traces and holes? My designs always end up total spaghetti and impossible to route.
<j`ey> practise? idk
<j`ey> I've never done anything that complex
<GeDaMo> Do people not use automatic layout then manually fixup?
<mrvn> GeDaMo: sure
<mrvn> You should see my 16bit barrel shifter.
lkurusa has quit [Quit: I probably fell asleep (or went out). Who will ever know.]
nyah has joined #osdev
<mrvn> 24x 74xxx 4x 2:1 MUX chips plus some control logic. Can't resolve all the wire crossings on a 10cm x 10cm board.
X-Scale` has joined #osdev
X-Scale has quit [Ping timeout: 272 seconds]
X-Scale` is now known as X-Scale
X-Scale` has joined #osdev
X-Scale has quit [Ping timeout: 272 seconds]
X-Scale` is now known as X-Scale
jborgner has joined #osdev
k8yun has quit [Read error: Connection reset by peer]
k8yun has joined #osdev
jborgner has left #osdev [#osdev]
paulbarker has quit [Read error: Connection reset by peer]
paulbarker has joined #osdev
jborgner has joined #osdev
the_lanetly_052 has joined #osdev
jborgner has quit [Quit: WeeChat 3.4]
matrice64 has joined #osdev
eroux has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
ElectronApps has quit [Remote host closed the connection]
<geist> j`ey: cute. using old school ram too
the_lanetly_052 has quit [Remote host closed the connection]
dude12312414 has quit [Ping timeout: 240 seconds]
the_lanetly_052 has joined #osdev
the_lanetly_052_ has joined #osdev
the_lanetly_052 has quit [Ping timeout: 256 seconds]
<mrvn> That's not old school ram, this is old school ram: https://en.wikipedia.org/wiki/Magnetic-core_memory#/media/File:KL_CoreMemory.jpg
<bslsk05> ​en.wikipedia.org: Magnetic-core memory - Wikipedia
<mrvn> and now I wonder where I can get a bag of ferrite rings to build one of those.
<bslsk05> ​hackaday.com: Printing Magnets | Hackaday
<mrvn> GeDaMo: you want to print some core memory?
<GeDaMo> Sure! :P
<GeDaMo> I'd like to be able to print integrated circuits
<mrvn> I think I have 4 ferrite rings or so from old PCs where they looped the cable through one as filter.
<mrvn> Those big ~2cm ones.
<GeDaMo> Don't they still use ferrite cores on cables but in plastic containers?
<gog> yeah
<mrvn> monitor cables prior to display port
<GeDaMo> Ah, that's what I'm thinking of
<gog> i had a 2.1 speaker setup where i actually had to add one to the subwoofer because it would hum very loudly if the cops rolled by and were using their radios
<gog> which was often
<mrvn> Try running a poorly shielded Amiga with MC68060 @ 50MHz next to your radio tuned to SWF3 at 100.3 MHz.
<mrvn> I can tell you jpeg decoding has a quite distinct sound.
<gog> that'd do it
<gog> reminds me of my front panel audio connectors on my old rig
<mrvn> gog: Ever used an Arduino as a radio station?
<gog> never before
<gog> i've dabbled with RTL-SDR receivers though
<mrvn> you can connect a longish wire to a pin and bit-bang it at radio frequencies and tune a radio to it.
mahmutov has joined #osdev
<clever> mrvn: that long wire is even optional, as is gpio: https://spritesmods.com/?art=avrfmtx
<clever> in the case of the avr, there is a *4 PLL inside the cpu, so if you overclock it to 24mhz, there is a 96mhz signal deep within the cpu core
<clever> and at short enough ranges, that can be picked up by a radio
<gog> neat
<clever> and if you wiggle your overclock up/down, the 96mhz will wiggle, and boom, FM
<GeDaMo> You used to be able to use radios to listen to (really) old computers running :P
xenos1984 has quit [Read error: Connection reset by peer]
<mrvn> it's only 2-3 decades. You call that old? tssss
xenos1984 has joined #osdev
<clever> GeDaMo: one of my old laptops, i could hear the cpu usage, without any radio
<clever> something in the cpu power regulator was vibrating at audio freqs
<gog> i can sometimes hear mine whine
<gog> especially running games
<gog> idk if it's coming from the speakers or if it's some kind of timer circuitry
<gog> or a combination
<GeDaMo> At this place I used to work, we had shortwave radios for communicating around the building; whenever you triggered one near a (CRT) monitor, the monitor would give a loud clunk and go blank for a second
<GeDaMo> If that's what it was doing to the monitor, what was it doing to my head? :|
<gog> degauss your hippocampus for long life and health
<clever> oh right, there was also software exploits for crt displays
<gog> killer poke?
<clever> no, data extracting
<clever> you mess with the video timing parameters in just the right way, so the RF leaking out of the monitor encodes data you want to extract
<clever> and somebody can then recover the data from that
<bslsk05> ​en.wikipedia.org: Tempest (codename) - Wikipedia
<clever> yep
<gog> ohhh i see
<clever> instead of broadcasting on a normal freq, you just modulate the normal rf leakage that is expected/ignored
<gog> like the whole star trek "hiding the comms signal in a warp signature"
<clever> yep
<GeDaMo> "In January 2015, the Airhopper project from Georgia Institute of Technology, United States demonstrated (at Ben Gurion University, Israel) the use of Van Eck Phreaking to enable a keylogger to communicate, through video signal manipulation, keys pressed on the keyboard of a standard PC, to a program running on an Android cellphone with an earbud radio antenna."
<bslsk05> ​en.wikipedia.org: Van Eck phreaking - Wikipedia
<clever> i'm also able to control the rpi hw to that level, and can change the timing parameters freely
<clever> but for some outputs (ntsc/pal), the hw enforces using the right ratio of periods (but the reference clock can be played with)
<clever> for digital outputs (hdmi), the display is counting the length of periods, and will get upset if you're off by even 1 count
<clever> so that only leaves DPI->VGA
<mrvn> clever: maybe you can mess with the frequencies in a way you still end up count perfect at the end,.
<clever> mrvn: yeah, messing with the pixel clock is the only option that works on every output
<clever> but it needs to be a small enough change to not make the display lose its lock
<mrvn> you can also just mess with the lowest bits in the picture. Add a bit of noise that's actually data.
<clever> mrvn: WoW does that to its screenshots!, but getting data encoded into hdmi like that is harder
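The "lowest bits" idea is plain least-significant-bit steganography. A minimal sketch on a raw byte buffer, assuming one payload bit per pixel byte; the function names are hypothetical and this says nothing about how the VC display list or any screenshot watermark actually works.

```c
#include <stddef.h>
#include <stdint.h>

/* Hide nbytes of data in the lowest bit of successive pixel bytes.
 * Returns 0 on success, -1 if the pixel buffer is too small. */
static int lsb_embed(uint8_t *pixels, size_t npixels,
                     const uint8_t *data, size_t nbytes)
{
    if (npixels / 8 < nbytes)
        return -1;
    for (size_t i = 0; i < nbytes * 8; i++) {
        uint8_t bit = (uint8_t)((data[i / 8] >> (i % 8)) & 1u);
        pixels[i] = (uint8_t)((pixels[i] & 0xFEu) | bit);  /* change only bit 0 */
    }
    return 0;
}

/* Recover nbytes previously stored with lsb_embed(). */
static void lsb_extract(const uint8_t *pixels, uint8_t *out, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++) {
        uint8_t b = 0;
        for (unsigned k = 0; k < 8; k++)
            b |= (uint8_t)((pixels[i * 8 + k] & 1u) << k);
        out[i] = b;
    }
}
```

Each pixel byte changes by at most 1, which looks like noise to a viewer but survives only if the path to the observer is lossless.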
<bslsk05> ​github.com: gateware/dvi-encoder.sv at master · swetland/gateware · GitHub
<mrvn> clever: should be trivial. Run the RPi with linux like now but in the VC's dlist add a texture to manipulate the lowest bits.
<clever> mrvn: dvi/hdmi uses a complex scrambling code to both avoid dc current flow (keep an even match of +- and -+ on the diff pairs), and also to avoid creating pure tones on the rf leakage
<mrvn> I don't mean manipulate the hdmi stream. Manipulate the picture itself.
<clever> yeah, you could overlay a picture with some alpha, to modify pixels slightly
<clever> but then how do you detect that externally?
<mrvn> clever: camera on the display through the window.
<clever> ah
<clever> but its likely not exposed to a window, because you could also scrape secrets that are just displayed normally under normal use
<mrvn> I'm pretty sure with some modeling you could also create frequencies in the hdmi stream.
<mrvn> it's more designed against accidental leakage than intentional.
<clever> yeah
<mrvn> I should look into VC coding for stuff like raid6 and reed-solomon.
<mrvn> The GPU should be really good at doing bit operations on MB chunks of memory.
<clever> yeah
<clever> have i explained the vector core to you before?
<mrvn> Just needs to be something complex enough to outweigh the time to load/save the data.
<clever> basically, imagine a 64x64 spreadsheet, each cell can hold an 8bit int, 0-255
<gog> it'd make rebuilding a degraded set a little less frustrating i imagine
<gog> if it was indeed faster
<clever> for each vector, i specify a coordinate (row#, col#), a direction (row or column mode), and a bit width (8bit, 16bit, 32bit)
<mrvn> gog: not really. The limiting factor is the disk speed there already. But it would use less cpu.
<gog> ah ok
<mrvn> My raid rebuild at ~550MiB/s,
<gog> oh that's faster than my SSD can even read
<clever> mrvn: so for example, H(5,6) says to load a 16 wide chunk, starting at row5, column6 and ending at column21
<mrvn> 110MiB/s per disk.
<mrvn> The disk could do more but I think the controller can't.
<clever> mrvn: following along so far?
<mrvn> clever: I know how it works
<clever> ah
<mrvn> gog: Have you ever considered the lifetime of your SSD or an M2.key? I have a 2TB M2.key that can do 2GB/s throughput. It's expected to last for at least 400TB writes (or so). Sounds like a lot right?
<mrvn> 400*1000/2/60/60 = ~55.5 hours.
<gog> i have and i'm pretty sure at the rate i use it its expected life is about 3 years
<gog> i've had it for 4 :|
<mrvn> gog: warranty + 1 day.
<gog> lol
<gog> yeah it's well out of warranty
<clever> Data Units Written: 669,977,243 [343 TB]
<clever> for the main ssd in my desktop
<clever> Percentage Used: 70%
<mrvn> So the M2.key is expected to fail after 55 hours at full speed. It's also supposed to last 3-5 years (depending on what you buy). Lets you compute the average write speed they expect you to have.
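The endurance arithmetic can be written out as two small helpers. This is a sketch using decimal units (1 TB = 1000 GB) and the round figures quoted in the chat (400 TB endurance, 2 GB/s); the function names are invented for illustration.

```c
/* Hours until endurance_tb terabytes have been written at rate_gb_s GB/s. */
static double hours_to_wear_out(double endurance_tb, double rate_gb_s)
{
    return endurance_tb * 1000.0 / rate_gb_s / 3600.0;
}

/* Average write rate in MB/s that uses up endurance_tb over the given years. */
static double sustained_rate_mb_s(double endurance_tb, double years)
{
    double seconds = years * 365.0 * 24.0 * 3600.0;
    return endurance_tb * 1e6 / seconds;  /* 1 TB = 1e6 MB */
}
```

With those inputs, `hours_to_wear_out(400, 2)` is about 55.6 hours, and `sustained_rate_mb_s(400, 5)` is roughly 2.5 MB/s, i.e. the rated lifetime assumes an average write load about three orders of magnitude below the drive's peak throughput.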
<mrvn> clever: is that SMART data?
<clever> yep
<gog> 231 SSD_Life_Left 0x0000 015 015 000 Old_age Offline - 85
<gog> not bad i guess
<gog> 16TB lifetime writes
<gog> 134 unsafe shutdowns
<gog> but no erase failures
<clever> Unsafe Shutdowns: 227
<gog> 9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 20807
<mrvn> gog: doesn't it have a little capacitor to safely shutdown on power loss?
<clever> Power On Hours: 29,646
<gog> mrvn: idek about that
<mrvn> Model Number: INTEL SSDPEKNW020T8
<clever> Model Number: INTEL SSDPEKKW512G7
<mrvn> Temperature: 37 Celsius
<mrvn> Data Units Read: 41,057,477 [21.0 TB]
<mrvn> Data Units Written: 47,942,854 [24.5 TB]
<clever> Temperature: 31 Celsius
<mrvn> Power On Hours: 12,644
<mrvn> Still good with that one.
<gog> Device Model: KINGSTON SA400S37120G
<clever> at what point should we just pastebin the entire thing? lol
<gog> lol
<mrvn> sorry.
<mrvn> I've written the whole thing 12 times but only read it 10 times. interesting.
<clever> Data Units Read: 232,332,603 [118 TB]
<clever> Data Units Written: 669,977,243 [343 TB]
<clever> Namespace 1 Size/Capacity: 512,110,190,592 [512 GB]
<mrvn> Looks like I reboot every 22 days on average.
<clever> 99days for my desktop
<mrvn> Most reboots will be clustered. A lot for the initial setup and install.
<clever> 15:17:09 up 360 days, 18:18, 22 users, load average: 0.00, 0.01, 0.05
<clever> nearing 1 year for the irc client box
<mrvn> clever: what does your disk say?
<clever> for the irc client, *looks*
<mrvn> power on hours / power cycles = ???
<gog> 12 days uptime lol
<clever> sda, 173 power cycles, 114845 hours, 663 hours avg
<gog> i need to remove a bunch of plasma/kde stuff
<gog> updates are kindof a chore rn
<mrvn> I think reboots don't register on the disk, only cold starts, right?
<clever> sdb 100 power cycles!!, 76079 power on, 760 hour avg
<clever> possibly
<gog> 14 hour average uptime
<gog> 20807/1462
<mrvn> you win
<gog> i've gone entire months with this off though
<gog> back when i didn't need a laptop really
<mrvn> Oh, this was my server. It's never off.
<mrvn> (intentionally)
<clever> oh, i should check the router hdd...
<clever> it doesnt even have smartctl installed!
<mrvn> 2022-02-18 20:11:18 status unpacked smartmontools:amd64 7.2-1
<clever> 80851 hours, 1566 cycles, 51 hour avg, i blame the previous owner
<clever> the router was once part of an xray machine in a hospital
<mrvn> ouch, that must have caused some bit flips.
<clever> smart says nothing is wrong with it
the_lanetly_052_ has quit [Remote host closed the connection]
gareppa has joined #osdev
k8yun has quit [Read error: Connection reset by peer]
gareppa has quit [Quit: Leaving]
dude12312414 has joined #osdev
garrit has quit [Ping timeout: 256 seconds]
Teukka has quit [Read error: Connection reset by peer]
Teukka has joined #osdev
<gog> holy shit i got task switching working
<gog> ;aehrs;rongaoidjrga;a
<GeDaMo> Which task was that? :P
<gog> the one where i excitedly smash my keyboard
<mrvn> cooperative, preemptive, tickless?
<gog> cooperative round-robin
<gog> really rudimentary
<gog> well i think it can preempt?
<gog> idk
<gog> i don't have time slices implemented yet
<GeDaMo> Yield, baby, yield! :P
<mrvn> I feel cooperative is actually harder, needs syscalls while preemptive needs IRQs which most do first.
<gog> oh then it's preemptive
<j`ey> well cooperative could be kernel threads, explicitly calling schedule()
<gog> i made a sort of callback interface to my timer irq in order to keep this clean
<gog> none of the threads call the scheduler so it's preemptive then
<mrvn> j`ey: true
<j`ey> gog: so a timer fires and that causes the threads to be context switched?
<mrvn> gog: a good clean way is to task switch on return from an irq/syscall and have the scheduler just set a variable which task to run next.
<gog> the timer fires and calls a function that iterates over any sleeping tasks, appends tasks to wake up to the ready queue, then calls the scheduler
<mrvn> gog: makes it real easy to e.g. schedule when the disk returns a block of data another task was waiting for.
<mrvn> gog: iterates? bad idea.
<gog> probably
<mrvn> gog: you have the tasks in a doubly linked list?
<gog> singly
<gog> i'm following brendan's tutorial because i had no clue where to begin
<gog> and i cribbed most of the code from it
<mrvn> gog: if you make it double a task can remove itself from the running list and insert itself in the waiting list or vice versa.
<gog> that would be more ideal
<gog> but i'm new to this part
<gog> i've never gotten this far with an experiment
<mrvn> And the timer interrupt would just use current_task->next.
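The doubly linked list suggestion could look like the following. It is a sketch, assuming a circular list with a sentinel head node; all names are illustrative and none of this is taken from gog's kernel or from the tutorial.

```c
#include <stddef.h>

struct task {
    struct task *prev, *next;
    int id;
};

/* Circular list with a sentinel head: an empty list points at itself. */
struct task_list {
    struct task head;
};

static void list_init(struct task_list *l)
{
    l->head.prev = l->head.next = &l->head;
}

static int list_empty(const struct task_list *l)
{
    return l->head.next == &l->head;
}

static void list_push_back(struct task_list *l, struct task *t)
{
    t->prev = l->head.prev;
    t->next = &l->head;
    l->head.prev->next = t;
    l->head.prev = t;
}

/* O(1): the task unlinks itself without searching or knowing its list. */
static void list_remove(struct task *t)
{
    t->prev->next = t->next;
    t->next->prev = t->prev;
    t->prev = t->next = t;
}

/* Move a task to another queue, e.g. running -> waiting or back. */
static void task_move(struct task *t, struct task_list *to)
{
    list_remove(t);
    list_push_back(to, t);
}
```

With a singly linked list, removal needs the predecessor, which means walking the queue; the `prev` pointer is what makes a task able to take itself off the run queue in constant time. A real kernel would also need locking or interrupt masking around these updates, which is omitted here.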
<gog> i still need to test it more
<mrvn> gog: anyway, congratulations, it works. Now make it nice.
<gog> it _appears_ to work
<gog> there's a big gulf between appearing to work and verified to work
<mrvn> gog: Most people get stuck on how to create a task. The switching isn't that difficult but preparing the second task so you can switch needs a mind warp.
<j`ey> does it print ababab?
<mrvn> j`ey: mine prints primed and binomial numbers
<mrvn> something that quickly slows down output.
mahmutov has quit [Ping timeout: 240 seconds]
<klange> ah, my rpi froze... clock says a bit under 10 hours of uptime
<klange> I should change the boot timestamp (no rtc), it's currently a very random 1644908027 which is 6:53:47 this past Tuesday. Should make it midnight, easier uptime calculation from the panel clock...
heat has joined #osdev
<gog> j`ey: no it prints a bunch of debugging output with elapsed time and wakeup time and "THE THREAD IS AWOKEN!" when the sleep test's expiration time comes
<j`ey> :3
<klange> my original task switch test 10 years ago printed A and B
<gog> i still need to make more threads to test
<gog> but thread creation itself is sorted
<klange> oh and Cs
<gog> at least for my purposes currently
<heat> tip: preemption shouldn't rely on IRQs
<gog> yeah i know i need time slices
<heat> a timer event is just one of the ways to ask for rescheduling
<klange> "preemption" absolutely should rely on irqs, how else are you going to interrupt a running userspace process?
<klange> "task switching" should not
<heat> IRQs, IPIs
<gog> anyhow i have more testing to do
<mrvn> heat: better to say: should not just rely on IRQs
<klange> the advice of ensuring your task switching isn't dependent on irqs is about being able to make task switches for blocking
<mrvn> preempting on fixed time slizes is kind of boring and unresponsive.
<gog> yeah when a thread is blocked it'll run the scheduler
<mrvn> klange: separating the scheduler (called by e.g. the timer interrupt) and task switching (performed on kernel -> user switch) fits that model really nicely.
<gog> that's how the tutorial is written so that's how mine works :p
<heat> also don't measure your time using a counter you tick every IRQ
<heat> bad, bad idea
<mrvn> google tickless.
<mrvn> although with the PIC you kind of need to do tick based.
<heat> tickless is harder to do
<klange> tickless just means the preemption timer isn't a fixed rate
<gog> yes i intend to have the capability for tickless at some point when i understand this a little more thoroughly
<mrvn> but tickless is so much worth it.
<gog> yes
<heat> my scheduler runs using a tick (no tickless yet) but my timer doesn't know what a tick is, it just does oneshot of the next event
<heat> the scheduler tick + preemption is just a regular event
<mrvn> heat: so you are tickless but just never change the interval
<mrvn> heat: well, maybe. Do you stop the timer when the idle thread runs?
<heat> no. that would be dynticks I think
<heat> tickless is when you don't tick but set the next "tick" to the next preemption point (when the next thread can run or something)
<heat> at least that's my idea of it
<mrvn> I think dynticks would be selecting an interval based on e.g. process priority. "time till next event" I think describes tickless
<mrvn> you just are very bad at calculating "next event" so far
<klange> tickless is when you aren't using a fixed periodic timer as the pre-emption source but instead making the pre-emption timer fire precisely when the next time event is
<heat> yea
<klange> no one is running and no one is blocked with a timeout? no tick at all - some other hardware interrupt will bring us out of idle
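The tickless idea described here boils down to: find the earliest pending deadline, program a one-shot timer for it, and leave the timer disarmed when nothing is pending. A minimal sketch under those assumptions; the event table, the nanosecond time base and the function names are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct timer_event {
    uint64_t deadline_ns;  /* absolute time at which the event is due */
    bool armed;            /* false: slot unused or already fired */
};

/* Store the earliest armed deadline in *out and return true, or return
 * false when nothing is pending (the caller then leaves the timer off). */
static bool next_deadline(const struct timer_event *ev, size_t n, uint64_t *out)
{
    bool found = false;
    for (size_t i = 0; i < n; i++) {
        if (!ev[i].armed)
            continue;
        if (!found || ev[i].deadline_ns < *out) {
            *out = ev[i].deadline_ns;
            found = true;
        }
    }
    return found;
}

/* Delay to program into a one-shot timer; 0 means the event is already due. */
static uint64_t oneshot_delay(uint64_t now_ns, uint64_t deadline_ns)
{
    return deadline_ns > now_ns ? deadline_ns - now_ns : 0;
}
```

The linear scan keeps the sketch short; with many timers a sorted list or heap would avoid walking every event on each reprogram.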
<mrvn> It's kind of blurry when you don't have a periodic timer in hardware
xenos1984 has quit [Remote host closed the connection]
<heat> and i'm using a fixed-ish frequency, so not really tickless
xenos1984 has joined #osdev
<mrvn> I think the more important part of preemption (maybe that's the wrong term?) is that you switch task when something more important comes up instead of finishing the current time slize.
<heat> right, that's where something like IPIs can come into play
<klange> IPIs are still interrupts.
<heat> yes
<heat> not a timer IRQ though
<mrvn> more important is stuff like keyboard interrupt, mouse, network
<klange> Preemption is very specifically "hey, you, running userspace process that hasn't made a system call, I'm descheduling you".
<mrvn> wake up the video player when the next video frame comes from youtube even though gcc still has time left to run.
<heat> that triggers an IPI
<mrvn> klange: but why are you descheduling? Because the time slize is up? Because some higher priority task woke up?
<j`ey> mrvn: slice!
<heat> sched_unblock(thread) -> thread->prio > other_cpu_curr_thread->prio -> other_cpu_curr_thread->flags |= SHOULD_SCHED_OUT; sched_send_ipi(other_cpu);
<klange> mrvn: separate topic
<heat> i modeled a lot of my timer+sched thing around linux and all that matters is that ->flags is checked at some points
<heat> like after unlocking a mutex, re-enabling preemption, exiting to user-space, etc
<heat> for tasks which are permanently in user-space or idling, IPI
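heat's one-line description can be expanded into a small sketch. The names `sched_unblock`, `sched_send_ipi` and `SHOULD_SCHED_OUT` come from the message above; the struct layouts, the counter standing in for a real IPI and `should_reschedule` are assumptions made so the example is self-contained.

```c
#include <stdbool.h>

#define SHOULD_SCHED_OUT 0x1u

struct thread {
    int prio;
    unsigned flags;
};

struct cpu {
    struct thread *current;  /* thread currently running on that CPU */
    int ipis_sent;           /* stand-in for hardware state, for illustration */
};

/* Placeholder: a real kernel would poke the other CPU's interrupt controller. */
static void sched_send_ipi(struct cpu *target)
{
    target->ipis_sent++;
}

/* Called when `woken` becomes runnable. Returns true if preemption was requested. */
static bool sched_unblock(struct thread *woken, struct cpu *target)
{
    if (woken->prio <= target->current->prio)
        return false;                        /* not more important: just queue it */
    target->current->flags |= SHOULD_SCHED_OUT;
    sched_send_ipi(target);                  /* covers threads stuck in user space or idle */
    return true;
}

/* Checked at preemption points: after a mutex unlock, on exit to user space, etc. */
static bool should_reschedule(struct thread *t)
{
    if (!(t->flags & SHOULD_SCHED_OUT))
        return false;
    t->flags &= ~SHOULD_SCHED_OUT;
    return true;
}
```

On real SMP hardware the flag update would need to be atomic, since another CPU sets it while the owner reads it; plain integer operations are used here only to keep the sketch readable.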
GeDaMo has quit [Remote host closed the connection]
<mrvn> klange: For me scheduling on time slices is just time sharing. True preemption includes premature switching. Which is a step stronger.
<heat> so, a random malloc blows up the kernel
<heat> and I don't know what's wrong
<klange> mrvn: and it has nothing to do with what I'm saying
<heat> kasan says everything is great
<mrvn> klange: for you windows 3.1 is preemptive, DOS is preemptive.
<klange> what
<mrvn> heat: blows up can mean a lot of things
<klange> you are putting words in my mouth
<mrvn> klange: it interrupts the running process and schedules a different one.
<heat> mrvn: I know, and it's doing a lot of things; like general corruption of thread state and whatnot
<klange> Pre-emption is the ability to interrupt a userspace process and determine that another one needs to run, without that process having willingly yielded itself by making a system call.
<mrvn> klange: yep, win: 3.1 check, DOS: check
<klange> This requires some hardware ability to force the userspace process to enter the kernel. That's an interrupt.
<heat> i wonder if I missed some important detail on the riscv mmu
<klange> How that interrupt is triggered, whether it's an IPI from another processor, or a hardware IRQ from a mouse, or a timer, is irrelevant to pre-emption.
<mrvn> klange: even time sharing systems from the 60s are preemptive with that definition
<klange> Yes, very much so.
<mrvn> so basically everything but userspace coroutines is preemptive.
<klange> Yes?
<mrvn> I'm not finding that definition very useful.
<klange> > The term preemptive multitasking is used to distinguish a multitasking operating system, which permits preemption of tasks, from a cooperative multitasking system wherein processes or tasks must be explicitly programmed to yield when they do not need system resources.
<klange> > In simple terms: Preemptive multitasking involves the use of an interrupt mechanism which suspends the currently executing process and invokes a scheduler to determine which process should execute next. Therefore, all processes will get some amount of CPU time at any given time.
<mrvn> yeah, I think modern use is a bit different, especially in linux.
<klange> It's a term that, alone, has lost a lot of usefulness because the thing it describes has become a fundamental aspect of what is expected from an operating system: The ability to pretend a single core is running multiple uncooperative processes "simultaneously".
<mrvn> bingo
<gog> ok i was just trawling through the disassembly of my kernel and there's a movaps with xmm registers in there, should i be using -mno-sse?
<mrvn> gog: I don't know, should you?
<gog> ¯\_(ツ)_/¯
<mrvn> gog: what about fpu regs?
<klange> -mgeneral-regs-only is a good start
<gog> could've sworn i had that
<klange> until/unless you want to manage the kernel being able to use 'fancy' registers, which you can do if you want!
<heat> note that mgeneral-regs-only is kind of recent in x86 gcc/clang
<heat> used to be arm64 only
<heat> so update yer toolchains
<gog> ok
<gog> j`ey: neighborhood cat is curled up at my feet rn :3
<gog> she's been rubber ducky debugging for me all day
<mrvn> gog: better than on your keyboard
<gog> yes
<gog> ok -mgeneral-regs-only was not applied to all objects. fixed
<gog> i don't trust those fancy registers because i don't understand what using them might entail
<j`ey> XSAVE!
<klange> Usually just making sure you restore them like all the other ones, it's just they're kinda big...
<mrvn> gog: it will kill your multitasking since you aren't saving them
<gog> yeah i don't have anything for that implemented
<j`ey> save all the AVX512 registers..
<j`ey> gog: youre saving the normal GPR regs, right? :P
<gog> lol no why would i do that
<mrvn> don't you have to enable AVX before use?
<gog> yes of course i'm saving the GPRs
<heat> fxsave and xsave, fxrstor and xrstor
<heat> mrvn, yes
<klange> I think I have a... I don't want say bug, but missing feature in my task switching where I always save/restore fpu regs even if I'm just switching between kernel contexts...
<klange> need to fix that to only save coming in from userspace, and restore going back out...
<heat> oof
<mrvn> klange: or save it anyway and use SSE in kernel
<heat> it's not a bug if it works properly
<klange> it's a "thing that can be improved" rather than a bug, yes
<klange> mrvn: i don't wanna, you can't make me! *throws things*
<heat> when I try to allocate fpu space for user threads my kernel blows up
<heat> so you're a tiny bit better than me :P
<gog> my kernel doesn't even know what the FPU is
<gog> my "kernel"
<klange> you doin' x86-64 or old-skool 32-bit?
<gog> x86_64
<gog> 32-bit is a pain in the ass
<heat> osdev is a pain in the ass
<heat> go learn karate or something
<heat> way more fun
<j`ey> karate is a pain in the ribs
<mrvn> gog: FPU is that thing that gets called when your opcode is Exxx
<klange> you'll want to teach it what an fpu is, then, 'cause once you start making "real" userspace stuff you'll find that the compiler just assumes a certain level of fpu is available for normal stuff, as you've already seen
<gog> what's opcode
<heat> thing that make cpu go brrr
<klange> opcode i-
<gog> (not much what's opcode with you?)
<klys> the first byte is 0xEx ?
<klange> dammit heat i was gonna do the brrr
<mrvn> gog: that thing that calls your illegal-instruction handler
<gog> i don't have a handler implemented for #UD lol
<mrvn> klys: iirc. and it's a word on m68k, no bytes for opcodes
<heat> you'll be so confused when you hit UB and get a double fault
<klys> mrvn, m68k eh
<mrvn> gog: are you using clang?
<klange> gotta have panics stubs for all the exception interrupts, smash a bunch of strings in an array so you can look up a pretty name for them
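The "strings in an array" lookup klange describes could be sketched roughly as follows. This is only an illustration: `exception_names` and `exception_name` are made-up identifiers, not code from any kernel discussed here, and the table assumes the standard x86 exception vector assignments for vectors 0 through 21.

```c
#include <stddef.h>

/* Hypothetical table: the index is the x86 exception vector number. */
static const char *const exception_names[] = {
    "Divide Error (#DE)",              /*  0 */
    "Debug (#DB)",                     /*  1 */
    "Non-Maskable Interrupt",          /*  2 */
    "Breakpoint (#BP)",                /*  3 */
    "Overflow (#OF)",                  /*  4 */
    "Bound Range Exceeded (#BR)",      /*  5 */
    "Invalid Opcode (#UD)",            /*  6 */
    "Device Not Available (#NM)",      /*  7 */
    "Double Fault (#DF)",              /*  8 */
    "Coprocessor Segment Overrun",     /*  9 */
    "Invalid TSS (#TS)",               /* 10 */
    "Segment Not Present (#NP)",       /* 11 */
    "Stack-Segment Fault (#SS)",       /* 12 */
    "General Protection (#GP)",        /* 13 */
    "Page Fault (#PF)",                /* 14 */
    "Reserved",                        /* 15 */
    "x87 Floating-Point (#MF)",        /* 16 */
    "Alignment Check (#AC)",           /* 17 */
    "Machine Check (#MC)",             /* 18 */
    "SIMD Floating-Point (#XM)",       /* 19 */
    "Virtualization (#VE)",            /* 20 */
    "Control Protection (#CP)",        /* 21 */
};

/* Return a printable name for a vector; anything past the table is
 * reported as reserved/unknown instead of indexing out of bounds. */
const char *exception_name(unsigned vector)
{
    size_t count = sizeof(exception_names) / sizeof(exception_names[0]);
    if (vector < count)
        return exception_names[vector];
    return "Reserved or unknown";
}
```

A shared panic stub can then print `exception_name(vector)` next to the raw vector number before halting.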
<gog> i will never use clang stop asking
<heat> i sure as hell hope so
<heat> i gave you the patch
<gog> i ignored it
<gog> send a PR
<heat> take it
<klys> mrvn, you have a coldfire project?
<heat> i only send patches
<gog> so i can ignore it again
<mrvn> gog: too bad, clang actually produces ud2 opcodes. gcc just goes haywire on UB
<kingoffrance> as uptime is those cat-quick blinks between reboots, opcodes are surely those brief flickers between faults
<mrvn> klys: I have original Amigas
<klys> mrvn, an amiga 2000 ?
<gog> ok heat i'm gonna make a branch and apply the patch there
<mrvn> klys: 600, 1200, 4000
<klange> [Narrator: gog never merged the branch.]
<heat> gog: imagine a world where the compiler is friendly and looks out for you and your code, a world where LTO can be fast
<heat> a world where your compiler and the runtime libraries are liberally licensed
<klys> LTO is like libtool right?
<gog> do i still need -lgcc though
<mrvn> heat: what's wrong with libgcc licensing?
<heat> klys, no
<heat> fuck libtool
<mrvn> klys: LTO is like: cat *.c | gcc -
<heat> fuck libtool x100
<klange> you should be able to get gcc to produce fully libgcc-less kernel code for x86-64
<heat> mrvn: libgcc's licensing falls apart if you need to explicitly include it
<mrvn> klys: you can. I do.
<gog> also idk if i mentioned i tested with a bunch of threads all set to sleep for different intervals and it still seemed to work so yay!
<kingoffrance> mrvn, indeed, it was also called CMI, part of/related to ye olde oskit :D
<klys> so LTO is a TLA remind me what it stands for
<klange> link-time optimization
<mrvn> klys: just don't use "x / 10" anywhere in your code.
dormito has quit [Ping timeout: 256 seconds]
<klange> code optimization that happens at link time, rather than just compilation time, so you can mash together object files more efficiently than "here's a bunch of relocs, good luck"
<mrvn> klys: link-time-optimization. It stores the intermediate language in the .o files and on link it merges all the chunks and runs the optimizer over everything.
<heat> LTO is a compiler mode where it includes bytecode and ASTs inside your object files, and when linking it looks at the compiler data and optimises stuff
<klys> does LTO change the makefile?
<heat> you just add -flto to the cflags
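A small sketch of what cross-file optimization can buy: a trivial getter and setter live in one source file and are called from another. Without LTO each call is a real function call; with `-flto` passed at both compile and link time, the optimizer may inline them across the file boundary. The file names and the `counter_*` and `bump` identifiers are hypothetical, and the two "files" are shown in one listing (which also compiles as a single unit).

```c
/* ---- counter.c (hypothetical file) ---- */
static int counter_value;

/* Trivial accessors kept out of the header on purpose. */
int counter_get(void) { return counter_value; }
void counter_set(int v) { counter_value = v; }

/* ---- user.c (hypothetical file) ---- */
/* Only the prototypes are visible here, as they would be via a header. */
int counter_get(void);
void counter_set(int v);

/* Increment the counter through the accessors and return the new value.
 * With LTO the linker-stage optimizer can see the accessor bodies and
 * may reduce this to a plain increment of counter_value. */
int bump(void)
{
    counter_set(counter_get() + 1);
    return counter_get();
}
```

Whether the inlining actually happens depends on the compiler, the optimization level, and the final link, so check the disassembly rather than assume it.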
<mjg> wait for a paper that LTO does not improve squat over -O2 if you control for everything
<mrvn> klys: yes. you need to add the LTO flag and link with CC/C++
<heat> mrvn, it's actually not the compiler that does it, but the linker with plugins
<klys> does clang have LTO?
<mjg> [on a serious note i do believe properly implemented LTO gives great opportunities of course]
<mrvn> mjg: the only way LTO changes anything is if you have multiple compilation units.
<mjg> which is the standard in bigger projects
<heat> klys, ofc
<mjg> and even smaller ones
<klange> clang has LTO and is noted for being better at it than gcc, hence why it was even brought up
<mrvn> mjg: and then it changes tons and tons and tons.
<heat> clang even has -flto=thin which is like regular LTO but way faster
<j`ey> less LTOy
<heat> but way way faster
<heat> and
<heat> speed
<heat> is
<heat> keyyyyyyyyy
<mjg> see
<mjg> gcc likes to inject rep mov/rep stos sometimes
<mrvn> The nice thing about LTO is that you can put your trivial getters and setters in .c files without paying for it.
<mjg> instead of generating a call to memcpy/memset
<mjg> except when it does that, it gets slower
<mrvn> mjg: why?
<mjg> or to put it differently, compiler optimization are pretty hit and miss
<heat> depends on the -march
<mjg> mrvn: there is a huge startup cost to rep prefix
<heat> and the size of the memcpy
<klys> gcc -flto -o main.o -c main.c; clang -flto -o main -o main.o; ???
<mjg> mrvn: on most cpus
<mrvn> mjg: so? memcpy does rep too
<mjg> only for big sizes
<mjg> normally you do simd
<mjg> or even straight up movs
<mrvn> mjg: which require a branch, even worse
<mjg> no
<klys> gcc -flto -o main.o -c main.c; clang -flto -o main main.o; does this work?
<heat> mjg: if your compiler generates rep movs for small sizes, it's broken or you configured it wrong
<mjg> heat: gcc will inject them at times
<heat> klys, changing compilers? no clue
<mjg> heat: grep something big like the linux kernel
<mrvn> mjg: sure it does. the normal memcpy function call isn't optimized for known size. That's where the rep mov/rep stos comes from.
<heat> mjg: my kernel has a bunch of rep movsb but they're way less than a memcpy
<bslsk05> ​android.googlesource.com: libc/arch-x86_64/string/sse2-memset-slm.S - platform/bionic - Git at Google
<heat> and generally when size is bounded
<mjg> mrvn: pick any production memset/memcpy and you will find they don't do rep
<mjg> mrvn: even intel optimization manuals explicitly state to only do rep after simd becomes too slow
<heat> yes
<heat> and that's like 200-300 bytes in
<mrvn> mjg: I count 5 jumps right there
<mjg> well ye there are some jumps
<klange> intel specifically says they aim for rep mov to be the fastest, but have historically fucked it up
<mjg> and that's still faster than mere rep mov
<mrvn> mjg: and you can't use sse, only regular regs you told gcc.
<mjg> for kernel code, sure
<mjg> but even then it is faster to not use rep up to about 256 bytes
<mrvn> mjg: gcc is pretty good at knowing the cost of various memcpy() flavours and pretty good at picking the fastest for known sizes. c++ needs that so often that making it good is the biggest time saver.
<mjg> note when you do these sizes it's not very jumpy thanks to misaligned + overlapping stores
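The overlapping-store trick mjg mentions can be illustrated for one size class. The sketch below copies anywhere from 8 to 16 bytes with two 8-byte loads and two 8-byte stores and no per-size branching; when the length is below 16 the two chunks simply overlap. `copy_8_to_16` is a made-up helper name, and real memcpy implementations apply the same idea with wider registers and more size classes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes where 8 <= n <= 16.
 * The caller must guarantee the range; src and dst must not overlap. */
static inline void copy_8_to_16(void *dst, const void *src, size_t n)
{
    uint64_t head, tail;

    /* Fixed-size 8-byte memcpy calls are the portable way to express
     * unaligned loads and stores; compilers targeting x86-64 typically
     * lower each one to a single mov. */
    memcpy(&head, src, 8);                                 /* first 8 bytes */
    memcpy(&tail, (const unsigned char *)src + n - 8, 8);  /* last 8 bytes  */

    memcpy(dst, &head, 8);
    memcpy((unsigned char *)dst + n - 8, &tail, 8);        /* may overlap   */
}
```

Both loads happen before either store, so the overlap between the head and tail chunks never corrupts the data being copied.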
<mrvn> mjg: is it faster on a pentium?
<mjg> you mean pentium 1?
<mjg> i don't know on pre-amd64
<mrvn> mjg: or even 586.
<mrvn> mjg: if you are doing 32bit code the cpu target is rather low.
<mjg> i do remember very old memsets et al were doing hand rolled loops or had a jump table
<mjg> all my comments were about amd64
<heat> if your compiler is doing rep movsb of smaller targets, its broken because it knows exactly how big a memcpy needs to be before it uses rep movsb
<mjg> again see intel optimization manuals
<mrvn> mjg: if you are doing 64bit code you get a bunch of cpu optimizations for free because all 64bit cpus are more modern.
<mjg> they literally say to use simd
<heat> smaller sizes*
<mjg> and only resort to rep stuff for big sizes
<heat> i've never seen GCC do what you say it does
<heat> and I have looked at that specifically
<mjg> heat: i have to head off in few minutes. i have seen it several times, but don't remember the version number
<mjg> i'm happy to flame this tomorrow around the same time :)
<bslsk05> ​en.wikipedia.org: Duff's device - Wikipedia
<bslsk05> ​godbolt.org: Compiler Explorer
<mjg> mrvn: of course
<heat> actually this is a flawed example, just change the memcpy's length
<mjg> i did not claim gcc generates inline rep every time
<mjg> that would be incredibly broken
<mjg> i did claim it happens sometimes
<mrvn> args, stupid firefox messing up the copy&paste again
<mjg> afair it had to do with sizes known at compilation time and > 100 bytes
<mjg> 100ish
<mrvn> try memcpy(buf, buf2, 64);
<mrvn> On not-x86_64 the compiler has to know the alignment as well as the size for good code.
<klange> Duff's device is from the early 80s and has nothing to do with what's fastest for modern CPUs. I don't even think it was built for Intel CPUs at all.
<heat> AH YES OF COURSE
<heat> mjg: i know why you're seeing that
<heat> -Os
<heat> test it.
<mjg> no
<mrvn> mjg: 256 bytes uses vmovdqa, 512 bytes calls memcpy.
<mjg> i'm confident it was O2
<mjg> and simd disabled
<mrvn> mjg: you must have hit a case where it didn't have the register and the "rep" is cheaper than saving and restoring the regs.
<mjg> that is plausible, sure
<bslsk05> ​godbolt.org: Compiler Explorer
<mrvn> mjg: with -mgeneral-regs-only I get a bunch of "mov" with size 128 and memcpy with size 256
<mrvn> mjg: did you maybe see a strdup()?
<mjg> no, it was in some random code
<mjg> i grepped objdump -d
<mjg> on a big binary
<mjg> and no, it was not hand-rolled asm there
<mjg> i have to go now, happy to continue tomorrow same time - 1h
<heat> mrvn, see my example. it only tries to inline rep movsb when -Os
<mjg> for -Os that's pretty expected
<mjg> and i would not complain about this bit :)
<mrvn> heat: I bet there are cases where rep is better
<mjg> it is better starting about 2K
<mjg> if you can use simd
<mjg> and 256 bytes if you can't
<mjg> exact numbers depend on the uarch of course
<mjg> ok i'm really afk now, cheers
<gog> heat: oh dam it works
<gog> i had to git am -3 it but it worked lol
<gog> one conflict where i radically rewrote task.c
wootehfoot has quit [Read error: Connection reset by peer]
heat_ has joined #osdev
heat has quit [Ping timeout: 250 seconds]
<mrvn> Apropos LTO from earlier. My moose build system allows mixing LTO and non-LTO on a per directory basis. So you can LTO optimize all your heap functions but not pay the compile time to do it all.
mahmutov has joined #osdev
<mrvn> even per file.
<klange> i might want to look at ltoing my kernel again, been a while
<klange> all my userspace apps are single-source-file and then dynamically linked to libraries that are also single-source-file... except kuroko, which I should also look at ltoing
<klange> hm, or do I yolo build kuroko as a single source...
<klange> i yolo build kuroko, nvm
<heat_> mrvn, theoretically yes, but well crafted memcpys like glibc's are still faster
<klange> kernel is still built from object files, so it could use lto'ing; libc is too, but I'm super wary of LTOing libc
heat_ is now known as heat
<gog> yolo
<klange> $(CC) -o kuroko kuroko/src/*.c
<gog> heat: idk if you saw i applied your patch with git am -3 with one conflict and one reversion
<gog> and it builds and runs
<klange> give or take some CFLAGS
<mrvn> heat: memcpy() is so much better that gcc replaces copy loops with it.
<heat> klange, note that you can do -ffat-lto-objects and your object files get both LTO goo and object files
<heat> eh, machine code*
<mrvn> heat: you basically have to for boot.S
<heat> what?
<mrvn> heat: well, if you build temp libs for subdirs
<klange> since I only statically link libc in one place (ld.so) I might be okay? I think I just have outdated paranoia about LTO dropping symbols from a shared lib
<klange> and/or messing with things that should be [shudders at the thought] interposable
<heat> gog, where?
<mrvn> heat: threw me for a while. if you link boot.o bla.o blub.o with LTO the boot.o stuff disappears and you are only left with the ASTs.
<heat> no?
<gog> heat: reverted debug.sh and conflict in kc/core/task.c because i refactored the whole thing
<heat> mrvn: I link plenty of non-LTO stuff with LTO stuff, and everything is included
<mrvn> klange: if the function is externally visible then it won't get dropped.
<mrvn> heat: into object files?
<mrvn> heat: it's different for libs and binaries
<heat> gog, yeah but where is it
<gog> idle_task
<gog> __attribute__((used))
<gog> or you mean the branch? i haven't pushed it
<heat> mrvn: hmm, maybe static libs are different, but I'm talking about the final exec since I don't do static libs of directories
<heat> gog, do the push
<gog> ehhhh
<mrvn> gog: [[used]]
<gog> mrvn: i am using c2x on my gcc branch :o
<heat> tip: all the changes are probably super compatible with GCC, you just need to change the makefile slightly
<mrvn> heat: static libs are just ar files of all the .o file. I'm talking .so files.
<heat> i know they are
<mrvn> heat: if you link .o files into a larger .o (or .so) file you end up with a non-fat .o file. So only the AST chunks.
<heat> gog: have a CLANG=1 to use clang, and use x86_64-elf otherwise or something should be fine
<gog> heat: i'm considering a reworking of my build system soon and that could become part of it
<heat> i find it to be really useful to compile my stuff with two compilers
<heat> really irons out all the issues
<mrvn> heat: at least gcc is too stupid to notice when a .o file has no AST and only the binary code and drops it.
<heat> I've found gcc LTO to be significantly more picky than LLVM for example
<mrvn> heat: picky in what way?
<gog> haven't tried LTO
<heat> wrt function signatures for example
<gog> my project is tiny though so it'd probably be feasible
<mrvn> gog: try it early. less likely to break.
<heat> if one compilation unit sees void f(int); and the other one sees unsigned long f(unsigned long); gcc complains, clang doesn't
<mrvn> heat: I would say gcc is correct.
<heat> yes
<mrvn> it's undefined behavior.
<heat> everything I was doing was technically correct wrt ABI, but it's totally undefined
<mrvn> or actually an error. If you have a duplicate function then it must be identical.
<heat> my syscall table was just an array of a funcptr that took 6 unsigned longs and returned a long
<heat> due to the x86_64 ABI, it works, but it wasn't correct
<heat> I refactored it now, but yeah
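One well-defined way to build such a table is to give every entry the same real signature and have thin wrappers narrow the arguments before calling the actual implementation, so no function is ever called through a pointer of a different type. This is a sketch under that assumption, not heat's actual refactor; every identifier below is hypothetical.

```c
#include <stddef.h>

/* Every table entry really has this type, so calls through it are
 * well-defined C rather than something that merely works on one ABI. */
typedef long (*syscall_fn)(unsigned long, unsigned long, unsigned long,
                           unsigned long, unsigned long, unsigned long);

/* Implementations with their natural signatures. */
static long do_add(int a, int b) { return (long)a + b; }
static long do_negate(long v) { return -v; }

/* Wrappers: uniform signature, explicit conversion of each argument. */
static long sys_add(unsigned long a0, unsigned long a1, unsigned long a2,
                    unsigned long a3, unsigned long a4, unsigned long a5)
{
    (void)a2; (void)a3; (void)a4; (void)a5;
    return do_add((int)a0, (int)a1);
}

static long sys_negate(unsigned long a0, unsigned long a1, unsigned long a2,
                       unsigned long a3, unsigned long a4, unsigned long a5)
{
    (void)a1; (void)a2; (void)a3; (void)a4; (void)a5;
    return do_negate((long)a0);
}

static const syscall_fn syscall_table[] = { sys_add, sys_negate };

/* Dispatch by number; -1 stands in for a real "no such syscall" error. */
long syscall_dispatch(unsigned long nr, unsigned long a0, unsigned long a1,
                      unsigned long a2, unsigned long a3, unsigned long a4,
                      unsigned long a5)
{
    if (nr >= sizeof(syscall_table) / sizeof(syscall_table[0]))
        return -1;
    return syscall_table[nr](a0, a1, a2, a3, a4, a5);
}
```

The wrappers are boilerplate that a macro can generate, and an optimizing compiler will usually inline the implementation into them.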
<mrvn> heat: only works for register arguments and not with vargs
<mrvn> very fragile
<heat> why would I use varargs on a system call
<mrvn> just saying
<mrvn> .oO(sys_print)
<heat> cursed
<mrvn> or sys_log.
<heat> all POSIX functions that say they take varargs end up resolving the varargs in user-space
<heat> like open(2) for instance
<mrvn> heat: my OS isn't POSIX :)
<gog> i think my next step is user mode
<klange> do iiiit
<gog> i figured out syscall/sysret already
<heat> emperor palpatine vibes
<gog> i just haven't implemented anything for it
<mrvn> gog: you have threads without user space?
<gog> i have kernel threads rn
<gog> or tasks or whatever you wanna call them
<heat> well, go to user space
<mrvn> gog: no wonder your threads work. :)
<heat> if that's what you want to do that is
<gog> i'm positive userspace is gonna break shit
<gog> which is fine
<kazinsal> userspace contains users, and users break shit, so this tracks
<gog> yes
<heat> there's plenty of stuff you can still do in kernel space
<gog> probably
<heat> it turns out
<gog> i need to also refactor my memory management
<heat> kernel dev never ends
<heat> it's an unending cycle
<gog> there's a lot of TODO comments everywhere
<heat> just like osdev
<gog> i just don't wanna die without badly reimplementing UNIX
<heat> what you do need is ACPI
<heat> also PCI
<gog> yeah i've been investigating ACPI
<klange> still don't have much acpi beyond "where cores plz"
<heat> do you not handle pci irq routing?
<gog> i don't have any generic synchronization primitives yet
<mrvn> gog: the compiler has
<heat> oh god the sync primitives ahh
<mrvn> gog: don't write your own
<gog> ACPICA wants a few of them iirc
<klange> i have a chipset driver for piiix that gets me pretty far
<heat> mrvn, GCC doesn't include a mutex implementation i think
<mrvn> who needs a mutex. bah. bad code.
<heat> klange: you're doing toaru without any 3rd party code right?
<klange> aye
<klange> acpica is verboten
<heat> does that count in the bootloader?
<klange> my bootloader is all mine as well these days
<heat> idea: parse ACPI into a device tree
<heat> like an actual fdt
<klange> that's not a terrible idea, really
<klange> especially in EFI where it's more necessary to have that stuff and you've got some functions available for poking it...
<heat> do you?
<klange> nope, but I may actually think about it when my fdt parsing improves
JanC_ has joined #osdev
JanC has quit [Killed (calcium.libera.chat (Nickname regained by services))]
JanC_ is now known as JanC
<gog> there's only 4 functions in the acpi protocol
<gog> iirc
<gog> they're not very useful
<heat> UEFI code parses a device tree into AML btw
<gog> find rsdp, load a table, uhh
<heat> it does the reverse
<gog> ah i see
<heat> my idea was: black box inside the bootloader where you *can* use 3rd party code, the kernel just uses regular fdt code
<heat> if that doesn't work, welp
<klange> I will, eventually, write my own AML interpreter.
<klange> Probably after I finish my truetype hinter.
<heat> can you enforce MSI only?
<heat> might work better
<heat> i know vms are weird with that stuff, so you could use the chipset driver
<gog> klange: will you have truetype bytecode interpretation in-kernel like NT did
<gog> it's a great idea
<heat> if(!enable_msi()) do_legacy() where do_legacy would panic on non-piiix chipsets
<klange> no :P my kernel only has a bitmap font thing for its debug output, my truetype lib is userspace
<heat> modern PCs have PCIe, and PCIe requires MSI afaik
<mrvn> monochrome?
<gog> reject modernity, return to ISA
<gog> no PnP BIOS
<gog> jumpers baybeee
<klange> mrvn: not sure what you're asking
<mrvn> klange: if it's a monochrome font or greyscale
<klange> The kernel's text driver is 'monochrome' in that the bitmaps are either on or off, though it does support color and has a tiny DEC parser
<klange> The TrueType implementation, while currently unhinted, is antialiased.
<klange> It replaced a signed-distance-field renderer (that used baked signed-distance-fields stored as PNGs, previously BMPs)
<klange> Which itself replaced FreeType as part of aforementioned "no third-party code" project.
<heat> gog, hmmm that would kinda work too though
<klange> The current TrueType implementation only really looks marginally better than the SDF renderer as far as individual glyph presentation, but it had huge wins for kerning and basic Unicode support
<heat> each driver implements probe_irq(), which generates an IRQ, the irq subsystem would find out which IRQ the device is mapped to
<heat> checkmate ACPI
<klange> it was a pain in the ass to bake the glyph spritesheets for the SDF library, and I gave up on it before implementing any sort of glyph index tables or rect-packing optimizations, so it was infeasible to convert whole fonts, only basic ASCII subsets
<klange> now I just plop the real Deja Vu fonts in and I'm good - only third-party thing I use if you can accept/except libgcc :)
<heat> you could probably skip libgcc too
<klange> I can probably provide my own implementation of the necessary functions, but convincing gcc to universally not produce it for userspace things is... not fun.
<heat> init/fini_array take care of the constructors, and libgcc's interface isn't that complex, at least for modern 64-bit architectures
<mrvn> klange: reimplementing the gcc specific stack unwinding and exceptions and such is no fun
<klange> I don't actually think libgcc is providing anything init-related, it's mostly stupid things gcc can't be arsed to spit out directly
<heat> it includes crtbegin/end
<heat> which implements _init, etc
<klange> the stack unwinding and exception support mrvn mentions is probably the biggest "fuck this shit" part
<klange> there's no init code in crtbegin
<klange> ironically it has dtor code
<klange> deregister_tm_clones, register_tm_clones, __do_global_dtors_aux, and frame_dummy are all that's in there
<klange> crtend is... effectively empty
<heat> oh right right
<heat> _init and _fini are defined by crti.o
<heat> but crtstuff has all the good stuff
<heat> that's what goes into .init and .fini
<klange> crtstuff is a macro that pulls in those other things, and crti, n, and 0 are 'user'-provided, so those are already mine :)
<heat> crt1 > crt0
<heat> fite me
<klange> no no, your math checks out...
<klange> I happily expunged libgcc from the kernel, at least, on x86-64 and aarch64, with the right flags.
<klange> I abandoned 32-bit x86, but I recall divs being the killer thing that would drag in libgcc there
<mrvn> klange: x/10 in printfs kills me on ARM
<klange> I do not get those refs on aarch64
<mrvn> klange: had to implement my own div_10 that uses shifts and masks.
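For reference, a divide-by-10 built only from shifts, adds and a compare can look like the sketch below. This is a well-known shift-and-add formulation for 32-bit unsigned values, not mrvn's actual code; `div10` and `mod10` are made-up names. The idea is to approximate n * 0.8 with a short series of shifted adds, shift right by 3 to get an estimate of n / 10 that is at most one too small, then correct it using the remainder.

```c
#include <stdint.h>

/* Unsigned 32-bit n / 10 without a divide or multiply instruction. */
uint32_t div10(uint32_t n)
{
    uint32_t q, r;

    q = (n >> 1) + (n >> 2);        /* q ~= n * 0.75          */
    q = q + (q >> 4);               /* q ~= n * 0.796875      */
    q = q + (q >> 8);
    q = q + (q >> 16);              /* q ~= n * 0.8 (approx.) */
    q = q >> 3;                     /* q ~= n / 10, may be 1 too small */

    r = n - (((q << 2) + q) << 1);  /* r = n - q * 10, via shifts */
    return q + (r > 9);             /* fix up the off-by-one case */
}

/* Remainder derived from the quotient, again without multiplying. */
uint32_t mod10(uint32_t n)
{
    uint32_t q = div10(n);
    return n - (((q << 2) + q) << 1);
}
```

A printf-style decimal formatter can then peel off digits with `mod10` and `div10` without pulling in a library division helper.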
<mrvn> klange: aarch64 has a 64bit div opcode I assume
<heat> klange, do your atomics work? I've seen some atomics code for arm64 in compiler-rt
<klange> yes, I had to add flags for it, though
<heat> lse?
<klange> dunno about clang/compiler-rt, but for gcc it was because it wanted to check if it could do 8.1 atomics and if I forced it not to it would directly emit instructions and be good to go
<heat> yes exactly
<heat> compiler-rt and libgcc are completely compatible btw
<heat> clang even defaults to libgcc in linux for example
<klange> -mno-outline-atomics
<heat> yeah I see that
<heat> Out-of-line LSE atomics helpers
<klange> I've also got -ffixed-x18 thanks to geist, doing all my CPU-local stuff as direct register references
<heat> how does that work?
<mrvn> heat: gcc forgets x18 exists and you can use it in asm however you want
<heat> isn't there an abi register for the tp?
<klange> yes, but using a general reg is more flexible for code
dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]
<heat> why?
<klange> because system regs need mrs/msr instructions but general reg can be used in other operations directly
<heat> oh
<heat> how does user-space do TLS then?
<klange> it's unclear to me whether this is actually faster or if it's just saving a few bytes of code here and there, but it allows for `register struct ProcessorLocal * this_core asm("x18");`
<heat> ok it also uses mrs
<klange> I think userspace TLS is generally loading from the thread pointer register once and then just treating it as constant from then on, but for the ^ CPU-local case that's not feasible because it can change in a function call, so there would be way more loads necessary
<heat> well if it changes in a function call, that seems pretty broken to me
<mrvn> kernel or user code?
<heat> generally when you're doing percpu accesses you disable preemption, so you can't migrate
<klange> In kernel code, a function call may end up calling a yield, and when it returns you can be on a different CPU.
<heat> yes but you disable preemption so that doesn't work
<heat> it's how linux works at least
<mrvn> heat: and then the kernel dies with "Aieeee, scheduling in interrupt. Not syncing."
<heat> all percpu accesses need preemption disabled
<klange> I do not pre-empt kernel code, but that doesn't matter, a _function call_ may return on a different CPU.
<heat> mrvn, hm?
<klange> so if I want this_core->foo and then block_on_thing(), then this_core->bar afterwards may be a different CPU's bar
<mrvn> heat: calling yield in a critical section is not a good idea
<heat> well, don't call yield
<klange> no I have no multitasking :P
<klange> now*
<bslsk05> ​elixir.bootlin.com: percpu-defs.h - include/linux/percpu-defs.h - Linux source code (v5.16.10) - Bootlin
<mrvn> klange: for that I want "volatile struct PerCpu __seg_x18 *this_core;"