#osdev on 2022-02-18 — irc logs at libera.irclog.whitequark.org

2021-05-23 01:57 klange changed the topic of #osdev to: Operating System Development || Don't ask to ask---just ask! || For 3+ LoC, use a pastebin (for example https://gist.github.com/) || Stats + Old logs: http://osdev-logs.qzx.com New Logs: https://libera.irclog.whitequark.org/osdev || Visit https://wiki.osdev.org and https://forum.osdev.org || Books: https://wiki.osdev.org/Books

00:00 <klange> And the fact that subsequent boots are immediate suggests it's not a display problem. The only consistent thing here is "write to card, get 3 minute boot before ARM code runs"

00:00 <clever> and that sounds like its a stage2 problem not stage1

00:00 <clever> so you would instead want to do uart_2ndstage=1 in config.txt on the fat32

00:00 <clever> and then watch the uart

00:00 <klange> UART the only way to get logs out of that?

00:01 <clever> you can run the closed-source vcdbg under linux to query the logs

00:01 <klange> That requires a new SD card... >_<

00:01 <clever> https://www.raspberrypi.org/forums/viewtopic.php?p=1827424#p1827424

00:01 <clever> > There is an alternative implementation of part of vcdbg that you can try: https://git.venev.name/hristo/rpi-vc-log

00:01 <bslsk05> redirect -> forums.raspberrypi.com: Debugging device tree with yocto - Raspberry Pi Forums

00:01 <bslsk05> git.venev.name: rpi-vc-log - Unnamed repository; edit this file 'description' to name the repository.

00:01 <clever> this also claims to be an open source log dumper

00:01 <clever> if you could port that to your os, and give it the ability to read /dev/mem, it may work

00:04 <clever> oh, and on the subject of needing a 2nd card, there is a trick that is poorly documented

00:04 dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]

00:04 <klange> i wonder if usb boot has this problem, maybe i can just shove this image on a regular USB stick...

00:04 <clever> ~3 ways to trigger it

00:04 <clever> from linux, you can run `reboot 42` to set the boot partition

00:05 <clever> on startup, the firmware will look at the 42nd fat32 partition for all firmware

00:05 <clever> (mbr extended partitions)

00:05 <clever> using that, you can have multiple OS's, each with its own fat32 of firmware+kernel

00:05 <clever> and hot-swap between them

00:06 <mrvn> I should implement snake.

00:06 <clever> plan b: create an autoboot.txt on the 1st fat32 partition, set boot_partition=42 in there, and you can use the officially documented gpio conditionals like [gpio12=0] to change your mind

00:06 <clever> plan c: (for your os), you encode the partition# into the RSTS register (i can give docs) before you do a watchdog reset
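
A minimal sketch of the encoding step in plan c, assuming the layout described in the Linux bcm2835 watchdog driver, where the 6-bit partition number is spread over the even bits 0, 2, 4, 6, 8 and 10 of RSTS. The function name is made up for illustration, and the password and write sequence for the PM block are deliberately left out; check both against the docs clever offers before relying on this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: spread a 6-bit partition number over the even
 * bits (0, 2, 4, 6, 8, 10) of the value destined for RSTS.
 * Assumption: bit layout as in the Linux bcm2835 watchdog driver. */
static uint32_t rsts_encode_partition(uint32_t partition)
{
    assert(partition < 64);              /* only 6 bits are available */
    uint32_t rsts = 0;
    for (unsigned bit = 0; bit < 6; bit++) {
        if (partition & (1u << bit))
            rsts |= 1u << (2 * bit);     /* partition bit n lands on RSTS bit 2n */
    }
    return rsts;
}
```

Under this layout partition 42 comes out as 0x444 and partition 63 as 0x555.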

00:08 masoudd has quit [Remote host closed the connection]

00:08 <klange> I have a long weekend ahead and my main goal is to get PCIe initialized and try to start talking to the xhci.

00:08 <clever> oh, there are also 3 things you can do to cheat here

00:08 <clever> 1: only put firmware+kernel on usb, but keep rootfs on SD so you dont rely on pci-e just yet

00:09 <clever> 2: you can get xhci without pci-e!, the usb-c port is an xhci controller

00:09 <mrvn> Plan for next weekend: understand the relocations in ARM ELF32 and ELF64.

00:09 <clever> 3: put firmware+kernel on tftp, as you mentioned earlier

00:09 <klange> I specifically need the controller the integrated keyboard is attached to :)

00:10 <clever> ah right

00:10 <clever> thats on the pci-e vl805 one

00:10 <mrvn> clever: is that a Pi400?

00:10 <clever> yeah

00:10 <mrvn> Don't have one of those (yet)

00:11 <clever> k: and after you get pci-e online and setup the BAR's, you must tell the firmware to re-init the vl805

00:11 <clever> because RPF cheaped out and the vl805 firmware chip is missing

00:12 <klange> i read about that, they used to include the dedicated EEPROM for it, but in newer board revisions of the 4 they dropped it and now the VC needs to load the firmware, presumably from its own EEPROM?

00:12 <clever> yep

00:12 <klange> it's just a mailbox command after setting up PCIe and initializing a BAR for the xhci controller?

00:12 <clever> yep

00:12 <clever> there is a magic addr and data reg within the pci-e config space

00:13 <clever> if you put an addr into the addr reg, then the data reg acts as a proxy'd window to that addr in the vl805 addr space

00:13 <clever> the VC firmware uses that to side-load a vl805 bootloader, and poke debug registers to force a jump to it

00:13 <clever> and then the vl805 will dma its own firmware from the host ram, also arranged by the VC firmware

00:14 <clever> but it also doesnt have enough ram to hold its own firmware!

00:14 <clever> so it cant boot fully, until you configure the standard xhci scratch-space

00:14 <clever> it will then use that to steal host ram, and keep its firmware/state
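
The addr/data window described above is a common indirect-access pattern. The sketch below models it against a fake device in plain C so the flow is visible; the register indices, the byte-wide data access and every name here are placeholders, not the real vl805 offsets (those are in the notes linked below, not reproduced here).

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder register indices, NOT the real vl805 config-space offsets. */
#define WIN_ADDR_REG 0u
#define WIN_DATA_REG 1u

/* Stand-in for the device: an address register plus a private address space. */
struct fake_dev {
    uint32_t addr_reg;
    uint8_t  internal[256];
};

static void cfg_write(struct fake_dev *d, unsigned reg, uint32_t val)
{
    if (reg == WIN_ADDR_REG) {
        d->addr_reg = val;                       /* select the target address */
    } else if (reg == WIN_DATA_REG) {
        assert(d->addr_reg < sizeof d->internal);
        d->internal[d->addr_reg] = (uint8_t)val; /* write lands behind the window */
    }
}

static uint32_t cfg_read(struct fake_dev *d, unsigned reg)
{
    if (reg == WIN_ADDR_REG)
        return d->addr_reg;
    assert(d->addr_reg < sizeof d->internal);
    return d->internal[d->addr_reg];             /* read comes from behind the window */
}

/* Indirect write: put the address in the addr reg, then go through the data reg. */
static void win_poke(struct fake_dev *d, uint32_t addr, uint8_t val)
{
    cfg_write(d, WIN_ADDR_REG, addr);
    cfg_write(d, WIN_DATA_REG, val);
}

/* Indirect read: same two-step dance in the other direction. */
static uint8_t win_peek(struct fake_dev *d, uint32_t addr)
{
    cfg_write(d, WIN_ADDR_REG, addr);
    return (uint8_t)cfg_read(d, WIN_DATA_REG);
}
```

On real hardware `cfg_write` and `cfg_read` would be PCIe config-space accesses to the vl805 rather than accesses to a struct.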

00:15 <klange> if you have any docs for this that aren't "here's linux sources" I'd love to read them :D

00:16 <clever> only what ive deduced from decompiling things: https://github.com/librerpi/rpi-open-firmware/blob/master/docs/vl805.txt

00:16 <bslsk05> github.com: rpi-open-firmware/vl805.txt at master · librerpi/rpi-open-firmware · GitHub

00:16 <klange> this is already going to be enough fun as it's my first time seriously poking at an xhci controller / usb in general, so getting all the rpi/bcm unfun out of the way first would be most desirable...

00:17 <clever> there is a second xhci controller you can play with, that has no firmware to deal with

00:17 <clever> if you set otg_mode=1 in config.txt, the broadcom xhci will be routed to the usb-c port, and added to your device-tree

00:17 <clever> no pci-e, no firmware, just plain xhci

00:18 <clever> but its also not able to reach the keyboard, so only external devices

00:18 <klange> that might be worth looking at in the future, but as mentioned earlier, unfortunately my end goal right now is "talk to mushy membrane keyboard" :(

00:18 <clever> and the pi400 can only be powered via usb-c (no gpio tricks)

00:18 <clever> yeah

00:18 <clever> that membrane keyboard, is on a holtek MCU

00:19 <clever> which has its firmware in some write-once flash

00:19 <clever> it deals with converting the matrix into usb, but also deals with the power button logic

00:20 <clever> via unknown means, the holtek can run when "off", and can turn the power to the system back on when you hit f10

00:20 <klange> ah, I was going to ask if there was a way I could possibly get that signal without USB

00:20 <clever> and if you hold fn+f10 for just the right amount of time, it will signal a gpio on the soc, telling linux to shutdown

00:20 <klange> docs say 2 seconds

00:20 <clever> if you hold it for too short of a time, nothing happens

00:21 <clever> if you hold it too long, the firmware forces a hard power cut

00:21 <klange> which is probably too long for a crappy one-bit emergency input device...

00:21 <clever> also, the clean shutdown, only happens if you release the key

00:21 <clever> so you have no way of knowing if you held it long enough, until you're no longer holding it

00:21 <clever> one sec

00:22 <clever> # evtest --grab /dev/input/event0

00:23 <clever> with this under linux, i can see that fn+f10 just sends the standard KEY_POWER code

00:23 <clever> and --grab means systemd cant react to it

00:23 <clever> but, if i fn and tap f10, nada

00:23 <clever> but if i fn+hold f10, UPON RELEASE, i get the key-down and key-up events!

00:24 <klange> Hm, so, I'm holding this thing down for quite a while and nothing happens, so does that 'firmware forces a hard power cut' maybe need some message from the OS to enable?

00:25 <clever> it only works once start4.elf has booted

00:25 <clever> so if you're at the hdmi diagnostics, you're screwed

00:25 <clever> just pull the cord

00:25 <klange> I'm booted and watching a Doom demo

00:25 <clever> ive also seen some reports, that the first press of fn+f10 is always ignored

00:25 <clever> only the 2nd press works

00:25 <clever> and onward

00:26 <klange> hopefully my keyboard isn't broken ;) it was working the last time I was running a Linux on it

00:27 <clever> https://al.zerostem.io/~al/dzi/xray.html

00:27 <bslsk05> al.zerostem.io: Xray

00:27 <clever> this also has hi-res photos of some rpi products

00:27 <clever> including the official keyboard, which uses the same MCU

00:29 Burgundy has quit [Ping timeout: 272 seconds]

00:30 <klange> https://cdn.discordapp.com/attachments/711112727426367571/944027636408148038/IMG_8340.jpg

00:30 <klange> long pi

00:30 <clever> at least remove the uSD card before you remove the board

00:30 <clever> thats how cards get beheaded :P

00:31 <klange> ha; old photo anyway, from when it arrived

00:31 <clever> also, notice the empty pads for ethernet leds on the left end?

00:32 <klange> rightmost side of the board near the ETHERNET silkscreen?

00:32 <clever> yeah

00:32 <klange> can I populate those to get my blinkies?

00:32 <clever> possibly

00:32 <clever> the current limiting resistors are also missing

00:32 <clever> just probe those pads with a voltmeter, and you can probably figure it out

00:35 <klange> I am disastrously unfamiliar with these things, despite having been an early buyer of the original Model B a decade ago.

00:36 <clever> http://ledcalc.com/ is the key

00:36 <bslsk05> ledcalc.com: Current limiting Resistor calculator for leds

00:36 <clever> first, look at the pads, and figure out which one is ground, put the black lead there

00:36 <clever> then use the red lead of the voltmeter, to confirm which pad the voltage comes from for each led

00:36 <clever> and what voltage it is

00:36 <klange> I mean RPis in general, and I'm not that interested in blinken lights for the Ethernet.

00:37 <klange> Need to write a damn genet driver first anyway...

00:37 <clever> heh, yeah

00:37 <klange> the bsd one is only 1500 lines, can't be that bad?

00:37 <clever> and i think the pi4 series has at least 3 RGMII PHY's to deal with

00:38 <clever> genet isnt on the pci or usb bus

00:38 <clever> so genet may be simpler than pci-e

00:44 <clever> do you happen to have dwc drivers?

00:44 <clever> from past pi models

00:45 <klange> I have never targeted a Pi before.

00:46 <clever> ah

00:46 <klange> as I said, disastrously unfamiliar with these things :)

00:46 <clever> i'm dangerously familiar with the entire rpi line, lol

00:47 <clever> i know things about the hw that even the rpi engineers arent familiar with

00:47 <klange> things i imagine they might be contractually obligated to not try to know

00:48 <clever> 6by9 has clearly answered some questions with "i choose not to answer" i think it was

00:48 <clever> but multiple users claimed a feature doesnt exist, yet i had mountains of evidence it did

00:48 <clever> and i later found the disable flag, in the headers RPF released back in the pi1 days :P

00:48 <clever> multiple engineers*

00:50 <mrvn> might be stuff that only works sometimes

00:50 <clever> mrvn: it was a security flag for the 2d video output hw

00:50 <clever> its been set to non-secure mode since the very first pi firmware ever released

00:50 <clever> and the linux drivers only work when set to non-secure

00:52 <klange> i have so many things on my todo list

00:54 <klange> at least this works https://twitter.com/ToAruOS/status/1493900065383264259

00:54 <bslsk05> twitter: <ToAruOS> That would look a lot better if I had known I needed to ask for the full CPU speed! Despite being an early buyer of the original Model B, I’m not exactly experienced with the RPi’s. https://video.twimg.com/ext_tw_video/1493900031895941120/pu/vid/1280x720/5XgZqHPkGzmBQPDN.mp4?tag=12

00:54 <clever> i assume your not even using the 3d core yet?

00:55 <klange> nope

00:55 <clever> ive got docs and example code for the vc4 era 3d core

00:55 <clever> the bcm2711 3d core has under-documented differences though, but it does have mesa support

00:57 <clever> https://github.com/librerpi/lk-overlay/blob/master/platform/bcm28xx/v3d/v3d.c#L349-L356

00:57 <bslsk05> github.com: lk-overlay/v3d.c at master · librerpi/lk-overlay · GitHub

00:57 <clever> when in 2d mode, you must create an array of vertex data

00:57 <klange> I have yet to support any 3D acceleration, or even 2D acceleration, anywhere, and it'll be rather ironic if the VC is the first thing I support

00:57 <clever> https://github.com/librerpi/lk-overlay/blob/master/platform/bcm28xx/v3d/v3d.c#L409-L411

00:57 <bslsk05> github.com: lk-overlay/v3d.c at master · librerpi/lk-overlay · GitHub

00:58 <clever> and then you create an array of indexes into that vertex data, each set of 3 forms a triangle

00:58 <clever> the hw then figures out every pixel in the triangle, and runs the shader to compute each pixel's color

00:59 <clever> the hw will also interpolate the varyings between the 3 corners, giving a smooth transition from one to the other
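
To make the vertex array, index array and varying interpolation concrete, here is a small software model of what the hardware is described as doing for one pixel. It is only an illustration using barycentric weights; the struct layout and function names are invented here and are not the V3D control-list format.

```c
#include <assert.h>

/* One entry of the vertex array: a screen position plus one varying. */
struct vertex {
    float x, y;
    float varying;
};

/* Twice the signed area of the triangle (a, b, p). */
static float edge(float ax, float ay, float bx, float by, float px, float py)
{
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

/* idx holds three indexes into v, forming one triangle.
 * Returns 1 and stores the interpolated varying if (px, py) is inside it. */
static int shade_pixel(const struct vertex *v, const unsigned idx[3],
                       float px, float py, float *out)
{
    assert(v && idx && out);
    const struct vertex *a = &v[idx[0]];
    const struct vertex *b = &v[idx[1]];
    const struct vertex *c = &v[idx[2]];

    float area = edge(a->x, a->y, b->x, b->y, c->x, c->y);
    if (area == 0.0f)
        return 0;                                  /* degenerate triangle */

    /* Barycentric weights: each is 1 at its own corner and 0 on the far edge. */
    float wa = edge(b->x, b->y, c->x, c->y, px, py) / area;
    float wb = edge(c->x, c->y, a->x, a->y, px, py) / area;
    float wc = edge(a->x, a->y, b->x, b->y, px, py) / area;
    if (wa < 0.0f || wb < 0.0f || wc < 0.0f)
        return 0;                                  /* pixel is outside */

    *out = wa * a->varying + wb * b->varying + wc * c->varying;
    return 1;
}
```

A fragment shader would then turn the interpolated value into a colour, or use it as a texture coordinate.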

01:05 <mrvn> I'm waiting for a libvulkan for the VC.

01:06 <clever> mrvn: what does vulkan really do differently?

01:08 <mrvn> clever: have a nicer API (as horrible as it is) than the VC direct

01:09 <clever> got a pro/con list for each?

01:09 <mrvn> vulkan vs. vc?

01:09 radens has joined #osdev

01:09 <clever> vulkan vs mesa

01:10 <clever> or opengl?

01:10 <clever> you rarely use v3d directly

01:10 <mrvn> it's all chinese to the uninitiated

01:11 <radens> Say I have a passthrough hypervisor which I'm installing a gdb stub in. How do I pull a sibling core out of VM mode so I can poke its registers? Right now I give the guest complete access to the APIC etc.

01:12 <clever> radens: either the hypervisor has to own the irq controller and emulate one to the guest, or the irq controller needs to support VM's and route some irq to the hypervisor and others to the current guest

01:12 <mrvn> clever: vulkan is the next shiny toy. everybody has to use it

01:12 pretty_dumm_guy has quit [Quit: WeeChat 3.4]

01:12 <radens> clever: yeah that's what I was thinking but I haven't read much about apic virtualization.

01:13 <mrvn> clever: does the mesa for RPi let you set shaders?

01:13 <clever> mrvn: i believe it does

01:14 <mrvn> clever: do you have a shader compiler for the VC or do you have to write them by hand?

01:14 <clever> it compiles standard GLSL to QPU for you

01:15 <clever> the user supplies a vertex and fragment shader

01:15 <clever> but mesa then turns the vertex shader into both a coordinate and vertex shader

01:16 <clever> the coordinate shader takes a list of attributes for each vertex, and turns it into xy coords on a 2d screen, for binning

01:16 <clever> the vertex shader takes a list of attributes, turns it into xy again, but also produces a list of varyings

01:16 <clever> that then forms polygons (usually triangles)

01:17 <clever> the hw will then run the fragment shader on each pixel in the poly, and interpolate the varyings for you

01:17 <clever> https://www.youtube.com/watch?v=GHDh9RYg6WI

01:17 <bslsk05> '2d and 3d demo' by michael bishop (00:00:21)

01:17 <clever> in this demo, the varyings directly set the r/g/b channels

01:17 <clever> but in a more traditional setup, the varyings supply the xy coordinate of a texture, and the fragment shader will query that pixel in the texture

01:18 <gog> https://gist.github.com/adachristine/f4eeb37dc1bdf9d4aaa584e6aa603580 am i doing an atomic increment wrong? i just do "count++" on it and this is the code that it makes

01:18 <bslsk05> gist.github.com: gist:f4eeb37dc1bdf9d4aaa584e6aa603580 · GitHub

01:18 <gog> but it never seems to get incremented

01:18 <gog> should it be volatile?

01:19 <mrvn> where is the source?

01:19 <moon-child> shouldn't need to be volatile

01:19 <moon-child> debugger might help

01:19 <mrvn> 5347: f0 48 0f c1 05 f0 25 lock xadd %rax,0x25f0(%rip) # 7940 <count>

01:19 <mrvn> it's "volatile"

01:19 <gog> yeah i'm stepping it rn

01:20 <mrvn> gog: are you incrementing it in the IRQ handler and reading it in normal code?

01:20 <moon-child> ohh yea that's probably it

01:20 <gog> yes

01:20 <moon-child> slap an atomic_load in the normal code

01:21 <moon-child> see if it changes

01:21 <gog> aha ok

01:21 <clever> gog: the normal code doesnt know you're modifying the var, and cached the load

01:21 <mrvn> yeah, then it's a case for volatile. Except volatile is deprecated.

01:21 <clever> volatile is one solution, to forcing it into loading every time

01:21 <gog> i figured i didn't know the semantics

01:21 <moon-child> also, you can explicitly use a relaxed memory ordering

01:21 <moon-child> assuming this is per core

01:21 <mrvn> no need for atomic there actually

01:21 <moon-child> not appreciably faster but meh

01:22 <moon-child> mrvn: you do need it to be atomic, but it can be relaxed

01:22 <moon-child> cuz reentrancy

01:22 <mrvn> moon-child: not if the IRQ increments it and everyone else only reads

01:22 <moon-child> imagine you have a non-atomic rmw, and it gets preempted between the rm and the w

01:22 <moon-child> well sure

01:23 <mrvn> or, reentrancy would need atomic again

01:24 <mrvn> Is anyone using reentrant IRQs?

01:24 <moon-child> not for timer maybe but, say, event queues. I want to be able to push events both from an interrupt and from regular code

01:25 <gog> well it seems to have done the trick. now to debug the task switch. tomorrow though.
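
A minimal sketch of the fix discussed above, using C11 `<stdatomic.h>`: the IRQ side does a relaxed atomic increment and normal code does an atomic load, so the compiler cannot cache the value in a register. The handler and function names are illustrative, not gog's actual code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Shared between the IRQ handler and normal code. */
static atomic_uint_fast64_t tick_count;

/* Called from the (hypothetical) timer IRQ handler.
 * Relaxed ordering is enough for a plain counter. */
static void timer_irq(void)
{
    atomic_fetch_add_explicit(&tick_count, 1, memory_order_relaxed);
}

/* Called from normal code; the atomic load forces a real read every time. */
static uint64_t ticks_now(void)
{
    return atomic_load_explicit(&tick_count, memory_order_relaxed);
}
```

A loop such as `while (ticks_now() < deadline) { }` now re-reads memory on every pass instead of spinning on a stale copy.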

01:28 gog has quit [Quit: byee]

01:30 <mrvn> moon-child: so have the event driver just wake up the driver process.

01:31 <mrvn> event interrupt

01:31 <mrvn> single reader, single writer. makes the whole code much simpler.

01:32 <clever> mrvn: there is a thing most cpu's implement, that make that way simpler than you may even imagine

01:32 <clever> basically, a given 32bit int (width varies by cpu), that is written to a size-aligned address, will never be seen partially written

01:33 <clever> so the reader will only ever see the old value or the new value, and nothing in between

01:33 <mrvn> clever: most drivers need more than 32bit for a request

01:33 <clever> a: use a ring-buffer, and use that 32bit pointer to point to the next message

01:33 <clever> b: have an "is valid" flag in the message, and the reader polls that on the next message in the array

01:34 <mrvn> So then you need a lock for the request structure. And what if the interrupt happens while the request queue is locked. Now you have to schedule in interrupt. or have critical sections in the driver where IRQs get disabled.

01:34 <clever> write the entire structure first

01:34 <clever> if you get interrupted, the reader doesnt know it exists, no harm

01:34 <clever> once the structure is written, set an "is valid" flag with a 32bit write

01:35 <mrvn> clever: I'm talking hardware structures. You can't pick your own format for e.g. the descriptors for the NIC

01:35 <clever> boom, the reader will now obey it, and you must never modify that object again

01:35 <clever> xhci implements the exact protocol i just described
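
The "write the whole structure, then set the valid flag with one aligned write" protocol can be sketched as a single-producer, single-consumer ring in C11. This is a simplification under stated assumptions: all names are invented, and the consumer clears the flag to hand the slot back, whereas xHCI rings use a toggling cycle bit for the same purpose.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SLOTS    4u
#define PAYLOAD_WORDS 3u

struct message {
    uint32_t    payload[PAYLOAD_WORDS];
    atomic_uint valid;              /* written last by the producer */
};

struct ring {
    struct message slot[RING_SLOTS];
    unsigned head;                  /* touched only by the producer */
    unsigned tail;                  /* touched only by the consumer */
};

/* Producer: fill the slot completely, then publish it with one store. */
static int ring_push(struct ring *r, const uint32_t *payload)
{
    assert(r && payload);
    struct message *m = &r->slot[r->head % RING_SLOTS];
    if (atomic_load_explicit(&m->valid, memory_order_acquire))
        return 0;                   /* slot still owned by the consumer: full */
    for (unsigned i = 0; i < PAYLOAD_WORDS; i++)
        m->payload[i] = payload[i];
    atomic_store_explicit(&m->valid, 1u, memory_order_release);
    r->head++;
    return 1;
}

/* Consumer: poll the flag on the next slot; a half-written message is
 * simply invisible until the flag is set. */
static int ring_pop(struct ring *r, uint32_t *out)
{
    assert(r && out);
    struct message *m = &r->slot[r->tail % RING_SLOTS];
    if (!atomic_load_explicit(&m->valid, memory_order_acquire))
        return 0;                   /* nothing published yet */
    for (unsigned i = 0; i < PAYLOAD_WORDS; i++)
        out[i] = m->payload[i];
    atomic_store_explicit(&m->valid, 0u, memory_order_release);
    r->tail++;
    return 1;
}
```

The release store pairs with the acquire load, so a reader that sees the flag set also sees the complete payload. It only holds with exactly one writer and one reader per ring.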

01:36 <mrvn> clever: that's nice. That still leaves you the problem that you have to try&repeat writing the 32bit pointer.

01:36 <clever> there is no try/repeat, you just write it when the message is fully formed

01:37 <mrvn> clever: and then the IRQ and normal process overwrite each other

01:37 <clever> if you get interrupted, the scheduler will resume you normally, and you will finish the message

01:37 <clever> single-writer, the normal and irq should never fight over a given queue

01:37 <mrvn> clever: can't have 2 writers without extra checks

01:37 <clever> yep

01:37 <mrvn> "able to push events both from an interrupt and from regular code" ==> 2 writers

01:38 <clever> yeah, you would need 2 queues, and the reader must read both

01:38 <clever> or use atomics

01:38 <mrvn> best to avoid it unless really necessary

01:40 <mrvn> What you can do is have the IRQ write and the process read or vice versa. Or have the process write to an in-memory queue and the IRQ transfers from memory to hardware and back.

01:41 <mrvn> But I recommend always having clearly separate jobs for the driver process and IRQ handler.

01:42 <clever> for my 2d driver, a normal thread writes to the dlist memory in the hardware, but that on its own has no effect

01:42 <clever> that normal thread then stores the index for the start of a list, in normal ram

01:42 <mrvn> I'm also a big fan of the automotive factory model of multithreading.

01:43 <clever> the vsync irq handler, then copies that index to the "start of display list" register, causing the actual page-flip

01:43 <clever> so the normal thread sets up the next frame out of irq, and queues it up via a global var

01:43 <clever> and the irq handler is then the reader, and doing the actual page-flip on v-sync

01:43 <mrvn> one writer, one reader. easy.

01:44 <clever> yep
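
The page-flip scheme above (thread prepares the display list and publishes its index, vsync IRQ copies the index into the hardware register) can be sketched like this. The register is faked with a plain variable and all names are placeholders, not the real HVS register names.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Index of the next display list, published by the normal thread. */
static _Atomic uint32_t pending_dlist;

/* Stand-in for the hardware "start of display list" register. */
static volatile uint32_t fake_dlist_start_reg;

/* Normal thread: call only after the new display list has been fully
 * written to dlist memory. The release store publishes it in one step. */
static void queue_frame(uint32_t dlist_index)
{
    atomic_store_explicit(&pending_dlist, dlist_index, memory_order_release);
}

/* Vsync IRQ handler: the only place that touches the register, so the
 * flip always happens at the frame boundary. */
static void vsync_irq(void)
{
    fake_dlist_start_reg =
        atomic_load_explicit(&pending_dlist, memory_order_acquire);
}
```

If the thread queues two frames between vsyncs, the earlier one is dropped and only the latest is shown.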

01:44 <mrvn> can the VC do variable refresh rates?

01:44 <clever> there is an "idle" interrupt from the video subsystem

01:44 <clever> which implies it can generate 1 frame and then go idle

01:45 <clever> but i dont know how to configure that, or which outputs support it

01:45 <mrvn> There seem to be 2 standards for it and one you have to pay for.

01:45 <clever> vc4_regs.h:# define SCALER_DISPCTRLX_ONESHOT BIT(29)

01:45 <clever> there is a one-shot mode in the HVS

01:46 <clever> vc4_hvs.c: (oneshot ? SCALER_DISPCTRLX_ONESHOT : 0);

01:46 <mrvn> does that render a new frame each time you write a new dlist?

01:46 <clever> reading the src...

01:46 <clever> vc4_hvs_init_channel() accepts a `bool oneshot`

01:46 <mrvn> .oO(That's how I would implement it in the hardware)

01:47 <clever> and will put the channel into a one-shot mode

01:47 <clever> bool oneshot = vc4_crtc->feeds_txp;

01:47 <clever> ahh, and thats only used by the transposer

01:47 <clever> the transposer is used for rendering to ram

01:48 <clever> so one-shot makes sense there

01:48 <mrvn> It's rather stupid that we still draw a frame at 60Hz to a TFT that first stores it in memory and then draws it to the cells in its own time.

01:48 <clever> so we have no examples of variable refresh rate to actual displays

01:48 <clever> a lot of cheaper tft displays, arent capable of buffering anything

01:49 <clever> you must supply pixels at the correct refresh rate, or the lcd will decay between refreshes

01:49 <mrvn> clever: I've never seen a TFT that couldn't scale the input to fit full screen and that needs buffering.

01:49 <clever> if its capable of scaling, then it has a buffer

01:49 <clever> and thats not part of the display, thats part of the controller

01:49 <clever> i'm thinking low-level stuff like DSI and DPI

01:50 <mrvn> I'm talking desktop monitors here. The embedded TFT displays are a different matter.

01:50 <mrvn> the RPi TFTs probably don't have a buffer.

01:50 <clever> oh, and scaling only needs to buffer 2 scanlines anyways

01:50 <clever> you dont have to buffer the full frame

01:51 <clever> just store line3 and line4, and interpolate as you generate lines 3.1 3.4 3.6 3.8 4

01:51 <clever> and once you cross 4, you received 5, and can repeat
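
A rough sketch of the two-scanline idea: with only two source lines buffered, each output line in between is a weighted blend of the pair. This covers the vertical direction only, uses 8-bit single-channel pixels and a fixed-point weight, and is not how the HVS scaling kernel is actually programmed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Blend two buffered source scanlines into one output scanline.
 * frac is the position between them in 1/256 steps:
 * 0 reproduces `upper`, 256 reproduces `lower`. */
static void blend_lines(const uint8_t *upper, const uint8_t *lower,
                        unsigned frac, uint8_t *out, size_t width)
{
    assert(frac <= 256);
    for (size_t x = 0; x < width; x++)
        out[x] = (uint8_t)((upper[x] * (256u - frac) + lower[x] * frac) >> 8);
}
```

Once the output position crosses the lower line, that line becomes the new upper one and the next source line is read in, so no more than two lines are ever held.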

01:51 <mrvn> anyway, even without buffering the TFT holds the picture a long time. Having the refreshes happen at irregular intervals of roughly 60Hz would always be fine.

01:51 <clever> the rpi even does that internally, for its image scaling

01:52 <mrvn> Unlike a CRT where the beam follows a fixed pattern that you can't just pause.

01:52 <clever> i should play with that one-shot flag, and see if i can make it do things

01:52 <clever> the design of the hvs is still heavily based around the concept of racing against the electron beam

01:53 <clever> after you configure a list of images to display, the HVS will composite them into a scanline, and push that into a fifo

01:53 <mrvn> Would be nice for games. Render the frame, push it out the display port, render next frame, push it out. If the rendering takes a bit longer or shorter? Who cares.

01:53 <clever> and the fifo lets the hvs do a bursty/jittery job, as dram latencies slow it down

01:53 <clever> yeah, thats a problem in the opengl api

01:54 <clever> if it takes you 1.1 frame-times to render, you may miss every 2nd vsync

01:54 <clever> and your 60fps instantly turns into 30 fps
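
The 60 to 30 fps drop is just rounding up to whole refresh periods. A small worked version, assuming strict vsync where a finished frame waits for the next vsync and rendering of the following frame only starts after the flip; the function names are made up.

```c
#include <assert.h>

/* Number of refresh periods one frame occupies: render time rounded up. */
static unsigned vsyncs_per_frame(unsigned render_us, unsigned period_us)
{
    assert(period_us > 0);
    unsigned n = (render_us + period_us - 1) / period_us;   /* ceiling */
    return n ? n : 1;
}

/* Resulting frame rate for a given refresh rate and render time. */
static unsigned effective_fps(unsigned refresh_hz, unsigned render_us)
{
    assert(refresh_hz > 0);
    unsigned period_us = 1000000u / refresh_hz;              /* 16666 at 60 Hz */
    return refresh_hz / vsyncs_per_frame(render_us, period_us);
}
```

At 60 Hz a render time of 18333 us (about 1.1 frame-times) needs two periods per frame, which gives 30 fps.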

01:54 <mrvn> it's a hardware problem that should have been fixed since TFTs came about.

01:54 <clever> oh, that reminds me of other weirdness

01:55 <clever> my lcd monitor, cant go much beyond 60hz

01:55 <clever> my crt monitor, can easily do 120hz

01:55 <mrvn> Also why do PCs never generate a vsync interrupt so videos don't tear?

01:55 <clever> xorg doesnt have a good api for that

01:56 <clever> you know about front and back porches?

01:56 <mrvn> The odd part is that if the video tears it ALWAYS tears at the same spot in the frame as far as I can tell. So something synchronises it exactly to the frame rate but just offset so it tears.

01:56 <clever> i had that exact bug, when i would write the whole dlist on vsync

01:57 <clever> it was taking too long, so it page-flips 10% into the frame

01:57 <clever> thats why i now pre-write the dlist out of irq, before vsync even occurs

01:57 <clever> https://github.com/librerpi/rpi-open-firmware/blob/master/docs/pixelvalve.md#pv-inten

01:57 <bslsk05> github.com: rpi-open-firmware/pixelvalve.md at master · librerpi/rpi-open-firmware · GitHub

01:57 <mrvn> yeah, the flip has to be atomic

01:57 <clever> i also have 10 interrupts to pick from

01:58 <clever> hsync start, h-backporch start, hactive start, h-frontporch start, vsync start, v-backporch start, vactive start, v-frontporch start, f-frontporch end, and idle!

01:58 zaquest has quit [Remote host closed the connection]

01:58 <mrvn> clever: I think there it shows that the VC was made to display DVDs. Can't have tearing in a DVD player.

01:59 <clever> the HVS can also accept yuv420 images right out of the mpeg2 decoder

01:59 <clever> so you dont have to spend any cpu on yuv->rgb conversions

01:59 <clever> oh, and the bcm2835, was in the roku2, a media streaming brick!

01:59 zaquest has joined #osdev

02:02 <mrvn> It's really sad. Back in the 90s on my Amiga I had perfectly synced video output without tearing. Hell, in the 80s on my C64 I had that too. Now we have 2022 and videos tear.

02:02 <clever> have you seen what the emu68 guys are doing?

02:02 <mrvn> no

02:02 <clever> https://www.youtube.com/watch?v=wzPlVn-a8yg

02:02 <clever> watch this

02:02 <bslsk05> 'Amiga 2000 playing MP3 while rendering [ PiStorm / Emu68 ]' by Simo Koivukoski (00:02:30)

02:03 <clever> that is an rpi, replacing the 68k cpu of an amiga 2000, and emulating just the cpu, and then bit-banging the amiga bus to drive the rest of the system

02:03 <mrvn> How did they interface the RPi?

02:04 <clever> the GPIO pins are basically just going thru a level shifter and into the addr/data bus

02:04 <mrvn> wow. Bit banging at 16MHz is hard.

02:04 <clever> the SMI peripheral speeds that up a lot

02:05 <mrvn> Plus you have a shared bus and have to honor the bus protocol

02:05 <clever> the rpi has a dedicated hw interface, for controlling an 18bit data bus with a 6bit addr bus

02:05 <clever> and support for external signals to stall it

02:05 xenos1984 has quit [Read error: Connection reset by peer]

02:05 <clever> https://www.youtube.com/watch?v=-MAnG86SmUk

02:05 <bslsk05> 'Amiga 2000 4K UHD RTG [ PiStorm / Emu68 ]' by Simo Koivukoski (00:03:05)

02:05 <clever> they then wrote RTG drivers for the HVS with my help

02:06 <clever> so you now get 4k uhd on amigaos :P

02:06 <clever> https://www.youtube.com/watch?v=xf97KXPZ1Lo and here is an FPS demo

02:06 <bslsk05> 'piStorm Emu68 barebone FPS showcase' by Tomaz Kac (00:13:45)

02:06 heat has joined #osdev

02:06 <zid> I just wanted to take a moment to point out how amazing the playstation docs are

02:06 <zid> 16-31 0 Not used (usually 0 for OTC, or 1 ("one block") for CDROM)

02:06 <mrvn> clever: but that's on the RPi video, right?

02:07 <zid> they're 0, not used, but also 0, and also 1

02:07 <clever> mrvn: yeah, the RTG drivers are passing bitmaps from the amigaos guest to the rpi HVS, where it then comes out hdmi

02:08 <clever> mrvn: the crazy thing, is that none of these drivers are implemented in the host, the MMIO is just directly exposed to the guest, and all rpi drivers get compiled to 68k asm, lol

02:08 <mrvn> clever: is that the normal bitmapped data or chunky like any modern hardware has?

02:08 <clever> mrvn: https://www.youtube.com/watch?v=x7eqt2ogtwc

02:08 <bslsk05> 'Tweaking VC4 scaler engine' by Michal Schulz (00:01:55)

02:09 <clever> i pointed the emu68 dev towards the scaling kernel, and this then popped out a few days later, lol

02:09 <clever> those parameters control how it up-scales your low-res bitmap to the hdmi resolution

02:09 <clever> and let you get either fuzzy or blocky pixels

02:10 <mrvn> But from the screen mode selector that looked like chunky pixel. So basically the RPi is a separate graphics card add-on.

02:11 <clever> yeah, from the viewpoint of the amigaos, its using the RTG (retargetable graphics) api

02:11 <clever> which is what a lot of amiga add-on cards used

02:11 <clever> instead of the program directly talking to the hw, it uses a proper api, which you can then replace routines in, to create new drivers

02:12 <clever> but older amiga programs dont speak RTG, and still spit video out of the original amiga video chip

02:12 <mrvn> clever: all the GUI stuff is nice that way. But games often didn't use the OS.

02:12 <clever> exactly

02:12 <clever> there is a separate project, that clamps onto the stock video chip, and sniffs the digital rgb signals going to the DAC

02:12 <mrvn> Is there code to transform the graphics memory into a texture for the RPi?

02:12 <clever> it then plugs into the camera port of an rpi, and shows up as a video stream

02:13 <mrvn> that's cool too.

02:13 <clever> you cant just read the graphics memory, because there are chips on the motherboard modifying the bitmap without the cpu being aware

02:13 <mrvn> clever: you can read it. or snoop the bus.

02:14 <clever> the 68k cpu port cant snoop that bus

02:14 <clever> its pins dont change when another device does DMA

02:14 vin has quit [Ping timeout: 240 seconds]

02:14 <clever> and you would burn a ton of bus bandwidth fighting the onboard GPU over re-reading the graphics memory

02:14 <mrvn> oh? It only gets the bus-busy signal and doesn't see the address and data lines?

02:14 <clever> other chips on the board also change the gpu config on h-sync interrupts

02:15 <clever> so the bitmap you're reading, wont even be at the same address for a full frame

02:15 vin has joined #osdev

02:15 <clever> all kinds of crazy stuff is happening

02:15 dude12312414 has joined #osdev

02:16 <mrvn> clever: not just h-sync. The Amiga has a processor that runs code in sync with the beam. You can wait for screen positions and then modify graphics registers on the fly.

02:16 <clever> exactly

02:16 <clever> and you have no easy way to know what the config is at a given pixel and re-creating the image

02:16 <mrvn> well, you can read the code

02:17 <clever> you also have to emulate the code on both the 68k and the copper, to cycle level accurate

02:17 <clever> and at that point, you might as well just be using a full system emulator on a normal system, and ditch the amiga board

02:18 <mrvn> And if you emulate the copper and alice and all you wouldn't need to write the data to the slow graphics memory. Just keep it in RPi memory.

02:18 <mrvn> :)

02:18 <clever> those chips might be reading data the floppy controller wrote to chipram

02:18 <mrvn> not sure the floppy does DMA. but I get the point.

02:18 <clever> it does

02:19 <clever> the floppy controller writes the raw data directly to ram, a whole track at a time

02:19 <mrvn> Do you know what an emu68 interface costs?

02:19 <clever> and the copper then does demodulation

02:19 <clever> the board is called pistorm, and its for sale on ebay and other places

02:20 <clever> looks like about $37 on ebay

02:21 <mrvn> Crap, A500, A500+ and A2000 only. I could do A600, A1200 or A4000.

02:22 <clever> where did i put those notes...

02:23 <clever> ah, pistorm600 is for the A600

02:24 <mrvn> Looks like quite a bit of hardware on the PiStorm to handle the bus protocol.

02:24 <clever> for the old revision, it was just 8bit latches and thats it

02:24 <clever> the newer one uses an fpga

02:25 <clever> got a discord acct?

02:25 <mrvn> Hard to bit bang on the clock edge otherwise.

02:25 xenos1984 has joined #osdev

02:29 <mrvn> Just the other week I've been thinking about what it would take to make a Zorro III board for the amiga that connects to an arduino or RPi.

02:30 <mrvn> Taking over the cpu is even cooler though.

02:41 zid has left #osdev [#osdev]

02:45 [_] has joined #osdev

02:45 [itchyjunk] has quit [Killed (silver.libera.chat (Nickname regained by services))]

02:45 [_] is now known as [itchyjunk]

02:46 _xor has quit [Quit: brb]

02:47 ElectronApps has joined #osdev

02:53 epony has quit [Ping timeout: 240 seconds]

03:35 pounce has joined #osdev

03:49 radens has quit [Quit: Connection closed for inactivity]

03:58 Maka_Albarn has joined #osdev

03:59 <Maka_Albarn> Hey, does anyone know of a way to write multiple assembly functions at the same time with AT&T assembly? And I mean functions that do the same thing with an iterating value?

04:00 <Maka_Albarn> I'm wanting to avoid having to write individual stubs for IRQs if I can make a macro to fill in the contents.

04:06 <heat> .macro and .endm

04:06 <heat> you can also use .rept and (.endp? not sure) plus some .set trickery to do an effective for loop of macros

04:13 <Maka_Albarn> would this work? https://gist.github.com/MakaAlbarn001/3b212c8fffb16f87cae2c1d76dd2066e

04:13 <bslsk05> gist.github.com: macro.s · GitHub

04:13 <geist> and/or you can compile it with gcc and use the preprocessor

04:13 <geist> #define and whatnot

04:13 <geist> yah that should work, or something like that

04:13 <Maka_Albarn> just multidefine irq x?

04:13 <geist> there's even a way to spit out 256 of them in a row with an incrementing value

04:14 <Maka_Albarn> \me's eyes widen

04:14 <geist> https://github.com/littlekernel/lk/blob/master/arch/x86/64/exceptions.S#L27 etc

04:14 <bslsk05> github.com: lk/exceptions.S at master · littlekernel/lk · GitHub

04:14 <Maka_Albarn> stupid formatting

04:14 <geist> that's a little fancier than you need, but the .rept stuff + .set i = i + 1 lets you iterate with a different value between runs

04:15 <klange> well that was a huge pain in the ass https://cdn.discordapp.com/attachments/711112727426367571/944084329762521158/IMG_8678.jpg

04:15 <Maka_Albarn> hmm

04:16 <geist> i did it that way so that each of the ISR functions are a fixed size so when setting up the IDT i can just bump a pointer forward

04:16 <geist> and plus they're aligned for performance (as if it matters)

04:16 <geist> but you could easily have that spit out a isr0: isr1: isr2: etc

04:16 <Maka_Albarn> what happens in assembly if you have something like pushl $(1+3)?

04:16 <klange> i have this mess https://github.com/klange/toaruos/blob/master/kernel/arch/x86_64/irq.S#L13-L87

04:16 <bslsk05> github.com: toaruos/irq.S at master · klange/toaruos · GitHub

04:17 <Maka_Albarn> does it run as pushl $4?

04:17 <geist> it should

04:17 <clever> https://github.com/librerpi/rpi-open-firmware/blob/master/firmware/traps.S#L30-L44 and i have this series of .macro's

04:17 <bslsk05> github.com: rpi-open-firmware/traps.S at master · librerpi/rpi-open-firmware · GitHub

04:17 <geist> assembler can solve simple arithmetic

04:17 <Maka_Albarn> I just wanted to make sure that it wouldn't read it as a memory location or something stupid

04:18 <clever> Maka_Albarn: ive also done symbol - symbol as well, to compute sizes: https://github.com/librerpi/lk-overlay/blob/master/platform/bcm28xx/arm/payload.S#L18-L19

04:18 <bslsk05> github.com: lk-overlay/payload.S at master · librerpi/lk-overlay · GitHub

04:18 <clever> this turns a start/end symbol, into a start/length value

04:19 <clever> and then on the c side, i claim arm_payload_array is an array of structs, so it turns into a simple structure

04:23 <Maka_Albarn> heat: Thank you. That worked perfectly

04:24 heat has quit [Ping timeout: 240 seconds]

04:27 <klange> this was an amusing thing to debug

04:27 <klange> first I had an issue where I wasn't even sending the wakeup signal correctly because I used one of my MMU APIs that I had forgotten considered '0' a special value _and guess what page the spin tables are in_...

04:28 <klange> Then I struggled with cache problems, as I was trying to store things in the kernel and then reference them without the mmu in the AP startup sequence

04:28 <klange> and then I was scratching my head for a while until I realized I forgot to flush _one_ variable...

04:29 <clever> ahh

04:29 <klange> I ended up debugging the whole thing by reusing the _one_ piece of data I knew was working: the spintable.

04:30 <klange> So after the asmstub code jumped to my code, I replaced the spintable value with a series of other values, which the main cpu was reading in a loop

04:30 <clever> ahh

04:30 <clever> i did that kind of debug, by just printing a char to the uart for each core

04:31 <clever> like printing out '0' + corenr

04:32 <Maka_Albarn> \me

04:33 * Maka_Albarn hums

04:34 <klange> the pi is a bit warmer, but not on fire, I'll let it spin on doom for a while

04:34 <clever> the official firmware will also do thermal throttling

04:34 <clever> so the arm freq will be changed without warning, if things start to cook

04:35 <klange> I don't have anything monitoring it, but Doom is still as smooth as it was. I'm using ~25-50% of each core according to my idle stats

04:35 <clever> oh, i can tell you how to read the temp sensor

04:36 <clever> https://github.com/librerpi/lk-overlay/blob/master/platform/bcm28xx/temp/temp.c

04:36 <bslsk05> github.com: lk-overlay/temp.c at master · librerpi/lk-overlay · GitHub

04:36 <clever> this came from the linux source

04:36 <clever> you can entirely ignore setup_tsens, the official firmware already did that for you

04:37 <clever> so you just read TS_TSENSSTAT and do a bit of math

04:37 <clever> my code is using floats, but linux was doing fixed-point math instead

04:47 dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]

05:00 vdamewood has joined #osdev

05:00 <Matt|home> well that's a bit demotivating. initial research into foss contributions is showing me "yeah we don't need new coders, go document stuff for us". bleh..

05:00 <dmh> tell them to kick rocks

05:02 <moon-child> yes oss is a scam

05:02 <moon-child> fuck the church^Wcathedral

05:03 vinleod has joined #osdev

05:03 <Matt|home> im not talking about a particular project, just saying it seems like my limited skillset isn't in high demand right now. i could be very wrong

05:04 vdamewood has quit [Killed (molybdenum.libera.chat (Nickname regained by services))]

05:04 vinleod is now known as vdamewood

05:12 <kingoffrance> actually, this is a good timeto note, part of the problems with the bazaar v cathedral comparisons, is grace/new testament went out with monarchies. its always been "law" "no man is above the law" for a long time. there are no cathedrals for centuries

05:12 <kingoffrance> it was always kind of an idiotic comparison

05:13 <kingoffrance> ditto corporations replaced bazaars centuries ago

05:13 <kingoffrance> people should be a lot less shallow with analogies

05:19 the_lanetly_052 has joined #osdev

05:19 Jari-- has joined #osdev

05:19 the_lanetly_052 has quit [Max SendQ exceeded]

05:19 <Jari--> morning all

05:19 <kingoffrance> one million dollars </dr evil> you know thats not a lot of money?

05:20 the_lanetly_052 has joined #osdev

05:20 <kingoffrance> its best to retire that analogy

05:21 the_lanetly_052 has quit [Max SendQ exceeded]

05:21 <klange> "Don't you think we should ask for more than a million dollars? A million dollars isn't that much money these days..."

05:21 the_lanetly_052 has joined #osdev

05:21 <klange> "Alright then... FIVE MILLION DOLLARS!" "Virtucon alone makes over nine billion dollars a year." "Oh, really? One-hundred billion dollars. OK, make it happen. Anything else?"

05:21 eroux has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

05:22 the_lanetly_052 has quit [Max SendQ exceeded]

05:22 the_lanetly_052 has joined #osdev

05:23 the_lanetly_052 has quit [Max SendQ exceeded]

05:24 the_lanetly_052 has joined #osdev

05:25 the_lanetly_052 has quit [Max SendQ exceeded]

05:25 the_lanetly_052 has joined #osdev

05:27 the_lanetly_052 has quit [Max SendQ exceeded]

05:27 the_lanetly_052 has joined #osdev

05:31 <Maka_Albarn> Ooooookaaaaaay, I set up the IRQ remapping and installed the interrupt entry points into the IDT. I turned on interrupts with "sti" and got GP fault with an error code of 0x103.

05:31 <Maka_Albarn> ummmm, what?
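
One way to read that error code: when a #GP on x86 refers to a descriptor, the error code uses the selector error code layout (bit 0 = EXT, bit 1 = IDT, bit 2 = TI, bits 3-15 = index). The sketch below just splits the value into those fields; the function name is hypothetical and nothing here comes from Maka_Albarn's kernel.

```python
def decode_selector_error_code(code: int) -> dict:
    """Split an x86 selector-format error code into its fields."""
    return {
        "external": bool(code & 0x1),   # EXT: raised while delivering an external event
        "idt": bool(code & 0x2),        # IDT: index refers to a gate in the IDT
        "ti": bool(code & 0x4),         # TI: GDT (0) or LDT (1), only meaningful if idt is False
        "index": (code >> 3) & 0x1FFF,  # descriptor / vector index
    }


info = decode_selector_error_code(0x103)
print(info)  # index 32 in the IDT, raised during external interrupt delivery
```

For 0x103 this gives IDT entry 32 with EXT set, i.e. the fault happened while the CPU was delivering a hardware interrupt through vector 32. Assuming the usual PIC remap to 0x20 (not stated in the log), that would be IRQ0, which points at the IDT gate itself rather than the handler code.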

05:32 <klys> does your handler ack the interrupt?

05:32 <Maka_Albarn> ???

05:32 <klys> out 0x20,0x20

05:32 <Maka_Albarn> I know it happens during my main() for loop.

05:33 <Maka_Albarn> klys: yes, my handler gives my PIC the End-of-Interrupt

05:34 <klys> do you return from 32 bit code with iretd ?

05:34 <klys> if it was 64 bit code, iretq?

05:35 <Maka_Albarn> it's all 32-bit code

05:35 <Maka_Albarn> iret

05:36 <Jari--> apple juice makes brains work better.. umph umph some summer wine yesterday :-]

05:36 <klys> is your code segment executable?

05:37 <Jari--> exciting to get asynchronous code errors, never know where the fun is at

05:40 <Maka_Albarn> umm... supposedly my Data segment selector is reading as 0x100010... that's not right...

05:41 <Maka_Albarn> is the segment selector supposed to hold 3 bytes?

05:41 <Maka_Albarn> I don't think so...

05:41 <klys> only 16 bits

05:54 <klys> geist, i spent a bit less on my new nezha sbc

05:54 <klys> visionfive appears to sell for 209. advantages?

05:55 <klys> oh this has 8gb ram

05:55 <Maka_Albarn> I found the problem. I accidentally swapped the attribute and selector arguments for my idt_set_gate() function for my IRQs.

05:56 <klys> maka_albarn, good going!

05:59 bradd has quit [Ping timeout: 240 seconds]

06:08 [itchyjunk] has quit [Remote host closed the connection]

06:13 _xor has joined #osdev

06:14 <Maka_Albarn> time to go. got work tomorrow.

06:14 Maka_Albarn has left #osdev [#osdev]

06:16 <geist> klys: yah they might have a 4GB version at some point, but it seems to be greyed out

06:17 <geist> main reason i'd generally stay away from the nezha is its using the allwinner D1 which uses a C906 core which is pretty ancient

06:17 <geist> and ancient as in it's using an older privileged spec, etc. ie 'nonstandard'

06:17 <geist> does it matter much? probably not, but it's a detail that has to be dealt with

06:22 eroux has joined #osdev

06:32 vdamewood has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

07:15 JanC has quit [Remote host closed the connection]

07:16 JanC has joined #osdev

07:16 <kingoffrance> update: hercules flipped the ferryman's oar back on him. hercules, doesnt need no stinking $ in makefiles

08:16 masoudd has joined #osdev

08:43 the_lanetly_052 has quit [Ping timeout: 256 seconds]

08:46 GeDaMo has joined #osdev

08:58 matrice64 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

09:03 Jari-- has quit [Ping timeout: 256 seconds]

09:10 <kingoffrance> i changed my mind GeDaMo -- hercules flipped the boatman's oar, no need to use $ in makefiles if you are hercules

09:17 <GeDaMo> Depends who the boatman is, I suppose :P

09:33 pretty_dumm_guy has joined #osdev

09:43 Burgundy has joined #osdev

10:07 gog has joined #osdev

10:09 dormito has quit [Quit: WeeChat 3.3]

10:13 Irvise_ has quit [Quit: Bridge terminating on SIGTERM]

10:13 nopenope[m] has quit [Quit: Bridge terminating on SIGTERM]

10:13 paulusASol has quit [Quit: Bridge terminating on SIGTERM]

10:13 junon has quit [Quit: Bridge terminating on SIGTERM]

10:13 niedzejkob[m] has quit [Quit: Bridge terminating on SIGTERM]

10:16 Matt|home has quit [Ping timeout: 256 seconds]

10:17 Irvise_ has joined #osdev

10:32 nopenope[m] has joined #osdev

10:32 paulusASol has joined #osdev

10:33 junon has joined #osdev

10:52 dormito has joined #osdev

11:15 rwb is now known as rb

11:57 elastic_dog has joined #osdev

12:07 <j`ey> geist: https://twitter.com/ChartreuseK/status/1494623622891147266

12:07 <bslsk05> twitter: <ChartreuseK> Little project teaser. 6800 @ 921kHz, dual 6850 UARTs, 2 or 4kB of ROM, and up to a whopping 8kB of static RAM. https://pbs.twimg.com/media/FL339Z1UcAEVXGk.png [Fennec 6800 (homebrew) PCB render]

12:08 <GeDaMo> For a second I read that as 921MHz :P

12:09 <j`ey> overclocked!

12:09 <GeDaMo> "Place iceberg here" :P

12:28 [itchyjunk] has joined #osdev

12:34 eroux has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

12:39 Brnocrist has quit [Ping timeout: 256 seconds]

12:40 Brnocrist has joined #osdev

12:49 epony has joined #osdev

13:05 lkurusa has joined #osdev

13:05 eroux has joined #osdev

13:14 lkurusa has quit [Read error: Connection reset by peer]

13:15 lkurusa has joined #osdev

13:32 k8yun has joined #osdev

13:36 sheb has joined #osdev

13:37 sheb has quit [Remote host closed the connection]

13:43 xenos1984 has quit [Remote host closed the connection]

13:43 xenos1984 has joined #osdev

13:54 dude12312414 has joined #osdev

14:02 <mrvn> j`ey: How do people always manage with so few traces and holes? My designs always end up total spaghetti and impossible to route.

14:03 <j`ey> practise? idk

14:03 <j`ey> I've never done anything that complex

14:03 <GeDaMo> Do people not use automatic layout then manually fixup?

14:03 <mrvn> GeDaMo: sure

14:05 <GeDaMo> https://img1.ak.crunchyroll.com/i/spire2/2177443e41fe27844f90e18ec958f3e31334696935_full.jpg

14:05 <mrvn> You should see my 16bit barrel shifter.

14:06 lkurusa has quit [Quit: I probably fell asleep (or went out). Who will ever know.]

14:06 nyah has joined #osdev

14:07 <mrvn> 24x 74xxx 4x 2:1 MUX chips plus some control logic. Can't resolve all the wire crossings on a 10cm x 10cm board.

15:04 X-Scale` has joined #osdev

15:05 X-Scale has quit [Ping timeout: 272 seconds]

15:05 X-Scale` is now known as X-Scale

15:16 X-Scale` has joined #osdev

15:17 X-Scale has quit [Ping timeout: 272 seconds]

15:17 X-Scale` is now known as X-Scale

15:17 jborgner has joined #osdev

15:22 k8yun has quit [Read error: Connection reset by peer]

15:23 k8yun has joined #osdev

15:23 jborgner has left #osdev [#osdev]

15:30 paulbarker has quit [Read error: Connection reset by peer]

15:31 paulbarker has joined #osdev

15:33 jborgner has joined #osdev

15:34 the_lanetly_052 has joined #osdev

15:37 jborgner has quit [Quit: WeeChat 3.4]

16:07 matrice64 has joined #osdev

16:10 eroux has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

16:13 ElectronApps has quit [Remote host closed the connection]

16:15 <geist> j`ey: cute. using old school ram too

16:28 the_lanetly_052 has quit [Remote host closed the connection]

16:30 dude12312414 has quit [Ping timeout: 240 seconds]

16:32 the_lanetly_052 has joined #osdev

17:01 the_lanetly_052_ has joined #osdev

17:03 the_lanetly_052 has quit [Ping timeout: 256 seconds]

17:08 <mrvn> That's not old school ram, this is old school ram: https://en.wikipedia.org/wiki/Magnetic-core_memory#/media/File:KL_CoreMemory.jpg

17:08 <bslsk05> en.wikipedia.org: Magnetic-core memory - Wikipedia

17:11 <mrvn> and now I wonder where I can get a bag of ferrite rings to build one of those.

17:14 <GeDaMo> https://hackaday.com/2022/02/14/printing-magnets/

17:14 <bslsk05> hackaday.com: Printing Magnets | Hackaday

17:22 <mrvn> GeDaMo: you want to print some core memory?

17:22 <GeDaMo> Sure! :P

17:23 <GeDaMo> I'd like to be able to print integrated circuits

17:23 <mrvn> I think I have 4 ferrite rings or so from old PCs where they looped the cable through one as filter.

17:24 <mrvn> Those big ~2cm ones.

17:24 <GeDaMo> Don't they still use ferrite cores on cables but in plastic containers?

17:25 <gog> yeah

17:26 <mrvn> monitor cables prior to display port

17:26 <GeDaMo> Ah, that's what I'm thinking of

17:26 <gog> i had a 2.1 speaker setup where i actually had to add one to the subwoofer because it would hum very loudly if the cops rolled by and were using their radios

17:26 <gog> which was often

17:28 <mrvn> Try running a poorly shielded Amiga with MC68060 @ 50MHz next to your radio tuned to SWF3 at 100.3 MHz.

17:29 <mrvn> I can tell you jpeg decoding has a quite distinct sound.

17:29 <gog> that'd do it

17:29 <gog> reminds me of my front panel audio connectors on my old rig

17:37 <mrvn> gog: Ever used an Arduino as a radio station?

17:39 <gog> never before

17:40 <gog> i've dabbled with RTL-SDR receivers though

17:47 <mrvn> you can connect a longish wire to a pin and bit-bang it at radio frequencies and tune a radio to it.

17:47 mahmutov has joined #osdev

17:47 <clever> mrvn: that long wire is even optional, as is gpio: https://spritesmods.com/?art=avrfmtx

17:48 <clever> in the case of the avr, there is a *4 PLL inside the cpu, so if you overclock it to 24mhz, there is a 96mhz signal deep within the cpu core

17:48 <clever> and at short enough ranges, that can be picked up by a radio

17:49 <gog> neat

17:49 <clever> and if you wiggle your overclock up/down, the 96mhz will wiggle, and boom, FM

17:51 <GeDaMo> You used to be able to use radios to listen to (really) old computers running :P

17:51 xenos1984 has quit [Read error: Connection reset by peer]

17:52 <mrvn> it's only 2-3 decades. You call that old? tssss

17:52 xenos1984 has joined #osdev

17:55 <clever> GeDaMo: one of my old laptops, i could hear the cpu usage, without any radio

17:55 <clever> something in the cpu power regulator was vibrating at audio freqs

17:56 <gog> i can sometimes hear mine whine

17:56 <gog> especially running games

17:56 <gog> idk if it's coming from the speakers or if it's some kind of timer circuitry

17:56 <gog> or a combination

17:57 <GeDaMo> At this place I used to work, we had shortwave radios for communicating around the building; whenever you triggered one near a (CRT) monitor, the monitor would give a loud clunk and go blank for a second

17:57 <GeDaMo> If that's what it was doing to the monitor, what was it doing to my head? :|

17:58 <gog> degauss your hippocampus for long life and health

17:58 <clever> oh right, there was also software exploits for crt displays

17:58 <gog> killer poke?

17:58 <clever> no, data extracting

17:59 <clever> you mess with the video timing parameters in just the right way, so the RF leaking out of the monitor encodes data you want to extract

17:59 <GeDaMo> https://en.wikipedia.org/wiki/Tempest_(codename)

17:59 <clever> and somebody can then recover the data from that

17:59 <bslsk05> en.wikipedia.org: Tempest (codename) - Wikipedia

17:59 <clever> yep

17:59 <gog> ohhh i see

17:59 <clever> instead of broadcasting on a normal freq, you just modulate the normal rf leakage that is expected/ignored

18:00 <gog> like the whole star trek "hiding the comms signal in a warp signature"

18:00 <clever> yep

18:03 <GeDaMo> "In January 2015, the Airhopper project from Georgia Institute of Technology, United States demonstrated (at Ben Gurion University, Israel) the use of Van Eck Phreaking to enable a keylogger to communicate, through video signal manipulation, keys pressed on the keyboard of a standard PC, to a program running on an Android cellphone with an earbud radio antenna."

18:03 <GeDaMo> https://en.wikipedia.org/wiki/Van_Eck_phreaking

18:03 <bslsk05> en.wikipedia.org: Van Eck phreaking - Wikipedia

18:06 <clever> i'm also able to control the rpi hw to that level, and can change the timing parameters freely

18:07 <clever> but for some outputs (ntsc/pal), the hw enforces using the right ratio of periods (but the reference clock can be played with)

18:07 <clever> for digital outputs (hdmi), the display is counting the length of periods, and will get upset if your off by even 1 count

18:08 <clever> so that only leaves DPI->VGA

18:46 <mrvn> clever: maybe you can mess with the frequencies in a way you still end up count-perfect at the end.

18:46 <clever> mrvn: yeah, messing with the pixel clock is the only option that works on every output

18:47 <clever> but it needs to be a small enough change to not make the display lose its lock

18:48 <mrvn> you can also just mess with the lowest bits in the picture. Add a bit of noise that's actually data.
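
The "noise that's actually data" idea is ordinary least-significant-bit embedding. A minimal sketch, assuming 8-bit pixel values in a flat list; the function names are made up for illustration and this says nothing about how it would be wired into a display pipeline.

```python
def embed_bits(pixels, bits):
    """Return a copy of pixels with the lowest bit of each value replaced by one payload bit."""
    if len(bits) > len(pixels):
        raise ValueError("payload does not fit in the image")
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | (bit & 1)  # clear bit 0, then set it to the payload bit
    return out


def extract_bits(pixels, count):
    """Read back the lowest bit of the first `count` pixel values."""
    return [p & 1 for p in pixels[:count]]


original = [200, 201, 17, 18]
payload = [1, 0, 1, 1]
stego = embed_bits(original, payload)
print(stego, extract_bits(stego, len(payload)))
```

Each value changes by at most 1, which is why it looks like noise; whether that survives a camera pointed at a monitor is a separate question.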

18:52 <clever> mrvn: WoW does that to its screenshots!, but getting data encoded into hdmi like that is harder

18:52 <clever> https://github.com/swetland/gateware/blob/master/hdl/display/dvi-encoder.sv

18:52 <bslsk05> github.com: gateware/dvi-encoder.sv at master · swetland/gateware · GitHub

18:53 <mrvn> clever: should be trivial. Run the RPi with linux like now but in the VC's dlist add a texture to manipulate the lowest bits.

18:53 <clever> mrvn: dvi/hdmi uses a complex scrambling code to both avoid dc current flow (keep an even match of +- and -+ on the diff pairs), and also to avoid creating pure tones on the rf leakage

18:53 <mrvn> I don't mean manipulate the hdmi stream. Manipulate the picture itself.

18:54 <clever> yeah, you could overlay a picture with some alpha, to modify pixels slightly

18:54 <clever> but then how do you detect that externally?

18:54 <mrvn> clever: camera on the display through the window.

18:54 <clever> ah

18:55 <clever> but its likely not exposed to a window, because you could also scrape secrets that are just displayed normally under normal use

18:55 <mrvn> I'm pretty sure with some modeling you could also create frequencies in the hdmi stream.

18:56 <mrvn> it's more designed against accidental leakage than intentional.

18:56 <clever> yeah

18:59 <mrvn> I should look into VC coding for stuff like raid6 and reed-solomon.

19:00 <mrvn> The GPU should be really good at doing bit operations on MB chunks of memory.

19:01 <clever> yeah

19:01 <clever> have i explained the vector core to you before?

19:01 <mrvn> Just needs to be something complex enough to outweigh the time to load/save the data.

19:02 <clever> basically, imagine a 64x64 spreadsheet, each cell can hold an 8bit int, 0-255

19:02 <gog> it'd make rebuilding a degraded set a little less frustrating i imagine

19:03 <gog> if it was indeed faster

19:03 <clever> for each vector, i specify a coordinate (row#, col#), a direction (row or column mode), and a bit width (8bit, 16bit, 32bit)

19:03 <mrvn> gog: not really. The limiting factor is the disk speed there already. But it would use less cpu.

19:03 <gog> ah ok

19:03 <mrvn> My raid rebuilds at ~550MiB/s.

19:04 <gog> oh that's faster than my SSD can even read

19:04 <clever> mrvn: so for example, H(5,6) says to load a 16 wide chunk, starting at row5, column6 and ending at column21
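
A toy model of that addressing scheme, covering only what the description above states: a 64x64 grid of 8-bit cells and 16-element accesses in row or column direction. The helper names are hypothetical, wider element sizes are left out, and rejecting accesses that run off the edge is an assumption of this sketch, not a claim about the real hardware.

```python
SIZE = 64   # the register file is described as a 64x64 grid
LANES = 16  # each vector access covers 16 cells


def make_grid():
    """Create the 64x64 grid of 8-bit cells, all zero."""
    return [[0] * SIZE for _ in range(SIZE)]


def read_vector(grid, row, col, horizontal=True):
    """Fetch 16 cells starting at (row, col), walking along the row or down the column."""
    end = (col if horizontal else row) + LANES
    if end > SIZE:
        raise ValueError("access runs past the edge of the grid (not modelled here)")
    if horizontal:
        return [grid[row][col + i] for i in range(LANES)]
    return [grid[row + i][col] for i in range(LANES)]


grid = make_grid()
for c in range(SIZE):
    grid[5][c] = c            # fill row 5 with its own column numbers
print(read_vector(grid, 5, 6))  # the H(5,6) example: columns 6 through 21 of row 5
```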

19:04 <mrvn> 110MiB/s per disk.

19:04 <mrvn> The disk could do more but I think the controller can't.

19:04 <clever> mrvn: following along so far?

19:04 <mrvn> clever: I know how it works

19:04 <clever> ah

19:07 <mrvn> gog: Have you ever considered the lifetime of your SSD or an M2.key? I have a 2TB M2.key that can do 2GB/s throughput. It's expected to last for at least 400TB writes (or so). Sounds like a lot right?

19:07 <mrvn> 400*1000/2/60/60 = ~55.5 hours.

19:08 <gog> i have and i'm pretty sure at the rate i use it its expected life is about 3 years

19:08 <gog> i've had it for 4 :|

19:08 <mrvn> gog: warranty + 1 day.

19:08 <gog> lol

19:08 <gog> yeah it's well out of warranty

19:09 <clever> Data Units Written: 669,977,243 [343 TB]

19:09 <clever> for the main ssd in my desktop

19:10 <clever> Percentage Used: 70%

19:10 <mrvn> So the M2.key is expected to fail after 55 hours at full speed. It's also supposed to last 3-5 years (depending on what you buy). Lets you compute the average write speed they expect you to have.
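
The arithmetic behind both numbers, as a small sketch. It assumes decimal units (1 TB = 1000 GB) and a 365-day year; the 400 TB and 2 GB/s figures are the ones quoted above, and the function names are invented for the example.

```python
def hours_to_exhaust(endurance_tb, write_gb_per_s):
    """Hours of continuous full-speed writing needed to use up the rated endurance."""
    seconds = endurance_tb * 1000 / write_gb_per_s
    return seconds / 3600


def average_write_mb_per_s(endurance_tb, years):
    """Sustained write rate that spreads the rated endurance evenly over `years`."""
    seconds = years * 365 * 24 * 3600
    return endurance_tb * 1_000_000 / seconds


print(f"{hours_to_exhaust(400, 2):.1f} hours at full speed")
for years in (3, 5):
    print(f"{years} years -> {average_write_mb_per_s(400, years):.2f} MB/s average")
```

So the rated endurance corresponds to an average of only a few MB/s over the expected lifetime, roughly three orders of magnitude below the peak write speed.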

19:10 <mrvn> clever: is that SMART data?

19:10 <clever> yep

19:11 <gog> 231 SSD_Life_Left 0x0000 015 015 000 Old_age Offline - 85

19:11 <gog> not bad i guess

19:11 <gog> 16TB lifetime writes

19:11 <gog> 134 unsafe shutdowns

19:12 <gog> but no erase failures

19:12 <clever> Unsafe Shutdowns: 227

19:12 <gog> 9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 20807

19:12 <mrvn> gog: doesn't it have a little capacitor to safely shutdown on power loss?

19:12 <clever> Power On Hours: 29,646

19:12 <gog> mrvn: idek about that

19:13 <mrvn> Model Number: INTEL SSDPEKNW020T8

19:13 <clever> Model Number: INTEL SSDPEKKW512G7

19:13 <mrvn> Temperature: 37 Celsius

19:13 <mrvn> Data Units Read: 41,057,477 [21.0 TB]

19:13 <mrvn> Data Units Written: 47,942,854 [24.5 TB]

19:13 <clever> Temperature: 31 Celsius

19:14 <mrvn> Power On Hours: 12,644

19:14 <mrvn> Still good with that one.

19:14 <gog> Device Model: KINGSTON SA400S37120G

19:14 <clever> at what point should we just pastebin the entire thing? lol

19:14 <gog> lol

19:14 <mrvn> sorry.

19:15 <mrvn> I've written the whole thing 12 times but only read it 10 times. interesting.

19:15 <clever> Data Units Read: 232,332,603 [118 TB]

19:15 <clever> Data Units Written: 669,977,243 [343 TB]

19:15 <clever> Namespace 1 Size/Capacity: 512,110,190,592 [512 GB]

19:16 <mrvn> Looks like I reboot every 22 days on average.

19:16 <clever> 99days for my desktop

19:17 <mrvn> Most reboots will be clustered. A lot for the initial setup and install.

19:17 <clever> 15:17:09 up 360 days, 18:18, 22 users, load average: 0.00, 0.01, 0.05

19:17 <clever> nearing 1 year for the irc client box

19:17 <mrvn> clever: what does your disk say?

19:17 <clever> for the irc client, *looks*

19:17 <mrvn> power on hours / power cycles = ???

19:18 <gog> 12 days uptime lol

19:18 <clever> sda, 173 power cycles, 114845 hours, 663 days avg

19:18 <gog> i need to remove a bunch of plasma/kde stuff

19:18 <gog> updates are kindof a chore rn

19:18 <mrvn> I think reboots don't register on the disk, only cold starts, right?

19:18 <clever> sdb 100 power cycles!!, 76079 power on, 760 day avg

19:19 <clever> possibly

19:20 <gog> 14 day average uptime

19:20 <gog> 20807/1462
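
A quick check of the division being done here, assuming the SMART power-on attribute counts hours as its name suggests. In that case the quotient is hours per power cycle, and it needs another division by 24 to become days. The figures are the ones pasted in the channel; the helper name is made up.

```python
def hours_per_power_cycle(power_on_hours, power_cycles):
    """Average powered-on time between power cycles, in hours."""
    return power_on_hours / power_cycles


drives = {
    "clever sda": (114845, 173),
    "clever sdb": (76079, 100),
    "gog": (20807, 1462),
}

for name, (hours, cycles) in drives.items():
    avg = hours_per_power_cycle(hours, cycles)
    print(f"{name}: {avg:.1f} hours ({avg / 24:.1f} days) per power cycle")
```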

19:20 <mrvn> you win

19:20 <gog> i've gone entire months with this off though

19:20 <gog> back when i didn't need a laptop really

19:21 <mrvn> Oh, this was my server. It's never off.

19:21 <mrvn> (intentionally)

19:21 <clever> oh, i should check the router hdd...

19:22 <clever> it doesnt even have smartctl installed!

19:22 <mrvn> 2022-02-18 20:11:18 status unpacked smartmontools:amd64 7.2-1

19:22 <clever> 80851 hours, 1566 cycles, 51 day avg, i blame the previous owner

19:22 <clever> the router was once part of an xray machine in a hospital

19:23 <mrvn> ouch, that must have caused some bit flips.

19:23 <clever> smart says nothing is wrong with it

19:32 the_lanetly_052_ has quit [Remote host closed the connection]

20:00 gareppa has joined #osdev

20:02 k8yun has quit [Read error: Connection reset by peer]

20:10 gareppa has quit [Quit: Leaving]

20:16 dude12312414 has joined #osdev

20:23 garrit has quit [Ping timeout: 256 seconds]

20:27 Teukka has quit [Read error: Connection reset by peer]

20:31 Teukka has joined #osdev

20:40 <gog> holy shit i got task switching working

20:40 <gog> ;aehrs;rongaoidjrga;a

20:41 <GeDaMo> Which task was that? :P

20:41 <gog> the one where i excitedly smash my keyboard

20:41 <mrvn> cooperative, preemptive, tickless?

20:41 <gog> cooperative round-robin

20:41 <gog> really rudimentary

20:41 <gog> well i think it can preempt?

20:41 <gog> idk

20:41 <gog> i don't have time slices implemented yet

20:41 <GeDaMo> Yield, baby, yield! :P

20:42 <mrvn> I feel cooperative is actually harder, needs syscalls while preemptive needs IRQs which most do first.

20:42 <gog> oh then it's preemptive

20:42 <j`ey> well cooperative could be kernel threads, explicitly calling schedule()

20:42 <gog> i made a sort of callback interface to my timer irq in order to keep this clean

20:43 <gog> none of the threads call the scheduler so it's preemptive then

20:43 <mrvn> j`ey: true

20:43 <j`ey> gog: so a timer fires and that causes the threads to be context switched?

20:43 <mrvn> gog: a good clean way is to task switch on return from an irq/syscall and have the scheduler just set a variable which task to run next.

20:44 <gog> the timer fires and calls a function that iterates over any sleeping tasks, appends tasks to wake up to the ready queue, then calls the scheduler

20:44 <mrvn> gog: makes it real easy to e.g. schedule when the disk returns a block of data another task was waiting for.

20:44 <mrvn> gog: iterates? bad idea.

20:44 <gog> probably

20:45 <mrvn> gog: you have the tasks in a doubly linked list?

20:45 <gog> singly

20:45 <gog> i'm following brendan's tutorial because i had no clue where to begin

20:45 <gog> and i cribbed most of the code from it

20:45 <mrvn> gog: if you make it double a task can remove itself from the running list and insert itself in the waiting list or vice versa.

20:45 <gog> that would be more ideal

20:45 <gog> but i'm new to this part

20:45 <gog> i've never gotten this far with an experiment

20:46 <mrvn> And the timer interrupt would just use current_task->next.
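
A sketch of the doubly linked idea, using a circular list so that "next task" is always `current.next`. The `Task` class and helper names are hypothetical and this is not the tutorial's code; it only shows why removal becomes O(1) when each task carries both links.

```python
class Task:
    """A task that starts out as a one-element circular list."""

    def __init__(self, name):
        self.name = name
        self.prev = self
        self.next = self


def insert_after(anchor, task):
    """Link `task` into the ring right after `anchor`."""
    task.prev = anchor
    task.next = anchor.next
    anchor.next.prev = task
    anchor.next = task


def unlink(task):
    """Remove `task` from whatever ring it is in, without walking the list."""
    task.prev.next = task.next
    task.next.prev = task.prev
    task.prev = task.next = task


a, b, c = Task("a"), Task("b"), Task("c")
insert_after(a, b)
insert_after(b, c)
unlink(b)            # b blocks: it takes itself off the run ring
print(a.next.name)   # the round-robin step is now just current.next
```

With a singly linked list the same removal needs a walk to find the predecessor, which is the iteration mrvn is warning about.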

20:46 <gog> i still need to test it more

20:46 <mrvn> gog: anyway, congratulations, it works. Now make it nice.

20:46 <gog> it _appears_ to work

20:46 <gog> there's a big gulf between appearing to work and verified to work

20:47 <mrvn> gog: Most people get stuck on how to create a task. The switching isn't that difficult but preparing the second task so you can switch needs a mind warp.

20:47 <j`ey> does it print ababab?

20:49 <mrvn> j`ey: mine prints prime and binomial numbers

20:49 <mrvn> something that quickly slows down output.

20:57 mahmutov has quit [Ping timeout: 240 seconds]

20:58 <klange> ah, my rpi froze... clock says a bit under 10 hours of uptime

20:59 <klange> I should change the boot timestamp (no rtc), it's currently a very random 1644908027 which is 6:53:47 this past Tuesday. Should make it midnight, easier uptime calculation from the panel clock...
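
The timestamp can be checked and rounded down to midnight with a few lines. This sketch assumes the value is meant as UTC, which is what makes 6:53:47 come out; the constant is the one quoted above and the function names are invented.

```python
from datetime import datetime, timezone

BOOT_TIMESTAMP = 1644908027  # value quoted in the log


def describe(ts):
    """Format a Unix timestamp as weekday, date and time in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%A %Y-%m-%d %H:%M:%S")


def previous_midnight_utc(ts):
    """Round a Unix timestamp down to 00:00:00 UTC of the same day."""
    return ts - ts % 86400


print(describe(BOOT_TIMESTAMP))
print(previous_midnight_utc(BOOT_TIMESTAMP), describe(previous_midnight_utc(BOOT_TIMESTAMP)))
```

With a midnight boot timestamp the panel clock reads the uptime directly, at least for the first 24 hours.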

21:01 heat has joined #osdev

21:02 <gog> j`ey: no it prints a bunch of debugging output with elapsed time and wakeup time and "THE THREAD IS AWOKEN!" when the sleep test's expiration time comes

21:02 <j`ey> :3

21:02 <klange> my original task switch test 10 years ago printed A and B

21:02 <gog> i still need to make more threads to test

21:02 <gog> but thread creation itself is sorted

21:02 <klange> oh and Cs

21:02 <gog> at least for my purposes currently

21:03 <heat> tip: preemption shouldn't rely on IRQs

21:03 <klange> https://user-images.githubusercontent.com/223546/149865481-2aedb885-0f43-47c5-8a12-83ef5cfb8a27.png

21:03 <gog> yeah i know i need time slices

21:03 <heat> a timer event is just one of the ways to ask for rescheduling

21:03 <klange> "preemption" absolutely should rely on irqs, how else are you going to interrupt a running userspace process?

21:03 <klange> "task switching" should not

21:03 <heat> IRQs, IPIs

21:04 <gog> anyhow i have more testing to do

21:05 <mrvn> heat: better to say: should not just rely on IRQs

21:06 <klange> the advice of ensuring your task switching isn't dependent on irqs is about being able to make task switches for blocking

21:06 <mrvn> preempting on fixed time slices is kind of boring and unresponsive.

21:07 <gog> yeah when a thread is blocked it'll run the scheduler

21:07 <mrvn> klange: separating the scheduler (called by e.g. the timer interrupt) and task switching (performed on kernel -> user switch) fits that model really nicely.

21:07 <gog> that's how the tutorial is written so that's how mine works :p

21:07 <heat> also don't measure your time using a counter you tick every IRQ

21:07 <heat> bad, bad idea

21:08 <mrvn> google tickless.

21:08 <mrvn> although with the PIC you kind of need to do tick based.

21:08 <heat> tickless is harder to do

21:08 <klange> tickless just means the preemption timer isn't a fixed rate

21:08 <gog> yes i intend to have the capability for tickless at some point when i understand this a little more thoroughly

21:08 <mrvn> but tickless is so much worth it.

21:08 <gog> yes

21:09 <heat> my scheduler runs using a tick (no tickless yet) but my timer doesn't know what a tick is, it just does oneshot of the next event

21:09 <heat> the scheduler tick + preemption is just a regular event

21:09 <mrvn> heat: so you are tickless but just never change the interval

21:09 <mrvn> heat: well, maybe. Do you stop the timer when the idle thread runs?

21:10 <heat> no. that would be dynticks I think

21:10 <heat> tickless is when you don't tick but set the next "tick" to the next preemption point (when the next thread can run or something)

21:10 <heat> at least that's my idea of it

21:10 <mrvn> I think dynticks would be selecting an interval based on e.g. process priority. "time till next event" I think describes tickless

21:11 <mrvn> you just are very bad at calculating "next event" so far

21:11 <klange> tickless is when you aren't using a fixed periodic timer as the pre-emption source but instead making the pre-emption timer fire precisely when the next time event is

21:11 <heat> yea

21:12 <klange> no one is running and no one is blocked with a timeout? no tick at all - some other hardware interrupt will bring us out of idle
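
A minimal sketch of the tickless idea described above, assuming a one-shot timer and a monotonic clock counted in arbitrary ticks. The names `next_timer_deadline` and `NO_DEADLINE` are hypothetical and not taken from any kernel mentioned in the channel. The timer is armed for the earliest wakeup or slice end, and left disarmed when nothing is waiting on time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: no timer needs to be armed at all. */
#define NO_DEADLINE UINT64_MAX

/*
 * Pick the absolute time at which the one-shot timer should fire next.
 *   wakeups / n_sleepers : wakeup times of threads blocked with a timeout
 *   runnable_contention  : true if another runnable thread is waiting for
 *                          this CPU, so the current slice must end sometime
 *   slice_end            : when the current thread's slice expires
 * Returns NO_DEADLINE when nobody is sleeping and nobody is contending,
 * in which case some other hardware interrupt has to wake the CPU.
 */
uint64_t next_timer_deadline(const uint64_t *wakeups, size_t n_sleepers,
                             bool runnable_contention, uint64_t slice_end)
{
    uint64_t deadline = NO_DEADLINE;

    for (size_t i = 0; i < n_sleepers; i++) {
        if (wakeups[i] < deadline)
            deadline = wakeups[i];
    }
    if (runnable_contention && slice_end < deadline)
        deadline = slice_end;

    return deadline;
}
```

A fixed-rate tick falls out of this as the special case where `slice_end` is always "now plus a constant" and contention is assumed to be permanent.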

21:12 <mrvn> It's kind of blurry when you don't have a periodic timer in hardware

21:12 xenos1984 has quit [Remote host closed the connection]

21:12 <heat> and i'm using a fixed-ish frequency, so not really tickless

21:13 xenos1984 has joined #osdev

21:13 <mrvn> I think the more important part of preemption (maybe that's the wrong term?) is that you switch task when something more important comes up instead of finishing the current time slize.

21:13 <heat> right, that's where something like IPIs can come into play

21:13 <klange> IPIs are still interrupts.

21:14 <heat> yes

21:14 <heat> not a timer IRQ though

21:14 <mrvn> more important is stuff like keyboard interrupt, mouse, network

21:14 <klange> Preemption is very specifically "hey, you, running userspace process that hasn't made a system call, I'm descheduling you".

21:14 <mrvn> wake up the video player when the next video frame comes from youtube even though gcc still has time left to run.

21:15 <heat> that triggers an IPI

21:15 <mrvn> klange: but why are you descheduling? Because the time slize is up? Because some higher priority task woke up?

21:15 <j`ey> mrvn: slice!

21:16 <heat> sched_unblock(thread) -> thread->prio > other_cpu_curr_thread->prio -> other_cpu_curr_thread->flags |= SHOULD_SCHED_OUT; sched_send_ipi(other_cpu);

21:16 <klange> mrvn: separate topic

21:17 <heat> i modeled a lot of my timer+sched thing around linux and all that matters is that ->flags is checked at some points

21:18 <heat> like after unlocking a mutex, re-enabling preemption, exiting to user-space, etc

21:18 <heat> for tasks which are permanently in user-space or idling, IPI
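
The flag-plus-IPI scheme heat outlines can be sketched roughly as below. This is a user-space simulation under stated assumptions: `cpus`, `sched_should_resched` and the counting `sched_send_ipi` are hypothetical stand-ins, and only `sched_unblock`, `SHOULD_SCHED_OUT` and `sched_send_ipi` reuse names from the one-liner above. A real kernel would send an actual inter-processor interrupt and would need atomic flag updates.

```c
#include <stdbool.h>

#define MAX_CPUS 4
#define SHOULD_SCHED_OUT 0x1u

struct thread {
    int prio;          /* higher value = more important (assumption) */
    unsigned flags;
};

struct cpu {
    struct thread *current;
    int ipis_received; /* simulation only: counts "delivered" IPIs */
};

static struct cpu cpus[MAX_CPUS];

/* Stand-in for a real IPI; here it only bumps a counter. */
static void sched_send_ipi(int cpu)
{
    cpus[cpu].ipis_received++;
}

/*
 * A thread became runnable. If it outranks what the target CPU is running,
 * mark the running thread for descheduling and poke that CPU so it notices
 * even if it is sitting in user space or idling.
 * Returns true if a preemption was requested.
 */
bool sched_unblock(struct thread *woken, int target_cpu)
{
    struct thread *cur = cpus[target_cpu].current;

    if (cur && woken->prio > cur->prio) {
        cur->flags |= SHOULD_SCHED_OUT;
        sched_send_ipi(target_cpu);
        return true;
    }
    return false;
}

/*
 * Checked at preemption points: after unlocking a mutex, when preemption
 * is re-enabled, on exit to user space, and in the IPI handler.
 */
bool sched_should_resched(int cpu)
{
    struct thread *cur = cpus[cpu].current;
    return cur && (cur->flags & SHOULD_SCHED_OUT);
}
```

The IPI carries no payload here; its only job is to force the target CPU into the kernel so that one of the flag checks runs.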

21:20 GeDaMo has quit [Remote host closed the connection]

21:22 <mrvn> klange: For me scheduling on time slices is just time sharing. True preemption includes premature switching. Which is a step stronger.

21:22 <heat> so, a random malloc blows up the kernel

21:23 <heat> and I don't know what's wrong

21:23 <klange> mrvn: and it has nothing to do with what I'm saying

21:23 <heat> kasan says everything is great

21:23 <mrvn> klange: for you windows 3.1 is preemptive, DOS is preemptive.

21:24 <klange> what

21:24 <mrvn> heat: blows up can mean a lot of things

21:24 <klange> you are putting words in my mouth

21:24 <mrvn> klange: it interrupts the running process and schedules a different one.

21:24 <heat> mrvn: I know, and it's doing a lot of things; like general corruption of thread state and whatnot

21:24 <klange> Pre-emption is the ability to interrupt a userspace process and determine that another one needs to run, without that process having willingly yielded itself by making a system call.

21:25 <mrvn> klange: yep, win: 3.1 check, DOS: check

21:25 <klange> This requires some hardware ability to force the userspace process to enter the kernel. That's an interrupt.

21:25 <heat> i wonder if I missed some important detail on the riscv mmu

21:25 <klange> How that interrupt is triggered, whether it's an IPI from another processor, or a hardware IRQ from a mouse, or a timer, is irrelevant to pre-emption.

21:26 <mrvn> klange: even time sharing systems from the 60s are preemptive with that definition

21:26 <klange> Yes, very much so.

21:26 <mrvn> so basically everything but userspace coroutines is preemptive.

21:27 <klange> Yes?

21:27 <mrvn> I'm not finding that definition very useful.

21:27 <klange> > The term preemptive multitasking is used to distinguish a multitasking operating system, which permits preemption of tasks, from a cooperative multitasking system wherein processes or tasks must be explicitly programmed to yield when they do not need system resources.

21:27 <klange> > In simple terms: Preemptive multitasking involves the use of an interrupt mechanism which suspends the currently executing process and invokes a scheduler to determine which process should execute next. Therefore, all processes will get some amount of CPU time at any given time.

21:28 <mrvn> yeah, I think modern use is a bit different, especially in linux.

21:28 <klange> It's a term that, alone, has lost a lot of usefulness because the thing it describes has become a fundamental aspect of what is expected from an operating system: The ability to pretend a single core is running multiple uncooperative processes "simultaneously".

21:29 <mrvn> bingo

21:29 <gog> ok i was just trawling through the disassembly of my kernel and there's a movaps with xmm registers in there, should i be using -mno-sse?

21:30 <mrvn> gog: I don't know, should you?

21:30 <gog> ¯\_(ツ)_/¯

21:30 <mrvn> gog: what about fpu regs?

21:30 <klange> -mgeneral-regs-only is a good start

21:30 <gog> could've sworn i had that

21:30 <klange> until/unless you want to manage the kernel being able to use 'fancy' registers, which you can do if you want!

21:31 <heat> note that mgeneral-regs-only is kind of recent in x86 gcc/clang

21:31 <heat> used to be arm64 only

21:31 <heat> so update yer toolchains

21:31 <gog> ok

21:31 <j`ey> gog: https://i.imgur.com/IHjDjUW.jpeg

21:32 <gog> j`ey: neighborhood cat is curled up at my feet rn :3

21:32 <gog> she's been rubber ducky debugging for me all day

21:32 <mrvn> gog: better than on your keyboard

21:32 <gog> yes

21:32 <gog> ok -mgeneral-regs-only was not applied to all objects. fixed

21:33 <gog> i don't trust those fancy registers because i don't understand what using them might entail

21:33 <j`ey> XSAVE!

21:33 <klange> Usually just making sure you restore them like all the other ones, it's just they're kinda big...

21:33 <mrvn> gog: it will kill your multitasking since you aren't saving them

21:33 <gog> yeah i don't have anything for that implemented

21:33 <j`ey> save all the AVX512 registers..

21:34 <j`ey> gog: youre saving the normal GPR regs, right? :P

21:34 <gog> lol no why would i do that

21:34 <mrvn> don't you have to enable AVX before use?

21:34 <gog> yes of course i'm saving the GPRs

21:34 <heat> fxsave and xsave, fxrstor and xrstor

21:34 <heat> mrvn, yes

21:34 <klange> I think I have a... I don't want to say bug, but missing feature in my task switching where I always save/restore fpu regs even if I'm just switching between kernel contexts...

21:34 <klange> need to fix that to only save coming in from userspace, and restore going back out...
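
A rough sketch of the "only touch FPU state for user threads" idea, assuming x86-64 and the legacy FXSAVE format (512 bytes, 16-byte aligned). The `struct task` layout, the `is_user` field and `fpu_switch` are hypothetical simplifications, not klange's actual code. A kernel that lets user space use AVX would need XSAVE/XRSTOR and a larger, CPUID-sized buffer instead.

```c
#include <stdalign.h>
#include <stdint.h>

/* FXSAVE/FXRSTOR operate on a 512-byte area that must be 16-byte aligned. */
struct fpu_state {
    alignas(16) uint8_t area[512];
};

/* Hypothetical task structure, reduced to what this sketch needs. */
struct task {
    struct fpu_state fpu;
    int is_user;   /* nonzero if the task ever runs user-space code */
};

static inline void fpu_save(struct fpu_state *s)
{
#if defined(__x86_64__)
    __asm__ volatile("fxsave64 %0" : "=m"(*s));
#else
    (void)s;       /* other architectures: nothing to do in this sketch */
#endif
}

static inline void fpu_restore(const struct fpu_state *s)
{
#if defined(__x86_64__)
    __asm__ volatile("fxrstor64 %0" : : "m"(*s));
#else
    (void)s;
#endif
}

/*
 * Called from the context switch path. A kernel built with
 * -mgeneral-regs-only never touches these registers itself, so kernel-only
 * tasks have no FPU state worth saving or restoring.
 */
void fpu_switch(struct task *prev, struct task *next)
{
    if (prev->is_user)
        fpu_save(&prev->fpu);
    if (next->is_user)
        fpu_restore(&next->fpu);
}
```

Restoring from a buffer that was never filled by a save (or by a deliberate initial image) loads garbage control words, so new user tasks need a sane initial `fpu_state` before their first restore.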

21:35 <heat> oof

21:35 <mrvn> klange: or save it anyway and use SSE in kernel

21:35 <heat> it's not a bug if it works properly

21:35 <klange> it's a "thing that can be improved" rather than a bug, yes

21:35 <klange> mrvn: i don't wanna, you can't make me! *throws things*

21:35 <heat> when I try to allocate fpu space for user threads my kernel blows up

21:35 <heat> so you're a tiny bit better than me :P

21:36 <gog> my kernel doesn't even know what the FPU is

21:36 <gog> my "kernel"

21:36 <klange> you doin' x86-64 or old-skool 32-bit?

21:37 <gog> x86_64

21:37 <gog> 32-bit is a pain in the ass

21:37 <heat> osdev is a pain in the ass

21:37 <heat> go learn karate or something

21:37 <heat> way more fun

21:37 <j`ey> karate is a pain in the ribs

21:37 <mrvn> gog: FPU is that thing that gets called when your opcode is Exxx

21:37 <klange> you'll want to teach it what an fpu is, then, 'cause once you start making "real" userspace stuff you'll find that the compiler just assumes a certain level of fpu is available for normal stuff, as you've already seen

21:37 <gog> what's opcode

21:38 <heat> thing that make cpu go brrr

21:38 <klange> opcode i-

21:38 <gog> (not much what's opcode with you?)

21:38 <klys> the first byte is 0xEx ?

21:38 <klange> dammit heat i was gonna do the brrr

21:38 <mrvn> gog: that thing that calls your illegal-instruction handler

21:38 <gog> i don't have a handler implemented for #UD lol

21:38 <mrvn> klys: iirc. and it's a word on m68k, no bytes for opcodes

21:39 <heat> you'll be so confused when you hit UB and get a double fault

21:39 <klys> mrvn, m68k eh

21:39 <mrvn> gog: are you using clang?

21:39 <klange> gotta have panic stubs for all the exception interrupts, smash a bunch of strings in an array so you can look up a pretty name for them

21:39 <gog> i will never use clang stop asking
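
The "strings in an array" lookup can be as small as the following sketch for the x86 exception vectors 0 through 21. The function name `exception_name` and the fallback string are hypothetical; the vector assignments follow the architecture's exception table.

```c
#include <stddef.h>

/* Names for the architecturally defined x86 exception vectors.
 * Vector 15 is reserved and deliberately left NULL. */
static const char *const exception_names[] = {
    [0]  = "Divide Error (#DE)",
    [1]  = "Debug (#DB)",
    [2]  = "Non-Maskable Interrupt",
    [3]  = "Breakpoint (#BP)",
    [4]  = "Overflow (#OF)",
    [5]  = "Bound Range Exceeded (#BR)",
    [6]  = "Invalid Opcode (#UD)",
    [7]  = "Device Not Available (#NM)",
    [8]  = "Double Fault (#DF)",
    [9]  = "Coprocessor Segment Overrun",
    [10] = "Invalid TSS (#TS)",
    [11] = "Segment Not Present (#NP)",
    [12] = "Stack-Segment Fault (#SS)",
    [13] = "General Protection Fault (#GP)",
    [14] = "Page Fault (#PF)",
    [16] = "x87 Floating-Point Exception (#MF)",
    [17] = "Alignment Check (#AC)",
    [18] = "Machine Check (#MC)",
    [19] = "SIMD Floating-Point Exception (#XM)",
    [20] = "Virtualization Exception (#VE)",
    [21] = "Control Protection Exception (#CP)",
};

/* Look up a printable name for a panic message; never returns NULL. */
const char *exception_name(unsigned vector)
{
    size_t count = sizeof exception_names / sizeof exception_names[0];

    if (vector < count && exception_names[vector])
        return exception_names[vector];
    return "Unknown/Reserved";
}
```

With every vector routed to a panic stub that prints this name, a stray `movaps` or `ud2` shows up as a readable #UD or #NM message instead of escalating to a double or triple fault.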

21:39 <heat> i sure as hell hope so

21:39 <heat> i gave you the patch

21:39 <gog> i ignored it

21:39 <gog> send a PR

21:39 <heat> take it

21:39 <klys> mrvn, you have a coldfire project?

21:40 <heat> i only send patches

21:40 <gog> so i can ignore it again

21:40 <mrvn> gog: too bad, clang actually produces ud2 opcodes. gcc just goes haywire on UB

21:40 <kingoffrance> as uptime is those cat-quick blinks between reboots, opcodes are surely those brief flickers between faults

21:40 <mrvn> klys: I have original Amigas

21:40 <klys> mrvn, an amiga 2000 ?

21:40 <gog> ok heat i'm gonna make a branch and apply the patch there

21:40 <mrvn> klys: 600, 1200, 4000

21:41 <klange> [Narrator: gog never merged the branch.]

21:41 <heat> gog: imagine a world where the compiler is friendly and looks out for you and your code, a world where LTO can be fast

21:41 <heat> a world where your compiler and the runtime libraries are liberally licensed

21:41 <klys> LTO is like libtool right?

21:41 <gog> do i still need -lgcc though

21:41 <mrvn> heat: what's wrong with libgcc licensing?

21:42 <heat> klys, no

21:42 <heat> fuck libtool

21:42 <mrvn> klys: LTO is like: cat *.c | gcc -

21:42 <heat> fuck libtool x100

21:42 <klange> you should be able to get gcc to produce fully libgcc-less kernel code for x86-64

21:42 <heat> mrvn: libgcc's licensing falls apart if you need to explicitly include it

21:42 <mrvn> klys: you can. I do.

21:43 <gog> also idk if i mentioned i tested with a bunch of threads all set to sleep for different intervals and it still seemed to work so yay!

21:43 <kingoffrance> mrvn, indeed, it was also called CMI, part of/related to ye olde oskit :D

21:43 <klys> so LTO is a TLA remind me what it stands for

21:43 <klange> link-time optimization

21:43 <mrvn> klys: just don't use "x / 10" anywhere in your code.

21:43 dormito has quit [Ping timeout: 256 seconds]

21:43 <klange> code optimization that happens at link time, rather than just compilation time, so you can mash together object files more efficiently than "here's a bunch of relocs, good luck"

21:44 <mrvn> klys: link-time-optimization. It stores the intermediate language in the .o files and on link it merges all the chunks and runs the optimizer over everything.

21:44 <heat> LTO is a compiler mode where it includes bytecode and ASTs inside your object files, and when linking it looks at the compiler data and optimises stuff

21:44 <klys> does LTO change the makefile?

21:44 <heat> you just add -flto to the cflags

21:44 <mjg> wait for a paper that LTO does not improve squat over -O2 if you control for everything

21:44 <mrvn> klys: yes. you need to add the LTO flag and link with CC/C++

21:45 <heat> mrvn, it's actually not the compiler that does it, but the linker with plugins

21:45 <klys> does clang have LTO?

21:45 <mjg> [on a serious note i do believe properly implemented LTO gives great opportunities of course]

21:45 <mrvn> mjg: the only way LTO changes anything is if you have multiple compilation units.

21:45 <mjg> which is the standard in bigger projects

21:45 <heat> klys, ofc

21:45 <mjg> and even smaller ones

21:45 <klange> clang has LTO and is noted for being better at it than gcc, hence why it was even brought up

21:45 <mrvn> mjg: and then it changes tons and tons and tons.

21:45 <heat> clang even has -flto=thin which is like regular LTO but way faster

21:45 <j`ey> less LTOy

21:45 <heat> but way way faster

21:46 <heat> and

21:46 <heat> speed

21:46 <heat> is

21:46 <heat> keyyyyyyyyy

21:46 <mjg> see

21:46 <mjg> gcc likes to inject rep mov/rep stos sometimes

21:46 <mrvn> The nice thing about LTO is that you can put your trivial getters and setters in .c files without paying for it.

21:46 <mjg> instead of generating a call to memcpy/memset

21:46 <mjg> except when it does that, it gets slower

21:46 <mrvn> mjg: why?

21:46 <mjg> or to put it differently, compiler optimization are pretty hit and miss

21:47 <heat> depends on the -march

21:47 <mjg> mrvn: there is a huge startup cost to rep prefix

21:47 <heat> and the size of the memcpy

21:47 <klys> gcc -flto -o main.o -c main.c; clang -flto -o main -o main.o; ???

21:47 <mjg> mrvn: on most cpus

21:47 <mrvn> mjg: so? memcpy does rep too

21:47 <mjg> only for big sizes

21:47 <mjg> normally you do simd

21:47 <mjg> or even straight up movs

21:47 <mrvn> mjg: which require a branch, even worse

21:47 <mjg> no

21:47 <klys> gcc -flto -o main.o -c main.c; clang -flto -o main main.o; does this work?

21:48 <heat> mjg: if your compiler generates rep movs for small sizes, it's broken or you configured it wrong

21:48 <mjg> heat: gcc will inject them at times

21:48 <heat> klys, changing compilers? no clue

21:48 <mjg> heat: grep something big like the linux kernel

21:48 <mrvn> mjg: sure it does. the normal memcpy function call isn't optimized for known size. That's where the rep mov/rep stos comes from.

21:48 <heat> mjg: my kernel has a bunch of rep movsb but they're way less than a memcpy

21:48 <mjg> mrvn: for example https://android.googlesource.com/platform/bionic/+/refs/heads/master/libc/arch-x86_64/string/sse2-memset-slm.S

21:48 <bslsk05> android.googlesource.com: libc/arch-x86_64/string/sse2-memset-slm.S - platform/bionic - Git at Google

21:48 <heat> and generally when size is bounded

21:49 <mjg> mrvn: pick any production memset/memcpy and you will find they don't do rep

21:49 <mjg> mrvn: even intel optimization manuals explicitly state to only do rep after simd becomes too slow

21:49 <heat> yes

21:49 <heat> and that's like 200-300 bytes in

21:49 <mrvn> mjg: I count 5 jumps right there

21:49 <mjg> well ye there are some jumps

21:49 <klange> intel specifically says they aim for rep mov to be the fastest, but have historically fucked it up

21:49 <mjg> and that's still faster than mere rep mov

21:50 <mrvn> mjg: and you can't use sse, only regular regs you told gcc.

21:50 <mjg> for kernel code, sure

21:50 <mjg> but even then it is faster to not use rep up to about 256 bytes

21:50 <mrvn> mjg: gcc is pretty good in knowing the cost of various memcpy() flavours and pretty good at picking the fastest for known sizes. c++ needs that so often that it's the biggest time saver making that good.

21:51 <mjg> note when you do these sizes it's not very jumpy thanks to misaligned + overlapping stores
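
The overlapping-store trick mjg mentions can be illustrated with a small sketch: any length from 8 to 16 bytes is copied with two 8-byte moves whose ranges may overlap, so no per-byte loop or length-dependent branch is needed inside that range. The function name `copy_8_to_16` is hypothetical, and the fixed-size `memcpy` calls are only there to express unaligned 8-byte loads and stores portably; compilers typically lower each to a single `mov`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy n bytes, where 8 <= n <= 16, using exactly two 8-byte loads and two
 * 8-byte stores. The second pair is anchored at the end of the buffer, so
 * for n < 16 it overlaps the first pair instead of needing a tail loop.
 * Both loads happen before either store, as in a typical memcpy fast path.
 */
static void copy_8_to_16(void *dst, const void *src, size_t n)
{
    uint64_t head, tail;

    memcpy(&head, src, 8);                          /* bytes [0, 8)     */
    memcpy(&tail, (const char *)src + n - 8, 8);    /* bytes [n - 8, n) */

    memcpy(dst, &head, 8);
    memcpy((char *)dst + n - 8, &tail, 8);
}
```

A full memcpy built this way branches once per size class (for example 0-7, 8-16, 17-32 bytes) rather than once per byte or word, which is why the small-size paths stay cheap.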

21:51 <mrvn> mjg: is it faster on a pentium?

21:51 <mjg> you mean pentium 1?

21:51 <mjg> i don't know on pre-amd64

21:51 <mrvn> mjg: or even 586.

21:52 <mrvn> mjg: if you are doing 32bit code the cpu target is rather low.

21:52 <mjg> i do remember very old memsets et al were doing hand rolled loops or had a jump table

21:52 <mjg> all my comments were about amd64

21:52 <heat> if your compiler is doing rep movsb of smaller targets, its broken because it knows exactly how big a memcpy needs to be before it uses rep movsb

21:52 <mjg> again see intel optimization manuals

21:52 <mrvn> mjg: if you are doing 64bit code you get a bunch of cpu optimizations for free because all 64bit cpus are more modern.

21:52 <mjg> they literally say to use simd

21:53 <heat> smaller sizes*

21:53 <mjg> and only resort to rep stuff for big sizes

21:53 <heat> i've never seen GCC do what you say it does

21:53 <heat> and I have looked at that specifically

21:53 <mjg> heat: i have to head off in few minutes. i have seen it several times, but don't remember the version number

21:53 <mjg> i'm happy to flame this tomorrow around the same time :)

21:54 <mrvn> mjg: ever seen this? https://en.wikipedia.org/wiki/Duff%27s_device

21:54 <bslsk05> en.wikipedia.org: Duff's device - Wikipedia

21:54 <heat> mjg: https://godbolt.org/z/xrMzczKTb

21:54 <bslsk05> godbolt.org: Compiler Explorer

21:55 <mjg> mrvn: of course

21:55 <heat> actually this is a flawed example, just change the memcpy's length

21:56 <mjg> i did not claim gcc generates inline rep every time

21:56 <mjg> that would be incredibly broken

21:56 <mjg> i did claim it happens sometimes

21:56 <mrvn> heat: try: https://godbolt.org/z/xrMzczKTb

21:56 <mrvn> args, stupid firefox messing up the copy&paste again

21:56 <mjg> afair it had to do with sizes known at compilation time and > 100 bytes

21:56 <mjg> 100ish

21:56 <mrvn> try memcpy(buf, buf2, 64);

21:57 <mrvn> On not-x86_64 the compiler has to know the alignment as well as the size for good code.

21:57 <klange> Duff's device is from the early 80s and has nothing to do with what's fastest for modern CPUs. I don't even think it was built for Intel CPUs at all.

21:57 <heat> AH YES OF COURSE

21:57 <heat> mjg: i know why you're seeing that

21:57 <heat> -Os

21:57 <heat> test it.

21:58 <mjg> no

21:58 <mrvn> mjg: 256 bytes uses vmovdqa, 512 bytes calls memcpy.

21:58 <mjg> i'm confident it was O2

21:58 <mjg> and simd disabled

21:58 <mrvn> mjg: you must have hit a case where it didn't have the register and the "rep" is cheaper than saving and restoring the regs.

21:59 <mjg> that is plausible, sure

22:00 <heat> https://godbolt.org/z/74v5Ej3zx

22:00 <bslsk05> godbolt.org: Compiler Explorer

22:00 <mrvn> mjg: with -mgeneral-regs-only I get a bunch of "mov" with size 128 and memcpy with size 256

22:01 <mrvn> mjg: did you maybe see a strdup()?

22:02 <mjg> no, it was in some random code

22:02 <mjg> i grepped objdump -d

22:02 <mjg> on a big binary

22:02 <mjg> and no, it was not hand-rolled asm there

22:02 <mjg> i have to go now, happy to continue tomorrow same time - 1h

22:03 <heat> mrvn, see my example. it only tries to inline rep movsb when -Os

22:03 <mjg> for -Os that's pretty expected

22:03 <mjg> and i would not complain about this bit :)

22:03 <mrvn> heat: I bet there are cases where rep is better

22:04 <mjg> it is better starting about 2K

22:04 <mjg> if you can use simd

22:04 <mjg> and 256 bytes if you can't

22:04 <mjg> exact numbers depend on the uarch of course

22:05 <mjg> ok i'm really afk now, cheers

22:06 <gog> heat: oh dam it works

22:06 <gog> i had to git am -3 it but it worked lol

22:07 <gog> one conflict where i radically rewrote task.c

22:07 wootehfoot has quit [Read error: Connection reset by peer]

22:07 heat_ has joined #osdev

22:08 heat has quit [Ping timeout: 250 seconds]

22:09 <mrvn> Apropos LTO from earlier. My moose build system allows mixing LTO and non-LTO on a per directory basis. So you can LTO optimize all your heap functions but not pay the compile time to do it all.

22:10 mahmutov has joined #osdev

22:10 <mrvn> even per file.

22:11 <klange> i might want to look at ltoing my kernel again, been a while

22:11 <klange> all my userspace apps are single-source-file and then dynamically linked to libraries that are also single-source-file... except kuroko, which I should also look at ltoing

22:12 <klange> hm, or do I yolo build kuroko as a single source...

22:12 <klange> i yolo build kuroko, nvm

22:13 <heat_> mrvn, theoretically yes, but well crafted memcpys like glibc's are still faster

22:13 <klange> kernel is still built from object files, so it could use lto'ing; libc is too, but I'm super wary of LTOing libc

22:13 heat_ is now known as heat

22:13 <gog> yolo

22:13 <klange> $(CC) -o kuroko kuroko/src/*.c

22:13 <gog> heat: idk if you saw i applied your patch with git am -3 with one conflict and one reversion

22:13 <gog> and it builds and runs

22:14 <klange> give or take some CFLAGS

22:14 <mrvn> heat: memcpy() is so much better that gcc replaces copy loops with it.

22:14 <heat> klange, note that you can do -ffat-lto-objects and your object files get both LTO goo and object files

22:14 <heat> eh, machine code*

22:14 <mrvn> heat: you basically have to for boot.S

22:14 <heat> what?

22:15 <mrvn> heat: well, if you build temp libs for subdirs

22:15 <klange> since I only statically link libc in one place (ld.so) I might be okay? I think I just have outdated paranoia about LTO dropping symbols from a shared lib

22:15 <klange> and/or messing with things that should be [shudders at the thought] interposable

22:16 <heat> gog, where?

22:16 <mrvn> heat: threw me for a while. if you link boot.o bla.o blub.o with LTO the boot.o stuff disappears and you are only left with the ASTs.

22:16 <heat> no?

22:16 <gog> heat: reverted debug.sh and conflict in kc/core/task.c because i refactored the whole thing

22:17 <heat> mrvn: I link plenty of non-LTO stuff with LTO stuff, and everything is included

22:17 <mrvn> klange: if the function is externally visible then it won't get dropped.

22:17 <mrvn> heat: into object files?

22:17 <mrvn> heat: it's different for libs and binaries

22:19 <heat> gog, yeah but where is it

22:19 <gog> idle_task

22:19 <gog> __attribute__((used))

22:20 <gog> or you mean the branch? i haven't pushed it

22:20 <heat> mrvn: hmm, maybe static libs are different, but I'm talking about the final exec since I don't do static libs of directories

22:20 <heat> gog, do the push

22:20 <gog> ehhhh

22:20 <mrvn> gog: [[used]]

22:21 <gog> mrvn: i am using c2x on my gcc branch :o

22:21 <heat> tip: all the changes are probably super compatible with GCC, you just need to change the makefile slightly

22:21 <mrvn> heat: static libs are just ar files of all the .o file. I'm talking .so files.

22:21 <heat> i know they are

22:22 <mrvn> heat: if you link .o files into a larger .o (or .so) file you end up with a non-fat .o file. So only the AST chunks.

22:22 <heat> gog: have a CLANG=1 to use clang, and use x86_64-elf otherwise or something should be fine

22:22 <gog> heat: i'm considering a reworking of my build system soon and that could become part of it

22:23 <heat> i find it to be really useful to compile my stuff with two compilers

22:23 <heat> really irons out all the issues

22:23 <mrvn> heat: at least gcc is too stupid to notice when a .o file has no AST and only the binary code and drops it.

22:23 <heat> I've found gcc LTO to be significantly more picky than LLVM for example

22:23 <mrvn> heat: picky in what way?

22:23 <gog> haven't tried LTO

22:23 <heat> wrt function signatures for example

22:24 <gog> my project is tiny though so it'd probably be feasible

22:24 <mrvn> gog: try it early. less likely to break.

22:24 <heat> if one compilation unit sees void f(int); and the other one sees unsigned long f(unsigned long); gcc complains, clang doesn't

22:25 <mrvn> heat: I would say gcc is correct.

22:25 <heat> yes

22:25 <mrvn> it's undefined behavior.

22:25 <heat> everything I was doing was technically correct wrt ABI, but it's totally undefined

22:26 <mrvn> or actually an error. If you have a duplicate function then it must be identical.

22:26 <heat> my syscall table was just an array of a funcptr that took 6 unsigned longs and returned a long

22:26 <heat> due to the x86_64 ABI, it works, but it wasn't correct

22:26 <heat> I refactored it now, but yeah
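
One well-defined way to build such a table, sketched under assumptions (this is not necessarily how heat refactored it): every table entry has the same real signature and unpacks its own arguments, so no function is ever called through a pointer of a mismatched type. All names here (`syscall_args`, `syscall_dispatch`, `sys_add`, `sys_getzero`) are hypothetical.

```c
#include <stddef.h>

/* Raw register arguments as captured by the syscall entry stub. */
struct syscall_args {
    unsigned long a[6];
};

/* Every entry in the table really has this type. */
typedef long (*syscall_fn)(const struct syscall_args *);

/* The "real" implementation keeps its natural, typed signature... */
static long do_add(int x, int y)
{
    return (long)x + y;
}

/* ...and a thin wrapper converts from raw registers explicitly. */
static long sys_add(const struct syscall_args *args)
{
    return do_add((int)args->a[0], (int)args->a[1]);
}

static long sys_getzero(const struct syscall_args *args)
{
    (void)args;
    return 0;
}

static const syscall_fn syscall_table[] = {
    [0] = sys_getzero,
    [1] = sys_add,
};

/* Returns -1 for unknown numbers (a placeholder for an ENOSYS-style error). */
long syscall_dispatch(unsigned long nr, const struct syscall_args *args)
{
    size_t count = sizeof syscall_table / sizeof syscall_table[0];

    if (nr >= count || !syscall_table[nr])
        return -1;
    return syscall_table[nr](args);
}
```

Because the declared and actual types now agree in every translation unit, GCC's LTO type-mismatch diagnostics have nothing to complain about, and the scheme no longer depends on how a particular ABI passes register arguments.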

22:26 <mrvn> heat: only works for register arguments and not with vargs

22:27 <mrvn> very fragile

22:27 <heat> why would I use varargs on a system call

22:27 <mrvn> just saying

22:27 <mrvn> .oO(sys_print)

22:27 <heat> cursed

22:27 <mrvn> or sys_log.

22:28 <heat> all POSIX functions that say they take varargs end up resolving the varargs in user-space

22:28 <heat> like open(2) for instance

22:28 <mrvn> heat: my OS isn't POSIX: :)

22:28 <gog> i think my next step is user mode

22:28 <klange> do iiiit

22:29 <gog> i figured out syscall/sysret already

22:29 <heat> emperor palpatine vibes

22:29 <gog> i just haven't implemented anything for it

22:29 <mrvn> gog: you have threads without user space?

22:29 <gog> i have kernel threads rn

22:30 <gog> or tasks or whatever you wanna call them

22:30 <heat> well, go to user space

22:30 <mrvn> gog: no wonder your threads work. :)

22:30 <heat> if that's what you want to do that is

22:30 <gog> i'm positive userspace is gonna break shit

22:30 <gog> which is fine

22:30 <kazinsal> userspace contains users, and users break shit, so this tracks

22:31 <gog> yes

22:31 <heat> there's plenty of stuff you can still do in kernel space

22:31 <gog> probably

22:31 <heat> it turns out

22:31 <gog> i need to also refactor my memory management

22:31 <heat> kernel dev never ends

22:31 <heat> it's an unending cycle

22:31 <gog> there's a lot of TODO comments everywhere

22:31 <heat> just like osdev

22:31 <gog> i just don't wanna die without badly reimplementing UNIX

22:32 <heat> what you do need is ACPI

22:32 <heat> also PCI

22:32 <gog> yeah i've been investigating ACPI

22:32 <klange> still don't have much acpi beyond "where cores plz"

22:32 <heat> do you not handle pci irq routing?

22:32 <gog> i don't have any generic synchronization primitives yet

22:32 <mrvn> gog: the compiler has

22:32 <heat> oh god the sync primitives ahh

22:33 <mrvn> gog: don't write your own

22:33 <gog> ACPICA wants a few of them iirc

22:33 <klange> i have a chipset driver for PIIX that gets me pretty far

22:33 <heat> mrvn, GCC doesn't include a mutex implementation i think

22:33 <mrvn> who needs a mutex. bah. bad code.

22:34 <heat> klange: you're doing toaru without any 3rd party code right?

22:34 <klange> aye

22:34 <klange> acpica is verboten

22:34 <heat> does that count in the bootloader?

22:34 <klange> my bootloader is all mine as well these days

22:34 <heat> idea: parse ACPI into a device tree

22:35 <heat> like an actual fdt

22:35 <klange> that's not a terrible idea, really

22:35 <klange> especially in EFI where it's more necessary to have that stuff and you've got some functions available for poking it...

22:35 <heat> do you?

22:36 <klange> nope, but I may actually think about it when my fdt parsing improves

22:36 JanC_ has joined #osdev

22:37 JanC has quit [Killed (calcium.libera.chat (Nickname regained by services))]

22:37 JanC_ is now known as JanC

22:38 <gog> there's only 4 functions in the acpi protocol

22:38 <gog> iirc

22:38 <gog> they're not very useful

22:38 <heat> UEFI code parses a device tree into AML btw

22:38 <gog> find rsdp, load a table, uhh

22:39 <heat> it does the reverse

22:39 <gog> ah i see

22:40 <heat> my idea was: black box inside the bootloader where you *can* use 3rd party code, the kernel just uses regular fdt code

22:40 <heat> if that doesn't work, welp

22:40 <klange> I will, eventually, write my own AML interpreter.

22:40 <klange> Probably after I finish my truetype hinter.

22:41 <heat> can you enforce MSI only?

22:41 <heat> might work better

22:41 <heat> i know vms are weird with that stuff, so you could use the chipset driver

22:42 <gog> klange: will you have truetype bytecode interpretation in-kernel like NT did

22:42 <gog> it's a great idea

22:42 <heat> if(!enable_msi()) do_legacy() where do_legacy would panic on non-PIIX chipsets

22:42 <klange> no :P my kernel only has a bitmap font thing for its debug output, my truetype lib is userspace

22:43 <heat> modern PCs have PCIe, and PCIe requires MSI afaik

22:43 <mrvn> monochrome?

22:44 <gog> reject modernity, return to ISA

22:44 <gog> no PnP BIOS

22:44 <gog> jumpers baybeee

22:45 <klange> mrvn: not sure what you're asking

22:45 <mrvn> klange: if it's a monochrome font or greyscale

22:45 <klange> The kernel's text driver is 'monochrome' in that the bitmaps are either on or off, though it does support color and has a tiny DEC parser

22:45 <klange> The TrueType implementation, while currently unhinted, is antialiased.

22:46 <klange> It replaced a signed-distance-field renderer (that used baked signed-distance-fields stored as PNGs, previously BMPs)

22:47 <klange> Which itself replaced FreeType as part of aforementioned "no third-party code" project.

22:47 <heat> gog, hmmm that would kinda work too though

22:48 <klange> The current TrueType implementation only really looks marginally better than the SDF renderer as far as individual glyph presentation, but it had huge wins for kerning and basic Unicode support

22:48 <heat> each driver implements probe_irq(), which generates an IRQ, the irq subsystem would find out which IRQ the device is mapped to

22:48 <heat> checkmate ACPI

22:49 <klange> it was a pain in the ass to bake the glyph spritesheets for the SDF library, and I gave up on it before implementing any sort of glyph index tables or rect-packing optimizations, so it was infeasible to convert whole fonts, only basic ASCII subsets

22:49 <klange> now I just plop the real Deja Vu fonts in and I'm good - only third-party thing I use if you can accept/except libgcc :)

22:50 <heat> you could probably skip libgcc too

22:51 <klange> I can probably provide my own implementation of the necessary functions, but convincing gcc to universally not produce it for userspace things is... not fun.

22:51 <heat> init/fini_array take care of the constructors, and libgcc's interface isn't that complex, at least for modern 64-bit architectures

22:51 <mrvn> klange: reimplementing the gcc specific stack unwinding and exceptions and such is no fun

22:51 <klange> I don't actually think libgcc is providing anything init-related, it's mostly stupid things gcc can't be arsed to spit out directly

22:52 <heat> it includes crtbegin/end

22:52 <heat> which implements _init, etc

22:52 <klange> the stack unwinding and exception support mrvn mentions is probably the biggest "fuck this shit" part

22:53 <klange> there's no init code in crtbegin

22:53 <klange> ironically it has dtor code

22:54 <klange> deregister_tm_clones, register_tm_clones, __do_global_dtors_aux, and frame_dummy are all that's in there

22:54 <klange> crtend is... effectively empty

22:54 <heat> oh right right

22:55 <heat> _init and _fini are defined by crti.o

22:55 <heat> but crtstuff has all the good stuff

22:55 <heat> that's what goes into .init and .fini

22:56 <klange> crtstuff is a macro that pulls in those other things, and crti, n, and 0 are 'user'-provided, so those are already mine :)

22:57 <heat> crt1 > crt0

22:57 <heat> fite me

22:57 <klange> no no, your math checks out...

22:59 <klange> I happily expunged libgcc from the kernel, at least, on x86-64 and aarch64, with the right flags.

22:59 <klange> I abandoned 32-bit x86, but I recall divs being the killer thing that would drag in libgcc there

23:00 <mrvn> klange: x/10 in printfs kills me on ARM

23:00 <klange> I do not get those refs on aarch64

23:00 <mrvn> klange: had to implement my own div_10 that uses shifts and masks.

23:00 <mrvn> klange: aarch64 has a 64bit div opcode I assume
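
[editor's sketch] mrvn's own div_10 is not shown in the log. One well-known way to divide by 10 with only shifts and adds is the approximation-plus-correction routine from Hacker's Delight, sketched below for 32-bit unsigned values. The name div10 is an assumption; the point is only that no divide instruction (and so no libgcc division helper) is needed.

```c
#include <stdint.h>

/* Unsigned divide by 10 using only shifts, adds and a compare.
 * q first approximates n * 0.8, then is shifted down to roughly n / 10;
 * the estimate can be one too low, which the remainder check corrects. */
uint32_t div10(uint32_t n) {
    uint32_t q = (n >> 1) + (n >> 2);   /* q ~= n * 0.75 */
    q += q >> 4;                        /* refine towards n * 0.8 */
    q += q >> 8;
    q += q >> 16;
    q >>= 3;                            /* q ~= n / 10, possibly one short */
    uint32_t r = n - (((q << 2) + q) << 1);  /* r = n - q * 10 without a multiply */
    return q + (r > 9);                 /* bump q if the remainder is still >= 10 */
}
```

A printf-style digit loop can then use div10(x) for the quotient and x - 10 * div10(x) (again as shifts and adds) for the digit.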

23:01 <heat> klange, do your atomics work? I've seen some atomics code for arm64 in compiler-rt

23:01 <klange> yes, I had to add flags for it, though

23:02 <heat> lse?

23:02 <klange> dunno about clang/compiler-rt, but for gcc it was because it wanted to check if it could do 8.1 atomics and if I forced it not to it would directly emit instructions and be good to go

23:02 <heat> yes exactly

23:03 <heat> compiler-rt and libgcc are completely compatible btw

23:03 <heat> clang even defaults to libgcc in linux for example

23:03 <klange> -mno-outline-atomics

23:04 <heat> yeah I see that

23:04 <heat> Out-of-line LSE atomics helpers

23:04 <klange> I've also got -ffixed-x18 thanks to geist, doing all my CPU-local stuff as direct register references

23:05 <heat> how does that work?

23:05 <mrvn> heat: gcc forgets x18 exists and you can use it in asm however you want

23:06 <heat> isn't there an abi register for the tp?

23:06 <klange> yes, but using a general reg is more flexible for code

23:06 dude12312414 has quit [Quit: THE RAM IS TOO DAMN HIGH]

23:06 <heat> why?

23:07 <klange> because system regs need mrs/msr instructions but general reg can be used in other operations directly

23:07 <heat> oh

23:07 <heat> how does user-space do TLS then?

23:08 <klange> it's unclear to me whether this is actually faster or if it's just saving a few bytes of code here and there, but it allows for `register struct ProcessorLocal * this_core asm("x18");`

23:09 <heat> ok it also uses mrs

23:09 <klange> I think userspace TLS is generally loading from the thread pointer register once and then just treating it as constant from then on, but for the ^ CPU-local case that's not feasible because it can change in a function call, so there would be way more loads necessary

23:11 <heat> well if it changes in a function call, that seems pretty broken to me

23:11 <mrvn> kernel or user code?

23:11 <heat> generally when you're doing percpu accesses you disable preemption, so you can't migrate

23:11 <klange> In kernel code, a function call may end up calling a yield, and when it returns you can be on a different CPU.

23:12 <heat> yes but you disable preemption so that doesn't work

23:12 <heat> it's how linux works at least

23:12 <mrvn> heat: and then the kernel dies with "Aieeee, scheduling in interrupt. Not syncing."

23:12 <heat> all percpu accesses need preemption disabled

23:12 <klange> I do not pre-empt kernel code, but that doesn't matter, a _function call_ may return on a different CPU.

23:13 <heat> mrvn, hm?

23:13 <klange> so if I want this_core->foo and then block_on_thing(), then this_core->bar afterwards may be a different CPU's bar
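
[editor's sketch] The hazard klange describes can be simulated portably. This is only an illustration under stated assumptions: a volatile global stands in for the x18 register variable, the struct fields are made up, and block_on_thing() here simply pretends the thread was rescheduled onto CPU 1. It shows why a pointer cached before the call can be stale afterwards, while re-reading this_core after the call picks up the new CPU.

```c
/* Hypothetical per-CPU structure; field names are illustrative only. */
struct ProcessorLocal {
    int cpu_id;
};

struct ProcessorLocal cores[2] = { { 0 }, { 1 } };

/* Stand-in for `register struct ProcessorLocal *this_core asm("x18");`.
 * volatile forces a fresh read on every use, like reading the register. */
struct ProcessorLocal *volatile this_core = &cores[0];

/* Simulated blocking call: the thread resumes on CPU 1. */
void block_on_thing(void) {
    this_core = &cores[1];
}

/* Caches the per-CPU pointer before blocking: reports the old CPU. */
int stale_cpu_after_block(void) {
    this_core = &cores[0];                 /* start on CPU 0 */
    struct ProcessorLocal *cached = this_core;
    block_on_thing();
    return cached->cpu_id;
}

/* Re-reads this_core after blocking: reports the CPU we resumed on. */
int fresh_cpu_after_block(void) {
    this_core = &cores[0];                 /* start on CPU 0 */
    block_on_thing();
    return this_core->cpu_id;
}
```

In this simulation stale_cpu_after_block() still sees CPU 0's data, while fresh_cpu_after_block() sees CPU 1's.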

23:13 <mrvn> heat: calling yield in a critical section is not a good idea

23:13 <heat> well, don't call yield

23:14 <klange> no I have no multitasking :P

23:14 <klange> now*

23:15 <heat> https://elixir.bootlin.com/linux/latest/source/include/linux/percpu-defs.h#L275

23:15 <bslsk05> elixir.bootlin.com: percpu-defs.h - include/linux/percpu-defs.h - Linux source code (v5.16.10) - Bootlin

23:22 <mrvn> klange: for that I want "volatile struct PerCpu __seg_x18 *this_core;"