#scopehal on 2023-09-12 — irc logs at libera.irclog.whitequark.org

2022-03-25 21:41 azonenberg changed the topic of #scopehal to: libscopehal, libscopeprotocols, and glscopeclient development and testing | https://github.com/glscopeclient/scopehal-apps | Logs: https://libera.irclog.whitequark.org/scopehal

00:27 t4nk_freenode has quit [Quit: ZNC 1.8.2 - https://znc.in]

00:29 t4nk_freenode has joined #scopehal

00:29 azonenberg has joined #scopehal

00:34 veegee has joined #scopehal

00:47 <azonenberg> Welp

00:47 <azonenberg> Back online after an unplanned lab shutdown lol

00:47 <azonenberg> Hit the UPS bypass switches in the wrong order and Bad Things(tm) happened

00:48 <azonenberg> Appears the only casualty was a hepa air cleaner and the aging SBC I was using as a serial console server

00:48 <azonenberg> anyway, since everything was down I put the GPUs in the xen server

00:48 <azonenberg> Attached one to a VM and it enumerates but i'm still testing to see if i'll have full vulkan etc in the VM

00:48 <azonenberg> (gotta finish bringing other services online still)

00:56 benishor_ has joined #scopehal

00:56 benishor has quit [Quit: tah tah!]

00:56 benishor_ is now known as benishor

00:56 <Darius> oops

00:56 Degi_ has joined #scopehal

00:56 Degi has quit [Ping timeout: 246 seconds]

00:56 Degi_ is now known as Degi

01:55 <d1b2> <azonenberg> @johnsel

01:55 <d1b2> <azonenberg> https://cdn.discordapp.com/attachments/776941750291267595/1150972799020695584/ngscopeclient-in-vm.png

01:55 <d1b2> <johnsel> in a vm?

01:55 <d1b2> <azonenberg> That's a xen instance, yes

01:55 <d1b2> <johnsel> hurray

01:55 <d1b2> <johnsel> shame after all my hours you took this moment of glory away from me

01:55 <d1b2> <azonenberg> One card is currently attached to that vm, one is free. one of the amd cards is present in the chassis but i think that pcie slot is unavailable due to conflicts with the m.2 or something

01:55 <d1b2> <johnsel> but I'm not complaining

01:56 <d1b2> <johnsel> well, I am, lol, but still :p

01:56 <d1b2> <azonenberg> all i had to do was disable nouveau and throw the nvidia blob on it

01:56 <d1b2> <azonenberg> anyway that's one of my test instances that i'll shut down shortly. the second card is currently uncommitted

01:56 <d1b2> <azonenberg> let me know if you need me to attach a card to a vm for you, i forget if you can do that

01:56 <d1b2> <johnsel> yeah NVidia drivers are pretty great nowadays under Linux, they stuffed everything in the on-device firmware

01:57 <d1b2> <azonenberg> yeah i was impressed. i remember going amd originally for this box because of how anti-virtualization nvidia had been

01:57 <d1b2> <johnsel> yeah it's a shame AMD is lagging behind so much on this front

01:57 <d1b2> <azonenberg> anyway, one step closer to working CI

01:57 <d1b2> <azonenberg> let me know what support if any you need

01:57 <d1b2> <johnsel> they seem much more interested in putting their SoCs in Teslas

01:58 <d1b2> <johnsel> yes I do need your help getting it attached

01:58 <d1b2> <johnsel> I have permission to them but can't seem them due to the stupid UI

01:58 <d1b2> <johnsel> unless they patched that

01:58 <d1b2> <johnsel> which our repo based instance of the management software doesn't allow

01:59 <d1b2> <johnsel> let me get connected to the VPN real quick and let you know what I need

02:13 <d1b2> <johnsel> can you pm me the ip address of 'my' xoa instance?

02:14 <d1b2> <johnsel> and dns please

02:16 <d1b2> <johnsel> @azonenberg

05:02 benishor has quit [*.net *.split]

05:02 azonenberg has quit [*.net *.split]

05:02 veegee has quit [*.net *.split]

05:02 t4nk_freenode has quit [*.net *.split]

05:02 Bird|otherbox has quit [*.net *.split]

05:02 tnt has quit [*.net *.split]

05:02 florolf has quit [*.net *.split]

05:02 d1b2 has quit [*.net *.split]

05:02 syscall has quit [*.net *.split]

05:02 Stary has quit [*.net *.split]

05:02 mxshift has quit [*.net *.split]

05:02 lethalbit has quit [*.net *.split]

05:02 Ekho has quit [*.net *.split]

05:02 gruetzkopf has quit [*.net *.split]

05:02 elms has quit [*.net *.split]

05:02 anuejn has quit [*.net *.split]

05:02 Yamakaja has quit [*.net *.split]

05:02 electronic_eel has quit [*.net *.split]

05:02 juh has quit [*.net *.split]

05:02 davidc__ has quit [*.net *.split]

05:02 esden has quit [*.net *.split]

05:02 mithro has quit [*.net *.split]

05:02 welterde has quit [*.net *.split]

05:02 Stephie has quit [*.net *.split]

05:02 juri_ has quit [*.net *.split]

05:02 Fridtjof has quit [*.net *.split]

05:02 sgstair has quit [*.net *.split]

05:02 vup has quit [*.net *.split]

05:02 benishor has joined #scopehal

05:02 veegee has joined #scopehal

05:02 t4nk_freenode has joined #scopehal

05:02 azonenberg has joined #scopehal

05:02 Bird|otherbox has joined #scopehal

05:02 tnt has joined #scopehal

05:02 d1b2 has joined #scopehal

05:02 florolf has joined #scopehal

05:02 vup has joined #scopehal

05:02 lethalbit has joined #scopehal

05:02 syscall has joined #scopehal

05:02 mxshift has joined #scopehal

05:02 Ekho has joined #scopehal

05:02 Stephie has joined #scopehal

05:02 gruetzkopf has joined #scopehal

05:02 anuejn has joined #scopehal

05:02 Yamakaja has joined #scopehal

05:02 electronic_eel has joined #scopehal

05:02 elms has joined #scopehal

05:02 Stary has joined #scopehal

05:02 sgstair has joined #scopehal

05:02 Fridtjof has joined #scopehal

05:02 juh has joined #scopehal

05:02 juri_ has joined #scopehal

05:02 davidc__ has joined #scopehal

05:02 esden has joined #scopehal

05:02 mithro has joined #scopehal

05:02 welterde has joined #scopehal

06:42 <_whitenotifier-1> [scopehal-apps] azonenberg pushed 2 commits to master [+0/-0/±4] https://github.com/glscopeclient/scopehal-apps/compare/1e326a3735c2...9f5f998e1b45

06:42 <_whitenotifier-1> [scopehal-apps] azonenberg 3e60ab7 - Fixed missing space in appdate field

06:42 <_whitenotifier-1> [scopehal-apps] azonenberg 9f5f998 - Initial serialization work on trigger groups (and manage instruments dialog). Fixes #607.

06:42 <_whitenotifier-1> [scopehal-apps] azonenberg closed issue #607: Serialization support for trigger groups - https://github.com/glscopeclient/scopehal-apps/issues/607

07:40 bvernoux has joined #scopehal

08:26 bvernoux has quit [Read error: Connection reset by peer]

14:11 sgstair has quit [Server closed connection]

14:11 sgstair has joined #scopehal

19:52 bvernoux has joined #scopehal

20:09 <d1b2> <246tnt> I just switched my RX570 for a RX6600 and now I can run ngscopehal without kernel panic 😁

20:10 <d1b2> <246tnt> Also doesn't crash when zooming out 🤔

20:15 <t4nk_freenode> hey, @246tnt ... I had the same thing with my rx580 last time I tried, haven't dared trying again since, since I like to keep my machine alive ;)

20:16 <t4nk_freenode> I was just about to ask how things were in this regard, but apparently the issues still exist

20:16 <d1b2> <246tnt> Yeah, me too, I tried a couple of times with that card, but didn't try to debug it much.

20:17 <t4nk_freenode> how's that rx6600 for you? is it much better than the rx570?

20:17 <d1b2> <246tnt> To be fair, I had fairly old distro ( 20.04 ) so kernel ( drm ) and mesa ( radv vulkan driver ) werent very fresh so maybe it would work with more recent ones but ...

20:18 <t4nk_freenode> no man, I'm using gentoo with the latest of everything and I had the same

20:18 <d1b2> <246tnt> I literally put it in the PC like 3h ago and just finished upgrading the OS a bit so that I can actually use it, so don't have much impression yet 😅

20:18 <azonenberg> well we also fixed a bunch of bugs around AMD stuff recently

20:18 <d1b2> <david.rysk> userspace shouldn't be causing panics

20:19 <d1b2> <david.rysk> I feel like upstream would be interested in that

20:19 <d1b2> <246tnt> Well, I couldn't report anything because with versions I was running all that upstream would say is "update" and then report 😅

20:19 <d1b2> <david.rysk> there are PPAs available with updated versions for ubuntu, I'm pretty sure

20:39 <d1b2> <246tnt> yeah, most didn't go back to 18.04 even the kernel was a problem to update because of libc deps. Believe me, before I upgraded to 20.04 I tried some option to avoid that because I didn't have enough disk space for the update so I had to move to another drive which was a pain ...

20:52 t4nk_freenode is now known as t4nk_fn

21:02 <_whitenotifier-1> [scopehal] azonenberg pushed 1 commit to master [+0/-0/±3] https://github.com/glscopeclient/scopehal/compare/39aff5824827...956152729151

21:02 <_whitenotifier-1> [scopehal] azonenberg 9561527 - SubtractFilter: correctly handle trigger phase offset as long as sample rates are equal. Fixes #609.

21:02 <_whitenotifier-1> [scopehal-apps] azonenberg pushed 1 commit to master [+0/-0/±4] https://github.com/glscopeclient/scopehal-apps/compare/9f5f998e1b45...32e9c0d909d1

21:02 <_whitenotifier-1> [scopehal-apps] azonenberg 32e9c0d - Updated submodules, added restrict read/writeonly qualifiers to SSBOs for deskew shader

21:13 <_whitenotifier-1> [scopehal-apps] azonenberg closed issue #609: Subtract filter: support trigger phases - https://github.com/glscopeclient/scopehal-apps/issues/609

21:34 <d1b2> <hansemro> ngscopeclient (master) should be fixed for RX4XX/RX5XX cards as well. Just tested with RX480.

21:34 <d1b2> <hansemro> https://cdn.discordapp.com/attachments/776941750291267595/1151269728128614571/image.png

21:35 <azonenberg> which reminds me, we still have a lot of shaders i need to refactor to support 2D work groups

21:35 <azonenberg> right now we assume we can run arbitrarily many groups in the X axis, which is true on nvidia (2^31 max groups)

21:35 <azonenberg> but most nvidia/intel cards have much lower limits

21:36 <azonenberg> most amd/intel*

21:36 <azonenberg> so working with hundred megapoint to gigapoint waveforms will require 2D dispatches

21:36 <d1b2> <246tnt> @hansemro Mmmm ... do you know which commit fixed it ?

21:36 <d1b2> <hansemro> https://github.com/glscopeclient/scopehal/pull/790

22:08 <t4nk_fn> @hansemro are we talking about the same thing? because I don't know what 'incorrect Vulkan queue type used' means

22:08 <t4nk_fn> but I was talking of a vicious system crash

22:09 <d1b2> <hansemro> Yes, this is the same issue. Mesa-radv driver would crash the kernel when using the wrong queue

22:09 <t4nk_fn> not just a black screen and unresponsive, but the whole system crashing, it was really scary

22:09 <d1b2> <hansemro> Interestingly, AMD's vulkan driver (amdvlk) does not crash, but would abort cleanly

22:13 <t4nk_fn> well, I'll try again sometime soon, see if I can rebuild and if it works for my 580 too

22:23 <d1b2> <hansemro> This is odd, but using amdvlk vulkan driver displays a larger compute group size compared to mesa-radv: 4294967295 vs 65535

22:23 <d1b2> <hansemro> https://cdn.discordapp.com/attachments/776941750291267595/1151282048443629568/image.png

22:24 <d1b2> <hansemro> https://cdn.discordapp.com/attachments/776941750291267595/1151282170879557673/image.png

22:27 <azonenberg> interesting

22:27 <azonenberg> so perhaps thats a mesa limitation. either way we need to support the smaller group sizes on intel

22:28 <t4nk_fn> so those 'amdvlk' and mesa-radv ... they are part of mesa or so?

22:29 <t4nk_fn> I think I have xf86-video-amdgpu and mesa master on my system

22:29 <d1b2> <johnsel> https://cdn.discordapp.com/attachments/776941750291267595/1151283543658483712/image.png

22:30 <t4nk_fn> + + vulkan : Add support for 3D graphics and computing via the Vulkan cross-platform API

22:31 <t4nk_fn> + + video_cards_radeonsi

22:32 <d1b2> <johnsel> I think mesa-radv is the love child of Valve and others wanting to support vulkan on AMD SoC for their projects

22:32 <d1b2> <johnsel> amdvlk is AMD's own driver

22:34 <d1b2> <johnsel> AMD drivers are tuurrrrible though regardless. There are many videos of geohot debugging crash after crash for his tinygrad project

22:34 <d1b2> <johnsel> that said, he met with lisa su and they are actively working to better it

22:35 <d1b2> <johnsel> but they are really focussing on specific SoC based stuff, Tesla, Valve's game thing, Xbox and PS5

22:36 <d1b2> <johnsel> and now AI

22:37 <azonenberg> johnsel: that in the VM?

22:37 <d1b2> <johnsel> it sure is

22:37 <azonenberg> :D

22:37 <azonenberg> awesome

22:37 <azonenberg> So what's left to be able to run full CI jobs with vulkan? and where are we on the linux instance?

22:37 <d1b2> <johnsel> https://youtu.be/Mr0rWJhv9jU?t=689

22:38 <d1b2> <johnsel> https://youtu.be/Mr0rWJhv9jU?t=1000

22:38 <d1b2> <johnsel> very funny

22:39 <d1b2> <johnsel> honestly I have to review what we need to do myself, but we're fairly close to running our own CI jobs

22:39 <azonenberg> awesome :D

22:39 <azonenberg> I may buy some cheap siglent gear to use for hardware in loop tests eventually. i already have a SPD3303X-E power supply i'm not using most of the time

22:40 <t4nk_fn> holy smokes, that guy has issues

22:40 <t4nk_fn> other than a kernel panic

22:40 <d1b2> <johnsel> well look up who he is, he's a very respected hacker

22:40 <azonenberg> geohot?

22:40 <azonenberg> he's smart but he does in fact have issues :p

22:40 <d1b2> <johnsel> but personality wise yeah the dude is mental

22:41 <d1b2> <johnsel> oh yeah no doubt about it

22:41 <azonenberg> i've met him, he's a character

22:41 <azonenberg> why are the best hackers always absolutely insane?

22:41 <d1b2> <johnsel> comes with the territory

22:41 * azonenberg looks in general direction of chris tarnovsky

22:44 <d1b2> <johnsel> and to answer your other question azonenberg, can you swap the GPUs again?

22:44 <d1b2> <johnsel> also note I still have to test running stuff headless

22:45 <d1b2> <johnsel> I was able to get GPU acceleration over RDP but not sunshine (which hooks the GPU driver directly, normally, but could not now for some reason)

22:45 <azonenberg> swap the edid emu again? ok

22:45 <d1b2> <johnsel> but it's probably easier to get a good overview of all those tasks with the edid emulators in place, and with a good test protocol

22:45 <azonenberg> Done

22:45 <azonenberg> I have one for the other card coming any minute now, still shows out for delivery

22:45 <d1b2> <johnsel> we may also need that edid in place during boot

22:46 <d1b2> <johnsel> which is not ideal

22:46 <d1b2> <johnsel> but I guess it is what it is

22:47 <azonenberg> well long term there will be one dedicated to each card

22:47 <azonenberg> moving them is just temporary for the next few hours :p

22:47 <d1b2> <johnsel> yep yep, but it may need to be in there when the system boots

22:47 <d1b2> <johnsel> because the firmware might be onto us

22:47 <d1b2> <johnsel> the RDP working but sunshine not is kinda weird

22:48 <d1b2> <johnsel> there is another remote gaming app I can try that might work but it would be easiest to exclude all possible reasons for that happening

22:49 <d1b2> <johnsel> the internal console connects to the virtio gpu(?? or whatever else xen uses)

22:50 <d1b2> <johnsel> anyway if you could pretty please do a full server reboot once the edid emulators are in place then I can be sure I have to look on the software side

22:50 <d1b2> <johnsel> I think before the end of this year we will have CI fully operational

22:50 <d1b2> <johnsel> will that be usb or ethernet hardware by the way?

22:52 <d1b2> <johnsel> also normal trigger mode hangs ngscopeclient 😦

22:52 <d1b2> <johnsel> at least with demoscope

22:52 <d1b2> <johnsel> I can try with my RIgol later

22:52 <d1b2> <johnsel> I'm bouncing between tasks, can't get anything done this way

23:01 <d1b2> <johnsel> also azonenberg, suppose I would bridge the VPN with my internal ethernet that is dedicated to the Rigol (I am grateful for them sending it, but that thing is not going on my LAN lol) would the vpn give out an IP to it?

23:01 <d1b2> <johnsel> or do we need a full s2s config for that?

23:02 <d1b2> <johnsel> mm better to port forward

23:18 <d1b2> <louis8374> This would work for me

23:18 <azonenberg> a full host reboot?

23:19 <azonenberg> that would be a pain

23:19 <azonenberg> as far as vpn, the config i have right now hands out one IP to each client but i can add routing rules for subnets

23:20 <azonenberg> vpn endpoints go in 10.255.2.x/24 and actual site systems are 10.site.subnet.host

23:20 <azonenberg> with the CI environment living in site #2

23:20 <azonenberg> so i could assign a site number to your test network and add routing rules for it, open scpi port traffic to it, etc

23:21 <azonenberg> it'd take a bit of setup work but is absolutely doable and i have peering arrangements over the same vpn with other people

23:25 bvernoux has quit [Quit: Leaving]

23:40 <d1b2> <johnsel> yeah...

23:40 <d1b2> <johnsel> did you boot at least 1 GPU with edid emu?

23:41 <azonenberg> The one i had on fpgadev before. which i think is the one now attached to your linux builder

23:41 <azonenberg> whats sunshine?

23:41 <azonenberg> i'm just using ssh+vnc to the linux test system on my end

23:41 <d1b2> <johnsel> can you swap the GPUs between the VMs?

23:41 <azonenberg> I cant control which gpu goes to which vm

23:42 <d1b2> <johnsel> can kill both v

23:42 <azonenberg> i just ask for "a GPU of this type"

23:42 <azonenberg> and i have no idea which one gets attached

23:42 <d1b2> <johnsel> that sucks

23:42 <azonenberg> i guess i could force that a bit

23:42 <azonenberg> shut down one of yours, i attach to another instance

23:42 <azonenberg> shut down the other one, leaving one free

23:42 <azonenberg> now the next one to start has to get the free card

23:42 <azonenberg> etc

23:42 <d1b2> <johnsel> not sure that works but you have more vision into the pci endpoints

23:43 <azonenberg> but they're supposed to be equivalent, it shouldnt matter which gets what :p

23:43 <d1b2> <johnsel> sunshine is a remote desktop app

23:43 <azonenberg> if only one of the two cards is attached and the other is free

23:43 <d1b2> <johnsel> but it hooks into low level GPU driver paths

23:43 <azonenberg> any vm that starts will get the free one

23:43 <azonenberg> the idea is to keep exactly one card free at all times so yo ucan "hand off" the card from one vm to another

23:43 <azonenberg> with a third vm it should be possible to exchange cards between two

23:43 <d1b2> <johnsel> https://github.com/LizardByte/Sunshine

23:44 <d1b2> <johnsel> well let's give it a try

23:44 <azonenberg> ok so shut down one instance

23:44 <d1b2> <johnsel> the problem I am facing is running things headless

23:44 <d1b2> <johnsel> can you do it? I'm eating

23:44 <d1b2> <johnsel> you can just kill the box

23:44 <azonenberg> Gimme a few. Or i could just wait until the edid emulators get here if you arent in a rush?

23:45 <d1b2> <johnsel> I'm not in a rush, but the edid emulator may be necessary at boot for it to initialize fully

23:46 <azonenberg> We'll deal with that if we have to but i have enough other prod stuff on this same host i'd rather not reboot it if i can avoid it

23:46 <azonenberg> long term i kinda want to get a second xen server so i can do failover. the second one wouldn't even be running all the time

23:47 <azonenberg> i'd turn it on, migrate stuff onto it, shut down the primary, work on it

23:47 <azonenberg> then i could run both if i had demand scaling beyond what one could handle

23:47 <azonenberg> but thats expensive enough i'm not doing it yet :p

23:48 <d1b2> <johnsel> I understand, but I am having crashes via RDP and the sunshine tool and nvidia tool show the card is connected but I am not connected to the framebuffer

23:48 <d1b2> <johnsel> sunshine should be able to do that

23:48 <azonenberg> interesting

23:48 <d1b2> <johnsel> so it may only be partially functional over RDP

23:48 <d1b2> <johnsel> and RDP has it's own display driver that does things

23:48 <d1b2> <johnsel> so it can talk to the nvidia driver and get it to do some things

23:49 <d1b2> <johnsel> render in blender works fine

23:49 <d1b2> <johnsel> so strange behavior all over, and my experience has taught me that that is the card or drivers only half initializing

23:50 <azonenberg> i assume you've rebooted the instance already?

23:50 <d1b2> <johnsel> yes

23:50 <azonenberg> Ok

23:50 <azonenberg> well i can try a host reboot once the edid emus get here i guess

23:51 <d1b2> <johnsel> anyway 1 card should have been booted with edid emu and should be golden as far so we can try to differentially diagnose by swapping cardsd

23:51 * azonenberg pictures flock of large flightless birds with VESA mounts on them

23:51 <d1b2> <johnsel> I did not notice any change in behavior when you swapped the edid

23:51 <d1b2> <johnsel> lol

23:51 <d1b2> <johnsel> that took me a while

23:54 <d1b2> <johnsel> hmm

23:54 <d1b2> <johnsel> food gave me some ideas to test

23:55 <d1b2> <johnsel> we could disable xen GPU and see what happens

23:59 <d1b2> <johnsel> HAH

23:59 <d1b2> <johnsel> fuck