azonenberg changed the topic of #scopehal to: libscopehal, libscopeprotocols, and glscopeclient development and testing | | Logs:
Degi has quit [Ping timeout: 265 seconds]
Degi has joined #scopehal
bvernoux has joined #scopehal
<_whitenotifier-9> [scopehal-docs] bvernoux opened pull request #64: Update section-ng-gettingstarted.tex Add step to apply patch for Windows Mingw64 -
<_whitenotifier-9> [scopehal-apps] bvernoux opened pull request #592: Fix MinGW64 build failure with gslang -
<_whitenotifier-9> [scopehal-apps] mandl opened pull request #593: Fix macos again -
<_whitenotifier-9> [scopehal] mandl opened pull request #778: Add PullPulseWidthTrigger for siglent sds1000 -
<_whitenotifier-9> [scopehal-apps] azonenberg closed pull request #593: Fix macos again -
<_whitenotifier-9> [scopehal-apps] azonenberg pushed 4 commits to master [+0/-0/±3]
<_whitenotifier-9> [scopehal-apps] mandl 723a47d - Update build-macos.yml remove brew update and upgrade
<_whitenotifier-9> [scopehal-apps] mandl c738060 - Merge branch 'glscopeclient:master' into master
<_whitenotifier-9> [scopehal-apps] mandl 65841dd - Update build-macos.yml libomp 16.0.4 is already installed and up-to-date.
<_whitenotifier-9> [scopehal-apps] azonenberg 17e5bb4 - Merge pull request #593 from mandl/master Fix macos again
<_whitenotifier-9> [scopehal-apps] bvernoux edited a comment on issue #590: ngscopeclient Windows / Linux(Ubuntu 23.04) big slow down on some zoom step -
<bvernoux> good news
<bvernoux> As discussed in private with azonenberg
<bvernoux> I have tested switching ngscopeclient to VulkanSDK-
<bvernoux> and everything works fine so far on Windows + mingw64
<bvernoux> It seems to be the same for Linux, as Andrew is also testing it in parallel
<bvernoux> I will also rebuild everything on my native Ubuntu 23.04 with VulkanSDK-
<_whitenotifier-9> [scopehal-apps] bvernoux commented on pull request #592: Fix MinGW64 build failure with gslang -
<_whitenotifier-9> [scopehal-apps] bvernoux closed pull request #592: Fix MinGW64 build failure with gslang -
<_whitenotifier-9> [scopehal-docs] bvernoux commented on pull request #64: Update section-ng-gettingstarted.tex Add step to apply patch for Windows Mingw64 -
<_whitenotifier-9> [scopehal-docs] bvernoux closed pull request #64: Update section-ng-gettingstarted.tex Add step to apply patch for Windows Mingw64 -
<_whitenotifier-9> [scopehal-docs] bvernoux opened pull request #65: Update section-gettingstarted.tex Windows(mingw64) & Linux use now official VulkanSDK -
<_whitenotifier-9> [scopehal-apps] bvernoux forked the repository -
<_whitenotifier-9> [scopehal-apps] bvernoux opened pull request #594: Update GitHub CI build-ubuntu.yml & build-windows.yml to use VulkanSDK -
<_whitenotifier-9> [scopehal-apps] bvernoux edited pull request #594: Update GitHub CI build-ubuntu.yml & build-windows.yml to use VulkanSDK -
<_whitenotifier-9> [scopehal-apps] bvernoux commented on issue #561: Compilation error on Ubuntu 20.04 -
<_whitenotifier-9> [scopehal-apps] bvernoux edited a comment on issue #561: Compilation error on Ubuntu 20.04 -
<_whitenotifier-9> [scopehal-docs] azonenberg closed pull request #65: Update section-gettingstarted.tex Windows(mingw64) & Linux use now official VulkanSDK -
<_whitenotifier-9> [scopehal-docs] azonenberg pushed 2 commits to master [+0/-0/±2]
<_whitenotifier-9> [scopehal-docs] bvernoux 82de376 - Update section-gettingstarted.tex Update Windows mingw64 & Linux to use official VulkanSDK instead of old VulkanSDK 13.224.1
<_whitenotifier-9> [scopehal-docs] azonenberg a7c0aaf - Merge pull request #65 from bvernoux/master Update section-gettingstarted.tex Windows(mingw64) & Linux use now official VulkanSDK
<_whitenotifier-9> [scopehal] azonenberg commented on pull request #778: Add PullPulseWidthTrigger for siglent sds1000 -
<_whitenotifier-9> [scopehal-apps] azonenberg closed pull request #594: Update GitHub CI build-ubuntu.yml & build-windows.yml to use VulkanSDK -
<_whitenotifier-9> [scopehal-apps] azonenberg pushed 5 commits to master [+0/-0/±6]
<_whitenotifier-9> [scopehal-apps] bvernoux a895bbd - Update build-ubuntu.yml use VulkanSDK
<_whitenotifier-9> [scopehal-apps] bvernoux 841d2a3 - Update build-windows.yml use VulkanSDK
<_whitenotifier-9> [scopehal-apps] bvernoux 9b5f3ee - Update build-ubuntu.yml Add VULKAN_SDK_VERSION with version as environment variable
<_whitenotifier-9> [scopehal-apps] ... and 2 more commits.
<_whitenotifier-9> [scopehal-apps] azonenberg pushed 1 commit to master [+0/-0/±1]
<_whitenotifier-9> [scopehal-apps] azonenberg c07266e - Update to latest scopehal-docs
<_whitenotifier-9> [scopehal-apps] robbederks commented on issue #561: Compilation error on Ubuntu 20.04 -
<_whitenotifier-9> [scopehal-apps] azonenberg pushed 1 commit to master [+0/-0/±3]
<_whitenotifier-9> [scopehal-apps] azonenberg 7f016e2 - Fixed bug causing lots of unnecessary wakeups of WaveformThread redrawing zero changed filters
<azonenberg> bvernoux: 7f016e2 should fix the unnecessary wakeups
<azonenberg> let me know how that works for you
<bvernoux> ha great
<azonenberg> i don't think it will do anything for the hangs/potential race
<azonenberg> (although it may make it trigger less often since there are fewer pointless renders happening)
<bvernoux> built
<bvernoux> ha nice
<bvernoux> Rasterize time and Tone map time stay constant now
<bvernoux> they only change when I zoom in/out or I scroll
<bvernoux> on the problematic zoom steps, though, it is the same as before and everything feels frozen on those 2 specific steps
<bvernoux> anyway it is a very good fix
<azonenberg> Correct. That's the desired behavior, it should only rasterize and tone map if you do something requiring the shader to re-run
<azonenberg> this is mostly a performance/energy efficiency issue
<azonenberg> there's no point in having the gpu redraw unchanged data every frame
<azonenberg> now the compute shaders run on demand, and only the vertex/fragment shaders for the compositor and GUI run every frame
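The on-demand scheme described above can be sketched as a dirty flag checked once per frame. This is a minimal illustration assuming a single render thread; the class and method names (`WaveformRenderer`, `MarkDirty`, `OnFrame`) are hypothetical and are not ngscopeclient's actual API. The counters exist only so the behavior can be checked.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: compute passes run only when something changed,
// while compositing runs every GUI frame. Not ngscopeclient's real classes.
class WaveformRenderer
{
public:
    // Call when data, zoom, pan or intensity changes.
    void MarkDirty() { m_dirty = true; }

    // Called once per GUI frame.
    void OnFrame()
    {
        if(m_dirty)
        {
            RunComputePasses();   // rasterize + tone map, only on demand
            m_dirty = false;
        }
        Composite();              // vertex/fragment shaders, every frame
    }

    std::size_t ComputeRuns() const { return m_computeRuns; }
    std::size_t CompositeRuns() const { return m_compositeRuns; }

private:
    void RunComputePasses() { ++m_computeRuns; }
    void Composite() { ++m_compositeRuns; }

    bool m_dirty = true;          // first frame always needs a render
    std::size_t m_computeRuns = 0;
    std::size_t m_compositeRuns = 0;
};
```

With no changes, ten frames run the compute passes once and the compositor ten times; a pan or zoom would call `MarkDirty()` and trigger exactly one more compute pass.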
<azonenberg> When it freezes on that step, it only freezes when you move the plot though right?
<bvernoux> it clearly feel smoother
<azonenberg> and after a few seconds the rest of the GUI is responsive as long as you don't pan/zoom the waveform itself?
<azonenberg> you can interact with menus, dialogs, etc OK?
<azonenberg> if so, that's good. it means we're properly decoupling the compute pipeline from the GUI
<azonenberg> we still have the issue of the compute stuff hanging
<bvernoux> when it freezes or slows everything down badly, it is only on 2 zoom steps like before, and I can only reproduce that with my CH569 SerDes data
<azonenberg> I cannot reproduce on my machine with the ch569 last time i checked. will test again with a more exhaustive range of zooms etc
<bvernoux> I cannot reproduce that on SPI data you provide me for example
<azonenberg> this definitely feels like a race condition of some sort
<bvernoux> I tried all possible zoom steps one by one
<azonenberg> anyway, making it happen less often is still a win :p
<bvernoux> yes, now the slowdown at the special zoom steps only appears when I move something in ngscopeclient
<bvernoux> whereas before it was always present
<bvernoux> it seems to affect only Rasterize time, which jumps from 300ms to 900ms ...
<bvernoux> when I scroll the timebase or change the intensity now
<azonenberg> Yep. So the rasterizing shader is either running slower or is blocking on something it shouldn't be, or there's some kind of livelock that eventually recovers
<azonenberg> it's almost certainly a thread synchronization issue, but this is still progress
<bvernoux> I see the GPU reaching >90% when moving the timebase on problematic zoom
<azonenberg> oh interesting. and it's not doing that on the other zooms?
<azonenberg> *that* is surprising. what i was expecting was that the GPU was idle because we had the CPU blocking on a mutex before kicking the shader off
<bvernoux> at other zooms, when I move quickly, the GPU is at 15% max
<bvernoux> it is "GPU 0 - 3D", the RTX4070
<azonenberg> It seems something is happening in the shader itself at those zooms
<azonenberg> really weird
<bvernoux> when I'm not doing anything in the ngscopeclient window at a non-problematic zoom, the GPU for that task is at about 9%
<azonenberg> interesting
<bvernoux> between 7 to 9%
<bvernoux> There's rendering with Performance displayed too
<bvernoux> where only Framerate refreshes, as the other counters no longer blink
<bvernoux> Vertices & Indices change sometimes, but only about once per second
<bvernoux> I suspect you have the same on your computer
<azonenberg> yeah thats normal, verts/indexes are mostly coming from imgui
<azonenberg> the actual waveform drawing is two triangles per waveform
<azonenberg> with a texture generated by the tone map step
<azonenberg> everything before that final compositing is compute shaders
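A waveform drawn as "two triangles with a texture" is an ordinary textured quad: four vertices and six indices. The sketch below is a generic illustration, not ngscopeclient's compositor code; the `Vertex` layout, pixel coordinates, and the texture-coordinate orientation (v increasing downward) are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical vertex layout: position plus texture coordinate
// into the tone-mapped waveform image.
struct Vertex
{
    float x, y;   // position in pixels
    float u, v;   // texture coordinate
};

struct Quad
{
    std::array<Vertex, 4> vertices;
    std::array<std::uint16_t, 6> indices;
};

// Builds a quad covering (x0,y0)-(x1,y1) and maps the whole texture onto it.
Quad MakeWaveformQuad(float x0, float y0, float x1, float y1)
{
    Quad q{};
    q.vertices = {{
        {x0, y0, 0.0f, 0.0f},   // top left
        {x1, y0, 1.0f, 0.0f},   // top right
        {x1, y1, 1.0f, 1.0f},   // bottom right
        {x0, y1, 0.0f, 1.0f},   // bottom left
    }};
    // Two triangles sharing the diagonal from vertex 0 to vertex 2
    q.indices = {0, 1, 2, 0, 2, 3};
    return q;
}
```

All per-sample work happens in the compute shaders that produce the texture, so the per-frame geometry cost stays constant regardless of waveform length.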
<bvernoux> I have 3 Total filters with 69.3955 ms Exec time
<bvernoux> still with my CH569 SerDes dataset
<bvernoux> hmm new Vulkan is better
<bvernoux> I can enable debug now and ngscopeclient does not crash
<azonenberg> Yay
<bvernoux> so it was clearly a bug in previous VulkanSDK
<azonenberg> Not crashing is always good :p
<bvernoux> anyway, when I reproduce the slowdown with all possible Vulkan debug options enabled, I do not see any debug messages or anything
<azonenberg> annoying, but not surprising
<azonenberg> See if you can get a trace under nsight or something that shows the hang?
<azonenberg> at this point i'm grasping at straws to see if we can figure out where the block is
<azonenberg> once we know what's blocking on what, we can begin to actually troubleshoot
<bvernoux> I have installed Nsight Systems on WSL2
<bvernoux> as IIRC it does not exist for Windows
<bvernoux> but I can do profiling of Windows with it
<bvernoux> I will install the native Windows version ;)
<azonenberg> Sounds good. as far as run time goes, "rasterize" is the time to run RenderAllWaveforms() in WaveformThread
<azonenberg> This includes three mutex locks followed by building the render command buffer, submitting, and blocking for it to complete
<azonenberg> my thought was one of the mutexes was taking too long to acquire, but high GPU load suggests otherwise
<azonenberg> As an experiment, try moving the initialization of "tstart" to below the three mutex locks
<azonenberg> so it will only print the shader time, starting after it's holding the mutexes
<bvernoux> I'm installing NVIDIA Nsight Systems 2023.2.1
<azonenberg> this won't change the hang itself, but see if displayed rasterize time goes down or not
<bvernoux> it seems to be the most complete tool for measuring anything GPU-related ...
<azonenberg> if it's still high in the hang state, then you know we're actually hanging GPU side
<azonenberg> but if it displays very low time in the hang (similar to a normal render) you know the hang is in the mutex acquisition
<azonenberg> that will tell me a lot
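The proposed experiment (starting the timer only once all three mutexes are held) can be sketched by recording both intervals separately. This is a hypothetical stand-in for `RenderAllWaveforms()`, not its actual code: it uses `std::scoped_lock` (C++17) for brevity, whereas the real function takes its three locks individually, and the `work` callback stands in for building, submitting and waiting on the command buffer.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <utility>

using Clock = std::chrono::steady_clock;
using Ms = std::chrono::duration<double, std::milli>;

struct RasterizeTiming
{
    double lockWaitMs;   // time spent acquiring the mutexes
    double workMs;       // time spent after all locks are held
};

// Hypothetical helper: times lock acquisition and the locked work separately.
template<typename Work>
RasterizeTiming TimedRender(std::mutex& a, std::mutex& b, std::mutex& c, Work&& work)
{
    auto tlock = Clock::now();
    std::scoped_lock lock(a, b, c);   // acquires all three, deadlock-free
    auto tstart = Clock::now();       // timer start moved below the locks
    std::forward<Work>(work)();       // e.g. build, submit, wait for fence
    auto tend = Clock::now();
    return { Ms(tstart - tlock).count(), Ms(tend - tstart).count() };
}
```

If `workMs` stays high in the hang state, the time is being spent GPU-side; if `lockWaitMs` dominates while `workMs` looks like a normal render, the hang is in mutex acquisition.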
<bvernoux> I have done the profiling ;)
<azonenberg> And?
<bvernoux> amazing
<bvernoux> something is clearly locked
<bvernoux> user request 126ms
<azonenberg> what do you mean "user request"
<bvernoux> with 100% CPU used
<azonenberg> thats not a term i'm familiar with
<bvernoux> yes
<bvernoux> we see also Vulkan API
<bvernoux> as I have traced everything
<azonenberg> can you send me the trace file? not sure if i can load it on another machine
<azonenberg> but worth trying
<bvernoux> it seems locked in vkWaitForFences
<bvernoux> yes it is a huge file
<bvernoux> report1.nsys-rep 54MB
<azonenberg> vkwaitforfences suggests it *is* hanging GPU side in the command buffer
<azonenberg> interesting
<bvernoux> but next to it there is also report1.arrows (558MB)
<bvernoux> it's from 11.33s to >14s
<bvernoux> where I play with the special zoom that lags everything
<bvernoux> it's easy to spot because the CPU cores are at >99%
<bvernoux> that's why I had the feeling that even Windows was frozen
<bvernoux> it felt the same on native Ubuntu 23.04; I could do the same measurements there with the Linux version of the same apps
<bvernoux> azonenberg, do you have NVIDIA Nsight Systems 2023.2.1 installed ?
<azonenberg> I have 2022.3.4 and am about to go out and run an errand
<azonenberg> will get your version when i'm back
<bvernoux> ok
<bvernoux> see you
bvernoux has quit [Quit: Leaving]
Degi has quit [Ping timeout: 265 seconds]
Degi has joined #scopehal