#scopehal on 2022-11-30 — irc logs at libera.irclog.whitequark.org

2022-03-25 21:41 azonenberg changed the topic of #scopehal to: libscopehal, libscopeprotocols, and glscopeclient development and testing | https://github.com/glscopeclient/scopehal-apps | Logs: https://libera.irclog.whitequark.org/scopehal

03:10 Degi_ has joined #scopehal

03:10 Degi has quit [Ping timeout: 264 seconds]

03:10 Degi_ is now known as Degi

07:30 massi has joined #scopehal

10:44 <_whitenotifier> [scopehal] makerprobe commented on issue #739: ngscopeclient crashes when an invalid path to a USBTMC instrument is given - https://github.com/glscopeclient/scopehal/issues/739#issuecomment-1331956702

14:48 massi_ has joined #scopehal

14:48 massi has quit [Ping timeout: 264 seconds]

17:07 massi_ has quit [Remote host closed the connection]

17:37 mikolajw has quit [Quit: Bridge terminating on SIGTERM]

17:37 jevinskie[m] has quit [Quit: Bridge terminating on SIGTERM]

17:37 fridtjof[m] has quit [Quit: Bridge terminating on SIGTERM]

17:37 sajattack[m] has quit [Quit: Bridge terminating on SIGTERM]

17:37 whitequark has quit [Quit: Bridge terminating on SIGTERM]

17:43 mikolajw has joined #scopehal

17:57 sajattack[m] has joined #scopehal

17:57 fridtjof[m] has joined #scopehal

17:57 whitequark has joined #scopehal

17:57 jevinskie[m] has joined #scopehal

18:26 bvernoux has quit [Ping timeout: 264 seconds]

20:37 <d1b2> <louis> azonenberg: found MP's bug with the FIR filter

20:37 <azonenberg> oh? what was it

20:37 <d1b2> <louis> We have some cache coherency issue with nonblockingly-bound buffers bound to compute pipelines. The memory being available seems to race

20:38 <d1b2> <louis> So this fixes it:
    diff --git a/scopeprotocols/FIRFilter.cpp b/scopeprotocols/FIRFilter.cpp
    index b1dd3dd..3558cd0 100644
    --- a/scopeprotocols/FIRFilter.cpp
    +++ b/scopeprotocols/FIRFilter.cpp
    @@ -232,9 +232,9 @@ void FIRFilter::DoFilterKernel(
         args.end = din->size() - m_coefficients.size();
         args.filterlen = m_coefficients.size();
    -    m_computePipeline.BindBufferNonblocking(0, din->m_samples, cmdBuf);
    -    m_computePipeline.BindBufferNonblocking(1, m_coefficients, cmdBuf);
    -    m_computePipeline.BindBufferNonblocking(2, cap->m_samples, cmdBuf, true);
    +    m_computePipeline.BindBuffer(0, din->m_samples);
    +    m_computePipeline.BindBuffer(1, m_coefficients);
    +    m_computePipeline.BindBuffer(2, cap->m_samples, true);
         m_computePipeline.Dispatch(cmdBuf, args, GetComputeBlockCount(args.end, 64));
         cmdBuf.end();

20:38 <d1b2> <louis> But that shouldn't be necessary

20:38 <d1b2> <louis> So it seems like we have some underlying bug that the bindings are not available by kernel invocation

20:38 <azonenberg> Hmmmm

20:38 <d1b2> <louis> (Or was the use of BindBufferNonblocking a mistake here in the first place?)

20:38 <azonenberg> did you try running with the vulkan validation layer enabled?

20:39 <azonenberg> BindBufferNonblocking is supposed to submit a transfer request to the same command buffer as the filter

20:39 <azonenberg> the idea being that you do the transfer then the compute task happens

20:39 <azonenberg> it's possible we need some kind of sync primitive

20:40 <azonenberg> but we have a pipeline barrier in AcceleratorBuffer::CopyToGpuNonblocking()

20:40 <azonenberg> which SHOULD be sufficient

20:42 <azonenberg> I wonder if maybe writing to the memory in m_cpuBuffer needs some sync primitive too wrt cache coherency with the gpu

20:42 <azonenberg> i'll do some reading later today

20:43 <azonenberg> The other bit is, this is happening on M1 right?

20:44 <azonenberg> the whole memcpy is unnecessary there. We just haven't added optimizations for unified memory architectures yet

20:44 <d1b2> <louis> I reproduced it on my intel + nvidia laptop

20:44 <azonenberg> with which card?

20:44 <d1b2> <louis> (actually, not sure which gpu)

20:45 <azonenberg> sooo the other thing is if perhaps we're using a memory range that is not host coherent

20:45 <azonenberg> Can you check the logs for that run

20:45 <azonenberg> it should say "using type X for pinned host memory" somewhere

20:46 <azonenberg> and then if you look up when it dumps the physical devices, under memory types

20:46 <azonenberg> see if type X lists "host coherent"

20:46 <d1b2> <louis> Was using Quadro M2000M

20:47 <d1b2> <louis> "Using type 9 for pinned host memory"
    Type 9
        Heap index: 1
        Host visible
        Host coherent
        Host cached

20:47 <azonenberg> Well there goes that theory

20:47 <d1b2> <louis> https://louis.members.acm.umn.edu/www/.share/.sbh--1669841239.1748364--clipboard.txt here is the Vk startup output

20:47 <d1b2> <louis> perhaps I am reading it wrong

20:48 <azonenberg> no, that's correct

20:48 <azonenberg> it's host cached

20:48 <azonenberg> and host coherent
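The check described above (look up the memory type used for pinned host memory and see whether it is host coherent) can be sketched as a flag decode. This is a minimal sketch, not scopehal code: the constants mirror the `VkMemoryPropertyFlagBits` values from the Vulkan specification, while `DescribeMemoryType` and `NeedsManualFlush` are hypothetical helper names invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Values mirror VkMemoryPropertyFlagBits in the Vulkan specification.
constexpr uint32_t kDeviceLocal  = 0x1;
constexpr uint32_t kHostVisible  = 0x2;
constexpr uint32_t kHostCoherent = 0x4;
constexpr uint32_t kHostCached   = 0x8;

// Hypothetical helper: render the flags the way the startup log lists them.
std::string DescribeMemoryType(uint32_t flags)
{
    std::string out;
    auto add = [&](uint32_t bit, const char* name)
    {
        if(flags & bit)
        {
            if(!out.empty())
                out += ", ";
            out += name;
        }
    };
    add(kDeviceLocal, "Device local");
    add(kHostVisible, "Host visible");
    add(kHostCoherent, "Host coherent");
    add(kHostCached, "Host cached");
    return out;
}

// Hypothetical helper: host-visible memory that is not host coherent needs
// explicit flush/invalidate calls before the device sees CPU writes.
constexpr bool NeedsManualFlush(uint32_t flags)
{
    return (flags & kHostVisible) && !(flags & kHostCoherent);
}
```

For the type 9 entry quoted in the log (host visible, host coherent, host cached), `NeedsManualFlush` returns false, which matches the conclusion that a non-coherent memory range is not the cause.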

20:50 <azonenberg> @louis: what is the input to the FIR

20:50 <azonenberg> a scope channel?

20:50 <d1b2> <louis> Yes

20:51 <d1b2> <louis> (from a Tek)

21:03 <d1b2> <louis> Hm. I backed out that change and now I can't reproduce it again.

21:04 <azonenberg> yeah i'm looking at the vulkan docs and i cant find anything wrong with the sync we have

21:04 <azonenberg> I think it's something else

21:05 <azonenberg> and you just got lucky with that change

21:05 <azonenberg> (the vulkan validation layers should do a good job of complaining about missing barriers)

21:36 <d1b2> <louis> OK, interesting. Re-repro'ed

21:36 <d1b2> <louis> this

21:36 <d1b2> <louis> https://cdn.discordapp.com/attachments/776941750291267595/1047627200523939850/image.png

21:36 <d1b2> <louis> filter graph reproduces blinking/blanking on both the FFT and FIR output, so it's not just a problem in the FIR block

21:38 <d1b2> <louis> and yes, hm, the blocking bindbuffer does not fix it

21:43 <d1b2> <louis> --nogpufilter seems to stop it from happening?

21:43 <d1b2> <louis> investigating...

21:46 <d1b2> <louis> wonder if it's a locking bug in the tek driver

22:02 <d1b2> <louis> Hm. No, seems like it is some GPU cache coherency issue:
    FIRFilter din->size() = 1000000 samples: 0.599090 0.662823 0.688316 0.726556
    FIRFilter cap->size() = 999965 samples: 0.000000 0.000000 0.000000 0.000000

22:02 <d1b2> <louis> valid samples going into FIR kernel, zeroes coming out

22:19 <d1b2> <louis> interestingly, when the FFT filter flips out, it emits all -infs

22:20 <d1b2> <louis> but this does not seem to be a numerical singularity issue since re-running the filter does not produce the same issue

22:35 <d1b2> <louis> OK, interesting. Those respectively are the behaviours the GPU kernel produces on an all-zero waveform. So looks like it is on the GPU end that we have the coherence problem
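The all-zero observation can be reproduced on the CPU with a small reference model. This is a sketch under stated assumptions, not the scopehal kernels: `FirReference` is a hypothetical direct-form FIR whose output length follows `args.end = din->size() - m_coefficients.size()` from the diff earlier in the log, and `MagnitudeToDb` assumes the FFT filter reports magnitudes on a logarithmic scale, which the log does not confirm.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: one output per input window, producing
// in.size() - taps.size() samples (1000000 in and 999965 out, as in the
// log above, corresponds to 35 taps).
std::vector<float> FirReference(const std::vector<float>& in, const std::vector<float>& taps)
{
    if(in.size() < taps.size())
        return {};

    const size_t end = in.size() - taps.size();
    std::vector<float> out(end);
    for(size_t i = 0; i < end; i++)
    {
        float acc = 0;
        for(size_t j = 0; j < taps.size(); j++)
            acc += in[i + j] * taps[j];
        out[i] = acc;
    }
    return out;
}

// Assumed log-scale conversion: a magnitude of exactly zero maps to -inf.
float MagnitudeToDb(float mag)
{
    return 20.0f * std::log10(mag);
}
```

If the kernel reads a buffer that still holds zeroes, the FIR output is all zeroes regardless of the coefficients, and any log-scaled spectrum of that input is -inf in every bin. Both symptoms would then point at the input data not being visible to the kernel rather than at the filter math.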

22:49 <azonenberg> Hmmm how likely is it that it's a problem with setting the modified flag somewhere?

22:49 <azonenberg> i.e. is BindBufferNonblocking actually calling CopyDataToGpu()?

22:56 <azonenberg> oho

22:56 <azonenberg> i think i know what might be happening

22:58 <azonenberg> @louis: ok so here's my hypothesis of what's going on

22:58 <azonenberg> it's a cross queue cache coherency issue

22:59 <azonenberg> we copy data onto the GPU in a different queue

22:59 <azonenberg> and it's not made visible to the GPU using a different queue to run the shader

23:01 <d1b2> <louis> our barrier to ensure the memory is visible is blocking the wrong queue, you mean?

23:01 <azonenberg> Something to that effect, yeah. so hypothetical scenario is, we copy the data to the GPU in the rendering thread and add a barrier in the render queue

23:02 <azonenberg> simultaneously the filter graph runs

23:02 <azonenberg> it sees the data is already on the GPU

23:02 <azonenberg> so doesn't add a barrier
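The hypothetical scenario above can be modeled with a toy that tracks only the bookkeeping. This is a sketch of the hypothesis, not of scopehal or Vulkan: `ToyQueue`, `ToyBuffer`, `PrepareForGpu`, and `HasBarrier` are invented names, and the "stale" flag stands in for whatever modified-tracking the real buffer class does.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for illustration only.
struct ToyQueue
{
    std::string name;
    std::vector<std::string> commands;  // commands recorded on this queue
};

struct ToyBuffer
{
    bool gpuCopyStale = true;  // CPU side modified since the last upload
};

// Upload only when the buffer is stale. The barrier is recorded on whichever
// queue happened to perform the upload, and nowhere else.
void PrepareForGpu(ToyBuffer& buf, ToyQueue& q)
{
    if(buf.gpuCopyStale)
    {
        q.commands.push_back("copy");
        q.commands.push_back("barrier");
        buf.gpuCopyStale = false;
    }
}

bool HasBarrier(const ToyQueue& q)
{
    return std::find(q.commands.begin(), q.commands.end(), "barrier") != q.commands.end();
}
```

If a render queue calls `PrepareForGpu` first and a compute queue calls it second before recording its dispatch, only the render queue ends up with a barrier. In this model the compute queue's dispatch has nothing ordering it after the copy, which is the race the hypothesis describes.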

23:03 <d1b2> <louis> Tried adding din->MarkModifiedFromCpu(); right before the filter kernels and that doesn't seem to fix it.

23:03 <azonenberg> yeah i'm thinking

23:03 <d1b2> <louis> Adding that and changing BindBufferNonblocking to BindBuffer seems to fix it? (Which concurs with what I thought had initially patched it.)

23:03 <azonenberg> did you run with the vulkan validation layers active?

23:03 <azonenberg> specifically with synchronization checks enabled

23:03 <d1b2> <louis> Do you have a recipe for how to do that?

23:04 <azonenberg> vkconfig

23:04 <d1b2> <azonenberg> https://cdn.discordapp.com/attachments/776941750291267595/1047649388601823263/vkconfig-3.png

23:04 <azonenberg> This is how i have mine set up by default when doing dev

23:05 <azonenberg> sometimes i enable "break" which will add a debug breakpoint when a validation event happens

23:05 <azonenberg> leave vkconfig open while running ngscopeclient, by default anything it changes is undone when you quit vkconfig

23:13 <d1b2> <louis> OK, got that to work (why is it in the confusingly named non-default vulkan-extra-tools package on arch??)

23:15 <azonenberg> it's part of the sdk normally

23:16 <d1b2> <louis> https://cdn.discordapp.com/attachments/776941750291267595/1047652359280541788/image.png

23:16 <d1b2> <louis> Still observe glitching and don't see anything in the log

23:18 <azonenberg> hmmm

23:26 <d1b2> <louis> Here's an interesting semi-related data point

23:26 <d1b2> <louis> I tried this diff:
    diff --git a/scopehal/QueueManager.cpp b/scopehal/QueueManager.cpp
    index 6b8831e..6e7ff89 100644
    --- a/scopehal/QueueManager.cpp
    +++ b/scopehal/QueueManager.cpp
    @@ -184,7 +184,7 @@ shared_ptr<QueueHandle> QueueManager::GetQueueWithFlags(vk::QueueFlags flags, st
                 continue;
             //If handle is unallocated, use it right away
    -        if(m_queues[i].Handle.use_count() == 0)
    +        if(true || m_queues[i].Handle.use_count() == 0)
             {
                 LogDebug("QueueManager creating family=%zu index=%zu name=%s\n", m_queues[i].Family, m_queues[i].Index, name.c_str());
                 m_queues[i].Handle = make_shared<QueueHandle>(

23:27 <d1b2> <louis> under the theory that it would force all tasks to use the same (first) queue

23:27 <d1b2> <louis> & judging from the QueueManager debug output, that seemed to work

23:27 <d1b2> <louis> but (A) I still saw the blanking issue at least once and (B) it hangs/crashes a few seconds in under that configuration

23:27 <d1b2> <louis> oh wait duh nvm

23:33 <d1b2> <louis> anyway this diff, which i think should do what i want, results in no crashing but still blinking:
    diff --git a/scopehal/QueueManager.cpp b/scopehal/QueueManager.cpp
    index 6b8831e..90aa8a3 100644
    --- a/scopehal/QueueManager.cpp
    +++ b/scopehal/QueueManager.cpp
    @@ -177,7 +177,7 @@ shared_ptr<QueueHandle> QueueManager::GetQueueWithFlags(vk::QueueFlags flags, st
         //Because we sort m_queues by flag count in the constructor, the first match
         //should be the one with the least feature flags that satisfies the request.
         ssize_t chosenIdx = -1;
    -    for(size_t i=0; i<m_queues.size(); i++)
    +    for(size_t i=0; i<1; i++)
         {
             //Skip if flags don't match
             if(!(m_queues[i].Flags & flags))