azonenberg changed the topic of #scopehal to: libscopehal, libscopeprotocols, and glscopeclient development and testing | https://github.com/glscopeclient/scopehal-apps | Logs: https://libera.irclog.whitequark.org/scopehal
Degi_ has joined #scopehal
Degi has quit [Ping timeout: 264 seconds]
Degi_ is now known as Degi
massi has joined #scopehal
<_whitenotifier> [scopehal] makerprobe commented on issue #739: ngscopeclient crashes when an invalid path to a USBTMC instrument is given - https://github.com/glscopeclient/scopehal/issues/739#issuecomment-1331956702
massi_ has joined #scopehal
massi has quit [Ping timeout: 264 seconds]
massi_ has quit [Remote host closed the connection]
mikolajw has quit [Quit: Bridge terminating on SIGTERM]
jevinskie[m] has quit [Quit: Bridge terminating on SIGTERM]
fridtjof[m] has quit [Quit: Bridge terminating on SIGTERM]
sajattack[m] has quit [Quit: Bridge terminating on SIGTERM]
whitequark has quit [Quit: Bridge terminating on SIGTERM]
mikolajw has joined #scopehal
sajattack[m] has joined #scopehal
fridtjof[m] has joined #scopehal
whitequark has joined #scopehal
jevinskie[m] has joined #scopehal
bvernoux has quit [Ping timeout: 264 seconds]
<d1b2> <louis> azonenberg: found MP's bug with the FIR filter
<azonenberg> oh? what was it
<d1b2> <louis> We have some cache coherency issue with buffers bound nonblockingly to compute pipelines; there seems to be a race on when the memory becomes available
<d1b2> <louis> So this fixes it:
```diff
diff --git a/scopeprotocols/FIRFilter.cpp b/scopeprotocols/FIRFilter.cpp
index b1dd3dd..3558cd0 100644
--- a/scopeprotocols/FIRFilter.cpp
+++ b/scopeprotocols/FIRFilter.cpp
@@ -232,9 +232,9 @@ void FIRFilter::DoFilterKernel(
 	args.end = din->size() - m_coefficients.size();
 	args.filterlen = m_coefficients.size();
 
-	m_computePipeline.BindBufferNonblocking(0, din->m_samples, cmdBuf);
-	m_computePipeline.BindBufferNonblocking(1, m_coefficients, cmdBuf);
-	m_computePipeline.BindBufferNonblocking(2, cap->m_samples, cmdBuf, true);
+	m_computePipeline.BindBuffer(0, din->m_samples);
+	m_computePipeline.BindBuffer(1, m_coefficients);
+	m_computePipeline.BindBuffer(2, cap->m_samples, true);
 
 	m_computePipeline.Dispatch(cmdBuf, args, GetComputeBlockCount(args.end, 64));
 	cmdBuf.end();
```
<d1b2> <louis> But that shouldn't be necessary
<d1b2> <louis> So it seems like we have some underlying bug where the bindings are not available by the time the kernel is invoked
<azonenberg> Hmmmm
<d1b2> <louis> (Or was the use of BindBufferNonblocking a mistake here in the first place?)
<azonenberg> did you try running with the vulkan validation layer enabled?
<azonenberg> BindBufferNonblocking is supposed to submit a transfer request to the same command buffer as the filter
<azonenberg> the idea being that you do the transfer then the compute task happens
<azonenberg> it's possible we need some kind of sync primitive
<azonenberg> but we have a pipeline barrier in AcceleratorBuffer::CopyToGpuNonblocking()
<azonenberg> which SHOULD be sufficient
<azonenberg> I wonder if maybe writing to the memory in m_cpuBuffer needs some sync primitive too wrt cache coherency with the gpu
<azonenberg> i'll do some reading later today
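For context, the pattern being described here, recording the host-to-device copy into the same command buffer as the compute work and then a pipeline barrier so the shader's reads wait on the transfer's writes, looks roughly like the following. This is a minimal sketch with illustrative names, not the actual AcceleratorBuffer / ComputePipeline code:
```cpp
#include <vulkan/vulkan_raii.hpp>

// Minimal sketch (illustrative names, not the actual scopehal code):
// record the host->device copy into the same command buffer as the filter,
// then a barrier so compute-shader reads wait for the transfer writes.
void CopyThenBarrierSketch(vk::raii::CommandBuffer& cmdBuf,
                           vk::Buffer stagingBuf, vk::Buffer deviceBuf, vk::DeviceSize size)
{
	// 1. Transfer request, recorded into the same command buffer as the dispatch
	cmdBuf.copyBuffer(stagingBuf, deviceBuf, vk::BufferCopy(0, 0, size));

	// 2. Make the transfer write available/visible to subsequent compute-shader reads
	vk::BufferMemoryBarrier barrier(
		vk::AccessFlagBits::eTransferWrite, vk::AccessFlagBits::eShaderRead,
		VK_QUEUE_FAMILY_IGNORED, VK_QUEUE_FAMILY_IGNORED,
		deviceBuf, 0, size);
	cmdBuf.pipelineBarrier(
		vk::PipelineStageFlagBits::eTransfer,
		vk::PipelineStageFlagBits::eComputeShader,
		{}, {}, barrier, {});

	// 3. ...then bind descriptors and dispatch the compute kernel on the same cmdBuf
}
```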
<azonenberg> The other bit is, this is happening on M1 right?
<azonenberg> the whole memcpy is unnecessary there. We just haven't added optimizations for unified memory architectures yet
<d1b2> <louis> I reproduced it on my intel + nvidia laptop
<azonenberg> with which card?
<d1b2> <louis> (actually, not sure which gpu)
<azonenberg> sooo the other thing is if perhaps we're using a memory range that is not host coherent
<azonenberg> Can you check the logs for that run
<azonenberg> it should say "using type X for pinned host memory" somewhere
<azonenberg> and then if you look up when it dumps the physical devices, under memory types
<azonenberg> see if type X lists "host coherent"
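The check being asked for, looking up the memory type used for pinned host memory and confirming it advertises the host-coherent property, corresponds roughly to the query below. A sketch only; scopehal's actual device-dump/logging code may differ:
```cpp
#include <cstdio>
#include <vulkan/vulkan_raii.hpp>

// Minimal sketch: report whether a given memory type ("type X" in the log)
// is host visible / coherent / cached.
void CheckHostCoherentSketch(const vk::raii::PhysicalDevice& phys, uint32_t pinnedType)
{
	auto props = phys.getMemoryProperties();
	auto flags = props.memoryTypes[pinnedType].propertyFlags;
	printf("Type %u: heap %u, host visible=%d, host coherent=%d, host cached=%d\n",
		pinnedType,
		props.memoryTypes[pinnedType].heapIndex,
		static_cast<bool>(flags & vk::MemoryPropertyFlagBits::eHostVisible),
		static_cast<bool>(flags & vk::MemoryPropertyFlagBits::eHostCoherent),
		static_cast<bool>(flags & vk::MemoryPropertyFlagBits::eHostCached));
}
```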
<d1b2> <louis> Was using Quadro M2000M
<d1b2> <louis> "Using type 9 for pinned host memory" Type 9 Heap index: 1 Host visible Host coherent Host cached
<azonenberg> Well there goes that theory
<d1b2> <louis> perhaps I am reading it wrong
<azonenberg> no, that's correct
<azonenberg> it's host cached
<azonenberg> and host coherent
<azonenberg> @louis: what is the input to the FIR
<azonenberg> a scope channel?
<d1b2> <louis> Yes
<d1b2> <louis> (from a Tek)
<d1b2> <louis> Hm. I backed out that change and now I can't reproduce it again.
<azonenberg> yeah i'm looking at the vulkan docs and i cant find anything wrong with the sync we have
<azonenberg> I think it's something else
<azonenberg> and you just got lucky with that change
<azonenberg> (the vulkan validation layers should do a good job of complaining about missing barriers)
<d1b2> <louis> OK, interesting. Re-repro'ed
<d1b2> <louis> this filter graph reproduces blinking/blanking on both the FFT and FIR output, so it's not just a problem in the FIR block
<d1b2> <louis> and yes, hm, the blocking bindbuffer does not fix it
<d1b2> <louis> --nogpufilter seems to stop it from happening?
<d1b2> <louis> investigating...
<d1b2> <louis> wonder if it's a locking bug in the tek driver
<d1b2> <louis> Hm. No, seems like it is some GPU cache coherency issue:
    FIRFilter din->size() = 1000000 samples: 0.599090 0.662823 0.688316 0.726556
    FIRFilter cap->size() = 999965 samples: 0.000000 0.000000 0.000000 0.000000
<d1b2> <louis> valid samples going into FIR kernel, zeroes coming out
<d1b2> <louis> interestingly, when the FFT filter flips out, it emits all -infs
<d1b2> <louis> but this does not seem to be a numerical singularity issue since re-running the filter does not produce the same issue
<d1b2> <louis> OK, interesting. Those are respectively the behaviours the GPU kernel produces on an all-zero waveform. So it looks like the coherence problem is on the GPU end
<azonenberg> Hmmm how likely is it that it's a problem with setting the modified flag somewhere?
<azonenberg> i.e. is BindBufferNonblocking actually calling CopyDataToGpu()?
<azonenberg> oho
<azonenberg> i think i know what might be happening
<azonenberg> @louis: ok so here's my hypothesis of what's going on
<azonenberg> it's a cross queue cache coherency issue
<azonenberg> we copy data onto the GPU in a different queue
<azonenberg> and it's not made visible to the different queue the GPU uses to run the shader
<d1b2> <louis> our barrier to ensure the memory is visible is blocking the wrong queue, you mean?
<azonenberg> Something to that effect, yeah. so hypothetical scenario is, we copy the data to the GPU in the rendering thread and add a barrier in the render queue
<azonenberg> simultaneously the filter graph runs
<azonenberg> it sees the data is already on the GPU
<azonenberg> so doesn't add a barrier
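If this cross-queue hypothesis were right, the textbook Vulkan remedy is a release/acquire pair of buffer barriers performing a queue family ownership transfer: a release barrier recorded on the queue that wrote the data, and a matching acquire barrier on the queue that runs the shader. A minimal sketch under that assumption, with illustrative names rather than scopehal's actual code:
```cpp
#include <vulkan/vulkan_raii.hpp>

// Release: executed on the queue that performed the copy
void ReleaseOnCopyQueue(vk::raii::CommandBuffer& copyCmd, vk::Buffer buf, vk::DeviceSize size,
                        uint32_t copyFamily, uint32_t computeFamily)
{
	vk::BufferMemoryBarrier release(
		vk::AccessFlagBits::eTransferWrite, vk::AccessFlags(),
		copyFamily, computeFamily, buf, 0, size);
	copyCmd.pipelineBarrier(
		vk::PipelineStageFlagBits::eTransfer,
		vk::PipelineStageFlagBits::eBottomOfPipe,
		{}, {}, release, {});
}

// Acquire: executed on the queue that will run the compute shader
void AcquireOnComputeQueue(vk::raii::CommandBuffer& computeCmd, vk::Buffer buf, vk::DeviceSize size,
                           uint32_t copyFamily, uint32_t computeFamily)
{
	vk::BufferMemoryBarrier acquire(
		vk::AccessFlags(), vk::AccessFlagBits::eShaderRead,
		copyFamily, computeFamily, buf, 0, size);
	computeCmd.pipelineBarrier(
		vk::PipelineStageFlagBits::eTopOfPipe,
		vk::PipelineStageFlagBits::eComputeShader,
		{}, {}, acquire, {});
}
```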
<d1b2> <louis> Tried adding din->MarkModifiedFromCpu(); right before the filter kernels and that doesn't seem to fix it.
<azonenberg> yeah i'm thinking
<d1b2> <louis> Adding that and changing BindBufferNonblocking to BindBuffer seems to fix it? (Which concurs with what I thought had initially patched it.)
<azonenberg> did you run with the vulkan validation layers active?
<azonenberg> specifically with synchronization checks enabled
<d1b2> <louis> Do you have a recipe for how to do that?
<azonenberg> vkconfig
<azonenberg> This is how i have mine set up by default when doing dev
<azonenberg> sometimes i enable "break" which will add a debug breakpoint when a validation event happens
<azonenberg> leave vkconfig open while running ngscopeclient, by default anything it changes is undone when you quit vkconfig
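Besides configuring it through vkconfig, the Khronos validation layer with synchronization checks can also be enabled programmatically when the Vulkan instance is created. A minimal sketch, assuming VK_LAYER_KHRONOS_validation from the Vulkan SDK is installed; this is not how ngscopeclient actually sets up its instance:
```cpp
#include <vulkan/vulkan_raii.hpp>

// Minimal sketch: create an instance with VK_LAYER_KHRONOS_validation and
// the layer's synchronization-validation feature turned on.
vk::raii::Instance MakeValidatedInstanceSketch(vk::raii::Context& ctx)
{
	const char* layers[] = { "VK_LAYER_KHRONOS_validation" };

	// Ask the layer to enable its synchronization checks
	vk::ValidationFeatureEnableEXT enables[] =
		{ vk::ValidationFeatureEnableEXT::eSynchronizationValidation };
	vk::ValidationFeaturesEXT features;
	features.enabledValidationFeatureCount = 1;
	features.pEnabledValidationFeatures = enables;

	vk::ApplicationInfo appInfo("sketch", 1, nullptr, 0, VK_API_VERSION_1_2);
	vk::InstanceCreateInfo ci({}, &appInfo, 1, layers, 0, nullptr);
	ci.pNext = &features;

	return vk::raii::Instance(ctx, ci);
}
```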
<d1b2> <louis> OK, got that to work (why is it in the confusingly named non-default vulkan-extra-tools package on arch??)
<azonenberg> it's part of the sdk normally
<d1b2> <louis> Still observe glitching and don't see anything in the log
<azonenberg> hmmm
<d1b2> <louis> Here's an interesting semi-related data point
<d1b2> <louis> which is that I tried this diff:
```diff
diff --git a/scopehal/QueueManager.cpp b/scopehal/QueueManager.cpp
index 6b8831e..6e7ff89 100644
--- a/scopehal/QueueManager.cpp
+++ b/scopehal/QueueManager.cpp
@@ -184,7 +184,7 @@ shared_ptr<QueueHandle> QueueManager::GetQueueWithFlags(vk::QueueFlags flags, st
 			continue;
 
 		//If handle is unallocated, use it right away
-		if(m_queues[i].Handle.use_count() == 0)
+		if(true || m_queues[i].Handle.use_count() == 0)
 		{
 			LogDebug("QueueManager creating family=%zu index=%zu name=%s\n", m_queues[i].Family, m_queues[i].Index, name.c_str());
 			m_queues[i].Handle = make_shared<QueueHandle>(
```
<d1b2> <louis> under the theory that it would force all tasks to use the same (first) queue
<d1b2> <louis> & judging from the QueueManager debug output, that seemed to work
<d1b2> <louis> but (A) I still saw the blanking issue at least once and (B) it hangs/crashes a few seconds in under that configuration
<d1b2> <louis> oh wait duh nvm
<d1b2> <louis> anyway this diff:
```diff
diff --git a/scopehal/QueueManager.cpp b/scopehal/QueueManager.cpp
index 6b8831e..90aa8a3 100644
--- a/scopehal/QueueManager.cpp
+++ b/scopehal/QueueManager.cpp
@@ -177,7 +177,7 @@ shared_ptr<QueueHandle> QueueManager::GetQueueWithFlags(vk::QueueFlags flags, st
 	//Because we sort m_queues by flag count in the constructor, the first match
 	//should be the one with the least feature flags that satisfies the request.
 	ssize_t chosenIdx = -1;
-	for(size_t i=0; i<m_queues.size(); i++)
+	for(size_t i=0; i<1; i++)
 	{
 		//Skip if flags don't match
 		if(!(m_queues[i].Flags & flags))
```
<d1b2> <louis> which I think should do what I want, results in no crashing but still blinking