<d1b2>
<louis> azonenberg: found MP's bug with the FIR filter
<azonenberg>
oh? what was it
<d1b2>
<louis> We have some cache coherency issue with nonblockingly-bound buffers bound to compute pipelines. The memory being available seems to race
<d1b2>
<louis> filter graph reproduces blinking/blanking on both the FFT and FIR output, so it's not just a problem in the FIR block
<d1b2>
<louis> and yes, hm, the blocking bindbuffer does not fix it
<d1b2>
<louis> --nogpufilter seems to stop it from happening?
<d1b2>
<louis> investigating...
<d1b2>
<louis> wonder if it's a locking bug in the tek driver
<d1b2>
<louis> Hm. No, seems like it is some GPU cache coherency issue:
FIRFilter din->size() = 1000000 samples: 0.599090 0.662823 0.688316 0.726556
FIRFilter cap->size() = 999965 samples: 0.000000 0.000000 0.000000 0.000000
<d1b2>
<louis> valid samples going into FIR kernel, zeroes coming out
<d1b2>
<louis> interestingly, when the FFT filter flips out, it emits all -infs
<d1b2>
<louis> but this does not seem to be a numerical singularity issue since re-running the filter does not produce the same issue
<d1b2>
<louis> OK, interesting. Those respectively are the behaviours the GPU kernels produce on an all-zero waveform. So it looks like the coherence problem is on the GPU end
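[Editor's sketch] The observation above can be checked on the CPU with a minimal model. This is not the scopehal GPU kernel: `Fir` and `MagnitudeDb` are hypothetical helper names, and the assumption that the FFT filter reports log magnitude is a guess based on the -inf output. An unpadded FIR over all-zero input returns all zeros, and the log magnitude of an all-zero spectral bin is -inf. The 35-sample difference between the input and output lengths quoted above would also be consistent with a 36-tap unpadded filter, though the log does not say so.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for an unpadded FIR filter:
// out[i] = sum over j of taps[j] * in[i + j].
// Produces in.size() - taps.size() + 1 samples.
std::vector<float> Fir(const std::vector<float>& in, const std::vector<float>& taps)
{
    if(taps.empty() || in.size() < taps.size())
        return {};

    std::vector<float> out(in.size() - taps.size() + 1, 0.0f);
    for(size_t i = 0; i < out.size(); i++)
    {
        for(size_t j = 0; j < taps.size(); j++)
            out[i] += taps[j] * in[i + j];
    }
    return out;
}

// Hypothetical log-magnitude conversion for one complex FFT bin.
// For re == im == 0 this evaluates 20*log10(0), which is -inf in IEEE 754.
float MagnitudeDb(float re, float im)
{
    return 20.0f * std::log10(std::sqrt(re * re + im * im));
}
```

Feeding `Fir` a zero-filled vector and calling `MagnitudeDb(0, 0)` gives the two signatures seen in the glitching frames, which is why stale (zeroed) device memory is a plausible explanation for both filters misbehaving at once.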
<azonenberg>
Hmmm how likely is it that it's a problem with setting the modified flag somewhere?
<azonenberg>
i.e. is BindBufferNonblocking actually calling CopyDataToGpu()?
<azonenberg>
oho
<azonenberg>
i think i know what might be happening
<azonenberg>
@louis: ok so here's my hypothesis of what's going on
<azonenberg>
it's a cross queue cache coherency issue
<azonenberg>
we copy data onto the GPU in a different queue
<azonenberg>
and it's not made visible to the different queue that runs the shader
<d1b2>
<louis> our barrier to ensure the memory is visible is blocking the wrong queue, you mean?
<azonenberg>
Something to that effect, yeah. so hypothetical scenario is, we copy the data to the GPU in the rendering thread and add a barrier in the render queue
<azonenberg>
simultaneously the filter graph runs
<azonenberg>
it sees the data is already on the GPU
<azonenberg>
so doesn't add a barrier
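[Editor's sketch] The hypothesized scenario can be written as a toy model. Nothing here is scopehal or Vulkan API: `ToyBuffer`, `PrepareFlagOnly`, and `PreparePerQueue` are invented for illustration, and queues are plain integers. The point is only that a single "already on the GPU" flag cannot tell a second queue whether the upload has been made visible to it.

```cpp
#include <set>

// Toy model only: none of these names exist in scopehal or Vulkan.
struct ToyBuffer
{
    bool cpuModified = true;        // host copy is newer than the device copy
    std::set<int> visibleOnQueue;   // queues that were synchronized against the upload

    // Policy matching the hypothesis: one flag decides everything.
    // Returns true if the calling queue is guaranteed to see the uploaded data.
    bool PrepareFlagOnly(int queue)
    {
        if(cpuModified)
        {
            cpuModified = false;          // "copy to GPU" happens here
            visibleOnQueue = {queue};     // barrier recorded only on the uploading queue
        }
        // Any other queue sees cpuModified == false and adds no synchronization
        return visibleOnQueue.count(queue) != 0;
    }

    // Alternative policy: track visibility per queue.
    bool PreparePerQueue(int queue)
    {
        if(cpuModified)
        {
            cpuModified = false;
            visibleOnQueue = {queue};
        }
        else if(visibleOnQueue.count(queue) == 0)
        {
            // Stand-in for real cross-queue synchronization
            visibleOnQueue.insert(queue);
        }
        return visibleOnQueue.count(queue) != 0;
    }
};
```

With `PrepareFlagOnly`, queue 0 (say, rendering) uploads and gets a barrier, while queue 1 (the filter graph) finds the flag already cleared and proceeds with no guarantee. In real Vulkan a pipeline barrier only orders work within one queue, so the stand-in in `PreparePerQueue` would have to be a semaphore or fence rather than another barrier.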
<d1b2>
<louis> Tried adding din->MarkModifiedFromCpu(); right before the filter kernels and that doesn't seem to fix it.
<azonenberg>
yeah i'm thinking
<d1b2>
<louis> Adding that and changing BindBufferNonblocking to BindBuffer seems to fix it? (Which concurs with what I thought had initially patched it.)
<azonenberg>
did you run with the vulkan validation layers active?
<azonenberg>
specifically with synchronization checks enabled
<d1b2>
<louis> Do you have a recipe for how to do that?
<d1b2>
<louis> Still observe glitching and don't see anything in the log
<azonenberg>
hmmm
<d1b2>
<louis> Here's an interesting semi-related data point
<d1b2>
<louis> I tried this diff:
diff --git a/scopehal/QueueManager.cpp b/scopehal/QueueManager.cpp
index 6b8831e..6e7ff89 100644
--- a/scopehal/QueueManager.cpp
+++ b/scopehal/QueueManager.cpp
@@ -184,7 +184,7 @@ shared_ptr<QueueHandle> QueueManager::GetQueueWithFlags(vk::QueueFlags flags, st
     continue;
   //If handle is unallocated, use it right away
-  if(m_queues[i].Handle.use_count() == 0)
+  if(true ||
<d1b2>
<louis> under the theory that it would force all tasks to use the same (first) queue
<d1b2>
<louis> & judging from the QueueManager debug output, that seemed to work
<d1b2>
<louis> but (A) I still saw the blanking issue at least once and (B) it hangs/crashes a few seconds in under that configuration
<d1b2>
<louis> oh wait duh nvm
<d1b2>
<louis> anyway this diff
diff --git a/scopehal/QueueManager.cpp b/scopehal/QueueManager.cpp
index 6b8831e..90aa8a3 100644
--- a/scopehal/QueueManager.cpp
+++ b/scopehal/QueueManager.cpp
@@ -177,7 +177,7 @@ shared_ptr<QueueHandle> QueueManager::GetQueueWithFlags(vk::QueueFlags flags, st
  //Because we sort m_queues by flag count in the constructor, the first match
  //should be the one with the least feature flags that satisfies the request.
  ssize_t chosenIdx = -1;
- for(size_t i=0; i<m_queues.size(); i++)
+ for(size_t i=0; i<1; i++)
  {
   //Skip if flags don't match
   if(!(m_queues[i].Flags & flags))
which i think should do what i want results in no crashing but still blinking