<_whitenotifier>
[scopehal] azonenberg labeled issue #731: Consider deprecating blocking versions of PrepareForCpuAccess() / PrepareForGpuAccess() that use implicit global queue object - https://github.com/glscopeclient/scopehal/issues/731
<_whitenotifier>
[scopehal] azonenberg opened issue #731: Consider deprecating blocking versions of PrepareForCpuAccess() / PrepareForGpuAccess() that use implicit global queue object - https://github.com/glscopeclient/scopehal/issues/731
<lain>
azonenberg: looking at the issue of two FFTs causing deadlock... one FFT thread is blocked on the lock_guard in QueueHandle::SubmitAndBlock() via AcceleratorBuffer::CopyToCpu, the other is blocked on the lock_guard on g_vkTransferMutex in AcceleratorBuffer::CopyToCpu, hrm
<lain>
ok so the first one is blocking the second.. but what's blocking the first one
<d1b2>
<Darius> lock order reversal?
<d1b2>
<Darius> (ie not acquiring two different locks in the same order in two different places)
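(A minimal, generic C++ sketch of the lock order reversal pattern Darius is describing; the mutex names are illustrative stand-ins, not actual scopehal code:)

```cpp
#include <mutex>

std::mutex queueMutex;     // stands in for a QueueHandle's internal mutex
std::mutex transferMutex;  // stands in for g_vkTransferMutex

void threadA()
{
    std::lock_guard<std::mutex> l1(queueMutex);     // A: takes the queue lock first...
    std::lock_guard<std::mutex> l2(transferMutex);  // ...then the transfer lock
}

void threadB()
{
    std::lock_guard<std::mutex> l1(transferMutex);  // B: takes the transfer lock first...
    std::lock_guard<std::mutex> l2(queueMutex);     // ...then the queue lock: deadlocks if A already holds queueMutex
}
```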
<lain>
hmm
<azonenberg>
I continue to say that we should not be using AcceleratorBuffer::CopyToCpu and should work towards refactoring out g_vkTransfer* entirely
<azonenberg>
actually the other thing is, g_vkTransferMutex is now redundant
<azonenberg>
because g_vkTransferQueue is now a QueueHandle right?
<lain>
it's not entirely redundant because you also have a global command buffer iirc
<lain>
for transfers
<lain>
g_vkTransferCommandBuffer
<azonenberg>
Hmm
<azonenberg>
well
<azonenberg>
ok
<azonenberg>
But it would be redundant if we only used the nonblocking transfer calls that took a commandbuffer as an argument
<azonenberg>
at least within filters, that should probably be the rule
<lain>
yeah
<lain>
ok I think I see the deadlock now
<lain>
thread 12's FFT compute is executing on the same queue as g_vkTransferQueue, which has to be held with QueueLock until we change the Filter API
<lain>
so yeah
<azonenberg>
So yeah basically we need to just nuke g_vkTransferQueue from orbit?
<azonenberg>
or is there a less invasive way to solve this for the near term?
<lain>
specifically, out of interest, we have:
<lain>
thread 12 holds: QueueHandle::m_mutex 0x00000001044a66b8 (FFTFilter::Refresh), g_vkTransferMutex (AcceleratorBuffer::CopyToCpu), and is waiting on QueueHandle::m_mutex 0x000000010444e0a8 in AcceleratorBuffer::CopyToCpu
<lain>
thread 14 holds: QueueHandle::m_mutex 0x000000010444e0a8, and is waiting on g_vkTransferMutex (AcceleratorBuffer::CopyToCpu)
<lain>
so yes, lock order reversal as Darius suggested
<lain>
I think the filter API change is the big one tbh
<lain>
because this would still deadlock even if we were using the nonblocking transfer calls
<lain>
the problem is we're holding the compute queue lock for the duration of Filter::Refresh(), in which we also (for FFTFilter) call FindPeaks, which performs a transfer. we can't guarantee there's >1 queue available, so we shouldn't be holding the compute queue lock for all of Refresh() if we need to perform queue operations within it
<lain>
if instead we use QueueHandle in Filter::Refresh, each filter's queue->Submit() or queue->SubmitAndBlock() call would be effectively atomic
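(A rough sketch of the direction being discussed here; the signature and helper names below are assumptions for illustration, not the final scopehal API:)

```cpp
// Hypothetical sketch: the caller no longer holds a compute-queue lock
// around all of Refresh(). The filter instead receives a command buffer
// and a QueueHandle and submits everything through that one handle.
void FFTFilter::Refresh(vk::raii::CommandBuffer& cmdBuf, std::shared_ptr<QueueHandle> queue)
{
    // ... record the FFT compute dispatch into cmdBuf ...

    // The handle locks its own mutex only for the duration of this call
    queue->SubmitAndBlock(cmdBuf);

    // Readback for peak finding goes through the same handle, so this
    // thread never needs a second QueueHandle or the global transfer queue
    FindPeaks(cmdBuf, queue);  // hypothetical signature
}
```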
<azonenberg>
well what i'm saying is
<azonenberg>
FindPeaks should submit to ... oh yeah
<azonenberg>
actually no it should
<azonenberg>
so basically we need to allow the filter to submit the command buffer early
<azonenberg>
then read back results and do cpu side processing
<lain>
hm
<lain>
well, regardless, Filter::Refresh() holds the queue lock the entire time until we make the api change
<lain>
so I don't think that solves anything for single-queue systems?
<lain>
I'm just gonna go ahead and make the Filter::Refresh() change real quick
<lain>
it'll be fast
<azonenberg>
yes it does make the difference IMO
<azonenberg>
because we no longer reference multiple QueueHandles
<azonenberg>
we do the transfer on the same handle we submit the shader operation to
<azonenberg>
at that point there's no sync with another thread
<azonenberg>
the two refresh calls may not be able to execute in parallel
<azonenberg>
but that's a non-issue if we're short on queues anyway
<lain>
mmm
<lain>
we need both changes to make it ideal
<lain>
oh I see what you're saying, yeah
<azonenberg>
basically, if we never have a single thread use >1 queuehandle
<azonenberg>
we can never have such a deadlock
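(To illustrate why the one-handle-per-thread rule is sufficient, a simplified QueueHandle sketch; the details here are assumed for illustration, not the actual implementation:)

```cpp
#include <mutex>
#include <vulkan/vulkan_raii.hpp>

// Hypothetical simplified QueueHandle: each submission takes the handle's
// mutex only for the duration of the call. A thread that only ever touches
// one QueueHandle therefore never holds two queue mutexes at once, so the
// lock order reversal shown earlier cannot occur.
class QueueHandle
{
public:
    void SubmitAndBlock(vk::raii::CommandBuffer& cmdBuf)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        // ... submit cmdBuf to the underlying vk::raii::Queue and wait on a fence ...
    }

private:
    std::mutex m_mutex;
};
```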
<lain>
yeah
<azonenberg>
So in that case i think the correct solution is to implement #731 and remove the global transfer queue/mutex
<lain>
but we *also* need to implement #732
<lain>
well, ok
<lain>
there's two choices here: either we implement #732 and pass QueueHandles around, or we hold the queue lock for the duration of the filter execution like we do currently, and pass a vk::raii::Queue& for transfers
<lain>
thoughts?
<lain>
eh
<lain>
it's moot now, I just finished implementing #732 :P
<lain>
now able to run any number of FFTs on my M1 with no issues
<d1b2>
<Darius> yay I guessed right!
<lain>
azonenberg: may I push this to master, or would you prefer a PR?
<lain>
eh, it's working quite well here, and should fix many bugs, soooo yolo!
<_whitenotifier>
[scopehal-apps] lain 4350fa9 - Bump scopehal library
<_whitenotifier>
[scopehal-apps] lain eecdc02 - Merge branch 'master' of github.com:glscopeclient/scopehal-apps
Degi_ has joined #scopehal
Degi has quit [Ping timeout: 256 seconds]
Degi_ is now known as Degi
<azonenberg>
lain: So the #731 refactoring is still on the todo list but no longer a critical blocker?
<lain>
azonenberg: correct
<azonenberg>
ok great it probably makes sense to do that as part of a larger cleanup then
<azonenberg>
Anyway, i'm now working on packet tree support in the ngscopeclient protocol analyzer
<azonenberg>
Which had been on my list for a couple days but on hold since i didnt have scopesession load support or any DUTs handy that spoke some of the "easier" protocols that do packet merging
<azonenberg>
(i.e. i cant save sessions so i didnt want a protocol that was gonna need a dozen probe hookups and complex multilayer decodes)
<azonenberg>
so happens i'm using one of those protocols at work right now
<azonenberg>
So now i have test data and i'm making good progress
<azonenberg>
also found what i'm pretty sure is a memory leak in the protocol analyzer
<_whitenotifier>
[scopehal-apps] azonenberg b936b50 - ProtocolAnalyzerDialog: fixed bug where packets with <1 line of text showed no bytes in the data column. See #541.
<_whitenotifier>
[scopehal-apps] azonenberg a5a65d3 - Protocol analyzer now supports packet hierarchies for complex multi-packet transactions. See #541.
nelgau has joined #scopehal
nelgau_ has joined #scopehal
nelgau has quit [Ping timeout: 260 seconds]
<azonenberg>
lain: you borked the unit tests
<azonenberg>
we're getting CI failures
<azonenberg>
looks like you didnt refactor them to take a queuehandle
massi has joined #scopehal
nelgau_ has quit [Ping timeout: 260 seconds]
massi has quit [Remote host closed the connection]