<d1b2>
<ehntoo> Just catching up on scrollback. A naive question: do you need a mutex or other locking primitive around a Vulkan queue? I ask because I don't think I've seen any crashes that I'd attribute to weird multithreading issues with the current moltenvk allocator hack. That may be attributable to a moltenvk implementation detail or not having sufficient usage yet though.
<d1b2>
<bob_twinkles> in general, Vulkan objects are expected to be "externally synchronized" (i.e. synchronized by users of the API, not the API implementation itself)
<d1b2>
<bob_twinkles> so when you use the VK API to manipulate objects you usually need to have some sort of mutex protecting the object
<d1b2>
<ehntoo> ok. makes sense, just wanted to check. 🙂
<azonenberg>
yeah. that said, you can use different queues simultaneously from different threads without locking
<azonenberg>
that's the whole point of multiple queues
<azonenberg>
The challenge comes when you need to run on hardware that has fewer queues than you want to use threads
<azonenberg>
like the integrated intel card on my laptop w/ a single queue
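For context, the pattern bob_twinkles and azonenberg are describing usually looks something like the sketch below. LockedQueue is a hypothetical wrapper, not scopehal's actual code: each queue carries its own mutex, so threads sharing a queue serialize their submits while threads on different queues never contend.
```cpp
#include <mutex>
#include <vulkan/vulkan.hpp>

// Hypothetical wrapper: Vulkan queues are externally synchronized, so all
// access to a given vk::Queue must be guarded by the caller
struct LockedQueue
{
	vk::Queue m_queue;
	std::mutex m_mutex;

	// Funnel every submission to this queue through one lock
	void Submit(const vk::SubmitInfo& info, vk::Fence fence = {})
	{
		std::lock_guard<std::mutex> lock(m_mutex);
		m_queue.submit(info, fence);
	}
};
```
On a device with a single queue (like that integrated Intel GPU), every thread shares one LockedQueue and serializes; with multiple queues, each thread can own one outright and never blocks.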
<azonenberg>
lain: so your shader is now fully integrated for uniform analog waveforms, except for alpha adjustment, because i don't have a GUI element for adjusting that yet. it just doesn't work yet :p
<_whitenotifier-7>
[scopehal] azonenberg f0ce4a7 - ComputePipeline: don't try to allocate a vk::DescriptorPoolSize with zero elements if a given shader has no SSBOs or images attached
<_whitenotifier-7>
[scopehal-apps] azonenberg 532d6ee - Progress on new renderer
<_whitenotifier-7>
[scopehal-apps] azonenberg f6ee00c - Added mutex to block waveform thread during framebuffer recreation (no idea why this is a conflict)
<azonenberg>
but ngscopeclient is still very much in the dev state
<azonenberg>
i dont consider it something end users should be using yet
<azonenberg>
so it's very low priority as the list of bugs goes
<d1b2>
<david.rysk> should the Vulkan/Metal renderer be working at all?
<d1b2>
<david.rysk> I have lib/scopehal on the mac branch
<d1b2>
<david.rysk> At the moment, RenderAnalogWaveform() crashes in GetTextureHandle() due to m_texture not being set
<azonenberg>
david.rysk: how recent is that, are you using latest from master as of this morning?
<d1b2>
<david.rysk> from the mac branch; I can switch to the master branch if that's preferred
<azonenberg>
c2f89693622efbb7054de3929221542eb82f350f in particular
<azonenberg>
yes, the mac branch is a fair bit behind
<d1b2>
<david.rysk> aha, ok
<d1b2>
<david.rysk> is it merged in yet?
<azonenberg>
that's lain's dev branch which is focusing on portability issues. she sends periodic PRs
<azonenberg>
but my rendering work is mostly in master and that is very active ongoing work
<azonenberg>
oh wait are you in ngscopeclient or glscopeclient for that test?
<d1b2>
<david.rysk> ngscopeclient
<azonenberg>
ok yeah. i was doing rendering work overnight and into the morning
<azonenberg>
anything from more than like 30 minutes ago is obsolete :p
<azonenberg>
The most recent version from today should build and run fine for uniform analog waveforms
<azonenberg>
there is no support for any other waveform type, which is a non-issue as long as you don't add an MSO channel from a scope to the view
<azonenberg>
because all of the other waveform types can only be created by filters and there's no gui code path to instantiate filters yet :p
<azonenberg>
I have a few other things to do, in particular cursors, markers, and history, to take care of before i work on filters
<azonenberg>
and then of course file load/save
<azonenberg>
but i might do filters before that
<d1b2>
<ehntoo> The waveform doesn't seem to update on zoom/pan on my Macbook with what's in master, but I'm getting >110 fps on 10M and 25M depth waveforms. 😃
massi has quit [Remote host closed the connection]
<azonenberg>
ehntoo: yeah rendering is done in a background thread, there's a bunch of things that need to trigger re-renders that don't necessarily do so yet
<azonenberg>
the tone mapping vs waveform rasterization are separate steps
<azonenberg>
the 60 Hz rendering loop is only drawing a texture to the screen so it's stupidly fast
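As a rough sketch of that split (helper names here are invented, not ngscopeclient's actual functions), the UI loop only ever draws an already-rendered texture:
```cpp
#include <GLFW/glfw3.h>
#include "imgui.h"

extern ImTextureID g_latestWaveformTexture;	// published by the waveform thread
extern void BeginImGuiFrame();				// hypothetical helper
extern void EndImGuiFrameAndPresent();		// hypothetical helper: submit + present

void UIRenderLoop(GLFWwindow* window, float plotWidth, float plotHeight)
{
	while(!glfwWindowShouldClose(window))
	{
		glfwPollEvents();
		BeginImGuiFrame();

		// Cheap: one textured quad per waveform area
		ImGui::Image(g_latestWaveformTexture, ImVec2(plotWidth, plotHeight));

		EndImGuiFrameAndPresent();
	}
}
```
The expensive steps (compute-shader rasterization, then tone mapping to an RGBA texture) run in the background thread and only need to re-run when the view changes or new data arrives, which is why a missing "trigger re-render" hook shows up as a stale waveform rather than a slow frame rate.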
<d1b2>
<ehntoo> (test waveform is a 10 MHz sine wave modulated at 10 kHz)
<azonenberg>
yeah anyway i have a bunch of fixes coming in the next few minutes
<d1b2>
<david.rysk> fwiw: build fails since it's looking for ffts and there's something not right with include paths. I'll debug later
<d1b2>
<ehntoo> yeah, to get ngscopeclient running you do still need to provide an ffts install. This is the diff I wound up with to get it compiling on my m1 max machine. FFTS still doesn't work, but it links, which is enough to get things going. 😉
<_whitenotifier-7>
[scopehal-apps] azonenberg 21a9a06 - Lots of updates to Vulkan rendering for better command batching. Fixed a couple of concurrency issues.
<azonenberg>
Looks like ngscopeclient has a similar scaling issue to glscopeclient wrt waveform area width vs plot width
<azonenberg>
i'll look into that
<azonenberg>
there is a known validation failure when splitting windows (may lead to a crash if you have validation layers off)
<azonenberg>
oh and LeCroy scopes will use 8 bit transfer resolution vs 16 bit regardless of preferences because we dont actually apply prefs yet
<azonenberg>
also the list of recently used scopes seems to sometimes not update, not sure what's up with that yet
<azonenberg>
ehntoo: 21a9a06 should fix the waveforms not updating when zooming/panning
<d1b2>
<ehntoo> yep! that's much better. I definitely notice the scaling issue you mention when panning.
<d1b2>
<ehntoo> something about the discrepancy between window area width and the graph display area width that doesn't include the y-axis area perhaps
<azonenberg>
Yes that is exactly what it is
<azonenberg>
i just need to figure out where i made the error
<azonenberg>
there's also a bunch of issues where when moving waveforms around or splitting groups you can try to destroy a texture after it's been submitted to a command queue, but before execution has finished
<azonenberg>
i have a list of pending textures that i hold onto that *should* keep references open until end of frame, but apparently that doesn't always work. so i have to figure out where i screwed that up
<azonenberg>
The other possibility is the scaling issue is DPI related but i thought i took care of that
<azonenberg>
anyway, it's coming along nicely but there's still some rough edges
<azonenberg>
we're still a ways from ngscopeclient being generally usable
<d1b2>
<bob_twinkles> "end of the frame" on the CPU side is rarely sufficient for resources, assuming you mean "after I've submitted the last render command." You probably want to hold on to them for (size of swapchain)-1 frames
<d1b2>
<bob_twinkles> that let's you use the swapchain fences to know that rendering has completed on old frames, though there's some potential hazards when the swapchain goes out of date
<azonenberg>
So the swapchain isn't the issue
<azonenberg>
it's after the fence on the commandbuffer has completed
<azonenberg>
Right now we actually destroy resources at the start of the next frame
<azonenberg>
more precisely, we clear a vector<shared_ptr<Texture>>
<azonenberg>
after we've acquired the next frame's image
<azonenberg>
So any textures that are no longer referenced by anything else are destroyed at that time
<azonenberg>
most likely the bug is i neglected to actually add a ref to a texture i used that frame
<azonenberg>
but i don't know for sure yet
<d1b2>
<bob_twinkles> That fence signals after all rendering work completes?
<azonenberg>
There's a couple, but yes. That said, it's OK to destroy the texture before the framebuffer swap etc
<azonenberg>
as long as the commandbuffer for *imgui* has submitted and completed execution
<azonenberg>
since we do our rendering all in compute shaders writing to textures, then pass those textures to imgui
<d1b2>
<bob_twinkles> Ah ok, that should be working as long as you're holding references to all the right textures 🙂
<azonenberg>
Yes, exactly
<azonenberg>
So the bug is presumably that we're not
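A minimal sketch of the deferred-destruction scheme described above (member names illustrative, not necessarily scopehal's):
```cpp
#include <memory>
#include <vector>
#include <vulkan/vulkan.hpp>

class Texture;	// scopehal's C++ wrapper around the underlying VkImage

class FrameResourceTracker
{
public:
	// Call for every texture the current frame's command buffer touches,
	// even if the GUI drops its own reference mid-frame
	void AddRef(std::shared_ptr<Texture> tex)
	{ m_texturesUsedThisFrame.push_back(std::move(tex)); }

	// Called at the start of the next frame
	void BeginFrame(vk::Device dev, vk::Fence lastFrameFence)
	{
		// The GPU has finished executing the previous command buffer...
		(void)dev.waitForFences(lastFrameFence, VK_TRUE, UINT64_MAX);

		// ...so it's now safe to drop our references; any texture nobody
		// else holds is destroyed here, never while the GPU still uses it
		m_texturesUsedThisFrame.clear();
	}

protected:
	std::vector<std::shared_ptr<Texture>> m_texturesUsedThisFrame;
};
```
The suspected failure mode, then, is a code path that records a texture into a command buffer without the corresponding AddRef, so the last real reference can die while the GPU is still reading it.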
<d1b2>
<ehntoo> window resize seems to deadlock in UpdateFramebuffer on master, as well. I can troubleshoot after work.
<azonenberg>
I'll look into that. this was a fix i added recently, tl;dr it appears that creating a swapchain is an operation that is exclusive with all background vulkan processing in other threads
<azonenberg>
within the same device handle
<azonenberg>
i get errors/crashes about queues being in use from other threads any time i try to create a swapchain in one thread while running stuff in another thread
<azonenberg>
so i added locks to guard against that and there's likely a bug around that
<azonenberg>
Most likely what's happening is that WaveformThread is blocking on g_waveformProcessedEvent
<azonenberg>
and that's not coming because we are in the resize handler
<azonenberg>
but i'll debug it after work
<d1b2>
<ehntoo> 👍
<azonenberg>
I wish there was a way to have a synchronization object that says "foo and bar are mutually exclusive with baz, but not each other"
<d1b2>
<ehntoo> it's not a normal usage for it, but could you make a shared_mutex represent that for this scenario? swapchain creation as lock, other operations as lock_shared?
<azonenberg>
that was not a primitive i knew about
<azonenberg>
i'll look into it
<azonenberg>
the challenge remains ensuring that everything comes to a clean state once i try to create the swapchain
<d1b2>
<ehntoo> normally used for multiple readers of a structure that needs an exclusive lock for writing.
<azonenberg>
in particular, what happens if i have N different threads that never all release the lock at the same time
<azonenberg>
how do i ensure the swapchain eventually gets the lock
<azonenberg>
i need some way to tell them all a swapchain creation is pending, have them wrap up what they're doing and not start more work
<azonenberg>
then release them all after
<azonenberg>
It's not a trivial problem
<d1b2>
<ehntoo> with a shared_mutex, my understanding is that as soon as there's a thread attempting to take the exclusive lock, no new threads requesting the shared lock will get access. as long as all the current holders of the shared lock eventually let go of it, the exclusive requester should get access.
<azonenberg>
ah ok
<azonenberg>
then that might be something to look into
<azonenberg>
the shared locks would all be held by things like "running the filter graph" or "rendering a waveform"
<azonenberg>
and released once those operations completed
<d1b2>
<ehntoo> yeah, I think this might be the primitive you're looking for.
<azonenberg>
yep i'll try that out tonight
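A minimal sketch of that arrangement, assuming C++17 (names illustrative):
```cpp
#include <shared_mutex>

std::shared_mutex g_vulkanActivityMutex;

void RenderWaveformOrRunFilterGraph()
{
	// Shared: any number of these run concurrently with each other
	std::shared_lock<std::shared_mutex> lock(g_vulkanActivityMutex);
	// ... submit compute / rendering work ...
}

void RecreateSwapchain()
{
	// Exclusive: waits for all shared holders to drain, and on typical
	// implementations blocks new shared acquisitions while waiting
	std::unique_lock<std::shared_mutex> lock(g_vulkanActivityMutex);
	// ... create the new swapchain with no other Vulkan work in flight ...
}
```
One caveat: the C++ standard does not actually mandate writer priority for std::shared_mutex, so whether the exclusive requester can starve is implementation-defined; ehntoo's description matches common implementations.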
<d1b2>
<ehntoo> oh. std::shared_mutex is C++17. 🙃
<azonenberg>
i have no problem bumping from 14 to 17 at this time
<azonenberg>
20 is the main step i want to avoid as that is not well supported on all of our target platforms at this time
<azonenberg>
17 is five years old and even debian stable has a compiler for it
<azonenberg>
ehntoo, lain, louis, bvernoux: are any of you aware of a platform we care about that does not have a c++17 compiler available?
<d1b2>
<zyp> tbf, debian stable even has reasonable c++20 support
<lain>
azonenberg: not that I'm aware of, no
<d1b2>
<ehntoo> same. I know MS has been pretty good about supporting new C++ revisions, so I don't think windows will be a problem.
<azonenberg>
ok so then tonight we'll bump to 17
<azonenberg>
we already have our own event class similar to std::event that doesn't depend on c++20
balrog has quit [Quit: Bye]
balrog has joined #scopehal
<d1b2>
<Mughees> So now I sum the large values first and just push the small values onto the output waveform vector (after dividing), and I still get zeros in the statistics... So I think we need to fix this in the statistics filter....
<azonenberg>
Have you printed out the calculated area values?
<azonenberg>
it's entirely possible the stats filters *also* need numerical stability improvements
<azonenberg>
but let's make sure the data we're feeding in is good
<azonenberg>
it looks like all you're doing is rescaling the data initially
<azonenberg>
you're not actually taking into account differences in magnitude between data elements
<d1b2>
<Mughees> let me try
<d1b2>
<Mughees> These are some values
<d1b2>
<Mughees> Area 0.00000000316110725000 Area 0.00000000316225925000 Area 0.00000000316338050000 Area 0.00000000316447125000 Area 0.00000000316553125000 Area 0.00000000316656050000 Area 0.00000000316755900000 Area 0.00000000316852675000 Area 0.00000000316946375000 Area 0.00000000317036975000 Area 0.00000000317124475000 Area 0.00000000317208900000 Area 0.00000000317290225000 Area 0.00000000317368450000 Area 0.00000000317443575000 Area
<d1b2>
0.00000000317515575000 Area 0.00000000317584475000 Area 0.00000000317650250000 Area 0.00000000317712925000 Area 0.00000000317772475000 Area 0.00000000317828900000 Area 0.00000000317882200000 Area 0.00000000317932375000 Area 0.00000000317979425000 Area 0.00000000318023350000 Area 0.00000000318064150000 Area 0.00000000318101825000 Area 0.00000000318136350000 Area 0.00000000318167750000 Area 0.00000000318196000000 Area 0.00000000318221125000
<d1b2>
Area 0.00000000318243100000 Area 0.00000000318261950000 Area 0.00000000318277650000 Area 0.00000000318290225000 Area 0.00000000318299650000 Area 0.00000000318305925000 Area 0.00000000318309075000
<azonenberg>
Ok so you are getting nonzero values. So the numerical stability issues may well be in the average block
<azonenberg>
Also is signed area really what you want? i.e. do negative values count as positive vs taking away from the total?
<d1b2>
<Mughees> negative count as positive
<azonenberg>
because this isn't the integral of the waveform anymore
<azonenberg>
it might make sense to either have two filters or a parameter to select the mode
<d1b2>
<Mughees> ok can think about it
<d1b2>
<Mughees> One clarification: in a sparse waveform, m_durations and m_offsets are both in units of femtoseconds, right?
<d1b2>
<Mughees> and not in timescale units?
<azonenberg>
No. all time units are in timescale units
<d1b2>
<Mughees> good
<azonenberg>
It is common for the timescale to be 1, in which case duration/offset are in femtoseconds
<d1b2>
<Mughees> thanks
<azonenberg>
for example, any kind of filter that interpolates a zero crossing will have a timescale of 1 in the output
<azonenberg>
but if you're looking at say logic analyzer output after compression, it's going to be sparse with timescale of the sample period
<azonenberg>
And then all timestamps are offset by the trigger phase but if you're only looking at durations and vertical axis values, you can ignore that for the most part
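In code, that convention works out to roughly the following for a sparse waveform (a sketch using the member names discussed in this conversation):
```cpp
// Absolute start time and width of sample i, in femtoseconds
int64_t start_fs    = wfm->m_triggerPhase + wfm->m_timescale * wfm->m_offsets[i];
int64_t duration_fs = wfm->m_timescale * wfm->m_durations[i];
```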
<d1b2>
<Mughees> I was asking in the context of calculating the area of one sample of a sparse waveform
<d1b2>
<Mughees> it comes out to m_samples[i] * m_durations[i] * m_timescale
<d1b2>
<Mughees> and this to be summed across all samples
<azonenberg>
Yes
<azonenberg>
So for the summation you will need to use a numerically stable floating point summing algorithm
<d1b2>
<Mughees> Is it really necessary even now, since I now divide by FS_PER_SECOND after the whole summation is complete?
<d1b2>
<Mughees> for (size_t i = 0; i < length; i++)
{
    area += fabs(uadin->m_samples[i]) * din->m_timescale;
    cap->m_samples.push_back(area / FS_PER_SECOND);
}
<d1b2>
<Mughees> so area accumulates with large values only
<d1b2>
<Mughees> i mean not very very very small values
<azonenberg>
That's not the point
<azonenberg>
the order of magnitude of the values you're summing is irrelevant
<azonenberg>
what matters is the partial sum's value compared to the values you're adding in
<azonenberg>
imagine adding 1 to X a few million times
<azonenberg>
at the start it's 1+1
<azonenberg>
at the end it's 1M + 1, and in 32-bit float, once the sum passes 2^24 (~16.7M) adding 1 doesn't change it at all
<azonenberg>
the more values you have, and the greater their spread, the worse the problem is
<d1b2>
<Mughees> hmmm...i agree...will do
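One standard fix here is Kahan (compensated) summation; below is a sketch adapted from the loop pasted earlier, assuming the same uadin/din/cap variables and surrounding declarations:
```cpp
// Kahan summation: recover the low-order bits that a naive
// "big += small" accumulation rounds away each iteration
double area = 0;
double comp = 0;	// running compensation term
for(size_t i = 0; i < length; i++)
{
	double term = fabs(uadin->m_samples[i]) * din->m_timescale;
	double y = term - comp;
	double t = area + y;	// low-order bits of y are lost here...
	comp = (t - area) - y;	// ...and recovered here for the next pass
	area = t;
	cap->m_samples.push_back(area / FS_PER_SECOND);
}
```
Even just accumulating in double rather than float buys 29 extra significand bits; the compensation term then makes the rounding error nearly independent of sample count.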
<lain>
azonenberg: tell me to go read the vulkan docs if the answer is complex, but do you think it makes sense for QueueManager to maintain a list of all possible Queues, just grabbing them all at the start, or should we allocate on demand?
<azonenberg>
Allocate on demand
<lain>
kk
<azonenberg>
have a list of all possible queues, but don't actually create them
<azonenberg>
i.e. (family, index) tuples
<azonenberg>
vs vulkan handles
<bvernoux>
Great with latest ngscopeclient
<bvernoux>
I get about 26 Hz waveform rate with the demo
<bvernoux>
and between 0 and 1 pending waveforms
<bvernoux>
So it is ultra responsive
<bvernoux>
In full screen too
<bvernoux>
in comparison I get 15 Hz waveform rate with glscopeclient
<azonenberg>
yep. still fixing a bunch of bugs
<azonenberg>
but it's super fast for the subset of features it supports
<bvernoux>
I was afraid it was slower than with the CPU graph ;)
<bvernoux>
but it is even smoother
<bvernoux>
Will you also port the modifications you are doing back to glscopeclient?
<bvernoux>
It would be really interesting to have glscopeclient and ngscopeclient on par for waveform rendering with the optimized Vulkan path
<bvernoux>
as the remaining overhead will be related to gtk on glscopeclient
<azonenberg>
This *is* that overhead
<azonenberg>
it's literally lain's vulkan-ized rendering shader from glscopeclient
<azonenberg>
the gtk overhead is that big
bvernoux has quit [Quit: Leaving]
<azonenberg>
it's constantly shuffling gl contexts around and calling expensive xlib APIs
<_whitenotifier-7>
[scopehal] azonenberg 0a668c0 - Fixed one extra ..pipe
<d1b2>
<ehntoo> I took a quick stab at using a shared_mutex for the swapchain mutex - seems to work for me using a shared_lock in VulkanWindow::Render() and a unique_lock in WaveformThread. I don't know that I'm mentally equipped to fully reason about this after the poor night's sleep I got last night, though.
<azonenberg>
I already have some fixes done
<azonenberg>
not pushed yet
<d1b2>
<ehntoo> sweet
<azonenberg>
i also fixed a whole bunch of other threading issues around when you do things like split or close a waveform group while the render thread is busy
<azonenberg>
making sure we keep at least one reference to the shared_ptr open until the render shader finishes
<azonenberg>
even if the result will then be discarded
<azonenberg>
(the VkCommandBuffer points to textures, descriptors, etc internally even though it doesn't hold references to the C++ objects, so we have to manage that ourselves)
<azonenberg>
some of those fixes dragged down the frame rate though
<azonenberg>
So i'm not pushing until i find solutions for that
<_whitenotifier-7>
[scopehal-apps] azonenberg 3c34bd4 - Fixed a lot of thread safety issues, improved performance of some operations
<_whitenotifier-7>
[scopehal-apps] azonenberg ec9ad81 - Replaced heavyweight mutexing around ImGui::UpdatePlatformWindows() with locks only when we actually create, destroy, or resize a platform window
<lain>
azonenberg: how about if QueueManager provides some methods e.g. GetComputeQueue(), GetRenderQueue(), GetTransferQueue() to mimic the existing Allocate*Queue global funcs.. or would you prefer a different interface?
<azonenberg>
Yes, i think that makes the most sense. methods to allocate a queue for a given purpose
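A rough sketch of that interface as discussed (hypothetical, written before the real QueueManager landed):
```cpp
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

class Queue;	// wrapper around a vk::Queue plus its external-sync mutex

class QueueManager
{
public:
	// Purpose-specific entry points mirroring the old Allocate*Queue globals
	std::shared_ptr<Queue> GetComputeQueue(const std::string& name);
	std::shared_ptr<Queue> GetRenderQueue(const std::string& name);
	std::shared_ptr<Queue> GetTransferQueue(const std::string& name);

protected:
	// All (family, index) tuples the device exposes, enumerated up front;
	// the actual queue object is only created on first request
	struct Slot
	{
		uint32_t m_family;
		uint32_t m_index;
		std::shared_ptr<Queue> m_queue;	// null until first handed out
	};
	std::vector<Slot> m_slots;
	std::mutex m_mutex;	// the manager itself is shared between threads
};
```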