azonenberg changed the topic of #scopehal to: libscopehal, libscopeprotocols, and glscopeclient development and testing | https://github.com/glscopeclient/scopehal-apps | Logs: https://libera.irclog.whitequark.org/scopehal
<d1b2> <louis> azonenberg: About to PR the minimal zero-hold changes I was carrying
<d1b2> <louis> More generally, how do we want to deal with sparse waveform data that does not satisfy offset[i] + duration[i] == offset[i+1]? Because currently all(?) rendering pathways assume this
<d1b2> <louis> I would be loath to lose the optimization of avoiding copying the durations[], but clearly we need to in some cases to correctly render waveforms that don't have data at some points
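For reference, the invariant louis is describing: a sparse waveform is gap-free only if every sample ends exactly where the next one begins. A minimal standalone sketch of that check (names are illustrative, not scopehal's actual API):

```cpp
// Hypothetical sketch of the contiguity invariant under discussion:
// a sparse waveform is gap-free iff offset[i] + duration[i] == offset[i+1]
// for every adjacent pair of samples.
#include <cstdint>
#include <vector>

bool IsGapFree(const std::vector<int64_t>& offsets,
               const std::vector<int64_t>& durations)
{
    for(size_t i = 0; i + 1 < offsets.size(); i++)
    {
        if(offsets[i] + durations[i] != offsets[i + 1])
            return false;   // gap (or overlap) between sample i and i+1
    }
    return true;
}
```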
Degi has quit [Ping timeout: 244 seconds]
Degi_ has joined #scopehal
Degi_ is now known as Degi
<d1b2> <Darius> I ended up using PocketFFT at work (which is used by numpy/scipy/etc)
<azonenberg> I think it's OK to give up on that optimization near term
<azonenberg> because we don't usually have huge sparse waveforms
<azonenberg> at least analog/digital, vs protocol
<azonenberg> the one exception is things like a cdr pll output
<azonenberg> which is large and sparse but gap free
<d1b2> <louis> It isn't actually that complex in the shader to support both
<d1b2> <louis> More it's just annoying to have this combinatorial explosion of shader versions to handle in cpp
<d1b2> <louis> So wondering if it would be improved by rewriting that handling to be less manual
<_whitenotifier-7> [scopehal-apps] 602p opened pull request #504: Zero-hold rendering flag - https://github.com/glscopeclient/scopehal-apps/pull/504
<_whitenotifier-7> [scopehal] 602p opened pull request #698: Zero-hold rendering flag - https://github.com/glscopeclient/scopehal/pull/698
<d1b2> <louis> PR opened. If you're not hankering to merge right now I will take a stab at having sparse rendering take durations
<_whitenotifier-7> [scopehal-apps] 602p synchronize pull request #504: Zero-hold rendering flag - https://github.com/glscopeclient/scopehal-apps/pull/504
<_whitenotifier-7> [scopehal-apps] 602p edited pull request #504: Zero-hold rendering - https://github.com/glscopeclient/scopehal-apps/pull/504
<d1b2> <louis> At the moment the behaviour in that PR keeps the existing behaviour of disregarding duration, except when zero-hold rendering is requested.
<d1b2> <louis> Not sure what we want to do more generally since I'm not sure how well-defined interpolation is on a noncontinuous (i.e. offs[i] + dur[i] != offs[i+1]) waveform and I wanted to leave that logic as-is
<_whitenotifier-7> [scopehal-apps] 602p synchronize pull request #504: Zero-hold rendering - https://github.com/glscopeclient/scopehal-apps/pull/504
<azonenberg> yeah i think that probably makes the most sense
<azonenberg> zero hold has gaps, interpolation doesn't
<azonenberg> pass the duration buffer always, but the shader won't read from it unless doing zero hold?
<azonenberg> I'm not sure i like the idea of having it be a flag in the waveform though
<azonenberg> it makes more sense to be a view setting
<azonenberg> IMO
<d1b2> <louis> As PR'd, the flag is specified as a hint, with a note that it could be overridden by a per-view setting
<d1b2> <louis> I'm not attached to the flag, my wish is just that (IMO) the sparse measurement filters make more sense to be rendered sparse as a default so we need some way to communicate that
<d1b2> <louis> Maybe a flag for "the offsets and durations here are perfectly precise (relative to the input waveform) and not a product of a sample rate" that gets set on measurements, and then there could be a setting for the (default) rendering behavior of that kind of waveform
<d1b2> <louis> Since that's really the semantic to me
<azonenberg> hmm i'm not even sure about that
<azonenberg> like, if i'm looking at a FM modulation or something
<azonenberg> i think linear interpolation makes more sense
<d1b2> <louis> hm
<d1b2> <louis> At root my UX wish is that non-contiguous measurements [configurably] aren't interpolated by default; I don't really care how that's accomplished. But it seems to necessitate some flag for noncontiguity
<azonenberg> yeah
<azonenberg> i'm just thinking that for some cycle by cycle measurements it might make more sense
<azonenberg> but i guess to start, making all of them zero-hold makes the most sense?
<d1b2> <louis> Thinking about it more, part of my beef is that the current interpolation behaviour means that the line contradicts the data inside the sample, since interpolation is between the leftmost point of sample n and the leftmost point of sample n+1
<azonenberg> can you explain what you mean?
<azonenberg> interpolation is effectively putting a dot at each sample then connecting the dots
<d1b2> <louis> yes, but the samples have duration so they aren't dots really
<d1b2> <louis> the code in that PR does
<d1b2> <louis> current interpolation behaviour is
<d1b2> <louis> Maybe preferable for noncontiguous waveforms would be
<azonenberg> ah so you're saying we ignore duration and interpolate immediately. i see
<azonenberg> yes
<azonenberg> I think we ultimately will want a couple of different options we can select from
<d1b2> <louis> If we did this (bonus points for lighter coloring of the interpolated part) it would also solve my UX issue
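A hedged sketch of the hybrid behaviour louis is proposing here: hold each sample flat for its duration, then interpolate across any gap to the next sample. The Segment type and function name are illustrative only, not the PR's actual code:

```cpp
// Hypothetical sketch of the proposed rendering, as line segments in
// (time, value) space. Current behavior ignores durations and interpolates
// point-to-point; zero-hold holds for the duration and leaves gaps empty;
// this hybrid holds for the duration, then interpolates across the gap.
#include <cstdint>
#include <vector>

struct Segment { int64_t t0; float v0; int64_t t1; float v1; };

std::vector<Segment> HybridSegments(
    const std::vector<int64_t>& off,
    const std::vector<int64_t>& dur,
    const std::vector<float>& val)
{
    std::vector<Segment> segs;
    for(size_t i = 0; i < off.size(); i++)
    {
        int64_t end = off[i] + dur[i];
        segs.push_back({off[i], val[i], end, val[i]});          // flat hold
        if(i + 1 < off.size())
            segs.push_back({end, val[i], off[i+1], val[i+1]});  // interpolated gap
    }
    return segs;
}
```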
<azonenberg> for now, what you have is probably a good starting point. i'll look at the code and merge shortly
<azonenberg> before we do too much more on that i want to get it pulled into ngscopeclient
<azonenberg> I'm trying to avoid having to do too much UX work twice
<d1b2> <louis> roger
<azonenberg> essentially my thought is, at this point i think we're mostly in agreement that ngscopeclient is the future, but glscopeclient is now and people have to get their work done
<azonenberg> so fix any bugs in glscopeclient that are blocking people
<azonenberg> continue developing backend as normal
<d1b2> <louis> 👍
<azonenberg> but avoid adding new features to glscopeclient that we'll just have to port to ngscopeclient
<azonenberg> and have ngscopeclient be the focus of UI dev
<d1b2> <louis> yeah that PR satisfies my desire w/r/t glscopeclient. and i'll bikeshed the exact rendering in ngscope
<azonenberg> ok
<azonenberg> sounds like a plan
<azonenberg> i'll review and merge then get moving on getting the new renderer into ngscopeclient
<azonenberg> On that note, M asked me to prepare a brief for marketing people to help us come up with a new name
<azonenberg> since the new tool is a perfect opportunity to do the rebranding we've needed to do from day one :p
<azonenberg> So if you or anyone else wants to weigh in, dump ideas in the channel
Degi_ has joined #scopehal
Degi has quit [Ping timeout: 250 seconds]
Degi_ is now known as Degi
<_whitenotifier-7> [scopehal-apps] azonenberg pushed 1 commit to master [+0/-0/±9] https://github.com/glscopeclient/scopehal-apps/compare/e38ec20a0b19...7983c3fc0380
<_whitenotifier-7> [scopehal-apps] azonenberg 7983c3f - Added ToneMapAllWaveforms()
<azonenberg> So we definitely need to optimize our queue allocations at some point
<azonenberg> For example, moltenvk has a fairly small number of queues, and each one can do anything (they're drop-in equivalents)
<azonenberg> my recent nvidia card has three types of queue
<azonenberg> the first one (which we currently use for everything) has 16 queues, and can do anything
<azonenberg> the second has only 2 queues, and can only do host/device transfers. but we should use those for transfer as it's optimized for that (and it means we're not using queues that could be doing something else)
<azonenberg> and there's a third type with 8 queues, which can do anything but rendering
<azonenberg> (so we should be using those for a lot of the background processing)
<azonenberg> Right now we are using four queues that I don't have names for at the moment (I have to figure out where I allocate those)
<azonenberg> eight (one per thread) for filter graph execution
<azonenberg> one for actual rendering
<azonenberg> one for the rendering thread
<azonenberg> one for vkFFT that i think we can optimize out, as it's only used during initialization
<azonenberg> and one dedicated to host/device transfers
<azonenberg> So we're maxed out on my nvidia card and on some other cards we might even be running out
<azonenberg> But i think that's something to work on later
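For context, selecting a dedicated transfer-only queue family with the Vulkan C++ bindings might look roughly like the following. This is a sketch of the general technique, not scopehal's actual allocator:

```cpp
// Hedged sketch: find a queue family that can do transfers but not
// graphics or compute (the "2 transfer-only queues" case described above).
#include <vulkan/vulkan.hpp>
#include <cstdint>
#include <optional>

std::optional<uint32_t> FindTransferOnlyFamily(vk::PhysicalDevice physDev)
{
    auto families = physDev.getQueueFamilyProperties();
    for(uint32_t i = 0; i < families.size(); i++)
    {
        auto flags = families[i].queueFlags;
        if( (flags & vk::QueueFlagBits::eTransfer) &&
            !(flags & vk::QueueFlagBits::eGraphics) &&
            !(flags & vk::QueueFlagBits::eCompute) )
            return i;   // dedicated transfer family
    }
    return std::nullopt;    // fall back to a general-purpose family
}
```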
bvernoux has joined #scopehal
<_whitenotifier-7> [scopehal-apps] bvernoux edited pull request #503: Fix GitHub CI Windows Build/Install issue (WIP) - https://github.com/glscopeclient/scopehal-apps/pull/503
<_whitenotifier-7> [scopehal-apps] bvernoux edited pull request #503: Fix GitHub CI Windows Build/Install issue (WIP) - https://github.com/glscopeclient/scopehal-apps/pull/503
massi has joined #scopehal
azonenberg has quit [Ping timeout: 260 seconds]
azonenberg has joined #scopehal
<_whitenotifier-7> [scopehal] azonenberg pushed 1 commit to master [+0/-0/±1] https://github.com/glscopeclient/scopehal/compare/6022d1756b94...7a135f408215
<_whitenotifier-7> [scopehal] azonenberg 7a135f4 - ComputePipeline: AddComputeMemoryBarrier is now static
<_whitenotifier-7> [scopehal-apps] azonenberg pushed 1 commit to master [+0/-0/±11] https://github.com/glscopeclient/scopehal-apps/compare/7983c3fc0380...62934e00b14b
<_whitenotifier-7> [scopehal-apps] azonenberg 62934e0 - Continued setup for renderer
massi has quit [Remote host closed the connection]
<_whitenotifier-7> [scopehal-apps] bvernoux opened issue #505: Windows GitHub CI MSI Build / PortableApps missing shaders waveform-compute.* - https://github.com/glscopeclient/scopehal-apps/issues/505
<_whitenotifier-7> [scopehal-apps] bvernoux commented on pull request #503: Fix GitHub CI Windows Build/Install issue (WIP) - https://github.com/glscopeclient/scopehal-apps/pull/503#issuecomment-1259765159
<_whitenotifier-7> [scopehal-apps] bvernoux closed pull request #503: Fix GitHub CI Windows Build/Install issue (WIP) - https://github.com/glscopeclient/scopehal-apps/pull/503
<azonenberg> Strange threading issue in the new renderer. vulkan validation complains that the queue for WaveformThread is being accessed by another thread simultaneously under certain conditions (like a window being resized)
<azonenberg> but i don't see how that could be possible because i declared the queue as a local variable within the thread method
<azonenberg> and i never pass it to anything that holds onto it
<azonenberg> so i can't imagine how the handle could ever end up in another thread
<azonenberg> adding a mutex to prevent updating the framebuffer while doing this seems to have solved it
<d1b2> <louis> Whoops, totally forgot to handle no int64 in my renderer patch. Will look at that this afternoon/evening
<azonenberg> so possibly i have to block all background processing on any queue when making a swapchain? if so this is not well documented
<bvernoux> Has anyone understood the weirdness of the CMake install for Windows, which does not want to copy all the shaders? Each time, the waveform-compute.* files are not copied
<azonenberg> (poke lain, since i think she wrote that code?)
<bvernoux> Why does this line
<bvernoux> not do the job?
<bvernoux> It is a simple directory copy of shaders which contains the famous waveform-compute.*
<azonenberg> louis: yeah that is needed to support a lot of older intel integrated cards off the top of my head
<azonenberg> i forget if the M1 gpu had native int64 or not
<_whitenotifier-7> [scopehal-apps] 602p synchronize pull request #504: Zero-hold rendering - https://github.com/glscopeclient/scopehal-apps/pull/504
<d1b2> <louis> OK, I think it's just that simple
<azonenberg> Will have a look and merge after work
<d1b2> <Mughees> @louis @azonenberg Is there a limit to the number of significant digits when outputting a float value? (fabs(uadin->m_samples[i]) * (float) din->m_timescale) / FS_PER_SECOND; evaluates to zero when output through the stats window. Graph view shows at least something... Can we increase it somehow? I am implementing an area-under-the-curve filter
<azonenberg> hmmm did you declare the units for the filter?
<azonenberg> it should be using Unit::PrettyPrint() iirc
<azonenberg> which will give you nice suffixes like ps/ns/fs etc
<d1b2> <Mughees> no..let me try
<azonenberg> if you're trying to output in volts or something i could see that being a problem
<azonenberg> also i'm not sure why you are dividing by fs per second
<azonenberg> the native output of the filter should be in fs
<azonenberg> then the unit should take care of converting that to whatever SI scale is appropriate
<azonenberg> yeah you're outputting in volts
<d1b2> <Mughees> yeah
<d1b2> <Mughees> should be Vs
<azonenberg> ... oh, yeah this is volt-seconds
<azonenberg> ok so you are starting to hit a problem with our existing units framework
<azonenberg> this is actually the thing that blocked me last time i looked at making an integration filter
<d1b2> <Mughees> hmmm
<d1b2> <Mughees> any quick expert suggestions 🙂
<azonenberg> It's an open problem at this point. We need a more general units framework that understands how to represent arbitrary algebraic combinations of existing units
<azonenberg> of base units*
<azonenberg> ping @louis as well
<d1b2> <Mughees> Can't we have something quick that simply displays values?
<azonenberg> I mean the quick and dirty hack is to extend the Unit class with a new volt-seconds unit
<azonenberg> and set your y axis to use that
<azonenberg> (this is separate from the issue about your value not showing up in the stat, i think)
<d1b2> <Mughees> but how would the zero problem be resolved?
<azonenberg> So that's a different problem
<d1b2> <Mughees> basically any value like 1e-11 outputs zero
<azonenberg> I would have to look at the code and/or poke at stuff in a debugger. I do know that there is some rounding involved, but in general the unit class should use SI suffixes when printing things
<azonenberg> and i think it knows down to femto
<azonenberg> so this *should* not be a problem
<azonenberg> It could also be a numerical stability issue somewhere in either the statistic or your filter implementation
<azonenberg> if you're summing a bunch of extremely small numbers that add to almost zero, fp32 intermediates may not be sufficient precision
<azonenberg> and/or you may have to use a more sophisticated summing algorithm that sorts the values and sums similar magnitude ones first etc
<d1b2> <Mughees> fp32 is between 1e-38 and 1e38
<azonenberg> yes but that's total range
<azonenberg> if you add say 1e-20 to 1.0 you get 1.0
<azonenberg> because the dynamic range is much smaller
<azonenberg> I'm not saying this is your problem but it's certainly something i'm suspicious of being a contributing factor
<d1b2> <Mughees> hmmmm....
<d1b2> <Mughees> will explore more
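A quick standalone demonstration of the absorption problem azonenberg describes: fp32 carries only about 7 significant decimal digits, so a tiny addend vanishes next to a large accumulator even though both values are individually well within range:

```cpp
#include <cstdio>

int main()
{
    float acc = 1.0f;
    acc += 1e-20f;          // far below the ULP of 1.0f; contributes nothing
    printf("%.9g\n", acc);  // prints 1, the 1e-20 was absorbed entirely
    return 0;
}
```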
Bird|otherbox has quit [Ping timeout: 248 seconds]
Bird|otherbox has joined #scopehal
<azonenberg> also random other interesting observation: i threw VTune at my LeCroy scope and it seems that at least the FFT block *does* use AVX
<azonenberg> But e.g. the SDA package seems to only use SSE
<d1b2> <louis> @Mughees if you crank up the magnitude of the sine wave and crank down the sample rate does it stop returning zero? That would indicate it's a very-small-float problem to me
<d1b2> <Mughees> yeah that is exactly what happens
<azonenberg> ok yeah so you have a numerical stability problem. Quick and dirty fix for smaller datasets is to do the integration in a fp64 temporary but that will still fail with larger datasets
<azonenberg> The proper option is to use a more numerically optimized summation algorithm
<azonenberg> https://en.wikipedia.org/wiki/Kahan_summation_algorithm may be worth looking at, although we'd have to make sure our optimization settings on the compiler didn't cause problems (at one point i think we had -ffast-math enabled)
<azonenberg> i forget if we still do
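For reference, the compensated summation from the linked article looks like this. Note that aggressive flags such as -ffast-math can legally delete the compensation term, which is exactly the concern raised above:

```cpp
// Minimal Kahan (compensated) summation sketch.
#include <vector>

float KahanSum(const std::vector<float>& x)
{
    float sum = 0.0f;
    float c   = 0.0f;           // running compensation for lost low-order bits
    for(float v : x)
    {
        float y = v - c;        // corrected addend
        float t = sum + y;      // low bits of y are lost here...
        c = (t - sum) - y;      // ...and recovered into c for the next pass
        sum = t;
    }
    return sum;
}
```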
<d1b2> <Mughees> ok.
<d1b2> <Mughees> in a tektronix demo video i saw that the area unit was in terms of nano units
<d1b2> <Mughees> seems like it was automatically judging the sampling rate and assigning an appropriate unit based on that
<d1b2> <Mughees> like nano volt-seconds
<azonenberg> So you have two different problems
<azonenberg> the first is numerical stability causing your result to be zero
<azonenberg> second is the unit issue
<azonenberg> For the near term, if we assume that you are only using this on inputs of type volt, we can just create a volt-second unit and have the units library automatically add SI scaling factors as needed
<azonenberg> it's not ideal but will get us something usable near term
<azonenberg> But you still need a more stable summation algorithm to have something to print
<d1b2> <Mughees> ok
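To illustrate the quick fix being discussed: scopehal's Unit::PrettyPrint() picks an SI prefix automatically, so a volt-second unit mostly needs a suffix plus the usual scaling. A standalone sketch of that prefix-picking behaviour (an assumption-laden toy, not scopehal's actual implementation):

```cpp
// Toy version of SI-prefix pretty printing for a hypothetical
// volt-second unit: pick a prefix so the mantissa lands in [1, 1000).
#include <cmath>
#include <cstdio>
#include <string>

std::string PrettyPrintVoltSeconds(double v)
{
    static const char* prefixes[] = {"f", "p", "n", "µ", "m", ""};
    int idx = 0;
    double scale = 1e15;                    // start at femto
    while(idx < 5 && std::fabs(v) * scale >= 1000)
    {
        scale /= 1000;
        idx++;
    }
    char buf[64];
    snprintf(buf, sizeof(buf), "%.3g %sVs", v * scale, prefixes[idx]);
    return buf;
}

int main()
{
    printf("%s\n", PrettyPrintVoltSeconds(1.5e-9).c_str());  // "1.5 nVs"
    return 0;
}
```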
<d1b2> <louis> I think mughees is saying that you could determine (somehow) the scale and then produce an output where 1.0f is 1 nano-volt-second instead of 1 volt-second
<d1b2> <louis> Might be something to think about when we do some hacking for algebraic units. We already have this with mV/V I think
<azonenberg> louis: mV is a hack for Y axis bathtub curves
<azonenberg> we need to solve the algebraic unit issue first
<azonenberg> for display scale, i dont see the need to worry about it, fp32 has plenty of range to use SI base units for the most part
<azonenberg> our problem is numerical stability not lack of range
bvernoux has quit [Quit: Leaving]
* lain begins attacking #685
<azonenberg> lain: one thing we should probably do, i forget if i mentioned in the issue comments
<azonenberg> is make use of other queue types
<lain> ah yeah
<azonenberg> right now we use the first compatible type for all of our queues
<azonenberg> my nvidia card has 16x general purpose/rendering queues that can do anything
<azonenberg> we currently allocate all 16 of them although i think i can optimize out 1-2 at least
<lain> I will keep that in mind, I read what you said about that earlier in the backlog
<azonenberg> then the 2 transfer-only queues and the 8 compute-focused ones go unused
<azonenberg> so e.g. g_vkTransferQueue should be put in a transfer-only queue type if one exists
<azonenberg> It's not a trivial problem given the range of GPUs out there and the different types of queues available
<lain> should we make an issue for it?
<azonenberg> i consider all of that part of 685
<azonenberg> but basically we need to make a single unified allocator
<lain> alright
<azonenberg> that understands all of the queue types available
<azonenberg> and when we request a new queue will decide the right type to use
<azonenberg> and it needs to understand how many we've already allocated from each family and such
<azonenberg> orthogonally we also probably want to optimize our code to not overuse queues
<azonenberg> how many queues does moltenvk provide?
<azonenberg> if you can pastebin a log of glscopeclient or ngscopeclient initializing on your mac with all of the gpu info that would help inform discussions
<lain> kk
<d1b2> <zyp> I think that depends on the gpu
<d1b2> <zyp> fwiw here's vulkaninfo from my hackintosh: https://paste.jvnv.net/view/ROzdl
<d1b2> <zyp> and my macbook: https://paste.jvnv.net/view/stceb
<azonenberg> ok so the first one has a total of four queues
<d1b2> <zyp> looks like both do
<azonenberg> as does the second
<azonenberg> So we definitely need to do some work to squeeze into less queues. this may mean adding mutexing or something to certain threads if we are limited on queues
<azonenberg> since a queue cannot be used from two threads concurrently
<azonenberg> Right now we have one thread for the backend waveform processing and rendering, one for the GUI, one for transfers, eight for the filter graph...
<azonenberg> we're using a total of 16 queues now
<azonenberg> one can definitely be optimized out, and we can cut the filter graph down from 8 threads to fewer
<azonenberg> four of the others i don't know where they came from and need to investigate
<d1b2> <zyp> by transfers you mean between regular and GPU memory?
<d1b2> <zyp> on the macbook with integrated graphics, there's no distinction there and hence no need to transfer
<azonenberg> Correct. so we can likely optimize out that queue entirely
<azonenberg> however there are a few cases where we use that queue for memory barriers too
<azonenberg> so we'd have to move those elsewhere
* lain currently on a side quest to fix catch2 on macos, since brew defaults to pulling in the latest stable (3.1.0), which has a breaking change: you have to include <catch2/catch_all.hpp> instead of <catch2/catch.hpp>
<lain> how do I see what variables are defined within cmake for a given package? I just want to add a -D to the CXX args so I can ifdef the includes appropriately
<azonenberg> re side quest, can we identify the version and change the include path or something?
<lain> that's what I'm doing :3
<azonenberg> you can look at cmakecache.txt to see all of the intermediate values
<lain> ah ok
<lain> I see "set(PACKAGE_VERSION "3.1.0")" in Catch2ConfigVersion.cmake, can I use that immediately after the find_package(Catch2 REQUIRED) call to check the catch2 package version?
<azonenberg> yes i think so
<lain> alrighty
<azonenberg> also lol my laptop's intel integrated card only has a single queue total
<azonenberg> that will be fun to deal with
<lain> ha
<lain> ok yeah, it becomes Catch2_VERSION after you include(Catch)
<lain> some day I'll understand cmake
<lain> ok, the catch2 v3.x fix is in my macos branch for now
<_whitenotifier-7> [scopehal] lainy commented on issue #685: Improve Vulkan queue allocator - https://github.com/glscopeclient/scopehal/issues/685#issuecomment-1260103490
<_whitenotifier-7> [scopehal-apps] azonenberg labeled issue #506: Figure out how to function on systems with relatively few Vulkan queues - https://github.com/glscopeclient/scopehal-apps/issues/506
<_whitenotifier-7> [scopehal-apps] azonenberg opened issue #506: Figure out how to function on systems with relatively few Vulkan queues - https://github.com/glscopeclient/scopehal-apps/issues/506
<azonenberg> lain: opened separate issue 506 for the "handle using only a handful of queues" scenario
<azonenberg> there are 1-2 queues i think i can optimize out of existence easily
<azonenberg> we can trivially single thread the filter graph (although for extra fun, we can multithread evaluation but use a single vulkan queue when doing that)
<azonenberg> we will probably need to add some kind of class/method that does conditional mutexing if two threads have collisions on the same queue ID
<azonenberg> i have ideas but haven't implemented any of it yet
<_whitenotifier-7> [scopehal] azonenberg assigned issue #681: AcceleratorBuffer: better unified memory support - https://github.com/glscopeclient/scopehal/issues/681
<azonenberg> lain: i also assigned the unified memory ticket to you since you've got access to a platform with unified memory and i don't
<azonenberg> and while it's not a first line "won't work at all on M1" portability issue, it would be a serious performance and memory usage impact
* monochroma wonders if unified memory optimization will help with low-end intel/amd integrated GPUs
<lain> likely
<lain> azonenberg: how do you feel about a global instance of a class that hands out queue handles and has a Dispatch() method which, if necessary, grabs the mutex for the given queue before actually dispatching
<lain> s/Dispatch/Submit/
<azonenberg> Hmmmm i'm not thrilled with that
<azonenberg> better idea: have a queue allocator class that hands out queue wrapper objects
<lain> eyy that was my next suggestion
<azonenberg> more specifically it gives out shared_ptr<QueueHandle> objects, that may have already been given out to someone else
<lain> yep, exactly
<azonenberg> a QueueHandle contains a vk::raii::Queue and a mutex
<lain> and that would provide the Dispatch method
<azonenberg> well it would be a submit method that takes a command buffer as an argument
<azonenberg> the one potential wrinkle is how we'd handle non-blocking submit operations
<lain> er yes, I keep swapping Submit<->Dispatch in my head for some reason
<azonenberg> i.e. if you don't block on the fence immediately after submitting
<azonenberg> you don't know when to release the mutex
<lain> hmm true
<azonenberg> I guess we can hold onto the fence
<azonenberg> and have the next submit call wait on it
<azonenberg> and have an atomic<bool> or something you can use to check
<azonenberg> idk, think about it
<lain> k
<azonenberg> i want to support nonblocking workflows
<azonenberg> having longer lived fences is probably a good thing wrt performance
<lain> I like that, I'll give it some more thought and start implementing it
<azonenberg> right now we create/destroy fences far more often than we need to
<azonenberg> note that if you are reusing a fence i think it has to be reset somehow
<azonenberg> check the api
<azonenberg> right now we destroy ours after a single use
<lain> will do
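Pulling the above together, a hedged sketch of the QueueHandle idea: a mutex-guarded wrapper around a vk::raii::Queue with a reusable fence, intended to be handed out as shared_ptr<QueueHandle> by the allocator. All names are illustrative and the nonblocking path is simplified; not the final scopehal API:

```cpp
#include <vulkan/vulkan_raii.hpp>
#include <cstdint>
#include <memory>
#include <mutex>

class QueueHandle
{
public:
    QueueHandle(vk::raii::Device& device, vk::raii::Queue&& queue)
        : m_device(device)
        , m_queue(std::move(queue))
        , m_fence(device, vk::FenceCreateInfo{})
        , m_pending(false)
    {}

    // Serializes access to the queue; waits out any previous nonblocking
    // submission before reusing the fence (fences must be reset for reuse).
    void Submit(vk::CommandBuffer cmdBuf)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if(m_pending)
        {
            (void)m_device.waitForFences({*m_fence}, VK_TRUE, UINT64_MAX);
            m_device.resetFences({*m_fence});
        }
        vk::SubmitInfo info({}, {}, cmdBuf);
        m_queue.submit(info, *m_fence);
        m_pending = true;
    }

protected:
    vk::raii::Device& m_device;
    vk::raii::Queue m_queue;
    vk::raii::Fence m_fence;    // long-lived fence, reset between submits
    std::mutex m_mutex;
    bool m_pending;
};
```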
<azonenberg> meanwhile i'm working on refactoring the fft code to not need a dedicated queue
<azonenberg> so that will relieve some pressure on limited queues
<lain> nice
<_whitenotifier-7> [scopehal] azonenberg pushed 1 commit to master [+0/-0/±3] https://github.com/glscopeclient/scopehal/compare/7a135f408215...4d90bd6546b5
<_whitenotifier-7> [scopehal] azonenberg 4d90bd6 - Refactored vkFFT wrapper code to not need a dedicated queue
<azonenberg> lain: other ideas: the allocator object should keep track of how many users of a given queue slot there are somehow
<azonenberg> maybe instead of giving out shared_ptr's give out unique_ptr's and have them refcount the underlying vulkan object, idk
<azonenberg> but basically, if you can avoid locking a mutex because you know a priori that there's no other users of that queue, it's faster
<azonenberg> i don't know if that can be done in a thread-safe fashion faster than locking a mutex
<azonenberg> the other thing is, there should be a free list so that if we create a queue to do some background processing and destroy it, that queue becomes free and is eligible for reallocation
<azonenberg> basically, we should try to overlap multiple users in the same physical queue as little as possible
<azonenberg> to reduce contention
<lain> agree
<lain> I was going to suggest handing out shared_ptr's and keeping a weak_ptr, but I'm not sure making that thread safe is any faster than just always grabbing the mutex
<azonenberg> anyway, build it and we can make the implementation fast later
<azonenberg> the API should be the same
<lain> yep
<azonenberg> If it indeed ends up being a bottleneck
<azonenberg> optimally, you want to do as much work as possible in a single submit operation
<azonenberg> so fill up as much as you can in a single commandbuffer
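As a closing illustration of that point, recording several compute dispatches into one command buffer means a single submit covers them all. A sketch with hypothetical types, not scopehal code:

```cpp
#include <vulkan/vulkan.hpp>
#include <cstdint>
#include <vector>

struct ComputePass { vk::Pipeline pipeline; uint32_t groupsX; };

// Record all passes into one command buffer: one Submit() for the batch,
// minimizing queue traffic and mutex contention.
void RecordBatch(vk::CommandBuffer cmdBuf, const std::vector<ComputePass>& passes)
{
    cmdBuf.begin(vk::CommandBufferBeginInfo{});
    for(const auto& pass : passes)
    {
        cmdBuf.bindPipeline(vk::PipelineBindPoint::eCompute, pass.pipeline);
        cmdBuf.dispatch(pass.groupsX, 1, 1);
        // (memory barriers between dependent passes omitted for brevity)
    }
    cmdBuf.end();
}
```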