<_whitenotifier-7>
[scopehal] azonenberg 62825f3 - Disabled GPU acceleration in TRCImportFilter since it seems to be broken. Will investigate after queue refactoring is done since a proper fix depends on a queue allocator.
<azonenberg>
numerical precision issues in waveform rendering are back. unsure if refactoring bug or something else
<azonenberg>
(in glscopeclient)
<azonenberg>
investigating...
<azonenberg>
Seems like it was introduced with the vulkan refactoring. that's a massive enough change it will be hard to bisect etc. Time to hav esome fun lol
<azonenberg>
lain: so i figured out the plotRight issue while looking at this
<azonenberg>
turns out you were doing the right thing
<lain>
:O
<azonenberg>
the texture is drawn over the entire gl viewport with no transformation
<azonenberg>
essentially what we end up doing is, allocate the buffer the size of the whole window
<azonenberg>
but only actually invoke the shader on the first plotRight columns
<azonenberg>
so we just waste a little memory
<lain>
ohh
<azonenberg>
We could fix it, but ngscopeclient eliminates the issue entirely so not worth it
<azonenberg>
right now i'm investigating what looks ilke weird numerical precision issues the GL shader doesnt have though
<_whitenotifier-7>
[scopehal-apps] azonenberg 6199908 - Initial implementation of preferences dialog in ngscopeclient. Supports boolean and enum preferences only. See #522.
<_whitenotifier-7>
[scopehal-apps] azonenberg e085a48 - Lots of work on preferences system in ngscopeclient
<_whitenotifier-7>
[scopehal-apps] azonenberg 14e2e26 - Removed comment about unnecessary fix
massi has quit [Remote host closed the connection]
<lain>
azonenberg: so I'm actually having trouble finding a previous commit that works as intended
<azonenberg>
lain: try a10fe6c
<lain>
doesn't work
<lain>
I'll rebuild to see why it didn't work, but that was the first one I tried
<azonenberg>
huh
<azonenberg>
i swear i tried that last night and it was ok
<lain>
will know in a few minutes!
<azonenberg>
anyway, i just want a fix. i'd suggest focusing more on root cause than diagnosing a potential regression
<azonenberg>
treat it as a new bug
<lain>
hm ok
<lain>
I was hoping to use bisect to narrow down the introduction point
<azonenberg>
yeah i know
<lain>
Warning: ReadDataFile: Could not open file "shaders/Convert16BitSamples.spv"
<lain>
terminate called after throwing an instance of 'vk::UnknownError'
<lain>
what(): vkCreateShaderModule: ErrorUnknown
<lain>
fish: “./src/glscopeclient/glscopeclie…” terminated by signal SIGABRT (Abort)
<azonenberg>
lol, ok
<azonenberg>
so that's the import filter not being happy because of missing shader config
<azonenberg>
you can probably just comment out the gpu code in TRCImportFilter as a quick workaround?
<lain>
mmm it just segfaults
<azonenberg>
or also try updating scopehal to latest
<azonenberg>
and only using the old scopehal-apps code
<lain>
oh, I patched it badly
<azonenberg>
ah ok
<azonenberg>
that would do it
<lain>
eh, ok, this has bugs from the vulkan render shaders in it
<lain>
I'mmm just gonna examine this as a new bug as you suggested :P
<azonenberg>
yeah
<azonenberg>
i feel like we've made extensive enough changes trying to bisect would be a nightmare and the diff would essentially be the entire shader rewrite
<lain>
yeahhh
<azonenberg>
as a starting point, i would be very suspicious of all math on x axis values especially if there is any chance of being cast to float32 at some point in the process
<azonenberg>
i attempted to keep stuff in the int64 domain as long as possible such that after the offset, the dynamic range from left to right of a single plot would easily fit in fp32
<azonenberg>
but that may have gone wrong somewhere
<azonenberg>
as a first order experiment, i suggest patching some code at various points to act as if the samples were uniform
<azonenberg>
essentially invent a fictional timebase and ignore the actual offset values in the code
<azonenberg>
see what happens
<azonenberg>
but i'm not sure if that will tell you anything useful
* lain
nod
<azonenberg>
it will at least let you confirm the problem is wrt x axis math i guess
<azonenberg>
That giant gap at left should not be there
<lain>
ah ok
<azonenberg>
you'll notice as you zoom in it comes and goes
<azonenberg>
and some samples appear and disapppear
<azonenberg>
i think due to it miscalculating the left/right bounds of which samples go in which pixels
<lain>
yep
<azonenberg>
so some columns of pixels get no samples or something
<azonenberg>
It would not surprise me if at least some of the bug is in PrepareGeometry() rather than the actual shader
<azonenberg>
We had a similar bug some time ago and iirc converting "xscale" from fp32 to fp64 fixed it
<azonenberg>
but it's fp64 now and we still have the issue
<azonenberg>
but that may not have been a complete fix, idk
<lain>
hmm
<lain>
so I added some quick debug code to WaveformArea::RenderTrace, to just LogDebug whether data->IsDensePacked() is true or false
<lain>
should there be multiple waveforms in the C2-C1 WaveformArea?
<lain>
oh nvm I misread the output
<azonenberg>
The top plot is the uniform analog differential input signal (computed by subtracting two signals from the trc import filter) with three digital overlays
<azonenberg>
there's a threshold which is uniform as well i think
<azonenberg>
the CDR PLL which is sparse
<azonenberg>
and the PRBS check which is sparse and should be all zeroes (no PRBS errors)
octorian has quit [Ping timeout: 244 seconds]
<azonenberg>
the second plot is the extracted cycle to cycle jitter and should be one sample per UI, sparse analog
octorian_ has joined #scopehal
octorian_ is now known as octorian
<lain>
hmm
<lain>
I'm suspicious of the zero hold stuff
<lain>
still debugging
<azonenberg>
yes, i was too
<azonenberg>
it seemed to work ok when i tested prior to merging
<azonenberg>
but i did not, in retrospect, test with any extremely long 1fs resolution sparse waveforms
<azonenberg>
Which is where numerical precision errors would creep in
<lain>
ah, I found a typo that would cause unpredictable behavior
<_whitenotifier-7>
[scopehal-apps] azonenberg bf3c3aa - Added support for preferences of "color" type in ngscopeclient. Added prefs for grid configuration. Improved display of grid to avoid colliding labels. See #522.
massi has joined #scopehal
<azonenberg>
from your refactoring or the zhold patches?
<azonenberg>
so i just realized something that further points to rounding or numerical precision issues
<azonenberg>
Open the test case, click on the timeline, and drag it left/right veeery slowly
<azonenberg>
note what looks like aliasing behavior
<azonenberg>
pixels come and go as the waveform moves
<azonenberg>
and the areas we render seem to shift as the x offset shifts
<azonenberg>
you can also see that the waveforms have an area at the left that always renders correctly, then things go haywire
<azonenberg>
aaand guess what
<lain>
yep
<lain>
:o
<lain>
did you find it lol
<azonenberg>
no
<azonenberg>
but i have more evidence
<azonenberg>
the first problems seem to crop up just about 2.5 ns
<azonenberg>
hmm, nvm. i was off by a few OOMs
<azonenberg>
i was thinking 2^31 fs
<azonenberg>
but that's 2147 ns
<azonenberg>
But, 2^23 fs is only 8 ns
<azonenberg>
(23 bit mantissa in ieee754 fp32)
<azonenberg>
which is the right order of magntiude for where we see problems
<azonenberg>
if we do a bunch of multiplies and divides i could totally see rounding errors becoming >1 pixel at that scale
<lain>
hmm
<lain>
azonenberg: likely unrelated, but I'm looking at the localSize calculation at the start of WaveformArea::RenderTrace, it says it must match COLS_PER_BLOCK in waveform-compute shader, but there is no COLS_PER_BLOCK in there, so I'm guessing this code is obsolete
<lain>
looks like it's just calculating numGroups, which becomes the number of x invocations for the shader
<azonenberg>
let me look, h/o
<azonenberg>
that is for the *y* axis size iirc
<azonenberg>
or wait
<azonenberg>
oh ok. yeah
<azonenberg>
it's mostly obsolete. we used to support two or more columns of pixels in one shader thread
<azonenberg>
one shader group*
<azonenberg>
also, here's an interesting finding
<lain>
ah ok
<azonenberg>
I set indexBuffer to {0}
<azonenberg>
i.e. every thread starts rendering from x=0
<azonenberg>
this is of course stupidly inefficient as we spend a ton of time transforming samples we'll never see
<azonenberg>
everything renders perfectly, then it stops right where it would have gone bad
<lain>
isn't indexBuffer set in PrepareGeometry?
<azonenberg>
Yes
<lain>
oh I see you're syaing you just set it to {0} as a test
<azonenberg>
i patched out the BinarySearchForGequal call to return constant zero
<lain>
fascinating o.o
* lain
thinks
<azonenberg>
i would have expected it to draw the entire waveform, just very slowly
<lain>
indeed
<azonenberg>
fwiw, there is apparently the ability to printf or log messages from shader code in vulkan using the validation layers
<lain>
that smells like a shader issue then
<azonenberg>
i havent used it
<azonenberg>
but it may be worth trying
<azonenberg>
it seems that this is mutually exclusive with gpu assisted bounds checking etc
<azonenberg>
but we dont seem to be going OOB here
<azonenberg>
as none of those checks are firing
<lain>
azonenberg: does your gpu have int64 support? wondering if I can ignore !HAS_INT64 issues
<lain>
iirc my M1 GPU does not have int64, so it's using the more annoying path
<azonenberg>
yes
<lain>
kk
<azonenberg>
your nvidia should too i think
<lain>
seems likely
<lain>
I'm testing on the M1 currently since it doesn't seem to make a difference which machine I test on, and the nvidia machine is back in WA and VNC is slow :P
<d1b2>
<Mughees> Is Virtual machine supported for glscope?
<azonenberg>
mughees: yes and no
<azonenberg>
if you just throw together a VM, it probably won't work well/fast
<azonenberg>
GPU acceleration in VMs tends to not be great
<azonenberg>
with pcie passthrough of a dedicated GPU, it should work fine
<azonenberg>
software emulation using swiftshader may be an option but we have not tested
<d1b2>
<Mughees> alright...actually my hard disk gone bad..so had to switch to a windows laptop
<azonenberg>
software emulation using llvmpipe will work out of the box eventually (although slow) but last time we tried we had some issues with it
<d1b2>
<david.rysk> virgl may work
<d1b2>
<david.rysk> If your host supports it
<d1b2>
<Mughees> it does kind a work..but crashes while genrating a sinewave
<azonenberg>
glscopeclient does build/run on windows also, although the build is a bit of a mess right now
<d1b2>
<david.rysk> IIRC it worked last time I tried, was just not so fast
<d1b2>
<Mughees> In windows, are we able to edit code within MSYS2 environemnt?
<azonenberg>
We're working on improving that. but the best path forward involves leaving GTK and moving entirely to ngscopeclient. which is several months from being at feature parity with glscopeclient
<azonenberg>
msys2/mingw files are just normal files on your windows disk
<azonenberg>
that you can edit with your text editor of choice
<d1b2>
<Mughees> ok
<azonenberg>
so you can edit the code there then compile in msys2
<d1b2>
<Mughees> I think it is better to oreder a new hard 🙂
<azonenberg>
We want to transition away from msys2 and just generate visual studio projects with cmake
<azonenberg>
but while doing that with GTK is not impossible, it's a huge pain
<azonenberg>
One of about a dozen reasons we're leaving GTK
<lain>
ok there is definitely something weird in the shader
<lain>
still debugging, but I'm seeing some nonsense values from the push constants
<lain>
so far I see no evidence of numerical instability, I'm beginning to wonder if this is a logic error
<azonenberg>
very possible
<azonenberg>
All i can say with certainty at this point is it's a bug :p
<azonenberg>
try printing out the actual digital samples you process around x=500 or wherever the big gap is in the CDR trace
<azonenberg>
look at start/end, see if you are taking an early-out too soon or being passed incorrect index data
<bvernoux>
nice latest ngscopeclient rendering in demo is even faster
<bvernoux>
I have about 58fps in full screen
<bvernoux>
before it was slower with framerate between 48fps to 56gps
<bvernoux>
nice there is the cursor too
<bvernoux>
which work fine
<bvernoux>
and the menu is instant even if so far there is only that option it is day and night vs glscopeclient right menu click
<azonenberg>
i mean there isnt much there. glscopeclient does a lot more work to set up the menu
<azonenberg>
but we will definitely be caching heavily
<lain>
azonenberg: time ticks are, what, femtoseconds?
<azonenberg>
in a normal time domain waveform, yes
<azonenberg>
in frequency domain, Hz
<azonenberg>
although we will likely shift to mHz or uHz to provide better resolution in the future
<lain>
hrm
<lain>
not seeing anything weird with the voltage values, interesting...
* lain
tests some ground truths
<azonenberg>
are you looking at the cdr pll or what?
<azonenberg>
the pll output is digital so less things to go wrong i think
<azonenberg>
probably easier to test there
<lain>
hm ok
<lain>
I've been looking at the TIE measurement
<lain>
oooh ok
<lain>
getting somewhere now...
<azonenberg>
oh?
<azonenberg>
are you seeing some pixels that dont render any samples?
<azonenberg>
because i'm very sure that's what is happening
<azonenberg>
i just dont know *why*
<lain>
yes
<lain>
I replaced voltage[i] with a fixed value in the calculations for left and right in the main loop
<lain>
I would expect to see a straight line
<azonenberg>
And you don't?
<lain>
it is a straight line but it has the missing pixels
<azonenberg>
try printf'ing the total number of samples drawn in each column of pixels
<lain>
I think you're right that it's an issue in the x calculation somewhere
<azonenberg>
i bet you'll see some threads that do nothing
<lain>
ahm
<lain>
azonenberg: wait how would you calculate that in the shader?
<lain>
looks like g_updating[gl_LocalInvocationID.y] gets set true if that pixel was updated, else false
<azonenberg>
Yeah
<lain>
ok yeah
<azonenberg>
so waht you'd do is add a bit of logic in thread y=0 of the block
<azonenberg>
that sums g_updating each pass through the main loop
<azonenberg>
into a local
<azonenberg>
records a total count of how many pixels that thread wrote to
<azonenberg>
then after the end of the loop, printf it
<azonenberg>
(only in y=0)
<azonenberg>
since you dont care which *threads* handled the samples
<azonenberg>
you just care about the whole block drawing zilch
massi has quit [Remote host closed the connection]
<lain>
ok I'm forgetting how this works, would I use the global or local invocation's y to check for == 0 ?
<lain>
or I can just look it up in a sec
<lain>
but brb, dr appt for a bit
<azonenberg>
local
<azonenberg>
local is thread index within the block
<lain>
ok back
<lain>
azonenberg: ok so I added a 'shared uint total_updated;', and then at the top of main(), if local y=0 I set total_updated = 0;...
<lain>
near the end of the main loop there's an if(g_updating[y]) { ... }, and I increment total_updated in there, and then after the main loop, if this is y==0, I print the value
<lain>
does that sounds right?
<lain>
and indeed I'm getting plenty of zeroes in the output
<lain>
ok, so why aren't some threads doing anything... hm hmmmmmm!
<azonenberg>
yeah that sounds about right. you might have some race conditions if you dont use atomic shareds or something
<azonenberg>
but for this purpose it should be good enough
<azonenberg>
actually total_updated doesnt even have to be shared
<azonenberg>
you can just only write it in y=0
<lain>
oh, true
<azonenberg>
just make sure you check if g_updating[] i true for any Y
<azonenberg>
anyway, point is you're getting lots of threads doing nothing, just as i suspected
<lain>
wouldn't this set g_done=true right away though?
<azonenberg>
yeah i'm thinking
<azonenberg>
ok my mistake
<azonenberg>
I'm not sure what that check is for
<azonenberg>
oh
<azonenberg>
this is to skip drawing samples before the 0th
<lain>
shouldn't it be < rather than <= ?
<azonenberg>
We currently fill index=0 from the left pixel in the plot up to the first sample that actually contains waveform data
<azonenberg>
it would probably be better to fill with a negative value and change to -
<azonenberg>
change to < 0
<azonenberg>
see WaveformArea::BinarySearchForGequal()
<azonenberg>
if we're searching for a value before the start of the waveform it returns 0
<azonenberg>
(feel free to comment the renderer more aggressively, this is some of the heavily optimized arcane black magic that makes glscopeclient tick lol)
<_whitenotifier-7>
[scopehal-apps] lain 02b3c7f - Fix regression for sparse waveform rendering.
<azonenberg>
lain: fix confirmed
<azonenberg>
thanks. back to macos stuff now? you said your branch is ready to check out and merge?
<lain>
yeah I'm just merging master again right now as there's some changes I need to fix up
<lain>
hrm
<lain>
azonenberg: ok so TextureManager and Texture need access to a queue and cmdbuf under the new queue manager model. iirc the existing code just uses globals. is it safe to have them use the global transfer queue, or does it *need* to be the same render queue used by the main window?
<azonenberg>
The global transfer queue needs to go away
<azonenberg>
Because we need to support operating on intel iGPUs that have only one queue
<lain>
alternatively I can have TextureManager grab a queue of its own, make a command pool, and a command buffer... but then locking access to the cmdbuf is mildly annoying
<azonenberg>
ok let me rephrase
<azonenberg>
we cannot dedicate a queue to transfer permanently
<azonenberg>
so it will have to be a QueueHandle
<lain>
yes
<azonenberg>
Using the global transfer queue should be fine
<lain>
so in my branch, g_vkTransferQueue is a QueueHandle
<azonenberg>
Where it gets challenging is the sharing aspect
<lain>
yeah, avoiding deadlocks
<azonenberg>
where you have to declare it as being shareable to any queue it might be rendered on
<lain>
ohh
<lain>
that sharing aspect
<azonenberg>
yeah
<lain>
hm yeah...
<azonenberg>
So you need to have some way for QueueManager to give you a list of any queue that could possibly be used for rendering, and any queue that could possibly be g_vkTransferQueue
<azonenberg>
and share it with all of them
<lain>
bluh :P
<azonenberg>
If we can statically allocate queues on systems with more queues, and eliminate locking, that will be good
<azonenberg>
but we can't give up functionality on lower end hardware outright
<lain>
or I could have MainWindow pass its render QueueHandle to TextureManager when it instantiates it
<azonenberg>
But the render QueueHandle isn't guaranteed to be the same underlying queue object every time you lock it, right?
<lain>
it is yes
<azonenberg>
ah, ok
<azonenberg>
in that case that's likely the simplest solution
<azonenberg>
yay concurrency :D
<lain>
the only annoying part there is I need to expose a mutex from TextureManager in case multiple Texture objects are being created at the same time (and thus all trying to hit TextureManager::m_cmdBuf at the same time)
<lain>
hm
<azonenberg>
You'd need one anyway, no?
<lain>
yeah
<azonenberg>
because TextureManager has the list of textures that isnt thread safe
<lain>
ah, true
<azonenberg>
But i'm not sure if we actually ever allocate textures outside the main thread
<azonenberg>
i think we only ever use textures for tone mapping (in the main thread) and when loading toolbar icons
<lain>
I'll just make a note
<azonenberg>
Yeah i think it's safe to not lock it
<azonenberg>
we do tons of stuff in background threads and on other queues but it's all compute shaders
<lain>
hmm
<lain>
my branch is a lil crashy, I'll want to fix that before we attempt a merge
<lain>
looks like that's my agenda for tomorrow
<azonenberg>
ok. i'm working on gui preferences and font handling in ngscopeclient
<azonenberg>
shouldn't be conflicting
<lain>
I think it's just an order of operations issue with resource release, will confirm tomorrow
<lain>
oo, also some fence issues around command buffers, I can see how that would slip by with how I've implemented queue manager.. well, should be a pretty easy fix