#scopehal on 2021-05-25 — irc logs at libera.irclog.whitequark.org

2021-05-22 06:58 azonenberg changed the topic of #scopehal to: libscopehal, libscopeprotocols, and glscopeclient development and testing | https://github.com/azonenberg/scopehal-apps | Logs: https://libera.irclog.whitequark.org/scopehal

00:20 azonenberg1 has joined #scopehal

00:23 azonenberg has quit [Ping timeout: 264 seconds]

00:24 miek has quit [Ping timeout: 264 seconds]

00:24 miek has joined #scopehal

00:33 syscall has quit [Ping timeout: 264 seconds]

00:37 Kliment has quit [*.net *.split]

00:37 Kliment has joined #scopehal

00:42 bgamari has joined #scopehal

00:47 azonenberg1 is now known as azonenberg

01:22 someone--else has quit [Quit: Connection closed]

01:42 _whitelogger has joined #scopehal

01:44 GenTooMan has joined #scopehal

01:45 _whitelogger has joined #scopehal

02:48 whitequark has joined #scopehal

02:48 _whitenotifier-3 has joined #scopehal

02:48 <_whitenotifier-3> [scopehal-apps] kliment forked the repository - https://git.io/JsxUn

02:48 <_whitenotifier-3> [scopehal-apps] kliment opened pull request #359: Driver selection dialog polish - https://git.io/JsxUX

02:48 <_whitenotifier-3> [scopehal-apps] azonenberg closed pull request #359: Driver selection dialog polish - https://git.io/JsxUX

02:48 <_whitenotifier-3> [scopehal-apps] azonenberg pushed 3 commits to master [+0/-0/±5] https://git.io/JsxqR

02:49 <_whitenotifier-3> [scopehal-apps] Kliment f8774d2 - Set path to "null" if path is empty with demo/siggen drivers.

02:49 <_whitenotifier-3> [scopehal-apps] Kliment e534c85 - Hide siggen driver from UI driver selection dialog and error message

02:49 <_whitenotifier-3> [scopehal-apps] azonenberg b4ec0e7 - Merge pull request #359 from kliment/driver-selection-dialog-polish Driver selection dialog polish

02:49 <_whitenotifier-3> [scopehal] umarcor opened pull request #486: update logtools - https://git.io/Jsx6B

02:49 <_whitenotifier-3> [scopehal] azonenberg pushed 1 commit to master [+0/-0/±1] https://git.io/JsxPD

02:49 <_whitenotifier-3> [scopehal] azonenberg daa09a5 - Updated logtools

02:49 <_whitenotifier-3> [starshipraider] azonenberg pushed 3 commits to master [+8/-0/±1] https://git.io/Jsp20

02:49 <_whitenotifier-3> [starshipraider] azonenberg 2456334 - Initial AKL-AD3 mechanical design

02:49 <_whitenotifier-3> [starshipraider] azonenberg 28f6d10 - Updated README

02:49 <_whitenotifier-3> [starshipraider] azonenberg ccf5c6c - Initial schematic and layout for AKL-AD3 amplifier board

03:23 <azonenberg> Lol

03:23 <azonenberg> guess the bods are back

03:23 <azonenberg> bots*

03:25 <whitequark> azonenberg: replied to you elsewhere

03:25 <whitequark> restarted the bots

03:29 Degi_ has joined #scopehal

03:29 Degi has quit [Ping timeout: 264 seconds]

03:29 Degi_ is now known as Degi

03:29 <azonenberg> Great thanks

03:42 _whitelogger has joined #scopehal

03:47 whitequark has left #scopehal [#scopehal]

04:19 balrog has quit [Ping timeout: 272 seconds]

04:26 balrog has joined #scopehal

06:42 _whitelogger has joined #scopehal

07:36 xzcvczx has quit [Ping timeout: 264 seconds]

08:10 Bird|ghosted has quit [Read error: Connection reset by peer]

08:10 Bird|ghosted has joined #scopehal

08:23 <azonenberg> Woop

08:24 <azonenberg> It got caught in spam but i just found out :p

08:24 <azonenberg> the talk i submitted to hardwear.io was accepted

08:24 _whitenotifier-3 has quit [*.net *.split]

08:24 balrog has quit [*.net *.split]

08:24 Kliment has quit [*.net *.split]

08:24 Ekho has quit [*.net *.split]

08:24 vup has quit [*.net *.split]

08:24 lain has quit [*.net *.split]

08:25 <azonenberg> I'm going to be giving a talk on using glscopeclient for hardware reversing

08:25 <azonenberg> still working out details with the organizers, i'm thinking what might work best is a 30 min presentation followed by 30 mins of hands-on practice session

08:26 <azonenberg> i'll grab some demo waveforms and such people can play with

08:26 <azonenberg> noopwafel: ^

08:27 _whitenotifier-3 has joined #scopehal

08:27 balrog has joined #scopehal

08:27 vup has joined #scopehal

08:27 Kliment has joined #scopehal

08:27 lain has joined #scopehal

08:27 Ekho has joined #scopehal

08:31 balrog has quit [Ping timeout: 264 seconds]

08:33 Kliment_ has joined #scopehal

08:34 Kliment has quit [Ping timeout: 264 seconds]

08:34 Kliment_ is now known as Kliment

08:35 balrog has joined #scopehal

09:41 someone--else has joined #scopehal

10:10 <_whitenotifier-3> [scopehal] umarcor closed pull request #486: update logtools - https://git.io/Jsx6B

10:10 <_whitenotifier-3> [scopehal] umarcor synchronize pull request #486: update logtools - https://git.io/Jsx6B

10:11 <_whitenotifier-3> [scopehal-apps] umarcor synchronize pull request #355: Use make install on Windows (MSYS2) - https://git.io/JsEdb

10:43 <_whitenotifier-3> [scopehal-apps] umarcor synchronize pull request #355: Use make install on Windows (MSYS2) - https://git.io/JsEdb

11:05 <_whitenotifier-3> [scopehal-apps] umarcor commented on pull request #355: Use make install on Windows (MSYS2) - https://git.io/JsjwG

12:04 asy is now known as Guest4949

12:18 gruetze_ has joined #scopehal

12:18 gruetze_ is now known as gruetzkopf

13:43 someone--else has quit [Quit: Connection closed]

16:01 <kc8apf> azonenberg: nice

16:04 <GyrosGeier> the "no clock output from SERDES" thing is basically what bit me yesterday

16:04 <GyrosGeier> ah

16:04 * GyrosGeier should scroll down before replying

16:04 <GyrosGeier> sorry for the noise

16:27 someone--else has joined #scopehal

17:10 <_whitenotifier-3> [scopehal-apps] azonenberg closed pull request #355: Use make install on Windows (MSYS2) - https://git.io/JsEdb

17:10 <_whitenotifier-3> [scopehal-apps] azonenberg pushed 4 commits to master [+2/-0/±5] https://git.io/JGej2

17:10 <_whitenotifier-3> [scopehal-apps] umarcor c19a69f - update submodules

17:10 <_whitenotifier-3> [scopehal-apps] umarcor f6541c5 - ci/msys2: add PKGBUILD recipe for FFTS

17:10 <_whitenotifier-3> [scopehal-apps] umarcor 8070750 - ci: use make install

17:10 <_whitenotifier-3> [scopehal-apps] azonenberg 07cd746 - Merge pull request #355 from umarcor/cmake-install Use make install on Windows (MSYS2)

17:10 <_whitenotifier-3> [scopehal-apps] azonenberg synchronize pull request #358: Windows portable zip/MSI package builds - https://git.io/JsD6B

17:15 <someone--else> is ui supposed to slow down with heavy dsp?

17:16 <someone--else> I'm testing current master on windows

17:16 <someone--else> all of the ui update (menus, dialogs etc.) delay seems to be inversely proportional to the indicated WFM/s rate

17:17 <azonenberg> someone--else: Define "heavy dsp"

17:17 <azonenberg> processing takes time if that's what you mean

17:17 <azonenberg> The "demo" driver is throttled to run at a fixed... i think 20 or 25 WFM/s

17:17 <azonenberg> If you get CPU bound it will run slower

17:18 <someone--else> now, if I add a FIR filter to a demo 8B10B waveform, WFM/s slows down to ~2 down from ~20 and all of the ui starts responding with a visible ~half of a second lag

17:18 <someone--else> sure, calculations take time

17:18 <azonenberg> How big of a FIR?

17:19 <someone--else> but what I want to understand is: is the ui update rate currently linked to WFM/s rate or it's a problem with my build/machine/os etc.

17:20 <azonenberg> Rendering runs in the main thread, so if rendering is slow it will obviously bog everything else down

17:20 <someone--else> not sure how to check the FIR size, but it's a narrowband bandpass filter; I think it's big enough to drop WFM/s down to 2 from 20

17:20 <azonenberg> Waveform processing is multithreaded but if you have a ton of stuff in one thread it can lead to a bottleneck

17:20 <azonenberg> and yeah you're running at 50 Gsps by default in the demo driver

17:20 <azonenberg> so if you are doing either very narrowband or a low pass, it's going to max the CPU

17:21 <azonenberg> The other factor to consider is that the FIR is one of the OpenCL accelerated filters

17:21 <azonenberg> And if your GPU is getting saturated with DSP, obviously that's going to compete with rendering

17:21 <azonenberg> So really it depends on *why* WFM/s is low

17:22 <azonenberg> If you're slow due to the scope not triggering very often, or network lag, it won't bog down the UI at all because waveform acquisition happens in a background thread

17:22 <azonenberg> but the filter pipeline runs in all threads including the main thread since it's initiated by event handlers in the UI

17:22 <azonenberg> So yes, if you're saturated with DSP I'd expect the UI to slow down a bit. As soon as you stop acquisition it should go back to normal responsiveness

17:23 <someone--else> ok, so I'm seeing the supposed behavior

17:25 <someone--else> is there already an easy/known way to move some load away from the main thread? don't have any understanding of GTK internals

17:27 <someone--else> (I'd like to improve the ui responsiveness; asking if there's some pre-existing knowledge on how to best approach this)

17:29 <azonenberg> I have some ideas, but there's nothing simple

17:29 <azonenberg> Basically right now there's an event handler in the main thread that runs when a new waveform is reported ready from the scope thread

17:29 <azonenberg> then it runs the filter graph, pushes a bunch of updates to things like protocol analyzers etc, and triggers a render

17:29 <azonenberg> Moving that functionality to another thread would definitely help with responsiveness but it's a nontrivial refactoring

17:30 <azonenberg> It's not something i'd recommend a new contributor to the project take on

17:31 <azonenberg> Typically most of our slow performance in the past has been bottlenecked on slow *scopes* due to scope firmware, network latency, etc

17:31 <azonenberg> and that is hidden fairly well at this point

17:31 <azonenberg> that, or rendering - although after recent optimizations the rendering shader is >5x faster so that's becoming less of a bottleneck

17:32 <azonenberg> I aggressively optimize individual filters using AVX2/AVX512 and OpenCL where possible, although right now the AVX optimizations are disabled on Windows due to compiler/calling convention problems we still have to debug

17:33 <someone--else> thanks! I'll look into the related code a bit later

17:33 <azonenberg> You can look but I want to do this refactoring. This would be a wide-ranging change to structure of the whole processing pipeline

17:33 <someone--else> though as a general principle I think ui/widgets code should ideally be decoupled from any math so no matter how slow it can get, ui stays snappy

17:33 <azonenberg> because some stuff has to run in the main thread due to GTK architecture

17:34 <someone--else> sure

17:34 <azonenberg> in particular updating the protocol analyzer and history views

17:35 <someone--else> is there "run this function on the main thread at an earliest convenience" concept in GTK? javafx, for example, does

17:35 <someone--else> have something like this

17:36 <azonenberg> Not to my knowledge. But the existing OnWaveformDataReady() handler would work fine. It's just a matter of adding a second thread with another level of queueing in it i think

17:37 <azonenberg> so rather than scope directly to UI thread, scope -> processing thread -> UI thread

17:37 <azonenberg> The processing thread would then split off more threads using OpenMP to extract filter graph parallelism when possible. Which we do now, but one of those threads is the UI thread
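
The scope -> processing thread -> UI thread arrangement described above can be sketched roughly as below. This is a minimal illustration and not glscopeclient code: BlockingQueue, Waveform, RunFilterGraph and RunPipeline are hypothetical names, the "filter graph" is a trivial gain stage, and the calling thread stands in for the UI thread. It assumes a C++17 compiler (for std::optional).

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical blocking FIFO used to hand waveforms between threads
template <typename T>
class BlockingQueue
{
public:
    void Push(T item)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_items.push(std::move(item));
        }
        m_cond.notify_one();
    }

    // Signals end of stream so Pop() stops blocking once the queue drains
    void Close()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_closed = true;
        }
        m_cond.notify_all();
    }

    // Blocks until an item arrives; returns nullopt once closed and empty
    std::optional<T> Pop()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this] { return !m_items.empty() || m_closed; });
        if(m_items.empty())
            return std::nullopt;
        T item = std::move(m_items.front());
        m_items.pop();
        return item;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    std::queue<T> m_items;
    bool m_closed = false;
};

using Waveform = std::vector<float>;

// Stand-in for the real filter graph: a trivial 2x gain stage
Waveform RunFilterGraph(const Waveform& in)
{
    Waveform out(in.size());
    for(std::size_t i = 0; i < in.size(); i++)
        out[i] = in[i] * 2.0f;
    return out;
}

// Runs scope -> processing -> "UI" and returns what the UI stage received
std::vector<Waveform> RunPipeline(std::size_t numWaveforms)
{
    BlockingQueue<Waveform> rawQueue;
    BlockingQueue<Waveform> processedQueue;

    // Acquisition stage: produces raw waveforms
    std::thread scopeThread([&] {
        for(std::size_t i = 0; i < numWaveforms; i++)
            rawQueue.Push(Waveform(4, static_cast<float>(i)));
        rawQueue.Close();
    });

    // Processing stage: the heavy DSP happens here, off the UI thread
    std::thread processingThread([&] {
        while(auto wfm = rawQueue.Pop())
            processedQueue.Push(RunFilterGraph(*wfm));
        processedQueue.Close();
    });

    // The calling thread plays the UI role and only consumes finished results
    std::vector<Waveform> rendered;
    while(auto wfm = processedQueue.Pop())
        rendered.push_back(std::move(*wfm));

    scopeThread.join();
    processingThread.join();
    return rendered;
}
```

Calling RunPipeline(3) pushes three waveforms through both queues. In a real GUI the last stage would be driven by the toolkit's event loop rather than a blocking loop, which is the part the discussion identifies as hard.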

17:37 <azonenberg> I'll see what i can do over lunch

17:39 <someone--else> ah, perhaps removing ui thread from the openmp pool could be a low-hanging fruit for this

17:39 <azonenberg> No

17:39 <azonenberg> The openmp activity is *dispatched from* the UI thread

17:39 <azonenberg> and then the UI thread blocks until it returns

17:39 <azonenberg> possibly doing some processing in the UI thread

17:40 <azonenberg> but the issue is that you're in a UI event handler when that call happens

17:40 <azonenberg> that call needs to be initiated from another thread so the UI is free to process other events in the meantime

17:40 <azonenberg> Extracting more parallelism in general is definitely something that I'd like to do, because we don't make as much use of multiple cores as I would like for some types of processing

17:41 <someone--else> I see

17:41 <someone--else> thanks for the explanations, I think I understand the problem better now

17:42 <azonenberg> And right now looking in vtune the UI thread is more CPU loaded than anything else in typical scenarios. The other advantage of this refactoring is that it would allow rendering to happen in the UI thread while the DSP thread is already crunching the next waveform

17:42 <azonenberg> Right now, DSP for waveform N+1 doesn't start until waveform N finishes rendering

17:43 <azonenberg> So we might see an improvement in WFM/s as well as UI responsiveness

17:45 <azonenberg> Oh, and we also will need to plan mutexing and such between the threads

17:45 <azonenberg> because sometimes UI events can change the behavior of filters

17:45 <azonenberg> What happens if, for example, you click "delete" on a FIR filter trace while the DSP thread is doing math on it?

17:45 <azonenberg> Or if you resize an eye pattern in the UI thread while it's integrating samples in the DSP thread?

17:46 <azonenberg> It needs to happen, I agree, but it's not a trivial change

17:46 <azonenberg> there's lots of ways to screw it up and add race conditions or just not work

17:46 <azonenberg> oh, and when rendering we need to make sure that the data we're drawing doesn't change under us

17:47 <azonenberg> If you just naively add mutexes for all rendering, you'll probably be almost as slow as what we have now

17:47 <azonenberg> We might want to do some refactoring of the rendering path so that it pushes data for as much as possible to the GPU early on

17:48 <azonenberg> then releases the DSP thread to update that data while it draws off the data in video RAM

17:48 <azonenberg> but this gets complicated when you add in things like cursors, since those need access to the waveform too

17:48 <someone--else> yep, sounds complicated

17:49 <someone--else> I did a decade of c++ a decade ago, hated every single day of it XD

17:49 <azonenberg> This isn't a C++ problem, it's a multithreading problem

17:49 <azonenberg> you'd have it in any language

17:49 <azonenberg> Any time you try to overlap multiple stages of processing you risk changing data while something else is using it

17:50 <someone--else> I think it gets better if you don't share any memory between threads

17:50 <azonenberg> Yes, but that means lots of copying which also hurts performance :D

17:50 <someone--else> well, this solution has its downsides

17:51 <azonenberg> I think a reasonable first pass would be to go hog wild with mutexes and lock anything that even *might* conflict. this should at least enable dialogs and menus and similar event handlers that don't touch waveform data to overlap with the DSP thread

17:51 <azonenberg> the next step would be to refactor rendering so that we do everything that involves touching the waveform data in CPU RAM early on

17:51 <azonenberg> so protocol decode overlays, cursors, and copying waveforms to GPU RAM

17:51 <azonenberg> then release the mutex, and allow processing in parallel with the GPU drawing the frame off its copy of the waveform

17:52 <azonenberg> as well as all of the compositing, tone mapping, and stuff

17:52 <someone--else> but it makes multithreading at least theoretically tractable, while shared memory flavor of it is definitely not XD

17:53 <someone--else> this particular problem involving desktop guis and processing is complicated

17:53 <someone--else> i agree

17:55 <someone--else> reading waveform data from ui thread while somebody else is updating it might be acceptable in this context

17:55 <azonenberg> No, it's not

17:55 <azonenberg> you can get horrible artifacts, or use-after-frees in some cases where a buffer is resized etc

17:56 <azonenberg> No, there has to be a clean sync point where the DSP thread is done updating and data is copied to GPU memory

17:56 <azonenberg> then the DSP thread is released to process the next frame

17:56 <azonenberg> The copies should be fast

17:56 <azonenberg> it's just a matter of restructuring the code so they all happen in one point rather than all throughout the rendering path
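
The "one clean sync point" idea can be sketched as follows: the render side copies the waveform once while holding the lock, then works only on its private copy so the DSP side is free to continue. All names here (SharedWaveform, UpdateWaveform, SnapshotForRender, PeakOfSnapshot) are hypothetical, and a plain std::vector copy stands in for the upload to GPU memory.

```cpp
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical state shared between the DSP thread and the render thread
struct SharedWaveform
{
    std::mutex mutex;
    std::vector<float> samples;
};

// DSP side: replace the waveform contents while holding the lock
void UpdateWaveform(SharedWaveform& shared, std::vector<float> next)
{
    std::lock_guard<std::mutex> lock(shared.mutex);
    shared.samples = std::move(next);
}

// Render side: the single sync point. The copy is made while the lock is
// held; the lock is released when the function returns.
std::vector<float> SnapshotForRender(SharedWaveform& shared)
{
    std::lock_guard<std::mutex> lock(shared.mutex);
    return shared.samples;
}

// Stand-in for drawing: touches only the private snapshot, never shared data
float PeakOfSnapshot(const std::vector<float>& snapshot)
{
    float peak = 0.0f;
    for(float s : snapshot)
    {
        const float mag = (s < 0.0f) ? -s : s;
        if(mag > peak)
            peak = mag;
    }
    return peak;
}
```

Because the snapshot is independent of the shared buffer, a later UpdateWaveform() (including one that resizes the buffer) cannot invalidate what is being drawn. The cost is one copy per frame, which is the trade-off raised in the conversation.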

17:57 <someone--else> sure, it's not ideal

17:59 <someone--else> ..but the gpu does when vsync is off :]

17:59 <GyrosGeier> there is also the option of CL-GL buffer sharing

18:00 <GyrosGeier> that might be helpful in some instances, but make everything complicated

18:02 <azonenberg> GyrosGeier: yeah thats not something we want to do here

18:03 <azonenberg> anyway, i have a definite plan

18:03 <azonenberg> I'll see what i can do over lunch

19:05 <_whitenotifier-3> [scopehal-apps] umarcor reviewed pull request #358 commit - https://git.io/JGv89

19:10 <_whitenotifier-3> [scopehal] azonenberg pushed 1 commit to master [+0/-0/±3] https://git.io/JGv4o

19:10 <_whitenotifier-3> [scopehal] azonenberg 32c7546 - Fixed some bugs causing incorrect channel IDs to be used during serialization

19:10 <_whitenotifier-3> [scopehal-apps] azonenberg pushed 1 commit to master [+0/-0/±2] https://git.io/JGv4i

19:10 <_whitenotifier-3> [scopehal-apps] azonenberg a936d29 - Fixed some bugs causing incorrect serialization IDs

19:30 <azonenberg> someone--else: also interesting, the "demo" signal generator spends a *significant* amount of its time in std::mersenne_twister_engine

19:30 <azonenberg> I might want to consider backing off to a simpler PRNG for adding noise, it's not like i need particular spectral properties

19:31 <azonenberg> a simple LFSR or LCG would likely be fine and a lot faster
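
For illustration, a simple LCG noise source might look like the sketch below. FastNoise is a hypothetical class, not the demo driver's code; the multiplier and increment are the widely used 32-bit "Numerical Recipes" LCG constants, and the output is uniform rather than Gaussian.

```cpp
#include <cstdint>

// Hypothetical lightweight noise source based on a 32-bit LCG
class FastNoise
{
public:
    explicit FastNoise(uint32_t seed) : m_state(seed) {}

    // One LCG step; unsigned arithmetic wraps modulo 2^32
    uint32_t NextU32()
    {
        m_state = m_state * 1664525u + 1013904223u;
        return m_state;
    }

    // Uniform float in [0, 1): keep the top 24 bits, which a float
    // mantissa can represent exactly, and scale by 2^-24
    float NextUniform()
    {
        return static_cast<float>(NextU32() >> 8) * (1.0f / 16777216.0f);
    }

    // Zero-mean uniform noise in [-amplitude, +amplitude)
    float NextNoise(float amplitude)
    {
        return (2.0f * NextUniform() - 1.0f) * amplitude;
    }

private:
    uint32_t m_state;
};
```

The low-order bits of an LCG are weak, which is why only the upper bits are used for the float conversion.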

19:33 <someone--else> @azonenberg good find; although imo it's plenty fast already as far as WFM/s is concerned

19:33 <azonenberg> Yeah but I'm always looking for stuff i can tune

19:34 <azonenberg> https://www.antikernel.net/temp/demo-vtune.png

19:34 <azonenberg> you can see activity in the main thread up top, and the scope thread below that doing waveform generation

19:34 <azonenberg> then most of the OpenMP threads are idle since i only have one filter

19:35 <azonenberg> for this demo i'm looking at a single demo waveform and doing a narrow bandpass on it just like you had

19:35 <azonenberg> you can see the UI thread is pretty much saturated

19:35 <azonenberg> So i'm going to see what i can do to move that out

19:36 <someone--else> is the stack trace for the main/ui thread?

19:37 <azonenberg> the big list is the list of all functions sorted by decreasing CPU time

19:38 <azonenberg> The call stack on the right is the call stack for that one function i have selected, doing the FIR filter

19:38 <azonenberg> So you can see ultimately everything was called from inside OscilloscopeWindow::OnTimer()

19:39 <azonenberg> I was going to be spending my lunch refactoring that and ended up getting derailed by a) discovering my vtune install was way out of date and updating it

19:39 <azonenberg> and b) finding a bug in serializing waveform overlays that i had to patch instead :p

19:39 <someone--else> ah, I see

19:40 <azonenberg> There's lots of other views too and ways to configure it. Vtune is an amazing tool, not open source but freeware

19:41 <someone--else> yep

19:41 <someone--else> btw, I remember you asked about cdr plls on twitter a while ago - perhaps an option for a simple bandpass filter as cdr might be useful

19:42 <someone--else> that's why I tested the fir filter

19:42 <azonenberg> The CDR we have now works fine for data recovery and basic eye patterns

19:42 <azonenberg> what's needed to improve is to implement a PLL with the precise jitter transfer characteristics specified in (to start) the pcie and fibre channel standards

19:43 <someone--else> right

19:43 <azonenberg> So we can actually get eye patterns suitable for compliance testing

19:43 <azonenberg> So it's not a matter of throwing something together, it has to have that exact transfer curve

19:43 <azonenberg> and that's the part i don't know how to do

19:44 <someone--else> yep, I remember now

19:58 <sorear> Have I shilled AES-CTR to you yet? Best statistical properties money can buy and for bulk calls it's one of the faster options as well

19:58 <azonenberg> Lol, that's one option

19:59 <azonenberg> It looks like the mersenne twister itself is not actually the slow part

19:59 <azonenberg> it's std::generate_canonical

19:59 <azonenberg> a lot of that and the normal distribution etc

19:59 <azonenberg> i bet if i simplified the math it would run a lot faster too

20:01 <gruetzkopf> hm, my laptop does around 500MByte/s/thread of AES-128-CTR

20:03 <gruetzkopf> (for size 16 blocks, if you go up to 16Ki size blocks it's on the order of 10GBytes/s per thread)

20:04 <azonenberg> yeah i think the issue is getting the random ints into floats and getting a normal distribution etc

20:05 <azonenberg> to do things more properly i think i'll want to make a vectorized implementation

20:05 <azonenberg> that uses a vector optimized floating point rng

20:05 <azonenberg> rather than bouncing between int and float constantly

20:09 <azonenberg> But it might be possible to use an int RNG actually. vcvtdq2ps converts 16 int32's to float32's with 4 cycle latency and 0.5 CPI on skylake or 1 CPI on icelake

20:09 <azonenberg> I'll do some reading when i get a chance on fast vector random number generation

20:10 <azonenberg> But right now the focus is going to be on trying to move waveform processing to another thread

20:53 GenTooMan has quit [Remote host closed the connection]

20:54 GenTooMan has joined #scopehal

21:04 <_whitenotifier-3> [scopehal-apps] azonenberg pushed 1 commit to master [+1/-0/±4] https://git.io/JGvXy

21:04 <_whitenotifier-3> [scopehal-apps] azonenberg 71a7a1b - Refactoring: began preparation for moving filter graph processing into new thread

21:05 <azonenberg> My lunch break has gone on long enough :P this is progress though

21:06 <azonenberg> the next bit is going to be where it gets messy and i'll be writing code that touches a zillion different event handlers to add sync points

21:19 sorear has quit [Ping timeout: 272 seconds]

21:19 sorear has joined #scopehal

21:20 noopwafel has quit [Ping timeout: 272 seconds]

21:20 noopwafel has joined #scopehal

21:20 electronic_eel has quit [Read error: Connection reset by peer]

21:21 electronic_eel has joined #scopehal

21:21 Stary has quit [Ping timeout: 272 seconds]

21:21 Stary has joined #scopehal

22:32 <GenTooMan> heck of a lunch break

22:35 <sorear> I think there's an underappreciated parameter "how good do I want my normals to be"

22:36 <sorear> in terms of L1 distance from the true normal dist or similar metrics
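
One way to picture the quality-versus-speed trade-off is the classic sum-of-uniforms approximation, sketched below. This technique is not proposed anywhere in the conversation; it is swapped in purely as an example of a cheap, deliberately imperfect normal generator. NextUniform and ApproxNormal are hypothetical helper names, and the LCG constants are the common 32-bit "Numerical Recipes" ones.

```cpp
#include <cstdint>

// One 32-bit LCG step, returning a uniform float in [0, 1)
inline float NextUniform(uint32_t& state)
{
    state = state * 1664525u + 1013904223u;
    return static_cast<float>(state >> 8) * (1.0f / 16777216.0f);
}

// Approximate standard normal sample: the sum of 12 independent U(0,1)
// values has mean 6 and variance 12 * (1/12) = 1, so subtracting 6 gives
// roughly N(0,1). The tails are truncated at +/-6 by construction.
inline float ApproxNormal(uint32_t& state)
{
    float sum = 0.0f;
    for(int i = 0; i < 12; i++)
        sum += NextUniform(state);
    return sum - 6.0f;
}
```

Using fewer terms is faster but moves the result further from a true normal distribution, which is exactly the parameter being pointed at.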

22:42 <GenTooMan> I still want to see a normal anything...