azonenberg changed the topic of #scopehal to: libscopehal, libscopeprotocols, and glscopeclient development and testing | https://github.com/azonenberg/scopehal-apps | Logs: https://libera.irclog.whitequark.org/scopehal
azonenberg1 has joined #scopehal
azonenberg has quit [Ping timeout: 264 seconds]
miek has quit [Ping timeout: 264 seconds]
miek has joined #scopehal
syscall has quit [Ping timeout: 264 seconds]
Kliment has quit [*.net *.split]
Kliment has joined #scopehal
bgamari has joined #scopehal
azonenberg1 is now known as azonenberg
someone--else has quit [Quit: Connection closed]
_whitelogger has joined #scopehal
GenTooMan has joined #scopehal
_whitelogger has joined #scopehal
whitequark has joined #scopehal
_whitenotifier-3 has joined #scopehal
<_whitenotifier-3> [scopehal-apps] kliment forked the repository - https://git.io/JsxUn
<_whitenotifier-3> [scopehal-apps] kliment opened pull request #359: Driver selection dialog polish - https://git.io/JsxUX
<_whitenotifier-3> [scopehal-apps] azonenberg closed pull request #359: Driver selection dialog polish - https://git.io/JsxUX
<_whitenotifier-3> [scopehal-apps] azonenberg pushed 3 commits to master [+0/-0/±5] https://git.io/JsxqR
<_whitenotifier-3> [scopehal-apps] Kliment f8774d2 - Set path to "null" if path is empty with demo/siggen drivers.
<_whitenotifier-3> [scopehal-apps] Kliment e534c85 - Hide siggen driver from UI driver selection dialog and error message
<_whitenotifier-3> [scopehal-apps] azonenberg b4ec0e7 - Merge pull request #359 from kliment/driver-selection-dialog-polish Driver selection dialog polish
<_whitenotifier-3> [scopehal] umarcor opened pull request #486: update logtools - https://git.io/Jsx6B
<_whitenotifier-3> [scopehal] azonenberg pushed 1 commit to master [+0/-0/±1] https://git.io/JsxPD
<_whitenotifier-3> [scopehal] azonenberg daa09a5 - Updated logtools
<_whitenotifier-3> [starshipraider] azonenberg pushed 3 commits to master [+8/-0/±1] https://git.io/Jsp20
<_whitenotifier-3> [starshipraider] azonenberg 2456334 - Initial AKL-AD3 mechanical design
<_whitenotifier-3> [starshipraider] azonenberg 28f6d10 - Updated README
<_whitenotifier-3> [starshipraider] azonenberg ccf5c6c - Initial schematic and layout for AKL-AD3 amplifier board
<azonenberg> Lol
<azonenberg> guess the bods are back
<azonenberg> bots*
<whitequark> azonenberg: replied to you elsewhere
<whitequark> restarted the bots
Degi_ has joined #scopehal
Degi has quit [Ping timeout: 264 seconds]
Degi_ is now known as Degi
<azonenberg> Great thanks
_whitelogger has joined #scopehal
whitequark has left #scopehal [#scopehal]
balrog has quit [Ping timeout: 272 seconds]
balrog has joined #scopehal
_whitelogger has joined #scopehal
xzcvczx has quit [Ping timeout: 264 seconds]
Bird|ghosted has quit [Read error: Connection reset by peer]
Bird|ghosted has joined #scopehal
<azonenberg> Woop
<azonenberg> It got caught in spam but i just found out :p
<azonenberg> the talk i submitted to hardwear.io was accepted
_whitenotifier-3 has quit [*.net *.split]
balrog has quit [*.net *.split]
Kliment has quit [*.net *.split]
Ekho has quit [*.net *.split]
vup has quit [*.net *.split]
lain has quit [*.net *.split]
<azonenberg> I'm going to be giving a talk on using glscopeclient for hardware reversing
<azonenberg> still working out details with the organizers, i'm thinking what might work best is a 30 min presentation followed by 30 mins of hands-on practice session
<azonenberg> i'll grab some demo waveforms and such people can play with
<azonenberg> noopwafel: ^
_whitenotifier-3 has joined #scopehal
balrog has joined #scopehal
vup has joined #scopehal
Kliment has joined #scopehal
lain has joined #scopehal
Ekho has joined #scopehal
balrog has quit [Ping timeout: 264 seconds]
Kliment_ has joined #scopehal
Kliment has quit [Ping timeout: 264 seconds]
Kliment_ is now known as Kliment
balrog has joined #scopehal
someone--else has joined #scopehal
<_whitenotifier-3> [scopehal] umarcor closed pull request #486: update logtools - https://git.io/Jsx6B
<_whitenotifier-3> [scopehal] umarcor synchronize pull request #486: update logtools - https://git.io/Jsx6B
<_whitenotifier-3> [scopehal-apps] umarcor synchronize pull request #355: Use make install on Windows (MSYS2) - https://git.io/JsEdb
<_whitenotifier-3> [scopehal-apps] umarcor synchronize pull request #355: Use make install on Windows (MSYS2) - https://git.io/JsEdb
<_whitenotifier-3> [scopehal-apps] umarcor commented on pull request #355: Use make install on Windows (MSYS2) - https://git.io/JsjwG
asy is now known as Guest4949
gruetze_ has joined #scopehal
gruetze_ is now known as gruetzkopf
someone--else has quit [Quit: Connection closed]
<kc8apf> azonenberg: nice
<GyrosGeier> the "no clock output from SERDES" thing is basically what bit me yesterday
<GyrosGeier> ah
* GyrosGeier should scroll down before replying
<GyrosGeier> sorry for the noise
someone--else has joined #scopehal
<_whitenotifier-3> [scopehal-apps] azonenberg closed pull request #355: Use make install on Windows (MSYS2) - https://git.io/JsEdb
<_whitenotifier-3> [scopehal-apps] azonenberg pushed 4 commits to master [+2/-0/±5] https://git.io/JGej2
<_whitenotifier-3> [scopehal-apps] umarcor c19a69f - update submodules
<_whitenotifier-3> [scopehal-apps] umarcor f6541c5 - ci/msys2: add PKGBUILD recipe for FFTS
<_whitenotifier-3> [scopehal-apps] umarcor 8070750 - ci: use make install
<_whitenotifier-3> [scopehal-apps] azonenberg 07cd746 - Merge pull request #355 from umarcor/cmake-install Use make install on Windows (MSYS2)
<_whitenotifier-3> [scopehal-apps] azonenberg synchronize pull request #358: Windows portable zip/MSI package builds - https://git.io/JsD6B
<someone--else> is ui supposed to slow down with heavy dsp?
<someone--else> I'm testing current master on windows
<someone--else> all of the ui update (menus, dialogs etc.) delay seems to be inversely proportional to the indicated WFM/s rate
<azonenberg> someone--else: Define "heavy dsp"
<azonenberg> processing takes time if that's what you mean
<azonenberg> The "demo" driver is throttled to run at a fixed... i think 20 or 25 WFM/s
<azonenberg> If you get CPU bound it will run slower
<someone--else> now, if I add a FIR filter to a demo 8B10B waveform, WFM/s slows down to ~2 down from ~20 and all of the ui starts responding with a visible ~half of a second lag
<someone--else> sure, calculations take time
<azonenberg> How big of a FIR?
<someone--else> but what I want to understand is: is the ui update rate currently linked to WFM/s rate or it's a problem with my build/machine/os etc.
<azonenberg> Rendering runs in the main thread, so if rendering is slow it will obviously bog everything else down
<someone--else> not sure how to check the FIR size, but it's a narrowband bandpass filter; I think it's big enough to drop WFM/s down to 2 from 20
<azonenberg> Waveform processing is multithreaded but if you have a ton of stuff in one thread it can lead to a bottleneck
<azonenberg> and yeah you're running at 50 Gsps by default in the demo driver
<azonenberg> so if you are doing either very narrowband or a low pass, it's going to max the CPU
<azonenberg> The other factor to consider is that the FIR is one of the OpenCL accelerated filters
<azonenberg> And if your GPU is getting saturated with DSP, obviously that's going to compete with rendering
<azonenberg> So really it depends on *why* WFM/s is low
<azonenberg> If you're slow due to the scope not triggering very often, or network lag, it won't bog down the UI at all because waveform acquisition happens in a background thread
<azonenberg> but the filter pipeline runs in all threads including the main thread since it's initiated by event handlers in the UI
<azonenberg> So yes, if you're saturated with DSP I'd expect the UI to slow down a bit. As soon as you stop acquisition it should go back to normal responsiveness
<someone--else> ok, so I'm seeing the supposed behavior
<someone--else> is there already an easy/known way to move some load away from the main thread? don't have any understanding of GTK internals
<someone--else> (I'd like to improve the ui responsiveness; asking if there's some pre-existing knowledge on how to best approach this)
<azonenberg> I have some ideas, but there's nothing simple
<azonenberg> Basically right now there's an event handler in the main thread that runs when a new waveform is reported ready from the scope thread
<azonenberg> then it runs the filter graph, pushes a bunch of updates to things like protocol analyzers etc, and triggers a render
<azonenberg> Moving that functionality to another thread would definitely help with responsiveness but it's a nontrivial refactoring
<azonenberg> It's not something i'd recommend a new contributor to the project take on
<azonenberg> Typically most of our slow performance in the past has been bottlenecked on slow *scopes* due to scope firmware, network latency, etc
<azonenberg> and that is hidden fairly well at this point
<azonenberg> that, or rendering - although after recent optimizations the rendering shader is >5x faster so that's becoming less of a bottleneck
<azonenberg> I aggressively optimize individual filters using AVX2/AVX512 and OpenCL where possible, although right now the AVX optimizations are disabled on Windows due to compiler/calling convention problems we still have to debug
<someone--else> thanks! I'll look into the related code a bit later
<azonenberg> You can look but I want to do this refactoring. This would be a wide-ranging change to the structure of the whole processing pipeline
<someone--else> though as a general principle I think ui/widgets code should ideally be decoupled from any math so no matter how slow it can get, ui stays snappy
<azonenberg> because some stuff has to run in the main thread due to GTK architecture
<someone--else> sure
<azonenberg> in particular updating the protocol analyzer and history views
<someone--else> is there a "run this function on the main thread at the earliest convenience" concept in GTK? javafx, for example, does have something like this
<azonenberg> Not to my knowledge. But the existing OnWaveformDataReady() handler would work fine. It's just a matter of adding a second thread with another level of queueing in it i think
<azonenberg> so rather than scope directly to UI thread, scope -> processing thread -> UI thread
<azonenberg> The processing thread would then split off more threads using OpenMP to extract filter graph parallelism when possible. Which we do now, but one of those threads is the UI thread
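The scope -> processing thread -> UI thread arrangement described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the glscopeclient implementation: BlockingQueue, Waveform, RunFilterGraph and ProcessingThread are hypothetical names, and the "filter graph" is a stand-in that doubles every sample.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical blocking queue used for both hops (scope -> DSP, DSP -> UI).
template <typename T>
class BlockingQueue
{
public:
    void Push(T item)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            assert(!m_closed);          // pushing after Close() is a logic error
            m_items.push(std::move(item));
        }
        m_cond.notify_one();
    }

    // After Close(), Pop() drains what is left and then returns false.
    void Close()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_closed = true;
        }
        m_cond.notify_all();
    }

    // Blocks until an item is available or the queue is closed and empty.
    bool Pop(T& out)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this] { return m_closed || !m_items.empty(); });
        if(m_items.empty())
            return false;
        out = std::move(m_items.front());
        m_items.pop();
        return true;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    std::queue<T> m_items;
    bool m_closed = false;
};

using Waveform = std::vector<float>;

// Stand-in for the real filter graph: scales every sample by 2.
Waveform RunFilterGraph(Waveform w)
{
    for(auto& s : w)
        s *= 2.0f;
    return w;
}

// Body of the processing thread: take raw waveforms from the scope queue,
// run the filters, and hand finished results to the UI queue.
void ProcessingThread(BlockingQueue<Waveform>& raw, BlockingQueue<Waveform>& ready)
{
    Waveform w;
    while(raw.Pop(w))
        ready.Push(RunFilterGraph(std::move(w)));
    ready.Close();
}
```

In a real application ProcessingThread would run in its own std::thread, and the UI thread would only pop from the ready queue inside its existing waveform-ready handler, so menus and dialogs are never blocked by filter math.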
<azonenberg> I'll see what i can do over lunch
<someone--else> ah, perhaps removing ui thread from the openmp pool could be a low-hanging fruit for this
<azonenberg> No
<azonenberg> The openmp activity is *dispatched from* the UI thread
<azonenberg> and then the UI thread blocks until it returns
<azonenberg> possibly doing some processing in the UI thread
<azonenberg> but the issue is that you're in a UI event handler when that call happens
<azonenberg> that call needs to be initiated from another thread so the UI is free to process other events in the meantime
<azonenberg> Extracting more parallelism in general is definitely something that I'd like to do, because we don't make as much use of multiple cores as I would like for some types of processing
<someone--else> I see
<someone--else> thanks for the explanations, I think I understand the problem better now
<azonenberg> And right now looking in vtune the UI thread is more CPU loaded than anything else in typical scenarios. The other advantage of this refactoring is that it would allow rendering to happen in the UI thread while the DSP thread is already crunching the next waveform
<azonenberg> Right now, DSP for waveform N+1 doesn't start until waveform N finishes rendering
<azonenberg> So we might see an improvement in WFM/s as well as UI responsiveness
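The expected WFM/s gain from overlapping DSP and rendering can be estimated with a little arithmetic. The sketch below assumes an idealized two-stage pipeline with no locking overhead; the function names and any timings fed into it are hypothetical, not measurements from glscopeclient.

```cpp
#include <algorithm>
#include <cassert>

// Today: DSP for waveform N+1 waits for waveform N to finish rendering,
// so each waveform costs the sum of both stages.
double SerialRate(double dspSec, double renderSec)
{
    assert(dspSec > 0.0 && renderSec > 0.0);
    return 1.0 / (dspSec + renderSec);
}

// With the stages overlapped, throughput is limited by the slower stage only.
double PipelinedRate(double dspSec, double renderSec)
{
    assert(dspSec > 0.0 && renderSec > 0.0);
    return 1.0 / std::max(dspSec, renderSec);
}
```

For example, with an assumed 0.4 s of DSP and 0.1 s of rendering per waveform, the serial case works out to 1/0.5 s and the overlapped case to 1/0.4 s. The gain is largest when the two stages take similar time and small when one stage dominates.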
<azonenberg> Oh, and we also will need to plan mutexing and such between the threads
<azonenberg> because sometimes UI events can change the behavior of filters
<azonenberg> What happens if, for example, you click "delete" on a FIR filter trace while the DSP thread is doing math on it?
<azonenberg> Or if you resize an eye pattern in the UI thread while it's integrating samples in the DSP thread?
<azonenberg> It needs to happen, I agree, but it's not a trivial change
<azonenberg> there's lots of ways to screw it up and add race conditions or just not work
<azonenberg> oh, and when rendering we need to make sure that the data we're drawing doesn't change under us
<azonenberg> If you just naively add mutexes for all rendering, you'll probably be almost as slow as what we have now
<azonenberg> We might want to do some refactoring of the rendering path so that it pushes data for as much as possible to the GPU early on
<azonenberg> then releases the DSP thread to update that data while it draws off the data in video RAM
<azonenberg> but this gets complicated when you add in things like cursors, since those need access to the waveform too
<someone--else> yep, sounds complicated
<someone--else> I did a decade of c++ a decade ago, hated every single day of it XD
<azonenberg> This isn't a C++ problem, it's a multithreading problem
<azonenberg> you'd have it in any language
<azonenberg> Any time you try to overlap multiple stages of processing you risk changing data while something else is using it
<someone--else> I think it gets better if you don't share any memory between threads
<azonenberg> Yes, but that means lots of copying which also hurts performance :D
<someone--else> well, this solution has its downsides
<azonenberg> I think a reasonable first pass would be to go hog wild with mutexes and lock anything that even *might* conflict. this should at least enable dialogs and menus and similar event handlers that don't touch waveform data to overlap with the DSP thread
<azonenberg> the next step would be to refactor rendering so that we do everything that involves touching the waveform data in CPU RAM early on
<azonenberg> so protocol decode overlays, cursors, and copying waveforms to GPU RAM
<azonenberg> then release the mutex, and allow processing in parallel with the GPU drawing the frame off its copy of the waveform
<azonenberg> as well as all of the compositing, tone mapping, and stuff
<someone--else> but it makes multithreading at least theoretically tractable, while shared memory flavor of it is definitely not XD
<someone--else> this particular problem involving desktop guis and processing is complicated
<someone--else> i agree
<someone--else> reading waveform data from ui thread while somebody else is updating it might be acceptable in this context
<azonenberg> No, it's not
<azonenberg> you can get horrible artifacts, or use-after-frees in some cases where a buffer is resized etc
<azonenberg> No, there has to be a clean sync point where the DSP thread is done updating and data is copied to GPU memory
<azonenberg> then the DSP thread is released to process the next frame
<azonenberg> The copies should be fast
<azonenberg> it's just a matter of restructuring the code so they all happen in one point rather than all throughout the rendering path
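The "clean sync point" idea can be sketched as below. This is a simplified illustration, assuming a single shared buffer guarded by one mutex; SharedWaveform, UpdateWaveform and SnapshotForRender are hypothetical names, and the copy into a std::vector stands in for the upload to GPU memory.

```cpp
#include <cassert>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical shared state between the DSP thread and the render path.
struct SharedWaveform
{
    std::mutex mutex;
    std::vector<float> samples;
};

// DSP side: publish a finished result while holding the lock.
void UpdateWaveform(SharedWaveform& shared, std::vector<float> fresh)
{
    assert(!fresh.empty());   // this sketch assumes every result has samples
    std::lock_guard<std::mutex> lock(shared.mutex);
    shared.samples = std::move(fresh);
}

// Render side: the one sync point. Copy everything needed for this frame
// under the lock, then return so the DSP thread can start the next waveform
// while drawing proceeds from the private copy.
std::vector<float> SnapshotForRender(SharedWaveform& shared)
{
    std::lock_guard<std::mutex> lock(shared.mutex);
    return shared.samples;
}
```

Because the renderer only ever touches its own snapshot, a later resize or replacement of the shared buffer cannot cause artifacts or use-after-free in the frame being drawn.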
<someone--else> sure, it's not ideal
<someone--else> ..but the gpu does when vsync is off :]
<GyrosGeier> there is also the option of CL-GL buffer sharing
<GyrosGeier> that might be helpful in some instances, but make everything complicated
<azonenberg> GyrosGeier: yeah that's not something we want to do here
<azonenberg> anyway, i have a definite plan
<azonenberg> I'll see what i can do over lunch
<_whitenotifier-3> [scopehal-apps] umarcor reviewed pull request #358 commit - https://git.io/JGv89
<_whitenotifier-3> [scopehal] azonenberg pushed 1 commit to master [+0/-0/±3] https://git.io/JGv4o
<_whitenotifier-3> [scopehal] azonenberg 32c7546 - Fixed some bugs causing incorrect channel IDs to be used during serialization
<_whitenotifier-3> [scopehal-apps] azonenberg pushed 1 commit to master [+0/-0/±2] https://git.io/JGv4i
<_whitenotifier-3> [scopehal-apps] azonenberg a936d29 - Fixed some bugs causing incorrect serialization IDs
<azonenberg> someone--else: also interesting, the "demo" signal generator spends a *significant* amount of its time in std::mersenne_twister_engine
<azonenberg> I might want to consider backing off to a simpler PRNG for adding noise, it's not like i need particular spectral properties
<azonenberg> a simple LFSR or LCG would likely be fine and a lot faster
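A minimal sketch of what a simple LCG noise source might look like, as an illustration rather than the demo driver's actual code. The multiplier and increment are the widely published Numerical Recipes constants; Lcg32 and AddNoise are hypothetical names, and the noise here is uniform rather than Gaussian.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// 32-bit linear congruential generator: state = state * a + c (mod 2^32).
class Lcg32
{
public:
    explicit Lcg32(uint32_t seed) : m_state(seed) {}

    uint32_t Next()
    {
        m_state = m_state * 1664525u + 1013904223u;
        return m_state;
    }

    // Uniform float in [0, 1), built from the top 24 bits
    // (the low bits of an LCG are the weakest).
    float NextUniform()
    {
        return static_cast<float>(Next() >> 8) * (1.0f / 16777216.0f);
    }

private:
    uint32_t m_state;
};

// Adds uniform noise in [-amplitude, +amplitude) to every sample.
void AddNoise(std::vector<float>& samples, float amplitude, Lcg32& rng)
{
    assert(amplitude >= 0.0f);
    for(auto& s : samples)
        s += amplitude * (2.0f * rng.NextUniform() - 1.0f);
}
```

The whole generator is one multiply and one add per sample, which is the appeal over a Mersenne Twister when the spectral quality of the noise does not matter.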
<someone--else> @azonenberg good find; although imo it's plenty fast already as far as WFM/s is concerned
<azonenberg> Yeah but I'm always looking for stuff i can tune
<azonenberg> you can see activity in the main thread up top, and the scope thread below that doing waveform generation
<azonenberg> then most of the OpenMP threads are idle since i only have one filter
<azonenberg> for this demo i'm looking at a single demo waveform and doing a narrow bandpass on it just like you had
<azonenberg> you can see the UI thread is pretty much saturated
<azonenberg> So i'm going to see what i can do to move that out
<someone--else> is the stack trace for the main/ui thread?
<azonenberg> the big list is the list of all functions sorted by decreasing CPU time
<azonenberg> The call stack on the right is the call stack for that one function i have selected, doing the FIR filter
<azonenberg> So you can see ultimately everything was called from inside OscilloscopeWindow::OnTimer()
<azonenberg> I was going to be spending my lunch refactoring that and ended up getting derailed by a) discovering my vtune install was way out of date and updating it
<azonenberg> and b) finding a bug in serializing waveform overlays that i had to patch instead :p
<someone--else> ah, I see
<azonenberg> There's lots of other views too and ways to configure it. Vtune is an amazing tool, not open source but freeware
<someone--else> yep
<someone--else> btw, I remember you asked about cdr plls on twitter a while ago - perhaps an option for a simple bandpass filter as cdr might be useful
<someone--else> that's why I tested the fir filter
<azonenberg> The CDR we have now works fine for data recovery and basic eye patterns
<azonenberg> what's needed to improve it is to implement a PLL with the precise jitter transfer characteristics specified in (to start) the PCIe and Fibre Channel standards
<someone--else> right
<azonenberg> So we can actually get eye patterns suitable for compliance testing
<azonenberg> So it's not a matter of throwing something together, it has to have that exact transfer curve
<azonenberg> and that's the part i don't know how to do
<someone--else> yep, I remember now
<sorear> Have I shilled AES-CTR to you yet? Best statistical properties money can buy and for bulk calls it's one of the faster options as well
<azonenberg> Lol, that's one option
<azonenberg> It looks like the mersenne twister itself is not actually the slow part
<azonenberg> it's std::generate_canonical
<azonenberg> a lot of that and the normal distribution etc
<azonenberg> i bet if i simplified the math it would run a lot faster too
<gruetzkopf> hm, my laptop does around 500MByte/s/thread of AES-128-CTR
<gruetzkopf> (for size 16 blocks, if you go up to 16Ki size blocks it's on the order of 10GBytes/s per thread)
<azonenberg> yeah i think the issue is getting the random ints into floats and getting a normal distribution etc
<azonenberg> to do things more properly i think i'll want to make a vectorized implementation
<azonenberg> that uses a vector optimized floating point rng
<azonenberg> rather than bouncing between int and float constantly
<azonenberg> But it might be possible to use an int RNG actually. vcvtdq2ps converts 16 int32's to float32's with 4 cycle latency and 0.5 CPI on skylake or 1 CPI on icelake
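One common way to turn integer RNG output into floats without a per-sample divide is a bit trick, which is a different technique from the vcvtdq2ps conversion mentioned above. The sketch assumes IEEE 754 single precision floats; BitsToUnitFloat and FillUniform are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

// Keep the top 23 random bits as the mantissa and force the exponent of 1.0,
// giving a float in [1, 2). Subtracting 1 maps that to [0, 1).
inline float BitsToUnitFloat(uint32_t bits)
{
    uint32_t word = 0x3f800000u | (bits >> 9);
    float f;
    std::memcpy(&f, &word, sizeof(f));   // well-defined type pun
    return f - 1.0f;
}

// Converts a block of random integers in one simple, branch-free loop.
void FillUniform(const uint32_t* in, float* out, std::size_t n)
{
    assert(n == 0 || (in != nullptr && out != nullptr));
    for(std::size_t i = 0; i < n; i++)
        out[i] = BitsToUnitFloat(in[i]);
}
```

A loop of this shape contains only integer shifts, an OR and one float subtract per element, so a compiler may be able to vectorize it, though that depends on the compiler and flags.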
<azonenberg> I'll do some reading when i get a chance on fast vector random number generation
<azonenberg> But right now the focus is going to be on trying to move waveform processing to another thread
GenTooMan has quit [Remote host closed the connection]
GenTooMan has joined #scopehal
<_whitenotifier-3> [scopehal-apps] azonenberg pushed 1 commit to master [+1/-0/±4] https://git.io/JGvXy
<_whitenotifier-3> [scopehal-apps] azonenberg 71a7a1b - Refactoring: began preparation for moving filter graph processing into new thread
<azonenberg> My lunch break has gone on long enough :P this is progress though
<azonenberg> the next bit is going to be where it gets messy and i'll be writing code that touches a zillion different event handlers to add sync points
sorear has quit [Ping timeout: 272 seconds]
sorear has joined #scopehal
noopwafel has quit [Ping timeout: 272 seconds]
noopwafel has joined #scopehal
electronic_eel has quit [Read error: Connection reset by peer]
electronic_eel has joined #scopehal
Stary has quit [Ping timeout: 272 seconds]
Stary has joined #scopehal
<GenTooMan> heck of a lunch break
<sorear> I think there's an underappreciated parameter "how good do I want my normals to be"
<sorear> in terms of L1 distance from the true normal dist or similar metrics
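As one illustration of trading accuracy for speed, a cheap approximate normal can be built by summing 12 uniform samples and subtracting 6, which gives mean 0 and variance 1 but clips the tails at +/-6. This is a sketch of that classic approximation, not something proposed in the discussion; XorShift32 uses Marsaglia's 13/17/5 shifts, and both names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Marsaglia xorshift32 generator. The state must never be zero.
struct XorShift32
{
    uint32_t state;

    uint32_t Next()
    {
        assert(state != 0u);
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        return state;
    }

    // Uniform float in [0, 1) from the top 24 bits.
    float NextUniform()
    {
        return static_cast<float>(Next() >> 8) * (1.0f / 16777216.0f);
    }
};

// Approximate N(0, 1) sample: each uniform has mean 1/2 and variance 1/12,
// so the sum of 12 has mean 6 and variance 1.
float ApproxNormal(XorShift32& rng)
{
    float sum = 0.0f;
    for(int i = 0; i < 12; i++)
        sum += rng.NextUniform();
    return sum - 6.0f;
}
```

Whether this is good enough depends on how much the tails matter: for visual noise on a demo waveform it is probably fine, while anything that cares about rare large excursions would need a better method.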
<GenTooMan> I still want to see a normal anything...