azonenberg changed the topic of #scopehal to: libscopehal, libscopeprotocols, and glscopeclient development and testing | https://github.com/glscopeclient/scopehal-apps | Logs: https://libera.irclog.whitequark.org/scopehal
GenTooMan has joined #scopehal
Degi_ has joined #scopehal
Degi has quit [Ping timeout: 256 seconds]
Degi_ is now known as Degi
<_whitenotifier-7> [scopehal-apps] azonenberg opened issue #477: Add preference to force 16-bit mode on 8-bit LeCroy scopes - https://github.com/glscopeclient/scopehal-apps/issues/477
<_whitenotifier-7> [scopehal-apps] azonenberg labeled issue #477: Add preference to force 16-bit mode on 8-bit LeCroy scopes - https://github.com/glscopeclient/scopehal-apps/issues/477
<azonenberg> Thinking again about incorporating some sort of LLC, nonprofit, or other legal entity for the project to oversee things like collecting donations, paying contract developers, buying parts for probe R&D, etc
<azonenberg> How does "Open Test & Measurement Group" sound as a potential name? Anyone have other ideas?
<Johnsel> I like that name
<azonenberg> (I googled it and it appears to not be taken)
wizardyesterday has joined #scopehal
<wizardyesterday> azonenberg, I started looking at your code on github. I finally have time freed up.
<azonenberg> o/ wizardyesterday
<wizardyesterday> Heya.
<wizardyesterday> I wanted to finish the final chapter of my statistical digital signal processing book to get that out of my mind.
<azonenberg> yeah i saw your fb posts about it
<azonenberg> Have you tried building the latest code yet?
<wizardyesterday> I have no way to clone it. I don't even think my raspberry pi3 has new enough encryption stuff to clone the repository. I'll try that tomorrow though.
<azonenberg> ah ok yeah it wont run on a pi3 for sure
<azonenberg> after this new porting work and the rewrite of the renderer, it will run on a pi4
<wizardyesterday> azonenberg, You realize that you can just use the filters that I have on my github.
<azonenberg> but we're a few weeks out still i think for that
<wizardyesterday> Ahh okay
<azonenberg> And i havent even had a chance to look at any of your code yet. i've been too busy with other stuff :)
<wizardyesterday> Oh I know how that goes. :)
<wizardyesterday> Well my stuff is in the Filters directory for both HackRF and rtlsdr
<azonenberg> right now i'm trying to GPU accelerate the logic in the PicoScope driver that turns raw 8/10/12 bit ADC samples (padded to 16 bits) plus gain/offset codes into float32 volts
<wizardyesterday> I have both, floating point and int16 types.
<azonenberg> so i can better keep up with the multi-Gbps firehose of data the picoscope spits out at me lol
<wizardyesterday> Nice!
<azonenberg> (internally pretty much all scopehal math is done in fp32 since that's what AVX, GPUs, etc like - but ADCs usually give fixed point output)
<wizardyesterday> I wouldn't have even bothered to use integer math except for the fact that I do all of my signal processing on a BeagleBone Black board
<azonenberg> To give you a bit of background, we're transitioning away from OpenCL and OpenGL compute shaders and transitioning all of our accelerated math to Vulkan
<azonenberg> Lol, yeah
<wizardyesterday> Okay, I'm listening.
<azonenberg> This project is targeting slightly beefier hardware but probably also higher data rates than your beaglebone can handle
<wizardyesterday> Oh for sure
<azonenberg> one of my benchmark datasets is two complementary channels, 512M points each at 80 Gsps, off a 36 GHz LeCroy Labmaster
<azonenberg> containing a PRBS...9 i think? at 25.78125 Gbps
<wizardyesterday> That's "slightly fast" :)
<azonenberg> yeah but more importantly converted to float32 that's 4GB of waveform data lol
<wizardyesterday> lol
<mxshift> PRBS7 I think
<wizardyesterday> Okay what's PRBS7?
<wizardyesterday> I don't know the acronyms.
<azonenberg> wizardyesterday: Pseudorandom bit sequence
<wizardyesterday> Ahhhhh
<azonenberg> There's a bunch of standard LFSRs with periods of 2^n - 1
<wizardyesterday> Okay that I know.
<wizardyesterday> Cool
<azonenberg> so PRBS-7 is an LFSR with a 127-bit repetition period
<azonenberg> there's standard polynomials the industry has adopted for 7, 9, 15, and 31 bit states
<azonenberg> and 23
<azonenberg> which are commonly used to generate known data for bit error rate testing etc
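The PRBS-7 generator described above can be sketched as a small LFSR. This is a minimal illustration using the common x^7 + x^6 + 1 polynomial (scopehal's own pattern generators may be structured differently):

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Minimal PRBS-7 generator: a 7-bit LFSR with taps at bits 7 and 6
// (polynomial x^7 + x^6 + 1, a common standard choice).
// Each call to NextBit() returns the next bit of the sequence.
struct Prbs7
{
	uint8_t state = 0x7f;	// any nonzero seed works

	int NextBit()
	{
		int newbit = ((state >> 6) ^ (state >> 5)) & 1;
		state = ((state << 1) | newbit) & 0x7f;
		return newbit;
	}
};
```

Because the polynomial is primitive, the state walks through all 127 nonzero values before repeating, which is exactly the 2^7 - 1 repetition period mentioned above.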
<mxshift> 100G ethernet training phase uses PRBS9
<wizardyesterday> Oh cool.
<azonenberg> mxshift: in that case it's likely PRBS9
<wizardyesterday> Oh I didn't know they used that stuff
<wizardyesterday> For the equalizers?
<wizardyesterday> Or something else?
<azonenberg> Well in my case it's just a waveform dataset i can throw various math functions at and stress them on a large dataset
<azonenberg> i have another one that's only 4M points of the same data stream
<mxshift> nah, I had forced PRBS7 because the generator I was using supported inversion of that mode and we weren't sure that polarity swapping was working correctly
<azonenberg> but i can pull out the half-gigapoint dataset if i want to do a good benchmark like to see how much some optimization improves performance
<azonenberg> mxshift: i see
<mxshift> wizardyesterday: there's a phase where each link partner sends PRBS9 and the receiver sends back commands requesting the transmitter adjust pre- and post- cursor TX EQ.
<wizardyesterday> azonenberg, Ahh I see.
<azonenberg> wizardyesterday: anyway, so we're doing this big transition because we currently use a mix of OpenGL compute shaders and OpenCL for GPU acceleration
<azonenberg> Neither is supported well by pi4 or apple platforms
<azonenberg> Vulkan now runs on pi4 and there's a compatibility layer to use it on Macs as well
<wizardyesterday> I'll have to see what Vulkan is now that I have a nice phone again.
<wizardyesterday> That's what has a modern browser that can render stuff
<azonenberg> It's a new low level GPU API that came out a few years ago
<wizardyesterday> Oh sweet.
<azonenberg> unifies graphics and compute
<azonenberg> it's more work to set up than the old school stuff but gets you closer to the metal and better control of things like exactly what memory regions stuff is stored in, cache coherency, etc
<wizardyesterday> I've never worked with GPUs before.
<wizardyesterday> I remember CUDA years ago
<azonenberg> Yeah this is similar to CUDA but vendor agnostic
<azonenberg> CUDA is nvidia proprietary
<wizardyesterday> Ahhh
<wizardyesterday> Okay this sounds pretty cool.
<azonenberg> you compile your shaders to a portable bytecode then the driver compiles the bytecode to a card-specific native binary
<azonenberg> anyway, long term we'll transition the graphics over to it too but for the short term we'll keep OpenGL for rendering but transition all of the accelerated DSP over
<d1b2> <Darius> which invariably means it gets run through LLVM 😉
<azonenberg> the other reason we're doing this is that the options for FFT libraries in OpenCL sucked. clFFT hasn't been updated since 2017, has a hard coded max of 16M points, and several known bugs
<wizardyesterday> So they provide filter structures?
<azonenberg> Vulkan just provides a method to call a compute kernel across an abstract 3-dimensional grid of points
<azonenberg> these points need not correspond to anything physical, it's just a way for the parallel instances to tell each other apart
<wizardyesterday> Ahh so the signal processing is performed at higher layers (the app).
<azonenberg> so for example in the case of the "subtract" filter block
<wizardyesterday> Ahh cool.
<azonenberg> the compute kernel processes one sample per logical point
<azonenberg> and is invoked in a degenerate grid with only one dimension (y/z are both 1)
<azonenberg> and x dimension is your buffer size
<wizardyesterday> How huge is the API?
<azonenberg> and the main() of the GPU kernel is a bounds check followed by
<azonenberg> dout[gl_GlobalInvocationID.x] = inP[gl_GlobalInvocationID.x] - inN[gl_GlobalInvocationID.x];
<azonenberg> The bounds check is needed because the grid is actually not a grid of points, it's a grid of fixed size blocks. Blocks are typically multiples of 32 or 64 points
<azonenberg> to match the vector width of the GPU hardware
<azonenberg> so if you have a dataset size that isn't an exact multiple you have to bounds check in the kernel and ignore the last few points in the grid that don't correspond to actual memory locations
<azonenberg> Or you can pad your dataset out with garbage data to a multiple of the size, that works too
<azonenberg> The API is decently large but I have wrappers around most of it. and if we had you writing DSP code you'd be writing the serial software fallback version
<azonenberg> and me or one of the other folks with more GPU experience would probably handle the parallel porting
<azonenberg> generally speaking if the block can be written in a data parallel fashion (a loop over N samples where the iterations are stateless and depend only on the input, not on previous cycles' output) it's straightforward to port to GPU
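The dispatch arithmetic and the serial fallback described above can be sketched as follows. Function names here are invented for illustration, not actual scopehal API:

```cpp
#include <cstddef>
#include <vector>

// The GPU runs fixed-size thread groups, so the group count must be
// rounded up; threads past the real sample count are rejected by the
// bounds check inside the kernel.
size_t GroupCount(size_t nsamples, size_t groupSize)
{
	return (nsamples + groupSize - 1) / groupSize;	// round up
}

// Serial software fallback for the "subtract" filter block: the same
// stateless per-sample operation the GPU kernel performs, written as
// a plain data-parallel loop with no cross-iteration state.
void SubtractSerial(
	const std::vector<float>& inP,
	const std::vector<float>& inN,
	std::vector<float>& dout)
{
	dout.resize(inP.size());
	for(size_t i = 0; i < inP.size(); i++)
		dout[i] = inP[i] - inN[i];
}
```

For example, 1000 samples with 64-wide groups needs 16 groups (1024 threads total), so threads 1000..1023 fail the bounds check and do nothing.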
<wizardyesterday> I think I have a bit of learning to do before I can be of use to you.
<wizardyesterday> However, I'm willing to learn.
<azonenberg> We have lots of more math oriented work where even pseudocode would be super helpful
<azonenberg> and somebody else can handle the low level implementation details
<azonenberg> as a good example, how familiar are you with jitter and the dual Dirac model?
<azonenberg> (probably not very if you're mostly an RF guy)
<wizardyesterday> I know what jitter is. But the dual Dirac model, I have no familiarity.
<wizardyesterday> I'm more of an embedded software and DSP person.
<wizardyesterday> Mostly for SDRs.
<azonenberg> essentially it's a way to fit a simple statistical model to observed jitter (extracted as time interval error between a golden reference clock, usually coming from a PLL, and the transition times of a measurement)
<azonenberg> and extrapolate various jitter terms like periodic, random, and total jitter
<azonenberg> the key is that you can predict what your jitter at a given probability is, say 1e-12 or so, without actually taking that many samples
<azonenberg> by fitting the model to observed samples and modeling the distribution
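The extrapolation being described is the standard dual-Dirac relation from the jitter literature (the Agilent and LeCroy papers discussed below cover it in depth). The jitter PDF is modeled as two Dirac impulses separated by the deterministic jitter, convolved with a Gaussian whose RMS width is the random jitter:

```latex
% Dual-Dirac model: two impulses at \mu_L and \mu_R, convolved with a
% Gaussian of RMS width \sigma (the Rj term)
p(t) = \frac{1}{2\sigma\sqrt{2\pi}}
	\left[ e^{-(t-\mu_L)^2/2\sigma^2} + e^{-(t-\mu_R)^2/2\sigma^2} \right],
\qquad DJ(\delta\delta) = \mu_R - \mu_L

% Total jitter extrapolated to a target bit error ratio:
TJ(BER) = DJ(\delta\delta) + 2\,Q(BER)\,\sigma
% e.g. Q \approx 7.03 at BER = 10^{-12}, so TJ \approx DJ + 14.07\,\sigma
```

This is what allows predicting jitter at a 1e-12 probability without collecting 1e12 samples: fit σ and DJ(δδ) to the measured distribution, then evaluate TJ(BER) at the target BER.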
<azonenberg> this is a capability i want, but we currently lack
<azonenberg> and the math is over my head
<azonenberg> if you do some googling i'm sure you can find reading material
<wizardyesterday> dual Dirac model?
<azonenberg> Yeah
<wizardyesterday> I've studied signal modeling for the past year, but that's different stuff. You know AR, MA, ARMA, ... stuff like that
<wizardyesterday> oh cool thanks
<azonenberg> anyway, ultimately what i want is a straight C function that takes as an input an array of int64_t[]'s containing the time interval error of each sample in the input dataset, measured as femtosecond delta from actual to nominal edge time
<azonenberg> and outputs all of the various standard jitter metrics (Tj, Rj, Dj, Pj) from it
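A sketch of the requested interface might look like the following. All names here are invented, and the computation is deliberately oversimplified: it reports Rj as the RMS of the TIE and extrapolates Tj assuming purely Gaussian jitter, whereas a real dual-Dirac implementation must first separate the deterministic component:

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch, not scopehal API. Input: time interval error
// of each edge in femtoseconds. Output: a grossly simplified metric
// set -- Rj as the RMS of the TIE, Tj extrapolated to BER = 1e-12
// assuming *pure* Gaussian jitter (Dj = 0). A real dual-Dirac fit
// separates Dj from Rj before extrapolating.
struct JitterMetrics
{
	double rj_fs;	// random jitter, RMS
	double tj_fs;	// total jitter at BER = 1e-12
};

JitterMetrics ComputeJitterSimplified(const int64_t* tie, size_t n)
{
	double mean = 0;
	for(size_t i = 0; i < n; i++)
		mean += tie[i];
	mean /= n;

	double var = 0;
	for(size_t i = 0; i < n; i++)
		var += (tie[i] - mean) * (tie[i] - mean);
	double rj = sqrt(var / n);

	// Q(1e-12) ~= 7.03 for a Gaussian tail; Tj = Dj + 2*Q*Rj with Dj = 0
	return JitterMetrics{ rj, 2 * 7.03 * rj };
}
```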
<wizardyesterday> I found a whitepaper on an agilent site..
<azonenberg> i can supply some test data if you need it
<wizardyesterday> Oh cool.
<azonenberg> this is pure math and doesnt involve getting deep into our API so should be easy for you to get started on
<wizardyesterday> Oh cool.
<azonenberg> and it's also something i'm weak on
<wizardyesterday> lol
<wizardyesterday> me too
<azonenberg> i can give you data in a nice easy to parse format like CSV or something
<wizardyesterday> I'll read this first Agilent white paper
<azonenberg> Go for it. yeah, let me know if/when you want some data to try it with
<azonenberg> but i expect it'll take a while to grok the math
<wizardyesterday> I think so
<wizardyesterday> This is just like any of my tasks when I worked at Moto.
<wizardyesterday> azonenberg, Thanks for the info. :)
<azonenberg> Awesome
<wizardyesterday> This paper I'm looking at is "Jitter Analysis: The Dual-Dirac Model, RJ/DJ, and Q-Scale".
<wizardyesterday> Hopefully I'm not going down a rabbit hole.
<azonenberg> I don't fully understand what Q-scale is. I think it might be an alternative to dual Dirac
<azonenberg> i.e. different models you can fit to the same data
<wizardyesterday> Ahh
<wizardyesterday> I'll study up
<wizardyesterday> I'm a quick learner.
<azonenberg> so an ELI5 explanation of the differences between them would be helpful
<azonenberg> most big-name serial data suites support both
<wizardyesterday> ELI5?
<azonenberg> explain like i'm 5
<wizardyesterday> lol
<wizardyesterday> You're using them there college words. :)
<wizardyesterday> I'll learn it and distill the information.
<azonenberg> anyway, i already have filter blocks to compute the TIE as well as to use the non-repeating-pattern method to extract data dependent jitter from the TIE
<azonenberg> Which is quite interesting in itself: basically you look at a sliding window of the last N bits, i think we have 7 or 9 in the window of our implementation
<azonenberg> you assume ISI in the channel is short enough to fit within that window
<azonenberg> and then you form a histogram of jitter for each of the 2^n possible combinations of that sliding window
<azonenberg> and by averaging that out, you can figure out the contribution of the data dependent component to the overall jitter
<azonenberg> Then you can go back and subtract the DDJ from the Tj and get the Rj + BUj term
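The sliding-window DDJ method described above can be sketched as follows. The structure and names are assumptions for illustration, not scopehal's actual implementation: key each edge on the last N bits, average the TIE per pattern, then subtract that per-pattern average to leave the Rj + BUj residue:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Sketch of sliding-window DDJ extraction (illustrative only).
struct DdjResult
{
	std::map<uint32_t, double> patternMean;	// mean TIE per bit pattern
	std::vector<double> residual;			// TIE minus DDJ component
};

DdjResult ExtractDdj(
	const std::vector<uint32_t>& patterns,	// last-N-bits window per edge
	const std::vector<double>& tie)			// TIE per edge, fs
{
	DdjResult r;

	// Histogram phase: accumulate mean TIE for each observed pattern
	std::map<uint32_t, size_t> count;
	for(size_t i = 0; i < tie.size(); i++)
	{
		r.patternMean[patterns[i]] += tie[i];
		count[patterns[i]]++;
	}
	for(auto& it : r.patternMean)
		it.second /= count[it.first];

	// Subtract the data dependent component, leaving Rj + BUj
	for(size_t i = 0; i < tie.size(); i++)
		r.residual.push_back(tie[i] - r.patternMean[patterns[i]]);
	return r;
}
```

If pattern 0b01 always lands 50 fs late and 0b10 always 50 fs early, all of the jitter is data dependent and the residuals come out zero.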
<wizardyesterday> Oh that's pretty interesting.
<azonenberg> I already implement all of this
<wizardyesterday> Sweet.
<azonenberg> what's missing is going from RJ + Buj as a time domain waveform to scalar dual-Dirac Rj, Dj, and Tj values
<azonenberg> and Pj
<wizardyesterday> I think once I read a few papers, I'll understand this stuff a bit more.
<azonenberg> Yeah no rush
<wizardyesterday> Right now, I'm a greenhorn. :)
<azonenberg> I can give you lots of test data under various conditions when you're ready
<azonenberg> i have a PRBS generator board I made that can produce PRBS-9 and PRBS-31 patterns at 1.25, 2.5, 5, and 10.3125 Gbps as well as a channel emulator board that contains PCB traces ranging from 55 to 300 mm in length
<azonenberg> which i can insert between the scope and the signal generator to create varying levels of channel loss
<azonenberg> (thus increasing ISI and jitter)
<wizardyesterday> Cool!
<azonenberg> i've also been working with mxshift and some of the other folks at oxide computer
<azonenberg> so they sent me some much faster data rate characterization waveforms from their 100gbit ethernet backplane
<wizardyesterday> So what exactly is this stuff used for?
<azonenberg> Characterizing high speed serial data transmitters, receivers, and channels
<wizardyesterday> I remember our previous conversation
<wizardyesterday> Ahh okay
<azonenberg> in order to estimate error rates, determine if a link is viable
<azonenberg> how much FEC or equalization you might need
<wizardyesterday> Okay so that's the goal.
<wizardyesterday> Or one goal....
<azonenberg> also tracking down sources of interference/noise
<wizardyesterday> On "channel" cables?
<azonenberg> for example oxide had a link with high jitter that they root caused as a missing terminator on the PLL reference clock
<azonenberg> but the first symptom of the problem was excessive jitter
<azonenberg> which of course you won't see if you can't measure it :)
<wizardyesterday> Okay cool. :)
<azonenberg> this may also be worth reading
<azonenberg> it's the user manual for the serdes characterization software package from LeCroy but spends a while talking about the theory and what the various metrics mean
<wizardyesterday> Ugh! I can't render most https:// URLs
<wizardyesterday> let me check
<wizardyesterday> oh nvm
<wizardyesterday> I'll get it with curl
<wizardyesterday> one sec
<azonenberg> whitepaper also from lecroy talking about jitter calculation
<azonenberg> may be somewhat of a duplicate of the keysight/agilent paper i linked you
<azonenberg> but a second source cant hurt
<azonenberg> i've had all of these bookmarked for a while but never managed to wrap my head around the math
<wizardyesterday> Agreed
<wizardyesterday> Thanks for those documents.
<wizardyesterday> I have them downloaded, and I'll read them tomorrow.
<wizardyesterday> First the whitepaper.
<azonenberg> Great. no rush, this has been on the wishlist for a year or more
<azonenberg> a few more days won't hurt :)
<wizardyesterday> oh yeah!
<wizardyesterday> Oh that's why dual Dirac... you have a pair of impulses
<wizardyesterday> azonenberg, I think I'll sleep soon. You have a good night.
<wizardyesterday> I think this stuff will be fun to implement.
wizardyesterday has quit [Quit: leaving]
<_whitenotifier-7> [scopehal] azonenberg labeled issue #670: Look into recycling waveform memory buffers - https://github.com/glscopeclient/scopehal/issues/670
<_whitenotifier-7> [scopehal] azonenberg opened issue #670: Look into recycling waveform memory buffers - https://github.com/glscopeclient/scopehal/issues/670
<azonenberg> welp, just found a bug where i was dispatching nsamples thread *groups* not threads
<azonenberg> that explains some of my other tests being slow :p
<azonenberg> i was launching 64x as many threads as i needed lol
<d1b2> <sierra (she/her)> hey azonenberg - i'd be interested if you had any other "mostly maths" problems that needed a brain thrown at them - i'm coming up on a uni break soon & might be able to spare the cycles. Could be a good way to dip my toes in the pool I think?
<azonenberg> Let me skim the issue tracker and see. i'm sure we do
<azonenberg> We have the clock recovery PLL revamp but i'm already talking to someone else about that so i'll leave that to him unless plans change
<azonenberg> We have some equalization filters, for example a DFE. That might be a bit much to come in on if you're new to the codebase
<azonenberg> oh here's a nice one that should be straightforward
<d1b2> <sierra (she/her)> yeah, probably best to stick to things that are being passed an array or similar
<d1b2> <sierra (she/her)> okay - I'll take a look!
<azonenberg> MIPI DSI has error correction codes in the packet header
<azonenberg> we currently ignore them
<azonenberg> We should, as a minimum, verify that the ECC code is correct
<azonenberg> correction is probably not necessary but you could add if you were feeling dangerous
<azonenberg> all of the packet parsing code is already there, and i can supply test waveforms with hopefully-valid ECC values for you to play with
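The verify pass being asked for can be sketched generically as a parity-mask syndrome check. The masks below are ILLUSTRATIVE ONLY: the real MIPI DSI header ECC uses the specific Hamming-style parity assignments defined in the DSI specification, which should be taken from the spec, not from this sketch:

```cpp
#include <cstdint>

// Placeholder parity masks over a 24-bit header -- NOT the real DSI
// assignments; together they cover every header bit, so any single-bit
// flip changes at least one parity bit.
static const uint32_t g_parityMasks[6] =
{
	0x00aaaaaa, 0x00cccccc, 0x00f0f0f0,
	0x00ff00ff, 0x00ffff00, 0x00000f0f
};

static uint8_t ComputeEcc(uint32_t header24)
{
	uint8_t ecc = 0;
	for(int i = 0; i < 6; i++)
	{
		// even parity over the masked bits
		ecc |= (__builtin_parity(header24 & g_parityMasks[i]) << i);
	}
	return ecc;
}

// Returns true if the received ECC matches the recomputed one,
// i.e. whether the decoder should emit TYPE_ECC_OK or TYPE_ECC_BAD
static bool VerifyEcc(uint32_t header24, uint8_t rxEcc)
{
	return ComputeEcc(header24) == rxEcc;
}
```

Correction would go one step further: the XOR of received and recomputed ECC forms a syndrome identifying which single bit to flip.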
<azonenberg> let me see what other options we have
<azonenberg> https://github.com/glscopeclient/scopehal/issues/542 is a bit more involved, as is https://github.com/glscopeclient/scopehal/issues/540. both are math-y and need to be done but are probably best saved until you have some familiarity with the code
<azonenberg> https://github.com/glscopeclient/scopehal/issues/599 would be good to have as well but is also probably not a good first project
<_whitenotifier-7> [scopehal] azonenberg closed issue #623: clFFT cannot handle FFTs of >16M points. Figure out how to handle this - https://github.com/glscopeclient/scopehal/issues/623
<_whitenotifier-7> [scopehal] azonenberg commented on issue #623: clFFT cannot handle FFTs of >16M points. Figure out how to handle this - https://github.com/glscopeclient/scopehal/issues/623#issuecomment-1223621337
<d1b2> <sierra (she/her)> I'll take a look at #328 - anywhere that might have the spec for how the header is structured? I have institutional access and I'm not afraid to use it
<azonenberg> Google around for the mipi dsi spec. if you cant find it after a bit of digging i think i have a copy somewhere i can send you the relevant info from
<d1b2> <sierra (she/her)> 👍
massi has joined #scopehal
<_whitenotifier-7> [scopehal] azonenberg pushed 10 commits to master [+1/-0/±22] https://github.com/glscopeclient/scopehal/compare/54f84e7ea32a...23660a3a2697
<_whitenotifier-7> [scopehal] azonenberg d524888 - Added 8 bit sample conversion filter
<_whitenotifier-7> [scopehal] azonenberg 6d52c2e - AcceleratorBuffer: added GetCpuPtr()
<_whitenotifier-7> [scopehal] azonenberg 031e28a - Added flag for enabling/disabling GPU acceleration in scope drivers
<_whitenotifier-7> [scopehal] ... and 7 more commits.
<azonenberg> ok so i tweaked a bunch of stuff but ultimately tonight was a waste of time, my attempt at gpu accelerating the pico driver made it slower so i reverted it :p
<azonenberg> AVX actually outperformed it after transfer delays
<azonenberg> also according to profiling we're losing a *lot* of time to all of the shuffling of offset/duration values we don't actually need
<azonenberg> Which is making me wonder if it might make sense to do the Waveform refactoring sooner rather than later
<azonenberg> meaning like starting tomorrow :p
<azonenberg> every time i try to optimize more i find we're getting held up on the offset/duration fills
<azonenberg> and lots of stuff doesnt even use them
<azonenberg> And i think i can squeeze a ton more performance out of things if i dont have to drag all that extra data around
<azonenberg> ok i give up i'm gonna start doing it right now and see how far i get before i get tired :p
<azonenberg> the longer i put this off the more work i'm going to have to undo when i do it
<azonenberg> this... may take a while
<d1b2> <sierra (she/her)> got a hold of the DSI standard, looks reasonably doable. Might take a while though. Is there a set of code standards for the project?
<azonenberg> We have a C++ coding policy linked somewhere that i should probably add more detail to
<azonenberg> Quick overview: one \t = one level of indentation = 4 columns
<azonenberg> m_ prefix on class member variables, g_ prefix on globals
<azonenberg> curly braces go on their own line
<azonenberg> camel case for identifiers, initial capital for class names FooBarFilter but not for member/global variables g_fooBar m_fooBar
<azonenberg> use Doxygen-style comment annotations on methods (we have nowhere near full coverage in Doxygen but we'll work on that eventually...)
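Put together, the conventions listed above might look like this (class and member names invented for illustration, not actual scopehal code):

```cpp
///@brief Example filter demonstrating the project naming style
class FooBarFilter
{
public:
	///@brief Returns the stored gain (Doxygen-style annotation)
	float GetGain()
	{ return m_gain; }

	///@brief Sets the gain applied to each sample
	void SetGain(float gain)
	{ m_gain = gain; }

protected:
	float m_gain = 1.0f;	// m_ prefix on class members, camel case
};

float g_defaultGain = 1.0f;	// g_ prefix on globals
```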
<azonenberg> lib/scopeprotocols/DSIPacketDecoder.cpp is the place to start if you havent found it already
<azonenberg> on line 324 we make a buffer containing all of the data that needs to be input to the ECC algorithm
<azonenberg> and that needs to be checked against s.m_data, the expected check value
<azonenberg> and you should then output a TYPE_ECC_OK or TYPE_ECC_BAD symbol as appropriate
<azonenberg> if you want to be fancy, if you have a correctable error, go back and patch the VC, type, and length fields previously generated
<azonenberg> (in which case you might want to create a third type of symbol for "correctable error")
<Yamakaja> Hey azonenberg, do you have a list of supported / tested devices somewhere? Or more specifically have you tried using scopehal with the DSOZ334A yet?
massi_ has joined #scopehal
massi has quit [Ping timeout: 268 seconds]
Johnsel has quit [Ping timeout: 256 seconds]
Johnsel has joined #scopehal
massi_ has quit [Remote host closed the connection]
Stephie has quit [Read error: Connection reset by peer]
Stephie- has joined #scopehal
<Johnsel> that was supposed to be re: We have a C++ coding policy linked somewhere
<azonenberg> yes that is generally it. it could use more detail
<azonenberg> louis: so yeah this sparse/dense refactoring is probably going to be one of the largest single commits in the history of the project lol
<azonenberg> its gonna make your GetText refactor look tiny
Johnsel has quit [Ping timeout: 252 seconds]
<azonenberg> i have a 1K line diff and i haven't even got the first source file to compile yet
<electronic_eel> azonenberg: running vulkan glscopeclient with my amd gpu open source drivers seems to work, it is showing some vulkan info when running with --debug
<electronic_eel> azonenberg: is it a known issue that 2 tests segfault when running "make test"?
<electronic_eel> the tests are Filter_FrequencyMeasurement and Primitive_SampleOnRisingEdges
<azonenberg> No
<azonenberg> They work and pass for me
<azonenberg> i can look into it in a few days after i finish my current refactoring
<electronic_eel> ok, thanks, i can take a look too
<electronic_eel> like a backtrace
<electronic_eel> but i'll do a PR for installing the new .spv shader files first
<electronic_eel> glscopeclient won't start without those files
<electronic_eel> I'm just running a full build & install to verify that my fix works
<_whitenotifier-7> [scopehal] electroniceel forked the repository - https://github.com/electroniceel
<azonenberg> Ok, yeah that shouldnt break any of my refactoring
<azonenberg> although i will not be merging until i get that taken care of
<azonenberg> in other news: the new PT5 filter arrived today
<azonenberg> so i can assemble that prototype after work
<azonenberg> along with the AD4 that is actually already on the solder paste jig
<azonenberg> but idk i may be too busy coding to actually do that :p
* azonenberg needs a time machine
<_whitenotifier-7> [scopehal] electroniceel opened pull request #671: install the compiled .spv filter files into share/glscopeclient/shaders - https://github.com/glscopeclient/scopehal/pull/671
<electronic_eel> it is just a one-liner, but merge whenever convenient for you
<azonenberg> Will do
<electronic_eel> don't you get confused with all this task juggling between major refactoring and probe building?
<electronic_eel> if i'm in the middle of a big refactoring or similar, i tend to work on it until i drop
<azonenberg> i may put off the probe stuff until the refactoring is done. but i'm mostly doing $dayjob stuff for the rest of the day
<azonenberg> this was too big to do in one sweep
<azonenberg> I did however figure out a good template based approach that should greatly simplify things
<azonenberg> there's still a compile time combinatorial explosion but its hidden from the user in many cases
<azonenberg> just means a lot of helper methods in the Filter class will now be 4x replicated or so by templates
<electronic_eel> btw, is "glslc" just a build time dependency or is it also needed at runtime?
<azonenberg> for every possible combination of sparse or uniform clock and data
<azonenberg> Compile time only. it's the compiler from .glsl to .spv
<azonenberg> which is the compiled shader bytecode
<azonenberg> those .glsl files are no longer needed at run time (although the ones we pass to OpenGL still are)
<azonenberg> anything we give to vulkan is going to be SPIR-V bytecode
<electronic_eel> ok, but these shaders aren't generated (and then compiled) during runtime, like adapted to the filter graph or something
<azonenberg> They're JITted by vulkan to a GPU-specific native binary
<azonenberg> but thats not something the build scripts care about, it happens in ram
<azonenberg> (although there is an option for you to cache the native binary to speed filter graph creation - that is pending but not currently implemented)
<azonenberg> We do not do any runtime shader code generation, no
<electronic_eel> ok. i just wanted to get that right for my rpm packages
<_whitenotifier-7> [scopehal] electroniceel opened issue #672: Segfault in tests Filter_FrequencyMeasurement and Primitive_SampleOnRisingEdges - https://github.com/glscopeclient/scopehal/issues/672
<azonenberg> the opencl message is also a bug, it shouldnt be printed during the test case - but we're refactoring away opencl soon anyway
<azonenberg> so i'm just gonna let it sit until we remove the offending code outright
<azonenberg> as far as the tests go, you should be able to run the test binary directly (not under ctest) and get some more output
<electronic_eel> i don't care about the opencl message, but segfaults are another matter ;)
<azonenberg> as well as a attach a debugger
<azonenberg> yeah i'm just saying, its not related to the crash
<azonenberg> first conjecture: the test case is trying to make Vulkan API calls to allocate memory without having called VulkanInit() first to create a device object
<azonenberg> Right now there's a dozen or so functions you have to call to set up scopehal and it's easy to miss a few
<azonenberg> this will be refactored into a cleaner init some time down the road
<electronic_eel> the backtraces were with directly running the binary that cmake created in gdb. but i guess cmake puts some test harness thing around them
<electronic_eel> if there was no call to VulkanInit() first, wouldn't it then also fail for you?
<azonenberg> Well...
<azonenberg> it depends on how your driver is structured vs mine
<azonenberg> in particular whether allocating memory with a null device handle crashes, allocates host-only memory, or silently fails
<azonenberg> pretty sure that behavior is undefined
Johnsel has joined #scopehal
<electronic_eel> ah, sure
<electronic_eel> do you know of an easy way to force using the software vulkan pipeline for the unit tests?
<azonenberg> For debug you could patch the device lookup code in VulkanInit()
<azonenberg> llvmpipe and/or swiftshader will show up as separate devices just like a GPU
<azonenberg> i currently will use either as a last resort behind integrated cards, which are only used if no discrete card is found
<electronic_eel> yeah, i have seen them show up when starting with --debug, there is a list of devices
<azonenberg> Longer term there will be preferences and/or command line args to force use of specific devices
<electronic_eel> but i'd say the tests are more reproducible if you run them with the software stack
<electronic_eel> so something like setting an environment variable or similar
<azonenberg> Yeah
<azonenberg> well, ultimately i plan to refactor the tests and run them using both the pure software (non vulkan) fallback implementations as well as the vulkan version
<azonenberg> and cross check results against each other
<electronic_eel> yeah, that is probably better
<electronic_eel> regarding terminology - isn't the software stack also implementing vulkan? what is the reason you call it "(non vulkan)"?
<azonenberg> I mean using a straight C++ loop, possibly with AVX or other vector intrinsics
<azonenberg> as opposed to using vulkan
<azonenberg> whether the shader runs on a physical GPU or is emulated/cross compiled to run on the CPU is orthogonal to that
<azonenberg> e.g. SubtractFilter will either call InnerLoop(), InnerLoopAVX2(), or call SubtractFilter.svp
<azonenberg> .spv*
<azonenberg> three different implementations of the same algorithm
<electronic_eel> wasn't your plan to move everything to vulkan (be it software implemented or on gpu) and not require a fallback implementation in an extra library (like ffts)?
<electronic_eel> or is this different for the shaders and fft?
<azonenberg> So, in the case of FFT it's an external black box library so we likely will go vulkan only there
<azonenberg> for the other filters, i may keep a C++ reference implementation (deleting the vectorized version if there is one) to cross check against
<azonenberg> just to catch regressions in the shader
<azonenberg> the ref implementation may become part of the unit test vs something in the filter block
<azonenberg> but yes ultimately i want to avoid having too many implementations of each filter
<azonenberg> if we can have one vulkan version of everything that is likely best
<azonenberg> for the near term i'm keeping the old versions because i think they could be valuable for testing
<azonenberg> and ultimately i want unit tests for every filter
<azonenberg> anyway, right now the acceleration work is on hold until the sparse/dense waveform refactoring is done
GenTooMan has quit [Ping timeout: 244 seconds]
GenTooMan has joined #scopehal
Johnsel has quit [Remote host closed the connection]
Johnsel has joined #scopehal
<d1b2> <azonenberg> Well this isn't as bad as i thought it would be
<azonenberg> This one function can handle a SparseWaveform or UniformWaveform for data - of any sample type
<azonenberg> a digital waveform - sparse or uniform - as clock
<azonenberg> and correctly validates a bunch of invariants at compile time
<azonenberg> like input/output data type are equal
<azonenberg> and output waveform is sparse
<azonenberg> lots of templating and it's going to probably turn into like 30 implementations at compile time
<azonenberg> but it actually replaces 3 overloaded functions
<azonenberg> the next step will probably be a wrapper around this that uses RTTI to call each of the various template overrides given a WaveformBase*
<azonenberg> but that will come later
<azonenberg> but yeah i'm essentially implementing polymorphic types at compile time using templates
<azonenberg> to avoid the overhead of virtual function calls in inner loops
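The compile-time polymorphism being described might be sketched like this. The waveform types here are stand-ins, not the real scopehal classes; the point is that one template handles every waveform combination and validates invariants with static_assert instead of virtual dispatch in the inner loop:

```cpp
#include <cstdint>
#include <type_traits>
#include <vector>

// Stand-in waveform types (illustrative only)
template<class T> struct UniformWaveform { std::vector<T> m_samples; };
template<class T> struct SparseWaveform
{
	std::vector<T> m_samples;
	std::vector<int64_t> m_offsets;	// real code would fill these too
};

// One templated function accepts any input/output waveform combination;
// the sample-type invariant is checked at compile time, so the inner
// loop pays no virtual call overhead.
template<class WaveIn, class WaveOut>
void CopySamples(const WaveIn& in, WaveOut& out)
{
	static_assert(
		std::is_same<
			typename std::decay<decltype(in.m_samples[0])>::type,
			typename std::decay<decltype(out.m_samples[0])>::type
		>::value,
		"input/output sample types must match");
	out.m_samples = in.m_samples;
}
```

Each distinct (WaveIn, WaveOut) pair the program uses instantiates one concrete function at compile time, which is the combinatorial explosion mentioned above.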
Johnsel has quit [Remote host closed the connection]