azonenberg changed the topic of #scopehal to: ngscopeclient, libscopehal, and libscopeprotocols development and testing | https://github.com/ngscopeclient/scopehal-apps | Logs: https://libera.irclog.whitequark.org/scopehal
<d1b2> <azonenberg> The performance metrics tab shows the sum of all filter graph execution time
<d1b2> <azonenberg> there's currently no logging for single filter instances but it's easy to add yourself if you wanted to do it temporarily
<d1b2> <azonenberg> usually i care more about throughput for that kind of thing so i just hop in vtune and measure total elapsed time over say one minute of acquisitions
<d1b2> <johnsel> right, I'm working off of an offline acquisition
<_whitenotifier-3> [scopehal-apps] azonenberg opened issue #665: FontManager: better fallback vs asserting if invalid file path - https://github.com/ngscopeclient/scopehal-apps/issues/665
<d1b2> <azonenberg> So in that case load the session in a profiler and record total time spent in that filter Refresh() method
<d1b2> <johnsel> as we don't have a good way to produce a datastream for CDR synthetically
<d1b2> <azonenberg> or if you are ok with making local patches for testing, just call GetTime() before and after
<d1b2> <azonenberg> that's a helper method that returns fp64 seconds
<d1b2> <azonenberg> subtract and you get elapsed time
<d1b2> <azonenberg> (cross platform)
<d1b2> <azonenberg> i use it a lot for quick performance tests
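The GetTime() bracketing pattern described above can be sketched like this. GetTimeSketch() is a portable stand-in built on std::chrono, not the actual libscopehal helper; the timed lambda stands in for a filter's Refresh() call:

```cpp
#include <chrono>

// Stand-in for libscopehal's GetTime() helper: fp64 seconds from a
// monotonic clock, so subtracting two calls gives elapsed time.
static double GetTimeSketch()
{
    auto now = std::chrono::steady_clock::now().time_since_epoch();
    return std::chrono::duration<double>(now).count();
}

// Bracket an arbitrary chunk of work with two time calls, the way one
// would wrap a filter's Refresh() for a quick local measurement.
template<typename F>
double TimeIt(F&& work)
{
    double start = GetTimeSketch();
    work();
    return GetTimeSketch() - start;   // elapsed seconds
}
```

A quick local patch would print the result with LogDebug or printf and then be reverted once the numbers are collected.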
<d1b2> <johnsel> Yes I thought it might be useful to use the graph to output that data directly for export to csv
<d1b2> <johnsel> I do have some feedback on the export to CSV functionality
<d1b2> <johnsel> it seems to not deal with multiple columns properly yet
<d1b2> <azonenberg> what do you mean?
<d1b2> <johnsel> and no progress indicator is also not super great
<d1b2> <azonenberg> you specify multiple columns, it makes more inputs
<d1b2> <azonenberg> you hook up each input and you're good to go
<d1b2> <azonenberg> as far as progress goes its just a filter, it runs in a background thread... but yes there is an open ticket for giving some indication of how the filter graph is going
<d1b2> <johnsel> right well in my case that resulted in the file staying at 0 bytes
<d1b2> <azonenberg> huh
<d1b2> <azonenberg> did you actually trigger the export?
<d1b2> <johnsel> I did
<d1b2> <johnsel> I have the data separately in 2 files exported just fine
<d1b2> <azonenberg> the challenge is getting good feedback there and not going overboard on measuring time and slowing things down
<d1b2> <azonenberg> hmmph ok i might look into it later
<d1b2> <azonenberg> i just woke up :p
<d1b2> <azonenberg> (still not 100%)
<d1b2> <johnsel> In my case I used the PCIe example and tried to export the raw data stream and the clock recovery output
<d1b2> <johnsel> for the dnn clock recovery project
<d1b2> <johnsel> absolutely, which is why I'm talking to you and not just slapping something wherever
<d1b2> <johnsel> it may be possible to make --debug add time-elapsed as an output for each filter and then only run it when there is something connected to the output port
<d1b2> <johnsel> but I prefer to discuss things like that with you first
<d1b2> <azonenberg> in general i want to avoid going overboard on in-app performance monitors when profilers already exist and are designed to measure things like this
<d1b2> <azonenberg> low resolution overall stats like how much time is spent on rendering and filter graph is useful
<d1b2> <azonenberg> especially since that is the sum of cpu and gpu time
<d1b2> <azonenberg> but the goal of those stats is not to help you optimize one block
<d1b2> <johnsel> right in general I would agree but it makes more sense for my use case to have the granular data so I have something to compare against
<d1b2> <johnsel> seeing as it will become a very repetitive thing for me (and it is hard to pull that data for a full waveform set from NVIDIA Nsight)
<d1b2> <azonenberg> yeah so that sounds like somewhere to add temporary debug code and remove when you're done
<d1b2> <azonenberg> Rather than trying to find a general solution
<d1b2> <azonenberg> i throw GetTime() calls all over the place if i'm tweaking stuff
<d1b2> <azonenberg> but there's way too many potential monitor points to justify leaving the overhead of keeping them all in
<d1b2> <johnsel> I'll see what I decide is the best implementation for me, I like the idea of having it as an output on the filter so I can include it in the csv, but it doesn't have to end up in main regardless what I do
<d1b2> <johnsel> fyi, the erroring case
<d1b2> <azonenberg> i'm not seeing any reason at a glance why that shouldn't work
<d1b2> <azonenberg> you will probably have to look at CSVExportFilter in detail to debug
<d1b2> <azonenberg> Having a scalar output on the filter could work, although we dont currently support scalar channels in the csv export
<d1b2> <azonenberg> (doesnt make a ton of sense as csv is intended for vector data)
<d1b2> <johnsel> hmm actually
<d1b2> <azonenberg> Also while i have you here
<d1b2> <johnsel> I closed out ngscopeclient just now
<d1b2> <azonenberg> what's the current CI status
<d1b2> <johnsel> and now there is a file
<d1b2> <johnsel> 40MB
<d1b2> <azonenberg> oh interesting maybe it delayed the export for some reason. maybe the bug is that it didnt trigger when it should have
<d1b2> <azonenberg> not that the export was broken
<d1b2> <johnsel> so it may be either very slow or have hung on something
<d1b2> <azonenberg> yeah
<d1b2> <azonenberg> the csv export fprintf's every sample
<d1b2> <azonenberg> it's... not exactly fast
<d1b2> <johnsel> oh
<d1b2> <johnsel> yeah
<d1b2> <azonenberg> but its outputting ascii so that's not surprising
<d1b2> <johnsel> room for improvement there for sure
<d1b2> <azonenberg> float to ascii conversion is hard to make fast
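One hedged way to speed that path up, assuming C++17 with floating-point std::to_chars is available (GCC 11+ / recent MSVC): format samples into one in-memory buffer and write it out in large chunks rather than one fprintf per sample. This is a sketch, not the current CSVExportFilter code:

```cpp
#include <charconv>
#include <string>
#include <vector>

// Sketch: format a column of samples into one big string using
// std::to_chars (locale-free, shortest round-trip form), so the file
// can then be written with a single fwrite instead of a per-sample
// fprintf call.
std::string FormatSamples(const std::vector<float>& samples)
{
    std::string out;
    out.reserve(samples.size() * 16);   // rough per-sample size estimate
    char tmp[32];
    for(float f : samples)
    {
        auto res = std::to_chars(tmp, tmp + sizeof(tmp), f);
        out.append(tmp, res.ptr);
        out.push_back('\n');
    }
    return out;
}
```

std::to_chars avoids the locale machinery and format-string parsing that make printf-family float formatting slow, which is most of the win here.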
<d1b2> <johnsel> still a bit wonky
<d1b2> <azonenberg> yeah i've seen lots of timeouts and such with commits i've pushed lately
<d1b2> <johnsel> yes there was a bug
<d1b2> <johnsel> I think I have it working properly now
<d1b2> <johnsel> but I'm monitoring it
<d1b2> <azonenberg> Ok. well i have a bunch of stuff i'll hopefully be working on tonight so you should have a nice flood of commits to test with lol
<d1b2> <azonenberg> Also, semi related question: how hard would it be to set up the same CI infrastructure to run on other projects under the ngscopeclient org?
<d1b2> <azonenberg> i'm thinking the digilent, pico, etc. bridges
<d1b2> <azonenberg> it'd be nice to have automated check builds of those
<d1b2> <azonenberg> lower priority than the app core of course
<d1b2> <azonenberg> but if we can reuse 99% of the work, probably worth doing eventually
<d1b2> <azonenberg> (They dont need GPU but it might be possible to do hardware in loop tests of them eventually)
<d1b2> <azonenberg> plus of course our builder is faster than the free tier azure one
<d1b2> <johnsel> Well the current builds are done with ephemeral VMs so they have a lot of overhead time on boot
<d1b2> <johnsel> we could add another long-living builder for the bridges since we don't care about those being run from a fresh setup every time
<d1b2> <azonenberg> yeah and it probably doesnt need 64GB of ram or anything either
<d1b2> <azonenberg> the bridges are tiny, usually like 4-5 source files
<d1b2> <azonenberg> anyway, its not a priority but please put it on the back of your todo list 🙂
<d1b2> <johnsel> no, so adding (or picking) a template in XOA is step 1, then step 2 is to write a VM config template, which is copy-paste: adjust the VM template and some other basic parameters, and then applying the config should bring one up
<d1b2> <johnsel> and then we need to make the .yml files defining the actual build process
<d1b2> <azonenberg> higher up on the list: what do you think of starting to set up CPack to make unofficial .deb, rpm, etc distro packages in CI?
<d1b2> <johnsel> few hours work only, can definitely be done
<d1b2> <azonenberg> separate from any official distro integrations we may get later on
<d1b2> <azonenberg> but so people can just grab a deb and get the binary + all deps pulled in
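For reference, a minimal CPack setup along those lines might look like the following; the generator list, SHLIBDEPS flag, and contact string are illustrative assumptions, not the project's actual packaging config:

```cmake
# Illustrative CPack config for unofficial .deb/.rpm builds in CI.
set(CPACK_PACKAGE_NAME "ngscopeclient")
set(CPACK_GENERATOR "DEB;RPM")
# Let dpkg-shlibdeps discover shared-library dependencies automatically,
# so the .deb pulls in the runtime deps when installed.
set(CPACK_DEBIAN_PACKAGE_SHLIBDEPS ON)
set(CPACK_PACKAGE_CONTACT "maintainer@example.com")  # placeholder
include(CPack)

# Then, from the build directory:
#   cpack -G DEB
```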
<d1b2> <johnsel> Yes I think that makes sense
<d1b2> <johnsel> Right now we produce an artifact that you cannot run basically
<d1b2> <azonenberg> yeah lol. i want people to be able to grab and run like they can with the windows builds
<d1b2> <johnsel> it has 2 .so files and an executable in the wrong folders
<d1b2> <azonenberg> which I think can just be installed
<d1b2> <azonenberg> lol
<d1b2> <azonenberg> ok that needs to get fixed :p
<d1b2> <johnsel> yes, install or unzip
<d1b2> <johnsel> so we definitely should do something about that
<d1b2> <azonenberg> do you have time/interest in working on that?
<d1b2> <azonenberg> ideally, we could have that be our official binary releases. find a commit we're happy with and tag it
<d1b2> <azonenberg> and the CI build will become the official release binary
<d1b2> <azonenberg> then just grab it off actions and stick it as a release artifact
<d1b2> <johnsel> I mean, I'd rather not if we can find somebody who likes this stuff more than me
<d1b2> <azonenberg> Lol fair enough
<d1b2> <johnsel> I think we've had some offers over time, no?
<d1b2> <azonenberg> ok i'll add to my notes for the dev call. but somebody's gotta do it
<d1b2> <johnsel> I'd like to finish off the CI work and move to more core development for the foreseeable future, I definitely need to work on the thunderscope + litex integration, and I'd like to build some tensor core and deep neural network stuff next
<d1b2> <azonenberg> Makes sense
<d1b2> <azonenberg> yeah CI was not supposed to be a "project"
<d1b2> <azonenberg> it was supposed to be something we just do, all the time
<d1b2> <johnsel> The CI has been my focus for a very long time
<d1b2> <johnsel> what do you mean?
<d1b2> <azonenberg> i mean that i wasn't planning for it to take a year to get the local CI setup working lol
<d1b2> <azonenberg> i hoped it would just happen and work :p
<d1b2> <azonenberg> and we could use it
<d1b2> <azonenberg> clearly i underestimated the difficulty
<d1b2> <johnsel> With our requirements it is a little more intricate for sure
<d1b2> <johnsel> Anyway I can help set up some extra runners, that's no problem, but I haven't packaged a Linux app before so I would need to learn that myself, and it's pretty far into “administer, not build” territory for the CI architecture
<d1b2> <azonenberg> Yeah
<d1b2> <azonenberg> i'm poking a few folks who have touched that stuff in the past
<d1b2> <johnsel> Cool, I think we'll find somebody who can and wants to contribute that, and if we really don't we can revisit it
<d1b2> <johnsel> hey one more question Andrew, in theory you could load a dataset in a separate c++ application that pulls in scopehal and apply a filter there, right?
<d1b2> <azonenberg> Correct, that was always the goal. I am not aware of anyone currently doing it but i know Codysseus did so at some point in the past doing some USB analysis headlessly
<d1b2> <azonenberg> i want to add some tools to make this easier in the future
<d1b2> <azonenberg> e.g. being able to export a scopesession to either C++ code that instantiates filters and connects them, or some kind of reduced scopesession format that only describes the filter graph that you can load without using ngscopeclient
<d1b2> <azonenberg> (the intent being to allow prototyping in ngscopeclient then moving to a headless workflow)
<d1b2> <azonenberg> we would of course also need docs and such to help people get started bringing up the APIs
<d1b2> <johnsel> interesting, might come in useful for me as well
<d1b2> <azonenberg> it's likely bitrotted a bunch
<d1b2> <azonenberg> but yes
<d1b2> <johnsel> most likely
<d1b2> <johnsel> still a decent reference though
<d1b2> <johnsel> If I do end up with something useful like that as an example I'll PR it
<d1b2> <johnsel> The Alinx AXAU15 board ($420 ex VAT) I bought does in fact support PCIe4.0 by the way. I was able to read @ 50Gbit (6.25GByte/s)
<d1b2> <johnsel> Pretty neat huh
<d1b2> <johnsel> Still not the desired 80Gbit I want to stream the full 10Gsa datastream to the host but still
<d1b2> <johnsel> it's only x4 pcie
<d1b2> <johnsel> unfortunately the chips that have PCIE4C and support gen 4 all have only 12 transceivers, so that is disappointing
<d1b2> <johnsel> I'd kill for one with 16 transceivers so I could do x8 to the ADC and x8 to the host
<d1b2> <johnsel> how would you deal with that? more FPGAs and RAM to write to on the one end and read from on the other?
<d1b2> <johnsel> can you even have 2 FPGAs read/write to the same RAM chips?
<d1b2> <johnsel> maybe passing through 1 FPGA?
<d1b2> <azonenberg> oh cool. as far as two fpgas on one ram generally not possible
<d1b2> <azonenberg> the only modern dual port rams i know of are qdr-ii+ and qdr-iv
<d1b2> <azonenberg> and while they have separate read/write data buses the command/address bus, at least on qdr-ii+, is time shared between the ports
<d1b2> <johnsel> so then you need some custom very wide parallel interface
<d1b2> <azonenberg> As far as transceivers go
<d1b2> <azonenberg> my tentative plan for the big scope was to have five FPGAs for a four channel scope
<d1b2> <azonenberg> each channel would have one (tentatively xcau25p for the smaller scale prototype but i think i'd have to go xcku025/035 for the full scale to get enough performance) and 12 lanes of jesd204 to the adc, then some ddr4 packet buffer
<d1b2> <azonenberg> then reduced rate / bit depth data (e.g. zero crossings or decoded 8b10b symbols or something) would go over some other interface, likely parallel or channel bonded serial LVDS, to the top level FPGA which would do triggering
<d1b2> <azonenberg> then once a trigger happened the top FPGA would use the same interface to read out the ddr buffer and stream it out over 10/25/40/100GbE to the outside world using its transceivers
<d1b2> <azonenberg> ideally with sufficient memory that during the download the acquisition boards could have already re-armed and started capturing more waveforms
<d1b2> <azonenberg> For the stripped down prototype i was going to have an au25p with an RGMII PHY on it and do everything on one board with only a 1G uplink
<d1b2> <azonenberg> just as a proof of concept
<d1b2> <azonenberg> the fpga board would be just fpga and ram basically, then it'd have a serdes + power + i2c connector to the adc board which would basically just be a 6 gsps ad9213
<d1b2> <azonenberg> then that would have probably differential SMA plus a power/control interface to the frontend board
<d1b2> <azonenberg> (minimizing the chances of a frontend bug leading to the loss of a nearly $2k adc)
<d1b2> <johnsel> Yeah we're thinking very similarly then
<d1b2> <johnsel> other than the pcie vs network adapter
<d1b2> <azonenberg> yeah
<d1b2> <azonenberg> but in either case its gotta be multi fpga and the interface between them isnt going to be full BW
<d1b2> <azonenberg> because the root FPGA has to have N channels worth of interfaces on it
<d1b2> <azonenberg> plus the external bus interface
<d1b2> <azonenberg> tentative plan was for the root fpga to also have some sort of logic analyzer functionality on it
<d1b2> <azonenberg> so you'd have logic channels + SCPI + triggering on the mainboard and then could populate analog channels as budget/requirements dictate
<d1b2> <johnsel> yes you'd definitely have to give up streaming the full bw stream
<d1b2> <azonenberg> (the plan was for the full system to be a modular 1U, possibly slightly fewer boards than the proto but i still want to be able to replace the frontend without losing the very expensive adc if i blow an input)
<d1b2> <azonenberg> i mean for the full scale system we're talking 4-8 channels * 10 Gsps * 12 bits
<d1b2> <azonenberg> i dont think there is any kind of ethernet in existence that fast yet
<d1b2> <azonenberg> last time i checked the roadmap only went to 800G :p
<d1b2> <johnsel> No, we're looking at different types of budget hardware too
<d1b2> <azonenberg> Yeah. I'm expecting this scope to cost as much as a nice car
<d1b2> <johnsel> I'm still looking at just above current capabilities
<d1b2> <johnsel> For a few k
<d1b2> <azonenberg> like, something price competitive with lecroy hdos or tek msos
<d1b2> <azonenberg> give or take a bit
<d1b2> <azonenberg> or at least spec competitive
<d1b2> <johnsel> You may still be able to stream at ~10GSa
<d1b2> <azonenberg> that's where the fun is IMO 😛
<d1b2> <azonenberg> i mean i'll make it as fast as i can
<d1b2> <johnsel> Sure, I agree, but you have to start somewhere
<d1b2> <azonenberg> i dont think ngscopeclient is remotely ready to handle 120 Gbps of streaming waveforms
<d1b2> <azonenberg> thunderscope is already pushing limits right now
<d1b2> <johnsel> Yeah we'll have to change up some things for sure
<d1b2> <johnsel> but that is fine
<d1b2> <johnsel> I think getting 50Gbit/s working will be a good next step
Degi_ has joined #scopehal
<d1b2> <johnsel> I expect to have data streaming by the end of q1 24
<d1b2> <johnsel> at that rate
<d1b2> <azonenberg> one thing that i think will have to change as we move past 10 Gbps is the current model of each channel having one current waveform
<d1b2> <azonenberg> that we use for rendering and processing
<d1b2> <azonenberg> we're going to have to move to a fully pipelined model so we don't have to interlock rendering and filter graph operations
<d1b2> <azonenberg> that would be a major refactoring
<d1b2> <johnsel> yep that's one
<d1b2> <azonenberg> it may be possible to get close, or even just as good performance, by somehow integrating rendering into a FilterGraphExecutor derived class
Degi has quit [Ping timeout: 264 seconds]
Degi_ is now known as Degi
<d1b2> <azonenberg> so a render cycle is scheduled just like a filter execution
<d1b2> <azonenberg> rather than mutexing until the entire graph has finished like we do now
<d1b2> <johnsel> that would definitely be fun to test and work out
<d1b2> <johnsel> I think that can work very well indeed
<d1b2> <johnsel> And right now we do 8bit to float conversion as the first step right?
<d1b2> <johnsel> I think for (perhaps only for) a streaming mode it might also make sense to have 8bit stick around longer in the processing chain
<d1b2> <johnsel> but that's all a bit soon to be thinking of right now I think
<d1b2> <johnsel> anyway I'm off to bed for the night
<d1b2> <johnsel> ttyl
<d1b2> <azonenberg> and well, it's 8 or 16 or whatever
<d1b2> <azonenberg> we can't have that many permutations of every filter especially if we're mixing and matching them
<d1b2> <azonenberg> fp32 is a reasonable interchange format every filter knows how to handle
<d1b2> <azonenberg> we already have a combinatorial explosion of filter kernels in some cases handling e.g. two inputs that could be sparse or uniform
bvernoux has joined #scopehal
<_whitenotifier-3> [scopehal-apps] Fescron starred scopehal-apps - https://github.com/Fescron
<_whitenotifier-3> [scopehal-apps] forthy42 starred scopehal-apps - https://github.com/forthy42
bvernoux has quit [Quit: Leaving]