azonenberg changed the topic of #scopehal to: ngscopeclient, libscopehal, and libscopeprotocols development and testing | https://github.com/ngscopeclient/scopehal-apps | Logs: https://libera.irclog.whitequark.org/scopehal
<d1b2> <azonenberg> The performance metrics tab shows the sum of all filter graph execution time
<d1b2> <azonenberg> there's currently no logging for single filter instances but it's easy to add yourself if you wanted to do it temporarily
<d1b2> <azonenberg> usually i care more about throughput for that kind of thing so i just hop in vtune and measure total elapsed time over say one minute of acquisitions
<d1b2> <johnsel> right, I'm working off of an offline acquisition
<_whitenotifier-3> [scopehal-apps] azonenberg opened issue #665: FontManager: better fallback vs asserting if invalid file path - https://github.com/ngscopeclient/scopehal-apps/issues/665
<d1b2> <azonenberg> So in that case load the session in a profiler and record total time spent in that filter Refresh() method
<d1b2> <johnsel> as we don't have a good way to produce a datastream for CDR synthetically
<d1b2> <azonenberg> or if you are ok with making local patches for testing, just call GetTime() before and after
<d1b2> <azonenberg> that's a helper method that returns fp64 seconds
<d1b2> <azonenberg> subtract and you get elapsed time
<d1b2> <azonenberg> (cross platform)
<d1b2> <azonenberg> i use it a lot for quick performance tests
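The GetTime() bracketing pattern described above can be sketched like this. GetTimeSketch() is a portable stand-in built on std::chrono, not the actual libscopehal helper; the timed lambda stands in for a filter's Refresh() call:

```cpp
#include <chrono>

// Stand-in for libscopehal's GetTime() helper: fp64 seconds from a
// monotonic clock, so subtracting two calls gives elapsed time.
static double GetTimeSketch()
{
    auto now = std::chrono::steady_clock::now().time_since_epoch();
    return std::chrono::duration<double>(now).count();
}

// Bracket an arbitrary chunk of work with two time calls, the way one
// would wrap a filter's Refresh() for a quick local measurement.
template<typename F>
double TimeIt(F&& work)
{
    double start = GetTimeSketch();
    work();
    return GetTimeSketch() - start;   // elapsed seconds
}
```

A quick local patch would print the result with LogDebug or printf and then be reverted once the numbers are collected.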
<d1b2> <johnsel> Yes I thought it might be useful to use the graph to output that data directly for export to csv
<d1b2> <johnsel> I do have some feedback on the export to CSV functionality
<d1b2> <johnsel> it seems to not deal with multiple columns properly yet
<d1b2> <azonenberg> what do you mean?
<d1b2> <johnsel> and no progress indicator is also not super great
<d1b2> <azonenberg> you specify multiple columns, it makes more inputs
<d1b2> <azonenberg> you hook up each input and you're good to go
<d1b2> <azonenberg> as far as progress goes its just a filter, it runs in a background thread... but yes there is an open ticket for giving some indication of how the filter graph is going
<d1b2> <johnsel> right well in my case that resulted in the file staying at 0 bytes
<d1b2> <azonenberg> huh
<d1b2> <azonenberg> did you actually trigger the export?
<d1b2> <johnsel> I did
<d1b2> <johnsel> I have the data separately in 2 files exported just fine
<d1b2> <azonenberg> the challenge is getting good feedback there and not going overboard on measuring time and slowing things down
<d1b2> <azonenberg> hmmph ok i might look into it later
<d1b2> <azonenberg> i just woke up :p
<d1b2> <azonenberg> (still not 100%)
<d1b2> <johnsel> In my case I used the PCIe example and tried to export the raw data stream and the clock recovery output
<d1b2> <johnsel> for the dnn clock recovery project
<d1b2> <johnsel> absolutely, which is why I'm talking to you and not just slapping something wherever
<d1b2> <johnsel> it may be possible to make --debug add time-elapsed as an output for each filter and then only run it when there is something connected to the output port
<d1b2> <johnsel> but I prefer to discuss things like that with you first
<d1b2> <azonenberg> in general i want to avoid going overboard on in-app performance monitors when profilers already exist and are designed to measure things like this
<d1b2> <azonenberg> low resolution overall stats like how much time is spent on rendering and filter graph is useful
<d1b2> <azonenberg> especially since that is the sum of cpu and gpu time
<d1b2> <azonenberg> but the goal of those stats is not to help you optimize one block
<d1b2> <johnsel> right in general I would agree but it makes more sense for my use case to have the granular data so I have something to compare against
<d1b2> <johnsel> seeing as it will become a very repetitive thing for me (and it is hard to pull that data for a full waveform set from NVIDIA Nsight)
<d1b2> <azonenberg> yeah so that sounds like somewhere to add temporary debug code and remove when you're done
<d1b2> <azonenberg> Rather than trying to find a general solution
<d1b2> <azonenberg> i throw GetTime() calls all over the place if i'm tweaking stuff
<d1b2> <azonenberg> but there's way too many potential monitor points to justify leaving the overhead of keeping them all in
<d1b2> <johnsel> I'll see what I decide is the best implementation for me, I like the idea of having it as an output on the filter so I can include it in the csv, but it doesn't have to end up in main regardless what I do
<d1b2> <johnsel> fyi, the erroring case
<d1b2> <azonenberg> i'm not seeing any reason at a glance why that shouldn't work
<d1b2> <azonenberg> you will probably have to look at CSVExportFilter in detail to debug
<d1b2> <azonenberg> Having a scalar output on the filter could work, although we dont currently support scalar channels in the csv export
<d1b2> <azonenberg> (doesnt make a ton of sense as csv is intended for vector data)
<d1b2> <johnsel> hmm actually
<d1b2> <azonenberg> Also while i have you here
<d1b2> <johnsel> I closed out ngscopeclient just now
<d1b2> <azonenberg> what's the current CI status
<d1b2> <johnsel> and now there is a file
<d1b2> <johnsel> 40MB
<d1b2> <azonenberg> oh interesting maybe it delayed the export for some reason. maybe the bug is that it didnt trigger when it should have
<d1b2> <azonenberg> not that the export was broken
<d1b2> <johnsel> so it may be either very slow or have hung on something
<d1b2> <azonenberg> yeah
<d1b2> <azonenberg> the csv export fprintf's every sample
<d1b2> <azonenberg> it's... not exactly fast
<d1b2> <johnsel> oh
<d1b2> <johnsel> yeah
<d1b2> <azonenberg> but its outputting ascii so that's not surprising
<d1b2> <johnsel> room for improvement there for sure
<d1b2> <azonenberg> float to ascii conversion is hard to make fast
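One hedged way to speed that path up, assuming C++17 with floating-point std::to_chars is available (GCC 11+ / recent MSVC): format samples into one in-memory buffer and write it out in large chunks rather than one fprintf per sample. This is a sketch, not the current CSVExportFilter code:

```cpp
#include <charconv>
#include <string>
#include <vector>

// Sketch: format a column of samples into one big string using
// std::to_chars (locale-free, shortest round-trip form), so the file
// can then be written with a single fwrite instead of a per-sample
// fprintf call.
std::string FormatSamples(const std::vector<float>& samples)
{
    std::string out;
    out.reserve(samples.size() * 16);   // rough per-sample size estimate
    char tmp[32];
    for(float f : samples)
    {
        auto res = std::to_chars(tmp, tmp + sizeof(tmp), f);
        out.append(tmp, res.ptr);
        out.push_back('\n');
    }
    return out;
}
```

std::to_chars avoids the locale machinery and format-string parsing that make printf-family float formatting slow, which is most of the win here.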
<d1b2> <johnsel> still a bit wonky
<d1b2> <azonenberg> yeah i've seen lots of timeouts and such with commits i've pushed lately
<d1b2> <johnsel> yes there was a bug
<d1b2> <johnsel> I think I have it working properly now
<d1b2> <johnsel> but I'm monitoring it
<d1b2> <azonenberg> Ok. well i have a bunch of stuff i'll hopefully be working on tonight so you should have a nice flood of commits to test with lol
<d1b2> <azonenberg> Also, semi related question: how hard would it be to set up the same CI infrastructure to run on other projects under the ngscopeclient org?
<d1b2> <azonenberg> i'm thinking the digilent, pico, etc. bridges
<d1b2> <azonenberg> it'd be nice to have automated check builds of those
<d1b2> <azonenberg> lower priority than the app core of course
<d1b2> <azonenberg> but if we can reuse 99% of the work, probably worth doing eventually
<d1b2> <azonenberg> (They dont need GPU but it might be possible to do hardware in loop tests of them eventually)
<d1b2> <azonenberg> plus of course our builder is faster than the free tier azure one
<d1b2> <johnsel> Well the current builds are done with ephemeral VMs so they have a lot of overhead time on boot
<d1b2> <johnsel> we could add another long-living builder for the bridges since we don't care about those being run from a fresh setup every time
<d1b2> <azonenberg> yeah and it probably doesnt need 64GB of ram or anything either
<d1b2> <azonenberg> the bridges are tiny, usually like 4-5 source files
<d1b2> <azonenberg> anyway, its not a priority but please put it on the back of your todo list 🙂
<d1b2> <johnsel> no, so adding (or picking) a template in XOA is step 1, then step 2 is to write a VM config template, which is copy-paste: adjust the VM template and some other basic parameters, and then applying the config should bring one up
<d1b2> <johnsel> and then we need to make the .yml files defining the actual build process
<d1b2> <azonenberg> higher up on the list: what do you think of starting to set up CPack to make unofficial .deb, rpm, etc distro packages in CI?
<d1b2> <johnsel> few hours work only, can definitely be done
<d1b2> <azonenberg> separate from any official distro integrations we may get later on
<d1b2> <azonenberg> but so people can just grab a deb and get the binary + all deps pulled in
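For reference, a minimal CPack setup along those lines might look like the following; the generator list, SHLIBDEPS flag, and contact string are illustrative assumptions, not the project's actual packaging config:

```cmake
# Illustrative CPack config for unofficial .deb/.rpm builds in CI.
set(CPACK_PACKAGE_NAME "ngscopeclient")
set(CPACK_GENERATOR "DEB;RPM")
# Let dpkg-shlibdeps discover shared-library dependencies automatically,
# so the .deb pulls in the runtime deps when installed.
set(CPACK_DEBIAN_PACKAGE_SHLIBDEPS ON)
set(CPACK_PACKAGE_CONTACT "maintainer@example.com")  # placeholder
include(CPack)

# Then, from the build directory:
#   cpack -G DEB
```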
<d1b2> <johnsel> Yes I think that makes sense
<d1b2> <johnsel> Right now we produce an artifact that you cannot run basically
<d1b2> <azonenberg> yeah lol. i want people to be able to grab and run like they can with the windows builds
<d1b2> <johnsel> it has 2 .so files and an executable in the wrong folders
<d1b2> <azonenberg> which I think can just be installed
<d1b2> <azonenberg> lol
<d1b2> <azonenberg> ok that needs to get fixed :p
<d1b2> <johnsel> yes, install or unzip
<d1b2> <johnsel> so we definitely should do something about that
<d1b2> <azonenberg> do you have time/interest in working on that?
<d1b2> <azonenberg> ideally, we could have that be our official binary releases. find a commit we're happy with and tag it
<d1b2> <azonenberg> and the CI build will become the official release binary
<d1b2> <azonenberg> then just grab it off actions and stick it as a release artifact
<d1b2> <johnsel> I mean, I'd rather not if we can find somebody who likes this stuff more than me
<d1b2> <azonenberg> Lol fair enough
<d1b2> <johnsel> I think we've had some offers over time, no?
<d1b2> <azonenberg> ok i'll add to my notes for the dev call. but somebody's gotta do it
<d1b2> <johnsel> I'd like to finish off the CI work and move to more core development for the foreseeable future, I definitely need to work on the thunderscope + litex integration, and I'd like to build some tensor core and deep neural network stuff next
<d1b2> <azonenberg> Makes sense
<d1b2> <azonenberg> yeah CI was not supposed to be a "project"
<d1b2> <azonenberg> it was supposed to be something we just do, all the time
<d1b2> <johnsel> The CI has been my focus for a very long time
<d1b2> <johnsel> what do you mean?
<d1b2> <azonenberg> i mean that i wasn't planning for it to take a year to get the local CI setup working lol
<d1b2> <azonenberg> i hoped it would just happen and work :p
<d1b2> <azonenberg> and we could use it
<d1b2> <azonenberg> clearly i underestimated the difficulty
<d1b2> <johnsel> With our requirements it is a little more intricate for sure
<d1b2> <johnsel> Anyway I can help set up some extra runners, that's no problem, but I haven't packaged a Linux app before so I would need to learn that myself, and it's pretty far into “administer, not build” territory for the CI architecture
<d1b2> <azonenberg> Yeah
<d1b2> <azonenberg> i'm poking a few folks who have touched that stuff in the past
<d1b2> <johnsel> Cool, I think we'll find somebody who can and wants to contribute that, and if we really don't we can revisit it
<d1b2> <johnsel> hey one more question Andrew, in theory you could load a dataset in a separate c++ application that pulls in scopehal and apply a filter there, right?
<d1b2> <azonenberg> Correct, that was always the goal. I am not aware of anyone currently doing it but i know Codysseus did so at some point in the past doing some USB analysis headlessly
<d1b2> <azonenberg> i want to add some tools to make this easier in the future
<d1b2> <azonenberg> e.g. being able to export a scopesession to either C++ code that instantiates filters and connects them, or some kind of reduced scopesession format that only describes the filter graph that you can load without using ngscopeclient
<d1b2> <azonenberg> (the intent being to allow prototyping in ngscopeclient then moving to a headless workflow)
<d1b2> <azonenberg> we would of course also need docs and such to help people get started bringing up the APIs
<d1b2> <johnsel> interesting, might come in useful for me as well
<d1b2> <azonenberg> it's likely bitrotted a bunch
<d1b2> <azonenberg> but yes
<d1b2> <johnsel> most likely
<d1b2> <johnsel> still a decent reference though
<d1b2> <johnsel> If I do end up with something useful like that as an example I'll PR it
<d1b2> <johnsel> The Alinx AXAU15 board ($420 ex VAT) I bought does in fact support PCIe4.0 by the way. I was able to read @ 50Gbit (6.25GByte/s)
<d1b2> <johnsel> Pretty neat huh
<d1b2> <johnsel> Still not the desired 80Gbit I want to stream the full 10Gsa datastream to the host but still
<d1b2> <johnsel> it's only x4 pcie
<d1b2> <johnsel> unfortunately the chips that have PCIE4C and support gen 4 all have only 12 transceivers, so that is disappointing
<d1b2> <johnsel> I'd kill for one with 16 transceivers so I could do x8 to the ADC and x8 to the host
<d1b2> <johnsel> how would you deal with that? more FPGAs and RAM to write to on the one end and read from on the other?
<d1b2> <johnsel> can you even have 2 FPGAs read/write to the same RAM chips?
<d1b2> <johnsel> maybe passing through 1 FPGA?
<d1b2> <azonenberg> oh cool. as far as two fpgas on one ram generally not possible
<d1b2> <azonenberg> the only modern dual port rams i know of are qdr-ii+ and qdr-iv
<d1b2> <azonenberg> and while they have separate read/write data buses the command/address bus, at least on qdr-ii+, is time shared between the ports
<d1b2> <johnsel> so then you need some custom very wide parallel interface
<d1b2> <azonenberg> As far as transceivers go
<d1b2> <azonenberg> my tentative plan for the big scope was to have five FPGAs for a four channel scope
<d1b2> <azonenberg> each channel would have one (tentatively xcau25p for the smaller scale prototype but i think i'd have to go xcku025/035 for the full scale to get enough performance) and 12 lanes of jesd204 to the adc, then some ddr4 packet buffer
<d1b2> <azonenberg> then reduced rate / bit depth data (e.g. zero crossings or decoded 8b10b symbols or something) would go over some other interface, likely parallel or channel bonded serial LVDS, to the top level FPGA which would do triggering
<d1b2> <azonenberg> then once a trigger happened the top FPGA would use the same interface to read out the ddr buffer and stream it out over 10/25/40/100GbE to the outside world using its transceivers
<d1b2> <azonenberg> ideally with sufficient memory that during the download the acquisition boards could have already re-armed and started capturing more waveforms
<d1b2> <azonenberg> For the stripped down prototype i was going to have an au25p with an RGMII PHY on it and do everything on one board with only a 1G uplink
<d1b2> <azonenberg> just as a proof of concept
<d1b2> <azonenberg> the fpga board would be just fpga and ram basically, then it'd have a serdes + power + i2c connector to the adc board which would basically just be a 6 gsps ad9213
<d1b2> <azonenberg> then that would have probably differential SMA plus a power/control interface to the frontend board
<d1b2> <azonenberg> (minimizing the chances of a frontend bug leading to the loss of a nearly $2k adc)
<d1b2> <johnsel> Yeah we're thinking very similarly then
<d1b2> <johnsel> other than the pcie vs network adapter
<d1b2> <azonenberg> yeah
<d1b2> <azonenberg> but in either case its gotta be multi fpga and the interface between them isnt going to be full BW
<d1b2> <azonenberg> because the root FPGA has to have N channels worth of interfaces on it
<d1b2> <azonenberg> plus the external bus interface
<d1b2> <azonenberg> tentative plan was for the root fpga to also have some sort of logic analyzer functionality on it
<d1b2> <azonenberg> so you'd have logic channels + SCPI + triggering on the mainboard and then could populate analog channels as budget/requirements dictate
<d1b2> <johnsel> yes you'd definitely have to give up streaming the full bw stream
<d1b2> <azonenberg> (the plan was for the full system to be a modular 1U, possibly slightly fewer boards than the proto but i still want to be able to replace the frontend without losing the very expensive adc if i blow an input)
<d1b2> <azonenberg> i mean for the full scale system we're talking 4-8 channels * 10 Gsps * 12 bits
<d1b2> <azonenberg> i dont think there is any kind of ethernet in existence that fast yet
<d1b2> <azonenberg> last time i checked the roadmap only went to 800G :p
<d1b2> <johnsel> No, we're looking at different types of budget hardware too
<d1b2> <azonenberg> Yeah. I'm expecting this scope to cost as much as a nice car
<d1b2> <johnsel> I'm still looking at just above current capabilities
<d1b2> <johnsel> For a few k
<d1b2> <azonenberg> like, something price competitive with lecroy hdos or tek msos
<d1b2> <azonenberg> give or take a bit
<d1b2> <azonenberg> or at least spec competitive
<d1b2> <johnsel> You may still be able to stream at ~10GSa
<d1b2> <azonenberg> that's where the fun is IMO 😛
<d1b2> <azonenberg> i mean i'll make it as fast as i can
<d1b2> <johnsel> Sure, I agree, but you have to start somewhere
<d1b2> <azonenberg> i dont think ngscopeclient is remotely ready to handle 120 Gbps of streaming waveforms
<d1b2> <azonenberg> thunderscope is already pushing limits right now
<d1b2> <johnsel> Yeah we'll have to change up some things for sure
<d1b2> <johnsel> but that is fine
<d1b2> <johnsel> I think getting 50Gbit/s working will be a good next step
Degi_ has joined #scopehal
<d1b2> <johnsel> I expect to have data streaming by the end of q1 24
<d1b2> <johnsel> at that rate
<d1b2> <azonenberg> one thing that i think will have to change as we move past 10 Gbps is the current model of each channel having one current waveform
<d1b2> <azonenberg> that we use for rendering and processing
<d1b2> <azonenberg> we're going to have to move to a fully pipelined model so we don't have to interlock rendering and filter graph operations
<d1b2> <azonenberg> that would be a major refactoring
<d1b2> <johnsel> yep that's one
<d1b2> <azonenberg> it may be possible to get close, or even just as good performance, by somehow integrating rendering into a FilterGraphExecutor derived class
Degi has quit [Ping timeout: 264 seconds]
Degi_ is now known as Degi
<d1b2> <azonenberg> so a render cycle is scheduled just like a filter execution
<d1b2> <azonenberg> rather than mutexing until the entire graph has finished like we do now
<d1b2> <johnsel> that would definitely be fun to test and work out
<d1b2> <johnsel> I think that can work very well indeed
<d1b2> <johnsel> And right now we do 8bit to float conversion as the first step right?
<d1b2> <johnsel> I think for (perhaps only for) a streaming mode it might also make sense to have 8bit stick around longer in the processing chain
<d1b2> <johnsel> but that's all a bit soon to be thinking of right now I think
<d1b2> <johnsel> anyway I'm off to bed for the night
<d1b2> <johnsel> ttyl
<d1b2> <azonenberg> and well, it's 8 or 16 or whatever
<d1b2> <azonenberg> we can't have that many permutations of every filter especially if we're mixing and matching them
<d1b2> <azonenberg> fp32 is a reasonable interchange format every filter knows how to handle
<d1b2> <azonenberg> we already have a combinatorial explosion of filter kernels in some cases handling e.g. two inputs that could be sparse or uniform
bvernoux has joined #scopehal
<_whitenotifier-3> [scopehal-apps] Fescron starred scopehal-apps - https://github.com/Fescron
<_whitenotifier-3> [scopehal-apps] forthy42 starred scopehal-apps - https://github.com/forthy42
bvernoux has quit [Quit: Leaving]