<_whitenotifier-7>
[scopehal-apps] 602p a9c0924 - Update glscopeclient for GetText/GetColor move
<_whitenotifier-7>
[scopehal-apps] azonenberg 0b81863 - Merge pull request #472 from 602p/protocolwaveform_refactor Refactor GetText/GetColor to be properties on protocol waveforms
<azonenberg>
(yes i know the build is broken, i have fixes en route)
<Johnsel>
I've reached a point where I think my best bet is to reverse shell from the CI MinGW to do some debugging by hand
<Johnsel>
I wonder why nobody at GitHub has thought maybe giving ssh access to the runners is a good idea
<_whitenotifier-7>
[scopehal] azonenberg b92ab6b - Added warning for invalid type
<_whitenotifier-7>
[scopehal] azonenberg c5435f4 - Added default case to avoid compiler warning (appears to be false positive, does not appear reachable)
<_whitenotifier-7>
[scopehal-apps] azonenberg b9da377 - Removed comment about multiple streams now that we are getting text from the stream rather than the channel
<Johnsel>
azonenberg: we had some discussions around m1 and I ended up renting an m1 mac mini vps to test the CI runner on m1 (which works, btw!)
<Johnsel>
I see they also have a gpu instance, for eur 1.08/hr
<Johnsel>
ubuntu only though
<azonenberg>
That is expensive enough we probably wouldn't want a 24/7 instance. Could we spin it up only when a commit happens?
<azonenberg>
(almost $800/month - we'd be better off just buying a cheap 2U of commodity PC parts at that point)
<Johnsel>
in theory, yes, in practice we'd have to orchestrate that ourselves
<azonenberg>
Yeah i feel like this is the kind of thing that could easily go wrong and result in thousands of dollars in hosting charges
<azonenberg>
at this point i think dedicated hardware is the more practical option
<azonenberg>
although as discussed earlier, i can probably arrange a single VM (either windows or linux) for local GPU hosting/testing
<Johnsel>
I agree, a single somewhat beefy server with one or two simple GPUs (depending on whether one can be shared/alternated between) would probably end up best in the long run: fast builds, easy management.
<azonenberg>
Thinking windows might be better, since i can use it as a windows dev instance too
<azonenberg>
and then possibly dedicate it to CI in the future
<Johnsel>
I'm not sure if you want to use that instance for other things, ideally we'd have something dedicated (and even more ideally scripted so it can be re-deployed in a known configuration)
Degi_ has joined #scopehal
<Johnsel>
The best case would be if we just had a virtual server that could spin up whatever OS and version we need, because not everything is Windows 11 or Ubuntu LTS
<azonenberg>
yeah i'm thinking right now, only having one GPU handy
<Johnsel>
But you'd have to see what you can dedicate
<azonenberg>
having a windows dev box i can use to not break the build when working on my own
<azonenberg>
that may or may not also be used for CI
Degi has quit [Ping timeout: 256 seconds]
<azonenberg>
is probably the best use of that spare GPU in my existing VM server
Degi_ is now known as Degi
<Johnsel>
It's a shame gpu virtualization is such a nightmare
<azonenberg>
yes agreed
<azonenberg>
pcie passthrough is about the only viable option
<Johnsel>
the only thing doing it reliably is windows 11
<azonenberg>
SR-IOV for GPUs is basically not a thing
<Johnsel>
anyway, as you said, it basically does not exist :)
<Johnsel>
and yes I spent many many hours on it
<Johnsel>
it is practically nonexistent
<azonenberg>
Yeah i have no problem buying a quadro or something if that's what it takes. my problem is that it doesnt seem possible
GenTooMan has joined #scopehal
<Johnsel>
also re: workstation/ci windows install. I think the job runner is intended for server 2022, not sure if that would be an issue for you. Or if it might work anyway
<azonenberg>
I think i probably have 10 pro on that vm but it's been a while since i fired it up
<azonenberg>
i dont think it's 7/8
<Johnsel>
I have 11 so I could test the job runner on my machine if I find the time
<Johnsel>
I may end up doing that if this reverse shell does not pan out
GenTooMan has quit [Ping timeout: 244 seconds]
GenTooMan has joined #scopehal
<Johnsel>
what's interesting is that cmake doesn't run on the osx runner, even though it is installed
<Johnsel>
that might point to env var/PATH weirdness set by the runner
<_whitenotifier-7>
[scopehal] azonenberg 72038a2 - Refactoring: Waveform now uses AcceleratorBuffer for waveform storage instead of std::vector. Removed the abomination that was EmptyConstructorWrapper.
<_whitenotifier-7>
[scopehal-apps] azonenberg 6af245a - Updated Sampling unit test for AcceleratorBuffer refactor
<azonenberg>
louis: So i just finished implementing the subset of std::vector that the code depends on
<azonenberg>
and refactored Waveform to use AcceleratorBuffer<T> instead of the mess of vectors we had before
<azonenberg>
we're only ever allocating CPU-side memory, and this *should* not break anything
<azonenberg>
but that's done and pushed
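(For illustration, a rough sketch of what a CPU-only, std::vector-style buffer subset might look like; the class and member names below are assumptions for the example, not the actual scopehal AcceleratorBuffer API:)

    // Illustrative only: a minimal std::vector-like subset backed by plain
    // CPU memory, in the spirit of the refactor described above. Not the
    // real AcceleratorBuffer; it only handles trivially copyable types.
    #include <cstddef>
    #include <cstdlib>
    #include <cstring>

    template<class T>
    class CpuOnlyBufferSketch
    {
    public:
        CpuOnlyBufferSketch() = default;
        ~CpuOnlyBufferSketch() { free(m_data); }

        // No copies in this sketch, to keep ownership trivial
        CpuOnlyBufferSketch(const CpuOnlyBufferSketch&) = delete;
        CpuOnlyBufferSketch& operator=(const CpuOnlyBufferSketch&) = delete;

        size_t size() const { return m_size; }
        bool empty() const { return m_size == 0; }
        T& operator[](size_t i) { return m_data[i]; }
        const T& operator[](size_t i) const { return m_data[i]; }

        void reserve(size_t n)
        {
            if(n <= m_capacity)
                return;
            // CPU-side allocation only, as described above
            T* p = static_cast<T*>(malloc(n * sizeof(T)));
            if(m_data)
                memcpy(p, m_data, m_size * sizeof(T));
            free(m_data);
            m_data = p;
            m_capacity = n;
        }

        void resize(size_t n) { reserve(n); m_size = n; }

        void push_back(const T& v)
        {
            if(m_size == m_capacity)
                reserve(m_capacity ? m_capacity * 2 : 16);
            m_data[m_size++] = v;
        }

        void clear() { m_size = 0; }

    private:
        T* m_data = nullptr;
        size_t m_size = 0;
        size_t m_capacity = 0;
    };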
<azonenberg>
next step is going to be rewriting the filter graph scheduler, and then i can start actually writing a simple compute filter - maybe subtraction? as a vulkan kernel
<azonenberg>
once i get that infrastructure in, migrating the rest of the opencl code should be straightforward
<azonenberg>
hopefully i can get that done over the next week or two
<azonenberg>
lain: see above
<azonenberg>
i'm maybe 1-2 days of work away from having all of the infrastructure ready for the vulkan renderer stuff to begin
<azonenberg>
so by the time you're ready to start, all of that should be clear and you should be ungated
bvernoux1 has joined #scopehal
bvernoux has quit [Ping timeout: 252 seconds]
GenTooMan has quit [Ping timeout: 252 seconds]
GenTooMan has joined #scopehal
GenTooMan has quit [Ping timeout: 244 seconds]
GenTooMan has joined #scopehal
Johnsel has joined #scopehal
<azonenberg>
So apparently not all of our supported platforms - my debian stable included - have std::barrier yet
<azonenberg>
so i have to reimplement it. oops
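(For illustration, one common way to provide a std::barrier-like arrive_and_wait() on toolchains without <barrier>, using a mutex, condition variable, and generation counter; this is a sketch of the general technique, not the actual scopehal code:)

    // Sketch of a simple reusable barrier for toolchains lacking <barrier>
    // (works with C++17). Illustrative only.
    #include <condition_variable>
    #include <cstddef>
    #include <mutex>

    class SimpleBarrier
    {
    public:
        explicit SimpleBarrier(size_t count)
            : m_threshold(count), m_count(count), m_generation(0) {}

        // Block until m_threshold threads have called arrive_and_wait()
        void arrive_and_wait()
        {
            std::unique_lock<std::mutex> lock(m_mutex);
            size_t gen = m_generation;
            if(--m_count == 0)
            {
                // Last thread to arrive: reset for reuse and wake everyone
                m_generation++;
                m_count = m_threshold;
                m_cond.notify_all();
            }
            else
            {
                // Wait for the generation to change (robust to spurious wakeups)
                m_cond.wait(lock, [&]{ return gen != m_generation; });
            }
        }

    private:
        std::mutex m_mutex;
        std::condition_variable m_cond;
        const size_t m_threshold;
        size_t m_count;
        size_t m_generation;
    };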
<Johnsel>
oops
<Johnsel>
ffts is going away right?
<azonenberg>
Yes. ffts and clFFT are both essentially dead, neither has had any commits whatsoever in the past five years
<azonenberg>
i used them because there were no good permissively licensed alternatives
<azonenberg>
we're transitioning fully to vkFFT long term
<azonenberg>
still a week or two out probably
<Johnsel>
yeah I thought so, just as well because it won't build on m1 osx
<Johnsel>
is there a place where notes can go that are useful for others but which aren't written very tidily?
<azonenberg>
i think somebody else here (was it femto?) attempted it
<azonenberg>
and found there was a small tweak
<azonenberg>
that was enough to make it compile
<azonenberg>
Did not realize this was a blocker to the M1 work though, so good to know now
<Johnsel>
it's not, if I skip it
<Johnsel>
or at least not this work
<Johnsel>
if there's no way to exclude the ffts dependency from the build then it might block lain though yes
<azonenberg>
Yeah focus on the CI stuff. lain will be doing all of the actual code porting
<azonenberg>
FFTS is only used in a handful of filters
<azonenberg>
so its probably possible to just disable them temporarily and port the rest
<Johnsel>
yeah I wasn't planning to, I just noticed that cmake ran on the latest test ci run so I thought I'd start copying in some deps and commands to see how far the build will go
<Johnsel>
but neither ./configure nor cmake builds ffts
<azonenberg>
Yeah, it's assumed to be installed as a systemwide dependency
<Johnsel>
it gets skipped though in the main cmake
<azonenberg>
oh, our cmakelists doesnt check for it?
<azonenberg>
i'd fix that if we were keeping it around any longer
<Johnsel>
it checks, but it'll keep going
<azonenberg>
but not worth the effort this late in the game
<azonenberg>
ah i see
<Johnsel>
so it's not an issue for me to just pretend it's not an error is what I mean
<azonenberg>
yeah makes sense
<azonenberg>
How much did you say the M1 cloud instance you're testing on costs per day?
<azonenberg>
thinking about cost of dedicated hardware vs renting one near/long term
<Johnsel>
but I am building a pretty useful list of install commands from my command history that would be worth putting somewhere shared
<azonenberg>
I'd just keep it locally and send it direct to lain when the time comes
<Johnsel>
the m1 instance costs .10eur/hr
<Johnsel>
fixed rate
<Johnsel>
so 2.40/day
<Johnsel>
75-ish/mo
<Johnsel>
EU hosted only though I believe
<Johnsel>
and you do not want to vnc into it
<azonenberg>
Lol
<azonenberg>
That's not bad at all. I think it's very likely we can get budget for that at least for a few months until we get some dedicated hardware spun up
<Johnsel>
terminal works fine though, it's more than fast enough for testing
<azonenberg>
Great. for CI latency is not a concern
<azonenberg>
so i dont think this will be an issue at all
<Johnsel>
yeah just depends on the total build time
<Johnsel>
I suspect it will be ok though
<azonenberg>
yeah. for local testing if you want, try just commenting out the filters i mentioned from scopeprotocols.h / scopeprotocols.cpp
<azonenberg>
and disabling in the cmakelists
<azonenberg>
it will still fail to compile due to x86-isms but you might get further
<azonenberg>
probably not worth the effort at this time though
<Johnsel>
for now the entire cmake still tries to complete so there are other dependencies that I can look at first without touching any code
<Johnsel>
which would be nice because the windows build still is giving me headaches
<Johnsel>
I may have to set up a local server 2022 instance after all
<azonenberg>
it absolutely should build on client windows, or is the github runner being the derpy one?
<Johnsel>
it's something related to the ci runner/environment
<Johnsel>
but it's practically impossible to debug because it doesn't return a proper error
<Johnsel>
nor can I ssh in or something to manually try
<Johnsel>
the only option I have is 45 min "try some random thing" cycles
<Johnsel>
or the reverse shell
<Johnsel>
ci building on windows is bad enough, ci building on mingw on windows is worse
<Johnsel>
but it's inherent to the managed CI runner
<Johnsel>
you just don't get enough access to really see its internals
<Johnsel>
I -think- it's the "setup-msys2" action that has some bad code making the errors disappear
<azonenberg>
fun
<Johnsel>
not really but in any case it's useful enough having the setup procedure for a self-hosted runner anyway
<azonenberg>
Yeah
<Johnsel>
the 45 min wait in between is the worst
<Johnsel>
just long enough to make context switching extra error-inducing
<azonenberg>
lol yeah
<azonenberg>
if it makes you any happier i'm debugging multithreaded deadlocks right now :p
<Johnsel>
I refrained from commenting on that because it brings back bad memories
<Johnsel>
actually it does make me feel better, we're sharing pain for the noble cause of bringing high speed measurements to the masses
<azonenberg>
Lol
<azonenberg>
Yeah i'm rewriting the filter graph scheduler
<azonenberg>
The old setup was block based
<azonenberg>
I'd make a group of filters that had no dependencies other than scope channels (i.e. no dependencies on other filters) and evaluate them in parallel with an openmp loop
<azonenberg>
once that finished i'd make a group that had no dependencies other than scope channels and the ones i had just finished
<azonenberg>
evaluate those
<azonenberg>
and repeat until i had none left
<azonenberg>
the problem is, often you have one thread finish one filter that could have unblocked others
<azonenberg>
but they have to wait until the whole group is done
<azonenberg>
So i'm switching to a more fine grained producer/consumer model where every time a filter completes, anything it's blocking is now eligible to run
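(For illustration, a sketch of that producer/consumer idea using per-filter dependency counts and a shared ready queue; the types and names here are hypothetical, not the actual scopehal scheduler:)

    // Illustrative sketch of fine-grained scheduling: a filter becomes
    // runnable as soon as its last input finishes, instead of waiting for a
    // whole "level" of the graph. Assumes the graph is a DAG and that
    // pendingInputs is pre-set to the number of upstream filters.
    #include <atomic>
    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    struct FilterNode
    {
        std::function<void()> run;              // the filter's evaluation step
        std::vector<FilterNode*> downstream;    // filters consuming our output
        std::atomic<int> pendingInputs{0};      // unfinished upstream filters
    };

    void EvaluateGraph(std::vector<FilterNode*>& nodes, size_t numThreads)
    {
        std::queue<FilterNode*> ready;
        std::mutex mtx;
        std::condition_variable cv;
        size_t remaining = nodes.size();

        // Seed with filters that only depend on scope channels
        for(auto* n : nodes)
            if(n->pendingInputs == 0)
                ready.push(n);

        auto worker = [&]
        {
            while(true)
            {
                FilterNode* node;
                {
                    std::unique_lock<std::mutex> lock(mtx);
                    cv.wait(lock, [&]{ return !ready.empty() || remaining == 0; });
                    if(remaining == 0)
                        return;
                    node = ready.front();
                    ready.pop();
                }

                node->run();

                // Completing this filter may immediately unblock downstream ones
                std::lock_guard<std::mutex> lock(mtx);
                remaining--;
                for(auto* d : node->downstream)
                    if(--d->pendingInputs == 0)
                        ready.push(d);
                cv.notify_all();
            }
        };

        std::vector<std::thread> pool;
        for(size_t i = 0; i < numThreads; i++)
            pool.emplace_back(worker);
        for(auto& t : pool)
            t.join();
    }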
<Johnsel>
nice, it's definitely one of the hard parts
<_whitenotifier-7>
[scopehal] azonenberg b63b79e - UpsampleFilter: avoid excessive use of push_back causing constant reallocations even when length is known in advance
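(For illustration, the general shape of that optimization, not the actual UpsampleFilter code:)

    #include <cstddef>
    #include <vector>

    // When the output length is known in advance, size the buffer once
    // instead of growing it one push_back at a time.
    std::vector<float> UpsampleSketch(const std::vector<float>& in, size_t factor)
    {
        std::vector<float> out;
        size_t outlen = in.size() * factor;

        // Slow path: repeated push_back reallocates as the vector grows
        //   for(size_t i = 0; i < outlen; i++)
        //       out.push_back(in[i / factor]);

        // Fast path: one allocation up front, then indexed writes
        out.resize(outlen);
        for(size_t i = 0; i < outlen; i++)
            out[i] = in[i / factor];   // placeholder hold, not the real interpolation
        return out;
    }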
GenTooMan has quit [Ping timeout: 244 seconds]
<azonenberg>
Woop, another refactoring done
<azonenberg>
honestly at this point i think i'm ready to start trying a Vulkan implementation of an actual filter block
GenTooMan has joined #scopehal
<azonenberg>
i think i'll do subtract since that's trivial
<azonenberg>
and i can focus on the setup and glue rather than the implementation
<_whitenotifier-7>
[scopehal] azonenberg f28e6a3 - Added PrepareForCpuAccess() helper method to Waveform class
Johnsel has joined #scopehal
Johnsel has quit [Remote host closed the connection]
Johnsel has joined #scopehal
<azonenberg>
Johnsel: so i remember we were talking earlier about maybe making a "load every file in scopehal-testdata and see what breaks" test suite
<Johnsel>
yes
<azonenberg>
I added a --quit-after-loading argument to glscopeclient to support that (as well as my frequent use case of benchmarking how long a filter graph takes to run by saving it to a file and profiling the load)
<azonenberg>
just run glscopeclient foo.scopesession --quit-after-loading
<Johnsel>
very useful indeed
<azonenberg>
it should exit with code 0 if all is well
<Johnsel>
I have set up a w2022 install in the meantime and will be working on it so I can definitely test that too
<azonenberg>
Great. I'm writing some glue logic in preparation for the first Vulkan filter implementation
<azonenberg>
which should hopefully happen later today
<Johnsel>
Cool. I haven't yet really found time to dig into the actual code of glscopeclient and the filters much with this ci distraction but I think it will be interesting to follow for sure
<azonenberg>
well, build infrastructure may be less sexy but it's just as necessary
<azonenberg>
and somebody's gotta do it
<azonenberg>
BTW, as soon as i commit this first Vulkan kernel we'll have a dependency on glslc to compile the shaders. so we might find new ways to break the build ;p
<azonenberg>
it should be installed as part of the vulkan SDK but there can always be path issues etc
<Johnsel>
Yep, and it will absolutely help speed up the development process and improve quality (once there's decent test coverage).
<azonenberg>
Yeah. I'm trying to write test cases for core features like the AcceleratorBuffer class as i develop them
<azonenberg>
but filters, drivers, and gui code - in that order - are more and more difficult to write tests for
<azonenberg>
not impossible, but require substantially more effort
<Johnsel>
once you get into a rhythm where bugs are solved and a test is written to prevent regression, I'm sure it'll build up over time
<azonenberg>
So there's two main kinds of filter
<Johnsel>
you'll never get to 100% coverage, but every bit that is automated helps
<azonenberg>
the math-y ones are straightforward to write tests for
<azonenberg>
i might actually make one for the subtract filter as part of the vulkan refactoring to make sure it's working correctly
<azonenberg>
But things like PCIe are much harder
<azonenberg>
you need a) a lot of data and b) known-good decode output
<Johnsel>
you could make small test files that you can decode and test the decode output
<azonenberg>
yeah. To start i will write tests when i have multiple implementations, say with/without AVX
<azonenberg>
or software/GPU
<Johnsel>
if it's easy to contribute for people I'm sure that there will be people that will add to them
<azonenberg>
and verify they give identical results within some epsilon
<Johnsel>
bwegh everything about those runners is made with the expectation they're deployed to azure
<azonenberg>
And in the case of the subtract function i can easily generate synthetic input and predict what the output will be
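(For illustration, a sketch of that test approach: synthetic input fed to two implementations and compared within an epsilon. The function names are hypothetical stand-ins, not the actual scopehal test code:)

    #include <cassert>
    #include <cmath>
    #include <cstdio>
    #include <random>
    #include <vector>

    // Simple CPU reference implementation of element-wise subtraction
    std::vector<float> SubtractReference(
        const std::vector<float>& a, const std::vector<float>& b)
    {
        std::vector<float> out(a.size());
        for(size_t i = 0; i < a.size(); i++)
            out[i] = a[i] - b[i];
        return out;
    }

    int main()
    {
        // Synthetic input with a known, easy-to-predict answer
        std::mt19937 rng(42);
        std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
        std::vector<float> a(1000), b(1000);
        for(size_t i = 0; i < a.size(); i++)
        {
            a[i] = dist(rng);
            b[i] = dist(rng);
        }

        auto expected = SubtractReference(a, b);
        auto actual   = SubtractReference(a, b);  // stand-in for the GPU path under test

        // Require the two implementations to agree within an epsilon
        const float epsilon = 1e-6f;
        for(size_t i = 0; i < expected.size(); i++)
            assert(std::fabs(expected[i] - actual[i]) < epsilon);

        printf("subtract: %zu samples matched within %g\n", expected.size(), epsilon);
        return 0;
    }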
<azonenberg>
yeah that makes hardware in loop hard :p
<Johnsel>
even though they use terraform's automation
<_whitenotifier-7>
[scopehal] azonenberg d533852 - Allow filters to specify where they want inputs (CPU, GPU, or don't care). Began preparing SubtractFilter for Vulkan implementation
<_whitenotifier-7>
[scopehal-apps] azonenberg 8f54ba6 - Bump min CMake version to 3.16 so that "cmake -E true" works
<azonenberg>
Progress: a test compute shader is now being compiled to SPIR-V by glslc
<azonenberg>
under cmake
<azonenberg>
the SPIR-V binary is *not* being installed by "make install" yet, that's a long ways out
<Johnsel>
I'll give it a shot in a bit
<azonenberg>
Johnsel: worry about it in a day or two
<azonenberg>
nothing uses the binary yet :p
<azonenberg>
so it doesnt break anything lol
<azonenberg>
(there is also another open ticket somewhere for some other file, i forget what it was, that was missing an install clause)
<azonenberg>
Anyway I'm gonna go take a break from coding for a bit and hopefully get the code to actually execute this shader done later in the evening
<azonenberg>
(and i still have probes to build, right...)
GenTooMan has quit [Ping timeout: 244 seconds]
GenTooMan has joined #scopehal
<Johnsel>
yeah, same, I'm going to do something else for now, I'm pretty burned out on this issue
<Johnsel>
lain: I have some notes on how I set up the m1 buildbox that you might find useful and/or want to modify once you start working on m1