azonenberg changed the topic of #scopehal to: libscopehal, libscopeprotocols, and glscopeclient development and testing | https://github.com/glscopeclient/scopehal-apps | Logs: https://libera.irclog.whitequark.org/scopehal
Degi_ has joined #scopehal
Degi has quit [Ping timeout: 260 seconds]
Degi_ is now known as Degi
octorian_ is now known as octorian
<azonenberg> And the Rs are done. 51 files left
massi has joined #scopehal
<azonenberg> And finished refactoring the S's. I now get 67% of the way through a clean build
<azonenberg> new milestone, the list of files in the window menu of my editor fits on one screen without scrolling :p
<azonenberg> 35 files to go
<azonenberg> plus glscopeclient
<azonenberg> So at this rate i think i'll probably finish scopeprotocols tomorrow
<azonenberg> and then i dont know how much work it will take to do glscopeclient yet
<benishor> azonenberg: congrats, why is the refactoring so heavy?
<benishor> usually one goes in small steps when refactoring
<benishor> I can only think that's a sign of high coupling
<azonenberg> benishor: i'm changing the fundamental definition of what a waveform is
<azonenberg> as you can imagine almost everything in the project works with waveforms in one way or another
<azonenberg> This is the latest and heaviest in a series of such refactorings going back quite a few years, and hopefully the last
<azonenberg> each time it was more work because we had more filters and drivers to redo
<azonenberg> for history... our original waveform representation was a vector<FooSample>
<azonenberg> where a sample object consisted of a 64-bit start time, a 64-bit duration, and an arbitrary value type
<azonenberg> e.g. an AnalogSample aka OscilloscopeSample<float> was int64_t offset, int64_t duration, float value
<azonenberg> This was not SIMD-friendly at all, so when i started doing vector optimizations I switched it to instead be a FooWaveform consisting of a vector<int64_t> offsets, vector<int64_t> durations, vector<T> samples
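A rough illustration of the two layouts described above (a minimal sketch with simplified names; the real scopehal classes carry additional state such as timescale and trigger phase):

```cpp
#include <cstdint>
#include <vector>

// Old array-of-structs layout: one struct per sample
template<class T>
struct OscilloscopeSample
{
    int64_t m_offset;    // 64-bit start time
    int64_t m_duration;  // 64-bit duration
    T       m_sample;    // arbitrary value type (float for analog data)
};
typedef std::vector<OscilloscopeSample<float>> OldAnalogWaveform;  // hypothetical name

// SIMD-friendlier struct-of-arrays layout: three parallel vectors
template<class T>
struct FooWaveform
{
    std::vector<int64_t> m_offsets;
    std::vector<int64_t> m_durations;
    std::vector<T>       m_samples;
};
```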
<azonenberg> The next step was to add a boolean flag m_densePacked as a hint, indicating that the data was uniformly sampled at a constant rate
<azonenberg> more formally, offsets = 0, 1, 2, 3... N-1 and durations = 1, 1, 1... 1
<azonenberg> a filter receiving dense packed input is allowed to ignore the offset/duration values and assume sample index = offset and duration=1, making compile time optimizations to reduce memory accesses and math
<azonenberg> generally this saves at least two memory loads per sample and sometimes avoids additional math or memory BW depending on how the filter is architected
<azonenberg> a filter outputting dense packed waveforms, however, is still obligated to generate offset/duration values
<azonenberg> because not all filters incorporated this optimization, you still have to provide the data for those filters that aren't aware it's redundant
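A hedged sketch of how a filter can use that hint (the waveform type and member names here are simplified stand-ins, not the exact scopehal API):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified analog waveform in the struct-of-arrays layout, plus the hint flag
struct AnalogWaveform
{
    bool                 m_densePacked = false;
    std::vector<int64_t> m_offsets;
    std::vector<int64_t> m_durations;
    std::vector<float>   m_samples;
};

// Illustrative gain filter: dense packed input lets us skip two loads per sample,
// but a dense packed *output* must still carry valid offsets/durations
void ScaleWaveform(const AnalogWaveform& din, AnalogWaveform& dout, float gain)
{
    size_t len = din.m_samples.size();
    dout.m_offsets.resize(len);
    dout.m_durations.resize(len);
    dout.m_samples.resize(len);
    dout.m_densePacked = din.m_densePacked;

    for(size_t i = 0; i < len; i++)
    {
        dout.m_samples[i] = din.m_samples[i] * gain;

        if(din.m_densePacked)
        {
            // Allowed to assume offset == index and duration == 1 on the input,
            // but still obligated to write them out for downstream filters
            dout.m_offsets[i]   = static_cast<int64_t>(i);
            dout.m_durations[i] = 1;
        }
        else
        {
            dout.m_offsets[i]   = din.m_offsets[i];
            dout.m_durations[i] = din.m_durations[i];
        }
    }
}
```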
<azonenberg> The problem is, when you're dealing with large datasets, this is a massive bloat
<azonenberg> you're dragging around two 64-bit values for a single 32-bit floating point value in an analog waveform
<azonenberg> which is an overhead 4x as big as your actual data
<azonenberg> the net result is that e.g. a 1 gigapoint waveform takes up 20GB of RAM instead of 4GB
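Working out those numbers under the old layout:

```cpp
// Per analog sample: 8 (offset) + 8 (duration) + 4 (float) = 20 bytes
// 1 gigapoint * 20 bytes ~= 20 GB of RAM
// 1 gigapoint *  4 bytes ~=  4 GB if only the float samples are stored
```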
<azonenberg> And so the current refactoring is that instead of Waveform<T> being derived from WaveformBase, we now have a third level of class hierarchy
<azonenberg> WaveformBase has two subclasses, SparseWaveformBase and UniformWaveformBase
<azonenberg> the former has offset/duration values, the latter does not
<azonenberg> and then SparseWaveform<T> and UniformWaveform<T> add sample data and are the actual waveform types instantiated to store real data
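A rough sketch of that hierarchy (heavily simplified: the real classes also carry timebase metadata and use GPU-aware buffer types rather than std::vector):

```cpp
#include <cstdint>
#include <vector>

// Common base class for all waveforms
class WaveformBase
{
public:
    virtual ~WaveformBase() {}
};

// Irregularly sampled data: explicit per-sample timestamps
class SparseWaveformBase : public WaveformBase
{
public:
    std::vector<int64_t> m_offsets;
    std::vector<int64_t> m_durations;
};

// Uniformly sampled data: the sample index implies the timestamp,
// so no per-sample offset/duration storage at all
class UniformWaveformBase : public WaveformBase
{
};

// Concrete types that hold the actual sample data
template<class S>
class SparseWaveform : public SparseWaveformBase
{
public:
    std::vector<S> m_samples;
};

template<class S>
class UniformWaveform : public UniformWaveformBase
{
public:
    std::vector<S> m_samples;
};
```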
<azonenberg> Additionally, closely related, waveforms now have methods PrepareForCpuAccess(), PrepareForGpuAccess(), MarkModifiedFromCpu(), and MarkModifiedFromGpu() to manage explicit sync between separate CPU and GPU side memory buffers which are not cache coherent with each other
<azonenberg> (waveforms may be either pinned memory or separate buffers depending on various details, pinned memory is typically cache coherent but separate buffers are a non-coherent mirror)
<azonenberg> So every filter/driver has to be updated to call the appropriate methods on the input and output data
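A minimal sketch of the resulting calling convention (the stub waveform type below is a hypothetical stand-in; only the four method names come from the description above):

```cpp
// Hypothetical stand-in for the waveform sync interface
struct SyncedWaveform
{
    void PrepareForCpuAccess()  { /* copy GPU buffer -> CPU buffer if the CPU copy is stale */ }
    void PrepareForGpuAccess()  { /* copy CPU buffer -> GPU buffer if the GPU copy is stale */ }
    void MarkModifiedFromCpu()  { /* flag the GPU copy as stale */ }
    void MarkModifiedFromGpu()  { /* flag the CPU copy as stale */ }
};

// What a CPU-side filter implementation now has to do around its math
void ExampleCpuFilter(SyncedWaveform* din, SyncedWaveform* dout)
{
    din->PrepareForCpuAccess();   // input samples must be readable on the CPU
    dout->PrepareForCpuAccess();  // output buffer must be writable on the CPU

    // ... the actual filter computation on CPU-side memory goes here ...

    dout->MarkModifiedFromCpu();  // the CPU copy of the output is now the fresh one
}

// A GPU implementation would instead call PrepareForGpuAccess() on its inputs
// and MarkModifiedFromGpu() on its outputs after dispatching the compute kernel.
```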
<azonenberg> and you can no longer be lazy and assume input has offset/duration values, so you need to explicitly check which of the two possible formats each input is in
<azonenberg> and use different implementations of the algorithm
massi has quit [Remote host closed the connection]
<azonenberg> (this is often quite simple as i have e.g. a helper method for sampling a waveform of arbitrary type on the edges of a clock, implemented as a template for arbitrary sparse/uniform combinations of data and clock, producing a sparse output)
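A minimal sketch of that pattern, reusing the simplified hierarchy sketched above (the real clock-edge sampling helper does considerably more; this just shows the explicit format check plus a shared template body, with illustrative names):

```cpp
// One templated implementation works for either concrete layout,
// as long as it only touches members both types provide
template<class Wfm>
float AverageSamples(const Wfm& w)
{
    if(w.m_samples.empty())
        return 0;
    float sum = 0;
    for(float v : w.m_samples)
        sum += v;
    return sum / w.m_samples.size();
}

// The filter can no longer assume offsets/durations exist on its input:
// it has to check which of the two formats it actually received and dispatch
float AverageOfInput(WaveformBase* din)
{
    if(auto s = dynamic_cast<SparseWaveform<float>*>(din))
        return AverageSamples(*s);
    if(auto u = dynamic_cast<UniformWaveform<float>*>(din))
        return AverageSamples(*u);
    return 0;  // some other sample type; a real filter would reject this input
}
```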
<azonenberg> it's not a difficult change to make, but it has to be done everywhere and is complex enough that it's not practical to automate
<GenTooMan> as an acquaintance of mine would oft say, if it were easy it would be done already
<azonenberg> Woo finished refactoring the T's
<azonenberg> 22 files to go
* GenTooMan read that as miles for a second.
<azonenberg> lol
<GenTooMan> that's almost a marathon!
* GenTooMan ahems.
<GenTooMan> well at least you aren't crazier ... wait that doesn't sound right.
<azonenberg> oh i'm definitely crazy. I just am not in shape to run a marathon
<azonenberg> And finished the U's
<azonenberg> 14 files to go. Definitely going to finish that part of it today
<azonenberg> then the big question is glscopeclient itself
<azonenberg> i have no idea how invasive the changes will be there :p
<azonenberg> And done with the Vs. 9 left
<GenTooMan> hmm so could you tag it and call it 0.0.1 or something :D
<azonenberg> i definitely want to make an official 0.1 release at some point
<azonenberg> but several of the tickets i had planned for it are still open
<azonenberg> at some point i want to sit back and re-evaluate if we are in a stable enough state to call it 0.1
<azonenberg> now is definitely not that point. i probably will want to get the vulkan renderer rewrite done and remove the old fft libraries and opencl code
<azonenberg> just to avoid dragging around too many dependencies
<azonenberg> and then make sure that all of the various ports/builds are functional
<azonenberg> installers and packages are available and up to date
<azonenberg> etc
<GenTooMan> hmm what's up with opencl deletion?
* GenTooMan wonders if opencl has become obsolete or if it's because Nvidia is "normally a total pain in the <censored>"
* GenTooMan <-- anti Nvidia person for good reason.
<GenTooMan> never had one of their display cards work right in linux <-- example and I had plenty of them.
<azonenberg> i have been an nvidia user on linux for years and their blob driver works fine. nouveau is a piece of garbage that is only usable with ancient cards
<azonenberg> by the time they're fully RE'd they're obsolete
<azonenberg> that said, they dont have good tooling support for opencl
<azonenberg> and the reason for the migration is actually not nv's fault. it's apple
<azonenberg> despite being one of the early key players/backers of opencl, apple is deprecating it. and they also have poor opengl support (e.g. no compute shaders on any platform)
<azonenberg> also, clFFT, the de facto standard FFT library for opencl, is abandonware and hasn't had any commits since 2017
<azonenberg> and has multiple serious bugs that are impacting the project
<azonenberg> also, opencl to opengl buffer sharing is a bit awkward WRT rendering
<azonenberg> So the new plan is, we're converting all of our gpu acceleration to vulkan for the most part, ditching opengl compute as well as opencl
<azonenberg> vkFFT is actively maintained, and vulkan (unlike GL 4.3+) runs on apple platforms through MoltenVK which translates vulkan api calls to metal
<azonenberg> So the net result is that we will end up having to support one compute api instead of two, and we will no longer need to check for gpu acceleration support, as anything that can run our renderer can also run our accelerated filter blocks
<GenTooMan> well it won't be able to support my platform.
<azonenberg> long term we can potentially get rid of software fallbacks for those filters although i'd want to keep them for unit testing
<azonenberg> oh?
<azonenberg> i am not aware of anything which supports opengl 4.3 but not vulkan
<azonenberg> if you want to limit yourself to open drivers that might be a different story
<azonenberg> but using blob drivers at least, basically everything made in the last ~10 years can run vulkan
<GenTooMan> for some reason my computer refuses to use them I suppose I can "try again"
<azonenberg> Anyway, there is also the option of software fallback using llvmpipe or swiftshader
<azonenberg> which, at a significant performance cost, should run on anything whatsoever
<GenTooMan> might be a good option, as long as there is a way to verify it works :D
<azonenberg> that said... in general, glscopeclient is targeting users in industry with relatively high end scopes (which you'd need to take advantage of all the e.g. gigabit serial protocol decodes etc)
<azonenberg> the assumption is that someone with that kind of budget can afford a <10 year old PC
<azonenberg> i'll gladly merge PRs to support older/entry level hardware
<azonenberg> but it's not the focus and key architectural/project direction decisions are being made to support our primary target demographic
<azonenberg> so if i have to kill support for openbsd on a core2quad with a gtx 480 in order to get some useful capability, i will do so without a second thought
<azonenberg> And i dont have the time or engineering resources to design or maintain heavyweight fallback mechanisms
<azonenberg> if someone wants to add one to support their hardware and it doesnt break anything important, great
<azonenberg> but i won't put any effort into supporting it myself
<azonenberg> It's the unfortunate reality of having limited staff, time, and budget. I'm building glscopeclient to solve my own problems and releasing it as open source because i figure other people can benefit from that
<azonenberg> ultimately my priority is to get my work done, not to run on any computer ever made
<GenTooMan> K