azonenberg changed the topic of #scopehal to: libscopehal, libscopeprotocols, and glscopeclient development and testing | https://github.com/glscopeclient/scopehal-apps | Logs: https://libera.irclog.whitequark.org/scopehal
Degi_ has joined #scopehal
Degi has quit [Ping timeout: 260 seconds]
Degi_ is now known as Degi
octorian_ is now known as octorian
<azonenberg> And the Rs are done. 51 files left
massi has joined #scopehal
<azonenberg> And finished refactoring the S's. I now get 67% of the way through a clean build
<azonenberg> new milestone, the list of files in the window menu of my editor fits on one screen without scrolling :p
<azonenberg> 35 files to go
<azonenberg> plus glscopeclient
<azonenberg> So at this rate i think i'll probably finish scopeprotocols tomorrow
<azonenberg> and then i dont know how much work it will take to do glscopeclient yet
<benishor> azonenberg: congrats, why is the refactoring so heavy?
<benishor> usually one goes in small steps when refactoring
<benishor> I can only think that's a sign of high coupling
<azonenberg> benishor: i'm changing the fundamental definition of what a waveform is
<azonenberg> as you can imagine almost everything in the project works with waveforms in one way or another
<azonenberg> This is the latest and heaviest in a series of such refactorings going back quite a few years, and hopefully the last
<azonenberg> each time it was more work because we had more filters and drivers to redo
<azonenberg> for history... our original waveform representation was a vector<FooSample>
<azonenberg> where a sample object consisted of a 64-bit start time, a 64-bit duration, and an arbitrary value type
<azonenberg> e.g. an AnalogSample aka OscilloscopeSample<float> was int64_t offset, int64_t duration, float value
<azonenberg> This was not SIMD-friendly at all, so when i started doing vector optimizations I switched it to instead be a FooWaveform consisting of a vector<int64_t> offsets, vector<int64_t> durations, vector<T> samples
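A rough illustration of the two layouts described above (a minimal sketch with simplified names; the real scopehal classes carry additional state such as timescale and trigger phase):

```cpp
#include <cstdint>
#include <vector>

// Old array-of-structs layout: one struct per sample
template<class T>
struct OscilloscopeSample
{
    int64_t m_offset;    // 64-bit start time
    int64_t m_duration;  // 64-bit duration
    T       m_sample;    // arbitrary value type (float for analog data)
};
typedef std::vector<OscilloscopeSample<float>> OldAnalogWaveform;  // hypothetical name

// SIMD-friendlier struct-of-arrays layout: three parallel vectors
template<class T>
struct FooWaveform
{
    std::vector<int64_t> m_offsets;
    std::vector<int64_t> m_durations;
    std::vector<T>       m_samples;
};
```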
<azonenberg> The next step was to add a boolean flag m_densePacked as a hint, indicating that the data was uniformly sampled at a constant rate
<azonenberg> more formally, offsets = 0, 1, 2, 3... N-1 and durations = 1, 1, 1... 1
<azonenberg> a filter receiving dense packed input is allowed to ignore the offset/duration values and assume sample index = offset and duration=1, making compile time optimizations to reduce memory accesses and math
<azonenberg> generally this saves at least two memory loads per sample and sometimes avoids additional math or memory BW depending on how the filter is architected
<azonenberg> a filter outputting dense packed waveforms, however, is still obligated to generate offset/duration values
<azonenberg> because not all filters incorporated this optimization, you still have to provide the data for those filters that aren't aware it's redundant
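A hedged sketch of how a filter can use that hint (the waveform type and member names here are simplified stand-ins, not the exact scopehal API):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified analog waveform in the struct-of-arrays layout, plus the hint flag
struct AnalogWaveform
{
    bool                 m_densePacked = false;
    std::vector<int64_t> m_offsets;
    std::vector<int64_t> m_durations;
    std::vector<float>   m_samples;
};

// Illustrative gain filter: dense packed input lets us skip two loads per sample,
// but a dense packed *output* must still carry valid offsets/durations
void ScaleWaveform(const AnalogWaveform& din, AnalogWaveform& dout, float gain)
{
    size_t len = din.m_samples.size();
    dout.m_offsets.resize(len);
    dout.m_durations.resize(len);
    dout.m_samples.resize(len);
    dout.m_densePacked = din.m_densePacked;

    for(size_t i = 0; i < len; i++)
    {
        dout.m_samples[i] = din.m_samples[i] * gain;

        if(din.m_densePacked)
        {
            // Allowed to assume offset == index and duration == 1 on the input,
            // but still obligated to write them out for downstream filters
            dout.m_offsets[i]   = static_cast<int64_t>(i);
            dout.m_durations[i] = 1;
        }
        else
        {
            dout.m_offsets[i]   = din.m_offsets[i];
            dout.m_durations[i] = din.m_durations[i];
        }
    }
}
```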
<azonenberg> The problem is, when you're dealing with large datasets, this is a massive bloat
<azonenberg> you're dragging around two 64-bit values for a single 32-bit floating point value in an analog waveform
<azonenberg> which is an overhead 4x as big as your actual data
<azonenberg> the net result is that e.g. a 1 gigapoint waveform takes up 20GB of RAM instead of 4GB
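Working out those numbers under the old layout:

```cpp
// Per analog sample: 8 (offset) + 8 (duration) + 4 (float) = 20 bytes
// 1 gigapoint * 20 bytes ~= 20 GB of RAM
// 1 gigapoint *  4 bytes ~=  4 GB if only the float samples are stored
```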
<azonenberg> And so the current refactoring is that instead of Waveform<T> being derived from WaveformBase, we now have a third level of class hierarchy
<azonenberg> WaveformBase has two subclasses, SparseWaveformBase and UniformWaveformBase
<azonenberg> the former has offset/duration values, the latter does not
<azonenberg> and then SparseWaveform<T> and UniformWaveform<T> add sample data and are the actual waveform types instantiated to store real data
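A rough sketch of that hierarchy (heavily simplified: the real classes also carry timebase metadata and use GPU-aware buffer types rather than std::vector):

```cpp
#include <cstdint>
#include <vector>

// Common base class for all waveforms
class WaveformBase
{
public:
    virtual ~WaveformBase() {}
};

// Irregularly sampled data: explicit per-sample timestamps
class SparseWaveformBase : public WaveformBase
{
public:
    std::vector<int64_t> m_offsets;
    std::vector<int64_t> m_durations;
};

// Uniformly sampled data: the sample index implies the timestamp,
// so no per-sample offset/duration storage at all
class UniformWaveformBase : public WaveformBase
{
};

// Concrete types that hold the actual sample data
template<class S>
class SparseWaveform : public SparseWaveformBase
{
public:
    std::vector<S> m_samples;
};

template<class S>
class UniformWaveform : public UniformWaveformBase
{
public:
    std::vector<S> m_samples;
};
```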
<azonenberg> Additionally, closely related, waveforms now have methods PrepareForCpuAccess(), PrepareForGpuAccess(), MarkModifiedFromCpu(), and MarkModifiedFromGpu() to manage explicit sync between separate CPU and GPU side memory buffers which are not cache coherent with each other
<azonenberg> (waveforms may be either pinned memory or separate buffers depending on various details, pinned memory is typically cache coherent but separate buffers are a non-coherent mirror)
<azonenberg> So every filter/driver has to be updated to call the appropriate methods on the input and output data
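A minimal sketch of the resulting calling convention (the stub waveform type below is a hypothetical stand-in; only the four method names come from the description above):

```cpp
// Hypothetical stand-in for the waveform sync interface
struct SyncedWaveform
{
    void PrepareForCpuAccess()  { /* copy GPU buffer -> CPU buffer if the CPU copy is stale */ }
    void PrepareForGpuAccess()  { /* copy CPU buffer -> GPU buffer if the GPU copy is stale */ }
    void MarkModifiedFromCpu()  { /* flag the GPU copy as stale */ }
    void MarkModifiedFromGpu()  { /* flag the CPU copy as stale */ }
};

// What a CPU-side filter implementation now has to do around its math
void ExampleCpuFilter(SyncedWaveform* din, SyncedWaveform* dout)
{
    din->PrepareForCpuAccess();   // input samples must be readable on the CPU
    dout->PrepareForCpuAccess();  // output buffer must be writable on the CPU

    // ... the actual filter computation on CPU-side memory goes here ...

    dout->MarkModifiedFromCpu();  // the CPU copy of the output is now the fresh one
}

// A GPU implementation would instead call PrepareForGpuAccess() on its inputs
// and MarkModifiedFromGpu() on its outputs after dispatching the compute kernel.
```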
<azonenberg> and you can no longer be lazy and assume input has offset/duration values, so you need to explicitly check which of the two possible formats each input is in
<azonenberg> and use different implementations of the algorithm
massi has quit [Remote host closed the connection]
<azonenberg> (this is often quite simple as i have e.g. a helper method for sampling a waveform of arbitrary type on the edges of a clock, implemented as a template for arbitrary sparse/uniform combinations of data and clock, producing a sparse output)
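A minimal sketch of that pattern, reusing the simplified hierarchy sketched above (the real clock-edge sampling helper does considerably more; this just shows the explicit format check plus a shared template body, with illustrative names):

```cpp
// One templated implementation works for either concrete layout,
// as long as it only touches members both types provide
template<class Wfm>
float AverageSamples(const Wfm& w)
{
    if(w.m_samples.empty())
        return 0;
    float sum = 0;
    for(float v : w.m_samples)
        sum += v;
    return sum / w.m_samples.size();
}

// The filter can no longer assume offsets/durations exist on its input:
// it has to check which of the two formats it actually received and dispatch
float AverageOfInput(WaveformBase* din)
{
    if(auto s = dynamic_cast<SparseWaveform<float>*>(din))
        return AverageSamples(*s);
    if(auto u = dynamic_cast<UniformWaveform<float>*>(din))
        return AverageSamples(*u);
    return 0;  // some other sample type; a real filter would reject this input
}
```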
<azonenberg> it's not a difficult change to make, but it has to be done everywhere and is complex enough that it's not practical to automate
<GenTooMan> as an acquaintance of mine would oft say, if it were easy it would be done already
<azonenberg> Woo finished refactoring the T's
<azonenberg> 22 files to go
* GenTooMan read that as miles for a second.
<azonenberg> lol
<GenTooMan> that's almost a marathon!
* GenTooMan ahems.
<GenTooMan> well at least you aren't crazier ... wait that doesn't sound right.
<azonenberg> oh i'm definitely crazy. I just am not in shape to run a marathon
<azonenberg> And finished the U's
<azonenberg> 14 files to go. Definitely going to finish that part of it today
<azonenberg> then the big question is glscopeclient itself
<azonenberg> i have no idea how invasive the changes will be there :p
<azonenberg> And done with the Vs. 9 left
<GenTooMan> hmm so could you tag it and call it 0.0.1 or something :D
<azonenberg> i definitely want to make an official 0.1 release at some point
<azonenberg> but several of the tickets i had planned for it are still open
<azonenberg> at some point i want to sit back and re-evaluate if we are in a stable enough state to call it 0.1
<azonenberg> now is definitely not that point. i probably will want to get the vulkan renderer rewrite done and remove the old fft libraries and opencl code
<azonenberg> just to avoid dragging around too many dependencies
<azonenberg> and then make sure that all of the various ports/builds are functional
<azonenberg> installers and packages are available and up to date
<azonenberg> etc
<GenTooMan> hmm what's up with opencl deletion?
* GenTooMan wonders if opencl has become obsolete or if it's because Nvidia is "normally a total pain in the <censored>"
* GenTooMan <-- anti Nvidia person for good reason.
<GenTooMan> never had one of their display cards work right in linux <-- example and I had plenty of them.
<azonenberg> i have been an nvidia user on linux for years and their blob driver works fine. nouveau is a piece of garbage that is only usable with ancient cards
<azonenberg> by the time they're fully RE'd they're obsolete
<azonenberg> that said, they dont have good tooling support for opencl
<azonenberg> and the reason for the migration is actually not nv's fault. it's apple
<azonenberg> despite being one of the early key players/backers of opencl, apple is deprecating it. and they also have poor opengl support (e.g. no compute shaders on any platform)
<azonenberg> also, clFFT, the de facto standard FFT library for opencl, is abandonware and hasn't had any commits since 2017
<azonenberg> and has multiple serious bugs that are impacting the project
<azonenberg> also, opencl to opengl buffer sharing is a bit awkward WRT rendering
<azonenberg> So the new plan is, we're converting all of our gpu acceleration to vulkan for the most part, ditching opengl compute as well as opencl
<azonenberg> vkFFT is actively maintained, and vulkan (unlike GL 4.3+) runs on apple platforms through MoltenVK which translates vulkan api calls to metal
<azonenberg> So the net result is that we will end up having to support one compute api instead of two, and we will no longer need to check for gpu acceleration support, as anything that can run our renderer can also run our accelerated filter blocks
<GenTooMan> well it won't be able to support my platform.
<azonenberg> long term we can potentially get rid of software fallbacks for those filters although i'd want to keep them for unit testing
<azonenberg> oh?
<azonenberg> i am not aware of anything which supports opengl 4.3 but not vulkan
<azonenberg> if you want to limit yourself to open drivers that might be a different story
<azonenberg> but using blob drivers at least, basically everything made in the last ~10 years can run vulkan
<GenTooMan> for some reason my computer refuses to use them I suppose I can "try again"
<azonenberg> Anyway, there is also the option of software fallback using llvmpipe or swiftshader
<azonenberg> which, at a significant performance cost, should run on anything whatsoever
<GenTooMan> might be a good option, as long as there is a way to verify it works :D
<azonenberg> that said... in general, glscopeclient is targeting users in industry with relatively high end scopes (which you'd need to take advantage of all the e.g. gigabit serial protocol decodes etc)
<azonenberg> the assumption is that someone with that kind of budget can afford a <10 year old PC
<azonenberg> i'll gladly merge PRs to support older/entry level hardware
<azonenberg> but it's not the focus and key architectural/project direction decisions are being made to support our primary target demographic
<azonenberg> so if i have to kill support for openbsd on a core2quad with a gtx 480 in order to get some useful capability, i will do so without a second thought
<azonenberg> And i dont have the time or engineering resources to design or maintain heavyweight fallback mechanisms
<azonenberg> if someone wants to add one to support their hardware and it doesnt break anything important, great
<azonenberg> but i won't put any effort into supporting it myself
<azonenberg> It's the unfortunate reality of having limited staff, time, and budget. I'm building glscopeclient to solve my own problems and releasing it as open source because i figure other people can benefit from that
<azonenberg> ultimately my priority is to get my work done, not to run on any computer ever made
<GenTooMan> K