<d1b2>
<hansemro> Sorry for delays on finalizing PRs. I am nearly finished working on AVX optimizations for Siglent BIN Import. AVX2 optimization for processing digital samples is done, but could use some work/feedback.
<azonenberg>
Oh yay the IRC-discord bridge is back up
<azonenberg>
No worries, i have my hands full with other stuff
<azonenberg>
I'm in the middle of doing some probe testing and working with some artists to improve the appearance of the filter graph editor
<azonenberg>
I'd have to take a bit more time to look at that and find out. What i can say is, i'm impressed, i think you're the first contributor we've had doing vector optimization other than myself lol
<d1b2>
<hansemro> @azonenberg Looking at QueueManager, I see that it is sorting queues in ascending order of feature flag count. This often means prioritizing queues with graphics capabilities. However, the first selected queue is not a graphics queue, but a transfer queue (see VulkanInit). On AMD integrated GPUs, where there is only 1 graphics capable queue, QueueManager will end up reusing the same graphics queue for rendering and g_vkTransferQueue. If instead we reversed the sort order (to descending order of feature flag count), this reuse would not happen and we could reserve graphics queues for when they are needed for rendering.
<azonenberg>
Hmm, that makes sense. In general we should try to use the least featureful queue that meets our needs for a given application
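[Editor's sketch] The "least featureful queue that meets our needs" rule can be illustrated roughly as follows. This is not the actual QueueManager code: `QueueFamily`, `FeatureCount`, and `PickLeastFeatureful` are hypothetical names, and the flag constants only mirror the bit values of Vulkan's `VkQueueFlagBits` so the example builds without the Vulkan headers.

```cpp
#include <algorithm>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Same bit values as VkQueueFlagBits, redefined so no Vulkan headers are needed
constexpr uint32_t QUEUE_GRAPHICS = 0x1;
constexpr uint32_t QUEUE_COMPUTE  = 0x2;
constexpr uint32_t QUEUE_TRANSFER = 0x4;

// Hypothetical stand-in for one queue family reported by the driver
struct QueueFamily
{
    uint32_t index;
    uint32_t flags;
};

// Number of capability bits set in a flag mask
inline size_t FeatureCount(uint32_t flags)
{
    return std::bitset<32>(flags).count();
}

// Returns the index of the family with the fewest capability bits
// that still provides every bit in `required`, or nullopt if none does.
inline std::optional<uint32_t> PickLeastFeatureful(
    std::vector<QueueFamily> families, uint32_t required)
{
    // Ascending order of feature count: least capable families come first
    std::stable_sort(families.begin(), families.end(),
        [](const QueueFamily& a, const QueueFamily& b)
        { return FeatureCount(a.flags) < FeatureCount(b.flags); });

    for(const auto& f : families)
    {
        if((f.flags & required) == required)
            return f.index;
    }
    return std::nullopt;
}
```

With an ascending sort, a transfer-only request lands on a transfer-only family when one exists, which leaves the graphics-capable family free for rendering.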
<azonenberg>
lain: ^^
<azonenberg>
Do some testing and send a PR
<azonenberg>
in general we have done comparatively little testing on AMD cards since most of the devs have nvidia or apple silicon platforms
<azonenberg>
Improved APU / unified memory card support is still pending
<azonenberg>
in particular, AcceleratorBuffer does not currently understand that memory can be both host local and device local at the same time in a unified memory SoC
<azonenberg>
so it will allocate two copies of each memory block and create needless copies
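[Editor's sketch] One way a buffer class could decide whether a second copy is needed at all is to look for a memory type that is both device local and host visible. This is an illustration under assumptions, not how AcceleratorBuffer works: `Placement`, `IsUnified`, and `ChoosePlacement` are hypothetical names, and the constants only mirror the bit values of Vulkan's `VkMemoryPropertyFlagBits`.

```cpp
#include <cstdint>
#include <vector>

// Same bit values as VkMemoryPropertyFlagBits, redefined so no Vulkan headers are needed
constexpr uint32_t MEM_DEVICE_LOCAL = 0x1;
constexpr uint32_t MEM_HOST_VISIBLE = 0x2;

// Hypothetical allocation strategies
enum class Placement
{
    Unified,    // one allocation shared by CPU and GPU
    Mirrored    // separate host and device copies, kept in sync by explicit transfers
};

// True if a memory type is both device local and host visible
inline bool IsUnified(uint32_t propertyFlags)
{
    constexpr uint32_t both = MEM_DEVICE_LOCAL | MEM_HOST_VISIBLE;
    return (propertyFlags & both) == both;
}

// Picks a single shared allocation if any memory type allows it
inline Placement ChoosePlacement(const std::vector<uint32_t>& memoryTypeFlags)
{
    for(uint32_t flags : memoryTypeFlags)
    {
        if(IsUnified(flags))
            return Placement::Unified;
    }
    return Placement::Mirrored;
}
```

A real check would likely also need to look at heap sizes, since some discrete cards expose a small window of memory that is both device local and host visible.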
<azonenberg>
So if you wanted to spend some time on it, it certainly wouldn't hurt
<azonenberg>
Being a performance issue rather than "totally broken" and not affecting a platform anyone had easy access to for testing, it was lower on the priority list
<d1b2>
<hansemro> Yes, this is pretty low priority. Not experiencing severe performance issues, but wanted to raise some awareness.
<azonenberg>
File a ticket if nothing else
<azonenberg>
Also, the CI stuff has been on hold for quite some time, i have a pair of GPUs sitting on the floor next to my rack that the VM server is in
<azonenberg>
where they've been since like april
<azonenberg>
i hope to have time to get back to that soon. things have been hectic and rebooting the vm server is a big annoyance but i'm about due for some hypervisor patches and distro updates on the VMs so a lot of stuff is going to get rebooted soon anyway
<d1b2>
<hansemro> Unrelated: I am interested in picking up the jtaghal project. What are your bsdl parsing needs? I don't have too much experience in writing a lexer and parser, but I want to do some bsdl-IC validation tooling.
<azonenberg>
Jtaghal had really been focused on in circuit debug and test of FPGA stuff, with a bit of ARM on the side
<d1b2>
<hansemro> I see
<azonenberg>
I don't think i've ever actually done boundary scan with INTEST/EXTEST
<azonenberg>
So i never put any effort into it
<azonenberg>
I was mostly using it for things like debug of FPGA based stuff using the xilinx USERx instructions, and researching low level ARM debug stuff to study code protection and security mechanisms for work
<azonenberg>
While the project isn't dead, i'm not actively working on it because it does what I need it to at the moment
<azonenberg>
And it never got anywhere near the level of community or adoption that scopehal did
<d1b2>
<hansemro> gotcha
<azonenberg>
That said, one of my mid term TODO items is reverse engineering the xilinx ILA and VIO IP core JTAG protocols
<azonenberg>
and writing a jtaghal + scopehal based driver so that I can a) interface directly with ILA and VIO blocks without having to use vivado for debug and b) use ngscopeclient to read ILA data
<azonenberg>
ultimately i want to be able to do complex cross-trigger setups with an external scope/LA plus an ILA (or several) and do trigger cascade, compare the on-chip view of a signal to the off-chip view and identify the electrical causes of bit errors, etc
<azonenberg>
And poke bits in a VIO from ngscopeclient gui
<azonenberg>
while viewing analog waveforms
<azonenberg>
When fine tuning an FPGA transceiver, i frequently will poke emphasis taps and drive strength in a VIO then take eye measurements with a scope
<azonenberg>
So having that all under one roof would be handy
<azonenberg>
I also want to have a well defined way to access the built in eye scan feature on xilinx FPGAs via scopehal, so i can get a post-equalization BER eye
<azonenberg>
hansemro: anyway, if you want to use jtaghal for your work i won't object to you figuring out a way to bolt in a BSDL parser, and will happily take a PR for it. but i don't consider it a priority at all
<d1b2>
<hansemro> I guess this is also something unsupported by open source Xilinx 7 series FPGA flows? Do you know any projects that use BSCAN blocks directly?
<azonenberg>
I do not know the state of the open flows, i've always used vivado
<azonenberg>
For my thesis, I made heavy use of BSCANs for debug. in fact that was what i originally started jtaghal for since xilinx didn't have any API in ISE/vivado for doing this
<azonenberg>
(this was in part because ChipScope was a paid feature for ISE at the time and vivado came out right before I graduated)
<azonenberg>
i had my own logic analyzer core
<azonenberg>
i also had a layer 2 tunneling mechanism allowing me to push raw frames from my custom NoC over JTAG and into the interconnect fabric on the FPGA
<azonenberg>
on the PC side it was exposed as a TCP socket server
<azonenberg>
you could connect to the server and get a connection object which directly mapped to a virtual bus endpoint on the FPGA
<azonenberg>
and send and receive messages via JTAG as if you were a soft IP on the FPGA
<azonenberg>
and exercise actual gateware at full hardware speed from C++ test cases
<azonenberg>
It was actually barely even JTAG I was using at that point
<azonenberg>
I loaded USER1 into IR, switched to SHIFT-DR state
<azonenberg>
then free-ran TCK while pushing framed data or padding into TDI and getting framed data out TDO
<azonenberg>
so i'd just send zeroes if i had nothing to say, then when i wanted to send a frame I'd send a 55 55 55 D5 preamble followed by the bus transaction lol
<azonenberg>
and then a CRC at the end
<azonenberg>
it was basically slightly abbreviated ethernet framing tunneled over barely-jtag
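[Editor's sketch] The framing described above (idle zeroes, a 55 55 55 D5 preamble, the bus transaction, then a CRC) might look roughly like this at the byte level. Several details are assumptions not stated in the chat: the CRC is taken to be the 32-bit Ethernet CRC, it is appended least significant byte first, the payload length is fixed and known to the receiver, and the stream is treated as byte aligned even though the real link is a serial bitstream on TDI/TDO. `BuildFrame` and `ExtractFrame` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Preamble bytes quoted in the chat
constexpr std::array<uint8_t, 4> PREAMBLE = {0x55, 0x55, 0x55, 0xD5};

// Assumption: CRC-32 as used by Ethernet (reflected, polynomial 0xEDB88320)
inline uint32_t Crc32(const std::vector<uint8_t>& data)
{
    uint32_t crc = 0xFFFFFFFFu;
    for(uint8_t b : data)
    {
        crc ^= b;
        for(int i = 0; i < 8; i++)
            crc = (crc & 1) ? ((crc >> 1) ^ 0xEDB88320u) : (crc >> 1);
    }
    return ~crc;
}

// Preamble, then payload, then CRC over the payload (LSB first, an assumption)
inline std::vector<uint8_t> BuildFrame(const std::vector<uint8_t>& payload)
{
    std::vector<uint8_t> frame(PREAMBLE.begin(), PREAMBLE.end());
    frame.insert(frame.end(), payload.begin(), payload.end());
    const uint32_t crc = Crc32(payload);
    for(int i = 0; i < 4; i++)
        frame.push_back(static_cast<uint8_t>(crc >> (8 * i)));
    return frame;
}

// Scans a byte stream (idle padding plus frames) for the first frame with a valid CRC.
// Assumption: payloadLen is fixed and known to the receiver.
inline std::optional<std::vector<uint8_t>> ExtractFrame(
    const std::vector<uint8_t>& stream, size_t payloadLen)
{
    const size_t frameLen = PREAMBLE.size() + payloadLen + 4;
    for(size_t i = 0; i + frameLen <= stream.size(); i++)
    {
        if(!std::equal(PREAMBLE.begin(), PREAMBLE.end(), stream.begin() + i))
            continue;

        const size_t start = i + PREAMBLE.size();
        std::vector<uint8_t> payload(
            stream.begin() + start, stream.begin() + start + payloadLen);

        // Reassemble the received CRC, LSB first
        uint32_t crc = 0;
        for(size_t k = 0; k < 4; k++)
            crc |= static_cast<uint32_t>(stream[start + payloadLen + k]) << (8 * k);

        if(crc == Crc32(payload))
            return payload;
    }
    return std::nullopt;
}
```

Because the idle pattern is all zeroes, the receiver can simply slide along the stream until the preamble appears, and the CRC rejects frames damaged in transit.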
<d1b2>
<hansemro> I am really amazed by your work. As a recent college grad, I have lots to learn and experience.
<d1b2>
<hansemro> I agree with your sentiment that a device that cannot be tested/debugged is useless. I am trying to force myself to do more pre-design and post-design verification work since no one I know seems to enjoy it.
<azonenberg>
yeah, meanwhile i like building tools
<azonenberg>
almost more than i like building things with said tools lol
<d1b2>
<hansemro> Correction to my earlier statement about the QueueManager: it looks like the intended sort is ascending (which is what we want), but the sort is actually descending. So this seems like a bug.