<_whitenotifier-4>
[amaranth-lang/amaranth-lang.github.io] github-merge-queue[bot] f914629 - Deploying to main from @ amaranth-lang/amaranth@590cba1d6c00dffba7abb5f399680ac50e252adf 🚀
_whitelogger_ has joined #amaranth-lang
_whitelogger_ has quit [Remote host closed the connection]
_whitelogger_ has joined #amaranth-lang
__DuBPiRaTe__ has quit [Quit: Leaving]
_whitelogger_ has quit [Remote host closed the connection]
<Seb[m]>
I have a bit of a design question that I thought perhaps someone here might be interested to weigh in on. Context is I have quite a complex DSP pipeline running at audio rate, all components connected with streams.
<Seb[m]>
Most components perform multiplies. As a result, designs run out of multipliers before any other FPGA resource. However, due to the low sample rate, the ECP5 DSP tiles are quite under-utilized. So, I have been prototyping with a tagged message-ring-like arrangement, where each multiply implies a message on the ring, and if the component state machines on the same rings match, the multiplier throughput can still be close to 100%:
<Seb[m]>
It seems to work okay and doesn't use many FPGA resources, but does this design seem remotely sane? Does anyone know if there is any literature on an arrangement like this? I have not been able to find much.
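To make the tagged-ring idea above concrete, here is a minimal cycle-level sketch in plain Python. It is not Seb's actual design: the ring layout (one multiplier at slot 0, one slot per client), the `Msg` fields, and the `simulate_ring` name are all assumptions made for illustration. Each client injects a tagged request when its slot is free, the multiplier fills in the product as the message passes, and the client whose tag matches removes the result.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Msg:
    tag: int                       # which client owns this message (assumed scheme)
    a: int
    b: int
    product: Optional[int] = None  # filled in when the message passes the multiplier


def simulate_ring(requests, n_cycles):
    """Simulate a hypothetical ring: slot 0 is the shared multiplier,
    slots 1..N belong to the clients (one tag per client).

    requests maps tag -> list of (a, b) operand pairs.
    Returns (results per tag, number of cycles the multiplier was busy).
    """
    tags = sorted(requests)
    pending = {t: list(requests[t]) for t in tags}
    results = {t: [] for t in tags}
    ring = [None] * (len(tags) + 1)
    busy = 0

    for _ in range(n_cycles):
        # Every message advances one slot per cycle; the last slot wraps to slot 0.
        ring = [ring[-1]] + ring[:-1]

        # Multiplier: process any request that has not been multiplied yet.
        m = ring[0]
        if m is not None and m.product is None:
            m.product = m.a * m.b
            busy += 1

        # Clients: take back matching results, then inject into a free slot.
        for slot, t in enumerate(tags, start=1):
            m = ring[slot]
            if m is not None and m.tag == t and m.product is not None:
                results[t].append(m.product)
                ring[slot] = None
                m = None
            if m is None and pending[t]:
                a, b = pending[t].pop(0)
                ring[slot] = Msg(t, a, b)

    return results, busy
```

In this toy model the multiplier stays busy on consecutive cycles whenever the clients keep the ring full, which is the throughput property described above; real hardware would add pipeline latency and backpressure that the sketch ignores.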
<adamgreig[m]>
seems reasonable enough, though i've usually just had one clock domain and the other elements can skip processing every other (or every n) cycles, saves needing the asyncfifo and other complexity
<adamgreig[m]>
you could have the A side run on even clock cycles and B on odd and the multipliers on both or something
<adamgreig[m]>
but also: the ECP5 multipliers can actually run DDR, so you could clock the whole thing at 60M and still get two multiplies per cycle out
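The single-clock-domain alternative suggested above (side A on even cycles, side B on odd cycles, multiplier active on both) can be sketched as a small behavioural model in plain Python. The function name `shared_multiplier` and the list-of-operand-pairs interface are hypothetical, chosen only for illustration; this models the scheduling, not the ECP5 DSP tile or its DDR mode.

```python
def shared_multiplier(a_ops, b_ops):
    """Time-multiplex one multiplier between two sides in a single clock domain.

    Side A is served on even cycles and side B on odd cycles, so each side
    only has to produce operands every other cycle.
    a_ops and b_ops are lists of (x, y) operand pairs.
    """
    a_out, b_out = [], []
    n_cycles = 2 * max(len(a_ops), len(b_ops))

    for cycle in range(n_cycles):
        idx = cycle // 2                 # each side advances once per two cycles
        if cycle % 2 == 0:               # even cycle: side A owns the multiplier
            if idx < len(a_ops):
                x, y = a_ops[idx]
                a_out.append(x * y)
        else:                            # odd cycle: side B owns the multiplier
            if idx < len(b_ops):
                x, y = b_ops[idx]
                b_out.append(x * y)

    return a_out, b_out
```

Because both sides share one clock, no asynchronous FIFO is needed between them and the multiplier; the cost is that each side must be written to idle on the cycles it does not own.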
vk2seb[m] has joined #amaranth-lang
<vk2seb[m]>
oho, very interesting that the ECP5 multipliers can run DDR. Was not aware of that. That seems much cleaner!
<vk2seb[m]>
And lower latency too, although a few clocks doesn't matter so much for this application.
<adamgreig[m]>
it will probably be a lot more annoying to set up and use though 😅
<adamgreig[m]>
I'd personally just keep what you have, if it's working ok
Guest72 has quit [Quit: Client closed]
zyp[m] has joined #amaranth-lang
<zyp[m]>
IIRC nextpnr can do cross domain constraints now for simple cases like a CLKDIVF making a 60MHz clk from a 120MHz clk
<zyp[m]>
which sounds like less hassle to deal with than internal DDR signals
<adamgreig[m]>
yosys cannot infer an ecp5 ddr multiplier either so you'd be completely manually instantiating them which is not hugely fun
<Seb[m]>
Hm, I'd like to avoid manually instantiating things if possible. Thanks for the pointers.
<Seb[m]>
In any case I expect 2-rings-per-multiplier to (hopefully) not be necessary for all but the most insane audio projects :)