<tnt>
The litepcie loopback when operating in prog mode should be 'blocking' right ? I mean if the dma writer doesn't have any descriptors it will not read from its input fifo. And the dma reader isn't going to read/use any of the descriptors if its output fifo is full ?
<tnt>
Oh ...but when disabled, they will drop anything.
<_florent_>
tnt: the difference between prog and loop mode is only that in loop mode, descriptors read from the fifo are written back
<tnt>
_florent_: yeah, I know, but that means it doesn't need interaction from the sw to proceed since it always has descriptors.
<tnt>
Here what I was observing is that due to some issue in my code, I wasn't enabling the DMA Writer, but I was seeing the DMA reader keep going ...
<tnt>
and that surprised me because I was wondering where the data was going: since the writer wasn't being run, it wouldn't consume any data and so the reader should block.
<tnt>
But turns out that when disabled the DMA writer just discards any data at its input.
<_florent_>
tnt: Ah sorry, so while programming the descriptors, you can keep the DMA disabled
<_florent_>
so I filtered the valid/ready with the enable
<_florent_>
but this makes behavior different on this point with direct LiteX integration of the core and with the generator
<_florent_>
so we could eventually add a parameter to configure this
<tnt>
Wait, this does the opposite of what the commit says. It says "DMA Writer will not accept incoming stream when disabled." but then you do "self.comb += sink.ready.eq(1)"
<_florent_>
the commit is describing the behavior for the generator
<tnt>
Anyway, I'm not really bothered by the behavior, I just wasn't expecting it, but my logic won't generate any input to the writer with it being disabled, so not a problem.
<tnt>
What kind of performance (Gb/s or % of theoretical) should I be expecting btw ?
<_florent_>
ok, I understand it can be confusing. Maybe we should block DMA Writer by default and enable the discarding only when specified. I'll look at this.
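The two behaviors discussed above (a disabled DMA Writer that discards its input versus one that blocks it) can be sketched with a tiny valid/ready model. This is only an illustration in plain Python, not the LitePCIe code: `writer_sink_ready`, `run` and the `discard_when_disabled` switch are hypothetical names standing in for the parameter _florent_ suggests adding.

```python
def writer_sink_ready(enable, has_descriptor, discard_when_disabled):
    """Ready signal a DMA writer presents on its input stream (toy model)."""
    if not enable:
        # Discarding: always ready, so upstream data is consumed and dropped.
        # Blocking: never ready, so upstream stalls.
        return discard_when_disabled
    # When enabled, data is only accepted if there is a descriptor to use.
    return has_descriptor


def run(cycles, enable, has_descriptor, discard_when_disabled):
    """Push one word per cycle at the writer; return (consumed, still_pending)."""
    pending = list(range(cycles))  # words the upstream logic wants to send
    consumed = 0
    for _ in range(cycles):
        valid = bool(pending)
        ready = writer_sink_ready(enable, has_descriptor, discard_when_disabled)
        if valid and ready:  # a transfer happens only when both are high
            pending.pop(0)
            consumed += 1
    return consumed, len(pending)


# Disabled writer, discarding: everything upstream drains (and is lost).
print(run(8, enable=False, has_descriptor=False, discard_when_disabled=True))
# Disabled writer, blocking: nothing moves, upstream back-pressures.
print(run(8, enable=False, has_descriptor=False, discard_when_disabled=False))
```

With discarding, a loopback from the DMA Reader keeps running even though the Writer is off, which matches what tnt observed; with blocking, the Reader side would stall as he first expected.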
<_florent_>
That's generally around ~80-85% efficiency (on PCIe Gen2 / 7-Series), it should be similar for Gen3/Gen4/Ultrascale
<_florent_>
So ~3.5Gbps per Gen2 lane. (theoretical max of 4Gbps with the 8b10b encoding).
<tnt>
gen3 is not 8b/10b, you should get almost all of it so in 8x, I should get about 50G. ATM I'm at 33G (in zero copy and without data check), so probably some overhead of using prog mode vs loop mode, I'll look into improving that.
<_florent_>
yes I'm aware gen3 is not 8b/10b, I was just providing the numbers I have on gen2 :)
<tnt>
yeah, I was just providing explanation for my math :)
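The numbers in this exchange can be reproduced with a short calculation. This is a rough sketch: the line rates (5 GT/s for Gen2, 8 GT/s for Gen3) and encodings (8b/10b, 128b/130b) are the standard PCIe values, the 80-85% efficiency is the figure quoted above, and the function names are made up for the example.

```python
# (line rate in GT/s, encoding efficiency) per PCIe generation
LINK = {
    "gen2": (5.0, 8 / 10),     # 8b/10b encoding
    "gen3": (8.0, 128 / 130),  # 128b/130b encoding
}


def raw_gbps(gen, lanes):
    """Post-encoding link bandwidth in Gb/s, before any protocol overhead."""
    rate, coding = LINK[gen]
    return rate * coding * lanes


def expected_gbps(gen, lanes, efficiency):
    """Bandwidth after applying an assumed DMA/protocol efficiency (0..1)."""
    return raw_gbps(gen, lanes) * efficiency


print(f"gen2 x1 raw:      {raw_gbps('gen2', 1):.2f} Gb/s")
print(f"gen3 x8 raw:      {raw_gbps('gen3', 8):.2f} Gb/s")
print(f"gen3 x8 at 80%:   {expected_gbps('gen3', 8, 0.80):.2f} Gb/s")
print(f"gen3 x8 at 85%:   {expected_gbps('gen3', 8, 0.85):.2f} Gb/s")
print(f"33G as share of gen3 x8 raw: {33 / raw_gbps('gen3', 8):.0%}")
```

Gen3 x8 comes out at about 63 Gb/s raw, so 80-85% gives roughly 50 to 54 Gb/s, and the 33G measured in prog mode is about 52% of the raw figure.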
<_florent_>
with high PCIe bandwidth, the DRAM bandwidth on the Host can also be a limiting factor
<_florent_>
be sure to activate dual/quad channel if available
<tnt>
Yeah, I filled the right DIMM slots, but on the chipset you can't "overclock" the RAM :/ It's stuck at like 2466M or something like that.
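For a feel of how host DRAM compares with the PCIe figures, peak bandwidth can be estimated as below. This is a sketch under assumptions not stated in the chat: that "2466M" means 2466 MT/s and that each channel is 64 bits wide. These are theoretical peaks; sustained bandwidth is lower and is shared with the CPU and any copies.

```python
def dram_peak_gbps(mega_transfers_per_s, channels, bus_bits=64):
    """Theoretical peak DRAM bandwidth in Gb/s (assumed 64-bit channels)."""
    bits_per_s = mega_transfers_per_s * 1e6 * bus_bits * channels
    return bits_per_s / 1e9


for channels in (1, 2, 4):
    gbps = dram_peak_gbps(2466, channels)
    print(f"{channels} channel(s): {gbps:.0f} Gb/s ({gbps / 8:.1f} GB/s)")
```

Under those assumptions one channel peaks at about 158 Gb/s (19.7 GB/s), and each extra populated channel adds the same amount, which is why the dual/quad channel advice matters once the DMA traffic competes with everything else on the host.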
<somlo>
gatecat: I just updated my toolchain to the latest (as of last night) yosys, trellis, and nextpnr. And it looks like I can now fit a FPU-enabled rocket core on the 85k ecp5, which is awesome!
<somlo>
nextpnr FTW! :D
<gatecat>
oh, that's very good news, there were some ECP5 packing improvements that have hopefully helped
<somlo>
timing is less forgiving -- I still get nextpnr to report 25-ish MHz (I'm asking for 50). Without the FPU, it mostly boots linux and trundles along OK. But with the FPU, it now fails memtest
<somlo>
not sure how much lower than 50 I can take LiteX and still have it work (iirc, litedram really really doesn't like running at slow sysclock rates)
<somlo>
but I'm currently hammering at it with/without `nowidelut` and `abc9`, and with random nextpnr seeds, to see if I maybe get lucky with one of the runs, timing-wise :)
<somlo>
but anyway, TLDR -- I wanted to say thanks for the placement improvement, it's quite significant!
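The seed sweep somlo describes could be scripted roughly as follows. This sketch only builds and prints `nextpnr-ecp5` command lines, it does not run them; the file names (`top.json`, `top.lpf`, `top_seed<N>.config`) are placeholders, package and speed-grade options are left out, and a LiteX build would normally drive nextpnr through its own flow rather than by hand.

```python
import shlex


def nextpnr_commands(seeds, json_path="top.json", lpf_path="top.lpf", freq_mhz=50):
    """Build one nextpnr-ecp5 command line per seed (placeholder file names)."""
    cmds = []
    for seed in seeds:
        cmds.append([
            "nextpnr-ecp5", "--85k",
            "--json", json_path,                     # netlist from yosys
            "--lpf", lpf_path,                       # pin/timing constraints
            "--textcfg", f"top_seed{seed}.config",   # one output per seed
            "--freq", str(freq_mhz),                 # target clock in MHz
            "--seed", str(seed),                     # placer random seed
        ])
    return cmds


for cmd in nextpnr_commands(range(1, 4)):
    print(shlex.join(cmd))
```

Each run's reported Fmax can then be compared to pick the best result; the `nowidelut` and `abc9` variations are yosys-side options and would need separate synthesis runs.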
<swetland>
ooh I should update. been squishing a VexRiscv RV32IM w/ U/S/M and MMU and peripherals in a 25F and it's crowded in there
<tnt>
you need to disable the DDR DLL to run at slow speed.