_florent_ changed the topic of #litex to: LiteX FPGA SoC builder and Cores / Github : https://github.com/enjoy-digital, https://github.com/litex-hub / Logs: https://libera.irclog.whitequark.org/litex
tpb has quit [Remote host closed the connection]
tpb has joined #litex
nelgau has joined #litex
Degi has quit [Ping timeout: 256 seconds]
Degi has joined #litex
eigenform has quit [Remote host closed the connection]
eigenform has joined #litex
FabM has joined #litex
FabM has quit [Changing host]
FabM has joined #litex
FabM has quit [Remote host closed the connection]
FabM has joined #litex
FabM has joined #litex
FabM has quit [Changing host]
cr1901 has quit [Read error: Connection reset by peer]
cr1901 has joined #litex
jryans has quit [Quit: You have been kicked for being idle]
FabM has quit [Ping timeout: 240 seconds]
<tnt> Well ... I implemented the 'streaming' mode in a way that works. (I'm not a jtag expert but I'm pretty sure openocd is doing DRPAUSE wrong and I had to "match their wrongness").
<tnt> It still sucks :/ 30MHz JTAGBone is 20x slower at downloading a litescope trace than a 2Mbaud UARTBone.
<tnt> Unfortunately I think there are a bunch of inefficiencies that compound:
<tnt> - AFAICT LiteScope just uses plain old one-at-a-time register reads to get the data, and each read is one command followed by the latency of waiting for the response before the data arrives. No bursting or anything to avoid that long latency cycle.
<tnt> - The whole valid/ready handshake on the jtagbone also means that it can only send one byte per drscan burst, because if the ready bit it reads back is 0, it would have no way to stop further bytes in the burst from going through should the 'ready' bit flip to 1 during the burst.
<_florent_> tnt: litex_server is able to automatically regroup read accesses into bursts with https://github.com/enjoy-digital/litex/blob/master/litex/tools/litex_server.py#L24
<_florent_> tnt: this should happen during the LiteScope upload
<tnt> _florent_: but they're not sequential reads
<tnt> I mean ... there are a few sequential reads to cover the width of the monitored bus.
<tnt> but then it's reading the same address again.
<tnt> So let's say you monitor 64 bits, it will read address 0 4 0 4 0 4 0 4 0 4 ....
<tnt> So sure, it regroups the 0 4 in a burst, but that's not much compared to all the reads to get the whole data.
<tnt> Also in litescope_cli it's calling regs.xxxx.read() so it would want the result before issuing the next one, so it wouldn't have any opportunity to merge.
<_florent_> tnt: it will indeed have more effect on large capture buses
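The effect described above can be illustrated with a small model of read merging. This is a minimal sketch, not the actual litex_server code: the function name `merge_sequential_reads`, the 4-byte stride and the `max_burst` limit are assumptions made for illustration. It groups consecutive incrementing addresses into (base, length) bursts and shows that a 64-bit capture only ever yields bursts of two words, while a wider capture bus merges better.

```python
def merge_sequential_reads(addrs, stride=4, max_burst=255):
    """Group reads of consecutive incrementing addresses into (base, length) bursts.

    Hypothetical model of a read merger; stride and max_burst are assumptions.
    """
    bursts = []
    for addr in addrs:
        if bursts:
            base, length = bursts[-1]
            # Extend the current burst only if this address directly follows it.
            if addr == base + length * stride and length < max_burst:
                bursts[-1] = (base, length + 1)
                continue
        bursts.append((addr, 1))
    return bursts


# 64-bit capture, 8 samples: the reads alternate between two 32-bit words.
narrow = merge_sequential_reads([0, 4] * 8)

# 256-bit capture, 8 samples: eight sequential words per sample.
wide = merge_sequential_reads(list(range(0, 32, 4)) * 8)

print(len(narrow), narrow[0])  # one burst of 2 words per sample
print(len(wide), wide[0])      # one burst of 8 words per sample
```

In both cases there is still one burst (and so one round trip) per sample, which is why merging alone does not help much for the whole upload.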
<tnt> should probably revert d8df6cb27d0d2611ab4e4fd41d303c111525581c too. I guess there was a reason it wasn't in the list and I just assumed it was an inadvertent omission.
<_florent_> tnt: but yes, this is a simple upload protocol and this could be optimized
<tnt> That's what I'm looking at now. It's probably the shortest path to increase performance.
<tnt> (rather than keep banging my head against JTAGBone)
<_florent_> tnt: to allow fixed bursts, we could also only expose the data on a 32-bit CSR and use a DownConverter between the mem FIFO and the CSR interface.
<_florent_> tnt: this would avoid the 0 4 0 4 0 4 etc... pattern and allow read_merger to generate a proper fixed burst
<tnt> Yes, that's the plan.
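A rough host-side picture of what a down-converted data stream would look like: each wide sample is split into 32-bit words read from one fixed address, and the host reassembles them. This is only a sketch under stated assumptions. The helper names `split_sample` and `join_words` are hypothetical, and the least-significant-word-first ordering is an assumption that would have to match the actual gateware.

```python
def split_sample(sample, data_width, word_width=32):
    """Model of the down-conversion: one wide sample -> list of narrow words.

    Assumes least-significant word first (an assumption, not confirmed).
    """
    n_words = -(-data_width // word_width)  # ceiling division
    mask = (1 << word_width) - 1
    return [(sample >> (i * word_width)) & mask for i in range(n_words)]


def join_words(words, data_width, word_width=32):
    """Host-side reassembly of a flat word stream into wide samples."""
    n_words = -(-data_width // word_width)
    samples = []
    for k in range(0, len(words), n_words):
        value = 0
        for i, word in enumerate(words[k:k + n_words]):
            value |= word << (i * word_width)
        samples.append(value & ((1 << data_width) - 1))
    return samples


samples = [0x1122334455667788, 0xDEADBEEFCAFEF00D]

# Flat stream of 32-bit words, as a fixed-address burst would deliver them.
stream = [w for s in samples for w in split_sample(s, 64)]
restored = join_words(stream, 64)
```

With this layout the host sees a single address read over and over, which is the pattern a fixed burst can cover in one command.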
<_florent_> but we should also probably avoid checking mem_valid for each data: https://github.com/enjoy-digital/litescope/blob/master/litescope/software/driver/analyzer.py#L162
<tnt> And I also have a plan to deal with the 'mem_valid' it sticks in there.
<_florent_> ok good :)
<_florent_> instead of the valid, you could report the mem.level on a CSR
<tnt> I think the valid is mostly there to deal with the CDC fifo becoming empty
<tnt> in case the 'scope' clock domain is slow vs the 'sys' one.
<tnt> Or maybe not ... because it aborts if not valid.
<tnt> Then yeah, actually just reading the level at the beginning instead of storage_length would do.
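To make the difference concrete, here is a toy comparison of the two upload strategies, counting bus round trips. Everything in it is hypothetical: `FakeBus` is a stand-in for a remote bus and not part of LiteX or LiteScope, and the `max_burst` of 255 words is an assumed limit. The first loop checks a valid flag before each word; the second reads the level once and then pulls fixed-address bursts.

```python
class FakeBus:
    """Hypothetical remote bus backed by a FIFO; counts round trips."""

    def __init__(self, words):
        self.fifo = list(words)
        self.round_trips = 0

    def read_valid(self):
        self.round_trips += 1
        return 1 if self.fifo else 0

    def read_level(self):
        self.round_trips += 1
        return len(self.fifo)

    def read_data(self, burst=1):
        # Fixed-address burst: each word read pops the next FIFO entry.
        self.round_trips += 1
        out, self.fifo = self.fifo[:burst], self.fifo[burst:]
        return out


def upload_polling(bus):
    """Check the valid flag before every single word."""
    data = []
    while bus.read_valid():
        data += bus.read_data()
    return data


def upload_level(bus, max_burst=255):
    """Read the level once, then fetch everything in fixed bursts."""
    data = []
    remaining = bus.read_level()
    while remaining:
        n = min(remaining, max_burst)
        data += bus.read_data(burst=n)
        remaining -= n
    return data


words = list(range(1000))
bus_a, bus_b = FakeBus(words), FakeBus(words)
data_a, data_b = upload_polling(bus_a), upload_level(bus_b)
print(bus_a.round_trips, bus_b.round_trips)
```

On a link where latency per command dominates, such as the JTAGBone case discussed here, the number of round trips is a reasonable first-order proxy for download time.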
bl0x has joined #litex
<_florent_> tnt: I did something very close to speed up the crossover UART, this can maybe be useful: https://github.com/enjoy-digital/litex/blob/master/litex/tools/litex_term.py#L122-L131
<_florent_> tnt: here I was monitoring the full/empty CSRs but the principle will be similar with a level CSR.
<tnt> _florent_: is there a convenient way to get the native CSR bus width ?
<tnt> (instead of just CSRStatus(32) ...)
<_florent_> The SoC has it (self.csr_data_width), but this is not directly available from the core, this could be a parameter of LiteScopeAnalyzer
<_florent_> tnt: csr_data_width of 8 is still supported, but not sure it's useful to have it optimized with LiteScope since it's mostly here for retro-compatibility and almost everyone is probably using csr_data_width=32 now
r4d10n[m] has quit [Quit: You have been kicked for being idle]
<tnt> Gotta run now, but first results are encouraging. ~ 7x speed up of the download phase and > 85% of the theoretical max bitrate of UART.
daveb has joined #litex
<_florent_> Great!
<jevinskie[m]> About JTAGbone speed… I’ve been pondering for a while about adding another bit to the protocol. If set, it would initiate a DMA from a following addr/length pair and stream it out
<_florent_> jevinskie[m]: we could think about an alternative protocol to speed up large transfers yes. (I was also thinking of doing something similar for Etherbone where we could just stream the data on a specific UDP port to/from the Host)
daveb has quit [Quit: daveb]
Znullptr has joined #litex
<tnt> jevinskie[m]: well the protocol actually already supports that, since it's the same UARTBone protocol and it supports bursting.
<tnt> But an improvement to the tunneling done in JTAGBone would be that instead of a 'ready' bit (in the device -> host direction), we use a 'not almost full' bit. So that when the FIFO is not almost full, we can safely push several chars because there is enough buffer space.
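The 'not almost full' idea can be sketched with a toy flow-control model. This is not the JTAGBone implementation: the `DeviceFifo` class, the depth of 16, the headroom of 8 bytes and the drain rate are all made-up parameters, and the model simplifies by letting the host sample the status bit at the start of each scan. The point it illustrates is that a plain 'ready' bit only guarantees room for one byte, while a 'not almost full' bit guarantees room for a known number of bytes, so the host can push that many per scan without risking overflow.

```python
from collections import deque


class DeviceFifo:
    """Hypothetical device-side receive FIFO (all sizes are assumptions)."""

    def __init__(self, depth=16, headroom=8):
        self.depth, self.headroom = depth, headroom
        self.q = deque()

    @property
    def ready(self):
        # Room for at least one byte.
        return len(self.q) < self.depth

    @property
    def not_almost_full(self):
        # Room for at least `headroom` bytes.
        return self.depth - len(self.q) >= self.headroom

    def push(self, byte):
        assert len(self.q) < self.depth, "FIFO overflow"
        self.q.append(byte)

    def drain(self, n):
        for _ in range(min(n, len(self.q))):
            self.q.popleft()


def send(payload, fifo, use_almost_full, drain_per_scan=4):
    """Return the number of scans needed to push the whole payload."""
    scans, i = 0, 0
    while i < len(payload):
        scans += 1
        if use_almost_full:
            n = fifo.headroom if fifo.not_almost_full else 0
        else:
            n = 1 if fifo.ready else 0
        for byte in payload[i:i + n]:
            fifo.push(byte)
        i += n
        fifo.drain(drain_per_scan)  # device consumes some bytes between scans
    return scans


payload = bytes(range(64))
scans_ready = send(payload, DeviceFifo(), use_almost_full=False)
scans_naf = send(payload, DeviceFifo(), use_almost_full=True)
print(scans_ready, scans_naf)
```

Because the device only ever removes bytes between the status read and the next push, the free space can only grow in the meantime, which is what makes the multi-byte push safe in this model.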
znullptr[m] has joined #litex
Znullptr has quit []
<tnt> Pushed a PR for the new proto. Should probably be tested by a few more people though :)
<znullptr[m]> used all defaults for install ` litex_sim --cpu-type=vexriscv ` : ../libc/libc.a(libc_ssp_chk_fail.c.o): in function `__chk_fail': undefined reference to `write'