<d1b2>
<johnsel> hey @azonenberg not super related this question but it has been quite silent here for a while and people might take an interest. You are working on a DIY switch, correct? Do you have some insight into how to start making use of the SFP+ on the KC705 I got? Any projects I should take a look at or general approach tips?
<d1b2>
<johnsel> now I should specify that I want to do 10G
<azonenberg>
johnsel: So basically, a SFP+ is just a differential pair to light converter
<azonenberg>
it has no intelligence, just some thresholding and a few simple feedback loops for sensitivity and tx power level
<d1b2>
<johnsel> yup, the question is how do I build up something inside the FPGA to talk to/over it 🙂
<azonenberg>
There's a few 3.3V GPIOs for things like enabling/disabling the transmit, detecting that a module is present, detecting faults
<azonenberg>
an optional i2c bus that contains a descriptor EEPROM and (usually, but not required by spec) some sensors
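As a rough illustration of what that descriptor EEPROM holds, here is a minimal Python sketch that decodes a few identification fields from a raw dump. It assumes the module follows the SFF-8472 layout (the A0h page, i.e. 7-bit I2C address 0x50); the function names and the demo image are made up for this example and are not part of any project mentioned in the chat.

```python
def parse_sfp_id(eeprom: bytes) -> dict:
    """Decode a few fields from the first 96 bytes of the A0h page (SFF-8472 layout assumed)."""
    if len(eeprom) < 96:
        raise ValueError("need at least 96 bytes of the A0h page")

    def text(lo: int, hi: int) -> str:
        # String fields are ASCII, padded on the right with spaces
        return eeprom[lo:hi].decode("ascii", errors="replace").rstrip()

    return {
        "identifier": eeprom[0],                # 0x03 means SFP/SFP+
        "nominal_rate_mbd": eeprom[12] * 100,   # byte 12 is in units of 100 MBd
        "vendor_name": text(20, 36),
        "part_number": text(40, 56),
        "serial_number": text(68, 84),
        # CC_BASE (byte 63) is the low byte of the sum of bytes 0..62
        "checksum_ok": (sum(eeprom[0:63]) & 0xFF) == eeprom[63],
    }


def make_demo_image() -> bytes:
    """Build a synthetic EEPROM image (hypothetical contents, for demonstration only)."""
    img = bytearray(96)
    img[0] = 0x03
    img[12] = 103                               # 10.3 GBd nominal
    img[20:36] = b"EXAMPLE VENDOR".ljust(16)
    img[40:56] = b"SFP-10G-DEMO".ljust(16)
    img[68:84] = b"SN0001".ljust(16)
    img[63] = sum(img[0:63]) & 0xFF
    return bytes(img)


if __name__ == "__main__":
    print(parse_sfp_id(make_demo_image()))
```

On real hardware the bytes would come from an I2C read of the module rather than from `make_demo_image()`.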
<azonenberg>
The actual data is 10Gbase-R coded
<azonenberg>
Which is to say, 64/66b coded ethernet frames
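To make "64/66b coded" a little more concrete: each 64-bit payload block gets a 2-bit sync header, and the payload is run through a self-synchronizing scrambler with polynomial x^58 + x^39 + 1. The Python sketch below models only the scrambler at the bit level plus the resulting line rate; it ignores sync headers, block types, and transmission bit order, and it is not taken from the MAC/PCS code discussed here.

```python
MASK58 = (1 << 58) - 1

# 66 line bits carry 64 payload bits, so 10 Gb/s of data needs 10.3125 Gbaud
LINE_RATE_BAUD = 10_000_000_000 * 66 // 64


def _taps(state: int) -> int:
    # Bit 0 of state is the most recent scrambled bit, so index 38 is the
    # bit delayed by 39 and index 57 is the bit delayed by 58.
    return ((state >> 38) & 1) ^ ((state >> 57) & 1)


def scramble(bits, state=0):
    """Scramble a list of 0/1 values; returns (scrambled_bits, new_state)."""
    out = []
    for b in bits:
        s = b ^ _taps(state)
        out.append(s)
        state = ((state << 1) | s) & MASK58   # feed back the scrambled bit
    return out, state


def descramble(bits, state=0):
    """Inverse of scramble(); the state is built from the received bits."""
    out = []
    for s in bits:
        out.append(s ^ _taps(state))
        state = ((state << 1) | s) & MASK58   # shift in the received bit
    return out, state
```

Because the descrambler state is made up entirely of received bits, it recovers on its own after 58 bits even if it starts from the wrong state, which is why the receiver needs no scrambler seed exchange.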
<azonenberg>
I have an open source MAC/PCS in my antikernel-ipcores repo that integrates nicely with a 7 series GTX
<azonenberg>
XGMACWrapper is just a shell around those two to save you the trouble of instantiating the two modules directly
<d1b2>
<johnsel> that's very useful already
<azonenberg>
What you end up with is, on the internal-facing side, a data bus consisting of 32 data bits, a 312.5 MHz clock, a valid flag, and a bytes-valid counter
<azonenberg>
plus a start flag that is asserted during the preamble (so you can reset per-packet state machines)
<azonenberg>
and then at the end of a packet either commit goes high, indicating good checksum and everything went fine
<azonenberg>
or drop goes high, indicating the packet was corrupted/malformed and should be ignored
<azonenberg>
TX is the same bus sans drop flag, once you start sending you have to finish sending it
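A behavioural Python model of how a consumer might use the RX side of that bus is sketched below. The field and class names are hypothetical, and the sketch assumes that the first byte on the wire sits in the most significant byte of the 32-bit word and that `bytes_valid` counts from that end; check the actual ipcores source before relying on either.

```python
from dataclasses import dataclass


@dataclass
class RxCycle:
    """One clock cycle of the RX bus (field names are illustrative)."""
    start: bool = False
    valid: bool = False
    data: int = 0          # 32-bit word
    bytes_valid: int = 0   # 1..4 when valid is set
    commit: bool = False
    drop: bool = False


class FrameCollector:
    """Buffers bytes per packet and keeps them only when commit is seen."""

    def __init__(self):
        self.partial = bytearray()
        self.frames = []

    def clock(self, c: RxCycle) -> None:
        if c.start:
            # Preamble seen: reset per-packet state
            self.partial.clear()
        if c.valid:
            word = c.data.to_bytes(4, "big")
            self.partial += word[:c.bytes_valid]
        if c.commit:
            # Checksum was good: hand the frame on
            self.frames.append(bytes(self.partial))
            self.partial.clear()
        elif c.drop:
            # Corrupted or malformed: discard everything buffered so far
            self.partial.clear()
```

The point of the commit/drop pair is that downstream logic can start working on a packet as it streams in, and only needs to be able to roll back if `drop` arrives at the end.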
<azonenberg>
on the other side, it expects to talk to the 7 series transceiver wizard configured for 10Gbase-R with, iirc, the asynchronous 64/66b gearbox
<azonenberg>
also note that my XGMIIBus interface is not 802.3 compliant XGMII
<azonenberg>
i swapped the lane numbering left to right, so that bytes would show up in a human readable order in logic analyzer / simulation traces
<azonenberg>
and it's also single rate 312.5 MHz vs DDR 156.25 MHz since nobody uses ddr signals inside an fpga
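The two differences from standard XGMII can be checked with a few lines of Python. In 802.3 XGMII, lane 0 is bits 7:0 and carries the first octet; the sketch assumes "swapped left to right" means a plain reversal of the four byte lanes, which is a guess at the exact mapping rather than something confirmed from the repo.

```python
def swap_lanes(word: int) -> int:
    """Reverse the four byte lanes of a 32-bit word."""
    return int.from_bytes(word.to_bytes(4, "little"), "big")


# Both clocking schemes move the same 10 Gb/s over 32 data bits
SINGLE_RATE_BPS = 32 * 312_500_000        # one transfer per clock
DDR_BPS = 32 * 2 * 156_250_000            # two transfers per clock

if __name__ == "__main__":
    # Wire order 0x11, 0x22, 0x33, 0x44 with lane 0 in bits 7:0
    standard = 0x44332211
    print(hex(swap_lanes(standard)))
    print(SINGLE_RATE_BPS == DDR_BPS)
```

After the swap, the first byte on the wire is the leftmost hex digit pair, so a packet reads in order in a waveform viewer.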
<d1b2>
<johnsel> Thanks, that's super useful already. I haven't looked very carefully, but it looked like you have some IPv4 packet related things written already, correct?
<azonenberg>
I have a full IPv4, ICMP, ARP, and UDP stack
<azonenberg>
It's intended as an embedded server, so it lacks client support for most of these protocols
<azonenberg>
e.g. it can respond to incoming pings, but not initiate an echo request
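For reference, the "server side only" ping behaviour is small: take an ICMP echo request, flip the type from 8 to 0, and recompute the checksum. The Python sketch below shows that on a bare ICMP message (no IP or Ethernet headers); it illustrates the protocol and is not a translation of the FPGA stack. `make_echo_request` exists only so the example can be exercised.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum as used by IP, ICMP, UDP and TCP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)   # fold carries back in
    return (~total) & 0xFFFF


def make_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    """Helper for the demo: build an echo request with a valid checksum."""
    msg = bytearray(b"\x08\x00\x00\x00")           # type 8, code 0, checksum 0
    msg += ident.to_bytes(2, "big") + seq.to_bytes(2, "big") + payload
    msg[2:4] = internet_checksum(bytes(msg)).to_bytes(2, "big")
    return bytes(msg)


def make_echo_reply(request: bytes):
    """Return the echo reply for a valid echo request, or None otherwise."""
    if len(request) < 8 or request[0] != 8 or request[1] != 0:
        return None                                # not an echo request
    if internet_checksum(request) != 0:
        return None                                # checksum mismatch
    reply = bytearray(request)
    reply[0] = 0                                   # type 0 = echo reply
    reply[2:4] = b"\x00\x00"
    reply[2:4] = internet_checksum(bytes(reply)).to_bytes(2, "big")
    return bytes(reply)
```

Initiating a ping would additionally need identifier/sequence tracking and a timeout, which is the client-side state an embedded server can leave out.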
<azonenberg>
it also has a TCP server that is a WIP, it works great as long as you never drop a packet from the FPGA to the client
<azonenberg>
it will correctly send ACKs and everything else so client-to-FPGA packet loss is well tolerated
<azonenberg>
but it doesn't retransmit anything sent in the opposite direction
<d1b2>
<johnsel> hmmm, do you have something you use to benchmark it?
<azonenberg>
Not currently. I'm not actually using the stack for anything serious yet
<azonenberg>
what i've actually used more seriously is the software tcp/ip stack, azonenberg/staticnet
<azonenberg>
which is basically the same level of completion
<azonenberg>
no tcp retransmits, no client support, no ipv6
<azonenberg>
the difference is, this one has a ssh server implementation attached to it
<azonenberg>
it's super bare bones and has no OS or library dependencies, in particular it explicitly does not use dynamic memory allocation
<d1b2>
<johnsel> I see, not on a microblaze or other cpu core inside a FPGA I assume right?
<azonenberg>
everything is based on fixed sized packet pools that are statically allocated
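The fixed-pool idea can be modelled in a few lines. The real thing would be statically allocated C/C++ arrays on the MCU; the Python sketch below only imitates the behaviour (all storage reserved up front, allocation can fail instead of growing), and the class and method names are invented for the example.

```python
class PacketPool:
    """Fixed number of fixed-size buffers carved out of one up-front allocation."""

    def __init__(self, count: int, size: int):
        self.size = size
        self._storage = bytearray(count * size)    # reserved once, never resized
        self._view = memoryview(self._storage)
        self._free = list(range(count))            # indices of unused slots

    def alloc(self):
        """Return (index, buffer) or None if the pool is exhausted."""
        if not self._free:
            return None                            # caller has to drop the packet
        idx = self._free.pop()
        return idx, self._view[idx * self.size:(idx + 1) * self.size]

    def free(self, idx: int) -> None:
        if idx in self._free:
            raise ValueError("double free of slot %d" % idx)
        self._free.append(idx)
```

The trade-off is that worst-case memory use is known at build time and fragmentation cannot happen, at the cost of dropping packets when every slot is busy.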
<azonenberg>
It could hypothetically run on such
<azonenberg>
but the intended use case is stm32h7
<azonenberg>
i have a driver for the stm32h7 crypto accelerator to speed up SSH already, although it doesn't have elliptic curve functionality
<azonenberg>
so i either do that in software or (in progress) integrate with an fpga curve25519 accelerator
<azonenberg>
The intent for the all-FPGA stack is to be used on the open hardware scopes, since there's no way the stm32 tcp/ip stack can get remotely close to saturating a 10G link with packet data
<azonenberg>
What i am beginning to explore is linking them
<azonenberg>
so that things like arp, icmp, etc are handled on the MCU
<azonenberg>
and low bandwidth management traffic like scpi goes to it
<azonenberg>
but high speed stuff like the waveform sample datapath is all FPGA
<azonenberg>
rather than having the waveform data and the management be considered two separate hosts with their own ip/mac i want to look into sharing state and packet data
<azonenberg>
such that certain ports/protocols are implemented in software and others in hardware
<azonenberg>
and you can trade back and forth depending on fpga area vs performance requirements
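One way to picture that split is a single classification step that steers each incoming packet to either the MCU or the FPGA datapath. The Python sketch below is purely hypothetical: the flow table, the port numbers (5025 is the conventional raw SCPI port, 50101 is a made-up waveform port) and the names are assumptions for illustration, not the design being described.

```python
from enum import Enum

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
IPPROTO_ICMP, IPPROTO_TCP, IPPROTO_UDP = 1, 6, 17


class Target(Enum):
    MCU = "mcu"
    FPGA = "fpga"


# (ip protocol, destination port) pairs handled in hardware.
# Moving an entry in or out of this set trades FPGA area for throughput.
HARDWARE_FLOWS = {
    (IPPROTO_UDP, 50101),   # hypothetical waveform data port
}


def classify(ethertype: int, ip_proto=None, dst_port=None) -> Target:
    """Decide which side handles a packet; everything unknown goes to software."""
    if ethertype == ETHERTYPE_IPV4 and (ip_proto, dst_port) in HARDWARE_FLOWS:
        return Target.FPGA
    # ARP, ICMP, SCPI and anything else low-bandwidth stays on the MCU
    return Target.MCU
```

Both sides would still present one IP and MAC address to the network; only the handler behind each port differs.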
<d1b2>
<johnsel> yeah you've told me about it before, it's an interesting idea
<azonenberg>
anyway the reasons for using the external mcu are that it has a lot of sram (so doesnt compete with fpga block ram)
<azonenberg>
it has a random number generator (so no need to use sketchy RNGs in the FPGA for crypto)
<azonenberg>
and it can clock significantly faster than a typical softcore
<d1b2>
<johnsel> You're basically doing Zynq but discrete now, haha
<azonenberg>
Yes
<azonenberg>
and with a cortex-M not an A
<d1b2>
<johnsel> Anyway I'm looking into 10G for my scope project, so if I do build something useful I'll PR it back
<azonenberg>
i like bare metal not linux
<azonenberg>
And with the FPGA and MCU being explicitly decoupled
<azonenberg>
e.g. the mcu cannot reprogram the FPGA unless you create an interface for it to do so
<azonenberg>
one of the things i liked about the stm32h735 is that one of the package options (which i have not got my hands on yet, it's out of stock everywhere i looked) is a 68 pin QFN
<azonenberg>
i could basically just have jtag, uart, quad SPI to the FPGA, and maybe a few debug LEDs
<azonenberg>
and have it be a "brain on a stick" hanging off the FPGA
<azonenberg>
xilinx's vision for zynq is an arm soc with an fpga accelerator as a peripheral
<azonenberg>
my vision is an fpga with a microcontroller as a peripheral :p
<d1b2>
<johnsel> Yeah different usecases
<d1b2>
<johnsel> I get the industry move towards linux, it gets software people into hardware more easily, but there's definitely a lot of downsides to their current approach
<azonenberg>
yeah. and the over-reliance on things like axi and linux makes it difficult to use any other way
<azonenberg>
like you basically *have* to use the ip integrator in a zynq design
<d1b2>
<johnsel> yeah that's the whole spiel, you get custom hardware in your SoC that you can drive from the fully featured Linux environment
<azonenberg>
Yeah
<azonenberg>
thats one of the things that bothers me about xilinx's future
<azonenberg>
all of their marketing docs are presenting versal as the successor to ultrascale+
<azonenberg>
they dont go out and say it, but it's strongly implied
<d1b2>
<johnsel> they wouldn't, would they?
<d1b2>
<johnsel> I think discrete FPGA will stay
<azonenberg>
i.e. i fear that au+ / ku+ may be their last family of fpgas without an arm core you are forced to use to get any work done at all
<d1b2>
<johnsel> it's just the AI craze taking hold
<azonenberg>
I think it will stay across the industry
<azonenberg>
I don't know if it will stay *from xilinx*
<azonenberg>
they seem all-in on versal and i dont like it
<d1b2>
<johnsel> that would be the stupidest thing ever
<azonenberg>
anyway, u+ isn't going away any time soon, even 7 series is going to be supported until at least like 2035 iirc
<azonenberg>
So even if there's no next-gen platform afterwards, i have a long ways to go before my projects outgrow a ku5p :p
<azonenberg>
Considering right now i'm working on a 7k160t and using a nontrivial amount of it, but nowhere near running out of space (yet)
<d1b2>
<johnsel> Yeah for sure, I'm discussing building an overpowered "Analog Discovery" with someone and he asked for Xilinx' latest series (as it would be good for marketing). I said their 7 series are still plenty fast enough for what we want to do.
<d1b2>
<johnsel> it's a tough job to fully utilize one of those chips, especially on Kintex Serdes
<d1b2>
<johnsel> and 12.8Gbit/s is plenty fast, especially if you have like 8 or 16 of them
<azonenberg>
i mean, i have the opposite problem with ethernet lol
<azonenberg>
LATENTORANGE is going to use as many serdes as i can find for switching N 10GbE lanes
<azonenberg>
and then for the open scope project, i'll need a dozen JESD204B lanes to use the AD9213
<azonenberg>
That's going to be my next big hardware project once i have the mini-switch done i think
<azonenberg>
although it will be a multi step project, i need to do more work on the frontend (might borrow ideas from the thunderscope but i have my own frontend design i wanted to play more with too)
<d1b2>
<johnsel> anyway, to recap your stack: set up the 7 series transceiver wizard configured for 10Gbase-R with the asynchronous 64/66b gearbox, set up the interface and protocols using your stack, probably tinker with the SFP+ module to actually switch it on, maybe sort out some clocking issues (KC705 has a weird clock for SFP+, not sure if you use that or pull a clock from somewhere else), and hope for some wireshark traffic
<azonenberg>
Pretty much. There is a full TCPIPStack module that integrates all of the various protocol components if you want to use that for starting out
<d1b2>
<johnsel> sound correct to you?
<azonenberg>
you just have to instantiate the serdes wizard, the mac/pcs, and the stack and bolt them together
<d1b2>
<johnsel> Cool. I'll let you know how far I get, I'm receiving the SFP+ PCIe module tomorrow and some transceivers and fiber
<azonenberg>
I also have a 1000base-X core as well BTW
<azonenberg>
which you can use with a GTX or GTP in 8b10b mode
<d1b2>
<johnsel> Might be a good one to keep in the debugging toolkit if nothing goes as it should
<azonenberg>
and then i have GMII and RGMII support of course
<azonenberg>
and experimental SGMII. The 1000base-X block should support SGMII over a GTP no problem today (although this has never been tested)
<azonenberg>
and it also should in theory work over ISERDES/OSERDES oversampling, but i had hardware problems on my last board that used it
<azonenberg>
and the switch has two SGMII PHYs that i plan to use to continue testing this
<azonenberg>
I also have QSGMII support using a GTP/GTX, which is broken out to four SGMII lanes with their own MACs. this is very lightly simulation tested but has never been tested in hardware
<azonenberg>
hopefully that will begin this weekend once i stuff the other side of this board
<d1b2>
<johnsel> cool, I've been seeing incremental progress on your mastodon
<azonenberg>
Yep. All of these projects tie into each other
<azonenberg>
the whole reason scopehal has so many networking protocol decodes is so that i can do debug and verification on the switch
<azonenberg>
and i got into high speed networking so i could better build infrastructure to run high performance data acquisition
<azonenberg>
and i got into high speed probing so i could collect waveforms to debug both of the above
<azonenberg>
lol
<d1b2>
<johnsel> recursive improvement
<d1b2>
<johnsel> same-ish story here though. I wanted to do something high-speed. But to do high-speed you need an oscilloscope, thus i'm building a high-speed oscilloscope. Now I am working on the oscilloscope I have need for faster interfaces so I am looking at 10GBase
<d1b2>
<johnsel> Although I was starting from 0, you started with some nice measurement capability already. But I like bootstrapping projects
<azonenberg>
i mean that was the inspiration for FREESAMPLE
<azonenberg>
Which i still want to build at some point