<azonenberg>
wow idk what i did to this sample lol
<azonenberg>
its got some really caked on gunk that i dont think will come off
<azonenberg>
I was hoping to just quickly get a nice pretty 3s50a image for you but i'd have to prep a new sample to do that
<Wanda[cis]>
... bleh ultrascale
<Wanda[cis]>
I'm currently trying to get the geometry extractor for that thing working again after a refactor and it's a major pain
<azonenberg>
OG or U+?
<Wanda[cis]>
both
<azonenberg>
OG ultrascale seems to be another forgotten sibling
<Wanda[cis]>
they aren't different enough to warrant separate codepaths
<azonenberg>
like xilinx's extended lifespan guarantee recently covers 7 series and U+
<azonenberg>
and notably *not* vanilla U
<Wanda[cis]>
yeah
<azonenberg>
i hope they dont kill off U because the ku035 seems to be the best way to get a ton of fast IO without using a $$$$ part right now
<azonenberg>
the smallest ku+ with a competitive quantity of IO is the ku11p which is ludicrously expensive
<Wanda[cis]>
oh hey the definitely-not-a-zynq
<azonenberg>
aren't all the u+'s zu+'s with fused off cpus and different bondouts?
<Wanda[cis]>
not all
<azonenberg>
or are some of the smaller ones actually separate dies with no cpu?
<Wanda[cis]>
virteces are actually non-zynqs!
<Wanda[cis]>
also ku3p and au*
<azonenberg>
sorry i should have said all of the ku+s
<azonenberg>
wait what
<azonenberg>
i thought au20, au25, ku3, ku5 were all the same die, and it was also a zu+
<Wanda[cis]>
nope
<Wanda[cis]>
there are three die
<azonenberg>
i knew au7/10/15 were not the same die
<Wanda[cis]>
au7p, au15p, ku5p
<azonenberg>
ah ok that makes sense
<Wanda[cis]>
none of them are zynq
<azonenberg>
so 10 is fused down au15
<azonenberg>
and 20/25/3 are fused down ku5 as i thought
<Wanda[cis]>
yes
<azonenberg>
but i could have sworn that die was a zynq
<Wanda[cis]>
nope
<azonenberg>
TIL
<Wanda[cis]>
there's also apparently going to be su*p
<Wanda[cis]>
which are... I have NFI what's going on with them, but they're definitely all-new die
<azonenberg>
(i dont actually know zu+ at all, never used it... i have some ku3p boards and some bare ku5p chips that i haven't made boards for)
<azonenberg>
so yeah
<Wanda[cis]>
and almost certainly not zynq
<azonenberg>
what's interesting is that the su+ have the hard memory controller capability and much faster IOs
<azonenberg>
they seem to have some versal influence
<Wanda[cis]>
the XP5IO comes from versal, yeah
<Wanda[cis]>
from versal2, even
<azonenberg>
so are su+ 7nm then?
<Wanda[cis]>
don't think so
<azonenberg>
i thought su+ were 16ff like other u+
<azonenberg>
or did they backport xp5io from versal somehow
<Wanda[cis]>
I thiiiink they somehow backported it
<azonenberg>
it also has a versal-esque platform management controller apparently
<azonenberg>
which i know nothing about
<Wanda[cis]>
it'd make no sense to make 7nm US+ parts IMO
<Wanda[cis]>
oh yeah that's also a mystery
<azonenberg>
yeah i know, they'd be all pad ring lol
<Wanda[cis]>
well more importantly
<azonenberg>
and $$$$
<Wanda[cis]>
you'd have to somehow die-shrink the whole US+ fabric tiles
<Wanda[cis]>
which .... yeah, no
<Wanda[cis]>
that's equivalent to making a new FPGA family (which exists and is called versal)
<azonenberg>
how similar is versal's fabric to u+? skimming stuff it looked quite substantially changed even ignoring all of the NPUs and hard AXI
<Wanda[cis]>
it's not
<Wanda[cis]>
it's definitely a recognizable successor
<azonenberg>
i mean yes
<Wanda[cis]>
but that's about it
<azonenberg>
yeah it seemed a lot different the one time i used it
<Wanda[cis]>
for one, the fabric... barely stands on its own
<azonenberg>
but i had literally one day
<azonenberg>
yeah lol
<Wanda[cis]>
the clock routing is not part of the fabric
<azonenberg>
wait what
<Wanda[cis]>
well
<azonenberg>
is it like attached to the hard axi or something
<Wanda[cis]>
you know how versal doesn't really have the concept of a bitstream anymore?
<azonenberg>
i know it has a PDI file instead of a .bit
<azonenberg>
but i havent dissected them
<azonenberg>
no idea how different the structure is
<Wanda[cis]>
mhm
<azonenberg>
i know U+ down to like spartan3 is all the same basic container format
<Wanda[cis]>
so these files are actually more-or-less a list of stuff you have to write to MMIO registers
<azonenberg>
with some small changes like s6 using a 16-bit datapath instead of 32
<Wanda[cis]>
down to virtex actually, but yes
<azonenberg>
(i havent used anything older than s3)
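[Editor's sketch] The shared container format discussed above can be illustrated by locating the sync word and splitting what follows into configuration words. This is only a rough sketch: the sync value 0xAA995566 is taken from Xilinx configuration guides as best recalled and should be treated as an assumption, the helper names are hypothetical, and no packet decoding is attempted. The `word_bytes` parameter models the 32-bit versus 16-bit (Spartan-6) datapath difference.

```python
# Assumed sync word; on a 16-bit datapath it would appear as 0xAA99, 0x5566.
SYNC_WORD = bytes.fromhex("aa995566")


def find_sync(data: bytes) -> int:
    """Return the byte offset of the first sync word, or -1 if absent."""
    return data.find(SYNC_WORD)


def words_after_sync(data: bytes, word_bytes: int) -> list[int]:
    """Split everything after the sync word into big-endian words.

    word_bytes is 4 for a 32-bit datapath and 2 for a 16-bit one.
    Trailing bytes that do not fill a whole word are ignored.
    """
    offset = find_sync(data)
    if offset < 0:
        raise ValueError("no sync word found")
    body = data[offset + len(SYNC_WORD):]
    usable = len(body) - len(body) % word_bytes
    return [
        int.from_bytes(body[i:i + word_bytes], "big")
        for i in range(0, usable, word_bytes)
    ]


# Made-up test data: padding, sync word, then four arbitrary bytes.
example = b"\xff" * 8 + SYNC_WORD + bytes.fromhex("20000000")
print([hex(w) for w in words_after_sync(example, 4)])
print([hex(w) for w in words_after_sync(example, 2)])
```

The same byte stream reads as one word or two depending on the assumed datapath width, which is the only family difference this sketch models.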
<Wanda[cis]>
anyway
<Wanda[cis]>
versal... still has fabric, and configuration frames not unlike ultrascale
<Wanda[cis]>
but the old configuration logic and bitstream framing is gone
<Wanda[cis]>
instead you upload the config frames via all-new AXI-attached... thing
<Wanda[cis]>
(no it's very different from zynq PCAP)
<Wanda[cis]>
in a much more ... direct way
<Wanda[cis]>
also the configuration frames only apply to core interconnect and CLBs/BRAMs/DSPs/a few other things
<azonenberg>
so what do they like just have FDRI as an axi-mapped address you can write to?
<Wanda[cis]>
other things on the periphery are configured completely separately
<Wanda[cis]>
GTYs are just... memory mapped peripherals
<Wanda[cis]>
so are the XPIOs, hard memory controllers, PLLs
<azonenberg>
wanna bet they just bought some bog-standard IPs and hung them off axi
<Wanda[cis]>
azonenberg: that's a close approximation
<Wanda[cis]>
(it's actually DMAd)
<Wanda[cis]>
anyway
<Wanda[cis]>
notably BUFGCTRLs are not part of the fabric; they're memory-mapped tiles.
<azonenberg>
Huh interesting
<azonenberg>
i knew xilinx was drinking the axi koolaid hard with versal but didnt realize it was to that extreme
<Wanda[cis]>
there's lots of other smaller differences too
<azonenberg>
so basically a cpu-less versal doesn't even make sense
<Wanda[cis]>
absolutely not
<Wanda[cis]>
though
<Wanda[cis]>
ARM-less versal is possible
<Wanda[cis]>
theoretically
<Wanda[cis]>
since the boot CPU is the microblaze thing
<azonenberg>
i'm a bit surprised they used microblaze for that stuff when they have arm licenses
<Wanda[cis]>
they did the same on zu+
<azonenberg>
i cant imagine a m0+ license costs much compared to an a53 or whatever the main core is
<Wanda[cis]>
I wonder just how many ublazes are on that thing btw
<azonenberg>
the point of microblaze is supposed to be that the isa maps well to xilinx's lut fabric
<azonenberg>
which is kinda gone when you hard silicon it
<azonenberg>
Lol good question
<Wanda[cis]>
there's hard microblazes in the GTs and DDRMCs I think
<azonenberg>
The ASIC i worked on years ago had a proprietary CPU in the Credo serdes IP that ran the logic for tuning equalizer coefficients etc
<azonenberg>
it was clearly inspired by mips and risc-v but was not any ISA we could find docs on anywhere
<Wanda[cis]>
oh heh
<azonenberg>
(we had to RE the firmware blob a bit in order to do some hacks to work around silicon errata on our end)
<Wanda[cis]>
there's also some really weird custom ISA in the GTZ transceivers
<Wanda[cis]>
which I reversed
<Wanda[cis]>
but couldn't match to anything
<azonenberg>
huh i wonder if that was a credo IP lol
<azonenberg>
be funny if it turned out to be the same ip
<Wanda[cis]>
does it have 10-bit code bytes
<azonenberg>
That i dont know, i wasn't the one working on the RE
<azonenberg>
some dude at virginia tech i never met did it
<azonenberg>
but yeah, in general i am not a fan of the GTs on xilinx parts
<azonenberg>
having worked with "naked" asic serdes that have no line coding blocks and CDC FIFOs and other fluff of any kind, and just give you a parallel bus in the recovered clock domain
<azonenberg>
it's actually quite refreshing
<azonenberg>
vs constantly wondering if you got TXUSRCLK and TXUSRCLK2 wrong or something
<azonenberg>
and i've started doing my own designs with raw GTXE2_CHANNEL primitive instantiations and my own wrappers around them for known protocol standards etc
<azonenberg>
i have no idea what the magic values mean but they work :p
<azonenberg>
And even doing my own line coding in some cases so i dont have to deal with theirs
<Wanda[cis]>
the GTs are certainly ... something
<Wanda[cis]>
I like how about half the generations have 64b66b capability in hardware, but disabled due to bugs
<azonenberg>
yeah i dont like the hard gearboxes
<azonenberg>
i've started to just roll my own in fabric in a way that i like better
<azonenberg>
i run my entire rx logic in the recovered clock domain as much as i can and just add a valid signal that deasserts 2/66 cycles
<azonenberg>
(this is especially handy in situations like a BERT or multi-protocol analyzer where you need to be able to switch from 8b10b to 64b66b without a reset or something)
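[Editor's sketch] The "valid signal that deasserts 2/66 cycles" idea can be modeled in software. The sketch below assumes a 64-bit word arrives from the SERDES every recovered-clock cycle and that 66-bit blocks are emitted with a valid flag; the function name and the LSB-first bit order are assumptions, and block alignment (hunting for the sync header) is not modeled. It is not the fabric implementation described above, just a behavioral illustration.

```python
def rx_gearbox_64_to_66(words):
    """Yield one (valid, block) pair per input cycle.

    words: iterable of 64-bit integers, oldest bit in the LSB (assumed order).
    When fewer than 66 bits are buffered, the cycle is marked invalid.
    """
    acc = 0     # buffered bits, oldest in the least significant positions
    nbits = 0   # how many bits are currently buffered
    for word in words:
        acc |= (word & (2**64 - 1)) << nbits
        nbits += 64
        if nbits >= 66:
            block = acc & (2**66 - 1)
            acc >>= 66
            nbits -= 66
            yield True, block
        else:
            yield False, None


# Feed 66 arbitrary words and count how often valid is deasserted.
demo_words = [(i * 0x9E3779B97F4A7C15 + 1) & (2**64 - 1) for i in range(66)]
demo_out = list(rx_gearbox_64_to_66(demo_words))
idle_cycles = sum(1 for valid, _ in demo_out if not valid)
print(f"{len(demo_out)} cycles, {idle_cycles} with valid deasserted")
```

Because 33 input words (2112 bits) carry exactly 32 blocks, the buffer drains to empty every 33 cycles and valid drops once per 33 cycles, which is the same 2-in-66 ratio.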
<Wanda[cis]>
oh, and as for ARM-less versal
<Wanda[cis]>
that thing may actually exist
<azonenberg>
??
<Wanda[cis]>
you know how multi-die versal devices are a thing?
<azonenberg>
oh
<Wanda[cis]>
every die has a PMC
<azonenberg>
lol
<azonenberg>
but only one has the PS
<azonenberg>
so you could hypothetically package out one of the other dies
<Wanda[cis]>
... probably
<Wanda[cis]>
though there's a little problem
<Wanda[cis]>
only the die with PS also has the DDRMC and XPIO
<azonenberg>
so no hard ddr, big deal if you're doing a pure-FPGA design
<Wanda[cis]>
no XPIO.
<azonenberg>
do you still have HPIO or whatever the fast-ish stuff is?
<azonenberg>
or is it only HDIO
<azonenberg>
and GT*
<Wanda[cis]>
nope
<Wanda[cis]>
there's HDIO and XPIO
<azonenberg>
interesting lol
<Wanda[cis]>
HDIO is... kinda shit-grade
<azonenberg>
yeeah
<azonenberg>
lol
<azonenberg>
if its anything like u+ HDIO
<azonenberg>
or is it nerfed even more
<Wanda[cis]>
it's... something like it, I think
<azonenberg>
i did manage to do RGMII in U+ HDIO using a PHY that had built-in io delay lines
<azonenberg>
it took some tweaking to make it pass IO timing and have consistent performance
<azonenberg>
but it did work in the end
<Wanda[cis]>
lol like me and Cat did with ice40/glasgow
<azonenberg>
(running literally at fmax of the IOs)
<Wanda[cis]>
using "we have IODELAY at home"
<azonenberg>
lol
<azonenberg>
so funny thing is
<azonenberg>
my 7 series RGMII IP doesn't actually need IODELAYs
<azonenberg>
i use RGMII 2.0 internal delay on the phy for RX and just clock IDDR's off RXC directly and sample RXD / RX_EN
<azonenberg>
for TX, i run the datapath at 250 MHz using IOSERDES and synthesize a 90 degree phased TX clock plus data from a single clock domain
<azonenberg>
no variable speed internal clocks or anything
<azonenberg>
just enables depending on the speed
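[Editor's sketch] One way to read the TX scheme above: if every output pin can change once per 2 ns slot (a 250 MHz clock used on both edges), then at gigabit speed each byte occupies four slots, and both the data and a 90 degree delayed clock are just fixed slot patterns from the same clock domain. This is an interpretation, not the actual IP; the function names are hypothetical and the 10/100 Mbps enable logic is not modeled.

```python
def rgmii_tx_slots(byte):
    """Return (txc, txd) patterns for one byte at gigabit speed.

    txc: four clock levels, one per 2 ns slot.
    txd: four TXD[3:0] nibble values, one per 2 ns slot.
    """
    lo = byte & 0xF          # bits 3:0, captured on the rising edge
    hi = (byte >> 4) & 0xF   # bits 7:4, captured on the falling edge
    txd = [lo, lo, hi, hi]   # each nibble is held for 4 ns
    txc = [0, 1, 1, 0]       # edges land in the middle of each nibble
    return txc, txd


def serialize(frame):
    """Concatenate the slot patterns for every byte of a frame."""
    txc_stream, txd_stream = [], []
    for byte in frame:
        txc, txd = rgmii_tx_slots(byte)
        txc_stream += txc
        txd_stream += txd
    return txc_stream, txd_stream


def receive(txc_stream, txd_stream):
    """Receiver model: capture TXD on every TXC edge and rebuild bytes."""
    nibbles = []
    prev = 0  # clock assumed idle low before the first slot
    for clk, data in zip(txc_stream, txd_stream):
        if clk != prev:
            nibbles.append(data)
        prev = clk
    # Nibbles arrive as low, high, low, high, ...
    return [nibbles[i] | (nibbles[i + 1] << 4)
            for i in range(0, len(nibbles) - 1, 2)]


clock, data = serialize([0xA5, 0x3C])
print([hex(b) for b in receive(clock, data)])
```

The receiver model recovering the original bytes shows that the clock edges fall inside the stable window of each nibble, which is what the 90 degree phase is for.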
mwk has quit [Ping timeout: 265 seconds]
mwk has joined #prjcombine
<mupuf>
Wanda[cis]: I see, I had not considered the challenge of creating a small database being a requirement before moving forward to P&R but it makes a ton of sense to focus on that first!
<mupuf>
thanks for the insight :)
<azonenberg>
mupuf: i ran out of disk space on my work laptop installing versal device files and had to get rid of some vm snapshots and stuff to make room lol
<azonenberg>
of course i only needed it for one day so i can probably delete that data dir or something from vivado
<azonenberg>
so i absolutely appreciate any work done to make small device databases
<Wanda[cis]>
... I just hope it works out
<Wanda[cis]>
still worried about how large the timing databases will be
<Wanda[cis]>
they have the potential to be quite bad
<azonenberg>
Wanda[cis]: "smaller than vivado" is not a high bar
<azonenberg>
:p
<Wanda[cis]>
true
<Wanda[cis]>
well I can guarantee I won't package several gcc toolchains and some guy's homedir in prjcombine
<azonenberg>
and a few dozen jvms?
<Wanda[cis]>
yeah
<azonenberg>
and installing jtag cable firmware blobs twice for 32 and 64 bit host binary dirs?
<Wanda[cis]>
heh
<azonenberg>
The other thing that would be useful is just more modularity in packaging
<azonenberg>
like, vivado force bundles HLS down your throat and gives you very limited granularity for device support
<azonenberg>
what if i know i'll only ever use the ku3p and ku5p for example
<azonenberg>
why pull in data for the ku11p?
<Wanda[cis]>
well
<azonenberg>
(although i'm not sure how much of a net savings that will be given the scalable architecture)
<Wanda[cis]>
I like how xpla3 manages to be larger than virtex7
<azonenberg>
xc7v includes xc7a/s/k?
<Wanda[cis]>
yeah
<azonenberg>
xc2c is larger too lol
<Wanda[cis]>
also includes xc7z
<Wanda[cis]>
(I used to call it series7 in code, but it annoyed me too much that virtex4/virtex5/virtex6/series7 sorted completely wrong wherever things were alphabetically ordered)
<mupuf>
Wanda[cis]: How accurate do you want the timing database to be? Do you want to have it be 1:1 the one from vivado/ISE or would you accept small (pessimistic) deviations?
<Wanda[cis]>
(xc4v-xc7v are similar enough that they share a lot of common codepaths and are kinda considered variants of the same underlying architecture, so I wanted the naming to be consistent too)
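[Editor's sketch] The sorting annoyance is easy to reproduce. The two name lists below are just the names mentioned in the conversation, not an excerpt from the prjcombine code.

```python
# Family names in generation order, under the old and new naming schemes.
old_names = ["virtex4", "virtex5", "virtex6", "series7"]
new_names = ["xc4v", "xc5v", "xc6v", "xc7v"]

# Alphabetical order puts "series7" before every "virtex*" name...
print(sorted(old_names))

# ...while the xc*v names keep their generation order.
print(sorted(new_names))
```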
<Wanda[cis]>
mupuf: well ideally it'd be 1:1
<Wanda[cis]>
it's also... not clear there's any easy way to cheat to reduce size
<mupuf>
Also does the speed index scale all the timings linearly, or would you need to have separate timings db for each of them?
<Wanda[cis]>
it's not linear
<azonenberg>
if only lol
<Wanda[cis]>
it's uh.
<mupuf>
even inside types of blocks?
<Wanda[cis]>
kind of a complete mess.
<Wanda[cis]>
best I can hope for is some deduplication across same speed grade of different devices of same family
<Wanda[cis]>
like. maybe CLB timings of all virtex7 devices in speed grade -2 are, in fact, the same. hopefully. or just one of a small number of variants.
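[Editor's sketch] The deduplication hoped for above could look roughly like content-addressed storage: identical timing tables are stored once and each (device, speed grade, tile kind) key points at a shared copy. Everything here is hypothetical, including the table layout, the device names, and the delay numbers, which are made up for illustration and are not real speed data.

```python
import hashlib
import json


def dedup_timing_tables(tables):
    """Store each distinct table once.

    tables: {(device, speed_grade, tile_kind): {delay_name: picoseconds}}
    Returns (store, index): store maps digest -> table,
    index maps each original key -> digest.
    """
    store = {}
    index = {}
    for key, table in tables.items():
        # Canonical serialization so equal tables hash identically.
        blob = json.dumps(table, sort_keys=True).encode()
        digest = hashlib.sha256(blob).hexdigest()
        store.setdefault(digest, table)
        index[key] = digest
    return store, index


# Made-up example: CLB timing is shared, interconnect timing is not.
example_tables = {
    ("devA", "-2", "CLB"): {"lut_delay": 100, "ff_setup": 50},
    ("devB", "-2", "CLB"): {"ff_setup": 50, "lut_delay": 100},
    ("devA", "-2", "INT"): {"wire_delay": 200},
    ("devB", "-2", "INT"): {"wire_delay": 260},
}
store, index = dedup_timing_tables(example_tables)
print(f"{len(example_tables)} tables in, {len(store)} stored")
```

How much this saves depends entirely on how often real tables turn out to be identical, which is exactly the open question in the conversation.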
<mupuf>
I wouldn't assume xilinx to re-characterize every speed index of every device in every family, especially since this would artificially degrade the binned down version and would drive adoption of the best one
<Wanda[cis]>
(I know that's not true of interconnect timing unfortunately, since some interconnect simply has varying size across devices)
<mupuf>
Right, makes sense for routing to be pretty unique
<Wanda[cis]>
anyway. I don't really know. tbh I've been avoiding looking at the speed stuff for a long time, worrying about what I might find
<mupuf>
yeah, better not get depressed by it and focus on reversing the different blocks first!
<Wanda[cis]>
me and Cat spent one long evening looking at this stuff and 1) figured out the timing algorithm ISE uses, 2) concluded that blackboxing the timing database out of unmodified ISE is pretty much impossible and our best bet is just parsing the raw speed files and converting them to something we can use
<Wanda[cis]>
or... well, more like cost-prohibitive, not impossible
<mupuf>
Wanda[cis]: this is gonna be fun!
<mupuf>
(for very-masochistic definitions of fun)
Maja has joined #prjcombine
<Maja>
just here for the live feed of mwk sliding into madness
<Wanda[cis]>
... mei
<Wanda[cis]>
I mean you get a live feed either way
<Wanda[cis]>
okay I've thrown in a small readme
melnary has joined #prjcombine
vup has joined #prjcombine
joshhead has joined #prjcombine
joshhead has quit [Read error: Connection reset by peer]
melnary has quit [Remote host closed the connection]
melnary has joined #prjcombine
melnary has quit [Remote host closed the connection]