#prjcombine on 2024-11-13 — irc logs at libera.irclog.whitequark.org

2024-11-12 16:41 ChanServ changed the topic of #prjcombine to: repo: https://github.com/prjunnamed/prjcombine/ | docs: https://prjunnamed.github.io/prjcombine/ | logs: https://libera.irclog.whitequark.org/prjcombine/

00:04 h_ro has quit [Ping timeout: 260 seconds]

00:06 h_ro has joined #prjcombine

00:10 <azonenberg> wow idk what i did to this sample lol

00:11 <azonenberg> its got some really caked on gunk that i dont think will come off

00:12 <azonenberg> I was hoping to just quickly get a nice pretty 3s50a image for you but i'd have to prep a new sample to do that

00:19 <Wanda[cis]> ... bleh ultrascale

00:20 <Wanda[cis]> I'm currently trying to get the geometry extractor for that thing working again after a refactor and it's a major pain

00:20 <azonenberg> OG or U+?

00:20 <Wanda[cis]> both

00:21 <azonenberg> OG ultrascale seems to be another forgotten sibling

00:21 <Wanda[cis]> they aren't different enough to warrant separate codepaths

00:21 <azonenberg> like xilinx's extended lifespan guarantee recently covers 7 series and U+

00:21 <azonenberg> and notably *not* vanilla U

00:21 <Wanda[cis]> yeah

00:21 <azonenberg> i hope they dont kill off U because the ku035 seems to be the best way to get a ton of fast IO without using a $$$$ part right now

00:22 <azonenberg> the smallest ku+ with a competitive quantity of IO is the ku11p which is ludicrously expensive

00:23 <Wanda[cis]> oh hey the definitely-not-a-zynq

00:23 <azonenberg> aren't all the u+'s zu+'s with fused off cpus and different bondouts?

00:24 <Wanda[cis]> not all

00:24 <azonenberg> or are some of the smaller ones actually separate dies with no cpu?

00:24 <Wanda[cis]> virteces are actually non-zynqs!

00:24 <Wanda[cis]> also ku3p and au*

00:24 <azonenberg> sorry i should have said all of the ku+s

00:24 <azonenberg> wait what

00:25 <azonenberg> i thought au20, au25, ku3, ku5 were all the same die, and it was also a zu+

00:25 <Wanda[cis]> nope

00:25 <Wanda[cis]> there are three die

00:25 <azonenberg> i knew au7/10/15 were not the same die

00:25 <Wanda[cis]> au7p, au15p, ku5p

00:25 <azonenberg> ah ok that makes sense

00:25 <Wanda[cis]> none of them are zynq

00:25 <azonenberg> so 10 is fused down au15

00:26 <azonenberg> and 20/25/3 are fused down ku5 as i thought

00:26 <Wanda[cis]> yes

00:26 <azonenberg> but i could have sworn that die was a zynq

00:26 <Wanda[cis]> nope

00:26 <azonenberg> TIL

00:26 <Wanda[cis]> there's also apparently going to be su*p

00:26 <Wanda[cis]> which are... I have NFI what's going on with them, but they're definitely all-new die

00:27 <azonenberg> (i dont actually know zu+ at all, never used it... i have some ku3p boards and some bare ku5p chips that i haven't made boards for)

00:27 <azonenberg> so yeah

00:27 <Wanda[cis]> and almost certainly not zynq

00:27 <azonenberg> what's interesting is that the su+ have the hard memory controller capability and much faster IOs

00:27 <azonenberg> they seem to have some versal influence

00:27 <Wanda[cis]> the XP5IO comes from versal, yeah

00:27 <Wanda[cis]> from versal2, even

00:27 <azonenberg> so are su+ 7nm then?

00:27 <Wanda[cis]> don't think so

00:27 <azonenberg> i thought su+ were 16ff like other u+

00:27 <azonenberg> or did they backport xp5io from versal somehow

00:28 <Wanda[cis]> I thiiiink they somehow backported it

00:28 <azonenberg> it also has a versal-esque platform management controller apparently

00:28 <azonenberg> which i know nothing about

00:28 <Wanda[cis]> it'd make no sense to make 7nm us+ parts IMO

00:28 <Wanda[cis]> oh yeah that's also a mystery

00:28 <azonenberg> yeah i know, they'd be all pad ring lol

00:28 <Wanda[cis]> well more importantly

00:29 <azonenberg> and $$$$

00:29 <Wanda[cis]> you'd have to somehow die-shrink the whole US+ fabric tiles

00:29 <Wanda[cis]> which .... yeah, no

00:29 <Wanda[cis]> that's equivalent to making a new FPGA family (which exists and is called versal)

00:29 <azonenberg> how similar is versal's fabric to u+? skimming stuff it looked quite substantially changed even ignoring all of the NPUs and hard AXI

00:29 <Wanda[cis]> it's not

00:30 <Wanda[cis]> it's definitely a recognizable successor

00:30 <azonenberg> i mean yes

00:30 <Wanda[cis]> but that's about it

00:30 <azonenberg> yeah it seemed a lot different the one time i used it

00:30 <Wanda[cis]> for one, the fabric... barely stands on its own

00:30 <azonenberg> but i had literally one day

00:30 <azonenberg> yeah lol

00:30 <Wanda[cis]> the clock routing is not part of the fabric

00:30 <azonenberg> wait what

00:31 <Wanda[cis]> well

00:31 <azonenberg> is it like attached to the hard axi or something

00:31 <Wanda[cis]> you know how versal doesn't really have the concept of a bitstream anymore?

00:31 <azonenberg> i know it has a PDI file instead of a .bit

00:31 <azonenberg> but i havent dissected them

00:31 <azonenberg> no idea how different the structure is

00:31 <Wanda[cis]> mhm

00:31 <azonenberg> i know U+ down to like spartan3 is all the same basic container format

00:31 <Wanda[cis]> so these files are actually more-or-less a list of stuff you have to write to MMIO registers

00:31 <azonenberg> with some small changes like s6 using a 16-bit datapath instead of 32

00:32 <Wanda[cis]> down to virtex actually, but yes

00:32 <azonenberg> (i havent used anything older than s3)

00:32 <Wanda[cis]> anyway

00:32 <Wanda[cis]> versal... still has fabric, and configuration frames not unlike ultrascale

00:32 <Wanda[cis]> but the old configuration logic and bitstream framing is gone

00:32 <Wanda[cis]> instead you upload the config frames via all-new AXI-attached... thing

00:33 <Wanda[cis]> (no it's very different from zynq PCAP)

00:33 <Wanda[cis]> in a much more ... direct way

00:33 <Wanda[cis]> also the configuration frames only apply to core interconnect and CLBs/BRAMs/DSPs/a few other things

00:33 <azonenberg> so what do they like just have FDRI as an axi-mapped address you can write to?

00:33 <Wanda[cis]> other things on the periphery are configured completely separately

00:34 <Wanda[cis]> GTYs are just... memory mapped peripherals

00:34 <Wanda[cis]> so are the XPIOs, hard memory controllers, PLLs

00:34 <azonenberg> wanna bet they just bought some bog-standard IPs and hung them off axi

00:34 <Wanda[cis]> azonenberg: that's a close approximation

00:34 <Wanda[cis]> (it's actually DMAd)

00:34 <Wanda[cis]> anyway

00:35 <Wanda[cis]> notably BUFGCTRLs are not part of the fabric; they're memory-mapped tiles.

00:35 <azonenberg> Huh interesting

00:35 <azonenberg> i knew xilinx was drinking the axi koolaid hard with versal but didnt realize it was to that extreme

00:35 <Wanda[cis]> there's lots of other smaller differences too

00:35 <azonenberg> so basically a cpu-less versal doesn't even make sense

00:35 <Wanda[cis]> absolutely not

00:36 <Wanda[cis]> though

00:36 <Wanda[cis]> ARM-less versal is possible

00:36 <Wanda[cis]> theoretically

00:36 <Wanda[cis]> since the boot CPU is the microblaze thing

00:36 <azonenberg> i'm a bit surprised they used microblaze for that stuff when they have arm licenses

00:36 <Wanda[cis]> they did the same on zu+

00:36 <azonenberg> i cant imagine a m0+ license costs much compared to an a53 or whatever the main core is

00:37 <Wanda[cis]> I wonder just how many ublazes are on that thing btw

00:37 <azonenberg> the point of microblaze is supposed to be that the isa maps well to xilinx's lut fabric

00:37 <azonenberg> which is kinda gone when you hard silicon it

00:37 <azonenberg> Lol good question

00:37 <Wanda[cis]> there's hard microblazes in the GTs and DDRMCs I think

00:37 <azonenberg> The ASIC i worked on years ago had a proprietary CPU in the Credo serdes IP that ran the logic for tuning equalizer coefficients etc

00:38 <azonenberg> it was clearly inspired by mips and risc-v but was not any ISA we could find docs on anywhere

00:38 <Wanda[cis]> oh heh

00:38 <azonenberg> (we had to RE the firmware blob a bit in order to do some hacks to work around silicon errata on our end)

00:38 <Wanda[cis]> there's also some really weird custom ISA in the GTZ transceivers

00:38 <Wanda[cis]> which I reversed

00:39 <Wanda[cis]> but couldn't match to anything

00:39 <azonenberg> huh i wonder if that was a credo IP lol

00:39 <azonenberg> be funny if it turned out to be the same ip

00:39 <Wanda[cis]> does it have 10-bit code bytes

00:39 <azonenberg> That i dont know, i wasn't the one working on the RE

00:39 <azonenberg> some dude at virginia tech i never met did it

00:40 <azonenberg> but yeah, in general i am not a fan of the GTs on xilinx parts

00:41 <azonenberg> having worked with "naked" asic serdes that have no line coding blocks and CDC FIFOs and other fluff of any kind, and just give you a parallel bus in the recovered clock domain

00:41 <azonenberg> it's actually quite refreshing

00:41 <azonenberg> vs constantly wondering if you got TXUSRCLK and TXUSRCLK2 wrong or something

00:43 <azonenberg> and i've started doing my own designs with raw GTXE2_CHANNEL primitive instantiations and my own wrappers around them for known protocol standards etc

00:43 <azonenberg> i have no idea what the magic values mean but they work :p

00:43 <azonenberg> And even doing my own line coding in some cases so i dont have to deal with theirs

00:47 <Wanda[cis]> the GTs are certainly ... something

00:47 <Wanda[cis]> I like how about half the generations have 64b66b capability in hardware, but disabled due to bugs

00:48 <azonenberg> yeah i dont like the hard gearboxes

00:49 <azonenberg> i've started to just roll my own in fabric in a way that i like better

00:49 <azonenberg> i run my entire rx logic in the recovered clock domain as much as i can and just add a valid signal that deasserts 2/66 cycles

00:50 <azonenberg> (this is especially handy in situations like a BERT or multi-protocol analyzer where you need to be able to switch from 8b10b to 64b66b without a reset or something)
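
The "valid deasserts 2/66 cycles" idea above can be modelled in a few lines. This is only a behavioural sketch, not azonenberg's actual RTL: it assumes a 64-bit parallel word per recovered-clock cycle, LSB-first bit order, and that block alignment (sync header lock) has already been found. The name `gearbox_64_to_66` is made up for illustration.

```python
MASK64 = (1 << 64) - 1
MASK66 = (1 << 66) - 1

def gearbox_64_to_66(words):
    """Model of an RX gearbox: one 64-bit word in per cycle,
    one (valid, block) pair out per cycle.

    Assumes LSB-first ordering and that the stream is already
    block-aligned; both are illustrative assumptions.
    """
    buf = 0      # bits waiting to be emitted, oldest bit at position 0
    nbits = 0    # number of bits currently held in buf
    for w in words:
        buf |= (w & MASK64) << nbits
        nbits += 64
        if nbits >= 66:
            # A full 66-bit block is available: emit it with valid high.
            block = buf & MASK66
            buf >>= 66
            nbits -= 66
            yield True, block
        else:
            # Not enough bits yet: valid is low for this cycle.
            yield False, None
```

Each cycle adds 64 bits and a valid cycle removes 66, so the buffer drains by 2 bits per cycle and runs dry once every 33 cycles. That gives 64 blocks per 66 input words, with the downstream logic staying in the recovered clock domain and simply ignoring the cycles where valid is low.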

00:54 <Wanda[cis]> oh, and as for ARM-less versal

00:54 <Wanda[cis]> that thing may actually exist

00:54 <azonenberg> ??

00:54 <Wanda[cis]> you know how multi-die versal devices are a thing?

00:54 <azonenberg> oh

00:54 <Wanda[cis]> every die has a PMC

00:54 <azonenberg> lol

00:54 <azonenberg> but only one has the PS

00:54 <azonenberg> so you could hypothetically package out one of the other dies

00:54 <Wanda[cis]> ... probably

00:55 <Wanda[cis]> though there's a little problem

00:55 <Wanda[cis]> only the die with PS also has the DDRMC and XPIO

00:55 <azonenberg> so no hard ddr, big deal if you're doing a pure-FPGA design

00:55 <Wanda[cis]> no XPIO.

00:55 <azonenberg> do you still have HPIO or whatever the fast-ish stuff is?

00:55 <azonenberg> or is it only HDIO

00:56 <azonenberg> and GT*

00:56 <Wanda[cis]> nope

00:56 <Wanda[cis]> there's HDIO and XPIO

00:56 <azonenberg> interesting lol

00:56 <Wanda[cis]> HDIO is... kinda shit-grade

00:56 <azonenberg> yeeah

00:56 <azonenberg> lol

00:56 <azonenberg> if its anything like u+ HDIO

00:56 <azonenberg> or is it nerfed even more

00:56 <Wanda[cis]> it's... something like it, I think

00:56 <azonenberg> i did manage to do RGMII in U+ HDIO using a PHY that had built-in io delay lines

00:57 <azonenberg> it took some tweaking to make it pass IO timing and have consistent performance

00:57 <azonenberg> but it did work in the end

00:57 <Wanda[cis]> lol like me and Cat did with ice40/glasgow

00:57 <azonenberg> (running literally at fmax of the IOs)

00:57 <Wanda[cis]> using "we have IODELAY at home"

00:57 <azonenberg> lol

00:57 <azonenberg> so funny thing is

00:57 <azonenberg> my 7 series RGMII IP doesn't actually need IODELAYs

00:58 <azonenberg> i use RGMII 2.0 internal delay on the phy for RX and just clock IDDR's off RXC directly and sample RXD / RX_EN

00:58 <azonenberg> for TX, i run the datapath at 250 MHz using IOSERDES and synthesize a 90 degree phased TX clock plus data from a single clock domain

00:58 <azonenberg> no variable speed internal clocks or anything

00:59 <azonenberg> just enables depending on the speed
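
One way to read the TX scheme described above, as a sketch: if the serializer emits four equal time slots per 125 MHz RGMII clock period, then the clock pin and the data pins can be driven with fixed patterns from the same clock domain, with the clock pattern offset by one slot to get the 90 degree shift. The four-slot assumption, the gigabit-only scope, and the names `rgmii_tx_slots` and `sample_on_edges` are illustrative guesses, not taken from the actual IP.

```python
def rgmii_tx_slots(byte):
    """Return (txc, txd) slot patterns for one byte at gigabit speed.

    Each list has 4 entries, one per quarter of the TXC period
    (assumed serializer ratio; not confirmed by the source).
    """
    lo = byte & 0xF          # TXD[3:0] is sampled on the rising edge of TXC
    hi = (byte >> 4) & 0xF   # TXD[7:4] is sampled on the falling edge
    txd = [lo, lo, hi, hi]   # each nibble held for half a period
    txc = [0, 1, 1, 0]       # edges land in the middle of each nibble
    return txc, txd

def sample_on_edges(txc, txd):
    """Model a receiver: capture TXD on each TXC edge, rebuild the byte."""
    lo = hi = None
    for i in range(1, len(txc)):
        if txc[i - 1] == 0 and txc[i] == 1:
            lo = txd[i]      # rising edge captures the low nibble
        elif txc[i - 1] == 1 and txc[i] == 0:
            hi = txd[i]      # falling edge captures the high nibble
    return (hi << 4) | lo
```

In this model the data changes at slots 0 and 2 while the clock changes at slots 1 and 3, so every clock edge sits a quarter period after a data transition. Slower link speeds would need longer patterns and are left out here.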

01:30 mwk has quit [Ping timeout: 265 seconds]

02:01 mwk has joined #prjcombine

04:32 <mupuf> Wanda[cis]: I see, I had not considered the challenge of creating a small database being a requirement before moving forward to P&R but it makes a ton of sense to focus on that first!

04:32 <mupuf> thanks for the insight :)

05:05 <azonenberg> mupuf: i ran out of disk space on my work laptop installing versal device files and had to get rid of some vm snapshots and stuff to make room lol

05:05 <azonenberg> of course i only needed it for one day so i can probably delete that data dir or something from vivado

05:06 <azonenberg> so i absolutely appreciate any work done to make small device databases

05:07 <Wanda[cis]> ... I just hope it works out

05:07 <Wanda[cis]> still worried about how large the timing databases will be

05:07 <Wanda[cis]> they have the potential to be quite bad

05:08 <azonenberg> Wanda[cis]: "smaller than vivado" is not a high bar

05:08 <azonenberg> :p

05:08 <Wanda[cis]> true

05:08 <Wanda[cis]> well I can guarantee I won't package several gcc toolchains and some guy's homedir in prjcombine

05:08 <azonenberg> and a few dozen jvms?

05:09 <Wanda[cis]> yeah

05:09 <azonenberg> and installing jtag cable firmware blobs twice for 32 and 64 bit host binary dirs?

05:09 <Wanda[cis]> heh

05:09 <azonenberg> The other thing that would be useful is just more modularity in packaging

05:10 <azonenberg> like, vivado force bundles HLS down your throat and gives you very limited granularity for device support

05:10 <azonenberg> what if i know i'll only ever use the ku3p and ku5p for example

05:10 <azonenberg> why pull in data for the ku11p?

05:10 <Wanda[cis]> well

05:10 <azonenberg> (although i'm not sure how much of a net savings that will be given the scalable architecture)

05:11 * Wanda[cis] sent a code block: https://catircservices.org/_irc/v1/media/download/AfEgPr3zB7BAB4ZpCCfU7ckx3CcBK0ulcaRuy11EtcpiC5tal6JTnepnqTem_v3hKzxUfrpBb84tZ4hXtBckfrK_8AAAAAAAAGNhdGlyY3NlcnZpY2VzLm9yZy9KS0VFTldORGdac1BTT1lwbEZRclJIbW0

05:11 <Wanda[cis]> oh lmao

05:11 <Wanda[cis]> the tiledb sizes are... kinda wild

05:12 * Wanda[cis] sent a code block: https://catircservices.org/_irc/v1/media/download/AXuyQ-3fgP5MmnsFWYu08XEoVeLnjxFArB2i-lT7x9L0VLtSgGdayzYzC_TwLneexAJaea528DLsv7JJnezK0RG_8AAAAAAAAGNhdGlyY3NlcnZpY2VzLm9yZy9hVmZYc2R6R0RBemptaFdqbFZvcEFvYmI

05:12 <Wanda[cis]> I like how xpla3 manages to be larger than virtex7

05:12 <azonenberg> xc7v includes xc7a/s/k?

05:12 <Wanda[cis]> yeah

05:13 <azonenberg> xc2c is larger too lol

05:14 <Wanda[cis]> also includes xc7z

05:15 <Wanda[cis]> (I used to call it series7 in code, but it annoyed me too much that virtex4/virtex5/virtex6/series7 sorted completely wrong wherever things were alphabetically ordered)

05:15 <mupuf> Wanda[cis]: How accurate do you want the timing database to be? Do you want to have it be 1:1 the one from vivado/ISE or would you accept small (pessimistic) deviations?

05:16 <Wanda[cis]> (xc4v-xc7v are similar enough that they share a lot of common codepaths and are kinda considered variants of the same underlying architecture, so I wanted the naming to be consistent too)

05:16 <Wanda[cis]> mupuf: well ideally it'd be 1:1

05:17 <Wanda[cis]> it's also... not clear there's any easy way to cheat to reduce size

05:17 <mupuf> Also does the speed index scale all the timings linearly, or would you need to have separate timings db for each of them?

05:17 <Wanda[cis]> it's not linear

05:17 <azonenberg> if only lol

05:17 <Wanda[cis]> it's uh.

05:17 <mupuf> even inside types of blocks?

05:17 <Wanda[cis]> kind of a complete mess.

05:18 <Wanda[cis]> best I can hope for is some deduplication across same speed grade of different devices of same family

05:19 <Wanda[cis]> like. maybe CLB timings of all virtex7 devices in speed grade -2 are, in fact, the same. hopefully. or just one of a small number of variants.
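
The deduplication hoped for above could look roughly like the following content-addressed sketch. It is not prjcombine code: the table layout (a dict of delay name to value, keyed by device and speed grade) and the function name `dedup_timing_tables` are assumptions made purely for illustration.

```python
import hashlib
import json

def dedup_timing_tables(tables):
    """Store each distinct timing table once.

    tables: dict mapping (device, speed_grade) -> dict of delay name -> value
            (hypothetical layout, not the real speed file format).
    Returns (store, index): store maps digest -> table, and index maps
    (device, speed_grade) -> digest.
    """
    store = {}
    index = {}
    for key, table in tables.items():
        # Canonical serialization so key order does not affect the digest.
        blob = json.dumps(table, sort_keys=True).encode()
        digest = hashlib.sha256(blob).hexdigest()
        store.setdefault(digest, table)
        index[key] = digest
    return store, index
```

If the CLB timings really are identical across devices in one speed grade, `store` ends up with a single entry per grade and `index` only holds small references. If they turn out to differ per device, nothing is lost; the store just grows to one entry per key.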

05:19 <mupuf> I wouldn't assume xilinx to re-characterize every speed index of every device in every family, especially since this would artificially degrade the binned down version and would drive adoption of the best one

05:20 <Wanda[cis]> (I know that's not true of interconnect timing unfortunately, since some interconnect simply has varying size across devices)

05:21 <mupuf> Right, makes sense for routing to be pretty unique

05:22 <Wanda[cis]> anyway. I don't really know. tbh I've been avoiding looking at the speed stuff for a long time, worrying about what I might find

05:23 <mupuf> yeah, better not get depressed by it and focus on reversing the different blocks first!

05:24 <Wanda[cis]> me and Cat spent one long evening looking at this stuff and managed to 1) figure out the timing algorithm ISE uses, 2) concluded that blackboxing the timing database out of unmodified ISE is pretty much impossible and our best bet is just parsing the raw speed files and converting them to something we can use

05:25 <Wanda[cis]> or... well, more like cost-prohibitive, not impossible

05:55 <mupuf> Wanda[cis]: this is gonna be fun!

05:55 <mupuf> (for very-masochistic definitions of fun)

08:15 Maja has joined #prjcombine

08:18 <Maja> just here for the live feed of mwk sliding into madness

09:10 <Wanda[cis]> ... mei

09:10 <Wanda[cis]> I mean you get a live feed either way

10:48 <Wanda[cis]> okay I've thrown in a small readme

12:18 melnary has joined #prjcombine

15:38 vup has joined #prjcombine

16:11 joshhead has joined #prjcombine

16:14 joshhead has quit [Read error: Connection reset by peer]

21:29 melnary has quit [Remote host closed the connection]

21:29 melnary has joined #prjcombine

21:30 melnary has quit [Remote host closed the connection]

21:30 melnary has joined #prjcombine