<azonenberg>
mwk: I can definitely write a lot of info on the XC2C interconnect and macrocell stuff when i have time
<azonenberg>
where's the doc source stored?
<Wanda[cis]>
in docs/ of course
<Wanda[cis]>
the tables are all autogenerated from the database though
<Wanda[cis]>
also uh. I really need to fix up that sphinx theme to at least remove max-width
<Wanda[cis]>
the experience of wide tables is currently Not Great
<Wanda[cis]>
(I have a local css hack but I never got around to fixing it in the published docs)
<whitequark[cis]>
oh, THAT's why it's unusable when published
<azonenberg>
it might be a bit as i'm pushing on trying to get ngscopeclient v0.1 out the door by EOY but i have a lot of internal notes, programming algorithms at least verified on the 2c32a, and other stuff i can write up
<Wanda[cis]>
yeah sorry >_>
<azonenberg>
as well as some interesting notes on the internal structure of the ZIA/AIM
<Wanda[cis]>
anyway I have to leave in like negative 20 minutes
<Wanda[cis]>
see you later
<azonenberg>
(and a working verilog emulation model of the 2c32a that implements everything except eeprom programming)
<azonenberg>
and some quirks of the IOBs
<azonenberg>
not sure if you want that shoved in your repo anywhere, but it exists somewhere and is BSD-3 licensed if you wanna make a separate repo or whatever
<azonenberg>
even implements the JTAG, you can run it on an artix7 and hook its GPIOs up to iMPACT and it'll happily program a 2c32a jed to it
<azonenberg>
I dont think i can contribute much to the other device families but definitely xc2c i can help with
<azonenberg>
(the emulation model is parameterizable and could easily be extended with ZIA tables for larger devices but I only ever actually implemented the 32a codepath)
jn has joined #prjcombine
mupuf has joined #prjcombine
<mupuf>
mwk: wow, you've been productive! Congrats!
<mupuf>
How would nextpnr be able to make use of all this work? Is there an IR that can be used to document FPGAs?
<whitequark[cis]>
there's himbaechel
ari has joined #prjcombine
<mupuf>
whitequark[cis]: thanks, that's just what I was looking for :)
BluRaf has joined #prjcombine
<Wanda[cis]>
alright
<Wanda[cis]>
back
<Wanda[cis]>
holy crap it's cold
<Wanda[cis]>
<mupuf> "How would nextpnr be able to..." <- so this is kinda a complex question
<Wanda[cis]>
first off, there's no way to do that with just an IR, you're going to need a bunch of target-specific code in the P&R tool
<Wanda[cis]>
though hopefully not that much
<Wanda[cis]>
second
<Wanda[cis]>
a big goal of prjcombine is getting the chip databases to manageable size
<Wanda[cis]>
which is... tricky
<Wanda[cis]>
the largest devices are kind of huge
<Wanda[cis]>
so the way prjcombine works is that the device geometry is specified as a very small "blueprint", which is expanded to a proper tile grid by target-specific code
<Wanda[cis]>
which has been a reasonably successful approach, allowing me to fit all Xilinx devices up to ultrascale+ within 4.4MiB of compressed database total
<h_ro>
What kind of information is included in "device geometry"?
<azonenberg>
Wanda[cis]: i wish actual xilinx toolchains did that kind of thing lol
<azonenberg>
i tried to do that in my xc2c code years ago
<Wanda[cis]>
what kind of tiles every device is made of
<Wanda[cis]>
what positions
<Wanda[cis]>
what wires are in each kind of tile, what muxes
<Wanda[cis]>
etc.
<h_ro>
got it
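A minimal sketch of the blueprint idea described above, assuming a toy device where every column holds a single kind of tile. The names `TileClass`, `Blueprint`, and `expand`, and all wire and mux names, are hypothetical and are not prjcombine's actual data model; the point is only that wires and muxes are stored once per tile kind, while the per-device data stays tiny.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileClass:
    """Stored once per tile kind: the wires and muxes every instance shares."""
    name: str
    wires: tuple
    muxes: tuple  # pairs of (output wire, tuple of candidate input wires)


@dataclass(frozen=True)
class Blueprint:
    """Compact per-device description: one tile kind per column, plus a row count."""
    columns: tuple
    rows: int


def expand(blueprint, classes):
    """Expand a blueprint into a full (col, row) -> TileClass grid."""
    return {
        (col, row): classes[kind]
        for col, kind in enumerate(blueprint.columns)
        for row in range(blueprint.rows)
    }


# Hypothetical tile kinds; real devices have many more, with far more wires.
CLASSES = {
    "IO": TileClass("IO", ("PAD_I", "PAD_O"), (("PAD_O", ("OUT0", "OUT1")),)),
    "CLB": TileClass("CLB", ("IN0", "IN1", "OUT0"), (("IN0", ("OUT0", "PAD_I")),)),
}

TOY = Blueprint(columns=("IO", "CLB", "CLB", "IO"), rows=3)
grid = expand(TOY, CLASSES)
```

Every grid entry points at a shared `TileClass` object, so the expanded grid costs one reference per tile rather than one copy of the wire and mux lists.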
<Wanda[cis]>
unfortunately the 4.4MiB figure doesn't include timing data, which is likely to be quite large and will probably make up the bulk of the final database
<Wanda[cis]>
azonenberg: how did that work out?
<Wanda[cis]>
I find CPLDs don't really benefit from deduplication that much
<azonenberg>
Wanda[cis]: yeah the ZIA didn't dedup well but at least i only had to store the macrocell structures once
<azonenberg>
it was actually procedural rather than data driven
<azonenberg>
so i just had a loop making a bunch of macrocell objects etc
<Wanda[cis]>
I mean, I still did that, but... well there's much less benefit in deduplicating a 512-macrocell CPLD than a million-LUT FPGA
<azonenberg>
well yeah lol
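The procedural approach mentioned above could look roughly like this: one macrocell structure stored once, and a loop stamping out instances that all refer to it. This is only a sketch; `MacrocellKind`, `Macrocell`, `build_macrocells`, and the config bit names are made up for illustration and do not come from the actual xc2c code. The 16 macrocells per function block matches CoolRunner-II (two function blocks give the 32 macrocells of a 2c32a).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MacrocellKind:
    """Structure shared by every macrocell; stored exactly once."""
    config_bits: tuple


@dataclass(frozen=True)
class Macrocell:
    fb: int        # function block index
    index: int     # position within the function block
    kind: MacrocellKind


# Hypothetical bit names, for illustration only.
MC_KIND = MacrocellKind(config_bits=("FF_MODE", "CLK_SEL", "XOR_INVERT"))


def build_macrocells(num_fbs, mcs_per_fb=16):
    """Procedurally create every macrocell; each one references the shared kind."""
    return [
        Macrocell(fb, i, MC_KIND)
        for fb in range(num_fbs)
        for i in range(mcs_per_fb)
    ]


macrocells = build_macrocells(num_fbs=2)
```

With only a few hundred instances at most, the saving from sharing `MC_KIND` is small, which is consistent with the remark that CPLDs benefit far less from deduplication than a million-LUT FPGA.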
<azonenberg>
This reminds me i wanted to make nice APB-based VIO/ILA cores that could interface with an attached MCU and bridge to ngscopeclient
<azonenberg>
the idea was that i could just have a SCPI interface on a uart, ethernet port, whatever
<azonenberg>
and interface to one or more virtual instruments in the DUT
<azonenberg>
without using any of xilinx's IPs
<azonenberg>
the other thing i wanted to do differently was have symbol tables (at least optionally) baked into block ram
<azonenberg>
or a flash chunk on the mcu or something
<azonenberg>
basically the equivalent of a xilinx .ltx but built into firmware so you can just take a device and talk to it without needing separate symbols
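One way the baked-in symbol table could be serialized, as a rough sketch: each probe gets a name, a bit offset into the capture word, and a width, packed into a byte blob that could initialize a block RAM or sit in MCU flash. The record layout here is invented for illustration; it is not the .ltx format and not anything ngscopeclient defines.

```python
import struct


def pack_symbols(symbols):
    """Pack (name, bit_offset, width) tuples into a little-endian byte blob.

    Layout (hypothetical): u16 count, then per symbol
    u8 name length, u16 bit offset, u16 width, ASCII name bytes.
    """
    blob = struct.pack("<H", len(symbols))
    for name, offset, width in symbols:
        raw = name.encode("ascii")
        blob += struct.pack("<BHH", len(raw), offset, width) + raw
    return blob


def unpack_symbols(blob):
    """Inverse of pack_symbols: what a host tool would do after reading the table."""
    (count,) = struct.unpack_from("<H", blob, 0)
    pos = 2
    symbols = []
    for _ in range(count):
        name_len, offset, width = struct.unpack_from("<BHH", blob, pos)
        pos += 5  # size of the fixed "<BHH" header
        name = blob[pos:pos + name_len].decode("ascii")
        pos += name_len
        symbols.append((name, offset, width))
    return symbols


# Made-up probe names for the example.
TABLE = [("axi_awvalid", 0, 1), ("axi_awaddr", 1, 32), ("state", 33, 4)]
blob = pack_symbols(TABLE)
```

A host that reads `blob` back from the device can recover signal names with `unpack_symbols(blob)` and needs no separate symbol file.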
<azonenberg>
h_ro: Yes. i have no versal hardware, nor am i likely to ever get any any time soon
<azonenberg>
so it's dead to me
<azonenberg>
if and when they add support for 7 series or ultrascale+ i want to make a scopehal driver for the xilinx ILA/VIO using it
<azonenberg>
either way i want a fully f/oss alternative
<Wanda[cis]>
versal is deliciously insane hardware
<Wanda[cis]>
perfect to self-harm with
<azonenberg>
i was at a customer last week that had VMK108's *everywhere*
<azonenberg>
they had one that was a glorified ethernet to [redacted] bridge
<azonenberg>
i asked for an fpga devkit to generate a handful of simple digital signals as part of the test i was doing and they gave me another
<azonenberg>
there must have been half a dozen VMK108s just on this one bench i was sitting at
<azonenberg>
it took me most of a day just to set things up and figure out the stupid block design flow and infrastructure enough i could get a blinky working
<azonenberg>
the versal (and zynq) chips and flows embody everything i think xilinx is doing wrong
<azonenberg>
(extra funny because it seems to be their primary focus moving forward)
<azonenberg>
i tried making a systemverilog top level design like i usually do, then it complained about me not having their stupid PS9 wrapper IP, which had to be a block design
<azonenberg>
(of course it didnt tell me until i tried to make a bitstream)
<azonenberg>
then it wouldnt let me put my sv design in a block design because that flow doesnt support sv
<azonenberg>
so i had to make a v2005 wrapper around my sv code and put THAT in the bd
<Wanda[cis]>
idk I think zynq is kinda cute
<Wanda[cis]>
but then I never used it with the official tools
<azonenberg>
lol
<azonenberg>
my big problem is that the PS isn't isolated enough from the PL
<azonenberg>
i like rtl-centric security architectures where you can build a root of trust out of gateware and guarantee that no matter what else happens, X invariant will hold
<azonenberg>
If the PS can load a new bitstream on the PL without its consent at any time, it turns that on its head
<Wanda[cis]>
oh.
<azonenberg>
also i dont like how they have all the hardware AXI interfaces (and on their IPs) exposed as 50 separate discrete named ports. SV interfaces and the VHDL equivalent exist for a reason
<azonenberg>
by all means have the primitive work that way under the hood since you need discrete wires
<Wanda[cis]>
would you prefer virtex5-style FPGA-hard core combo?
<azonenberg>
but then wrap it in interfaces
<azonenberg>
Yes. I want an FPGA with a CPU just sitting somewhere like a block ram
<azonenberg>
that does nothing until i ask it to
<azonenberg>
and can't talk to anything i don't allow it to
<azonenberg>
ideally it would have a couple of ~1 GHz M85 class processors and a few dozen m0+ class i can use as offloads for what would otherwise be an annoyingly large rtl state machine in some logic block
<azonenberg>
a m0+ is like the size of a bram transistor wise
<azonenberg>
put a column of m0's next to every 3rd bram column or so
<azonenberg>
and give me pips to hook them to the adjacent bram as TCM and then provide an AHB interface out to fabric
<azonenberg>
anyway my other problems with xc7z are smaller things, like the inability to boot the PL and PS independently from spi flash and the lack of hard TRNG + crypto IPs
<Wanda[cis]>
I... hm
<Wanda[cis]>
I'm not sure about that
<azonenberg>
you can jtag the PL
<Wanda[cis]>
but there's a distinct possibility that the PL actually can be independently booted from SPI
<azonenberg>
but there is no documented way to boot the pl from spi flash
<azonenberg>
key word documented
<Wanda[cis]>
oh yes.
<azonenberg>
there are some strap pins and bits of bootrom where i think it probably is possible
<Wanda[cis]>
just well
<azonenberg>
Just havent bothered to hack on it when xc7z isnt even that great CPU wise by modern standards
<Wanda[cis]>
there are three RSVDVCC and RSVDGND pins that are suspiciously in the same area as M0-M2 on other virtex7 devices
<azonenberg>
Yes. I noticed that too
<azonenberg>
never bothered to tinker with them
<azonenberg>
but i can guess
<azonenberg>
how buggy the mode is is anybody's guess
<Wanda[cis]>
I wonder if it actually works
<Wanda[cis]>
yeah
<azonenberg>
In my own bigger projects lately I've been using a stm32h735 with the parallel memory controller connecting to an APB bridge on an adjacent 7 series or, soon, ultrascale+, FPGA
<Wanda[cis]>
... of course I don't have any board that'd be actually wired for it, so...
<azonenberg>
the h735 CPU is coremark-wise competitive with an xc7z A9
<Wanda[cis]>
I wonder if I can like
<azonenberg>
and it has internal sram and flash and a ton of IO independent of the FPGA
<Wanda[cis]>
INTEST the configuration logic
<azonenberg>
Which is especially important for u+ because the low end parts like the au20p, ku3p, etc only come in ffg676 which is pretty light on IO (low end OG ultrascale were available in ffg1156)
<azonenberg>
so being able to throw all my slow IOs on the stm32 and save the FPGA IOs for fast stuff is important
<azonenberg>
I did have another cursed xc7z idea i've been meaning to play with, though
<azonenberg>
i may have mentioned it to you, basically porting antikernel to the platform
<Wanda[cis]>
using coresight for external context-switching?
<azonenberg>
Yeah
<azonenberg>
with each A9 locked up in a padded cell with access to a small chunk of ddr and a mailbox to the PL
<azonenberg>
i've never had time to work on it but i have a zybo i bought years ago meaning to try it out
<h_ro>
Wanda[cis]: Got toolchain file set up, but failing on dump_ise_parts step: https://bpa.st/EAFA Is this something you have encountered before?
<Wanda[cis]>
this maaaay be the thing I'm working around with LD_PRELOAD
<Wanda[cis]>
save to fixuseafterfree.c ; gcc -shared -fPIC fixuseafterfree.c -o fixuseafterfree.so ; add fixuseafterfree.so to LD_PRELOAD within the toolchain toml file
<Wanda[cis]>
see if it fixes the problem
<h_ro>
brb
<Wanda[cis]>
it certainly seems like that could be it; the problem manifested with emitting (possibly non-ascii) junk in xdlrc files
<Wanda[cis]>
(I think if you actually use the ancient RHEL version that ISE nominally requires, you don't hit this issue or something?)
<h_ro>
It worked. Thanks for that fix.
<Wanda[cis]>
ISE is a great piece of software.
<Wanda[cis]>
grep prjcombine sources for your favorite obscenity to find more examples of greatness.
<azonenberg>
did you ever figure out what the root cause of this is by looking at older/newer bitstreams?
<Wanda[cis]>
oh, isn't that actually documented?
<Wanda[cis]>
anyway it's pretty simple
<azonenberg>
AFAIK it's just documented "9k bram init doesnt work" and "the fix is available in newer ISE but isn't compatible with encrypted bitstreams"
<Wanda[cis]>
s6 has 16kbit blockrams, splittable into two 8kbit blockrams
<azonenberg>
i'm not aware of any root cause explained
<Wanda[cis]>
and, like on any other FPGA, uploading BRAM initial contents borrows one of the bram read/write ports
<Wanda[cis]>
it turns out the borrowing logic does not take the split-8kbit configuration into account, and breaks when it is active
<Wanda[cis]>
so it'll mangle the data somehow (I haven't checked how)
<Wanda[cis]>
ISE normally works around it by not turning on the "split into 2×8kbit" bit on the first pass
<Wanda[cis]>
and then overwriting the relevant configuration frames later, after the bram contents are uploaded
<Wanda[cis]>
(this is why the workaround bitstreams are larger than normal)
<azonenberg>
oh interesting
<Wanda[cis]>
but this is not possible with encrypted bitstreams, because encrypted bitstreams for security reasons only allow you one upload pass, from start to finish, in order
<Wanda[cis]>
(encrypted s6 bitstreams that is; v6/v7 encryption works differently)
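The ordering issue described above can be modelled with a toy device. This is a sketch of the sequencing only: `ToyDevice` and its methods are hypothetical, frames are reduced to a single flag, and the "corruption" is a placeholder (the data is simply dropped), since how the real hardware mangles the contents was not checked.

```python
class ToyDevice:
    """Toy model: one split-mode config bit and one BRAM's contents."""

    def __init__(self):
        self.split_8k = False
        self.bram = None
        self.frame_writes = 0

    def write_config_frames(self, split_8k):
        """Stand-in for writing the configuration frames holding the split bit."""
        self.split_8k = split_8k
        self.frame_writes += 1

    def upload_bram(self, data):
        """BRAM init borrows a port; in this model it fails if split mode is on."""
        # Placeholder failure mode: the real corruption pattern is unknown.
        self.bram = None if self.split_8k else list(data)


def naive_load(dev, data):
    """Single pass in order: split bit set first, then BRAM contents."""
    dev.write_config_frames(split_8k=True)
    dev.upload_bram(data)


def workaround_load(dev, data):
    """Two passes: upload with split off, then rewrite the frames to turn it on."""
    dev.write_config_frames(split_8k=False)
    dev.upload_bram(data)
    dev.write_config_frames(split_8k=True)


INIT = [0xDE, 0xAD, 0xBE, 0xEF]
broken, fixed = ToyDevice(), ToyDevice()
naive_load(broken, INIT)
workaround_load(fixed, INIT)
```

The extra `write_config_frames` call in `workaround_load` corresponds to the larger workaround bitstreams, and it is exactly the second pass that a strictly in-order, single-pass encrypted upload cannot perform.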
<azonenberg>
yeah i really do wonder what happened to s6's dev team
<azonenberg>
it was absolutely xilinx's windows ME
<Wanda[cis]>
yes.
<azonenberg>
even right down to "they axed the entire product line and rebuilt all the new low end products as cut down virtex6's"
<Wanda[cis]>
they managed to kill off the spartan line
<Wanda[cis]>
I mean
<Wanda[cis]>
it wasn't really a separate product line for long
<azonenberg>
There's a few products i've been very curious about
<azonenberg>
The XC7A350T, for example
<Wanda[cis]>
other than s6 and sooooomewhat s3, every other spartan has been rebadged something else
<Wanda[cis]>
well, what about it?
<Wanda[cis]>
it got cancelled
<Wanda[cis]>
can't really tell you why
<azonenberg>
yeah thats the thing
<azonenberg>
It never launched. How far did it get? did it tape out? were there bugs? did they nuke it because they didn't want it cutting into kintex's market share?
<azonenberg>
ditto the xc2c1024
<Wanda[cis]>
idk
<Wanda[cis]>
it does have an IDCODE
<azonenberg>
Yeah and it's referenced in some older ISE versions, datasheets, etc
<azonenberg>
it was to be in the same ffg1156 as the 7a200t but pin out the two NC banks
<azonenberg>
there's info about how many luts, ram, etc it was supposed to have
<azonenberg>
idk if there was ever a full P&R DB for it
<Wanda[cis]>
there's still traces of it, and of many other canceled virtex7 devices, in ISE files
<Wanda[cis]>
but not complete
<Wanda[cis]>
the main P&R DBs got cut
<azonenberg>
The one i'm most curious about, though
<azonenberg>
are the BladeRunner and StarFighter CPLD platforms
<azonenberg>
BladeRunner-I was XC2C afaik
<azonenberg>
hence "xbr"
<azonenberg>
BladeRunner-II/III were according to a leaked roadmap to have been 1.5 and 1.2V, likely on UMC 110 and 90nm based on some other sources
<azonenberg>
and then StarFighter I/II were to be 1.8 / 1.5V descendants of XC9500
<Wanda[cis]>
pfft.
<Wanda[cis]>
they killed off the 2.5V XC9500 already
<azonenberg>
There was also a patent i found about a weird FPGA-CPLD hybrid architecture
<azonenberg>
with a 2D routing interconnect like an FPGA
<azonenberg>
but with PLAs instead of LUTs as the basic logic primitive
<Wanda[cis]>
... I think lattice specialised in monstrosities like this
<azonenberg>
it would essentially be a grid of XC2C FBs in a 2D array with FPGA-style pip routing between them
<azonenberg>
i'm really curious how far the internal projects based on that got
<azonenberg>
and why they got killed
<Wanda[cis]>
*shrug* that I have no idea about
<azonenberg>
ah ok same roadmap says BladeRunner-II was to be 150nm, as was StarFighter II
<azonenberg>
launching in 2002-2003
<Wanda[cis]>
heh
<Wanda[cis]>
what I'm curious about is whether fpgacore ever like, existed
<azonenberg>
yeah
<azonenberg>
thats another one i was wondering about
<Wanda[cis]>
that one actually has complete support in released ISE! just kind of... disabled
<Wanda[cis]>
yeah I know it was an IBM thing
<Wanda[cis]>
... in exchange for the PPC cores or something
<Wanda[cis]>
(btw that thing is essentially just a filleted out spartan3)
<azonenberg>
yeah i figured
<azonenberg>
it's probably an xc3s50 sans pad ring pretty much
<Wanda[cis]>
sans pad ring, DCMs, and BRAMs
<azonenberg>
oh, no brams? interesting
<Wanda[cis]>
mhm
<Wanda[cis]>
oh and also BUFGMUXes got downgraded to just plain BUFGs
<azonenberg>
multipliers or no
<azonenberg>
i guess the idea is you have all of the clock tree coming out of the parent asic
<azonenberg>
so that makes some sense
<Wanda[cis]>
no multipliers either (they're closely tied to BRAMs in s3)
<azonenberg>
yeah thats why i asked
<azonenberg>
probably dedicated clock inputs on the periphery not shared with the normal IOs?
<Wanda[cis]>
nope
<Wanda[cis]>
shared with IO just like on plain s3
<azonenberg>
oh interesting
<Wanda[cis]>
the IO cell is special
<Wanda[cis]>
S3 has 3 IOBs per tile, of which (on average) 2.2 are actually connected to pads
<azonenberg>
i would have figured you'd just have like a bunch of pips on the perimeter of the array that would just route signals that would normally go to IOBs to a fixed buffer and an interconnect track on metal 3 or so
<Wanda[cis]>
fpgacore has 4 IPADs and 4 OPADs per IO tile
<azonenberg>
and then the asic integrator would wire that to whatever
<Wanda[cis]>
they have FFs and loopback mode
<azonenberg>
interesting
<Wanda[cis]>
the "pad" being likely a misnomer
<Wanda[cis]>
the thing also has its own JTAG TAP
<Wanda[cis]>
including BSCAN for the "pads"
<azonenberg>
i really wanna get my hands on one of those chips lol
<Wanda[cis]>
... if they exist
<Wanda[cis]>
if you're interested
<azonenberg>
yeah i know
<azonenberg>
also wow my xc3s50a substrate sample is FILTHY