<sajattack[m]1>
hey, I'm a casual fpga toucher, and big rust nerd so this seems neat
<whitequark[cis]>
nice
<whitequark[cis]>
welcome
<whitequark[cis]>
rust seemed like basically the only reasonable choice for starting a new fpga toolchain project in 2025
<sajattack[m]1>
not sure if my matrix client or the server is derping, but I tried to heart-react to that and it doesn't wanna go through 😅
<Wanda[cis]>
we have disabled reactions on this channel because it doesn't translate to IRC (which the channel is bridged to)
<sajattack[m]1>
makes sense
<sajattack[m]1>
so is prjunnamed synthesis and prjcombine pnr or?
<Wanda[cis]>
prjcombine is reverse engineering
<sajattack[m]1>
oh cool
<whitequark[cis]>
<whitequark[cis]> "rust seemed like basically the..." <- obviously neither c nor c++ are suitable for it, and while i would be up for something more esoteric like OCaml, neither the performance characteristics nor the community-wide understanding of it is really quite there
<Wanda[cis]>
prjcombine's goal is to document FPGAs and answer any queries you may have about them (whether you are a person seeking information, or a P&R tool seeking list of available wires and interconnection points), it's highly unopinionated and makes no decisions
<Wanda[cis]>
prjunnamed is actual synthesis and P&R
<Wanda[cis]>
(except we haven't started on P&R just yet)
<sajattack[m]1>
<whitequark[cis]> "obviously neither c nor c..." <- yeah I don't know the direct context of the yosys codebase, but in general it seems very difficult to develop robust systems in c/cpp compared to rust.
<sajattack[m]1>
I was actually chatting with a somewhat new programmer who wants to learn C the other day (mostly just knows python atm), and he didn't seem to be aware of that distinction yet. But maybe he will be pretty fast when he starts 😆
<sajattack[m]1>
I think I'm on whitequark's patreon but I'd have to check tbh
<whitequark[cis]>
yosys is a fairly old codebase (started in 2013) and it really suffers from an idiosyncratic flavor of data structures that is particularly difficult to use correctly or efficiently or concisely
<Wanda[cis]>
to be clear, this project is not "rewrite yosys in rust"; the yosys codebase has much deeper problems, mostly centered around its IR being not fit for purpose
<whitequark[cis]>
i don't think it needs to be that bad, but this is basically a situation where you need to be a world-leading expert at C++ to match the outcome of a completely average Rust programmer in terms of using the resulting abstractions
<whitequark[cis]>
but yes, what Wanda says is correct: we both have a lot of C++ experience and the language alone is not what motivated starting prjunnamed
<sajattack[m]1>
ok thanks for the info
<whitequark[cis]>
it's more that now that we did decide to start it, we obviously did it in rust
<sajattack[m]1>
my most recent fpga creation is a laughably simple and probably not even close to correct MCS6530 RRIOT replacement. It fits in under 200 ice40 LUT4s. And it works well enough for my KIM-1 replica board to boot 😅
<whitequark[cis]>
nice!
<sajattack[m]1>
daglem helped me with the board design and gave some good feedback too
<sajattack[m]1>
anyway I'll stop nattering
<sajattack[m]1>
I was just thinking it might be handy if you need something slightly more complicated than a blinky to poke at, but you probably have lots of stuff yourselves
<Wanda[cis]>
(also we're going to use glasgow applets for testing as soon as we have working memory inference)
<sajattack[m]1>
I made the cpu from nand2tetris before and I think I'm almost ready to try making a bad riscv core 😅
<whitequark[cis]>
nice!!
<sajattack[m]1>
my code for the nand2tetris thing is intentionally super janky because I wanted to build everything as modules based on nand gates as the core intended
<sajattack[m]1>
it's probably a nightmare for synthesis idk
<whitequark[cis]>
oh wait i need to define something
<whitequark[cis]>
okay -DMCS6530_002 did it
<sajattack[m]1>
MOS6530_002=1
<sajattack[m]1>
yeah
<sajattack[m]1>
I'm curious about this async ram thing still. Maybe it's because the verilator reads the framebuffer directly? Or maybe it's a different clock domain
<whitequark[cis]>
by the way, adding an SB_IO that's not constrained in the .pcf file (the dontcare one) makes nextpnr place it at a random pin
<whitequark[cis]>
i had to delete it from the netlist to synthesize it because of some issues with netlist import we have
<whitequark[cis]>
... and i'm not sure where the other SB_IO went
<whitequark[cis]>
but yeah. once we have memory mapping you should be able to test the prjunnamed-based flow
<sajattack[m]1>
cool
<sajattack[m]1>
I'm also using llvm-mos to do an advent of code puzzle in rust on the kim-1, so it would be kinda fun to make the kim-1 rustier
* sajattack[m]1
uploaded an image: since you mentioned I was probably turning luts into ram, I was curious what the quartus build looked like for utilization (though a large part of it might be the mister junk as well)
<widlarizerEmilJT>
<whitequark[cis]> "yosys is a fairly old codebase..." <- The first commit was an import from SVN or CVS with "481 files changed, 54634 insertions(+)" so there might be even deeper history. I peeked at the initial commit before because I've been messing with the oldest parts of yosys the most
<whitequark[cis]>
oh right
<widlarizerEmilJT>
Claire once explained to me that the initial scope was purely for some kind of coarse-grained reconfigurable systems, so I think it's like a huge case of scope creep
<Wanda[cis]>
well
<Wanda[cis]>
it started as a BSc thesis
<mupuf>
Well, that must have been one hell of a good one by my standards
<_whitenotifier-4>
[prjunnamed/prjunnamed] whitequark bfb8bc1 - Fix tests after cf879787a.
<whitequark[cis]>
what's really funny is that it's doing something really silly, like running canonicalize 23 times to fix up 23 OR cells
<whitequark[cis]>
so there's a decent amount of low-hanging fruit in those 1.5s
<widlarizerEmilJT>
sounds extremely wasm-able
<povikMartinPovie>
have you improved the speed of the lut mapper?
<povikMartinPovie>
when I tested it it seemed slow
<whitequark[cis]>
oh yeah it's trivial to wasm. i've already done it
<Wanda[cis]>
mmm, the lut mapper of all things?
<Wanda[cis]>
oh hm.
<whitequark[cis]>
(you were doing cargo run --release, right?)
<Wanda[cis]>
I guess it can get carried away making ridiculously large LUTs if you make it make a mux of three LUT4s
<povikMartinPovie>
well, I blamed the lut mapper but it could have been something else, getting a lut mapping was slower than with Yosys
<povikMartinPovie>
let me try again with --release later :)
<Wanda[cis]>
that will be immediately discarded for not actually fitting in a LUT4
<whitequark[cis]>
debug builds are kinda often slower than -O0 C++ builds because we rely on "zero cost" abstractions a ton
<povikMartinPovie>
ok, sounds like that was probably it
<whitequark[cis]>
it's the #0 question to ask if someone says a rust program is slow
<Wanda[cis]>
... oh nevermind at worst it'll construct a LUT7, that's not particularly bad
<Wanda[cis]>
should maybe be doing something smarter for LUT6 architectures? not sure how to deal with it
<Wanda[cis]>
otoh an ephemeral LUT11 won't be the end of the world either
leocassarani[m] has joined #prjunnamed
<leocassarani[m]>
I was surprised to see so many ✅s next to the Xilinx targets in the prjcombine README, I thought the equivalent of Trellis for Xilinx chips had only achieved a partial reverse engineering of the bitstream. And didn't Xilinx then start coming out with parts with encrypted bitstreams? Bear in mind I've only ever worked with Lattice targets so I might be getting mixed up.
<Wanda[cis]>
encryption on xilinx devices is optional; it's also completely user-controlled, you actually provide the key
<Wanda[cis]>
ie. it's supposed to protect user's secrets, not xilinx'
<Wanda[cis]>
(I have also completely reversed it up to virtex7; it's... not very good, by the way)
<Wanda[cis]>
and as for "equivalent of trellis"... do you mean xray?
<leocassarani[m]>
Yes, xray, sorry I forgot the name
<Wanda[cis]>
yes, it's not a very well designed or managed project. which is why I completely ignored it and decided to make something better.
<leocassarani[m]>
I think I heard a Xilinx engineer on a podcast once boasting that because of bitstream encryption it was going to be impossible to reverse it in future (this was a direct rebuke to Yosys) — I guess he was just talking nonsense
<leocassarani[m]>
This was at least five years ago though
<Wanda[cis]>
that's just nonsense.
<Wanda[cis]>
however.
<Wanda[cis]>
I'd like to note that vivado device database files are encrypted
<Wanda[cis]>
which prevents you from just dumping the database and loading it in your p&r tool
<Wanda[cis]>
unless, say, you think for 3s and notice that the key to these files must obviously be somewhere within vivado
<Wanda[cis]>
so you can just find it and proceed.
<Wanda[cis]>
which is also moot anyway
<Wanda[cis]>
because that's not what Combine does; instead, it relies on blackbox reversing by generating lots of bitstreams and carefully correlating features to bits.
<Wanda[cis]>
encryption doesn't matter; vivado helpfully decrypts itself as it prepares sample bitstreams for us
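The blackbox approach described above (generate many bitstreams, then correlate features to bits) can be sketched roughly as follows. This is only an illustration of the general idea, not prjcombine's actual code: the `Sample` type and `correlated_bits` function are hypothetical names, and a real flow would have to handle multi-valued features, inverted bits, and far larger sample sets.

```rust
use std::collections::BTreeSet;

/// One generated bitstream (hypothetical type): whether the feature under
/// study was enabled in the design, and which bit positions came out set.
struct Sample {
    feature_on: bool,
    set_bits: BTreeSet<usize>,
}

/// Bits that are set in every sample with the feature enabled
/// and in no sample with the feature disabled.
fn correlated_bits(samples: &[Sample]) -> BTreeSet<usize> {
    let mut with = samples.iter().filter(|s| s.feature_on);
    // Start from the bits of the first "feature on" sample.
    let mut candidates = match with.next() {
        Some(first) => first.set_bits.clone(),
        None => return BTreeSet::new(),
    };
    // Keep only bits that are set in all other "feature on" samples.
    for s in with {
        candidates = candidates.intersection(&s.set_bits).copied().collect();
    }
    // Drop bits that also show up when the feature is off.
    for s in samples.iter().filter(|s| !s.feature_on) {
        candidates = candidates.difference(&s.set_bits).copied().collect();
    }
    candidates
}

fn main() {
    // Made-up bit positions, for illustration only.
    let samples = vec![
        Sample { feature_on: true, set_bits: BTreeSet::from([1, 5, 9]) },
        Sample { feature_on: true, set_bits: BTreeSet::from([5, 9, 12]) },
        Sample { feature_on: false, set_bits: BTreeSet::from([9]) },
    ];
    println!("{:?}", correlated_bits(&samples));
}
```

Here bit 9 is discarded because it is also set with the feature off, and bits 1 and 12 are discarded because they do not appear in every sample with the feature on. None of this ever looks inside an encrypted file, which is the point being made above.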
<leocassarani[m]>
lol, that's great
<leocassarani[m]>
My impression of project xray was that they'd hit some kind of wall and there were parts of the bitstream they hadn't been able to decode (but again that might be a misunderstanding of the situation)
<leocassarani[m]>
It sounds like you've already gone further with combine?
<Wanda[cis]>
we have complete virtex7 bitstream information, yes.
<Wanda[cis]>
and, for that matter, every xilinx fpga before it too
<Wanda[cis]>
what we're missing is timing.
<Wanda[cis]>
here I'm actually considering dumping the vendor database directly btw
<leocassarani[m]>
I was going to say, do you need to generate lots of bitstreams and then see what Vivado's timing analysis says?
<leocassarani[m]>
But it sounds like that's not the way to go
<Wanda[cis]>
because the complexity can be intractable otherwise
<Wanda[cis]>
yeah that was the original plan
<Wanda[cis]>
but then I looked closely and understood how the timings work
<Wanda[cis]>
computed via a pretty complex model from capacitance/resistance numbers for every piece of wire etc.
<Wanda[cis]>
hm
<Wanda[cis]>
hold on let me go downstairs and grab a proper keyboard
<leocassarani[m]>
Presumably it was easier to get timing for say, a up5k, because the device database is freely available?
<Wanda[cis]>
okay so.
<Wanda[cis]>
yes and no, actually
<Wanda[cis]>
the thing with up5k is that it has a very simple timing model
<Wanda[cis]>
a path delay is just a sum of predefined segment delays, and there's not many of those either
<Wanda[cis]>
the icecube software also just gives you the raw data in detail and it's kind of obvious how to match it up to wires
<Wanda[cis]>
as in, when you back-annotate the design with sdf, it's trivial to extract the segment delays
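A minimal sketch of the additive model described above, where a path delay is just the sum of fixed per-segment delays. The segment names and picosecond values are invented for illustration and are not real iCE40 data; `path_delay_ps` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Delay of a routed path under a purely additive model: the sum of the
/// fixed delays of the segments it traverses.
/// Returns None if any segment is missing from the table.
fn path_delay_ps(segment_delay_ps: &HashMap<&str, u32>, path: &[&str]) -> Option<u32> {
    path.iter()
        .map(|seg| segment_delay_ps.get(seg).copied())
        .sum()
}

fn main() {
    // Hypothetical segment names and delays, for illustration only.
    let table = HashMap::from([("lut_out", 100), ("span4", 250), ("local", 150)]);
    let path = ["lut_out", "span4", "local"];
    // 100 + 250 + 150
    println!("{:?}", path_delay_ps(&table, &path)); // prints Some(500)
}
```

Because every segment contributes independently of everything else in the design, each entry in the table can be recovered one at a time from back-annotated delays.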
<Wanda[cis]>
you cannot do that with xilinx because of two reasons
<Wanda[cis]>
one, it doesn't give you the per-segment delays or the raw parameters it computed them from
<Wanda[cis]>
second, the timing model is way more complex and involves computing delay propagation through RC trees
<Wanda[cis]>
if you attempt to model it as a sum of delay segments and blackbox-reverse it that way, your algorithm will just either diverge or get a bad approximation
<Wanda[cis]>
a given path from point A to point B can have a varying delay depending on what additional fanout you have branching out and where
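To make the fanout effect concrete, here is a small sketch using the textbook Elmore delay on an RC tree. This is an assumption chosen for illustration: nothing above says the vendor model is Elmore delay, only that it propagates delay through RC trees. The `Node` layout (parents stored before children) and all values are hypothetical.

```rust
/// One node of an RC tree; node 0 is the driver (root).
/// Assumption: every parent index is smaller than its child's index.
struct Node {
    parent: Option<usize>, // None for the root
    r_ohm: f64,            // resistance of the wire from the parent to this node
    c_ff: f64,             // capacitance lumped at this node, in femtofarads
}

/// Elmore delay from the root to `sink`, in femtoseconds: for every wire on
/// the path, its resistance times all the capacitance downstream of it.
fn elmore_delay(nodes: &[Node], sink: usize) -> f64 {
    // Total capacitance of each subtree. Children come after parents, so a
    // reverse sweep folds each child's total into its parent.
    let mut c_sub: Vec<f64> = nodes.iter().map(|n| n.c_ff).collect();
    for i in (1..nodes.len()).rev() {
        if let Some(p) = nodes[i].parent {
            c_sub[p] += c_sub[i];
        }
    }
    // Walk from the sink back up to the root, accumulating R * C_downstream.
    let mut delay = 0.0;
    let mut cur = sink;
    while let Some(p) = nodes[cur].parent {
        delay += nodes[cur].r_ohm * c_sub[cur];
        cur = p;
    }
    delay
}

fn main() {
    // Hypothetical values: ohms and femtofarads, so delays are in femtoseconds.
    let mut nodes = vec![
        Node { parent: None, r_ohm: 0.0, c_ff: 0.0 },      // driver (point A)
        Node { parent: Some(0), r_ohm: 100.0, c_ff: 1.0 }, // branch point
        Node { parent: Some(1), r_ohm: 100.0, c_ff: 1.0 }, // sink (point B)
    ];
    println!("no extra fanout: {} fs", elmore_delay(&nodes, 2));
    // Add a side branch at the branch point; the wires from A to B are unchanged.
    nodes.push(Node { parent: Some(1), r_ohm: 50.0, c_ff: 2.0 });
    println!("with side branch: {} fs", elmore_delay(&nodes, 2));
}
```

With these made-up numbers the A-to-B delay goes from 300 fs to 500 fs although no wire on the path changed, because the first segment now also has to charge the side branch. A per-segment sum has no way to express that.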
<leocassarani[m]>
Yeah that sounds like a nightmare
<Wanda[cis]>
so. how to deal with this.
<Wanda[cis]>
for the timing database I believe you should copy the vendor's timing models as closely as possible
<widlarizerEmilJT>
That's an interesting problem
<Wanda[cis]>
because of the fundamental fact that this is what the devices are factory-tested against
<Wanda[cis]>
you cannot possibly get a better approximation of how the device works because you're not the one doing binning at the factory
<Wanda[cis]>
any differences between your db and vendor db would necessarily have to be conservative approximations. which is generally very much not in your favor.
<Wanda[cis]>
hence why I'm considering just extracting the timing database wholesale
<Wanda[cis]>
for virtex7 the situation is really funny because vivado allows you to just... ask for all internal timing parameters via the tcl API and get the raw values
<Wanda[cis]>
the very same speed database is heavily encrypted on disk btw.
<Wanda[cis]>
completely pointless move
<leocassarani[m]>
incredible
<Wanda[cis]>
for pre-virtex7 you have to rely on ISE, which... well, will not let you ask for speed model parameters
<Wanda[cis]>
the speed database on disk is likewise encrypted, but ... not heavily so
<Wanda[cis]>
actually it's kind of a joke.
<Wanda[cis]>
I'm still not completely sure whether I want to touch it
<Wanda[cis]>
but due to the nature of speed data, the best I could do is literally extracting the exact same database, just via more roundabout means
<leocassarani[m]>
Is there a risk that they would consider it copyrighted information? (cue DMCA takedown)
<Wanda[cis]>
interesting question, isn't it
<Wanda[cis]>
as far as people I've talked to can tell, it's a database of factual information and not really copyrightable
<Wanda[cis]>
but you know. not a lawyer.
<leocassarani[m]>
Weird though isn't it, that for example maps (as in, maps of the world) are copyrightable even if they simply describe facts on the ground
<Wanda[cis]>
(I believe one of them actually did hire a lawyer; how competently they actually described the situation to the lawyer, however, is another question)
<leocassarani[m]>
Do you have any specific plans for what prjunnamed's P&R (including timing analysis) will look like? Or do you plan to rely on nextpnr for a long time before tackling that?
<Wanda[cis]>
we actually want to bootstrap it rather soon.
<Wanda[cis]>
our main goal right now is to basically get the entire skeleton of the toolchain in place, so we can validate the entire flow, and more importantly validate that our own internal structures are up to the task
<leocassarani[m]>
Exciting!
<Wanda[cis]>
the quality of the result (in the sense of performance metrics) is secondary, we're mostly concerned about not locking ourselves into suboptimal high-level design decisions
<Wanda[cis]>
there's a minor snag in that I didn't get prjcombine siliconblue bindings ready in time for the prjunnamed kickoff, and we'll probably get blocked on that for a few days or so
<Wanda[cis]>
we also considered linking in icestorm instead, but... that'd have problems that would way outweigh the small delay
<leocassarani[m]>
I guess by comparison to Yosys, nextpnr is a much younger codebase, and it was based on the lessons learned from arachne-pnr, does that mean it's closer to the model you'd like to see in prjunnamed?
<Wanda[cis]>
one thing Cat suggested, which I think may be a good idea, is to start the entire P&R part from the timing analyzer
cr1901 has joined #prjunnamed
<Wanda[cis]>
ignore the actual P&R for the start
<Wanda[cis]>
just load P&Red netlists from nextpnr and produce timing data
<Wanda[cis]>
see, one of nextpnr's major flaws is a rather simplistic timing analyzer and timing constraint model
<Wanda[cis]>
this is something we very much want to get right in the design
<leocassarani[m]>
Yeah I can definitely relate to that as a nextpnr user
<Wanda[cis]>
leocassarani[m]: about that: closer than what? than yosys?
<Wanda[cis]>
maybe
<leocassarani[m]>
That is what I was asking, yeah
<Wanda[cis]>
the IR it uses is also questionable, but not nearly as much as RTLIL; I believe Myrtle has expressed interest in replacing it with something unnamed-like in the past
<Wanda[cis]>
from my experience of making a nextpnr backend, one of the main pain points we'd like to fix is the architecture interface
<Wanda[cis]>
though nextpnr itself is already making some progress on it