<cr1901_>
Why isn't the harvester crate called "sickle"?
cr1901_ is now known as cr1901
Wanda[cis] has joined #prjcombine
<Wanda[cis]>
I don't think they make combine sickles
whitequark[cis] has joined #prjcombine
<whitequark[cis]>
... "combine harvester"
<whitequark[cis]>
oh my god.
<whitequark[cis]>
that's so upsetting.
<Wanda[cis]>
that has always been the intended reading of the project name!
<cr1901>
I thought it a play on communism- hammer and sickle
<Wanda[cis]>
well. either that or as a HL2 reference, if you prefer to think of me as an evil alien hell-bent on assimilating everything around me.
<Wanda[cis]>
which, fair
<cr1901>
Never played it
<cr1901>
Issue #1 says to document them eventually; what's the short version of "how is the RE job split between hammer and harvester?"
<Wanda[cis]>
they're two completely separate approaches to RE
<Wanda[cis]>
hammer relies entirely on controlling every single feature of the bitstream, down to individual routing pips
<Wanda[cis]>
it's the first one I designed, 5 years ago or so (a few rewrites of this project ago), and it is what the ISE and XACT reversing code in prjcombine is based on
<Wanda[cis]>
unfortunately, it has two fundamental flaws: first, it is entirely incapable of dealing with toolchains that simply don't allow this kind of control (or where it's prohibitively hard); second, it is often very tricky to design hammer samples that pinpoint the exact thing you're trying to reverse without dragging a bunch of other stuff into the bitstream diff; this is particularly true for toolchains with strong required DRC (XACT already gave me a lot of trouble here; Vivado ... I believe would be theoretically possible to handle, but would be even worse to deal with)
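The hammer idea described above boils down to comparing a baseline bitstream against one where exactly one feature was switched on. A minimal sketch of that diff step follows. It is an illustration only, not prjcombine-hammer's code, and the byte values and the name `with_pip` are made-up assumptions.

```python
def bitstream_diff(base: bytes, sample: bytes) -> set[int]:
    """Return the positions of all bits that differ between two bitstreams."""
    if len(base) != len(sample):
        raise ValueError("bitstreams must have the same length")
    diff = set()
    for byte_index, (a, b) in enumerate(zip(base, sample)):
        changed = a ^ b  # a 1 marks every bit that flipped in this byte
        for bit in range(8):
            if changed & (1 << bit):
                diff.add(byte_index * 8 + bit)
    return diff


# Hypothetical data: a baseline, and the same design with one pip enabled.
base = bytes([0b0000_0000, 0b0001_0000])
with_pip = bytes([0b0000_0100, 0b0001_0000])

pip_bits = bitstream_diff(base, with_pip)
```

The second flaw mentioned above shows up here directly: if the toolchain changes anything else in order to accept the sample, those bits land in the same diff and cannot be told apart from the pip's own bits.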
<Wanda[cis]>
so, I decided to design an alternate approach, capable of reversing more targets, and the result is called harvester
<Wanda[cis]>
it relies on generating mostly-random crap, letting vendor P&R do whatever it wants with it, and then just using the resulting placement and routing information to correlate bits within the bitstreams
<Wanda[cis]>
the only requirement on the toolchain becomes that you have to be able to extract the final routing from it, you don't have to control it
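One way to picture the correlation step: every sample gives a set of features the vendor router happened to use and a set of bits that ended up set. A feature's candidate bits are then the bits present in every sample that uses it and absent from every sample that does not. The sketch below assumes this simple set-based rule. The actual prjcombine-harvester algorithm is not described in this discussion, and the feature names and bit numbers are hypothetical.

```python
def correlate(samples):
    """Map each feature to the bits that track it across all samples.

    samples: list of (features, bits) pairs, where features is a set of
    feature names extracted from the tool's routing output and bits is
    the set of bit positions set in the resulting bitstream.
    """
    all_features = set().union(*(feats for feats, _ in samples))
    result = {}
    for feature in sorted(all_features):
        with_f = [bits for feats, bits in samples if feature in feats]
        without_f = [bits for feats, bits in samples if feature not in feats]
        # Bits common to every sample that contains the feature...
        candidate = set.intersection(*with_f)
        # ...minus anything that also shows up when the feature is absent.
        for bits in without_f:
            candidate -= bits
        result[feature] = candidate
    return result


# Hypothetical samples: no sample isolates a single pip.
samples = [
    ({"pipA", "pipB"}, {1, 5, 9}),
    ({"pipA", "pipC"}, {1, 7}),
    ({"pipB", "pipC"}, {5, 9, 7}),
]

feature_bits = correlate(samples)
```

None of the three samples was designed to isolate anything, yet each pip's bits fall out once enough overlapping samples exist. That matches the trade-off in the discussion: no control is needed, but you wait for coverage.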
<cr1901>
>often very tricky to design hammer samples <-- For trellis, we manually edited NeoCAD files, which Diamond happily ingested and emitted a bitstream. The main issue was figuring out what the valid NeoCAD fields were :P
<cr1901>
^That sounds like hammer
<Wanda[cis]>
it also has its downsides. it's slower, because you don't get to just directly ask for the bits you want, you have to wait until they fall out from the router by chance.
<Wanda[cis]>
and you sometimes have to manually nudge it into giving you what you want anyway
<Wanda[cis]>
but after getting really frustrated with XC4000 interconnect reversing, I believe this should be the way to go for further RE work going forward
<cr1901>
I'm poking around and reading the src to see how MachXO2 RE mk II would work
<Wanda[cis]>
the harvester approach is still basically in testing, by its first target (prjcombine-siliconblue), which was explicitly picked as something easy that won't require me dealing with ridiculously large bitstreams
<Wanda[cis]>
the core works perfectly, but I still need to work on diagnostics when something goes wrong
<cr1901>
Maybe gatecat or you know other ways, but getting the routing info in plaintext from Diamond requires a NeoCAD parser. It was presumably easier to write NeoCAD by hand until Diamond didn't bitch, and then correlate bitstreams for each pip, non-routing config, LUT vals, etc.
<Wanda[cis]>
what exactly is a "NeoCAD parser"? NeoCAD is a toolchain, not an interchange format
<cr1901>
"The file format that original NeoCAD toolchain that Diamond is derived from used"
<cr1901>
The NC is for NeoCAD, and there's an extra "L"
<cr1901>
so I call it "NeoCAD" for short
<Wanda[cis]>
the plan for prjcombine-lattice is exactly to parse ncl and use harvester.
<whitequark[cis]>
<cr1901> "Maybe gatecat or you know..." <- ki have a neocad netlist parser on my hard drive
<whitequark[cis]>
s/ki/i/
<whitequark[cis]>
it supports specifically Diamond netlists
<whitequark[cis]>
it's written in C++ however but it has all the RE bits you could wish for
<cr1901>
Ahh, well at least that work doesn't need to be done then :P
<cr1901>
whitequark[cis]: Can you post a link so I could look at it?
<whitequark[cis]>
no, for... reasons
<whitequark[cis]>
I can however DM you an archive of it
<cr1901>
Ahhhh... I understand.
<whitequark[cis]>
the reasons are entirely personal, the code itself isn't private. long story
<cr1901>
Yea, no worries, I left my lone q in DM
<Wanda[cis]>
alright, so back to this discussion
<Wanda[cis]>
<cr1901> ">often very tricky to design..." <- yup that sounds like the hammer approach, though perhaps without all the massive batching stuff that makes hammer fast
<Wanda[cis]>
so I was originally planning to write prjcombine-diamond-hammer as the second target right after ISE; there's even some initial work in the repo for it
<Wanda[cis]>
this was motivated exactly by Diamond 1) allowing this kind of control via ncl, 2) allowing you to just dump the complete interconnect database via the tcl api, just like ISE does, 3) being otherwise similar to ISE due to shared neocad ancestry
<Wanda[cis]>
point 2) is also kind of important for prjcombine-hammer: you don't just need to have control, you also have to know what to aim for
<Wanda[cis]>
but then I have decided that doing this as the second target because it's easy is the obviously wrong decision
<Wanda[cis]>
instead, I should make sure that prjcombine is capable of working with diverse toolchains before cementing the existing design even harder than it already is (by 77kLOC of prjcombine-ise-hammer)
<Wanda[cis]>
siliconblue is the perfect target. it fails all of criteria 1-3, while also being small and already mostly reversed, so I could just look up what the results should be and focus on developing the actual RE approach
<Wanda[cis]>
(well, and also prjcombine-xact-hammer became the actual second FPGA target after I needed a distraction while sleep deprived and on a train)
<Wanda[cis]>
other targets that have been considered for developing prjcombine-harvester were Vivado (which was another obvious next target, given where prjcombine's supported device list currently ends...) and Quartus.
<Wanda[cis]>
I rejected Vivado because it still would be "too easy" by allowing you to dump the interconnect database (prjcombine already has the full interconnect database for ultrascale, in fact) and being similar to ISE, and also because developing core RE methodology while dealing with a bitgen that takes minutes and produces hundreds of megabits of bitstreams would be incredible suffering
<Wanda[cis]>
and I rejected Quartus because, once started, that could easily tie me up in an incredibly long and complex RE campaign with a toolchain and vendor that are batshit insane
<Wanda[cis]>
I'll deal with it, one day. but it'll take an intentional decision to deal with Quartus and all the devices it supports, it's not something to be casually done on the way to a more interesting goal
<Wanda[cis]>
and now that prjcombine-harvester exists, I believe it should be used for all new targets by default, even when you have the level of control that ncl allows. that is, unless something weird comes up that requires a different approach.
<Wanda[cis]>
there is, by the way, a particular reason why I am somewhat distrustful of XDL- or ncl-based stuff
<Wanda[cis]>
you use xdl/ncl so you can have low-level control over the netlist, yes?
<Wanda[cis]>
the primitive attributes correspond more closely to the bitstream than verilog does, after all
<Wanda[cis]>
but that is dangerous.
<Wanda[cis]>
you run a serious risk of missing something about how verilog corresponds to xdl/ncl
<Wanda[cis]>
an approach with Verilog at the frontend lets you verify the whole thing end-to-end, something that xdl/ncl-based reversing is not really capable of
<Wanda[cis]>
I was worried enough about this problem that prjcombine has a completely separate step where it fuzzes the verilog to xdl transformation and verifies the results are as expected. and this step has indeed found a few surprises.
<Wanda[cis]>
this is actually one of the core problems faced by prjcombine: it is incredibly easy to miss little details when you're operating on this scale of scope and automation. this is why I try to design every step with cross-checking in mind, as much as is practical.
<Wanda[cis]>
so one obvious reason why prjcombine-hammer massively batches samples together into shared bitgen runs is performance. however, the other, equally important reason is that this allows me to detect when two features turn out to be not as independent as I had assumed. the hammer core explicitly includes redundancy and randomization to make it more likely that it'll blow up in this case.
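The cross-checking role of batching can be sketched as follows: put each feature into several batches with different random companions, solve for per-feature bits under the independence assumption, then check that every batch is fully explained by the union of its features' bits. This is a toy model, not the hammer core. `make_batches`, `solve`, `find_conflicts`, `fake_bitgen` and the hidden interaction between `a` and `b` are all hypothetical.

```python
import random


def make_batches(features, copies, batch_size, seed):
    """Put every feature into `copies` batches with random companions."""
    rng = random.Random(seed)
    batches = []
    for _ in range(copies):
        shuffled = list(features)
        rng.shuffle(shuffled)
        for i in range(0, len(shuffled), batch_size):
            batches.append(set(shuffled[i:i + batch_size]))
    return batches


def solve(batches, observed):
    """Assume independence: a feature's bits are those common to all its batches."""
    features = set().union(*batches)
    return {
        f: set.intersection(*(bits for b, bits in zip(batches, observed) if f in b))
        for f in features
    }


def find_conflicts(batches, observed, feature_bits):
    """Return indices of batches not explained by the union of their features' bits."""
    conflicts = []
    for index, (batch, bits) in enumerate(zip(batches, observed)):
        predicted = set()
        for feature in batch:
            predicted |= feature_bits[feature]
        if predicted != bits:
            conflicts.append(index)
    return conflicts


# Hypothetical stand-in for a vendor bitgen run with a hidden interaction:
# enabling "a" and "b" together sets an extra bit that neither sets alone.
TRUE_BITS = {"a": {0}, "b": {1}, "c": {2}, "d": {3}}


def fake_bitgen(batch):
    bits = set()
    for feature in batch:
        bits |= TRUE_BITS[feature]
    if {"a", "b"} <= batch:
        bits.add(9)
    return bits


# Hand-picked batches so the example is deterministic; each feature appears
# twice with a different companion, which make_batches would do at random.
batches = [{"a", "b"}, {"c", "d"}, {"a", "c"}, {"b", "d"}]
observed = [fake_bitgen(b) for b in batches]
solved = solve(batches, observed)
conflicts = find_conflicts(batches, observed, solved)
```

With only the first two batches, bit 9 would have been silently attributed to both `a` and `b`. The redundant batches are what make the inconsistency visible.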
<Wanda[cis]>
this is also why I don't consider any database included in prjcombine to be at all reliable until it has been properly verified in-hardware and the results documented. unfortunately, I am still not quite sure of the details of how to do that verification, particularly at scale.
<Wanda[cis]>
I may be saying that "I have extracted all bitstream bits that can be extracted from ISE", and it is the literal truth for the most part, but that's... not quite a gold standard
<Wanda[cis]>
there's a bunch of bits that I found completely incomprehensible even though I understand their verilog-to-xdl and xdl-to-bitstream mapping behavior, and I basically decided that the only way to deal with them is to just hook up a scope to the device and start poking at things
<Wanda[cis]>
the usual suspects being I/O tiles. or whatever the hell is going on with Spartan 6 clock distribution.
<cr1901>
Gimme a few mins to read please :)
* cr1901
is eating, so can't focus on reading :(
<cr1901>
>an approach with Verilog at the frontend lets you verify the whole thing end-to-end <-- so, FWIW, trellis _does_ use Verilog for fuzzing I/O standards and a few other things. The minitests directory was for doing Verilog to ncl tests to get a feel for what template Verilog/NCL file will extract the most information 1/2
<cr1901>
Unfortunately, as you can prob guess, I/O standards are enough of a clusterf*** already and they are somewhat handwaved in Trellis (as in, I only have coarse "set these bits for this exact I/O standard". I don't know how those bits are split into further groups)
<Wanda[cis]>
I have actually managed to split them for the most part for ISE, mostly with incredible and manually-applied violence
<cr1901>
If harvester can be made to work on Diamond, that's great! I'm not attached to any particular way of fuzzing things. I'm mostly thinking out loud today :D
<Wanda[cis]>
I/O tile reversing code tends to be the absolute worst
<Wanda[cis]>
(said violence has involved a scope on a few occasions)
<cr1901>
("Given that I've done MachXO2 REing, and I understand mostly* how Trellis works, how can I apply those skills to help combine when the time comes, _without_ getting too deep into the weeds and burning out again?")
<Wanda[cis]>
"again", eh
<cr1901>
Doing the nextpnr port took everything I had. I desperately wish it didn't take that much effort/out of my comfort zone, but I wanted nothing to do with FPGA code for a decent period after the port was done. That's burnout, I think.
<Wanda[cis]>
oh hey that's pretty much exactly what happened when I was doing a spartan6 nextpnr port
<Wanda[cis]>
I have since concluded that this means the proper solution to the problem is to just reverse things, skipping the "write the nextpnr backend afterwards" part
<Wanda[cis]>
worked well so far
<cr1901>
When I go back to MachXO2, it helps me to focus on the things I enjoy most on the chip. Mostly playing with the UFM (User Flash Memory)
<Wanda[cis]>
... I wonder how bad prjunnamed-pnr is going to be
<Wanda[cis]>
well
<Wanda[cis]>
I guess we'll see what rule #0 is made of
<cr1901>
Hey, I'm taking a look at the code and chatting when my bandwidth permits :P. That's progress :D!
<cr1901>
(Re: mostly*- I never had to touch the actual 'find the differing bits' logic in Trellis. That stuff was rock-solid for getting MVP. So I never really looked at it. I'm sure I'd be fine after a few hours.)
<whitequark[cis]>
<Wanda[cis]> "... I wonder how bad prjunnamed..." <- i think it'll be fine, probably
<Wanda[cis]>
probably
<Wanda[cis]>
we just have a goal of designing a P&R flow competitive with vivado
<Wanda[cis]>
no big deal
<whitequark[cis]>
yeah!
<Wanda[cis]>
now here's where I'd say something rude, except I'm still not over that part where we just sat down for a month and wrote a synthesis tool
<whitequark[cis]>
nice
<whitequark[cis]>
i think we'll hit scaling issues but i also think we'll be in a better position to resolve them than nextpnr
<Wanda[cis]>
as in, when we actually hook up larger devices?
<whitequark[cis]>
yeah
<Wanda[cis]>
mmm, I'm not even worried about scaling issues, I'm worried about handling sparse interconnect
<whitequark[cis]>
i think you've already effectively started working on prjunnamed's P&R (although not `prjunnamed-pnr`) because the choices in database format are pretty significant
<whitequark[cis]>
Wanda[cis]: right
<whitequark[cis]>
I mean I think of that as a sort of scaling issue because bigger FPGA families have sparser interconnect
<Wanda[cis]>
I guess
<whitequark[cis]>
I am quite curious to see how this works out
<Wanda[cis]>
same
<whitequark[cis]>
either way I don't feel like there are unresolvable problems there since FPGA P&R tools, empirically, exist
<whitequark[cis]>
not only that but there exists a vast and diverse amount of them, notwithstanding how many of them are just NeoCAD again and how many of them are terrible
<Wanda[cis]>
whitequark[cis]: it didn't really feel like that at the time; I've always been optimizing for deduplication and understandability of the resulting database, not any sort of fitness for P&R purposes
<whitequark[cis]>
but that's a part of it!
<whitequark[cis]>
I think one of our bigger risks is that our poor database design (well, if it would be poor) crushes our ambition
<whitequark[cis]>
basically nobody seriously makes tools that are portable across FPGA families
<Wanda[cis]>
nobody ever really did that, yeah
<Wanda[cis]>
neocad maybe?
<whitequark[cis]>
i wonder if they did or if it was more like nextpnr where you'd use separate backends
<Wanda[cis]>
shrug
<Wanda[cis]>
there are per-family .sos
<Wanda[cis]>
but then, unclear how much there is in them, I've never bothered looking
<Wanda[cis]>
it's not like we'll avoid target-specific code either