this is actually why I'd rather not have any backends that are not based on prjcombine; having a unified database format is a core part of the plan
Yeah, makes complete sense
after the Xilinx saga and the current (unpublished) siliconblue work, I'm reasonably confident prjcombine can deal with pretty much anything we'd want to include
(that and also I've looked into Gowin and other Lattice devices closely enough to be reasonably sure it won't be a problem)
anyway. possibly the most core issue we want to fix with nextpnr is that it's a separate tool at all, with a separate IR and all it implies.
having an integrated toolchain allows you to do things that are simply not possible with separate synthesis and P&R tools
like, say, resynthesis. pick the area that failed timing, and apply some extra-strength expensive optimizations to it post-factum.
Yeah and potentially much better diagnostics? Whenever my design fails timing it's a bit of a nightmare figuring out exactly why (relating it back to the Verilog)
oh that's kind of a yosys problem actually
it... does an absolute mess of naming cells
Yes I've noticed :D
RTLIL has this fascinating property
every cell and wire has to have a name. exactly one.
of course, there is not really a meaningful name to pick when you're in the middle of some advanced logic optimization pass that makes a complete hash out of the original netlist, and writing yosys passes is hard enough without having to worry about naming every individual cell you emit, so in practice the stuff that ends up in the name is an absolute incomprehensible mess.
the abc mapper basically just nukes names and assigns completely random ones later.
the very first thing we did when designing prjunnamed IR is throwing out the names and just identifying every single cell by its position in the netlist. by which I mean a literal index within the array.
and we just got rid of wires entirely.
hence the project name.
oh that had completely escaped me
names are powerful things, and they are to be assigned with intent.
we just treat them as separate things that can be attached to cells or values. or not. or maybe you can attach several distinct names to a cell, because of aliasing.
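A minimal sketch of the idea described above: cells are identified purely by their index in an array, and names are optional annotations kept in a side table. The `Cell`, `Design`, `attach_name` and `names_of` names here are hypothetical and are not the actual prjunnamed API.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified cell kinds (not the real cell set).
/// Operands refer to other cells by their index in the design.
#[derive(Debug, Clone, PartialEq)]
pub enum Cell {
    Input,
    Not(usize),
    And(usize, usize),
}

/// A design is just an array of cells; a cell's identity is its index.
#[derive(Debug, Default)]
pub struct Design {
    cells: Vec<Cell>,
    /// Names live beside the netlist. A cell may have zero, one,
    /// or several names (aliases).
    names: HashMap<usize, Vec<String>>,
}

impl Design {
    /// Appends a cell and returns its index, which is its only identity.
    pub fn add_cell(&mut self, cell: Cell) -> usize {
        self.cells.push(cell);
        self.cells.len() - 1
    }

    /// Attaches one more name to a cell; existing names are kept.
    pub fn attach_name(&mut self, cell: usize, name: &str) {
        self.names.entry(cell).or_default().push(name.to_string());
    }

    /// Returns all names attached to a cell (possibly none).
    pub fn names_of(&self, cell: usize) -> &[String] {
        self.names.get(&cell).map(|v| v.as_slice()).unwrap_or(&[])
    }
}
```

An optimization pass written against something like this never has to invent a name: it just pushes cells, and only passes that actually know a meaningful name attach one.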
So your IR is in the netlist crate, right?
we believe the set of cells we have is reasonably good, and won't change all that much
(well we want to add print cells and assert cells and some other stuff like NIR has, but the core synthesizable cells are there)
however, we haven't implemented a proper metadata system yet
which is also planned to be a core part of the IR
stuff like source locations, scope information, and proper hierarchical names
Yeah, looks really interesting so far
Thanks for explaining it all, it's very exciting
you're welcome
it's nice to finally put down this stuff in writing
that project has just been in my and Cat's heads for like 3 years
<Wanda[cis]> "hence the project name." <- Ohhhhh.
Are you interested in eventually supporting cyclone V, and if yes, what should it look like? I'd like to help converting the information I plonked in mistral into however you'd like it to be
cyclone V and altera in general is very much in-scope, eventually
and I'd like to implement that by writing a proper Quartus backend for prjcombine, and reversing every single device altera ever made in one fell swoop
should stuff go into prjcombine?
but also
right now we are not that interested in having altera as a target
we're only interested in particular targets right now as a tool to validate the core flow actually works
siliconblue has always been the obvious place to start. the second obvious target is xilinx, particularly given the nice database I have laying around
these two should give us more than enough things to work on
some peculiarities of quartus: they have two backends for timing. One does a proper model of wires, capacitances and stuff and uses step integration to compute the actual voltage change shapes and extracts timings from that (quartus_sta). The other has gigantic tables of timings for every wire and is used for P&R
however. there is another major problem that makes us very uneasy with implementing xilinx or altera support.
cranky lawyers?
no, worse
consider what happens to this project the second it becomes known we have full xilinx flow.
I'd suspect the answer is "nothing much" unless vivado is really expensive?
quartus for cyclone v is free
have you seen the shit people will put up with to get open-source xilinx flows?
I haven't, is it bad?
symbiflow. prjxray.
right now prjunnamed, if we were to get P&R and finish synthesis for siliconblue, is just a funny alternative to the existing established yosys toolchain, and when things don't work we just say it's experimental and people come back a few months later or something
if we were to get P&R for xilinx, prjunnamed would instantly become *the* highest-quality open-source toolchain for a bunch of devices
and it's a little early for that?
I'd say nobody cares about cyclone V except there's mister and analogue pocket
so open source people with the associated... enthusiasm
do you use a specific devboard for sb testing?
this is one of these questions that matters not one bit
i just grab something out of the pile of boards we have
and a pile of rtl i have
a mature toolchain shouldn't be tied to a singular device
by the way, one of the reasons I've always disliked so many open-source reversing projects is the weird focus on particular devices.
so we might as well start on that from the beginning
ah ok :-) I don't have a pile of boards, at least not ice* ones
this is actually a core design principle of prjcombine
i have several dozen boards. i don't even remember all of them
i can borrow a kintex ultrascale also
make this stuff capable and automated enough that it'll just blow straight through the entire device list
thats the highest one ive touched
and it worked. the prjcombine support list starts at XC2000 and ends at Virtex 7.
Not sure how one automates RE-ing the bitstream format
that's fairly straightforward?
(framing, checksums, all that stuff)
oh, that
(compression too)
oh, you don't automate all of it
ok then
that doesn't vary too much between devices
you just make a framework that makes it easy
it varies quite a lot from an altera device series to another for some reason
I mean from cyclone V to IV to X, etc
not within cyclone V of course
but also, about the ice40 board
there is actually one that we'd be particularly interested in supporting
you may have heard of it
glasgow? ;-)
As it happens I have one just there connected to various points into a yamaha mu100b
Any plans for managing passes?
you'd have to be a little more specific
i feel like we should be offering our users the same amount of managing passes as, say, clang
which is to say that most people should not even think about it
I guess I'm interested in what you think about user-customizing flows vs baking a happy path
oh that.
yes, then I believe Cat put it into words very well
i'm actually fairly comfortable saying "if you want to customize the flow build it from source" although we'll eventually have some sort of extension language one way or another
of course we'll still have like, target options.
I consider yosys exposing its guts at the slightest provocation to be one of its worst design features
exhibitionist software design
it would be nice at some future point to have all of it as a python library you could import in an amaranth program
it results in no clear delineation of what is the public API and what is not, resulting in it being absolutely not possible to change anything for the fear of breaking something somewhere
galibert[m]: to have what as a python library?
prjunnamed, from amaranth standpoint, is a box you put netlists into and bitstreams come out
a bit like exec() without temporary files
we have the platform stuff for that, yeah
I really don't know what you're asking for here
don't worry, I'm not 100% sure I know either
we do have the platform framework that manages running toolchains for you; this applies to all of yosys+nextpnr, icecube, vivado, ...
just ignore me at this point :-)
and it will equally apply to unnamed
<Wanda[cis]> "I consider yosys exposing its..." <- anyway, I want to expand on this a little more
there are several problems with the yosys design here
exposing an interface like that has a cost
the problem with yosys is that it, in many ways, encourages you to meddle with passes
you already have to write a yosys script to do anything useful; the direct interface to the passes is the exact same interface you use to run synth_ice40
and yosys is quite proud of it too, encouraging you to hack up random scripts to hack on the IR
now. the cost. it is twofold
first, it exposes RTLIL internals and pass characteristics as a public or semi-public interface, setting them in stone and preventing you from improving it
hence stuff like $mem*_v2 cells where I went to considerable amount of effort to maintain some compatibility because we could not have any idea who is actually using the raw cell interface
second, passes in yosys have input invariants which they do not enforce. you can very easily shoot yourself in the foot by eg. running a wrong pass on a netlist with processes.
you mean users outside of the yosys upstream or within it?
outside of yosys upstream.
so. the two costs.
the invariants are a particular PITA
they are not too much of a concern for a debugging interface
maybe we should have something like that. for clang you can do -emit-llvm and go to town with the opt command.
but there must be a clear delineation of public and private APIs.
plus sane people would like to just do fpgac --target whatever design.v -o design.bin and not weird scripts
between stuff we commit to supporting, and stuff we will just break whenever it's convenient.
galibert[m]: yes. this is the interface we want.
heh, you're sane
Wanda[cis]: I would like that to be the case with prjunnamed as well. Though with clang, the method of breaking down the automatic full flow to component calls with saving intermediates actually isn't transparent. So there's space for improvement for even better experimentation
well, it's going to be a little more complex because you want to look at timing reports and whatnot, but still. this is the focus point.
clang's pass manager is a notorious nightmare
i'd like to avoid having that
can you elaborate?
or link to resources etc
remember chandlerc converting everything to the new pass manager for close to a decade?
Uh, I don't. I'm a late arrival to compilers
I'm aware of there being two though
ok, well, that happened
if that alone doesn't tell you there are issues with the pass manager i don't know what to say
basically, pass ordering is more of a dark art than a science, and managing pass dependencies is extremely challenging
the less configurability we have in that part of the flow, the better
Kind of a gap between "there are issues" and "this is a nightmare", I assume everything has issues
any refactoring that goes on for more than five years is definitionally a nightmare
although configurability didn't cause that part
like... a pass manager is basically a mini-build-system which has the additional unpleasant property where the thing that is being built is continuously clobbered
and it has to track invalidation of dependencies regardless
i don't want to work on that!
got it
oh, and to be clear, the reason I consider this one of the worst design failings in yosys is that it locks everything else into place.
during my time at yosyshq I was more than willing enough to transform RTLIL, over time, into something more reasonable
the weird init attribute thing? I could just lock myself in a cave for a week and exorcise it from the entire codebase
but exposing so many implementation details is what made it impossible
well. that and Claire being.... resistant to change
so what I did instead is make many overlays over RTLIL instead, that look like the IR that should have been
things like kernel/ff.cc or kernel/mem.cc, or SigMap (though that one's not on me)
but this has always been a massive pain to deal with
If the flow of passes is restricted to require less dependency and invariant modeling, I wonder what this means for alternate flows in the future - formal and ASIC. I mean if there's 3 restricted flows built into the tool rather than an undefined subset of a combinatorial explosion of them, then that's still productive and maintainable
in many ways, there's a much simpler IR in yosys that's struggling to get out
in fact, you'll note the strong similarity between unnamed FlipFlop cell and what I made ages ago in kernel/ff.cc
Wanda[cis]: Yeah I'm writing yet another pass that is almost purely poking at your ff interface
(it's going to be even more similar when we actually add latches to the model)
(just. Cat kinda made me descope latches. in no small part because iCE40 doesn't have them.)
i think we probably don't want to have a formal flow
I've been talking to Jannis and most likely the way to go here is to use imctk once it's ready
we can of course add export/import support but besides that, synthesizers are not actually very good at formal verification
i'm not even sure if we should be processing decision trees, i already take advantage of X-prop there...
there's some value that unnamed can provide here once we have an elaboration story and actual frontends; but that'd not include the actual synthesis part, just using the common IR as a funnel
feels like an error number collision?
doesn't feel like something that'd happen?
i think it wants to exec a sat solver
ohhh. Lemme strace it
go and install z3
looks like I need to do a full update, gonna take a little while
fwiw, strace agrees with you, it's z3 that's missing
oh hey, it's even written down in the README
and I missed it, sorry about that
yay, all tests passed
ChangeQueue::unalived_cells? c'mon
do you not have the balls to call it murdered_cells?
what's the difference between Cell and CellRepr?
also Value vs Net? is a Net always one bit wide?
Net is one bit; Value is a list of nets
Cell is the conceptual model; CellRepr is something we actually store in-memory for efficient storage of fine netlists (ie. netlists where most stuff is 1-bit-wide gates)
it's... not clear it's actually worth it
it came up in the past
i read the entire available history of the channel but i might have missed it
element scrolling can be wonky at times
oh, no
it's come up in previous discussions in like 2021
<Wanda[cis]> "it's... not clear it's actually..." <- might make more sense to like, intern the values or something
make em indices into a flat Vec<u32> under the hood
yeah uh.
tbh the memory representation thing is a bit of a reaction to some old yosys discussion
I'd rather not worry too much about memory efficiency just yet
so, you have this split for now, to make sure that you don't design yourselves into a corner where it's hard to introduce, in case you decide that it's a good idea
so, a Net is just an index into Design::cells?
oooh, that's what the Skip cell is for
it... may be on the "overly clever" side
yeahh, it kinda feels like a lot of the savings of being able to refer to a particular bit in only 4 bytes is gonna be eaten up by all the Skips lol
but this should be refactorable later with benchmarks to guide us
I'm more worried about all the allocations than about skips
anyway. you may be interested in the other alternative, which was implemented in amaranth NIR.
the difference is pretty simple; a net is a (cell index, bit index) tuple, and there's no skip stuff
the reason we switched to the current version is because we were concerned about efficient representation of fine netlists, ie. netlists made mostly from individual gates
if you have such a netlist, there's basically no skips, the CellRepr short format is used so there's no allocations
it's close to optimal
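A rough sketch of the two net representations being compared, as I read the discussion: the NIR-style (cell index, bit index) tuple, and a flat single-index net where a wide cell occupies several consecutive slots padded with skip entries. `TupleNet`, `FlatNet`, `Slot`, `push_cell` and `resolve` are made-up names, and the real `CellRepr` layout may well differ.

```rust
/// NIR-style alternative: a net names a cell and a bit within that cell.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TupleNet {
    pub cell: u32,
    pub bit: u32,
}

/// Flat alternative: a net is a single 4-byte index into the slot array.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FlatNet(pub u32);

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Slot {
    /// A 1-bit gate: one slot, one output net, no padding needed.
    Gate,
    /// First slot of a cell with `width` output bits (assumed >= 2 here).
    Wide { width: u32 },
    /// Padding slot for output bits 1.. of a wide cell, pointing back
    /// at the slot where that cell starts.
    Skip { start: u32 },
}

/// Appends a cell with `width` output bits (assumed >= 1) and returns
/// the index of its first slot.
pub fn push_cell(slots: &mut Vec<Slot>, width: u32) -> u32 {
    let start = slots.len() as u32;
    if width == 1 {
        slots.push(Slot::Gate);
    } else {
        slots.push(Slot::Wide { width });
        for _ in 1..width {
            slots.push(Slot::Skip { start });
        }
    }
    start
}

/// Converts a flat net into the equivalent (cell, bit) pair.
pub fn resolve(slots: &[Slot], net: FlatNet) -> TupleNet {
    match &slots[net.0 as usize] {
        Slot::Skip { start } => TupleNet { cell: *start, bit: net.0 - *start },
        _ => TupleNet { cell: net.0, bit: 0 },
    }
}
```

In a fine netlist every slot is a `Gate`, so no skip entries exist and the flat form costs nothing extra; a netlist full of wide cells pays one padding slot per additional output bit.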
right, a lot of the compute-intensive parts are being done on the fine netlists
that is the idea, yes
there's a bit of a problem with this plan because we don't represent LUTs all that well
oh. oooohhh
yeah, but this is just low hanging fruits for perf work later on
this is fine
if you try to design a tight compressed ternary representation for this shit I'm going to personally murder you.
but. yeah. this is stuff that can be improved later on.
we do have the capability for compression
Wanda[cis]: kinky
i was thinking more of a pair of bitvec who love each other very much kind of vibe
oh we were thinking of something else
`Either<Vec<Trit>, (u64, u64)>` is same size as `Vec<Trit>`
and the pair of u64 is enough to encode consts up to 64-trit
... without doing weird compression stuff to pack 80-something trits, mind you.
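For illustration, here is one way a pair of `u64` could hold a constant of up to 64 trits with no allocation. The bit assignment is my own assumption, not something stated in the discussion: two bits per position, with the fourth state meaning "past the end" so the length needs no separate field. `Const`, `from_trits` and `to_trits` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Trit {
    Zero,
    One,
    Undef,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Const {
    /// Up to 64 trits, LSB first. For position i, bit i of the two masks
    /// encodes: (1,0) = Zero, (0,1) = One, (1,1) = Undef, (0,0) = absent.
    Small(u64, u64),
    /// Fallback for anything longer.
    Large(Vec<Trit>),
}

impl Const {
    pub fn from_trits(trits: &[Trit]) -> Const {
        if trits.len() > 64 {
            return Const::Large(trits.to_vec());
        }
        let (mut a, mut b) = (0u64, 0u64);
        for (i, t) in trits.iter().enumerate() {
            let (ba, bb): (u64, u64) = match t {
                Trit::Zero => (1, 0),
                Trit::One => (0, 1),
                Trit::Undef => (1, 1),
            };
            a |= ba << i;
            b |= bb << i;
        }
        Const::Small(a, b)
    }

    pub fn to_trits(&self) -> Vec<Trit> {
        match self {
            Const::Large(v) => v.clone(),
            Const::Small(a, b) => (0..64u32)
                // Stop at the first "absent" position.
                .map_while(|i| match ((*a >> i) & 1, (*b >> i) & 1) {
                    (1, 0) => Some(Trit::Zero),
                    (0, 1) => Some(Trit::One),
                    (1, 1) => Some(Trit::Undef),
                    _ => None,
                })
                .collect(),
        }
    }
}
```

Both masks stay directly usable for bitwise computation, which is the point of avoiding a denser packing.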
I may have yosys trauma.
who the fuck would do arithmetic coding for an in-memory representation
you're supposed to be doing computation with this
were they on drugs when they designed this
... I mean.
so, what are the semantics of Trit::Undef, exactly?
excellent question!
basically something like Verilog X until decided otherwise
also similar to LLVM undef in that you can substitute it with anything you want, including with a value that magically appears to be different each time you look at it
(this question is motivated by seeing Trit::mux)
ie. you're allowed to optimize a = undef; b = a xor a to b = 1 if you'd like
so. this is the bit where we're going to need tighter definition to not fuck this up in subtle ways.
Trit::mux is exactly Verilog's ?: operator semantics
I think it's reasonably good
but there's less obvious stuff
so X in verilog is kind of like unreachable_unchecked?
not quite
X is not a value that instantly disintegrates your circuit when it appears somewhere
materializing it is perfectly okay
it just happens to poison everything downstream with itself, until it's gated off
ie. mux(1, 0, X) just selects the 0 and doesn't let the X propagate further
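A small sketch of a three-valued mux following Verilog's `?:` rule, with the argument order `mux(sel, if_true, if_false)` assumed from the example above. This is an illustration only, not the actual `Trit::mux` implementation.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Trit {
    Zero,
    One,
    Undef,
}

/// Verilog-style `sel ? if_true : if_false` on single trits.
pub fn mux(sel: Trit, if_true: Trit, if_false: Trit) -> Trit {
    match sel {
        // A known select gates off the unselected input entirely,
        // even if that input is X.
        Trit::One => if_true,
        Trit::Zero => if_false,
        // An unknown select is harmless when both inputs agree...
        Trit::Undef if if_true == if_false => if_true,
        // ...and poisons the output otherwise.
        Trit::Undef => Trit::Undef,
    }
}
```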
i feel like the existence of X can lead to a lot of subtly incorrect optimizations
X kind of means "just stuff whatever is convenient to minimize the circuit here, I'm not going to use it"
it's a dangerous tool
convenient how prjunnamed_smt2 just seems to bail on any mention of X
that's because it's kinda MVP-grade
we need X-aware smt2
and for that we need well-defined semantics
but, this stuff has been done previously in yosys, in not completely horrible ways
I mean, the X-aware smt2, not the well-defined semantics
I've also wondered if we're going to need basically two subtly different variants of cells to ensure the X propagation rules don't fuck us over
consider the adc cell
the approach I have (a little implicitly, but it's codified in the const-eval code) picked for unnamed is that an X input at bit position a makes all output positions from a upwards undefined
it's not the only definition possible
Verilog uses a different decision here: if any input bit is X, all output is X
which is inconvenient because it means you cannot merge a[7:0] + b[7:0] and a[3:0] + b[3:0] into a single adder. an X input at a[4] would wrongly poison the lower bits.
(enabling this optimization is what motivated my reasoning)
you could also go into a little more detail, and think of the adc as a series of XOR3 + MAJ3 gates (or, full adders, if you'd prefer), and stop X propagation through the carry chain at the points where the other MAJ3 inputs are both 0 or both 1
but I'm not convinced there's any useful optimizations that this enables, so meh
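To make the first two adder definitions concrete, here is a sketch of constant evaluation under both rules, on LSB-first trit slices. It is simplified relative to a real adc cell (no carry input, carry-out dropped), and `add_prefix` and `add_verilog` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Trit {
    Zero,
    One,
    Undef,
}

/// "X at position a poisons outputs from a upwards". Inputs are LSB first
/// and must have equal width; the result has the same width.
pub fn add_prefix(a: &[Trit], b: &[Trit]) -> Vec<Trit> {
    assert_eq!(a.len(), b.len());
    let mut carry = false;
    let mut poisoned = false;
    let mut out = Vec::with_capacity(a.len());
    for (&x, &y) in a.iter().zip(b) {
        if x == Trit::Undef || y == Trit::Undef {
            poisoned = true;
        }
        if poisoned {
            out.push(Trit::Undef);
            continue;
        }
        let (x, y) = (x == Trit::One, y == Trit::One);
        let sum = x ^ y ^ carry;
        carry = (x & y) | (carry & (x ^ y));
        out.push(if sum { Trit::One } else { Trit::Zero });
    }
    out
}

/// Verilog `+` rule: any X input bit makes the whole result X.
pub fn add_verilog(a: &[Trit], b: &[Trit]) -> Vec<Trit> {
    if a.iter().chain(b).any(|&t| t == Trit::Undef) {
        vec![Trit::Undef; a.len()]
    } else {
        add_prefix(a, b)
    }
}
```

With an X at `a[4]`, the low four bits of the 8-bit `add_prefix` result still equal the standalone 4-bit sum, which is what makes sharing one adder between the two expressions legal under that definition; under `add_verilog` the low bits would become X too.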
what is X useful for?
galibert: for letting the synthesizer know you don't care about values of some signals in some branches
consider an instruction decoder driving control signals to an ALU
what do you send there when you're processing an instruction that doesn't touch the ALU?
so you're delegating that decision to the compiler
the obvious answer is X, which allows the synthesizer to simplify the decision trees and send whatever garbage is convenient
amaranth doesn't have X at this point, right?
Amaranth design is centered around not letting you fuck yourself over too easily
X can be an incredibly dangerous tool
which is nice, honestly
it is not quite as bad as undefined behavior in C, but it's the same kind of thing
in theory the compiler could find out that a signal is not important in a given context, but I guess that's yet another thing that's NP-annoying
mind you, there have been discussions about including undefined values in Amaranth
but they were centered around the one place where they're very hard to avoid
which is memory initialization in ASICs
for non-initialization of srams?
which is... well, not a thing
it's more of a sim issue though
the sram case
Amaranth requires all memories to have well-defined initial values; usually (though not always) trivial to implement in FPGAs given that they're loaded together with the bitstream, pretty much impossible in an ASIC
now, about sim
X values are kind of very annoying to deal with in simulation
particularly when it's based on sequential processes, not netlists
they're supposed to behave like a non-fully-viral NaN, in a way? like a and 0 should still give 0?
consider: what happens when you have an if (a) begin [...] end; in Verilog and a happens to be X?
galibert[m]: basically yes
the if case sounds terrifying
Wanda[cis]: according to the Verilog standard, the answer to this question is "`if (a)` executes iff `a` is 1, so it does not execute when it is `X`"
that's a Bad answer
this results in the entire Verilog simulation model being essentially garbage that's not fit for purpose when X is involved, yes.
it results in simulation-synthesis mismatch of the bad kind
ie. simulation claims you get a well-defined value, while the hardware will just do something completely different.
you'd better scribble X everywhere in that case, that's less traumatic
the "it's fucked" would be way more obvious
you'd better scribble X everywhere that the relevant branch touches, yes
but Verilog cannot specify that behavior
Verilog is Bad, news at 11
because Verilog is not a synthesizable language in the general case
consider: it has loops.
because verilog can printf?
and, yeah, stuff like printf
or rather, $display
it'd be funny if you did a printf in an if(a) when a happens to be an X and it just printed a bunch of X's to the terminal
should set the oldschool blink attribute
well, loops on scalar, netlist-constant data makes perfect sense, we do that all the time in amaranth. But I guess you're not talking about that, right?
that's what I was afraid of
Verilog mixes simulation-only models with synthesizable stuff with no clear delineation
and some stuff that doesn't even make sense in simulation, it feels like at times
you can actually just inspect the value and check it for X
it has two equality operators
== which gives you an X if there are X inputs involved and the result isn't obviously unequal on other bit positions
only two? We know worse :-)
and === which does a literal comparison that includes matching X to X only, and always returns 1 or 0
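The difference between the two operators, sketched on LSB-first trit slices. Z is not modeled here, and `eq_logical` and `eq_case` are hypothetical names for `==` and `===` respectively.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Trit {
    Zero,
    One,
    Undef,
}

/// Verilog `==`: 0 if some fully known bit position differs,
/// X if the answer depends on an X bit, 1 otherwise.
pub fn eq_logical(a: &[Trit], b: &[Trit]) -> Trit {
    assert_eq!(a.len(), b.len());
    let mut ambiguous = false;
    for (&x, &y) in a.iter().zip(b) {
        if x == Trit::Undef || y == Trit::Undef {
            ambiguous = true;
        } else if x != y {
            // Obviously unequal, regardless of any X elsewhere.
            return Trit::Zero;
        }
    }
    if ambiguous { Trit::Undef } else { Trit::One }
}

/// Verilog `===`: literal comparison where X only matches X;
/// the result is always a definite 0 or 1.
pub fn eq_case(a: &[Trit], b: &[Trit]) -> bool {
    a == b
}
```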
galibert[m]: don't worry, SystemVerilog added more
How reassuring
funnily enough, RTLIL actually has separate cells for the two! what for, I do not know
maybe formal.
is your IR RTLIR with the bad parts removed or something completely different?
it's actually derived from Amaranth NIR
which is in turn derived from Amaranth HDL
... which, in turn, was somewhat influenced by yosys, yes
standing on the shoulders of... [insert noun here]
it is influenced by RTLIL, it'd be a lie to pretend otherwise
Amaranth doesn't exist in a vacuum and lowering to RTLIL was a consideration when we were designing NIR
not everything is bad in rtlir, far from that
that said, we trimmed it quite aggressively
oh it's rtlil, not r?
it's very much rtlil
language, not representation I guess
anyway. our focus was on removing much of the redundancy that exists, so the passes don't have to consider a myriad ways of expressing the same thing
that's nice
eg. you'll note that we only have `==`, signed `<`, and unsigned `<` comparison operators
I'm actually still considering getting rid of the or cell
maybe I'll do it
not and not is kind of busy
a little, yes
but inverters are trivial to fold
you want to keep some readability for the poor humans trying to debug stuff
and it reduces the number of rules we have to add
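As a quick illustration of why the trimmed operator set loses nothing, here is a sketch on plain `u8` values (no X involved) where everything else is derived from `eq`, `ult`, `slt`, `and` and `not`. The function names are made up for this example, and the `or` rewrite is only the option under consideration above, not something already done.

```rust
// The three primitive comparisons.
pub fn eq(a: u8, b: u8) -> bool { a == b }
pub fn ult(a: u8, b: u8) -> bool { a < b }
pub fn slt(a: u8, b: u8) -> bool { (a as i8) < (b as i8) }

// Everything else is an operand swap and/or an inverter on the output.
pub fn ne(a: u8, b: u8) -> bool { !eq(a, b) }
pub fn ugt(a: u8, b: u8) -> bool { ult(b, a) }
pub fn ule(a: u8, b: u8) -> bool { !ult(b, a) }
pub fn uge(a: u8, b: u8) -> bool { !ult(a, b) }
pub fn sgt(a: u8, b: u8) -> bool { slt(b, a) }
pub fn sle(a: u8, b: u8) -> bool { !slt(b, a) }
pub fn sge(a: u8, b: u8) -> bool { !slt(a, b) }

// `or` expressed as and + inverters (De Morgan).
pub fn or(a: u8, b: u8) -> u8 { !(!a & !b) }
```

A pass then only needs rules for three comparison cells instead of ten, at the cost of some inverters that fold away trivially.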
it's not like I invented the concept, either; AIGs are a thing after all
google is only answering insurance stuff when I put in AIG?
and-inverter graph
the splitting of xor isn't annoying in AIGs?
it's a representation commonly used for logic optimization
well yeah
that's why XAIGs are a thing too
and why I'm not proposing to get rid of xor
yeah, I remember some proof systems added xor because removing it is combinatorially annoying, especially when you play with crypto computations
xor is a distinct thing that's useful to consider on its own merits; or is just an evil mirror twin of and
a few days ago I called or an "and under CPT symmetry" and Cat hissed at me and told me to go to sleep.
she had a correct reaction
Wanda[cis]: badeline!!
.... mei
mirror mommy
well luckily for you you're out of biting range right now
...exactly what i was typing
Now I'm wondering what happens when a LUT beta-decays, and that's all your fault
galibert[m]: now that sounds like a kind of area-recovery optimization I'll have to implement or something
<Wanda[cis]> "consider the adc cell" <- oh, and I didn't finish that thought
so there's a problem with our adc cell: when you're emitting Verilog, you actually cannot legally emit it as a + operator because of the incompatible X semantics
on the other hand, our X semantics is useful for optimization
so, what, do? one option I've previously considered is to have subtly different flavors of some cells that differ in what exact X propagation rules they promise/require
adc is one such cell; shifts are another
that's going to cause so many subtle bugs
shifts are particularly interesting because, aside of X semantics, they're a little bit equivalent to binary tree multiplexers
so my original IR draft from two years ago had variants.
how about: if emitting verilog, freeze all the X's to 0
won't help you because you can get X on primary inputs
X is a fact of life. even in hardware.
how come?
consider: there is no such thing as digital logic
okay but that doesn't actually map to your X does it
fpga like to pretend there is though
what do you think happens when you have an input that's halfway between GND and VCCIO?
you get some garbage that will propagate until it's properly gated off
it's an X.
it does look like an X
but it isn't really much of a controlled process is it
it's not a X, it's a whyyyyyyyyyyyyyyyyyyyyy?
it operates on the same rules
ANDing that shit with 0 will still get rid of it
i have no experience with this but i would assume that you can't actually promise anything about what happens if the input pin is an X?
Wanda[cis]: okay but LUT4'ing where the lut bits happen to align into an and?
interesting question, isn't it
generally most vendors' LUTs are glitch-safe, ie. if the two LUT bits are actually the same, they'll reliably pick the value
weh i just want to stick the middle fine
s/fine/finger up verilog's butt/
(oh, by the way, this is where X actually can come into play in FPGAs: not undefined primary inputs, but internal wires in the process of switching)
mei[m]: kinky
oh, come on, butt stuff is kinky? what are you, a cishet?
I'm fine with butt stuff, but Verilog?
verilog stuff is kinky
and we don't talk about vhdl
what's better than 4-valued booleans? 9-valued booleans.
isn't there one with 64-valued booleans?
eh someone probably made something
it's not like Verilog has only 4 if you look closely
have you seen the drive strength stuff?
isn't that where there are the 64 levels indeed?
I haven't counted
vaguely feels like more than that?
if you want the evil twin of the current unnamed IR
ControlNet is basically just Net + optional inverter
it's called ControlNet because it's mostly used for control signals which tend to be used with various polarities
(eg. ASICs love their active-low async resets)
FPGAs just tend to have free bitstream-controlled inverters on control inputs (clocks pretty much always, resets and enables often), so we have the inversion as part of the cell where it can be extracted while techmapping
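A minimal sketch of the "net plus optional inverter" idea. The method names (`net`, `is_inverted`, `invert`, `is_active`) are assumptions for illustration and may not match the real type.

```rust
/// A single-bit net, identified by index.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Net(pub u32);

/// A control input: a net used either as-is or inverted.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ControlNet {
    /// Active-high.
    Pos(Net),
    /// Active-low.
    Neg(Net),
}

impl ControlNet {
    /// The underlying net, ignoring polarity.
    pub fn net(self) -> Net {
        match self {
            ControlNet::Pos(n) | ControlNet::Neg(n) => n,
        }
    }

    pub fn is_inverted(self) -> bool {
        matches!(self, ControlNet::Neg(_))
    }

    /// Flips polarity without adding any cell to the netlist.
    pub fn invert(self) -> ControlNet {
        match self {
            ControlNet::Pos(n) => ControlNet::Neg(n),
            ControlNet::Neg(n) => ControlNet::Pos(n),
        }
    }

    /// Whether the control is asserted when the net carries `level`.
    pub fn is_active(self, level: bool) -> bool {
        level != self.is_inverted()
    }
}
```

Keeping the polarity on the cell input means a techmapper targeting an FPGA can turn `Neg` into a bitstream inverter setting instead of having to pattern-match a separate not cell.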
cyclone v has optional inverters pretty much everywhere
so, are there any optimizations that introduce more X's?
trick question
I'm not sure how to interpret it
increasing amount of X in netlist by volume? yeah sure that's going to happen in ways that are not related to X
just because something gets mirrored somewhere
generating an X where there was none before? that'd mostly be an invalid optimization unless you can prove the result isn't used
there's the semi-const-folding rules that'll eg. replace the entire output of a mul with X if there's an X input bit at any position
wait, is that valid? what decides whether that's valid
you should be able to do truncations like with adds
yes. which is why I'm thinking of changing this particular rule.
the validity of the transformation is, of course, decided by the abstract model of what a mul cell is
which we get to define however we want, to enable whatever optimization passes we want, as long as we stay within the constraints defined by the input and output languages
the Verilog model of the * operator matches our current mul cell definition, which allows us to consume the Verilog operator directly into mul, and emit mul as * on Verilog output
however, it doesn't allow the optimization
we could change the model to be what you're implicitly proposing here, with X poisoning only higher bits
then we'd still be able to ingest * operator directly, we'd now be able to perform the optimization, but we'd no longer be allowed to emit mul as * in Verilog
annoying, isn't it
So would you need to emit * as some kind of X-guard in front of a mul cell?
you mean in Verilog frontend?
Is the issue that you'd need special handling for X values?
Yes, sorry, in the Verilog to IR translation layer
no, that's easy
you can just import the cell directly in both cases
you'll be relaxing the X behavior in the second case, but that's allowed; a synthesizer can always arbitrarily decide to replace an X with whatever
Ooooh, of course
IR to Verilog is where it gets tricky
That's handy
Yeah I can see what you're saying now
oh, and by the way, for another major annoyance caused by X semantics, see Verilog LUT models
it's one of these cases where the alternative would be pretty messy as well
the alternative is "take every possible combination of X bits substituted with 0 and 1, check if all input bits at the index set are the same"
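That alternative, sketched as a brute-force evaluator: enumerate every 0/1 substitution for the X inputs and return a definite value only if all the LUT table bits selected that way agree. `lut_eval` is a hypothetical name; the table is indexed with input 0 as the least significant index bit.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Trit {
    Zero,
    One,
    Undef,
}

/// Evaluates a LUT whose `table` has 2^n entries for n `inputs`.
pub fn lut_eval(table: &[bool], inputs: &[Trit]) -> Trit {
    assert_eq!(table.len(), 1usize << inputs.len());
    let mut base = 0usize;
    let mut undef_positions = Vec::new();
    for (i, &t) in inputs.iter().enumerate() {
        match t {
            Trit::One => base |= 1usize << i,
            Trit::Zero => {}
            Trit::Undef => undef_positions.push(i),
        }
    }
    // Exponential in the number of X inputs, which is part of why
    // this is messy in practice.
    let mut seen: Option<bool> = None;
    for combo in 0..(1usize << undef_positions.len()) {
        let mut index = base;
        for (k, &pos) in undef_positions.iter().enumerate() {
            if ((combo >> k) & 1) == 1 {
                index |= 1usize << pos;
            }
        }
        match seen {
            None => seen = Some(table[index]),
            Some(v) if v != table[index] => return Trit::Undef,
            Some(_) => {}
        }
    }
    if seen == Some(true) { Trit::One } else { Trit::Zero }
}
```

For a LUT programmed as an AND, this gives 0 when one input is 0 and the other is X, matching the "ANDing with 0 still gets rid of it" behavior discussed earlier.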