whitequark[cis] changed the topic of #prjunnamed to: FPGA toolchain project · rule #0 of prjunnamed: no one should ever burn out building software · https://prjunnamed.org · https://github.com/prjunnamed/prjunnamed · logs: https://libera.irclog.whitequark.org/prjunnamed
<whitequark[cis]> <mei[m]> "oh! so it's like that for you..." <- haha no it's like hauling rocks. i just do it anyway and eat the cost
<povikMartinPovie> if we had AndNot cells, it would enable representing an AIG with a single cell per AND gate of the AIG, modulo inversions at the boundary
<povikMartinPovie> this comes into play when you are writing mappers directly operating on unnamed IR -- complementing is somewhat free, so you don't want to distinguish between a fanout connected through an inverter and one that's not; if you can normalize the source netlist so that it's free of internal inverters, it can help
<povikMartinPovie> I think And, Or, Andnot, Xor are a complete set to represent any two-input function modulo complementing an output?
<jix> another possibility is to use 2-luts, which give you one degree of freedom too much, but then evaluate the combinational part once using all zero inputs and normalize the output polarities of all those 2-luts to be zero
<jix> and that's exactly equivalent to using only And, Or, AndNot and Xor
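(A quick standalone way to convince yourself of that claim, not prjunnamed code: the eight 2-input truth tables with f(0,0) = 0 are exactly the constant, the two projections, And, the two AndNot orientations, Xor, and Or, and every other table is the complement of one of them. The (a << 1) | b bit-index convention is just an assumption for this sketch.)

```rust
fn main() {
    // Truth tables are 4-bit masks; bit index is (a << 1) | b (an assumed
    // convention for this sketch only).
    let normal_forms = [
        (0b0000u8, "const 0"),
        (0b1100, "a"),       // projection
        (0b1010, "b"),       // projection
        (0b1000, "a & b"),   // And
        (0b0100, "a & !b"),  // AndNot
        (0b0010, "!a & b"),  // AndNot, arguments swapped
        (0b0110, "a ^ b"),   // Xor
        (0b1110, "a | b"),   // Or
    ];
    for table in 0u8..16 {
        // Normalize output polarity: if f(0, 0) = 1, complement the table.
        let complemented = table & 1 != 0;
        let normal = if complemented { !table & 0xf } else { table };
        let (_, name) = normal_forms.iter().find(|(t, _)| *t == normal).unwrap();
        println!("{table:04b} = {}{name}", if complemented { "not " } else { "" });
    }
}
```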
<whitequark[cis]> i like the 2-lut option more
<whitequark[cis]> we've discussed it before
<povikMartinPovie> the 2-lut thing can just as well be an abstraction (using ControlNets maybe?), I'm not sure it's worth the bother of introducing it into the basic set of cells
<whitequark[cis]> i've also thought about adding ControlNets into the mix, yeah
<povikMartinPovie> is there a chance you will add Andnot? I'm inclined to add it via a local patch and continue my work
<jix> (the evaluation with all zero inputs is done if you want to normalize polarities even across combinational logic using representations where you can't push inverters through, if that's not a concern you can use rewrite rules alone ofc)
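(A toy sketch of that whole-netlist normalization, not prjunnamed code: a 2-LUT netlist in topological order, where flipping a node's polarity gets absorbed into its fanouts' truth tables, so after the pass every internal signal evaluates to 0 under all-zero inputs and inversions only survive as per-node output flags. The names and the (a << 1) | b bit-index convention are made up for the sketch.)

```rust
#[derive(Clone, Copy)]
enum Src {
    Input(usize), // primary input, defined to be 0 in the all-zero evaluation
    Node(usize),  // output of an earlier Lut2 node (topological order assumed)
}

struct Lut2 {
    table: u8, // 4-bit truth table, bit index = (a << 1) | b
    a: Src,
    b: Src,
}

/// Returns, for each node, whether its output polarity got flipped.
fn normalize(nodes: &mut [Lut2]) -> Vec<bool> {
    let mut flipped = vec![false; nodes.len()];
    for i in 0..nodes.len() {
        let (a, b) = (nodes[i].a, nodes[i].b);
        // If a fanin was flipped, absorb the inversion into this node's table
        // by swapping the corresponding truth-table halves.
        if let Src::Node(j) = a {
            if flipped[j] {
                let t = nodes[i].table;
                nodes[i].table = ((t & 0b0011) << 2) | ((t & 0b1100) >> 2);
            }
        }
        if let Src::Node(j) = b {
            if flipped[j] {
                let t = nodes[i].table;
                nodes[i].table = ((t & 0b0101) << 1) | ((t & 0b1010) >> 1);
            }
        }
        // All fanins now evaluate to 0 under all-zero inputs, so this node's
        // all-zero value is just bit 0 of its table; normalize it to 0.
        if nodes[i].table & 1 != 0 {
            nodes[i].table = !nodes[i].table & 0xf;
            flipped[i] = true;
        }
    }
    flipped
}

fn main() {
    // nodes[0] = !(x & y) (Nand), nodes[1] = nodes[0] | z (Or)
    let mut nodes = vec![
        Lut2 { table: 0b0111, a: Src::Input(0), b: Src::Input(1) },
        Lut2 { table: 0b1110, a: Src::Node(0), b: Src::Input(2) },
    ];
    let flipped = normalize(&mut nodes);
    // nodes[0] becomes And with a flipped output; nodes[1] absorbs that flip
    // into its table and then gets its own polarity normalized as well.
    assert_eq!((nodes[0].table, flipped[0]), (0b1000, true));
    assert_eq!((nodes[1].table, flipped[1]), (0b0100, true));
}
```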
<whitequark[cis]> it seems pretty application-specific and thus far we've avoided pinning ourselves to AIG-specific restrictions
<whitequark[cis]> i think i want to understand your plan more first
<whitequark[cis]> can you tell us how it will simplify your job?
<whitequark[cis]> and in particular, why it should be a part of the general repertoire of cells rather than a pass-specific thing?
<whitequark[cis]> for example, the LUT mapper maintains a table of LUT dispositions and some other internal state, which it does not need for interchange with anything else; would this not serve you well?
<whitequark[cis]> the other possibility is just "looking through" an inverter, which is very cheap in our representation
<povikMartinPovie> sure, I'm porting a standard cell mapper which so far is working off its own AIG representation of the input, which is built at the start of the pass from unnamed IR
<povikMartinPovie> I realized it could be simplified by working closer to unnamed IR; there's not that much gained from building the internal representation other than dealing with the inverter thing
<jix> given that the IR doesn't maintain fanout indices (afaik?), don't you need to build that index on the fly anyway and could use an index that looks through inverters when needed?
<whitequark[cis]> it does not, yeah
<povikMartinPovie> I guess
<whitequark[cis]> (we planned at the beginning to do so and discussed it several times since, but it never seemed particularly valuable given you can do it in a single pass and like five lines of code)
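(For what it's worth, here is roughly what an index that looks through inverters could look like, over a deliberately toy netlist type; none of these names are prjunnamed API, since the real net indices are kept crate-private. The point is only that each fanout entry can carry an inversion flag, so Not cells disappear from the index instead of showing up as nodes.)

```rust
use std::collections::HashMap;

// Toy stand-ins; the real Net/Cell types in prjunnamed have different shapes.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Net(u32);

enum Cell {
    Not(Net),
    And(Net, Net),
    // ... other cells elided
}

/// Fanout index: driver net -> (sink cell index, inverted?) entries,
/// built in one pass, collapsing chains of Not cells along the way.
fn build_fanout_index(cells: &[(Net, Cell)]) -> HashMap<Net, Vec<(usize, bool)>> {
    // Map each Not cell's output back to its input.
    let not_of: HashMap<Net, Net> = cells
        .iter()
        .filter_map(|(out, cell)| match cell {
            Cell::Not(a) => Some((*out, *a)),
            _ => None,
        })
        .collect();
    // Resolve a net to its non-inverted driver, accumulating polarity.
    let look_through = |mut net: Net| {
        let mut inverted = false;
        while let Some(&src) = not_of.get(&net) {
            net = src;
            inverted = !inverted;
        }
        (net, inverted)
    };
    let mut fanouts: HashMap<Net, Vec<(usize, bool)>> = HashMap::new();
    for (index, (_out, cell)) in cells.iter().enumerate() {
        let inputs = match cell {
            Cell::Not(_) => vec![], // inverters are folded away, not indexed
            Cell::And(a, b) => vec![*a, *b],
        };
        for input in inputs {
            let (net, inverted) = look_through(input);
            fanouts.entry(net).or_default().push((index, inverted));
        }
    }
    fanouts
}

fn main() {
    // %2 = not %1 ; %3 = and %0 %2  =>  %3 sees %1 through an inverter
    let cells = [
        (Net(2), Cell::Not(Net(1))),
        (Net(3), Cell::And(Net(0), Net(2))),
    ];
    let index = build_fanout_index(&cells);
    assert_eq!(index[&Net(1)], vec![(1usize, true)]);
    assert_eq!(index[&Net(0)], vec![(1usize, false)]);
}
```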
<povikMartinPovie> the input wouldn't need to be AIG, but if it's coarser than an AIG it would restrict the options of the mapper, since as a cut mapper it's always packing a whole number of input nodes into the selected gate
<povikMartinPovie> not sure if that would be useful to anyone, e.g. to force a mux not to be split across gates
<whitequark[cis]> by "coarser" do you mean "more than 2-input per cell"?
<povikMartinPovie> yes, or allowing xor/xnor
<whitequark[cis]> hm, so you need to lower the netlist first and then build a fanout index
<whitequark[cis]> i think i'm inclined to agree with @jix that it would probably not be very beneficial to add this cell to our repertoire
<whitequark[cis]> among other things i dislike that it's non-commutative which makes merging harder, and i'm not sure how to satisfy simplify's monotonicity guarantees if it's going to be mapped to
<whitequark[cis]> and if it's just used and understood by this pass it seems too pass-specific to make the entire project care about it
<jix> fwiw, that's not what I said (but it might be a direct conclusion from what I said plus other things that you consider a given)
<whitequark[cis]> oh, yes, sorry
<povikMartinPovie> some amount of indexing of the design is required inside the mapper in any case, but when it's a single cell standing in the way of a 1:1 correspondence between unnamed IR and the nodes which the mapper is packing, I want to bring it up
<whitequark[cis]> i meant that i agree with you saying that building a fanout index yourself would be necessary and as a part of that you can look through inverters
<whitequark[cis]> and then make some additional conclusions from that
<povikMartinPovie> this 1:1 correspondence is of some value inside the mapper, maybe to the user too
<povikMartinPovie> I have a version of the mapper which builds up a complete AIG for itself from the unnamed design and finds a mapping; I'm considering moving it closer to the IR before I add the final step of moving the mapping back into unnamed
<jix> whitequark[cis]: I'm also not disagreeing, I just don't think I can make any calls on what's beneficial to the project as a whole
<whitequark[cis]> i guess to summarize my thoughts, i feel that because UIR is an interchange format (with other tools; with whoever is reading it; between passes) first, there is a relatively high burden to adding a new cell; we could probably stand to reduce the cell count from the current one
<whitequark[cis]> (tangentially, we probably don't need so many shifts and divisions each being their individual cell)
<whitequark[cis]> an obvious way to reduce it is to replace our bitwise cells with Lut1 and Lut2 cells
<whitequark[cis]> (muxes are pattern matched over in passes like FSM inference, so they should in any case remain their own thing)
<whitequark[cis]> since you need a preprocessing pass anyway, this would allow you to get a netlist that has a 1:1 correspondence with the mapper nodes just as well, and you can bail out (panic) if the netlist isn't in the right form, which i think you need anyway as it is
<povikMartinPovie> > this would allow you to get a netlist that has a 1:1 correspondence with the mapper nodes just as well
<povikMartinPovie> that's true, if you don't mean that this netlist can be held as unnamed IR
<povikMartinPovie> what I'm asking for (want to have discussed) is extending the IR so that it's possible to do so
<jix> I think "this" referred to making unnamed use Lut1 and Lut2 for all 1/2-input 1-output bitlevel cells?
<whitequark[cis]> yes
<povikMartinPovie> ah, ok, sorry
<povikMartinPovie> we're on the same page here
<whitequark[cis]> actually, it would be more general than that
<whitequark[cis]> or, sorry, let me clarify something
<whitequark[cis]> if by 1/2 input 1 output you mean "1/2 operands" not "1/2 nets" then we are talking about the same thing
<whitequark[cis]> (I think it would not be the right move to use Lut1/Lut2 only for bit-level cells)
<whitequark[cis]> enum Cell { Lut2(u8, Value, Value) } basically
<jix> I wasn't thinking about word-level at all when writing that, but my impression was that you want to avoid unnecessary duplication of bit-level and word-level functionality so I would agree
<whitequark[cis]> we would need a bunch of infrastructure to make the text IR more readable (i suppose we could represent %0:1 = lut2 1000 %1 %2 as %0:1 = lut2 and %1 %2 for example), and a decent amount of work on the existing passes
<whitequark[cis]> (actually, something like %0:1 = and !%1 %2 might be truer to the spirit here; recognize and specially print all of the most common functions then handle the rest with a truth table)
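(To make that direction concrete, here is one possible shape of the special-casing in the printer, assuming a hypothetical Lut2(u8, Value, Value) cell and a (a << 1) | b truth-table bit order; both the cell and the exact masks are only being discussed here, not decided, so treat all of this as illustrative.)

```rust
// Print a Lut2 in the "%0:1 = and !%1 %2" style: recognize the common
// functions, including forms reachable by complementing an input, and fall
// back to a raw truth table for everything else.
fn print_lut2(table: u8, a: &str, b: &str) -> String {
    match table & 0xf {
        0b1000 => format!("and {a} {b}"),
        0b0010 => format!("and !{a} {b}"),
        0b0100 => format!("and {a} !{b}"),
        0b0001 => format!("and !{a} !{b}"),  // i.e. Nor
        0b1110 => format!("or {a} {b}"),
        0b0110 => format!("xor {a} {b}"),
        0b1001 => format!("xor !{a} {b}"),   // i.e. Xnor
        0b0111 => format!("nand {a} {b}"),
        other => format!("lut2 {other:04b} {a} {b}"), // fall back to the table
    }
}

fn main() {
    assert_eq!(print_lut2(0b1000, "%1", "%2"), "and %1 %2");
    assert_eq!(print_lut2(0b0010, "%1", "%2"), "and !%1 %2");
    assert_eq!(print_lut2(0b1011, "%1", "%2"), "lut2 1011 %1 %2");
}
```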
<whitequark[cis]> anyway, i'm neither committing nor rejecting anything here, i want to hear what Wanda thinks first
<whitequark[cis]> these are my personal thoughts on what would be the most useful
<Wanda[cis]> I don't think we should have more 2-input cells
<Wanda[cis]> in fact, the Or cell is on thin ice
<Wanda[cis]> it's enough for mapping already; you can just store (net, inversion) pairs
<povikMartinPovie> on another topic, do you want to add a method which exposes the size of the cells array?
<whitequark[cis]> i'm curious where that comes up?
<povikMartinPovie> I would know the highest index a net can have, and could use a linear array to store some per-net information
<povikMartinPovie> I wouldn't need HashMap lookups in an inner loop
<whitequark[cis]> but net indices are not accessible outside of the netlist crate, are they?
<povikMartinPovie> ah, pub(crate)
<whitequark[cis]> yes. i insisted on this approach from the very beginning because i felt that Rust graph manipulation implementations tend to overuse indices in a way that makes them harder to work with
<whitequark[cis]> hm
<jix> you could provide a `DenseNetMap<T>` to enable this pattern without exposing that implementation detail
<whitequark[cis]> yes, that's what i was thinking about
<povikMartinPovie> works for me if my idea of what it is is correct
<povikMartinPovie> I assume it would borrow the design for the duration of its existence
<whitequark[cis]> there's a few ways it could go
<whitequark[cis]> i would probably just prototype this with a HashMap; we do this a lot in existing code already
<jix> I just meant a wrapper around a Vec and that wouldn't need any borrowing
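(Presumably something like the following; the real thing would live where it can see the pub(crate) net index, and the constructor would take whatever size accessor ends up being exposed, so every name here is just a sketch of the shape rather than an existing API.)

```rust
// Toy stand-in for the opaque net handle; in prjunnamed the index stays
// pub(crate), so this map would be constructed by the netlist crate itself.
#[derive(Clone, Copy)]
struct Net(u32);

/// Dense per-net storage: a Vec indexed by the net's index, with a
/// HashMap-like surface so the index itself never leaks into pass code.
struct DenseNetMap<T> {
    slots: Vec<Option<T>>,
}

impl<T> DenseNetMap<T> {
    /// `net_count` would come from the design (e.g. the size of the cells
    /// array, or the highest net index plus one); no borrow of the design
    /// needs to outlive construction.
    fn new(net_count: usize) -> Self {
        DenseNetMap { slots: (0..net_count).map(|_| None).collect() }
    }

    fn insert(&mut self, net: Net, value: T) -> Option<T> {
        self.slots[net.0 as usize].replace(value)
    }

    fn get(&self, net: Net) -> Option<&T> {
        self.slots.get(net.0 as usize).and_then(Option::as_ref)
    }
}

fn main() {
    let mut depth: DenseNetMap<u32> = DenseNetMap::new(16);
    depth.insert(Net(3), 7);
    assert_eq!(depth.get(Net(3)), Some(&7));
    assert_eq!(depth.get(Net(4)), None);
}
```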
<whitequark[cis]> i've mostly avoided spending too much effort on fine-grained optimizations in favor of defining an architecture that allows them later
<whitequark[cis]> like, swapping one type of map for another is really not a high effort thing so i'm content with not doing the best performing thing upfront
<povikMartinPovie> if you tell me there will likely be a way to do this later I'll just use a HashMap for now
<whitequark[cis]> yep
<whitequark[cis]> i think it's useful to have a repertoire of passes that are written using common Rust abstractions first and then extract the most useful patterns out of them later
<whitequark[cis]> i suppose the exception to this was CellRepr, but i felt that it's something that should be designed in from the outset because it changes the interface so much
<whitequark[cis]> this is partly because i think this results in better designs, and partly for aesthetic reasons (i like a codebase that was grown incrementally like this a lot more than i like one which front-loaded a lot of expected fine-grained optimization work, much of which may not even have any meaningful effect on runtime)
<whitequark[cis]> i'm not sure if you've looked into it but right now i can synthesize boneless in 500ms and minerva in 2000ms, which is like... i think less than yosys takes to load the ice40 techmap.v?
<whitequark[cis]> and most of that comes out of running canonicalize like 15 times to keep fixing up one cell
<povikMartinPovie> sure, but I want to be not too far off from ABC whose mapper is fairly optimized
<whitequark[cis]> right
<whitequark[cis]> i'm on board with that obviously, so let's collect some benchmarks once it's ready for that
<povikMartinPovie> yes.
<widlarizerEmilJT> I think that when you have an IR that's comfy to traverse, you're not going to suffer on runtime or development velocity or memory consumption when your inverters aren't folded into ands
<whitequark[cis]> tbh i think the really big initial gains will be from using some sort of SmallVec type thing for Value
<whitequark[cis]> Vec is 24 bytes (which is too big, nobody wants >4 billion nets in a Value...), if we find a way to cut that down and also inline the simple cases (a constant; a single net) that should improve how quickly we can traverse the IR
<whitequark[cis]> `enum ValueRepr { Net(Net), Slice { nets: Box<[Net]>, cap: u32 } }` would probably already improve things a lot
<whitequark[cis]> SmallVec isn't the right solution because the size/capacity are still both usize
<whitequark[cis]> but also these things are fiddly enough that i'd really want benchmarks first before committing to anything
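(For the record, a trivial probe of those sizes; Net being a u32 newtype is an assumption here, and the layout the compiler actually picks for the enum is part of why benchmarks should come first. The win for the single-net and constant cases would mostly be skipping the heap allocation and the pointer chase rather than the struct size itself.)

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[derive(Clone, Copy)]
struct Net(u32); // assumed representation, not necessarily the real one

#[allow(dead_code)]
enum ValueRepr {
    Net(Net),
    Slice { nets: Box<[Net]>, cap: u32 },
}

fn main() {
    // Vec<Net> is ptr + len + cap, i.e. 24 bytes on a 64-bit target.
    println!("Vec<Net>:   {} bytes", size_of::<Vec<Net>>());
    // Whatever the enum ends up at depends on niche/layout decisions, hence
    // printing rather than asserting a number here.
    println!("ValueRepr:  {} bytes", size_of::<ValueRepr>());
}
```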