<mei[m]>
so if you request a memory cell with 10 read ports, you can lower this to hardware cells by duplicating the memory. are there any similar strategies for when you request a memory cell with 10 write ports?
<Wanda[cis]>
<mei[m]> "so if you request a memory..." <- there are, but they are a little too fucked up to be practical
<Wanda[cis]>
basically: they're the kind of strategies that, if really necessary, are better applied by the HDL designer, not the synthesis tool
<gatecat[m]>
yep, this
<Wanda[cis]>
mostly because, good luck meeting timing
<Wanda[cis]>
also
<Wanda[cis]>
there aren't really strategies that reduce a memory with many write ports to memory with fewer write ports
<Wanda[cis]>
the strategies generally involve reducing it to several large memories with single write ports, plus a small memory with many write ports, which you still have to lower
<Wanda[cis]>
which eventually ends up with bitblasting anyway.
<Wanda[cis]>
and the accursed XOR trick, yes
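The chat does not spell the XOR trick out. A minimal behavioral sketch of the commonly described scheme, assuming one bank per write port and that no two ports write the same address in the same cycle (the class and method names are made up for illustration):

```python
from functools import reduce
from operator import xor


class XorMultiWrite:
    """Behavioral model: one single-write bank per write port, contents XOR-encoded."""

    def __init__(self, depth, write_ports):
        self.banks = [[0] * depth for _ in range(write_ports)]

    def read(self, addr):
        # The stored value is the XOR of that address across all banks.
        return reduce(xor, (bank[addr] for bank in self.banks), 0)

    def clock(self, writes):
        """Apply one cycle of writes, given as {port: (addr, data)}."""
        addrs = [addr for addr, _ in writes.values()]
        assert len(set(addrs)) == len(addrs), "same-address writes are not modeled"
        updates = []
        for port, (addr, data) in writes.items():
            # Each port only writes its own bank, storing data XOR the other banks.
            others = reduce(
                xor,
                (bank[addr] for i, bank in enumerate(self.banks) if i != port),
                0,
            )
            updates.append((port, addr, data ^ others))
        # Commit after computing everything, like flops updating on one clock edge.
        for port, addr, value in updates:
            self.banks[port][addr] = value
```

In hardware every bank also needs a read port per other write port to compute `others`, which is presumably part of why the result is hard to fit and hard to time.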
<Wanda[cis]>
in comparison, duplicating a memory to handle too many read ports is trivial
<Wanda[cis]>
(well turns out it's not completely trivial if you want it to be optimal in presence of read ports of varying width and have a target cell with multiple but not enough read ports, but that's just my OCD speaking)
<Wanda[cis]>
(look I may have thought a lot about memory geometry rearrangement yesterday)
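For contrast, a minimal sketch of the read-port duplication discussed above, assuming a target cell with exactly one read port and one write port and ignoring the width and geometry questions (the names `Mem1R1W` and `DuplicatedMemory` are hypothetical):

```python
class Mem1R1W:
    """Stand-in for a hardware cell with one read port and one write port."""

    def __init__(self, depth):
        self.data = [0] * depth

    def write(self, addr, value):
        self.data[addr] = value

    def read(self, addr):
        return self.data[addr]


class DuplicatedMemory:
    """N read ports built from N copies that all receive the same writes."""

    def __init__(self, depth, read_ports):
        self.copies = [Mem1R1W(depth) for _ in range(read_ports)]

    def write(self, addr, value):
        # The single write port is fanned out to every copy.
        for copy in self.copies:
            copy.write(addr, value)

    def read(self, port, addr):
        # Each logical read port owns the read port of one copy.
        return self.copies[port].read(addr)
```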
<Wanda[cis]>
mei: so I take it you have read through my new memory crate?
<Wanda[cis]>
I uh.
<Wanda[cis]>
I actually think it may be enough (once I actually implement the swizzle) to replicate all the actually useful features of memory_libmap
<Wanda[cis]>
and the swizzle generation for iCE40 doesn't look particularly hard either
<Wanda[cis]>
mei: also if you're curious about memory validity rules
<Wanda[cis]>
so for iCE40 it's pretty easy
<Wanda[cis]>
it has an SDP blockram and that's it
<mei[m]>
Wanda[cis]: i started to, yes
<mei[m]>
SDP?
<Wanda[cis]>
so the validity rules are "if it has single write port and only synchronous read ports, it's valid for blockram (and also for fallback); if it has all write ports in same clock domain, it's valid for fallback (but you'll suffer)"
<Wanda[cis]>
simple dual-port
<Wanda[cis]>
ie. one read port, one write port, independent
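The iCE40 validity rules quoted above can be sketched as a small classifier. The data model here (ports that carry only a clock domain name, `None` for an asynchronous read port) is an assumption made for illustration, not the memory crate's actual types:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WritePort:
    clock: str


@dataclass
class ReadPort:
    clock: Optional[str]  # None models an asynchronous read port


@dataclass
class Memory:
    write_ports: List[WritePort] = field(default_factory=list)
    read_ports: List[ReadPort] = field(default_factory=list)


def ice40_targets(mem):
    """Return the set of lowering targets the quoted rules allow."""
    targets = set()
    # Fallback: all write ports share a clock domain.
    if len({port.clock for port in mem.write_ports}) <= 1:
        targets.add("fallback")
    # Blockram: a single write port and only synchronous read ports.
    # (The chat does not cover memories without write ports; they are excluded here.)
    if len(mem.write_ports) == 1 and all(p.clock is not None for p in mem.read_ports):
        targets.add("blockram")
    return targets
```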
<Wanda[cis]>
the other common options are SP (single port, which can do both read and writes) or TDP (true dual port, two ports that can do both read and writes)
<Wanda[cis]>
and those are a major fucking pain.
<Wanda[cis]>
unnamed doesn't have read-write ports. and I'm still not sure if it should have.
<Wanda[cis]>
memory_libmap will simply map a read port and a write port to the same hardware read-write port if it detects they are "compatible". but this is where the major shit starts.
<Wanda[cis]>
to combine the two, they must of course have the same address.
<Wanda[cis]>
which is validity rule number one.
<Wanda[cis]>
except. what does it mean they have the same address?
<Wanda[cis]>
consider: if write_enable and read_enable signals are mutually exclusive (say, negations of one another), we can actually merge the two ports even if they have completely different stuff connected to the address inputs, just install a mux
<Wanda[cis]>
we could just say "they must be exactly the same, insert the damn mux yourself if you need to"
<Wanda[cis]>
but. memory_libmap was made for yosys, and yosys has atrociously bad lowering for memory constructs coming from Verilog.
<Wanda[cis]>
the thing actually connected to the write address input is more like mux(write_enable, user_provided_address, X)
<Wanda[cis]>
and to keep you on your toes, the thing connected to the write enable is actually mux(write_enable, all-1, all-0)
<Wanda[cis]>
so I gave up on disentangling this bullshit (the Verilog frontend is not a place of honor. seriously, do not look.) and `memory_libmap` just asks a SAT solver whether `write_enable && read_enable && read_addr != write_addr` is SAT.
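A rough sketch of that query, with brute-force enumeration standing in for the SAT solver and single-bit signals standing in for real netlist values. The function name and the way signals are passed as callables are hypothetical:

```python
from itertools import product


def ports_conflict(inputs, write_enable, read_enable, write_addr, read_addr):
    """Search for an input assignment where both ports are active with different addresses.

    Returns the assignment as a dict, or None if the condition is unsatisfiable,
    in which case the two ports can share one hardware read-write port.
    """
    for values in product((0, 1), repeat=len(inputs)):
        env = dict(zip(inputs, values))
        if write_enable(env) and read_enable(env) and read_addr(env) != write_addr(env):
            return env
    return None


INPUTS = ["sel", "a", "b"]

# Enables are negations of one another; the addresses are unrelated signals.
exclusive_case = ports_conflict(
    INPUTS,
    write_enable=lambda e: e["sel"],
    read_enable=lambda e: not e["sel"],
    write_addr=lambda e: e["a"],
    read_addr=lambda e: e["b"],
)

# Read is always enabled, so any write with a != b is a conflict.
always_read_case = ports_conflict(
    INPUTS,
    write_enable=lambda e: e["sel"],
    read_enable=lambda e: 1,
    write_addr=lambda e: e["a"],
    read_addr=lambda e: e["b"],
)
```

The first case comes back unsatisfiable even though the address inputs differ, which is the mux situation described earlier.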
<galibert[m]>
Something I find very useful for framebuffers is a dual-port where one port is readonly and used by the ramdac, and one is read-or-write-but-not-at-the-same-time and connected to the processor bus. Dunno in which category you'd put it
<mei[m]>
concept: it is decidable whether two signals are mutually exclusive (assuming that the state of the flops is absolutely arbitrary, i.e. if you don't need induction to prove it)
<Wanda[cis]>
anyway! that was rule number 1. there is also rule number 2 which depends on vendor and target cell configuration a lot.
<mei[m]>
Wanda[cis]: i think that's what i'm suggesting 😅
<Wanda[cis]>
rule 2 variant 1 is: write enable must imply read enable (ie. the target cell cannot perform a write without also performing a read)
<Wanda[cis]>
rule 2 variant 2 is: write enable and read enable must be exclusive (you cannot write and read at the same time)
<mei[m]>
what's read enable anyway
<Wanda[cis]>
Xilinx and Lattice let you pick which one should apply to any given port
<Wanda[cis]>
oh, basically clock enable connected to the conceptual read data flip-flop
<Wanda[cis]>
ie. if 0, the read data output doesn't change and holds previous value; if 1, you actually do a read
<Wanda[cis]>
continuing.
<Wanda[cis]>
for iCE40, rule 2 variant 2 always applies
<Wanda[cis]>
and for Altera... hmmmm, I don't remember offhand, but I think there is no rule 2 and they actually have properly separate controls? idk
<Wanda[cis]>
Wanda[cis]: errr, for iCE40 SPRAM, which is a *separate* thing from the iCE40 blockram, and only exists on one chip
<Wanda[cis]>
I kinda forgot about it in previous ranting. sorry.
<Wanda[cis]>
(it is out of scope for now)
<Wanda[cis]>
(it is, as the name suggests, a single-port RAM)
<Wanda[cis]>
(its main defining feature is: big)
<Wanda[cis]>
(by big I mean 256kbits)
<galibert[m]>
not bad
<Wanda[cis]>
anyway. rule 2 variant 2 is an actual mapping viability constraint. if it is not met, you just cannot use the target cell in that configuration.
<galibert[m]>
an altera m10k block is 10240 bits, so that's roughly 25 of them
<Wanda[cis]>
so. memory_libmap obviously calls a SAT solver, because hey at this point why not
<galibert[m]>
I didn't realize how many calls to SAT happen in a synthesizer because Verilog is so bad at expressing wanted constraints
<Wanda[cis]>
rule 2 variant 1, if you actually think about it, is not a hard constraint, because you can emulate a read enable by a contraption involving an extra "holding state" flop and a mux that selects between the two
<mei[m]>
Wanda[cis]: as in "just instantiate the target cell yourself fucker"?
<Wanda[cis]>
so memory_libmap... well, also calls a SAT solver, but only to check whether it can do it the polite way.
<Wanda[cis]>
if it can't, it uses violence
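One possible arrangement of the "holding state" flop and mux, as a behavioral sketch. The exact circuit is an assumption (the chat only names the ingredients), and read-before-write behavior is assumed for the target cell:

```python
import random


class Reference:
    """What the design asks for: read data only changes when re is set."""

    def __init__(self, depth):
        self.mem = [0] * depth
        self.q = 0

    def clock(self, addr, re, we, wdata):
        if re:
            self.q = self.mem[addr]
        if we:
            self.mem[addr] = wdata


class TargetCell(Reference):
    """Hypothetical hardware port that cannot write without also reading."""

    def clock(self, addr, re, we, wdata):
        assert re or not we, "write enable must imply read enable"
        super().clock(addr, re, we, wdata)


class Emulated:
    def __init__(self, depth):
        self.cell = TargetCell(depth)
        self.hold = 0          # the extra "holding state" flop
        self.use_cell = True   # mux select, also a flop

    @property
    def q(self):
        return self.cell.q if self.use_cell else self.hold

    def clock(self, addr, re, we, wdata):
        if we and not re:
            # The forced read is about to clobber cell.q: save the visible value.
            self.hold = self.q
            self.use_cell = False
        elif re:
            self.use_cell = True
        self.cell.clock(addr, re or we, we, wdata)


def equivalent(cycles, seed=0, depth=16):
    """Drive both models with the same random stimulus and compare read data."""
    rng = random.Random(seed)
    ref, emu = Reference(depth), Emulated(depth)
    for _ in range(cycles):
        args = (rng.randrange(depth), rng.random() < 0.5, rng.random() < 0.5, rng.randrange(256))
        ref.clock(*args)
        emu.clock(*args)
        if ref.q != emu.q:
            return False
    return True
```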
<Wanda[cis]>
mei[m]: well yeah
<Wanda[cis]>
I mean. eventually we do want to support it directly. just, you know, priorities.
<Wanda[cis]>
galibert[m]: this is not *quite* the reason
<Wanda[cis]>
the reason is that yosys is bad
<galibert[m]>
oh? /me sits and listens
<Wanda[cis]>
and memory_libmap is faced with a netlist that's quite a mess
<Wanda[cis]>
or, more specifically, because yosys Verilog frontend is really bad
<galibert[m]>
in which way is it bad, it's losing information that's in the original source?
<Wanda[cis]>
yes
<Wanda[cis]>
so. think of how it all should be done in a good toolchain
<Wanda[cis]>
a good toolchain should have a set of Verilog-level patterns that are guaranteed to synthesize to whatever memories the user wants
<Wanda[cis]>
and these patterns should be documented in the synthesizer user-facing docs, as part of the contract with the user
<galibert[m]>
quartus/vivado have that?
<Wanda[cis]>
oh, yes
<Wanda[cis]>
maybe not completely well documented, but they do
<galibert[m]>
so the frontend should pattern match and generate appropriate IR modules?
<Wanda[cis]>
now, memory_libmap is unable to do that, because it's not operating on Verilog code, it's operating on a netlist that's been through many passes already
<Wanda[cis]>
in practice, it can see a bunch of muxes that have been generated by yosys proc lowering
<Wanda[cis]>
think of rule 2 variants
<Wanda[cis]>
they could be trivially expressed and documented as Verilog-level validity constraints
<Wanda[cis]>
for variant 1, "there has to be one read in some branch of an if/case tree, and all writes have to be nested within that branch subtree"
<Wanda[cis]>
for variant 2, "the read and the write have to be in mutually-exclusive if/case branches"
<Wanda[cis]>
done
<galibert[m]>
so you need to still have if/case for the matching
<Wanda[cis]>
entirely syntactic checks
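To illustrate what "entirely syntactic" could look like, here is a sketch of both rule 2 variants checked on a toy if-tree. The tuple-based tree format is made up for this example and is not any real frontend's IR:

```python
# Toy statement tree (hypothetical):
#   ("read",) / ("write",)                  memory accesses
#   ("if", cond, then_block, else_block)    blocks are lists of statements


def accesses(block, path=()):
    """Yield (kind, path); path records (statement index, branch) per enclosing if."""
    for index, stmt in enumerate(block):
        if stmt[0] == "if":
            _, _cond, then_block, else_block = stmt
            yield from accesses(then_block, path + ((index, "then"),))
            yield from accesses(else_block, path + ((index, "else"),))
        else:
            yield stmt[0], path


def exclusive(p, q):
    """True if the two paths split into different branches of the same if."""
    for (i, branch_i), (j, branch_j) in zip(p, q):
        if i != j:
            return False  # different statements in one block: both may execute
        if branch_i != branch_j:
            return True
    return False  # one access is nested inside the other's branch


def split(block):
    found = list(accesses(block))
    reads = [path for kind, path in found if kind == "read"]
    writes = [path for kind, path in found if kind == "write"]
    return reads, writes


def satisfies_variant_1(block):
    """One read, and every write nested within the read's branch subtree."""
    reads, writes = split(block)
    if len(reads) != 1:
        return False
    (read_path,) = reads
    return all(w[: len(read_path)] == read_path for w in writes)


def satisfies_variant_2(block):
    """Every read/write pair sits in mutually exclusive branches."""
    reads, writes = split(block)
    return all(exclusive(r, w) for r in reads for w in writes)
```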
<galibert[m]>
does that mean you need a verilog-front-end dedicated IR, on which pattern recognition is done and eventually lowering to the standard IR?
<Wanda[cis]>
excellent question.
<galibert[m]>
or there are better methods than that?
<Wanda[cis]>
and this is the part that we will actually need to think about.
<galibert[m]>
Ok, that's not obvious then, interesting
<Wanda[cis]>
so. we do have a more advanced tool than yosys already, which is the match cell
<Wanda[cis]>
which does allow us to do pretty simple queries on whether its outputs are mutually exclusive, or one implies the other
<Wanda[cis]>
if we were making our own Verilog frontend, we could make it work by a) defining memory lowering validity rules in terms of read/write enable connections to match cells, b) ensuring the Verilog frontend lowers processes to match cells in a way that preserves these validity rules
<galibert[m]>
aren't you making it, eventually?
<Wanda[cis]>
and. I actually have excellent news: we could even make it work for yosys, by "simply" exporting the RTLIL earlier and replacing the proc pass with something that emits our match cells
<Wanda[cis]>
galibert[m]: there will come a day. that day is not soon.
<galibert[m]>
heh, that I really can understand
<Wanda[cis]>
it's tempting actually, but there are bigger fish to fry
<mei[m]>
that's a very cat-coded metaphor
<Wanda[cis]>
Wanda[cis]: of course, this would involve dealing with RTLIL parsing and lowering yosys processes, which are very much on the batshit side, but... well, just saying it's doable in principle, not sure whether I want to try it.
<galibert[m]>
never saw a cat bother to fry her fish
<mei[m]>
mmm sashimi
<Wanda[cis]>
galibert[m]: it happens a fair bit, actually. the small ones just have different tools for frying fish than the large ones.
<Wanda[cis]>
(ie. humans)
<Wanda[cis]>
see, as far as a cat is concerned, a human is a device that dispenses food.
<galibert[m]>
and pets
<galibert[m]>
and is comfortable to sleep on
<galibert[m]>
I think the list is complete
<Wanda[cis]>
now, continuing with the match cells
<Wanda[cis]>
there's another interesting question here, which is what should be done about Amaranth
<Wanda[cis]>
where memories are operated differently than in Verilog
<Wanda[cis]>
which will actually result in harder to define rules
<galibert[m]>
I thought amaranth was actually cleaner there
<galibert[m]>
having Memory for a start
<Wanda[cis]>
because, well, you don't have reading or writing a memory as an Amaranth statement
<Wanda[cis]>
you can only assign to a memory control signal, which will cause a memory read or write to happen somewhere else
<galibert[m]>
ah, iff enable is active when clock happens
<Wanda[cis]>
mhm
<galibert[m]>
which is kinda what happens in hardware, but is not a statement
<Wanda[cis]>
and that results in additional layers of indirection which have to be pierced through to define validity rules
<galibert[m]>
composite jtag, really?
<Wanda[cis]>
we'd need something like... "read enable and write enable have to be init=0, driven within the same module, and only set within exclusive branches (for variant 2)"
<Wanda[cis]>
for variant 1, even messier, "there has to be a branch that unconditionally assigns 1 to read enable, and all assignments to write enable have to be nested within it"
<Wanda[cis]>
(either that or just read enable combinationally tied to 1, that also works)
<galibert[m]>
ewwww
<Wanda[cis]>
and we have to make these rules pierce through any unconditional assignments, such as connects
<Wanda[cis]>
then, we can document these rules and, being in control of the NIR lowering, make sure this results in the exact match cell connections that unnamed will recognize
<Wanda[cis]>
I consider all of this to be messy, but not super messy, and perfectly doable.
<Wanda[cis]>
there are two alternatives
<galibert[m]>
that's reassuring
<galibert[m]>
I like amaranth after all :-)
<Wanda[cis]>
alternative 1 involves composite ports with Memory.read_write_port_with_write_implies_read() and Memory.read_write_port_with_write_excludes_read(). I have floated this idea by Cat a year ago or so, and it was not very happy about it.
<mei[m]>
USER WAS BITTEN FOR THIS POST
<Wanda[cis]>
alternative 2 involves a SAT solver, which is horrible from a usability perspective because it is really hard to understand why your memory suddenly started being lowered in a significantly worse way, or was rejected entirely; too many actions at a distance involved
<Wanda[cis]>
also.
<Wanda[cis]>
another thing that should be done is taking a survey of other HDLs and how they deal with expressing memories (if at all)
<Wanda[cis]>
I expect for most of them the answer will be "not very well", because they haven't had the proper resources to properly think about or tackle this problem
<Wanda[cis]>
(we do have the massive advantage of designing an end-to-end flow here)
<galibert[m]>
vhdl doesn't, of course, because why would it?
<Wanda[cis]>
I mean
<Wanda[cis]>
vhdl does
<Wanda[cis]>
in more or less the same way as Verilog
<galibert[m]>
which doesn't really, either
<Wanda[cis]>
it's a process-based language where reading a memory and writing a memory are statements within a process.
<Wanda[cis]>
this means it can follow pretty much the same rules
<galibert[m]>
true
<Wanda[cis]>
galibert[m]: have you listened to a word I said?
<galibert[m]>
I have
<Wanda[cis]>
Verilog is better than Amaranth for expressing memory validity constraints
<galibert[m]>
I meant there's arrays, and whether it maps to a memory block is kind of implicit
<Wanda[cis]>
because you have the memory operations directly tied to the statement structure
<Wanda[cis]>
and so is VHDL, because it is the same thing
<galibert[m]>
you can't say "I want a memory block here and fail otherwise" portably
<Wanda[cis]>
you can!
<galibert[m]>
really?
<Wanda[cis]>
have you looked at 1364.1
<galibert[m]>
I've looked at parts of it
<Wanda[cis]>
there are actual attributes you can use to request a memory block
<galibert[m]>
I needed to keep some of my sanity though
<Wanda[cis]>
yosys will, in fact, respect them
<galibert[m]>
impressive
<Wanda[cis]>
and for VHDL I'd like to direct you to IEEE 1076.6 section 6.5
<Wanda[cis]>
like, yes, Verilog and VHDL have massive flaws. but they are industry-standard languages that have been used for decades to create most of the digital logic that exists, why on earth would you think they have no means to specify a memory block?
<galibert[m]>
I thought they had no means to require it
<mei[m]>
any thoughts on terminology that would distinguish "pattern (building block that implements prjunnamed_pattern::Pattern)" and "pattern (the syntactic category parsed by netlist_matches!)"?
<whitequark[cis]>
"pattern struct" or "pattern implementation" vs "pattern syntax"?
<mei[m]>
maybe the building blocks should be called "matchers"?
<whitequark[cis]>
sure
<mei[m]>
also, thoughts on renaming netlist_matches! to netlist_match!, so that the naming parallels Rust's match construct rather than the matches! macro?
<whitequark[cis]>
sure
<mei[m]>
Wanda: can you re-read your doc comment on top of `FlipFlop`? it doesn't make much sense
<mei[m]>
or, if it is correct, then reset_over_enable isn't a priority selection, it chooses whether enable gates reset
<whitequark[cis]>
that's what it does, if i understood you correctly
<Wanda[cis]>
<mei[m]> "or, if it is correct, then ..." <- is there a difference?
<Wanda[cis]>
so... the thing about enable, as commonly understood in EDA, is that its polarity is kinda backwards when you think about it
<Wanda[cis]>
the "neutral" state for a reset is 0, no reset; the "neutral" state for an enable is 1, always enabled
<Wanda[cis]>
imagine, for a moment, that instead of an "enable" which allows the flop value to be changed, we have a "hold" input with the opposite semantics (a 1 forces the flop to hold current value)
<Wanda[cis]>
then the thing we have is exactly hold vs reset priority
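The hold-versus-reset reading can be written down as a next-state function. This is a sketch of the semantics as described in the chat for a synchronous reset only; the function signature is hypothetical and is not the `FlipFlop` API:

```python
def next_state(q, d, enable, reset, reset_value, reset_over_enable):
    """Value the flop takes at the next clock edge (async clear is not modeled)."""
    hold = not enable  # "hold" is enable with the polarity flipped
    if reset_over_enable:
        # Reset has priority: it takes effect even while the flop is held.
        if reset:
            return reset_value
        return q if hold else d
    # Hold has priority: a held flop ignores reset, so enable gates reset.
    if hold:
        return q
    return reset_value if reset else d
```

The two orderings only differ when `reset` is asserted while `enable` is 0.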
<mei[m]>
wait, is clear like, asynchronous asynchronous?
<Wanda[cis]>
it is completely asynchronous, yes
<mei[m]>
huh, so the with_enable and with_reset builders don't commute
<Wanda[cis]>
yeah
<mei[m]>
so are FPGA cells that expose separate init_value, reset_value and clear_value common?
<Wanda[cis]>
not really
<mei[m]>
but many interesting subsets can be found?
<Wanda[cis]>
I don't know of anything that has all three
<Wanda[cis]>
but, yes
<Wanda[cis]>
most Xilinx FPGAs have separate init_value and either reset_value or clear_value (you cannot have both clear and reset)
<mei[m]>
is it configurable per cell?
<mei[m]>
what's the priority of hold vs reset on those?
<Wanda[cis]>
per SLICE (group of 2, 4, or maybe 8 LUTs depending on the device)
<Wanda[cis]>
reset over enable
<Wanda[cis]>
SiliconBlue, otoh, has always-0 init_value and then arbitrary either reset_value or clear_value
<mei[m]>
who has enable over reset?
<Wanda[cis]>
... with enable over reset
<Wanda[cis]>
Xilinx Spartan 6 is like other Xilinx, except init_value must match reset_value/clear_value
<Wanda[cis]>
Lattice is like Spartan 6
<Wanda[cis]>
and Altera.... it is unclear what the fuck is going on with Altera because the documentation is crap
<Wanda[cis]>
but it does have independently usable reset and clear
<Wanda[cis]>
at least on some devices
<Wanda[cis]>
hm
<Wanda[cis]>
the ALM-based devices seem to have independent reset and clear, with the requirement that init_value = reset_value = clear_value = 0
<mei[m]>
Wanda[cis]: so, you can always easily emulate a synchronous reset. how do you implement a flop with async reset and `clear_value != init_value` on spartan 6, though?
<mei[m]>
ALM?
<galibert[m]>
altera stuff
<Wanda[cis]>
Adaptive Logic Modules
<Wanda[cis]>
one of the two genders of Altera logic blocks
<Wanda[cis]>
well, two major ones, at any rate
<Wanda[cis]>
mei[m]: violence.
<Wanda[cis]>
create two flops, one with clear_value, another with init_value
<Wanda[cis]>
create a mux between them, drive it from a latch set by the clear signal
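A behavioral sketch of that two-flop contraption, checked against an ideal flop with independent `init_value` and `clear_value`. All class names are hypothetical, and the latch is assumed to stay set once clear has been asserted:

```python
import random


class RestrictedFlop:
    """Flop whose power-on value must equal its async clear value."""

    def __init__(self, value):
        self.value = value
        self.q = value

    def step(self, d, clock_edge, clear):
        if clear:
            self.q = self.value  # async clear wins over the clock
        elif clock_edge:
            self.q = d


class IdealFlop:
    """The flop we want: independent init_value and clear_value."""

    def __init__(self, init_value, clear_value):
        self.q = init_value
        self.clear_value = clear_value

    def step(self, d, clock_edge, clear):
        if clear:
            self.q = self.clear_value
        elif clock_edge:
            self.q = d


class EmulatedFlop:
    def __init__(self, init_value, clear_value):
        self.init_flop = RestrictedFlop(init_value)    # its clear input is tied off
        self.clear_flop = RestrictedFlop(clear_value)
        self.cleared = False  # latch set by the clear signal, drives the mux

    def step(self, d, clock_edge, clear):
        self.cleared = self.cleared or clear
        self.init_flop.step(d, clock_edge, clear=False)
        self.clear_flop.step(d, clock_edge, clear)

    @property
    def q(self):
        return self.clear_flop.q if self.cleared else self.init_flop.q


def equivalent(cycles, init_value, clear_value, seed=0):
    """Compare the emulation with the ideal flop under random stimulus."""
    rng = random.Random(seed)
    ideal = IdealFlop(init_value, clear_value)
    emu = EmulatedFlop(init_value, clear_value)
    if ideal.q != emu.q:
        return False
    for _ in range(cycles):
        args = (rng.randrange(2), rng.random() < 0.7, rng.random() < 0.1)
        ideal.step(*args)
        emu.step(*args)
        if ideal.q != emu.q:
            return False
    return True
```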
<galibert[m]>
I don't think the alms have independent reset and clear, they just have async and sync clear
<galibert[m]>
where sync is just a mux on the ff input
<Wanda[cis]>
galibert: "sync clear" *is* reset
<galibert[m]>
weird terminology, but ok
<galibert[m]>
but yeah, sync and async can only set to zero in any case