<whitequark[cis]>
zyp[m]: or maybe ask for a pair of ports whose addresses are tied together
<whitequark[cis]>
however, one reason to have a readwrite port is because enables aren't really independent in that case
<tpw_rules>
they seeeeeeeem to be according to altera docs
<zyp[m]>
whitequark[cis]: how do you express that as a signature?
<zyp[m]>
and if you do, how does it differ from a RW port?
<whitequark[cis]>
tpw_rules: not *always* independent
<tpw_rules>
i think she means sort of like a socket interface which returns a pair of connected sockets: you ask for a pair of connected ports
<whitequark[cis]>
zyp[m]: you don't, they just are physically same
<galibert[m]>
tpw_rules: transparency makes it an expensive pass-through
<whitequark[cis]>
lib.wiring never says that all the stuff you get as signals in an interface object refers to like, different signals
<whitequark[cis]>
it just says what the structure is
<tpw_rules>
that would also be hard to get multiple ports from, unless you can ask for n tandemed read ports with your write port
<zyp[m]>
whitequark[cis]: so in other words they're not very usable with `wiring.connect`
<tpw_rules>
wiring.connect doesn't care, it's just a bit arbitrary which one's address really matters (the latest one)
<whitequark[cis]>
not as individual separate ports no
<zyp[m]>
tpw_rules: `wiring.connect` doesn't care, but you end up driving `addr` twice
<tpw_rules>
whitequark[cis]: do you have an example offhand of not independent conditions? i am curious what a sample constraint would be
<whitequark[cis]>
ask Wanda
<tpw_rules>
e.g. i think an en + rd/wr switch would be like... 75% dual port
<whitequark[cis]>
well, that's actually what i'm talking about
<whitequark[cis]>
en+we is very different from re+we
<whitequark[cis]>
because the former can't express re=we=1
<whitequark[cis]>
but I don't fully understand it...
<tpw_rules>
yes, but i would harshly judge a vendor who claimed that was true dual port
<tpw_rules>
(my considerably valuable input)
<tpw_rules>
and, perhaps, amaranth should not expose it like that
<tpw_rules>
but anyway, i'm happy about a resolution on the clock issue, help from amaranth shouldn't be necessary. i am okay with punting on true dual port for now, but it would be nice to have a language-sanctioned solution
<tpw_rules>
to me i don't think a sat solver is necessary, i think it should only be true dual port where port_r.addr.eq(x) and port_w.addr.eq(y) and `x is y`
<tpw_rules>
(which might be some simple same-net check in the IR?)
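A toy version of the "same-net" check suggested above (hypothetical helper names; not the actual Amaranth IR): a read/write port pair qualifies for true dual port only when both addresses are literally the same object, i.e. driven by one net.

```python
# Hypothetical sketch; `Port` and `can_fuse_as_tdp` are invented names,
# not part of the Amaranth IR.
class Port:
    def __init__(self, addr):
        self.addr = addr

def can_fuse_as_tdp(port_r, port_w):
    # Identity, not equality: two distinct nets that happen to carry the
    # same value do not qualify.
    return port_r.addr is port_w.addr

addr = object()  # one shared address net
assert can_fuse_as_tdp(Port(addr), Port(addr))
assert not can_fuse_as_tdp(Port(object()), Port(object()))
```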
<whitequark[cis]>
we've also considered that
<whitequark[cis]>
basically, it's hard to come up with a very satisfying solution. I think we'll need to do it iteratively
<whitequark[cis]>
so maybe we just don't have TDP at first. well, iCE40 doesn't have TDP at all :D
<whitequark[cis]>
also I still wonder if inference cannot be made to work
<tpw_rules>
yes, you can throw yourself at the mercy of the tools
<tpw_rules>
i wonder if it would end up needing hand crafted templates
<tpw_rules>
i'm not sure what the full extent of quartus's inference capabilities are, i know it rejects non power of two sizes. and we're not doing weird stuff like byte enables. but idk about many read port duplication. perhaps i need to move those tests up on my priority list
<whitequark[cis]>
tpw_rules: if the only problem is NPOT memories then the existing get_memory hook is absolutely trivial to adjust for that
<tpw_rules>
i've also had problems with transparency
<whitequark[cis]>
which?
<tpw_rules>
my quartus will not infer transparent=True for cyclone v
<tpw_rules>
then there's the real issue that motivated my interest in all this, which is wrappers to make the init behavior match. but i think those could be just wrappers indeed
<whitequark[cis]>
I see
<tpw_rules>
as i understand now amaranth just tells yosys "memory" and lets yosys generate all the verilog. i think for reliable inference we would need to fix up yosys a bunch and/or have amaranth just write out known-good snippets
<whitequark[cis]>
known-good snippets aren't something i'm necessarily willing to do at this stage
<whitequark[cis]>
because amaranth doesn't actually do verilog, only yosys does
<tpw_rules>
ok, so that's not "inference" to you
<tpw_rules>
good to know
<whitequark[cis]>
it's a layering issue
<tpw_rules>
yes
<whitequark[cis]>
it would have to go into yosys' write_verilog backend
<whitequark[cis]>
-mode quartus or whatever
<tpw_rules>
does RTLIL support arrays that are not memory? could amaranth instead use non-memory RTLIL objects and set up something that it knows will be written out to verilog how quartus likes?
<whitequark[cis]>
the closest thing to that is an RTLIL memory object.
<whitequark[cis]>
unless you're like, writing verilog separately and then instantiating it, which is something I'll only consider if it can be shown that inference will never work with quartus
<tpw_rules>
which gives yosys back full control over how to write out the verilog. i suppose it's not out of the question to teach yosys how to do it and then have amaranth rely on that
<whitequark[cis]>
(which is doubtful)
<tpw_rules>
okay well i gotta run to dinner, thanks for the discussion. it would definitely be nice to keep work and complexity out of amaranth for this
<whitequark[cis]>
see you
<tpw_rules>
amaranth being able to actually generate memories so easily is a big part of the value to me indeed
<Wanda[cis]>
re enables: the motivating example is Xilinx, which has EN and WE signals (WE being per-byte)
<Wanda[cis]>
there are two modes: plain (read_en = EN, write_en = EN & WE, cannot express write without read), or `NO_CHANGE` (read_en = EN && WE == 0, write_en = EN & WE, cannot express write and read at the same time)
<Wanda[cis]>
so whichever mode you pick, there is one combination you cannot express
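The two enable encodings above can be sketched as truth tables (WE simplified to a single bit rather than per-byte; function names are illustrative), showing the one (read_en, write_en) combination each mode cannot express:

```python
# Sketch, assuming 1-bit EN/WE; each mode maps (EN, WE) to the effective
# (read_en, write_en).
def plain_mode(en, we):
    # write implies read: read_en = EN, write_en = EN & WE
    return en, en & we

def no_change_mode(en, we):
    # NO_CHANGE: read_en = EN & ~WE, write_en = EN & WE
    return en & (1 - we), en & we

plain = {plain_mode(en, we) for en in (0, 1) for we in (0, 1)}
no_change = {no_change_mode(en, we) for en in (0, 1) for we in (0, 1)}
assert (0, 1) not in plain       # plain mode cannot write without reading
assert (1, 1) not in no_change   # NO_CHANGE cannot read and write together
```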
<Wanda[cis]>
for yosys memory inference I basically gave up and decided to use a SAT solver: if SAT solver determined that `write_en && !read_en` is UNSAT, I use the first mode; if SAT solver determines that `write_en && read_en` is UNSAT, I use the second mode; otherwise, I reject the port combination
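The decision procedure described above can be modeled with exhaustive enumeration standing in for the SAT solver (fine for a sketch, hopeless on real netlists; `pick_mode` is an invented name):

```python
# Toy model: read_en/write_en are functions of the surrounding logic's inputs.
from itertools import product

def pick_mode(read_en, write_en, n_inputs):
    combos = list(product((0, 1), repeat=n_inputs))
    if not any(write_en(*i) and not read_en(*i) for i in combos):
        return "plain"        # write_en && !read_en is UNSAT
    if not any(write_en(*i) and read_en(*i) for i in combos):
        return "no_change"    # write_en && read_en is UNSAT
    return None               # reject the port combination

# read always on, write gated: write implies read -> plain mode works
assert pick_mode(lambda en, we: en, lambda en, we: en and we, 2) == "plain"
# mutually exclusive enables -> NO_CHANGE mode works
assert pick_mode(lambda r, w: r and not w, lambda r, w: w, 2) == "no_change"
# fully independent enables -> neither invariant holds, reject
assert pick_mode(lambda r, w: r, lambda r, w: w, 2) is None
```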
<Wanda[cis]>
note that my approach was determined by a simple constraint: I needed to be able to infer memories after the original Verilog has long been converted to a mux tree, and a bunch of optimization passes have already run on it
<Wanda[cis]>
another sound approach would have been defining some blessed Verilog code patterns that you're supposed to use, and recognizing that directly (I think this is what most toolchains do?), but that'd have required me to hook into the Verilog frontend in a much earlier stage, and that is just... holy fuck you don't want to touch the yosys Verilog frontend.
lf has quit [Ping timeout: 264 seconds]
lf_ has joined #amaranth-lang
<Wanda[cis]>
further, yosys memory inference was also very much designed for Verilog in the first place, obviously
<Wanda[cis]>
the memory inference pass would've been very different if it was designed to consume some higher-level description of memories rather than trying to divine it from Verilog statements
<Wanda[cis]>
thus, arguably, we shouldn't be basing Amaranth memory design on yosys at all, since its constraints do not really apply.
<Wanda[cis]>
this is why yosys never got a read-write memory port: there's simply no way to express such a thing in Verilog
<Wanda[cis]>
you can have a memory write, and a memory read, and then you can have some magic that recognizes they can be combined
<Wanda[cis]>
now, if I were to do a greenfield design for high-level structures for memory description (as if I could ever do this while having my mind contaminated by yosys), read-write ports could be a very good idea; the really annoying part though is that you need like three variants of them
<Wanda[cis]>
(with less horrifying names hopefully)
<Wanda[cis]>
and the latter two would probably have EN + WE signals like Xilinx
<Wanda[cis]>
anything else, unfortunately, will involve SAT solving bullshit
<Wanda[cis]>
or having significant parts of hw functionality inaccessible
<Wanda[cis]>
the problem with all this is that having a legitimately hardware-independent description is hard because there are so many tiny semantic differences.
<Wanda[cis]>
another annoying one: does asserting reset on the read-write port override write enable? idk, depends on the vendor.
<Wanda[cis]>
it doesn't fully capture all possibilities, because some vendors do even crazier bullshit, but it captures the .... reasonably sane subset
<Wanda[cis]>
(did you know old Altera blockrams *capture* write address & data at clock posedge, but *perform* the write at negedge? fun.)
<Wanda[cis]>
(did you know the async reset on the same blockrams also applies to the write registers, not just read registers? kill me.)
<whitequark[cis]>
honestly i suspect we'll just have to do the SAT bullshit all over again
<Wanda[cis]>
I... don't know
<Wanda[cis]>
I think if you added the ReadWritePort with the three possibilities I outlined (exclusive read/write, independent read/write, write implies read), then defined some conservative semantics for reset, you could do memory lowering entirely without SAT
<whitequark[cis]>
that's pretty gross though
<Wanda[cis]>
shrug
<Wanda[cis]>
perhaps
<Wanda[cis]>
idk how to make it work
<whitequark[cis]>
so many platform details leaking unnecessarily
<Wanda[cis]>
but it may be worth it
<Wanda[cis]>
like
<Wanda[cis]>
I ... really don't like the SAT-based thing
<Wanda[cis]>
it feels so fucking fragile
<whitequark[cis]>
i think the ReadWritePort thing is also fragile in a different, similar way
<whitequark[cis]>
cause when you go to another platform it breaks if you're not making it happy
<Wanda[cis]>
so does SAT
<Wanda[cis]>
the difference is that with ReadWritePort we can actually document the combinations expected to work in a sane way
<whitequark[cis]>
we don't even have testbenches for anything
<whitequark[cis]>
basically, i'm not ready to commit to either of these solutions until we actually understand the problem space
<whitequark[cis]>
not just on the hardware side, but also on "what the toolchains actually do" side, and "what are the desirable configurations people actually use" side
<Wanda[cis]>
the problem space is horrifying.
<Wanda[cis]>
also, another major wart
<Wanda[cis]>
if you want your memories to infer well, you have to give up on determinism
<Wanda[cis]>
particularly TDP
<Wanda[cis]>
what's the transparency behavior between the two TDP ports on common BRAMs? fuck if I know
<Wanda[cis]>
I spent a lot of time reading documentation and sim models trying to figure that out, and most vendors just go "lol idk"
<whitequark[cis]>
hmm, so our behavior is currently defined as "you get new value if transparent_for, old value otherwise"
<whitequark[cis]>
and we may have to go to "you get new value if transparent_for, unspecified value otherwise"
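The current rule ("new value if transparent_for, old value otherwise") can be sketched behaviorally for a sync read port (all names illustrative, not the Amaranth memory API):

```python
# Toy model of same-cycle read/write interaction on a sync read port.
def sync_read(mem, raddr, write, transparent_for):
    # `write` is (port_name, waddr, wdata) performed in the same cycle.
    port_name, waddr, wdata = write
    if port_name in transparent_for and waddr == raddr:
        return wdata          # transparent: read observes the new value
    return mem[raddr]         # non-transparent: read returns the old value

mem = {0: 10, 1: 11}
assert sync_read(mem, 0, ("wp", 0, 99), transparent_for={"wp"}) == 99
assert sync_read(mem, 0, ("wp", 0, 99), transparent_for=set()) == 10
assert sync_read(mem, 1, ("wp", 0, 99), transparent_for={"wp"}) == 11
```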
<Wanda[cis]>
like, "get old value" tends to be supportable for intra-port transparency
<Wanda[cis]>
for inter-port... I thiiiink Xilinx can do it, and that's it?
<Wanda[cis]>
I mean, for BRAMs; for LUTRAMs it just happens everywhere by default
<whitequark[cis]>
i think we just need to teach unspec values to the simulator and aggressively emit those on anything that looks nondeterministic that we can't practically just fix
<whitequark[cis]>
which is a blocker for the desirable uninitialized memories feature
<whitequark[cis]>
so it kinda has to happen anyway
<whitequark[cis]>
anyway, i'm checking out
<Wanda[cis]>
also as an argument against SAT
<Wanda[cis]>
with ReadWritePort, whether your memory infers or not will depend (with suitable implementation) on the memory ports you created and nothing else
<Wanda[cis]>
with SAT-based approach, your success or failure can depend on some logic 10 modules across from the one containing the memory
<whitequark[cis]>
yes, I understand the downsides
<Wanda[cis]>
people are already surprised by lesser shit happening!
<Wanda[cis]>
like the thing where we infer BRAMs sometimes for memories with comb read ports
<whitequark[cis]>
I mean, that seems like a valid and desirable optimization
<Wanda[cis]>
because it turns out that yosys managed to merge some random flop that happens to be on the read path into the memory
<tpw_rules>
Wanda[cis]: altera only supports old data for inter-port
<tpw_rules>
(and only new data for intra-port)
<Wanda[cis]>
hm
<Wanda[cis]>
is it "new data" or "new data on we-enabled lanes, 'x on non-we-enabled lanes, unless no write, in which case current data"? I recall one vendor being batshit like this
<Wanda[cis]>
it may have been altera
<tpw_rules>
uh that sounds close, but i kinda skimmed past that part since amaranth doesn't have lane support
<Wanda[cis]>
uh? of course it has lane support
<tpw_rules>
i guess that might come up with mixed width
<Wanda[cis]>
it's called granularity
<Wanda[cis]>
mixed width doesn't have to be involved in this
<tpw_rules>
buh you're right, idk how i forgot that
<tpw_rules>
anyway at least new data is patchable with some extra logic, bearing in mind your comment about the haters
<tpw_rules>
sigh. i'm kinda negative on SAT too for my 2 cents
<Wanda[cis]>
new data is patchable if you can rely on non-we-enabled lanes actually reading back the current data
<tpw_rules>
according to the manual that's a selectable option
<Wanda[cis]>
... okay, yeah, according to my notes it does work on altera
<tpw_rules>
oh wait that might be family specific
<tpw_rules>
but you can still do new data if you use an old data mode, it's just more limiting
<Wanda[cis]>
I looked at every family for my notes
<tpw_rules>
the doc i have seems to say that on the native intra-port mode, some families allow you to read current data for non-we-enabled lanes. some mandate returning x
<Wanda[cis]>
oh.
<Wanda[cis]>
right.
<Wanda[cis]>
M4K
<Wanda[cis]>
I grepped for the wrong thing
<tpw_rules>
M10K mandates x
<whitequark[cis]>
Wanda: haha oh god we might actually want `TrueSettle`
<Wanda[cis]>
cyclone, cyclone II, stratix, stratix II, arria gx
<Wanda[cis]>
Catherine: ... what is it now
<whitequark[cis]>
I'm currently debugging put_stream and get_stream functions in the OBI codebase, which definitely do not function as they should
<Wanda[cis]>
OBI?
<whitequark[cis]>
doesn't matter
<whitequark[cis]>
point is, there is a weird race condition, and it goes away if i double or triple up yield Settle in add_testbench_wrapper
<Wanda[cis]>
... I think I may have said something regarding chainsaws
<whitequark[cis]>
chainsaws?
<Wanda[cis]>
and fucking thereof.
<cr1901>
That sounds dangerous
<Wanda[cis]>
anyway.
<Wanda[cis]>
what if we just
<Wanda[cis]>
(always a good beginning, I know)
<Wanda[cis]>
always schedule processes in front of testbenches?
<whitequark[cis]>
yes I suspect we have to do something like that
<Wanda[cis]>
like, if there's a schedulable process (and pyrtl emitted stuff counts as process), run it; only look at testbenches when that queue runs out
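The scheduling rule described here can be sketched as two queues, with testbenches only considered once the process queue is drained (names and queue contents are made up for illustration):

```python
# Toy scheduler: processes (incl. pyrtl-emitted ones) always run before
# any testbench within a delta cycle.
from collections import deque

def run_delta(processes, testbenches, log):
    procs, tbs = deque(processes), deque(testbenches)
    while procs or tbs:
        if procs:                 # processes always go first
            log.append(("process", procs.popleft()))
        else:                     # testbenches only when no process is runnable
            log.append(("testbench", tbs.popleft()))

log = []
run_delta(["p1", "p2"], ["tb1"], log)
assert log == [("process", "p1"), ("process", "p2"), ("testbench", "tb1")]
```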
<Wanda[cis]>
hm
<Wanda[cis]>
there... should also be delta cycles in there somewhere
<Wanda[cis]>
ugh
<Wanda[cis]>
yeah
<Wanda[cis]>
chainsaw please
<whitequark[cis]>
fucking gtk3 gtkwave. garbage because it's gtkwave and because it's gtk3! double the horribleness
Degi_ has joined #amaranth-lang
<whitequark[cis]>
still doesn't support removing a trace with Delete because no one competent ever touched that UI
Degi has quit [Ping timeout: 255 seconds]
Degi_ is now known as Degi
<cr1901>
Thoughts on Surfer?
<whitequark[cis]>
can't stand gesture based interfaces
<cr1901>
very reasonable
<whitequark[cis]>
basically unusable for me with the current input model
<cr1901>
I don't use it full time either, but I have it installed in the hopes that it'll eventually be good enough for me
<whitequark[cis]>
i instrumented the simulator to advance vcd time whenever it does a delta cycle, and also show when it executes coro and rtl processes
<whitequark[cis]>
you can clearly see how sometimes that happens in the same delta cycle
<whitequark[cis]>
this is Not Correct
<whitequark[cis]>
actually it's not 100% clear what the causality on that is, let's see
<tpw_rules>
is that arrow like, a delta function
<whitequark[cis]>
its just a vcd thing i put in a bunch of places
<whitequark[cis]>
ok, so here gtkwave glitched out displaying cursors, but fortunately these are actually the exact two places i wanted to highlight
<whitequark[cis]>
1st time get_testbench gets called, it's doing yield stream.valid. 2nd time it's called, it's doing yield stream.payload (more or less)
<whitequark[cis]>
in between these, the pyrtl processes do their thing, and update the state
<whitequark[cis]>
there's also a second race condition here, which is not an amaranth issue
<whitequark[cis]>
namely, we have m.d.comb += self.dac_stream.valid.eq(self.dwell_stream.valid)
<whitequark[cis]>
and then we have put_testbench set dwell_stream.valid, and get_testbench check dac_stream.valid
<whitequark[cis]>
naturally they end up running in parallel
<zyp[m]>
huh, I like those arrows, I kinda would like to be able to make them from testbenches
<whitequark[cis]>
it's the vcd "event" feature
<whitequark[cis]>
it's been requested before but is kind of tricky to expose in a non-awful way
<whitequark[cis]>
it's also completely unsupported by cxxrtl
<whitequark[cis]>
ideally what i want to see is annotations, where you can hover around a timeline and at various instantaneous moments (which can happen within zero time too) you can see something happening, like a process sending a command
<whitequark[cis]>
and guess what!!! the CXXRTL protocol already supports those, they're called "diagnostics"
<whitequark[cis]>
and the CXXRTL protocol also lets you distinguish happens-before relationship within zero time
<whitequark[cis]>
so I think the play here is to add CXXRTL protocol support to pysim and then use Surfer or whatever as a frontend
<zyp[m]>
Surfer can ingest CXXRTL protocol?
<whitequark[cis]>
yeah
<whitequark[cis]>
it's not public yet since the protocol is fairly unstable still
<whitequark[cis]>
and it requires you to use CXXRTL from a branch
<whitequark[cis]>
but yeah
<zyp[m]>
nice, and yeah, that'd make sense
<whitequark[cis]>
anyway back to testbenches
<whitequark[cis]>
what do we do about this mess?
<whitequark[cis]>
I see two issues at hand
<whitequark[cis]>
1. testbench processes racing with pyrtl processes under... some conditions?
<whitequark[cis]>
actually, hold on
<whitequark[cis]>
yeah I'm wrong about the first race. the testbench processes wake up and immediately execute Settle
<whitequark[cis]>
yeah, I think the only issue here is basically that if one testbench does e.g. "x.eq(1), y.eq(2), z.eq(3)" and another does "y, y, y"
<whitequark[cis]>
then neither of those operations is atomic at all, even though both could trivially be
<whitequark[cis]>
I think the way we schedule testbenches is fundamentally broken
<whitequark[cis]>
they're preemptible at every single yield, and worse, they are guaranteed to be preempted at every single yield if that's at all possible
<whitequark[cis]>
you know how people say that concurrent programming with asyncio is easier than threads because asyncio tasks can get preempted only at await points and threads can get preempted anywhere? well, if you make "variable get" and "variable set" await points, you basically make threads out of asyncio
<whitequark[cis]>
in fact not just threads, but threads with a scheduler that is actively being evil
<zyp[m]>
combinational signals have to preempt testbenches, whether they are handled by pyrtl or a process
<whitequark[cis]>
elaborate?
<zyp[m]>
code has m.d.comb += a.eq(b), testbench does yield b.eq(whatever) then yield a
<zyp[m]>
or equivalently async for x in sim.changed(b): sim.set(a, x)
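The comb-path case zyp describes can be modeled in plain Python (a toy simulator, not the Amaranth one): a testbench-context set must re-run comb rules to a fixpoint before the next read.

```python
# Toy model of "m.d.comb += a.eq(b)" plus a testbench that sets b, reads a.
class ToySim:
    def __init__(self):
        self.values = {"a": 0, "b": 0}
        self.comb_rules = [("a", lambda v: v["b"])]  # m.d.comb += a.eq(b)

    def settle(self):
        changed = True
        while changed:            # run comb rules to a fixpoint
            changed = False
            for target, fn in self.comb_rules:
                new = fn(self.values)
                if self.values[target] != new:
                    self.values[target] = new
                    changed = True

    def set(self, sig, value):    # testbench-context set: settles immediately
        self.values[sig] = value
        self.settle()

    def get(self, sig):
        return self.values[sig]

sim = ToySim()
sim.set("b", 5)
assert sim.get("a") == 5          # the comb path already propagated
```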
<whitequark[cis]>
yes, of course
<whitequark[cis]>
sim.set(a) where a is awaited on must be a preemption point, I don't dispute that
<whitequark[cis]>
however I don't think that any other sim.set() or any sim.get() should be!
<whitequark[cis]>
that's just making your life harder for no reason
<zyp[m]>
hmm
<zyp[m]>
so if I've got async def foo(sim): await sim.tick(); await sim.set(x, whatever) and async def bar(sim): await sim.tick(); await sim.get(x), the implicit settle will make bar read what foo just wrote, right?
<zyp[m]>
but if I've instead got async def foo(sim): await sim.tick(); await sim.set(y, something); await sim.set(x, whatever) it can be preempted between the two sets?
<Wanda[cis]>
I think we can make it even stricter
<Wanda[cis]>
if you have sim.set in testbench, it can get preempted by processes, but not by other testbenches
<whitequark[cis]>
yes, I was just talking about this on a voice call with Isabel
<whitequark[cis]>
there's actually a case where you sim.set() in one testbench and await sim.changed() on the same thing in another
<whitequark[cis]>
ok, so as a first attempt, I'll make yield eq() return True if that triggered something
<whitequark[cis]>
and then make the testbench wrapper only settle on Assign if it returns True
<whitequark[cis]>
Wanda: I guess we can think of `Print` and `Assert` as a type of testbench that lives in RTL
<whitequark[cis]>
and schedule them after all other RTL processes, together with testbenches
<Wanda[cis]>
... do we get that check phase after all?
<whitequark[cis]>
no
<Wanda[cis]>
it's equivalent though
<whitequark[cis]>
actually, not necessarily?
<whitequark[cis]>
depending on how preemption ends up working (and this can actually cause comb assertions to pass/fail)
<Wanda[cis]>
hm
<whitequark[cis]>
so let me explain first what made me rethink it
<Wanda[cis]>
"testbenches are (or, should be) only scheduled if the set of runnable processes (incl. RTL) is empty": true or false?
<whitequark[cis]>
so you know how CXXRTL works right? you have an eval/commit loop. I think that is actually a perfectly fine design and requires no adjustment, for the following reason: the eval/commit loop is a way of updating shared mutable state that has internal side effects semantically represented by parallel processes
<whitequark[cis]>
the eval/commit loop ensures that the side effects are fully deterministic
<whitequark[cis]>
(let's ignore $print and $check here for a second)
<whitequark[cis]>
outside of the core eval/commit loop, you have the testbenches, usually one in CXXRTL, but you may easily want more than that. the testbenches do not participate in the eval/commit loop and their execution isn't necessarily deterministic; they serialize in whichever way they happen to be written
<whitequark[cis]>
the fact that the stuff inside the eval/commit loop looks like "process triggered by async event" and the stuff outside the eval/commit loop looks like "process triggered by async event" is, I think, a red herring
<whitequark[cis]>
they're fundamentally different kinds of entities
<whitequark[cis]>
the former is an implementation detail of how netlists are translated to C++. logically, you give the circuit a map {input=>value} and you get back a map {output=>value}; the circuit is a function in discrete time. you could even make it a pure function if you do ({input=>value},state)->({output=>value},state')
<whitequark[cis]>
the latter is an inalienable part of the programming model of testbenches
<whitequark[cis]>
the way I've been originally planning to integrate CXXRTL with Amaranth was by introducing a "reactor". I haven't actually added one because I wasn't sure of the design, but now I'm getting more confident in what it should be
<whitequark[cis]>
the CXXRTL reactor would get a bunch of CXXRTL modules, wire them up together (so that you can compose big modules out of smaller modules with separate compilation), let you add a clock driver (mostly just for performance reasons), and then let you wait on netlist wires
<whitequark[cis]>
new blackboxes actually depend on building the reactor, since you want your blackbox to be invoked on a clock and gated with some enable, and you don't want to manually run it from the toplevel
<whitequark[cis]>
the reasons I haven't added a reactor were mainly: (a) I did not know how to wire together separate cxxrtl::modules efficiently, since you could have arbitrary comb connections and I wanted to avoid injecting delta cycles as much as possible, and (b) it wasn't clear exactly what the concurrency semantics would be otherwise
<whitequark[cis]>
it seemed to me like there would need to be one big event loop that both cxxrtl::modules register themselves in, and testbenches/blackboxes do, and they'd just all kind of mash together. but that's unsatisfying because of clock gating: a clock gating primitive like BUFGCE is one of the few places where you really, unavoidably care about ordering of events in zero physical time
<whitequark[cis]>
(back)
<whitequark[cis]>
the way I think about it now is: the reactor is the boundary between the deterministic and the nondeterministic. everything inside is a bunch of mutable state that's guarded by the eval/commit loop guarantees, and which should be completely deterministic (provably so if you don't allow arbitrary code in eval()). everything outside is the nondeterministic world with I/O, testbenches, synchronization via mutexes, channels, whatever other means, which the reactor notifies about state changes via subscriptions
<whitequark[cis]>
actually "reactor" at this point is a misnomer because it shouldn't even have an event loop (the eval/commit loop isn't an event loop), it's just an efficient, imperative interface over what's naturally a pure function
<whitequark[cis]>
so you have the "reactor" (i'll keep calling it that to hopefully reduce confusion) which tells you when stuff changed, and a scheduler of some sort which actually calls back into the outside world, as two separate modules
<whitequark[cis]>
both Print and Assert are IO but they're embedded within RTL, creating this confusing arrangement that was the impetus for trying to add a check phase
<whitequark[cis]>
(imagine that m is translated into one piece of code and BUF is translated into another piece of code; doesn't matter what the simulator details are)
<whitequark[cis]>
it's actually not enough to have a check phase within the step function for m, because to correctly process this, you need to consider that there may be comb feedback through the outside of m. in this case, the "reactor" would (in practice) run delta cycles for both m and BUF to fixpoint, and only after that the assertion would be checked
<whitequark[cis]>
i.e.: you can't do void step() { do eval(); while(commit()); check(); } in m and BUF separately
<whitequark[cis]>
you have to do uh... `bool step() { bool changed = false; do eval(); while(changed ||= commit()); return changed; }` and `void fixpoint() { do ; while(m.step() || BUF.step()); m.check(); BUF.check(); }`
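The point can be sketched in Python: with comb feedback through the outside of a module, each module's step() must report whether anything changed, and the check phase can only run after the *global* fixpoint (module internals here are toy stand-ins, not CXXRTL):

```python
# Toy modules whose outputs are a pure function of their inputs.
class ToyModule:
    def __init__(self, fn):
        self.fn = fn                  # computes outputs from inputs
        self.inputs, self.outputs = {}, {}
        self.checked = False

    def step(self):
        new = self.fn(self.inputs)
        changed = new != self.outputs
        self.outputs = new
        return changed

    def check(self):
        self.checked = True           # comb asserts would run here

def fixpoint(modules, wire):
    changed = True
    while changed:
        wire(modules)                 # propagate outputs to inputs
        # materialize the list first: every module must step, no short-circuit
        changed = any([m.step() for m in modules])
    for m in modules:
        m.check()                     # only after the global fixpoint

def wire(mods):
    a, b = mods
    a.inputs = {"x": 3}                       # external input
    b.inputs = {"x": a.outputs.get("o", 0)}   # comb path a -> b

a = ToyModule(lambda i: {"o": i.get("x", 0)})
b = ToyModule(lambda i: {"o": i.get("x", 0)})
fixpoint([a, b], wire)
assert b.outputs == {"o": 3} and a.checked and b.checked
```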
<whitequark[cis]>
anyway, I think that instead of a check phase, CXXRTL should have a scheduler, and schedule impure things like comb assert and print together with IO and testbenches and away from the deterministic core. (actually I suspect that clocking should work in kind of some similar way, but I'm too tired right now to think about it)
Guest68 has joined #amaranth-lang
Guest68 has quit [Client Quit]
<whitequark[cis]>
all of the above applies to the Amaranth simulator of course
<whitequark[cis]>
so the way I imagine things working is: the core of the simulator is just a passive chunk of state. outside of it, there is a scheduler that runs testbenches in whichever way and order seem necessary. testbench doing sim.get() just reads a bit of the state; testbench doing sim.set() writes a bit of the state, potentially kicking off side effects within and as such affecting other bits of the state, which in turn potentially alters readiness of other testbenches (but does not directly run them). testbenches include ones added with add_testbench and also combinatorial Print/Assert
<whitequark[cis]>
like... for all i know, testbenches could be scheduled by asyncio itself? that seems silly to do in most cases, but conceptually it would work
<whitequark[cis]>
zyp: I think `sim.get()` has no need to be awaitable under any circumstances, and `sim.set()` shouldn't be a preemption point for either `add_process` (no need to, as everything's double buffered) or `add_testbench` (no need to, as preemption is undesirable in this context), so doesn't need to be awaitable either
<zyp[m]>
so what happens when a testbench sets one signal and reads another that has a combinatorial path between them? when does it get updated if there's no await statements involved during or between those operations?
<whitequark[cis]>
sim.set() runs eval/commit until converged
<zyp[m]>
and if you've got a combinatorial add_process?
<whitequark[cis]>
sim.set() does two different things in add_process() (where it just sets next) and add_testbench() (where it runs delta cycles)
<whitequark[cis]>
if you're in the outer loop handling testbenches, sim.set() runs the inner loop; if you're in the inner loop, it schedules a pending change for the next loop iteration
<zyp[m]>
hmm, maybe, I'm too tired to picture how everything fits together (why am I even awake)
<whitequark[cis]>
it's actually quite easy to implement
<Wanda[cis]>
yeah that's how I pictured all of it, too
<Wanda[cis]>
neither get nor set needs to actually be a coroutine
<Wanda[cis]>
(or awaitable)
<whitequark[cis]>
this design I described above actually makes sense
<whitequark[cis]>
zyp: ah, there is one consequence of changing things like this
<whitequark[cis]>
I think add_process maybe shouldn't have Delay anymore
<whitequark[cis]>
I can actually justify it without invoking the simulator implementation. add_process (going forward) is intended for behavioral replacement of RTL constructs, and RTL constructs can't do anything like Delay. so it should be made equal in power to it.
<zyp[m]>
it's useful if you want to simulate phase shifting/delay primitives
<whitequark[cis]>
yes, I know
<whitequark[cis]>
you can still do that using add_testbench
<whitequark[cis]>
zyp: Wanda: I think I discovered a serious issue with the simulator, namely, the `read_stream` is unimplementable with either the old or the new interface when using `add_testbench`
<whitequark[cis]>
the stream protocol requires doing a very specific thing for it to work correctly: values of signals valid and payload must be sampled at exactly the posedge of the associated clock
<whitequark[cis]>
in fact, i think that at all other times, the values of those signals can be completely undefined?
<whitequark[cis]>
anyway, on the screenshot, get_testbench is reading from the dac_stream. it awaited tick(), and got back control with a state where post-tick() propagation has already finished
<galibert[m]>
isn't that true of every synchronous signal? The only state that matters is at posedge?
<whitequark[cis]>
so it gets a weird intermediate value as a result, where the x coordinate is from the next point, and the rest of payload is from the previous one
<whitequark[cis]>
now, consider a slightly different arrangement (I swapped add_testbench calls) that "works":
<whitequark[cis]>
here, get_testbench is also sampling a weird and wrong intermediate value. however, because it runs after put_testbench (note: the gateware has a comb assignment of valid from the stream put_testbench writes to, to the stream get_testbench reads from), and nothing else runs in that cycle, this weird intermediate value happens by chance to match the value at the next posedge, and things "work"
<whitequark[cis]>
except that get_testbench is actually getting the values out almost one cycle earlier than it should be able to have them
<whitequark[cis]>
this is completely broken! we can't ship this
<whitequark[cis]>
zyp: I propose altering the domain trigger object in this way: `await tick(sample=[stream.ready, stream.valid])` returns the values of those signals at the moment of the tick, and not at a later point; the same for `await tick().until()`, the condition is evaluated at the moment of the tick
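The semantics proposed here can be sketched in plain Python with a toy two-phase scheduler: sampled values are captured once, exactly at the tick, before any testbench for that cycle runs, so every testbench sees the same consistent snapshot regardless of scheduling order. `ToySim` and the generator-based testbenches below are invented stand-ins for illustration, not the Amaranth simulator API.

```python
class ToySim:
    """Toy scheduler: at each tick, a snapshot of all signal values is
    taken before any testbench runs, modeling `await tick(sample=[...])`."""
    def __init__(self):
        self.values = {"valid": 0, "payload": 0}
        self.testbenches = []

    def add_testbench(self, gen):
        gen.send(None)                   # prime: run up to the first tick wait
        self.testbenches.append(gen)

    def tick(self):
        snapshot = dict(self.values)     # sample at the moment of the tick
        for tb in list(self.testbenches):
            try:
                tb.send(snapshot)        # every testbench gets the same snapshot
            except StopIteration:
                self.testbenches.remove(tb)

def put_testbench(sim, data):
    for word in data:
        sim.values["payload"] = word     # drive the next word after each tick
        sim.values["valid"] = 1
        yield
    sim.values["valid"] = 0              # deassert valid once data is exhausted
    yield

def get_testbench(sim, out):
    while True:
        snapshot = yield                 # "await tick(sample=[valid, payload])"
        if snapshot["valid"]:
            out.append(snapshot["payload"])
```

Swapping the two `add_testbench` calls produces the same output, which is the point: with at-tick sampling the ordering race described above disappears.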
<galibert[m]>
Easier than collating the writes
<whitequark[cis]>
actually, I'm going to suggest something even more radical. await tick(*sample, domain="sync", context=None)
notgull has joined #amaranth-lang
Guest26 has joined #amaranth-lang
<whitequark[cis]>
here's why. right now, sim.get() has two completely different behaviors depending on the context where it's called from. if it's in add_process it gives you the value exactly at the wait point (e.g. right at the moment when await tick() fired), even though values may have changed since then; if it's in add_testbench it gives you the current value, and there's no way to get the value at the wait point
<whitequark[cis]>
this sucks. it's extremely confusing, and it's why we need the weird @testbench_helper functionality
<whitequark[cis]>
what if: we banned using sim.get() in add_process processes?
<whitequark[cis]>
then the only ways to sample values become: await changed(*sample) and await tick(*sample), corresponding to a comb and a sync process
<whitequark[cis]>
and, because await tick() doesn't actually behave any different in add_process vs add_testbench, we no longer need to differentiate the context for the helper functions, because you can write them in a way that they just work universally
<whitequark[cis]>
(maybe await tick(domain="sync", *, context=None).sample(*signals) is a bit more palatable; same general idea. also it needs an __aiter__.)
<whitequark[cis]>
now the difference between add_process and add_testbench becomes: add_process is race-free but no sim.get() or sim.delay() (I think delay has to go from add_process after all); add_testbench can race (with other testbenches only) but you can do sim.get() or sim.delay(); it is recommended to do any kind of IO via add_testbench and only use add_process for behavioral models that are self-contained and don't have any shared non-signal state with anything else (because add_process is only race-free in regards to signals)
<galibert[m]>
Doesn’t that mean testbench is active and process passive?
<whitequark[cis]>
yeah I think so
<galibert[m]>
And can you do combinatorial in process? IIRC ack on wishbone tends to be comb?
<whitequark[cis]>
yes
<galibert[m]>
that sounds cool
Guest26 has quit [Ping timeout: 250 seconds]
<whitequark[cis]>
ok, so I prototyped await tick().sample() (with the old interface and a bunch of really gross, unupstreamable hacks)
<whitequark[cis]>
take a look at this
<whitequark[cis]>
here, get_testbench is waiting on a Tick() until the async trigger
<whitequark[cis]>
this approach has actually resolved the race condition that existed in this design; any ordering of get_testbench and put_testbench produces more or less the same behavior (they differ slightly at the very beginning)
<galibert[m]>
What would be a sane way to replace an Instance of a hard block with something that simulates it when you're doing simulation? I'm thinking replacing my instances of m10k in true dual port mode by a standard Memory for sim
<tpw_rules>
but it looks like the only thing that uses it is SignalKey which only checks for it on a Signal
<tpw_rules>
does removing it need to go through deprecation or can it just be yanked out as part of that?
<whitequark[cis]>
it's not public
<tpw_rules>
how do the $nnn suffixes get added when two names are the same? is that amaranth or yosys?
<whitequark[cis]>
RTLIL backend I think
<tpw_rules>
ok i'll yank it out
<whitequark[cis]>
careful with SignalKey though, if you replace it with e.g. id() it may introduce nondeterminism
<tpw_rules>
yes. i need equivalent functionality for the signal names so i was thinking of just migrating it into Signal for now
<whitequark[cis]>
why do you need that for signal names?
<tpw_rules>
it seemed an easy way to name private signals
<whitequark[cis]>
I don't think that's a good way to do it
<tpw_rules>
would they only be named by a backend and keep a name of "" in .name?
<whitequark[cis]>
the RTLIL backend should just have a counter to emit $nnn
<whitequark[cis]>
like it already emits suffixes
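The counter approach suggested here can be sketched as a small name-minting closure: anonymous signals get a fresh `$nnn` name outright, and colliding names get a `$nnn` suffix. This is a toy stand-in for logic that would live in the RTLIL backend, not the actual implementation.

```python
def make_namer():
    """Mint unique wire names: "" -> fresh $nnn, duplicates -> name$nnn."""
    used = set()
    counter = 0
    def name_wire(name=""):
        nonlocal counter
        if name == "":                   # anonymous/private signal: fresh $nnn
            counter += 1
            return f"${counter}"
        if name not in used:             # first use of a name: keep it as-is
            used.add(name)
            return name
        counter += 1                     # collision: disambiguate with a suffix
        unique = f"{name}${counter}"
        used.add(unique)
        return unique
    return name_wire
```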
<whitequark[cis]>
tpw_rules: yeah
<tpw_rules>
okay, sounds good
<whitequark[cis]>
skipped for VCD and stuff
<tpw_rules>
so they wouldn't be written to a vcd either? i thought you just wanted a flag which disabled showing them
<whitequark[cis]>
um
<whitequark[cis]>
how would that even work?
<tpw_rules>
i hadn't gotten to that part yet
<whitequark[cis]>
VCD has no such functionality
<whitequark[cis]>
and why write stuff you'll never show, anyway
<tpw_rules>
are there other waveform viewers that need to check?
<whitequark[cis]>
naw
<tpw_rules>
should they repr as just "(sig )"? or "(sig <private>)"?
<whitequark[cis]>
`(sig)` seems fine
<tpw_rules>
ok, the current formatting would have a space after "sig" but i can remove that
<Wanda[cis]>
yeah the DUID stuff needs to go, but removing it is tricky, and I don't think it belongs to the same patchset
<Wanda[cis]>
what you need to do is essentially just allowing empty name, then filtering such signals in VCD writer and in the naming pass in hdl._ir.Design
<Wanda[cis]>
the RTLIL backend needs no changes at all, I think
<Wanda[cis]>
it's already made to handle anonymous nets
<Wanda[cis]>
(the ones that happen all the time when you just pass the result of an operator directly to another one)
<Wanda[cis]>
actually you don't even need to allow empty name, it's already allowed because we don't bother to check the name other than "it's a string"
<tpw_rules>
currently it has the same meaning as None there to Signal
<Wanda[cis]>
oh
<tpw_rules>
so it will get the variable name
<Wanda[cis]>
we check by boolean?
<Wanda[cis]>
yeaah that needs a fix
<tpw_rules>
yes, i did already
<tpw_rules>
what do you mean by filter in the naming pass? i currently just taught the rtlil pass to treat a name of "" as None and then give it a name there
<Wanda[cis]>
have you looked at the Design class?
<Wanda[cis]>
and the naming pass in it?
<tpw_rules>
yes, it looks like i can modify _add_name
<Wanda[cis]>
no
<Wanda[cis]>
_add_name should not be called for these signals at all
<Wanda[cis]>
because they don't have names
<Wanda[cis]>
you need a very simple change
<tpw_rules>
ok, let me digest the logic a little further
<Wanda[cis]>
_assign_names, the loop for signal in frag_info.used_signals:
<Wanda[cis]>
just... continue here if the signal's name is ""
<tpw_rules>
yes, i was just looking there. just skip signals without names there?
<Wanda[cis]>
yup
<Wanda[cis]>
and you don't need any change in the RTLIL backend
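The change agreed on here amounts to an early `continue` in the naming loop, roughly as below. This is a paraphrase of the idea, not the real `_assign_names` body; `Sig` is a stand-in for Signal and the dict stands in for the real `_add_name` bookkeeping.

```python
from types import SimpleNamespace as Sig  # stand-in for hdl.Signal

def assign_names(used_signals):
    """Sketch of the naming pass: signals named "" are anonymous and are
    skipped entirely, so they never reach _add_name."""
    names = {}
    for signal in used_signals:
        if signal.name == "":    # anonymous signal: none of the pass's business
            continue
        names[signal.name] = signal
    return names
```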
<tpw_rules>
okay. how do anonymous nets end up getting processed through here?
<Wanda[cis]>
through where?
<Wanda[cis]>
it's a naming pass, if a net is anonymous, it's none of its business
<tpw_rules>
the Design class. i guess nets end up being Values or something, not Signals?
<Wanda[cis]>
no
<Wanda[cis]>
that happens later, in NIR converter
<Wanda[cis]>
and _nir.Net / _nir.Value are anonymous by design
<Wanda[cis]>
names are external annotations
<tpw_rules>
so frag_info.used_signals is exactly the list of Signal objects the user has used in their design
<Wanda[cis]>
and for anonymous signals we simply don't emit them
<Wanda[cis]>
tpw_rules: in the particular fragment even, not in the design
<tpw_rules>
okay. i guessed that signals might be things which are not Signals, but that's where net comes in. that change makes sense then, thank you
<Wanda[cis]>
so how it works is
<Wanda[cis]>
the NIR nets / values are completely nameless
<Wanda[cis]>
and the names are outside
<Wanda[cis]>
the pre-NIR `_assign_names` pass creates a mapping of (fragment, signal) -> name
<Wanda[cis]>
then while constructing the NIR we also construct a dict of signal -> NIR value
<Wanda[cis]>
and we copy the earlier mapping from _assign_names to NIR
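The plumbing just described can be rendered as toy data structures: `_assign_names` yields a (fragment, signal) -> name mapping, NIR construction yields a signal -> value mapping, and composing the two attaches names to NIR values, with anonymous signals simply not emitted. All names and structures here are illustrative, not the real data model.

```python
def copy_names_to_nir(signal_names, signal_values):
    """Compose (fragment, signal) -> name with signal -> NIR value,
    dropping anonymous (empty-named) signals along the way."""
    return {signal_values[sig]: name
            for (frag, sig), name in signal_names.items()
            if name != ""}       # anonymous signals are simply not emitted
```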
<tpw_rules>
is it possible for a private Signal to end up as a port?
<Wanda[cis]>
then comes the RTLIL backend. the RTLIL backend is mainly concerned about emitting actual NIR, not the names, so it'll construct an anonymous (ie. auto-named) wire for every cell output
<Wanda[cis]>
but it will then also emit a wire for every signal present in the name dictionary for a given fragment, and assign the raw anon value to it
<Wanda[cis]>
with a further optimization that if a wire needed for NIR cell output exactly matches a signal, we'll use the signal wire instead of emitting an anonymous one
<Wanda[cis]>
tpw_rules: yes, and in this case you get an auto-named port
<tpw_rules>
so used_io_ports exclusively contains IOPort
<Wanda[cis]>
... as the name suggests, yes
<tpw_rules>
oh i see, ports already are all auto-named anyway, it's a later pass that puts the names back on
<Wanda[cis]>
Verilog-level port creation is done in separate passes at the end of _ir.py
<Wanda[cis]>
_compute_ports specifically
<Wanda[cis]>
for the ports that are values (as opposed to IO values)
<Wanda[cis]>
they're not all auto-named, we reuse a signal name if one happens to match
<tpw_rules>
more directly, why don't we have to check for a signal with a private name in the "Reserve names for top-level ports" loop
<Wanda[cis]>
oh hm.
<Wanda[cis]>
actually that's a good idea
<Wanda[cis]>
oh wait, no
<Wanda[cis]>
we don't
<Wanda[cis]>
at this point, top-level port names are already decided
<tpw_rules>
it looks like a port name can only come from the ports= argument of prepare? which only come from a Platform.build call or similar?
<Wanda[cis]>
not quite
<Wanda[cis]>
port names passed to ports= can be None, which means "name them from the signal"
<Wanda[cis]>
see _assign_port_names
<Wanda[cis]>
there we should have the check for anon signal
antoinevg[m] has quit [Quit: Idle timeout reached: 172800s]
<tpw_rules>
yeah it is possible to break the RTLIL builder if you do something weird like `ports={"": (rx.err, PortDirection.Output)}`. does that get filed under "you get what you deserve"?
<tpw_rules>
(passing that through the CLI to Platform.build)
<Wanda[cis]>
it kinda does, but also we do want a better diagnostic for it
<Wanda[cis]>
_assign_port_names would be a good place to make sure all port names make sense
<tpw_rules>
uh and actually fixing that for loop does not fix that problem
<Wanda[cis]>
also: we need to check that all our names are actually valid RTLIL identifiers
<Wanda[cis]>
(ie. don't contain whitespace)
<Wanda[cis]>
which should probably be in Signal / IOPort / ... constructor
<tpw_rules>
i think that's a later patch, i also have the issue for accidentally doing that to module names
<tpw_rules>
Wanda[cis]: i forget what you said your status was on https://github.com/amaranth-lang/amaranth/issues/1100 but it was a problem for the quartus fix. is there a way i could do this or collaborate with you on it? i am not really familiar with RTLIL at all
<Wanda[cis]>
the code is done, it now needs tests
<Wanda[cis]>
lots of tests
<Wanda[cis]>
my intent is to work on them as soon as I'm done with reviewing Cat's PRs
<Wanda[cis]>
the core concurrency model is sound and I agree with it
<Wanda[cis]>
I find .sample() useful too, and I consider it interesting how we're reinventing systemverilog program (aka our testbench) and now clocking input variables
<Wanda[cis]>
however, I'm very much not sold on two things
<Wanda[cis]>
add_process functionality reduction and the proposed interaction of until with sample
<Wanda[cis]>
I believe we explicitly wanted delay in processes, so that we can implement delay lines
<Wanda[cis]>
it may be the case that we want to provide the functionality in a different way (.set with a delay?), but removing it entirely goes directly against the discussion on RFC 36
<Wanda[cis]>
as for removing sim.get: it makes things weirdly asymmetric, and prevents you from implementing some FF types easily
<Wanda[cis]>
you also say that sim.set can be used to copy the value of a signal without looking at it, but that's not what the RFC says
<Wanda[cis]>
the RFC says that set argument needs to be int, or ShapeCastable.const-convertible
<Wanda[cis]>
(this also means the DDR register example in the RFC has been invalid all along, which, oops)
<Wanda[cis]>
further: .sample cannot be combined with .edge, which is a major problem given sim.get removal in process — how would you amend the DDR buffer example to add clock enable or sync reset, for example?
<Wanda[cis]>
also: we now have no way at all to read memory in processes
<Wanda[cis]>
(we also have no way at all to watch a memory for changes, which I consider a separate defect; I may consider amending RFC 62 to add such a thing)
<Wanda[cis]>
overall, I believe our processes should be powerful enough to, in principle, implement all of PyRTL compiler with them
<Wanda[cis]>
and the VCD writer as well
<Wanda[cis]>
as for until, it now has very weird semantics
<Wanda[cis]>
essentially the first time it performs the check, it does sim.get, then it goes through sampling
<tpw_rules>
hm, does RTLIL support unicode identifiers? it says no ascii below 32 but what about above 127
<Wanda[cis]>
this means that, in the absence of other testbench modifications of the condition, the first and second checks are actually the same
<Wanda[cis]>
tpw_rules: it does (by not disallowing them). however, whatever toolchain ends up consuming the generated Verilog may disagree.
<tpw_rules>
Wanda[cis]: ok, i'm trying to come up with identifier rules. so chars 32-126 are okay, must not start with a \ or $ ?
<Wanda[cis]>
starting with \ or $ is perfectly fine
<Wanda[cis]>
chars 33-126 (not 32-126) are okay
<Wanda[cis]>
as for 128+ ... I think they should be allowed too
<tpw_rules>
oh, yes, 32 is space
<Wanda[cis]>
like, it's 2024, international identifiers are a thing
<tpw_rules>
i'd be okay with that, yeah
<Wanda[cis]>
and if it causes problems for some shit toolchain down the line, I believe the correct solution is adding a flag that sanitizes the verilog output, perhaps implemented in yosys
<Wanda[cis]>
and set it from the platform
<tpw_rules>
okay. it's also still strictly better than currently
<galibert[m]>
Yeah, cursed identifier names for cursed HDL
<Wanda[cis]>
I'd go for something like "is printable and not whitespace"
<Wanda[cis]>
instead of looking at char codes
<Wanda[cis]>
it works out to 33-126 for ascii range
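Python's Unicode database makes the suggested rule ("is printable and not whitespace") a one-liner, which for the ASCII range does work out to codes 33-126 while still allowing non-ASCII identifiers. The function name is made up for illustration; this is the proposed rule from the discussion, not current Amaranth behavior.

```python
def is_valid_identifier(name):
    """Accept nonempty names whose every character is printable and not
    whitespace; space (32) fails isspace(), controls fail isprintable()."""
    return name != "" and all(ch.isprintable() and not ch.isspace()
                              for ch in name)
```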
<galibert[m]>
testing printability in unicode is annoying
<Wanda[cis]>
we have Python.
<tpw_rules>
it does come with the unicode database, and i think relies on unicode rules for internal names already
<tpw_rules>
i'll fiddle with that and file a PR
<tpw_rules>
not sure if that's RFC-worthy, my primary goal is fixing explosions using ""
<Wanda[cis]>
it's not RFC-worthy, just do a PR
<tpw_rules>
ok
<Wanda[cis]>
Catherine: (continuing RFC 64 review) the `stream_recv` example is just plain broken if `valid` is already `1`, since `until.sample` will return `None` as data in that case
<Wanda[cis]>
further, the example as given may tick twice with ready set to 1, eating a value
<Wanda[cis]>
likewise for stream_send, the value can get sent twice
<Wanda[cis]>
I think these examples just expose the fundamental brokenness of until-sample interaction, I'm not quite sure what it should be, though
<Wanda[cis]>
also, interaction of sample with repeat is... slightly weird
<Wanda[cis]>
return a list of tuples of sampled signals, one tuple per tick? meh.
<Wanda[cis]>
idc really
<Wanda[cis]>
I also see some potential use for a symmetric feature to sample: await sim.tick(...).update(sig, new_value), which corresponds to SystemVerilog clocking output variables
<Wanda[cis]>
doing this would perform the updates right after the tick, ensuring all other testbenches see the new values
<Wanda[cis]>
but this is... less well-motivated than sample
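The proposed symmetric feature could work roughly like this toy sketch: updates registered via `.update()` are queued and applied right after the tick, before any testbench resumes, so all testbenches observe the new values. `UpdateSim` and its method names are invented for illustration; this is not the Amaranth simulator API.

```python
class UpdateSim:
    """Toy model of the proposed `await sim.tick(...).update(sig, value)`:
    queued updates take effect immediately after the tick."""
    def __init__(self):
        self.values = {}     # signal name -> current value
        self.pending = []    # (signal, value) pairs queued via update()

    def update(self, sig, value):
        self.pending.append((sig, value))

    def tick(self):
        # apply queued updates right after the tick, before testbenches run
        for sig, value in self.pending:
            self.values[sig] = value
        self.pending.clear()
        # ...testbenches for this cycle would be resumed here...
```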
<galibert[m]>
why would sample return None?
<Wanda[cis]>
galibert: have you read the proposed amendment?
<galibert[m]>
No, only the discussion here
<Wanda[cis]>
then I'd recommend doing so before engaging in discussion.
<galibert[m]>
Didn't realize there was an amendment
<Wanda[cis]>
... then people should get used to names that don't contain spaces
<Wanda[cis]>
the accepted character set is already ridiculously rich
<galibert[m]>
Oh, that "None" if the simulation has not advanced. Why would the simulation not advance in the first place?
<tpw_rules>
okay
<Wanda[cis]>
like, neither VCD nor RTLIL can accept space-having names, so there's literally nothing we can do with them
<Wanda[cis]>
they're also not valid in Verilog names
<galibert[m]>
sim.tick().until(...).sample(...) is supposed to wait at least until the next tick, no?
<Wanda[cis]>
.... okay I think VHDL names can contain spaces, so there's actually prior art for that
<Wanda[cis]>
but VHDL names are also several kinds of batshit insane, so I refuse to continue that line of thought any further
<galibert[m]>
There's a lot of prior art for converting spaces to underscores when needed too
<Wanda[cis]>
galibert[m]: can you *please* read the RFC
<galibert[m]>
I did now
<galibert[m]>
until returns immediately if the condition is ok, but until is supposed to be tested after tick elapses
<Wanda[cis]>
that's not what the RFC says
<Wanda[cis]>
> If condition is initially true, await will return immediately without advancing simulation.
<Wanda[cis]>
that's also not what the example implementation says
<galibert[m]>
I don't understand the example implementation. But a version working like you say makes no sense, which may be why I didn't interpret it that way
<galibert[m]>
since a simple "do a thing when that variable is on at the tick" is just not implementable
<Wanda[cis]>
that is exactly why I'm complaining about it!
<galibert[m]>
Ok, then I agree with you :-)
<galibert[m]>
I understood it as linear composition, where each level of trigger does its thing and passes to the next one, trying again if the next one fails (recursively)
<galibert[m]>
So tick() waits one tick, then calls until() which passes or fails; if it fails, tick() tries again after a tick; if it passes, sample() takes the values and always passes
<galibert[m]>
you could even add a changed in there
<galibert[m]>
you need to have the whole chain to pass, otherwise the first one advances time and tries again
<galibert[m]>
(or waits for the time to advance in the comb case)
<Wanda[cis]>
also whatever happens in tcl generally happens after Amaranth is done producing the netlist
<mcc111[m]>
but there's that Amaranth Build System thing right
<Wanda[cis]>
(plus we have support for remote builds, so in general the Python interpreter and tcl interpreter involved don't even run on the same machine)
<mcc111[m]>
that does sound desirable
<mcc111[m]>
today I installed the "Libero" toolchain on my ubuntu laptop
<mcc111[m]>
it was a miserable experience
<Darius>
you can use Tkinter to run Tcl inside Python, but yes you will need the right Tcl interpreter