<Chips4MakersakaS>
To be honest, I have never gotten to the point where I could grasp what yield in a testbench actually does; I did use cocotb on Verilog output. As a consequence I'm afraid this RFC is above my head.
<galibert[m]>
yield sucks, it means four different things depending on the context
<whitequark[cis]>
Chips4Makers (aka Staf Verhaegen): I see
<whitequark[cis]>
do you mean yield in general, or the specific case of bare yield?
<whitequark[cis]>
galibert: though so does `await`
<galibert[m]>
Yeah, it’s no better
<galibert[m]>
And the performance is impressively low
<whitequark[cis]>
I think it is better; at least now you have the same syntax for yielding control to the simulation or to the OS, and it eliminates one of the forms (there's no bare await)
<Chips4MakersakaS>
yield in the context of an Amaranth testbench, not general Python; I do make generators in my Python code.
<whitequark[cis]>
I'm not sure where the performance claims come from considering we don't have async testbenches
<whitequark[cis]>
Chips4Makers (aka Staf Verhaegen): are you familiar with coroutines?
<Chips4MakersakaS>
yes
<whitequark[cis]>
so `yield val` and `yield val.eq(other)` are essentially just a way to avoid global state by delegating an operation to "the caller of this coroutine", if this makes any sense
<whitequark[cis]>
the caller of the coroutine (the simulator) knows what it is, so when the coroutine yields a signal or an Assign statement, the simulator sends back the signal's value or updates its state accordingly
<whitequark[cis]>
without that we'd have to define some sort of implicit global for "the current simulator" which doesn't feel especially good
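A minimal sketch of what that delegation looks like with the current interface; the tiny `Counter` design exists only so the process has something to poke at, and all of its names are assumptions:
```python
from amaranth.hdl import Elaboratable, Module, Signal
from amaranth.sim import Simulator

# A tiny design, only here so the process below has something to interact with.
class Counter(Elaboratable):
    def __init__(self):
        self.en    = Signal()
        self.count = Signal(8)

    def elaborate(self, platform):
        m = Module()
        with m.If(self.en):
            m.d.sync += self.count.eq(self.count + 1)
        return m

dut = Counter()

def process():
    yield dut.en.eq(1)       # yields an Assign; the simulator applies it to its state
    yield                    # bare yield: wait for the next clock edge
    value = yield dut.count  # yields a read request; the simulator sends the value back in

sim = Simulator(dut)
sim.add_clock(1e-6)
sim.add_sync_process(process)
sim.run()
```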
<Chips4MakersakaS>
The main difficulty for me is how time and the clock advance with the yields. With cocotb you could explicitly wait for a rising edge or for a certain amount of time. But I have to admit that is also rusty ATM.
<whitequark[cis]>
right, so this is actually what this RFC is about
<whitequark[cis]>
you can wait for a rising edge with `yield Tick("domain")` (and a future RFC will allow waiting for a rising edge on an arbitrary signal)
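Roughly, a testbench spanning two clock domains could then look like this; the "sync" and "pix" domain names, `dut.out`, and the surrounding `sim` setup are assumptions:
```python
from amaranth.sim import Tick

# sim is assumed to be a Simulator with clocks added for both domains.
def testbench():
    yield Tick("sync")     # wait for the next rising edge of the "sync" clock
    yield Tick("pix")      # then wait for the next rising edge of the "pix" clock
    value = yield dut.out  # read a signal once both edges have happened

sim.add_testbench(testbench)
```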
<galibert[m]>
The performance claim comes from reimplementing the 6502 PLA and comparing its output with probes in perfect6502. Trying all 2**14 values was sub-second for perfect6502 and minute-range for the sim in Amaranth
<galibert[m]>
abysmal
<whitequark[cis]>
that sort of stuff is what cxxsim is for
<whitequark[cis]>
the Python simulator in Amaranth is actually something like 2x faster than Migen's while being more flexible
<galibert[m]>
Oh I'm not saying migen is in any way good
<whitequark[cis]>
the simulator is optimized for very low startup latency: you can start evaluating your netlist almost immediately, without any compilation step
<whitequark[cis]>
(and portability)
<whitequark[cis]>
anyway, discussion of performance isn't on topic since that's not what the RFC is about; it's about usability
<tpw_rules>
(which will benefit any future simulator too, correct?)
<whitequark[cis]>
yes, this is a general change in the simulator interface
<jfng[m]>
merge; from a purely ergonomic POV, i think this is worth the increase in API surface
<jfng[m]>
i currently end up using `yield; yield Settle()` a lot in synchronous testbenches, and having to think about it is a bit tedious
<whitequark[cis]>
it also prevents us from adding `yield from fifo.read()` or such
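Side by side, assuming a `dut.result` signal and the usual simulator setup, the two styles look roughly like this:
```python
from amaranth.sim import Settle, Tick

# dut.result is an assumption; old_style would be registered with
# sim.add_sync_process(), new_style with the proposed sim.add_testbench().

def old_style():
    yield                     # advance to the next clock edge
    yield Settle()            # let combinational logic settle before reading
    value = yield dut.result  # now sees the post-edge, settled value

def new_style():
    yield Tick()              # wait for the clock edge; settling is implicit
    value = yield dut.result  # reads already see the settled design state
```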
<Chips4MakersakaS>
So still learning. How would a testbench look for a multiple clock domain design?
zyp[m] has joined #amaranth-lang
<zyp[m]>
while we're updating the simulator interface, would it be worthwhile to explore whether using await in place of some of the yields would be more ergonomic?
<whitequark[cis]>
zyp: that would be the topic of another (planned) RFC
<whitequark[cis]>
I'm not entirely sure what the viable approach there is yet
<whitequark[cis]>
you have three options: async def, async def generator, and def generator
<whitequark[cis]>
all three can potentially be mixed
<whitequark[cis]>
it is unclear what the costs of allowing such mixing are, and what the costs of adding e.g. a separate AsyncSimulator or the like would be
<whitequark[cis]>
backwards compatibility is important, but preserving the existing level of performance is pretty valuable too, and it's not clear that adding compatibility shims won't make it unacceptable
<zyp[m]>
would the other RFC deprecate the new interface this RFC proposes, or would they coexist?
<whitequark[cis]>
add_testbench is here to stay, just like add_sync_process
<galibert[m]>
(or you mean in the specific case of add_testbench?)
<whitequark[cis]>
there are use cases you cannot practically achieve without either
<whitequark[cis]>
i.e. you cannot write testbenches (especially not while abstracting out functions) without add_testbench, and you cannot replace RTL with behavioral code without add_sync_process
<zyp[m]>
what's the value of add_sync_process if Settle is deprecated?
<whitequark[cis]>
(not without adding a lot of overhead and complexity that makes it impractical to emulate one with the other)
<whitequark[cis]>
add_sync_process lets you pretend to be a flop
<Chips4MakersakaS>
galibert: TY
<whitequark[cis]>
the ability to observe comb output values "just before" the clock edge is something you need to be able to easily replace synchronous logic with a Python function
<whitequark[cis]>
or in other words: two add_testbench waiting on the same clock edge will race (it is undefined in which order they are evaluated), two add_sync_process waiting on the same clock edge are OK (they are evaluated simultaneously and see the same values even if they use .eq in the body)
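A rough illustration of that flop-like behavior, assuming a `sim` set up as above and two made-up `ping`/`pong` signals:
```python
from amaranth.hdl import Signal

ping = Signal(8)
pong = Signal(8)

def stage_a():
    while True:
        value = yield ping    # samples ping as it was "just before" the edge
        yield pong.eq(value)
        yield                 # wait for the next clock edge

def stage_b():
    while True:
        value = yield pong    # also sees the pre-edge value, regardless of order
        yield ping.eq(value + 1)
        yield

sim.add_sync_process(stage_a)  # the two processes do not race with each other
sim.add_sync_process(stage_b)
```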
<zyp[m]>
and that use of add_sync_process doesn't need Settle?
<whitequark[cis]>
no? using Settle there completely breaks it, in fact
<whitequark[cis]>
(because now there will be a race)
<zyp[m]>
okay, I haven't used the amaranth simulator enough to have a clear picture of how everything fits together yet
<zyp[m]>
but I've tried to model DDR IO registers in the migen simulator, and that sounds like it'd be a lot easier in amaranth :)
<whitequark[cis]>
we are approaching the end of our hour long slot
<whitequark[cis]>
does anyone here have comments on the technical substance of the RFC?
<Chips4MakersakaS>
Not me.
<whitequark[cis]>
(aside from jfng who suggests merging)
<whitequark[cis]>
I'm wary of merging an RFC that nobody understands, but the issue is partly that nobody understands it because the current system is incredibly confusing and hard to teach
<crzwdjk>
Seems to make things easier for my use case of writing testbenches. So I would vote merge as well.
<galibert[m]>
I don't use add_sync_process because I rarely have only one clock domain, so abstain
<whitequark[cis]>
galibert: it applies exactly the same to you calling `yield Tick()`
<whitequark[cis]>
i.e. you won't need to use `yield Tick(); yield Settle()`
<galibert[m]>
(it's close to impossible to emulate an old processor without at least two phases)
<galibert[m]>
I... don't?
<whitequark[cis]>
then it sounds like your use case will be largely or entirely unaffected
<whitequark[cis]>
(thinking about how you would use a multi-phase setup, it probably isn't subject to the issues this RFC is attempting to solve)
<whitequark[cis]>
I think I'm going to end the meeting here; we have a weak consensus towards merge but I'm wary of merging something so few people actually understand
<whitequark[cis]>
I'm going to revisit it next time so it would be great if zyp and Chips4Makers (aka Staf Verhaegen) would be able to look into the current and proposed mechanics?
<whitequark[cis]>
Chips4Makers (aka Staf Verhaegen): happy to have a 1-1 with you to get into the gnarly details if that's something you'd have time for
<Chips4MakersakaS>
I will have to see.
<whitequark[cis]>
all right
<whitequark[cis]>
that's it from me for today then
<mcc111[m]>
So I've just learned that the Cyclone V has onboard memory ("BRAM" units?). Does Amaranth have the ability to allocate BRAM directly?
Wanda[cis] has joined #amaranth-lang
<Wanda[cis]>
via the Memory class, yes
<Wanda[cis]>
it's... in a bit of a rough shape unfortunately, but there are plans for fixing it
<Wanda[cis]>
generally for block RAMs you have the options of either instantiating the underlying vendor primitive manually (Instance in amaranth terms) and having fun connecting the myriad wires, or letting the HDL pick a memory primitive for you (which, in amaranth, means using Memory)
<Wanda[cis]>
and which one you choose basically depends on how "standard" your needs are
<Wanda[cis]>
(if you don't have experience with this, go for Memory and let's hope it's not broken in some funny way in the Intel flow)
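For reference, a minimal sketch of inferring a block RAM through Memory; the width, depth, and surrounding signal names are assumptions:
```python
from amaranth.hdl import Memory, Module, Signal

m = Module()

read_addr  = Signal(10)
write_addr = Signal(10)
write_data = Signal(16)
write_en   = Signal()

# Let the toolchain infer a block RAM with one synchronous read port
# and one write port.
mem = Memory(width=16, depth=1024)
m.submodules.rdport = rdport = mem.read_port()   # synchronous read in the "sync" domain
m.submodules.wrport = wrport = mem.write_port()

m.d.comb += [
    rdport.addr.eq(read_addr),
    wrport.addr.eq(write_addr),
    wrport.data.eq(write_data),
    wrport.en.eq(write_en),
]
# rdport.data carries the stored word one clock cycle after the address is presented.
```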
<mcc111[m]>
OK, interesting
<mcc111[m]>
At the moment I'm running on the Pocket which doesn't use the "real" intel platform, so I might have to allocate/instantiate it in Verilog and somehow plug it into the amaranth core from there
<Wanda[cis]>
the Intel platform code isn't involved here
<Wanda[cis]>
Memory gets translated to target-independent Verilog code
<Wanda[cis]>
which ... Quartus will hopefully be able to make sense of, but it's a touchy area
<Wanda[cis]>
(there are plans for actually having the vendor cell hooked up by amaranth platform code instead of letting Verilog synthesizer do this, but we're not doing that yet)
adamgreig[m] has joined #amaranth-lang
<adamgreig[m]>
ime Memory should work fine
<adamgreig[m]>
though.... not sure I've tried synthesising it since yosys's big memory processing updates
<Wanda[cis]>
in this case it's a question of what Quartus is doing, not yosys
<Wanda[cis]>
(well... a bit of both, since yosys is tasked with emitting the Verilog)
<tpw_rules>
fwiw i like to use quartus's IP designer and just make a memory block and use it through an Instance
<tpw_rules>
it's in some ways the worst of both worlds but it is guaranteed to work
<tpw_rules>
maybe i'm just overly cautious too. but that's how i do it for big important memories at least
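A hedged sketch of that approach on the Amaranth side; the "my_ram" module name and its port names follow common wizard defaults and are assumptions, so the generated Verilog stub is the authority:
```python
from amaranth.hdl import ClockSignal, Instance, Module, Signal

m = Module()

addr       = Signal(10)
write_data = Signal(16)
write_en   = Signal()
read_data  = Signal(16)

# Wire up a Quartus-generated single-port RAM IP as a black box.
m.submodules.ram = Instance(
    "my_ram",
    i_clock   = ClockSignal("sync"),
    i_address = addr,
    i_data    = write_data,
    i_wren    = write_en,
    o_q       = read_data,
)
```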
<Wanda[cis]>
you know
<Wanda[cis]>
it may just work
<Wanda[cis]>
but ... yeah, pushing memories through Verilog is extraordinarily fragile
<Wanda[cis]>
it's basically compiling down the description of your memory into imperative code, then having the synthesis tool pattern-match and decompile what you wrote back into something it understands
<tpw_rules>
quartus also nicely guides you through it especially if BRAM is an unfamiliar term
<adamgreig[m]>
i have way too many memories to click through a wizard and make an instance for all of them, though
<tpw_rules>
yeah totally fair. but if it's your first time even hearing the term that really isn't a bad way to do it
<adamgreig[m]>
for ecp5 dual port read+write and some other edge cases i've manually instantiated the ecp5 primitives, but mostly Memory has worked well
<adamgreig[m]>
yea for sure, even if you just use it to find out all the settings and things and then throw it out
<Wanda[cis]>
so whether it works depends on whether Quartus is happy with the Verilog code patterns that yosys emits
<tpw_rules>
fwiw too Quartus also seemed happy with whatever litex did
<Wanda[cis]>
litex is migen though
<Wanda[cis]>
doesn't go through yosys verilog emitter
<tpw_rules>
ah
<tpw_rules>
knew the first, not the second
<Wanda[cis]>
FWIW memory pattern recognition code (for Verilog into yosys direction) is some of the most cursed shit I wrote; at some points it basically guesses what you meant and asks a SAT solver to prove that its guess is correct
<Wanda[cis]>
and from what I know, some vendor tools don't bother with the second part of that