<cr1901>
Yes. With -dff, the design optimizes down to 15 SB_IOs and 1 GB_IO. Without it, around 1200 LUTs and 400 FFs (as expected)
<cr1901>
Have NFI how I'm gonna minimize it tho
<cr1901>
Hmmm, well, the design doesn't work with -dff disabled either, so maybe the optimization is correct. I have no idea why the design isn't working. That's tomorrow-me's problem
<cr1901>
All I can say rn is that I've yet to get a design working with Wishbone2CSR attached to a top-level wishbone decoder with sparse=True
<cr1901>
Okay, I figured it out... if you have sparse enabled (as wq implied, but I didn't think it applied to 8-bit ports), you have to write to the low byte at offset 0 first, and then the high byte at offset 0xc, if you want a byte to actually be transferred.
<cr1901>
(when sparse=True)
<cr1901>
Or something like that, I may not have explained the general method correctly. Point is, I have to write to 2 different addresses to get data to be latched into an I/O port on a write when sparse=True and using Wishbone2CSR
<galibert[m]>
That sounds weird
<galibert[m]>
Catherine: Want me to try to write an RFC on what we talked about w.r.t. the component constructor?
<whitequark[cis]>
I see only one outstanding issue (keeping in mind that we will need to have an RFC cycle for this, of course), which is that of get/set
<whitequark[cis]>
it had made more sense when you also had .changed() and I guess potentially others, but now .set is kind of overlapping with .eq
<whitequark[cis]>
do we want that overlap? I can actually see it both ways
<whitequark[cis]>
I guess the question to ask before that would be: is there anything we will run into trouble with if we replace ".set()" with "await .eq()"?
<whitequark[cis]>
I see two main downsides:
<whitequark[cis]>
1. making Value awaitable does work, but it means that ValueCastables need to be awaitable too, and this exposes people writing those to a probably unnecessarily high amount of async Python internals
<whitequark[cis]>
(also, how would you even write .eq()? we would need some way to make "a group of .eq" awaitable, I guess. we do have StatementList but ehh)
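For context, "being awaitable" in plain Python just means implementing the protocol sketched below; this is generic Python, not Amaranth API, and shows roughly what a ValueCastable author would be exposed to if `await sig.eq(...)` became the simulation interface.

```python
import asyncio


class FakeAssignment:
    """Stands in for a hypothetical awaitable that .eq() might return in a testbench."""

    def __await__(self):
        # The awaitable protocol: __await__ must return an iterator; delegating
        # to a coroutine object's own __await__ is the usual way to do that.
        return self._apply().__await__()

    async def _apply(self):
        # A real simulator would schedule the assignment here and suspend the
        # testbench until it takes effect.
        await asyncio.sleep(0)


async def testbench():
    await FakeAssignment()   # roughly what `await sig.eq(1)` would desugar to

asyncio.run(testbench())
```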
<zyp[m]>
View.set() does things View.eq() doesn't
<whitequark[cis]>
2. `.eq()` implies, semantically, that the value is equal at all times [in given conditions]. meanwhile, `.set()` assigns the value only once
<whitequark[cis]>
zyp[m]: can you elaborate?
<zyp[m]>
the way I've implemented it, you can do e.g. View.set({'a': 1, 'b': 2})
<whitequark[cis]>
oh, I see!
<whitequark[cis]>
hm, do we want to have the same functionality within .eq()?
<zyp[m]>
which means you can also pass in a dict to StreamInterface.send() when the stream payload shape is a struct
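A sketch of the PoC usage zyp is describing; `View.set()` accepting a dict and `StreamInterface.send()` forwarding one to a struct-shaped payload are part of that proof of concept, not a released Amaranth API, and the names here are assumed.

```python
async def producer(stream):
    # With a struct-shaped payload, field names map to values, so the whole
    # payload can be driven and the transfer completed in one call.
    await stream.send({'a': 1, 'b': 2})
```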
<whitequark[cis]>
I feel like the answer is "maybe no" but I'm also concerned about having two sub-languages, synthesizable and simulatable
<whitequark[cis]>
that is honestly probably fine, if you merge the two you get Verilog and we all know how that ends
<whitequark[cis]>
okay, I think I've convinced myself that .get/.set is the way to go
<whitequark[cis]>
if you want to champion this proposal, then on the basis of this PoC and the discussions previously, I would like to encourage you to write an RFC and an implementation (and, as you've suggested, subsume my testbench process RFC into it, since eliminating Settle is almost required for the async interface, or at least for it to be sanely usable)
<whitequark[cis]>
personally I think the proposed interface is basically good as-is, with maybe minor changes, and that the implementation is alright too, again with minor but maybe slightly more substantial changes
<zyp[m]>
yeah, I'm intending to write an RFC for this, but I think I'd rather depend on RFC #27 than subsume it entirely
<whitequark[cis]>
okay; then there is a dependency between my work and yours
<whitequark[cis]>
I think I'll get around to updating RFC #27 no earlier than the new year
<zyp[m]>
that is fine, I figure getting 0.4 out takes precedence in any case, so I'm not planning to hurry with the RFC
<whitequark[cis]>
I'm also happy to merge a PR to RFC #27 if you want to update it as it was requested
<whitequark[cis]>
so there are options
<whitequark[cis]>
yes, getting 0.4 out takes precedence, but I think that will probably be done by next Tuesday
<whitequark[cis]>
at least 0.4.0
<galibert[m]>
A week to a 0.4 prerelease? Cool news!
<whitequark[cis]>
I'm thinking about just not doing prereleases
<whitequark[cis]>
but having the .0 releases be something that is known to possibly have bugs
<whitequark[cis]>
this was the norm in software for so long and I think it still works fine
<zyp[m]>
the issues with #27 were just a matter of making the intent clearer, not actually changing anything in the proposal?
<whitequark[cis]>
Rust doesn't have prereleases, for example, they just cut the main branch whenever
<whitequark[cis]>
zyp: yes, in making it more convincing
<whitequark[cis]>
especially the bit where two testbenches using add_sync_process cannot race, and two testbenches using Settle() can
<whitequark[cis]>
and the bit where replacing sync logic with something that uses Settle() is unsound
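As a rough illustration of the Settle() hazard being discussed (made-up signals, Amaranth 0.4-era simulator API): two plain processes wake at the same point in simulated time, and after `yield Settle()` their relative order is not defined, so the reader may or may not observe the writer's update.

```python
from amaranth import Module, Signal
from amaranth.sim import Simulator, Settle

m = Module()
a = Signal(8)
b = Signal(8)
m.d.comb += b.eq(a + 1)

def writer():
    yield a.eq(3)
    yield Settle()

def reader():
    yield Settle()
    # May print 1 or 4 depending on which process the scheduler runs first.
    print("reader saw b =", (yield b))

sim = Simulator(m)
sim.add_process(writer)
sim.add_process(reader)
sim.run()
```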
<zyp[m]>
ok, I'll take a look
<whitequark[cis]>
thanks!
<adamgreig[m]>
rust does have a pretty long master -> beta -> stable train, though... but I'm not sure how much testing/use the betas really get in practice compared to either nightly or stable
<adamgreig[m]>
no complaint about getting 0.4 out quickly and without a prerelease though!
<whitequark[cis]>
I think betas are very underused
<adamgreig[m]>
I think the main purpose is to give some time for bugs to be found in nightly, realise they affect the beta, backport a fix before it becomes stable
<adamgreig[m]>
or like, the main actual practical utility, if not the stated purpose
<galibert[m]>
Prerelease and .0 are mostly the same thing in practice
<cr1901>
Notice the write to 0x2000000 (sw s1, 0(s0)); it shows the LED mux's w_stb being asserted and the w_data changing, but the LEDs aren't updated
<cr1901>
it isn't until I do a second write at 0x200000c (sw s1, 12(s0)) that the LEDs get a signal to be updated
<cr1901>
the idea is that the CSR bus has built-in atomic updates for writes wider than 8 bits (simplified), but I don't need these for 8-bit I/O registers.
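For reference, a minimal sketch (not cr1901's actual peripheral) of the write side of such an 8-bit register, using amaranth-soc's csr.Element as it existed around this time: the register latches w_data whenever the multiplexer pulses w_stb, so with an 8-bit element on an 8-bit CSR bus a single bus write should be enough.

```python
from amaranth import Elaboratable, Module, Signal
from amaranth_soc import csr


class LedPeripheral(Elaboratable):
    def __init__(self):
        # One write-only 8-bit element; it would be added to a csr.Multiplexer.
        self.element = csr.Element(8, "w")
        self.leds    = Signal(8)

    def elaborate(self, platform):
        m = Module()
        # Latch the written byte the cycle the multiplexer strobes the element.
        with m.If(self.element.w_stb):
            m.d.sync += self.leds.eq(self.element.w_data)
        return m
```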
<cr1901>
(actually it may have atomic reads as well, I forget)
<whitequark[cis]>
both reads and writes
<cr1901>
How do you let the peripheral know "I'm done with the read, stop holding the latched value"?
<whitequark[cis]>
you don't
<whitequark[cis]>
it latches the value when you start reading (the inverse of what's done for writes)
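And the read side of the same idea, again only a sketch: the peripheral just presents r_data, the CSR multiplexer samples it when the read transaction starts, and r_stb merely reports that this happened; there is no "done with the read" strobe to wait for.

```python
from amaranth import Elaboratable, Module, Signal
from amaranth_soc import csr


class CounterPeripheral(Elaboratable):
    def __init__(self):
        self.element = csr.Element(8, "r")   # read-only 8-bit register
        self.count   = Signal(8)

    def elaborate(self, platform):
        m = Module()
        m.d.sync += self.count.eq(self.count + 1)
        # Continuously present the value; the multiplexer captures it at the
        # start of a read, so the peripheral never has to hold it itself.
        m.d.comb += self.element.r_data.eq(self.count)
        return m
```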
<cr1901>
Okay, so I think what I have to do is "reads are at offset 0, writes are at offset 0xc" (my specific use case) if I want both reads/writes to be done in a single cycle
<cr1901>
(sparse does give good savings, FWIW. It just makes... addresses very fucky :P)
<whitequark[cis]>
(side note: it's weird to call the decoder "periph_bus". it's not a bus! it's an interconnect component. it has a .bus property iirc)
<whitequark[cis]>
anyhow, I am very confused
<whitequark[cis]>
you have data_width the same for the CSRs and the Wishbone bus
<cr1901>
granularity=8, alignment=25) is in the __init__() for the first link
<whitequark[cis]>
oh
<cr1901>
self.decoder attaches to CPU
<cr1901>
memory attaches to self.decoder
<whitequark[cis]>
ok yeah that makes sense
<cr1901>
peripherals attach to periph
<cr1901>
and periph attaches to decoder with sparse=True
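Roughly, the topology being described looks like the following sketch (amaranth-soc names from around this time; the widths, addresses, and the CPU/memory attachments are placeholders, not cr1901's real values).

```python
from amaranth_soc import csr, wishbone
from amaranth_soc.csr.wishbone import WishboneCSRBridge

# CPU-facing Wishbone decoder ("self.decoder" above); memory would also be
# attached here, which is omitted from this sketch.
decoder = wishbone.Decoder(addr_width=30, data_width=32, granularity=8)

# Peripheral window ("periph" above): an 8-bit CSR decoder, bridged back up
# to Wishbone and attached to the main decoder with sparse=True.
periph = csr.Decoder(addr_width=8, data_width=8)
bridge = WishboneCSRBridge(periph.bus)

decoder.add(bridge.wb_bus, addr=0x0200_0000, sparse=True)
```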
<cr1901>
What I have _found_ is that if I want to write to the LED peripheral _and have the LEDs actually reflect the output of the write_, I have to write to 2 I/O locations
<cr1901>
once to set the "holding register that implements atomic writes"
<cr1901>
another to "xfer the holding register that implements atomic write" to the actual LEDs
<whitequark[cis]>
that's a bug
<whitequark[cis]>
if you have 8-bit CSR elements on 8-bit-wide CSR bus, it should take exactly 1 write
<whitequark[cis]>
and there are no holding registers in this case
<whitequark[cis]>
I would encourage you to report an MCVE to jfng
<whitequark[cis]>
this is a slightly confusing situation, because there are 2 levels of width adaptation
<cr1901>
Will minimize when I get the chance
<cr1901>
yes, I know, I do alignment=2 in the peripherals
<cr1901>
this ends up being the way to save space compared to my wishbone-only version of the SoC
<whitequark[cis]>
- there is the atomic read/write adapter inside CSR decoder that splits big elements into smaller ones. if you have element.width == csr_bus.data_width it should do nothing
<whitequark[cis]>
- then there is the FSM in the CSR-Wishbone bridge that splits big Wishbone transactions into smaller CSR transactions. if you have sparse==True it does not exist
<whitequark[cis]>
if you have a 32-bit element on an 8-bit-wide CSR bus connected to a 32-bit Wishbone bus, a write would take 4 cycles because it has to go through this bottleneck
<whitequark[cis]>
which is intended and desired
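A sketch of that bottleneck case, with illustrative names and widths (amaranth-soc API from around this time): a single 32-bit element on an 8-bit CSR bus, bridged to a 32-bit Wishbone master, so the bridge FSM has to split every 32-bit Wishbone access into four 8-bit CSR accesses.

```python
from amaranth_soc import csr
from amaranth_soc.csr.wishbone import WishboneCSRBridge

# 8-bit CSR bus carrying one 32-bit element (occupies 4 CSR addresses).
mux = csr.Multiplexer(addr_width=4, data_width=8)
reg = csr.Element(32, "rw")
mux.add(reg, addr=0x0)

# 32-bit Wishbone side; each 32-bit access becomes four 8-bit CSR accesses.
bridge = WishboneCSRBridge(mux.bus, data_width=32)
```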
<whitequark[cis]>
with alignment==2, hmm
<cr1901>
I didn't exactly think hard about this lol :P
<cr1901>
> which is intended and desired <-- yes and I agree w/ this. One point of having a smaller CSR bus is to save resources if you desire e.g. 8-bit regs
<whitequark[cis]>
you should definitely have, because it explains why you are observing the behavior you do
<whitequark[cis]>
alignment==2 is designed to have predictable write latency when you are accessing both 8-bit and 16-bit registers on a 8-bit CSR bus from a 16-bit Wishbone bus
<whitequark[cis]>
remember how there are 2 interacting processes?
<whitequark[cis]>
basically, we want writes to complete in the same amount of bus cycles regardless of how big the CSR element is
<whitequark[cis]>
because doing otherwise is a potentially expensive footgun esp in ASICs
<whitequark[cis]>
I think you are misusing alignment, and the results you're getting follow from that misuse
<whitequark[cis]>
what happens if you do alignment=1, the default? the addresses will go back to normal of course (leds_reg at 0, inout_reg at 4, oe_reg at 8)
<cr1901>
I can get some savings by increasing that (without affecting alignment)
<cr1901>
(I ran "pdm demo", btw)
<whitequark[cis]>
interesting
<cr1901>
(Gimme a sec, the "git diff" I made was kinda a mess)
<cr1901>
Okay, I'm gonna be honest, I don't quite remember how I got to this diff, but this one from the reduce-size branch was where I realized "holy shit, I can get GREAT savings by abusing sparse and alignment": 1243 ICESTORM_LCs
<cr1901>
Basically, what I probably did is the compiler equivalent of "passing a bunch of optimization flags together without any regard to how they interact", in an effort to make Number Go Down