#amaranth-lang on 2024-03-09 — irc logs at libera.irclog.whitequark.org

2024-02-21 07:31 whitequark[cis] changed the topic of #amaranth-lang to: Amaranth hardware definition language · weekly meetings: Amaranth each Mon 1700 UTC, Amaranth SoC each Fri 1700 UTC · play https://amaranth-lang.org/play/ · code https://github.com/amaranth-lang · logs https://libera.irclog.whitequark.org/amaranth-lang · Matrix #amaranth-lang:matrix.org

00:24 <tpw_rules> jfng[m]: random Q, i noticed you used `del` in the GPIO code and a few other places?

01:01 lf has quit [Ping timeout: 240 seconds]

01:02 lf has joined #amaranth-lang

02:09 Stary_ is now known as Stary

02:31 Degi has quit [Ping timeout: 255 seconds]

02:33 Degi has joined #amaranth-lang

06:42 notgull has joined #amaranth-lang

10:39 frgo has joined #amaranth-lang

13:13 notgull has quit [Ping timeout: 264 seconds]

13:16 notgull has joined #amaranth-lang

13:19 cr1901_ has quit [Read error: Connection reset by peer]

13:20 cr1901 has joined #amaranth-lang

13:28 notgull has quit [Ping timeout: 264 seconds]

17:16 <_whitenotifier-7> [amaranth] lekcyjna123 opened issue #1193: Mocks of combinational circuits and `add_process` - https://github.com/amaranth-lang/amaranth/issues/1193

18:33 <jfng[m]> <tpw_rules> "jfng: random Q, i noticed you..." <- yeah, it's just to explicitly indicate that the `pin_i_sync_ff` variable is no longer used past the body of the loop

18:39 <jfng[m]> <tpw_rules> "jfng: maybe it should be renamed..." <- `offset_granularity` would be a more explicit name, but i'd still keep 8 as default (as it is what most users are familiar with)

18:45 <jfng[m]> i.e. even with a 32-bit CSR bus, using an 8-bit granularity gives you consistent offsets between your implementation, documentation, etc

18:45 <tpw_rules> but the only place it matters in implementation is the offset parameter to csr.Builder, and even then it is subject to restrictions

18:48 <jfng[m]> yeah, and if you care about providing explicit offsets for your registers (e.g. for backward compatibility between silicon revisions), it gives you a single point of truth between implementation, documentation, testbenches, etc

18:49 <tpw_rules> all of those things except documentation need the non-granularized offset though

18:49 <jfng[m]> docs and testbenches and drivers would use the products of a BSP generator, but that offset could be directly obtained from the implementation

18:50 <tpw_rules> (or possibly differently granularized offset)

18:50 <jfng[m]> tpw_rules: you'd probably want your firmware drivers to use 8-bit granularity, i think

18:50 <tpw_rules> not if i'm writing boneless code

18:51 <jfng[m]> right, and in any case it would be parameterized/translated by the BSP generator

18:51 <tpw_rules> then why don't the BSP generator and the docs generator take a granularity? what does the peripheral have to say about it?

18:52 <tpw_rules> or rather, why does the peripheral have a say in it?

18:52 <jfng[m]> that's a good point

18:52 <tpw_rules> like, there could be 10 different granularities between the user saying "store 3 to 0x42" and the peripheral in a sufficiently perverted bus architecture

18:53 <jfng[m]> in a sense, what would really matter in `csr.Builder` is what is the most convenient for the peripheral designers

18:55 <tpw_rules> that's why i say default to data_width. they can't specify an offset with precision smaller than data_width//granularity anyway

18:56 <tpw_rules> like, i have concern that someone's going to say "okay great i'm gonna put all my registers at offsets with multiples of 4" and needlessly break compatibility with a 64 bit bus

18:57 <jfng[m]> and then defaulting to `data_width` can become attractive; i'm still not sure that it is a better choice though

18:57 <jfng[m]> it does make things much simpler to reason about

18:57 <tpw_rules> i would think that adding registers in a consistent order and not specifying an offset is the best way to guarantee backwards compatibility

18:59 <tpw_rules> also with it being data_width, nobody looking at the peripheral or using it on the FPGA side can use addresses that are not referenced to that. the 32 bit peripheral designer says "okay register 1 is at offset 4" but, say, logic or testbenches that access the peripheral use address 1

19:00 <jfng[m]> also a good point

19:08 <tpw_rules> also also the granularity is not compatible with the addr on e.g. csr.Decoder.add, and it's not really clear how it could be made so. that uses the data bus width too

19:11 <tpw_rules> so to reiterate, my optimal proposal is to drop granularity from csr.Builder. it's only useful to a designer who knows the SoC CPU's granularity and is trying to design peripherals that have known data_width and want to use offsets when adding registers.

19:12 <tpw_rules> the other proposal is to rename granularity to offset_granularity to emphasize what it affects, and default it to data_width so in the default case the offsets used line up with the rest of the FPGA design. that way someone could specify a different one, but they would have to know it could break data_width parameterization and confuse readers

19:13 <tpw_rules> in both cases the granularity from csr.Builder should not be exposed to any generator; the generator should take in its own granularity from the perspective of what the artifacts are being generated for and scale the addresses appropriately
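
The scaling tpw_rules describes for a generator can be sketched in a few lines of plain Python. This is only an illustration: `scale_offset` is a hypothetical helper, not part of amaranth-soc or any BSP generator, and it assumes an offset is simply a count of units that are `granularity` bits wide.

```python
def scale_offset(offset: int, from_granularity: int, to_granularity: int) -> int:
    """Convert an offset counted in `from_granularity`-bit units into
    `to_granularity`-bit units (hypothetical helper, for illustration only)."""
    bits = offset * from_granularity
    if bits % to_granularity != 0:
        # The offset does not land on a boundary of the target unit.
        raise ValueError(
            f"offset {offset} (x{from_granularity} bits) is not aligned "
            f"to {to_granularity}-bit units"
        )
    return bits // to_granularity


# Register 1 of a 32-bit peripheral, as seen by byte-addressed firmware.
byte_offset = scale_offset(1, 32, 8)   # 4
# The same register, going back from byte offset 4 to 32-bit word address 1.
word_offset = scale_offset(4, 8, 32)   # 1
```

Under the same assumption, `scale_offset(4, 8, 64)` raises `ValueError`, which is the "multiples of 4 break a 64 bit bus" concern raised at 18:56.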

20:18 byteit101 has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]

21:51 chaoticryptidz has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

21:51 chaoticryptidz has joined #amaranth-lang

21:52 mcc111[m] has joined #amaranth-lang

21:52 <mcc111[m]> Imagine I tell Amaranth to add a semi-large group of numbers— say a + b + c + d. will it do something smart like (a + b) + (c + d) or will it add these linearly?

21:54 <tpw_rules> "arithmetics on Amaranth values never overflows because the width of the arithmetic expression is always sufficient to represent all possible results.". so it doesn't much matter. i think the toolchain will end up rearranging the logic. i expect the output verilog to be linear iirc

21:56 <Wanda[cis]> mcc111[m]: this goes to the underlying synthesis tool unchanged

21:56 <Wanda[cis]> yosys has a pass that will group these things, and do some pretty smart lowering

21:56 chaoticryptidz has quit [Client Quit]

21:56 <Wanda[cis]> (basically: there is only one adder in the final netlist, and a bunch of 3-to-2 compressors)

21:57 chaoticryptidz has joined #amaranth-lang

21:58 <Wanda[cis]> I'd expect any other reasonably smart synth tool to do the same, it's a well-known technique

21:58 <galibert[m]> what's a compressor in that context?

21:58 <Wanda[cis]> (or something equivalently smart; for Altera FPGAs the hardware can actually do 3-way addition natively-ish)

22:00 <Wanda[cis]> 3-2 compressor is a circuit that takes 3 numbers (a, b, c), and outputs 2 numbers (e, f)

22:00 <Wanda[cis]> such that a + b + c == e + f

22:00 <Wanda[cis]> it's cheaper than addition, and doesn't have a long carry chain, it's just simple per-bit gates

22:01 <galibert[m]> how does that work?

22:01 <tpw_rules> i thought expressions were broken up into intermediate variables in verilog

22:01 <Wanda[cis]> so when you have a bunch of numbers to add, you construct a DAG of these compressors smashing numbers together until there are only two left, then do a normal carry-chain addition on the final two

22:01 <galibert[m]> I can see muxes if at least one bit is zero, but what if they're all one?

22:01 <Wanda[cis]> (this is also how you lower multipliers, multipliers are multi-input adders with some masking)

22:02 <Wanda[cis]> simple

22:02 <Wanda[cis]> you use a full adder

22:02 <Wanda[cis]> take bit-slice 0

22:02 <Wanda[cis]> compute a[0] + b[0] + c[0] == x

22:02 <Wanda[cis]> x is 2-bit

22:02 <Wanda[cis]> you wire x[0] to e[0], x[1] to f[1]

22:03 <Wanda[cis]> basically e is low-order bits of all single-bitslice sums, f is high-order bits of those

22:03 <galibert[m]> ohhh, e is non-carry addition result, f is carries?

22:03 <Wanda[cis]> (and you tie f[0] to 0)

22:03 <Wanda[cis]> yuuup

22:03 <galibert[m]> beautiful

22:03 <galibert[m]> makes a lot of sense

22:04 <Wanda[cis]> if you're on FPGA, that's two LUTs or one frangible LUT

22:04 <Wanda[cis]> per bitslice

22:04 <Wanda[cis]> with no long critical path

22:04 <galibert[m]> yeah, it's just beautiful

22:04 <galibert[m]> the real adder is just used at the end to fold everything

22:04 <Wanda[cis]> mhm!
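
The 3:2 compressor Wanda describes can be modelled on plain Python integers. This is a sketch of the arithmetic only, assuming non-negative inputs; the function names are made up for illustration, and the simple left-to-right reduction below stands in for whatever DAG shape a synthesis tool would actually pick.

```python
def compress_3_2(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (e, f) such that a + b + c == e + f, using only per-bit logic."""
    # e: low-order bit of each bit-slice's full adder (sum without carries).
    e = a ^ b ^ c
    # f: high-order bit of each bit-slice's full adder (the carries),
    # moved up one position, so f[0] is always 0.
    f = ((a & b) | (a & c) | (b & c)) << 1
    return e, f


def add_many(values: list[int]) -> int:
    """Compress until two numbers remain, then do one ordinary addition."""
    pending = list(values)
    while len(pending) > 2:
        a, b, c = pending.pop(), pending.pop(), pending.pop()
        pending.extend(compress_3_2(a, b, c))  # three numbers in, two out
    return sum(pending)  # the only carry-propagating addition


total = add_many([3, 5, 7, 11])  # 26
```

galibert's "what if they're all one?" case is `compress_3_2(1, 1, 1)`, which gives `(1, 2)`: the sum bit stays in place and the carry moves up one position.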

22:05 <cr1901> Searching for frangible LUT returns exactly 2 results on Google, one from wq, the other from lofty

22:06 <cr1901> What _is_ a frangible LUT?

22:06 <galibert[m]> cr1901: the pretend-LUT6 in the cyclonev labcells, to give one example, is in reality two LUT4 and four LUT3

22:06 <Wanda[cis]> it may be fracturable LUTs?

22:06 <cr1901> oh, so it's the "A LUT6 is two LUT5" thing Xilinx 7-series does?

22:06 <Wanda[cis]> basically large LUTs with multiple outputs where the uhh LUTting power can be distributed between outputs in funny ways

22:06 <galibert[m]> so you can use it as a LUT6 but you can also split it into parts and have things done in parallel

22:06 <Wanda[cis]> yes

22:07 <Wanda[cis]> Xilinx has a pretty simple construct of LUT6 or LUT5×2

22:07 <Wanda[cis]> Altera has... something more complex

22:07 <Wanda[cis]> but, same core idea really

22:08 <cr1901> I actually really missed that feature on ice40 recently... turns out 2s-complementing an N-bit number requires N LUTs for the complement part. But with fracturable LUTs, I could do it in N/2 (_I think_)

22:09 <cr1901> But I also wouldn't care about LUT counting if I wasn't targeting ice40, so...

22:09 <galibert[m]> https://github.com/Ravenslofty/mistral/blob/master/docs/lab-cell.pdf

22:09 <Wanda[cis]> it doesn't have to be a 3:2 compressor btw; with larger LUTs it may be reasonable to have something like 6:3 compressor
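
One reading of a 6:3 compressor, modelled the same way on Python integers: each bit-slice counts how many of its six input bits are set (0 to 6, which fits in 3 bits) and spreads that count over three output numbers. The function name and the explicit `width` parameter are assumptions made for this sketch, and inputs are assumed to be non-negative and smaller than `2**width`.

```python
def compress_6_3(values: list[int], width: int) -> tuple[int, int, int]:
    """Return (x, y, z) such that sum(values) == x + y + z for six inputs."""
    if len(values) != 6:
        raise ValueError("a 6:3 compressor takes exactly six inputs")
    x = y = z = 0
    for i in range(width):
        # Population count of bit-slice i; every output bit below depends
        # on six input bits only.
        count = sum((v >> i) & 1 for v in values)
        x |= (count & 1) << i                # weight 1 stays in slice i
        y |= ((count >> 1) & 1) << (i + 1)   # weight 2 moves up one slice
        z |= ((count >> 2) & 1) << (i + 2)   # weight 4 moves up two slices
    return x, y, z


x, y, z = compress_6_3([63] * 6, width=6)  # x + y + z == 378
```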

22:12 <Wanda[cis]> <tpw_rules> "i thought expressions were..." <- and about that

22:13 <Wanda[cis]> well, yes, but it doesn't change the underlying data flow graph structure

22:13 <tpw_rules> ok yeah that's what i thought

22:13 <tpw_rules> how is project mistral looking these days

22:14 <tpw_rules> hm, not amazing it looks

22:17 <galibert[m]> yeah, too many other things to do

22:17 <galibert[m]> there's some... things I need to RE better, then I will be able to have things go forward

22:19 <tpw_rules> i guess it might be too much to ask for HPS support :D

22:19 <tpw_rules> would be ludicrously sexy to ditch quartus from the PAPA project

22:20 <galibert[m]> HPS is more of a nextpnr problem than a mistral problem actually

22:20 <galibert[m]> and yosys too I guess

22:20 <tpw_rules> i mean i don't imagine they would have to do much?

22:21 <galibert[m]> no, it's much about creating the classes in yosys, and handling them in nextpnr, mostly at the naming level

22:21 <galibert[m]> pin naming level that is

22:22 <tpw_rules> i mean i have been working sort of on documenting that kind of stuff

22:22 <tpw_rules> are the primitives in verilog pretty direct

22:26 <tpw_rules> can mistral do a blinky?

22:27 <galibert[m]> Yes

22:44 <_whitenotifier-5> [rfcs] mwkmwkmwk commented on pull request #53: RFC 53: Low-level I/O primitives. - https://github.com/amaranth-lang/rfcs/pull/53#issuecomment-1987000004

22:44 <_whitenotifier-5> [rfcs] mwkmwkmwk deleted a comment on pull request #53: RFC 53: Low-level I/O primitives. - https://github.com/amaranth-lang/rfcs/pull/53#issuecomment-1987000004

22:45 <_whitenotifier-5> [rfcs] wanda-phi commented on pull request #53: RFC 53: Low-level I/O primitives. - https://github.com/amaranth-lang/rfcs/pull/53#issuecomment-1987000055

23:36 <_whitenotifier-7> [amaranth] whitequark commented on issue #1193: Mocks of combinational circuits and `add_process` - https://github.com/amaranth-lang/amaranth/issues/1193#issuecomment-1987010488