<FFY00_>
but I don't believe you can specify which specific pip to use
<cr1901>
I pinned to build == 1.0.3 for now. When I release 0.1.1 of my package, I'll try the new release
Degi has quit [Ping timeout: 252 seconds]
Degi_ has joined #amaranth-lang
Degi_ is now known as Degi
skipwich has quit [Quit: DISCONNECT]
skipwich has joined #amaranth-lang
ArmanSamimi[m] has joined #amaranth-lang
<ArmanSamimi[m]>
Hi. I am wondering what people’s thoughts here are on CIRCT, and if they are any plans to use it as a ‘backend’ for Amaranth. If I understand correctly while CIRCT and Chisel are closely associated CIRCT is designed to be independent, and in fact there is a python HDL called Magma which is similar to Amaranth but uses CIRCT. Does Amaranth currently use Yosys to fill the same function?
<ArmanSamimi[m]>
How does the quality of the generated verilog compare (in terms of optimization, human readability, etc)?
<ArmanSamimi[m]>
While I don’t understand the details, the idea of CIRCT seems pretty powerful to me, and I understand that the parent project LLVM had a big impact in the SW world.
<whitequark[cis]>
there are no current plans to do so as it's unclear what the benefit would be
<whitequark[cis]>
at one point I planned to use FIRRTL to communicate between Amaranth and Yosys since FIRRTL is much more clearly defined than RTLIL
<whitequark[cis]>
however, the people who made FIRRTL advised I do not do so since FIRRTL is basically deprecated
<whitequark[cis]>
also the Yosys FIRRTL frontend isn't very good I think
<ArmanSamimi[m]>
I see. So are Yosys and CIRCT (and RTLIL and MIRL) analogous in a sense? And currently neither is much better than the other?
<whitequark[cis]>
the generated Verilog is fairly human-readable, especially when compared to other HDL frontends; as for optimization, it corresponds very closely to the hardware primitives for the most part
<whitequark[cis]>
there shouldn't be any times where Amaranth-generated Verilog is worse than manually written Verilog, except for the places where Amaranth intentionally makes semantics defined that Verilog doesn't (Amaranth lacks 'x, on purpose)
<whitequark[cis]>
ArmanSamimi[m]: yeah
<whitequark[cis]>
CIRCT broadly speaking seems OK, but it's very heavyweight to build against and deploy, and it didn't even exist when Amaranth got tightly integrated with Yosys
<ArmanSamimi[m]>
I see. well, it’s good to have options I guess.
Chips4MakersakaS has quit [Quit: Idle timeout reached: 172800s]
<jfng[m]>
who is attending ?
<whitequark[cis]>
me
<tpw_rules>
hello
<galibert[m]>
me
<galibert[m]>
led pins are not io, they're o only usually?
<whitequark[cis]>
you could request them as io, if you wanted to use them with this peripheral
<whitequark[cis]>
(this is a realistic and useful use case)
<jfng[m]>
yes, but they are still a possible use case (the opposite use case would be buttons)
<whitequark[cis]>
* use case, at least with our current platform system)
Chips4MakersakaS has joined #amaranth-lang
<Chips4MakersakaS>
sort of here
<cr1901>
here
<whitequark[cis]>
note: there is (unusually for us) some important discussion in the GitHub PR itself
<whitequark[cis]>
(I'll try to shift more questions to PRs going forward and lead by example)
<Chips4MakersakaS>
If mode is constant would synthesis then be able to do optimization by constant propagation ?
<jfng[m]>
re: extension points
<jfng[m]>
i think this should constitue an RFC for Amaranth SoC peripherals in general, but we can certainly mention this in the "Unresolved questions" section
<galibert[m]>
there are "pins", like the irq inputs to the hps, that could be nice to have as a gpio but are strictly output (no enable line, no input line)
<whitequark[cis]>
Chips4Makers (aka Staf Verhaegen): constant how? usually synthesis isn't clever enough to find never-written CSRs
<whitequark[cis]>
galibert: why GPIO and not, say, their own peripheral which is basically 1 register?
<whitequark[cis]>
(I guess we could potentially have a HPS peripheral in SoC)
<cr1901>
HPS?
<jfng[m]>
Chips4MakersakaS: i dont think so; maybe if its CSR is made to be read-only, with a constant value ?
<whitequark[cis]>
cr1901: on-FPGA CPU core
<galibert[m]>
Hard Processor System
<cr1901>
ahhh
<tpw_rules>
i don't love how there's three different types of pin masks
<tpw_rules>
that's been an annoyance in the STM32 units for me and people i know
<whitequark[cis]>
tpw_rules: generally you're supposed to use the autogenerated BSP
<Chips4MakersakaS>
whitequark[cis]: I mean constant by assigning a constant INPUT_ONLY value to it.
<galibert[m]>
Then if you wouldn't use the gpio module to control irqs, why would you use it to control leds?
<tpw_rules>
whitequark[cis]: sure, but if you want to optimize
<whitequark[cis]>
Chips4Makers (aka Staf Verhaegen): assign by writing it in the CPU firmware?
<whitequark[cis]>
galibert: because "use some GPIOs to control some LEDs" is the #1 beginner Amaranth SoC project and we should support that
<Chips4MakersakaS>
Yeah, was confused; mode is memory mapped register.
<whitequark[cis]>
support, document how to do it, etc
<cr1901>
I don't treat IRQ lines as general purpose inputs personally
<jfng[m]>
galibert[m]: in this case, you could only connect the `.o` port to the HPS, and set the GPIO to PUSH_PULL mode
<whitequark[cis]>
in a practical SoC, you will probably not use the LED or button resources for GPIOs at all, but have dedicated GPIO resources in the platform
<whitequark[cis]>
however, requiring newcomers to jump through these hoops is outright cruel. also, in the new resource system, I plan to leave the IO buffer type entirely to the peripheral, so this issue should just disappear in a few months
<galibert[m]>
Catherine: you're going to have the problem that the connect() is going to fail if there's only a .o, no?
<whitequark[cis]>
galibert: `.request("led", dir="io")` gives you a `Pin` with all three
<galibert[m]>
"this issue should just disappear in a few months" means "the code is going to break at that point"
<galibert[m]>
or I misunderstood something
<whitequark[cis]>
correct. I plan to break all code that uses platform.request() in a few months period
<whitequark[cis]>
with the usual deprecation cycle, of course
<tpw_rules>
e.g. the altera peripheral has all access parallel so all bits 0 control pins 0
<galibert[m]>
you don't want to break how the gpio module is supposed to be used at the same time, do you?
<tpw_rules>
i don't know if there's a survey to do but i'd much prefer that
<tpw_rules>
it would involve slightly more decode logic i guess
<whitequark[cis]>
galibert: the GPIO module is explicitly designed in a way that does not couple it to the platform system
<jfng[m]>
^
<whitequark[cis]>
(which also makes this conversation somewhat offtopic; the only thing that would change is the supplementary example code in the guide section of the RFC)
<whitequark[cis]>
tpw_rules: sorry, I don't understand what you're describing at all
<tpw_rules>
so the proposed GPIO interface supports 16 pins in 32 bit registers. mode is set by two adjacent bits, in and out are set by one bit (ignoring the top 16 bits), set/clr are used with two bits 16 apart. so there's three different masks per pin depending on what you want to do.
<cr1901>
Is alternate mode mutually exclusive with having a pin multiplexer? (msp430 uses alternate to mean "redirect pins to a peripheral, not the GPIO register". But pins are hardcoded to peripherals typically)
<whitequark[cis]>
tpw_rules: what do you mean by a mask?
<tpw_rules>
e.g. the altera gpio interface supports 32 pins in 32 bit registers. set and clear are separate registers, and the mode bits are separate registers too. so you only need one mask, and bit n always corresponds to an action on pin n
<whitequark[cis]>
oh, that is strictly inferior
<tpw_rules>
why do you say that?
<whitequark[cis]>
having set and clear in separate registers would mean that you cannot e.g. set pin 1 and clear pin 2 on the same cycle, which is an intended part of the usage model
<whitequark[cis]>
the same for switching a mode e.g. from 0b01 to 0b10
<whitequark[cis]>
I care more about the set/clear register though
<cr1901>
the proposed* alternate mode.
<whitequark[cis]>
jfng: one thing we could do about the set/clear register, potentially, is to have them paired: `set_0 clr_0 set_1 clr_1 ...`; this avoids the need for the slightly weird power-of-2 calculation
<cr1901>
afk for a bit
<jfng[m]>
whitequark[cis]: this would make code quite unreadable, i think
<whitequark[cis]>
cr1901: cr1901: alternate mode is proposed to be used *with* a pin multiplexer
<whitequark[cis]>
jfng: how so?
<whitequark[cis]>
oh... we don't have arrays with a stride
<tpw_rules>
i think i would prefer the bits in two separate ranges like they are now
<tpw_rules>
thanks for your explanation though. that makes sense
<jfng[m]>
whitequark[cis]: the mask to set bits would be interleaved with the mask to clear bits in e.g. driver code
<tpw_rules>
"If we guarantee that the SetClr register is power-of-2 sized" we definitely should
<whitequark[cis]>
jfng: in my mind, the code will look something like `write_reg(GPIOA, REG_GPIO_SETCLR, GPIO_SETCLR_SET(clk) | GPIO_SETCLR_CLR(cipo));`
<whitequark[cis]>
are you thinking of programming it in assembly? even then, you could (and probably should) use macros to make your life better. (the C preprocessor works fine with most assemblers)
<tpw_rules>
i'm thinking of any dynamic case
<jfng[m]>
but if someone doesn't access to C and writes assembly by hand ?
<jfng[m]>
*have
<whitequark[cis]>
tpw_rules: works fine dynamically. `#define GPIO_SETCLR_SET(n) (1<<n)`; `#define GPIO_SETCLR_CLR(n) (1<<(16+n))`
<tpw_rules>
maybe i'm placing too much weight on my lifetime of bitbanging sins
<whitequark[cis]>
which means that e.g. a 24 pin GPIO register can be densely packed on a 8-bit data width
<jfng[m]>
hmm
<tpw_rules>
so the peripheral does not necessarily define mandate that pin_count*2 <= data_width? or do you mean for a higher level adapter?
<jfng[m]>
it is a tradeof of usage vs. implementation concerns
<jfng[m]>
tpw_rules: pin_count can be any positive integer
<zyp[m]>
It's easier to translate a multibit value and mask to set and reset when they are two consecutive ranges and not interleaved
<whitequark[cis]>
galibert: I have two counterarguments. (1) This is a GPIO peripheral, not a GPI peripheral or GPO peripheral. It is OK to not support every possible configuriation at all times. (2) I'm not sure how big this problem is, given that many (most?) SoCs will include a pin multiplexer or similar, which can be (and often will be) set up in such a way that the registers in the GPIO peripheral are dynamically ignored
<whitequark[cis]>
basically, you cannot escape actually reading the datasheet for your device, and once you've done that, you have all the necessary knowledge
<tpw_rules>
jfng[m]: ok, i guess the CSR bridge has the handling for the width conversion. i forget exactly where that's done
<zyp[m]>
(v&mask)|((~v&mask)<<k)
<whitequark[cis]>
(a lesser reason to not care is that most of us will use the GPIOs on an FPGA, not ASIC, and FPGAs very rarely have input-only or output-only pins)
<whitequark[cis]>
zyp: why would you be masking the SetClr register? it's write-only
<whitequark[cis]>
you're never reading it, you're constructing it from some other thing, usually either pin number or precomputed mask
<jfng[m]>
tpw_rules: yep, this is handled by `csr.Multiplexer` internally
<whitequark[cis]>
* other thing by ORing, usually
<zyp[m]>
Sorry. I'm out and on the phone, I'll explain better when I get home
<whitequark[cis]>
zyp: oh, nevermind, I misunderstood what you meant
<whitequark[cis]>
I agree that this is a reason to have consecutive ranges
<galibert[m]>
Then the beginner project of "use some GPIOs to control some LEDs" should not use that module since it's not a GPO
<whitequark[cis]>
galibert: I feel that you're not approaching this in good faith, to be quite honest
<jfng[m]>
why not just say "reads from the Input register return undefined values, because the LEDs do not have input lines" in some tutorial ?
<galibert[m]>
Actually, I'm not. I was very surprised when I saw led pins requested as io, was about to say it was a typo, then went wtf? when I realized it was a fundamental limit of the proposal
<tpw_rules>
i think also a concern is that `connect` couldn't work
<whitequark[cis]>
tpw_rules: this has already been addressed
<whitequark[cis]>
galibert: when you solder a pin of your STM32 or something to a LED, it doesn't magically become a GPO, with a completely different usage model
<jfng[m]>
you can do `platform.requiest("led")` and manually assign `.o` otherwise
<tpw_rules>
jfng[m]: does the RFC and possibly eventual docs prescribe any particular memory map? or just let the csr machinery generate what it will?
<cr1901>
Some GPIO peripherals (6522) return the output register _or_ the input pins when read as inputs in output-only mode. It doesn't _have_ to be undefined
<jfng[m]>
tpw_rules: the documentation will have to do so, but the final memory map is dependent on the `data_width` parameter
<whitequark[cis]>
for the same reason, when the pin of your FPGA is soldered to a LED and happens to be called "led" in our (flawed) resource system, that doesn't make the pin somehow unsuitable for connecting to the GPIO peripheral
<galibert[m]>
jfng: with the limits w.r.t interaction with the mode registers I was talking about. H8 gp(i)(o)s for instance have "traditional" direction registers, and when a pin is single-direction the associated direction pin is constant to the correct value
<cr1901>
jfng[m]: So, this is a bikeshed question, but why doesn't inputs-in-output-only mode return "the value of the output register"?
<cr1901>
Just not worth the trouble?
<whitequark[cis]>
cr1901: generally you want the input buffer to be always connected to the pin
<tpw_rules>
jfng[m]: so say 24 pins on an 8 bit CSR bus, you'd always have to write three bytes and they'd be at multiples of three (supposing we do not round SetClr)
<whitequark[cis]>
adding an extra mux in there, to give you the value you can already read from the Output register, is a waste of resources
<cr1901>
So "just not worth the trouble"
<whitequark[cis]>
in some rare cases (when an external driver overpowers your driver) they can actually read different values. or if the magic smoke left the output driver afterwards. so it has some utility too
<whitequark[cis]>
(I've diagnosed dead GPIOs that way)
<jfng[m]>
@galibert:matrix.org my point is more that a software driver is *aware* that a given pin is say, input-only, because its developers based it on a datasheet
<jfng[m]>
so a default value is not of use, if it is just never going to be read anyway
<galibert[m]>
jfng: you're kind of overconfident in the quality of firmware writers :-P
<jfng[m]>
it's like reading from arbitrary/undocumented locations in your address space and expecting a specific result
<whitequark[cis]>
galibert: there are two different use cases for resources: one for the case where you're driving them directly from the FPGA, where the fixed direction is useful (or at least I deemed it useful years ago when I came up with `.request()`; which I regret). in that case, they're always inputs or outputs for the convenience of FPGA gateware.
<whitequark[cis]>
the second use case for resources is connecting them to an MCU instantiated in the SoC, in which case you have a standardized usage model (rather than a specialized one), so something somewhere has to adapt the one to the other. in this case, the adaptation is done by requesting the pin as inout (which it technically is, just like it is on your typical MCU) and letting software deal with it
<galibert[m]>
well, if only part of the pins are monodirectional you're going to access all the mode bits in any case
<whitequark[cis]>
could you extend the GPIO peripheral to have some form of metadata saying which pins are input-only, which are output-only, etc? in principle, yes
<whitequark[cis]>
this doesn't reduce the complexity though, it just shifts it around the system
<whitequark[cis]>
in our CSR system, fields with RW access type are fully controlled by the CPU, so something like "the mode is always INPUT_ONLY for input-only pins" would require changing to a custom access type specific to the GPIO peripheral
<galibert[m]>
Well, it would give out the possibility of providing the information in the automatically generated memory map
<galibert[m]>
s/out/you/
<whitequark[cis]>
you can still do that though, by adding an annotation
<whitequark[cis]>
regardless of whether we ever want to support input-only and output-only pins explicitly--that is for @jfng to answer, not me--I do want to say that I think our first peripheral ever should definitely not indulge in this level of application specific complexity
<jfng[m]>
we are approaching the end of the meeting, i think we should consider voting
<whitequark[cis]>
GPIOs are supposed to be simple. they... in reality aren't, but we can always make them complicated later. not vice versa though
<jfng[m]>
whitequark[cis]: i think that input-only or output-only pins can be advertised through metadata to the BSP generator
<jfng[m]>
which can then provide the appropriate accessors, etc
<whitequark[cis]>
my vote is: merge, with the clarifications on GitHub, requiring set^clr, adding ALTERNATE, relaxing input_stages requirement. I have no strong opinion on power-of-2 sizing except that we should either do it for all registers or for none (in the latter case, using interleaved encoding)
<whitequark[cis]>
* my vote is: merge, with the clarifications on GitHub, requiring set^clr, adding ALTERNATE, relaxing input_stages requirement. I have no strong opinion on power-of-2 sizing except that we should either do it for all registers or for none (in the latter case, using interleaved encoding for SetClr)
<Chips4MakersakaS>
metge
<tpw_rules>
this all happened fast, i'd like to postpone a vote til next meeting
<Chips4MakersakaS>
*merge
<jfng[m]>
register sizes are variable for all registers, and will be clarified
<cr1901>
jfng[m]: I think it's a Good Peripheral, and re: "should peripherals exist in amaranth-soc?", your GPIO peripheral is minimal and fine. I would prefer interleaved encoding for SetClr
<cr1901>
Ahhh hmmm
<galibert[m]>
close/defer/postpone, I really think the lack of taking directionality into account is something we'll regret in the future. As it is I wouldn't use it, or at least not for anything directional
<jfng[m]>
@vegard_e:matrix.org ?
<whitequark[cis]>
galibert: you should clarify whether by "postpone" you mean "until next meeting" or "indefinitely" (that's a separate thing described in our RFC repo README)
<zyp[m]>
I'm not caught up so I'll have to vote defer
<whitequark[cis]>
(I guess what tpw_rules says should be called defer for reduction of ambiguity)
Wanda[cis] has joined #amaranth-lang
<Wanda[cis]>
about input-only pins
<Wanda[cis]>
there are actual input-only pins on some FPGAs
<galibert[m]>
Well, if the final version is as it is my vote is close, if there is a chance it changes in that area I will change my vote so closing would be premature. Not sure how to express that
<Wanda[cis]>
where you just plain cannot request an IO buffer
<cr1901>
Wanda[cis]: Aren't those typically "clock pins you're not using for clocks"?
<Wanda[cis]>
not necessarily, no
<Wanda[cis]>
Spartan 3E just sprinkles them wherever
<whitequark[cis]>
galibert: I do want to say that thus far I find the HPS use case unconvincing as the GPIO peripheral is for connecting to I/O pins and HPS ports are not I/O pins. I think the HPS use case should be covered by an HPS-specific peripheral that is designed specifically for that niche use case
<whitequark[cis]>
if you have motivation that is not the HPS use case I'd like to hear it
<cr1901>
Wanda[cis]: I have a spartan 3E board w/ input only pins. Was under the impression that the HW designer couldn't shoehorn 4 extra clocks into the design. I'll have to check :P
<cr1901>
w/ input only pins exposed as GPI*
<galibert[m]>
It's also LEDs, buttons and the confusion that will come with them. One is not going to desolder leds to make the pin usable as bidir again
<tpw_rules>
now that i look a little more carefully, why is setclr double-length (or quad-length)? i think interleaving and packing might be separate questions. i'm not sure the impact of the CSR bridge/multiplexer has been properly explored here
<whitequark[cis]>
tpw_rules: so that you can atomically set and clear any amount of pins
<jfng[m]>
packing them makes them less convenient to manipulate with e.g. assembly code imo
<tpw_rules>
whitequark[cis]: yes, i get that. for n pins, output and input have 2n bits on the diagram, mode has 2n, but setclr is 4n?
<whitequark[cis]>
tpw_rules: oh, I see, you mean the alignment
<whitequark[cis]>
yeah, that's what I meant when talking about register sizes
<whitequark[cis]>
galibert[m]: you absolutely would, though
<jfng[m]>
outputs don't have 2n bits, in this case they have data_width bits
<jfng[m]>
as 4 < 8
<whitequark[cis]>
there's a decent amount of FPGA boards with silly peripheral allocation where you plainly have to desolder a LED or a button to use some functions
<tpw_rules>
what happens if pin_count is > data+width?
<zyp[m]>
whitequark[cis]: Been there, done that
<tpw_rules>
you said earlier that's supported and i'm trying to understand how
<whitequark[cis]>
tpw_rules: the CSR infrastructure takes care of that
<galibert[m]>
If you can do that you can change the platform object to mark said pin bidir though
<jfng[m]>
tpw_rules: then it is rounded up to a power-of-two and multiple of data_width
<tpw_rules>
then how can the output register only have data_width bits?
<galibert[m]>
You're not using a vanillay de10-nano anymore
<Wanda[cis]>
there are also output-only pins on some fpgas, but they're ... obscure
<galibert[m]>
s/vanillay/vanilla/
<tpw_rules>
jfng[m]: why bother doing that?
<whitequark[cis]>
galibert: are you unable to comprehend the utility of being able to use the LED resource for a quick one-off demo when you're bringing up a new FPGA board, or are you arguing in bad faith?
<Wanda[cis]>
S3E just had input-only pins wherever though
<cr1901>
Couldn't that be handled at the resource (revamp) level? Just plain refuse to attach an input only pin to a GPIO peripheral without a gasket?
<jfng[m]>
tpw_rules: it will technically always be a multiple of data_width due to how the CSR bus is designed
<whitequark[cis]>
yes, any nontrivial use of the GPIO peripheral should have you add a dedicated GPIO resource with the io direction. it is still useful to be able to not have to mess with pins if all you want is to blink some LEDs on a new board.=
<whitequark[cis]>
s/.=/./
<Wanda[cis]>
basically they wanted to optimize how much logic they'd fit on silicon at expense of IO, hence saving area on output buffers
<tpw_rules>
jfng[m]: that also means that if you have 24 bits on an 8 bit bus you'd have to do four writes instead of three
<cr1901>
whitequark[cis]: Re: input only pins, at least one board supported by Amaranth (Mercury) explicitly has input only pins exposed
<tpw_rules>
jfng[m]: doesn't the csr machinery make accessing those extra bits undefined? also speaking of that sort of, there's not a way to calculate the required address width except by trial and error?
<jfng[m]>
tpw_rules: you are right, this should be user-configurable, and i need to think about this a bit more
<cr1901>
I was going to say merge, but I'm going to switch to defer until next week.
<jfng[m]>
i am tired and losing ability to think clearly, also
<jfng[m]>
so i think we should end the meeting
<galibert[m]>
Catherine: Not following. Generic CycloneV would have all bidi pins, specific eval board would have some of those pins input-only and output-only, and I'd like the connect to tell me when I'm doing things wrong instead of essentially casting everything to void *
<cr1901>
jfng[m]: I think you did a good job, FWIW
<galibert[m]>
Catherine: and having a somewhat fundamental disagreement does mean either of us is in bad faith
<galibert[m]>
s/does/doesn't/
<tpw_rules>
jfng[m]: why would this be user-configurable?
<whitequark[cis]>
galibert: happy to agree to disagree on "in that case I think you should be using a different peripheral"
<galibert[m]>
Yeah, that's my conclusion too. I'll switch to abstain then
<tpw_rules>
jfng[m]: please ping me when you're thinking about it again, i'd like to share more inputs
<tpw_rules>
i think it's a bit under-described, and i think how it works will set the stage for future peripherals
<jfng[m]>
you can leave them here and i will read them
<whitequark[cis]>
tpw_rules: you could also comment on GitHub to discuss this more asynchronously
<jfng[m]>
thank you for providing them
<jfng[m]>
or on github, yes
<tpw_rules>
okay, i will
<whitequark[cis]>
tpw_rules: re: the CSR machinery, didn't we go over it while discussing that RFC?
<whitequark[cis]>
I agree that it's fundamental, I'm just surprised given that you were pretty involved in it
<tpw_rules>
whitequark[cis]: no, i understand how the machinery works, and i think it's a good idea, i just am saying how it relates to the design and documentation of a peripheral
<whitequark[cis]>
tpw_rules: ohhhh, oh yeah
<whitequark[cis]>
the actual user-facing documentation of a peripheral is a big topic not really breached in the RFC
<tpw_rules>
for example, it's non-intuitive to me that pin_count*2 can be > data_width
<whitequark[cis]>
my plan was to address it together with JF after the RFC is merged, since we do not have a methodology for writing peripheral docs yet
<tpw_rules>
i think they are more linked together than that. it might turn out too difficult to write docs for that case, so it should be not allowed in the peripheral
<whitequark[cis]>
Wanda: input-only cells were, IIRC, the actual (part of) motivation for having `dir="i"` etc
<whitequark[cis]>
tpw_rules: I wonder if that's the case. for any given instantiation, docs for the peripheral instantiated with those parameters should be eventually generated by the BSP generator too (I guess we should include docstrings in metadata eventually)
<whitequark[cis]>
so for a user of a specific instantiation it's not a problem since they get a clear explanation of how that specific one works, and for a user of a parametric instantiation (LiteX style) they're relying entirely on the BSP to do register accesses for them
<whitequark[cis]>
this leaves out the case of a configurable chip for which someone might want to manually write code in assembly
<tpw_rules>
generate the docs and read them?
<whitequark[cis]>
yeah, that's the obvious choice
<whitequark[cis]>
but if you want to write assembly generic over the chip config, but also not using the BSP, you might be in a difficult place
<whitequark[cis]>
that seems like not something we should necessarily overly focus on, though
<tpw_rules>
well doesn't the bsp generator have to do that?
<cr1901>
If I'm writing code for a soft-core in all assembly, I probably have a very specific bespoke design whose insns are intricately tied to doing I/O and thus wouldn't benefit from amaranth-soc
<whitequark[cis]>
tpw_rules: sorry, I don't follow
<tpw_rules>
i would use it, but be doing assembly hax
<cr1901>
(Although would be cool it _could_ benefit from amaranth-soc)
<tpw_rules>
whitequark[cis]: the bsp has to generate code to access the chip anyway, maybe not always in assembly
<whitequark[cis]>
tpw_rules: yep
<galibert[m]>
cr1901: unless you start adding i/o specific instructions to your core, memory-mapped stuff on wishbone is easy to use and comfortable
<tpw_rules>
so such a user could study it
<whitequark[cis]>
oh yeah
<cr1901>
>whose insns are intricately tied to doing I/O
<whitequark[cis]>
I guess in that case I don't feel like any particular layout is going to be a problem
<whitequark[cis]>
since you could always generate docs and/or c/rust/assembly and study them
<whitequark[cis]>
tpw_rules: jfng: happy to defer until next friday, I do feel that the topic of register sizing is underexplored
<whitequark[cis]>
I did mention that I'm fine with either alternative to the current proposed design (I am) but I also do acknowledge that it may be more important than I thought
<tpw_rules>
not per se, but understanding the atomicity guarantees and how everything gets splits up is complex
<tpw_rules>
and i don't think overcomplicating it because it can technically be generated into an un-understandable tangle is a great idea
<whitequark[cis]>
in particular, I think we should definitely avoid the case where e.g. the input register is written in 3 byte chunks, and the setclr register is written in 8 byte chunks due to only one of them being aligned to next power of 2
<whitequark[cis]>
s/input/output/
<tpw_rules>
absolutely
<tpw_rules>
i think that's an easy decision, but it raises other questions
<tpw_rules>
i'll collect them a comment on the PR soon
<whitequark[cis]>
I think having a dense allocation extracts some marginal efficiencies (maybe you really like your GPIO ports being 24-pin on your 8-bit MCU) at the cost of making a lot more people suddenly deal with odd-sized registers
<jfng[m]>
whitequark[cis]: whether you do or not, the decision is alreay defer by majority (2 merge, 3 defer, 1 abstain)
<whitequark[cis]>
sure, I just wanted to register my change of view on the topic
<tpw_rules>
what does power of two even mean? i assume power of two multiple of data width bits
<whitequark[cis]>
yeah
<whitequark[cis]>
on reflection, I think we should probably cater more to the common case of "most GPIO ports are roughly power-of-2 sized and those that aren't are close to that", and would mildly prefer making them look and work (more) uniformly at the cost of some marginal efficiencies afforded by the dense encoding
<tpw_rules>
i'm a little unsure of the benefit of being able to 'over-size' the peripheral in the first place; no chip i'm aware of does that
<whitequark[cis]>
and yeah, it is true that alignment and packing order of set/clr are independent, we could have a 48-bit SetClr register for a 24-pin GPIO component
<whitequark[cis]>
s/component/peripheral/
<whitequark[cis]>
'over-size'?
<tpw_rules>
pin_count*2 > data_width
<whitequark[cis]>
you probably haven't encountered chips that do this because this is primarily useful for FPGAs
<whitequark[cis]>
you can save a lot of area (and some delay) by not routing a pair of huge 32-bit data buses to every minor peripheral that you don't even care to access quickly, you just want to have a single-clock design
<whitequark[cis]>
so being able to go from a single-cycle-access 32-bit-wide CSR bus to a four-cycle-access 8-bit-wide CSR bus by changing a single parameter and regenerating your BSP can save you a ton of effort on timing closure
<whitequark[cis]>
of course you want the usage model to stay exactly the same, without torn writes suddenly becoming an issue just because you tried to optimize the resource consumption a bit
<whitequark[cis]>
the CSR machinery is intended to be nearly transparent over data_width, with nothing changing except for timings (and timings changing in a predictable way). as a consequence of having this design requirement, it also makes it very easy to have big registers that don't have torn accesses, like 64-bit timers on a 32-bit CSR bus (or 8-bit even), which is extremely handy in some cases
<tpw_rules>
that all makes sense. do we document it as a 64 bit register or eight 8 bit registers though?
<whitequark[cis]>
I don't actually know
<whitequark[cis]>
eight 8-bit registers seems safe and in-line with industry practice
<whitequark[cis]>
however, industry practice is that because they don't have a consistent way to make such registers, so each individual use site is an one-off with, often, its own slightly special behavior
<whitequark[cis]>
Amaranth does provide such a way, so we could conceivably say "this 32-bit register has 1-cycle access, and this 64-bit register has 2-cycle access" with a big warning on each such register telling you exactly the constraints on accessing it
<whitequark[cis]>
I think only time will tell whether that or documenting it as several separate ones will work better, as in, result in a lower firmware defect rate
<whitequark[cis]>
I can see arguments in favor of either choice and I don't have a strong opinion either way
<tpw_rules>
2 cycle access being that you have to access both 32 bit halves?
<whitequark[cis]>
correct
<whitequark[cis]>
2 (bus) cycle, or 2 access, or however you call it
<jfng[m]>
<whitequark[cis]> "I think only time will tell..." <- i think it is easier to communicate to developpers the notion of atomic accesses over chunks of a single register, rather than a group of registers
<whitequark[cis]>
that is a good point
<jfng[m]>
i.e. that they should be accessed in ascending order of addresses
<zyp[m]>
I agree, if all chunks have to be accessed, it should be documented as one register
<zyp[m]>
especially if it's four eight bit chunks behind a 32 to 8-bit bridge
<whitequark[cis]>
so, I wanted to float some thoughts on clock domains (cc galibert)
<whitequark[cis]>
first off, I think we should remove propagation of domains "up" in the hierarchy, chiefly by deprecating ClockDomain() without local=True and then making that the only option. it has really strange semantics where clock domains are automatically renamed using submodule names (a weird Migen-ism), causes no end of issues because it's not fully well-defined, doesn't really fit Amaranth's design, and causes problems for the
<whitequark[cis]>
compiler itself and anything else analyzing the semantics of Amaranth code
<whitequark[cis]>
the workaround is very straightforward: in your clock and reset generator submodule (if you have one), drive ClockSignal() instead of cd_sync.clk, then move the clock domain definition to your toplevel
<whitequark[cis]>
in the unlikely case you were actually relying on the automagical clock domain renaming, use DomainRenamer() explicitly
<galibert[m]>
Yup, not a problem
<whitequark[cis]>
then, we need to talk about resets
<whitequark[cis]>
when considering synchronous inputs only (ie the async control set is just {clk}), there are two options for flop controls: reset-over-enable and enable-over-reset
<whitequark[cis]>
both have their advantages and as far as I'm aware neither is "best"
<whitequark[cis]>
so, ignoring the existence of ClockDomain for now: if we consider ResetInserter() and EnableInserter(), then we can already express priority: if multiple inserters are applied to the same domain, the outermost one wins
<whitequark[cis]>
even in this simplified case, we need to consider what ResetSignal() and EnableSignal() do
<whitequark[cis]>
I think right now the expression ResetSignal() lowers to is just OR'd with all resets from each reset inserter. presumably EnableSignal() would be, either
<whitequark[cis]>
however, should they not interact with each other too?
<whitequark[cis]>
that is, ResetInserter() does rst_new = rst_inserted | rst_old and en_new = rst_inserted | en_old
<whitequark[cis]>
and `EnableInserter()` does `rst_new = en_inserted & rst_old` and `en_new = en_inserted & en_old`
<whitequark[cis]>
otherwise, external logic driven by ResetSignal() or EnableSignal() would not recognize prioritization correctly (ie, the same way as Amaranth logic)
<whitequark[cis]>
next, we should consider ClockDomain.rst and ClockDomain(async_reset=). right now that has broken semantics: async_reset= makes all of your ResetInserter-added resets also async, which is something generic code cannot compensate for at all, except for defensively using FFSynchronizer everywhere
<whitequark[cis]>
ClockDomain must necessarily have an async control set (clk, arst; arst being maybe optional), as these features are otherwise completely unavailable in Amaranth
<whitequark[cis]>
however, it doesn't necessarily need to have a sync control set (rst, en)
<whitequark[cis]>
I think having both rst and en runs into the problem of "is it reset-over-enable or enable-over-reset?" which is unsolvable, therefore we should not have both
<whitequark[cis]>
we could potentially kill rst entirely, promote the methodology of "whenever you need to reset an entire domain, use an async reset, ResetSynchronizer, and reap the benefits of async assertion, sync release"
<whitequark[cis]>
this is a bit radical but we've done radical changes before
<whitequark[cis]>
we could also leave rst and only rst in ClockDomain, making its meaning basically "an implicit ResetInserter() applied after all other ResetInserter()s for any use of that domain in the design", which is actually how rst is implemented right now (modulo some insanity when async_reset=True)
<Wanda[cis]>
<whitequark[cis]> "next, we should consider ..." <- I think `ResetInserter` just adds a synchronous reset in addition to the async one
<whitequark[cis]>
galibert: in my view, these are the most pressing and fundamental questions with our current clock domain system. I think your RFC will need to wield an answer to all of these questions at a minimum, and maybe some from others in the community
<whitequark[cis]>
Wanda: oh, I think I know why it may not have done this previously
<whitequark[cis]>
because it relied on proc_arst and proc_arst did the insane thing when it matched on the combined emitted reset or something
<Wanda[cis]>
what.
<whitequark[cis]>
I'm fairly positive that just enabling async_reset=True made ResetInserter()-inserted resets async too
<whitequark[cis]>
and I think this may have been happening because it was relying on designed-for-verilog pattern matching of sync rules?
<whitequark[cis]>
didn't me and you discuss that a few months ago, even?
<whitequark[cis]>
it's possible I'm just misremembering entirely but there has to be some issue I'm thinking of
<Wanda[cis]>
I ... don't know?
<Wanda[cis]>
the problem that I recall is ResetInserter not affecting ResetSignal
<Wanda[cis]>
there's also the weird hack that EnableInserter does with MemoryInstance in particular
<Wanda[cis]>
also we did talk about the priority issue; I brought it up when I attempted to implement EnableSignal but it ended up an "open, scream, close" kind of issue
<tpw_rules>
b.eq(a) means the low two bits of b are the same net as a and you can't disambiguate them anymore
<Wanda[cis]>
because it's effectively asking the port inference code to insert redundant nets into ports
<tpw_rules>
sorry, i was talking about the consequence, not the work
<Wanda[cis]>
about that
<Wanda[cis]>
because the signal is effectively driven by a shorter cell
<Wanda[cis]>
if you do a = Signal(4); b = Signal(4); m.d.comb += a.eq(~b)
<Wanda[cis]>
then amaranth will synth a 4-bit NOT cell
<Wanda[cis]>
if you then pass a somewhere, it'll pass the output of this cell and require a 4-bit port
<Wanda[cis]>
when the port inference code creates this port and wants to name it, it notices there's already a perfectly good wire, a, that happens to consist of the same nets, and will reuse it as the port
<Wanda[cis]>
however
<Wanda[cis]>
if you do a = Signal(8); b = Signal(4); m.d.comb += a.eq(~b)
<Wanda[cis]>
there still is a 4-bit NOT cell, and the a signal gets assigned to a concatenation of the output of this cell and 4 0-bits
<Wanda[cis]>
pass a to another module, and the port inference code again notices it needs to wire these 4 bits
<tpw_rules>
why did it forget the four 0 bits?
<Wanda[cis]>
except this time there is no signal that conveniently consists of the same bits, so an anonymous port wire is created
<Wanda[cis]>
because const bits don't need to be passed between modules
<Wanda[cis]>
they'll just be materialized when they need to be wired somewhere
<tpw_rules>
where do the named signals get created? the b in the example
<Wanda[cis]>
if signed signals were involved, not signed, you'd get a assigned to a concat of the 4-bit output, and then 4 duplicates of the highest net
<Wanda[cis]>
for the port inference code, that's just 4 nets that need to be passed
<Wanda[cis]>
created? the signals are created by the user, and get passed around through the whole process
<Wanda[cis]>
the netlist builder keeps a track of signal-net correspondence
<tpw_rules>
created in the verilog
<Wanda[cis]>
we also keep track of which signals were used in which modules
<Wanda[cis]>
then, when creating RTLIL, we use these two bits of information to insert a wire per used signal into every module and connect it to the relevant nets
<Wanda[cis]>
if a port and signal happen to use the same nets, we merge them into named port
<Wanda[cis]>
that said, that stuff is quite... best-effort based
<Wanda[cis]>
the main goal for the signals emitted into Verilog is that they have the correct values for debugging purposes, which we do maintain
<Wanda[cis]>
using pretty port names is of secondary importance
<tpw_rules>
i think port consistency is not unimportant ofr debugging
<tpw_rules>
but that could be a slippery slope
<Wanda[cis]>
I think you're trying to rely on more than Amaranth can provide.
<tpw_rules>
is it not a regression?
<Wanda[cis]>
maybe. that said, decoupling ports from signals was an explicit goal of the NIR
<Wanda[cis]>
with the underlying goal of enabling mixed directionality of bits within a single signal
<tpw_rules>
i see
<Wanda[cis]>
ie. driving some bits of it in one module, some in another
<Wanda[cis]>
(we're not quite there yet for normal drivers, due to pysim design, but it already works for Instance-driven stuff)
<tpw_rules>
i see, thanks for explaining everything
<Wanda[cis]>
thinking about this some more, I wonder if we should have buffer cells in NIR, associated with signals
<Wanda[cis]>
that could possibly solve some problems
<galibert[m]>
I think I'll have to re-read the RFC, it's almost a year old at this point, but in any case there was a lot of open questions in it. Including, I think, around reset
toshywoshy has quit [Read error: Connection reset by peer]