ChanServ changed the topic of #rust-embedded to: Welcome to the Rust Embedded IRC channel! Bridged to #rust-embedded:matrix.org and logged at https://libera.irclog.whitequark.org/rust-embedded, code of conduct at https://www.rust-lang.org/conduct.html
kenny has joined #rust-embedded
emerent has quit [Read error: Connection reset by peer]
emerent has joined #rust-embedded
id_tam has quit [Ping timeout: 250 seconds]
dne has quit [Remote host closed the connection]
dne has joined #rust-embedded
id_tam has joined #rust-embedded
id_tam has quit [Client Quit]
emerent has quit [Remote host closed the connection]
emerent has joined #rust-embedded
<re_irc> <@thejpster:matrix.org> TIL https://gitlab.arm.com/firmware/corstone-hal-rs exists - official Arm PAC and HAL for the Corstone reference System-on-Chip design emulated by both QEMU and the Arm Fixed Virtual Platform.
<re_irc> <@diondokter:matrix.org> Oh, that's pretty cool!
<re_irc> <@thejpster:matrix.org> mad respect to ithinuel for getting that pulled off
<re_irc> <@ithinuel:matrix.org> 😄 I was not the only one involved 😄 but thanks, I'll share with the colleagues involved!
IlPalazzo-ojiisa has joined #rust-embedded
emerent has quit [Ping timeout: 258 seconds]
emerent has joined #rust-embedded
<re_irc> <@whitequark:matrix.org> I've been planning to talk about it more as the bits and pieces fall into the final form, and I think now's as good a time as any to give an overview, considering that Amaranth SoC will feature a Rust CSP generator quite possibly before a C CSP generator
<re_irc> <@whitequark:matrix.org> the concept at the core of the CSR system is that of a register: an abstract N-bit entity that can be read (producing an N-bit word) and/or written (producing an N-bit word as well). usually you see all of the registers being APB word sized, but this has limitations and Amaranth steps away from it a bit conceptually
<re_irc> <@whitequark:matrix.org> it does so because a register access (which from the point of view of the peripheral always happens in a single cycle), in this model, is a transaction (in a database sense); you can read all of the fields in a register, or write any subset of the fields, while having a guarantee (unless you're doing explicitly unsafe things) that under no circumstances will the CPU race with the peripheral
<re_irc> <@whitequark:matrix.org> so the fields can be read-only (owned by the peripheral), read-write or write-only (owned by the CPU), or a flag (ownership shared between CPU and peripheral with the operations restricted); the flag can be set by the peripheral, cleared by the CPU, and if set+cleared simultaneously, setting it wins since the CPU could only possibly be clearing a flag it's previously read
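A quick Rust model of the flag rule above (purely illustrative, not Amaranth SoC code), showing that a simultaneous set+clear resolves to set:
```rust
#[derive(Clone, Copy, Default)]
struct Flag {
    value: bool,
}

impl Flag {
    // One register-access cycle: both sides may strobe at once.
    fn step(&mut self, periph_set: bool, cpu_clear: bool) {
        self.value = match (periph_set, cpu_clear) {
            (true, _) => true,            // set wins over a simultaneous clear
            (false, true) => false,       // plain clear by the CPU
            (false, false) => self.value, // no change
        };
    }
}

fn main() {
    let mut irq_pending = Flag::default();
    irq_pending.step(true, false); // peripheral raises the flag
    irq_pending.step(true, true);  // CPU clears while a new event arrives: stays set
    assert!(irq_pending.value);
    irq_pending.step(false, true); // ordinary clear
    assert!(!irq_pending.value);
}
```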
<re_irc> <@whitequark:matrix.org> this is a bit long-winded but my point is that there are direct parallels between synchronization between software tasks done in e.g. Rust via shared memory, and synchronization between a software and a hardware task via MMIO
<re_irc> <@dirbaio:matrix.org> 👀 very interesting
<re_irc> <@whitequark:matrix.org> if the infrastructure makes it possible to take an arbitrary set of fields (all of which have a fixed point--that is, the CPU can write any field and have it reliably be a no-op) and read or write them all simultaneously, it means that you effectively have a form of transactional memory (without a rollback operation, but you don't need one if all transactions take the same unit time to complete)
<re_irc> <@whitequark:matrix.org> since the transactionality is ensured on the peripheral side, no matter how weird your bus interconnect or multitasking system is, you can always know that the transaction will be completed in a single cycle in its entirety
<re_irc> <@dirbaio:matrix.org> how do the transactions work? multiple APB reads/writes, and the periph tracks some state of the WIP transaction?
<re_irc> <@whitequark:matrix.org> and having infrastructure support for registers wider than the bus interconnect is key to enabling that, since otherwise you're screwed the moment you want more fields in a transaction than you have bits in PWDATA
<re_irc> <@whitequark:matrix.org> : the CSR bus is a variation on APB, and yes, essentially
<re_irc> <@whitequark:matrix.org> how to track the state of the transaction was actually one of the contentious points here, because it is logically a resource owned by the thread of execution (not even a bus initiator), but in the absence of useful bus locking support, you can only allocate this resource inside the peripheral
<re_irc> <@whitequark:matrix.org> the solution I came up with is actually to rely on the ownership model _again_: if a register is wider than the bus data width, it must be written in its entirety, in an ascending order, with no intervening writes to other wide registers within the same CSR multiplexer (~register block, aka peripheral channel), or the behavior of the peripheral is UNPREDICTABLE until it is reset
<re_irc> <@whitequark:matrix.org> this means that you can snap the ADC peripheral (with a shared SAR block) into e.g. 8 channels and give them all away to RTOS tasks without caring which does what, but if you have both a task and an interrupt touching the same channel, you need to ensure atomicity of access yourself
<re_irc> <@whitequark:matrix.org> i.e. disable the interrupt. if you have no concurrent access to wide registers (meaning the hardware resources handling the transaction for that case are uncontested) you can do whatever, it works the same way you'd expect
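A hedged driver-side sketch of that rule (register name, address, and widths are made up; it assumes the critical-section crate for masking interrupts): a 64-bit register written over an 8-bit CSR bus, word by word, in ascending order, with no intervening wide-register access in the same block.
```rust
use core::ptr::write_volatile;

// Hypothetical 64-bit compare register exposed as eight 8-bit CSR words.
const TIMER_CMP_BASE: *mut u8 = 0x4000_0010 as *mut u8;

fn write_timer_cmp(value: u64) {
    // Mask interrupts in case another context also touches wide registers in
    // this block; if nothing else can run concurrently, the section isn't needed.
    critical_section::with(|_| {
        for i in 0..8 {
            let byte = (value >> (8 * i)) as u8;
            // Ascending order; the peripheral commits the whole 64-bit value
            // atomically when the final (highest) word arrives.
            unsafe { write_volatile(TIMER_CMP_BASE.add(i), byte) };
        }
    });
}
```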
<re_irc> <@whitequark:matrix.org> within these constraints there is actually a _really_ lean and elegant implementation of the transaction mechanism that uses negligible amounts of area and delay, meaning we can forget about having to painstakingly shuffle bits in peripheral registers until you get atomicity for at least the most important things
<re_irc> <@whitequark:matrix.org> nothing stops you from e.g. making a write of the _entire register set of a timer_ atomic
<re_irc> <@dirbaio:matrix.org> I see. So the periph author decides how to split the regs, and the firmware author must ensure no concurrent accesses to the same reg, but concurrent accesses to separate regs are allowed?
<re_irc> <@whitequark:matrix.org> or a write to the entire clocking control block
<re_irc> <@jamesmunns:beeper.com> > you need to ensure atomicity of access yourself... i.e. disable the interrupt
<re_irc> You answered my one question :D
<re_irc> <@whitequark:matrix.org> : replace "reg" with "reg block" and that would be it
<re_irc> <@dirbaio:matrix.org> I see!
<re_irc> <@whitequark:matrix.org> the way transactions are implemented internally is with something very similar to a 1-line cache (both read and write) that gets loaded from the peripheral register when you read the first word or write the last word
<re_irc> <@whitequark:matrix.org> and you have one "cache line" per register block, ie timer channel, DAC channel, etc. whatever is meaningful to have it be handled by a single software driver instance
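A rough software model of the read half of that "1-line cache" (illustrative only; the real mechanism is generated hardware):
```rust
// Read side of the per-block "cache line": the first word read snapshots the
// whole register, later words come from the snapshot, so the CPU never sees
// a torn value even if the peripheral updates it mid-read.
struct WideReadReg {
    live: u64,        // value the peripheral is free to change every cycle
    read_shadow: u64, // snapshot taken when word 0 is read
}

impl WideReadReg {
    /// 8-bit bus: the 64-bit register is read as words 0..=7, ascending.
    fn read_word(&mut self, idx: usize) -> u8 {
        if idx == 0 {
            self.read_shadow = self.live;
        }
        (self.read_shadow >> (8 * idx)) as u8
    }
}

fn main() {
    let mut counter = WideReadReg { live: 0x0123_4567_89AB_CDEF, read_shadow: 0 };
    let b0 = counter.read_word(0);
    counter.live = counter.live.wrapping_add(1); // peripheral keeps counting
    let b1 = counter.read_word(1);
    assert_eq!((b0, b1), (0xEF, 0xCD)); // both bytes come from the same snapshot
}
```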
<re_irc> <@dirbaio:matrix.org> so, within the same "reg block", if I want to write to one field from main and another from interrupt, I can't (without a critical section)
<re_irc> <@whitequark:matrix.org> if both of the registers are wider than the bus data width, you need a critical section
<re_irc> <@whitequark:matrix.org> if either of the registers is equal or narrower, the write goes directly to the peripheral
<re_irc> <@dirbaio:matrix.org> so there's this tradeoff as a periph writer between "big reg blocks -> allows nice big atomic writes but less concurrency" and "small reg blocks -> allows more concurrency but less atomic writes"?
<re_irc> <@whitequark:matrix.org> also reads and writes do not alias
<re_irc> <@whitequark:matrix.org> : correct
<re_irc> <@jamesmunns:beeper.com> Is there any interest in being able to have a "failable transaction"? Like if a transaction was interrupted, you could detect and retry in the interrupted prio level? Or would that massively overcomplicate things vs "just do a critical section if you touch the same block from different prio levels"
<re_irc> <@whitequark:matrix.org> : the expectation is that in most cases, there is one "obvious" size of the reg block that corresponds to a control set in hardware / concurrency unit in software
<re_irc> <@jamesmunns:beeper.com> (thinking sort of like CAS loops work, don't want to throw a bunch of uninformed suggestions if that's already been explored/out of scope :D)
<re_irc> <@jamesmunns:beeper.com> It sounds very cool, and much better than a lot of racy register designs I've seen in the past already!
<re_irc> <@whitequark:matrix.org> : also nothing stops you from having a reg block per register in your peripheral if you expect heavily concurrent access patterns. the idea is that you only pay for what you use, but never less than is necessary to ensure correctness of larger-than-bus-data-size reads and writes
<re_irc> <@whitequark:matrix.org> or, more precisely: nothing stops you from having a 1:1 mapping between the actual registers and shadow read/write registers, meaning that at the cost of <3x the flops you never need to synchronize if you are accessing different locations
<re_irc> <@whitequark:matrix.org> (reg blocks also affect things such as alignment, but you can tune how much resources it consumes directly, if you have those to spend)
<re_irc> <@whitequark:matrix.org> : one of the consequences this has is that small 8 or 16 bit MCUs in your configuration logic (or even 32 bit ones for 64 bit systems) can safely access all CSRs with the same guarantees as your normal CPU
<re_irc> <@whitequark:matrix.org> or, like, you could expose them over I2C or JTAG or anything without having to add annoying bodges to make things work
<re_irc> <@whitequark:matrix.org> : I've thought about it, but the software interface is... unclear
<re_irc> <@whitequark:matrix.org> what is worse however is that this essentially moves the critical section into hardware, and now you can't split it into two if you've taped the damn thing out
<re_irc> <@jamesmunns:beeper.com> Totally fair! I can think of a transaction ID, but that would still be susceptible to the ABA problem
<re_irc> <@whitequark:matrix.org> also that inherently limits how many execution threads can contend on a register block
<re_irc> <@jamesmunns:beeper.com> like
<re_irc> swap r0 START_REG ; transaction id is now in r0, say it is 0x42
<re_irc> write
<re_irc> copy r0 to r1
<re_irc> write
<re_irc> write
<re_irc> write
<re_irc> swap r0 COMMIT_REG
<re_irc> if r0 == r1 -> success, else fail
<re_irc> <@whitequark:matrix.org> what _would_ work is a functioning bus locking semantics that is respected by all of the interconnect
<re_irc> <@jamesmunns:beeper.com> oh, I'm immediately seeing problems with that, since you might partially finish the old transaction.
<re_irc> <@jamesmunns:beeper.com> anyway, ignore me, I'm sure you've thought way more about it :D
<re_irc> <@whitequark:matrix.org> and AXI4 for example plainly does not have locked transfers
<re_irc> <@jamesmunns:beeper.com> (the hope was that it only actually "latches" the value if the write to commit matches the current counter or whatever, otherwise it aborts any partial writes) - at 8 bits the ABA problem might apply, but this whole logic might take an unfortunate number of gates (I don't have hardware design brain)
<re_irc> <@whitequark:matrix.org> hmmm
<re_irc> <@jamesmunns:beeper.com> IF your arch has an atomic? swap (probably only for COMMIT, START could probably just be a read), AND you are willing to have an N bit rolling counter for the transaction ID, it might be workable? Where a new read to START aborts any pending writes, and a swap with commit where the values don't match also aborts any pending writes, maybe it's workable? But I honestly don't think it's _too_ weird to require a critical section, but having some way to NOTICE that (like an exception/interrupt) might be nice if a software dev was having a bad day and forgot :D
<re_irc> <@whitequark:matrix.org> on APB, an atomic swap looks like read-then-write
<re_irc> <@jamesmunns:beeper.com> Anyway, won't suggest more/expand unless it's interesting. I went into problem solving mode :p
<re_irc> <@whitequark:matrix.org> but you can make this work; I would suggest a slightly different approach
<re_irc> <@whitequark:matrix.org> it won't work for 8-bit data bus width though :(
<re_irc> <@whitequark:matrix.org> anyway, the approach is: you put a narrow (<=data bus width) register somewhere at the end. it has a counter and an address field
<re_irc> <@whitequark:matrix.org> at the beginning of a transaction you read the counter, then start writing data. each write increments the counter
<re_irc> <@whitequark:matrix.org> at the end you write the _expected_ value of the counter and the address, and if it matches, it strobes a write for that register
<re_irc> <@dirbaio:matrix.org> you could make the peripheral have a "transaction OK" bit. You must write all regs in order, then read the "transaction OK" bit, if false retry.
<re_irc> <@jamesmunns:beeper.com> the problem is if you have N prio levels all racing
<re_irc> <@dirbaio:matrix.org> peripheral resets transaction bit to 0 if it sees mis-ordered writes
<re_irc> <@jamesmunns:beeper.com> how do you prevent reading a "stale" bit if one prio up JUST finished before you checked it
<re_irc> <@jamesmunns:beeper.com> (well, I don't think it takes MANY prios, just two):
<re_irc> write ;
<re_irc> finish
<re_irc> write ; start
<re_irc> write ;
<re_irc> INTERRUPT
<re_irc> write
<re_irc> write
<re_irc> write
<re_irc> finish
<re_irc> check good
<re_irc> YIELD
<re_irc> check good
<re_irc> <@jamesmunns:beeper.com> maybe checking clears the "good" bit? dunno.
<re_irc> <@dirbaio:matrix.org> actually, two bits: tx_ok, tx_done.
<re_irc> on write first reg, set tx_ok=1, tx_done=0
<re_irc> on write any misordered reg, set tx_ok=0, tx_done=0
<re_irc> on write last reg, if tx_ok=1, commit write, set tx_ok=0, tx_done=1
<re_irc> on read tx_done, set tx_ok=0, tx_done=0
<re_irc> <@dirbaio:matrix.org> : "on read tx_done, set tx_done=0" fixes it
<re_irc> <@jamesmunns:beeper.com> are you interested in suggestions for this? Or are we bikeshedding right now? :D
<re_irc> <@whitequark:matrix.org> sure why not
<re_irc> <@whitequark:matrix.org> : actually my suggestion won't work because uh... consider two pieces of code racing to set the exact same register to different values
<re_irc> <@whitequark:matrix.org> same number of words written, same address, if you have (tx start write write write write (tx start write x4 tx end) tx end), then both will report tx successful
<re_irc> <@whitequark:matrix.org> yeah you need every read from the transaction register to allocate you a transaction ID
<re_irc> <@jamesmunns:beeper.com> Is "the arch has a swap instruction" and "the reg block can detect a swap" a workable requirement?
<re_irc> <@dirbaio:matrix.org> : reading the "tx done" flag the first time would clear it, so the inner tx would succeed but the outer would fail
<re_irc> <@whitequark:matrix.org> : I think it would be an impractical limitation
<re_irc> <@jamesmunns:beeper.com> shoot
<re_irc> <@jamesmunns:beeper.com> : If you got interrupted JUST before the last write, so the interrupt code writes one register uninterrupted, then checks (but it meant to write 5 in a row, not one!), would that be handled?
<re_irc> <@dirbaio:matrix.org> haha I think it can break this way:
<re_irc> thread 1: write 0
<re_irc> thread 2: write 0 - aborts the thread 1's transaction
<re_irc> -- context switch
<re_irc> thread 1: write 1
<re_irc> thread 2: write 1
<re_irc> -- context switch
<re_irc> thread 1: write 2
<re_irc> thread 1: write 3
<re_irc> thread 1: check -> success, but it should fail
<re_irc> <@jamesmunns:beeper.com> Yeah, that's about what I meant :/
<re_irc> <@whitequark:matrix.org> : I don't see why you can't have something like:
<re_irc> loop:
<re_irc> mov [D1], #2
<re_irc> mov [D0], #1
<re_irc> mov r0, [TXCTL] ; increments TXID, returns new TXID, clears TXDONE
<re_irc> mov [TXCTL], r0
<re_irc> mov r1, [TXCTL]
<re_irc> xor r0, r0, r1
<re_irc> jne r0, #TXDONE, loop
<re_irc> <@whitequark:matrix.org> forgive my horrific bastard assembly
<re_irc> <@dirbaio:matrix.org> my idea works only with "nested" preemption like interrupts, not arbitrary RTOS context switching
<re_irc> <@dirbaio:matrix.org> unless the RTOS writes a magic "abort all transactions" bit on context switch 😂
<re_irc> <@whitequark:matrix.org> : I have ideas related to that, but let's process things one by one
<re_irc> <@dirbaio:matrix.org> doing "force abort transaction" like that would suck for "forward progress" guarantees and for multicore
<re_irc> <@dirbaio:matrix.org> wow this is a hard problem 😂
<re_irc> <@jamesmunns:beeper.com> : So (my asm is bad):
<re_irc> - read ID from TXCTL
<re_irc> - write ID back to TXCTL
<re_irc> - do writes
<re_irc> - read TXCTL
<re_irc> - if old and new read match, success
<re_irc> ?
<re_irc> <@whitequark:matrix.org> : if new read is the same but has TXDONE set, success
<re_irc> <@whitequark:matrix.org> "(old^new)==TXDONE"
<re_irc> <@jamesmunns:beeper.com> so WRITES don't incr TXCTL, but reads do?
<re_irc> <@jamesmunns:beeper.com> ahhh
<re_irc> <@jamesmunns:beeper.com> how do you tell the difference between "read to start" and "read to check"?
<re_irc> <@whitequark:matrix.org> : yes. read from TXCTL is "ROLLBACK; BEGIN TRANSACTION x;"; write is "COMMIT TRANSACTION x;"
<re_irc> <@jamesmunns:beeper.com> like if you got interrupted and the interrupt does a read to start?
<re_irc> <@whitequark:matrix.org> if the interrupt does a read to start, TXDONE would be 0
<re_irc> <@whitequark:matrix.org> TXID flow is like... "TXID=n --(read TXID)--> TXID=n+1,TXDONE=0,return {TXDONE,TXID} --(write TXID=n+1)--> TXID=n+1,TXDONE=1,return {TXDONE,TXID}"
<re_irc> <@jamesmunns:beeper.com> if you separate TXDONE and TXCTL, I think you could do it without XOR?
<re_irc> write d0
<re_irc> write d1
<re_irc> r0 = TXCTL ; 0x42
<re_irc> write d2
<re_irc> TXCTL = r0 ;
<re_irc> r1 = TXDONE ; read the last completed transaction ID
<re_irc> if r0 == r1 success
<re_irc> <@jamesmunns:beeper.com> so like, a successful write latches the last ID into TXDONE
<re_irc> <@whitequark:matrix.org> the XOR is just there because TXDONE is the highest bit of TXID
<re_irc> <@whitequark:matrix.org> I'm too used to hyperoptimizing assembly, it isn't load bearing
<re_irc> <@whitequark:matrix.org> think of it as a mask+compare but shorter
<re_irc> <@jamesmunns:beeper.com> yeah, I just don't grok what the values would be for the interrupted and interrupter code
<re_irc> <@whitequark:matrix.org> oh you're right, that's too many side effects stuffed into one register
<re_irc> <@whitequark:matrix.org> yes I agree they must be separate
<re_irc> <@jamesmunns:beeper.com> I _think_ my approach is workable?
<re_irc> <@whitequark:matrix.org> yes that will work
<re_irc> <@whitequark:matrix.org> and it's actually very cheap to implement
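A small host-side Rust model of the scheme as agreed here (purely illustrative: the register names come from the discussion, the field widths and the 4-register block are made up), including the driver-side retry loop:
```rust
#[derive(Default)]
struct RegBlock {
    txctl: u8,        // rolling transaction ID counter
    txdone: u8,       // ID of the last committed transaction
    shadow: [u32; 4], // write shadow, filled by in-flight writes
    regs: [u32; 4],   // the "real" registers the peripheral sees
}

impl RegBlock {
    fn read_txctl(&mut self) -> u8 {
        self.txctl = self.txctl.wrapping_add(1); // allocate a new ID, abort any in-flight tx
        self.txctl
    }
    fn write_reg(&mut self, idx: usize, val: u32) {
        self.shadow[idx] = val; // queued; not visible to the peripheral yet
    }
    fn write_txctl(&mut self, id: u8) {
        if id == self.txctl {
            self.regs = self.shadow; // commit the whole block at once
            self.txdone = id;        // latch the committed ID
        }
        // else: another transaction started in between; silently abort
    }
    fn read_txdone(&self) -> u8 {
        self.txdone
    }
}

fn main() {
    let mut blk = RegBlock::default();
    loop {
        let id = blk.read_txctl(); // "ROLLBACK; BEGIN TRANSACTION id"
        blk.write_reg(0, 0x1234);
        blk.write_reg(1, 0x5678);
        blk.write_txctl(id);       // "COMMIT TRANSACTION id"
        if blk.read_txdone() == id {
            break;                 // all queued writes applied in one go
        }
        // otherwise another transaction preempted us: retry
    }
    assert_eq!(blk.regs[0], 0x1234);
}
```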
<re_irc> <@jamesmunns:beeper.com> it does require 2x regs of "max transaction counter bits", and a counter, but doesn't require a swap or anything fancy
<re_irc> <@whitequark:matrix.org> that costs ~nothing
<re_irc> <@whitequark:matrix.org> even for a 32 bit counter
<re_irc> <@jamesmunns:beeper.com> maybe you can choose how much ABA insurance you want? lol
<re_irc> <@jamesmunns:beeper.com> yeah, 32 bits of ABA would mean you had to be interrupted for 4 billion other transactions to get a false positive lol
<re_irc> <@jamesmunns:beeper.com> even 255 seems EXCEEDINGLY unlikely unless something has gone terribly wrong
<re_irc> <@jamesmunns:beeper.com> So TXCTL gets incremented on any non-sequential writes to the block, or any time it is read
<re_irc> <@jamesmunns:beeper.com> and the values only get latched from the shadow reg to the real block on the commit write, when the written ID matches TXCTL
<re_irc> <@whitequark:matrix.org> if you keep it to 255, you get the benefit of being able to use 8-bit CPUs
<re_irc> <@jamesmunns:beeper.com> I think it's reasonable to just say that the width of txctl/done could match the data bus width? or that's the default at least?
<re_irc> <@jamesmunns:beeper.com> (I have no good understanding of cost nor "reasonable cost" here, take my opinion with all of the salt)
<re_irc> <@whitequark:matrix.org> normally the peripheral doesn't care too much about the data bus width, but I guess the transaction machinery does
<re_irc> <@whitequark:matrix.org> the cost of all of this is ~negligible
<re_irc> <@whitequark:matrix.org> if you keep it to an 8-bit counter, you can forget about it no matter the process or frequency
<re_irc> <@whitequark:matrix.org> a 64-bit counter that can get incremented on every cycle can be more tricky sometimes
<re_irc> <@whitequark:matrix.org> especially on FPGAs
<re_irc> <@jamesmunns:beeper.com> from a security perspective, 256 bytes could make it easy to force a false positive, just read TXCTL 255 times. But that's probably not anybody's threat model :p
<re_irc> <@jamesmunns:beeper.com> err, 8 bits of counter
<re_irc> <@whitequark:matrix.org> a single CSR block is not a security boundary, no
<re_irc> <@whitequark:matrix.org> anyway, there's something more interesting here
<re_irc> <@whitequark:matrix.org> so when making this machinery, we have decided to pay / assumed paid the cost of having full coverage of shadow registers (as opposed to "pay as you go"). this is still very cheap on newer ASIC processes
<re_irc> <@whitequark:matrix.org> now... until now, transactions spanned one register block. however why not make them global?
<re_irc> <@whitequark:matrix.org> in fact, why not add "transaction channels" that can be allocated to e.g. RTOS tasks that let them atomically commit arbitrarily large updates to the SoC state instantly (+ CDC latency if you have multiple clocks)
<re_irc> <@jamesmunns:beeper.com> oh that is EXTREMELY cool.
<re_irc> <@whitequark:matrix.org> instead of TXCTL/TXDONE, you would have a TXCHAN register, which selects one of the global transaction coordinators. the per-peripheral cost is extremely low, bordering on negligible, and can be made so low (with a latency tradeoff) that you could have thousands of channels
<re_irc> <@whitequark:matrix.org> each channel has TXCTLn/TXDONEn with the same semantics, but instead the writes are queued and then committed/aborted for every peripheral where TXCHAN=n
<re_irc> <@whitequark:matrix.org> you know the complex stuff STM32 has to coordinate timers and ADCs or DACs so you can do updates in phase?
<re_irc> <@whitequark:matrix.org> you can throw all of that out, now it's just memory writes
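A hedged sketch of what the firmware side of this could look like, assuming the TXCHAN assignments were already made at init; all addresses and register names below are placeholders, not an existing Amaranth SoC interface:
```rust
use core::ptr::{read_volatile, write_volatile};

// All addresses and names are made up for illustration.
const TIM1_ARR: *mut u32 = 0x4000_0000 as *mut u32;   // timer period, assigned to TXCHAN 3
const DAC0_DATA: *mut u32 = 0x4000_1000 as *mut u32;  // DAC sample, also on TXCHAN 3
const TXCTL_CH3: *mut u32 = 0x4800_0030 as *mut u32;  // channel 3 coordinator
const TXDONE_CH3: *mut u32 = 0x4800_0034 as *mut u32;

/// Update the timer period and the DAC output so that both take effect on the
/// same cycle, instead of hand-building STM32-style trigger plumbing.
fn update_in_phase(period: u32, sample: u32) {
    unsafe {
        loop {
            let id = read_volatile(TXCTL_CH3);  // begin a transaction on channel 3
            write_volatile(TIM1_ARR, period);   // queued in TIM1's write shadow
            write_volatile(DAC0_DATA, sample);  // queued in DAC0's write shadow
            write_volatile(TXCTL_CH3, id);      // commit: DONE3 strobes both blocks
            if read_volatile(TXDONE_CH3) == id {
                break;                          // both peripherals updated together
            }
            // otherwise something else used channel 3 in between: retry
        }
    }
}
```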
<re_irc> <@jamesmunns:beeper.com> would that still work with RMW approaches?
<re_irc> <@whitequark:matrix.org> elaborate?
<re_irc> <@jamesmunns:beeper.com> like:
<re_irc> - task1: transaction 1 starts
<re_irc> - task1: transaction 1 ends
<re_irc> - task0: read current state
<re_irc> - task0: transaction 2 starts
<re_irc> - task0: transaction 2 ends
<re_irc> <@dirbaio:matrix.org> iiuc it's "atomically write N regs", not "atomically read modify write"
<re_irc> <@whitequark:matrix.org> nothing stops you from capturing the state of every associated peripheral when you start a transaction
<re_irc> <@jamesmunns:beeper.com> like, the transactions are still atomic and serialized, but if someone gets there between read and write, it could be applying "stale" modifications, undoing something that happened in transaction 1
<re_irc> <@jamesmunns:beeper.com> yeah, that's true
<re_irc> <@whitequark:matrix.org> so that the reads give you the data at the time of transaction start
<re_irc> <@jamesmunns:beeper.com> if you have a shadow register, you could latch IN on transaction start
<re_irc> <@dirbaio:matrix.org> : yea, but that still doesn't give you atomic RMW if another transaction commits in between your transaction start + commit
<re_irc> <@whitequark:matrix.org> there's two, one for reads and one for writes (since the read and write halves don't really have to refer to the same storage)
<re_irc> <@whitequark:matrix.org> : atomic RMW works internally by asserting a lock on the bus
<re_irc> <@jamesmunns:beeper.com> yeah, if multiple transactions overlap "scope", I feel like they need to invalidate each other?
<re_irc> <@whitequark:matrix.org> it's a SoC-level critical section basically
<re_irc> <@jamesmunns:beeper.com> (I think this is all sound for "one global channel", applied atomically, I'm not so sure about N channels of serializable transactions)
<re_irc> <@jamesmunns:beeper.com> but maybe I don't understand that part?
<re_irc> <@jamesmunns:beeper.com> like if you start a database transaction, read a bunch of rows, then someone else starts one, modifies those rows, and then finishes the transaction, your first transaction is now wrong
<re_irc> <@whitequark:matrix.org> a single peripheral can only belong to one channel, so you basically partition your system into independent parts
<re_irc> <@whitequark:matrix.org> this is good for ensuring you e.g. toggle GPIOs in multiple banks in phase alignment, or start a bunch of timers and ADCs at the same time
<re_irc> <@jamesmunns:beeper.com> so starting the second one either must invalidate the first, or the second one must be blocked until the first one completes (I think?)
<re_irc> <@whitequark:matrix.org> it is not "true" database transactions so much as them being a convenient vehicle for explanation
<re_irc> <@jamesmunns:beeper.com> yeah, but I was applying the metaphor :D
<re_irc> <@whitequark:matrix.org> anyway, a task can spin on while (TXCTL!=TXDONE);
<re_irc> <@jamesmunns:beeper.com> I think the same thing is true for "N channels", which is more like "how many database connections can you have open at once" (in this metaphor)
<re_irc> <@jamesmunns:beeper.com> Yeah, that's true!
<re_irc> <@jamesmunns:beeper.com> I think "one pair of global TXCTL + TXDONE" with "N channels of shadow registers" should still work?
<re_irc> <@whitequark:matrix.org> : however here peripherals are statically allocated to a connection, unlike in databases where rows are locked
<re_irc> <@whitequark:matrix.org> : now THAT is incredibly expensive
<re_irc> <@jamesmunns:beeper.com> maybe I'm not following your N channel explanation then :)
<re_irc> <@jamesmunns:beeper.com> (I get the "synced changes to multiple reg blocks" goal, but not how you got there, I guess)
<re_irc> <@whitequark:matrix.org> so each channel has START and DONE strobes, for n channels, you have START[0..n-1] and DONE[0..n-1]
<re_irc> <@whitequark:matrix.org> when you set TXCHAN=n, you pick STARTn to capture values into read shadow register and DONEn to apply values from write shadow register
<re_irc> <@whitequark:matrix.org> so that's 2*(n wires) per transaction channel plus 2*(n:1 multiplexer) per peripheral
<re_irc> <@whitequark:matrix.org> which is a low to moderate cost if your n stays under, say, 16
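The same idea seen from the hardware side, as a rough behavioural model (not Amaranth code; one shadow word stands in for a whole register block):
```rust
const N_CHANNELS: usize = 16;

// One register block's view of the global transaction channels: it reacts only
// to the STARTn/DONEn strobes of the channel selected by its TXCHAN.
struct BlockShadow {
    txchan: usize,     // selected transaction channel
    live: u32,         // value the peripheral actually uses / updates
    read_shadow: u32,  // captured on STARTn, for coherent reads
    write_shadow: u32, // queued CPU write, applied on DONEn
}

impl BlockShadow {
    fn on_clock(&mut self, start: &[bool; N_CHANNELS], done: &[bool; N_CHANNELS]) {
        if start[self.txchan] {
            self.read_shadow = self.live; // snapshot at transaction start
        }
        if done[self.txchan] {
            self.live = self.write_shadow; // commit queued write at transaction end
        }
    }
}

fn main() {
    let mut gpioa = BlockShadow { txchan: 3, live: 0, read_shadow: 0, write_shadow: 0xFF };
    let mut gpiob = BlockShadow { txchan: 3, live: 0, read_shadow: 0, write_shadow: 0x0F };
    let mut done = [false; N_CHANNELS];
    done[3] = true; // DONE3 strobe: both blocks on channel 3 commit on the same edge
    gpioa.on_clock(&[false; N_CHANNELS], &done);
    gpiob.on_clock(&[false; N_CHANNELS], &done);
    assert_eq!((gpioa.live, gpiob.live), (0xFF, 0x0F));
}
```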
<re_irc> <@jamesmunns:beeper.com> So if CHANNEL1 wants to modify ADC0 and DAC1, and CHANNEL2 wants to modify ADC0 and I2S0, how do you keep channel1 and channel2 from stepping on each-others toes?
<re_irc> <@whitequark:matrix.org> this is impossible to express in this system
<re_irc> <@jamesmunns:beeper.com> (where "modify" can be all/any of RMW)
<re_irc> <@jamesmunns:beeper.com> gotcha
<re_irc> <@jamesmunns:beeper.com> (that's what I was trying to figure out how you handled, which wasn't what you were trying to handle)
<re_irc> <@whitequark:matrix.org> only a single owner can control a peripheral. this is an existing logical requirement in Amaranth SoC, and with transaction channels it will be a physical requirement too
<re_irc> <@whitequark:matrix.org> well, you basically give ownership to a transaction channel
<re_irc> <@jamesmunns:beeper.com> ohhhhhh, so it's more about sharing shadow registers between periphs?
<re_irc> <@jamesmunns:beeper.com> instead of one shadow register dedicated to each periph?
<re_irc> <@whitequark:matrix.org> nope, it's about being able to do an atomic write across multiple peripherals, vs only within a single peripheral
<re_irc> <@whitequark:matrix.org> sharing shadow registers will ... well, not work. it will be more costly than not sharing them (in routing)
<re_irc> <@whitequark:matrix.org> in general, flops are cheap, unless you start building entire memories out of them
<re_irc> <@whitequark:matrix.org> adding 2 flops to every 1 flop with minimal addressing logic: extremely cheap
<re_irc> <@whitequark:matrix.org> adding 16 flops to every 1 flop: quite expensive, and mostly because of non-flop things you now need to add too
<re_irc> <@jamesmunns:beeper.com> gotcha... I don't totally grok how you achieve the "apply to many" part, or how the channels help, but that's okay.
<re_irc> <@jamesmunns:beeper.com> If you have a clear picture, that matters more :D
<re_irc> <@whitequark:matrix.org> so the core idea here is that we queue writes and then apply them in bulk on some strobe, right?
<re_irc> <@jamesmunns:beeper.com> yep!
<re_irc> <@whitequark:matrix.org> with the original TXCTL/TXDONE discussion, this strobe was generated by a simple per-peripheral transaction state machine
<re_irc> <@jamesmunns:beeper.com> with you so far!
<re_irc> <@whitequark:matrix.org> but having it per-peripheral is limiting. we can take the exact same TXCTL/TXDONE thing and make it SoC-global
<re_irc> <@jamesmunns:beeper.com> yup! With you so far ("one global txctl/txdone" makes sense to me)
<re_irc> <@whitequark:matrix.org> so there's now one instead of N_peripheral copies of it
<re_irc> <@jamesmunns:beeper.com> okay this is where you lose me!
<re_irc> <@jamesmunns:beeper.com> where does "N pairs" help where "one global pair" doesn't?
<re_irc> <@whitequark:matrix.org> in your firmware, you would normally split responsibilities between tasks, right?
<re_irc> <@whitequark:matrix.org> one handles GPIO bitbang, another handles ADC readout
<re_irc> <@whitequark:matrix.org> so let's say you want to bitbang GPIOA/GPIOB in sync, and sample ADC0 and ADC1 in sync
<re_irc> <@whitequark:matrix.org> if you have a single pair, now you've got contention
<re_irc> <@whitequark:matrix.org> if you have two (for the sake of example, connected to all GPIOs and all ADCs) you don't have it anymore
<re_irc> <@jamesmunns:beeper.com> gotcha, so it DOES let you batch writes to M registers, but DOESN'T help if there is contention where multiple channels want to touch the same registers?
<re_irc> <@jamesmunns:beeper.com> so you can have N concurrent batches firing, but they MUST NEVER touch the same peripheral at the same time?
<re_irc> <@whitequark:matrix.org> yeah
<re_irc> <@jamesmunns:beeper.com> cool! The database metaphor was wanting for a mechanism to detect when channel 3 touches something already touched by channel 2
<re_irc> <@whitequark:matrix.org> so that's actually also doable, fairly cheaply
<re_irc> <@jamesmunns:beeper.com> (so channel 3 or channel 2's commit fails deterministically)
<re_irc> <@jamesmunns:beeper.com> (either, the latecomer fails, or the interrupted transaction fails)
<re_irc> <@whitequark:matrix.org> if your RTOS writes the TXCHAN for the current task to some register, it's possible to e.g. turn writes to peripherals assigned to another TXCHAN into hard faults
<re_irc> <@jamesmunns:beeper.com> so each channel would need to "claim" each block, and a claim of a claimed block would raise an exception?
<re_irc> <@whitequark:matrix.org> rather a register block would be assigned to a channel (the inverse relationship works much better hardware wise)
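A rough model of that inverse relationship (illustrative only; in real hardware the rejection would be a bus error or hard fault rather than a Result):
```rust
// Each register block stores which channel owns it; a write tagged with any
// other channel is refused instead of racing.
struct OwnedBlock {
    assigned_chan: usize, // set by the RTOS when handing the block to a task
    reg: u32,
}

impl OwnedBlock {
    fn write(&mut self, requester_chan: usize, val: u32) -> Result<(), ()> {
        if requester_chan == self.assigned_chan {
            self.reg = val;
            Ok(())
        } else {
            Err(()) // write from a task on another TXCHAN: fault
        }
    }
}

fn main() {
    let mut adc0 = OwnedBlock { assigned_chan: 4, reg: 0 };
    assert!(adc0.write(4, 123).is_ok());  // owning channel may write
    assert!(adc0.write(7, 456).is_err()); // anyone else faults instead of racing
}
```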
<re_irc> <@jamesmunns:beeper.com> okay, so something like:
<re_irc> ADC0_CHAN = 4 ; faults if chan wasn't "idle" before
<re_irc> r0 = TX_CTL_C4;
<re_irc> DAC1_CHAN = 4 ; faults if chan wasn't "idle" before
<re_irc> ...
<re_irc> TX_CTL_C4 = r0;
<re_irc> r1 = TX_DONE_C4; ; maybe this also releases claim automatically?
<re_irc> if r0 == r1 => success
<re_irc> <@whitequark:matrix.org> I was thinking of a different use pattern but I think that would work fine with my proposal
<re_irc> <@jamesmunns:beeper.com> That makes sense!
<re_irc> <@jamesmunns:beeper.com> I think this does mean that if you tried to touch the same periph in an interrupt, you just immediately fault now tho lol
<re_irc> <@jamesmunns:beeper.com> even if you were nice and used a different channel
<re_irc> <@jamesmunns:beeper.com> so uh, the transaction stuff is maybe not needed anymore?
<re_irc> <@jamesmunns:beeper.com> you now have like mutex-locked owners of peripherals
<re_irc> <@jamesmunns:beeper.com> (you do still get "apply to many at once" behavior, but not "failable transactions" to the same block)
<re_irc> <@whitequark:matrix.org> without the transaction stuff you can't assign several things simultaneously
<re_irc> <@jamesmunns:beeper.com> I mean, by setting the block's channel, you are essentially mutex locking it anyway
<re_irc> <@whitequark:matrix.org> yea they don't need to be fallible anymore, could reduce it all to one bit PENDING, capture on rising, apply on falling
<re_irc> <@whitequark:matrix.org> you could also ditch the hardware completely now and have eg 4 billion channels
<re_irc> <@whitequark:matrix.org> just broadcast the transaction number and acquire/release bit across the entire SoC
<re_irc> <@whitequark:matrix.org> this is actually getting closer to how NoCs work
<re_irc> <@jamesmunns:beeper.com> Yeah, I think it's a sideways step between "failable transactions on a peripheral" and "global application to many peripherals"
<re_irc> <@jamesmunns:beeper.com> like, not better or worse, just two different (mostly) unrelated features :D
<re_irc> <@whitequark:matrix.org> they're related in that they all involve "grouping of peripherals"
<re_irc> <@whitequark:matrix.org> which is basically just hardware tracking of ownership or its aspects
<re_irc> <@jamesmunns:beeper.com> That makes sense!
<re_irc> <@jamesmunns:beeper.com> Thanks for all the explanation, it makes a lot more sense now what your goal(s) are.
Socker has quit [Ping timeout: 260 seconds]
Socker has joined #rust-embedded
<re_irc> <@lambdafriend:matrix.org> I'm working through how to provide my embedded code (STM32H7xx fwiw) with a chunk of sdram that I can divvy up and use as buffers in various structs. The allocations do not change during the life of the program.
<re_irc> #[link_section = ".sdram"]
<re_irc> static mut MEMORY: [f32; MAX_SIZE] = [0.0; MAX_SIZE];
<re_irc> #[no_mangle]
<re_irc> ...elsewhere
<re_irc> let memory: &'static mut [f32; MAX_SIZE] = unsafe { &mut MEMORY };
<re_irc> // partition memory into N sized mutable slices for use in N structs
<re_irc> The issue I am facing specifically is that "array::from_fn" impls "FnMut", and so mutable slice references into "memory" created within the closure do not live past it. However, I know the underlying memory does (at least I think it does, right?).
<re_irc> I think what I'm wanting is an allocator, but my use case is really straightforward.
<re_irc> Does anyone have any advice on how to do this?
<re_irc> <@dirbaio:matrix.org> you want an array of "&'static mut [f32]"?
<re_irc> <@dirbaio:matrix.org> slicing with "&mut memory[a..b]" doesn't work because the borrows will be shorter
<re_irc> <@dirbaio:matrix.org> you can split a slice without downgrading lifetimes with ".split_at_mut()"
<re_irc> <@dirbaio:matrix.org> let mut memory: &'static mut [f32] = unsafe { &mut MEMORY };
<re_irc> let mut chunks: [&'static mut [f32]; 16] = [&mut []; 16];
<re_irc> for i in 0..16 {
<re_irc>     let m = mem::take(&mut memory); // take out memory, leaving an empty slice behind.
<re_irc>     let (a, b) = m.split_at_mut(1024); // split it. this gives two `&'static mut [f32]`. Same lifetime!
<re_irc>     chunks[i] = a; // the 1024 sample chunk.
<re_irc>     memory = b; // put back the rest of the memory so we can take another chunk out in the next iteration.
<re_irc> }
<re_irc> <@dirbaio:matrix.org> the reason "split_at_mut()" works is it guarantees the slices are not overlapping, while "&mut memory[a..b]" doesn't.
<re_irc> <@jamesmunns:beeper.com> (fwiw: I'd probably recommend using something like "singleton!()" or "StaticCell" instead of a static mut, because if you ever make some code reentrant, then you'll at least get a panic or something)
<re_irc> <@jamesmunns:beeper.com> then you can just put the "static allocations" locally, instead of having to chunk them out with a bunch of unsafe code at runtime.
<re_irc> <@jamesmunns:beeper.com> I also am unreasonably biased against "static mut"s existing anywhere though, so take it for what it's worth :D
<re_irc> <@dirbaio:matrix.org> btw if the chunk sizes are known at compile time you can let the linker lay them out for you. Just make separate statics for each.
<re_irc> <@jamesmunns:beeper.com> I think the "static_alloc" crate implements a basic bump allocator, if you ever need to do it at runtime
<re_irc> <@jamesmunns:beeper.com> You can use it as a global allocator, or just for doing "alloc at ~start" type stuff.
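A minimal sketch combining the StaticCell and separate-statics suggestions above, assuming the static_cell crate and a .sdram output section handled in the linker script; names and sizes are illustrative:
```rust
use static_cell::StaticCell;

const CHUNK: usize = 1024;

// One StaticCell per chunk, so the linker lays them out; the array contents
// are only written at runtime by `init`.
#[link_section = ".sdram"]
static BUF_A: StaticCell<[f32; CHUNK]> = StaticCell::new();
#[link_section = ".sdram"]
static BUF_B: StaticCell<[f32; CHUNK]> = StaticCell::new();

fn init_buffers() -> (&'static mut [f32; CHUNK], &'static mut [f32; CHUNK]) {
    // `init` hands out each &'static mut exactly once; calling it twice panics
    // instead of silently aliasing, which is the advantage over `static mut`.
    (BUF_A.init([0.0; CHUNK]), BUF_B.init([0.0; CHUNK]))
}
```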
<re_irc> <@lambdafriend:matrix.org> Fantastic! I appreciate all the advice. I'm going to work through the options and see what sticks. 🙌
Socker has quit [Ping timeout: 250 seconds]
Socker has joined #rust-embedded
Socker has quit [Ping timeout: 258 seconds]
Socker has joined #rust-embedded
<re_irc> <@lambdafriend:matrix.org> : This worked verbatim! 🙌
Socker has quit [Ping timeout: 250 seconds]
Socker has joined #rust-embedded
Sockeee has joined #rust-embedded
Socker has quit [Ping timeout: 240 seconds]
jsolano has joined #rust-embedded
<re_irc> <@dkhayes117:matrix.org> I'm using a crate that expects a Vec<u8> for a token. I will also store the token for comparison later. What is the smartest type to store it as? When I compare it, I will be using a "get_token" method which returns a "&[u8]".
<re_irc> <@dkhayes117:matrix.org> I feel like that was a dumb question? idk, I just made it a [u8;8] in my struct.
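A small sketch of how that [u8; 8] choice can look (struct and method names are hypothetical):
```rust
// The fixed-size array lives inline in the struct, converting from the crate's
// Vec<u8> is a TryInto, and the later check is a plain slice comparison.
struct Session {
    token: [u8; 8],
}

impl Session {
    fn new(token: Vec<u8>) -> Option<Self> {
        Some(Session { token: token.try_into().ok()? }) // fails if the length isn't 8
    }

    fn token_matches(&self, current: &[u8]) -> bool {
        self.token.as_slice() == current // e.g. current = get_token()
    }
}
```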
IlPalazzo-ojiisa has quit [Quit: Leaving.]