ChanServ changed the topic of #rust-embedded to: Welcome to the Rust Embedded IRC channel! Bridged to #rust-embedded:matrix.org and logged at https://libera.irclog.whitequark.org/rust-embedded, code of conduct at https://www.rust-lang.org/conduct.html
kenny has joined #rust-embedded
emerent has quit [Read error: Connection reset by peer]
emerent has joined #rust-embedded
id_tam has quit [Ping timeout: 250 seconds]
dne has quit [Remote host closed the connection]
dne has joined #rust-embedded
id_tam has joined #rust-embedded
id_tam has quit [Client Quit]
emerent has quit [Remote host closed the connection]
emerent has joined #rust-embedded
<re_irc> <@thejpster:matrix.org> TIL https://gitlab.arm.com/firmware/corstone-hal-rs exists - official Arm PAC and HAL for the Corstone reference System-on-Chip design emulated by both QEMU and the Arm Fixed Virtual Platform.
<re_irc> <@diondokter:matrix.org> Oh, that's pretty cool!
<re_irc> <@thejpster:matrix.org> mad respect to ithinuel for getting that pulled off
<re_irc> <@ithinuel:matrix.org> 😄 I was not the only one involved 😄 but thanks, I'll share with the colleagues involved!
IlPalazzo-ojiisa has joined #rust-embedded
emerent has quit [Ping timeout: 258 seconds]
emerent has joined #rust-embedded
<re_irc> <@whitequark:matrix.org> I've been planning to talk about it more as the bits and pieces fall into the final form, and I think now's as good a time as any to give an overview, considering that Amaranth SoC will feature a Rust CSP generator quite possibly before a C CSP generator
<re_irc> <@whitequark:matrix.org> the concept at the core of the CSR system is that of a register: an abstract N-bit entity that can be read (producing an N-bit word) and/or written (producing an N-bit word as well). usually you see all of the registers being APB word sized, but this has limitations and Amaranth steps away from it a bit conceptually
<re_irc> <@whitequark:matrix.org> it does so because a register access (which from the point of view of the peripheral always happens in a single cycle), in this model, is a transaction (in a database sense); you can read all of the fields in a register, or write any subset of the fields, while having a guarantee (unless you're doing explicitly unsafe things) that under no circumstances will the CPU race with the peripheral
<re_irc> <@whitequark:matrix.org> so the fields can be read-only (owned by the peripheral), read-write or write-only (owned by the CPU), or a flag (ownership shared between CPU and peripheral with the operations restricted); the flag can be set by the peripheral, cleared by the CPU, and if set+cleared simultaneously, setting it wins since the CPU could only possibly be clearing a flag it's previously read
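A quick Rust model of the flag rule above (purely illustrative, not Amaranth SoC code), showing that a simultaneous set+clear resolves to set:
```rust
#[derive(Clone, Copy, Default)]
struct Flag {
    value: bool,
}

impl Flag {
    // One register-access cycle: both sides may strobe at once.
    fn step(&mut self, periph_set: bool, cpu_clear: bool) {
        self.value = match (periph_set, cpu_clear) {
            (true, _) => true,            // set wins over a simultaneous clear
            (false, true) => false,       // plain clear by the CPU
            (false, false) => self.value, // no change
        };
    }
}

fn main() {
    let mut irq_pending = Flag::default();
    irq_pending.step(true, false); // peripheral raises the flag
    irq_pending.step(true, true);  // CPU clears while a new event arrives: stays set
    assert!(irq_pending.value);
    irq_pending.step(false, true); // ordinary clear
    assert!(!irq_pending.value);
}
```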
<re_irc> <@whitequark:matrix.org> this is a bit long-winded but my point is that there are direct parallels between synchronization between software tasks done in e.g. Rust via shared memory, and synchronization between a software and a hardware task via MMIO
<re_irc> <@dirbaio:matrix.org> 👀 very interesting
<re_irc> <@whitequark:matrix.org> if the infrastructure makes it possible to take an arbitrary set of fields (all of which have a fixed point--that is, the CPU can write any field and have it reliably be a no-op) and read or write them all simultaneously, it means that you effectively have a form of transactional memory (without a rollback operation, but you don't need one if all transactions take the same unit time to complete)
<re_irc> <@whitequark:matrix.org> since the transactionality is ensured on the peripheral side, no matter how weird your bus interconnect or multitasking system is, you can always know that the transaction will be completed in a single cycle in its entirety
<re_irc> <@dirbaio:matrix.org> how do the transactions work? multiple APB reads/writes, and the periph tracks some state of the WIP transaction?
<re_irc> <@whitequark:matrix.org> and having infrastructure support for registers wider than the bus interconnect is key to enabling that, since otherwise you're screwed the moment you want more fields in a transaction than you have bits in PWDATA
<re_irc> <@whitequark:matrix.org> : the CSR bus is a variation on APB, and yes, essentially
<re_irc> <@whitequark:matrix.org> how to track the state of the transaction was actually one of the contentious points here, because it is logically a resource owned by the thread of execution (not even a bus initiator), but in the absence of useful bus locking support, you can only allocate this resource inside the peripheral
<re_irc> <@whitequark:matrix.org> the solution I came up with is actually to rely on the ownership model _again_: if a register is wider than the bus data width, it must be written in its entirety, in an ascending order, with no intervening writes to other wide registers within the same CSR multiplexer (~register block, aka peripheral channel), or the behavior of the peripheral is UNPREDICTABLE until it is reset
<re_irc> <@whitequark:matrix.org> this means that you can snap the ADC peripheral (with a shared SAR block) into e.g. 8 channels and give them all away to RTOS tasks without caring which does what, but if you have both a task and an interrupt touching the same channel, you need to ensure atomicity of access yourself
<re_irc> <@whitequark:matrix.org> i.e. disable the interrupt. if you have no concurrent access to wide registers (meaning the hardware resources handling the transaction for that case are uncontested) you can do whatever, it works the same way you'd expect
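A hedged driver-side sketch of that rule (register name, address, and widths are made up; it assumes the critical-section crate for masking interrupts): a 64-bit register written over an 8-bit CSR bus, word by word, in ascending order, with no intervening wide-register access in the same block.
```rust
use core::ptr::write_volatile;

// Hypothetical 64-bit compare register exposed as eight 8-bit CSR words.
const TIMER_CMP_BASE: *mut u8 = 0x4000_0010 as *mut u8;

fn write_timer_cmp(value: u64) {
    // Mask interrupts in case another context also touches wide registers in
    // this block; if nothing else can run concurrently, the section isn't needed.
    critical_section::with(|_| {
        for i in 0..8 {
            let byte = (value >> (8 * i)) as u8;
            // Ascending order; the peripheral commits the whole 64-bit value
            // atomically when the final (highest) word arrives.
            unsafe { write_volatile(TIMER_CMP_BASE.add(i), byte) };
        }
    });
}
```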
<re_irc> <@whitequark:matrix.org> within these constraints there is actually a _really_ lean and elegant implementation of the transaction mechanism that uses negligible amounts of area and delay, meaning we can forget about having to painstakingly shuffle bits in peripheral registers until you get atomicity for at least the most important things
<re_irc> <@whitequark:matrix.org> nothing stops you from e.g. making a write of the _entire register set of a timer_ atomic
<re_irc> <@dirbaio:matrix.org> I see. So the periph author decides how to split the regs, and the firmware author must ensure no concurrent accesses to the same reg, but concurrent accesses to separate regs are allowed?
<re_irc> <@whitequark:matrix.org> or a write to the entire clocking control block
<re_irc> <@jamesmunns:beeper.com> > you need to ensure atomicity of access yourself... i.e. disable the interrupt
<re_irc> You answered my one question :D
<re_irc> <@whitequark:matrix.org> : replace "reg" with "reg block" and that would be it
<re_irc> <@dirbaio:matrix.org> I see!
<re_irc> <@whitequark:matrix.org> the way transactions are implemented internally is with something very similar to a 1-line cache (both read and write) that gets loaded from the peripheral register when you read the first word or write the last word
<re_irc> <@whitequark:matrix.org> and you have one "cache line" per register block, ie timer channel, DAC channel, etc. whatever is meaningful to have it be handled by a single software driver instance
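A rough software model of the read half of that "1-line cache" (illustrative only; the real mechanism is generated hardware):
```rust
// Read side of the per-block "cache line": the first word read snapshots the
// whole register, later words come from the snapshot, so the CPU never sees
// a torn value even if the peripheral updates it mid-read.
struct WideReadReg {
    live: u64,        // value the peripheral is free to change every cycle
    read_shadow: u64, // snapshot taken when word 0 is read
}

impl WideReadReg {
    /// 8-bit bus: the 64-bit register is read as words 0..=7, ascending.
    fn read_word(&mut self, idx: usize) -> u8 {
        if idx == 0 {
            self.read_shadow = self.live;
        }
        (self.read_shadow >> (8 * idx)) as u8
    }
}

fn main() {
    let mut counter = WideReadReg { live: 0x0123_4567_89AB_CDEF, read_shadow: 0 };
    let b0 = counter.read_word(0);
    counter.live = counter.live.wrapping_add(1); // peripheral keeps counting
    let b1 = counter.read_word(1);
    assert_eq!((b0, b1), (0xEF, 0xCD)); // both bytes come from the same snapshot
}
```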
<re_irc> <@dirbaio:matrix.org> so, within the same "reg block", if I want to write to one field from main and another from interrupt, I can't (without a critical section)
<re_irc> <@whitequark:matrix.org> if both of the registers are wider than the bus data width, you need a critical section
<re_irc> <@whitequark:matrix.org> if either of the registers is equal or narrower, the write goes directly to the peripheral
<re_irc> <@dirbaio:matrix.org> so there's this tradeoff as a periph writer between "big reg blocks -> allows nice big atomic writes but less concurrency" and "small reg blocks -> allows more concurrency but less atomic writes"?
<re_irc> <@whitequark:matrix.org> also reads and writes do not alias
<re_irc> <@whitequark:matrix.org> : correct
<re_irc> <@jamesmunns:beeper.com> Is there any interest in being able to have a "failable transaction"? Like if a transaction was interrupted, you could detect and retry in the interrupted prio level? Or would that massively overcomplicate things vs "just do a critical section if you touch the same block from different prio levels"
<re_irc> <@whitequark:matrix.org> : the expectation is that in most cases, there is one "obvious" size of the reg block that corresponds to a control set in hardware / concurrency unit in software
<re_irc> <@jamesmunns:beeper.com> (thinking sort of like CAS loops work, don't want to throw a bunch of uninformed suggestions if that's already been explored/out of scope :D)
<re_irc> <@jamesmunns:beeper.com> It sounds very cool, and much better than a lot of racy register designs I've seen in the past already!
<re_irc> <@whitequark:matrix.org> : also nothing stops you from having a reg block per register in your peripheral if you expect heavily concurrent access patterns. the idea is that you only pay for what you use, but never less than is necessary to ensure correctness of larger-than-bus-data-size reads and writes
<re_irc> <@whitequark:matrix.org> or, more precisely: nothing stops you from having a 1:1 mapping between the actual registers and shadow read/write registers, meaning that at the cost of <3x the flops you never need to synchronize if you are accessing different locations
<re_irc> <@whitequark:matrix.org> (reg blocks also affect things such as alignment, but you can tune how much resources it consumes directly, if you have those to spend)
<re_irc> <@whitequark:matrix.org> : one of the consequences this has is that small 8 or 16 bit MCUs in your configuration logic (or even 32 bit ones for 64 bit systems) can safely access all CSRs with the same guarantees as your normal CPU
<re_irc> <@whitequark:matrix.org> or, like, you could expose them over I2C or JTAG or anything without having to add annoying bodges to make things work
<re_irc> <@whitequark:matrix.org> : I've thought about it, but the software interface is... unclear
<re_irc> <@whitequark:matrix.org> what is worse however is that this essentially moves the critical section into hardware, and now you can't split it into two if you've taped the damn thing out
<re_irc> <@jamesmunns:beeper.com> Totally fair! I can think of a transaction ID, but that would still be susceptible to the ABA problem
<re_irc> <@whitequark:matrix.org> also that inherently limits how many execution threads can contend on a register block
<re_irc> <@jamesmunns:beeper.com> like
<re_irc> swap r0 START_REG ; transaction id is now in r0, say it is 0x42
<re_irc> write
<re_irc> copy r0 to r1
<re_irc> write
<re_irc> write
<re_irc> write
<re_irc> swap r0 COMMIT_REG
<re_irc> if r0 == r1 -> success, else fail
<re_irc> <@whitequark:matrix.org> what _would_ work is a functioning bus locking semantics that is respected by all of the interconnect
<re_irc> <@jamesmunns:beeper.com> oh, I'm immediately seeing problems with that, since you might partially finish the old transaction.
<re_irc> <@jamesmunns:beeper.com> anyway, ignore me, I'm sure you've thought way more about it :D
<re_irc> <@whitequark:matrix.org> and AXI4 for example plainly does not have locked transfers
<re_irc> <@jamesmunns:beeper.com> (the hope was that it only actually "latches" the value if the write to commit matches the current counter or whatever, otherwise it aborts any partial writes) - at 8 bits the ABA problem might apply, but this whole logic might take an unfortunate number of gates (I don't have hardware design brain)
<re_irc> <@whitequark:matrix.org> hmmm
<re_irc> <@jamesmunns:beeper.com> IF your arch has an atomic? swap (probably only for COMMIT, START could probably just be a read), AND you are willing to have an N bit rolling counter for the transaction ID, it might be workable? Where a new read to START aborts any pending writes, and a swap with commit where the values don't match also aborts any pending writes, maybe it's workable? But I honestly don't think it's _too_ weird to require a critical section, but having some way to NOTICE that (like an exception/interrupt) might be nice if a software dev was having a bad day and forgot :D
<re_irc> <@whitequark:matrix.org> on APB, an atomic swap looks like read-then-write
<re_irc> <@jamesmunns:beeper.com> Anyway, won't suggest more/expand unless it's interesting. I went into problem solving mode :p
<re_irc> <@whitequark:matrix.org> but you can make this work; I would suggest a slightly different approach
<re_irc> <@whitequark:matrix.org> it won't work for 8-bit data bus width though :(
<re_irc> <@whitequark:matrix.org> anyway, the approach is: you put a narrow (<=data bus width) register somewhere at the end. it has a counter and an address field
<re_irc> <@whitequark:matrix.org> at the beginning of a transaction you read the counter, then start writing data. each write increments the counter
<re_irc> <@whitequark:matrix.org> at the end you write the _expected_ value of the counter and the address, and if it matches, it strobes a write for that register
<re_irc> <@dirbaio:matrix.org> you could make the peripheral have a "transaction OK" bit. You must write all regs in order, then read the "transaction OK" bit, if false retry.
<re_irc> <@jamesmunns:beeper.com> the problem is if you have N prio levels all racing
<re_irc> <@dirbaio:matrix.org> peripheral resets transaction bit to 0 if it sees mis-ordered writes
<re_irc> <@jamesmunns:beeper.com> how do you prevent reading a "stale" bit if one prio up JUST finished before you checked it
<re_irc> <@jamesmunns:beeper.com> (well, I don't think it takes MANY prios, just two):
<re_irc> write ;
<re_irc> finish
<re_irc> write ; start
<re_irc> write ;
<re_irc> INTERRUPT
<re_irc> write
<re_irc> write
<re_irc> write
<re_irc> finish
<re_irc> check good
<re_irc> YIELD
<re_irc> check good
<re_irc> <@jamesmunns:beeper.com> maybe checking clears the "good" bit? dunno.
<re_irc> <@dirbaio:matrix.org> actually, two bits: tx_ok, tx_done.
<re_irc> on write first reg, set tx_ok=1, tx_done=0
<re_irc> on write any misordered reg, set tx_ok=0, tx_done=0
<re_irc> on write last reg, if tx_ok=1, commit write, set tx_ok=0, tx_done=1
<re_irc> on read tx_done, set tx_ok=0, tx_done=0
<re_irc> <@dirbaio:matrix.org> : "on read tx_done, set tx_done=0" fixes it
<re_irc> <@jamesmunns:beeper.com> are you interested in suggestions for this? Or are we bikeshedding right now? :D
<re_irc> <@whitequark:matrix.org> sure why not
<re_irc> <@whitequark:matrix.org> : actually my suggestion won't work because uh... consider two pieces of code racing to set the exact same register to different values
<re_irc> <@whitequark:matrix.org> same number of words written, same address, if you have (tx start write write write write (tx start write x4 tx end) tx end), then both will report tx successful
<re_irc> <@whitequark:matrix.org> yeah you need every read from the transaction register to allocate you a transaction ID
<re_irc> <@jamesmunns:beeper.com> Is "the arch has a swap instruction" and "the reg block can detect a swap" a workable requirement?
<re_irc> <@dirbaio:matrix.org> : reading the "tx done" flag the first time would clear it, so the inner tx would succeed but the outer would fail
<re_irc> <@whitequark:matrix.org> : I think it would be an impractical limitation
<re_irc> <@jamesmunns:beeper.com> shoot
<re_irc> <@jamesmunns:beeper.com> : If you got interrupted JUST before the last write, so the interrupt code writes one register uninterrupted, then checks (but it meant to write 5 in a row, not one!), would that be handled?
<re_irc> <@dirbaio:matrix.org> haha I think it can break this way:
<re_irc> thread 1: write 0
<re_irc> thread 2: write 0 - aborts the thread 1's transaction
<re_irc> -- context switch
<re_irc> thread 1: write 1
<re_irc> thread 2: write 1
<re_irc> -- context switch
<re_irc> thread 1: write 2
<re_irc> thread 1: write 3
<re_irc> thread 1: check -> success, but it should fail
<re_irc> <@jamesmunns:beeper.com> Yeah, that's about what I meant :/
<re_irc> <@whitequark:matrix.org> : I don't see why you can't have something like:
<re_irc> loop:
<re_irc> mov [D1], #2
<re_irc> mov [D0], #1
<re_irc> mov r0, [TXCTL] ; increments TXID, returns new TXID, clears TXDONE
<re_irc> mov [TXCTL], r0
<re_irc> mov r1, [TXCTL]
<re_irc> xor r0, r0, r1
<re_irc> jne r0, #TXDONE, loop
<re_irc> <@whitequark:matrix.org> forgive my horrific bastard assembly
<re_irc> <@dirbaio:matrix.org> my idea works only with "nested" preemption like interrupts, not arbitrary RTOS context switching
<re_irc> <@dirbaio:matrix.org> unless the RTOS writes a magic "abort all transactions" bit on context switch 😂
<re_irc> <@whitequark:matrix.org> : I have ideas related to that, but let's process things one by one
<re_irc> <@dirbaio:matrix.org> doing "force abort transaction" like that would suck for "forward progress" guarantees and for multicore
<re_irc> <@dirbaio:matrix.org> wow this is a hard problem 😂
<re_irc> <@jamesmunns:beeper.com> : So (my asm is bad):
<re_irc> - read ID from TXCTL
<re_irc> - write ID back to TXCTL
<re_irc> - do writes
<re_irc> - read TXCTL
<re_irc> - if old and new read match, success
<re_irc> ?
<re_irc> <@whitequark:matrix.org> : if new read is the same but has TXDONE set, success
<re_irc> <@whitequark:matrix.org> "(old^new)==TXDONE"
<re_irc> <@jamesmunns:beeper.com> so WRITES don't incr TXCTL, but reads do?
<re_irc> <@jamesmunns:beeper.com> ahhh
<re_irc> <@jamesmunns:beeper.com> how do you tell the difference between "read to start" and "read to check"?
<re_irc> <@whitequark:matrix.org> : yes. read from TXCTL is "ROLLBACK; BEGIN TRANSACTION x;"; write is "COMMIT TRANSACTION x;"
<re_irc> <@jamesmunns:beeper.com> like if you got interrupted and the interrupt does a read to start?
<re_irc> <@whitequark:matrix.org> if the interrupt does a read to start, TXDONE would be 0
<re_irc> <@whitequark:matrix.org> TXID flow is like... "TXID=n --(read TXID)--> TXID=n+1,TXDONE=0,return {TXDONE,TXID} --(write TXID=n+1)--> TXID=n+1,TXDONE=1,return {TXDONE,TXID}"
<re_irc> <@jamesmunns:beeper.com> if you separate TXDONE and TXCTL, I think you could do it without XOR?
<re_irc> write d0
<re_irc> write d1
<re_irc> r0 = TXCTL ; 0x42
<re_irc> write d2
<re_irc> TXCTL = r0 ;
<re_irc> r1 = TXDONE ; read the last completed transaction ID
<re_irc> if r0 == r1 success
<re_irc> <@jamesmunns:beeper.com> so like, a successful write latches the last ID into TXDONE
<re_irc> <@whitequark:matrix.org> the XOR is just there because TXDONE is the highest bit of TXID
<re_irc> <@whitequark:matrix.org> I'm too used to hyperoptimizing assembly, it isn't load bearing
<re_irc> <@whitequark:matrix.org> think of it as a mask+compare but shorter
<re_irc> <@jamesmunns:beeper.com> yeah, I just don't grok what the values would be for the interrupted and interrupter code
<re_irc> <@whitequark:matrix.org> oh you're right, that's too many side effects stuffed into one register
<re_irc> <@whitequark:matrix.org> yes I agree they must be separate
<re_irc> <@jamesmunns:beeper.com> I _think_ my approach is workable?
<re_irc> <@whitequark:matrix.org> yes that will work
<re_irc> <@whitequark:matrix.org> and it's actually very cheap to implement
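A small host-side Rust model of the scheme as agreed here (purely illustrative: the register names come from the discussion, the field widths and the 4-register block are made up), including the driver-side retry loop:
```rust
#[derive(Default)]
struct RegBlock {
    txctl: u8,        // rolling transaction ID counter
    txdone: u8,       // ID of the last committed transaction
    shadow: [u32; 4], // write shadow, filled by in-flight writes
    regs: [u32; 4],   // the "real" registers the peripheral sees
}

impl RegBlock {
    fn read_txctl(&mut self) -> u8 {
        self.txctl = self.txctl.wrapping_add(1); // allocate a new ID, abort any in-flight tx
        self.txctl
    }
    fn write_reg(&mut self, idx: usize, val: u32) {
        self.shadow[idx] = val; // queued; not visible to the peripheral yet
    }
    fn write_txctl(&mut self, id: u8) {
        if id == self.txctl {
            self.regs = self.shadow; // commit the whole block at once
            self.txdone = id;        // latch the committed ID
        }
        // else: another transaction started in between; silently abort
    }
    fn read_txdone(&self) -> u8 {
        self.txdone
    }
}

fn main() {
    let mut blk = RegBlock::default();
    loop {
        let id = blk.read_txctl(); // "ROLLBACK; BEGIN TRANSACTION id"
        blk.write_reg(0, 0x1234);
        blk.write_reg(1, 0x5678);
        blk.write_txctl(id);       // "COMMIT TRANSACTION id"
        if blk.read_txdone() == id {
            break;                 // all queued writes applied in one go
        }
        // otherwise another transaction preempted us: retry
    }
    assert_eq!(blk.regs[0], 0x1234);
}
```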
<re_irc> <@jamesmunns:beeper.com> it does require 2x regs of "max transaction counter bits", and a counter, but doesn't require a swap or anything fancy
<re_irc> <@whitequark:matrix.org> that costs ~nothing
<re_irc> <@whitequark:matrix.org> even for a 32 bit counter
<re_irc> <@jamesmunns:beeper.com> maybe you can choose how much ABA insurance you want? lol
<re_irc> <@jamesmunns:beeper.com> yeah, 32 bits of ABA would mean you had to be interrupted for 4 billion other transactions to get a false positive lol
<re_irc> <@jamesmunns:beeper.com> even 255 seems EXCEEDINGLY unlikely unless something has gone terribly wrong
<re_irc> <@jamesmunns:beeper.com> So TXCTL gets incremented on any non-sequential writes to the block, or any time it is read
<re_irc> <@jamesmunns:beeper.com> and the values only get latched from the shadow reg to the real block on the commit write, when the written ID matches TXCTL
<re_irc> <@whitequark:matrix.org> if you keep it to 255, you get the benefit of being able to use 8-bit CPUs
<re_irc> <@jamesmunns:beeper.com> I think it's reasonable to just say that the width of txctl/done could match the data bus width? or that's the default at least?
<re_irc> <@jamesmunns:beeper.com> (I have no good understanding of cost nor "reasonable cost" here, take my opinion with all of the salt)
<re_irc> <@whitequark:matrix.org> normally the peripheral doesn't care too much about the data bus width, but I guess the transaction machinery does
<re_irc> <@whitequark:matrix.org> the cost of all of this is ~negligible
<re_irc> <@whitequark:matrix.org> if you keep it to an 8-bit counter, you can forget about it no matter the process or frequency
<re_irc> <@whitequark:matrix.org> a 64-bit counter that can get incremented on every cycle can be more tricky sometimes
<re_irc> <@whitequark:matrix.org> especially on FPGAs
<re_irc> <@jamesmunns:beeper.com> from a security perspective, 256 bytes could make it easy to force a false positive, just read TXCTL 255 times. But that's probably not anybody's threat model :p
<re_irc> <@jamesmunns:beeper.com> err, 8 bits of counter
<re_irc> <@whitequark:matrix.org> a single CSR block is not a security boundary, no
<re_irc> <@whitequark:matrix.org> anyway, there's something more interesting here
<re_irc> <@whitequark:matrix.org> so when making this machinery, we have decided to pay / assumed paid the cost of having full coverage of shadow registers (as opposed to "pay as you go"). this is still very cheap on newer ASIC processes
<re_irc> <@whitequark:matrix.org> now... until now, transactions spanned one register block. however why not make them global?
<re_irc> <@whitequark:matrix.org> in fact, why not add "transaction channels" that can be allocated to e.g. RTOS tasks that let them atomically commit arbitrarily large updates to the SoC state instantly (+ CDC latency if you have multiple clocks)
<re_irc> <@jamesmunns:beeper.com> oh that is EXTREMELY cool.
<re_irc> <@whitequark:matrix.org> instead of TXCTL/TXDONE, you would have a TXCHAN register, which selects one of the global transaction coordinators. the per-peripheral cost is extremely low, bordering on negligible, and can be made so low (with a latency tradeoff) that you could have thousands of channels
<re_irc> <@whitequark:matrix.org> each channel has TXCTLn/TXDONEn with the same semantics, but instead the writes are queued and then committed/aborted for every peripheral where TXCHAN=n
<re_irc> <@whitequark:matrix.org> you know the complex stuff STM32 has to coordinate timers and ADCs or DACs so you can do updates in phase?
<re_irc> <@whitequark:matrix.org> you can throw all of that out, now it's just memory writes
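A hedged sketch of what the firmware side of this could look like, assuming the TXCHAN assignments were already made at init; all addresses and register names below are placeholders, not an existing Amaranth SoC interface:
```rust
use core::ptr::{read_volatile, write_volatile};

// All addresses and names are made up for illustration.
const TIM1_ARR: *mut u32 = 0x4000_0000 as *mut u32;   // timer period, assigned to TXCHAN 3
const DAC0_DATA: *mut u32 = 0x4000_1000 as *mut u32;  // DAC sample, also on TXCHAN 3
const TXCTL_CH3: *mut u32 = 0x4800_0030 as *mut u32;  // channel 3 coordinator
const TXDONE_CH3: *mut u32 = 0x4800_0034 as *mut u32;

/// Update the timer period and the DAC output so that both take effect on the
/// same cycle, instead of hand-building STM32-style trigger plumbing.
fn update_in_phase(period: u32, sample: u32) {
    unsafe {
        loop {
            let id = read_volatile(TXCTL_CH3);  // begin a transaction on channel 3
            write_volatile(TIM1_ARR, period);   // queued in TIM1's write shadow
            write_volatile(DAC0_DATA, sample);  // queued in DAC0's write shadow
            write_volatile(TXCTL_CH3, id);      // commit: DONE3 strobes both blocks
            if read_volatile(TXDONE_CH3) == id {
                break;                          // both peripherals updated together
            }
            // otherwise something else used channel 3 in between: retry
        }
    }
}
```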
<re_irc> <@jamesmunns:beeper.com> would that still work with RMW approaches?
<re_irc> <@whitequark:matrix.org> elaborate?
<re_irc> <@jamesmunns:beeper.com> like:
<re_irc> - task1: transaction 1 starts
<re_irc> - task1: transaction 1 ends
<re_irc> - task0: read current state
<re_irc> - task0: transaction 2 starts
<re_irc> - task0: transaction 2 ends
<re_irc> <@dirbaio:matrix.org> iiuc it's "atomically write N regs", not "atomically read modify write"
<re_irc> <@whitequark:matrix.org> nothing stops you from capturing the state of every associated peripheral when you start a transaction
<re_irc> <@jamesmunns:beeper.com> like, the transactions are still atomic and serialized, but if someone gets there between read and write, it could be applying "stale" modifications, undoing something that happened in transaction 1
<re_irc> <@jamesmunns:beeper.com> yeah, that's true
<re_irc> <@whitequark:matrix.org> so that the reads give you the data at the time of transaction start
<re_irc> <@jamesmunns:beeper.com> if you have a shadow register, you could latch IN on transaction start
<re_irc> <@dirbaio:matrix.org> : yea, but that still doesn't give you atomic RMW if another transaction commits in between your transaction start + commit
<re_irc> <@whitequark:matrix.org> there's two, one for reads and one for writes (since the read and write halves don't really have to refer to the same storage)
<re_irc> <@whitequark:matrix.org> : atomic RMW works internally by asserting a lock on the bus
<re_irc> <@jamesmunns:beeper.com> yeah, if multiple transactions overlap "scope", I feel like they need to invalidate each other?
<re_irc> <@whitequark:matrix.org> it's a SoC-level critical section basically
<re_irc> <@jamesmunns:beeper.com> (I think this is all sound for "one global channel", applied atomically, I'm not so sure about N channels of serializable transactions)
<re_irc> <@jamesmunns:beeper.com> but maybe I don't understand that part?
<re_irc> <@jamesmunns:beeper.com> like if you start a database transaction, read a bunch of rows, then someone else starts one, modifies those rows, and then finishes the transaction, your first transaction is now wrong
<re_irc> <@whitequark:matrix.org> a single peripheral can only belong to one channel, so you basically partition your system into independent parts
<re_irc> <@whitequark:matrix.org> this is good for ensuring you e.g. toggle GPIOs in multiple banks in phase alignment, or start a bunch of timers and ADCs at the same time
<re_irc> <@jamesmunns:beeper.com> so starting the second one either must invalidate the first, or the second one must be blocked until the first one completes (I think?)
<re_irc> <@whitequark:matrix.org> it is not "true" database transactions so much as them being a convenient vehicle for explanation
<re_irc> <@jamesmunns:beeper.com> yeah, but I was applying the metaphor :D
<re_irc> <@whitequark:matrix.org> anyway, a task can spin on while (TXCTL!=TXDONE);
<re_irc> <@jamesmunns:beeper.com> I think the same thing is true for "N channels", which is more like "how many database connections can you have open at once" (in this metaphor)
<re_irc> <@jamesmunns:beeper.com> Yeah, that's true!
<re_irc> <@jamesmunns:beeper.com> I think "one pair of global TXCTL + TXDONE" with "N channels of shadow registers" should still work?
<re_irc> <@whitequark:matrix.org> : however here peripherals are statically allocated to a connection, unlike in databases where rows are locked
<re_irc> <@whitequark:matrix.org> : now THAT is incredibly expensive
<re_irc> <@jamesmunns:beeper.com> maybe I'm not following your N channel explanation then :)
<re_irc> <@jamesmunns:beeper.com> (I get the "synced changes to multiple reg blocks" goal, but not how you got there, I guess)
<re_irc> <@whitequark:matrix.org> so each channel has START and DONE strobes, for n channels, you have START[0..n-1] and DONE[0..n-1]
<re_irc> <@whitequark:matrix.org> when you set TXCHAN=n, you pick STARTn to capture values into read shadow register and DONEn to apply values from write shadow register
<re_irc> <@whitequark:matrix.org> so that's 2*(n wires) per transaction channel plus 2*(n:1 multiplexer) per peripheral
<re_irc> <@whitequark:matrix.org> which is a low to moderate cost if your n stays under, say, 16
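The same idea seen from the hardware side, as a rough behavioural model (not Amaranth code; one shadow word stands in for a whole register block):
```rust
const N_CHANNELS: usize = 16;

// One register block's view of the global transaction channels: it reacts only
// to the STARTn/DONEn strobes of the channel selected by its TXCHAN.
struct BlockShadow {
    txchan: usize,     // selected transaction channel
    live: u32,         // value the peripheral actually uses / updates
    read_shadow: u32,  // captured on STARTn, for coherent reads
    write_shadow: u32, // queued CPU write, applied on DONEn
}

impl BlockShadow {
    fn on_clock(&mut self, start: &[bool; N_CHANNELS], done: &[bool; N_CHANNELS]) {
        if start[self.txchan] {
            self.read_shadow = self.live; // snapshot at transaction start
        }
        if done[self.txchan] {
            self.live = self.write_shadow; // commit queued write at transaction end
        }
    }
}

fn main() {
    let mut gpioa = BlockShadow { txchan: 3, live: 0, read_shadow: 0, write_shadow: 0xFF };
    let mut gpiob = BlockShadow { txchan: 3, live: 0, read_shadow: 0, write_shadow: 0x0F };
    let mut done = [false; N_CHANNELS];
    done[3] = true; // DONE3 strobe: both blocks on channel 3 commit on the same edge
    gpioa.on_clock(&[false; N_CHANNELS], &done);
    gpiob.on_clock(&[false; N_CHANNELS], &done);
    assert_eq!((gpioa.live, gpiob.live), (0xFF, 0x0F));
}
```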
<re_irc> <@jamesmunns:beeper.com> So if CHANNEL1 wants to modify ADC0 and DAC1, and CHANNEL2 wants to modify ADC0 and I2S0, how do you keep channel1 and channel2 from stepping on each-others toes?
<re_irc> <@whitequark:matrix.org> this is impossible to express in this system
<re_irc> <@jamesmunns:beeper.com> (where "modify" can be all/any of RMW)
<re_irc> <@jamesmunns:beeper.com> gotcha
<re_irc> <@jamesmunns:beeper.com> (that's what I was trying to figure out how you handled, which wasn't what you were trying to handle)
<re_irc> <@whitequark:matrix.org> only a single owner can control a peripheral. this is an existing logical requirement in Amaranth SoC, and with transaction channels it will be a physical requirement too
<re_irc> <@whitequark:matrix.org> well, you basically give ownership to a transaction channel
<re_irc> <@jamesmunns:beeper.com> ohhhhhh, so it's more about sharing shadow registers between periphs?
<re_irc> <@jamesmunns:beeper.com> instead of one shadow register dedicated to each periph?
<re_irc> <@whitequark:matrix.org> nope, it's about being able to do an atomic write across multiple peripherals, vs only within a single peripheral
<re_irc> <@whitequark:matrix.org> sharing shadow registers will ... well, not work. it will be more costly than not sharing them (in routing)
<re_irc> <@whitequark:matrix.org> in general, flops are cheap, unless you start building entire memories out of them
<re_irc> <@whitequark:matrix.org> adding 2 flops to every 1 flop with minimal addressing logic: extremely cheap
<re_irc> <@whitequark:matrix.org> adding 16 flops to every 1 flop: quite expensive, and mostly because of non-flop things you now need to add too
<re_irc> <@jamesmunns:beeper.com> gotcha... I don't totally grok how you achieve the "apply to many" part, or how the channels help, but that's okay.
<re_irc> <@jamesmunns:beeper.com> If you have a clear picture, that matters more :D
<re_irc> <@whitequark:matrix.org> so the core idea here is that we queue writes and then apply them in bulk on some strobe, right?
<re_irc> <@jamesmunns:beeper.com> yep!
<re_irc> <@whitequark:matrix.org> with the original TXCTL/TXDONE discussion, this strobe was generated by a simple per-peripheral transaction state machine
<re_irc> <@jamesmunns:beeper.com> with you so far!
<re_irc> <@whitequark:matrix.org> but having it per-peripheral is limiting. we can take the exact same TXCTL/TXDONE thing and make it SoC-global
<re_irc> <@jamesmunns:beeper.com> yup! With you so far ("one global txctl/txdone" makes sense to me)
<re_irc> <@whitequark:matrix.org> so there's now one instead of N_peripheral copies of it
<re_irc> <@jamesmunns:beeper.com> okay this is where you lose me!
<re_irc> <@jamesmunns:beeper.com> where does "N pairs" help where "one global pair" doesn't?
<re_irc> <@whitequark:matrix.org> in your firmware, you would normally split responsibilities between tasks, right?
<re_irc> <@whitequark:matrix.org> one handles GPIO bitbang, another handles ADC readout
<re_irc> <@whitequark:matrix.org> so let's say you want to bitbang GPIOA/GPIOB in sync, and sample ADC0 and ADC1 in sync
<re_irc> <@whitequark:matrix.org> if you have a single pair, now you've got contention
<re_irc> <@whitequark:matrix.org> if you have two (for the sake of example, connected to all GPIOs and all ADCs) you don't have it anymore
<re_irc> <@jamesmunns:beeper.com> gotcha, so it DOES let you batch writes to M registers, but DOESN'T help if there is contention where multiple channels want to touch the same registers?
<re_irc> <@jamesmunns:beeper.com> so you can have N concurrent batches firing, but they MUST NEVER touch the same peripheral at the same time?
<re_irc> <@whitequark:matrix.org> yeah
<re_irc> <@jamesmunns:beeper.com> cool! The database metaphor was wanting for a mechanism to detect when channel 3 touches something already touched by channel 2
<re_irc> <@whitequark:matrix.org> so that's actually also doable, fairly cheaply
<re_irc> <@jamesmunns:beeper.com> (so channel 3 or channel 2's commit fails deterministically)
<re_irc> <@jamesmunns:beeper.com> (either, the latecomer fails, or the interrupted transaction fails)
<re_irc> <@whitequark:matrix.org> if your RTOS writes the TXCHAN for the current task to some register, it's possible to e.g. turn writes to peripherals assigned to another TXCHAN into hard faults
<re_irc> <@jamesmunns:beeper.com> so each channel would need to "claim" each block, and a claim of a claimed block would raise an exception?
<re_irc> <@whitequark:matrix.org> rather a register block would be assigned to a channel (the inverse relationship works much better hardware wise)
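A rough model of that inverse relationship (illustrative only; in real hardware the rejection would be a bus error or hard fault rather than a Result):
```rust
// Each register block stores which channel owns it; a write tagged with any
// other channel is refused instead of racing.
struct OwnedBlock {
    assigned_chan: usize, // set by the RTOS when handing the block to a task
    reg: u32,
}

impl OwnedBlock {
    fn write(&mut self, requester_chan: usize, val: u32) -> Result<(), ()> {
        if requester_chan == self.assigned_chan {
            self.reg = val;
            Ok(())
        } else {
            Err(()) // write from a task on another TXCHAN: fault
        }
    }
}

fn main() {
    let mut adc0 = OwnedBlock { assigned_chan: 4, reg: 0 };
    assert!(adc0.write(4, 123).is_ok());  // owning channel may write
    assert!(adc0.write(7, 456).is_err()); // anyone else faults instead of racing
}
```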
<re_irc> <@jamesmunns:beeper.com> okay, so something like:
<re_irc> ADC0_CHAN = 4 ; faults if chan wasn't "idle" before
<re_irc> r0 = TX_CTL_C4;
<re_irc> DAC1_CHAN = 4 ; faults if chan wasn't "idle" before
<re_irc> ...
<re_irc> TX_CTL_C4 = r0;
<re_irc> r1 = TX_DONE_C4; ; maybe this also releases claim automatically?
<re_irc> if r0 == r1 => success
<re_irc> <@whitequark:matrix.org> I was thinking of a different use pattern but I think that would work fine with my proposal
<re_irc> <@jamesmunns:beeper.com> That makes sense!
<re_irc> <@jamesmunns:beeper.com> I think this does mean that if you tried to touch the same periph in an interrupt, you just immediately fault now tho lol
<re_irc> <@jamesmunns:beeper.com> even if you were nice and used a different channel
<re_irc> <@jamesmunns:beeper.com> so uh, the transaction stuff is maybe not needed anymore?
<re_irc> <@jamesmunns:beeper.com> you now have like mutex-locked owners of peripherals
<re_irc> <@jamesmunns:beeper.com> (you do still get "apply to many at once" behavior, but not "failable transactions" to the same block)
<re_irc> <@whitequark:matrix.org> without the transaction stuff you can't assign several things simultaneously
<re_irc> <@jamesmunns:beeper.com> I mean, by setting the block's channel, you are essentially mutex locking it anyway
<re_irc> <@whitequark:matrix.org> yea they don't need to be fallible anymore, could reduce it all to one bit PENDING, capture on rising, apply on falling
<re_irc> <@whitequark:matrix.org> you could also ditch the hardware completely now and have eg 4 billion channels
<re_irc> <@whitequark:matrix.org> just broadcast the transaction number and acquire/release bit across the entire SoC
<re_irc> <@whitequark:matrix.org> this is actually getting closer to how NoCs work
<re_irc> <@jamesmunns:beeper.com> Yeah, I think it's a sideways step between "failable transactions on a peripheral" and "global application to many peripherals"
<re_irc> <@jamesmunns:beeper.com> like, not better or worse, just two different (mostly) unrelated features :D
<re_irc> <@whitequark:matrix.org> they're related in that they all involve "grouping of peripherals"
<re_irc> <@whitequark:matrix.org> which is basically just hardware tracking of ownership or its aspects
<re_irc> <@jamesmunns:beeper.com> That makes sense!
<re_irc> <@jamesmunns:beeper.com> Thanks for all the explanation, it makes a lot more sense now what your goal(s) are.
Socker has quit [Ping timeout: 260 seconds]
Socker has joined #rust-embedded
<re_irc> <@lambdafriend:matrix.org> I'm working through how to provide my embedded code (STM32H7xx fwiw) with a chunk of sdram that I can divvy up and use as buffers in various structs. The allocations do not change during the life of the program.
<re_irc> #[link_section = ".sdram"]
<re_irc> static mut MEMORY: [f32; MAX_SIZE] = [0.0; MAX_SIZE];
<re_irc> #[no_mangle]
<re_irc> ...elsewhere
<re_irc> let memory: &'static mut [f32; MAX_SIZE] = unsafe { &mut MEMORY };
<re_irc> // partition memory into N sized mutable slices for use in N structs
<re_irc> The issue I am facing specifically is that "array::from_fn" impls "FnMut", and so mutable slice references into "memory" created within the closure do not live past it. However, I know the underlying memory does (at least I think it does, right?).
<re_irc> I think what I'm wanting is an allocator, but my use case is really straightforward.
<re_irc> Does anyone have any advice on how to do this?
<re_irc> <@dirbaio:matrix.org> you want an array of "&'static mut [f32]"?
<re_irc> <@dirbaio:matrix.org> slicing with "&mut memory[a..b]" doesn't work because the borrows will be shorter
<re_irc> <@dirbaio:matrix.org> you can split a slice without downgrading lifetimes with ".split_at_mut()"
<re_irc> <@dirbaio:matrix.org> let mut memory: &'static mut [f32] = unsafe { &mut MEMORY };
<re_irc> let mut chunks: [&'static mut [f32]; 16] = [&mut []; 16];
<re_irc> for i in 0..16 {
<re_irc>     let m = mem::take(&mut memory); // take out memory, leaving an empty slice behind.
<re_irc>     let (a, b) = m.split_at_mut(1024); // split it. this gives two `&'static mut [f32]`. Same lifetime!
<re_irc>     chunks[i] = a; // the 1024 sample chunk.
<re_irc>     memory = b; // put back the rest of the memory so we can take another chunk out in the next iteration.
<re_irc> }
<re_irc> <@dirbaio:matrix.org> the reason "split_at_mut()" works is it guarantees the slices are not overlapping, while "&mut memory[a..b]" doesn't.
<re_irc> <@jamesmunns:beeper.com> (fwiw: I'd probably recommend using something like "singleton!()" or "StaticCell" instead of a static mut, because if you ever make some code reentrant, then you'll at least get a panic or something)
<re_irc> <@jamesmunns:beeper.com> then you can just put the "static allocations" locally, instead of having to chunk them out with a bunch of unsafe code at runtime.
<re_irc> <@jamesmunns:beeper.com> I also am unreasonably biased against "static mut"s existing anywhere though, so take it for what it's worth :D
<re_irc> <@dirbaio:matrix.org> btw if the chunk sizes are known at compile time you can let the linker lay them out for you. Just make separate statics for each.
<re_irc> <@jamesmunns:beeper.com> I think the "static_alloc" crate implements a basic bump allocator, if you ever need to do it at runtime
<re_irc> <@jamesmunns:beeper.com> You can use it as a global allocator, or just for doing "alloc at ~start" type stuff.
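A minimal sketch combining the StaticCell and separate-statics suggestions above, assuming the static_cell crate and a .sdram output section handled in the linker script; names and sizes are illustrative:
```rust
use static_cell::StaticCell;

const CHUNK: usize = 1024;

// One StaticCell per chunk, so the linker lays them out; the array contents
// are only written at runtime by `init`.
#[link_section = ".sdram"]
static BUF_A: StaticCell<[f32; CHUNK]> = StaticCell::new();
#[link_section = ".sdram"]
static BUF_B: StaticCell<[f32; CHUNK]> = StaticCell::new();

fn init_buffers() -> (&'static mut [f32; CHUNK], &'static mut [f32; CHUNK]) {
    // `init` hands out each &'static mut exactly once; calling it twice panics
    // instead of silently aliasing, which is the advantage over `static mut`.
    (BUF_A.init([0.0; CHUNK]), BUF_B.init([0.0; CHUNK]))
}
```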
<re_irc> <@lambdafriend:matrix.org> Fantastic! I appreciate all the advice. I'm going to work through the options and see what sticks. 🙌
Socker has quit [Ping timeout: 250 seconds]
Socker has joined #rust-embedded
Socker has quit [Ping timeout: 258 seconds]
Socker has joined #rust-embedded
<re_irc> <@lambdafriend:matrix.org> : This worked verbatim! 🙌
Socker has quit [Ping timeout: 250 seconds]
Socker has joined #rust-embedded
Sockeee has joined #rust-embedded
Socker has quit [Ping timeout: 240 seconds]
jsolano has joined #rust-embedded
<re_irc> <@dkhayes117:matrix.org> I'm using a crate that expects a Vec<u8> for a token. I will also store the token for comparison later. What is the smartest type to store it as? When I compare it, I will be using a "get_token" method which returns a "&[u8]".
<re_irc> <@dkhayes117:matrix.org> I feel like that was a dumb question? idk, I just made it a [u8;8] in my struct.
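A small sketch of how that [u8; 8] choice can look (struct and method names are hypothetical):
```rust
// The fixed-size array lives inline in the struct, converting from the crate's
// Vec<u8> is a TryInto, and the later check is a plain slice comparison.
struct Session {
    token: [u8; 8],
}

impl Session {
    fn new(token: Vec<u8>) -> Option<Self> {
        Some(Session { token: token.try_into().ok()? }) // fails if the length isn't 8
    }

    fn token_matches(&self, current: &[u8]) -> bool {
        self.token.as_slice() == current // e.g. current = get_token()
    }
}
```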
IlPalazzo-ojiisa has quit [Quit: Leaving.]