rob5225 has quit [Remote host closed the connection]
rob5225 has joined #rust-embedded
starblue has quit [Ping timeout: 256 seconds]
starblue has joined #rust-embedded
M9names[m] has quit [Quit: Idle timeout reached: 172800s]
TommyG[m] has quit [Quit: Idle timeout reached: 172800s]
dutchkin has joined #rust-embedded
crabbedhaloablut has joined #rust-embedded
elpiel[m] has quit [Quit: Idle timeout reached: 172800s]
dutchkin has quit [Ping timeout: 255 seconds]
dutchkin has joined #rust-embedded
henrik_alser[m] has quit [Quit: Idle timeout reached: 172800s]
Ralph[m]1 has quit [Quit: Idle timeout reached: 172800s]
IlPalazzo-ojiisa has joined #rust-embedded
IlPalazzo-ojiisa has quit [Quit: Leaving.]
IlPalazzo-ojiis1 has joined #rust-embedded
starblue has quit [Ping timeout: 256 seconds]
<JamesMunns[m]>
<lkostka[m]> "But it will give me 5 bytes..." <- In postcard, **arrays** and **slices** are serialized differently. An array of a fixed size, like `[u8; 4]`, will not have a length prefix, while `&[u8]` will.
<JamesMunns[m]>
I can't remember seeing that other places tho, maybe dirbaio knows
<diondokter[m]>
The thing is that I've been working on `sequential-storage` a lot and added a bunch of CRC protection and recovery.
<diondokter[m]>
But there's an edge case where some corruption has been detected, and the last item written by the user contains all 0xFF data. I can't detect that I've already written there so there's the possibility of getting three writes there
<JamesMunns[m]>
You could skip writes of 0xFF, which I don't know if they even count lol
<diondokter[m]>
Idk haha
<JamesMunns[m]>
Yeah, ran into that in a recent project, where I built a (really basic) flash simulator for testing, which refused any multiple writes to the same byte, but I was wondering if 0xFF writes count because if the controller is smart, that's a no-op basically
<diondokter[m]>
I can't skip, because that is implemented for any NOR flash, not just multiwrite NOR flash. The flash might do some ECC stuff to all-0xFF still
<JamesMunns[m]>
ahh, that's a point. This chip wasn't ECC iirc
<JamesMunns[m]>
Quick, write a blog post making strong claims like "flash chips can do this and they can't do that", and wait for someone to come correct you
dutchkin has left #rust-embedded [Konversation terminated!]
<JamesMunns[m]>
call it something like "the objective 100% irrefutable truth of flash chips I am very smart you are not"
<diondokter[m]>
"Flash is much simpler than you thought"
<JamesMunns[m]>
lol
<JamesMunns[m]>
"flash is simple get gud"
<diondokter[m]>
"Just write some bytes, bro"
<diondokter[m]>
More seriously, I added a CRC purely for the length. But a 16-bit CRC for a 16-bit length is way overkill. Anybody know of a good error correcting crate for small data?
<JamesMunns[m]>
Passing thought, I'd love to figure out how to promote a blog post push in December, to get more people to write more things about either stuff they DO know, or learned about this year, or for questions like this, like "I know X, Y, Z; but haven't found an answer for A, B, C"
<JamesMunns[m]>
diondokter[m]: honestly, just write the value twice?
<Darius>
lol I was just about to say that
<JamesMunns[m]>
like, if you're gunna have 100% overhead, just write the data twice
<diondokter[m]>
JamesMunns[m]: But which is the correct one then?
<JamesMunns[m]>
if they disagree they are both wrong and it's garbo.
<diondokter[m]>
I'd like some error correction :D
<Darius>
you will know there is an error at least
<JamesMunns[m]>
ah, you want correction, not detection :D
<Darius>
you need bigger blocks for error correction AFAIK
<JamesMunns[m]>
CRC doesn't give you correction either, I think?
<diondokter[m]>
No, only detection
Guest7221 has joined #rust-embedded
<diondokter[m]>
There is hamming code for example
<diondokter[m]>
Hamming is pretty much adding an awful lot of parity bits
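For illustration, a hand-rolled Hamming(21,16) sketch: 5 parity bits protect a u16 against any single flipped bit, and one extra overall parity bit would give SECDED and still fit the 4-byte budget discussed later. This is a sketch, not code from any crate mentioned here.

```rust
/// Encode 16 data bits into a 21-bit codeword (bit positions 1..=21 of the u32).
/// Parity bits sit at the power-of-two positions 1, 2, 4, 8 and 16.
fn hamming_encode(data: u16) -> u32 {
    let mut code: u32 = 0;
    // Scatter the 16 data bits into the positions that are not powers of two.
    let mut d = 0;
    for pos in 1..=21u32 {
        if !pos.is_power_of_two() {
            code |= (((data >> d) & 1) as u32) << pos;
            d += 1;
        }
    }
    // Parity bit p covers every position whose index has bit p set.
    for p in [1u32, 2, 4, 8, 16] {
        let mut parity = 0;
        for pos in 1..=21u32 {
            if pos & p != 0 {
                parity ^= (code >> pos) & 1;
            }
        }
        code |= parity << p;
    }
    code
}

/// Returns the decoded data and whether a single-bit correction was applied.
/// Two or more flipped bits are not reliably handled by this basic variant.
fn hamming_decode(mut code: u32) -> (u16, bool) {
    let mut syndrome = 0u32;
    for p in [1u32, 2, 4, 8, 16] {
        let mut parity = 0;
        for pos in 1..=21u32 {
            if pos & p != 0 {
                parity ^= (code >> pos) & 1;
            }
        }
        if parity != 0 {
            syndrome |= p;
        }
    }
    // A non-zero syndrome is the position of the flipped bit; flip it back.
    if syndrome != 0 && syndrome <= 21 {
        code ^= 1 << syndrome;
    }
    // Gather the data bits back out of the non-power-of-two positions.
    let mut data = 0u16;
    let mut d = 0;
    for pos in 1..=21u32 {
        if !pos.is_power_of_two() {
            data |= (((code >> pos) & 1) as u16) << d;
            d += 1;
        }
    }
    (data, syndrome != 0)
}
```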
<JamesMunns[m]>
maybe there's a fancy way to do it, but not sure. adamgreig has shown me cool things like raptor codes for FEC, but I think you'll need more space, it'd be better over a whole block or something
<JamesMunns[m]>
yeah, you could probably Gray-code the 16-bit value into a 32-bit symbol, but it'd still be weak to an error run
<diondokter[m]>
Yeah, maybe it's not worth it
<JamesMunns[m]>
sentences from wikipedia that sound unhinged without context:
<JamesMunns[m]>
> Snake-in-the-box codes, or snakes, are the sequences of nodes of induced paths in an n-dimensional hypercube graph, and coil-in-the-box codes,[73] or coils, are the sequences of nodes of induced cycles in a hypercube.
<adamgreig[m]>
<diondokter[m]> "More seriously, I added a CRC..." <- If you just want detection you can easily 4 or 5 or 8 bit crc with very good detection properties for 16 bit data, but correction is harder... hamming code is probably simplest for this small a block
<adamgreig[m]>
I wonder if you could make a custom ldpc code
<adamgreig[m]>
Probably be quite fun. Really depends on your expected error model though because probably if anything goes wrong the whole word is unreadable
<diondokter[m]>
Errors could be bitflips due to radiation or bad SPI signal, but also burst errors
<diondokter[m]>
Thing is that I don't know much about this field except for the computerphile videos about hamming codes that I watched years ago
<JamesMunns[m]>
I feel like you could get a lot of error recovery out of (32, 16) coding, since only 1/65536 patterns are actually valid
<diondokter[m]>
That's my feeling as well
<JamesMunns[m]>
but a 4 byte error, or more specifically, "you just didn't do the (full) write" is more of your burst error case
<JamesMunns[m]>
basically, have a header/footer that describes the payload itself, then error detect/correct the header separately from the body itself
<diondokter[m]>
Yeah, an item is now stored as 'length, length crc, data crc, data'
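Roughly the kind of layout being described, sketched with the `crc` crate; the field widths and CRC choices here are assumptions for illustration, not sequential-storage's actual on-flash format:

```rust
use crc::{Crc, CRC_16_IBM_SDLC, CRC_32_CKSUM};

/// Sketch: serialize one item as [length | length CRC | data CRC | data].
/// Assumes `buf` is at least `8 + data.len()` bytes long.
fn write_item(buf: &mut [u8], data: &[u8]) -> usize {
    let len = data.len() as u16;
    let len_crc = Crc::<u16>::new(&CRC_16_IBM_SDLC).checksum(&len.to_le_bytes());
    let data_crc = Crc::<u32>::new(&CRC_32_CKSUM).checksum(data);
    buf[0..2].copy_from_slice(&len.to_le_bytes());
    buf[2..4].copy_from_slice(&len_crc.to_le_bytes());
    buf[4..8].copy_from_slice(&data_crc.to_le_bytes());
    buf[8..8 + data.len()].copy_from_slice(data);
    8 + data.len()
}
```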
<JamesMunns[m]>
yeah, I mean you might as well also check the data crc, because that could get corrupted too
<diondokter[m]>
But I also need to think about performance because flash can be very slow
<diondokter[m]>
I can't be reading all data all the time
<JamesMunns[m]>
im sorta thinking on this topic atm, for postcard-rpc, since that adds a header, and I'm thinking about how to:
<JamesMunns[m]>
* error detect header issues
<JamesMunns[m]>
* error detect body issues
<diondokter[m]>
Detection can just be adding a sufficiently large CRC
<JamesMunns[m]>
and also sorta thinking about if there are better ways to do framing than cobs, as well
<JamesMunns[m]>
tho, cobs is hard to beat, in terms of complexity:effectiveness ratio :p
<JamesMunns[m]>
but maybe wire encoding is sufficiently different than disk encoding, at least if you want to be able to read in a memory mapped kinda way, where you can't use encoding that changes the "byte stream"
<diondokter[m]>
Really depends on what your requirements are. On a reliable (like tcp) connection you can get away with encoding the length. But with unreliable (like uart) that's insufficient
<JamesMunns[m]>
Tho the thought of doing cobs-on-flash with 0xFF as the sentinel byte is funny to me
<JamesMunns[m]>
then blank flash is just all "end of frame" bytes :p
<diondokter[m]>
Yeah with wire encoding you get everything in ram
<diondokter[m]>
Already kinda doing that by having the requirement that the header must not be all 0xFF so I can easily detect if there's a header or not
<JamesMunns[m]>
diondokter[m]: for the record, I wouldn't call tcp reliable enough for that. I'd only consider tcp+tls enough personally
<diondokter[m]>
JamesMunns[m]: Hmmm... because of man in the middle attacks?
<JamesMunns[m]>
> Traces of Internet packets from the past two years show that between 1 packet in 1,100 and 1 packet in 32,000 fails the TCP checksum, even on links where link-level CRCs should catch all but 1 in 4 billion errors. For certain situations, the rate of checksum failures can be even higher: in one hour-long test we observed a checksum failure of 1 packet in 400. We investigate why so many errors are observed, when link-level CRCs
<JamesMunns[m]>
should catch nearly all of them.
<JamesMunns[m]>
TCP checksums are just not very strong
<diondokter[m]>
Ah ok
<diondokter[m]>
Again, depends on your requirements :P
<JamesMunns[m]>
> After an analysis we conclude that the checksum will fail to detect errors for roughly 1 in 16 million to 10 billion packets
<JamesMunns[m]>
so yeah, tcp might be strong enough, but the chance of desync or corruption IMO isn't great, once you hit some kind of scale.
<JamesMunns[m]>
but in TLS if you get corruption the whole thing fails to decrypt and you get a terminated connection instead of silent corruption/desync
<JamesMunns[m]>
UDP and TCP just use a simple crc16, iirc
<JamesMunns[m]>
I wonder tho, what's your plan if you detect and correct an error? just leave the partially corrupted data on disk and keep on moving silently?
<diondokter[m]>
So far it's being ignored
<JamesMunns[m]>
not saying that's unreasonable, but (how) do you surface that to the caller?
<JamesMunns[m]>
like "hey somethings weird but not too weird?", then what do *they* do about that?
<diondokter[m]>
Mostly to make sure not everything is corrupted and we can move on
<JamesMunns[m]>
yeah, I guess you don't do bad block detection/avoidance? It's probably hard to tell if it was just a failed write due to power loss, or actual failure of bits on the flash
<diondokter[m]>
Previously when something happened it would return a corrupted error.
<diondokter[m]>
The only action then is to erase everything and the user would lose all data.
<diondokter[m]>
I want to make that better so the user only loses the data that is actually corrupted
<diondokter[m]>
What I could do is still return corrupted data, but with it marked as being corrupted
<diondokter[m]>
That way the user can still try to recover some of it
<diondokter[m]>
Just FYI, the error correction is just for the header length. Because if that breaks recovery is much more difficult and bad for performance
<JamesMunns[m]>
I see what you mean wrt the len being most important data, even more than the payload crc itself, because then you can still recover with only losing one block
<JamesMunns[m]>
you could put a "footer sentinel" at the end of the block, so that you can read/check/recover the len, but still validate it points to a valid "end of frame", but then I guess "what do you do if the end of frame sentinel was corrupted". hmm.
<diondokter[m]>
I can still recover right now by reading up to the corrupted block and then trying to interpret the next flash word as the start of the next block. However, this is awful for performance
<JamesMunns[m]>
Yeah, I guess there's no need for a footer sentinel if you just see if there is a reasonable header (or blank I guess?) right after the block, but then yeah, you need to read the next header too.
<JamesMunns[m]>
crc32-cksm is almost certainly "good enough", if you can afford 4 bytes of overhead
<JamesMunns[m]>
there are likely more efficient choices if you know the length of the data is relatively small for lorawan, where you might only have 1-15 byte messages or something
<mameluc[m]>
hmm good point
<JamesMunns[m]>
doesn't lorawan also have a protocol level crc check? I have to look
<mameluc[m]>
come to think of it, lorawan already is checksummed in some way. I have to check that out also
<JamesMunns[m]>
yeah
<mameluc[m]>
yes, don't know why I didn't remember that
<diondokter[m]>
LoRaWAN has a Message Integrity Code (MIC) in its payload format
<diondokter[m]>
You don't really need an additional CRC indeed
<mameluc[m]>
so then only the flash portion should be checked, there I can afford the overhead
<JamesMunns[m]>
diondokter[m]: (unless you want stronger guarantees than lorawan provides)
<diondokter[m]>
JamesMunns[m]: Haha, the answer is always 'it depends' :P
<mameluc[m]>
diondokter[m]: thanks for looking it up
<JamesMunns[m]>
and lorawan messages are encrypted as well, right?
<diondokter[m]>
I've worked with LoRaWAN a lot in the past (even made a fully certified stack) and it's never been an issue
<diondokter[m]>
Yes
<diondokter[m]>
Doubly encrypted even
<JamesMunns[m]>
Yeah, I'd probably say you can trust lorawan frames are "valid if received, or not received at all" at the application layer, IIRC
<diondokter[m]>
Kind of
<diondokter[m]>
diondokter[m]: kind of
<JamesMunns[m]>
So for lorawan, the stack will tell you the message len, so I'd probably just use regular postcard "from bytes" on that for received, and "to slice" for sending
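A sketch of that round trip; the `Reading` type and its fields are made up for illustration (needs `serde` with the derive feature):

```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct Reading {
    level_mm: u16,
    battery_mv: u16,
}

fn roundtrip() {
    let msg = Reading { level_mm: 742, battery_mv: 3100 };

    // Sending: serialize into a stack buffer and hand the used slice to the stack.
    let mut buf = [0u8; 16];
    let used = postcard::to_slice(&msg, &mut buf).unwrap();

    // Receiving: the stack gives you the exact payload length, so no extra framing.
    let decoded: Reading = postcard::from_bytes(used).unwrap();
    assert_eq!(decoded, msg);
}
```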
<JamesMunns[m]>
for flash, you could just use the crc32 flavor, which I think uses crc32-cksm
<JamesMunns[m]>
ah no, you have to give it a digest
<JamesMunns[m]>
so it'll use whatever crc32 flavor you give it
<mameluc[m]>
<diondokter[m]> "I've worked with LoRaWAN a lot..." <- did you do it in rust? I have dabbled with lorawan but I copied a lot from rust-lorawan and added maccommands and some 1.0.4 stuff
<JamesMunns[m]>
`crc::CRC_32_CKSUM` is the polynomial used by:
<JamesMunns[m]>
> ISO 3309 (HDLC), ANSI X3.66 (ADCCP), FIPS PUB 71, FED-STD-1003, ITU-T V.42, ISO/IEC/IEEE 802-3 (Ethernet), SATA, MPEG-2, PKZIP, Gzip, Bzip2, POSIX cksum,[52] PNG,[53] ZMODEM, many others
<diondokter[m]>
mameluc[m]: No, that was in the period of 2017 to 2019 in C++. I only started with Rust near the end of 2019
<diondokter[m]>
Yeah we had a discussion of this some days ago. The result was that it was likely best that if you have a forbidden value (0 in my case), just detect that and change it to a non-forbidden value
<JamesMunns[m]>
but honestly IMO even the baseline 1/2^32 guarantee is strong enough for most cases
<diondokter[m]>
JamesMunns[m]: The zoo page does that too with the numbers in-between the `{ .. }`
<mameluc[m]>
I basically save the security context of lorawan to flash. Should probably encrypt it anyway for production and just start over if it is invalid. I would get an invalid security context warning from my network and I would have to reset the security context there if there was a problem
<JamesMunns[m]>
encryption usually includes a MAC, which also gets you (often much stronger) baseline guarantees, since the signatures are (much) longer than 32 bits, but are less "efficient", in that they typically have lower HD qualities
<JamesMunns[m]>
but yeah, if you chacha-poly1305 the contents onto external flash, I'd probably consider that good enough where I wouldn't add a crc32 inside the encrypted block
<mameluc[m]>
yeah there is not really anything I can do if the flash content is invalid. Would need to implement some kind of error correction or something but at that point I have bigger problems I guess
<JamesMunns[m]>
yep, detect+fail is often a very very reasonable choice
<JamesMunns[m]>
error correction is more important when you CAN'T re-request the data, or the cost is astronomically large
<mameluc[m]>
I only write on join and that should happen mostly on power on. Only other time is when the fcnt rolls over and that is 32 bit. Point being I should not burn through the flash
<mameluc[m]>
that stuff could be stored in sram2 but then I'd need a supercap or something to keep it powered when the battery is removed, too much hassle
<diondokter[m]>
mameluc[m]: It's good practice IMO to join once in a while, like maybe every 100 days
<mameluc[m]>
diondokter[m]: good point, I should do that
<JamesMunns[m]>
at least w/ chacha8poly1305, I got pretty good speeds on a 64mhz nrf52
<mameluc[m]>
the spec for increasing the power level for lorawan in case of connection loss is quite strange and slow. While sending once per day it would take like 30 days to restore the connection in the worst case, if I understood it correctly
<JamesMunns[m]>
for stuff you almost never read/write, you could probably use the full chacha20 instead of chacha8, the poly1305 is probably the most expensive part anyway
<JamesMunns[m]>
the "too much crypto" paper said chacha8 was probably "good enough", but ymmv.
<diondokter[m]>
Ha, not sure how snow affects LoRa :P
<mameluc[m]>
depends on how many meters of snow there is ;)
<JamesMunns[m]>
ah is this your "sensor in a hole in the ground" project?
<mameluc[m]>
yes kind of, it is for septic tanks out in peoples yards. The device will be above ground
<diondokter[m]>
Dang, here I always hope for 2 snow days with like 5-10 cm of snow. I almost can't imagine a meter or 2 of snow
<mameluc[m]>
(even though the bosses and the clients don't want something sticking up but reality is quite persistent in its views on how stuff should work)
<diondokter[m]>
mameluc[m]: The company I worked for has devices inside underground metal trash containers. Still works (with ADR)!
<JamesMunns[m]>
btw, chacha20-poly1305 is specifically an AEAD algorithm, so more comparable to AES-(SIV|GCM|GCM-SIV): https://github.com/RustCrypto/AEADs
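A rough sketch with the RustCrypto `chacha20poly1305` crate (alloc feature); key and nonce handling is deliberately oversimplified here and only for illustration:

```rust
use chacha20poly1305::{
    aead::{Aead, KeyInit},
    ChaCha20Poly1305, Key, Nonce,
};

fn seal_and_open() {
    let cipher = ChaCha20Poly1305::new(Key::from_slice(&[0x42; 32]));
    // The nonce must never repeat for the same key; a persisted counter works.
    let nonce = Nonce::from_slice(&[0u8; 12]);

    let sealed = cipher.encrypt(nonce, b"security context".as_ref()).unwrap();
    // Any corruption of `sealed` makes decrypt() return an error instead of garbage,
    // which is why an extra CRC inside the encrypted block adds little.
    let opened = cipher.decrypt(nonce, sealed.as_ref()).unwrap();
    assert_eq!(&opened[..], &b"security context"[..]);
}
```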
<mameluc[m]>
diondokter[m]: lora be crazy some times. Yesterday I tested my device in a cellar that is under ground and that only worked on full power and the network wanted to up the nbtrans (redundant sends).
<mameluc[m]>
so yes I got the data through but it would drain the battery fast
<mameluc[m]>
but the roof was metal and the walls had rebar so not exactly a good case
<diondokter[m]>
Turn it around though, I think it's really impressive it still manages to work in that situation!
<JamesMunns[m]>
<mameluc[m]> "> <@jamesmunns:beeper.com> ```..." <- which stm32wl are you using again?
<mameluc[m]>
yeah I was quite surprised. my phone lost 4g there
<mameluc[m]>
stm32wle (the one core thingy)
<mameluc[m]>
I think I will land on a module with STM32WLE5CC
<mameluc[m]>
st has a lot of the key management/security figured out, I just have to implement it
<mameluc[m]>
but I am not going to look at FUOTA at least for now
<diondokter[m]>
mameluc[m]: FUOTA is hard when you can either send like 100 bytes per day or need to coordinate all/many devices to update at the same time
<mameluc[m]>
yeah, for my application where I only need to support a couple of sensors I decided to not look at it. I have to choose my battles
<JamesMunns[m]>
having done some work looking at lorawan FUOTA recently, I'd almost recommend doing your own custom thing over implementing their algorithm :p
<mameluc[m]>
JamesMunns[m]: good to know
<mameluc[m]>
my process was basically to read through all st lorawan code and documentation. Try it out for some stuff. Realize I don't want to work with C anymore and then I dove straight into rust. Not the most efficient way to get out a product but I would have tired of the project if I continued with C.
<JamesMunns[m]>
but yeah, their fragmentation protocol means you can only send 2^14 - 1 frames, including all redundant parity frames, and the minimum dynamic packet size is as small as 11 bytes of payload, IIRC
<JamesMunns[m]>
but... it would take a very long time to send 16383 packets
<mameluc[m]>
yeah, doesn't really make sense for my stuff. FUOTA over lorawan with battery powered devices is quite tricky. With unlimited power it might make more sense
<mameluc[m]>
my goal is to create a device that is as stupid as possible and very much tailored for the project. No one thing for everything. I am trying to get rid of all feature creep for this one.
<mameluc[m]>
If somebody wants a custom sensor in a basement this one is not for them. I would probably go with wall power, nb-iot and battery back up for that stuff.
<mameluc[m]>
okay enough ranting
IlPalazzo-ojiisa has quit [Quit: Leaving.]
epistax has joined #rust-embedded
epistax has quit [Client Quit]
<diondokter[m]>
Ok, conclusion, Reed-Solomon isn't great for small bits of data
<dirbaio[m]>
talking CRCs again huh :D
<diondokter[m]>
Haha yep
<diondokter[m]>
Do you know of an error correction method for a u16?
<diondokter[m]>
Storing a u16 in 4 bytes
<dirbaio[m]>
i've seen some store (x, !x)
<diondokter[m]>
I don't see how you can correct with that... Detect? sure, but correct?
<dirbaio[m]>
ahhh
<dirbaio[m]>
dunno then!
<diondokter[m]>
Haha ok, thanks!
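For reference, the `(x, !x)` scheme gives detection only, which is the limitation being pointed out above; a sketch:

```rust
/// Store a u16 as (value, !value): 4 bytes, detects most corruption, corrects nothing.
fn encode(value: u16) -> [u8; 4] {
    let mut out = [0u8; 4];
    out[0..2].copy_from_slice(&value.to_le_bytes());
    out[2..4].copy_from_slice(&(!value).to_le_bytes());
    out
}

fn decode(raw: [u8; 4]) -> Option<u16> {
    let v = u16::from_le_bytes([raw[0], raw[1]]);
    let inv = u16::from_le_bytes([raw[2], raw[3]]);
    // If the halves disagree, all you know is that something is wrong.
    // Note that an erased word (all 0xFF) also fails this check.
    (v == !inv).then_some(v)
}
```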
<dirbaio[m]>
I decided to stay away from ECC in EKV to limit rabbit holes
<dirbaio[m]>
how likely it is that you get a bit error if the flash is in good condition?
<diondokter[m]>
Yeah, I've got something decent working with just CRC, but it'll work better if I can encode the length with an error correcting algo
<dirbaio[m]>
for the "still be able to read other keys if one key is corrupted" thing?
<diondokter[m]>
Yes
<dirbaio[m]>
fun
<diondokter[m]>
I can still recover, but now I need to read word for word and check if it can be a valid header. Super bad for performance
<diondokter[m]>
Maybe it's not worth it
<dirbaio[m]>
that's vulnerable to "bad luck" if a payload is a valid header by accident tho
<diondokter[m]>
Yep
<diondokter[m]>
Hence my desire for some error correction. I feel like it should be possible!
<dirbaio[m]>
in EKV I just made it give up and return Err(Corrupted)
<dirbaio[m]>
only way out is to reformat
<dirbaio[m]>
for my use case I can always redownload the entire config from the server so π€·
<diondokter[m]>
Doesn't work well when the server url is stored in the config in flash :D
<dirbaio[m]>
hahahaha
<dirbaio[m]>
oof
<dirbaio[m]>
I have "factory config" and "user config"
<diondokter[m]>
Right, that helps
<diondokter[m]>
We have a fallback to a default too, but still
<dirbaio[m]>
server url is in factory config, and device never writes to factory config so it should never corrupt
<dirbaio[m]>
and it's in internal flash even on products that do have external qspi flash
<dirbaio[m]>
except we just launched a product that uses wifi
<dirbaio[m]>
and guess where I put the wifi config (SSID, password)?
<dirbaio[m]>
on user config
<diondokter[m]>
Hahaha
<dirbaio[m]>
because the user does modify those... I guess I'll need a 3rd partition "user config but for important stuff that we should try hard to never erase"
<dirbaio[m]>
🥲
azzentys117[m] has joined #rust-embedded
<azzentys117[m]>
dirbaio[m]: Is this for A/B kind of update?
<dirbaio[m]>
no, it's for device configuration. there's a web admin panel where the user configures stuff, the configuration gets synced to all devices so devices have it even if they're offline
<dirbaio[m]>
erasing the on-device config is no big deal because it can be downloaded again
<dirbaio[m]>
except if the config had the details for connecting to wifi, because then you're stuck offline and someone has to physically come to the device to rescue it
robk5225 has joined #rust-embedded
rob5225 has quit [Ping timeout: 276 seconds]
<mameluc[m]>
maybe implement soft-delete and ability to store last couple of configs so it can revert on network loss?
<dirbaio[m]>
configs can be huge, up to megabytes
<dirbaio[m]>
can't store multiple copies
<mameluc[m]>
you have probably thought about this a lot already but what about only saving the last working network settings? (that is almost what you talked about)
<dirbaio[m]>
yeah that's what the "user config but for important stuff that we should try hard to never erase" partition would be
<mameluc[m]>
also check the legality of saving the user's wifi credentials even if they tell the device to forget them. Some notification is probably needed
<dirbaio[m]>
the device already supports multiple "network profiles", the server will only erase an old one once connectivity with a newer one has been verified
<dirbaio[m]>
so you can never lock yourself out :D
<dirbaio[m]>
the only thing left is ... don't store the network profiles in the general "user config" that might get erased π€£
<mameluc[m]>
small detail
<dirbaio[m]>
yep
<dirbaio[m]>
it also gets erased when I deploy a firmware update that does breaking changes to the EKV on-disk format ...
<dirbaio[m]>
that's how I "discovered" this design issue lol
<mameluc[m]>
I have had almost the exact problem in an old project. I guess everything is small details when it comes to embedded
<diondokter[m]>
Alright, sequential-storage v0.6.0 has been released!
<diondokter[m]>
Thanks for all the help here!
<diondokter[m]>
I gave up on error correction (for now).
<adamgreig[m]>
hi @room, meeting time again! agenda is https://hackmd.io/PR6ImqL5SCqSVQ6HT4zeyQ, please add anything you have to announce or discuss and we'll start in a few mins
<adamgreig[m]>
i've put a note about embedded-hal-mock, thanks Ralph
therealprof[m] has joined #rust-embedded
<therealprof[m]>
adamgreig[m]: Can you make it public?
<therealprof[m]>
Oh sorry, it is but is seems the link was the rendered version.
thejpster[m] has joined #rust-embedded
<thejpster[m]>
'sup, bitslingers
<therealprof[m]>
therealprof[m]: Or hackmd decided for me. ;)
<JamesMunns[m]>
Away from computer, but "we all good to talk about PAC stuff next year?" might be good to pre pre pre discuss at the end of the meeting
<adamgreig[m]>
so long as we don't veer into a pre pre discussion :p
<JamesMunns[m]>
adamgreig[m]: I believe in you, Mr. Moderator!
<adamgreig[m]>
ok, let's start!
<adamgreig[m]>
I only have one announcement, which is that svd2rust 0.30.3 is out with a few small changes, including making sure generated vectors are repr(c) :p
<cr1901>
Maybe I can rip out json RPC out of my notifier app now
<therealprof[m]>
Can I get fries with that?
<JamesMunns[m]>
therealprof[m]: If `Fries: Serialize + DeserializeOwned`, yes!
<adamgreig[m]>
next up, we now have a plan to release embedded-hal 1.0 on dec 28th, I guess we should update the 1.0 tracking issue and keep encouraging people to try out the rc with their hals and drivers
<cr1901>
A late Christmas miracle?
<adamgreig[m]>
we also still need the migration guide by then
<therealprof[m]>
Holiday feast
<adamgreig[m]>
korken89 opened an issue about the spi stuff in https://github.com/rust-embedded/embedded-hal/issues/521, which is basically that before, drivers basically had access to the bus, so if they needed precise timing control between say CS assert and clock starting, they could do that themselves
<adamgreig[m]>
now, drivers just get a SpiDevice which handles CS for them, but there's no way to control small delays cs-to-data and data-to-cs, which lots of devices specify a minimum time for in their datasheets
<adamgreig[m]>
often that minimum time is very short (tens or hundreds of ns) and so it elapses naturally in the normal time taken between asserting cs and starting an spi operation, but it isn't guaranteed and might be very short on fast microprocessors (or devices that need a very long delay)
<adamgreig[m]>
historically we've said that stuff like SPI clock speed, phase, and polarity are "configuration" that the user takes care of and the driver doesn't know about, so this could be lumped in with that... but typically there's no way for the user to configure this
burrbull[m] has quit [Quit: Idle timeout reached: 172800s]
<adamgreig[m]>
we do have a DelayUs operation in SpiOperation, but that's generally too coarse for these sorts of timings, and I don't think we want DelayNs because you can't support it on Linux for example
<dirbaio[m]>
well we could "support" it by rounding up
<adamgreig[m]>
anyone have any thoughts? maybe we just leave it as-is because it more or less works for everything already; if anything has a really long delay it can use DelayUs
<adamgreig[m]>
my gut feeling is that the type of delay DelayUs is meant to be for is very different from the sort of "bus timing" thing that is being asked here
kevlan[m] has joined #rust-embedded
<kevlan[m]>
Specify it in number of nop instructions to run in a loop since that's how I usually implement those sort of delays?
<JamesMunns[m]>
Can we make it DelayDuration instead? :p
<cr1901>
I like the idea of rounding up DelayNs if not supported
spinfast[m] has joined #rust-embedded
<spinfast[m]>
Isn't some configuration needed to write a driver for something like a sensor, flash, fram type device sitting on a bus?
<adamgreig[m]>
but otoh it's not really "configuration" because the CS pin is generally driven by software anyway
<JamesMunns[m]>
kevlan[m]: Weve learned nop timing is terribly unportable :p
<spinfast[m]>
like without that, it would be incredibly hard to say write a driver for like a bosch/tdk/infineon/whatever device against the traits alone
<adamgreig[m]>
spinfast: as in, bus clock speed and phase/polarity?
<kevlan[m]>
JamesMunns[m]: Forgot the :sarcasm: tag on that π
<adamgreig[m]>
that does need to be configured of course, but right now that's seen as "out of scope" and the user configures the bus
<spinfast[m]>
I see, and how does the bus get reconfigured for each device on it?
<spinfast[m]>
e.g. one device needs one configuration, another needs a different one
<adamgreig[m]>
we could have DelayDuration and tell HALs to round up to the best precision they can, then on Linux it would be Β΅s, which would tank performance for some situations but what can you do
<adamgreig[m]>
spinfast: currently, no good answer to that question, except that whatever thing is sharing the bus can provide it somehow (i.e. it's outside the traits)
<adamgreig[m]>
embedded-hal-bus provides things that turn an SpiBus into multiple SpiDevices, ensuring each transaction gets exclusive access to the bus for the duration of the transaction
<adamgreig[m]>
when the user requests a new SpiDevice from that sharing struct, they could say "and use this configuration for this device", but that doesn't exist at the moment
<spinfast[m]>
I mean it makes sense to lump in cs delay with that
<adamgreig[m]>
right now there's no shared trait for "SPI configuration", so such a thing can't exist cross-HAL
<spinfast[m]>
but yeah, feels like there should be some answer for common configuration setup at some point
<adamgreig[m]>
which I think in practice means each HAL would have to ship its own bus sharing struct too
<spinfast[m]>
otherwise it really makes drivers near impossible to write in a general way
<adamgreig[m]>
well, the drivers can be written perfectly generically, they just can't be used or composed :p
<adamgreig[m]>
the spidevice contract is clear that it just talks about data/transactions, so the driver documentation has to tell the user what configuration is required and it's then on them to set it
<adamgreig[m]>
still, I agree it's a deficiency and means in practice you probably can't share the bus between devices with incompatible SPI protocols
<adamgreig[m]>
luckily almost everything works in both mode 1 and 3 and then you just set the clock to the lowest common clock and it should work :P
<therealprof[m]>
Did that actually ever work?
<adamgreig[m]>
but perhaps the SpiBus trait needs to learn about "configuring the spi bus", clock+phase+polarity plus perhaps CS timings, and the HALs could support that, and then embedded-hal-bus's sharing structs could use that to allow reconfiguring the bus for each device
<spinfast[m]>
couldn't spi change the clock depending on the device?
<spinfast[m]>
chip select should gate things (its not i2c)
<therealprof[m]>
spinfast[m]: With a super complex API, sure.
<adamgreig[m]>
yes, it's normal to do that in embedded, just right now we don't have a shared interface for it
<thejpster[m]>
fun fact - SD cards need you to change clock speed in between transactions, and to send bytes into the SPI bus with no chip select selected
<adamgreig[m]>
hm, do you have to change clock speed? I thought you can use a constant lower speed if you wish
<adamgreig[m]>
the clocks without CS part does make it annoying though
<therealprof[m]>
thejpster[m]: Isn't that handled by special peripherals?
<adamgreig[m]>
but we've already discussed that for sd cards :p
<thejpster[m]>
adamgreig: you have to init at under 400 kHz
<adamgreig[m]>
the special peripherals are for SDIO/"SD card" mode, rather than their SPI mode
<thejpster[m]>
therealprof: you can use a 4-bit SD/MMC peripheral, but SD cards also have a 1-bit SPI mode
<JamesMunns[m]>
I use SPI for ws2812b devices too but I wouldn't call them an SPI device :p
<therealprof[m]>
adamgreig[m]: I see.
<therealprof[m]>
JamesMunns[m]: No, that's more a custom signaling that happens to be SPI compatible.
<adamgreig[m]>
yea, that use case is covered under "just use SpiBus directly", since it can't really be shared with other devices anyway (no CS)
<adamgreig[m]>
but in theory SD cards are real SPI, except they require you be able to transmit without CS, which our current trait doesn't really support
<adamgreig[m]>
short of some dodgy hacks anyway
<adamgreig[m]>
for example, we could solve the "CS timing delay" thing with a custom OutputPin that you pass as your CS pin, and it does a short nop delay after being asserted
<adamgreig[m]>
but that's annoying for a driver to have to do, it would be something the application firmware does
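A sketch of that workaround against the embedded-hal 1.0 digital traits; the spin-loop length is a made-up placeholder rather than a calibrated delay:

```rust
use embedded_hal::digital::{ErrorType, OutputPin};

/// CS pin wrapper that busy-waits after asserting (driving low), so the
/// device's CS-to-clock setup time is respected even on fast cores.
struct DelayedCs<P> {
    pin: P,
}

impl<P: OutputPin> ErrorType for DelayedCs<P> {
    type Error = P::Error;
}

impl<P: OutputPin> OutputPin for DelayedCs<P> {
    fn set_low(&mut self) -> Result<(), Self::Error> {
        self.pin.set_low()?;
        // Crude, uncalibrated delay; real code would use a cycle-accurate delay.
        for _ in 0..100 {
            core::hint::spin_loop();
        }
        Ok(())
    }

    fn set_high(&mut self) -> Result<(), Self::Error> {
        self.pin.set_high()
    }
}
```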
<adamgreig[m]>
anyway, it does seem like a limitation of the current model, with a few workarounds or things we could change, but if we did want to extend SpiBus to support configuration that would be a breaking change, so it had better get in before 1.0
<adamgreig[m]>
I guess it comes down to "SPI configuration isn't purely an init-time problem, because it needs to be reconfigured for each device you share the bus between"
<therealprof[m]>
spinfast[m]: To reconfigure the SPI bus you have to do all sorts of nasty register juggling. Are you sure that API supports that under the hood and on-the-fly? That would be impressive.
<adamgreig[m]>
whereas the original goal with these traits was "neglect init-time setup, leave that to the HAL, only handle runtime stuff"
jessebraham[m] has quit [Quit: Idle timeout reached: 172800s]
<spinfast[m]>
therealprof[m]: It does, each transaction can reconfigure the device (it is a fallible operation)
<adamgreig[m]>
the zephyr API definitely supports that, yea, it's not complicated though
<adamgreig[m]>
you pass it the config you want for each transaction, if it's different to the current config the driver reconfigures the spi peripheral
<adamgreig[m]>
linux does this too, each spidev (SpiDevice) is associated with a bus config
<therealprof[m]>
spinfast[m]: Guess the drivers are having a mighty good time then.
<adamgreig[m]>
I don't think it's too hard for the driver? they already know how to configure the peripheral, of course, so they just check if the current config is OK and reconfigure if not
<spinfast[m]>
the hardest part is usually around clocking and doing a best effort to find a clock config
<therealprof[m]>
Yeah, but you might also have to change clock configuration and whatnot.
<spinfast[m]>
the rest is usually a bit flag change
<adamgreig[m]>
changing clock config will still be inside the spi peripheral
<adamgreig[m]>
our API would probably be different, because hopefully we assume the config doesn't change once the SpiDevice is constructed so it can stay out of the transaction
<therealprof[m]>
adamgreig[m]: Not necessarily.
<adamgreig[m]>
I'm not sure what your point is exactly
<adamgreig[m]>
clearly it's possible because zephyr does it for all their targets
<adamgreig[m]>
I expect it doesn't ever change whole device clocks for this, so probably it does the highest speed below the speed you ask for that it can accomplish by changing the spi peripheral divider
<adamgreig[m]>
but all our drivers need to be able to set a clock speed, just right now that's their own api, and there's nothing in embedded-hal to make it common
<therealprof[m]>
The point is that some chips have dedicated clocks which feed the peripherals. Some speed changes are not just local to the peripheral but might require changing the system wide clock configuration which could affect other peripherals, too.
<adamgreig[m]>
does anything not have a local divider in the spi peripheral? it seems very commonplace
<adamgreig[m]>
but regardless I'm sure they find a way to handle such a thing, maybe they even panic if somehow your desired config is totally impossible
<spinfast[m]>
at least the ones I've used have a local divider
<thejpster[m]>
(but also checks if the config is unchanged by stashing the pointer to the config struct and doing a pointer comparison, so your configs have to be static...)
<spinfast[m]>
I get not tackling that in 1.0... but really it would make driver writing on top of SPI/I2C etc a lot more generalized, in my very humble opinion
<adamgreig[m]>
thejpster: the configs are `const config*` so in theory that's in the type system :P
<adamgreig[m]>
(the documentation does say it will use pointer comparisons to detect changes, at least)
<JamesMunns[m]>
I don't disagree with spinfast but we really really should ship 1.0 soon, and move on to 1.1 or 2.0 after.
<JamesMunns[m]>
There will be oversights and we will be smarter tomorrow than today
<spinfast[m]>
JamesMunns[m]: No disagreement from me
<adamgreig[m]>
I only agree up to a point
<adamgreig[m]>
if this ends up being a serious enough problem that we did foresee before releasing, it will be a real annoying 2.0
<thejpster[m]>
embedded-hal 2.0 will break the world
<adamgreig[m]>
my gut feeling is more likely we'd just never do it and it will remain a deficiency we have to live with or work around
<JamesMunns[m]>
thejpster[m]: So how do we fix THAT
<thejpster[m]>
we ship embedded-hal 1.<big number>
<adamgreig[m]>
well, I guess we could provide a default impl for "SpiBus::set_bus_config" that is a no-op, ha ha
<JamesMunns[m]>
Like, we should have the humility to admit that 1.0 will be flawed because we shouldn't wait 3 years between every major release
<therealprof[m]>
Well, holding everything until a proven API has emerged and been implemented is probably not going to be a very popular approach.
<thejpster[m]>
we stabilised Rust Embedded in 2018 so it's taken 5 years to get to this point
<adamgreig[m]>
yea, but sticking our head in the sand and ignoring problems people are facing with the rc because we want to ship it isn't great either
<JamesMunns[m]>
adamgreig[m]: Then every driver that doesn't reset config on every transaction could be broken
<therealprof[m]>
adamgreig[m]: It's not a new "problem" but rather a feature request.
<JamesMunns[m]>
I'm not on the Hal team, it's not my call. But I think we should be shipping more often and make it easier to keep drivers up to date at a reasonable 6-12mo cadence. I'll stop arguing the point tho for today.
<adamgreig[m]>
it's a regression - the v0.2 SPI didn't have this issue
<therealprof[m]>
adamgreig[m]: Okay, I must have missed that point.
<adamgreig[m]>
our "SpiBus" trait exists so someone could in principle just require that, which is also our answer for any other complicated or annoying device
<thejpster[m]>
I think it's fine for Driver X to say `<S: DriverXSpiDevice>` where the application author has to `impl DriverXSpiDevice for NrfSpiDevice` for cases where the nuances of SPI bus access are not covered by the HAL
<thejpster[m]>
The HAL is not the only way to ensure freedom to swap drivers between MCUs
<adamgreig[m]>
I don't think the answer should be "oh just don't use embedded-hal", then, though?
<thejpster[m]>
embedded-hal will never cover everything, and if it does, it will be big and inefficient.
<thejpster[m]>
So saying "you don't have to use the embedded-hal" I think is fine?
<adamgreig[m]>
I agree, but I worry where we draw that line, and if we end up excluding a lot of devices then it seems we might have drawn it wrong
Guest7221 has left #rust-embedded [Error from remote client]
<cr1901>
I figured that once we got to 1.0, the pressure for 2.0 would be relaxed
<thejpster[m]>
You can always add a second Spi HAL trait.
<thejpster[m]>
You just can't break the first one
<adamgreig[m]>
if we're already casting out our SD cards, LCD displays, and now also anything where you want to share a bus between devices with different SPI bus modes or clocks...
<adamgreig[m]>
hmm, that's true
<thejpster[m]>
If embedded-hal lets me share a bus between two devices running in the same mode at the same speed, I'd say ship it
<adamgreig[m]>
conceivably we can keep SpiDevice the same and add SpiBusWithConfig:SpiBus
<adamgreig[m]>
that can be part of the migration guide we need, too
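A very rough sketch of that shape; neither this trait nor the config struct exist in embedded-hal today, both are invented here for illustration:

```rust
use embedded_hal::spi::{Mode, SpiBus};

/// Invented per-device bus settings; embedded-hal has Mode/Phase/Polarity types
/// but no shared notion of clock speed or CS timing.
pub struct BusConfig {
    pub max_frequency_hz: u32,
    pub mode: Mode,
}

/// Hypothetical extension trait: a bus that can be reconfigured between
/// transactions, so a sharing layer (a la embedded-hal-bus) could apply each
/// device's config before its transaction, like Zephyr and Linux spidev do.
pub trait SpiBusWithConfig: SpiBus {
    fn set_config(&mut self, config: &BusConfig) -> Result<(), Self::Error>;
}
```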
<thejpster[m]>
embedded-io is at 0.6 and not 1.0-rc1
<therealprof[m]>
That could be easily fixed.
<thejpster[m]>
> NOTE This HAL is still is active development. Expect the traits presented here to be tweaked, split or be replaced wholesale before being stabilized, i.e. before hitting the 1.0.0 release.
<thejpster[m]>
Also, let's delete that :)
<adamgreig[m]>
ok, let's cover a couple other points before we run out of time
<adamgreig[m]>
not sure if anyone from the hal team is around, but there was some discussion about moving embedded-hal-mock into the wg and making sure it's up to date with 1.0
<dirbaio[m]>
if you mock away the hardware, you're testing your driver against a "fake" hardware that does work how you think it works
<Ralph[m]1>
dirbaio[m]: i think it's great if you're writing a driver. i've used this for all my doc-tests (and other tests). it's not exactly easy to have real hardware running in a CI ;)
<Ralph[m]1>
also, it might be that you don't have a HAL for your device which implements the necessary API already (e.g. new API or obscure/outdated HAL).
<Ralph[m]1>
it's not covering everything of course, but a generic driver can never be tested against all hardware. having unit tests against mocks and manual tests on real hardware is a good combination (and has saved my ass several times)
<Danilo[m]>
Hi! If there are any questions: I'm here
<Danilo[m]>
Ralph[m]1: > <@rursprung:matrix.org> i managed to get back to a PC just in time
<therealprof[m]>
dirbaio[m]: I think there's some potential there to do optimisation work.
<Danilo[m]>
Here was the offer to move the repo: https://matrix.to/#/!BHcierreUuwCMxVqOf:matrix.org/$Qr3hJcXPCNirvITUYiMsbPq_uBJAPaZsh57g6ejpTIQ?via=matrix.coredump.ch&via=matrix.org&via=catircservices.org
<dirbaio[m]>
so i'm a bit concerned if we put e-h-mock in the e-h repo it might slow down development, especially if we require new traits to have the mock implementation ready to go
<therealprof[m]>
I don't think we'll have such a requirement.
<thejpster[m]>
dirbaio: slower than it already is?!
<thejpster[m]>
also yes, they don't need to line up 1:1. You can have traits that aren't mocked.
<thejpster[m]>
(switching to mobile ... rootfs ran out of space)
<adamgreig[m]>
I guess having it in its own repo isn't much of an advantage there
<therealprof[m]>
Why not?
<adamgreig[m]>
sorry, I mean, it might as well live in-repo even if it doesn't necessarily stay up-to-date
<adamgreig[m]>
but having it in-repo and not-up-to-date does seem a bit annoying: who then updates it?
<therealprof[m]>
Nah.
<Ralph[m]1>
dirbaio[m]: that suggestion about the requirement came from me (earlier message here in the thread). implementing a mock is usually quite fast. it'd help with having something to implement against from day 0, rather than having to wait for your favourite HAL to implement it (or contribute it there and then wait for that to make it into a release). and it could e.g. be used in doc-tests of e-h itself, so that the API documentation
<Ralph[m]1>
can show how it's supposed to be used (with code which actually builds). might help in figuring out whether an API is easy to use (eat your own dog food and all that)
<dirbaio[m]>
and if you're writing a driver against a trait that's so new that your HAL doesn't impl it yet, how do you even run your driver to see if it works?
<adamgreig[m]>
is it likely to be a big problem post e-h 1.0? I expect dev velocity will slow down a bit anyway as we're hopefully iterating on things less
<Ralph[m]1>
dirbaio[m]: well, you can get started and work based on the spec. it's not optimal, but it's the epitome of test driven development, isn't it? (and who said that the device even already exists; maybe that's also still under development?)
<dirbaio[m]>
TDD only works if you can write test cases that you're sure they're correct
lulf[m] has joined #rust-embedded
<lulf[m]>
I think there is some value to the mocks while implementing a driver, using it in a unit test with assertions that it emits the correct set of commands according to the datasheet. It's not a substitute for the real thing but still...
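The kind of test being described, sketched with embedded-hal-mock's embedded-hal 1.0 I2C mock; the register address and values are made up, and module paths may differ between releases:

```rust
use embedded_hal::i2c::I2c;
use embedded_hal_mock::eh1::i2c::{Mock as I2cMock, Transaction};

const ADDR: u8 = 0x48; // made-up sensor address
const REG_TEMP: u8 = 0x00; // made-up register

/// Tiny generic "driver" function under test.
fn read_temp_raw<I: I2c>(i2c: &mut I) -> Result<u16, I::Error> {
    let mut buf = [0u8; 2];
    i2c.write_read(ADDR, &[REG_TEMP], &mut buf)?;
    Ok(u16::from_be_bytes(buf))
}

#[test]
fn driver_reads_expected_register() {
    let expectations = [Transaction::write_read(ADDR, vec![REG_TEMP], vec![0x12, 0x34])];
    let mut i2c = I2cMock::new(&expectations);
    assert_eq!(read_temp_raw(&mut i2c).unwrap(), 0x1234);
    i2c.done(); // asserts every expected transaction actually happened
}
```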
crabbedhaloablut has quit []
<dirbaio[m]>
a lot of bugs (a majority even, i'd say?) are due to incorrect/incomplete mental model of how the hardware works
<dirbaio[m]>
if you write unit tests before testing against real hardware, you'll write them against the incorrect model
<dirbaio[m]>
and then when you run with real hardware, you'll have more work because now you have to fix the driver and the tests
<therealprof[m]>
If you ever had to stare at a protocol decoder on your oscilloscope, trying to find the incorrect bit of data sent across the wire over and over, you might appreciate the ability to simply write a bit of code to test this with your development machine instead.
<Ralph[m]1>
dirbaio[m]: so far (note: my experience in the embedded world is limited; i'm more at home in the enterprise IT world) i've worked with the real hardware and written tests (with e-h-m) in parallel to make sure that i test what the HW does. this way i was then also able to modify the code when i didn't have the hardware at hand and feel confident enough if i mainly did technical refactorings
<lulf[m]>
It's interesting because I'm generally against mocking for non-embedded, but that's usually because non-embedded software can more easily run 'anywhere', whereas with embedded you need the exact same hardware for real tests.
<therealprof[m]>
lulf[m]: Same.
<adamgreig[m]>
I've not used embedded-hal-mock but for fpga/gateware stuff I always have a model of the external devices to test the gateware drivers against and it generally works really well
<adamgreig[m]>
you definitely do have to be careful that the model is correct though
<adamgreig[m]>
still, we're over time for this meeting, thanks everyone for discussing!
<dirbaio[m]>
therealprof[m]: if you're implementing a *complex protocol* there sure is value in unit testing. I'm talking about writing *drivers*.
<adamgreig[m]>
I think for e-h-mock perhaps best to open an issue where the hal team can discuss? if they're not interested then e-h-community is enough option or just keep maintaining it as-is
<dirbaio[m]>
the first is, for example, implementing a TCP/IP network stack; the latter is implementing a driver for an Ethernet chip
<therealprof[m]>
dirbaio[m]: I'm also talking about drivers... display drivers.
<Danilo[m]>
From my view: I started writing e-h-mock when I was starting to write drivers for I2C based sensors. There, a lot of it is reading datasheets, and handling protocol messages in a certain way, often even type conversions, and iterating on a proper driver API.... (full message at <https://catircservices.org/_matrix/media/v3/download/catircservices.org/YPsJoLDFOXDgLecyYWsBqGIi>)
<adamgreig[m]>
yea, for the fpga stuff it's still drivers like "send messages to configure this adc and then request samples", not protocol level things, and it's super helpful to have the tests because it's very easy to accidentally change what gets sent to the device
<therealprof[m]>
dirbaio[m]: In fact I could have used that when I was debugging a driver for an Ethernet via SPI chip a long time ago.
<Danilo[m]>
In some way, embedded-hal-mock isn't even classical mocking, it's essentially a HAL implementation where you can control how the virtualized hardware-under-test behaves.
<adamgreig[m]>
modelling the device accurately is usually not too bad because it's just "read/write some registers", you just want to make sure the firmware set the right registers to the right values
<dirbaio[m]>
> you just want to make sure the firmware set the right registers to the right values
<dirbaio[m]>
yeah so this is the thing
<dirbaio[m]>
"set register X to value Y" is very simple code, it's quite hard to screw up
<lulf[m]>
dirbaio[m]: I guess it's anyway hard to make any distinction between simple and complex drivers for embedded-hal-mock's POV?
<lulf[m]>
Meaning, if it's useful for complex drivers, then it's... useful?
<adamgreig[m]>
I think it's hard to screw up in straightforward line-by-line code, but it's easy to refactor something and now some method isn't called that does the calibration routine or bla bla bla and you end up not setting some registers that you meant to?
<therealprof[m]>
dirbaio[m]: It gets way more complex if the values are not a single simple integer number anymore...
<dirbaio[m]>
adamgreig[m]: but then you rerun it against real hardware and see it's stopped working and debug it
<adamgreig[m]>
or yea, maybe you're computing the values based on some config struct, or you need to also compute a CRC and send that to the device
<adamgreig[m]>
right, and to check what it's doing you now need a logic analyser or a bunch of debug statements
<adamgreig[m]>
but you could have a fairly simple mock that checks your refactor hasn't changed what the device saw
<adamgreig[m]>
running against real hardware is a good gold standard and you have to do it from the start, but I can still see the value in having mocks to help with refactoring or automated testing or debugging changes in behaviour etc
<adamgreig[m]>
for the fpga stuff it's hard enough just to write out some registers in order
<therealprof[m]>
I started multiple times to write something similar because I had a use for that.
<mameluc[m]>
dirbaio[m]: unless the hardware locks the registers when some bit is set somewhere and you need to do a song and a dance to unlock it. The mocks will never catch this.
<mameluc[m]>
Testing that the right data reaches the driver is probably all you can test without hardware, then it is up to the drivers to deliver the data to the hw.
<adamgreig[m]>
why wouldn't the mocks catch it?
<Danilo[m]>
You can test that the high-level instructions given to the driver are converted into the appropriate low-level instructions for the hardware.
<adamgreig[m]>
I mean, you could have missed it in the datasheet, and so not implemented it
<adamgreig[m]>
but if you read the datasheet, you could have put that logic in the mock too
<dirbaio[m]>
in my experience the extra time you have to spend writing the tests and keeping them up to date is higher than the time you save e.g. debugging a refactor
<dirbaio[m]>
if you write the driver with trace! in the right places
<mameluc[m]>
adamgreig[m]: maybe my mocks just suck, also stm32wl makes me salty
<dirbaio[m]>
you can easily compare traces before and after the refactor
<dirbaio[m]>
and adding traces is like 1/10th the time cost of adding mock-based tests
<adamgreig[m]>
I think I agree with that on balance, which is probably why I've only done this for fpga drivers
<dirbaio[m]>
and you want traces anyway for other reasons, to be able to e.g. tell someone "please give me trace logs" in a github issue
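Something like the following, where the helper and register names are invented; two runs can then be logged to files and diffed after a refactor:

```rust
/// Register-write helper with a trace point (on-target you'd use defmt::trace!
/// instead of the log crate, but the idea is the same).
fn write_reg<S: embedded_hal::spi::SpiDevice>(spi: &mut S, reg: u8, val: u8) -> Result<(), S::Error> {
    log::trace!("spi wr reg={:#04x} val={:#04x}", reg, val);
    spi.write(&[reg, val])
}
```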
<mameluc[m]>
dirbaio[m]: interesting approach, never done it that way but makes sense to me
<adamgreig[m]>
but I can see how the balance could be different for different people, different devices, different circumstances
<Ralph[m]1>
dirbaio[m]: tbh, i've heard the same arguments against all kinds of unit- & integration tests also in enterprise IT. and the arguments could never be confirmed with empirical data; having automated tests is such a huge productivity boost if used correctly (that's where some of these arguments came from; if you test if the substring function provided by your language/runtime works correctly instead of testing your business logic then
<Ralph[m]1>
you're indeed wasting your time - not claiming that you're running into this issue, just mentioning a common issue i've seen)
<adamgreig[m]>
maybe you're writing a driver for your HSN-1000 and it's very inconvenient or expensive to test with real hardware :P
<dirbaio[m]>
tests are a tradeoff between "faster dev" vs "catch more bugs"
<dirbaio[m]>
in embedded, a lot of bugs are due to incorrect model of how the hardware works (as I explained above), so the "catch more bugs" advantage of tests is much weaker
<adamgreig[m]>
or like "you have to have written tests for process/QA/regulatory reasons"
<therealprof[m]>
Tracing is nice and useful but a one-time thing. Tests can be used many times once they're written.
<adamgreig[m]>
that's the one :p
<dirbaio[m]>
as a real world case study: embassy's HIL tests have caught a lot of bugs, and almost none of them would've been caught by mock tests
<therealprof[m]>
Sure.
<dirbaio[m]>
plus, we've refactored drivers a few times that have changed the order and stuff of register writes. these would've required spending extra time updating the tests to the new register write sequences.
mabez[m] has joined #rust-embedded
<mabez[m]>
I think this is a personal preference thing. What about moving it to r-e-h and monitoring activity/usage. We can always move into e-h later if gets more popular/needs more attention? We don't need to force a decision now.
<dirbaio[m]>
vs with HIL tests, the same unmodified test still worked
<dirbaio[m]>
because it's testing what matters: can the driver send the bytes over SPI
<dirbaio[m]>
not implementation details like "is CR1 written before CR2" when the hardware doesn't care
<Danilo[m]>
btw, if you don't want to move it, there's always the possibility that I add a few of you as co-maintainers, to increase the bus factor :)
<therealprof[m]>
For me it's a continuation of the Rust story: If it compiles it's certain to work (to a certain degree). If a driver tests fine with mocks I have a high confidence that it'll also do something useful on real hardware.
<therealprof[m]>
... and if it doesn't, it vastly reduces the problem space.
<dirbaio[m]>
not if you wrote the mock tests based on the datasheet only :P
<dirbaio[m]>
if you write the driver against real hardware first, and once it works you write the tests to "lock in place" the register writes that you already know work, then sure
<dirbaio[m]>
but that's not TDD anymore
<therealprof[m]>
I've written my fair share of drivers and there's always a high hurdle to get a driver off the ground from scratch with real hardware.
AdamHott[m] has joined #rust-embedded
<AdamHott[m]>
Hey All, I gotta run, special shout-out to James Munns who helped me solve my RTT issue a few days ago! Now I can get those videos together for review in a few days for updating the tooling section of the book to include more information on probe-rs and rtt! Thanks all!
<dirbaio[m]>
if you're halfway writing the driver and want to check what you've written so far does the right register writes?
<dirbaio[m]>
you don't need mock tests for that, just log all spi/i2c transfers and look at them
<lulf[m]>
dirbaio[m]: That's usually my approach, and I could see it being useful if you're just a small hobbyist driver crate, you get quick feedback and don't need to run a HIL stack in your apartment if someone comes along with a PR :)
<therealprof[m]>
dirbaio[m]: If you have decently complex external, that is not fun!
<therealprof[m]>
... and first you need to get the communication going to begin with.
<dirbaio[m]>
therealprof[m]: then writing the tests is not fun either!
<therealprof[m]>
If you're starting from raw chips, that can be a real challenge. Of course there's a lot of working stuff now compared to some years ago so that became a ton easier because you don't have to bring up the hardware, work the peripherals and the external chips at the same time...
<dirbaio[m]>
you can get 90% of the benefit with 10% of the effort with traces, vs actual mock-based tests
<dirbaio[m]>
if your traces are long enough that eyeballing them is painful, pipe them to a file and diff the files
<therealprof[m]>
I disagree, because the traces are good for one-time use only. Tests can used often.
<dirbaio[m]>
and once you do have the full driver working, HIL tests have much higher ROI
<therealprof[m]>
And sometimes you don't even have access to the raw output.
<therealprof[m]>
So there's nothing to print.
<dirbaio[m]>
and much less maintenance burden
<therealprof[m]>
HIL tests are only easy if they're basic. Once you start testing interaction with the real world, things get insanely complex... How do you test whether your step motor turned 1 degree of angle or 2? How do you judge whether the output on the display is correct? How do you compare whether the temperature is correct to a fraction of a degree?
<therealprof[m]>
Mocks are certainly not everything but they can give a high degree of confidence that something has at least a chance of working.
<dirbaio[m]>
just assert temperature is between 10 and 30 deg C
<dirbaio[m]>
a bug in your temperature sensor will most likely make it hang or read complete garbage that won't pass the check
<dirbaio[m]>
and if you have some complex calibration calculation, isolate it in a pure function and unit-test just that
<dirbaio[m]>
this gives you 90% of the benefit with 10% of the effort
M9names[m] has joined #rust-embedded
<M9names[m]>
How much do you want a 100kg machine in your HIL? Or 1000 of the same device that has different timings/measurements?
<M9names[m]>
Like, I'm happy HIL is working out well for you but I don't see why you're trying to die on this hill, specifically.
<dirbaio[m]>
i'm not dying on any hill (on any HIL?)
<dirbaio[m]>
i'm partial to analyzing the cost/benefit of the choices you do, like doing HIL testing, mock testing, tracing, whatever
<dirbaio[m]>
Perhaps simply don't test the thing that would require a 100kg machine in your HIL. Or if you really must, because otherwise it might kill someone, then maybe mock testing is the way, yes
<dirbaio[m]>
because the benefit (complying with the "MUST test this" requirement) outweighs the cost (spending more dev time writing the mock tests) and the cost of alternatives would be higher (making a HIL rig with the 100kg machine)
<mameluc[m]>
I don't need to do HIL bc dirbaio does for me
<dirbaio[m]>
but i'm pointing out that for quite a few cases where I see people using mock testing, the cost/benefit analysis doesn't check out IMO, vs manual testing or HIL testing.
<mameluc[m]>
starting to come to a point where I should set up some kind of HIL for lorawan testing
<dirbaio[m]>
benefit is smaller (doesn't catch all bugs, vs real hardware does), cost is higher (you must update the tests if you do some refactor that changes the order of register accesses that the hardware doesn't care about)
<dirbaio[m]>
but it's just my opinion
<dirbaio[m]>
people can do whatever they want
<dirbaio[m]>
¯\_(ツ)_/¯
<therealprof[m]>
dirbaio[m]: And they will, too.
<Ralph[m]1>
a case against HIL: i don't intend to run the actual hardware for a CI somewhere in my basement to support arbitrary PRs coming in on my repo (ok, the only one besides me opening PRs there is dependabot and that's usually on the stm32 example, so unrelated to the mocks).
<Ralph[m]1>
i think HIL makes sense for corporate things (where you have the budget & space to set it up) and for large(r) projects (which may or may not get funding from 3rd parties to do these setups).
<dirbaio[m]>
what got this conversation started though is "should e-h-m live in the e-h repo"
<dirbaio[m]>
which would put more work on my plate, as a member of the HAL team which maintains the e-h repo
<adamgreig[m]>
I'm sorry the conversation got so derailed
<dirbaio[m]>
so it's reasonable for me to say "no, because i'm not convinced of the value"
<dirbaio[m]>
i'm not saying it shouldn't exist, i'm not saying people shouldn't be allowed to use it, i'm just saying it shouldn't be in the e-h repo in my opinion
<adamgreig[m]>
Usually the side discussions are fun but I think maybe not so much this time
<dirbaio[m]>
(and i'm only 1 member out of 5 in the HAL team. if this gets voted and the outcome is yes, it'll go in the e-h repo and i'll help maintain it, like I already do with the nb stuff...)
<lulf[m]>
dirbaio[m]: Agreed with this. Apologies for any side-tracking on my part!
<Ralph[m]1>
<dirbaio[m]> "what got this conversation..." <- i think it was a two-fold question: "should it be in rust-embedded?" followed by "if yes, should it be part of the embedded-hal repo?".
<Ralph[m]1>
i feel like we mainly focused on the latter (with a focus on the general "does it make sense to use mocks here?" question) and didn't really discuss the former.
<Ralph[m]1>
sorry if i helped in derailing the discussion
<dirbaio[m]>
the sidetracking was me explaining why I personally don't see the value. I could've been less incendiary, sorry!
<dirbaio[m]>
and there are benefits to it being in the e-h repo (being able to atomically update traits + mocks in a single PR) so if it was in rust-embedded it'd make most sense to put it in the e-h repo
<dirbaio[m]>
so it mostly boils down to "does the HAL team want to take over maintenance", for which my individual vote is "no"
<dirbaio[m]>
lack of maintenance can be fixed in other ways, like adding more people to the repo or transferring it to rust-embedded-community
<JamesMunns[m]>
I appreciate y'all coming back to calm the waters a bit.
_whitelogger_ has joined #rust-embedded
_whitelogger has quit [Remote host closed the connection]
Guest7221 has joined #rust-embedded
mirko___ has joined #rust-embedded
mirko___ has quit [Remote host closed the connection]
mirko___ has joined #rust-embedded
<therealprof[m]>
I only got the "should it be maintained by the HAL team?" question. And for me that would be a yes. I wouldn't agree to it being part of e-h.
<adamgreig[m]>
what do you mean by part of e-h, like being in the same crate?