<ivmarkov[m]>
For TLSF (the thing used by ESP-IDF) the author of the above GH repo gives H(M, n) = M * (n - 2) which is simply terrible. I.e. for the same params as above I get 256KB * (32KB - 2) ~ 8 **Gigabytes** of RAM WCMC.
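Spelled out (my reading of the parameters - M as the largest block size of 256 KiB and n = 32 * 1024 - so take the exact meaning as an assumption):

```latex
H(M, n) = M \cdot (n - 2) = 2^{18}\,\mathrm{B} \times (2^{15} - 2) \approx 8.6 \times 10^{9}\,\mathrm{B} \approx 8\,\mathrm{GiB}
```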
<ivmarkov[m]>
I get it that in reality the chance of hitting WCMC is probably very, very small, and I should be looking at averages and standard deviation, but still - such a large worst-case fragmentation seems alarming. I do hope I'm just not good at understanding these formulae though. :-)
<ivmarkov[m]>
Would appreciate your feedback!
<ivmarkov[m]>
To answer my own question (I feel stupid now): the TLSF folks specifically don't even bother with WCMC. With only 12.5% average fragmentation in their worst-case tests and sigma = 1.6%, six-sigma is at most 22% fragmentation (right?), which sounds quite OK. All [here](http://www.gii.upv.es/tlsf/files/papers/TLSF_performance.pdf)
<ivmarkov[m]>
20-sigma would then be only 45% fragmentation, which is still quite OK. The bottom line of all of this is... why bother with no-alloc if even simple RTOS allocators are that good? Maybe I'm still making simple math mistakes... not sure, but the question remains?
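The arithmetic I'm doing, assuming fragmentation is roughly normally distributed with the mean and sigma they report:

```latex
\mu + k\sigma : \quad 12.5\% + 6 \times 1.6\% = 22.1\%, \qquad 12.5\% + 20 \times 1.6\% = 44.5\% \approx 45\%
```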
sugoi has joined #rust-embedded
sugoi has quit [Ping timeout: 265 seconds]
<chrysn[m]>
<explodingwaffle1> "https://blog.rust-lang.org/..." <- I had brief hopes that this would be a viable replacement for TAIT as used in embassy for tasks, but the MVP only allows restricting where clauses and not giving types for statics. Still, it's likely a good step in the right direction...
emerent has quit [Ping timeout: 248 seconds]
emerent has joined #rust-embedded
<JamesMunns[m]>
ivmarkov: I haven't seen WCMC before; all the hard realtime systems I've worked on either do no allocation, or do all allocation at boot, where no allocations occur after init (basically a bump allocator).
<JamesMunns[m]>
The allocation being "constant time" is not necessarily *fast*; it means that you can guarantee that an alloc with an empty heap or an almost-entirely-full heap takes the same time, to make hard realtime guarantees. In hard realtime, you really ONLY care about the "absolute worst possible case", because you still need to hit timing deadlines even in the (almost impossible) worst case
<JamesMunns[m]>
But I'm also not sure what you mean, because the first two messages are talking about how *bad* allocators are - you need a huge amount of overprovisioning of memory - and then you say:
<JamesMunns[m]>
> why bother with no-alloc if even simple RTOS allocators are that good? Maybe I still make simple math mistakes
<JamesMunns[m]>
like, you could say that allocations have a "constant time" of 100ms, for example
<ivmarkov[m]>
My first message (dealing with WCMC) was just me freaking out that WCMC - the non-statistical, absolute measurement, so to say - is really, really bad for all those allocator guys.
<ivmarkov[m]>
My _second_ message, however, is trying to understand: OK, if WCMC is so unbearably bad, _how often_ - statistically speaking - is it likely to occur in practice? And the statistical results of the TLSF guys (putting aside how much we can trust these, as they assume a normal distribution and give little to no info in their presentation as to the exact nature of their tests) STILL seem to imply that -
<ivmarkov[m]>
_statistically_ - TLSF and similar are really good at fragmentation. If my 20-sigma math is correct, you'll only get 45% fragmentation in real life in - what is it? - 99.999999% of the cases, which might be comparable to your MCU getting broken by cosmic radiation or something.
<JamesMunns[m]>
yeah, totally agree
<ivmarkov[m]>
So in the end
<JamesMunns[m]>
"hard realtime" to "soft/non-realtime" is a really huge difference in how you *approach* systems design
<ivmarkov[m]>
it boils down to the following simple question, which is - I hope - as relevant to Rust Embedded as it gets:
<ivmarkov[m]>
JamesMunns[m]: But... but... we are talking about _physical systems_ here. There is no absolute, firm boundary between hard and soft w.r.t. what I'm talking about. If the chance of getting bad fragmentation is so statistically insignificant as to be comparable to a hardware failure, **why bother** with `no-alloc` at all, is what I'm asking?
<JamesMunns[m]>
But I think that's the "only" part: even if TLSF or other algs "only" need 45%-100% overprovisioning MOST of the time, that means you still need 1.5-2.0x the total mem available, AND most people aren't doing calculations like that, so they just need to add an even higher safety factor
<ivmarkov[m]>
because no-alloc as a programming discipline is hard (another assumption, but OK, this has been my experience so far). Lifetimes and stuff...
<JamesMunns[m]>
like, you COULD enumerate all possible allocation types and kinds and patterns and do the analysis and stuff.
<JamesMunns[m]>
BUT if you use a "normal" allocator that is smarter and faster, it might still have "worst case" patterns that introduce jitter in the system
<JamesMunns[m]>
like if sometimes an alloc takes 100us and sometimes it takes 5ms
<JamesMunns[m]>
like, don't get me wrong, I'm not militantly anti-alloc
<JamesMunns[m]>
it just makes "guaranteeing" certain timing and success criteria of your system harder, when you can't significantly overprovision the amount of memory you need.
<ivmarkov[m]>
Please look at the math. 45% is at 20-sigma (!!). My math might be completely wrong (I skipped statistics 20y ago at school :-) ) but if it IS correct, 45% over-provisioning is something you should do only if you are the most paranoid man in the world. It is 20-sigma, after all?
<JamesMunns[m]>
personally: if I was advising a group that wanted to use an allocator, and they didn't have a safety factor of 2 (e.g. they calculate they need 128KiB of heap, so they ensure they have 256KiB of heap), I would be REALLY nervous
JoonaHolkko[m] has joined #rust-embedded
<JoonaHolkko[m]>
eldruin: Hi. I found a crate made by you named `embedded-i2s`. Would it be possible to get some assistance with it?
<JamesMunns[m]>
UNLESS they really were doing calculation of all the different kinds of allocation patterns and could really prove they could avoid fragmentation.
<JamesMunns[m]>
I'm not sure how your algorithm handles very many different sizes of allocation, not just say "32KiB allocs in a 256KiB space"
<JamesMunns[m]>
like if you have a ton of little allocs, where the size of the allocation header or allocator overhead starts being larger than the allocations themselves
<JamesMunns[m]>
or where you require very deep traversal to figure out where to place a larger alloc (say an HTTP buffer) in an allocator fragmented by very small (say TCP packet header) allocs
<JamesMunns[m]>
I'm less current on the research, so this might just be me being overly worried!
<JamesMunns[m]>
But IMO the biggest issue with allocators is first jitter, and second that there isn't good tooling for ACCURATELY measuring fragmentation numbers for complex allocation patterns
<ivmarkov[m]>
I think you are a bit doing a deep dive into "how a
<ivmarkov[m]>
_particular_ algorithm handles all sorts of sizes for heap requests". I guess we can only answer that by looking into the _details_ of a particular algorithm (as in e.g. TLSF or others). Where I'm going is different: assuming we trust these folks' experimental results, why do we even bother?
<JamesMunns[m]>
yes, IF your allocator has a bounded WCET that is ALWAYS okay if you hit it in EVERY allocation, AND you overprovision enough (even if it's not much) to allow for WCMF, then you should feel comfortable using an allocator!
<ivmarkov[m]>
And what I have is only anecdotal evidence in the form of "my experience so far..." etc., which I'm trying to escape, because - it is anecdotal, after all...
<ivmarkov[m]>
JamesMunns[m]: Yes. But unfortunately this still does not answer the basic simple question: are we too-paranoid w.r.t. heap allocs, or are we not?
<JamesMunns[m]>
but I don't think most folks can make both of those claims!
<JamesMunns[m]>
I do think rust folks are a little too paranoid about heap allocs
<JamesMunns[m]>
but if you can avoid them, you rule out an entire failure mode
<JamesMunns[m]>
but like you said it comes at a design/impl cost, and we might be too scared about that!
<ivmarkov[m]>
Exactly :-)
<ivmarkov[m]>
Ruling out an entire failure mode where the failure might never occur in practice is what is worrying me...
<JamesMunns[m]>
But it's hard to say "it's fine if you are careful, and repeat your analysis over time to make sure it doesn't change"
<JamesMunns[m]>
it's like unsafe code: it's fine if you do the due diligence
<JamesMunns[m]>
but I don't recommend everyone uses unsafe code, because it's way easier to prove correctness if you don't.
<JamesMunns[m]>
and we don't have good automated tooling for people to be confident over time.
<JamesMunns[m]>
tbh, I tend to use pool/slab allocators much more often as an escape hatch for things like packet processing, where you don't have a general allocator, but you can still get a 'static escape hatch
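something like this hand-rolled sketch (not any particular crate's API, just the shape of the idea): a fixed number of equal-size slots tracked in a bitmask, so claim/release are O(1) and fragmentation is impossible by construction:

```rust
use core::sync::atomic::{AtomicU32, Ordering};

const SLOT_COUNT: usize = 32; // pool capacity, fixed at design time

// bit i set => slot i (one 256 B packet buffer) is free; all slots start free
static FREE_MASK: AtomicU32 = AtomicU32::new(u32::MAX);

/// Claim a free slot index (the caller maps it to a static buffer),
/// or None if the pool is exhausted. Lock-free, O(1).
fn claim() -> Option<usize> {
    loop {
        let mask = FREE_MASK.load(Ordering::Acquire);
        let i = mask.trailing_zeros() as usize;
        if i >= SLOT_COUNT {
            return None; // no free bits left
        }
        let new = mask & !(1u32 << i);
        if FREE_MASK
            .compare_exchange(mask, new, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
        {
            return Some(i);
        }
        // lost the race to another claimer; retry
    }
}

/// Return a slot to the pool. O(1).
fn release(i: usize) {
    debug_assert!(i < SLOT_COUNT);
    FREE_MASK.fetch_or(1u32 << i, Ordering::Release);
}
```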
<ivmarkov[m]>
However: I do believe that the anecdotal evidence w.r.t. unsafe code being really bad is much less anecdotal than the anecdotal evidence of "heap allocators are really bad at frag" which I've seen so far. Is there, like, a single person in this community who has used TLSF with 50% mem overhead for defrag, with terrible alloc patterns (allocating all over the place with all sorts of sizes), and had issues in production?
<JamesMunns[m]>
lead the charge, take notes, and prove us wrong :)
<JamesMunns[m]>
embedded developers are often overcautious, and I would love to be proven wrong!
exark has quit [Ping timeout: 276 seconds]
<ivmarkov[m]>
Well... I'm the newcomer here with completely irrelevant background in Big Data. That's why I'm asking the experts :-)
<JamesMunns[m]>
Please keep a close eye on jitter and performance variance, especially over time: you start your program with no fragmentation, but what happens after 30d of runtime!
<JamesMunns[m]>
but this is also the difference between safety and consumer code: if your smart display crashes once a month, but statistically people are only looking at it 4h a day, then it's likely they will only see a crash once every 6 months, and this is probably fine as long as you reboot quickly!
<JamesMunns[m]>
if your brakes crash once a month, you will be very upset!
<JamesMunns[m]>
but most people aren't building brake controllers! They are building stuff that is totally fine to crash occasionally, as long as they recover quickly and gracefully
<JamesMunns[m]>
if it takes 30s to reboot, and the "cost" is a 30s annoyance once a month - or really only a noticeable annoyance twice a year - psh, yolo, ship it
<JamesMunns[m]>
that's less cool tho if that crash causes a safety issue, or, for scientific experiments, a loss of data, etc.
<JamesMunns[m]>
engineering is all about "good enough", and sometimes rust embedded folks overdo it for the actual problem at hand :)
<JamesMunns[m]>
but please do push the envelope, you don't need to be doing a phd to do very useful and enlightening data collection, and share it with us to shake off some old superstitious habits :)
<ivmarkov[m]>
Thank you for your feedback! To address it one by one:
<ivmarkov[m]>
- Yes, I think I DO get the definition of WCET being "guaranteed worst time" (but in asymptotic algorithmic measurements only), not really talking about the _actual_, _on-hardware_ slowness/fastness of the actual algorithm. But I'm not even thinking about that. Let's simplify the task by just assuming that the WCET is "good enough". Let's only talk about fragmentation, as I do.
<JamesMunns[m]>
Still, engineering is also about risk mitigation and uncertainty, and allocators certainly add a bit of uncertainty to a program design. Might be a worthy tradeoff more often than my gut is calibrated for today, esp with chips like the esp32 where there's more RAM to share :)
<dirbaio[m]>
nostd noalloc 4 lyfe 🦀
<JamesMunns[m]>
400K is a lot more than even 256K (the rp2040 or nrf52840), or 64K (many chips), or even 8K (STM32G0)
<JamesMunns[m]>
RP2040 and above is about where I'd probably feel comfortable using a non-bump/slab/pool allocator. Maybe lower if I did the math :)
<JamesMunns[m]>
but that's just me, that's all gut and no science, so take it as low quality feedback :)
<JamesMunns[m]>
but also for context, in a recent rp2040 project that was basically a "USB to RS-485 router", I just used 128KiB to make 512x 256B "packet buffers"
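(which in Rust is really nothing fancier than a static reservation like this sketch - the real thing wraps it in a queue/pool structure:)

```rust
// 512 buffers x 256 B each = 128 KiB, reserved at link time, no allocator involved
static PACKET_BUFS: [[u8; 256]; 512] = [[0u8; 256]; 512];
```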
<JamesMunns[m]>
it relates to "how likely are you to have *extra* RAM on chip"
<JamesMunns[m]>
like, I bet for prod designs, dirbaio has thought out all 256K of the RAM he has available, including max stack
<JamesMunns[m]>
(because most people ALSO don't have good bounded upper stack numbers!)
<ivmarkov[m]>
JamesMunns[m]: Don't even get me started here... :-)))
<JamesMunns[m]>
Anyway, I have no data to contradict your claims, at least outside of safety critical designs, which are not the common case.
<ivmarkov[m]>
OK, so the answer is... we just don't know? Like, if I'm faced with a bunch of code using Strings, Arcs, Vecs all over the place, shall I say - no - rewrite it all, embedded allocators suck, Thou Shalt Not Use Those, even for a telemetry device? Or would that be "yeah, it is all fine..."
<ivmarkov[m]>
Gut feelings, that's all we have, I fear...
<JamesMunns[m]>
The answer is "today, we don't have reliable tools for analysis, which means erring on the side of caution by overprovisioning or avoiding allocation altogether"
<ivmarkov[m]>
Right. Which is a very nice professional way to say "gut feelings", wouldn't you agree? :D
<JamesMunns[m]>
i'm saying both positive and negative sentiments are unfounded gut feelings :)
<JamesMunns[m]>
so, bring data if you want to change people's gut feelings
<JamesMunns[m]>
can you give me WCET for a given async task?
<ivmarkov[m]>
Even WCMF we have; it is just that it is so bad that people like me start asking themselves how often it actually occurs in practice, if at all - hence the statistical measurements?
<JamesMunns[m]>
like I give you an async task or even a blocking rust alloc function, can you give me the max cycle count?
<ivmarkov[m]>
JamesMunns[m]: We have WCET in relation to allocating a block of memory. That's _ALL_ I'm claiming :-)
<JamesMunns[m]>
for TLSF, yes, I guess?
<ivmarkov[m]>
We also have WCMF in relation to allocating a block of memory (WCMF does not even have a useful meaning outside of allocators)
<ivmarkov[m]>
JamesMunns[m]: And not only
<ivmarkov[m]>
All of these claim both WCET and WCMF. Look at the papers...
<JamesMunns[m]>
but like, if someone is choosing between talc or linked-list-allocator, can you tell them the WCET and WCMF for their program?
<ivmarkov[m]>
For these two I can't answer right off the top of my head. But in the repo you just looked at (and I quoted) there are quite a few (at least 5?) algorithms listed, as well as their WCMF measurements (and really, let's move WCET out of the conversation for a while, shall we?)
<JamesMunns[m]>
I mean, maybe not WCET, but a couple sigmas of allocation variance does matter
<JamesMunns[m]>
like if your allocations can take a variable amount of time between 10us and 10ms, that matters a lot, even if the WCET is 100ms and you assume you never hit that
<JamesMunns[m]>
even 10us and 1ms is a lot of potential jitter
<JamesMunns[m]>
but if you tell me you can show the 3-sigma (99.7%) variance is only 10-30us, then agreed that's probably usually good enough!
<JamesMunns[m]>
But even web server folks will tell you keeping an eye on tail latency matters, even if it's only one in a million :)
<ivmarkov[m]>
Help me understand (specifically for WCET, which we still keep getting back into the conversation): Why would I care about jitter, if the algorithm can guarantee me worst-case latency, and if I have sized (computed the response time of) my tasks in such a way that their worst-case latency is good enough for my use case?
<JamesMunns[m]>
because it adds jitter to other things you are doing
<JamesMunns[m]>
like, if you should be sampling data at a defined rate, when one or two things jitter a little longer "in sync" with each other, you can end up with worse perf than you think
<ivmarkov[m]>
JamesMunns[m]: So? No worries if my tasks finish _earlier_. The problem is they should not finish _later_ (hence worst time).
<JamesMunns[m]>
do you have a WCET calculation tool I don't know about?
<JamesMunns[m]>
we might have it for the allocator
<JamesMunns[m]>
but not the program that surrounds it
<JamesMunns[m]>
if I have five tasks, that all have a variance of 1-10us of execution time (just for what they are doing), and they each make an allocation with a variance of 1-10us, that means that on average, we would expect them to take about 55us to execute
<JamesMunns[m]>
but in the worst case, they would take 100us
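(assuming the 1-10us is roughly uniform, so ~5.5us on average per piece, the arithmetic being:)

```latex
\mathbb{E}[T] = 5 \times (5.5 + 5.5)\,\mu\mathrm{s} = 55\,\mu\mathrm{s}, \qquad T_{\max} = 5 \times (10 + 10)\,\mu\mathrm{s} = 100\,\mu\mathrm{s}
```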
<ivmarkov[m]>
Great? As long as 100us is tolerable, what is the problem? That's all I'm saying...
<JamesMunns[m]>
the issue is that the more loaded your CPU is, the more this can cause schedule slips
<JamesMunns[m]>
like, as your CPU gets >2/3 utilized, you start having *weird* behavior because it's harder to "make up" for variance
<JamesMunns[m]>
on an unloaded CPU (<50%), you can overcome variance like this
<JamesMunns[m]>
let me find some posts on scheduling analysis, koopman has some good takes on this...
<JamesMunns[m]>
I might have to go take pictures of my book, but the point is that if you have variance in your tasks, sometimes they can harmonically combine, causing a surprising amount more jitter
<ivmarkov[m]>
So... getting back to the allocators... (and while you are searching)... you are saying - in layman's terms - that the total WCET variance becomes worse due to the extra WCET variance induced by the allocator, so eliminating the allocator's WCET variance actually matters a lot, as it is a non-negligible factor (multiplier) of the overall system variance?
<JamesMunns[m]>
this can be rare, which means if you have bugs or the additive delay is more than you expect on average, you can have hard to diagnose scheduling problems, or cause more jitter than you expect, which matters for some programs, like if you only have 500us to ACK a packet
<JamesMunns[m]>
my gut is that the variance of an allocator is wider than your average business logic working on small Ns
<JamesMunns[m]>
but what happens very often is people have recurring tasks at say 5/10/25/100/500/1000ms
<JamesMunns[m]>
these can combine *harmonically* very rarely, and sometimes the jitter of something like an allocator can make certain tasks fall on one side or another of a "bulkier" task
<JamesMunns[m]>
like, sometimes even though the variance is +/-100us, it can cause a task to be pushed until after many other tasks because it showed up "just late enough" that 4 other tasks got scheduled
<JamesMunns[m]>
then if those tasks take 400us to operate, you now have a task jitter of 500us!
<JamesMunns[m]>
and all of a sudden you have a 1ms variance because your tasks harmonically combined, but each only had a 100us variance total!
<JamesMunns[m]>
this might not matter for what you are doing!
<JamesMunns[m]>
many HMI systems dgaf as long as you can hit your 30-60fps numbers, and 16-32ms is a lot of fudge time!
<JamesMunns[m]>
but if you are sampling an analog signal where being right on 5ms matters, then it does!
<JamesMunns[m]>
and this variance often only shows up when you get >66% CPU usage, so it can show up after months of testing when you add one small feature that takes you over some limit
<JamesMunns[m]>
idk, I feel like I'm too deep in the weeds now.
<JamesMunns[m]>
like, back to the main point: more people could get away with an allocator and have an easier life
<ivmarkov[m]>
You are, I can barely follow, I must admit :-)
<JamesMunns[m]>
I don't know how to judge that without a lot of manual analysis, that has to be repeated. I DO that for hard realtime systems
<JamesMunns[m]>
but it's hard to give people a gut feel for when they will and won't have problems, because it depends on a LOT of variables
<JamesMunns[m]>
and ruling out failure modes means you have to do fewer analysis steps to be fairly confident you won't have problems.
<JamesMunns[m]>
but also most people aren't doing things like measuring their tasks' frequencies and variance anyway! So it's hard to tell when they are and aren't good!
<JamesMunns[m]>
(or even have a good definition of their requirements or deadlines, even!)
<ivmarkov[m]>
Thank you.
<ivmarkov[m]>
Maybe one more thing, if you don't mind: I wonder whether an allocator-in-the-system has a _global_ impact? As in, even if I don't use allocating logic in my super-high-super-important high-frequency task, just because _another_ - low-freq - task might use an allocator, that WOULD introduce variance in the high-prio task, wouldn't it? That is, if the allocator uses "stop the world, disable all interrupts" type of locks. Right?
<ivmarkov[m]>
Case in point: the TLSF allocator in ESP-IDF uses "disable-all-interrupts + spin-locks on multi-core"
<JamesMunns[m]>
yeah, you need to account for all possible critical sections in your WCET analysis
<JamesMunns[m]>
"what if every task that could take a critical section, did, at the same time, and for the worst case time possible (e.g. heap block/metadata compaction or something)"
<ivmarkov[m]>
No, the problem is worse - my high-priority task (think RTOS threads) might be delayed simply because the OS-tick interrupt is not coming on time, because this stupid low-prio task has disabled all interrupts. :-(
<ivmarkov[m]>
... because of the allocator global-lock stuff...
<JamesMunns[m]>
ivmarkov[m]: yes, this is classic priority inversion
<JamesMunns[m]>
so like if you have tasks 1, 2, 3, 4, 5; and 1 is your highest prio task, then your real WCET for 1 is actually (WCET(1) + CSMAX(2) + CSMAX(3) + CSMAX(4) + CSMAX(5))
<ivmarkov[m]>
(BTW, we totally abused the global chat room, we should probably move this off to a thread, I feel guilty...)
<JamesMunns[m]>
(depends on how your scheduler works tho, sometimes it's only `MAX(CSMAX(2), CSMAX(3), CSMAX(4), CSMAX(5))`)
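(written out, with CSMAX(i) being the longest critical section task i can hold:)

```latex
\mathrm{WCET}_{\mathrm{eff}}(1) = \mathrm{WCET}(1) + \sum_{i=2}^{5} \mathrm{CSMAX}(i)
\quad \text{or, if at most one CS can block you:} \quad
\mathrm{WCET}(1) + \max_{2 \le i \le 5} \mathrm{CSMAX}(i)
```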
<JamesMunns[m]>
yeah, happy to pause the conversation here.
lulf[m] has joined #rust-embedded
<lulf[m]>
Your allocator/low-prio tasks could use a critical-section implementation that doesn't block the high-prio interrupt though, assuming that the high-prio and low-prio tasks don't share any state?
<JamesMunns[m]>
I still think that more people could use allocators today than they do. But if all crates require allocators, it makes it harder for crates that can't afford allocators.
<ivmarkov[m]>
lulf[m]: Yes.
<ivmarkov[m]>
It is just not the case, as it seems... :D
<JamesMunns[m]>
lulf[m]: yeah, this gets into big picture system design :)
<JamesMunns[m]>
like, static analysis is only a piece of the puzzle, you have to correctly combine the analysis in a way that fits the realities of your system
<JamesMunns[m]>
even if you have the WCETs + deadlines of all your tasks, you need to bring that into reality with how your scheduler and hardware works.
<JamesMunns[m]>
(then the answer is "how can we verify the high-prio task never accidentally tries to use data shared with low-prio tasks, so it is never afflicted by their variance or potential inversion!" - like what RTIC tries to guarantee: if you have a higher-prio task that doesn't share data with lower-prio tasks, it'll never be impeded outside of a CS)
<JamesMunns[m]>
fwiw: I *highly* recommend Koopman's "Better Embedded System Software". The examples are a little dated (C, sometimes talks about 8- and 16-bit processors), and it is a little geared towards hard realtime systems, but it is a FANTASTIC primer to all the things I've mentioned today, like how to think about scheduling, etc.
<JamesMunns[m]>
Also a very good primer to "systems thinking" and more basic things like requirements management and planning a system design.
<JamesMunns[m]>
it's written as basically 30 "pamphlets" of information: you can dive into one topic, and not necessarily read the book cover to cover (tho I do recommend it!)
<JamesMunns[m]>
also introduces a lot of vocab so you can start searching for standard research terms for many of these topics, so it's a very good primer book IMO.
Kaspar[m] has joined #rust-embedded
<Kaspar[m]>
<JamesMunns[m]> "I still think that more people..." <- Shhhh. 🙂 Much easier to add allocators than to remove.
mameluc[m] has joined #rust-embedded
<mameluc[m]>
interesting discussion, and a reminder to myself why I avoid these kinds of in-between systems. Either I go with some small MCU with no_std no_alloc, or I blink a LED on an 8GB Pi and hope for the best. Also, WDT to the rescue when there are gut feelings involved
<JamesMunns[m]>
maybe a note, none of this seems related to rust embedded :)
<JamesMunns[m]>
maaaaybe take this one to DM or another room?
<andar1an[m]1>
True, my bad
<RidhamBhagat[m]>
oh okok, fair enough.
<RidhamBhagat[m]>
DMed, did you receive it?
<RidhamBhagat[m]>
andar1an:
<andar1an[m]1>
my matrix seems to have blown up, I will restart.
<andar1an[m]1>
On a Rust note, if anyone knows about WiFi 7 firmware specifically for Qualcomm chips in Rust, please @ me.
<konkers[m]>
Has there been any exploration into keeping `PanicInfo` out of the firmware image and supporting it through something like `defmt`? Since the `PanicInfo` and `#[panic_handler]` APIs deal with `&str`s, it would likely require `core` API and feature changes to support, but could have some potentially sizable code-size benefits.
<dirbaio[m]>
no one has tried anything afaik
<dirbaio[m]>
but I think it's not very feasible, because with panic! you can print anything that impls Display or Debug, which will necessarily do on-target formatting with bloated strings
<dirbaio[m]>
my recommendation is to add defmt support to as many crates as you can, so at least their panics don't cause bloat
<dirbaio[m]>
and enable build-std-features=panic_immediate_abort to get rid of bloat of all remaining stdlib panics. This means you don't get panic messages from them, but you can still grab the PC value and a register+stack dump in the HardFault handler and track them down that way.
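(for reference, that's the nightly build-std route - roughly this invocation, check the cargo docs for your toolchain:)

```
# needs nightly + the rust-src component; rebuilds core with
# panic formatting compiled down to immediate aborts
cargo +nightly build --release \
    -Z build-std=core \
    -Z build-std-features=panic_immediate_abort
```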
<konkers[m]>
<dirbaio[m]> "but I think it's not very..." <- Hrm.... that's a good, and unfortunate point.
AlexandrosLiarok has quit [Quit: Idle timeout reached: 172800s]
<konkers[m]>
In my particular case I'm integrating with a C++ firmware that already uses https://pigweed.dev/pw_tokenizer/ so defmt support in crates doesn't immediately solve my problem. Though I could possibly provide a shim crate that provides the defmt API using pw_tokenizer. Might be a nightmare to maintain though.
<dirbaio[m]>
uuuh good luck. defmt is much more powerful than pw_tokenizer
<konkers[m]>
Curious in which ways, and whether they're intractable? I'm only familiar with the gaps going the other way (specifically around stable token IDs).
<thejpster[m]>
Someone tried to send a defmt PR to pull the message out of a PanicInfo but it was using an unstable API that didn’t actually work
<dirbaio[m]>
and even with that, you can't tokenize the panic message
<dirbaio[m]>
it'll still be a bloated string in flash
<dirbaio[m]>
<konkers[m]> "Curious which ways and if they'..." <- uh, it's hard to tell from pw_tokenizer's docs, but AFAICT pw_tokenizer tokenizes the "main" string and then serializes the args after it? so if you print a string with `%s` it'll get sent as-is over the wire
<dirbaio[m]>
vs with defmt if you print something that impls defmt::Format with {:?}, the Format impl can write more tokenized strings to the stream with more args
<dirbaio[m]>
so a defmt log message is a tree of tokenized strings, each with their arg substitutions as children.
<konkers[m]>
<dirbaio[m]> "so a defmt log message is a ..." <- I actually think this is doable with pw_tokenizer by either unfolding the tree at compile time to a single string literal or using "nested tokenization" ndhttps://pigweed.dev/seed/0105-pw_tokenizer-pw_log-nested-tokens.html)
<dirbaio[m]>
so it's not a tree
<konkers[m]>
Is there runtime dynamism with the tree?
<dirbaio[m]>
oh hmm wtf
<dirbaio[m]>
that works by embedding the "child" tokens into the parent token's string, then tokenizing it all as one? it's slightly different
<konkers[m]>
It basically does recursive token substitution until there's nothing left then formats. It does require that the set of fields to format is fixed.
<dirbaio[m]>
defmt just embeds the {:?} placeholder in the main tokenized string
<dirbaio[m]>
then the arg contains another tokenized string (with its own number), plus its own args
<konkers[m]>
Ah, so values are tagged?
<dirbaio[m]>
they're just more tokens
<dirbaio[m]>
{:?} tells the decoder "read a varuint, decode it as a token, then read and decode whatever placeholders it has"
<dirbaio[m]>
so
<dirbaio[m]>
a format impl can choose the token at runtime: `if whatever { write!(fmt, "super long string one {} {}", foo, bar) } else { write!(fmt, "super long string two") }`
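(roughly what that looks like in practice - a sketch from memory of the defmt 0.3 API, so double-check against the defmt book:)

```rust
use defmt::{write, Format, Formatter};

struct Event {
    ok: bool,
    code: u32,
}

impl Format for Event {
    fn format(&self, fmt: Formatter) {
        // each write! registers its own interned string; which token actually
        // goes over the wire is decided by this branch at runtime
        if self.ok {
            write!(fmt, "event ok, code {=u32}", self.code);
        } else {
            write!(fmt, "event FAILED, code {=u32}", self.code);
        }
    }
}
```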
<konkers[m]>
Yeah, that's what I meant by tagged.
<konkers[m]>
I have to think on this some more but I think this could be made to work with the way pw_tokenizer works today... and if not with a small extension.
<konkers[m]>
Thanks for all the info!
<dirbaio[m]>
maybe it'll be easier to support both kinds of log messages over the wire 😅
<dirbaio[m]>
have some way to mark "this is a defmt frame" and "this is a pw_tokenizer frame"
<dirbaio[m]>
and support both in the decoder 🫠
<konkers[m]>
One nice upshot of using printf format strings in pw_tokenizer is that they encode the data types, so the values don't have to be tagged. (there are plenty of "downshots" of using printf format strings too)
<dirbaio[m]>
defmt won't tag if you do info!("{=u32}", 42)
<dirbaio[m]>
it will if you do info!("{:?}", 42), because it goes through the impl Format for u32, which does a write!(fmt, "{=u32}", self)
<konkers[m]>
Can it do type inference if I do info!("{}", 42u32)?
<dirbaio[m]>
nope :(
<dirbaio[m]>
types aren't available when the proc macro runs
<konkers[m]>
There have been some ideas thrown around internally here about how to get around that. TAIT is one of those....
<dirbaio[m]>
typeck runs after, on the macro output 🥲
<konkers[m]>
(our formatting system is quite complex as we want to accept both rust and printf format strings and output either of them based on the logging backend)
<dirbaio[m]>
hmm
<dirbaio[m]>
ah, and then hash the result of concatenating everything in const?
<konkers[m]>
Only a handful of types are implemented at the moment and there's no #[derive(Format)] support yet.
<konkers[m]>
dirbaio[m]: yep
<dirbaio[m]>
that is pretty cool
<dirbaio[m]>
i'm not sure if you can do that in defmt, because the string has to end up in an attr
<dirbaio[m]>
to be the symbol name
<konkers[m]>
Does defmt do hashing? When I last looked, it looked like it was storing direct indexes into the message database, which are not stable across builds.
<dirbaio[m]>
defmt doesn't hash the strings; it makes them into "fake" symbols to get the linker to assign addresses to them that are 1, 2, 3, 4...
<dirbaio[m]>
so tokens are smaller when encoded as varuints, 1-2 bytes instead of 4 bytes as in pw_tokenizer
<konkers[m]>
Yeah, it's a hard requirement for our customers to have tooling that is agnostic to firmware version. So we slurp up all the token-database elf sections of all the versions and merge them.
<dirbaio[m]>
and there's guaranteed no collision
<dirbaio[m]>
I had this idea of changing the proc macro to hit an sqlite db at compile time to allocate the ids
<dirbaio[m]>
if it exists, use the id, otherwise insert it with the next available id
<dirbaio[m]>
so you could still get small IDs instead of 32-bit hashes, and get a single db capable of decoding logs from multiple fw versions by simply keeping that sqlite db between builds 🤣
<konkers[m]>
Works if there's a single developer/build machine
<dirbaio[m]>
why? you can have as many dev machines as you want
<dirbaio[m]>
you only need to "serialize" the production builds that you're actually going to release
<konkers[m]>
Are you checking the sqlite db in to the repo?
<dirbaio[m]>
asking the hard questions there 😅
<dirbaio[m]>
no idea
<konkers[m]>
Also, with a distributed build like bazel, your build might not even be happening all on one machine
<dirbaio[m]>
chuck it in blob storage, make ci download, modify, upload?
<dirbaio[m]>
lol
<konkers[m]>
yeah, big lol
<dirbaio[m]>
no, I'm not joking. firmware build times aren't that long, plus you don't release new firmware versions to production that often
<konkers[m]>
Perhaps a config option/feature could be added to defmt to use stable hashes and let the user decide?
<konkers[m]>
If we had that, the mixing of defmt and pw_tokenizer logs would be a viable solution
<dirbaio[m]>
why do you want bazel/blaze for firmware dev? it's not like "server Google" where they deploy 500MB binaries to borg, where you either use blaze or die waiting
<dirbaio[m]>
if a build takes 10min it means you can do 144 prod releases a day, it's more than enough :D
<dirbaio[m]>
or
<dirbaio[m]>
hey
<konkers[m]>
For a handful of different reasons. A big one is that with many projects, the firmware is just one component. Also bazel pretty decent at handling polyglot projects.
<dirbaio[m]>
make the proc macro hit a REST API that allocates the token IDs 🤣
<konkers[m]>
Plus there's the "when you have a hammer, everything looks like a nail" reason too
<konkers[m]>
connecting to a db as part of the build to allocate IDs also breaks build reproducibility
<dirbaio[m]>
that's true!
<dirbaio[m]>
<konkers[m]> "Perhaps a config option/..." <- yeah jokes aside, I can't see any reason not to do this, it seems very doable.
<konkers[m]>
dirbaio[m]: I'll explore that a bit and see how invasive it would be.
<dirbaio[m]>
you'd get bigger tokens, but stable across builds
<konkers[m]>
Yeah, it seems like a tradeoff that some projects would be happy to make
<dirbaio[m]>
ah there's another problem, tokens are a fixed u16 on the wire 😅
<thejpster[m]>
If you can think of a way to widen that without breaking anything I’m all ears
<thejpster[m]>
I want to do a 1.0 and I want 0.3.99 to re-export 1.0
<dirbaio[m]>
defmt 1.0?
<thejpster[m]>
Personally I think it’s long overdue. Not my call though.
<konkers[m]>
<dirbaio[m]> "ah there's another problem..." <- ah, so not varint encoded?
<dirbaio[m]>
it doesn't use varints. ints are fixed width, then the entire stream is compressed in a way that makes zeros take less space https://defmt.ferrous-systems.com/encoding
<dirbaio[m]>
yields smaller code size
<konkers[m]>
dirbaio[m]: Interesting!!
<dirbaio[m]>
<thejpster[m]> "If you can think of a way to..." <- wire format is versioned independently, so a new 0.3.x release could switch to "wire format v5" where tokens are u32
<dirbaio[m]>
together with a defmt-decoder release that adds support for decoding v5, while keeping v4
<thejpster[m]>
The version is encoded in a symbol in the defmt section I think
<dirbaio[m]>
yep, so the decoder can tell which to use
<thejpster[m]>
I’m sure there will be a request for comments before it’s nailed down
<thejpster[m]>
The trick will be allowing future changes without breaking any existing users
<dirbaio[m]>
the advantage of versioning the wire format independently is you can make breaking changes to it without doing a major bump to the defmt crate
<dirbaio[m]>
that's "technically breaking" since it might cause a setup to stop working if you update defmt but not update defmt-decoder on the pc side
<dirbaio[m]>
in the same way a msrv bump is "technically breaking" because your build will stop working if you update defmt without updating rustc
<thejpster[m]>
I’m ok with that. Host tools are easier to update and you should keep them in sync with your fleet
<dirbaio[m]>
but imo that's acceptable
<dirbaio[m]>
it's wayy wayy better than freezing the wire format forever
<dirbaio[m]>
or worse, doing a major bump of the defmt crate 💀
<dirbaio[m]>
so the contract would be "if you update defmt 0.3.x to 0.3.y your firmware will always keep building, but you might find it starts speaking a newer wire format"
<dirbaio[m]>
so, no breaking changes to the rust api, just to the wire format
<JamesMunns[m]>
<dirbaio[m]> "you only need to "serialize" the..." <- I'm actually working on an alternative to this.
<JamesMunns[m]>
The device keeps the schemas, and reports them at runtime
<JamesMunns[m]>
I'm building a server side management tool that lets you just get type safe access to fields based on the uploaded schemas
<JamesMunns[m]>
It also has a DB with a rolling history
<JamesMunns[m]>
So if schemas change over time you can still get decoded logs
<JamesMunns[m]>
And it's postcard on the wire for encoding.
<konkers[m]>
JamesMunns[m]: Doesn't that negate the code size benefits of tokenizing the strings?
<JamesMunns[m]>
And also probably running it as a lib crate if you don't want a persistent server
<JamesMunns[m]>
konkers[m]: Yes, it's fine on bigger chips like the rp2040
<JamesMunns[m]>
JamesMunns[m]: Might not scale well to stm32g0.
<JamesMunns[m]>
But my goal is to do it over USB with no debugger, so that's larger chips anyway.
<konkers[m]>
Most of the users of pw_tokenizer are very code-size constrained, and that's why they adopt it. Having megabytes of code size doesn't seem to be a thing in the near future for high-volume products
<JamesMunns[m]>
Schemas for reasonably sized types are like 64-256 bytes
<JamesMunns[m]>
You would be able to fit thousands of schemas in 1MiB :)
<JamesMunns[m]>
If you only want to send logs and already have USB, I would bet it takes only a few (dozen?) K of codespace
<JamesMunns[m]>
But if you don't have USB, I might be able to do it over rtt too
<JamesMunns[m]>
I'll come back with real numbers once I have it more built out :)
<konkers[m]>
Do these schemas include the string content of the log?
<JamesMunns[m]>
Yes, I get that makes it take more space. It also makes managing arbitrary devices with code written by other people manageable
<JamesMunns[m]>
If you aren't passing around elfs.
<JamesMunns[m]>
It also makes it possible to dynamically generate interfaces
<JamesMunns[m]>
Anyway, I think I'm solving a different problem, for sure.
<konkers[m]>
Related to postcard-rpc?
<JamesMunns[m]>
Yep
<JamesMunns[m]>
Basically a managed reverse proxy for multiple connected devices
<JamesMunns[m]>
A whole post station, maybe :)
<konkers[m]>
Neat. I was just looking at that the other day for a hobby project.
<konkers[m]>
The N count for rpc schemas is definitely a lot less than lines of logging.
<JamesMunns[m]>
The goal is also to support fancy structured logging too, so you get similar deferred formatting benefits
<JamesMunns[m]>
More like tracing than defmt, but some (unformatted) strings are still passed on the wire
<konkers[m]>
We've had similar thoughts with our tokenized logging but have not had time to fully spec it out much less implement it.
<JamesMunns[m]>
My schemas are serialized to postcard, based on the stable wire format and schema types, so I could stabilize it as a standard soonish
<JamesMunns[m]>
I also have "type punning" working, so you can send a `&[u8]` on the MCU side, and receive it as a Vec<u8> on the PC side
<JamesMunns[m]>
I'm excited about it :D
<konkers[m]>
Have you thought much about MCU to MCU comms? pw_rpc gets used for that as well as MCU to host.
<JamesMunns[m]>
The intent is to have an MCU client as well for postcard rpc
<JamesMunns[m]>
At least for point to point comms, or over an existing network.