#rust-embedded on 2024-09-27 — irc logs at libera.irclog.whitequark.org

2022-02-07 19:20 ChanServ changed the topic of #rust-embedded to: Welcome to the Rust Embedded IRC channel! Bridged to #rust-embedded:matrix.org and logged at https://libera.irclog.whitequark.org/rust-embedded, code of conduct at https://www.rust-lang.org/conduct.html

01:03 AtleoS has quit [Read error: Connection reset by peer]

01:27 sugoi has joined #rust-embedded

01:49 sugoi has quit [Ping timeout: 252 seconds]

03:04 sugoi has joined #rust-embedded

03:06 sugoi1 has joined #rust-embedded

03:09 sugoi has quit [Ping timeout: 265 seconds]

03:09 sugoi1 is now known as sugoi

04:02 sugoi has quit [Ping timeout: 265 seconds]

04:32 sugoi has joined #rust-embedded

04:42 sugoi1 has joined #rust-embedded

04:42 sugoi has quit [Ping timeout: 248 seconds]

04:42 sugoi1 is now known as sugoi

05:27 sugoi has quit [Ping timeout: 248 seconds]

05:37 sugoi has joined #rust-embedded

05:41 sugoi has quit [Ping timeout: 252 seconds]

05:49 pbsds30 has joined #rust-embedded

05:50 pbsds3 has quit [Ping timeout: 276 seconds]

05:50 pbsds30 is now known as pbsds3

06:16 ivmarkov[m] has joined #rust-embedded

06:16 <ivmarkov[m]> Good morning!... (full message at <https://catircservices.org/_irc/v1/media/download/AQTQciIjxs52HUHgtUw0g_cZcU32D9O7QCeGhpesNFPYsbq-lLB_7wymqs_WemNIPwq4rw168_D49tYIsFogTn5CeSMlbGwwAGNhdGlyY3NlcnZpY2VzLm9yZy92WmtoWVNJaWtXcGROdWFQZ0VaTFlIU3Y>)

06:16 <ivmarkov[m]> * Good morning!... (full message at <https://catircservices.org/_irc/v1/media/download/Aa4b88cKpC9HlK-qN2PmRE2gfHcFjnNKw3--NgXMyEzMxBENIS_ICc-cM7LB_PsgQhQO6HokfGyOXdXaktHkNgRCeSMlbd2wAGNhdGlyY3NlcnZpY2VzLm9yZy9UQ2t3REZPc3dnQ3pTUXRSVU55c1N4S1o>)

06:16 <ivmarkov[m]> * Good morning!... (full message at <https://catircservices.org/_irc/v1/media/download/AU5ipF2AV67g3Kl_y4GsbJk5liXqDP5Z0o-T_o_-jfYbw4vHvsTKGDcd47NCIHUeBHEgGUNf2lZtJkYxiysT-HNCeSMlbyIQAGNhdGlyY3NlcnZpY2VzLm9yZy9rVGFmdERUUExlUWxxdnNBZm5GTGdUdEs>)

06:17 <ivmarkov[m]> * Good morning!... (full message at <https://catircservices.org/_irc/v1/media/download/AQ6QuAL60Wav3kuBotI_pc1OYm0ERSHUuegk5DEV9ECnJhGU-lSqTV3EJgcYZ84XkSmQdv8ALGl282hP75GauGFCeSMleUywAGNhdGlyY3NlcnZpY2VzLm9yZy9YVk1MWElVaklQUkdzTUpwUWZ0anJwRnk>)

06:17 <ivmarkov[m]> * Good morning!... (full message at <https://catircservices.org/_irc/v1/media/download/AfGOYGS8XhYvlFlrH5AwaW22IgOXP0-t3TQWkLrO6i_dhOS2HihEwd4xqmq4783dIRlJAlU6cPNfB4aQHFahrHtCeSMlfBDgAGNhdGlyY3NlcnZpY2VzLm9yZy9LbnFUcXhtdWdVUXFMV3VEV21USkp5d0w>)

06:23 <ivmarkov[m]> For TLSF (the thing used by ESP-IDF) the author of the above GH repo gives H(M, n) = M * (n - 2) which is simply terrible. I.e. for the same params as above I get 256KB * (32KB - 2) ~ 8 **Gigabytes** of RAM WCMC.

06:23 <ivmarkov[m]> I get it that in reality the chance of hitting WCMC is probably very very small, and I should be looking at averages and standard deviation but yet - such a large fragmentation seems alarming. I do hope I'm just not good at understanding these formulae though. :-)

06:23 <ivmarkov[m]> Would appreciate your feedback!

06:24 <ivmarkov[m]> * For TLSF (the thing used by ESP-IDF) the author of the above GH repo gives H(M, n) = M \* (n - 2) which is simply terrible. I.e. for the same params as above I get 256KB \* (32KB - 2) ~ 8 **Gigabytes** of RAM WCMC.

06:24 <ivmarkov[m]> I get it that in reality the chance of hitting WCMC is probably very very small, and I should be looking at averages and standard deviation but yet - such a large worst-case fragmentation seems alarming. I do hope I'm just not good at understanding these formulae though. :-)

06:24 <ivmarkov[m]> Would appreciate your feedback!

07:17 <ivmarkov[m]> To answer my own question (feel stupid now) the TLSF folks specifically don't even bother with WCMC. With only 12.5% average fragmentation of their worst-case tests and sigma = 1.6% this means six-sigma is at most 22% fragmentation (right?), which sounds quite OK. all [here](http://www.gii.upv.es/tlsf/files/papers/TLSF_performance.pdf)

07:24 <ivmarkov[m]> 20-sigma would then be only 45% fragmentation which is still quite OK. The bottom line of all of this is... why bother with no-alloc if even simple RTOS allocators are that good? Maybe I still make simple math mistakes... not sure, but the question remains?

07:25 sugoi has joined #rust-embedded

07:30 sugoi has quit [Ping timeout: 265 seconds]

07:45 <chrysn[m]> <explodingwaffle1> "https://blog.rust-lang.org/..."; <- I had brief hopes that this would be a viable replacement for TAIT as used in embassy for tasks, but the MVP only allows restricting where clauses and not giving types for statics. Still, it's likely a good step in the right direction..

07:58 emerent has quit [Ping timeout: 248 seconds]

07:59 emerent has joined #rust-embedded

08:49 <JamesMunns[m]> ivmarkov I haven't seen WCMC before, all the hard realtime systems I've worked on either do no allocation, or all allocation at boot, where no allocations occur after init (basically a bump allocator).

08:49 <JamesMunns[m]> The allocation being "constant time" is not necessarily *fast*, it means that you can guaranteed that an alloc with an empty heap or an almost-entirely full heap takes the same time, to make hard realtime guarantees. In hard realtime, you really ONLY care about the "absolute worst possible case", because you still need to hit timing deadlines even in the (almost impossible) worst case

08:50 <JamesMunns[m]> But I'm also not sure what you mean, because the first two messages are talking about how *bad* allocators are, you need a huge amount of overprovisioning of memory, then you say:

08:50 <JamesMunns[m]> > why bother with no-alloc if even simple RTOS allocators are that good? Maybe I still make simple math mistakes

08:51 <JamesMunns[m]> like, you could say that allocations have a "constant time" of 100ms, for example

08:53 <ivmarkov[m]> JamesMunns[m]: > <@jamesmunns:beeper.com> ivmarkov I haven't seen WCMC before, all the hard realtime systems I've worked on either do no allocation, or all allocation at boot, where no allocations occur... (full message at <https://catircservices.org/_irc/v1/media/download/AS3GoLX2Qllo2g9OWegRE1zUBmtlMEZsxPIy8erWckh5c82iAF0fd9c7becewD-t8k7sM6bj4zmUXcyHqNHP4tVCeSMuaSuAAGNhdGlyY3NlcnZpY2VzLm9yZy9VRVFxcWd6WmVmbG53V3JkVnNSUGVZYkM>)

08:54 exark_ has quit [Remote host closed the connection]

08:54 <JamesMunns[m]> fwiw, WCET is usually not algorithmic

08:54 <JamesMunns[m]> in many hard realtime systems, you really are calculating in terms of hardware cycle times in relation to wall clock times

08:54 <JamesMunns[m]> like, often with things like worst case cache miss times, etc.

08:55 <JamesMunns[m]> (or you turn all your caches off so there is no such thing as a cache miss, even tho it might make you 2x slower but predictable)

08:56 <JamesMunns[m]> o1heap even says in the readme:

08:56 <JamesMunns[m]> > it can be described as a predictably bad allocator

08:56 exark has joined #rust-embedded

08:57 <ivmarkov[m]> <JamesMunns[m]> "But I'm also not sure what you..." <- > <@jamesmunns:beeper.com> But I'm also not sure what you mean, because the first two messages are talking about how *bad* allocators are, you need... (full message at <https://catircservices.org/_irc/v1/media/download/ATxoYX7AZKbCif7cDwtzsP1XtAFu5fcfxdXfup8c6SrlsDt8tTL_JD1hzjxJK4-l3oxd2grgOkTc0sMSaer3En1CeSMunmoAAGNhdGlyY3NlcnZpY2VzLm9yZy9RTUZnbWJud0tvVnREd2liT1lrbHBsaU4>)

08:57 <ivmarkov[m]> * My first message (dealing with WCET) was just me freaking out that WCET - the non-statistical absolute measurement so to say - is really really bad for all that allocator guys.

08:57 <ivmarkov[m]> My _second_ message however, is trying to understand - in a way - that OK, if WCET is so really unbearably bad, _how often_ - statistically speaking - it is likely to occur in practice? And the statistical results of the TLSF guys (putting aside how much we can trust these as they assume standard distribution and give little to no info in their presentation as to the exact nature of their tests) - STILL - seem to imply that -

08:57 <ivmarkov[m]> _statistically_ - TLSF and similar are really good at fragmentation. If my 20-sigma math is correct, you'll only get 45% of fragmentation in real life in - what is it? - 99.999999% of the cases, which might be equal to your MCU getting broken by a cosmic radiation or something.

08:58 <JamesMunns[m]> yeah, totally agree

08:58 <ivmarkov[m]> So in the end

08:58 <JamesMunns[m]> "hard realtime" to "soft/non-realtime" is a really huge difference in how you *approach* systems design

08:58 <ivmarkov[m]> it boils down to the following simple question, which is - I hope - as relevant to Rust Embedded as it gets:

08:59 <ivmarkov[m]> JamesMunns[m]: But... but.... we are talking about _physical systems_ here. There is no absolute firm boundary between hard and soft w.r.t. what I'm talking about. If the chance to get bad fragmentation is so statistically insignificant so as this to be comparable to a hardware failure, **why bother** with `no-alloc` at all, is what I'm asking?

09:00 <JamesMunns[m]> But I think that's the "only" part, even if TLSF or other algs "only" need 45%-100% overprovisioning MOST of the time, that means you still need 1.5-2.0x the total mem available, AND most people aren't doing calculations like that, so they just need to add an even higher safety factor

09:00 <ivmarkov[m]> because no-alloc as a programming discipline is hard (another assumption, but OK this had been my experience so far). Lifetimes and stuff...

09:00 <JamesMunns[m]> like, you COULD do the analysis on all possible allocation types and kinds and patterns and do the analysis and stuff.

09:00 <JamesMunns[m]> BUT if you use a "normal" allocator that is smarter and faster, it might still have "worst case" patterns that introduce jitter in the system

09:01 <JamesMunns[m]> like if sometimes an alloc takes 100us and sometimes it takes 5ms

09:01 <JamesMunns[m]> like, don't get me wrong, I'm not militantly anti-alloc

09:01 <JamesMunns[m]> it just makes "guaranteeing" certain timing and success criterias of your system harder, when you can't significantly overprovision the amount of memory you need.

09:02 <ivmarkov[m]> Please look at the math. 45% is at 20-sigma (!!). My math might be completely wrong (I skipped statistics 20y ago at school :-) ) but if it IS correct, 45% pover-provisioning is something you should do if you are the most paranoid man in the world. It is 20-sigma after all?

09:03 <JamesMunns[m]> personally: if I was advising a group that wanted to use an allocator, and they didn't have a safety factor of 2 (e.g. they calculate they need 128KiB of heap, so they ensure they have 256KiB of heap), I would be REALLY nervous

09:03 JoonaHolkko[m] has joined #rust-embedded

09:03 <JoonaHolkko[m]> eldruin: Hi. I found a crate made by you named `embedded-i2s`. Would it be possible to get some assistance with it?

09:04 <JamesMunns[m]> UNLESS they really were doing calculation of all the different kinds of allocation patterns and could really prove they could avoid fragmentation.

09:05 <JamesMunns[m]> I'm not sure how your algorithm handles very many different sizes of allocation, not just say "32KiB allocs in a 256KiB space"

09:05 <JamesMunns[m]> like if you have a ton of little allocs, where the size of the allocation header or allocator overhead starts being larger than the allocations themselves

09:06 <JamesMunns[m]> or where you require very deep traversal to figure out where to place a larger alloc (say an HTTP buffer) in an allocator fragmented by very small (say TCP packet header) allocs

09:07 <JamesMunns[m]> I'm less current on the research, so this might just me being overly worried!

09:07 <JamesMunns[m]> But IMO the biggest issue with allocators is first jitter, and second that there isn't good tooling for ACCURATELY measuring fragmentation numbers for complex allocation patterns

09:07 <ivmarkov[m]> I think you are bit doing a deep dive into "how a

09:07 <ivmarkov[m]> _particular_ algorithm handles all sorts of sizes for heap requests". I guess we can only answer that by looking into the _details_ of a particular algorithm (as in e.g. TLSF or others). Where I'm going is different: assuming we trust these folks' experimental results, why do we even bother?

09:08 <JamesMunns[m]> yes, IF your allocator has a bounded WCET that is ALWAYS okay if you hit it in EVERY allocation, AND you overprovision enough (even if it's not much) to allow for WCMF, then you should feel comfortable using an allocator!

09:09 <ivmarkov[m]> And what I have is only anecdotal evidence in the form of "my experience so far..." etc. which I'm trying to escape, nbecause - it is anecdotal after all...

09:09 <ivmarkov[m]> JamesMunns[m]: Yes. But unfortunately this still does not answer the basic simple question: are we too-paranoid w.r.t. heap allocs, or are we not?

09:09 <JamesMunns[m]> but I don't think most folks can make both of those claims!

09:09 <JamesMunns[m]> I do think rust folks are a little too paranoid about heap allocs

09:09 <ivmarkov[m]> s/nbecause/because/

09:09 <JamesMunns[m]> but if you can avoid them, you rule out an entire failure mode

09:09 <JamesMunns[m]> but like you said it comes at a design/impl cost, and we might be too scared about that!

09:10 <ivmarkov[m]> Exactly* :-)

09:10 <ivmarkov[m]> Ruling out an entire failure mode, where the failure might never occur in practice is what is worrying me...

09:10 <JamesMunns[m]> But it's hard to say "it's fine if you are careful, and repeat your analysis over time to make sure it doesn't change"

09:10 <ivmarkov[m]> * Exactly :-)

09:11 <JamesMunns[m]> it's like unsafe code: it's fine if you do the due diligence

09:11 <JamesMunns[m]> but I don't recommend everyone uses unsafe code, because it's way easier to prove correctness if you don't.

09:11 <JamesMunns[m]> and we don't have good automated tooling for people to be confident over time,

09:11 <JamesMunns[m]> s/,/./

09:12 <JamesMunns[m]> tbh, I tend to use pool/slab allocators much more often as an escape hatch for things like packet processing, where you don't have a general allocator, but you can still get a 'static escape hatch

09:12 <ivmarkov[m]> However: I do believe that the anecdotal evidence w.r.t. unsafe code being really bad is much less anecdotal compared to the anecdotal evidence of "heap allocators are really bad at frag" which I've seen so far. Is there, like a single person in this community who had used TLSF with 50% mem overhead for defrag, with terrible alloc paterns (allocating all over the place), and having issues in production?

09:13 <ivmarkov[m]> * the place with all sorts of sizes), and

09:13 <JamesMunns[m]> lead the charge, take notes, and prove us wrong :)

09:13 <JamesMunns[m]> embedded developers are often overcautious, and I would love to be proven wrong!

09:13 exark has quit [Ping timeout: 276 seconds]

09:13 <ivmarkov[m]> Well... I'm the newcomer here with completely irrelevant background in Big Data. That's why I'm asking the experts :-)

09:13 <JamesMunns[m]> Please keep a close eye on jitter and performance variance, especially over time: you start your program with no fragmentation, but what happens after 30d of runtime!

09:15 <JamesMunns[m]> but this is also the difference between safety and consumer code: if your smart display crashes once a month, but statistically people are only looking at it 4h a day, then it's likely they will only see a crash once every 6 months, and this is probably fine as long as you reboot quickly!

09:15 <JamesMunns[m]> if your brakes crash once a month, you will be very upset!

09:16 <JamesMunns[m]> but most people aren't building brake controllers! They are building stuff that is totally fine to crash occasionally, as long as they recover quickly and gracefully

09:16 <JamesMunns[m]> if it takes 30s to reboot, and the "cost" of a 30s annoyance once a month, or really only a noticable annoyance twice a year, psh yolo ship it

09:17 <JamesMunns[m]> thats less cool tho if that crash causes a safety issue, or for scientific experiments a loss of data, etc.

09:17 <JamesMunns[m]> engineering is all about "good enough", and sometimes rust embedded folks overdo it for the actual problem at hand :)

09:18 <JamesMunns[m]> but please do push the envelope, you don't need to be doing a phd to take very useful and enlightening data collection, and share it with us to shake off some old superstitious habits :)

09:19 <ivmarkov[m]> * Thank you for your feedback! to address it one by one:

09:19 <ivmarkov[m]> - Yes, I think I DO get the definition of WCMC being "guaranteed worst time" (but in asymptotic algirithmic measurements only) and not really talking about _actual_, _on hardware_ slowness/fastness of the actual agorithm. But I'm not even thinking about that. Let's simplify the task by just assuming that the WCMC is "good enough". Let's only talk about fragmentation, as I do.

09:19 <ivmarkov[m]> * Thank you for your feedback! to address it one by one:

09:19 <ivmarkov[m]> - Yes, I think I DO get the definition of WCET being "guaranteed worst time" (but in asymptotic algirithmic measurements only) and not really talking about _actual_, _on hardware_ slowness/fastness of the actual agorithm. But I'm not even thinking about that. Let's simplify the task by just assuming that the WCET is "good enough". Let's only talk about fragmentation, as I do.

09:19 <ivmarkov[m]> * My first message (dealing with WCMC) was just me freaking out that WCMC - the non-statistical absolute measurement so to say - is really really bad for all that allocator guys.

09:19 <ivmarkov[m]> My _second_ message however, is trying to understand - in a way - that OK, if WCMC is so really unbearably bad, _how often_ - statistically speaking - it is likely to occur in practice? And the statistical results of the TLSF guys (putting aside how much we can trust these as they assume standard distribution and give little to no info in their presentation as to the exact nature of their tests) - STILL - seem to imply that -

09:19 <ivmarkov[m]> _statistically_ - TLSF and similar are really good at fragmentation. If my 20-sigma math is correct, you'll only get 45% of fragmentation in real life in - what is it? - 99.999999% of the cases, which might be equal to your MCU getting broken by a cosmic radiation or something.

09:20 <ivmarkov[m]> * I think you are a bit doing a deep dive into "how a

09:20 <ivmarkov[m]> _particular_ algorithm handles all sorts of sizes for heap requests". I guess we can only answer that by looking into the _details_ of a particular algorithm (as in e.g. TLSF or others). Where I'm going is different: assuming we trust these folks' experimental results, why do we even bother?

09:21 <JamesMunns[m]> Still, engineering is also about risk mitigation and uncertainty, and allocators certainly add a bit of uncertainty to a program design. Might be a worthy tradeoff more than my gut is calibratted for today, esp with chips like the esp32 where there's more RAM to share :)

09:21 <dirbaio[m]> nostd noalloc 4 lyfe 🦀

09:21 <JamesMunns[m]> 400K is a lot more than even 256K (the rp2040 or nrf52840), or 64K (many chips), or even 8K (STM32G0)

09:23 <JamesMunns[m]> RP2040 and above is about where I'd probably feel comfortable using a non-bump/slab/pool allocator. Maybe lower if I did the math :)

09:23 <JamesMunns[m]> but that's just me, that's all gut and no science, so take it as low quality feedback :)

09:24 <ivmarkov[m]> (... (full message at <https://catircservices.org/_irc/v1/media/download/AW2xdFDBizkhJApwLg-C6BzOWwytG5J_vLx3eDpjTTe24uTEs-ByL8UVI1gsXycoYvaZrfOzw0ZQM1Ft9l0djohCeSMwLThgAGNhdGlyY3NlcnZpY2VzLm9yZy9LSGFEWGJaSlpmT3FaTUtqTGtxUlF4ZlE>)

09:24 <JamesMunns[m]> but also for context, in a recent rp2040 project that was basically a "USB to RS-485 router", I just used 128KiB to make a 512x 256B "packet buffers"

09:25 <JamesMunns[m]> it relates to "how likely are you to have *extra* RAM on chip"

09:25 <JamesMunns[m]> like, I bet for prod designs, dirbaio has thought out all 256K of the RAM he has available, including max stack

09:25 <JamesMunns[m]> (because most people ALSO don't have good bounded upper stack numbers!)

09:26 <ivmarkov[m]> JamesMunns[m]: Don't get me even starting here... :-)))

09:27 <JamesMunns[m]> Anyway, I have no data to contradict your claims, at least outside of safety critical designs, which are not the common case.

09:27 <ivmarkov[m]> OK, so the answers is... we just don't know? Like, if I'm faced with a bunch of code using Strings, Arcs, Vecs all over the place, shall I say - no - rewrite it all, embedded allocators suck, Thou Shall Not Use Those, even for a telemetry device, or would that be "yeah, it is all fine..."

09:28 <ivmarkov[m]> Gut feelings, that's all we have, I fear...

09:28 <JamesMunns[m]> The answer is "today, we don't have reliable tools, which means erring on the side of caution by overprovisioning or avoiding all together"

09:28 <JamesMunns[m]> * reliable tools for analysis, which

09:29 <ivmarkov[m]> Right. Which is a very nice professional way to say "gut feelings", wouldn't you agree? :D

09:29 <JamesMunns[m]> i'm saying both positive and negative sentitments are unfounded gut feelings :)

09:29 <JamesMunns[m]> so, bring data if you want to change people's gut feelings

09:30 exark has joined #rust-embedded

09:31 <JamesMunns[m]> But also keep in mind that IMO people should have a decent feel for:... (full message at <https://catircservices.org/_irc/v1/media/download/AXx6cRgRe09EeiV3ZLKoRT3so6jeoaXw4kF8zDIUZddoQLJi7iU4ndkW6Cj2WkSrBf6eAeYCY3zuu1GZhKvFH4JCeSMwki3wAGNhdGlyY3NlcnZpY2VzLm9yZy91WEt1VXVweXdwZUFybEJXcWZacm1jQ00>)

09:32 <ivmarkov[m]> JamesMunns[m]: > <@jamesmunns:beeper.com> But also keep in mind that IMO people should have a decent feel for:... (full message at <https://catircservices.org/_irc/v1/media/download/ASm-MsxAz8cgjxYn-s2k153kPbSLlLT8ngTnt8MrN2AE7BHHqXy_95MJu4YoOwig49aA4HLz81Wk9vstWljCJDtCeSMwoJdwAGNhdGlyY3NlcnZpY2VzLm9yZy9iVGNwTHhGQUZSblpDbGZNeFlGdXFMaW0>)

09:32 <JamesMunns[m]> can you give me WCET for a given async task?

09:33 <ivmarkov[m]> Even WCMF we have, it is just that it is so bad, that people like me start questioning themselves how much it actually occurs in practice, if at all, hence the statistical measurements?

09:33 <JamesMunns[m]> like I give you an async task or even a blocking rust alloc function, can you give me the max cycle count?

09:33 <ivmarkov[m]> JamesMunns[m]: We have WCET in relation to allocating a block of memory. That's _ALL_ I'm claiming :-)

09:33 <JamesMunns[m]> for TLSF, yes, I guess?

09:33 <ivmarkov[m]> We also have WCMF in relation to allocating a block of memory (WCMF does not even have a useful meaning outside of allocators)

09:34 <ivmarkov[m]> JamesMunns[m]: Ands not only

09:34 <ivmarkov[m]> All of these claim both WCET and WCMF. Look at the papers...

09:34 <ivmarkov[m]> > <@jamesmunns:beeper.com> for TLSF, yes, I guess?

09:34 <ivmarkov[m]> * And not only

09:34 <JamesMunns[m]> but like, if someone is choosing between talc or linked-list-allocator, can you tell them what the WCET and WCMF for their program?

09:35 <ivmarkov[m]> For these two I can't answer right off the top of my head. But in the repo you just looked (and I quored) there are quite a few (at least 5?) algorithms listed, as well as their WCMF measurements (and really, lets move WCET out of the conversation for a while, shall we)?

09:35 <ivmarkov[m]> s/quored/quoted/

09:36 <JamesMunns[m]> I mean, maybe not WCET, but a couple sigmas of allocation variance does matter

09:36 <JamesMunns[m]> like if your allocations can take a variable amount of time between 10us and 10ms, that matters a lot, even if the WCET is 100ms and you assume you never hit that

09:36 <JamesMunns[m]> even 10us and 1ms is a lot of potential jitter

09:37 <JamesMunns[m]> but if you tell me you can show the 3-sigma (99.7%) variance is only 10-30us, then agreed that's probably usually good enough!

09:39 <JamesMunns[m]> But even web server folks will tell you keeping an eye on tail latency matters, even if it's only one in a million :)

09:41 <ivmarkov[m]> Help me understand (specifically for WCET that we still keep getting back into the conversation): Why would I care about jitter, if the algorithm can guarantee me worst case latency, and if I have sized my tasks in such a way, that its worst-case latency is good enough for my use case?

09:42 <ivmarkov[m]> * have sized (compited / the response time) my tasks

09:42 <JamesMunns[m]> because it adds jitter to other things you are doing

09:42 <ivmarkov[m]> * have sized (computed / the response time) my tasks

09:43 <JamesMunns[m]> like, if you should be sampling data at a defined rate, when one or two things jitter a little longer "in sync" with each other, you can end up with worse perf than you think

09:43 <ivmarkov[m]> JamesMunns[m]: So? No worries if my tasks finish _earlier_. The problem is they should not finish _later_ (hence worst time).

09:43 <JamesMunns[m]> do you have a WCET calculation tool I don't know about?

09:43 <JamesMunns[m]> we might have it for the allocator

09:44 <JamesMunns[m]> but not the program that surrounds it

09:45 <JamesMunns[m]> if I have five tasks, that all have a variance of 1-10us of execution time (just for what they are doing), and they each make an allocation with a variance of 1-10us, that means that on average, we would expect them to take 105us to execute

09:45 <JamesMunns[m]> s/on/_on/, s/average,/average_,/, s/105us/55us/

09:46 <JamesMunns[m]> but in the worst case, they would take 100us

09:46 <ivmarkov[m]> Great? As long as 100us is torelable, what is the problem? That's all I'm saying...

09:46 <JamesMunns[m]> the issue is that the more loaded your CPU is, this can cause schedule slips

09:46 <JamesMunns[m]> like, as your CPU gets >2/3 utilized, you start having *weird* behavior because it's harder to "make up" for variance

09:47 <JamesMunns[m]> on an unloaded (<50% CPU), then you can overcome variance like this

09:47 <JamesMunns[m]> let me find some posts on scheduling analysis, koopman has some good takes on this...

09:49 <JamesMunns[m]> I might have to go take pictures of my book, but the point is that if you have variance in your tasks, sometimes they can harmonically combine, causing a surprising amount more jitter

09:50 <ivmarkov[m]> So.. getting back to the allocators... (and while you are searching) ... you are saying - in layman's terms - that the total WCET variance becomes worse due to the induced extra WCET variance coming from the allocator, so eliminating the allocator WCET variance actually matters a lot as it is a non-significant factor (multiplicator) of the overall system variance?

09:50 <JamesMunns[m]> this can be rare, which means if you have bugs or the additive delay is more than you expect on average, you can have hard to diagnose scheduling problems, or cause more jitter than you expect, which matters for some programs, like if you only have 500us to ACK a packet

09:51 <JamesMunns[m]> my gut is that the variance of an alloctor is wider than your average business logic working on small Ns

09:51 <JamesMunns[m]> s/alloctor/allocator/

09:52 <JamesMunns[m]> https://betterembsw.blogspot.com/2014/05/real-time-scheduling-analysis-for.html

09:52 <JamesMunns[m]> but what happens very often is people have recurring tasks at say 5/10/25/100/500/1000ms

09:53 <JamesMunns[m]> these can combine *harmonically* very rarely, and sometimes the jitter of something like an allocator can make certain tasks fall on one side or another of a "bulkier" task

09:54 <JamesMunns[m]> like, sometimes even though the variance is +/-100us, it can cause a task to be pushed until after many other tasks because it showed up "just late enough" that 4 other tasks got scheduled

09:54 <JamesMunns[m]> then if those tasks take 400us to operate, you now have a task jitter of 500us!

09:54 <JamesMunns[m]> and then you might show up early next time, meaning you went:... (full message at <https://catircservices.org/_irc/v1/media/download/ATIKHyDwQbkDnmuMDUJhHZv3pMq8icQ9-uuPjrb_3I72oagGYkyfyuHpvQBkuw-xpGZyWZMwwjCX1q9Illjv4nxCeSMx6BWwAGNhdGlyY3NlcnZpY2VzLm9yZy9ocGt3cEtxVVdBSWxaamJRc0dIenNMVFY>)

09:55 <JamesMunns[m]> and all the sudden you have a 1ms variance because your tasks harmonically combined, but each only had a 100us variance total!

09:55 <JamesMunns[m]> this might not matter for what you are doing!

09:55 <JamesMunns[m]> many HMI systems dgaf as long as you can hit your 30-60fps numbers, and 16-32ms is a lot of fudge time!

09:56 <JamesMunns[m]> but if you are sampling an analog signal where being right on 5ms matters, then it does!

09:56 <JamesMunns[m]> and this variance often only shows up when you get >66% CPU usage, so it can show up after months of testing when you add one small feature that takes you over some limit

09:57 <JamesMunns[m]> idk, I feel like I'm too deep in the weeds now.

09:57 <JamesMunns[m]> like, back to the main point: more people could get away with an allocator and have an easier life

09:57 <ivmarkov[m]> You are, I can barely follow I must admit :-)

09:58 <JamesMunns[m]> I don't know how to judge that without a lot of manual analysis, that has to be repeated. I DO that for hard realtime systems

09:58 <JamesMunns[m]> but it's hard to give people a gut feel when they will and wont have problems, because it depends on a LOT of variables

09:58 <JamesMunns[m]> and ruling out failure modes means you have to do fewer analysis steps to be fairly confident you won't have problems.

09:59 <JamesMunns[m]> but also most people aren't doing things like measuring their tasks frequencies and variance anyway! So it's hard to tell when they are and aren't good!

10:00 <JamesMunns[m]> (or even have a good definition of their requirements or deadlines, even!)

10:01 <ivmarkov[m]> Thank you.

10:01 <ivmarkov[m]> Maybe one more thing if you don't mind: I wonder whether an allocator-in-the-system does have a _global_ impact? As in, even if I don't use allocating logic in my super-high-super-important high-frequency task, just because _another_ - low freq - task might use an allocator, that WOULD introduce variance in the high-prio task, wouldn't it? That is, if the allocator uses "stop the world disable all interrupt locks". Right?

10:01 <ivmarkov[m]> Case in point: the TLSF allocator in ESP-IDf uses "disable-all-interrupt+spin-locks on multi-core"

10:01 <ivmarkov[m]> s/IDf/IDF/, s/interrupt/interrupts/

10:02 <JamesMunns[m]> yeah, you need to account all possible critical sections in your WCET analysis

10:02 <ivmarkov[m]> * Thank you.

10:02 <ivmarkov[m]> Maybe one more thing if you don't mind: I wonder whether an allocator-in-the-system does have a _global_ impact? As in, even if I don't use allocating logic in my super-high-super-important high-frequency task, just because _another_ - low freq - task might use an allocator, that WOULD introduce variance in the high-prio task, wouldn't it? That is, if the allocator uses "stop the world disable all interrupts" type of locks.

10:02 <ivmarkov[m]> Right?

10:02 <JamesMunns[m]> "what if every task that could take a critical section, did, at the same time, and for the worst case time possible (e.g. heap block/metadata compaction or something)"

10:03 <ivmarkov[m]> No, the problem is worse - my high priority task (think RTOS threads) might be delayed simply because the OS-tick interrupt is not coming on-time, because this stupid low-prio task had disabled all interrupts. :-(

10:03 <ivmarkov[m]> ... because of the allocator global-lock stuff...

10:03 <JamesMunns[m]> ivmarkov[m]: yes, this is classic priority inversion

10:05 <JamesMunns[m]> so like if you have tasks 1, 2, 3, 4, 5; and 1 is your highest prio task, then your real WCET for 1 is actually (WCET(1) + CSMAX(2) + CSMAX(3) + CSMAX(4) + CSMAX(5))

10:05 <ivmarkov[m]> (BTW, we totaly abused the global chat room, we should probably move this off to a thread I feel guilty...)

10:05 <ivmarkov[m]> s/totaly/totally/

10:05 <JamesMunns[m]> (depends on how your scheduler works tho, sometimes it's only MAX(CSMAX(2), CSMAX(3), CSMAX(4), CSMAX(5))

10:06 <JamesMunns[m]> s//`/, s//`)/

10:06 <JamesMunns[m]> yeah, happy to pause the conversation here.

10:07 lulf[m] has joined #rust-embedded

10:07 <lulf[m]> Your allocator/low prio tasks could use a critical section implementation that didn't block the high prio interrupt though, assuming that the high prio and low prio didn't share any state?

10:07 <JamesMunns[m]> I still think that more people could use allocators today than they do. But if all crates require allocators, it makes it harder for crates that can't afford allocators.

10:07 <ivmarkov[m]> lulf[m]: Yes.

10:07 <ivmarkov[m]> It is just not the case, as it seems... :D

10:07 <JamesMunns[m]> lulf[m]: yeah, this gets into big picture system design :)

10:07 <JamesMunns[m]> like, static analysis is only a piece of the puzzle, you have to correctly combine the analysis in a way that fits the realities of your system

10:08 <JamesMunns[m]> even if you have the WCETs + deadlines of all your tasks, you need to bring that into reality with how your scheduler and hardware works.

10:10 <JamesMunns[m]> (then the answer is "how can we verify the high prio task never accidentally tries to use data shared with low prio tasks, so it is never inflicted by their variance or potential inversion"!, like what RTIC tries to guarantee: if you have a higher prio task that doesn't share data with lower prio tasks, it'll never be impeded outside of a CS)

10:12 <JamesMunns[m]> fwiw: I *highly* recommend Koopman's "Better Embedded System Software". The examples are a little dated (C, sometimes talks about 8 + 16 bit processors), and it is a little geared towards hard realtime systems, but it is a FANTASTIC primer to all the things I've mentioned today, like how to think about scheduling, etc.

10:12 <JamesMunns[m]> Also a very good primer to "systems thinking" and more basic things like requirements management and planning a system design.

10:13 * JamesMunns[m] uploaded an image: (58KiB) < https://catircservices.org/_irc/v1/media/download/Ael8A93nMZNSuORetBoydi0MAMiKkodTBGl82IM__W-fPJfX7CbEhjUOeYha8-MgdDfgrcNDLQvY6Xk9UWPDFfhCeSMy-u_wAGxvY2FsLmJlZXBlci5jb20vamFtZXNtdW5uc19WTTR1TUQ5V2pkZnlpblJPUlYxZGJwa1dzV2VJNWN0Snp4czF0U0swbnQzVzg4UWhsZUtrMzV0ZG8wMUJwdllZ >

10:13 <JamesMunns[m]> table of contents, as a teaser :D

10:14 <JamesMunns[m]> it's written as basically 29 "pamphlets" of information: you can dive into one topic, and not necessarily read the book cover to cover (tho I do recommend it!)

10:15 <JamesMunns[m]> also introduces a lot of vocab so you can start searching for standard research terms for many of these topics, so it's a very good primer book IMO.

10:15 <JamesMunns[m]> s/29/30/

10:19 Kaspar[m] has joined #rust-embedded

10:19 <Kaspar[m]> <JamesMunns[m]> "I still think that more people..." <- Shhhh. 🙂 Much easier to add allocators than to remove.

10:22 mameluc[m] has joined #rust-embedded

10:22 <mameluc[m]> interesting discussion and a reminder to myself why I avoid these kind of inbetween systems. Either I go with some small mcu with no_std no_alloc or I blink a led on a 8Gb pi and hope for the best. Also WDT to the rescue when there is gut feelings involved

13:03 <thejpster[m]> https://doc.rust-lang.org/nightly/std/random/index.html

13:03 <thejpster[m]> Long overdue

13:07 <dirbaio[m]> and the traits are in core. hell yes

13:24 sugoi has joined #rust-embedded

13:28 sugoi has quit [Ping timeout: 248 seconds]

13:31 AtleoS has joined #rust-embedded

13:34 <mameluc[m]> now it is much easier to implement xkcd getRandomNumber in the whole code base

13:43 <thejpster[m]> <dirbaio[m]> "and the traits are in core. hell..." <- https://doc.rust-lang.org/nightly/core/random/index.html

13:43 <thejpster[m]> Nice

14:01 AtleoS has quit [Ping timeout: 252 seconds]

14:01 AtleoS has joined #rust-embedded

15:20 kenny has quit [Ping timeout: 246 seconds]

15:47 AtleoS has quit [Ping timeout: 260 seconds]

15:48 AtleoS has joined #rust-embedded

15:56 sugoi has joined #rust-embedded

16:05 kenny has joined #rust-embedded

16:17 kenny has quit [Ping timeout: 265 seconds]

16:26 dygear[m] has quit [Quit: Idle timeout reached: 172800s]

16:37 sugoi has quit [Quit: sugoi]

16:41 kenny has joined #rust-embedded

17:34 RidhamBhagat[m] has joined #rust-embedded

17:34 <RidhamBhagat[m]> Hello

17:34 * RidhamBhagat[m] uploaded an image: (149KiB) < https://catircservices.org/_irc/v1/media/download/AR1rOXWhnTudSu4c2EwNqhPOvFc2wXkMqH30AfDjgkB_yGxe-eGBjS3tyxP3SVD8UEp41YqjjqI3z2rs0ytPWmdCeSNMMlDQAG1hdHJpeC5vcmcvZU9ISHlDYUJWZFhqUUJXckZER0NVVk5B >

17:34 <RidhamBhagat[m]> so i was working on a telematics module and came across this ic which google does not like, where can I find more information on it

17:36 <JamesMunns[m]> With laser like that, it's probably a custom part that will be hard to find info for

17:36 <JamesMunns[m]> I don't think we can help you beyond googling the same info you would

17:36 <JamesMunns[m]> * With laser etchings like that,

17:37 andar1an[m]1 has joined #rust-embedded

17:37 <andar1an[m]1> Is there any rust firmware in works for wifi 7 like Qualcomm chips?

17:45 <andar1an[m]1> Friend asked, and I didn't know. I can check out resources later, just working one something else atm

17:52 <RidhamBhagat[m]> <JamesMunns[m]> "With laser like that, it's..." <- got it

17:52 <RidhamBhagat[m]> https://www.digikey.com/en/products/detail/asahi-kasei-microdevices-akm/AK4458VN/5287021

17:52 <RidhamBhagat[m]> i found this part inside a car's telematics module

17:53 <RidhamBhagat[m]> why would a dac for audio systems be inside a telematics moduel

17:55 <andar1an[m]1> Voice directions through infotainment?

17:55 <andar1an[m]1> Audio warnings?

17:56 <andar1an[m]1> Got onstar? Lol

17:57 <RidhamBhagat[m]> well it is a telematics module...

17:58 <andar1an[m]1> Why I said onstart

17:58 <andar1an[m]1> * Why I said onstar

17:58 <andar1an[m]1> Maybe they are just recording you and selling data to insurance lol

17:58 <RidhamBhagat[m]> that is the main concern lol

17:59 <RidhamBhagat[m]> it is a japanese manufacturer

17:59 <RidhamBhagat[m]> that makes very good trucks

18:00 <andar1an[m]1> There's at least 1 case about that currently haha

18:00 <andar1an[m]1> Is it connected to speakers?

18:00 <andar1an[m]1> Maybe there are audio prompts with updates

18:01 <RidhamBhagat[m]> andar1an[m]1: honestly , no clue. maybe

18:02 <andar1an[m]1> RidhamBhagat[m]: Thank you Taiichi Ohno for your lessons haha

18:02 <RidhamBhagat[m]> 😂

18:03 <RidhamBhagat[m]> <RidhamBhagat[m]> "ic_image.jpg" <- what would be the best way to know about this tho

18:03 <andar1an[m]1> Quite literally TPS was the foundation of agile, feels like it got butchered a bit

18:03 <andar1an[m]1> Contact manufacturer?

18:03 <andar1an[m]1> get a data sheet?

18:04 <andar1an[m]1> There has to be a BOM somewhere, maybe it has more details on IC

18:05 * RidhamBhagat[m] posted a file: (181KiB) < https://catircservices.org/_irc/v1/media/download/AWtNvk6sIJ75Nzxv8jQ-uFqdUChi1hGTJYZ9L--aQ3526fEGBVdqI4AnVyE08hoVmLprfhO54eg2tQ_UXKU6ecVCeSNN9R4AAG1hdHJpeC5vcmcvSUVYaHJ0V2dOaHhyQUJpb3BSV2FaRFhm >

18:05 cyrozap_ has joined #rust-embedded

18:05 <RidhamBhagat[m]> well i was able to get it from the fcc id of the device

18:05 <RidhamBhagat[m]> https://apps.fcc.gov/oetcf/eas/reports/ViewExhibitReport.cfm?mode=Exhibits&RequestTimeout=500&calledFromFrame=N&application_id=ThydOn9441xOX0kB8ewR0w%3D%3D&fcc_id=BEJTM03LNNATY0

18:05 <RidhamBhagat[m]> if you are skeptical downloading a pdf

18:06 <andar1an[m]1> im skeptical clicking anything haha

18:06 <andar1an[m]1> (clicked link though)

18:06 Ultrasauce has joined #rust-embedded

18:07 <andar1an[m]1> Is the link for the entire unit?

18:08 <RidhamBhagat[m]> pretty much

18:08 <andar1an[m]1> (the telematics unit)?

18:08 <andar1an[m]1> ah, I see

18:08 <RidhamBhagat[m]> https://autoparts.toyota.com/products/product/transceiver-telematics-8674106092

18:08 <RidhamBhagat[m]> this is the part

18:09 <andar1an[m]1> There is an electrical schematic on one of the manuals

18:10 <andar1an[m]1> but the IC looks like it has connections all around, and I am seeing nothing with that many connections

18:10 cyrozap has quit [*.net *.split]

18:10 jonored has quit [*.net *.split]

18:10 vancz has quit [*.net *.split]

18:10 sauce has quit [*.net *.split]

18:10 cyrozap_ is now known as cyrozap

18:10 <andar1an[m]1> could some of the solders be for fixing?

18:11 vancz has joined #rust-embedded

18:11 jonored has joined #rust-embedded

18:11 <RidhamBhagat[m]> andar1an[m]1: for fixing ?

18:11 <RidhamBhagat[m]> andar1an[m]1: wait , how did i miss that

18:11 <andar1an[m]1> mounting, ensuring it is afixed securely. It is a car so vibrations

18:11 <andar1an[m]1> there is a BOM on page 11

18:12 <andar1an[m]1> https://apps.fcc.gov/eas/GetApplicationAttachment.html?id=4017724 - if this is correct reference

18:12 <RidhamBhagat[m]> wait

18:12 <RidhamBhagat[m]> which module you are talking about

18:13 <RidhamBhagat[m]> andar1an[m]1: cant open this lol

18:13 <andar1an[m]1> I dk, that is from link you shared, but BOM doesn't make sense

18:13 <RidhamBhagat[m]> oooo thanks for letting me know that

18:13 <RidhamBhagat[m]> but it goes inside a glove box

18:13 <RidhamBhagat[m]> this module

18:13 <RidhamBhagat[m]> i doubt vibrations are much of a concern wrt nvh levels we are working with

18:14 <andar1an[m]1> There are also some DNI entries which I am assuming is "Do not identify" haha

18:14 <andar1an[m]1> nvh?

18:14 <RidhamBhagat[m]> Noise, vibration, and harshness

18:15 <RidhamBhagat[m]> andar1an[m]1: i still cannot find this module you are talking about lol

18:15 <RidhamBhagat[m]> gimme like 2

18:15 <RidhamBhagat[m]> but still

18:15 <RidhamBhagat[m]> what is a audio dac doing

18:16 <RidhamBhagat[m]> the manufacturer will tell me?

18:16 <andar1an[m]1> RidhamBhagat[m]: lol probably not, it isn't massive. But I don't know resonant frequencies you are dealing with

18:17 <andar1an[m]1> You said truck, I assumed big bumps

18:17 <RidhamBhagat[m]> andar1an[m]1: and these are those laser etched chips kinda thingies i am assuming

18:18 <andar1an[m]1> I dk, I imagine there are some means of protecting proprietary data for parts

18:18 <RidhamBhagat[m]> got it

18:18 <RidhamBhagat[m]> anything else interesting you came across that can possibly help me with this?

18:19 <andar1an[m]1> Still looking. It is interesting break from what I am stuck on haha

18:19 <andar1an[m]1> could it be interfacing with an antenna

18:20 <andar1an[m]1> could one use an audio dac for purposes of over the air updates

18:21 <andar1an[m]1> What I am looking at references antenna connections

18:21 <RidhamBhagat[m]> andar1an[m]1: yes i just got there

18:22 <RidhamBhagat[m]> also , why do all these chips have small QR codes on em

18:22 <andar1an[m]1> traceability

18:22 <JamesMunns[m]> it's usually serial numbers for inventory management systems

18:22 <RidhamBhagat[m]> which are practically unusable , some production thingy?

18:22 <RidhamBhagat[m]> JamesMunns[m]: ah okay

18:23 <RidhamBhagat[m]> I kinda wanted to know what is the main cpu for this thing

18:23 <RidhamBhagat[m]> whats the architecture, can i get a flash dump

18:23 <andar1an[m]1> maybe it is related to internal erp or pdm system, but I imagine it is more for government traceability

18:23 <RidhamBhagat[m]> maybe more youtubing and reddit is the wat to go

18:23 <RidhamBhagat[m]> andar1an[m]1: the qr codes?

18:24 <andar1an[m]1> automotive tends to be very guarded haha

18:24 <RidhamBhagat[m]> also the box had a imei number

18:25 <RidhamBhagat[m]> that means we are dealing with something with a sim is it?

18:28 * andar1an[m]1 uploaded an image: (193KiB) < https://catircservices.org/_irc/v1/media/download/AT97GuWVA_-xYrrDdH-C9bqPzVy6r-e295RVdFP-Shj48MF0l6gadAw0cpPt4McvHUZNQ5_CGdNKb2yNcVWpGwNCeSNPRabwAG1hdHJpeC5vcmcvQ1FMVEd1eFB6VU9PQkpmY3ROUUVGcVlS >

18:28 <andar1an[m]1> To confirm, this is the unit?

18:28 <JamesMunns[m]> maybe a note, none of this seems related to rust embedded :)

18:28 <JamesMunns[m]> maaaaybe take this one to DM or another room?

18:29 <andar1an[m]1> True, my bad

18:29 <RidhamBhagat[m]> oh okok , fair enough.

18:30 <RidhamBhagat[m]> dmed , did you recieve it?

18:31 <RidhamBhagat[m]> andar1an:

18:32 <andar1an[m]1> my matrix seems to have blown up, I will restart.

18:32 <andar1an[m]1> On a Rust note, if anyone know about wifi 7 firmware specifically for qualcomm chips in rust please @ me.

18:34 <konkers[m]> Has there been any exploration into keeping `PanicInfo` out of the firmware image and supporting it though something like `defmt`? Since the `PanicInfo` and `#[panic_handler]` APIs deal with `&str`s it would likely require `core` API and feature changes to support but could have some potentially sizable code size benefits.

18:47 <dirbaio[m]> no one has tried anything afaik

18:48 <dirbaio[m]> but I think it's not very feasible, because with panic! you can print anything that impls Display or Debug, which will necessarily do on-target formatting with bloated strings

18:50 <dirbaio[m]> my recommendation is to add defmt support to as many crates as you can, so at least their panics don't cause bloat

18:52 <dirbaio[m]> and enable build-std-features=panic_immediate_abort to get rid of bloat of all remaining stdlib panics. This means you don't get panic messages from them, but you can still grab the PC value and a register+stack dump in the HardFault handler and track them down that way.

18:58 <konkers[m]> <dirbaio[m]> "but I think it's not very..." <- Hrm.... that's a good, and unfortunate point.

19:03 AlexandrosLiarok has quit [Quit: Idle timeout reached: 172800s]

19:08 <konkers[m]> In my particular case I'm integrating with a C++ firmware that already uses https://pigweed.dev/pw_tokenizer/ so defmt support in crates doesn't immediately solve my problem. Though I could possibly provide a shim crate that provides the defmt API using pw_tokenizer. Might be a nightmare to maintain though.

19:09 <dirbaio[m]> uuuh good luck. defmt is much more powerful than pw_tokenizer

19:11 <konkers[m]> Curious which ways and if they're intractible? I'm only familiar with the gaps going the other way (specifically around stable token IDs).

19:22 <thejpster[m]> Someone tried to send a defmt PR to pull the message out of a PanicInfo but it was using an unstable API that didn’t actually work

19:23 <dirbaio[m]> and even with that, you can't tokenize the panic message

19:23 <dirbaio[m]> it'll still be a bloated string in flash

19:24 <dirbaio[m]> <konkers[m]> "Curious which ways and if they'..." <- uh it's hard to tell from pw_tokenizer's docs. but AFAICT pw_tokenizer tokenizes the "main" string and then serializes the args after it? so if you print a string with `%s` it'll get sent as-is over the wire

19:24 <dirbaio[m]> vs with defmt if you print something that impls defmt::Format with {:?}, the Format impl can write more tokenized strings to the stream with more args

19:24 <dirbaio[m]> so a defmt log message is a tree of tokenized strings, each with their arg substitutions as children.

19:28 <thejpster[m]> https://github.com/knurling-rs/defmt/pull/856 was the ticket, links to https://doc.rust-lang.org/nightly/core/panic/struct.PanicInfo.html#method.message. It helps, but not a lot.

19:33 <dirbaio[m]> yep pw_tokenizer messages are just one tokenized string, plus N args which can only be these 4 types https://cs.opensource.google/pigweed/pigweed/+/main:pw_tokenizer/public/pw_tokenizer/internal/argument_types.h;l=65-68?q=pw_tokenizer_ArgTypes&ss=pigweed%2Fpigweed:pw_tokenizer%2F

19:33 <dirbaio[m]> and strings aren't further tokenized

19:33 <konkers[m]> <dirbaio[m]> "so a defmt log message is a ..." <- I actually think this is doable with pw_tokenizer by either unfolding the tree at compile time to a single string literal or using "nested tokenization" ndhttps://pigweed.dev/seed/0105-pw_tokenizer-pw_log-nested-tokens.html)

19:33 <dirbaio[m]> so it's not a tree

19:34 <konkers[m]> Is there runtime dynamism with the tree?

19:36 <dirbaio[m]> oh hmm wtf

19:37 <dirbaio[m]> that works by embeddiing the "child" tokens into the parent token's string, then tokenizing it all as one? it's slightly different

19:38 <konkers[m]> It basically does recursive token substitution until there's nothing left then formats. It does require that the set of fields to format is fixed.

19:38 <dirbaio[m]> defmt just embeds the {:?} placeholder in the main tokenized stinrg

19:38 <dirbaio[m]> then the arg contains another tokenized string (with its own number), plus its own args

19:39 <konkers[m]> Ah, so values are tagged?

19:39 <dirbaio[m]> they're juts more tokens

19:39 <dirbaio[m]> s/juts/just/

19:40 <dirbaio[m]> {:?} tells the decoder "read a varuint, decode it as a token, then read and decode whatever placeholders it has"

19:40 <dirbaio[m]> so

19:40 <dirbaio[m]> a format impl can choose the token at runtime: if whatever { write!(fmt, "super long string one {} {}", foo, bar") } else { write!(fmt, "super long string two") }

19:40 <konkers[m]> Yeah, that's what I meant by tagged.

19:40 <dirbaio[m]> s//`/, s/"//, s//`/

19:42 <konkers[m]> I have to think on this some more but I think this could be made to work with the way pw_tokenizer works today... and if not with a small extension.

19:42 <konkers[m]> Thanks for all the info!

19:42 <dirbaio[m]> maybe it'll be easier to support both kinds of log messages over the wire 😅

19:42 <dirbaio[m]> have some way to mark "this is a defmt frame" and "this is a pw_tokenizer frame"

19:43 <dirbaio[m]> and support both in the decoder 🫠

19:43 <konkers[m]> One nice upshot with using printf format strings in pw_tokenizer is that it encodes the data types so they don't have to be tagged. (there are plenty of "downshots" of using printf format strings too)

19:43 <dirbaio[m]> defmt won't tag if you do info!("{=u32}", 42)

19:44 <dirbaio[m]> it will if you do info!("{:?}", 42), then it goes through the impl Format for u32 which does a write!(fmt, "{=u32}", self)

19:44 <dirbaio[m]> * it will if you do info!("{:?}", 42), because it goes through the impl Format for u32 which does a write!(fmt, "{=u32}", self)

19:44 <konkers[m]> Can it do type inference if I do info!("{}", 42u32)?

19:45 <dirbaio[m]> nope :(

19:45 <dirbaio[m]> types aren't available when the proc macro runs

19:45 <konkers[m]> There's been some ideas thrown around internally here about how to get around that. TAIT is one of those....

19:46 <dirbaio[m]> typeck runs after, on the macro output 🥲

19:46 <dirbaio[m]> I don't think TAIT helps

19:47 <konkers[m]> Lemme find where I'm doing this

19:52 <konkers[m]> https://cs.opensource.google/pigweed/pigweed/+/main:pw_format/rust/pw_format/macros.rs;l=644. Basically TAIT lets you name the type, which you can then access a `const &'static str` for that type. You can then concatenate strings together in a const fn that outputs the data to the "token database" linker section.

19:53 <konkers[m]> (our formatting system is quite complex as we want to accept both rust and printf format strings and output either of them based on the logging backend)

19:53 <dirbaio[m]> hmm

19:54 <dirbaio[m]> ah and then hash the result of concating everything in const?

19:54 <konkers[m]> Only a handful of types are implemented at the moment and there's no #[derive(Format)] support yet.

19:54 <konkers[m]> dirbaio[m]: yep

19:54 <dirbaio[m]> that is pretty cool

19:55 <dirbaio[m]> i'm not sure if you can do that in defmt, because the string has to end up in an attr

19:55 <dirbaio[m]> to be the symbol name

19:55 <konkers[m]> Does defmt do hashing? When I last looked, it looked like it was storing directed indexes into the message database which are not stable across builds.

19:55 <dirbaio[m]> defmt doesn't hash the strings, it makes them into "fake" symbols to get the linker assign addresses to them that are 1, 2, 3, 4...

19:56 <dirbaio[m]> so tokens are smaller when encoded as varuints, 1-2 bytes instead of 4 bytes as in pw_tokenizer

19:56 <konkers[m]> Yeah, it's hard requirement for out customers to have tooling that is agnostic to firmware version. So slurp up all the token database elf sections of all the versions and merge them.

19:57 <dirbaio[m]> and there's guaranteed no collision

19:57 <konkers[m]> s/out/our/

19:57 <dirbaio[m]> I had this idea of changing the proc macro to hit an sqlite db at compile time to allocate the ids

19:57 <konkers[m]> * Yeah, it's hard requirement for our customers to have tooling that is agnostic to firmware version. So we slurp up all the token database elf sections of all the versions and merge them.

19:57 <dirbaio[m]> if it exists, use the id, otherwise insert it with the next available id

19:58 <dirbaio[m]> so you could get small IDs, and get a single db capable of decoding logs from multiple fw versions by simply keeping that sqlite db between builds 🤣

19:58 <dirbaio[m]> * you could still get small, * small IDs insteead of 32bit hashes, and

19:58 <konkers[m]> Works if there's a single developer/build machine

19:59 <dirbaio[m]> why? you can have as many dev machines as you can

19:59 <dirbaio[m]> you only need to "serialize" the production builds that you're actually going to release

19:59 <dirbaio[m]> s/can/want/

19:59 <konkers[m]> Are you checking the sqlite db in to the reop?

19:59 <dirbaio[m]> asking the hard questions there 😅

19:59 <dirbaio[m]> no idea

20:00 <konkers[m]> Also, with a distributed build like bazel, your build might not even be happing all on one machine

20:01 <dirbaio[m]> chuck it in blob storage, make ci download, modify, upload?

20:01 <dirbaio[m]> lol

20:01 <konkers[m]> yeah, big lol :rofl:

20:01 <konkers[m]> * yeah, big lol

20:03 <dirbaio[m]> no i'm not joking. firmware build times aren't that long, plus you don't release new firmware versions to production that often

20:03 <konkers[m]> Perhaps a config option/feature could be added to defmt to use stable hashes and let the user decide?

20:03 <konkers[m]> If we had that, the mixing of defmt and pw_tokenizer logs would be a viable solution

20:04 <dirbaio[m]> why do you want bazel/blaze for firmware dev? it's not like "server Google" where they deploy 500mb binaries to borg where you either use blaze or die waiting

20:05 <dirbaio[m]> if a build takes 10min it means you can do 144 prod releases a day, it's more than enough :D

20:05 <dirbaio[m]> or

20:05 <dirbaio[m]> hey

20:05 <konkers[m]> For a handful of different reasons. A big one is that with many projects, the firmware is just one component. Also bazel pretty decent at handling polyglot projects.

20:05 <dirbaio[m]> make the proc macro hit a REST API that allocates the token IDs 🤣

20:06 <konkers[m]> Plus there's the "when you have a hammer, everything looks like a nail" reason too

20:08 <konkers[m]> connecting to a db as part of the build to allocate IDs also breaks build reproducibility

20:08 <dirbaio[m]> that's true!

20:09 <dirbaio[m]> <konkers[m]> "Perhaps a config option/..." <- yeah jokes aside, I can't see any reason not to do this, it seems very doable.

20:10 <konkers[m]> dirbaio[m]: I'll explore that a bit and see how invasive it would be.

20:10 <dirbaio[m]> you'd get bigger tokens, but stable across builds

20:11 <konkers[m]> Yeah, it seems like a tradeoff that some projects would be happy to make

20:14 <dirbaio[m]> ah there's another problem, tokens are a fixed u16 on the wire 😅

20:16 <thejpster[m]> If you can think of a way to widen that without breaking anything I’m all ears

20:16 <thejpster[m]> I want to do a 1.0 and I want 0.3.99 to re-export 1.0

20:17 <dirbaio[m]> defmt 1.0?

20:18 <thejpster[m]> Personally I think it’s long overdue. Not my call though.

20:19 <konkers[m]> <dirbaio[m]> "ah there's another problem..." <- ah, so not varint encoded?

20:20 <dirbaio[m]> it doesn't use varints. ints are fixed width, then the entire stream is compressed in a way that makes zeros take less space https://defmt.ferrous-systems.com/encoding

20:20 <dirbaio[m]> yields smaller code size

20:21 <konkers[m]> dirbaio[m]: Interesting!!

20:23 <dirbaio[m]> <thejpster[m]> "If you can think of a way to..." <- wire format is versioned independently, so a new 0.3.x release could switch to "wire format v5" where tokens are u32

20:23 <dirbaio[m]> together with a defmt-decoder release that adds support for decoding v5, while keeping v4

20:24 <thejpster[m]> The version is encoded in a symbol in the defmt section I think

20:24 <dirbaio[m]> yep, so the decoder can tell which to use

20:25 <thejpster[m]> I’m sure there will be a request for comments before it’s nailed down

20:25 <thejpster[m]> The trick will be allowing future changes without breaking any existing users

20:25 <dirbaio[m]> the advantage of versioning the wire format independently is you can make breaking changes to it without doing a major bump to the defmt crate

20:26 <dirbaio[m]> that's "technically breaking" since it might cause a setup to stop working if you update defmt but not update defmt-decoder on the pc side

20:27 <dirbaio[m]> in the same way a msrv bump is "technically breaking" because your build will stop working if you update defmt without updating rustc

20:27 <thejpster[m]> I’m ok with that. Host tools are easier to update and you should keep them in sync with your fleet

20:27 <dirbaio[m]> but imo that's acceptable

20:27 <dirbaio[m]> it's wayy wayy better than freezing the wire format together

20:27 <dirbaio[m]> s/together/forever/

20:28 <dirbaio[m]> or worse, doing a major bump of the defmt crate 💀

20:28 <dirbaio[m]> so the contract would be "if you update defmt 0.3.x to 0.3.y your firmware will always keep buidling, but you might find it starts speaking a newer wire format"

20:28 <dirbaio[m]> so, no breaking changes to the rust api, just to the wire format

20:38 <JamesMunns[m]> <dirbaio[m]> "you only need to "serialize" the..." <- I'm actually working on an alternative to this.

20:39 <JamesMunns[m]> The device keeps the schemas, and reports them at runtime

20:39 <JamesMunns[m]> I'm building a server side management tool that lets you just get type safe access to fields based on the uploaded schemas

20:40 <JamesMunns[m]> It also has a DB with a rolling history

20:40 <JamesMunns[m]> So if schemas change over time you can still get decoded logs

20:40 <JamesMunns[m]> And it's postcard on the wire for encoding.

20:41 <konkers[m]> JamesMunns[m]: Doesn't that negate the code size benefits of tokenizing the strings?

20:41 <JamesMunns[m]> And also probably running it as a lib crate if you don't want a persistent server

20:41 <JamesMunns[m]> konkers[m]: Yes, it's fine on bigger chips like the rp2040

20:42 <JamesMunns[m]> JamesMunns[m]: Might not scale well to stm32g0.

20:42 <JamesMunns[m]> But my goal is to do it over USB with no debugger, so that's larger chips anyway.

20:45 <konkers[m]> Most of the users of pw_tokenozer are very code size constrained and that's why they adopt it. Having megabytes of code size doesn't seem to be a thing in the near future for high volume products

20:45 <JamesMunns[m]> Schemas for reasonably sized types are like 64-256 bytes

20:46 <JamesMunns[m]> You would be able to fit thousands of scemas in 1MiB :)

20:47 <JamesMunns[m]> If you only want to send logs and already have USB, I would bet it takes only a few (dozen?) K of codespace

20:47 <JamesMunns[m]> But if you don't have USB, I might be able to do it over rtt too

20:48 <JamesMunns[m]> I'll come back with real numbers once I have it more built out :)

20:48 <JamesMunns[m]> s/scemas/schemas/

20:49 <konkers[m]> Do these schemas include the string content of the log?

20:49 <JamesMunns[m]> Yes, I get that makes it take more space. It also makes managing arbitrary devices with code written by other people manageable

20:50 <JamesMunns[m]> If you aren't passing around elfs.

20:50 <JamesMunns[m]> It also makes it possible to dynamically generate interfaces

20:50 <JamesMunns[m]> Anyway, I think I'm solving a different problem, for sure.

20:51 <konkers[m]> Related to postcard-rpc?

20:51 <JamesMunns[m]> Yep

20:51 <JamesMunns[m]> Basically a managed reverse proxy for multiple connected devices

20:51 <JamesMunns[m]> A whole post station, maybe :)

20:52 <konkers[m]> Neat. I was just looking at that the other day for a hobby project.

20:52 <konkers[m]> The N count for rpc schema is definitely a lot less than lines of logging.

20:53 <JamesMunns[m]> The goal is also to support fancy structured logging too, so you get similar deferred formatting benefits

20:53 <JamesMunns[m]> More like tracing than defmt, but some (unformatted) strings are still passed on the wire

20:55 <konkers[m]> We've had similar thoughts with our tokenized logging but have not had time to fully spec it out much less implement it.

20:56 <JamesMunns[m]> My schemas are serialized to postcard, based on the stable wire format and schema types, so I could stabilize it as a standard soonish

20:57 <JamesMunns[m]> I also have "type punning" working, so you can send a `&[u8]` on the MCU side, and receive it as a Vec<u8> on the PC side

20:57 <JamesMunns[m]> I'm excited about it :D

21:12 <konkers[m]> Have you thought much about MCU to MCU comms? pw_rpc gets used for that as well as MCU to host.

21:31 <JamesMunns[m]> The intent is to have an MCU client as well for postcard rpc

21:31 <JamesMunns[m]> At least for point to point comms, or over an existing network.