ChanServ changed the topic of #rust-embedded to: Welcome to the Rust Embedded IRC channel! Bridged to #rust-embedded:matrix.org and logged at https://libera.irclog.whitequark.org/rust-embedded, code of conduct at https://www.rust-lang.org/conduct.html
<re_irc> <thejpster> Sorry, I've been fiddling again. I drew ISO and ANSI keyboards as ASCII art because I was still getting confused about the symbolic keycap naming: https://github.com/rust-embedded-community/pc-keyboard/tree/moredocs#keycodes
<re_irc> <thejpster> Turns out the site I was using as a reference used _different_ symbolic names for the _same_ scancodes depending on whether it was ANSI or ISO.
<re_irc> <thejpster> I can't be doing with that.
<re_irc> <thejpster> I promise I'll really try and get 0.7.0 shipped after this round.
<re_irc> < (@datdenkikniet:matrix.org)> https://github.com/rust-embedded/cortex-m/pull/463
<re_irc> < (@adamgreig:matrix.org)> how did you find that, ?
<re_irc> < (@datdenkikniet:matrix.org)> See #embassy-rs:matrix.org (https://matrix.to/#/#embassy-rs:matrix.org) :D
<re_irc> < (@adamgreig:matrix.org)> well, that must have been fun
<re_irc> < (@adamgreig:matrix.org)> stupid "do weird non-platform-standard things in init code to appease buggy debuggers" strikes again
<re_irc> < (@dirbaio:matrix.org)> the issue appeared in a nightly bump, so we spent a looooot of time thinking it was a miscompile 😂
<re_irc> < (@datdenkikniet:matrix.org)> +(We had a bit of a journey over there)
<re_irc> < (@dirbaio:matrix.org)> while all it did was change codegen enough to trigger a "strd" :P
<re_irc> < (@dirbaio:matrix.org)> +on stack
<re_irc> < (@adamgreig:matrix.org)> I guess it's been broken since 2021-11-22, so c-m-rt 0.7.0 should have been ok but not 0.7.1? still looking through embassy scrollback though, hadn't been up to date on it for a few days obviously
<re_irc> < (@adamgreig:matrix.org)> ah yes, literally just got to the line about it being OK on 0.7.0, lol
<re_irc> < (@dirbaio:matrix.org)> yeah, 0.7.0 works, 0.7.[12] fails
<re_irc> < (@dirbaio:matrix.org)> i'm VERY surprised this went unnoticed for so long :O
<re_irc> < (@adamgreig:matrix.org)> yea, did you establish if new nightly codegen just hits it more often than before?
<re_irc> < (@dirbaio:matrix.org)> I don't think so, it's probably just bad luck
<re_irc> < (@dirbaio:matrix.org)> in Peter Hansen's code the original bug manifested when bumping nightly-2023-02-04 -> nightly-2023-02-05
<re_irc> <Peter Hansen> : Something changed in 2023-02-05 affected some optimization relating to having a local 2-element array in a function, and picking an index into it with something that amounts to a boolean.
<re_irc> < (@dirbaio:matrix.org)> but the other repros reproduce up to nightlies from 2021
<re_irc> < (@adamgreig:matrix.org)> what target were you building for?
<re_irc> <Peter Hansen> It was an extremely roundabout path (and hours of work) to get back to the real cause.... thumbv7em-none-eabihf
<re_irc> < (@dirbaio:matrix.org)> reproduces with all thumbs
<re_irc> < (@dirbaio:matrix.org)> (didn't test further back than 2021 because lack of "global_asm!" support breaks the build)
<re_irc> < (@adamgreig:matrix.org)> ah, though 0.7.1 didn't have global_asm and did have the bug, so I guess would have included it
<re_irc> < (@adamgreig:matrix.org)> was curious if it was a v6 vs v7 thing, as v6 requires 8 byte aligned SP on function entry but v7 doesn't
<re_irc> < (@dirbaio:matrix.org)> v7 doesn't?
<re_irc> < (@adamgreig:matrix.org)> but it's plausible that LLVM requires it regardless
<re_irc> < (@adamgreig:matrix.org)> v7 requires 4 byte SP alignment, but if LLVM requires 8 byte SP we should be setting STKALIGN in CCR to tell the CPU to make it always 8 bytes on interrupt entry too
<re_irc> < (@adamgreig:matrix.org)> (arm recommend 8 byte alignment anyway)
<re_irc> < (@jamesmunns:beeper.com)> yo, that's one hell of a bug. Good sleuthing!
<re_irc> < (@adamgreig:matrix.org)> more tempted to delete the whole thing and tell debuggers to stop being stupid dum-dums, but alas, we fix what we can fix and take the rest with grace I guess
<re_irc> < (@adamgreig:matrix.org)> same deal with having to manually set the SP a few clock cycles after the CPU does it for us -_-
<re_irc> < (@jamesmunns:beeper.com)> (is there a citation for "LLVM/ARM requires 8 byte alignment"?)
<re_irc> < (@dirbaio:matrix.org)> "some software standards require the stack pointer to be 8-byte aligned"
<re_irc> < (@dirbaio:matrix.org)> "ARM deprecates implementation or use of 4-byte SP alignment."
<re_irc> < (@dirbaio:matrix.org)> "the AAPCS requires 8-byte stack pointer alignment on entry to a conforming function"
<re_irc> < (@adamgreig:matrix.org)> which is "always 4 byte aligned, and 8 byte aligned on entry to an AAPCS function"
<re_irc> < (@dirbaio:matrix.org)> ah yep that does it
<re_irc> < (@adamgreig:matrix.org)> v7-M guarantees 4-byte alignment but you can set STKALIGN to say it should be 8 byte, though I imagine many implementations make this RAO anyway
<re_irc> < (@adamgreig:matrix.org)> with -hf target the FP state preservation takes precedence over stkalign too, so
<re_irc> < (@adamgreig:matrix.org)> I wonder if anything out there doesn't reset it to 1...
<re_irc> < (@dirbaio:matrix.org)> hopefully not 😰
<re_irc> < (@jamesmunns:beeper.com)> whew
<re_irc> < (@adamgreig:matrix.org)> do any other init routines push lr to the stack? I haven't immediately found anyone else doing it but presumably it came from somewhere
<re_irc> < (@datdenkikniet:matrix.org)> Zephyr seems to, at least
<re_irc> < (@datdenkikniet:matrix.org)> Oh but that's some ISR thing, so not 100% sure that it applies here
dne has quit [Remote host closed the connection]
<re_irc> < (@datdenkikniet:matrix.org)> They do also seem to align it to 8 bytes, though
dne has joined #rust-embedded
<re_irc> < (@dirbaio:matrix.org)> there's a related footgun
<re_irc> < (@dirbaio:matrix.org)> if you set RAM size in memory.x to something not multiple of 8
<re_irc> < (@dirbaio:matrix.org)> then stack will also be unaligned
<re_irc> < (@adamgreig:matrix.org)> libopencm3 has a brave "wrote r0 in C", doesn't push LR to the stack, but _does_ set STKALIGN in CCR: https://github.com/libopencm3/libopencm3/blob/8bc483746bd78f2a398f2949420a4128eed5272c/lib/cm3/vector.c#L62
<re_irc> < (@adamgreig:matrix.org)> "/* Enabled by default on most Cortex-M parts, but not M3 r1 */" 😱
<re_irc> < (@adamgreig:matrix.org)> ST's HAL (at least for F1, it's a million startup.s files for every part number of course) doesn't push LR, doesn't seem to set STKALIGN (I guess they already know if their parts have it hardcoded) https://github.com/STMicroelectronics/STM32CubeF1/blob/master/Drivers/CMSIS/Device/ST/STM32F1xx/Source/Templates/gcc/startup_stm32f100xb.s#L100
<re_irc> < (@adamgreig:matrix.org)> this was the original PR in c-m-rt, suggesting it was added to fix a false positive stack overflow in probe-run? https://github.com/rust-embedded/cortex-m-rt/pull/337
<re_irc> < (@adamgreig:matrix.org)> but presumably people don't have their debuggers unwind past reset in all those other startup files, so I wonder what's going on there
<re_irc> < (@adamgreig:matrix.org)> if we can avoid wasting 8 bytes of everyone's stack forever that would be nice
<re_irc> < (@adamgreig:matrix.org)> fair enough that "bl main" at the end of the reset handler will put some PC value from the middle of the reset handler into LR, which indeed overwrites the FFFFFFFF we put into it manually
<re_irc> < (@dirbaio:matrix.org)> why not "b main" instead?
<re_irc> < (@9names:matrix.org)> what would that look like when unwinding?
<re_irc> < (@dirbaio:matrix.org)> that means we'd be relying in the fact the macro enforces main is "-> !"
<re_irc> < (@dirbaio:matrix.org)> but I think that's fine?
<re_irc> < (@dirbaio:matrix.org)> unwind would look like: main, then nothing? hopefully
<re_irc> < (@adamgreig:matrix.org)> the reason for "bl" is that "b" has a very limited jump range
<re_irc> < (@adamgreig:matrix.org)> 1MB for b, 16MB for bl
<re_irc> < (@dirbaio:matrix.org)> then ldr+bx?
<re_irc> < (@adamgreig:matrix.org)> (that's why all those startup files above use bl too)
<re_irc> < (@adamgreig:matrix.org)> hmm
<re_irc> < (@dirbaio:matrix.org)> but
<re_irc> < (@dirbaio:matrix.org)> if everyone uses "bl" and doesn't push LR... we're the odd ones
<re_irc> < (@adamgreig:matrix.org)> yep
<re_irc> < (@adamgreig:matrix.org)> I mean, it seems totally reasonable for lr to indicate that reset called main
<re_irc> < (@adamgreig:matrix.org)> how does the debugger know to stop unwinding when it gets to a stack frame inside reset, though?
<re_irc> < (@dirbaio:matrix.org)> debuggers can already know main -> reset, that's the LR in main's stack frame
<re_irc> < (@dirbaio:matrix.org)> then presumably they look at unwind info for reset
<re_irc> < (@dirbaio:matrix.org)> and see.. nothing?
<re_irc> < (@dirbaio:matrix.org)> perhaps the issue is that we _do_ have the cfi stuff and the " .type Reset,%function"?
<re_irc> < (@dirbaio:matrix.org)> all the ones you linked don't
<re_irc> < (@dirbaio:matrix.org)> so perhaps that's how the debugger knows "this is a random bag of ASM, I should stop trying to unwind"?
<re_irc> < (@dirbaio:matrix.org)> +not a function,
<re_irc> < (@adamgreig:matrix.org)> it wouldn't be the lr in main's stack frame if we used bx, though, right?
<re_irc> < (@dirbaio:matrix.org)> yeah, so that LR in main's frame is 0xFFFF_FFFF
<re_irc> < (@adamgreig:matrix.org)> all I meant was it seems fair enough to use bl and therefore enter main with an lr pointing back to reset
<re_irc> < (@dirbaio:matrix.org)> but then you wouldn't see Reset at all in the unwind, that's less nice
<re_irc> < (@dirbaio:matrix.org)> yeah
<re_irc> < (@adamgreig:matrix.org)> guess we'll need a quick reproducer setup for the unwinding issue then
<re_irc> < (@adamgreig:matrix.org)> does probe-run do its own unwinding... wonder when that one decides to stop
<re_irc> < (@adamgreig:matrix.org)> for the stack alignment check in the second pr, I haven't looked into it but I wonder if we can force a suitable alignment instead?
<re_irc> < (@adamgreig:matrix.org)> maybe it's fine to just error out but perhaps a more actionable error message would be useful in that case ("check ram size is a multiple of 8" on the second line or something maybe)
<re_irc> < (@dirbaio:matrix.org)> yeah I thought so too
<re_irc> < (@adamgreig:matrix.org)> : libopencm3's is written in C so I suppose it effectively does have that :P
<re_irc> < (@adamgreig:matrix.org)> and tock's is a rust naked function so again I expect should do
<re_irc> < (@dirbaio:matrix.org)> the libopencm3 one is cursed :P
<re_irc> < (@dirbaio:matrix.org)> adding asm to Reset to align the stack would waste bytes
<re_irc> < (@dirbaio:matrix.org)> and doing it in the linker script.. it'd need to align _down_ to 8 bytes, can you even do that?
<re_irc> < (@adamgreig:matrix.org)> just zero the last few bits?
<re_irc> < (@adamgreig:matrix.org)> given as it would just mean the last few bytes of ram can never be used anyway in that case I think it's OK to have it error instead
<re_irc> < (@adamgreig:matrix.org)> if naked functions had stabilised already we could have used them and dropped all the cfi stuff anyway, sigh
<re_irc> < (@dirbaio:matrix.org)> added more actionable error message :P
<re_irc> < (@adamgreig:matrix.org)> it's not completely clear to me but it seems like probe-run's unwinder does just wait to see 0xFFFFFFFF? https://github.com/knurling-rs/probe-run/blob/main/src/backtrace/unwind.rs#L91
<re_irc> < (@adamgreig:matrix.org)> wonder if gdb etc have the same problem
<re_irc> < (@dirbaio:matrix.org)> yea, but for that it needs reset to have proper cfi (?)
<re_irc> < (@adamgreig:matrix.org)> : thanks. hopefully anyone overriding _stack_start to some weird value will be able to work out what's up too lol
<re_irc> < (@dirbaio:matrix.org)> if it lands somewhere with no cfi it prints a scary "things might be corupted" message
<re_irc> < (@dirbaio:matrix.org)> : ah you can _override_ it
<re_irc> < (@adamgreig:matrix.org)> naturally 😆
<re_irc> < (@dirbaio:matrix.org)> ERROR(cortex-m-rt): stack start address is not 8-byte aligned.
<re_irc> By default, stack starts at the end of RAM. Check that both RAM origin and
<re_irc> length are set to multiples of 8, in the `memory.x` file.
<re_irc> < (@dirbaio:matrix.org)> better with "by default"
<re_irc> < (@adamgreig:matrix.org)> looks good, doesn't need the comma after 8 though
<re_irc> < (@dirbaio:matrix.org)> ERROR(cortex-m-rt): stack start address is not 8-byte aligned.
<re_irc> If you haven't, stack starts at the end of RAM by default. Check that both RAM
<re_irc> origin and length are set to multiples of 8 in the `memory.x` file.
<re_irc> If you have set _stack_start, check it's set to an address multiple of 8 bytes.
<re_irc> < (@adamgreig:matrix.org)> "set to a multiple of 8 bytes" or "set to an address which is a multiple of 8 bytes" maybe? sorry to nitpick
<re_irc> < (@dirbaio:matrix.org)> I wonder, if you add a second linker script (like defmt does), does that run "after" link.x? and if so, can it change _stack_start to something bad after the assert has already ran?
<re_irc> < (@adamgreig:matrix.org)> I think in that case " PROVIDE(_stack_start = ORIGIN(RAM) + LENGTH(RAM));" from our link.x would have already run
<re_irc> < (@adamgreig:matrix.org)> it's only overridable in the previously-included memory.x in that case?
<re_irc> < (@adamgreig:matrix.org)> I am not completely confident in this though
<re_irc> < (@adamgreig:matrix.org)> (and if overridden in memory.x, the assert would happen afterwards so still catch any upsets)
<re_irc> < (@jamesmunns:beeper.com)> maybe dumb choice - if it can be overridden, do an asm "mask lower three bits, immediately halt if != 0"?
<re_irc> < (@jamesmunns:beeper.com)> (at runtime start)
<re_irc> < (@jamesmunns:beeper.com)> (would increase code size like... two instructions?)
<re_irc> < (@dirbaio:matrix.org)> noooo!you're wasting 4-8 bytes! ☠️
<re_irc> < (@dirbaio:matrix.org)> * noooo! you're
<re_irc> < (@jamesmunns:beeper.com)> lmao
<re_irc> < (@adamgreig:matrix.org)> I'm very reluctant to add that sort of runtime overhead if we can possibly avoid it
<re_irc> <Peter Hansen> Technically one could have _stack_start be 8-aligned even if neither RAM origin nor length were multiples of 8, as long as their sum was... maybe anyone doing that should be shot though.
<re_irc> < (@adamgreig:matrix.org)> so long as it's 8 byte aligned it doesn't matter if ram origin and length aren't multiples of 8 anyway though
<re_irc> < (@adamgreig:matrix.org)> I'd much rather just put in the documentation that if you override _stack_start, it must be a multiple of 8
<re_irc> <Peter Hansen> : Yeah, that was in reference to the text in the ASSERT added.
<re_irc> < (@jamesmunns:beeper.com)> fair! I figured "one assert at startup won't hurt", but again, probably not unreasonable to "fix it in docs"
<re_irc> < (@jamesmunns:beeper.com)> might be fun to have a bunch of "pedantic asserts" in a feature flag we can ask people to turn on when spooky things happen
<re_irc> < (@jamesmunns:beeper.com)> (can you gate asm blocks on "#[cfg(debug_assertions)]"?)
<re_irc> < (@jamesmunns:beeper.com)> like checking all the alignment and whatever else preconditions otherwise only specified in docs.
<re_irc> <Peter Hansen> Maybe better to have the pedantic asserts enabled normally but turnable-off only with an explicit feature.
<re_irc> < (@adamgreig:matrix.org)> best to avoid "turn _off_ with feature"
<re_irc> < (@adamgreig:matrix.org)> but also don't want to say "checks only on debug builds" because they're often not usable on embedded anyway and/or would totally obscure the problem
<re_irc> < (@adamgreig:matrix.org)> a feature that turns on some extra startup checks, akin to the existing set-vtor and set-sp features, I could live with
<re_irc> < (@jamesmunns:beeper.com)> yeah, could be an explicit flag I guess.
<re_irc> < (@jamesmunns:beeper.com)> (and really - should be separate from debug assertions for the reasons you said, the more I think about it)
<re_irc> <Peter Hansen> I was thinking that if they're not merely pedantic, but basically mandatory unless you know what you're doing, then "#[cfg(not(feature = "i_assume_the_risk"))" would make some sense. Maybe some of the checks are merely pedantic, but one where your system is broken like this one, unless you know the implications and take steps to deal with it should be more forceful maybe.
<re_irc> < (@jamesmunns:beeper.com)> its more about "how cargo works" than "should this be on or off"
<re_irc> < (@jamesmunns:beeper.com)> ANY dep could set "lol_trust_me" as a feature flag, and the checks will be gone.
<re_irc> <Peter Hansen> I don't mean the debug assertions aspect, I agree there. I mean this stack_start 8-alignment... it's basically mandatory, except for someone who knows what they're doing, right?
<re_irc> < (@jamesmunns:beeper.com)> (this is why "disable" features are not recommended in Rust)
<re_irc> <Peter Hansen> but okay, i see your point
<re_irc> < (@jamesmunns:beeper.com)> all feature flags are basically "globally OR'd together".
<re_irc> <Peter Hansen> yeah, bummer :(
<re_irc> < (@jamesmunns:beeper.com)> Yeah, features are one of those things I wish was different, but sorta get how we ended up with the current impl.
<re_irc> < (@jamesmunns:beeper.com)> maybe "features2" some day.
<re_irc> < (@adamgreig:matrix.org)> well, boo, the second link script does just get glued in afterwards, and so if you did override _stack_start in an additional link script it won't trigger the assert
<re_irc> < (@adamgreig:matrix.org)> so that's a pain but it's also so many levels of cursed
<re_irc> < (@adamgreig:matrix.org)> ho hum
<re_irc> < (@dirbaio:matrix.org)> lol
<re_irc> < (@adamgreig:matrix.org)> however you _can_ put "LONG(_stack_start & 0xFFFFFFF8);" in link.x
<re_irc> < (@adamgreig:matrix.org)> which (I just tested) does indeed zero the bottom three bits
<re_irc> < (@adamgreig:matrix.org)> which will round _down_ "_stack_start" to the next lower multiple of 8, so seems ok?
<re_irc> < (@adamgreig:matrix.org)> needs testing on arm-none-eabi-gcc and arm-none-eabi-ld as well as rustlld though
<re_irc> < (@adamgreig:matrix.org)> I guess we could turn the error into a warning with that
<re_irc> < (@adamgreig:matrix.org)> (I don't know that warnings are a thing here though, so I'd probably just keep it as an error)
<re_irc> < (@dirbaio:matrix.org)> why'd you want to set a nonaligned stack though?
<re_irc> < (@adamgreig:matrix.org)> it's always a mistake in rust, so I think you wouldn't want that
<re_irc> < (@adamgreig:matrix.org)> (but the warning is because the linker script would force alignment)
<re_irc> < (@adamgreig:matrix.org)> (so it's no longer an error, just letting you know you're wasting 1-7 bytes of ram)
<re_irc> < (@dirbaio:matrix.org)> linker warnings aren't great, cargo doesn't show them unless there's also an error
<re_irc> < (@adamgreig:matrix.org)> yea
<re_irc> < (@adamgreig:matrix.org)> hence just keeping it as an error
<re_irc> < (@adamgreig:matrix.org)> but by masking it in the linker script, we do ensure that even overriding _stack_start in a separate file won't cause this problem
<re_irc> < (@dirbaio:matrix.org)> ahhh
<re_irc> < (@dirbaio:matrix.org)> so, the "LONG(_stack_start & 0xFFFFFFF8);" executes "after", even if it's earlier in the scripts than the override?
<re_irc> < (@dirbaio:matrix.org)> i've never fully understood linker scripts and never will lol
<re_irc> < (@adamgreig:matrix.org)> yea
<re_irc> < (@adamgreig:matrix.org)> I mean, if it executed "before" then we wouldn't have a problem because the second linker script couldn't override it
<re_irc> < (@adamgreig:matrix.org)> well that's https://github.com/rust-embedded/cortex-m/pull/465, let's see what the CI makes of it on the other linkers
<re_irc> < (@adamgreig:matrix.org)> seems fine locally at least
<re_irc> < (@adamgreig:matrix.org)> for the main problem, I propose finding out if we can just remove the push-lr-to-stack business entirely, by testing out behaviour of probe-run and gdb against not having it, not having it and also deleting the cfi func stuff, and some binaries for chibios/st hal/etc that don't include it either
<re_irc> < (@adamgreig:matrix.org)> it seems nicer and more in line with what "everyone else does", just with the caveat that somehow probe-run might need some other way to detect the end of the unwinding?
<re_irc> < (@jamesmunns:beeper.com)> Does stack_end need to be 8-aligned too?
<re_irc> < (@jamesmunns:beeper.com)> that'll require a slightly tricker "LONG((_stack_end & 0xFFFFFFF8) + 8);"
<re_irc> < (@jamesmunns:beeper.com)> I guess we don't ever actually produce a stack end symbol, it'll just be "the top of whatever else gets put in ram"
<re_irc> < (@jamesmunns:beeper.com)> I thought I saw something in that aapcs doc that talked about stack end, but if we hit it we're probably having a bad day anyway
<re_irc> < (@adamgreig:matrix.org)> stack _end_ isn't a real concept afaik, there is stklim on armv8 which is a separate really good cool thing but probably not material here
<re_irc> < (@adamgreig:matrix.org)> (psplim/msplim rather)
<re_irc> < (@adamgreig:matrix.org)> those registers have the last 3 bytes always 0 anyway
<re_irc> < (@jamesmunns:beeper.com)> yeah, looking again I can't find anything
<re_irc> < (@adamgreig:matrix.org)> * bits
<re_irc> < (@jamesmunns:beeper.com)> so... I should go to bed lol
<re_irc> < (@adamgreig:matrix.org)> same...
<re_irc> < (@adamgreig:matrix.org)> good work everyone involved in spotting this, what a subtle bug!
<re_irc> < (@adamgreig:matrix.org)> ARM have some startup code too but it's very basic https://github.com/ARM-software/CMSIS/blob/master/Device/ARM/ARMCM0/Source/GCC/startup_ARMCM0.S
<re_irc> < (@jamesmunns:beeper.com)> should probably be pretty noisy about this one, I'd guess?
<re_irc> < (@jamesmunns:beeper.com)> I don't totally grok when this could trigger, it seemed to require a pretty niche opt?
<re_irc> < (@jamesmunns:beeper.com)> (noisy once the next c-m-rt release happens, I mean)
<re_irc> < (@adamgreig:matrix.org)> I think strd/ldrd are the main culprits, but if there was a reproducer on v6 it can't have involved them
<re_irc> < (@adamgreig:matrix.org)> but yea, let's get it fixed first, then a new c-m-rt release and tweet
<re_irc> < (@adamgreig:matrix.org)> yea
<re_irc> < (@jamesmunns:beeper.com)> should just require a "cargo update -p cortex-m-rt" if you're already on cmrt 0.7?
<re_irc> < (@jamesmunns:beeper.com)> (I think 0.6 happened to be unaffected/was already 8 byte aligned somehow?)
<re_irc> < (@adamgreig:matrix.org)> 0.6 never included the code to push lr to stack, so yea, it's unaffected
<re_irc> < (@adamgreig:matrix.org)> 0.7.0 is OK too, only 0.7.1 and 0.7.2 are affected
<re_irc> < (@adamgreig:matrix.org)> (it's not that we need to take action to align the stack to 8 bytes, it starts out aligned, the problem is that we added an instruction in 0.7.1 that pushes 4 bytes to the stack which unaligned it)
<re_irc> < (@adamgreig:matrix.org)> it would be interesting to understand what else could be going wrong on v6 that triggers it, hmm
<re_irc> < (@adamgreig:matrix.org)> ARM have a tech note about this problem lol
<re_irc> < (@adamgreig:matrix.org)> but, armv6m doesn't include ldrd/strd, and that doc notes that armv7m doesn't fault on misaligned strd/ldrd
<re_irc> < (@jamesmunns:beeper.com)> that note reads like LDRD/STRD exist in armv6? Is there just no thumb version?
<re_irc> < (@jamesmunns:beeper.com)> oh
<re_irc> < (@adamgreig:matrix.org)> there's a special note about cortex-m at the bottom
<re_irc> < (@adamgreig:matrix.org)> but it's really just about exceptions
<re_irc> < (@adamgreig:matrix.org)> so, I think if you were dealing with u128 on the stack it might easily trigger this, but I don't yet understand what was going wrong in the previous reproducers
<re_irc> < (@jamesmunns:beeper.com)> seemed to only repro with an array of size two, maybe it was using ldrd/strd to load those to registers faster?
<re_irc> < (@jamesmunns:beeper.com)> their repro case was using "[u32; 2]"
<re_irc> < (@jamesmunns:beeper.com)> so yeah, doing ldrd/strd for splatting those two from mem to registers makes sense?
<re_irc> < (@adamgreig:matrix.org)> yea, but ldrd is specifically meant to work fine with a word-aligned address, it doesn't need to be 8-byte aligned on armv7m
<re_irc> < (@adamgreig:matrix.org)> and it doesn't exist at all on armv6m which apparently also had problems
<re_irc> <Peter Hansen> I think datdenkikniet had identified an optimization that was applied only in the case of a two-element array, and could involve that strd instruction. The generated code was different before nightly 02-05 so it didn't trigger the optimization which was broken in the face of the non-8-aligned SP.
<re_irc> < (@adamgreig:matrix.org)> so I don't think the problem is actually the ldrd/strd
<re_irc> <Peter Hansen> But it got hard to follow the specifics since at some point the minimal example ended up no longer failing just with 02-05 but was failing with any Rust going back over a year.
<re_irc> <Peter Hansen> I'd just guess that the [u32; 2] situation tends to involve code that may be broken if SP is not 8-aligned. It was likely any of several different instructions involved depending on the specifics of the test code.
<re_irc> < (@adamgreig:matrix.org)> yea, but I wonder what exactly
<re_irc> <Peter Hansen> I actually never noticed an strd myself... not sure where picked that one... maybe in his own output.
<re_irc> < (@adamgreig:matrix.org)> out of interest did you have any u128 anywhere in the troublesome code?
<re_irc> <Peter Hansen> nope
<re_irc> < (@adamgreig:matrix.org)> it's not like [u32; 2] needs 8-byte alignment either
<re_irc> < (@adamgreig:matrix.org)> do you have a complete minimal reproducer anywhere?
<re_irc> < (@jamesmunns:beeper.com)> This was the bad repro I think?
<re_irc> 39b52: 9505 strr5, [sp, #20]
<re_irc> 39b50: 1f48 subsr0, r1, #5
<re_irc> ; let modulator_id = match data_marker {
<re_irc> <Peter Hansen> no, but clearly the code that was being generated only worked with it.
<re_irc> <Peter Hansen> : https://github.com/peter9477/test2
<re_irc> < (@adamgreig:matrix.org)> thanks
<re_irc> < (@jamesmunns:beeper.com)> : if the compiler "knew" the stack was 8 byte aligned on entry, and wanted to do an 8 byte aligned trick with it, it could assume it was free to?
<re_irc> < (@adamgreig:matrix.org)> yes, that's right
<re_irc> < (@adamgreig:matrix.org)> but what trick is it doing?
<re_irc> < (@adamgreig:matrix.org)> the assumption earlier this evening was that as soon as we hit the strd instruction you get UB if it's not 8-byte aligned, but that's not true, so it must be something else
<re_irc> < (@adamgreig:matrix.org)> (plus that one you just posted doesn't have any strd etc)
<re_irc> < (@jamesmunns:beeper.com)> yeah, worth re-examining
<re_irc> < (@jamesmunns:beeper.com)> full dis asm: https://gist.github.com/jamesmunns/3313199fee70c012fb82a1ff361ee8c9
<re_irc> < (@jamesmunns:beeper.com)> main, in case call site context is relevant: https://gist.github.com/jamesmunns/3313199fee70c012fb82a1ff361ee8c9#file-cmrt-repro-txt-L54-L98
<re_irc> < (@jamesmunns:beeper.com)> (note: I didn't run this on hardware, can't guarantee this is a repro case, and that upsets me, lemme go grab a dev board)
<re_irc> < (@jamesmunns:beeper.com)> okay yeah, does repro on hw, that disasm contains whatever bug
<re_irc> < (@jamesmunns:beeper.com)> (thank you Peter Hansen for such an easy repro setup!)
<re_irc> < (@jamesmunns:beeper.com)> (also man that generated "udf" asm is funny)
<re_irc> < (@jamesmunns:beeper.com)> (function prelude, jump to ext asm which
<re_irc> < (@jamesmunns:beeper.com)> +is two udfs and no return, then call another udf for good measure)
<re_irc> < (@jamesmunns:beeper.com)> Yeah, I think that opt is only valid when you are align 8
<re_irc> < (@jamesmunns:beeper.com)> it's basically doing:
<re_irc> retval = *(slice.as_mut_ptr() | (bool as usize << 2 as *mut _));
<re_irc> < (@jamesmunns:beeper.com)> which only works when your base addr is 0 and you can add 4 by or-ing the bits instead of adding
<re_irc> < (@jamesmunns:beeper.com)> abusing orr.w for the pointer math, followed by the "ldrr1, [r1, #0]" (the deref in my example), which then gets packed in by stmia to the return value (I guess the stack dest?)
<re_irc> < (@adamgreig:matrix.org)> it's fractionally worse than that, even
<re_irc> < (@adamgreig:matrix.org)> because it puts that two-u32 array on the stack, it starts by decreasing sp by 8, then writes 99 to SP-4, 3 to SP-8, and returns *((SP-8) | (x << 2))
<re_irc> < (@adamgreig:matrix.org)> but yea, same consequence, only valid when SP is 8-byte aligned
<re_irc> < (@adamgreig:matrix.org)> I guess it's irrelevant that it has sp-8 because if the array had more than two elements you could ignore the rest when indexing with a bool anyway
<re_irc> < (@jamesmunns:beeper.com)> ah, yeah, I didn't super grok the first "sub sp #8"
<re_irc> < (@adamgreig:matrix.org)> but yea, it's that optimisation ORing the SP with x<<2
<re_irc> < (@jamesmunns:beeper.com)> I guess that's like 6 bytes with the "orr.w + ldr r1 [r1, 0]", vs like 8 for an add?
<re_irc> < (@jamesmunns:beeper.com)> or I dunno. it looks like it's llvm doing an "assume stack is 8-aligned" operation, not any wrong asm
<re_irc> < (@jamesmunns:beeper.com)> it's just stickin real hard to aapcs, which we weren't honoring.
<re_irc> < (@adamgreig:matrix.org)> so yea, SP-8 is the value to return when x is 0, SP-4 is the value to return when x=1, and ORing (x<<2) is like +4 when x==1
<re_irc> < (@jamesmunns:beeper.com)> (and then stmia as a 3-word memcpy basically
<re_irc> < (@jamesmunns:beeper.com)> * basically)
<re_irc> < (@adamgreig:matrix.org)> if SP-8 is 8-byte aligned the bottom 3 bits are 0 so you can always add 4 by ORing 100, if it's only 4 byte aligned then that third bit might already be 1, and ORing another 1 in will have no effect
<re_irc> < (@adamgreig:matrix.org)> hence always getting 99 returned
<re_irc> < (@adamgreig:matrix.org)> well, that makes more sense, at least
<re_irc> < (@adamgreig:matrix.org)> unfortunately yea that can clearly happen on v6/v7/v8 and presumably llvm has had this optimisation for some time
<re_irc> < (@jamesmunns:beeper.com)> yep
<re_irc> < (@jamesmunns:beeper.com)> maybe yank those old vers with bad align
<re_irc> < (@jamesmunns:beeper.com)> it seems super a thing the compiler is able to do and it is clear we are wrong
<re_irc> < (@adamgreig:matrix.org)> I don't think it's relevant that you _return_ the two-byte array either, in fact it's missing an optimisation opportunity to only write the array into memory once
<re_irc> < (@adamgreig:matrix.org)> right now it writes it into stack, selects one value or the other, then writes that selected value and the two array values back to the stack after
<re_irc> < (@adamgreig:matrix.org)> oh yea, they're defo getting yanked
<re_irc> < (@jamesmunns:beeper.com)> yeah we have no idea the space of what can trigger this
<re_irc> < (@jamesmunns:beeper.com)> so we have to assume anything can lol
<re_irc> < (@jamesmunns:beeper.com)> (maybe the limit is how much you can stmia back? I dunno the max number of pushes you can chain for that, but array of two makes sense because you can only abuse 8 aligned bytes. might show up for u8 arrays smaller than "[u8; 8]" too)
<re_irc> < (@jamesmunns:beeper.com)> anyway, wild guesses, bed time
<re_irc> < (@jamesmunns:beeper.com)> (okay that "orr.wr1, r2, r1, lsl #2" is actually super cool)
<re_irc> < (@adamgreig:matrix.org)> I dunno if the stmia matters, you can always just write multiple stm statements and this trick still saved you time
<re_irc> < (@adamgreig:matrix.org)> the version I compiled doesn't actually stmia at all (I think that's changed on nightly? but not on stable?
<re_irc> < (@adamgreig:matrix.org)> oh, perhaps it is actually stmia under the hood
<re_irc> < (@adamgreig:matrix.org)> hmm, no, the stores relating to this optimisation are all individual, because it ended up not allocating the registers in order
jcroisant has joined #rust-embedded
fabic has quit [Ping timeout: 252 seconds]
fabic has joined #rust-embedded
<re_irc> < (@datdenkikniet:matrix.org)> Came across https: //github.com/rust-embedded/cortex-m-rt/issues/139 which seems to have some relevant info wrt pushing lr
<re_irc> < (@datdenkikniet:matrix.org)> Came across https://github.com/rust-embedded/cortex-m-rt/issues/139 which seems to have some relevant info when it comes to pushing "lr"
jcroisant has quit [Quit: Connection closed for inactivity]
IlPalazzo-ojiisa has joined #rust-embedded
fabic has quit [Ping timeout: 260 seconds]
<re_irc> < (@datdenkikniet:matrix.org)> Did some digging, and this is what GDB seems to say
<re_irc> < (@datdenkikniet:matrix.org)> * say. It definitely seems to assume that "lr"will be 0xFFFF_FFFF when handling exceptions/and the like, but I've not quite figured if it always uses that to determine the end of a stack frame
<re_irc> < (@datdenkikniet:matrix.org)> there's a lot of caching going on so the code is rather tricky to follow
<re_irc> < (@datdenkikniet:matrix.org)> This (https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/arm-tdep.c;h=347f3e6b3077f2c676cc15c170cf62b6b1ea967d;hb=HEAD#l3466) is where I found that
<re_irc> < (@datdenkikniet:matrix.org)> not super related to how we can fix the issue, but may give some insight as to how/why probe-run assumes that 0xFFFF_FFFF must mean that it's at the end
<re_irc> < (@datdenkikniet:matrix.org)> Did some digging, and this is what GDB seems to say. It definitely seems to assume that "lr"will be 0xFFFF_FFFF when handling exceptions and the like, but I've not quite figured if it always uses that to determine the end of a stack frame
<re_irc> < (@datdenkikniet:matrix.org)> Did some digging, and this is what GDB seems to say. It definitely seems to assume that "lr"will be 0xFFFF_FFFF eventually when handling exceptions and the like, but I've not quite figured if it always uses that to determine the end of a stack frame
IlPalazzo-ojiisa has quit [Remote host closed the connection]
<re_irc> < (@datdenkikniet:matrix.org)> : I think it ends up being something along the lines of, for GDB at least:
<re_irc> 1. If "lr" 0xFFFF_FFFF, there's an arm-specific thing that indicates that this is now the end of the frame (as per my screenshot).
<re_irc> 2. If it's not, it keeps going for all cached values of "lp", and each value of "lp" is then used to construct a frame ID. If the frame ID (= eventually "lr" value) is the same for two frames, it's done.
<re_irc> < (@datdenkikniet:matrix.org)> definitely feels like a probe-run problem that should not have to be fixed in cortex-m-rt
<re_irc> < (@datdenkikniet:matrix.org)> I think it ends up being something along the lines of, for GDB at least:
<re_irc> 1. If "lr" 0xFFFF_FFFF, there's an arm-specific thing that indicates that this is now the end of the frame (as per my screenshot).
<re_irc> 2. If it's not, it keeps going for all cached values of "lp" (a finite amount), and each value of "lp" is then used to construct a frame ID. If the frame ID (= eventually "lr" value) is the same for two frames, it's done.
<re_irc> < (@datdenkikniet:matrix.org)> definitely feels like a probe-run problem that should not have to be fixed/accounted for in cortex-m-rt
<re_irc> < (@datdenkikniet:matrix.org)> I think it ends up being something along the lines of, for GDB at least:
<re_irc> 1. If "lr" 0xFFFF_FFFF, there's an arm-specific thing that indicates that this is now the end of the frame (as per my screenshot).
<re_irc> 2. If it's not, it keeps going for all cached values of "lp" or "sp" (not entirely sure which, but a finite amount), and each of these values is then used to construct a frame ID. If the frame ID (= eventually "lr" value) is the same for two frames, it's done.
<re_irc> < (@datdenkikniet:matrix.org)> I think it ends up being something along the lines of, for GDB at least:
<re_irc> 1. If "lr" 0xFFFF_FFFF, there's an arm-specific thing that indicates that this is now the end of the frame (as per my screenshot).
<re_irc> 2. If it's not, it keeps going for all cached values of "lp" or "sp" (not entirely sure which, but a finite amount), and each of these values is then used to construct a frame ID. If the frame ID (= eventually "lr" or "sp" value) is the same for two frames, it's done.
<re_irc> < (@datdenkikniet:matrix.org)> on top of that, the backtrace analysis that "probe-run" does seems to be perfectly capable of detecting #2 as well (takes about 4 lines of code to add), so it wouldn't be difficult to solve either. I don't know how to go about proving/showing that it actually works the way it's supposed to
<re_irc> < (@datdenkikniet:matrix.org)> PLR = previous LR value
<re_irc> 1. If "lr" 0xFFFF_FFFF, there's an arm-specific thing that indicates that this is now the end of the frame (as per my screenshot).
<re_irc> < (@datdenkikniet:matrix.org)> I think it ends up being something along the lines of, for GDB at least:
<re_irc> 2. If it's not, it keeps going for all cached values of "lp" or "sp" (not entirely sure which, but a finite amount), and each of these values is then used to construct a frame ID. If the frame ID (= directly correlated "lr" or "sp" value) is the same for two frames, it's done.
<re_irc> 1. If "lr" 0xFFFF_FFFF, there's an arm-specific thing that indicates that this is now the end of the frame (as per my screenshot).
<re_irc> < (@datdenkikniet:matrix.org)> I think it ends up being something along the lines of, for GDB at least:
<re_irc> 2. If it's not, it keeps going for all cached values of "lp" or "sp" (not entirely sure which, but a finite amount), and each of these values is then used to construct a frame ID. If the frame ID (= directly correlated "lr" or "sp" value) is the same for two frames, it's done because it's reached the outermost value (see here...
<re_irc> ... (https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/frame-unwind.c;h=76601faa4797ae92a51875e53f9f80ab46f47b3a;hb=HEAD#l228)).
<re_irc> < (@datdenkikniet:matrix.org)> for more context: it actually produces a correct backtrace without pushing "lr" to the stack through "cortex-m-rtic"
<re_irc> < (@datdenkikniet:matrix.org)> * "cortex-m-rt"
<re_irc> < (@datdenkikniet:matrix.org)> must note that I've not actually run this example with GDB, so whether it actually does or doesn't work with that I don't know
<re_irc> < (@datdenkikniet:matrix.org)> this (https://github.com/datdenkikniet/probe-run/tree/fix_lr_backtrace) seems to be all that's required to fix it, but I have no idea _why_ the LR value repeats when it gets to the end.
<re_irc> < (@datdenkikniet:matrix.org)> * end, nor if that can be guaranteed somehow.
dc740 has quit [Remote host closed the connection]
dc740 has joined #rust-embedded
<re_irc> < (@datdenkikniet:matrix.org)> to confirm: in GDB, I'm quite certain that this line (https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/frame.c;h=c69a3ea0cb08525c90ac91546c32789d6c565e50;hb=HEAD#l2324) should perform a callback to this function (https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/frame-unwind.c;h=76601faa4797ae92a51875e53f9f80ab46f47b3a;hb=HEAD#l228) through this callback hook set by the target...
<re_irc> ... (https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/arm-tdep.c;h=347f3e6b3077f2c676cc15c170cf62b6b1ea967d;hb=HEAD#l3936), which just performs #2
<re_irc> < (@datdenkikniet:matrix.org)> Ah wait, no it doesn't....
<re_irc> < (@datdenkikniet:matrix.org)> +🤦♂️
<re_irc> < (@datdenkikniet:matrix.org)> definitely feels like it may be a probe-run problem that should not have to be fixed/accounted for in cortex-m-rt
<re_irc> < (@datdenkikniet:matrix.org)> I think it ends up being something along the lines of, for GDB at least:
<re_irc> 1. If "lr" 0xFFFF_FFFF, there's an arm-specific thing that indicates that this is now the end of the frame (as per my screenshot).
<re_irc> 2. If it's not, it keeps going for all cached values of "lp" or "sp" (not entirely sure which, but a finite amount), and each of these values is then used to construct a frame ID. If the frame ID (= directly correlated "lr" or "sp" value) is the same "outer frame value", it's done (see here (https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/frame-unwind.c;h=76601faa4797ae92a51875e53f9f80ab46f47b3a;hb=HEAD#l228)).
<re_irc> < (@datdenkikniet:matrix.org)> Ah wait, no it doesn't perform #2, but does somehow detect the outer-most frame.... 🤦♂️
<re_irc> < (@datdenkikniet:matrix.org)> Ah wait, no it doesn't perform #2, but does somehow detect the outer-most frame by comparing it to a static value.... 🤦♂️
<re_irc> Is there a way to shift all elements in a byte buffer as if it were a bytestream?
<re_irc> < (@monacoprinsen:matrix.org)> Hello,
<re_irc> If I shifts the elements individualy by << 1 for example it is not the same since the leading values do not carry over to the next element.
<re_irc> Right now I would achieve this by converting the buffer to a u32 for example and then performing the shift on that value. However I need it back in a buffer to perform a write operation.
<re_irc> Must be a more streamlined way.
<re_irc> < (@jamesmunns:beeper.com)> Maybe see if bitvec has something for that?
fabic has joined #rust-embedded
<re_irc> < (@datdenkikniet:matrix.org)> Ah wait, no it doesn't perform #2, but does somehow detect the outer-most frame by comparing it to a static value.... 🤦♂️ I guess it will have the exact same issue then. It also does a "WARN two stack frames were the same"
<re_irc> < (@datdenkikniet:matrix.org)> Ah wait, no it doesn't perform #2, but does try to somehow detect the outer-most frame by comparing it to a static value.... 🤦♂️ I guess it will have the exact same issue then. It also does a "WARN two stack frames were the same"
<re_irc> <henrik_alser> : You could use rotate_left (or right) on the array
<re_irc> <henrik_alser> Or heapless had HistoryBuffer
<re_irc> <henrik_alser> * has
<re_irc> < (@jamesmunns:beeper.com)> I think? they want to shift an array as if it was one big integers, e.g. the msb of one u32 becomes the lsb of the next u32 in the array (or the other way around)
<re_irc> < (@jamesmunns:beeper.com)> * integer,
emerent has quit [Ping timeout: 252 seconds]
<re_irc> <henrik_alser> Gotcha! Misunderstood the use case
emerent has joined #rust-embedded
<re_irc> < (@monacoprinsen:matrix.org)> : Thanks I will have a look!
<re_irc> < (@monacoprinsen:matrix.org)> : Correct 😊
Foxyloxy has joined #rust-embedded
Foxyloxy___ has quit [Ping timeout: 260 seconds]
Foxyloxy_ has joined #rust-embedded
Foxyloxy has quit [Ping timeout: 260 seconds]
<re_irc> <thejpster> Does anyone here know anything about 109-key Japanese keyboards?
fabic has quit [Ping timeout: 248 seconds]
IlPalazzo-ojiisa has joined #rust-embedded
dc740 has quit [Remote host closed the connection]
xnor has quit [Ping timeout: 268 seconds]
<re_irc> <Cory Frenette> Hello! 👋
<re_irc> I'm trying to get into embedded programming using Rust by writing a "no_std" "embedded-hal" device driver and I'm a little stuck at the moment. Hoping someone here might be able to give me a nudge in the right direction. Details in thread
<re_irc> <Cory Frenette> - I've got a large number (maybe? I'm new to embedded... It's 60+) of device registers that will be used to configure the behavior of the device.
<re_irc> - I'm using the bitflags crate to generate struct wrappers/api for the registers
<re_irc> - Because many combinations of register state are invalid, I want to implement a builder to hold pending configuration changes and validate on calling ".build()" before writing the updated state to their respective registers.
<re_irc> <Cory Frenette> I'd like to write this to be as generic as possible so I want to avoid heap allocation.
<re_irc> The problem with this approach: I can't figure out a way to determine which registers contain changes (and therefore need to be written) without allocation. I keep finding a need reach for a trait object shared by the register structs. I definitely feel like I'm doing something wrong here. Are there more idiomatic ways of doing what I'm trying to do or am I just basically going to have to generate an if statement for each...
<re_irc> ... register struct?
<re_irc> <thejpster> 60 registers, so give them 1 bit each in a u64?
<re_irc> <Cory Frenette> true, to represent changed or not, but at the end of the day I'm still going to have to write something like
<re_irc> interface.write(MyRegisterN)?;
<re_irc> }
<re_irc> times 60+, no? If that's what must be done, that's fine, we have macros.
<re_irc> <Cory Frenette> true, to represent changed or not, but at the end of the day I'm still going to have to write something like
<re_irc> interface.write(MyRegisterN)?;
<re_irc> if bit_set {
<re_irc> }
<re_irc> times 60+, no? If that's what must be done, that's fine, we have macros.
<re_irc> <Cory Frenette> In my fantasy solution, I am kind of hoping for an iterator over (owned? borrowed?) register structs or similar, but I think that would require a collection of trait objects
IlPalazzo-ojiisa has quit [Quit: Leaving.]
xnor has joined #rust-embedded