ChanServ changed the topic of #rust-embedded to: Welcome to the Rust Embedded IRC channel! Bridged to #rust-embedded:matrix.org and logged at https://libera.irclog.whitequark.org/rust-embedded, code of conduct at https://www.rust-lang.org/conduct.html
jason-kairos[m] has joined #rust-embedded
<jason-kairos[m]> I've got an embedded routine that copies a few bytes into a buffer, and it is really slow even with optimizations enabled.... (full message at <https://catircservices.org/_irc/v1/media/download/AV-pZwVWq4DlttNS3C7hVMWAZ5d03s55koEQxo1e17RRDMY-tSAeIicNhRMD0Ra0q5KI5TGHQW5mHRHTZlDXod2_8AAAAAAAAGNhdGlyY3NlcnZpY2VzLm9yZy9BS0pGRWRrb2VuY2pHQ3NqVFRNZGlYU0s>)
<JamesMunns[m]> can you share the rust code?
<jason-kairos[m]> Adding a 4 byte slice with opt=3 and all bounds checking disabled takes 170 cpu cycles (~half as many instructions)
<jason-kairos[m]> Just taking the simple path, where one does a single copy_from_slice
<JamesMunns[m]> you're probably hitting a lot of bounds checking with so many slicing operations
<jason-kairos[m]> I disabled bounds checking, or so I thought anyways
<jason-kairos[m]> But it does spend time in functions related to ranges and indexing
<jason-kairos[m]> Perhaps I need to cast it into a pointer and use memcpy?
<JamesMunns[m]> you can't disable bounds checking
<jason-kairos[m]> Is there any other way to finagle rust into doing "less work" short of raw pointers?
<JamesMunns[m]> yeah, gimme a sec to read the code
<JamesMunns[m]> something like this might cut a couple redundant checks?
<JamesMunns[m]> I'm not totally sure I follow with the omitted ring buffer bits
M9names[m] has quit [Quit: Idle timeout reached: 172800s]
<jason-kairos[m]> I'll have to measure it. It will be interesting to see if split_at and get_mut help
JoshuaFocht[m] has quit [Quit: Idle timeout reached: 172800s]
<JamesMunns[m]> it's worth looking at the asm, you should be able to pick out the bounds checking. In most cases, esp since you have const generics and stuff, you want to be able to prove to the optimizer that all of the slice indexes are inbounds, so the bounds checking will be elided
<JamesMunns[m]> it's the double slicing of data that is a little harder. also copy_from_slice checks that the src and dst are the same size, so ideally you want to do things so that the optimizer can verify that as well and elide those checks
<jason-kairos[m]> I will likely do so for my own education, although it is a real pain to follow functions once they get above about 100 instructions with all the optimizations enabled. I'd predict yours might be 20% faster for the single copy case. I also predict that doing it as a raw pointer might be 50% faster.
<jason-kairos[m]> I'll probably do a conditional compile so that I can reuse as much of the logic as possible. (use the bounds checked version for development, and raw for production)
<JamesMunns[m]> mostly you can just look for jumps to panic, or a basic block with a panic in it
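The snippet JamesMunns shared is not in the log, so the following is only a minimal sketch of the idea discussed above: do one fallible range lookup with `get_mut`, so that the destination slice has the same length as the source by construction and the optimizer has a chance to elide the length check inside `copy_from_slice`. The `Buf` type, its fields, and `push_slice` are hypothetical names, and whether the checks actually disappear has to be confirmed in the generated asm.

```rust
/// Hypothetical append-only buffer; `N` and the field names are assumptions.
pub struct Buf<const N: usize> {
    data: [u8; N],
    len: usize,
}

impl<const N: usize> Buf<N> {
    pub const fn new() -> Self {
        Self { data: [0; N], len: 0 }
    }

    /// Appends `src`, or returns `false` if it does not fit.
    pub fn push_slice(&mut self, src: &[u8]) -> bool {
        // One checked computation of the end index.
        let Some(end) = self.len.checked_add(src.len()) else {
            return false;
        };
        // One fallible lookup instead of a panicking `self.data[self.len..end]`.
        let Some(dst) = self.data.get_mut(self.len..end) else {
            return false;
        };
        // `dst.len() == src.len()` by construction, which gives the optimizer
        // a chance to drop the length check inside `copy_from_slice`.
        dst.copy_from_slice(src);
        self.len = end;
        true
    }

    /// Returns the bytes written so far.
    pub fn as_slice(&self) -> &[u8] {
        &self.data[..self.len]
    }
}
```

Returning `false` instead of panicking keeps the panic machinery out of this path, which is what "look for jumps to panic" in the asm would confirm.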
M9names[m] has joined #rust-embedded
<M9names[m]> i find compiler explorer is helpful for exploration work like this:
<M9names[m]> you can put 2 versions of the same code that are conditionally included on the left, and get a nice side-by-side of the effect
<M9names[m]> on the right
<thejpster[m]> <thejpster[m]> "cortex-r programming involves..." <- I’m now reading through various Arm manuals and guides to work out what is going on.
<thejpster[m]> Did you know that Armv8-R in Aarch64 mode allows you to run Linux using an MMU and an RTOS using an MPU at the same time? Linux pays the price of having MMU translation tables in RAM but the MPU does not (giving it more predictable latency). Pretty neat.
<thejpster[m]> Armv8-R Aarch32 doesn’t have an MMU but does let you run multiple RTOSes at the same time.
<thejpster[m]> Armv7-R only let you run one RTOS at once.
<thejpster[m]> I wonder if Cortex-M processors will ever support virtualisation.
<dngrs[m]> <M9names[m]> "i find compiler explorer is..." <- important footnote to Rust in compiler explorer: need to manually add `-C opt-level=3` for release mode if you wanna judge how that's gonna look
<jason-kairos[m]> I was surprised that the compiler explorer output "seemed" more optimized than my program for the same function and optimization level.
<hjeldin__[m]> any idea why i can't use Hsi48Config from rcc?
<hjeldin__[m]> `cannot find struct, variant or union type Hsi48Config in module embassy_stm32::rcc`
dirbaio[m] has joined #rust-embedded
<dirbaio[m]> check you're using embassy-stm32 0.2.0
<dirbaio[m]> check you've set the right chip in Cargo.toml embassy-stm32 features
<hjeldin__[m]> well, apparently my stm32l476 doesn't support the clock recovery system, nice
<hjeldin__[m]> the examples in stm32l4 use stm32l4r5zi, but according to the stm documentation it shouldn't support crs or hsi48 so i'm not sure what is wrong
<thejpster[m]> ha ha, success
<thejpster[m]> with semihosting, obviously - that's the output you can see
<jason-kairos[m]> 38 instructions (and I'd guess roughly 80 clock cycles - actual total 140 clock cycles including all setup) for rust to call and execute libc's memcpy on 4 bytes. (not including setup)
<jason-kairos[m]> I almost feel like I want to cast my &[u8] (mixture of serial ascii data and binary data) into &[u32] and try to convince it that everything is aligned and that every chunk is the size of a register
<jason-kairos[m]> I feel like copying 4 bytes into the end of a buffer ought to be faster. But eventually I'll have to accept whatever slowness I encounter.
<jason-kairos[m]> I'd be happy to pad the ends of ascii strings with zeros.
<thejpster[m]> <thejpster[m]> "I’m now reading through various..." <- It turns out what was going on was I was simply entering the wrong mode. I blame global_asm not letting me use nice pre-processor macros, like inline assembly in C files gets to do.
<jason-kairos[m]> Is it possible that manually writing a for loop to copy bytes might be faster than newlib's memcpy?
<jason-kairos[m]> I'm beginning to suspect that it might be. At least if we are copying aligned u32's instead of bytes
<JamesMunns[m]> jason-kairos[m]: There's a decent chance the optimizer is going to turn anything that looks like a memcpy into an actual memcpy
<jason-kairos[m]> newlib's memcpy or a LLVM compiler intrinsic?
<JamesMunns[m]> I don't think you're getting newlib, you're getting an LLVM intrinsic afaik.
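A minimal sketch of the "pad with zeros and move register-sized words" idea raised above, written without `unsafe`. The function name `pack_words` and the little-endian packing are assumptions for illustration. As noted in the discussion, the optimizer may still lower a loop like this to a memcpy call, so any speedup has to be measured on the target.

```rust
/// Packs `src` into 32-bit words in `dst`, zero-padding the final word.
/// Returns the number of words written, or `None` if `dst` is too small.
pub fn pack_words(dst: &mut [u32], src: &[u8]) -> Option<usize> {
    // Number of words needed, rounding up.
    let words = (src.len() + 3) / 4;
    // Single fallible lookup; no panicking index in the loop below.
    let out = dst.get_mut(..words)?;
    for (w, chunk) in out.iter_mut().zip(src.chunks(4)) {
        // The last chunk may be shorter than 4 bytes; the rest stays zero.
        let mut bytes = [0u8; 4];
        bytes[..chunk.len()].copy_from_slice(chunk);
        *w = u32::from_le_bytes(bytes);
    }
    Some(words)
}
```

Storing into a `[u32]` destination sidesteps the alignment question entirely, at the cost of the buffer holding words rather than bytes.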
<jason-kairos[m]> When I called "core::ptr::copy_nonoverlapping" I got a reference to newlib in the debugger
<mabez[m]> I think it's actually compiler_builtins memcpy
<JamesMunns[m]> Rust doesn't link in a libc unless you tell it to on no std (afaik)
<mabez[m]> and it's not efficient at all
<mabez[m]> but it's weakly defined, so you can write your own
<jason-kairos[m]> 0x40000024:30699172
<jason-kairos[m]> 0x0805ede2100in ../../../../../../newlib-4.4.0.20231231/newlib/libc/machine/arm/../../string/memcpy.c
<JamesMunns[m]> jason-kairos[m]: Are you linking in C code?
<mabez[m]> Are you linking to C code?
<mabez[m]> :D
<jason-kairos[m]> I am, but rust should not be using that
<JamesMunns[m]> Lto means it's allowed to
<mabez[m]> it will because its a weak def in compiler builtins
<jason-kairos[m]> I didn't know that. That's interesting...
<JamesMunns[m]> It could decide to merge those and pick one or the other, could also be weak symbols too, yeah
<jason-kairos[m]> I'm hearing - "link time optimizations are bad" (unpredictable)
<mabez[m]> lto is a very good thing, you likely want it
<JamesMunns[m]> Bad is a judgement call. It's a tool, not a moral choice
ivmarkov[m] has joined #rust-embedded
<ivmarkov[m]> mabez[m]: Is it really saving _that_ much, compared to regular linker GC working on a per-function level?
<mabez[m]> ivmarkov[m]: As always, it depends :D
<JamesMunns[m]> It might not even be LTO. Weak symbols save you from multiple versions of memcpy
<mabez[m]> mabez[m]: Here it is: https://github.com/rust-lang/compiler-builtins/blob/ee92690aa3046d3f5a8f0097d07a8eb762d7be56/src/mem/mod.rs#L24. They've made things a bit more complicated by hiding the attributes behind a new one `#[mem_builtin]`, but it expands to a weak linker attribute
<jason-kairos[m]> Is there a way to ask to use a LLVM compiler intrinsic for a copy operation? (or really, just use anything other than newlib's memcpy)
<mabez[m]> Uh, you're kind of stuck a little bit I think. If you have a strongly defined memcpy symbol in your build dependencies, I don't think there is a way around that
<mabez[m]> The linker will just pick up the strong definition always
<mabez[m]> and if you have two definitions you'll get a linker error
<jason-kairos[m]> So I need to check if newlib uses a weak definition. And if it doesn't, I'd have to edit newlib and link to the modified version instead.
<mabez[m]> unless the newlib one is also weakly defined, then you can try defining one in your own crate
<mabez[m]> you may also be able to do some linker script crimes to discard the newlib memcpy symbol, but I'm not sure
<jason-kairos[m]> I'm hopeful that maybe I can modify my source so that it is less likely to call newlib.
<jason-kairos[m]> Surely there are a bunch of ways that the compiler has to choose from to copy memory around
<JamesMunns[m]> By the way, are you really bound by the speed of your ring buffer? Or is this academic interest?
<jason-kairos[m]> Unfortunately, I do have a real-time C library that I call from rust where I need to emit a ton of "printf" information
<jason-kairos[m]> It has to take less than 500 cycles on a particular case
<JamesMunns[m]> If you are, there are potentially fixes other than "make ring buffer faster", like using a different data structure, like bip buffers/bbqueue and dma to completely skip CPU copying at all.
<jason-kairos[m]> It's a commercial product and library, and I'm trying to modify it as little as possible
<JamesMunns[m]> Real time and printf? Rough.
<jason-kairos[m]> It's garbage
<jason-kairos[m]> But, it's my garbage (I'm responsible for making it work)
<jason-kairos[m]> I'm probably doing it wrong. I copy the format string to be processed at a later time, but really, what I should do is pass the address and size of the string. And modify the library anyplace it tries to use a non-constant string.
<jason-kairos[m]> I'd like to think that I could make copying 10x 32bit values/pointers happen in less than 500 cycles without too much hair pulling
GrantM11235[m] has quit [Quit: Idle timeout reached: 172800s]
<jason-kairos[m]> just did some testing and confirmed 430 cycles to do that using ArrayDeque - problem solved, "just have the CPU do less work" (and change the third party library source in ten thousand places to support said change)
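A minimal sketch of the "pass the address and size of the format string instead of copying it" approach described above. It is not the ArrayDeque-based code that was measured; `LogRecord`, `LogRing`, the four-argument limit, and the assumption that every format string is a `'static` constant are all hypothetical choices for illustration.

```rust
/// One deferred log entry: a reference to a constant format string
/// (pointer + length) plus a few raw 32-bit arguments.
#[derive(Clone, Copy)]
pub struct LogRecord {
    pub fmt: &'static str,
    pub args: [u32; 4],
}

/// Fixed-capacity FIFO of records; nothing is formatted on the hot path.
pub struct LogRing<const N: usize> {
    slots: [Option<LogRecord>; N],
    head: usize, // index of the oldest record
    len: usize,  // number of stored records
}

impl<const N: usize> LogRing<N> {
    pub const fn new() -> Self {
        Self { slots: [None; N], head: 0, len: 0 }
    }

    /// Stores a record; returns `false` if the ring is full.
    pub fn push(&mut self, rec: LogRecord) -> bool {
        if self.len == N {
            return false;
        }
        let tail = (self.head + self.len) % N;
        self.slots[tail] = Some(rec);
        self.len += 1;
        true
    }

    /// Removes and returns the oldest record, if any.
    pub fn pop(&mut self) -> Option<LogRecord> {
        if self.len == 0 {
            return None;
        }
        let rec = self.slots[self.head].take();
        self.head = (self.head + 1) % N;
        self.len -= 1;
        rec
    }
}
```

The real-time path only writes a handful of words per entry; the expensive formatting can happen later in a lower-priority context that drains the ring with `pop`.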
danielb[m] has quit [Quit: Idle timeout reached: 172800s]
Noah[m] has quit [Quit: Idle timeout reached: 172800s]
SirWoodyHackswel has joined #rust-embedded
<SirWoodyHackswel> For breaking down sound into frequencies, FFT is the algorithm to use, correct? I see so many floating point FFT, but as a common DSP function, where are all the integer FFT algorithms?
<jason-kairos[m]> If I recall correctly, the integer version might have a slightly different name, at least academically. I don't know any implementations or better names to search for off the top of my head.
<dirbaio[m]> do FFT with fixed-point numbers instead of floating-point numbers? :P
<SirWoodyHackswel> I guess SW fixed-point decimal is better than full SW floating-point implementation
<JamesMunns[m]> it's going to be a lot faster, for sure
<dngrs[m]> if you're on cortex-m the cmsis_dsp crate ain't bad
<dngrs[m]> it can balloon your binary in impressive and very unintuitive ways tho
<dngrs[m]> in general, if you're doing signal processing stuff like FFT/DCT (the latter being the kinda sorta "alternative name"; it's not exactly the same transform) you likely want your CPU doing DSP operations
<dngrs[m]> (FFT/DFT retains the imaginary component, DCT only does the real part)
<SirWoodyHackswel> I'm thinking more of RISC-V without HW float, but I'll look into the cmsis_dsp library
<SirWoodyHackswel> Ahh... that's right. I only need the DCT. We don't need no stinkin'
<SirWoodyHackswel> imaginary numbers!
<dngrs[m]> if you end up implementing anything on top of the fixed crate do ping me :D
<SirWoodyHackswel> Yeah, I'm not looking for super-accurate. Just accurate enough to stuff into N amount of bins.
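A minimal sketch of what a fixed-point DCT for coarse binning could look like, assuming Q15 coefficients and a naive O(N²) DCT-II rather than a fast algorithm. The names `dct_table` and `dct_q15` are hypothetical, and the table is built with floating point here purely for brevity; on a target without hardware float it would more plausibly be precomputed at build time.

```rust
/// Builds a Q15 coefficient table for an N-point DCT-II.
/// Uses f64 for convenience; assumed to run on the host or at build time.
pub fn dct_table<const N: usize>() -> [[i16; N]; N] {
    let mut t = [[0i16; N]; N];
    for k in 0..N {
        for n in 0..N {
            let angle = std::f64::consts::PI / N as f64 * (n as f64 + 0.5) * k as f64;
            // Scale to Q15: 1.0 maps to 32767.
            t[k][n] = (angle.cos() * 32767.0).round() as i16;
        }
    }
    t
}

/// Integer-only DCT-II: out[k] = sum over n of x[n] * cos(pi/N * (n + 0.5) * k).
pub fn dct_q15<const N: usize>(table: &[[i16; N]; N], x: &[i16; N]) -> [i32; N] {
    let mut out = [0i32; N];
    for k in 0..N {
        // A 64-bit accumulator avoids overflow for any practical N.
        let mut acc: i64 = 0;
        for n in 0..N {
            acc += table[k][n] as i64 * x[n] as i64;
        }
        // Drop the 15 fractional bits of the Q15 coefficients.
        out[k] = (acc >> 15) as i32;
    }
    out
}
```

For a constant input all of the energy lands in bin 0, which makes a quick sanity check before feeding in real samples.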
<thejpster[m]> who wants to see a thing?
<thejpster[m]> I ported Google's Arm GICv3 driver to AArch32 (changing MSR/MRS to MCR/MCRR/MRC/MRRC) and then lost three hours of my life because I mis-typed 0b0100 as 8 instead of 4 and was writing to totally the... (full message at <https://catircservices.org/_irc/v1/media/download/ATip8K1aUfRshf5Bs3mJepkmno2lm06C_BGo9EsejGQ-yjMmuSowSwBlh8Ghi4RSZ2Kvx07WXL-mXE8E8Tmd6z6_8AAAAAAAAGNhdGlyY3NlcnZpY2VzLm9yZy9xcGlseW9kRkR5QlZqc0FLWWpZQmR6UGE>)
<thejpster[m]> s/git/gic/
corecode[m] has joined #rust-embedded
<corecode[m]> do you mean integer or fixed int
rukai[m] has joined #rust-embedded
<rukai[m]> I have a custom board with an rp2040 on it. I've been using the stock firmware from the designer of the board for a while and its been working fine.... (full message at <https://catircservices.org/_irc/v1/media/download/ARL7F2PfzMYCUnvKLFsk_uyJSq9MCaPZ1zK1fjLaT4ZWr3JOIQsC1gjQ8tfySAUSxzsWCpPcNw4jbxuXvM4QDj-_8AAAAAAAAGNhdGlyY3NlcnZpY2VzLm9yZy9meWRkUmJpcFVyVmhQb2doQlFJbVFwbks>)
<thejpster[m]> I’ve never seen the bootsel button not work. It’s handled by the ROM.
<JamesMunns[m]> Did you also reset the chip or power cycle it? You need to hold the bootsel button and hit the reset button (if there is one), or unplug it, hold the button, and plug it back in
<rukai[m]> yep I did that, I dont have a reset button, but I held the bootsel button while reconnecting the usb port (which it gets power from)
<rukai[m]> I even left it unplugged for 12 hours.
<JamesMunns[m]> hmm, yeah, afaik there is no way to disable the hw bootloader, is it maybe possible the switch is damaged? Or do you have the schematic of the custom board?
<M9names[m]> huh, is that schematic just wrong? how is pulling CS low on an active low flash chip going to put it into the boot rom?
<dirbaio[m]> that's what the Pico does as well
<dirbaio[m]> it's one of these crazy hacks rpi does to save pins
<M9names[m]> i know it disables the flash chip, but... how does that do it?
<M9names[m]> is it because it can't pulse CS?
<JamesMunns[m]> yeah, it reads the CS pin at boot time
<M9names[m]> oh. OH.
<JamesMunns[m]> (iirc)
<M9names[m]> i guess that's faster than just trying to talk to flash and bailing if it fails
<M9names[m]> <rukai[m]> "I used a small wire to short the..." <- there's always the most common failure mode: your usb cable is broken.
<M9names[m]> switching cables to a known good one is a good first step.
<rukai[m]> thats good thinking, the usb cables I've used work fine with other devices. I've also tested on 2 separate machines, one linux and one windows.
<JamesMunns[m]> what OS are you running on?
<JamesMunns[m]> ah, on linux, it would be good to see what lsusb says
<rukai[m]> yeah its not showing up on lsusb
<M9names[m]> anything on dmesg?
<JamesMunns[m]> (if it's running a hello world blink it makes sense it isn't on lsusb)
<rukai[m]> ^ yeah thats what I was thinking
<M9names[m]> sorry, you're holding bootsel or shorting it manually but the end result is still your code running blinky?
<dirbaio[m]> try removing power, hold bootsel pushed, apply power, release bootsel
<rukai[m]> nothing showing up in dmesg
<JamesMunns[m]> dirbaio[m]: they said they tried that
<rukai[m]> yeah, its still running blinky when I hold down bootsel while booting
<JamesMunns[m]> do you have a multimeter?
<rukai[m]> I have this, not sure if it works
<JamesMunns[m]> would be good to verify the resistance goes to ~0 across the button when you press it
Jubilee[m] has quit [Quit: Idle timeout reached: 172800s]
<M9names[m]> and i'd test the resistor too. ideally you'd check to the lead of the flash chip but those are often no-lead or bga for rp2040
<M9names[m]> and the rp2040 itself is fine pitch qfn, no fun at all to probe
<rukai[m]> > it's pretty easy to test a multimeter for continuity. put it in resistance mode, it should show infinity or a very high reading.... (full message at <https://catircservices.org/_irc/v1/media/download/AY3y-wdan4SsT5zFCjEB0IJfVoNl8YltUpTzHEwuKfSATGXcvl4bQKDLhmbfHZlpQA_d4cM4HEd_-GjOjLq-IoS_8AAAAAAAAGNhdGlyY3NlcnZpY2VzLm9yZy92VG5MdGVuZEtpb2xETGJDVWJSd1NlcWw>)
<rukai[m]> yeah the rp2040 leads are inaccessible
<M9names[m]> and the other side of that resistor?
<rukai[m]> > and the other side of that resistor?
<rukai[m]> I'm not sure what you are referring to?
<M9names[m]> R10
<M9names[m]> if that has failed open, the switch will have no impact on CS state
<M9names[m]> if you could measure the resistance of that component it would be best (maybe they fitted the wrong component)
<rukai[m]> I'm not quite sure how the ohm settings on the multimeter work, but I set it to 2000 and then got a reading of 1000 from the multimeter, which sounds like the 1K we are expecting
<M9names[m]> yep that sounds perfect
<M9names[m]> it doesn't make much sense that the code in bootrom wouldn't be able to read it - it's a pretty simple circuit
<M9names[m]> not many places for it to go wrong here