<JamesMunns[m]>
something like this might cut a couple redundant checks?
<JamesMunns[m]>
I'm not totally sure I follow with the omitted ring buffer bits
M9names[m] has quit [Quit: Idle timeout reached: 172800s]
<jason-kairos[m]>
I'll have to measure it. It will be interesting to see if split_at and get_mut help
JoshuaFocht[m] has quit [Quit: Idle timeout reached: 172800s]
<JamesMunns[m]>
it's worth looking at the asm, you should be able to pick out the bounds checking. In most cases, esp since you have const generics and stuff, you want to be able to prove to the optimizer that all of the slice indexes are in bounds, so the bounds checking will be elided
<JamesMunns[m]>
it's the double slicing of data that is a little harder. also copy_from_slice checks that the src and dst are the same size, so ideally you want to do things so that the optimizer can verify that as well and elide those checks
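A minimal sketch of the "split once, then copy" idea, assuming a byte ring buffer with a const-generic capacity `N`. The `Ring` type and its methods are hypothetical, since the actual buffer code wasn't posted. Splitting the storage with `split_at_mut` and the input with `split_at` gives the two `copy_from_slice` calls lengths the optimizer can relate to each other, which is what may let it drop the length-mismatch and bounds checks. Whether they actually disappear still has to be confirmed by looking for panic branches in the asm.

```rust
/// Hypothetical fixed-capacity byte ring buffer (illustrative names only).
pub struct Ring<const N: usize> {
    buf: [u8; N],
    head: usize, // next write position, always < N (or 0 when N == 0)
    len: usize,  // number of bytes currently stored
}

impl<const N: usize> Ring<N> {
    pub const fn new() -> Self {
        Self { buf: [0; N], head: 0, len: 0 }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Appends all of `data`, or nothing (returning false) if it doesn't fit.
    pub fn push_slice(&mut self, data: &[u8]) -> bool {
        if data.len() > N - self.len {
            return false;
        }
        // Split the storage once at the write position: `tail` runs from head
        // to the end of the array, `front` is where a write wraps around to.
        let (front, tail) = self.buf.split_at_mut(self.head);
        // Split the input to match: `a` fits before the end, `b` wraps.
        let first = data.len().min(tail.len());
        let (a, b) = data.split_at(first);
        tail[..first].copy_from_slice(a);
        front[..b.len()].copy_from_slice(b);
        // head < N and data.len() <= N, so one conditional subtraction wraps it.
        let h = self.head + data.len();
        self.head = if h >= N { h - N } else { h };
        self.len += data.len();
        true
    }

    /// Removes and returns the oldest byte, if any.
    pub fn pop(&mut self) -> Option<u8> {
        if self.len == 0 {
            return None;
        }
        // The oldest byte sits `len` positions behind head, modulo N.
        let idx = if self.head >= self.len {
            self.head - self.len
        } else {
            self.head + N - self.len
        };
        self.len -= 1;
        self.buf.get(idx).copied()
    }
}
```

Tracking `head` and `len` (rather than head and tail indices) keeps full and empty distinguishable without sacrificing a slot, and avoids a `%` on the hot path.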
<jason-kairos[m]>
I will likely do so for my own education, although it is a real pain to follow functions once they get above about 100 instructions with all the optimizations enabled. I'd predict yours might be 20% faster for the single copy case. I also predict that doing it as a raw pointer might be 50% faster.
<jason-kairos[m]>
I'll probably do a conditional compile so that I can reuse as much of the logic as possible. (use the bounds checked version for development, and raw for production)
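One way to do the debug/release split described above, as a sketch: the hypothetical `copy_into` helper uses the bounds-checked `copy_from_slice` when `debug_assertions` is on and a raw `core::ptr::copy_nonoverlapping` otherwise. Because the release path skips the check, the function is marked `unsafe` and the length requirement becomes the caller's responsibility. The raw copy may still be lowered to a `memcpy` call, so it is not automatically faster.

```rust
/// Copies `src` into the front of `dst` (hypothetical helper).
///
/// # Safety
/// The caller must guarantee `src.len() <= dst.len()`. Debug builds check
/// this and panic; release builds do not check it at all.
#[inline(always)]
pub unsafe fn copy_into(dst: &mut [u8], src: &[u8]) {
    #[cfg(debug_assertions)]
    {
        // Development: ordinary bounds-checked slice copy.
        dst[..src.len()].copy_from_slice(src);
    }
    #[cfg(not(debug_assertions))]
    {
        // Production: raw copy with no length check.
        // SAFETY: the caller guarantees src.len() <= dst.len(); the slices
        // cannot overlap because `dst` is a unique borrow and `src` a shared one.
        unsafe {
            core::ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), src.len());
        }
    }
}
```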
<JamesMunns[m]>
mostly you can just look for jumps to panic, or a basic block with a panic in it
M9names[m] has joined #rust-embedded
<M9names[m]>
i find compiler explorer is helpful for exploration work like this:
<M9names[m]>
you can put 2 versions of the same code that are conditionally included on the left, and get a nice side-by-side of the effect on the right
<thejpster[m]>
<thejpster[m]> "cortex-r programming involves..." <- I’m now reading through various Arm manuals and guides to work out what is going on.
<thejpster[m]>
Did you know that Armv8-R in Aarch64 mode allows you to run Linux using an MMU and an RTOS using an MPU at the same time? Linux pays the price of having MMU translation tables in RAM but the MPU does not (giving it more predictable latency). Pretty neat.
<thejpster[m]>
Armv8-R Aarch32 doesn’t have an MMU but does let you run multiple RTOSes at the same time.
<thejpster[m]>
Armv7-R only let you run one RTOS at once.
<thejpster[m]>
I wonder if Cortex-M processors will ever support virtualisation.
<dngrs[m]>
<M9names[m]> "i find compiler explorer is..." <- important footnote to Rust in compiler explorer: need to manually add `-C opt-level=3` for release mode if you wanna judge how that's gonna look
<jason-kairos[m]>
I was surprised that the compiler explorer output "seemed" more optimized than my program for the same function and optimization level.
<hjeldin__[m]>
any idea why i can't use Hsi48Config from rcc?
<hjeldin__[m]>
`cannot find struct, variant or union type Hsi48Config in module embassy_stm32::rcc`
<dirbaio[m]>
check you're using embassy-stm32 0.2.0
<dirbaio[m]>
check you've set the right chip in the embassy-stm32 features in Cargo.toml
<hjeldin__[m]>
well, apparently my stm32l476 doesn't support the clock recovery system, nice
<hjeldin__[m]>
the examples in stm32l4 use stm32l4r5zi, but according to the stm documentation it shouldn't support crs or hsi48 so i'm not sure what is wrong
<thejpster[m]>
with semihosting, obviously - that's the output you can see
<jason-kairos[m]>
38 instructions for rust to call and execute libc's memcpy on 4 bytes, not including setup. (I'd guessed roughly 80 clock cycles; the actual total is 140 clock cycles including all setup.)
<jason-kairos[m]>
I almost feel like I want to cast my &[u8] (a mixture of serial ascii data and binary data) into &[u32] and try to convince it that everything is aligned and that every chunk is the size of a register
<jason-kairos[m]>
I feel like copying 4 bytes into the end of a buffer ought to be faster. But eventually I'll have to accept whatever slowness I encounter.
<jason-kairos[m]>
I'd be happy to pad the ends of ascii strings with zeros.
<thejpster[m]>
<thejpster[m]> "I’m now reading through various..." <- It turns out what was going on was that I was simply entering the wrong mode. I blame global_asm not letting me use nice pre-processor macros, like inline assembly in C files gets to do.
<jason-kairos[m]>
Is it possible that manually writing a for loop to copy bytes might be faster than newlib's memcpy?
<jason-kairos[m]>
I'm beginning to suspect that it might be. At least if we are copying aligned u32's instead of bytes
<JamesMunns[m]>
jason-kairos[m]: There's a decent chance the optimizer is going to turn anything that looks like a memcpy into an actual memcpy
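A sketch of the word-at-a-time idea from the messages above, assuming the caller is happy with zero-padding (as suggested for ascii strings): bytes are packed into `u32` words with `from_ne_bytes`, so the in-memory byte order is preserved, and the last partial word is padded with zeros. `pack_padded` is a hypothetical helper, not part of any library. As James points out, LLVM may still recognize a loop like this and emit a `memcpy` call, so the asm is the final judge.

```rust
/// Packs `bytes` into 32-bit words, zero-padding the final partial word,
/// and writes them to the front of `dst`. Returns the number of words
/// written, or None if `dst` is too small (hypothetical helper).
pub fn pack_padded(bytes: &[u8], dst: &mut [u32]) -> Option<usize> {
    let words = (bytes.len() + 3) / 4;
    let dst = dst.get_mut(..words)?;
    let mut chunks = bytes.chunks_exact(4);
    for (d, c) in dst.iter_mut().zip(&mut chunks) {
        // chunks_exact yields exactly 4 bytes, so c[0..=3] is always in bounds.
        *d = u32::from_ne_bytes([c[0], c[1], c[2], c[3]]);
    }
    // Whatever is left (0..=3 bytes) goes into a zero-padded final word.
    let rem = chunks.remainder();
    if !rem.is_empty() {
        let mut last = [0u8; 4];
        last[..rem.len()].copy_from_slice(rem);
        dst[words - 1] = u32::from_ne_bytes(last);
    }
    Some(words)
}
```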
<jason-kairos[m]>
newlib's memcpy or a LLVM compiler intrinsic?
<JamesMunns[m]>
I don't think you're getting newlib, you're getting an LLVM intrinsic afaik.
<jason-kairos[m]>
When I called "core::ptr::copy_nonoverlapping" I got a reference to newlib in the debugger
<mabez[m]>
I think it's actually compiler_builtins memcpy
<JamesMunns[m]>
Rust doesn't link in a libc unless you tell it to on no std
<mabez[m]>
and it's not efficient at all
<mabez[m]>
but it's weakly defined, so you can write your own
<jason-kairos[m]>
Is there a way to ask to use an LLVM compiler intrinsic for a copy operation? (or really, just use anything other than newlib's memcpy)
<mabez[m]>
Uh, you're kind of stuck a little bit, I think. If you have a strongly defined memcpy symbol in your build dependencies, I don't think there is a way around that
<mabez[m]>
The linker will just pick up the strong definition always
<mabez[m]>
and if you have two definitions you'll get a linker error
<jason-kairos[m]>
So I need to check if newlib uses a weak definition. And if it doesn't, I'd have to edit newlib and link to the modified version instead.
<mabez[m]>
unless the newlib one is also weakly defined, then you can try defining one in your own crate
<mabez[m]>
you may also be able to do some linker script crimes to discard the newlib memcpy symbol, but I'm not sure
<jason-kairos[m]>
I'm hopeful that maybe I can modify my source so that it is less likely to call newlib.
<jason-kairos[m]>
Surely there are a bunch of ways that the compiler has to choose from to copy memory around
<JamesMunns[m]>
By the way, are you really bound by the speed of your ring buffer? Or is this academic interest?
<jason-kairos[m]>
Unfortunately, I do have a real-time C library that I call from rust where I need to emit a ton of "printf" information
<jason-kairos[m]>
It has to take less than 500 cycles on a particular case
<JamesMunns[m]>
If you are, there are potentially fixes other than "make ring buffer faster", like using a different data structure, like bip buffers/bbqueue and dma to completely skip CPU copying at all.
<jason-kairos[m]>
It's a commercial product and library, and I'm trying to modify it as little as possible
<JamesMunns[m]>
Real time and printf? Rough.
<jason-kairos[m]>
It's garbage
<jason-kairos[m]>
But, it's my garbage (I'm responsible for making it work)
<jason-kairos[m]>
I'm probably doing it wrong. I copy the format string to be processed at a later time, but really, what I should do is pass the address and size of the string. And modify the library anyplace it tries to use a non-constant string.
<jason-kairos[m]>
I'd like to think that I could make copying 10x 32bit addresses/pointers happen in less than 500 cycles without too much hair pulling
GrantM11235[m] has quit [Quit: Idle timeout reached: 172800s]
<jason-kairos[m]>
just did some testing and confirmed 430 cycles to do that using ArrayDeque. Problem solved: "just have the CPU do less work" (and change the third party library source in ten thousand places to support said change)
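A sketch of the "pass the address and size of the format string" approach in Rust terms: a `&'static str` is already just a pointer and a length, so the real-time path stores that pair plus a few raw `u32` arguments and defers all formatting to a later, non-time-critical step. `LogRecord`, `LogQueue` and `render` are hypothetical names; the queue is a hand-rolled stand-in for ArrayDeque, not that crate's API, and `render` only understands bare `{}` placeholders. The C library from the chat would need the equivalent change (passing the format pointer instead of a copy) on its side.

```rust
use core::fmt::Write;

/// One deferred log entry: only the format string's address and length
/// (a `&'static str`) are stored, never a copy of its bytes.
#[derive(Clone, Copy)]
pub struct LogRecord {
    pub fmt: &'static str,
    pub args: [u32; 4],
    pub nargs: usize,
}

/// Fixed-capacity FIFO of records (hypothetical stand-in for ArrayDeque).
pub struct LogQueue<const N: usize> {
    slots: [Option<LogRecord>; N],
    head: usize,
    len: usize,
}

impl<const N: usize> LogQueue<N> {
    pub const fn new() -> Self {
        Self { slots: [None; N], head: 0, len: 0 }
    }

    /// Fast path for the real-time code: no formatting, no string copy.
    pub fn push(&mut self, fmt: &'static str, args: &[u32]) -> bool {
        if self.len == N || args.len() > 4 {
            return false;
        }
        let mut a = [0u32; 4];
        a[..args.len()].copy_from_slice(args);
        let tail = (self.head + self.len) % N;
        self.slots[tail] = Some(LogRecord { fmt, args: a, nargs: args.len() });
        self.len += 1;
        true
    }

    /// Slow path, e.g. from an idle loop: pops the oldest record.
    pub fn pop(&mut self) -> Option<LogRecord> {
        if self.len == 0 {
            return None;
        }
        let rec = self.slots[self.head].take();
        self.head = (self.head + 1) % N;
        self.len -= 1;
        rec
    }
}

/// Expands each "{}" in the stored format string with the next argument.
pub fn render<W: Write>(rec: &LogRecord, out: &mut W) -> core::fmt::Result {
    let mut args = rec.args[..rec.nargs].iter();
    let mut rest = rec.fmt;
    while let Some(pos) = rest.find("{}") {
        out.write_str(&rest[..pos])?;
        match args.next() {
            Some(v) => write!(out, "{}", v)?,
            None => out.write_str("{}")?, // more placeholders than args
        }
        rest = &rest[pos + 2..];
    }
    out.write_str(rest)
}
```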
danielb[m] has quit [Quit: Idle timeout reached: 172800s]
Noah[m] has quit [Quit: Idle timeout reached: 172800s]
SirWoodyHackswel has joined #rust-embedded
<SirWoodyHackswel>
For breaking down sound into frequencies, FFT is the algorithm to use, correct? I see so many floating point FFT, but as a common DSP function, where are all the integer FFT algorithms?
<jason-kairos[m]>
If I recall correctly, the integer version might have a slightly different name, at least academically. I don't know any implementations or better names to search for off the top of my head.
<dirbaio[m]>
do FFT with fixed-point numbers instead of floating-point numbers? :P
<SirWoodyHackswel>
I guess SW fixed-point decimal is better than full SW floating-point implementation
<JamesMunns[m]>
it's going to be a lot faster, for sure
<dngrs[m]>
if you're on cortex-m the cmsis_dsp crate ain't bad
<dngrs[m]>
it can balloon your binary in impressive and very unintuitive ways tho
<dngrs[m]>
in general, if you're doing signal processing stuff like FFT/DCT (the latter being the kinda sorta "alternative name"; it's not exactly the same transform) you likely want your CPU doing DSP operations
<dngrs[m]>
(FFT/DFT retains the imaginary component, DCT only does the real part)
<SirWoodyHackswel>
I'm thinking more of RISC-V without HW float, but I'll look into the cmsis_dsp library
<SirWoodyHackswel>
Ahh... that's right. I only need the DCT. We don't need no stinkin'
<SirWoodyHackswel>
imaginary numbers!
<dngrs[m]>
if you end up implementing anything on top of the fixed crate do ping me :D
<SirWoodyHackswel>
Yeah, I'm not looking for super-accurate. Just accurate enough to stuff into N amount of bins.
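For the "accurate enough to stuff into N bins" case, a naive O(N²) integer DCT-II may be enough as a starting point; this is a sketch, not the cmsis_dsp or fixed crate APIs. Coefficients are Q15 (cosine scaled by 32767), products are accumulated in `i64`, and the result is shifted back down by 15 bits. The table is built here with `f64::cos` for brevity, which needs `std` (or a crate like libm on `no_std`); on a RISC-V part without hardware float you'd more likely generate it offline (e.g. in a `build.rs`) and embed it as a `const` array so no float code runs on the target.

```rust
/// Builds a Q15 cosine table for an N-point DCT-II:
/// table[k][n] = round(32767 * cos(pi * (2n + 1) * k / 2N)).
pub fn dct_table<const N: usize>() -> [[i16; N]; N] {
    let mut t = [[0i16; N]; N];
    for k in 0..N {
        for n in 0..N {
            let angle = core::f64::consts::PI * ((2 * n + 1) * k) as f64 / (2 * N) as f64;
            t[k][n] = (angle.cos() * 32767.0).round() as i16;
        }
    }
    t
}

/// Naive fixed-point DCT-II (unnormalized): out[k] = sum_n x[n] * table[k][n] >> 15.
pub fn dct2_q15<const N: usize>(x: &[i16; N], table: &[[i16; N]; N], out: &mut [i32; N]) {
    for (o, row) in out.iter_mut().zip(table.iter()) {
        // Integer multiply-accumulate; i64 cannot overflow for sane N.
        let acc: i64 = x.iter().zip(row.iter()).map(|(&a, &c)| a as i64 * c as i64).sum();
        *o = (acc >> 15) as i32;
    }
}
```

Output bin `k` corresponds roughly to frequency `k * fs / (2 * N)` for sample rate `fs`, and `abs()` of each bin gives a magnitude to bucket. If N grows, a fast (FFT-based) DCT algorithm would be the next step.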
<thejpster[m]>
I’ve never seen the bootsel button not work. It’s handled by the ROM.
<JamesMunns[m]>
Did you also reset the chip or power cycle it? You need to hold the bootsel button and hit the reset button (if there is one), or unplug it, hold the button, and plug it back in
<rukai[m]>
yep I did that, I don't have a reset button, but I held the bootsel button while reconnecting the usb port (which it gets power from)
<rukai[m]>
I even left it unplugged for 12 hours.
<JamesMunns[m]>
hmm, yeah, afaik there is no way to disable the hw bootloader, is it maybe possible the switch is damaged? Or do you have the schematic of the custom board?
<M9names[m]>
i guess that's faster than just trying to talk to flash and bailing if it fails
<M9names[m]>
<rukai[m]> "I used a small wire to short the..." <- there's always the most common failure mode: your usb cable is broken.
<M9names[m]>
switching cables to a known good one is a good first step.
<rukai[m]>
that's good thinking, the usb cables I've used work fine with other devices. I've also tested on 2 separate machines, one linux and one windows.
<JamesMunns[m]>
what OS are you running on?
<JamesMunns[m]>
ah, on linux, it would be good to see what lsusb says
<rukai[m]>
yeah it's not showing up on lsusb
<M9names[m]>
anything on dmesg?
<JamesMunns[m]>
(if it's running a hello world blink it makes sense it isn't on lsusb)
<rukai[m]>
^ yeah that's what I was thinking
<M9names[m]>
sorry, you're holding bootsel or shorting it manually but the end result is still your code running blinky?
<M9names[m]>
if that has failed open, the switch will have no impact on CS state
<M9names[m]>
if you could measure the resistance of that component it would be best (maybe they fitted the wrong component)
<rukai[m]>
I'm not quite sure how the ohm settings on the multimeter work, but I set it to 2000 and then got a reading of 1000 from the multimeter, which sounds like the 1K we are expecting
<M9names[m]>
yep that sounds perfect
<M9names[m]>
it doesn't make much sense that the code in bootrom wouldn't be able to read it - it's a pretty simple circuit