#rust-embedded on 2022-01-17 — irc logs at libera.irclog.whitequark.org

00:23 xnor has joined #rust-embedded

02:11 starblue1 has quit [Ping timeout: 256 seconds]

02:11 <re_irc> <@firefrommoonlight:matrix.org> https://zadig.akeo.ie/ if windows

02:11 <re_irc> <@firefrommoonlight:matrix.org> Download this; follow the instructions

02:12 <re_irc> <@firefrommoonlight:matrix.org> Shit wrong chat

02:13 starblue1 has joined #rust-embedded

04:00 rjframe has quit [Ping timeout: 240 seconds]

07:39 ni has quit [Quit: WeeChat 3.0]

07:40 ni has joined #rust-embedded

08:04 jackneilll has quit [Quit: Leaving]

08:04 jackneilll has joined #rust-embedded

08:04 jackneilll has quit [Remote host closed the connection]

08:04 jackneill has joined #rust-embedded

08:19 fabic has joined #rust-embedded

08:19 fabic has quit [Remote host closed the connection]

08:22 gsalazar has joined #rust-embedded

11:13 starblue1 has quit [Ping timeout: 240 seconds]

11:15 starblue1 has joined #rust-embedded

12:51 rjframe has joined #rust-embedded

13:16 fabic has joined #rust-embedded

13:24 gsalazar_ has joined #rust-embedded

13:26 gsalazar has quit [Ping timeout: 240 seconds]

13:38 <re_irc> <@ryan-summers:matrix.org> Hmmm. I have a usb-device SerialPort that properly enumerates on Ubuntu, but doesn't seem to work on Windows. Does anyone happen to know what might be of issue? A USB-pcap doesn't reveal any real traffic on Windows

13:39 <re_irc> <@ryan-summers:matrix.org> Windows just shows it as "Unknown USB device, (Set address failed)", but I can properly send/receive data on Ubuntu

13:40 <re_irc> <@rahix:matrix.org> I think windows needs some additional descriptors or it won't cooperate - but don't ask me about any details

13:41 <re_irc> <@ryan-summers:matrix.org> Lumpio−: Have you had any success using usb-device and usbd-serial on Windows?

13:41 <re_irc> <@ryan-summers:matrix.org> Something about additional descriptors wouldn't surprise me

13:42 <re_irc> <@firefrommoonlight:matrix.org> https://zadig.akeo.ie/

13:42 <Lumpio-> What chip and are you running in debug or release mode

13:43 <re_irc> <@ryan-summers:matrix.org> STM32F407 running in release mode

13:43 <Lumpio-> Windows 10 at least ought to work without extra drivers.

13:44 <re_irc> <@ryan-summers:matrix.org> I very well may have swapped out my USB driver using Zadig in the past... Let me double check that things are default

14:01 <re_irc> <@ryan-summers:matrix.org> Hmmm. Zadig won't even let me install a different driver for the device

14:02 gsalazar_ is now known as gsalazar

14:02 gsalazar has quit [Quit: Leaving]

14:03 gsalazar has joined #rust-embedded

14:07 <re_irc> <@adamgreig:matrix.org> If it's failing to even enumerate then I guess zadig won't see it at all

14:07 <re_irc> <@adamgreig:matrix.org> Set address failed happens very early

14:09 <Lumpio-> I'm fairly sure it doesn't even read configuration descriptors at that point

14:15 fabic has quit [Ping timeout: 256 seconds]

14:24 fabic has joined #rust-embedded

14:33 <re_irc> <@ryan-summers:matrix.org> I'm not even seeing any usb-pcap data that appears relevant during the insertion event

14:51 fabic has quit [Ping timeout: 240 seconds]

15:22 fabic has joined #rust-embedded

15:36 rjframe has quit [Quit: Leaving]

16:15 fabic has quit [Ping timeout: 240 seconds]

16:22 fabic has joined #rust-embedded

16:29 fabic has quit [Quit: Leaving]

18:32 <re_irc> <@ryan-summers:matrix.org> Well, I've installed RTT traces into the various USB device states and tracking control pipe state etc. and there definitely seems to be something amiss:

18:32 <re_irc> <@ryan-summers:matrix.org> [INFO] device.rs:163 - PollResult: Data { ep_out: 0, ep_in_complete: 0, ep_setup: 1 }

18:32 <re_irc> <@ryan-summers:matrix.org> [INFO] device.rs:215 - ControlPipe State: CompleteIn(Request { direction: In, request_type: Standard, recipient: Device, request: 6, value: 256, index: 0, length: 64 })

18:32 <re_irc> <@ryan-summers:matrix.org> [INFO] device.rs:216 - Incoming request: Some(Request { direction: In, request_type: Standard, recipient: Device, request: 6, value: 256, index: 0, length: 64 })

18:32 <re_irc> ... long message truncated: https://psion.agg.io/_matrix/media/r0/download/psion.agg.io/e51a730b60344d361b3f21d408b6d2b2aebd51639f5903dab362d4c7743dd72e (21 lines)

18:32 <re_irc> <@ryan-summers:matrix.org> That repeats 4 times at regular intervals until Windows eventually gives up and says setting address for the device failed

19:06 <re_irc> <@yruama_lairba:matrix.org> hi, when compiling for "thumbv7em-none-eabihf", can the compiler use dsp instruction like mac ?

19:06 <re_irc> <@k900:0upti.me> Mac?

19:10 <re_irc> <@ryan-summers:matrix.org> multiply-accumulate instructions

19:11 <re_irc> <@ryan-summers:matrix.org> And my answer is that I'm not sure. There's been various math intrinsic support updates over the last year that have made things more efficient, but I don't know if DSP instructions are yet included?

19:12 <re_irc> <@yruama_lairba:matrix.org> i checked witha dummy example with godbolt, the answer seems yes

19:17 <re_irc> <@jordens:matrix.org> yruama_lairba:matrix.org: Not all, but MLA and related definitely. Also do `-C target-cpu=cortex-m7` if that is the case for you.

19:18 <re_irc> <@yruama_lairba:matrix.org> mla is multiplication accumulation, isn't a dsp instruction ?

19:19 <re_irc> <@jordens:matrix.org> are you asking or asserting?

19:26 <re_irc> <@yruama_lairba:matrix.org> asking

19:29 <re_irc> <@yruama_lairba:matrix.org> datasheets don't mention "dsp" instruction, so i guesse multiplication accumulation is just considered as a general instruction

19:36 <re_irc> <@mutantbob:matrix.org> I am figuring out how to mix Rust and C++ on an Arduino Uno. I can build a `.elf` file that I can install with `avrdude` and it works, but I have to do some steps manually because cargo thinks it should include `-lstdc++` in the link phase and it is wrong. Can someone help me figure out how to get things to work without manual intervention?...

19:36 <re_irc> ... https://stackoverflow.com/questions/70699064/in-rust-how-do-i-work-around-a-link-failiure-usr-libexec-gcc-avr-ld-cannot-f

20:07 <re_irc> <@jordens:matrix.org> `smlal` is a bit more extreme and also is being used

21:24 <re_irc> <@ubik:matrix.org> I have to store values of a struct which contains two 32-bit fields which don't actually need to be 32-bit. They could be 24-bit and I'd be able to save some space. Any clue how I can optimize that?

22:08 <re_irc> <@adamgreig:matrix.org> perhaps add methods to serialise to a [u8; 6] and back again?

22:15 <re_irc> <@adamgreig:matrix.org> yruama_lairba:matrix.org: yes, e.g. https://rust.godbolt.org/z/WbT349Grx vs https://rust.godbolt.org/z/xW9d11MEj

22:15 <re_irc> <@adamgreig:matrix.org> the thumbv7**e**m-none-eabihf target uses the E-only (DSP) PKHBT instruction

22:15 <re_irc> <@adamgreig:matrix.org> the thumbv7m-none-eabihf target doesn't

22:16 <re_irc> <@adamgreig:matrix.org> but it's not always easy to get the compiler to do it, and it doesn't know about many or all of the DSP instructions, especially the packed vector operations

22:16 <re_irc> <@adamgreig:matrix.org> I think therealprof spent some time recently adding support for several of them to LLVM

22:19 <re_irc> <@firefrommoonlight:matrix.org> ubik:matrix.org: I'm thinking something like AGG's suggestion, of making a custom struct composed of u8s, or u8s and u16s

22:19 <re_irc> <@firefrommoonlight:matrix.org> Anecdotally, I've been messing with something similar today, wherein I wish Rust had an `i24` type

22:19 <re_irc> <@firefrommoonlight:matrix.org> Since if you try to use `i32`, Rust misinterprets the sign bit, forcing you to do some adjustments

22:19 <re_irc> <@firefrommoonlight:matrix.org> eg

22:19 <re_irc> <@firefrommoonlight:matrix.org> ```rust

22:19 <re_irc> ... long message truncated: https://psion.agg.io/_matrix/media/r0/download/psion.agg.io/f06ab080327063d76b59c2796e14d0e743911cea55caec5d577f54f1774abf38 (8 lines)

22:19 <re_irc> <@firefrommoonlight:matrix.org> fn fix_i24_sign(val: &mut i32) {

22:19 <re_irc> <@firefrommoonlight:matrix.org> /// Fix the sign on signed 24 bit integers, represented as `i32`.

22:21 <re_irc> <@adamgreig:matrix.org> hmm, probably some other cute tricks you could do for that

22:22 <re_irc> <@adamgreig:matrix.org> https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=ba07e3d7de2d073b0bc306be236cbc96

22:23 <re_irc> <@adamgreig:matrix.org> might be faster than a compare-branch-subtract

22:23 <re_irc> <@firefrommoonlight:matrix.org> Perfect

22:24 <re_irc> <@adamgreig:matrix.org> (it works because >> for signed types sign-extends in rust, i.e. is an arithmetic shift)

22:27 <re_irc> <@firefrommoonlight:matrix.org> I've confirmed from a few tests your code works

22:27 <re_irc> <@firefrommoonlight:matrix.org> and it does sound faster

22:27 <re_irc> <@firefrommoonlight:matrix.org> I had that thought earlier too about the if logic being a problem, since I'm doing this operation many times in realtime

22:27 <re_irc> <@firefrommoonlight:matrix.org> Made the switch - THANKS!

22:28 <re_irc> <@therealprof:matrix.org> adamgreig: It's a while back and it came to a halt because I couldn't find anyone interested in reviewing and merging the stuff. Most of the people seem to be on the payroll of big corporations nowadays and they're mostly interested in the latest and greatest CPU architectures.

22:28 <re_irc> <@adamgreig:matrix.org> :(

22:28 <re_irc> <@therealprof:matrix.org> I need to check whether the move to GH has improved the situation.

22:29 <re_irc> <@adamgreig:matrix.org> these instructions are still in armv8m though right?

22:29 <re_irc> <@therealprof:matrix.org> The last few months have been chaotic...

22:29 <re_irc> <@adamgreig:matrix.org> although it's annoying because now the DSP feature set is not part of the architecture name and you need armv8m.main+dsp or something like that

22:30 <re_irc> <@therealprof:matrix.org> Yes, they're still in armv8 but with more powerful SIMD instructionsets the interest in the 32bit instructions is lukewarm at best.

22:31 <re_irc> <@adamgreig:matrix.org> firefrommoonlight: nice, on thumbv7m (x<<8)>>8 will compile to a single SBFX instruction https://rust.godbolt.org/z/q64r948r1

22:31 <re_irc> <@therealprof:matrix.org> The instructions are pretty much all supported. What is missing though is lowering from IR to them.

22:32 <re_irc> <@therealprof:matrix.org> ... and in some cases also generating the right IR from Rust.

22:32 <re_irc> <@adamgreig:matrix.org> compared to 5 instructions for your original version: https://rust.godbolt.org/z/s5WPaTceG

22:33 <re_irc> <@adamgreig:matrix.org> therealprof: right, yea. from what I could tell rust gets close on the IR front

22:33 <re_irc> <@adamgreig:matrix.org> but i guess the problem is you also ideally want rust/llvm to autovectorise things which I guess is quite niche for thumbv7

22:34 <re_irc> <@therealprof:matrix.org> Last time I checked they were quite behind because they needed to support order LLVM versions and then the new IR codes were "forgotten" to be implemented.

22:35 <re_irc> <@therealprof:matrix.org> A lot of the code generation is relying on LLVM passes recombining trivial instructions into more complex ones so the backend can generate more ideal code.

22:36 <re_irc> <@therealprof:matrix.org> adamgreig: It is sufficient to pass apropriate types to LLVM and it will do the job just fine... If there're support for vectorized lowering.

22:37 <re_irc> <@adamgreig:matrix.org> hm, with std::simd's 32-bit packs?

22:38 <re_irc> <@therealprof:matrix.org> The last time you and were were looking at good code generation we managed to hand vectorisable arrays to LLVM but the lowering supported only the lowering to instructions processing single elements.

22:38 <re_irc> <@adamgreig:matrix.org> I haven't looked at packed_simd/std::arch/std::simd in a while

22:38 <re_irc> <@adamgreig:matrix.org> yea

22:38 <re_irc> <@adamgreig:matrix.org> oh huh, what was stdsimd is now https://github.com/rust-lang/portable-simd and there's also https://github.com/rust-lang/packed_simd ?

22:40 <re_irc> <@adamgreig:matrix.org> there's also https://doc.rust-lang.org/nightly/core/arch/arm/dsp/struct.int16x2_t.html

22:40 <re_irc> <@therealprof:matrix.org> There's no need to. If the lowering supports proper handling of SIMD data, it's enough if Rust passes a properly aligned array in IR form to LLVM and it will pick the best possible instruction sequence for it.

22:42 <re_irc> <@therealprof:matrix.org> But that requires that the instruction selection pass is told that e.g. the DSP instructions support an operation on 2xu16, how to prepare the data and how to pass generate the proper instructions.

22:43 <re_irc> <@therealprof:matrix.org> As far as I can tell all of them are supported by LLVM but most of them only for a single element.

22:44 <re_irc> <@adamgreig:matrix.org> mmm, so basically all the 32-bit DSP instructions work OK, but it won't generate 2x16 or 4x8 vector instructions by itself?

22:44 <re_irc> <@adamgreig:matrix.org> but since the packed types exist in std, in theory one could write a library using asm that did the right thing?

22:45 <re_irc> <@therealprof:matrix.org> Which also means that they're often not used simply because there might be some extra effort involved to e.g. load a u8 into a register (with zero extension), do the operation, extract the value of the register and put it somewhere.

22:45 <re_irc> <@therealprof:matrix.org> adamgreig: Exactly.

22:47 <re_irc> <@adamgreig:matrix.org> in principle I wonder if the relevant inline asm could be added to std instead of llvm

22:48 <re_irc> <@therealprof:matrix.org> adamgreig: I think the IR intrinsics do support all of those architecture specific instructions, yet.

22:48 <re_irc> <@therealprof:matrix.org> adamgreig: It could be but the Rust compiler team is not very keen on doing so.

22:49 <re_irc> <@therealprof:matrix.org> Oh wait.

22:49 <re_irc> <@therealprof:matrix.org> Inline ASM?

22:49 <re_irc> <@adamgreig:matrix.org> I think the packed_simd crate is to be superseded by portable_simd which used to be called stdsimd? and portable_simd will be std::simd eventually?

22:50 <re_irc> <@adamgreig:matrix.org> therealprof:matrix.org: since the packed types exist, you could write a library of methods that took packed types and used inline asm to call the vector instructions

22:50 <re_irc> <@adamgreig:matrix.org> it wouldn't let you use like the `+` and `-` operators and it wouldn't be automatically inferred from a for loop etc

22:51 <re_irc> <@therealprof:matrix.org> Yes, you could. But that would basically mean canned operations which are completely opaque to the compiler.

22:52 <re_irc> <@therealprof:matrix.org> I don't see how you could do that but still get most of the relevant compiler optimisations.

22:57 <re_irc> <@adamgreig:matrix.org> it looks like the types in portable_simd should already lower to the right llvm intrinsics, actually

22:57 <re_irc> <@adamgreig:matrix.org> so you still need to be using a packed type and explicitly writing SIMD code, not letting the compiler infer it from a straight loop

22:57 <re_irc> <@therealprof:matrix.org> ARM specific ones or generic LLVM ones?

22:57 <re_irc> <@adamgreig:matrix.org> ah, I guess generic LLVM intrinsics

22:58 <re_irc> <@adamgreig:matrix.org> right, I see, and LLVM won't turn a "add two i16x2" into the arm vector op?

22:58 <re_irc> <@adamgreig:matrix.org> https://github.com/rust-lang/portable-simd/blob/master/crates/core_simd/src/intrinsics.rs

22:59 <re_irc> <@therealprof:matrix.org> Only if there's isel support for lowering that instruction for i16x2 on the specific target.

23:01 <re_irc> <@therealprof:matrix.org> Many of DSP instructions only had lowering support for single types last time I looked.

23:02 <re_irc> <@therealprof:matrix.org> Any in many cases it's not worth using those over regular instructions due to load/unpack/pack/store.

23:02 <re_irc> <@adamgreig:matrix.org> yea, https://rust.godbolt.org/z/casb8599E

23:03 <re_irc> <@therealprof:matrix.org> There're some exceptions like saturated operations where the compiler often deems the costs low enough to be benefitial.

23:04 <re_irc> <@adamgreig:matrix.org> interesting

23:04 <re_irc> <@adamgreig:matrix.org> exactly so: https://rust.godbolt.org/z/b6vrM6bxj

23:04 <re_irc> <@adamgreig:matrix.org> though it uses two qadd16....

23:05 <re_irc> <@therealprof:matrix.org> Or 4 qadd8, exactly.

23:05 <re_irc> <@adamgreig:matrix.org> it's doing one qadd16 per 16-bit addition and then gluing them together, I see

23:05 <re_irc> <@adamgreig:matrix.org> just using the qadd16 to get the saturating effect

23:06 <re_irc> <@adamgreig:matrix.org> the IR seems OK, it calls LLVM sat.v2i16

23:07 <re_irc> <@therealprof:matrix.org> Not quite sure how the lowering works, I think it is running multiple passes, then calculate the resulting cost of the selected instructions and then it will pick the cheapest sequence.

23:08 <re_irc> <@therealprof:matrix.org> Yes, but the lowering only supports sat.v1i16, so LLVM splits the sat.v2i16 into 2 sat.v1i16.

23:09 <re_irc> <@therealprof:matrix.org> (on that architecture that is, I bet on ARMv8 with NEON and the newer SIMD instruction sets they do have optimized lowering for sat.v2i16)

23:12 <re_irc> <@therealprof:matrix.org> You don't actually need to use SIMD types, there's an autovectorizer which will take an array and automatically slice it into those vector types before it's passed on to the instruction selection.

23:15 <re_irc> <@therealprof:matrix.org> Of course using proper types will help to compiler to see what's what and unlock much better code generation.

23:15 <re_irc> <@adamgreig:matrix.org> hmm, https://rust.godbolt.org/z/1ba89EWoq

23:15 <re_irc> <@adamgreig:matrix.org> so yea, the portable_simd lowers to LLVM types that don't help, but core::arch contains a different i16x2 type that you can pass to the intrinsics it provides that call the underlying instructions

23:16 <re_irc> <@adamgreig:matrix.org> dunno why LLVM left in those two loads in add162 though, lol

23:16 <re_irc> <@adamgreig:matrix.org> `ldr r2, [r2]; ldr r1, [r1];` right sure OK

23:16 <re_irc> <@therealprof:matrix.org> Yes, that's calling the ARM specific intrinsics directly. That's bypassing the isel...

23:18 <re_irc> <@adamgreig:matrix.org> was your PR to add support for eg sat.v2i16 to thumbv7?

23:19 <re_irc> <@therealprof:matrix.org> My latest PRs were to add more scaffolding to make that happen so I wouldn't have to do it blindly, i.e. adding test cases to test the lowering of such generic intrinsics to various architectures.

23:20 <re_irc> <@therealprof:matrix.org> Those isel sequences are not trivial and affecting different sub-architectures so you want to make sure you have proper tests in place to detect any code misgeneration...

23:22 <re_irc> <@therealprof:matrix.org> e.g. before you can fire off an instruction you need to make sure that the data is properly loaded into the registers and the results can be stored.

23:24 <re_irc> <@therealprof:matrix.org> So a handler for a sat.v2i16, needs to be trigger a number of insn handler handlers (to be written), generate the qadd16 instruction and then fire off another set of handlers (to be written) to store the result.

23:26 <re_irc> <@therealprof:matrix.org> Once you install those insn handlers you might see funny changes all over the map because the compiler sees "oh, we can move data in this way -- let's try that!".

23:30 <re_irc> <@adamgreig:matrix.org> compilers huh

23:31 <re_irc> <@therealprof:matrix.org> You know, things like: instead of splitting one load into multiple partials, it could decide to do one load and splat the other elements to other registers using other instructions or even find a way to reuse the data from the same register directly.

23:33 <re_irc> <@therealprof:matrix.org> I think there's a lot of low hanging fruit there but again, most of the people seem to be paid to only care about the latest and greatest (even though a lot of that work could also be of benefit to archictectures higher up the food chain).

23:39 <re_irc> <@adamgreig:matrix.org> yea, I see. that's a shame... it sure would be nice.

23:48 <re_irc> <@therealprof:matrix.org> Time is still rather limited unfortunately but I hope that if and when things get a bit more normal again I'll have time to check out the new dev process of the LLVM project on GH and explore proper DSP instruction support a bit more...