jackneilll has quit [Remote host closed the connection]
jackneill has joined #rust-embedded
fabic has joined #rust-embedded
fabic has quit [Remote host closed the connection]
gsalazar has joined #rust-embedded
starblue1 has quit [Ping timeout: 240 seconds]
starblue1 has joined #rust-embedded
rjframe has joined #rust-embedded
fabic has joined #rust-embedded
gsalazar_ has joined #rust-embedded
gsalazar has quit [Ping timeout: 240 seconds]
<re_irc>
<@ryan-summers:matrix.org> Hmmm. I have a usb-device SerialPort that properly enumerates on Ubuntu, but doesn't seem to work on Windows. Does anyone happen to know what might be the issue? A USBPcap capture doesn't reveal any real traffic on Windows
<re_irc>
<@ryan-summers:matrix.org> Windows just shows it as "Unknown USB device, (Set address failed)", but I can properly send/receive data on Ubuntu
<re_irc>
<@rahix:matrix.org> I think windows needs some additional descriptors or it won't cooperate - but don't ask me about any details
<re_irc>
<@ryan-summers:matrix.org> Lumpio-: Have you had any success using usb-device and usbd-serial on Windows?
<re_irc>
<@ryan-summers:matrix.org> Something about additional descriptors wouldn't surprise me
<Lumpio->
What chip, and are you running in debug or release mode?
<re_irc>
<@ryan-summers:matrix.org> STM32F407 running in release mode
<Lumpio->
Windows 10 at least ought to work without extra drivers.
<re_irc>
<@ryan-summers:matrix.org> I very well may have swapped out my USB driver using Zadig in the past... Let me double check that things are default
<re_irc>
<@ryan-summers:matrix.org> Hmmm. Zadig won't even let me install a different driver for the device
gsalazar_ is now known as gsalazar
gsalazar has quit [Quit: Leaving]
gsalazar has joined #rust-embedded
<re_irc>
<@adamgreig:matrix.org> If it's failing to even enumerate then I guess zadig won't see it at all
<re_irc>
<@adamgreig:matrix.org> Set address failed happens very early
<Lumpio->
I'm fairly sure it doesn't even read configuration descriptors at that point
fabic has quit [Ping timeout: 256 seconds]
fabic has joined #rust-embedded
<re_irc>
<@ryan-summers:matrix.org> I'm not even seeing any usb-pcap data that appears relevant during the insertion event
fabic has quit [Ping timeout: 240 seconds]
fabic has joined #rust-embedded
rjframe has quit [Quit: Leaving]
fabic has quit [Ping timeout: 240 seconds]
fabic has joined #rust-embedded
fabic has quit [Quit: Leaving]
<re_irc>
<@ryan-summers:matrix.org> Well, I've installed RTT traces into the various USB device states and am tracking control pipe state etc., and there definitely seems to be something amiss:
<re_irc>
<@ryan-summers:matrix.org> That repeats 4 times at regular intervals until Windows eventually gives up and says setting address for the device failed
<re_irc>
<@yruama_lairba:matrix.org> hi, when compiling for "thumbv7em-none-eabihf", can the compiler use dsp instruction like mac ?
<re_irc>
<@ryan-summers:matrix.org> And my answer is that I'm not sure. There's been various math intrinsic support updates over the last year that have made things more efficient, but I don't know if DSP instructions are yet included?
<re_irc>
<@yruama_lairba:matrix.org> i checked with a dummy example on godbolt, the answer seems to be yes
<re_irc>
<@jordens:matrix.org> yruama_lairba:matrix.org: Not all, but MLA and related definitely. Also do `-C target-cpu=cortex-m7` if that is the case for you.
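For reference, that flag can be made permanent in the project's cargo config rather than passed by hand. A sketch, assuming the project builds for `thumbv7em-none-eabihf` on a Cortex-M7 part (adjust `target-cpu` to your actual core):

```toml
# .cargo/config.toml
[target.thumbv7em-none-eabihf]
rustflags = ["-C", "target-cpu=cortex-m7"]
```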
<re_irc>
<@yruama_lairba:matrix.org> mla is multiply-accumulate, isn't it a dsp instruction?
<re_irc>
<@jordens:matrix.org> are you asking or asserting?
<re_irc>
<@yruama_lairba:matrix.org> asking
<re_irc>
<@yruama_lairba:matrix.org> datasheets don't mention a "dsp" instruction, so i guess multiply-accumulate is just considered a general instruction
<re_irc>
<@mutantbob:matrix.org> I am figuring out how to mix Rust and C++ on an Arduino Uno. I can build a `.elf` file that I can install with `avrdude` and it works, but I have to do some steps manually because cargo thinks it should include `-lstdc++` in the link phase and it is wrong. Can someone help me figure out how to get things to work without manual intervention?...
<re_irc>
<@jordens:matrix.org> `smlal` is a bit more extreme and also is being used
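The multiply-accumulate patterns being discussed can be expressed in plain Rust. A sketch: on thumbv7em targets the compiler can lower the 32-bit form to MLA and the widening 64-bit form to SMLAL, though whether it does depends on optimization settings; on a host it simply computes the values.

```rust
// Candidate for MLA: 32-bit multiply-accumulate.
fn mac32(acc: u32, a: u32, b: u32) -> u32 {
    acc.wrapping_add(a.wrapping_mul(b))
}

// Candidate for SMLAL: signed 32x32 -> 64-bit multiply, accumulated into i64.
fn mac64(acc: i64, a: i32, b: i32) -> i64 {
    acc + (a as i64) * (b as i64)
}

fn main() {
    assert_eq!(mac32(10, 3, 4), 22);
    assert_eq!(mac64(-5, 1000, 1000), 999_995);
    println!("ok");
}
```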
<re_irc>
<@ubik:matrix.org> I have to store values of a struct which contains two 32-bit fields which don't actually need to be 32-bit. They could be 24-bit and I'd be able to save some space. Any clue how I can optimize that?
<re_irc>
<@adamgreig:matrix.org> perhaps add methods to serialise to a [u8; 6] and back again?
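That suggestion might look like the following sketch; `Pair` and its field names are hypothetical stand-ins for ubik's actual struct, and little-endian byte order is an arbitrary choice:

```rust
// Two logical 24-bit values stored as u32 in memory, packed into 6 bytes
// (instead of 8) for storage.
struct Pair {
    a: u32, // only the low 24 bits are used
    b: u32,
}

impl Pair {
    fn pack(&self) -> [u8; 6] {
        let a = self.a.to_le_bytes();
        let b = self.b.to_le_bytes();
        // Keep the three low-order bytes of each field.
        [a[0], a[1], a[2], b[0], b[1], b[2]]
    }

    fn unpack(buf: [u8; 6]) -> Self {
        Pair {
            a: u32::from_le_bytes([buf[0], buf[1], buf[2], 0]),
            b: u32::from_le_bytes([buf[3], buf[4], buf[5], 0]),
        }
    }
}

fn main() {
    let p = Pair { a: 0x00AB_CDEF, b: 0x0012_3456 };
    let q = Pair::unpack(p.pack());
    assert_eq!((q.a, q.b), (0x00AB_CDEF, 0x0012_3456));
    println!("ok");
}
```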
<re_irc>
<@adamgreig:matrix.org> the thumbv7**e**m-none-eabihf target uses the E-only (DSP) PKHBT instruction
<re_irc>
<@adamgreig:matrix.org> the thumbv7m-none-eabihf target doesn't
<re_irc>
<@adamgreig:matrix.org> but it's not always easy to get the compiler to do it, and it doesn't know about many or all of the DSP instructions, especially the packed vector operations
<re_irc>
<@adamgreig:matrix.org> I think therealprof spent some time recently adding support for several of them to LLVM
<re_irc>
<@firefrommoonlight:matrix.org> ubik:matrix.org: I'm thinking something like AGG's suggestion, of making a custom struct composed of u8s, or u8s and u16s
<re_irc>
<@firefrommoonlight:matrix.org> Anecdotally, I've been messing with something similar today, wherein I wish Rust had an `i24` type
<re_irc>
<@firefrommoonlight:matrix.org> Since if you try to use `i32`, Rust misinterprets the sign bit, forcing you to do some adjustments
<re_irc>
<@adamgreig:matrix.org> might be faster than a compare-branch-subtract
<re_irc>
<@firefrommoonlight:matrix.org> Perfect
<re_irc>
<@adamgreig:matrix.org> (it works because >> for signed types sign-extends in rust, i.e. is an arithmetic shift)
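The sign-extension trick being referenced is shifting left by 8 and then arithmetic-shifting right by 8, which propagates bit 23 (the i24 sign bit) through the top byte:

```rust
// Sign-extend a 24-bit value held in the low bits of an i32.
// Rust's >> on signed integers is an arithmetic shift, so the
// sign bit is replicated into the upper 8 bits.
fn sext24(raw: i32) -> i32 {
    (raw << 8) >> 8
}

fn main() {
    assert_eq!(sext24(0x7F_FFFF), 8_388_607);  // max positive i24
    assert_eq!(sext24(0x80_0000), -8_388_608); // min negative i24
    assert_eq!(sext24(0xFF_FFFF), -1);
    println!("ok");
}
```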
<re_irc>
<@firefrommoonlight:matrix.org> I've confirmed from a few tests your code works
<re_irc>
<@firefrommoonlight:matrix.org> and it does sound faster
<re_irc>
<@firefrommoonlight:matrix.org> I had that thought earlier too about the if logic being a problem, since I'm doing this operation many times in realtime
<re_irc>
<@firefrommoonlight:matrix.org> Made the switch - THANKS!
<re_irc>
<@therealprof:matrix.org> adamgreig: It was a while back and it came to a halt because I couldn't find anyone interested in reviewing and merging the stuff. Most of the people seem to be on the payroll of big corporations nowadays and they're mostly interested in the latest and greatest CPU architectures.
<re_irc>
<@adamgreig:matrix.org> :(
<re_irc>
<@therealprof:matrix.org> I need to check whether the move to GH has improved the situation.
<re_irc>
<@adamgreig:matrix.org> these instructions are still in armv8m though right?
<re_irc>
<@therealprof:matrix.org> The last few months have been chaotic...
<re_irc>
<@adamgreig:matrix.org> although it's annoying because now the DSP feature set is not part of the architecture name and you need armv8m.main+dsp or something like that
<re_irc>
<@therealprof:matrix.org> Yes, they're still in armv8, but with more powerful SIMD instruction sets the interest in the 32-bit instructions is lukewarm at best.
<re_irc>
<@adamgreig:matrix.org> firefrommoonlight: nice, on thumbv7m (x<<8)>>8 will compile to a single SBFX instruction https://rust.godbolt.org/z/q64r948r1
<re_irc>
<@therealprof:matrix.org> The instructions are pretty much all supported. What is missing though is lowering from IR to them.
<re_irc>
<@therealprof:matrix.org> ... and in some cases also generating the right IR from Rust.
<re_irc>
<@adamgreig:matrix.org> therealprof: right, yea. from what I could tell rust gets close on the IR front
<re_irc>
<@adamgreig:matrix.org> but i guess the problem is you also ideally want rust/llvm to autovectorise things which I guess is quite niche for thumbv7
<re_irc>
<@therealprof:matrix.org> Last time I checked they were quite behind, because they needed to support older LLVM versions, and then implementing the new IR opcodes was "forgotten".
<re_irc>
<@therealprof:matrix.org> A lot of the code generation is relying on LLVM passes recombining trivial instructions into more complex ones so the backend can generate more ideal code.
<re_irc>
<@therealprof:matrix.org> adamgreig: It is sufficient to pass appropriate types to LLVM and it will do the job just fine... if there's support for vectorized lowering.
<re_irc>
<@adamgreig:matrix.org> hm, with std::simd's 32-bit packs?
<re_irc>
<@therealprof:matrix.org> The last time you and I were looking at good code generation we managed to hand vectorisable arrays to LLVM, but the lowering only supported instructions processing single elements.
<re_irc>
<@adamgreig:matrix.org> I haven't looked at packed_simd/std::arch/std::simd in a while
<re_irc>
<@therealprof:matrix.org> There's no need to. If the lowering supports proper handling of SIMD data, it's enough if Rust passes a properly aligned array in IR form to LLVM and it will pick the best possible instruction sequence for it.
<re_irc>
<@therealprof:matrix.org> But that requires that the instruction selection pass is told that e.g. the DSP instructions support an operation on 2xu16, how to prepare the data and how to generate the proper instructions.
<re_irc>
<@therealprof:matrix.org> As far as I can tell all of them are supported by LLVM but most of them only for a single element.
<re_irc>
<@adamgreig:matrix.org> mmm, so basically all the 32-bit DSP instructions work OK, but it won't generate 2x16 or 4x8 vector instructions by itself?
<re_irc>
<@adamgreig:matrix.org> but since the packed types exist in std, in theory one could write a library using asm that did the right thing?
<re_irc>
<@therealprof:matrix.org> Which also means that they're often not used simply because there might be some extra effort involved to e.g. load a u8 into a register (with zero extension), do the operation, extract the value of the register and put it somewhere.
<re_irc>
<@adamgreig:matrix.org> in principle I wonder if the relevant inline asm could be added to std instead of llvm
<re_irc>
<@therealprof:matrix.org> adamgreig: I think the IR intrinsics do support all of those architecture-specific instructions already.
<re_irc>
<@therealprof:matrix.org> adamgreig: It could be but the Rust compiler team is not very keen on doing so.
<re_irc>
<@therealprof:matrix.org> Oh wait.
<re_irc>
<@therealprof:matrix.org> Inline ASM?
<re_irc>
<@adamgreig:matrix.org> I think the packed_simd crate is to be superseded by portable_simd which used to be called stdsimd? and portable_simd will be std::simd eventually?
<re_irc>
<@adamgreig:matrix.org> therealprof:matrix.org: since the packed types exist, you could write a library of methods that took packed types and used inline asm to call the vector instructions
<re_irc>
<@adamgreig:matrix.org> it wouldn't let you use like the `+` and `-` operators and it wouldn't be automatically inferred from a for loop etc
<re_irc>
<@therealprof:matrix.org> Yes, you could. But that would basically mean canned operations which are completely opaque to the compiler.
<re_irc>
<@therealprof:matrix.org> I don't see how you could do that but still get most of the relevant compiler optimisations.
<re_irc>
<@adamgreig:matrix.org> it looks like the types in portable_simd should already lower to the right llvm intrinsics, actually
<re_irc>
<@adamgreig:matrix.org> so you still need to be using a packed type and explicitly writing SIMD code, not letting the compiler infer it from a straight loop
<re_irc>
<@therealprof:matrix.org> ARM specific ones or generic LLVM ones?
<re_irc>
<@adamgreig:matrix.org> ah, I guess generic LLVM intrinsics
<re_irc>
<@adamgreig:matrix.org> right, I see, and LLVM won't turn a "add two i16x2" into the arm vector op?
<re_irc>
<@therealprof:matrix.org> There are some exceptions, like saturated operations, where the compiler often deems the costs low enough to be beneficial.
<re_irc>
<@adamgreig:matrix.org> though it uses two qadd16....
<re_irc>
<@therealprof:matrix.org> Or 4 qadd8, exactly.
<re_irc>
<@adamgreig:matrix.org> it's doing one qadd16 per 16-bit addition and then gluing them together, I see
<re_irc>
<@adamgreig:matrix.org> just using the qadd16 to get the saturating effect
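The saturating behaviour in question can be sketched portably in plain Rust. This only models QADD16's lane semantics (lane-wise saturating addition of two i16 values packed into one 32-bit register); it is not the instruction itself:

```rust
// Portable model of ARM QADD16: saturating add of the two i16 lanes
// packed into each u32 operand.
fn qadd16(x: u32, y: u32) -> u32 {
    let lo = (x as i16).saturating_add(y as i16);
    let hi = ((x >> 16) as i16).saturating_add((y >> 16) as i16);
    ((hi as u16 as u32) << 16) | (lo as u16 as u32)
}

fn main() {
    // Low lane: 0x7FFF + 1 saturates to 0x7FFF. High lane: 1 + 2 = 3.
    assert_eq!(qadd16(0x0001_7FFF, 0x0002_0001), 0x0003_7FFF);
    println!("ok");
}
```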
<re_irc>
<@adamgreig:matrix.org> the IR seems OK, it calls LLVM sat.v2i16
<re_irc>
<@therealprof:matrix.org> Not quite sure how the lowering works; I think it runs multiple passes, then calculates the resulting cost of the selected instructions, and then picks the cheapest sequence.
<re_irc>
<@therealprof:matrix.org> Yes, but the lowering only supports sat.v1i16, so LLVM splits the sat.v2i16 into 2 sat.v1i16.
<re_irc>
<@therealprof:matrix.org> (on that architecture that is, I bet on ARMv8 with NEON and the newer SIMD instruction sets they do have optimized lowering for sat.v2i16)
<re_irc>
<@therealprof:matrix.org> You don't actually need to use SIMD types, there's an autovectorizer which will take an array and automatically slice it into those vector types before it's passed on to the instruction selection.
<re_irc>
<@therealprof:matrix.org> Of course using proper types will help the compiler to see what's what and unlock much better code generation.
<re_irc>
<@adamgreig:matrix.org> so yea, the portable_simd lowers to LLVM types that don't help, but core::arch contains a different i16x2 type that you can pass to the intrinsics it provides that call the underlying instructions
<re_irc>
<@adamgreig:matrix.org> dunno why LLVM left in those two loads in add162 though, lol
<re_irc>
<@adamgreig:matrix.org> `ldr r2, [r2]; ldr r1, [r1];` right sure OK
<re_irc>
<@therealprof:matrix.org> Yes, that's calling the ARM specific intrinsics directly. That's bypassing the isel...
<re_irc>
<@adamgreig:matrix.org> was your PR to add support for eg sat.v2i16 to thumbv7?
<re_irc>
<@therealprof:matrix.org> My latest PRs were to add more scaffolding to make that happen so I wouldn't have to do it blindly, i.e. adding test cases to test the lowering of such generic intrinsics to various architectures.
<re_irc>
<@therealprof:matrix.org> Those isel sequences are not trivial and affect different sub-architectures, so you want to make sure you have proper tests in place to detect any code misgeneration...
<re_irc>
<@therealprof:matrix.org> e.g. before you can fire off an instruction you need to make sure that the data is properly loaded into the registers and the results can be stored.
<re_irc>
<@therealprof:matrix.org> So a handler for a sat.v2i16 needs to trigger a number of insn handlers (to be written), generate the qadd16 instruction, and then fire off another set of handlers (to be written) to store the result.
<re_irc>
<@therealprof:matrix.org> Once you install those insn handlers you might see funny changes all over the map because the compiler sees "oh, we can move data in this way -- let's try that!".
<re_irc>
<@adamgreig:matrix.org> compilers huh
<re_irc>
<@therealprof:matrix.org> You know, things like: instead of splitting one load into multiple partials, it could decide to do one load and splat the other elements to other registers using other instructions or even find a way to reuse the data from the same register directly.
<re_irc>
<@therealprof:matrix.org> I think there's a lot of low hanging fruit there but again, most of the people seem to be paid to only care about the latest and greatest (even though a lot of that work could also be of benefit to architectures higher up the food chain).
<re_irc>
<@adamgreig:matrix.org> yea, I see. that's a shame... it sure would be nice.
<re_irc>
<@therealprof:matrix.org> Time is still rather limited unfortunately but I hope that if and when things get a bit more normal again I'll have time to check out the new dev process of the LLVM project on GH and explore proper DSP instruction support a bit more...