#rust-embedded on 2021-07-11 — irc logs at libera.irclog.whitequark.org

00:00 neceve has quit [Ping timeout: 255 seconds]

00:07 fabic has joined #rust-embedded

04:02 x56 has quit [Quit: Ծ-Ծ]

04:13 x56 has joined #rust-embedded

05:00 inara has quit [*.net *.split]

05:00 sauce has quit [*.net *.split]

05:00 sauce has joined #rust-embedded

05:03 inara has joined #rust-embedded

05:59 fabic has quit [Ping timeout: 246 seconds]

06:15 fabic has joined #rust-embedded

06:50 loki_val is now known as crabbedhaloablut

06:52 fabic has quit [Ping timeout: 268 seconds]

06:59 GenTooMan has quit [Ping timeout: 252 seconds]

07:05 GenTooMan has joined #rust-embedded

08:38 fabic has joined #rust-embedded

09:33 fabic has quit [Ping timeout: 246 seconds]

09:48 neceve has joined #rust-embedded

10:56 rjframe has joined #rust-embedded

11:47 fabic has joined #rust-embedded

12:37 tokomak has joined #rust-embedded

12:46 tafa has joined #rust-embedded

13:45 GenTooMan has quit [Ping timeout: 252 seconds]

13:48 rjframe has quit [Ping timeout: 272 seconds]

13:50 GenTooMan has joined #rust-embedded

15:19 GenTooMan has quit [Ping timeout: 240 seconds]

15:29 GenTooMan has joined #rust-embedded

15:33 neceve has quit [Ping timeout: 255 seconds]

15:44 rjframe has joined #rust-embedded

16:07 fabic has quit [Ping timeout: 268 seconds]

16:17 <re_irc> <@therealprof:matrix.org> https://reviews.llvm.org/rG98c2e4115d8d has landed. 🎉

16:36 <re_irc> <@yatekii:matrix.org> pardon my ignorance, what is lowering and what does this mean? :) can I read on it somewhere?

16:38 <re_irc> <@therealprof:matrix.org> yatekii: It means assembler code generation from an intermediate compiler code representation generated in this case by the Rust compiler.

16:41 <re_irc> <@therealprof:matrix.org> In this particular case we found out that for saturated additions of 8/16bit signed integers, the assembly generated by LLVM for the Rust code would turn into a specialized DSP instruction on supported MCUs/CPUs but LLVM wouldn't do the same for unsigned integers which is what this PR adds.
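
For reference, the scalar operations being discussed look roughly like this on the Rust side. This is only a sketch: the wrapper names (`sat_add_i16`, `sat_add_u16`, `sat_add_u8`) are made up for illustration, and which machine instruction each one becomes depends on the target and the LLVM version in use.

```rust
/// Signed 16-bit saturating add: clamps to i16::MIN..=i16::MAX instead of wrapping.
pub fn sat_add_i16(a: i16, b: i16) -> i16 {
    a.saturating_add(b)
}

/// Unsigned 16-bit saturating add: clamps at u16::MAX instead of wrapping.
pub fn sat_add_u16(a: u16, b: u16) -> u16 {
    a.saturating_add(b)
}

/// Unsigned 8-bit saturating add: clamps at u8::MAX instead of wrapping.
pub fn sat_add_u8(a: u8, b: u8) -> u8 {
    a.saturating_add(b)
}
```

The standard-library `saturating_add` methods are the only real API used here; the wrappers exist only so each case can be inspected separately in the generated assembly.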

16:41 <re_irc> <@yatekii:matrix.org> ah nice =)

16:41 <re_irc> <@yatekii:matrix.org> thanks! for implementing and explaining :D

16:41 <re_irc> <@adamgreig:matrix.org> are there any non-vector instructions left that won't be lowered?

16:41 <re_irc> <@adamgreig:matrix.org> i still dunno what to do about the vector stuff now

16:42 <re_irc> <@adamgreig:matrix.org> was very up for doing it in cortex-m, the intrinsics are easy and it seems like a packed struct of I8x4 or I16x4 or U8x4 or U16x4 would be easy and would work, but

16:42 <re_irc> <@adamgreig:matrix.org> it seems like maybe it's going to happen in stdsimd?

16:45 <re_irc> <@adamgreig:matrix.org> or, maybe, stdarch, not sure why the arm dsp simd stuff is in stdarch not stdsimd

16:46 <re_irc> <@therealprof:matrix.org> adamgreig: non-vector?

16:47 <re_irc> <@adamgreig:matrix.org> I worded that poorly, I guess really I just wonder if this completes all the obvious lowerings llvm should have for v7e

16:48 <re_irc> <@adamgreig:matrix.org> I'm assuming it won't easily infer vector/simd stuff, but of course it does still use the vector instructions when operating on scalar but smaller-than-word datatypes (like qadd16 or whatever)

16:50 <re_irc> <@therealprof:matrix.org> Sorry busy with family activities... Can reply soon with more details.

16:51 <re_irc> <@adamgreig:matrix.org> no rush.

17:00 neceve has joined #rust-embedded

17:19 fabic has joined #rust-embedded

17:29 <re_irc> <@adamgreig:matrix.org> therealprof: aaah, the new intrinsics in stdarch _are_ in core, but not in stable core because they haven't made it through the pipeline yet, but if you look at the nightly docs they're present: https://doc.rust-lang.org/nightly/core/arch/arm/index.html

17:29 <re_irc> <@adamgreig:matrix.org> however, dunno how long it will be before they stabilise, which is a shame: https://doc.rust-lang.org/nightly/core/arch/arm/fn.__qadd16.html

17:29 <re_irc> <@adamgreig:matrix.org> so maybe it would still be worth putting in cortex-m, idk

17:34 <re_irc> <@therealprof:matrix.org> Intrinsics are rather boring IMHO.

17:35 <re_irc> <@therealprof:matrix.org> The state of DSP extensions in LLVM is rather desolate.

17:35 <re_irc> <@therealprof:matrix.org> They're modelled as instructions and there's a mapping from the intrinsics but then you're confined to the special types.

17:36 <re_irc> <@adamgreig:matrix.org> at least working and stable and safe intrinsics would let people write efficient platform-specific code and know it will run using vector ops

17:37 <re_irc> <@adamgreig:matrix.org> right now if i wanna write some dsp code on cortex-m i basically have to use unstable asm and do it all by hand, or implement all this stuff i've been looking at in cortex-m, or I guess use the nightly-only core::arch::arm intrinsics now they're in nightly but with no immediate path to stabilisation

17:37 <re_irc> <@adamgreig:matrix.org> so... actually, lots of options

17:37 <re_irc> <@therealprof:matrix.org> There's a special lowering pass for scalars if DSP instructions are present but they don't do SIMD, only single values (which is why we're seeing the instructions multiple times when doing operations on multiple values).

17:37 <re_irc> <@adamgreig:matrix.org> yea, that makes sense and is better than not using the instructions for sure

17:37 <re_irc> <@adamgreig:matrix.org> but it would be nice if it could detect the potential for autovectorisation

17:37 <re_irc> <@therealprof:matrix.org> Yeah. The thing is LLVM handles vector types internally.

17:37 <re_irc> <@adamgreig:matrix.org> the problem is it's probably worse in all those things we tried anyway: two i16 arguments will be passed into a function using two registers because of the ABI, right?

17:38 <re_irc> <@adamgreig:matrix.org> so it would be more instructions to pack them, run the qadd16, then unpack them for output

17:38 <re_irc> <@therealprof:matrix.org> There's also an autovectorizer pass, but since the DSP "vectors" are not modelled, they can't be used at all.

17:39 <re_irc> <@therealprof:matrix.org> adamgreig: Well, there's the experimental simd types in Rust which would allow you to actually declare vectors instead of scalars.

17:40 <re_irc> <@adamgreig:matrix.org> yea, that's what I mean, though they're morally equivalent to `#[repr(packed)] struct I16x2(pub i16, pub i16)`
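
A portable sketch of what such a two-lane type could look like, with a lane-wise saturating add that mirrors what a packed 16-bit saturating add computes. Assumptions: the `I16x2` type below is hypothetical, it stores both lanes in one `u32` (low lane in the low half) rather than using `#[repr(packed)]`, and it is plain Rust, not the `core::arch::arm` type or intrinsic.

```rust
/// Hypothetical two-lane i16 vector stored in a single 32-bit word.
/// Lane 0 lives in bits 0..16, lane 1 in bits 16..32.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(transparent)]
pub struct I16x2(pub u32);

impl I16x2 {
    /// Pack two lanes into one word.
    pub fn new(lo: i16, hi: i16) -> Self {
        I16x2((lo as u16 as u32) | ((hi as u16 as u32) << 16))
    }

    /// Unpack the word back into its two lanes.
    pub fn lanes(self) -> (i16, i16) {
        (self.0 as u16 as i16, (self.0 >> 16) as u16 as i16)
    }

    /// Lane-wise saturating add, written as two scalar operations.
    pub fn saturating_add(self, rhs: Self) -> Self {
        let (a0, a1) = self.lanes();
        let (b0, b1) = rhs.lanes();
        I16x2::new(a0.saturating_add(b0), a1.saturating_add(b1))
    }
}
```

Adding `I16x2::new(1, 2)` to itself gives lanes `(2, 4)`, which packs to the word `0x0004_0002`.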

17:40 <re_irc> <@therealprof:matrix.org> They don't work that well at the moment but even if they would that wouldn't help at all since the lack of DSP support means that LLVM will turn them into scalars automatically before lowering.

17:41 <re_irc> <@adamgreig:matrix.org> except, I don't know how you even construct them: https://doc.rust-lang.org/nightly/core/arch/arm/dsp/struct.int16x2_t.html

17:41 <re_irc> <@adamgreig:matrix.org> therealprof: but you use them with the intrinsics that are also part of experimental simd

17:41 <re_irc> <@adamgreig:matrix.org> they don't even implement + and * and so on afaict

17:42 <re_irc> <@adamgreig:matrix.org> unless we're talking about different things here: I'm looking at the core::arch::arm::dsp::int16x2, but maybe there's a generic high-level int16x2 as well that's meant to be portable?

17:43 <re_irc> <@therealprof:matrix.org> adamgreig: I think that's the broader idea, yes.

17:43 <re_irc> <@adamgreig:matrix.org> unfortunately they don't do any 32-bit types

17:43 <re_irc> <@adamgreig:matrix.org> https://rust-lang.github.io/stdsimd/core_simd/

17:43 <re_irc> <@therealprof:matrix.org> LLVM does support that already.

17:43 <re_irc> <@adamgreig:matrix.org> no 16x2 or 8x4

17:43 <re_irc> <@therealprof:matrix.org> adamgreig: Same as LLVM. 🙄

17:44 <re_irc> <@adamgreig:matrix.org> I see 🙃 though i've noticed their SimdU16 type is const-generic over the number of lanes so you could have SimdU16<2> I suppose

17:45 <re_irc> <@adamgreig:matrix.org> indeed they have `pub type i16x4 = SimdI16<4>;` but not `pub type i16x2 = SimdI16<2>;` so it would be super easy to add if nothing else

17:45 <re_irc> <@therealprof:matrix.org> In LLVM the types are fully generic already. So you can call e.g. `<2 x i16> @llvm.sadd.sat.v2i16(<2 x i16>, <2 x i16>)`

17:46 <re_irc> <@adamgreig:matrix.org> does that lower to the right thing for us?

17:47 <re_irc> <@therealprof:matrix.org> It lowers to correct code, but since v2i16 is not a natively supported type it can't lower to a single add instruction.

17:47 <re_irc> <@therealprof:matrix.org> Currently it would lower to:

17:47 <re_irc> <@therealprof:matrix.org> +; CHECK-ARMBASEDSP: @ %bb.0:

17:47 <re_irc> ... long message truncated: https://psion.agg.io/_matrix/media/r0/download/psion.agg.io/f694182e3fb3b54e911f39db17b4657f07c0fc51f7b43a283571d88f4e054a24 (12 lines)

17:47 <re_irc> <@therealprof:matrix.org> +; CHECK-ARMBASEDSP-LABEL: funcv2i16:

17:47 <re_irc> <@therealprof:matrix.org> +; CHECK-ARMBASEDSP-NEXT: lsl r2, r2, #16

17:48 <re_irc> <@adamgreig:matrix.org> aah, I see

17:48 <re_irc> <@adamgreig:matrix.org> so even if stdsimd added u16x2 and u8x4 and so forth for us, which would implement Add and Mul and all that, LLVM wouldn't lower it to the right thing anyway

17:48 <re_irc> <@therealprof:matrix.org> Note the additional shiftery just to separate and recombine the values. 🙄

17:48 <re_irc> <@adamgreig:matrix.org> but ultimately that would be a very nice way to be able to write SIMD stuff for v7e?

17:49 <re_irc> <@therealprof:matrix.org> No, for any architecture.

17:49 <re_irc> <@therealprof:matrix.org> 😉

17:49 <re_irc> <@adamgreig:matrix.org> sorry, yes, for any arch, _including_ v7e

17:49 <re_irc> <@adamgreig:matrix.org> whereas currently stdsimd doesn't work at all for v7e, both because it doesn't define any 32-bit SIMD types and also because even if it did, LLVM wouldn't lower them to the SIMD instructions

17:50 <re_irc> <@therealprof:matrix.org> It would use the available capabilities. You could also use a `v4i16` which might use a single instruction on NEON and 2 instructions on DSP automatically.

17:50 <re_irc> <@adamgreig:matrix.org> it would be cool to be able to test your SIMD code on x86 and then run it on v7e and know it was doing the right thing both times

17:50 <re_irc> <@adamgreig:matrix.org> yea, I see

17:50 <re_irc> <@therealprof:matrix.org> I think the NEON code would already do the right thing but I haven't tested specifically. 😉

17:50 <re_irc> <@adamgreig:matrix.org> nicer than the stdarch stuff where we'd get arm-specific intrinsics and packed types eventually, but the types are opaque and can only be used with those intrinsics

17:51 rjframe has quit [Remote host closed the connection]

17:51 rjframe has joined #rust-embedded

17:51 <re_irc> <@adamgreig:matrix.org> so maybe worth asking stdsimd people to add the 32-bit types anyway, since at least the llvm codegen would work even if it's not actually simd yet?

17:51 <re_irc> <@therealprof:matrix.org> It's actually brilliant because you can define vectors of arbitrary size and the lowering would automatically use all available resources and registers to the fullest.

17:52 <re_irc> <@therealprof:matrix.org> adamgreig: The only problem is: Currently the code would be lowered to significantly slower assembly than not using them.

17:53 <re_irc> <@adamgreig:matrix.org> compared to passing around each i16 as a u32, sure

17:53 <re_irc> <@adamgreig:matrix.org> but identical to passing them around packed, right?

17:53 <re_irc> <@adamgreig:matrix.org> not sure how a buffer of i16 would be treated either...

17:53 <re_irc> <@adamgreig:matrix.org> but nevertheless we'd need it in stdsimd eventually and in theory there's no reason for them to not include the typedef?

17:54 <re_irc> <@therealprof:matrix.org> adamgreig: Yes, no harm in that.

17:57 <re_irc> <@adamgreig:matrix.org> heh, talking of cross, guess what's in stdsimd's CI

17:58 <re_irc> <@therealprof:matrix.org> One cool thing about the Rust representation is that Rust is free to model structs however it desires. So in theory Rust could detect that you're trying to model coordinates or color values or whatever and automatically choose a vector format and pass the data around like that.

18:06 <re_irc> <@therealprof:matrix.org> I'm having a hard time making sure the ARM instruction build code understands the concept of a v2i16 and v4i8 value though...

18:06 <re_irc> <@therealprof:matrix.org> llc: /opt/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:354: llvm::SDValue getCopyFromPartsVector(llvm::SelectionDAG&, const llvm::SDLoc&, const llvm::SDValue*, unsigned int, llvm::MVT, llvm::EVT, const llvm::Value*, llvm::Optional<unsigned int>): Assertion `NumRegs == NumParts && "Part...

18:06 <re_irc> ... count doesn't match vector breakdown!"' failed.

18:06 <re_irc> <@therealprof:matrix.org> is where I'm currently stuck.

18:07 rjframe has quit [Ping timeout: 258 seconds]

18:07 <re_irc> <@therealprof:matrix.org> I think I do have the lowering itself pretty much working already.

18:08 <re_irc> <@therealprof:matrix.org> I think the DSP instructions are pretty much all there so it's really just a matter of instruction selection (i.e. what goes in) and lowering (what comes out).

18:12 <re_irc> <@therealprof:matrix.org> If instruction selection doesn't know about v2i16 it will break up `<2 x i16>` into `apply this insn 2 times on i16`. It might be possible to rummage through the DAG while lowering and recombine those but yuck...

18:14 <re_irc> <@therealprof:matrix.org> I think there's also a lot of low hanging fruit in lowering of scalar operations using DSP. Those sign-extend and truncation operations seem also quite superfluous to me and only seem to happen because it thinks the scalar operation is actually executed on the full register rather than a fraction with the rest being...

18:14 <re_irc> ... simply ignored.

18:16 <re_irc> <@therealprof:matrix.org> Some mappings from IR calls to DSP instructions are also very rudimentary and could use a little love.

18:22 <re_irc> <@adamgreig:matrix.org> well, I opened https://github.com/rust-lang/stdsimd/pull/145 for stdsimd at least, the easy part :P

18:23 <re_irc> <@adamgreig:matrix.org> given the stuff you're looking at in llvm now would benefit all the 32-bit armv6/v7/v8 applications processors as well as v7e-m and v8-m it sounds like it would have a pretty widespread positive impact

18:25 <re_irc> <@therealprof:matrix.org> Yeah, let's see whether I can convince them that we can do better than "hey, we could use some of those fancy instructions for some scalar operations". 😉

18:27 <re_irc> <@therealprof:matrix.org> I think I'll add lowering tests for those types to all relevant test cases. Maybe that'll do the trick as an eye opener.

18:30 fabic has quit [Ping timeout: 268 seconds]

18:31 <re_irc> <@adamgreig:matrix.org> calling core::arch::arm::__qadd16(a: i16x2, b: i16x2) does lower to qadd16 at least

18:31 <re_irc> <@adamgreig:matrix.org> but it lowers to an FFI call to `llvm.arm.qadd16` so you'd hope so, lol

18:32 <re_irc> <@adamgreig:matrix.org> but you have to core::mem::transmute to get in/out of a core::arch::arm::i16x2 type as far as I can tell

18:42 <re_irc> <@therealprof:matrix.org> Yeah, the intrinsics work but are iffy to use since you have to come up with the correct types.

18:44 <re_irc> <@adamgreig:matrix.org> yea

18:46 <re_irc> <@therealprof:matrix.org> I think having those types would go a long way already since one could also impl the standard core math functions for them and they would already turn into useful code for all architectures.

18:47 <re_irc> <@adamgreig:matrix.org> which types exactly, in stdsimd?

18:47 <re_irc> <@therealprof:matrix.org> Yes.

18:47 <re_irc> <@adamgreig:matrix.org> they should already impl all the standard core math functions

18:47 <re_irc> <@adamgreig:matrix.org> https://rust-lang.github.io/stdsimd/core_simd/struct.SimdI16.html

18:48 <re_irc> <@adamgreig:matrix.org> so yes, it would be immediately convenient

18:48 <re_irc> <@therealprof:matrix.org> So `<i16x2>.saturating_add(<i16x2>)` should work already? 🤓

18:48 <re_irc> <@adamgreig:matrix.org> (though i don't know when stdsimd will be stabilised)

18:48 <re_irc> <@adamgreig:matrix.org> not sure - i don't think it would call the intrinsic would it? so it might not lower correctly in llvm?

18:49 <re_irc> <@adamgreig:matrix.org> can't test it because I am getting ICEs when i try to use stdsimd crate, heh

18:49 <re_irc> <@therealprof:matrix.org> I had the same. 😉

18:50 <re_irc> <@adamgreig:matrix.org> actually I can probably write a tiny executable locally instead, hold on a sec

18:52 <re_irc> <@therealprof:matrix.org> Would be interesting to see the generated IR.

18:52 <re_irc> <@therealprof:matrix.org> For scalars `saturating_add` turns into something like: ` %2 = tail call i16 @llvm.sadd.sat.i16(i16 %.sroa.0.0.extract.trunc, i16 %.sroa.03.0.extract.trunc) #3`

19:00 <re_irc> <@adamgreig:matrix.org> `pub fn qadd16(a: i16x2, b: i16x2) -> i16x2 { a.saturating_add(b) }` turns into:

19:00 <re_irc> <@adamgreig:matrix.org> define dso_local void @_ZN4qadd6qadd1617hf57c1092dc925120E(<2 x i16>* noalias nocapture sret(<2 x i16>) dereferenceable(4) %0, <2 x i16>* noalias nocapture readonly dereferenceable(4) %a, <2 x i16>* noalias nocapture readonly dereferenceable(4) %b) unnamed_addr #0 {

19:00 <re_irc> <@adamgreig:matrix.org> start:

19:00 <re_irc> <@adamgreig:matrix.org> %_3 = load <2 x i16>, <2 x i16>* %a, align 4

19:00 <re_irc> ... long message truncated: https://psion.agg.io/_matrix/media/r0/download/psion.agg.io/e36beec9c64597c6ee19eeef0456407d9e0566780d284ad35c11dbdeb59bfec8 (9 lines)

19:01 tokomak has quit [Ping timeout: 240 seconds]

19:02 <re_irc> <@therealprof:matrix.org> Sweet!

19:12 <re_irc> <@adamgreig:matrix.org> uh, no idea at all what's going on with the disassembly for that though

19:13 <re_irc> <@adamgreig:matrix.org> something seems badly wrong because `cargo objdump` is saying that function is `movs r1, #2; movt r1, #4; str r1, [r0]; bx lr`

19:14 <re_irc> <@therealprof:matrix.org> Hm.

19:15 <re_irc> <@adamgreig:matrix.org> meanwhile binaryninja thinks it's defined twice, with the first definition being `vaddw.s8 q9, q0, d2; andvs r0, r1, r4, lsl #2; strlt r4, [r0, #0x770];` and the second being `movs r1, #2; movt r1, #4; str r1, [r0]; bx lr`

19:15 <re_irc> <@adamgreig:matrix.org> that first disassembly obviously makes no sense

19:15 <re_irc> <@therealprof:matrix.org> 😀

19:15 <re_irc> <@adamgreig:matrix.org> https://psion.agg.io/_matrix/media/r0/download/matrix.org/iglrjNNBsHZxAziqNajtxuro

19:16 <re_irc> <@adamgreig:matrix.org> so.. no idea

19:17 <re_irc> <@adamgreig:matrix.org> (clearly it's disassembling the same bytes into different instructions in both cases, and the vaddw is for non-M-class, so maybe it's showing me two interpretations? dunno what the ELF says)

19:17 <re_irc> <@thalesfragoso:matrix.org> are you calling with constants ? maybe try `#[inline(never)]`

19:17 <re_irc> <@adamgreig:matrix.org> oh interesting, I did inline(never) but I am also calling with constants

19:17 <re_irc> <@adamgreig:matrix.org> time to make up some i16s I guess

19:18 <re_irc> <@thalesfragoso:matrix.org> weird that it would still do that with inline never, does the result make sense ?

19:18 <re_irc> <@adamgreig:matrix.org> (ha ha, yes, well spotted, I was indeed calling it with [1, 2] twice, and so it returns [2, 4] correctly)

19:18 <re_irc> <@adamgreig:matrix.org> llvm outsmarts me again

19:18 <re_irc> <@thalesfragoso:matrix.org> even with inline never, crazy

19:19 <re_irc> <@therealprof:matrix.org> It's weird that it doesn't show up in the IR?

19:20 <re_irc> <@adamgreig:matrix.org> I generated the IR without it having to be inside a binary crate

19:20 <re_irc> <@adamgreig:matrix.org> ok, got it now, it generates `ldrsh.w, qadd16, strh, qadd16, strh, bx lr`

19:20 <re_irc> <@therealprof:matrix.org> Ah.

19:20 <re_irc> <@therealprof:matrix.org> Yes, that's expected.

19:20 <re_irc> <@adamgreig:matrix.org> so yea, should give the correct result but is suboptimal execution

19:21 <re_irc> <@therealprof:matrix.org> Well, that's far from the worst code that could be generated.

19:21 <re_irc> <@adamgreig:matrix.org> yea

19:22 <re_irc> <@therealprof:matrix.org> Basically it's splitting the vector operation into two scalar operations and then the lowering pass picks it up and turns it into a scalar DSP operation.

19:23 <re_irc> <@therealprof:matrix.org> If you drop down to thumbv7m it should get way worse.

19:23 <re_irc> <@adamgreig:matrix.org> if I let it inline, starting with r0 and r1 loaded, I get `lsrs r2, r1, #16; lsrs r3, r0, #16; qadd16 r0, r1, r0; qadd16 r2, r2, r3; uxth r0, r0; uxth r1, r2;`

19:23 <re_irc> <@adamgreig:matrix.org> checking v7 but yea i'm sure it would be

19:25 <re_irc> <@therealprof:matrix.org> You could also try a regular add which should generate way worse code for `u16x2` than two `u16`.

19:26 <re_irc> <@adamgreig:matrix.org> indeed, it's all over the place, total mess on v7 non-e

19:26 <re_irc> <@adamgreig:matrix.org> I won't paste it but it's like 30 instructions

19:28 <re_irc> <@therealprof:matrix.org> That's another thing which is missing: Good scalarising when no vector instructions are available. At the moment LLVM kind of assumes that it will always end up in vectors and thus be beneficial to keep in that form rather than deconstruct it.

19:29 <re_irc> <@therealprof:matrix.org> So if we were to add some kind of vectorisation it would probably also take the architecture into account.

19:42 <re_irc> <@dirbaio:matrix.org> fun maths question: I got a SPI peripheral that has 2 8-bit dividers, so that `spi_clk = sys_clk / div1 / div2`, where `divX` is in `1..=255`

19:43 <re_irc> <@dirbaio:matrix.org> so seems like I can first calculate `ratio = sys_clk / spi_clk`, then find `div1, div2` such that `div1*div2 = ratio`

19:43 <re_irc> <@dirbaio:matrix.org> but I can't think of an easy way to do that :S

19:46 <re_irc> <@dirbaio:matrix.org> for example if `ratio=10_000` then exact solutions are `100*100, 125*80, 200*50, 250*40`

19:47 <re_irc> <@dirbaio:matrix.org> is there some simple and fast algo to do this? it smells like prime decomposition and stuff :S

19:47 <re_irc> <@dirbaio:matrix.org> all I can think is trial and error, though that's 256 iterations..

19:50 <re_irc> <@adamgreig:matrix.org> Integer factorisation is what you're looking for

19:50 <re_irc> <@dirbaio:matrix.org> yeah to enumerate the divisors and shit

19:50 <re_irc> <@newam:matrix.org> Wouldn't some sort of a binary search work there? Could still do a trial-and-error but probably cut down on the execution time.

19:50 <re_irc> <@dirbaio:matrix.org> but that might be slower than just trying the 256 possible div1's

19:50 <re_irc> <@adamgreig:matrix.org> Unfortunately integer factorisation is one of the hard problems where we hope it doesn't work in polynomial time, lol

19:51 <re_irc> <@dirbaio:matrix.org> I mean factorization will probably be slower, even if I end up testing less divisors the "constant" cost of testing one will be higher

19:51 <re_irc> <@dirbaio:matrix.org> this is for the rpi pico

19:51 <re_irc> <@dirbaio:matrix.org> so cortex m0, no divide 🤣

19:52 <re_irc> <@adamgreig:matrix.org> Slower than what?

19:54 <re_irc> <@dirbaio:matrix.org> something like

19:54 <re_irc> <@dirbaio:matrix.org> ```rust

19:54 <re_irc> ... long message truncated: https://psion.agg.io/_matrix/media/r0/download/psion.agg.io/afc9482516c2c698b6cfb8a7bd7474b4bb01a9b4f835ea626f8ad0e844457340 (7 lines)

19:54 <re_irc> <@dirbaio:matrix.org> let div2 = ratio / div1;

19:54 <re_irc> <@dirbaio:matrix.org> for div1 in 1..=255 {

19:56 <re_irc> <@dirbaio:matrix.org> I think it can be rewritten without divides or multiplies 🤔

19:56 <re_irc> <@dirbaio:matrix.org> so just 255 iterations, with no expensive math

19:57 <re_irc> <@dirbaio:matrix.org> but.. can it be done faster? :D

19:58 <re_irc> <@dirbaio:matrix.org> is it even worth it to try to match it exactly? 🤣

19:58 <re_irc> <@dirbaio:matrix.org> or maybe the API should have the user pass in the dividers instead of the desired clk?

19:58 <re_irc> <@dirbaio:matrix.org> most hals seem to have users pass in desired clks and do the calculations internally

19:59 <re_irc> <@adamgreig:matrix.org> Heh, passing dividers or target freq is certainly a divisive question in hal design

19:59 <re_irc> <@dirbaio:matrix.org> target freq seems much friendlier indeed..

20:00 <re_irc> <@adamgreig:matrix.org> I don't know that there is a more efficient way to factor integers, it's quite intensively studied

20:01 <re_irc> <@adamgreig:matrix.org> I guess it's different if your objective is to minimise frequency error rather than strictly find factors though

20:01 <re_irc> <@dirbaio:matrix.org> yea.. it might be impossible to match target exactly

20:01 <re_irc> <@dirbaio:matrix.org> in that case I just want closest

20:01 <re_irc> <@adamgreig:matrix.org> Closest or less-than?

20:02 <re_irc> <@dirbaio:matrix.org> maybe "closest greatest"? so the SPI never runs faster than requested freq?

20:02 <re_irc> <@adamgreig:matrix.org> Yea

20:03 <re_irc> <@dirbaio:matrix.org> so, find `div1, div2` such that `div1*div2 >= ratio` and `div1*div2` is closest to `ratio`
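
One way this search could be sketched without any multiply or divide inside the loop. This is not the code from the playground link later in the log; the function name `find_dividers` is hypothetical, and it uses the simplified `1..=255` range for both dividers. It relies on the fact that the smallest usable `div2` can only shrink as `div1` grows, so `div2` is walked downwards once while `div1` walks upwards, keeping the product updated by addition and subtraction.

```rust
/// Find (div1, div2), both in 1..=255, with the smallest product
/// that is still >= `ratio`. Returns None if even 255 * 255 is too small.
pub fn find_dividers(ratio: u32) -> Option<(u32, u32)> {
    const MAX: u32 = 255;
    // (product, div1, div2) of the best candidate so far.
    let mut best: Option<(u32, u32, u32)> = None;
    let mut div2 = MAX;
    // Invariant at the top of each iteration: product == div1 * div2.
    let mut product = MAX;
    for div1 in 1..=MAX {
        // Shrink div2 while the product still covers the requested ratio.
        while div2 > 1 && product - div1 >= ratio {
            div2 -= 1;
            product -= div1;
        }
        if product >= ratio && best.map_or(true, |(p, _, _)| product < p) {
            best = Some((product, div1, div2));
        }
        // Move to the next div1: (div1 + 1) * div2 == product + div2.
        product += div2;
    }
    best.map(|(_, d1, d2)| (d1, d2))
}
```

For `ratio = 10_000` this returns `(40, 250)`, one of the exact factorisations; for a ratio with no exact match, such as 257, it returns a pair whose product is the next reachable value above it (258). Because the product never falls below `ratio`, the resulting SPI clock never exceeds the requested one.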

20:04 <re_irc> <@dirbaio:matrix.org> this is stupid, why didn't they do a single 16bit divider

20:04 <re_irc> <@dirbaio:matrix.org> a 16bit counter is the same amount of gates as 2x 8bit counters 😠

20:05 <re_irc> <@dirbaio:matrix.org> everything about the pico's spi peripheral is a big bag of WTF

20:13 <re_irc> <@adamgreig:matrix.org> 16 bit counter is way more gates than two 8bit counters

20:14 <re_irc> <@dirbaio:matrix.org> wut, doesn't it scale linearly?

20:14 <re_irc> <@adamgreig:matrix.org> Each bit has to compute a function of all previous bits

20:15 <re_irc> <@dirbaio:matrix.org> in all cases, or with fancy carry lookahead?

20:16 <re_irc> <@adamgreig:matrix.org> In all cases where you have a synchronous counter

20:19 <re_irc> <@adamgreig:matrix.org> If you have fancy carry lookahead you can reduce some of the gates needed, but the propagation time increases, so you can't run as high a frequency

20:20 <re_irc> <@adamgreig:matrix.org> So either way your 16 bit counter takes more gates and/or runs slower than two 8 bit

20:20 <re_irc> <@adamgreig:matrix.org> (slower meaning slower maximum permissible clock)

20:20 <re_irc> <@dirbaio:matrix.org> huh I guess I have no idea about HDL then :D

20:22 <re_irc> <@thalesfragoso:matrix.org> But we talking about dividers, no ?

20:23 <re_irc> <@adamgreig:matrix.org> Generally you make dividers using counters

20:23 <re_irc> <@adamgreig:matrix.org> (not always, it does depend and there are other ways)

20:24 <re_irc> <@thalesfragoso:matrix.org> oh, yeah, right

20:46 <re_irc> <@dirbaio:matrix.org> LOL I think I'm overengineering this... but it's O(255) without any multiplies or divides in the loop!

20:46 <re_irc> <@dirbaio:matrix.org> https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=363d8a95db99cc9650a4a6c0f836f06d

21:01 tokomak has joined #rust-embedded

21:18 <re_irc> <@adamgreig:matrix.org> Btw can the divider not divide by 256?

21:19 <re_irc> <@adamgreig:matrix.org> Normally I've seen you write x-1 to divide by X, so range is 1 to 256

21:19 <re_irc> <@dirbaio:matrix.org> yeah I was simplifying

21:19 <re_irc> <@dirbaio:matrix.org> div1 must be 2..254 and even

21:19 <re_irc> <@dirbaio:matrix.org> div2 must be 1..256 (you write div2-1 to the reg)

21:20 <re_irc> <@adamgreig:matrix.org> And even, weird

21:21 <re_irc> <@adamgreig:matrix.org> Wonder if it's actually a 7 bit counter and a fixed /2 too or something

21:22 <re_irc> <@dirbaio:matrix.org> datasheet says "Clock prescale divisor. Must be an even number from 2-254, depending on the frequency of SSPCLK. The least significant bit always returns zero on reads."

21:22 <re_irc> <@dirbaio:matrix.org> so probably yes

21:22 <re_irc> <@dirbaio:matrix.org> and div1 doesn't do the `-1` thing but div2 does 🤷‍♂️

21:25 <re_irc> <@dirbaio:matrix.org> wohoo display at 64mhz

21:29 <re_irc> <@dirbaio:matrix.org> I just realized with all the sane SPI speeds you only need one divider really... epic fail :D

21:30 <re_irc> <@dirbaio:matrix.org> sysclk is 133mhz so with div1=2 and div2=256 you can get down to 259khz
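
The arithmetic can be checked with a small sketch. The helper names `spi_clk_hz` and `div2_for_target` are hypothetical, `div1` is fixed at 2 as in the message above, and `div2` is treated as the logical divider in `1..=256` (the value written to the register would be `div2 - 1`).

```rust
/// Resulting SPI clock for a given pair of dividers (integer Hz, rounded down).
pub fn spi_clk_hz(sys_clk_hz: u32, div1: u32, div2: u32) -> u32 {
    sys_clk_hz / (div1 * div2)
}

/// With div1 fixed at 2, pick the smallest div2 in 1..=256 such that the
/// SPI clock does not exceed `target_hz`. Returns None if the target is zero
/// or too slow to reach with a single divider.
pub fn div2_for_target(sys_clk_hz: u32, target_hz: u32) -> Option<u32> {
    if target_hz == 0 {
        return None;
    }
    let denom = 2 * target_hz as u64;
    // Ceiling division, done in u64 to avoid overflow.
    let div2 = (sys_clk_hz as u64 + denom - 1) / denom;
    if (1..=256).contains(&div2) {
        Some(div2 as u32)
    } else {
        None
    }
}
```

With a 133 MHz system clock, `div1 = 2` and `div2 = 256` give 133_000_000 / 512, which is 259_765 Hz when rounded down, matching the roughly 259 kHz quoted above.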

21:31 <re_irc> <@dirbaio:matrix.org> why would anyone want an SPI slower than that lol

21:52 mikehcox has joined #rust-embedded

22:00 <re_irc> <@firefrommoonlight:matrix.org> Have y'all used opamps on battery-powered devices before? How did you do it? Get one with an "enable" or "shutdown" pin?

22:01 <re_irc> <@firefrommoonlight:matrix.org> I'm troubleshooting battery life, and noticed a smoking gun in an IR camera of the op amps

22:03 <re_irc> <@firefrommoonlight:matrix.org> dirbaio: It's nice you have the 2 prescalers. For example, most STM32s (all?) use a single one with crude factors, so you generally will only get roughly in the desired range.

22:16 neceve has quit [Ping timeout: 255 seconds]

22:25 mikehcox has quit [Quit: Client closed]

22:26 <re_irc> <@adamgreig:matrix.org> Opamps vary wildly in power consumption down to like 1uA, so depending on whether you want it on all the time or not maybe you just get a lower power one?