

rcurtin_irc changed the topic of #mlpack to: mlpack: a scalable machine learning library (https://www.mlpack.org/) -- channel logs: https://libera.irclog.whitequark.org/mlpack -- NOTE: messages sent here might not be seen by bridged users on matrix, gitter, or slack

<NamanJain[m]>
Hi! I'm getting a ton of undefined reference errors after compiling covariance computation code. I posted the issue on stackoverflow: https://stackoverflow.com/questions/68844764/mlpack-compile-time-issue-linux — kindly help me fix it.

<shrit[m]>
you need to add `-lmlpack -larmadillo -L/path/to/mlpack/lib/ -L/path/to/armadillo/lib` to your compiler command

<rcurtin[m]>
some friends are in town so we were going to get lunch, but, I don't know when they are going to be awake ... anyway, I'll try and join... what is the meeting about? 😃

<rcurtin[m]>
yeah, basically that strategy is just a "trick" to try and get the compiler to emit SIMD instructions

<jonpsy[m]>
<rcurtin[m]> "some friends are in town so we w" <- that's fine, we can do it later when you're free?

<rcurtin[m]>
are you sure you need to? If you can express things as Armadillo operations, then it's likely that the Armadillo code will handle this type of thing under the hood

<ABHINAVANAND[m]>
zoq Is there a way to reduce the number of epochs in the group norm gradient test? It would help me with debugging.

<jonpsy[m]>
rcurtin[m]: I've thought about it. The original code used `torch.expand` and a lot of redundant stuff.

<rcurtin[m]>
so, my first suggestion would be to determine whether or not this is actually a bottleneck that uses a significant amount of computation time... if not, then it's probably not worthwhile to have the extra code for the SIMD loop

<rcurtin[m]>
second, maybe you can use `.transform()` or something like this? if we ***can*** push SIMD logic down into Armadillo, we should

<rcurtin[m]>
third, it really depends on the type of loop. if I am doing this:... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/31a91bd5c44aea54f227a14cb401962fc53ee559)

<rcurtin[m]>
but, if I write a loop like this:... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/b4171de339239973f2120e7550b8be08fb960c5c)

<rcurtin[m]>
the key here is the dependencies of each iteration... in the first loop, every iteration depends on the previous iteration's output and thus it can't be parallelized. in the second loop, every iteration is independent and so the compiler can automatically apply SIMD (and I think it typically will)

<jonpsy[m]>
hmm, let's imagine a 2D array... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/59c372c928b3811dee69f0484d8bd6474275a42f)

<jonpsy[m]>
seeing it column-wise, we can cut it into 3 chunks: chunkA `[[1, 2], [3, 4]]`, chunkB `[[5, 6], [7, 8]]`, and so on for chunkC

<jonpsy[m]>
now each chunk would be multiplied by a vector `w`, whose dimension is the same as the dim of each vector in the chunk

<jonpsy[m]>
so here chunkA has two vecs each of dim 2, `[1, 2]` and `[3, 4]`, so `w` would be some `arma::randn(size(2))`

<zoq[m]>
<ABHINAVANAND[m]> "zoq Is there a way to reduce th" <- https://github.com/mlpack/mlpack/blob/master/src/mlpack/tests/ann_test_tools.hpp#L197 you can adjust the for loop.

<rcurtin[m]>
sure, I see, and you'll have a bunch of these chunks and a bunch of `w`s and they all need to be multiplied

<jonpsy[m]>
that `w` multiplied with `chunk` gives us a vec, which we will append to `res`

<rcurtin[m]>
can you express this as a matrix multiplication by reshaping the input array and grouping the `w`s into a matrix?

<rcurtin[m]>
the reason I suggest this is that while you can go to some lengths to get SIMD instructions and make things fast, the BLAS primitives like matrix-matrix multiplication are blindingly fast

<rcurtin[m]>
so if you can structure things such that they can be expressed as a BLAS primitive, that is the most likely route to making things fast

<rcurtin[m]>
(and as a bonus, the code remains clean because we can just do this as a couple Armadillo calls)

<jonpsy[m]>
jonpsy[m]: ```c++... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/d2527f01e8c388141a014efec9e4cccc42aab90b)

<jonpsy[m]>
> <@jonpsy:matrix.org> ``` w_A * chunkA ```... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/dd8e35dccb5441156b8927dff73fe31f6a60581f)

<rcurtin[m]>
yeah, can you do it ***all*** in batch? so, e.g., reshape `Q` such that each chunk corresponds to a single column, and then multiply directly with the `W` matrix?

<rcurtin[m]>
that is maybe not very satisfying because you don't get to play with SIMD instructions, but, if you can do that, the OpenBLAS matrix multiplication code is tuned like crazy and will almost certainly be faster than any other strategy, especially because 1 NxNxN matrix multiplication will tend to be faster than N vector-matrix multiplications of size 1xNxN

<rcurtin[m]>
👍️ yeah, I like optimizing code at a really low level, but most of the time at least with matrix operations, if you can manage to express the problem as a BLAS primitive then OpenBLAS will blow away anything handwritten

<NamanJain[m]>
<shrit[m]> "you need to add `-lmlpack -larm" <- Thanks, @shrit! It works. I only needed to include `-lmlpack -larmadillo`.

<shrit[m]>
zoq: I am happy to open a new pull request and change the class name from `DiagonalGaussianDist` to `DiagonalGaussianDistType<>`, and then redefine `DiagonalGaussianDist` as using Armadillo internally. It is true that this policy should be easier to review and will not require modifying half of the codebase. If you prefer this, I will open a new pull request and close the current one.

<zoq[m]>
I just wanted to keep it simple to use (we discussed some of it here: https://github.com/mlpack/mlpack/issues/2524). By adding a template parameter, I think that instead of making it easier to use, it actually becomes more complicated.

<PranshuSrivastav>
> You are trying to link statically with armadillo, which is dynamic on your machine

<PranshuSrivastav>
Hey, I don't really understand this, could you please explain it to me in simpler terms?

<shrit[m]>
The library contains a set of functions. When it is built, it is built either statically or dynamically according to your configuration. For more information, please have a look here: https://stackoverflow.com/questions/1993390/static-linking-vs-dynamic-linking