ChanServ changed the topic of #mlpack to: "mlpack: a fast, flexible machine learning library :: We don't always respond instantly, but we will respond; please be patient :: Logs at http://www.mlpack.org/irc/"
gtank___ has quit [Read error: Connection reset by peer]
gtank___ has joined #mlpack
< metahost> zoq: could you please review ensmallen#149 once. I have added a comment.
< jenkins-mlpack2> Project docker mlpack nightly build build #586: UNSTABLE in 2 hr 53 min: http://ci.mlpack.org/job/docker%20mlpack%20nightly%20build/586/
ImQ009 has joined #mlpack
SriramSKGitter[m has joined #mlpack
< SriramSKGitter[m> When I run bin/mlpack I get the following:
< SriramSKGitter[m> bin/mlpack_test: symbol lookup error: bin/mlpack_test: undefined symbol: _ZN6mlpack4tree10CosineTreeC1ERKN4arma3MatIdEE
< SriramSKGitter[m> How do I go about debugging this?
< SriramSKGitter[m> *bin/mlpack_test not bin/mlpack
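An undefined-symbol error at startup usually means the mlpack_test binary is picking up a libmlpack.so that does not match the headers it was built against (for example, an older system-wide install shadowing the freshly built library). Demangling the symbol shows exactly what is missing; below is a minimal sketch using the GCC/Clang ABI helper (running c++filt on the symbol at the shell gives the same output):

    #include <cxxabi.h>
    #include <cstdio>
    #include <cstdlib>

    int main()
    {
      // The symbol from the error message above.
      const char* mangled = "_ZN6mlpack4tree10CosineTreeC1ERKN4arma3MatIdEE";
      int status = 0;
      char* demangled = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
      if (status == 0 && demangled != nullptr)
      {
        // Prints: mlpack::tree::CosineTree::CosineTree(arma::Mat<double> const&)
        std::printf("%s\n", demangled);
        std::free(demangled);
      }
      return 0;
    }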
xiaohong has joined #mlpack
xiaohong has quit [Remote host closed the connection]
xiaohong has joined #mlpack
xiaohong has quit [Remote host closed the connection]
xiaohong has joined #mlpack
xiaohong has quit [Ping timeout: 260 seconds]
xiaohong has joined #mlpack
xiaohong has quit [Remote host closed the connection]
launchpad5682 has joined #mlpack
xiaohong has joined #mlpack
dantraz has joined #mlpack
xiaohong has quit [Remote host closed the connection]
dantraz has quit [Remote host closed the connection]
xiaohong has joined #mlpack
xiaohong has quit [Ping timeout: 260 seconds]
xiaohong has joined #mlpack
xiaohong has quit [Ping timeout: 260 seconds]
xiaohong has joined #mlpack
xiaohong has quit [Ping timeout: 260 seconds]
launchpad5682 has quit [Ping timeout: 260 seconds]
xiaohong has joined #mlpack
xiaohong has quit [Ping timeout: 260 seconds]
Soham has joined #mlpack
xiaohong has joined #mlpack
Soham has quit [Remote host closed the connection]
xiaohong has quit [Ping timeout: 260 seconds]
xiaohong has joined #mlpack
xiaohong has quit [Ping timeout: 260 seconds]
xiaohong has joined #mlpack
xiaohong has quit [Ping timeout: 260 seconds]
xiaohong has joined #mlpack
xiaohong has quit [Ping timeout: 260 seconds]
PrinceGuptaGitte has joined #mlpack
< PrinceGuptaGitte> Hi, I'm new to open source but love machine learning. I wanted to start contributing to mlpack; I built it from source on Ubuntu, and everything is working fine except this line: `data::Save("cov.csv", cov, true);`
< PrinceGuptaGitte> I get an error on that line; I've been trying to figure it out but couldn't.
xiaohong has joined #mlpack
xiaohong has quit [Ping timeout: 260 seconds]
xiaohong has joined #mlpack
xiaohong has quit [Ping timeout: 260 seconds]
xiaohong has joined #mlpack
< kartikdutt18Gitt> Hi @rcurtin, @zoq, I have implemented the Mish activation function (#2156). I already had a PR open, so what should I do? Currently the changes are visible in the previous [PR](https://github.com/mlpack/mlpack/pull/2126). Thanks.
< rcurtin> I don't understand what the question is
xiaohong has quit [Ping timeout: 260 seconds]
yash71 has joined #mlpack
yash71 has quit [Remote host closed the connection]
< kartikdutt18Gitt> Hi, I already have a PR open regarding another issue, and I am unable to open a new PR, so should I stash the changes of the previous PR, close it, and send a new PR, or should I create a new branch with only the Mish function changes?
< kartikdutt18Gitt> The changes can be viewed in PR #2126.
< rcurtin> kartikdutt18Gitt: it sounds to me like you should use a different branch on your mlpack fork for the second set of changes, and then you can open two PRs
xiaohong has joined #mlpack
< kartikdutt18Gitt> Ohh thanks, I will create a new branch and send another pull request. Thanks.
xiaohong has quit [Ping timeout: 260 seconds]
< rcurtin> sounds good :)
xiaohong has joined #mlpack
xiaohong has quit [Ping timeout: 260 seconds]
< kartikdutt18Gitt> Hi @rcurtin, I have created a new pull request with only the second set of changes.
SachaD has joined #mlpack
rahul has joined #mlpack
< rahul> hi everyone, I'm Rahul and I want to contribute to this organisation for the upcoming GSoC 2020. I'm good at C++ and Java and do competitive programming, so can anyone help me out with how to start fixing bugs and contributing?
Param-29Gitter[m has joined #mlpack
< Param-29Gitter[m> Hello, I would like to know what the "relevant tickets" mentioned on https://github.com/mlpack/mlpack/wiki/SummerOfCodeIdeas mean.
xiaohong has joined #mlpack
< SachaD> Zoq, my minimum example:
< SachaD> #include <mlpack/core.hpp>
< SachaD> #include <mlpack/methods/ann/ffn.hpp>
< SachaD> #include <mlpack/methods/ann/layer/layer.hpp>
< SachaD> #include <mlpack/methods/ann/loss_functions/mean_squared_error.hpp>
< SachaD> #include <armadillo>
< SachaD> #include <omp.h>
< SachaD> using namespace std;
< SachaD> int main(int argc, char* argv[])
< SachaD> {
< SachaD> arma::mat dataset, trainData, outLabels;
< SachaD> mlpack::ann::FFN<> model;
< SachaD> static const int maxWordLength = 31;
< SachaD> mlpack::data::Load("en-dic-utf16-codes.csv", dataset, true);
< SachaD> if (dataset.n_rows != maxWordLength){
< SachaD> return -1;
< SachaD> }
< SachaD> trainData = dataset.submat(0, 0, dataset.n_rows - 1, dataset.n_cols - 1);
< SachaD> outLabels = dataset.submat(0, 0, dataset.n_rows - 1, dataset.n_cols - 1);
< SachaD> outLabels.for_each( [](arma::mat::elem_type& val) { val += 1; } );
< SachaD> model.Add< mlpack::ann::Linear<> >(maxWordLength, maxWordLength);
< SachaD> model.Add< mlpack::ann::SigmoidLayer<> >();
< SachaD> model.Add< mlpack::ann::Linear<> >(maxWordLength, maxWordLength);
< SachaD> model.Add< mlpack::ann::LogSoftMax<> >();
< SachaD> model.Train(trainData, outLabels);
< SachaD> return 0;
< SachaD> }
< SachaD> link to file "en-dic-utf16-codes.csv": https://yadi.sk/d/RxhD2xALODmy2w
< SachaD> P.S. mlpack works with the double data type. Can it work with float or even unsigned short? They are faster and less expensive.
xiaohong has quit [Ping timeout: 260 seconds]
rahul has quit [Ping timeout: 260 seconds]
garg has joined #mlpack
garg has quit [Remote host closed the connection]
xiaohong has joined #mlpack
xiaohong has quit [Ping timeout: 260 seconds]
xiaohong has joined #mlpack
xiaohong has quit [Ping timeout: 260 seconds]
< kartikdutt18Gitt> Hi SachaD, double has 52 mantissa bits whereas float has 23, so double offers more precision in general. Also, I think PyTorch by default requires tensors to be of type float (generally).
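As a quick check of those widths, a minimal sketch (std::numeric_limits counts the implicit leading bit, so it reports 24 and 53 rather than the 23 and 52 explicitly stored bits):

    #include <iostream>
    #include <limits>

    int main()
    {
      // digits is the mantissa width including the implicit leading 1-bit.
      std::cout << "float mantissa bits:  " << std::numeric_limits<float>::digits << '\n'
                << "double mantissa bits: " << std::numeric_limits<double>::digits << '\n';
      return 0;
    }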
nishantkr18 has joined #mlpack
xiaohong has joined #mlpack
xiaohong has quit [Ping timeout: 260 seconds]
nishantkr18 has quit [Remote host closed the connection]
nishantkr18 has joined #mlpack
< PrinceGuptaGitte> Which library is responsible when we save an Armadillo matrix to a CSV?
< PrinceGuptaGitte> Like this: `data::Save("cov.csv", cov, true);`
nishantkr18 has quit [Remote host closed the connection]
< SachaD> kartikdutt18Gitt, yes, but I have seen lectures on YouTube about deep learning, and they said that a GPU multiplies shorter data types faster. That's why TensorFlow uses tf.float16 and tf.uint16 for more speed and less memory. Also, GPUs are optimized to multiply 16- and 32-bit matrices (double is 64-bit); 16-bit values give twice the speed and half the memory of 32-bit values. I am a C++ programmer, that's why I look at the mlpack side, but if
< SachaD> TensorFlow can operate on smaller data types, it can be faster and more effective than mlpack, or have I missed something? (Yes, C++ is faster than Python, but with a large amount of data and calculations it can be beaten by more optimized calculations and data storage.)
xiaohong has joined #mlpack
xiaohong has quit [Ping timeout: 245 seconds]
xiaohong has joined #mlpack
< zoq> SachaD: Not every method in mlpack is templated such that you could use arma::Mat<float> instead of arma::Mat<double>; I guess a hacky workaround is a typedef that swaps the element type.
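For the parts of the API that are templated on the element type, single precision already works. A minimal sketch, assuming only templated entry points such as data::Load() are used (whether a specific mlpack method accepts arma::Mat<float> depends on whether that method takes the matrix type as a template parameter):

    #include <mlpack/core.hpp>
    #include <iostream>

    int main()
    {
      // arma::fmat is arma::Mat<float>; Armadillo arithmetic works on it just
      // as it does on arma::mat (double), at half the memory per element.
      arma::fmat data;

      // data::Load() is templated on the element type, so loading a CSV into a
      // float matrix works directly (file name taken from the example above).
      mlpack::data::Load("en-dic-utf16-codes.csv", data, true);

      // Plain single-precision arithmetic.
      const float total = arma::accu(data);
      std::cout << "loaded " << data.n_rows << " x " << data.n_cols
                << " float matrix, sum = " << total << std::endl;
      return 0;
    }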
ImQ009 has quit [Quit: Leaving]
xiaohong has quit [Ping timeout: 260 seconds]
< zoq> SachaD: 8-bit integers are the same story, but without a good model design this isn't going to work; you will see a speedup but also a drop in accuracy. TF provides some support to quantize a model, but that doesn't work for every model.
< zoq> Param-29: Hello, sometimes we have open issues that might be a good starting point to get familiar with a certain project.
< zoq> rahul: Hello there, https://www.mlpack.org/gsoc.html should answer your questions, let us know if you have any further questions.
< sreenik[m]> SachaD: Modern Nvidia GPUs (Turing and Volta, probably) have some extremely capable Tensor cores in addition to CUDA cores, and they only work with fp16 and fp8 (not with fp32). Is that what you are emphasizing?
< zoq> PrinceGupta: In this case that would be armadillo.
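In other words, data::Save() is a thin convenience wrapper over Armadillo's own writers. A minimal sketch contrasting the two (the transpose behaviour noted in the comments is mlpack's default):

    #include <mlpack/core.hpp>

    int main()
    {
      arma::mat cov(3, 3, arma::fill::randu);

      // mlpack's wrapper: picks the CSV format from the extension and, by
      // default, transposes so each column (data point) becomes a row on disk;
      // the third argument makes failures fatal.
      mlpack::data::Save("cov.csv", cov, true);

      // The underlying writer is Armadillo's; this saves the matrix as-is
      // (no transpose) in CSV format.
      cov.save("cov-arma.csv", arma::csv_ascii);
      return 0;
    }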
xiaohong has joined #mlpack
xiaohong has quit [Ping timeout: 245 seconds]
xiaohong has joined #mlpack
xiaohong has quit [Ping timeout: 260 seconds]
< SachaD> sreenik[m], yes.
SachaD has quit [Quit: Konversation terminated!]
xiaohong has joined #mlpack
xiaohong has quit [Ping timeout: 260 seconds]
xiaohong has joined #mlpack
xiaohong has quit [Ping timeout: 260 seconds]
< sreenik[m]> SachaD: Yes, popular frameworks use mixed precision to speed things up while somewhat preserving accuracy, but in the end the low-level BLAS routines used are independent of whether we are using Python or C++. Currently with mlpack/Armadillo we use NVBLAS for GPU-based operations (if you enable it), but the data transfer overheads eat most of the speed gains. Since NVBLAS cannot handle all the operations that cuBLAS
< sreenik[m]> can, it directs some of them to the CPU (the default BLAS routine, e.g. OpenBLAS) and some to the GPU, instead of copying all the data to GPU memory at once. I suppose TensorFlow etc. directly make use of cuBLAS and send all operations to the GPU, which makes them very fast. As far as mlpack is concerned, Bandicoot is in development (built on top of Armadillo) and would address these issues. zoq correct
< sreenik[m]> me if I am wrong.
xiaohong has joined #mlpack
xiaohong has quit [Ping timeout: 260 seconds]
< rcurtin> SachaD: sreenik[m]: I agree with everything that's written here. Realistically, TensorFlow is C++ under the hood. At the moment, mlpack isn't able to make use of GPUs, except for early Bandicoot support.
< rcurtin> essentially we have been waiting on a "nice" GPU linear algebra library instead of hand-maintaining awful CUDA code, etc.
< rcurtin> there is still some extra overhead that running TF is likely to incur, but I of course couldn't predict exactly how mlpack+bandicoot would fare against TF (or PyTorch). we will find out soon enough once bandicoot is ready, I think
xiaohong has joined #mlpack
xiaohong has quit [Ping timeout: 260 seconds]
lozhnikov has quit [Ping timeout: 265 seconds]
lozhnikov has joined #mlpack