ChanServ changed the topic of #mlpack to: "mlpack: a fast, flexible machine learning library :: We don't always respond instantly, but we will respond; please be patient :: Logs at http://www.mlpack.org/irc/"
HARSHCHAUHAN[m] has quit [Ping timeout: 268 seconds]
TaapasAgrawalGit has quit [Ping timeout: 268 seconds]
Cadair has quit [Ping timeout: 268 seconds]
KhizirSiddiquiGi has quit [Ping timeout: 268 seconds]
KritikaGuptaGitt has quit [Ping timeout: 268 seconds]
LolitaNazarov[m] has quit [Ping timeout: 268 seconds]
NishantKumarGitt has quit [Ping timeout: 268 seconds]
siddhant_jain[m] has quit [Ping timeout: 268 seconds]
PulkitgeraGitter has quit [Ping timeout: 268 seconds]
rishishounakGitt has quit [Ping timeout: 268 seconds]
AvikantSrivastav has quit [Ping timeout: 268 seconds]
SeverinoTessarin has quit [Ping timeout: 268 seconds]
aryan-026[m] has quit [Ping timeout: 268 seconds]
RishabhGoel[m] has quit [Ping timeout: 268 seconds]
AbhishekNimje[m] has quit [Ping timeout: 268 seconds]
RV784Gitter[m] has quit [Ping timeout: 268 seconds]
siddhant_jain[m] has joined #mlpack
RV784Gitter[m] has joined #mlpack
HARSHCHAUHAN[m] has joined #mlpack
NishantKumarGitt has joined #mlpack
KhizirSiddiquiGi has joined #mlpack
Cadair has joined #mlpack
KritikaGuptaGitt has joined #mlpack
TaapasAgrawalGit has joined #mlpack
AbhishekNimje[m] has joined #mlpack
PulkitgeraGitter has joined #mlpack
SeverinoTessarin has joined #mlpack
AvikantSrivastav has joined #mlpack
rishishounakGitt has joined #mlpack
RishabhGoel[m] has joined #mlpack
aryan-026[m] has joined #mlpack
LolitaNazarov[m] has joined #mlpack
_slack_mlpack_19 has quit [Ping timeout: 268 seconds]
_slack_mlpack_19 has joined #mlpack
ImQ009 has joined #mlpack
EmmanuelLykosGit has quit [Ping timeout: 268 seconds]
EmmanuelLykosGit has joined #mlpack
Samyak has joined #mlpack
Anton59 has joined #mlpack
< Anton59> Does mlpack have equivalent functionality of Tensorflow, Pytorch, Scikit-learn ?
< Anton59> A second question: Is the library going to continue to be supported in the next 3-5 years?
Samyak has quit [Remote host closed the connection]
< zoq> Anton59: mlpack implements some neural network functions, but not as many as TF or PyTorch, so at this point I wouldn't directly compare it with TF and PyTorch; it's similar to scikit-learn, meaning our focus is not necessarily on neural networks, but more on implementing cutting-edge methods and making them fast.
< rcurtin[m]> Hi @Anton59, I don't think we are going anywhere anytime soon :)
< Anton59> ok, because I am thinking of working with it in my research, since I like C++ and am not a big fan of Python.
< zoq> If I remember right, Ryan correct me if I'm wrong, mlpack is now 13 years old?
< zoq> started in 2007, I think
< rcurtin[m]> yeah, 2007 :)
< zoq> rcurtin[m]: Did you switch to another client?
< Anton59> so if you compare it with scikit-learn, does it cover most of the functionality? maybe with a little bit more typing, but nevertheless ...
< rcurtin[m]> @Anton59 it sounds like mlpack might be a good choice in this case---I think it is the most mature of the "stable" C++ machine learning toolkits, and development is pretty active
< rcurtin[m]> @zoq yeah, I finally made the switch over to matrix a handful of weeks ago for all my chat clients
< rcurtin> I still keep IRC open though since it's what our logging system uses ;)
< rcurtin[m]> @Anton59 I would say so; there are some places where mlpack has some methods that scikit does not, and also vice versa, but I think for the "typical" algorithms like random forest, linear models, decision trees, k-means, and this type of thing, we should be pretty much at parity
< rcurtin[m]> in some cases, mlpack is faster (especially with k-means), in part due to the C++ implementation, and in part due to the choice of better algorithms under the hood
< rcurtin[m]> sometimes, scikit has its algorithms actually implemented in C though (via Cython) and when they do this the performance is quite comparable to mlpack
< Anton59> thanks, then I can count on it + boost + a DB library and I have everything
< Anmol2001Gitter[> can anyone help me with this error while testing ?
< Anmol2001Gitter[> ~/mlpack-3.4.2/build$ bin/mlpack_test -t KNNTest
< Anmol2001Gitter[> Test setup error: no test cases matching filter or all test cases were disabled
< rcurtin[m]> @Anton59 awesome
< rcurtin[m]> @Anmol2001 we moved our testing framework to Catch recently, so the syntax is a little different: try `bin/mlpack_test "[KNNTest]"`
< rcurtin[m]> did you find documentation somewhere that has the `-t` in it?
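A quick sketch of the new invocation, assuming the Catch2-based `mlpack_test` from git master; the tag goes in square brackets, and Catch2 can also list what is available:

```sh
bin/mlpack_test "[KNNTest]"   # run all test cases tagged [KNNTest]
bin/mlpack_test --list-tests  # Catch2: list all test cases
bin/mlpack_test --list-tags   # Catch2: list all available tags
```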
< zoq> rcurtin[m]: Interesting, so I guess the IRC mlpack user base lost another one.
< rcurtin[m]> zoq: sadly it might be true :-D
< rcurtin[m]> oh nice! I didn't know about that file
< Anmol2001Gitter[> @rcurtin yes, i found `-t` in the documentation
< rcurtin[m]> Anmol2001 (Gitter): could you point out where? would you be interested in updating it?
< Anmol2001Gitter[> sure
< Anmol2001Gitter[> here in doxygen
< rcurtin[m]> ah, ok, so that will be in the file `doc/guide/build.hpp`
< Anmol2001Gitter[> yeah
< Anmol2001Gitter[> current status is as above
< rcurtin[m]> are you running on the current git master version?
< Anmol2001Gitter[> actually i followed the doxygen documentation for setup
< Anton59> Wanted to report an issue I discovered. When I try this: cmake -D DEBUG=ON -D PROFILE=ON ../
< rcurtin[m]> Anmol2001 (Gitter): okay, but what version of mlpack are you using?
< Anmol2001Gitter[> 3.4.2
< Anton59> If I try to compile with debugging and profiling, my Linux machine freezes when I do make -j4 with 4 cores on my CPU. If I do just cmake ../ then things are fine.
anmol has joined #mlpack
< zoq> Anton59: Just build with fewer jobs, make -j1 or make -j2; mlpack uses a lot of memory for the build (we make heavy use of templates)
< rcurtin[m]> Anmol2001 (Gitter): sorry, I did not know you were using that version. at that time we were in the middle of the transition, so it looks like KNNTest was a part of `mlpack_catch_test`. if you update to git master then all the tests are in `mlpack_test`
< zoq> Anton59: Also we are working on reducing the memory usage.
< Anmol2001Gitter[> @Anton59 try with -j3 or -j2; sometimes memory gets exhausted when using more cores
< Anton59> Thank you, I thought this might be the case. Will try.
< anmol> welcome brother
anmol has quit [Ping timeout: 245 seconds]
Anton59 has quit [Remote host closed the connection]
Anton20 has joined #mlpack
< Anton20> Even with make -j 2, when configuration is done with cmake -D DEBUG=ON -D PROFILE=ON ../, the memory gets exhausted and the machine freezes. Because most of it is in headers, maybe I don't need to build with these options turned on. Or at least I can turn off the profiling and just try with debugging.
< RishabhGarg108Gi> @Anton20 , If you want to use mlpack just for research work and do not plan to contribute, then you can build with -DBUILD_TESTS=OFF. Building the tests is really heavy, so if you skip them, you can pretty smoothly build with just -j1 and it won't take much time.
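Putting the advice above together, a minimal low-memory configuration might look like this (DEBUG, PROFILE, and BUILD_TESTS are the options discussed in this conversation):

```sh
cd mlpack-3.4.2/build
cmake -D DEBUG=ON -D PROFILE=ON -D BUILD_TESTS=OFF ../
make -j1   # build serially to keep peak memory usage down
```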
< siddhant_jain[m]> Anton20 just use make
< Anton20> make is like make -j 1 I hope
< Anmol2001Gitter[> i think by default it uses one core
< iamarchit123Gitt> with 8GB RAM, if you are building only with DEBUG and PROFILE, I have seen the default -j value suffice, but if you are building the tests then I have found -j1 is necessary. 2 of the tests (i think one is ann) consume exceptional amounts of memory, and when they ran in parallel my Ubuntu machine crashed. I came back after coffee only to see it dead; at the time of the crash only 30 MB of RAM was free :)
Anton20 has quit [Ping timeout: 245 seconds]
Antonhr has joined #mlpack
< Antonhr> I appreciate everybody's comments, but if you see that I log in to the chat with different names, that is because my machine crashes when I try to compile with debug on, even if I exclude the tests. The only way everything compiles is if I do cmake ../ and make.
< rcurtin[m]> Antonhr: how much RAM do you have available? and do you need to compile the tests?
< rcurtin[m]> sorry that things are painful right now, it is definitely a known pain point that we are working on
< Antonhr> So maybe putting a warning on the website where this statement appears (cmake -D DEBUG=ON -D PROFILE=ON ../), along with RAM expectations, would be better. I have:
< Antonhr> anton@anton-Precision-7720:~/mlpack-3.4.2/build$ free -h
                         total        used        free      shared  buff/cache   available
           Mem:           15Gi       2.0Gi        10Gi       531Mi       2.3Gi        12Gi
           Swap:          2.0Gi          0B       2.0Gi
< Antonhr> 16 GB RAM
< rcurtin> so you have like 10GB RAM free and mlpack won't build with only one core?
< rcurtin> I am still not really understanding; are you building the tests? do you need to build all of the tests?
< Antonhr> Do not need to build the tests. Even excluding the tests, as soon as I enable debugging (and, worse yet, profiling) the build does not finish. It progressively grinds Ubuntu 20.04 to a screeching halt.
< rcurtin[m]> so you are configuring with `cmake -DBUILD_TESTS=OFF`, and then typing `make`; is it building bindings for other languages?
< Antonhr> Yes sir.
< rcurtin[m]> do you need bindings for other languages?
< Antonhr> No need, just C++ pure library,
< rcurtin[m]> ok; if that's all you need, you can disable the bindings... `-DBUILD_PYTHON_BINDINGS=OFF -DBUILD_JULIA_BINDINGS=OFF -DBUILD_R_BINDINGS=OFF -DBUILD_CLI_EXECUTABLES=OFF`
< rcurtin[m]> or, you can just make the `mlpack` target: `make mlpack`
< rcurtin[m]> after `make mlpack`, the library will be in `lib/` and the headers will be in `include/` under your build directory
< Antonhr> I will try this to speed things up, thanks.
< rcurtin[m]> I'd suggest just reconfiguring CMake to disable all the bindings
< rcurtin[m]> I guess I forgot `-DBUILD_GO_BINDINGS=OFF` too
< Antonhr> All of them, no preference for Go.
< rcurtin[m]> I forget how many different languages we have bindings for now :)
< Anmol2001Gitter[> i just made a PR, have a look when you're free: https://github.com/mlpack/mlpack/pull/2770
< Antonhr> Yes, but the reason for me is simpler - C++.
< rcurtin[m]> 👍️
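Collecting the options from this exchange, a C++-only configuration would look roughly like the following; the final `make mlpack` builds only the library target, leaving the result in `lib/` and the headers in `include/` under the build directory:

```sh
cmake -DBUILD_TESTS=OFF \
      -DBUILD_CLI_EXECUTABLES=OFF \
      -DBUILD_PYTHON_BINDINGS=OFF \
      -DBUILD_JULIA_BINDINGS=OFF \
      -DBUILD_GO_BINDINGS=OFF \
      -DBUILD_R_BINDINGS=OFF ../
make mlpack   # library ends up in lib/, headers in include/
```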
< RishabhGarg108Gi> @rcurtin , many times it's tedious to write all these options to disable all the other bindings. Would it be a good idea to have another cmake option -DBUILD_BINDINGS that can be used to enable or disable all bindings with just one option?
< rcurtin[m]> RishabhGarg108 (Gitter): hmm, yeah! that could be nice; alternatively, maybe the better idea would be to disable all the `BUILD_X_BINDINGS` options by default
< RishabhGarg108Gi> Yep, it would be better to disable all of them by default. I will open an issue for this :+1:
< rcurtin[m]> thanks!
Antonhr has quit [Remote host closed the connection]
< Anmol2001Gitter[> @rcurtin i am interested in updating doc/guide/build.hpp, as i myself faced the issue :)
< rcurtin[m]> yes, please, go ahead
PulkitgeraGitter has quit [Ping timeout: 268 seconds]
robotcatorGitter has quit [Ping timeout: 268 seconds]
Antonhr has joined #mlpack
PulkitgeraGitter has joined #mlpack
robotcatorGitter has joined #mlpack
Antonhr has quit [Remote host closed the connection]
< jeffin143[m]> > make is like make -j 1 I hope
< jeffin143[m]> @anton By default it will take as many cores as you have, that is 4
< rcurtin[m]> @jeffin143 are you sure on that one? my understanding is that `make` will only use one core if you don't specify a `-j` option
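For reference, GNU make runs a single job at a time unless `-j` is passed:

```sh
make              # GNU make default: one job at a time
make -j           # no limit on job count; can easily exhaust RAM
make -j"$(nproc)" # one job per available core
```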
< rcurtin[m]> that's very strange, make with only one core fails but two succeeds?
< iamarchit123Gitt> but in my case make -j2 failed and make -j1 passed with BUILD_TESTS on; plain make fails
< Anmol2001Gitter[> the issue in this link is a little similar
< rcurtin[m]> are any of you using `ninja` to build?
< iamarchit123Gitt> has anyone tried enabling swap memory to increase the effective RAM?
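For anyone who wants to try that, one common way to add swap on Linux looks like this (the size and path here are only an example):

```sh
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
free -h   # verify the new swap is active
```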
< jeffin143[m]> > that's very strange, make with only one core fails but two succeeds?
< jeffin143[m]> Ryan, that was probably 1 year ago; I have been using a good computer since then
< jeffin143[m]> But I definitely remember that happened
< rcurtin[m]> interesting; I know that ninja will build in parallel by default but I don't know of any systems that will default `make` to multiple cores
< jeffin143[m]> Maybe I used to reverse-search and hit tab, and once I might have used make -j4, and thus...
< jeffin143[m]> Not sure
< jeffin143[m]> I will test again
< Anmol2001Gitter[> @rcurtin i have made the required documentation changes with the test commands
< HARSHCHAUHAN[m]> Hi everyone, I am trying to set up the mlpack env on my local machine.
< HARSHCHAUHAN[m]> After this command "cmake -D DEBUG=ON -D PROFILE=ON ../"
< HARSHCHAUHAN[m]> I am getting an error.
< HARSHCHAUHAN[m]> can anyone help me out please!!
< rcurtin[m]> the error message shows that the version of cereal on your system is too old; install a newer version and try again 👍️
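Since cereal is header-only, one way to get a newer version than the distribution provides is to copy the headers from upstream; installing into /usr/local/include is an assumption here:

```sh
git clone https://github.com/USCiLab/cereal.git
sudo cp -r cereal/include/cereal /usr/local/include/
```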
< zoq> rcurtin[m]: I did a quick benchmark between mlpack (master) and mlpack (3.3.2) to see if the memory usage increased; I patched out the nn stuff for both, and also disabled all bindings.
< zoq> mlpack master uses 3686364 kbytes and mlpack 3.3.2 uses 3157104 kbytes
< zoq> build time also increased from 27:02.16 to 31:50.38
< zoq> I'll do the same with the nn code now.
< zoq> I think we added some code in between besides catch2 and cereal, but 530 MB seems strange.
< rcurtin[m]> fascinating! do you want to try 3.4.2 also? that has (some) catch2 but not cereal
< zoq> Yes, let's test 3.4.2 as well
< rcurtin[m]> it's also possible that the benefit of removing boost won't be seen until we remove all of it; so if we are still using visitor and math in places, those still may be including a huge amount
< rcurtin[m]> let me see if I can get gcc or clang to output some information on how long it spends in each phase of compilation; that could be helpful too
< zoq> But somehow we increased the memory footprint.
< rcurtin[m]> right, I have some ideas for why that could be, but let me get a breakdown of compilation time first
< zoq> Would be nice to include that in our CI; I'm using 'time' right now, and it would be easy to add just to get some numbers.
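A sketch of the 'time' approach, assuming GNU time is installed at /usr/bin/time (the shell builtin does not report memory); peak memory of the build, children included, appears in kbytes:

```sh
/usr/bin/time -v make -j1 mlpack 2>&1 | grep "Maximum resident set size"
```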
< rcurtin[m]> agreed, maybe there is some way for Jenkins to track it?
< zoq> I'll look into it.
ImQ009 has quit [Quit: Leaving]
< rcurtin[m]> -ftime-report on `adaboost_test.cpp` and `ann_layer_test.cpp` on mlpack master: https://pastebin.com/tY5HSvZm
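One way to produce such reports for a whole build, assuming gcc (the flag writes its output to stderr):

```sh
cmake -DCMAKE_CXX_FLAGS="-ftime-report" ../
make mlpack_test 2> ftime-report.log
```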
< rcurtin[m]> `ann_layer_test.cpp` takes almost 8GB of RAM by itself!! :-O
< zoq> wow, insane
< rcurtin[m]> let's see what that was on 3.3.2 (I realize that the test is probably a little bit smaller, and there were fewer layers, but still, I want to see if it is a huge jump or something)
< zoq> Okay, if it's that bad, I'll put https://github.com/mlpack/mlpack/issues/2647 on the top of my list.
< rcurtin[m]> on 3.3.2: https://pastebin.com/J0LHRCDR
< rcurtin[m]> that's 6.3GB, so still quite a lot but it's a good bit less than master
< rcurtin[m]> I want to see if I can figure out why it's taking so much memory
< rcurtin[m]> I can see that many of the other files that don't use the NN toolkit usually use ~a few hundred MB
< zoq> I guess just adding one layer to LayerTypes will have a huge effect.
< rcurtin[m]> I remember from a long time ago when I profiled with `-ftime-report` that a massive amount of time was being spent just parsing includes
< rcurtin[m]> however, the reports I'm seeing now don't seem to have massive amount of time spent in parsing
< rcurtin[m]> so I am trying to dig further and understand what's going on there
< rcurtin[m]> I had believed for a long time that simply removing `#include`s would be very helpful (this is, in part, the reason I've thought removing boost would be super helpful), but the results I am seeing now make me wonder if my thoughts were incorrect
< zoq> I guess you are still right, but it might not have the effect we wanted to see.
< rcurtin[m]> so I'm trying again removing cotire, which I think will use precompiled headers; maybe that is doing a really good job of reducing parse time?
< rcurtin[m]> oh wow---without cotire, suddenly simply parsing for `adaboost_test.cpp` takes 850MB and 5 seconds, whereas with cotire it takes 0.4s and 45MB... so things are way better than they *could* be! :)
< zoq> this is crazy, so cotire did an awesome job, I guess I haven't appreciated it enough.
< rcurtin[m]> and on `ann_layer_test.cpp`, without cotire parsing takes 12 seconds and 1.0GB, with cotire it takes 4 seconds and 330MB (probably because some headers are not precompiled there)
< rcurtin[m]> I agree, I think I underappreciated cotire too!
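For illustration, this is roughly what cotire automates, done by hand with gcc's precompiled headers (file names here are hypothetical):

```sh
g++ -x c++-header heavy_includes.hpp -o heavy_includes.hpp.gch
g++ -c some_test.cpp   # gcc now loads heavy_includes.hpp.gch instead of reparsing
```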
< rcurtin[m]> I found this tool: https://github.com/mikael-s-persson/templight
< rcurtin[m]> I think I'll play with it and see if I can make a callgraph for template instantiations; maybe this will give useful information too
< zoq> looks like a promising tool
< rcurtin[m]> it'll take a little while to get set up with it; it requires a custom llvm build
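Per the templight README, usage looks roughly like the following; the exact flags and the trace file name are written from memory, so treat this as an unverified sketch:

```sh
# profile time and memory of template instantiations while compiling:
templight++ -Xtemplight -profiler -Xtemplight -memory -c ann_layer_test.cpp
# convert the resulting trace, e.g. to callgrind format (hypothetical trace name):
templight-convert --format callgrind ann_layer_test.trace.pbf
```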
< rcurtin[m]> at least compiling LLVM proves that there is something out there that's more computationally intensive to compile than mlpack :-D
< zoq> haha
< zoq> so it looks like right now you need at least 4.8 GB of free memory
< rcurtin[m]> yeah; I guess the GCC output must not be reporting on peak memory usage at one time but the total (including things that are deallocated and reallocated later)
< rcurtin[m]> I suppose, maybe we can at least say that we are helping people keep their houses warm in winter :)
< zoq> haha, the only problem is in some areas you don't heat with electricity, because it's expensive :(
< rcurtin[m]> true :-D
< zoq> But I guess it's a nice byproduct, if you like it warm.
< zoq> Here it's currently 6 degrees celsius (outside).
< zoq> Also, I'm building on a remote machine.
< rcurtin[m]> same here, a little colder than I would expect for Atlanta this time of year
< zoq> Also, I don't expect any snow this year, maybe beginning of next year; which doesn't matter really, because we are in a lockdown.
< rcurtin[m]> it would give you something interesting and different to look at out the window :)
< zoq> oh yes definitely
< shrit[m]> I am currently compiling neural network code, only one .cc file, two functions and two models in one file. It is using 20 percent of my 32 GB of RAM
< shrit[m]> So I am not surprised if ann_layer_test.cpp is taking about 8 GB of RAM
< rcurtin[m]> I suspect it might be a quick fix to just split the file into several files, so that all the instantiation doesn't happen all in one file
< rcurtin[m]> but, that's not a great solution, only a quick one
< shrit[m]> Agreed, I would blame the boost visitors for this increase. Otherwise I cannot see anything else that consumes that much RAM
< rcurtin[m]> heavy use of SFINAE could also be to blame, but I would suspect the visitor paradigm too since it is super template heavy
< shrit[m]> We can split it, but that will not halve the amount of consumed RAM
< rcurtin[m]> I am still compiling this custom LLVM version, but I'm hoping when I finally manage to make it work it might shed some light on where the painful part is :)
< shrit[m]> Hope that too :)