verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
Udit has joined #mlpack
Udit has quit [Quit: Udit]
Udit has joined #mlpack
Udit has quit [Quit: Udit]
Udit has joined #mlpack
Udit has quit [Ping timeout: 255 seconds]
Udit has joined #mlpack
Udit has quit [Ping timeout: 240 seconds]
Udit has joined #mlpack
Udit has quit [Client Quit]
Udit has joined #mlpack
Udit has quit [Quit: Udit]
Udit has joined #mlpack
Udit has quit [Quit: Udit]
Udit has joined #mlpack
rsv has joined #mlpack
< rsv>
hi again, i have a new question today... it looks like the matrix I am trying to use for LogisticRegression is too big, is there a way to train LogisticRegression in a progressive way to avoid loading a giant matrix?
< rsv>
the error i'm seeing is: error: arma::memory::acquire(): out of memory
< rsv>
the matrix is 4,000,000 observations * 1600 variables
< naywhayare>
rsv: yeah, it can be done, but it'll be a little bit tricky...
< naywhayare>
basically, you can convert your file into a binary format (so, not csv -- just doubles serialized to a disk)
< naywhayare>
then you can use mmap() to get a pointer to this data, and then you can wrap an Armadillo object around that using the constructor that takes a memory pointer
< naywhayare>
it will certainly be slower than if you could fit the entire matrix in memory, but it should at least work
< naywhayare>
it's been a while since I have done that, so I don't remember the exact mmap() syntax, but it should be possible
< rsv>
okay
< naywhayare>
I'd eventually like to get nice support in mlpack for mmap'ing files, but my time is unfortunately pretty limited
< naywhayare>
I wish I could give you a nicer answer...
< rsv>
this is quite helpful though
< naywhayare>
that's good, at least :)
< naywhayare>
hmm, another thought is that if your matrix is sparse, I think you should be able to use armadillo sparse matrices with logistic regression
< naywhayare>
but I haven't tried that
< rsv>
the matrix is indeed sparse
< rsv>
of the 1600 variables most of them will be 0
< naywhayare>
ah, that may be a better approach then
< naywhayare>
I've actually just now been working with refactoring the LogisticRegression class to prepare for the next release
< naywhayare>
I pushed my code to the master repo just now
< rsv>
so armadillo has special constructions for sparse matrices?
< naywhayare>
the changes allow the LogisticRegression class to work with sparse matrices
< rsv>
which are more memory efficient somehow?
< rsv>
oh, really, that's great
< naywhayare>
yeah, Armadillo has the "arma::sp_mat" class
< naywhayare>
it's going to be slow, because the extra overhead of managing the sparse matrix is high
< naywhayare>
but you should be able to fit it into memory still
< naywhayare>
let me do a bit of reading to refresh my memory on how you would load that matrix...
rsv_ has joined #mlpack
< rsv_>
excellent
< naywhayare>
rsv_: okay, the armadillo sparse matrix support still isn't finished, and the loading is a little bit rough
rsv has quit [Ping timeout: 246 seconds]
< naywhayare>
I think your best bet is to do this:
< naywhayare>
convert your data file to a coordinate list format (so, each row has three columns: row, column, value)
< rsv_>
is that different from csv?
< naywhayare>
yeah
< naywhayare>
so right now I assume your CSV looks like this:
< naywhayare>
the constructor I'm thinking of is form 1 or form 2
< naywhayare>
where you have a 2-column matrix containing the locations of the nonzero values (row/column pairs)
< naywhayare>
and a vector containing each of the nonzero values
< rsv_>
ahh okay, got it
< naywhayare>
then you should be able to pass that sparse matrix to the LogisticRegression class
< naywhayare>
but you'll have to use the git master branch, and the API for LogisticRegression has changed a little bit
Udit has quit [Quit: Udit]
< rsv_>
okay
< naywhayare>
so the class will be LogisticRegression<arma::sp_mat>
< rsv_>
and that part happens the same way as if it were a normal matrix
< naywhayare>
yeah, it should operate the same as if you were using arma::mat, as long as you specify arma::sp_mat as the template parameter
< naywhayare>
there may be slight differences when you update to the latest git master revision; if you have problems I am happy to help work them out
< rsv_>
okay, cool
< rsv_>
i'll give this a try
Udit has joined #mlpack
< naywhayare>
I was actually going to write some tests for sparse matrices now, but then you said you needed the support (interesting coincidence), so I pushed the nearly-done work that I had :)
< rsv_>
great! i guess we'll both be testing it now
< naywhayare>
yeah, please keep me updated as to how it works or if you have any problems
< naywhayare>
(or if you think the API is clunky and should change)
< rsv_>
will do
< naywhayare>
the other note for the refactoring is that LogisticRegression used to have an OptimizerType template parameter (which you'd specify to be SGD or L_BFGS or whatever)
< naywhayare>
but that's realistically only necessary for training, so now that template parameter only needs to be specified in some of the constructors or in the Train() function
< rsv_>
it doesn't anymore?
< rsv_>
so now it's just like LogisticRegression lr(A, b) without specifying an OptimizerType?
< naywhayare>
it'll actually look like this:
< naywhayare>
LogisticRegression<> lr<OptimizerType>(A, b)
< rsv_>
ah
< rsv_>
so it's safe to update to the latest git master branch and try this out?
< naywhayare>
the LogisticRegression class has a template parameter which is just the type of the matrix (defaults to arma::mat)
< naywhayare>
yeah, should be safe to update
< naywhayare>
sorry for the delayed response... I was intercepted by someone
< rsv_>
no worries, thanks
7GHAA7L26 has joined #mlpack
< 7GHAA7L26>
mlpack/mlpack#240 (master - e67787e : Ryan Curtin): The build passed.
< naywhayare>
heh, interesting choice of bot name...
Udit has quit [Quit: Udit]
< naywhayare>
ah, this syntax is wrong: "LogisticRegression<arma::sp_mat> lr<SGD>(A, b);"
< naywhayare>
it turns out that explicitly specifying template parameters of constructors is not allowed because C++ is complicated...
< rsv_>
oh
< naywhayare>
so the correct thing to do, I think, will be "LogisticRegression<arma::sp_mat> lr(dimensionality, regularization); lr.Train<SGD>(A, b);"
< naywhayare>
dimensionality will just be A.n_rows
< rsv_>
ah okay
< naywhayare>
and I guess you were using regularization 0
< naywhayare>
I'm trying to get my tests working now... I'll let you know if I find anything else wrong :)
< rsv_>
i'll end up using custom regularization to implement lasso :)
< naywhayare>
ah, okay
< rsv_>
what type is the regularization parameter?
< naywhayare>
just a double
< rsv_>
why is this constructor so different than the one that's like LogisticRegression (const arma::mat &predictors, const arma::vec &responses, const double lambda=0)
< naywhayare>
what do you mean?
< rsv_>
you said to use lr(dimensionality, regularization) which doesn't get constructed with the matrix of predictors and vector of responses
< rsv_>
why are these different?
< naywhayare>
the difference is because I've just refactored the LogisticRegression code
< rsv_>
ah, of course
< rsv_>
i should really look at the code before asking stupid questions...
< naywhayare>
I did two major things: I removed the OptimizerType template parameter from the LogisticRegression class, and allowed the OptimizerType template parameter to be specified only when training is happening
< rsv_>
got it
< naywhayare>
and then I added the MatType template parameter to LogisticRegression, so that you can use arma::mat or arma::sp_mat or whatever (even things like arma::Mat<float> and arma::Mat<int> should work, I think)
< naywhayare>
feel free to ask questions :) I still haven't finished the updated documentation
< naywhayare>
and it only gets rebuilt nightly for the mlpack website, so the online doxygen documentation isn't up to date at the moment :)
< rsv_>
perfect
< naywhayare>
lunch time... back later
< rsv_>
i'm not completely sure if this problem is specific to my setup, but i'm getting an armadillo compile error on one of the headers
< rsv_>
/usr/include/armadillo_bits/arma_config.hpp:17:18: error: ‘uword’ does not name a type
Udit has joined #mlpack
< naywhayare>
use arma::uword, that should fix it
< naywhayare>
..I think?
< naywhayare>
that's an odd place to get the error from
Udit has quit [Quit: Udit]
< rsv_>
now it's complaining about sword, same thing? arma::sword
< rsv_>
seems to compile now
< naywhayare>
did you have to modify the armadillo sources to make that work?
< rsv_>
as well as /usr/local/include/mlpack/core/util/arma_config_check.hpp actually
< rsv_>
i had to change #include "arma_config.hpp" to #include "/usr/include/armadillo_bits/arma_config.hpp" too
< naywhayare>
that seems rather odd to me that that was necessary, but if it works, I won't ask questions :)
< rsv_>
yeah, i had mlpack installed before i updated to the master branch and everything worked out of the box... weird..
< naywhayare>
ah, if you have mlpack in two places on the system you can get weird include path issues sometimes...
< rsv_>
fair enough
< rsv_>
ah, it was related to an update of armadillo that i made, i reinstalled mlpack and everything is now fine out of the box
tham has joined #mlpack
< naywhayare>
ah, good to hear
< tham>
Hi, does mlpack have something like a roadmap?
< tham>
which could show users what the targets for the future are
< naywhayare>
tham: I don't understand what you mean; do you mean something like a collaborative filtering model for prediction?
< naywhayare>
rsv_: I just committed the final changes I wanted to make to the LogisticRegression class (no big API changes; added another Train() method though), so you may want to update from github
< naywhayare>
it shouldn't break anything :)
< tham>
Not something like that; more like the release date of the next version, what kinds of algorithms will be added into mlpack, things like that
< naywhayare>
oh! sorry, I misunderstood
< naywhayare>
I'm working towards a release now, and I hope to have it done near the end of the month or shortly thereafter
< rsv_>
naywhayare: thanks!
< naywhayare>
you can take a look at HISTORY.txt in the github master repo to see what has changed since 1.0.12
< tham>
I noticed that github has a new folder called ann; there is an algorithm called cnn
< naywhayare>
sorry, it's HISTORY.md now
< naywhayare>
yeah, zoq is the expert there; I know that it's a set of classes and functions for neural networks, but I don't know too much more than that
< tham>
Thanks. I just downloaded and compiled the mlpack 1.0.12 64-bit version on Windows 8.1 with Visual Studio 2015; does anyone need the details?
< naywhayare>
yeah, I'd appreciate it if you could write down what you did or something
< naywhayare>
I have been meaning for ages to get a stable windows build server going, but I haven't had the time. if you have instructions, that should make it much easier :)
< rsv>
okay, that improved the computational time by several orders of magnitude probably
< rsv>
it does work now
< rsv>
running it on the 4Mx1.6K matrix will take some time, but i'll let you know if i get it working
< naywhayare>
yeah, I hope it works
< naywhayare>
it will be slower than using a dense matrix, but it will probably be faster than ordering more RAM, waiting for it to ship, installing it, and then running it :)
< rsv>
prooobably :)
travis-ci has joined #mlpack
< travis-ci>
mlpack/mlpack#242 (master - 9295469 : Ryan Curtin): The build was broken.