verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
rsv has joined #mlpack
< rsv> hi again, i have a new question today... it looks like the matrix I am trying to use for LogisticRegression is too big, is there a way to train LogisticRegression in a progressive way to avoid loading a giant matrix?
< rsv> the error i'm seeing is: error: arma::memory::acquire(): out of memory
< rsv> the matrix is 4,000,000 observations * 1600 variables
< naywhayare> rsv: yeah, it can be done, but it'll be a little bit tricky...
< naywhayare> basically, you can convert your file into a binary format (so, not CSV -- just doubles serialized to disk)
< naywhayare> then you can use mmap() to get a pointer to this data, and then you can wrap an Armadillo object around it using the constructor that takes a memory pointer
< naywhayare> it will certainly be slower than if you could fit the entire matrix in memory, but it should at least work
< naywhayare> it's been a while since I have done that, so I don't remember the exact mmap() syntax, but it should be possible
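A minimal sketch of that approach, assuming the data has already been written to disk as a packed, column-major array of doubles; the filename "data.bin" and the dimensions are placeholders, and error checks are omitted:

    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <armadillo>

    int main()
    {
      const size_t rows = 1600;      // dimensions (one column per observation)
      const size_t cols = 4000000;   // observations
      const size_t bytes = rows * cols * sizeof(double);

      // Map the raw binary file; the OS pages it in on demand.
      int fd = open("data.bin", O_RDONLY);
      double* ptr = (double*) mmap(NULL, bytes, PROT_READ, MAP_SHARED, fd, 0);

      // Wrap an Armadillo matrix around the mapped memory without copying:
      // copy_aux_mem = false avoids a copy; strict = true prevents Armadillo
      // from reallocating.  The mapping is read-only, so only read from it.
      arma::mat data(ptr, rows, cols, /* copy_aux_mem */ false, /* strict */ true);

      // ... train on 'data' here ...

      munmap(ptr, bytes);
      close(fd);
      return 0;
    }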
< rsv> okay
< naywhayare> I'd eventually like to get nice support in mlpack for mmap'ing files, but my time is unfortunately pretty limited
< naywhayare> I wish I could give you a nicer answer...
< rsv> this is quite helpful though
< naywhayare> that's good, at least :)
< naywhayare> hmm, another thought is that if your matrix is sparse, I think you should be able to use armadillo sparse matrices with logistic regression
< naywhayare> but I haven't tried that
< rsv> the matrix is indeed sparse
< rsv> of the 1600 variables most of them will be 0
< naywhayare> ah, that may be a better approach then
< naywhayare> I've actually just now been working with refactoring the LogisticRegression class to prepare for the next release
< naywhayare> I pushed my code to the master repo just now
< rsv> so armadillo has special constructions for sparse matrices?
< naywhayare> the changes allow the LogisticRegression class to work with sparse matrices
< rsv> which are more memory efficient somehow?
< rsv> oh, really, that's great
< naywhayare> yeah, Armadillo has the "arma::sp_mat" class
< naywhayare> it's going to be slow, because the extra overhead of managing the sparse matrix is high
< naywhayare> but you should be able to fit it into memory still
< naywhayare> let me do a bit of reading to refresh my memory on how you would load that matrix...
rsv_ has joined #mlpack
< rsv_> excellent
< naywhayare> rsv_: okay, the armadillo sparse matrix support still isn't finished, and the loading is a little bit rough
rsv has quit [Ping timeout: 246 seconds]
< naywhayare> I think your best bet is to do this:
< naywhayare> convert your data file to a coordinate list format (so, each row has three columns: row, column, value)
< rsv_> is that different from csv?
< naywhayare> yeah
< naywhayare> so right now I assume your CSV looks like this:
< naywhayare> 1, 0, 0, 0, 0, 0, 0, ... (lots of zeroes), 3, 0, 0, 0, ...
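In coordinate-list form, that same row would reduce to just its nonzero entries, one (row, column, value) triple per line; the column position of the "3" here is made up for illustration:

    0 0 1
    0 7 3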
< rsv_> i've actually been loading matrices in code from mysql
< naywhayare> oh, okay, I have no idea how you are doing that
< rsv_> and loading values into mat objects
< naywhayare> okay
< naywhayare> in that case take a look at this sparse matrix constructor:
< rsv_> but you're saying to load these into files
< naywhayare> no, if you're working from sql, importing that directly into an arma::sp_mat object is probably the best thing to do
< rsv_> okay
< naywhayare> the constructor I'm thinking of is form 1 or form 2
< naywhayare> where you have a 2-column matrix containing the locations of the nonzero values (row/column pairs)
< naywhayare> and a vector containing each of the nonzero values
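A small sketch of that batch-insertion constructor; note that in Armadillo's API the locations matrix has one column per nonzero entry (row index in its first row, column index in its second), and the entries here are made up for illustration:

    #include <armadillo>

    int main()
    {
      // Three nonzero entries, at (0,2), (3,2), and (7,5).
      arma::umat locations(2, 3);
      locations << 0 << 3 << 7 << arma::endr   // row indices
                << 2 << 2 << 5 << arma::endr;  // column indices

      // The nonzero values, in the same order as the locations.
      arma::vec values(3);
      values(0) = 1.0; values(1) = 3.0; values(2) = 2.5;

      // Batch insertion; another form also takes explicit n_rows/n_cols.
      arma::sp_mat A(locations, values);
      A.print("A:");
      return 0;
    }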
< rsv_> ahh okay, got it
< naywhayare> then you should be able to pass that sparse matrix to the LogisticRegression class
< naywhayare> but you'll have to use the git master branch, and the API for LogisticRegression has changed a little bit
< rsv_> okay
< naywhayare> so the class will be LogisticRegression<arma::sp_mat>
< rsv_> and that part happens the same way as if it were a normal matrix
< naywhayare> yeah, it should operate the same as if you were using arma::mat, as long as you specify arma::sp_mat as the template parameter
< naywhayare> there may be slight differences when you update to the latest git master revision; if you have problems I am happy to help work them out
< rsv_> okay, cool
< rsv_> i'll give this a try
< naywhayare> I was actually going to write some tests for sparse matrices now, but then you said you needed the support (interesting coincidence), so I pushed the nearly-done work that I had :)
< rsv_> great! i guess we'll both be testing it now
< naywhayare> yeah, please keep me updated as to how it works or if you have any problems
< naywhayare> (or if you think the API is clunky and should change)
< rsv_> will do
< naywhayare> the other note for the refactoring is that LogisticRegression used to have an OptimizerType template parameter (which you'd specify to be SGD or L_BFGS or whatever)
< naywhayare> but that's realistically only necessary for training, so now that template parameter only needs to be specified in some of the constructors or in the Train() function
< rsv_> it doesn't anymore?
< rsv_> so now it's just like LogisticRegression lr(A, b) without specifying an OptimizerType?
< naywhayare> it'll actually look like this:
< naywhayare> LogisticRegression<> lr<OptimizerType>(A, b)
< rsv_> ah
< rsv_> so it's safe to update to the latest git master branch and try this out?
< naywhayare> the LogisticRegression class has a template parameter which is just the type of the matrix (defaults to arma::mat)
< naywhayare> yeah, should be safe to update
< naywhayare> sorry for the delayed response... I was intercepted by someone
< rsv_> no worries, thanks
7GHAA7L26 has joined #mlpack
< 7GHAA7L26> mlpack/mlpack#240 (master - e67787e : Ryan Curtin): The build passed.
7GHAA7L26 has left #mlpack []
< naywhayare> heh, interesting choice of bot name...
< naywhayare> ah, this syntax is wrong: "LogisticRegression<arma::sp_mat> lr<SGD>(A, b);"
< naywhayare> it turns out that explicitly specifying template parameters of constructors is not allowed because C++ is complicated...
< rsv_> oh
< naywhayare> so the correct thing to do, I think, will be "LogisticRegression<arma::sp_mat> lr(dimensionality, regularization); lr.Train<SGD>(A, b);"
< naywhayare> dimensionality will just be A.n_rows
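Expanded into a sketch (the function name is just for illustration, and, as comes up further down in the log, the responses in this revision are an arma::Row<size_t>):

    #include <mlpack/methods/logistic_regression/logistic_regression.hpp>

    using namespace mlpack::regression;

    // A: data matrix (one column per point); b: the 0/1 responses.
    void TrainSparse(const arma::sp_mat& A, const arma::Row<size_t>& b)
    {
      // dimensionality = A.n_rows, regularization lambda = 0.
      LogisticRegression<arma::sp_mat> lr(A.n_rows, 0.0);

      // The optimizer is now chosen at training time; the default is L-BFGS.
      lr.Train(A, b);
    }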
< rsv_> ah okay
< naywhayare> and I guess you were using regularization 0
< naywhayare> I'm trying to get my tests working now... I'll let you know if I find anything else wrong :)
< rsv_> i'll end up using custom regularization to implement lasso :)
< naywhayare> ah, okay
< rsv_> what type is the regularization parameter?
< naywhayare> just a double
< rsv_> why is this constructor so different than the one that's like LogisticRegression (const arma::mat &predictors, const arma::vec &responses, const double lambda=0)
< naywhayare> what do you mean?
< rsv_> you said to use lr(dimensionality, regularization) which doesn't get constructed with the matrix of predictors and vector of responses
< rsv_> why are these different?
< naywhayare> the difference is because I've just refactored the LogisticRegression code
< rsv_> ah, of course
< rsv_> i should really look at the code before asking stupid questions...
< naywhayare> I did two major things: I removed the OptimizerType template parameter from the LogisticRegression class, and allowed the OptimizerType template parameter to be specified only when training is happening
< rsv_> got it
< naywhayare> and then I added the MatType template parameter to LogisticRegression, so that you can use arma::mat or arma::sp_mat or whatever (even things like arma::Mat<float> and arma::Mat<int> should work, I think)
< naywhayare> feel free to ask questions :) I still haven't finished the updated documentation
< naywhayare> and it only gets rebuilt nightly for the mlpack website, so the online doxygen documentation isn't up to date at the moment :)
< rsv_> perfect
< naywhayare> lunch time... back later
< rsv_> i'm not completely sure if this problem is specific to my setup, but i'm getting an armadillo compile error on one of the headers
< rsv_> /usr/include/armadillo_bits/arma_config.hpp:17:18: error: ‘uword’ does not name a type
< naywhayare> use arma::uword, that should fix it
< naywhayare> ..I think?
< naywhayare> that's an odd place to get the error from
< rsv_> now it's complaining about sword, same thing? arma::sword
< rsv_> seems to compile now
< naywhayare> did you have to modify the armadillo sources to make that work?
< rsv_> yes
< rsv_> specifically, /usr/include/armadillo_bits/arma_config.hpp
< rsv_> as well as /usr/local/include/mlpack/core/util/arma_config_check.hpp actually
< rsv_> i had to change #include "arma_config.hpp" to #include "/usr/include/armadillo_bits/arma_config.hpp" too
< naywhayare> that seems rather odd to me that that was necessary, but if it works, I won't ask questions :)
< rsv_> yeah, i had mlpack installed before i updated to the master branch and everything worked out of the box... weird..
< naywhayare> ah, if you have mlpack in two places on the system you can get weird include path issues sometimes...
< rsv_> fair enough
< rsv_> ah, it was related to an update of armadillo that i made, i reinstalled mlpack and everything is now fine out of the box
tham has joined #mlpack
< naywhayare> ah, good to hear
< tham> Hi, does mlpack have something like a road map?
< tham> Something that could show users what the targets for the future are
< naywhayare> tham: I don't understand what you mean; do you mean something like a collaborative filtering model for prediction?
< naywhayare> rsv_: I just committed the final changes I wanted to make to the LogisticRegression class (no big API changes; added another Train() method though), so you may want to update from github
< naywhayare> it shouldn't break anything :)
< tham> Not something like that, more like the release date of the next version, what kinds of algorithms will be added to mlpack, that sort of thing
< naywhayare> oh! sorry, I misunderstood
< naywhayare> I'm working towards a release now, and I hope to have it done near the end of the month or shortly thereafter
< rsv_> naywhayare: thanks!
< naywhayare> you can take a look at HISTORY.txt in the github master repo to see what has changed since 1.0.12
< tham> I noticed that the github repo has a new folder called ann, and there is an algorithm called cnn
< naywhayare> sorry, it's HISTORY.md now
< naywhayare> yeah, zoq is the expert there; I know that it's a set of classes and functions for neural networks, but I don't know too much more than that
< tham> Thanks. I just downloaded and compiled the mlpack 1.0.12 64-bit version on Windows 8.1 with Visual Studio 2015; does anyone need the details?
< naywhayare> yeah, I'd appreciate it if you could write down what you did or something
< naywhayare> I have been meaning for ages to get a stable windows build server going, but I haven't had the time. if you have instructions, that should make it much easier :)
< tham> Yes, I wrote it down here (http://qtandopencv.blogspot.my/2015/09/deep-learning-04-compile-mlpack-1012-on.html). I hope this can save some trouble for other users who would like to use mlpack on Windows
< naywhayare> awesome, thanks so much
< tham> But this is not an automated build
< naywhayare> yeah, I can do the automation, that shouldn't be too hard
< tham> thanks for all your hard work, bye
tham has quit [Quit: Page closed]
< rsv_> Train takes an arma::Row for the responses instead of arma::vec, is that right?
< naywhayare> yes, that was another change that happened
< naywhayare> it didn't make sense to have the labels have type 'double' when they're integers (0 or 1)
< rsv_> true
< rsv_> is the format for using row different? i'm trying Row<bool> b; b << 0 << 1 << (etc) << endr;
< naywhayare> try Row<size_t> not Row<bool>... I'm not sure if bool is supported as an armadillo type
< naywhayare> (in my opinion it should be, but I'm not sure it is)
< rsv_> that works
< rsv_> perfect, my example program works with the same result as before
< naywhayare> and this is using a sparse dataset?
< rsv_> not yet, getting there
< naywhayare> ah, okay. well hopefully it works when you do :)
< rsv_> so i got it to compile with a sparse matrix for A
< rsv_> A is (1,20), b has 20 responses, i simply do LogisticRegression<sp_mat> lr(A.n_rows, 0); lr.Train (A, b);
< rsv_> the program runs but it's taking a while to finish...
< naywhayare> hmm, that will use L-BFGS (the default optimizer). try "lr.Train<SGD>(A, b)" and see if that is any faster
< naywhayare> I know that sparse matrices have a lot more overhead, but I think that SGD should be faster because it passes over the data sequentially
< naywhayare> the particular storage format used by Armadillo is specialized for matrix multiplication too, not element access
< rsv_> where does SGD inherit from? is it not mlpack::optimization::SGD?
< rsv_> weird, mlpack::optimization::SGD won't compile but mlpack::optimization::L_BFGS does
rsv_ has quit [Quit: Page closed]
rsv has joined #mlpack
< rsv> i think i just forgot an include actually
< naywhayare> yeah, src/mlpack/core/optimizers/sgd/sgd.hpp
< naywhayare> (...if I remembered right)
< rsv> yup, that's right
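For reference, the combination that ends up working in this exchange, as a self-contained sketch; the tiny dataset is made up:

    #include <mlpack/core.hpp>
    #include <mlpack/core/optimizers/sgd/sgd.hpp>
    #include <mlpack/methods/logistic_regression/logistic_regression.hpp>

    using namespace mlpack::optimization;
    using namespace mlpack::regression;

    int main()
    {
      // Tiny placeholder problem: 2 dimensions, 4 points (one per column).
      arma::sp_mat A(2, 4);
      A(0, 1) = 1.0; A(1, 2) = 1.0; A(0, 3) = 1.0; A(1, 3) = 1.0;

      arma::Row<size_t> b;
      b << 0 << 1 << 1 << 1 << arma::endr;

      // Construct, then train with SGD instead of the default L-BFGS.
      LogisticRegression<arma::sp_mat> lr(A.n_rows, 0.0);
      lr.Train<SGD>(A, b);
      return 0;
    }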
< rsv> okay, that improved the computational time by several orders of magnitude probably
< rsv> it does work now
< rsv> running this on the 4Mx1.6K matrix will take some time to set up, but i'll let you know if i get it working
< naywhayare> yeah, I hope it works
< naywhayare> it will be slower than using a dense matrix, but it will probably be faster than ordering more RAM, waiting for it to ship, installing it, and then running it :)
< rsv> prooobably :)
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#242 (master - 9295469 : Ryan Curtin): The build was broken.
travis-ci has left #mlpack []
rsv has quit [Ping timeout: 246 seconds]