verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
Udit has joined #mlpack
Udit has quit [Quit: Udit]
Udit has joined #mlpack
Udit has quit [Quit: Udit]
Udit has joined #mlpack
Udit has quit [Ping timeout: 255 seconds]
Udit has joined #mlpack
Udit has quit [Ping timeout: 240 seconds]
Udit has joined #mlpack
Udit has quit [Client Quit]
Udit has joined #mlpack
Udit has quit [Quit: Udit]
Udit has joined #mlpack
Udit has quit [Quit: Udit]
Udit has joined #mlpack
rsv has joined #mlpack
< rsv>
hi again, i have a new question today... it looks like the matrix I am trying to use for LogisticRegression is too big, is there a way to train LogisticRegression in a progressive way to avoid loading a giant matrix?
< rsv>
the error i'm seeing is: error: arma::memory::acquire(): out of memory
< rsv>
the matrix is 4,000,000 observations * 1600 variables
< naywhayare>
rsv: yeah, it can be done, but it'll be a little bit tricky...
< naywhayare>
basically, you can convert your file into a binary format (so, not csv -- just doubles serialized to a disk)
< naywhayare>
then you can use mmap() to get a pointer to this data, and then you can wrap an Armadillo object around that using the constructor that takes a memory pointer
< naywhayare>
it will certainly be slower than if you could fit the entire matrix in memory, but it should at least work
< naywhayare>
it's been a while since I have done that, so I don't remember the exact mmap() syntax, but it should be possible
< rsv>
okay
< naywhayare>
I'd eventually like to get nice support in mlpack for mmap'ing files, but my time is unfortunately pretty limited
< naywhayare>
I wish I could give you a nicer answer...
< rsv>
this is quite helpful though
< naywhayare>
that's good, at least :)
< naywhayare>
hmm, another thought is that if your matrix is sparse, I think you should be able to use armadillo sparse matrices with logistic regression
< naywhayare>
but I haven't tried that
< rsv>
the matrix is indeed sparse
< rsv>
of the 1600 variables most of them will be 0
< naywhayare>
ah, that may be a better approach then
< naywhayare>
I've actually just now been working with refactoring the LogisticRegression class to prepare for the next release
< naywhayare>
I pushed my code to the master repo just now
< rsv>
so armadillo has special constructions for sparse matrices?
< naywhayare>
the changes allow the LogisticRegression class to work with sparse matrices
< rsv>
which are more memory efficient somehow?
< rsv>
oh, really, that's great
< naywhayare>
yeah, Armadillo has the "arma::sp_mat" class
< naywhayare>
it's going to be slow, because the extra overhead of managing the sparse matrix is high
< naywhayare>
but you should be able to fit it into memory still
< naywhayare>
let me do a bit of reading to refresh my memory on how you would load that matrix...
rsv_ has joined #mlpack
< rsv_>
excellent
< naywhayare>
rsv_: okay, the armadillo sparse matrix support still isn't finished, and the loading is a little bit rough
rsv has quit [Ping timeout: 246 seconds]
< naywhayare>
I think your best bet is to do this:
< naywhayare>
convert your data file to a coordinate list format (so, each row has three columns: row, column, value)
< rsv_>
is that different from csv?
< naywhayare>
yeah
< naywhayare>
so right now I assume your CSV looks like this:
< naywhayare>
the constructor I'm thinking of is form 1 or form 2
< naywhayare>
where you have a 2-column matrix containing the locations of the nonzero values (row/column pairs)
< naywhayare>
and a vector containing each of the nonzero values
< rsv_>
ahh okay, got it
< naywhayare>
then you should be able to pass that sparse matrix to the LogisticRegression class
< naywhayare>
but you'll have to use the git master branch, and the API for LogisticRegression has changed a little bit
Udit has quit [Quit: Udit]
< rsv_>
okay
< naywhayare>
so the class will be LogisticRegression<arma::sp_mat>
< rsv_>
and that part happens the same way as if it were a normal matrix
< naywhayare>
yeah, it should operate the same as if you were using arma::mat, as long as you specify arma::sp_mat as the template parameter
< naywhayare>
there may be slight differences when you update to the latest git master revision; if you have problems I am happy to help work them out
< rsv_>
okay, cool
< rsv_>
i'll give this a try
Udit has joined #mlpack
< naywhayare>
I was actually going to write some tests for sparse matrices now, but then you said you needed the support (interesting coincidence), so I pushed the nearly-done work that I had :)
< rsv_>
great! i guess we'll both be testing it now
< naywhayare>
yeah, please keep me updated as to how it works or if you have any problems
< naywhayare>
(or if you think the API is clunky and should change)
< rsv_>
will do
< naywhayare>
the other note for the refactoring is that LogisticRegression used to have an OptimizerType template parameter (which you'd specify to be SGD or L_BFGS or whatever)
< naywhayare>
but that's realistically only necessary for training, so now that template parameter only needs to be specified in some of the constructors or in the Train() function
< rsv_>
it doesn't anymore?
< rsv_>
so now it's just like LogisticRegression lr(A, b) without specifying an OptimizerType?
< naywhayare>
it'll actually look like this:
< naywhayare>
LogisticRegression<> lr<OptimizerType>(A, b)
< rsv_>
ah
< rsv_>
so it's safe to update to the latest git master branch and try this out?
< naywhayare>
the LogisticRegression class has a template parameter which is just the type of the matrix (defaults to arma::mat)
< naywhayare>
yeah, should be safe to update
< naywhayare>
sorry for the delayed response... I was intercepted by someone
< rsv_>
no worries, thanks
7GHAA7L26 has joined #mlpack
< 7GHAA7L26>
mlpack/mlpack#240 (master - e67787e : Ryan Curtin): The build passed.
< naywhayare>
heh, interesting choice of bot name...
Udit has quit [Quit: Udit]
< naywhayare>
ah, this syntax is wrong: "LogisticRegression<arma::sp_mat> lr<SGD>(A, b);"
< naywhayare>
it turns out that explicitly specifying template parameters of constructors is not allowed because C++ is complicated...
< rsv_>
oh
< naywhayare>
so the correct thing to do, I think, will be "LogisticRegression<arma::sp_mat> lr(dimensionality, regularization); lr.Train<SGD>(A, b);"
< naywhayare>
dimensionality will just be A.n_rows
< rsv_>
ah okay
< naywhayare>
and I guess you were using regularization 0
< naywhayare>
I'm trying to get my tests working now... I'll let you know if I find anything else wrong :)
< rsv_>
i'll end up using custom regularization to implement lasso :)
< naywhayare>
ah, okay
< rsv_>
what type is the regularization parameter?
< naywhayare>
just a double
< rsv_>
why is this constructor so different than the one that's like LogisticRegression (const arma::mat &predictors, const arma::vec &responses, const double lambda=0)
< naywhayare>
what do you mean?
< rsv_>
you said to use lr(dimensionality, regularization) which doesn't get constructed with the matrix of predictors and vector of responses
< rsv_>
why are these different?
< naywhayare>
the difference is because I've just refactored the LogisticRegression code
< rsv_>
ah, of course
< rsv_>
i should really look at the code before asking stupid questions...
< naywhayare>
I did two major things: I removed the OptimizerType template parameter from the LogisticRegression class, and allowed the OptimizerType template parameter to be specified only when training is happening
< rsv_>
got it
< naywhayare>
and then I added the MatType template parameter to LogisticRegression, so that you can use arma::mat or arma::sp_mat or whatever (even things like arma::Mat<float> and arma::Mat<int> should work, I think)
< naywhayare>
feel free to ask questions :) I still haven't finished the updated documentation
< naywhayare>
and it only gets rebuilt nightly for the mlpack website, so the online doxygen documentation isn't up to date at the moment :)
< rsv_>
perfect
< naywhayare>
lunch time... back later
< rsv_>
i'm not completely sure if this problem is specific to my setup, but i'm getting an armadillo compile error on one of the headers
< rsv_>
/usr/include/armadillo_bits/arma_config.hpp:17:18: error: ‘uword’ does not name a type
Udit has joined #mlpack
< naywhayare>
use arma::uword, that should fix it
< naywhayare>
..I think?
< naywhayare>
that's an odd place to get the error from
Udit has quit [Quit: Udit]
< rsv_>
now it's complaining about sword, same thing? arma::sword
< rsv_>
seems to compile now
< naywhayare>
did you have to modify the armadillo sources to make that work?
< rsv_>
as well as /usr/local/include/mlpack/core/util/arma_config_check.hpp actually
< rsv_>
i had to change #include "arma_config.hpp" to #include "/usr/include/armadillo_bits/arma_config.hpp" too
< naywhayare>
that seems rather odd to me that that was necessary, but if it works, I won't ask questions :)
< rsv_>
yeah, i had mlpack installed before i updated to the master branch and everything worked out of the box... weird..
< naywhayare>
ah, if you have mlpack in two places on the system you can get weird include path issues sometimes...
< rsv_>
fair enough
< rsv_>
ah, it was related to an update of armadillo that i made, i reinstalled mlpack and everything is now fine out of the box
tham has joined #mlpack
< naywhayare>
ah, good to hear
< tham>
Hi, does mlpack have something like a roadmap?
< tham>
which could show users what the targets for the future are
< naywhayare>
tham: I don't understand what you mean; do you mean something like a collaborative filtering model for prediction?
< naywhayare>
rsv_: I just committed the final changes I wanted to make to the LogisticRegression class (no big API changes; added another Train() method though), so you may want to update from github
< naywhayare>
it shouldn't break anything :)
< tham>
Not something like that; more like the release date of the next version, what kinds of algorithms will be added into mlpack, things like that
< naywhayare>
oh! sorry, I misunderstood
< naywhayare>
I'm working towards a release now, and I hope to have it done near the end of the month or shortly thereafter
< rsv_>
naywhayare: thanks!
< naywhayare>
you can take a look at HISTORY.txt in the github master repo to see what has changed since 1.0.12
< tham>
I noticed that github has a new folder called ann; there is an algorithm called cnn
< naywhayare>
sorry, it's HISTORY.md now
< naywhayare>
yeah, zoq is the expert there; I know that it's a set of classes and functions for neural networks, but I don't know too much more than that
< tham>
Thanks. I just downloaded and compiled the mlpack 1.0.12 64-bit version on Windows 8.1 with Visual Studio 2015; does anyone need the details?
< naywhayare>
yeah, I'd appreciate it if you could write down what you did or something
< naywhayare>
I have been meaning for ages to get a stable windows build server going, but I haven't had the time. if you have instructions, that should make it much easier :)
< rsv>
okay, that improved the computational time by several orders of magnitude probably
< rsv>
it does work now
< rsv>
running it on the 4Mx1.6K matrix will take some time, but i'll let you know if i get it working
< naywhayare>
yeah, I hope it works
< naywhayare>
it will be slower than using a dense matrix, but it will probably be faster than ordering more RAM, waiting for it to ship, installing it, and then running it :)
< rsv>
prooobably :)
travis-ci has joined #mlpack
< travis-ci>
mlpack/mlpack#242 (master - 9295469 : Ryan Curtin): The build was broken.