verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
dhfromkorea has joined #mlpack
dhfromkorea has quit [Remote host closed the connection]
dhfromkorea has joined #mlpack
stephentu_ has quit [Ping timeout: 245 seconds]
< naywhayare>
Squalluca: LARS stores the gram matrix internally (so, 189390x189390, which will be very large...)
< naywhayare>
or, hang on... maybe 13233x13233 depending on if 13233 is the number of dimensions or the number of points
< Squalluca>
13233 is the number of points
< Squalluca>
the bigger number is the dimensions
< naywhayare>
hmm, I think it will be 13233x13233 then, so that should not be a problem
< Squalluca>
so maybe i am passing the wrong matrix to LARS
< naywhayare>
I think the bigger issue for RAM might be that LARS is storing the complete solution path (accessible via BetaPath()), which is std::vector<arma::vec> and each vector has length equal to the number of dimensions
< Squalluca>
i understand; are these values all used in the computation, or are they stored for some other reason?
< Squalluca>
i guess they are needed if you keep them
< naywhayare>
I'm not the one who wrote LARS
< naywhayare>
I'm taking a look at it now
< naywhayare>
I don't *think* they're necessary, but let me glance at it some more...
< Squalluca>
ok, thank you very much :D
< naywhayare>
hm, okay, so what I think is that only the last two elements of betaPath are ever used
< naywhayare>
in InterpolateBeta() (which is called once LARS converges), to calculate the final solution vector beta
< naywhayare>
the code could probably be refactored to only hold the two most recent betas fairly easily (only lars.cpp uses betaPath)
< naywhayare>
unfortunately I have a paper deadline on Friday so I can't look into it further, but my best guess is that that is where the huge memory usage is coming from
< naywhayare>
once the paper deadline passes I'll have a little bit more time...
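A minimal sketch of that refactoring idea, keeping only the two most recent solution vectors instead of the whole path; the surrounding lars.cpp details are assumed, and the struct name is made up:

```cpp
#include <armadillo>
#include <utility>

// Sketch only (not the actual lars.cpp code): hold just the two most
// recent betas instead of the full std::vector<arma::vec> betaPath.
struct TwoBetaHistory
{
  arma::vec prev, cur;

  // Replaces betaPath.push_back(beta); memory stays at two vectors
  // instead of growing by one vector per LARS iteration.
  void Push(arma::vec beta)
  {
    prev = std::move(cur);
    cur = std::move(beta);
  }
};

// InterpolateBeta() would then read history.prev and history.cur in place
// of betaPath[betaPath.size() - 2] and betaPath.back().
```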
< Squalluca>
no problem, and thanks; i'll look into it more, because i think the Gram matrix is 189390x189390, and that is the dimension that crashes my program
< Squalluca>
when it grows
< Squalluca>
thank you again, you have been very helpful, cya.
< naywhayare>
yeah, the gram matrix should be (number of points) x (number of points), unless I've got my logic backwards
< naywhayare>
crap! I do have it backwards. so the gram matrix is 189390x189390
< naywhayare>
there's not a very easy way to solve that problem unless you have 267GB of RAM or so :)
< naywhayare>
and since my logic is backwards, each element in betaPath will be of length (number of points), so that's probably not the bulk of your memory usage
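(To spell out the arithmetic behind the 267GB figure: a dense 189390x189390 matrix of doubles needs 189390^2 entries at 8 bytes each, about 2.87 x 10^11 bytes, which is roughly 267 GiB.)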
< Squalluca>
mhhh
< Squalluca>
: (
< Squalluca>
i'll try to figure something out
< naywhayare>
sorry for the bad news...
< naywhayare>
refactoring LARS to not require the whole Gram matrix in memory at once would be a very significant effort
< naywhayare>
it might be better to think about doing some dimensionality reduction like PCA first, maybe?
< naywhayare>
I don't know what the application is, so I don't know if that's a good or bad idea
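A minimal sketch of the PCA suggestion, assuming the mlpack 1.x API (mlpack::pca::PCA, with one point per column of the matrix); the target dimension of 1000 is purely illustrative:

```cpp
#include <mlpack/methods/pca/pca.hpp>

// Sketch: reduce the dimensionality before running LARS, so that the
// (dimensions x dimensions) Gram matrix becomes tractable.
void ReduceDimensions(arma::mat& data /* 189390 rows x 13233 columns */)
{
  mlpack::pca::PCA pca;
  pca.Apply(data, 1000); // overwrites 'data' with its top 1000 components
}
```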
< Squalluca>
it is work based on a paper called "blessing of dimensionality"; it uses regression to go from a really high-dimensional space to a low-dimensional one, for face recognition
< Squalluca>
it is intended to use high dimensions
< Squalluca>
in fact they use a coordinate descent method for regression, maybe that doesn't use the gram matrix?
< naywhayare>
yeah, techniques that work in extremely high dimensions should definitely avoid calculating the explicit gram matrix
< naywhayare>
is this the paper by D. Chen et al. at CVPR 2013? it looks interesting
< Squalluca>
yes
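For reference, a minimal sketch of the kind of coordinate descent update that avoids the explicit Gram matrix; it touches only one column of X at a time. This is the generic textbook soft-thresholding update, not necessarily the method from the paper, and the function name is made up:

```cpp
#include <armadillo>
#include <algorithm>
#include <cmath>

// Sketch: lasso via coordinate descent, minimizing
//   0.5 * ||y - X * beta||^2 + lambda * ||beta||_1
// without ever forming the (d x d) Gram matrix X' * X.
arma::vec LassoCD(const arma::mat& X, const arma::vec& y,
                  const double lambda, const size_t maxIter = 100)
{
  arma::vec beta(X.n_cols, arma::fill::zeros);
  arma::vec r = y; // running residual, r = y - X * beta

  for (size_t it = 0; it < maxIter; ++it)
  {
    for (size_t j = 0; j < X.n_cols; ++j)
    {
      const double norm2 = arma::dot(X.col(j), X.col(j));
      // Correlation of column j with the partial residual; only this one
      // column of X is needed, never the full Gram matrix.
      const double rho = arma::dot(X.col(j), r) + norm2 * beta(j);
      const double old = beta(j);
      // Soft-thresholding update for the l1 penalty.
      beta(j) = std::copysign(std::max(std::abs(rho) - lambda, 0.0), rho)
          / norm2;
      r -= (beta(j) - old) * X.col(j); // keep the residual up to date
    }
  }

  return beta;
}
```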
stephentu_ has joined #mlpack
Squalluca has quit [Quit: Page closed]
dhfromkorea has quit [Remote host closed the connection]
stephentu_ has quit [Ping timeout: 252 seconds]
< stephentu>
naywhayare: so i downloaded arma-4.0 from sourceforge
< stephentu>
but it's weird b/c my system config.hpp
< stephentu>
uses this wrapper stuff
< stephentu>
any hints
< stephentu>
i'm on arch linux
< stephentu>
slash if you have a better process
< stephentu>
for building w/ older versions
< stephentu>
i'm all ears
jbc__ has quit [Quit: jbc__]
jbc_ has joined #mlpack
dhfromkorea has joined #mlpack
dhfromkorea has quit [Ping timeout: 265 seconds]
curiousguy13 has quit [Read error: Connection timed out]
< naywhayare>
stephentu: what did you set -DARMADILLO_LIBRARY to?
< naywhayare>
it should be the path to libarmadillo.so, not the path to the directory containing it
< naywhayare>
(that's my first guess)
< naywhayare>
oh
< naywhayare>
I bet I know what's gone wrong
< naywhayare>
so, this is kind of an oddity and I'm not responsible for it. during the armadillo build, it uses CMake to generate a config.hpp, and places it, along with armadillo and armadillo_bits/, into "tmp/include/"
< naywhayare>
so ARMADILLO_INCLUDE_DIR should be /path/to/armadillo/tmp/include/, if you only did 'make' and not 'make install'
< naywhayare>
but the library is still in /path/to/armadillo/libarmadillo.so...
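So, a hedged example of what the configuration might look like against an in-tree Armadillo build ('make' only, no 'make install'); the paths are placeholders:

```sh
cmake \
  -DARMADILLO_INCLUDE_DIR=/path/to/armadillo/tmp/include/ \
  -DARMADILLO_LIBRARY=/path/to/armadillo/libarmadillo.so \
  ../
```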
< stephentu>
oooo
< stephentu>
that might explain it!
< stephentu>
i'll try a make install
dhfromkorea has joined #mlpack
dhfromkorea has quit [Ping timeout: 265 seconds]
jbc_ has quit [Quit: jbc_]
kshitijk has joined #mlpack
vedhu63w has joined #mlpack
vedhu63w has quit [Remote host closed the connection]
dhfromkorea has joined #mlpack
dhfromkorea has quit [Ping timeout: 265 seconds]
udit_s has joined #mlpack
curiousguy13 has joined #mlpack
udit_s has quit [Ping timeout: 252 seconds]
kshitijk has quit [Ping timeout: 264 seconds]
govg has quit [Ping timeout: 256 seconds]
govg has joined #mlpack
curiousguy13 has quit [Ping timeout: 265 seconds]
udit_s has joined #mlpack
curiousguy13 has joined #mlpack
curiousguy13 has quit [Ping timeout: 265 seconds]
kshitijk has joined #mlpack
curiousguy13 has joined #mlpack
stephentu has quit [Quit: Lost terminal]
kshitijk has quit [Ping timeout: 264 seconds]
curiousguy13 has quit [Ping timeout: 256 seconds]
udit_s has quit [Remote host closed the connection]
kshitijk has joined #mlpack
kshitijk has quit [Ping timeout: 245 seconds]
kshitijk has joined #mlpack
curiousguy13 has joined #mlpack
jbc_ has joined #mlpack
stephentu has joined #mlpack
curiousguy13 has quit [Ping timeout: 252 seconds]
kshitijk has quit [Ping timeout: 245 seconds]
curiousguy13 has joined #mlpack
< stephentu>
naywhayare: the problem was somebody put a non-symmetric cov matrix
< stephentu>
for the gaussian tests
< stephentu>
and then i think the ifdef thing you did
< stephentu>
caused it to factorize it differently
< stephentu>
good times
kshitijk has joined #mlpack
< naywhayare>
"2 1.5; 1 4"
< naywhayare>
what was I thinking? :(
< naywhayare>
well, thanks for digging to the bottom of it :)
< stephentu>
naywhayare: i'm surprised the cholesky call didn't fail
< stephentu>
maybe it just looks at the lower triangle
< stephentu>
or something
< naywhayare>
yeah; from the LAPACK documentation:
< naywhayare>
If UPLO = 'U', the leading N-by-N upper triangular part of A contains the upper triangular part of the matrix A, and the strictly lower triangular part of A is not referenced.
< naywhayare>
(the opposite applies for UPLO = 'L')
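A small sketch illustrating that behavior on the matrix in question, assuming Armadillo's chol() calls dpotrf with UPLO='U' so that the strictly lower triangle is ignored:

```cpp
#include <armadillo>
#include <iostream>

int main()
{
  arma::mat A("2 1.5; 1 4"); // not symmetric: A(0,1) = 1.5 but A(1,0) = 1
  arma::mat R;

  // Only the upper triangle of A is read, so this factorizes A as if it
  // were the symmetric matrix [2 1.5; 1.5 4], and no error is reported.
  if (arma::chol(R, A))
    std::cout << "R =\n" << R << "R.t() * R =\n" << R.t() * R;

  return 0;
}
```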
< stephentu>
should symmetric matrices be a separate type?
< stephentu>
(this is a design question)
< stephentu>
(not an actual suggestion)
< naywhayare>
I wouldn't be opposed to the idea, especially because you can represent the matrix in far less space
< naywhayare>
but I think in this case we're constrained by what LAPACK and BLAS want, which is an NxM block of memory, so being clever with the storage doesn't get you anything
< naywhayare>
personally I would be more interested in a way to "mark" matrices as symmetric
< stephentu>
actually i was thinking about that
< stephentu>
like i'm wondering if there should be these bits
< stephentu>
in every matrix
< stephentu>
like IS_UPPER_TRIANGULAR
< stephentu>
IS_SYMMETRIC
< stephentu>
IS_DIAGONAL
< naywhayare>
but your choices there in C++ seem to be (a) a runtime member... increases sizeof(mat), bad; (b) specify it as a template parameter, but now you have a thousand template parameters and the syntax starts to resemble Eigen, which in my opinion is overcomplex
< stephentu>
hey eigen only has 4
< stephentu>
template parameters
< naywhayare>
(c) use some kind of expression to mark it... "this_matrix_is_diagonal(matrix)"
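A minimal sketch of option (b), just to make the trade-off concrete; every name here (MatShape, ShapedMat, Solve) is hypothetical, not Armadillo or mlpack API:

```cpp
#include <armadillo>

// Hypothetical compile-time "shape" tag. Storage stays a plain block of
// memory, as BLAS/LAPACK require, and sizeof is unchanged at runtime.
enum class MatShape { General, Symmetric, Diagonal, UpperTriangular };

template<MatShape Shape = MatShape::General>
struct ShapedMat
{
  arma::mat m;
};

// Generic fallback.
template<MatShape S>
arma::vec Solve(const ShapedMat<S>& A, const arma::vec& b)
{
  return arma::solve(A.m, b);
}

// Overload chosen at compile time when the matrix is marked symmetric;
// in principle this could dispatch to a symmetric-specific LAPACK routine.
arma::vec Solve(const ShapedMat<MatShape::Symmetric>& A, const arma::vec& b)
{
  return arma::solve(arma::symmatu(A.m), b);
}
```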
< stephentu>
or maybe 5
< stephentu>
i can't remember
< stephentu>
:)
< naywhayare>
:)
< stephentu>
i actually really like eigen
< naywhayare>
either way I think Eigen is overcomplex and the learning curve is pretty steep, which is why I originally chose Armadillo
< stephentu>
i'm starting a new project and i hate to say it, but i used eigen
< naywhayare>
I mean, if it does the job, it does the job
< naywhayare>
I imagine Eigen gives you more flexibility, but at the cost of the (reasonably) nice syntax that Armadillo has
< naywhayare>
I'm not particularly experienced with Eigen... I perused the docs enough to make a design decision against it years ago
< stephentu>
"Implementing an algorithm on top of Eigen feels like just copying pseudocode."
< stephentu>
haha
< stephentu>
how's the ICML going
< naywhayare>
I have an algorithm for k-means. it's reasonably fast, but it only really shines with large k and large datasets
< naywhayare>
I've got 52.5 hours to get simulations run on datasets that are "large enough"
< naywhayare>
so... we'll see...
< stephentu>
good luck
< naywhayare>
thanks... unfortunately, most of what I have to do is waiting
kshitijk has quit [Ping timeout: 240 seconds]
< stephentu>
prove some theorems while waiting?
< stephentu>
the way i see it
< stephentu>
your algorithm could either
< stephentu>
a) work in practice
< stephentu>
or b) have theoretical guarantees
curiousguy13_ has joined #mlpack
curiousguy13 has quit [Read error: Connection timed out]
< naywhayare>
stephentu: it is possible to have both :)
curiousguy13__ has joined #mlpack
curiousguy13_ has quit [Read error: Connection timed out]
stephentu_ has joined #mlpack
< stephentu_>
naywhayare: that's like a phd :)
curiousguy13__ has quit [Read error: Connection timed out]
curiousguy13__ has joined #mlpack
curiousguy13_ has joined #mlpack
curiousguy13__ has quit [Read error: Connection timed out]
stephentu_ has quit [Read error: Connection reset by peer]
curiousguy13__ has joined #mlpack
stephentu_ has joined #mlpack
curiousguy13_ has quit [Ping timeout: 256 seconds]
curiousguy13__ has quit [Read error: Connection timed out]
curiousguy13 has joined #mlpack
stephentu_ has quit [Ping timeout: 245 seconds]
stephentu_ has joined #mlpack
curiousguy13_ has joined #mlpack
curiousguy13 has quit [Read error: Connection timed out]
curiousguy13_ has quit [Read error: Connection timed out]