verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
dhfromkorea has joined #mlpack
dhfromkorea has quit [Remote host closed the connection]
dhfromkorea has joined #mlpack
stephentu_ has quit [Ping timeout: 245 seconds]
< naywhayare> Squalluca: LARS stores the gram matrix internally (so, 189390x189390, which will be very large...)
< naywhayare> or, hang on... maybe 13233x13233 depending on if 13233 is the number of dimensions or the number of points
< Squalluca> 13233 is the number of points
< Squalluca> the bigger number is the number of dimensions
< naywhayare> hmm, I think it will be 13233x13233 then, so that should not be a problem
< Squalluca> so maybe i am passing the wrong matrix to LARS
< naywhayare> I think the bigger issue for RAM might be that LARS is storing the complete solution path (accessible via BetaPath()), which is std::vector<arma::vec> and each vector has length equal to the number of dimensions
< Squalluca> i understand; are these values all used in the computation, or are they stored for some other reason?
< Squalluca> i guess they are needed if you keep them
< naywhayare> I'm not the one who wrote LARS
< naywhayare> I'm taking a look at it now
< naywhayare> I don't *think* they're necessary, but let me glance at it some more...
< Squalluca> ok, thank you very much :D
< naywhayare> hm, okay, so what I think is that only the last two elements of betaPath are ever used
< naywhayare> in InterpolateBeta() (which is called once LARS converges), to calculate the final solution vector beta
< naywhayare> the code could probably be refactored to only hold the two most recent betas fairly easily (only lars.cpp uses betaPath)
< naywhayare> unfortunately I have a paper deadline on Friday so I can't look into it further, but my best guess is that that is where the huge memory usage is coming from
< naywhayare> once the paper deadline passes I'll have a little bit more time...
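A minimal sketch of the refactor naywhayare describes: keep only the two most recent solution vectors instead of the whole path. The names below are illustrative, not the actual lars.cpp members.

    #include <armadillo>
    #include <utility>

    // Instead of betaPath.push_back(beta) at every LARS iteration, keep just
    // the previous and current solutions, which is all InterpolateBeta() looks
    // at.  Memory drops from (iterations x dims) to (2 x dims).
    arma::vec previousBeta;
    arma::vec currentBeta;

    void StoreBeta(const arma::vec& newBeta)
    {
      previousBeta = std::move(currentBeta);
      currentBeta = newBeta;
    }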
< Squalluca> the bool parameter is referred to by 2 different names
< Squalluca> transposeData and rowMajor
< naywhayare> oops, that's a documentation bug... let me fix that quickly
< naywhayare> the variable should be called transposeData
< Squalluca> so i should set it to false if my matrix is row-major?
< naywhayare> yes, but be aware that if you load data with mlpack::data::Load() and it is row-major on disk, the matrix will be transposed by default
< Squalluca> no i am getting data from an opencv matrix
< naywhayare> ah, okay
< naywhayare> fixed the documentation, thanks for pointing it out -- https://github.com/mlpack/mlpack/commit/b4c08074ca03feaf38511e62fd0e928330b99d93
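For context on the transposition naywhayare mentions, a small sketch of the default loading behavior (the filename is a placeholder):

    #include <mlpack/core.hpp>

    arma::mat dataset;
    // data::Load() transposes by default, so a row-major file on disk (one
    // point per row) ends up with one point per column in memory.
    mlpack::data::Load("data.csv", dataset);

Data coming straight from an OpenCV matrix skips this step, so the transposeData flag has to be set to match however that matrix is laid out.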
< Squalluca> no problem, and thanks, i'll look into it more, because i think the Gram matrix is 189390x189390; that is the size that crashes my program
< Squalluca> when it grows
< Squalluca> thank you again, you have been very helpful, cya.
< naywhayare> yeah, the gram matrix should be (number of points) x (number of points), unless I've got my logic backwards
< naywhayare> crap! I do have it backwards. so the gram matrix is 189390x189390
< naywhayare> there's not a very easy way to solve that problem unless you have 267GB of RAM or so :)
< naywhayare> and since my logic is backwards, each element in betaPath will be of length (number of points), so that's probably not the bulk of your memory usage
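As a rough check on that figure: a dense 189390x189390 matrix of doubles is 189390^2 * 8 ≈ 2.87 * 10^11 bytes, i.e. about 287 GB (267 GiB), before counting anything else LARS allocates.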
< Squalluca> mhhh
< Squalluca> : (
< Squalluca> i'll try to figure something out
< naywhayare> sorry for the bad news...
< naywhayare> refactoring LARS to not require the whole Gram matrix in memory at once would be a very significant effort
< naywhayare> it might be better to think about doing some dimensionality reduction like PCA first, maybe?
< naywhayare> I don't know what the application is, so I don't know if that's a good or bad idea
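A minimal sketch of the PCA suggestion using mlpack's PCA class; the target dimension (500) is just an illustrative choice, and with far more dimensions than points the PCA step may itself need some care:

    #include <mlpack/core.hpp>
    #include <mlpack/methods/pca/pca.hpp>

    // One point per column: 189390 dimensions x 13233 points.
    arma::mat data;
    mlpack::data::Load("faces.csv", data);

    // Project down to a much smaller dimensionality before running LARS.
    mlpack::pca::PCA pca;
    pca.Apply(data, 500);  // keep the top 500 principal components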
< Squalluca> it is work based on a paper called "blessing of dimensionality"; it uses regression to map from a really high-dimensional space down to a low-dimensional one, for face recognition
< Squalluca> it is intended to work with high dimensionality
< Squalluca> in fact they use a coordinate descent method for regression, maybe that doesn't use the gram matrix?
< naywhayare> yeah, techniques that work in extremely high dimensions should definitely avoid calculating the explicit gram matrix
< naywhayare> is this the paper by D. Chen et al. at CVPR 2013? it looks interesting
< Squalluca> yes
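To make the earlier point concrete, a generic textbook-style coordinate descent for the lasso (not code from the paper or from mlpack): each update only needs one column of X and the current residual, so the dims x dims Gram matrix is never formed.

    #include <armadillo>

    // Minimize 0.5 * ||y - X*beta||^2 + lambda * ||beta||_1 by cyclic
    // coordinate descent.  X is (points x dims).
    arma::vec CoordinateDescentLasso(const arma::mat& X, const arma::vec& y,
                                     const double lambda, const size_t maxIter = 100)
    {
      const size_t dims = X.n_cols;
      arma::vec beta(dims, arma::fill::zeros);
      arma::vec residual = y;  // y - X * beta, with beta = 0 initially
      arma::vec colNorms = arma::sum(arma::square(X), 0).t();  // ||x_j||^2

      for (size_t iter = 0; iter < maxIter; ++iter)
      {
        for (size_t j = 0; j < dims; ++j)
        {
          if (colNorms[j] == 0.0)
            continue;
          const double oldBeta = beta[j];
          // Correlation of column j with the partial residual (coordinate j removed).
          const double rho = arma::dot(X.col(j), residual) + colNorms[j] * oldBeta;
          // Soft-thresholding update.
          double newBeta = 0.0;
          if (rho > lambda)
            newBeta = (rho - lambda) / colNorms[j];
          else if (rho < -lambda)
            newBeta = (rho + lambda) / colNorms[j];
          beta[j] = newBeta;
          if (newBeta != oldBeta)
            residual -= X.col(j) * (newBeta - oldBeta);
        }
      }
      return beta;
    }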
stephentu_ has joined #mlpack
Squalluca has quit [Quit: Page closed]
dhfromkorea has quit [Remote host closed the connection]
stephentu_ has quit [Ping timeout: 252 seconds]
< stephentu> naywhayare: so i downloaded arma-4.0 from sourceforge
< stephentu> did make
< stephentu> and then i built mlpack via
< stephentu> cmake -DARMADILLO_LIBRARY=... -DARMADILLO_INCLUDE_DIR=...
< stephentu> oh i also modified the config.hpp to match my system-installed one as closely as possible
< stephentu> and now i get
< stephentu> but it's weird b/c my system config.hpp
< stephentu> uses this wrapper stuff
< stephentu> any hints
< stephentu> i'm on arch linux
< stephentu> slash if you have a better process
< stephentu> for building w/ older versions
< stephentu> i'm all ears
jbc__ has quit [Quit: jbc__]
jbc_ has joined #mlpack
dhfromkorea has joined #mlpack
dhfromkorea has quit [Ping timeout: 265 seconds]
curiousguy13 has quit [Read error: Connection timed out]
< naywhayare> stephentu: what did you set -DARMADILLO_LIBRARY to?
< naywhayare> it should be the path to libarmadillo.so, not the path to the directory containing it
< naywhayare> (that's my first guess)
< naywhayare> oh
< naywhayare> I bet I know what's gone wrong
< naywhayare> so, this is kind of an oddity and I'm not responsible for it. during the armadillo build, CMake generates a config.hpp and places it, along with armadillo and armadillo_bits/, into "tmp/include/"
< naywhayare> so ARMADILLO_INCLUDE_DIR should be /path/to/armadillo/tmp/include/, if you only did 'make' and not 'make install'
< naywhayare> but the library is still in /path/to/armadillo/libarmadillo.so...
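So, assuming an un-installed armadillo build, the configure step would look roughly like this (the paths are placeholders for wherever armadillo-4.0 was unpacked and built):

    cmake -DARMADILLO_INCLUDE_DIR=/path/to/armadillo-4.0/tmp/include \
          -DARMADILLO_LIBRARY=/path/to/armadillo-4.0/libarmadillo.so ..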
< stephentu> oooo
< stephentu> that might explain it!
< stephentu> ill try a make install
dhfromkorea has joined #mlpack
dhfromkorea has quit [Ping timeout: 265 seconds]
jbc_ has quit [Quit: jbc_]
kshitijk has joined #mlpack
vedhu63w has joined #mlpack
vedhu63w has quit [Remote host closed the connection]
dhfromkorea has joined #mlpack
dhfromkorea has quit [Ping timeout: 265 seconds]
udit_s has joined #mlpack
curiousguy13 has joined #mlpack
udit_s has quit [Ping timeout: 252 seconds]
kshitijk has quit [Ping timeout: 264 seconds]
govg has quit [Ping timeout: 256 seconds]
govg has joined #mlpack
curiousguy13 has quit [Ping timeout: 265 seconds]
udit_s has joined #mlpack
curiousguy13 has joined #mlpack
curiousguy13 has quit [Ping timeout: 265 seconds]
kshitijk has joined #mlpack
curiousguy13 has joined #mlpack
stephentu has quit [Quit: Lost terminal]
kshitijk has quit [Ping timeout: 264 seconds]
curiousguy13 has quit [Ping timeout: 256 seconds]
udit_s has quit [Remote host closed the connection]
kshitijk has joined #mlpack
kshitijk has quit [Ping timeout: 245 seconds]
kshitijk has joined #mlpack
curiousguy13 has joined #mlpack
jbc_ has joined #mlpack
stephentu has joined #mlpack
curiousguy13 has quit [Ping timeout: 252 seconds]
kshitijk has quit [Ping timeout: 245 seconds]
curiousguy13 has joined #mlpack
< stephentu> naywhayare: the problem was somebody put in a non-symmetric cov matrix
< stephentu> for the gaussian tests
< stephentu> and then i think the ifdef thing you did
< stephentu> caused it to factorize it differently
< stephentu> good times
kshitijk has joined #mlpack
< naywhayare> "2 1.5; 1 4"
< naywhayare> what was I thinking? :(
< naywhayare> well, thanks for digging to the bottom of it :)
< stephentu> naywhayare: i'm surprised the cholesky call didn't fail
< stephentu> maybe it just looks at the lower triangle
< stephentu> or something
< naywhayare> yeah; from the LAPACK documentation:
< naywhayare> If UPLO = 'U', the leading N-by-N upper triangular part of A contains the upper triangular part of the matrix A, and the strictly lower triangular part of A is not referenced.
< naywhayare> (the opposite applies for UPLO = 'L')
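A tiny demonstration of why the bogus covariance slipped through, assuming Armadillo's chol() hands the matrix to LAPACK potrf with UPLO = 'U' and does not check symmetry itself:

    #include <armadillo>
    #include <iostream>

    int main()
    {
      // The offending test matrix: c(0,1) = 1.5 but c(1,0) = 1, so not symmetric.
      arma::mat c("2 1.5; 1 4");

      // Only the upper triangle (2, 1.5, 4) is referenced by LAPACK, so the
      // factorization succeeds as if c(1,0) were also 1.5.
      arma::mat R = arma::chol(c);
      std::cout << R.t() * R << std::endl;  // prints the symmetrized matrix
      return 0;
    }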
< stephentu> should symmetric matrices be a separate type?
< stephentu> (this is a design question)
< stephentu> (not an actual suggestion)
< naywhayare> I wouldn't be opposed to the idea, especially because you can represent the matrix in far less space
< naywhayare> but I think in this case we're constrained by what LAPACK and BLAS want, which is an NxM block of memory, so being clever with the storage doesn't get you anything
< naywhayare> personally I would be more interested in a way to "mark" matrices as symmetric
< stephentu> actually i was thinking about that
< stephentu> like i'm wondering if there should be these bits
< stephentu> in every matrix
< stephentu> like IS_UPPER_TRIANGULAR
< stephentu> IS_SYMMETRIC
< stephentu> IS_DIAGONAL
< naywhayare> but your choices there in C++ seem to be (a) a runtime member... increases sizeof(mat), bad; (b) specify it as a template parameter, but now you have a thousand template parameters and the syntax starts to resemble Eigen, which in my opinion is overcomplex
< stephentu> hey eigen only has 4
< stephentu> template parameters
< naywhayare> (c) use some kind of expression to mark it... "this_matrix_is_diagonal(matrix)"
< stephentu> or maybe 5
< stephentu> i can't remember
< stephentu> :)
< naywhayare> :)
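A toy illustration of option (c), marking symmetry with a zero-overhead wrapper; none of these names exist in Armadillo or mlpack, they are purely hypothetical:

    #include <armadillo>

    // Carries the "this matrix is symmetric" fact in the type, without
    // changing sizeof(arma::mat) or how the data is stored.
    template<typename MatType>
    struct SymmetricView
    {
      const MatType& mat;
    };

    template<typename MatType>
    SymmetricView<MatType> MarkSymmetric(const MatType& m) { return { m }; }

    // Algorithms can overload on the tag and take a cheaper path when the
    // caller vouches for symmetry (e.g. eig_sym instead of eig_gen).
    template<typename MatType>
    arma::vec Eigenvalues(const SymmetricView<MatType>& s)
    {
      arma::vec eigval;
      arma::eig_sym(eigval, s.mat);
      return eigval;
    }

    // Usage: Eigenvalues(MarkSymmetric(someMatrix));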
< stephentu> i actually really like eigen
< naywhayare> either way I think Eigen is overcomplex and the learning curve is pretty steep, which is why I originally chose Armadillo
< stephentu> i'm starting a new project and i hate to say that i used eigen
< naywhayare> I mean, if it does the job, it does the job
< naywhayare> I imagine Eigen gives you more flexibility, but at the cost of the (reasonably) nice syntax that Armadillo has
< naywhayare> I'm not particularly experienced with Eigen... I perused the docs enough to make a design decision against it years ago
< stephentu> "Implementing an algorithm on top of Eigen feels like just copying pseudocode."
< stephentu> haha
< stephentu> how's the ICML paper going?
< naywhayare> I have an algorithm for k-means. it's reasonably fast, but it only really shines with large k and large datasets
< naywhayare> I've got 52.5 hours to get simulations run on datasets that are "large enough"
< naywhayare> so... we'll see...
< stephentu> good luck
< naywhayare> thanks... unfortunately, most of what I have to do is waiting
kshitijk has quit [Ping timeout: 240 seconds]
< stephentu> prove some theorems while waiting?
< stephentu> the way i see it
< stephentu> your algorithm could either
< stephentu> a) work in practice
< stephentu> or b) have theoretical guarantees
curiousguy13_ has joined #mlpack
curiousguy13 has quit [Read error: Connection timed out]
< naywhayare> stephentu: it is possible to have both :)
curiousguy13__ has joined #mlpack
curiousguy13_ has quit [Read error: Connection timed out]
stephentu_ has joined #mlpack
< stephentu_> naywhayare: that's like a phd :)
curiousguy13__ has quit [Read error: Connection timed out]
curiousguy13__ has joined #mlpack
curiousguy13_ has joined #mlpack
curiousguy13__ has quit [Read error: Connection timed out]
stephentu_ has quit [Read error: Connection reset by peer]
curiousguy13__ has joined #mlpack
stephentu_ has joined #mlpack
curiousguy13_ has quit [Ping timeout: 256 seconds]
curiousguy13__ has quit [Read error: Connection timed out]
curiousguy13 has joined #mlpack
stephentu_ has quit [Ping timeout: 245 seconds]
stephentu_ has joined #mlpack
curiousguy13_ has joined #mlpack
curiousguy13 has quit [Read error: Connection timed out]
curiousguy13_ has quit [Read error: Connection timed out]
curiousguy13_ has joined #mlpack