verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
tham has joined #mlpack
Mathnerd314_ is now known as Mathnerd314
kwikadi has quit [Remote host closed the connection]
kwikadi has joined #mlpack
< lozhnikov>
marcosirc: You're right, thanks. I'll try to do that.
nilay has joined #mlpack
Mathnerd314 has quit [Ping timeout: 264 seconds]
mentekid has joined #mlpack
< mentekid>
rcurtin: So everything got hashed to bucket 0. I would have never seen that... Cool, thanks :)
mentekid has quit [Ping timeout: 244 seconds]
mentekid has joined #mlpack
< mentekid>
rcurtin: I think we should do the same thing in returnIndicesFromTables, right? There's a similar problem there I think
< mentekid>
let me look at the code
tham has quit [Quit: Page closed]
< mentekid>
rcurtin: I think there's still some bug in the LSH code, my tests crash... Here's a backtrace: http://pastebin.com/Psvadsgp
< mentekid>
(I've added some markers every few lines of code to isolate the error)
< Karl_>
rcurtin_: sorry for not getting back. I got stuck with other things yesterday. I think my kernel isn't proper (not positive semi-definite)... I get negative eigenvalues
< Karl_>
zoq: if you want a beta tester let me know how to get the svd-pca code...
< Karl_>
zoq: or was it just the normal pca method?
< zoq>
Karl_: Thanks, I'll get back to you once it is finished.
< lozhnikov>
marcosirc: rcurtin: I opened a PR that contains some changes proposed by Marcos Pividori (RectangleTree::NumDescendants() optimization).
< lozhnikov>
mentekid: Hi, there is a segfault in LSHTest/NumTablesTest. Are you sure that you should use secondHashVectors[j] instead of secondHashVectors(i, j)? (lsh_search_impl.hpp:200 and 202)
mentekid has quit [Ping timeout: 246 seconds]
< lozhnikov>
rcurtin: The error appears in e6bc4b4.
mentekid has joined #mlpack
Mathnerd314 has joined #mlpack
marcosirc has joined #mlpack
< marcosirc>
lozhnikov: great, thanks.
< rcurtin>
lozhnikov: marcosirc: odd, I tested it on my system, I guess I did not run valgrind and now I pay the price :)
nilay_ has joined #mlpack
< mentekid>
rcurtin: I fixed what lozhnikov pointed out, but I still get a segmentation fault at LSHTrainTest
< mentekid>
the other tests seem to run fine :/
< mentekid>
actually... In Train(), shouldn't secondHashTable be cleared when Train is called?
< rcurtin>
mentekid: I'm an idiot, I have the fix, hang on
nilay_ has quit [Ping timeout: 250 seconds]
< rcurtin>
actually, I don't quite have the fix, this is more complex than I thought
< rcurtin>
okay, fixed in eea2aa4, sorry for the issue
< mentekid>
ah thanks :) I'll finish the style changes and push the final multiprobe tests
< mentekid>
sorry multiprobe changes*
< rcurtin>
sounds good
travis-ci has joined #mlpack
< travis-ci>
mlpack/mlpack#1112 (master - eea2aa4 : Ryan Curtin): The build is still failing.
< sumedhghaisas>
I read through the paper... you are right
< sumedhghaisas>
With spill trees we cannot guarantee the error...
< sumedhghaisas>
But I guess Ryan is right...
< sumedhghaisas>
Considering the popularity of Spill trees... I think we should implement it...
< sumedhghaisas>
we need to decide on the implementation...
< marcosirc>
Yeah, I agree.
< sumedhghaisas>
do you think we should implement a separate command-line program for defeatist search??
< marcosirc>
Ok, I have been thinking on the implementation.
< marcosirc>
Mm I don't think we should implement it as a separate command line program.
< marcosirc>
Maybe we can include it as a flag to the main mlpack_knn program...
< marcosirc>
It would be clearer this way, I think.. For benchmarks, etc.
< marcosirc>
we could print an error if epsilon value is specified for spill trees...
< sumedhghaisas>
hmmm...
< marcosirc>
But I don't have a strong preference... maybe we can start working on implementing spill trees
< marcosirc>
and once it is ready, we can decide.
< sumedhghaisas>
flag does sound a viable option to me...
< sumedhghaisas>
yeah I agree...
< marcosirc>
yeah, maybe it will be confusing...
< sumedhghaisas>
We can also decide when spill tree implementation is ready...
< marcosirc>
Ok.
< marcosirc>
Regarding spill trees implementation.
< marcosirc>
I think it will be similar to binary space trees.
< marcosirc>
However, we need to manage the list of points differently. We are going to have overlapping nodes, so we cannot use ranges of indices into the main dataset's matrix as we do with binary trees.
< marcosirc>
I am thinking of having a general dataset instance (as we do with binary trees), and leaf nodes will hold a vector of indexes pointing to columns of that matrix.
< marcosirc>
(This is what I mentioned in the last email)
< marcosirc>
I think this will be the simplest/most efficient approach.
< sumedhghaisas>
yes... it does look simple...
< sumedhghaisas>
give me some time to think on it...
< marcosirc>
ok, sure!
< rcurtin>
marcosirc: I agree, I think vector of indices is the easiest way to go here
< nilay_>
so did you get an idea of the error this (randomized svd) technique has compared to normal svd
< zoq>
By reading some other related papers. Right now I'm not sure if I'm doing something wrong; it looks like the QUIC-SVD method doesn't work if m=n
< nilay_>
so do we need to integrate r-svd with PCA::Apply or replace it. (if the error is less we might as well replace it?)
< nilay_>
or we still take components according to eigVal so it is correct always
< zoq>
I think what we could do here is change the PCA method and let the user choose which method to use; right now we use exact svd, and randomized svd is just an approximation. In the case of edge boxes an approximation is totally fine.
< nilay_>
yes, what I don't get is what we lose by doing randomized svd as compared to normal svd.
< zoq>
precision: in the case of randomized svd, we only use parts of the full data matrix.
< zoq>
Probably I can work out a proof of concept ... perhaps in the next hours
< zoq>
I think in that case I'll have to figure out why the quic svd method only works when m < n.
< zoq>
maybe rcurtin can provide any insight?
< rcurtin>
hm, it has been a while since I thought about it
< rcurtin>
in this case, m is the number of returned eigenvectors, and n is the number of dimensions in the dataset?
< rcurtin>
ah I guess the matrix being decomposed is m x n
< zoq>
yeah, right
< rcurtin>
but the first paragraph of the paper says quic-svd works for m >= n, but not m < n
< zoq>
right
< rcurtin>
when m < n, we can just transpose the matrix and then once the SVD is done, we switch V and U
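The transpose trick rcurtin mentions follows directly from the SVD's definition; a quick derivation:

```latex
% Suppose the solver only handles matrices with at least as many rows as
% columns. For A of size m x n with m < n, decompose A^T (size n x m) instead:
A^{\mathsf{T}} = \tilde{U} \Sigma \tilde{V}^{\mathsf{T}}
% Transposing both sides recovers the SVD of A:
\quad\Longrightarrow\quad
A = \left(\tilde{U} \Sigma \tilde{V}^{\mathsf{T}}\right)^{\mathsf{T}}
  = \tilde{V} \Sigma \tilde{U}^{\mathsf{T}},
% i.e. U = \tilde{V} and V = \tilde{U}: the singular values are unchanged
% and the roles of the left and right singular vectors are swapped.
```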
< rcurtin>
so I think maybe I don't understand what the issue is
< rcurtin>
maybe I am looking at the wrong part of the paper
< zoq>
maybe I used the wrong dimension, not sure right now, but I used m=n and it didn't work
< rcurtin>
hm, hang on, let me take a look at the code
< rcurtin>
what happens if you change quic_svd_impl.hpp:29 to be >= instead of just > ?
< zoq>
it's not urgent, there is another bug in my randomized svd implementation ...
< zoq>
I think I already tested >=, let's check again
< rcurtin>
yeah, if that does not work, can you open a bug on github?
< rcurtin>
if you want you could assign it to siddharth, but I don't know if he will be able to do anything, I am not sure how much time he has
< rcurtin>
I dunno if he'll even see an email, I haven't heard from him in a while :)
< rcurtin>
but I can take a look into it when I have some time (maybe a week or two, maybe more?)
< zoq>
:) I'll open a bug if I get too frustrated with the code.
< rcurtin>
yeah; the primary quic-svd code is in core/tree/cosine_tree/, not in methods/quic_svd/