verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
< marcosirc> quit
marcosirc has quit [Quit: WeeChat 1.4]
nilay has joined #mlpack
< nilay> max( M, dim ): for matrix M, return the extremum value for each column (dim=0) or each row (dim=1). my question is: can we return the index of the maximum value using arma?
tham has joined #mlpack
< tham> yes, we can
< nilay> tham: it returns the min and max number. i am asking whether it can return the index of the min and max number
< nilay> ok yes it can return index also
< nilay> thanks
< nilay> so can we do something like max(M, dim, location)?
< nilay> be right back
< tham> nilay : I think we can't, at least I cannot find an API like that
< rcurtin> min and max with indices can be kind of irritating...
< rcurtin> I have never liked the armadillo APIs for this
< rcurtin> if you use subviews, like matrix.col(i), the .max(location) function is not available (at least last time I checked)
< rcurtin> so for columns you can at least do this workaround:
nilay has quit [Ping timeout: 250 seconds]
< rcurtin> vec a = matrix.unsafe_col(i); a.max(location);
< rcurtin> hopefully that is helpful
nilay has joined #mlpack
< nilay> tham rcurtin: i guess then i will write own functions with loops.
tsathoggua has joined #mlpack
tsathoggua has quit [Client Quit]
nilay has quit [Quit: Page closed]
tham has quit [Ping timeout: 250 seconds]
< mentekid> rcurtin: I am done with the probing sequence generation (just tested that it works), now I just need to incorporate it into ReturnIndicesFromTables and we'll have Multiprobe running :)
< mentekid> question: I was thinking we might find it useful to have a recall estimation somewhere in LSH. I'm thinking of something like the user specifying "ground truth" file and we estimate what % of neighbors LSH found. I've implemented a lazy function that does that semi-correctly for LSH testing, but maybe it would be useful to have around
< mentekid> I say semi-correctly because the way I have it now, if real neighbors are 1, 2, 3, 4, 5 and LSH finds 2, 3, 4, 5, 6, that lazy implementation will only compare 1 to 2, 2 to 3 and so on, so it will say recall is 0 (I just realized a few days ago...)
< mentekid> is there something in mlpack that does something similar? Or should I write it? Also, should I make it part of LSH or part of some other module, since it's more generic than LSH (we can also use it with other approximate nearest neighbor functions)
< mentekid> I think it will be useful both for testing multiprobe and later for the tuning algorithm
< mentekid> what do you think?
< rcurtin> mentekid: I agree that this could be useful, but I am not sure how exactly to do recall estimation for a command-line program
< rcurtin> the Hoeffding tree code (or some local branch of it I have) will take a test set and optionally test labels
< rcurtin> and if test labels are given, it will calculate the accuracy of the tree on the test set and print it
< rcurtin> so possibly a similar thing could be done for LSH and approximate NN search implementations: an optional --true_query_distances or --true_query_neighbors file (not sure which)
< rcurtin> and if that is specified then recall is calculated and printed
< mentekid> yes that's what I had in mind, the function will take a matrix and the command-line program will take an optional "truth file" parameter
< rcurtin> yeah, that seems reasonable to me
< mentekid> the thing is I calculated recall thinking it was the same as accuracy, but it's more "relaxed" in the sense that order doesn't matter... It's a set intersection actually - how much do sets A and B have in common
< mentekid> cool - I'll do it that way then :)
< rcurtin> yeah, I think there are multiple ways to calculate recall here because the nearest neighbor sets are ordered
< rcurtin> but the set intersection approach you just described to me sounds fine
sumedhghaisas has joined #mlpack
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#842 (master - ec1c46a : Ryan Curtin): The build was broken.
travis-ci has left #mlpack []
ayan_ has joined #mlpack
ayan_ has quit [Client Quit]
< mentekid> rcurtin: I have some style/documentation questions for you when you have a moment
< mentekid> a) I've documented my function where it is declared in lsh_search.hpp. Is there anywhere else I should add documentation?
< mentekid> b) I've made a bunch of small inline functions, all of them declared and defined directly in lsh_search_impl.hpp. Should I move their declaration to lsh_search.hpp and document them? I think it will create clutter since they only perform small actions on stuff needed for my most important function
< rcurtin> mentekid: I think it would be easier to comment if you opened a PR
< rcurtin> also I am getting lunch in a second so I need to step out :)
< rcurtin> I think, if there is an LSH tutorial (I don't think there is), you should add documentation for your function there
< rcurtin> but I don't think there is anywhere else to do it, except maybe add a note to HISTORY.txt
< rcurtin> I'll be back in about an hour
< mentekid> ah sure. My code is not really ready that's why I haven't opened one yet. I'll do that now. Have a nice lunch, we'll talk later
< rcurtin> hmm I guess also I could just comment on your fork
< rcurtin> if that would work better, I'll do that
< mentekid> that might be better. This is my fork: https://github.com/mentekid/mlpack/tree/MultiprobeLSH
< mentekid> I just noticed I have a bug when T > numProj, so don't compile it yet :P
< mentekid> I fixed it, should be working now (still no mlpack_test changes yet, they're next on my list)
< rcurtin> the inline functions in lsh_search_impl.hpp are fine with me if they won't be used anywhere else
< rcurtin> there are some style issues but I can be picky about those once you open a PR, they are easy to fix :)
< mentekid> Yeah I'll look into style again more carefully once everything else is done, as I told you the code is not camera-ready yet :)
< rcurtin> sure :)
nilay has joined #mlpack
Wizzle has joined #mlpack
< Wizzle> Can mlpack be installed on a local windows machine?
< Wizzle> I'm running windows 10
< Wizzle> I saw the documentation for Linux but none for a windows environment.
< nilay> zoq: for the cython functions implemented in the python codes, do we also use openmp in mlpack? or do i just vectorize them as much as i can?
< nilay> i am talking about histogram_core() and pdist_core()
< rcurtin> Wizzle: that should be possible although I have never tried it for Windows 10
< rcurtin> Wizzle: I just linked to the build artifacts for the most recent build of mlpack on Windows (which we do using AppVeyor)
< rcurtin> maybe that will work for your needs?
< Wizzle> rcurtin: Thanks, and greatly appreciate the reply. New to mlpack and good to see that there is an active community
< rcurtin> sure, feel free to ask questions if you need any help :)
< rcurtin> not many of us are on Windows (but there are a few), so it would not be entirely unexpected if you encounter problems
< rcurtin> but if you do, we can try and fix them :)
< Wizzle> K, cool. I might migrate to linux. Just want to give a shot in a windows environment as it's already setup. Just want to test it out first
< Wizzle> Once again, thanks for the help
< rcurtin> sure, no problem :)
< mentekid> I'm calling it a day for today, I added ComputeRecall and the truth file option in the executable. From a first look everything works as it should... but I have many doubts regarding how efficient what I've written is. I'll look into that first thing tomorrow :D
< mentekid> actually second thing, first thing is writing the tests
< rcurtin> mentekid: yeah, I saw that std::vector was used in many places, it may be faster to use arma::vec
< rcurtin> when you know the length of the vector, that is
< rcurtin> that may be a minor runtime issue though, compared to the acceleration you have in mind, I dunno :)
< rcurtin> have a great weekend! Monday is a US holiday ("memorial day") so I will not be 100% available
< mentekid> my main concern is that whole heap of pairs thing... I'm not sure if we can but I have a hunch we should avoid it as it might impact performance.
< rcurtin> I will be mostly in a car so I should be somewhat available though
< rcurtin> yeah
< rcurtin> so I will wait until your tests are done and then take a look at it and think if I can think of anything better, if that is okay with you
< mentekid> Also, I've decided to keep find() instead of unique(). My reasoning is, since we're going to be getting bigger reference sets, unique might end up being more expensive. So once everything else is set, I'll rerun those tests we did and decide on one or the other
< rcurtin> ah, bigger reference sets for multiprobe, you mean?
< mentekid> I think the hybrid will complicate the code too much and nobody will ever touch it again (my code already did that a bit)
< rcurtin> yeah it will be like the tree code which it seems like only marcos is brave enough to touch :)
< mentekid> yeah, since we're looking into more buckets, each query will get a bigger reference set
< zoq> nilay: Hello, I think it's a good idea to think about optimization, e.g. by using OpenMP, later; however, you can certainly use OpenMP.
< mentekid> on the other hand (in theory) users will ask for fewer tables... so reference sets might get smaller
< mentekid> I haven't seen the tree code but sounds scary :P
< mentekid> Anyway, I think what remains I can do without bothering you too much (unlike today) so enjoy your long weekend :)
< nilay> zoq: so what should i do for now, just vectorize and write, avoid loops?
< zoq> nilay: sounds good
< nilay> zoq: what i am basically asking is, if we write in a vectorized fashion, is it parallelizable
< nilay> or do we need to write it as loop-free as possible, to parallelize better.
< nilay> with openmp, for example
< rcurtin> mentekid: you have not bothered me very much at all, I am happy to help out, so feel free to involve me more if you like :)
tsathoggua has joined #mlpack
tsathoggua has quit [Client Quit]
< zoq> nilay: That depends on the code, but as I said, I think we can optimize critical sections later; maybe test which way is superior and go with that. So I would go with the vectorized version for now. Does this sound reasonable?
sumedhghaisas has quit [Remote host closed the connection]
< nilay> zoq: yes it does.
nilay has quit [Quit: Page closed]
marcosirc has joined #mlpack
Wizzlw has joined #mlpack
Wizzlw has quit [Client Quit]
< Wizzle> Downloaded mlpack for windows and placed them in my project directory. What are the headers I need to use to access the library?
< Wizzle> #include <mlpack/core.hpp>
< Wizzle> or do I need to use "using namespace mlpack;"
mentekid has quit [Ping timeout: 276 seconds]
< Wizzle> I guess I just need some documentation on how to install this in a windows environment (Visual Studio 2013)
< zoq> Wizzle: You could use the appveyor config as a basic step by step guide: https://github.com/mlpack/mlpack/blob/master/.appveyor.yml Also, you can download all header files at: mlpack.org or github.
< Wizzle> K, thanks!
< zoq> Wizzle: Sure, you can always ask here, if you need any further clarification.