verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
marcosirc has joined #mlpack
TD has quit [Quit: Page closed]
marcosirc has quit [Quit: WeeChat 1.4]
tsathoggua has quit [Quit: Konversation terminated!]
< tham>
The imputer take a lot of parameters, maybe encapsulate it with class is a better choice
govg has quit [Quit: leaving]
govg has joined #mlpack
govg has quit [Client Quit]
govg has joined #mlpack
tham has quit [Quit: Page closed]
< mentekid>
rcurtin: I got your code, running the benchmarks now (with default parameters)
< mentekid>
The results you got are impressive so this might actually be a very good optimization :D
marcosirc has joined #mlpack
< rcurtin>
mentekid: yeah I am happy with it. I noticed in a few cases hash table construction slows down a bit but the runtime difference is so small I don't think it's worth looking into very far
< rcurtin>
there moght be a possibility for further acceleration but I have not thought about how. OpenMP would probably be pretty easy to apply
< rcurtin>
*might, not moght :)
< mentekid>
Ok just ran my tests, I only run Corel, covertype, phy, pokerhand and miniboone
< mentekid>
I see what you're saying about construction, but yeah there's more than significant speedup in the search so it doesn't really matter I think
< mentekid>
the only dataset where the optimized version is a bit slower for me is pokerhand
< mentekid>
but that's just 3 seconds, from 15 to 18
< rcurtin>
really, pokerhand was slower? did you run with different parameters?
< mentekid>
no I ran everything with default
< mentekid>
ah but wait
< rcurtin>
pokerhand takes like 0.7s on my machine, maybe you are running with debug symbols? :)
< mentekid>
I'm running with only a small query set
< rcurtin>
yeah, I was just using identical query and reference sets for simplicity
< rcurtin>
wait, what? I have the wrong pokerhand set... only 25k points, not 1M
< rcurtin>
oops... let me try again then
< mentekid>
ah
< mentekid>
I'll recompile too so I'm sure I didn't include debug/profile symbols, though it's in the test directory so they should be off
< rcurtin>
it might take a little while to get the pokerhand numbers, it is a large dataset
< rcurtin>
I'll let you know what I find out when they are done
< mentekid>
that's why I'm running only a few thousand queries but let's wait for the full thing since you got it running
travis-ci has joined #mlpack
< travis-ci>
mlpack/mlpack#965 (master - 29d4331 : Ryan Curtin): The build was broken.
< mentekid>
roughly speaking, corel is faster without single probe, phy is so-and-so
< mentekid>
sift, gist and miniboone I'd say multiprobe is better
< mentekid>
this is without your hash optimization, but I am assuming it will benefit both roughly the same
< rcurtin>
mentekid: master took 39m, lshopt 18n
< rcurtin>
*18m
< rcurtin>
the hash optimization should be orthogonal to multiprobe, I agree
< rcurtin>
let me look into the multiprobe issue in a bit, you are right that those results seem unexpected
< mentekid>
thanks :) I'm running it on profile mode now as well so I can get a callgraph and see where the problem might be
nilay has joined #mlpack
< mentekid>
sorry corel is faster without multiprobe is what I meant
< mentekid>
rcurtin: first batch of results seem to indicate it's not the problem of the implementation but the increased selectivity
< mentekid>
Actually I'll group everything into a nice pdf and mail them to you I think it's easier that way
< rcurtin>
ok, thanks
< rcurtin>
lozhnikov: you are right, I was thinking about the refactoring today, and it is indeed not as straightforward as I had thought
< rcurtin>
let me think about it a little more and get back to you
< lozhnikov>
rcurtin: ok, thanks
nilay has quit [Ping timeout: 250 seconds]
< rcurtin>
I knew that BaseCase() would need to be refactored a bit, but I did not think about the neighbor indices
< rcurtin>
this makes me wonder, if the idea of a 'localDataset' is even a good one
< rcurtin>
if we have to make all of the Rules classes way more complex, then maybe it is better to just say that when adding points to the RectangleTree, we have to use insert_cols()
< rcurtin>
and we could offer a function called "InsertPoints()" for the RectangleTree that could insert many points at once, to avoid calling insert_cols() multiple times
< rcurtin>
that's the best I can think of right now, I will keep thinking
< rcurtin>
I'm sorry that I thought this would be simple and it turned out to be hard, so I guess I wasted a lot of your time :(
< lozhnikov>
rcurtin: Hm.. If we have not got the global dataset, we can not easily get any particular point using its number. But if we have both 'dataset' and 'localDataset' we can easily map size_t to arma::Col using numNonDatasetPoints. It should take log(tree depth) operations for any non-dataset point.
mentekid has quit [Ping timeout: 240 seconds]
sumedhghaisas has joined #mlpack
< sumedhghaisas>
@marcosirc Hey marcos
< marcosirc>
sumedhghaisas: Hi sumedh!
< sumedhghaisas>
@marcosirc I am just looking into your pull request...
< marcosirc>
ok
< sumedhghaisas>
Some checks haven't passed yet... also last time some check was failing...
< sumedhghaisas>
didn't get time to look at it...
< sumedhghaisas>
Do you remember what was it??\
< marcosirc>
I have forced a rebase to restart the building process.
< marcosirc>
it was not a problem related to my changes
< marcosirc>
it was an external prolem
< marcosirc>
*problem.
< sumedhghaisas>
ohh okay... no problem then... I think I have already looked at the code changes...
< sumedhghaisas>
I will just go over the tests ...
< marcosirc>
Ok
< sumedhghaisas>
@marcosirc In the basic test where you are comparing the results with AKNN with KNN ...
< sumedhghaisas>
I am confused about 'epsilon * 100' as a valid range??
< marcosirc>
hi, it is a percentage, 0-100%
< marcosirc>
I translate epsilon, from the rant [0,1] to [0,100]
< marcosirc>
*range
< marcosirc>
AAnyway, I removed that lines in the next commit.
< marcosirc>
:)
< sumedhghaisas>
but I thought BOOST_REQUIRE_CLOSE asks for absolute tolerance...?
< sumedhghaisas>
ohh I was going commit by commit :)
< sumedhghaisas>
ahh you implemented REQUIRE_RELATIVE _ERROR for that...
mentekid has joined #mlpack
< sumedhghaisas>
and yes we should add a file dedicated for extra text macros...
< sumedhghaisas>
old_boost_definitions is just for compatibility...
< marcosirc>
It is not absolute erro, it uses Knuth's relative error formula.
< marcosirc>
Ok, nice to know you agree.
< marcosirc>
yeah!
< sumedhghaisas>
you mean the one he is given in art of programming book??
< rcurtin>
I'd say yeah we should put it in something like require_relative_error.hpp or something, since it's not an old boost test definition
< rcurtin>
but I guess whatever filename is fine
< rcurtin>
maybe either of you have a better idea :)
< sumedhghaisas>
but dedicating a header for this small macro?? I like the idea of test_tools.hpp... thats way such small hacks can be stored together...
< marcosirc>
yeah, I found that in the boost documentation.
< marcosirc>
Yeah.. maybe they could be placed in the same file..
< rcurtin>
I agree, test_tools.hpp would be nice
govg has quit [Quit: leaving]
< sumedhghaisas>
@marcosirc: tests look solid... hate to point this out but some tests are line commented rather than function but I can fix that while merging ... not a problem :) Do you think we should squash commit 'Replace CLOSE by CLOSE_FRACTION.' ?
< sumedhghaisas>
I am not sure what policy ryan follows here... squashing this commit will have better history as it is later replaced by another function altogether...
< marcosirc>
Ok. No problem, I can remove that commit.
< marcosirc>
If you prefer, I can fix the comments.
< marcosirc>
I copied them from knn_test.hpp
mentekid has quit [Ping timeout: 240 seconds]
< sumedhghaisas>
either way is fine for me... :)
< sumedhghaisas>
seems like AppVeyor is still running...
< rcurtin>
I generally don't squash commits, I like when commits are easier to understand because they are small
< sumedhghaisas>
yeah thats true but then it might bloat the history...
nilay has joined #mlpack
< sumedhghaisas>
also that commit serves no purpose anymore... the code changes in that commit are changing function A to B ... then in the next commit it is changed from B to C... so I thought lets put A to C ... :)
< rcurtin>
sure, I don't disagree with that
< rcurtin>
I think there is lots of bloat in the history anyway, so it is no huge deal either way :)
< sumedhghaisas>
yeah thats true...
< sumedhghaisas>
about that code redundancy thing....
< sumedhghaisas>
since we are deciding on dynamic values of bool... there has to be some dynamic resolution required right??
< marcosirc>
sumedh: sorry, I don't understand what you mean to say.
< marcosirc>
which part of the code are you talking about?
< sumedhghaisas>
@marcosirc Ryan suggested some CRTP to solve the problem... but I don't think that is possible...
< marcosirc>
Ahhh yeah.. about ns_model.
< sumedhghaisas>
I like the idea of boost variant...
< sumedhghaisas>
ohh yes about ns_model
< sumedhghaisas>
boost variant is a good option...
< marcosirc>
I have implemented some changes, using an abstract class.
< marcosirc>
I will push them so you can see.
< sumedhghaisas>
with boost variant?
< sumedhghaisas>
ohh with interface you mean...
< marcosirc>
No, using inheritance...
< marcosirc>
I have been considering many options
< sumedhghaisas>
I am not actually sure which is faster... inheritance or boost variant...
< marcosirc>
but most of them resulted in a lot of lines of code...
< marcosirc>
Yes.. I really didn't have enough time to look into boost variant
< sumedhghaisas>
inheritance solutions will involve lot of code... also all newer trees have to follow certain rules...
< sumedhghaisas>
on other side boost variant is elegant...
< sumedhghaisas>
but do not how it fares in speed]
< marcosirc>
Not necessary every new tree means more code..
< marcosirc>
The main problem can be summarized to:
< marcosirc>
"we want to set the leaf size of trees in neighbor search, and the NeighborSearch Class doesn't provide that option"
< marcosirc>
This is the reason of the difference in the code, between KDTrees/BallTrees and the rest of the tree types.
< marcosirc>
So I implemented a NeighborSearchLeaf class that encapsulates an instance of NeighborSearch class, and adds the functionality to deal with different leafSizes
< marcosirc>
I have to fix the NSModel::Serialize() method yet..
< sumedhghaisas>
@marcosirc hmm... this a new way to think about it... its too late at night in India :) do you mind if we continue this discussion tomorrow??
< marcosirc>
Yes! Sure!
< marcosirc>
I will read about boost variants and CRTP for tomorrow
< marcosirc>
So we can compare/contrast.
< sumedhghaisas>
thanks :) I will look at the code tomorrow... we can discuss about the options tomorrow...
< sumedhghaisas>
I agree...
< sumedhghaisas>
goo night :)
< sumedhghaisas>
*good
< marcosirc>
the same for you!
< marcosirc>
Ahh appveyor succeeded.
< marcosirc>
so, I am going to make the changes you proposed.
< rcurtin>
marcosirc: although right now it is only the ball tree and kd tree that support different leaf size options, other trees that we add in the future may also have other options we want to consider
< marcosirc>
rcurtin: yeah, only binary trees, as far as I understand.
< rcurtin>
no, what I mean is, it is possible that someday we may want to add a user-facing option to control a different part of a different tree
< marcosirc>
maybe I could modify the code to include a leafSize member in NeighborSearch class, and then modify the BuildTree function to decide, depending on the type of the tree, if using or no a leafSize parameter
< rcurtin>
like maybe... if we add vantage point trees, then maybe we want the user to be able to specify the size of the inner and outer balls
< rcurtin>
or actually spill trees are maybe a better example, we want them to be able to specify the amount of overlap or something like this
< rcurtin>
I haven't taken a look too closely at the changes you made though, I hope to have some time tomorrow (but I might not)
< marcosirc>
I have been thinking about that. We should provide an interface to set tree options...
< marcosirc>
Ok.
< rcurtin>
another thing worth considering, is that the only reason NSModel exists is for the mlpack_knn and mlpack_kfn programs, so we don't need to be too concerned with providing NSModel as a public interface, because that is not what it is meant to be
< rcurtin>
(hence why it does not provide all the overloads of Search() that NeighborSearch does)
< marcosirc>
Yeah, I totally agree.
< marcosirc>
rcurtin: I have pushed the changes proposed by sumedh.
< rcurtin>
okay, I'll take a look through the PR on the train tomorrow and make any comments
< marcosirc>
I have to leave now for some hours
< rcurtin>
okay, have a good evening :)
< rcurtin>
I think that any comments I have will be pretty simple
< marcosirc>
Ok, thanks! the same for you!
marcosirc has quit [Quit: WeeChat 1.4]
< rcurtin>
nilay: I guess you deleted the test_branch branch? we should make feature branches in our own forks, like in nilayjain/mlpack:test_branch, not mlpack/mlpack:test_branch
< rcurtin>
if I can clarify anything about that process, please let me know, I am happy to help :)
< nilay>
yeah, i did the same mistake again, i intended to push it to my fork
< rcurtin>
no worries, just let me know if I can help out if you are having git trouble :)
< rcurtin>
it took me a long time to learn git and I still learn new things every day, it is a complex tool :)