#mlpack on 2016-06-09 — irc logs at libera.irclog.whitequark.org

2015-01-15 23:05 verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/

00:20 marcosirc has joined #mlpack

00:42 TD has quit [Quit: Page closed]

02:25 marcosirc has quit [Quit: WeeChat 1.4]

03:21 tsathoggua has quit [Quit: Konversation terminated!]

03:41 govg has quit [Ping timeout: 272 seconds]

04:11 karthikabinav has joined #mlpack

04:11 karthikabinav has quit [Client Quit]

04:11 nilay has joined #mlpack

04:42 govg has joined #mlpack

05:25 gtank has quit [Ping timeout: 260 seconds]

05:30 gtank has joined #mlpack

07:14 Mathnerd314 has quit [Ping timeout: 276 seconds]

07:44 nilay has quit [Quit: Page closed]

08:42 tham has joined #mlpack

08:42 < tham> keonkim : some idea about imputation--http://pastebin.com/0NTRtYKz

08:43 < tham> The imputer takes a lot of parameters; maybe encapsulating it in a class is a better choice

08:54 govg has quit [Quit: leaving]

08:55 govg has joined #mlpack

08:56 govg has quit [Client Quit]

08:56 govg has joined #mlpack

09:29 tham has quit [Quit: Page closed]

12:49 < mentekid> rcurtin: I got your code, running the benchmarks now (with default parameters)

12:50 < mentekid> The results you got are impressive so this might actually be a very good optimization :D

13:26 marcosirc has joined #mlpack

13:45 < rcurtin> mentekid: yeah I am happy with it. I noticed in a few cases hash table construction slows down a bit but the runtime difference is so small I don't think it's worth looking into very far

13:46 < rcurtin> there moght be a possibility for further acceleration but I have not thought about how. OpenMP would probably be pretty easy to apply

13:46 < rcurtin> *might, not moght :)

13:49 < mentekid> Ok just ran my tests, I only run Corel, covertype, phy, pokerhand and miniboone

13:50 < mentekid> I see what you're saying about construction, but yeah there's more than significant speedup in the search so it doesn't really matter I think

13:50 < mentekid> the only dataset where the optimized version is a bit slower for me is pokerhand

13:51 < mentekid> but that's just 3 seconds, from 15 to 18

13:52 < rcurtin> really, pokerhand was slower? did you run with different parameters?

13:53 < mentekid> no I ran everything with default

13:53 < mentekid> ah but wait

13:53 < rcurtin> pokerhand takes like 0.7s on my machine, maybe you are running with debug symbols? :)

13:53 < mentekid> I'm running with only a small query set

13:54 < rcurtin> yeah, I was just using identical query and reference sets for simplicity

13:54 < rcurtin> wait, what? I have the wrong pokerhand set... only 25k points, not 1M

13:54 < rcurtin> oops... let me try again then

13:54 < mentekid> ah

13:55 < mentekid> I'll recompile too so I'm sure I didn't include debug/profile symbols, though it's in the test directory so they should be off

13:57 < rcurtin> it might take a little while to get the pokerhand numbers, it is a large dataset

13:57 < rcurtin> I'll let you know what I find out when they are done

13:58 < mentekid> that's why I'm running only a few thousand queries but let's wait for the full thing since you got it running

14:27 travis-ci has joined #mlpack

14:27 < travis-ci> mlpack/mlpack#965 (master - 29d4331 : Ryan Curtin): The build was broken.

14:27 < travis-ci> Change view : https://github.com/mlpack/mlpack/compare/28a49fa829ca...29d43319f1a3

14:27 < travis-ci> Build details : https://travis-ci.org/mlpack/mlpack/builds/136439345

14:27 travis-ci has left #mlpack []

14:35 nilay has joined #mlpack

15:20 nilay has quit [Ping timeout: 250 seconds]

15:22 Mathnerd314 has joined #mlpack

16:06 < mentekid> rcurtin: I'm getting conflicting results for multiprobe: http://pastebin.com/bn0kGyRn

16:09 < mentekid> roughly speaking, corel is faster without single probe, phy is so-and-so

16:10 < mentekid> sift, gist and miniboone I'd say multiprobe is better

16:10 < mentekid> this is without your hash optimization, but I am assuming it will benefit both roughly the same

16:17 < rcurtin> mentekid: master took 39m, lshopt 18n

16:17 < rcurtin> *18m

16:18 < rcurtin> the hash optimization should be orthogonal to multiprobe, I agree

16:20 < rcurtin> let me look into the multiprobe issue in a bit, you are right that those results seem unexpected

16:27 < mentekid> thanks :) I'm running it on profile mode now as well so I can get a callgraph and see where the problem might be

16:46 nilay has joined #mlpack

16:53 < mentekid> sorry corel is faster without multiprobe is what I meant

17:10 < mentekid> rcurtin: the first batch of results seems to indicate that the problem is not the implementation but the increased selectivity

17:11 < mentekid> Actually I'll group everything into a nice PDF and mail it to you; I think it's easier that way

17:41 < rcurtin> ok, thanks

17:42 < rcurtin> lozhnikov: you are right, I was thinking about the refactoring today, and it is indeed not as straightforward as I had thought

17:42 < rcurtin> let me think about it a little more and get back to you

17:47 < lozhnikov> rcurtin: ok, thanks

17:53 nilay has quit [Ping timeout: 250 seconds]

17:55 < rcurtin> I knew that BaseCase() would need to be refactored a bit, but I did not think about the neighbor indices

17:55 < rcurtin> this makes me wonder if the idea of a 'localDataset' is even a good one

17:55 < rcurtin> if we have to make all of the Rules classes way more complex, then maybe it is better to just say that when adding points to the RectangleTree, we have to use insert_cols()

17:56 < rcurtin> and we could offer a function called "InsertPoints()" for the RectangleTree that could insert many points at once, to avoid calling insert_cols() multiple times

17:56 < rcurtin> that's the best I can think of right now, I will keep thinking

17:56 < rcurtin> I'm sorry that I thought this would be simple and it turned out to be hard, so I guess I wasted a lot of your time :(

18:08 < lozhnikov> rcurtin: Hm.. If we do not have the global dataset, we cannot easily get any particular point using its number. But if we have both 'dataset' and 'localDataset' we can easily map size_t to arma::Col using numNonDatasetPoints. It should take log(tree depth) operations for any non-dataset point.

18:16 mentekid has quit [Ping timeout: 240 seconds]

18:44 sumedhghaisas has joined #mlpack

18:44 < sumedhghaisas> @marcosirc Hey marcos

18:45 < marcosirc> sumedhghaisas: Hi sumedh!

18:46 < sumedhghaisas> @marcosirc I am just looking into your pull request...

18:46 < marcosirc> ok

18:46 < sumedhghaisas> Some checks haven't passed yet... also last time some check was failing...

18:46 < sumedhghaisas> didn't get time to look at it...

18:47 < sumedhghaisas> Do you remember what it was??

18:47 < marcosirc> I have forced a rebase to restart the building process.

18:47 < marcosirc> it was not a problem related to my changes

18:47 < marcosirc> it was an external prolem

18:47 < marcosirc> *problem.

18:48 < sumedhghaisas> ohh okay... no problem then... I think I have already looked at the code changes...

18:48 < sumedhghaisas> I will just go over the tests ...

18:50 < marcosirc> Ok

19:03 < sumedhghaisas> @marcosirc In the basic test where you are comparing the results of AKNN with KNN ...

19:04 < sumedhghaisas> I am confused about 'epsilon * 100' as a valid range??

19:06 < marcosirc> hi, it is a percentage, 0-100%

19:07 < marcosirc> I translate epsilon, from the rant [0,1] to [0,100]

19:07 < marcosirc> *range

19:08 < marcosirc> Anyway, I removed those lines in the next commit.

19:08 < marcosirc> :)

19:09 < sumedhghaisas> but I thought BOOST_REQUIRE_CLOSE asks for absolute tolerance...?

19:09 < sumedhghaisas> ohh I was going commit by commit :)

19:11 < sumedhghaisas> ahh you implemented REQUIRE_RELATIVE_ERROR for that...

19:11 mentekid has joined #mlpack

19:11 < sumedhghaisas> and yes we should add a file dedicated to extra test macros...

19:12 < sumedhghaisas> old_boost_definitions is just for compatibility...

19:12 < marcosirc> It is not absolute error, it uses Knuth's relative error formula.

19:12 < marcosirc> Ok, nice to know you agree.

19:13 < marcosirc> yeah!

19:16 < sumedhghaisas> you mean the one he gives in The Art of Computer Programming??

19:18 < rcurtin> I'd say yeah we should put it in something like require_relative_error.hpp or something, since it's not an old boost test definition

19:18 < rcurtin> but I guess whatever filename is fine

19:19 < rcurtin> maybe either of you have a better idea :)

19:20 < sumedhghaisas> but dedicating a header for this small macro?? I like the idea of test_tools.hpp... that way such small hacks can be stored together...

19:22 < marcosirc> yeah, I found that in the boost documentation.
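
The relative-error check discussed above can be sketched as follows. This is a minimal illustration of Knuth's "strong" and "weak" closeness tests as described in the Boost.Test documentation, not the macro from the pull request; the names CloseStrong and CloseWeak are hypothetical, and the tolerance here is a fraction (Boost's CLOSE macros take a percentage instead, which is where the 'epsilon * 100' came from).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helpers for illustration only; not mlpack's or Boost's API.

// Strong check: |a - b| / |a| <= tol AND |a - b| / |b| <= tol.
// Written multiplicatively so that zero operands never cause a division.
inline bool CloseStrong(double a, double b, double tol)
{
  assert(tol >= 0.0);
  const double diff = std::fabs(a - b);
  return diff <= tol * std::min(std::fabs(a), std::fabs(b));
}

// Weak check: |a - b| / |a| <= tol OR |a - b| / |b| <= tol.
inline bool CloseWeak(double a, double b, double tol)
{
  assert(tol >= 0.0);
  const double diff = std::fabs(a - b);
  return diff <= tol * std::max(std::fabs(a), std::fabs(b));
}
```

Because the error is measured relative to the operands, a nonzero value is never "close" to exactly zero under the strong check, whatever the tolerance.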

19:22 < marcosirc> Yeah.. maybe they could be placed in the same file..

19:26 < rcurtin> I agree, test_tools.hpp would be nice

19:30 govg has quit [Quit: leaving]

19:44 < sumedhghaisas> @marcosirc: tests look solid... hate to point this out but some tests are line commented rather than function but I can fix that while merging ... not a problem :) Do you think we should squash commit 'Replace CLOSE by CLOSE_FRACTION.' ?

19:45 < sumedhghaisas> I am not sure what policy ryan follows here... squashing this commit will have better history as it is later replaced by another function altogether...

19:47 < marcosirc> Ok. No problem, I can remove that commit.

19:49 < marcosirc> If you prefer, I can fix the comments.

19:50 < marcosirc> I copied them from knn_test.hpp

19:51 mentekid has quit [Ping timeout: 240 seconds]

19:51 < sumedhghaisas> either way is fine for me... :)

19:52 < sumedhghaisas> seems like AppVeyor is still running...

19:53 < rcurtin> I generally don't squash commits, I like when commits are easier to understand because they are small

19:56 < sumedhghaisas> yeah thats true but then it might bloat the history...

19:58 nilay has joined #mlpack

19:58 < sumedhghaisas> also that commit serves no purpose anymore... the code changes in that commit are changing function A to B ... then in the next commit it is changed from B to C... so I thought lets put A to C ... :)

20:03 < rcurtin> sure, I don't disagree with that

20:03 < rcurtin> I think there is lots of bloat in the history anyway, so it is no huge deal either way :)

20:05 < sumedhghaisas> yeah thats true...

20:05 < sumedhghaisas> about that code redundancy thing....

20:07 < sumedhghaisas> since we are deciding on dynamic values of bool... there has to be some dynamic resolution required right??

20:09 < marcosirc> sumedh: sorry, I don't understand what you mean to say.

20:10 < marcosirc> which part of the code are you talking about?

20:11 < sumedhghaisas> @marcosirc Ryan suggested some CRTP to solve the problem... but I don't think that is possible...

20:11 < marcosirc> Ahhh yeah.. about ns_model.

20:11 < sumedhghaisas> I like the idea of boost variant...

20:11 < sumedhghaisas> ohh yes about ns_model

20:12 < sumedhghaisas> boost variant is a good option...

20:12 < marcosirc> I have implemented some changes, using an abstract class.

20:12 < marcosirc> I will push them so you can see.

20:14 < sumedhghaisas> with boost variant?

20:15 < sumedhghaisas> ohh with interface you mean...

20:16 < marcosirc> No, using inheritance...

20:16 < marcosirc> I have been considering many options

20:17 < sumedhghaisas> I am not actually sure which is faster... inheritance or boost variant...

20:17 < marcosirc> but most of them resulted in a lot of lines of code...

20:17 < marcosirc> Yes.. I really didn't have enough time to look into boost variant

20:18 < marcosirc> Here is a possible implementation: https://github.com/MarcosPividori/mlpack/tree/improve-nsmodel

20:18 < sumedhghaisas> inheritance solutions will involve lot of code... also all newer trees have to follow certain rules...

20:18 < sumedhghaisas> on other side boost variant is elegant...

20:19 < sumedhghaisas> but I do not know how it fares in speed
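
For readers unfamiliar with the variant approach mentioned here, a rough sketch follows. It uses C++17 std::variant as a stand-in for boost::variant, and KDTreeSearch, CoverTreeSearch, SetLeafSize and NameOf are hypothetical names invented for the example; none of this is the actual NSModel code. The idea is that the tree type is chosen at runtime, and a visitor applies an option (such as leaf size) only to the types that support it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>

// Hypothetical stand-ins for NeighborSearch instantiated on different trees.
struct KDTreeSearch
{
  std::size_t leafSize = 20;
  std::string Name() const { return "kd-tree"; }
};

struct CoverTreeSearch
{
  // No leaf size concept for this tree type in this sketch.
  std::string Name() const { return "cover tree"; }
};

// The model holds exactly one of the alternatives, selected at runtime.
using AnySearch = std::variant<KDTreeSearch, CoverTreeSearch>;

// Visitor: sets the leaf size where it exists, does nothing otherwise.
struct SetLeafSize
{
  std::size_t value;

  void operator()(KDTreeSearch& s) const
  {
    assert(value > 0);
    s.leafSize = value;
  }

  void operator()(CoverTreeSearch&) const { /* nothing to set */ }
};

// Generic visitor: works for any alternative that provides Name().
inline std::string NameOf(const AnySearch& s)
{
  return std::visit([](const auto& impl) { return impl.Name(); }, s);
}
```

Usage would look like `AnySearch s = KDTreeSearch{}; std::visit(SetLeafSize{5}, s);`. Each new tree type then needs one extra alternative and, at most, one extra visitor overload; how this compares in speed with virtual dispatch is left open here, as it is in the discussion.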

20:19 < marcosirc> Not necessarily; not every new tree means more code..

20:20 < marcosirc> The main problem can be summarized to:

20:20 < marcosirc> "we want to set the leaf size of trees in neighbor search, and the NeighborSearch Class doesn't provide that option"

20:21 < marcosirc> This is the reason for the difference in the code between KDTrees/BallTrees and the rest of the tree types.

20:23 < marcosirc> So I implemented a NeighborSearchLeaf class that encapsulates an instance of NeighborSearch class, and adds the functionality to deal with different leafSizes

20:25 < marcosirc> I have to fix the NSModel::Serialize() method yet..

20:36 < sumedhghaisas> @marcosirc hmm... this is a new way to think about it... it's too late at night in India :) do you mind if we continue this discussion tomorrow??

20:36 < marcosirc> Yes! Sure!

20:37 < marcosirc> I will read about boost variants and CRTP for tomorrow

20:37 < marcosirc> So we can compare/contrast.

20:37 < sumedhghaisas> thanks :) I will look at the code tomorrow... we can discuss about the options tomorrow...

20:38 < sumedhghaisas> I agree...

20:38 < sumedhghaisas> goo night :)

20:38 < sumedhghaisas> *good

20:38 < marcosirc> the same for you!

20:38 < marcosirc> Ahh appveyor succeeded.

20:39 < marcosirc> so, I am going to make the changes you proposed.

20:53 < rcurtin> marcosirc: although right now it is only the ball tree and kd tree that support different leaf size options, other trees that we add in the future may also have other options we want to consider

20:54 < marcosirc> rcurtin: yeah, only binary trees, as far as I understand.

20:55 < rcurtin> no, what I mean is, it is possible that someday we may want to add a user-facing option to control a different part of a different tree

20:55 < marcosirc> maybe I could modify the code to include a leafSize member in the NeighborSearch class, and then modify the BuildTree function to decide, depending on the type of the tree, whether or not to use a leafSize parameter

20:55 < rcurtin> like maybe... if we add vantage point trees, then maybe we want the user to be able to specify the size of the inner and outer balls

20:56 < rcurtin> or actually spill trees are maybe a better example, we want them to be able to specify the amount of overlap or something like this

20:56 < marcosirc> Yes, I understand.

20:56 sumedhghaisas has quit [Quit: http://www.kiwiirc.com/ - A hand crafted IRC client]

20:56 < rcurtin> I haven't taken a look too closely at the changes you made though, I hope to have some time tomorrow (but I might not)

20:57 < marcosirc> I have been thinking about that. We should provide an interface to set tree options...

20:57 < marcosirc> Ok.

20:59 < rcurtin> another thing worth considering, is that the only reason NSModel exists is for the mlpack_knn and mlpack_kfn programs, so we don't need to be too concerned with providing NSModel as a public interface, because that is not what it is meant to be

20:59 < rcurtin> (hence why it does not provide all the overloads of Search() that NeighborSearch does)

21:00 < marcosirc> Yeah, I totally agree.

21:05 < marcosirc> rcurtin: I have pushed the changes proposed by sumedh.

21:07 < rcurtin> okay, I'll take a look through the PR on the train tomorrow and make any comments

21:07 < marcosirc> I have to leave now for some hours

21:07 < rcurtin> okay, have a good evening :)

21:07 < rcurtin> I think that any comments I have will be pretty simple

21:07 < marcosirc> Ok, thanks! the same for you!

21:07 marcosirc has quit [Quit: WeeChat 1.4]

21:09 < rcurtin> nilay: I guess you deleted the test_branch branch? we should make feature branches in our own forks, like in nilayjain/mlpack:test_branch, not mlpack/mlpack:test_branch

21:09 < rcurtin> if I can clarify anything about that process, please let me know, I am happy to help :)

21:09 < nilay> yeah, i made the same mistake again, i intended to push it to my fork

21:13 < rcurtin> no worries, just let me know if I can help out if you are having git trouble :)

21:13 < rcurtin> it took me a long time to learn git and I still learn new things every day, it is a complex tool :)

21:27 travis-ci has joined #mlpack

21:27 < travis-ci> mlpack/mlpack#971 (test_branch - 34840c8 : nilayjain): The build failed.

21:27 < travis-ci> Change view : https://github.com/mlpack/mlpack/commit/34840c8c0f39

21:27 < travis-ci> Build details : https://travis-ci.org/mlpack/mlpack/builds/136543363

21:27 travis-ci has left #mlpack []

21:41 < nilay> rcurtin: thank you

22:25 < zoq> nilay: Okay, I'm starting to remember, should the CopyMakeBorderTest function return something?

22:27 < nilay> yeah, same mistake again :(

22:27 < zoq> missed it this time :)

22:28 < nilay> why doesn't compiler show this thing

22:28 < nilay> this is supposed to be compilers job

22:28 < nilay> anyways thanks a lot

22:29 < zoq> I guess, there is some option you could use to print a warning.

22:29 < zoq> Also take a look at ConvTriangleTest.

22:30 < nilay> but then again warnings are so many

22:30 < zoq> you mean warnings about int and size_t?

22:30 < nilay> actually i copied the functions and added a Test behind their name and added arguments. and so forgot to see the return value :(

22:31 < nilay> yeah and other warnings also, whenever i compile so much is flushed on screen. i only look for red error messages.

22:31 < zoq> :)

22:31 < nilay> maybe i have to change that way. then i won't miss these errors.

22:32 < zoq> We can fix those warnings later, looks like you can avoid a lot of warnings if you are going to use size_t instead of int.

22:33 < nilay> yeah i will do that

22:34 < nilay> zoq: you were saying something about ConvTriangleTest

22:34 < nilay> also do you think these tests are enough or do we need to add more?

22:35 < zoq> I guess, ConvTriangleTest should return something or you should use void instead.

22:35 < nilay> Yeah they all should be void

22:36 < nilay> same error replicated at multiple places
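
The missing-return problem discussed above can be illustrated with a small sketch. On GCC and Clang the relevant warning is -Wreturn-type (part of -Wall on GCC), and -Werror=return-type turns it into a hard error so it cannot be lost among other warnings. The function names below are hypothetical and are not the functions from the pull request.

```cpp
#include <cassert>
#include <vector>

// Buggy pattern, kept as a comment so this file still builds with
// -Werror=return-type: declared to return int, but nothing is returned.
//
//   int SumTestBuggy(const std::vector<int>& v, int expected)
//   {
//     int sum = 0;
//     for (int x : v) sum += x;
//     assert(sum == expected);
//   }  // flowing off the end of a non-void function is undefined behavior

// Fix 1: the function only checks something, so declare it void.
inline void CheckSum(const std::vector<int>& v, int expected)
{
  int sum = 0;
  for (int x : v) sum += x;
  assert(sum == expected);
}

// Fix 2: if callers need the value, actually return it.
inline int Sum(const std::vector<int>& v)
{
  int sum = 0;
  for (int x : v) sum += x;
  return sum;
}
```

A possible compile line, assuming the file is called example.cpp: `g++ -Wall -Werror=return-type -c example.cpp`.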

22:36 < zoq> Should be good enough to test the code.

22:37 < nilay> ok then i'll incorporate the changes tham suggested and only work on these tests.

22:38 < zoq> Also, the Test function only compares the first entries.

22:38 < zoq> sounds good

22:39 < nilay> n_elem should give all entries?

22:41 < zoq> yeah, right but you never change the pointer. Looks like you could just use: BOOST_REQUIRE_CLOSE(m1(i), m2(i), 1e-3); no need to use memptr()
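
The point about comparing every entry can be sketched as below. To stay free of external dependencies, std::vector stands in for the Armadillo matrices and a hand-written relative check stands in for BOOST_REQUIRE_CLOSE; AllClose is a hypothetical helper, and its tolerance is a percentage to mirror the Boost macro's third argument.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true only if EVERY pair of entries is within
// tolPercent percent of both operands (a strong relative check).
inline bool AllClose(const std::vector<double>& m1,
                     const std::vector<double>& m2,
                     double tolPercent)
{
  assert(tolPercent >= 0.0);
  if (m1.size() != m2.size())
    return false;

  const double tol = tolPercent / 100.0;
  for (std::size_t i = 0; i < m1.size(); ++i)
  {
    // Index with i on every iteration; reading through a pointer that is
    // never advanced would compare the first entry over and over.
    const double diff = std::fabs(m1[i] - m2[i]);
    if (diff > tol * std::fabs(m1[i]) || diff > tol * std::fabs(m2[i]))
      return false;
  }
  return true;
}
```

With Armadillo and Boost available, the loop body would reduce to the single BOOST_REQUIRE_CLOSE(m1(i), m2(i), 1e-3) call suggested above.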

22:43 < nilay> oh ok.

23:08 benchmark has joined #mlpack

23:08 benchmark has quit [Client Quit]