#mlpack on 2016-07-18 — irc logs at libera.irclog.whitequark.org

2015-01-15 23:05 verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/

05:38 Mathnerd314 has quit [Ping timeout: 250 seconds]

07:18 govg has quit [Ping timeout: 258 seconds]

07:20 govg has joined #mlpack

10:28 kwikadi has quit [Ping timeout: 258 seconds]

10:34 mentekid has joined #mlpack

11:10 kwikadi has joined #mlpack

11:25 mentekid has quit [Ping timeout: 264 seconds]

13:43 Mathnerd314 has joined #mlpack

13:55 mentekid has joined #mlpack

14:03 mentekid has quit [Ping timeout: 250 seconds]

14:19 mentekid has joined #mlpack

15:04 mentekid has quit [Ping timeout: 276 seconds]

16:41 nilay has joined #mlpack

16:54 mentekid has joined #mlpack

17:05 gtank has quit [Ping timeout: 272 seconds]

17:05 mentekid has quit [Ping timeout: 276 seconds]

17:09 gtank has joined #mlpack

17:12 < nilay> zoq: hi, I have a question: can you run this test and tell me why an error occurs?

17:13 < nilay> https://gist.github.com/nilayjain/dd657ab29252435dc61e5a664a5086f7

17:56 govg has quit [Ping timeout: 240 seconds]

17:58 govg has joined #mlpack

18:36 < nilay> zoq: to initialize the weights I will have to change code in the conv_layer, but then that test wouldn't stand on its own; it would only be useful for evaluation

18:47 pantsforbirds has joined #mlpack

18:49 < pantsforbirds> if I'm interested in contributing, is there some documentation I can read? I've found the Google Summer of Code projects, but I can't find any other contribution documents

18:52 < rcurtin> pantsforbirds: have you seen http://www.mlpack.org/involved.html ?

18:52 < rcurtin> I like the nick, by the way :)

18:52 < pantsforbirds> rcurtin, ah, that's exactly what I was looking for!

18:52 < pantsforbirds> and thanks!

18:53 < rcurtin> sure, please feel free to ask more questions if you like :)

18:59 nilay has quit [Ping timeout: 250 seconds]

19:09 < pantsforbirds> so if I wanted to help with some optimizer algorithms, would that be possible?

19:14 sumedhghaisas_ has joined #mlpack

19:15 < sumedhghaisas_> marcosirc: Hey Marcos...

19:15 < marcosirc> sumedhghaisas: Hi! how are you?!

19:15 < sumedhghaisas_> great... had a great trip.

19:15 < sumedhghaisas_> was exhausted the whole day...

19:16 < sumedhghaisas_> involved too much driving around

19:16 < sumedhghaisas_> so I looked at your mail...

19:16 < marcosirc> Nice! I can imagine!

19:17 < marcosirc> Ok.

19:17 < sumedhghaisas_> So, the fewer-than-k-neighbours problem...

19:17 < marcosirc> I was writing a new mail in response to Ryan's comments.

19:18 < sumedhghaisas_> So my best bet would be the first solution... given that it's properly documented...

19:18 < sumedhghaisas_> but it would be fai to also consider how other libraries handle this case...

19:18 < sumedhghaisas_> *fair

19:19 < sumedhghaisas_> like in the defeatist search if less than k neighbours are found...

19:21 < marcosirc> Yeah, I understand. I have implemented the 2nd solution because it was very simple to do and I thought it would be more useful for future users.

19:22 < marcosirc> I couldn't find many libraries implementing defeatist search.

19:22 < marcosirc> I have searched on Google for a while, and found some libraries with different approaches.

19:23 < sumedhghaisas_> I am not sure I understand the second option correctly...

19:24 < marcosirc> I was trying to understand how they handle the tau value. I didn't analyse how they work with different values of k, so I will review this!

19:24 < marcosirc> Sorry, maybe I didn't explain it well.

19:25 < marcosirc> I am just writing a new email with more info.

19:26 < sumedhghaisas_> So the third option checks for fewer than k candidates...

19:26 < sumedhghaisas_> and if there aren't k... converts the overlapping node to a normal node...

19:26 < sumedhghaisas_> is that right?

19:27 < sumedhghaisas_> I agree with you that this will add a lot of complexity... checking if points are revisited or not...

19:27 < marcosirc> Sorry, do you mean the second option?

19:27 < marcosirc> yeah.

19:28 < marcosirc> If less than k candidates, it considers the node as a non-overlapping node and does backtracking

19:29 < marcosirc> At the end it was not much complexity. Only 3 lines of code :) I have implemented that in the spill-trees branch.
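Editor's note: the second option, as marcosirc describes it, can be sketched on a toy example. This is a minimal sketch, assuming a hypothetical 1-D spill tree stored as a flat node array; `ToyNode`, `Search`, `TopK` and `KNearest` are illustrative names, not mlpack's API, and this is not the code in the spill-trees branch. At an overlapping node only the near child is searched (defeatist), unless that left fewer than k candidates, in which case the node is treated as non-overlapping and the search backtracks into the far child.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical toy spill tree over 1-D points, stored as a flat node array.
struct ToyNode {
  std::vector<std::size_t> points;  // point indices held by a leaf
  double split;                     // split value of an internal node
  bool overlapping;                 // true if the children share points
  int left, right;                  // child positions in the array, -1 for a leaf
};

// Sorts the visited candidates by (distance, index) and keeps at most k.
std::vector<std::size_t> TopK(const std::set<std::size_t>& visited,
                              const std::vector<double>& data, double q,
                              std::size_t k) {
  std::vector<std::pair<double, std::size_t>> cand;
  for (std::size_t i : visited) cand.emplace_back(std::abs(data[i] - q), i);
  std::sort(cand.begin(), cand.end());
  std::vector<std::size_t> result;
  for (std::size_t j = 0; j < cand.size() && j < k; ++j)
    result.push_back(cand[j].second);
  return result;
}

// Hybrid search with the "fewer than k candidates" fallback.
void Search(const std::vector<ToyNode>& tree, int n,
            const std::vector<double>& data, double q, std::size_t k,
            std::set<std::size_t>& visited) {
  const ToyNode& node = tree[n];
  if (node.left < 0) {  // leaf: every point becomes a candidate
    visited.insert(node.points.begin(), node.points.end());
    return;
  }
  const int nearChild = (q < node.split) ? node.left : node.right;
  const int farChild = (q < node.split) ? node.right : node.left;
  Search(tree, nearChild, data, q, k, visited);

  if (visited.size() >= k) {
    if (node.overlapping) return;  // defeatist: no backtracking
    // Non-overlapping: every far-child point is at least |q - split| away,
    // so skip it if that cannot beat the current k-th best distance.
    const std::vector<std::size_t> best = TopK(visited, data, q, k);
    if (std::abs(q - node.split) > std::abs(data[best.back()] - q)) return;
  }
  // Fewer than k candidates (or the far side may hold closer points): backtrack.
  Search(tree, farChild, data, q, k, visited);
}

std::vector<std::size_t> KNearest(const std::vector<ToyNode>& tree,
                                  const std::vector<double>& data, double q,
                                  std::size_t k) {
  assert(k > 0);
  // A set also drops points reached through both overlapping children.
  std::set<std::size_t> visited;
  Search(tree, 0, data, q, k, visited);
  return TopK(visited, data, q, k);
}
```

With points {0, 1, 2, 3, 10, 11} and an overlapping root split at 2.5 whose children share points 2 and 3, a query at 10.5 with k = 5 finds only four points in the near child, so it backtracks and returns five neighbours instead of four. The pruning step for non-overlapping nodes is a simplified 1-D bound, chosen only to keep the sketch short.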

19:29 < sumedhghaisas_> ahh yes sorry...

19:31 < sumedhghaisas_> I meant runtime complexity... but this can be a valid option...

19:31 < sumedhghaisas_> if user wants all k neighbours...

19:31 < marcosirc> yeah. I implemented a new tree trait

19:31 < marcosirc> to know if the tree has duplicated points

19:32 < marcosirc> it only checks for duplicated candidates when the tree has duplicated points

19:32 < sumedhghaisas_> if switching between them does not involve a lot of code... I would prefer keeping both... and passing a flag to switch

19:32 < marcosirc> so it won't modify the behaviour on other tree types.
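Editor's note: the idea of gating the duplicate check on a compile-time tree property can be sketched with a traits template. mlpack describes tree properties through a `TreeTraits` class, but the names below (`ToyTreeTraits`, `HasDuplicatedPoints`, `AlreadyCandidate`, the two tree tags) are hypothetical stand-ins, and the linear scan is only a placeholder for the real check. Trees that cannot store a point twice skip the scan entirely, so their behaviour is unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tree tags; only their traits matter here.
struct ToyKDTree {};
struct ToySpillTree {};

// Default: a tree never stores the same point twice.
template<typename TreeType>
struct ToyTreeTraits {
  static const bool HasDuplicatedPoints = false;
};

// Spill trees may store a point in both overlapping children.
template<>
struct ToyTreeTraits<ToySpillTree> {
  static const bool HasDuplicatedPoints = true;
};

// Returns true if `index` is already among the candidates. For tree types
// without duplicated points the answer is always false, so the scan is skipped.
template<typename TreeType>
bool AlreadyCandidate(const std::vector<std::size_t>& candidates,
                      std::size_t index) {
  if (!ToyTreeTraits<TreeType>::HasDuplicatedPoints)
    return false;  // a repeat is impossible for this tree type
  for (std::size_t c : candidates)
    if (c == index) return true;
  return false;
}
```

Because the trait is a compile-time constant, the compiler can drop the scan for kd-trees and similar trees, which matches the point that the check does not change behaviour for other tree types.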

19:32 < rcurtin> pantsforbirds: sorry for the slow response, I was in a meeting. you are absolutely welcome to help with optimizer algorithms!

19:33 < marcosirc> I also think it doesn't add significant runtime cost.

19:33 < marcosirc> because I implemented it this way:

19:34 < sumedhghaisas_> So without flag it would be the straightforward hybrid search.... with flag it will guarantee k neighbours...

19:34 < marcosirc> - you calculate the position in the sorted list of candidates where you want to insert the new point.

19:34 < marcosirc> let's call it "i".

19:35 < sumedhghaisas_> hmmm... okay

19:35 < marcosirc> then you analyse all the positions greater than or equal to "i" that have the same distance as the candidate you want to include.

19:36 < marcosirc> if the candidate was inserted before, you will find it there, and the probability of having another candidate with the same distance is really, really low.

19:36 < marcosirc> so it won't require many operations...

19:37 < marcosirc> Ok, I will consider the flag, but I think it could involve many changes to the current implementation...

19:39 < marcosirc> it is implemented here: https://github.com/MarcosPividori/mlpack/blob/spill-trees/src/mlpack/methods/neighbor_search/neighbor_search_rules_impl.hpp#L479

19:39 < rcurtin> marcosirc: the probability of having another candidate with the same distance is exceedingly low if the data is uniformly distributed, but if instead it comes from a discrete distribution (like the cloud dataset, or possibly even MNIST), neighbors with identical distances are very possible

19:41 < marcosirc> rcurtin: Ok, I understand. Anyway, I don't think it will require too many operations.

19:42 < rcurtin> yeah, you can simply check the neighbor index also

19:42 < marcosirc> Yeah, that is what I mean.

19:43 < marcosirc> I check that the point's index is not already present among all the candidates with the same distance as the candidate at "i".
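Editor's note: the insertion-position check marcosirc outlines, together with rcurtin's point about comparing indices, can be sketched as follows. This is a minimal sketch, assuming candidates are kept as (distance, index) pairs sorted by distance with at most k entries; `Candidate` and `InsertCandidate` are illustrative names, not the code linked above. A previous copy of the same point has exactly the same distance, so it can only appear in the run of equal distances that starts at the insertion position "i". Comparing indices, not just distances, keeps distinct points that happen to be equidistant, which is common with discrete data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// A candidate is (distance, point index); the list is sorted by distance.
using Candidate = std::pair<double, std::size_t>;

// Inserts (dist, index) into the sorted list, keeping at most k entries.
// Returns false if the point is already a candidate or is not among the k best.
bool InsertCandidate(std::vector<Candidate>& list, std::size_t k, double dist,
                     std::size_t index) {
  assert(k > 0);
  // "i": first position whose distance is not smaller than dist.
  auto i = std::lower_bound(list.begin(), list.end(), dist,
      [](const Candidate& c, double d) { return c.first < d; });
  // Scan only the run of exactly equal distances (same point, same query,
  // same computation gives the same value) and compare point indices.
  for (auto it = i; it != list.end() && it->first == dist; ++it)
    if (it->second == index) return false;
  if (static_cast<std::size_t>(i - list.begin()) >= k)
    return false;  // worse than the current k-th best
  list.insert(i, Candidate(dist, index));
  if (list.size() > k) list.pop_back();
  return true;
}
```

In the common case the equal-distance run is empty or very short, so the extra cost per insertion is small; with many ties it grows with the number of equidistant candidates.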

19:44 < rcurtin> ah, okay, I see what you mean now, sorry for the misunderstanding

19:45 < sumedhghaisas_> but still... won't it be extra effort for the user who wants hybrid search?

19:45 < marcosirc> Sure, sorry if I don't explain myself properly.

19:45 < sumedhghaisas_> thats why I was suggesting maybe like a 'force k neighbours' flag :)

19:46 < marcosirc> Mmmm, ok. But if you specify a given k, it's because you want k neighbors, not fewer...

19:48 < marcosirc> If you think this would be more useful, I can modify the current implementation to consider a new flag.

19:50 < marcosirc> If you agree, I can review what approach other libraries take.

19:51 < sumedhghaisas_> yes you are right... but should we alter the algorithm for it? That's what is hard to decide...

19:51 < sumedhghaisas_> rcurtin: What do you think about the flag option?

19:57 < marcosirc> I sent a new email with the last information :)

19:58 < sumedhghaisas_> marcosirc: And yes I agree that we should look into the approach by other libraries ...

19:58 < sumedhghaisas_> sorry slipped out of my mind...

19:58 < marcosirc> sumedhghaisas_: ok, I will do it now.

19:59 < rcurtin> sumedhghaisas_: I am not totally sure it is necessary; I don't have much of an opinion either way

20:00 < rcurtin> one of the things to consider is, if we do add a flag that will force the program to return k neighbors, then we should probably make the same option available for LSH and other techniques, but it is not always clear the best way to do that

20:10 < sumedhghaisas_> rcurtin: I understand, for consistency, but in this specific case as the overhead of checking the duplicate point is not much, we will be able to provide user with more control

20:14 < sumedhghaisas_> rcurtin: Also I installed ubuntu 16.04 ... and the default compiler is g++ 5.4.0 ... :)

20:15 < sumedhghaisas_> I will try to solve all those issues...

20:15 < rcurtin> (sorry, I am in a meeting... too many meetings... !)

20:15 < rcurtin> (I'll respond when I have a chance)

20:16 pantsforbirds has quit [Ping timeout: 260 seconds]

20:20 < sumedhghaisas_> rcurtin: ahh tell me about it :) I think my team wastes more time following agile terminology than optimizing the code

20:26 < marcosirc> haha

20:31 < sumedhghaisas_> btw I installed touchegg on my new installation... it's amazing. now I can do MacBook-like trackpad gestures on Ubuntu...

20:32 < sumedhghaisas_> it took me a night to set it up, so if someone else wants help I can provide the prebuilt scripts :)

20:36 nilay has joined #mlpack

20:37 < nilay> zoq: does this test look good? https://gist.github.com/nilayjain/e2ec2fbb02955508b64812b1b996d1aa I know there are a few tweaks to be made; right now I am just printing values. Does this establish correctness for the forward and backward passes, or do we need stricter tests? Let me know

20:43 nilay has quit [Quit: Page closed]

21:03 < marcosirc> sumedhghaisas_: rcurtin: In this implementation:

21:03 < marcosirc> https://github.com/kipr/opencv/blob/master/modules/legacy/src/spilltree.cpp#L334

21:03 < marcosirc> They do defeatist search while the node has at least k descendant points.

21:03 < marcosirc> Then, they do normal dfs search with prune rules. So, they can guarantee they will always return k points.
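Editor's note: the descent rule summarized above can be sketched like this. It follows marcosirc's summary, not a transcription of the OpenCV code, and assumes a hypothetical binary tree over 1-D points where each node stores its descendant count; `Node` and `DefeatistStart` are illustrative names. The defeatist phase keeps following the child on the query's side while that child still holds at least k points; the node where it stops is where an ordinary depth-first search with pruning would take over.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical binary tree node over 1-D points, stored in a flat array.
struct Node {
  std::size_t numDescendants;  // number of points below this node
  double split;                // split value (internal nodes only)
  int left, right;             // child positions, -1 for a leaf
};

// Returns the node where defeatist descent stops. At least k points are
// reachable from it, provided the root itself has k or more.
int DefeatistStart(const std::vector<Node>& tree, double q, std::size_t k) {
  assert(tree[0].numDescendants >= k);
  int n = 0;
  while (tree[n].left >= 0) {
    const int next = (q < tree[n].split) ? tree[n].left : tree[n].right;
    if (tree[next].numDescendants < k) break;  // too few points below next
    n = next;
  }
  return n;
}
```

Starting the regular search from the returned node is what guarantees k results; note that, as discussed, this rule by itself does nothing about points stored in both overlapping children.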

21:03 < marcosirc> It is a similar approach to what we do.

21:06 < marcosirc> But, it looks like they don't care about repeated points... I can't find more documentation than the code itself!

21:15 < zoq> nilay: nice way to initialize the weights, one last step you should do is to compare the output with a reference output, similar to the convolution test: https://github.com/mlpack/mlpack/blob/master/src/mlpack/tests/convolution_test.cpp
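Editor's note: the kind of check zoq suggests, comparing a layer's output against a hand-computed reference instead of printing values, can be sketched in plain C++. This is not mlpack's convolution_test.cpp or its API; `Matrix`, `NaiveValidConvolution` and `MatchesReference` are illustrative names, and the sketch uses the textbook "valid" convolution with a flipped kernel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Naive "valid" 2-D convolution: the kernel is flipped (unlike
// cross-correlation). Output size: (rows - kRows + 1) x (cols - kCols + 1).
Matrix NaiveValidConvolution(const Matrix& input, const Matrix& kernel) {
  const std::size_t kr = kernel.size(), kc = kernel[0].size();
  const std::size_t outR = input.size() - kr + 1;
  const std::size_t outC = input[0].size() - kc + 1;
  Matrix out(outR, std::vector<double>(outC, 0.0));
  for (std::size_t i = 0; i < outR; ++i)
    for (std::size_t j = 0; j < outC; ++j)
      for (std::size_t m = 0; m < kr; ++m)
        for (std::size_t n = 0; n < kc; ++n)
          out[i][j] += input[i + m][j + n] * kernel[kr - 1 - m][kc - 1 - n];
  return out;
}

// Element-wise comparison against a reference output, within a tolerance.
bool MatchesReference(const Matrix& out, const Matrix& ref, double tol) {
  if (out.size() != ref.size()) return false;
  for (std::size_t i = 0; i < out.size(); ++i) {
    if (out[i].size() != ref[i].size()) return false;
    for (std::size_t j = 0; j < out[i].size(); ++j)
      if (std::abs(out[i][j] - ref[i][j]) > tol) return false;
  }
  return true;
}
```

For the input {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}} and kernel {{1, 2}, {3, 4}}, the reference worked out by hand is {{23, 33}, {53, 63}}; an implementation that forgot the flip would give 37 at (0, 0), so the comparison catches it. The same pattern applies to a backward pass: compute a small gradient by hand (or by finite differences) and compare within a tolerance.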

21:16 < zoq> nilay: Does this mean you fixed the error?

22:13 sumedhghaisas_ has quit [Ping timeout: 260 seconds]

22:32 travis-ci has joined #mlpack

22:32 < travis-ci> mlpack/mlpack#1226 (master - 8900b8c : Ryan Curtin): The build is still failing.

22:32 < travis-ci> Change view : https://github.com/mlpack/mlpack/compare/ccc6ff0c01d6...8900b8ca1591

22:32 < travis-ci> Build details : https://travis-ci.org/mlpack/mlpack/builds/145675536

22:32 travis-ci has left #mlpack []

22:54 < marcosirc> sumedhghaisas_: rcurtin: after searching some time in google and github, I couldn't find many popular libraries implementing spill trees.

22:54 < marcosirc> The most relevant code that I found is an old opencv implementation (the same that I mentioned before):

22:54 < marcosirc> https://github.com/opencv/opencv_attic/blob/a6078cc8477ff055427b67048a95547b3efe92a5/opencv/modules/legacy/src/spilltree.cpp

22:56 < marcosirc> I couldn't find more documentation than the code itself. After reading it, it looks like they guarantee k candidates, but I don't think they check for repeated points...