#mlpack on 2014-07-08 — irc logs at libera.irclog.whitequark.org

2014-05-21 16:24 naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/

02:04 witness___ has quit [Quit: Connection closed for inactivity]

04:26 govg has quit [Ping timeout: 264 seconds]

06:16 witness___ has joined #mlpack

08:26 Anand has joined #mlpack

08:28 < Anand> Marcus : Your thoughts on the weka logistic regression predicted labels?

08:47 Anand has quit [Ping timeout: 246 seconds]

08:49 Anand has joined #mlpack

08:55 govg has joined #mlpack

08:58 Anand has quit [Ping timeout: 246 seconds]

09:08 Anand_ has joined #mlpack

09:51 < marcus_zoq> Anand_: Hello! Did you commit your changes?

09:53 < Anand_> Marcus : Hi! Yes, to my branch

10:07 Anand_ has quit [Ping timeout: 246 seconds]

10:17 Anand has joined #mlpack

10:43 Anand_ has joined #mlpack

10:44 Anand has quit [Ping timeout: 246 seconds]

10:49 Anand_ has quit [Ping timeout: 246 seconds]

11:34 < jenkins-mlpack> Starting build #2000 for job mlpack - svn checkin test (previous build: SUCCESS)

12:54 < jenkins-mlpack> Project mlpack - svn checkin test build #2000: SUCCESS in 1 hr 20 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2000/

12:55 < jenkins-mlpack> * Ryan Curtin: Rename for slightly changed API.

12:55 < jenkins-mlpack> * Ryan Curtin: Trivial spacing fixes.

12:55 < jenkins-mlpack> * Ryan Curtin: Minor refactoring of AMF class; mostly renaming for consistency and

12:55 < jenkins-mlpack> clarification of comments.

12:55 < jenkins-mlpack> Starting build #2001 for job mlpack - svn checkin test (previous build: SUCCESS)

14:15 < jenkins-mlpack> Project mlpack - svn checkin test build #2001: SUCCESS in 1 hr 20 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2001/

14:15 < jenkins-mlpack> * Ryan Curtin: Patch from Zhihao: sa_update.diff.

14:15 < jenkins-mlpack> * Ryan Curtin: Disambiguate: math::RandomSeed() -> mlpack::math::RandomSeed(). Issue noted by

14:15 < jenkins-mlpack> Zhihao.

14:15 < jenkins-mlpack> * Ryan Curtin: Don't include <armadillo> before <mlpack/core.hpp>.

14:15 < jenkins-mlpack> * Ryan Curtin: Note that we now have simulated annealing.

14:15 < jenkins-mlpack> * Ryan Curtin: Add new contributor.

14:15 < jenkins-mlpack> * Ryan Curtin: Don't include <armadillo> explicitly, because <mlpack/core.hpp> does that

14:15 < jenkins-mlpack> already.

14:15 < jenkins-mlpack> * Ryan Curtin: Move warning to prereqs.hpp, because sometimes prereqs.hpp is included and

14:15 < jenkins-mlpack> core.hpp is not.

14:21 andrewmw94 has joined #mlpack

15:23 Anand has joined #mlpack

15:34 < marcus_zoq> Anand: Hello, so if you finished the logistic regression, the plan is to continue with the linear regression method?

15:35 < Anand> Yes, I also added metrics for linear regression for scikit today.

15:35 < Anand> Will complete it by tomorrow or the day after

15:36 < Anand> And then I will take HMM, I guess

15:36 < marcus_zoq> Anand: Sounds good :)

15:37 < marcus_zoq> Anand: Maybe at the end of the week you can merge your branch with the master branch?

15:37 < Anand> Yes sure! Did you have a look at the bug fix? I am not sure if that is the bug that caused build failure, but probably it did

15:41 < marcus_zoq> Anand: It's the same error.

15:42 < Anand> You mean you are still getting the error?

15:42 < marcus_zoq> Anand: Yeah, vec[int(Vec[i])-1]=1 -> IndexError: list assignment index out of range

15:43 < Anand> You merged with my branch?

15:43 < marcus_zoq> Anand: I'm using your branch, so I'm testing with the fix.

15:44 < Anand> Ok. I will see what is going wrong.

15:45 < Anand> Doesn't seem to be a good thing

15:46 < marcus_zoq> Anand: You can install the unittest-xml-reporting package and then run the tests with 'make checks'.

15:49 < Anand> Ok, I will

15:50 < marcus_zoq> Anand: You can edit the tests.py file to run only a single test.

15:51 < Anand> Ok, yeah I got it. I will need to edit the modules

16:07 Anand has quit [Ping timeout: 246 seconds]

16:23 govg has quit [Quit: leaving]

16:34 govg has joined #mlpack

17:33 govg has quit [Quit: leaving]

17:38 sumedhghaisas has joined #mlpack

17:43 < naywhayare> andrewmw94: a fix for clang -- http://www.mlpack.org/trac/changeset/16788

17:43 < naywhayare> also, I'm debugging that kd-tree issue... but it may be a while until I have a solution

17:44 < naywhayare> I set a breakpoint in Score() for the relevant kd-tree nodes using gdb... but it has been running for four hours and hasn't stopped yet...

17:44 < andrewmw94> hmm, not fun

17:45 < andrewmw94> I have a question about C++

17:45 < andrewmw94> for the R* tree, the descent heuristic is different if you are descending to a leaf node or a non-leaf node.

17:45 < andrewmw94> but for the R tree it's the same

17:45 < andrewmw94> so I wanted to pass a boolean to the EvalNode() function so it works with R*-trees, but with R trees it just ignores the parameter

17:46 < andrewmw94> which causes warnings because it is unused

17:46 < andrewmw94> is there a nice way to solve that? I could add another template or something, but that seems silly

17:47 < naywhayare> ah, just comment out the parameter or leave it unnamed

17:47 < naywhayare> i.e. void Function(const double /* unused */)

17:47 < naywhayare> or void Function(const double)

17:47 < naywhayare> I prefer the first because it leaves some information on what the parameter would be, if it was used
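
The suggestion above can be sketched as follows. This is a minimal illustration, not the actual mlpack R tree code: the struct names, the `Score()` wrapper, and the toy scoring formulas are all hypothetical. The only point it shows is that both heuristics share one signature, and the one that ignores the flag leaves the parameter unnamed (keeping its name in a comment) so no unused-parameter warning is raised.

```cpp
#include <cassert>

// Hypothetical stand-in for the plain R tree descent heuristic.
struct RTreeDescent
{
  // The R tree uses the same heuristic at every level, so the flag is ignored.
  // Leaving the parameter unnamed avoids the unused-parameter warning; the
  // comment records what the parameter would mean if it were used.
  static double EvalNode(const double enlargement, const bool /* toLeaf */)
  {
    return enlargement;
  }
};

// Hypothetical stand-in for the R*-tree descent heuristic.
struct RStarTreeDescent
{
  // The R*-tree scores differently when descending to a leaf. The formula
  // here is a placeholder, not the real R*-tree criterion.
  static double EvalNode(const double enlargement, const bool toLeaf)
  {
    return toLeaf ? 2.0 * enlargement : enlargement;
  }
};

// Calling code passes the flag unconditionally and needs no special casing.
template<typename DescentHeuristic>
double Score(const double enlargement, const bool toLeaf)
{
  return DescentHeuristic::EvalNode(enlargement, toLeaf);
}
```

Because the signature is identical in both structs, no extra template parameter or class member is needed to select the behavior.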

17:48 < andrewmw94> ah, thanks

17:49 < andrewmw94> I was thinking "I could add another const variable to the class and then use that and the compiler should optimize it out.

17:49 < andrewmw94> but there must be a better way to do this

17:50 govg has joined #mlpack

17:50 < andrewmw94> " C++ is nice because it has features for almost everything

17:51 < sumedhghaisas> naywhayare: According to paper we should get final RMSE of 0.87 ... again I am getting 1.3 :(

17:52 < naywhayare> sumedhghaisas: for incremental SVD?

17:52 < sumedhghaisas> yes...

17:52 < naywhayare> have you tried tweaking the parameters?

17:52 < sumedhghaisas> but the performance is better than SVDBatch definitely :)

17:53 andrewmw94 has quit [Quit: Leaving.]

17:53 andrewmw94 has joined #mlpack

17:54 < sumedhghaisas> yes... I am trying other parameters now...

17:54 < sumedhghaisas> did you look at the abstraction??

17:55 < naywhayare> for IncompleteIncrementalTermination? it looks good to me; very simple

17:56 < sumedhghaisas> yes... indeed... :)

18:35 < sumedhghaisas> naywhayare: now I am always returning false in IsConverged... current RMSE is 0.90 and decreasing ... :)

18:35 < sumedhghaisas> can you look at the paper right now??

18:37 < naywhayare> I'm actually a bit busy right now, but if you have a question, tell me what it is and I will look into it shortly...

18:37 < sumedhghaisas> okay no problem.. msg me when you are free...

18:48 < sumedhghaisas> naywhayare: not going to watch world cup semifinal ?? :)

18:53 < naywhayare> no time, too much to do :(

19:02 < sumedhghaisas> ohh okay :(

19:02 < sumedhghaisas> maybe tomorrow??

19:05 < sumedhghaisas> naywhayare: yieeepy... 0.87 now... :)

19:06 < naywhayare> probably no world cup for me, unfortunately. what is the algorithmic question you had? I can look into it now

19:07 < sumedhghaisas> naywhayare: okay... can you look at figure 2??

19:08 < sumedhghaisas> if you see the curve it's not smooth... there are a lot of ups and downs in between...

19:08 < sumedhghaisas> that's why our algorithm is terminating way early...

19:08 < sumedhghaisas> ohh forgot to tell you look at SVDUSER...

19:09 < sumedhghaisas> that's the one I have implemented right now...

19:09 < naywhayare> I see what you mean

19:10 < naywhayare> it seems like you could avoid that by maybe increasing the tolerance in the ValidationRMSETermination to something like 0.002

19:10 < naywhayare> not 1e-5

19:10 < naywhayare> have you tried that?

19:12 < sumedhghaisas> umm.. but what if the RMSE increases...

19:12 < naywhayare> yeah, it can increase by up to 0.002 each iteration

19:13 < naywhayare> hm, or do you mean if the RMSE increases for multiple iterations?

19:14 < sumedhghaisas> ohh okay... then I can just add abs(...) to IsConverged...

19:15 < sumedhghaisas> right now it's (oldRMSE - RMSE) / oldRMSE > tolerance....

19:15 < naywhayare> oh, ok, then maybe you can add a second parameter for the tolerance for how much it is allowed to increase

19:16 < naywhayare> terminate if ((oldRMSE - RMSE) / oldRMSE < tolerance) || ((RMSE - oldRMSE) > increaseTolerance)

19:16 < naywhayare> does that make sense? the first part is the same -- terminate if the relative change in RMSE is below the tolerance

19:16 < naywhayare> and the second part says, terminate if the RMSE jumped back up by increaseTolerance or more
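
A rough sketch of the two-tolerance check described above. The function name `ShouldTerminate` is hypothetical and this is not the mlpack termination policy itself. One assumption is made about intent: taken literally, the relative-change test is also true whenever the RMSE rises (the difference is negative), so the sketch applies it only to steps where the RMSE did not increase and uses `increaseTolerance` for steps where it did.

```cpp
#include <cassert>

// Hypothetical helper; returns true when the iteration should stop.
bool ShouldTerminate(const double oldRMSE,
                     const double rmse,
                     const double tolerance,
                     const double increaseTolerance)
{
  assert(oldRMSE > 0.0); // The relative change divides by oldRMSE.

  // RMSE went up: stop only if it jumped by more than increaseTolerance.
  if (rmse > oldRMSE)
    return (rmse - oldRMSE) > increaseTolerance;

  // RMSE went down or stayed flat: stop if the relative improvement is
  // below the tolerance.
  return (oldRMSE - rmse) / oldRMSE < tolerance;
}
```

With `tolerance = 1e-5` and `increaseTolerance = 0.002`, a step from 0.9 to 0.9015 keeps going, while a step from 0.9 to 0.91 stops.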

19:19 < sumedhghaisas> yeah... it's a good solution... I am thinking now can we remove reverseStepCount??

19:19 < sumedhghaisas> is it important??

19:20 < sumedhghaisas> I guess yes... cause the RMSE may keep on increasing....

19:20 < sumedhghaisas> we need to detect that too...

19:30 < naywhayare> well, hang on... why doesn't reverseStepCount allow the algorithm to converge to an RMSE of 0.87?

19:30 < naywhayare> couldn't you just set it a little higher?

19:33 < jenkins-mlpack> Starting build #2002 for job mlpack - svn checkin test (previous build: SUCCESS)

19:47 imi has joined #mlpack

19:51 < sumedhghaisas> yes... but I guess increaseTolerance is better idea ... along with reverseStepCount it will perform better...

19:52 govg is now known as GOV|govg

19:52 < sumedhghaisas> naywhayare: I can set reverseStepCount little higher but there can be many kinks in the way... if you see the graph in the paper... there are many up downs....

19:53 GOV|govg is now known as zGz|govg

19:53 < naywhayare> sumedhghaisas: that's true -- but how many iterations is each kink?

19:53 < sumedhghaisas> means... I didn't get you...

19:54 < sumedhghaisas> in the graph before convergence ... there are many many kinks...

19:54 < naywhayare> right

19:54 < naywhayare> but how long is the "up" part of these kinks? 10 iterations? 15 iterations? you should be able to set reverseStepCount to be just a little longer than the longest kink, and that should work
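
The idea of waiting out a kink can be sketched with a counter of consecutive non-improving iterations. This is an illustration under assumptions: the class name `PatienceTermination` and the member `maxReverseSteps` are hypothetical, and the real semantics of reverseStepCount in mlpack may differ. The sketch stops only after the RMSE has failed to improve for `maxReverseSteps` iterations in a row, so a kink shorter than that is ignored.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical termination policy that tolerates short upward kinks.
class PatienceTermination
{
 public:
  PatienceTermination(const double tolerance,
                      const std::size_t maxReverseSteps) :
      tolerance(tolerance),
      maxReverseSteps(maxReverseSteps),
      reverseSteps(0),
      previous(0.0),
      hasPrevious(false)
  { }

  // Call once per iteration with the current validation RMSE (assumed > 0).
  bool IsConverged(const double rmse)
  {
    if (!hasPrevious)
    {
      previous = rmse;
      hasPrevious = true;
      return false;
    }

    const double relativeImprovement = (previous - rmse) / previous;
    previous = rmse;

    // A real improvement resets the counter; anything else extends the run
    // of non-improving steps.
    if (relativeImprovement > tolerance)
      reverseSteps = 0;
    else
      ++reverseSteps;

    return reverseSteps >= maxReverseSteps;
  }

 private:
  double tolerance;
  std::size_t maxReverseSteps;
  std::size_t reverseSteps;
  double previous;
  bool hasPrevious;
};
```

Setting `maxReverseSteps` just above the length of the longest kink lets the run continue through every kink and stop once the RMSE has genuinely flattened out or started climbing.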

19:55 < sumedhghaisas> yes... but then this can change for different datasets... what should be the default value??

19:56 < naywhayare> well, we can leave the default how it is; many parameters like this have to be tuned for different datasets

19:57 < sumedhghaisas> yes... I will try for higher values of reverseStepCount...

19:57 < sumedhghaisas> lets see if I can produce 0.87...

20:21 < andrewmw94> naywhayare: I have a possibly detailed question. Is now a good time?

20:24 < naywhayare> sure, go ahead

20:25 < andrewmw94> ok, so in the paper for the R* tree, it mentions another paper

20:25 < andrewmw94> http://www.cs.cuhk.hk/~lyu/student/fyp99/lyu9901/chho1/p17-roussopoulos.pdf

20:25 < andrewmw94> which it says makes more sense for static datasets

20:26 < andrewmw94> I quickly did some more searching (e.g. http://repository.cmu.edu/cgi/viewcontent.cgi?article=1586&context=compsci) and it sounds like the algorithm outlined in the paper is a lot better than R* trees or X trees if we have static datasets

20:27 < andrewmw94> so I'm wondering whether I should try to implement that instead, or whether I should change the R tree so that it better supports dynamic insertion/deletion of points

20:28 < naywhayare> the problem we had with dynamic insertion/deletion is that when we have multiple arma::mat objects, it's not clear what to use as an index for a given point

20:29 < naywhayare> one could make a TreeType::Insert() function that appended the given vector to the internally held matrix, but this still costs allocation time equivalent to the size of the full matrix

20:30 < andrewmw94> I think I may have a solution to the point ordering thing. It's rather arbitrary, but I think it is consistent. However, I need to give it some more thought. But it also would not work well when adding/deleting points.

20:30 < naywhayare> or you could hold many matrices internally, and also hold some kind of "index offset" with each matrix, but the question there is, how do we make it so the user can easily understand what the indices they get back from NeighborSearch even are?
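
One way to picture the "index offset" idea is a small bookkeeping class that maps a single global point index to a (matrix, column) pair. This is only a sketch under assumptions: `BlockIndex`, `AddBlock()`, and `Locate()` are hypothetical names, nothing like this is claimed to exist in mlpack, and the matrices themselves are left out so the example stays self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical bookkeeping for points spread over several matrices ("blocks").
class BlockIndex
{
 public:
  BlockIndex() : total(0) { }

  // Register a new block holding numPoints points. Returns the global index
  // of the block's first point, which is the block's offset.
  std::size_t AddBlock(const std::size_t numPoints)
  {
    const std::size_t first = total;
    offsets.push_back(first);
    total += numPoints;
    return first;
  }

  // Map a global index to (block number, column within that block).
  std::pair<std::size_t, std::size_t> Locate(const std::size_t globalIndex) const
  {
    assert(globalIndex < total);
    // The first offset greater than globalIndex belongs to the next block,
    // so the block we want is the one just before it.
    const auto it =
        std::upper_bound(offsets.begin(), offsets.end(), globalIndex);
    const std::size_t block =
        static_cast<std::size_t>(it - offsets.begin()) - 1;
    return std::make_pair(block, globalIndex - offsets[block]);
  }

  std::size_t Size() const { return total; }

 private:
  std::vector<std::size_t> offsets; // Global index of each block's first point.
  std::size_t total;                // Total number of points in all blocks.
};
```

Under this scheme the user only ever sees global indices, and appending a block never changes the index of an existing point. Deletions are not handled here, which is part of the open question in the discussion.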

20:31 < andrewmw94> yeah. the dynamic insertions and deletions add a lot of extra overhead, and since they aren't used currently I'm dubious that it would be worthwhile

20:32 < naywhayare> they aren't currently used, but maybe someone would find them useful if they were implemented

20:32 < naywhayare> so, I could go either way on this one

20:32 < naywhayare> if you are interested in trying to figure out how to easily support dynamically-sized datasets and be able to grow/shrink the tree accordingly, we can go that way

20:33 < naywhayare> but if not, then perhaps substituting Kamel+Faloutsos's ideas is a reasonable replacement for one of the other types of trees

20:33 < naywhayare> at some point, I would eventually like to be able to work with dynamic datasets, but I am not completely sure how to do that best

20:34 < naywhayare> maybe a user wants to do something like this... they have some server that holds on to a NeighborSearch object which holds on to a tree of some sort

20:34 < naywhayare> users occasionally request something that causes NeighborSearch::Search() to be called (maybe for one query point? maybe many?) and results are processed and returned

20:34 < naywhayare> but at the same time, maybe the server occasionally adds points to the tree as new data becomes available

20:35 < naywhayare> I know those types of situations happen in the real world, but we don't have a good solution for anything like that at the moment

20:35 < andrewmw94> yeah. I know it could be useful, but the R* tree also implies that the packing algorithm is better if the tree is "nearly static"

20:35 < andrewmw94> I'm not sure how many insertions/deletions that is supposed to mean

20:36 < naywhayare> if they don't clarify what they mean by "nearly static", then it's anyone's guess...

20:37 < naywhayare> what tree would you want to replace to implement the packing algorithm, if you went that route?

20:38 < andrewmw94> I think the R* tree would make the most sense.

20:38 < andrewmw94> The X tree is basically an extension where you can decide to not split a node.

20:39 < naywhayare> okay; but don't you already have the R* mostly implemented?

20:40 < andrewmw94> yes, but it's mostly the same as the R tree. And I'm not sure about trying to finish it when the dynamic insertion/deletion stuff is still changing. I can try to describe my point ordering idea to you to see if you think it would work, but I don't see a way to have it work with dynamic insertion/deletion

20:43 < naywhayare> sure, go ahead and describe it

20:45 < andrewmw94> basically, once the tree is built, if we assume that it will no longer change, we should be able to do a quasi-pre-order traversal, keeping track of the point numbers. Then I think we could go over the whole thing once moving points around and changing the values of the indices. It should make the queries faster since the points would be stored contiguously with others in their node and nearby nodes should be contiguous

20:45 < andrewmw94> but if the tree changes, you have to do the whole thing again.
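
The relabeling pass described above might look roughly like this. It is a sketch under assumptions: the `Node` struct is a hypothetical toy tree (not the mlpack tree classes), each point is stored as its own `std::vector<double>` rather than as a matrix column, and the `oldFromNew` name is only an illustrative choice. It relies on `std::vector` of an incomplete type, which is permitted from C++17 on.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy tree node: point indices held by the node, then children.
struct Node
{
  std::vector<std::size_t> points;
  std::vector<Node> children;
};

// Pre-order traversal (node first, then children). Each point receives the
// next free index, so points in one node, and nodes visited consecutively,
// end up with consecutive indices. oldFromNew[i] is the original index of the
// point that now has index i.
void Relabel(Node& node, std::vector<std::size_t>& oldFromNew)
{
  for (std::size_t& p : node.points)
  {
    oldFromNew.push_back(p);
    p = oldFromNew.size() - 1;
  }

  for (Node& child : node.children)
    Relabel(child, oldFromNew);
}

// Build a copy of the dataset laid out in the new order.
std::vector<std::vector<double>> Reorder(
    const std::vector<std::vector<double>>& data,
    const std::vector<std::size_t>& oldFromNew)
{
  std::vector<std::vector<double>> result;
  result.reserve(oldFromNew.size());
  for (const std::size_t oldIndex : oldFromNew)
    result.push_back(data[oldIndex]);
  return result;
}
```

In this toy version each point and each node is visited once, and the mapping lets results be translated back to the indices the user originally supplied. As noted in the conversation, the whole pass has to be redone if the tree changes.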

20:46 < naywhayare> that is true

20:46 < naywhayare> that's a reasonable approach

20:46 < naywhayare> I do wonder if it could be done implicitly in the splitting process, like for the BinarySpaceTree

20:46 < naywhayare> but it's an idea that would work. I don't know how fast it would be

20:46 < naywhayare> I'm going to try to spend some time this afternoon and evening thinking about how to better support dynamic insertions/deletions

20:46 < naywhayare> the main problem being that we have to have some way to index new points

20:47 < andrewmw94> doing it at the end of tree construction would be O(n^2) I think

20:48 < naywhayare> I'd think it could be done in O(n log n) or O(n), but I'm not certain

20:48 < naywhayare> anyway, I have to go for now. if you want to put some thought into how to index points across multiple arma::mat objects, too, I'd appreciate it

20:49 < andrewmw94> ok

20:53 < jenkins-mlpack> Project mlpack - svn checkin test build #2002: SUCCESS in 1 hr 19 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2002/

20:53 < jenkins-mlpack> * Ryan Curtin: Document that #315 is fixed.

20:53 < jenkins-mlpack> * Ryan Curtin: Test HMM initial probabilities.

20:53 < jenkins-mlpack> * Ryan Curtin: Add support for HMM initial states. Slight modification of API for creating

20:53 < jenkins-mlpack> HMMs by hand -- the initial parameter is now required. This change may affect

20:53 < jenkins-mlpack> some existing results, and the new results may not perfectly agree with MATLAB,

20:53 < jenkins-mlpack> but MATLAB does not have the flexibility to seriously support initial

20:53 < jenkins-mlpack> probabilities. It is possible to set the initial parameters such that it will

20:53 < jenkins-mlpack> emulate MATLAB behavior right, probably by setting the initial probabilities to

20:53 < jenkins-mlpack> the first column of the transition matrix.

20:53 < jenkins-mlpack> * Ryan Curtin: clang complains when default parameters aren't part of the original declaration.

20:53 < jenkins-mlpack> Also vim removes trailing whitespaces, so this diff looks way longer and more

20:53 < jenkins-mlpack> complex than it actually is...

20:53 < jenkins-mlpack> * Ryan Curtin: Return the correct type of the matrix, because it isn't necessarily dense.

21:31 sumedh_ has joined #mlpack

21:34 sumedhghaisas has quit [Ping timeout: 240 seconds]

21:50 < sumedh_> naywhayare: its working for reverseStepCount 5... :)

21:51 < sumedh_> anyways... what a great match... 7-0 thrashing of Brazil :P

22:20 < sumedh_> 7-1 it is in fact

22:41 imi has quit [Remote host closed the connection]