naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
witness___ has quit [Quit: Connection closed for inactivity]
govg has quit [Ping timeout: 264 seconds]
witness___ has joined #mlpack
Anand has joined #mlpack
< Anand> Marcus : Your thoughts on the Weka logistic regression predicted labels?
Anand has quit [Ping timeout: 246 seconds]
Anand has joined #mlpack
govg has joined #mlpack
Anand has quit [Ping timeout: 246 seconds]
Anand_ has joined #mlpack
< marcus_zoq> Anand_: Hello! Did you commit your changes?
< Anand_> Marcus : Hi! Yes, to my branch
Anand_ has quit [Ping timeout: 246 seconds]
Anand has joined #mlpack
Anand_ has joined #mlpack
Anand has quit [Ping timeout: 246 seconds]
Anand_ has quit [Ping timeout: 246 seconds]
< jenkins-mlpack> Starting build #2000 for job mlpack - svn checkin test (previous build: SUCCESS)
< jenkins-mlpack> Project mlpack - svn checkin test build #2000: SUCCESS in 1 hr 20 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2000/
< jenkins-mlpack> * Ryan Curtin: Rename for slightly changed API.
< jenkins-mlpack> * Ryan Curtin: Trivial spacing fixes.
< jenkins-mlpack> * Ryan Curtin: Minor refactoring of AMF class; mostly renaming for consistency and
< jenkins-mlpack> clarification of comments.
< jenkins-mlpack> Starting build #2001 for job mlpack - svn checkin test (previous build: SUCCESS)
< jenkins-mlpack> Project mlpack - svn checkin test build #2001: SUCCESS in 1 hr 20 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2001/
< jenkins-mlpack> * Ryan Curtin: Patch from Zhihao: sa_update.diff.
< jenkins-mlpack> * Ryan Curtin: Disambiguate: math::RandomSeed() -> mlpack::math::RandomSeed(). Issue noted by
< jenkins-mlpack> Zhihao.
< jenkins-mlpack> * Ryan Curtin: Don't include <armadillo> before <mlpack/core.hpp>.
< jenkins-mlpack> * Ryan Curtin: Note that we now have simulated annealing.
< jenkins-mlpack> * Ryan Curtin: Add new contributor.
< jenkins-mlpack> * Ryan Curtin: Don't include <armadillo> explicitly, because <mlpack/core.hpp> does that
< jenkins-mlpack> already.
< jenkins-mlpack> * Ryan Curtin: Move warning to prereqs.hpp, because sometimes prereqs.hpp is included and
< jenkins-mlpack> core.hpp is not.
andrewmw94 has joined #mlpack
Anand has joined #mlpack
< marcus_zoq> Anand: Hello, so if you finished the logistic regression, the plan is to continue with the linear regression method?
< Anand> Yes, I also added metrics for linear regression for scikit today.
< Anand> Will complete it by tomorrow or the day after
< Anand> And then I will take HMM, I guess
< marcus_zoq> Anand: Sounds good :)
< marcus_zoq> Anand: Maybe at the end of the week you can merge your branch with the master branch?
< Anand> Yes sure! Did you have a look at the bug fix? I am not sure if that is the bug that caused the build failure, but probably it did
< marcus_zoq> Anand: It's the same error.
< Anand> You mean you are still getting the error?
< marcus_zoq> Anand: Yeah, vec[int(Vec[i])-1]=1 -> IndexError: list assignment index out of range
< Anand> You merged with my branch?
< marcus_zoq> Anand: I'm using your branch, so I'm testing with the fix.
< Anand> Ok. I will see what is going wrong.
< Anand> Doesn't seem to be a good thing
< marcus_zoq> Anand: You can install the unittest-xml-reporting package and then run the tests with 'make checks'.
< Anand> Ok, I will
< marcus_zoq> Anand: You can edit the tests.py file to run only a single test.
< Anand> Ok, yeah I got it. I will need to edit the modules
Anand has quit [Ping timeout: 246 seconds]
govg has quit [Quit: leaving]
govg has joined #mlpack
govg has quit [Quit: leaving]
sumedhghaisas has joined #mlpack
< naywhayare> andrewmw94: a fix for clang -- http://www.mlpack.org/trac/changeset/16788
< naywhayare> also, I'm debugging that kd-tree issue... but it may be a while until I have a solution
< naywhayare> I set a breakpoint in Score() for the relevant kd-tree nodes using gdb... but it has been running for four hours and hasn't stopped yet...
< andrewmw94> hmm, not fun
< andrewmw94> I have a question about C++
< andrewmw94> for the R* tree, the descent heuristic is different if you are descending to a leaf node or a non-leaf node.
< andrewmw94> but for the R tree it's the same
< andrewmw94> so I wanted to pass a boolean to the EvalNode() function so it works with R*-trees, but with R trees it just ignores the parameter
< andrewmw94> which causes warnings because it is unused
< andrewmw94> is there a nice way to solve that? I could add another template or something, but that seems silly
< naywhayare> ah, just comment out the parameter or leave it unnamed
< naywhayare> i.e. void Function(const double /* unused */)
< naywhayare> or void Function(const double)
< naywhayare> I prefer the first because it leaves some information on what the parameter would be, if it was used
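A minimal sketch of the unnamed-parameter idiom naywhayare suggests; the function name, signature, and decision rule below are placeholders for illustration, not the actual rectangle tree code:

    #include <cstddef>

    // Hypothetical descent heuristic: the plain R tree ignores the flag that
    // the R* tree would use, so the parameter is left unnamed.  The
    // commented-out name documents what the argument means without
    // triggering an "unused parameter" warning.
    inline size_t EvalNode(const double score, const bool /* toLeafNode */)
    {
      // Only 'score' matters here; the decision rule is a placeholder.
      return (score > 0.0) ? 1 : 0;
    }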
< andrewmw94> ah, thanks
< andrewmw94> I was thinking "I could add another const variable to the class and then use that and the compiler should optimize it out.
< andrewmw94> but there must be a better way to do this"
govg has joined #mlpack
< andrewmw94> " C++ is nice because it has features for almost everything
< sumedhghaisas> naywhayare: According to the paper we should get a final RMSE of 0.87 ... again I am getting 1.3 :(
< naywhayare> sumedhghaisas: for incremental SVD?
< sumedhghaisas> yes...
< naywhayare> have you tried tweaking the parameters?
< sumedhghaisas> but the performance is better than SVDBatch definitely :)
andrewmw94 has quit [Quit: Leaving.]
andrewmw94 has joined #mlpack
< sumedhghaisas> yes... I am trying other parameters now...
< sumedhghaisas> did you look at the abstraction??
< naywhayare> for IncompleteIncrementalTermination? it looks good to me; very simple
< sumedhghaisas> yes... indeed... :)
< sumedhghaisas> naywhayare: now I am always returning false in IsConverged... current RMSE is 0.90 and decreasing ... :)
< sumedhghaisas> can you look at the paper right now??
< naywhayare> I'm actually a bit busy right now, but if you have a question, tell me what it is and I will look into it shortly...
< sumedhghaisas> okay no problem.. msg me when you are free...
< sumedhghaisas> naywhayare: not going to watch world cup semifinal ?? :)
< naywhayare> no time, too much to do :(
< sumedhghaisas> ohh okay :(
< sumedhghaisas> maybe tomorrow??
< sumedhghaisas> naywhayare: yieeepy... 0.87 now... :)
< naywhayare> probably no world cup for me, unfortunately. what is the algorithmic question you had? I can look into it now
< sumedhghaisas> naywhayare: okay... can you look at figure 2??
< sumedhghaisas> if you see the curve it's not smooth... there are a lot of ups and downs in between...
< sumedhghaisas> that's why our algorithm is terminating way too early...
< sumedhghaisas> ohh, forgot to tell you: look at SVDUSER...
< sumedhghaisas> that's the one I have implemented right now...
< naywhayare> I see what you mean
< naywhayare> it seems like you could avoid that by maybe increasing the tolerance in the ValidationRMSETermination to something like 0.002
< naywhayare> not 1e-5
< naywhayare> have you tried that?
< sumedhghaisas> umm.. but what if the RMSE increases...
< naywhayare> yeah, it can increase by up to 0.002 each iteration
< naywhayare> hm, or do you mean if the RMSE increases for multiple iterations?
< sumedhghaisas> ohh okay... then I can just add abs(...) to IsConverged...
< sumedhghaisas> right now it's (oldRMSE - RMSE) / oldRMSE > tolerance....
< naywhayare> oh, ok, then maybe you can add a second parameter for the tolerance for how much it is allowed to increase
< naywhayare> terminate if ((oldRMSE - RMSE) / oldRMSE < tolerance) || ((RMSE - oldRMSE) > increaseTolerance)
< naywhayare> does that make sense? the first part is the same -- terminate if the relative change in RMSE is below the tolerance
< naywhayare> and the second part says, terminate if the RMSE jumped back up by increaseTolerance or more
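A rough sketch of the combined criterion being described, under the assumption that the termination policy tracks the previous RMSE itself; the class and member names are invented for illustration and are not the actual ValidationRMSETermination API:

    #include <limits>

    // Illustrative termination check: stop when the relative RMSE
    // improvement falls below 'tolerance', or when the RMSE jumps back up
    // by more than 'increaseTolerance'.
    class SimpleRMSETermination
    {
     public:
      SimpleRMSETermination(const double tolerance = 1e-5,
                            const double increaseTolerance = 0.002) :
          tolerance(tolerance),
          increaseTolerance(increaseTolerance),
          oldRMSE(std::numeric_limits<double>::max()) { }

      bool IsConverged(const double rmse)
      {
        const bool smallImprovement =
            ((oldRMSE - rmse) / oldRMSE) < tolerance;
        const bool jumpedBackUp = (rmse - oldRMSE) > increaseTolerance;
        oldRMSE = rmse;
        return smallImprovement || jumpedBackUp;
      }

     private:
      double tolerance;
      double increaseTolerance;
      double oldRMSE;
    };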
< sumedhghaisas> yeah... it's a good solution... I am thinking now, can we remove reverseStepCount??
< sumedhghaisas> is it important??
< sumedhghaisas> I guess yes... cause the RMSE may keep on increasing....
< sumedhghaisas> we need to detect that too...
< naywhayare> well, hang on... why doesn't reverseStepCount allow the algorithm to converge to an RMSE of 0.87?
< naywhayare> couldn't you just set it a little higher?
< jenkins-mlpack> Starting build #2002 for job mlpack - svn checkin test (previous build: SUCCESS)
imi has joined #mlpack
< sumedhghaisas> yes... but I guess increaseTolerance is a better idea ... along with reverseStepCount it will perform better...
govg is now known as GOV|govg
< sumedhghaisas> naywhayare: I can set reverseStepCount a little higher but there can be many kinks in the way... if you see the graph in the paper... there are many ups and downs....
GOV|govg is now known as zGz|govg
< naywhayare> sumedhghaisas: that's true -- but how many iterations is each kink?
< sumedhghaisas> means... I didn't get you...
< sumedhghaisas> in the graph before convergence ... there are many many kinks...
< naywhayare> right
< naywhayare> but how long is the "up" part of these kinks? 10 iterations? 15 iterations? you should be able to set reverseStepCount to be just a little longer than the longest kink, and that should work
< sumedhghaisas> yes... but then this can change for different datasets... what should be the default value??
< naywhayare> well, we can leave the default how it is; many parameters like this have to be tuned for different datasets
< sumedhghaisas> yes... I will try for higher values of reverseStepCount...
< sumedhghaisas> let's see if I can reproduce 0.87...
< andrewmw94> naywhayare: I have a possibly detailed question. Is now a good time?
< naywhayare> sure, go ahead
< andrewmw94> ok, so in the paper for the R* tree, it mentions another paper
< andrewmw94> which it says makes more sense for static datasets
< andrewmw94> I quickly did some more searching (eg. http://repository.cmu.edu/cgi/viewcontent.cgi?article=1586&context=compsci) and it sounds like the algorithm outlined in the paper is a lot better than R* trees or X trees if we have static datasets
< andrewmw94> so I'm wondering whether I should try to implement that instead, or whether I should change the R tree so that it better supports dynamic insertion/deletion of points
< naywhayare> the problem we had with dynamic insertion/deletion is that when we have multiple arma::mat objects, it's not clear what to use as an index for a given point
< naywhayare> one could make a TreeType::Insert() function that appended the given vector to the internally held matrix, but this still costs allocation time equivalent to the size of the full matrix
< andrewmw94> I think I may have a solution to the point ordering thing. It's rather arbitrary, but I think it is consistent. However, I need to give it some more thought. But it also would not work well when adding/deleting points.
< naywhayare> or you could hold many matrices internally, and also hold some kind of "index offset" with each matrix, but the question there is, how do we make it so the user can easily understand what the indices they get back from NeighborSearch even are?
< andrewmw94> yeah. the dynamic insertions and deletions add a lot of extra overhead, and since they aren't used currently I'm dubious that it would be worthwhile
< naywhayare> they aren't currently used, but maybe someone would find them useful if they were implemented
< naywhayare> so, I could go either way on this one
< naywhayare> if you are interested in trying to figure out how to easily support dynamically-sized datasets and be able to grow/shrink the tree accordingly, we can go that way
< naywhayare> but if not, then perhaps substituting Kamel+Faloutsos's ideas is a reasonable replacement for one of the other types of trees
< naywhayare> at some point, I would eventually like to be able to work with dynamic datasets, but I am not completely sure how to do that best
< naywhayare> maybe a user wants to do something like this... they have some server that holds on to a NeighborSearch object which holds on to a tree of some sort
< naywhayare> users occasionally request something that causes NeighborSearch::Search() to be called (maybe for one query point? maybe many?) and results are processed and returned
< naywhayare> but at the same time, maybe the server occasionally adds points to the tree as new data becomes available
< naywhayare> I know those types of situations happen in the real world, but we don't have a good solution for anything like that at the moment
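A toy sketch of the "index offset" bookkeeping mentioned above, just to make the mapping concrete; none of these types exist in mlpack, and the real question of which indices NeighborSearch should report back is left open:

    #include <armadillo>
    #include <cstddef>
    #include <utility>
    #include <vector>

    // Points live in several matrices; each matrix is stored with the
    // global index of its first column, so a global point index can be
    // mapped back to (matrix, column).
    struct PointSet
    {
      std::vector<arma::mat> matrices;
      std::vector<size_t> offsets; // offsets[i]: global index of matrices[i]'s first column.

      // Append a new batch of points; returns the global index of its first point.
      size_t Add(arma::mat&& batch)
      {
        const size_t offset = offsets.empty() ? 0 :
            offsets.back() + matrices.back().n_cols;
        offsets.push_back(offset);
        matrices.push_back(std::move(batch));
        return offset;
      }

      // Map a valid global index back to the column holding that point.
      arma::vec Point(const size_t globalIndex) const
      {
        size_t m = matrices.size() - 1;
        while (offsets[m] > globalIndex)
          --m;
        return matrices[m].col(globalIndex - offsets[m]);
      }
    };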
< andrewmw94> yeah. I know it could be useful, but the R* tree paper also implies that the packing algorithm is better if the tree is "nearly static"
< andrewmw94> I'm not sure how many insertions/deletions that is supposed to mean
< naywhayare> if they don't clarify what they mean by "nearly static", then it's anyone's guess...
< naywhayare> what tree would you want to replace to implement the packing algorithm, if you went that route?
< andrewmw94> I think the R* tree would make the most sense.
< andrewmw94> The X tree is basically an extension where you can decide to not split a node.
< naywhayare> okay; but don't you already have the R* mostly implemented?
< andrewmw94> yes, but it's mostly the same as the R tree. And I'm not sure about trying to finish it when the dynamic insertion/deletion stuff is still changing. I can try to describe my point ordering idea to you to see if you think it would work, but I don't see a way to have it work with dynamic insertion/deletion
< naywhayare> sure, go ahead and describe it
< andrewmw94> basically, once the tree is built, if we assume that it will no longer change, we should be able to do a quasi-pre-order traversal, keeping track of the point numbers. Then I think we could go over the whole thing once, moving points around and changing the values of the indices. It should make the queries faster since the points would be stored contiguously with others in their node, and nearby nodes should be contiguous
< andrewmw94> but if the tree changes, you have to do the whole thing again.
< naywhayare> that is true
< naywhayare> that's a reasonable approach
< naywhayare> I do wonder if it could be done implicitly in the splitting process, like for the BinarySpaceTree
< naywhayare> but it's an idea that would work. I don't know how fast it would be
< naywhayare> I'm going to try to spend some time this afternoon and evening thinking about how to better support dynamic insertions/deletions
< naywhayare> the main problem being that we have to have some way to index new points
< andrewmw94> doing it at the end of tree construction would be O(n^2) I think
< naywhayare> I'd think it could be done in O(n log n) or O(n), but I'm not certain
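A toy sketch of the traversal-then-permute idea, assuming a stand-in node interface (NumChildren()/Child()/NumPoints()/Point()) rather than the real rectangle tree API; once the pre-order point order is known, the copy pass is a single O(n) sweep:

    #include <armadillo>
    #include <cstddef>
    #include <vector>

    // Collect the old point indices in the order a pre-order traversal of
    // the finished tree encounters them.
    template<typename TreeType>
    void CollectPreOrder(const TreeType& node, std::vector<size_t>& order)
    {
      if (node.NumChildren() == 0)
      {
        for (size_t i = 0; i < node.NumPoints(); ++i)
          order.push_back(node.Point(i));
      }
      else
      {
        for (size_t i = 0; i < node.NumChildren(); ++i)
          CollectPreOrder(node.Child(i), order);
      }
    }

    // Build the permuted dataset; oldFromNew[newIndex] = oldIndex, which is
    // also what would be needed to rewrite the indices stored in the nodes
    // and to translate query results back to the user's original indices.
    template<typename TreeType>
    arma::mat ReorderDataset(const TreeType& root,
                             const arma::mat& oldData,
                             std::vector<size_t>& oldFromNew)
    {
      oldFromNew.clear();
      CollectPreOrder(root, oldFromNew);

      arma::mat newData(oldData.n_rows, oldData.n_cols);
      for (size_t newIndex = 0; newIndex < oldFromNew.size(); ++newIndex)
        newData.col(newIndex) = oldData.col(oldFromNew[newIndex]);
      return newData;
    }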
< naywhayare> anyway, I have to go for now. if you want to put some thought into how to index points across multiple arma::mat objects, too, I'd appreciate it
< andrewmw94> ok
< jenkins-mlpack> Project mlpack - svn checkin test build #2002: SUCCESS in 1 hr 19 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2002/
< jenkins-mlpack> * Ryan Curtin: Document that #315 is fixed.
< jenkins-mlpack> * Ryan Curtin: Test HMM initial probabilities.
< jenkins-mlpack> * Ryan Curtin: Add support for HMM initial states. Slight modification of API for creating
< jenkins-mlpack> HMMs by hand -- the initial parameter is now required. This change may affect
< jenkins-mlpack> some existing results, and the new results may not perfectly agree with MATLAB,
< jenkins-mlpack> but MATLAB does not have the flexibility to seriously support initial
< jenkins-mlpack> probabilities. It is possible to set the initial parameters such that it will
< jenkins-mlpack> emulate MATLAB behavior right, probably by setting the initial probabilities to
< jenkins-mlpack> the first column of the transition matrix.
< jenkins-mlpack> * Ryan Curtin: clang complains when default parameters aren't part of the original declaration.
< jenkins-mlpack> Also vim removes trailing whitespaces, so this diff looks way longer and more
< jenkins-mlpack> complex than it actually is...
< jenkins-mlpack> * Ryan Curtin: Return the correct type of the matrix, because it isn't necessarily dense.
sumedh_ has joined #mlpack
sumedhghaisas has quit [Ping timeout: 240 seconds]
< sumedh_> naywhayare: it's working for reverseStepCount 5... :)
< sumedh_> anyways... what a great match... 7-0 thrashing of Brazil :P
< sumedh_> 7-1 it is in fact
imi has quit [Remote host closed the connection]