naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
witness___ has quit [Quit: Connection closed for inactivity]
govg has quit [Ping timeout: 264 seconds]
witness___ has joined #mlpack
Anand has joined #mlpack
< Anand>
Marcus : Your thoughts on the weka logistic regression predicted labels?
Anand has quit [Ping timeout: 246 seconds]
Anand has joined #mlpack
govg has joined #mlpack
Anand has quit [Ping timeout: 246 seconds]
Anand_ has joined #mlpack
< marcus_zoq>
Anand_:Hello! Did you commit your changes?
< Anand_>
Marcus : Hi! Yes, to my branch
Anand_ has quit [Ping timeout: 246 seconds]
Anand has joined #mlpack
Anand_ has joined #mlpack
Anand has quit [Ping timeout: 246 seconds]
Anand_ has quit [Ping timeout: 246 seconds]
< jenkins-mlpack>
Starting build #2000 for job mlpack - svn checkin test (previous build: SUCCESS)
< naywhayare>
also, I'm debugging that kd-tree issue... but it may be a while until I have a solution
< naywhayare>
I set a breakpoint in Score() for the relevant kd-tree nodes using gdb... but it has been running for four hours and hasn't stopped yet...
< andrewmw94>
hmm, not fun
< andrewmw94>
I have a question about C++
< andrewmw94>
for the R* tree, the descent heuristic is different if you are descending to a leaf node or a non-leaf node.
< andrewmw94>
but for the R tree it's the same
< andrewmw94>
so I wanted to pass a boolean to the EvalNode() function so it works with R*-trees, but with R trees it just ignores the parameter
< andrewmw94>
which causes warnings because it is unused
< andrewmw94>
is there a nice way to solve that? I could add another template or something, but that seems silly
< naywhayare>
ah, just comment out the parameter or leave it unnamed
< naywhayare>
i.e. void Function(const double /* unused */)
< naywhayare>
or void Function(const double)
< naywhayare>
I prefer the first because it leaves some information on what the parameter would be, if it was used
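Here is a minimal sketch of the commented-out-parameter approach. It assumes two hypothetical descent policies (`RTreeDescent`, `RStarTreeDescent`) that share one `EvalNode()` signature. `Box`, `Union()`, `OverlapVolume()`, and the parameter names are illustrative only, not mlpack's RectangleTree code. The R*-style branch follows the R*-tree ChooseSubtree rule: least overlap enlargement when the candidates are leaves, least volume enlargement otherwise. Its tie-breaking steps are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D axis-aligned box; a stand-in for a real bound class.
struct Box
{
  double lo[2];
  double hi[2];
};

double Volume(const Box& b)
{
  assert(b.hi[0] >= b.lo[0] && b.hi[1] >= b.lo[1]);
  return (b.hi[0] - b.lo[0]) * (b.hi[1] - b.lo[1]);
}

// Smallest box that contains both a and b.
Box Union(const Box& a, const Box& b)
{
  Box u;
  for (int d = 0; d < 2; ++d)
  {
    u.lo[d] = std::min(a.lo[d], b.lo[d]);
    u.hi[d] = std::max(a.hi[d], b.hi[d]);
  }
  return u;
}

// Volume of the intersection of a and b (0 if they are disjoint).
double OverlapVolume(const Box& a, const Box& b)
{
  double v = 1.0;
  for (int d = 0; d < 2; ++d)
  {
    const double len = std::min(a.hi[d], b.hi[d]) - std::max(a.lo[d], b.lo[d]);
    if (len <= 0.0)
      return 0.0;
    v *= len;
  }
  return v;
}

// R-tree heuristic: volume enlargement only. The sibling list and the leaf
// flag are unused, so their names are commented out. This silences
// -Wunused-parameter but still documents what each argument is.
struct RTreeDescent
{
  static double EvalNode(const Box& node, const Box& point,
                         const std::vector<Box>& /* siblings */,
                         const bool /* nodeIsLeaf */)
  {
    return Volume(Union(node, point)) - Volume(node);
  }
};

// R*-tree heuristic: same signature, but the flag is used. When the
// candidate is a leaf, the cost is the increase in overlap with its
// siblings (the other candidates in the same parent, excluding node).
struct RStarTreeDescent
{
  static double EvalNode(const Box& node, const Box& point,
                         const std::vector<Box>& siblings,
                         const bool nodeIsLeaf)
  {
    if (!nodeIsLeaf)
      return Volume(Union(node, point)) - Volume(node);

    const Box grown = Union(node, point);
    double before = 0.0, after = 0.0;
    for (std::size_t i = 0; i < siblings.size(); ++i)
    {
      before += OverlapVolume(node, siblings[i]);
      after += OverlapVolume(grown, siblings[i]);
    }
    return after - before;
  }
};
```

Because both policies expose the same signature, a templated insertion routine can always pass the flag. The R-tree policy simply ignores it, and no extra template parameter is needed.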
< andrewmw94>
ah, thanks
< andrewmw94>
I was thinking "I could add another const variable to the class and then use that, and the compiler should optimize it out."
< andrewmw94>
but there must be a better way to do this
govg has joined #mlpack
< andrewmw94>
C++ is nice because it has features for almost everything
< sumedhghaisas>
naywhayare: According to the paper we should get a final RMSE of 0.87... again I am getting 1.3 :(
< naywhayare>
sumedhghaisas: for incremental SVD?
< sumedhghaisas>
yes...
< naywhayare>
have you tried tweaking the parameters?
< sumedhghaisas>
but the performance is better than SVDBatch definitely :)
andrewmw94 has quit [Quit: Leaving.]
andrewmw94 has joined #mlpack
< sumedhghaisas>
yes... I am trying other parameters now...
< sumedhghaisas>
did you look at the abstraction??
< naywhayare>
for IncompleteIncrementalTermination? it looks good to me; very simple
< sumedhghaisas>
yes... indeed... :)
< sumedhghaisas>
naywhayare: now I am always returning false in IsConverged... current RMSE is 0.90 and decreasing ... :)
< sumedhghaisas>
can you look at the paper right now??
< naywhayare>
I'm actually a bit busy right now, but if you have a question, tell me what it is and I will look into it shortly...
< sumedhghaisas>
okay no problem.. msg me when you are free...
< sumedhghaisas>
naywhayare: not going to watch the World Cup semifinal?? :)
< naywhayare>
does that make sense? the first part is the same -- terminate if the relative change in RMSE is below the tolerance
< naywhayare>
and the second part says, terminate if the RMSE jumped back up by increaseTolerance or more
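This sketch shows the two stopping rules described above as a hypothetical standalone class. It is not mlpack's IncompleteIncrementalTermination. It assumes `increaseTolerance` is an absolute jump in RMSE measured against the previous iteration. The conversation doesn't say whether the jump is absolute or relative, or whether it is compared to the previous or the best RMSE.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical termination policy (not an mlpack class).
class SketchRMSETermination
{
 public:
  SketchRMSETermination(const double tol, const double increaseTol) :
      tolerance(tol),
      increaseTolerance(increaseTol),
      lastRMSE(std::numeric_limits<double>::max())
  {
    assert(tol > 0.0 && increaseTol > 0.0);
  }

  // Call once per iteration with the current RMSE; returns true to stop.
  bool IsConverged(const double rmse)
  {
    const double previous = lastRMSE;
    lastRMSE = rmse;

    // First iteration: nothing to compare against yet.
    if (previous == std::numeric_limits<double>::max())
      return false;

    // Second rule: the RMSE jumped back up by increaseTolerance or more.
    if (rmse - previous >= increaseTolerance)
      return true;

    // First rule: relative change |previous - rmse| / previous < tolerance,
    // written without the division so previous == 0 cannot divide by zero.
    return std::fabs(previous - rmse) < tolerance * previous;
  }

 private:
  double tolerance;
  double increaseTolerance;
  double lastRMSE;
};
```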
< sumedhghaisas>
yeah... it's a good solution... I am thinking now, can we remove reverseStepCount??
< sumedhghaisas>
is it important??
< sumedhghaisas>
I guess yes... because the RMSE may keep on increasing....
< sumedhghaisas>
we need to detect that too...
< naywhayare>
well, hang on... why doesn't reverseStepCount allow the algorithm to converge to an RMSE of 0.87?
< naywhayare>
couldn't you just set it a little higher?
< jenkins-mlpack>
Starting build #2002 for job mlpack - svn checkin test (previous build: SUCCESS)
imi has joined #mlpack
< sumedhghaisas>
yes... but I guess increaseTolerance is a better idea... along with reverseStepCount it will perform better...
govg is now known as GOV|govg
< sumedhghaisas>
naywhayare: I can set reverseStepCount a little higher, but there can be many kinks along the way... if you look at the graph in the paper... there are many ups and downs....
GOV|govg is now known as zGz|govg
< naywhayare>
sumedhghaisas: that's true -- but how many iterations is each kink?
< sumedhghaisas>
meaning... I didn't get you...
< sumedhghaisas>
in the graph before convergence ... there are many many kinks...
< naywhayare>
right
< naywhayare>
but how long is the "up" part of these kinks? 10 iterations? 15 iterations? you should be able to set reverseStepCount to be just a little longer than the longest kink, and that should work
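Here is a sketch of the reverseStepCount idea. It assumes the parameter counts consecutive iterations in which the RMSE increased, and that any decrease resets the count. The class name `ReverseStepCounter` is hypothetical. Under that assumption, a kink whose "up" part lasts k iterations triggers termination only if reverseStepCount <= k. Setting it to one more than the longest kink therefore lets the run continue through every kink.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical counter for consecutive RMSE increases (not an mlpack class).
class ReverseStepCounter
{
 public:
  explicit ReverseStepCounter(const std::size_t steps) :
      reverseStepCount(steps), count(0), lastRMSE(0.0), first(true)
  {
    assert(steps > 0);
  }

  // Returns true once the RMSE has increased on reverseStepCount
  // consecutive iterations; any non-increase resets the counter.
  bool Update(const double rmse)
  {
    if (!first && rmse > lastRMSE)
      ++count;
    else
      count = 0;

    first = false;
    lastRMSE = rmse;
    return count >= reverseStepCount;
  }

 private:
  std::size_t reverseStepCount;
  std::size_t count;
  double lastRMSE;
  bool first;
};
```

With `ReverseStepCounter c(3)`, a kink of two rising iterations passes without stopping, and the third consecutive rise stops the run.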
< sumedhghaisas>
yes... but then this can change for different datasets... what should be the default value??
< naywhayare>
well, we can leave the default how it is; many parameters like this have to be tuned for different datasets
< sumedhghaisas>
yes... I will try for higher values of reverseStepCount...
< sumedhghaisas>
let's see if I can produce 0.87...
< andrewmw94>
naywhayare: I have a possibly detailed question. Is now a good time?
< naywhayare>
sure, go ahead
< andrewmw94>
ok, so in the paper for the R* tree, it mentions another paper
< andrewmw94>
so I'm wondering whether I should try to implement that instead, or whether I should change the R tree so that it better supports dynamic insertion/deletion of points
< naywhayare>
the problem we had with dynamic insertion/deletion is that when we have multiple arma::mat objects, it's not clear what to use as an index for a given point
< naywhayare>
one could make a TreeType::Insert() function that appended the given vector to the internally held matrix, but this still costs allocation time equivalent to the size of the full matrix
< andrewmw94>
I think I may have a solution to the point ordering thing. It's rather arbitrary, but I think it is consistent. However, I need to give it some more thought. But it also would not work well when adding/deleting points.
< naywhayare>
or you could hold many matrices internally, and also hold some kind of "index offset" with each matrix, but the question there is, how do we make it so the user can easily understand what the indices they get back from NeighborSearch even are?
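This is a sketch of the "index offset" idea. It assumes indices are assigned in insertion order, so existing points never change index when more data is appended. `ChunkedDataset` is hypothetical, and plain `std::vector<double>` blocks stand in for arma::mat to keep the sketch dependency-free. Appending a block copies only the new points, unlike appending to one big matrix. A lookup costs a binary search over the block offsets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: several column-major blocks, each tagged with the
// global index of its first point.
class ChunkedDataset
{
 public:
  explicit ChunkedDataset(const std::size_t dim) : dim(dim), total(0)
  {
    assert(dim > 0);
  }

  // Appends a block of points (dim * n values, column-major) without copying
  // existing blocks; returns the global index of the block's first point.
  std::size_t AddChunk(const std::vector<double>& values)
  {
    assert(values.size() % dim == 0);
    const std::size_t first = total;
    chunks.push_back(values);
    offsets.push_back(first);
    total += values.size() / dim;
    return first;
  }

  std::size_t NumPoints() const { return total; }

  // Returns coordinate d of the point with the given global index.
  double At(const std::size_t d, const std::size_t index) const
  {
    assert(d < dim && index < total);
    // Last block whose offset is <= index (binary search over offsets).
    const std::size_t c = (std::upper_bound(offsets.begin(), offsets.end(),
        index) - offsets.begin()) - 1;
    const std::size_t local = index - offsets[c];
    return chunks[c][local * dim + d];
  }

 private:
  std::size_t dim;
  std::size_t total;
  std::vector<std::vector<double> > chunks;
  std::vector<std::size_t> offsets;
};
```

Under this scheme, an index that NeighborSearch returns would simply mean "the i-th point ever added", which is one possible answer to the user-facing question. Deletions would need extra bookkeeping, such as tombstones, that this sketch doesn't attempt.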
< andrewmw94>
yeah. the dynamic insertions and deletions add a lot of extra overhead, and since they aren't used currently I'm dubious that it would be worthwhile
< naywhayare>
they aren't currently used, but maybe someone would find them useful if they were implemented
< naywhayare>
so, I could go either way on this one
< naywhayare>
if you are interested in trying to figure out how to easily support dynamically-sized datasets and be able to grow/shrink the tree accordingly, we can go that way
< naywhayare>
but if not, then perhaps substituting Kamel+Faloutsos's ideas is a reasonable replacement for one of the other types of trees
< naywhayare>
at some point, I would eventually like to be able to work with dynamic datasets, but I am not completely sure how to do that best
< naywhayare>
maybe a user wants to do something like this... they have some server that holds on to a NeighborSearch object which holds on to a tree of some sort
< naywhayare>
users occasionally request something that causes NeighborSearch::Search() to be called (maybe for one query point? maybe many?) and results are processed and returned
< naywhayare>
but at the same time, maybe the server occasionally adds points to the tree as new data becomes available
< naywhayare>
I know those types of situations happen in the real world, but we don't have a good solution for anything like that at the moment
< andrewmw94>
yeah. I know it could be useful, but the R* tree paper also implies that the packing algorithm is better if the tree is "nearly static"
< andrewmw94>
I'm not sure how many insertions/deletions that is supposed to mean
< naywhayare>
if they don't clarify what they mean by "nearly static", then it's anyone's guess...
< naywhayare>
what tree would you want to replace to implement the packing algorithm, if you went that route?
< andrewmw94>
I think the R* tree would make the most sense.
< andrewmw94>
The X tree is basically an extension where you can decide to not split a node.
< naywhayare>
okay; but don't you already have the R* mostly implemented?
< andrewmw94>
yes, but it's mostly the same as the R tree. And I'm not sure about trying to finish it when the dynamic insertion/deletion stuff is still changing. I can try to describe my point ordering idea to you to see if you think it would work, but I don't see a way to have it work with dynamic insertion/deletion
< naywhayare>
sure, go ahead and describe it
< andrewmw94>
basically, once the tree is built, if we assume that it will no longer change, we should be able to do a quasi-pre-order traversal, keeping track of the point numbers. Then I think we could go over the whole thing once, moving points around and changing the values of the indices. It should make the queries faster, since the points would be stored contiguously with the others in their node, and nearby nodes should be contiguous
< andrewmw94>
but if the tree changes, you have to do the whole thing again.
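Here is a minimal sketch of that reordering pass. It assumes points live only in leaves and that each leaf stores the indices of its points. `Node`, `CollectPreOrder()`, and `ReorderPoints()` are hypothetical names, and the data is a column-major `std::vector<double>` rather than an arma::mat. A depth-first pre-order walk places every subtree's points in one contiguous range. The returned old-from-new mapping lets query results be translated back to the caller's original indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal tree node: leaves hold point indices, internal nodes
// hold children.
struct Node
{
  std::vector<std::size_t> points;
  std::vector<Node*> children;
};

// Pre-order walk that gives each point a new, contiguous position and
// rewrites the index stored in the node. oldFromNew[newIndex] = oldIndex.
void CollectPreOrder(Node* node, std::vector<std::size_t>& oldFromNew)
{
  for (std::size_t i = 0; i < node->points.size(); ++i)
  {
    oldFromNew.push_back(node->points[i]);
    node->points[i] = oldFromNew.size() - 1;
  }
  for (std::size_t i = 0; i < node->children.size(); ++i)
    CollectPreOrder(node->children[i], oldFromNew);
}

// Permutes dim-dimensional column-major data to match the new indices and
// returns the old-from-new mapping.
std::vector<std::size_t> ReorderPoints(Node* root, std::vector<double>& data,
                                       const std::size_t dim)
{
  std::vector<std::size_t> oldFromNew;
  CollectPreOrder(root, oldFromNew);
  assert(oldFromNew.size() * dim == data.size());

  std::vector<double> reordered(data.size());
  for (std::size_t n = 0; n < oldFromNew.size(); ++n)
    for (std::size_t d = 0; d < dim; ++d)
      reordered[n * dim + d] = data[oldFromNew[n] * dim + d];

  data.swap(reordered);
  return oldFromNew;
}
```

Under these assumptions, each node and point is visited once and each value is copied once, so the pass is linear in the number of points (times the dimension). As noted, it would have to be rerun after any insertion or deletion.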
< naywhayare>
that is true
< naywhayare>
that's a reasonable approach
< naywhayare>
I do wonder if it could be done implicitly in the splitting process, like for the BinarySpaceTree
< naywhayare>
but it's an idea that would work. I don't know how fast it would be
< naywhayare>
I'm going to try to spend some time this afternoon and evening thinking about how to better support dynamic insertions/deletions
< naywhayare>
the main problem being that we have to have some way to index new points
< andrewmw94>
doing it at the end of tree construction would be O(n^2) I think
< naywhayare>
I'd think it could be done in O(n log n) or O(n), but I'm not certain
< naywhayare>
anyway, I have to go for now. if you want to put some thought into how to index points across multiple arma::mat objects, too, I'd appreciate it