verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
lozhnikov has joined #mlpack
zoq has quit [Remote host closed the connection]
zoq has joined #mlpack
Mathnerd314 has quit [Ping timeout: 244 seconds]
mentekid has joined #mlpack
nilay has joined #mlpack
mentekid has quit [Ping timeout: 272 seconds]
mentekid has joined #mlpack
marcosirc has joined #mlpack
< rcurtin>
marcosirc: I have been thinking about the ticket you opened, I hope to have a comprehensive response soon
< rcurtin>
I still need to think through a few things, but I think that you are right that the 2(\lambda(N_q) - \lambda(N_c)) can subtract too much, but I am not sure the proposed alternative is right
< rcurtin>
but it is possible that as I think about it more I will come to a completely different conclusion :)
< marcosirc>
Haha ok! Thanks for your feedback!
< mentekid>
rcurtin: I will start with multiprobe on the upstream/master version of lsh_search_impl.hpp, meaning unique won't be part of my code when I submit it
< marcosirc>
If you agree, I could modify the code according to my proposal, and make some tests...
< mentekid>
since we haven't reached a conclusion about which version we should keep
< mentekid>
and when we decide I'll merge those changes, what do you think?
mentekid has quit [Ping timeout: 244 seconds]
nilay_ has joined #mlpack
< nilay_>
zoq: can you suggest how to implement copyMakeBorder and resize functions of opencv? The code for these looks bulky.
< zoq>
nilay_: Tham has written a basic bilinear interpolation function that we could use: http://pastebin.com/tjRzmtYr for the resize function. The interpolation strategy shouldn't really matter in our case.
< zoq>
nilay_: I think we could also use the DownwardReSampling function from the GlimpseLayer class; we could test which function is faster. However, in that case, we would have to modify DownwardReSampling.
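A minimal sketch of the bilinear resize being discussed, assuming a row-major grayscale buffer; this is not Tham's pastebin code or mlpack's DownwardReSampling, just the standard interpolation scheme with assumed names:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical standalone bilinear resize over a row-major (rows x cols)
// grayscale image.  Each output pixel is a weighted average of the four
// surrounding input pixels.
std::vector<double> BilinearResize(const std::vector<double>& in,
                                   std::size_t rows, std::size_t cols,
                                   std::size_t newRows, std::size_t newCols)
{
  std::vector<double> out(newRows * newCols);
  const double rScale = (newRows > 1) ? double(rows - 1) / (newRows - 1) : 0.0;
  const double cScale = (newCols > 1) ? double(cols - 1) / (newCols - 1) : 0.0;
  for (std::size_t r = 0; r < newRows; ++r)
  {
    for (std::size_t c = 0; c < newCols; ++c)
    {
      const double y = r * rScale, x = c * cScale;
      const std::size_t y0 = (std::size_t) std::floor(y);
      const std::size_t x0 = (std::size_t) std::floor(x);
      const std::size_t y1 = std::min(y0 + 1, rows - 1);
      const std::size_t x1 = std::min(x0 + 1, cols - 1);
      const double dy = y - y0, dx = x - x0;
      // Blend the four neighbors by their fractional distances.
      out[r * newCols + c] =
          in[y0 * cols + x0] * (1 - dy) * (1 - dx) +
          in[y0 * cols + x1] * (1 - dy) * dx +
          in[y1 * cols + x0] * dy * (1 - dx) +
          in[y1 * cols + x1] * dy * dx;
    }
  }
  return out;
}
```

As zoq notes, the interpolation strategy should not really matter for this use case, so any consistent scheme along these lines would do.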
nilay_ has quit [Ping timeout: 250 seconds]
sumedhghaisas has joined #mlpack
tsathoggua has joined #mlpack
tsathoggua has quit [Client Quit]
Mathnerd314 has joined #mlpack
nilay_ has joined #mlpack
< rcurtin>
marcosirc: you can try implementing it if you like, but even the version we have now is bug-free because there are no trees we implement that can cause the prune to be too tight, I think
< rcurtin>
so even if you did make a new version, I don't know if it would show a bug even if there was one
< rcurtin>
mentekid: I think, based on the data we had, that the unique() approach was just about always fastest, with only a few cases where find() was faster
< rcurtin>
so I think I'll leave this up to you: if you want to keep the code simple, we can use unique()
< rcurtin>
if you don't mind a little extra complexity (and documenting why the complexity is there), then we can use the hybrid approach probably with cutoff between 0.01 and 0.1
< rcurtin>
it seemed like it would not make a huge difference whatever was chosen there
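A sketch of the shape of the hybrid approach discussed above; the function name, the exact dedup strategies, and the cutoff value are assumptions for illustration (the cutoff is picked from the 0.01-0.1 range mentioned):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical hybrid duplicate removal for candidate neighbor indices:
// when candidates are a large fraction of the reference set, a single
// sort + unique() pass is cheap; when they are a small fraction, checking
// each insertion with find() avoids sorting the whole set.
std::vector<std::size_t> DeduplicateCandidates(
    std::vector<std::size_t> candidates,
    std::size_t referenceSetSize,
    double cutoff = 0.05)  // illustrative value inside the 0.01-0.1 range
{
  const double fraction = double(candidates.size()) / referenceSetSize;
  if (fraction > cutoff)
  {
    // unique()-style pass: sort, then drop adjacent duplicates.
    std::sort(candidates.begin(), candidates.end());
    candidates.erase(std::unique(candidates.begin(), candidates.end()),
                     candidates.end());
    return candidates;
  }

  // find()-style pass: only keep a candidate we have not seen yet.
  std::vector<std::size_t> result;
  for (const std::size_t c : candidates)
    if (std::find(result.begin(), result.end(), c) == result.end())
      result.push_back(c);
  return result;
}
```

The extra branch is the "little extra complexity" rcurtin mentions; keeping only the unique()-style pass would be the simpler alternative.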
< marcosirc>
Ok, yes, I agree that it is hard to find an example where we would see a difference.
nilay_ has quit [Ping timeout: 250 seconds]
< rcurtin>
marcosirc: one way to do it might be to create a "random" treetype for the sake of testing, where points are chosen randomly in such a way that satisfies the definition of space tree
< rcurtin>
I think there is a ticket open for this but I've certainly never gotten around to it :)
nilay_ has joined #mlpack
< nilay_>
zoq: wouldn't inputPadded.col(padSize - i - 1) = input.col(i); give an inconsistent dimension error?
< marcosirc>
This sounds interesting! I will look for that ticket.
< rcurtin>
but definitely don't feel obligated to do it unless you want to! I'm still undecided on whether or not it would be something that's really helpful
< marcosirc>
Ok. I will take a look.
< marcosirc>
Thanks
travis-ci has joined #mlpack
< travis-ci>
mlpack/mlpack#825 (master - f55427d : Ryan Curtin): The build passed.
< zoq>
nilay: ah, you are right, better to use inputPadded.col(padSize - i - 1) = inputPadded.col(i); or submat(...)
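The indexing idea behind the copyMakeBorder-style reflected padding above can be sketched without Armadillo; this is a generic stand-in (names and the row-major layout are assumptions, not mlpack code), reflecting only the column axis and assuming padSize <= cols:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reflected padding on the column axis of a row-major
// (rows x cols) image.  Padded column c maps to source column c - padSize,
// with out-of-range indices reflected back into [0, cols): column -1 maps
// to 0, -2 to 1, cols to cols - 1, and so on (edge-inclusive reflection).
std::vector<double> PadReflect(const std::vector<double>& in,
                               std::size_t rows, std::size_t cols,
                               std::size_t padSize)  // requires padSize <= cols
{
  const std::size_t newCols = cols + 2 * padSize;
  std::vector<double> out(rows * newCols);
  for (std::size_t r = 0; r < rows; ++r)
  {
    for (std::size_t c = 0; c < newCols; ++c)
    {
      // Signed source column, possibly outside [0, cols).
      long src = (long) c - (long) padSize;
      if (src < 0) src = -src - 1;                      // reflect left edge
      if (src >= (long) cols) src = 2 * (long) cols - src - 1;  // right edge
      out[r * newCols + c] = in[r * cols + (std::size_t) src];
    }
  }
  return out;
}
```

With this convention, padded column padSize - i - 1 receives input column i, which is the assignment discussed in the exchange above.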
< lozhnikov>
rcurtin: Why does the DescentType::ChooseDescentNode() method depend on const arma::vec& point? All points belong to the dataset, since RectangleTree::InsertPoint(const size_t point) takes the index of the point. Why don't we use the index of the point in the dataset instead? (I want to avoid repeated calculation of the Hilbert value in the case of the discrete approach.)
sumedhghaisas has quit [Ping timeout: 244 seconds]
< rcurtin>
lozhnikov: originally the idea was that the RectangleTree would have Insert() and Delete() methods for adding and removing points
< rcurtin>
since that is what the R-trees are made for
< rcurtin>
so if that support was available, each node in the tree would have to hold its own (small) dataset
< lozhnikov>
should these points be added to the global dataset?
< rcurtin>
(since join_cols(), which is what we would use for adding points, takes a long time with big datasets but doesn't take a long time with small datasets)
< rcurtin>
anyway that support is not currently present in the RectangleTree and you should not worry about it unless you want to, I am just trying to point out why it is like that
< rcurtin>
but, as I think about it more, I think we could modify DescentType::ChooseDescentNode() accordingly
< rcurtin>
like you could have DescentType hold a reference to the tree node (so it can get the tree's dataset)
< rcurtin>
and have ChooseDescentNode() take the index of the point (you would then use the reference to the tree to get the point from the dataset)
< rcurtin>
I guess that for the Hilbert tree, you could have DescentType calculate the Hilbert values of each of the points in the constructor
< rcurtin>
is that what you were thinking?
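The index-based interface rcurtin describes might look like the following sketch; everything here is hypothetical (the class name, the child-selection rule, and especially HilbertValue(), which is a placeholder stub, not a real Hilbert-curve encoding):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical descent policy that precomputes one "Hilbert value" per
// dataset point in its constructor, so ChooseDescentNode() can take a point
// *index* and never recompute the value while descending the tree.
class HilbertDescent
{
 public:
  // dataset: points flattened as consecutive dim-sized chunks.
  HilbertDescent(const std::vector<double>& dataset, std::size_t dim)
  {
    for (std::size_t i = 0; i * dim < dataset.size(); ++i)
      values.push_back(HilbertValue(&dataset[i * dim], dim));
  }

  // Pick the first child whose cached maximum value covers the point's value.
  std::size_t ChooseDescentNode(const std::vector<double>& childMaxValues,
                                std::size_t pointIndex) const
  {
    for (std::size_t c = 0; c < childMaxValues.size(); ++c)
      if (values[pointIndex] <= childMaxValues[c])
        return c;
    return childMaxValues.size() - 1;  // fall back to the last child
  }

 private:
  // Placeholder "Hilbert value": just a coordinate sum, for illustration.
  static double HilbertValue(const double* point, std::size_t dim)
  {
    double v = 0.0;
    for (std::size_t d = 0; d < dim; ++d)
      v += point[d];
    return v;
  }

  std::vector<double> values;  // one cached value per dataset point
};
```

As the following messages note, this caching scheme only covers points already in the dataset; newly inserted points would still need their values computed somewhere.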
< lozhnikov>
Yes, you're right. But this would not work for new points which are not present in the dataset.
< rcurtin>
I agree, but perhaps we can consider that some other time
< rcurtin>
one possible workaround would be like this:
< rcurtin>
each RectangleTree node holds its own local dataset
< rcurtin>
when I add a point, I add it to the local dataset with join_cols(), then call ChooseDescentNode() with the new index
< rcurtin>
or... hmm... I am not sure if that would work
< lozhnikov>
There is a problem: this function is recursive, so the node will change.
< lozhnikov>
And I have another question. I need to modify CondenseTree() and InsertPoint(), since the Hilbert tree requires adding new points according to their Hilbert values. And I need to adjust the largest Hilbert value in CondenseTree().
< lozhnikov>
I do not want to include Hilbert-tree-specific code in the RectangleTree. So I want to move the insertion of a point into a leaf node into the DescentType.
< lozhnikov>
And I want to adjust the largest Hilbert value in the SplitType.
< rcurtin>
sorry for the slow response, I was caught talking to someone else
< rcurtin>
let me read CondenseTree() to refresh my memory...
< rcurtin>
okay... so my understanding is that in each node, you are caching the maximum Hilbert value of points contained in it
< rcurtin>
but when CondenseTree() is called, you potentially need to update this maximum Hilbert value
< rcurtin>
let me know if that is not the correct problem
< lozhnikov>
yes, you're right
< rcurtin>
it seems to me that CondenseTree() calls InsertPoint() with points that need to be reinserted
< rcurtin>
and at that point you could update the maximum Hilbert value
< rcurtin>
since InsertPoint() calls DescentType and I think your plan is to have DescentType cache Hilbert values and the maximum Hilbert value
< rcurtin>
is there something I've overlooked? I *think* that will work but I am not 100% certain
< lozhnikov>
I use SplitType instead
< rcurtin>
you could access the SplitType using RectangleTree::SplitType()
< lozhnikov>
It seems this should work
< rcurtin>
I don't think it is necessarily pretty to do it like that, since I think ideally SplitType and DescentType should not have dependencies on each other, but I am not sure I see an alternative here
< rcurtin>
since both need access to the Hilbert values of the points
< lozhnikov>
And another issue: CondenseTree() should shrink the bound after DeletePoint(), and I need to adjust the largest Hilbert value for the Hilbert tree. What is the best way to do this?
< rcurtin>
it seems like DeletePoint() does not call anything in SplitType or DescentType
< rcurtin>
I wonder, if maybe it would be better to refactor the RectangleTree and add another template parameter, "AuxiliaryInformationType", which gets called after insertions and deletions to update any auxiliary information
< rcurtin>
so for the Hilbert tree this auxiliary information could be Hilbert values and maximum Hilbert values
< rcurtin>
for the X tree this could be normalModeMaxNumChildren
< rcurtin>
I am not sure if that is the best idea, let me know what you think
< rcurtin>
the other option is to make some extra function in SplitType or DescentType that is called when a point is deleted, but that seems kind of kludgey
< rcurtin>
I have to go for now, I'll be back later tonight
< lozhnikov>
In my opinion, this approach (with auxiliary information) is much better. I think the X tree does not need this, since normalModeMaxNumChildren is used only in the SplitType. Thanks.
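The "AuxiliaryInformationType" refactoring proposed above could be sketched roughly as follows; the hook names, the simplified one-dimensional node, and the example policy are all assumptions for illustration, not mlpack's actual interface:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tree node templatized on an auxiliary-information policy that
// is notified after every insertion and deletion, so tree-specific state
// (e.g. a maximum Hilbert value) stays out of the core tree code.
template<typename AuxiliaryInformationType>
class SimpleNode
{
 public:
  void InsertPoint(double value)
  {
    points.push_back(value);
    aux.HandlePointInsertion(*this, value);   // update auxiliary info
  }

  void DeletePoint(std::size_t index)
  {
    const double value = points[index];
    points.erase(points.begin() + index);
    aux.HandlePointDeletion(*this, value);    // e.g. shrink a cached maximum
  }

  const std::vector<double>& Points() const { return points; }
  const AuxiliaryInformationType& Aux() const { return aux; }

 private:
  std::vector<double> points;
  AuxiliaryInformationType aux;
};

// Example policy: caches the largest value seen, re-scanning on deletions,
// in the spirit of tracking a node's maximum Hilbert value.
struct MaxValueInformation
{
  double maxValue = 0.0;

  void HandlePointInsertion(const SimpleNode<MaxValueInformation>& /* node */,
                            double value)
  {
    if (value > maxValue)
      maxValue = value;
  }

  void HandlePointDeletion(const SimpleNode<MaxValueInformation>& node,
                           double /* value */)
  {
    maxValue = 0.0;
    for (const double p : node.Points())
      if (p > maxValue)
        maxValue = p;
  }
};
```

For the X tree, the same slot could carry normalModeMaxNumChildren instead, though as noted above that value is only used inside the SplitType.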