verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
lozhnikov has joined #mlpack
zoq has quit [Remote host closed the connection]
zoq has joined #mlpack
Mathnerd314 has quit [Ping timeout: 244 seconds]
mentekid has joined #mlpack
nilay has joined #mlpack
mentekid has quit [Ping timeout: 272 seconds]
mentekid has joined #mlpack
marcosirc has joined #mlpack
< rcurtin> marcosirc: I have been thinking about the ticket you opened, I hope to have a comprehensive response soon
< rcurtin> I still need to think through a few things, but I think that you are right that the 2(\lambda(N_q) - \lambda(N_c)) can subtract too much, but I am not sure the proposed alternative is right
< rcurtin> but it is possible that as I think about it more I will come to a completely different conclusion :)
< marcosirc> Haha ok! Thanks for your feedback!
< mentekid> rcurtin: I will start with multiprobe on the upstream/master version of lsh_search_impl.hpp, meaning unique won't be part of my code when I submit it
< marcosirc> If you agree, I could modify the code according to my proposal, and make some test...
< mentekid> since we haven't reached a conclusion about which version we should keep
< mentekid> and when we decide I'll merge those changes, what do you think?
mentekid has quit [Ping timeout: 244 seconds]
nilay_ has joined #mlpack
< nilay_> zoq: can you suggest how to implement copyMakeBorder and resize functions of opencv? The code for these looks bulky.
< zoq> nilay_: Sure, let me take a look.
mentekid has joined #mlpack
< zoq> nilay_: so I think we could do something like this: https://gist.github.com/zoq/41c92b710b601ba302badb27b64301c9 The for loop to create the border isn't completely finished, but I hope you get the gist.
< zoq> nilay_: Tham has written a basic bilinear interpolation function that we could use: http://pastebin.com/tjRzmtYr for the resize function. The interpolation strategy shouldn't really matter in our case.
< zoq> nilay_: I think we could also use the DownwardReSampling function from the GlimpseLayer class, we could test which function is faster. However, in that case we have to modify the DownwardReSampling.
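The bilinear resize zoq points to could be sketched roughly like this (a minimal sketch in plain C++ over a flat row-major buffer, to stay self-contained; the real mlpack code would operate on arma::mat, and the function name and layout here are assumptions, not Tham's actual implementation):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Resize a rows x cols single-channel image (row-major) to newRows x newCols
// using bilinear interpolation: each output pixel is a weighted average of
// the four nearest input pixels.
std::vector<double> ResizeBilinear(const std::vector<double>& in,
                                   size_t rows, size_t cols,
                                   size_t newRows, size_t newCols)
{
  std::vector<double> out(newRows * newCols);
  const double rowScale = (double) rows / newRows;
  const double colScale = (double) cols / newCols;

  for (size_t r = 0; r < newRows; ++r)
  {
    for (size_t c = 0; c < newCols; ++c)
    {
      // Source coordinates and the four surrounding pixels.
      const double sr = r * rowScale, sc = c * colScale;
      const size_t r0 = (size_t) sr, c0 = (size_t) sc;
      const size_t r1 = std::min(r0 + 1, rows - 1);
      const size_t c1 = std::min(c0 + 1, cols - 1);
      const double dr = sr - r0, dc = sc - c0;

      out[r * newCols + c] =
          in[r0 * cols + c0] * (1 - dr) * (1 - dc) +
          in[r0 * cols + c1] * (1 - dr) * dc +
          in[r1 * cols + c0] * dr * (1 - dc) +
          in[r1 * cols + c1] * dr * dc;
    }
  }
  return out;
}
```

As zoq notes, the interpolation strategy shouldn't really matter for this use case, so nearest-neighbor would be an equally valid (and cheaper) stand-in.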
nilay_ has quit [Ping timeout: 250 seconds]
sumedhghaisas has joined #mlpack
tsathoggua has joined #mlpack
tsathoggua has quit [Client Quit]
Mathnerd314 has joined #mlpack
nilay_ has joined #mlpack
< rcurtin> marcosirc: you can try implementing it if you like, but even the version we have now is bug-free because there are no trees we implement that can cause the prune to be too tight, I think
< rcurtin> so even if you did make a new version, I don't know if it would show a bug even if there was one
< rcurtin> mentekid: I think, based on the data we had, that the unique() approach was just about always fastest, with only a few cases where find() was faster
< rcurtin> so I think I'll leave this up to you: if you want to keep the code simple, we can use unique()
< rcurtin> if you don't mind a little extra complexity (and documenting why the complexity is there), then we can use the hybrid approach probably with cutoff between 0.01 and 0.1
< rcurtin> it seemed like it would not make a huge difference whatever was chosen there
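The hybrid approach rcurtin describes could be sketched like this (plain C++ rather than the actual arma::uvec-based LSH code, to stay self-contained; the function name and the 0.05 default cutoff are assumptions, chosen from the 0.01–0.1 range mentioned above):

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Deduplicate candidate indices returned by LSH bucket lookups. When the
// candidate set is small relative to the reference set, a hash-set scan
// (the find()-style approach) is cheap; for large candidate sets,
// sort + unique tends to be faster.
std::vector<size_t> UniqueCandidates(std::vector<size_t> candidates,
                                     size_t referenceSetSize,
                                     double cutoff = 0.05)
{
  if (candidates.size() < cutoff * referenceSetSize)
  {
    // find()-style branch: keeps first-occurrence order.
    std::unordered_set<size_t> seen;
    std::vector<size_t> out;
    for (size_t index : candidates)
      if (seen.insert(index).second)
        out.push_back(index);
    return out;
  }
  else
  {
    // unique()-style branch: returns the candidates sorted.
    std::sort(candidates.begin(), candidates.end());
    candidates.erase(std::unique(candidates.begin(), candidates.end()),
                     candidates.end());
    return candidates;
  }
}
```

Note the two branches return the survivors in different orders (insertion order vs. sorted), which is fine when the caller only scores the candidate set, but would need documenting, as rcurtin suggests.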
< marcosirc> Ok, yes I agree that it is hard to find an example where we see a difference.
nilay_ has quit [Ping timeout: 250 seconds]
< rcurtin> marcosirc: one way to do it might be to create a "random" treetype for the sake of testing, where points are chosen randomly in such a way that satisfies the definition of space tree
< rcurtin> I think there is a ticket open for this but I've certainly never gotten around to it :)
nilay_ has joined #mlpack
< nilay_> zoq: wouldn't inputPadded.col(padSize - i - 1) = input.col(i); give an inconsistent dimension error?
< marcosirc> This sounds interesting! I will look for that ticket.
< rcurtin> let me see if I can find it...
< rcurtin> but definitely don't feel obligated to do it unless you want to! I'm still undecided on whether or not it would be something that's really helpful
< marcosirc> Ok. I will take a look.
< marcosirc> Thanks
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#825 (master - f55427d : Ryan Curtin): The build passed.
travis-ci has left #mlpack []
nilay_ has quit [Ping timeout: 250 seconds]
marcosirc has quit [Quit: WeeChat 1.4]
< zoq> nilay: ah, you are right, better to use inputPadded.col(padSize - i - 1) = inputPadded.col(i); or submat(...)
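A complete reflect-style border (what OpenCV's copyMakeBorder does with BORDER_REFLECT) might look like the following, sketched in plain C++ on a row-major buffer rather than an arma::mat; the exact mirror indexing is my own and may differ in detail from the gist:

```cpp
#include <cstddef>
#include <vector>

// Pad a rows x cols row-major matrix with a reflected border of width
// padSize, similar in spirit to OpenCV's copyMakeBorder(..., BORDER_REFLECT).
std::vector<double> PadReflect(const std::vector<double>& in,
                               size_t rows, size_t cols, size_t padSize)
{
  const size_t pRows = rows + 2 * padSize, pCols = cols + 2 * padSize;
  std::vector<double> padded(pRows * pCols, 0.0);

  // Copy the input into the interior of the padded matrix.
  for (size_t r = 0; r < rows; ++r)
    for (size_t c = 0; c < cols; ++c)
      padded[(r + padSize) * pCols + (c + padSize)] = in[r * cols + c];

  // Mirror columns: border column (padSize - i - 1) reflects interior column
  // (padSize + i). As in the corrected assignment above, we read from the
  // padded matrix itself (consistent dimensions), not from the raw input.
  for (size_t r = 0; r < pRows; ++r)
  {
    for (size_t i = 0; i < padSize; ++i)
    {
      padded[r * pCols + (padSize - i - 1)] = padded[r * pCols + (padSize + i)];
      padded[r * pCols + (pCols - padSize + i)] =
          padded[r * pCols + (pCols - padSize - i - 1)];
    }
  }

  // Mirror rows the same way; the interior rows (columns included) are
  // complete at this point, so the corners come out right.
  for (size_t i = 0; i < padSize; ++i)
  {
    for (size_t c = 0; c < pCols; ++c)
    {
      padded[(padSize - i - 1) * pCols + c] = padded[(padSize + i) * pCols + c];
      padded[(pRows - padSize + i) * pCols + c] =
          padded[(pRows - padSize - i - 1) * pCols + c];
    }
  }
  return padded;
}
```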
< lozhnikov> rcurtin: Why does the DescentType::ChooseDescentNode method depend on const arma::vec& point ? All points belong to a dataset since RectangleTree::InsertPoint(const size_t point) depends on the order number. Why don't we use the order number of a point in the dataset instead? (I want to avoid repeated calculation of the Hilbert value in case of the discrete approach.)
sumedhghaisas has quit [Ping timeout: 244 seconds]
< rcurtin> lozhnikov: originally the idea was that the RectangleTree would have Insert() and Delete() methods for adding and removing points
< rcurtin> since that is what the R-trees are made for
< rcurtin> so if that support was available, each node in the tree would have to hold its own (small) dataset
< lozhnikov> should these points be added to the global dataset?
< rcurtin> (since join_cols(), which is what we would use for adding points, takes a long time with big datasets but doesn't take a long time with small datasets)
< rcurtin> anyway that support is not currently present in the RectangleTree and you should not worry about it unless you want to, I am just trying to point out why it is like that
< rcurtin> but, as I think about it more, I think we could modify DescentType::ChooseDescentNode() accordingly
< rcurtin> like you could have DescentType hold a reference to the tree node (so it can get the tree's dataset)
< rcurtin> and have ChooseDescentNode() take the index of the point (which you then use the reference to the tree to get the dataset)
< rcurtin> I guess that for the Hilbert tree, you could have DescentType calculate the Hilbert values of each of the points in the constructor
< rcurtin> is that what you were thinking?
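The refactoring rcurtin sketches above could look roughly like this (a hypothetical class, not mlpack's API; the key function is a deliberate placeholder, since a real Hilbert R-tree would map each point onto the Hilbert curve instead):

```cpp
#include <cstddef>
#include <vector>

// Sketch: the descent policy caches one key per dataset point in its
// constructor, and ChooseDescentNode() takes a point *index* instead of the
// point itself, so the key is never recomputed on repeated descents.
class IndexedDescent
{
 public:
  // dataset is column-major: n points of dimension dim, stored flat.
  IndexedDescent(const std::vector<double>& dataset, size_t dim, size_t n)
  {
    keys.resize(n);
    for (size_t i = 0; i < n; ++i)
      keys[i] = SumKey(&dataset[i * dim], dim);
  }

  // Choose which child to descend into. Here each child is summarized by its
  // maximum key, in sorted order: descend into the first child whose maximum
  // key is >= the point's key (the Hilbert R-tree insertion rule, with the
  // placeholder key standing in for the Hilbert value).
  size_t ChooseDescentNode(const std::vector<double>& childMaxKeys,
                           size_t pointIndex) const
  {
    for (size_t c = 0; c < childMaxKeys.size(); ++c)
      if (keys[pointIndex] <= childMaxKeys[c])
        return c;
    return childMaxKeys.size() - 1;  // Larger than all: take the last child.
  }

 private:
  // Placeholder key: sum of coordinates. A real implementation would compute
  // the (discrete) Hilbert value here.
  static double SumKey(const double* point, size_t dim)
  {
    double s = 0.0;
    for (size_t d = 0; d < dim; ++d)
      s += point[d];
    return s;
  }

  std::vector<double> keys;
};
```

This matches the caveat lozhnikov raises next: a cache built in the constructor only covers points already in the dataset, not points inserted later.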
< lozhnikov> Yes, you're right. But this won't work for new points that are not present in the dataset.
< rcurtin> I agree, but perhaps we can consider that some other time
< rcurtin> one possible workaround would be like this:
< rcurtin> each RectangleTree node holds its own local dataset
< rcurtin> when I add a point, I add it to the local dataset with join_cols(), then call ChooseDescentNode() with the new index
< rcurtin> or... hmm... I am not sure if that would work
< lozhnikov> There is a problem: this function is recursive, so the node will change.
< lozhnikov> And i have another question. I should modify CondenseTree() and InsertPoint() since the Hilbert tree requires adding new points according to their Hilbert values. And I should adjust the largest Hilbert value in CondenseTree().
< lozhnikov> I do not want to include Hilbert-tree-specific code in the RectangleTree. So I want to add the insertion of a point into a leaf node to the DescentType.
< lozhnikov> And I want to adjust the largest Hilbert value in the SplitType.
< rcurtin> sorry for the slow response, I was caught talking to someone else
< rcurtin> let me read CondenseTree() to refresh my memory...
< rcurtin> okay... so my understanding is that in each node, you are caching the maximum Hilbert value of points contained in it
< rcurtin> but when CondenseTree() is called, you potentially need to update this maximum Hilbert value
< rcurtin> let me know if that is not the correct problem
< lozhnikov> yes, you're right
< rcurtin> it seems to me that CondenseTree() calls InsertPoint() with points that need to be reinserted
< rcurtin> and at that point you could update the maximum Hilbert value
< rcurtin> since InsertPoint() calls DescentType and I think your plan is to have DescentType cache Hilbert values and the maximum Hilbert value
< rcurtin> is there something I've overlooked? I *think* that will work but I am not 100% certain
< lozhnikov> I use SplitType instead
< rcurtin> you could access the SplitType using RectangleTree::SplitType()
< lozhnikov> It seems this should work
< rcurtin> I don't think it is necessarily pretty to do it like that, since I think ideally SplitType and DescentType should not have dependencies on each other, but I am not sure I see an alternative here
< rcurtin> since both need access to the Hilbert values of the points
< lozhnikov> And another issue: CondenseTree() should shrink the bound after DeletePoint(). I should adjust the largest Hilbert value for the Hilbert tree. What is the best way to do this?
< rcurtin> it seems like DeletePoint() does not call anything in SplitType or DescentType
< rcurtin> I wonder, if maybe it would be better to refactor the RectangleTree and add another template parameter, "AuxiliaryInformationType", which gets called after insertions and deletions to update any auxiliary information
< rcurtin> so for the Hilbert tree this auxiliary information could be Hilbert values and maximum Hilbert values
< rcurtin> for the X tree this could be normalModeMaxNumChildren
< rcurtin> I am not sure if that is the best idea, let me know what you think
< rcurtin> the other option is to make some extra function in SplitType or DescentType that is called when a point is deleted, but that seems kind of kludgey
< rcurtin> I have to go for now, I'll be back later tonight
< lozhnikov> As for me, this approach (with auxiliary information) is much better. I think the X tree does not need this since normalModeMaxNumChildren is used only in the SplitType. Thanks.
mentekid has quit [Ping timeout: 264 seconds]