#mlpack on 2016-05-30 — irc logs at libera.irclog.whitequark.org

2015-01-15 23:05 verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/

02:39 nilay has joined #mlpack

03:18 < nilay> when i try to push the blog post it shows the following error: remote: Permission to mlpack/blog.git denied to nilayjain. fatal: unable to access 'https://github.com/mlpack/blog.git/': The requested URL returned error: 403

03:21 nilay has quit [Quit: Page closed]

04:19 nilay has joined #mlpack

05:34 mentekid has joined #mlpack

05:52 Mathnerd314 has quit [Ping timeout: 252 seconds]

06:28 Mathnerd314 has joined #mlpack

07:33 mentekid has quit [Ping timeout: 264 seconds]

07:37 Mathnerd314 has quit [Ping timeout: 252 seconds]

08:20 mentekid has joined #mlpack

09:19 < zoq> nilay: You need write access to the repository.

09:20 < zoq> nilay: "One last thing, if you like to use the mlpack blog please send us your GitHub

09:20 < zoq> username, so we can give you access to the repository."

09:21 < nilay> zoq: username is: nilayjain

09:24 < nilay> zoq: so i wanted to ask about the view_as_windows function...

09:25 < zoq> nilay: okay, go ahead

09:26 < nilay> it does not preserve the size of the data. . . size becomes larger.. so what happens to the missing entries...?

09:28 < nilay> zoq: do you get my question

09:28 < nilay> by size i mean volume, that is, the total number of entries, in a matrix or a cube...

09:29 < zoq> nilay: the size of ss_ftr?

09:29 < nilay> the shape may change, but how come size. . . and what can we do to implement such a thing. . .

09:29 < nilay> yes

09:29 < nilay> or reg_ftr for that matter

09:29 < zoq> nilay: let me take a look

09:44 < nilay> zoq: ok

09:54 < zoq> nilay: hm, ideally we can avoid the reshaping and resizing operation to increase the performance. I'm not quite sure yet, what's the best way to do this. Since we have to support different image sizes, it's not that easy. I have to think about it.

09:54 < zoq> nilay: So what they did is to go from the original source to a reshaped source and then remove the border. For now, you can just hard-code the last matrix size (1000, 256, 13).

09:56 < nilay> zoq: the last matrix size also does not preserve the volume

09:58 < nilay> zoq: it is not just reshape, there is something else also, so how can we hard code?

10:08 < zoq> nilay: So, you have to create a new matrix or overwrite the existing one for ss_ftr = ss_ftr[:, grid_pos], because it chooses n_cell blocks from all samples.

10:10 < zoq> nilay: the same thing for pdist

10:11 < nilay> zoq: what about ss_ftr = view_as_windows(channels, (p_size, p_size, n_ch))

10:13 < zoq> That's just a neat way to create a 4 tensor matrix. If you go with (1000, 256, 13) for the if else case, we don't have to do that using the channels input.

10:20 < nilay> zoq: ok, but in that step we have changed the shape and size of ss_ftr, and so now how reshaping it to (1000, 256, 13) would be correct?

10:21 < nilay> zoq: do you see the problem, or am i wrong?

10:22 < zoq> nilay: That's what I'll have to think about, do you think it works if you just use (1000, 256, 13) for now?

10:23 < nilay> zoq: so say i use (1000, 256, 13) , but for the entries that are extra, what would be their value

10:27 < zoq> nilay: ah, right

10:30 < nilay> maybe view_as_windows adds some values too?

10:32 marcosirc has joined #mlpack

10:33 < zoq> nilay: what this function does is to create all 16x16 patches from input

10:35 < zoq> nilay: we could implement our own view_as_windows function, but I think there is a better way

10:37 < nilay> zoq: ok, go ahead

10:38 < zoq> nilay: hm, we need all patches, so maybe not ...

10:40 < zoq> nilay: Looks like we have to replicate the function ...

10:43 < nilay> zoq: extra dimensions come because when we convert say 5*5 matrix to views of 2*2 matrix, we require 9 matrices. this is because at border we don't have 2*2 patch but we still waste 2*2 matrix for it

10:44 < zoq> nilay: right

10:45 < zoq> nilay: the implementation is straightforward: two for loops, and extract the submat/subcube at that position
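
The two-loop idea can be sketched in plain Python on a nested list. This is only an illustration, not the scikit-image or mlpack code: it assumes a step of one and a 2-D input, and the function name is reused here purely for readability. Under that assumption a 5x5 input with a 2x2 window gives (5 - 2 + 1) * (5 - 2 + 1) = 16 patches, which differs from the count of 9 mentioned above. No values are added; the total number of entries grows because overlapping patches repeat entries of the input.

```python
def view_as_windows(image, p_rows, p_cols):
    """Return every p_rows x p_cols patch of a 2-D list, moving one step at a time."""
    n_rows, n_cols = len(image), len(image[0])
    windows = []
    # Two nested loops over every top-left corner where a full patch still fits.
    for r in range(n_rows - p_rows + 1):
        row_of_windows = []
        for c in range(n_cols - p_cols + 1):
            # Copy the patch whose top-left corner is (r, c).
            patch = [row[c:c + p_cols] for row in image[r:r + p_rows]]
            row_of_windows.append(patch)
        windows.append(row_of_windows)
    return windows


# 5x5 test image whose entry at (r, c) is 5 * r + c.
image = [[5 * r + c for c in range(5)] for r in range(5)]
windows = view_as_windows(image, 2, 2)

n_windows = len(windows) * len(windows[0])  # number of patches
n_entries = n_windows * 2 * 2               # entries stored across all patches
```

Because every patch is copied, memory grows with the number of windows times the patch size, which is the inefficiency raised later in the conversation.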

10:46 < nilay> zoq: anything written in numpy or scipy is difficult to replicate. they just check edge cases with python and call a c function which uses pointers and stuff

10:46 < zoq> nilay: if you like I can write a basic view_as_windows function

10:47 < nilay> zoq: you can tell me and i can try to write it.

10:49 < nilay> zoq: in armadillo we will have to store such views as fields which would be very bad.

10:50 < zoq> nilay: we can just use mat or cube, what's the type of channels in your implementation?

10:50 < nilay> type?

10:50 < nilay> RGB

10:50 < zoq> nilay: data type (mat or cube)

10:50 < nilay> cube for 3d

10:50 < zoq> nilay: okay

10:50 < nilay> mat for 2d

11:29 < zoq> nilay: https://gist.github.com/zoq/fdde0b527e2b256e7c6a23545536f4e6

11:29 < zoq> nilay: that function should create a 16x16 patch at (0, 0) (0, 1), (0, 2) ...

11:31 < zoq> nilay: the problem is it's memory inefficient (github page "On my machine, about 12 GB memory is required for training.")

11:31 < nilay> zoq: yes

11:32 < zoq> nilay: I need to think about the problem, maybe we can come up with a better solution ...

11:35 < zoq> nilay: you can also fill the array with dummy values and use that for testing; maybe I can come back to you tonight with a better solution for the window view problem

11:35 < nilay> zoq: ok

11:58 nilay has quit [Quit: Page closed]

12:30 mentekid has quit [Ping timeout: 260 seconds]

13:41 mentekid has joined #mlpack

15:19 Mathnerd314 has joined #mlpack

15:22 < zoq> nilay: So, I think I found a solution that works; it should consume about 26.6MB ((1000 * 13 * 256) * 8 bytes). If you tell me how the location structure (smp_loc) looks, I can write that down.

16:01 < zoq> nilay: The great thing is we don't need all patches, we just need the patches at the specified locations, which are 1000 as proposed by the authors.

16:01 < zoq> nilay: So you can either use a mat or cube object e.g. a = mat(256, 13 * 1000) or b=cube(16, 16, 13 * 1000) and sample the correct location, so every 13 cols or slices starts a new patch.
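
A rough sketch of that layout in plain Python, with nested lists standing in for an Armadillo mat: only the patches at the given locations are copied, one flattened column per (location, channel) pair, so every n_ch columns start a new patch. The function name, the (row, col) top-left convention for a location, and the column-major flattening are assumptions made for illustration; none of them are fixed by the discussion.

```python
def sample_patches(channels, locations, p_size):
    """Collect one flattened patch per (location, channel) as columns.

    channels:  list of 2-D lists, one per channel, all of the same shape.
    locations: list of (row, col) top-left corners; each is assumed to leave
               room for a full p_size x p_size patch.
    Returns a list of columns; column k * n_ch + ch holds channel ch of
    location k, so every n_ch columns start a new patch.
    """
    columns = []
    for (r, c) in locations:
        for channel in channels:
            # Flatten the patch column by column (column-major order).
            column = [channel[r + i][c + j]
                      for j in range(p_size)
                      for i in range(p_size)]
            columns.append(column)
    return columns


# Small stand-in: 2 channels of size 4x4, 2x2 patches, 2 sampled locations.
channels = [[[100 * ch + 4 * r + c for c in range(4)] for r in range(4)]
            for ch in range(2)]
columns = sample_patches(channels, [(0, 0), (2, 1)], 2)

# Storage for the sizes quoted above: 1000 locations, 13 channels,
# 16 * 16 = 256 entries per patch, 8 bytes per double.
n_bytes = 1000 * 13 * 256 * 8
```

With the quoted sizes this comes to 26,624,000 bytes, matching the estimate of about 26.6MB.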

16:51 nilay has joined #mlpack

17:10 < zoq> nilay: I think I figured it out: http://mlpack.org/irc/

17:27 < marcosirc> Hi zoq:

17:27 < zoq> marcosirc: Hello

17:28 < marcosirc> Thanks. I would like to ask you about the benchmarking system

17:28 < marcosirc> Can I modify it to measure the number of BaseCases in knn and kfn?

17:30 < zoq> marcosirc: You like to perform some kind of grid search right?

17:30 < marcosirc> No, just to test two different implementations with many datasets.

17:32 < nilay> zoq: Oh sorry, I didn't see logs..

17:32 < nilay> let me check

17:32 < zoq> marcosirc: ah, okay, in this case I guess, you can just add a new config block to the existing config.

17:35 < marcosirc> Ah ok. I should get the number of base cases from the verbose output. Should I add a new metric or something like that?

17:35 < nilay> zoq: so i implemented smp_loc as a cube with 2 slices.

17:36 < zoq> marcosirc: What do you like to measure? time?

17:37 < zoq> nilay: so (#xloc, #yloc, 2)?

17:37 < nilay> zoq: i use find() to calculate the pos_loc and neg_loc locations which are column vectors.

17:37 < nilay> zoq: i can input them any way you want

17:38 < marcosirc> zoq: Not the time. I need to measure the number of times a method "BaseCase" is called. This information is printed in the verbose output of knn and kfn.

17:38 < nilay> zoq: smp_loc is only used in reg_ftr and ss_ftr which is where i am stuck at

17:39 < nilay> zoq: i can input pos_loc and neg_loc in any way you want. .. right now we have them as column vectors.

17:39 < marcosirc> Of course I can do it through a bash script. But, I was wondering if I could use the benchmarking system for this...

17:41 < zoq> marcosirc: So I think implementing another metric is the right way, it's straightforward, if you send me the output, I can do that for you, or you can do it yourself. Btw. I can setup a machine with the running benchmark suite you could use, if you like.
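
As a loose illustration of such a metric, the count could be pulled out of the captured verbose output with a regular expression. The log never shows the actual output, so the line format, the pattern, and the function name below are all hypothetical and would need to be adapted to what knn and kfn really print.

```python
import re

# Hypothetical pattern: assumes a line containing "<number> base cases".
# The real verbose output format is not shown in the log.
BASE_CASE_PATTERN = re.compile(r"(\d+)\s+base cases", re.IGNORECASE)


def count_base_cases(verbose_output):
    """Return the first base-case count found in the text, or None if absent."""
    match = BASE_CASE_PATTERN.search(verbose_output)
    return int(match.group(1)) if match else None


# Made-up sample text, for illustration only.
sample = "[INFO ] 4210 base cases were calculated.\n[INFO ] some other line"
n_base_cases = count_base_cases(sample)
```

Returning None rather than 0 when nothing matches keeps a missing line distinguishable from a run that really performed no base cases.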

17:41 < nilay> zoq: we can use ind2sub to know the subscripts for them.. and yeah.. so that is the smp_loc structure for now.. i don't know if we need a structure for that.

17:43 < nilay> zoq: (uvec& pos_loc, uvec& neg_loc)

17:44 < zoq> nilay: okay, the locations are the same locations as in prepare_data it's pos_loc and neg_loc combined. So if I can choose the representation I would use locations = arma::mat(2, numberlocations).

17:44 < zoq> nilay: So, I could call locations.col(0) to get the first x,y location or location(0, 0) to get the first x location.

17:45 < marcosirc> zoq: Thanks. Ok, I will work in the benchmarking repo to add that metric, and I make a pull request when it is ready.

17:45 < zoq> marcosirc: Sounds good, let me know if you need help.

17:45 < marcosirc> Ok. Thanks!

17:46 < zoq> nilay: If you agree, I'll go and use that location structure representation, for the window_view block.

17:47 < nilay> zoq: please give me a minute.

17:49 < nilay> zoq: i didn't understand what you mean by "I could call locations.col(0) to get the first x,y location or location(0, 0) to get the first x location." what i am saying is the following: you have some x number of pos_loc's, and some x number of neg_loc's. (as python codes use rand.permutation to take only x number of locations.). so this is what you have.

17:49 travis-ci has joined #mlpack

17:49 < travis-ci> mlpack/mlpack#849 (master - 3d1ed0f : Marcus Edel): The build is still failing.

17:50 < travis-ci> Change view : https://github.com/mlpack/mlpack/compare/ec1c46a5fc6c...3d1ed0fa731b

17:50 < travis-ci> Build details : https://travis-ci.org/mlpack/mlpack/builds/133969580

17:50 travis-ci has left #mlpack []

17:51 < nilay> so an entry in the arma::mat you made is one location.

17:52 < nilay> we can convert that entry (which is an index) to subscript using ind2sub, so that it represents one location.

17:54 < nilay> zoq: are we saying the same thing?

17:54 < zoq> nilay: yes, right, so we end up with an x,y or row,col for one location right?

17:55 < zoq> nilay: ind (e.g 0) -> (0, 0)
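
The index-to-subscript step can be sketched as follows for a column-major matrix, where linear indices run down the first column before moving to the next. These helpers are plain-Python stand-ins written for illustration; they are not Armadillo's functions, and the argument order is an assumption.

```python
def ind2sub(n_rows, index):
    """Column-major linear index -> (row, col) for a matrix with n_rows rows."""
    return index % n_rows, index // n_rows


def sub2ind(n_rows, row, col):
    """Inverse mapping: (row, col) -> column-major linear index."""
    return col * n_rows + row


first = ind2sub(4, 0)   # (0, 0), as in the example above
inner = ind2sub(4, 5)   # second column, second row of a 4-row matrix
```

Only the number of rows is needed for the conversion, since the column count does not affect where a column-major index lands.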

17:55 < nilay> yes location[x][y] gives you one location. we concatenate the 2 uvec's to get a size(x,2) matrix. in the first column you have pos_loc and in the second col you have neg_loc

17:55 < nilay> zoq: correct

17:57 < zoq> nilay: okay, so we could store all locations in arma::mat(2, numberoflocations) right?

17:57 < nilay> yes. but i don't know why you use it as (2,numberoflocations), shouldn't it be (numberoflocations, 2)?

17:59 < zoq> nilay: ah, I see, yeah, I think we could use (numberoflocations, 2) or (2, numberoflocations), we just have to use .row or .col

17:59 < zoq> nilay: right

18:00 < zoq> nilay: if you like to go with (numberoflocations, 2), it's fine with me

18:01 < nilay> zoq: i was asking is there any benefit to use (2, numberoflocations)

18:03 < nilay> zoq: so we are taking random features at random locations?

18:03 < zoq> nilay: we use the column-major order, so we should make use of that

18:03 < zoq> nilay: yes, right

18:04 < zoq> nilay: I messed that up in my example, so you are right

18:04 < nilay> zoq: ok i forgot it reverses in column major order, sorry
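
The layout argument can be made concrete with a small plain-Python sketch that flattens a matrix the way column-major storage would. The helper name and the sample coordinates are made up for illustration. With a (2, numberoflocations) shape each x,y pair sits next to each other in memory, while with (numberoflocations, 2) all x values come first and all y values follow.

```python
def column_major_flat(matrix):
    """Flatten a nested-list matrix in column-major order (column by column)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [matrix[r][c] for c in range(n_cols) for r in range(n_rows)]


xs = [10, 20, 30]  # made-up x coordinates
ys = [1, 2, 3]     # made-up y coordinates

two_by_n = [xs, ys]                          # shape (2, N): one location per column
n_by_two = [[x, y] for x, y in zip(xs, ys)]  # shape (N, 2): one location per row

flat_two_by_n = column_major_flat(two_by_n)  # pairs stay adjacent
flat_n_by_two = column_major_flat(n_by_two)  # xs first, then ys
```

Reading one location from the (2, N) layout therefore touches two neighbouring entries, which is the usual reason to store one point per column in a column-major library.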

18:05 < zoq> nilay: so, we agree, to use arma::mat to store the locations right?

18:05 < nilay> zoq: yes

18:07 < zoq> nilay: okay, great, I'll go and write the window_view block, hopefully in the next hours

18:08 < nilay> zoq: if you can please explain your solution again?

18:08 < nilay> then maybe you don't need to write it.. :)

18:12 sumedhghaisas has joined #mlpack

18:13 < sumedhghaisas> marcosirc: hey marcos...

18:13 < marcosirc> Hi sumedh.

18:14 < sumedhghaisas> rcurtin: Lets just also ping Ryan... he might give some inputs too...

18:15 < sumedhghaisas> okay so... I read your last comment on github...

18:15 < sumedhghaisas> I agree about changing the code to test your function...

18:16 < sumedhghaisas> That would mean pushing our timeline a bit...

18:17 < marcosirc> Yes.. I have been working today in the implementation of b_aux. It was simple to fix.

18:17 < marcosirc> I have tested it in the corel dataset, resulting in exactly the same number of base cases after the modification than with the original b_2 bound.

18:18 < marcosirc> Even for cover trees.

18:19 < marcosirc> I was thinking of using the benchmarking system to test this with many different datasets. Once we check everything is ok, the code could be merged.

18:20 < marcosirc> Would you agree on this? I will try to finish this as fast as I can so we can continue with the proposed timeline.

18:20 < sumedhghaisas> hmm... thats good... but I think for trees on which B2 is correct... b2 is definitely tighter...

18:20 < sumedhghaisas> So it's worth adding another trait... what do you think??

18:21 < marcosirc> Well... Yes. Because of that, I proposed to include some information in the treetraits to decide if we want to use previous b2 bound.

18:21 < marcosirc> Yes!

18:22 < sumedhghaisas> Sure no problem... If you have already pushed the code I am happy to take a look at it...

18:23 < marcosirc> I would have expected to notice a difference in the number of base cases with cover trees (between original b2, and b_aux modification). But I haven't. I will test with other dataset to see what happens.

18:24 < sumedhghaisas> that's right... let's see what Ryan has to say about the trait addition.

18:24 < marcosirc> (because cover trees would be the only interesting case where it makes sense to have a difference, because it holds points in non-leaf nodes).

18:24 < sumedhghaisas> yes thats true...

18:24 < marcosirc> I will push the code now.

18:24 < sumedhghaisas> even I expected that... maybe we can generate a small dataset which will force the difference??

18:25 < sumedhghaisas> might be a good test case in our Boost system...

18:26 < sumedhghaisas> We can set some Ci's like you did in your example... then generate some points statistically around it...

18:26 < marcosirc> https://github.com/MarcosPividori/mlpack/tree/knn-bounds

18:29 < marcosirc> Well.. Yes, a dataset to force the difference in cover trees would be interesting. I don't have a deep understanding of how cover trees work, I would need some time to go in depth about them ...

18:30 < marcosirc> Do you mean the example I mentioned where b2 fails?

18:32 < sumedhghaisas> yes... I think you followed a similar approach in producing the tree...

18:33 < sumedhghaisas> The tree where B2 fails will be good starting point to generate a difference...

18:38 < marcosirc> Mmm, If we want to have a difference in the number of base cases for cover trees, I think we should think in a different example.

18:38 < marcosirc> The example I mentioned was a tree using hyperrectangle bounds.

18:38 < marcosirc> And was an example where b2, fails.

18:39 < marcosirc> Cover trees use ball bounds. And we do not need an example where b2 fails. In fact b2 doesn't fail. We need an example where b2 means more pruning than b_aux...

18:40 < sumedhghaisas> ohh no... when I said a test case for Boost I mean to write a test where B2 will fail and B aux won't...

18:40 < sumedhghaisas> Not for cover trees...

18:40 < sumedhghaisas> that another scenario...

18:40 < sumedhghaisas> sorry for the confusion...

18:41 < marcosirc> Ahhh... that makes sense.

18:41 < marcosirc> Ok.

18:41 < sumedhghaisas> yes, for ball bounds, finding a difference-forcing example would be difficult...

18:42 < sumedhghaisas> we would need to play upon the difference between B2 and Baux ...

18:42 < marcosirc> Well. With the current tree types, that interesting case cannot be implemented, because all trees using hyperrectangles, such as KDTree/R-Tree/R*-Tree/X-Tree, only hold points in leaf nodes.

18:43 < sumedhghaisas> yeah ... where we know B2 might fail ...

18:43 < marcosirc> Yes, if we want to force b2 to fail, we need to hold points in a non-leaf node...

18:46 < marcosirc> Maybe some of the new tree types, that Mikhail Lozhnikov is implementing for this GSoC...

18:48 < sumedhghaisas> not even ball tree I assume ... not very familiar with that but its just hypersphere partitioning I think...

18:49 < marcosirc> Yes... Ball trees only hold points in leaf nodes, as KDTrees do..

18:50 < sumedhghaisas> yeah just read about them...

18:51 < sumedhghaisas> okay so what I am wondering is... Why not think about the framework changes when the appropriate tree is implemented??

18:55 < sumedhghaisas> What I mean is... without the tree with hyper rectangular bounds with leaf node carrying points...

18:55 < sumedhghaisas> Won't it be hard to test Baux??

18:57 < marcosirc> So, I am not sure if I understand. Do you mean waiting until we have a tree type that fails?

18:59 < sumedhghaisas> yeah... I am not sure if its the right way... but I am thinking about it...When we actually have that tree we would have better understanding of exactly what framework changes are needed...

18:59 < sumedhghaisas> to use both baux and B2

19:00 < sumedhghaisas> and the change will be smoother at that point I think...

19:03 < marcosirc> Well... that is an option. You will have to remember to update the code when adding new tree types...

19:03 < marcosirc> I thought it would be useful to be sure we have the correct bounds before continuing.

19:05 tsathoggua has joined #mlpack

19:05 tsathoggua has quit [Client Quit]

19:05 < marcosirc> I don't think b_aux involves many changes, with the advantage that we know it is correct. But if you prefer, we can wait..

19:06 < sumedhghaisas> yeah thats the negative side of it :( anyways since you have already started making changes...

19:06 < sumedhghaisas> yeah... that's what I was saying :)

19:07 < sumedhghaisas> lets think about merging later... we will also discuss with ryan ...

19:08 < sumedhghaisas> I will first take a look at the code...

19:08 < sumedhghaisas> we know its correct so lets proceed with it...

19:09 < marcosirc> Thanks. Ok. So, if you agree, I can do more tests with different datasets while we wait for Ryan's opinion.

19:09 < sumedhghaisas> Surely...

19:09 < sumedhghaisas> it would be interesting to see the test data where there is a difference in base cases...

19:10 < sumedhghaisas> if you find one in your datasets...

19:11 < marcosirc> I agree.

19:12 < marcosirc> If we don't see any difference in popular datasets, we would have to think about a special case... I need to read more about cover trees.

20:00 nilay has quit [Quit: Page closed]

20:40 sumedhghaisas has quit [Ping timeout: 244 seconds]

21:20 < rcurtin> marcosirc: sumedhghaisas: just read the conversation

21:20 < rcurtin> I think the RectangleTrees hold points not just in the leaves but also in higher level nodes

21:21 < rcurtin> so maybe it is possible to synthesize a problem case that shows b_2 fails

21:21 < rcurtin> but it would probably be very difficult

21:22 < rcurtin> when I synthesized a test dataset to demonstrate an error with John Langford's cover tree code it took I think a full week

21:22 < rcurtin> I think switching to b_aux is fine, but I need to read the existing ticket and add a comment

21:46 < marcosirc> rcurtin: Thanks for your comments! When you say rectangle tree, do you mean R-tree? I have been reading the code, and it doesn't seem to save points in non-leaf nodes.

21:47 < marcosirc> I have to read it in detail, but that was my impression after reading the RTreeSplit's code.

21:58 mentekid has quit [Ping timeout: 258 seconds]

21:58 marcosirc has quit [Quit: WeeChat 1.4]

22:01 < rcurtin> marcosirc: sorry for the slow response, I am in the car, long drive home...

22:09 travis-ci has joined #mlpack

22:09 < travis-ci> mlpack/mlpack#850 (master - 2fe9e82 : Ryan Curtin): The build was fixed.

22:09 < travis-ci> Change view : https://github.com/mlpack/mlpack/compare/3d1ed0fa731b...2fe9e82b6350

22:09 < travis-ci> Build details : https://travis-ci.org/mlpack/mlpack/builds/134023764

22:09 travis-ci has left #mlpack []

22:09 < rcurtin> I think you are right, the code shows that any RectangleTree (which is R tree, R*-tree, and X tree) only hold points in the leaves

22:10 < rcurtin> but I had thought that R trees held points in all levels... I think maybe I need to look at this code more closely...

22:26 < rcurtin> ah nevermind, I had forgotten, R trees and variants only hold points in the leaves

23:09 travis-ci has joined #mlpack

23:09 < travis-ci> mlpack/mlpack#851 (master - 1dad2b6 : Ryan Curtin): The build passed.

23:09 < travis-ci> Change view : https://github.com/mlpack/mlpack/compare/2fe9e82b6350...1dad2b662d59

23:09 < travis-ci> Build details : https://travis-ci.org/mlpack/mlpack/builds/134032688

23:09 travis-ci has left #mlpack []