verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
sumedhghaisas has quit [Ping timeout: 272 seconds]
tham has joined #mlpack
< tham> nilay : I read your code for CopyMakeBorder
< tham> I think there are some things that could be refined, maybe you can take them as a reference
< tham> 1 : the input data (InImage) is not changed in the function, so it would be better to declare it as const
< tham> 2 : this function does not change any data member of the class, so it is a nice candidate for a const member function
< tham> 3 : maybe preferring size_t as the index type (i, j) is better
< tham> 4 : postfix increment (i++) of a built-in type is fine in this case, but it may introduce extra cost if it is an iterator. I think using prefix increment (++i) is a good practice
< tham> 5 : you can simplify the logic of the row copy, I put the code on pastebin (http://pastebin.com/9Pjsz5mU)
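A minimal sketch of what points 1-4 could look like together. The class name, signature, and body here are hypothetical (not nilay's actual code), and the actual border-filling logic is omitted:

```cpp
#include <armadillo>

// Hypothetical sketch only; the real class and signature may differ.
// Shown: const input (point 1), const member function (point 2),
// size_t indices (point 3), and prefix increment (point 4).
class StructuredForests
{
 public:
  void CopyMakeBorder(const arma::mat& inImage, arma::mat& outImage,
                      const size_t top, const size_t bottom,
                      const size_t left, const size_t right) const
  {
    outImage.zeros(inImage.n_rows + top + bottom,
                   inImage.n_cols + left + right);
    // Copy the interior; filling the border itself is omitted here.
    for (size_t i = 0; i < inImage.n_rows; ++i)
      for (size_t j = 0; j < inImage.n_cols; ++j)
        outImage(i + top, j + left) = inImage(i, j);
  }
};
```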
< tham> Other problems are related to API design
< tham> The libraries I have used, like OpenCV and Qt, both manage the image content as a contiguous, one-dimensional array
< tham> no matter how many channels there are
< tham> This API assumes one slice is in charge of one channel
< tham> But real-world libraries organize the input pixels as interleaved BGRBGRBGR (or other orders, like RGBRGBRGB)
< tham> The image could be grayscale too
< tham> Should we allow the users to specify the number of channels?
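To make the layout difference concrete, a small sketch of the two conventions under discussion (the helper names are illustrative, not an existing API):

```cpp
#include <armadillo>
#include <cstddef>

// Interleaved layout (OpenCV/Qt style): one flat array with pixels stored
// as BGRBGRBGR..., so channel c of pixel (row, col) lives at
// data[(row * width + col) * channels + c].
double InterleavedAt(const double* data, const size_t width,
                     const size_t channels, const size_t row,
                     const size_t col, const size_t c)
{
  return data[(row * width + col) * channels + c];
}

// Planar layout (one slice per channel, as with arma::cube): the same
// value is simply image(row, col, c).
double PlanarAt(const arma::cube& image, const size_t row,
                const size_t col, const size_t c)
{
  return image(row, col, c);
}
```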
tham has quit [Quit: Page closed]
keonkim has quit [Ping timeout: 260 seconds]
keonkim has joined #mlpack
mentekid has joined #mlpack
Mathnerd314 has quit [Ping timeout: 244 seconds]
nilay has joined #mlpack
< nilay> tham: thank you for such a detailed analysis... i will keep these in mind from now on... and yes, the logic of the row copy can be simplified too... :)
< nilay> i think if we are doing operations on one pixel at a time (which most image processing libraries have to do) then we need to organize the content of the image array as RGBRGB...
< nilay> but if we are doing operations on one channel at a time (which may be the case here) then it is better to organize the content of the image array as a cube with 3 channels.
< nilay> this is a classic array of structures vs structure of arrays problem: http://stackoverflow.com/questions/17924705/structure-of-arrays-vs-array-of-structures-in-cuda
< nilay> we need to organize the layout in such a manner that we exploit locality
< nilay> to handle grayscale images also, i think i can take the number of channels as input in each function...
< nilay> do you think that would work?
< nilay> i mean, do you think that is good enough?
< nilay> then in that case for grayscale images the arma::cube image will be a cube with dimensions x * y * 1.
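A minimal sketch of that idea, assuming the planar arma::cube representation (the function here is hypothetical):

```cpp
#include <armadillo>
#include <cstddef>

// Works for any number of channels: n_slices is 1 for grayscale, 3 for
// RGB, so the channel count does not need to be hard-coded.
double ProcessImage(const arma::cube& image)
{
  double sum = 0.0;
  for (size_t c = 0; c < image.n_slices; ++c)
    sum += arma::accu(image.slice(c)); // per-channel processing goes here
  return sum;
}

int main()
{
  arma::cube gray(100, 100, 1, arma::fill::zeros); // grayscale: x * y * 1
  arma::cube rgb(100, 100, 3, arma::fill::zeros);  // RGB: x * y * 3
  ProcessImage(gray);
  ProcessImage(rgb);
}
```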
< zoq> tham: "manage the images content as a continuous, one dimension array" I'm not sure I get it, how is that different from what arma::mat does? Using memptr you should end up with the same result.
< zoq> tham nilay: I agree we should let the user define the number of channels.
< zoq> tham nilay: I think if you use the padded matrix instead of the input matrix, you could also get rid of the subvec operation. Another solution would be to use the shift function (4 times) to get rid of the for loops. But I'm not sure if that's faster.
< zoq> tham nilay: I think we agreed to open a pull request once feature x is finished. So we can discuss code optimizations etc. in the pull request. That makes it much easier for all of us to make comments.
nilay has quit [Ping timeout: 250 seconds]
sumedhghaisas has joined #mlpack
nilay has joined #mlpack
< nilay> zoq: so i thought i'd open the pull request after the feature extraction part is done?
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#832 (master - 6f6173c : Marcus Edel): The build passed.
travis-ci has left #mlpack []
< zoq> nilay: Sounds good
< nilay> zoq: ok. i think it should be done today or tomorrow. and then i think there will be many comments for the first request :p
< zoq> nilay: Sounds great, just wanted to point out that we can discuss code more easily using github.
< zoq> nilay: Do you think there are some easy tests we could write for the feature extraction part?
< nilay> zoq: we can just input an image and compare the python code (or the paper's matlab code) with the code i wrote; if the result is the same then it is good?
< nilay> zoq: please tell me if you mean something else...
< zoq> nilay: hm, I have to think about it, but I agree, comparing with the reference code is one option.
< nilay> zoq: ok
< nilay> zoq: do we write tests only to verify correctness, or do they help users in any way too?
< zoq> nilay: A test can serve as a correctness test and additionally as a small introduction to how to use method x in a particular way. However, we don't really write a test to show the user how to use the code; it's more like a neat side product.
< nilay> zoq: so we can write a test for the overall feature extraction part. but the other functions inside are highly specific to the algorithm, so we won't write a test for those functions.
< nilay> what do you think?
< nilay> i'll be back in 1 or 2 hours.
nilay has quit [Quit: Page closed]
< zoq> nilay: I agree, we should write a test for the overall feature extraction part. And probably a test for each function where we think it makes sense to do so. I'm not sure there is one yet, but we should keep that in mind. Does that sound reasonable?
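A minimal sketch of the reference-comparison test nilay suggests (file names and the FeatureExtraction entry point are placeholders; mlpack's test suite uses Boost.Test):

```cpp
#include <boost/test/unit_test.hpp>
#include <armadillo>

// Placeholder declaration: stands in for the real feature extraction
// entry point, whatever its final signature turns out to be.
arma::mat FeatureExtraction(const arma::mat& image);

BOOST_AUTO_TEST_CASE(FeatureExtractionReferenceTest)
{
  arma::mat input, expected;
  input.load("test_image.csv");          // image also fed to the reference code
  expected.load("reference_output.csv"); // output of the python/matlab reference

  const arma::mat output = FeatureExtraction(input);

  BOOST_REQUIRE_EQUAL(output.n_elem, expected.n_elem);
  for (size_t i = 0; i < expected.n_elem; ++i)
    BOOST_REQUIRE_CLOSE(output[i], expected[i], 1e-5); // tolerance in percent
}
```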
< zoq> If we return a tuple using std::make_tuple, does it call the copy constructor, the move constructor, or both? If it calls the copy constructor, I'm not sure it's a good idea to use the tuple interface inside the preprocess split main.
nilay has joined #mlpack
< nilay> zoq: i agree about testing the functions for which it makes sense. i did not understand what you mean by "If it calls the copy constructor, I'm not sure it's a good idea to use the tuple interface inside the preprocess split main."
< rcurtin> nilay: I think that comment was about #650 :)
< rcurtin> zoq: I agree, but I am not sure of the behavior of std::tuple in that setting
< nilay> rcurtin: ok.. i was working hard to make sense of it :P
< zoq> rcurtin: Maybe keonkim or tham can provide some insights; I just wanted to make sure we don't use the slow interface inside the preprocessing split tool.
marcosirc has joined #mlpack
< rcurtin> marcosirc: I guess you can't close tickets?
< marcosirc> Hi Ryan! no I can't.
< rcurtin> okay, I think there are a lot of things about github permissions I don't completely know :)
< marcosirc> Haha
< marcosirc> Thanks for your reply yesterday, about the b2 bound.
< rcurtin> yeah, did it make sense?
< marcosirc> I have an example of a tree where b2 can fail, it is related to the rectangle bound I mentioned.
< rcurtin> okay, can you write it up and put it in the ticket? I have to leave shortly, but I will read it when I have a chance
< marcosirc> I think the proof doesn't work for non-ball bound trees
< marcosirc> Yes! I am writing a clear explanation. I will add it as soon as I can.
< rcurtin> great, thanks!
< keonkim> rcurtin, zoq: I believe std::make_tuple calls the copy constructor.
< keonkim> As discussed in https://github.com/mlpack/mlpack/pull/523, I think having 6 parameters for TrainTestSplit(input, train, test, and other 3 for labels) can be used as an alternative.
< keonkim> or we can make it like std::move(std::make_tuple())
Mathnerd314 has joined #mlpack
tham has joined #mlpack
< keonkim> I personally prefer option 1 (6 parameters) because that way we can make the CLI executable and the function accept the same number of parameters.
< tham> Hi, the std::tuple does not need to be moved
< tham> but the parameters (trainData, testData, trainLabel, testLabel) do need to be moved
< tham> I forgot to move them, I will fix this later on
< tham> sorry for that
< keonkim> hello :)
< tham> I did not notice these obvious defects before
< tham> Thanks for pointing them out
< tham> nilay zoq : about the image layout, I think we should discuss it later, after the first pull request is opened
< tham> It would be easier once we can see the whole API and structures
< tham> keonkim : hello:)
< tham> I opened a pull request, #653
< tham> keonkim : You can pick option 1 or 2, that is why there are two options from the beginning
< tham> Discussion on Stack Overflow (http://stackoverflow.com/questions/17473753/c11-return-value-optimization-or-move) -- "C++11: Return value optimization or move?"
< tham> In short, do not move a local variable when returning it, because it will be optimized anyway: either by RVO, or by a move if RVO cannot be done
< tham> I move the parameters (trainData, testData, trainLabel, testLabel) into the tuple, because if I do not, make_tuple will copy the values of the parameters
< tham> The return value of make_tuple itself is an rvalue, so it can either be optimized by RVO, or moved
< tham> if RVO fails, the tuple generated by make_tuple will be moved
< tham> Does it sound confusing?
< tham> keonkim zoq rcurtin nilay : Does my explanation make sense?
< keonkim> tham: yup
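A condensed sketch of the pattern tham describes; the signature is hypothetical, not the exact code from the pull request:

```cpp
#include <armadillo>
#include <tuple>
#include <utility>

std::tuple<arma::mat, arma::mat, arma::Row<size_t>, arma::Row<size_t>>
TrainTestSplit(const arma::mat& input, const arma::Row<size_t>& labels,
               const double testRatio)
{
  arma::mat trainData, testData;
  arma::Row<size_t> trainLabel, testLabel;
  // ... fill the four outputs from input and labels using testRatio
  //     (omitted) ...

  // Move the four objects into the tuple: without std::move, make_tuple
  // would copy each matrix.
  return std::make_tuple(std::move(trainData), std::move(testData),
                         std::move(trainLabel), std::move(testLabel));
  // Note: no std::move around the return expression itself. The tuple
  // produced by make_tuple is already an rvalue, so it is elided by RVO,
  // or moved if RVO cannot be applied.
}
```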
< zoq> tham: Sounds good, however I'm not sure that the move-semantics solution is as fast as the pass-by-reference solution, because move semantics don't have zero overhead, right?
< tham> I am glad it makes sense, this topic made me feel confused before
< tham> Yep, it needs to copy the pointer and some internal state
< tham> It is just another option for the users
< tham> Not all of the code needs the fastest speed
< tham> In most cases, the difference should be negligible
< tham> but if you want to squeeze every cycle from the CPU
< tham> passing by reference is maybe the better choice (I guess, by instinct)
< zoq> tham: Never tested it, you are probably right and the overhead is negligible; thanks for the clarification.
< tham> zoq : you are welcome
nilay has quit [Ping timeout: 250 seconds]
tham has quit [Quit: Page closed]
tsathoggua has joined #mlpack
tsathoggua has quit [Client Quit]
sumedhghaisas has quit [Remote host closed the connection]
< rcurtin> keonkim: there is no string type for Armadillo matrices, you should assume that labels are either integers or doubles
< rcurtin> like zoq said in his comment, doubles can be used if splitting regression labels (which are floating-point not integers)
< rcurtin> I think there is some minor trickiness to consider for the CLI program... if we load "0" as a double and save it, it'll save as "0.0e+0.0000000" or something like this
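A small sketch of the round-trip rcurtin describes (the file names are hypothetical, and the exact output text depends on Armadillo's formatting):

```cpp
#include <armadillo>

int main()
{
  arma::mat labels;
  labels.load("labels.csv", arma::csv_ascii);  // "0" is parsed as the double 0.0
  labels.save("labels_out.csv", arma::csv_ascii);
  // labels_out.csv now holds a double representation (e.g. scientific
  // notation) rather than the original integer text "0", which the CLI
  // program would need to handle.
}
```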
< keonkim> rcurtin: yup.. I should think about it tomorrow. Got to sleep for now..
< keonkim> thanks for the review. :)