verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
sumedhghaisas has quit [Ping timeout: 272 seconds]
tham has joined #mlpack
< tham> nilay : I read your code for CopyMakeBorder
< tham> I think there are some things that could be refined, maybe you can take them as a reference
< tham> 1 : the input data (InImage) is not changed in the function, so it would be better to declare it as const
< tham> 2 : this function does not change any data member of the class, so it is a nice candidate for a const member function
< tham> 3 : maybe preferring size_t as the index type (i, j) is better
< tham> 4 : postfix increment (i++) of a built-in type is fine in this case, but it may introduce extra cost if it is an iterator. I think using prefix increment (++i) is a good practice
< tham> 5 : you can simplify the logic of the row copy, I put the code on pastebin (http://pastebin.com/9Pjsz5mU)
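A minimal sketch of what points 1-4 could look like together. The class name, signature, and body here are hypothetical (not nilay's actual code), and the actual border-filling logic is omitted:

```cpp
#include <armadillo>

// Hypothetical sketch only; the real class and signature may differ.
// Shown: const input (point 1), const member function (point 2),
// size_t indices (point 3), and prefix increment (point 4).
class StructuredForests
{
 public:
  void CopyMakeBorder(const arma::mat& inImage, arma::mat& outImage,
                      const size_t top, const size_t bottom,
                      const size_t left, const size_t right) const
  {
    outImage.zeros(inImage.n_rows + top + bottom,
                   inImage.n_cols + left + right);
    // Copy the interior; filling the border itself is omitted here.
    for (size_t i = 0; i < inImage.n_rows; ++i)
      for (size_t j = 0; j < inImage.n_cols; ++j)
        outImage(i + top, j + left) = inImage(i, j);
  }
};
```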
< tham> Other problems are related to API design
< tham> The libraries I have used, like OpenCV and Qt, both manage the image content as a contiguous, one-dimensional array
< tham> no matter how many channels there are
< tham> This API assumes one slice is in charge of one channel
< tham> But real-world libraries organize the input pixels as interleaved BGRBGRBGR (or other orders, like RGBRGBRGB)
< tham> The image could be grayscale too
< tham> Should we allow the users to specify the number of channels?
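To make the layout difference concrete, a small sketch of the two conventions under discussion (the helper names are illustrative, not an existing API):

```cpp
#include <armadillo>
#include <cstddef>

// Interleaved layout (OpenCV/Qt style): one flat array with pixels stored
// as BGRBGRBGR..., so channel c of pixel (row, col) lives at
// data[(row * width + col) * channels + c].
double InterleavedAt(const double* data, const size_t width,
                     const size_t channels, const size_t row,
                     const size_t col, const size_t c)
{
  return data[(row * width + col) * channels + c];
}

// Planar layout (one slice per channel, as with arma::cube): the same
// value is simply image(row, col, c).
double PlanarAt(const arma::cube& image, const size_t row,
                const size_t col, const size_t c)
{
  return image(row, col, c);
}
```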
tham has quit [Quit: Page closed]
keonkim has quit [Ping timeout: 260 seconds]
keonkim has joined #mlpack
mentekid has joined #mlpack
Mathnerd314 has quit [Ping timeout: 244 seconds]
nilay has joined #mlpack
< nilay> tham: thank you for such a detailed analysis... i will keep these in mind from now on... and yes, the logic of the row copy can be simplified too... :)
< nilay> i think if we are doing operations on one pixel at a time (which most image processing libraries have to do) then we need to organize the content of the image array as RGBRGB...
< nilay> but if we are doing operations on one channel at a time (which may be the case here) then it is better to organize the content of the image array as a cube with 3 channels.
< nilay> this is a classic array of structures vs structure of arrays problem: http://stackoverflow.com/questions/17924705/structure-of-arrays-vs-array-of-structures-in-cuda
< nilay> we need to organize the layout in such a manner that we exploit locality
< nilay> to handle grayscale images also, i think i can take the number of channels as input in each function...
< nilay> do you think that would work?
< nilay> i mean, do you think that is good enough?
< nilay> then in that case for grayscale images the arma::cube image will be a cube with dimensions x * y * 1.
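A minimal sketch of that idea, assuming the planar arma::cube representation (the function here is hypothetical):

```cpp
#include <armadillo>
#include <cstddef>

// Works for any number of channels: n_slices is 1 for grayscale, 3 for
// RGB, so the channel count does not need to be hard-coded.
double ProcessImage(const arma::cube& image)
{
  double sum = 0.0;
  for (size_t c = 0; c < image.n_slices; ++c)
    sum += arma::accu(image.slice(c)); // per-channel processing goes here
  return sum;
}

int main()
{
  arma::cube gray(100, 100, 1, arma::fill::zeros); // grayscale: x * y * 1
  arma::cube rgb(100, 100, 3, arma::fill::zeros);  // RGB: x * y * 3
  ProcessImage(gray);
  ProcessImage(rgb);
}
```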
< zoq> tham: "manage the images content as a continuous, one dimension array" I'm not sure I get it, how is that different from what arma::mat does? Using memptr you should end up with the same result.
< zoq> tham nilay: I agree we should let the user define the number of channels.
< zoq> tham nilay: I think if you use the padded matrix instead of the input matrix, you could also get rid of the subvec operation. Another solution would be to use the shift function (4 times) to get rid of the for loops. But I'm not sure if that's faster.
< zoq> tham nilay: I think we agreed to open a pull request once feature x is finished. So we can discuss code optimizations etc. in the pull request. That makes it much easier for all of us to make comments.
nilay has quit [Ping timeout: 250 seconds]
sumedhghaisas has joined #mlpack
nilay has joined #mlpack
< nilay> zoq: so i thought i'd open the pull request after the feature extraction part is done?
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#832 (master - 6f6173c : Marcus Edel): The build passed.
travis-ci has left #mlpack []
< zoq> nilay: Sounds good
< nilay> zoq: ok. i think it should be done today or tomorrow. and then i think there will be many comments for the first request :p
< zoq> nilay: Sounds great, just wanted to point out that we can discuss code more easily using github.
< zoq> nilay: Do you think there are some easy tests we could write for the feature extraction part?
< nilay> zoq: we can just input an image and compare the python code (or the paper's matlab code) with the code i wrote; if the result is the same then it is good?
< nilay> zoq: please tell me if you mean something else...
< zoq> nilay: hm, I have to think about it, but I agree, comparing with the reference code is one option.
< nilay> zoq: ok
< nilay> zoq: do we write tests only to verify correctness, or do they help users in any way too?
< zoq> nilay: A test can serve as a correctness test and additionally as a small introduction to how to use method x in a particular way. However, we don't really write a test to show the user how to use the code; it's more like a neat side product.
< nilay> zoq: so we can write a test for the overall feature extraction part. but the other functions inside are highly specific to the algorithm, so we won't write a test for those functions.
< nilay> what do you think?
< nilay> i'll be back in 1 or 2 hours.
nilay has quit [Quit: Page closed]
< zoq> nilay: I agree, we should write a test for the overall feature extraction part. And probably a test for each function where we think it makes sense to do so. I'm not sure there is one yet, but we should keep that in mind. Does that sound reasonable?
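A minimal sketch of the reference-comparison test nilay suggests (file names and the FeatureExtraction entry point are placeholders; mlpack's test suite uses Boost.Test):

```cpp
#include <boost/test/unit_test.hpp>
#include <armadillo>

// Placeholder declaration: stands in for the real feature extraction
// entry point, whatever its final signature turns out to be.
arma::mat FeatureExtraction(const arma::mat& image);

BOOST_AUTO_TEST_CASE(FeatureExtractionReferenceTest)
{
  arma::mat input, expected;
  input.load("test_image.csv");          // image also fed to the reference code
  expected.load("reference_output.csv"); // output of the python/matlab reference

  const arma::mat output = FeatureExtraction(input);

  BOOST_REQUIRE_EQUAL(output.n_elem, expected.n_elem);
  for (size_t i = 0; i < expected.n_elem; ++i)
    BOOST_REQUIRE_CLOSE(output[i], expected[i], 1e-5); // tolerance in percent
}
```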
< zoq> If we return a tuple using std::make_tuple, does it call the copy constructor, the move constructor, or both? If it calls the copy constructor, I'm not sure it's a good idea to use the tuple interface inside the preprocess split main.
nilay has joined #mlpack
< nilay> zoq: i agree about testing the functions for which it makes sense. i did not understand what you mean by "If it calls the copy constructor, I'm not sure it's a good idea to use the tuple interface inside the preprocess split main."
< rcurtin> nilay: I think that comment was about #650 :)
< rcurtin> zoq: I agree, but I am not sure of the behavior of std::tuple in that setting
< nilay> rcurtin: ok.. i was working hard to make sense of it :P
< zoq> rcurtin: Maybe keonkim or tham can provide some insights; I just wanted to make sure we don't use the slow interface inside the preprocessing split tool.
marcosirc has joined #mlpack
< rcurtin> marcosirc: I guess you can't close tickets?
< marcosirc> Hi Ryan! no I can't.
< rcurtin> okay, I think there are a lot of things about github permissions I don't completely know :)
< marcosirc> Haha
< marcosirc> Thanks for your reply yesterday, about the b2 bound.
< rcurtin> yeah, did it make sense?
< marcosirc> I have an example of a tree where b2 can fail, it is related to the rectangle bound I mentioned.
< rcurtin> okay, can you write it up and put it in the ticket? I have to leave shortly, but I will read it when I have a chance
< marcosirc> I think the proof doesn't work for non-ball bound trees
< marcosirc> Yes! I am writing a clear explanation. I will add it as soon as I can.
< rcurtin> great, thanks!
< keonkim> rcurtin, zoq: I believe std::make_tuple calls the copy constructor.
< keonkim> As discussed in https://github.com/mlpack/mlpack/pull/523, I think having 6 parameters for TrainTestSplit(input, train, test, and other 3 for labels) can be used as an alternative.
< keonkim> or we can make it like std::move(std::make_tuple())
Mathnerd314 has joined #mlpack
tham has joined #mlpack
< keonkim> I personally prefer option 1 (6 parameters) because that way we can make the CLI executable and the function accept the same number of parameters.
< tham> Hi, the std::tuple does not need to be moved
< tham> but the parameters (trainData, testData, trainLabel, testLabel) do need to be moved
< tham> I forgot to move them, I will fix this later on
< tham> sorry for that
< keonkim> hello :)
< tham> I did not notice these obvious defects before
< tham> Thanks for pointing them out
< tham> nilay zoq : about the image layout, I think we should discuss it later, after the first pull request is opened
< tham> It would be easier once we can see the whole API and structures
< tham> keonkim : hello:)
< tham> I opened a pull request, #653
< tham> keonkim : You can pick option 1 or 2, that is why there are two options from the beginning
< tham> Discussion on Stack Overflow (http://stackoverflow.com/questions/17473753/c11-return-value-optimization-or-move) -- "C++11: Return value optimization or move?"
< tham> In short, do not move a local variable when returning it, because it will be optimized anyway: either by RVO, or by a move if RVO cannot be done
< tham> I move the parameters (trainData, testData, trainLabel, testLabel) into the tuple, because if I do not, make_tuple will copy the values of the parameters
< tham> The return value of make_tuple itself is an rvalue, so it can either be optimized by RVO, or moved
< tham> if RVO fails, the tuple generated by make_tuple will be moved
< tham> Does it sound confusing?
< tham> keonkim zoq rcurtin nilay : Does my explanation make sense?
< keonkim> tham: yup
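A condensed sketch of the pattern tham describes; the signature is hypothetical, not the exact code from the pull request:

```cpp
#include <armadillo>
#include <tuple>
#include <utility>

std::tuple<arma::mat, arma::mat, arma::Row<size_t>, arma::Row<size_t>>
TrainTestSplit(const arma::mat& input, const arma::Row<size_t>& labels,
               const double testRatio)
{
  arma::mat trainData, testData;
  arma::Row<size_t> trainLabel, testLabel;
  // ... fill the four outputs from input and labels using testRatio
  //     (omitted) ...

  // Move the four objects into the tuple: without std::move, make_tuple
  // would copy each matrix.
  return std::make_tuple(std::move(trainData), std::move(testData),
                         std::move(trainLabel), std::move(testLabel));
  // Note: no std::move around the return expression itself. The tuple
  // produced by make_tuple is already an rvalue, so it is elided by RVO,
  // or moved if RVO cannot be applied.
}
```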
< zoq> tham: Sounds good, however I'm not sure that the move-semantics solution is as fast as the pass-by-reference solution, because move semantics don't have zero overhead, right?
< tham> I am glad it makes sense, this topic made me feel confused before
< tham> Yep, it needs to copy the pointer and some internal state
< tham> It is just another option for the users
< tham> Not all of the code needs the fastest speed
< tham> In most cases, the difference should be negligible
< tham> but if you want to squeeze every cycle from the CPU
< tham> passing by reference is maybe the better choice (I guess, by instinct)
< zoq> tham: Never tested it, you are probably right and the overhead is negligible; thanks for the clarification.
< tham> zoq : you are welcome
nilay has quit [Ping timeout: 250 seconds]
tham has quit [Quit: Page closed]
tsathoggua has joined #mlpack
tsathoggua has quit [Client Quit]
sumedhghaisas has quit [Remote host closed the connection]
< rcurtin> keonkim: there is no string type for Armadillo matrices, you should assume that labels are either integers or doubles
< rcurtin> like zoq said in his comment, doubles can be used if splitting regression labels (which are floating-point not integers)
< rcurtin> I think there is some minor trickiness to consider for the CLI program... if we load "0" as a double and save it, it'll save as "0.0e+0.0000000" or something like this
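A small sketch of the round-trip rcurtin describes (the file names are hypothetical, and the exact output text depends on Armadillo's formatting):

```cpp
#include <armadillo>

int main()
{
  arma::mat labels;
  labels.load("labels.csv", arma::csv_ascii);  // "0" is parsed as the double 0.0
  labels.save("labels_out.csv", arma::csv_ascii);
  // labels_out.csv now holds a double representation (e.g. scientific
  // notation) rather than the original integer text "0", which the CLI
  // program would need to handle.
}
```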
< keonkim> rcurtin: yup.. I should think about it tomorrow. Got to sleep for now..
< keonkim> thanks for the review. :)