#mlpack on 2016-05-23 — irc logs at libera.irclog.whitequark.org

2015-01-15 23:05 verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/

01:21 wasiq has joined #mlpack

01:41 < rcurtin> lozhnikov: thanks for your writeup... I think that approach is basically the same as casting the double to a uint64_t? let me know if I have misunderstood that

01:41 < rcurtin> that can definitely work, but that's a pretty nonlinear mapping so I think this will give you a weirdly stretched Hilbert curve

01:42 < rcurtin> I am not sure if the way that Hilbert curve will be stretched will make much of a difference... I have not thought through that part very far

01:43 < rcurtin> keonkim: my main comment is that I don't think there's any need to provide support for arma::Cube objects... data in mlpack is represented by arma::Mat objects instead

01:43 < rcurtin> if a user has an arma::Cube and wants it to be a Mat, it is pretty easy to sreshape it

01:44 < rcurtin> *reshape

01:46 < rcurtin> my other thought is, we should make the API more like the mlpack API, not like the scikit API

01:46 < rcurtin> so I would suggest Train() and Apply() not Fit() and Transform()

01:47 < rcurtin> I would also suggest using void return value for the method that does the transformation, and have a user pass in a reference for the output

01:47 < rcurtin> if you do that, it should actually make the in-place version unnecessary (depending on how you write it)
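Here is a minimal sketch of the convention rcurtin describes. `MinMaxScaler` is a hypothetical class, not code from keonkim's branch, and `std::vector<double>` stands in for `arma::Mat` so the sketch compiles without Armadillo. `Train()` learns from the data. `Apply()` returns `void` and writes into a caller-supplied reference. Each element is read before the same slot is written, so passing one object as both arguments scales it in place. That is why a separate in-place method is not needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical min-max scaler illustrating the Train()/Apply() convention:
// void return values, output passed by reference.
class MinMaxScaler
{
 public:
  // Learn the range of the data. Precondition: data is not empty.
  void Train(const std::vector<double>& data)
  {
    assert(!data.empty());
    const auto mm = std::minmax_element(data.begin(), data.end());
    minValue = *mm.first;
    maxValue = *mm.second;
  }

  // Write scaled values into `output`. `input` and `output` may be the same
  // object: resize() is then a no-op, and each element is read before its
  // own slot is overwritten.
  void Apply(const std::vector<double>& input,
             std::vector<double>& output) const
  {
    const double range = (maxValue > minValue) ? (maxValue - minValue) : 1.0;
    output.resize(input.size());
    for (std::size_t i = 0; i < input.size(); ++i)
      output[i] = (input[i] - minValue) / range;
  }

 private:
  double minValue = 0.0;
  double maxValue = 1.0;
};
```

For example, `scaler.Apply(data, data);` overwrites `data` with its scaled values, and `scaler.Apply(data, out);` leaves `data` untouched.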

02:37 wasiq has quit [Ping timeout: 260 seconds]

02:42 Mathnerd314 has joined #mlpack

05:41 < lozhnikov> rcurtin: I think casting the double to a uint64_t may break the ordering of numbers. So we should fix some bits in case of the negative exponent or the negative sign. And I'm not sure that we can implement this in a platform-independent manner. I think that the frexp function is safer because it does not depend on byte ordering and the hardware representation of the double datatype at all.

05:43 < keonkim> rcurtin: Thanks, I will apply your comments

05:44 < keonkim> rcurtin: I also moved everything under methods folder. (for now) https://github.com/keonkim/mlpack/tree/master/src/mlpack/methods/preprocess

05:52 mentekid has joined #mlpack

06:53 Mathnerd314 has quit [Ping timeout: 250 seconds]

08:19 mentekid has quit [Ping timeout: 276 seconds]

09:35 mentekid has joined #mlpack

12:01 wasiq has joined #mlpack

12:53 nilay has joined #mlpack

13:11 < nilay> zoq: Hi

13:11 < zoq> nilay: Hello

13:12 < nilay> zoq: can you tell me what happens when we do cmake ../ and what happens when we do make?

13:12 < nilay> zoq: i was unwell last 2 days, so i didn't come much here..

13:14 < zoq> nilay: cmake generates the GNU Makefiles and make builds everything. You basically run cmake once and run make every time you'd like to build your changes.

13:15 < zoq> nilay: Hope, you feeling better now.

13:15 < nilay> zoq: so if i write a new file, do i have to run cmake again (after adding that filename to CMakeLists.txt), or can i get around that?

13:15 < nilay> zoq: yes i am, thanks.

13:15 < nilay> zoq: because making everything takes lot of time. .

13:17 < nilay> and if once i do cmake ../ then it builds everything from scratch

13:19 < zoq> You can just use make; it should rebuild the project if you change a CMakeLists.txt file, and if you just change e.g. a header file, it should only rebuild the changed files and the files that include them.

13:19 < zoq> You can also use make -j2 to use 2 cores.

13:20 < zoq> But, like I said it should only build changes if you run make again.

13:20 < nilay> zoq: but it doesn't show the error concretely if i use more than 1 core.

13:21 < nilay> zoq: but it is still faster, as i type make again and get the errors.

13:21 < zoq> nilay: yeah, right

13:25 nilay has quit [Ping timeout: 250 seconds]

13:35 wasiq has quit [Ping timeout: 250 seconds]

14:44 nilay has joined #mlpack

14:59 Mathnerd314 has joined #mlpack

15:13 < keonkim> fun project here http://pjreddie.com/darknet/ maybe we can reference how it uses cuda

15:28 marcosirc has joined #mlpack

16:09 < rcurtin> lozhnikov: ah, right, frexp() is a much better way than just casting

16:11 < rcurtin> so, I guess, I am not sure, if you use frexp() to obtain a number between [0, 1] and a power of two, how do we then map this to the hilbert curve?

16:11 < rcurtin> since we still don't have an integer representation

16:12 < rcurtin> oh, and I guess GSoC coding starts formally today... hopefully everyone is having a good time so far! :)

16:41 < marcosirc> Thanks! here I am working on neighbor search!

17:02 PcWcBj has joined #mlpack

17:02 PcWcBj has left #mlpack []

17:02 SDatzJLUb has joined #mlpack

17:02 SDatzJLUb has left #mlpack []

17:05 sumedhghaisas has joined #mlpack

17:14 < rcurtin> marcosirc: hang on, I'll look at your PR in a minute

17:14 < marcosirc> ok, thanks!

17:21 mentekid has quit [Ping timeout: 260 seconds]

17:33 mentekid has joined #mlpack

17:43 < marcosirc> rcurtin: I have to leave now, I come back in 2 hours, would this be ok?

17:43 tsathoggua has joined #mlpack

17:46 mentekid has quit [Ping timeout: 272 seconds]

17:47 tsathoggua has quit [Client Quit]

17:47 marcosirc has quit [Quit: WeeChat 1.4]

17:50 < rcurtin> marcosirc: sure, but I see you have already gone so maybe my response is not helpful :)

17:50 < rcurtin> I am finishing a paper to submit today so I am not 100% here, maybe only 50%

17:58 < nilay> can we use stl in mlpack?

18:02 < rcurtin> nilay: what do you mean? parts of the STL are used all over mlpack

18:02 < rcurtin> which component of the STL? sometimes Armadillo has better functionality

18:02 < nilay> rcurtin: say i want to use a map<string, int>

18:04 < nilay> or a simple pair<int, int>

18:06 < rcurtin> yeah... that is done all over the mlpack code. I would personally avoid pair<> because it can be quite slow

18:06 < rcurtin> but map<> there is not really any other good alternative

18:07 < rcurtin> for pair<int, int>, maybe an arma::uvec of length 2 is the better idea

18:07 < nilay> rcurtin: ok

18:11 < rcurtin> let me know if I can clarify anything

18:14 < nilay> rcurtin: sure, thanks :)

18:24 < zoq> keonkim: Hello, you should now be able to push to mlpack/blog. I'm excited to see some neat updates :)

18:25 < keonkim> zoq: I just checked, thanks :)

18:30 travis-ci has joined #mlpack

18:30 < travis-ci> mlpack/mlpack#817 (master - 9b42c22 : Ryan Curtin): The build passed.

18:30 < travis-ci> Change view : https://github.com/mlpack/mlpack/compare/39eefded8c6e...9b42c22105c2

18:30 < travis-ci> Build details : https://travis-ci.org/mlpack/mlpack/builds/132341549

18:30 travis-ci has left #mlpack []

18:40 mentekid has joined #mlpack

18:50 < lozhnikov> rcurtin: we can divide this number by DBL_EPSILON and then use floor().

18:52 < rcurtin> lozhnikov: okay, so this gives you an integer, and you combine this with the power from frexp() (by bitshifting) to get the hilbert index

18:53 < rcurtin> okay, I think I understand, I think this will work

18:53 < rcurtin> and it's not a nonlinear mapping so there is no weird stretching or anything

18:54 < rcurtin> I'm not sure if you would be able to represent the Hilbert index with just 64 bits, it seems like you would need an integer to store the base (the number you divided by DBL_EPSILON and floor()ed) and also an integer to store the power

18:55 < rcurtin> I haven't thought through that part either, so maybe some compression is possible, but even so, two integers (128 bits) is a lot better than 2048 :)

19:21 < lozhnikov> The base consists of DBL_MANT_DIG (=52) bits. And the power needs only log2(DBL_MAX_EXP - DBL_MIN_EXP + 1) = 11. Why do you think that i need two 64-bit integers?
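The bit count in this message can be checked against the IEEE-754 binary64 values of the `<cfloat>` macros: `DBL_MAX_EXP` = 1024 and `DBL_MIN_EXP` = -1021. Strictly, `DBL_MANT_DIG` is 53 for binary64 because it counts the implicit leading bit. That leaves 52 bits that vary once the always-set leading bit of the `frexp()` mantissa is dropped.

$$
\left\lceil \log_2(\mathrm{DBL\_MAX\_EXP} - \mathrm{DBL\_MIN\_EXP} + 1) \right\rceil
= \left\lceil \log_2(1024 + 1021 + 1) \right\rceil
= \left\lceil \log_2 2046 \right\rceil = 11,
\qquad 52 + 11 = 63 \le 64 .
$$

So mantissa and exponent fit in a single 64-bit integer, with one bit to spare for the sign.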

19:31 < rcurtin> lozhnikov: because I did not perform the calculation you just did :)

19:31 < rcurtin> I hadn't thought it through, you are right
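Here is a sketch of the per-coordinate mapping discussed above. `OrderedKey` is a hypothetical helper, not mlpack code. It turns a normal (or zero) double into a `uint64_t` that preserves ordering, using only `frexp()` and `ldexp()`, so no byte order is assumed. It deviates from the chat in one place. `frexp()` returns a mantissa in [0.5, 1), where the spacing between doubles is `DBL_EPSILON / 2`. As a result, `m / DBL_EPSILON` can be a half-integer, and `floor()` would drop the lowest bit. The sketch therefore multiplies by 2^`DBL_MANT_DIG` instead, which is exact. The packing assumes binary64 parameters. Only the per-coordinate integer is produced; the Hilbert curve transform that would consume these keys is not shown.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <cstdint>

// The 64-bit packing below assumes binary64 ranges (52 varying mantissa
// bits, 11 exponent bits); the conversion itself does not read raw bits.
static_assert(DBL_MANT_DIG == 53 && DBL_MIN_EXP == -1021 && DBL_MAX_EXP == 1024,
              "OrderedKey sketch assumes IEEE-754 binary64 parameters");

// Hypothetical helper: for normal (or zero) a and b,
// a < b implies OrderedKey(a) < OrderedKey(b).
// Subnormals, infinities and NaN are outside this sketch's precondition.
uint64_t OrderedKey(double x)
{
  assert(x == 0.0 || std::isnormal(x));

  const uint64_t middle = uint64_t(1) << 63;
  if (x == 0.0)
    return middle;  // +0.0 and -0.0 share one key.

  // |x| = m * 2^exponent with m in [0.5, 1).
  int exponent = 0;
  const double m = std::frexp(std::fabs(x), &exponent);

  // m * 2^53 is an exact integer in [2^52, 2^53); its top bit is always set,
  // so only the low 52 bits are kept.
  const uint64_t mant = static_cast<uint64_t>(std::ldexp(m, DBL_MANT_DIG))
      - (uint64_t(1) << (DBL_MANT_DIG - 1));

  // exponent lies in [DBL_MIN_EXP, DBL_MAX_EXP]; shift it to [1, 2046]
  // (11 bits) so that 0 is never produced by a nonzero value.
  const uint64_t biasedExp = static_cast<uint64_t>(exponent - DBL_MIN_EXP + 1);

  // Exponent above mantissa: magnitude order equals integer order, mag < 2^63.
  const uint64_t mag = (biasedExp << (DBL_MANT_DIG - 1)) | mant;

  // Positive values sit above the middle; negative values are mirrored below.
  return (x > 0.0) ? middle + mag : middle - mag;
}
```

Reserving biased exponent 0 for zero means the middle value separates negative and positive keys without overflow. Positive keys stay below 2^64 and negative keys stay above 0.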

20:26 marcosirc has joined #mlpack

20:51 < sumedhghaisas> @marcos Hey marcos

20:51 < sumedhghaisas> @marcosirc

20:54 < marcosirc> Hi sumedh, I read your comment on github. I think you were confused.

20:54 < marcosirc> about rho and lambda.

21:02 < rcurtin> I glanced at it, rho(N_i) <= lambda(N_i) always

21:03 < rcurtin> since rho represents the distance from the center of the node to the furthest point

21:03 < rcurtin> and lambda represents the distance from the center of the node to the furthest descendant

21:03 < rcurtin> and the set of points is a subset of the set of descendants
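Here is rcurtin's argument written out. The notation is assumed for illustration and is not quoted from the paper. Let $c_i$ be the center of node $N_i$, let $\mathcal{P}_i$ be its points, and let $\mathcal{D}^p_i \supseteq \mathcal{P}_i$ be its descendant points. A maximum taken over a superset can only be larger:

$$
\rho(N_i) = \max_{p \in \mathcal{P}_i} d(c_i, p) \;\le\; \max_{p \in \mathcal{D}^p_i} d(c_i, p) = \lambda(N_i),
\qquad \text{since } \mathcal{P}_i \subseteq \mathcal{D}^p_i .
$$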

21:03 < rcurtin> I'll have more time to respond later, I am trying to get my last numbers for this paper so I can submit it :)

21:03 < sumedhghaisas> @marcosirc ohh ... it's the exact opposite... pardon me, there are a lot of symbols in that paper...

21:04 < marcosirc> Thanks. No problem!

21:05 < marcosirc> @sumedhghaisas yes! many symbols!

21:05 < rcurtin> haha, my comment is redundant, marcos already said the exact same thing in the github comment :)

21:05 < rcurtin> sorry for the huge number of symbols :(

21:06 < sumedhghaisas> yeah.. I mean I get confused every time I read that paper...

21:06 < sumedhghaisas> and somewhere down the line you forget the definitions...

21:06 < sumedhghaisas> :P

21:07 < rcurtin> yeah, it's difficult, because to get all the concepts necessary for thinking about trees, there are tons of them

21:07 < rcurtin> and if you look in the neighbor search code there is even one more, I call it "minimum bound distance", which allows yet another prune

21:09 < marcosirc> Haha :) No problem. I definitely prefer more symbols if this means a more exact definition.

21:19 < marcosirc> Yes, I have been reading that code this morning, when using the "adjustedScore".

21:20 < marcosirc> That part of the code is a bit confusing.

21:21 < marcosirc> Is it explained in any paper?

21:22 mentekid has quit [Ping timeout: 276 seconds]

21:22 < rcurtin> hm, let me lkook

21:22 < rcurtin> look*

21:23 < rcurtin> the basic idea was, can we use the scores that the parent combination produced in order to prune before calculating the base case?

21:23 < rcurtin> this is done in John Langford's cover tree code but don't look at that because it's impossible to understand

21:24 < rcurtin> I dunno, I don't think it is in any paper I have written :(

21:24 < rcurtin> I thought it was documented okay in the code, but maybe if there is anything I can explain I can update the comments

21:24 < marcosirc> Ok, yes, that was my intuition about that code. Avoid calculating the base case if possible.

21:25 < rcurtin> I think this can often give you a speedup of maybe 10%, and I think it works best for the cover tree

21:25 < rcurtin> this is because the cover tree nodes hold only one point, which is the center of the node

21:25 < rcurtin> so adjusting the score is really easy and fast

21:26 < rcurtin> whereas with the kd-tree, we might need to calculate the base case between the centroids of two nodes...
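Here is a rough sketch of the kind of check rcurtin describes, for the cover-tree case where a child node's center is a single point. `CanPruneBeforeBaseCase` is a hypothetical helper and is not mlpack's `adjustedScore` logic. It assumes two distances are already available: the query-to-parent-center distance, computed when the parent was visited, and the child-to-parent center distance, stored in the tree. The triangle inequality then bounds the query-to-child distance from below without computing it. If that bound already exceeds the current k-th nearest neighbor distance, the base case can be skipped.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not mlpack's implementation).
//   queryToParent:   d(q, p), computed earlier for the parent's center p.
//   childToParent:   d(c, p), stored in the tree for the child's center c.
//   kthBestDistance: current distance to the query's k-th nearest candidate.
// Returns true if d(q, c) cannot beat kthBestDistance, so the base case
// d(q, c) need not be computed.
bool CanPruneBeforeBaseCase(double queryToParent,
                            double childToParent,
                            double kthBestDistance)
{
  assert(queryToParent >= 0.0 && childToParent >= 0.0);

  // Triangle inequality: d(q, c) >= |d(q, p) - d(c, p)|.
  const double lowerBound = std::fabs(queryToParent - childToParent);
  return lowerBound > kthBestDistance;
}
```

This only decides whether the base case for the child's center can be skipped. Pruning the child's whole subtree would also need a furthest-descendant bound such as $\lambda$. For a kd-tree the node center is not a data point, which is why the same trick is less convenient there.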

21:28 < rcurtin> to be perfectly honest, I wonder if some of the calculations (or some of the logic that tries to avoid calculations) simply have too much overhead, and I wonder if the code would be empirically faster without some of the rules

21:28 < rcurtin> but I haven't done any rigorous benchmarking

21:28 < rcurtin> too little time :(

21:29 < marcosirc> Ok.. Yes, I understand the general idea, but I find it difficult to be 100% sure that code is correct.

21:30 < marcosirc> I will revise it in depth, and let you know if I can contribute something!

21:31 < rcurtin> yeah, I am glad to look over it

21:31 < rcurtin> after I published the Tree-Independent Dual-Tree Algorithms paper there was not much time to revisit it

21:50 sumedhghaisas has quit [Ping timeout: 260 seconds]

21:56 marcosirc has quit [Quit: WeeChat 1.4]

22:09 vedantrathore has joined #mlpack

22:11 < vedantrathore> Hey I'm newbie here, can someone guide me how to contribute to mlpack??

22:13 < zoq> vedantratho: Hello, I think http://www.mlpack.org/involved.html might be helpful.

22:13 < vedantrathore> Thanks!

22:15 < zoq> vedantratho: Let us know if you have any questions or need further information.

22:19 < vedantrathore> zoq : Sure..Just one thing..actually I'm starting for next year gsoc...so I should

22:20 < vedantrathore> solve the bugs from issues tracker right??

22:24 < zoq> vedantratho: That is one way to get involved, you can also contribute an interesting algorithm. I'm not sure there are any "entrance" level issues left.

22:28 < vedantrathore> Interesting Algorithms about machine learning right??

22:28 < zoq> vedantratho: right

22:32 < vedantrathore> Ok I'll keep you posted..can I have your email address @zoq ?

22:34 < rcurtin> vedantrathore: you should use the mlpack mailing list, which is linked to on the page zoq linked to, to be in touch

22:34 < rcurtin> that way everyone can answer your question, instead of just one person

22:36 < zoq> Yeah, I agree

22:36 < vedantrathore> Ok..I just joined the mailing list, so I should just send email to mlpack@cc.gatech.edu right?

22:37 < zoq> vedantratho: That's right.

22:38 < vedantrathore> Okay Thanks for the help, I guess I'll be in touch..

22:41 < zoq> vedantratho: Sounds good, see you around.

22:42 vedantrathore has quit [Quit: Page closed]

23:08 nilay has quit [Ping timeout: 250 seconds]

23:25 zoq has quit [Ping timeout: 252 seconds]

23:29 zoq has joined #mlpack