#mlpack on 2016-05-20 — irc logs at libera.irclog.whitequark.org

2015-01-15 23:05 verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/

00:26 kwikadi has quit [Ping timeout: 260 seconds]

01:27 tham has joined #mlpack

01:27 < tham> keonkim : I posted the API designs on GitHub (https://github.com/stereomatchingkiss/experiment_apps/tree/master/mlpack_preprocess_exp)

01:27 < tham> Please take it as a reference and tell me what you think about it.

01:28 < tham> Do you have any other functions you want implemented? Any opinions on the API?

01:28 < tham> About the messages of the statistics tool, I will tell you next week

01:30 < tham> This week I am quite busy on a site project, maybe I will tell you my idea next Wednesday

01:30 kwikadi has joined #mlpack

02:39 govg has joined #mlpack

02:57 tham has quit [Quit: Page closed]

03:21 govg has quit [Quit: leaving]

07:33 nilay has quit [Ping timeout: 250 seconds]

08:37 nilay has joined #mlpack

08:48 nilay has quit [Quit: Page closed]

10:03 mentekid has joined #mlpack

10:34 nickabc has joined #mlpack

10:35 nickabc has quit [Client Quit]

14:11 nilay has joined #mlpack

14:15 nilay has quit [Ping timeout: 250 seconds]

14:30 wasiq has joined #mlpack

14:51 nilay has joined #mlpack

14:52 < nilay> zoq: Hi, basically what I am doing is porting the Python code to C++, is that fine?

14:58 < zoq> nilay: I think you can't exactly replicate the Python code, but it's fine to use the code as a reference. I think when it comes to the feature extraction, there is no other way than to do it like the Python code (a replication of the Matlab code). Since the Python code uses numpy, which comes with a bigger function set than Armadillo, we have to write a bunch of code ourselves.

14:58 < zoq> E.g. we have to write the gradient function, which could be easily written as a combination of the convolution operation and the Sobel filter.
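
A minimal sketch of the idea zoq mentions, a gradient built from a 2-D filtering pass with the Sobel kernels. It is written in plain Python for readability rather than in C++/Armadillo; the function names and the choice to replicate edge pixels at the border are assumptions made for illustration, not part of the code under discussion.

```python
# Sobel kernels: each is a smoothed central difference along one axis.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def correlate_same(image, kernel):
    """Apply a 3x3 kernel to every pixel, replicating edge pixels at the border.

    This is cross-correlation (the kernel is not flipped); a true convolution
    with the same kernel would only change the sign of the result here.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for kr in range(3):
                for kc in range(3):
                    rr = min(max(r + kr - 1, 0), rows - 1)
                    cc = min(max(c + kc - 1, 0), cols - 1)
                    acc += kernel[kr][kc] * image[rr][cc]
            out[r][c] = acc
    return out


def sobel_gradient(image):
    """Return (gy, gx): vertical and horizontal Sobel responses."""
    return correlate_same(image, SOBEL_Y), correlate_same(image, SOBEL_X)


# A horizontal ramp: intensity grows by 1 per column, rows are identical.
ramp = [[float(c) for c in range(5)] for _ in range(4)]
gy, gx = sobel_gradient(ramp)
```

On the ramp the interior horizontal response is 8, that is, the slope of 1 scaled by the Sobel kernel's weight of 8, so dividing by 8 recovers a per-pixel derivative estimate. The vertical response is zero everywhere.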

14:59 < zoq> And of course we should follow the mlpack API guidelines :)

15:02 < nilay> zoq: i am not sure what you mean.. he has written the gradient function in the same way as you are saying.

15:03 < zoq> nilay: I thought the python code uses 'N.gradient(src)'.

15:04 < zoq> nilay: 'dy[:, :, i], dx[:, :, i] = N.gradient(src[:, :, i])'
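
For reference, with default arguments numpy's `gradient` uses central differences in the interior and one-sided differences at the two ends of each axis, and for a 2-D input it returns the axis-0 result first (hence `dy, dx`). The following is a minimal hand-written sketch of that behaviour in plain Python, assuming unit spacing; `gradient_1d` and `gradient_2d` are hypothetical names, not functions from numpy, Armadillo, or mlpack.

```python
def gradient_1d(values):
    """Central differences inside, one-sided differences at both ends."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    out = [0.0] * n
    out[0] = float(values[1] - values[0])
    out[-1] = float(values[-1] - values[-2])
    for i in range(1, n - 1):
        out[i] = (values[i + 1] - values[i - 1]) / 2.0
    return out


def gradient_2d(image):
    """Return (dy, dx) for a 2-D list of lists, matching the dy, dx order above."""
    dx = [gradient_1d(row) for row in image]
    # Differentiate each column, then transpose back to row-major layout.
    dy_columns = [gradient_1d(col) for col in zip(*image)]
    dy = [list(row) for row in zip(*dy_columns)]
    return dy, dx


image = [[1, 2, 4],
         [2, 4, 8],
         [3, 6, 12]]
dy, dx = gradient_2d(image)
```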

15:06 < nilay> zoq: ok so things which cannot be used, must be done manually. otherwise it is ok?

15:08 < zoq> nilay: Sure, as I said, I'm not sure there is another way, at least for the feature extraction part. However, if we see something we could do better, we should certainly do that.

15:09 < nilay> zoq: that is what I think too. I get your point.

15:13 < zoq> nilay: okay, great :)

16:33 < lozhnikov> rcurtin: Hi, I looked through the Zoltan library. If I am not mistaken the library maps [-DBL_MAX, DBL_MAX]^n to [0,1]^n using a non-linear mapping. And then the library uses the discrete approach.

16:43 < lozhnikov> rcurtin: I think the discrete approach will work better with uniformly distributed data. If I am not mistaken, we usually do not add points after the tree is constructed. So we need not cover the whole space. In that case we can suppose that all points belong to a limited area. And we do not need 2048 bits for each axis.

16:46 < lozhnikov> For example we can calculate the maximum value and the minimum value among all points in the database for each axis and use the discrete approach for [min_i, max_i]^n.
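
A minimal sketch of the discrete approach described here, restricted to two dimensions for brevity: take the per-axis minimum and maximum, quantize each coordinate to a fixed number of bits, and order the points by Hilbert index. The function names, the 2-D restriction, and the default of 16 bits per axis are assumptions for illustration; this is not the Zoltan or mlpack implementation.

```python
def hilbert_index(order, x, y):
    """Map cell (x, y) on a 2^order x 2^order grid to its Hilbert curve index."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def quantize(value, lo, hi, bits):
    """Linearly map value in [lo, hi] to an integer cell in [0, 2^bits - 1]."""
    cells = 1 << bits
    if hi == lo:
        return 0
    t = (value - lo) / (hi - lo)
    return min(int(t * cells), cells - 1)


def hilbert_order(points, bits=16):
    """Return point indices sorted by Hilbert index over the bounding box."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)

    def key(i):
        x, y = points[i]
        return hilbert_index(bits,
                             quantize(x, x_lo, x_hi, bits),
                             quantize(y, y_lo, y_hi, bits))

    return sorted(range(len(points)), key=key)


order = hilbert_order([(0.0, 0.0), (5.0, 1.0), (2.5, 9.0), (0.1, 0.2)])
```

Because the bounding box comes from the data itself, points inserted later that fall outside `[min_i, max_i]` would need the box (and every index) recomputed.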

16:48 < lozhnikov> rcurtin: I think each of these approaches will be useful (the discrete algorithm and the recursive algorithm).

16:52 < lozhnikov> rcurtin: I think that the discrete approach works better in the finding algorithm. And maybe the recursive algorithm needs less time for the tree construction algorithm.

17:06 < rcurtin> lozhnikov: I see what you mean, let me think about it a little bit

17:06 < rcurtin> you are right that in most cases in mlpack we are not adding any points to trees

17:06 < rcurtin> but I would like that to be possible eventually (at least for the RectangleTree)

17:09 < rcurtin> but I guess, I am not sure how you avoid having huge numbers of bits even if we limit the range to, e.g., [0, 1] in each dimension

17:09 < rcurtin> because I can still have points [1e-308] and [1e-307] which are very close together

17:09 < rcurtin> so I guess, I don't know how those would map to different Hilbert values if we truncate the precision

17:09 < rcurtin> maybe I am misunderstanding?
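
The concern can be checked with a small sketch, assuming a plain linear quantization of [0, 1] into 2^bits cells (the `quantize` helper is hypothetical, not taken from any library). The two values 1e-308 and 1e-307 land in the same cell at 16, 64, and even 1000 bits per axis, so a truncated discretization cannot give them different Hilbert values.

```python
import math


def quantize(value, lo, hi, bits):
    """Linearly map value in [lo, hi] to an integer cell in [0, 2^bits - 1]."""
    cells = 1 << bits
    t = (value - lo) / (hi - lo)
    return min(int(t * cells), cells - 1)


def same_cell(a, b, bits):
    """True if a and b are indistinguishable after quantizing [0, 1]."""
    return quantize(a, 0.0, 1.0, bits) == quantize(b, 0.0, 1.0, bits)


# bits is kept at or below 1000 so that 2^bits still fits in a double.
collisions = {bits: same_cell(1e-308, 1e-307, bits) for bits in (16, 64, 1000)}

# Smallest bit count whose cell width 2^-bits is no larger than 1e-307.
bits_needed = math.ceil(-math.log2(1e-307))
```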

17:10 < lozhnikov> Of course this approach should not work very well in this case

17:11 < lozhnikov> But it should work with uniformly distributed data

17:14 < lozhnikov> And we can use a non-linear mapping

17:14 < rcurtin> I see what you mean, but do you think that we would have the same problem as the sample set increased in size?

17:14 < rcurtin> like if I have 10 points uniformly distributed, no problem, they are not likely at all to be close

17:15 < rcurtin> but if I have 10 million points, the situation will be different
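
A rough way to quantify this is the birthday approximation: for N points placed uniformly at random into M equally likely grid cells, the expected number of pairs sharing a cell is about N(N-1)/(2M). The sketch below assumes a 2-D grid with 16 bits per axis (M = 2^32), a figure chosen only for illustration.

```python
import math


def expected_colliding_pairs(num_points, num_cells):
    """Expected number of point pairs that fall in the same cell."""
    return num_points * (num_points - 1) / (2.0 * num_cells)


def prob_any_collision(num_points, num_cells):
    """Birthday approximation: 1 - exp(-expected colliding pairs)."""
    return -math.expm1(-expected_colliding_pairs(num_points, num_cells))


CELLS = 2 ** 32  # assumed: 2 axes x 16 bits each

p_small = prob_any_collision(10, CELLS)
p_large = prob_any_collision(10_000_000, CELLS)
pairs_large = expected_colliding_pairs(10_000_000, CELLS)
```

Under these assumptions, 10 points almost never collide, while 10 million points are expected to produce thousands of colliding pairs.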

17:40 Rodya has quit [Quit: Adieu, dear Werther!]

18:03 Rodya has joined #mlpack

18:05 tsathoggua has joined #mlpack

18:05 < lozhnikov> rcurtin: I understand the problem, I should think about it

18:07 tsathoggua has quit [Client Quit]

18:21 sumedhghaisas has joined #mlpack

18:22 mentekid has quit [Ping timeout: 252 seconds]

18:33 Rodya has quit [Ping timeout: 240 seconds]

18:34 < rcurtin> lozhnikov: yeah, I think it is not a great assumption that the user will be inputting uniformly distributed data either... lots of datasets are weird in very unexpected ways

18:39 Rodya has joined #mlpack

19:12 mentekid has joined #mlpack

20:29 < zoq> oops, it wasn't my intention to push everything ...

20:54 nilay has quit [Ping timeout: 250 seconds]

20:57 travis-ci has joined #mlpack

20:57 < travis-ci> mlpack/mlpack#814 (master - 989dd35 : Marcus Edel): The build was fixed.

20:57 < travis-ci> Change view : https://github.com/mlpack/mlpack/compare/986620375ce8...989dd35359ee

20:57 < travis-ci> Build details : https://travis-ci.org/mlpack/mlpack/builds/131799729

20:57 travis-ci has left #mlpack []

21:13 sumedhghaisas has quit [Ping timeout: 260 seconds]

21:21 < mentekid> rcurtin: I made a quick model about the idea of using hash tables/dictionaries on LSH

21:21 < mentekid> the one I was struggling to explain yesterday

21:21 < mentekid> it's in python, I just wrote it so I could see the concept working (not for accuracy or speed)

21:22 < mentekid> would you like to see it?
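
For readers unfamiliar with the concept, here is a minimal sketch of locality-sensitive hashing backed by a dictionary. It is not mentekid's model and not mlpack's `LSHSearch`; the class name, the single hash table, the Gaussian projections, and all parameter defaults are assumptions for illustration. The point is only that a dictionary stores just the buckets that are actually occupied.

```python
import math
import random
from collections import defaultdict


class DictLSH:
    """One LSH table keyed by a tuple of quantized random projections."""

    def __init__(self, dim, num_projections=4, bucket_width=1.0, seed=0):
        rng = random.Random(seed)
        self.width = bucket_width
        self.projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                            for _ in range(num_projections)]
        self.offsets = [rng.uniform(0.0, bucket_width)
                        for _ in range(num_projections)]
        self.buckets = defaultdict(list)  # key tuple -> list of point indices

    def key(self, point):
        """Hash a point: floor((a . x + b) / w) for each projection."""
        return tuple(
            math.floor((sum(a * x for a, x in zip(proj, point)) + b) / self.width)
            for proj, b in zip(self.projections, self.offsets))

    def insert(self, index, point):
        self.buckets[self.key(point)].append(index)

    def candidates(self, query):
        """Indices of stored points that share the query's bucket."""
        return self.buckets.get(self.key(query), [])


points = [(0.0, 0.0), (0.01, 0.0), (10.0, 10.0), (-7.0, 3.0)]
table = DictLSH(dim=2)
for i, p in enumerate(points):
    table.insert(i, p)
```

A practical setup would use several such tables with independent projections and take the union of their candidate lists before computing exact distances.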

21:24 < rcurtin> sure, though I may not be able to really take a look for some hours

21:24 < rcurtin> I have some other things I need to wrap up first today

21:25 < mentekid> sure I can put it on a pastebin, or should I commit it through github, which one is easier for you?

21:35 < mentekid> actually I think it's too small for github. Here's the code: http://pastebin.com/LvvENJ1E and a script to run it: http://pastebin.com/4iqwZhsK

21:37 < rcurtin> ok, I will take a look when I have a chance

22:58 mentekid has quit [Ping timeout: 246 seconds]

23:02 travis-ci has joined #mlpack

23:02 < travis-ci> mlpack/mlpack#815 (master - 39eefde : Marcus Edel): The build passed.

23:02 < travis-ci> Change view : https://github.com/mlpack/mlpack/compare/989dd35359ee...39eefded8c6e

23:02 < travis-ci> Build details : https://travis-ci.org/mlpack/mlpack/builds/131825488

23:02 travis-ci has left #mlpack []

23:38 Rodya has quit [Ping timeout: 250 seconds]

23:45 Rodya has joined #mlpack

23:57 Rodya has quit [Ping timeout: 260 seconds]

23:59 Rodya has joined #mlpack