verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
< tham>
Please take it as a reference and tell me what you think about it?
< tham>
Do you have any other functions you want to implement? Any opinions on the API?
< tham>
About the messages of the statistics tool, I will tell you next week
< tham>
This week I am quite busy with a site project; maybe I will tell you my idea next Wednesday
kwikadi has joined #mlpack
govg has joined #mlpack
tham has quit [Quit: Page closed]
govg has quit [Quit: leaving]
nilay has quit [Ping timeout: 250 seconds]
nilay has joined #mlpack
nilay has quit [Quit: Page closed]
mentekid has joined #mlpack
nickabc has joined #mlpack
nickabc has quit [Client Quit]
nilay has joined #mlpack
nilay has quit [Ping timeout: 250 seconds]
wasiq has joined #mlpack
nilay has joined #mlpack
< nilay>
zoq: Hi, basically what I am doing is porting the Python code to C++; is that fine?
< zoq>
nilay: I think you can't exactly replicate the Python code, but it's fine to use it as a reference. When it comes to the feature extraction, I think there is no other way than to do it like the Python code (which is itself a replication of the Matlab code). Since the Python code uses numpy, which comes with a bigger function set than Armadillo, we have to write a bunch of code ourselves.
< zoq>
E.g. we have to write the gradient function, which could be easily written as a combination of the convolution operation and the Sobel filter.
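A minimal sketch of the gradient idea zoq describes, convolving the image with Sobel kernels via Armadillo's conv2(); this is only an illustration under that assumption, not mlpack's actual implementation:

#include <armadillo>

// Compute horizontal and vertical image gradients by convolving with the
// 3x3 Sobel kernels; "same" keeps the output the same size as the input.
void Gradient(const arma::mat& image, arma::mat& gradX, arma::mat& gradY)
{
  arma::mat sobelX = { { -1.0, 0.0, 1.0 },
                       { -2.0, 0.0, 2.0 },
                       { -1.0, 0.0, 1.0 } };
  arma::mat sobelY = sobelX.t();

  gradX = arma::conv2(image, sobelX, "same");
  gradY = arma::conv2(image, sobelY, "same");
}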
< zoq>
And of course we should follow the mlpack API guidelines :)
< nilay>
zoq: I am not sure what you mean... he has written the gradient function in the same way as you are saying.
< zoq>
nilay: I thought the Python code uses 'N.gradient(src)'.
< nilay>
zoq: ok, so things which cannot be used must be done manually; otherwise it is ok?
< zoq>
nilay: Sure, as I said, I'm not sure there is another way, at least for the feature extraction part. However, if we see something we could do better, we should certainly do that.
< nilay>
zoq: that is what I think too. I get your point.
< zoq>
nilay: okay, great :)
< lozhnikov>
rcurtin: Hi, I looked through the Zoltan library. If I am not mistaken, the library maps [-DBL_MAX, DBL_MAX]^n to [0,1]^n using a non-linear mapping. And then the library uses the discrete approach.
< lozhnikov>
rcurtin: I think the discrete approach will work better with uniformly distributed data. If I am not mistaken, we usually do not add points after the tree is constructed, so we do not need to cover the whole space. In that case we can assume that all points belong to a limited area, and we do not need 2048 bits for each axis.
< lozhnikov>
For example we can calculate the maximum value and the minimum value among all points in the database for each axis and use the discrete approach for [min_i, max_i]^n.
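A rough sketch of that discretization step (my own illustration, assuming Armadillo; the function name Discretize and the 'bits' parameter are made up): compute per-axis bounds from the data and truncate each coordinate to a fixed number of bits before computing Hilbert values.

#include <armadillo>

// Map a dataset (one point per column) to discrete coordinates with 'bits'
// bits per axis, using the observed per-axis minimum and maximum as bounds.
// 'bits' is assumed small enough (<= 52) that the scaling stays exact in
// double precision.
arma::Mat<arma::u64> Discretize(const arma::mat& data, const size_t bits)
{
  const arma::vec mins = arma::min(data, 1);
  const arma::vec maxs = arma::max(data, 1);
  const arma::u64 levels = (arma::u64(1) << bits) - 1;

  arma::Mat<arma::u64> discrete(data.n_rows, data.n_cols);
  for (size_t i = 0; i < data.n_cols; ++i)
  {
    for (size_t d = 0; d < data.n_rows; ++d)
    {
      const double range = maxs[d] - mins[d];
      // Scale into [0, 1] within the observed bounds, then truncate.
      const double scaled = (range > 0) ? (data(d, i) - mins[d]) / range : 0.0;
      discrete(d, i) = (arma::u64) (scaled * levels);
    }
  }

  return discrete;
}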
< lozhnikov>
rcurtin: I think each of these approaches will be useful (the discrete algorithm and the recursive algorithm).
< lozhnikov>
rcurtin: I think that the discrete approach works better for the search algorithm, and maybe the recursive algorithm needs less time for tree construction.
< rcurtin>
lozhnikov: I see what you mean, let me think about it a little bit
< rcurtin>
you are right that in most cases in mlpack we are not adding any points to trees
< rcurtin>
but I would like that to be possible eventually (at least for the RectangleTree)
< rcurtin>
but I guess, I am not sure how you avoid having huge numbers of bits even if we limit the range to, e.g., [0, 1] in each dimension
< rcurtin>
because I can still have points [1e-308] and [1e-307] which are very close together
< rcurtin>
so I guess, I don't know how those would map to different Hilbert values if we truncate the precision
< rcurtin>
maybe I am misunderstanding?
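For concreteness, a tiny illustration of that concern (my own example, not anyone's code): with the range fixed to [0, 1] and, say, 32 bits kept per axis, both of those coordinates truncate to the same discrete cell.

#include <cstdint>
#include <cstdio>

int main()
{
  const double a = 1e-308, b = 1e-307;              // distinct, but both tiny
  const uint64_t levels = (uint64_t(1) << 32) - 1;  // 32 bits kept per axis

  const uint64_t cellA = (uint64_t) (a * levels);
  const uint64_t cellB = (uint64_t) (b * levels);

  // Both print 0: after truncation the two points fall into the same cell,
  // so they would receive the same Hilbert value.
  std::printf("%llu %llu\n", (unsigned long long) cellA,
                             (unsigned long long) cellB);
  return 0;
}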
< lozhnikov>
Of course this approach would not work very well in this case
< lozhnikov>
But it should work with uniformly distributed data
< lozhnikov>
And we can use a non-linear mapping
< rcurtin>
I see what you mean, but do you think that we would have the same problem as the sample set increases in size?
< rcurtin>
like if I have 10 points uniformly distributed, no problem, they are not likely at all to be close
< rcurtin>
but if I have 10 million points, the situation will be different
Rodya has quit [Quit: Adieu, dear Werther!]
Rodya has joined #mlpack
tsathoggua has joined #mlpack
< lozhnikov>
rcurtin: I understand the problem, I should think about it
tsathoggua has quit [Client Quit]
sumedhghaisas has joined #mlpack
mentekid has quit [Ping timeout: 252 seconds]
Rodya has quit [Ping timeout: 240 seconds]
< rcurtin>
lozhnikov: yeah, I think it is not a great assumption that the user will be inputting uniformly distributed data either... lots of datasets are weird in very unexpected ways
Rodya has joined #mlpack
mentekid has joined #mlpack
< zoq>
oops, it wasn't my intention to push everything ...
nilay has quit [Ping timeout: 250 seconds]
travis-ci has joined #mlpack
< travis-ci>
mlpack/mlpack#814 (master - 989dd35 : Marcus Edel): The build was fixed.
sumedhghaisas has quit [Ping timeout: 260 seconds]
< mentekid>
rcurtin: I made a quick model of the idea of using hash tables/dictionaries in LSH
< mentekid>
the one I was struggling to explain yesterday
< mentekid>
it's in Python; I just wrote it so I could see the concept working (not for accuracy or speed)
< mentekid>
would you like to see it?
< rcurtin>
sure, though I may not be able to really take a look for some hours
< rcurtin>
I have some other things I need to wrap up first today
< mentekid>
sure, I can put it on a pastebin, or should I commit it through GitHub? Which one is easier for you?
< mentekid>
actually I think it's too small for GitHub. Here's the code: http://pastebin.com/LvvENJ1E and a script to run it: http://pastebin.com/4iqwZhsK
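The pastebin links above hold the actual model; purely as a generic sketch of the hash-table idea (my own illustration in C++, with a made-up class name and a simple random-projection hash, not mentekid's code), the buckets can live in a dictionary keyed by the hash code so that only non-empty buckets take memory:

#include <armadillo>
#include <cmath>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Buckets are stored in a dictionary keyed by the hash code, instead of a
// preallocated second-level bucket table.
class HashTableLSH
{
 public:
  // 'projections' holds one random projection per hash function; 'binWidth'
  // is the quantization width (both are illustrative parameters).
  HashTableLSH(const arma::mat& projections, const double binWidth) :
      projections(projections), binWidth(binWidth) { }

  // Build the key by concatenating the quantized projections of the point.
  std::string Hash(const arma::vec& point) const
  {
    const arma::vec proj = projections * point;
    std::string key;
    for (size_t i = 0; i < proj.n_elem; ++i)
      key += std::to_string((long long) std::floor(proj[i] / binWidth)) + ",";
    return key;
  }

  // Insert a reference point's index into the bucket for its hash code.
  void Insert(const arma::vec& point, const size_t index)
  {
    buckets[Hash(point)].push_back(index);
  }

  // Candidate neighbors of a query: the indices sharing its hash code.
  std::vector<size_t> Candidates(const arma::vec& query) const
  {
    const auto it = buckets.find(Hash(query));
    return (it == buckets.end()) ? std::vector<size_t>() : it->second;
  }

 private:
  arma::mat projections;
  double binWidth;
  std::unordered_map<std::string, std::vector<size_t>> buckets;
};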
< rcurtin>
ok, I will take a look when I have a chance
mentekid has quit [Ping timeout: 246 seconds]
travis-ci has joined #mlpack
< travis-ci>
mlpack/mlpack#815 (master - 39eefde : Marcus Edel): The build passed.