verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
wasiq has joined #mlpack
< rcurtin>
lozhnikov: thanks for your writeup... I think that approach is basically the same as casting the double to a uint64_t? let me know if I have misunderstood that
< rcurtin>
that can definitely work, but that's a pretty nonlinear mapping so I think this will give you a weirdly stretched Hilbert curve
< rcurtin>
I am not sure whether the way the Hilbert curve gets stretched will make much of a difference... I have not thought through that part very far
< rcurtin>
keonkim: my main comment is that I don't think there's any need to provide support for arma::Cube objects... data in mlpack is represented by arma::Mat objects instead
< rcurtin>
if a user has an arma::Cube and wants it to be a Mat, it is pretty easy to reshape it
< rcurtin>
my other thought is, we should make the API more like the mlpack API, not like the scikit API
< rcurtin>
so I would suggest Train() and Apply() not Fit() and Transform()
< rcurtin>
I would also suggest using a void return value for the method that does the transformation, and having the user pass in a reference for the output
< rcurtin>
if you do that, it should actually make the in-place version unnecessary (depending on how you write it)
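A minimal sketch of the API shape rcurtin describes, assuming a hypothetical `MeanCenterer` class and a plain `std::vector`-based `Matrix` alias standing in for `arma::mat` so the example needs only the standard library. Neither name is part of mlpack; only the `Train()`/`Apply()` naming and the void return with an output reference come from the discussion.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for arma::mat (assumption): data[d][i] is dimension d of point i.
using Matrix = std::vector<std::vector<double>>;

// Hypothetical transformer illustrating the Train()/Apply() convention.
class MeanCenterer
{
 public:
  // Learn the mean of each dimension from the given data.
  void Train(const Matrix& data)
  {
    means.assign(data.size(), 0.0);
    for (std::size_t d = 0; d < data.size(); ++d)
    {
      for (const double value : data[d])
        means[d] += value;
      if (!data[d].empty())
        means[d] /= static_cast<double>(data[d].size());
    }
  }

  // Write the centered data into `output`. The return type is void; the
  // caller owns the output object. Passing the same matrix as both
  // arguments transforms it in place, so no separate in-place overload
  // is needed.
  void Apply(const Matrix& input, Matrix& output) const
  {
    if (&output != &input)
      output = input;
    for (std::size_t d = 0; d < output.size() && d < means.size(); ++d)
      for (double& value : output[d])
        value -= means[d];
  }

 private:
  std::vector<double> means;
};
```

With this shape, `centerer.Apply(data, result)` fills a separate matrix and `centerer.Apply(data, data)` overwrites the input, which is the "in-place version unnecessary" point above.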
wasiq has quit [Ping timeout: 260 seconds]
Mathnerd314 has joined #mlpack
< lozhnikov>
rcurtin: I think casting the double to a uint64_t may break the ordering of numbers. So we should fix some bits in case of the negative exponent or the negative sign. And I'm not sure that we can implement this in a platform-independent manner. I think that the frexp function is safer because it does not depend on byte ordering and the hardware representation of the double datatype at all.
< keonkim>
rcurtin: Thanks, I will apply your comments
< nilay>
zoq: can you tell me what happens when we do cmake ../ and what happens when we do make?
< nilay>
zoq: i was unwell last 2 days, so i didn't come much here..
< zoq>
nilay: cmake generates the GNU Makefiles and make builds everything. You basically run cmake once and then run make every time you want to build your changes.
< zoq>
nilay: Hope you're feeling better now.
< nilay>
zoq: so if i write a new file do i have to cmake again.. (adding that filename to cmake lists) or can i get around that..
< nilay>
zoq: yes i am, thanks.
< nilay>
zoq: because making everything takes a lot of time.
< nilay>
and once i run cmake ../ it builds everything from scratch
< zoq>
You can just use make; it should regenerate the build files if you change a CMakeLists.txt file, and if you just change e.g. a header file it should only rebuild the changed files and the files that include the changed file.
< zoq>
You can also use make -j2 to use 2 cores.
< zoq>
But, like I said it should only build changes if you run make again.
< nilay>
zoq: but it doesn't show the error concretely if i use more than 1 core.
< nilay>
zoq: but it is still faster, as i type make again and get the errors.
< rcurtin>
lozhnikov: ah, right, frexp() is a much better way than just casting
< rcurtin>
so, I guess, I am not sure, if you use frexp() to obtain a number in [0.5, 1) and a power of two, how do we then map this to the hilbert curve?
< rcurtin>
since we still don't have an integer representation
< rcurtin>
oh, and I guess GSoC coding starts formally today... hopefully everyone is having a good time so far! :)
< marcosirc>
Thanks! here I am working on neighbor search!
PcWcBj has joined #mlpack
PcWcBj has left #mlpack []
SDatzJLUb has joined #mlpack
SDatzJLUb has left #mlpack []
sumedhghaisas has joined #mlpack
< rcurtin>
marcosirc: hang on, I'll look at your PR in a minute
< marcosirc>
ok, thanks!
mentekid has quit [Ping timeout: 260 seconds]
mentekid has joined #mlpack
< marcosirc>
rcurtin: I have to leave now, I'll be back in 2 hours, would this be ok?
tsathoggua has joined #mlpack
mentekid has quit [Ping timeout: 272 seconds]
tsathoggua has quit [Client Quit]
marcosirc has quit [Quit: WeeChat 1.4]
< rcurtin>
marcosirc: sure, but I see you have already gone so maybe my response is not helpful :)
< rcurtin>
I am finishing a paper to submit today so I am not 100% here, maybe only 50%
< nilay>
can we use stl in mlpack?
< rcurtin>
nilay: what do you mean? parts of the STL are used all over mlpack
< rcurtin>
which component of the STL? sometimes Armadillo has better functionality
< nilay>
rcurtin: say i want to use a map<string, int>
< nilay>
or a simple pair<int, int>
< rcurtin>
yeah... that is done all over the mlpack code. I would personally avoid pair<> because it can be quite slow
< rcurtin>
but for map<> there is not really any good alternative
< rcurtin>
for pair<int, int>, maybe an arma::uvec of length 2 is the better idea
< nilay>
rcurtin: ok
< rcurtin>
let me know if I can clarify anything
< nilay>
rcurtin: sure, thanks :)
< zoq>
keonkim: Hello, you should now be able to push to mlpack/blog. I'm excited to see some neat updates :)
< keonkim>
zoq: I just checked, thanks :)
travis-ci has joined #mlpack
< travis-ci>
mlpack/mlpack#817 (master - 9b42c22 : Ryan Curtin): The build passed.
< lozhnikov>
rcurtin: we can divide this number by DBL_EPSILON and then use floor().
< rcurtin>
lozhnikov: okay, so this gives you an integer, and you combine this with the power from frexp() (by bitshifting) to get the hilbert index
< rcurtin>
okay, I think I understand, I think this will work
< rcurtin>
and it's not a nonlinear mapping so there is no weird stretching or anything
< rcurtin>
I'm not sure if you would be able to represent the Hilbert index with just 64 bits, it seems like you would need an integer to store the base (the number you divided by DBL_EPSILON and floor()ed) and also an integer to store the power
< rcurtin>
I haven't thought through that part either, so maybe some compression is possible, but even so, two integers (128 bits) is a lot better than 2048 :)
< lozhnikov>
The base consists of 52 bits (DBL_MANT_DIG is 53, but dividing by DBL_EPSILON and flooring keeps 52 of them). And the power needs only log2(DBL_MAX_EXP - DBL_MIN_EXP + 1) = 11 bits. Why do you think that I need two 64-bit integers?
< rcurtin>
lozhnikov: because I did not perform the calculation you just did :)
< rcurtin>
I hadn't thought it through, you are right
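A minimal sketch of the per-coordinate step discussed above: `frexp()`, division by `DBL_EPSILON`, `floor()`, then packing the base and the power into one 64-bit integer. The function name `OrderedKey` and the exact bit layout are assumptions for illustration, not mlpack code. It assumes IEEE 754 doubles and finite inputs, and it flushes subnormal values to zero so that the exponent range matches the 11-bit count given above. Interleaving the keys of several dimensions into a Hilbert index is not shown.

```cpp
#include <cfloat>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not mlpack API): map a finite double to a 64-bit key
// whose unsigned integer order agrees with the numeric order of the inputs.
// Layout: 1 sign bit | 11 exponent bits | 52 base bits.
// Assumptions: IEEE 754 doubles; NaN and infinity are never passed in;
// subnormal values are flushed to zero so the exponent fits in 11 bits.
std::uint64_t OrderedKey(const double x)
{
  const std::uint64_t signBit = std::uint64_t(1) << 63;
  const double magnitude = std::fabs(x);

  std::uint64_t bits = 0;  // Zero and subnormals keep the all-zero pattern.
  if (magnitude >= DBL_MIN)
  {
    int exponent = 0;
    // fraction is in [0.5, 1) and magnitude == fraction * 2^exponent.
    const double fraction = std::frexp(magnitude, &exponent);

    // Dividing by DBL_EPSILON (2^-52) and flooring gives an integer in
    // [2^51, 2^52), which fits in 52 bits.
    const std::uint64_t base =
        static_cast<std::uint64_t>(std::floor(fraction / DBL_EPSILON));

    // Shift the exponent from [DBL_MIN_EXP, DBL_MAX_EXP] to [1, 2046].
    const std::uint64_t biased =
        static_cast<std::uint64_t>(exponent - DBL_MIN_EXP + 1);

    bits = (biased << 52) | base;
  }

  // Non-negative values sort above all negative ones; negative values are
  // mirrored so that a larger magnitude gives a smaller key.
  return (x < 0.0) ? (signBit - 1 - bits) : (signBit | bits);
}
```

Flooring drops the lowest mantissa bit, so two adjacent doubles can receive the same key. That produces ties but never reverses the order.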
marcosirc has joined #mlpack
< sumedhghaisas>
@marcos Hey marcos
< sumedhghaisas>
@marcosirc
< marcosirc>
Hi sumedh, I read your comment on github. I think you were confused.
< marcosirc>
about rho and lambda.
< rcurtin>
I glanced at it, rho(N_i) <= lambda(N_i) always
< rcurtin>
since rho represents the distance from the center of the node to the furthest point
< rcurtin>
and lambda represents the distance from the center of the node to the furthest descendant
< rcurtin>
and the set of points is a subset of the set of descendants
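The subset argument above can be written as one inequality. The notation is an assumption for illustration and may differ from the paper: $c_i$ is the center of node $N_i$, $P_i$ is the set of points held directly in $N_i$, $D_i$ is the set of all descendant points of $N_i$, and $d$ is the distance metric.

```latex
\rho(N_i) = \max_{p \in P_i} d(c_i, p)
\;\le\;
\max_{p \in D_i} d(c_i, p) = \lambda(N_i),
\qquad \text{because } P_i \subseteq D_i .
```

A maximum taken over a subset can never exceed the maximum taken over the full set, which is all the inequality needs.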
< rcurtin>
I'll have more time to respond later, I am trying to get my last numbers for this paper so I can submit it :)
< sumedhghaisas>
@marcosirc ohh ... it's the exact opposite... pardon me, there are a lot of symbols in that paper...
< marcosirc>
Thanks. No problem!
< marcosirc>
@sumedhghaisas yes! many symbols!
< rcurtin>
haha, my comment is redundant, marcos already said the exact same thing in the github comment :)
< rcurtin>
sorry for the huge number of symbols :(
< sumedhghaisas>
yeah.. I mean I get confused every time I read that paper...
< sumedhghaisas>
and somewhere down the line you forget the definitions...
< sumedhghaisas>
:P
< rcurtin>
yeah, it's difficult, because to get all the concepts necessary for thinking about trees, there are tons of them
< rcurtin>
and if you look in the neighbor search code there is even one more, I call it "minimum bound distance", which allows yet another prune
< marcosirc>
Haha :) No problem. I definitely prefer more symbols if this means a more exact definition.
< marcosirc>
Yes, I have been reading that code this morning, when using the "adjustedScore".
< marcosirc>
That part of the code is a bit confusing.
< marcosirc>
Is it explained in any paper?
mentekid has quit [Ping timeout: 276 seconds]
< rcurtin>
hm, let me look
< rcurtin>
the basic idea was, can we use the scores that the parent combination produced in order to prune before calculating the base case?
< rcurtin>
this is done in John Langford's cover tree code but don't look at that because it's impossible to understand
< rcurtin>
I dunno, I don't think it is in any paper I have written :(
< rcurtin>
I thought it was documented okay in the code, but maybe if there is anything I can explain I can update the comments
< marcosirc>
Ok, yes, that was my intuition about that code. Avoid calculating the base case if possible.
< rcurtin>
I think this can often give you a speedup of maybe 10%, and I think it works best for the cover tree
< rcurtin>
this is because the cover tree nodes hold only one point, which is the center of the node
< rcurtin>
so adjusting the score is really easy and fast
< rcurtin>
whereas with the kd-tree, we might need to calculate the base case between the centroids of two nodes...
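A minimal sketch of the general idea, not the mlpack implementation: if the center-to-center distance of the parent combination is already known, the triangle inequality gives a cheap lower bound on the distance between any descendants of the two child nodes, and a prune can be attempted before any new base case is computed. All function and parameter names here are hypothetical, and the actual "adjustedScore" logic in the neighbor search code may differ in its details.

```cpp
#include <algorithm>

// Hypothetical helper: lower-bound the distance between any descendant of a
// query child and any descendant of a reference child, using only values
// that are already cached.
//   parentCenterDistance: distance between the two parent centers.
//   *ParentDistance:      distance from a child's center to its parent's center.
//   *DescendantDistance:  furthest descendant distance of the child.
double CheapLowerBound(const double parentCenterDistance,
                       const double queryParentDistance,
                       const double queryDescendantDistance,
                       const double referenceParentDistance,
                       const double referenceDescendantDistance)
{
  // No descendant can be further from its parent's center than this.
  const double querySlack = queryParentDistance + queryDescendantDistance;
  const double referenceSlack =
      referenceParentDistance + referenceDescendantDistance;

  // Triangle inequality; a distance is never negative.
  return std::max(0.0, parentCenterDistance - querySlack - referenceSlack);
}

// For nearest neighbor search, the pair can be skipped when even the cheap
// lower bound is worse than the best candidate distance found so far.
bool CanPruneEarly(const double cheapLowerBound, const double bestDistanceSoFar)
{
  return cheapLowerBound > bestDistanceSoFar;
}
```

When the test fails, the traversal falls back to the usual, tighter score, so the cheap bound only saves work and never changes the result.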
< rcurtin>
to be perfectly honest, I wonder if some of the calculations (or some of the logic that tries to avoid calculations) simply have too much overhead, and I wonder if the code would be empirically faster without some of the rules
< rcurtin>
but I haven't done any rigorous benchmarking
< rcurtin>
too little time :(
< marcosirc>
Ok.. Yes, I understand the general idea, but I find it difficult to be 100% sure that code is correct.
< marcosirc>
I will revise it in depth, and let you know if I can contribute something!
< rcurtin>
yeah, I am glad to look over it
< rcurtin>
after I published the Tree-Independent Dual-Tree Algorithms paper there was not much time to revisit it
sumedhghaisas has quit [Ping timeout: 260 seconds]
marcosirc has quit [Quit: WeeChat 1.4]
vedantrathore has joined #mlpack
< vedantrathore>
Hey, I'm a newbie here. Can someone guide me on how to contribute to mlpack?
< zoq>
vedantratho: Let us know if you have any questions or need further information.
< vedantrathore>
zoq: Sure. Just one thing: I'm actually preparing for next year's GSoC, so I should solve bugs from the issue tracker, right?
< zoq>
vedantratho: That is one way to get involved, you can also contribute an interesting algorithm. I'm not sure there are any "entrance" level issues left.
< vedantrathore>
Interesting Algorithms about machine learning right??
< zoq>
vedantratho: right
< vedantrathore>
Ok I'll keep you posted..can I have your email address @zoq ?
< rcurtin>
vedantrathore: you should use the mlpack mailing list, which is linked to on the page zoq linked to, to be in touch
< rcurtin>
that way everyone can answer your question, instead of just one person
< zoq>
Yeah, I agree
< vedantrathore>
Ok..I just joined the mailing list, so I should just send email to mlpack@cc.gatech.edu right?
< zoq>
vedantratho: That's right.
< vedantrathore>
Okay Thanks for the help, I guess I'll be in touch..