naywhayare changed the topic of #mlpack to: "http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/"
< jenkins-mlpack>
* Ryan Curtin: Add new contributor.
< jenkins-mlpack>
* andrewmw94: readability changes to BinarySpaceTree::FurthestPointDistance()
< jenkins-mlpack>
* Ryan Curtin: The test for FurthestPointDistance() was wrong also.
< jenkins-mlpack>
* Ryan Curtin: It turns out the implementation of FurthestPointDistance() was just wrong.
< jenkins-mlpack>
Thanks Andrew for pointing it out.
govg has joined #mlpack
udit_s has joined #mlpack
koderok has joined #mlpack
koderok has quit [Ping timeout: 264 seconds]
sumedhghaisas has joined #mlpack
govg has quit [Quit: leaving]
sumedhghaisas has quit [Quit: Leaving]
sumedhghaisas has joined #mlpack
andrewmw94 has joined #mlpack
< udit_s>
naywhayare: you there ?
< udit_s>
marcus_zoq: hey !
< andrewmw94>
udit_s: it's pretty early here on the east coast. I'd be surprised if Ryan is awake yet
< andrewmw94>
but I could be wrong
< udit_s>
hmm, yeah, I figured.
< udit_s>
But since you're up .
< udit_s>
I just wanted to confirm - armadillo has 0 based indexing right ?
< andrewmw94>
I'm nearly certain that is correct
< udit_s>
yeah, it's not clearly mentioned in their docs either.
< udit_s>
also, accessing armadillo mats is different from accessing normal multidimensional arrays, right, apart from the column-major storage format.
< udit_s>
?
< andrewmw94>
you mean in terms of runtime?
< udit_s>
in terms of syntax, in that, they have different methods like row, col, rows, cols and hence r/w access
< andrewmw94>
it looks like they overloaded the "[]" operator to give you the column, but they want you to use "(r,c)" to get individual elements
< udit_s>
yeah, I found that too. Thanks anyways.
sumedhghaisas has joined #mlpack
< andrewmw94>
no problem
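For reference, here is a minimal sketch of what zero-based, column-major element access means, using only the standard library. `ColMajorMatrix` and its members are hypothetical stand-ins for illustration; they are not Armadillo's classes:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense matrix; this is NOT arma::mat.
class ColMajorMatrix
{
 public:
  ColMajorMatrix(std::size_t rows, std::size_t cols)
      : rows(rows), cols(cols), data(rows * cols, 0.0) { }

  // Zero-based (r, c) access. Element (r, c) lives at offset r + c * rows,
  // so every column is contiguous in memory.
  double& operator()(std::size_t r, std::size_t c)
  {
    assert(r < rows && c < cols);
    return data[r + c * rows];
  }

  // Flat access in storage order (column after column).
  double& operator[](std::size_t i)
  {
    assert(i < data.size());
    return data[i];
  }

  std::size_t Rows() const { return rows; }
  std::size_t Cols() const { return cols; }

 private:
  std::size_t rows, cols;
  std::vector<double> data;
};
```

In a 2x3 matrix of this kind, `m(1, 2)` is the last element and sits at flat offset 5. In Armadillo itself, `A(r, c)` is the usual element access and `A.col(c)` returns a column, while a single index addresses elements in flattened column-major order.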
< udit_s>
sumedhghaisas: I didn't get to introduce myself in the first irc meeting, but I'm from BITS Pilani, Pilani. Nice knowing two other BITSIANs are also here...
< sumedhghaisas>
Udit_s: hello udit ... Nice to meet you.. You are in which year???
< udit_s>
entering 5th.
< sumedhghaisas>
Ohh you are a dual degree... Computer science??
< udit_s>
yep. math and cs
< sumedhghaisas>
Ohh nice... Thats a dream combo...
< udit_s>
and you ? 3rd year ?
< udit_s>
I mean this will be 4-1 for you ?
sumedhghaisas has quit [Remote host closed the connection]
sumedhghaisas has joined #mlpack
< sumedhghaisas>
Udit_s: yes this would be my 4 - 1...
< sumedhghaisas>
Single degree computer science...
< udit_s>
cool.
< sumedhghaisas>
I sometimes feel i should have taken dual in pilani... Score was decent enough..
< sumedhghaisas>
What was the cutoff for computer science for dual degree??
< udit_s>
I don't remember... I think it might have been 304.
< udit_s>
Or I might be wrong.
< sumedhghaisas>
Ohh i meant cgpa cutoff for taking computer science in second year...
< udit_s>
Again, I got both comfortably, so i don't really remember...besides, you'd be in my junior batch anyway.
< naywhayare>
udit_s: I'm awake now
< naywhayare>
I was trying to start waking up earlier, but I keep staying up late :(
< naywhayare>
and yeah, armadillo is zero-based indexing
< udit_s>
and here I'm trying to start and end work late, hoping to get more time overlapping with you.
< udit_s>
anyways, trivial stuff then.
< naywhayare>
maybe I'll do better tomorrow... the key is just going to bed early
andrewmw94 has quit [Ping timeout: 255 seconds]
< udit_s>
But I was actually thinking of shifting my time two hours and start from 12.
sumedhghaisas has quit [Ping timeout: 240 seconds]
< naywhayare>
sure, that's fine; whatever works for you. I'll do my best to be around
< udit_s>
Also, I'm going to start decision_stump_main in a while, Will get back to you.
< naywhayare>
ok; I have some meetings in a few hours, so I may be away for a while, but I'll respond as soon as I'm able to
andrewmw94 has joined #mlpack
sumedhghaisas has joined #mlpack
sumedhghaisas has quit [Read error: Connection reset by peer]
sumedhghaisas has joined #mlpack
sumedhghaisas has quit [Read error: Connection reset by peer]
sumedhghaisas has joined #mlpack
cuphrody has joined #mlpack
oldbeardo has joined #mlpack
< oldbeardo>
naywhayare: I just sent you a mail, if you could review the attached code
< naywhayare>
oldbeardo: sure, I will do that momentarily
< oldbeardo>
thanks, I'll be here
< naywhayare>
what specifically do you want me to look at?
< oldbeardo>
nothing specific, just the architecture that I'm building
< oldbeardo>
I have never written a tree class before, hence
< naywhayare>
ok. my first thought is that there aren't any comments
< naywhayare>
I'm sure you're planning to add them
< oldbeardo>
yeah, once you say what I have done is okay :)
< naywhayare>
but I'd encourage you to write comments as you write the code... at least for me, I often forget exactly why I have implemented things a certain way when I go back later to add comments
< naywhayare>
the API looks okay but I'd call it CosineTree instead of CosineNode
< naywhayare>
I'd also rename 'inputMat' to 'dataset' and then 'GetInputMat()' to just 'Dataset()'; that'll be more in line with the rest of mlpack code
< oldbeardo>
sure, the reason I named it CosineNode is that I was planning on having another file named 'cosine_tree.cpp' which will construct the tree using this class
< naywhayare>
I see what you mean; I originally tried to design the BinarySpaceTree class that way
< naywhayare>
but what I ended up finding out is that each node is a tree itself
< naywhayare>
so I could basically have a class BinarySpaceTree which holds a BinarySpaceNode and nothing else... or I could have a class BinarySpaceTree that just holds everything the node did
< naywhayare>
I think that reasoning could apply here too
< oldbeardo>
okay, that makes sense, but I noticed that even Mudit's API has a 'cosine_tree.cpp' and 'cosine_tree_builder.cpp'
< naywhayare>
yeah; he never finished it, so I never got to go over the API with him
< naywhayare>
I would have said the same thing to him :)
< oldbeardo>
sure, I will try to fit it in one 'cosine_tree.cpp' file
< oldbeardo>
I have one more question, if you could open the QUIC-SVD paper
< naywhayare>
sure, I still have it open from yesterday
< oldbeardo>
heh, so if you have a look at algorithm 4, the first 3 steps are technically where the tree is being built
< naywhayare>
right
< oldbeardo>
I'm confused as to where to add them, in 'quic_svd.cpp' or something else?
< naywhayare>
what is 'them'? I don't know what you mean
< oldbeardo>
the 3 steps, building the tree
< oldbeardo>
that's one more reason why I separated it as 'cosine_node.cpp'
< naywhayare>
oh, ok. I'd do it like this: have the CosineTree constructor take an argument epsilon
< naywhayare>
that argument specifies how deeply the tree gets built
< naywhayare>
so when I say 'CosineTree ct(dataset, epsilon)', what I get returned to me is the tree that results from steps 1-3 of algorithm 4
< oldbeardo>
right, so does that mean I should retain the CosineNode class?
< naywhayare>
why would you need to retain the CosineNode class?
< oldbeardo>
so that the CosineTree class is just responsible for making the tree, just better separation I think, no?
< naywhayare>
you can write it that way, but then I can show you that the two can be merged maybe more easily than you think :)
< naywhayare>
algorithm 4 definitely has lots of temporary build-time-only variables
< naywhayare>
like the priority queue and so on
< naywhayare>
but... the CosineTree class doesn't need to have these as members because they are only required at tree construction time
< naywhayare>
so if the tree construction is entirely in the constructor(s) of CosineTree, then you can just pass around a temporary priority queue and clean it up when the constructor finishes
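The idea of keeping build-time state out of the class can be sketched as follows. This is a toy under loud assumptions: `SketchTree` is a hypothetical name, the data is 1-D, and the splitting criterion (split the widest node at its median until every leaf has spread at most epsilon) is invented for illustration. It is not Algorithm 4 of the QUIC-SVD paper; it only shows a constructor that owns a temporary priority queue and a node type that is itself the tree:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical toy tree: each node is itself a tree, as discussed above.
class SketchTree
{
 public:
  // Builds the whole tree. The priority queue exists only in this scope.
  SketchTree(std::vector<double> dataset, const double epsilon)
      : points(std::move(dataset))
  {
    assert(!points.empty() && epsilon >= 0.0);
    std::sort(points.begin(), points.end());

    // Build-time-only state: a max-heap of nodes ordered by spread.
    auto lessSpread = [](const SketchTree* a, const SketchTree* b)
        { return a->Spread() < b->Spread(); };
    std::priority_queue<SketchTree*, std::vector<SketchTree*>,
        decltype(lessSpread)> toSplit(lessSpread);
    toSplit.push(this);

    while (!toSplit.empty())
    {
      SketchTree* node = toSplit.top();
      toSplit.pop();
      if (node->Spread() <= epsilon)
        break; // The widest remaining node is fine, so all of them are.
      node->Split();
      toSplit.push(node->left.get());
      toSplit.push(node->right.get());
    }
  } // toSplit is destroyed here; the finished tree keeps no trace of it.

  double Spread() const { return points.back() - points.front(); }
  bool IsLeaf() const { return !left; }
  std::size_t NumLeaves() const
  {
    return IsLeaf() ? 1 : left->NumLeaves() + right->NumLeaves();
  }

 private:
  struct ChildTag { }; // Selects the non-building constructor below.

  // Child constructor: stores already-sorted points and builds nothing.
  SketchTree(std::vector<double> sortedPoints, ChildTag)
      : points(std::move(sortedPoints)) { }

  // Split the sorted points at the median into two children.
  void Split()
  {
    const auto mid = points.begin() + points.size() / 2;
    left.reset(new SketchTree(
        std::vector<double>(points.begin(), mid), ChildTag()));
    right.reset(new SketchTree(
        std::vector<double>(mid, points.end()), ChildTag()));
  }

  std::vector<double> points;
  std::unique_ptr<SketchTree> left, right;
};
```

With this layout, `SketchTree t(data, epsilon)` hands back a fully built tree, and no separate builder or node class is needed.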
< naywhayare>
I have to go now, but I'll be back in about 45 minutes to an hour
< naywhayare>
if something I said is confusing or unclear or wrong, don't worry about it -- implement it the way you suggested, and we can go from there
< oldbeardo>
okay, maybe it will get a bit clearer once I code it in
oldbeardo has quit [Quit: Page closed]
sumedhghaisas has quit [Remote host closed the connection]
< andrewmw94>
naywhayare: Correct me if I am wrong, but the BinarySpaceTree uses the mean_split_implementation to actually split the tree, so you could plug in different splitting algorithms
< andrewmw94>
so if I parallel that for the R tree, I could have one class for the tree, and then different splitting algorithms to implement R* and X trees
< andrewmw94>
but then they would all be called RectangleTree, which is kind of confusing since the X tree in particular can be different than an R tree, but on the other hand, RectangleTree rather than RTree sort of emphasises that
< andrewmw94>
but this could just make sense to me because I'm writing it. Do you think an end user would be confused by that?
< naywhayare>
end user confusion can (hopefully) be solved with some typedefs
< naywhayare>
in fact I've been meaning for about five years to write a typedef for kd-trees...
< naywhayare>
something like 'typedef BinarySpaceTree<HRectBound<2>, MeanSplit, ...> KDTree'
< naywhayare>
so we can probably do something like that for R-type trees as well
< naywhayare>
do you think that would work?
< andrewmw94>
I think so
< naywhayare>
the key will be to use a typedef that still allows a user to specify some (but not all) of the template parameters
< naywhayare>
I think typedefs are capable of doing that... and if not, I think there is some other C++ feature that is capable of it
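A plain `typedef` cannot leave template parameters open, but a C++11 alias template can. The sketch below uses empty placeholder types; `EuclideanMetric`, `ManhattanMetric`, `EmptyStatistic` and the four-parameter `BinarySpaceTree` signature are assumptions made for illustration and do not match mlpack's real template parameters:

```cpp
#include <type_traits>

// Hypothetical placeholder types, only here to make the sketch compile.
struct HRectBound { };
struct MeanSplit { };
struct EuclideanMetric { };
struct ManhattanMetric { };
struct EmptyStatistic { };

// Hypothetical tree template with four policy parameters.
template<typename BoundType, typename SplitType,
         typename MetricType, typename StatisticType>
class BinarySpaceTree { };

// Alias template: the bound and split are fixed, while the user may still
// choose the metric and the statistic (or accept the defaults).
template<typename MetricType = EuclideanMetric,
         typename StatisticType = EmptyStatistic>
using KDTree =
    BinarySpaceTree<HRectBound, MeanSplit, MetricType, StatisticType>;

// The alias is just another name for the full type.
static_assert(std::is_same<KDTree<ManhattanMetric>,
    BinarySpaceTree<HRectBound, MeanSplit, ManhattanMetric,
                    EmptyStatistic>>::value,
    "KDTree fixes the bound and split but leaves the metric open");
```

Before C++11, the usual workaround is a struct template with a nested typedef, used as `KDTree<Metric, Stat>::Type`.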
< andrewmw94>
also, kind of random, I've never used Hilbert R-trees before, but I assume they could be dealt with in the same way, i.e. by having their own split rule
< andrewmw94>
I think they are normal R Trees, but they use the hilbert curve to bulk load data
< naywhayare>
for instance, KD trees have MeanSplit and HRectBound, but can have any metric (well, most metrics) and any StatisticType
< naywhayare>
yeah, that's fine; if you can templatize out the differences between the tree types, it makes the resulting codebase smaller and easier to deal with
< andrewmw94>
yeah. A KdTree could also have the median split (or better, some random sampling based variant)
< naywhayare>
yeah, I guess you're right, it doesn't need to be mean split. I guess the rectangle bound is all that's necessary to call it a kd-tree
< andrewmw94>
but the largest width split dimension makes the mean split convenient
< naywhayare>
but even then everyone has a different definition of kd-tree... some people hold one point per node and hold points in every node (that was Bentley's original formulation in 1974)
< andrewmw94>
From what I've heard, it's actually more typical to use median based splitting and cycle through the dimensions
< andrewmw94>
but we do it the other way in robocode, so that's what I think of ;)
< naywhayare>
median split gets you nicely balanced trees, but for weird datasets (or highly skewed datasets, or datasets with outliers) I don't think it will perform as well for search
< naywhayare>
median split is also nice because it lets you bound the depth of the tree as O(log N)
< andrewmw94>
yeah, assuming bulk loading
< naywhayare>
yeah
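The depth bound for median splitting can be illustrated with a small sketch. It assumes bulk loading and, for brevity, 1-D points (a real kd-tree would also pick a split dimension); `MedianSplitDepth` is a hypothetical helper, not mlpack code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Recursively split pts[begin, end) at its median until a node holds at most
// leafSize points, and return the depth of the resulting tree.
// Each split halves the point count, so the depth is about log2(N / leafSize)
// no matter how skewed the values are.
std::size_t MedianSplitDepth(std::vector<double>& pts,
                             const std::size_t begin,
                             const std::size_t end,
                             const std::size_t leafSize)
{
  assert(begin <= end && end <= pts.size() && leafSize >= 1);
  if (end - begin <= leafSize)
    return 0; // Small enough to be a leaf.

  // Partially sort so that the median element sits at position mid.
  const std::size_t mid = begin + (end - begin) / 2;
  std::nth_element(pts.begin() + begin, pts.begin() + mid,
                   pts.begin() + end);

  return 1 + std::max(MedianSplitDepth(pts, begin, mid, leafSize),
                      MedianSplitDepth(pts, mid, end, leafSize));
}
```

A split at the mean or midpoint carries no such guarantee: one far outlier can leave almost all points on one side of every split.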
< andrewmw94>
so for the Rtrees, I think I recall you saying it would be bad to store the data in the nodes
< andrewmw94>
and better to store a pointer to the matrix
< andrewmw94>
but since we have to copy the matrices anyways, I'm not sure where else to store them
< andrewmw94>
more specifically, if I have a class that has a pointer to a vector that holds a lot of data, and the vector is only used in that class, how is this better than having the vector in the class to start with?
< andrewmw94>
or do I just not remember what you said?
< naywhayare>
ok, suppose I have two very simple classes: class A { arma::mat a; }; class B { arma::mat* a; };
< andrewmw94>
I could be confusing C++ with java, but I think they would be the same
< andrewmw94>
because I think the arma::mat would really just be stored as a pointer once the code is compiled
< naywhayare>
you could try it and see, but I doubt that the compiler is allowed to do what you described
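One way to "try it and see" is the sketch below, which uses `std::vector<double>` as a stand-in for `arma::mat` (an assumption made so the example needs no external library). It does not measure speed; it only shows that the two layouts behave differently when objects are copied:

```cpp
#include <cassert>
#include <vector>

// Stand-in for arma::mat so the sketch is self-contained.
using Matrix = std::vector<double>;

// Holds the matrix by value: the object is embedded in A, and copying an A
// copies all of the data.
struct A { Matrix a; };

// Holds a non-owning pointer: copying a B copies only the pointer, so both
// copies refer to the same matrix, which must outlive them.
struct B { Matrix* a; };

inline void Demo()
{
  Matrix m(4, 1.0);

  A a1{m};        // Copies m into a1.a.
  A a2 = a1;      // Copies the data a second time.
  a2.a[0] = 5.0;
  assert(a1.a[0] == 1.0);  // a1 is unaffected: the copies are independent.

  B b1{&m};
  B b2 = b1;      // Copies only the pointer.
  (*b2.a)[0] = 5.0;
  assert((*b1.a)[0] == 5.0 && m[0] == 5.0);  // All three see the change.
}
```

So the two classes are not interchangeable in C++: the by-value member gives each object its own data, while the pointer member avoids copies at the cost of managing the pointee's lifetime.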
Anand has joined #mlpack
< andrewmw94>
but I guess the pointer way is more consistent with the style of code in mlpack, so I'll do that.
< naywhayare>
I don't think there's actually any code in mlpack that works like that, so you can try either
< naywhayare>
I only suggested my idea because I thought it might be faster -- but I could be wrong
< naywhayare>
it's also worth noting that there is generally a big difference between what the standard says about what _could_ be done with code, and what reasonably sane compilers will actually do with it
< naywhayare>
in many cases it's worth writing code that assumes the user is using gcc / clang (as long as it will still compile on other compilers)
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
govg has quit [Client Quit]
cuphrody has quit [Ping timeout: 240 seconds]
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
< Anand>
Hi Marcus!
< Anand>
Thanks for the suggestion!
< Anand>
I have moved the metrics to methods/metrics/definitions.py and included that subfolder into the path
< Anand>
It seems to be working!
< marcus_zoq>
Hello Anand, okay, great :)
< andrewmw94>
naywhayare: random question. The BinarySpaceTree says that it's inefficient to calculate the distance from a node to its parent. Why? Isn't this just the distance from the node's centroid to its parent's centroid?
govg has quit [Ping timeout: 255 seconds]
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
< naywhayare>
andrewmw94: where did I write that?
< naywhayare>
I think a bit of context will make it clear what I was thinking (or possibly what someone else was thinking, if I didn't write that bit)
< naywhayare>
I think a lot has changed since I originally wrote that; thank you for pointing it out
< naywhayare>
the BinarySpaceTree now caches the parent distance, so at the very least, the comment is inaccurate
< andrewmw94>
ahh, so perhaps it was before the MBR's?
< naywhayare>
but I also found that HasParentDistance is not actually used in the code
< andrewmw94>
or something like that
< naywhayare>
the MBRs?
< andrewmw94>
minimum bounding rectangles
< andrewmw94>
sorry
< andrewmw94>
did you have a version that didn't include those
< andrewmw94>
because then it would make sense
< naywhayare>
ah, no; I never had a version that didn't calculate the bounding rectangles
< naywhayare>
the tree construction is a little faster if the parent distance isn't calculated
< andrewmw94>
yeah. I can't think what you would need it for, but it's a one time O(d) cost
< naywhayare>
but the difference shouldn't be much. the calculation of parent distances is done in lines 595-601 and lines 652-658 of binary_space_tree_impl.hpp
< naywhayare>
ParentDistance() is used, I think, in that crazy bounding function for nearest neighbor that I showed you yesterday
< naywhayare>
somewhere in NeighborSearchRules
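Caching the parent distance at construction time, as described above, might look roughly like this. `SketchNode` and `Distance` are hypothetical names, and the distance is taken between node centers as andrewmw94 suggested; this is not the code in binary_space_tree_impl.hpp:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Euclidean distance between two d-dimensional points: O(d) work.
inline double Distance(const std::vector<double>& a,
                       const std::vector<double>& b)
{
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    sum += (a[i] - b[i]) * (a[i] - b[i]);
  return std::sqrt(sum);
}

// Hypothetical node type that pays the O(d) cost once, when it is built.
class SketchNode
{
 public:
  // Root node: there is no parent, so the cached distance is 0.
  explicit SketchNode(std::vector<double> center)
      : center(std::move(center)), parentDistance(0.0) { }

  // Child node: compute and cache the distance to the parent's center.
  // (center is declared before parentDistance, so it is initialized first.)
  SketchNode(std::vector<double> center, const SketchNode& parent)
      : center(std::move(center)),
        parentDistance(Distance(this->center, parent.center)) { }

  // Later queries are O(1) lookups.
  double ParentDistance() const { return parentDistance; }

 private:
  std::vector<double> center;
  double parentDistance;
};
```

Every call made during a tree traversal then reads a stored number instead of recomputing the distance.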
< andrewmw94>
ahh, that reminds me, I need to take notes on the Rules class
< naywhayare>
the paper I linked you to should explain the basic abstraction fairly well
< andrewmw94>
do you want me to send you my notes I took for understanding the code if I ever get them into a decent state
< naywhayare>
but like all abstractions, the real-world implementation is a bit more complex
< naywhayare>
sure, you can do that. this helps me understand what parts of the code are difficult / easy to understand, and I can also (hopefully) correct any misunderstandings that I see
< andrewmw94>
yeah. I'm not sure if you want to include notes on the layout of different parts of the code, but it could be nice for someone to have them. Assuming mine are actually accurate ;)
< andrewmw94>
but right now I'm just writing them by hand so I remember them. It'll be a while before I type them
< naywhayare>
yeah; no hurry
< naywhayare>
I think you have already seen how helpful it is to have a second set of eyes look over this code
< andrewmw94>
yeah
< naywhayare>
ok... time for me to grab lunch
< naywhayare>
I'll be back in an hour or so
< andrewmw94>
alright
govg has quit [Ping timeout: 264 seconds]
andrewmw94 has quit [Remote host closed the connection]
koderok has joined #mlpack
govg has joined #mlpack
andrewmw94 has joined #mlpack
< andrewmw94>
ahh, that's better. My wireless card broke, so I had to use a windows computer for IRC while programming on Fedora
< andrewmw94>
I hate windows
< andrewmw94>
but my new card just arrived. Life is good.
andrewmw94 has quit [Remote host closed the connection]
Anand has quit [Ping timeout: 240 seconds]
andrewmw94 has joined #mlpack
< andrewmw94>
haha. Or perhaps life isn't so good after all.
govg has quit [Quit: leaving]
andrewmw94 has quit [Quit: andrewmw94]
andrewmw94 has joined #mlpack
< naywhayare>
andrewmw94: it should be possible to compile mlpack on windows, if you're feeling masochistic :)
< udit_s>
^ :D
< andrewmw94>
I'm pretty sure it's easier to just turn around whenever I use the IRC
< naywhayare>
almost certainly :)
koderok has quit [Ping timeout: 252 seconds]
< naywhayare>
we have windows build slaves, but I don't have the time to give them the attention they need
< naywhayare>
maybe someday I will find the time to figure out why
< andrewmw94>
yeah. Building stuff on windows can be really painful
< andrewmw94>
it makes you wonder why anyone uses it
< andrewmw94>
I mean, programming is all computers are good for, right?
< naywhayare>
haha
< naywhayare>
if you do all your programming inside the visual studio ecosystem, it's actually not so bad
< naywhayare>
but... mlpack is not quite that
< naywhayare>
for a while, newer Armadillo versions would cause the visual studio compiler to simply segfault
< andrewmw94>
I could see that. I've never seen C++ code with so many templates
< naywhayare>
I hadn't either, before I encountered armadillo
< naywhayare>
the handful of technical reports Conrad Sanderson has written about Armadillo are pretty explanatory and helpful in understanding the basic ideas
andrewmw94 has quit [Quit: andrewmw94]
sumedhghaisas has joined #mlpack
andrewmw94 has joined #mlpack
andrewmw94 has quit [Ping timeout: 252 seconds]
andrewmw94 has joined #mlpack
< sumedhghaisas>
naywhayare: factorization is to be moved from getRecommendations to the constructor, right??
< sumedhghaisas>
as getRecommendations can be called several times...
udit_s has quit [Quit: Ex-Chat]
andrewmw94 has quit [Ping timeout: 255 seconds]
sumedhghaisas has quit [Ping timeout: 276 seconds]
andrewmw94 has joined #mlpack
sumedhghaisas has joined #mlpack
< naywhayare>
sumedhghaisas: yeah, I think the factorization should be moved to the constructor
< naywhayare>
but before you do that, you should ask oldbeardo (whenever you see him next) if he has any problems if you do that
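The reasoning for doing the factorization in the constructor can be sketched like this. Everything here is a hypothetical stand-in: `SketchCF` is not mlpack's CF class, and the "factorization" is just a per-user mean so the example stays short. The point is only that the expensive step runs once, however many queries follow:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a collaborative filtering model.
class SketchCF
{
 public:
  // The expensive step runs exactly once, when the object is built.
  explicit SketchCF(const std::vector<std::vector<double>>& ratings)
      : factorizeCalls(0)
  {
    Factorize(ratings);
  }

  // Cheap query that reuses the stored result; safe to call many times.
  double Predict(const std::size_t user) const
  {
    assert(user < userMeans.size());
    return userMeans[user];
  }

  // Exposed only so the sketch can show how often the work was done.
  std::size_t FactorizeCalls() const { return factorizeCalls; }

 private:
  // Placeholder for the real factorization: one mean rating per user.
  void Factorize(const std::vector<std::vector<double>>& ratings)
  {
    ++factorizeCalls;
    userMeans.clear();
    for (const std::vector<double>& row : ratings)
    {
      double sum = 0.0;
      for (const double r : row)
        sum += r;
      userMeans.push_back(row.empty() ? 0.0 : sum / row.size());
    }
  }

  std::vector<double> userMeans;
  std::size_t factorizeCalls;
};
```

The trade-off is that construction becomes slow, and any parameter that changes the factorization has to be supplied before the object is built.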
sumedhghaisas has quit [Ping timeout: 265 seconds]
sumedhghaisas has joined #mlpack
< naywhayare>
sumedhghaisas: did you get my messages?
< sumedhghaisas>
no... :(
< sumedhghaisas>
I am getting a stable net ... until then i have to work with this mobile internet... :(
< sumedhghaisas>
okay... got a stable build... :)
< sumedhghaisas>
I have added new folder 'lmf' inside methods... haven't deleted the old nmf yet... cf is now using the new lmf module... have combined WUpdate and HUpdate class...
< sumedhghaisas>
have added lmf_main...
< sumedhghaisas>
there were 'using namespace' in cf.hpp and cf_impl.hpp... wasn't there in release... but there in current version...
< sumedhghaisas>
naywhayare: my msges getting to you??
< naywhayare>
yeah, I am getting them
< naywhayare>
if your internet is not stable you can always check the IRC logs :) http://www.mlpack.org/irc/
< sumedhghaisas>
okay?? I didn't know this... this would definitely help...
< naywhayare>
yeah, I just finished making them a few weeks ago
< naywhayare>
if your build compiles, you can go ahead and check in the modified/added/removed files with 'svn ci' after you've 'svn add'ed or 'svn delete'd all the necessary files
< sumedhghaisas>
Sometimes I keep two IRC clients on just to be safer side :)
< sumedhghaisas>
I won't delete the nmf module so early... it can be removed in later releases I guess...
< naywhayare>
ok, sounds good
< sumedhghaisas>
should I move the factorization from getRecommendations to the constructor??
< naywhayare>
that sounds good to me, but before you do it, we should check with oldbeardo
< sumedhghaisas>
no problem... he would be fast asleep by now... so I will just check in these changes ....
< naywhayare>
alternately, you could open a ticket on trac and CC him (siddharth.950)
< sumedhghaisas>
naywhayare: okay you mentioned something about tagging and message regarding revision...
< naywhayare>
you can just hold your changes to getRecommendations locally; it's a good idea to wait until everyone involved in that part of the code has said it's ok before making big changes
< naywhayare>
yeah; just be sure that your commit message is descriptive
< sumedhghaisas>
yeah sure... yeah I will just open the ticket and CC him...
< sumedhghaisas>
Description of all my modifications??
< naywhayare>
"commit some things" is not a great commit message; but "refactor ModuleA into ModuleB and ModuleC because <reasons>; fixes ticket #<xxx>"
< naywhayare>
the second bit, "Commit logical changesets", has some good advice on commit messages
< naywhayare>
basically, a year or two from now, someone might realize that your changeset may have introduced a bug; then they'll investigate it, and having a good log message helps figure out how the bug got introduced, and the right way to fix it
< naywhayare>
happens to me all the time... I'm always hunting down bugs I accidentally introduced, then trying to figure out why I introduced them based on the commit message
< sumedhghaisas>
yes... bug fixing would be a nightmare without a proper description...
< sumedhghaisas>
okay as this is my first time... I will just write down the message here... so you can have a look ...
< sumedhghaisas>
added module 'lmf'(Latent Matrix Factorization) to accommodate SVD based update rules alongside NMF based update rule. CF module is updated to use LMF module.
< sumedhghaisas>
naywhayare: sounds good??
< naywhayare>
sure, seems good
< sumedhghaisas>
okay then :)
< sumedhghaisas>
naywhayare: do I have to delete the build folder in my trunk before calling svn add??
< andrewmw94>
no, unless the build folder is in a folder that you are adding
< naywhayare>
no; call 'svn add <filename/directoryname>' to be specific on what to add
< sumedhghaisas>
ohh okay... and what about modifications??
< naywhayare>
andrewmw94: in r16530 you added a bunch of files for the different types of splits... but it looks like they're directories ?
< naywhayare>
sumedhghaisas: modified files will be automatically added
< naywhayare>
when you type 'svn commit', it'll show you the list of files you're committing
< naywhayare>
so you can make sure you're committing all the files you've changed or added
< sumedhghaisas>
thanks :)
< andrewmw94>
naywhayare: ahh, that could be. I tried creating files in emacs dired for the first time and must have made them directories accidentally
< naywhayare>
andrewmw94: hah. I was wondering what happened. it's an easy enough fix, and I figured you'd figure it out eventually, but I was entertained :)
< andrewmw94>
hah. And now it won't let me fix it because the file "has unexpectedly changed kind"
< andrewmw94>
I love it when SVN does stuff like this
< naywhayare>
heh...
< sumedhghaisas>
naywhayare: Okay I have committed all the changes... how do I know the jenkins build is fine??
< naywhayare>
you can see now that the build is "pending" (in the left column)
< naywhayare>
ideally Jenkins will send you an email if the build fails, but I don't think Jenkins knows your email
< naywhayare>
I'll have to log in and update it
< sumedhghaisas>
ohh okay :)
< naywhayare>
either way, if you check that page again in a little while (an hour?) you can see whether or not the build was successful
< naywhayare>
and you can also watch it build in real-time; once the build starts, you can click on the link to the current build and watch the console output
< naywhayare>
I'll send a link when the build starts
< naywhayare>
I think jenkins-mlpack will also say something in the channel if the build fails
< sumedhghaisas>
yeah sure that will be cool to watch :)
< sumedhghaisas>
waiting...
< naywhayare>
ok, I reconfigured jenkins-mlpack so that maybe it will say something in the channel when a build starts
< jenkins-mlpack>
Starting build #1910 for job mlpack - svn checkin test (previous build: SUCCESS)
< sumedhghaisas>
yeah it is funny ... ChuckNorris Plugin..... I don't know if you are aware of Rajnikant... I should make a Rajnikant plugin :)
< naywhayare>
I have heard the name... I should watch some movies he is in
< sumedhghaisas>
ohh are you sure about that?? if you are then start with one called robot ... the movie I have seen ... as in complete movie... seen lots of parts though
< jenkins-mlpack>
sumedhghaisas: added module 'lmf'(Latent Matrix Factorization) to accommodate SVD based update rules alongside NMF based update rule. CF module is updated to use LMF module.
< jenkins-mlpack>
Starting build #1911 for job mlpack - svn checkin test (previous build: SUCCESS)
sumedhghaisas has quit [Ping timeout: 276 seconds]