naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
< jenkins-mlpack>
* andrewmw94: Fix/update some comments, almost finish the splitting algorithm. Several miscellaneous changes.
< jenkins-mlpack>
* Ryan Curtin: Trivial spelling fix.
< marcus_zoq>
Anand: Good morning! We need to rewrite the formula because if you assume '2' as the highest label, the formula doesn't work. Any idea to work around it?
Anand has quit [Ping timeout: 240 seconds]
Anand has joined #mlpack
< Anand>
Marcus : Some preprocessing of the true labels csv file will give me the highest label and then I can use it in the formula
< Anand>
Just a bit of parsing work has to be done
< Anand>
But only if we agree on this being the right thing to do
< marcus_zoq>
So you will search for the highest label and then use the label in the formula?
Anand has quit [Ping timeout: 240 seconds]
cuphrody has joined #mlpack
Anand has joined #mlpack
< Anand>
Marcus : Yes, parsing the file and extracting label seems to be a valid option
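The parsing step Anand describes could look roughly like the Python sketch below. The log does not describe the file layout, so it assumes one numeric label per row in the first column of the true-labels CSV. The function name `highest_label` is hypothetical and is not part of any existing script.

```python
import csv
import io


def highest_label(csv_file):
    """Return the largest label found in the first column of a CSV file.

    Assumption: each non-empty row starts with a numeric label.
    """
    labels = [int(float(row[0])) for row in csv.reader(csv_file) if row]
    if not labels:
        raise ValueError("no labels found")
    return max(labels)


# Example with an in-memory file standing in for the true-labels CSV.
example_file = io.StringIO("0\n2\n1\n2\n")
example_highest = highest_label(example_file)
```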
< marcus_zoq>
Anand: Sure it will work when the highest label is 1, are we talking about the same thing? :)
< Anand>
No. I am concerned about the case when highest label is not 1.
< Anand>
What then?
< Anand>
Will the previously mentioned approach work then?
< marcus_zoq>
I don't think so.
< Anand>
What do you suggest then?
< Anand>
I need this metric to work for multi class classifiers
< Anand>
And why do you think it will not work?
< marcus_zoq>
Maybe I'm wrong, but you can test the approach with a made-up example, both binary and multiclass, and check the results.
< marcus_zoq>
I have to think about a solution.
< marcus_zoq>
Perhaps in the meantime you can define the labels transition function?
< marcus_zoq>
Or maybe I'm wrong ...
< Anand>
Whether we need a labels transition function or not depends on the datasets being used. For now, we have numbers in all our datasets
Anand has quit [Ping timeout: 240 seconds]
Anand has joined #mlpack
Anand has quit [Ping timeout: 240 seconds]
oldbeardo has joined #mlpack
< oldbeardo>
marcus_zoq: I pushed a blog post, but for some reason it's not visible on the blog page
< udit_s>
hey, I've just sent you a mail. I also wanted your help because I'm unable to find the IRC logs from thursday. I had to take a few links off of them.
< marcus_zoq>
udit_s: On that day there was a cooling failure, so the script failed to generate the log.
< udit_s>
wow. talk about murphy's law...
< marcus_zoq>
udit_s: But I wonder if we actually lost the log file.
< udit_s>
how can I access it ?
< marcus_zoq>
udit_s: My log file is gone, maybe naywhayare has a backup log?
< udit_s>
marcus_zoq: okay, but he seems unresponsive. I think he's busy with his paper. Any other suggestions ?
< udit_s>
Basically, I had to discretize continuous attributes.
< marcus_zoq>
Right
< udit_s>
so I was wondering about the supervised discretization method. A few questions.
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
govg has quit [Client Quit]
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
govg has quit [Ping timeout: 240 seconds]
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
< naywhayare>
udit_s: I can't seem to find a log from thursday
< naywhayare>
but I'm pretty sure I have the log somewhere. I'll keep looking...
< udit_s>
naywhayare: okay, I'm working on unsupervised equi-width binning.
< udit_s>
I'm sorting the continuous values and then splitting them using binning.
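As a rough illustration of unsupervised equal-width binning, here is a minimal Python sketch. It is not udit_s's implementation: the function name and the choice to clamp the maximum value into the last bin are assumptions. It computes bin indices directly from the value range, so it does not depend on the values being sorted first.

```python
def equal_width_bins(values, num_bins):
    """Map each continuous value to the index of an equal-width bin."""
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into one bin.
        return [0] * len(values)
    width = (hi - lo) / num_bins
    # min() clamps the largest value into the last bin instead of
    # letting it spill into a bin with index num_bins.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]
```

The discretized attribute can then be treated as categorical when building a simple classifier on top of it.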
< naywhayare>
okay
< naywhayare>
were you looking for the paper that had the binning algorithm in it?
< naywhayare>
I can probably dig that link up again if you need
< udit_s>
yeah. and the weka one.
< naywhayare>
there was code in Weka somewhere; you can download the source and look for the OneR class
< naywhayare>
let me find the paper...
< naywhayare>
the paper is "Very simple classification rules perform well on most commonly used datasets" and the algorithm was in one of the appendices
< udit_s>
got it ! thanks. btw. are you free in a while ?
< udit_s>
naywhayare: do you remember there was this paper which talked about sorting the continuously valued attributes first ? Could you share that ?
cuphrody has quit [Ping timeout: 276 seconds]
< naywhayare>
udit_s: sorry, I stepped out
< naywhayare>
I don't remember which paper that was...
< naywhayare>
when I get to my lab computer I will see if I have the logs there
Anand_ has joined #mlpack
< marcus_zoq>
Anand_: Did you test the formula with some made up values?
< Anand_>
Yes, the formula will work
< Anand_>
But, I don't know if it is correct to convert the formula
< Anand_>
for multi-class classification
< Anand_>
I was going through a paper to see if I can find something
< Anand_>
Didn't find much
< marcus_zoq>
Okay, I'll see if I can find something
govg has quit [Quit: leaving]
< Anand_>
Ok. I will tell you if I find the right thing
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
< naywhayare>
udit_s: somehow I found the logs... let me put them in the right place and rebuild the irc log page
< Anand_>
Marcus : One way to make one vs all work is to calculate the mean predictive information for each class by treating it as the 0-class and all other classes as the 1-class, and then take the average
< Anand_>
We won't need to modify the formula for this
< Anand_>
Just the file parsing will increase
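The one-vs-all averaging Anand proposes can be sketched in Python as follows. The log never states the mean predictive information formula, so the binary metric is passed in as a function. `binary_accuracy` is only a stand-in used to make the sketch runnable, and both function names are hypothetical. The sketch also assumes predictions are hard class labels.

```python
def one_vs_all_average(true_labels, predictions, binary_metric):
    """Average a binary metric over all classes, one class at a time."""
    classes = sorted(set(true_labels))
    scores = []
    for c in classes:
        # Treat class c as the 0-class and every other class as the 1-class.
        bin_true = [0 if t == c else 1 for t in true_labels]
        bin_pred = [0 if p == c else 1 for p in predictions]
        scores.append(binary_metric(bin_true, bin_pred))
    return sum(scores) / len(scores)


def binary_accuracy(bin_true, bin_pred):
    """Stand-in binary metric (NOT mean predictive information)."""
    matches = sum(t == p for t, p in zip(bin_true, bin_pred))
    return matches / len(bin_true)
```

Because the relabeling happens before the metric is called, the binary formula itself stays unchanged, which matches the point that only the parsing work grows.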
wavelander has joined #mlpack
< udit_s>
naywhayare: good stuff !
wavelander has left #mlpack []
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
oldbeardo has joined #mlpack
< oldbeardo>
marcus_zoq: sorry about the earlier message, I guess it takes a few minutes
< naywhayare>
oldbeardo: marcus_zoq: something weird happened and Jenkins lost the whole workspace so I just had Jenkins rebuild the blog and now it seems to be okay
< oldbeardo>
and sorry again, I just saw your reply to my message
oldbeardo has quit [Quit: Page closed]
oldbeardo has joined #mlpack
< oldbeardo>
naywhayare: okay, didn't receive your message on irc, I have to keep watching on the logs, stupid net connection
< naywhayare>
:( sorry about your connection
< naywhayare>
it's a good thing the irc logging system (mostly) works, though :)
< oldbeardo>
yeah, that's good, earlier when we had just started out it didn't show messages immediately
< naywhayare>
if you find problems with it, let me know so that I can fix them. it will have to wait until after my paper deadline though
< naywhayare>
I know that the month navigation stopped working in June for some reason
< naywhayare>
not sure why
< oldbeardo>
okay, will do, let's hope I don't have to
oldbeardo has quit [Quit: Page closed]
Anand_ has quit [Ping timeout: 240 seconds]
govg has quit [Ping timeout: 240 seconds]
< andrewmw94>
naywhayare: is there a reason you use the terminology leafSize rather than maximumLeafSize or something like that?
< andrewmw94>
I would prefer the latter, but I also want to keep it consistent across trees
udit_s has quit [Remote host closed the connection]
< andrewmw94>
also, I have numChildren to refer to the number of childNodes that a given node has on the level below it, but the way numChildren() is implemented for the BSP tree, it will return the total number of descendants
< andrewmw94>
which makes sense, since a BSP node has two children, but for the R tree variants, I need to have some variable to track the number of children on the next level.
< naywhayare>
andrewmw94: leafSize is just a historical term that's been in use for a long time. I guess maximumLeafSize would actually be more accurate
< naywhayare>
we can change the name of leafSize in other trees to reflect that, so you can go ahead and use maximumLeafSize
< andrewmw94>
what about numChildren()
< naywhayare>
and if you want, you can change all references to leafSize in the rest of the code, because almost certainly I'm going to forget to do it and I don't have time to do it now
< naywhayare>
NumChildren() for binary space trees does only return the number of children and not the number of descendants (that can be obtained with NumDescendants())
< naywhayare>
so for binary space trees NumChildren() is either 0, 1, or 2
< naywhayare>
actually I think it'll never be 1
< naywhayare>
I wouldn't think that it would be hard to track the number of children that an R tree has; shouldn't that be something that each node stores anyway?
< andrewmw94>
yeah. I for some reason thought that it was supposed to return all descendants
< naywhayare>
also NumDescendants() returns the number of descendant points, not the number of descendant nodes
< andrewmw94>
and then I needed to use a variable for the children on the next node, and having both with similar names is confusing, but we don't actually have both
< naywhayare>
there's not yet been any case where the number of descendant nodes has been useful
< andrewmw94>
ahh, I think I was confused on what count meant when I took my notes in the first week
< andrewmw94>
and that's why I got the purposes of numChildren and numDescendants confused
< naywhayare>
yeah, begin and count are useful for the binary space tree because they denote which points in the matrix are descendants of a given node
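The distinction discussed above (children are nodes, descendants are points, and begin/count select a contiguous range of points) can be illustrated with a small Python sketch. This is not mlpack's C++ tree class: the class name, method names, and the `descendant_indices` helper are hypothetical and only mirror the behaviour described in the conversation.

```python
class BinarySpaceNode:
    """Toy binary space tree node holding a contiguous range of points."""

    def __init__(self, begin, count, left=None, right=None):
        self.begin = begin  # index of the first descendant point
        self.count = count  # number of descendant points
        self.left = left
        self.right = right

    def num_children(self):
        # Number of child nodes directly below this node.
        return sum(child is not None for child in (self.left, self.right))

    def num_descendants(self):
        # Number of descendant points, not descendant nodes.
        return self.count

    def descendant_indices(self):
        # Indices of the points held by this node and its subtree.
        return range(self.begin, self.begin + self.count)


# A root over 5 points, split into children holding 2 and 3 points.
root = BinarySpaceNode(0, 5, BinarySpaceNode(0, 2), BinarySpaceNode(2, 3))
```

An R tree node would need a stored child count instead, since it can have more than two children and its points need not form one contiguous range.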
< jenkins-mlpack>
Starting build #1930 for job mlpack - svn checkin test (previous build: SUCCESS)
< jenkins-mlpack>
andrewmw94: change name of leafSize to maxLeafSize. more stuff for the R-tree. Some name changes, some more node splitting, a start on traversal.
andrewmw94 has quit [Ping timeout: 245 seconds]
andrewmw94 has joined #mlpack
andrewmw94 has quit [Remote host closed the connection]
< jenkins-mlpack>
Starting build #1931 for job mlpack - svn checkin test (previous build: SUCCESS)