#mlpack on 2014-06-03 — irc logs at libera.irclog.whitequark.org

2014-05-21 16:24 naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/

03:50 sumedh_ has joined #mlpack

03:53 sumedh__ has quit [Ping timeout: 252 seconds]

04:27 Anand has joined #mlpack

05:03 sumedh_ has quit [Quit: Leaving]

05:33 < jenkins-mlpack> Project mlpack - nightly matrix build build #474: STILL UNSTABLE in 1 hr 33 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20nightly%20matrix%20build/474/

05:33 < jenkins-mlpack> * andrewmw94: Fix/update some comments, almost finish the splitting algorithm. Several miscellaneous changes.

05:33 < jenkins-mlpack> * Ryan Curtin: Trivial spelling fix.

06:13 < marcus_zoq> Anand: Good morning! We need to rewrite the formula because if you assume '2' as the highest label, the formula doesn't work. Any idea to work around it?

06:24 Anand has quit [Ping timeout: 240 seconds]

06:32 Anand has joined #mlpack

06:33 < Anand> Marcus : Some preprocessing of the true labels csv file will give me the highest label and then I can use it in the formula

06:33 < Anand> Just a bit of parsing work has to be done

06:33 < Anand> But only if we agree on this being the right thing to do

06:37 < marcus_zoq> So you will search for the highest label and then use the label in the formula?

07:00 Anand has quit [Ping timeout: 240 seconds]

07:08 cuphrody has joined #mlpack

07:59 Anand has joined #mlpack

07:59 < Anand> Marcus : Yes, parsing the file and extracting label seems to be a valid option
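The parsing step Anand describes could look roughly like the sketch below. It assumes the true labels file holds one numeric label per row in the first column; the log does not show the actual file layout, and the function name `highest_label` is hypothetical.

```python
import csv
import io


def highest_label(csv_file):
    """Return the largest numeric label in the first column of csv_file.

    Assumption: one label per row, stored in the first column.
    """
    reader = csv.reader(csv_file)
    # Skip empty rows; accept labels written as "2" or "2.0".
    labels = [int(float(row[0])) for row in reader if row and row[0].strip()]
    if not labels:
        raise ValueError("no labels found")
    return max(labels)


# Usage with an in-memory file standing in for the true labels csv.
sample = io.StringIO("0\n2\n1\n2\n")
print(highest_label(sample))
```

The returned value could then be plugged into the formula in place of a hard-coded highest label of 1.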

08:04 < marcus_zoq> Anand: Sure it will work when the highest label is 1, are we talking about the same thing? :)

08:16 < Anand> No. I am concerned about the case when highest label is not 1.

08:16 < Anand> What then?

08:16 < Anand> Will the previously mentioned approach work then?

08:40 < marcus_zoq> I don't think so.

08:46 < Anand> What do you suggest then?

08:46 < Anand> I need this metric to work for multi class classifiers

08:49 < Anand> And why do you think it will not work?

08:54 < marcus_zoq> Maybe I'm wrong, but you can test the approach with a made-up example, both binary and multiclass, and check the results.

09:05 < marcus_zoq> I have to think about a solution.

09:08 < marcus_zoq> Perhaps in the meantime you can define the labels transition function?

09:08 < marcus_zoq> Or maybe I'm wrong ...

09:19 < Anand> Whether we need a labels transition function or not depends on the datasets being used. For now, we have numbers in all our datasets.

09:38 Anand has quit [Ping timeout: 240 seconds]

09:49 Anand has joined #mlpack

10:08 Anand has quit [Ping timeout: 240 seconds]

10:28 oldbeardo has joined #mlpack

10:29 < oldbeardo> marcus_zoq: I pushed a blog post, but for some reason it's not visible on the blog page

10:39 oldbeardo has quit [Quit: Page closed]

10:49 udit_s has joined #mlpack

11:18 < marcus_zoq> oldbeardo: You've missed some newlines in the blog header. I've fixed the header: https://github.com/zoq/blog/commit/8bcf9103787afd852b0ce8de3edc9ee7665726f6

12:17 andrewmw94 has joined #mlpack

13:14 < udit_s> marcus_zoq: hey.

13:18 < marcus_zoq> udit_s: Hello!

13:20 < udit_s> hey, I've just sent you a mail. I also wanted your help because I'm unable to find the IRC logs from Thursday. I had to take a few links off of them.

13:28 < marcus_zoq> udit_s: On that day there was a cooling failure, so the script failed to generate the log.

13:29 < udit_s> wow. talk about Murphy's law...

13:29 < marcus_zoq> udit_s: But I wonder if we actually lost the log file.

13:29 < udit_s> how can I access it ?

13:30 < marcus_zoq> udit_s: My log file is gone, maybe naywhayare has a backup log?

13:32 < udit_s> marcus_zoq: okay, but he seems unresponsive. I think he's busy with his paper. Any other suggestions ?

13:32 < udit_s> Basically, I had to discretize continuous attributes.

13:33 < marcus_zoq> Right

13:34 < udit_s> so I was wondering about the supervised discretization method. A few questions.

13:38 govg has joined #mlpack

13:38 govg has quit [Changing host]

13:38 govg has joined #mlpack

13:38 govg has quit [Client Quit]

13:38 govg has joined #mlpack

13:38 govg has quit [Changing host]

13:38 govg has joined #mlpack

14:17 govg has quit [Ping timeout: 240 seconds]

14:19 govg has joined #mlpack

14:19 govg has quit [Changing host]

14:19 govg has joined #mlpack

14:43 < naywhayare> udit_s: I can't seem to find a log from Thursday

14:44 < naywhayare> but I'm pretty sure I have the log somewhere. I'll keep looking...

14:44 < udit_s> naywhayare: okay, I'm working on unsupervised equi-width binning.

14:45 < udit_s> I'm sorting the continuous values and then splitting them using binning.
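As a rough illustration of unsupervised equal-width binning, the sketch below assigns each continuous value to one of `num_bins` intervals of equal width between the minimum and maximum. It is not udit_s's implementation; the function name and parameters are hypothetical. Sorting is not strictly required for equal-width bins, so it is omitted here.

```python
def equal_width_bins(values, num_bins):
    """Map each continuous value to a bin index in [0, num_bins - 1]."""
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls in the first bin.
        return [0] * len(values)
    width = (hi - lo) / num_bins
    # The maximum value would otherwise land in bin num_bins, so clamp it
    # into the last bin.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


print(equal_width_bins([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], 3))
```

Each resulting bin index can then be treated as a discrete attribute value.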

14:45 < naywhayare> okay

14:45 < naywhayare> were you looking for the paper that had the binning algorithm in it?

14:46 < naywhayare> I can probably dig that link up again if you need

14:46 < udit_s> yeah. and the weka one.

14:47 < naywhayare> there was code in Weka somewhere; you can download the source and look for the OneR class

14:47 < naywhayare> let me find the paper...

14:50 < naywhayare> the paper is "Very simple classification rules perform well on most commonly used datasets" and the algorithm was in one of the appendices

14:50 < naywhayare> http://ourmine.googlecode.com/svn-history/r1513/trunk/share/pdf/93holte.pdf

14:52 < udit_s> got it ! thanks. btw. are you free in a while ?

14:59 < udit_s> naywhayare: do you remember there was this paper which talked about sorting the continuously valued attributes first ? Could you share that ?

15:01 cuphrody has quit [Ping timeout: 276 seconds]

15:20 < naywhayare> udit_s: sorry, I stepped out

15:20 < naywhayare> I don't remember which paper that was...

15:20 < naywhayare> when I get to my lab computer I will see if I have the logs there

15:22 Anand_ has joined #mlpack

15:44 < marcus_zoq> Anand_: Did you test the formula with some made up values?

15:45 < Anand_> Yes, the formula will work

15:45 < Anand_> But, I don't know if it is correct to convert the formula

15:45 < Anand_> for multi-class classification

15:46 < Anand_> I was going through a paper to see if I can find something

15:46 < Anand_> Didn't find much

15:46 < marcus_zoq> Okay, I'll see if I can find something

15:47 govg has quit [Quit: leaving]

15:47 < Anand_> Ok. I will tell you if I find the right thing

15:57 govg has joined #mlpack

15:57 govg has quit [Changing host]

15:57 govg has joined #mlpack

15:59 < naywhayare> udit_s: somehow I found the logs... let me put them in the right place and rebuild the irc log page

16:09 < naywhayare> udit_s: ok, done: http://mlpack.org/irc/mlpack.20140529.html

16:38 govg has quit [Quit: leaving]

16:42 < Anand_> Marcus : One way to make one vs all work is to calculate the mean predictive information for each class by assuming it as 0-class and all other classes as 1-class and then take average

16:42 < Anand_> We won't need to modify the formula for this

16:42 < Anand_> Just the file parsing will increase
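The one-vs-all idea above can be sketched as follows: for each class, relabel it as the 0-class and everything else as the 1-class, evaluate a binary metric, and average the results. The log does not give the mean predictive information formula, so `binary_accuracy` below is only a placeholder stand-in for it, and all names are hypothetical.

```python
def one_vs_all_average(true_labels, predicted_labels, binary_metric):
    """Average a binary metric over one-vs-all relabelings of each class."""
    classes = sorted(set(true_labels))
    scores = []
    for c in classes:
        # As in the chat: the chosen class becomes the 0-class and all
        # other classes become the 1-class.
        true_bin = [0 if t == c else 1 for t in true_labels]
        pred_bin = [0 if p == c else 1 for p in predicted_labels]
        scores.append(binary_metric(true_bin, pred_bin))
    return sum(scores) / len(scores)


def binary_accuracy(true_bin, pred_bin):
    """Placeholder binary metric; NOT the mean predictive information."""
    matches = sum(1 for t, p in zip(true_bin, pred_bin) if t == p)
    return matches / len(true_bin)


print(one_vs_all_average([0, 1, 2, 2], [0, 2, 2, 2], binary_accuracy))
```

Because the binary metric is passed in as a function, the underlying formula stays unchanged and only the relabeling (the extra "file parsing") grows with the number of classes.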

16:43 wavelander has joined #mlpack

16:43 < udit_s> naywhayare: good stuff !

16:43 wavelander has left #mlpack []

16:48 govg has joined #mlpack

16:48 govg has quit [Changing host]

16:48 govg has joined #mlpack

17:04 oldbeardo has joined #mlpack

17:05 < oldbeardo> marcus_zoq: sorry about the earlier message, I guess it takes a few minutes

17:08 < naywhayare> oldbeardo: marcus_zoq: something weird happened and Jenkins lost the whole workspace so I just had Jenkins rebuild the blog and now it seems to be okay

17:15 < oldbeardo> and sorry again, I just saw your reply to my message

17:16 oldbeardo has quit [Quit: Page closed]

17:17 oldbeardo has joined #mlpack

17:18 < oldbeardo> naywhayare: okay, didn't receive your message on irc, I have to keep watching on the logs, stupid net connection

17:18 < naywhayare> :( sorry about your connection

17:18 < naywhayare> it's a good thing the irc logging system (mostly) works, though :)

17:19 < oldbeardo> yeah, that's good, earlier when we had just started out it didn't show messages immediately

17:19 < naywhayare> if you find problems with it, let me know so that I can fix them. it will have to wait until after my paper deadline though

17:19 < naywhayare> I know that the month navigation stopped working in June for some reason

17:19 < naywhayare> not sure why

17:20 < oldbeardo> okay, will do, let's hope I don't have to

17:25 oldbeardo has quit [Quit: Page closed]

17:31 Anand_ has quit [Ping timeout: 240 seconds]

18:02 govg has quit [Ping timeout: 240 seconds]

19:49 < andrewmw94> naywhayare: is there a reason you use the terminology leafSize rather than maximumLeafSize or something like that?

19:49 < andrewmw94> I would prefer the latter, but I also want to keep it consistent across trees

19:50 udit_s has quit [Remote host closed the connection]

19:50 < andrewmw94> also, I have numChildren to refer to the number of childNodes that a given node has on the level below it, but the way numChildren() is implemented for the BSP tree, it will return the total number of descendants

19:51 < andrewmw94> which makes sense, since a BSP node has two children, but for the R tree variants, I need to have some variable to track the number of children on the next level.

19:58 < naywhayare> andrewmw94: leafSize is just a historical term that's been in use for a long time. I guess maximumLeafSize would actually be more accurate

19:58 < naywhayare> we can change the name of leafSize in other trees to reflect that, so you can go ahead and use maximumLeafSize

19:58 < andrewmw94> what about numChildren()

19:58 < naywhayare> and if you want, you can change all references to leafSize in the rest of the code, because almost certainly I'm going to forget to do it and I don't have time to do it now

19:59 < naywhayare> NumChildren() for binary space trees does only return the number of children and not the number of descendants (that can be obtained with NumDescendants())

19:59 < naywhayare> so for binary space trees NumChildren() is either 0, 1, or 2

19:59 < naywhayare> actually I think it'll never be 1

20:00 < naywhayare> I wouldn't think that it would be hard to track the number of children that an R tree has; shouldn't that be something that each node stores anyway?

20:01 < andrewmw94> yeah. I for some reason thought that it was supposed to return all descendants

20:01 < naywhayare> also NumDescendants() returns the number of descendant points, not the number of descendant nodes

20:01 < andrewmw94> and then I needed to use a variable for the children on the next node, and having both with similar names is confusing, but we don't actually have both

20:01 < naywhayare> there's not yet been any case where the number of descendant nodes has been useful

20:03 < andrewmw94> ahh, I think I was confused on what count meant when I took my notes in the first week

20:03 < andrewmw94> and that's why I got the purposes of numChildren and numDescendants confused

20:05 < naywhayare> yeah, begin and count are useful for the binary space tree because they denote which points in the matrix are descendants of a given node
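To make the distinction concrete, here is a toy Python model of a binary space tree node. It is only an illustration of the semantics described in the chat, not mlpack's C++ implementation; the class and method names are hypothetical. The node stores `begin` and `count`, so its descendant points are the columns `begin` through `begin + count - 1` of the dataset matrix.

```python
class Node:
    """Toy binary space tree node (illustrative only, not mlpack's class)."""

    def __init__(self, begin, count, left=None, right=None):
        self.begin = begin  # index of the first descendant point
        self.count = count  # number of descendant points
        self.left = left
        self.right = right

    def num_children(self):
        # Number of child nodes directly below this node.
        return sum(1 for c in (self.left, self.right) if c is not None)

    def num_descendants(self):
        # Number of descendant points, not descendant nodes.
        return self.count

    def descendant_indices(self):
        # Which points in the matrix belong to this node.
        return range(self.begin, self.begin + self.count)


left = Node(0, 3)
right = Node(3, 5)
root = Node(0, 8, left, right)
print(root.num_children(), root.num_descendants())
print(list(right.descendant_indices()))
```

An R tree node would differ mainly in holding a variable-length list of children, so the child count would be the length of that list rather than at most 2.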

20:35 < jenkins-mlpack> Starting build #1930 for job mlpack - svn checkin test (previous build: SUCCESS)

21:08 < jenkins-mlpack> Project mlpack - svn checkin test build #1930: SUCCESS in 33 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/1930/

21:08 < jenkins-mlpack> andrewmw94: change name of leafSize to maxLeafSize. more stuff for the R-tree. Some name changes, some more node splitting, a start on traversal.

22:53 andrewmw94 has quit [Ping timeout: 245 seconds]

23:00 andrewmw94 has joined #mlpack

23:20 andrewmw94 has quit [Remote host closed the connection]

23:29 < jenkins-mlpack> Starting build #1931 for job mlpack - svn checkin test (previous build: SUCCESS)