verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
vivekp has quit [Ping timeout: 240 seconds]
vivekp has joined #mlpack
govg has joined #mlpack
govg has quit [Ping timeout: 260 seconds]
govg has joined #mlpack
< wiking> zoq, :) ok so just one more question
< wiking> which are the datasets that you are using for rf? :)
< wiking> rcurtin, ^
< wiking> as i've just found a terrible bug in shogun's rf
< wiking> and i really wonder how that compares to other rf implementations :D
< zoq> sometimes bitbucket is slow, so downloading from masterblaster might be faster
< wiking> thnx
vivekp has quit [Ping timeout: 240 seconds]
vivekp has joined #mlpack
< wiking> zoq, still around?
< wiking> say regarding the oilspill dataset
< wiking> nothing, it was my fault
< wiking> but i have some constructive criticism :)
< wiking> lemme know when u r around
< rcurtin> wiking: the dataset formats are just a little bit odd; the training datasets have the labels as the last column, but the test dataset and its labels are separate
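A rough sketch of loading this format in Python (file names here are hypothetical, not the benchmark's actual ones):

    import numpy as np

    # Training file: features plus the class labels as the last column.
    train = np.genfromtxt('oilspill_train.csv', delimiter=',')
    X_train, y_train = train[:, :-1], train[:, -1]

    # Test features and test labels live in separate files.
    X_test = np.genfromtxt('oilspill_test.csv', delimiter=',')
    y_test = np.genfromtxt('oilspill_test_labels.csv', delimiter=',')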
< wiking> yep yep
< wiking> rcurtin, morning
< wiking> so i've got that part
< wiking> but now i have a real issue
< wiking> i mean this one is good for a blog
< wiking> but we should have a chat about it
< wiking> so say about oilspill, right
< wiking> it's a simple binary classification problem
< wiking> now i've found a huge bug in shogun's RF
< wiking> that makes that RF implementation nothing more than a boosted tree (note the singular term)
< rcurtin> yeah, I saw you mention that in $shogun
< rcurtin> er, #shogun
< wiking> yep yep
< wiking> so now i compare sklearn vs shogun using oilspill
< wiking> basically shogun's RF is better than sklearn's :D
< wiking> which is of course crazy :P
< rcurtin> so, you are saying that just a boosted tree is outperforming random forests in this case
< wiking> yes
< rcurtin> and when you say better, you mean accuracy or some related metric
< wiking> accuracy: 0.967948717949 vs 0.96474358974358976 (shogun vs sklearn)
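A minimal sketch of how the sklearn side of such a comparison might look (file names and hyperparameters are assumptions, not the actual benchmark configuration):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    # Load the hypothetical oilspill split described earlier.
    train = np.genfromtxt('oilspill_train.csv', delimiter=',')
    X_train, y_train = train[:, :-1], train[:, -1]
    X_test = np.genfromtxt('oilspill_test.csv', delimiter=',')
    y_test = np.genfromtxt('oilspill_test_labels.csv', delimiter=',')

    # Train a random forest and report test accuracy.
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(X_train, y_train)
    print('accuracy:', accuracy_score(y_test, rf.predict(X_test)))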
< rcurtin> right; I could see it
< wiking> so maybe some of the datasets
< rcurtin> I would be interested to see if the pattern applies on other datasets also
< wiking> are not the best choice to test rfs
< wiking> but yeah i'll do this now for all the datasets that you guys are using
< wiking> and share the results :)
< rcurtin> yeah, so I am currently going through some of the benchmarks that I'd like to highlight and picking datasets more carefully
< wiking> sure thing
< rcurtin> for instance, with many of the logistic regression ones I am not seeing better than 50% accuracy on some datasets, so I think something is wrong with some of them
< wiking> as this one is clearly misleading :D
< rcurtin> so I'm trying to replace them with other datasets that give more useful results
< wiking> i mean the results :P
< rcurtin> right
< rcurtin> here's an example of the display I am working on:
< rcurtin> yeah, I am adding some more libsvm datasets
< zoq> I could share the converted datasets.
< rcurtin> oh, do you have more that you've converted?
< rcurtin> I was working with the epsilon_normalized dataset but I screwed it up and will have to start over... :)
< zoq> Yeah, once I get home I can provide the datasets.
< rcurtin> sure, sounds good
< zoq> Looks like I haven't extracted the test labels as a separate file, but it should be easy to modify the script accordingly: https://github.com/zoq/datasets
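A possible sketch of that modification (zoq's actual conversion script is in the linked repository; the file names and the use of scikit-learn's libsvm loader here are assumptions):

    import numpy as np
    from sklearn.datasets import load_svmlight_file

    # Read a libsvm-format file and write features and labels to separate files.
    X, y = load_svmlight_file('dataset.libsvm')
    np.savetxt('dataset_features.csv', X.toarray(), delimiter=',')
    np.savetxt('dataset_labels.csv', y, delimiter=',')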