verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
vivekp has quit [Ping timeout: 240 seconds]
vivekp has joined #mlpack
govg has joined #mlpack
govg has quit [Ping timeout: 260 seconds]
govg has joined #mlpack
< wiking> zoq, :) ok so just one more question
< wiking> which are the datasets that you are using for rf? :)
< wiking> rcurtin, ^
< wiking> as i've just found a terrible bug in shogun's rf
< wiking> and i really wonder how that compares to other rf implementations :D
< zoq> sometimes bitbucket is slow, so downloading from masterblaster might be faster
< wiking> thnx
vivekp has quit [Ping timeout: 240 seconds]
vivekp has joined #mlpack
< wiking> zoq, still around?
< wiking> say regarding the oilspill dataset
< wiking> nothing, it was my fault
< wiking> but i have some constructive criticism :)
< wiking> lemme know when u r around
< rcurtin> wiking: the dataset formats are just a little bit odd; the training datasets have the labels as the last column, but the test dataset and its labels are separate
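A rough sketch of loading this format in Python (file names here are hypothetical, not the benchmark's actual ones):

    import numpy as np

    # Training file: features plus the class labels as the last column.
    train = np.genfromtxt('oilspill_train.csv', delimiter=',')
    X_train, y_train = train[:, :-1], train[:, -1]

    # Test features and test labels live in separate files.
    X_test = np.genfromtxt('oilspill_test.csv', delimiter=',')
    y_test = np.genfromtxt('oilspill_test_labels.csv', delimiter=',')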
< wiking> yep yep
< wiking> rcurtin, morning
< wiking> so i've got that part
< wiking> but now i have a real issue
< wiking> i mean this one is good for a blog
< wiking> but we should have a chat about it
< wiking> so say about oilspill, right
< wiking> it's a simple binary classification problem
< wiking> now i've found a huge bug in shogun's RF
< wiking> that makes that RF implementation nothing more than a boosted tree (note the singular term)
< rcurtin> yeah, I saw you mention that in $shogun
< rcurtin> er, #shogun
< wiking> yep yep
< wiking> so now i compare sklearn vs shogun using oilspill
< wiking> basically shogun's RF is better than sklearn's :D
< wiking> which is of course crazy :P
< rcurtin> so, you are saying that just a boosted tree is outperforming random forests in this case
< wiking> yes
< rcurtin> and when you say better, you mean accuracy or some related metric
< wiking> accuracy: 0.967948717949 vs 0.96474358974358976 (shogun vs sklearn)
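A minimal sketch of how the sklearn side of such a comparison might look (file names and hyperparameters are assumptions, not the actual benchmark configuration):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    # Load the hypothetical oilspill split described earlier.
    train = np.genfromtxt('oilspill_train.csv', delimiter=',')
    X_train, y_train = train[:, :-1], train[:, -1]
    X_test = np.genfromtxt('oilspill_test.csv', delimiter=',')
    y_test = np.genfromtxt('oilspill_test_labels.csv', delimiter=',')

    # Train a random forest and report test accuracy.
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(X_train, y_train)
    print('accuracy:', accuracy_score(y_test, rf.predict(X_test)))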
< rcurtin> right; I could see it
< wiking> so maybe some of the datasets
< rcurtin> I would be interested to see if the pattern applies on other datasets also
< wiking> are not the best choice to test rfs
< wiking> but yeah i'll do this now for all the datasets that you guys are using
< wiking> and share the results :)
< rcurtin> yeah, so I am currently going through some of the benchmarks that I'd like to highlight and picking datasets more carefully
< wiking> sure thing
< rcurtin> for instance, with many of the logistic regression ones I am not seeing better than 50% accuracy on some datasets, so I think something is wrong with some of them
< wiking> as this one is clearly misleading :D
< rcurtin> so I'm trying to replace them with other datasets that give more useful results
< wiking> i mean the results :P
< rcurtin> right
< rcurtin> here's an example of the display I am working on:
< rcurtin> yeah, I am adding some more libsvm datasets
< zoq> I could share the converted datasets.
< rcurtin> oh, do you have more that you've converted?
< rcurtin> I was working with the epsilon_normalized dataset but I screwed it up and will have to start over... :)
< zoq> Yeah, once I get home I can provide the datasets.
< rcurtin> sure, sounds good
< zoq> Looks like I haven't extracted the test labels as a separate file, but it should be easy to modify the script accordingly: https://github.com/zoq/datasets
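A possible sketch of that modification (zoq's actual conversion script is in the linked repository; the file names and the use of scikit-learn's libsvm loader here are assumptions):

    import numpy as np
    from sklearn.datasets import load_svmlight_file

    # Read a libsvm-format file and write features and labels to separate files.
    X, y = load_svmlight_file('dataset.libsvm')
    np.savetxt('dataset_features.csv', X.toarray(), delimiter=',')
    np.savetxt('dataset_labels.csv', y, delimiter=',')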