verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
vivekp has quit [Ping timeout: 240 seconds]
vivekp has joined #mlpack
govg has joined #mlpack
govg has quit [Ping timeout: 260 seconds]
govg has joined #mlpack
< wiking> zoq, :) ok so just one more question
< wiking> which datasets are you using for rf? :)
< wiking> rcurtin, ^
< wiking> as i've just found a terrible bug in shogun's rf
< wiking> and i really wonder how that compares to other rf implementations :D
vivekp has quit [Ping timeout: 260 seconds]
vivekp has joined #mlpack
< zoq> wiking: We can add more if you like to see results for an interesting dataset.
klitzy_ has joined #mlpack
vivekp has quit [Ping timeout: 240 seconds]
klitzy_ has quit [Client Quit]
< wiking> zoq, lemme check
< wiking> i wanna run an experiment :)
< wiking> 'datasets/iris_train.csv', 'datasets/iris_test.csv', 'datasets/iris_labels.csv' are the ones here, right
< wiking> ?
< wiking> zoq, ^
vivekp has joined #mlpack
< zoq> sometimes bitbucket is slow so downloading from masterblaster might be faster
< wiking> thnx
vivekp has quit [Ping timeout: 240 seconds]
vivekp has joined #mlpack
< wiking> zoq, still around?
< wiking> say regarding the oilspill dataset
< wiking> nothing, it was my fault
< wiking> but i have some constructive criticism :)
< wiking> lemme know when u r around
< rcurtin> wiking: the dataset formats are just a little bit odd; the training datasets have the labels as the last column, but the test dataset and its labels are separate
< wiking> yep yep
< wiking> rcurtin, morning
< wiking> so i've got that part
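The layout rcurtin describes (training file with the labels in the last column, test points and test labels in separate files) can be read along these lines. This is only a minimal sketch using the Python standard library: the helper names `load_train` and `load_test` and the tiny in-memory CSVs are made up for illustration, and it assumes comma-separated files without a header row.

```python
import csv
import io


def load_train(handle):
    """Read a training CSV whose last column is the label."""
    features, labels = [], []
    for row in csv.reader(handle):
        if not row:  # skip blank lines
            continue
        *values, label = row
        features.append([float(v) for v in values])
        labels.append(label)  # kept as a string; no numeric labels assumed
    return features, labels


def load_test(points_handle, labels_handle):
    """Read test points and their labels from two separate CSVs."""
    features = [[float(v) for v in row] for row in csv.reader(points_handle) if row]
    labels = [row[0] for row in csv.reader(labels_handle) if row]
    if len(features) != len(labels):
        raise ValueError("test points and test labels differ in length")
    return features, labels


# Hypothetical stand-ins for files such as iris_train.csv / iris_test.csv /
# iris_labels.csv; the values below are invented, not taken from the real data.
train_csv = io.StringIO("5.1,3.5,0\n6.2,2.9,1\n5.9,3.0,1\n")
test_csv = io.StringIO("5.0,3.4\n6.1,2.8\n")
labels_csv = io.StringIO("0\n1\n")

X_train, y_train = load_train(train_csv)
X_test, y_test = load_test(test_csv, labels_csv)
print(len(X_train), "training rows,", len(X_test), "test rows")
```

With real files, the `io.StringIO` objects would be replaced by `open(path, newline="")` handles.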
< wiking> but now i have some real issue
< wiking> i mean this one is good for a blog
< wiking> but we should have a chat about it
< wiking> so say about oilspill, right
< wiking> it's a simple binary classification problem
< wiking> now i've found out a huge bug in shogun's RF
< wiking> that makes that RF implementation nothing more than a boosted tree (note the singular term)
< rcurtin> yeah, I saw you mention that in $shogun
< rcurtin> er, #shogun
< wiking> yep yep
< wiking> so now i compare sklearn vs shogun
< wiking> using oilspill
< wiking> basically shogun's RF is better than sklearn's :D
< wiking> which is of course crazy :P
< rcurtin> so, you are saying that just a boosted tree is outperforming random forests in this case
< wiking> yes
< rcurtin> and when you say better, you mean accuracy or some related metric
< wiking> accuracy: 0.967948717949. vs 0.96474358974358976
< wiking> (shogun vs sklearn)
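To put the gap between those two accuracies in perspective, one can recover the most plausible "correct / total" ratios behind the reported decimals. This is a rough sketch, not something from the log: it assumes each accuracy is a plain fraction of correctly classified test rows and that the test set has at most 1000 rows (the `max_rows` bound is an arbitrary assumption).

```python
from fractions import Fraction
from math import gcd

# Accuracies exactly as quoted in the channel.
shogun_acc = "0.967948717949"
sklearn_acc = "0.96474358974358976"


def as_ratio(acc, max_rows=1000):
    """Closest fraction to `acc` whose denominator is at most max_rows."""
    return Fraction(acc).limit_denominator(max_rows)


shogun_frac = as_ratio(shogun_acc)
sklearn_frac = as_ratio(sklearn_acc)

# Smallest test-set size for which both accuracies are whole-row counts
# (least common multiple of the two denominators).
n_rows = (shogun_frac.denominator * sklearn_frac.denominator
          // gcd(shogun_frac.denominator, sklearn_frac.denominator))

shogun_correct = shogun_frac.numerator * (n_rows // shogun_frac.denominator)
sklearn_correct = sklearn_frac.numerator * (n_rows // sklearn_frac.denominator)

print(f"shogun:  {shogun_correct}/{n_rows}")
print(f"sklearn: {sklearn_correct}/{n_rows}")
print(f"difference in correct rows: {shogun_correct - sklearn_correct}")
```

Under those assumptions the numbers are consistent with 302 versus 301 correct rows out of 312, so the difference would come down to a single test point.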
< rcurtin> right; I could see it
< wiking> so maybe some of the datasets
< rcurtin> I would be interested to see if the pattern applies on other datasets also
< wiking> are not the best choice to test
< wiking> rfs
< wiking> but yeah i'll do this
< wiking> now for all the datasets
< wiking> that you guys are doing
< wiking> and share the results :)
< rcurtin> yeah, so I am currently going through some of the benchmarks that I'd like to highlight and picking datasets more carefully
< wiking> sure thing
< rcurtin> for instance, with many of the logistic regression ones, I am not seeing better than 50% accuracy on some datasets, so I think something is wrong with some of them
< wiking> as this one is clearly misleading :D
< rcurtin> so I'm trying to replace them with other datasets that give more useful results
< wiking> i mean the results :P
< rcurtin> right
< rcurtin> here's an example of the display I am working on:
< rcurtin> uh, the interface is bad ...
< rcurtin> so select "All metric plots for parameters sweeps", then LogisticRegression, whatever dataset, and whatever metric
< rcurtin> I think the reuters dataset gives a nice graph, some of the others are kinda ugly
< wiking> ah, this is metric vs speed? :)
< wiking> nice
< wiking> we did those at datarobot :)
< wiking> mmm
< wiking> i wonder why shogun's LR is so bad :)
< rcurtin> not sure, I haven't dug into it
< rcurtin> yeah, I am adding some more libsvm datasets
< zoq> I could share the converted datasets.
< rcurtin> oh, do you have more that you've converted?
< rcurtin> I was working with the epsilon_normalized dataset but I screwed it up and will have to start over... :)
< zoq> Yeah, once I get home I can provide the datasets.
< rcurtin> sure, sounds good
< zoq> Looks like I haven't extracted the test labels as another file, but it should be easy to modify the script accordingly: https://github.com/zoq/datasets
< rcurtin> zoq: very nice, thank you
govg has quit [Ping timeout: 248 seconds]
witness has joined #mlpack
cult- has left #mlpack []