verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
vivekp has quit [Read error: Connection reset by peer]
vivekp has joined #mlpack
vivekp has quit [Read error: Connection reset by peer]
vivekp has joined #mlpack
vivekp has quit [Ping timeout: 240 seconds]
ImQ009 has joined #mlpack
< Atharva> zoq: https://github.com/mlpack/mlpack/blob/3b7bbf0f14172cdb00fd16cbf12918b07c888b96/src/mlpack/methods/ann/layer/sequential_impl.hpp#L75
< Atharva> What is the reason for setting reset to true here?
< Atharva> It causes a problem when, inside the Sequential layer, the first layer is Linear and the second is Convolution: the reset sets the height and width of the convolutional layer to 0.
vivekp has joined #mlpack
vivekp has quit [Read error: Connection reset by peer]
vivekp has joined #mlpack
ImQ009 has quit [Read error: Connection reset by peer]
ImQ009 has joined #mlpack
travis-ci has joined #mlpack
< travis-ci> manish7294/mlpack#73 (impBounds - b7e25ab : Manish): The build is still failing.
travis-ci has left #mlpack []
travis-ci has joined #mlpack
< travis-ci> manish7294/mlpack#8 (impBounds - b7e25ab : Manish): The build is still failing.
travis-ci has left #mlpack []
ImQ009 has quit [Read error: Connection reset by peer]
ImQ009 has joined #mlpack
< ShikharJ> zoq: Kris had mentioned that with SSRBM on the digits dataset, the accuracy he obtained was about 82%, so we're good on that number. But with BinaryRBM he had mentioned an accuracy of 86% (we are at about 70% now). I'm unsure how he obtained a number that high; I'll probably look for the comments in the PR as well.
travis-ci has joined #mlpack
< travis-ci> manish7294/mlpack#74 (Patch - 23fd8b1 : Manish): The build was fixed.
travis-ci has left #mlpack []
travis-ci has joined #mlpack
< travis-ci> manish7294/mlpack#10 (Patch - dfff872 : Manish): The build was fixed.
travis-ci has left #mlpack []
< ShikharJ> zoq: Ah, okay, this is embarrassing; I just had to reduce the stepSize a bit, and we're hitting ~80% accuracy on BinaryRBM as well. It could be because we're taking mini-batches, and a larger stepSize would suit a single input batch better. Please review whenever you're free.
travis-ci has joined #mlpack
< travis-ci> manish7294/mlpack#75 (Patch - dfff872 : Manish): The build was fixed.
travis-ci has left #mlpack []
navdeep has joined #mlpack
< navdeep> Hi, I trained a model using sklearn's random forest and got an accuracy of ~80% on test data. When I trained on the same data with mlpack's random_forest, using the same number of trees and the same minimum_leaf_size, I got 68% accuracy on the same test data. Any reason why that'd happen?
< ShikharJ> navdeep: I can't really say why you'd be getting a lower score, but it'd really be helpful if you could provide the scripts you used for the sklearn and mlpack random forest code. That way it's easier for us to ascertain the cause.
< navdeep> model_rf = RandomForestClassifier()
< navdeep> tuned_parameters = {'min_samples_leaf': range(2,16,2), 'n_estimators': range(50,250,50), 'min_samples_split': range(2,16,2)}
< navdeep> rf_model = GridSearchCV(model_rf, tuned_parameters, cv=5, scoring='accuracy', n_jobs=-1)
< navdeep> rf_model.fit(X_train.values, y_train['label'])
< navdeep> this is what I get as the best parameters: min_samples_leaf=2, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=150
< navdeep> So I use mlpack like this:
< navdeep> mlpack.random_forest(training=X_train, labels=y_train['label'], print_training_accuracy=True, num_trees=150, minimum_leaf_size=2, verbose=True)
< rcurtin> navdeep: note that the accuracy depends on the threshold that you use; did you try making an ROC curve to compare the models?
< rcurtin> also I'd expect minimum_leaf_size == 1 to give the best performance
< navdeep> I haven't drawn an ROC curve yet
< navdeep> What do you mean by threshold?
< navdeep> the only available input parameters are:
< navdeep> copy_all_inputs (bool), input_model (RandomForestModelType), labels, minimum_leaf_size (int), num_trees (int), print_training_accuracy (bool), test (matrix), test_labels (row vector), training (matrix), verbose (bool)
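(For reference, a minimal sketch of a held-out evaluation using the test/test_labels options from that list, assuming the binding returns a dict with a 'predictions' entry; X_test and y_test are hypothetical stand-ins for navdeep's held-out data:)

    import numpy as np
    import mlpack

    # Train on the training set and predict on the held-out set in one call,
    # via the documented test/test_labels input parameters.
    result = mlpack.random_forest(training=X_train, labels=y_train['label'],
                                  test=X_test, test_labels=y_test['label'],
                                  num_trees=150, minimum_leaf_size=2,
                                  print_training_accuracy=True, verbose=True)
    # Compare the returned test-set predictions against the true labels.
    predictions = np.asarray(result['predictions']).flatten()
    print('test accuracy:', np.mean(predictions == np.asarray(y_test['label'])))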
< navdeep> rcurtin: I was reading this post http://lists.mlpack.org/pipermail/mlpack/2018-May/003752.html which seems to be written by you, but how do you set the threshold in the API?
< navdeep> @rcurtin
ImQ009 has quit [Quit: Leaving]
< rcurtin> navdeep: sorry, I stepped out and can give a better response later
< rcurtin> but in essence, use the Predict() overload that returns class probabilities, then classify based on those
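(The thresholding idea as a minimal sketch: given a per-class probability matrix from the probability-returning overload, predict the positive class whenever its probability exceeds a chosen cutoff instead of the implicit 0.5; probs here is a synthetic stand-in:)

    import numpy as np

    # Synthetic stand-in for classifier output: one row per sample,
    # one column per class (column 1 = positive class).
    probs = np.array([[0.70, 0.30],
                      [0.40, 0.60],
                      [0.55, 0.45]])

    threshold = 0.4  # sweep this to trade false positives against false negatives
    # Predict class 1 whenever its probability exceeds the threshold, rather
    # than taking the argmax (which amounts to a fixed 0.5 cutoff for 2 classes).
    predictions = (probs[:, 1] >= threshold).astype(int)
    print(predictions)  # -> [0 1 1]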
navdeep has quit [Ping timeout: 252 seconds]
navdeep has joined #mlpack
< navdeep> rcurtin: I am already using the probability overload. My question is still why the same algorithm returns different results for sklearn vs. mlpack
< rcurtin> navdeep: there are a couple of things
< rcurtin> first, like I said, the accuracy depends on the threshold, so to compare these correctly you should look at ROC curves
< rcurtin> second, there are minor implementation differences that could matter
< rcurtin> I see that in scikit, they take max_features = sqrt(dimensions)
< rcurtin> I see that mlpack's implementation uses a default of 3, which is not easy to change unless you write C++
< rcurtin> so an option should definitely be added for that, and I'll try to make sure I do it this week (Monday perhaps)
< rcurtin> but that may or may not be what's making the difference here; an ROC curve would show more
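(A minimal sketch of that ROC comparison using sklearn.metrics; y_test and the two probability vectors are illustrative stand-ins for the real labels and the class-1 probabilities produced by each model:)

    import numpy as np
    from sklearn.metrics import roc_curve, auc

    # In practice: probs_sklearn = rf_model.predict_proba(X_test)[:, 1], and
    # probs_mlpack = the corresponding class-1 probabilities from mlpack.
    y_test = np.array([0, 0, 1, 1, 1])
    probs_sklearn = np.array([0.10, 0.40, 0.35, 0.80, 0.70])
    probs_mlpack = np.array([0.20, 0.30, 0.60, 0.75, 0.50])

    # A threshold-independent comparison: one ROC curve (and AUC) per model.
    for name, probs in [('sklearn', probs_sklearn), ('mlpack', probs_mlpack)]:
        fpr, tpr, _ = roc_curve(y_test, probs)
        print(name, 'AUC:', auc(fpr, tpr))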