verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
vivekp has quit [Read error: Connection reset by peer]
vivekp has joined #mlpack
vivekp has quit [Read error: Connection reset by peer]
vivekp has joined #mlpack
vivekp has quit [Ping timeout: 240 seconds]
ImQ009 has joined #mlpack
< Atharva> zoq: https://github.com/mlpack/mlpack/blob/3b7bbf0f14172cdb00fd16cbf12918b07c888b96/src/mlpack/methods/ann/layer/sequential_impl.hpp#L75
< Atharva> What is the reason for setting reset to true here?
< Atharva> It causes a problem when, inside the Sequential layer, the first layer is Linear and the second is Convolution: the reset sets the height and width of the convolutional layer to 0.
vivekp has joined #mlpack
vivekp has quit [Read error: Connection reset by peer]
vivekp has joined #mlpack
ImQ009 has quit [Read error: Connection reset by peer]
ImQ009 has joined #mlpack
travis-ci has joined #mlpack
< travis-ci> manish7294/mlpack#73 (impBounds - b7e25ab : Manish): The build is still failing.
travis-ci has left #mlpack []
travis-ci has joined #mlpack
< travis-ci> manish7294/mlpack#8 (impBounds - b7e25ab : Manish): The build is still failing.
travis-ci has left #mlpack []
ImQ009 has quit [Read error: Connection reset by peer]
ImQ009 has joined #mlpack
< ShikharJ> zoq: Kris had mentioned that with SSRBM on the digits dataset, the accuracy he obtained was about 82%, so we're good on that number. But with BinaryRBM he had mentioned an accuracy of 86% (we are at about 70% now). I'm unsure how he obtained a number that high; I'll probably look for the comments in the PR as well.
travis-ci has joined #mlpack
< travis-ci> manish7294/mlpack#74 (Patch - 23fd8b1 : Manish): The build was fixed.
travis-ci has left #mlpack []
travis-ci has joined #mlpack
< travis-ci> manish7294/mlpack#10 (Patch - dfff872 : Manish): The build was fixed.
travis-ci has left #mlpack []
< ShikharJ> zoq: Ah, okay, this is embarrassing; I just had to reduce the stepSize a bit, and we're hitting ~80% accuracy on BinaryRBM as well. It could be because we're taking mini-batches, and a larger stepSize would suit a single input batch better. Please review whenever you're free.
travis-ci has joined #mlpack
< travis-ci> manish7294/mlpack#75 (Patch - dfff872 : Manish): The build was fixed.
travis-ci has left #mlpack []
navdeep has joined #mlpack
< navdeep> Hi, I trained a model using sklearn's random forest and got an accuracy of ~80% on test data. When I trained on the same data with mlpack's random_forest, using the same number of trees and the same minimum_leaf_size, I got 68% accuracy on the same test data. Any reason why that'd happen?
< ShikharJ> navdeep: I can't really say why you'd be getting a lower score, but it'd really be helpful if you could provide the scripts you used for the sklearn and mlpack random forest code. That way it's easier for us to ascertain the cause.
< navdeep> model_rf = RandomForestClassifier()
< navdeep> tuned_parameters = {'min_samples_leaf': range(2,16,2), 'n_estimators': range(50,250,50), 'min_samples_split': range(2,16,2)}
< navdeep> rf_model = GridSearchCV(model_rf, tuned_parameters, cv=5, scoring='accuracy', n_jobs=-1)
< navdeep> rf_model.fit(X_train.values, y_train['label'])
< navdeep> this is what I get as the best parameters: min_samples_leaf=2, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=150
< navdeep> So I use mlpack like this:
< navdeep> mlpack.random_forest(training=X_train, labels=y_train['label'], print_training_accuracy=True, num_trees=150, minimum_leaf_size=2, verbose=True)
< rcurtin> navdeep: note that the accuracy depends on the threshold that you use; did you try making an ROC curve to compare the models?
< rcurtin> also I'd expect minimum_leaf_size == 1 to give the best performance
< navdeep> I haven't drawn an ROC curve yet
< navdeep> What do you mean by threshold?
< navdeep> the only available input parameters are:
< navdeep> copy_all_inputs (bool), input_model (RandomForestModelType), labels, minimum_leaf_size (int), num_trees (int), print_training_accuracy (bool), test (matrix), test_labels (row vector), training (matrix), verbose (bool)
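(For reference, a minimal sketch of a held-out evaluation using the test/test_labels options from that list, assuming the binding returns a dict with a 'predictions' entry; X_test and y_test are hypothetical stand-ins for navdeep's held-out data:)

    import numpy as np
    import mlpack

    # Train on the training set and predict on the held-out set in one call,
    # via the documented test/test_labels input parameters.
    result = mlpack.random_forest(training=X_train, labels=y_train['label'],
                                  test=X_test, test_labels=y_test['label'],
                                  num_trees=150, minimum_leaf_size=2,
                                  print_training_accuracy=True, verbose=True)
    # Compare the returned test-set predictions against the true labels.
    predictions = np.asarray(result['predictions']).flatten()
    print('test accuracy:', np.mean(predictions == np.asarray(y_test['label'])))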
< navdeep> rcurtin: I was reading this post http://lists.mlpack.org/pipermail/mlpack/2018-May/003752.html which seems to be written by you, but how do you set the threshold in the API?
< navdeep> @rcurtin
ImQ009 has quit [Quit: Leaving]
< rcurtin> navdeep: sorry, I stepped out and can give a better response later
< rcurtin> but in essence, use the Predict() overload that returns class probabilities, then classify based on those
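(The thresholding idea as a minimal sketch: given a per-class probability matrix from the probability-returning overload, predict the positive class whenever its probability exceeds a chosen cutoff instead of the implicit 0.5; probs here is a synthetic stand-in:)

    import numpy as np

    # Synthetic stand-in for classifier output: one row per sample,
    # one column per class (column 1 = positive class).
    probs = np.array([[0.70, 0.30],
                      [0.40, 0.60],
                      [0.55, 0.45]])

    threshold = 0.4  # sweep this to trade false positives against false negatives
    # Predict class 1 whenever its probability exceeds the threshold, rather
    # than taking the argmax (which amounts to a fixed 0.5 cutoff for 2 classes).
    predictions = (probs[:, 1] >= threshold).astype(int)
    print(predictions)  # -> [0 1 1]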
navdeep has quit [Ping timeout: 252 seconds]
navdeep has joined #mlpack
< navdeep> rcurtin: I am already using the probability overload. My question is still why the same algorithm returns different results for sklearn vs. mlpack
< rcurtin> navdeep: there are a couple of things
< rcurtin> first, like I said, the accuracy depends on the threshold, so to compare these correctly you should look at ROC curves
< rcurtin> second, there are minor implementation differences that could matter
< rcurtin> I see that in scikit, they take max_features = sqrt(dimensions)
< rcurtin> I see that mlpack's implementation uses a default of 3, which is not easy to change unless you write C++
< rcurtin> so an option should definitely be added for that, and I'll try to make sure I do it this week (Monday perhaps)
< rcurtin> but that may or may not be what's making the difference here; an ROC curve would show more
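(A minimal sketch of that ROC comparison using sklearn.metrics; y_test and the two probability vectors are illustrative stand-ins for the real labels and the class-1 probabilities produced by each model:)

    import numpy as np
    from sklearn.metrics import roc_curve, auc

    # In practice: probs_sklearn = rf_model.predict_proba(X_test)[:, 1], and
    # probs_mlpack = the corresponding class-1 probabilities from mlpack.
    y_test = np.array([0, 0, 1, 1, 1])
    probs_sklearn = np.array([0.10, 0.40, 0.35, 0.80, 0.70])
    probs_mlpack = np.array([0.20, 0.30, 0.60, 0.75, 0.50])

    # A threshold-independent comparison: one ROC curve (and AUC) per model.
    for name, probs in [('sklearn', probs_sklearn), ('mlpack', probs_mlpack)]:
        fpr, tpr, _ = roc_curve(y_test, probs)
        print(name, 'AUC:', auc(fpr, tpr))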