verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
mentekid has joined #mlpack
sohail has quit [Ping timeout: 246 seconds]
Mathnerd314 has quit [Ping timeout: 246 seconds]
Nilabhra has joined #mlpack
keonkim has quit [Ping timeout: 268 seconds]
keonkim has joined #mlpack
ank_95_ has joined #mlpack
ank_95_ has quit [Quit: Connection closed for inactivity]
< ranjan123> 24 hours left until the GSoC results are announced.
< ranjan123> :P
< ranjan123> I am really excited and prepared to become sad. :D
< ranjan123> I have learned so many things these days. Thanks zoq, rcurtin. I really appreciate your help.
virtualgod has quit [Quit: Connection closed for inactivity]
Stellar_Mind has joined #mlpack
sumedhghaisas has quit [Remote host closed the connection]
Stellar_Mind has left #mlpack []
christie has joined #mlpack
< sohail> so here's a question... how do I get information about the "goodness of fit" from the decision stump API? I can't seem to find that
< sohail> it's probably a dumb question
< rcurtin> sohail: what does "goodness of fit" mean here?
< rcurtin> usually in a machine learning context, you might take your labeled dataset and set aside part of it as a test set
< rcurtin> then use the accuracy of the classifier on that test set as your goodness-of-fit measure
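A minimal sketch of the holdout evaluation rcurtin describes, using mlpack's DecisionStump; the file names, the two-class assumption, and the 80/20 split are illustrative:

    #include <iostream>
    #include <mlpack/core.hpp>
    #include <mlpack/methods/decision_stump/decision_stump.hpp>

    using namespace mlpack;

    int main()
    {
      // mlpack matrices are column-major: one column per point.
      arma::mat data;
      data::Load("dataset.csv", data, true);

      // Labels load as doubles; convert to the size_t row the stump expects.
      arma::mat labelsIn;
      data::Load("labels.csv", labelsIn, true);
      arma::Row<size_t> labels =
          arma::conv_to<arma::Row<size_t>>::from(labelsIn);

      // Hold out the last 20% of points as a test set.
      const size_t split = 0.8 * data.n_cols;
      arma::mat trainData = data.cols(0, split - 1);
      arma::mat testData = data.cols(split, data.n_cols - 1);
      arma::Row<size_t> trainLabels = labels.subvec(0, split - 1);
      arma::Row<size_t> testLabels = labels.subvec(split, labels.n_elem - 1);

      // Train a two-class decision stump with the default bucket size.
      decision_stump::DecisionStump<> stump(trainData, trainLabels, 2, 10);

      // Accuracy on the held-out set is the goodness-of-fit measure.
      arma::Row<size_t> predictions;
      stump.Classify(testData, predictions);
      const double accuracy =
          (double) arma::accu(predictions == testLabels) / testLabels.n_elem;
      std::cout << "held-out accuracy: " << accuracy << std::endl;
    }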
< sohail> rcurtin: the problem I have is that the dataset is changing over time
< sohail> so if I have 2 weeks of training data, then the test set will only be relevant for the next two weeks
< sohail> I'm not sure if I'm explaining this well
< sohail> the model, not the test set, will only be relevant for the next two weeks
< rcurtin> if your test set is a different set entirely, that's still okay; you can use your error rate on that test set as a measure of goodness of fit
< rcurtin> but if your test set has significantly different patterns than your training set, then your problem is nonstationary and decision stumps may not be the best technique...
< rcurtin> nonstationary machine learning problems can be difficult to deal with, because you're trying to learn a pattern that changes over time...
< sohail> rcurtin: I'm not explaining it well. So I want to create a model that will predict today's behaviour based on the last two weeks of data
< sohail> I guess one way to do this is to use N-1 samples and run it on the Nth sample to see if it works?
< sohail> the change is not (usually) dramatic, btw.
< sohail> so how would I test that model?
< sohail> one way, I guess, since I have two weeks of data, is to train it on 2 weeks - 1... up to the day before yesterday
< rcurtin> yes, I think that might be the right way to do it here
< sohail> then compare predictions with yesterday?
< rcurtin> yeah, you can do that, and then get an idea of how well the algorithm will perform in the future
< sohail> ok, I think that makes sense
< rcurtin> but when you actually deploy the algorithm, I would train the algorithm on the entire two weeks of data
< rcurtin> i.e., to see how well it does, train on a subset (every day up to yesterday), then test on yesterday
< rcurtin> but for your actual results, train on every day that you have data for
< rcurtin> and the performance will probably be roughly the same
< sohail> Really, train on every day?
< sohail> even if I know the patterns change over time?
< rcurtin> oh, sorry
< rcurtin> I meant, train on every day in the last two weeks :)
< sohail> ah, gotcha :)
< sohail> thanks for your help rcurtin
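A sketch of the evaluation scheme just agreed on, under the assumption that a hypothetical dayIndex row records which of the 14 days each column (point) came from:

    #include <iostream>
    #include <mlpack/core.hpp>
    #include <mlpack/methods/decision_stump/decision_stump.hpp>

    using namespace mlpack;

    // Train on every day up to yesterday, test on yesterday, then retrain on
    // the full two weeks for deployment. 'dayIndex' is a hypothetical row
    // giving the day (0-13) that each column of 'data' belongs to.
    void EvaluateThenDeploy(const arma::mat& data,
                            const arma::Row<size_t>& labels,
                            const arma::Row<size_t>& dayIndex)
    {
      const size_t yesterday = dayIndex.max();

      // Split by day rather than at random, so the test set is "the future"
      // relative to the training set.
      arma::uvec trainIdx = arma::find(dayIndex < yesterday);
      arma::uvec testIdx = arma::find(dayIndex == yesterday);

      arma::mat trainData = data.cols(trainIdx);
      arma::mat testData = data.cols(testIdx);
      arma::Row<size_t> trainLabels = labels.cols(trainIdx);
      arma::Row<size_t> testLabels = labels.cols(testIdx);

      // Two classes, default bucket size; adjust for the real problem.
      decision_stump::DecisionStump<> stump(trainData, trainLabels, 2, 10);

      arma::Row<size_t> predictions;
      stump.Classify(testData, predictions);
      const double accuracy =
          (double) arma::accu(predictions == testLabels) / testLabels.n_elem;
      std::cout << "accuracy on yesterday: " << accuracy << std::endl;

      // For actual predictions about today, retrain on all fourteen days.
      decision_stump::DecisionStump<> deployed(data, labels, 2, 10);
      // ... deployed.Classify(todaysData, todaysPredictions); ...
    }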
govg has joined #mlpack
< rcurtin> sure, no problem :)
< rcurtin> ranjan123: I'm happy that I've been able to help with the learning process. :) I'll take a look at your updated PR when I have a chance
< ranjan123> Hmmm.
zoq_ has joined #mlpack
zoq has quit [Read error: Connection reset by peer]
< ranjan123> I updated it 6 days ago but didn't comment anything
zoq_ is now known as zoq
< rcurtin> github doesn't send me an email for commits to a PR, so I wasn't aware, sorry about that
< ranjan123> yes
christie has quit [Ping timeout: 250 seconds]
govg has quit [Ping timeout: 240 seconds]
gopalakr has joined #mlpack
< gopalakr> I have a weird issue with linear regression
< gopalakr> [INFO ] Loading '/tmp/n20train.csv' as CSV data. Size is 30 x 20000.
< gopalakr> [INFO ] Loading '/tmp/ny1.csv' as raw ASCII formatted data. Size is 1 x 20000.
< gopalakr> Intel MKL ERROR: Parameter 4 was incorrect on entry to DGELSD.
< gopalakr> [INFO ] Loading '/tmp/n20test.csv' as CSV data. Size is 30 x 20000.
< gopalakr> [FATAL] The model was trained on 18446744073709551615-dimensional data, but the test points in '/tmp/n20test.csv' are 30-dimensional! terminate called
< ranjan123> Parameter 4 was incorrect on entry to DGELSD
< gopalakr> some MKL issue... could be something wrong within the training
< gopalakr> the data itself is sane, no NaN/Inf
< ranjan123> Not sure, but maybe there's some problem in the dataset
< ranjan123> ohh
< ranjan123> rcurtin, zoq may help you!
< rcurtin> gopalakr: is it easy for you to try with regular BLAS/LAPACK instead of MKL?
< gopalakr> should I be recompiling armadillo without MKL?
< gopalakr> or just mlpack
awhitesong has joined #mlpack
< rcurtin> gopalakr: you'll have to change your armadillo configuration and recompile armadillo, then recompile mlpack
< gopalakr> got it, thanks... is it an issue in general for mlpack? MKL messing things up?
govg has joined #mlpack
< rcurtin> no, not in general, I just want to see if that fixes the issue
< rcurtin> I have no idea what the underlying problem might be
< rcurtin> but if we can isolate it to MKL, we might be able to figure out what's going on
< gopalakr> ok, will let you know... still recompiling
< rcurtin> if the problem is still there when using regular BLAS/LAPACK, maybe there is a bug in either armadillo or mlpack
< zoq> gopalakr: Btw, did you test with an existing model? Or did you train and test at the same time? I guess if you start with an existing model, there is something wrong with the serialization. If you train and test at the same time, the parameters should be set.
< gopalakr> same time
< zoq> Okay, maybe there is a command combination that doesn't train the model. Can you show us the command you are using to train/test the model?
< zoq> gopalakr: Are you sure this is the complete command? To get the error message "The model was trained on..." you have to specify the test file using either "--test_file" or "-T"; maybe that changed over time, not sure
< gopalakr> sorry, that was another instance...
< gopalakr> here's the right one -- ~/mlpack/usr/local/bin/mlpack_linear_regression -t /tmp/n20train.csv -r /tmp/ny1.csv -T /tmp/n20test.csv -p /tmp/nyout1.csv -v
Nilabhra has quit [Remote host closed the connection]
< zoq> gopalakr: Okay, looks good; as long as you don't specify 'input_model_file' or 'm', it should always train the model before predicting, and lr.Parameters().n_elem should be valid.
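As an aside, 18446744073709551615 is size_t(-1), so one plausible reading (an assumption, not confirmed here) is that the DGELSD failure left lr.Parameters() empty, and the reported dimensionality, computed as n_elem - 1, underflowed. The same pipeline as the command above, sketched directly against the C++ API:

    #include <iostream>
    #include <mlpack/core.hpp>
    #include <mlpack/methods/linear_regression/linear_regression.hpp>

    using namespace mlpack;

    int main()
    {
      // Mirrors the CLI flags above: -t (training), -r (responses),
      // -T (test), -p (predictions output).
      arma::mat trainData, testData;
      data::Load("/tmp/n20train.csv", trainData, true);
      data::Load("/tmp/n20test.csv", testData, true);

      arma::mat responsesIn;
      data::Load("/tmp/ny1.csv", responsesIn, true);
      arma::vec responses = arma::vectorise(responsesIn);

      // Training runs a LAPACK least-squares solve (DGELSD under MKL);
      // this is the step where the error in the log above was raised.
      regression::LinearRegression lr(trainData, responses);

      // If the solve fails and the parameter vector comes back empty,
      // n_elem - 1 wraps around to 18446744073709551615 (size_t(-1)),
      // matching the [FATAL] message.
      std::cout << "model dimensionality: " << (lr.Parameters().n_elem - 1)
                << std::endl;

      arma::vec predictions;
      lr.Predict(testData, predictions);
      data::Save("/tmp/nyout1.csv", predictions, true);
    }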
Bartek has quit [Ping timeout: 240 seconds]
Bartek has joined #mlpack
govg has quit [Ping timeout: 250 seconds]
ranjan123 has quit [Quit: Page closed]
Bartek has quit [Ping timeout: 250 seconds]
mentekid has joined #mlpack
Bartek has joined #mlpack
gopalakr has quit [Ping timeout: 250 seconds]
Bartek has quit [Ping timeout: 276 seconds]
Bartek has joined #mlpack
mentekid has quit [Ping timeout: 276 seconds]
Bartek has quit [Remote host closed the connection]
awhitesong has left #mlpack []
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#793 (master - ebf77f8 : Ryan Curtin): The build passed.