verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
mentekid has joined #mlpack
sohail has quit [Ping timeout: 246 seconds]
Mathnerd314 has quit [Ping timeout: 246 seconds]
Nilabhra has joined #mlpack
keonkim has quit [Ping timeout: 268 seconds]
keonkim has joined #mlpack
ank_95_ has joined #mlpack
ank_95_ has quit [Quit: Connection closed for inactivity]
skon46 has joined #mlpack
mentekid has quit [Ping timeout: 276 seconds]
Mathnerd314 has joined #mlpack
sumedhghaisas has joined #mlpack
palashahuja has joined #mlpack
skon46 has quit [Quit: Leaving]
sohail has joined #mlpack
< sohail> zoq: check it out, I got the fit working pretty OK using a classification tree: https://i.imgur.com/XdsQAwb.png
< sohail> I ended up training 7 different models, one for each day. You can see Sunday (1) fails spectacularly
< sohail> I'm thinking combining weekends and weekdays might give better results, but I'm not sure yet
sumedhghaisas has quit [Ping timeout: 240 seconds]
sumedhghaisas has joined #mlpack
palashahuja has quit [Quit: http://www.kiwiirc.com/ - A hand crafted IRC client]
virtualgod has joined #mlpack
ranjan123 has joined #mlpack
< ranjan123> Less than 24 hours left until the GSoC results are announced.
< ranjan123> :P
< ranjan123> I am really excited, and prepared to be sad. :D
< ranjan123> I have learned so many things in these past days. Thanks zoq, rcurtin. I really appreciate your help.
virtualgod has quit [Quit: Connection closed for inactivity]
Stellar_Mind has joined #mlpack
sumedhghaisas has quit [Remote host closed the connection]
Stellar_Mind has left #mlpack []
christie has joined #mlpack
< sohail> so here's a question... how do I get information about the "goodness of fit" from the decision stump API? I can't seem to find that
< sohail> it's probably a dumb question
< rcurtin> sohail: what does "goodness of fit" mean here?
< rcurtin> usually in a machine learning context, you might take your labeled dataset and set aside part of it as a test set
< rcurtin> then use your accuracy of the classifier on that test set as your goodness of fit measure
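A minimal sketch of that held-out evaluation with the DecisionStump class, assuming the mlpack 2.x constructor and Classify() signatures; the file names, label files, and numClasses value are placeholders, not anything from the discussion above:

    #include <mlpack/core.hpp>
    #include <mlpack/methods/decision_stump/decision_stump.hpp>
    #include <iostream>

    using namespace mlpack;

    int main()
    {
      // Placeholder files; mlpack loads CSVs with one point per column.
      arma::mat trainData, testData, trainLabelsIn, testLabelsIn;
      data::Load("train.csv", trainData, true);
      data::Load("test.csv", testData, true);
      data::Load("train_labels.csv", trainLabelsIn, true);  // one label per line
      data::Load("test_labels.csv", testLabelsIn, true);

      const arma::Row<size_t> trainLabels =
          arma::conv_to<arma::Row<size_t> >::from(trainLabelsIn.row(0));
      const arma::Row<size_t> testLabels =
          arma::conv_to<arma::Row<size_t> >::from(testLabelsIn.row(0));

      const size_t numClasses = 2;  // placeholder; set to your number of classes
      decision_stump::DecisionStump<> stump(trainData, trainLabels, numClasses);

      // The stump itself doesn't report a fit statistic, so compute held-out
      // accuracy from its predictions.
      arma::Row<size_t> predictions;
      stump.Classify(testData, predictions);
      const double accuracy =
          double(arma::accu(predictions == testLabels)) / testLabels.n_elem;
      std::cout << "held-out accuracy: " << accuracy << std::endl;
    }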
< sohail> rcurtin: the problem I have is that the dataset is changing over time
< sohail> so if I have 2 weeks of training data, then the test set will only be relevant for the next two weeks
< sohail> I'm not sure if I'm explaining this well
< sohail> the model, not the test set, will only be relevant for the next two weeks
< rcurtin> if your test set is a different set entirely, that's still okay, you can use your error rate on that test set as a measure of goodness of fit
< rcurtin> but if your test set has significantly different patterns than your training set, then your problem is nonstationary and decision stumps may not be the best technique...
< rcurtin> nonstationary machine learning problems can be difficult to deal with, because you're trying to learn a pattern that changes over time...
< sohail> rcurtin: I'm not explaining it well. So I want to create a model that will predict today's behaviour based on the last two weeks of data
< sohail> I guess one way to do this is to use N-1 samples and run it on the Nth sample to see if it works?
< sohail> the change is not (usually) dramatic, btw.
< sohail> so how would I test that model?
< sohail> one way, I guess, since I have two weeks of data, is to train it on 2 weeks - 1, i.e. up through the day before yesterday
< rcurtin> yes, I think that might be the right way to do it here
< sohail> then compare predictions with yesterday?
< rcurtin> yeah, you can do that, and then get an idea of how well the algorithm will perform in the future
< sohail> ok, I think that makes sense
< rcurtin> but when you actually deploy the algorithm, I would train the algorithm on the entire two weeks of data
< rcurtin> i.e., to see how well it does, train on a subset (every day up to yesterday) then test on yesterday
< rcurtin> but for your actual results, train on every day that you have data for
< rcurtin> and the performance will probably be roughly the same
< sohail> Really, train on every day?
< sohail> even if I know the patterns change over time?
< rcurtin> oh, sorry
< rcurtin> I meant, train on every day in the last two weeks :)
< sohail> ah, gotcha :)
< sohail> thanks for your help rcurtin
govg has joined #mlpack
< rcurtin> sure, no problem :)
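Putting the scheme above into code, here is a rough sketch of the split-then-retrain idea, continuing from the previous snippet; data, labels, dayIndex, and numClasses are made-up names for values assumed to be loaded already:

    // data: dims x points, labels: 1 x points, dayIndex: which of the 14
    // days (0..13) each point came from -- all hypothetical, assumed loaded.
    const arma::uvec evalIdx  = arma::find(dayIndex == 13);  // yesterday
    const arma::uvec trainIdx = arma::find(dayIndex <  13);  // everything before

    // Estimate future performance: train on days 0-12, score on day 13.
    const arma::Row<size_t> evalLabels = labels.cols(evalIdx);
    decision_stump::DecisionStump<> evalStump(arma::mat(data.cols(trainIdx)),
        arma::Row<size_t>(labels.cols(trainIdx)), numClasses);

    arma::Row<size_t> predictions;
    evalStump.Classify(arma::mat(data.cols(evalIdx)), predictions);
    const double estimate =
        double(arma::accu(predictions == evalLabels)) / evalLabels.n_elem;

    // For the model that actually gets deployed, retrain on all 14 days.
    decision_stump::DecisionStump<> finalStump(data, labels, numClasses);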
< rcurtin> ranjan123: I'm happy that I've been able to help with the learning process. :) I'll take a look at your updated PR when I have a chance
< ranjan123> Hmmm.
zoq_ has joined #mlpack
zoq has quit [Read error: Connection reset by peer]
< ranjan123> I updated it 6 days ago but didn't comment anything
zoq_ is now known as zoq
< rcurtin> github doesn't send me an email for commits to a PR, so I wasn't aware, sorry about that
< ranjan123> yes
christie has quit [Ping timeout: 250 seconds]
govg has quit [Ping timeout: 240 seconds]
gopalakr has joined #mlpack
< gopalakr> I have a weird issue with linear regression
< gopalakr> [INFO ] Loading '/tmp/n20train.csv' as CSV data. Size is 30 x 20000.
< gopalakr> [INFO ] Loading '/tmp/ny1.csv' as raw ASCII formatted data. Size is 1 x 20000.
< gopalakr> Intel MKL ERROR: Parameter 4 was incorrect on entry to DGELSD.
< gopalakr> [INFO ] Loading '/tmp/n20test.csv' as CSV data. Size is 30 x 20000.
< gopalakr> [FATAL] The model was trained on 18446744073709551615-dimensional data, but the test points in '/tmp/n20test.csv' are 30-dimensional! terminate called
< ranjan123> Parameter 4 was incorrect on entry to DGELSD
< gopalakr> some MKL issue... could be that something is wrong within the training
< gopalakr> the data itself is sane, no nan/inf
< ranjan123> Not sure, but maybe there's some problem in the dataset
< ranjan123> ohh
< ranjan123> rcurtin, zoq may help you!
< rcurtin> gopalakr: is it easy for you to try with regular blas/lapack instead of MKL?
< gopalakr> should I be recompiling armadillo without MKL?
< gopalakr> or just mlpack
awhitesong has joined #mlpack
< rcurtin> gopalakr: you'll have to change your armadillo configuration and recompile armadillo, then recompile mlpack
< gopalakr> got it, tx.. is it an issue in general for mlpack? MKL messing things up?
govg has joined #mlpack
< rcurtin> no, not in general, I just want to see if that fixes the issue
< rcurtin> I have no idea what the underlying problem might be
< rcurtin> but if we can isolate it to MKL, we might be able to figure out what's going on
< gopalakr> ok will let you know.. still recompiling
< rcurtin> if the problem is still there when using regular BLAS/LAPACK, maybe there is a bug in either armadillo or mlpack
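To help isolate it, here is a hedged Armadillo-only reproduction: it assumes the LinearRegression training step reduces to a least-squares arma::solve() over the design matrix with an intercept term (an assumption about the internals, not something confirmed above), and reuses the file paths from the paste. If MKL's DGELSD is at fault, this should fail the same way with mlpack out of the picture:

    #include <armadillo>
    #include <iostream>

    int main()
    {
      // Same files as in the paste above; Armadillo keeps them row-major here
      // (one point per row), unlike mlpack's column-per-point convention.
      arma::mat X, yIn;
      if (!X.load("/tmp/n20train.csv", arma::csv_ascii) || !yIn.load("/tmp/ny1.csv"))
        return 1;
      arma::vec y = arma::vectorise(yIn);

      // Add the intercept column, then do the least-squares solve that the
      // training step (presumably) boils down to.
      X.insert_cols(0, arma::ones<arma::vec>(X.n_rows));
      arma::vec parameters;
      const bool ok = arma::solve(parameters, X, y);
      std::cout << (ok ? "solve succeeded" : "solve failed") << std::endl;
      return ok ? 0 : 1;
    }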
< zoq> gopalakr: Btw, did you test with an existing model, or did you train and test at the same time? I guess if you start with an existing model, there is something wrong with the serialization. If you train and test at the same time, the parameters should be set.
< gopalakr> same time
< zoq> Okay, maybe there is a command combination that doesn't train the model. Can you show us the command you are using to train/test the model?
Bartek has joined #mlpack
< gopalakr> ~/mlpack/usr/local/bin/mlpack_linear_regression -t /tmp/n20train.csv -r /tmp/ny1.csv -p /tmp/nyout1.csv -M m.csv -v
< zoq> gopalakr: Are you sure this is the complete command? To get the error message "The model was trained on..." you have to specify the test file, either using "--test_file" or "-T"; maybe that changed over time, not sure
< gopalakr> sorry that was another instance...
< gopalakr> here's the right one -- ~/mlpack/usr/local/bin/mlpack_linear_regression -t /tmp/n20train.csv -r /tmp/ny1.csv -T /tmp/n20test.csv -p /tmp/nyout1.csv -v
Nilabhra has quit [Remote host closed the connection]
< zoq> gopalakr: Okay, looks good; as long as you don't specify 'input_model_file' or 'm', it should always train the model before predicting, and lr.Parameters().n_elem should be valid.
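As a footnote on the odd number in that FATAL message: 18446744073709551615 is the largest 64-bit unsigned value, which is what you get if the reported dimensionality is derived from an empty parameter vector, i.e. the DGELSD failure left training with no parameters. A tiny illustration (the "n_elem - 1" convention is an assumption about how the binding reports the trained dimensionality):

    #include <armadillo>
    #include <cstddef>
    #include <iostream>

    int main()
    {
      // If the least-squares solve fails, the parameter vector can stay empty.
      arma::vec parameters;  // n_elem == 0

      // "Dimensionality = number of parameters minus the intercept" then
      // wraps around, because the count is unsigned.
      const std::size_t reported = std::size_t(parameters.n_elem) - 1;
      std::cout << reported << std::endl;  // 18446744073709551615 on 64-bit
      return 0;
    }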
Bartek has quit [Ping timeout: 240 seconds]
Bartek has joined #mlpack
govg has quit [Ping timeout: 250 seconds]
ranjan123 has quit [Quit: Page closed]
Bartek has quit [Ping timeout: 250 seconds]
mentekid has joined #mlpack
Bartek has joined #mlpack
gopalakr has quit [Ping timeout: 250 seconds]
Bartek has quit [Ping timeout: 276 seconds]
Bartek has joined #mlpack
mentekid has quit [Ping timeout: 276 seconds]
Bartek has quit [Remote host closed the connection]
awhitesong has left #mlpack []
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#793 (master - ebf77f8 : Ryan Curtin): The build passed.
travis-ci has left #mlpack []
govg has joined #mlpack
govg has quit [Ping timeout: 244 seconds]
govg has joined #mlpack
govg has quit [Ping timeout: 244 seconds]
govg has joined #mlpack