verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
mentekid has joined #mlpack
sohail has quit [Ping timeout: 246 seconds]
Mathnerd314 has quit [Ping timeout: 246 seconds]
Nilabhra has joined #mlpack
keonkim has quit [Ping timeout: 268 seconds]
keonkim has joined #mlpack
ank_95_ has joined #mlpack
ank_95_ has quit [Quit: Connection closed for inactivity]
< ranjan123> 24 hours left until the GSoC results are announced.
< ranjan123> :P
< ranjan123> I am really excited and prepared to become sad. :D
< ranjan123> I have learned so many things these days. Thanks zoq, rcurtin. I really appreciate your help.
virtualgod has quit [Quit: Connection closed for inactivity]
Stellar_Mind has joined #mlpack
sumedhghaisas has quit [Remote host closed the connection]
Stellar_Mind has left #mlpack []
christie has joined #mlpack
< sohail> so here's a question... how do I get information about the "goodness of fit" from the decision stump API? I can't seem to find that
< sohail> it's probably a dumb question
< rcurtin> sohail: what does "goodness of fit" mean here?
< rcurtin> usually in a machine learning context, you might take your labeled dataset and set aside part of it as a test set
< rcurtin> then use the accuracy of the classifier on that test set as your goodness-of-fit measure
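A minimal sketch of the holdout evaluation rcurtin describes, using mlpack's DecisionStump; the file names, the two-class assumption, and the 80/20 split are illustrative:

    #include <iostream>
    #include <mlpack/core.hpp>
    #include <mlpack/methods/decision_stump/decision_stump.hpp>

    using namespace mlpack;

    int main()
    {
      // mlpack matrices are column-major: one column per point.
      arma::mat data;
      data::Load("dataset.csv", data, true);

      // Labels load as doubles; convert to the size_t row the stump expects.
      arma::mat labelsIn;
      data::Load("labels.csv", labelsIn, true);
      arma::Row<size_t> labels =
          arma::conv_to<arma::Row<size_t>>::from(labelsIn);

      // Hold out the last 20% of points as a test set.
      const size_t split = 0.8 * data.n_cols;
      arma::mat trainData = data.cols(0, split - 1);
      arma::mat testData = data.cols(split, data.n_cols - 1);
      arma::Row<size_t> trainLabels = labels.subvec(0, split - 1);
      arma::Row<size_t> testLabels = labels.subvec(split, labels.n_elem - 1);

      // Train a two-class decision stump with the default bucket size.
      decision_stump::DecisionStump<> stump(trainData, trainLabels, 2, 10);

      // Accuracy on the held-out set is the goodness-of-fit measure.
      arma::Row<size_t> predictions;
      stump.Classify(testData, predictions);
      const double accuracy =
          (double) arma::accu(predictions == testLabels) / testLabels.n_elem;
      std::cout << "held-out accuracy: " << accuracy << std::endl;
    }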
< sohail> rcurtin: the problem I have is that the dataset is changing over time
< sohail> so if I have 2 weeks of training data, then the test set will only be relevant for the next two weeks
< sohail> I'm not sure if I'm explaining this well
< sohail> the model, not the test set, will only be relevant for the next two weeks
< rcurtin> if your test set is a different set entirely, that's still okay; you can use your error rate on that test set as a measure of goodness of fit
< rcurtin> but if your test set has significantly different patterns than your training set, then your problem is nonstationary and decision stumps may not be the best technique...
< rcurtin> nonstationary machine learning problems can be difficult to deal with, because you're trying to learn a pattern that changes over time...
< sohail> rcurtin: I'm not explaining it well. So I want to create a model that will predict today's behaviour based on the last two weeks of data
< sohail> I guess one way to do this is to use N-1 samples and run it on the Nth sample to see if it works?
< sohail> the change is not (usually) dramatic, btw.
< sohail> so how would I test that model?
< sohail> one way, I guess, since I have two weeks of data, is to train it on 2 weeks - 1... up to the day before yesterday
< rcurtin> yes, I think that might be the right way to do it here
< sohail> then compare predictions with yesterday?
< rcurtin> yeah, you can do that, and then get an idea of how well the algorithm will perform in the future
< sohail> ok, I think that makes sense
< rcurtin> but when you actually deploy the algorithm, I would train the algorithm on the entire two weeks of data
< rcurtin> i.e., to see how well it does, train on a subset (every day up to yesterday), then test on yesterday
< rcurtin> but for your actual results, train on every day that you have data for
< rcurtin> and the performance will probably be roughly the same
< sohail> Really, train on every day?
< sohail> even if I know the patterns change over time?
< rcurtin> oh, sorry
< rcurtin> I meant, train on every day in the last two weeks :)
< sohail> ah, gotcha :)
< sohail> thanks for your help rcurtin
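A sketch of the evaluation scheme just agreed on, under the assumption that a hypothetical dayIndex row records which of the 14 days each column (point) came from:

    #include <iostream>
    #include <mlpack/core.hpp>
    #include <mlpack/methods/decision_stump/decision_stump.hpp>

    using namespace mlpack;

    // Train on every day up to yesterday, test on yesterday, then retrain on
    // the full two weeks for deployment. 'dayIndex' is a hypothetical row
    // giving the day (0-13) that each column of 'data' belongs to.
    void EvaluateThenDeploy(const arma::mat& data,
                            const arma::Row<size_t>& labels,
                            const arma::Row<size_t>& dayIndex)
    {
      const size_t yesterday = dayIndex.max();

      // Split by day rather than at random, so the test set is "the future"
      // relative to the training set.
      arma::uvec trainIdx = arma::find(dayIndex < yesterday);
      arma::uvec testIdx = arma::find(dayIndex == yesterday);

      arma::mat trainData = data.cols(trainIdx);
      arma::mat testData = data.cols(testIdx);
      arma::Row<size_t> trainLabels = labels.cols(trainIdx);
      arma::Row<size_t> testLabels = labels.cols(testIdx);

      // Two classes, default bucket size; adjust for the real problem.
      decision_stump::DecisionStump<> stump(trainData, trainLabels, 2, 10);

      arma::Row<size_t> predictions;
      stump.Classify(testData, predictions);
      const double accuracy =
          (double) arma::accu(predictions == testLabels) / testLabels.n_elem;
      std::cout << "accuracy on yesterday: " << accuracy << std::endl;

      // For actual predictions about today, retrain on all fourteen days.
      decision_stump::DecisionStump<> deployed(data, labels, 2, 10);
      // ... deployed.Classify(todaysData, todaysPredictions); ...
    }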
govg has joined #mlpack
< rcurtin> sure, no problem :)
< rcurtin> ranjan123: I'm happy that I've been able to help with the learning process. :) I'll take a look at your updated PR when I have a chance
< ranjan123> Hmmm.
zoq_ has joined #mlpack
zoq has quit [Read error: Connection reset by peer]
< ranjan123> I updated it 6 days ago but didn't comment anything
zoq_ is now known as zoq
< rcurtin> github doesn't send me an email for commits to a PR, so I wasn't aware, sorry about that
< ranjan123> yes
christie has quit [Ping timeout: 250 seconds]
govg has quit [Ping timeout: 240 seconds]
gopalakr has joined #mlpack
< gopalakr> I have a weird issue with linear regression
< gopalakr> [INFO ] Loading '/tmp/n20train.csv' as CSV data. Size is 30 x 20000.
< gopalakr> [INFO ] Loading '/tmp/ny1.csv' as raw ASCII formatted data. Size is 1 x 20000.
< gopalakr> Intel MKL ERROR: Parameter 4 was incorrect on entry to DGELSD.
< gopalakr> [INFO ] Loading '/tmp/n20test.csv' as CSV data. Size is 30 x 20000.
< gopalakr> [FATAL] The model was trained on 18446744073709551615-dimensional data, but the test points in '/tmp/n20test.csv' are 30-dimensional! terminate called
< ranjan123> Parameter 4 was incorrect on entry to DGELSD
< gopalakr> some MKL issue... could be something wrong within the training
< gopalakr> the data itself is sane, no NaN/Inf
< ranjan123> Not sure, but maybe there's some problem in the dataset
< ranjan123> ohh
< ranjan123> rcurtin, zoq may help you!
< rcurtin> gopalakr: is it easy for you to try with regular BLAS/LAPACK instead of MKL?
< gopalakr> should I be recompiling armadillo without MKL?
< gopalakr> or just mlpack
awhitesong has joined #mlpack
< rcurtin> gopalakr: you'll have to change your armadillo configuration and recompile armadillo, then recompile mlpack
< gopalakr> got it, thanks... is it an issue in general for mlpack? MKL messing things up?
govg has joined #mlpack
< rcurtin> no, not in general, I just want to see if that fixes the issue
< rcurtin> I have no idea what the underlying problem might be
< rcurtin> but if we can isolate it to MKL, we might be able to figure out what's going on
< gopalakr> ok, will let you know... still recompiling
< rcurtin> if the problem is still there when using regular BLAS/LAPACK, maybe there is a bug in either armadillo or mlpack
< zoq> gopalakr: Btw, did you test with an existing model? Or did you train and test at the same time? I guess if you start with an existing model, there is something wrong with the serialization. If you train and test at the same time, the parameters should be set.
< gopalakr> same time
< zoq> Okay, maybe there is a command combination that doesn't train the model. Can you show us the command you are using to train/test the model?
< zoq> gopalakr: Are you sure this is the complete command? To get the error message "The model was trained on..." you have to specify the test file using either "--test_file" or "-T"; maybe that changed over time, not sure
< gopalakr> sorry, that was another instance...
< gopalakr> here's the right one -- ~/mlpack/usr/local/bin/mlpack_linear_regression -t /tmp/n20train.csv -r /tmp/ny1.csv -T /tmp/n20test.csv -p /tmp/nyout1.csv -v
Nilabhra has quit [Remote host closed the connection]
< zoq> gopalakr: Okay, looks good; as long as you don't specify 'input_model_file' or 'm', it should always train the model before predicting, and lr.Parameters().n_elem should be valid.
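As an aside, 18446744073709551615 is size_t(-1), so one plausible reading (an assumption, not confirmed here) is that the DGELSD failure left lr.Parameters() empty, and the reported dimensionality, computed as n_elem - 1, underflowed. The same pipeline as the command above, sketched directly against the C++ API:

    #include <iostream>
    #include <mlpack/core.hpp>
    #include <mlpack/methods/linear_regression/linear_regression.hpp>

    using namespace mlpack;

    int main()
    {
      // Mirrors the CLI flags above: -t (training), -r (responses),
      // -T (test), -p (predictions output).
      arma::mat trainData, testData;
      data::Load("/tmp/n20train.csv", trainData, true);
      data::Load("/tmp/n20test.csv", testData, true);

      arma::mat responsesIn;
      data::Load("/tmp/ny1.csv", responsesIn, true);
      arma::vec responses = arma::vectorise(responsesIn);

      // Training runs a LAPACK least-squares solve (DGELSD under MKL);
      // this is the step where the error in the log above was raised.
      regression::LinearRegression lr(trainData, responses);

      // If the solve fails and the parameter vector comes back empty,
      // n_elem - 1 wraps around to 18446744073709551615 (size_t(-1)),
      // matching the [FATAL] message.
      std::cout << "model dimensionality: " << (lr.Parameters().n_elem - 1)
                << std::endl;

      arma::vec predictions;
      lr.Predict(testData, predictions);
      data::Save("/tmp/nyout1.csv", predictions, true);
    }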
Bartek has quit [Ping timeout: 240 seconds]
Bartek has joined #mlpack
govg has quit [Ping timeout: 250 seconds]
ranjan123 has quit [Quit: Page closed]
Bartek has quit [Ping timeout: 250 seconds]
mentekid has joined #mlpack
Bartek has joined #mlpack
gopalakr has quit [Ping timeout: 250 seconds]
Bartek has quit [Ping timeout: 276 seconds]
Bartek has joined #mlpack
mentekid has quit [Ping timeout: 276 seconds]
Bartek has quit [Remote host closed the connection]
awhitesong has left #mlpack []
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#793 (master - ebf77f8 : Ryan Curtin): The build passed.