ChanServ changed the topic of #mlpack to: Due to ongoing spam on freenode, we've muted unregistered users. See http://www.mlpack.org/ircspam.txt for more information, or also you could join #mlpack-temp and chat there.
< ShikharJ>
zoq: Those are minor changes, which I think can be done in your PR itself? I'd suggest that we remove the default parameters altogether, instead of setting the size as 10 for BatchNorm and as 1 for LayerNorm.
< zoq>
ShikharJ: Okay, I'll make the necessary changes later today.
blakjak888_ has joined #mlpack
< blakjak888_>
I am trying to get a little help with my first mlpack FFN. I am going through Andrew Ng's deeplearning.ai course and doing the projects in parallel in C++ with mlpack and Armadillo. Is this the right place to ask some questions related to the use of mlpack? I have searched a lot for examples but could not find anything that explains why my code is failing.
< blakjak888_>
... with my optimizer like this: mlpack::optimization::SGD<mlpack::optimization::AdamUpdate> optimizer(0.0001, 64, 10000, 1e-8, true, mlpack::optimization::AdamUpdate(1e-8, 0.9, 0.999));
< blakjak888_>
However, I am always getting the same error no matter how I tune my "step" value. It tells me that SGD does not converge.
< blakjak888_>
... sorry, it actually converges to NaN.
< blakjak888_>
I have double-checked my input data and it exactly matches my Python data. The input data is 12288 x 1080.
< blakjak888_>
I suspect I may be missing a step somewhere, however when I follow any examples that I can find on the web for FFNs they pretty much do what I am doing here.
< blakjak888_>
Can anyone give some advice or point me in the right direction?
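For context, a minimal sketch of the kind of setup being described, written against the mlpack 3.x ANN API; the hidden layer size and the CrossEntropyError<> output layer are assumptions based on the rest of the conversation, not the exact code that was later emailed:

    #include <mlpack/core.hpp>
    #include <mlpack/core/optimizers/sgd/sgd.hpp>
    #include <mlpack/core/optimizers/adam/adam_update.hpp>
    #include <mlpack/methods/ann/ffn.hpp>
    #include <mlpack/methods/ann/layer/layer.hpp>
    #include <mlpack/methods/ann/loss_functions/cross_entropy_error.hpp>

    using namespace mlpack::ann;
    using namespace mlpack::optimization;

    int main()
    {
      // trainSetX: 12288 x 1080 (features x examples),
      // oneHotTrainSetY: 6 x 1080 (one-hot labels); loaded elsewhere.
      arma::mat trainSetX, oneHotTrainSetY;

      FFN<CrossEntropyError<>> model;
      model.Add<Linear<>>(12288, 25);   // hidden size of 25 is an assumption
      model.Add<SigmoidLayer<>>();
      model.Add<Linear<>>(25, 6);
      model.Add<SigmoidLayer<>>();

      // The optimizer quoted above: SGD with the Adam update rule.
      SGD<AdamUpdate> optimizer(0.0001, 64, 10000, 1e-8, true,
                                AdamUpdate(1e-8, 0.9, 0.999));

      model.Train(trainSetX, oneHotTrainSetY, optimizer);
    }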
< rcurtin>
blakjak888_: sorry that you're having issues
< rcurtin>
there are some examples you could look at in src/mlpack/tests/feedforward_network_test.cpp, and perhaps those would be useful
< rcurtin>
if you're training with cross-entropy error I guess this is classification... can you tell me what your labels are?
< blakjak888_>
I have 6 labels. The data is 1080 examples of 64x64x3 (RGB) images of a hand showing 1, 2, 3, 4, 5 fingers or a 0 indicated by thumb and forefinger together.
< rcurtin>
sounds good. do the labels take values between 0 and 5, or 1 and 6? (or something else?)
< blakjak888_>
0 to 5
< rcurtin>
hm. so, I am not sure on this (zoq should correct me) but I think that the output for using CrossEntropyError needs to be one-hot encoded
< rcurtin>
let me check an example...
< rcurtin>
right, I think this is the case; so instead of having a vector like [0 3 2 1 1 ...] where each element is a label, try using a matrix with 6 rows, where only the row corresponding to the true label is 1 and all others are zeros
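As a small illustration of that shape, a sketch in Armadillo terms (the label values here are hypothetical):

    // Labels taking values 0..5 for five examples (made-up values).
    arma::Row<size_t> labels = {0, 3, 2, 1, 5};

    // 6 x 5 one-hot matrix: one row per class, one column per example.
    arma::mat oneHotY(6, labels.n_elem, arma::fill::zeros);
    for (size_t i = 0; i < labels.n_elem; ++i)
      oneHotY(labels[i], i) = 1.0;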
< blakjak888_>
Yes. I am using one-hot encoding
< rcurtin>
ohh, I see
< rcurtin>
ok
< blakjak888_>
So my label matrix is 6x1080
< rcurtin>
right, that sounds good
< rcurtin>
the model you're using is pretty simple, so I don't think simplifying it further will make a difference...
< blakjak888_>
I have manually checked all the input matrices and they seem to match exactly what I had in the Coursera deeplearning.ai tutorial.
< rcurtin>
have you tried using a different optimizer than Adam?
< blakjak888_>
I came to suspect that my optimizer was not set up correctly due to the error message I was getting.
< rcurtin>
it seems like a bit of a long shot, since usually Adam will work ok
< blakjak888_>
I tried GradientDescent
< rcurtin>
just glancing at the definition of 'optimizer' it seems to have ok parameters to me
< rcurtin>
did that work?
< blakjak888_>
same error. In fact I tried several optimizers and all were giving me the same error.
< blakjak888_>
Converging to NaN
< rcurtin>
hm, ok, that is unexpected. based on what you've told me so far this *should* work, so there must be something else
< rcurtin>
can you show me the full code?
< blakjak888_>
I even cut my FFN to 2 layers. Linear then Sigmoid. ... but same error
< blakjak888_>
Sure.
< blakjak888_>
Is there a way to post code here?
< rcurtin>
I'd put the code on pastebin, then copy the link here
< blakjak888_>
I am not so familiar with the IRC
< rcurtin>
no worries :)
< blakjak888_>
I am using a browser right now
< rcurtin>
yeah, webchat works reasonably well enough I think :)
< blakjak888_>
Should I paste the file or a copy of the text?
< rcurtin>
either should be fine as long as I can see the code :)
< blakjak888_>
not sure how to do a pastebin on this
< blakjak888_>
If I just paste now I think it will join all the lines together like above.
< blakjak888_>
I'll try
< rcurtin>
hmmm, that's unfortunate... I guess you could re-add the line breaks in but that's a bit tedious
< blakjak888_>
Yup. Won't allow it. Text too long
< rcurtin>
maybe just the part of the code concerned with the building of the network and the training?
< blakjak888_>
Can I email you the text?
< rcurtin>
sure, that can work
< rcurtin>
ryan@ratml.org
< blakjak888_>
Should be in your inbox. I cut out all the header stuff.
< blakjak888_>
Just sent main()
< rcurtin>
in your data load code, you can also do 'mlpack::data::Load("train_signs.h5", trainSetX)' and I think that will get everything in the format you need it :)
< rcurtin>
that's just a minor comment, it shouldn't make any difference for the actual program :)
< blakjak888_>
I tried that on the load but this is a multi dimensional H5 dataset
< blakjak888_>
I found this was the only way to get the data into the layout I needed for Armadillo matrices
< rcurtin>
ah, ok, the Armadillo functionality for loading HDF5 isn't perfect
< rcurtin>
(sorry about that!)
< rcurtin>
it works well if the hdf5 file just has a single matrix
< rcurtin>
are you sure that 'oneHotTrainSetY' doesn't contain any strange values?
< blakjak888_>
I can post the code. It is very simple.
< rcurtin>
also, I'm not sure if the call to ResetParameters() is needed before Train()
< blakjak888_>
And I checked that too.
< blakjak888_>
#include <armadillo>

template <typename T>
arma::Mat<T> oneHot(const arma::Row<T>& in)
{
  arma::Mat<T> oneHotMtx;
  // One row per class (labels assumed to run 0..max), one column per example.
  oneHotMtx.zeros(arma::max(in) + 1, in.n_cols);
  for (unsigned int i = 0; i < in.n_cols; ++i) // '<', not '<=', to stay in bounds
    oneHotMtx.at(in.at(i), i) = 1;
  return oneHotMtx;
}
< blakjak888_>
Messy. I can email it.
< rcurtin>
thanks---I think I can follow it, though it's easier with proper line breaks :)
< rcurtin>
I don't see anything wrong with that code either
< rcurtin>
so my only thoughts for debugging here are ResetParameters(), a different optimizer, or perhaps that something about the data is being loaded very weird. However, you say you've already checked the data so that seems unlikely
< rcurtin>
if none of those ideas of mine work, I'm wondering if the best idea is to open a GitHub issue. I don't expect that something's wrong with the FFN code, because I've certainly trained a lot of networks like the one you're using here
< rcurtin>
another thing to try (though the comments imply you've already tried it) is to use NegativeLogLikelihood<> not CrossEntropy<>; if you did that, what were the results?
< rcurtin>
(unfortunately NegativeLogLikelihood<> expects a vector of labels from 1 to n_classes, not one-hot encoded. so that is a little confusing)
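For anyone trying that route, a rough sketch of the conversion, assuming the original labels are the 0-based values discussed above (variable names are made up):

    // Original labels take values 0..5.
    arma::Row<size_t> labels = {0, 3, 2, 1, 5};

    // NegativeLogLikelihood<> expects a 1 x n matrix of labels in 1..numClasses.
    arma::mat nllLabels = arma::conv_to<arma::mat>::from(labels) + 1.0;

    // The network would then use the default output layer instead, e.g.
    //   FFN<NegativeLogLikelihood<>> model;
    //   ... add layers, typically ending with LogSoftMax<> ...
    //   model.Train(trainSetX, nllLabels, optimizer);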
< blakjak888_>
Is the Linear function WX+b? Is there a way to manually set the W and b parameters rather than use RandomInit?
< rcurtin>
there are a lot of initializations in src/mlpack/methods/ann/init_rules/, I suppose you could use those to set something explicitly
< rcurtin>
or, rather, I mean, you could write your own initialization class with the same API as in there, and perhaps set all the weights to 1 or something for debugging
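A rough sketch of what such a debugging initialization might look like, assuming the same Initialize() interface as the rules in src/mlpack/methods/ann/init_rules/ (the class name and fill value are made up for illustration):

    // Hypothetical init rule: fill every weight with a fixed value for debugging.
    class DebugInitialization
    {
     public:
      DebugInitialization(const double value = 1.0) : value(value) { }

      void Initialize(arma::mat& W, const size_t rows, const size_t cols)
      {
        W.set_size(rows, cols);
        W.fill(value);
      }

      void Initialize(arma::cube& W,
                      const size_t rows,
                      const size_t cols,
                      const size_t slices)
      {
        W.set_size(rows, cols, slices);
        for (size_t i = 0; i < slices; ++i)
          W.slice(i).fill(value);
      }

     private:
      double value;
    };

    // Usage (assumed): FFN<CrossEntropyError<>, DebugInitialization> model;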
< rcurtin>
unfortunately accessing the parameters of individual layers is a little bit hard because the design of the C++ code uses boost::variant for speed
< blakjak888_>
That was what I was thinking. Are the parameters for each layer and the format needed documented? e.g. I would expect that for my Linear<> layer the parameters should be two matrices, W = 25x12288 and b = 25x1, but when I tried to examine the Parameters() return value it gave me some strange matrix
< rcurtin>
right, that Parameters() matrix is actually the entire set of parameters in the entire network
< blakjak888_>
Oh. sorry, I see you replied before i asked
< rcurtin>
so it's the concatenation of all the weights and biases in the network
< rcurtin>
this is nice for speed, because everything is localized in memory
< rcurtin>
however, it's less nice for actually inspecting what is going on
< rcurtin>
it would be possible (but irritating) to write a method that actually returned the weight matrix only of a single layer, but even then because of the boost::variant usage things get really hard with types
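Given the NaN symptom, one safe thing to do with that flat parameter matrix is to sanity-check it as a whole rather than per layer; a small sketch, assuming the model object is called 'model':

    // Total number of weights and biases across all layers.
    std::cout << "parameter count: " << model.Parameters().n_elem << std::endl;

    // After (or during) training, check whether anything has blown up.
    if (model.Parameters().has_nan() || model.Parameters().has_inf())
      std::cout << "parameters contain NaN/Inf" << std::endl;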
< blakjak888_>
Perhaps I could start with a much simpler example to test whether it is actually my mlpack installation causing problems. Do you know where I could find something very basic/simple for testing an FFN with a small dataset and a known outcome?
< rcurtin>
I think it should be pretty easy to copy-paste the code from that example (or use the example in its entirety) and see if it works
< rcurtin>
another idea would be to look in src/mlpack/tests/feedforward_network_test.cpp, or even run the tests
< blakjak888_>
Thanks. I actually tried a similar version of that last night, but it was using a convolutional network so it was not quite like mine. This looks much better. I will give it a go and see if I have any similar convergence issues.
< rcurtin>
you could do 'make mlpack_test' to build the tests, then run all the tests and see if there are any issues
< rcurtin>
like I said, this one is confusing me a little bit. based on everything you've shown me it *should* work just fine
< rcurtin>
sorry that you are having issues :(
< blakjak888_>
I have a Windows installation so M$ could be screwing it up!!!
< rcurtin>
ahh, yeah, it can be a little more difficult to use mlpack on Windows. but it is possible :)
< rcurtin>
we do test our code with AppVeyor so at least there it properly builds and runs on a Windows environment. but many things could be different between your setup and that one...
< blakjak888_>
I will try now and post back when I get a result.
< rcurtin>
sure; I will be out for lunch soon, but I'll try and respond when I'm able to
< rcurtin>
and if you'd rather move to Github issues for debugging I'm happy to try and help there when I'm able also
< blakjak888_>
Thx
< blakjak888_>
Well the code is running:
< blakjak888_>
Reading data ...
Training ...
[NVBLAS] NVBLAS_CONFIG_FILE environment variable is set to 'D:\usr\C++\FirstLSTM\FirstLSTM\nvblas.conf'
0 - accuracy: train = 29.5741%, valid = 29.2143%
< blakjak888_>
and clearly the training is working:
< blakjak888_>
... so I will play with this working code and my data to see if it is actually my data that is causing the problem
< rcurtin>
hmm, ok. I did wonder if anything was 'weird' about the data (maybe a hidden NaN somewhere or something) but based on what you said it seemed unlikely
< rcurtin>
and since it's from a course the data should be ok also
< blakjak888_>
I noticed that in this example code there is no normalization of the data.
< blakjak888_>
The training is done directly on the pixel data.
< rcurtin>
that shouldn't necessarily be an issue; if the pixel data is between 0 and 255 the network should be able to adjust just fine
< rcurtin>
it's just weird things like NaN or Inf that'll definitely cause problems (or uninitialized memory being used? I didn't see any of that just from looking at the code though)
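Along those lines, a small sketch of checking the loaded data itself (the matrix name matches the one used earlier in the conversation; the scaling step is optional):

    // Look for hidden NaN/Inf values in the loaded pixel data.
    if (trainSetX.has_nan() || trainSetX.has_inf())
      std::cout << "trainSetX contains NaN/Inf" << std::endl;

    // Optionally scale raw pixel values from [0, 255] down to [0, 1];
    // not strictly required, but it often makes optimization easier.
    trainSetX /= 255.0;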
< blakjak888_>
It's a huge dataset, about 13MB, so it's hard to check everything, but this has at least given me a direction to go in. Thanks for your help
< rcurtin>
sure :)