#mlpack on 2018-09-28 — irc logs at libera.irclog.whitequark.org

2018-08-06 04:28 ChanServ changed the topic of #mlpack to: Due to ongoing spam on freenode, we've muted unregistered users. See http://www.mlpack.org/ircspam.txt for more information, or also you could join #mlpack-temp and chat there.

00:02 LordCow23 has quit [Read error: Connection reset by peer]

00:04 mac_nibblet9 has quit [Remote host closed the connection]

00:17 feirlane27 has joined #mlpack

00:17 feirlane27 has quit [Remote host closed the connection]

00:31 jamesd1 has joined #mlpack

00:32 jamesd1 has quit [K-Lined]

00:46 Game_Freak has joined #mlpack

00:47 Game_Freak is now known as Guest34047

00:51 Guest34047 has quit [Remote host closed the connection]

01:05 kuzetsa6 has joined #mlpack

01:07 kuzetsa6 has quit [Remote host closed the connection]

01:37 jose18 has joined #mlpack

01:37 jose18 has quit [Remote host closed the connection]

01:38 Kirjava20 has joined #mlpack

01:41 Kirjava20 has quit [Remote host closed the connection]

02:16 Genesis- has joined #mlpack

02:20 Genesis- has quit [Remote host closed the connection]

02:39 heyimwill20 has joined #mlpack

02:42 heyimwill20 has quit [Remote host closed the connection]

02:48 jadaking has joined #mlpack

02:52 jadaking has quit [Remote host closed the connection]

03:23 zipleen6 has joined #mlpack

03:24 meka17 has joined #mlpack

03:28 zipleen6 has quit [Remote host closed the connection]

03:28 meka17 has quit [K-Lined]

03:31 cnf8 has joined #mlpack

03:35 cnf8 has quit [Remote host closed the connection]

05:12 mutk has joined #mlpack

05:13 mutk has quit [K-Lined]

06:46 fikka has joined #mlpack

06:47 fikka has quit [Remote host closed the connection]

07:32 gsora20 has joined #mlpack

07:32 gsora20 has quit [Remote host closed the connection]

08:11 evil_steve21 has joined #mlpack

08:12 evil_steve21 has quit [Remote host closed the connection]

08:52 Ferus6 has joined #mlpack

08:52 Ferus6 has quit [Remote host closed the connection]

09:04 < ShikharJ> zoq: Those are minor changes, which I think can be done in your PR itself? I'd suggest that we remove the default parameters altogether, instead of setting the size as 10 for BatchNorm and as 1 for LayerNorm.

09:06 < zoq> ShikharJ: Okay, I'll make the necessary changes later today.

09:18 ivarmedi has joined #mlpack

09:21 ivarmedi has quit [Remote host closed the connection]

09:48 Dominionionion22 has joined #mlpack

09:49 Dominionionion22 has quit [Remote host closed the connection]

09:57 kaospunk_ has joined #mlpack

09:58 kaospunk_ has quit [Remote host closed the connection]

10:43 robertohueso has joined #mlpack

10:44 robertohueso has quit [Client Quit]

10:45 robertohueso has joined #mlpack

10:49 Bercik0 has joined #mlpack

10:52 Bercik0 has quit [Remote host closed the connection]

11:28 IrishFBall3227 has joined #mlpack

11:30 IrishFBall3227 has quit [Remote host closed the connection]

12:35 mvantellingen6 has joined #mlpack

12:40 mvantellingen6 has quit [Remote host closed the connection]

13:01 joepie9127 has joined #mlpack

13:02 joepie9127 has quit [Remote host closed the connection]

13:07 gtrs_ has joined #mlpack

13:07 gtrs_ has quit [Remote host closed the connection]

13:56 blakjak888 has joined #mlpack

14:01 blakjak has joined #mlpack

14:02 blakjak has quit [Client Quit]

14:02 blakjak888_ has joined #mlpack

14:05 blakjak888 has quit [Ping timeout: 256 seconds]

14:06 blakjak888_ has quit [Client Quit]

14:07 blakjak888 has joined #mlpack

14:09 blakjak888_ has joined #mlpack

14:10 blakjak888__ has joined #mlpack

14:11 blakjak888__ has quit [Client Quit]

14:12 blakjak888__ has joined #mlpack

14:12 blakjak888__ has quit [Client Quit]

14:13 blakjak888 has quit [Ping timeout: 256 seconds]

14:13 blakjak888__ has joined #mlpack

14:14 blakjak888_ has quit [Ping timeout: 256 seconds]

14:14 blakjak888__ has quit [Client Quit]

14:15 blakjak888_ has joined #mlpack

14:16 < blakjak888_> I am trying to get a little help with my first MLPACK FFN. I am going thru' Andrew Ng's Deepleaning.ai course and doing the projects in parallel in C++ with MLPACK and Armadillo. Is this the right place to ask some questions related to the use of MLPACK. I have searched a lot for examples but could not find anything to explain why my code is failing.

14:17 < blakjak888_> I build an FFN like this:

14:17 < blakjak888_> mlpack::ann::FFN<mlpack::ann::CrossEntropyError<>, mlpack::ann::RandomInitialization> model; model.Add<mlpack::ann::Linear<> >(trainSetX.n_rows, 25); model.Add<mlpack::ann::ReLULayer<> >(); model.Add<mlpack::ann::Linear<> >(25, 12); model.Add<mlpack::ann::ReLULayer<> >(); model.Add<mlpack::ann::Linear<> >(12, 6); model.Add<mlpack::ann::LogSoftMax<> >();

14:18 < blakjak888_> ... with my optimizer like this: mlpack::optimization::SGD<mlpack::optimization::AdamUpdate> optimizer(0.0001, 64, 10000, 1e-8, true, mlpack::optimization::AdamUpdate(1e-8, 0.9, 0.999));

14:19 < blakjak888_> However, I am always getting the same error no matter how I tune my "step" value. It tells me that SGD does not converge.

14:19 < blakjak888_> ... sorry, it actually converges to NaN.

14:21 < blakjak888_> I have double checked my input data at it is exactly matching my Python data. The input data is 12288 x 1080.

14:23 < blakjak888_> I suspect I may be missing a step somewhere, however when I follow any examples that I can find on the web for FFNs they pretty much do what I am doing here.

14:23 < blakjak888_> Can anyone give some advice or point me in the right direction?

14:30 < rcurtin> blakjak888_: sorry that you're having issues

14:30 < rcurtin> there are some examples you could look at in src/mlpack/tests/feedforward_network_test.cpp, and perhaps those would be useful

14:30 < rcurtin> if you're training with cross-entropy error I guess this is classification... can you tell me what your labels are?

14:32 < blakjak888_> I have 6 labels. The data is 1080 examples of 64x64x3 (RGB) images of a hand showing 1, 2, 3, 4, 5 fingers or a 0 indicated by thumb and forefinger together.

14:32 < rcurtin> sounds good. do the labels take values between 0 and 5, or 1 and 6? (or something else?)

14:32 < blakjak888_> 0 to 5

14:33 < rcurtin> hm. so, I am not sure on this (zoq should correct me) but I think that the output for using CrossEntropyError needs to be one-hot encoded

14:33 < rcurtin> let me check an example...

14:34 < rcurtin> right, I think this is the case; so instead of having a vector like [0 3 2 1 1 ...] where each element is a label, try using a matrix with 6 rows, where only the row corresponding to the true label is 1 and all others are zeros

14:34 < blakjak888_> Yes. I am using one-hot encoding

14:34 < rcurtin> ohh, I see

14:34 < rcurtin> ok

14:34 < blakjak888_> So my label matrix is 6x1080

14:35 < rcurtin> right, that sounds good

14:36 < rcurtin> the model you're using is pretty simple, so I don't think simplifying it further will make a difference...

14:36 < blakjak888_> I have manually checked all the input matrices and they seem to match exactly what I had in the Coursera deeplearning.ai tutorial.

14:36 < rcurtin> have you tried using a different optimizer than Adam?

14:36 < blakjak888_> I came to suspect that my optimizer was not setup correctly due to the error message I was getting.

14:36 < rcurtin> it seems like a bit of a long shot, since usually Adam will work ok

14:37 < blakjak888_> I tried GradientDescent

14:37 < rcurtin> just glancing at the definition of 'optimizer' it seems to have ok parameters to me

14:37 < rcurtin> did that work?

14:37 < blakjak888_> same error. In fact I tried several optimizers and all were giving me the same error.

14:37 < blakjak888_> Converging to NaN

14:38 < rcurtin> hm, ok, that is unexpected. based on what you've told me so far this *should* work, so there must be something else

14:38 < rcurtin> can you show me the full code?

14:38 < blakjak888_> I even cut my FFN to 2 layers. Linear then Sigmoid. ... but same error

14:38 < blakjak888_> Sure.

14:38 < blakjak888_> Is there a way to post code here?

14:38 < rcurtin> I'd use pastebin then copy the link to pastebin

14:38 < blakjak888_> I am not so familiar with the IRC

14:38 < rcurtin> no worries :)

14:38 < blakjak888_> Am using a broweser right now

14:39 < rcurtin> yeah, webchat works reasonably well enough I think :)

14:39 < blakjak888_> Should I paste the file or a copy of the text?

14:40 < rcurtin> either should be fine as long as I can see the code :)

14:42 < blakjak888_> not sure how to do a pastebin on this

14:42 < blakjak888_> If I just paste now I think it will join all the lines together like above.

14:42 < blakjak888_> I'll try

14:43 < rcurtin> hmmm, that's unfortunate... I guess you could re-add the line breaks in but that's a bit tedious

14:43 < blakjak888_> Yup. Won't allow it. Text too long

14:43 < rcurtin> maybe just the part of the code concerned with the building of the network and the training?

14:43 < blakjak888_> Can I email you the text?

14:43 < rcurtin> sure, that can work

14:43 < rcurtin> ryan@ratml.org

14:44 < blakjak888_> Should be in your inbox. I cut out all the header stuff.

14:44 < blakjak888_> Just send main()

14:45 < rcurtin> in your data load code, you can also do 'mlpack::data::Load("train_signs.h5", trainSetX)' and I think that will get everything in the format you need it :)

14:46 < rcurtin> that's just a minor comment, it shouldn't make any difference for the actual program :)

14:47 < blakjak888_> I tried that on the load but this is a multi dimensional H5 dataset

14:47 < blakjak888_> I found this was the only way to get in the data in the layout I needed for Armadillo matrices

14:47 < rcurtin> ah, ok, the Armadillo functionality for loading HDF5 isn't perfect

14:47 < rcurtin> (sorry about that!)

14:47 < rcurtin> it works well if the hdf5 file just has a single matrix

14:48 < rcurtin> are you sure that 'oneHotTrainSetY' doesn't contain any strange values?

14:48 < blakjak888_> I can post the code. I ti s very simple.

14:48 < rcurtin> also, I'm not sure if the call to ResetParameters() is needed before Train()

14:48 < blakjak888_> And I checked that too.

14:48 < blakjak888_> #include <armadillo> template <typename T> arma::Mat<T> oneHot(const arma::Row<T>& in) { //arma::Row<T> uniqueIn(arma::unique(in)); arma::Mat<T> oneHotMtx; //oneHotMtx.zeros(arma::Mat<T>(arma::unique(in)).n_cols, in.n_cols); oneHotMtx.zeros(arma::max(in)+1, in.n_cols); for (unsigned int i = 0; i <= in.n_cols; ++i) oneHotMtx.at(in.at(i), i) = 1; return oneHotMtx; }

14:49 < blakjak888_> Messy. I can email it.

14:50 < rcurtin> thanks---I think I understand like this but it's easier to follow it with line breaks :)

14:51 < rcurtin> I don't see anything wrong with that code either

14:51 < rcurtin> so my only thoughts for debugging here are ResetParameters(), a different optimizer, or perhaps that something about the data is being loaded very weird. However, you say you've already checked the data so that seems unlikely

14:52 < rcurtin> if none of those ideas of those mine work, I'm wondering if the best idea is to open a Github issue. I don't expect that something's wrong with the FFN code, because I've certainly trained a lot of networks like the one you're using here

14:52 < rcurtin> another thing to try (though the comments imply you've already tried it) is to use NegativeLogLikelihood<> not CrossEntropy<>; if you did that, what were the results?

14:52 < rcurtin> (unfortunately NegativeLogLikelihood<> expects a vector of labels from 1 to n_classes, not one-hot encoded. so that is a little confusing)

14:53 < blakjak888_> Is the Linear Function WX+b? Is there a way to manually set the W and b paramters rather than use RandomInit?

14:54 < rcurtin> there are a lot of initializations in src/mlpack/methods/ann/init_rules/, I suppose you could use those to set something explicitly

14:54 tokenrove16 has joined #mlpack

14:54 < rcurtin> or, rather, I mean, you could write your own initialization class with the same API as in there, and perhaps set all the weights to 1 or something for debugging

14:55 < rcurtin> unfortunately accessing the parameters of individual layers is a little bit hard because the design of the C++ code uses boost::variant for speed

14:57 < blakjak888_> That was what I was thinking. Are the paramters for each function and the format needed documented? e.g. I would expect for my Linear<> layer, the Paramters should be two matrices. W = 25x12288 and b = 25x1 but when I tried to examine Paramters() return value it gave me some strange matrix

14:57 < rcurtin> right, that Parameters() matrix is actually the entire set of parameters in the entire network

14:57 < blakjak888_> Oh. sorry, I see you replied before i asked

14:57 < rcurtin> so it's the concatenation of all the weights and biases in the network

14:57 < rcurtin> this is nice for speed, because everything is localized in memory

14:57 < rcurtin> however, it's less nice for actually inspecting what is going on

14:58 < rcurtin> it would be possible (but irritating) to write a method that actually returned the weight matrix only of a single layer, but even then because of the boost::variant usage things get really hard with types

14:59 < blakjak888_> Perhaps I could start with a much simpler example to test to see if it is actually my MLPACK installation causing problems. Do you know where I could find something very basic/simple for testing FNN with a small dataset and a known outcome?

14:59 < rcurtin> https://github.com/mlpack/models/blob/master/Kaggle/DigitRecognizer/src/DigitRecognizer.cpp

14:59 < rcurtin> there's an MNIST example model that just uses a simple network

15:00 < rcurtin> you can get the data from the repository in the right format: https://github.com/mlpack/models

15:02 tokenrove16 has quit [Ping timeout: 264 seconds]

15:02 < rcurtin> I think it should be pretty easy to copy-paste the code from that example (or use the example in its entirety) and see if it works

15:03 < rcurtin> another idea would be to look in src/mlpack/tests/feedforward_network_test.cpp, or even run the tests

15:03 < blakjak888_> Thanks. I actually tried a similar version of that last night which was using a Convulution network so was not quite like mine. This looks much better. I will give it a go and see if I have any similar convergence issues.

15:03 < rcurtin> you could do 'make mlpack_test' to build the tests, then run all the tests and see if there are any issues

15:03 < rcurtin> like I said, this one is confusing me a little bit. based on everything you've shown me it *should* work just fine

15:03 < rcurtin> sorry that you are having issues :(

15:03 < blakjak888_> I have a Windows installation so M$ could be screwing it up!!!

15:04 < rcurtin> ahh, yeah, it can be a little more difficult to use mlpack on Windows. but it is possible :)

15:04 < rcurtin> we do test our code with AppVeyor so at least there it properly builds and runs on a Windows environment. but many things could be different between your setup and that one...

15:04 < blakjak888_> I will try now and post back when I get a result.

15:05 < rcurtin> sure; I will be out for lunch soon, but I'll try and respond when I'm able to

15:05 < rcurtin> and if you'd rather move to Github issues for debugging I'm happy to try and help there when I'm able also

15:05 < blakjak888_> Thx

15:11 < blakjak888_> Well the code is running:

15:11 < blakjak888_> Reading data ... Training ... [NVBLAS] NVBLAS_CONFIG_FILE environment variable is set to 'D:\usr\C++\FirstLSTM\FirstLSTM\nvblas.conf' 0 - accuracy: train = 29.5741%, valid = 29.2143%

15:12 < blakjak888_> and clearly the training is working:

15:12 < blakjak888_> 1 - accuracy: train = 50.4286%, valid = 48.6429% 2 - accuracy: train = 59.5979%, valid = 58.8095% 3 - accuracy: train = 65.2407%, valid = 63.9524%

15:14 < blakjak888_> ... so I will play with this working code and my data to see if it is actually my data that is causing the problem

15:15 < rcurtin> hmm, ok. I did wonder if anything was 'weird' about the data (maybe a hidden NaN somewhere or something) but based on what you said it seemed unlikely

15:16 < rcurtin> and since it's from a course the data should be ok also

15:18 < blakjak888_> I noticed that in this example code there is no normalization of the data.

15:18 < blakjak888_> The training is done directly on the pixel data.

15:19 < rcurtin> that shouldn't necessarily be an issue; if the pixel data is between 0 and 255 the network should be able to adjust just fine

15:19 < rcurtin> it's just weird things like NaN or Inf that'll definitely cause problems (or uninitialized memory being used? I didn't see any of that just from looking at the code though)

15:21 < blakjak888_> It's a huge dataset, about 13MB so hard to cehck everything, but this has at least given me a direction to go in. Thanks for your help

15:21 < rcurtin> sure :)

15:32 tedjp has joined #mlpack

15:33 Guest13996 has joined #mlpack

15:36 tedjp has quit [Remote host closed the connection]

15:40 Guest13996 has quit [Remote host closed the connection]

15:48 roidelapluie25 has joined #mlpack

15:50 roidelapluie25 has quit [Remote host closed the connection]

15:57 KnownSyntax23 has joined #mlpack

16:02 KnownSyntax23 has quit [Remote host closed the connection]

16:11 catsup1 has joined #mlpack

16:22 catsup1 has quit [Ping timeout: 252 seconds]

16:24 noeatnosleep22 has joined #mlpack

16:26 LeoTh3o28 has joined #mlpack

16:27 noeatnosleep22 has quit [Remote host closed the connection]

16:31 LeoTh3o28 has quit [Remote host closed the connection]

16:44 okamis has joined #mlpack

16:47 okamis has quit [Remote host closed the connection]

17:22 cjlcarvalho has joined #mlpack

17:36 Osleg13 has joined #mlpack

17:41 Osleg13 has quit [Remote host closed the connection]

17:54 etonka28 has joined #mlpack

17:57 etonka28 has quit [Remote host closed the connection]

18:18 cjlcarvalho has quit [Ping timeout: 252 seconds]

18:46 lumidify has joined #mlpack

18:52 lumidify has quit [Remote host closed the connection]

19:24 Guest28198 has joined #mlpack

19:32 LaunchpadMcQuack has joined #mlpack

19:33 Guest28198 has quit [Ping timeout: 252 seconds]

19:39 LaunchpadMcQuack has quit [Remote host closed the connection]

19:55 FruitieX18 has joined #mlpack

19:59 pepo3 has joined #mlpack

20:01 FruitieX18 has quit [Remote host closed the connection]

20:03 pepo3 has quit [Remote host closed the connection]

22:22 hayer_ has joined #mlpack

22:26 hayer_ has quit [Remote host closed the connection]

22:43 phadthai28 has joined #mlpack

22:52 phadthai28 has quit [Ping timeout: 252 seconds]