verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
< chenzhe> zoq: Thanks!
chenzhe has quit [Quit: chenzhe]
kris1 has quit [Quit: kris1]
sumedhghaisas has quit [Ping timeout: 248 seconds]
sheogorath27_ has joined #mlpack
sheogorath27_ is now known as sheogorath27
sumedhghaisas has joined #mlpack
< sumedhghaisas> rcurtin: Hey Ryan... still awake? I solved yesterday's problem so didn't send any email. But I am stuck right now on a differentiation problem...
< sumedhghaisas> what is the derivative of W_dash = W / sum(W) ?
< sumedhghaisas> I have this function in NTM...
< sumedhghaisas> maybe I am making some horrible mistake here... but I am getting the derivative to be zero...
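[For reference, the standard quotient-rule form of the derivative being asked about (not from the log). With S = \sum_k W_k and W'_i = W_i / S,
    \frac{\partial W'_i}{\partial W_j} = \frac{\delta_{ij}}{S} - \frac{W_i}{S^2},
which is not zero in general. Each column of this Jacobian does sum to zero, though (\sum_i \partial W'_i / \partial W_j = 1/S - S/S^2 = 0), so backpropagating an all-ones upstream gradient through the normalization collapses to zero; that is one plausible way to end up with the result described above.]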
sumedhghaisas has quit [Ping timeout: 248 seconds]
partobs-mdp has joined #mlpack
sumedhghaisas has joined #mlpack
sumedhghaisas has quit [Ping timeout: 248 seconds]
< partobs-mdp> zoq: Finally managed to clean up; now running MiniBatchSGD + reset code. However, what value should I set for epochs in MiniBatchSGD? It seems to run a lot of passes in one "iteration"
kris1 has joined #mlpack
sumedhghaisas has joined #mlpack
kris1 has quit [Quit: kris1]
< partobs-mdp> The objective also steadily decreases, though
kris1 has joined #mlpack
kris1 has quit [Quit: kris1]
kris1 has joined #mlpack
kris1 has quit [Client Quit]
kris1 has joined #mlpack
kris1 has quit [Client Quit]
kris1 has joined #mlpack
mikeling has quit [Quit: Connection closed for inactivity]
sumedhghaisas has quit [Ping timeout: 248 seconds]
vivekp has quit [Ping timeout: 240 seconds]
mikeling has joined #mlpack
shikhar has joined #mlpack
shikhar has quit [Read error: Connection reset by peer]
shikhar has joined #mlpack
shikhar has quit [Read error: Connection reset by peer]
shikhar has joined #mlpack
shikhar has quit [Read error: Connection reset by peer]
shikhar has joined #mlpack
MikeLDN has joined #mlpack
shikhar has quit [Read error: Connection reset by peer]
shikhar has joined #mlpack
kris1 has quit [Quit: kris1]
< MikeLDN> Hi all
< MikeLDN> I'm having a problem with the cmd line CF and the query file option (testing on the MovieLens set, probably some mess with indexing). Can anyone give me a link to another free dataset?
< zoq> partobs-mdp: I used the same settings as you used for Adam, e.g. opt.MaxIterations() = samples; and a batch size of 10.
< zoq> MikeLDN: You could test it on the MovieLens datasets as proposed in the tutorial: http://www.mlpack.org/docs/mlpack-2.2.3/doxygen/cftutorial.html
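[For reference, a rough sketch of the settings zoq describes, assuming the mlpack-2.x-era optimizer interface; model, trainData, and trainLabels are hypothetical names, and the exact construction shown is an assumption. Only the two quoted settings come from the discussion itself.

  #include <mlpack/core.hpp>
  #include <mlpack/core/optimizers/minibatch_sgd/minibatch_sgd.hpp>

  // Given an already-built network `model` and training matrices
  // `trainData` / `trainLabels` (hypothetical), the quoted settings are:
  const size_t samples = trainData.n_cols;                          // number of training points
  mlpack::optimization::MiniBatchSGD<decltype(model)> opt(model);   // construction is an assumption
  opt.BatchSize() = 10;                                             // the batch size zoq mentions
  opt.MaxIterations() = samples;                                    // "opt.MaxIterations() = samples"
  model.Train(trainData, trainLabels, opt);
]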
shikhar has quit [Read error: Connection reset by peer]
< MikeLDN> You mean "ml-latest-small.zip" from https://grouplens.org/datasets/movielens/ ?
< zoq> MikeLDN: right
< MikeLDN> Thnx. I have tested with ratings.csv (first removed the heading row and the timestamp column) but I don't get the same result as on the CF tutorial page. Both the user and item columns start at index 1 (instead of 0). Could that be the problem, should I re-index them?
shikhar has joined #mlpack
< zoq> MikeLDN: Since the dataset has been updated over time, you won't see exactly the same results. Perhaps rcurtin, who is also in the channel, still has the old dataset; he will respond once he has a chance.
< MikeLDN> That would be perfect, thanks.
shikhar has quit [Ping timeout: 240 seconds]
shikhar has joined #mlpack
vivekp has joined #mlpack
< partobs-mdp> zoq: Finally managed to properly run the experiment. After *nine* iterations I got 97.4% on the test set.
< partobs-mdp> bin/mlpack_lstm_baseline -t copy -e 800 -b 2 -l 10 -r 3 -s 1000 -v
< zoq> partobs-mdp: Sounds good, what batch size did you use?
< partobs-mdp> I think we can simply test AddTask after that and *finally* start working on HAM (but we have some kind of a head start: the memory data structure)
< partobs-mdp> zoq: 10, as you proposed
< partobs-mdp> 99.7% after 10 iterations
< zoq> partobs-mdp: I agree, I think it's a good idea to split the copy task from the other tasks and open a single PR for the add task. Once we've figured out the issues with the other tasks we can merge them. What do you think?
< partobs-mdp> zoq: Sounds good, but isn't it over-engineering? We're (imho) more or less done with the task API and representation, so we can simply push the changes to the running PR. (Not sure if I'm right about that, though)
< zoq> We can leave the PR open for now, but I don't like the idea of merging something without providing a model that is able to learn the task stably. Since we know we can learn the copy task, I thought we could merge the code now.
< partobs-mdp> zoq: Can't find the gist with the results, but did we have 5% precision on 4-bit addition task?
< partobs-mdp> *have we ever had
< zoq> almost
< partobs-mdp> well, it was the result from the first iteration
< partobs-mdp> on iter5 it has 5.5%
< partobs-mdp> zoq: I think we can call the gradient explosion problem solved. However, AddTask learns really slowly even now
< zoq> Have you tested the Add task with mini-batch SGD? I couldn't see any progress after about 1000 iterations.
< partobs-mdp> I'm currently running it on minibatch SGD
< zoq> If you are able to train the Task, I don't see a reason to split the Add task from the tasks for now.
shikhar has quit [Ping timeout: 260 seconds]
shikhar has joined #mlpack
< partobs-mdp> zoq: Well, although I did manage to get 51.5% on test for adding 1-bit numbers (our best, as far as I remember), the learning always halts after several iterations in AddTask (the objective keeps going down in small steps, but the validation score doesn't really improve)
< zoq> partobs-mdp: For me the question remains what the issue is: parameter initialization, or some issue with the task itself? If we are able to learn the task with HAM or another model, I don't mind merging it.
shikhar has quit [Read error: Connection reset by peer]
shikhar has joined #mlpack
shikhar_ has joined #mlpack
shikhar has quit [Ping timeout: 268 seconds]
shikhar_ has quit [Quit: WeeChat 1.7]
< rcurtin> MikeLDN: I think I used the standard movielens-100k dataset for that, no special modifications
shikhar has joined #mlpack
< rcurtin> I can't seem to find the exact old dataset that I used though
< rcurtin> I am also finding that with the current master branch, I am not seeing recommendations that look correct... a huge number of the top recommendations are item 1, which doesn't seem right
< rcurtin> I'm digging deeper now, let's see what I find...
< MikeLDN> rcurtin: Thnx. I was able to generate the -A output but could not check it. However, with the -q option I wasn't able to get the result I would expect (for instance, a single user's result is not the same as the one in the -A set). The user/item indices were left starting from 1 (not 0). Hope this helps...
< rcurtin> I see that the bug was introduced between mlpack-2.0.3 and mlpack-2.1.0
< rcurtin> if you want to try with mlpack-2.0.3, I think you will get the result you expect, and in the meantime I see the code that was changed and am starting to work with it to resolve the issue
< rcurtin> (I guess this means that mlpack-2.2.4 gets released today as a bugfix...)
< rcurtin> also, keep in mind that if you run mlpack_cf twice, once with '-A' and once with '-q', you'll get different results because the starting point of the optimization is random
< rcurtin> so to get the same results, you'd need to also specify --seed <some value>
< rcurtin> with the same seed value for both runs
< rcurtin> or, alternately, use --output_model_file to save the model for the first run, then use --input_model_file for a subsequent run
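[For reference, a sketch of the two workflows rcurtin describes. The --training_file/--output_file flags and the file names are assumptions for illustration; only --seed, -A, -q, --output_model_file, and --input_model_file come from the discussion itself.

  # same seed for both runs, so both start from the same random initialization
  mlpack_cf --training_file ratings.csv -A --output_file all_recs.csv --seed 42
  mlpack_cf --training_file ratings.csv -q query.csv --output_file query_recs.csv --seed 42

  # alternatively: train once, save the model, and reuse it for the query run
  mlpack_cf --training_file ratings.csv -A --output_file all_recs.csv --output_model_file cf_model.xml
  mlpack_cf --input_model_file cf_model.xml -q query.csv --output_file query_recs.csv
]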
< MikeLDN> All clear. Thank you for the fast response...
< rcurtin> sure, let me know if you still have issues with mlpack-2.0.3, and I'll keep you updated with what I find as I dig into this bug
MikeLDN has quit [Ping timeout: 260 seconds]
< rcurtin> wow, I found it... it is a single-character bug
partobs-mdp has quit [Remote host closed the connection]
MikeLDN has joined #mlpack
< MikeLDN> Nice. I will wait for the release then, no need to compile 2.0.3 (for Win)
kris1 has joined #mlpack
kris1 has quit [Quit: kris1]
kris1 has joined #mlpack
< rcurtin> MikeLDN: sure, I should have it done by the end of the day if something else doesn't distract me
shikhar_ has joined #mlpack
shikhar has quit [Ping timeout: 246 seconds]
sumedhghaisas has joined #mlpack
< MikeLDN> rcurtin: a note: I took the fresh code from master, compiled it, and can confirm it is now working as expected (-q option). cheers
< rcurtin> great! :)
< sumedhghaisas> zoq: Hey Marcus...
< rcurtin> sumedhghaisas: I am awake now too, finally we are here at the same time :)
< rcurtin> sorry about last night, I had just gone to bed
< sumedhghaisas> rcurtin: Hey Ryan. Okay, I also had something to talk to you about: your idea to remove the constant zero vector from GRU and LSTM
< sumedhghaisas> In most of the experiments now they use a bias initialization which is trained.
< rcurtin> can you link me to the code and discussion so I can refresh my memory?
< sumedhghaisas> So I was talking about the zero vector which is used as cell initialization
< rcurtin> sure, it's PR #1018 but which subdiscussion is it? I am not seeing a relevant one
< rcurtin> I owe you a response on this PR too, I'll handle that now, but we should talk about what you wanted to talk about here :)
< sumedhghaisas> ahh yes... I see that it shows outdated now.
< rcurtin> that's ok, if that is the right discussion then I am on the same page :)
< sumedhghaisas> it's the 3 points of change that you suggested in gru_impl.hpp
< rcurtin> right, I think that's what I just linked to
< sumedhghaisas> ahh yeah. So now, while I am implementing NTM, I have realized that most of the architectures do not use a zero vector there.
< sumedhghaisas> because always using a zero vector is very restrictive. They use a bias layer instead, which can be trained.
< sumedhghaisas> thus the cell will figure out which vector to initialize itself with.
< sumedhghaisas> for the optimum performance
< rcurtin> right, so your idea then is to replace the zeros with something trainable
< sumedhghaisas> yes. But I am not sure how I would design that.
< rcurtin> do the NTM papers or anything discuss this bias layer?
< sumedhghaisas> yeah. let me send you the link...
< sumedhghaisas> Page 10
< sumedhghaisas> section Experiments
< sumedhghaisas> second para
< rcurtin> right, so they say that the bias vector is learned there
< rcurtin> at least for the LSTM networks
< rcurtin> but for NTM they just give "bias values"
< rcurtin> hehe, and the word 'bias' only appears three times in the paper
< sumedhghaisas> yeah... seems like NTM is biased towards something.
< sumedhghaisas> :)
< rcurtin> according to the paper, it is 'biased towards storing data without interference' :) (that's one of the three occurrences)
< sumedhghaisas> I think what they mean is again the same learned bias layer... for NTM...
< rcurtin> yeah, I think that is all we can assume
< sumedhghaisas> storing data without inference? What does that even mean? anyways...
< sumedhghaisas> I was thinking about accepting a layer in the LSTM constructor which will be used as the initialization layer...
< sumedhghaisas> what do you think?
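[A purely hypothetical sketch of the idea above, i.e. an LSTM that accepts an extra layer in its constructor and uses that layer's trainable output in place of the constant zero initialization; nothing below exists in mlpack, and every name is illustrative only.

  // Hypothetical skeleton: the init layer's (trained) output would replace
  // arma::zeros(outSize, 1) as the cell/hidden state at the start of a sequence.
  template<typename InitLayerType>
  class LSTMWithLearnedInit
  {
   public:
    LSTMWithLearnedInit(const size_t inSize,
                        const size_t outSize,
                        InitLayerType initLayer) :
        inSize(inSize), outSize(outSize), initLayer(std::move(initLayer))
    { }

   private:
    size_t inSize, outSize;
    InitLayerType initLayer;  // trained along with the rest of the network
  };
]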
< rcurtin> I'd be interested in zoq's thoughts on that one, I think it could be fine, but we would have to modify the LSTM (and GRU) so it properly trains that initialization layer
< rcurtin> personally, I think that going with zeros is okay for now---in the end, the priority is getting your overall project done
< sumedhghaisas> the paper is super vague though. The funniest thing is... in the whole paper they have not mentioned exactly where the external output is. The image shows that it is coming from the controller network. But is it the same as the input to the memory heads, or is it different?
< rcurtin> so I would say, maybe it is better to open an issue detailing the problem and how it can be fixed, but focus your efforts on the NTM implementation itself
< rcurtin> and only revisit the issue this summer if the performance you are getting is far lower than what is expected by the paper
shikhar_ has quit [Quit: WeeChat 1.7]
< sumedhghaisas> ahh yes. That's what I am doing right now. Even in NTM I am assuming zero init.
< rcurtin> right; so, do you think you want to open an issue detailing the problem, detailing possible approaches, and then we can continue the discussion of what to do there?
< rcurtin> that will be a little more long-lasting of a discussion than IRC I think :)
< rcurtin> personally I suspect the zero init will have only a minor effect on the performance
< rcurtin> (I could be wrong though---but we'll find out later)
< rcurtin> as for the comments I left in the PR, I'd just do whatever you think the easiest of the three options is
< rcurtin> it would be nice to get a little speedup over the current implementation, but if you are going to open an issue where we'll re-engineer it later anyway, there is no need to put hours upon hours into perfectly optimizing it
partobs-mdp has joined #mlpack
< sumedhghaisas> rcurtin: Ahh yes. I have already pushed the solution using idea 2. Idea 2 involves a lot of refactoring, so if it is going to get replaced I don't see a point
< sumedhghaisas> *sorry, I meant idea 3 involved a lot of refactoring
< rcurtin> yeah, idea 3 would have been a lot of work
< rcurtin> idea 2 is just fine, so I guess with that, then if there are no more comments the GRU implementation is ready?
< rcurtin> I guess, how is the NTM work coming otherwise?
< rcurtin> I guess for the batchnorm layer, we already have praveen's PR #955, but do you need to modify that to be implemented the way that you need? I think it would be ok to use his PR as a base to implement some further changes
< ironstark> rcurtin: zoq: Is Neural Network Toolbox not installed in slake? When I try to run Perceptron it shows this error
< ironstark> To use 'perceptron', the following product must be both licensed and installed:
< ironstark> Neural Network Toolbox
< rcurtin> ironstark: I think I installed all toolboxes I have a license to install
sumedhghaisas has quit [Ping timeout: 248 seconds]
vivekp has quit [Ping timeout: 240 seconds]
< ironstark> Actually the current perceptron script is not working
< ironstark> It is because of a different reason
< ironstark> thought that was the reason
< ironstark> but the toolbox is installed.
< ironstark> There seems to be some error with the MATLAB installation
< ironstark> when I run the scripts on this page - while running the line y = net(x);
< ironstark> I get: Subscript indices must either be real positive integers or logicals.
< ironstark> I get the same thing when I am running predictions = net(testSet) in the current perceptron script
< zoq> ironstark: It is a binary classifier, so you can't train it on e.g. the iris dataset.
< zoq> The webpage or arcene dataset should work.
< zoq> For the arcene dataset you have to normalize the labels.
< ironstark> oh, ok got it. Thanks :)
< zoq> If that doesn't solve the problem, let us know and we take a closer look into the issue.
< kris1> Hi Mikhail, I updated the GAN implementation
< kris1> The only thing I am not sure about now is the Evaluate function.
< kris1> Can you have a look at that....
< kris1> Basically, if I want to use the Evaluate function of the generator or the discriminator network, I would have to provide them with predictors and responses. If I implement the Evaluate function I don't have to do that
< kris1> But I am not sure what the Evaluate function for the GAN should be
mentekid has quit [Quit: Leaving.]
kris1 has quit [Quit: kris1]
mentekid has joined #mlpack
kris1 has joined #mlpack
partobs-mdp has quit [Remote host closed the connection]
< kris1> zoq: there?
< lozhnikov> kris1: I think the Evaluate function should look like log(1 - D(G(z))), i.e. the function that we are going to minimize
< kris1> For the whole GAN, i.e. for both the generator and the discriminator?
< lozhnikov> yes, for the GAN class
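[For reference, the objective being referred to is the standard GAN minimax game (Goodfellow et al., 2014):
    \min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))],
so an Evaluate function built around log(1 - D(G(z))) is the generator's part of that objective, i.e. the quantity being minimized, as lozhnikov suggests.]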
< kris1> Okay, I will implement that. Also, could you look at the above gist? I have changed the training function as you had suggested
< kris1> With ssRBM I am running into a chol error, meaning that the matrix is no longer PSD
< kris1> when I am running on the MNIST dataset
< lozhnikov> Could you open a PR for GANs? It is more convenient.
< lozhnikov> What do you mean by "the matrix is no longer PSD"?
< kris1> Well, I mean to say that the lambda matrix now has negative elements on the diagonal
< kris1> when I am training it
< lozhnikov> maybe the sign is invalid somewhere
< kris1> Okay, I will check for that
< lozhnikov> I'll check the formulas in the paper tomorrow, it's too late for that now.
< lozhnikov> When I was reading the paper for the first time, I tried to derive a couple of formulas and got slightly different formulae, up to sign. But I am not sure that I wasn't mistaken; I didn't dig into that.
< kris1> Do you mean checking the formulas against the formulas in the PR? Yes, that would be great.
< lozhnikov> I mean I'll check the formulas in the paper and then I'll check the formulas in the PR
< kris1> Sure
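[For context, a standard linear-algebra fact rather than anything from the log: the Cholesky factorization Lambda = L L^T exists only for symmetric positive-definite matrices, and every positive-definite matrix has strictly positive diagonal entries. So a negative value on the diagonal of the lambda matrix is already enough to make chol() fail, which is consistent with a sign error somewhere in the update formulas.]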
< rcurtin> ironstark: zoq: I was using the benchmarking system today, but I found that the check we have been applying does not work: 'if len(result) > 1'
< rcurtin> this is because 'result' is of type int, and if you do len(int), you get
< rcurtin> TypeError: object of type 'int' has no len()
< zoq> rcurtin: Good point, easy fix would be to do 'if isinstance(..., list) and len(..):'
< zoq> sumedhghaisas: rcurtin: You can even get rid of the zero output entirely, since the error is zero if the output is zero.
< zoq> Also, I agree, the effect of the trainable bias layer should be minimal, and it's only used for the LSTM model anyway; since we are basically interested in the NTM layer we should concentrate on that point.
MikeLDN has quit [Ping timeout: 260 seconds]
< kris1> My program gets killed automatically for bigger-sized inputs.
< kris1> Looking at lldb, the problem seems to be with the allocation of memory.
< kris1> arrayops::copy( memptr(), in_mat.mem, in_mat.n_elem );
< kris1> The program works for small values of inSize and outSize.