verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
< chenzhe> zoq: Thanks!
chenzhe has quit [Quit: chenzhe]
kris1 has quit [Quit: kris1]
sumedhghaisas has quit [Ping timeout: 248 seconds]
sheogorath27_ has joined #mlpack
sheogorath27_ is now known as sheogorath27
sumedhghaisas has joined #mlpack
< sumedhghaisas> rcurtin: Hey Ryan... still awake? I solved yesterday's problem so didn't send any email. But I am stuck right now on a differentiation problem...
< sumedhghaisas> what is the derivative of W_dash = W / sum(W) ?
< sumedhghaisas> I have this function in NTM...
< sumedhghaisas> maybe I am making some horrible mistake here... but I am getting the derivative to be zero...
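[For reference, the standard quotient-rule form of the derivative being asked about (not from the log). With S = \sum_k W_k and W'_i = W_i / S,
    \frac{\partial W'_i}{\partial W_j} = \frac{\delta_{ij}}{S} - \frac{W_i}{S^2},
which is not zero in general. Each column of this Jacobian does sum to zero, though (\sum_i \partial W'_i / \partial W_j = 1/S - S/S^2 = 0), so backpropagating an all-ones upstream gradient through the normalization collapses to zero; that is one plausible way to end up with the result described above.]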
sumedhghaisas has quit [Ping timeout: 248 seconds]
partobs-mdp has joined #mlpack
sumedhghaisas has joined #mlpack
sumedhghaisas has quit [Ping timeout: 248 seconds]
< partobs-mdp> zoq: Finally managed to clean up; now running MiniBatchSGD + reset code. However, what value should I set for epochs in MiniBatchSGD? It seems to run a lot of passes in one "iteration"
kris1 has joined #mlpack
sumedhghaisas has joined #mlpack
kris1 has quit [Quit: kris1]
< partobs-mdp> The objective also steadily decreases, though
kris1 has joined #mlpack
kris1 has quit [Quit: kris1]
kris1 has joined #mlpack
kris1 has quit [Client Quit]
kris1 has joined #mlpack
kris1 has quit [Client Quit]
kris1 has joined #mlpack
mikeling has quit [Quit: Connection closed for inactivity]
sumedhghaisas has quit [Ping timeout: 248 seconds]
vivekp has quit [Ping timeout: 240 seconds]
mikeling has joined #mlpack
shikhar has joined #mlpack
shikhar has quit [Read error: Connection reset by peer]
shikhar has joined #mlpack
shikhar has quit [Read error: Connection reset by peer]
shikhar has joined #mlpack
shikhar has quit [Read error: Connection reset by peer]
shikhar has joined #mlpack
MikeLDN has joined #mlpack
shikhar has quit [Read error: Connection reset by peer]
shikhar has joined #mlpack
kris1 has quit [Quit: kris1]
< MikeLDN> Hi all
< MikeLDN> I'm having a problem with the cmd line CF and the query file option (testing on the MovieLens set, probably some mess with indexing). Can anyone give me a link to another free dataset?
< zoq> partobs-mdp: I used the same settings as you used for Adam, e.g. opt.MaxIterations() = samples; and a batch size of 10.
< zoq> MikeLDN: You could test it on the MovieLens datasets as proposed in the tutorial: http://www.mlpack.org/docs/mlpack-2.2.3/doxygen/cftutorial.html
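[For reference, a rough sketch of the settings zoq describes, assuming the mlpack-2.x-era optimizer interface; model, trainData, and trainLabels are hypothetical names, and the exact construction shown is an assumption. Only the two quoted settings come from the discussion itself.

  #include <mlpack/core.hpp>
  #include <mlpack/core/optimizers/minibatch_sgd/minibatch_sgd.hpp>

  // Given an already-built network `model` and training matrices
  // `trainData` / `trainLabels` (hypothetical), the quoted settings are:
  const size_t samples = trainData.n_cols;                          // number of training points
  mlpack::optimization::MiniBatchSGD<decltype(model)> opt(model);   // construction is an assumption
  opt.BatchSize() = 10;                                             // the batch size zoq mentions
  opt.MaxIterations() = samples;                                    // "opt.MaxIterations() = samples"
  model.Train(trainData, trainLabels, opt);
]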
shikhar has quit [Read error: Connection reset by peer]
< MikeLDN> You mean "ml-latest-small.zip" from https://grouplens.org/datasets/movielens/ ?
< zoq> MikeLDN: right
< MikeLDN> Thnx. I have tested with ratings.csv (first removed the heading row and the timestamp column) but I don't get the same result as on the CF tutorial page. Both the user and item columns start at index 1 (instead of 0). Could that be the problem, should I re-index them?
shikhar has joined #mlpack
< zoq> MikeLDN: Since the dataset has been updated over time, you won't see exactly the same results. Perhaps rcurtin, who is also in the channel, still has the old dataset; he will respond once he has a chance.
< MikeLDN> That would be perfect, thanks.
shikhar has quit [Ping timeout: 240 seconds]
shikhar has joined #mlpack
vivekp has joined #mlpack
< partobs-mdp> zoq: Finally managed to properly run the experiment. After *nine* iterations I got 97.4% on the test set.
< partobs-mdp> bin/mlpack_lstm_baseline -t copy -e 800 -b 2 -l 10 -r 3 -s 1000 -v
< zoq> partobs-mdp: Sounds good, what batch size did you use?
< partobs-mdp> I think we can simply test AddTask after that and *finally* start working on HAM (but we have some kind of a head start: the memory data structure)
< partobs-mdp> zoq: 10, as you proposed
< partobs-mdp> 99.7% after 10 iterations
< zoq> partobs-mdp: I agree, I think it's a good idea to split the copy task from the other tasks and open a single PR for the add task. Once we've figured out the issues with the other tasks we can merge them. What do you think?
< partobs-mdp> zoq: Sounds good, but isn't it over-engineering? We're (imho) more or less done with the task API and representation, so we can simply push the changes to the running PR. (Not sure if I'm right about that, though)
< zoq> We can leave the PR open for now, but I don't like the idea of merging something without providing a model that is able to learn the task stably. Since we know we can learn the copy task, I thought we could merge the code now.
< partobs-mdp> zoq: Can't find the gist with the results, but did we have 5% precision on 4-bit addition task?
< partobs-mdp> *have we ever had
< zoq> almost
< partobs-mdp> well, it was the result from the first iteration
< partobs-mdp> on iter5 it has 5.5%
< partobs-mdp> zoq: I think we can call the gradient explosion problem solved. However, AddTask learns really slowly even now
< zoq> Have you tested the Add task with mini-batch SGD? I couldn't see any progress after about 1000 iterations.
< partobs-mdp> I'm currently running it on minibatch SGD
< zoq> If you are able to train the Task, I don't see a reason to split the Add task from the tasks for now.
shikhar has quit [Ping timeout: 260 seconds]
shikhar has joined #mlpack
< partobs-mdp> zoq: Well, although I did manage to get 51.5% on test for adding 1-bit numbers (our best, as far as I remember), the learning always halts after several iterations in AddTask (the objective keeps going down in small steps, but the validation score doesn't really improve)
< zoq> partobs-mdp: For me the question remains what the issue is: parameter initialization, or some issue with the task itself? If we are able to learn the task with HAM or another model, I don't mind merging it.
shikhar has quit [Read error: Connection reset by peer]
shikhar has joined #mlpack
shikhar_ has joined #mlpack
shikhar has quit [Ping timeout: 268 seconds]
shikhar_ has quit [Quit: WeeChat 1.7]
< rcurtin> MikeLDN: I think I used the standard movielens-100k dataset for that, no special modifications
shikhar has joined #mlpack
< rcurtin> I can't seem to find the exact old dataset that I used though
< rcurtin> I am also finding that with the current master branch, I am not seeing recommendations that look correct... a huge number of the top recommendations are item 1, which doesn't seem right
< rcurtin> I'm digging deeper now, let's see what I find...
< MikeLDN> rcurtin: Thnx. I was able to generate the -A output but could not check it. However, with the -q option I wasn't able to get the result I would expect (for instance, a single user's result is not the same as the one in the -A set). The user/item indices were left starting from 1 (not 0). Hope this helps...
< rcurtin> I see that the bug was introduced between mlpack-2.0.3 and mlpack-2.1.0
< rcurtin> if you want to try with mlpack-2.0.3, I think you will get the result you expect, and in the meantime I see the code that was changed and am starting to work with it to resolve the issue
< rcurtin> (I guess this means that mlpack-2.2.4 gets released today as a bugfix...)
< rcurtin> also, keep in mind that if you run mlpack_cf twice, once with '-A' and once with '-q', you'll get different results because the starting point of the optimization is random
< rcurtin> so to get the same results, you'd need to also specify --seed <some value>
< rcurtin> with the same seed value for both runs
< rcurtin> or, alternately, use --output_model_file to save the model for the first run, then use --input_model_file for a subsequent run
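[For reference, a sketch of the two workflows rcurtin describes. The --training_file/--output_file flags and the file names are assumptions for illustration; only --seed, -A, -q, --output_model_file, and --input_model_file come from the discussion itself.

  # same seed for both runs, so both start from the same random initialization
  mlpack_cf --training_file ratings.csv -A --output_file all_recs.csv --seed 42
  mlpack_cf --training_file ratings.csv -q query.csv --output_file query_recs.csv --seed 42

  # alternatively: train once, save the model, and reuse it for the query run
  mlpack_cf --training_file ratings.csv -A --output_file all_recs.csv --output_model_file cf_model.xml
  mlpack_cf --input_model_file cf_model.xml -q query.csv --output_file query_recs.csv
]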
< MikeLDN> All clear. Thank you for the fast response...
< rcurtin> sure, let me know if you still have issues with mlpack-2.0.3, and I'll keep you updated with what I find as I dig into this bug
MikeLDN has quit [Ping timeout: 260 seconds]
< rcurtin> wow, I found it... it is a single-character bug
partobs-mdp has quit [Remote host closed the connection]
MikeLDN has joined #mlpack
< MikeLDN> Nice. I will wait for the release then, no need to compile 2.0.3 (for Win)
kris1 has joined #mlpack
kris1 has quit [Quit: kris1]
kris1 has joined #mlpack
< rcurtin> MikeLDN: sure, I should have it done by the end of the day if something else doesn't distract me
shikhar_ has joined #mlpack
shikhar has quit [Ping timeout: 246 seconds]
sumedhghaisas has joined #mlpack
< MikeLDN> rcurtin: a note: I took the fresh code from master, compiled it, and can confirm it is now working as expected (-q option). cheers
< rcurtin> great! :)
< sumedhghaisas> zoq: Hey Marcus...
< rcurtin> sumedhghaisas: I am awake now too, finally we are here at the same time :)
< rcurtin> sorry about last night, I had just gone to bed
< sumedhghaisas> rcurtin: Hey Ryan. Okay, I also had something to talk to you about: your idea to remove the constant zero vector from GRU and LSTM
< sumedhghaisas> In most of the experiments now they use a bias initialization which is trained.
< rcurtin> can you link me to the code and discussion so I can refresh my memory?
< sumedhghaisas> So I was talking about the zero vector which is used as cell initialization
< rcurtin> sure, it's PR #1018 but which subdiscussion is it? I am not seeing a relevant one
< rcurtin> I owe you a response on this PR too, I'll handle that now, but we should talk about what you wanted to talk about here :)
< sumedhghaisas> ahh yes... I see that it shows outdated now.
< rcurtin> that's ok, if that is the right discussion then I am on the same page :)
< sumedhghaisas> it's the 3 points of change that you suggested in gru_impl.hpp
< rcurtin> right, I think that's what I just linked to
< sumedhghaisas> ahh yeah. So now, while I am implementing NTM, I have realized that most of the architectures do not use a zero vector there.
< sumedhghaisas> because always using a zero vector is very restrictive. They use a bias layer instead, which can be trained.
< sumedhghaisas> thus the cell will figure out which vector to initialize itself with.
< sumedhghaisas> for the optimum performance
< rcurtin> right, so your idea then is to replace the zeros with something trainable
< sumedhghaisas> yes. But I am not sure how I would design that.
< rcurtin> do the NTM papers or anything discuss this bias layer?
< sumedhghaisas> yeah. let me send you the link...
< sumedhghaisas> Page 10
< sumedhghaisas> section Experiments
< sumedhghaisas> second para
< rcurtin> right, so they say that the bias vector is learned there
< rcurtin> at least for the LSTM networks
< rcurtin> but for NTM they just give "bias values"
< rcurtin> hehe, and the word 'bias' only appears three times in the paper
< sumedhghaisas> yeah... seems like NTM is biased towards something.
< sumedhghaisas> :)
< rcurtin> according to the paper, it is 'biased towards storing data without interference' :) (that's one of the three occurrences)
< sumedhghaisas> I think what they mean is again the same learned bias layer... for NTM...
< rcurtin> yeah, I think that is all we can assume
< sumedhghaisas> storing data without inference? What does that even mean? anyways...
< sumedhghaisas> I was thinking about accepting a layer in the LSTM constructor which will be used as the initialization layer...
< sumedhghaisas> what do you think?
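[A purely hypothetical sketch of the idea above, i.e. an LSTM that accepts an extra layer in its constructor and uses that layer's trainable output in place of the constant zero initialization; nothing below exists in mlpack, and every name is illustrative only.

  // Hypothetical skeleton: the init layer's (trained) output would replace
  // arma::zeros(outSize, 1) as the cell/hidden state at the start of a sequence.
  template<typename InitLayerType>
  class LSTMWithLearnedInit
  {
   public:
    LSTMWithLearnedInit(const size_t inSize,
                        const size_t outSize,
                        InitLayerType initLayer) :
        inSize(inSize), outSize(outSize), initLayer(std::move(initLayer))
    { }

   private:
    size_t inSize, outSize;
    InitLayerType initLayer;  // trained along with the rest of the network
  };
]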
< rcurtin> I'd be interested in zoq's thoughts on that one, I think it could be fine, but we would have to modify the LSTM (and GRU) so it properly trains that initialization layer
< rcurtin> personally, I think that going with zeros is okay for now---in the end, the priority is getting your overall project done
< sumedhghaisas> the paper is super vague though. The funniest thing is... in the whole paper they have not mentioned exactly where the external output is. The image shows that it is coming from the controller network. But is it the same as the input to the memory heads, or is it different?
< rcurtin> so I would say, maybe it is better to open an issue detailing the problem and how it can be fixed, but focus your efforts on the NTM implementation itself
< rcurtin> and only revisit the issue this summer if the performance you are getting is far lower than what is expected by the paper
shikhar_ has quit [Quit: WeeChat 1.7]
< sumedhghaisas> ahh yes. That's what I am doing right now. Even in NTM I am assuming zero init.
< rcurtin> right; so, do you think you want to open an issue detailing the problem, detailing possible approaches, and then we can continue the discussion of what to do there?
< rcurtin> that will be a little more long-lasting of a discussion than IRC I think :)
< rcurtin> personally I suspect the zero init will have only a minor effect on the performance
< rcurtin> (I could be wrong though---but we'll find out later)
< rcurtin> as for the comments I left in the PR, I'd just do whatever you think the easiest of the three options is
< rcurtin> it would be nice to get a little speedup over the current implementation, but if you are going to open an issue where we'll re-engineer it later anyway, there is no need to put hours upon hours into perfectly optimizing it
partobs-mdp has joined #mlpack
< sumedhghaisas> rcurtin: Ahh yes. I have already pushed the solution using idea 2. Idea 2 involves a lot of refactoring, so if it is going to get replaced I don't see a point
< sumedhghaisas> *sorry, I meant idea 3 involved a lot of refactoring
< rcurtin> yeah, idea 3 would have been a lot of work
< rcurtin> idea 2 is just fine, so I guess with that, then if there are no more comments the GRU implementation is ready?
< rcurtin> I guess, how is the NTM work coming otherwise?
< rcurtin> I guess for the batchnorm layer, we already have praveen's PR #955, but do you need to modify that to be implemented the way that you need? I think it would be ok to use his PR as a base to implement some further changes
< ironstark> rcurtin: zoq: Is Neural Network Toolbox not installed in slake? When I try to run Perceptron it shows this error
< ironstark> To use 'perceptron', the following product must be both licensed and installed:
< ironstark> Neural Network Toolbox
< rcurtin> ironstark: I think I installed all toolboxes I have a license to install
sumedhghaisas has quit [Ping timeout: 248 seconds]
vivekp has quit [Ping timeout: 240 seconds]
< ironstark> Actually the current perceptron script is not working
< ironstark> It is because of a different reason
< ironstark> thought that was the reason
< ironstark> but the toolbox is installed.
< ironstark> There seems to be some error with the MATLAB installation
< ironstark> when I run the scripts on this page - while running the line y = net(x);
< ironstark> I get: Subscript indices must either be real positive integers or logicals.
< ironstark> I get the same thing when I am running predictions = net(testSet) in the current perceptron script
< zoq> ironstark: It is a binary classifier, so you can't train it on e.g. the iris dataset.
< zoq> The webpage or arcene dataset should work.
< zoq> For the arcene dataset you have to normalize the labels.
< ironstark> oh, ok got it. Thanks :)
< zoq> If that doesn't solve the problem, let us know and we take a closer look into the issue.
< kris1> Hi Mikhail, I updated the GAN implementation
< kris1> The only thing I am not sure about now is the Evaluate function.
< kris1> Can you have a look at that....
< kris1> Basically, if I want to use the Evaluate function of the generator or the discriminator network, I would have to provide them with predictors and responses. If I implement the Evaluate function I don't have to do that
< kris1> But I am not sure what the Evaluate function for the GAN should be
mentekid has quit [Quit: Leaving.]
kris1 has quit [Quit: kris1]
mentekid has joined #mlpack
kris1 has joined #mlpack
partobs-mdp has quit [Remote host closed the connection]
< kris1> zoq: there?
< lozhnikov> kris1: I think the Evaluate function should look like log(1 - D(G(z))), i.e. the function that we are going to minimize
< kris1> For the whole GAN, i.e. for both the generator and the discriminator?
< lozhnikov> yes, for the GAN class
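[For reference, the objective being referred to is the standard GAN minimax game (Goodfellow et al., 2014):
    \min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))],
so an Evaluate function built around log(1 - D(G(z))) is the generator's part of that objective, i.e. the quantity being minimized, as lozhnikov suggests.]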
< kris1> Okay, I will implement that. Also, could you look at the above gist? I have changed the training function as you had suggested
< kris1> With ssRBM I am running into a chol error, meaning that the matrix is no longer PSD
< kris1> when I am running on the MNIST dataset
< lozhnikov> Could you open a PR for GANs? It is more convenient.
< lozhnikov> What do you mean by "the matrix is no longer PSD"?
< kris1> Well, I mean to say that the lambda matrix now has negative elements on the diagonal
< kris1> when I am training it
< lozhnikov> maybe the sign is invalid somewhere
< kris1> Okay, I will check for that
< lozhnikov> I'll check the formulas in the paper tomorrow, it's too late for that now.
< lozhnikov> When I was reading the paper for the first time, I tried to derive a couple of formulas and got slightly different formulae, up to sign. But I am not sure that I wasn't mistaken; I didn't dig into that.
< kris1> Do you mean checking the formulas against the formulas in the PR? Yes, that would be great.
< lozhnikov> I mean I'll check the formulas in the paper and then I'll check the formulas in the PR
< kris1> Sure
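[For context, a standard linear-algebra fact rather than anything from the log: the Cholesky factorization Lambda = L L^T exists only for symmetric positive-definite matrices, and every positive-definite matrix has strictly positive diagonal entries. So a negative value on the diagonal of the lambda matrix is already enough to make chol() fail, which is consistent with a sign error somewhere in the update formulas.]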
< rcurtin> ironstark: zoq: I was using the benchmarking system today, but I found that the check we have been applying does not work: 'if len(result) > 1'
< rcurtin> this is because 'result' is of type int, and if you do len(int), you get
< rcurtin> TypeError: object of type 'int' has no len()
< zoq> rcurtin: Good point, easy fix would be to do 'if isinstance(..., list) and len(..):'
< zoq> sumedhghaisas: rcurtin: You can even get rid of the zero output entirely, since the error is zero if the output is zero.
< zoq> Also, I agree, the effect of the trainable bias layer should be minimal, and it's only used for the LSTM model anyway; since we are basically interested in the NTM layer we should concentrate on that point.
MikeLDN has quit [Ping timeout: 260 seconds]
< kris1> My program gets killed automatically for bigger-sized inputs.
< kris1> Looking at lldb, the problem seems to be with the allocation of memory.
< kris1> arrayops::copy( memptr(), in_mat.mem, in_mat.n_elem );
< kris1> The program works for small values of inSize and outSize.