verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
yaswagner has quit [Quit: Page closed]
< jenkins-mlpack> Project docker mlpack weekly build build #48: STILL UNSTABLE in 3 hr 46 min: http://masterblaster.mlpack.org/job/docker%20mlpack%20weekly%20build/48/
< jenkins-mlpack> * akhandait45: removed redundant data members, moved NegativeLogLikelihood to loss
< jenkins-mlpack> * akhandait45: moved NegativeLogLikelihood to loss folder
< jenkins-mlpack> * akhandait45: rectified mistake in last commit, added comment
< jenkins-mlpack> * akhandait45: changed path
< jenkins-mlpack> * akhandait45: added sampling layer
< jenkins-mlpack> * akhandait45: removed parameters, it just does the reparametrization now
< jenkins-mlpack> * akhandait45: sampling layer done, kl divergence forward implemented
< jenkins-mlpack> * akhandait45: kl backward implemented
< jenkins-mlpack> * akhandait45: fix style errors
< jenkins-mlpack> * akhandait45: suggested changes made
< jenkins-mlpack> * akhandait45: seed removed
< jenkins-mlpack> * akhandait45: changed names in cmakelists
< jenkins-mlpack> * akhandait45: removed build errors
< jenkins-mlpack> * akhandait45: corrected kl forward and backward
< jenkins-mlpack> * akhandait45: added numerical gradient test
< jenkins-mlpack> * akhandait45: gradient check passed, removed redundant lines
< jenkins-mlpack> * akhandait45: corrections made in kl, more tests added
vivekp has quit [Ping timeout: 248 seconds]
vivekp has joined #mlpack
vivekp has quit [Ping timeout: 260 seconds]
vivekp has joined #mlpack
< Atharva> zoq: Are you there?
< zoq> Atharva: I'm now.
< Atharva> In network_init.hpp, what is offset variable for?
< zoq> Atharva: So all network parameters/weights are stored in a single matrix (contiguous memory). The idea is that each layer uses a specific part of the parameter matrix, and offset marks the beginning of each layer's part.
< zoq> Atharva: So for the first layer the offset is 0; let's say that layer is of size 10, so the offset for the second layer is 10. If the second layer is of size 20, the offset for the next layer would be 10 + 20. Hopefully this was helpful?
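A minimal C++/Armadillo sketch of the idea zoq describes above; the layer sizes and variable names are illustrative and not taken from network_init.hpp:

    #include <armadillo>

    int main()
    {
      // Suppose the network's layers hold 10, 20, and 15 parameters each.
      arma::uvec layerSizes = {10, 20, 15};

      // All weights live in one contiguous parameter matrix (here a column vector).
      arma::mat parameters(arma::accu(layerSizes), 1, arma::fill::randu);

      // Each layer works on the slice [offset, offset + size) of that matrix,
      // without copying the memory.
      size_t offset = 0;
      for (size_t i = 0; i < layerSizes.n_elem; ++i)
      {
        arma::mat layerWeights(parameters.memptr() + offset, layerSizes[i], 1,
            false /* no copy */, true /* strict */);
        offset += layerSizes[i];  // offsets seen by the layers: 0, 10, 30.
      }
    }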
< Atharva> zoq: Understood, thanks!
< zoq> Atharva: Here to help.
< jenkins-mlpack> Project docker mlpack nightly build build #364: SUCCESS in 4 hr 9 min: http://masterblaster.mlpack.org/job/docker%20mlpack%20nightly%20build/364/
< Atharva> zoq: Sorry to disturb again, you there?
< Atharva> For the ann input/output PR, @rcurtin commented asking whether it was possible to remove the inSize parameter from the first layer as well, and not just from the subsequent layers
< Atharva> I have found a way but it involves adding one argument to the ResetParameters() function.
< Atharva> which will be input.n_rows of course
< Atharva> Or there is another way, but in that case we will need to stop calling the ResetParameters() function ourselves and only allow the Forward() and Train() functions to call it.
< zoq> Does this mean that each layer would have to make sure that ResetParameters() is called at least once? So if we just use the layer independently, we have to make sure the function is called in the forward pass, right?
< Atharva> Yes, ResetParameters() has to be called at least once, because that's where we go over the network and set the inSizes and weights for the layers. Until then, inSize for all layers is zero.
< Atharva> But in some cases, ResetParameters() has been used externally before the Forward() or Train() functions. In that case the network has no way of knowing what the size of the input data is.
< sumedhghaisas> Atharva: I was just taking a look at the jacobian test for NormalDistribution
< sumedhghaisas> I am not sure I understand your jacobian test
< zoq> ResetParameters: I see, so do you think we could provide both versions: one version expects that inSize is already set, and the other takes the inSize from the provided dataset?
< sumedhghaisas> In the jacobian test you perturb the input and check the approximate gradient against the real one. But I see that you are perturbing the mean and variance instead
< Atharva> I was confused a lot over that too, but I think the mean and variance in this case can be considered the input, and the observation is just the target.
< Atharva> In the LogProbBackward function, the gradients are w.r.t. the mean and std
< sumedhghaisas> the check is delta_output / delta_input, and the weights, in this case the mean and variance, are kept constant
< sumedhghaisas> ahh... I see what you mean
< Atharva> Also, in the ReconstructionLoss, we receive the mean and std as input which we then forward to the NormalDistribution
< Atharva> zoq: Yes, we can keep both ways, but I think we will need multiple definitions of ResetParameters() then
< Atharva> and if someone calls ResetParameters() without any parameter and doesn't provide the input size for the first layer, it will throw an error.
< zoq> Atharva: Agreed, but at least for now I think that way we can provide backward compatibility.
< sumedhghaisas> Atharva: ahh I understand the test now. Can you tell me how big the difference between the two is?
< Atharva> zoq: Sorry I don't understand what you mean by backward compatibility
< zoq> Atharva: We could check the inSize parameter and provide some reasonable output for the user.
< zoq> Atharva: That is, existing code can still be used without any changes.
< Atharva> sumedhghaisas: We get 5000 something when we should get 1e-5
< sumedhghaisas> 5000?
< sumedhghaisas> then something is terribly wrong
< Atharva> zoq: Yes, that can be done, but another ResetParameters() definition is needed which will be used internally. People can still use the ann module the way it is now.
< Atharva> sumedhghaisas: Yes. Are we messing up something big, like conceptually?
< sumedhghaisas> hmm. that's scary.
< sumedhghaisas> I haven't taken a look at the derivatives yet, got a meeting for an hour, I will come back from it and take a look at the gradients :)
< zoq> Atharva: If a user can use it the way they use it now, I don't see any reason against the idea.
< Atharva> zoq: Okay, I will push a commit then
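A rough sketch of the overload idea discussed above, written as a hypothetical miniature class; none of these names or signatures are the real mlpack FFN API, they only illustrate the two ways of providing the first layer's input size:

    #include <cstddef>
    #include <stdexcept>

    // Hypothetical miniature of the proposal; not the real mlpack FFN class.
    class TinyNetwork
    {
     public:
      // Existing-style version: assumes the first layer's input size was
      // already given (e.g. through the layer's constructor).
      void ResetParameters()
      {
        if (firstLayerInSize == 0)
          throw std::invalid_argument("input size of the first layer not set");
        // ... allocate and initialize the single parameter matrix here ...
      }

      // Proposed overload: take the input dimensionality from the data
      // (input.n_rows), so the user never has to specify inSize explicitly.
      void ResetParameters(const std::size_t inputSize)
      {
        firstLayerInSize = inputSize;
        ResetParameters();
      }

     private:
      std::size_t firstLayerInSize = 0;
    };

    int main()
    {
      TinyNetwork net;
      net.ResetParameters(784);  // e.g. a 784-dimensional input; illustrative only.
    }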
< Atharva> sumedhghaisas: Sure!
< ShikharJ> zoq: Are you there?
< zoq> Atharva: I'm here now.
< zoq> ShikharJ: I'm here now.
< zoq> ... wrong name :)
< ShikharJ> zoq: I had a doubt regarding the WGAN algorithm. In the usual GAN, we label the real images as 1 and the fake ones as 0, so that we can obtain log(D(x)) + log(1 - D(G(x))) using the CrossEntropy loss function.
< ShikharJ> zoq: But according to the WGAN algorithm, we need to obtain a simple D(x) - D(G(x)), so I'm not sure whether any labelling should be done at all.
< ShikharJ> zoq: If any labelling is to be done, I'm not sure whether the real and the fake ones should have the same label or different labels.
< zoq> ShikharJ: In the case of Wasserstein, it would make sense to use -1 for the generated samples and 1 for the real ones, since the output doesn't use an activation on top. But I don't think we will see any difference if we use something else.
< ShikharJ> zoq: Ah, I see. Anyway, we're trying to maximize the distance between D(x) and D(G(x)), so it makes sense.
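A small sketch of the labelling zoq suggests, assuming the critic outputs one raw score per sample; the scores below are made up and only illustrate the sign convention:

    #include <armadillo>
    #include <iostream>

    int main()
    {
      // Raw critic scores for one mini-batch: D(x) on real samples and
      // D(G(z)) on generated samples; no activation on top, as noted above.
      arma::rowvec realScores = { 0.7, 1.2,  0.3};
      arma::rowvec fakeScores = {-0.4, 0.1, -0.9};

      // Labels: +1 for real samples, -1 for generated samples.
      arma::rowvec realLabels = arma::ones<arma::rowvec>(realScores.n_elem);
      arma::rowvec fakeLabels = -arma::ones<arma::rowvec>(fakeScores.n_elem);

      arma::rowvec labels = arma::join_rows(realLabels, fakeLabels);
      arma::rowvec scores = arma::join_rows(realScores, fakeScores);

      // The critic maximizes mean(D(x)) - mean(D(G(z))); with the labels above
      // that is (up to a constant factor) the same as minimizing
      // -mean(label % score) over the combined batch.
      const double criticLoss = -arma::mean(labels % scores);
      std::cout << "critic loss: " << criticLoss << std::endl;
    }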
vivekp has quit [Read error: Connection reset by peer]
prakhar_code[m] has quit [Ping timeout: 255 seconds]
killer_bee[m] has quit [Ping timeout: 247 seconds]
vivekp has joined #mlpack
manish7294 has joined #mlpack
< manish7294> rcurtin: zoq: Can I take 4 days off starting tomorrow? I have to attend my cousin's wedding ceremony :)
< manish7294> I think I can be available on IRC for discussions during the first two days at least.
< Atharva> sumedhghaisas: Did you get a chance to look at the code?
prakhar_code[m] has joined #mlpack
< sumedhghaisas> Atharva: I am checking right now :)
< sumedhghaisas> Atharva: Just a quick question, are you storing the variance or the standard deviation? Because the Softplus must be applied to the variance
< Atharva> I am storing the standard deviation, why is that?
< sumedhghaisas> ahh sorry, I meant standard deviation or log standard deviation. But I see that you are using the standard deviation later on as well. Never mind :)
< Atharva> Okayy :)
< sumedhghaisas> Atharva: could you run the test without Softplus?
< sumedhghaisas> just want to know where the exact error is
< Atharva> Okay, give me a minute
< sumedhghaisas> and also we should have with and without Softplus :)
< Atharva> Do you mean the tests with and without Softplus?
killer_bee[m] has joined #mlpack
< sumedhghaisas> yup
< Atharva> Okay, I will add another
< sumedhghaisas> okay the math seems correct on the first glance
< sumedhghaisas> and the test also looks correct
< sumedhghaisas> hmmm
< sumedhghaisas> interesting
manish7294 has quit [Ping timeout: 260 seconds]
< Atharva> Sorry I was on another branch, so it's taking time to build the tests
< sumedhghaisas> no problem
< sumedhghaisas> okay couple of other pointers
< sumedhghaisas> first is to see if without softplus passes
< sumedhghaisas> could you also run the test with 1 target element and try to analyze the results?
< Atharva> Okay, will do that
< sumedhghaisas> there should be just one entry in the matrix so easy to follow
< Atharva> It just passed without softplus
< sumedhghaisas> Atharva: aha...
< sumedhghaisas> okay, need to run to another meeting. But now that we know the problem lies in the softplus, it's easy to find.
< sumedhghaisas> I will take a look later again if you haven't solved it by then.
< Atharva> Hopefully I will have :)
< ShikharJ> zoq: Do you think I could implement an identity loss layer for which the forward routine just returns -arma::accu(target % (input + eps))?
< ShikharJ> zoq: Saying this because the FFN by default sets the loss layer to negative log-likelihood, and the WGAN paper is against using any kind of log or exponent based loss function.
< ShikharJ> zoq: Or if it's fine, I can try to implement the same in the Evaluate function itself, no worries there.
< zoq> ShikharJ: Implementing an identity loss layer might be the cleanest solution; or do you prefer to implement it inside the evaluation function?
< ShikharJ> zoq: I can implement it inside the loss_functions directory. Similar for the loss routine for WGANGradientPenalty.
< zoq> ShikharJ: Yeah, I like the idea.
< ShikharJ> zoq: Ideally we're computing an approximation of the Kantorovich-Rubinstein duality form of the Wasserstein-1 (or Earth Mover) distance. Do you want it to be named as such?
< zoq> ShikharJ: Don't have a preference here, each one sounds reasonable to me.
< ShikharJ> zoq: Then I'll name it EarthMover distance loss or something similar, so that a user has an idea of where the code is intended to be used.
< zoq> ShikharJ: Sounds perfect to me; I don't think there is any confusion with another part of the codebase.
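A rough sketch of the Forward/Backward computation ShikharJ describes, using the -arma::accu(target % (input + eps)) formula from above; the class name and interface are illustrative, not the final mlpack layer:

    #include <armadillo>
    #include <iostream>

    // Illustrative stand-in for the identity / EarthMover loss discussed above;
    // not the final mlpack implementation.
    class EarthMoverDistanceSketch
    {
     public:
      // input: raw critic scores; target: +1 / -1 labels (see discussion above).
      double Forward(const arma::mat& input, const arma::mat& target) const
      {
        const double eps = 1e-10;  // small constant, as in the formula above
        return -arma::accu(target % (input + eps));
      }

      // Since the loss is linear in the input, d(loss)/d(input) = -target.
      void Backward(const arma::mat& /* input */, const arma::mat& target,
                    arma::mat& output) const
      {
        output = -target;
      }
    };

    int main()
    {
      arma::mat scores = {{0.7, -0.4, 1.2}};
      arma::mat labels = {{1.0, -1.0, 1.0}};

      EarthMoverDistanceSketch loss;
      std::cout << "loss: " << loss.Forward(scores, labels) << std::endl;
    }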
< zoq> manish7294: Thanks for the heads up, as from my side this is absolutely fine, have fun :)
vivekp has quit [Ping timeout: 256 seconds]
< Atharva> sumedhghaisas: It was due to the fact that the approximate jacobian was calculated w.r.t. the standard deviation, while LogProbBackward was w.r.t. the pre-softplus standard deviation
< Atharva> I tried perturbing the pre-softplus standard deviation and the test passed
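A scalar sketch of the check being described, assuming a single Gaussian log-probability; the point is only that the numerical perturbation is applied to the pre-softplus standard deviation, the same variable the analytic backward pass differentiates with respect to. Names and values are illustrative, not the actual mlpack test:

    #include <cmath>
    #include <iostream>

    // log N(x | mean, stdDev), where stdDev = softplus(preStdDev) = log(1 + exp(preStdDev)).
    double LogProb(const double x, const double mean, const double preStdDev)
    {
      const double pi = 3.14159265358979323846;
      const double stdDev = std::log(1.0 + std::exp(preStdDev));  // softplus
      return -0.5 * std::log(2.0 * pi) - std::log(stdDev)
          - 0.5 * std::pow((x - mean) / stdDev, 2);
    }

    int main()
    {
      const double x = 0.3, mean = 0.1, preStdDev = 0.5, eps = 1e-6;

      // Central-difference derivative w.r.t. the *pre-softplus* standard
      // deviation; this is the quantity to compare against the analytic
      // backward pass, not the derivative w.r.t. stdDev itself.
      const double numeric =
          (LogProb(x, mean, preStdDev + eps) - LogProb(x, mean, preStdDev - eps))
          / (2.0 * eps);

      std::cout << "d logprob / d preStdDev ~= " << numeric << std::endl;
    }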
< ShikharJ> zoq: rcurtin : Are you guys online?
< zoq> ShikharJ: About to step out, but I'm still here.
< ShikharJ> zoq: I was just wondering what would be the correct way of finding the column-wise L2 norm of a matrix using Armadillo's functions?
< ShikharJ> zoq: Considering an M x N matrix, we need to find the individual norms of the N columns?
< zoq> ShikharJ: Actually, I'm not sure there is an Armadillo function that returns the norm for each column, so probably a loop is necessary. What you could do is search the codebase for "L2 distance" or "euclidean norm".
< ShikharJ> zoq: There isn't, but the issue is that the WGAN Gradient Penalty algorithm calculates the norm for each individual input in the mini-batch, so I guess I'll go with the loop :)
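A quick sketch of the per-column norm computation discussed above; the matrix size is arbitrary. The plain loop is one option, and an element-wise square/sum/sqrt expression gives the same result without an explicit loop:

    #include <armadillo>
    #include <iostream>

    int main()
    {
      // M x N matrix; one L2 norm per column (i.e. per sample in the mini-batch).
      arma::mat x(5, 4, arma::fill::randu);

      // Option 1: plain loop over the columns.
      arma::rowvec norms(x.n_cols);
      for (size_t i = 0; i < x.n_cols; ++i)
        norms(i) = arma::norm(x.col(i), 2);

      // Option 2: element-wise square, sum each column (dim 0), then sqrt.
      arma::rowvec norms2 = arma::sqrt(arma::sum(arma::square(x), 0));

      std::cout << norms << norms2;
    }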
< ShikharJ> zoq: The WGAN PR is now complete, I'll push the code in sometime, and tmux the builds tomorrow to see how we fare!
< rcurtin> manish7294: of course, enjoy the wedding ceremony!
< rcurtin> I will have some more theory for you by the time you get back
< rcurtin> things have been slow for me here, a lot to do before leaving the job next week
< ShikharJ> rcurtin: Till when can we expect the benchmark systems to remain functional?
< rcurtin> ShikharJ: at least Friday; I will bring up replacements, but they will not be as powerful
< ShikharJ> rcurtin: It's all good, at least we should be finished with our planned goals and experiments by then :)
< rcurtin> right, well in any case I will be sure that we have something available to run experiments on
< ShikharJ> rcurtin: If I may ask, what was your reason for leaving Symantec?
< rcurtin> lack of work aligned with my research interests
< rcurtin> malware is an interesting problem, but I'm really interested in accelerating algorithms
< rcurtin> and there was not so much space for that inside Symantec
< rcurtin> it wasn't a huge company need
< ShikharJ> Ah, I can understand, accelerating stuff is a highly rewarding feeling :)
< ShikharJ> I also hate being assigned work I have no interest in :P Hope you have a good time in your new position.
< rcurtin> I hope so too, thanks :)