verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
yaswagner has quit [Quit: Page closed]
< jenkins-mlpack> Project docker mlpack weekly build build #48: STILL UNSTABLE in 3 hr 46 min: http://masterblaster.mlpack.org/job/docker%20mlpack%20weekly%20build/48/
< jenkins-mlpack> * akhandait45: removed redundant data members, moved NegativeLogLikelihood to loss
< jenkins-mlpack> * akhandait45: moved NegativeLogLikelihood to loss folder
< jenkins-mlpack> * akhandait45: rectified mistake in last commit, added comment
< jenkins-mlpack> * akhandait45: changed path
< jenkins-mlpack> * akhandait45: added sampling layer
< jenkins-mlpack> * akhandait45: removed parameters, it just does the reparametrization now
< jenkins-mlpack> * akhandait45: sampling layer done, kl divergence forward implemented
< jenkins-mlpack> * akhandait45: kl backward implemented
< jenkins-mlpack> * akhandait45: fix style errors
< jenkins-mlpack> * akhandait45: suggested changes made
< jenkins-mlpack> * akhandait45: seed removed
< jenkins-mlpack> * akhandait45: changed names in cmakelists
< jenkins-mlpack> * akhandait45: removed build errors
< jenkins-mlpack> * akhandait45: corrected kl forward and backward
< jenkins-mlpack> * akhandait45: added numerical gradient test
< jenkins-mlpack> * akhandait45: gradient check passed, removed redundant lines
< jenkins-mlpack> * akhandait45: corrections made in kl, more tests added
vivekp has quit [Ping timeout: 248 seconds]
vivekp has joined #mlpack
vivekp has quit [Ping timeout: 260 seconds]
vivekp has joined #mlpack
< Atharva> zoq: Are you there?
< zoq> Atharva: I'm now.
< Atharva> In network_init.hpp, what is offset variable for?
< zoq> Atharva: So all network parameters/weights are stored in a single matrix (contiguous memory). The idea is that each layer uses a specific part of the parameter matrix, and offset marks the beginning of each layer's part.
< zoq> Atharva: So for the first layer the offset is 0; let's say that layer is of size 10, so the offset for the second layer is 10. If the second layer is of size 20, the offset for the next layer would be 10 + 20. Hopefully this was helpful?
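A minimal C++/Armadillo sketch of the idea zoq describes above; the layer sizes and variable names are illustrative and not taken from network_init.hpp:

    #include <armadillo>

    int main()
    {
      // Suppose the network's layers hold 10, 20, and 15 parameters each.
      arma::uvec layerSizes = {10, 20, 15};

      // All weights live in one contiguous parameter matrix (here a column vector).
      arma::mat parameters(arma::accu(layerSizes), 1, arma::fill::randu);

      // Each layer works on the slice [offset, offset + size) of that matrix,
      // without copying the memory.
      size_t offset = 0;
      for (size_t i = 0; i < layerSizes.n_elem; ++i)
      {
        arma::mat layerWeights(parameters.memptr() + offset, layerSizes[i], 1,
            false /* no copy */, true /* strict */);
        offset += layerSizes[i];  // offsets seen by the layers: 0, 10, 30.
      }
    }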
< Atharva> zoq: Understood, thanks!
< zoq> Atharva: Here to help.
< jenkins-mlpack> Project docker mlpack nightly build build #364: SUCCESS in 4 hr 9 min: http://masterblaster.mlpack.org/job/docker%20mlpack%20nightly%20build/364/
< Atharva> zoq: Sorry to disturb again, you there?
< Atharva> For the ann input/output PR, @rcurtin commented asking whether it was possible to remove the inSize parameter from the first layer as well, and not just from the subsequent layers
< Atharva> I have found a way but it involves adding one argument to the ResetParameters() function.
< Atharva> which will be input.n_rows of course
< Atharva> Or there is another way, but in that case we will need to stop calling the ResetParameters() function ourselves and only allow the Forward() and Train() functions to call it.
< zoq> Does this mean that each layer would have to make sure that ResetParameters() is called at least once? So if we just use the layer independently, we have to make sure the function is called in the forward pass, right?
< Atharva> Yes, ResetParameters() has to be called at least once, because that's where we go over the network and set the inSizes and weights for the layers. Until then, inSize for all layers is zero.
< Atharva> But in some cases, ResetParameters() has been used externally before the Forward() or Train() functions. In that case the network has no way of knowing what the size of the input data is.
< sumedhghaisas> Atharva: I was just taking a look at the jacobian test for NormalDistribution
< sumedhghaisas> I am not sure I understand your jacobian test
< zoq> ResetParameters: I see, so do you think we could provide both versions: one version expects that inSize is already set, and the other takes the inSize from the provided dataset?
< sumedhghaisas> In the jacobian test you perturb the input and check the approximate gradient against the real one. But I see that you are perturbing the mean and variance instead
< Atharva> I was confused a lot over that too, but I think the mean and variance in this case can be considered the input, and the observation is just the target.
< Atharva> In the LogProbBackward function, the gradients are w.r.t. the mean and std
< sumedhghaisas> the check is delta_output / delta_input, and the weights, in this case the mean and variance, are kept constant
< sumedhghaisas> ahh... I see what you mean
< Atharva> Also, in the ReconstructionLoss, we receive the mean and std as input which we then forward to the NormalDistribution
< Atharva> zoq: Yes, we can keep both ways, but I think we will need multiple definitions of ResetParameters() then
< Atharva> and if someone calls ResetParameters() without any parameter and doesn't provide the input size for the first layer, it will throw an error.
< zoq> Atharva: Agreed, but at least for now I think that way we can provide backward compatibility.
< sumedhghaisas> Atharva: ahh I understand the test now. Can you tell me how big the difference between the two is?
< Atharva> zoq: Sorry I don't understand what you mean by backward compatibility
< zoq> Atharva: We could check the inSize parameter and provide some reasonable output for the user.
< zoq> Atharva: That is, existing code can still be used without any changes.
< Atharva> sumedhghaisas: We get 5000 something when we should get 1e-5
< sumedhghaisas> 5000?
< sumedhghaisas> then something is terribly wrong
< Atharva> zoq: Yes, that can be done, but another ResetParameters() definition is needed which will be used internally. People can still use the ann module the way it is now.
< Atharva> sumedhghaisas: Yes. Are we messing up something big, like conceptually?
< sumedhghaisas> hmm. that's scary.
< sumedhghaisas> I haven't taken a look at the derivatives yet, got a meeting for an hour, I will come back from it and take a look at the gradients :)
< zoq> Atharva: If a user can use it the way they use it now, I don't see any reason against the idea.
< Atharva> zoq: Okay, I will push a commit then
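A rough sketch of the overload idea discussed above, written as a hypothetical miniature class; none of these names or signatures are the real mlpack FFN API, they only illustrate the two ways of providing the first layer's input size:

    #include <cstddef>
    #include <stdexcept>

    // Hypothetical miniature of the proposal; not the real mlpack FFN class.
    class TinyNetwork
    {
     public:
      // Existing-style version: assumes the first layer's input size was
      // already given (e.g. through the layer's constructor).
      void ResetParameters()
      {
        if (firstLayerInSize == 0)
          throw std::invalid_argument("input size of the first layer not set");
        // ... allocate and initialize the single parameter matrix here ...
      }

      // Proposed overload: take the input dimensionality from the data
      // (input.n_rows), so the user never has to specify inSize explicitly.
      void ResetParameters(const std::size_t inputSize)
      {
        firstLayerInSize = inputSize;
        ResetParameters();
      }

     private:
      std::size_t firstLayerInSize = 0;
    };

    int main()
    {
      TinyNetwork net;
      net.ResetParameters(784);  // e.g. a 784-dimensional input; illustrative only.
    }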
< Atharva> sumedhghaisas: Sure!
< ShikharJ> zoq: Are you there?
< zoq> Atharva: I'm here now.
< zoq> ShikharJ: I'm here now.
< zoq> ... wrong name :)
< ShikharJ> zoq: I had a doubt regarding the WGAN algorithm. In the usual GAN, we label the real images as 1 and the fake ones as 0, so that we can obtain log(D(x)) + log(1 - D(G(x))) using the CrossEntropy loss function.
< ShikharJ> zoq: But according to the WGAN algorithm, we need to obtain a simple D(x) - D(G(x)), so I'm not sure whether any labelling should be done at all.
< ShikharJ> zoq: If any labelling is to be done, I'm not sure whether the real and the fake ones should have the same label or different labels.
< zoq> ShikharJ: In the case of Wasserstein, it would make sense to use -1 for the generated samples and 1 for the real ones, since the output doesn't use an activation on top. But I don't think we will see any difference if we use something else.
< ShikharJ> zoq: Ah, I see. Anyway, we're trying to maximize the distance between D(x) and D(G(x)), so it makes sense.
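A small sketch of the labelling zoq suggests, assuming the critic outputs one raw score per sample; the scores below are made up and only illustrate the sign convention:

    #include <armadillo>
    #include <iostream>

    int main()
    {
      // Raw critic scores for one mini-batch: D(x) on real samples and
      // D(G(z)) on generated samples; no activation on top, as noted above.
      arma::rowvec realScores = { 0.7, 1.2,  0.3};
      arma::rowvec fakeScores = {-0.4, 0.1, -0.9};

      // Labels: +1 for real samples, -1 for generated samples.
      arma::rowvec realLabels = arma::ones<arma::rowvec>(realScores.n_elem);
      arma::rowvec fakeLabels = -arma::ones<arma::rowvec>(fakeScores.n_elem);

      arma::rowvec labels = arma::join_rows(realLabels, fakeLabels);
      arma::rowvec scores = arma::join_rows(realScores, fakeScores);

      // The critic maximizes mean(D(x)) - mean(D(G(z))); with the labels above
      // that is (up to a constant factor) the same as minimizing
      // -mean(label % score) over the combined batch.
      const double criticLoss = -arma::mean(labels % scores);
      std::cout << "critic loss: " << criticLoss << std::endl;
    }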
vivekp has quit [Read error: Connection reset by peer]
prakhar_code[m] has quit [Ping timeout: 255 seconds]
killer_bee[m] has quit [Ping timeout: 247 seconds]
vivekp has joined #mlpack
manish7294 has joined #mlpack
< manish7294> rcurtin: zoq: Can I take 4 days off starting tomorrow? I have to attend my cousin's wedding ceremony :)
< manish7294> I think I can be available on IRC for discussions during the first two days at least.
< Atharva> sumedhghaisas: Did you get a chance to look at the code?
prakhar_code[m] has joined #mlpack
< sumedhghaisas> Atharva: I am checking right now :)
< sumedhghaisas> Atharva: Just a quick question, are you storing the variance or the standard deviation? Because the Softplus must be applied to the variance
< Atharva> I am storing the standard deviation, why is that?
< sumedhghaisas> ahh sorry, I meant standard deviation or log standard deviation. But I see that you are using the standard deviation later on as well. Never mind :)
< Atharva> Okayy :)
< sumedhghaisas> Atharva: could you run the test without Softplus?
< sumedhghaisas> just want to know where the exact error is
< Atharva> Okay, give me a minute
< sumedhghaisas> and also we should have with and without Softplus :)
< Atharva> Do you mean the tests with and without Softplus?
killer_bee[m] has joined #mlpack
< sumedhghaisas> yup
< Atharva> Okay, I will add another
< sumedhghaisas> okay the math seems correct on the first glance
< sumedhghaisas> and the test also looks correct
< sumedhghaisas> hmmm
< sumedhghaisas> interesting
manish7294 has quit [Ping timeout: 260 seconds]
< Atharva> Sorry I was on another branch, so it's taking time to build the tests
< sumedhghaisas> no problem
< sumedhghaisas> okay couple of other pointers
< sumedhghaisas> first is to see if without softplus passes
< sumedhghaisas> could you also run the test with 1 target element and try to analyze the results?
< Atharva> Okay, will do that
< sumedhghaisas> there should be just one entry in the matrix so easy to follow
< Atharva> It just passed without softplus
< sumedhghaisas> Atharva: aha...
< sumedhghaisas> okay, need to run to another meeting. But now that we know the problem lies in the softplus, it's easy to find.
< sumedhghaisas> I will take a look later again if you haven't solved it by then.
< Atharva> Hopefully I will have :)
< ShikharJ> zoq: Do you think I could implement an identity loss layer for which the forward routine just returns -arma::accu(target % (input + eps))?
< ShikharJ> zoq: Saying this because the FFN by default sets the loss layer to negative log-likelihood, and the WGAN paper is against using any kind of log or exponent based loss function.
< ShikharJ> zoq: Or if it's fine, I can try to implement the same in the Evaluate function itself, no worries there.
< zoq> ShikharJ: Implementing an identity loss layer might be the cleanest solution; or do you prefer to implement it inside the evaluation function?
< ShikharJ> zoq: I can implement it inside the loss_functions directory. Similar for the loss routine for WGANGradientPenalty.
< zoq> ShikharJ: Yeah, I like the idea.
< ShikharJ> zoq: Ideally we're computing an approximation of the Kantorovich-Rubinstein duality form of the Wasserstein-1 (or Earth Mover) distance. Do you want it to be named as such?
< zoq> ShikharJ: Don't have a preference here, each one sounds reasonable to me.
< ShikharJ> zoq: Then I'll name it EarthMover distance loss or something similar, so that a user has an idea of where the code is intended to be used.
< zoq> ShikharJ: Sounds perfect to me; I don't think there is any confusion with another part of the codebase.
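A rough sketch of the Forward/Backward computation ShikharJ describes, using the -arma::accu(target % (input + eps)) formula from above; the class name and interface are illustrative, not the final mlpack layer:

    #include <armadillo>
    #include <iostream>

    // Illustrative stand-in for the identity / EarthMover loss discussed above;
    // not the final mlpack implementation.
    class EarthMoverDistanceSketch
    {
     public:
      // input: raw critic scores; target: +1 / -1 labels (see discussion above).
      double Forward(const arma::mat& input, const arma::mat& target) const
      {
        const double eps = 1e-10;  // small constant, as in the formula above
        return -arma::accu(target % (input + eps));
      }

      // Since the loss is linear in the input, d(loss)/d(input) = -target.
      void Backward(const arma::mat& /* input */, const arma::mat& target,
                    arma::mat& output) const
      {
        output = -target;
      }
    };

    int main()
    {
      arma::mat scores = {{0.7, -0.4, 1.2}};
      arma::mat labels = {{1.0, -1.0, 1.0}};

      EarthMoverDistanceSketch loss;
      std::cout << "loss: " << loss.Forward(scores, labels) << std::endl;
    }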
< zoq> manish7294: Thanks for the heads up, as from my side this is absolutely fine, have fun :)
vivekp has quit [Ping timeout: 256 seconds]
< Atharva> sumedhghaisas: It was due to the fact that the approximate jacobian was calculated w.r.t. the standard deviation, while LogProbBackward was w.r.t. the pre-softplus standard deviation
< Atharva> I tried perturbing the pre-softplus standard deviation and the test passed
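A scalar sketch of the check being described, assuming a single Gaussian log-probability; the point is only that the numerical perturbation is applied to the pre-softplus standard deviation, the same variable the analytic backward pass differentiates with respect to. Names and values are illustrative, not the actual mlpack test:

    #include <cmath>
    #include <iostream>

    // log N(x | mean, stdDev), where stdDev = softplus(preStdDev) = log(1 + exp(preStdDev)).
    double LogProb(const double x, const double mean, const double preStdDev)
    {
      const double pi = 3.14159265358979323846;
      const double stdDev = std::log(1.0 + std::exp(preStdDev));  // softplus
      return -0.5 * std::log(2.0 * pi) - std::log(stdDev)
          - 0.5 * std::pow((x - mean) / stdDev, 2);
    }

    int main()
    {
      const double x = 0.3, mean = 0.1, preStdDev = 0.5, eps = 1e-6;

      // Central-difference derivative w.r.t. the *pre-softplus* standard
      // deviation; this is the quantity to compare against the analytic
      // backward pass, not the derivative w.r.t. stdDev itself.
      const double numeric =
          (LogProb(x, mean, preStdDev + eps) - LogProb(x, mean, preStdDev - eps))
          / (2.0 * eps);

      std::cout << "d logprob / d preStdDev ~= " << numeric << std::endl;
    }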
< ShikharJ> zoq: rcurtin : Are you guys online?
< zoq> ShikharJ: About to step out, but I'm still here.
< ShikharJ> zoq: I was just wondering what would be the correct way of finding the column-wise L2 norm of a matrix using Armadillo's functions?
< ShikharJ> zoq: Considering an M x N matrix, we need to find the individual norms of the N columns?
< zoq> ShikharJ: Actually, I'm not sure there is an Armadillo function that returns the norm for each column, so probably a loop is necessary. What you could do is search the codebase for "L2 distance" or "euclidean norm".
< ShikharJ> zoq: There isn't, but the issue is that the WGAN Gradient Penalty algorithm calculates the norm for each individual input in the mini-batch, so I guess I'll go with the loop :)
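A quick sketch of the per-column norm computation discussed above; the matrix size is arbitrary. The plain loop is one option, and an element-wise square/sum/sqrt expression gives the same result without an explicit loop:

    #include <armadillo>
    #include <iostream>

    int main()
    {
      // M x N matrix; one L2 norm per column (i.e. per sample in the mini-batch).
      arma::mat x(5, 4, arma::fill::randu);

      // Option 1: plain loop over the columns.
      arma::rowvec norms(x.n_cols);
      for (size_t i = 0; i < x.n_cols; ++i)
        norms(i) = arma::norm(x.col(i), 2);

      // Option 2: element-wise square, sum each column (dim 0), then sqrt.
      arma::rowvec norms2 = arma::sqrt(arma::sum(arma::square(x), 0));

      std::cout << norms << norms2;
    }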
< ShikharJ> zoq: The WGAN PR is now complete, I'll push the code in sometime, and tmux the builds tomorrow to see how we fare!
< rcurtin> manish7294: of course, enjoy the wedding ceremony!
< rcurtin> I will have some more theory for you by the time you get back
< rcurtin> things have been slow for me here, a lot to do before leaving the job next week
< ShikharJ> rcurtin: Till when can we expect the benchmark systems to remain functional?
< rcurtin> ShikharJ: at least Friday; I will bring up replacements, but they will not be as powerful
< ShikharJ> rcurtin: It's all good, at least we should be finished with our planned goals and experiments by then :)
< rcurtin> right, well in any case I will be sure that we have something available to run experiments on
< ShikharJ> rcurtin: If I may ask, what was your reason for leaving Symantec?
< rcurtin> lack of work aligned with my research interests
< rcurtin> malware is an interesting problem, but I'm really interested in accelerating algorithms
< rcurtin> and there was not so much space for that inside Symantec
< rcurtin> it wasn't a huge company need
< ShikharJ> Ah, I can understand, accelerating stuff is a highly rewarding feeling :)
< ShikharJ> I also hate being assigned work I have no interest in :P Hope you have a good time in your new position.
< rcurtin> I hope so too, thanks :)