<rcurtin[m]>
Sometimes the input is necessary to properly compute the backward pass. I can't remember which layers do that, but if you look at the implementations of the Backward() functions, many of them have the input parameter commented out because it's not necessary. However, a couple do need to use it. I think Convolution is one of them, but I haven't checked and don't remember for sure 👍️
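As a rough illustration of the commented-out-parameter pattern mentioned here (a hedged sketch, not an actual mlpack layer; the three-argument Backward(input, gy, g) shape and the member name are assumptions):

```cpp
#include <armadillo>

// Hedged sketch only: it illustrates the pattern of a layer whose Backward()
// does not need its first argument, so the parameter name is commented out.
// `cachedDerivative` stands in for whatever the layer stored during Forward().
class ElementwiseLayerSketch
{
 public:
  void Backward(const arma::mat& /* input */, const arma::mat& gy,
                arma::mat& g)
  {
    // The gradient needs only state cached during Forward(), not `input`.
    g = gy % cachedDerivative;
  }

 private:
  arma::mat cachedDerivative;
};
```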
<zoq[m]>
Right, we should use the output of the forward pass.
<vaibhavp[m]>
Then, I think maybe a few tests in the ann module need to be rectified.
<rcurtin[m]>
To be clear, the layer API and the activation function API are different 👍️ the logistic function you linked to is not a layer in and of itself
<vaibhavp[m]>
rcurtin[m]: But, in base_layer, Deriv is used to calculate Backward, which uses the output of the Forward method.
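To make the mechanism being described concrete, here is a simplified sketch (not the actual mlpack source; the class, function names, and the Deriv(y) form are illustrative assumptions):

```cpp
#include <armadillo>
#include <cmath>

// Sketch of an activation-function class in the style being discussed:
// Deriv() is written in terms of the *output* y = Fn(x), so the backward
// pass only needs what Forward() produced.  (For the logistic function,
// f'(x) = f(x) * (1 - f(x)).)
struct LogisticFunctionSketch
{
  static double Fn(const double x) { return 1.0 / (1.0 + std::exp(-x)); }
  static double Deriv(const double y) { return y * (1.0 - y); }
};

// Sketch of a base-layer-style Backward(): it applies Deriv() element-wise to
// whatever matrix it receives as its first argument.  That only produces the
// right gradient if the caller passes the stored Forward() output there.
template<typename ActivationFunction>
void BaseLayerBackwardSketch(const arma::mat& outputFromForward,
                             const arma::mat& gy,
                             arma::mat& g)
{
  arma::mat derivative(arma::size(outputFromForward));
  for (arma::uword i = 0; i < outputFromForward.n_elem; ++i)
    derivative[i] = ActivationFunction::Deriv(outputFromForward[i]);
  g = gy % derivative;
}
```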
<rcurtin[m]>
I have no idea, I'm not able to dig deeply into the code here. Maybe there is a problem in the tests, but it would be hard to believe there is a problem at such a fundamental level since so many of our other tests and example models pass
<vaibhavp[m]>
Actually, I also had a hard time believing this was the case, because I had seen this error many times and re-read the MultiLayer code multiple times to verify what was happening. But if you look at the [Backward code in multi_layer_impl.hpp](https://github.com/mlpack/mlpack/blob/master/src/mlpack/methods/ann/layer/multi_layer_impl.hpp#L187), you will see that layerOutput[i] is passed, which is the same as the output of the Forward method.
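For readers following along, the backward loop being referenced has roughly this shape (a paraphrased sketch, not the verbatim multi_layer_impl.hpp; the interface and member names here are simplified assumptions):

```cpp
#include <armadillo>
#include <memory>
#include <vector>

// Hypothetical minimal layer interface, only so the sketch is self-contained.
struct LayerSketch
{
  virtual ~LayerSketch() { }
  virtual void Backward(const arma::mat& input, const arma::mat& gy,
                        arma::mat& g) = 0;
};

// Paraphrased sketch of the backward loop being described (see the linked
// multi_layer_impl.hpp for the real code): each layer's Backward() is handed
// layerOutputs[i], i.e. what that layer produced during Forward().
void MultiLayerBackwardSketch(
    std::vector<std::unique_ptr<LayerSketch>>& network,
    const std::vector<arma::mat>& layerOutputs,
    std::vector<arma::mat>& layerDeltas,
    const arma::mat& gy,
    arma::mat& g)
{
  // The last layer receives its own forward output and the incoming error.
  network.back()->Backward(layerOutputs.back(), gy, layerDeltas.back());

  // Remaining layers, from back to front, chain the deltas downwards.
  for (size_t i = network.size() - 1; i > 0; --i)
  {
    network[i - 1]->Backward(layerOutputs[i - 1], layerDeltas[i],
        layerDeltas[i - 1]);
  }

  g = layerDeltas.front();
}
```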
<vaibhavp[m]>
I don't think I explained it well, but if you look at the links it will all be clear.
<zoq[m]>
If you can open a PR, we can check how many tests will fail and investigate further.
<vaibhavp[m]>
sure, on it.
<vaibhavp[m]>
Also to be noted: some layers do not use the JacobianTest but instead create their own check for the Backward method, like LogSoftMax.
<rcurtin[m]>
I'd love to make the tests for each layer be a little bit more automated, but I don't think anyone's had a chance to do that. Ideally every layer should pass some simple tests on the forward, backward, and gradient passes. The JacobianTest is great because it just computes a numerical approximation of the gradient... (full message at <https://libera.ems.host/_matrix/media/v3/download/libera.chat/a28233c58a337ad468b64189ed2e6e4f5db06474>)
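To illustrate the kind of check being described, here is a minimal finite-difference Jacobian comparison (a sketch only; it is not mlpack's actual JacobianTest helper, and all names are illustrative):

```cpp
#include <armadillo>
#include <functional>

// Compare an analytic Backward() pass against a central-difference
// approximation of the Jacobian of Forward().
// `forward` maps an input vector to an output vector; `backward` maps
// (input, gy) to g, the gradient with respect to the input.
bool CheckJacobianSketch(
    const std::function<arma::vec(const arma::vec&)>& forward,
    const std::function<arma::vec(const arma::vec&, const arma::vec&)>& backward,
    const arma::vec& input,
    const double eps = 1e-5,
    const double tol = 1e-4)
{
  const arma::vec output = forward(input);

  // Numerical Jacobian: numJacobian(i, j) = d output_i / d input_j.
  arma::mat numJacobian(output.n_elem, input.n_elem);
  for (arma::uword j = 0; j < input.n_elem; ++j)
  {
    arma::vec perturbed = input;
    perturbed[j] += eps;
    const arma::vec plus = forward(perturbed);
    perturbed[j] -= 2 * eps;
    const arma::vec minus = forward(perturbed);
    numJacobian.col(j) = (plus - minus) / (2 * eps);
  }

  // Analytic Jacobian, recovered row by row: backpropagating the i-th unit
  // vector through Backward() gives the i-th row of the Jacobian.
  arma::mat anaJacobian(output.n_elem, input.n_elem);
  for (arma::uword i = 0; i < output.n_elem; ++i)
  {
    arma::vec gy(output.n_elem, arma::fill::zeros);
    gy[i] = 1.0;
    anaJacobian.row(i) = backward(input, gy).t();
  }

  return arma::approx_equal(anaJacobian, numJacobian, "absdiff", tol);
}
```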
<vaibhavp[m]>
<zoq[m]> "If you can open a PR, we can..." <- I performed the "ANNLayerTest" tests on my local machine and all tests passed.
<vaibhavp[m]>
But not all layers use the JacobianTest.
<rcurtin[m]>
If it passed before your change, and it also passed after your change, at least personally I'd be uncomfortable accepting your change (that implies that the particular code you are changing has not been tested properly!), or, at least, I'd want to take a very deep dive into it to understand why 😃
<vaibhavp[m]>
So, let me make a few more changes to see if something happens.
<rcurtin[m]>
It's super tedious to develop like this, but one way I like to develop when I think some code isn't right is, I write a test case that exposes the bug I think should exist. If that test case fails, then I can go fix the code and it will start passing, and I can sleep easy at night knowing I actually found and addressed an issue 😃
<rcurtin[m]>
Yeah, that's also a great idea to build knowledge about the code... change things around, see if anything changes, rinse and repeat, then sooner or later it all starts to make sense :)
<vaibhavp[m]>
<rcurtin[m]> "If it passed before your change,..." <- I completely agree. The reason the tests passed, at least from what I saw, is that most of the tests are written correctly (without using the JacobianTest or its siblings), or, even where the JacobianTest is used, the input is not actually required (the Linear layer) or the derivative is already calculated in the Forward method (like the ELU layer). What I would suggest is perhaps that more tests be implemented for each layer using the JacobianTest(s), instead of each layer having its own tests for the Backward method. Perhaps there should also be a guide for future contributors on how a layer should be implemented, which specifies what is passed to the Backward method (and to the Gradient method as well), because most people are not aware of what is happening internally in the FFN and RNN, and it is completely logical to assume that the input is used to calculate the derivative of a function (perhaps because that is what they did in school or college).
<vaibhavp[m]>
All this considered, there is an error here, and the output should be passed to the Backward method instead of the input.
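For a concrete sense of why the choice matters (standard calculus, not a claim about any particular mlpack implementation): for functions like the logistic and log-softmax, the derivative has a natural closed form in terms of the output, so a Backward() written that way only produces correct values if it is handed the output of the forward pass:

```latex
% Logistic: the derivative is expressible purely in terms of the output y.
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr) = y(1 - y).

% Log-softmax: the Jacobian is also expressible in terms of the output y,
% since the softmax probabilities are exactly e^{y_j}.
y_i = x_i - \log \sum_k e^{x_k}, \qquad
\frac{\partial y_i}{\partial x_j} = \delta_{ij} - e^{y_j}.
```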
<vaibhavp[m]>
So, I added a JacobianTest for LogSoftMax, and it failed. This is with the current code, without my changes.
<vaibhavp[m]>
Perhaps there is an issue here. What do you guys think?
<vaibhavp[m]>
The tests are failing in the PR (#3465), as they should. Can someone do a sanity check on the changes I made?