14:59
<
jonpsy[m] >
Anything new? Eshaan Agarwal
15:07
<
EshaanAgarwal[m] >
<jonpsy[m]> "Anything new? Eshaan Agarwal..." <- Yeah! I was checking values on random runs and found that the new actor network's `actionProb` comes out the same as the old actor network's `actionProb`. This means the new actor is not learning!
15:08
<
EshaanAgarwal[m] >
EshaanAgarwal[m]: This is one example, but it is consistent across all runs.
15:15
<
jonpsy[m] >
So i guess that backward hunch was right all along
15:15
<
jonpsy[m] >
s/hunch/guess/
15:18
<
EshaanAgarwal[m] >
jonpsy[m]: Yeah! But I am not sure what we are missing.
15:19
<
jonpsy[m] >
First ensure that you and PyTorch are calculating the same loss.
15:19
<
jonpsy[m] >
It's possible your loss is 0
15:19
<
jonpsy[m] >
But I think that's unlikely. What's more likely is that you're setting it to 0 somewhere later OR not updating the value.
15:20
<
jonpsy[m] >
Either the backprop function isn't working as expected, or the loss is not updated. That's most likely.
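A minimal sketch of the first check suggested above: compare the scalar loss printed by the two implementations for the same batch, and flag the case where the loss is exactly zero. The function name `compare_losses`, the tolerances, and the sample numbers are all hypothetical placeholders, not values from this conversation.

```python
import math

def compare_losses(loss_ours, loss_reference, rel_tol=1e-5, abs_tol=1e-8):
    """Return a short diagnosis string for two scalar loss values."""
    if loss_ours == 0.0:
        return "our loss is exactly zero"
    if math.isclose(loss_ours, loss_reference, rel_tol=rel_tol, abs_tol=abs_tol):
        return "losses match"
    return f"losses differ by {abs(loss_ours - loss_reference):.3e}"

# Hypothetical values; substitute the numbers printed by each implementation.
print(compare_losses(0.52731, 0.52731))
print(compare_losses(0.0, 0.52731))
print(compare_losses(0.52731, 0.61000))
```

Comparing with a tolerance rather than `==` avoids false alarms from differences in floating-point rounding between the two frameworks.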
15:20
<
EshaanAgarwal[m] >
jonpsy[m]: I don't think this is happening. Let me share screenshots of all the values I computed in that run.
15:21
<
EshaanAgarwal[m] >
jonpsy[m]: Agreed.
15:21
<
jonpsy[m] >
Can you print the updated weights AFTER the backprop?
15:21
<
jonpsy[m] >
I think this will make everything clear
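A rough sketch of the weight check being asked for here, assuming the weights can be read out as a flat list of floats. The helper names `max_abs_change` and `sgd_step_inplace`, the plain gradient-descent update, and all numbers are illustrative assumptions, not the actual network code. The point it shows: the "before" weights must be an independent copy, because a reference to an object that is updated in place always compares as unchanged.

```python
def max_abs_change(before, after):
    """Largest absolute element-wise difference between two flat weight lists."""
    return max(abs(a - b) for a, b in zip(after, before))

def sgd_step_inplace(weights, grads, lr):
    """Hypothetical stand-in for one update step: w <- w - lr * grad, in place."""
    for i, g in enumerate(grads):
        weights[i] -= lr * g

# Illustrative numbers only.
weights = [0.3890, -0.1200, 0.0450]
grads = [0.81, -0.02, 0.0]

alias = weights           # same list object, NOT a snapshot
snapshot = list(weights)  # independent copy taken BEFORE the update

sgd_step_inplace(weights, grads, lr=0.1)

change_vs_alias = max_abs_change(alias, weights)
change_vs_snapshot = max_abs_change(snapshot, weights)
print("change vs alias:   ", change_vs_alias)
print("change vs snapshot:", change_vs_snapshot)
```

If the change measured against a true snapshot is zero while the gradients are nonzero, the update step itself is the place to look.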
15:22
<
EshaanAgarwal[m] >
jonpsy[m]: Of both networks? Okay! I will do it.
15:23
<
jonpsy[m] >
Those are gradients, right? I asked for weights.
15:23
<
EshaanAgarwal[m] >
jonpsy[m]: Yeah! Sorry, these were the previous screenshots that I mentioned.
15:23
<
EshaanAgarwal[m] >
Sending them in 1 min
15:23
<
jonpsy[m] >
Nw, take your time. More essential to be correct than fast
15:28
<
jonpsy[m] >
<EshaanAgarwal[m]> "Screenshot from 2022-09-14 20-51..." <- This comes first, the above comes later. Right?
15:28
<
EshaanAgarwal[m] >
jonpsy[m]: Right 😅
15:28
<
jonpsy[m] >
what's dGrad?
15:29
<
jonpsy[m] >
EshaanAgarwal[m]: For the second one, can you show me the full picture? With actor -3.238 something
15:29
<
EshaanAgarwal[m] >
jonpsy[m]: It is what we get after applying softmax on `dLoss`. This is what we send to the backward pass.
15:30
<
jonpsy[m] >
Ok, are the `dGrad` values matching?
15:30
<
jonpsy[m] >
in `pytorch`?
15:30
<
EshaanAgarwal[m] >
jonpsy[m]: Let me try that! Before that, I have got the weights for both actor and critic, before and after the update.
15:30
<
jonpsy[m] >
sure, send it
15:30
<
EshaanAgarwal[m] >
jonpsy[m]: Let me check that after this
15:30
<
jonpsy[m] >
create a gmeet, let's connect
15:31
<
EshaanAgarwal[m] >
Sure. Sending the link
15:31
<
jonpsy[m] >
zoq: mind joining?
15:38
<
jonpsy[m] >
Ping back when your net fixes
15:39
<
EshaanAgarwal[m] >
jonpsy[m]: Just got back.
15:44
<
EshaanAgarwal[m] >
EshaanAgarwal[m]: jonpsy: would it be possible that we can't meet at 9:45 ?
15:45
<
EshaanAgarwal[m] >
* jonpsy: would it be possible that we meet at 9:45 ?
15:45
<
EshaanAgarwal[m] >
s/can't//
16:13
<
jonpsy[m] >
sorry had company work, can we meet again
16:13
<
jonpsy[m] >
Eshaan Agarwal: ^^
16:13
<
EshaanAgarwal[m] >
jonpsy[m]: Sure.
16:14
<
jonpsy[m] >
same meet?
17:32
<
EshaanAgarwal[m] >
<jonpsy[m]> "same meet?..." <- I did try subtracting the old and new weights, but I am facing precision issues in Python.
17:33
<
ShubhamAgrawal[m >
EshaanAgarwal[m]: subtraction is not numerically stable
17:33
<
ShubhamAgrawal[m >
try to take average
17:33
<
ShubhamAgrawal[m >
maybe that will solve some problem
17:34
<
EshaanAgarwal[m] >
ShubhamAgrawal[m: But how would that let me know the difference in their values ?
17:34
<
ShubhamAgrawal[m >
How much error are you getting rn?
17:35
<
EshaanAgarwal[m] >
ShubhamAgrawal[m: As of now, for 0.3890 - 0.3080 I am getting 0 as the answer.
17:41
<
ShubhamAgrawal[m >
<EshaanAgarwal[m]> "as of now for 0.3890 - 0.3080..." <- Is this the maximum or the accumulation of all errors
17:41
<
ShubhamAgrawal[m >
?
17:42
<
EshaanAgarwal[m] >
ShubhamAgrawal[m: All are of this order
17:44
<
ShubhamAgrawal[m >
EshaanAgarwal[m]: Then I don't think it's a precision error.
17:44
<
ShubhamAgrawal[m >
There is something else that is missing
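A small sketch supporting the conclusion above, using the two values quoted in the conversation. In ordinary Python floats (64-bit) the difference is many orders of magnitude larger than the rounding resolution, so floating-point precision alone cannot turn it into 0. The last part shows one purely hypothetical way a 0 could appear (truncating to integers before subtracting); nothing in the chat confirms that this is the actual cause.

```python
import sys

old_w, new_w = 0.3890, 0.3080  # values quoted in the chat

# Plain float subtraction: the result is close to 0.081, not 0.
diff = old_w - new_w
print("difference:", diff)

# Machine epsilon for Python floats, for scale.
eps = sys.float_info.epsilon
print("float epsilon:", eps)
print("difference / epsilon:", diff / eps)

# Hypothetical cause (an assumption, not taken from the chat):
# truncating both values to integers first does give exactly 0.
truncated_diff = int(old_w) - int(new_w)
print("difference after int truncation:", truncated_diff)
```

A difference of this size that prints as 0 points to how the values are stored, copied, or displayed rather than to rounding error.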