<zoq[m]1> "I can provide some pseudo code..." <- If we can do weight decay with a layer-wise approach, then I can create an AdamW optimizer, and it could also be used in many other areas
<ShubhamAgrawal[7>
Because for lr weight decay we can create custom callbacks, and give some basic examples exposed as special schedulers, which should suffice I guess.
<ShubhamAgrawal[7>
Because I personally use weight decay quite often to avoid overfitting models
CaCode has joined #mlpack
CaCode_ has joined #mlpack
CaCode has quit [Ping timeout: 240 seconds]
CaCode_ has quit [Quit: Leaving]
<zoq[m]1>
For that to work we have to think about how we tell the optimizer, either how to split the parameter matrix, or a different interface: instead of passing a single matrix that needs to be optimized, passing a vector of matrices.