rcurtin_irc changed the topic of #mlpack to: mlpack: a scalable machine learning library (https://www.mlpack.org/) -- channel logs: https://libera.irclog.whitequark.org/mlpack -- NOTE: messages sent here might not be seen by bridged users on matrix, gitter, or slack
pbsds has left #mlpack [The Lounge - https://thelounge.chat]
<EshaanAgarwal[m]> <jonpsy[m]> "That bit flip one is a classic..." <- so should i go ahead with it ?
<jonpsy[m]> Sure, as long as a test is reproducible
<EshaanAgarwal[m]> jonpsy[m]: I have implemented the environment. Would you like to take a look at it whenever you have time? It's in the same PR for HER
<jonpsy[m]> sure
<jonpsy[m]> can you jog me through exactly how it's tested?
<EshaanAgarwal[m]> jonpsy[m]: I have not written the test yet! There are some modifications needed in q_impl.hpp that I will have to look at again so that we can plug in HER seamlessly.
<jonpsy[m]> jonpsy[m]: or planned to be tested
<jonpsy[m]> that bit flip environment, for example. Let's say we start from 00000 and the goal is 10011. Now what's the game?
<EshaanAgarwal[m]> <EshaanAgarwal[m]> "Which is - Imagine a single..." <- However as mentioned here, we have a arbitrary initial state and arbitrary final vector state. Agent at a time can provide the index where it wants to flip. And agent in the end should learn to reach any goal vector from intiaol vector
<EshaanAgarwal[m]> jonpsy[m]: Yeah, so the agent provides the index where we need to flip! Let's say it's the 0th: then we flip 0 to 1 and 10000 is our next state. Let's say from here it again chooses the 0th index; then our next state becomes 00000 again
<EshaanAgarwal[m]> And so forth
<EshaanAgarwal[m]> Finally it should learn to take us to the desired goal (10011)
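A minimal sketch of the bit flip game described above, in standalone C++ (illustrative only; BitFlipEnv and its methods are made-up names, not the environment class in the HER PR or mlpack's environment API):

// Bit flip game: start from a random bit vector, flip one bit per step,
// try to match a random goal vector.  Reward is sparse: 0 at the goal,
// -1 everywhere else.
#include <cstddef>
#include <random>
#include <vector>

class BitFlipEnv
{
 public:
  // n: number of bits; start state and goal are sampled uniformly at random.
  explicit BitFlipEnv(const std::size_t n) : state(n), goal(n)
  {
    std::random_device rd;
    std::mt19937 gen(rd());
    std::bernoulli_distribution bit(0.5);
    for (std::size_t i = 0; i < n; ++i)
    {
      state[i] = bit(gen);
      goal[i] = bit(gen);
    }
  }

  // Action: the index of the bit to flip.
  double Step(const std::size_t index)
  {
    state[index] = !state[index];
    return (state == goal) ? 0.0 : -1.0;
  }

  bool Done() const { return state == goal; }

  const std::vector<bool>& State() const { return state; }
  const std::vector<bool>& Goal() const { return goal; }

 private:
  std::vector<bool> state;
  std::vector<bool> goal;
};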
<jonpsy[m]> after training, how do you specify which goal to choose?
<EshaanAgarwal[m]> jonpsy[m]: Can you elaborate please ?
<jonpsy[m]> So, after training it should've learned how to reach a multitude of goals, correct?
<EshaanAgarwal[m]> jonpsy[m]: Yeah, so basically we have our one final true goal, which is the state we want to reach. Now, based on the goal selection strategy, we will store more transitions where the goal is different. For example, say the agent reached B from A after an episode while its actual goal was C; we will then save some additional transitions with the goal set to B (as if B had been our final goal all along)
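A hypothetical sketch of that "final" goal-relabeling idea in standalone C++ (Transition and RelabelEpisode are made-up names, not the HER replay buffer in the PR); after an episode, the state the agent actually reached is substituted as the goal and the sparse reward is recomputed against it:

#include <cstddef>
#include <vector>

struct Transition
{
  std::vector<bool> state;
  std::size_t action;          // index of the flipped bit
  double reward;
  std::vector<bool> nextState;
  std::vector<bool> goal;      // goal the transition was collected under
};

// Store each episode transition twice: once with the original goal, and
// once relabeled with the goal the episode actually achieved ("B").
void RelabelEpisode(const std::vector<Transition>& episode,
                    std::vector<Transition>& replayBuffer)
{
  if (episode.empty())
    return;

  // The state the agent actually ended up in.
  const std::vector<bool>& achievedGoal = episode.back().nextState;

  for (const Transition& t : episode)
  {
    replayBuffer.push_back(t);

    Transition relabeled = t;
    relabeled.goal = achievedGoal;
    // Recompute the sparse reward against the substituted goal.
    relabeled.reward = (relabeled.nextState == achievedGoal) ? 0.0 : -1.0;
    replayBuffer.push_back(relabeled);
  }
}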
<jonpsy[m]> we should go for examples with sparse rewards
<jonpsy[m]> that robot hand example, any way we can simulate it?
<EshaanAgarwal[m]> jonpsy[m]: Well, in the paper they coded it themselves
<EshaanAgarwal[m]> The MuJoCo ones that are present in OpenAI Gym might not help
<jonpsy[m]> hence my question, can we simulate it
<EshaanAgarwal[m]> jonpsy[m]: We can code it. They are a little complicated. I will have to see how to do it in C++
<EshaanAgarwal[m]> That's why i looked at the but flip environment because it was simple and kind of worked as a pretty good task
<EshaanAgarwal[m]> s/but/bit/
<EshaanAgarwal[m]> jonpsy: zoq fieryblade https://meet.google.com/pnp-rtjw-unz
<zoq[m]> Have to skip today, can we move it tomorrow?
<EshaanAgarwal[m]> zoq[m]: I have no issues. Sure, we can do that
<zoq[m]> Thanks