rcurtin_irc changed the topic of #mlpack to: mlpack: a scalable machine learning library (https://www.mlpack.org/) -- channel logs: https://libera.irclog.whitequark.org/mlpack -- NOTE: messages sent here might not be seen by bridged users on matrix, gitter, or slack
pbsds has left #mlpack [The Lounge - https://thelounge.chat]
<EshaanAgarwal[m]> <jonpsy[m]> "That bit flip one is a classic..." <- so should i go ahead with it ?
<jonpsy[m]> Sure, as long as a test is reproducible
<EshaanAgarwal[m]> jonpsy[m]: I have implemented the environment. Would you like to take a look at it whenever you have time? It's in the same PR for HER
<jonpsy[m]> sure
<jonpsy[m]> can you jog me through exactly how it's tested?
<EshaanAgarwal[m]> jonpsy[m]: I have not written the test yet! There are some modifications needed in q_impl.hpp that I will have to look at again so that we can plug in HER seamlessly.
<jonpsy[m]> jonpsy[m]: or planned to be tested
<jonpsy[m]> that bit flip environment, for example. Let's say we start from 00000 and the goal is 10011. Now what's the game?
<EshaanAgarwal[m]> <EshaanAgarwal[m]> "Which is - Imagine a single..." <- However as mentioned here, we have a arbitrary initial state and arbitrary final vector state. Agent at a time can provide the index where it wants to flip. And agent in the end should learn to reach any goal vector from intiaol vector
<EshaanAgarwal[m]> jonpsy[m]: Yeah, so the agent provides the index where we need to flip! Let's say it's the 0th: then we flip 0 to 1 and 10000 is our next state. Let's say from here it again chooses the 0th index; then our next state becomes 00000 again
<EshaanAgarwal[m]> And so forth
<EshaanAgarwal[m]> Finally it should learn to take us to the desired goal (10011)
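A minimal sketch of the bit flip game described above, in standalone C++ (illustrative only; BitFlipEnv and its methods are made-up names, not the environment class in the HER PR or mlpack's environment API):

// Bit flip game: start from a random bit vector, flip one bit per step,
// try to match a random goal vector.  Reward is sparse: 0 at the goal,
// -1 everywhere else.
#include <cstddef>
#include <random>
#include <vector>

class BitFlipEnv
{
 public:
  // n: number of bits; start state and goal are sampled uniformly at random.
  explicit BitFlipEnv(const std::size_t n) : state(n), goal(n)
  {
    std::random_device rd;
    std::mt19937 gen(rd());
    std::bernoulli_distribution bit(0.5);
    for (std::size_t i = 0; i < n; ++i)
    {
      state[i] = bit(gen);
      goal[i] = bit(gen);
    }
  }

  // Action: the index of the bit to flip.
  double Step(const std::size_t index)
  {
    state[index] = !state[index];
    return (state == goal) ? 0.0 : -1.0;
  }

  bool Done() const { return state == goal; }

  const std::vector<bool>& State() const { return state; }
  const std::vector<bool>& Goal() const { return goal; }

 private:
  std::vector<bool> state;
  std::vector<bool> goal;
};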
<jonpsy[m]> after training, how do you specify which goal to choose?
<EshaanAgarwal[m]> jonpsy[m]: Can you elaborate please ?
<jonpsy[m]> So, after training it should've learned how to reach a multitude of goals, correct?
<EshaanAgarwal[m]> jonpsy[m]: Yeah, so basically we have our one final true goal, which is the state we want to reach. Now, based on the goal selection strategy, we will store more transitions where the goal is different. For example, say the agent reached B from A after an episode while its actual goal was C; we will then save some additional transitions with the goal set to B (as if B had been our final goal all along)
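A hypothetical sketch of that "final" goal-relabeling idea in standalone C++ (Transition and RelabelEpisode are made-up names, not the HER replay buffer in the PR); after an episode, the state the agent actually reached is substituted as the goal and the sparse reward is recomputed against it:

#include <cstddef>
#include <vector>

struct Transition
{
  std::vector<bool> state;
  std::size_t action;          // index of the flipped bit
  double reward;
  std::vector<bool> nextState;
  std::vector<bool> goal;      // goal the transition was collected under
};

// Store each episode transition twice: once with the original goal, and
// once relabeled with the goal the episode actually achieved ("B").
void RelabelEpisode(const std::vector<Transition>& episode,
                    std::vector<Transition>& replayBuffer)
{
  if (episode.empty())
    return;

  // The state the agent actually ended up in.
  const std::vector<bool>& achievedGoal = episode.back().nextState;

  for (const Transition& t : episode)
  {
    replayBuffer.push_back(t);

    Transition relabeled = t;
    relabeled.goal = achievedGoal;
    // Recompute the sparse reward against the substituted goal.
    relabeled.reward = (relabeled.nextState == achievedGoal) ? 0.0 : -1.0;
    replayBuffer.push_back(relabeled);
  }
}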
<jonpsy[m]> we should go for examples with sparse rewards
<jonpsy[m]> that robot hand example, any way we can simulate it?
<EshaanAgarwal[m]> jonpsy[m]: Well, in the paper they coded it themselves
<EshaanAgarwal[m]> The MuJoCo ones that are present in OpenAI Gym might not help
<jonpsy[m]> hence my question, can we simulate it
<EshaanAgarwal[m]> jonpsy[m]: We can code it. They are a little complicated. I will have to see how to do it in C++
<EshaanAgarwal[m]> That's why i looked at the but flip environment because it was simple and kind of worked as a pretty good task
<EshaanAgarwal[m]> s/but/bit/
<EshaanAgarwal[m]> jonpsy: zoq fieryblade https://meet.google.com/pnp-rtjw-unz
<zoq[m]> Have to skip today, can we move it tomorrow?
<EshaanAgarwal[m]> zoq[m]: I have no issues. Sure, we can do that
<zoq[m]> Thanks