jonpsy: zoq: I have implemented the maze environment and pushed the code to the PR. I just need to fix a couple of things in it. Meanwhile, it would be great if you could take a look.
<EshaanAgarwal[m]> "jonpsy: zoq same results with..." <- keep 256, but increase depth
256, 64, 64 like that perhaps
<jonpsy[m]> "256, 64, 64 like that perhaps" <- Shouldn't a last layer of 64 give an index out of bounds?
i meant hidden layers
jonpsy[m]: Ok I will make a custom network and then try.
jonpsy: zoq: our test for the maze environment is converging! I have fixed the environment.
Can you please take a look at it whenever you are free?
one question
jonpsy[m]: Yes ;)
what about other policies
just to be sure, our env just returns `0` right?
and only returns `1` when it reaches the goal (aka where `1` was setup)
<EshaanAgarwal[m]> "Ok I will make a custom network..." <- also keep decreasing complexity while it still converges
jonpsy[m]: It returns -1 if the move hits a wall or goes out of bounds of the maze
Other replays weren't even giving a positive avg return
It shouldn't "return" anything when it hits a wall
But this was consistently giving a positive return of 0.9 in almost 100 episodes
it should just go back
Basically, the wall is there to restrict movement
so hitting a wall should be an "unviable" path
jonpsy[m]: But that was a wrong step, so we should give it a negative reward, right?
jonpsy[m]: It is going back, and alongside that I am giving it a -1 reward for the wrong action
EshaanAgarwal[m]: I have implemented this in the code
you could check if the path would lead to "-1" aka wall
and not go there at all
jonpsy[m]: The agent has the option to choose any action, right? The whole point is that it learns that it doesn't need to move into a wall
From a 0 state, if the agent chooses an action that would take it into a wall, it will not go there, but at the same time we will give it a negative reward for the wrong action it chose.
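The two reward schemes being debated here can be put side by side in a small sketch. This is only an illustration under assumptions, not the environment from the PR: the cell encoding (-1 wall, 0 free, 1 goal) follows the messages above, while `step`, `ACTIONS`, and the `wall_penalty` parameter are hypothetical names. With `wall_penalty=0.0` a blocked move just leaves the agent in place (the binary win/lose view); with `wall_penalty=-1.0` the agent also gets the negative reward.

```python
# Hypothetical names throughout; this is not the environment from the PR.
WALL = -1   # cell the agent cannot enter
FREE = 0    # ordinary cell
GOAL = 1    # goal cell

# Moves as (row delta, column delta): up, down, left, right.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def step(maze, pos, action, wall_penalty=0.0):
    """Return (next_pos, reward, done) for one move in the grid."""
    row, col = pos
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col

    in_bounds = 0 <= new_row < len(maze) and 0 <= new_col < len(maze[0])
    if not in_bounds or maze[new_row][new_col] == WALL:
        # Blocked move: the agent stays where it was.
        return pos, wall_penalty, False

    if maze[new_row][new_col] == GOAL:
        return (new_row, new_col), 1.0, True

    return (new_row, new_col), 0.0, False


maze = [
    [0, -1, 0],
    [0,  0, 1],
]
print(step(maze, (0, 0), 3))         # move right into the wall, no penalty
print(step(maze, (0, 0), 3, -1.0))   # same move, with the wall penalty
print(step(maze, (1, 1), 3))         # move right onto the goal
```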
<jonpsy[m]> "you could check if the path..." <- zoq: I have removed the negative reward associated with the wall! It's able to solve the maze and is performing way better than the others. I have pushed those changes.
<EshaanAgarwal[m]> "From a 0 state if the agents..." <- I get your point. But I'm trying to keep things binary here. Win/Lose, that's it
Btw, one way we could make this interesting
is limiting the number of steps
jonpsy[m]: I have done that now! It performs well with that too.
I think we have that feature already
number of steps thing?
jonpsy[m]: Yes, but I guess for a simple test this should be fine.
jonpsy[m]: Yes! Can you elaborate?
Yeah, so for example, if you have a maze
It's like a race basically, and you have a time limit. If you don't find the goal within that time limit, you lose
that'll help for graceful exit in case an agent gets stuck in infinite back & forth
jonpsy[m]: We are doing that already
It's the max number of steps! I have set it to 120 for now, but we can reduce it
we should defo play with that param
When it takes more than the max steps, it loses and the game terminates
jonpsy[m]: I will try and share the results!
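The step limit described above can be sketched as an episode loop with a fixed budget. This is a minimal illustration under assumptions: `run_episode`, `corridor_step`, and the two policies are hypothetical names, and the toy corridor stands in for the maze. Only the default of 120 steps comes from the conversation.

```python
def run_episode(policy, step_fn, start, max_steps=120):
    """Run one episode; return (total_return, steps_taken, reached_goal)."""
    pos, total = start, 0.0
    for t in range(1, max_steps + 1):
        pos, reward, done = step_fn(pos, policy(pos))
        total += reward
        if done:
            return total, t, True
    # Step budget exhausted: the episode ends as a loss.
    return total, max_steps, False


def corridor_step(pos, action):
    """Toy 1-D corridor with cells 0..4 and the goal at cell 4."""
    new_pos = max(0, min(4, pos + (1 if action == 1 else -1)))
    done = new_pos == 4
    return new_pos, (1.0 if done else 0.0), done


def always_right(pos):
    return 1


def back_and_forth(pos):
    # Moves right on even cells and left on odd cells, so it never progresses.
    return 1 if pos % 2 == 0 else 0


print(run_episode(always_right, corridor_step, start=0))
print(run_episode(back_and_forth, corridor_step, start=0))
```

The second policy is the "infinite back & forth" case: without the limit it would never terminate, with it the episode simply ends as a loss.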
For now, this maze is way too easy. A generic DP can solve this
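To illustrate why a small maze is within reach of plain dynamic programming, here is a value iteration sketch. It is not code from the PR: the grid, the discount `gamma`, and the function name are assumptions chosen for the example, with the same -1 wall / 0 free / 1 goal encoding and a reward of 1 only on reaching the goal.

```python
WALL, GOAL = -1, 1
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def value_iteration(maze, gamma=0.9, sweeps=100):
    """Return a table of state values for a deterministic grid maze."""
    rows, cols = len(maze), len(maze[0])
    values = [[0.0] * cols for _ in range(rows)]
    for _ in range(sweeps):
        for r in range(rows):
            for c in range(cols):
                if maze[r][c] in (WALL, GOAL):
                    continue  # walls are never occupied; the goal is terminal
                best = 0.0
                for dr, dc in ACTIONS:
                    nr, nc = r + dr, c + dc
                    blocked = not (0 <= nr < rows and 0 <= nc < cols)
                    if blocked or maze[nr][nc] == WALL:
                        nr, nc = r, c  # blocked move: stay in place
                    reward = 1.0 if maze[nr][nc] == GOAL else 0.0
                    best = max(best, reward + gamma * values[nr][nc])
                values[r][c] = best
    return values


maze = [
    [0,  0, 0],
    [0, -1, 0],
    [0, -1, 1],
]
values = value_iteration(maze)
for row in values:
    print([round(v, 3) for v in row])
```

Each free cell ends up with the value `gamma ** (d - 1)`, where `d` is its shortest distance to the goal in moves, so following the largest neighbouring value traces a shortest path.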
jonpsy[m]: Yeah, you could say the same for the bit flipping task too
we should aim for a bigger, more constrained maze. How were the other policies faring in our current maze?
jonpsy[m]: Not able to solve it most of the time! Avg returns were around 0.5, but this got 1 in almost 70 episodes
Really outperformed
that's weird....
jonpsy[m]: Weird how?
Not 70 on all runs, but it was able to solve it! Random replay stayed around 0.5-0.6. I am talking about the avg return over 50 episodes
A generic RL can solve the Frozen Lake problem reasonably well. That is, without a neural net, using a simple value table approach
It was able to solve the game in some episodes
jonpsy[m]: It can solve it, but with the same order of samples? I guess not
Ok, let's solidify this then.
Let's go for the Frozen Lake game
jonpsy[m]: Do I have to implement that?
it's available in OpenAI Gym
jonpsy[m]: Oh okay! Let me check that out
Shouldn't we focus on the documentation and other stuff, as the deadline is nearing?
but not for C++ ig, but I can show you a link
EshaanAgarwal[m]: Have you not started it already?
you've commented the code, right?
jonpsy[m]: I have, but a little guidance on what you expect would help.
I see, I've worked on ensmallen documentation. Never on mlpack documentation
jonpsy[m]: Yes! I have.
EshaanAgarwal[m]: I mean this with reference to the GSoC submission
Oh that
jonpsy[m]: Would there be a difference?
I missed the context of the conversation, but the mlpack documentation should ideally someday be like the ensmallen documentation but it is not there yet 😃 needs a lot of work...
hey there rcurtin , perfect timing!
So I was wondering, if we add a new method, is there anywhere we need to document the method (except the code comments)?
jonpsy[m]: Meanwhile, can we merge HER, so that we could work on PPO or the Frozen Lake env?
if you want, you could write a tutorial and add it to `doc/tutorials/`, but that's often a lot of work; for now, we probably should leave the user-facing documentation as comments in the code, and then as time goes on (maybe if we go GSoD?), we can extract all that into a Markdown file like ensmallen
but at least personally that is what I'd like to see---it's much easier to maintain
for ensmallen maintaining that documentation is a little easier because the scope/task of the library is so limited; it will be harder for mlpack to figure out how to organize it all
but I think it can be done
rcurtin[m]: Defo to be looked at during GSoD
EshaanAgarwal[m]: jonpsy: can we discuss how we proceed from here?
It's still a little unsettling to me...
jonpsy[m]: Unsettling in what sense?
let's stick with maze itself. Don't think we have time for anything else
As for limiting the steps to solve the maze, I will do that. But I feel that for testing purposes the maze should do.
jonpsy[m]: Ok, and from here?
Increase complexity of maze
jonpsy[m]: Should we? By how much?
zoq: ping
had a question
Can you compare this with the existing RL policies? HER should be better, not worse.
zoq[m]: HER was better! The others weren't able to perform better when I tried RandomReplay and Prioritized Replay
they were converging though, right?
just, not as often?
jonpsy[m]: No, they were not converging in the test! To converge, a policy needs an avg return of 1 over the last 50 episodes
can you list the policies, with the neural net and their average return?
I set the threshold high so that HER performance can be gauged
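The convergence rule mentioned above (an average return of 1 over the last 50 episodes) can be sketched as a rolling window check. This is an illustrative helper, not the test code from the PR; the class name and its methods are hypothetical, and only the window of 50 and the threshold of 1 come from the conversation.

```python
from collections import deque


class ConvergenceTracker:
    """Tracks the average return over the most recent episodes."""

    def __init__(self, window=50, threshold=1.0):
        self.returns = deque(maxlen=window)  # old returns fall off the left
        self.threshold = threshold

    def add(self, episode_return):
        self.returns.append(episode_return)

    def average(self):
        return sum(self.returns) / len(self.returns) if self.returns else 0.0

    def converged(self):
        # Require a full window so a few lucky early episodes do not count.
        full = len(self.returns) == self.returns.maxlen
        return full and self.average() >= self.threshold


tracker = ConvergenceTracker()
for _ in range(50):
    tracker.add(1.0)
print(tracker.average(), tracker.converged())
```

With a threshold of exactly 1 and returns of either 0 or 1, a single failed episode inside the window is enough to block convergence, which is what makes the bar high.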
jonpsy[m]: I will have to run it to give you numbers. Give me a few minutes!
Cool, always track the numbers in a doc.
jonpsy[m]: I will update the numbers in the doc with screenshots and ping you!