rcurtin_irc changed the topic of #mlpack to: mlpack: a scalable machine learning library (https://www.mlpack.org/) -- channel logs: https://libera.irclog.whitequark.org/mlpack -- NOTE: messages sent here might not be seen by bridged users on matrix, gitter, or slack
<EshaanAgarwal[m]> jonpsy: zoq: I have implemented the maze environment and pushed the code to the PR. I just need to fix a couple of things in it. Meanwhile, it would be great if you guys could take a look.
<jonpsy[m]> <EshaanAgarwal[m]> "jonpsy: zoq same results with..." <- keep 256, but increase depth
<jonpsy[m]> 256, 64, 64, like that perhaps
<EshaanAgarwal[m]> <jonpsy[m]> "256, 64, 64 like that perhasp" <- Shouldn't last layer as 64 give index out of bounds ?
<jonpsy[m]> i meant hidden layers
<EshaanAgarwal[m]> jonpsy[m]: Ok I will make a custom network and then try.
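(For reference, a minimal sketch of the deeper Q-network being suggested, written against the mlpack 3.x-style ann API; exact layer and loss signatures differ between mlpack versions, and `stateDim`/`actionCount` are placeholders for the maze environment's actual dimensions. The 256/64/64 sizes are hidden layers only; the final layer still outputs one value per action, so there is no index-out-of-bounds issue.)

```cpp
#include <mlpack/core.hpp>
#include <mlpack/methods/ann/ffn.hpp>
#include <mlpack/methods/ann/layer/layer.hpp>
#include <mlpack/methods/ann/init_rules/gaussian_init.hpp>
#include <mlpack/methods/ann/loss_functions/mean_squared_error.hpp>

using namespace mlpack::ann;

FFN<MeanSquaredError<>, GaussianInitialization> BuildQNetwork()
{
  const size_t stateDim = 4;      // placeholder: size of the maze observation
  const size_t actionCount = 4;   // placeholder: up / down / left / right

  FFN<MeanSquaredError<>, GaussianInitialization> network(
      MeanSquaredError<>(), GaussianInitialization(0, 0.001));
  network.Add<Linear<>>(stateDim, 256);    // hidden layers: 256, 64, 64...
  network.Add<ReLULayer<>>();
  network.Add<Linear<>>(256, 64);
  network.Add<ReLULayer<>>();
  network.Add<Linear<>>(64, 64);
  network.Add<ReLULayer<>>();
  network.Add<Linear<>>(64, actionCount);  // ...while the output layer keeps
  return network;                          // one unit per action.
}
```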
<EshaanAgarwal[m]> jonpsy: zoq: our test for the maze environment is converging ! I have fixed the environment.
<EshaanAgarwal[m]> Can you please take a look at it whenever you are free ?
<jonpsy[m]> one question
<EshaanAgarwal[m]> jonpsy[m]: Yes
<jonpsy[m]> what about other policies
<jonpsy[m]> just to be sure, our env just returns `0` right?
<jonpsy[m]> and only returns `1` when it reaches the goal (aka where `1` was setup)
<jonpsy[m]> <EshaanAgarwal[m]> "Ok I will make a custom network..." <- also keep decreasing complexity while it still converges
<EshaanAgarwal[m]> jonpsy[m]: Returns -1 if it's a wall or out of bounds of the maze
<EshaanAgarwal[m]> Other replays weren't even giving positive avg return
<jonpsy[m]> It shouldn't "return" anything when it hits a wall
<EshaanAgarwal[m]> But this was consistently giving a positive return of 0.9 within almost 100 episodes
<jonpsy[m]> it should just go back
<jonpsy[m]> Basically wall is there to restrict movements
<jonpsy[m]> so hitting a wall should be "unviable" path
<EshaanAgarwal[m]> jonpsy[m]: But that was a wrong step so we should give it a negative reward right ?
<EshaanAgarwal[m]> jonpsy[m]: It is going back, and at the same time I am giving it a -1 reward for the wrong action
<EshaanAgarwal[m]> EshaanAgarwal[m]: I have implemented this in the code
<jonpsy[m]> you could check if the path would lead to "-1" aka wall
<jonpsy[m]> and not go there at all
<EshaanAgarwal[m]> jonpsy[m]: The agent has the option to choose any action, right? That's the whole point: it learns to understand that it doesn't need to move into a wall
<EshaanAgarwal[m]> From a 0 state, if the agent chooses an action which takes it into a wall, then it will not go there, but at the same time we will give a negative reward to it for the wrong action it chose.
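(The reward scheme being described here, sketched as a hypothetical environment step function; `grid`, `Move`, `OutOfBounds`, and the `State`/`Action` types are illustrative stand-ins, not the PR's actual code.)

```cpp
// Hypothetical sketch of the reward logic described above: the grid stores -1
// for walls, 0 for free cells, and 1 for the goal cell.
double Sample(const State& state, const Action& action, State& nextState)
{
  const arma::ivec2 next = Move(state.Position(), action);  // assumed helper

  if (OutOfBounds(next) || grid(next(0), next(1)) == -1)
  {
    nextState = state;   // The agent does not move into the wall...
    return -1.0;         // ...but is penalized for choosing an invalid action.
  }

  nextState = State(next);
  return (grid(next(0), next(1)) == 1) ? 1.0 : 0.0;  // 1 only at the goal.
}
```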
akhunti1 has joined #mlpack
akhunti1 has quit [Client Quit]
akhunti1 has joined #mlpack
eshaan has joined #mlpack
akhunti1 has quit [Quit: Client closed]
akhunti1 has joined #mlpack
eshaan has quit [*.net *.split]
akhunti1 has quit [*.net *.split]
<EshaanAgarwal[m]> <jonpsy[m]> "you could check if the path..." <- zoq: have removed the negative reward associated with the wall ! It's still performing way better than others . I have pushed those changes.
<EshaanAgarwal[m]> s///
<EshaanAgarwal[m]> * zoq: have removed the negative reward associated with the wall ! It's able to solve the maze and is performing way better than others . I have pushed those changes.
<jonpsy[m]> <EshaanAgarwal[m]> "From a 0 state if the agents..." <- I get your point. But I'm tryin to keep things binary here. Win/Lose, that's it
<jonpsy[m]> Btw, one way we could make this interesting
<jonpsy[m]> is limiting the number of steps
<EshaanAgarwal[m]> jonpsy[m]: I have done that now! It's performing well with that too.
<jonpsy[m]> I think we have that feature already
<jonpsy[m]> number of steps thing?
<EshaanAgarwal[m]> jonpsy[m]: Yes but I guess for a simple test this should be fine.
<EshaanAgarwal[m]> jonpsy[m]: Yes ! Can you elaborate ?
<jonpsy[m]> Yeah, so for example if you have a maze
<jonpsy[m]> It's like a race basically, and you have a time limit. If you don't find the goal within that time limit, you lose
<jonpsy[m]> that'll help for graceful exit in case an agent gets stuck in infinite back & forth
<EshaanAgarwal[m]> jonpsy[m]: We are doing that already
<EshaanAgarwal[m]> It's the max number of steps! I have set it as 120 for now but we can reduce it
<jonpsy[m]> we should defo play with that param
<EshaanAgarwal[m]> When it takes more than the max steps, it loses and the game terminates
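(A rough sketch of that step budget inside a hypothetical maze environment class, following the `maxSteps`/`IsTerminal()` convention mlpack's other environments use; the member names are assumptions.)

```cpp
// Inside the (hypothetical) maze environment class: the episode terminates
// either when the goal is reached or when the step budget is exhausted, so the
// agent cannot bounce back and forth forever.
size_t maxSteps = 120;        // value mentioned above; worth tuning down
size_t stepsPerformed = 0;    // incremented on every Sample() call

bool IsTerminal(const State& state) const
{
  if (maxSteps != 0 && stepsPerformed >= maxSteps)
    return true;                                // out of steps: the agent loses
  return grid(state.Row(), state.Col()) == 1;   // reached the goal: the agent wins
}
```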
<EshaanAgarwal[m]> jonpsy[m]: I will try and share the results !
<jonpsy[m]> For now, this maze is way too easy. A generic DP can solve this
<EshaanAgarwal[m]> jonpsy[m]: Yeah, you could say the same for the bit flipping task too
<jonpsy[m]> we should aim for a bigger, more constrained maze. How were the other policies faring in our current maze?
<EshaanAgarwal[m]> jonpsy[m]: They were not able to solve it most of the time! Avg returns were around 0.5, but this got 1 in almost 70 episodes
<EshaanAgarwal[m]> Really outperformed
<jonpsy[m]> that's weird....
<EshaanAgarwal[m]> jonpsy[m]: Weird how ?
<EshaanAgarwal[m]> Not 70 on all runs, but it was able to solve it! Random replay was around 0.5-0.6 and hovering around that. I am talking about avg return over 50 episodes
<jonpsy[m]> A generic RL can solve the Frozen Lake problem reasonably well.
<EshaanAgarwal[m]> It was able to solve the game in some episodes
<jonpsy[m]> That is, without a neural net, just a simple value table approach
<EshaanAgarwal[m]> jonpsy[m]: It can solve it, but with the same order of samples? I guess not
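(What the "simple value table approach" looks like in code: a minimal tabular Q-learning sketch with no neural network. `Reset()` and `Step()` are assumed environment hooks, and the grid size is illustrative.)

```cpp
#include <armadillo>
#include <tuple>

size_t Reset();                                         // assumed: initial state index
std::tuple<size_t, double, bool> Step(size_t, size_t);  // assumed: (next state, reward, done)

void TrainTabular()
{
  const size_t numStates = 16, numActions = 4;          // e.g. a 4x4 FrozenLake grid
  arma::mat Q(numStates, numActions, arma::fill::zeros);
  const double alpha = 0.1, discount = 0.99, epsilon = 0.1;

  for (size_t episode = 0; episode < 1000; ++episode)
  {
    size_t s = Reset();
    bool done = false;
    while (!done)
    {
      // Epsilon-greedy action selection straight from the table row.
      size_t a;
      if (arma::randu() < epsilon)
        a = (size_t) arma::randi(arma::distr_param(0, (int) numActions - 1));
      else
        a = Q.row(s).index_max();

      const auto [sNext, reward, terminal] = Step(s, a);

      // One-step Q-learning update on the value table.
      Q(s, a) += alpha * (reward + discount * Q.row(sNext).max() - Q(s, a));
      s = sNext;
      done = terminal;
    }
  }
}
```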
<jonpsy[m]> Ok, let's solidify this then.
<jonpsy[m]> Let's go for frozen lake game
<EshaanAgarwal[m]> jonpsy[m]: Do I have to implement that ?
<jonpsy[m]> it's available in openai gym
<EshaanAgarwal[m]> jonpsy[m]: Oh okay ! Let me check that out
<EshaanAgarwal[m]> Shouldn't we focus on the documentation and other stuff, as the deadline is nearing?
<jonpsy[m]> it's not available for C++ I guess, but I can show you a link
<jonpsy[m]> EshaanAgarwal[m]: Have you not started it already?
<jonpsy[m]> you've commented the codes, right?
<EshaanAgarwal[m]> jonpsy[m]: I have, but a little guidance on what you expect would help.
<jonpsy[m]> i see, i've worked on ensmallen documentation, never on mlpack documentation
<EshaanAgarwal[m]> jonpsy[m]: Yess ! I have.
<EshaanAgarwal[m]> EshaanAgarwal[m]: I mean this with reference to the GSoC submission
<jonpsy[m]> Oh that
<EshaanAgarwal[m]> jonpsy[m]: Would there be difference ?
<rcurtin[m]> I missed the context of the conversation, but the mlpack documentation should ideally someday be like the ensmallen documentation but it is not there yet 😃 needs a lot of work...
<jonpsy[m]> hey there rcurtin , perfect timing!
<jonpsy[m]> So I was wondering, if we add a new method, is there anywhere we need to document it (except the code comments)?
<EshaanAgarwal[m]> jonpsy[m]: Meanwhile, can we merge HER, so that we can work on PPO or the Frozen Lake env?
<rcurtin[m]> if you want, you could write a tutorial and add it to `doc/tutorials/`, but that's often a lot of work; for now, we probably should leave the user-facing documentation as comments in the code, and then as time goes on (maybe if we go GSoD?), we can extract all that into a Markdown file like ensmallen
<jonpsy[m]> Oh, I thought we had an mlpack equivalent of [this](https://github.com/mlpack/ensmallen/tree/master/doc)
<rcurtin[m]> not yet :)
<rcurtin[m]> but at least personally that is what I'd like to see---it's much easier to maintain
<rcurtin[m]> for ensmallen maintaining that documentation is a little easier because the scope/task of the library is so limited; it will be harder for mlpack to figure out how to organize it all
<rcurtin[m]> but I think it can be done
<jonpsy[m]> rcurtin[m]: Defo to be looked at during GSoD
<EshaanAgarwal[m]> EshaanAgarwal[m]: jonpsy: can we discuss how we proceed from here ?
<jonpsy[m]> It's still a little unsettling to me...
<EshaanAgarwal[m]> jonpsy[m]: Unsettling in the sense ?
<jonpsy[m]> let's stick with maze itself. Don't think we have time for anything else
<EshaanAgarwal[m]> As for limiting steps to solve the maze, I will do that. But I feel for testing purposes the current maze should do.
<EshaanAgarwal[m]> jonpsy[m]: Ok and from here ?
<jonpsy[m]> Increase complexity of maze
<EshaanAgarwal[m]> jonpsy[m]: Should we ? By how much ?
<jonpsy[m]> zoq: ping
<jonpsy[m]> had a question
<jonpsy[m]> there?
<zoq[m]> Can you compare this with the existing RL policies? HER should be better, not worse.
<EshaanAgarwal[m]> zoq[m]: HER was better! RandomReplay and PrioritizedReplay weren't able to perform as well when I tried them
<jonpsy[m]> they were converging though, right?
<jonpsy[m]> just, not as often?
<EshaanAgarwal[m]> jonpsy[m]: No, they weren't converging the test! For convergence it needs to have an avg return of 0.99 in the last 50 episodes
<jonpsy[m]> can you list the policies, with the neural net and their average return?
<EshaanAgarwal[m]> I set the threshold high so that HER performance can be gauged
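(A sketch of that convergence check in the spirit of mlpack's RL tests: average the returns of the last 50 episodes and compare against the 0.99 threshold; the exact windowing is an assumption about how the test is written.)

```cpp
// Track the return of each finished episode and declare the test converged once
// the average over the last 50 episodes reaches the 0.99 threshold.
#include <deque>
#include <numeric>

std::deque<double> returnHistory;
const size_t window = 50;
const double threshold = 0.99;   // set high so HER's benefit is visible

bool Converged(const double episodeReturn)
{
  returnHistory.push_back(episodeReturn);
  if (returnHistory.size() > window)
    returnHistory.pop_front();

  if (returnHistory.size() < window)
    return false;   // not enough episodes yet to judge

  const double avg = std::accumulate(returnHistory.begin(),
      returnHistory.end(), 0.0) / window;
  return avg >= threshold;
}
```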
<EshaanAgarwal[m]> jonpsy[m]: Will have to run it to give you numbers. Give me some minutes !
<jonpsy[m]> Cool, always track the numbers in a doc.
<EshaanAgarwal[m]> jonpsy[m]: Will update the numbers in the doc with screenshots and ping you!
<EshaanAgarwal[m]> <jonpsy[m]> "Cool, always track the numbers..." <- posted the numbers in doc - https://docs.google.com/document/d/1csJmlG9u9AL_1g_U6QWM-75_Gf9jY4AObK0k9COo5Xo/edit?usp=sharing
<EshaanAgarwal[m]> EshaanAgarwal[m]: jonpsy: zoq
<EshaanAgarwal[m]> <jonpsy[m]> "Cool, always track the numbers..." <- How should we move from here on ?
brongulus has joined #mlpack
_whitelogger has joined #mlpack
brongulus has quit [Ping timeout: 272 seconds]