<EshaanAgarwal[m]>
jonpsy: zoq: I have implemented the maze environment and pushed the code to the PR. I just need to fix a couple of things in it. Meanwhile, it would be great if you both could take a look.
<jonpsy[m]>
<EshaanAgarwal[m]> "jonpsy: zoq same results with..." <- keep 256, but increase depth
<jonpsy[m]>
256, 64, 64, like that perhaps
<EshaanAgarwal[m]>
<jonpsy[m]> "256, 64, 64 like that perhasp" <- Shouldn't last layer as 64 give index out of bounds ?
<jonpsy[m]>
i meant hidden layers
<EshaanAgarwal[m]>
jonpsy[m]: Ok I will make a custom network and then try.
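(For reference, a deeper network along the lines discussed above, with 256, 64, and 64 hidden units, might look roughly like this with mlpack's FFN API; the layer sizes, loss, initialization, and the 4-action output layer are only an illustrative sketch, not the exact configuration used in the PR.)

```cpp
#include <mlpack.hpp>

using namespace mlpack;

int main()
{
  // Sketch of a deeper feed-forward Q-network: hidden layers of 256, 64 and
  // 64 units, with one output per discrete action (4 here, e.g. the four
  // moves in a grid maze). The input size is inferred on the first Forward().
  FFN<MeanSquaredError, GaussianInitialization> network(
      MeanSquaredError(), GaussianInitialization(0, 0.001));
  network.Add<Linear>(256);
  network.Add<ReLU>();
  network.Add<Linear>(64);
  network.Add<ReLU>();
  network.Add<Linear>(64);
  network.Add<ReLU>();
  network.Add<Linear>(4);
}
```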
<EshaanAgarwal[m]>
jonpsy: zoq: our test for the maze environment is converging! I have fixed the environment.
<EshaanAgarwal[m]>
Can you please take a look at it whenever you are free ?
<jonpsy[m]>
one question
<EshaanAgarwal[m]>
jonpsy[m]: Yes
<jonpsy[m]>
what about other policies
<jonpsy[m]>
just to be sure, our env just returns `0` right?
<jonpsy[m]>
and only returns `1` when it reaches the goal (aka where `1` was set up)
<jonpsy[m]>
<EshaanAgarwal[m]> "Ok I will make a custom network..." <- also keep decreasing complexity while it still converges
<EshaanAgarwal[m]>
jonpsy[m]: It returns -1 if it's a wall or out of bounds of the maze
<EshaanAgarwal[m]>
The other replays weren't even giving a positive avg return
<jonpsy[m]>
It shouldn't "return" anything when it hits a wall
<EshaanAgarwal[m]>
But this was consistently giving a positive return of 0.9 within almost 100 episodes
<jonpsy[m]>
it should just go back
<jonpsy[m]>
Basically the wall is there to restrict movement
<jonpsy[m]>
so hitting a wall should just be an "unviable" path
<EshaanAgarwal[m]>
jonpsy[m]: But that was a wrong step, so we should give it a negative reward, right?
<EshaanAgarwal[m]>
jonpsy[m]: It does go back, and at the same time I am giving it a -1 reward for the wrong action
<EshaanAgarwal[m]>
EshaanAgarwal[m]: I have implemented this in the code
<jonpsy[m]>
you could check if the path would lead to "-1" aka wall
<jonpsy[m]>
and not go there at all
<EshaanAgarwal[m]>
jonpsy[m]: The agent has the option to choose any action, right? That's the whole point: it learns to understand that it doesn't need to move into a wall
<EshaanAgarwal[m]>
From a 0 state, if the agent chooses an action which takes it to a wall, then it will not go there, but at the same time we will give it a negative reward for the wrong action it chose.
<EshaanAgarwal[m]>
<jonpsy[m]> "you could check if the path..." <- zoq: have removed the negative reward associated with the wall ! It's still performing way better than others . I have pushed those changes.
<EshaanAgarwal[m]>
s///
<EshaanAgarwal[m]>
* zoq: have removed the negative reward associated with the wall ! It's able to solve the maze and is performing way better than others . I have pushed those changes.
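(As a rough illustration of the step logic being discussed, here is a sketch of one environment step for a grid maze; the function and variable names are hypothetical, not the PR's actual API. A wall or out-of-bounds move simply leaves the agent in place, and the reward stays binary: 1 on reaching the goal, 0 otherwise.)

```cpp
// Hypothetical single step of a 5x5 grid maze. `maze` holds 0 for free
// cells, -1 for walls and 1 for the goal; `row`/`col` is the agent position.
enum class Action { Up, Down, Left, Right };

struct StepResult
{
  double reward;
  bool terminal;
};

StepResult Step(const int maze[5][5], int& row, int& col, Action action)
{
  int newRow = row, newCol = col;
  switch (action)
  {
    case Action::Up:    --newRow; break;
    case Action::Down:  ++newRow; break;
    case Action::Left:  --newCol; break;
    case Action::Right: ++newCol; break;
  }

  // Out-of-bounds or wall: the move is blocked, the agent stays put, and no
  // extra penalty is given, keeping the reward binary.
  const bool blocked = newRow < 0 || newRow >= 5 || newCol < 0 ||
      newCol >= 5 || maze[newRow][newCol] == -1;
  if (!blocked)
  {
    row = newRow;
    col = newCol;
  }

  // Reward 1 and terminate only when the goal cell is reached.
  const bool atGoal = (maze[row][col] == 1);
  return { atGoal ? 1.0 : 0.0, atGoal };
}
```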
<jonpsy[m]>
<EshaanAgarwal[m]> "From a 0 state if the agents..." <- I get your point. But I'm tryin to keep things binary here. Win/Lose, that's it
<jonpsy[m]>
Btw, one way we could make this interesting
<jonpsy[m]>
is limiting the number of steps
<EshaanAgarwal[m]>
jonpsy[m]: I have done that now! It's performing well with that too.
<jonpsy[m]>
I think we have that feature already
<jonpsy[m]>
number of steps thing?
<EshaanAgarwal[m]>
jonpsy[m]: Yes but I guess for a simple test this should be fine.
<EshaanAgarwal[m]>
jonpsy[m]: Yes ! Can you elaborate ?
<jonpsy[m]>
Yeah, so for example, if you have a maze
<jonpsy[m]>
It's like a race basically, and you have a time limit. If you don't find the goal within that time limit, you lose
<jonpsy[m]>
that'll help with a graceful exit in case an agent gets stuck in an infinite back-and-forth
<EshaanAgarwal[m]>
jonpsy[m]: We are doing that already
<EshaanAgarwal[m]>
It's the max number of steps! I have set it to 120 for now, but we can reduce it
<jonpsy[m]>
we should defo play with that param
<EshaanAgarwal[m]>
When it takes more than the max number of steps, it loses and the game terminates
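(A minimal sketch of the step-budget idea; 120 is just the value mentioned above, and the names are hypothetical.)

```cpp
#include <cstddef>

// Run one episode under a hard step budget: the episode ends either when the
// goal is reached (win) or when maxSteps moves are used up (loss), so an
// agent bouncing back and forth between walls still terminates gracefully.
bool RunEpisode(bool (*goalReachedAfterMove)(std::size_t step),
                std::size_t maxSteps = 120)
{
  for (std::size_t step = 0; step < maxSteps; ++step)
  {
    if (goalReachedAfterMove(step))
      return true;   // Win: goal found within the budget.
  }
  return false;      // Lose: step budget exhausted.
}
```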
<EshaanAgarwal[m]>
jonpsy[m]: I will try and share the results !
<jonpsy[m]>
For now, this maze is way too easy. A generic DP can solve this
<EshaanAgarwal[m]>
jonpsy[m]: Yeah, you could say the same for the bit-flipping task too
<jonpsy[m]>
we should aim for a bigger, more constrained maze. How were the other policies faring in our current maze?
<EshaanAgarwal[m]>
jonpsy[m]: Not able to solve it most of the time! Avg returns were around 0.5, but this got 1 in almost 70 episodes
<EshaanAgarwal[m]>
Really outperformed
<jonpsy[m]>
that's weird....
<EshaanAgarwal[m]>
jonpsy[m]: Weird how ?
<EshaanAgarwal[m]>
Not 70 on all runs, but it was able to solve it! Random replay was around 0.5-0.6 and hovering around that. I am talking about the avg return over 50 episodes
<jonpsy[m]>
A generic RL agent can solve the Frozen Lake problem reasonably well.
<EshaanAgarwal[m]>
It was able to solve the game in some episodes
<jonpsy[m]>
That is, without a neural net; just a simple value-table approach
<EshaanAgarwal[m]>
jonpsy[m]: It can solve it, but with the same order of samples? I guess not
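(For context, the "value table" approach jonpsy refers to is a tabular method such as Q-learning, where a small table of state-action values stands in for the neural network. Below is a generic, self-contained sketch assuming a small discrete environment like a 4x4 Frozen Lake; all names and constants are illustrative, and this is not mlpack code.)

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <random>

// Tabular Q-learning over a small discrete MDP (16 states, 4 actions).
// EnvStep is any callable (state, action) -> Transition.
constexpr std::size_t numStates = 16, numActions = 4;

struct Transition { double reward; std::size_t nextState; bool terminal; };

template<typename EnvStep>
std::array<std::array<double, numActions>, numStates>
TabularQLearning(EnvStep step, std::size_t episodes = 5000)
{
  std::array<std::array<double, numActions>, numStates> q{};  // Zero-initialized Q-table.
  const double alpha = 0.1, gamma = 0.99, epsilon = 0.1;

  std::mt19937 rng(42);
  std::uniform_real_distribution<double> unif(0.0, 1.0);
  std::uniform_int_distribution<std::size_t> randomAction(0, numActions - 1);

  for (std::size_t e = 0; e < episodes; ++e)
  {
    std::size_t s = 0;  // Each episode starts in state 0.
    bool done = false;
    while (!done)
    {
      // Epsilon-greedy action selection directly from the table.
      std::size_t a;
      if (unif(rng) < epsilon)
      {
        a = randomAction(rng);  // Explore.
      }
      else
      {
        a = 0;                  // Exploit: argmax over Q(s, .).
        for (std::size_t i = 1; i < numActions; ++i)
          if (q[s][i] > q[s][a]) a = i;
      }

      const Transition t = step(s, a);

      // One-step Q-learning update towards r + gamma * max_a' Q(s', a').
      const double best =
          *std::max_element(q[t.nextState].begin(), q[t.nextState].end());
      q[s][a] += alpha * (t.reward + (t.terminal ? 0.0 : gamma * best) - q[s][a]);

      s = t.nextState;
      done = t.terminal;
    }
  }
  return q;
}
```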
<jonpsy[m]>
Ok, let's solidify this then.
<jonpsy[m]>
Let's go for the Frozen Lake game
<EshaanAgarwal[m]>
jonpsy[m]: Do I have to implement that ?
<jonpsy[m]>
it's available in openai gym
<EshaanAgarwal[m]>
jonpsy[m]: Oh okay ! Let me check that out
<EshaanAgarwal[m]>
Shouldn't we focus on the documentation and other stuff, as the deadline is nearing?
<jonpsy[m]>
it's not available for C++ I guess, but I can show you a link
<jonpsy[m]>
EshaanAgarwal[m]: Have you not started it already?
<jonpsy[m]>
you've commented the codes, right?
<EshaanAgarwal[m]>
jonpsy[m]: I have, but a little guidance on what you expect would help.
<jonpsy[m]>
I see, I've worked on the ensmallen documentation. Never on the mlpack documentation
<EshaanAgarwal[m]>
jonpsy[m]: Yes! I have.
<EshaanAgarwal[m]>
EshaanAgarwal[m]: I meant this with reference to the GSoC submission
<jonpsy[m]>
Oh that
<EshaanAgarwal[m]>
jonpsy[m]: Would there be a difference?
<rcurtin[m]>
I missed the context of the conversation, but the mlpack documentation should ideally someday be like the ensmallen documentation but it is not there yet 😃 needs a lot of work...
<jonpsy[m]>
hey there rcurtin , perfect timing!
<jonpsy[m]>
So I was wondering: if we add a new method, is there anywhere we need to document it (except the code comments)?
<EshaanAgarwal[m]>
jonpsy[m]: Meanwhile, can we merge HER, so that we can work on PPO or the Frozen Lake env?
<rcurtin[m]>
if you want, you could write a tutorial and add it to `doc/tutorials/`, but that's often a lot of work; for now, we probably should leave the user-facing documentation as comments in the code, and then as time goes on (maybe if we go GSoD?), we can extract all that into a Markdown file like ensmallen
<rcurtin[m]>
but at least personally that is what I'd like to see---it's much easier to maintain
<rcurtin[m]>
for ensmallen maintaining that documentation is a little easier because the scope/task of the library is so limited; it will be harder for mlpack to figure out how to organize it all
<rcurtin[m]>
but I think it can be done
<jonpsy[m]>
rcurtin[m]: Defo to be looked at during GSoD
<EshaanAgarwal[m]>
EshaanAgarwal[m]: jonpsy: can we discuss how we proceed from here ?
<jonpsy[m]>
It's still a little unsettling to me...
<EshaanAgarwal[m]>
jonpsy[m]: Unsettling in the sense ?
<jonpsy[m]>
let's stick with maze itself. Don't think we have time for anything else
<EshaanAgarwal[m]>
As for limiting the steps to solve the maze, I will do that. But I feel for testing purposes the current maze should do.
<EshaanAgarwal[m]>
jonpsy[m]: Ok and from here ?
<jonpsy[m]>
Increase complexity of maze
<EshaanAgarwal[m]>
jonpsy[m]: Should we ? By how much ?
<jonpsy[m]>
zoq: ping
<jonpsy[m]>
had a question
<jonpsy[m]>
there?
<zoq[m]>
Can you compare this with the existing RL replay policies? HER should be better, not worse.
<EshaanAgarwal[m]>
zoq[m]: HER was better! The others weren't able to perform as well when I tried RandomReplay and PrioritizedReplay
<jonpsy[m]>
they were converging though, right?
<jonpsy[m]>
just, not as often?
<EshaanAgarwal[m]>
jonpsy[m]: No, not converging on the test! For it to converge, it needs to have an avg return of 0.99 over the last 50 episodes
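(A small sketch of that convergence check; the window of 50 episodes and the 0.99 threshold are the numbers mentioned above, everything else is hypothetical and not the actual test code.)

```cpp
#include <cstddef>
#include <deque>
#include <numeric>

// Rolling convergence check: keep the returns of the most recent `window`
// episodes and report convergence once their average reaches `threshold`.
class ConvergenceCheck
{
 public:
  ConvergenceCheck(std::size_t window = 50, double threshold = 0.99) :
      window(window), threshold(threshold) { }

  // Record one finished episode's return; returns true once the average
  // return over the last `window` episodes has reached the threshold.
  bool Record(double episodeReturn)
  {
    returns.push_back(episodeReturn);
    if (returns.size() > window)
      returns.pop_front();

    const double avg =
        std::accumulate(returns.begin(), returns.end(), 0.0) / returns.size();
    return returns.size() == window && avg >= threshold;
  }

 private:
  std::size_t window;
  double threshold;
  std::deque<double> returns;
};
```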
<jonpsy[m]>
can you list the policies, with the neural net and their average return?
<EshaanAgarwal[m]>
I set the threshold high so that HER performance can be gauged
<EshaanAgarwal[m]>
jonpsy[m]: Will have to run it to give you numbers. Give me a few minutes!
<jonpsy[m]>
Cool, always track the numbers in a doc.
<EshaanAgarwal[m]>
jonpsy[m]: Will update the numbers in the doc with screenshots and ping you!