rcurtin_irc changed the topic of #mlpack to: mlpack: a scalable machine learning library (https://www.mlpack.org/) -- channel logs: https://libera.irclog.whitequark.org/mlpack -- NOTE: messages sent here might not be seen by bridged users on matrix, gitter, or slack
_slack_mlpack_16 has joined #mlpack
psydroid has quit [Ping timeout: 246 seconds]
Cadair has quit [Ping timeout: 246 seconds]
psydroid has joined #mlpack
Cadair has joined #mlpack
<jonpsy[m]> Eshaan Agarwal: where are we with this?
<EshaanAgarwal[m]> <jonpsy[m]> "Eshaan Agarwal: where are we..." <- Need some time! Will update you by evening.
<jonpsy[m]> ok
<jonpsy[m]> You do realise that our submission time is spitting distance away.
<jonpsy[m]> Let's get at least one deliverable done
<EshaanAgarwal[m]> jonpsy[m]: Yes! I am sorry, I just had my college classes.
<jonpsy[m]> ok
<EshaanAgarwal[m]> <jonpsy[m]> "ok" <- I fixed couple of things more but i am not able to get positive episode return.
<EshaanAgarwal[m]> Things I fixed: a) Reward and IsEnd were being stored the wrong way for sampling during training of the agent. b) The environment doesn't give a positive reward in case the agent fails (number of steps > allowed steps). c) Because of the way we implemented Q-learning, after a certain number of exploration steps we were sending the agent to optimize its policy for every next transition. My previous implementation created redundant transitions, so I fixed it to store the HER transition only when the episode is about to be terminated, i.e. the last transition of the episode.
<EshaanAgarwal[m]> pushing this code
<EshaanAgarwal[m]> <EshaanAgarwal[m]> "I fixed couple of things more..." <- i was checking the action made by agent ! it was repeatedly 0 index and that bit was just flipping through transitions.
<EshaanAgarwal[m]> <EshaanAgarwal[m]> "i was checking the action made..." <- i am not sure why its not taking other indexes as random action for exploration
<EshaanAgarwal[m]> jonpsy: zoq can we do it this way ? for rewarding the agent we can set it as the number of indexes it got right with respect to the goal.
<EshaanAgarwal[m]> example - 1 0 1 0 0 will have 2.0 as reward if the goal is 1 1 1 1 1. This way situation isnt too much sparse. HER will give the advantage of solving it quickly
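(For reference, a minimal C++ sketch of the denser reward Eshaan proposes above, assuming the current bit vector and the goal are held as 0/1 arma::vec objects; the function name and types are illustrative, not the actual bit-flipping environment code.)

```cpp
#include <armadillo>

// Hypothetical dense reward: the number of positions where the current
// bit vector already matches the goal vector.
double DenseReward(const arma::vec& state, const arma::vec& goal)
{
  // Element-wise equality yields a 0/1 result; accu() sums the matches.
  return (double) arma::accu(state == goal);
}

// Example: state = {1, 0, 1, 0, 0}, goal = {1, 1, 1, 1, 1} -> reward = 2.0.
```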
<jonpsy[m]> <EshaanAgarwal[m]> "jonpsy: zoq can we do it this..." <- Nope
<jonpsy[m]> That's reward engineering, we're back to square one
<EshaanAgarwal[m]> <jonpsy[m]> "That's reward engineering, we're..." <- I am saying this because i went through the code thoroughly and I am pretty much sure that there isn't any issue with her. Reward is just so much sparse that it's just repeatedly flipping the 0th but and not moving to something else. I tried other replay and even small length of binary vector. But it's the same case
<jonpsy[m]> So increase iter
<EshaanAgarwal[m]> It's getting quite difficult for it.
<EshaanAgarwal[m]> jonpsy[m]: I have! Almost 200 transitions in 1 episode, and that for 1000 episodes.
<jonpsy[m]> And it has yet to achieve the real goal at least once
<jonpsy[m]> correct?
<EshaanAgarwal[m]> EshaanAgarwal[m]: But if I do it the way I mentioned, then it performs way better than the other replays.
<EshaanAgarwal[m]> jonpsy[m]: Yes, or else the action exploration is not coming off as it should! But that's not the part we touched in our code.
<EshaanAgarwal[m]> EshaanAgarwal[m]: Almost giving an average reward of 1100 out of 2000, where the other replays are really struggling to get 950, on a 10-bit binary vector.
<EshaanAgarwal[m]> EshaanAgarwal[m]: It's repeatedly a 0 return and 0 as the action for the index to be flipped.
<jonpsy[m]> <EshaanAgarwal[m]> "example - 1 0 1 0 0 will have 2..." <- Even if we follow this path, this approach isn't correct. There's too much space for random luck to take part here.
<jonpsy[m]> Besides, if it was converging, then increasing iter should've worked.
<EshaanAgarwal[m]> jonpsy[m]: I am just saying that maybe the environment is too difficult for it to solve! By making the reward more directed, it performs and converges to a threshold of 55 percent, and maybe more if we reduce the length of the vector.
<jonpsy[m]> decrease bit length
<jonpsy[m]> 10 is too much. Start with 3-4
<EshaanAgarwal[m]> jonpsy[m]: I did 5! It was the same.
<jonpsy[m]> try 4
<jonpsy[m]> it's exponential
<EshaanAgarwal[m]> jonpsy[m]: Okay, with the previous reward scheme of the env, right?
<jonpsy[m]> yes
<jonpsy[m]> keep decreasing till it converges. At min you can go to `2`
<EshaanAgarwal[m]> jonpsy[m]: Ok let me try this ! I will let you know as soon as it's done.
<zoq[m]> <EshaanAgarwal[m]> "Ok let me try this ! I will..." <- I also think we should try another env, to see if the policy is just not working for this one.
<EshaanAgarwal[m]> zoq[m]: But getting a goal-based environment in C++ might be a challenge. I think we can meanwhile assess its performance by making the reward scheme a little bit simpler (as I proposed), because it worked better on the 10-bit vector environment than the others did. For a smaller vector length it should show good results.
<EshaanAgarwal[m]> Nevertheless I would love to know what you guys think would be the optimum thing.
<zoq[m]> You could easily write a simple maze env which is goal based.
<EshaanAgarwal[m]> <jonpsy[m]> "keep decreasing till it converge..." <- I went till 2 ! 0 episode return and I think it's still just flipping one bit continuously
<zoq[m]> Iā€™m with jonpsy if we start modifying the reward scheme it will diverge from the actual implementation, so instead of finding a proper solution we are masking it.
<zoq[m]> That said, maybe our network is too small? Maybe it converges in 1 of 10 cases?
<EshaanAgarwal[m]> zoq[m]: Can you describe this further? In the short amount of time we have, will it be a good bet?
<EshaanAgarwal[m]> zoq[m]: We can try increasing the network, but I tried multiple runs! It doesn't work.
<EshaanAgarwal[m]> Throughout all the 1000 episodes, not even one positive reward return.
<EshaanAgarwal[m]> Meanwhile, can you please take a look to make sure I haven't done anything basic in the wrong way! I have used gdb and found a couple of mistakes in the past day!
<EshaanAgarwal[m]> EshaanAgarwal[m]: More eyes might help. 😅
<zoq[m]> EshaanAgarwal[m]: Yes, will go through the code.
<zoq[m]> EshaanAgarwal[m]: Do you have a reference implementation that works?
<EshaanAgarwal[m]> zoq[m]: Actually I used a couple of references, but I am not sure if they work! One was based on the Intel Coach library. Also, the structure of the Q-learning implementation was different in ours, so I adapted it accordingly.
<EshaanAgarwal[m]> EshaanAgarwal[m]: Apart from that I used the pseudocode given in the paper as a reference. I will post all the links on the PR.
<zoq[m]> <EshaanAgarwal[m]> "Can you describe this further..." <- https://github.com/vsindato/maze-escape would be an easy env we can implement in a day.
<EshaanAgarwal[m]> zoq[m]: Okay I will take a look into this.
<zoq[m]> EshaanAgarwal[m]: What was the problem with the intel lib?
<EshaanAgarwal[m]> zoq[m]: I mean I haven't used that. I just went through the code for help. I will also see if I can work out an example from it.
<jonpsy[m]> <EshaanAgarwal[m]> "Can you describe this further..." <- So you could create an arma::mat consisting of 0, -1, and 1: -1 is forbidden (a wall), 0 is an empty room, and 1 is the reward, which is only available in one block.
<jonpsy[m]> Fill the 2D mat with 0 and -1 at random. Finally, choose one block at random and make it +1. Ez pz
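(A rough sketch of the maze construction jonpsy describes, assuming a plain arma::mat grid; the function name and the wall probability are illustrative assumptions, not part of mlpack.)

```cpp
#include <armadillo>

// Build a rows x cols grid of 0 (empty room) and -1 (wall), then place a
// single +1 (the reward) in one randomly chosen non-wall cell.
arma::mat MakeMaze(const size_t rows, const size_t cols, const double wallProb = 0.2)
{
  arma::mat maze(rows, cols, arma::fill::zeros);

  // Turn each cell into a wall with probability wallProb.
  arma::mat mask = arma::randu<arma::mat>(rows, cols);
  maze.elem(arma::find(mask < wallProb)).fill(-1.0);

  // Pick one of the remaining empty cells at random and put the reward there.
  arma::uvec empty = arma::find(maze == 0);
  const arma::uword rewardCell = empty(
      arma::randi<arma::uvec>(1, arma::distr_param(0, (int) empty.n_elem - 1))(0));
  maze(rewardCell) = 1.0;

  return maze;
}
```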
<jonpsy[m]> Also, agreed with zoq, perhaps we could increase the complexity of the network.
<EshaanAgarwal[m]> <jonpsy[m]> "Fill the 2D mat with 0 and -1 at random..." <- Ok and the action should be that the agent chooses the row and column to advance to! If it's +1 then it advances there, otherwise it's asked to choose once again?
<EshaanAgarwal[m]> Can it go back to previously visited states?
<EshaanAgarwal[m]> <jonpsy[m]> "Also, agreed with zoq perhaps..." <- I will try this in sometime and let you know the results ! Currently we have this
<EshaanAgarwal[m]> `SimpleDQN<> network(128, 128, 2);`
<EshaanAgarwal[m]> Should I make it 256,256,2 ?
<jonpsy[m]> <EshaanAgarwal[m]> "Ok and action should be that..." <- Yeah, going back to previous states should be fine. It's not the game's fauult the agent is stuck
<jonpsy[m]> moves should be +1 in one of 4 directins, no diagonals allowed
<jonpsy[m]> <EshaanAgarwal[m]> "Ok and action should be that..." <- For ex: A sample episode copuld be:... (full message at <https://libera.ems.host/_matrix/media/v3/download/libera.chat/23f0c4d8440d7ff42a075afd51b18e9c4fdfd88e>)
<jonpsy[m]> if its in 0, its okay. But you can't move past -1 (since it is a block)
<jonpsy[m]> Also try to ensure, you don't make up creating a wall
<jonpsy[m]> * if its in 0, you can move past it. But you can't move past -1 (since it is a block). You're looking for +1 (aka your fruit/reward)
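(A hedged sketch of the movement rule being described, assuming the maze matrix from the earlier sketch and an action encoded as one of four directions; illustrative only, not the eventual environment class.)

```cpp
#include <armadillo>

enum class Action { Up, Down, Left, Right };

// Move one cell in the chosen direction.  A move off the grid or onto a
// wall (-1) leaves the position unchanged; returns true when the agent
// lands on the +1 reward cell.
bool Step(const arma::mat& maze, arma::uword& row, arma::uword& col, const Action a)
{
  long long r = (long long) row, c = (long long) col;
  if (a == Action::Up) --r;
  else if (a == Action::Down) ++r;
  else if (a == Action::Left) --c;
  else ++c;

  // Reject moves that leave the grid or hit a wall.
  if (r < 0 || c < 0 || r >= (long long) maze.n_rows ||
      c >= (long long) maze.n_cols || maze(r, c) == -1.0)
    return false;

  row = (arma::uword) r;
  col = (arma::uword) c;
  return maze(row, col) == 1.0;
}
```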
<EshaanAgarwal[m]> jonpsy[m]: What we can do is fix the maze.
<zoq[m]> I think as a quick test a fixed maze is fine.
<EshaanAgarwal[m]> zoq[m]: Ok
<jonpsy[m]> Might as well use the above maze
<EshaanAgarwal[m]> > <@jonpsy:matrix.org> For ex: A sample episode could be:... (full message at <https://libera.ems.host/_matrix/media/v3/download/libera.chat/2f9878a464e1a02417467561206d9949634c978a>)
<akhunti1[m]> Hi rcurtin, thanks for your help. I was able to solve that issue now.
<akhunti1[m]> But now I am getting another issue [ ImportError: libboost_program_options-mt.so.1.53.0: cannot open shared object file: No such file or directory ]. For that I installed the boost_1_53_0 library,
<akhunti1[m]> but I am not able to find this file [ libboost_program_options-mt.so.1.53.0 ] inside the folder.
<rcurtin[m]> Great to hear you got the other issue resolved; I think the `-mt` specifies that you are on Windows? Make sure you have that `-mt` version, and also make sure that your `LD_LIBRARY_PATH` (or runtime linker search path) is set to find the right directory
<rcurtin[m]> also, shrit, you will be happy to hear that I got the Python bindings to compile down to a shared library size of about ~500kb each 😃 this is a significant improvement over the previous version of mlpack
<shrit[m]> Brilliant work, yes that makes me happy
<akhunti1[m]> Hi rcurtin, any idea where this file [ libboost_program_options-mt.so.1.53.0 ] is located inside the boost_1_53_0 directory? Actually I am using a Linux system.
<akhunti1[m]> I downloaded the boost_1_53_0.tar.gz file and extracted it, as I am using it to create a Docker image.
<rcurtin[m]> shouldn't it be in the `lib/` subdirectory of the `boost_1_53_0` directory? or something similarly named
jjb[m] has joined #mlpack
<rcurtin[m]> you may consider installing boost via your package manager instead, if you can
<EshaanAgarwal[m]> <EshaanAgarwal[m]> "Should I make it 256,256,2 ?" <- jonpsy: zoq same results with this ! should i try 512,512,2 as well ?
<akhunti1[m]> This is the structure I got after installation, but I am not able to find this file [ libboost_program_options-mt.so ].
<akhunti1[m]> Any guess where the file is located? 🙂
<rcurtin[m]> Did you look directly in the `libs/` directory? that is where all the .so files should be
<rcurtin[m]> You could also use, e.g., a tool like `find` to search for them instead of having me guess at it 😃
<akhunti1[m]> Yes, rcurtin, I used find to search, but did not find the file.
<akhunti1[m]> I thought I did something wrong during installation.
<rcurtin[m]> you might try searching inexactly; there should be a `libboost_program_options*.so` of some sort in what you downloaded; if not, maybe you downloaded the wrong package
<akhunti1[m]> Hi rcurtin
<akhunti1[m]> I am actually getting this error when I am installing boost_1_53_0.
<akhunti1[m]> But to run mlpack 3.1.1 it expects Boost 1.53.0.
<rcurtin[m]> what did you try to resolve the error?
<akhunti1[m]> I downloaded boost_1_53_0 and ran the `./bootstrap.sh` command to install.
<rcurtin[m]> okay, and did you attempt any debugging or investigation of the error that you presented before you asked me to look at it?
<akhunti1[m]> I installed the Python dev package, as it was showing [ fatal error: pyconfig.h: No such file or directory
<akhunti1[m]> # include <pyconfig.h> ]
<akhunti1[m]> but I am still getting the same error.
<akhunti1[m]> Is there any workaround to resolve the issue? 🙂
<akhunti1[m]> Hi rcurtin, if you have any workaround please let me know, because as you know mlpack 3.1.1 needs Boost 1.53 for compilation. And one more constraint here is that I cannot use a package manager to install Boost.
<akhunti1[m]> 😄 Let me know any thoughts, if you have them. Thanks for your suggestions, as always.
<akhunti1[m]> At least if I can get the .so files for Boost 1.53.0, I can add them to my Docker file to compile.
<EshaanAgarwal[m]> I am iterating over an Armadillo matrix and I need to store particular row and column indices into another arma::vec. The issue is that in the for loop I have used size_t for both the row and the column, but I am not able to insert those into the specified vector. What can be a workaround for this?
<EshaanAgarwal[m]> I can't change arma::vec to arma::uvec
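(One possible workaround for the question above, sketched under the assumption that the size_t indices just need to end up as doubles in the arma::vec; the names are made up for illustration.)

```cpp
#include <armadillo>
#include <vector>

// Collect the (row, col) indices of interest while iterating over a matrix,
// storing them as doubles so they fit into an arma::vec.
arma::vec CollectIndices(const arma::mat& m)
{
  std::vector<double> found;
  for (size_t r = 0; r < m.n_rows; ++r)
    for (size_t c = 0; c < m.n_cols; ++c)
      if (m(r, c) == 1.0)  // whatever the condition of interest is
      {
        // Explicitly cast the size_t counters to double before storing.
        found.push_back((double) r);
        found.push_back((double) c);
      }

  // arma::vec can be constructed directly from a std::vector<double>.
  return arma::vec(found);
}
```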