<jonpsy[m]>
Eshaan Agarwal: where are we with this?
<EshaanAgarwal[m]>
<jonpsy[m]> "Eshaan Agarwal: where are we..." <- need some time ! will update you by evening.
<jonpsy[m]>
ok
<jonpsy[m]>
You do realise that our submission deadline is spitting distance away.
<jonpsy[m]>
Let's get at least one deliverable done.
<EshaanAgarwal[m]>
jonpsy[m]: Yes! I am sorry, I just had my college classes.
<jonpsy[m]>
ok
<EshaanAgarwal[m]>
<jonpsy[m]> "ok" <- I fixed couple of things more but i am not able to get positive episode return.
<EshaanAgarwal[m]>
Things I fixed: a) Reward and IsEnd were being stored the wrong way for sampling during the agent's training. b) The environment doesn't give a positive reward in case the agent fails (number of steps > allowed steps). c) Because of the way we implemented Q-learning, after a certain number of exploration steps we were sending the agent to optimize its policy for every next transition. My previous implementation created redundant transitions, so I fixed it to
<EshaanAgarwal[m]>
store the HER transition only when the episode is about to terminate, i.e. the last transition of the episode.
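(For reference, point c) amounts to something like the sketch below: buffer the episode's transitions as they happen, and only once the episode terminates relabel them against the achieved goal and push them into the replay buffer. The `Transition` struct, `replayBuffer`, and the success/reward convention are hypothetical stand-ins for illustration, not mlpack's actual HER code.)

```cpp
#include <armadillo>
#include <vector>

// Hypothetical transition type, for illustration only.
struct Transition
{
  arma::vec state, nextState, goal;
  size_t action;
  double reward;
  bool isEnd;
};

// Buffer an episode's transitions, and only when the episode terminates
// relabel them against the achieved goal and store everything.
void OnEpisodeEnd(std::vector<Transition>& episode,
                  std::vector<Transition>& replayBuffer)
{
  // The goal the agent actually reached at the end of the episode.
  const arma::vec achievedGoal = episode.back().nextState;

  for (Transition t : episode)
  {
    // Store the original (usually zero-reward) transition.
    replayBuffer.push_back(t);

    // HER relabelling: pretend the achieved goal was the desired goal,
    // so even a failed episode yields at least one success signal.
    t.goal = achievedGoal;
    const bool success =
        arma::approx_equal(t.nextState, achievedGoal, "absdiff", 1e-8);
    t.reward = success ? 1.0 : 0.0;  // Reward convention is an assumption.
    t.isEnd = success;
    replayBuffer.push_back(t);
  }
  episode.clear();
}
```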
<EshaanAgarwal[m]>
pushing this code
<EshaanAgarwal[m]>
<EshaanAgarwal[m]> "I fixed couple of things more..." <- i was checking the action made by agent ! it was repeatedly 0 index and that bit was just flipping through transitions.
<EshaanAgarwal[m]>
<EshaanAgarwal[m]> "i was checking the action made..." <- i am not sure why its not taking other indexes as random action for exploration
<EshaanAgarwal[m]>
jonpsy: zoq: can we do it this way? For rewarding the agent, we can set the reward to the number of indices it got right with respect to the goal.
<EshaanAgarwal[m]>
Example: 1 0 1 0 0 will get a reward of 2.0 if the goal is 1 1 1 1 1. This way the reward isn't as sparse, and HER will give the advantage of solving it quickly.
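(A minimal sketch of this proposed reward, assuming the state and goal are held as `arma::vec` of 0/1 entries; the function name is made up for illustration.)

```cpp
#include <armadillo>

// Proposed dense reward: the number of indices where the state already
// matches the goal.  For state {1,0,1,0,0} and goal {1,1,1,1,1} this
// returns 2.0.
double MatchingBitsReward(const arma::vec& state, const arma::vec& goal)
{
  return arma::accu(state == goal);
}
```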
<jonpsy[m]>
<EshaanAgarwal[m]> "jonpsy: zoq can we do it this..." <- Nope
<jonpsy[m]>
That's reward engineering, we're back to square one
<EshaanAgarwal[m]>
<jonpsy[m]> "That's reward engineering, we're..." <- I am saying this because i went through the code thoroughly and I am pretty much sure that there isn't any issue with her. Reward is just so much sparse that it's just repeatedly flipping the 0th but and not moving to something else. I tried other replay and even small length of binary vector. But it's the same case
<jonpsy[m]>
So increase iter
<EshaanAgarwal[m]>
It's getting quite difficult for it.
<EshaanAgarwal[m]>
jonpsy[m]: I have! Almost 200 transitions in one episode, and that too for 1000 episodes.
<jonpsy[m]>
And it has yet to achieve the real goal at least once
<jonpsy[m]>
correct?
<EshaanAgarwal[m]>
EshaanAgarwal[m]: But if I do it the way I mentioned, then it performs way better than other replays.
<EshaanAgarwal[m]>
jonpsy[m]: Yes, or else the action exploration is not coming off as it should! But that's not a part we touched in our code.
<EshaanAgarwal[m]>
EshaanAgarwal[m]: It's giving an average reward of almost 1100 out of 2000, where other replays are really struggling to get 950 on a 10-bit binary vector.
<EshaanAgarwal[m]>
EshaanAgarwal[m]: It's repeatedly a 0 return, with 0 as the action for the index to be flipped.
<jonpsy[m]>
<EshaanAgarwal[m]> "example - 1 0 1 0 0 will have 2..." <- Even if we follow this path, this approach isn't correct. There's too much space for random luck to take part here.
<jonpsy[m]>
Besides, if it was converging, then increasing iter should've worked.
<EshaanAgarwal[m]>
jonpsy[m]: I am just saying that maybe the environment is too difficult for it to solve! By making the reward more directed, it performs better and converges to a threshold of 55 percent, and maybe more if we reduce the length of the vector.
<jonpsy[m]>
decrease bit length
<jonpsy[m]>
10 is too much. Start with 3-4
<EshaanAgarwal[m]>
jonpsy[m]: I did 5! It was the same.
<jonpsy[m]>
try 4
<jonpsy[m]>
it's exponential
<EshaanAgarwal[m]>
jonpsy[m]: Okay, with the previous reward scheme of the env, right?
<jonpsy[m]>
yes
<jonpsy[m]>
keep decreasing till it converges. At min you can go to `2`
<EshaanAgarwal[m]>
jonpsy[m]: Ok let me try this ! I will let you know as soon as it's done.
<zoq[m]>
<EshaanAgarwal[m]> "Ok let me try this ! I will..." <- I also think we should try another env, to see if the policy is just not working for this one.
<EshaanAgarwal[m]>
zoq[m]: But getting a goal-based environment in C++ might be a challenge. I think we can meanwhile assess its performance by making the reward scheme a little bit simpler (as I proposed), because it worked better on the 10-bit vector environment than the others. For a smaller vector length it should show good results.
<EshaanAgarwal[m]>
Nevertheless I would love to know what you guys think would be the optimum thing.
<zoq[m]>
You could easily write a simple maze env which is goal based.
<EshaanAgarwal[m]>
<jonpsy[m]> "keep decreasing till it converge..." <- I went till 2 ! 0 episode return and I think it's still just flipping one bit continuously
<zoq[m]>
I'm with jonpsy: if we start modifying the reward scheme it will diverge from the actual implementation, so instead of finding a proper solution we are masking it.
<zoq[m]>
That said maybe our network is too small? Maybe it converges in 1 of 10 cases?
<EshaanAgarwal[m]>
zoq[m]: Can you describe this further? In the short amount of time we have, will it be a good bet?
<EshaanAgarwal[m]>
zoq[m]: We can try increasing the network, but I tried multiple runs! It doesn't converge.
<EshaanAgarwal[m]>
Throughout all the 1000 episodes, not even one positive reward return.
<EshaanAgarwal[m]>
Meanwhile, can you please take a look to make sure I haven't done anything basic in the wrong way! I have used gdb and found a couple of mistakes in the past day!
<EshaanAgarwal[m]>
EshaanAgarwal[m]: More eyes might help.
<zoq[m]>
EshaanAgarwal[m]: Yes, will go through the code.
<zoq[m]>
EshaanAgarwal[m]: Do you have a reference implementation that works?
<EshaanAgarwal[m]>
zoq[m]: Actually I used a couple of references, but I am not sure if they work! One was based on the Intel Coach library. Also, the structure of the Q-learning implementation in ours is different, so I adapted it accordingly.
<EshaanAgarwal[m]>
EshaanAgarwal[m]: Apart from that, I used the pseudocode given in the paper as a reference. I will post all the links on the PR.
<EshaanAgarwal[m]>
zoq[m]: Okay I will take a look into this.
<zoq[m]>
EshaanAgarwal[m]: What was the problem with the Intel lib?
<EshaanAgarwal[m]>
zoq[m]: I mean I haven't used that; I just went through the code for help. I will also see if I can work out an example from it.
<jonpsy[m]>
<EshaanAgarwal[m]> "Can you describe this further..." <- So you could create a arma::mat, consisting of 0, -1, 1. -1 is forbidden (wall), 0 is empty room, 1 is reward which is only available in one block.
<jonpsy[m]>
Fill the 2D mat with 0 and -1 at random. Finally, choose one block at random and make it +1. Ez pz
<jonpsy[m]>
Also, agreed with zoq, perhaps we could increase the complexity of the network.
<EshaanAgarwal[m]>
<jonpsy[m]> "Fill 2d mat with 0, -1 at random..." <- Ok and action should be that agent chooses the row and column to advance to ! If it's +1 then it advances there otherwise it's asked to choose once again ?
<EshaanAgarwal[m]>
Can it go back to previously visited states?
<EshaanAgarwal[m]>
<jonpsy[m]> "Also, agreed with zoq perhaps..." <- I will try this in sometime and let you know the results ! Currently we have this
<jonpsy[m]>
<EshaanAgarwal[m]> "Ok and action should be that..." <- Yeah, going back to previous states should be fine. It's not the game's fauult the agent is stuck
<jonpsy[m]>
Moves should be +1 in one of 4 directions, no diagonals allowed.
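(A rough sketch of the suggested grid generation, assuming Armadillo; -1 = wall, 0 = empty room, +1 = goal in exactly one cell. The function name and wall probability are made up for illustration, and the agent would then move one step up, down, left, or right per action.)

```cpp
#include <armadillo>

// Build a goal-based maze grid: mostly empty rooms (0), some walls (-1)
// placed at random, and a single goal cell (+1).
arma::mat MakeMaze(const size_t rows, const size_t cols,
                   const double wallProb = 0.2)
{
  // Fill the grid with 0 and -1 at random.
  arma::mat maze(rows, cols, arma::fill::zeros);
  maze.elem(arma::find(arma::randu(rows, cols) < wallProb)).fill(-1.0);

  // Choose one block at random and make it the goal; this overwrites
  // whatever was there, so the goal cell is never a wall.
  const arma::uword goalIndex = arma::randi<arma::uvec>(
      1, arma::distr_param(0, rows * cols - 1))(0);
  maze(goalIndex) = 1.0;

  return maze;
}
```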
<akhunti1[m]>
Hi rcurtin, thanks for your help. I was able to solve the issue now.
<akhunti1[m]>
But now I am getting another issue [ ImportError: libboost_program_options-mt.so.1.53.0: cannot open shared object file: No such file or directory ]. For that I installed the boost_1_53_0 library.
<akhunti1[m]>
But I am not able to find this file [ libboost_program_options-mt.so.1.53.0 ] inside the folder.
<rcurtin[m]>
Great to hear you got the other issue resolved; I think the `-mt` specifies that you are on Windows? Make sure you have that `-mt` version, and also make sure that your `LD_LIBRARY_PATH` (or runtime linker search path) is set to find the right directory
<rcurtin[m]>
also, shrit you will be happy to hear that I got the Python bindings to compile down to a shared library size of about 500 KB each; this is a significant improvement over the previous version of mlpack
<shrit[m]>
Brilliant work, yes that makes me happy
<akhunti1[m]>
Hi rcurtin, any idea where this file [ libboost_program_options-mt.so.1.53.0 ] is located inside the boost_1_53_0 directory? Actually, I am using a Linux system.
<akhunti1[m]>
I downloaded the boost_1_53_0.tar.gz file and extracted it, as I am using it to create a Docker image.
<rcurtin[m]>
shouldn't it be in the `lib/` subdirectory of the `boost_1_53_0` directory? or something similarly named
jjb[m] has joined #mlpack
<rcurtin[m]>
you may consider installing boost via your package manager instead, if you can
<EshaanAgarwal[m]>
<EshaanAgarwal[m]> "Should I make it 256,256,2 ?" <- jonpsy: zoq same results with this ! should i try 512,512,2 as well ?
<akhunti1[m]>
This is the structure I got after installation, but I am not able to find this file [ libboost_program_options-mt.so ].
<akhunti1[m]>
Any guess where the file is located?
<rcurtin[m]>
Did you look directly in the `libs/` directory? that is where all the .so files should be
<rcurtin[m]>
You could also use, e.g., a tool like `find` to search for them instead of having me guess at it
<akhunti1[m]>
Yes rcurtin, I used find to search, but did not find the file.
<akhunti1[m]>
I thought I did something wrong during the installation.
<rcurtin[m]>
you might try searching inexactly; there should be a `libboost_program_options*.so` of some sort in what you downloaded; if not, maybe you downloaded the wrong package
<akhunti1[m]>
I am actually getting this error when I am installing boost_1_53_0.
<akhunti1[m]>
But to run mlpack 3.1.1 it is expecting Boost 1.53.0.
<rcurtin[m]>
what did you try to resolve the error?
<akhunti1[m]>
I downloaded boost_1_53_0 and ran the `./bootstrap.sh` command to install it.
<rcurtin[m]>
okay, and did you attempt any debugging or investigation of the error that you presented before you asked me to look at it?
<akhunti1[m]>
I installed the python dev package as it is showing [ fatal error: pyconfig.h: No such file or directory
<akhunti1[m]>
# include <pyconfig.h> ]
<akhunti1[m]>
Hi rcurtin, I installed the python dev package, as it is showing [ fatal error: pyconfig.h: No such file or directory
<akhunti1[m]>
# include <pyconfig.h>
<akhunti1[m]>
but I am still getting the same error.
<akhunti1[m]>
Is there any workaround to resolve the issue?
<akhunti1[m]>
Hi rcurtin, if you have any workaround please let me know, because as you know mlpack 3.1.1 needs Boost 1.53 for compilation. And one more constraint here is that I cannot use a package manager to install Boost.
<akhunti1[m]>
Let me know any thoughts if you have them. Thanks for your suggestions, always.
<akhunti1[m]>
At least if I can get the .so files for Boost 1.53.0, I can add them to my Docker file to compile.
<EshaanAgarwal[m]>
I am iterating over an Armadillo matrix and I need to store particular row and column indices into another arma::vec. The issue is that in the for loop I have used size_t for both row and column, but I am not able to insert those into the specified vector. What can be a workaround for this?
<EshaanAgarwal[m]>
I can't change arma::vec to arma::uvec
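(One possible workaround, sketched with a hypothetical function: since `arma::vec` holds doubles, the `size_t` loop indices can simply be cast when written into a pre-sized vector.)

```cpp
#include <armadillo>

// Collect (row, col) indices of elements above a threshold into an
// arma::vec, casting the size_t indices to double.
arma::vec StoreMatchingIndices(const arma::mat& m, const double threshold)
{
  arma::vec indices(2 * m.n_elem);  // Worst case: every element matches.
  arma::uword count = 0;

  for (size_t col = 0; col < m.n_cols; ++col)
  {
    for (size_t row = 0; row < m.n_rows; ++row)
    {
      if (m(row, col) > threshold)
      {
        indices(count++) = (double) row;  // Cast sidesteps the size_t issue.
        indices(count++) = (double) col;
      }
    }
  }

  indices.resize(count);  // Keep only the entries actually filled.
  return indices;
}
```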