<jonpsy[m]>
Eshaan Agarwal: how's the gdb progress? Is only your test failing, or are all the tests failing?
<EshaanAgarwal[m]>
<jonpsy[m]> "Eshaan Agarwal: how's the gdb..." <- I did not check other tests but I don't think they should fail. Also checking all tests takes some time. Let me do that now.
<jonpsy[m]>
That should be the first thing to do. An added test shouldn't break existing tests.
<EshaanAgarwal[m]>
jonpsy[m]: Ok! I will run them and let you know, but as far as I can tell they shouldn't!
<EshaanAgarwal[m]>
If I have to check the equality of two matrices, I can simply do `matriceA == matriceB`, but this will return a matrix object. How could I convert that into a bool?
<EshaanAgarwal[m]>
jonpsy: zoq: I think I have fixed the reward now.
<EshaanAgarwal[m]>
I basically wanted to check whether the two binary vectors ( goal and state ) are equal or not.
<EshaanAgarwal[m]>
jonpsy[m]: yeah, that is why I needed to convert that mat object to a bool! But I couldn't find anything direct to do that.
<EshaanAgarwal[m]>
EshaanAgarwal[m]: so i did this
<jonpsy[m]>
these are binary values, right?
<jonpsy[m]>
1 & 0. Right?
<EshaanAgarwal[m]>
jonpsy[m]: yes, and after the equality check it will be a matrix of binary values depicting each individual element's equality
<jonpsy[m]>
why use `arma::vec`?
<EshaanAgarwal[m]>
jonpsy[m]: can you elaborate ?
<jonpsy[m]>
okay. What's the `dtype` that `arma::vec` stores?
<EshaanAgarwal[m]>
jonpsy[m]: I got what you are asking, but the issue with `uvec` was that `Train()` uses the state and action values, and there we have used a normal `vec`, so it gives some kind of error. I had tried that.
<jonpsy[m]>
I thought it was all templatized
<jonpsy[m]>
Nw, consider `approx_equal`
<jonpsy[m]>
check the Armadillo doc link, look for `approx_equal`. Set the relative tolerance to a predefined value
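For reference, a minimal sketch (not from this conversation; the vector names are placeholders) of the two ways discussed to collapse an element-wise Armadillo comparison into a single `bool`:

```cpp
#include <armadillo>

int main()
{
  arma::vec state = {1, 0, 1, 1};
  arma::vec goal  = {1, 0, 1, 1};

  // approx_equal() returns a plain bool; for binary values an absolute
  // tolerance close to zero is sufficient.
  const bool reached = arma::approx_equal(state, goal, "absdiff", 1e-8);

  // Element-wise route: (state == goal) yields a vector of 0s and 1s,
  // and arma::all() collapses it to a single bool.
  const bool reachedAlt = arma::all(state == goal);

  return (reached && reachedAlt) ? 0 : 1;
}
```

`approx_equal` has the advantage of returning a `bool` directly and of absorbing floating-point noise through the tolerance argument.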
<EshaanAgarwal[m]>
Okay !
<EshaanAgarwal[m]>
I just ran the tests! One of the other tests is failing, but I am not sure if I have made any changes related to that one. It's just not converging.
<EshaanAgarwal[m]>
EshaanAgarwal[m]: getting this error when I tried to compile the mlpack tests after I cloned the mlpack repo's master branch
<EshaanAgarwal[m]>
<EshaanAgarwal[m]> "Screenshot from 2022-10-31 11-34..." <- strangely this happened when i built it in debug mode
<jonpsy[m]>
is the issue persisting?
<EshaanAgarwal[m]>
jonpsy[m]: Not right now! But there were definitely some different warnings when I used make in debug mode.
kristjansson has quit [Ping timeout: 250 seconds]
kristjansson has joined #mlpack
<EshaanAgarwal[m]>
<jonpsy[m]> "is the issue persisting?" <- Fixed the acrobat test ! It was a very trivial but small thing. Pushing changes sometime.
<jonpsy[m]>
Heard you fixed the test? Eshaan Agarwal
<jonpsy[m]>
HELL YEAH
<jonpsy[m]>
what was the trick?
<EshaanAgarwal[m]>
jonpsy[m]: During the function calls, the next state that was passed into the function and the next-state variable present inside the function created a mess. So I took care of that.
<EshaanAgarwal[m]>
It was updating values in the wrong variables, hence it wasn't learning.
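A purely hypothetical illustration (invented names, not mlpack's actual code) of the kind of variable mix-up being described, where a parameter shadows the member it was supposed to update:

```cpp
// Hypothetical sketch: the parameter shadows the member, so the update
// lands in the wrong variable and later learning steps read a stale value.
class Agent
{
 public:
  void Step(double nextState)
  {
    // Bug: this only modifies the local parameter copy.
    nextState = 2.0 * nextState;

    // Fix: write to the member explicitly (or rename one of the two), e.g.
    // this->nextState = 2.0 * nextState;
  }

 private:
  double nextState = 0.0;  // The value the rest of the agent actually uses.
};
```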
<jonpsy[m]>
Thought so
<jonpsy[m]>
So all RL tests are passing?
<EshaanAgarwal[m]>
jonpsy[m]: I guess !
<jonpsy[m]>
Run all tests
<jonpsy[m]>
+ your own test
<EshaanAgarwal[m]>
jonpsy[m]: All RL right ?
<jonpsy[m]>
yeah
<EshaanAgarwal[m]>
Will do ! Hopefully they should work now 😀
<jonpsy[m]>
I'm having a good feeling it will
<jonpsy[m]>
so if HER passes as well, we need to benchmark; that's v. imp
<jonpsy[m]>
a concrete graph & numbers. I want a report showing HER is consistently better
<EshaanAgarwal[m]>
jonpsy[m]: For that we might require some other environments? 👀
<jonpsy[m]>
Lets start with bit flipping
<EshaanAgarwal[m]>
jonpsy[m]: For HER I think we should think over our thresholds for the bit flipping environment. Nevertheless, I have started the tests. Only the QLearningTest file is still running! The rest have converged and passed.
<jonpsy[m]>
You told me you've taken someone else's work as inspiration for bit flip, right?
<jonpsy[m]>
Someone else had this code; what were their results?
<EshaanAgarwal[m]>
jonpsy[m]: I mean, it was a fairly easy environment. But I took reference from 2 repos. Let me check if they gave their results there or not.
<jonpsy[m]>
Also, pls mention the source of ur code in the file
<EshaanAgarwal[m]>
jonpsy[m]: Sure! I will do that.
<EshaanAgarwal[m]>
<jonpsy[m]> "Also, pls mention the source..." <- Can I share the test file of one of the implementations here ?
<jonpsy[m]>
sure
<jonpsy[m]>
but generally, always paste the link to the source in the code. Any helpful doc related to it is also welcome.
<akhunti1[m]>
This is the error I am getting, and unfortunately I am not able to resolve the issue in the CMakeLists.txt file.
<akhunti1[m]>
If time permits, could you pls look into this.
<rcurtin[m]>
akhunti1: I'm sorry, time doesn't really permit... the best I can do is give quick guesses. My suggestion would be to check that libarmadillo.so.10 exists on the system. Once you have confirmed that, it looks like you are loading armadillo from inside Python? So you might want to check the exact path that Python is trying to use to load libarmadillo.so.10. Unfortunately, I don't know enough about your situation to tell you precisely how to
<rcurtin[m]>
do that, so you will probably have to do some investigation and reading
<akhunti1[m]>
Thanks rcurtin for your time
<akhunti1[m]>
I will try your suggestion.
<akhunti1[m]>
rcurtin[m]: Yes I am loading armadillo from inside Python.
<EshaanAgarwal[m]>
jonpsy: zoq: I had a question. For all the imaginary transitions that I am storing, should I also add their reward to the total reward of the episode?
<jonpsy[m]>
Wouldn't make sense imo
<jonpsy[m]>
Your cricket skills aren't valuable during football & vice-versa.
<jonpsy[m]>
Nobody asks how fast you swim in a cricket match.
<EshaanAgarwal[m]>
jonpsy[m]: But that's kind of the whole point, right? We give a positive reward even when we have not achieved the goal, according to the goal strategy.
<EshaanAgarwal[m]>
Just asking
<jonpsy[m]>
To guide the process yes, but it isn't the ultimate aim
<EshaanAgarwal[m]>
Because then, no matter what, the reward for each episode would always be 1 at most!
<jonpsy[m]>
It's good that it's adaptable, but we shouldn't forget the REAL goal here.
<jonpsy[m]>
perhaps we could increase iter
<EshaanAgarwal[m]>
Because the final state is the goal we need to achieve, for which we give a reward in the bit flipping case. In real-life scenarios, like picking something up with a robotic arm, we still could have calculated a good reward for our agent by calculating the distance between the place where the robot put the object and the target coordinates.
<EshaanAgarwal[m]>
jonpsy[m]: Even then, we are giving a reward in the episode when it reached the final goal, right? So only for that particular transition would we have a positive reward in the whole episode.
<jonpsy[m]>
EshaanAgarwal[m]: Yes, but the entire point is that we don't engineer the reward. It should be sparse & just a "win" / "no win" scenario.
<EshaanAgarwal[m]>
jonpsy[m]: Yes, agreed; I am just saying that then, just for the bit flip case, the reward threshold would be 1! What could be gauged to see HER's performance is the number of steps the trained agent takes to reach its goal.
<jonpsy[m]>
We should only get `1` reward when we actually achieve our goal
<EshaanAgarwal[m]>
jonpsy[m]: Yes! And in our case we also don't have any intermediate reward! The only performance HER should be bringing to the table is that it reaches the goal faster than others, which might not even solve it.
<jonpsy[m]>
yep
<jonpsy[m]>
HER engineers rewards for itself, so the env by itself should only give `1` when it reaches the true goal.
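As a concrete illustration of this sparse setup, a rough sketch (hypothetical function names, not mlpack's actual environment API) of a bit-flipping reward and terminal check:

```cpp
#include <armadillo>
#include <cstddef>

// Sparse reward: 1 only when every bit of the state matches the true goal,
// 0 otherwise; no intermediate shaping.
double SparseReward(const arma::vec& state, const arma::vec& goal)
{
  return arma::approx_equal(state, goal, "absdiff", 1e-8) ? 1.0 : 0.0;
}

// The episode ends when the goal is reached (reward 1) or when the step
// budget runs out (reward 0).
bool IsTerminal(const arma::vec& state, const arma::vec& goal,
                const std::size_t stepsTaken, const std::size_t maxSteps)
{
  return arma::approx_equal(state, goal, "absdiff", 1e-8) ||
         stepsTaken >= maxSteps;
}
```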
<EshaanAgarwal[m]>
EshaanAgarwal[m]: Okay then I might need to see the test again ! Because threshold wise it's working fine.
<jonpsy[m]>
First, do it freely. See how many iters it takes for it to achieve the real goal.
<EshaanAgarwal[m]>
jonpsy[m]: Yeah actually we were not checking that before. Will take a look at it and see then
<jonpsy[m]>
so it was reaching the real goal?
<jonpsy[m]>
we can try:
<jonpsy[m]>
a) Use another policy, see how fast it reaches the goal (if it does at all)
<jonpsy[m]>
b) The code you've taken this from, see how many iters it takes. We could use that as the thresh
<EshaanAgarwal[m]>
<jonpsy[m]> "we can try:..." <- yeah, we can do it that way! Let me take a look at it. All I am pointing out is that setting a reward threshold during training is not needed here.
<EshaanAgarwal[m]>
jonpsy[m]: an episode runs till it gets to a terminal state! That can only be achieved when a) it reached its goal, in which case the reward is 1, or b) the exploration steps got over, in which case it gets 0.
<jonpsy[m]>
EshaanAgarwal[m]: so are we sure we fall in a)?
<EshaanAgarwal[m]>
jonpsy[m]: it will reflect in the episode return! In most of the episodes, when it's exploring, we get 1 as the reward.
<jonpsy[m]>
so per episode, it is able to clear the true goal
<jonpsy[m]>
in most cases i.e.
<EshaanAgarwal[m]>
instead of setting a reward threshold for training, let's give it a suitable number of samples and then see whether it reaches the goal in a good number of steps or not.
<EshaanAgarwal[m]>
jonpsy[m]: yes, as per the IsTerminal function! Let me point you to the code for it.
<jonpsy[m]>
Or what we could do is: it should be able to collect the true reward in K% of the total episodes.
<jonpsy[m]>
i.e. success_rate
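A rough sketch of how such a success-rate criterion could be computed; `runEpisode` here is a hypothetical callable standing in for one evaluation rollout, not an mlpack API:

```cpp
#include <cstddef>
#include <functional>

// Fraction of evaluation episodes in which the agent collects the true
// sparse reward, i.e. actually reaches the goal.
double SuccessRate(const std::function<double()>& runEpisode,
                   const std::size_t numEpisodes)
{
  std::size_t successes = 0;
  for (std::size_t e = 0; e < numEpisodes; ++e)
  {
    // runEpisode() rolls out one episode and returns its final sparse
    // reward: 1 when the true goal was reached, 0 otherwise.
    if (runEpisode() > 0.5)
      ++successes;
  }
  return static_cast<double>(successes) / numEpisodes;
}
```

The same criterion could then be run against the reference implementation, this agent, and other policies to produce the comparison report mentioned above.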
<EshaanAgarwal[m]>
jonpsy[m]: yeah so we were doing it kinda wrong !
<EshaanAgarwal[m]>
jonpsy[m]: that could be done too! I will look into it and get back to you in some time.
<jonpsy[m]>
Ok. Also, once you've finalised `successRate` or whichever criterion you like, run it on the original code repo first, then yours, then other policies
<jonpsy[m]>
& report the results
<EshaanAgarwal[m]>
jonpsy[m]: I will have to see if that repo works or not! We can try other environments, or maybe SAC, since that also uses a replay buffer.
<jonpsy[m]>
We need to make sure its sparse
<jonpsy[m]>
& bit flip is super easy to implement
<EshaanAgarwal[m]>
jonpsy[m]: yeah! SAC then, because writing an environment is an issue.
<jonpsy[m]>
What's the ETA of this?
<EshaanAgarwal[m]>
jonpsy[m]: benchmarking? If all goes well, by tomorrow.
<EshaanAgarwal[m]>
meanwhile, can we get PPO merged?
<jonpsy[m]>
it's fixed?
<jonpsy[m]>
did zoq push his changes?
<EshaanAgarwal[m]>
jonpsy[m]: he hasn't
<EshaanAgarwal[m]>
that's why I asked; maybe we could wrap up one implementation too, side by side.
<EshaanAgarwal[m]>
I can help with wrapping things up.
<jonpsy[m]>
Nw, focus on getting HER up & running
<EshaanAgarwal[m]>
jonpsy[m]: It's up I guess ! Let's see how it performs.
<EshaanAgarwal[m]>
zoq: jonpsy: can you pls reopen the PPO pull request ?
<zoq[m]>
<EshaanAgarwal[m]> "zoq: jonpsy: can you pls..." <- I will open a new PR.
<coatless[m]>
Seems like some messages are going through to Slack and others aren't :'(
<coatless[m]>
Sorry for the re-open/closed PR spam.
<coatless[m]>
rcurtin: want a handle with the python bindings on conda?
<coatless[m]>
handle <-> help. 0 coffee today :'(
<rcurtin[m]>
coatless: yeah, I have been back and forth with the Slack bridging folks, but the main maintainer is on vacation right now
<rcurtin[m]>
I actually think I am getting closer with the Windows build, but if you have any specific ideas, I'm all ears. It appears that the install path for Windows is wrong, but I need to figure out what "right" is, then I can make a patch and it should be good to go