<jonpsy[m]>
zoq: Hey, so I've sent the invite + also sent a doc of our draft idea proposal. If things go well, we can post it today for students to see.
<jonpsy[m]>
* to see. cc: fieryblade
<zoq[m]>
jonpsy[m]: Okay, see you in 12 minutes.
<jonpsy[m]>
thought it was 8:30?
<jonpsy[m]>
we're moving it to 8?
<zoq[m]>
<zoq[m]> "8pm IST ?" <- isn't that in 12 minutes?>
<jonpsy[m]>
daang
<jonpsy[m]>
my bad, lemme re-adjust
<jonpsy[m]>
hey zoq, here's an example of a very nice application of a procedurally generated environment: https://www.youtube.com/watch?v=nvdZpJkT-ls. The flappy bird example really doesn't do the concept justice
<zoq[m]>
<jonpsy[m]> "hey zoq , here's an example of a..." <- True, In the end it depends on what we like to use it for. Usually the idea is to show that X is able to solve a certain task, and to say if we can scale up X we can solve task Y as well.
<zoq[m]>
zoq[m]: Montezuma's Revenge has a reputation of being difficult because you don't get a reward immediately (sparse reward system). It doesn't look fancy, but this sparse reward system makes it more challenging than other Atari games.
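A toy illustration of the sparse-reward point above, under made-up assumptions (the episode, the reward functions, and their names are all illustrative, not from any Atari wrapper): a dense reward gives the agent feedback every step, while a sparse one only signals when a rare event happens.

```python
# Toy contrast between dense and sparse rewards (illustrative only).
def dense_reward(step_progress):
    # Feedback on every step, e.g. score gained in most Atari games.
    return step_progress

def sparse_reward(reached_goal):
    # Montezuma's-Revenge-style signal: nothing until a rare event occurs.
    return 1.0 if reached_goal else 0.0

per_step_progress = [0.1, 0.0, 0.2, 0.0, 0.3]
print([dense_reward(p) for p in per_step_progress])          # signal everywhere
print([sparse_reward(False)] * 4 + [sparse_reward(True)])    # signal only at the end
```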
<jonpsy[m]>
Yes, do you know of HER?
<zoq[m]>
jonpsy[m]: From the movie?
<jonpsy[m]>
ah no, Hindsight Experience Replay ;)
<zoq[m]>
jonpsy[m]: hehe, I don't think so.
<jonpsy[m]>
so it works well for sparse reward systems
<jonpsy[m]>
it creates dense examples from sparse ones; the way it works is really cool. In fact, our multi-objective reinforcement learning algorithm was using this in the backend
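A minimal sketch of the HER relabeling idea described above, assuming a goal-conditioned task with a sparse 0/1 reward; the function names and the replay format here are illustrative, not mlpack's API.

```python
# Sketch of Hindsight Experience Replay (HER) relabeling.
# Idea: pretend that goals the agent actually reached were the intended
# goals, so a sparse-reward episode yields extra, denser training data.
import random
from collections import namedtuple

Transition = namedtuple("Transition", "state action achieved_goal goal reward")

def sparse_reward(achieved_goal, goal):
    # Success only when the goal is reached exactly.
    return 1.0 if achieved_goal == goal else 0.0

def relabel_with_her(episode, k=4):
    relabeled = []
    for t, tr in enumerate(episode):
        # Sample up to k goals that were actually achieved later in the episode.
        future = episode[t:]
        for _ in range(min(k, len(future))):
            new_goal = random.choice(future).achieved_goal
            relabeled.append(Transition(
                tr.state, tr.action, tr.achieved_goal, new_goal,
                sparse_reward(tr.achieved_goal, new_goal)))
    return relabeled

# Usage: the original transitions almost never carry reward 1, but the
# relabeled ones often do, so the replay buffer sees a denser signal.
episode = [Transition(s, 0, s + 1, 10, sparse_reward(s + 1, 10)) for s in range(5)]
buffer = episode + relabel_with_her(episode)
print(sum(tr.reward for tr in episode), sum(tr.reward for tr in buffer))
```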
<jonpsy[m]>
zoq[m]: On that note, that Prince of Persia game you made an RL agent for, does it have sparse rewards as well? (only getting a reward when the level is completed?)
<zoq[m]>
<jonpsy[m]> "On that note, that prince of..." <- In this case it's imitation learning.
<jonpsy[m]>
Oh
<jonpsy[m]>
that might be disastrous on difficult levels
<zoq[m]>
<jonpsy[m]> "that might be disastrous on..." <- Yes, was mainly just to figure out if it's possible.