#mlpack on 2017-04-05 — irc logs at libera.irclog.whitequark.org

2015-01-15 23:05 verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/

00:17 sumedhghaisas has quit [Ping timeout: 260 seconds]

00:23 diehumblex has quit [Quit: Connection closed for inactivity]

00:41 aashay has joined #mlpack

01:17 chenzhe has quit [Ping timeout: 246 seconds]

01:31 chenzhe has joined #mlpack

01:39 trapz has joined #mlpack

02:02 chenzhe has quit [Ping timeout: 246 seconds]

02:57 trapz has quit [Quit: trapz]

02:58 chenzhe has joined #mlpack

03:04 chenzhe has quit [Ping timeout: 246 seconds]

03:52 sumedhghaisas has joined #mlpack

04:12 vinayakvivek has joined #mlpack

04:24 sumedhghaisas has quit [Ping timeout: 260 seconds]

06:03 naxalpha has joined #mlpack

06:08 naxalpha has quit [Ping timeout: 260 seconds]

06:14 chenzhe has joined #mlpack

06:41 govg has joined #mlpack

06:43 Alvis has quit [Quit: Leaving]

06:55 hxidkd has joined #mlpack

07:05 hxidkd has quit []

07:16 govg has quit [Ping timeout: 260 seconds]

07:16 govg has joined #mlpack

07:59 chenzhe has quit [Ping timeout: 256 seconds]

08:11 mikeling has joined #mlpack

08:57 chenzhe has joined #mlpack

11:10 diehumblex has joined #mlpack

11:31 trapz has joined #mlpack

11:33 trapz has quit [Client Quit]

11:51 trapz has joined #mlpack

12:13 trapz has quit [Quit: trapz]

12:35 witness_ has quit [Quit: Connection closed for inactivity]

12:36 trapz has joined #mlpack

12:50 Trion has joined #mlpack

13:04 < rcurtin> so masterblaster may go offline today but I am not sure

13:04 < rcurtin> I am waiting for both the current lab to tell me whether they can ship it today

13:05 < rcurtin> and the destination lab to tell me what its new IP will be

13:05 < rcurtin> if I don't get both of those pieces of information this morning I'll ask to push the move to next week

13:06 < zoq> sounds good, fingers crossed

13:40 chenzhe has quit [Ping timeout: 256 seconds]

13:53 trapz has quit [Quit: trapz]

13:54 trapz has joined #mlpack

14:06 sumedhghaisas has joined #mlpack

14:07 aashay has quit [Quit: Connection closed for inactivity]

14:53 trapz has quit [Quit: trapz]

15:54 vss has joined #mlpack

16:32 Trion has quit [Quit: Have to go, see ya!]

16:47 mikeling has quit [Quit: Connection closed for inactivity]

17:23 chenzhe has joined #mlpack

17:27 vss has quit [Quit: Page closed]

17:29 < rcurtin> just called off the masterblaster move until next week... they were not going to be able to ship it until tomorrow, and that would probably mean weekend downtime

17:29 < rcurtin> better to just wait until next wednesday, and it should be up by friday

17:30 trapz has joined #mlpack

17:43 chenzhe has quit [Ping timeout: 246 seconds]

17:54 chenzhe has joined #mlpack

18:40 nish21 has joined #mlpack

18:41 nish21 has quit [Client Quit]

18:42 aashay has joined #mlpack

18:48 chenzhe has quit [Ping timeout: 246 seconds]

19:12 trapz has quit [Quit: trapz]

20:01 trapz has joined #mlpack

20:35 < sumedhghaisas> zoq: Hey Marcus... got some time? :P

20:37 < zoq> sumedhghais: Hey, sure go ahead :)

20:39 < sumedhghaisas> okay. So I am not sure about the method so maybe your comments would help me :)

20:40 < sumedhghaisas> What I thought at the start that... MCTS in most of the situation does a fair enough job... Although given a prior over the actions available it performs exceptionally.

20:40 < sumedhghaisas> As can be seen from AlphaGo

20:41 < sumedhghaisas> Although for AlphaGo... the policy network is trained first with supervised learning, quality an fast gradients

20:41 < sumedhghaisas> and then using REINFORCE

20:42 < sumedhghaisas> But given a general task where such prestored data for supervised learning is not present.. can MCTS be used to train the primary policy network?

20:43 < zoq> I'm not completely sure but AlphaGo uses somewhat similar what MCTS is right?

20:43 < sumedhghaisas> Basically the pseudo-approach that I was thinking of is following...

20:43 < zoq> ah, okay

20:44 < sumedhghaisas> Initialize policy network

20:44 < sumedhghaisas> create a target policy network

20:45 < sumedhghaisas> use policy network to play gamestarget network with MCTS to predict the best state

20:45 < sumedhghaisas> *and

20:45 < sumedhghaisas> store the action chosen by policy network with action chosen by MCTS with target network

20:46 < sumedhghaisas> use the pairs to train the network after each complete game

20:46 < sumedhghaisas> after certain number of games copy policy network to target network

20:46 < sumedhghaisas> and continue

20:47 < sumedhghaisas> this way the policy network is making the MCTS bettter with each network copy and MCTS is in turn using the sampling technique to make the policy network better

20:48 < sumedhghaisas> Okay am I making any sense?

20:48 < zoq> Yeah, sounds a little bit like what MacKenzie-Leigh did in his "An ensemble agent for Ms Pac-Man" paper.

20:48 < zoq> If I remember right ...

20:49 < sumedhghaisas> ohhh did not know that... I will take a look

20:49 < sumedhghaisas> but is it ensemble learning?

20:50 < zoq> yeah, if I remember right he dosn't talk about policy network or something similair just some ways to improve the standrd MCTS to avoid pincer.

20:50 govg has quit [Ping timeout: 260 seconds]

20:51 < sumedhghaisas> the hypothesis I am trying to test is that... given higher convergence of MCTS to the highly computational value network... I believe that such initial training of poliy network with MCTS might make the convergence of actor-critic faster

20:51 < zoq> your idea is way more sophisticated :)

20:52 < sumedhghaisas> I mean... I dont trust MCTS to do the entire job... just push policy network up faster then let value network does its job... just like what they did in AlphaGo... but they had master game dataset.. not MCTS

20:52 < sumedhghaisas> But I should look at the paper for related research

20:53 < zoq> I agree, that should influence the time to converge, really interesting idea

20:54 < zoq> Also, depending on the taks, MCTS does really well

20:54 < sumedhghaisas> I also copied the idea from Q-learning, the target network... that should remove the high variation in the gradients

20:55 < sumedhghaisas> yes... Also I looked at the theorotical background of MCTS

20:55 < sumedhghaisas> So given an initial policy... MCTS can only improve the policy... give the regretbound of UCT

20:56 < sumedhghaisas> thus the policy network has to be training in the right direction... maybe would need some gradient clipping but still

20:57 < sumedhghaisas> and if the policy network is training in the right direction ... it has to improve MCTS further as given a better prior the regret of UCT goes down... they proved it using PUCT policy

20:58 < zoq> do you have a link to the paper?

20:58 < sumedhghaisas> sure... wait let see

21:00 < sumedhghaisas> https://pdfs.semanticscholar.org/7c0c/0445e89347798800aad3497fcf2f2d27d4e6.pdf

21:00 < sumedhghaisas> this is the PUCT policy paper

21:02 < sumedhghaisas> Although I think I can improve this further... though I am having some problem with concepts there

21:02 < sumedhghaisas> now the policy network outputs a distribution over the actions

21:03 < sumedhghaisas> MCTS at the end will assign a value for each action... with their bounds..

21:03 < sumedhghaisas> with some mathematical formulation if I somehow create a rational policy from this MCTS value and their bounds

21:04 < sumedhghaisas> like probability distribution

21:04 < sumedhghaisas> then I can use KL-divergence as a loss function and try to reduce the error

21:05 < sumedhghaisas> between the probabilities given by policy network and policy defined by MCTS values

21:06 < sumedhghaisas> but I only know that MCTS values are comparative... as in the best value will tell the best action... though I am know if other value are comparative as well

21:06 < sumedhghaisas> Do you know anything about this?

21:07 < zoq> nothing that pops up right now

21:09 < sumedhghaisas> I am happy that you like the idea... Do you think it can be improved in some other way? My supervisor is reviewing my first draft as well :)

21:13 < zoq> hm, I'm not sure, has anybody ever used PUCB for some like go?

21:14 trapz has quit [Quit: trapz]

21:14 trapz has joined #mlpack

21:17 < sumedhghaisas> AlphaGo... thats where I got this paper :P

21:18 < zoq> ah neat

21:20 < zoq> I see, they use a variant of the PUCT method

21:23 < zoq> I have to read the paper, before I can think about improving the method.

21:34 chenzhe has joined #mlpack

21:43 govg has joined #mlpack

21:58 < zoq> sumedhghais: Btw. do you have any idea why the coverage decreased for commits that didn't changed the code but comments? like: https://github.com/mlpack/mlpack/commits/master?after=6b097f28d5317628130aede16f019d2abe37a268+34

22:04 chenzhe has quit [Ping timeout: 246 seconds]

22:05 vinayakvivek has quit [Quit: Connection closed for inactivity]

22:08 < sumedhghaisas> zoq: ahh that should not happen... hmmm

22:09 < sumedhghaisas> does the site provide omparison coverage?

22:10 < sumedhghaisas> the last commit you mean.. with fix typo?

22:11 < zoq> yes, I think I saw some other commits too

22:13 < zoq> here is another one: https://github.com/mlpack/mlpack/commits/master?after=6b097f28d5317628130aede16f019d2abe37a268+69 "Minor style fixes (80 columns, spaces between operations)."

22:13 < zoq> or "Add sanity check on data size."

22:14 < zoq> I haven't looked into the issue.

22:14 < sumedhghaisas> zoq: this one https://github.com/mlpack/mlpack/commit/6751acf52eefbf2a9bba22794b5e00011f01e372?

22:15 trapz has quit [Quit: trapz]

22:19 < zoq> So if I take a look at: https://coveralls.io/jobs/24559645 this is the last build for the last commit: https://github.com/mlpack/mlpack/commit/6b097f28d5317628130aede16f019d2abe37a268

22:20 < zoq> Should I see only one file under "changed"?

22:20 < zoq> instead the last commit changed the coverage of 112 files

22:44 < sumedhghaisas> zoq: this does make any sense... I also looked at the previous commits

22:45 < sumedhghaisas> the lines covered for prefixedoutstream is always 30

22:45 < sumedhghaisas> I dont think that file has changed for long time now

22:45 < sumedhghaisas> but every time it shows either increase or decrease

22:47 < sumedhghaisas> so even the lines covered is showing changes... like +22 or -13 although the value is constant

22:48 < sumedhghaisas> I am not sure if I am sending the changes to the server or if it is calculating on its own

22:48 < sumedhghaisas> I have to open the dump the code is sending to the server

22:48 < zoq> yeah there is definitely something wrong