verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
sumedhghaisas has quit [Ping timeout: 260 seconds]
diehumblex has quit [Quit: Connection closed for inactivity]
aashay has joined #mlpack
chenzhe has quit [Ping timeout: 246 seconds]
chenzhe has joined #mlpack
trapz has joined #mlpack
chenzhe has quit [Ping timeout: 246 seconds]
trapz has quit [Quit: trapz]
chenzhe has joined #mlpack
chenzhe has quit [Ping timeout: 246 seconds]
sumedhghaisas has joined #mlpack
vinayakvivek has joined #mlpack
sumedhghaisas has quit [Ping timeout: 260 seconds]
naxalpha has joined #mlpack
naxalpha has quit [Ping timeout: 260 seconds]
chenzhe has joined #mlpack
govg has joined #mlpack
Alvis has quit [Quit: Leaving]
hxidkd has joined #mlpack
hxidkd has quit []
govg has quit [Ping timeout: 260 seconds]
govg has joined #mlpack
chenzhe has quit [Ping timeout: 256 seconds]
mikeling has joined #mlpack
chenzhe has joined #mlpack
diehumblex has joined #mlpack
trapz has joined #mlpack
trapz has quit [Client Quit]
trapz has joined #mlpack
trapz has quit [Quit: trapz]
witness_ has quit [Quit: Connection closed for inactivity]
trapz has joined #mlpack
Trion has joined #mlpack
< rcurtin> so masterblaster may go offline today but I am not sure
< rcurtin> I am waiting for both the current lab to tell me whether they can ship it today
< rcurtin> and the destination lab to tell me what its new IP will be
< rcurtin> if I don't get both of those pieces of information this morning I'll ask to push the move to next week
< zoq> sounds good, fingers crossed
chenzhe has quit [Ping timeout: 256 seconds]
trapz has quit [Quit: trapz]
trapz has joined #mlpack
sumedhghaisas has joined #mlpack
aashay has quit [Quit: Connection closed for inactivity]
trapz has quit [Quit: trapz]
vss has joined #mlpack
Trion has quit [Quit: Have to go, see ya!]
mikeling has quit [Quit: Connection closed for inactivity]
chenzhe has joined #mlpack
vss has quit [Quit: Page closed]
< rcurtin> just called off the masterblaster move until next week... they were not going to be able to ship it until tomorrow, and that would probably mean weekend downtime
< rcurtin> better to just wait until next wednesday, and it should be up by friday
trapz has joined #mlpack
chenzhe has quit [Ping timeout: 246 seconds]
chenzhe has joined #mlpack
nish21 has joined #mlpack
nish21 has quit [Client Quit]
aashay has joined #mlpack
chenzhe has quit [Ping timeout: 246 seconds]
trapz has quit [Quit: trapz]
trapz has joined #mlpack
< sumedhghaisas> zoq: Hey Marcus... got some time? :P
< zoq> sumedhghais: Hey, sure go ahead :)
< sumedhghaisas> okay. So I am not sure about the method so maybe your comments would help me :)
< sumedhghaisas> What I thought at the start was that... MCTS does a fair enough job in most situations... although given a prior over the available actions it performs exceptionally well.
< sumedhghaisas> As can be seen from AlphaGo
< sumedhghaisas> Although for AlphaGo... the policy network is trained first with supervised learning, giving quality and fast gradients
< sumedhghaisas> and then using REINFORCE
< sumedhghaisas> But given a general task where such prestored data for supervised learning is not present.. can MCTS be used to train the primary policy network?
< zoq> I'm not completely sure, but AlphaGo uses something somewhat similar to MCTS, right?
< sumedhghaisas> Basically the pseudo-approach that I was thinking of is following...
< zoq> ah, okay
< sumedhghaisas> Initialize policy network
< sumedhghaisas> create a target policy network
< sumedhghaisas> use policy network to play games and target network with MCTS to predict the best state
< sumedhghaisas> store the action chosen by the policy network together with the action chosen by MCTS with the target network
< sumedhghaisas> use the pairs to train the network after each complete game
< sumedhghaisas> after certain number of games copy policy network to target network
< sumedhghaisas> and continue
< sumedhghaisas> this way the policy network is making the MCTS better with each network copy, and MCTS is in turn using the sampling technique to make the policy network better
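The loop described above could be sketched as the following toy Python program. Everything here is a hypothetical stand-in (PolicyNet, mcts_best_action, the dummy "search") rather than mlpack code or the actual proposal; it only illustrates the alternation between playing games, storing (state, MCTS action) pairs, training, and periodically refreshing the target network.

```python
# Toy sketch of the proposed training loop: a policy network is improved by
# imitating MCTS decisions that are guided by a frozen target copy.
# All names and the dummy "search" are illustrative assumptions.
import copy
import random

class PolicyNet:
    """Stand-in for a policy network over a fixed discrete action set."""
    def __init__(self, n_actions):
        self.prefs = [0.0] * n_actions

    def act(self, state):
        # Pick a currently preferred action (ties broken at random).
        best = max(self.prefs)
        return random.choice([a for a, p in enumerate(self.prefs) if p == best])

    def train(self, pairs):
        # Nudge preferences toward the MCTS-chosen actions.
        for _state, mcts_action in pairs:
            self.prefs[mcts_action] += 1.0

def mcts_best_action(target_net, state, n_actions):
    # Placeholder for an MCTS search that uses target_net as its prior.
    return (state + 1) % n_actions  # dummy deterministic "search" result

N_ACTIONS, GAMES_PER_COPY, TOTAL_GAMES = 3, 5, 20
policy = PolicyNet(N_ACTIONS)
target = copy.deepcopy(policy)          # create a target policy network

for game in range(TOTAL_GAMES):
    pairs = []
    for state in range(10):             # play one "game" of 10 states
        _ = policy.act(state)           # action the policy would take
        a_mcts = mcts_best_action(target, state, N_ACTIONS)
        pairs.append((state, a_mcts))   # store (state, MCTS action) pairs
    policy.train(pairs)                 # train after each complete game
    if (game + 1) % GAMES_PER_COPY == 0:
        target = copy.deepcopy(policy)  # periodically copy policy -> target
```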
< sumedhghaisas> Okay am I making any sense?
< zoq> Yeah, sounds a little bit like what MacKenzie-Leigh did in his "An ensemble agent for Ms Pac-Man" paper.
< zoq> If I remember right ...
< sumedhghaisas> ohhh did not know that... I will take a look
< sumedhghaisas> but is it ensemble learning?
< zoq> yeah, if I remember right he doesn't talk about a policy network or something similar, just some ways to improve the standard MCTS to avoid pincer moves.
govg has quit [Ping timeout: 260 seconds]
< sumedhghaisas> the hypothesis I am trying to test is that... given the high convergence of MCTS toward the computationally expensive value network... I believe such initial training of the policy network with MCTS might make the convergence of actor-critic faster
< zoq> your idea is way more sophisticated :)
< sumedhghaisas> I mean... I don't trust MCTS to do the entire job... just push the policy network up faster, then let the value network do its job... just like what they did in AlphaGo... but they had a master-game dataset, not MCTS
< sumedhghaisas> But I should look at the paper for related research
< zoq> I agree, that should influence the time to converge, really interesting idea
< zoq> Also, depending on the task, MCTS does really well
< sumedhghaisas> I also copied the target-network idea from Q-learning... that should remove the high variance in the gradients
< sumedhghaisas> yes... Also I looked at the theoretical background of MCTS
< sumedhghaisas> So given an initial policy... MCTS can only improve the policy, given the regret bound of UCT
< sumedhghaisas> thus the policy network has to be training in the right direction... maybe it would need some gradient clipping, but still
< sumedhghaisas> and if the policy network is training in the right direction... it has to improve MCTS further, as given a better prior the regret of UCT goes down... they proved it using the PUCT policy
< zoq> do you have a link to the paper?
< sumedhghaisas> sure... wait, let me see
< sumedhghaisas> this is the PUCT policy paper
< sumedhghaisas> Although I think I can improve this further... though I am having some problems with the concepts there
< sumedhghaisas> now the policy network outputs a distribution over the actions
< sumedhghaisas> MCTS at the end will assign a value for each action... with their bounds..
< sumedhghaisas> with some mathematical formulation, if I somehow create a rational policy from these MCTS values and their bounds
< sumedhghaisas> like probability distribution
< sumedhghaisas> then I can use KL-divergence as a loss function and try to reduce the error
< sumedhghaisas> between the probabilities given by policy network and policy defined by MCTS values
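One plausible formulation of this idea: soften the MCTS action values into a probability distribution (a softmax with a temperature is assumed here; the discussion leaves the exact mapping open) and minimize the KL divergence between the policy network's output and that distribution. A minimal sketch:

```python
import math

def mcts_values_to_policy(values, temperature=1.0):
    """Softmax over MCTS action values -> a probability distribution.
    Subtracting the max first keeps the exponentials numerically stable."""
    m = max(values)
    exps = [math.exp((v - m) / temperature) for v in values]
    z = sum(exps)
    return [e / z for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions; eps avoids log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
```

The KL term would then serve as the loss, with gradients flowing into the policy network; the temperature controls how sharply the MCTS values are trusted.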
< sumedhghaisas> but I only know that MCTS values are comparative... as in the best value will tell the best action... though I don't know if the other values are comparative as well
< sumedhghaisas> Do you know anything about this?
< zoq> nothing that pops up right now
< sumedhghaisas> I am happy that you like the idea... Do you think it can be improved in some other way? My supervisor is reviewing my first draft as well :)
< zoq> hm, I'm not sure, has anybody ever used PUCB for something like Go?
trapz has quit [Quit: trapz]
trapz has joined #mlpack
< sumedhghaisas> AlphaGo... thats where I got this paper :P
< zoq> ah neat
< zoq> I see, they use a variant of the PUCT method
< zoq> I have to read the paper, before I can think about improving the method.
chenzhe has joined #mlpack
govg has joined #mlpack
< zoq> sumedhghais: Btw. do you have any idea why the coverage decreased for commits that didn't change the code, only comments? like: https://github.com/mlpack/mlpack/commits/master?after=6b097f28d5317628130aede16f019d2abe37a268+34
chenzhe has quit [Ping timeout: 246 seconds]
vinayakvivek has quit [Quit: Connection closed for inactivity]
< sumedhghaisas> zoq: ahh that should not happen... hmmm
< sumedhghaisas> does the site provide comparison coverage?
< sumedhghaisas> the last commit you mean... the "fix typo" one?
< zoq> yes, I think I saw some other commits too
< zoq> here is another one: https://github.com/mlpack/mlpack/commits/master?after=6b097f28d5317628130aede16f019d2abe37a268+69 "Minor style fixes (80 columns, spaces between operations)."
< zoq> or "Add sanity check on data size."
< zoq> I haven't looked into the issue.
trapz has quit [Quit: trapz]
< zoq> Should I see only one file under "changed"?
< zoq> instead the last commit changed the coverage of 112 files
< sumedhghaisas> zoq: this doesn't make any sense... I also looked at the previous commits
< sumedhghaisas> the lines covered for prefixedoutstream is always 30
< sumedhghaisas> I don't think that file has changed for a long time now
< sumedhghaisas> but every time it shows either increase or decrease
< sumedhghaisas> so even the lines-covered number is showing changes... like +22 or -13, although the value is constant
< sumedhghaisas> I am not sure if I am sending the changes to the server or if it is calculating them on its own
< sumedhghaisas> I have to look at the dump the code is sending to the server
< zoq> yeah there is definitely something wrong