verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
chenzhe_ has joined #mlpack
chenzhe_ has quit [Remote host closed the connection]
kris1 has quit [Quit: Leaving.]
kris1 has joined #mlpack
< kris1> I am going along with the layer-wise implementation that you suggested. If it looks alright, please let me know.
< kris1> zoq: if you have time, could you also have a look at it? That's how we are planning to implement the Gibbs layer for any given distribution.
kris1 has left #mlpack []
chenzhe has quit [Ping timeout: 260 seconds]
chenzhe has joined #mlpack
chenzhe has quit [Ping timeout: 260 seconds]
chenzhe has joined #mlpack
chenzhe has quit [Ping timeout: 255 seconds]
Trion has joined #mlpack
chenzhe has joined #mlpack
< chenzhe> Hi! I think I need a linear programming library for the Frank-Wolfe algorithm; does anyone have any recommendations?
chenzhe has quit [Quit: chenzhe]
chenzhe has joined #mlpack
chenzhe has quit [Ping timeout: 245 seconds]
chenzhe has joined #mlpack
chenzhe has quit [Ping timeout: 245 seconds]
aashay has joined #mlpack
chenzhe has joined #mlpack
shikhar has joined #mlpack
chenzhe has quit [Ping timeout: 240 seconds]
shikhar_ has joined #mlpack
shikhar has quit [Ping timeout: 268 seconds]
shikhar_ has quit [Quit: WeeChat 1.4]
vivekp has quit [Ping timeout: 240 seconds]
< zoq> kris1: Looks reasonable to me, there are some things that I think could be improved like line 67 as Mikhail already pointed out.
vivekp has joined #mlpack
Trion has quit [Ping timeout: 260 seconds]
kris1 has joined #mlpack
< kris1> lozhnikov: I have commented on the gist, please have a look
< kris1> Maybe we could discuss the use of the mean and variance function
sgupta has quit [Ping timeout: 260 seconds]
< rcurtin> ironstark: hey, I think you might be working with mlpy now... but unfortunately mlpy hasn't been updated since 2012 and causes Python to segfault on exit once it's imported
< rcurtin> so I think that maybe we should drop that library from the benchmarking system because of that problem
< ironstark> rcurtin: Agreed. So this week I will start working with maybe MATLAB or Shogun
< rcurtin> sure, sounds good to me :)
< rcurtin> sorry for the confusion on that one; I only found that out about a week ago, and I hadn't remembered it was first on your list in your proposal
< cult-> rcurtin: i have this problem that kpca even with linear kernel takes too much time ( i don't even know ) when plain pca is instant
< cult-> what am i missing?
< cult-> *don't even know how much time, but it has been running for several minutes now
< cult-> 13k observations in 4 dimensions
< cult-> i cancelled it at 10 minutes, it still didn't complete the computation. i really want to use kpca but i don't understand why it's so slow
< rcurtin> KPCA assembles the full kernel matrix, which is size (n_observations * n_observations)
< rcurtin> so that's 13k^2 evaluations of the kernel in your case
< rcurtin> and a huge amount of memory usage
< rcurtin> but PCA only assembles the covariance matrix of the data, which is (n_dims * n_dims)
< rcurtin> so only 16 elements in your case
< rcurtin> and then if you are using KPCA, that 13k x 13k matrix has to be eigendecomposed
< rcurtin> whereas with regular PCA it's just the 4x4 matrix to eigendecompose
< rcurtin> so PCA is a lot faster overall
< cult-> uh
< rcurtin> like I said earlier, KPCA with the linear kernel is equivalent to PCA, so if you're just using the linear kernel I'd stick with PCA
< rcurtin> but if you are planning to use a different kernel, then it is probably good to sample your data
< cult-> could centerTransformedData speed up the process?
< rcurtin> no, that's just a step that's performed to the output after the KPCA process is complete
< rcurtin> unfortunately really the only easy way to accelerate KPCA is to sample the dataset
< cult-> ok
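For reference, the cost gap above is substantial: a 13,000 x 13,000 kernel matrix alone is about 1.35 GB of doubles, while a 4 x 4 covariance matrix is trivial. Below is a minimal sketch, assuming the mlpack 2.x API (mlpack::pca::PCA and mlpack::kpca::KernelPCA; exact signatures may differ between versions), of plain PCA on the full dataset versus kernel PCA on a random subsample. The file name, target dimension, and sample size of 2000 are placeholders.

```cpp
#include <mlpack/core.hpp>
#include <mlpack/methods/pca/pca.hpp>
#include <mlpack/methods/kernel_pca/kernel_pca.hpp>

using namespace mlpack;

int main()
{
  arma::mat data;
  data::Load("dataset.csv", data, true); // 4 dimensions x 13k observations

  // Plain PCA on the full dataset: only a 4x4 covariance matrix to decompose.
  arma::mat pcaData = data;
  pca::PCA p;
  p.Apply(pcaData, 2); // reduce to 2 dimensions, in place

  // Kernel PCA on a random subsample: the kernel matrix is sampleSize^2,
  // so 2000 points gives a 2000x2000 matrix instead of 13k x 13k.
  const size_t sampleSize = 2000;
  arma::mat shuffled = arma::shuffle(data, 1); // shuffle columns (observations)
  arma::mat sampled = shuffled.head_cols(sampleSize);

  arma::mat kpcaOut;
  kpca::KernelPCA<kernel::GaussianKernel> k;
  k.Apply(sampled, kpcaOut, 2);

  return 0;
}
```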
< cult-> another question for gmm
< rcurtin> there are other approaches like the Kernel Hebbian Algorithm and other things, but those aren't implemented in mlpack and would take a long time to implement and test...
< rcurtin> (and it's also not fully clear how well they would scale)
< cult-> can I do something about GMM to make it better? can I resample, use some kernel, or do something to get a better estimate?
< rcurtin> hmm, we don't have GMM estimation in kernel space so I don't think there's any easy route there
< rcurtin> typically the quality of the GMM estimate comes down to the initial estimate the EM algorithm starts with (currently done via k-means)
< rcurtin> and the number of Gaussians that are chosen for the models
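Those two knobs are both exposed when the model is built. A minimal sketch, assuming the mlpack 2.x GMM API: the number of Gaussians is fixed at construction, and Train() can be asked for several EM trials so the best k-means initialization (by final log-likelihood) is kept. The values used here (3 Gaussians, 5 trials, the file name) are purely illustrative.

```cpp
#include <mlpack/core.hpp>
#include <mlpack/methods/gmm/gmm.hpp>
#include <iostream>

using namespace mlpack;

int main()
{
  arma::mat data;
  data::Load("dataset.csv", data, true);

  // The number of Gaussians is chosen up front.
  gmm::GMM gmm(3 /* gaussians */, data.n_rows /* dimensionality */);

  // Train() runs EM from a k-means initialization; asking for several trials
  // re-runs the fit and keeps the model with the best log-likelihood.
  const double logLikelihood = gmm.Train(data, 5 /* trials */);
  std::cout << "final log-likelihood: " << logLikelihood << std::endl;

  return 0;
}
```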
< cult-> then before I feed my GMM model, what can I do about the distribution? like, what if I resample n times and give all those samples as mixtures
< cult-> is that a bad idea?
< rcurtin> I'm not sure I understand the idea, can you clarify a bit please?
< cult-> so instead of kde we only have gmm to estimate density
< cult-> i just want to make a better estimation
< cult-> my idea was to resample x times and give each sampled vector as input from the resample algorithm to gmm as dimensions
< cult-> nvm, i guess this is a question against the literature
< cult-> the idea is just to improve gmm somehow
< rcurtin> do you mean resampling like bootstrapping?
< cult-> yes
< rcurtin> ah, right, so you bootstrap x times and then average the GMM predictions or something
< cult-> for example yes
< rcurtin> I suspect that could give better results, but I wonder if that would take longer than just a simple for-loop KDE calculation
< rcurtin> i.e.
< rcurtin> f(x) = sum_{p in dataset} K(x, p)
< rcurtin> but note that that's a different thing entirely than kernel PCA
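A minimal sketch of that for-loop KDE, using mlpack's GaussianKernel. The bandwidth is an arbitrary choice, and since mlpack's kernel omits its normalizing constant the result is only proportional to a true density.

```cpp
#include <mlpack/core.hpp>
#include <mlpack/core/kernels/gaussian_kernel.hpp>

using namespace mlpack;

// f(x) = (1 / n) * sum_{p in dataset} K(x, p)
double KdeEstimate(const arma::mat& dataset,
                   const arma::vec& x,
                   const kernel::GaussianKernel& kernel)
{
  double sum = 0.0;
  for (size_t i = 0; i < dataset.n_cols; ++i)
    sum += kernel.Evaluate(x, dataset.col(i));

  // Unnormalized: multiply by the kernel's normalizing constant if a proper
  // probability density is needed.
  return sum / dataset.n_cols;
}
```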
< cult-> what if i provide the resampled series as separate gaussians to the gmm?
< cult-> i know its not related to kernel pca, its another question right now
< rcurtin> ok, sure
< rcurtin> there's no good way to specify which gaussian a point comes from in GMM estimation
< rcurtin> the estimation only assumes that you have some dataset and the points come from some GMM with the given number of Gaussians, and then it's up to the estimation procedure to figure out how to fit those Gaussians to the data
< rcurtin> providing the resampled series as separate Gaussians to me would just mean training individual gaussians via GaussianDistribution::Estimate()
< rcurtin> (or I forget maybe that method is called Train())
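A minimal sketch of that route, assuming the method is GaussianDistribution::Train() as in mlpack 2.x (the older name was Estimate()): fit one Gaussian per resampled series directly, instead of letting GMM's EM step decide the assignments. The container of resampled series is a placeholder produced elsewhere.

```cpp
#include <mlpack/core.hpp>
#include <mlpack/core/dists/gaussian_distribution.hpp>
#include <vector>

using namespace mlpack;

int main()
{
  // Suppose each resampled series is a d x n matrix of observations,
  // produced elsewhere by the resampling step.
  std::vector<arma::mat> resampledSeries;

  std::vector<distribution::GaussianDistribution> gaussians;
  for (const arma::mat& series : resampledSeries)
  {
    distribution::GaussianDistribution g(series.n_rows);
    g.Train(series); // maximum-likelihood mean and covariance of this series
    gaussians.push_back(g);
  }

  return 0;
}
```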
< cult-> ok
< cult-> what about the first idea that you said:
< cult-> I bootstrap x times and average those predictions?
< cult-> so
< cult-> no, nevermind i got it
< cult-> the question is what should I bootstrap: the input data used for training, or the output probability values?
aashay has quit [Quit: Connection closed for inactivity]
< cult-> i think i lost it, i'm a bit tired now, will read up on some techniques
shikhar has joined #mlpack
< rcurtin> ah, sorry, got distracted
< rcurtin> I think you should bootstrap the input data, if you are planning on trying that approach
< cult-> thanks
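A minimal sketch of that bootstrap idea, assuming the mlpack 2.x GMM API: resample columns of the input data with replacement, train one GMM per bootstrap sample, and average the probability each model assigns to a query point. The number of bootstraps and Gaussians here are arbitrary illustrations.

```cpp
#include <mlpack/core.hpp>
#include <mlpack/methods/gmm/gmm.hpp>

using namespace mlpack;

double BootstrappedProbability(const arma::mat& data,
                               const arma::vec& query,
                               const size_t bootstraps = 10,
                               const size_t gaussians = 3)
{
  double sum = 0.0;
  for (size_t b = 0; b < bootstraps; ++b)
  {
    // Draw n column indices with replacement (one bootstrap sample).
    const arma::uvec indices = arma::randi<arma::uvec>(
        data.n_cols, arma::distr_param(0, (int) data.n_cols - 1));
    const arma::mat sample = data.cols(indices);

    // Fit a GMM to this bootstrap sample and accumulate its density estimate.
    gmm::GMM gmm(gaussians, data.n_rows);
    gmm.Train(sample);
    sum += gmm.Probability(query);
  }
  return sum / bootstraps;
}
```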
chenzhe has joined #mlpack
shikhar has quit [Quit: WeeChat 1.4]
chenzhe has quit [Ping timeout: 260 seconds]
< kris1> lozhnikov: Please have a look, I added some comments
< lozhnikov> kris1: yes, I have already replied
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#2478 (master - e6042e7 : Ryan Curtin): The build was fixed.
travis-ci has left #mlpack []
chenzhe has joined #mlpack
chenzhe has quit [Ping timeout: 260 seconds]
chenzhe has joined #mlpack
< rcurtin> chenzhe: I am not sure of any good linear programming library out there; I think the better idea might be to implement a technique with Armadillo; what do you think?
< chenzhe> That would be quite a complicated task. There seems to be a GNU package called glpk, but not in Armadillo for sure~
< chenzhe> I will finish other parts like OMP first, and maybe add some simple LP code with Armadillo later
< rcurtin> yeah, adding a dependency could be difficult; glpk seems like a possibility but I am not sure how available or maintained it is
< rcurtin> for a first pass I don't think the LP solver you would implement with Armadillo would need to be amazing; it could be simple and functional, to keep it from taking too much time
< rcurtin> not sure if stephentu has any other ideas
< chenzhe> I will ask him later
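One possible shortcut, sketched here with plain Armadillo (not an existing mlpack API): for common Frank-Wolfe constraint sets such as the l1-ball, the linear subproblem is minimized at a single vertex of the set, so no general-purpose LP solver is needed for that case.

```cpp
#include <armadillo>

// Frank-Wolfe linear subproblem over the l1-ball ||s||_1 <= tau:
//   s = argmin_{||s||_1 <= tau} <gradient, s>
// The minimizer puts all the mass tau on the coordinate with the largest
// absolute gradient, with sign opposite to that gradient entry.
arma::vec LinearMinimizerL1Ball(const arma::vec& gradient, const double tau)
{
  arma::vec s(gradient.n_elem, arma::fill::zeros);
  const arma::uword i = arma::index_max(arma::abs(gradient));
  s(i) = (gradient(i) > 0 ? -tau : tau);
  return s;
}
```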
aashay has joined #mlpack
chenzhe has quit [Read error: Connection reset by peer]
chenzhe1 has joined #mlpack
chenzhe1 is now known as chenzhe
klitzy has joined #mlpack
klitzy is now known as dhama
dhama_ has joined #mlpack
dhama_ has quit [Client Quit]
chenzhe has quit [Ping timeout: 268 seconds]
mikeling has quit [Quit: Connection closed for inactivity]
dhama has quit [Quit: dhama]
aashay has quit [Quit: Connection closed for inactivity]
chenzhe has joined #mlpack
chenzhe has quit [Ping timeout: 240 seconds]
sumedhghaisas has quit [Ping timeout: 240 seconds]
chenzhe has joined #mlpack
< cult-> as soon as I add more dimensions to the GMM, my predict always fails with 0
< cult-> *prediction by probability
vivekp has quit [Ping timeout: 255 seconds]
< rcurtin> does it fail, or is the value just extremely small?
< rcurtin> PDFs tend to have much smaller values in higher-dimensional spaces
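A quick worked illustration of that point: the peak density of an isotropic unit-variance Gaussian is (2*pi)^(-d/2), roughly 0.399 in 1-D but only about 0.0253 in 4-D, and it keeps shrinking with dimension, so small probabilities can easily display as 0. Working in log space (e.g. GaussianDistribution::LogProbability()) avoids the underflow.

```cpp
#include <cmath>
#include <iostream>

int main()
{
  // Peak density (2*pi)^(-d/2) of a unit-variance isotropic Gaussian.
  for (int d = 1; d <= 16; ++d)
  {
    std::cout << "d = " << d << "  peak density = "
              << std::pow(2.0 * M_PI, -d / 2.0) << std::endl;
  }
  return 0;
}
```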
vivekp has joined #mlpack
chenzhe has quit [Quit: chenzhe]
chenzhe has joined #mlpack
chenzhe has quit [Ping timeout: 246 seconds]
chenzhe has joined #mlpack