verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
syrius has joined #mlpack
_jason_ has joined #mlpack
< _jason_> naywhayare: I am trying to use the mlpack kernel_pca executable. I am passing these flags: --input_file ../iris_norm.txt -d 2 --kernel 'linear' --output_file kpcaout.txt. The input dataset is 150 rows x 4 cols. If the input file is 150 rows by 4 columns, the kernel matrix inside of the mlpack code ends up being 150x150, which I think is not correct; I think it should be 4x4. The output to the file in this case is 150x2, which seems correct.
< _jason_> I thought maybe I needed to transpose the input, so I created a version of the input file with 4 rows and 150 columns. In this version the kernel matrix is 4x4, which I think is correct, but the output sent to the output file is 4x2, which does not seem right unless I am supposed to get my transformed matrix by multiplying the original matrix by the 4x2 matrix returned from the mlpack code. Which way is correct? Or am I missing something else? Thanks!
< naywhayare> hi _jason_, does your input file contain 150 observations/points, or 4 observations/points?
< _jason_> 150 observations 4 variables
< naywhayare> okay
< naywhayare> so in this case, the kernel matrix actually should be 150x150
< naywhayare> since the kernel matrix is (number of observations) * (number of observations)
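(A minimal NumPy sketch of the point above, not mlpack code itself: with a linear kernel the Gram matrix is X X^T, so its size is set by the number of observations, not the number of variables.)

```python
import numpy as np

# 150 observations with 4 variables each, as in the example discussed above
X = np.random.rand(150, 4)

# linear kernel: k(x, y) = x . y, so the full kernel (Gram) matrix is X X^T
K = X @ X.T

print(K.shape)  # (observations) x (observations)
```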
< naywhayare> in your case, you've chosen the linear kernel, where it turns out kernel pca is exactly equivalent to regular PCA
< naywhayare> and regular PCA would work on a 4x4 covariance matrix
< naywhayare> so honestly I'd say here you'd be better off just using the 'pca' executable; I think you'd get the same results faster
< _jason_> Thanks for answering! I realize that the linear kernel is equivalent to regular PCA. I was just setting up a simple example to make sure I understood what was going on and was using it correctly
< _jason_> I expected (incorrectly) that kernel PCA also worked on a num_vars by num_vars matrix
< naywhayare> ah, okay
< naywhayare> yeah, kernel PCA uses an NxN matrix where each entry is the kernel evaluation between two points, and it turns out that these are sufficient to represent the eigenvectors in kernel space (for whatever kernel is chosen)
< _jason_> Needing an observations by observations matrix is a bummer for some data sets. Thank you again for the help. I very much appreciate your answers!
< naywhayare> yep, it is
< naywhayare> but, in those cases, what is often used are sampling schemes
< naywhayare> such as the Nystroem method, which zoq wrote an implementation of
< naywhayare> the "-n" option (--nystroem_method) can be used with the kernel_pca executable for that purpose
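(A rough NumPy sketch of the idea behind the Nystroem method, not zoq's mlpack implementation: sample m landmark points and approximate the full NxN kernel matrix as K_nm K_mm^+ K_mn, so the NxN matrix never has to be eigendecomposed directly.)

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((150, 4))

# Full linear kernel matrix: 150x150
K = X @ X.T

# Nystroem: sample m landmark points, then approximate
# K ~ K_nm @ pinv(K_mm) @ K_nm.T
m = 10
idx = rng.choice(len(X), size=m, replace=False)
K_nm = X @ X[idx].T          # kernel between all points and landmarks (150 x m)
K_mm = X[idx] @ X[idx].T     # kernel among the landmarks (m x m)

K_approx = K_nm @ np.linalg.pinv(K_mm) @ K_nm.T

# For a linear kernel on 4-dimensional data, K has rank <= 4, so even a
# small landmark set reconstructs it almost exactly
print(np.allclose(K, K_approx, atol=1e-6))
```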
< naywhayare> I have to step out for a meeting... I'll be back in probably 30-45 minutes
< _jason_> Thanks! You're amazing!
< naywhayare> sure, no problem
< naywhayare> that was a very short meeting... "come back tomorrow"
< _jason_> short meetings are the best kind :)
< naywhayare> I wholeheartedly agree