verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
syrius has joined #mlpack
_jason_ has joined #mlpack
< _jason_>
#naywhayare I am trying to use the mlpack kernel_pca executable. I am passing these flags: --input_file ../iris_norm.txt -d 2 --kernel 'linear' --output_file kpcaout.txt. The input dataset is 150 rows x 4 cols. If the input file is 150 rows by 4 columns, the kernel matrix inside of the mlpack code ends up being 150x150 which I think is not correct. I think it should be 4x4. The output to the file in this case is 150x2 which seems correct. I thought maybe I needed to transpose the input, so I created a version of the input file with 4 rows and 150 columns. In this version the kernel matrix is 4x4 which I think is correct, but the output sent to the output file is 4x2, which does not seem right unless I am supposed to get my transformed matrix by multiplying the original matrix times the 4x2 matrix returned from the mlpack code. Which way is correct? Or am I missing something else? Thanks!
< naywhayare>
hi _jason_, does your input file contain 150 observations/points, or 4 observations/points?
< _jason_>
150 observations 4 variables
< naywhayare>
okay
< naywhayare>
so in this case, the kernel matrix actually should be 150x150
< naywhayare>
since the kernel matrix is (number of observations) x (number of observations)
< naywhayare>
in your case, you've chosen the linear kernel, where it turns out kernel pca is exactly equivalent to regular PCA
< naywhayare>
and regular PCA would work on a 4x4 covariance matrix
< naywhayare>
so honestly I'd say here you'd be better off just using the 'pca' executable; I think you'd get the same results faster
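The shapes and the equivalence described above can be checked with a small NumPy sketch. This is not mlpack code: the random 150x4 matrix is only a stand-in for the dataset in the question, rows are assumed to be observations, and the variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))      # stand-in data: 150 observations, 4 variables
Xc = X - X.mean(axis=0)            # center each variable

# Regular PCA: eigendecomposition of the 4x4 covariance matrix.
cov = Xc.T @ Xc / (len(Xc) - 1)
cov_vals, cov_vecs = np.linalg.eigh(cov)
top = np.argsort(cov_vals)[::-1][:2]
pca_proj = Xc @ cov_vecs[:, top]   # 150x2 projection

# Kernel PCA with the linear kernel: eigendecomposition of the
# 150x150 kernel matrix built from the same centered data.
K = Xc @ Xc.T
k_vals, k_vecs = np.linalg.eigh(K)
ktop = np.argsort(k_vals)[::-1][:2]
kpca_proj = k_vecs[:, ktop] * np.sqrt(k_vals[ktop])   # 150x2 projection

# The two projections agree up to the sign of each component.
same_up_to_sign = np.allclose(np.abs(pca_proj), np.abs(kpca_proj))
print(cov.shape, K.shape, kpca_proj.shape, same_up_to_sign)
```

Both routes give a 150x2 result, but the linear-kernel route has to decompose a 150x150 matrix where regular PCA only needs a 4x4 one.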
< _jason_>
Thanks for answering! I realize that the linear kernel is equivalent to regular PCA. I was just setting up a simple example to make sure I understood what was going on and was using it correctly.
< _jason_>
I expected (incorrectly) that kernel PCA also worked on a num_vars by num_vars matrix
< naywhayare>
ah, okay
< naywhayare>
yeah, kernel PCA uses an NxN matrix where each entry is the kernel evaluation between two points, and it turns out that these are sufficient to represent the eigenvectors in kernel space (for whatever kernel is chosen)
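As a rough illustration of the NxN kernel matrix for a non-linear kernel, here is a minimal NumPy sketch using a Gaussian kernel. The function names, the bandwidth value, and the random data are assumptions for the example, not mlpack's implementation or defaults.

```python
import numpy as np

def gaussian_kernel_matrix(X, bandwidth=1.0):
    """N x N matrix whose (i, j) entry is the kernel evaluation between points i and j."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # squared distances
    d2 = np.maximum(d2, 0.0)                           # clip tiny negative round-off
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def kernel_pca(K, d):
    """Project the N points onto the top d components of the centered kernel matrix."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n                # centering matrix
    Kc = J @ K @ J                                     # center in kernel space
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:d]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))                          # stand-in data, rows = observations
K = gaussian_kernel_matrix(X)
Y = kernel_pca(K, 2)
print(K.shape, Y.shape)
```

The kernel matrix size depends only on the number of observations, so the same 150x150 shape appears regardless of which kernel is plugged in.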
< _jason_>
Needing an observations by observations matrix is a bummer for some data sets. Thank you again for the help. I very much appreciate your answers!
< naywhayare>
yep, it is
< naywhayare>
but, in those cases, what is often used are sampling schemes
< naywhayare>
such as the Nystroem method, which zoq wrote an implementation of
< naywhayare>
the "-n" option (--nystroem_method) can be used with the kernel_pca executable for that purpose
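The general idea behind Nystroem-style sampling can be sketched in NumPy as follows. This shows the textbook low-rank approximation K ≈ C W⁺ Cᵀ with uniformly sampled landmark points; it is an assumption-laden illustration, and the sampling scheme and details used by the mlpack implementation may differ. The helper names and the choice of 20 landmarks are hypothetical.

```python
import numpy as np

def linear_kernel(A, B):
    """Kernel evaluations between every row of A and every row of B."""
    return A @ B.T

def nystroem_factors(X, m, kernel, rng):
    """Sample m landmark points and return the N x m and m x m kernel blocks."""
    idx = rng.choice(len(X), size=m, replace=False)
    C = kernel(X, X[idx])          # N x m: every point against the landmarks
    W = C[idx]                     # m x m: landmarks against each other
    return C, W

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))      # stand-in data, rows = observations
C, W = nystroem_factors(X, 20, linear_kernel, rng)

# Only C and W need to be computed; the products below are formed
# purely to compare against the full kernel matrix.
K_approx = C @ np.linalg.pinv(W, rcond=1e-10) @ C.T
K_full = linear_kernel(X, X)
rel_error = np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full)
print(C.shape, W.shape, rel_error)
```

With a linear kernel on 4 variables the full kernel matrix has rank at most 4, so 20 landmarks recover it almost exactly. For other kernels the result is only an approximation whose quality depends on how many points are sampled and how they are chosen.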
< naywhayare>
I have to step out for a meeting... I'll be back in probably 30-45 minutes
< _jason_>
Thanks! You're amazing!
< naywhayare>
sure, no problem
< naywhayare>
that was a very short meeting... "come back tomorrow"