verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
chenzhe has joined #mlpack
< chenzhe> Hi, I just got a make error when I "make" the latest version of the code~
< chenzhe> Is this caused by "armadillo"?
< chenzhe> Scanning dependencies of target mlpack_arma_config
< chenzhe> [ 0%] Updating arma_config.hpp (if necessary)
< chenzhe> -- Regenerating arma_config.hpp.
< chenzhe> [ 0%] Built target mlpack_arma_config
< chenzhe> Scanning dependencies of target mlpack_headers
chenzhe has quit [Excess Flood]
chenzhe has joined #mlpack
< chenzhe> @rcurtin
stephentu has joined #mlpack
< chenzhe> here is the error message
< chenzhe> thanks~
chenzhe has quit [Ping timeout: 240 seconds]
< rcurtin> chenzhe: is that the up to date git master branch?
< rcurtin> on regression_distribution.cpp:28, the 'fitted' parameter should be declared as arma::rowvec, not arma::vec
< rcurtin> and this is the case in the current git master branch, but based on the compiler output it seems like that is not the case in the code that you have
< rcurtin> oh!
< rcurtin> now I see what the issue is
< rcurtin> it looks like you have mlpack installed locally, in /usr/local/include/
< rcurtin> so you can remove the locally installed mlpack from /usr/local/, and that will work as a temporary fix
< rcurtin> I opened https://github.com/mlpack/mlpack/issues/1013 to make a note to myself, but it is late so I will go to bed now and try to take a look tomorrow :)
stephentu has quit [Ping timeout: 240 seconds]
shikhar has joined #mlpack
Trion has joined #mlpack
govg has joined #mlpack
chenzhe has joined #mlpack
Trion has quit [Quit: Have to go, see ya!]
govg has quit [Ping timeout: 240 seconds]
shikhar has quit [Quit: WeeChat 1.4]
aashay has joined #mlpack
jenkins-mlpack has joined #mlpack
sheogorath27 has joined #mlpack
aashay has quit [Changing host]
aashay has joined #mlpack
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#2466 (master - a76fe5e : Marcus Edel): The build was broken.
travis-ci has left #mlpack []
vivekp has quit [Ping timeout: 268 seconds]
vivekp has joined #mlpack
< cult-> i am looking at kpca and i would like to write my own kernel. i have a resampling algorithm and i'm not sure if it should be written as a kernel, or if i should resample the series before applying plain pca. my features are linear, but i was thinking to have a multidimensional feature space to map the series in a non-linear manner.
< cult-> i am missing some fundamental concepts on kernels, and i would be happy to be enlightened.
< cult-> the about says: "Kernel Principal Components Analysis (optionally with sampling)" -- but where are the resampling routines?
< cult-> -re
sgupta has joined #mlpack
kris2 has joined #mlpack
shikhar has joined #mlpack
< sgupta> Hi Ryan. Since we have Jenkins successfully firing docker commands, what tests do you want me to run and what results are to be extracted? I have built an image with mlpack-2.2.0's dependencies already on the server.
stephentu has joined #mlpack
stephentu has quit [Ping timeout: 240 seconds]
vivekp has quit [Ping timeout: 260 seconds]
< rcurtin> sgupta: inside the docker container, you can check out and build mlpack, then run all the tests
< rcurtin> take a look at the 'mlpack - git commit test' for an idea of what needs to be done inside the build, and then see if you can get those things to run inside the container you've built
< kris2> Hi lozhnikov, you there
< kris2> ?
< rcurtin> first day of GSoC, I hope everyone is having a good day so far :)
< sgupta> I have built an image for mlpack. So, we can have a container running mlpack by running a single command
< rcurtin> yes, but we don't need a container that runs mlpack, we need a container that has the dependencies for mlpack so that we can then build mlpack inside the container
< sgupta> Okay. Sure. So, we want to build mlpack inside the container and then run tests, is that it?
< sgupta> Got your point.
< rcurtin> yes
< rcurtin> once we can do that, then we can build containers with different versions of the dependencies, or different compilers, etc.
< sgupta> Yes! And this one having the dependencies can be used as the base image.
< rcurtin> yeah, definitely; it might be a little tricky because we would need different versions of dependencies in each container, but I think the general idea will work, so we can figure it out when we get there :)
vivekp has joined #mlpack
sgupta has quit [Ping timeout: 240 seconds]
< lozhnikov> kris2: hi, i'll be at home late. You can post your questions here; i'll answer later (in the evening or tomorrow in the morning)
< kris2> Yes, i was working on the solution that you gave; i had a few problems with it
< kris2> i will make a gist and put my doubts in the comments
< kris2> and post it here
vivekp has quit [Ping timeout: 240 seconds]
vivekp has joined #mlpack
Trion has joined #mlpack
sgupta has joined #mlpack
shikhar has quit [Read error: Connection reset by peer]
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#2467 (master - 48676e8 : Ryan Curtin): The build is still failing.
travis-ci has left #mlpack []
shikhar has joined #mlpack
kris2 has quit [Quit: Leaving.]
Trion has quit [Quit: Have to go, see ya!]
< rcurtin> cult-: sorry for the slow response
< rcurtin> the sampling available for kernel PCA is the Nystroem method, found in src/mlpack/methods/nystroem_method/
< rcurtin> basically the Nystroem method samples some number of points from the dataset such that the kernel PCA result should be approximately the same
< rcurtin> you can see how the Nystroem method plugs into kernel PCA in src/mlpack/methods/kernel_pca/kernel_rules/nystroem_method.hpp (and naive_method.hpp for comparison)
< rcurtin> I'm not sure that I understand exactly what your resampling algorithm is though so I don't think I can comment further on that without more information
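For reference, a minimal sketch of plugging the Nystroem rule into kernel PCA, following the mlpack 2.x class names and headers mentioned above; the kernel choice, bandwidth, and matrix sizes are illustrative:

    #include <mlpack/core.hpp>
    #include <mlpack/core/kernels/gaussian_kernel.hpp>
    #include <mlpack/methods/kernel_pca/kernel_pca.hpp>
    #include <mlpack/methods/kernel_pca/kernel_rules/nystroem_method.hpp>

    using namespace mlpack;

    int main()
    {
      arma::mat data(10, 500, arma::fill::randu); // 10 features, 500 points

      // KernelPCA takes the kernel and the kernel rule as template parameters;
      // NystroemKernelRule samples points so the result approximates full KPCA.
      kpca::KernelPCA<kernel::GaussianKernel,
          kpca::NystroemKernelRule<kernel::GaussianKernel>> kernelPCA;
      kernelPCA.Apply(data, 5); // reduce the data to 5 dimensions in place
    }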
< cult-> the Evaluate method takes two columns (vectors), right?
< cult-> not just single data points
< rcurtin> for the KernelType? yes, it is evaluating the kernel between two points
< rcurtin> note that a lot of kernels are translation-invariant, like the Gaussian kernel
< rcurtin> I can express the Gaussian kernel two ways:
< rcurtin> K(a, b) = z * exp(-| a - b |^2 / bw^2)
< rcurtin> so it's just a function of the distance between two points
< rcurtin> I could also write:
< rcurtin> K(t) = z * exp(-t^2 / bw^2)
< rcurtin> where t is a scalar
< rcurtin> and pass t = |a - b| to get the same result
< rcurtin> so, maybe the kernel you have in mind, you can rewrite in the two argument form?
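As a concrete illustration of the two-argument form, a hedged sketch of a custom kernel with the Evaluate(a, b) signature that mlpack's KernelType policy expects; the class and parameter names are made up, and the scale z from the formula above is taken as 1:

    #include <mlpack/core.hpp>

    class MyGaussianKernel
    {
     public:
      MyGaussianKernel(const double bw = 1.0) : bw(bw) { }

      // K(a, b) = exp(-|a - b|^2 / bw^2) (with z = 1): translation-invariant,
      // so it only depends on the distance t = |a - b| between the two points.
      template<typename VecTypeA, typename VecTypeB>
      double Evaluate(const VecTypeA& a, const VecTypeB& b) const
      {
        const double t = arma::norm(a - b, 2);
        return std::exp(-(t * t) / (bw * bw));
      }

     private:
      double bw;
    };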
< cult-> i mean
< rcurtin> ah sorry maybe I answered the wrong thing :)
< cult-> the args are series right?
< rcurtin> each argument should be a multidimensional single point
< rcurtin> (or it could be one dimension---then the column vector would just have one element)
< cult-> yes, it's a column vector that holds the features
< cult-> or scores?
< rcurtin> ah, I am not sure I understand what you mean when you say that
< cult-> ok nvm, you already answered that the inputs are not single primitives encapsulated in a column, but potentially a column with more than one row
< cult-> i can see how it's called in naive_method.hpp
< cult-> data is the input data points, and from that, it extracts the columns
< cult-> but if you transpose the data so that columns represent rows and vice versa, then the inputs are not vectors but single data points
< cult-> so which one: a) Evaluate({1,2,3,4,5}, {5,6,7,8,9}) or b) Evaluate({1}, {4}) ?
< cult-> the kernel compares vector against vector or one data point against another datapoint?
< rcurtin> the rows in the data matrix should correspond to the features of your data
< rcurtin> so if your data is 10-dimensional, then it should have 10 rows
< rcurtin> and for your kernel you will pass in a 10-dimensional vector
< cult-> ok i see now
< rcurtin> yeah, so it is a little bit strange to think about, because usually in machine learning textbooks, each column corresponds to a feature
< cult-> right
< rcurtin> but in mlpack (because it uses Armadillo which uses LAPACK internally), each row corresponds to a feature
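A tiny sketch of that layout, assuming mlpack's stock GaussianKernel; the sizes and bandwidth are illustrative:

    #include <mlpack/core.hpp>
    #include <mlpack/core/kernels/gaussian_kernel.hpp>

    int main()
    {
      // 10 rows = 10 features; 100 columns = 100 points (column-major layout).
      arma::mat data(10, 100, arma::fill::randu);

      mlpack::kernel::GaussianKernel k(0.5);
      // The kernel is evaluated between two columns, i.e. two 10-dimensional points.
      const double value = k.Evaluate(data.col(0), data.col(1));
    }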
< cult-> ok so i can't do resampling in the kernel itself
< cult-> if i want to do it i have to write a custom kernel rule?
< rcurtin> no, not in the kernel itself---you would want to make a KernelRules class and do the resampling there
< rcurtin> yeah, you will have to write a custom one
< cult-> what about if i want to use the standard pca?
< rcurtin> hm, there is no sampling built in for regular PCA
< rcurtin> I guess, there is another option---
< cult-> or just use kpca with linear kernel?
< rcurtin> before calling KernelPCA or PCA at all, just sample your dataset in order to create a smaller dataset
< rcurtin> and pass the smaller dataset to KernelPCA or PCA
< rcurtin> if you want to do KernelPCA with a linear kernel, it should be faster to just do PCA
< cult-> so if i do sampling and then pca, is that equal to doing kpca with a linear kernel and applying a custom kernel rule instead of nystroem?
< cult-> where in the first i do the sampling before, and in the second i do it after
< rcurtin> yes, I believe those would be equivalent
< rcurtin> the first would probably run faster and be easier to implement :)
< rcurtin> I am assuming that your sampling strategy is essentially as simple as "we will take these k points out of the larger dataset that has N points and use only those k points for PCA/KPCA"
< cult-> is using either the naive rules or nystroem mandatory? can i run kpca without any of these rules?
< cult-> rcurtin: correct
< rcurtin> the naive rules is just standard KPCA, so you could use that
< rcurtin> but like I said, if you are using KPCA with a linear kernel you would get the same results with just regular PCA
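A minimal sketch of the pre-sampling approach being discussed, assuming the mlpack 2.x PCA interface; k and the matrix sizes are illustrative:

    #include <mlpack/core.hpp>
    #include <mlpack/methods/pca/pca.hpp>

    int main()
    {
      arma::mat data(10, 1000, arma::fill::randu); // 10 features, 1000 points

      // Take k of the N points at random, then run plain PCA on the subset.
      const size_t k = 100;
      arma::uvec order =
          arma::shuffle(arma::regspace<arma::uvec>(0, data.n_cols - 1));
      arma::mat sampled = data.cols(order.head(k));

      mlpack::pca::PCA pca;
      arma::mat transformed;
      pca.Apply(sampled, transformed);
    }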
< cult-> i feel like i want to do the sampling in the applykernelmatrix
< cult-> but maybe that's wrong
< rcurtin> you can do it there, if you like, it will just be a little bit more code to write :)
< rcurtin> the ApplyKernelMatrix() function expects you to return the transformed data, eigenvalues, and eigenvectors
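For completeness, a simplified skeleton of a custom kernel rule with that contract, shaped after naive_method.hpp (the real one also centers the kernel matrix and sorts the eigenvalues in descending order); the resampling step is left as a placeholder:

    #include <mlpack/core.hpp>

    template<typename KernelType>
    class ResamplingKernelRule
    {
     public:
      // KernelPCA calls this; it must fill transformedData, eigval, and eigvec.
      static void ApplyKernelMatrix(const arma::mat& data,
                                    arma::mat& transformedData,
                                    arma::vec& eigval,
                                    arma::mat& eigvec,
                                    const size_t /* rank, unused here */,
                                    KernelType kernel = KernelType())
      {
        // 1) Resample the data here (placeholder: this sketch uses all points).
        // 2) Build the kernel matrix: K(i, j) = kernel.Evaluate(point i, point j).
        arma::mat K(data.n_cols, data.n_cols);
        for (size_t i = 0; i < data.n_cols; ++i)
          for (size_t j = 0; j < data.n_cols; ++j)
            K(i, j) = kernel.Evaluate(data.col(i), data.col(j));

        // 3) Eigendecompose and project (simplified; see naive_method.hpp).
        arma::eig_sym(eigval, eigvec, K);
        transformedData = eigvec.t() * K;
      }
    };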
< cult-> ok then, i am not writing anything there, i'll just go with the pre-sampling
< cult-> too bad, i wanted to write my own kernel, but i didn't even know how i could improve it heh
aashay has quit [Quit: Connection closed for inactivity]
< rcurtin> fair enough, hopefully everything I said was helpful :)
< rcurtin> let me know if I can help with anything else
< cult-> it's always helpful
< rcurtin> sure, glad to help out :)
chenzhe has joined #mlpack
shikhar has quit [Read error: Connection reset by peer]
shikhar has joined #mlpack
kris2 has joined #mlpack
s1998 has joined #mlpack
s1998 has quit [Client Quit]
shikhar has quit [Quit: WeeChat 1.4]
< kris2> I want to create a class in which there will be a virtual function that any user would implement the way they want. Now, since mlpack discourages the use of virtual functions, is there any way to work around it?
< kris2> I think rcurtin did something similar for the sgd test functions, where when you define a test function it has to implement certain methods, for example NumFunctions()
< kris2> can anyone help with this issue?
< kris2> zoq:
< rcurtin> kris2: can you explain more of what you mean?
< rcurtin> chenzhe: I tried to reproduce your build issue on an OS X system I have, but I couldn't do it---can you tell me more about your setup there?
< zoq> Yeah, I'm not sure what you'd like to achieve, but it sounds like the policy-based design pattern could help?
< kris2> rcurtin: basically i am trying to create an interface like this: template<typename InputType = double, typename OutputType = double, typename ActivationFunction = GaussianDistribution> class CustomDistribution
< kris2> Where CustomDistribution would have a function sample and some other functions
< rcurtin> what is the activation function? I'm not sure what that means in this context
< kris2> so the activation could be any standard distribution like normal, binomial
< kris2> what i am confused about: suppose i want normal(x1+x2+x3, 1), where x1, x2 and x3 are elements of the std::vector<InputType>
< rcurtin> if you need a normal distribution, why not just use the GaussianDistribution class that already exists? I think maybe there is something I don't understand about the situation here
< kris2> ActivationFunction can take any distribution actually, not just GaussianDistribution
< zoq> I'm not sure I get the problem; would you like to call e.g. the Sample function of the ActivationFunction, where ActivationFunction is some distribution, e.g. GaussianDistribution?
< kris2> zoq: Exactly, but i want to instantiate the ActivationFunction (GaussianDistribution) with some user-defined f(input_vector) -> R.
< kris2> Does that make sense to you?
mentekid has quit [Quit: Leaving.]
< kris2> the function f has to be userdefined
< zoq> I guess, hold on
mentekid has joined #mlpack
< rcurtin> yeah, I feel like instead of providing some class that can take any user defined function f(input)->R, the user must provide a class that implements that function f(input)->R
< rcurtin> just like the MetricType policy or KernelType policy
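A sketch of that policy pattern, with made-up names: instead of a virtual f(input) -> R, the user writes a class with a matching method and passes it as a template parameter:

    #include <mlpack/core.hpp>

    // User-provided policy: any class with this Evaluate signature works.
    class SumMeanFunction
    {
     public:
      double Evaluate(const arma::vec& input) const
      {
        return arma::accu(input); // f(x1, ..., xn) = x1 + ... + xn
      }
    };

    template<typename MeanFunction = SumMeanFunction>
    class CustomDistribution
    {
     public:
      double Mean(const arma::vec& input) const
      {
        // Resolved at compile time; no virtual dispatch is needed.
        return meanFunction.Evaluate(input);
      }

     private:
      MeanFunction meanFunction;
    };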
< kris2> rcurtin: i will have a look at them
mikeling has quit [Quit: Connection closed for inactivity]
< chenzhe> rcurtin: I installed mlpack from "brew", so it had headers in /usr/local/include/; I just removed them and it works now~ Thanks for the help!
kris2 has quit [Ping timeout: 240 seconds]
kris has joined #mlpack
< chenzhe> if you try to install the old version by "brew", you might be able to reproduce my build problem
< rcurtin> chenzhe: ok, I see---I installed similarly (though not through brew)
< rcurtin> did you also install your compiler through brew?
< rcurtin> or is that XCode or something?
< chenzhe> the packages like armadillo are installed by brew
< chenzhe> let me check
< chenzhe> brew info armadillo
< chenzhe> homebrew/science/armadillo: stable 7.900.1 (bottled)
< chenzhe> C++ linear algebra library
< chenzhe> /usr/local/Cellar/armadillo/7.800.1 (524 files, 13.7MB)
< chenzhe> Built from source on 2017-03-18 at 12:37:18 with: --with-hdf5
< chenzhe> /usr/local/Cellar/armadillo/7.800.2 (524 files, 13.7MB)
< chenzhe> Built from source on 2017-03-25 at 19:06:49 with: --with-hdf5
< chenzhe> /usr/local/Cellar/armadillo/7.900.1 (525 files, 13.8MB) *
< chenzhe> Poured from bottle on 2017-05-29 at 19:59:54
< chenzhe> g++ --version
< chenzhe> Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/c++/4.2.1
< chenzhe> Apple LLVM version 8.1.0 (clang-802.0.42)
< chenzhe> Target: x86_64-apple-darwin16.6.0
< chenzhe> Thread model: posix
< chenzhe> InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
< rcurtin> ah, ok, so the compiler comes from xcode
< chenzhe> yes
< rcurtin> ok, let me see if I can reproduce with that information... I'll try setting up brew, and maybe brew is changing the default include order
kris has quit [Quit: Leaving.]
kris has joined #mlpack
chenzhe has quit [Ping timeout: 272 seconds]
< zoq> kris: Have you seen my message with the code snippet?
< kris> zoq: I have actually
< kris> i just had one question though
< kris> Okay, can we not have template<typename ActivationFunction = GaussianDistribution, typename MeanFunction = Sigmoid> class MyFunction
< kris> where the MeanFunction is used for instantiating the GaussianDistribution using the input vector
< zoq> you mean GaussianDistribution(MeanFunction(...))?
< kris> The InputType vector would be a vector of variables, for example the visible layer neurons in the rbm
< kris> yes
< kris> zoq:
< kris> So something like this GaussianDistribution(MeanFunction(inputType vector), 1);
< zoq> sure, you can do that: I guess in this case ActivationFunction(MeanFunction(...), ..)
< kris> Yes thats what i mean
< kris> Thanks
< zoq> and MeanFunction is an object instantiation of some class? or does it return e.g. arma::mat?
< zoq> so do you plan to call a function of the MeanFunction class inside the ActivationFunction?
< kris> Actually, that's where my doubt is, i think, because every activation function / distribution would have a constructor taking a different number of arguments.
< kris> So just doing ActivationFunction(MeanFunction(), ...) won't work
< kris> i guess
< kris> I think that's where your solution makes more sense
< zoq> Maybe you can provide a unified constructor, where you provide all necessary parameters but only use the parameters relevant for that particular function.
< kris> Can you give an example? is this similar to variadic templates?
chenzhe has joined #mlpack
< zoq> Actually I was thinking of something much simpler: https://gist.github.com/zoq/2143ce30da4b28e9c44751c0324b4aef just added some comments
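The gist itself is not reproduced in the log; as an illustration only, a hypothetical sketch of the "unified constructor" idea from above, where every distribution accepts the same parameter list and ignores what it does not use:

    #include <mlpack/core.hpp>

    class NormalLike
    {
     public:
      // Unified signature (mean, stddev): both parameters are used.
      NormalLike(const double mean, const double stddev) :
          mean(mean), stddev(stddev) { }
     private:
      double mean, stddev;
    };

    class BernoulliLike
    {
     public:
      // Same unified signature: the second parameter is accepted but ignored.
      BernoulliLike(const double mean, const double /* unused */) : mean(mean) { }
     private:
      double mean;
    };

    template<typename ActivationFunction = NormalLike>
    class CustomDistribution
    {
     public:
      // The same construction works for any distribution honoring the unified
      // constructor, e.g. dist(f(input), 1) with f(input) = sum of the inputs.
      CustomDistribution(const arma::vec& input) : dist(arma::accu(input), 1.0) { }
     private:
      ActivationFunction dist;
    };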
< kris> mccormick's blog is quite popular btw
< rcurtin> ah yeah, I saw that one, I did not know that mccormick's blog was popular though
< kris> Well not Karpathy level famous.....:-D
< rcurtin> :)
mentekid has quit [Quit: Leaving.]
kris has quit [Quit: Leaving.]
kris has joined #mlpack
< kris> zoq: I made some comments here, could you please have a look? https://gist.github.com/zoq/2143ce30da4b28e9c44751c0324b4aef