verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
chenzhe has joined #mlpack
< chenzhe> Hi, I just got a make error when I "make" the latest version of the code~
< chenzhe> Is this caused by "armadillo"?
< chenzhe> Scanning dependencies of target mlpack_arma_config
< chenzhe> [ 0%] Updating arma_config.hpp (if necessary)
< chenzhe> -- Regenerating arma_config.hpp.
< chenzhe> [ 0%] Built target mlpack_arma_config
< chenzhe> Scanning dependencies of target mlpack_headers
chenzhe has quit [Excess Flood]
chenzhe has joined #mlpack
< chenzhe> @rcurtin
stephentu has joined #mlpack
< chenzhe> here is the error message
< chenzhe> thanks~
chenzhe has quit [Ping timeout: 240 seconds]
< rcurtin> chenzhe: is that the up to date git master branch?
< rcurtin> on regression_distribution.cpp:28, the 'fitted' parameter should be declared as arma::rowvec, not arma::vec
< rcurtin> and this is the case in the current git master branch, but based on the compiler output it seems like that is not the case in the code that you have
< rcurtin> oh!
< rcurtin> now I see what the issue is
< rcurtin> it looks like you have mlpack installed locally, in /usr/local/include/
< rcurtin> so you can remove the locally installed mlpack from /usr/local/, and that will work as a temporary fix
< rcurtin> I opened https://github.com/mlpack/mlpack/issues/1013 to make a note to myself, but it is late so I will go to bed now and try to take a look tomorrow :)
stephentu has quit [Ping timeout: 240 seconds]
shikhar has joined #mlpack
Trion has joined #mlpack
govg has joined #mlpack
chenzhe has joined #mlpack
Trion has quit [Quit: Have to go, see ya!]
govg has quit [Ping timeout: 240 seconds]
shikhar has quit [Quit: WeeChat 1.4]
aashay has joined #mlpack
jenkins-mlpack has joined #mlpack
sheogorath27 has joined #mlpack
aashay has quit [Changing host]
aashay has joined #mlpack
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#2466 (master - a76fe5e : Marcus Edel): The build was broken.
travis-ci has left #mlpack []
vivekp has quit [Ping timeout: 268 seconds]
vivekp has joined #mlpack
< cult-> i am looking at kpca and i would like to write my own kernel. i have a resampling algorithm and i'm not sure if it should be written as a kernel, or if i should resample the series before applying plain pca. my features are linear, but i was thinking to have a multidimensional feature space to map the series in a non-linear manner.
< cult-> i am missing some fundamental concepts on kernels, and i would be happy to be enlightened.
< cult-> the about says: "Kernel Principal Components Analysis (optionally with sampling)" -- but where are the resampling routines?
< cult-> -re
sgupta has joined #mlpack
kris2 has joined #mlpack
shikhar has joined #mlpack
< sgupta> Hi Ryan. Since we have Jenkins successfully firing docker commands, what tests do you want me to run and what results are to be extracted? I have built an image with mlpack-2.2.0's dependencies already on the server.
stephentu has joined #mlpack
stephentu has quit [Ping timeout: 240 seconds]
vivekp has quit [Ping timeout: 260 seconds]
< rcurtin> sgupta: inside the docker container, you can check out and build mlpack, then run all the tests
< rcurtin> take a look at the 'mlpack - git commit test' for an idea of what needs to be done inside the build, and then see if you can get those things to run inside the container you've built
< kris2> Hi lozhnikov, you there
< kris2> ?
< rcurtin> first day of GSoC, I hope everyone is having a good day so far :)
< sgupta> I have built an image for mlpack. So, we can have a container running mlpack by running a single command
< rcurtin> yes, but we don't need a container that runs mlpack, we need a container that has the dependencies for mlpack so that we can then build mlpack inside the container
< sgupta> Okay. Sure. So, we want to build mlpack inside the container and then run tests, is that it?
< sgupta> Got your point.
< rcurtin> yes
< rcurtin> once we can do that, then we can build containers with different versions of the dependencies, or different compilers, etc.
< sgupta> Yes! And this one having the dependencies can be used as the base image.
< rcurtin> yeah, definitely; it might be a little tricky because we would need different versions of dependencies in each container, but I think the general idea will work, so we can figure it out when we get there :)
vivekp has joined #mlpack
sgupta has quit [Ping timeout: 240 seconds]
< lozhnikov> kris2: hi, i'll be at home late. You can post your questions here; i'll answer later (in the evening or tomorrow in the morning)
< kris2> Yes, i was working on the solution that you gave; i had a few problems with it
< kris2> i will make a gist and put my doubts in the comments
< kris2> and post it here
vivekp has quit [Ping timeout: 240 seconds]
vivekp has joined #mlpack
Trion has joined #mlpack
sgupta has joined #mlpack
shikhar has quit [Read error: Connection reset by peer]
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#2467 (master - 48676e8 : Ryan Curtin): The build is still failing.
travis-ci has left #mlpack []
shikhar has joined #mlpack
kris2 has quit [Quit: Leaving.]
Trion has quit [Quit: Have to go, see ya!]
< rcurtin> cult-: sorry for the slow response
< rcurtin> the sampling available for kernel PCA is the Nystroem method, found in src/mlpack/methods/nystroem_method/
< rcurtin> basically the Nystroem method samples some number of points from the dataset such that the kernel PCA result should be approximately the same
< rcurtin> you can see how the Nystroem method plugs into kernel PCA in src/mlpack/methods/kernel_pca/kernel_rules/nystroem_method.hpp (and naive_method.hpp for comparison)
< rcurtin> I'm not sure that I understand exactly what your resampling algorithm is though so I don't think I can comment further on that without more information
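For reference, a minimal sketch of plugging the Nystroem rule into kernel PCA, following the mlpack 2.x class names and headers mentioned above; the kernel choice, bandwidth, and matrix sizes are illustrative:

    #include <mlpack/core.hpp>
    #include <mlpack/core/kernels/gaussian_kernel.hpp>
    #include <mlpack/methods/kernel_pca/kernel_pca.hpp>
    #include <mlpack/methods/kernel_pca/kernel_rules/nystroem_method.hpp>

    using namespace mlpack;

    int main()
    {
      arma::mat data(10, 500, arma::fill::randu); // 10 features, 500 points

      // KernelPCA takes the kernel and the kernel rule as template parameters;
      // NystroemKernelRule samples points so the result approximates full KPCA.
      kpca::KernelPCA<kernel::GaussianKernel,
          kpca::NystroemKernelRule<kernel::GaussianKernel>> kernelPCA;
      kernelPCA.Apply(data, 5); // reduce the data to 5 dimensions in place
    }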
< cult-> the Evaluate method takes two columns (vectors), right?
< cult-> not just single data points
< rcurtin> for the KernelType? yes, it is evaluating the kernel between two points
< rcurtin> note that a lot of kernels are translation-invariant, like the Gaussian kernel
< rcurtin> I can express the Gaussian kernel two ways:
< rcurtin> K(a, b) = z * exp(-| a - b |^2 / bw^2)
< rcurtin> so it's just a function of the distance between two points
< rcurtin> I could also write:
< rcurtin> K(t) = z * exp(-t^2 / bw^2)
< rcurtin> where t is a scalar
< rcurtin> and pass t = |a - b| to get the same result
< rcurtin> so, maybe the kernel you have in mind, you can rewrite in the two argument form?
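As a concrete illustration of the two-argument form, a hedged sketch of a custom kernel with the Evaluate(a, b) signature that mlpack's KernelType policy expects; the class and parameter names are made up, and the scale z from the formula above is taken as 1:

    #include <mlpack/core.hpp>

    class MyGaussianKernel
    {
     public:
      MyGaussianKernel(const double bw = 1.0) : bw(bw) { }

      // K(a, b) = exp(-|a - b|^2 / bw^2) (with z = 1): translation-invariant,
      // so it only depends on the distance t = |a - b| between the two points.
      template<typename VecTypeA, typename VecTypeB>
      double Evaluate(const VecTypeA& a, const VecTypeB& b) const
      {
        const double t = arma::norm(a - b, 2);
        return std::exp(-(t * t) / (bw * bw));
      }

     private:
      double bw;
    };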
< cult-> i mean
< rcurtin> ah sorry maybe I answered the wrong thing :)
< cult-> the args are series right?
< rcurtin> each argument should be a multidimensional single point
< rcurtin> (or it could be one dimension---then the column vector would just have one element)
< cult-> yes, it's a column vector that holds the features
< cult-> or scores?
< rcurtin> ah, I am not sure I understand what you mean when you say that
< cult-> ok nvm, you already answered that the inputs are not single primitives encapsulated in a column, but potentially a column with more than one row
< cult-> i can see how it's called in naive_method.hpp
< cult-> data is the input data points, and from that, it extracts the columns
< cult-> but if you transpose the data so that columns represent rows and vice versa, then the inputs are not vectors but single data points
< cult-> so which one: a) Evaluate({1,2,3,4,5}, {5,6,7,8,9}) or b) Evaluate({1}, {4}) ?
< cult-> the kernel compares vector against vector or one data point against another datapoint?
< rcurtin> the rows in the data matrix should correspond to the features of your data
< rcurtin> so if your data is 10-dimensional, then it should have 10 rows
< rcurtin> and for your kernel you will pass in a 10-dimensional vector
< cult-> ok i see now
< rcurtin> yeah, so it is a little bit strange to think about, because usually in machine learning textbooks, each column corresponds to a feature
< cult-> right
< rcurtin> but in mlpack (because it uses Armadillo which uses LAPACK internally), each row corresponds to a feature
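A tiny sketch of that layout, assuming mlpack's stock GaussianKernel; the sizes and bandwidth are illustrative:

    #include <mlpack/core.hpp>
    #include <mlpack/core/kernels/gaussian_kernel.hpp>

    int main()
    {
      // 10 rows = 10 features; 100 columns = 100 points (column-major layout).
      arma::mat data(10, 100, arma::fill::randu);

      mlpack::kernel::GaussianKernel k(0.5);
      // The kernel is evaluated between two columns, i.e. two 10-dimensional points.
      const double value = k.Evaluate(data.col(0), data.col(1));
    }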
< cult-> ok so i can't do resampling in the kernel itself
< cult-> if i want to do it i have to write a custom kernel rule?
< rcurtin> no, not in the kernel itself---you would want to make a KernelRules class and do the resampling there
< rcurtin> yeah, you will have to write a custom one
< cult-> what about if i want to use the standard pca?
< rcurtin> hm, there is no sampling built in for regular PCA
< rcurtin> I guess, there is another option---
< cult-> or just use kpca with linear kernel?
< rcurtin> before calling KernelPCA or PCA at all, just sample your dataset in order to create a smaller dataset
< rcurtin> and pass the smaller dataset to KernelPCA or PCA
< rcurtin> if you want to do KernelPCA with a linear kernel, it should be faster to just do PCA
< cult-> so if i do sampling and then pca, is that equal to doing kpca with a linear kernel and applying a custom kernel rule instead of nystroem?
< cult-> where in the first i do the sampling before, and in the second i do it after
< rcurtin> yes, I believe those would be equivalent
< rcurtin> the first would probably run faster and be easier to implement :)
< rcurtin> I am assuming that your sampling strategy is essentially as simple as "we will take these k points out of the larger dataset that has N points and use only those k points for PCA/KPCA"
< cult-> is using either the naive rules or nystroem mandatory? can i run kpca without any of these rules?
< cult-> rcurtin: correct
< rcurtin> the naive rules is just standard KPCA, so you could use that
< rcurtin> but like I said, if you are using KPCA with a linear kernel you would get the same results with just regular PCA
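A minimal sketch of the pre-sampling approach being discussed, assuming the mlpack 2.x PCA interface; k and the matrix sizes are illustrative:

    #include <mlpack/core.hpp>
    #include <mlpack/methods/pca/pca.hpp>

    int main()
    {
      arma::mat data(10, 1000, arma::fill::randu); // 10 features, 1000 points

      // Take k of the N points at random, then run plain PCA on the subset.
      const size_t k = 100;
      arma::uvec order =
          arma::shuffle(arma::regspace<arma::uvec>(0, data.n_cols - 1));
      arma::mat sampled = data.cols(order.head(k));

      mlpack::pca::PCA pca;
      arma::mat transformed;
      pca.Apply(sampled, transformed);
    }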
< cult-> i feel like i want to do the sampling in the applykernelmatrix
< cult-> but maybe that's wrong
< rcurtin> you can do it there, if you like, it will just be a little bit more code to write :)
< rcurtin> the ApplyKernelMatrix() function expects you to return the transformed data, eigenvalues, and eigenvectors
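For completeness, a simplified skeleton of a custom kernel rule with that contract, shaped after naive_method.hpp (the real one also centers the kernel matrix and sorts the eigenvalues in descending order); the resampling step is left as a placeholder:

    #include <mlpack/core.hpp>

    template<typename KernelType>
    class ResamplingKernelRule
    {
     public:
      // KernelPCA calls this; it must fill transformedData, eigval, and eigvec.
      static void ApplyKernelMatrix(const arma::mat& data,
                                    arma::mat& transformedData,
                                    arma::vec& eigval,
                                    arma::mat& eigvec,
                                    const size_t /* rank, unused here */,
                                    KernelType kernel = KernelType())
      {
        // 1) Resample the data here (placeholder: this sketch uses all points).
        // 2) Build the kernel matrix: K(i, j) = kernel.Evaluate(point i, point j).
        arma::mat K(data.n_cols, data.n_cols);
        for (size_t i = 0; i < data.n_cols; ++i)
          for (size_t j = 0; j < data.n_cols; ++j)
            K(i, j) = kernel.Evaluate(data.col(i), data.col(j));

        // 3) Eigendecompose and project (simplified; see naive_method.hpp).
        arma::eig_sym(eigval, eigvec, K);
        transformedData = eigvec.t() * K;
      }
    };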
< cult-> ok then, i am not writing anything there, i'll just go with the pre-sampling
< cult-> too bad, i wanted to write my own kernel, but i didn't even know how i could improve it heh
aashay has quit [Quit: Connection closed for inactivity]
< rcurtin> fair enough, hopefully everything I said was helpful :)
< rcurtin> let me know if I can help with anything else
< cult-> it's always helpful
< rcurtin> sure, glad to help out :)
chenzhe has joined #mlpack
shikhar has quit [Read error: Connection reset by peer]
shikhar has joined #mlpack
kris2 has joined #mlpack
s1998 has joined #mlpack
s1998 has quit [Client Quit]
shikhar has quit [Quit: WeeChat 1.4]
< kris2> I want to create a class in which there will be a virtual function that any user would implement the way they want. Now, since mlpack discourages the use of virtual functions, is there any way to work around it?
< kris2> I think rcurtin did something similar for the sgd test functions, where when you define a test function it has to implement certain methods, for example NumFunctions()
< kris2> can anyone help with this issue?
< kris2> zoq:
< rcurtin> kris2: can you explain more of what you mean?
< rcurtin> chenzhe: I tried to reproduce your build issue on an OS X system I have, but I couldn't do it---can you tell me more about your setup there?
< zoq> Yeah, I'm not sure what you'd like to achieve, but it sounds like the policy-based design pattern could help?
< kris2> rcurtin: basically i am trying to create an interface like this: template<typename InputType = double, typename OutputType = double, typename ActivationFunction = GaussianDistribution> class CustomDistribution
< kris2> Where CustomDistribution would have a function sample and some other functions
< rcurtin> what is the activation function? I'm not sure what that means in this context
< kris2> so the activation could be any standard distribution like normal, binomial
< kris2> what i am confused about: suppose i want normal(x1+x2+x3, 1), where x1, x2 and x3 are elements of the std::vector<InputType>
< rcurtin> if you need a normal distribution, why not just use the GaussianDistribution class that already exists? I think maybe there is something I don't understand about the situation here
< kris2> ActivationFunction can take any distribution actually, not just GaussianDistribution
< zoq> I'm not sure I get the problem; would you like to call e.g. the Sample function of the ActivationFunction, where ActivationFunction is some distribution, e.g. GaussianDistribution?
< kris2> zoq: Exactly, but i want to instantiate the ActivationFunction (GaussianDistribution) with some user-defined f(input_vector) -> R.
< kris2> Does that make sense to you?
mentekid has quit [Quit: Leaving.]
< kris2> the function f has to be userdefined
< zoq> I guess, hold on
mentekid has joined #mlpack
< rcurtin> yeah, I feel like instead of providing some class that can take any user defined function f(input)->R, the user must provide a class that implements that function f(input)->R
< rcurtin> just like the MetricType policy or KernelType policy
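A sketch of that policy pattern, with made-up names: instead of a virtual f(input) -> R, the user writes a class with a matching method and passes it as a template parameter:

    #include <mlpack/core.hpp>

    // User-provided policy: any class with this Evaluate signature works.
    class SumMeanFunction
    {
     public:
      double Evaluate(const arma::vec& input) const
      {
        return arma::accu(input); // f(x1, ..., xn) = x1 + ... + xn
      }
    };

    template<typename MeanFunction = SumMeanFunction>
    class CustomDistribution
    {
     public:
      double Mean(const arma::vec& input) const
      {
        // Resolved at compile time; no virtual dispatch is needed.
        return meanFunction.Evaluate(input);
      }

     private:
      MeanFunction meanFunction;
    };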
< kris2> rcurtin: i will have a look at them
mikeling has quit [Quit: Connection closed for inactivity]
< chenzhe> rcurtin: I installed mlpack from "brew", so it had headers in /usr/local/include/; I just removed them and it works now~ Thanks for the help!
kris2 has quit [Ping timeout: 240 seconds]
kris has joined #mlpack
< chenzhe> if you try to install the old version by "brew", you might be able to reproduce my build problem
< rcurtin> chenzhe: ok, I see---I installed similarly (though not through brew)
< rcurtin> did you also install your compiler through brew?
< rcurtin> or is that XCode or something?
< chenzhe> the packages like armadillo are installed by brew
< chenzhe> let me check
< chenzhe> brew info armadillo
< chenzhe> homebrew/science/armadillo: stable 7.900.1 (bottled)
< chenzhe> C++ linear algebra library
< chenzhe> /usr/local/Cellar/armadillo/7.800.1 (524 files, 13.7MB)
< chenzhe> Built from source on 2017-03-18 at 12:37:18 with: --with-hdf5
< chenzhe> /usr/local/Cellar/armadillo/7.800.2 (524 files, 13.7MB)
< chenzhe> Built from source on 2017-03-25 at 19:06:49 with: --with-hdf5
< chenzhe> /usr/local/Cellar/armadillo/7.900.1 (525 files, 13.8MB) *
< chenzhe> Poured from bottle on 2017-05-29 at 19:59:54
< chenzhe> g++ --version
< chenzhe> Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/c++/4.2.1
< chenzhe> Apple LLVM version 8.1.0 (clang-802.0.42)
< chenzhe> Target: x86_64-apple-darwin16.6.0
< chenzhe> Thread model: posix
< chenzhe> InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
< rcurtin> ah, ok, so the compiler comes from xcode
< chenzhe> yes
< rcurtin> ok, let me see if I can reproduce with that information... I'll try setting up brew, and maybe brew is changing the default include order
kris has quit [Quit: Leaving.]
kris has joined #mlpack
chenzhe has quit [Ping timeout: 272 seconds]
< zoq> kris: Have you seen my message with the code snippet?
< kris> zoq: I have actually
< kris> i just had one question though
< kris> Okay, can we not have template<typename ActivationFunction = GaussianDistribution, typename MeanFunction = Sigmoid> class MyFunction
< kris> where the MeanFunction is used for instantiating the GaussianDistribution using the input vector
< zoq> you mean GaussianDistribution(MeanFunction(...))?
< kris> The InputType vector would be a vector of variables, for example the visible layer neurons in the rbm
< kris> yes
< kris> zoq:
< kris> So something like this GaussianDistribution(MeanFunction(inputType vector), 1);
< zoq> sure, you can do that: I guess in this case ActivationFunction(MeanFunction(...), ..)
< kris> Yes thats what i mean
< kris> Thanks
< zoq> and MeanFunction is an object instantiation of some class? or does it return e.g. arma::mat?
< zoq> so do you plan to call a function of the MeanFunction class inside the ActivationFunction?
< kris> Actually, that's where my doubt is, i think, because every activation function / distribution would have a constructor taking a different number of arguments.
< kris> So just doing ActivationFunction(MeanFunction(), ...) won't work
< kris> i guess
< kris> I think that's where your solution makes more sense
< zoq> Maybe you can provide a unified constructor, where you provide all necessary parameters but only use the parameters relevant for that particular function.
< kris> Can you give an example? is this similar to variadic templates?
chenzhe has joined #mlpack
< zoq> Actually I was thinking of something much simpler: https://gist.github.com/zoq/2143ce30da4b28e9c44751c0324b4aef just added some comments
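The gist itself is not reproduced in the log; as an illustration only, a hypothetical sketch of the "unified constructor" idea from above, where every distribution accepts the same parameter list and ignores what it does not use:

    #include <mlpack/core.hpp>

    class NormalLike
    {
     public:
      // Unified signature (mean, stddev): both parameters are used.
      NormalLike(const double mean, const double stddev) :
          mean(mean), stddev(stddev) { }
     private:
      double mean, stddev;
    };

    class BernoulliLike
    {
     public:
      // Same unified signature: the second parameter is accepted but ignored.
      BernoulliLike(const double mean, const double /* unused */) : mean(mean) { }
     private:
      double mean;
    };

    template<typename ActivationFunction = NormalLike>
    class CustomDistribution
    {
     public:
      // The same construction works for any distribution honoring the unified
      // constructor, e.g. dist(f(input), 1) with f(input) = sum of the inputs.
      CustomDistribution(const arma::vec& input) : dist(arma::accu(input), 1.0) { }
     private:
      ActivationFunction dist;
    };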
< kris> mccormick's blog is quite popular btw
< rcurtin> ah yeah, I saw that one, I did not know that mccormick's blog was popular though
< kris> Well not Karpathy level famous.....:-D
< rcurtin> :)
mentekid has quit [Quit: Leaving.]
kris has quit [Quit: Leaving.]
kris has joined #mlpack
< kris> zoq: I made some comments here, could you please have a look? https://gist.github.com/zoq/2143ce30da4b28e9c44751c0324b4aef