verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
< kris1> zoq: Thanks. I am new to this style of template programming, so sorry if my doubts seem silly.
< zoq> kris1: No worries, it takes some time to dive into the codebase, and we are here to help.
shihao has quit [Quit: Page closed]
< rcurtin> kris1: I think I saw earlier that you posted some useful links you'd found to read about template metaprogramming; thanks for doing that, I imagine it is helpful to others to share these links
sumedhghaisas__ has quit [Ping timeout: 260 seconds]
< kris1> rcurtin: Sure, I will post more if I come across something interesting.
< kris1> zoq: aren't we supposed to call templated functions using the function<int, int> format? But here we call it using function(int, int). You say these are equivalent in the above case; can you point to a link that explains this?
< kris1> Btw the error is resolved.
chvsp has quit [Quit: Page closed]
sumedhghaisas__ has joined #mlpack
< zoq> kris1: You can do both, but I think in this particular case it's easier to pass an unused parameter to get the data type.
< zoq> kris1: Here is an easy example: with template<typename T> void test() { T y; } we have to use test<int>(), but we could also write template<typename T> void test(T x); in that case we call test(x), where x is declared as int x;
< zoq> kris1: You can also search for "Template argument deduction".
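A minimal standalone sketch of the two call styles described above (illustrative C++, not mlpack code):

    #include <iostream>

    // T appears nowhere in the parameter list, so the caller must name
    // the type explicitly: explicitTest<int>().
    template<typename T>
    void explicitTest() { T y = T(); std::cout << y << std::endl; }

    // Here T is the parameter type, so the compiler deduces it from the
    // argument at the call site: deducedTest(x).
    template<typename T>
    void deducedTest(T x) { std::cout << x << std::endl; }

    int main()
    {
      explicitTest<int>(); // nothing to deduce from, so <int> is required
      int x = 5;
      deducedTest(x);      // T deduced as int; deducedTest<int>(x) also works
    }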
kris1 has quit [Ping timeout: 240 seconds]
sumedhghaisas__ has quit [Quit: Ex-Chat]
sumedhghaisas__ has joined #mlpack
sumedhghaisas__ has quit [Remote host closed the connection]
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#1956 (master - 47e872e : Ryan Curtin): The build was fixed.
travis-ci has left #mlpack []
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#1957 (master - f463b88 : Ryan Curtin): The build was fixed.
travis-ci has left #mlpack []
shihao has joined #mlpack
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#1960 (master - 199eb23 : Ryan Curtin): The build was fixed.
travis-ci has left #mlpack []
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#1961 (master - c2761f0 : Ryan Curtin): The build was broken.
travis-ci has left #mlpack []
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#1962 (master - 1e18f8f : Ryan Curtin): The build was broken.
travis-ci has left #mlpack []
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#1963 (master - 63aa341 : Ryan Curtin): The build was broken.
travis-ci has left #mlpack []
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#1964 (master - d032540 : Ryan Curtin): The build was broken.
travis-ci has left #mlpack []
drewtran has quit [Ping timeout: 260 seconds]
stellamberV_ has joined #mlpack
vinayakvivek has joined #mlpack
stellamberV_ has quit [Ping timeout: 260 seconds]
mikeling has quit [Quit: Connection closed for inactivity]
topology has joined #mlpack
shihao has quit [Ping timeout: 260 seconds]
mikeling has joined #mlpack
thyrix has joined #mlpack
vpal has joined #mlpack
vivekp has quit [Ping timeout: 256 seconds]
vpal is now known as vivekp
vivekp has quit [Ping timeout: 260 seconds]
vivekp has joined #mlpack
vivekp has quit [Ping timeout: 268 seconds]
vivekp has joined #mlpack
Vivek__ has joined #mlpack
Vivek__ has quit [Quit: Page closed]
< thyrix> Hi, is there a description of datasets used in mlpack_test and the benchmark system?
thyrix has quit [Ping timeout: 260 seconds]
diehumblex has quit [Quit: Connection closed for inactivity]
thyrix has joined #mlpack
thyrix has quit [Ping timeout: 260 seconds]
thyrix has joined #mlpack
shubhamagarwal92 has joined #mlpack
shubhamagarwal92 has quit [Client Quit]
topology has quit [Ping timeout: 260 seconds]
dev1 has joined #mlpack
dev1 has left #mlpack []
lazycoder_ has joined #mlpack
lazycoder_ has quit [Client Quit]
thyrix has quit [Ping timeout: 260 seconds]
topology has joined #mlpack
pvskand has joined #mlpack
damien has joined #mlpack
vivekp has quit [Ping timeout: 246 seconds]
vivekp has joined #mlpack
marko1777 has joined #mlpack
marko1777 has quit [Quit: Page closed]
< rcurtin> thyrix: they are typically UCI repository datasets or randomly generated datasets
diehumblex has joined #mlpack
thyrix has joined #mlpack
tejank10 has joined #mlpack
tejank10 has quit [Ping timeout: 260 seconds]
prasanna08 has joined #mlpack
prasanna08 has quit [Ping timeout: 260 seconds]
< damien> rcurtin, hello. I have a question. Can we build a machine-learning-based system that monitors the data from 3 sources and identifies if any of them exceed or trigger a value? I'm relatively new to programming and the #machinelearning IRC channel wasn't really helpful.
< damien> I've been using mlpack, so I thought I'd ask here.
< damien> the sources should be synchronous
< govg> damien: Do you mean like you have three sensors, and you just need to check when any of them exceed a threshold?
< govg> Or do you wish to learn such a threshold?
< damien> I have 3 sensors which give me similar data: not the same, but similar. So I intend to watch/train them to recognize danger values.
< govg> Okay, and how is the danger value specified?
< damien> I'm not sure about making them learn such a threshold; the application I wish to build would be critical, so I think it would be wise to let it know beforehand. What do you think?
< damien> Probably numerical values, if that's what you're asking.
< govg> (if you know the threshold beforehand, can't you simply check if the values exceed the threshold at each time step?)
< damien> Exactly right. That's what I thought too: that we could use a deterministic approach.
< damien> But my colleague said that machine learning would be a better approach,
< damien> saying that if the data is in sync, machine learning would be better; otherwise, if the data is not in sync, we can use a deterministic approach.
< damien> I'm not an expert, so I'd like to know how to proceed with this.
< rcurtin> so, if you're going to do machine learning to learn a threshold, you would need to have training data
< rcurtin> where basically you have many observations of these sensor values
< rcurtin> and for each of these observations, you know whether or not you want to issue a warning
< rcurtin> if you can't obtain data of this type, then doing machine learning is going to be significantly more difficult, and I would say setting a manual threshold would be a better thing
< rcurtin> if you *can* obtain data of that type, then I would say that you should do something simple like, e.g., logistic regression
< rcurtin> where you train the logistic regression classifier on your [sensor values, labels] data (where labels are "warning" or "not warning", or whatever is right for your situation)
< damien> supervised learning?
< rcurtin> and then you can simply use that logistic regression classifier with your data from the 3 sources at each time step, and it will predict whether or not you should issue a warning
< rcurtin> yes, exactly, supervised learning is what I'm suggesting here
< rcurtin> it's possible to do unsupervised learning here, but based on what I am guessing of your situation here, I might suggest that it would be better to use a manually set threshold instead of doing unsupervised learning
< rcurtin> whether or not that's actually true... well, not sure, you would be able to know best since you know your situation, but that would at least be my off-the-cuff advice :)
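A minimal sketch of the supervised setup rcurtin describes, assuming mlpack 2.x's LogisticRegression API; the synthetic sensor data and labeling rule here are made up purely for illustration:

    #include <iostream>
    #include <mlpack/core.hpp>
    #include <mlpack/methods/logistic_regression/logistic_regression.hpp>

    using namespace mlpack::regression;

    int main()
    {
      // Synthetic stand-in data: 3 sensors, 200 observations (mlpack is
      // column-major, so each column is one observation).
      arma::mat sensorData = arma::randu<arma::mat>(3, 200);

      // Hypothetical labeling rule, only so we have labels to train on:
      // 1 = "warning" when the sensor values sum high, 0 = "no warning".
      arma::Row<size_t> labels(200);
      for (size_t i = 0; i < 200; ++i)
        labels[i] = (arma::accu(sensorData.col(i)) > 2.0) ? 1 : 0;

      // Train the classifier on the labeled observations.
      LogisticRegression<> lr(sensorData, labels);

      // At each time step, classify the newest reading from the 3 sources.
      arma::mat current(3, 1, arma::fill::randu); // stand-in live reading
      arma::Row<size_t> prediction;
      lr.Classify(current, prediction);
      if (prediction[0] == 1)
        std::cout << "issue a warning" << std::endl;
    }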
< damien> I agree. I don't want to go with unsupervised learning either. I do have data which I could use to train it.
< damien> So do you think this is a better approach than the deterministic way?
< thyrix> What does your data look like?
snd has joined #mlpack
< damien> thyrix, I didn't get you. I haven't seen the data yet, but it is there, so I don't know the format.
< damien> It's mostly tags and integer values, if that's what you mean.
< rcurtin> if you have a bunch of data for training that is labeled, I think that logistic regression would probably be a better way to go
< rcurtin> although, if you really want to know, what I would do is build an ROC to evaluate the performance of the logistic regression classifier
< rcurtin> (unfortunately mlpack has no support for that; you'd have to either implement it manually, or it's probably easy enough to use scikit's functionality to do it)
< rcurtin> and you could also build an ROC for the deterministic threshold classifier you suggested, and you could compare the performance
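Since mlpack has no ROC support, a hedged sketch of the manual computation rcurtin mentions (sort by score, sweep the decision threshold, record a (false positive rate, true positive rate) point at each cut; ties in scores are handled naively here):

    #include <algorithm>
    #include <cstddef>
    #include <utility>
    #include <vector>

    // scored holds (classifier score, true label in {0, 1}) pairs.
    std::vector<std::pair<double, double>> RocCurve(
        std::vector<std::pair<double, int>> scored)
    {
      // Highest scores first, so lowering the threshold admits one more
      // point at a time.
      std::sort(scored.begin(), scored.end(),
          [](const std::pair<double, int>& a, const std::pair<double, int>& b)
          { return a.first > b.first; });

      size_t positives = 0, negatives = 0;
      for (const auto& s : scored)
        (s.second == 1) ? ++positives : ++negatives;

      std::vector<std::pair<double, double>> roc;
      size_t tp = 0, fp = 0;
      for (const auto& s : scored)
      {
        (s.second == 1) ? ++tp : ++fp;
        roc.emplace_back((double) fp / negatives, (double) tp / positives);
      }
      return roc;
    }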
< damien> ROC?
< damien> Oh yes yes. That would be really good.
< rcurtin> yeah, those are good measures of how your system will perform in practice
< rcurtin> (under the assumption that the real-world data you'll be getting matches the distribution of the test data you build the ROC with)
< damien> Yeah, so basically the data that I'll be using is previously collected real-world data.
< thyrix> Hi rcurtin: I encountered some issues on the test system; have you seen that?
< damien> But the one thing that's bothering me is: how will the system react to spike values? Like, if I were to train on data with low and medium values, and once I set it in the real world I get super rare high values, will the system deal with them?
snd has quit [Ping timeout: 260 seconds]
Upendra has joined #mlpack
< thyrix> Generally, if you notice spike values, it's better to delete them (or replace them with a value you prefer) before you feed the data to your algorithm.
< Upendra> Hi, I want to know about the GSoC projects you offer.
< damien> thyrix, that would be a bad decision because, as I said, it is a critical system and every single value is important, especially the high ones.
< zoq> Upendra: Hello, have you seen: https://github.com/mlpack/mlpack/wiki/SummerOfCodeIdeas and mlpack.org/gsoc.html?
< thyrix> damien, how you deal with the data will lead to a different model. Sometimes it will work well, but it really depends on the data.
< thyrix> Some experiments are necessary
< rcurtin> damien: you'd need to have some test data with very large values already to see how the system will behave
< rcurtin> if you can't get much of that data, then maybe the threshold idea is better
< damien> thyrix, makes sense. I'll give some thought to this.
< rcurtin> I once did a project where we were trying to detect whether or not chickens were stressed based on the noises they were making
< rcurtin> so we set up microphones in the growout house, and got tons and tons of data in the normal state where they were not stressed
< rcurtin> then, we modified the environment several times in a way that would stress them
< rcurtin> but, this gave us only a few instances where they were stressed out
< rcurtin> so in the end, we were not able to use a sophisticated classifier, but instead we had to use a simple threshold
< rcurtin> because not enough was known about the ways that chickens would actually react and sound under stress
< rcurtin> it sounds like, maybe in your case, it is a similar situation
< rcurtin> (that was a strange project, but I learned a lot about chickens)
< thyrix> really interesting :)
< damien> rcurtin, Yes.
Upendra has quit [Quit: Page closed]
< damien> The data set sounds like what I have. I shall ponder your words. And thanks a ton for explaining it so much; this saved me from spending 2 nights trying to understand the web.
< damien> Because the high values are rare. Your experience gave me a question: you had a very large sample of non-stressed data and a very small sample of stressed data. This does affect the working of the system, doesn't it?
< damien> And yeah, that's hilarious. XD You'll come to mind the next time I see a chicken coop.
< rcurtin> (sorry I'm in a meeting, hang on a few minutes)
< damien> okay.
< thyrix> If most labels are the same, a model that just gives the same result for all data will have good accuracy, but that doesn't make sense.
kesslerfrost has joined #mlpack
< thyrix> But rare values in a feature have a different effect.
< thyrix> If some feature has a different effect roughly according to (extremely low, normal, extremely high) or some other criterion, maybe coding them as (0, 1, 2) will work better.
< damien> thyrix, depends a lot on the dataset and the usage.
< damien> thyrix, yeah, that's one way to do it, but are you saying that with a machine learning approach or a static approach? Because if it's the machine learning way, can you be more elaborate?
< thyrix> Most machine learning algorithms will learn a function from whatever you give them; what you need to do is make the data better for learning.
< thyrix> Large values may cause some numerical instability, but we don't know whether that will be the case here...
< thyrix> so I suggest trying some different approaches and taking the one that works best.
< thyrix> If your experiment shows a simple threshold is better, then we should use a threshold.
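A tiny sketch of the (0, 1, 2) coding thyrix suggests above; the cutoffs are hypothetical placeholders you would pick from your own data:

    // Map a raw sensor value to 0 = extremely low, 1 = normal,
    // 2 = extremely high; lowCut and highCut are hypothetical cutoffs.
    int BinValue(const double value, const double lowCut, const double highCut)
    {
      if (value < lowCut)
        return 0;
      if (value > highCut)
        return 2;
      return 1;
    }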
thyrixx has joined #mlpack
thyrix has quit [Quit: Page closed]
kesslerfrost has left #mlpack []
cult- has quit [Quit: WeeChat 1.4]
cult- has joined #mlpack
mikeling has quit [Quit: Connection closed for inactivity]
< rcurtin> damien: yes, we had very few samples of stressed chickens
< rcurtin> so, giant class imbalance
< rcurtin> this presents a problem for supervised learning algorithms
< rcurtin> therefore the threshold idea was easier to apply
< rcurtin> another idea might be some outlier detection algorithm
thyrixx has quit [Remote host closed the connection]
< rcurtin> but I am not too knowledgeable in that field
< topology> rcurtin: I think I understand dual-tree KDE now. It is a very straightforward idea. I do, however, have a silly doubt.
< damien> I see the problem. I'll have to check my data set to see how it is. For example, if a car is being taught to drive and it's shown how to turn left a lot and to turn right only a little, then the next time it's shown a right turn it will not do it perfectly. This explains the class imbalance, right?
< damien> An outlier detection algorithm, yes, this could be a solution too. Probably, like you said, the best thing is to run an ROC on the different models and check which is most consistent.
< damien> topology, you could ask your doubt and then wait; maybe someone else will have an answer. :)
Trion has joined #mlpack
Trion has quit [Client Quit]
< rcurtin> damien: yeah, the difficult thing with the ROC is, if you don't have many examples of situations where you need to detect something, then the ROC may not be very descriptive
< rcurtin> an assumption in the ROC is that the test data reflects the distribution of the real world data, so if the situations you have for your ROC are not an accurate picture of the situations you'll need to detect in real life, then the ROC may be misleading
< rcurtin> topology: sure, go ahead, ask the question and I can try to answer
indra has quit [Quit: Connection closed for inactivity]
topology has quit [Ping timeout: 260 seconds]
Trion has joined #mlpack
zoq has quit [Quit: Lost terminal]
zoq has joined #mlpack
zoq has quit [Client Quit]
zoq has joined #mlpack
pvskand has quit [Ping timeout: 260 seconds]
Trion has quit [Ping timeout: 240 seconds]
Trion has joined #mlpack
Trion has quit [Ping timeout: 240 seconds]
deepanshu_ has joined #mlpack
aditya_ has joined #mlpack
aditya_ has quit [Ping timeout: 240 seconds]
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#1975 (master - 18680ff : Ryan Curtin): The build is still failing.
travis-ci has left #mlpack []
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#1976 (master - 41fa0b5 : Ryan Curtin): The build is still failing.
travis-ci has left #mlpack []
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#1977 (master - 9b7fce8 : Ryan Curtin): The build is still failing.
travis-ci has left #mlpack []
havoc_ has joined #mlpack
< arunreddy> rcurtin, zoq: A quick question...
< arunreddy> Considering a policy class variadic template for SGD, can I safely assume that the order of the policy types is known and fixed?
< arunreddy> <UpdatePolicyType, DecayPolicyType> etc..
havoc_ has quit [Quit: Page closed]
< rcurtin> arunreddy: I'm not sure I fully understand, can you elaborate a little more?
< arunreddy> rcurtin: I have explained it in the following gist: https://gist.github.com/arunreddy/f3bcd2709788b4bdf6c9b2570b26d5db
deepanshu_ has quit [Quit: Connection closed for inactivity]
< rcurtin> ok, sure I will take a look shortly
< arunreddy> Let me know if it needs more clarity.
< rcurtin> arunreddy: I don't think that variadic templates are needed for the SGD class
< rcurtin> I think the meaning of what zoq was talking about is that you could use these for the Train() method in some algorithm
< rcurtin> e.g.
< rcurtin> template<template<class, class...> class OptimizerType>
< rcurtin> void LogisticRegression::Train(OptimizerType<LogisticRegressionFunction, ...>& opt)
< rcurtin> but for the SGD class itself, you can just provide defaults and you can get the same behavior where the user does not need to specify the template type
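To illustrate "just provide defaults": a sketch, not mlpack's actual SGD implementation, with stand-in policy names:

    // Stub policies, for illustration only.
    struct VanillaUpdate { /* Update(iterate, stepSize, gradient) */ };
    struct MomentumUpdate { /* same interface, plus momentum state */ };
    struct NoDecay { /* step size stays constant */ };

    // Default template arguments mean users who want standard SGD never
    // have to spell out the policy types.
    template<typename UpdatePolicyType = VanillaUpdate,
             typename DecayPolicyType = NoDecay>
    class SGD
    {
     public:
      // ... constructor, Optimize(), and policy members would go here ...
    };

    int main()
    {
      SGD<> standard;            // both defaults; no template arguments needed
      SGD<MomentumUpdate> fancy; // override only the update policy
    }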
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#1979 (master - 805b760 : Ryan Curtin): The build is still failing.
travis-ci has left #mlpack []
arunreddy has quit [Ping timeout: 246 seconds]
mayank has joined #mlpack
arunreddy has joined #mlpack
< arunreddy> rcurtin: Ok got it. So the code for the StandardSGD remains the same.
< arunreddy> Thanks
mayank has quit [Ping timeout: 260 seconds]
shihao has joined #mlpack
< shihao> rcurtin: my PR https://github.com/mlpack/mlpack/pull/913 build failed, and the failure seems to be due to the GMM test.
< zoq> shihao: Don't worry about the GMM test; check out https://github.com/mlpack/mlpack/issues/922. Also, no need to post the same message in the GitHub issue and the IRC channel; we get a notification once you make a comment.
< shihao> zoq: Ok, I'm sorry.
< zoq> shihao: No problem, just wanted to let you know.
< shihao> zoq: Does GMM stand for Gaussian Mixture Model? If so, I'd like to take a look at this issue and try to solve it, since I just learned a lot about Gaussians from NBC.
< zoq> shihao: yes, feel free to take a look at the test, might be fun :)
< zoq> shihao: Also, as long as the NBC test or a related test that uses the NBC code does not fail you are fine.
< shihao> zoq: So how can I rebuild my PR if I changed code that is not a part of this PR?
< zoq> shihao: You can't restart the build for your PR; I'm not sure why you would need to do that. As I said, as long as your test cases are fine and the rest of the code looks good, we'll merge it in.
< shihao> zoq: oh, got it. Thanks!
< zoq> shihao: If your tests fail and you think it's probably because of some bad initialization, we can restart the build for you.
< shihao> zoq: It's because of other code. No need for rebuild. Thanks :)
vinayakvivek has quit [Quit: Connection closed for inactivity]