verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
< kris1>
zoq: thanks. I am new to this type of template programming, so sorry if my doubts seem silly.
< zoq>
kris1: No worries, it takes some time to dive into the codebase, and we are here to help.
shihao has quit [Quit: Page closed]
< rcurtin>
kris1: I think I saw earlier that you posted some useful links you'd found to read about template metaprogramming; thanks for doing that, I imagine it is helpful to others to share these links
sumedhghaisas__ has quit [Ping timeout: 260 seconds]
< kris1>
rcurtin: Sure, i will post more if i come across something interesting.
< kris1>
zoq: aren't we supposed to call templated functions using the function<int, int>() format? But here we call it using function(int, int). You say these are equivalent in the above case; can you point to a link that explains this?
< kris1>
Btw the error is resolved.
chvsp has quit [Quit: Page closed]
sumedhghaisas__ has joined #mlpack
< zoq>
kris1: You can do both, but I think in this particular case it's easier to pass an unused parameter to get the data type.
< zoq>
kris1: Here is an easy example: template<typename T> void test() { T y; } Here we have to use test<int>(), but we could also do: template<typename T> void test(T x); in this case we call test(x), where x is declared as int x, and T is deduced as int.
< zoq>
kris1: You can also search for "Template argument deduction".
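The two calling styles zoq describes can be sketched as follows. This is a minimal illustration of template argument deduction; the function names MakeDefault, Twice and ZeroLike are made up for the example and do not come from mlpack.

```cpp
#include <cassert>
#include <type_traits>

// T appears only in the body, so the compiler cannot deduce it:
// callers must spell it out, e.g. MakeDefault<int>().
template<typename T>
T MakeDefault()
{
  T y{};  // value-initialised: 0 for arithmetic types
  return y;
}

// T appears in the parameter list, so it is deduced from the argument:
// Twice(21) deduces T = int, Twice(1.5) deduces T = double.
template<typename T>
T Twice(T x)
{
  return x + x;
}

// "Unused parameter" idiom: the argument's value is ignored,
// only its type is used to select T.
template<typename T>
T ZeroLike(const T& /* unused */)
{
  return T{};
}
```

Twice<int>(21) and Twice(21) call the same instantiation; the explicit form is only required when, as in MakeDefault, no function parameter mentions T.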
kris1 has quit [Ping timeout: 240 seconds]
sumedhghaisas__ has quit [Quit: Ex-Chat]
sumedhghaisas__ has joined #mlpack
sumedhghaisas__ has quit [Remote host closed the connection]
travis-ci has joined #mlpack
< travis-ci>
mlpack/mlpack#1956 (master - 47e872e : Ryan Curtin): The build was fixed.
mikeling has quit [Quit: Connection closed for inactivity]
topology has joined #mlpack
shihao has quit [Ping timeout: 260 seconds]
mikeling has joined #mlpack
thyrix has joined #mlpack
vpal has joined #mlpack
vivekp has quit [Ping timeout: 256 seconds]
vpal is now known as vivekp
vivekp has quit [Ping timeout: 260 seconds]
vivekp has joined #mlpack
vivekp has quit [Ping timeout: 268 seconds]
vivekp has joined #mlpack
Vivek__ has joined #mlpack
Vivek__ has quit [Quit: Page closed]
< thyrix>
Hi, is there a description of datasets used in mlpack_test and the benchmark system?
thyrix has quit [Ping timeout: 260 seconds]
diehumblex has quit [Quit: Connection closed for inactivity]
thyrix has joined #mlpack
thyrix has quit [Ping timeout: 260 seconds]
thyrix has joined #mlpack
shubhamagarwal92 has joined #mlpack
shubhamagarwal92 has quit [Client Quit]
topology has quit [Ping timeout: 260 seconds]
dev1 has joined #mlpack
dev1 has left #mlpack []
lazycoder_ has joined #mlpack
lazycoder_ has quit [Client Quit]
thyrix has quit [Ping timeout: 260 seconds]
topology has joined #mlpack
pvskand has joined #mlpack
damien has joined #mlpack
vivekp has quit [Ping timeout: 246 seconds]
vivekp has joined #mlpack
marko1777 has joined #mlpack
marko1777 has quit [Quit: Page closed]
< rcurtin>
thyrix: they are typically UCI repository datasets or randomly generated datasets
diehumblex has joined #mlpack
thyrix has joined #mlpack
tejank10 has joined #mlpack
tejank10 has quit [Ping timeout: 260 seconds]
prasanna08 has joined #mlpack
prasanna08 has quit [Ping timeout: 260 seconds]
< damien>
rcurtin, hello. I have a question. Can we build a machine learning based system that monitors the data from 3 sources and identifies if any of them exceed or trigger a value? I'm relatively new to programming, and the machine learning IRC channel wasn't really helpful.
< damien>
I've been using mlpack so i thought i'll ask here.
< damien>
the sources should be synchronous
< govg>
damien: Do you mean like you have three sensors, and you just need to check when any of them exceed a threshold?
< govg>
Or do you wish to learn such a threshold?
< damien>
i have 3 sensors. which give me similar data. not same but similar. So i intend to watch/train them to recognize danger values.
< govg>
Okay, and how is the danger value specified?
< damien>
I'm not sure about making them learn such a threshold; the application I wish to build would be critical, so I think it would be wise to let it know the threshold beforehand. What do you think?
< damien>
Probably numerical values, if that's what you're asking.
< govg>
(if you know the threshold beforehand, can't you simply check if the values exceed the threshold at each time step?)
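The deterministic check govg suggests could look like the sketch below. It assumes three synchronised sensors and one known limit per sensor; the type alias Reading and the function ExceedsThreshold are hypothetical names introduced for illustration.

```cpp
#include <array>
#include <cstddef>

// One reading from each of the three (synchronised) sensors at a time step.
using Reading = std::array<double, 3>;

// Returns true if any sensor value is strictly above its threshold.
// The thresholds are assumed to be known in advance from domain knowledge.
bool ExceedsThreshold(const Reading& values, const Reading& thresholds)
{
  for (std::size_t i = 0; i < values.size(); ++i)
  {
    if (values[i] > thresholds[i])
      return true;
  }
  return false;
}
```

Calling this once per time step is all the "model" there is, which is why it needs no training data.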
< damien>
exactly right. That's what i thought too. That we can use a deterministic approach.
< damien>
But my college said that machine learning would be a better approach.
< damien>
saying that if the data is in sync, machine learning would be better; otherwise, if the data is not in sync, we can use a deterministic approach.
< damien>
I'm not an expert so I'd like to know how to process this.
< damien>
colleague *
< rcurtin>
so, if you're going to do machine learning to learn a threshold, you would need to have training data
< rcurtin>
where basically you have many observations of these sensor values
< rcurtin>
and for each of these observations, you know whether or not you want to issue a warning
< rcurtin>
if you can't obtain data of this type, then doing machine learning is going to be significantly more difficult, and I would say setting a manual threshold would be a better thing
< rcurtin>
if you *can* obtain data of that type, then I would say that you should do something simple like, e.g., logistic regression
< rcurtin>
where you train the logistic regression classifier on your [sensor values, labels] data (where labels are "warning" or "not warning", or whatever is right for your situation)
< damien>
supervised learning?
< rcurtin>
and then you can simply use that logistic regression classifier with your data from the 3 sources at each time step, and it will predict whether or not you should issue a warning
< rcurtin>
yes, exactly, supervised learning is what I'm suggesting here
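To make the supervised setup concrete, here is a small self-contained sketch of logistic regression trained on [sensor values, labels] pairs with batch gradient descent. It is an illustrative stand-in written in plain C++, not mlpack's LogisticRegression class; the names LogisticModel and Train, the step size and the iteration count are all assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Plain logistic regression; an illustrative stand-in, not mlpack's class.
struct LogisticModel
{
  std::vector<double> weights;  // one weight per sensor
  double bias = 0.0;

  // Estimated probability that observation x should trigger a warning.
  double Probability(const std::vector<double>& x) const
  {
    double z = bias;
    for (std::size_t j = 0; j < weights.size(); ++j)
      z += weights[j] * x[j];
    return 1.0 / (1.0 + std::exp(-z));
  }

  // Hard decision: 1 = "warning", 0 = "not warning".
  int Classify(const std::vector<double>& x) const
  {
    return Probability(x) >= 0.5 ? 1 : 0;
  }
};

// data[i] holds the sensor values of observation i; labels[i] is 0 or 1.
LogisticModel Train(const std::vector<std::vector<double>>& data,
                    const std::vector<int>& labels,
                    double stepSize = 0.1,
                    std::size_t iterations = 5000)
{
  assert(!data.empty() && data.size() == labels.size());
  const std::size_t n = data.size();
  const std::size_t d = data[0].size();

  LogisticModel model;
  model.weights.assign(d, 0.0);

  for (std::size_t it = 0; it < iterations; ++it)
  {
    std::vector<double> gradW(d, 0.0);
    double gradB = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
      // Gradient of the log loss for one observation is (p - y) * x.
      const double error = model.Probability(data[i]) - labels[i];
      for (std::size_t j = 0; j < d; ++j)
        gradW[j] += error * data[i][j];
      gradB += error;
    }
    // Average the gradient over all observations and take one step.
    for (std::size_t j = 0; j < d; ++j)
      model.weights[j] -= stepSize * gradW[j] / n;
    model.bias -= stepSize * gradB / n;
  }
  return model;
}
```

After training, Classify is called on the three sensor values at each time step, as described above. Feature values are assumed to be on a comparable scale; unscaled inputs may need a smaller step size.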
< rcurtin>
it's possible to do unsupervised learning here, but based on what I am guessing of your situation here, I might suggest that it would be better to use a manually set threshold instead of doing unsupervised learning
< rcurtin>
whether or not that's actually true... well, not sure, you would be able to know best since you know your situation, but that would at least be my off-the-cuff advice :)
< damien>
I agree. I don't want to go with unsupervised learning either. I do have data which i could use to train it.
< damien>
So do you think this is a better approach than the deterministic way?
< thyrix>
What does your data look like?
snd has joined #mlpack
< damien>
thyrix, I didn't get you. I haven't seen the data yet but it is there. So i don't know the format.
< damien>
It's mostly tags and integer values, if that's what you mean.
< rcurtin>
if you have a bunch of data for training that is labeled, I think that logistic regression would probably be a better way to go
< rcurtin>
although, if you really want to know, what I would do is build an ROC to evaluate the performance of the logistic regression classifier
< rcurtin>
(unfortunately mlpack has no support for that, you'd have to either implement it manually or probably it's easy enough to use scikit-learn's functionality to do it)
< rcurtin>
and you could also build an ROC for the deterministic threshold classifier you suggested, and you could compare the performance
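Since a manual implementation is mentioned, here is a rough sketch of how an ROC curve and its area could be computed from classifier scores and true labels. The names RocPoint, RocCurve and AreaUnderCurve are hypothetical. For the deterministic threshold classifier, the raw sensor value (or the maximum over the three sensors) could serve as the score, which is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct RocPoint
{
  double fpr;  // false positive rate
  double tpr;  // true positive rate
};

// scores[i]: classifier output for observation i (higher = more likely warning).
// labels[i]: true class, 1 = warning, 0 = no warning.
std::vector<RocPoint> RocCurve(const std::vector<double>& scores,
                               const std::vector<int>& labels)
{
  assert(scores.size() == labels.size());

  std::vector<std::pair<double, int>> sorted;
  std::size_t positives = 0;
  for (std::size_t i = 0; i < scores.size(); ++i)
  {
    sorted.emplace_back(scores[i], labels[i]);
    if (labels[i] == 1)
      ++positives;
  }
  const std::size_t negatives = scores.size() - positives;
  assert(positives > 0 && negatives > 0);  // both classes are required

  // Sweep the decision threshold from the highest score downwards.
  std::sort(sorted.begin(), sorted.end(),
            [](const std::pair<double, int>& a,
               const std::pair<double, int>& b) { return a.first > b.first; });

  std::vector<RocPoint> curve;
  curve.push_back(RocPoint{0.0, 0.0});
  std::size_t tp = 0, fp = 0;
  for (std::size_t i = 0; i < sorted.size(); ++i)
  {
    if (sorted[i].second == 1) ++tp; else ++fp;
    // Emit a point only after all observations tied at this score are counted.
    if (i + 1 == sorted.size() || sorted[i + 1].first != sorted[i].first)
    {
      curve.push_back(RocPoint{static_cast<double>(fp) / negatives,
                               static_cast<double>(tp) / positives});
    }
  }
  return curve;
}

// Area under the ROC curve by the trapezoidal rule.
double AreaUnderCurve(const std::vector<RocPoint>& curve)
{
  double area = 0.0;
  for (std::size_t i = 1; i < curve.size(); ++i)
  {
    area += (curve[i].fpr - curve[i - 1].fpr) *
            (curve[i].tpr + curve[i - 1].tpr) / 2.0;
  }
  return area;
}
```

Computing one curve per candidate system on the same held-out data allows the comparison described above.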
< damien>
ROC?
< damien>
Oh yes yes. That would be really good.
< rcurtin>
yeah, those are good measures of how your system will perform in practice
< rcurtin>
(under the assumption that the real-world data you'll be getting matches the distribution of the test data you build the ROC with)
< damien>
Yeah so basically the data that i'll be using is previously collected real world data.
< thyrix>
Hi rcurtin: I encountered some issues with the test system; have you seen that?
< damien>
But the one thing that's bothering me is: how will the system react to spike values? If I train the model on low and medium values, and then in the real world I get super rare high values, will the system deal with them?
snd has quit [Ping timeout: 260 seconds]
Upendra has joined #mlpack
< thyrix>
generally, if you notice spike values, it's better to delete them (or replace them with a value you prefer) before you feed the data to your algorithm
< Upendra>
hi, I want to know about the GSoC projects you offer?
< damien>
thyrix, that would be a bad decision because as i said it is a critical system and every single value is important especially the high ones.
< thyrix>
damien, how you deal with the data will lead to different models. Sometimes it will work well, but it really depends on the data
< thyrix>
Some experiments are necessary
< rcurtin>
damien: you'd need to have some test data with very large values already to see how the system will behave
< rcurtin>
if you can't get much of that data, then maybe the threshold idea is better
< damien>
thyrix, Makes sense.. I'll give some thought to this.
< rcurtin>
I once did a project where we were trying to detect whether or not chickens were stressed based on the noises they were making
< rcurtin>
so we set up microphones in the growout house, and got tons and tons of data in the normal state where they were not stressed
< rcurtin>
then, we modified the environment several times in a way that would stress them
< rcurtin>
but, this gave us only a few instances where they were stressed out
< rcurtin>
so in the end, we were not able to use a sophisticated classifier, but instead we had to use a simple threshold
< rcurtin>
because not enough was known about the ways that chickens would actually react and sound under stress
< rcurtin>
it sounds like, maybe in your case, it is a similar situation
< rcurtin>
(that was a strange project, but I learned a lot about chickens)
< thyrix>
really interesting :)
< damien>
rcurtin, Yes.
Upendra has quit [Quit: Page closed]
< damien>
The data set sounds like what i have. I shall ponder on your words. And thanks a ton for explaining it so much. This saved me spending 2 nights trying to understand the web.
< damien>
Because, the high values are rare. Your experience gave me a question, you had a very large sample of non-stressed data, and you had a very small size of stressed data. This does affect the working of the system doesn't it?
< damien>
And yeah thats hilarious. XD You'll come to mind the next time i see a chicken coop.
< rcurtin>
(sorry I'm in a meeting, hang on a few minutes)
< damien>
okay.
< thyrix>
If most labels are the same, a model that just gives the same result for all data will have good accuracy, but that accuracy is not meaningful
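The effect thyrix describes can be shown with a small sketch: a "model" that always predicts the common class scores high on accuracy while detecting none of the rare cases. Metrics and Evaluate are hypothetical names, and recall (the fraction of true warnings that were caught) is used here as one possible complementary measure.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Metrics
{
  double accuracy;  // fraction of all predictions that are correct
  double recall;    // fraction of true warnings that were predicted
};

// labels/predictions: 1 = rare "warning" class, 0 = common "normal" class.
Metrics Evaluate(const std::vector<int>& labels,
                 const std::vector<int>& predictions)
{
  assert(!labels.empty() && labels.size() == predictions.size());

  std::size_t correct = 0, positives = 0, truePositives = 0;
  for (std::size_t i = 0; i < labels.size(); ++i)
  {
    if (predictions[i] == labels[i])
      ++correct;
    if (labels[i] == 1)
    {
      ++positives;
      if (predictions[i] == 1)
        ++truePositives;
    }
  }

  Metrics m;
  m.accuracy = static_cast<double>(correct) / labels.size();
  // Recall is reported as 0 when there are no positive labels at all.
  m.recall = positives == 0
      ? 0.0
      : static_cast<double>(truePositives) / positives;
  return m;
}
```

With 98 normal observations and 2 warnings, predicting "normal" everywhere gives 98% accuracy and a recall of zero.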
kesslerfrost has joined #mlpack
< thyrix>
But rare values in a feature have a different effect
< thyrix>
If some feature has a different effect roughly according to (extremely low, normal, extremely high) or some other criterion, maybe encoding it as (0, 1, 2) will work better
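The (0, 1, 2) encoding could be sketched as below. The cut points lowCut and highCut are assumptions that would have to come from domain knowledge or from inspecting the data; EncodeLevel and EncodeAll are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Maps a raw feature value to an ordinal code:
// 0 = extremely low, 1 = normal, 2 = extremely high.
int EncodeLevel(double value, double lowCut, double highCut)
{
  if (value < lowCut)
    return 0;
  if (value > highCut)
    return 2;
  return 1;
}

// Encodes a whole column of raw values with the same cut points.
std::vector<int> EncodeAll(const std::vector<double>& values,
                           double lowCut, double highCut)
{
  std::vector<int> codes;
  codes.reserve(values.size());
  for (std::size_t i = 0; i < values.size(); ++i)
    codes.push_back(EncodeLevel(values[i], lowCut, highCut));
  return codes;
}
```

The encoded column can then replace the raw feature as classifier input, at the cost of discarding how far a spike exceeds the cut point.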
< damien>
thyrix, depends a lot on the dataset and the usage.
< damien>
thyrix, yeah, that's one way to do it, but are you suggesting it for a machine learning approach or a static approach? If it's for machine learning, can you elaborate?
< thyrix>
most machine learning algorithms will learn a function from what you give them; what you need to do is make the data easier to learn from.
< thyrix>
large values may cause some numerical instability, but we don't know whether that will be the case here.
< thyrix>
so I suggest trying some different approaches and taking the one that works best
< thyrix>
If your experiment shows a simple threshold is better, then you should use a threshold
thyrixx has joined #mlpack
thyrix has quit [Quit: Page closed]
kesslerfrost has left #mlpack []
cult- has quit [Quit: WeeChat 1.4]
cult- has joined #mlpack
mikeling has quit [Quit: Connection closed for inactivity]
< rcurtin>
damien: yes, we had very few samples of stressed chickens
< rcurtin>
so, giant class imbalance
< rcurtin>
this presents a problem for supervised learning algorithms
< rcurtin>
therefore the threshold idea was easier to apply
< rcurtin>
another idea might be some outlier detection algorithm
thyrixx has quit [Remote host closed the connection]
< rcurtin>
but I am not too knowledgeable in that field
< topology>
rcurtin: i think i understand dual-tree KDE now. It is a very straightforward idea. I do, however, have a silly doubt
< damien>
I see the problem. I'll have to check my data set to see how it is. So, for example, if a car is being taught to drive and it's shown how to turn left a lot and how to turn right only a little, then the next time it's shown a right turn it will not do it perfectly. This explains the class imbalance, right?
< damien>
An outlier detection algorithm: yes, this could be a solution too. Probably, like you said, the best thing is to build an ROC for the different models and check which is most consistent.
< damien>
topology, you could ask your doubt and then wait. maybe someone else will have an answer. :)
Trion has joined #mlpack
Trion has quit [Client Quit]
< rcurtin>
damien: yeah, the difficult thing with the ROC is, if you don't have many examples of situations where you need to detect something, then the ROC may not be very descriptive
< rcurtin>
an assumption in the ROC is that the test data reflects the distribution of the real world data, so if the situations you have for your ROC are not an accurate picture of the situations you'll need to detect in real life, then the ROC may be misleading
< rcurtin>
topology: sure, go ahead, ask the question and I can try to answer
indra has quit [Quit: Connection closed for inactivity]
topology has quit [Ping timeout: 260 seconds]
Trion has joined #mlpack
zoq has quit [Quit: Lost terminal]
zoq has joined #mlpack
zoq has quit [Client Quit]
zoq has joined #mlpack
pvskand has quit [Ping timeout: 260 seconds]
Trion has quit [Ping timeout: 240 seconds]
Trion has joined #mlpack
Trion has quit [Ping timeout: 240 seconds]
deepanshu_ has joined #mlpack
aditya_ has joined #mlpack
aditya_ has quit [Ping timeout: 240 seconds]
travis-ci has joined #mlpack
< travis-ci>
mlpack/mlpack#1975 (master - 18680ff : Ryan Curtin): The build is still failing.
< rcurtin>
but for the SGD class itself, you can just provide defaults and you can get the same behavior where the user does not need to specify the template type
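The idea of giving a class template default arguments, so that users do not have to name the template type, can be sketched as follows. Everything here is hypothetical: Optimizer, PlainUpdate and ClippedUpdate are illustrative names and do not reproduce mlpack's SGD class or its template parameters.

```cpp
#include <cassert>

// Hypothetical update policies (illustrative only, not mlpack's).
struct PlainUpdate
{
  double Step(double x, double gradient, double stepSize) const
  {
    return x - stepSize * gradient;
  }
};

struct ClippedUpdate
{
  double Step(double x, double gradient, double stepSize) const
  {
    // Clip the gradient to [-1, 1] before taking the step.
    if (gradient > 1.0) gradient = 1.0;
    if (gradient < -1.0) gradient = -1.0;
    return x - stepSize * gradient;
  }
};

// The default template argument lets callers write Optimizer<>
// without naming a policy.
template<typename UpdatePolicy = PlainUpdate>
class Optimizer
{
 public:
  explicit Optimizer(double stepSize) : stepSize(stepSize) { }

  // Minimises f(x) = (x - 3)^2, whose gradient is 2 (x - 3).
  double Minimize(double x, int iterations) const
  {
    UpdatePolicy policy;
    for (int i = 0; i < iterations; ++i)
      x = policy.Step(x, 2.0 * (x - 3.0), stepSize);
    return x;
  }

 private:
  double stepSize;
};
```

Optimizer<> opt(0.1); uses PlainUpdate, while Optimizer<ClippedUpdate> opt(0.1); selects the other policy explicitly. Since C++17, class template argument deduction additionally allows writing Optimizer opt(0.1); here.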
travis-ci has joined #mlpack
< travis-ci>
mlpack/mlpack#1979 (master - 805b760 : Ryan Curtin): The build is still failing.
< zoq>
shihao: Don't worry about the GMM test, check out https://github.com/mlpack/mlpack/issues/922. Also, there's no need to post the same message in both the GitHub issue and the IRC channel; we get a notification once you make a comment.
< shihao>
zoq: Ok, I'm sorry.
< zoq>
shihao: No problem, just wanted to let you know.
< shihao>
zoq: Does GMM stand for Gaussian Mixture Model? If so, I'd like to take a look at this issue and try to solve it, since I just learned a lot about Gaussians from NBC.
< zoq>
shihao: yes, feel free to take a look at the test, might be fun :)
< zoq>
shihao: Also, as long as the NBC test or a related test that uses the NBC code does not fail you are fine.
< shihao>
zoq: So how can I rebuild my PR if I changed code that is not a part of this PR?
< zoq>
shihao: You can't restart the build for your PR, not sure why you would do that? As I said as long as your test cases are fine and the rest of the code looks good, we merge it in.
< shihao>
zoq: oh, got it. Thanks!
< zoq>
shihao: If your tests fail and you think it's probably because of some bad initialization, we can restart the build for you.
< shihao>
zoq: It's because of other code. No need for rebuild. Thanks :)
vinayakvivek has quit [Quit: Connection closed for inactivity]