verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
nilay has joined #mlpack
marcosirc has quit [Quit: WeeChat 1.4]
tham has joined #mlpack
< tham>
@keonkim I listed out the features already done/undone and some questions here
< tham>
I think most of the features proposed are done already
< tham>
Please tell me if I missed something
< tham>
Does mlpack or armadillo provide any function for Gram-Schmidt computation?
< tham>
the fast PCA algorithm uses GS (Gram-Schmidt) to compute the eigenvectors; it is not hard to implement
< tham>
But I prefer to reuse existing solution first
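If no existing routine turns up, classical Gram-Schmidt is short to write. Below is a dependency-free sketch (all names here are illustrative, not mlpack or Armadillo API); note that Armadillo's qr() also yields an orthonormal basis as a by-product.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Classical Gram-Schmidt: orthonormalise a set of vectors in place.
// Each inner vector is one column; all must have the same dimension.
using Vec = std::vector<double>;

double Dot(const Vec& a, const Vec& b)
{
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

void GramSchmidt(std::vector<Vec>& v)
{
  for (std::size_t j = 0; j < v.size(); ++j)
  {
    // Subtract the projections onto the previously orthonormalised vectors.
    for (std::size_t k = 0; k < j; ++k)
    {
      const double p = Dot(v[j], v[k]);
      for (std::size_t i = 0; i < v[j].size(); ++i) v[j][i] -= p * v[k][i];
    }
    // Normalise the remainder.
    const double n = std::sqrt(Dot(v[j], v[j]));
    for (double& x : v[j]) x /= n;
  }
}
```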
Stellar_Mind has joined #mlpack
< nilay>
after unrolling a std::tuple using recursion, how can we access both the previous element, std::get<I - 1>, and the next one, std::get<I + 1>? Is it because a tuple is a doubly linked list?
< tham>
nilay : Could you post the code?
< tham>
if you pass in the whole tuple parameters
< tham>
you can access it like this std::get<I-1>(tuple_type), std::get<I+1>(tuple_type)
< tham>
You can treat it as a "vector" with random access ability, but you have to specify the index at compile time
< tham>
keonkim : sorry, I used the wrong symbol to notify you (by @)
< tham>
You can see that the cnn code iterates the tuple by index, but it still passes the whole network (tuple type)
< nilay>
tham: this is the code used in cnn.hpp, zoq sent me a mail explaining this . . . http://pastebin.com/CsAi4P16 .. so I could use std::get<I + 3> also; it's just that we must know the index at compile time, so a loop will not work in this case
< tham>
A normal loop cannot
< tham>
you must use recursion to specify the index of the element you want to access
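The recursion looks roughly like this minimal sketch (SumTuple is a made-up example, not the cnn code): the index I is a template parameter, so std::get<I> is legal, and the "loop" advances by instantiating the function again with I + 1, with std::enable_if selecting the terminating overload once I runs past the last element.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

// Base case: I is one past the last element; stop the recursion.
template<std::size_t I = 0, typename... Ts>
typename std::enable_if<I == sizeof...(Ts), void>::type
SumTuple(const std::tuple<Ts...>& /* t */, double& /* sum */)
{
}

// Recursive case: visit element I, then recurse with I + 1.
template<std::size_t I = 0, typename... Ts>
typename std::enable_if<I < sizeof...(Ts), void>::type
SumTuple(const std::tuple<Ts...>& t, double& sum)
{
  sum += std::get<I>(t);   // The index is a compile-time constant here.
  SumTuple<I + 1>(t, sum); // "I++" happens by recursion, not at run time.
}
```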
< nilay>
why do we have to give it at compile time?
< tham>
Because it is the restriction of std::tuple
< tham>
If you want to access the element of tuple, you need to know the index of the element at compile time, or the type of the tuple element
< nilay>
ok thanks.
< nilay>
so if all the values in the tuple were int, we could use a loop
< nilay>
?
< tham>
no, that is not what I mean
< tham>
What I mean is
< tham>
std::tuple<int, double, char> vvv = .....
< tham>
then you can access the element as
< tham>
std::get<int>(vvv), std::get<double>(vvv), std::get<char>(vvv)
< tham>
This kind of trick looks a little confusing at first, but you will get familiar with them very soon
< tham>
The differences are: the syntax is a little bit different, and part of the information has to be known at compile time
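That example, filled in (TupleAccessDemo is just an illustrative name; std::get by type requires C++14 and a type that occurs exactly once in the tuple):

```cpp
#include <cassert>
#include <tuple>

void TupleAccessDemo()
{
  std::tuple<int, double, char> vvv{42, 3.14, 'x'};

  // By index -- the index must be a compile-time constant:
  assert(std::get<0>(vvv) == 42);

  // By type (since C++14) -- only valid when the type occurs exactly once:
  assert(std::get<double>(vvv) == 3.14);
  assert(std::get<char>(vvv) == 'x');
}
```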
< nilay>
yes, I know what it does, but the thing is that templates are also specialized at compile time only
< nilay>
so we know the type at compile time?
< tham>
yes
< tham>
in other words, we have to know them
< tham>
else we cannot apply template tricks
< tham>
If you only can deduce the type/index at run time, then you have to rely on dynamic features
< tham>
Like the notorious (in most cases) RAII
< tham>
sorry, not RAII
< tham>
RTTI
< tham>
virtual, std::function
< tham>
void* (rarely seen these days)
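A small sketch of the run-time alternative (the Layer/Doubler/AddOne names are invented for illustration, not mlpack types): with virtual dispatch, a plain run-time loop over heterogeneous elements works, at the cost of an indirection per call.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// The type of each element is erased behind a common interface, so the
// choice of operation can be made at run time -- unlike std::tuple.
struct Layer
{
  virtual ~Layer() = default;
  virtual double Forward(double x) const = 0;
};

struct Doubler : Layer { double Forward(double x) const override { return 2 * x; } };
struct AddOne  : Layer { double Forward(double x) const override { return x + 1; } };

double Run(const std::vector<std::unique_ptr<Layer>>& net, double x)
{
  for (const auto& l : net) x = l->Forward(x);  // An ordinary loop suffices.
  return x;
}
```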
Mathnerd314_ has quit [Ping timeout: 240 seconds]
< keonkim>
tham: hello, I was out for lunch sorry
< keonkim>
tham: I read your pastebin; the initial plan was written without deep understanding of the features inside mlpack & armadillo. Turns out some of the core functions I mentioned in the proposal already exist in both libraries.
< keonkim>
tham: I agree with the facts you pointed out in the pastebin.
< keonkim>
For DataIO, I also don't want a new dependency (especially one not relevant to machine learning) to be introduced to mlpack.
< keonkim>
For Data Transformation.. I think it is basically done once DatasetMapper and Imputer are merged.
< keonkim>
For Statistical Analysis, I will finish this by this week.
< keonkim>
For Mathematical Operator, I think the most relevant and useful application of being able to parse time data is to allow time series data analysis. But I am not familiar with this, so I need to think about it.
< keonkim>
tham: I guess most of the features written in the proposal are done after statistical analysis
< keonkim>
?
< keonkim>
tham: GSoC admins said initial project plans can shift if mentors and mentees agree on it. So I was thinking
< tham>
keonkim : I agree
< keonkim>
1. I could do a simple project using the ann module and write a tutorial.
< keonkim>
2. Make more Policy and Imputation classes
< tham>
"allow time series data analysis", could you give us some examples?
< tham>
most of the features will be done after statistical analysis
< keonkim>
tham: You know, some economic datasets have titles like "Monthly Milk Production" or "Annual Unemployment Rate". (financial datasets too)
< keonkim>
tham: and they come with all kinds of date formats, like 1993-04, 2013.12, 2016/06
< keonkim>
tham: currently mlpack cannot parse them, but maybe that's up to users to preprocess
< tham>
The problem is there are many time formats
< tham>
The second question is: what do you want to do after you parse them?
< tham>
I think boost date time can parse most of the time format
< tham>
we need to think about this; I will study this more. It would be better if there are some libraries we can refer to
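As a dependency-free sketch of the parsing problem (ParseYearMonth is a hypothetical name, not an mlpack or Boost API), the standard library's std::get_time can already try a list of candidate formats covering the examples above:

```cpp
#include <cassert>
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Try each known year-month format until one parses cleanly.
bool ParseYearMonth(const std::string& s, int& year, int& month)
{
  for (const char* fmt : { "%Y-%m", "%Y.%m", "%Y/%m" })
  {
    std::istringstream in(s);
    std::tm tm = {};
    in >> std::get_time(&tm, fmt);
    if (!in.fail())
    {
      year = tm.tm_year + 1900;  // tm_year counts from 1900.
      month = tm.tm_mon + 1;     // tm_mon counts from 0.
      return true;
    }
  }
  return false;
}
```

Boost.Date_Time offers much richer facilities (calendars, durations, locales); this only shows that the multi-format problem itself is tractable.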
< keonkim>
after parsing them,
< keonkim>
tham: I wanted to implement stock prediction model using mlpack (I guess that is going to be after gsoc)
< tham>
stock rediction model....
< tham>
prediction model....
< tham>
sounds interesting
< keonkim>
:)
< tham>
Which machine learning technique do you want to use?
< tham>
ann?
< keonkim>
I think it is going to be a little hard, but the goal is using cnn.
< tham>
keonkim : cnn works very well on image classification tasks
< tham>
but I am not sure it works for stock prediction too
< keonkim>
tham: Yeah, there are not many resources on it. That's why I wanted to give it a try.
< tham>
zoq is the expert on ann, I think you could ask him to give you some recommendations
< tham>
autoencoders and RBMs are a more general way to extract features
< tham>
keonkim : hope you get good results on stock prediction
< keonkim>
tham: haha I hope :)
< keonkim>
tham: Boost.Date_Time seems to be the perfect library for manipulating dates. Maybe I could use it by itself instead of introducing it into mlpack.
mentekid has joined #mlpack
< tham>
keonkim : glad you like it :)
< nilay>
tham: thanks, that example helped,
mentekid has quit [Ping timeout: 264 seconds]
tham has quit [Ping timeout: 250 seconds]
mentekid has joined #mlpack
Stellar_Mind has quit [Ping timeout: 252 seconds]
Stellar_Mind has joined #mlpack
nilay has quit [Ping timeout: 250 seconds]
Stellar_Mind has quit [Ping timeout: 244 seconds]
Stellar_Mind has joined #mlpack
nilay has joined #mlpack
marcosirc has joined #mlpack
Stellar_Mind has quit [Ping timeout: 272 seconds]
nilay has quit [Ping timeout: 250 seconds]
Stellar_Mind has joined #mlpack
Mathnerd314 has joined #mlpack
< keonkim>
rcurtin zoq tham: I believe #694 is ready to be merged. Please have a look. :)
< zoq>
keonkim: Sure I'll go and take a look at the code later today, any idea why the imputation test fails?
< rcurtin>
keonkim: I'll take a look also later today
Stellar_Mind has quit [Ping timeout: 252 seconds]
travis-ci has joined #mlpack
< travis-ci>
mlpack/mlpack#1176 (master - 98babfc : Ryan Curtin): The build was broken.
sumedhghaisas has quit [Ping timeout: 260 seconds]
sumedhghaisas has joined #mlpack
sumedhghaisas has quit [Read error: No route to host]
sumedhghaisas has joined #mlpack
sumedhghaisas has quit [Ping timeout: 258 seconds]
marcosirc has quit [Quit: WeeChat 1.4]
< zoq>
nilay: n_pos_per_gt = int(ceil(float(n_pos) / n_img / len(bnds))) returns the number of positive locations, and n_neg_per_gt the number of negative locations; in both cases 500 locations. If the boundary map contains more than 500 positive or negative locations, we just randomly choose the right number of locations.
< zoq>
nilay: We could also use 2000 locations, but that would increase the number of features and, in the end, the runtime.
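That logic, restated as a C++ sketch (PosPerGt and Subsample are illustrative names, not the actual edge-detection code): split the positive-location budget evenly across images and boundary maps, then randomly subsample any map that offers more candidates than its share.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Budget of positive locations per ground-truth boundary map:
// total budget nPos split evenly over nImg images and nBnds maps.
int PosPerGt(int nPos, int nImg, int nBnds)
{
  return (int) std::ceil((double) nPos / nImg / nBnds);
}

// If a map offers more candidate locations than the budget,
// randomly keep exactly `budget` of them.
std::vector<int> Subsample(std::vector<int> candidates, int budget,
                           std::mt19937& rng)
{
  if ((int) candidates.size() > budget)
  {
    std::shuffle(candidates.begin(), candidates.end(), rng);
    candidates.resize(budget);
  }
  return candidates;
}
```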
travis-ci has joined #mlpack
< travis-ci>
mlpack/mlpack#1180 (master - 6147ed0 : Marcus Edel): The build was fixed.