#mlpack on 2014-08-04 — irc logs at libera.irclog.whitequark.org

2014-05-21 16:24 naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/

00:47 sumedhghaisas has quit [Ping timeout: 264 seconds]

02:01 jbc__ has joined #mlpack

03:16 jbc__ has quit [Quit: jbc__]

07:58 Anand has joined #mlpack

08:00 < Anand> Marcus : I got a similar error yesterday with scikit. I didn't get it though. I don't know why you are getting 3 plots. You should get only two (one for perceptron and one for logistic).

08:00 < Anand> Can you show me the plots?

08:13 < Anand> Marcus : Is the bootstrapping table printing correctly on your terminal? I mean does it contain two rows for perceptron?

08:42 < Anand> Marcus : Can we see what self.cur.fetchall() returns for the function GetMethodMetricResultsForLibrary(..)?

08:46 < Anand> Marcus : There is a bug in your config file. You are using scikit as the library but running mlpack/perceptron.py

09:33 < Anand> Marcus : It generates an extra plot because the final metrics dict is built in succession, one library at a time. The buildId loop in make_reports.py runs twice, once for each library.

10:12 Anand has quit [Quit: Page closed]

11:24 < jenkins-mlpack> Project mlpack - nightly matrix build build #549: ABORTED in 7 hr 24 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20nightly%20matrix%20build/549/

11:42 jbc__ has joined #mlpack

11:47 jbc__ has quit [Client Quit]

12:22 jbc__ has joined #mlpack

12:49 Anand has joined #mlpack

12:50 < Anand> Marcus : The method GetMethodMetricResultForLibrary(..) is returning an empty array in some cases (something wrong with the query?), so accessing the 0th element of the array is out of bounds, hence the error.

12:50 < Anand> Now, I don't understand why and when it is returning an empty array

12:52 < Anand> Logically it should return an empty array when the corresponding metric entry is not found in the table. But I can see the entry in the database

12:53 < marcus_zoq> Anand: Yeah, it's in the database.

12:54 < Anand> Marcus : What could the bug possibly be then?

12:54 < Anand> Query?

12:56 < marcus_zoq> Anand: Maybe, I need to go through the code line by line, give me a few minutes

12:57 < Anand> Ok, sure.

13:03 sumedhghaisas has joined #mlpack

13:22 < marcus_zoq> Anand: Okay, I fixed the error, but your limit parameter doesn't work, because there are two build ids (with the example config file). The first build id represents mlpack and the second build id represents scikit. But we benchmark two mlpack functions and one scikit function. So in the case of the perceptron benchmark this works, but it doesn't work for the logistic regression benchmark.

13:22 sumedhghaisas has quit [Ping timeout: 264 seconds]

13:24 < Anand> Marcus : What was the error? Can you push the code?

13:25 < Anand> I will see what can be done about the limit parameter

13:32 < marcus_zoq> Anand: I pushed my changes. We need to check if the results from the query are empty. At the beginning we gather all method ids, but we cannot assume that all libraries have results for all method ids, so we need to check if the metric results are empty for a particular method id.

13:34 < Anand> Marcus : Yes, we need to check that. I will introduce a condition on the array returned by fetchall() and see if it is empty

13:36 < marcus_zoq> Anand: I already did this :) if not metrics_string: continue
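
A minimal sketch of the empty-result guard discussed above. The `metrics` table, its columns, and the helper names `get_metric_results` and `collect_metrics` are hypothetical stand-ins, not the benchmark system's actual schema or code; only the `fetchall()` call and the `if not ...: continue` pattern come from the conversation.

```python
import sqlite3


def get_metric_results(cur, build_id, method_id):
    """Hypothetical stand-in for GetMethodMetricResultForLibrary(..)."""
    cur.execute(
        "SELECT metric FROM metrics WHERE build_id = ? AND method_id = ?",
        (build_id, method_id),
    )
    # fetchall() returns an empty list when no row matches the query.
    return cur.fetchall()


def collect_metrics(cur, build_ids, method_ids):
    """Gather metrics per (build, method), skipping pairs without results."""
    results = {}
    for build_id in build_ids:
        for method_id in method_ids:
            rows = get_metric_results(cur, build_id, method_id)
            if not rows:
                # Not every library has results for every method id, so
                # indexing rows[0] without this check would raise IndexError.
                continue
            results[(build_id, method_id)] = rows[0][0]
    return results


# Assumed example data: build 1 (mlpack) has two methods, build 2 (scikit) one.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE metrics (build_id INTEGER, method_id INTEGER, metric TEXT)"
)
cur.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?)",
    [
        (1, 1, "perceptron-mlpack"),
        (1, 2, "logistic-mlpack"),
        (2, 1, "perceptron-scikit"),
    ],
)
collected = collect_metrics(cur, [1, 2], [1, 2])
```

With this data the pair (2, 2) is simply absent from `collected`, which mirrors the case where scikit has no logistic regression result.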

13:36 < Anand> Ok you already did that in make_reports.py!

13:36 < Anand> Yes

13:36 < Anand> So, I need to fix the limit workaround now. I have also rounded off the metrics to 5 decimal places.

13:37 < marcus_zoq> Anand: I'm not sure how to solve the limit problem.

13:37 < marcus_zoq> Anand: Great!

13:37 < Anand> Marcus : I will think through

13:48 < marcus_zoq> Anand: This could be a solution: https://gist.github.com/zoq/c18d0a695e327ae86f3f -> Now I get a LogisticRegression plot; it shows a graph for the scikit logistic regression, but there is no data for the scikit logistic regression method

13:49 < marcus_zoq> Anand: My solution isn't really good, maybe there is a better way to calculate the limit ...

14:26 < Anand> Marcus : I didn't really understand. Is it working for you?

14:32 < Anand> Marcus : It won't work as expected. Not generating the grouped plot of two libraries together

14:32 < Anand> I need to see how to do this

14:33 < marcus_zoq> Anand: The fix works, so I get two files LogisticRegression.html and PERCEPTRON.html. The PERCEPTRON.html looks completely correct two graphs one for mlpack and one graph for scikit. The table under the graph is also correct. The LogisticRegression.html file is completely correct there is one graph for mlpack and one graph for scikit, but there should be only one graph for mlpack. The table under the graph is correct and shows only the result for mlpack.

14:34 < marcus_zoq> Anand: *The LogisticRegression.html file isn't

14:38 Anand_ has joined #mlpack

14:38 < Anand_> Marcus : The values plotted are not correct

14:38 < Anand_> Both plots are the same (NBC.html and the other)

14:39 < Anand_> The libraries also seem to be interchanged in my case

14:40 Anand has quit [Ping timeout: 246 seconds]

14:41 < marcus_zoq> Anand_: Ah okay, I used the same results twice, do you have an idea to fix this?

14:42 < Anand_> Marcus : Not exactly. I will fix this after going through the code again. Need some time.

14:44 < Anand_> The metrics.csv has entries for only one library (two different methods). Not sure why

14:44 < Anand_> Let me see

14:45 < marcus_zoq> Anand_: Okay, sure, I'm heading home in a few minutes, so I'm not available ...

14:50 Anand_ has quit [Ping timeout: 246 seconds]

15:01 govg has joined #mlpack

15:01 govg has quit [Changing host]

15:01 govg has joined #mlpack

15:20 andrewmw94 has joined #mlpack

15:24 < jenkins-mlpack> Starting build #2067 for job mlpack - svn checkin test (previous build: SUCCESS)

16:16 sumedhghaisas has joined #mlpack

16:16 < sumedhghaisas> naywhayare: hey ryan, you free??

16:17 < naywhayare> sumedhghaisas: yeah, I am here

16:17 < naywhayare> sorry that I was mostly unavailable over the weekend

16:18 < sumedhghaisas> ohh thats okay... now I am facing the same problem faced by siddharth ....

16:19 < sumedhghaisas> the normal SVD apply will return U, E and V

16:19 < sumedhghaisas> same as quick SVD...

16:19 < sumedhghaisas> naywhayare: what was the way out??

16:20 govg has quit [Ping timeout: 255 seconds]

16:20 < naywhayare> so you have X -> U E V^T

16:20 < naywhayare> but you need X -> W H

16:20 < naywhayare> so just take W = U E and H = V^T, or take W = U and H = E V^T

16:20 udit_s has joined #mlpack

16:20 < sumedhghaisas> yes... I will just multiply U and the first r entries of E

16:20 govg has joined #mlpack

16:20 < sumedhghaisas> yes... but as we want rank r...

16:21 < naywhayare> yeah

16:21 < sumedhghaisas> we have to take first r entries of E right??

16:21 < naywhayare> right
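
The mapping from the SVD to a rank-r W and H described above can be written out as follows. This is a sketch in the notation of the conversation (E for the matrix of singular values); the subscript r, meaning "keep the first r columns or entries", and the sizes n x m for X are notation introduced here.

```latex
% Rank-r truncation of the SVD X = U E V^T, with X of size n x m
X \approx U_r E_r V_r^{T}, \qquad
U_r \in \mathbb{R}^{n \times r},\quad
E_r = \operatorname{diag}(\sigma_1, \dots, \sigma_r),\quad
V_r \in \mathbb{R}^{m \times r}

% Either split yields the same product W H = U_r E_r V_r^T
W = U_r E_r,\quad H = V_r^{T}
\qquad \text{or} \qquad
W = U_r,\quad H = E_r V_r^{T}
```

In both cases W is n x r and H is r x m, so the wrapper's two-matrix Apply() can return them without exposing E.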

16:21 < sumedhghaisas> okay... my problem was how to call that function...

16:22 < sumedhghaisas> you said something about SFINAE that time...

16:22 < naywhayare> to call which function?

16:22 < sumedhghaisas> factorizer.Apply()

16:22 < sumedhghaisas> cause it now takes more arguments...

16:22 < naywhayare> well, you're making a wrapper class around arma::svd()

16:22 < naywhayare> you don't need factorizer.Apply() to take more arguments (where factorizer is your wrapper class)

16:23 < sumedhghaisas> but what about QuickSVD??

16:23 < naywhayare> you just need to make that Apply() function map from U E V^T to W and H

16:23 < sumedhghaisas> can I add overload of Apply() in QuickSVD??

16:23 < naywhayare> hm... that's a compelling point. Siddharth could write a wrapper class for QuicSVD

16:23 < naywhayare> or add another Apply() function to QuicSVD

16:24 < sumedhghaisas> yes... I would prefer Apply() function...

16:24 < naywhayare> or we could use metaprogramming to detect whether the Apply() function takes two or three arguments

16:24 < naywhayare> I think adding Apply() to QuicSVD is the right solution, where that overload of Apply() will factorize into a W and H matrix

16:24 < sumedhghaisas> I haven't seen siddharth around... I wanted to talk to him about that...

16:24 < sumedhghaisas> yes I agree...

16:25 < naywhayare> he said he is having networking troubles because he doesn't have permanent network access

16:25 < naywhayare> you could email him at siddharth.950@gmail.com

16:25 < sumedhghaisas> okay... I will do that...

16:25 < sumedhghaisas> and another issue...

16:26 < sumedhghaisas> now as I am refactoring CF module... I remember there was an issue about where to factorize the matrix...

16:26 < sumedhghaisas> in constructor or in getRecommendations...

16:27 < sumedhghaisas> wait let me find a ticket...

16:27 < naywhayare> (I'm heating up my lunch... back in a minute)

16:27 < sumedhghaisas> okay..

16:29 < sumedhghaisas> #351

16:34 govg has quit [Ping timeout: 256 seconds]

16:35 govg has joined #mlpack

16:35 govg has quit [Changing host]

16:35 govg has joined #mlpack

16:37 < naywhayare> sumedhghaisas: sorry that took so long

16:38 < naywhayare> I don't think siddharth ever responded to #351, but I think it would be a good idea to move the factorization to the constructor

16:38 < sumedhghaisas> yes I agree... but the only problem is... what about the rank??

16:38 < naywhayare> the rank is a parameter of the factorization, so it would have to become a parameter for the constructor

16:39 < naywhayare> (the constructor of CF, that is)

16:39 < sumedhghaisas> okay then... and I just saw #160...

16:40 < sumedhghaisas> that means there is no SVD for sparse matrix??

16:40 < naywhayare> that's correct

16:40 < naywhayare> factorizations for sparse matrices are hard, and there aren't standard packages for any of them

16:40 < sumedhghaisas> ohh then the wrapper will create a problem when supplied with sp_mat

16:41 < naywhayare> that's fine; you can force the wrapper to take arma::mat, not some template argument

16:41 < naywhayare> since it'll only work for dense matrices

16:41 < sumedhghaisas> then it will be a very good idea to add it to MLPACK... as no other implementation exists...

16:41 < naywhayare> and then you can note in the documentation that it won't work for arma::sp_mat

16:41 < naywhayare> well, the problem is that sparse svd is a difficult problem

16:42 < sumedhghaisas> should I use SFINAE to force it?? or just remove template from the wrapper??

16:42 < naywhayare> just remove the template from the wrapper

16:42 < naywhayare> no need for templates when it only works for one type

16:43 < sumedhghaisas> okay...

16:43 < sumedhghaisas> https://pypi.python.org/pypi/sparsesvd/

16:43 < sumedhghaisas> can we implement this in C++??

16:44 < naywhayare> that's just a wrapper around SVDLIBC, which I didn't know about

16:44 < naywhayare> either way, wrapping that for armadillo will take a significant amount of effort, and I don't think we should worry about it for now

16:44 < naywhayare> especially because I can't even get the SVDLIBC page to load...

16:45 < sumedhghaisas> hehe...

16:45 < sumedhghaisas> same here...

16:49 < sumedhghaisas> it's a C library... it wouldn't be difficult to create a wrapper...

16:50 < naywhayare> yeah, but it's the other intricacies that are the hard part

16:50 < naywhayare> the wrapper is easy, but we also need to make sure that armadillo can detect the library and that it links properly against it

16:50 < naywhayare> and depending on how they wrote their C code, it may be difficult to make it work with eT = float, double, std::complex<float>, std::complex<double>

16:50 < naywhayare> then writing the tests takes forever...

16:51 < sumedhghaisas> yeah... that's true...

16:51 < sumedhghaisas> maybe after this placement month I will try it... but the ultimate problem, will conrad agree to that??

16:52 < naywhayare> haha, that is always the hard part

16:52 < naywhayare> I can't find svdlibc in the debian repos

16:52 < naywhayare> so it may also involve pushing svdlibc to the repositories of various distros

16:53 < sumedhghaisas> that reminds me... what happened to iterator code??

16:53 < sumedhghaisas> did you send it to conrad??

16:54 < sumedhghaisas> and is MLPACK on any distros??

16:54 < jenkins-mlpack> Project mlpack - svn checkin test build #2067: SUCCESS in 1 hr 29 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2067/

16:54 < jenkins-mlpack> Ryan Curtin: Minor code cleanups.

16:55 < naywhayare> ack, I forgot about the iterator code. I will do that after my meeting in an hour

16:55 < naywhayare> mlpack has been in fedora for a few years, and recently got into the debian repos (so it's in Ubuntu, Mint, etc.)

16:55 < sumedhghaisas> ohh cool...

16:56 < naywhayare> I think someone even packaged it for FreeBSD and Gentoo

16:56 < naywhayare> also Arch Linux and maybe homebrew (not sure on that one)

16:57 < sumedhghaisas> I also have a doubt in my next implementation... so are you free right now or after the meeting??

16:57 < naywhayare> I'm here for an hour... go ahead

16:58 < sumedhghaisas> okay... can you refer to paper... Collaborative Filtering for Implicit Feedback Datasets...

16:59 < sumedhghaisas> on page 4...

16:59 < naywhayare> okay, I am looking at it

16:59 < sumedhghaisas> the paragraph starting with... "A computational bottleneck here is computing...

17:00 < sumedhghaisas> I didn't understand how their method of computation is faster..

17:00 govg has quit [Quit: leaving]

17:01 < naywhayare> okay, I see

17:01 < naywhayare> so, Y^T Y can be precomputed and stored

17:02 < sumedhghaisas> but how exactly does it help??

17:02 < naywhayare> well, the trick is in observing that C^u - I is very sparse

17:02 < naywhayare> so then Y^T (C^u - I) Y is a fast calculation, way faster than O(f^2 n) because (C^u - I) is sparse

17:03 < naywhayare> in an optimizer, you'll calculate the value of x_u many times (that's equation (4))

17:03 < naywhayare> so if you could, say, compute and store Y^T Y at the beginning of the optimization, then you only need to calculate Y^T (C^u - I) Y each time you need the objective function

17:04 < naywhayare> which can be done much more quickly than calculating Y^T C^u Y
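
The argument above rests on one identity, sketched here. It assumes the paper's setup as commonly stated: Y is the n x f item-factor matrix, C^u is the n x n diagonal confidence matrix for user u, and n_u (a symbol introduced here) counts the diagonal entries of C^u that differ from 1.

```latex
% Per-user solve (equation (4) of the paper)
x_u = \left( Y^{T} C^{u} Y + \lambda I \right)^{-1} Y^{T} C^{u} p(u)

% Split the expensive product into a shared part and a sparse correction
Y^{T} C^{u} Y = Y^{T} Y + Y^{T} \left( C^{u} - I \right) Y

% Cost: Y^T Y is O(f^2 n) but computed once for all users;
% C^u - I has only n_u nonzero diagonal entries, so the correction
% costs O(f^2 n_u) per user, with n_u typically much smaller than n.
```

Only the rows of Y that correspond to the nonzero diagonal entries of C^u - I contribute to the correction term, which is why storing that difference as a sparse matrix pays off.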

17:04 < sumedhghaisas> but does our matrix multiplication support this faster sparse multiplication??

17:04 < sumedhghaisas> I mean armadillo ...

17:05 < sumedhghaisas> does armadillo support faster sparse multiplication??

17:05 < naywhayare> yes, it should be faster

17:06 < sumedhghaisas> for that 'Cu' should be sparse right??

17:06 < sumedhghaisas> I mean sp_mat...

17:07 < naywhayare> yeah

17:10 < sumedhghaisas> Cu will be equal to arma::transpose(V.col(i)) * eye<sp_mat>(n, n)

17:11 < sumedhghaisas> humm... okay got it...

17:11 < sumedhghaisas> thanks...

17:11 < naywhayare> okay, good. let me know if you have any further issues

17:12 < sumedhghaisas> yes sure... :)

17:17 udit_s has quit [Quit: Leaving]

17:24 govg has joined #mlpack

17:47 govg has quit []

18:18 Anand has joined #mlpack

18:34 Anand has quit [Ping timeout: 246 seconds]

18:35 < sumedhghaisas> naywhayare: where should I put the wrapper of svd??

18:35 < sumedhghaisas> I mean the file...

18:40 < naywhayare> sumedhghaisas: why not in methods/cf/?

18:41 < sumedhghaisas> naywhayare: create folder svd??

18:41 < sumedhghaisas> maybe then we can shift QUICK-SVD inside it...

18:44 < naywhayare> let's wait to figure out what to do with all the factorizers until they're done

18:44 < naywhayare> I'll try and think of some ideas

18:44 < sumedhghaisas> okay... cool... so for right now should I create a folder and add PlainSVD??

18:48 < naywhayare> no, just add it to methods/cf/ directly for now

18:48 < sumedhghaisas> okay

18:51 Anand has joined #mlpack

18:51 < Anand> Marcus : Fixed the bug!

18:52 < marcus_zoq> Anand: Great!

18:52 < Anand> Run and check once.

18:54 < marcus_zoq> Anand: Looks good, you can delete the debug message :)

18:55 < Anand> Oh yeah! Forgot that. Will remove it

18:56 < marcus_zoq> Anand: So the next thing is to integrate the bar chart?

18:57 < Anand> The bar chart is already done, right?

18:58 < marcus_zoq> Anand: I've seen the template, but I thought you need to integrate it?

18:58 < Anand> Integrate with what?

19:00 < marcus_zoq> Anand: Ah, sorry for the confusion, I meant the pie chart

19:01 < Anand> Marcus : Yes, I wanted to add the pie chart but I am not yet sure how.

19:02 < Anand> The proposal mentions it differently but I don't see a need to represent the true/false positives/negatives using pie charts

19:03 < marcus_zoq> Anand: Yeah, I think you are right

19:04 < Anand> Marcus : Do you have any suggestions regarding more representations? I think we already did the most useful one

19:06 < marcus_zoq> Anand: yeah I think so, I'm a little bit concerned about the representation if we compare a lot more libraries than just two

19:07 < Anand> Marcus : You can add any number of libraries in the bar chart that we did

19:07 < Anand> Just add more to the config file

19:07 < Anand> That is why we call it the grouped bar chart

19:16 < marcus_zoq> Anand: Okay, I've tested the code with more libraries and you are right, it looks good, except that the table under the graph isn't correct. But I think we should move the legend to another position: https://urgs.org/graphs.png

19:17 < Anand> Oh! I don't understand why that would happen to the table! And yes, I will move the legend

19:19 < Anand> Marcus : Also check the values in the .csv file generated. All the libraries are performing really closely it seems!

19:19 < marcus_zoq> Anand: I used the same values for all libraries :)

19:20 < Anand> Oh! What happened to the table there?

19:21 < marcus_zoq> Anand: Good question

19:21 < Anand> Marcus : Can you see the dictionary? See if it is correct

19:22 < marcus_zoq> Anand: The dict from the make reports?

19:22 < Anand> Yes

19:22 < Anand> Marcus : We use that dict to build that HTML table

19:23 < Anand> If the dict is correct, the table should also be correct

19:23 < marcus_zoq> Anand: I think the dict looks good: https://gist.github.com/zoq/7353d027bb6bbff91c3a

19:24 < Anand> Marcus : Ok. Can you just try with 3 libraries once?

19:26 < Anand> The code still looks good. I don't see any bug there

19:26 < marcus_zoq> Anand: Okay I think I messed up, I deleted everything and now everything looks good

19:26 < Anand> Marcus : Ok. So the table is printing correctly? Can you show me?

19:28 < marcus_zoq> Anand: https://urgs.org/PERCEPTRON.html

19:29 < Anand> Marcus : Great! Without the graphs though! :P

19:29 < marcus_zoq> Anand: oh, wait

19:33 < marcus_zoq> Anand: Okay, now it should work

19:33 < Anand> Marcus : Nice!

19:37 < naywhayare> marcus_zoq: Anand: that looks really nice

19:38 < naywhayare> do you think it would be better to group by the metric instead of by the library?

19:40 < Anand> Ryan : That is a matter of how you like to make sense out of the visualizations. We can do both. And yes it is a nice idea to group by metrics. I will do that too! :)

19:41 < naywhayare> ah, good idea

19:43 < Anand> Ryan : Yes actually it is good to do both. Will be more insightful, I guess.

19:43 < sumedhghaisas> naywhayare: is there any way to simulate a move constructor in armadillo?

19:44 < sumedhghaisas> naywhayare: ahh got it... swap... nevermind

19:52 < sumedhghaisas> naywhayare: what to do when a matrix with dimension n * m is provided by the user where n != m? I think zeros have to be appended, right?

19:53 < sumedhghaisas> to W, sigma and H

20:03 Anand has quit [Quit: Page closed]

20:06 < naywhayare> for CF?

20:06 < naywhayare> if n != m then the number of users is not equal to the number of items

20:06 < naywhayare> I don't see why that's an issue though; SVD will work with non-square matrices

20:06 < naywhayare> and so will all of the other factorization techniques

20:07 < sumedhghaisas> if n != m and we take SVD ...

20:07 < sumedhghaisas> W * diagmat(sigma) * H is invalid...

20:07 < sumedhghaisas> dimensions do not match...

20:08 < sumedhghaisas> like if we take SVD of 5 * 4 matrix

20:09 < sumedhghaisas> sorry W * diagmat(sigma) * trans(H)

20:09 < naywhayare> if n != m then sigma is a rectangular diagonal matrix, not a square matrix like I think diagmat() will produce

20:09 < naywhayare> so you'd need to make an n x m matrix, then set its diagonal (of length min(n, m)) to sigma
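
The shapes involved can be spelled out as a sketch. It assumes a full (not economy-size) SVD, where W and H from the messages above play the roles of U and V, and it uses the 5 x 4 case mentioned earlier as the worked example.

```latex
% Full SVD of an n x m matrix X
X = U \Sigma V^{T}, \qquad
U \in \mathbb{R}^{n \times n},\quad
\Sigma \in \mathbb{R}^{n \times m},\quad
V \in \mathbb{R}^{m \times m}

% Sigma is rectangular: singular values on the main diagonal, zeros elsewhere
\Sigma_{ij} =
\begin{cases}
\sigma_i & \text{if } i = j \le \min(n, m) \\
0 & \text{otherwise}
\end{cases}

% Example with n = 5, m = 4
\Sigma =
\begin{pmatrix}
\sigma_1 & 0 & 0 & 0 \\
0 & \sigma_2 & 0 & 0 \\
0 & 0 & \sigma_3 & 0 \\
0 & 0 & 0 & \sigma_4 \\
0 & 0 & 0 & 0
\end{pmatrix}
```

A square diagonal matrix built from the 4 singular values would be 4 x 4, which cannot be multiplied on the left by the 5 x 5 matrix U; the extra zero row is what makes the product well defined.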

20:11 < sumedhghaisas> ahh right...

20:52 sumedhghaisas has quit [Ping timeout: 264 seconds]

21:13 sumedhghaisas has joined #mlpack

21:27 jbc__ has quit [Quit: jbc__]

21:44 sumedhghaisas has quit [Ping timeout: 264 seconds]

21:49 < jenkins-mlpack> Starting build #2068 for job mlpack - svn checkin test (previous build: SUCCESS)

22:14 sumedhghaisas has joined #mlpack

22:21 < naywhayare> sumedhghaisas: I don't think the PlainSVD tests are very good; they depend on a particular random seed (10) and I'm not sure what they're checking

22:22 < naywhayare> depending on a particular random seed is a bad idea, because if the underlying implementation of arma::randu() or arma::randn() changes (which it does from time to time), the test is invalidated

22:22 < sumedhghaisas> ohh okay... so should I remove the RandomSeed??

22:23 < sumedhghaisas> it's just checking that the wrapper is functioning correctly...

22:23 < sumedhghaisas> we test both the Apply() functions and see that the returning residue is valid

22:24 < sumedhghaisas> naywhayare: jenkins should build the commit by now right...

23:19 < jenkins-mlpack> Project mlpack - svn checkin test build #2068: SUCCESS in 1 hr 29 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2068/

23:19 < jenkins-mlpack> sumedhghaisas: * added plain SVD factorization - wrapper of arma::svd for CF module

23:19 < jenkins-mlpack> Starting build #2069 for job mlpack - svn checkin test (previous build: SUCCESS)

23:53 jbc__ has joined #mlpack