naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
sumedhghaisas has quit [Ping timeout: 264 seconds]
jbc__ has joined #mlpack
jbc__ has quit [Quit: jbc__]
Anand has joined #mlpack
< Anand> Marcus : I got a similar error yesterday with scikit. I didn't get it though. I don't know why you are getting 3 plots. You should get only two (one for perceptron and one for logistic).
< Anand> Can you show me the plots?
< Anand> Marcus : Is the bootstrapping table printing correctly on your terminal? I mean does it contain two rows for perceptron?
< Anand> Marcus : Can we see what self.cur.fetchall() returns for the function GetMethodMetricResultsForLibrary(..)?
< Anand> Marcus : There is a bug in your config file. You are using scikit as the library but running mlpack/perceptron.py
< Anand> Marcus : It generates an extra plot because the final metrics dict is built in succession, one library at a time. The buildId loop in make_reports.py runs two times once for each library.
Anand has quit [Quit: Page closed]
< jenkins-mlpack> Project mlpack - nightly matrix build build #549: ABORTED in 7 hr 24 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20nightly%20matrix%20build/549/
jbc__ has joined #mlpack
jbc__ has quit [Client Quit]
jbc__ has joined #mlpack
Anand has joined #mlpack
< Anand> Marcus : The method GetMethodMetricResultForLibrary(..) is returning an empty array in some cases (something wrong with the query?), so accessing the 0th element of the array is out of bounds, hence the error.
< Anand> Now, I don't understand why and when it is returning an empty array
< Anand> Logically it should return an empty array when the corresponding metric entry is not found in the table. But I can see the entry in the database
< marcus_zoq> Anand: Yeah, it's in the database.
< Anand> Marcus : What possibly can be the bug then?
< Anand> Query?
< marcus_zoq> Anand: Maybe, I need to go through the code line by line, give me some minutes
< Anand> Ok, sure.
sumedhghaisas has joined #mlpack
< marcus_zoq> Anand: Okay, I fixed the error, but your limit parameter doesn't work, because there are two build ids (with the example config file). The first build id represents mlpack and the second build id represents scikit. But we benchmark two mlpack functions and one scikit function. So in the case of the perceptron benchmark this works, but it doesn't work for the logistic regression benchmark.
sumedhghaisas has quit [Ping timeout: 264 seconds]
< Anand> Marcus : What was the error? Can you push the code?
< Anand> I will see what can be done about the limit parameter
< marcus_zoq> Anand: I pushed my changes. We need to check if the results from the query are empty. At the beginning we gather all method ids, but we cannot assume that all libraries have results for all method ids, so we need to check if the metric results are empty for a particular method id.
< Anand> Marcus : Yes, we need to check that. I will introduce a condition on the array returned by fetchall() and see if it is empty
< marcus_zoq> Anand: I already did this :) if not metrics_string: continue
< Anand> Ok you already did that in make_reports.py!
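The empty-result guard discussed above can be sketched as follows. This is only an illustration against an in-memory SQLite table: the table layout, the column names, and the helpers `get_metric_results` and `collect_metrics` are hypothetical stand-ins, not the benchmark system's actual schema or the real code in make_reports.py.

```python
import sqlite3


def get_metric_results(cur, build_id, method_id):
    """Hypothetical query helper; the real schema and query may differ."""
    cur.execute(
        "SELECT metric FROM metrics WHERE build_id = ? AND method_id = ?",
        (build_id, method_id),
    )
    # fetchall() returns an empty list when no row matches.
    return cur.fetchall()


def collect_metrics(cur, build_ids, method_ids):
    """Collect one metric string per (build, method), skipping missing pairs."""
    collected = {}
    for build_id in build_ids:
        for method_id in method_ids:
            rows = get_metric_results(cur, build_id, method_id)
            if not rows:
                # Not every library has results for every method id,
                # so indexing rows[0] here would raise IndexError.
                continue
            collected[(build_id, method_id)] = rows[0][0]
    return collected


# Placeholder data: build 1 has two methods, build 2 has only one.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute(
    "CREATE TABLE metrics (build_id INTEGER, method_id INTEGER, metric TEXT)"
)
cur.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?)",
    [(1, 1, "placeholder-a"), (1, 2, "placeholder-b"), (2, 1, "placeholder-c")],
)
results = collect_metrics(cur, build_ids=[1, 2], method_ids=[1, 2])
```

The pair (build 2, method 2) has no row, so it is skipped instead of triggering an out-of-bounds access.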
< Anand> Yes
< Anand> So, I need to fix the limit workaround now. I have also rounded off the metrics to 5 decimal places.
< marcus_zoq> Anand: I'm not sure how to solve the limit problem.
< marcus_zoq> Anand: Great!
< Anand> Marcus : I will think through
< marcus_zoq> Anand: This could be a solution: https://gist.github.com/zoq/c18d0a695e327ae86f3f -> Now I get a LogisticRegression plot. It shows a graph for the scikit logistic regression, but there is no data for the scikit logistic regression method
< marcus_zoq> Anand: My solution isn't really good, maybe there is a better way to calculate the limit ...
< Anand> Marcus : I didn't really understand. Is it working for you?
< Anand> Marcus : It won't work as expected. Not generating the grouped plot of two libraries together
< Anand> I need to see how to do this
< marcus_zoq> Anand: The fix works, so I get two files: LogisticRegression.html and PERCEPTRON.html. The PERCEPTRON.html looks completely correct: two graphs, one for mlpack and one for scikit. The table under the graph is also correct. The LogisticRegression.html file is completely correct: there is one graph for mlpack and one graph for scikit, but there should be only one graph for mlpack. The table under the graph is correct and shows only the result for mlpack.
< marcus_zoq> Anand: *The LogisticRegression.html file isn't
Anand_ has joined #mlpack
< Anand_> Marcus : The values plotted are not correct
< Anand_> Both the plots are same (NBC.html and the other)
< Anand_> The libraries also seem to be interchanged in my case
Anand has quit [Ping timeout: 246 seconds]
< marcus_zoq> Anand_: Ah okay, I used the same results twice, do you have an idea to fix this?
< Anand_> Marcus : Not exactly. I will fix this after going through the code again. Need some time.
< Anand_> The metrics.csv has entries for only one library (two different methods). Not sure why
< Anand_> Let me see
< marcus_zoq> Anand_: Okay, sure, I'm heading home in a few minutes, so I'm not available ...
Anand_ has quit [Ping timeout: 246 seconds]
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
andrewmw94 has joined #mlpack
< jenkins-mlpack> Starting build #2067 for job mlpack - svn checkin test (previous build: SUCCESS)
sumedhghaisas has joined #mlpack
< sumedhghaisas> naywhayare: hey ryan, you free??
< naywhayare> sumedhghaisas: yeah, I am here
< naywhayare> sorry that I was mostly unavailable over the weekend
< sumedhghaisas> ohh that's okay... now I am facing the same problem that siddharth faced ....
< sumedhghaisas> the normal SVD apply will return U, E and V
< sumedhghaisas> same as quick SVD...
< sumedhghaisas> naywhayare: what was the way out??
govg has quit [Ping timeout: 255 seconds]
< naywhayare> so you have X -> U E V^T
< naywhayare> but you need X -> W H
< naywhayare> so just take W = U E and H = V^T, or take W = U and H = E V^T
udit_s has joined #mlpack
< sumedhghaisas> yes... I will just multiply U and r first entries of E
govg has joined #mlpack
< sumedhghaisas> yes... but as we want rank r...
< naywhayare> yeah
< sumedhghaisas> we have to take first r entries of E right??
< naywhayare> right
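The mapping from X -> U E V^T to a rank-r X -> W H described above can be sketched with NumPy. This is an illustration only, not the mlpack wrapper: the function name `svd_to_wh` is hypothetical, and it picks the W = U E, H = V^T option from the discussion.

```python
import numpy as np


def svd_to_wh(X, r):
    """Factorize X into W (n x r) and H (r x m) from a truncated SVD."""
    # Thin SVD: U is n x k, s holds k singular values, Vt is k x m,
    # where k = min(n, m). Singular values come back in descending order.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep the first r singular triplets; scaling the columns of U by the
    # singular values gives W = U_r E_r without forming a diagonal matrix.
    W = U[:, :r] * s[:r]
    H = Vt[:r, :]
    return W, H


# Build a 6 x 5 matrix of rank at most 3, so a rank-3 factorization
# reproduces it up to floating-point error.
rng = np.random.default_rng()
X = rng.random((6, 3)) @ rng.random((3, 5))
W, H = svd_to_wh(X, 3)
```

Taking W = U and H = E V^T instead would only move the scaling to the rows of `Vt[:r, :]`; the product W H is the same either way.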
< sumedhghaisas> okay... my problem was how to call that function...
< sumedhghaisas> you said something about SFINAE that time...
< naywhayare> to call which function?
< sumedhghaisas> factorizer.Apply()
< sumedhghaisas> cause it now takes more arguments...
< naywhayare> well, you're making a wrapper class around arma::svd()
< naywhayare> you don't need factorizer.Apply() to take more arguments (where factorizer is your wrapper class)
< sumedhghaisas> but what about QuickSVD??
< naywhayare> you just need to make that Apply() function map from U E V^T to W and H
< sumedhghaisas> can I add overload of Apply() in QuickSVD??
< naywhayare> hm... that's a compelling point. Siddharth could write a wrapper class for QuicSVD
< naywhayare> or add another Apply() function to QuicSVD
< sumedhghaisas> yes... I would prefer Apply() function...
< naywhayare> or we could use metaprogramming to detect whether the Apply() function takes two or three arguments
< naywhayare> I think adding Apply() to QuicSVD is the right solution, where that overload of Apply() will factorize into a W and H matrix
< sumedhghaisas> I haven't seen siddharth around... I wanted to talk to him about that...
< sumedhghaisas> yes I agree...
< naywhayare> he said he is having networking troubles because he doesn't have permanent network access
< naywhayare> you could email him at siddharth.950@gmail.com
< sumedhghaisas> okay... I will do that...
< sumedhghaisas> and another issue...
< sumedhghaisas> now as I am refactoring CF module... I remember there was an issue about where to factorize the matrix...
< sumedhghaisas> in constructor or in getReccomendations...
< sumedhghaisas> wait let me find a ticket...
< naywhayare> (I'm heating up my lunch... back in a minute)
< sumedhghaisas> okay..
< sumedhghaisas> #351
govg has quit [Ping timeout: 256 seconds]
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
< naywhayare> sumedhghaisas: sorry that took so long
< naywhayare> I don't think siddharth ever responded to #351, but I think it would be a good idea to move the factorization to the constructor
< sumedhghaisas> yes I agree... but the only problem is... what about the rank??
< naywhayare> the rank is a parameter of the factorization, so it would have to become a parameter for the constructor
< naywhayare> (the constructor of CF, that is)
< sumedhghaisas> okay then... and I just saw #160...
< sumedhghaisas> that means there is no SVD for sparse matrix??
< naywhayare> that's correct
< naywhayare> factorizations for sparse matrices are hard, and there aren't standard packages for any of them
< sumedhghaisas> ohh then the wrapper will create a problem when supplied with sp_mat
< naywhayare> that's fine; you can force the wrapper to take arma::mat, not some template argument
< naywhayare> since it'll only work for dense matrices
< sumedhghaisas> then it will be a very good idea to add it to MLPACK... as no other implementation exists...
< naywhayare> and then you can note in the documentation that it won't work for arma::sp_mat
< naywhayare> well, the problem is that sparse svd is a difficult problem
< sumedhghaisas> should I use SFINAE to force it?? or just remove template from the wrapper??
< naywhayare> just remove the template from the wrapper
< naywhayare> no need for templates when it only works for one type
< sumedhghaisas> okay...
< sumedhghaisas> can we implement this in C++??
< naywhayare> that's just a wrapper around SVDLIBC, which I didn't know about
< naywhayare> either way, wrapping that for armadillo will take a significant amount of effort, and I don't think we should worry about it for now
< naywhayare> especially because I can't even get the SVDLIBC page to load...
< sumedhghaisas> hehe...
< sumedhghaisas> same here...
< sumedhghaisas> it's a C library... it wouldn't be difficult to create a wrapper...
< naywhayare> yeah, but it's the other intricacies that are the hard part
< naywhayare> the wrapper is easy, but we also need to make sure that armadillo can detect the library and that it links properly against it
< naywhayare> and depending on how they wrote their C code, it may be difficult to make it work with eT = float, double, std::complex<float>, std::complex<double>
< naywhayare> then writing the tests takes forever...
< sumedhghaisas> yeah... that's true...
< sumedhghaisas> maybe after this placement month I will try it... but the ultimate problem, will conrad agree to that??
< naywhayare> haha, that is always the hard part
< naywhayare> I can't find svdlibc in the debian repos
< naywhayare> so it may also involve pushing svdlibc to the repositories of various distros
< sumedhghaisas> that reminds me... what happened to iterator code??
< sumedhghaisas> did you send it to conrad??
< sumedhghaisas> and is MLPACK on any distros??
< jenkins-mlpack> Project mlpack - svn checkin test build #2067: SUCCESS in 1 hr 29 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2067/
< jenkins-mlpack> Ryan Curtin: Minor code cleanups.
< naywhayare> ack, I forgot about the iterator code. I will do that after my meeting in an hour
< naywhayare> mlpack has been in fedora for a few years, and recently got into the debian repos (so it's in Ubuntu, Mint, etc.)
< sumedhghaisas> ohh cool...
< naywhayare> I think someone even packaged it for FreeBSD and Gentoo
< naywhayare> also Arch Linux and maybe homebrew (not sure on that one)
< sumedhghaisas> I also have a doubt in my next implementation... so are you free right now or after the meeting??
< naywhayare> I'm here for an hour... go ahead
< sumedhghaisas> okay... can you refer to paper... Collaborative Filtering for Implicit Feedback Datasets...
< sumedhghaisas> on page 4...
< naywhayare> okay, I am looking at it
< sumedhghaisas> the paragraph starting with... "A computational bottleneck here is computing...
< sumedhghaisas> I didn't understand how their method of computation is faster..
govg has quit [Quit: leaving]
< naywhayare> okay, I see
< naywhayare> so, Y^T Y can be precomputed and stored
< sumedhghaisas> but how exactly does it help??
< naywhayare> well, the trick is in observing that C^u - I is very sparse
< naywhayare> so then Y^T (C^u - I) Y is a fast calculation, way faster than O(f^2 n) because (C^u - I) is sparse
< naywhayare> in an optimizer, you'll calculate the value of x_u many times (that's equation (4))
< naywhayare> so if you could, say, compute and store Y^T Y at the beginning of the optimization, then you only need to calculate Y^T (C^u - I) Y each time you need the objective function
< naywhayare> which can be done much more quickly than calculating Y^T C^u Y
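The precomputation trick above can be sketched in NumPy. The sketch assumes, as the discussion implies, that C^u is a diagonal confidence matrix whose entries equal 1 for items the user has not interacted with, so that C^u - I is nonzero only on a few diagonal entries. The sizes, the confidence values, and the function names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng()
n_items, f = 200, 8
Y = rng.standard_normal((n_items, f))  # item-factor matrix, n x f

# Computed once, before the optimization loop: costs O(f^2 n).
YtY = Y.T @ Y

# Diagonal of C^u for one user: 1 everywhere except a few observed items.
c_u = np.ones(n_items)
observed = rng.choice(n_items, size=5, replace=False)
c_u[observed] += 40.0 * rng.random(5)


def gram_dense(Y, c_u):
    """Y^T C^u Y computed directly over all n items."""
    return Y.T @ (c_u[:, None] * Y)


def gram_sparse(Y, YtY, c_u):
    """Y^T Y + Y^T (C^u - I) Y, touching only rows where C^u - I is nonzero."""
    nz = np.flatnonzero(c_u - 1.0)
    Y_nz = Y[nz]
    return YtY + Y_nz.T @ ((c_u[nz] - 1.0)[:, None] * Y_nz)
```

With the stored `YtY`, the per-user cost depends on the number of observed items rather than on n.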
< sumedhghaisas> but does our matrix multiplication support this faster sparse multiplication??
< sumedhghaisas> I mean armadillo ...
< sumedhghaisas> does armadillo support faster sparse multiplication??
< naywhayare> yes, it should be faster
< sumedhghaisas> for that 'Cu' should be sparse right??
< sumedhghaisas> I mean sp_mat...
< naywhayare> yeah
< sumedhghaisas> Cu will be equal to arma::transpose(V.col(i)) * eye<sp_mat>(n, n)
< sumedhghaisas> humm... okay got it...
< sumedhghaisas> thanks...
< naywhayare> okay, good. let me know if you have any further issues
< sumedhghaisas> yes sure... :)
udit_s has quit [Quit: Leaving]
govg has joined #mlpack
govg has quit []
Anand has joined #mlpack
Anand has quit [Ping timeout: 246 seconds]
< sumedhghaisas> naywhayare: where should I put the wrapper of svd??
< sumedhghaisas> I mean the file...
< naywhayare> sumedhghaisas: why not in methods/cf/?
< sumedhghaisas> naywhayare: create folder svd??
< sumedhghaisas> maybe then we can shift QUICK-SVD inside it...
< naywhayare> let's wait to figure out what to do with all the factorizers until they're done
< naywhayare> I'll try and think of some ideas
< sumedhghaisas> okay... cool... so for right now should I create a folder and add PlainSVD??
< naywhayare> no, just add it to methods/cf/ directly for now
< sumedhghaisas> okay
Anand has joined #mlpack
< Anand> Marcus : Fixed the bug!
< marcus_zoq> Anand: Great!
< Anand> Run and check once.
< marcus_zoq> Anand: Looks good, you can delete the debug message :)
< Anand> Oh yeah! Forgot that. Will remove it
< marcus_zoq> Anand: So the next thing is to integrate the bar chart?
< Anand> The bar chart is already done, right?
< marcus_zoq> Anand: I've seen the template, but I thought you need to integrate it?
< Anand> Integrate with what?
< marcus_zoq> Anand: Ah, sorry for the confusion, I meant the pie chart
< Anand> Marcus : Yes, I wanted to add the pie chart but I am not yet sure how.
< Anand> The proposal mentions it differently, but I don't see a need to represent the true/false positives/negatives using pie charts
< marcus_zoq> Anand: Yeah, I think you are right
< Anand> Marcus : Do you have any suggestions regarding more representations? I think we already did the most useful one
< marcus_zoq> Anand: yeah I think so, I'm a little bit concerned about the representation if we compare a lot more libraries than just two
< Anand> Marcus : You can add any number of libraries in the bar chart that we did
< Anand> Just add more to the config file
< Anand> That is why we call it the grouped bar chart
< marcus_zoq> Anand: Okay, I've tested the code with more libraries and you are right, it looks good, except that the table under the graph isn't correct. But I think we should move the legend to another position: https://urgs.org/graphs.png
< Anand> Oh! I don't understand why that would happen to the table! And yes, I will move the legend
< Anand> Marcus : Also check the values in the .csv file generated. All the libraries are performing really closely it seems!
< marcus_zoq> Anand: I used the same values for all libraries :)
< Anand> Oh! What happened to the table there?
< marcus_zoq> Anand: Good question
< Anand> Marcus : Can you see the dictionary? See if it is correct
< marcus_zoq> Anand: The dict from the make reports?
< Anand> Yes
< Anand> Marcus : We use that dict to build that HTML table
< Anand> If the dict is correct, the table should also be correct
< marcus_zoq> Anand: I think the dict looks good: https://gist.github.com/zoq/7353d027bb6bbff91c3a
< Anand> Marcus : Ok. Can you just try with 3 libraries once?
< Anand> The code still looks good. I don't see any bug there
< marcus_zoq> Anand: Okay I think I messed up, I deleted everything and now everything looks good
< Anand> Marcus : Ok. So the table is printing correctly? Can you show me?
< Anand> Marcus : Great! Without the graphs though! :P
< marcus_zoq> Anand: oh, wait
< marcus_zoq> Anand: Okay, now it should work
< Anand> Marcus : Nice!
< naywhayare> marcus_zoq: Anand: that looks really nice
< naywhayare> do you think it would be better to group by the metric instead of by the library?
< Anand> Ryan : That is a matter of how you like to make sense out of the visualizations. We can do both. And yes it is a nice idea to group by metrics. I will do that too! :)
< naywhayare> ah, good idea
< Anand> Ryan : Yes actually it is good to do both. Will be more insightful, I guess.
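Grouping by metric instead of by library amounts to pivoting the nested results. A minimal sketch, assuming the report data is a dict of the form library -> metric -> value; that layout, the function name `regroup_by_metric`, and the numbers are placeholders, not the actual dict built in make_reports.py.

```python
def regroup_by_metric(by_library):
    """Pivot {library: {metric: value}} into {metric: {library: value}}."""
    by_metric = {}
    for library, metrics in by_library.items():
        for metric, value in metrics.items():
            # Create the per-metric group on first use, then add this library.
            by_metric.setdefault(metric, {})[library] = value
    return by_metric


# Placeholder values for illustration only; not real benchmark results.
by_library = {
    "mlpack": {"Accuracy": 0.91, "Precision": 0.88},
    "scikit": {"Accuracy": 0.90},
}
by_metric = regroup_by_metric(by_library)
```

A library that lacks a metric simply does not appear in that metric's group, so both groupings can be drawn from the same collected results.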
< sumedhghaisas> naywhayare: is there any way to simulate a move constructor in armadillo?
< sumedhghaisas> naywhayare: ahh got it... swap... nevermind
< sumedhghaisas> naywhayare: what to do when a matrix with dimension n * m is provided by the user when n != m? I think zeros have to be appended, right?
< sumedhghaisas> to W, sigma and H
Anand has quit [Quit: Page closed]
< naywhayare> for CF?
< naywhayare> if n != m then the number of users is not equal to the number of items
< naywhayare> I don't see why that's an issue though; SVD will work with non-square matrices
< naywhayare> and so will all of the other factorization techniques
< sumedhghaisas> if n != m and we take SVD ...
< sumedhghaisas> W * diagmat(sigma) * H is invalid...
< sumedhghaisas> dimensions do not match...
< sumedhghaisas> like if we take SVD of 5 * 4 matrix
< sumedhghaisas> sorry W * diagmat(sigma) * trans(H)
< naywhayare> if n != m then sigma is a rectangular diagonal matrix, not a square matrix like I think diagmat() will produce
< naywhayare> so you'd need to make an n x m matrix, then set its diagonal (of length min(n, m)) to sigma
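The rectangular sigma described above can be illustrated with NumPy rather than Armadillo, using the 5 x 4 case from the discussion. The variable names are chosen for this sketch only.

```python
import numpy as np

rng = np.random.default_rng()
X = rng.standard_normal((5, 4))  # n = 5, m = 4, so n != m

# Full SVD: U is 5 x 5, s has min(n, m) = 4 entries, Vt is 4 x 4.
U, s, Vt = np.linalg.svd(X, full_matrices=True)

# A square diagonal matrix built from s would be 4 x 4 and could not be
# multiplied by the 5 x 5 U. Build an n x m matrix and fill its main
# diagonal (of length min(n, m)) with the singular values instead.
k = min(X.shape)
Sigma = np.zeros(X.shape)
Sigma[np.arange(k), np.arange(k)] = s

X_rebuilt = U @ Sigma @ Vt
```

No zero padding of the input is needed; only sigma has to be embedded in an n x m matrix. With a thin SVD, U would be 5 x 4 and a square 4 x 4 sigma would fit instead.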
< sumedhghaisas> ahh right...
sumedhghaisas has quit [Ping timeout: 264 seconds]
sumedhghaisas has joined #mlpack
jbc__ has quit [Quit: jbc__]
sumedhghaisas has quit [Ping timeout: 264 seconds]
< jenkins-mlpack> Starting build #2068 for job mlpack - svn checkin test (previous build: SUCCESS)
sumedhghaisas has joined #mlpack
< naywhayare> sumedhghaisas: I don't think the PlainSVD tests are very good; they depend on a particular random seed (10) and I'm not sure what they're checking
< naywhayare> depending on a particular random seed is a bad idea, because if the underlying implementation of arma::randu() or arma::randn() changes (which it does from time to time), the test is invalidated
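One way to avoid depending on a seed is to assert a property that holds for any random input, such as a small reconstruction residue. The sketch below uses NumPy; `plain_svd_apply` is a hypothetical stand-in for the wrapper's Apply(), not the actual PlainSVD code or its test.

```python
import numpy as np


def plain_svd_apply(X, r):
    """Hypothetical stand-in for the wrapper: returns W (n x r), H (r x m)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :r] * s[:r], Vt[:r, :]


def relative_residue(X, W, H):
    """Frobenius norm of X - W H, relative to the norm of X."""
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)


def test_full_rank_factorization_reconstructs_input():
    # Deliberately unseeded: the assertion must hold for any draw, so a
    # change in the random number generator cannot invalidate the test.
    rng = np.random.default_rng()
    X = rng.standard_normal((20, 10))
    W, H = plain_svd_apply(X, 10)
    assert relative_residue(X, W, H) < 1e-10


test_full_rank_factorization_reconstructs_input()
```

The tolerance is generous compared with double-precision rounding error, so the check fails only if the factorization itself is wrong.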
< sumedhghaisas> ohh okay... so should I remove the RandomSeed??
< sumedhghaisas> it's just checking that the wrapper is functioning correctly...
< sumedhghaisas> we test both the Apply() functions and see that the returning residue is valid
< sumedhghaisas> naywhayare: jenkins should build the commit by now right...
< jenkins-mlpack> Project mlpack - svn checkin test build #2068: SUCCESS in 1 hr 29 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2068/
< jenkins-mlpack> sumedhghaisas: * added plain SVD factorization - wrapper of arma::svd for CF module
< jenkins-mlpack> Starting build #2069 for job mlpack - svn checkin test (previous build: SUCCESS)
jbc__ has joined #mlpack