naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
sumedhghaisas has quit [Ping timeout: 264 seconds]
jbc__ has joined #mlpack
jbc__ has quit [Quit: jbc__]
Anand has joined #mlpack
< Anand>
Marcus : I got a similar error yesterday with scikit. I didn't get it though. I don't know why you are getting 3 plots. You should get only two (one for perceptron and one for logistic).
< Anand>
Can you show me the plots?
< Anand>
Marcus : Is the bootstrapping table printing correctly on your terminal? I mean does it contain two rows for perceptron?
< Anand>
Marcus : Can we see what self.cur.fetchall() returns for the function GetMethodMetricResultsForLibrary(..)?
< Anand>
Marcus : There is a bug in your config file. You are using scikit as the library but running mlpack/perceptron.py
< Anand>
Marcus : It generates an extra plot because the final metrics dict is built in succession, one library at a time. The buildId loop in make_reports.py runs twice, once for each library.
< Anand>
Marcus : The method GetMethodMetricResultForLibrary(..) is returning an empty array in some cases (something wrong with the query?), so accessing the 0th element of the array goes out of bounds, which causes the error.
< Anand>
Now, I don't understand why and when it returns an empty array.
< Anand>
Logically, it should return an empty array when the corresponding metric entry is not found in the table. But I can see the entry in the database.
< marcus_zoq>
Anand: Yeah, it's in the database.
< Anand>
Marcus : What could the bug be, then?
< Anand>
Query?
< marcus_zoq>
Anand: Maybe I need to go through the code line by line; give me a few minutes.
< Anand>
Ok, sure.
sumedhghaisas has joined #mlpack
< marcus_zoq>
Anand: Okay, I fixed the error, but your limit parameter doesn't work, because there are two build IDs (with the example config file). The first build ID represents mlpack and the second build ID represents scikit. But we benchmark two mlpack functions and one scikit function. So this works for the perceptron benchmark, but not for the logistic regression benchmark.
sumedhghaisas has quit [Ping timeout: 264 seconds]
< Anand>
Marcus : What was the error? Can you push the code?
< Anand>
I will see what can be done about the limit parameter
< marcus_zoq>
Anand: I pushed my changes. We need to check whether the query results are empty. At the beginning we gather all method IDs, but we cannot assume that every library has results for every method ID, so we need to check whether the metric results are empty for a particular method ID.
< Anand>
Marcus : Yes, we need to check that. I will introduce a condition on the array returned by fetchall() and see if it is empty
< marcus_zoq>
Anand: I already did this :) if not metrics_string: continue
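A minimal sketch of that guard, assuming a sqlite3-style cursor like `self.cur`. The table name `metrics` and the columns `library_id`, `method_id`, and `metric` are hypothetical stand-ins, not the benchmark database's real schema. It shows why the check matters: `fetchall()` returns an empty list when a library has no results for a method, and indexing that list with `[0]` would raise `IndexError`.

```python
import sqlite3


def collect_metrics(cursor, library_id, method_ids):
    """Gather one metric string per method for a single library.

    Hypothetical schema: metrics(library_id, method_id, metric).
    Methods with no stored results for this library are skipped.
    """
    results = {}
    for method_id in method_ids:
        cursor.execute(
            "SELECT metric FROM metrics WHERE library_id = ? AND method_id = ?",
            (library_id, method_id),
        )
        rows = cursor.fetchall()
        if not rows:
            # Empty result: this library was never benchmarked with this
            # method, so rows[0] would be out of bounds.
            continue
        results[method_id] = rows[0][0]
    return results
```

Gathering all method IDs up front and skipping the empty ones per library keeps the loop simple, at the cost of one query per (library, method) pair.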
< Anand>
Ok you already did that in make_reports.py!
< Anand>
Yes
< Anand>
So, I need to fix the limit workaround now. I have also rounded the metrics to 5 decimal places.
< marcus_zoq>
Anand: I'm not sure how to solve the limit problem.
< marcus_zoq>
Anand: Great!
< Anand>
Marcus : I will think through
< marcus_zoq>
Anand: This could be a solution: https://gist.github.com/zoq/c18d0a695e327ae86f3f -> Now I get the LogisticRegression plot; it shows a graph for the scikit logistic regression, but there is no data for the scikit logistic regression method.
< marcus_zoq>
Anand: My solution isn't really good, maybe there is a better way to calculate the limit ...
< Anand>
Marcus : I didn't really understand. Is it working for you?
< Anand>
Marcus : It won't work as expected. It isn't generating the grouped plot of the two libraries together.
< Anand>
I need to see how to do this
< marcus_zoq>
Anand: The fix works, so I get two files, LogisticRegression.html and PERCEPTRON.html. PERCEPTRON.html looks completely correct: two graphs, one for mlpack and one for scikit, and the table under the graph is also correct. LogisticRegression.html is almost correct: there is one graph for mlpack and one graph for scikit, but there should be only one graph, for mlpack. The table under the graph is correct and shows only the result for mlpack.
< naywhayare>
that's just a wrapper around SVDLIBC, which I didn't know about
< naywhayare>
either way, wrapping that for armadillo will take a significant amount of effort, and I don't think we should worry about it for now
< naywhayare>
especially because I can't even get the SVDLIBC page to load...
< sumedhghaisas>
hehe...
< sumedhghaisas>
same here...
< sumedhghaisas>
it's a C library... it wouldn't be difficult to create a wrapper...
< naywhayare>
yeah, but it's the other intricacies that are the hard part
< naywhayare>
the wrapper is easy, but we also need to make sure that armadillo can detect the library and that it links properly against it
< naywhayare>
and depending on how they wrote their C code, it may be difficult to make it work with eT = float, double, std::complex<float>, std::complex<double>
< naywhayare>
then writing the tests takes forever...
< sumedhghaisas>
yeah... that's true...
< sumedhghaisas>
maybe after this placement month I will try it... but the ultimate problem: will Conrad agree to that??
< naywhayare>
haha, that is always the hard part
< naywhayare>
I can't find svdlibc in the debian repos
< naywhayare>
so it may also involve pushing svdlibc to the repositories of various distros
< sumedhghaisas>
that reminds me... what happened to the iterator code??
< jenkins-mlpack>
Ryan Curtin: Minor code cleanups.
< naywhayare>
ack, I forgot about the iterator code. I will do that after my meeting in an hour
< naywhayare>
mlpack has been in fedora for a few years, and recently got into the debian repos (so it's in Ubuntu, Mint, etc.)
< sumedhghaisas>
ohh cool...
< naywhayare>
I think someone even packaged it for FreeBSD and Gentoo
< naywhayare>
also Arch Linux and maybe homebrew (not sure on that one)
< sumedhghaisas>
I also have a question about my next implementation... are you free right now, or after the meeting??
< naywhayare>
I'm here for an hour... go ahead
< sumedhghaisas>
okay... can you refer to the paper... Collaborative Filtering for Implicit Feedback Datasets...
< sumedhghaisas>
on page 4...
< naywhayare>
okay, I am looking at it
< sumedhghaisas>
the paragraph starting with... "A computational bottleneck here is computing...
< sumedhghaisas>
I didn't understand how their method of computation is faster..
govg has quit [Quit: leaving]
< naywhayare>
okay, I see
< naywhayare>
so, Y^T Y can be precomputed and stored
< sumedhghaisas>
but how exactly does it help??
< naywhayare>
well, the trick is in observing that C^u - I is very sparse
< naywhayare>
so then Y^T (C^u - I) Y is a fast calculation, way faster than O(f^2 n) because (C^u - I) is sparse
< naywhayare>
in an optimizer, you'll calculate the value of x_u many times (that's equation (4))
< naywhayare>
so if you could, say, compute and store Y^T Y at the beginning of the optimization, then you only need to calculate Y^T (C^u - I) Y each time you need the objective function
< naywhayare>
which can be done much more quickly than calculating Y^T C^u Y
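A minimal pure-Python sketch of the precomputation trick, assuming Y is stored as a list of n item rows of length f and C^u is the diagonal matrix of user u's confidences c_ui (the paper uses c_ui = 1 + alpha * r_ui, so c_ui - 1 is nonzero only for the n_u items the user interacted with). The function names are hypothetical illustrations, not mlpack or Armadillo API. The identity used is Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y: the direct product costs O(f^2 n) per user, while the cached version costs O(f^2 n_u) per user after a one-time O(f^2 n) computation of Y^T Y.

```python
def gram(Y):
    """Y^T Y for Y given as a list of n rows of length f (computed once)."""
    f = len(Y[0])
    G = [[0.0] * f for _ in range(f)]
    for y in Y:
        for a in range(f):
            for b in range(f):
                G[a][b] += y[a] * y[b]
    return G


def weighted_gram_dense(Y, c_u):
    """Y^T C^u Y computed directly: visits all n items, O(f^2 n)."""
    f = len(Y[0])
    G = [[0.0] * f for _ in range(f)]
    for y, c in zip(Y, c_u):
        for a in range(f):
            for b in range(f):
                G[a][b] += c * y[a] * y[b]
    return G


def weighted_gram_sparse(YtY, Y, c_minus_one):
    """Y^T Y + Y^T (C^u - I) Y, visiting only the nonzero diagonal entries.

    c_minus_one maps item index i -> c_ui - 1 for the n_u items where it
    is nonzero, so the work is O(f^2 n_u) on top of the cached Y^T Y.
    """
    G = [row[:] for row in YtY]  # copy so the cached Y^T Y is not modified
    f = len(G)
    for i, w in c_minus_one.items():
        y = Y[i]
        for a in range(f):
            for b in range(f):
                G[a][b] += w * y[a] * y[b]
    return G
```

In an ALS sweep, `gram(Y)` would be called once per pass over the users, and only the sparse correction changes from user to user.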
< sumedhghaisas>
but does our matrix multiplication support this faster sparse multiplication??
< sumedhghaisas>
I mean armadillo ...
< sumedhghaisas>
does armadillo support faster sparse multiplication??
< naywhayare>
yes, it should be faster
< sumedhghaisas>
for that 'Cu' should be sparse right??
< sumedhghaisas>
I mean sp_mat...
< naywhayare>
yeah
< sumedhghaisas>
Cu will be equal to arma::transpose(V.col(i)) * eye<sp_mat>(n, n)
< sumedhghaisas>
humm... okay got it...
< sumedhghaisas>
thanks...
< naywhayare>
okay, good. let me know if you have any further issues
< sumedhghaisas>
yes sure... :)
udit_s has quit [Quit: Leaving]
govg has joined #mlpack
govg has quit []
Anand has joined #mlpack
Anand has quit [Ping timeout: 246 seconds]
< sumedhghaisas>
naywhayare: where should I put the SVD wrapper??
< sumedhghaisas>
I mean the file...
< naywhayare>
sumedhghaisas: why not in methods/cf/?
< sumedhghaisas>
naywhayare: create folder svd??
< sumedhghaisas>
maybe then we can shift QUICK-SVD inside it...
< naywhayare>
let's wait to figure out what to do with all the factorizers until they're done
< naywhayare>
I'll try and think of some ideas
< sumedhghaisas>
okay... cool... so for right now should I create a folder and add PlainSVD??
< naywhayare>
no, just add it to methods/cf/ directly for now
< sumedhghaisas>
okay
Anand has joined #mlpack
< Anand>
Marcus : Fixed the bug!
< marcus_zoq>
Anand: Great!
< Anand>
Run and check once.
< marcus_zoq>
Anand: Looks good, you can delete the debug message :)
< Anand>
Oh yeah! Forgot that. Will remove it
< marcus_zoq>
Anand: So the next thing is to integrate the bar chart?
< Anand>
The bar chart is already done, right?
< marcus_zoq>
Anand: I've seen the template, but I thought you needed to integrate it?
< Anand>
Integrate with what?
< marcus_zoq>
Anand: Ah, sorry for the confusion, I meant the pie chart
< Anand>
Marcus : Yes, I wanted to add the pie chart but I am not yet sure how.
< Anand>
The proposal describes it differently, but I don't see a need to represent the true/false positives/negatives using pie charts
< marcus_zoq>
Anand: Yeah, I think you are right
< Anand>
Marcus : Do you have any suggestions regarding more representations? I think we already did the most useful one
< marcus_zoq>
Anand: yeah, I think so. I'm a little bit concerned about the representation if we compare many more libraries than just two
< Anand>
Marcus : You can add any number of libraries in the bar chart that we did
< Anand>
Just add more to the config file
< Anand>
That is why we call it the grouped bar chart
< marcus_zoq>
Anand: Okay, I've tested the code with more libraries and you are right, it looks good, except that the table under the graph isn't correct. But I think we should move the legend to another position: https://urgs.org/graphs.png
< Anand>
Oh! I don't understand why that would happen to the table! And yes, I will move the legend
< Anand>
Marcus : Also check the values in the .csv file generated. All the libraries are performing really closely it seems!
< marcus_zoq>
Anand: I used the same values for all libraries :)
< Anand>
Oh! What happened to the table there?
< marcus_zoq>
Anand: Good question
< Anand>
Marcus : Can you see the dictionary? See if it is correct
< marcus_zoq>
Anand: The dict from make_reports.py?
< Anand>
Yes
< Anand>
Marcus : We use that dict to build that HTML table
< Anand>
If the dict is correct, the table should also be correct
< Anand>
Marcus : Great! Without the graphs though! :P
< marcus_zoq>
Anand: oh, wait
< marcus_zoq>
Anand: Okay, now it should work
< Anand>
Marcus : Nice!
< naywhayare>
marcus_zoq: Anand: that looks really nice
< naywhayare>
do you think it would be better to group by the metric instead of by the library?
< Anand>
Ryan : That is a matter of how you like to make sense out of the visualizations. We can do both. And yes it is a nice idea to group by metrics. I will do that too! :)
< naywhayare>
ah, good idea
< Anand>
Ryan : Yes actually it is good to do both. Will be more insightful, I guess.
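Grouping by metric instead of by library amounts to swapping the two levels of the nested results dict before plotting. A minimal sketch, assuming the report dict is shaped like {library: {metric: value}}; that layout and the function name are hypothetical, not the actual structure built in make_reports.py.

```python
def group_by_metric(by_library):
    """Turn {library: {metric: value}} into {metric: {library: value}}.

    A metric missing for some library simply has no entry for that
    library in the output, so libraries without results are not padded.
    """
    by_metric = {}
    for library, metrics in by_library.items():
        for metric, value in metrics.items():
            by_metric.setdefault(metric, {})[library] = value
    return by_metric
```

Because both views come from the same dict, offering both groupings should not require a second database query.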
< sumedhghaisas>
naywhayare: is there any way to simulate a move constructor in armadillo?
< sumedhghaisas>
naywhayare: what should we do when the user provides an n * m matrix with n != m? I think zeros have to be appended, right?
< sumedhghaisas>
to W, sigma and H
Anand has quit [Quit: Page closed]
< naywhayare>
for CF?
< naywhayare>
if n != m then the number of users is not equal to the number of items
< naywhayare>
I don't see why that's an issue though; SVD will work with non-square matrices
< naywhayare>
and so will all of the other factorization techniques
< sumedhghaisas>
if n != m and we take SVD ...
< sumedhghaisas>
W * diagmat(sigma) * H is invalid...
< sumedhghaisas>
the dimensions do not match...
< sumedhghaisas>
like if we take SVD of 5 * 4 matrix
< sumedhghaisas>
sorry W * diagmat(sigma) * trans(H)
< naywhayare>
if n != m then sigma is a rectangular diagonal matrix, not a square matrix like I think diagmat() will produce
< naywhayare>
so you'd need to make an n x m matrix, then set its diagonal (of length min(n, m)) to sigma
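A minimal pure-Python sketch of that reconstruction, assuming a full SVD of an n x m matrix: W is n x n, H is m x m, and sigma holds the min(n, m) singular values, matching the shapes that Armadillo's `svd(U, s, V, X)` returns. The helper names are hypothetical. As an alternative, Armadillo's `svd_econ()` returns thin factors with min(n, m) columns, for which a square `diagmat(s)` already fits.

```python
def rectangular_diag(sigma, n, m):
    """n x m matrix with sigma on its main diagonal (length min(n, m))."""
    if len(sigma) != min(n, m):
        raise ValueError("expected min(n, m) singular values")
    S = [[0.0] * m for _ in range(n)]
    for k, s in enumerate(sigma):
        S[k][k] = s
    return S


def matmul(A, B):
    """Triple-loop product of matrices stored as lists of rows."""
    inner = len(B)
    if any(len(row) != inner for row in A):
        raise ValueError("inner dimensions do not match")
    cols = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def reconstruct(W, sigma, H):
    """W (n x n) * Sigma (n x m) * H^T (m x m) for an n x m input."""
    n, m = len(W), len(H)
    return matmul(matmul(W, rectangular_diag(sigma, n, m)), transpose(H))
```

For example, with n = 3 and m = 2, `reconstruct` with the 3 x 3 identity as W, sigma = [3, 2], and the 2 x 2 swap matrix as H gives [[0, 3], [2, 0], [0, 0]]: the padding row of zeros is exactly what a square `diagmat(sigma)` would be missing.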
< sumedhghaisas>
ahh right...
sumedhghaisas has quit [Ping timeout: 264 seconds]
sumedhghaisas has joined #mlpack
jbc__ has quit [Quit: jbc__]
sumedhghaisas has quit [Ping timeout: 264 seconds]
< jenkins-mlpack>
Starting build #2068 for job mlpack - svn checkin test (previous build: SUCCESS)
sumedhghaisas has joined #mlpack
< naywhayare>
sumedhghaisas: I don't think the PlainSVD tests are very good; they depend on a particular random seed (10) and I'm not sure what they're checking
< naywhayare>
depending on a particular random seed is a bad idea, because if the underlying implementation of arma::randu() or arma::randn() changes (which it does from time to time), the test is invalidated
< sumedhghaisas>
ohh okay... so should I remove the RandomSeed??
< sumedhghaisas>
it's just checking that the wrapper is functioning correctly...
< sumedhghaisas>
we test both the Apply() functions and check that the returned residue is valid
< sumedhghaisas>
naywhayare: jenkins should build the commit by now right...