naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
< jenkins-mlpack>
* Marcus Edel: Improved nystroem localisation ('oe')
< jenkins-mlpack>
* Marcus Edel: Integrate nystroem method into the kernel_pca_main.cpp file.
< jenkins-mlpack>
* siddharth.950: Adding tests for Reg SVD.
andrewmw94 has joined #mlpack
< andrewmw94>
naywhayare: As regards the SplitType tests, there's really no reason why Child(0) has to be Child(0) rather than Child(1). I can't think of a way to get around this problem while specifying exactly how the bounds and tree should be laid out. Any suggestions?
< naywhayare>
so what you are saying is that you know how the bounds should be, but you can't have any guarantee on the ordering of them?
< andrewmw94>
yeah
< naywhayare>
depending on how much you know about the bounds, you could do some matching
< naywhayare>
if you can say "one bound should be the hyperrectangle [(-5, 5) (-3, 3)]" then you can search through all the children for that bound
< naywhayare>
then for the next bound, you search through all the children except the one you already found the bound in, and so forth
< andrewmw94>
yeah, that would work.
< naywhayare>
I know I've done this somewhere in the tests
< naywhayare>
I'm trying to find an example
< naywhayare>
but there are apparently 16,606 lines of tests, so... things are a bit hard to find sometimes...
< naywhayare>
ok, here's kind of an example, in gmm_test.cpp, test name "GMMTrainEMMultipleGaussians"
< andrewmw94>
thanks
< naywhayare>
so we train a gaussian mixture model (which is just a bunch of gaussians with different weights, means, and covariances)
< naywhayare>
and we want to compare to our synthetic model, but we have no guarantee things are in the same order
< naywhayare>
so on line 301/302, we just sort by weights and then compare that way
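
The matching idea described above (find each expected bound among the children that have not been claimed yet) can be sketched as follows. This is a minimal illustration, not code from the mlpack test suite: `Bound` and `MatchesInAnyOrder` are hypothetical names, a bound is modeled as a plain list of (lo, hi) intervals, and exact comparison is used where a real test would likely compare with a tolerance.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a bound: one (lo, hi) interval per dimension.
using Bound = std::vector<std::pair<double, double>>;

// Returns true if every expected bound matches a distinct child bound,
// regardless of the order in which the children are stored.
bool MatchesInAnyOrder(const std::vector<Bound>& expected,
                       const std::vector<Bound>& children)
{
  if (expected.size() != children.size())
    return false;

  // used[i] is true once child i has been matched to an expected bound.
  std::vector<bool> used(children.size(), false);
  for (const Bound& e : expected)
  {
    bool found = false;
    for (std::size_t i = 0; i < children.size() && !found; ++i)
    {
      // Skip children already claimed by an earlier expected bound.
      if (!used[i] && children[i] == e)
      {
        used[i] = true;
        found = true;
      }
    }

    if (!found)
      return false;
  }

  return true;
}
```

Sorting both sides by some key and comparing element by element, as the GMM test does with weights, is the other option when a natural ordering exists.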
< jenkins-mlpack>
Starting build #2040 for job mlpack - svn checkin test (previous build: SUCCESS)
oldbeardo has joined #mlpack
< oldbeardo>
naywhayare: hey, did you check out the tests?
< naywhayare>
they seem fine to me; I'd add at least a simple test to check that RegularizedSVD as a whole is working
< naywhayare>
but the tests for RegularizedSVDFunction are great
< oldbeardo>
thanks :)
< oldbeardo>
what should I work on now? you have something in mind?
< naywhayare>
we need to make sure that the two SVD algorithms you have implemented work with the CF module
< oldbeardo>
okay, Reg SVD should be straightforward, QUIC-SVD might take some time
sumedhghaisas has joined #mlpack
< naywhayare>
oldbeardo: yeah, we will probably need to modify the CF abstraction a little bit to make it work, but we can figure it out as we go :)
< sumedhghaisas>
naywhayare: finally... net is back :)
< naywhayare>
sumedhghaisas: good to hear they got that taken care of :)
< oldbeardo>
naywhayare: do you think Reg SVD should be the default for CF?
< naywhayare>
oldbeardo: it's too early to say... we have like eight choices now, with your contributions and Sumedh's :)
< sumedhghaisas>
okay I am updated with the mail conversation.. so what exactly are we going to do about that iterator??
< naywhayare>
I think what we should do is, when everything is done, run some tests and pick whichever seems to give a good tradeoff between performance and runtime as the default
< naywhayare>
I wouldn't be surprised if that is regularized SVD
< naywhayare>
sumedhghaisas: I think we should put it into arma_extend/, because we'll have to do that anyway for backward compatibility
< naywhayare>
so we can do that, and then we can also send Conrad the implementation and it's his choice whether or not he wants to accept it
< naywhayare>
do you think that's reasonable?
< oldbeardo>
naywhayare: right, it's been a while since I had a look at the CF module, I'll do that now
< sumedhghaisas>
yes... but there is already a typedef of eT* to mat::iterator..
< sumedhghaisas>
how do we overload that??
< naywhayare>
oldbeardo: I imagine it may need some refactoring, but I'm not opposed to that. we just need to make sure it will still work with the AMF class
< naywhayare>
sumedhghaisas: we don't; we make another class called 'row_col_iterator', like Conrad suggested, and then we use row_col_iterator
< naywhayare>
for SpMat, we can just say 'typedef iterator row_col_iterator'
< oldbeardo>
naywhayare: okay, I will have a look and get back to you
< naywhayare>
but for Mat, we just make a class row_col_iterator that is part of Mat
< sumedhghaisas>
for that we have to redefine the class ...
< sumedhghaisas>
what about begin() and end()??
< naywhayare>
you can use return type overloading for that
< naywhayare>
a class can have both 'iterator begin()' and 'row_col_iterator begin()'
< naywhayare>
so... to add these things to Mat inside of arma_extend/, we'll create two files called Mat_extra_bones.hpp and Mat_extra_meat.hpp
< naywhayare>
take a look at src/mlpack/core/arma_extend/SpMat_extra_bones.hpp and SpMat_extra_meat.hpp
< naywhayare>
so you just put the code that you want to be inside Mat_bones and Mat_meat in those two files
< sumedhghaisas>
I don't think that works... does it?? I always thought there should be some parameter to substitute templates...
< sumedhghaisas>
template<typename T> T begin() { }
< naywhayare>
then, look at arma_extend.hpp, where the ARMA_EXTRA_SPMAT_PROTO and ARMA_EXTRA_SPMAT_MEAT macros are defined... we'll define ARMA_EXTRA_MAT_PROTO and ARMA_EXTRA_MAT_MEAT macros in the same way
< sumedhghaisas>
will this work??
< naywhayare>
I doubt it, because the implementation has to be so different
< naywhayare>
blah, hang on, you are right, you can't do return type overloading
< naywhayare>
what you can do is overloading based on cv-qualifiers (so, const)
< naywhayare>
but that doesn't really apply here
< naywhayare>
we can call the functions begin_row_col() and end_row_col()
< sumedhghaisas>
you can but then it has to be something like this template <typename T> T begin(T a) { }
< naywhayare>
yeah, and in our case begin() shouldn't take any arguments
< naywhayare>
if we used templates like that, we'd have to write begin<row_col_iterator>()
< sumedhghaisas>
yes... exactly...
< naywhayare>
so for now we can do begin_row_col() and end_row_col() and maybe Conrad has a better idea
< naywhayare>
but if I had to guess, that's probably what he would do
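
The overloading rules from this exchange can be sketched on a toy class. Everything here is hypothetical (`TinyMat`, `begin_as`, and the simplified `row_col_iterator` are illustration only, not Armadillo's `Mat`): overloading on `const` is fine, overloading on return type alone is not, and the two workarounds are a distinct function name or a template whose type must be spelled out at the call site.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, much-simplified matrix; not Armadillo's Mat.
class TinyMat
{
 public:
  typedef double* iterator;
  typedef const double* const_iterator;

  // Hypothetical iterator that also tracks its position.
  struct row_col_iterator
  {
    double* ptr;
    std::size_t index;
  };

  // Overloading on const is allowed: the two functions differ in the
  // cv-qualification of the implicit 'this'.
  iterator begin() { return mem; }
  const_iterator begin() const { return mem; }

  // Not allowed: this would differ from begin() only in its return type.
  // row_col_iterator begin();

  // A distinct name sidesteps the problem.
  row_col_iterator begin_row_col() { return row_col_iterator{mem, 0}; }

  // Template alternative: the caller has to name the type explicitly,
  // as in m.begin_as<TinyMat::row_col_iterator>().
  template<typename IteratorType>
  IteratorType begin_as();

 private:
  double mem[4] = {1.0, 2.0, 3.0, 4.0};
};

// One explicit specialization per supported iterator type.
template<>
inline TinyMat::iterator TinyMat::begin_as<TinyMat::iterator>()
{
  return mem;
}

template<>
inline TinyMat::row_col_iterator
TinyMat::begin_as<TinyMat::row_col_iterator>()
{
  return row_col_iterator{mem, 0};
}
```

The template form works but is noisier at every call site, which is one reason to prefer `begin_row_col()` and `end_row_col()`.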
< sumedhghaisas>
okay I will look at arma_extend and how it is implemented... and get back to you if I have any doubts...
< naywhayare>
take a look at Mat_bones.hpp and Mat_meat.hpp, too; look for ARMA_EXTRA_MAT_PROTO
< naywhayare>
it's a clever trick... he writes a class like this:
< naywhayare>
class Mat {
< naywhayare>
... // lots of stuff
< naywhayare>
#ifdef ARMA_EXTRA_MAT_PROTO
< naywhayare>
#include ARMA_EXTRA_MAT_PROTO
< naywhayare>
#endif
< naywhayare>
};
< sumedhghaisas>
ohh so that more code can be added ....
< sumedhghaisas>
very clever...
< naywhayare>
yeah, exactly! I think it's a really cool trick
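
A single-file sketch of the same extension hook. In Armadillo the macro names a file that gets #included inside the class body; to keep this example self-contained, the hypothetical macro `TINY_EXTRA_PROTO` expands directly to the extra declarations instead. `TinyVec` and `twice_size()` are made-up names.

```cpp
#include <cassert>

// The user defines the hook before the "library" class is seen.
// Assumption: here the macro expands to declarations rather than to a
// file name, so that everything fits in one file.
#define TINY_EXTRA_PROTO \
  int twice_size() const;

// The "library" class, written with an optional extension point.
class TinyVec
{
 public:
  explicit TinyVec(int n) : n(n) { }
  int size() const { return n; }

  // Anything supplied through the hook becomes a member of the class.
  #ifdef TINY_EXTRA_PROTO
  TINY_EXTRA_PROTO
  #endif

 private:
  int n;
};

// The matching definition, playing the role of the "meat" file.
inline int TinyVec::twice_size() const { return 2 * size(); }
```

If the macro is left undefined, the class compiles exactly as the library author wrote it.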
< sumedhghaisas>
yeah exactly ... there are so many things that I learn everyday about C++...
< sumedhghaisas>
just for a trivia... do you know you can access private members inside a c++ class without friend or anything??
< naywhayare>
haha, there is a really dirty way to do this:
< naywhayare>
#define private public
< naywhayare>
#include "file.hpp"
< naywhayare>
#undef private
< naywhayare>
but that is a bad idea for a million reasons
< sumedhghaisas>
haha... no no... not that way...
< sumedhghaisas>
okay... so there is a class like this...
< sumedhghaisas>
class A {
< sumedhghaisas>
private:
< sumedhghaisas>
int a;
< sumedhghaisas>
};
arcane has joined #mlpack
< sumedhghaisas>
so now in main u say...
< sumedhghaisas>
A* a = new A();
< sumedhghaisas>
int* temp = (int*)a;
< sumedhghaisas>
and believe it or not...
< sumedhghaisas>
it works...
< naywhayare>
hm, I see; that works because the internal structure of A is the same as an int
< sumedhghaisas>
there is more...
< naywhayare>
so you could do it in a more complex form, by defining a struct with the same parameters as the class
< sumedhghaisas>
if there are 2 members...
< sumedhghaisas>
temp++ will point to the next member...
< sumedhghaisas>
yeah...
< sumedhghaisas>
okay I have a question...
< sumedhghaisas>
I was reading about template metaprogramming the other day...
< naywhayare>
I imagine this works in most cases, but I don't think it's guaranteed to work all the time because the compiler may add padding to classes in weird ways
< naywhayare>
still, a clever trick one could use in a pinch :)
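
A hedged sketch of the narrow case where the cast is well defined: for a standard-layout class, a pointer to the object can be reinterpreted as a pointer to its first non-static data member. `PeekFirstMember` is a hypothetical helper name. Stepping to a second member with pointer arithmetic, as described above, is not covered by that guarantee because of possible padding.

```cpp
#include <cassert>
#include <type_traits>

class A
{
 public:
  explicit A(int value) : a(value) { }

 private:
  int a;
};

// The cast below relies on A being standard-layout; fail at compile time
// if a later change (virtual functions, mixed access control) breaks that.
static_assert(std::is_standard_layout<A>::value,
              "A must be standard-layout for the cast to be valid");

// Reads the first data member of A without going through its interface.
inline int PeekFirstMember(A& object)
{
  return *reinterpret_cast<int*>(&object);
}
```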
< sumedhghaisas>
so the basic example is a factorial function implemented in compile time...
< sumedhghaisas>
it is template<typename T> Factorial
< sumedhghaisas>
can I overload this with template<typename T> Factorial??
< naywhayare>
I'm not sure what you mean
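
For reference, the compile-time factorial mentioned above is commonly written with a non-type template parameter and an explicit specialization for the base case. This is the textbook form, offered as a sketch; it is not necessarily the exact code sumedhghaisas was reading.

```cpp
#include <cassert>

// Recursive case: N! = N * (N - 1)!, evaluated by the compiler.
template<unsigned N>
struct Factorial
{
  static constexpr unsigned long long value = N * Factorial<N - 1>::value;
};

// Base case: the explicit specialization stops the recursion at 0.
template<>
struct Factorial<0>
{
  static constexpr unsigned long long value = 1;
};

// Checked at compile time, so no code runs for this.
static_assert(Factorial<5>::value == 120, "5! should be 120");
```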
< oldbeardo>
naywhayare: I saw the CF class and could think of two problems
< oldbeardo>
naywhayare: firstly, there is no Apply() function in Reg SVD
sumedhghaisas has quit [Ping timeout: 260 seconds]
< oldbeardo>
naywhayare: secondly, Reg SVD works on the raw data itself, and not on the rating matrix
sumedhghaisas has joined #mlpack
< oldbeardo>
naywhayare: not sure how to deal with these issues, should I add an Apply() method to Reg SVD?
< naywhayare>
yeah, we should have an Apply() function, and the optimizer shouldn't be called until then
< naywhayare>
I think that will solve part of your problem
< naywhayare>
CF calls Apply() with the cleanedData matrix, which is a sparse representation of the ratings matrix
< naywhayare>
maybe it would be useful to refactor RegularizedSVD to work with either a sparse or dense matrix?
< oldbeardo>
it's not really refactoring, the code would need to be written again and it won't be that efficient, I had considered that option before starting
< oldbeardo>
a simple solution would be to set a flag depending on the factorizer called
< naywhayare>
oh! I did not realize how you had written the regularized SVD code... I see now
< naywhayare>
so, another option is to use a little template metaprogramming to detect when the factorizer takes the sparse rating matrix and when it just takes the (row, col, rating) matrix
< naywhayare>
and then use specializations or SFINAE to make CF call the factorizer with the right argument
< oldbeardo>
I'm not familiar with the technique you are suggesting, is there an example somewhere?
< sumedhghaisas>
I love SFINAE :) very cool technique...
< naywhayare>
hang on, let me find an example that is reasonably simple...
< sumedhghaisas>
naywhayare: can I ask what is the problem here??
< oldbeardo>
sumedhghaisas: check out the regularized_svd module
< oldbeardo>
sumedhghaisas: we are going to integrate that with the CF module, for which we will need some changes
< naywhayare>
oldbeardo: take a look at the kernels in src/mlpack/core/kernels/
< sumedhghaisas>
oldbeardo: hostel net is so slow ... I am updating for 5 mins :(
< naywhayare>
and then look at src/mlpack/core/kernels/kernel_traits.hpp
< oldbeardo>
sumedhghaisas: you are on campus already?
< naywhayare>
so for any KernelType, the value of KernelTraits<KernelType>::IsNormalized is known at compile time
< sumedhghaisas>
yeah... for some placement training purpose... :(
< sumedhghaisas>
very boring...
< naywhayare>
then the IsNormalized variable is used in src/mlpack/methods/fastmks/fastmks_rules_impl.hpp at line 114
< naywhayare>
(that particular bound that is being calculated can be tightened if the kernel is normalized, which is the reason for the if statement)
< naywhayare>
because it's all known at compile time, the if statement actually should get entirely optimized out
< naywhayare>
so, you could do something similar for RegularizedSVD... you could make a 'FactorizerTraits' class or something with a boolean that specifies whether or not the factorizer takes the sparse rating matrix or the row/col/rating matrix
< naywhayare>
and then based on the value of that, you could make an if statement which calls the factorizer with the correct matrix
< oldbeardo>
right, so this will have to be done for each factorizer?
< naywhayare>
not necessarily, which is nice :)
< naywhayare>
you can set the default value of this boolean to false (so that the sparse rating matrix is passed in)
< naywhayare>
so you only need to make the specialization of FactorizerTraits for RegularizedSVD
< naywhayare>
(although it's probably not a bad idea to do it for the other ones anyway, and just set the boolean to its default value, for the sake of clarity)
< oldbeardo>
okay, will do it
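
A minimal sketch of the traits idea, under stated assumptions: `SparseRatings`, `RawTriples`, `AMFSketch`, `RegularizedSVDSketch`, `UsesRawTriples`, and `CallFactorizer` are all hypothetical names, and the `Apply()` signatures are invented for illustration. One detail differs from the plain `if` used in the FastMKS code: when the two factorizers take different argument types, both branches of an ordinary `if` would still have to compile, so this sketch selects the call with `std::enable_if` overloads (the SFINAE route mentioned earlier).

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for the two input representations.
struct SparseRatings { };  // sparse user-by-item rating matrix
struct RawTriples { };     // dense (row, col, rating) list

// Default traits: a factorizer takes the sparse rating matrix.
template<typename FactorizerType>
struct FactorizerTraits
{
  static const bool UsesRawTriples = false;
};

// Hypothetical factorizers; Apply() returns a tag so the call is visible.
struct AMFSketch
{
  std::string Apply(const SparseRatings&) { return "sparse"; }
};

struct RegularizedSVDSketch
{
  std::string Apply(const RawTriples&) { return "triples"; }
};

// Only the factorizer that deviates from the default needs a specialization.
template<>
struct FactorizerTraits<RegularizedSVDSketch>
{
  static const bool UsesRawTriples = true;
};

// Selected when the trait is false: pass the sparse matrix.
template<typename F>
typename std::enable_if<!FactorizerTraits<F>::UsesRawTriples,
                        std::string>::type
CallFactorizer(F& factorizer, const SparseRatings& sparse, const RawTriples&)
{
  return factorizer.Apply(sparse);
}

// Selected when the trait is true: pass the raw triples.
template<typename F>
typename std::enable_if<FactorizerTraits<F>::UsesRawTriples,
                        std::string>::type
CallFactorizer(F& factorizer, const SparseRatings&, const RawTriples& triples)
{
  return factorizer.Apply(triples);
}
```

Because the trait is a compile-time constant, the choice costs nothing at runtime, and existing factorizers keep working without a specialization of their own.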
oldbeardo has quit [Quit: Page closed]
< arcane>
naywhayare, Hi. does HRectBound need refactoring to accept instantiated metric ? (#246)
oldbeardo has joined #mlpack
< oldbeardo>
naywhayare: I thought of one more issue which requires your input
< oldbeardo>
in the Kernel Traits example, everything is contained within mlpack::kernel
< naywhayare>
arcane: I have your email marked to respond to, but I hadn't done it yet. let's leave HRectBound alone for now... I think it needs a bit of thought, since it can only accept LMetric
< oldbeardo>
however, this is not the case for techniques used in CF
< naywhayare>
oldbeardo: you could just put FactorizerTraits in mlpack::cf for now
< oldbeardo>
naywhayare: okay, then where will this come? FactorizerTraits<RegularizedSVD>{ ... }
< oldbeardo>
in the CF class itself?
< naywhayare>
probably in the regularized svd code, but you'll have to put it in the mlpack::cf namespace
< oldbeardo>
sorry, I meant mlpack::cf
< naywhayare>
but I don't see any other way around that
< oldbeardo>
okay, I was thinking of putting all these methods in mlpack::mf
< naywhayare>
mf?
< oldbeardo>
mf -> matrix factorization
< naywhayare>
hmm... ok. if you do that, you should move AMF to the mf namespace too, and also QUIC-SVD
< oldbeardo>
I'm not too sure about this though, I will first try out what you suggested
< sumedhghaisas>
naywhayare: I think row_col_iterator_base is not required in this case... means I don't know if it is important from the design point of view... is it??
< naywhayare>
probably not, no
< naywhayare>
the Mat iterators are very simple, unlike the SpMat iterators
< sumedhghaisas>
yes... I was reading SpMat::iterator implementation... they are way more complex...
< naywhayare>
yeah... it is very complex to iterate over a sparse matrix
< sumedhghaisas>
okay so I will implement row_col_iterator and row_col_const_iterator...
< udit_s>
naywhayare: hey ryan, you there ?
< naywhayare>
udit_s: yes; I'm a bit busy, but I'm still here :)
< udit_s>
Nothing urgent right now. I wanted to inform you before turning in that I have added a test for the iris dataset.
< udit_s>
It currently has an accuracy of 98%.
< udit_s>
I've fixed a few other things in the perceptron weight update as well. Let's discuss the code tomorrow or over the mail?
< udit_s>
I was talking about the adaboost.
< naywhayare>
ok, great
< naywhayare>
we can talk about it tomorrow in IRC or in the channel
< naywhayare>
up to you
< udit_s>
sure, what time would suit you ?
< naywhayare>
I'm hoping to be up and online by 1300 UTC
< naywhayare>
is that okay?
< udit_s>
I'll be online around that time - 1400 UTC to be safe.
< jenkins-mlpack>
Starting build #2041 for job mlpack - svn checkin test (previous build: SUCCESS)
< naywhayare>
ok, sounds good
< udit_s>
okay. great. :)
udit_s has quit [Quit: Leaving]
< sumedhghaisas>
naywhayare: okay Mat_bones.hpp is adding extended declaration through ARMA_EXTRA_MAT_PROTO ... but where is ARMA_EXTRA_MAT_MEAT getting included??
< naywhayare>
should be in Mat_meat.hpp, I thought
< naywhayare>
or, if not, in the include/armadillo file somewhere, probably
< sumedhghaisas>
ohh haan right...
< sumedhghaisas>
I didn't check there...
< andrewmw94>
naywhayare: I need to get the distance from the centroid of a HRectBound to another point. I assume that the Metric() function will be rather slow as it constructs an LMetric. Should I add a method to HRectBound or should I make my own or ...?
< naywhayare>
andrewmw94: ideally, the compiler should optimize all of that out since the LMetric doesn't actually hold any information locally and all of its methods are static
< naywhayare>
so the construction of the object won't actually happen
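
A sketch of why a stateless metric is cheap to "construct". `EuclideanSketch` and `BoundSketch` are hypothetical stand-ins, not mlpack's LMetric or HRectBound: the metric has no data members and only static methods, so returning one by value copies nothing. Whether the compiler removes the temporary entirely is an optimization rather than a guarantee.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stateless metric: no data members, only a static method.
struct EuclideanSketch
{
  static double Evaluate(const std::vector<double>& a,
                         const std::vector<double>& b)
  {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
      sum += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(sum);
  }
};

// The type holds no state, so creating an instance initializes nothing.
static_assert(std::is_empty<EuclideanSketch>::value,
              "the metric should carry no state");

// Hypothetical bound whose Metric() builds the metric on every call.
struct BoundSketch
{
  EuclideanSketch Metric() const { return EuclideanSketch(); }
};
```

A metric with local data, such as the Mahalanobis distance mentioned below, would not pass the `is_empty` check, which is where holding an instantiated metric starts to matter.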
< andrewmw94>
ahh, thanks
< naywhayare>
this is a little bit related to something arcane was talking about earlier and a bug that is open (#246 I think?); the HRectBound doesn't hold an instantiated metric, but it probably should, since some metrics (such as the Mahalanobis distance, which is a weighted l-norm) have local data
< naywhayare>
but I am not completely sure of what I want to do about that quite yet, so it'll stay the way it is for now (maybe in the next week or two I will figure something better out)
< andrewmw94>
yeah. I'm not sure the RTree would work for anything besides Euclidean distances
< naywhayare>
yeah; I think it should work for the Mahalanobis distance too
< naywhayare>
I think the real requirement is simply that each dimension is explicitly accessible
< naywhayare>
the class of weighted l-norms or something like that. I'm not sure of the exact terminology
< sumedhghaisas>
naywhayare: there is no operator==(const iterator& rhs) for class const_iterator... it should be there right??
< naywhayare>
I don't think there needs to be because of inheritance, if I remember right
< sumedhghaisas>
ohh yes.. you are right... and one more thing...
< sumedhghaisas>
in operator==
< sumedhghaisas>
we are just comparing if col and row are same...
< sumedhghaisas>
what if the iterators are pointing to different matrices??
< naywhayare>
hmm, good point
< naywhayare>
I guess we should check the matrix itself too
< naywhayare>
or the memory location of the matrix
< sumedhghaisas>
now... what do STL iterators do here??
< naywhayare>
I am not sure; I assume that they are unequal if they come from different objects
< naywhayare>
but I haven't checked the standard
< sumedhghaisas>
okay... can be verified with a quick test...
< sumedhghaisas>
okay they check memory location...
< sumedhghaisas>
I will make the necessary changes then...
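
A minimal sketch of the equality check agreed on above, comparing the owning matrix's memory as well as the position. `RowColIteratorSketch` and its fields are hypothetical; the real row_col_iterator may store different state. Note that for standard library containers, comparing iterators from different containers is generally outside what the standard defines, so this follows the decision in the chat rather than a rule from the standard.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical position-tracking iterator over a dense matrix.
struct RowColIteratorSketch
{
  const double* mem;  // start of the owning matrix's memory
  std::size_t row;
  std::size_t col;

  // Equal only if both iterators refer to the same matrix memory
  // and the same (row, col) position.
  bool operator==(const RowColIteratorSketch& rhs) const
  {
    return mem == rhs.mem && row == rhs.row && col == rhs.col;
  }

  bool operator!=(const RowColIteratorSketch& rhs) const
  {
    return !(*this == rhs);
  }
};
```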