naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
andrewmw94 has quit [Quit: Leaving.]
< jenkins-mlpack> Yippie, build fixed!
< jenkins-mlpack> Project mlpack - nightly matrix build build #525: FIXED in 4 hr 21 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20nightly%20matrix%20build/525/
< jenkins-mlpack> * Marcus Edel: Improved nystroem localisation ('oe')
< jenkins-mlpack> * Marcus Edel: Integrate nystroem method into the kernel_pca_main.cpp file.
< jenkins-mlpack> * siddharth.950: Adding tests for Reg SVD.
andrewmw94 has joined #mlpack
< andrewmw94> naywhayare: As regards the SplitType tests, there's really no reason why Child(0) has to be Child(0) rather than Child(1). I can't think of a way to get around this problem while specifying exactly how the bounds should be laid out. Any suggestions?
< andrewmw94> "bounds" should be "bounds and tree"
< naywhayare> so what you are saying is that you know how the bounds should be, but you can't have any guarantee on the ordering of them?
< andrewmw94> yeah
< naywhayare> depending on how much you know about the bounds, you could do some matching
< naywhayare> if you can say "one bound should be the hyperrectangle [(-5, 5) (-3, 3)]" then you can search through all the children for that bound
< naywhayare> then for the next bound, you search through all the children except the one you already found the bound in, and so forth
< andrewmw94> yeah, that would work.
< naywhayare> I know I've done this somewhere in the tests
< naywhayare> I'm trying to find an example
< naywhayare> but there are apparently 16,606 lines of tests, so... things are a bit hard to find sometimes...
< naywhayare> ok, here's kind of an example, in gmm_test.cpp, test name "GMMTrainEMMultipleGaussians"
< andrewmw94> thanks
< naywhayare> so we train a gaussian mixture model (which is just a bunch of gaussians with different weights, means, and covariances)
< naywhayare> and we want to compare to our synthetic model, but we have no guarantee things are in the same order
< naywhayare> so on line 301/302, we just sort by weights and then compare that way
< jenkins-mlpack> Starting build #2040 for job mlpack - svn checkin test (previous build: SUCCESS)
oldbeardo has joined #mlpack
< oldbeardo> naywhayare: hey, did you check out the tests?
< naywhayare> they seem fine to me; I'd add at least a simple test to check that RegularizedSVD as a whole is working
< naywhayare> but the tests for RegularizedSVDFunction are great
< oldbeardo> thanks :)
< oldbeardo> what should I work on now? you have something in mind?
< naywhayare> we need to make sure that the two SVD algorithms you have implemented work with the CF module
< oldbeardo> okay, Reg SVD should be straight forward, QUIC-SVD might take some time
sumedhghaisas has joined #mlpack
< naywhayare> oldbeardo: yeah, we will probably need to modify the CF abstraction a little bit to make it work, but we can figure it out as we go :)
< sumedhghaisas> naywhayare: finally... net is back :)
< naywhayare> sumedhghaisas: good to hear they got that taken care of :)
< oldbeardo> naywhayare: do you think Reg SVD should be the default for CF?
< naywhayare> oldbeardo: it's too early to say... we have like eight choices now, with your contributions and Sumedh's :)
< sumedhghaisas> okay I am updated with the mail conversation.. so what exactly we are going to do about that iterator??
< naywhayare> I think what we should do is, when everything is done, run some tests and pick whichever seems to give a good tradeoff between performance and runtime as the default
< naywhayare> I wouldn't be surprised if that is regularized SVD
< naywhayare> sumedhghaisas: I think we should put it into arma_extend/, because we'll have to do that anyway for reverse compatibility
< naywhayare> so we can do that, and then we can also send Conrad the implementation and it's his choice whether or not he wants to accept it
< naywhayare> do you think that's reasonable?
< oldbeardo> naywhayare: right, it's been a while since I had a look at the CF module, I'll do that now
< sumedhghaisas> yes... but there is already a typedef of eT* to mat::iterator..
< sumedhghaisas> how do we overload that??
< naywhayare> oldbeardo: I imagine it may need some refactoring, but I'm not opposed to that. we just need to make sure it will still work with the AMF class
< naywhayare> sumedhghaisas: we don't; we make another class called 'row_col_iterator', like Conrad suggested, and then we use row_col_iterator
< naywhayare> for SpMat, we can just say 'typedef iterator row_col_iterator'
< oldbeardo> naywhayare: okay, I will have a look and get back to you
< naywhayare> but for Mat, we just make a class row_col_iterator that is part of Mat
< sumedhghaisas> for that we have to redefine the class ...
< sumedhghaisas> what about begin() and end()??
< naywhayare> you can use return type overloading for that
< naywhayare> a class can have both 'iterator begin()' and 'row_col_iterator begin()'
< naywhayare> so... to add these things to Mat inside of arma_extend/, we'll create a file called Mat_extra_bones.hpp and Mat_extra_meat.hpp
< naywhayare> take a look at src/mlpack/core/arma_extend/SpMat_extra_bones.hpp and SpMat_extra_meat.hpp
< naywhayare> so you just put the code that you want to be inside Mat_bones and Mat_meat in those two files
< sumedhghaisas> I don't think that works... does it?? I always thought there should be some parameter to substitute templates...
< sumedhghaisas> template<typename T> T begin() { }
< naywhayare> then, look at arma_extend.hpp, where the ARMA_EXTRA_SPMAT_PROTO and ARMA_EXTRA_SPMAT_MEAT macros are defined... we'll define ARMA_EXTRA_MAT_PROTO and ARMA_EXTRA_MAT_MEAT macros in the same way
< sumedhghaisas> will this work??
< naywhayare> I doubt it, because the implementation has to be so different
< naywhayare> blah, hang on, you are right, you can't do return type overloading
< naywhayare> what you can do is overloading based on cv-qualifiers (so, const)
< naywhayare> but that doesn't really apply here
< naywhayare> we can call the functions begin_row_col() and end_row_col()
< sumedhghaisas> you can but then it has to be something like this template <typename T> T begin(T a) { }
< naywhayare> yeah, and in our case begin() shouldn't take any arguments
< naywhayare> if we used templates like that, we'd have to write begin<row_col_iterator>()
< sumedhghaisas> yes... exactly...
< naywhayare> so for now we can do begin_row_col() and end_row_col() and maybe Conrad has a better idea
< naywhayare> but if I had to guess, that's probably what he would do
< sumedhghaisas> okay I will look at arma_extend and how it is implemented... and get back to you if I have any doubts...
< naywhayare> take a look at Mat_bones.hpp and Mat_meat.hpp, too; look for ARMA_EXTRA_MAT_PROTO
< naywhayare> it's a clever trick... he writes a class like this:
< naywhayare> class Mat {
< naywhayare> ... // lots of stuff
< naywhayare> #ifdef ARMA_EXTRA_MAT_PROTO
< naywhayare> #include ARMA_EXTRA_MAT_PROTO
< naywhayare> #endif
< naywhayare> };
< sumedhghaisas> ohh so that more code can be added ....
< sumedhghaisas> very clever...
< naywhayare> yeah, exactly! I think it's a really cool trick
< sumedhghaisas> yeah exactly ... there are so many things that I learn everyday about C++...
< sumedhghaisas> just for a trivia... do you know you can access private members inside a c++ class without friend or anything??
< naywhayare> haha, there is a really dirty way to do this:
< naywhayare> #define private public
< naywhayare> #include "file.hpp"
< naywhayare> #undef private
< naywhayare> but that is a bad idea for a million reasons
< sumedhghaisas> haha... no no... not that way...
< sumedhghaisas> okay... so there is a class like this...
< sumedhghaisas> class A {
< sumedhghaisas> private:
< sumedhghaisas> int a;
< sumedhghaisas> }
arcane has joined #mlpack
< sumedhghaisas> so now in main u say...
< sumedhghaisas> A* a = new A();
< sumedhghaisas> int* temp = (int*)a;
< sumedhghaisas> and believe it or not...
< sumedhghaisas> it works...
< naywhayare> hm, I see; that works because the internal structure of A is the same as an int
< sumedhghaisas> there is more...
< naywhayare> so you could do it in a more complex form, by defining a struct with the same parameters as the class
< sumedhghaisas> if there are 2 members...
< sumedhghaisas> temp++ will point to the next member...
< sumedhghaisas> yeah...
< sumedhghaisas> okay I have a question...
< sumedhghaisas> I was reading about template metaprogramming the other day...
< naywhayare> I imagine this works in most cases, but I don't think it's guaranteed to work all the time because the compiler may add padding to classes in weird ways
< naywhayare> still, a clever trick one could use in a pinch :)
< sumedhghaisas> so the basic example is a factorial function implemented in compile time...
< sumedhghaisas> it is template<typename T> Factorial
< sumedhghaisas> can I overload this with template<typename T> Factorial??
< naywhayare> I'm not sure what you mean
< oldbeardo> naywhayare: I saw the CF class and could think of two problems
< oldbeardo> naywhayare: firstly, there is no Apply() function in Reg SVD
sumedhghaisas has quit [Ping timeout: 260 seconds]
< oldbeardo> naywhayare: secondly, Reg SVD works on the raw data itself, and not on the rating matrix
sumedhghaisas has joined #mlpack
< oldbeardo> naywhayare: not sure how to deal with these issues, should I add an Apply() method to Reg SVD?
< naywhayare> yeah, we should have an Apply() function, and the optimizer shouldn't be called until then
< naywhayare> I think that will solve part of your problem
< naywhayare> CF calls Apply() with the cleanedData matrix, which is a sparse representation of the ratings matrix
< naywhayare> maybe it would be useful to refactor RegularizedSVD to work with either a sparse or dense matrix?
< oldbeardo> it's not really refactoring, the code would need to be written again and it won't be that efficient, I had considered that option before starting
< oldbeardo> a simple solution would be to set a flag depending on the factorizer called
< naywhayare> oh! I did not realize how you had written the regularized SVD code... I see now
< naywhayare> so, another option is to use a little template metaprogramming to detect when the factorizer takes the sparse rating matrix and when it just takes the (row, col, rating) matrix
< naywhayare> and then use specializations or SFINAE to make CF call the factorizer with the right argument
< oldbeardo> I'm not familiar with the technique you are suggesting, is there an example somewhere?
< sumedhghaisas> I love SFINAE :) very cool technique...
< naywhayare> hang on, let me find an example that is reasonably simple...
< sumedhghaisas> naywhayare: can I ask what is the problem here??
< oldbeardo> sumedhghaisas: check out the regularized_svd module
< oldbeardo> sumedhghaisas: we are going to integrate that with the CF module, for which we will some changes
< oldbeardo> *need some changes
< naywhayare> oldbeardo: take a look at the kernels in src/mlpack/core/kernels/
< sumedhghaisas> oldbeardo: hostel net is so slow ... I am updating for 5 mins :(
< naywhayare> and then look at src/mlpack/core/kernels/kernel_traits.hpp
< oldbeardo> sumedhghaisas: you are on campus already?
< naywhayare> so for any KernelType, the value of KernelTraits<KernelType>::IsNormalized is known at compile time
< sumedhghaisas> yeah... for some placement training purpose... :(
< sumedhghaisas> very boring...
< naywhayare> then the IsNormalized variable is used in src/mlpack/methods/fastmks/fastmks_rules_impl.hpp at line 114
< naywhayare> (that particular bound that is being calculated can be tightened if the kernel is normalized, which is the reason for the if statement)
< naywhayare> because it's all known at compile time, the if statement actually should get entirely optimized out
< naywhayare> so, you could do something similar for RegularizedSVD... you could make a 'FactorizerTraits' class or something with a boolean that specifies whether or not the factorizer takes the sparse rating matrix or the row/col/rating matrix
< naywhayare> and then based on the value of that, you could make an if statement which calls the factorizer with the correct matrix
< oldbeardo> right, so this will have to be done for each factorizer?
< naywhayare> not necessarily, which is nice :)
< naywhayare> you can set the default value of this boolean to false (so that the sparse rating matrix is passed in)
< naywhayare> so you only need to make the specialization of FactorizerTraits for RegularizedSVD
< naywhayare> (although it's probably not a bad idea to do it for the other ones anyway, and just set the boolean to its default value, for the sake of clarity)
< oldbeardo> okay, will do it
oldbeardo has quit [Quit: Page closed]
< arcane> naywhayare, Hi. does HRectBound need refactoring to accept instantiated metric ? (#246)
oldbeardo has joined #mlpack
< oldbeardo> naywhayare: I thought of one more issue which requires your input
< oldbeardo> in the Kernel Traits example, everything is contained within mlpack::kernel
< naywhayare> arcane: I have your email marked to respond to, but I hadn't done it yet. let's leave HRectBound alone for now... I think it needs a bit of thought, since it can only accept LMetric
< oldbeardo> however, this is not the case for techniques used in CF
< naywhayare> oldbeardo: you could just put FactorizerTraits in mlpack::cf for now
< oldbeardo> naywhayare: okay, then where will this come? FactorizerTraits<RegularizedSVD>{ ... }
< oldbeardo> in the CF class itself?
< naywhayare> probably in the regularized svd code, but you'll have to put it in the mlpack::cf namespace
< oldbeardo> sorry, I meant mlpack::cf
< naywhayare> but I don't see any other way around that
< oldbeardo> okay, I was thinking of putting all these methods in mlpack::mf
< naywhayare> mf?
< oldbeardo> mf -> matrix factorization
< naywhayare> hmm... ok. if you do that, you should move AMF to the mf namespace too, and also QUIC-SVD
< oldbeardo> I'm not too sure about this though, I will first try out what you suggested
arcane has quit [Quit: Leaving]
< jenkins-mlpack> Project mlpack - svn checkin test build #2040: SUCCESS in 1 hr 29 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2040/
< jenkins-mlpack> andrewmw94: Bug fix. Node splitting tests.
udit_s has joined #mlpack
< sumedhghaisas> naywhayare: what is arma_inline and arma_hot??
< sumedhghaisas> and do they help increasing speed??
oldbeardo has quit [Quit: Page closed]
< naywhayare> sumedhghaisas: they are hints to the compiler; you can look at what they actually are in compiler.hpp, I think
< naywhayare> I am not sure how much speed advantage they provide; I didn't write that code
< sumedhghaisas> okay I read all those files...
< sumedhghaisas> just one doubt...
< sumedhghaisas> // So that we satisfy the STL iterator types.
< sumedhghaisas> typedef std::bidirectional_iterator_tag iterator_category;
< sumedhghaisas> typedef eT value_type;
< sumedhghaisas> typedef uword difference_type; // not certain on this one
< sumedhghaisas> typedef const eT* pointer;
< sumedhghaisas> typedef const eT& reference;
< sumedhghaisas> why is this added to the class??
< sumedhghaisas> naywhayare: I think row_col_iterator_base is not required in this case... means I don;t know if it is important from the design point of view... is it??
< naywhayare> probably not, no
< naywhayare> the Mat iterators are very simple, unlike the SpMat iterators
< sumedhghaisas> yes... I was reading SpMat::iterator implementation... they are way more complex...
< naywhayare> yeah... it is very complex to iterate over a sparse matrix
< sumedhghaisas> okay so I will implement row_col_iterator and row_col_const_iterator...
< udit_s> naywhayare: hey ryan, you there ?
< naywhayare> udit_s: yes; I'm a bit busy, but I'm still here :)
< udit_s> Nothing urgent right now. I wanted to inform you before turning in that I have added a test for the iris dataset.
< udit_s> It currently has an accuracy of 98%.
< udit_s> I've fixed a few other things in the perceptron weight update as well. Lets discuss the code tomorrow or over the mail ?
< udit_s> I was talking about the adaboost.
< naywhayare> ok, great
< naywhayare> we can talk about it tomorrow in IRC or in the channel
< naywhayare> up to you
< udit_s> sure, what time would suit you ?
< naywhayare> I'm hoping to be up and online by 1300 UTC
< naywhayare> is that okay?
< udit_s> I'll be online around that time - 1400 UTC to be safe.
< jenkins-mlpack> Starting build #2041 for job mlpack - svn checkin test (previous build: SUCCESS)
< naywhayare> ok, sounds good
< udit_s> okay. great. :)
udit_s has quit [Quit: Leaving]
< sumedhghaisas> naywhayare: okay Mat_bones.hpp is adding extended declarations through ARMA_EXTRA_MAT_PROTO ... but where is ARMA_EXTRA_MAT_MEAT getting included??
< naywhayare> should be in Mat_meat.hpp, I thought
< naywhayare> or, if not, in the include/armadillo file somewhere, probably
< sumedhghaisas> ohh yes right...
< sumedhghaisas> I didn't check there...
< andrewmw94> naywhayare: I need to get the distance from the centroid of a HRectBound to another point. I assume that the Metric() function will be rather slow as it constructs a LMetric. Should I add a method to HRectBound or should I make my own or ...?
< naywhayare> andrewmw94: ideally, the compiler should optimize all of that out since the LMetric doesn't actually hold any information locally and all of its methods are static
< naywhayare> so the construction of the object won't actually happen
< andrewmw94> ahh, thanks
< naywhayare> this is a little bit related to something arcane was talking about earlier and a bug that is open (#246 I think?); the HRectBound doesn't hold an instantiated metric, but it probably should, since some metrics (such as the Mahalanobis distance, which is a weighted l-norm) have local data
< naywhayare> but I am not completely sure of what I want to do about that quite yet, so it'll stay the way it is for now (maybe in the next week or two I will figure something better out)
< andrewmw94> yeah. I'm not sure the RTree would work for anything besides Euclidean distances
< naywhayare> yeah; I think it should work for the Mahalanobis distance too
< naywhayare> I think the real requirement is simply that each dimension is explicitly accessible
< naywhayare> the class of weighted l-norms or something like that. I'm not sure of the exact terminology
< sumedhghaisas> naywhayare: there is no operator==(const iterator& rhs) for class const_iterator... it should be there right??
< naywhayare> I don't think there needs to be because of inheritance, if I remember right
< sumedhghaisas> ohh yes.. you are right... and one more thing...
< sumedhghaisas> in operator==
< sumedhghaisas> we are just comparing if col and row are same...
< sumedhghaisas> what if the iterators are pointing to different matrices??
< naywhayare> hmm, good point
< naywhayare> I guess we should check the matrix itself too
< naywhayare> or the memory location of the matrix
< sumedhghaisas> now... what does STL iterators do here??
< naywhayare> I am not sure; I assume that they are unequal if they come from different objects
< naywhayare> but I haven't checked the standard
< sumedhghaisas> okay... can be verified with a quick test...
< sumedhghaisas> okay they check memory location...
< sumedhghaisas> I will make the necessary changes then...
< jenkins-mlpack> Project mlpack - svn checkin test build #2041: SUCCESS in 1 hr 26 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2041/
< jenkins-mlpack> saxena.udit: Adaboost improved. Tests for the UCI iris dataset added.
< sumedhghaisas> naywhayare: there is one subtle point... I checked with a vector... if iterator is pointing to begin() and it-- is called...
< sumedhghaisas> the iterator is still valid...
< sumedhghaisas> *it prints garbage...
< sumedhghaisas> what should we do?? follow this or make it null??
sumedhghaisas has quit [Read error: Connection reset by peer]