naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
sumedhghaisas has quit [Ping timeout: 252 seconds]
< jenkins-mlpack>
Starting build #1919 for job mlpack - svn checkin test (previous build: SUCCESS)
cuphrody has quit [Remote host closed the connection]
cuphrody has joined #mlpack
oldbeardo has quit [Ping timeout: 240 seconds]
andrewmw94 has quit [Quit: andrewmw94]
andrewmw94 has joined #mlpack
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
witness___ has quit [Quit: Connection closed for inactivity]
oldbeardo has joined #mlpack
< oldbeardo>
naywhayare: there?
oldbeardo has quit [Ping timeout: 240 seconds]
govg has quit [Ping timeout: 255 seconds]
govg has joined #mlpack
< naywhayare>
andrewmw94: I didn't realize it, but I think we have less refactoring to do than I thought; both the BinarySpaceTree and CoverTree classes already hold references to the dataset the tree is built on
< naywhayare>
so really the only refactoring work we'll have to do as your R-tree implementation comes together is to adapt node.Point(i) to actually return an arma::vec or arma::sp_vec or arma::subview_col or whatever
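For illustration, a minimal sketch of what a leaf node owning its own matrix could look like, with Point(i) returning a column view; the class and member names here are hypothetical, not the actual mlpack tree API.

    #include <armadillo>
    #include <cstddef>

    // Hypothetical leaf node that stores its points in a local matrix.
    class ExampleLeafNode
    {
     public:
      // Return the i-th point held by this node as a view into the local matrix.
      arma::subview_col<double> Point(const size_t i) { return localData.col(i); }

      size_t NumPoints() const { return localData.n_cols; }

     private:
      arma::mat localData; // one small matrix per leaf instead of one big dataset
    };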
< andrewmw94>
yes
< andrewmw94>
I think I'm trying to decide whether I should keep the one large matrix for bulkloading
< naywhayare>
I can't see any possible way this can affect runtime either, which is nice
< andrewmw94>
or whether I can have the tree always store the matrices in the leaf nodes
< naywhayare>
if you're building the R-tree where each node has a small matrix of its own, is there any advantage to keeping the large matrix?
< andrewmw94>
not that I know of
< andrewmw94>
the memory locality?
< andrewmw94>
but that should be a small difference
< naywhayare>
yeah, that can be an advantage, but as you noted before it seems like the leaf nodes should each be holding matrices of their own
< andrewmw94>
I have a question about code structure though
< naywhayare>
so that seems to rule out any necessity of keeping a large matrix
< naywhayare>
yeah, go ahead
< andrewmw94>
when you insert a point, it can either be within a bound, or not. If it is, then deciding which child to place it in is simple. But if it isn't, then you need to use a heuristic to choose. You also need to use a heuristic to choose how to split nodes. Should those be in the same file, or should I keep them in separate files?
< naywhayare>
assuming these heuristics are independent, then I'd put them in separate files
< naywhayare>
but if they are related (i.e. you always use one node splitting heuristic with one child selection heuristic), then it's probably better to keep those two heuristics in one file; maybe in the same class
< andrewmw94>
no, I think they should be separate. It's just that the one for inserting a point into a child is really short (at the moment at least)
< andrewmw94>
just bound.minDistance()
< andrewmw94>
I assume that returns 0 if the point is within the bound
< andrewmw94>
but some of the more complicated trees use fancier heuristics, so it will probably be nice to have them separate. There's just a lot of different permutations you could go through combining all of these
< naywhayare>
it's just fine to put a very simple heuristic in its own class
< naywhayare>
take a look at src/mlpack/methods/kmeans/random_partition.hpp for another example of a very very simple heuristic :)
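A rough sketch, in the same spirit as random_partition.hpp, of how a very small child-selection heuristic can live in its own class; the class and method names below are made up for illustration, not the actual R-tree code.

    #include <cstddef>

    // Made-up example: pick the child whose bound is closest to the new point
    // (MinDistance() is assumed to return 0 when the point is inside the bound).
    class MinDistanceSelection
    {
     public:
      template<typename TreeType, typename VecType>
      static size_t SelectChild(const TreeType& node, const VecType& point)
      {
        size_t bestChild = 0;
        double bestDistance = node.Child(0).Bound().MinDistance(point);
        for (size_t i = 1; i < node.NumChildren(); ++i)
        {
          const double distance = node.Child(i).Bound().MinDistance(point);
          if (distance < bestDistance)
          {
            bestDistance = distance;
            bestChild = i;
          }
        }
        return bestChild;
      }
    };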
< andrewmw94>
yeah. I just wanted to make sure I wasn't going through my old "code" and adding new templates everywhere if there was a better solution
< naywhayare>
ok; thanks for checking
govg has quit [Quit: leaving]
sumedhghaisas has joined #mlpack
< sumedhghaisas>
naywhayare: how would I delete the LMF folder?? do I have to run some command for it??
< sumedhghaisas>
on svn I mean...
< sumedhghaisas>
I delete it on local and run svn del right??
< naywhayare>
svn delete lmf
< naywhayare>
then svn commit lmf
< naywhayare>
you might need to pass the --recursive flag to svn delete, I don't remember (it will tell you)
udit_s has quit [Quit: Ex-Chat]
sumedhghaisas has quit [Ping timeout: 245 seconds]
< udit_s_>
about what marcus said in that last mail - if I do commit like that, I will be committing to the main branch.
< udit_s_>
But I remember you talking about committing to some other repo
< naywhayare>
was I? I have no idea what I was talking about then. I was suggesting that you commit to the main repo, but be careful to either ensure that it still builds okay after your commit, or commit code in such a way that CMake doesn't compile it, so the build doesn't break
< udit_s_>
I read that you have a paper coming up. Does that mean you wouldn't be free to discuss the flow of data through my code with me anytime soon ?
< naywhayare>
no, I have to balance GSoC mentoring and paper writing :)
< naywhayare>
so I will make myself available
< udit_s_>
I saw in your replies that you didn't quite get the hang of a few things I was doing, so I wanted to sit down and properly go through the code, how something should be done, right to the bare bones if need be.
< naywhayare>
sure, did you want to do that now or later?
< udit_s_>
I'm still going through calculateEntropy, checking some dimensional error somewhere :), so let's say we do it either tomorrow same time, or Thursday morning for you.
< naywhayare>
ok, sounds good to me
< naywhayare>
alternately, if you think the discussion is easier through email (which allows longer responses), that's fine too
< udit_s_>
this would give you enough time to go through the code too, on your own.
< naywhayare>
ok, sounds good. and the updated code will be on github?
< udit_s_>
yeah, but I'd rather be discussing here...
< naywhayare>
sure, that's just fine
< udit_s_>
on github till I at least get down the proper structure of the files right with you...
< udit_s_>
and then commit to your repo...
< udit_s_>
For now, I haven't made changes to the mlpack_test.cpp and corresponding CMakeLists
< naywhayare>
sure, that's fine
< naywhayare>
wow, I have a tendency to write the same things over and over :)
< udit_s_>
I run my own similar setup
< naywhayare>
it's not hard to integrate new code into the existing mlpack_test, so I'll show you how and walk you through the process sometime in the future
< udit_s_>
Also (off topic now), are you a research student ? A PhD student ?
< naywhayare>
yeah, I'm a Ph.D. student
< naywhayare>
I do research on dual-tree algorithms, which is part of why mlpack has such extensive support for them (the other part of why is that all of the other original developers of mlpack did research in exactly the same area)
< udit_s_>
Oh,
< udit_s_>
and what about armadillo? It's a pretty awesome library. Were you one of the original developers there too?
< naywhayare>
no; we started using armadillo in mlpack in late 2009, and then we found we needed more functionality from it, so I started contributing patches
< naywhayare>
next thing I knew, I spent a few months writing the sparse matrix support for armadillo, and now Conrad lists me as one of the main developers
< udit_s_>
Woah.
< naywhayare>
but the original idea for a template metaprogramming based linear algebra library, as far as I know, was either entirely his idea or based on Eigen's really complex and not-as-easy implementation (if it existed back in 2008)
< naywhayare>
it's a much easier linear algebra library to work with than the GenMatrix and GenVector classes that the original mlpack developers created
< naywhayare>
instead of writing 'a = b + (c * d)' you'd have to write something like... 'la::MultMatrix(c, d, temp); la::AddMatrix(b, temp, a);'
< naywhayare>
and for more complicated expressions it would be many many lines of hard-to-read code
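For comparison, roughly the same kind of expression written with Armadillo; a small self-contained sketch with made-up matrices.

    #include <armadillo>

    int main()
    {
      arma::mat b = arma::randu<arma::mat>(5, 5);
      arma::mat c = arma::randu<arma::mat>(5, 5);
      arma::mat d = arma::randu<arma::mat>(5, 5);

      // Reads like the math; the expression templates deal with the temporaries.
      arma::mat a = b + (c * d);

      // Longer expressions stay on one readable line too.
      arma::mat e = 2.0 * a.t() * (b - d) + c;

      return 0;
    }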
< udit_s_>
yeah, having gone through the documentation of armadillo a lot of times, just looking at the methods and functions provided was insane...
< naywhayare>
there's still a lot of sparse matrix decompositions and operations that need to be implemented for armadillo, but I don't have the time and I haven't found anyone who's interested yet
< naywhayare>
so, Armadillo is mostly a wrapper library around LAPACK and BLAS (and a few other libraries); for the most part it unwraps operations to simpler operations at compile time, and then calls LAPACK and BLAS to perform them
< naywhayare>
as time has gone on, Conrad has implemented more and more complex functionality by hand instead of relying on LAPACK and BLAS, though, so there's a significant amount of functions that don't (or can't) use other libraries
< jenkins-mlpack>
Starting build #1924 for job mlpack - svn checkin test (previous build: SUCCESS)
< udit_s_>
have you guys ever tried comparing its performance to other solutions - you know, like MATLAB? or something similar?
< naywhayare>
marcus's automatic benchmarking system would actually be great for doing this, but I doubt Conrad has the time and I certainly don't
< udit_s_>
maybe sometime in the future, then...
sumedhghaisas has quit [Ping timeout: 258 seconds]
sumedhghaisas has joined #mlpack
< sumedhghaisas>
naywhayare: kay I added directory amf with svn mkdir... now how to add files with svn??
< sumedhghaisas>
*okay
udit_s_ has quit [Quit: Ex-Chat]
< naywhayare>
svn add
< sumedhghaisas>
naywhayare: Ohh then did I forget to add the directory in the last commit?? cause I did add all the files??
< sumedhghaisas>
with svn add
< naywhayare>
I have not seen a commit by you today
< naywhayare>
are you sure you actually committed it with 'svn ci'?
< sumedhghaisas>
no, not today... the lmf commit I am referring to...
< sumedhghaisas>
You mentioned in the mail some problem with LMF folder...
< naywhayare>
oh, I see what you mean now
< naywhayare>
when you do 'svn add' on a directory, it's equivalent to 'svn mkdir'
< naywhayare>
when you use 'svn copy', not only are the files from the other directory copied, but the revision history too
< naywhayare>
so 'svn copy' is a better choice for amf/lmf than 'svn mkdir' and then copying the files manually
< naywhayare>
does that make sense?
< sumedhghaisas>
ohhh... Now I get it thanks...
< sumedhghaisas>
So now I should delete the amf with svn delete and use command "svn cp nmf amf" right??
< naywhayare>
yes, that would be the right thing to do
< naywhayare>
you might have to use --force with the svn delete because you have not committed it yet, though
< sumedhghaisas>
okay... I guess I will have more doubts about this later... I will do this today only :)
< sumedhghaisas>
naywhayare: now I have created amf... Inside I need 2 folders ... I create them with svn mkdir ... what about the files of nmf present in amf?? how do I remove them??
< jenkins-mlpack>
andrewmw94: more stuff for R Trees
< naywhayare>
sumedhghaisas: you could use 'svn delete' to remove them, or 'svn mv' to rename them
< naywhayare>
I would imagine you just want to rename them, or move them into your new subdirectories
< sumedhghaisas>
yes... I will just call svn mv ....
< sumedhghaisas>
And about the main NMF code...
< naywhayare>
let's just leave it as nmf_main, but change the usage of NMF<...> to AMF<...> (and substitute in the new rules class names)
< sumedhghaisas>
I will keep only the main for NMF and it will use AMF...
< naywhayare>
yeah, exactly
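A sketch of the idea being agreed on here: the NMF executable keeps its behavior but calls into the generic AMF class. This assumes an AMF class (in the new amf/ directory) with NMF-style default rules and an Apply(V, rank, W, H) method, which is the shape being discussed rather than a finished interface.

    #include <mlpack/core.hpp>
    #include <mlpack/methods/amf/amf.hpp> // assumed new home of the factorization code

    // Run plain NMF through the generic AMF class, relying on its default
    // (NMF-style) initialization and update rules.
    void RunNmf(const arma::mat& V, const size_t rank, arma::mat& W, arma::mat& H)
    {
      mlpack::amf::AMF<> amf;
      amf.Apply(V, rank, W, H);
    }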
< sumedhghaisas>
should I delete all the other files in NMF??
< sumedhghaisas>
I don't think nmf.hpp and nmf_impl.hpp are needed anymore...
< naywhayare>
if they've been moved to the amf/ directory and you've updated all references to NMF code in the CMakeLists.txt files and in nmf_test.cpp, then yes
< sumedhghaisas>
yeah I will do that before deleting them...
< naywhayare>
you can just check that everything works afterwards by 'make mlpack_test nmf', and then running mlpack_test -t NMFTest
< naywhayare>
or I guess you could just type 'make' to build everything, but 'mlpack_test' and 'nmf' should be the only affected targets
< sumedhghaisas>
yeah I will just use make instead... It will check amf, cf and nmf...
< naywhayare>
ah, CF, that's the one I forgot
< naywhayare>
I guess you should run CFTest too
< sumedhghaisas>
make test..??
< naywhayare>
yeah, I think that builds mlpack_test and automatically runs it
< naywhayare>
but that will run all of the tests, not just CFTest and NMFTest
< naywhayare>
so it might be a bit slower and have a lot more output
< naywhayare>
it still works fine though :)
< sumedhghaisas>
I usually do that... feels good that everything works just fine :)
andrewmw94 has left #mlpack []
< sumedhghaisas>
naywhayare: There is a to_string test which uses NMF. I have just commented out that part of that test. What is this test??
< naywhayare>
it is meant to test that everything in mlpack has a ToString() method
< sumedhghaisas>
NMF didn't have a ToString() method...
< naywhayare>
not in version 1.0.8, but it does in trunk
< sumedhghaisas>
In C++ it should be operator std::string() to convert it to a string??
< naywhayare>
yes, but I don't think it will automatically cast it to a string if I call 'cout << mlpackObject'
< naywhayare>
oh holy crap, that _does_ work, doesn't it
< naywhayare>
wow
< naywhayare>
so... the way it currently works is that a bunch of template metaprogramming happens when PrefixedOutStream << mlpackObject is called to determine if mlpackObject has a ToString method
< sumedhghaisas>
yeah it does... I use it ... okay let's check again...
< naywhayare>
and if it does, then ToString() is called
< naywhayare>
wow, this is great! we can get rid of a lot of weird template metaprogramming code now!
< naywhayare>
the only concern is that random mlpack objects will get casted to a string when that's not intended
< sumedhghaisas>
Yeah I saw that code in PrefixedOutStream... That code is really weird...
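A minimal, self-contained sketch of the kind of compile-time detection being described; this is not the actual PrefixedOutStream code, just the general "does T have a ToString() method?" idiom.

    #include <iostream>
    #include <string>

    // Detect at compile time whether T has a 'std::string ToString() const' method.
    template<typename T>
    class HasToString
    {
     private:
      template<typename U, std::string (U::*)() const> struct Check { };

      template<typename U> static char Test(Check<U, &U::ToString>*);
      template<typename U> static long Test(...);

     public:
      static const bool value = (sizeof(Test<T>(0)) == sizeof(char));
    };

    class WithToString
    {
     public:
      std::string ToString() const { return "WithToString"; }
    };

    class WithoutToString { };

    int main()
    {
      std::cout << HasToString<WithToString>::value << std::endl;    // prints 1
      std::cout << HasToString<WithoutToString>::value << std::endl; // prints 0
      return 0;
    }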
< sumedhghaisas>
umm... Let's think of cases when that could happen...
< naywhayare>
so, for instance, arma::mat(std::string) is a constructor that exists
< sumedhghaisas>
I think operator std::string() is also called at the time of parameter casting...
< sumedhghaisas>
yeah I was thinking of the same..
< naywhayare>
if a user were to accidentally write arma::mat(mlpackObject), then it would compile but fail catastrophically
< sumedhghaisas>
yes... And debugging it would be very hard...
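A sketch of the failure mode being worried about; the class below is made up, but the arma::mat constructor that parses a string of values is real.

    #include <armadillo>
    #include <string>

    // Made-up mlpack-style object with an implicit conversion to std::string.
    class SomeModel
    {
     public:
      operator std::string() const { return "SomeModel [lots of parameters]"; }
    };

    int main()
    {
      // Intended use of the string constructor:
      arma::mat ok("1 2; 3 4");

      // Accidental use: this compiles because SomeModel implicitly converts to
      // std::string, and Armadillo then tries to parse that text as matrix data,
      // failing far away from the actual mistake.
      SomeModel model;
      arma::mat oops(model);

      return 0;
    }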
< naywhayare>
:(
< naywhayare>
then maybe changing over to use operator std::string() is not a great idea
< sumedhghaisas>
But then users won't have that functionality with mlpack_objects...
< sumedhghaisas>
I always wondered... can I have an operator to my own class... like 'operator test()'??
< naywhayare>
yeah, I think that should work
< naywhayare>
right now users can do PrefixedOutStream << mlpackObject
< naywhayare>
with C++11, it's easy to change this so cout << mlpackObject will work properly
< sumedhghaisas>
typecasting won't accept ToString() I guess...
< naywhayare>
and we'll have to transition to requiring C++11 by the end of the summer so that Udit's project will work (it is going to use variadic templates)
< sumedhghaisas>
ohh.. nice :) I have never used variadic templates... can you just brief me about it??
< naywhayare>
it's templates with a variable number of arguments
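A small self-contained C++11 example of a variadic template, i.e. a function template taking a variable number of arguments; nothing mlpack-specific.

    #include <iostream>

    // Base case: nothing left to print.
    void PrintAll() { std::cout << std::endl; }

    // Variadic template: accepts any number of arguments of any printable types.
    template<typename First, typename... Rest>
    void PrintAll(const First& first, const Rest&... rest)
    {
      std::cout << first << " ";
      PrintAll(rest...); // recurse on the remaining arguments
    }

    int main()
    {
      PrintAll(1, 2.5, "three", 'x');
      return 0;
    }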
< sumedhghaisas>
woww..
< sumedhghaisas>
And I also wanted to ask you about good books on C++... Now that I am quite familiar with it I want to master it... Way back you mentioned a book if I remember correctly...
< naywhayare>
it would have been Alexandrescu's "Modern C++ Design"
< naywhayare>
that doesn't have anything about C++11, though
< naywhayare>
but I'm not (yet) an expert on C++11
< sumedhghaisas>
yup... I should get time this summer to read it...
< sumedhghaisas>
I love C++ as a language... I have worked with Java but C++ just intrigues me...
< naywhayare>
I think the generic programming capabilities of C++ are great and don't exist in any other popular language, but at the same time, they can be extremely complex and hard to debug...
< sumedhghaisas>
yes... "With great power comes great headache... "... :)
< naywhayare>
that is definitely the truth :)
< sumedhghaisas>
If we compile Java code... is it as fast as C++??
< naywhayare>
that's a hard question to answer now
< sumedhghaisas>
If both codes are optimized in their respective languages...
< naywhayare>
ten years ago, the answer would have almost certainly been no -- C++ was way faster
< naywhayare>
however, the Java compiler (and JIT) have improved very significantly
< naywhayare>
so now they are more even. but I don't know any exact numbers, just rumors and hearsay from people who know Java
< sumedhghaisas>
Yes, I have been hearing the same rumors...
< sumedhghaisas>
Okay I forgot to commit ... :)
< sumedhghaisas>
what is the command for adding a msg??
< sumedhghaisas>
got it ... -m
< naywhayare>
either that, or you can leave no message and it will open an editor for you to write a message in