naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
jbc_ has joined #mlpack
jbc_ has quit [Client Quit]
jbc_ has joined #mlpack
jbc_ has quit [Quit: jbc_]
witness___ has joined #mlpack
sumedhghaisas has joined #mlpack
witness___ has quit [Quit: Connection closed for inactivity]
sumedhghaisas has quit [Ping timeout: 272 seconds]
govg has quit [Quit: leaving]
sumedhghaisas has joined #mlpack
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
sumedhghaisas has joined #mlpack
< marcus_zoq> "We require OpenMP now." ... interesting
< naywhayare> well, that's not quite true yet
< naywhayare> it's commented out for now :0
< naywhayare> * :)
< naywhayare> but I am experimenting with making the tree traversals use OpenMP
< naywhayare> so far I have some speedup
< naywhayare> I also have some OpenMP implementations of things like k-means that were done a few years ago for some class projects by other Georgia Tech students that I haven't worked in yet
< naywhayare> if you think there is a better alternative to OpenMP, then we could use that instead, but it seems like the easiest way to do shared-memory parallelism
< marcus_zoq> ah, nice, I haven't really thought about parallelism ...
< naywhayare> yeah, it's not something I often consider... thinking about serial tree algorithms is hard enough :)
< naywhayare> but I proposed to implement parallel tree traversals for my thesis, so I have to do it...
< naywhayare> I will be implementing a distributed (many-node) tree traversal, too, but I am not sure how or if that fits into mlpack
conrad_s has joined #mlpack
< conrad_s> hi ryan, it's been a long time since I used IRC
< naywhayare> conrad_s: hello! there are still some IRC users :)
andrewmw94 has joined #mlpack
< conrad_s> thought I'd check this out -- I've seen lots of svn activity on mlpack lately
< naywhayare> yeah, GSoC is coming to a close and there is a lot of activity
< conrad_s> just got your email about the differences between spop_min and spop_max. so it's no problem to use either version of the check? at first sight there seems to be a bit of a difference between one check and the other.
< naywhayare> yeah, there should not be a difference, unless I am having a mental hiccup
< naywhayare> the termination condition '(stuff) <= index_of_min_val' and '(stuff) < index_of_min_val + 1' should operate identically
< naywhayare> since index_of_min_val is an integer type I can't see a situation where the two conditionals aren't equivalent (except an overflow situation, which shouldn't happen)
< naywhayare> but it is possible I have overlooked something trivial and am about to feel stupid... :)
< conrad_s> maybe my brain isn't working either -- it's a bit late here. I'll leave the code as is, and release arma 4.400 sometime tomorrow
< naywhayare> okay, sounds good. either way, it passes the 26 test cases I wrote, which cover both loops, so I think it's alright
< naywhayare> before tomorrow I will send you a minor documentation patch detailing the sentinel values for the sparse matrices which were confusing Dirk
< naywhayare> but I don't think I have anything else on my "high priority Armadillo" list. I looked through the code you committed for P.W. Dondl's batch sparse matrix constructor, and did not see any issues
< naywhayare> the memory allocation seems to be just fine
< conrad_s> ok, that'd be great. it may help other people interfacing with SpMat. Patrick Dondl is apparently writing a wrapper for CHOLMOD
< naywhayare> though I've not written tests for it
< naywhayare> ah, great! I have not been able to find any time to undertake something like that
< naywhayare> when he finishes it I will help take a look at it, though the lag time may be a few days to a few weeks :-\
< conrad_s> no problem. I'm not sure how it's going to work with certain parts being LGPL and other parts GPL.
< conrad_s> he mentioned that one possibility is to put the wrapper as a separate package on sourceforge
< naywhayare> only the "Supernodal", "Modify" and "GPU" modules are GPLed, so I don't know if he will be using those parts
< naywhayare> but if I know Tim Davis, those are the only useful parts of the code
< conrad_s> if GPL parts are being used, I'd prefer for it to be a separate package, so we don't confuse the armadillo message of "safe license". either way, having the wrapper will be useful.
< naywhayare> I agree
< conrad_s> you probably noticed that I added (diagonal) GMM code to armadillo
< naywhayare> yes, I noticed it a while ago, but didn't spend much time looking at it
oldbeardo has joined #mlpack
< conrad_s> not meant to step on the feet of mlpack. the code has been laying around for quite a while, and a few folks at my work asked me to integrate it properly into armadillo.
< conrad_s> it was a very borderline case as to whether it should go into armadillo or not, as I don't want the library to start having its own urban sprawl
< naywhayare> yeah; I would see it as borderline too. we have a benchmarking system set up for mlpack that marcus_zoq put together, so at some point if I ever have time I may try to find a way to add Armadillo's GMM implementation to the mix of things that are tested
< naywhayare> the main issue is that that's diagonal GMMs, and we test full-covariance GMMs at the moment
< naywhayare> also the mlpack code for training diagonal GMMs involves the full-covariance EM algorithm and then a step that sets all non-diagonal entries to 0, so it's excruciatingly slow...
< naywhayare> the reason it's like that is because the abstractions we used made that the easiest way to do it. it sounds like actually the thing to do is just specialize to Armadillo's implementation in that situation
< naywhayare> I think I'll open a ticket in Trac so I don't forget...
< conrad_s> ok, so there is no real clash here then. the gmm_diag code uses OpenMP for k-means and the EM algorithm. the k-means implementation also uses robust statistics for the means. the EM algorithm uses the "log_exp_add" trick to prevent underflows/overflows. I haven't looked too closely at the GMM implementation in mlpack, so perhaps these "tricks" are already being used there.
< naywhayare> I know some of the mlpack code uses logs to prevent underflows and overflows, but I'm not sure if the GMM code does
< naywhayare> even if there was a clash, I wouldn't see it as a problem. in my opinion, more (competently written) implementations of algorithms is usually a good thing
< naywhayare> one of the things we noticed in our benchmarking adventures (which, I guess, is a pretty obvious observation in hindsight) is that the runtime of different libraries can wildly vary depending on the input dataset
< naywhayare> or, more specifically, the relative runtime of different libraries, as compared with each other
< conrad_s> yes, and the initial starting points can make a big difference in iterative algorithms like k-means.
< naywhayare> yeah; for our comparisons we try to make sure that the random elements of algorithms are removed
< conrad_s> the openmp stuff was a little bit tricky to get right, so feel free to adapt it into mlpack.
< naywhayare> yeah; I have been playing with OpenMP recently for use in the mlpack tree traversals, using the #pragma omp task construct
< naywhayare> what I've found so far is that the overhead of instantiating tasks is pretty high (at least with the standard OpenMP implementation which comes with gcc)
< conrad_s> openmp is actually pretty nice. I initially converted the code to use c++11 threads, but it turns out that it becomes a bit of a mess at link time. sometimes I wonder if gcc developers like to inhale various herbs while writing code
< conrad_s> *nice in terms of API
< naywhayare> haha, maybe they are believers in the Ballmer peak
< naywhayare> just with THC instead of alcohol...
< conrad_s> looking at things from a "big picture" point of view, and after experiencing C++ for many years, I'm starting to think that the language is unnecessarily complicated
< naywhayare> I agree, and each successive revision of the standard isn't making things any better
< naywhayare> at the same time, I have yet to find a language that lets me produce code that is as clean (for users) and fast as C++
< naywhayare> but the tradeoff is that if a user does something wrong, they get thousands of lines of errors...
< naywhayare> I decided to count the number of lines of errors I got while working on something yesterday... the issue was a misspelled function, I think. 130,000 lines of errors
< conrad_s> insane. you'd think the C++ committee would have come up with a solution to this by now. C++ does generate fast code, but productivity goes out the window as soon as low-level template metaprogramming is involved. it's C with lots of bits stuck on, but it doesn't really gel.
< conrad_s> I'm starting to look at the Rust language http://www.rust-lang.org/ which is still under development. so far their generics (templates) are rather rudimentary, but they have done a lot of other things right. compared to C++ it feels positively modern.
< naywhayare> hm, interesting. I will have to take a look sometime
< conrad_s> it's really late here -- I better head off to sleep :)
< naywhayare> :)
< naywhayare> 'night
< conrad_s> night
< naywhayare> I'll have sent you an email with a minor documentation patch by the time you wake up
conrad_s has left #mlpack []
< naywhayare> andrewmw94: I took a look through the Rectangle tree code and have a few high level comments
< naywhayare> I cleaned up a lot of stuff with typedefs, in the RTreeSplit and RStarTreeSplit classes... it makes it look a lot less ugly :)
< naywhayare> but I noticed that RTreeSplit and RTreeDescentHeuristic are, in general, always used together. are there compelling situations where one would use RTreeSplit with RStarTreeDescentHeuristic?
< naywhayare> or, really, I guess I mean, could you do that? if in general you can't, then maybe we should merge the two classes for simplicity
< naywhayare> I have vague memories of having this discussion a very long time ago, though, so I seem to have a vague feeling that the answer is that you could arbitrarily combine split types with descent types
govg has quit [Remote host closed the connection]
< naywhayare> oldbeardo: sumedh committed his changes to cf_main.cpp, so it should be pretty straightforward to add your regularized SVD implementation as an option now
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
< oldbeardo> naywhayare: okay, I'll have a look, I will certainly need your help for this, I don't have an idea of how executables work
< naywhayare> okay; maybe we can do it through email? I am going to get lunch in 15 minutes, then I have a lecture to attend, so I would be back at 1800 UTC
< naywhayare> which is probably a little late for you
< naywhayare> you can take a look at the program, though, it's pretty simple... just look at int main():
< naywhayare> it takes in the parameters, which includes the algorithm
< naywhayare> then runs CF with the given factorizer, and saves the output
< naywhayare> so I think all you will need to do is add another 'else if' statement and add the name of your factorizer, then call ComputeRecommendations appropriately
< naywhayare> or use Sumedh's "CR" convenience macro
< naywhayare> if you can also add a short blurb to the PROGRAM_INFO macro about what your regularized SVD implementation is, that would also be great
< naywhayare> the PROGRAM_INFO macro contains the text that the user sees when they type 'cf -h'
< naywhayare> I need to get Sumedh to add short descriptions of his algorithms there, too, so that a user knows what choices they have for the --algorithm parameter
< naywhayare> once you've made the changes, you can build it and test it with some dataset... something like 'cf -a "regularized_svd" -o recommendations.csv -i dataset.csv'
< oldbeardo> okay, by the way I just finished writing a code example for Reg SVD, I will commit the changes right now
< naywhayare> where "regularized_svd" is whatever name you used for the algorithm
< naywhayare> great, thanks!
< oldbeardo> naywhayare: I have a question about svn
< naywhayare> sure, go ahead
< oldbeardo> right now I have three modified files, two of which belong to QUIC-SVD containing the changes I sent to you yesterday
< oldbeardo> what if I want to commit the changes only from the third file?
< naywhayare> 'svn ci name_of_file_to_commit'
< naywhayare> you can specify multiple files like that too... 'svn ci file1 file2 file3 directory1/', etc.
< oldbeardo> okay, thanks
< naywhayare> sure, no problem
< oldbeardo> can I mention a message along with the commit?
< naywhayare> I don't know what you mean
< oldbeardo> no problem, I made the commit
< naywhayare> ah, ok
< oldbeardo> naywhayare: just curious, which class will you be attending?
govg has quit [Remote host closed the connection]
< naywhayare> I have a friend who moved to california instead of finishing his degree
< naywhayare> but now he wants to finish his degree, but he lives in california
< naywhayare> so he asked me to attend the lectures for him and record them
< naywhayare> most of it should be review (I read the Sipser book last year, so I think I'm familiar with the content), but I'm hoping maybe the instructor will talk about some interesting things I don't know :)
< oldbeardo> heh, how come this link is of an Indian institute?
< naywhayare> if not, I'm bringing interesting papers Google Scholar recommended to me to read while he lectures...
< naywhayare> I'm not sure. I think the instructor taught at Tech but has since left for IIT Hyderabad, it looks like
< naywhayare> I just found the first course website I could for that class
< oldbeardo> right
< naywhayare> anyway, I have to grab some lunch now...
< naywhayare> I will talk to you later. if you have problems with the CF modifications, feel free to send me an email and I'll respond to it once I get it, or leave messages in IRC or whatever works best for you
< oldbeardo> sure, see you tomorrow
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
sumedhghaisas has quit [Ping timeout: 272 seconds]
sumedhghaisas has joined #mlpack
govg has quit [Ping timeout: 250 seconds]
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
oldbeardo has quit [Quit: Page closed]
jbc_ has joined #mlpack
jbc_ has quit [Quit: jbc_]
jbc_ has joined #mlpack
govg_ has joined #mlpack
< marcus_zoq> naywhayare: you were faster :)
< naywhayare> marcus_zoq: :) I was waiting on some runs to finish so I figured I might as well do something useful
< marcus_zoq> Hopefully the build is fine; it looks like he builds mlpack on Windows ...
< naywhayare> uh-oh, I guess I should spend some time working with the Windows build slaves or something
< naywhayare> although Gilles Barges says he has it compiling fine under MinGW, with a few changes, and it's apparently a bit difficult to configure
< naywhayare> if you want to work with the Windows boxes I'll happily give you an account on the systems, but I would think remote desktop from Germany to Atlanta would be nightmarishly slow
< marcus_zoq> yeah I think so, also I'm not really a Windows user, but maybe it is worth a try
< naywhayare> ok; I'd need an IP to add an IP exemption from, though
< naywhayare> leaving a windows box open to rdesktop from any IP is a bad idea...
< marcus_zoq> Okay, in this case I need to tunnel the connection through another server :) maybe you can add two ip's so I can test which one has the better route?
< naywhayare> yeah, that's just fine
sumedhghaisas has quit [Ping timeout: 272 seconds]
< marcus_zoq> I guess you need an ipv4 address, at home I don't have a static ipv4 address :(
< naywhayare> I can do ipv6 too
< naywhayare> ...I think
< naywhayare> I know Windows supports it, at least...
sumedhghaisas has joined #mlpack
jenkins-mlpack has quit [Ping timeout: 246 seconds]
jenkins-mlpack has joined #mlpack
sumedhghaisas has quit [Ping timeout: 245 seconds]