verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
tham has joined #mlpack
< tham> zoq : I saw your discussions with nilay
< tham> IMHO, selective search is easier to implement and understand; OpenCV and dlib both provide implementations of selective search
< tham> If you prefer selective search, you can convert the implementations from those libraries into something suitable for mlpack
< tham> About edgeBoxes, OpenCV has never provided a complete implementation
< tham> OpenCV's StructuredEdgeDetection class shows us how to detect the edges
< tham> but it does not tell us how to train the random forest
< tham> I mean, the structured random forest
< tham> To understand the details, you need to study the paper and the source code (I am stuck on it)
< tham> For me, if you prefer to implement selective search, you can be sure that two well-known libraries, OpenCV and dlib, can serve as references
< tham> both of them are developed in C++, so converting their code into something mlpack can use is quite easy
< tham> If you prefer edgeBoxes, there are two benefits I can see
< tham> first, edgeBoxes is much faster than selective search
< tham> second, no open source C++ library has ever provided a complete version of edgeBoxes
< tham> You may be able to contribute edgeBoxes to OpenCV after this project finishes, too.
< tham> There are many details I do not understand about edgeBoxes; converting the Matlab code to C++ is not a big deal
< tham> the hard part is understanding the reasoning behind the code
< tham> Before you start to implement a random forest for mlpack (if we could treat the Hoeffding tree as a decision tree, this should be easy)
< tham> you need to find a way to convert the data from the .mat files into something Armadillo can eat
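A minimal sketch of that last step, assuming the .mat data has first been exported to CSV (e.g. with Octave's csvwrite); the filename below is a placeholder, not anything from the project:

    // Load CSV-exported .mat data into an Armadillo matrix.
    #include <mlpack/core.hpp>
    #include <iostream>

    int main()
    {
      arma::mat dataset;
      // mlpack transposes on load so that each column is one data point,
      // the layout its methods expect; 'true' makes a load failure fatal.
      mlpack::data::Load("bsds_features.csv", dataset, true);

      std::cout << "loaded " << dataset.n_cols << " points of dimension "
                << dataset.n_rows << std::endl;
      return 0;
    }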
< rcurtin> :)
< rcurtin> if the hoeffding tree needs to be refactored, I can help do that
< tham> rcurtin: that is funny :). And thanks for the help with refactoring (I do not know whether we need to do that or not)
< rcurtin> :)
govg has quit [Quit: leaving]
mentekid has joined #mlpack
mentekid has quit [Read error: Connection reset by peer]
mentekid has joined #mlpack
nilay has joined #mlpack
tham has quit [Quit: Page closed]
Mathnerd314 has quit [Ping timeout: 240 seconds]
nilay has quit [Ping timeout: 250 seconds]
mentekid has quit [Ping timeout: 265 seconds]
mentekid has joined #mlpack
Queries has joined #mlpack
< Queries> Hello!
Queries has quit [Client Quit]
mentekid has quit [Remote host closed the connection]
mentekid has joined #mlpack
govg has joined #mlpack
< mentekid> sumedhghaisas: I read yesterday, when you were chatting with Marcus, that you implemented several SVD algorithms; I wanted to ask if they are in mlpack
< rcurtin> mentekid: very unexpected results from my simulations
< rcurtin> in every single case the unique() strategy was best
< mentekid> you mean regardless of cutoff?
< rcurtin> I tested with {default parameters, (K=10, L=10)} and cutoffs {0.0, 0.001, 0.1, 0.3, 0.5, 0.7, 0.9, 0.99, 1}
< rcurtin> on covertype, phy, corel, and miniboone datasets
< rcurtin> I think the corel dataset I have might be different than the one you are using
< rcurtin> but still
< mentekid> I think it is trolling us :P
< rcurtin> usually a cutoff of 0.3 or 0.5 gave the very fastest results, but there was never a case where cutoff = 0 was faster than cutoff = 1
< rcurtin> yeah I agree
< rcurtin> I want to try on another system
< rcurtin> in this case I am using Armadillo 6 with the default Debian configuration (I think this is using ATLAS)
< rcurtin> I will paste my numbers when I am at my desk but I am on the train right now so I can't
< rcurtin> maybe it is possible that different Armadillo setups produce wildly different results
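For context, a hedged sketch of the two candidate-set strategies being compared here, as they read from the discussion (the function names and shapes are illustrative, not mlpack's actual internals): unique() sorts and deduplicates the concatenated candidate indices directly, while find() marks each referenced point in a vector sized by the reference set and then collects the nonzero positions.

    #include <armadillo>

    // candidates: reference-point indices drawn from the query's hash
    // buckets, typically containing many duplicates.
    arma::uvec UniqueStrategy(const arma::uvec& candidates)
    {
      // Sort-and-deduplicate; cost grows with the candidate count only.
      return arma::unique(candidates);
    }

    arma::uvec FindStrategy(const arma::uvec& candidates,
                            const arma::uword refSetSize)
    {
      // Mark membership over the whole reference set, then sweep it;
      // cost includes a pass over all refSetSize entries.
      arma::uvec marks(refSetSize, arma::fill::zeros);
      for (const arma::uword idx : candidates)
        marks[idx] = 1;
      return arma::find(marks == 1);
    }

Presumably the cutoff is a threshold on the candidate-set size relative to the reference set that picks between the two, which would explain why intermediate cutoff values can come out fastest.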
< mentekid> I have no idea what's happening... Were the previous results (the ones you posted here a few days ago) from the same configuration?
< rcurtin> no, they were from a different system
< rcurtin> I am going to replicate what I did last night exactly on that system today
< rcurtin> I see that the system I will be testing on today uses Armadillo with OpenBLAS
< rcurtin> that may be the big factor
< mentekid> I'm not sure which one I'm running
< mentekid> so yeah it's probably the underlying system
Mathnerd314 has joined #mlpack
< rcurtin> ldd /path/to/libarmadillo.so is how I am checking
< rcurtin> on one system it is linked to libatlas.so and the other it is linked to libopenblas.so
< mentekid> ah thanks. Mine is linked to libblas 3
< mentekid> but atlas is mentioned further down :/
< rcurtin> yeah, so it is very odd then that your results are so inconsistent with mine, which use ATLAS
< rcurtin> when I can paste properly I will get you the test scripts I used (they're very similar to yours)
< mentekid> Yeah I should probably run mine again too, maybe the inconsistencies are caused by something running in the background
< rcurtin> mentekid: http://pastebin.com/4nDUwJrG
< rcurtin> what I did was 'make mlpack_lsh', then if the threshold was X, I would 'cp bin/mlpack_lsh bin/mlpack_lsh-cX'
< rcurtin> and I also made ones that only used the find() strategy and only used the unique() strategy
< rcurtin> http://pastebin.com/TV8NXMMc is the test script; it is quite simple
< rcurtin> I'm going to do the same thing on the system with openblas now
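The pastebin contents are not reproduced in this log; as a stand-in, here is a minimal sketch of the kind of timing run being discussed, using mlpack's C++ LSHSearch API rather than the mlpack_lsh command-line binaries (the dataset filename and k are placeholders; K=10 projections and L=10 tables are the "tuned" parameters mentioned above):

    #include <chrono>
    #include <iostream>
    #include <mlpack/core.hpp>
    #include <mlpack/methods/lsh/lsh_search.hpp>

    int main()
    {
      arma::mat dataset;
      mlpack::data::Load("covertype.csv", dataset, true);

      // Build hash tables: 10 projections per table, 10 tables.
      mlpack::neighbor::LSHSearch<> lsh(dataset, 10, 10);

      arma::Mat<size_t> neighbors;
      arma::mat distances;

      // Time a monochromatic 5-nearest-neighbor search over the dataset.
      const auto start = std::chrono::steady_clock::now();
      lsh.Search(dataset, 5, neighbors, distances);
      const auto stop = std::chrono::steady_clock::now();

      std::cout << "search time: "
                << std::chrono::duration_cast<std::chrono::milliseconds>(
                       stop - start).count()
                << " ms" << std::endl;
      return 0;
    }

Averaging several such runs per cutoff binary, as suggested below, would smooth out interference from anything else running on the machine.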
< mentekid> cool I'll repeat the tests tonight as well... Maybe I should run each a few times and average them
< mentekid> Where did you get corel from btw? I have several versions of this: http://archive.ics.uci.edu/ml/datasets/Corel+Image+Features
< rcurtin> I believe that what I am using is a modified version of ColorHistogram.asc
< rcurtin> the first column (the index?) is dropped, leaving 32 columns
< rcurtin> and I only have 37749 points
< rcurtin> to be honest I should probably throw away what I have and use the full 68040 points
< mentekid> yeah, I have 32 dimensions as well, but more points. It shouldn't be different, though; it's just half the dataset instead of all of it (or something like that)
< rcurtin> that paper references the 37749x32 dataset that I have
< rcurtin> but that and random mlpack output that's posted around the internet are all I can find
< rcurtin> so I have no idea how I ended up with a smaller dataset; I think I got it from the lab group I worked for
< rcurtin> and they probably got it from the lab group my advisor worked for
< rcurtin> which was related to John Langford's lab, so probably I just have a file of unknown origin that's been passed down for generations :)
< mentekid> I had a similar story with my "mnist" dataset, which ended up being something completely unrelated to the actual MNIST handwritten digits dataset
< mentekid> at least yours has the correct dimensions :P
< rcurtin> haha
tsathoggua has joined #mlpack
tsathoggua has quit [Client Quit]
< rcurtin> mentekid: okay, I got the test script going on the system with openblas; I will let you know what results I get
Rodya has quit [Ping timeout: 260 seconds]
Rodya has joined #mlpack
< mentekid> rcurtin: I have to leave soon (they're locking the lab and I don't have keys :P), is it good for you if we talk tomorrow?
mentekid has quit [Ping timeout: 260 seconds]
< rcurtin> sure that works for me
< rcurtin> but I think that you are already gone :)
mentekid has joined #mlpack
sumedhghaisas has quit [Ping timeout: 244 seconds]
gtank has quit [Remote host closed the connection]
mentekid has quit [Ping timeout: 276 seconds]
gtank has joined #mlpack
< rcurtin> mentekid: http://pastebin.com/efRBT75z -- results for system with OpenBLAS
govg has quit [Ping timeout: 260 seconds]
mentekid has joined #mlpack
< mentekid> rcurtin: hey, I got home just now, thanks for the data
< mentekid> it seems like unique is again fastest in most cases, am I wrong?
< mentekid> and some cutoff version is faster for corel default, corel tuned, and phy default
< mentekid> and I find the cutoff at 0.05 to be the most stable; it doesn't always come first, but it's never much slower than the best