verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
tham has joined #mlpack
< tham> zoq: I saw your discussions with nilay
< tham> IMHO, selective search is easier to implement and understand; OpenCV and dlib both provide implementations of selective search
< tham> If you prefer selective search, you can convert the implementation from those libraries into something suitable for mlpack
< tham> As for edgeBoxes, OpenCV has never provided a complete implementation
< tham> The StructuredEdgeDetection class of OpenCV shows us how to detect the edges
< tham> but it does not tell us how to train the random forest
< tham> I mean, the structured random forest
< tham> To understand the details, you need to study the paper and the source code (I am stuck on it)
< tham> For me, if you prefer to implement selective search, you can be sure that two well-known libraries, OpenCV and dlib, could serve as references
< tham> both of them are developed in C++, so converting their code into something mlpack could use is quite easy
< tham> If you prefer edgeBoxes, there are two benefits I can see
< tham> first, edgeBoxes is much faster than selective search
< tham> second, no open-source C++ library has ever provided a complete version of edgeBoxes
< tham> You may be able to contribute edgeBoxes to OpenCV after this project is finished, too.
< tham> There are many details I do not understand about edgeBoxes; it is not a big deal to convert the MATLAB code to C++
< tham> the hard part is understanding the reasoning behind the code
< tham> Before you start to implement random forests for mlpack (if we can treat the Hoeffding tree as a decision tree, this should be easy)
< tham> you need to find a way to convert the data in the .mat file into something Armadillo can eat
< rcurtin> if the Hoeffding tree needs to be refactored, I can help do that
< tham> rcurtin: that's funny :). And thanks for the help with refactoring (I do not know whether we need to do that or not)
< rcurtin> :)
govg has quit [Quit: leaving]
mentekid has joined #mlpack
mentekid has quit [Read error: Connection reset by peer]
mentekid has joined #mlpack
nilay has joined #mlpack
tham has quit [Quit: Page closed]
Mathnerd314 has quit [Ping timeout: 240 seconds]
nilay has quit [Ping timeout: 250 seconds]
mentekid has quit [Ping timeout: 265 seconds]
mentekid has joined #mlpack
Queries has joined #mlpack
< Queries> Hello!
Queries has quit [Client Quit]
mentekid has quit [Remote host closed the connection]
mentekid has joined #mlpack
govg has joined #mlpack
< mentekid> sumedghaisas: I read yesterday, when you were chatting with marcus, that you implemented several SVD algorithms; I wanted to ask if they are in mlpack
< rcurtin> mentekid: very unexpected results from my simulations
< rcurtin> in every single case the unique() strategy was best
< mentekid> you mean regardless of cutoff?
< rcurtin> I tested with { default parameters, (K=10, L=10) } and cutoff { 0.0 0.001 0.1 0.3 0.5 0.7 0.9 0.99 1 }
< rcurtin> on the covertype, phy, corel, and miniboone datasets
< rcurtin> I think the corel dataset I have might be different than the one you are using
< rcurtin> but still
< mentekid> I think it is trolling us :P
< rcurtin> usually cutoff 0.3 or 0.5 gave the very fastest results, but there was never a case where cutoff = 0 was faster than cutoff = 1
< rcurtin> yeah, I agree
< rcurtin> I want to try on another system
< rcurtin> in this case I am using Armadillo 6 with the default Debian configuration (I think this is using ATLAS)
< rcurtin> I will paste my numbers when I am at my desk, but I am on the train right now so I can't
< rcurtin> maybe it is possible that different Armadillo setups produce wildly different results
< mentekid> I have no idea what's happening... Were the previous results (the ones you posted here a few days ago) from the same configuration?
< rcurtin> no, they were from a different system
< rcurtin> I am going to replicate what I did last night exactly on that system today
< rcurtin> I see that the system I will be testing on today uses Armadillo with OpenBLAS
< rcurtin> that may be the big factor
< mentekid> I'm not sure which one I'm running
< mentekid> so yeah, it's probably the underlying system
Mathnerd314 has joined #mlpack
< rcurtin> ldd /path/to/libarmadillo.so is how I am checking
< rcurtin> on one system it is linked to libatlas.so, and on the other it is linked to libopenblas.so
< mentekid> ah, thanks. Mine is linked to libblas 3
< mentekid> but atlas is mentioned further down :/
< rcurtin> yeah, so it is very odd then that your results are so inconsistent with mine that use ATLAS
< rcurtin> when I can paste properly I will get you the test scripts I used (very similar to yours)
< mentekid> yeah, I should probably run mine again too; maybe the inconsistencies are caused by something running in the background
< rcurtin> I believe that what I am using is a modified version of ColorHistogram.asc
< rcurtin> the first column (the index?) is dropped, leaving 32 columns
< rcurtin> and I only have 37749 points
< rcurtin> to be honest I should probably throw away what I have and use the full 68040 points
< mentekid> yeah, I have 32 dimensions as well, but more points. It shouldn't make a difference though; it's just half the dataset instead of all of it (or something like that)
< rcurtin> that paper references the 37749x32 dataset that I have
< rcurtin> but that and random mlpack output posted around the internet are all I can find
< rcurtin> so I have no idea how I ended up with a smaller dataset; I think I got it from the lab group I worked for
< rcurtin> and they probably got it from the lab group my advisor worked for
< rcurtin> which was related to John Langford's lab, so probably I just have a file of unknown origin that's been passed down for generations :)
< mentekid> I had a similar story with my "mnist" dataset, which ended up being something completely unrelated to the actual MNIST handwritten digits dataset
< mentekid> at least yours has the correct dimensions :P
< rcurtin> haha
tsathoggua has joined #mlpack
tsathoggua has quit [Client Quit]
< rcurtin> mentekid: okay, I got the test script going on the system with OpenBLAS; I will let you know what results I get
Rodya has quit [Ping timeout: 260 seconds]
Rodya has joined #mlpack
< mentekid> rcurtin: I have to leave soon (they're locking the lab and I don't have keys :P), is it good for you if we talk tomorrow?
mentekid has quit [Ping timeout: 260 seconds]
< rcurtin> sure, that works for me
< rcurtin> but I think that you are already gone :)
mentekid has joined #mlpack
sumedhghaisas has quit [Ping timeout: 244 seconds]
gtank has quit [Remote host closed the connection]