verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
lozhnikov_ has joined #mlpack
lozhnikov has quit [Ping timeout: 246 seconds]
nilay has joined #mlpack
lozhnikov_ is now known as lozhnikov
Mathnerd314 has quit [Ping timeout: 250 seconds]
sumedhghaisas has joined #mlpack
sumedhghaisas has quit [Ping timeout: 240 seconds]
< mentekid> rcurtin: my results turned out to be similar to yours. In most cases some cutoff version or unique is faster
< mentekid> My data: http://pastebin.com/pPZ441PL My processing script (outputs matlab-ready input for visualisation): http://pastebin.com/jLRDr5D6
mentekid has quit [Ping timeout: 260 seconds]
gtank_ has joined #mlpack
gtank has quit [Ping timeout: 250 seconds]
gtank_ is now known as gtank
mentekid has joined #mlpack
< zoq> nilay: Hello, if you have time take a look at: https://gist.github.com/zoq/6fdc43c959d71980ef8bf31088866da0
< zoq> nilay: I suggest we do the following basic steps:
< zoq> 1. Implement the functions to extract the features and the function to prepare the data, e.g. to split the data we use to train the structured tree.
< zoq> 2. Implement the mapping function we need to train a single structured tree.
< zoq> 3. Move on to the actual training of the structured tree by using one of the existing mlpack implementations.
< zoq> After each step we write tests to ensure the implementation does what we expect.
< zoq> nilay: Does this sound reasonable?
< nilay> zoq: Hi, I looked at the interface. I still have doubts about the details, the ones the paper doesn't spell out explicitly. Did you look fully at the Python implementation for this?
< zoq> nilay: yes, I definitely left out some functions, particularly in the StructuredTree class.
< zoq> nilay: There is no problem adding functions along the way.
< nilay> zoq: ok, but for example, what do SelfSimilarityFeatures() and RegularFeatures() do?
< zoq> nilay: Extract the features given an image and a set of locations ("Inspired by the edge detection results of Lim et al.,
< zoq> we use a similar set of color and gradient channels [...] resulting in 7228 total candidate features per patch").
< nilay> zoq: also I have a doubt about the following: once we have a 16x16 structured label, do we generate the binary vector z for each tree by randomly taking entries from it? And how do we store the edge map information at the leaves? Then we would have to map back from Z to Y?
< zoq> nilay: Lim et al. divides the features into self-similarity and regular features.
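A minimal numpy sketch of the two feature types being discussed. This is illustrative only: the real pipeline first computes a set of color and gradient channels, while here the raw patch stands in for those channel responses, and the cell size is an assumption, not the paper's exact constant.

```python
import numpy as np

def regular_features(patch):
    # "Regular" features: one candidate feature per pixel per channel,
    # i.e. direct lookups into the channel responses of the patch.
    return patch.reshape(-1)

def self_similarity_features(patch, cell=4):
    # Self-similarity features: pairwise differences between the mean
    # responses of small cells inside the patch, per channel.
    h, w, c = patch.shape
    means = patch.reshape(h // cell, cell, w // cell, cell, c).mean(axis=(1, 3))
    flat = means.reshape(-1, c)                 # (n_cells, channels)
    i, j = np.triu_indices(flat.shape[0], k=1)  # all unordered cell pairs
    return (flat[i] - flat[j]).reshape(-1)

patch = np.random.rand(16, 16, 3)  # stand-in for a 16x16 patch of 3 channels
ftrs = np.concatenate([regular_features(patch), self_similarity_features(patch)])
```

With more channels per patch (and a larger feature window) the candidate count grows toward the 7228 figure quoted above.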
< zoq> nilay: hm, let me take a look
< zoq> nilay: So, as I see it we don't have to map back from Z to Y, we can just use Z as edge map, and that is basically all we need.
< zoq> nilay: and yes, we train each tree with random samples
< nilay> zoq: so from a 16x16 patch I take 16 pixels at random and find z between them, and this I do for each tree separately, for this patch only.
< nilay> zoq: and then similarly for the next 16x16 structured label..
< zoq> nilay: Almost, you take some number of random locations (x, y) from the image, extract a 16x16 patch at each location, and for that patch you extract the features.
< zoq> nilay: ah, you are talking about the labels right?
< nilay> zoq: yes the labels.
< zoq> nilay: you are right
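The label mapping being discussed (a 16x16 structured segmentation mask y mapped to a binary vector z) could be sketched like this; the pixel count and the sampling scheme are illustrative, following the rough description in the conversation rather than the paper's exact constants:

```python
import numpy as np

def label_to_z(seg_patch, n_pixels=16, rng=None):
    # seg_patch: (16, 16) array of segment ids (the structured label y).
    # Sample a handful of pixels and record, for every pair of sampled
    # pixels, whether they lie in the same segment; that binary vector is z.
    rng = np.random.default_rng() if rng is None else rng
    flat = seg_patch.reshape(-1)
    idx = rng.choice(flat.size, size=n_pixels, replace=False)
    i, j = np.triu_indices(n_pixels, k=1)
    return (flat[idx[i]] == flat[idx[j]]).astype(np.uint8)

z = label_to_z(np.random.randint(0, 4, size=(16, 16)))  # 16 pixels -> 120 pairs
```

Because each tree draws its own random pixel sample, the same structured label y yields a different z per tree, which is what makes the trees of the forest decorrelated.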
< nilay> zoq: so now we have this intermediate mapping z, which are now our input data points. and the features we find from some other method (Lim et al.).. how can this be possible?
< nilay> because normally I have an n-dimensional data point, and I have n features (corresponding to each dimension) for that data
< nilay> zoq: the whole point of converting structured labels Y to intermediate mapping Z is, so that we can use points in Z as the input data points to our random forest.
< zoq> nilay: yes, right, I'm not sure I get your point.
< zoq> At the end, you are going to train a single tree with some input features and the corresponding labels.
< nilay> zoq: I am saying, take a point z from Z. Now the dimension of z should be the number of features we have.
< nilay> is this wrong?
< zoq> nilay: you are right
< nilay> zoq: so, I have my data points in a 5-dimensional space Z, and my features are calculated by Lim et al. as some 7228 total candidate features per patch.. how is this possible?
< nilay> zoq: because we calculate z as a binary vector separately. and we calculate our features separately. this does not make sense.
< zoq> nilay: You always have to use the same location for the feature extraction as for the binary edge map. You can't just take a random sample of the feature space and a random edge map as label.
< zoq> nilay: That's what the prepare_data function does.
< zoq> nilay: ftrs and lbls are samples from the same location
< zoq> nilay: I'm not sure we have to replicate this pos_loc and neg_loc block; it should be sufficient to use a uniform distribution.
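The pairing described above could be sketched roughly like this, assuming a single-channel image and a uniform distribution over patch centers. The raw pixel values and the horizontal-neighbor edge mask here are stand-ins for the real feature extraction and label mapping; names and constants are illustrative, not the actual prepare_data implementation:

```python
import numpy as np

def prepare_data(image, seg, n_samples=4, patch=16, rng=None):
    # Sample patch centers uniformly; at each location extract BOTH the
    # feature vector and the edge-map label, so every (ftrs, lbls) pair
    # comes from the same position in the image.
    rng = np.random.default_rng() if rng is None else rng
    h, w = seg.shape
    ys = rng.integers(0, h - patch, n_samples)
    xs = rng.integers(0, w - patch, n_samples)
    ftrs, lbls = [], []
    for y, x in zip(ys, xs):
        img_patch = image[y:y + patch, x:x + patch]
        seg_patch = seg[y:y + patch, x:x + patch]
        ftrs.append(img_patch.reshape(-1))  # placeholder for real features
        # placeholder edge map: a pixel is "edge" where the segment id
        # changes relative to its horizontal neighbor
        lbls.append((seg_patch != np.roll(seg_patch, 1, axis=1)).reshape(-1))
    return np.array(ftrs), np.array(lbls)

image = np.random.rand(32, 32)
seg = np.random.randint(0, 3, size=(32, 32))
ftrs, lbls = prepare_data(image, seg)
```

The key property is that row k of ftrs and row k of lbls always come from the same sampled location, which resolves the mismatch raised earlier in the conversation.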
nilay has quit [Ping timeout: 250 seconds]
nilay has joined #mlpack
< nilay> zoq: I think I understand now.
nilay has quit [Ping timeout: 250 seconds]
< zoq> nilay: So I think it would be a good idea to start with the feature computation: extract 7228 features given an image and a set of locations.
nilay has joined #mlpack
nilay has quit [Ping timeout: 250 seconds]
nilay has joined #mlpack
< nilay> zoq: ok
nilay has quit [Ping timeout: 250 seconds]
Mathnerd314 has joined #mlpack
< mentekid> What would be the process for compiling mlpack with a different compiler? For example I have g++ but now I'm installing the Intel C++ compiler student edition - how can I specify which compiler make should use (is there such an option?)
< rcurtin> cmake -DCMAKE_CXX_COMPILER=/path/to/icpc
< rcurtin> I think that's all that's necessary
< rcurtin> the CMake output will tell you if it's been specified correctly, since it prints the compiler that is being used
< rcurtin> mentekid: I was looking through your simulation results; one of the things that is confusing to me is that there often doesn't appear to be a big difference between the unique and find strategies
< rcurtin> like for miniboone tuned, find takes 6.63s whereas unique takes 6.58s
< rcurtin> but on my system with openblas find takes 16.86s and unique takes 3.89s
< rcurtin> is it possible that we are using different versions of armadillo or openblas or something, and this is responsible for the discrepancy?
< rcurtin> I am using Armadillo 5.200.1 with openblas 0.2.14-1
tsathoggua has joined #mlpack
tsathoggua has quit [Client Quit]
< mentekid> rcurtin: sorry for the late reply, just saw this
< mentekid> let me see what versions I am using
< mentekid> I have version 6 of armadillo, not sure how I can find the minor version numbers
< mentekid> and blas 3
< mentekid> ah, armadillo 6.500.5
< mentekid> I don't understand how it could show such big differences though; the behavior is significantly different on our systems
< mentekid> is it possible we're running different datasets?
< mentekid> I get all of mine from http://archive.ics.uci.edu/ml/index.html
< rcurtin> miniboone is about 130k points in 50 dimensions, right?
< rcurtin> that's the set I have
< mentekid> I have 49 dimensions, maybe I erased one thinking it was an index
< rcurtin> on the other system I ran benchmarks on, I used Armadillo 6, but no big runtime difference
< mentekid> but yeah 130k points
< rcurtin> one dimension different shouldn't make such a huge difference
< rcurtin> so I guess I'm pretty confused why there is so little difference between the find and unique strategies on your system
< mentekid> yeah I found that weird too
< mentekid> I'm not sure how to find out which blas version I am using
< mentekid> I am looking for openblas but can only find cblas
< mentekid> I have to go for a while, I'll log back in later
mentekid has quit [Ping timeout: 260 seconds]
< rcurtin> okay; you can find out what you are using with ldd /usr/lib/libarmadillo.so (or wherever libarmadillo.so is) and then figure out what the blas and lapack links are
mentekid has joined #mlpack
< mentekid> Ah I can connect from my phone, so no problem :)
< mentekid> So yeah I don't understand why there's such a big difference
< mentekid> to be honest I didn't see it till you mentioned it though
< mentekid> Is your binary compiled with debugging/profiling on? I think mine had debugging off (I'll make sure). Could that be the difference?
< rcurtin> yeah, I compiled with -DDEBUG=OFF -DPROFILE=OFF
mentekid has quit [Quit: Bye]
mentekid has joined #mlpack
mentekid_ has joined #mlpack
< mentekid_> hmm, let me recompile with that and run it, but I doubt that is it
< mentekid_> ah stupid phone stole my nickname
mentekid has quit [Read error: Connection reset by peer]
arunkumarms has joined #mlpack
arunkumarms has quit [Quit: rcirc on GNU Emacs 24.5.1]
mentekid has joined #mlpack
mentekid has quit [Quit: Bye]
mentekid has joined #mlpack
mentekid has quit [Read error: Connection reset by peer]