#mlpack on 2016-05-17 — irc logs at libera.irclog.whitequark.org

2015-01-15 23:05 verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/

00:39 govg has quit [Ping timeout: 260 seconds]

00:40 govg has joined #mlpack

00:43 tsathoggua has joined #mlpack

00:43 tsathoggua has quit [Client Quit]

01:04 govg has quit [Ping timeout: 252 seconds]

01:06 govg has joined #mlpack

01:14 govg has quit [Ping timeout: 244 seconds]

01:15 govg has joined #mlpack

02:09 keonkim has quit [Ping timeout: 250 seconds]

02:11 keonkim has joined #mlpack

03:24 govg has quit [Quit: leaving]

03:24 govg has joined #mlpack

03:26 Mathnerd314 has quit [Ping timeout: 260 seconds]

04:40 bang has joined #mlpack

04:41 bang is now known as Guest54688

05:36 Guest54688 has quit [Quit: Page closed]

06:35 govg has quit [Ping timeout: 260 seconds]

06:37 mentekid has quit [Remote host closed the connection]

06:45 mentekid has joined #mlpack

06:53 mentekid has quit [Ping timeout: 276 seconds]

07:34 mentekid has joined #mlpack

08:22 < tham> zoq : Hi, I have some questions want to ask

08:23 < tham> About autopilot, how do you detect obstacle?

08:24 < tham> By radar?computer vision?both?or other tech?

08:43 nilay has joined #mlpack

08:58 nilay has quit [Ping timeout: 250 seconds]

10:38 < zoq> tham: That depends, on the algorithm. You can use the laser scanner at the bottom, the laser scanner on the top or cameras.

10:39 < zoq> tham: and of course combinations

10:45 < tham> zoq : Thanks

10:48 tham has quit [Quit: Page closed]

11:21 nilay has joined #mlpack

11:37 < zoq> nilay: Hello, how are things going? Have you thought about the initial interface? If you like I can also propose something and we could go from there.

11:39 < nilay> zoq: Hi, i haven't thought of initial interface uptill now. I was thinking to implement random forest first.

11:42 < zoq> nilay: Sounds like a good start, I think it's a good idea to discuss the initial interface of the random forest before we start coding.

11:43 < nilay> ok

11:44 < nilay> zoq: i do not get how only decision-stump can be used for random forest.

11:44 < zoq> nilay: So, if you like I can propose something or you could do that if you like.

11:44 < zoq> nilay: I think, it would be clear how to use them in the interface.

11:45 < nilay> zoq: ok then.

11:48 < zoq> nilay: So, if you like I can come up with something that we could use as discussion basis.

11:48 < nilay> zoq: that would be good

11:50 < zoq> nilay: okay, good. I'll see if can write something down at the end of the day, and we can probably talk about it tomorrow, if you have time?

11:51 < nilay> zoq: i had a question, when starting to code, if I make the files(random_forest.cpp, random_forest.hpp... etc) in the methods directory then how do i only execute them?

11:51 < nilay> zoq: end of the day by utc?

11:52 < zoq> nilay: Does the current time work for you?

11:52 < nilay> yes

11:53 < zoq> nilay: okay, good. I think the best idea is to use a unit test to do that. Note, if you build mlpack you only build changes.

11:53 < nilay> zoq: what is a unit test?

11:54 < nilay> do you mean i just do ../cmake and it'll take care of everything.

11:54 < zoq> nilay: Also you have to write a CMakefile to build the code. Take a look at https://github.com/mlpack/mlpack/tree/master/src/mlpack/methods/decision_stump especially at the CMakeLists.txt file.

11:55 < zoq> nilay: You can basically change the files accoring to your project.

11:56 < nilay> zoq: i take a look at it. so i just write the cmake makefile and the other files and do ../cmake ?

11:56 < zoq> nilay: yes

11:57 < zoq> nilay: About how to test your code, I'll go and send you an mail in a couple of hours. I have to go to a meeting now.

11:57 < nilay> zoq: ok

12:01 nilay has quit [Ping timeout: 250 seconds]

12:04 nilay has joined #mlpack

13:40 mentekid has quit [Ping timeout: 276 seconds]

13:50 nilay has quit [Ping timeout: 250 seconds]

13:58 nilay has joined #mlpack

14:01 mentekid has joined #mlpack

14:09 Mathnerd314 has joined #mlpack

15:08 < zoq> nilay: I just sent you some instructions how to create the project and how to test and run the code. Let me know if anything isn't clear.

15:12 < nilay> zoq: ok, i will try them. Can we discuss the interface?

15:13 < nilay> zoq: we need to use opencv for various image functions, we will just not use the trained model opencv provides, right?

15:13 < zoq> nilay: great, can we discuss the interface tomorrow?

15:13 < nilay> zoq: okay sure.

15:14 < nilay> zoq: can you answer my opencv query..?

15:20 < zoq> nilay: I think, we do not need any special OpenCV image function for the project, e.g. we don't need openCV to compute the image gradient, we can do it ourself in a couple of lines. Since OpenCV comes with this huge amount of dependencies, it doesn't make much sense to me, to install OpenCV just to use I guess two functions.

15:21 < nilay> zoq: if we don't use opencv, how do we read image?

15:22 < zoq> nilay: e.g. by using arma::load(...)

15:23 < nilay> zoq: Inspired by Lim et al. [31], we use a similar set of color and gradient channels (originally developed for fast pedes- trian detection [12]). We compute three color channels in CIE-LUV color space along with normalized gradient mag- nitude at two scales (original and half resolution). Addition- ally, we split each gradient magnitude channel into four channels based on orientation. And what about this?

15:26 < zoq> nilay: Take a look at the rgb2luv function in https://github.com/ArtanisCV/StructuredForests/blob/master/utils.p .., no need to use opencv.

15:26 < zoq> nilay: The same applies for the gradient function.

15:26 < nilay> zoq: ok

15:27 < zoq> nilay: armadillo comes with an histogram function so we could use that

15:27 < zoq> nilay: And also to do convolution with a triangle filter.

15:30 < nilay> zoq: ok, i get it now. many optimizations can be done for random forests as seen here... ( https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm ) do we want to do them?

15:30 < nilay> zoq: or we just build the random forest as given in the paper. or do you think we discuss this tomorrow?

15:32 < zoq> nilay: I would go with the random forest as described in the paper. And do optimizations afterwards.

15:32 < nilay> zoq: ok, and how do we use decision stump. Tree will not grow by more than 1 level, if decision stump is used.

15:34 < zoq> nilay: Right, we have to modify the code, or use Cloud's code that is already partially modified.

15:36 < zoq> nilay: So read images and labels -> extract features -> train structured trees -> merge ensemble -> done

15:36 < nilay> zoq: do we assume the labels are provided to us?

15:38 < zoq> nilay: Yes, we can start with a subset of the BSDS500 dataset (http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/resources.html) to test the code.

15:39 < nilay> zoq: ok, and is there a armadillo tutorial i can look at?, i cannot seem to find using simple google search.

15:40 < zoq> nilay: I don't think so, but the docs (http://arma.sourceforge.net/docs.html) are pretty good.

15:41 mentekid has quit [Ping timeout: 276 seconds]

15:42 < nilay> zoq: ok, that's all my doubts for now :)

15:42 < nilay> thanks.

15:45 < zoq> nilay: Sure no problem, once we start coding all this becomes more clear :)

15:47 < zoq> I really like Andrew's blog post: http://mlpack.org/gsocblog/andrew-week-6.html ... And I guess, at some point you realize the same.

15:57 < nilay> zoq: yes, I hope so. Right now I look most of the things up too. But I guess that is the result of using so many different languages.

16:09 < nilay> zoq: The paper(FAST EDGE DETECTION USING STRUCTURED FORESTS) talks about the following (Section 4, page 5, input features): Our learning approach predicts a structured 16 x 16 segmentation mask from a larger 32 x 32 image patch. They augment the image patch to 32x32x3. But never really say how to find the structured 16x16 labels.

16:16 < zoq> nilay: yeah, the paper left out some details. In that case the dataset already contains the labels. So each pixel in the input image (or a 32 x 32 patch) has a coressponding segmentation label that we use as label.

16:22 < nilay> zoq: why is the segmentation label 16x16 and image patch 32x32

16:23 wasiq has joined #mlpack

16:23 < nilay> zoq: or are there 4 such labels to cover the entire area

16:33 < zoq> nilay: You could also use the entire area, but if I remember correctly they use 16x16 for performance reasons.

16:37 mentekid has joined #mlpack

16:40 < nilay> zoq: so we convert 32x32 segmentation label to 16x16, by max voting (or some other metric) in each 2x2 patch

16:41 < zoq> nilay: yes

16:41 < nilay> zoq: ok

16:46 tsathoggua has joined #mlpack

16:47 tsathoggua has quit [Client Quit]

17:16 < rcurtin> mentekid: I finished my simulations on the box with OpenBLAS, did I get you those results?

17:17 < rcurtin> ah right, I see that I did now... I agree, it seems like cutoff 0.05 or so seems to be a good choice

17:17 < rcurtin> but I'm a bit concerned because your system showed such different patterns

17:19 < mentekid> rcurtin: hey so I haven't had the time to run it properly on my system yet

17:20 < mentekid> I will run it today though so we can take a better look

17:21 < mentekid> I hope my results will agree with yours so we can decide for one version of the code and put a lid on this :)

17:21 < mentekid> Do you have time now or later to talk about the project? I have a few questions

17:42 < rcurtin> yeah, I have time now

17:43 < rcurtin> sorry for the slow response on that :)

17:44 < mentekid> no problem :)

17:44 < mentekid> So first of all regarding the blog, I think I would prefer posting something on the mlpack website. How do I do that?

17:46 < rcurtin> sure, so let's make sure that is all still working...

17:47 < rcurtin> okay, so the blog itself is still up, but I don't see any link to it from mlpack.org

17:48 < rcurtin> let me update the website... I guess I could add a link in "learn about mlpack" and under the "how can I join the mlpack community" page

17:49 < mentekid> if it's too much hassle I could always just post in the list

17:50 < mentekid> I mean if I'm the only student that prefers the blog, there's no point in making everyone checking in every week instead of simply getting an email

17:53 < rcurtin> nah it's no problem---I have been wanting to use the blog more anyway

17:54 < rcurtin> for instance to write up a post "here is how I did <task> in mlpack", those can be helpful

17:54 < rcurtin> and I suspect you will not be the only one who wants to provide updates as a blog post; in 2014 that was the preference of all the students

17:54 < zoq> yeah, I think some of the other also like to use the blog ... we just have to make sure it works as it did the last time.

17:57 < rcurtin> okay, I updated the webpage with links

17:57 < mentekid> cool then :)

17:57 < rcurtin> the blog itself is found at https://github.com/zoq/blog/

17:57 < rcurtin> and I *think* that you make a post just by writing markdown in content/blog/

17:58 < rcurtin> maybe it is more complex than that... I am about to find out :)

17:59 < zoq> That's right just create a single markdown file and push the file. I can send everyone an mail with some instructions and invite everyone to the repo.

18:00 < zoq> A markdown file with some metadata at the top: https://raw.githubusercontent.com/zoq/blog/master/content/blog/AnandWeekTen.md

18:01 < mentekid> Cool that

18:01 < mentekid> (sorry pressed enter)

18:01 < mentekid> That's practical, so I just push to the repo and then it appears on the blog.

18:01 < zoq> mentekid: right

18:02 < mentekid> another question is regarding my timeline and milestones

18:02 < mentekid> I'm not sure but I think some of the deliverables can be moved a bit sooner

18:03 < mentekid> for example, I've allocated almost a week for proposing changes and getting feedback, but I believe I can do that by Sunday so I can get right to the interesting part sooner

18:04 < mentekid> a) should I revise the milestones and resend it somewhere b) do you have any feedback regarding the milestones I've set

18:06 < mentekid> (I still can't find the blog link by the way)

18:08 < rcurtin> try ctrl+r on the mlpack website

18:08 < rcurtin> the blog itself is at http://www.mlpack.org/gsocblog/

18:08 < rcurtin> let me look at your proposal once I finish this first blog post, just a moment...

18:12 < rcurtin> I thought this post would only take a minute to write... :)

18:19 sumedhghaisas has joined #mlpack

18:24 < rcurtin> zoq: I think the github webhook for the blog repo needs to be updated to point to big.mlpack.org:7780

18:26 < zoq> rcurtin: okay

18:30 < zoq> rcurtin: nice blog post :)

18:32 < rcurtin> okay, I think I got that looking decent

18:32 < rcurtin> I want to do some more CSS work with the blog site, like maybe to make the fonts the same, but maybe I will get to that later

18:32 < rcurtin> mentekid: okay, let me look at your proposal timeline now, sorry that took so long

18:33 < mentekid> it's ok, I'm setting up the timing tests now too no hurry

18:35 < rcurtin> I see what you mean about the milestones and timeline, I think you are already ahead of schedule

18:35 < rcurtin> also what latex package did you use to make that timeline?

18:36 < mentekid> it was a ripoff from stackoverflow, let me find the thread

18:36 < rcurtin> I think that there's no need to change the timeline unless you are falling far behind; 1.5 weeks for the C++ implementation might be a bit short, but LSH is a much simpler algorithm than, i.e., nearest neighbor search with trees

18:36 < mentekid> here: http://tex.stackexchange.com/questions/196794/how-can-you-create-a-vertical-timeline

18:37 < rcurtin> ah, okay, it's just a pretty tabular environment; nice!

18:37 < rcurtin> yeah, I am not sure I have too much feedback on the timeline... it looks good to me

18:37 < rcurtin> if the reality deviates from the plan (like if you are running early) that's not an issue at all

18:38 < rcurtin> and if you're running behind, also not an issue, you have a bunch of "blue sky" time allocated and we can use that if necessary

18:38 < mentekid> cool then. Yeah my concern was with how closely I will follow it, right now I have no idea if I'll go far behind or too fast...

18:39 < mentekid> But I've already dived into the code so I more or less know what I want to do, at least regarding the multiprobe part

18:39 < rcurtin> it's always hard to know; my personal prediction is, you will be ahead of schedule for most of the implementation, but the testing will probably go over schedule somewhat

18:39 < rcurtin> I have no idea if I will be right with that prediction though :)

18:40 < rcurtin> also, some things came today!

18:40 < mentekid> no that sounds about right I think

18:40 < rcurtin> oh... this is not the right window. oops. but I did get some packages and it was nice :)

18:41 < mentekid> by the way, I made some changes to lsh for my thesis, so it would allow me to change the projection tables (there wasn't a way to change them before)

18:42 < rcurtin> just like an accessor for the projections matrix?

18:42 < rcurtin> if you want to submit a PR for that, feel free, that could be useful to other people too

18:43 < mentekid> I started making it like that but then I saw I would end up re-implementing half a function

18:43 < mentekid> so I just added a default argument to LSHSearch.Train()

18:43 < mentekid> by default it's an empty vector, but you can change it to any vector you want

18:44 < rcurtin> why a vector and not a matrix? I thought you would want to specify all the projections, not just one

18:44 < mentekid> no I mean a std::vector of arma::mat objects

18:45 < mentekid> that's how the LSHSearch object stores the projection tables, as a vector of matrices

18:45 < rcurtin> oh, right, I misread it, I thought it was just one projection table, but yeah, it is many

18:45 < rcurtin> actually, I am not sure what I was thinking, but it was incorrect :)

18:46 < rcurtin> I guess in Train() then we need to check and make sure that if the user passed anything in the std::vector, that it is the right number of tables

18:46 < rcurtin> and throw a std::invalid_argument otherwise

18:46 < mentekid> yeah I haven't done that because it was meant for personal use so I just did a quick and dirty swap, but I should anyway

18:46 < mentekid> my purpose was to change the projections because I've found a way to make more reliable projections using PCA, but I think it could also be helpful in the testing process because now we can simply run some small examples by hand and see what happens

18:47 < rcurtin> so actually that is a bit related to the paper I am about to submit...

18:47 < rcurtin> there is this hashing algorithm for furthest neighbor search (a more niche problem)

18:47 < rcurtin> and it chooses projections randomly

18:47 < rcurtin> but when you instead choose your projections based on the data (like you might with PCA), you get an algorithm which empirically performs way better

18:47 < rcurtin> (fewer projections needed for the same performance)

18:48 < mentekid> yeah it's almost the same thing with LSH

18:48 < mentekid> if you replace just a few projections with PCs you get better results

18:50 < mentekid> is the furthest neighbor done for machine learning or other purposes by the way?

18:50 < mentekid> I've only seen it in stuff like fluid dynamics I think

18:52 < rcurtin> oh, there are applications in fluid dynamics? that is good to know

18:53 < rcurtin> it's used for a couple of embedding algorithms

18:53 < rcurtin> to be perfectly honest I didn't think it had that interesting applications, but they were specifically asking for algorithms in the CFP for the conference

18:53 < mentekid> I think they use it in Fast Multipole Method which is the most headache-inducing algorithm I've read :P

18:53 < rcurtin> ah, okay

18:53 < rcurtin> FMM... that is where dual-tree algorithms come from :)

18:53 < mentekid> which is used in fluid dynamics and electromagnetics and stuff

18:55 < mentekid> right, there you have a target and a source tree right?

18:55 < rcurtin> I can't remember a part of the FMM where furthest neighbor is used, but I didn't study it too in-depth, so maybe I should revisit it

18:56 < mentekid> I can't say I'm really familiar, but I remember the part where you find groups of points that don't "intersect"

18:56 < mentekid> I only read the algorithm once and didn't like it

18:57 < mentekid> but I watched somebody present his thesis related to it, that's where I got the impression they used fnn

18:57 < mentekid> I might be wrong though

18:58 < rcurtin> I'll read through it later and let you know... I need a better motivation for the algorithm, so I am definitely looking for applications :)

19:03 < mentekid> nice :) where are you submitting?

19:03 nilay has quit [Ping timeout: 250 seconds]

19:12 < rcurtin> SISAP ("similarity search and applications")

19:12 < rcurtin> http://www.sisap.org/2016/

19:22 sumedhghaisas has quit [Ping timeout: 276 seconds]

19:27 < mentekid> oh Japan, very nice! Good luck :)

19:30 < rcurtin> yeah, we will see if it gets in... but I think I have good theory (an absolute approximation guarantee) and I have good results, so it should be no issue

19:30 < rcurtin> I have never been to Japan... I would really like to go

19:49 wasiq has quit [Ping timeout: 276 seconds]

21:56 mentekid has quit [Ping timeout: 244 seconds]

22:35 mentekid has joined #mlpack