verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
< rcurtin> hello there everyone, I think I will wait another minute or two for a few people that I haven't seen yet
< zoq> yeah, good idea :)
< Kirill> hey guys
govg has joined #mlpack
< chenzhe> hi!
< zoq> Hello there!
< rcurtin> hey there Kirill, glad you could make it :)
< rcurtin> I thought maybe I had sent the invitation to the wrong email since I didn't see you this morning, but I guess you got it in the end :)
< Kirill> yeah, I did
< rcurtin> ok, I guess I will go ahead and start, people can read the logs if they need to catch up :)
< rcurtin> so, hello everyone, thanks for coming to the meeting! this is the closest we can get to an in-person meeting since we are all so far apart
< rcurtin> congratulations to the students on your acceptance!
< rcurtin> as you probably know, this is the second instance of this meeting, so basically I'll be talking about the same things
< rcurtin> the meeting is logged, and you can find the logs at http://www.mlpack.org/irc/
< rcurtin> but for some reason the log is not fully working, so I can't seem to load the messages from earlier today... I will have to look into that...
< rcurtin> anyway, I think you all know me, I'm Ryan Curtin, the mlpack GSoC organization administrator
< chenzhe> It seems to be quite a long log to read ^_^
< rcurtin> yeah, many messages :)
< rcurtin> this is the biggest GSoC mlpack has had; this year we have 10 students. In previous years we've had 6, 5, and 3... so this is a much bigger logistical challenge!
< rcurtin> I believe I have sent everyone an email with useful links, documentation, and other information about the project; if you didn't get that email, let me know and I'll send it to you (and make sure I send everything to your correct email in the future)
< rcurtin> as far as schedule goes...
< rcurtin> right now is the "Community Bonding" period, which goes from May 4 to May 30
< rcurtin> ideally, during this time, you can get to know your mentor a bit, get to know the community a bit, maybe have some fun, maybe learn a bit more about mlpack
< rcurtin> some nice ways to do this are on the mailing list, or direct emails to your mentor, or here on IRC---don't feel restricted to talk only about mlpack; we can have some fun in the channel also :)
< rcurtin> once the community bonding period is over, the actual coding goes from may 30 to august 21
< rcurtin> during that time, there will be two midterm evaluation periods; one at the end of June, and one at the end of July
< rcurtin> then, at the end of the summer there's a final evaluation
< rcurtin> after that, there's no schedule imposed by Google, but we hope that you will stick around and continue to participate in the project :)
< rcurtin> if there are any stipend issues or administrative issues, probably Google will be the most helpful there but you should feel free to ask me and I can try and help or escalate to Google as needed
< rcurtin> so, before I move on, any questions about schedule or anything? next we can talk about student and mentor expectations
< rcurtin> ok, I will take that as a no :)
< Kirill> yeah, we got it
< rcurtin> I don't think that any of the expectations here are too difficult, but I think it's important to discuss these before the summer starts to ensure we are all on the same page
< rcurtin> students are expected to work the equivalent of a full-time job or internship; so, a full work week
< rcurtin> it's okay if some weeks you work more and some weeks you work less, but in the end it should even out
< rcurtin> it's also okay if you have to travel or will be unavailable, but please make sure your mentor knows that you'll be gone
< rcurtin> disappearing students are a big problem in GSoC, so if we don't hear from you for a little while we may start to get scared that you have disappeared :)
< rcurtin> regular contact and communication with mentors is very important and expected; preferably, this would be via the public #mlpack IRC channel, but if you or your mentor prefer alternate means of contact that is okay too
< rcurtin> the reason we suggest using the public channel or public mailing list is so that more people than just your mentor can answer any questions; sometimes this can be helpful
< rcurtin> if you are having trouble with some part of your project, definitely do not be afraid to ask---that is what the mentor is there for
< rcurtin> students are also expected to provide some kind of weekly status update to the community
< rcurtin> this could be an email to the mailing list, like these examples:
< rcurtin> or it could be a blog post, like these:
< rcurtin> whichever way you'd like to do it is up to you, but weekly updates are important because people in the community may be interested in following what you are up to in your project
< rcurtin> the blog posts are done through a Github repository at https://github.com/mlpack/blog
< rcurtin> and I'll make sure you have the right permissions to post there after the meeting
govg has quit [Ping timeout: 240 seconds]
< rcurtin> not every project goes according to plan; sometimes, the project may fall way behind the timeline, or it may proceed way ahead of the timeline
< rcurtin> this is not necessarily a problem, do not worry!
< rcurtin> if this does happen, the student and the mentor should discuss to see what is realistically accomplishable in the rest of the summer and adjust the goals accordingly
< rcurtin> many GSoC projects don't get completed in the original way they were defined, so don't worry if things change during the summer---estimating how much work some software will take is a very hard task, and almost nobody gets it right
< rcurtin> we really hope we won't fail any students this year, and we certainly don't expect to---every accepted student is a good student, so the prior probability of failure is very low in our opinion
< rcurtin> but, we will fail a student who disappears, doesn't appear to be working the expected amount, or who is otherwise seriously underperforming
< rcurtin> if such a situation does occur, the student will be made fully aware with warnings, so any failure will not be unexpected
< rcurtin> like I said, I really don't expect this to be an issue, but it's important to talk about it beforehand in case any issues actually do arise, so that we are on the same page
< rcurtin> any questions about student expectations? if not, I'll go ahead and move onto mentor expectations
< zoq> nothing from my end :)
< rcurtin> this meeting is going faster than the last one, not very many questions :)
< rcurtin> no complaints from my end about that :)
< Kirill> :)
< rcurtin> next I'll talk about mentor expectations, which are also important to set out beforehand
< rcurtin> mentors should work with their students to determine times that they are both available to work together
< rcurtin> they should also be willing to debug code problems and help the student understand any theory as needed
< rcurtin> similar to how students are expected to be available and in contact with their mentors, mentors are also expected to be responsive and in contact with their students, providing reasonably quick replies
< rcurtin> the mentor shouldn't do the majority of the work on the project, of course, but they should be there to help as needed
stephentu has joined #mlpack
< rcurtin> like earlier, the best form of communication for a student-mentor pair is going to depend on the preferences of the student and mentor, but ideally this may be best to do in public, to allow others to contribute/observe/ask questions if they might like
< rcurtin> hey Stephen :) if you want to catch up, some logs are here: http://mlpack.org/irc/
< rcurtin> (or maybe you already read the logs from this morning, in which case, you probably know what I'm about to say! :))
< rcurtin> the two midterm evaluations and final evaluation need to be done by the mentor during the periods described by Google (end of June, end of July, and end of August)
< rcurtin> if there is some problem, I think I can enter the evaluations as an organization administrator
< rcurtin> but as always, if there's any problem you can just ask me (or zoq or whoever) and we can all try and get it figured out
< rcurtin> any questions about mentor expectations?
< rcurtin> also, can everyone send me the github account they'd like to use this summer so I can add them to the right Github team for GSoC students?
< Kirill> ok
< stephentu> i'm glad i joined around the time you were talking about mentor expectations lol
< rcurtin> :)
< stephentu> sorry i had a meeting run late
< Kirill> here or through mail?
< rcurtin> no problem, don't worry about it
< rcurtin> here is fine, you can just give me the ID
< Kirill> micyril
< rcurtin> Kirill, I think I already have yours, I'm assuming you'll use the same one you have in the past
< rcurtin> yeah, right, let me add that now
< rcurtin> and Stephen you're already a member of the organization :)
< Kirill> yeah, I sent it just to make sure
< rcurtin> chenzhe: what Github account id should I add for you?
< rcurtin> I guess I need Kartik's also, but I think he is not here
< rcurtin> anyway, those can be added later, no need to do it now :)
< rcurtin> next I'll share some short history, I dunno if it will be interesting for anyone, but I think it is interesting :)
< rcurtin> mlpack was first developed in 2007 in a lab at Georgia Tech (so, over ten years now!)
chenzhe1 has joined #mlpack
< rcurtin> the lab had maybe ~10 people that contributed to the library early on, and I joined the effort around late 2009
< rcurtin> my job when I joined the lab (in addition to research...) was to prepare the library to actually release as open source
< rcurtin> but this took two full years of refactoring and a team of people, so mlpack 1.0.0 wasn't released until december 2011 at a NIPS workshop
chenzhe has quit [Ping timeout: 246 seconds]
chenzhe1 is now known as chenzhe
< rcurtin> at that point, the lab kind of died when the advisor (Alex Gray) left Georgia Tech to start a company called Skytree
< rcurtin> and the few remaining people (which I think at some point was just me) got involved with Google Summer of Code, and the community has grown a ton since then
< rcurtin> this is our fourth Summer of Code, and like I said earlier by far the biggest
< rcurtin> now the library has somewhere over 80 contributors, from every continent except Antarctica
< rcurtin> and one of the contributors is actually a deep learning system: https://github.com/C0deAi
< rcurtin> there's also a pull request open from North Korea, so I think we are the only ML library with code that's come from there :)
< Kirill> :D
< rcurtin> it is very exciting that when I travel to conferences now, there is some good name recognition of mlpack---people know what it is, unlike in 2012
< rcurtin> I'm really hoping that many of the projects this summer will get a lot of interest from the larger machine learning community
< rcurtin> many of the projects are focused on the neural network code, which I am hoping we will be able to release as stable soon, and that will probably get a lot of interest
< stephentu> can i ask a general mlpack question
< rcurtin> in addition, I am currently working on automatic Python bindings for the command-line programs, and this should help bring more people to use mlpack (and maybe other languages too)
< rcurtin> of course, go ahead
< stephentu> how do you see mlpack w/ respect to all the other ML frameworks out there like sklearn, and all the DL frameworks like pytorch and tensorflow
< stephentu> esp the ones w/ company backed support
< rcurtin> I think the focus is pretty different, or, at least it traditionally has been different
< rcurtin> technically we have a little bit of support via Symantec, but that's not the same thing as, e.g., Spark and Databricks :)
< rcurtin> traditionally mlpack focused on very fast implementations instead of ease of use
< rcurtin> since it's in C++, it's already much higher on the learning curve than most people want to climb
< rcurtin> mlpack also has typically focused on less "standard" algorithms, and has implementations of a lot of stuff you won't find elsewhere
< rcurtin> (though we have definitely added more 'standard' techniques over the years)
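As a rough illustration of what that C++ interface looks like, here is a minimal sketch based on the nearest-neighbor example from the mlpack 2.x tutorials; headers and class names are from that era and may differ in other versions:

    #include <mlpack/core.hpp>
    #include <mlpack/methods/neighbor_search/neighbor_search.hpp>

    using namespace mlpack;

    int main()
    {
      // Load a dataset; in mlpack/Armadillo each column is one point.
      arma::mat data;
      data::Load("data.csv", data, true);

      // Find the nearest neighbor of every point in the dataset.
      neighbor::NeighborSearch<neighbor::NearestNeighborSort> nn(data);
      arma::Mat<size_t> neighbors;
      arma::mat distances;
      nn.Search(1, neighbors, distances);

      // neighbors(0, i) holds the index of the nearest neighbor of point i.
      return 0;
    }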
< stephentu> ya i think that's how i stumbled upon mlpack in the 1st place
< stephentu> i was looking for some SDP implementation
< rcurtin> yeah, that's a decent example---mlpack has one of the better optimizer frameworks out there (in my opinion)
< stephentu> cool
< rcurtin> I think, moving forward, that the best path might be to focus on speed, then ease of use
< Kirill> does mlpack support running in parallel on multi-core processors?
< rcurtin> Kirill: sort of; there is some OpenMP support for some algorithms
< rcurtin> and you could also use OpenBLAS inside of Armadillo to get parallel linear algebra
< rcurtin> Shikhar's project this year will focus on some parallelization too
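To illustrate the OpenBLAS point: if Armadillo is linked against a multi-threaded OpenBLAS, ordinary matrix expressions already run in parallel without any mlpack-specific code; the thread count is controlled by environment variables such as OPENBLAS_NUM_THREADS, not by the code itself. A small sketch:

    #include <armadillo>

    int main()
    {
      // Two large random matrices.
      arma::mat A(2000, 2000, arma::fill::randu);
      arma::mat B(2000, 2000, arma::fill::randu);

      // This product dispatches to BLAS (dgemm); with a multi-threaded
      // OpenBLAS it will use several cores automatically.
      arma::mat C = A * B;

      return C.n_elem > 0 ? 0 : 1;
    }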
< Kirill> ok
< rcurtin> that's not to say it's perfect; there is a lot of room for improvement :)
< rcurtin> but, I think maybe "there is a lot of room for improvement" applies to just about any code anywhere :)
< rcurtin> stephentu: back to your original question, I'm not fully sure how to see mlpack's DL code vs. pytorch or tensorflow yet
< rcurtin> I have to come up with something, because deep learning is really hot inside of Symantec and my goal is to get people to use mlpack inside of Symantec :)
< rcurtin> so I will have to figure out some persuasive arguments for why one would use mlpack vs. TF or whatever else... and I think maybe benchmarks will be a good part of that argument
< stephentu> good luck, its a tough battle to fight
< rcurtin> agreed, it absolutely is
< rcurtin> but in the mean time, I am having fun working on mlpack and I think there are places where we definitely provide some nice support that other libraries don't :)
< rcurtin> so I don't have too much insight on "where things will be" in a year or so, we will have to see, but I do know that the improvements from this summer's projects will be exciting and my opinion is they will capture a good amount of interest :)
< rcurtin> I guess I'm out of things to say in the meeting, but I think maybe it is a good idea to have some introductions so we can get to know each other a little bit
< rcurtin> I don't have any formal structure, so maybe it will be chaotic, but...
< rcurtin> I'm Ryan, I live in Atlanta, I did my B.S., M.S., and Ph.D. at Georgia Tech, and now I work for Symantec and still managed to stay in Atlanta
< chenzhe> it seems that I just lost connection when you asked me about my github account, did you get it?
< rcurtin> chenzhe: no, I didn't see the message, sorry about that
< chenzhe> czdiao
< rcurtin> excellent, thanks
< chenzhe> or maybe diao@ualberta.ca
< rcurtin> in my free time, my favorite hobby is racing go karts; it's a lot of fun --- http://ratml.org/misc_img/ironman_round_3.jpg is a picture
< rcurtin> that is my introduction, everyone else should feel free to introduce themselves :)
< stephentu> we are all too shy to introduce ourselves
< rcurtin> apparently so, that's okay :)
< stephentu> i guess i'll go
< rcurtin> the morning crowd was much more talkative :)
< stephentu> im one of the mentors this year.
< stephentu> i am a phd student at UC berkeley.
< stephentu> hoping to graduate someday
< rcurtin> but even if you don't, the weather is always nice so it is not a problem to stay in Berkeley :)
< stephentu> i've been trying to play more guitar lately
< stephentu> which is fu
< stephentu> fun
< stephentu> lol its also lots of FUUU
< stephentu> weather is great here, COL isn't as great
< stephentu> :(
< rcurtin> yeah I thought that's what you meant... I have been learning the bass and my experience is maybe more FUUU than fun :)
< rcurtin> yeah, that is a disadvantage to california :(
< chenzhe> what is COL?
< stephentu> cost of living
< chenzhe> that's true......
< chenzhe> Your last name seems to be Chinese
< stephentu> my parents are from taiwan
< stephentu> i was born in the US though
< chenzhe> I see, I guess you don't speak Chinese 😀
< stephentu> its been my goal for N years to improve chinese
< stephentu> but it never happens
< chenzhe> haha
< stephentu> so unfortunately we will be hosting our meetings in english
< stephentu> one of these days
< chenzhe> That's true~ Language is hard
< rcurtin> I have been trying to brush up my german, even that is difficult :)
< rcurtin> and nowhere near as hard as chinese
< chenzhe> haha
< chenzhe> I can go now~ My name is Chenzhe, I am doing my Ph.D. at the U of Alberta in Canada
< chenzhe> I work in Applied math, maybe graduating later this year or next year
< chenzhe> I like skiing~
< stephentu> like a true canadian
< chenzhe> That's what people do in Canada
< chenzhe> I guess you cannot do anything else in the long winter
< rcurtin> perfect place to like skiing :)
< rcurtin> atlanta is too hot for that, there are not big mountains and not much snow
< chenzhe> We just said goodbye to the last big snow a few weeks ago
< stephentu> where are you originally from?
< rcurtin> it was 90F here today :(
< chenzhe> Mainland China, you might have heard of an ancient city named Xi'an
< Kirill> chenzhe, here in Russia we also had snow lately
< Kirill> so, you are not alone
< chenzhe> haha
< chenzhe> We finally got spring now, it's about 20 C
< stephentu> chenzhe: there is this delicious place in NYC called xian's famous foods
< stephentu> do you know if it is the same xi'an?
< chenzhe> really? When I was in Flushing, I remember there was a small restaurant named Biang, which is a very complicated Chinese word
< stephentu> delicious
< stephentu> very bad for you
< stephentu> but delicious
< Kirill> So, maybe it's a good time to introduce myself
< chenzhe> Looks similar, this city is actually famous for all kinds of noodles
< chenzhe> sure
< Kirill> my name is Kirill as you can guess :)
< Kirill> I'm a PhD student at Ural Federal University (Ekaterinburg, Russia) working on Computational Humor
< Kirill> This summer I'm going to work on cross-validation and hyper-parameter tuning infrastructure
< stephentu> Kirill: will you be experimenting w/ any of these bayesian methods for hyperparam selection?
< Kirill> In my free time I like to walk and cycle with friends
< Kirill> stephentu: it will be beyond the scope of this summer, but it can be extended in that direction in the future
< Kirill> I hope to make it flexible enough to make it possible
< stephentu> cool, ya its definitely a lot of work to implement that stuff
< rcurtin> yeah, I am very excited about this project, but you probably already know that from the emails :)
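For anyone curious what that cross-validation and hyper-parameter tuning infrastructure could look like, here is a purely hypothetical sketch; the class names (SimpleCV, HyperParameterTuner), headers, and signatures are illustrative assumptions, not an existing mlpack API:

    // Hypothetical interface sketch -- not an existing mlpack API.
    #include <mlpack/core.hpp>
    #include <mlpack/methods/logistic_regression/logistic_regression.hpp>

    using namespace mlpack;

    // Assume data and labels have already been loaded.
    arma::mat data;
    arma::Row<size_t> labels;

    // Hold out 20% of the data as a validation set and measure accuracy
    // for a single value of the regularization parameter lambda.
    SimpleCV<regression::LogisticRegression<>, Accuracy> cv(0.2, data, labels);
    double acc = cv.Evaluate(0.01 /* lambda */);

    // Search over a small grid of lambda values and keep the best one.
    HyperParameterTuner<regression::LogisticRegression<>, Accuracy, SimpleCV>
        tuner(0.2, data, labels);
    arma::vec lambdas{0.0, 0.001, 0.01, 0.1};
    double bestLambda = tuner.Optimize(lambdas);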
< stephentu> so rcurtin can you explain a bit more about what community bonding entails
< stephentu> maybe i missed this
< stephentu> we're not technically supposed to get started yet
< stephentu> from what i can tell
< rcurtin> yeah, the students don't need to write any code yet (unless they want to)
< rcurtin> the idea is for students to integrate into the community and get to know the other students and mentors
< rcurtin> so that the summer itself is more fun and feels less like a heartless consulting job :)
< stephentu> haha
< rcurtin> in a real job I guess this would be the equivalent of water cooler discussions
< rcurtin> but we don't have an international water cooler, just #mlpack and the mailing list :(
< rcurtin> (and whatever other communication methods)
< rcurtin> there are no strict and hard requirements for the community bonding period though, just a general idea :)
< rcurtin> ok, I guess there is nothing else for now, maybe we will hear from Kartik and Sumedh in the future :)
< rcurtin> feel free to idle in the channel and chat!
< chenzhe> Sure
< rcurtin> thank you everyone for attending the meeting
< rcurtin> I'll be around on and off for the next couple hours before I go to bed
< rcurtin> I am looking forward to the summer :)
Kirill has quit [Ping timeout: 260 seconds]
< stephentu> great thanks
< stephentu> i'll try to come on irc more
stephentu has quit [Quit: Lost terminal]
sumedhghaisas has joined #mlpack
mikeling has joined #mlpack
chenzhe has quit [Ping timeout: 260 seconds]
govg has joined #mlpack
govg has quit [Ping timeout: 240 seconds]
vivekp has quit [Ping timeout: 272 seconds]
vpal has joined #mlpack
vpal is now known as vivekp
mentekid has joined #mlpack
sumedhghaisas has quit [Ping timeout: 240 seconds]
sumedhghaisas has joined #mlpack
govg has joined #mlpack
sumedhghaisas has quit [Ping timeout: 240 seconds]
< rcurtin> ok, I set up the benchmarks repository to post to the mlpack-git mailing list when commits are pushed
< rcurtin> I'll also get PRs and issues set up for the other repositories to send emails to the mlpack-git mailing list
< zoq> good idea and there is the first one
< rcurtin> ok, that should be set up correctly now
< rcurtin> since I am on the mlpack-git mailing list, I now have to set my personal github account to ignore all activity on the blog and benchmarks repositories
< zoq> yeah, a new filter for me as well
sumedhghaisas has joined #mlpack
chenzhe has joined #mlpack
mikeling has quit [Quit: Connection closed for inactivity]
chenzhe has quit [Ping timeout: 260 seconds]
chenzhe has joined #mlpack
sumedhghaisas has quit [Ping timeout: 240 seconds]