ChanServ changed the topic of #mlpack to: "mlpack: a fast, flexible machine learning library :: We don't always respond instantly, but we will respond; please be patient :: Logs at
xiaohong has joined #mlpack
xiaohong has quit [Ping timeout: 256 seconds]
Yashwants19 has joined #mlpack
< Yashwants19> Hi rcurtin: I think mlpack-static-code-analysis is down for some reason.
Yashwants19 has quit [Ping timeout: 256 seconds]
pd09041999 has joined #mlpack
< rcurtin> Yashwants19: it's not broken, it's just super backed up :) all the build slaves are currently doing the monthly build
< rcurtin> I think it will probably be tomorrow until it gets the static code analysis jobs done
Yashwants19 has joined #mlpack
< Yashwants19> Thanks for the information :)
Yashwants19 has quit [Ping timeout: 256 seconds]
pd09041999 has quit [Ping timeout: 268 seconds]
sreenik has joined #mlpack
pd09041999 has joined #mlpack
pd09041999 has quit [Ping timeout: 246 seconds]
< lozhnikov> jeffin143: Hello, sorry, I didn't see your message. I'll look through #1814 today.
vivekp has quit [Read error: Connection reset by peer]
vivekp has joined #mlpack
vivekp has quit [Read error: Connection reset by peer]
loong has joined #mlpack
loong has quit [Client Quit]
pd09041999 has joined #mlpack
pd09041999 has quit [Max SendQ exceeded]
pd09041999 has joined #mlpack
pd09041999 has quit [Remote host closed the connection]
KimSangYeon-DGU has joined #mlpack
favre49 has joined #mlpack
< favre49> is anyone else on the hangouts call yet?
< ShikharJ> favre49: We're joining soon.
< rcurtin> yeah, I think we are still waiting on some people to join
Toshal has joined #mlpack
< zoq> If everybody could mute themselves when they aren't talking, that would be awesome.
sumedhghaisas has joined #mlpack
< sumedhghaisas> Hey Guys! I am not able to join, it's saying that the meeting is full :( any ideas?
< zoq> let's see if I can invite you directly
< sumedhghaisas> zoq: Thanks :)
< zoq> send out an invitation
< zoq> ah nice
< Toshal> can you invite me as well
< zoq> sure
< Toshal> It's showing me the same error.
< zoq> done
< zoq> does that work?
< Toshal> Nope.
< jeffin143> Can you post the hangout link here?
< Toshal> It has limit on number of users I guess.
< zoq> Did I use the correct mail?
< Toshal> yes, I answered the invitation and it said the video call is full.
< rcurtin> I didn't realize that there is a ten person limit :(
< rcurtin> the notes from the IRC meeting will be recorded
< zoq> limit should be 150
< rcurtin> oh?
< rcurtin> ok
< ShikharJ> The limit is 25 for Apps users for business, government, or education.
< zoq> hm, can't hear you?
< ShikharJ> For everyone else, it is 10.
< rcurtin> oh no
< rcurtin> hang on
< zoq> I can hear you
< Toshal> Okay
< rcurtin> I have this problem where hangouts sometimes crashes and I need to restart, but it has no video
< rcurtin> or audio
< zoq> yeah, can you repeat the last minute or so
< rcurtin> I haven't figured out what the issue is, but I'll be able to rejoin in just a moment
< KimSangYeon-DGU> Yeah
< ShikharJ> Try using Chrome, if you're not using it already?
vivekp has joined #mlpack
< zoq> Good thing there is another meeting.
vivekp has quit [Read error: Connection reset by peer]
< zoq> I can leave the meeting to make space for one more. Would anybody like to take that place?
< ShikharJ> I don't see Saksham or Toshal. I guess Saksham is unavailable, Toshal what about you?
vivekp has joined #mlpack
< Toshal> Is the limit changed?
< zoq> Kinda, probably more important for you to join.
< Toshal> No it's fine.
< zoq> okay
< Toshal> Actually win situation everytime.
< ShikharJ> I'll disconnect for now, Toshal, if you wish to join.
< Toshal> I have exams tomorrow, so I will start studying. I will be there for the 18th May meeting.
< ShikharJ> Oh okay, I'll brief you and Saksham then.
< sreenik> Is there a slot free?
< Toshal> ShikharJ: Thanks, Good Night for now.
< zoq> If you'd like to join, I'll make space :)
< ShikharJ> sreenik: Just made one.
< sreenik> Okay thanks
< ShikharJ> Feel free to join.
Toshal has quit []
< zoq> sreenik: Just lost the connection, so I can't invite you right now.
< zoq> sreenik> Perhaps anybody else can do that as well?
< zoq> Ahh, wrong name.
< sreenik> zoq: Shall I make way for you?
< sreenik> Ohh Okay
< zoq> Interesting, I can't rejoin my own hangouts session :)
< rcurtin> thanks everyone, we'll have an IRC meeting also so that will be logged and can be referred to later :)
< rcurtin> the main thing I learned from the meeting is that I am not the only spaceflight enthusiast
< rcurtin> in fact, before the meeting I was playing a little bit of Kerbal Space Program :)
< KimSangYeon-DGU> It's a great time :)
< zoq> who is a spaceflight enthusiast?
< rcurtin> me, roberto, atharva, at least :)
< zoq> I'm wondering if I could invite someone using my google apps account, to make room for more people
< zoq> ohh, nice
manish7294 has joined #mlpack
sumedhghaisas has quit [Ping timeout: 256 seconds]
< favre49> fun meeting, i'll have to figure out the microphone. See you guys later
favre49 has quit [Quit: Page closed]
< rcurtin> zoq: yeah, we'll have to see if we can increase the limit somehow in the future, I'll see if I can look into some solutions
< manish7294> Rahul: xiaohong: Sorry, I don't know your IRC nicks. I hope it's fine to address you by name directly. You will mostly see me around here on weekends, but don't hesitate to post your thoughts and questions here; I will try to answer as early as possible.
< manish7294> Anyone in bengaluru?
< ShikharJ> manish7294: favre49 is Rahul I think.
< manish7294> ShikharJ: Thanks, I shall ping him again :)
< manish7294> favre49: please see the above messages.
< jeffin143> Sorry, too much network lag
< jeffin143> Also firefox didn't support hangout
< ShikharJ> jeffin143: Yeah, firefox has issues with most video conferencing software that I've tried. Maybe we can try zoom next time, though I'm not aware of the limits there either.
< rcurtin> zoom has a 40 minute meeting limit for free meetings, but we could probably do it
< jeffin143> ShikharJ: zoom is good, we have our educational lectures there, but I'm unsure about a personal account
< ShikharJ> rcurtin: Can't we just open a new meeting when the 40 minutes run out? :P
< rcurtin> yeah that is what we do at my company :)
< jeffin143> :)
< ShikharJ> rcurtin: RelationalAI is it?
< rcurtin> ShikharJ: yeah. we have some paid Zoom rooms that we use, but for the informal meetings between a few people we use free rooms
< ShikharJ> rcurtin: Are you at liberty to discuss what the folks at RelationalAI do? Or is it stealth-mode or something right now?
< rcurtin> ShikharJ: yeah, I think I can, let me finish this meeting first though :)
manish7294 has quit [Quit: Page closed]
akhandait has joined #mlpack
< akhandait> rcurtin: I have never tried Kerbal Space Program but have heard it's really good. I guess I will try it out this summer.
< ShikharJ> From what I've heard, it's also one of the toughest.
< rcurtin> I like to write mlpack code while flying Kerbal ships in the background :)
< rcurtin> I just avoid using time acceleration, so if, e.g., I need to adjust an orbit 30 minutes from now, I can write a lot of code in those 30 minutes :)
< rcurtin> anyway, about RelationalAI, we are working on building a database that can do machine learning computations
< rcurtin> the key observation is this: your data scientist will usually first go to a database and issue some big SQL query or something like this to extract a data matrix, which they'll then save as CSV or something
< rcurtin> then, the data scientist will take the CSV and load it into machine learning tools like mlpack or whatever their favorite library is :)
< rcurtin> but the computation of that data matrix can take a really long time, especially when the query is quite complex
< rcurtin> it turns out that it is possible to learn a machine learning model directly over the tables in the database, and this can give orders-of-magnitude speedup in the time it takes to learn a model
< ShikharJ> Yeah, as someone who is going through a lot of code written by some data scientists, I can say for sure that this is mostly what they do all day.
< rcurtin> we're still developing a core product at the moment, but it is exciting work and I am figuring out how to plug in mlpack internally :)
< ShikharJ> Okay, but what is the output of the model? A processed data matrix? Or a pipelined end product of the experiments done on that matrix?
< rcurtin> end product; so, e.g., the parameters of a linear regression model, ROC curves, etc.
< rcurtin> I would say that it is still very experimental so we are not yet able to replicate all parts of a data scientist's work. but the parts we can replicate are pretty fast :)
< ShikharJ> I see, and what exactly are your inputs? I'm guessing a specific user would have to provide some information regarding the data he wants to consider and what he doesn't want to consider. Is that through UI, or is it through an API?
< ShikharJ> Sorry, it's none of my business to know, I'm just curious.
< rcurtin> no, it's no problem at all :)
< ShikharJ> This stuff is super interesting :)
< rcurtin> my current imagination is, the data scientist will write something that looks like the SQL query they would have otherwise input
< rcurtin> and then they'll also issue some command that says the type of model they'd like to learn
< rcurtin> then, both of these are computed at once
< rcurtin> so, that SQL query is never actually executed as written---it is only used to figure out exactly what needs to be computed to learn the desired model type
< ShikharJ> Hmm, that could simplify a lot of the routine work, especially, if a UI is used I feel. But then, I have no experience whatsoever in this regard.
< rcurtin> yeah, so simplifying the work of the data scientist is definitely an angle
< rcurtin> the angle I am most interested in is the big speedups though, that is what I am always most excited about :)
< ShikharJ> I can understand.
KimSangYeon-DGU has quit [Quit: Page closed]
gmanlan has joined #mlpack
< gmanlan> Hi there, does anybody know if it is possible to change the default decision threshold in decision trees/random forest?
akhandait has quit [Quit: Connection closed for inactivity]
< rcurtin> gmanlan: if you're using the binding, you can use --output_probabilities to get output probabilities instead of classifications
< rcurtin> (the same from C++, there's an overload of Classify() that returns probabilities)
< rcurtin> sorry if I botched function or parameter names, I'm doing that from memory
< gmanlan> hey
< gmanlan> what I need to do is change the default 0.5 to something like 0.7
< gmanlan> I don't think it's standard procedure, that's why I'm asking if it would be possible
< gmanlan> (probably not a good idea)
< rcurtin> no, it's totally standard procedure
< rcurtin> one would typically build an ROC curve or something, and then from this choose the 'best' threshold for the desired false positive rate (or something like this)
< gmanlan> exactly
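A minimal sketch of the threshold selection described above, written in plain Python rather than against mlpack. The name `pick_threshold`, the toy data, and the rule "lowest threshold whose false positive rate stays within a budget" are illustrative assumptions, not an mlpack API; a real workflow would compute the probabilities with a trained classifier on held-out data.

```python
def pick_threshold(probs, labels, max_fpr):
    """Return the lowest threshold whose false positive rate is <= max_fpr.

    probs  : predicted probability of the positive class, one per sample
    labels : true labels (0 or 1), same length as probs
    A sample is predicted positive when its probability is >= the threshold.
    Returns float("inf") if no candidate threshold meets the budget.
    """
    negatives = sum(1 for y in labels if y == 0)
    best = float("inf")
    # Walk candidate thresholds from high to low; the false positive rate
    # can only grow as the threshold drops, so we can stop at the first
    # candidate that exceeds the budget.
    for t in sorted(set(probs), reverse=True):
        false_pos = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        fpr = false_pos / negatives if negatives else 0.0
        if fpr > max_fpr:
            break
        best = t
    return best


# Toy held-out predictions (hypothetical values).
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

strict = pick_threshold(probs, labels, max_fpr=0.0)
relaxed = pick_threshold(probs, labels, max_fpr=0.34)
```

With a zero false-positive budget the sketch settles on a higher threshold than with the relaxed budget, which is the trade-off an ROC curve makes visible.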
< rcurtin> let me toss a change into the RF fix branch---
< rcurtin> I'll add a threshold parameter to the Classify() function, like it is for LogisticRegression
< rcurtin> actually, maybe I should do that in a separate PR, since there are many classifiers that should have that support
< rcurtin> I'll also make the option available from the bindings
< rcurtin> handling some other things right now but I'll do that next
< gmanlan> yep, maybe better to do it separately, but I feel bad for you - every time I ask something you end up coding...
< rcurtin> :)
< rcurtin> no worries
< rcurtin> in general people ask for things that are important and needed
< gmanlan> (Y) thanks
< rcurtin> I set aside some time each day to handle requests like this but also save some time to work on my own pet projects too :)
< gmanlan> of course
< rcurtin> also if I'm quick about it it's only 20-30 minutes of work
< gmanlan> haha, not everybody is like you - it would take me much more time
< gmanlan> qq: anything pending for the RF fix PR? - I have tested it and looks fine so far
< rcurtin> well I am also coming up on my 10 year anniversary working on mlpack, so I know the code pretty well at this point :)
< gmanlan> :)
< rcurtin> yeah, I just need to handle Marcus's comments and I think it will be good to go
< gmanlan> great
< rcurtin> my release scripts should make an immediate release after that easy too
< rcurtin> hopefully :)
< gmanlan> awesome!
sreenik has quit [Quit: Page closed]
< rcurtin> ha... I took a look at Jenkins to see how it was doing on the monthly matrix build
< rcurtin> Build Queue (664)
< rcurtin> might be a minute
< gmanlan> :)
gmanlan has quit [Ping timeout: 256 seconds]
< rcurtin> gmanlan: hmm, the only issue is that the threshold concept is less well-defined for multiclass classifiers
< rcurtin> it can be generalized but it's less straightforward
< rcurtin> one would have to pass a vector, not a single threshold
< rcurtin> so this is no longer a 20-30 minute task :) let me think about the right abstraction for this... I don't think most users think about vectors of thresholds (which are technically prior probabilities for classification)
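One possible way to read "a vector of thresholds" for the multiclass case, sketched in plain Python. This is an assumption made for illustration, not something mlpack implements: each class probability is divided by that class's threshold and the largest ratio wins. The function name `classify_with_thresholds` is hypothetical. For two classes with thresholds `[1 - t, t]`, the rule reduces to "predict class 1 when its probability exceeds t".

```python
def classify_with_thresholds(probs, thresholds):
    """Pick a class from per-class probabilities and per-class thresholds.

    probs      : class probabilities for one sample (one entry per class)
    thresholds : positive per-class thresholds, same length as probs
    Returns the index of the class with the largest probability/threshold
    ratio (ties go to the lowest index).
    """
    if len(probs) != len(thresholds):
        raise ValueError("probs and thresholds must have the same length")
    if any(t <= 0 for t in thresholds):
        raise ValueError("thresholds must be positive")
    scaled = [p / t for p, t in zip(probs, thresholds)]
    return max(range(len(scaled)), key=scaled.__getitem__)


# Binary case: thresholds [0.3, 0.7] mean class 1 needs probability above 0.7.
binary_low = classify_with_thresholds([0.6, 0.4], [0.3, 0.7])
binary_high = classify_with_thresholds([0.25, 0.75], [0.3, 0.7])

# Three classes: a high threshold on class 0 makes it harder to predict.
multi = classify_with_thresholds([0.5, 0.3, 0.2], [0.6, 0.2, 0.2])
```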
gmanlan has joined #mlpack
< gmanlan> rcurtin: you are right, it's mainly for binary classification
< gmanlan> don't worry about it - we can add it to the backlog and think about it
< gmanlan> I don't think scikit supports it, but I thought it may be useful for some cases
< rcurtin> yeah, with scikit usually you have to use predict_proba() and then do the thresholding yourself
< rcurtin> I always thought it was nice in mlpack to be able to just conveniently get the classifications directly
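The do-it-yourself thresholding mentioned above can be sketched as follows, in plain Python. The rows mimic the shape of scikit-learn's predict_proba() output (one row per sample, one column per class), but the values are made up and the helper name `threshold_positive` is hypothetical; no scikit-learn or mlpack call is made here.

```python
def threshold_positive(proba_rows, threshold=0.5, positive_class=1):
    """Turn per-class probability rows into 0/1 predictions.

    proba_rows     : list of rows, each holding one probability per class
    threshold      : minimum probability needed to predict the positive class
    positive_class : column index of the positive class
    """
    return [1 if row[positive_class] >= threshold else 0 for row in proba_rows]


# Hypothetical probabilities for three samples: [P(class 0), P(class 1)].
proba_rows = [[0.35, 0.65], [0.2, 0.8], [0.3, 0.7]]

default_preds = threshold_positive(proba_rows)               # 0.5 cut-off
strict_preds = threshold_positive(proba_rows, threshold=0.7)  # 0.7 cut-off
```

Raising the cut-off from 0.5 to 0.7 changes only the predictions for samples whose positive-class probability falls between the two values.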
< rcurtin> I have to drive home in a little while, I was going to think about some ideas there. For now the best I can think of is a standalone function that takes class probabilities and thresholds (or a threshold vector)
< rcurtin> but that feels "unlike" mlpack in that usually you just get your results by directly calling some function of the class
< gmanlan> that's right