verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
< stephentu> i love these neural net people
< stephentu> all these new phrases for everything
< stephentu> backpropagation
< stephentu> co adaptation
Cooler_ has joined #mlpack
< Cooler_> hello sirs
< zoq> Cooler_: Hello!
< Cooler_> so i need help
< Cooler_> i need to make a kNN text classifier
< Cooler_> how can i convert text (words) into a matrix of floating-point numbers to use as input to kNN?
< Cooler_> i'm thinking of using mlpack, but the examples on the site only use numeric CSV data: http://www.mlpack.org/doxygen.php?doc=nstutorial.html
< Cooler_> can someone help me?
< zoq> Cooler_: You could convert the character to its ASCII int value. Something like: int(c) or static_cast<int>(c)
< Cooler_> i don't need a cast to "int"; i need to make a simple spam classifier with kNN, but my dataset has a long text on each line...
< Cooler_> so i need a classifier that detects spam in a plain string input, and i need to use kNN for the estimate
< Cooler_> i'm thinking of something like this: https://github.com/rieck/sally
< naywhayare> Cooler_: I'm not sure kNN on floating-point values of your words is really very meaningful
< naywhayare> embedding your strings into a metric space would allow you to use mlpack's allknn program, though
< naywhayare> another option, though this is maybe not for the faint of heart, is to use the FastMKS algorithm to find the most similar words according to some string kernel
< naywhayare> that is, some function that calculates the similarity of strings, but is not necessarily a metric
< naywhayare> then you can use FastMKS to find the k other strings that have maximum similarity (according to the kernel function you define), and then use majority voting to decide on the class
< naywhayare> if the sally project you linked to lets you easily and quickly embed strings into a metric space, though, that may be a better choice for the sake of simplicity
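The similarity-search idea above (pick the k strings with the largest kernel value, then take a majority vote) can be sketched without mlpack. This is only an illustration under stated assumptions: it scans every reference string linearly, so it is not FastMKS and has none of its tree-based speedup; the character 3-gram "spectrum" kernel is just one possible string kernel; and the names NGramCounts, SpectrumKernel, LabeledString and ClassifyByMaxKernel are hypothetical, not mlpack API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Count every character n-gram (substring of length n) in s.
std::map<std::string, int> NGramCounts(const std::string& s, std::size_t n)
{
  std::map<std::string, int> counts;
  for (std::size_t i = 0; i + n <= s.size(); ++i)
    ++counts[s.substr(i, n)];
  return counts;
}

// Spectrum kernel: inner product of the two n-gram count vectors.
// Larger values mean more shared n-grams; this is a similarity, not a metric.
double SpectrumKernel(const std::string& a, const std::string& b,
                      std::size_t n = 3)
{
  const std::map<std::string, int> ca = NGramCounts(a, n);
  const std::map<std::string, int> cb = NGramCounts(b, n);
  double sum = 0.0;
  for (const auto& entry : ca)
  {
    const auto it = cb.find(entry.first);
    if (it != cb.end())
      sum += static_cast<double>(entry.second) * it->second;
  }
  return sum;
}

// Hypothetical container for one labeled reference string.
struct LabeledString
{
  std::string text;
  std::string label;
};

// Brute-force max-kernel search followed by a majority vote over the k
// most similar reference strings. Vote ties go to the label that sorts first.
std::string ClassifyByMaxKernel(const std::vector<LabeledString>& reference,
                                const std::string& query,
                                std::size_t k)
{
  assert(k > 0 && !reference.empty());
  std::vector<std::pair<double, std::size_t>> scored;
  for (std::size_t i = 0; i < reference.size(); ++i)
    scored.emplace_back(SpectrumKernel(query, reference[i].text), i);

  k = std::min(k, scored.size());
  std::partial_sort(scored.begin(), scored.begin() + k, scored.end(),
      [](const std::pair<double, std::size_t>& x,
         const std::pair<double, std::size_t>& y)
      { return x.first > y.first; });  // Descending similarity.

  std::map<std::string, int> votes;
  for (std::size_t i = 0; i < k; ++i)
    ++votes[reference[scored[i].second].label];

  std::string best;
  int bestVotes = 0;
  for (const auto& entry : votes)
  {
    if (entry.second > bestVotes)
    {
      best = entry.first;
      bestVotes = entry.second;
    }
  }
  return best;
}
```

Calling ClassifyByMaxKernel(reference, "some new message", 3) returns the majority label among the three reference strings that share the most 3-grams with the query. Each query costs one kernel evaluation per reference string, which is the linear scan that a tree-based method is meant to avoid on very large sets.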
< Cooler_> nice idea
< Cooler_> but for string similarity, look at this: https://github.com/rieck/harry
< naywhayare> the primary advantage of FastMKS would be that if you have very large sets of strings to search through, FastMKS can scale sublinearly in the size of the set to search for each query string
< naywhayare> whereas harry probably iterates over every string in the set to be searched
< naywhayare> if your dataset is smaller, though, harry may be quicker; FastMKS has a possibly expensive preprocessing step (it has to build a tree on the strings)
< naywhayare> also the code you'd have to write to make FastMKS search strings would be a little tricky...
< Cooler_> i'm thinking of using a dataset of 20 lines
< Cooler_> these lines are of type SPAM
< Cooler_> in this library i'm using naive Bayes; now i need to use kNN or some other method, but the FastMKS you mention is something i would need to study
< naywhayare> unless your set of strings is really large (millions or more), it's probably not worth your time to investigate FastMKS
< naywhayare> but it's probably good to know it's an option, in case you need it
< naywhayare> I think using the 'sally' tool to map the strings into vectors, then using mlpack's allknn to determine the nearest neighbors, then taking a majority vote to do kNN classification is a good way to solve the problem
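The pipeline described above (map strings to vectors, find nearest neighbors, majority vote) can be sketched in plain C++. The assumptions are loud ones: a simple bag-of-words count vector stands in for whatever embedding 'sally' would produce, a brute-force Euclidean search stands in for mlpack's allknn, and the names Tokenize, BagOfWords, SquaredDistance and KnnClassify are hypothetical helpers, not part of either tool.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Split text on whitespace and lowercase each word.
std::vector<std::string> Tokenize(const std::string& text)
{
  std::istringstream in(text);
  std::vector<std::string> tokens;
  std::string word;
  while (in >> word)
  {
    for (char& c : word)
      c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    tokens.push_back(word);
  }
  return tokens;
}

// Maps each distinct training word to one dimension of the vector space.
class BagOfWords
{
 public:
  explicit BagOfWords(const std::vector<std::string>& corpus)
  {
    for (const std::string& doc : corpus)
      for (const std::string& w : Tokenize(doc))
        index.emplace(w, index.size());  // No effect if w is already known.
  }

  // Word-count vector; words outside the vocabulary are ignored.
  std::vector<double> Embed(const std::string& text) const
  {
    std::vector<double> v(index.size(), 0.0);
    for (const std::string& w : Tokenize(text))
    {
      const auto it = index.find(w);
      if (it != index.end())
        v[it->second] += 1.0;
    }
    return v;
  }

 private:
  std::map<std::string, std::size_t> index;
};

double SquaredDistance(const std::vector<double>& a,
                       const std::vector<double>& b)
{
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    sum += (a[i] - b[i]) * (a[i] - b[i]);
  return sum;
}

// Brute-force kNN with a majority vote; vote ties go to the label that
// sorts first.
std::string KnnClassify(const std::vector<std::vector<double>>& points,
                        const std::vector<std::string>& labels,
                        const std::vector<double>& query,
                        std::size_t k)
{
  assert(k > 0 && !points.empty() && points.size() == labels.size());
  std::vector<std::pair<double, std::size_t>> dist;
  for (std::size_t i = 0; i < points.size(); ++i)
    dist.emplace_back(SquaredDistance(points[i], query), i);

  k = std::min(k, dist.size());
  std::partial_sort(dist.begin(), dist.begin() + k, dist.end());

  std::map<std::string, int> votes;
  for (std::size_t i = 0; i < k; ++i)
    ++votes[labels[dist[i].second]];

  std::string best;
  int bestVotes = 0;
  for (const auto& entry : votes)
  {
    if (entry.second > bestVotes)
    {
      best = entry.first;
      bestVotes = entry.second;
    }
  }
  return best;
}
```

To use it, build a BagOfWords from the labeled lines, call Embed on every line to get the reference points, and pass Embed(newMessage) as the query. The raw counts could be swapped for other per-word weights without changing the search or the vote.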
< Cooler_> nice viewpoint, thanks for the help, you cleared my mind for the next attempts... cheers
< Cooler_> another question: do you know another tool that does the same thing as 'sally'?
< naywhayare> no, unfortunately, I don't know any other tools for embedding strings into metric spaces :(
< Cooler_> ok thanks, so for naive Bayes i do it this way: https://github.com/CoolerVoid/libtext_bayes/blob/master/libtext_bayes.cpp#L203
< Cooler_> using the occurrence probabilities of words... i think i'll try using this method to produce the values for kNN
curiousguy13_ has quit [Ping timeout: 256 seconds]
prakhar2511 has joined #mlpack
kshitijk has joined #mlpack
kshitijk has quit [Ping timeout: 246 seconds]
prakhar2511 has quit [Ping timeout: 250 seconds]
tunnelshade_ has joined #mlpack
tunnelshade has quit [Ping timeout: 245 seconds]
tunnelshade_ is now known as tunnelshade
lezorich has quit [Quit: Ex-Chat]
curiousguy13_ has joined #mlpack
prakhar2511 has joined #mlpack
stephentu has quit [Quit: Lost terminal]
kshitijk has joined #mlpack
prakhar2511 has quit [Ping timeout: 250 seconds]
curiousguy13_ has quit [Ping timeout: 245 seconds]
curiousguy13_ has joined #mlpack
kshitijk has quit [Ping timeout: 246 seconds]
kshitijk has joined #mlpack
kshitijk has quit [Ping timeout: 264 seconds]
lezorich has joined #mlpack
prakhar2511 has joined #mlpack
prakhar2511 has quit [Ping timeout: 250 seconds]
kshitijk has joined #mlpack
kshitijk has quit [Ping timeout: 240 seconds]
prakhar2511 has joined #mlpack
apir8181 has joined #mlpack
curiousguy13_ has quit [Ping timeout: 246 seconds]
curiousguy13_ has joined #mlpack
lezorich has quit [Ping timeout: 256 seconds]
< apir8181> Hi, do the LARS and linear regression methods use SGD for large-scale problems? I am wondering whether an SGD enhancement could be a feature or not.
< naywhayare> apir8181: I'm not sure I understand what you mean
lezorich has joined #mlpack
< naywhayare> I don't think that LARS or LinearRegression use SGD currently
< naywhayare> I think they use Armadillo solvers directly
< apir8181> I am reading a paper [Large-Scale Machine Learning with Stochastic Gradient Descent]. It seems that many models can use SGD to solve large-scale problems. So I am wondering: are there models in mlpack right now that cannot handle large-scale problems?
< apir8181> Linear regression in mlpack seems to use a matrix decomposition method to train the model.
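For context on the SGD question, a minimal sketch of least-squares linear regression trained by stochastic gradient descent follows. It is not mlpack code and makes no claim about how mlpack's LinearRegression or LARS are implemented; LinearModel and FitLinearSgd are hypothetical names, and the fixed step size and epoch count are arbitrary illustrative choices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical model container: y is approximated by weights . x + bias.
struct LinearModel
{
  std::vector<double> weights;
  double bias;
};

// Plain SGD on the squared error, one sample per update. Each update costs
// O(d) for d features, so no matrix over the whole dataset is ever formed
// or factorized.
LinearModel FitLinearSgd(const std::vector<std::vector<double>>& x,
                         const std::vector<double>& y,
                         double stepSize,
                         std::size_t epochs,
                         unsigned seed = 42)
{
  assert(!x.empty() && x.size() == y.size());
  const std::size_t dim = x[0].size();
  LinearModel model{std::vector<double>(dim, 0.0), 0.0};

  // Visit the samples in a freshly shuffled order every epoch.
  std::vector<std::size_t> order(x.size());
  std::iota(order.begin(), order.end(), 0);
  std::mt19937 rng(seed);

  for (std::size_t epoch = 0; epoch < epochs; ++epoch)
  {
    std::shuffle(order.begin(), order.end(), rng);
    for (std::size_t i : order)
    {
      double prediction = model.bias;
      for (std::size_t j = 0; j < dim; ++j)
        prediction += model.weights[j] * x[i][j];

      // Gradient of 0.5 * error^2 with respect to each parameter.
      const double error = prediction - y[i];
      for (std::size_t j = 0; j < dim; ++j)
        model.weights[j] -= stepSize * error * x[i][j];
      model.bias -= stepSize * error;
    }
  }
  return model;
}
```

On small noiseless data such as points sampled from y = 2x + 1, FitLinearSgd(x, y, 0.1, 2000) should recover a slope near 2 and an intercept near 1. The trade-off against a decomposition-based solver is cheaper, streaming-friendly updates in exchange for a step size that has to be tuned and a solution that is only approximate.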
apir8181 has quit [Quit: http://www.kiwiirc.com/ - A hand crafted IRC client]
prakhar2511 has quit [Ping timeout: 250 seconds]
thisisanick has joined #mlpack
prakhar2511 has joined #mlpack
< thisisanick> hi
< zoq> thisisanick: Hello!
< thisisanick> I want to contribute to mlpack through Google Summer of Code 2015. I need some information and advice.
< zoq> thisisanick: Regarding how to start?
< thisisanick> and also what could be done as a project
< zoq> ah okay, so the best way to get started is to download mlpack and compile it from source, then use it
< zoq> for some simple machine learning tasks.
< thisisanick> It seems like the expected GSoC proposals should focus on general improvements to the project rather than implementing new algorithms
< zoq> We are always interested in new algorithms, so if you are interested in some special field I think we can figure something out.
< zoq> At least we need someone (a mentor) who is familiar with the topic.
prakhar2511 has quit [Ping timeout: 244 seconds]
< thisisanick> My ML background is mostly in regression (linear, logistic). I am still learning (neural networks, for example). By the time the project starts, I will know more. At this point, I need advice on what to focus on. As you can understand, I am more into implementing algorithms
< zoq> I can't advise you on which project to go with. You should select the project you are most interested in and best suited for. But if you are interested in implementing various neural networks there is a project on the wiki.
< thisisanick> I understand. Thank you. I will tr
prakhar2511 has joined #mlpack
< thisisanick> I will examine mlpack and try to find an unimplemented one
thisisanick has quit [Quit: Page closed]
kshitijk has joined #mlpack
kshitijk has quit [Ping timeout: 256 seconds]
kshitijk has joined #mlpack
stephentu has joined #mlpack
kshitijk has quit [Ping timeout: 264 seconds]
prakhar2511 has quit [Ping timeout: 264 seconds]
prakhar2511 has joined #mlpack
prakhar2511 has quit [Ping timeout: 252 seconds]
kshitijk has joined #mlpack
adityaosp95 has joined #mlpack
< naywhayare> fascinating, we've been rejected for GSoC 2015
< naywhayare> that was unexpected
< zoq> yeah :(
< naywhayare> I'll attend the feedback meeting on Friday to try and figure out why
< lezorich> :(
< stephentu> ya thats too bad
< naywhayare> shogun says they got rejected too
< zoq> oh
< stephentu> maybe machine learning is not exciting enough for them
< naywhayare> so no machine learning in GSoC 2015? (I don't know of any other organizations for ML that applied)
< naywhayare> yeah, obviously machine learning is uninteresting and not relevant to industry or Google at all :)
< stephentu> hmm we even had like neural nets
< stephentu> we didnt use the words big data though
< naywhayare> ah! that's what we forgot :)
< zoq> haha
< stephentu> there are a few robotics orgs
< naywhayare> I saw that, there's also this confusing machine learning "thing":
pt_25 has joined #mlpack
< stephentu> see their frontpage contains the phrase big data like 10 times
< stephentu> they also power lexisnexis
< naywhayare> lexisnexis... do people still use that?
< naywhayare> I don't think I've heard that word since the 90s
< stephentu> lawyers i presume
sumedhghaisas has joined #mlpack
< stephentu> i used it a lot to do research for speech and debate in HS
< naywhayare> ah, okay
< naywhayare> oh wow, I know a guy associated with these HPCCSystems guys
< naywhayare> and they're not far from where I am
< naywhayare> I'm... not seeing much about open source here
< naywhayare> they have a "community edition" that's open-source
< stephentu> well they have a github
< naywhayare> yeah, but they also sell an "enterprise" edition... man, that seems pretty shady to me. sucks for any of their GSoC students, who probably won't make any of the money that HPCC makes by selling their half-open-source product
< stephentu> ya i dont see what incentive a student would have to do these projects
< stephentu> very strange
< stephentu> well on the plus side
< stephentu> i have more time to study for prelims this summer :)
< naywhayare> hah, yeah, I was kind of thinking the same thing
< naywhayare> it would have been a lot of fun to mentor students, but this does mean I have more free time to actually write a thesis...
pt_25 has quit [Quit: Page closed]
< stephentu> ya graduating might be nice
< stephentu> or maybe i'll use the time and implement some of the projects myself
< stephentu> like the atomic norm stuff
< naywhayare> yeah, I may take the time and do one or two of the projects; we'll see
Cooler_ has quit [Ping timeout: 246 seconds]
stephentu has quit [Quit: Lost terminal]
stephentu has joined #mlpack
yingryic has joined #mlpack
sumedhghaisas has quit [Ping timeout: 240 seconds]
kshitijk has quit [Ping timeout: 245 seconds]
curiousguy13_ has quit [Ping timeout: 244 seconds]
curiousguy13_ has joined #mlpack
stephentu has quit [Ping timeout: 252 seconds]
stephentu has joined #mlpack
lezorich has quit [Ping timeout: 256 seconds]
stephent1 has joined #mlpack
stephentu has quit [Ping timeout: 252 seconds]