verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
govg has quit [Ping timeout: 240 seconds]
govg has joined #mlpack
chrismeyer has quit [Quit: chrismeyer]
Mathnerd314 has quit [Ping timeout: 258 seconds]
nilay has quit [Quit: Page closed]
mentekid has joined #mlpack
mentekid has quit [Remote host closed the connection]
mentekid has joined #mlpack
marcosirc has quit [Ping timeout: 264 seconds]
marcosirc has joined #mlpack
< mentekid> rcurtin: I think I sort-of managed to backport trigamma and polygamma :)
< mentekid> I also added the boost_backport.hpp file that handles boost versions and includes either our backported files or new versions
< mentekid> (I am uploading that now)
< mentekid> So instead of #include <boost/.../trigamma.hpp> I do <mlpack/.../boost_backport.hpp>
< mentekid> is that what you meant?
< rcurtin> yep, and boost_backport.hpp should either include your backported trigamma, if the user's boost version is too old, or the regular boost trigamma, if the boost version is new enough
< mentekid> cool, that's what it does now
< rcurtin> great; I'll take a look through it when I have a chance
< rcurtin> I have a lot of work stuff to do today and tomorrow, but next week the office is closed so I can catch back up with all the PRs that are waiting on me :(
< mentekid> I think I've introduced a bug for newer versions with that, so I'll try to solve that as well
< rcurtin> when the Jenkins server finally comes online (I think it is getting close now, they are trying to run the second scanner before they open the firewall), we'll be able to test this more easily
< rcurtin> because we can have Jenkins fire off Docker containers with all manner of different Boost versions to test each one
< mentekid> It's ok... I expected this to take me until Tuesday at most, but it proved to be more complicated, and I realized today I basically have 3 weeks till GSoC is over, so there's a lot to be done
< rcurtin> wow, is it only three weeks now?
< rcurtin> time goes too fast...
< mentekid> It's August 15, and this week is almost over so yeah
< mentekid> and I still have no clear plan on how to implement the other 99% of the modeling algorithm :P
< rcurtin> let me know if I can help out with anything
< mentekid> I don't think you can... I just need to sit on my ass until I have a skeleton of what needs to be done
< mentekid> I mean, you could if you wanted to but it's what I have to do :P
< rcurtin> :)
< rcurtin> I guess I mean, if there is anything you are hung up on, let me know and I will do my best to provide quick help
sumedhghaisas has joined #mlpack
< mentekid> Yeah, I'll let you know. I predict once I do create the skeleton I will bombard you with specific questions
< rcurtin> I will do my best to be ready :)
< mentekid> actually, if you do have time -
< mentekid> though travis is building fine, my machine now fails
< mentekid> it seems I end up including the backported details/polygamma.hpp file despite the if/else
< mentekid> this is the boost_backport: https://github.com/mentekid/mlpack/blob/d0f0c89bf6183a19950a99002292528f7d2589e5/src/mlpack/core/boost_backport/boost_backport.hpp
< mentekid> if you see some obvious mistake let me know
< rcurtin> mentekid: looks great to me; I might try and separate the includes of unordered_map and trigamma/polygamma into separate #if sections but it makes no functional difference
< rcurtin> I think maybe you have lit the AppVeyor servers on fire... when you pushed each commit to the PR individually, AppVeyor built every single one of them :)
< mentekid> oh crap
< rcurtin> it's not a problem at all, I just think it's funny :)
< mentekid> I thought it killed the builds when new PRs came
< mentekid> poor servers
< rcurtin> actually it looks like most of the builds killed themselves:
< rcurtin> fatal error C1001: An internal error has occurred in the compiler. [C:\projects\mlpack\build\src\mlpack\mlpack.vcxproj]
< rcurtin> MSVC is so buggy...
< mentekid> I hated windows compiling before
< mentekid> after GSoC I hate it with a passion
< rcurtin> yeah, it's tough with mlpack... I'm really happy zoq got AppVeyor set up, because previously I'd had a bunch of poorly maintained Windows boxes that were supposed to build it from Jenkins
< rcurtin> but keeping them up to date was just too time consuming
< rcurtin> with Appveyor it's way better, we can at least see if the build is broken or not and get quick feedback on what's wrong with it
< rcurtin> maybe AppVeyor does cancel the builds if a new commit is pushed... I see at least one failure with this message:
< rcurtin> MSBUILD : error MSB4017: The build stopped unexpectedly because of an unexpected logger failure.
< mentekid> I would expect it does... That's why I did rapid-fire commits
< mentekid> Is the fail because of my code or unrelated?
< rcurtin> the most recent failure on your PR is the same "unexpected logger failure"
< rcurtin> but I don't think that has anything to do with the code
< rcurtin> maybe that's because the hard drive melted?
< rcurtin> I don't know any easy way to force AppVeyor to build again... if you want to push another very simple commit, maybe it will build it successfully...
< mentekid> is it your machine or appveyor's?
< marcosirc> sumedhghaisas: Hi, How are you? have you seen the example I sent you yesterday?
< rcurtin> mentekid: AppVeyor's, not mine... the few Windows boxes I had have been turned off for months now
< keonkim> hello, as I moved on to statistics module, I created a little helper class that calculates statistics. It is basically a wrapper class on top of armadillo statistics functions, but it provides more functionalities like Skewness() and Kurtosis().
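For readers unfamiliar with the two extra statistics mentioned here, the following is a minimal sketch of how skewness and kurtosis can be computed for one dimension from central moments. It is not keonkim's class: `Moments` and `ComputeMoments` are hypothetical names, plain `std::vector` stands in for Armadillo types, and the population (divide by n) convention is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper type (not part of mlpack or Armadillo).
struct Moments
{
  double mean;
  double variance;  // Population variance (divides by n, not n - 1).
  double skewness;  // m3 / m2^(3/2); 0 for symmetric data.
  double kurtosis;  // m4 / m2^2; 3 for a normal distribution (not excess).
};

// Computes population moments of one dimension.
// Assumes x is non-empty and not constant (variance > 0).
Moments ComputeMoments(const std::vector<double>& x)
{
  assert(!x.empty());
  const double n = static_cast<double>(x.size());

  double mean = 0.0;
  for (const double v : x)
    mean += v;
  mean /= n;

  // Second, third and fourth central moments.
  double m2 = 0.0, m3 = 0.0, m4 = 0.0;
  for (const double v : x)
  {
    const double d = v - mean;
    m2 += d * d;
    m3 += d * d * d;
    m4 += d * d * d * d;
  }
  m2 /= n;
  m3 /= n;
  m4 /= n;

  return Moments{ mean, m2, m3 / std::pow(m2, 1.5), m4 / (m2 * m2) };
}
```

Calling `ComputeMoments({1.0, 2.0, 3.0, 4.0, 5.0})` gives a skewness of 0 because the data is symmetric around its mean.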
< zoq> rcurtin: We can rerun the build job.
< mentekid> I hope I didn't cause them any damage... Still I'd feel worse if it was yours, so bright side to everything
< rcurtin> mentekid: I'm only joking, I doubt anything is on fire :)
< zoq> #1285?
< rcurtin> zoq: yeah, that's the one... how do you restart it
< rcurtin> ?
< rcurtin> mentekid: AppVeyor offers the free service for any number of builds, so I am sure we are not the ones hitting their servers the hardest :)
< zoq> by clicking the "RE-BUILD PR" button
< keonkim> I intended this to be only used for a small descriptive statistics program I am making, and I put it inside a separate file to be used as a utility class. But now as I think of it, it might not be really efficient.
< rcurtin> I must be blind... do I need to log in to see that button?
< keonkim> do you think I should keep doing what I am doing?
< zoq> rcurtin: I think so
< sumedhghaisas> marcosirc: Hey marcos... just came home ... can I ping you after dinner? just making it...
< marcosirc> sumedhghaisas: Ok. Thanks!
< rcurtin> zoq: I'm not sure what the issue is... it seems like the "rebuild" button should be next to the "log" button at the top of the screen on https://ci.appveyor.com/project/mlpack/mlpack/build/%231287
< rcurtin> but I only see "log", no option to rebuild
< zoq> keonkim: If it turns out to be slower than the armadillo equivalent I don't think it's a good idea, can you run some benchmarks?
chrismeyer has joined #mlpack
< keonkim> zoq: it might be the same or a little slower since it is using armadillo functions. One advantage is that it can be used to calculate stats dimension by dimension. For example, arma::max(Matrix) would give a vector of the maxes of all dimensions. The new Max(dimension) function, in contrast, can calculate the max of just one dimension.
< keonkim> But I think the only case where a user wants to output the statistics of just one dimension is the descriptive statistics program. Maybe I should keep all the code in one file.
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#1258 (master - 82cf865 : Ryan Curtin): The build is still failing.
travis-ci has left #mlpack []
< zoq> keonkim: Doesn't arma::max(Q, dim) do the same thing?
< keonkim> on all dimensions, arma::statistics functions took 39.903932s whereas the new functions took about 45s. on just one dimension, the armadillo statistics functions took 21.442109s (because it calculates for all other dimensions anyways) and the new functions took 0.246790s
< mentekid> rcurtin: Now I am completely baffled... My boost_backport code should completely ignore the backported code (for my system, for example).
< keonkim> dim in arma::max(Q, dim) just indicates whether the max is taken over each column (dim = 0) or each row (dim = 1); it still computes all of them
< mentekid> But it includes it nonetheless, and that produces an error because I declare boost::math::policies::digits_base10 and it is already declared
< zoq> keonkim: ah, right
< keonkim> fyi the file size for the above tests was 400 x 100600
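The timing gap reported above comes from doing work for every dimension when only one is wanted. In Armadillo, `arma::max(Q, 0)` returns the maximum of each column and `arma::max(Q, 1)` the maximum of each row, so both always cover the whole matrix. The sketch below illustrates the single-dimension alternative using a hypothetical column-major `Matrix` stand-in (not `arma::mat`), assuming rows are dimensions and columns are points as in mlpack; `MaxOfDimension` and `MaxOfAllDimensions` are illustrative names only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal column-major matrix stand-in (hypothetical; not arma::mat).
// Rows are dimensions, columns are points.
struct Matrix
{
  std::size_t rows;
  std::size_t cols;
  std::vector<double> data;  // Element (r, c) lives at data[r + c * rows].

  double At(const std::size_t r, const std::size_t c) const
  {
    return data[r + c * rows];
  }
};

// Maximum of one dimension only: visits `cols` elements.
double MaxOfDimension(const Matrix& m, const std::size_t dim)
{
  assert(dim < m.rows && m.cols > 0);
  double best = m.At(dim, 0);
  for (std::size_t c = 1; c < m.cols; ++c)
    best = std::max(best, m.At(dim, c));
  return best;
}

// Maximum of every dimension: visits all rows * cols elements.
std::vector<double> MaxOfAllDimensions(const Matrix& m)
{
  std::vector<double> result(m.rows);
  for (std::size_t r = 0; r < m.rows; ++r)
    result[r] = MaxOfDimension(m, r);
  return result;
}
```

With Armadillo itself, a comparable single-dimension call would be something like `arma::max(Q.row(d))`, which only reads that one row.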
< rcurtin> mentekid: are you sure that you aren't including the backported versions accidentally somewhere?
< rcurtin> another thing to ensure is that boost/version.hpp is being included before boost_backport.hpp, otherwise I think BOOST_VERSION might be undefined
< rcurtin> maybe I should have said that earlier... I forgot about that part
< mentekid> ah
< mentekid> so the else is executed
< rcurtin> I think right now prereqs.hpp includes boost/version.hpp but you can probably move that to boost_backport.hpp
< rcurtin> yeah, I think if you don't define the variable, BOOST_VERSION will be treated as 0 in the #if
< rcurtin> (which will use your backported version)
< mentekid> I don't see boost/version.hpp included in prereqs
< rcurtin> hm, you're right
< mentekid> yep that was it
< rcurtin> yeah, maybe it was an error that boost/version.hpp was not included in prereqs.hpp
< mentekid> I have to stop thinking of cmake as a wizard that gives me library versions :P
< rcurtin> or possibly, some other part of boost we had included before the #if was including version.hpp
< rcurtin> if CMake was a wizard, it is a very bad one :)
< zoq> keonkim: Couldn't we just use the arma::statistics if we go over all dimensions and the modified code if we just use one dimension?
< mentekid> I just expect it to do what I would like it to do... So now I expected it to set BOOST_VERSION to the correct number
< mentekid> now it's building. cool
< rcurtin> yeah; in this case BOOST_VERSION comes directly from version.hpp, not CMake
< rcurtin> (though it would be possible to have CMake set BOOST_VERSION too by adding something like -DBOOST_VERSION=105800 or whatever to the compile flags)
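The behaviour diagnosed above is standard preprocessor semantics: inside `#if`, any identifier that is not a defined macro is replaced by 0, so a version check silently takes the "too old" branch when the version header has not been included yet. The sketch below reproduces this with `DEMO_LIB_VERSION`, a hypothetical stand-in for `BOOST_VERSION` that is deliberately never defined; the threshold 105800 is simply the value mentioned in the log, and `DEMO_USES_BACKPORT` and `UsesBackport` are illustrative names.

```cpp
// DEMO_LIB_VERSION is a hypothetical stand-in for BOOST_VERSION. It is
// intentionally NOT defined anywhere in this file.

#if DEMO_LIB_VERSION < 105800
  // An undefined identifier evaluates to 0 inside #if, so 0 < 105800 holds
  // and this "backport" branch is selected even on a new library.
  #define DEMO_USES_BACKPORT 1
#else
  #define DEMO_USES_BACKPORT 0
#endif

// A stricter header could refuse to guess instead (left commented out so that
// this sketch still compiles):
//
// #ifndef DEMO_LIB_VERSION
//   #error "Include the version header before this file."
// #endif

constexpr bool UsesBackport()
{
  return DEMO_USES_BACKPORT == 1;
}

// Compile-time confirmation that the fallback branch was chosen.
static_assert(UsesBackport(), "undefined version macro selects the backport");
```

In the real header, including `<boost/version.hpp>` at the top of `boost_backport.hpp` before the `#if` is one way to make sure the macro is always defined at the point of the check.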
< zoq> rcurtin: Can you see the settings in: https://ci.appveyor.com/team?
< keonkim> zoq: I was thinking the same... I might just put this inside the descriptive statistics program then.
< zoq> keonkim: I like the interface, so I think that would be the best solution here.
< rcurtin> zoq: I see "rcurtin team", so I wonder if I am logged in with the wrong account, or if my account is not linked to the mlpack account, or something
< rcurtin> but I can see in my profile that I am a collaborator on the mlpack team
< rcurtin> er, the mlpack "account" is the word they use there
< zoq> hm, the github Team "mlpack/Owners" has the APPVEYOR ROLE: "Administrator"
tham has joined #mlpack
keonkim has quit [Quit: PanicBNC - http://PanicBNC.net]
< tham> keonkim : Hi, I like the api of your Statistics class too
< tham> The original purpose is to provide an api that looks like R's describe and summary
< tham> These functions can help the users gain some insight of their data
< tham> If you find generating formatted output hard, boost format is a good choice
< tham> or you could use the printf api
< tham> sprintf
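As a rough illustration of the printf-style option, here is a sketch that formats one fixed-width row of a describe-style table. `FormatRow` and the column layout are hypothetical, and `std::snprintf` is used rather than `sprintf` because it bounds the write to the buffer size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats one row of a summary table as
// "<dim><min><max><mean>" in fixed-width columns.
std::string FormatRow(const std::size_t dim,
                      const double min,
                      const double max,
                      const double mean)
{
  char buffer[128];
  // %-6zu: dimension index, left-aligned in 6 characters.
  // %12.4f: value with 4 decimals, right-aligned in 12 characters.
  const int written = std::snprintf(buffer, sizeof(buffer),
      "%-6zu%12.4f%12.4f%12.4f", dim, min, max, mean);
  assert(written >= 0);  // A negative value signals an encoding error.
  return std::string(buffer);
}
```

Each row built this way has the same width for values that fit their columns, so the rows line up when printed one per line.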
< tham> I am trying to solve the bottleneck of file loading, will open a pull request this weekend if everything goes smoothly
< tham> before that I think I would like to merge the pull request #694
< tham> about #694, should we remove the api with output parameter? I could finish this part after it is merged
< rcurtin> tham: I'll look over #694 today or tomorrow and provide any final comments; I think it is much improved
< tham> boost::format is flexible, easy to use but not a fast library, to make things faster, we can avoid recreating the boost::format over and over again
< tham> rcurtin : agree, the api with output was suggested by me, feel bad to ask keon to remove it after the trouble he went through
< rcurtin> it's ok, sometimes that is the best option. I have written very many parts of mlpack only to go back and remove them entirely later :)
< rcurtin> it's not ideal, certainly, but I think it's somewhat unavoidable
Mathnerd314 has joined #mlpack
< tham> I am porting fast csv parser to mlpack, this parser is quite fast as I mentioned in #681, I believe this solution should solve the problems of speed and compile time at the same time
< rcurtin> tham: this one? :7780
< rcurtin> oops, bad paste... this one? https://github.com/ben-strasser/fast-cpp-csv-parser
< tham> rcurtin : yes
sumedhghaisas has quit [Ping timeout: 240 seconds]
< rcurtin> how does it compare to boost::spirit in runtime?
nilay has joined #mlpack
< tham> rcurtin : haven't compared it with boost::spirit yet, but it is very fast too, can parse a csv file of close to 40MB in around 181ms
< tham> I did the performance measurement at https://github.com/mlpack/mlpack/pull/681
govg has quit [Ping timeout: 276 seconds]
< tham> Compared with spirit, it still takes 2200ms for me to post-process the strings parsed by fast csv parser and build the DataSetMapper
< rcurtin> tham: 40MB in 181ms is pretty amazing
< rcurtin> I'm still playing with #681 to try and accelerate the compile time, but I have not had time to look into it in too much detail
< tham> rcurtin : it is ok, I would try to develop a smaller, less compile time expensive solution for this problem
< rcurtin> I think it might be possible to keep the compile time down and also use boost::spirit, but I have not figured out exactly how yet
< tham> rcurtin : I would try to look into this problem too, fast csv parser can parse the elements into separate strings but does not know whether they are numerical or categorical
< tham> To make it work we need a converter too, I think the compile time issue is related to #722?
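The converter step described here can be sketched roughly as follows: try to read the whole token as a number, and fall back to a stable integer id when that fails. `TokenMapper` is a hypothetical stand-in for the mapping step, not mlpack's actual mapper API, and using `std::strtod` as the numeric test is an assumption (it also accepts forms such as "inf", "nan", hexadecimal floats and leading whitespace).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <unordered_map>

// Hypothetical converter: numeric tokens become their value, anything else
// becomes a stable categorical id (0, 1, 2, ... in order of first appearance).
class TokenMapper
{
 public:
  double Convert(const std::string& token, bool& isCategorical)
  {
    char* end = nullptr;
    const double value = std::strtod(token.c_str(), &end);

    // Numeric only if the token is non-empty and strtod consumed all of it.
    if (!token.empty() && end == token.c_str() + token.size())
    {
      isCategorical = false;
      return value;
    }

    // Otherwise look up the token, assigning a new id on first sight.
    isCategorical = true;
    auto it = ids.find(token);
    if (it == ids.end())
      it = ids.emplace(token, ids.size()).first;
    assert(it->second < ids.size());
    return static_cast<double>(it->second);
  }

 private:
  std::unordered_map<std::string, std::size_t> ids;
};
```

A real implementation would likely keep one such mapping per dimension and mark the whole dimension as categorical once any token fails to parse; that bookkeeping is left out of this sketch.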
< rcurtin> yep, I thought opening #722 would allow us to help reduce the compile time in other ways too
< rcurtin> it's possible that we could reduce the compile time enough in other ways that including boost spirit would not be a big issue
tham has quit [Quit: Page closed]
keonkim has joined #mlpack
mentekid has quit [Ping timeout: 244 seconds]
sumedhghaisas has joined #mlpack
< sumedhghaisas> marcosirc: Hey marcos...
< marcosirc> sumedhghaisas: Hi!
< sumedhghaisas> I have looked at your example...
< sumedhghaisas> this is the example to show the reduced tau value right?
< marcosirc> Yeah.
< sumedhghaisas> okay I understood that... I am still not able to decide between the implementations ...
< marcosirc> also, it shows that the overlapping is different in different parts of the decision boundary
< marcosirc> Ok.
< sumedhghaisas> overlapping is different? I didn't get that... you mean the distance from the decision boundary to p1 and to p2?
< marcosirc> Yes.
< sumedhghaisas> ahh yes that I understood...
< sumedhghaisas> I think we should implement the original algorithm ... I have 2 reasons behind that...
< sumedhghaisas> it will provide us a benchmark so that we can evaluate your variations effectively...
< sumedhghaisas> second... I am assuming spill trees are used widely ....
< sumedhghaisas> So it won't be a good decision to implement an improvised version of spill trees but not the original...
< sumedhghaisas> although
< sumedhghaisas> regarding dual tree algorithms...
< sumedhghaisas> I think your way is better...
< sumedhghaisas> Implementing both ways is also not an option as they will be executed in different frameworks...
govg has joined #mlpack
< sumedhghaisas> It's difficult to gauge their effectiveness... like which one is better... the only way to prove that hrect bounds are better is to compare it to the actual implementation...
< sumedhghaisas> what is your opinion?
< marcosirc> Ok, yes I agree that the difference in the implementations is a problem if we want to benchmark and make assertions about spill trees in general.
< marcosirc> mmm (I am thinking)
sumedhghaisas has quit [Ping timeout: 240 seconds]
< marcosirc> sumedhghaisas: ups.. it looks like you are offline.
< marcosirc> sumedhghaisas: So, I didn't understand what you meant when you said: Implementing both ways is also not an option as they will be executed in different frameworks..
< marcosirc> Also, I think the difference is not only the speed (how many nodes are pruned), but also the accuracy. In the "bound's approach" the amount of overlapping is different in different parts of the decision boundary, so it will approximate differently...
< marcosirc> So OK, if you agree I will implement spill trees again in a new branch
< marcosirc> with the approach proposed in Ting Liu's paper. Using the same cutting hyperplanes when building the tree and when searching the tree.
< marcosirc> I will work on this over the next few days and I will write an email with the results.
< marcosirc> would you agree?
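To make the hyperplane approach discussed above concrete, here is a minimal sketch of an overlapping split as commonly described for spill trees: points within tau of the cutting hyperplane are stored in both children, and a query descends using that same hyperplane. This is not marcosirc's implementation; `SpillSplit`, `SplitWithOverlap` and `GoesLeft` are hypothetical names, and the points are assumed to be already projected onto the split direction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: indices of the points assigned to each child.
struct SpillSplit
{
  std::vector<std::size_t> left;   // Projection <= boundary + tau.
  std::vector<std::size_t> right;  // Projection >= boundary - tau.
};

// projections[i] is the position of point i along the split direction.
// Points within tau of the boundary end up in both children.
SpillSplit SplitWithOverlap(const std::vector<double>& projections,
                            const double boundary,
                            const double tau)
{
  assert(tau >= 0.0);
  SpillSplit split;
  for (std::size_t i = 0; i < projections.size(); ++i)
  {
    if (projections[i] <= boundary + tau)
      split.left.push_back(i);
    if (projections[i] >= boundary - tau)
      split.right.push_back(i);
  }
  return split;
}

// Search-time decision without backtracking: the query follows the same
// boundary that was used when the node was built.
bool GoesLeft(const double queryProjection, const double boundary)
{
  return queryProjection < boundary;
}
```

Because the overlap is measured from a single hyperplane, its width is the same (2 * tau) everywhere along the boundary, which is the property contrasted above with the bound-based variant.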
sumedhghaisas has joined #mlpack
< sumedhghaisas> rcurtin: Hey Ryan...
< sumedhghaisas> I am trying to install lcov on the jenkins machine....
< sumedhghaisas> its giving me error...
sumedhghaisas has quit [Ping timeout: 250 seconds]
< rcurtin> sumedhghaisas: lcov installed on big, should work now
< marcosirc> sumedhghaisas: I wrote some messages while you were online.
< marcosirc> *offline
< rcurtin> hopefully he checks the logs :)
< marcosirc> I will send him the messages in private
< marcosirc> uhh I didn't realize he is offline again :|
< rcurtin> :)
sumedhghaisas has joined #mlpack
< rcurtin> sumedhghaisas: I installed lcov on big, it should work now
< sumedhghaisas> ohh great I will give it a build now...
sumedhghaisas_ has joined #mlpack
chrismeyer_ has joined #mlpack
chrismeyer has quit [Ping timeout: 276 seconds]
chrismeyer_ is now known as chrismeyer
sumedhghaisas has quit [Ping timeout: 264 seconds]
sumedhghaisas_ has quit [Ping timeout: 260 seconds]
chrismeyer has quit [Quit: chrismeyer]
nilay has quit [Quit: Page closed]
chrismeyer has joined #mlpack
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#1260 (master - 4c88348 : Ryan Curtin): The build is still failing.
travis-ci has left #mlpack []
sumedhghaisas has joined #mlpack
marcosirc has quit [Quit: WeeChat 1.4]
marcosirc has joined #mlpack
mentekid has joined #mlpack
sumedhghaisas has quit [Ping timeout: 264 seconds]
marcosirc has quit [Quit: WeeChat 1.4]
mentekid has quit [Ping timeout: 244 seconds]
mentekid has joined #mlpack
mentekid has quit [Ping timeout: 240 seconds]
chrismeyer has quit [Ping timeout: 258 seconds]