#mlpack on 2016-07-21 — irc logs at libera.irclog.whitequark.org

2015-01-15 23:05 verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/

01:53 govg has quit [Ping timeout: 240 seconds]

03:10 govg has joined #mlpack

04:58 chrismeyer has quit [Quit: chrismeyer]

06:10 Mathnerd314 has quit [Ping timeout: 258 seconds]

07:19 nilay has quit [Quit: Page closed]

08:29 mentekid has joined #mlpack

09:45 mentekid has quit [Remote host closed the connection]

09:47 mentekid has joined #mlpack

11:35 marcosirc has quit [Ping timeout: 264 seconds]

12:28 marcosirc has joined #mlpack

13:30 < mentekid> rcurtin: I think I sort-of managed to backport trigamma and polygamma :)

13:30 < mentekid> I also added the boost_backport.hpp file that handles boost versions and includes either our backported files or new versions

13:30 < mentekid> (I am uploading that now)

13:31 < mentekid> So instead of #include <boost/.../trigamma.hpp> I do <mlpack/.../boost_backport.hpp>

13:32 < mentekid> is that what you meant?

13:47 < rcurtin> yep, and boost_backport.hpp should either include your backported trigamma, if the user's boost version is too old, or the regular boost trigamma, if the boost version is new enough

13:48 < mentekid> cool, that's what it does now

13:48 < rcurtin> great; I'll take a look through it when I have a chance

13:48 < rcurtin> I have a lot of work stuff to do today and tomorrow, but next week the office is closed so I can catch back up with all the PRs that are waiting on me :(

13:48 < mentekid> I think I've introduced a bug for newer versions with that, so I'll try to solve that as well

13:48 < rcurtin> when the Jenkins server finally comes online (I think it is getting close now, they are trying to run the second scanner before they open the firewall), we'll be able to test this more easily

13:49 < rcurtin> because we can have Jenkins fire off Docker containers with all manner of different Boost versions to test each one

13:49 < mentekid> It's ok... I expected this to take me until Tuesday at most, but it proved to be more complicated, and I realized today that I basically have 3 weeks till GSoC is over, so there's a lot to be done

13:50 < rcurtin> wow, is it only three weeks now?

13:50 < rcurtin> time goes too fast...

13:50 < mentekid> It's August 15, and this week is almost over so yeah

13:51 < mentekid> and I still have no clear plan on how to implement the other 99% of the modeling algorithm :P

13:51 < rcurtin> let me know if I can help out with anything

13:52 < mentekid> I don't think you can... I just need to sit on my ass until I have a skeleton of what needs to be done

13:53 < mentekid> I mean, you could if you wanted to but it's what I have to do :P

13:53 < rcurtin> :)

13:53 < rcurtin> I guess I mean, if there is anything you are hung up on, let me know and I will do my best to provide quick help

13:54 sumedhghaisas has joined #mlpack

13:54 < mentekid> Yeah, I'll let you know. I predict once I do create the skeleton I will bombard you with specific questions

13:56 < rcurtin> I will do my best to be ready :)

13:58 < mentekid> actually, if you do have time -

13:58 < mentekid> though travis is building fine, my machine now fails

13:59 < mentekid> it seems I end up including the backported details/polygamma.hpp file despite the if/else

13:59 < mentekid> this is the boost_backport: https://github.com/mentekid/mlpack/blob/d0f0c89bf6183a19950a99002292528f7d2589e5/src/mlpack/core/boost_backport/boost_backport.hpp

14:00 < mentekid> if you see some obvious mistake let me know

14:05 < rcurtin> mentekid: looks great to me; I might try and separate the includes of unordered_map and trigamma/polygamma into separate #if sections but it makes no functional difference

14:06 < rcurtin> I think maybe you have lit the AppVeyor servers on fire... when you pushed each commit to the PR individually, AppVeyor built every single one of them :)

14:06 < mentekid> oh crap

14:06 < rcurtin> it's not a problem at all, I just think it's funny :)

14:06 < mentekid> I thought it killed the builds when new PRs came

14:07 < mentekid> poor servers

14:07 < rcurtin> actually it looks like most of the builds killed themselves:

14:07 < rcurtin> fatal error C1001: An internal error has occurred in the compiler. [C:\projects\mlpack\build\src\mlpack\mlpack.vcxproj]

14:07 < rcurtin> https://ci.appveyor.com/project/mlpack/mlpack/build/%231277

14:07 < rcurtin> MSVC is so buggy...

14:08 < mentekid> I hated windows compiling before

14:08 < mentekid> after GSoC I hate it with a passion

14:09 < rcurtin> yeah, it's tough with mlpack... I'm really happy zoq got AppVeyor set up, because previously I'd had a bunch of poorly maintained Windows boxes that were supposed to build it from Jenkins

14:09 < rcurtin> but keeping them up to date was just too time consuming

14:09 < rcurtin> with Appveyor it's way better, we can at least see if the build is broken or not and get quick feedback on what's wrong with it

14:09 < rcurtin> maybe AppVeyor does cancel the builds if a new commit is pushed... I see at least one failure with this message:

14:09 < rcurtin> MSBUILD : error MSB4017: The build stopped unexpectedly because of an unexpected logger failure.

14:10 < mentekid> I would expect it does... That's why I did rapid-fire commits

14:10 < mentekid> Is the fail because of my code or unrelated?

14:10 < rcurtin> the most recent failure on your PR is the same "unexpected logger failure"

14:11 < rcurtin> but I don't think that has anything to do with the code

14:11 < rcurtin> maybe that's because the hard drive melted?

14:11 < rcurtin> I don't know any easy way to force AppVeyor to build again... if you want to push another very simple commit, maybe it will build it successfully...

14:11 < mentekid> is it your machine or appveyor's?

14:11 < marcosirc> sumedhghaisas: Hi, How are you? have you seen the example I sent you yesterday?

14:14 < rcurtin> mentekid: AppVeyor's, not mine... the few Windows boxes I had have been turned off for months now

14:14 < keonkim> hello, as I moved on to statistics module, I created a little helper class that calculates statistics. It is basically a wrapper class on top of armadillo statistics functions, but it provides more functionalities like Skewness() and Kurtosis().

14:14 < zoq> rcurtin: We can rerun the build job.

14:15 < mentekid> I hope I didn't cause them any damage... Still I'd feel worse if it was yours, so bright side to everything

14:15 < rcurtin> mentekid: I'm only joking, I doubt anything is on fire :)

14:15 < zoq> #1285?

14:15 < rcurtin> zoq: yeah, that's the one... how do you restart it

14:15 < rcurtin> ?

14:16 < rcurtin> mentekid: AppVeyor offers the free service for any number of builds, so I am sure we are not the ones hitting their servers the hardest :)

14:16 < zoq> by clicking the "RE-BUILD PR" button

14:16 < keonkim> I intended this to be only used for a small descriptive statistics program I am making, and I put it inside a separate file to be used as a utility class. But now as I think of it, it might not be really efficient.

14:16 < rcurtin> I must be blind... do I need to log in to see that button?

14:16 < keonkim> here is the header of the statistics class: https://github.com/keonkim/mlpack/blob/describe/src/mlpack/core/data/statistics.hpp

14:16 < keonkim> do you think I should keep doing what I am doing?

14:16 < zoq> rcurtin: I think so

14:18 < sumedhghaisas> marcosirc: Hey marcos... just came home ... can I ping you after dinner? just making it...

14:18 < marcosirc> sumedhghaisas: Ok. Thanks!

14:21 < rcurtin> zoq: I'm not sure what the issue is... it seems like the "rebuild" button should be next to the "log" button at the top of the screen on https://ci.appveyor.com/project/mlpack/mlpack/build/%231287

14:21 < rcurtin> but I only see "log", no option to rebuild

14:23 < zoq> keonkim: If it turns out to be slower than the armadillo equivalent I don't think it's a good idea, can you run some benchmarks?

14:27 chrismeyer has joined #mlpack

14:28 < keonkim> zoq: it might be the same or a little slower since it is using armadillo functions. One advantage is that it can be used to calculate stats dimension by dimension. For example, arma::max(Matrix) would give a vector of the maxes of every dimension. The new Max(dimension) function, in contrast, can calculate the max of just one dimension.

14:30 < keonkim> But i think the only case where a user wants to output a statistics of just one dimension is the descriptive statistics program. Maybe I should keep all code in one file.
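
The per-dimension idea keonkim describes can be sketched without Armadillo. The class below is a hypothetical stand-in, not the Statistics class from the linked header: the name `DimensionStatistics`, its storage layout (one `std::vector` per dimension), and the choice of the population definition of skewness are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not keonkim's class: data is stored as one
// std::vector per dimension, so a single dimension can be summarised
// without touching the others.
class DimensionStatistics
{
 public:
  explicit DimensionStatistics(std::vector<std::vector<double>> input) :
      data(std::move(input)) { }

  // Largest value in one dimension only.
  double Max(const std::size_t dim) const
  {
    const std::vector<double>& v = Dimension(dim);
    return *std::max_element(v.begin(), v.end());
  }

  // Arithmetic mean of one dimension.
  double Mean(const std::size_t dim) const
  {
    const std::vector<double>& v = Dimension(dim);
    double sum = 0.0;
    for (const double x : v)
      sum += x;
    return sum / v.size();
  }

  // Population skewness (an assumed definition): the third central moment
  // divided by the second central moment raised to the power 1.5.
  // The result is NaN when every value in the dimension is identical.
  double Skewness(const std::size_t dim) const
  {
    const std::vector<double>& v = Dimension(dim);
    const double mean = Mean(dim);
    double m2 = 0.0, m3 = 0.0;
    for (const double x : v)
    {
      const double d = x - mean;
      m2 += d * d;
      m3 += d * d * d;
    }
    m2 /= v.size();
    m3 /= v.size();
    return m3 / std::pow(m2, 1.5);
  }

 private:
  // Bounds-checked access to one non-empty dimension.
  const std::vector<double>& Dimension(const std::size_t dim) const
  {
    assert(dim < data.size() && !data[dim].empty());
    return data[dim];
  }

  std::vector<std::vector<double>> data;
};
```

Each call only walks the requested dimension, which is the property behind the one-dimension timing difference reported in this discussion. The actual numbers depend on the real implementation and are not reproduced by this sketch.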

14:32 travis-ci has joined #mlpack

14:32 < travis-ci> mlpack/mlpack#1258 (master - 82cf865 : Ryan Curtin): The build is still failing.

14:32 < travis-ci> Change view : https://github.com/mlpack/mlpack/compare/57a24a9d9fbd...82cf86500e8a

14:32 < travis-ci> Build details : https://travis-ci.org/mlpack/mlpack/builds/146385563

14:32 travis-ci has left #mlpack []

14:33 < zoq> keonkim: Doesn't arma::max( Q, dim ) do the same thing?

14:33 < keonkim> on all dimensions, arma::statistics functions took 39.903932s whereas the new functions took about 45s. on just one dimension, the armadillo statistics functions took 21.442109s (because it calculates for all other dimensions anyways) and the new functions took 0.246790s

14:34 < mentekid> rcurtin: Now I am completely baffled... My boost_backport code should completely ignore the backported code (for my system, for example).

14:34 < keonkim> dim in arma::max(Q, dim) just indicates whether the matrix is column-major or row-major

14:34 < mentekid> But it includes it nonetheless, and that produces an error because I declare boost::math::policies::digits_base10 and it is already declared

14:34 < zoq> keonkim: ah, right

14:35 < keonkim> fyi the file size for the above tests was 400 x 100600

14:36 < rcurtin> mentekid: are you sure that you aren't including the backported versions accidentally somewhere?

14:36 < rcurtin> another thing to ensure is that boost/version.hpp is being included before boost_backport.hpp, otherwise I think BOOST_VERSION might be undefined

14:36 < rcurtin> maybe I should have said that earlier... I forgot about that part

14:37 < mentekid> ah

14:37 < mentekid> so the else is executed

14:37 < rcurtin> I think right now prereqs.hpp includes boost/version.hpp but you can probably move that to boost_backport.hpp

14:37 < rcurtin> yeah, I think if you don't define the variable, BOOST_VERSION will be set to 0

14:37 < rcurtin> (which will use your backported version)
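
The behaviour rcurtin describes follows from a preprocessor rule: an identifier that is not defined as a macro is replaced by 0 inside an `#if` expression, with no error. A minimal sketch, assuming a hypothetical macro `HYPOTHETICAL_LIB_VERSION` in place of `BOOST_VERSION` and an arbitrary threshold of 105800. The actual condition used in boost_backport.hpp is not shown in this log and may differ.

```cpp
#include <string>

// HYPOTHETICAL_LIB_VERSION stands in for BOOST_VERSION and is deliberately
// left undefined, as happens when the header that defines it is never
// included before the check.

// An undefined identifier inside #if is replaced by 0, so this comparison
// becomes 0 < 105800 and the fallback branch is selected silently.
#if HYPOTHETICAL_LIB_VERSION < 105800
inline std::string SelectedImplementation() { return "backport"; }
#else
inline std::string SelectedImplementation() { return "native"; }
#endif

// A guard like the following turns the silent fallback into a compile error.
// It is left commented out so that this sketch still compiles:
//
// #ifndef HYPOTHETICAL_LIB_VERSION
//   #error "include the version header before this file"
// #endif
```

Including the header that defines the version macro at the top of the file that tests it removes the dependence on include order.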

14:37 < mentekid> I don't see boost/version.hpp included in prereqs

14:39 < rcurtin> hm, you're right

14:39 < mentekid> yep that was it

14:39 < rcurtin> yeah, maybe it was an error that boost/version.hpp was not included in prereqs.hpp

14:39 < mentekid> I have to stop thinking of cmake as a wizard that gives me library versions :P

14:40 < rcurtin> or possibly, some other part of boost we had included before the #if was including version.hpp

14:40 < rcurtin> if CMake was a wizard, it is a very bad one :)

14:40 < zoq> keonkim: Couldn't we just use the arma::statistics if we go over all dimensions and the modified code if we just use one dimension?

14:41 < mentekid> I just expect it to do what I would like it to do... So now I expected it to set BOOST_VERSION to the correct number

14:41 < mentekid> now it's building. cool

14:43 < rcurtin> yeah; in this case BOOST_VERSION comes directly from version.hpp, not CMake

14:44 < rcurtin> (though it would be possible to have CMake set BOOST_VERSION too by adding something like -DBOOST_VERSION=105800 or whatever to the compile flags)

14:45 < zoq> rcurtin: Can you see the settings in: https://ci.appveyor.com/team?

14:45 < keonkim> zoq: I was thinking the same... I might just put this inside the descriptive statistics program then.

14:46 < zoq> keonkim: I like the interface, so I think that would be the best solution here.

14:48 < rcurtin> zoq: I see "rcurtin team", so I wonder if I am logged in with the wrong account, or if my account is not linked to the mlpack account, or something

14:48 < rcurtin> but I can see in my profile that I am a collaborator on the mlpack team

14:48 < rcurtin> er, the mlpack "account" is the word they use there

14:54 < zoq> hm, the github Team "mlpack/Owners" has the APPVEYOR ROLE: "Administrator"

14:55 tham has joined #mlpack

15:00 keonkim has quit [Quit: PanicBNC - http://PanicBNC.net]

15:07 < tham> keonkim : Hi, I like the api of your Statistics class too

15:11 < tham> The original purpose is to provide an api that looks like R's describe and summary

15:11 < tham> These functions can help the users gain some insight into their data

15:12 < tham> If you find generating formatted output hard, boost format is a good choice

15:12 < tham> or you could use the printf api

15:12 < tham> sprintf
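
A minimal sketch of the printf-style option tham mentions, using `std::snprintf` (the bounded variant of `sprintf`) to lay out one row of a summary table. The function name `FormatSummaryRow`, the choice of columns, and the widths are hypothetical and not taken from the descriptive statistics program.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats one row of a summary table with snprintf.
// The column widths and precision are arbitrary choices for illustration.
inline std::string FormatSummaryRow(const std::string& name,
                                    const double min,
                                    const double max,
                                    const double mean)
{
  char buffer[128];
  // %-10s left-aligns the label in 10 characters; %10.3f right-aligns a
  // value in 10 characters with 3 digits after the decimal point.
  const int written = std::snprintf(buffer, sizeof(buffer),
      "%-10s%10.3f%10.3f%10.3f", name.c_str(), min, max, mean);
  // snprintf returns a negative value on error; on overflow it truncates
  // the output but still null-terminates the buffer.
  if (written < 0)
    return std::string();
  return std::string(buffer);
}
```

Because the buffer size is passed explicitly, an unexpectedly long label truncates the row instead of overrunning the buffer, which plain `sprintf` would not guarantee.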

15:13 < tham> I am trying to solve the bottleneck of file loading, will open a pull request this weekend if everything goes smoothly

15:14 < tham> before that I think I would like to merge the pull request #694

15:15 < tham> about #694, should we remove the api with output parameter? I could finish this part after it is merged

15:18 < rcurtin> tham: I'll look over #694 today or tomorrow and provide any final comments; I think it is much improved

15:18 < tham> boost::format is flexible, easy to use but not a fast library, to make things faster, we can avoid recreating the boost::format over and over again

15:20 < tham> rcurtin : agree, the api with output was suggested by me, feel bad to ask keon to remove it after the trouble he went through

15:21 < rcurtin> it's ok, sometimes that is the best option. I have written very many parts of mlpack only to go back and remove them entirely later :)

15:22 < rcurtin> it's not ideal, certainly, but I think it's somewhat unavoidable

15:26 Mathnerd314 has joined #mlpack

15:27 < tham> porting fast csv parser to mlpack, this parser is quite fast as I mentioned in #681, I believe this solution should solve the problems of speed and compile time at the same time

15:27 < rcurtin> tham: this one? :7780

15:27 < rcurtin> oops, bad paste... this one? https://github.com/ben-strasser/fast-cpp-csv-parser

15:28 < tham> rcurtin : yes

15:28 sumedhghaisas has quit [Ping timeout: 240 seconds]

15:28 < rcurtin> how does it compare to boost::spirit in runtime?

15:30 nilay has joined #mlpack

15:32 < tham> rcurtin : haven't compared it with boost::spirit yet, but it is very fast too; it can parse a csv file of close to 40MB in around 181ms

15:33 < tham> I did the performance measurement at https://github.com/mlpack/mlpack/pull/681

15:33 govg has quit [Ping timeout: 276 seconds]

15:36 < tham> Compared with spirit, it still takes me 2200ms to post-process the strings parsed by fast csv parser and build the DataSetMapper

15:37 < rcurtin> tham: 40MB in 181ms is pretty amazing

15:37 < rcurtin> I'm still playing with #681 to try and accelerate the compile time, but I have not had time to look into it in too much detail

15:39 < tham> rcurtin : it is ok, I would try to develop a smaller, less compile time expensive solution for this problem

15:39 < rcurtin> I think it might be possible to keep the compile time down and also use boost::spirit, but I have not figured out exactly how yet

15:44 < tham> rcurtin : I would try to look into this problem too, fast csv parser can parse the elements into separate strings but does not know whether they are numerical or categorical

15:48 < tham> To make it work we need a converter too, I think the compile time issue is related to #722?
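
One way to picture the converter step tham mentions: a CSV parser hands back string tokens, and something has to decide whether each token is a number or a category label. The sketch below is a hypothetical illustration using `std::strtod`. The class `TokenConverter` and its per-token policy are assumptions, and it is not mlpack's mapper nor part of the fast csv parser.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <unordered_map>

// Hypothetical converter: numeric tokens are parsed as doubles, everything
// else is assigned an integer category id.
class TokenConverter
{
 public:
  // Returns true and sets value if the whole token parses as a number.
  static bool ParseNumeric(const std::string& token, double& value)
  {
    if (token.empty())
      return false;
    char* end = nullptr;
    value = std::strtod(token.c_str(), &end);
    // Require that strtod consumed every character of the token, so that
    // tokens such as "12abc" are not mistaken for numbers.
    return end == token.c_str() + token.size();
  }

  // Numeric tokens map to their value; other tokens map to an id handed
  // out in order of first appearance (0, 1, 2, ...).
  double Convert(const std::string& token)
  {
    double value = 0.0;
    if (ParseNumeric(token, value))
      return value;
    // emplace leaves an existing entry untouched and returns it.
    const auto result = categories.emplace(token, categories.size());
    return static_cast<double>(result.first->second);
  }

  // Number of distinct categorical tokens seen so far.
  std::size_t NumCategories() const { return categories.size(); }

 private:
  std::unordered_map<std::string, std::size_t> categories;
};
```

This decides token by token, which is a simplification: a real loader would more plausibly treat a whole dimension as categorical once any of its tokens fails to parse. `std::strtod` is also locale-dependent and accepts forms such as "inf" and "nan", so stricter input rules would need extra checks.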

15:49 < rcurtin> yep, I thought opening #722 would allow us to help reduce the compile time in other ways too

15:49 < rcurtin> it's possible that we could reduce the compile time enough in other ways that including boost spirit would not be a big issue

15:54 tham has quit [Quit: Page closed]

16:25 keonkim has joined #mlpack

16:30 mentekid has quit [Ping timeout: 244 seconds]

16:41 sumedhghaisas has joined #mlpack

16:45 < sumedhghaisas> marcosirc: Hey marcos...

16:45 < marcosirc> sumedhghaisas: Hi!

16:46 < sumedhghaisas> I have looked at your example...

16:46 < sumedhghaisas> this is the example to show the reduced tau value right?

16:47 < marcosirc> Yeah.

16:48 < sumedhghaisas> okay I understood that... I am still not able to decide between the implementations ...

16:48 < marcosirc> also, it shows that the overlapping is different in different parts of the decision boundary

16:48 < marcosirc> Ok.

16:50 < sumedhghaisas> overlapping is different? I didn't get that... you mean the distance from decision boundary to p1 and to p2?

16:51 < marcosirc> Yes.

16:54 < sumedhghaisas> ahh yes that I understood...

16:55 < sumedhghaisas> I think we should implement the original algorithm ... I have 2 reasons behind that...

16:55 < sumedhghaisas> it will provide us a benchmark so that we can evaluate your variations effectively...

16:56 < sumedhghaisas> second... I am assuming spill trees are used widely ....

16:57 < sumedhghaisas> So it won't be a good decision to implement an improvised version of spill trees but not the original...

16:57 < sumedhghaisas> although

16:57 < sumedhghaisas> regarding dual tree algorithms...

16:58 < sumedhghaisas> I think your way is better...

16:59 < sumedhghaisas> Implementing both ways is also not an option as they will be executed in different frameworks...

17:00 govg has joined #mlpack

17:03 < sumedhghaisas> It's difficult to gauge their effectiveness... like which one is better... the only way to prove that hrect bounds are better is to compare it to the actual implementation...

17:03 < sumedhghaisas> what is your opinion?

17:08 < marcosirc> Ok, yes I agree that the difference in the implementations is a problem if we want to benchmark and make assertions about spill trees in general.

17:09 < marcosirc> mmm (I am thinking)

17:19 sumedhghaisas has quit [Ping timeout: 240 seconds]

17:22 < marcosirc> sumedhghaisas: ups.. it looks like you are offline.

17:22 < marcosirc> sumedhghaisas: So, I didn't understand what you meant when you said: Implementing both ways is also not an option as they will be executed in different frameworks..

17:24 < marcosirc> Also, I think the difference is not only the speed (how many nodes are pruned), but also the accuracy. In the "bound's approach" the amount of overlapping is different in different parts of the decision boundary, so it will approximate differently...

17:24 < marcosirc> So Ok, if you agree I will implement spill trees again in a new branch

17:25 < marcosirc> with the approach proposed in Ting Liu's paper. Using the same cutting hyperplanes when building the tree and when searching the tree.

17:27 < marcosirc> I will work on this over the next few days and I will write an email with the results.

17:27 < marcosirc> would you agree?
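
The hyperplane-with-overlap idea discussed above can be sketched as follows: points within a distance tau of the decision boundary are stored in both children. This is only an illustration under stated assumptions. The names `SpillSplit` and `SplitWithOverlap` are hypothetical, the projection of each point onto the splitting direction is assumed to be computed already, and this is neither marcosirc's code nor necessarily the exact procedure of the paper he refers to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct SpillSplit
{
  std::vector<std::size_t> left;   // Indices of points sent to the left child.
  std::vector<std::size_t> right;  // Indices of points sent to the right child.
};

// projections[i] is the coordinate of point i along the splitting direction,
// boundary is the position of the decision boundary on that direction, and
// tau >= 0 is the half-width of the overlap buffer.
inline SpillSplit SplitWithOverlap(const std::vector<double>& projections,
                                   const double boundary,
                                   const double tau)
{
  assert(tau >= 0.0);
  SpillSplit split;
  for (std::size_t i = 0; i < projections.size(); ++i)
  {
    // Points up to tau past the boundary still belong to the left child...
    if (projections[i] <= boundary + tau)
      split.left.push_back(i);
    // ...and points up to tau before it still belong to the right child,
    // so anything inside the buffer is stored in both.
    if (projections[i] >= boundary - tau)
      split.right.push_back(i);
  }
  return split;
}
```

With a single boundary and a fixed tau the overlap has the same width everywhere along the hyperplane, which is the contrast marcosirc draws with the bound-based variant, where the amount of overlap varies along the decision boundary.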

17:30 sumedhghaisas has joined #mlpack

17:42 < sumedhghaisas> rcurtin: Hey Ryan...

17:43 < sumedhghaisas> I am trying to install lcov on the jenkins machine....

17:43 < sumedhghaisas> its giving me error...

17:47 sumedhghaisas has quit [Ping timeout: 250 seconds]

17:48 < rcurtin> sumedhghaisas: lcov installed on big, should work now

17:49 < marcosirc> sumedhghaisas: I wrote some messages while you were online.

17:49 < marcosirc> *offline

17:50 < rcurtin> hopefully he checks the logs :)

17:50 < marcosirc> I will send him the messages in private

17:51 < marcosirc> uhh I didn't realize he is offline again :|

17:55 < rcurtin> :)

17:55 sumedhghaisas has joined #mlpack

17:56 < rcurtin> sumedhghaisas: I installed lcov on big, it should work now

17:56 < sumedhghaisas> ohh great I will give it a build now...

18:02 sumedhghaisas_ has joined #mlpack

18:04 chrismeyer_ has joined #mlpack

18:05 chrismeyer has quit [Ping timeout: 276 seconds]

18:05 chrismeyer_ is now known as chrismeyer

18:06 sumedhghaisas has quit [Ping timeout: 264 seconds]

18:19 sumedhghaisas_ has quit [Ping timeout: 260 seconds]

18:23 chrismeyer has quit [Quit: chrismeyer]

18:24 nilay has quit [Quit: Page closed]

18:25 chrismeyer has joined #mlpack

18:33 travis-ci has joined #mlpack

18:33 < travis-ci> mlpack/mlpack#1260 (master - 4c88348 : Ryan Curtin): The build is still failing.

18:33 < travis-ci> Change view : https://github.com/mlpack/mlpack/compare/82cf86500e8a...4c88348ddc9a

18:33 < travis-ci> Build details : https://travis-ci.org/mlpack/mlpack/builds/146448127

18:33 travis-ci has left #mlpack []

18:48 sumedhghaisas has joined #mlpack

19:04 marcosirc has quit [Quit: WeeChat 1.4]

19:04 marcosirc has joined #mlpack

19:19 mentekid has joined #mlpack

19:35 sumedhghaisas has quit [Ping timeout: 264 seconds]

19:58 marcosirc has quit [Quit: WeeChat 1.4]

21:31 mentekid has quit [Ping timeout: 244 seconds]

21:36 mentekid has joined #mlpack

22:26 mentekid has quit [Ping timeout: 240 seconds]

23:36 chrismeyer has quit [Ping timeout: 258 seconds]