#mlpack on 2016-06-17 — irc logs at libera.irclog.whitequark.org

2015-01-15 23:05 verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/

04:48 Mohith has joined #mlpack

04:52 < Mohith> hello guyz

05:28 Mohith has quit [Ping timeout: 250 seconds]

05:49 sumedhghaisas has joined #mlpack

06:05 Stellar_Mind has joined #mlpack

06:17 Stellar_Mind has quit [Ping timeout: 276 seconds]

06:21 < marcosirc> clear

06:21 marcosirc has quit [Quit: WeeChat 1.4]

06:31 Stellar_Mind has joined #mlpack

06:33 mohiths has joined #mlpack

06:36 < mohiths> hello i need help

06:36 < mohiths> i installed visual studio 2015

06:36 Stellar_Mind has quit [Ping timeout: 264 seconds]

06:37 < mohiths> when i clicked manage nuget packages for solution

06:37 < mohiths> it is showing me error that no projects are supported by nuget

06:37 < mohiths> please help me

06:40 Stellar_Mind has joined #mlpack

06:43 Mathnerd314 has quit [Ping timeout: 272 seconds]

06:46 < mohiths> hello

06:46 < mohiths> Can anyone help me out!

06:47 < keonkim> hello

06:47 < keonkim> mohiths: I wrote this tutorial -> http://keon.io/mlpack-on-windows.html

06:47 < mohiths> yeah i'm following that only

06:48 < mohiths> but when i clicked manage nuget packages for solution

06:48 < keonkim> mohiths: yap

06:48 < mohiths> it is showing error that no projects are supported by nuget

06:49 < mohiths> There is some problem with nuget

06:50 < keonkim> hmm... unfortunately I don't use vs anymore, don't know if I can help without recreating the situation

06:50 < keonkim> what step were you on before getting that error message?

06:50 < mohiths> oh should i switch to linux now ?

06:50 < mohiths> is that the only solution that i have?

06:51 < mohiths> ok wait let me explain ......

06:51 < mohiths> I'm in step 2 of your tutorial

06:52 < mohiths> i created a project file using existing code

06:53 < mohiths> then it is given that i need to go to tools and click nuget package manager and then nuget packages for solutions

06:53 < mohiths> and there i'm getting error that no projects are supported by nuget

06:54 < mohiths> hey bro there ?

06:55 < keonkim> are you using the latest VS 2015?

06:55 < keonkim> yup

06:55 < keonkim> I was searching on google

06:55 < keonkim> :)

06:56 < mohiths> yeah VS 2015

06:56 < keonkim> is it updated after Oct 2015?

06:57 < mohiths> yeah It's the latest version

06:59 < keonkim> hmm strange, there is not much I can find on the internet

07:00 < mohiths> yeah me too

07:00 < mohiths> should i switch to linux? is it easier?

07:01 < keonkim> at least for me using it on linux is much easier.

07:02 < mohiths> oh good

07:02 < mohiths> can you please send me any links .. how to install mlpack on linux?

07:03 < mohiths> also tell me which version of linux is better ?

07:04 < keonkim> I use ubuntu. you can follow the README on github: https://github.com/mlpack/mlpack

07:06 < mohiths> which version of ubuntu ?

07:08 < keonkim> Any stable version should be fine.

07:08 < keonkim> I tried with 14.04 and 16.04

07:08 < mohiths> okay

07:08 < mohiths> what about 15.10?

07:08 < keonkim> 15.05 I never tried.

07:09 < keonkim> but it should be fine.

07:09 < mohiths> okay thank you

07:09 < keonkim> 14.04 and 16.04 are Long Term Support versions

07:09 < mohiths> time has come to switch to linux

07:09 < keonkim> so I recommend those

07:09 < mohiths> ok i'm installing 14.04

07:09 < mohiths> thanks a lot

07:11 mohiths has quit [Quit: Page closed]

07:11 < keonkim> but if you are comfortable with windows, fixing it should take less time than learning Linux environment from scratch.

07:16 nilay has joined #mlpack

08:23 mentekid has quit [Ping timeout: 244 seconds]

08:23 mentekid has joined #mlpack

08:43 TD has quit [Ping timeout: 250 seconds]

08:48 tsathoggua has joined #mlpack

08:49 tsathoggua has quit [Client Quit]

09:43 Stellar_Mind has quit [Ping timeout: 244 seconds]

09:48 nilay has quit [Quit: Page closed]

12:58 sumedhghaisas has quit [Quit: http://www.kiwiirc.com/ - A hand crafted IRC client]

15:01 marcosirc has joined #mlpack

15:24 Mathnerd314 has joined #mlpack

16:11 TD has joined #mlpack

16:11 < TD> Has anyone ever installed mlpack and created a working solution in a windows environment?

16:22 < marcosirc> Hi! I think tham uses mlpack in windows. But he is not online now...

16:28 < TD> Okay, just wanted to know if this was possible. I'm stuck in between trying to fix a data feed so it works in Linux or trying to install mlpack in Windows.

16:28 < mentekid> didn't keon write a nice tutorial for mlpack on windows 10?

16:29 < mentekid> I think I saw a link a few days ago floating around... Let me check (or have you seen that?)

16:29 < mentekid> yep here: http://keon.io/mlpack-on-windows.html

16:30 < TD> Yeah, and it worked great. Mlpack is successfully installed. When I try to build a custom solution though, I am getting errors that certain files can't be opened. I reached out to VS, and they told me to contact the creator of the library

16:31 < TD> certain libraries

16:31 < TD> I haven't given up but just wanted to know if someone has programmed a successful solution

16:31 < TD> hope :-0

16:31 < zoq> TD: Does the same error occur if you open one of the mlpack executables?

16:32 < TD> zoq: That's my hope right now

16:32 < TD> I will let you know

16:32 < TD> I am going to try it tonight

16:33 < zoq> okay, thanks

16:33 < TD> persistently stupid has always been my strong point in solving problems :-)

16:34 < TD> Or stupid persistence

16:34 < TD> It's a combination

16:35 < zoq> nah, I'm sure, we can figure it out somehow

16:38 nilay has joined #mlpack

16:42 kwikadi has quit [Remote host closed the connection]

16:50 kwikadi has joined #mlpack

16:52 < marcosirc> rcurtin: are you online?

16:58 < rcurtin> marcosirc: yeah, I just got back from lunch

16:59 < marcosirc> rcurtin: ok! I have included more documentation as requested in the PR.

16:59 < marcosirc> And, also, I fixed an error in serialization.

16:59 < marcosirc> Loading was not working.

17:00 < rcurtin> loading of NSModel was not working?

17:01 < rcurtin> I had tested that, but it is possible something changed. one of the problems is, we only have tests for C++ code, not really for the command-line programs

17:01 < rcurtin> I think that at some point we should figure out how to test the command-line programs and integrate that with the rest of the mlpack tests

17:01 < rcurtin> but I haven't had a chance to look into it fully (I think maybe that should be next after the 2.0.2 release, which would have happened except I found a bug...)

17:02 < marcosirc> Yeah. I agree, I was planning to include some test in serialization of NSModel.

17:02 < rcurtin> the documentation looks great, thanks for taking the time to do that

17:02 < marcosirc> Thanks!

17:03 < rcurtin> ah, I see the serialization issue you were talking about now

17:03 < rcurtin> I think we could probably make boost::variant serialize the right thing, but then we would need to have it hold something like SecondShim<NeighborSearch<...>> instead of NeighborSearch<...> objects

17:03 < marcosirc> yeah. I have thought of two different ways of fixing it.

17:03 < rcurtin> and I am not sure that is worth the extra effort

17:04 < rcurtin> have you seen src/mlpack/core/data/serialization_shim.hpp? lots of effort for such a minor change ...

17:04 < marcosirc> Yeah I have seen that...

17:05 < marcosirc> I have fixed serialization. Now it works. I use boost serialization for variants.

17:05 < rcurtin> yeah, I agree, the solution you pushed to the PR looks good to me

17:06 < marcosirc> Ok. Thanks. That is all I wanted to know.

17:06 < rcurtin> sure, glad I could help. today is a paper deadline day so I am not able to commit so much time to mlpack today, unfortunately...

17:08 < marcosirc> rcurtin: sure, no problem! Thanks for your time.

17:24 < mentekid> rcurtin: ok so all the vectorization amounted to nothing

17:24 < mentekid> search is 2-3 times slower now :(

17:25 < rcurtin> hmm :(

17:26 < rcurtin> well I still think it is useful that you wrote the code, because now we know :)

17:26 < rcurtin> do you have the code pasted somewhere so I can glance at it?

17:26 < rcurtin> maybe I will have an idea to speed it up, or, maybe we will be forced to conclude that it wasn't helpful

17:27 < mentekid> yep take a look here: https://github.com/mentekid/mlpack/tree/lsh-basecaseopt

17:28 < mentekid> I thought it might be the sorting, but I commented it out and still got bad results so it's not that

17:34 < rcurtin> it looks like you are calculating all of the reference set distances at once, then sorting and inserting

17:34 < rcurtin> I wonder if you might be better off sequentially calculating each distance, like the BaseCase() loop

17:34 < rcurtin> but it sounds like maybe you have done some profiling of what is fast and what is slow and maybe even that would not speed things up

17:36 < mentekid> I think the main reason the vectorization is faster is because I'm doing the matrix vector multiplication

17:37 < mentekid> and I saw some pretty good cpu usage at that point, close to 400% for my 4-core machine

17:38 < mentekid> which is OpenBLAS using parallelism in the background

17:38 < mentekid> I can try doing it as you say sequentially though, which will indeed skip the sorting

17:39 < mentekid> But still sorting doesn't seem to be the problem in the end

17:39 < rcurtin> so the cost of assembling the vector of norms is too high then, I guess?

17:40 < mentekid> let me profile the code with armadillo debugging symbols so I know, but I guess that's the waste yeah

17:40 < mentekid> I guessed if we did enough calculations it would offset that, but apparently the candidate sets are smaller so it's too hard to balance it out

18:47 < marcosirc> Hi @zoq

18:48 < marcosirc> to benchmark some libraries all I have to do is make run... and make reports.... isn't it?

18:54 < zoq> marcosirc: If you are going to use the javascript interface there is no need to run 'make reports', but you need to set LOG=True. Also you could set BLOCK and METHODBLOCK to speed things up:

18:54 < zoq> make CONFIG=commit-benchmark.yaml MLPACK_BIN=/home/marcus/workspace/mlpack-release/bin/ MLPACK_BIN_DEBUG=/home/marcus/workspace/mlpack-release/bin/ BLOCK=mlpack METHODBLOCK=LSH LOG=True run

18:57 < zoq> If you need a machine to run some benchmarks ... I can provide access to a machine that already comes with a working benchmark system setup

18:59 < marcosirc> zoq: thanks for your reply.

18:59 < marcosirc> I don't know why I can't see any graphic...

19:00 < marcosirc> I execute make run .. as you suggested

19:00 < marcosirc> then I go to the reports directory, and I execute: "python -m SimpleHTTPServer"

19:01 < marcosirc> when I open the html page and select different options, nothing is shown...

19:01 < zoq> for any view?

19:01 < marcosirc> yes. I can't see any graphic for any view.

19:02 < zoq> hm, maybe the database is empty, can you test it with: http://big.mlpack.org/job/benchmark%20-%20mlpack%20-%20nightly/ws/reports/benchmark-daily.db

19:03 < marcosirc> mmm I don't think so because I can see the different options and methods that I have run.

19:03 < marcosirc> I will try that anyway.

19:05 < marcosirc> Mmm with your db everything works fine... it is strange... I will clean everything and start again.

19:06 < zoq> hm, can you run the benchmark with LOG=False and check if you get any results?

19:08 < marcosirc> Ok, I will check.

19:22 < marcosirc> with LOG=False it is exactly the same. I will pull the last changes from your repo and try again.

19:24 < zoq> with LOG=False the results are printed to stdout

19:26 < marcosirc> Yeah, they were printed.

19:28 < zoq> hm, can you send me the database file?

19:42 < marcosirc> zoq: I sent you an email with the db file.

19:45 < mentekid> rcurtin: sorry for being late with my response. I just completed the profiling. It seems like the final sorting actually does account for a part of the delay - around 30% of total time

19:46 < mentekid> but that alone doesn't justify the times I saw, where my "optimized" code was 2-3 times slower than the original

19:46 < mentekid> also for some reason the sift dataset resists being profiled, i have no idea why

19:50 < rcurtin> what do you mean? I dunno how a dataset can resist :)

19:50 < mentekid> I think it's just that it is small so the profiler is confused in the noise

19:50 < rcurtin> maybe use a larger version? :)

19:50 < mentekid> yeah that's what I'm doing now

19:51 < rcurtin> :)

19:51 < zoq> marcosirc: Thanks, let me take a look.

20:00 tsathoggua has joined #mlpack

20:03 tsathoggua has quit [Client Quit]

20:14 nilay has quit [Ping timeout: 250 seconds]

20:15 < zoq> marcosirc: Does "options: '-k 3 --seed 42 -e 0.05'" end with a space in your config file?

20:23 < marcosirc> zoq: yes.

20:23 < marcosirc> is that the problem?

20:27 < zoq> yes I think so, rc.param_name = param_name_full.split("(")[0].replace(/^\s+|\s+$/g, ''); also truncates the whitespace at the end, so if we query the database with methods.parameters == rc.param_name it doesn't match

20:29 < zoq> Can you rerun the benchmark without the white space at the end?

20:31 < zoq> that's definitely an issue, I'll fix it later today

20:31 < marcosirc> zoq: yeah! that was the problem.

20:32 < marcosirc> Now it works fine. Thanks. I think it would have taken me a long time to find this problem.

20:39 < zoq> you're welcome, I guess, the regex isn't that easy to read

20:43 < mentekid> rcurtin: different datasets, similar results. I re-ran the timing test as well (with sorting, without sorting, with the old code) and the times seem to more or less agree with the profiler... So I guess it was a bad idea :/

20:44 < mentekid> And it's a bummer because OpenBLAS was running on all 4 cores and hitting good cpu usage... I guess we'll have to do parallelization ourselves

20:48 < mentekid> I'll start with the parallel query processing as we discussed. If you come up with any ideas to maybe improve the vectorized code let me know :)

20:49 < mentekid> I'll actually run a few datasets and parameters at night because all my tests were using default parameters

21:37 < rcurtin> openblas on all cores is still underperforming the existing approach?

21:37 < rcurtin> hm... that intuitively seems incorrect to me...

21:38 < rcurtin> but I guess it is possible that the cost of calculating that norms vector is just too high

21:39 < mentekid> But it shouldn't be - that's what I find so weird: We already do that calculation at least once, so why would doing it once in the beginning for all points be that wasteful...

21:41 < rcurtin> let me take a closer look at the code...

21:43 < mentekid> please do, I'm starting to believe I've done something stupid without realizing it

21:43 < rcurtin> I think the call to .cols() should be avoided; that will assemble a copy of the data matrix

21:43 < rcurtin> better to just loop over all the refIndices points and calculate their dot products

21:44 < rcurtin> as a bonus when you do it that way there is no need to store the distances of all candidates and sprt them

21:44 < rcurtin> *sort

21:45 < mentekid> you mean around line 528 right?

21:45 < mentekid> where I create a copy of the reference set

21:50 < rcurtin> yep, line 528

21:51 < mentekid> I see, I'll try doing it like BaseCase - maybe that will work better

22:04 marcosirc has quit [Quit: WeeChat 1.4]

22:29 < TD> Would anyone have the properties needed for VS2015 to run an executable? I am guessing Project - 'Visual C++' - 'Win32Project' - 'DLL' - 'Empty project'

22:31 < TD> And also the solution properties - 'C/C++ - General Additional Include - ?

22:32 < TD> Runtime Library - Multi-thread(/MT) ?

22:33 < TD> Precompiled Header - Not Using Precompiled Headers ?

22:33 < TD> And any other solution properties I am missing?

22:37 < TD> Windows 10 environment

23:22 mentekid has quit [Ping timeout: 246 seconds]