verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
Mohith has joined #mlpack
< Mohith>
hello guyz
Mohith has quit [Ping timeout: 250 seconds]
sumedhghaisas has joined #mlpack
Stellar_Mind has joined #mlpack
Stellar_Mind has quit [Ping timeout: 276 seconds]
< marcosirc>
clear
marcosirc has quit [Quit: WeeChat 1.4]
Stellar_Mind has joined #mlpack
mohiths has joined #mlpack
< mohiths>
hello i need help
< mohiths>
i installed visual studio 2015
Stellar_Mind has quit [Ping timeout: 264 seconds]
< mohiths>
when i clicked manage nuget packages for solution
< mohiths>
it is showing me error that no projects are supported by nuget
< TD>
Has anyone ever installed mlpack and created a working solution in a windows environment?
< marcosirc>
Hi! I think tham uses mlpack in windows. But he is not online now...
< TD>
Okay, just wanted to know if this was possible. I'm stuck in between trying to fix a data feed so it works in Linux or trying to install mlpack in Windows.
< mentekid>
didn't keon write a nice tutorial for mlpack on windows 10?
< mentekid>
I think I saw a link a few days ago floating around... Let me check (or have you seen that?)
< TD>
Yeah, and it worked great. Mlpack is successfully installed. When I try to build a custom solution though, I am getting errors that certain files can't be opened. I reached out to VS and they told me to contact the creator of the library
< TD>
certain libraries
< TD>
I haven't given up but just wanted to know if someone has programmed a successful solution
< TD>
hope :-0
< zoq>
TD: Does the same error occur if you open one of the mlpack executables?
< TD>
zoq: That's my hope right now
< TD>
I will let you know
< TD>
I am going to try it tonight
< zoq>
okay, thanks
< TD>
persistently stupid has always been my strong point in solving problems :-)
< TD>
Or stupid persistence
< TD>
It's a combination
< zoq>
nah, I'm sure, we can figure it out somehow
nilay has joined #mlpack
kwikadi has quit [Remote host closed the connection]
kwikadi has joined #mlpack
< marcosirc>
rcurtin: are you online?
< rcurtin>
marcosirc: yeah, I just got back from lunch
< marcosirc>
rcurtin: ok! I have included more documentation as requested in the PR.
< marcosirc>
And, also, I fixed an error in serialization.
< marcosirc>
Loading was not working.
< rcurtin>
loading of NSModel was not working?
< rcurtin>
I had tested that, but it is possible something changed. one of the problems is, we only have tests for C++ code, not really for the command-line programs
< rcurtin>
I think that at some point we should figure out how to test the command-line programs and integrate that with the rest of the mlpack tests
< rcurtin>
but I haven't had a chance to look into it fully (I think maybe that should be next after the 2.0.2 release, which would have happened except I found a bug...)
< marcosirc>
Yeah, I agree. I was planning to include some tests for serialization of NSModel.
< rcurtin>
the documentation looks great, thanks for taking the time to do that
< marcosirc>
Thanks!
< rcurtin>
ah, I see the serialization issue you were talking about now
< rcurtin>
I think we could probably make boost::variant serialize the right thing, but then we would need to have it hold something like SecondShim<NeighborSearch<...>> instead of NeighborSearch<...> objects
< marcosirc>
yeah. I have thought of two different ways of fixing it.
< rcurtin>
and I am not sure that is worth the extra effort
< rcurtin>
have you seen src/mlpack/core/data/serialization_shim.hpp? lots of effort for such a minor change ...
< marcosirc>
Yeah I have seen that...
< marcosirc>
I have fixed serialization. Now it works. I use boost serialization for variants.
< rcurtin>
yeah, I agree, the solution you pushed to the PR looks good to me
< marcosirc>
Ok. Thanks. That is all I wanted to know.
< rcurtin>
sure, glad I could help. today is a paper deadline day so I am not able to commit so much time to mlpack today, unfortunately...
< marcosirc>
rcurtin: sure, no problem! Thanks for your time.
< mentekid>
rcurtin: ok so all the vectorization amounted to nothing
< mentekid>
search is 2-3 times slower now :(
< rcurtin>
hmm :(
< rcurtin>
well I still think it is useful that you wrote the code, because now we know :)
< rcurtin>
do you have the code pasted somewhere so I can glance at it?
< rcurtin>
maybe I will have an idea to speed it up, or, maybe we will be forced to conclude that it wasn't helpful
< mentekid>
I thought it might be the sorting, but I commented it out and still got bad results so it's not that
< rcurtin>
it looks like you are calculating all of the reference set distances at once, then sorting and inserting
< rcurtin>
I wonder if you might be better off sequentially calculating each distance, like the BaseCase() loop
< rcurtin>
but it sounds like maybe you have done some profiling of what is fast and what is slow and maybe even that would not speed things up
< mentekid>
I think the main reason the vectorization is faster is because I'm doing the matrix vector multiplication
< mentekid>
and I saw some pretty good cpu usage at that point, close to 400% for my 4-core machine
< mentekid>
which is openBLAS using parallelism in the background
< mentekid>
I can try doing it as you say sequentially though, which will indeed skip the sorting
< mentekid>
But still sorting doesn't seem to be the problem in the end
< rcurtin>
so the cost of assembling the vector of norms is too high then, I guess?
< mentekid>
let me profile the code with armadillo debugging symbols so I know, but I guess that's the waste yeah
< mentekid>
I guessed if we did enough calculations it would offset that, but apparently the candidate sets are smaller so it's too hard to balance it out
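For reference, the "vector of norms" idea in this exchange rests on the expansion ||q - r||^2 = ||q||^2 - 2 q.r + ||r||^2: the reference norms are computed once, and each query then only needs dot products. Below is a minimal pure-Python sketch of that identity. The data and the helper names (`dot`, `direct_sq_dist`, `expanded_sq_dists`) are made up for illustration; this is not mlpack's LSH code.

```python
def dot(a, b):
    # Plain dot product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))

def direct_sq_dist(q, r):
    # Squared Euclidean distance computed coordinate by coordinate.
    return sum((x - y) ** 2 for x, y in zip(q, r))

def expanded_sq_dists(q, refs, ref_sq_norms):
    # Squared distances via ||q||^2 - 2 q.r + ||r||^2,
    # reusing the precomputed squared norms of the reference points.
    q_sq = dot(q, q)
    return [q_sq - 2.0 * dot(q, r) + n for r, n in zip(refs, ref_sq_norms)]

# Hypothetical toy data.
refs = [[0.0, 1.0], [2.0, 2.0], [-1.0, 3.0]]
ref_sq_norms = [dot(r, r) for r in refs]  # computed once, up front
query = [1.0, 1.0]

expanded = expanded_sq_dists(query, refs, ref_sq_norms)
direct = [direct_sq_dist(query, r) for r in refs]
```

Both lists hold the same distances. As mentekid notes, the up-front norm computation only pays off if enough candidates are evaluated per query to amortize it.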
< marcosirc>
Hi @zoq
< marcosirc>
to benchmark some libraries all I have to do is make run... and make reports.... isn't it?
< zoq>
marcosirc: If you are going to use the JavaScript interface there is no need to run 'make reports', but you need to set LOG=True. Also you could set BLOCK and METHODBLOCK to speed things up:
< zoq>
make CONFIG=commit-benchmark.yaml MLPACK_BIN=/home/marcus/workspace/mlpack-release/bin/ MLPACK_BIN_DEBUG=/home/marcus/workspace/mlpack-release/bin/ BLOCK=mlpack METHODBLOCK=LSH LOG=True run
< zoq>
If you need a machine to run some benchmarks ... I can provide access to a machine that already comes with a working benchmark system setup
< marcosirc>
zoq: thanks for your reply.
< marcosirc>
I don't know why I can't see any graphic...
< marcosirc>
I execute make run .. as you suggested
< marcosirc>
then I go to the reports directory, and I execute: "python -m SimpleHTTPServer"
< marcosirc>
when I open the html page and select different options, nothing is shown...
< zoq>
for any view?
< marcosirc>
yes. I can't see any graphic for any view.
< marcosirc>
mmm I don't think so because I can see the different options and methods that I have run.
< marcosirc>
I will try that anyway.
< marcosirc>
Mmm with your db everything works fine... it is strange... I will clean everything and start again.
< zoq>
hm, can you run the benchmark with LOG=False and check if you get any results?
< marcosirc>
Ok, I will check.
< marcosirc>
with LOG=False it is exactly the same. I will pull the last changes from your repo and try again.
< zoq>
with LOG=False the results are printed to stdout
< marcosirc>
Yeah, they were printed.
< zoq>
hm, can you send me the database file?
< marcosirc>
zoq: I sent you an email with the db file.
< mentekid>
rcurtin: sorry for being late with my response. I just completed the profiling. It seems like the final sorting actually does account for a part of the delay - around 30% of total time
< mentekid>
but that alone doesn't justify the times I saw, where my "optimized" code was 2-3 times slower than the original
< mentekid>
also for some reason the sift dataset resists being profiled, i have no idea why
< rcurtin>
what do you mean? I dunno how a dataset can resist :)
< mentekid>
I think it's just that it is small so the profiler is confused in the noise
< rcurtin>
maybe use a larger version? :)
< mentekid>
yeah that's what I'm doing now
< rcurtin>
:)
< zoq>
marcosirc: Thanks, let me take a look.
tsathoggua has joined #mlpack
tsathoggua has quit [Client Quit]
nilay has quit [Ping timeout: 250 seconds]
< zoq>
marcosirc: Does "options: '-k 3 --seed 42 -e 0.05'" end with a space in your config file?
< marcosirc>
zoq: yes.
< marcosirc>
is that the problem?
< zoq>
yes I think so, rc.param_name = param_name_full.split("(")[0].replace(/^\s+|\s+$/g, ''); also truncates the whitespace at the end, so if we query the database with methods.parameters == rc.param_name it doesn't match
< zoq>
Can you rerun the benchmark without the white space at the end?
< zoq>
that's definitely an issue, I'll fix it later today
< marcosirc>
zoq: yeah! that was the problem.
< marcosirc>
Now it works fine. Thanks. I think it would have taken me a lot of time to find this problem.
< zoq>
you're welcome, I guess, the regex isn't that easy to read
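The mismatch zoq describes can be reproduced in a few lines. This is a minimal sketch, assuming the stored parameter string keeps its trailing space while the lookup key is trimmed. The Python regex mirrors the JavaScript `.replace(/^\s+|\s+$/g, '')` from the chat; the `(label)` suffix and the variable names are hypothetical, not taken from the benchmark code.

```python
import re

def trim(text):
    # Python equivalent of JavaScript's .replace(/^\s+|\s+$/g, ''):
    # strip leading and trailing whitespace.
    return re.sub(r"^\s+|\s+$", "", text)

# Value as it would be stored when the config line ends with a space.
stored_parameters = "-k 3 --seed 42 -e 0.05 "

# Hypothetical display string; the "(label)" suffix is an assumption.
param_name_full = stored_parameters + "(label)"

# Lookup key built the way the chat describes: split on "(" and trim.
param_name = trim(param_name_full.split("(")[0])

exact_match = (stored_parameters == param_name)          # False: trailing space differs
normalized_match = (trim(stored_parameters) == param_name)  # True once both sides are trimmed
```

Trimming on both sides of the comparison (or when the value is first stored) removes the dependence on how the config line happens to end.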
< mentekid>
rcurtin: different datasets, similar results. I re-ran the timing test as well (with sorting, without sorting, with the old code) and the times seem to more or less agree with the profiler... So I guess it was a bad idea :/
< mentekid>
And it's a bummer because OpenBLAS was running on all 4 cores and hitting good cpu usage... I guess we'll have to do parallelization ourselves
< mentekid>
I'll start with the parallel query processing as we discussed. If you come up with any ideas to maybe improve the vectorized code let me know :)
< mentekid>
I'll actually run a few datasets and parameters at night because all my tests were using default parameters
< rcurtin>
openblas on all cores is still underperforming the existing approach?
< rcurtin>
hm... that intuitively seems incorrect to me...
< rcurtin>
but I guess it is possible that the cost of calculating that norms vector is just too high
< mentekid>
But it shouldn't be - that's what I find so weird: We already do that calculation at least once, so why would doing it once in the beginning for all points be that wasteful...
< rcurtin>
let me take a closer look at the code...
< mentekid>
please do, I'm starting to believe I've done something stupid without realizing it
< rcurtin>
I think the call to .cols() should be avoided; that will assemble a copy of the data matrix
< rcurtin>
better to just loop over all the refIndices points and calculate their dot products
< rcurtin>
as a bonus when you do it that way there is no need to store the distances of all candidates and sort them
< mentekid>
you mean around line 528 right?
< mentekid>
where I create a copy of the reference set
< rcurtin>
yep, line 528
< mentekid>
I see, I'll try doing it like BaseCase - maybe that will work better
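A minimal sketch of the sequential alternative rcurtin suggests: index the reference points in place (no copy of the candidate subset) and keep only the k best distances seen so far, so the full list of candidate distances is never stored or sorted. This is pure Python for illustration only; the function and variable names are hypothetical, and it is not mlpack's BaseCase() implementation.

```python
import heapq

def sq_dist(a, b):
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_over_candidates(query, reference, candidate_indices, k):
    # Max-heap of at most k entries, stored as (-distance, index)
    # so that heap[0] is always the current worst of the k best.
    heap = []
    for idx in candidate_indices:
        d = sq_dist(query, reference[idx])  # index in place, no subset copy
        if len(heap) < k:
            heapq.heappush(heap, (-d, idx))
        elif d < -heap[0][0]:
            # Better than the current worst neighbor: replace it.
            heapq.heapreplace(heap, (-d, idx))
    # Only the k survivors are sorted, not every candidate.
    return sorted((-neg, idx) for neg, idx in heap)

# Hypothetical toy data.
reference = [[0.0, 0.0], [5.0, 5.0], [1.0, 0.0], [0.0, 2.0], [9.0, 9.0]]
candidates = [4, 2, 3, 1]
neighbors = knn_over_candidates([0.0, 0.0], reference, candidates, k=2)
```

Here `neighbors` holds the two closest candidates as (squared distance, index) pairs. Whether this beats the vectorized version in practice depends on candidate set sizes, which is exactly what the profiling in this discussion is trying to establish.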
marcosirc has quit [Quit: WeeChat 1.4]
< TD>
Would anyone have the properties needed for VS2015 to run an executable? I am guessing Project - 'Visual C++' - 'Win32Project' - 'DLL' - 'Empty project'
< TD>
And also the solution properties - 'C/C++' - 'General' - 'Additional Include Directories' - ?
< TD>
Runtime Library - Multi-threaded (/MT) ?
< TD>
Precompiled Header - Not Using Precompiled Headers ?
< TD>
And any other solution properties I am missing?