< rcurtin> lozhnikov: okay, sounds good. I still think maybe AuxiliaryInformationType is the better place for normalModeMaxNumChildren even though SplitType uses it, but I'll think about it
< rcurtin> if it is, I can do the refactoring
< nilay> in armadillo, we can slice the cube(x,y,z) into matrices by the third dimension, z, by using cube.slice(0), cube.slice(1) and so on... is there a way to slice the cube(x,y,z) into matrices by the first dimension, x?
< carbon_addict> Armadillo 7.100 released:
< lozhnikov> rcurtin: You're right. I thought about it and i decided that normalModeMaxNumChildren should be moved to AuxiliaryInformationType. I'll do the refactoring.
< tham> nilay : You said you want to slice through x axis?
< tham> What do you mean?
< tham> A Cube is a three-dimensional "matrix"
< tham> You can treat it as a container which stores a lot of two-dimensional matrices, just like std::vector<arma::Mat<T>>
< tham> you can access the x axis like this
< tham> my_cube.slice(0).col(0); //access column 0 of matrix 0 in my_cube
< tham> You can use the same approach to access a whole matrix of the cube too
< tham> auto &b = a.slice(0);
< tham> in most cases, you should be able to treat b as a reference to a matrix
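On nilay's question about slicing along the first dimension, here is a minimal sketch of the indexing involved. It assumes the layout Armadillo documents for cubes: column-major within each slice, with slices stored one after another. It uses a plain std::vector instead of arma::cube so the arithmetic is explicit. flatIndex and rowPlane are hypothetical helper names, not Armadillo functions.

```cpp
#include <cstddef>
#include <vector>

// Flat index of element (r, c, s) in a column-major cube of size
// nRows x nCols x nSlices (assumed layout: slices stored back to back).
std::size_t flatIndex(std::size_t r, std::size_t c, std::size_t s,
                      std::size_t nRows, std::size_t nCols)
{
  return r + c * nRows + s * nRows * nCols;
}

// Hypothetical helper: gather the nCols x nSlices matrix found at fixed row r.
// The result is column-major too: element (c, s) lives at c + s * nCols.
std::vector<double> rowPlane(const std::vector<double>& cube,
                             std::size_t nRows, std::size_t nCols,
                             std::size_t nSlices, std::size_t r)
{
  std::vector<double> plane(nCols * nSlices);
  for (std::size_t s = 0; s < nSlices; ++s)
    for (std::size_t c = 0; c < nCols; ++c)
      plane[c + s * nCols] = cube[flatIndex(r, c, s, nRows, nCols)];
  return plane;
}
```

With an arma::cube Q the same gather can be written with element access Q(r, c, s) in the inner loop. Unlike slice(), it is a strided copy rather than a view of contiguous memory.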
< tham> About copyMakeBorder, what kind of mode do you want to implement? I do not think you need to implement all of the modes; you can add them step by step if you like
< tham> Forgot to say, I was quite busy the last few days; now I have more time to participate in GSoC
< tham> May I study your code for copyMakeBorder?
< tham> I am not good at studying complicated theory (I am glad zoq is very good at this), but I think I am able to help you debug and design the API
< tham> rcurtin: keonkim said he wants to remove the normalize algorithm here--
< tham> I saw your suggestion for the API; I think using Train() and Apply() is a good idea, since it would make the style more consistent
< tham> What do you think about removing the normalize algorithm?
< tham> I think this file could stay there for backward compatibility too
< tham> I've talked a lot; I need to get some sleep, I've been exhausted these last few days
< zoq> tham nilay: I don't think we should implement more than one border type. Any border type should be good enough, for our purpose.
mentekid has joined #mlpack
< rcurtin> tham: keonkim: if we remove normalize_labels, what will we replace it with? that function is used for a few algorithms (like NCA I think)
< rcurtin> I have no problems removing it as long as we think through some alternative or something
< keonkim> rcurtin: I found that normalization functions are implemented separately in other methods, so I thought gathering all of them into one class would be great.
< rcurtin> keonkim: that sounds reasonable to me
nilay has joined #mlpack
< rcurtin> marcosirc: thanks for answering #647, I think you are right with your diagnosis
< rcurtin> :)
< marcosirc> Thanks!
< rcurtin> marcosirc: I changed my mind, I think B_2 is right, I am writing up a proof now
< rcurtin> you will have to try to find an error in the proof :)
< marcosirc> Mmm. Do you think so? It will be great to see that proof.
< rcurtin> yeah... almost done
< nilay> tham: sorry for late reply... i implemented reflection border type. here is the code:
< nilay> we only pad image on right and bottom as done in python codes.
< nilay> to multiply (2 cubes) or a (mat and a cube), i have to write a for loop and use dot product?
< zoq> nilay: If you use cubes, that's right.
< nilay> zoq: cubes should be used for images?
< zoq> nilay: If you like, you can use cubes internally, but since the rest of mlpack works with matrices, we should design the interface accordingly.
< nilay> zoq: so the public function will input matrices and private ones can input cubes? cubes give neater code i guess.. if you want i can do all by matrices only. .
< zoq> nilay: It's fine for me to use cubes in private functions.
< nilay> zoq: ok
< rcurtin> be careful with cubes, note that slices are not contiguous in memory
< rcurtin> ack, sorry, I think I am incorrect
< rcurtin> yeah, the thing to be aware of is that cube.slice(0).col(0) is not directly next to cube.slice(1).col(0) in memory
< rcurtin> that doesn't mean you shouldn't use it, it's just a thing to be aware of when thinking about memory access patterns
< nilay> rcurtin: the columns need to be copied this way, so i don't have another option
< rcurtin> yeah, I do not know the details of what you are doing, I'm just pointing that out
< rcurtin> you want to access the columns of a slice sequentially, instead of accessing a given column sequentially across slices
< rcurtin> i.e. slice(i).col(0), slice(i).col(1), ...
< rcurtin> not slice(0).col(i), slice(1).col(i), ...
< rcurtin> that's all I meant :)
< nilay> rcurtin: ok thanks. i'll keep that in mind. :)
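rcurtin's point about access order can be made concrete with offsets. This is a minimal sketch, assuming the column-major, slice-after-slice layout that Armadillo documents for cubes. The function names are hypothetical and only compute where each column starts in the underlying buffer.

```cpp
#include <cstddef>
#include <vector>

// Start offset of slice(s).col(c) in a column-major cube (assumed layout).
std::size_t colOffset(std::size_t c, std::size_t s,
                      std::size_t nRows, std::size_t nCols)
{
  return c * nRows + s * nRows * nCols;
}

// Offsets visited by slice(i).col(0), slice(i).col(1), ... for each slice i.
// Consecutive entries are nRows apart: a sequential walk through memory.
std::vector<std::size_t> withinSliceOrder(std::size_t nRows, std::size_t nCols,
                                          std::size_t nSlices)
{
  std::vector<std::size_t> offsets;
  for (std::size_t s = 0; s < nSlices; ++s)
    for (std::size_t c = 0; c < nCols; ++c)
      offsets.push_back(colOffset(c, s, nRows, nCols));
  return offsets;
}

// Offsets visited by slice(0).col(i), slice(1).col(i), ... for each column i.
// Consecutive entries jump by nRows * nCols: a strided walk.
std::vector<std::size_t> acrossSlicesOrder(std::size_t nRows, std::size_t nCols,
                                           std::size_t nSlices)
{
  std::vector<std::size_t> offsets;
  for (std::size_t c = 0; c < nCols; ++c)
    for (std::size_t s = 0; s < nSlices; ++s)
      offsets.push_back(colOffset(c, s, nRows, nCols));
  return offsets;
}
```

For a 2 x 3 x 2 cube the first order visits offsets 0, 2, 4, 6, 8, 10 and the second visits 0, 6, 2, 8, 4, 10. Whether the jumps matter in practice depends on the cube size and the cache, so it is worth measuring.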
< mentekid> rcurtin: I have run into somewhat of a dilemma regarding my implementation... Can I bother you for a sec?
< rcurtin> mentekid: sure. I am in a meeting, but I am not 100% paying attention to it :)
< mentekid> Cool. The problem is this: I have to create a min-heap that holds some scores (doubles). I can do that with a priority queue from stl, no problem
< mentekid> thing is, these scores correspond to some vectors (so if we have 8 scores we have 8 vectors)
< mentekid> but I can't push the vector on the heap because it'
< mentekid> (sorry mispressed return) it's not comparable to stuff... I have thought of the complex work-around of creating my dummy class that has a value and a vector and implementing a friend-function that does comparisons... I just wanted to ask if you can think (or have seen) anything simpler
< rcurtin> I think you could use std::pair<double, size_t> for this
< rcurtin> I forget what the default comparison is for std::pair, but I think it compares the first value first, and upon equality it will compare the second value
< rcurtin> so I think that might work in your case
< mentekid> ah
< mentekid> I knew I had to ask you, I don't know the STL yet
< rcurtin> I have found, in the past, that std::pair can be slow, but that was 6 years ago and in a different situation
< rcurtin> here I think there may be no better alternative
< mentekid> so I could also have a pair<double, vector<size_t>> right?
< rcurtin> yeah, you can also do that, but the comparison might be a little bit more trouble there
< rcurtin> but you can still write a custom comparator and I think pass it as a template argument to the priority_queue
< mentekid> ah, yes I can do that
< rcurtin> I might consider holding the vector<size_t> separately and just holding an index to the vector in the std::pair
< rcurtin> this could avoid copies/moves (depending on the underlying implementation)
< mentekid> hmm so that way I would have the second item of the pair pointing to the vector
< mentekid> I think that would work faster yeah
< rcurtin> might be worth trying both, but my intuition suggests holding an index to the vector would be faster
< rcurtin> but it is pretty clear my intuition is not always right :)
< mentekid> cool I'll try each and see how it goes
< mentekid> at least I avoided creating a whole class from scratch just for one minheap
< rcurtin> :)
< mentekid> thanks :)
< rcurtin> sure, let me know if I can help with anything else
< rcurtin> it is my job after all :)
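The suggestion above can be sketched as follows. std::pair's default ordering is lexicographic (first element, then second on ties), so std::greater turns std::priority_queue into a min-heap on the score. The vectors themselves stay in a separate container and the heap only carries their index. The names ScoredIndex, MinHeap, buildHeap and popLowest are made up for this illustration and are not taken from mentekid's LSH code.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// (score, index of the vector that the score belongs to)
using ScoredIndex = std::pair<double, std::size_t>;

// std::greater makes top() return the smallest score (a min-heap).
using MinHeap = std::priority_queue<ScoredIndex,
                                    std::vector<ScoredIndex>,
                                    std::greater<ScoredIndex>>;

// Push one entry per score; scores[i] is assumed to describe vector i.
MinHeap buildHeap(const std::vector<double>& scores)
{
  MinHeap heap;
  for (std::size_t i = 0; i < scores.size(); ++i)
    heap.emplace(scores[i], i);
  return heap;
}

// Remove the lowest-scoring entry and return the index it refers to.
// Precondition: the heap is not empty.
std::size_t popLowest(MinHeap& heap)
{
  const std::size_t index = heap.top().second;
  heap.pop();
  return index;
}
```

The returned index is then used to look up the actual vector, for example in a std::vector<std::vector<std::size_t>>, so no vector is copied or moved while the heap reorders its entries.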
nilay has quit [Ping timeout: 250 seconds]
< mentekid> I think the multiprobe LSH was designed explicitly so it would be impossible to be written in C++ :P
< mentekid> my head hurts
< mentekid> I think after I write it I'll find out there was a much simpler way which I didn't see
< rcurtin> mentekid: I think that has more to do with C++'s design than multiprobe LSH :)
< dnm_> hi
< dnm_> I am trying to run a multithreaded program with C++ code that uses mlpack, but I cannot get any parallelization.
< dnm_> Only one core is being used (there is no cpu usage on the other cores) and omp_get_max_threads always returns 1.
< dnm_> the same code runs fine if I don't include mlpack .
< dnm_> Did anyone have the same problem before?
< dnm_> Thanks
< rcurtin> dnm_: hi there, so do you mean that including mlpack causes your OpenMP-ized code to no longer be parallel?
< dnm_> yes
< rcurtin> there should not be anything in mlpack that changes your OpenMP configuration
< dnm_> And it is happening only in one of the compute nodes of the server that I use
< rcurtin> some of mlpack is OpenMP-ized (specifically, the density estimation tree code), but this should not be affecting your code
< rcurtin> so it does not happen in every compute node on the server you are using either, only sometimes?
< dnm_> I tried to use c++11 thread for parallelization but the same thing happened
< dnm_> Only one cpu usage
< dnm_> there is no problem with forking, but I cannot get more than one CPU
< rcurtin> if you can construct a minimal working example that demonstrates this behavior (1 CPU when including mlpack, n CPUs when not including mlpack), I can look into this further
< rcurtin> but I am really dubious that mlpack is the problem here
< dnm_> I tried it with a couple of examples, I will try to send one of them
< rcurtin> okay, if you send it I will take a look
< rcurtin> there is nothing in the mlpack code that modifies the OpenMP configuration
< rcurtin> this is why I think maybe the issue is something else
< rcurtin> but still, if I can reproduce it, I can look into it
< dnm_> thanks, I am trying to write a small example that shows the problem
< dnm_> but it is not just openmp issue I guess.
< dnm_> because I could not get parallelization when trying to fork with c++11 threads
< dnm_> forking was ok
< dnm_> but there is cpu binding
< dnm_> they are all in the same cpu
< dnm_> And the weird thing is I have this problem in only one of the nodes of the server that I use
< dnm_> the other ones run the code fine
< dnm_> anyway I am writing a small example
< rcurtin> yes so I am saying, if your problem is only in one of the nodes on your server, I suspect the problem may be with the server, not mlpack
< rcurtin> but if you can get me a small example, like I said, I will try it and we will see what happens :)
< dnm_> #include <cmath>
< dnm_> #include <iostream>
< dnm_> I will send all of it in one line
< dnm_> formatting will be weird
< dnm_> #include <cmath> #include <iostream> #include <fstream> // std::ifstream #include <ostream> // std::ifstream #include <omp.h> #include <mlpack/methods/linear_regression/linear_regression.hpp> using namespace std; using namespace arma; using namespace mlpack::regression;
< dnm_> This is the include section
< dnm_> vec multivarLinearRegression(mat data, vec responses) { vec responsesDB = conv_to< vec >::from(responses); mat dataDB = conv_to<mat>::from(data); // Regress. LinearRegression lr(dataDB,responsesDB, 0, false); // temporary solution for conversion (FL) //return conv_to< vec >::from(lr.Parameters()); return lr.Parameters(); }
< dnm_> void testMLR(){ mat data(2,5); // 2-dimensional, 5 points vec responses(5); data(0,0)= 2; data(1,0)= 3; data(0,1)= 5; data(1,1)= 1; data(0,2)= 4; data(1,2)= 20; data(0,3)= 7; data(1,3)= 16; data(0,4)= 9; data(1,4)= 12; cout << data << endl; responses(0)= 13; responses(1)= 13; responses(2)= 68; responses(3)= 62; responses(4)= 54; //cout << responses << endl; vec vc = multivar
< zoq> dnm_: please use pastebin or something like that
< dnm_> int main() { const unsigned size = 900000; double sinTable[size]; int tid; arma::mat A = zeros(2,3); cout << A << endl; cout << "max thread " << omp_get_max_threads() << endl; cout << "num of procs " << omp_get_num_procs() << endl; omp_set_num_threads(4); //#pragma omp parallel shared(sinTable) private(tid) //{ #pragma omp parallel for schedule(dynamic, 1) for(size_t n=0; n<s
< dnm_> all of the code is this
< dnm_> I first compile with
< dnm_> g++ -c -m64 -pipe -DARMA_DONT_USE_WRAPPER -fopenmp -std=c++11 -lpthread -lgomp -O2 -Wall -W -D_REENTRANT -DARMA_USE_ARPACK -DARMA_64BIT_WORD -I. -I/usr/include -I/usr/local/include/mlpack -I/usr/local/include/boost -o tmp.o tmp.cpp
< dnm_> then run
< dnm_> g++ -m64 -DARMA_DONT_USE_WRAPPER -g -ggdb -fopenmp -lpthread -lgomp -Wl,-O1 -o aaa tmp.o -L/usr/lib/x86_64-linux-gnu -lmlpack -larmadillo -lblas -llapack -larpack -L/usr/local/lib/ -fopenmp -lpthread
< dnm_> link
< dnm_> Although I don't call the function using mlpack in the main function, still the same problem happens
< rcurtin> can you provide the code in pastebin or something? it's basically impossible for me to copy-paste that
< dnm_> thanks again
< dnm_> to compile and link I use the following lines
< dnm_> g++ -c -m64 -pipe -DARMA_DONT_USE_WRAPPER -fopenmp -std=c++11 -lpthread -lgomp -O2 -Wall -W -D_REENTRANT -DARMA_USE_ARPACK -DARMA_64BIT_WORD -I. -I/usr/include -I/usr/local/include/mlpack -I/usr/local/include/boost -o tmp.o tmp.cpp
< dnm_> g++ -m64 -DARMA_DONT_USE_WRAPPER -fopenmp -lpthread -lgomp -Wl,-O1 -o aaa tmp.o -L/usr/lib/x86_64-linux-gnu -lmlpack -larmadillo -lblas -llapack -larpack -L/usr/local/lib/ -fopenmp -lpthread
< rcurtin> and on your system what is the output when you run the program?
< rcurtin> on my system it reports:
< rcurtin> max thread 32
< rcurtin> num of procs 32
< dnm_> max thread 1 num of procs 1
< dnm_> only one
< rcurtin> and what you are saying is
< dnm_> but if I do not include mlpack and the function , it shows 24 threads
< rcurtin> that if you comment out the line '#include <mlpack/methods/linear_regression/linear_regression.hpp>', the output is different
< dnm_> and the functions that uses mlpack
< dnm_> yes
< rcurtin> I see no difference on my system when I comment out the mlpack code
< rcurtin> and you also say that this only happens on one system, right?
< dnm_> this is happening in only one of the machines
< dnm_> in the server I use
< dnm_> but due to the memory limits on the machines I need to use that certain node
< dnm_> and I am actually not sure why this might be happening
< dnm_> somehow mlpack is binding CPU usage
< rcurtin> I am not convinced of that
< dnm_> and it is not just an openmp issue
< rcurtin> I think that the symptom you are seeing is that when you include mlpack, the problem occurs
< rcurtin> but I do not think it is because of mlpack
< rcurtin> I think it is because of other system configuration somewhere, or possibly slight differences in how you are compiling with and without mlpack, and other reasons
< rcurtin> either way, whatever the issue is, I unfortunately can't debug it unless I can reproduce it, and I am not able to
< dnm_> but it is blocking more than one cpu usage with c++11 threads as well...
< dnm_> ok, thanks anyway..
< rcurtin> yes, you said this, but there is not any reason why mlpack would do this
< rcurtin> I mean, I can try and help, but I can't really dig in here if I can't reproduce it
< rcurtin> is there anything special about the one particular node you are running on?
< dnm_> not really
< dnm_> other than having a very large memory
< rcurtin> I think that probably the best thing that you can do to figure out what is going on here is to find some auxiliary openmp or c++11 threads functions that will tell you something about the system in question and why you are only getting one processor from omp_get_max_threads()
< rcurtin> but I am not sure of exactly what would be needed to debug at that level, as I've never seen this problem or anything resembling it
< dnm_> ok, thanks. I will try..
< rcurtin> yes, let me know what you find out
< dnm_> sure, thanks again
< rcurtin> if we can actually pinpoint the problem to some code in mlpack, I will fix it, but like I said earlier, I really can't think of any reasonable theory for how mlpack would be affecting your setup
< rcurtin> sorry that I can't be more helpful at this time
< dnm_> ok, thanks again for your time.
< rcurtin> sure, no worries, that is what we are here for :)
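As one possible starting point for the kind of auxiliary check rcurtin mentions, the sketch below compares how many CPUs the process is allowed to run on with how many are online. It is Linux-specific and assumes glibc (CPU_COUNT needs _GNU_SOURCE, which g++ defines by default). The idea that a restricted affinity mask is behind dnm_'s "cpu binding" remark is only a hypothesis. The discussion above does not establish the cause, and the function names are illustrative.

```cpp
#include <sched.h>   // sched_getaffinity, cpu_set_t, CPU_ZERO, CPU_COUNT
#include <unistd.h>  // sysconf

// Number of CPUs the calling process may be scheduled on, or -1 on error.
int allowedCpuCount()
{
  cpu_set_t mask;
  CPU_ZERO(&mask);
  if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
    return -1;
  return CPU_COUNT(&mask);
}

// Number of processors currently online, regardless of affinity.
long onlineCpuCount()
{
  return sysconf(_SC_NPROCESSORS_ONLN);
}
```

Printing both values at the start of main(), once with and once without the mlpack include and link line, would show whether the process is being pinned to a single CPU on the problematic node. If allowedCpuCount() is 1 while onlineCpuCount() is 24, the next step would be to find out what sets the mask (the job scheduler, a wrapper script, or a library loaded at startup).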
