verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
govg has joined #mlpack
govg has quit [Ping timeout: 244 seconds]
govg has joined #mlpack
mentekid has joined #mlpack
mentekid has quit [Ping timeout: 260 seconds]
nilay has joined #mlpack
mentekid has joined #mlpack
nilay_ has joined #mlpack
nilay has quit [Ping timeout: 250 seconds]
JMarler has joined #mlpack
< JMarler>
Hi all, I'm in need of a quick bit of advice regarding the internals of Armadillo matrices, specifically in the naive Bayes classifier
< JMarler>
In the Classify method of mlpack's NBC
< JMarler>
Am I right in thinking lines like the following
< JMarler>
will be allocating on the heap for the arma::vec and arma::mat objects ?
< JMarler>
In other words is heap memory being requested from the OS in the Classify method ?
< JMarler>
Apologies, I'm not entirely sure how Armadillo allocates memory internally. I'm assuming it just uses a std::vector or similar
< JMarler>
Suspect this is the case
< mentekid>
JMarler: yes, armadillo uses heap memory to store the elements of vectors and matrices
< mentekid>
but you don't need to delete/free anything, it does that itself
< JMarler>
Blast
< JMarler>
Assumed that was the case
< JMarler>
I'm attempting to use some of the various classifiers in a real time audio/callback thread and heap allocation is a big no go.
< JMarler>
Looking at the code for the NBC can anyone confirm for me whether I'm right in thinking that the various Mat and Vec objects could be potentially allocated at construction time of the NBC ?
< JMarler>
Willing to alter the code for my own use but I don't want to go flying along on a fruitless endeavour
< mentekid>
Maybe someone's more qualified to answer that for you (I'm new here) but I think since it's all on the heap, they are allocated at construction time, not compile time
< mentekid>
I think only arma::mat::fixed grabs memory at compilation
< JMarler>
Thanks mentekid.
< JMarler>
My issue is that I am calling the NBC's Classify method directly in my audio callback
< JMarler>
Looking at the code for the NBC there are lines like the following in the Classify method:
< JMarler>
arma::vec exponents(diffs.n_cols);
< JMarler>
Which I am assuming means the object above (arma::vec exponents) is allocating memory for its elements on the heap
< JMarler>
On a real time audio thread this can cause drop outs as it potentially means the thread could end up waiting for the OS to actually serve up the heap space
< JMarler>
The audio thread needs to run in constant/deterministic time, so requesting memory from the OS is taboo.
< JMarler>
I'm still in my early days as far as Machine Learning is concerned and this is for an academic project so I guess my question for anyone that may be able to answer is looking at the NBC Classify method would I be right in thinking that objects like arma::vec exponents could potentially be constructed outside of the Classify method ? i.e. in the NBC class's constructor ?
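(Editor's sketch, not part of the original conversation.) The idea asked about here, sizing every buffer once at construction so that Classify() only reads and writes existing storage, can be illustrated without Armadillo or mlpack. The class below is hypothetical: `PreallocatedNBC`, `SetClass` and the diagonal-Gaussian scoring are assumptions made for illustration and are not mlpack's NaiveBayesClassifier. Moving the allocation out of the hot path leaves the arithmetic itself unchanged.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical class, NOT mlpack's API: a diagonal-Gaussian naive Bayes
// scorer whose buffers are all sized once, in the constructor.
class PreallocatedNBC
{
 public:
  PreallocatedNBC(std::size_t dims, std::size_t classes)
      : dims_(dims), classes_(classes),
        means_(dims * classes, 0.0),
        variances_(dims * classes, 1.0),
        logPriors_(classes, 0.0),
        scores_(classes, 0.0)  // scratch space reused by every Classify() call
  { }

  // Setup-time only: call this from a non-real-time thread.
  void SetClass(std::size_t c, const std::vector<double>& mean,
                const std::vector<double>& variance, double prior)
  {
    assert(c < classes_ && mean.size() == dims_ && variance.size() == dims_);
    for (std::size_t d = 0; d < dims_; ++d)
    {
      means_[c * dims_ + d] = mean[d];
      variances_[c * dims_ + d] = variance[d];
    }
    logPriors_[c] = std::log(prior);
  }

  // Real-time path: touches only members allocated in the constructor.
  // `point` must refer to at least dims_ values.
  std::size_t Classify(const double* point) noexcept
  {
    constexpr double kTwoPi = 6.283185307179586;
    std::size_t best = 0;
    for (std::size_t c = 0; c < classes_; ++c)
    {
      double score = logPriors_[c];
      for (std::size_t d = 0; d < dims_; ++d)
      {
        const double diff = point[d] - means_[c * dims_ + d];
        const double var = variances_[c * dims_ + d];
        // Log of a univariate Gaussian density, accumulated per dimension.
        score -= 0.5 * (diff * diff / var + std::log(kTwoPi * var));
      }
      scores_[c] = score;
      if (score > scores_[best])
        best = c;
    }
    return best;
  }

 private:
  std::size_t dims_, classes_;
  std::vector<double> means_, variances_, logPriors_, scores_;
};
```

A caller would construct the object and call SetClass() during setup, then call Classify() from the audio callback with a pointer to the current feature frame.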
< JMarler>
For example, the implementation would probably involve providing an option to construct an NBC with a known / fixed training set size, to avoid dynamically allocating the training set matrix by calling functions like:
< JMarler>
results.set_size(data.n_cols);
< JMarler>
Basically I'd just like to sanity check that purely changing the boilerplate / memory allocation scheme won't affect the internals of the classification algorithm.
< JMarler>
I could possibly add a classify method with an additional argument flag like "fixed" or "realtime":
< JMarler>
Any answers to previous questions most appreciated
< mentekid>
JMarler: I'm not sure how to answer these since I'm new, maybe some of the other guys will be able to help you
< mentekid>
Because of the timezones they might be sleeping at the moment though so you might have to wait a bit :)
< JMarler>
mentekid: No problem. Thanks for the input regardless
nilay_ has quit [Ping timeout: 250 seconds]
< JMarler>
I've noticed libarmadillo has an ARMA_MAT_PREALLOC flag defined
< JMarler>
This internally I believe sets: static const uword mat_prealloc = 16;
< JMarler>
In fact...ignore that. Being slow, ARMA_MAT_PREALLOC is still irrelevant for my usage
< JMarler>
Starting to look at making some of these changes now. Would this be something that would be of use to mlpack ? i.e. as a pull request.
< JMarler>
It's a fairly significant change so I may be better just moving away from mlpack and attempting to roll out my own routines for the audio domain
< JMarler>
I think it could have some potential though. It could be of big use to the audio community as users of frameworks like JUCE for audio analysis purposes amongst others.
nilay has joined #mlpack
< rcurtin>
JMarler: I'm not sure there's an easy way to get mlpack to run without allocating memory all over the place
< rcurtin>
Armadillo already will allocate all over the place for linear algebra operations
< JMarler>
rcurtin: Thanks for the response
< JMarler>
Sorry, I'm not in the know regarding the internals of Armadillo
< rcurtin>
so it may be somewhat hard to adapt the NaiveBayesClassifier class to not allocate anything
< rcurtin>
I have to wonder if maybe the easier approach is to modify Armadillo only
< rcurtin>
and basically write your own memory manager
< JMarler>
So regardless of whether the memory for all the various matrices and vectors is being allocated at object construction you would say armadillo would still alloc internally ?
< rcurtin>
there is a function there, memory::acquire(), that is basically a wrapper around the new operator... maybe it could be modified to take memory that you had preallocated
< rcurtin>
hang on... phone call
< JMarler>
It would be fine for the NBC to dynamically allocate for my problem just not in the Train() and Classify() methods, it would have to pre-allocate all the Mat and Col/Vec variables in the NBC constructor
< JMarler>
Maybe it's too specific a use case and I'm going to have to scale things down to use my own classifiers/routines (mlpack just seemed the best way to get things up and running quickly at the time)
< rcurtin>
unfortunately I think I agree that it's too specific; I don't really want to change the NBC class to do that, unless we change all classes inside of mlpack to function like that (for consistency), and that would be a huge undertaking
< rcurtin>
I think for your use case, you might be best off just modifying mlpack and using it
< rcurtin>
but a thing to be careful about is that the expressions of Armadillo may still allocate memory, like:
< rcurtin>
arma::mat diffs = data - arma::repmat(means.col(i), 1, data.n_cols);
< rcurtin>
I am not sure, but it's possible that the repmat call may allocate memory
< rcurtin>
(ideally it shouldn't, the template metaprogramming should take care of it, but it's possible that it doesn't)
< JMarler>
I thought it would be the assignment itself, as I figured arma::mat probably allocates space for its elements internally regardless ?
< rcurtin>
yeah; so there are multiple possible allocations in that statement
< rcurtin>
first, the memory for diffs has to be allocated
< rcurtin>
and memory may be allocated for the repmat call
< rcurtin>
the reason I say "may" and not "will" is because it depends entirely on how good the template metaprogramming is inside armadillo
< rcurtin>
it's possible (and I am hopeful) that the code actually generated will not allocate any extra memory for the repmat call, by just looping over that column means.col(i) repeatedly
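(Editor's sketch, not part of the original conversation.) The allocation-free behaviour hoped for here, subtracting one column from every column of the data by looping over it rather than materialising a repeated matrix, can be written by hand. This is plain C++ on column-major arrays, not Armadillo code; `SubtractColumn` is a hypothetical helper, and whether Armadillo's expression templates generate something equivalent is exactly the open question in the discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major layout (the layout Armadillo uses): element (r, c) lives at
// index r + c * rows. Writes data(:, c) - mean into diffs(:, c) for every
// column c. `diffs` must already hold rows * cols elements, so nothing is
// allocated inside this function.
void SubtractColumn(const std::vector<double>& data,
                    const std::vector<double>& mean,
                    std::size_t rows, std::size_t cols,
                    std::vector<double>& diffs) noexcept
{
  assert(data.size() == rows * cols);
  assert(mean.size() == rows);
  assert(diffs.size() == rows * cols);

  for (std::size_t c = 0; c < cols; ++c)
    for (std::size_t r = 0; r < rows; ++r)
      diffs[r + c * rows] = data[r + c * rows] - mean[r];  // reuse mean per column
}
```

The output buffer would be sized once during setup and then passed to every call, so the real-time path only overwrites it.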
< JMarler>
Yeah that was my assumption. Hence it'd mean all Mat and Vecs in those routines being member vars which have their mem allocated in the NBC constructor. For the sake of safety it'd have to be that way
< rcurtin>
but it's hard to know without diving into the Armadillo codebase and that code is... difficult :)
< JMarler>
No problem. I figured this was going to be the case
< rcurtin>
yeah
< rcurtin>
so this is why I thought that maybe it might be easier to modify Armadillo to have its own allocator
< JMarler>
Shall have to decide whether to just mod mlpack for myself or go it alone entirely
< rcurtin>
like, at the beginning of the program, allocate a big chunk of memory, and then in memory::acquire(), just grab some of that memory
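(Editor's sketch, not part of the original conversation.) The scheme described here, reserving one large block at startup and handing out pieces of it afterwards, is commonly called a bump (or arena) allocator. The class below is a minimal, single-threaded illustration; `BumpArena`, `Acquire` and `Reset` are hypothetical names, and how such an arena would actually be wired into Armadillo's memory::acquire() is not shown and would need checking against the Armadillo source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bump allocator: one heap allocation in the constructor,
// then constant-time pointer arithmetic for every later request.
class BumpArena
{
 public:
  explicit BumpArena(std::size_t bytes) : buffer_(bytes), offset_(0) { }

  // Returns `bytes` of storage aligned to `alignment` (must be a power of
  // two), or nullptr if the arena is exhausted. Never calls new or malloc.
  void* Acquire(std::size_t bytes,
                std::size_t alignment = alignof(std::max_align_t)) noexcept
  {
    const std::uintptr_t base =
        reinterpret_cast<std::uintptr_t>(buffer_.data());
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    // Round the next free address up to the requested alignment.
    const std::uintptr_t aligned = (base + offset_ + mask) & ~mask;
    const std::size_t start = static_cast<std::size_t>(aligned - base);
    if (start > buffer_.size() || bytes > buffer_.size() - start)
      return nullptr;
    offset_ = start + bytes;
    return buffer_.data() + start;
  }

  // Releases everything at once; individual blocks cannot be freed.
  void Reset() noexcept { offset_ = 0; }

  std::size_t Used() const noexcept { return offset_; }

 private:
  std::vector<unsigned char> buffer_;
  std::size_t offset_;
};
```

The trade-off is that memory is only reclaimed in bulk through Reset(), so the arena has to be sized for the worst-case sequence of requests between resets.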
< rcurtin>
I think the Armadillo memory allocations are the vast majority of dynamic allocations inside of the classes
< JMarler>
To be honest the research project is likely to use a limited number of classifier types, maybe NBC, Kmeans, SVM and one other so going it alone might have to be the way forward.
< JMarler>
Yeah. It might be possible to use Armadillo still but I'm starting to lean more towards a couple small scale classes given all of these issues.
< JMarler>
Thanks a lot for the input anyways Ryan
< rcurtin>
sure, I'm sorry it's not working straight out of the box :)
< rcurtin>
I have to run... I'll be back later today
< JMarler>
Ahhhhh who knows. Maybe in retirement I might have the time to attempt that kind of pull request.........thankfully I'm 26 so unlikely to ever put the mlpack team through the requests!