naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
sumedh__ has quit [Ping timeout: 240 seconds]
Anand_ has joined #mlpack
< Anand_> Marcus : I have added the code for linear regression for all libraries. Check once and then I will merge
Anand_ has quit [Ping timeout: 246 seconds]
Anand_ has joined #mlpack
Anand_ has quit [Ping timeout: 246 seconds]
marcus_zoq has quit [Remote host closed the connection]
< jenkins-mlpack> Yippie, build fixed!
< jenkins-mlpack> Project mlpack - nightly matrix build build #513: FIXED in 4 hr 25 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20nightly%20matrix%20build/513/
< jenkins-mlpack> * siddharth.950: Adding Regularized SVD Code
< jenkins-mlpack> * Ryan Curtin: Oops, this needed to be divided by 2.
< jenkins-mlpack> * Ryan Curtin: Use slightly safer Width().
< jenkins-mlpack> * Ryan Curtin: Use the bound's cached MinWidth() for MinimumBoundDistance().
< jenkins-mlpack> * Ryan Curtin: Test MinWidth().
< jenkins-mlpack> * Ryan Curtin: Add MinWidth(), which is a better solution than having the tree calculate it by
< jenkins-mlpack> hand.
< jenkins-mlpack> * Ryan Curtin: Fix elusive bug that only occurred in particularly rare situations.
< jenkins-mlpack> * Ryan Curtin: Add MinimumBoundDistance().
< jenkins-mlpack> This represents the minimum distance between the center of a node and any edge
< jenkins-mlpack> of the bound. Note that for ball bounds, this is equivalent to the furthest
< jenkins-mlpack> descendant distance.
< jenkins-mlpack> * saxena.udit: Minor improvement. No major functionality changes
< jenkins-mlpack> * saxena.udit: Changes are part of perceptron code review, as discussed with Ryan
marcus_zoq has joined #mlpack
Anand has joined #mlpack
< Anand> Marcus : Do we even need the feval(..) function? Can't we just use fitlm(..) and then use its return values in predict(..) to get the labels ?
marcus_z1q has joined #mlpack
marcus_z1q has quit [Remote host closed the connection]
< marcus_zoq> Anand: You can just use the predict function.
< Anand> Yeah, I did that
< marcus_zoq> Anand: Okay, good!
< Anand> Are we good for the next merge?
sumedh__ has joined #mlpack
sumedh_ has joined #mlpack
< marcus_zoq> Anand: I think so.
sumedh__ has quit [Ping timeout: 240 seconds]
< Anand> Ok, I will merge in a few minutes. Please have a look after I do that and let me know about any build failures. I will fix the bugs.
< Anand> Marcus : I am getting some errors in scikit linear regression while doing "make run LOG=False". There is some error in creating the confusion matrix, I guess. Can you have a look? I have not merged yet
< Anand> Here is the trace:
< Anand> Traceback (most recent call last):
< Anand>   File "benchmark/run_benchmark.py", line 313, in <module>
< Anand>     Main(args.config, args.blocks, log, args.methodBlocks, update)
< Anand>   File "benchmark/run_benchmark.py", line 279, in Main
< Anand>     instance.RunMetrics(options)
< Anand>   File "./linear_regression.py", line 135, in RunMetrics
< Anand>   File "/home/anand/GSoC/benchmarks/methods/metrics/definitions.py", line 25, in ConfusionMatrix
< Anand>     return confusion_matrix(labels, p
< Anand>   File "/usr/local/lib/python3.3/dist-packages/sklearn/metrics/metrics.py", line 115, in _check_clf_targets
< Anand>     "".format(type_true, type_pred))
< Anand> ValueError: Can't handle mix of multiclass and continuous
< Anand> make: *** [.run] Error 1
< Anand> Similar code works for NBC, though
Anand_ has joined #mlpack
Anand has quit [Ping timeout: 246 seconds]
Anand_ has quit [Ping timeout: 246 seconds]
Anand has joined #mlpack
Anand has quit [Ping timeout: 246 seconds]
Anand has joined #mlpack
Anand has quit [Ping timeout: 246 seconds]
Anand has joined #mlpack
udit_s has joined #mlpack
Anand_ has joined #mlpack
Anand has quit [Ping timeout: 246 seconds]
< naywhayare> udit_s: I did some research on AdaBoost, and it looks like my initial conception of what AdaBoost is (back when you wrote the proposal) was somewhat wrong
< naywhayare> first, I can only find evidence and implementations that suggest that only one weak learner is used
< naywhayare> I had thought that AdaBoost could use many weak learners (i.e. a decision stump and a perceptron and the naive bayes classifier and so forth)
< naywhayare> however, I also think that it would be possible to generalize AdaBoost to use multiple weak learners instead of just one -- but it's up to you whether or not you want to do that
< udit_s> naywhayare: sorry.
< udit_s> naywhayare: Yeah, I think that multiple weak learners are not that big a problem.
< udit_s> If they have a similar constructor, or wrapping class, or multiple template parameters, they can be implemented.
< udit_s> This would, I think, do away with the variadic template implementation.
< naywhayare> if we only used one weak learner, then yes, variadic templates are not necessary
< naywhayare> on the other hand, if we want multiple weak learners (which might be neat), then we will have to use variadic templates
< naywhayare> either way, maybe it is better to start with an implementation for just one weak learner and then generalize it later (if you want)
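A minimal sketch of the two template shapes being weighed here (class names hypothetical, not actual mlpack API):

    // Single weak learner type: no variadic templates needed.
    template<typename WeakLearnerType>
    class AdaBoost;

    // Several weak learner types at once: this requires variadic
    // templates, since the number of types is not fixed in advance.
    template<typename... WeakLearnerTypes>
    class MultiAdaBoost;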
< udit_s> That way, you will need to have one format for the constructor for all of them, if I'm not mistaken.
< naywhayare> yes; that was the second point in your email
< naywhayare> I think that we should make each weak learner constructor follow the same syntax
< udit_s> Yeah. And, yeah, I'm quite interested in implementing multiple learners.
< naywhayare> but at the same time, there is another problem... suppose I want to run AdaBoost on decision stumps
< naywhayare> however, I want the decision stumps to have a minimum bin size of, say, 500 (or, something that's not the default)
< udit_s> Why not use a wrapping class ? One which has an overloaded constructor ?
< udit_s> oh, sorry, go on...
< naywhayare> because of the extra overhead associated with that... that's more code to maintain, and a user won't be able to use a weak classifier that they wrote without also writing a wrapper
< naywhayare> anyway, back to my example, we want a decision stump with a nonstandard set of parameters
< naywhayare> but if we do something like 'WeakLearner l = WeakLearner(trainingData)' (where WeakLearner is a template argument that's DecisionStump in this case)
< naywhayare> then it won't get our nonstandard set of parameters
< naywhayare> so I thought maybe one thing we could do is provide another constructor for the weak learners... something like
< naywhayare> WeakLearner(const arma::mat& trainingData, const arma::Row<size_t>& labels, const WeakLearner& other)
< naywhayare> which will take the learning parameters from 'other' and use them to train on the given data
< naywhayare> and this could work regardless of what parameters the WeakLearner needed; the AdaBoost class doesn't need to know about the parameters, it just passes the old one and its parameters to the new one
< naywhayare> does that make sense? maybe you see a better way?
< udit_s> I get that the constructor will help in bypassing the need of knowing/explicitly defining the parameters if we know the parameters of 'other'. But.
< udit_s> If you look at the constructor of decision stumps, don't we already have a constructor that helps override default parameters ? I mean, it's only in the main.cpp where we define these. Otherwise you can enter any value...
< udit_s> Or did I not get what you were saying ?
< naywhayare> the parameters for each possible weak learner are different
< naywhayare> if we write some templated code in the AdaBoost class like this:
< naywhayare> WeakLearner l(trainingData, labels, lastLearner.MinBucketSize())
< naywhayare> that's not going to work for any case where WeakLearner isn't the DecisionStump
< naywhayare> and other weak learners may have more than just one parameter
andrewmw94 has joined #mlpack
< udit_s> So the second constructor helps a new weak learner of the decision stump take parameters from another weak learner which is possibly not a decision stump ?
< naywhayare> no; when you write the AdaBoost code, you have to consider each weak learner in a generic sense
< naywhayare> you cannot write any code that is specific to a particular weak learner
< naywhayare> when you train a new weak learner of a certain type, it will use the same parameters as the old one
< naywhayare> but you cannot write any code that is specific to, say, the decision stump or the perceptron or any of the weak learners
Anand_ has quit [Ping timeout: 246 seconds]
< udit_s> exactly. got that.
< naywhayare> so if you were to create a constructor of the form WeakLearner(const arma::mat& trainingData, const arma::Row<size_t>& labels, const WeakLearner& other)
< naywhayare> then it could take the specific learner's parameters and apply them to the new learner
< naywhayare> and AdaBoost would not need to know about any of those parameters
< udit_s> Oh. Okay.
< udit_s> I think I got it.
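A rough sketch of the constructor convention being proposed, using a cut-down hypothetical DecisionStump; these signatures are illustrative, not the final mlpack interface:

    #include <mlpack/core.hpp>

    class DecisionStump
    {
     public:
      // Ordinary constructor: the user chooses the hyperparameters.
      DecisionStump(const arma::mat& data,
                    const arma::Row<size_t>& labels,
                    const size_t bucketSize);

      // Proposed constructor: train on new data, but take the
      // hyperparameters (here, just bucketSize) from 'other'.
      DecisionStump(const arma::mat& data,
                    const arma::Row<size_t>& labels,
                    const DecisionStump& other);
    };

    // Inside AdaBoost the code stays fully generic: no learner-specific
    // parameters (MinBucketSize(), etc.) ever appear.
    template<typename WeakLearner>
    WeakLearner TrainRound(const arma::mat& weightedData,
                           const arma::Row<size_t>& labels,
                           const WeakLearner& existingLearner)
    {
      return WeakLearner(weightedData, labels, existingLearner);
    }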
< naywhayare> so, the last item (or, really, the first item in your email) was how to train the weak learner with the distribution D(t)
< naywhayare> it does look like we will have to modify the decision stump and the perceptron to be able to take weights into account
< naywhayare> for the perceptron, this is easy -- just make the step size taken in the learning policy for a particular point equal to its weight
< naywhayare> for the decision stump, it might be a little more difficult, but maybe you could do it by weighting the entropy calculation by a point's weight?
< naywhayare> the other possibility is to make the dataset much larger and duplicate points in such a way that the distribution of points is equal to the weighted distribution
< naywhayare> but that seems very slow...
< udit_s> the higher the weight, the lower the entropy?
< naywhayare> I'm not sure... something like that. maybe instead of p(x) log p(x), something like w(x) p(x) log p(x)
< naywhayare> where w(x) is the combined weight of a particular class
< udit_s> yeah, the second one would be slow.
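One way the weighted entropy could look in code; this folds the weights into the class probabilities, which is a common formulation but not necessarily the exact w(x) p(x) log p(x) variant floated above:

    #include <cmath>
    #include <vector>

    // classWeights[c] holds the total instance weight of class c in a
    // candidate split bin.
    double WeightedEntropy(const std::vector<double>& classWeights)
    {
      double total = 0.0;
      for (const double w : classWeights)
        total += w;

      double entropy = 0.0;
      for (const double w : classWeights)
      {
        if (w > 0.0)
        {
          const double p = w / total; // weight-based class probability
          entropy -= p * std::log2(p);
        }
      }
      return entropy;
    }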
< udit_s> also, could you explain what you meant about the perceptron? instead of w = w + x, you would do w = w + weight*x ?
< naywhayare> yes, basically
< udit_s> or something like that. Okay, I think that makes sense - it wouldn't change correct instances' weights by too much,
< udit_s> and would still "focus" on the incorrectly classified instances by updating their corresponding weight vectors accordingly.
< naywhayare> right
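A sketch of that weight-aware update, assuming a perceptron that keeps one weight vector per class (the function name and layout here are hypothetical):

    #include <armadillo>

    // Standard perceptron correction scaled by the instance's weight: a
    // point with weight near zero barely moves the decision surface,
    // while heavily weighted points dominate the updates.
    void WeightedUpdate(arma::mat& classWeightVectors,
                        const arma::vec& point,
                        const size_t trueClass,
                        const size_t predictedClass,
                        const double instanceWeight)
    {
      classWeightVectors.col(trueClass) += instanceWeight * point;
      classWeightVectors.col(predictedClass) -= instanceWeight * point;
    }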
< naywhayare> once implemented, we can write a simple test for it... basically, we can generate two datasets with different linearly separable decision surfaces
< naywhayare> maybe they are two dimensional... one surface could be a line at x = 5, the other could be a line at y = 3 (or some random numbers like that)
< naywhayare> then we combine the datasets into one and train a weighted perceptron on it, weighting the first part of the dataset with weights all equal to 0
< naywhayare> and it should recover the line at y = 3
< naywhayare> then we can do the same with the second part of the dataset all weighted to 0, and it should recover the line at x = 5
< naywhayare> then we can do it again, but not with zero weights... maybe just very small weights, like 1e-4
< naywhayare> and it should still recover the line at y = 3; then do it again for the second part of the dataset and it should recover the line at x = 5
< naywhayare> does that make sense? maybe I have done a poor job of explaining it?
< udit_s> No, I think I got what you meant. But what would this test ? That it is focussing on the points we want it to focus on ?
< naywhayare> yes
< udit_s> okay.
< naywhayare> it would test that the perceptron is properly utilizing the weights we've given it
< naywhayare> and we could do something similar for the decision stump
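The proposed test, roughly; the weighted Perceptron constructor shown is hypothetical, since accepting instance weights is exactly the part still to be written:

    #include <mlpack/core.hpp>

    void TestWeightedPerceptron()
    {
      // Two 2-D datasets with different linearly separable surfaces:
      // dataA separable by the line x = 5, dataB by the line y = 3.
      arma::mat dataA, dataB;
      arma::Row<size_t> labelsA, labelsB;
      // ... generate the two datasets ...

      arma::mat data = arma::join_rows(dataA, dataB);
      arma::Row<size_t> labels = arma::join_rows(labelsA, labelsB);

      // Zero out the first dataset's weights; training should then
      // recover dataB's surface (y = 3).
      arma::rowvec weights(data.n_cols, arma::fill::ones);
      weights.head(dataA.n_cols).zeros();

      Perceptron<> p(data, labels, weights); // hypothetical weighted ctor
      // assert: p's decision surface is (close to) y = 3.

      // Repeat with dataB's weights zeroed (expect x = 5), then twice
      // more with tiny weights (1e-4) instead of zeros, expecting the
      // same surfaces to be recovered.
    }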
< naywhayare> so I think I have provided an ok answer to all the questions in your email... is there anything I've missed, or any other concerns?
< udit_s> I think so, but I'll still be reiterating the answers back, just to make sure I've correctly understood your line of thought.
< naywhayare> okay
< udit_s> okay, going back to the first thing we were discussing in the beginning - so basically, the 'initial' parameters of the 'other' weak learner would be set in the weak learner argument that is input from, say, the main.cpp of adaboost
< naywhayare> right; the AdaBoost class will probably need to take an instantiated weak learner as an input argument, and then the weak learners that it trains on weighted data can use that input learner to get their parameters
< udit_s> then, based on the template parameters and the second constructor, we could use this first blueprint of a weak learner (defined in the main file and passed to the adaboost object as an argument) to train subsequent weak learners.
< naywhayare> right
< udit_s> cool. I understood the perceptron distribution scheme, but I think I'll have to think about how we could implement a focusing scheme, if you will, in a decision stump.
< naywhayare> I'll try to do a little thinking about how to do that for the decision stump. for now, we can just focus on the perceptron
< udit_s> I think I'll read up on it or try to extrapolate some of what we've discussed.
< udit_s> yeah. okay.
< udit_s> And the paper I shared - that's the AdaBoost.MH algorithm I'll be implementing right now. Later I'll extend this by adding the SAMME.R algorithm - I think it's just another step in the boosting iteration.
< naywhayare> yeah, I think all of the different adaboost algorithms are straightforward extensions of the original
< udit_s> Yeah.
< udit_s> Okay. Great. Anything else ? I think I'll have a basic outline ready by tonight, and I'll probably share it with you guys if I do.
< naywhayare> I don't have anything else to add at this time
< udit_s> I'll see you in a while then. :)
udit_s has quit [Quit: Leaving]
oldbeardo has joined #mlpack
Anand has joined #mlpack
< marcus_zoq> Anand: We have to cast the predicted values to integers, because the 'ConfusionMatrix' function can't deal with continuous inputs.
< Anand> Marcus : Ok, it works now! But, will casting them to integers not affect the calculations?
< Anand> And the weka time parser is still giving errors. Can you tell me what it expects?
< marcus_zoq> Anand: You are right, we need to do a mapping.
< marcus_zoq> Anand: I will look into the issue in a few minutes.
< Anand> Sure
Anand has quit [Ping timeout: 246 seconds]
Anand has joined #mlpack
< marcus_zoq> Anand: I can't build the weka logistic regression java code.
< marcus_zoq> Anand: You've forgotten to include the classMap function. And you can't redefine the data variable.
< Anand> Are you talking about weka linear regression?
< marcus_zoq> Yes
< Anand> We don't need the class mapping for linear regression as we can directly get the predicted class
< marcus_zoq> Anand: So we can remove line: 91?
< Anand> Yes , we can!
< marcus_zoq> Anand: So now we are dealing with the following error: 'weka.classifiers.functions.Logistic: Cannot handle numeric class!'
< Anand> Ok. You said something about redefining the data variable too?
< marcus_zoq> Anand: Line: 112 -> String data=""; But we already defined 'Instances data = source.getDataSet();'
< Anand> Ok.
< marcus_zoq> Anand: Sorry, I've used the wrong data, so we aren't dealing with the mentioned error.
< Anand> Ok. I thought we were using a wrong weka classifier
< Anand> What is the status now?
< marcus_zoq> Anand: Good question, the weka_lr_predictions.csv looks weird -> 0.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.
Anand_ has joined #mlpack
Anand has quit [Ping timeout: 246 seconds]
< marcus_zoq> Anand: Now I'm talking about the logistic regression function.
< Anand_> So, is linear regression building fine?
Anand has joined #mlpack
< marcus_zoq> Anand: Yeah, after I removed the classMap call and fixed the redefinition of the data variable.
< Anand> Ok.
< Anand> So, the weka_lr_probabilities is misbehaving
< Anand> I think we have a data redefinition here too
< Anand> !
< Anand> This should be the problem
< Anand> Lines : 68 and 132
Anand_ has quit [Ping timeout: 246 seconds]
< Anand> Marcus : Is it working for you?
< marcus_zoq> Anand: We should use fdata = fdata.concat(","); right?
< marcus_zoq> Anand: And fdata = fdata.concat(String.valueOf(prediction));
< Anand> Yes, you are right!
< marcus_zoq> Anand: Okay :)
< marcus_zoq> Anand: Let's check if this works
< Anand> fdata = fdata.concat(String.valueOf(probabilities[k])); and fdata=fdata.concat(",");
< Anand> Did it work?
< marcus_zoq> Okay, kind of: if you write 'writer_predict.write(String.valueOf(predictedClass.doubleValue()));' the output is '0.00.00.00.00[..]', because we insert a comma between the values, right?
< marcus_zoq> Anand: We don't insert a comma between the values.
< Anand> Oh. yes! We need to insert a comma.
< Anand> No, actually we don't need a comma for predictions
< Anand> we might need a newline though!
< Anand> We need commas only for probabs
< marcus_zoq> You are right: writer_predict.write(String.valueOf(predictedClass.doubleValue())+"\n"); works
< Anand> Ok. Great! :)
< Anand> Can you push the changes?
< Anand> I think I will not hurry with the merge. Let me make sure that any remaining bugs are fixed by tomorrow, and then I will merge on Sunday. Some more errors have probably crept in
< marcus_zoq> Anand: Yeah, we need to modify the logistic regression in the same way.
< Anand> We just did logistic regression, right?
< marcus_zoq> Anand: Right, the linear regression builds, but there is also the data.concat bug.
< Anand> Yeah, I will fix that!
< marcus_zoq> Anand: Let me check in the changes.
< Anand> Sure
< Anand> Marcus : Also tell me the next step. I think I also need to start with the bootstrapping framework after this merge, probably.
< marcus_zoq> Anand: I will look into the bootstrapping method over the weekend, so I can give you a proper answer to your question, if that's okay with you?
< Anand> Sure. I will also start it after the weekend.
< Anand> Marcus : Please also check the python code for the timer parser errors during make run
< Anand> Or you can tell me and I will fix it
< marcus_zoq> Anand: I've committed everything; you are dealing with a timing error?
< Anand> Yes, try to run small_config.yaml for weka linear regression
< Anand> It gives the following error :
< Anand> [FATAL] Can't parse the data: wrong format
< Anand> [FATAL] Can't parse the timer
< Anand> Traceback (most recent call last):
< Anand>   File "benchmark/run_benchmark.py", line 313, in <module>
< Anand>     Main(args.config, args.blocks, log, args.methodBlocks, update)
< Anand>   File "benchmark/run_benchmark.py", line 279, in Main
< Anand>     instance.RunMetrics(options)
< Anand>   File "./linear_regression.py", line 118, in RunMetrics
< Anand>   File "/home/anand/GSoC/benchmarks/util/misc.py", line 17
< marcus_zoq> Anand: Can you send me your small_config?
< marcus_zoq> Anand: Can you also make sure that the code follows the coding guidelines?
< Anand> Check your mail.
< Anand> Which file did I miss?
< marcus_zoq> Anand: Regarding the coding guidelines?
< Anand> Yes
< marcus_zoq> Anand: Some of the Java files use 'String predict="";', sorry I've been picky about that...
< Anand> No. I am sorry. I will take care of that!
< Anand> :)
< marcus_zoq> Anand: The problem is weka needs files in the arff format.
< Anand> Ok. You mean no csv?
< marcus_zoq> Anand: I've written a simple conversion from csv to arff. So if you use format: [arff] the files are converted.
< Anand> So, I just need to change the yaml file?
< marcus_zoq> Anand: The problem is that in this case we transform the true labels. So we need to take care of this situation.
< Anand> How?
< marcus_zoq> Anand: It should be a simple change. I can make this change in a couple of minutes. I need to give someone a ride back in a couple of minutes ...
< Anand> Ok. Sure
Anand has quit [Ping timeout: 246 seconds]
< naywhayare> oldbeardo: are you there?
< naywhayare> you said you needed some help yesterday, but I was unable to help then... I am here now
< naywhayare> although I imagine you are probably asleep; it is very late where you are...
oldbeardo_ has joined #mlpack
< oldbeardo_> naywhayare: sorry about that, net issues
< oldbeardo_> naywhayare: I will send you the code that I was having trouble with
oldbeardo has quit [Ping timeout: 246 seconds]
< naywhayare> oldbeardo_: ok, I will look into it immediately
< naywhayare> you said you are unavailable next week, so whatever I can help with now before you go, I will
< oldbeardo_> naywhayare: sent you the code, the problem lies with the second RegularizedSVD() constructor
< oldbeardo_> I'm not able to figure out what exactly
< oldbeardo_> the code gives errors on compilation
< naywhayare> regularized_svd_impl.hpp:53:39: error: no matching function for call to ‘mlpack::optimization::SGD<mlpack::svd::RegularizedSVDFunction>::SGD()’
< naywhayare> that ?
< oldbeardo_> yes
< oldbeardo_> I have no idea as to what SGD() has got to do with the function
< naywhayare> the issue is that the RegularizedSVD class has a member of type SGD<...>
< naywhayare> but SGD has no default constructor, because it requires the user to pass in an instantiated function
sumedh__ has joined #mlpack
< oldbeardo_> you mean the optimizer object?
< naywhayare> yeah
< naywhayare> in that second constructor, you haven't provided any initializer for the optimizer object, so it tries to call the default constructor (which has no arguments)
< naywhayare> but there isn't a default constructor for any optimizers
< naywhayare> I am trying to work out a solution... it will be easiest to show you what I mean by sending you a diff
< naywhayare> hang on
< oldbeardo_> well I can always scrap the 'optimizer' object from the class
< naywhayare> that's one possibility, but I would think that the idea is that the constructor sets up all the parameters for the SVD
< naywhayare> and then ::Apply() will actually run it
sumedh_ has quit [Ping timeout: 240 seconds]
< naywhayare> my solution doesn't completely work, but that's because SGD is hardcoded into the class; if you templatize it, it will work... so let me explain the idea
< naywhayare> instead of holding SGD<RegularizedSVDFunction> optimizer, hold SGD<RegularizedSVDFunction>* optimizer (a pointer)
< naywhayare> if the user calls the first constructor, set optimizer to 'new SGD<RegularizedSVDFunction>(...)'
< naywhayare> and also have the class hold a boolean, 'ownsOptimizer', which is set to true if the first constructor (where the user does not pass an optimizer) is called
< naywhayare> if the user calls the second constructor, set ownsOptimizer to false and optimizer to &batchOptimizer
< naywhayare> then create a destructor with the code 'if (ownsOptimizer) { delete optimizer; }' because the memory has to be cleaned up if the class allocated it
< naywhayare> does that idea make sense?
< naywhayare> or have I overlooked something?
< naywhayare> I wasn't able to get it to compile easily because SGD is hardcoded, but I think really 'optimizer' should be of type 'OptimizerType<RegularizedSVDFunction>*'
< oldbeardo_> right, it does make sense, I did something like that for MGS in QUIC-SVD
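A condensed sketch of that ownership pattern; the constructor arguments are trimmed down (the real RegularizedSVDFunction takes more parameters), and as noted the SGD type would ideally be a template parameter instead of being hardcoded:

    class RegularizedSVD
    {
     public:
      // First constructor: no optimizer supplied, so allocate one and
      // remember that we own it.
      RegularizedSVD(const arma::mat& data) :
          function(data),
          optimizer(new mlpack::optimization::SGD<
              mlpack::svd::RegularizedSVDFunction>(function)),
          ownsOptimizer(true) { }

      // Second constructor: use the caller's optimizer; do not own it.
      RegularizedSVD(const arma::mat& data,
                     mlpack::optimization::SGD<
                         mlpack::svd::RegularizedSVDFunction>& opt) :
          function(data), optimizer(&opt), ownsOptimizer(false) { }

      // Clean up only the memory this class allocated itself.
      ~RegularizedSVD()
      {
        if (ownsOptimizer)
          delete optimizer;
      }

     private:
      mlpack::svd::RegularizedSVDFunction function;
      mlpack::optimization::SGD<mlpack::svd::RegularizedSVDFunction>* optimizer;
      bool ownsOptimizer;
    };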
< naywhayare> ok; that type of approach should solve your problem, I think
< oldbeardo_> I will try it out. Also, I have written 3 tests for Reg SVD; I will commit them whenever I get internet
< oldbeardo_> after adding 2-3 more tests
< oldbeardo_> I also went through the PMF paper today, it seems identical to Reg SVD, except that there are two regularization parameters instead of one
< naywhayare> yes, I remember it being quite similar
< naywhayare> all of these factorization algorithms seem to be mostly the same thing with simple little tweaks here and there :)
< oldbeardo_> yes, so should there be a different class for that? because as I see it Reg SVD is a special case of PMF
< oldbeardo_> where both the parameters have the same value
< jenkins-mlpack> Starting build #2012 for job mlpack - svn checkin test (previous build: SUCCESS)
< naywhayare> hm... in that case no, we should just have them as one class
< naywhayare> and that one parameter can be set to 0 to produce regularized SVD, and PMF otherwise
< oldbeardo_> okay, could you confirm this by reading the paper when you get time? there may be subtleties that I may have missed
< naywhayare> yes, I will try to do that in the next week
< naywhayare> is that okay?
< oldbeardo_> sure, especially since I may not have internet for a week
< oldbeardo_> by the way, when will NIPS announce accepted papers?
< naywhayare> I don't know -- and I don't want to know
< naywhayare> if I know when I get the reviews back, I'll fret about it and get all worried
< naywhayare> much better to one day just randomly receive an email "ok, reviews are in, check them out at this URL"
< naywhayare> I know it will say it on nips.cc
< naywhayare> but if you look it up... don't tell me, because then I won't be able to forget, and then I won't get anything done in the three days leading up to it because I'll be on edge :)
oldbeardo_ has quit [Ping timeout: 246 seconds]
< marcus_zoq> naywhayare: Actually I would really like to see the nystroem method integrated into the kernel pca method, but I'm pretty swamped with work, so I think it has to wait until the next release.
< naywhayare> marcus_zoq: what else needs to be done with it? I think I might be able to find some time
< naywhayare> I seem to remember it was pretty close to done, but I wasn't sure of its exact status
< naywhayare> which is, in part, why I sent the email :)
< marcus_zoq> naywhayare: I ended up testing kMeans instead of randomly sampled points. The results are much better, so I've been thinking about a good way to integrate kMeans into the existing architecture without losing the ability to choose another policy like random sampling (e.g. to decrease the memory footprint).
< marcus_zoq> naywhayare: Besides that, if we use kMeans we only need the cluster centroids, but right now there is no option to return the cluster centroids without the assignments.
< naywhayare> marcus_zoq: the assignments end up getting calculated no matter what... do you want me to make a quick overload so that a user can call Cluster(data, clusters, centroids, initialGuess)?
andrewmw94 has quit [Quit: Leaving.]
andrewmw94 has joined #mlpack
< naywhayare> actually, I suppose it's possible to not hold the list of assignments at all and still calculate k-means results
< naywhayare> at some point, after the next release, I have a bunch of refactoring to do for k-means; I wrote a dual-tree algorithm for it about a month ago, and I need to polish it and work it in
< marcus_zoq> Okay, I think at this point there is no need to add a new overload because it remains the same. The problem is how we integrate the kMeans policy into the existing architecture.
< jenkins-mlpack> Project mlpack - svn checkin test build #2012: SUCCESS in 1 hr 21 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2012/
< jenkins-mlpack> Ryan Curtin: Simple style changes for consistency.
< marcus_zoq> Right now you are returning the indexes of the selected points, which is great to decrease the memory footprint.
< naywhayare> oh, I see -- KMeans has no Select() function
< naywhayare> we could write a simple wrapper class to forward a call to PointSelectionPolicy::Select() to KMeans::Cluster() and return only the centroids ?
< naywhayare> KMeansSelectionPolicy or something like that...
< naywhayare> I don't know if that's the best idea, though; it's just the first one I could think of
< marcus_zoq> I think that's not quite my point. Right now the implemented PointSelectionPolicy returns the indexes of the existing data matrix, so we only need to hold the indexes and can assemble the mini-kernel matrix with the existing data matrix -> miniKernel(i, j) = kernel.Evaluate(data.col(selectedPoints(i)), but in the case of the kMeans policy we can't just return the indexes of the selected points; we need the centroids -> miniKernel(i, j) = kernel.Evaluate(ce
< marcus_zoq> The easiest thing we can do is to return the selected points instead of the indexes; this will work for all policies, but increases the memory need in the case of the random policy
< naywhayare> oh! I see what you mean now. let me think for a little while and get back to you...
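One shape the k-means policy could take, assuming the existing KMeans::Cluster() overload that also returns the centroids; note that Select() here returns actual points rather than indexes, which is precisely the interface mismatch described above:

    class KMeansSelection
    {
     public:
      // Return m centroids for building the mini-kernel matrix, i.e.
      // miniKernel(i, j) = kernel.Evaluate(centroids.col(i), ...).
      static arma::mat Select(const arma::mat& data, const size_t m)
      {
        arma::Col<size_t> assignments;
        arma::mat centroids;

        // The assignments are computed but thrown away; avoiding that
        // is the overload discussed earlier.
        mlpack::kmeans::KMeans<> kmeans;
        kmeans.Cluster(data, m, assignments, centroids);

        return centroids;
      }
    };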
imi has joined #mlpack
imi has quit [Quit: Leaving]
sumedh__ has quit [Ping timeout: 264 seconds]