ChanServ changed the topic of #mlpack to: "mlpack: a fast, flexible machine learning library :: We don't always respond instantly, but we will respond; please be patient :: Logs at http://www.mlpack.org/irc/"
< gmanlan>
I was trying to determine if this issue was before the randomness fix or not...
< gmanlan>
but can't tell based on recent source code modifications
< rcurtin>
gmanlan: if you want to try something real quick... can you try modifying the default template parameter for MultipleRandomDimensionSelect in src/mlpack/methods/random_forest/multiple_random_dimension_select.hpp?
< rcurtin>
specifically you could modify it to sqrt(dimensionality of a dataset you will test on)
pd09041999 has joined #mlpack
< rcurtin>
then if you run with --minimum_samples_leaf=1 it seems to be producing more expected behavior
< rcurtin>
I need to compare with some other toolkits, and sweep the number of trees to see
< gmanlan>
ok let me check
< rcurtin>
I need to try a few more datasets, but if this is the case then I will need to refactor so that it is more readily user-selectable
< gmanlan>
ok so when using MultipleRandomDimensionSelect and sqrt(dimensions), min_samples_leaf 1, it delivers a decent accuracy
< rcurtin>
ok, that's good to hear, at least this is moving in the right direction
< rcurtin>
let me keep experimenting and doing some comparisons
< gmanlan>
I'm comparing against the RandomDimensionSelect...
< gmanlan>
this is interesting..
pd09041999 has quit [Remote host closed the connection]
< gmanlan>
so when using RandomDimensionSelect the model will peak at 20 trees, but when using MultipleRandomDimensionSelect it will continue improving when adding more trees
< rcurtin>
that sounds about right. to me it sounds like the right solution here is to (a) modify MultipleRandomDimensionSelect to allow some other values to be set
< rcurtin>
(b) set minimum_leaf_size to 1 by default (also applies to decision tree)
< gmanlan>
right
< gmanlan>
I think that the sample code we provide which uses RandomDimensionSelect is not a happy example then
< rcurtin>
agreed, I think that sample code was written before MultipleRandomDimensionSelect existed
< gmanlan>
I can update it later, don't worry about it now
< gmanlan>
so is RandomDimensionSelect actually useful for RF?
< rcurtin>
I don't think it would be recommended by Leo Breiman, but one could use it in a situation where they wanted every level of the tree to select its split randomly
< rcurtin>
it may have some use for 'extremely randomized trees', should anyone ever implement those
< gmanlan>
ah yes - ok so for most users it would be better to use MultipleRandomDimensionSelect, probably with your tweak for sqrt(dimensions)
< rcurtin>
yeah, agreed
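The two selection strategies discussed above can be sketched side by side. This is a minimal, standard-library-only illustration, not mlpack's implementation: the helper names SelectOneDimension and SelectSqrtDimensions are hypothetical, and the choice of floor(sqrt(d)) distinct candidates per split is an assumption based on the sqrt(dimensions) tweak mentioned in the conversation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not mlpack's API): offer a single random candidate
// dimension for a split, in the spirit of RandomDimensionSelect.
std::size_t SelectOneDimension(std::size_t dimensions, std::mt19937& rng)
{
  assert(dimensions > 0);
  std::uniform_int_distribution<std::size_t> dist(0, dimensions - 1);
  return dist(rng);
}

// Hypothetical helper (not mlpack's API): offer floor(sqrt(d)) distinct
// random candidate dimensions for a split, and always at least one.
std::vector<std::size_t> SelectSqrtDimensions(std::size_t dimensions,
                                              std::mt19937& rng)
{
  assert(dimensions > 0);
  const std::size_t count = std::max<std::size_t>(1,
      static_cast<std::size_t>(std::sqrt(static_cast<double>(dimensions))));

  // Shuffle all dimension indices and keep the first `count` of them, which
  // guarantees that the candidates are distinct.
  std::vector<std::size_t> all(dimensions);
  std::iota(all.begin(), all.end(), std::size_t{0});
  std::shuffle(all.begin(), all.end(), rng);
  all.resize(count);
  return all;
}
```

With 16 dimensions, SelectSqrtDimensions offers 4 candidates per split, so each split can pick the best of several dimensions, whereas SelectOneDimension leaves the split no choice at all.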
< gmanlan>
rcurtin: I will be back in 30' - let me know if there is anything you need for the RF fix
< rcurtin>
I'm still not getting the accuracy I need to be seeing, so I am still debugging
mulx10 has joined #mlpack
< mulx10>
zoq, favre49 : Great idea ! I'll also mail the details.
mulx10 has quit [Quit: Page closed]
< rcurtin>
gmanlan: I'm still debugging here. it may be a little bit more. I think I see the issues, but then the fix exposed an efficiency issue
< rcurtin>
about to call it a night; I'll come back to it tomorrow
mlpackuser100 has joined #mlpack
mlpackuser100 has quit [Client Quit]
mlpackuser100 has joined #mlpack
< mlpackuser100>
HELP
< mlpackuser100>
Hi. I am attempting to use the mlpack SVDPlusPlus(Policy) class, but I've found that it is significantly slower than other (even amateur) SVD++ implementations for the same parameters. Is this expected?
< mlpackuser100>
I am asking in general, I guess. Of course, if any information about my specific case is needed, I can provide it.
pd09041999 has joined #mlpack
< gmanlan>
rcurtin: sounds good - let's chat tomorrow - good night
gmanlan has quit [Ping timeout: 256 seconds]
pd09041999 has quit [Ping timeout: 250 seconds]
Mulx10 has joined #mlpack
< Mulx10>
jeffin143: what's your progress with Reshape layer?
< Mulx10>
Thanks
jeffin143 has joined #mlpack
pd09041999 has joined #mlpack
Mulx10 has quit [Ping timeout: 256 seconds]
< jeffin143>
Mulx10: exams are over by the 10th, so I won't work on that till then
< ShikharJ>
Toshal, Saksham: I'd like to have a conversation regarding the project if you guys are available sometime this week on the IRC? Let's decide on a time and day and devise a plan of execution?
< ShikharJ>
zoq: When is the first official IRC meeting for mlpack planned?
jeffin143 has quit [Read error: Connection reset by peer]
jeffin143 has joined #mlpack
pd09041999 has quit [Ping timeout: 246 seconds]
jeffin143 has quit [Ping timeout: 248 seconds]
jeffin143 has joined #mlpack
< jeffin143>
zoq: if you are free anytime, could we finish up the work on #1798? I guess there is nothing more to be done. Thank you
govg has quit [Ping timeout: 245 seconds]
sooham has joined #mlpack
< sooham>
hi guys are any gsoc projects free this summer?
< sooham>
by free I mean not assigned? I would be down to implement one.
< rcurtin>
frogEye: I saw your email but have not had a chance to respond
< frogEye>
I have tried a few more things; at least it runs, but I am sure it's not getting the categories in the correct way. An example would really help, or even just pointing to the relevant parts of the code.
< zoq>
ShikharJ: Nothing fixed at this point, I'll send out a mail with a whenisgood link (later today).
< zoq>
jeffin143: Agreed, will take another look later today.
< rcurtin>
frogEye: I need to take a look at the API to suggest the right solution, but you could try modifying your dataset so that your categorical values are strings (not numeric categories)
< rcurtin>
then when you load with data::Load(), it should detect and automatically set the categoricals to have type Datatype::categorical
< rcurtin>
however, I am pretty sure there is a nicer way
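The idea behind that suggestion is that a string-valued column gets mapped to integer category codes, while the original labels stay recoverable. A minimal sketch of such a mapping follows; CategoryMapper is a hypothetical class written for illustration and is not mlpack's data::DatasetInfo, whose actual interface should be checked in the mlpack documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical illustration only; this is not mlpack's DatasetInfo class.
class CategoryMapper
{
 public:
  // Return the integer code for a label, assigning the next unused code the
  // first time the label is seen.
  std::size_t Map(const std::string& label)
  {
    const auto it = codes.find(label);
    if (it != codes.end())
      return it->second;

    const std::size_t code = labels.size();
    codes.emplace(label, code);
    labels.push_back(label);
    return code;
  }

  // Recover the original label for a previously assigned code.
  const std::string& Unmap(std::size_t code) const
  {
    assert(code < labels.size());
    return labels[code];
  }

  // Number of distinct categories seen so far.
  std::size_t NumCategories() const { return labels.size(); }

 private:
  std::unordered_map<std::string, std::size_t> codes;
  std::vector<std::string> labels;
};
```

Rewriting numeric categories such as 3 and 7 as strings like "cat_3" and "cat_7" (an assumed naming scheme) means they can no longer be parsed as numbers, which is what lets a loader treat the column as categorical.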
frogEye has quit [Ping timeout: 256 seconds]
Mulx10 has joined #mlpack
frogEye has joined #mlpack
Yashwants19 has joined #mlpack
< Yashwants19>
Hi rcurtin: can you please review my pr.
< Yashwants19>
#1765 and #1884
< Yashwants19>
Thank you :)
Yashwants19 has quit [Client Quit]
< Mulx10>
rcurtin, zoq, ShikharJ : does mlpack support loading an image?
< Mulx10>
zoq: Most image datasets are jpg or png, so loading ppm or pgm won't help.
< Mulx10>
Do you think we should add support for jpeg and png images as well in mlpack?
< rcurtin>
Mulx10: I think it could be useful, but finding the right API could be hard
< Mulx10>
rcurtin : yes, I agree. So how should I proceed?
< Mulx10>
OpenCV and libjpeg are some options.
< Mulx10>
I was thinking of extending data::Load for this task
< rcurtin>
Mulx10: I don't have a problem with that, but the key is dependency management---ideally we should try to avoid introducing new dependencies if possible
< rcurtin>
if OpenCV already makes it easy to support loading many images, maybe it is better to make an OpenCV converter?
Yashwants19 has joined #mlpack
Mulx10 has quit [Ping timeout: 256 seconds]
< Yashwants19>
Hi rcurtin: is the Extra Trees classifier available in mlpack?
Yashwants19 has quit [Client Quit]
< rcurtin>
Yashwants19: no, not at the moment. I am also very underwater with handling a lot of things so it may be a while until I am able to review pull requests in any meaningful capacity again
< jeffin143>
Mulx10, rcurtin: an OpenCV converter sounds good; I always thought that support for loading images needs to be there.
< rcurtin>
yeah, the tricky part is trying to make the dependencies not a problem. we don't want to force the user to have OpenCV installed, but if it is installed we can take advantage of it
< frogEye>
@rcurtin: Yeah, that is the problem: my categorical data is numerical in nature
< frogEye>
I tried looking into the API but still have that issue.
< frogEye>
Hopefully you can help me with this.
jeffin143 has quit [Ping timeout: 245 seconds]
jeffin143 has joined #mlpack
Yashwants19 has joined #mlpack
< Yashwants19>
No problem rcurtin.
Yashwants19 has quit [Client Quit]
frogEye has quit [Quit: Page closed]
poojasakhare has joined #mlpack
< poojasakhare>
hello,
< poojasakhare>
i had some queries, can i get some help ?
poojasakhare has quit [Quit: Page closed]
< jeffin143>
poojasakhare: sure :), just post the queries and someone will get back to you with a relevant answer
< jeffin143>
rcurtin: didn't we apply for GSoD this time? I vaguely remember that you mentioned it in your mail
lrinelli has quit [Quit: Page closed]
< rcurtin>
jeffin143: I didn't apply for GSoD, I'm not sure anyone else had time to. my memory of the meeting where we talked about it is that everyone thought it was a good idea but nobody had the time to do it (correct me if I'm wrong---I could be!)
< jeffin143>
Rcurtin
< jeffin143>
Umm, there was a lot of network lag, so I just went through the mail regarding the discussion
< jeffin143>
It would have been good if we had applied; we could have improved the documentation a bit
< jeffin143>
Anyway, are we up for Google Code-in?
gmanlan has joined #mlpack
< zoq>
That's what I remember as well. Writing good documentation is tricky especially if you aren't familiar with the codebase, so I think GSoD does take a huge amount of time from the mentor side to produce something that is useful at the end.
< zoq>
We should interview some of the Shogun guys and ask how it went.
sreenik has joined #mlpack
< sreenik>
OpenCV, fortunately, has accepted a proposal solely based on data augmentation, so having an optional OpenCV dependency might be beneficial
< rcurtin>
sreenik: an optional dependency is completely fine with me
sreenik has quit [Remote host closed the connection]
< gmanlan>
sreenik: what kind of capability do you need for images? I have been working with steg images for a long time...
pd09041999 has quit [Ping timeout: 246 seconds]
sreenik has joined #mlpack
pd09041999 has joined #mlpack
lozhnikov has quit [Ping timeout: 244 seconds]
< sreenik>
gmanlan: What I mean are things like random cropping, rotation, brightness and contrast adjustments, etc., which are generally done when training data is limited
< gmanlan>
uhm ok, and you need that to be part of mlpack for some reason?
lozhnikov has joined #mlpack
< sreenik>
gmanlan: Do you think otherwise? I am talking about test time augmentation as well.
< gmanlan>
I have little info sorry, I'm not sure what's the goal of your project
pd09041999 has quit [Excess Flood]
pd09041999 has joined #mlpack
< zoq>
I guess the question here is whether this has to be a part of mlpack or whether it can be used in combination with mlpack. If I just wanted to load images, I could use OpenCV or something like that and pass the result to some mlpack function that just takes the input as arma::mat or arma::cube.
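One detail to watch when handing externally loaded pixels to Armadillo is memory layout: image libraries typically store pixels row by row, while Armadillo matrices are column-major. The sketch below uses only the standard library; ToColumnMajor is a hypothetical helper, and the 8-bit single-channel input is an assumption. Its output could then be copied into an arma::mat of the same shape, or flattened into one column per image.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: convert a row-major 8-bit grayscale buffer into a
// column-major buffer of doubles (the element order Armadillo uses).
std::vector<double> ToColumnMajor(const std::vector<unsigned char>& rowMajor,
                                  std::size_t rows,
                                  std::size_t cols)
{
  assert(rowMajor.size() == rows * cols);

  std::vector<double> columnMajor(rows * cols);
  for (std::size_t r = 0; r < rows; ++r)
  {
    for (std::size_t c = 0; c < cols; ++c)
    {
      // Element (r, c) sits at r * cols + c in row-major order and at
      // c * rows + r in column-major order.
      columnMajor[c * rows + r] = static_cast<double>(rowMajor[r * cols + c]);
    }
  }
  return columnMajor;
}
```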
< gmanlan>
right, that's my point - but I don't know the intent of the project...
< sreenik>
zoq: You are right. Loading is fine, not sure about test-time augmentation though (because that happens inside the predict function). Anyway, this discussion doesn't have much relevance now since opencv's augmentation module is not yet complete
< rcurtin>
sreenik: yeah, so if OpenCV gives us everything we need out of the box, then we can just use it in an example
< rcurtin>
if not, maybe we can write some support functionality with something like '#ifdef OPENCV_INSTALLED' and handle that part via CMake
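A compile-time guard of that kind might look like the sketch below. The macro name HAS_OPENCV and the function names are hypothetical; the assumption is that the build system (for example, CMake after a successful find_package(OpenCV)) would define the macro, and that without it the code still compiles and simply reports that image loading is unavailable.

```cpp
#include <cassert>
#include <string>
#include <vector>

// HAS_OPENCV is a hypothetical macro that the build system would define only
// when OpenCV was found; nothing here requires OpenCV otherwise.
#ifdef HAS_OPENCV
#include <opencv2/imgcodecs.hpp>
#endif

// Report whether this build was compiled with image loading support.
bool ImageSupportAvailable()
{
#ifdef HAS_OPENCV
  return true;
#else
  return false;
#endif
}

// Load a grayscale image into `pixels` (row-major, 8 bits per pixel).
// Returns false if loading fails or if OpenCV support was not compiled in.
bool LoadImagePixels(const std::string& filename,
                     std::vector<unsigned char>& pixels)
{
#ifdef HAS_OPENCV
  cv::Mat image = cv::imread(filename, cv::IMREAD_GRAYSCALE);
  if (image.empty())
    return false;
  if (!image.isContinuous())
    image = image.clone(); // clone() yields a continuous buffer
  pixels.assign(image.data, image.data + image.total());
  return true;
#else
  (void) filename; // unused without OpenCV
  pixels.clear();
  return false;
#endif
}
```

Keeping the guard inside the function bodies means callers never need their own #ifdef; they can check ImageSupportAvailable() or the return value at run time.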
petris has quit [Ping timeout: 250 seconds]
petris_ has joined #mlpack
< zoq>
mlpackuser1: Hm, it might be worth testing out other optimizer settings, e.g. increasing the batch size; https://github.com/mlpack/ensmallen/issues/108 might also be relevant. Perhaps you can share the dataset?