ChanServ changed the topic of #mlpack to: "mlpack: a fast, flexible machine learning library :: We don't always respond instantly, but we will respond; please be patient :: Logs at http://www.mlpack.org/irc/
< gmanlan> I was trying to determine if this issue was before the randomness fix or not...
< gmanlan> but can't tell based on recent source code modifications
< rcurtin> gmanlan: if you want to try something real quick... can you try modifying the default template parameter for MultipleRandomDimensionSelect in src/mlpack/methods/random_forest/multiple_random_dimension_select.hpp?
< rcurtin> specifically you could modify it to sqrt(dimensionality of a dataset you will test on)
pd09041999 has joined #mlpack
< rcurtin> then if you run with --minimum_samples_leaf=1 it seems to be producing more expected behavior
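rcurtin's tweak amounts to the usual random-forest heuristic of trying roughly sqrt(d) candidate dimensions at each split, where d is the dataset dimensionality. The sketch below is a standalone illustration of that idea only and assumes nothing about mlpack internals: `SqrtDimensions` and `SelectRandomDimensions` are hypothetical names, not part of mlpack's `MultipleRandomDimensionSelect` API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: number of candidate dimensions to try at each split,
// using the common floor(sqrt(d)) heuristic, with a minimum of 1.
std::size_t SqrtDimensions(std::size_t dimensionality)
{
  const std::size_t k = static_cast<std::size_t>(
      std::sqrt(static_cast<double>(dimensionality)));
  return std::max<std::size_t>(k, 1);
}

// Hypothetical helper: draw k distinct dimension indices from
// [0, dimensionality) uniformly at random via a partial Fisher-Yates shuffle.
std::vector<std::size_t> SelectRandomDimensions(std::size_t dimensionality,
                                                std::size_t k,
                                                std::mt19937& rng)
{
  assert(k <= dimensionality);
  std::vector<std::size_t> dims(dimensionality);
  std::iota(dims.begin(), dims.end(), 0);
  for (std::size_t i = 0; i < k; ++i)
  {
    // Swap a random not-yet-chosen index into position i.
    std::uniform_int_distribution<std::size_t> pick(i, dimensionality - 1);
    std::swap(dims[i], dims[pick(rng)]);
  }
  dims.resize(k);
  return dims;
}
```

For a 16-dimensional dataset this tries 4 random dimensions per split, instead of the single random dimension a RandomDimensionSelect-style strategy would use.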
< rcurtin> I need to compare with some other toolkits, and sweep the number of trees to see
< gmanlan> ok let me check
< rcurtin> I need to try another few datasets, but if this is the case then I will need to refactor so that it is more readily user-selectable
< gmanlan> ok so when using MultipleRandomDimensionSelect and sqrt(dimensions), min_samples_leaf 1, it delivers a decent accuracy
< rcurtin> ok, that's good to hear, at least this is moving the right direction
< rcurtin> let me keep experimenting and doing some comparisons
< gmanlan> I'm comparing against the RandomDimensionSelect...
< gmanlan> this is interesting..
pd09041999 has quit [Remote host closed the connection]
< gmanlan> so when using RandomDimensionSelect the model will peak at 20 trees, but when using MultipleRandomDimensionSelect it will continue improving when adding more trees
< rcurtin> that sounds about right. to me it sounds like the right solution here is to (a) modify MultipleRandomDimensionSelect to allow some other values to be set
< rcurtin> (b) set minimum_leaf_size to 1 by default (also applies to decision tree)
< gmanlan> right
< gmanlan> I think that the sample code we provide which uses RandomDimensionSelect is not a good example then
< rcurtin> agreed, I think that sample code was written before MultipleRandomDimensionSelect existed
< gmanlan> I can update it later, don't worry about it now
< gmanlan> so is RandomDimensionSelect actually useful for RF?
< rcurtin> I don't think it would be recommended by Leo Breiman, but one could use it in a situation where they wanted every level of the tree to select its split randomly
< rcurtin> it may have some use for 'extremely randomized trees', should anyone ever implement those
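For contrast, a fully random split of the kind described here, which is also the core move in extremely randomized trees, can be sketched as drawing one dimension at random and then a threshold uniformly between that dimension's minimum and maximum within the node. This is a simplified, hypothetical illustration (`RandomSplit` and `DrawRandomSplit` are made-up names); the published Extra-Trees method draws several such candidates and keeps the best-scoring one, which this sketch omits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical result type: which dimension to split on, and where.
struct RandomSplit
{
  std::size_t dimension;
  double threshold;
};

// Simplified Extra-Trees-style split proposal. points[i][j] is dimension j
// of point i. Picks one dimension uniformly at random, then a threshold
// uniformly in [min, max) of that dimension over the node's points.
RandomSplit DrawRandomSplit(const std::vector<std::vector<double>>& points,
                            std::mt19937& rng)
{
  assert(!points.empty() && !points[0].empty());
  std::uniform_int_distribution<std::size_t> pickDim(0, points[0].size() - 1);
  const std::size_t d = pickDim(rng);

  // Range of the chosen dimension within this node.
  double lo = points[0][d];
  double hi = points[0][d];
  for (const auto& p : points)
  {
    lo = std::min(lo, p[d]);
    hi = std::max(hi, p[d]);
  }

  // A constant dimension cannot be split meaningfully; return its value.
  if (lo == hi)
    return {d, lo};

  std::uniform_real_distribution<double> pickThreshold(lo, hi);
  return {d, pickThreshold(rng)};
}
```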
< gmanlan> ah yes - ok so for most users MultipleRandomDimensionSelect would be better, probably with your sqrt(dimensions) tweak
< rcurtin> yeah, agreed
< gmanlan> rcurtin: I will be back in 30 minutes - let me know if there is anything you need for the RF fix
< rcurtin> I'm still not getting the accuracy I need to be seeing, so I am still debugging
mulx10 has joined #mlpack
< mulx10> zoq, favre49: Great idea! I'll also mail the details.
mulx10 has quit [Quit: Page closed]
< rcurtin> gmanlan: I'm still debugging here. It may take a little longer. I think I see the issues, but the fix exposed an efficiency issue
< rcurtin> about to call it a night; I'll come back to it tomorrow
mlpackuser100 has joined #mlpack
mlpackuser100 has quit [Client Quit]
mlpackuser100 has joined #mlpack
< mlpackuser100> HELP
< mlpackuser100> Hi. I am attempting to use the mlpack SVDPlusPlus(Policy) class, but I've found that it is significantly slower than other (even amateur) SVD++ implementations for the same parameters. Is this expected?
< mlpackuser100> I am asking in general, I guess. Of course, if any information about my specific case is needed, I can provide it.
pd09041999 has joined #mlpack
< gmanlan> rcurtin: sounds good - let's chat tomorrow - good night
gmanlan has quit [Ping timeout: 256 seconds]
pd09041999 has quit [Ping timeout: 250 seconds]
Mulx10 has joined #mlpack
< Mulx10> jeffin143: what's your progress with Reshape layer?
< Mulx10> Thanks
jeffin143 has joined #mlpack
pd09041999 has joined #mlpack
Mulx10 has quit [Ping timeout: 256 seconds]
< jeffin143> Mulx10: my exams are over by the 10th, so I won't work on that until then
< ShikharJ> Toshal, Saksham: I'd like to have a conversation regarding the project on IRC if you guys are available sometime this week. Let's decide on a time and day and devise a plan of execution?
< ShikharJ> zoq: When is the first official IRC meeting for mlpack planned?
jeffin143 has quit [Read error: Connection reset by peer]
jeffin143 has joined #mlpack
pd09041999 has quit [Ping timeout: 246 seconds]
jeffin143 has quit [Ping timeout: 248 seconds]
jeffin143 has joined #mlpack
< jeffin143> zoq: if you are free anytime, could we finish up the work on #1798? I guess there is nothing more to be done. Thank you
govg has quit [Ping timeout: 245 seconds]
sooham has joined #mlpack
< sooham> hi guys, are any GSoC projects free this summer?
< sooham> by free I mean not assigned. I would be down to implement one.
lrinelli has joined #mlpack
sooham has quit [Quit: Page closed]
mlpackuser100 has quit [Quit: Page closed]
< jenkins-mlpack2> Project docker mlpack nightly build build #318: STILL UNSTABLE in 3 hr 55 min: http://ci.mlpack.org/job/docker%20mlpack%20nightly%20build/318/
pd09041999 has joined #mlpack
pd09041999 has quit [Ping timeout: 245 seconds]
frogEye has joined #mlpack
< frogEye> I have been trying to work with categorical features
< frogEye> Is there a good example that can help me work with them?
< frogEye> I have tried to use DatasetInfo but I have not been able to make it work properly
< frogEye> I have posted the same question with code which I have tried on stackoverflow also https://stackoverflow.com/questions/56001462/categorical-features-in-mlpack
< frogEye> any help will be highly appreciated.
pd09041999 has joined #mlpack
pd09041999 has quit [Ping timeout: 255 seconds]
pd09041999 has joined #mlpack
< rcurtin> frogEye: I saw your email but have not had a chance to respond
< frogEye> I have tried a few more things; at least it runs, but I am sure it's not getting the categories the correct way. An example, or just a pointer to the relevant code, would really help.
< zoq> ShikharJ: Nothing fixed at this point, I'll send out a mail with a whenisgood link (later today).
< zoq> jeffin143: Agreed, will take another look later today.
< rcurtin> frogEye: I need to take a look at the API to suggest the right solution, but you could try modifying your dataset so that your categorical values are strings (not numeric categories)
< rcurtin> then when you load with data::Load(), it should detect and automatically set the categoricals to have type Datatype::categorical
< rcurtin> however, I am pretty sure there is a nicer way
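Conceptually, loading string-valued columns as categorical means each distinct token is assigned an integer code. The minimal sketch below shows that mapping with a hypothetical `CategoryMapper` class; it is not mlpack's `DatasetInfo`. It also illustrates why writing numeric categories as strings helps: a token such as "10" simply becomes another category code, and its numeric magnitude is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical illustration of categorical encoding: each distinct string
// gets an integer code, assigned in order of first appearance.
class CategoryMapper
{
 public:
  // Returns the code for `value`, creating a new one if it is unseen.
  std::size_t Map(const std::string& value)
  {
    auto it = codes.find(value);
    if (it != codes.end())
      return it->second;
    const std::size_t code = names.size();
    codes.emplace(value, code);
    names.push_back(value);
    return code;
  }

  // Recovers the original string for a code.
  const std::string& Unmap(std::size_t code) const
  {
    assert(code < names.size());
    return names[code];
  }

  std::size_t NumCategories() const { return names.size(); }

 private:
  std::unordered_map<std::string, std::size_t> codes;
  std::vector<std::string> names;
};
```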
frogEye has quit [Ping timeout: 256 seconds]
Mulx10 has joined #mlpack
frogEye has joined #mlpack
Yashwants19 has joined #mlpack
< Yashwants19> Hi rcurtin: can you please review my PRs?
< Yashwants19> #1765 and #1884
< Yashwants19> Thank you :)
Yashwants19 has quit [Client Quit]
< Mulx10> rcurtin, zoq, ShikharJ : does mlpack support loading an image?
< Mulx10> I am not sure; I haven't been able to find it.
< zoq> Mulx10: You can load PPM or PGM files; see http://arma.sourceforge.net/docs.html#save_load_mat
Mulx10 has quit [Ping timeout: 256 seconds]
pd09041999 has quit [Ping timeout: 248 seconds]
pd09041999 has joined #mlpack
Mulx10 has joined #mlpack
< Mulx10> zoq: Most image datasets are in JPG or PNG format, so loading PPM or PGM won't help.
< Mulx10> Do you think we should add support for jpeg and png images as well in mlpack?
< rcurtin> Mulx10: I think it could be useful, but finding the right API could be hard
< Mulx10> rcurtin : yes, I agree. So how should I proceed?
< Mulx10> OpenCV and libjpeg are some options.
< Mulx10> I was thinking of extending data::Load for this task
< rcurtin> Mulx10: I don't have a problem with that, but the key is dependency management---ideally we should try to avoid introducing new dependencies if possible
< rcurtin> if OpenCV already makes it easy to support loading many images, maybe it is better to make an OpenCV converter?
Yashwants19 has joined #mlpack
Mulx10 has quit [Ping timeout: 256 seconds]
< Yashwants19> Hi rcurtin: is an Extra Trees classifier available in mlpack?
Yashwants19 has quit [Client Quit]
< rcurtin> Yashwants19: no, not at the moment. I am also very underwater with handling a lot of things so it may be a while until I am able to review pull requests in any meaningful capacity again
< jeffin143> Mulx10, rcurtin: an OpenCV converter sounds good; I always thought that support for loading images needs to be there.
< rcurtin> yeah, the tricky part is trying to make the dependencies not a problem. we don't want to force the user to have OpenCV installed, but if it is installed we can take advantage of it
< frogEye> rcurtin: Yeah, that is the problem: my categorical data is numerical in nature
< frogEye> I tried looking into the API but still have that issue.
< frogEye> Hopefully you can help me with this.
jeffin143 has quit [Ping timeout: 245 seconds]
jeffin143 has joined #mlpack
Yashwants19 has joined #mlpack
< Yashwants19> No problem rcurtin.
Yashwants19 has quit [Client Quit]
frogEye has quit [Quit: Page closed]
poojasakhare has joined #mlpack
< poojasakhare> hello,
< poojasakhare> I had some queries, can I get some help?
poojasakhare has quit [Quit: Page closed]
< jeffin143> poojasakhare: sure :), just post the queries and someone will get back to you with a relevant answer
< jeffin143> rcurtin: didn't we apply for GSoD this time? I remember you mentioning it in your mail
lrinelli has quit [Quit: Page closed]
< rcurtin> jeffin143: I didn't apply for GSoD, I'm not sure anyone else had time to. my memory of the meeting where we talked about it is that everyone thought it was a good idea but nobody had the time to do it (correct me if I'm wrong---I could be!)
< jeffin143> rcurtin:
< jeffin143> Umm, there was a lot of network lag, so I just went through the mail about the discussion
< jeffin143> It would have been good if we had applied; we could have improved the documentation a bit
< jeffin143> Anyway, are we up for Google Code-in?
gmanlan has joined #mlpack
< zoq> That's what I remember as well. Writing good documentation is tricky especially if you aren't familiar with the codebase, so I think GSoD does take a huge amount of time from the mentor side to produce something that is useful at the end.
< zoq> We should interview some of the Shogun guys and ask how it went.
sreenik has joined #mlpack
< sreenik> OpenCV, fortunately, has accepted a proposal based solely on data augmentation, so having an optional OpenCV dependency might be beneficial
< rcurtin> sreenik: an optional dependency is completely fine with me
sreenik has quit [Remote host closed the connection]
< gmanlan> sreenik: what kind of capability do you need for images? I have been working with steg images for a long time...
pd09041999 has quit [Ping timeout: 246 seconds]
sreenik has joined #mlpack
pd09041999 has joined #mlpack
lozhnikov has quit [Ping timeout: 244 seconds]
< sreenik> gmanlan: What I mean are things like random cropping, rotation, brightness, contrast adjustments, etc, which are generally done when train data is limited
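The kinds of augmentation listed here can be sketched on a plain grayscale buffer without any library. The example below shows a linear brightness/contrast adjustment (out = alpha * in + beta, clamped to [0, 255]) and a random crop. `GrayImage`, `AdjustBrightnessContrast`, and `RandomCrop` are hypothetical names, rotation is omitted for brevity, and none of this reflects an existing mlpack or OpenCV API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical grayscale image stored row-major: pixels[r * width + c].
struct GrayImage
{
  std::size_t width = 0, height = 0;
  std::vector<std::uint8_t> pixels;
};

// Linear adjustment out = clamp(alpha * in + beta, 0, 255), rounded.
// alpha > 1 increases contrast; beta > 0 brightens.
GrayImage AdjustBrightnessContrast(const GrayImage& in, double alpha,
                                   double beta)
{
  GrayImage out = in;
  for (std::uint8_t& p : out.pixels)
  {
    const double v = std::min(std::max(alpha * p + beta, 0.0), 255.0);
    p = static_cast<std::uint8_t>(v + 0.5);
  }
  return out;
}

// Extracts a cropW x cropH window at a uniformly random position.
GrayImage RandomCrop(const GrayImage& in, std::size_t cropW,
                     std::size_t cropH, std::mt19937& rng)
{
  assert(cropW <= in.width && cropH <= in.height);
  std::uniform_int_distribution<std::size_t> pickX(0, in.width - cropW);
  std::uniform_int_distribution<std::size_t> pickY(0, in.height - cropH);
  const std::size_t x0 = pickX(rng);
  const std::size_t y0 = pickY(rng);

  GrayImage out;
  out.width = cropW;
  out.height = cropH;
  out.pixels.reserve(cropW * cropH);
  for (std::size_t r = 0; r < cropH; ++r)
    for (std::size_t c = 0; c < cropW; ++c)
      out.pixels.push_back(in.pixels[(y0 + r) * in.width + (x0 + c)]);
  return out;
}
```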
< gmanlan> uhm ok, and you need that to be part of mlpack for some reason?
lozhnikov has joined #mlpack
< sreenik> gmanlan: Do you think otherwise? I am talking about test time augmentation as well.
< gmanlan> I have little info, sorry; I'm not sure what the goal of your project is
pd09041999 has quit [Excess Flood]
pd09041999 has joined #mlpack
< zoq> I guess the question here is whether this has to be a part of mlpack or can be used in combination with mlpack. If I just wanted to load images, I could use OpenCV or something like that and pass the result to an mlpack function that takes its input as arma::mat or arma::cube.
< gmanlan> right, that's my point - but I don't know the intent of the project...
< sreenik> zoq: You are right. Loading is fine, not sure about test-time augmentation though (because that happens inside the predict function). Anyway, this discussion doesn't have much relevance now since opencv's augmentation module is not yet complete
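One detail matters for zoq's suggestion of loading images outside mlpack and passing them in as Armadillo data: a continuous OpenCV `cv::Mat` stores pixels row-major with channels interleaved, while Armadillo matrices and cubes are column-major. The sketch below reorders such a buffer into the memory layout of an `arma::cube` with (rows = height, cols = width, slices = channels). The function name is hypothetical, and whether a given mlpack method expects exactly this ordering is an assumption to check against its documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical converter: takes an interleaved, row-major 8-bit buffer
// (e.g. B,G,R,B,G,R,... along each row, as in a continuous cv::Mat) and
// returns doubles laid out like an arma::cube(height, width, channels):
// slice by slice, each slice stored column-major. The result could serve
// as one column of a dataset matrix.
std::vector<double> InterleavedToColumnMajor(
    const std::vector<std::uint8_t>& data, std::size_t height,
    std::size_t width, std::size_t channels)
{
  assert(data.size() == height * width * channels);
  std::vector<double> out(data.size());
  for (std::size_t r = 0; r < height; ++r)
    for (std::size_t c = 0; c < width; ++c)
      for (std::size_t ch = 0; ch < channels; ++ch)
      {
        // Source index: row-major pixels, channels interleaved.
        const std::size_t src = (r * width + c) * channels + ch;
        // Destination index: slice ch, column c, row r (column-major).
        const std::size_t dst = ch * height * width + c * height + r;
        out[dst] = static_cast<double>(data[src]);
      }
  return out;
}
```

For a 2x2 image with two channels stored as {1, 2, 3, 4, 5, 6, 7, 8}, the result is {1, 5, 3, 7, 2, 6, 4, 8}: all of channel 0 column by column, then channel 1.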
pd09041999 has quit [Ping timeout: 258 seconds]
lozhnikov has quit [Quit: ZNC 1.7.3 - https://znc.in]
sreenik has quit [Quit: Page closed]
lozhnikov has joined #mlpack
< rcurtin> sreenik: yeah, so if OpenCV gives us everything we need out of the box, then we can just use it in an example
< rcurtin> if not, maybe we can write some support functionality guarded with something like '#ifdef OPENCV_INSTALLED' and handle that part via CMake
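A minimal sketch of the optional-dependency pattern described here, using the `OPENCV_INSTALLED` macro name from the discussion. It assumes the build system defines that macro only when OpenCV is found (for example, after a CMake `find_package(OpenCV)` check on `OpenCV_FOUND`); `HasImageSupport` and `TryLoadImage` are hypothetical helpers. Without the macro, the code still compiles and simply reports that image loading is unavailable.

```cpp
#include <cassert>
#include <string>

// OPENCV_INSTALLED is assumed to be defined by the build system only when
// OpenCV is available; nothing here requires OpenCV otherwise.
#ifdef OPENCV_INSTALLED
#include <opencv2/imgcodecs.hpp>
#endif

// Hypothetical helper: reports whether optional image support was built in.
constexpr bool HasImageSupport()
{
#ifdef OPENCV_INSTALLED
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: loads an image if OpenCV support is compiled in.
// Otherwise returns false and explains why in `error`, so callers degrade
// gracefully instead of failing to build.
bool TryLoadImage(const std::string& path, std::string& error)
{
  assert(!path.empty());
#ifdef OPENCV_INSTALLED
  cv::Mat img = cv::imread(path);
  if (img.empty())
  {
    error = "could not read " + path;
    return false;
  }
  return true;
#else
  (void) path;
  error = "image loading requires building with OpenCV";
  return false;
#endif
}
```

Keeping the `#ifdef` confined to a small wrapper like this means the rest of the code never needs to know whether OpenCV was present at build time.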
petris has quit [Ping timeout: 250 seconds]
petris_ has joined #mlpack
< zoq> mlpackuser1: Hm, it might be worth testing other optimizer settings, e.g. increasing the batch size; https://github.com/mlpack/ensmallen/issues/108 might also be relevant. Perhaps you can share the dataset?
< zoq> sooham: Hey, anything not listed here: https://summerofcode.withgoogle.com/organizations/5868789747417088/#projects is unassigned. I guess there are some projects listed on the GSoC page that are somewhat open-ended as well.
gmanlan has quit [Ping timeout: 256 seconds]