< zoq> uzipaz: Have you thought about reducing the dimension, e.g. using PCA?
< uzipaz> zoq: the original dataset I was given had about 1600 features and 2050 samples... it also contained many missing values... we used WEKA to do feature selection: best-first search reduced the dataset to 150 features and 1123 samples, and genetic search reduced it to 842 features and 1123 samples
< uzipaz> zoq: didn't try using PCA though
< jand> Hi, i have a dataset of 60k samples, each is of dimension 4k. I train a DET, but I always get +inf as the density estimate for all training examples. I use the default params and 10 fold cross-validation. Is it due to the volume at the specific leaves being very small, such that f_N(x) becomes +inf? Thanks for any help you can give.
< rcurtin> jand: that's almost certainly what is happening, with 4k dimensions
< rcurtin> ah, too late, they already left... well, hopefully they know where to find the IRC logs...
< jandrews_> hi rcurtin
< jandrews_> i read the irc logs :)
< jandrews_> were you going to say anything else, before i lost my connection
< jandrews_> ?
< rcurtin> ah great, good to know you got the answer :)
< rcurtin> I don't really have any other suggestions... the DET volume calculations are generally done in logspace, which helps with the extremely small volumes in very large dimensions
< rcurtin> but even so, in 4000 dimensions the volumes will get too small or too large
< rcurtin> maybe you could try PCA or some other dimensionality reduction technique (or even just feature selection of some sort?) to reduce the dimensionality before using DETs?
< jandrews_> ok, great. thanks for the suggestion. i had thought about PCA.
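
Note on the volume problem discussed above: a minimal sketch of why a leaf density can come out as +inf in 4000 dimensions, and why working in logspace only partly helps. It assumes the usual leaf estimate of (points in leaf) / (N * leaf volume) with a hyper-rectangular leaf. The side length of 0.5, the count of 10 points per leaf, and all function names are illustrative assumptions; this is not mlpack's implementation.

```python
import math

def leaf_density_naive(count, n_total, side_lengths):
    """Density computed directly from the leaf volume (assumed formula)."""
    volume = math.prod(side_lengths)  # underflows to 0.0 in high dimensions
    if volume == 0.0:
        return math.inf               # dividing by a zero volume gives +inf
    return count / (n_total * volume)

def leaf_log_density(count, n_total, side_lengths):
    """Same quantity, kept in logspace so the volume never underflows."""
    log_volume = sum(math.log(s) for s in side_lengths)
    return math.log(count) - math.log(n_total) - log_volume

def safe_exp(x):
    """exp() that returns +inf instead of raising when the result overflows."""
    try:
        return math.exp(x)
    except OverflowError:
        return math.inf

# Hypothetical leaf: 4000 dimensions, every side of length 0.5,
# holding 10 of 60000 training points.
dims = 4000
sides = [0.5] * dims
count, n_total = 10, 60000

naive = leaf_density_naive(count, n_total, sides)
log_density = leaf_log_density(count, n_total, sides)

print("naive density:       ", naive)
print("log density:         ", log_density)
print("exp(log density):    ", safe_exp(log_density))
```

Under these assumptions the log density stays finite, but converting it back to an ordinary density still overflows, which matches the point that logspace helps yet 4000 dimensions remain too many. Comparing log densities directly, or reducing the dimensionality first, avoids the conversion.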
