verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
vivekp has quit [Ping timeout: 276 seconds]
yudi has joined #mlpack
< yudi> #newbie I am new to this org
< yudi> I would like to know about this org
vivekp has joined #mlpack
vivekp has quit [Ping timeout: 240 seconds]
vivekp has joined #mlpack
vivekp has quit [Changing host]
vivekp has joined #mlpack
yudi has quit [Ping timeout: 260 seconds]
yudi has joined #mlpack
yudi has quit [Client Quit]
yudi has joined #mlpack
< yudi> hello, I am new to this org and I would like to start contributing to it
yudi has quit [Quit: Page closed]
yudi has joined #mlpack
< yudi> hello, can anyone brief me on how I can contribute to this org?
yudi has quit [Quit: Page closed]
alsc has joined #mlpack
< alsc> hey there! I have been using RMSprop as the optimizer for an ANN, with learning rate 0.002 and alpha 0.9. sometimes it doesn’t converge… can you suggest another optimizer?
< alsc> getting some warnings:
< alsc> g++ --version g++ (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
< alsc> last thing: I can’t understand how to use the Serialize method to save and load an ANN
alexscc has joined #mlpack
< alexscc> I think I have found how to do it; it needs to use boost::serialization directly
< rcurtin> alexscc: yeah, you can either use boost::serialization or you can use data::Load()
< rcurtin> i.e. data::Load("filename.bin", "model_name", model)
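A minimal sketch of that save/load round trip for an FFN model (the tiny two-layer architecture is just a placeholder; the filename and name strings are the ones from the example above):

    #include <mlpack/core.hpp>
    #include <mlpack/methods/ann/ffn.hpp>
    #include <mlpack/methods/ann/layer/layer.hpp>

    using namespace mlpack;
    using namespace mlpack::ann;

    int main()
    {
      FFN<NegativeLogLikelihood<> > model;
      model.Add<Linear<> >(10, 5);    // placeholder architecture
      model.Add<LogSoftMax<> >();

      // Serialize the model to disk, then load it into a fresh object.
      data::Save("filename.bin", "model_name", model);

      FFN<NegativeLogLikelihood<> > model2;
      data::Load("filename.bin", "model_name", model2);
    }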
< rcurtin> I don't think the warnings you're getting are a problem, but probably they should be fixed at some point...
< rcurtin> when you say RMSprop doesn't converge, is the error jumping around to huge values? maybe it needs a smaller learning rate
< rcurtin> you could also try one of the other optimizers... there are a lot now, see src/mlpack/core/optimizers/
< rcurtin> many have basically the same API as RMSprop because they are SGD-like
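For instance, a sketch of switching such a network to Adam with a smaller step size than 0.002 (the tiny random dataset and single-layer network are placeholders, and the remaining Adam parameters are left at their defaults):

    #include <mlpack/core.hpp>
    #include <mlpack/core/optimizers/adam/adam.hpp>
    #include <mlpack/methods/ann/ffn.hpp>
    #include <mlpack/methods/ann/layer/layer.hpp>

    using namespace mlpack::ann;
    using namespace mlpack::optimization;

    int main()
    {
      // Tiny random 10-dimensional, 3-class problem, just to show the API.
      arma::mat data = arma::randu<arma::mat>(10, 100);
      arma::mat labels = arma::floor(arma::randu<arma::mat>(1, 100) * 3) + 1;

      FFN<NegativeLogLikelihood<> > model;
      model.Add<Linear<> >(10, 3);
      model.Add<LogSoftMax<> >();

      // Any SGD-like optimizer from src/mlpack/core/optimizers/ can be
      // passed to Train(); here, Adam with a smaller step size.
      Adam opt(0.0005);
      model.Train(data, labels, opt);
    }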
< rcurtin> O a;sp cam
< rcurtin> oops, fingers off by one
< rcurtin> I also can't reproduce the Log::Info issue; I used this code: https://pastebin.com/dHehAZLq
< alexscc> rcurtin: yes, rmsprop jumps by thousands, but the strange thing is that some similar configurations converge quickly
< alexscc> others don’t
< rcurtin> if I compile with 'g++ -DNDEBUG -o test test.cpp -lmlpack' (with include directories set right), I get output '[INFO ] hello!'
< alexscc> is NDEBUG needed?
< rcurtin> doesn't seem like it, even if I don't have -DNDEBUG it still displays output
< alexscc> because at least with xcode I don’t define it. maybe CMake defines it automatically?
< alexscc> weird. I am in the middle of a change but I’ll definitely test it again
< rcurtin> CMake defines it when mlpack is compiled; sometimes NDEBUG can be used in lower-level libraries for optimizations
< rcurtin> but I don't think mlpack uses it directly
< rcurtin> whether or not Log::Debug output is shown is dependent only on the DEBUG macro, but here we're using Log::Info not Log::Debug anyway
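A minimal test along these lines (not necessarily the exact contents of the paste above) would be:

    #include <mlpack/core.hpp>

    int main()
    {
      // Log::Info output is suppressed by default; flipping this switch
      // shows it (the command-line programs do this when --verbose is given).
      mlpack::Log::Info.ignoreInput = false;
      mlpack::Log::Info << "hello!" << std::endl;
    }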
< alexscc> ok. I will check again. sorry if it was a false alarm but I was pretty sure about it. will let you know
< rcurtin> no worries---it's possible also that it is a problem on your system but not on mine, then we can figure out what's different between the two systems and whether or not it is a bug :)
< alexscc> heh ok :)
< alexscc> anyway, rmsprop should be comparable to ada* no? I’ll test them as well anyway
< alexscc> do you have a sec for a question about basic statistics? :)
< rcurtin> I think so, but my guess is that in your case the loss surface is dependent on a lot of things, so it's hard to say what will work best :)
< rcurtin> and yeah, sure, I can try and answer
< alexscc> I have this 70-dimensional dataset, 15 classes, around 3000 samples. with KNN, NCA, and an ANN I can’t get cross-validation accuracy greater than 90%
< alexscc> samples are annotated
< alexscc> the only description I’ve gotten out of the db so far is the eigenvectors, the variance, and the means
< alexscc> I am pretty sure that there are outliers that mess up the whole dataset, and I’d like to spot them
< alexscc> is there a classic way of getting statistics about a dataset, maybe in the form of numbers? I have used 2D PCA to plot it but it’s not a great result… squashing to 2D is quite nice but definitely not useful for finding outliers
< rcurtin> hmmm, there is the 'mlpack_preprocess_describe' program which will print some statistics about the dataset
< rcurtin> but I am not sure how helpful it would be for finding outliers
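The usual invocation is something like the following (flag names may differ slightly between versions; check --help):

    mlpack_preprocess_describe --input_file dataset.csv --verbose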
< alexscc> nice :)
< rcurtin> one idea for finding them might be to use furthest neighbor search
< rcurtin> if a point is an outlier, it'll commonly be the furthest neighbor of other points
< rcurtin> so you could run all-k-furthest-neighbor search and analyze the results to see which points are appearing quite a lot
< rcurtin> however, a furthest neighbor result is *not necessarily* an outlier
< alexscc> PCA showed quite good class separation. the classes were definitely overlapping but I could see that the clusters are quite well defined
< rcurtin> but, running KFN might help give you some ideas of those points which could be outliers, then maybe you could run some additional test or something (like see if the point's distance to its nearest neighbor is large or something)
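A rough sketch of that KFN-based screening (k, the filename, and the follow-up counting are all placeholders to adapt):

    #include <mlpack/core.hpp>
    #include <mlpack/methods/neighbor_search/neighbor_search.hpp>

    using namespace mlpack;
    using namespace mlpack::neighbor;

    int main()
    {
      arma::mat dataset;
      data::Load("dataset.csv", dataset, true);   // placeholder filename

      // All-k-furthest-neighbor search.
      NeighborSearch<FurthestNeighborSort> kfn(dataset);
      arma::Mat<size_t> neighbors;
      arma::mat distances;
      kfn.Search(5, neighbors, distances);

      // Points that appear very often in 'neighbors' are outlier candidates
      // and can then be checked further (e.g. nearest-neighbor distance).
      arma::Col<size_t> counts(dataset.n_cols, arma::fill::zeros);
      for (size_t i = 0; i < neighbors.n_elem; ++i)
        counts[neighbors[i]]++;
    }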
< alexscc> ok, I see. sounds good
< rcurtin> hmmm, what if you run KNN after using PCA to reduce the dimensionality?
< rcurtin> since distances factor in every dimension, maybe some of the less important dimensions are just "confusing" the distances and making the classes less separable
< alexscc> I thought about it
< alexscc> the eigenvalues descend almost exponentially, so I think that after the first third they can be considered noise
< alexscc> my other idea for the outliers was plotting using t-SNE
< alexscc> how does that sound?
< rcurtin> yeah, and I guess if the eigenvalues are very small compared to the main eigenvalues, it should not affect the distance computations much
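One way to try that reduction is a plain-Armadillo sketch with princomp (mlpack also ships a PCA class, but this keeps the example short; the filename and the 25-dimension cutoff, roughly a third of 70, are placeholders):

    #include <mlpack/core.hpp>

    int main()
    {
      arma::mat dataset;   // mlpack convention: one column per point
      mlpack::data::Load("dataset.csv", dataset, true);   // placeholder filename

      // princomp works on rows, hence the transposes; keep the first 25
      // principal components and hand the reduced points to KNN.
      arma::mat coeff, score;
      arma::princomp(coeff, score, dataset.t());
      arma::mat reduced = score.cols(0, 24).t();
    }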
< rcurtin> hmm, t-sne could work as an embedding, but I wonder if you could then use that embedding directly to learn on and maybe you would get better accuracy
< rcurtin> there are a lot of outlier detection techniques out there, but I'm not particularly well-versed in it to be honest, so I don't have any suggestions that would be better than looking at the wikipedia page for 'outlier detection' and picking a technique :)
< alexscc> haha ok ok
< alexscc> well thanks
< alexscc> I’ll try those two mlpack methods
< rcurtin> KFN might be a bit of a roundabout way to do outlier detection, but it's at least already implemented and ready to use
< rcurtin> you might consider using approximate k-furthest neighbor because it's likely to be a lot faster... it can be found in src/mlpack/methods/approx_kfn/
< alexscc> knn is already blazing fast with mlpack
< rcurtin> great to hear :)
< alexscc> I also want to try using the mahalanobis distance instead of the euclidean distance in KNN, just to check if cross-validation accuracy goes up
< alexscc> it should. if there are outliers, normalization will be making a mess
< rcurtin> the mahalanobis distance with an identity covariance matrix will be equivalent to the euclidean distance, so unless you plug in the results of NCA or something, I don't think it will have an effect
< alexscc> ahhh yes
< alexscc> true
< rcurtin> in my timing simulations I found that it's actually faster to multiply the data by a transformation matrix instead of trying to use the mahalanobis distance with the equivalent covariance matrix
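A sketch of that approach, learning a transformation with NCA and then running plain euclidean KNN on the transformed data (filenames and k are placeholders):

    #include <mlpack/core.hpp>
    #include <mlpack/methods/nca/nca.hpp>
    #include <mlpack/methods/neighbor_search/neighbor_search.hpp>

    using namespace mlpack;

    int main()
    {
      arma::mat dataset, labelsIn;
      data::Load("dataset.csv", dataset, true);   // placeholder filenames
      data::Load("labels.csv", labelsIn, true);
      arma::Row<size_t> labels =
          arma::conv_to<arma::Row<size_t>>::from(labelsIn.row(0));

      // Learn a linear transformation with NCA...
      nca::NCA<> nca(dataset, labels);
      arma::mat transformation;
      nca.LearnDistance(transformation);

      // ...then transform the data once and use ordinary euclidean KNN,
      // instead of plugging a mahalanobis metric into the search itself.
      arma::mat transformed = transformation * dataset;
      neighbor::NeighborSearch<neighbor::NearestNeighborSort> knn(transformed);
      arma::Mat<size_t> neighbors;
      arma::mat distances;
      knn.Search(5, neighbors, distances);
    }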
< alexscc> new question now: is there any dynamic programming method like dynamic time warping around, ready to be used?
< rcurtin> not inside of mlpack, unfortunately :(
< alexscc> I have my own implementation but… I need to transpose everything :)
< alexscc> column vs row major
< rcurtin> you could also write a small function to convert between an Armadillo column-major format and the row-major format that you're using
< alexscc> I am using deques everywhere 💀
< alexscc> deque<vector> nice huh :/
< rcurtin> it gets the job done :)
< rcurtin> still you could probably write a function to fill the deque with the armadillo matrix contents and not need to rewrite everything, I don't imagine it would be too complex
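Such a conversion could look roughly like this, assuming one deque entry per sample, each a row vector:

    #include <deque>
    #include <vector>
    #include <mlpack/core.hpp>

    // deque-of-rows -> Armadillo matrix (mlpack's one-column-per-point layout).
    arma::mat ToArma(const std::deque<std::vector<double>>& rows)
    {
      arma::mat out(rows.front().size(), rows.size());
      for (size_t i = 0; i < rows.size(); ++i)
        out.col(i) = arma::vec(rows[i]);
      return out;
    }

    // ...and back again, e.g. to feed results into the existing DTW code.
    std::deque<std::vector<double>> FromArma(const arma::mat& m)
    {
      std::deque<std::vector<double>> rows;
      for (size_t i = 0; i < m.n_cols; ++i)
        rows.emplace_back(m.colptr(i), m.colptr(i) + m.n_rows);
      return rows;
    }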
< alexscc> yeah
< alexscc> I wonder if there’s a list of projects around, besides the tutorials, that show the usage of all those nice methods that are in mlpack
< alexscc> most things are way above my head; it would be a fun way to learn those techniques
< alexscc> I mean, projects using mlpack
< alexscc> I miss google code search
< alexscc> github’s search for these things is basically useless
< rcurtin> sorry, stepped out for a second...
< rcurtin> I don't know too many projects that are directly using mlpack, but I haven't looked too closely
< rcurtin> I remember one, though, let me dig up the name
< alexscc> nice, images
< alexscc> i am working on audio
< alexscc> maybe I’ll write something then :)
< alexscc> ah weird, I get a very deep template error when trying to serialize an ANN with boost
< alexscc> boost::archive::text_oarchive o(ofs);
< alexscc> o << BOOST_SERIALIZATION_NVP(model);
< alexscc> where model is a FFN<NegativeLogLikelihood<> >
< rcurtin> ugh, boost::serialization template errors are the worst
< rcurtin> FFN<> should have a 'serialize()' method now, not a 'Serialize()' method that has to be used with a shim, so that code should work fine with the latest git master branch (as of probably three or four weeks ago)
< alexscc> I followed recurrent_network_test
< alexscc> I pulled 30 minutes ago :)
< rcurtin> ah, ok
< alexscc> /usr/local/include/boost/serialization/access.hpp:116:11: No member named 'serialize' in 'arma::Mat<double>'
< rcurtin> do you want to pastebin some of the errors? I may recognize the issue
< rcurtin> oh
< alexscc> heh, veery deep
< rcurtin> I bet you are including <armadillo> before <mlpack/core.hpp> :)
< rcurtin> and I also bet that a warning is even displayed about that :)
< alexscc> yup :)
< rcurtin> so Armadillo doesn't natively have boost::serialization functionality
< alexscc> let’s see if that does the trick
< rcurtin> so we "shim" it in using some nice functionality armadillo has
< rcurtin> but in order for that to work, we have to include armadillo in a special way, hence the ordering restriction
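Put together, the ordering looks like this (the model here is an untrained placeholder; training would happen before serialization in practice):

    // <mlpack/core.hpp> must come before any <armadillo> include, so the
    // boost::serialization shim for arma::Mat is picked up.
    #include <mlpack/core.hpp>
    #include <mlpack/methods/ann/ffn.hpp>
    #include <mlpack/methods/ann/layer/layer.hpp>
    #include <boost/archive/text_oarchive.hpp>
    #include <fstream>

    using namespace mlpack::ann;

    int main()
    {
      FFN<NegativeLogLikelihood<> > model;   // placeholder, untrained model

      std::ofstream ofs("model.txt");
      boost::archive::text_oarchive o(ofs);
      o << BOOST_SERIALIZATION_NVP(model);
    }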
< alexscc> I see
< alexscc> hey is there a way to instantiate the templates in a separate file?
< alexscc> compiling takes so long
< alexscc> :p
< rcurtin> yeah, you can use 'extern templates', we do this for the data::Load() functions
< rcurtin> see src/mlpack/core/data/load.cpp (and the relevant sections that declare those instantiations in load.hpp)
< rcurtin> I think using ccache can also help, but to be honest I have never taken the time to use ccache
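The extern-template pattern itself is standard C++; mirroring what load.cpp/load.hpp do for data::Load(), for the FFN type above it would look something like:

    #include <mlpack/methods/ann/ffn.hpp>
    #include <mlpack/methods/ann/layer/layer.hpp>

    // In a header included by every translation unit that uses the model:
    // declare that the instantiation is provided elsewhere.
    extern template class mlpack::ann::FFN<mlpack::ann::NegativeLogLikelihood<> >;

    // In exactly one .cpp file: the single explicit instantiation.
    template class mlpack::ann::FFN<mlpack::ann::NegativeLogLikelihood<> >;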
< alexscc> ok!
< alexscc> cool
< alexscc> the serialization error is still there, but it’s different now; I’ll take a better look
< alexscc> holy wall of template errors
< alexscc> linker error though
< alexscc> Undefined symbols for architecture x86_64:
< alexscc> "boost::serialization::typeid_system::extended_type_info_typeid_0::type_register(std::type_info const&)", referenced from:
< rcurtin> are you linking against boost_serialization? (i.e. -lboost_serialization)
< alexscc> :) sorryyy
< alexscc> succeeded
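For reference, the full compile line being pieced together here ends up as something like the following (the exact set of -l flags and include paths depends on the system):

    g++ -O2 -std=c++11 -o test test.cpp -lmlpack -larmadillo -lboost_serialization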
< rcurtin> :)
< alexscc> thanks a lot
< rcurtin> if you're on ubuntu 16.04, do be aware of this issue: https://bugs.launchpad.net/ubuntu/+source/boost/+bug/1583805
< rcurtin> sure, happy to help
< rcurtin> that issue will only hit if you're on ubuntu 16.04, using boost 1.58, and de-serializing and re-serializing an object (and even then it only happens if the object is holding std::vectors somewhere)
< alexscc> ahh ok cool I’ll keep that in mind
< rcurtin> yeah, we ran into this issue on travis some time back, took a little while to get to the bottom of it
alsc has quit [Quit: alsc]
alexscc has quit [Quit: alexscc]
alsc has joined #mlpack
alsc has quit [Quit: alsc]
petris has joined #mlpack
hsg has joined #mlpack
hsg has quit [Ping timeout: 260 seconds]