ChanServ changed the topic of #mlpack to: "Due to ongoing spam on freenode, we've muted unregistered users. See for more information, or also you could join #mlpack-temp and chat there."
< cult-> mlpack's PCA implementation shows a huge performance gain over princomp; why is Armadillo's implementation so slow?
ImQ009 has joined #mlpack
vivekp has joined #mlpack
ImQ009 has quit [Quit: Leaving]
ImQ009 has joined #mlpack
miqlas has joined #mlpack
< miqlas> hi and good morning.
< miqlas> i got problems during building mlpack on Haiku. It configures and starts to build, but then:
< miqlas> it is strange. AFAIK hdf5 is not a direct dependency of mlpack, only through Armadillo (correct me if I'm wrong)
< miqlas> that path /packages/hdf5-1.10.1-2/.self/develop/headers/hdf5.h surely doesn't exist, as the currently installed version is newer than that.
< miqlas> It should be: /packages/hdf5-1.10.2-1/.self/develop/headers/hdf5.h
< miqlas> But I have absolutely no idea where this path comes from.
< miqlas> Any idea?
< miqlas> It is a freshly extracted tarball, so it is not residue from an earlier configure run.
Xenthys10 has joined #mlpack
Xenthys10 has quit [Remote host closed the connection]
duoi9 has joined #mlpack
duoi9 has quit [Remote host closed the connection]
vivekp has quit [Ping timeout: 268 seconds]
vivekp has joined #mlpack
vivekp has quit [Ping timeout: 252 seconds]
vivekp has joined #mlpack
< rcurtin> miqlas: good morning, long time no see :)
< rcurtin> I looked at the build output, I am wondering if maybe the Haiku Armadillo package is referencing hdf5 in an improper location
< rcurtin> you could check whether ARMA_USE_HDF5 is defined in the Armadillo file armadillo_bits/config.hpp
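A rough way to script the check rcurtin describes, as a minimal sketch: it assumes the text of armadillo_bits/config.hpp has already been read into a string, and the helper name LooksDefined is hypothetical (it is not part of Armadillo or mlpack). It only looks for a plain `#define NAME` line and skips commented-out ones; it does not evaluate surrounding `#if` blocks or handle the `# define` spelling, so treat a hit as a hint rather than proof.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true if some line of headerText has
// "#define" as its first token and exactly `macro` as its second token.
// Lines such as "// #define ARMA_USE_HDF5" do not match, because their
// first token is "//".
bool LooksDefined(const std::string& headerText, const std::string& macro)
{
  std::istringstream lines(headerText);
  std::string line;
  while (std::getline(lines, line))
  {
    std::istringstream tokens(line);
    std::string first, second;
    tokens >> first >> second;  // Leading whitespace is skipped.
    if (first == "#define" && second == macro)
      return true;
  }
  return false;
}
```

Calling LooksDefined(text, "ARMA_USE_HDF5") on the installed header's contents gives a quick yes/no before digging into where the hdf5 include path comes from.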
< rcurtin> cult-: there are a couple of different ways to do PCA: some involve eigendecomposition and some involve SVD, and some work directly on the data matrix and some work on the computed covariance
< rcurtin> each of these is faster or slower in different cases, and mlpack also has some non-direct iterative solvers which can be very fast too
< miqlas> rcurtin: hi
< miqlas> it is because you guys provide releases so infrequently.
< miqlas> :)
< miqlas> we are in the last march before the beta, so every port should be rechecked, built against the beta candidates, and so on.
< miqlas> btw, the problem with mlpack got solved in the meantime; arma was the culprit. I don't know how or where, but it uses an absolute path for the hdf5 headers somewhere, and hdf5 was updated in the meantime, so the header was not found.
< miqlas> rcurtin: are you guys working on HPC systems or HPC-class workstations? ML is pretty *PU intensive (* = {g,c})
< miqlas> I should update mlpack to the latest... and enable python, right?
< rcurtin> ah, sorry---I stepped out for a bit
< rcurtin> my guess for Armadillo is that the path to the include got hard-coded, maybe a find-and-replace in the <armadillo> header or something?
< rcurtin> anyway glad you got it sorted out
< rcurtin> personally, I work on a reasonably powerful desktop, think 8-core i7 with 16GB of RAM
< rcurtin> Armadillo can be configured to use GPUs via the nvblas package, but I don't personally use it
< rcurtin> machine learning is not hugely computationally expensive for everything---deep learning is a good example of something that is really expensive though :)
< miqlas> rcurtin: we don't have nvidia drivers, so nvblas and gpu acceleration are out of the question for Haiku for now, but I would like to know what would be important for an HPC OS, to see if Haiku could qualify for this role (not on the driver side yet, but maybe HPC users have different use cases where Haiku could be relevant).
< miqlas> Like the dependencies in the packages (ok, apt does the same)
< miqlas> or the fact that the packages are never really extracted, just virtually, so if one deletes mlpack or something else there will be no residue files.
< cult-> I have written a sign correction that compares the current and previous eigenvalues and, if they are negatively correlated by more than 50%, reverses the signs in the matrix.
< cult-> yeah maybe it's not a very good idea ..
< rcurtin> miqlas: definitely nvidia cublas and related packages would be necessary, at least for data scientists and machine learning people
< rcurtin> i.e. whatever's needed to run tensorflow or cntk or similar, since it makes such a difference
< miqlas> rcurtin: thanks for the info. Sadly i cannot provide you nv CL or other gpu acceleration.. Still thanks for the info
< zoq> cult-: The idea is relatively simple:
< zoq> arma::rowvec signs = arma::sign(arma::max(eigvec, 0));
< zoq> eigvec.each_row() %= signs;
< zoq> adding that after the svd call in the exact svd method is basically all you need; I just have to make sure I use the right operation, each_row or each_col
< rcurtin> miqlas: sure, just letting you know :)
< miqlas> rcurtin: I assume admins of HPC clusters really know what they are doing. Maybe, just maybe, the packages and the way Haiku can provide them could be useful for them. Correct me if I'm wrong.
< miqlas> OFC nobody would build a 3000-core Haiku supercomputer (Haiku supports only 64 cores by default, but that's only a constant, so one can extend it if required, even to 3000),
< miqlas> so we have an OS but we (or better to say, I) don't have any idea what we could do with it. So the question is: could the possibilities that Haiku provides be useful for HPC guys or not?
< miqlas> I assume they have to fight with every dependency; even a libc could be hard to get. But I have no idea about the HPC world, so sorry if I'm saying something stupid.
< rcurtin> right, so actually the HPC world is just a little bit different than the machine learning world
< rcurtin> (so people who use TensorFlow often aren't doing it on a giant cluster)
< rcurtin> and I am not 100% familiar with what is done there and what packages are needed, unfortunately
< rcurtin> maybe someone else here is, I'm not sure, but I don't know how much help I can be :(
< miqlas> rcurtin: you are already a big help to me.
< miqlas> thanks.
< rcurtin> sure, happy to help where I can :)
< miqlas> not everybody is so open to new platforms and patches, I have to say.
< miqlas> the blender guys are awesome, but some aren't.
< miqlas> some projects say: We will deprecate BeOS/Haiku support if no one keeps it up to date. I sent my patches and the PR is still open after almost a year.
< miqlas> So you guys are great.
< rcurtin> that's disappointing, sorry you have to work with people like that
< rcurtin> from my end most of the compatibility patches, etc., are super simple to review and merge :)
< rcurtin> also I like unusual and different OSes and architectures so I find it fun to help out :)
< cult-> zoq: where are you planning to put this piece of code?
< zoq> cult-: We could either provide a utility function or add another parameter to the PCA class.
< zoq> Also, just realized that we have to take the absolute value over the eigvec.
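One reading of zoq's correction, as a minimal sketch rather than the code that ended up in mlpack: the sign of each eigenvector (column) is taken from its entry with the largest absolute value, and the column is negated when that entry is negative. To stay self-contained the sketch uses plain std::vector instead of Armadillo; the Matrix alias and the function name NormalizeColumnSigns are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Toy dense matrix: m[r][c] is the entry in row r, column c.
using Matrix = std::vector<std::vector<double>>;

// Flip each column so that its largest-magnitude entry becomes positive.
// Returns the per-column sign (+1.0 or -1.0) that was applied.
std::vector<double> NormalizeColumnSigns(Matrix& eigvec)
{
  const std::size_t rows = eigvec.size();
  const std::size_t cols = (rows == 0) ? 0 : eigvec[0].size();
  std::vector<double> signs(cols, 1.0);

  for (std::size_t c = 0; c < cols; ++c)
  {
    // Locate the entry with the largest absolute value in column c.
    std::size_t best = 0;
    for (std::size_t r = 1; r < rows; ++r)
      if (std::abs(eigvec[r][c]) > std::abs(eigvec[best][c]))
        best = r;

    // A negative dominant entry means the whole column gets negated.
    if (eigvec[best][c] < 0.0)
      signs[c] = -1.0;

    for (std::size_t r = 0; r < rows; ++r)
      eigvec[r][c] *= signs[c];
  }
  return signs;
}
```

Multiplying every row elementwise by the returned sign vector is what `eigvec.each_row() %= signs` does in zoq's snippet; the difference here is only in how the signs are chosen. Since an eigenvector and its negation are equally valid, fixing the sign this way makes results comparable between runs.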
ImQ009 has quit [Quit: Leaving]
< cult-> zoq: looks good i will try it tomorrow
< cult-> If it works, the place is def inside the pca class; that's the primary place someone would look for it
Shikhar has joined #mlpack
< Shikhar> zoq: Are you there?
< zoq> Shikhar: I'm here
< Shikhar> zoq: I made some basic changes to the CycleGAN PR (couldn't get much time to implement the complete GAN), and the higher level functions still remain. Just wanted to know your opinion on whether the current structure is the right way to go, with two discriminators and two generators.
< zoq> Shikhar: The structure looks good to me, passing two generators/discriminators is super easy.
< zoq> Also, it's really close to the existing GAN framework, so if someone used the GAN class it's easy enough to switch.
< Shikhar> zoq: Thanks for the quick review. I'll complete this as I get some time.
< zoq> Excited to run some tests :)
< zoq> Shikhar: How did the interview go?
< Shikhar> Ah yes. For the case of CycleGAN probably MNIST to SVHN should be a good starting test. Like here:
< Shikhar> zoq: Ah, didn't make it. Passed a couple of rounds, but got rejected on the third. Still applying to places, probably even to professors at Georgia Tech, seeing if they have any open positions.
< zoq> Agreed, nice task.
vivekp has quit [Ping timeout: 272 seconds]
< zoq> Shikhar: Sorry to hear that; Perhaps Ryan could provide some insight.
< Shikhar> zoq: Nah it was anyways a backup option, my main focus is on research based roles for now, and as such academic institutions.
< zoq> I see, so perhaps we should see if you could publish the GAN work in one or another form, maybe there is an interesting workshop.
< Shikhar> zoq: Oh yeah, I completely forgot about that. I was only thinking about a publication in a journal like PeerJ, but that would probably have to wait for a 4.0 release to happen.
< zoq> Would be interesting to show a fast CPU implementation, the numbers we have right now are quite good.
< Shikhar> zoq: Yeah, benchmarking the code should be a priority for now.
< zoq> Shikhar: Agreed, with some 'real world' datasets.
Shikhar has quit [Quit: Page closed]
< rcurtin> Shikhar: zoq: so, there is a NIPS workshop on machine learning open source software this year; it did get accepted
< rcurtin> so it's possible that the GAN work could be published there. I think the website is still under construction
< rcurtin> I can't review the submission or anything (and I think really I shouldn't be involved at all) since I am one of the organizers, but I'd be happy to provide some input on it
< rcurtin> maybe there is some possibility there
< rcurtin> also, sorry to hear the interview didn't go great; is it an internship or full-time position you are looking for? I think the skills you've shown this summer are in high demand, so I think it should be possible to find something good sooner or later :)
< rcurtin> I guess I am too late, but I think Shikhar looks at the logs :)
< zoq> rcurtin: Sounds interesting.
< zoq> cult-: Let me know if that works for you.
< zoq> looks like the website isn't up yet
< rcurtin> right, as far as I know it's still being put together