#mlpack on 2018-08-26 — irc logs at libera.irclog.whitequark.org

2018-08-06 04:28 ChanServ changed the topic of #mlpack to: Due to ongoing spam on freenode, we've muted unregistered users. See http://www.mlpack.org/ircspam.txt for more information, or also you could join #mlpack-temp and chat there.

06:10 < cult-> mlpack's impl against princomp shows a huge performance gain, why is armadillo's implementation so slow?

06:16 ImQ009 has joined #mlpack

06:26 vivekp has joined #mlpack

07:05 ImQ009 has quit [Quit: Leaving]

07:21 < cult-> zoq: https://prod.sandia.gov/techlib-noauth/access-control.cgi/2007/076422.pdf

07:36 ImQ009 has joined #mlpack

08:53 miqlas has joined #mlpack

08:53 < miqlas> hi and good morning.

08:56 < miqlas> i got problems during building mlpack on Haiku. It configures and starts to build, but then: http://0x0.st/stsB.txt

08:56 < miqlas> it is strange. Afaik hdf5 is not a direct dependency of mlpack, only through armadillo (but correct me if i'm wrong)

08:58 < miqlas> that path /packages/hdf5-1.10.1-2/.self/develop/headers/hdf5.h surely doesn't exist, as the currently installed version is newer than that.

08:58 < miqlas> It should be: /packages/hdf5-1.10.2-1/.self/develop/headers/hdf5.h

08:58 < miqlas> But i have absolutely no idea where this path comes from.

08:58 < miqlas> Any idea?

08:59 < miqlas> It is a freshly extracted tarball, so it is not a leftover from an earlier configure run.

10:48 Xenthys10 has joined #mlpack

10:51 Xenthys10 has quit [Remote host closed the connection]

10:54 duoi9 has joined #mlpack

10:56 duoi9 has quit [Remote host closed the connection]

12:14 vivekp has quit [Ping timeout: 268 seconds]

12:30 vivekp has joined #mlpack

12:47 vivekp has quit [Ping timeout: 252 seconds]

12:50 vivekp has joined #mlpack

14:15 < rcurtin> miqlas: good morning, long time no see :)

14:16 < rcurtin> I looked at the build output, I am wondering if maybe the Haiku Armadillo package is referencing hdf5 in an improper location

14:16 < rcurtin> you could check whether ARMA_USE_HDF5 is defined in the Armadillo file armadillo_bits/config.hpp

14:17 < rcurtin> cult-: there are a couple different ways to do PCA, some involve eigendecomposition and some involve SVD, and some work directly on the data matrix and some work on the computed covariance

14:18 < rcurtin> each of these is faster or slower in different cases, and then also mlpack has some non-direct iterative solvers which can be very fast too
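
A minimal sketch of the "work on the computed covariance" route with an iterative solver, using only the C++ standard library. Power iteration stands in here for an iterative solver in general; it is not claimed to be what mlpack or Armadillo use internally. The names `Covariance` and `TopComponent` are hypothetical helpers, and each row of `data` is assumed to be one observation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // row-major: Mat[i] is row i

// Sample covariance of `data` (one observation per row, at least two rows).
Mat Covariance(const Mat& data)
{
  const std::size_t n = data.size(), d = data[0].size();

  // Per-dimension mean.
  Vec mean(d, 0.0);
  for (const Vec& row : data)
    for (std::size_t j = 0; j < d; ++j)
      mean[j] += row[j] / n;

  // Accumulate centered outer products, scaled by 1 / (n - 1).
  Mat cov(d, Vec(d, 0.0));
  for (const Vec& row : data)
    for (std::size_t j = 0; j < d; ++j)
      for (std::size_t k = 0; k < d; ++k)
        cov[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]) / (n - 1);
  return cov;
}

// Power iteration: approximates the leading eigenvector of `cov`,
// i.e. the first principal component. The all-ones start vector is an
// arbitrary choice; it fails if it is orthogonal to the true component.
Vec TopComponent(const Mat& cov, std::size_t iterations = 100)
{
  const std::size_t d = cov.size();
  Vec v(d, 1.0);
  for (std::size_t it = 0; it < iterations; ++it)
  {
    // next = cov * v
    Vec next(d, 0.0);
    for (std::size_t j = 0; j < d; ++j)
      for (std::size_t k = 0; k < d; ++k)
        next[j] += cov[j][k] * v[k];

    // Normalize to unit length; stop if the product collapsed to zero.
    double norm = 0.0;
    for (double x : next)
      norm += x * x;
    norm = std::sqrt(norm);
    if (norm == 0.0)
      break;
    for (std::size_t j = 0; j < d; ++j)
      v[j] = next[j] / norm;
  }
  return v;
}
```

Only the d x d covariance is touched in the loop, which is one reason covariance-based and iterative variants can win when there are many observations and few dimensions.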

14:25 < miqlas> rcurtin: hi

14:26 < miqlas> it is because you guys provide releases so infrequently.

14:26 < miqlas> :)

14:26 < miqlas> we are in the last march before the beta, so every port should be rechecked, built against the beta candidates, and so on.

14:27 < miqlas> btw, the problem with mlpack got solved in the meantime; arma was the culprit. I don't know how or where, but it uses an absolute path for the hdf5 headers somewhere, and hdf5 was updated in the meantime, so the path was not found.

14:29 < miqlas> rcurtin: are you guys working on hpc, or hpc-class workstation? ml is pretty *PU intensive *={g,c}

14:34 < miqlas> I should update mlpack to the latest... and enable python, right?

15:27 < rcurtin> ah, sorry---I stepped out for a bit

15:28 < rcurtin> my guess for Armadillo is that the path to the include got hard-coded, maybe a find-and-replace in the <armadillo> header or something?

15:28 < rcurtin> anyway glad you got it sorted out

15:28 < rcurtin> personally, I work on a reasonably powerful desktop, think 8-core i7 with 16GB of RAM

15:29 < rcurtin> Armadillo can be configured to use GPUs via the nvblas package, but I don't personally use it

15:29 < rcurtin> machine learning is not hugely computationally expensive for everything---deep learning is a good example of something that is really expensive though :)

15:41 < miqlas> rcurtin: we don't have nvidia drivers, so nvblas and gpu acceleration are out of the question for Haiku for now, but i would like to know what would be important for an HPC OS, to see if Haiku could qualify for this role (not on drivers yet, but maybe HPC users have different use cases where Haiku could be relevant).

15:41 < miqlas> Like the dependencies in the packages (ok, apt does the same)

15:42 < miqlas> or the fact that the packages would be never really extracted, just virtually, so if one deletes mlpack or something else there will be no resideu files.

15:42 < miqlas> *residue

16:03 < cult-> i have written a sign correction that compares the current and previous eigenvalues and, if they are negatively correlated by more than 50%, reverses the signs in the matrix.

16:09 < cult-> yeah maybe it's not a very good idea ..

16:41 < rcurtin> miqlas: definitely nvidia cublas and related packages would be necessary, at least for data scientists and machine learning people

16:41 < rcurtin> i.e. whatever's needed to run tensorflow or cntk or similar, since it makes such a difference

16:44 < miqlas> rcurtin: thanks for the info. Sadly i cannot provide you nv CL or other gpu acceleration.. Still thanks for the info

16:46 < zoq> cult-: The idea is relatively simple:

16:46 < zoq> arma::rowvec signs = arma::sign(arma::max(eigvec, 0));

16:46 < zoq> eigvec.each_row() %= signs;

16:46 < zoq> after the svd call in the exact svd method is basically all you need; just have to make sure I use the right operation each_row or each_col

16:48 < rcurtin> miqlas: sure, just letting you know :)

16:50 < miqlas> rcurtin: i assume admins of HPC clusters really, really know what they are doing. Maybe, just maybe, the packages and the way Haiku can provide them could be useful for them. correct me if i'm wrong.

16:51 < miqlas> OFC nobody would build a 3000-core Haiku supercomputer (Haiku supports only 64 cores by default, but that's only a constant, so one can extend it if required, even to 3000).

16:54 < miqlas> so we have an OS but we (better to say I) don't have any idea what we could do with it. So the question is whether what Haiku can provide could be useful for HPC guys or not.

16:55 < miqlas> I assume they have to fight with every dependency; even a libc could be hard to get. But I have no idea about the HPC world, so sorry if i'm talking stupid.

17:21 < rcurtin> right, so actually the HPC world is just a little bit different than the machine learning world

17:21 < rcurtin> (so people who use TensorFlow often aren't doing it on a giant cluster)

17:21 < rcurtin> and I am not 100% familiar with what is done there and what packages are needed, unfortunately

17:21 < rcurtin> maybe someone else here is, I'm not sure, but I don't know how much help I can be :(

17:22 < miqlas> rcurtin: you are already a big help to me.

17:22 < miqlas> thanks.

17:37 < rcurtin> sure, happy to help where I can :)

17:43 < miqlas> not everybody is so open to new platforms and patches, i have to say.

17:43 < miqlas> the blender guys are awesome, but some aren't.

17:44 < miqlas> some projects say: We will deprecate BeOS/Haiku support if no one keeps it up to date. I sent my patches and the PR is still open after almost a year.

17:45 < miqlas> So you guys are great.

18:17 < rcurtin> that's disappointing, sorry you have to work with people like that

18:18 < rcurtin> from my end most of the compatibility patches, etc., are super simple to review and merge :)

18:18 < rcurtin> also I like unusual and different OSes and architectures so I find it fun to help out :)

18:38 < cult-> zoq: where are you planning to put this piece of code?

20:05 < zoq> cult-: We could either provide a utility function or add another parameter to the PCA class.

20:05 < zoq> Also, just realized that we have to take the absolute value over the eigvec.

20:07 < zoq> This is what I currently use: https://gist.github.com/zoq/63fba0e1ce0b471e93c271133fe6b77a
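
A minimal sketch of the sign convention discussed above, written against the C++ standard library rather than Armadillo and not taken from the linked gist. It assumes each eigenvector is stored as one column, and it flips a column when its largest-magnitude entry is negative, which is why the absolute value matters: the plain column maximum is positive for almost every column, so its sign carries no information. `FlipSigns` is a hypothetical helper name.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Column = std::vector<double>;
using Matrix = std::vector<Column>;  // Matrix[c] is column c (one eigenvector)

// Flip each column so that its largest-magnitude entry is non-negative.
// Eigenvectors are only defined up to sign, so this picks one
// deterministic representative per column.
void FlipSigns(Matrix& eigvec)
{
  for (Column& col : eigvec)
  {
    if (col.empty())
      continue;

    // Index of the entry with the largest absolute value.
    std::size_t best = 0;
    for (std::size_t i = 1; i < col.size(); ++i)
      if (std::abs(col[i]) > std::abs(col[best]))
        best = i;

    // Negate the whole column if that entry is negative.
    if (col[best] < 0.0)
      for (double& v : col)
        v = -v;
  }
}
```

With the column {0.6, -0.8} the dominant entry is -0.8, so the column becomes {-0.6, 0.8}; a column such as {-0.28, 0.96} is left unchanged.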

20:19 ImQ009 has quit [Quit: Leaving]

20:30 < cult-> zoq: looks good i will try it tomorrow

20:32 < cult-> If it works, the place is def inside the pca class, that's the primary place someone would look for it

20:33 Shikhar has joined #mlpack

20:34 < Shikhar> zoq: Are you there?

20:37 < zoq> Shikhar: I'm here

20:38 < Shikhar> zoq: I made some basic changes to the CycleGAN PR (couldn't get much time to implement the complete GAN), and the higher level functions still remain. Just wanted to know your opinion if the current structure is the right way to go, with two of discriminators and generators each.

20:41 < zoq> Shikhar: The structure looks good to me, passing two generators/discriminators is super easy.

20:43 < zoq> Also, it's really close to the existing GAN framework, so if someone used the GAN class it's easy enough to switch.

20:44 < Shikhar> zoq: Thanks for the quick review. I'll complete this as I get some time.

20:44 < zoq> Excited to run some tests :)

20:45 < zoq> Shikhar: How did the interview go?

20:46 < Shikhar> Ah yes. For the case of CycleGAN probably MNIST to SVHN should be a good starting test. Like here: https://github.com/yunjey/mnist-svhn-transfer

20:47 < Shikhar> zoq: Ah, didn't make it. Passed a couple of rounds, but got rejected on the third. Still applying to places, probably even to professors at Georgia Tech, seeing if they have any open positions.

20:48 < zoq> Agreed, nice task.

20:49 vivekp has quit [Ping timeout: 272 seconds]

20:49 < zoq> Shikhar: Sorry to hear that; Perhaps Ryan could provide some insight.

20:50 < Shikhar> zoq: Nah it was anyways a backup option, my main focus is on research based roles for now, and as such academic institutions.

20:53 < zoq> I see, so perhaps we should see if you could publish the GAN work in one or another form, maybe there is an interesting workshop.

20:55 < Shikhar> zoq: Oh yeah, I completely forgot about that. I was only thinking about a journal publication in a journal like PeerJ, but that would probably have to wait for a 4.0 release to happen.

20:55 < zoq> Would be interesting to show a fast CPU implementation, the numbers we have right now are quite good.

20:57 < Shikhar> zoq: Yeah, benchmarking the code should be a priority for now.

20:57 < zoq> Shikhar: Agreed, with some 'real world' datasets.

21:01 Shikhar has quit [Quit: Page closed]

21:09 < rcurtin> Shikhar: zoq: so, there is a NIPS workshop on machine learning open source software this year; it did get accepted

21:09 < rcurtin> so it's possible that the GAN work could be published there. I think the website is still under construction

21:10 < rcurtin> I can't review the submission or anything (and I think really I shouldn't be involved at all) since I am one of the organizers, but I'd be happy to provide some input on it

21:10 < rcurtin> maybe there is some possibility there

21:11 < rcurtin> also, sorry to hear the interview didn't go great; is it an internship or full-time position you are looking for? I think the skills you've shown this summer are in high demand, so I think it should be possible to find something good sooner or later :)

21:13 < rcurtin> I guess I am too late, but I think Shikhar looks at the logs :)

21:37 < zoq> rcurtin: Sounds interesting.

21:38 < zoq> cult-: Let me know if that works for you.

21:42 < zoq> https://nips.cc/Conferences/2018/Schedule?showEvent=10920 looks like the website isn't up yet

21:49 < rcurtin> right, as far as I know it's still being put together