ChanServ changed the topic of #mlpack to: "mlpack: a fast, flexible machine learning library :: We don't always respond instantly, but we will respond; please be patient :: Logs at http://www.mlpack.org/irc/
gaulishcoin has quit [Read error: Connection reset by peer]
< RishabhGarg108Gi> I think this macro is provided by Catch, and it makes it very easy to perform various checks when writing unit tests. You can have a look at https://github.com/mlpack/mlpack/issues/2523 to appreciate how handy this macro is.
yashwants19[m] has joined #mlpack
< AakashkaushikGit> Hey @zoq , I will have some breaks between exams, so I will keep updating that ann_exp repository.
gaulishcoin0 has joined #mlpack
< iamarchit123Gitt> @RishabhGarg108 if we are using REQUIRE and TEST_CASE from Catch in our own custom code, do we need to link against any library when compiling, apart from including catch.hpp?
ImQ009 has joined #mlpack
< RishabhGarg108Gi> No. All you need is catch.hpp; this header is sufficient.
< zoq> AakashkaushikGit: Sounds good, I will work on it as well.
Sunanda has joined #mlpack
< zoq> Hello everyone, the casual mlpack video meeting starts in about 15 minutes - https://zoom.us/j/3820896170, for more details checkout the community page - https://www.mlpack.org/community.html
Sunanda has quit [Remote host closed the connection]
< rcurtin[m]> wow, only 1m40s to compile the PCA example with templight... it finished as soon as I left the meeting
< rcurtin[m]> but `dot` is having more trouble turning the graphviz graph into an image... :)
Antonhr has joined #mlpack
< Antonhr> I was able to compile with `cmake -D DEBUG=ON -D PROFILE=ON ../ && make -j 4` with no problems after increasing the /swapfile on my Linux machine.
< Antonhr> The compilation was dipping into the swap file claiming up to 11GB of it in addition to my 16GB RAM. So my initial 2GB /swapfile did not stand a chance.
< iamarchit123Gitt> how much did you allocate for the swapfile, and how much free RAM was available when you started the compilation?
< rcurtin[m]> Antonhr: wow, I'm surprised it needed to be so big. we've been digging into memory usage over here too; still tracking down what our major culprits are
< iamarchit123Gitt> 11GB :O
< Antonhr> I created a 16GB /swapfile and it worked; since I have an SSD, there was no slowdown at all. I was able to browse and do other work while waiting for it to finish.
< Antonhr> I had about 11GB of unused RAM when I started, but around 50% to 80% of the way through, the compilation eats it all up and starts borrowing from the swap. I was watching it just for fun.
< Antonhr> Using the swap was way faster and cheaper than upgrading the laptop with another 16GB. So I am glad it worked fine.
< Antonhr> ```
< Antonhr> $ free -h
< Antonhr>                total        used        free      shared  buff/cache   available
< Antonhr> Mem:            15Gi        14Gi       169Mi        53Mi       504Mi       319Mi
< Antonhr> Swap:           15Gi        11Gi       4.9Gi
< Antonhr> ```
< Antonhr> Worst case, when 11GB of swap was claimed.
< shrit[m]> Agreed, we are working on reducing the memory consumption during compilation. By the way, it is always recommended to have an equal amount of RAM and swap on Linux :+1:
< rcurtin[m]> shrit: really? this must be new (...or newer than when I started just sizing all my swap at ~2GB in 2006...)
< shrit[m]> I feel I can no longer compile mlpack on my laptop, only on my workstation
< rcurtin[m]> I never went and updated my understanding of how to set swap sizes, so maybe I should do it differently in the future :)
< Antonhr> I wonder why the default swap size for Ubuntu 20.04 is 2GB if I have 16GB of RAM.
< iamarchit123Gitt> If someone has an HDD, will it be as smooth? An HDD has slower access than an SSD, which is faster and closer to RAM speed, if I am not wrong.
< Antonhr> Modern HDDs have caches, so it may not be that bad, but with GB-sized accesses an SSD will perform better.
< shrit[m]> I remember that the rule was swap = RAM * 2 for small RAM; this was fine when I was using my old 64 MB RAM laptop.
< shrit[m]> By today's standards, 8 GB of RAM should be the minimum for a laptop or any workstation, since most of them are shipped with Windows.
< shrit[m]> So the swap should be at least the same amount. Knowing that most laptops and PCs are sold with 512GB to 1TB of storage, 8 GB of swap is not a big loss.
< Antonhr> yes, even 16GB of swap seems not a great loss out of 1TB
Antonhr has quit [Remote host closed the connection]
< shrit[m]> I put 18 GB of swap on my workstation. Knowing that my RAM is 32GB, I have rarely seen it being used; only once, when I was compiling a physics engine, did it use about 12 GB of swap, for a total memory usage of 44GB.
< rcurtin[m]> ha! after 5 hours my `dot` run of the template call graph for `PCA<>` finished. It gave this output:
< rcurtin[m]> ```
< rcurtin[m]> dot: graph is too large for cairo-renderer bitmaps. Scaling by 0.000267811 to fit
< rcurtin[m]> ```
< rcurtin[m]> and it produced an image that's 32766 pixels wide and 1 pixel tall
< rcurtin[m]> so... not very useful...
< rcurtin[m]> although, maybe better than the alternative; without scaling, it would have a resolution 12M x 375 and take 34GB to represent in memory (not sure how good png encoding is)
< shrit[m]> it is very strange
< shrit[m]> I do not know how much information is generated, but this is a lot for only PCA
< shrit[m]> this should be around 5.5K * 5.5K pixels for one image
< rcurtin[m]> according to the text output there are 176k template instantiations during PCA compilation
< rcurtin[m]> of those, 67k are in `std::`
< abernauer[m]> that sounds like a problem lol
< zoq> did the bitmap export work for the fibonacci example?
< rcurtin[m]> wait, sorry, I got that backwards, *108k* of those are in `std::`
< rcurtin[m]> ~35k are in `arma::`
< rcurtin[m]> and yeah, bitmap export worked just fine for the fibonacci example
< zoq> hm, does that mean there are ~33k template instantiations for PCA?
< rcurtin[m]> I'm not sure yet
ImQ009 has quit [Quit: Leaving]
< shrit[m]> For this number of instantiations, I would award gcc a prize; it is very fast, though
< shrit[m]> they need a trophy :+1:
< rcurtin[m]> ha! I am trying to run templight again, filtering out all instantiations in system headers
< rcurtin[m]> hopefully this will give something a bit smaller and more manageable
< shrit[m]> All of these instantiations are there to reduce running time.
< rcurtin[m]> ok, I think that I have read enough of the templight-tools documentation to understand what I am looking at, and I have learned enough about kcachegrind to understand its results
< rcurtin[m]> it seems that quite some time is spent compiling cereal internals (xml.hpp, json.hpp), but to me this is a little confusing, as I never actually used any cereal functionality in the example program
< rcurtin[m]> https://www.ratml.org/misc/pca.cg (you can load that with kcachegrind)
< rcurtin[m]> and here's the code: https://www.ratml.org/misc/pca.cpp
< rcurtin[m]> a somewhat significant amount of time is spent instantiating Armadillo classes (no surprise there)
< shrit[m]> my browser is on fire when loading the pca.cg, I will download it
< rcurtin[m]> wow, it is trying to load that directly? probably downloading and then opening with kcachegrind is the right thing
< shrit[m]> actually it is on fire when loading only the text
< rcurtin[m]> just ballparking: ~20-30% of time is spent in cereal headers; ~2-5% of time in STL headers; ~15-25% of time in Armadillo headers... if I'm reading that right
< rcurtin[m]> now that I understand this, let me try again with the RNN example
< shrit[m]> I can see it right now
< rcurtin[m]> one of the things that I am struggling to understand right now is why we are instantiating all this cereal stuff when we aren't even using it in that example code
< shrit[m]> I can see all of it
< shrit[m]> I have no idea
< shrit[m]> but we are including cereal.
< shrit[m]> So I do not know if this counts, because we are including it in the core
< rcurtin[m]> yeah, we are including it, but that should not cause types to be instantiated, it should just be parsed, that's all
< rcurtin[m]> there must be some class somewhere we are instantiating, which requires some cereal type or something
< shrit[m]> Actually, we can not see the code on your website
< rcurtin[m]> wrong filename
< shrit[m]> but I can think of one place: when loading the data, we are calling cereal
< rcurtin[m]> hmm, I am not calling `data::Load()` though
< shrit[m]> Maybe armadillo
< shrit[m]> because we are adding a serialize function to Armadillo types; that is the reason we include armadillo later.
< shrit[m]> So if we include arma, we include cereal
< rcurtin[m]> maybe? my understanding is that those are template functions, so only if we actually call them will things be instantiated
< rcurtin[m]> I'm trying to look up the entire call chain to see what the highest-level parent of the cereal instantiations is
< rcurtin[m]> I can't find any tool that can load the full graph visualization of instantiations... so all I seem to be able to use is kcachegrind
< rcurtin[m]> so I don't know how to get the answer to the question of why we are instantiating things in cereal
< rcurtin[m]> I think the best idea I have for how to do this is to slowly remove includes from `core.hpp` and see when cereal stops showing up...
< shrit[m]> When I look at the ELF object in kcachegrind, I can see only cereal::JSONInputArchive
< shrit[m]> which will include everything else.
< rcurtin[m]> I'm wondering if it is getting instantiated as part of things like `HasSerialize` and the other things we are using like that with SFINAE
< shrit[m]> Maybe
< rcurtin[m]> I got the RNN example done too... the boost visitor stuff completely dwarfs the cereal time
< rcurtin[m]> I'm uploading the callgraph file now
< rcurtin[m]> whereas cereal was ~20-30% in the PCA example, here it appears to be... 2-3%?
< rcurtin[m]> just a few of the `*_visitor_impl.hpp` seem to account for 60% of the template instantiation time
< shrit[m]> omg
< shrit[m]> I believe this is normal, when you think about it taking 5 minutes to compile a file
< rcurtin[m]> you know... I think one "quick fix" (but it is not a great fix) would be to merge a bunch of the visitors
< rcurtin[m]> that kind of breaks the entire visitor idea, of course... since you want to have one visitor type per task
< rcurtin[m]> a lot of the pain appears to be under `variant<LayerTypes>::apply_visitor(VisitorType)` for lots of different `VisitorType`s
< shrit[m]> I know, and in the end only one type is required
< rcurtin[m]> I'm going to go do some other things for a while, but I also want to try compiling `knn_main.cpp`, since that uses `KNNModel` which also uses `boost::visitor`
< rcurtin[m]> it would be interesting to see the behavior there, and also that is a much simpler example to adapt to see the effect on compilation times (but I am not sure it will tell us too much about the ANN experiment, it will just give some small idea)
< shrit[m]> Agreed
< shrit[m]> It will take a lot of time until we see something come out of the ann experiment.
< shrit[m]> I do not see the visitor approach going away before 6 to 10 months from now
< zoq> If the inheritance approach does not introduce any big slowdowns, I expect we can remove the visitor approach at least from the ann codebase in way less time.
< zoq> I will try to run some experiments over the weekend.