ChanServ changed the topic of #mlpack to: "mlpack: a fast, flexible machine learning library :: We don't always respond instantly, but we will respond; please be patient :: Logs at http://www.mlpack.org/irc/
gaulishcoin has quit [Read error: Connection reset by peer]
< RishabhGarg108Gi>
I think this macro is provided by Catch, and it makes it very easy for you to perform various checks while writing a unit test. You can have a look at https://github.com/mlpack/mlpack/issues/2523 to appreciate how handy this macro is.
yashwants19[m] has joined #mlpack
< AakashkaushikGit>
Hey @zoq , I will have some breaks between exams, so I will keep updating that ann_exp repository.
gaulishcoin0 has joined #mlpack
< iamarchit123Gitt>
@RishabhGarg108 if we are using REQUIRE or TEST_CASE from Catch in our own custom code, is there any library we need to link against when compiling, apart from including catch.hpp?
ImQ009 has joined #mlpack
< RishabhGarg108Gi>
No. All you need is catch.hpp; this header is sufficient.
< zoq>
AakashkaushikGit: Sounds good, I will work on it as well.
Sunanda has quit [Remote host closed the connection]
< rcurtin[m]>
wow, only 1m40s to compile the PCA example with templight... it finished as soon as I left the meeting
< rcurtin[m]>
but `dot` is having more trouble turning the graphviz graph into an image... :)
Antonhr has joined #mlpack
< Antonhr>
I was able to run `cmake -D DEBUG=ON -D PROFILE=ON ../; make -j 4` with no problems after increasing the /swapfile on my Linux machine.
< Antonhr>
The compilation was dipping into the swap file, claiming up to 11GB of it in addition to my 16GB of RAM, so my initial 2GB /swapfile did not stand a chance.
< iamarchit123Gitt>
How much did you allocate for the swapfile, and how much free RAM was available when you started the compilation?
< rcurtin[m]>
Antonhr: wow, I'm surprised it needed to be so big. we've been digging into memory usage over here too; still tracking down what our major culprits are
< iamarchit123Gitt>
11GB :O
< Antonhr>
I created a 16GB /swapfile and it worked. Since I have an SSD there was no slowdown at all; I was able to browse and do other work while waiting for the build to finish.
< Antonhr>
I had about 11GB of unused RAM when I started, but around 50% to 80% of the way through the compilation it eats it all up and starts borrowing from the swap. I was watching it just for fun.
< Antonhr>
Using the swap was way faster and cheaper than upgrading the laptop with another 16GB. So I am glad it worked fine.
< Antonhr>
```
free -h
       total   used   free    shared  buff/cache  available
Mem:   15Gi    14Gi   169Mi   53Mi    504Mi       319Mi
Swap:  15Gi    11Gi   4.9Gi
```
< Antonhr>
Worst case when 11GB of swap was claimed.
< shrit[m]>
Agreed, we are working on reducing the memory consumption during compilation. By the way, it is always recommended to have an equal amount of RAM and swap on Linux :+1:
< rcurtin[m]>
shrit: really? this must be new (...or newer than when I started just sizing all my swap at ~2GB in 2006...)
< shrit[m]>
I feel I can no longer compile mlpack on my laptop, only on my workstation
< rcurtin[m]>
I never went and updated my understanding of how to set swap sizes, so maybe I should do it differently in the future :)
< Antonhr>
I wonder why the default swap size for Ubuntu 20.04 is 2GB if I have 16GB of RAM.
< iamarchit123Gitt>
If someone has an HDD, will it still be smooth? An HDD has slower access than an SSD, which is faster and closer to RAM speed, if I am not wrong.
< Antonhr>
Modern HDDs have caches, so it may not be that bad, but with GB-sized accesses an SSD will perform better.
< shrit[m]>
I remember the rule was swap = RAM * 2 for small RAM; that was fine when I was using my old 64 MB RAM laptop.
< shrit[m]>
By today's standards, 8 GB should be the minimum for a laptop or any workstation, since most of them are shipped with Windows.
< shrit[m]>
So the swap should be at least the same amount. Knowing that most laptops and PCs are sold with 512GB to 1TB of storage, 8 GB of swap should not be a big loss.
< Antonhr>
yes, even 16GB of swap seems not a great loss out of 1TB
Antonhr has quit [Remote host closed the connection]
< shrit[m]>
I put 18 GB of swap on my workstation. Knowing that my RAM is 32GB, I have rarely seen it used; only once, while compiling a physics engine, did it use about 12 GB of swap, for a total memory usage of 44GB.
< rcurtin[m]>
ha! after 5 hours my `dot` run of the template call graph for `PCA<>` finished. It gave this output:
< rcurtin[m]>
```
< rcurtin[m]>
dot: graph is too large for cairo-renderer bitmaps. Scaling by 0.000267811 to fit
< rcurtin[m]>
```
< rcurtin[m]>
and it produced an image that's 32766 pixels wide and 1 pixel tall
< rcurtin[m]>
so... not very useful...
< rcurtin[m]>
although, maybe better than the alternative; without scaling, it would have a resolution of 12M x 375 and take 34GB to represent in memory (not sure how good png encoding is)
< shrit[m]>
it is very strange
< shrit[m]>
I do not know how much information is generated, but this is a lot for only PCA
< shrit[m]>
this should be around 5.5K * 5.5K pixels for one image
< rcurtin[m]>
according to the text output there are 176k template instantiations during PCA compilation
< rcurtin[m]>
of those, 67k are in `std::`
< abernauer[m]>
that sounds like a problem lol
< zoq>
did the bitmap export work for the fibonacci example?
< rcurtin[m]>
wait, sorry, I got that backwards, *108k* of those are in `std::`
< rcurtin[m]>
~35k are in `arma::`
< rcurtin[m]>
and yeah, bitmap export worked just fine for the fibonacci example
< zoq>
hm, does that mean there are ~33k template instantiations for PCA?
< rcurtin[m]>
I'm not sure yet
ImQ009 has quit [Quit: Leaving]
< shrit[m]>
For this number of instantiations, I would award gcc a prize; it is very fast.
< shrit[m]>
they need a trophy :+1:
< rcurtin[m]>
ha! I am trying to run templight again, filtering out all instantiations in system headers
< rcurtin[m]>
hopefully this will give something a bit smaller and more manageable
< shrit[m]>
All of these instantiations are there to reduce running time.
< rcurtin[m]>
ok, I think that I have read enough of the templight-tools documentation to understand what I am looking at, and I have learned enough about kcachegrind to understand its results
< rcurtin[m]>
it seems that quite some time is spent in compiling cereal internals (xml.hpp, json.hpp), but to me this is a little confusing as I never actually used any cereal functionality in the example program
< rcurtin[m]>
a somewhat significant amount of time is spent instantiating Armadillo classes (no surprise there)
< shrit[m]>
my browser is on fire when loading the pca.cg, I will download it
< rcurtin[m]>
wow, it is trying to load that directly? probably downloading and then opening with kcachegrind is the right thing
< shrit[m]>
actually it is on fire when loading only the text
< rcurtin[m]>
just ballparking: ~20-30% of time is spent in cereal headers; ~2-5% of time in STL headers; ~15-25% of time in Armadillo headers... if I'm reading that right
< rcurtin[m]>
now that I understand this, let me try again with the RNN example
< shrit[m]>
I can see it right now
< rcurtin[m]>
one of the things that I am struggling to understand right now is why we are instantiating all this cereal stuff when we aren't even using it in that example code
< shrit[m]>
I can see all of it
< shrit[m]>
I have no idea
< shrit[m]>
but we are including cereal.
< shrit[m]>
So I do not know if this counts, because we are including it in the core
< rcurtin[m]>
yeah, we are including it, but that should not cause types to be instantiated, it should just be parsed, that's all
< rcurtin[m]>
there must be some class somewhere we are instantiating, which requires some cereal type or something
< shrit[m]>
Actually we cannot see the code on your website
< shrit[m]>
but I would guess that when loading the data, we are calling cereal
< rcurtin[m]>
hmm, I am not calling `data::Load()` though
< shrit[m]>
Maybe armadillo
< shrit[m]>
because we add a serialize function to armadillo; that is the reason we include armadillo later.
< shrit[m]>
So if we include arma we include cereal
< rcurtin[m]>
maybe? my understanding is that those are template functions, so only if we actually call them will things be instantiated
< rcurtin[m]>
I'm trying to look up the entire call chain to see what the highest-level parent of the cereal instantiations are
< rcurtin[m]>
I can't find any tool that can load the full graph visualization of instantiations... so all I seem to be able to use is kcachegrind
< rcurtin[m]>
so I don't know how to get the answer to the question of why we are instantiating things in cereal
< rcurtin[m]>
I think the best idea I have for how to do this is to slowly remove includes from `core.hpp` and see when cereal stops showing up...
< shrit[m]>
When I look at the elf object in kcachegrind, I can see only the cereal::JSONInputArchive
< shrit[m]>
which will include everything else.
< rcurtin[m]>
I'm wondering if it is getting instantiated as part of things like `HasSerialize` and the other things we are using like that with SFINAE
< shrit[m]>
Maybe
< rcurtin[m]>
I got the RNN example done too... the boost visitor stuff completely dwarfs the cereal time
< rcurtin[m]>
I'm uploading the callgraph file now
< rcurtin[m]>
whereas cereal was ~20-30% in the PCA example, here it appears to be... 2-3%?
< rcurtin[m]>
just a few of the `*_visitor_impl.hpp` seem to account for 60% of the template instantiation time
< shrit[m]>
omg
< shrit[m]>
I believe this is normal, considering it takes 5 minutes to compile a file.
< rcurtin[m]>
you know... I think one "quick fix" (but it is not a great fix) would be to merge a bunch of the visitors
< rcurtin[m]>
that kind of breaks the entire visitor idea, of course... since you want to have one visitor type per task
< rcurtin[m]>
a lot of the pain appears to be under `variant<LayerTypes>::apply_visitor(VisitorType)` for lots of different `VisitorType`s
< shrit[m]>
I know, and in the end only one type is required
< rcurtin[m]>
I'm going to go do some other things for a while, but I also want to try compiling `knn_main.cpp`, since that uses `KNNModel` which also uses `boost::visitor`
< rcurtin[m]>
it would be interesting to see the behavior there, and also that is a much simpler example to adapt to see the effect on compilation times (but I am not sure it will tell us too much about the ANN experiment, it will just give some small idea)
< shrit[m]>
Agreed
< shrit[m]>
It will take a lot of time until we see something come out of the ann experiment.
< shrit[m]>
I cannot see the visitor going away before 6 or 10 months from now
< zoq>
If the inheritance approach does not introduce any big slowdowns, I expect we can remove the visitor approach at least from the ann codebase in way less time.
< zoq>
I'll try to run some experiments over the weekend.