rcurtin_irc changed the topic of #mlpack to: mlpack: a scalable machine learning library (https://www.mlpack.org/) -- channel logs: https://libera.irclog.whitequark.org/mlpack -- NOTE: messages sent here might not be seen by bridged users on matrix, gitter, or slack
aakashi2001 has joined #mlpack
aakashi2001 has quit [Ping timeout: 240 seconds]
PriyanshuGupta[m has quit [Quit: You have been kicked for being idle]
<jonathanplatkie4> Hi! I'd like to remove the following message about Armadillo/mlpack during compilation:... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/6e773b6ed06c1eeed46249e85ec746b969aec5ec)
aakashi2001 has joined #mlpack
aakashi2001 has quit [Changing host]
aakashi2001 has joined #mlpack
_whitelogger has joined #mlpack
aakashi2001 has quit [Ping timeout: 250 seconds]
<rcurtin[m]> 🤦 🤦 🤦 I discovered today that in my comparison scripts, where I'm comparing PyTorch against the refactored mlpack convolution code while varying the batch size, I never actually use the batch size in the PyTorch scripts, so all this time I have been comparing against a batch size of 1, no matter what I set it to in my scripts
<JeffinSam[m]> Gsoc has arrived :)
<shrit[m]> Oh gosh
<jjb[m]> Whoops.
<shrit[m]> rcurtin: but were the results for batch size 1 in mlpack similar to batch size 1 in PyTorch?
<JeffinSam[m]> Feb 7-21 :)
<rcurtin[m]> yeah, batch size 1 for mlpack and PyTorch are pretty much the same
<shrit[m]> why did the batch size not change in the script?
<rcurtin[m]> but I noticed that mlpack's performance degraded (with the learning rate held constant) as the batch size increased. that observation makes sense, but PyTorch seemingly did not degrade with increasing batch size
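(A rough illustration of the effect described above, as a sketch rather than anything from the actual scripts: with the learning rate and epoch count held fixed, a larger batch size simply means fewer optimizer updates per epoch, so less progress after one epoch. The batch sizes below are illustrative; MNIST's training set has 60,000 images.)

```python
# Why accuracy after a single epoch can drop as the batch size grows
# (at a fixed learning rate): fewer optimizer steps are taken per epoch.
mnist_train_size = 60_000  # number of MNIST training images
for batch_size in (1, 32, 256):
    updates_per_epoch = mnist_train_size // batch_size
    print(f"batch size {batch_size:>3}: ~{updates_per_epoch} updates per epoch")
```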
<rcurtin[m]> haha, because I wrote `batch_size = int(sys.argv[1])` but then never used the variable anywhere 😂 😂
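(For reference, a minimal sketch of the kind of fix being described, assuming the PyTorch script loads MNIST through a torchvision DataLoader; the dataset and loader setup here are illustrative, not the actual script. DataLoader's default batch_size is 1, which matches the behavior observed above.)

```python
import sys

import torch
from torchvision import datasets, transforms

batch_size = int(sys.argv[1])  # previously parsed but never used anywhere

train_data = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
# The fix: actually pass batch_size to the DataLoader instead of
# leaving it at its default of 1.
train_loader = torch.utils.data.DataLoader(train_data,
                                           batch_size=batch_size,
                                           shuffle=True)
```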
<rcurtin[m]> now I am running the simulations correctly, and it seems like the results are pretty much the same between PyTorch and mlpack. I need to finish them before I am sure, but I think everything is working right here
<rcurtin[m]> ok, here are my simulation results for the MNIST CNN example and a similar implementation in PyTorch:... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/d8332791d66dba1ac848a4cdc4b53acab6daa508)
<rcurtin[m]> so, I can finally move on to the next thing 😄
<zoq[m]1> Do you start with the same initial weights?
<rcurtin[m]> they aren't the exact same; my guess is that's probably what's different here
<rcurtin[m]> but mostly my goal here is just to make sure nothing is horribly wrong, and that seems to be the case
<rcurtin[m]> my training was also for only 1 epoch, and I used the same learning rate for both libraries. I suspect that if I ran until convergence and tuned the learning rate for both libraries, I would be able to produce the same performance
<zoq[m]1> Agreed, just wanted to make sure I'm interpreting the results correctly, because the results start to differ quite a bit with a higher batch size.
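(A small, hedged note on the initial-weights point above: seeding PyTorch's RNG makes its weight initialization reproducible across runs, but it does not make the weights match mlpack's; matching them exactly would require copying the weight matrices from one library to the other, which is beyond this sketch.)

```python
import torch

torch.manual_seed(42)  # arbitrary seed; makes PyTorch's weight init reproducible
# model = build_model()  # hypothetical helper: construct the CNN after seeding
```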