verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
govg has quit [Ping timeout: 276 seconds]
travis-ci has joined #mlpack
< travis-ci>
mlpack/mlpack#3885 (master - 38d2155 : Marcus Edel): The build has errored.
< daivik>
rcurtin: In continuation to the discussion yesterday: I think I may have found a fix for the problem we were facing, but I don't know whether it should be a permanent solution - please let me know what you think. So, when we left off - we knew Emission distribution had dimensionality 0 for some reason. I printed the emission distribution like you s
< daivik>
uggested and found that:
< daivik>
1. emission.size() was two times the number of states. I later found that the reason for this was a stray emission.resize() right before the loading step (in hmm_model.hpp). Commenting this out corrected the size of the emission vector.
< daivik>
2. Each emission[i] had dimensionality one more than the expected. For some reason (which I'm yet to figure out), when a DiscreteDistribution is saved (using something like ar & BOOST_SERIALIZATION_NVP(dd)) and then reloaded (again using BOOST_SERIALIZATION_NVP(dd)), the dimensionality increases by one. Please run this simplified program I've writt
< daivik>
en as a POC (https://pastebin.com/y2zSaArJ). In order to fix this, I added a check in DiscreteDistribution::serialize() to erase the first element of the probabilities vector whenever a load occurs (because the first element was the extra/useless thing that was being added).
< daivik>
I have a few more points to add to this - each of which I'm fairly certain is an issue:
< daivik>
1. None of the serialization tests report the error with DiscreteDistribution::serialize() when I run mlpack_test. IMHO, something should have failed and alerted about this.
< daivik>
2. My "fix" is not really a fix. I still do not know why the extra dimension is being added on its own. And while mlpack_test shows no errors (related to this), how do I know that I haven't broken something?
< daivik>
In summation, what do I do now? Do I submit a pull request with my "fix"? Do I open an issue? Do I proceed with trying to find out why the exra dimension is being mysteriously added? Do I write tests/more tests for serialization/hmm CLI bindings?
< rcurtin>
daivik: thanks for digging so deep. let me look into this and get back to you in a little while (I am about to go into a meeting)
< rcurtin>
daivik: I tried running the POC code you gave, but it looks like on my system, DiscreteDistribution::serialize() seems to behave properly:
< rcurtin>
I do think you are correct that emission.resize() is not necessary; I know why it is that way...
< rcurtin>
it used to be that the code was structured differently and we couldn't use BOOST_SERIALIZATION_NVP() to serialize a vector type so we had to serialize the elements individually (long story)
< rcurtin>
when we switched it to use boost serialization as it was intended, we refactored that code, but I guess the emissions.resize() stayed around by accident
< daivik>
can you please also verify whether you're having problem 1. on your system. that is, try saving a model file using hmm_train and then reusing that in hmm_loglik
< daivik>
the resize() should cause problems on any system
< daivik>
also, what might be the reason for the added dimension i get on my system (I'll paste the output I get in a minute when the build finishes)
< rcurtin>
sure, let's see what that does
< rcurtin>
daivik: first I copied the exact model XML from the pastebin you gave, and that appears to work correctly:
< rcurtin>
[INFO ] Loading 'test.csv' as raw ASCII formatted data. Size is 1 x 3.
< rcurtin>
log_likelihood: -7.73293
< daivik>
thats very strange -- i had to dig through a lot of code to get that
< rcurtin>
and if I try to build the model the same way I think you did (mlpack_hmm_train -i train.csv -n 2 -t discrete -M model2.xml), then run with that model, it successfully completes with a similar log likelihood
< rcurtin>
hmmm, what version of boost are you using?
< daivik>
1.58
< rcurtin>
I am using 1.62 here; I wonder if this is some boost serialization versioning issue
< rcurtin>
I say this because I am currently debugging some serialization failure on 1.66...
< daivik>
right, i should really upgrade
< rcurtin>
still, we do support boost 1.58 so if the issue is in boost we'll need to come up with some kind of workaround...
< daivik>
geez, all this for just a version issue. I've learnt my lesson now
< rcurtin>
well, if upgrading does fix it, we should still at least open an issue somewhere, because I am sure you will not be the last boost 1.58 user :)
< daivik>
i guess i'll upgrade -- probably lose the issue -- and then carry on writing the tests for hmm CLI bindings :)
< daivik>
I'll let you know if upgrading resolves the issue
< daivik>
thanks a lot for your help
< daivik>
And I'm very sorry for wasting your time -- I should've known to check the release notes at least
< rcurtin>
no, you are not wasting my time, figuring stuff out like this is important :)
< rcurtin>
keep in mind that sooner or later someone else will come along with the same problem, so it's important to figure out what's going on on your system and try and diagnose it
< rcurtin>
and it can be a time consuming process sometimes
< rcurtin>
for instance, this other serialization issue I mentioned with 1.66, I've probably spent 2 or 3 hours digging into it today
< rcurtin>
what I discovered in the end is that some change from 1.65.1 to 1.66 caused a problem in the mlpack tests, so I was about to submit the bug report to the serialization github repository
< rcurtin>
and then I discovered that actually, there is not a bug in boost::serialization, just unclear behavior that could be improved, and that this behavior has existed since long before 1.66 came out
< rcurtin>
so, I guess I'll fix the mlpack code, not submit the bug to upstream serialization, and then try and figure out what the actual difference was that causes it to fail for 1.66 but not 1.65...
< rcurtin>
so yeah... this is a large part of the important maintenance that has to happen for a project like this, so don't feel bad at all pointing out issues that need to be investigated like this :)
< daivik>
I did get to learn about the boost serialization modules - so i'm not completely disheartened. And yes, you're absolutely right - things like this do take up a lot of time sometimes (and often dont lead to things of much consequence), but I guess they're still important - like you said.