verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
govg has quit [Ping timeout: 250 seconds]
govg has joined #mlpack
govg has quit [Ping timeout: 256 seconds]
govg has joined #mlpack
govg has quit [Ping timeout: 276 seconds]
govg has joined #mlpack
witness_ has joined #mlpack
govg has quit [Ping timeout: 265 seconds]
govg has joined #mlpack
< zoq>
ShikharJ: Hello, do you think the valid convolution (dilation) is correct? The backward/gradient step for all the layers should be the same.
witness_ has quit [Quit: Connection closed for inactivity]
< ShikharJ>
zoq: I'm not sure what you mean, the implementation is correct (see my example above).
< ShikharJ>
fswatch.so
< zoq>
ShikharJ: Have to take a look at the example (test).
< ShikharJ>
zoq: Are you suggesting that a 3x3 kernel (with dilation = 2) should be augmented to a 5x5 kernel (with zeroes included in between), and the gradients corresponding to that 5x5 kernel be used?
< zoq>
yeah, it's not the fastest solution, but I thought that would work.
< ShikharJ>
But wouldn't that be wrong? The kernel and the weights are defined for a 3x3 size.
< zoq>
In this case, we would only update and use a portion of the weights, true; but the output should be correct.
< ShikharJ>
zoq: Just to be clear, we'll update the original 3x3 kernel with the corresponding parts of the 5x5 output obtained from the Gradients method, right?
< zoq>
ShikharJ: Correct. I'll see if I can think of another solution, but I think that would work for now. On another note, are we going to use atrous convolutions for the GAN project?
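A minimal Armadillo sketch of the kernel-dilation idea discussed above (illustrative only; the function and variable names are assumptions, not mlpack's actual code): zeros are inserted between the original weights, so a 3x3 kernel with dilation = 2 becomes a 5x5 kernel.

```cpp
#include <armadillo>

// Spread a kernel by a given dilation factor, inserting zeros between the
// original weights.  For a 3x3 kernel and dilation = 2 this yields a 5x5
// kernel whose nonzero entries sit at every second row and column.
arma::mat DilateKernel(const arma::mat& kernel, const size_t dilation)
{
  arma::mat dilated((kernel.n_rows - 1) * dilation + 1,
                    (kernel.n_cols - 1) * dilation + 1,
                    arma::fill::zeros);
  for (size_t i = 0; i < kernel.n_rows; ++i)
    for (size_t j = 0; j < kernel.n_cols; ++j)
      dilated(i * dilation, j * dilation) = kernel(i, j);

  return dilated;
}
```

Updating the original 3x3 kernel then means reading back only the entries of the 5x5 gradient that sit at the dilated positions (i * dilation, j * dilation).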
manish7294 has joined #mlpack
< ShikharJ>
zoq: Not quite, this is only for the sake of completeness of mlpack's convolutional toolkit.
< rcurtin>
welcome to the first day of GSoC everybody :)
< rcurtin>
(I guess my day starts later than everyone else's in the US)
< zoq>
ShikharJ: Right, I guess it would be good to have it and test it out, but it's not the priority. If the solution works it's okay, but as I said it's not ideal.
< zoq>
rcurtin: Right, for me it's 3:00 pm :)
< rcurtin>
:)
< manish7294>
rcurtin: In India, we are already nearing the end of our day :)
< manish7294>
rcurtin: Is it OK to talk now?
< rcurtin>
manish7294: I am actually about to leave to go to work---is it too long to wait ~30-45 minutes?
< rcurtin>
if so I can wait to leave for a little while
< manish7294>
rcurtin: No hurry! We can talk later.
< rcurtin>
ok, I'll let you know when I am in the office :)
< manish7294>
rcurtin: cool enough!
< rcurtin>
manish7294: back now, sorry---took a little longer than expected
< manish7294>
rcurtin: no worries! I need to discuss the SDP formulation.
< manish7294>
Currently I tried to make a prototype SDP, but every time I end up with a matrix of huge size,
< manish7294>
which results in a bad_alloc() error.
< rcurtin>
right, so the SDP should be solving a matrix of dxd size where d == number of dimensions
< rcurtin>
and the number of constraints should be k * n where k is probably between 1 and 5 and n is the number of points
< rcurtin>
that shouldn't be so large that a bad_alloc happens, though
< manish7294>
Here, constraints will be k * k * n as there are impostors too.
< rcurtin>
no, I think that should be 2 * k * n
< rcurtin>
since you have kn constraints for the neighbors and kn constraints for the impostors
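For reference, a sketch of the constraints being counted here, in standard LMNN notation (the exact counting depends on how impostors are enumerated; this is not taken from manish7294's code):

```latex
% One margin constraint per (point i, target neighbor j, impostor l) triple,
% plus nonnegativity of the slack variables, and M positive semidefinite:
(x_i - x_l)^T M (x_i - x_l) - (x_i - x_j)^T M (x_i - x_j) \ge 1 - \xi_{ijl},
\qquad \xi_{ijl} \ge 0, \qquad M \succeq 0.
```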
< manish7294>
I think you are right!
< manish7294>
But as for the sdp.C() matrix, how are you thinking of initializing that one?
< rcurtin>
what are you currently trying?
< manish7294>
I am getting confused by the constraint part of the objective function
< manish7294>
In order to include it in the objective function, I need a bigger matrix
< rcurtin>
right, so, for the C matrix, we need to express the objective such that dot(C, M) == the objective function
< manish7294>
yes, for that C must include the target-neighbor + impostor expressions
< rcurtin>
in other formulations like MVU, C can be expressed as sparse
< rcurtin>
but remember also that M is dxd, so C must also be dxd
< rcurtin>
which should be relatively small
< rcurtin>
I'm not sure it will be sparse here, I haven't worked out the algebra to convert to dot(C, M) form
< rcurtin>
actually, I think the BoostMetric paper has done the work for us... I was looking at the LMNN paper to start with
< manish7294>
Yup! That's what is confusing me: how will we keep C as dxd?
< rcurtin>
see equation (P1)
< rcurtin>
in their notation, we are closer to what we need... they have dot(M, A) + (some other things)
< rcurtin>
the matrix A (equation 1) should be dxd
< manish7294>
dot(M, A) is fine. But that other term includes the inner product of (1 - y_ijl) with e_ijl
< manish7294>
So this product in itself requires a dimension of N * N * N :(
< rcurtin>
the C * sum(slack variables) should result in a scalar
< manish7294>
Yup! But all slack variables are different, so we can't declare their basis as a single element
< rcurtin>
we are just summing themm though, so I don't see the issue
< rcurtin>
them*
< manish7294>
Well, I can try keeping that whole term as a single element.
< manish7294>
Still, we will need d + 1 dimensions
< manish7294>
d for A and 1 for the other term
< rcurtin>
this doesn't make sense to me; the objective is just adding (in the notation of the BoostMetric paper) dot(A, M) and the C*sum(slack variables) term
< rcurtin>
the objective is just a scalar
< manish7294>
Here M is a matrix and A is a scalar, so it will result in a scalar multiple of a matrix
< rcurtin>
no, A is a dxd matrix
< rcurtin>
and when you take dot(A, M) you get a scalar
< rcurtin>
if I have some point x_i and x_j of dimension d, and I take (x_i - x_j)*(x_i - x_j)^T, I get a dxd matrix
< manish7294>
You are right. But A is a scalar, as A is the sum of all target-neighbor metric evaluations
< manish7294>
Okay, now it's getting worse. As you said, the objective function is a scalar :(
< rcurtin>
each of those target-neighbor metric evaluations is a matrix of size dxd
< rcurtin>
and we sum each of those, and we still get a dxd matrix
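Putting the last few messages together in the notation being used here (a sketch; symbols follow the chat, not mlpack identifiers): each target-neighbor pair contributes a dxd outer product, so A stays dxd, and its inner product with M recovers the summed distances.

```latex
A = \sum_{(i,j) \in \text{target neighbors}} (x_i - x_j)(x_i - x_j)^T \in \mathbb{R}^{d \times d},
\qquad
\langle A, M \rangle = \sum_{(i,j)} (x_i - x_j)^T M (x_i - x_j).
```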
< rcurtin>
one problem I do see is that the SDP class only supports objectives of the form dot(C, X) (in the notation of the SDP code, not the BoostMetric paper... sorry if that is confusing)
< rcurtin>
so we may need to change it to allow some penalty parameter, but that is not *too* hard I don't think
< rcurtin>
let me know if I can clarify anything... the notation is not easy because the BoostMetric paper uses letters differently than the LMNN paper which also uses them differently than the mlpack code :)
< manish7294>
But that is just the standard SDP form; we may need to convert the LMNN SDP to that form
< rcurtin>
I don't see any way to "stuff" the penalty parameter into the matrix A, but let me think about it, maybe there is some easy algebra that can be done
< rcurtin>
ohh, I see... hang on, let me think about this
< manish7294>
Sure! have your time
< rcurtin>
hmm, so it may be a little bit hard to represent this with Armadillo
< rcurtin>
but I agree that it could work
< rcurtin>
basically, instead of handling the slack constraints separately, this formulation "stuffs" it all into the objective function
< manish7294>
I tried, but ended up with bad_alloc() :(
< manish7294>
Yes
< rcurtin>
well, it would definitely be better handled with the sp_mat class
< manish7294>
sp_mat too reaches its limit here
< rcurtin>
since in this formulation you linked to, we have a huge matrix where A (of size dxd) is dense, and epsilon is diagonal
ImQ009 has joined #mlpack
< manish7294>
as you can see, E_ijl is a diagonal matrix in the objective
< rcurtin>
it shouldn't, I don't think... how are you initializing it?
< rcurtin>
right, agreed, it is diagonal, and it will have size (I think) 2*k*n
< rcurtin>
so the total size of the sparse matrix should be (2*k*n + d) by (2*k*n + d)
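One way to picture the "stuffing the slack variables into the objective" idea described above (a hedged illustration, not necessarily the exact form in the formulation manish7294 linked; c is the slack penalty parameter): both the objective matrix and the variable become block matrices of size (2kn + d) x (2kn + d), with the dense dxd block and a diagonal slack block.

```latex
C = \begin{pmatrix} A & 0 \\ 0 & c\, I_{2kn} \end{pmatrix},
\qquad
X = \begin{pmatrix} M & 0 \\ 0 & \operatorname{diag}(\xi) \end{pmatrix},
\qquad
\langle C, X \rangle = \langle A, M \rangle + c \sum_{i,j,l} \xi_{ijl}.
```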
< manish7294>
Yup
< rcurtin>
and what are k and n in the code you are trying to run?
< manish7294>
k = 3, n=125
< rcurtin>
right, and d is what?
< manish7294>
d is 4
< rcurtin>
okay, so this is a 754x754 matrix
< rcurtin>
that should be easily representable even as a dense matrix
< manish7294>
yup
< rcurtin>
are you sure your implementation is correct?
< rcurtin>
that will take 4.33MB of memory (assuming it's arma::mat), and less if it's arma::sp_mat
< manish7294>
I was doing something worse. I was taking constraints as k * k * n
< rcurtin>
but even that would only give 1129x1129
< rcurtin>
which is 9.72MB for a dense matrix and less for a sparse matrix
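The memory figures can be checked directly (assuming 8-byte doubles for an arma::mat):

```latex
754^2 \times 8 \ \text{bytes} \approx 4.33\ \text{MiB},
\qquad
1129^2 \times 8 \ \text{bytes} \approx 9.72\ \text{MiB}.
```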
< manish7294>
Then I should probably check my code
< manish7294>
Do you think this could work?
< rcurtin>
the formulation you gave could work, but as we scale to larger data we will need to do one of two things:
< rcurtin>
a) represent that matrix as an sp_mat, since it will contain mostly zeros
< rcurtin>
b) write a simple wrapper class that represents the two parts of the matrix as a dense matrix (A) and the diagonal vector of slack variables (epsilon)
< rcurtin>
and that simple wrapper class would have to implement the dot() function, or maybe a template specialization could be used for Evaluate() or something like this
< rcurtin>
I guess the third option is, change the SDP class so that it supports inequality constraints, and then we could represent the slack variables like that
< manish7294>
I will give it a shot again and report back to you by tomorrow, if that's okay!
< manish7294>
One more thing I need to discuss is the constraint class.
< rcurtin>
sure, that's just fine
< manish7294>
Currently I have made the directory structure as follows: separate directories for LMNN and BoostMetric, with the constraint class under the lmnn namespace
< manish7294>
Is it okay to keep the constraint section in lmnn?
< rcurtin>
yeah, I don't see any issue with that
< rcurtin>
honestly when I think about it more, I suspect it may be easier to write a new type of inequality constraint than to stuff the slack variables into the objective functino
< rcurtin>
function*
< rcurtin>
you can give the objective function idea a try to ensure that it works (for small problems where you can represent that whole matrix as an arma::mat, it's probably easier to implement and test)
< rcurtin>
but I don't think that approach will scale as well or run as quickly without a custom matrix-like class (which would be a lot of work)
< manish7294>
Yeah, this change could be promising. I will look into that. Could you give me some pointers to get started on this change?
< rcurtin>
hmm, so let me look quickly to get an idea...
< rcurtin>
hm, I don't know that I like the way this code is structured very much, so there may be a little bit of trickiness
< rcurtin>
take a look at LRSDPFunction::EvaluateConstraint() though (in lrsdp_function_impl.hpp)
< rcurtin>
you can see there that basically this just computes (<A, X> - b)
< rcurtin>
so that is for a constraint like '<A, X> = b'
< manish7294>
Right
< rcurtin>
if we want a constraint instead like '<A, X> >= b', then we want to return 0 if <A, X> >= b, and otherwise we can return (<A, X> - b)
< rcurtin>
(or that maybe should be (b - <A, X>), I didn't fully check the math I did there)
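A minimal sketch of what rcurtin describes, assuming a hypothetical inequality flag (the helper name and signature here are illustrative, not the actual LRSDPFunction API):

```cpp
#include <armadillo>

// Hypothetical helper: evaluate the violation of a single constraint on X.
// For an equality constraint '<A, X> == b' this returns (<A, X> - b), as the
// existing code does; for an inequality constraint '<A, X> >= b' it returns 0
// when the constraint is satisfied and the signed violation otherwise.
double EvaluateConstraintSketch(const arma::mat& a,   // constraint matrix A
                                const arma::mat& x,   // current solution X
                                const double b,
                                const bool inequality)
{
  // Frobenius inner product <A, X>.
  const double value = arma::accu(a % x) - b;
  if (inequality && value >= 0.0)
    return 0.0;

  return value;
}
```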
< rcurtin>
as far as implementation goes, I could see a few possibilities
< rcurtin>
the first might be to add a vector to the SDP class to specify what type each sparse and dense constraint is (that would be either equality or inequality, for now)
< rcurtin>
another might be to add a 'denseInequalityA' and 'sparseInequalityA' member
< rcurtin>
that makes the class a little more complex though
< rcurtin>
I could see either way. the complexity is not a huge issue if it is well-documented so that anyone writing an SDP can understand how to add their constraints
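The first option could look roughly like this (a sketch under assumed names; not mlpack's actual SDP class):

```cpp
#include <vector>

// Hypothetical sketch: tag each constraint with a type so that
// LRSDPFunction::EvaluateConstraint() and the PrimalDualSolver can branch
// on it when enforcing the constraint.
enum class ConstraintType
{
  EQUALITY,    // <A_i, X> == b_i
  INEQUALITY   // <A_i, X> >= b_i
};

class SDPWithConstraintTypes
{
 public:
  // One entry per sparse constraint and one per dense constraint.
  std::vector<ConstraintType>& SparseConstraintTypes() { return sparseTypes; }
  std::vector<ConstraintType>& DenseConstraintTypes() { return denseTypes; }

 private:
  std::vector<ConstraintType> sparseTypes;
  std::vector<ConstraintType> denseTypes;
};
```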
< manish7294>
Yup! That could be a fine structural change.
< rcurtin>
we'd also have to modify the EvaluateConstraint() and GradientConstraint() functions of LRSDPFunction, as well as the PrimalDualSolver to be able to handle these types of constraints
< manish7294>
Yup! I think that will do.
< manish7294>
Thanks for help! :)
< rcurtin>
sure, that is what I am here for :)
< manish7294>
zoq: Just a dumb question. We have to push the blog at the end of each week, right?
< zoq>
manish7294: Good question, at the end or the beginning works just fine; last year most people pushed something on Monday.
sumedhghaisas has joined #mlpack
< rcurtin>
manish7294: for the first week, you could just have an intro post or something if you wanted to do it today
< manish7294>
rcurtin: Sure, I will push something related to the current state.
< rcurtin>
sounds good
< ShikharJ>
rcurtin: Can we request the Armadillo team to port the shed_rows and shed_cols functions to arma::Cube as well?
< ShikharJ>
It is required in the Gradients method for Atrous Convolution.
< ShikharJ>
Currently, I have to use a dummy matrix and an additional cube and then do the computation slice by slice.
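The slice-by-slice workaround described here might look roughly like this (a sketch; the function and variable names are illustrative):

```cpp
#include <armadillo>

// Remove a range of rows from every slice of a cube by copying each slice
// into a dummy matrix, shedding the rows there, and rebuilding the cube.
arma::cube ShedCubeRows(const arma::cube& input,
                        const size_t firstRow,
                        const size_t lastRow)
{
  const size_t removed = lastRow - firstRow + 1;
  arma::cube output(input.n_rows - removed, input.n_cols, input.n_slices);
  for (size_t s = 0; s < input.n_slices; ++s)
  {
    arma::mat slice = input.slice(s);  // dummy matrix
    slice.shed_rows(firstRow, lastRow);
    output.slice(s) = slice;
  }
  return output;
}
```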
< zoq>
Armadillo has extension hooks which allow you to define ARMA_EXTRA_CUBE_MEAT, and Armadillo will include that for you
< ShikharJ>
Hmm, I see if I'm able to come up with something. If not, maybe we can just open an issue for someone else to optimize the operations later.
< ShikharJ>
*I'll
< zoq>
sounds like a good plan to me
< rcurtin>
ShikharJ: the armadillo team is me and conrad :)
< rcurtin>
so you could definitely implement it, and we can submit it upstream
< ShikharJ>
Oh I didn't know that :P Sure, I'll implement it, but my first priority would be to get the dilated convolution PR merged. So I think opening an issue for later work would be much quicker.
< rcurtin>
sure, that's fine
< rcurtin>
I guess I should say, really Conrad does a lot more work for Armadillo than I do, I'd say I'm a "frequent contributor" and I did most of the sparse support over the years
< ShikharJ>
rcurtin: I don't think I know the history of Armadillo (like you told us about mlpack). I'd love to hear it :)
< rcurtin>
ah, I don't know this one as well but I can try...
< rcurtin>
Conrad originally developed it in his work at NICTA in roughly 2008, and has mostly maintained the project himself since then
< rcurtin>
around 2011 I needed sparse matrix support for mlpack, so with a few other mlpack contributors we put together an initial implementation and sent it to him
< rcurtin>
since then, I've been involved mostly from the sparse support side of things
< rcurtin>
we actually just submitted a paper on the Armadillo sparse matrix format, if you are interested:
< ShikharJ>
Paper seems interesting! And does Conrad contribute to mlpack as well?
< rcurtin>
occasionally he submits bugfixes, but he hasn't written any actual algorithms or anything for inclusion in mlpack
< rcurtin>
typically the bugfixes are just handling Armadillo warnings or issues (since he tests Armadillo releases against mlpack)
< ShikharJ>
I see. Before mlpack, I was a GSoC student at SymEngine (a symbolic manipulation library). That's where I was first introduced to linear algebra routines and sparse matrices. Working with matrices is actually quite interesting.
< rcurtin>
yes, there's a lot more to it than I originally expected
< ShikharJ>
Learnt a lot from that experience.
< rcurtin>
in 2010, I thought it would be easy to implement sparse matrices... I quickly found out that they are completely different and much harder to handle than dense matrices
< ShikharJ>
Yeah, and the feeling you get after optimizing and implementing each of those routines is quite rewarding in itself.
< rcurtin>
it's true, for me I get the same feeling accelerating algorithms in mlpack
< rcurtin>
although I wish I had more time to really dig in like that :)
< ShikharJ>
:)
< rcurtin>
hmm, it looks like the SimpleTransposedConvolutionLayerTest is failing sometimes:
< rcurtin>
I took a look into it a little bit, and it looks like valgrind is throwing some issues when I run the test
< ShikharJ>
Yeah, the output is correct, but when I ran SimpleTransposedConvolutionLayerTest in conjunction with other tests, it sometimes failed on my system too.
< rcurtin>
right, so I tried to run the test alone with different random seeds, but I didn't seem to be able to easily reproduce the issue
< rcurtin>
so I then tried valgrind, but I haven't dug deep enough to see what's really wrong yet
< rcurtin>
I think it may have to do with calling Forward(std::move(input), std::move(output)), then using the 'output' matrix, but I am not 100% sure
< rcurtin>
actually, this leads to another question
< rcurtin>
zoq: for the Forward(), Backward(), and Gradient() functions, do the output matrices need to be rvalue references, or could they just be references?
< rcurtin>
maybe there is a piece I am missing or misunderstanding when I think about it
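For context, the two signature variants being asked about might look like this (a sketch; the exact template parameters and qualifiers of the real layer API are omitted, and this is not claimed to be mlpack's current signature):

```cpp
#include <armadillo>

// Current style: the caller writes Forward(std::move(input), std::move(output));
// both arguments bind as rvalue references.
template<typename eT>
void Forward(arma::Mat<eT>&& input, arma::Mat<eT>&& output);

// Alternative: plain references, so no std::move() is needed and the caller's
// 'output' matrix is unambiguously still usable afterwards.
template<typename eT>
void Forward(const arma::Mat<eT>& input, arma::Mat<eT>& output);
```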
manish7294 has joined #mlpack
< manish7294>
rcurtin: zoq: I tried pushing a blog post but got a permission error. Are we allowed to push yet?
< rcurtin>
ah, hang on, there is one permission I forgot to set
< rcurtin>
manish7294: should be fixed now... try again :)
< manish7294>
rcurtin: Thanks! It worked!
sumedhghaisas has quit [Read error: Connection reset by peer]