Sign in adjoint derivative calculation #35

@spenrich

I've been following the paper "Differentiating Through a Cone Program" and the code side-by-side, and I'm having trouble figuring out if there is a sign error in the adjoint derivative code or if I've misunderstood something.

dw = -(x @ dx + y @ dy + s @ ds)
dz = np.concatenate(
    [dx, D_proj_dual_cone.rmatvec(dy + ds) - ds, np.array([dw])])
if np.allclose(dz, 0):
    r = np.zeros(dz.shape)
elif mode == "dense":
    r = _diffcp._solve_adjoint_derivative_dense(M, MT, dz)
else:
    r = _diffcp.lsqr(MT, dz).solution
values = pi_z[cols] * r[rows + n] - pi_z[n + rows] * r[cols]
dA = sparse.csc_matrix((values, (rows, cols)), shape=A.shape)
db = pi_z[n:n + m] * r[-1] - pi_z[-1] * r[n:n + m]
dc = pi_z[:n] * r[-1] - pi_z[-1] * r[:n]
return dA, db, dc

Compared with the paper, the code seems to solve M.T @ r = dz for r, whereas the paper solves M.T @ g = -dz for g, so r = -g. But the equations the code then uses to compute (dA, db, dc) from r seem to match the ones the paper applies to g, when they should all differ by a negative sign.
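To make the concern concrete, here is a minimal sketch of the sign argument. It does not use diffcp: `M`, `dz`, and `pi_z` are random stand-ins (assumptions for illustration only), and `db_from` / `dc_from` are hypothetical helpers that copy the `db` and `dc` expressions from the snippet above. Since those expressions are linear in the solved vector, flipping the sign of the right-hand side should flip the sign of the outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4
N = n + m + 1

# Random stand-ins, NOT the actual quantities built inside diffcp.
M = rng.standard_normal((N, N))
dz = rng.standard_normal(N)
pi_z = rng.standard_normal(N)

# What the code appears to solve: M.T @ r = dz
r = np.linalg.solve(M.T, dz)
# What the paper appears to solve: M.T @ g = -dz
g = np.linalg.solve(M.T, -dz)


def db_from(v):
    # Same expression as `db` in the snippet, applied to a generic vector v.
    return pi_z[n:n + m] * v[-1] - pi_z[-1] * v[n:n + m]


def dc_from(v):
    # Same expression as `dc` in the snippet, applied to a generic vector v.
    return pi_z[:n] * v[-1] - pi_z[-1] * v[:n]


# The two solutions differ by a sign, and so does anything linear in them.
sign_flip_solution = np.allclose(r, -g)
sign_flip_db = np.allclose(db_from(r), -db_from(g))
sign_flip_dc = np.allclose(dc_from(r), -dc_from(g))
print(sign_flip_solution, sign_flip_db, sign_flip_dc)
```

If my reading of the paper is right, this is the situation in the adjoint code: the same formulas applied to r instead of g would give (dA, db, dc) with the opposite sign.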

Similarly, for the forward-mode derivative, you solve M @ dz = dQ @ pi_z for dz and use the same equations as in the paper despite the sign difference. There, however, you multiply (dx, dy, dz) by -1 before returning, so the result is fine.

Is this a sign error in the adjoint derivative, or did I get something wrong?
