Description
I've been following the paper "Differentiating Through a Cone Program" and the code side-by-side, and I'm having trouble figuring out if there is a sign error in the adjoint derivative code or if I've misunderstood something.
Lines 341 to 357 in 83080bc
```python
dw = -(x @ dx + y @ dy + s @ ds)
dz = np.concatenate(
    [dx, D_proj_dual_cone.rmatvec(dy + ds) - ds, np.array([dw])])

if np.allclose(dz, 0):
    r = np.zeros(dz.shape)
elif mode == "dense":
    r = _diffcp._solve_adjoint_derivative_dense(M, MT, dz)
else:
    r = _diffcp.lsqr(MT, dz).solution

values = pi_z[cols] * r[rows + n] - pi_z[n + rows] * r[cols]
dA = sparse.csc_matrix((values, (rows, cols)), shape=A.shape)
db = pi_z[n:n + m] * r[-1] - pi_z[-1] * r[n:n + m]
dc = pi_z[:n] * r[-1] - pi_z[-1] * r[:n]

return dA, db, dc
```
Compared to the paper, the code appears to solve `M.T @ r = dz` for `r`, whereas the paper solves `M.T @ g = -dz` for `g`, so `r = -g`. Yet the equations the code uses to compute `(dA, db, dc)` match those in the paper, when they should all differ by a negative sign.
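To make the sign argument concrete, here is a small NumPy sketch. It assumes random stand-ins for `M`, `dz`, and `pi_z` (these are not diffcp's actual matrices, and the sizes `n` and `m` are arbitrary), and `grads` is a hypothetical helper that just applies the `db` and `dc` expressions from the snippet above. It only illustrates that those expressions are linear in the solution vector, so solving with `dz` instead of `-dz` flips the sign of every output.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4          # arbitrary sizes chosen for illustration
N = n + m + 1

# Random stand-ins (assumptions, not the matrices diffcp builds).
M = rng.standard_normal((N, N))
dz = rng.standard_normal(N)
pi_z = rng.standard_normal(N)

r = np.linalg.solve(M.T, dz)    # system as written in the code
g = np.linalg.solve(M.T, -dz)   # system as written in the paper


def grads(v):
    """Apply the db and dc expressions from the snippet to a solution v."""
    db = pi_z[n:n + m] * v[-1] - pi_z[-1] * v[n:n + m]
    dc = pi_z[:n] * v[-1] - pi_z[-1] * v[:n]
    return db, dc


db_r, dc_r = grads(r)
db_g, dc_g = grads(g)

# The two solutions are negatives of each other, and so are the outputs.
print(np.allclose(r, -g))
print(np.allclose(db_r, -db_g), np.allclose(dc_r, -dc_g))
```

The same reasoning applies to `dA`, since `values` is also linear in `r`.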
Similarly, for the forward-mode derivative, you solve `M @ dz = dQ @ pi_z` for `dz` and use the same equations as in the paper despite the sign difference, but you multiply (dx, dy, dz) by -1 before returning, so this is fine.
Is this a sign error in the adjoint derivative, or did I get something wrong?