Class lsst::meas::modelfit::Optimizer

class Optimizer

A numerical optimizer customized for least-squares problems with Bayesian priors.

The algorithm used by Optimizer combines the Gauss-Newton approach of approximating the second-derivative (Hessian) matrix as the inner product of the Jacobian of the residuals, while maintaining a matrix of corrections to this to account for large residuals, which is updated using a symmetric rank-1 (SR1) secant formula. We assume the prior has analytic first and second derivatives, but use numerical derivatives to compute the Jacobian of the residuals at every step. A trust region approach is used to ensure global convergence.

We consider the function \(f(x)\) we wish to optimize to have two terms, which correspond to negative log likelihood ( \(\chi^2/2=\|r(x)|^2\), where \(r(x)\) is the vector of residuals at \(x\)) and negative log prior \(q(x)=-\ln P(x)\):

\[ f(x) = \frac{1}{2}\|r(x)\|^2 + q(x) \]
At each iteration \(k\), we expand \(f(x)\) in a Taylor series in \(s=x_{k+1}-x_{k}\):
\[ f(x) \approx m(s) = f(x_k) + g_k^T s + \frac{1}{2}s^T H_k s \]
where
\[ g_k \equiv \left.\frac{\partial f}{\partial x}\right|_{x_k} = J_k^T r_k + \nabla q_k;\quad\quad J_k \equiv \left.\frac{\partial r}{\partial x}\right|_{x_k} \]
\[ H_k = J_k^T J_k + \nabla^2 q_k + B_k \]
Here, \(B_k\) is the SR1 approximation term to the second derivative term:
\[ B_k \approx \sum_i \frac{\partial^2 r^{(i)}_k}{\partial x^2}r^{(i)}_k \]
which we initialize to zero and then update with the following formula:
\[ B_{k+1} = B_{k} + \frac{v v^T}{v^T s};\quad\quad v\equiv J^T_{k+1} r_{k+1} - J^T_k r_k \]
Unlike the more common rank-2 BFGS update formula, SR1 updates are not guaranteed to produce a positive definite Hessian. This can result in more accurate approximations of the Hessian (and hence more accurate covariance matrices), but it rules out line-search methods and the simple dog-leg approach to the trust region problem. As a result, we should require fewer steps to converge, but spend more time computing each step; this is ideal when we expect the time spent in function evaluation to dominate the time per step anyway.