File optimizer.h¶
-
namespace
lsst
Class for a simple mapping implementing a generic AstrometryTransform.
Remove all non-astronomical counts from the Chunk Exposure’s pixels.
Forward declarations for lsst::utils::Cache
For details on the Cache class, see the Cache.h file.
It uses a template rather than a pointer so that the derived classes can use the specifics of the transform. The class simplePolyMapping overloads a few routines.
A base class for image defects
Numeric constants used by the Integrate.h integrator routines.
Compute Image Statistics
- Note
Gauss-Kronrod-Patterson quadrature coefficients for use in quadpack routine qng. These coefficients were calculated with 101 decimal digit arithmetic by L. W. Fullerton, Bell Labs, Nov 1981.
- Note
The Statistics class itself can only handle lsst::afw::image::MaskedImage() types. The philosophy has been to handle other types by making them look like lsst::afw::image::MaskedImage() and reusing that code. Users should have no need to instantiate a Statistics object directly, but should use the overloaded makeStatistics() factory functions.
-
namespace
meas
-
namespace
modelfit
Functions
-
void
solveTrustRegion
(ndarray::Array<Scalar, 1, 1> const &x, ndarray::Array<Scalar const, 2, 1> const &F, ndarray::Array<Scalar const, 1, 1> const &g, double r, double tolerance)¶ Solve a symmetric quadratic matrix equation with a ball constraint.
This computes a near-exact solution to the “trust region subproblem” necessary in trust-region-based nonlinear optimizers:
\[ \min_x{\quad g^T x + \frac{1}{2}x^T F x}\quad\quad\quad \text{s.t.} ||x|| \le r \]The tolerance parameter sets how close to \(r\) we require the norm of the solution to be when it lies on the constraint, as a fraction of \(r\) itself.
This implementation is based on the algorithm described in Section 4.3 of “Nonlinear Optimization” by Nocedal and Wright.
-
class
Optimizer
- #include <optimizer.h>
A numerical optimizer customized for least-squares problems with Bayesian priors.
The algorithm used by Optimizer combines the Gauss-Newton approach of approximating the second-derivative (Hessian) matrix as the inner product of the Jacobian of the residuals, while maintaining a matrix of corrections to this to account for large residuals, which is updated using a symmetric rank-1 (SR1) secant formula. We assume the prior has analytic first and second derivatives, but use numerical derivatives to compute the Jacobian of the residuals at every step. A trust region approach is used to ensure global convergence.
We consider the function \(f(x)\) we wish to optimize to have two terms, which correspond to negative log likelihood ( \(\chi^2/2=\|r(x)|^2\), where \(r(x)\) is the vector of residuals at \(x\)) and negative log prior \(q(x)=-\ln P(x)\):
\[ f(x) = \frac{1}{2}\|r(x)\|^2 + q(x) \]At each iteration \(k\), we expand \(f(x)\) in a Taylor series in \(s=x_{k+1}-x_{k}\):\[ f(x) \approx m(s) = f(x_k) + g_k^T s + \frac{1}{2}s^T H_k s \]where\[ g_k \equiv \left.\frac{\partial f}{\partial x}\right|_{x_k} = J_k^T r_k + \nabla q_k;\quad\quad J_k \equiv \left.\frac{\partial r}{\partial x}\right|_{x_k} \]\[ H_k = J_k^T J_k + \nabla^2 q_k + B_k \]Here, \(B_k\) is the SR1 approximation term to the second derivative term:\[ B_k \approx \sum_i \frac{\partial^2 r^{(i)}_k}{\partial x^2}r^{(i)}_k \]which we initialize to zero and then update with the following formula:\[ B_{k+1} = B_{k} + \frac{v v^T}{v^T s};\quad\quad v\equiv J^T_{k+1} r_{k+1} - J^T_k r_k \]Unlike the more common rank-2 BFGS update formula, SR1 updates are not guaranteed to produce a positive definite Hessian. This can result in more accurate approximations of the Hessian (and hence more accurate covariance matrices), but it rules out line-search methods and the simple dog-leg approach to the trust region problem. As a result, we should require fewer steps to converge, but spend more time computing each step; this is ideal when we expect the time spent in function evaluation to dominate the time per step anyway.Public Types
-
enum
StateFlags
¶ Values:
-
CONVERGED_GRADZERO
= 0x0001¶
-
CONVERGED_TR_SMALL
= 0x0002¶
-
CONVERGED
= CONVERGED_GRADZERO | CONVERGED_TR_SMALL¶
-
FAILED_MAX_INNER_ITERATIONS
= 0x0010¶
-
FAILED_MAX_OUTER_ITERATIONS
= 0x0020¶
-
FAILED_MAX_ITERATIONS
= 0x0030¶
-
FAILED_EXCEPTION
= 0x0040¶
-
FAILED_NAN
= 0x0080¶
-
STATUS_STEP_REJECTED
= 0x0100¶
-
STATUS_STEP_ACCEPTED
= 0x0200¶
-
STATUS_STEP
= STATUS_STEP_REJECTED | STATUS_STEP_ACCEPTED¶
-
STATUS_TR_UNCHANGED
= 0x1000¶
-
STATUS_TR_DECREASED
= 0x2000¶
-
STATUS_TR_INCREASED
= 0x4000¶
-
STATUS_TR
= STATUS_TR_UNCHANGED | STATUS_TR_DECREASED | STATUS_TR_INCREASED¶
-
STATUS
= STATUS_STEP | STATUS_TR¶
-
-
typedef OptimizerObjective
Objective
-
typedef OptimizerControl
Control
-
typedef OptimizerHistoryRecorder
HistoryRecorder
Public Functions
-
lsst::meas::modelfit::Optimizer::Optimizer(PTR( Objective const) objective, ndarray::Array< Scalar const, 1, 1 > const & parameters, Control const & ctrl)
-
PTR
(Objective const) const
-
Control const &
getControl
() const
-
bool
step
()
-
bool
step
(HistoryRecorder const &recorder, afw::table::BaseCatalog &history)
-
int
run
()
-
int
run
(HistoryRecorder const &recorder, afw::table::BaseCatalog &history)
-
int
getState
() const
-
Scalar
getObjectiveValue
() const
-
ndarray::Array<Scalar const, 1, 1>
getParameters
() const
-
ndarray::Array<Scalar const, 1, 1>
getResiduals
() const
-
ndarray::Array<Scalar const, 1, 1>
getGradient
() const
-
ndarray::Array<Scalar const, 2, 2>
getHessian
() const
-
void
removeSR1Term
() Remove the symmetric-rank-1 secant term from the Hessian, making it just (J^T J)
Private Functions
-
bool
_stepImpl
(int outerIterCount, HistoryRecorder const *recorder = NULL, afw::table::BaseCatalog *history = NULL)¶
-
int
_runImpl
(HistoryRecorder const *recorder = NULL, afw::table::BaseCatalog *history = NULL)¶
-
void
_computeDerivatives
()¶
Friends
-
friend
lsst::meas::modelfit::OptimizerHistoryRecorder
-
struct
IterationData
¶
-
enum
-
class
OptimizerControl
- #include <optimizer.h>
Configuration object for Optimizer.
Many of these configuration options pertain to how the trust region is updated. It’s easiest to understand these together rather than separately. At each iteration, a quadratic model of the objective function is formed. We can use this model to predict how we expect the objective function to behave over a step, and compare it to how the actual objective function behaves. To do this, we’ll use the ratio of the actual reduction in the objective function to the predicted reduction in the objective function, and call this \(\rho\). Then,
the step is accepted, and the parameters updated, when \(\rho >\)
stepAcceptThreshold
.if \(\rho > \)
trustRegionGrowReductionRatio
and the length of the step is greater thantrustRegionGrowStepFraction
times the current trust region radius, the trust region radius will be multiplied bytrustRegionGrowFactor
.if
trustRegionShrinkMinReductionRatio
\(< \rho < \)trustRegionShrinkMaxReductionRatio
, the trust region radius will be multiplied bytrustRegionShrinkFactor
.
Public Functions
-
lsst::meas::modelfit::OptimizerControl::LSST_CONTROL_FIELD(noSR1Term, bool, "If true, ignore the SR1 update term in the Hessian, resulting in a Levenberg-Marquardt-like method")
-
lsst::meas::modelfit::OptimizerControl::LSST_CONTROL_FIELD(skipSR1UpdateThreshold, double, "Skip the SR1 update if |v||s| / (|v||s|) is less than this threshold")
-
lsst::meas::modelfit::OptimizerControl::LSST_CONTROL_FIELD(minTrustRadiusThreshold, double, "If the trust radius falls below this threshold, consider the algorithm converged")
-
lsst::meas::modelfit::OptimizerControl::LSST_CONTROL_FIELD(gradientThreshold, double, "If the maximum of the gradient falls below this threshold, consider the algorithm converged")
-
lsst::meas::modelfit::OptimizerControl::LSST_CONTROL_FIELD(numDiffRelStep, double, "relative step size used for numerical derivatives (added to other steps)")
-
lsst::meas::modelfit::OptimizerControl::LSST_CONTROL_FIELD(numDiffAbsStep, double, "absolute step size used for numerical derivatives (added to other steps)")
-
lsst::meas::modelfit::OptimizerControl::LSST_CONTROL_FIELD(numDiffTrustRadiusStep, double, "step size (in units of trust radius) used for numerical derivatives (added to relative step)")
-
lsst::meas::modelfit::OptimizerControl::LSST_CONTROL_FIELD(stepAcceptThreshold, double, "steps with reduction ratio greater than this are accepted")
-
lsst::meas::modelfit::OptimizerControl::LSST_CONTROL_FIELD(trustRegionInitialSize, double, "the initial trust region will be set to this value")
-
lsst::meas::modelfit::OptimizerControl::LSST_CONTROL_FIELD(trustRegionGrowReductionRatio, double, "steps with reduction radio greater than this may increase the trust radius")
-
lsst::meas::modelfit::OptimizerControl::LSST_CONTROL_FIELD(trustRegionGrowStepFraction, double, "steps with length this fraction of the trust radius may increase the trust radius")
-
lsst::meas::modelfit::OptimizerControl::LSST_CONTROL_FIELD(trustRegionGrowFactor, double, "when increase the trust region size, multiply the radius by this factor")
-
lsst::meas::modelfit::OptimizerControl::LSST_CONTROL_FIELD(trustRegionShrinkReductionRatio, double, "steps with reduction radio less than this will decrease the trust radius")
-
lsst::meas::modelfit::OptimizerControl::LSST_CONTROL_FIELD(trustRegionShrinkFactor, double, "when reducing the trust region size, multiply the radius by this factor")
-
lsst::meas::modelfit::OptimizerControl::LSST_CONTROL_FIELD(trustRegionSolverTolerance, double, "value passed as the tolerance to solveTrustRegion")
-
lsst::meas::modelfit::OptimizerControl::LSST_CONTROL_FIELD(maxInnerIterations, int, "maximum number of iterations (i.e. function evaluations and trust region subproblems) per step")
-
lsst::meas::modelfit::OptimizerControl::LSST_CONTROL_FIELD(maxOuterIterations, int, "maximum number of steps")
-
lsst::meas::modelfit::OptimizerControl::LSST_CONTROL_FIELD(doSaveIterations, bool, "whether to save all iterations for debugging purposes")
-
OptimizerControl
()
-
class
OptimizerHistoryRecorder
Public Functions
-
lsst::meas::modelfit::OptimizerHistoryRecorder::OptimizerHistoryRecorder(afw::table::Schema & schema, PTR( Model ) model, bool doRecordDerivatives)
-
void
apply
(int outerIterCount, int innerIterCount, afw::table::BaseCatalog &history, Optimizer const &optimizer) const
-
void
unpackDerivatives
(ndarray::Array<Scalar const, 1, 1> const &nested, Vector &gradient, Matrix &hessian) const
-
void
unpackDerivatives
(afw::table::BaseRecord const &record, Vector &gradient, Matrix &hessian) const
-
void
unpackDerivatives
(ndarray::Array<Scalar const, 1, 1> const &nested, ndarray::Array<Scalar, 1, 1> const &gradient, ndarray::Array<Scalar, 2, 2> const &hessian) const
-
void
unpackDerivatives
(afw::table::BaseRecord const &record, ndarray::Array<Scalar, 1, 1> const &gradient, ndarray::Array<Scalar, 2, 2> const &hessian) const
-
void
fillObjectiveModelGrid
(afw::table::BaseRecord const &record, ndarray::Array<Scalar const, 2, 1> const ¶meters, ndarray::Array<Scalar, 1, 1> const &output) const
-
-
class
OptimizerObjective
- #include <optimizer.h>
Base class for objective functions for Optimizer.
Public Functions
-
static
PTR
(Prior)
-
OptimizerObjective
(int dataSize_, int parameterSize_) Base class constructor; must be called by all subclasses.
-
void
fillObjectiveValueGrid
(ndarray::Array<Scalar const, 2, 1> const ¶meters, ndarray::Array<Scalar, 1, 1> const &output) const Evaluate the Objective on a 1-d grid.
This delegates to computeResiduals, and hence need not be reimplemented by derived classes. It is intended mostly for diagnostic purposes; when investigating the behavior of the optimizer, it is frequently valuable to plot its path against the gridded objective function.
Frequently, the arguments to this function will be flattened views into higher dimensional arrays, allowing it to be used to construct N-d grids despite taking only a 1-d sequence of grid points.
- Parameters
[in] parameters
: An array with shape (gridSize, parameterSize), specifying the parameter vector at points on the grid.[out] output
: Output array for objective values with shape (gridSize). Must be allocated, but need not be initialized.
-
virtual void
computeResiduals
(ndarray::Array<Scalar const, 1, 1> const ¶meters, ndarray::Array<Scalar, 1, 1> const &residuals) const = 0 Evaluate the residuals of the model for a given parameter vector.
- Parameters
[in] parameters
: An array of parameters with shape (parameterSize).[out] residuals
: Output array that will contain (model - data) on return. Must be allocated to shape (dataSize), but need not be initialized.
-
virtual bool
differentiateResiduals
(ndarray::Array<Scalar const, 1, 1> const ¶meters, ndarray::Array<Scalar, 2, -2> const &derivatives) const Evaluate analytic derivatives of the model or signal that they are not available.
Subclasses that provide analytic derivatives should implement this method and return true. Subclasses that do not should return false to indicate to the optimizer that numeric derivatives must be used instead. Subclasses that can only sometimes compute analytic derivatives are not supported.
- Parameters
[in] parameters
: An array of parameters with shape (parameterSize).[out] derivatives
: Output array that will contain d(model - data)/d(parameters) on return. Must be allocated to shape (dataSize, parameterSize), but need not be initialized.
-
virtual bool
hasPrior
() const Return true if the Objective has a Bayesian prior as well as a likelihood.
The default implementation returns false.
-
virtual Scalar
computePrior
(ndarray::Array<Scalar const, 1, 1> const ¶meters) const Compute the value of the Bayesian prior for the given parameter vector.
The default implementation simply returns 1.0 (appropriate for an unnormalized constant prior, which is mathematically equivalent to no prior).
-
virtual void
differentiatePrior
(ndarray::Array<Scalar const, 1, 1> const ¶meters, ndarray::Array<Scalar, 1, 1> const &gradient, ndarray::Array<Scalar, 2, 1> const &hessian) const Compute the first and second derivatives of the Bayesian prior with respect to the parameters.
The default implementation simply sets the output arrays to 0.0 (appropriate for an constant prior, which is mathematically equivalent to no prior).
- Parameters
[in] parameters
: An array of parameters with shape (parameterSize).[out] gradient
: First derivative of the prior with respect to the parameters. Must be allocated to shape (parameterSize), but need not be initialized.[out] hessian
: Second derivative of the prior with respect to the parameters. Must be allocated to shape (parameterSize, parameterSize), but need not be initialized. This is a symmetric matrix, and only the lower triangle is used.
-
virtual
~OptimizerObjective
()
Public Members
-
int const
dataSize
-
int const
parameterSize
Public Static Functions
-
static
PTR
(OptimizerObjective) Return a concrete Objective object built from a Likelihood and Prior.
Most fitting problems that can be formulated in terms of (multi-shapelet) Models, Likelihoods, and Priors can just use this Objective. The returned Objective relies on numerical derivatives, however, so simple problems where analytic derivatives are easy to implement may merit a custom OptimizerObjective.
-
static
-
void
-
namespace