File GaussianProcess.h¶
-
namespace
lsst
Class for a simple mapping implementing a generic AstrometryTransform.
Remove all non-astronomical counts from the Chunk Exposure’s pixels.
Forward declarations for lsst::utils::Cache
For details on the Cache class, see the Cache.h file.
It uses a template rather than a pointer so that the derived classes can use the specifics of the transform. The class simplePolyMapping overloads a few routines.
A base class for image defects
Numeric constants used by the Integrate.h integrator routines.
Compute Image Statistics
- Note
Gauss-Kronrod-Patterson quadrature coefficients for use in quadpack routine qng. These coefficients were calculated with 101 decimal digit arithmetic by L. W. Fullerton, Bell Labs, Nov 1981.
- Note
The Statistics class itself can only handle lsst::afw::image::MaskedImage() types. The philosophy has been to handle other types by making them look like lsst::afw::image::MaskedImage() and reusing that code. Users should have no need to instantiate a Statistics object directly, but should use the overloaded makeStatistics() factory functions.
-
namespace
afw
-
namespace
math
-
template<typename
T
>
classCovariogram
- #include <GaussianProcess.h>
The parent class of covariogram functions for use in Gaussian Process interpolation
Each instantiation of a Covariogram will store its own hyper parameters
Subclassed by lsst::afw::math::NeuralNetCovariogram< T >, lsst::afw::math::SquaredExpCovariogram< T >
Public Functions
-
virtual
~Covariogram
()
-
Covariogram
(const Covariogram&)
-
Covariogram &
operator=
(const Covariogram&)
-
Covariogram
(Covariogram&&)
-
Covariogram &
operator=
(Covariogram&&)
-
Covariogram
() construct a Covariogram assigning default values to the hyper parameters
-
virtual T
operator()
(ndarray::Array<const T, 1, 1> const &p1, ndarray::Array<const T, 1, 1> const &p2) const Actually evaluate the covariogram function relating two points you want to interpolate from
- Parameters
[in] p1
: the first point[in] p2
: the second point
-
virtual
-
template<typename
T
>
classGaussianProcess
- #include <GaussianProcess.h>
Stores values of a function sampled on an image and allows you to interpolate the function to unsampled points
The data will be stored in a KD Tree for easy nearest neighbor searching when interpolating.
The array _function[] will contain the values of the function being interpolated. You can provide a two dimensional array _function[][] if you wish to interpolate a vector of functions. In this case _function[i][j] is the jth function associated with the ith data point. Note: presently, the covariance matrices do not relate elements of _function[i][] to each other, so the variances returned will be identical for all functions evaluated at the same point in parameter space.
_data[i][j] will be the jth component of the ith data point.
_max and _min contain the maximum and minimum values of each dimension in parameter space (if applicable) so that data points can be normalized by _max-_min to keep distances between points reasonable. This is an option specified by calling the relevant constructor.
Public Functions
-
GaussianProcess
(const GaussianProcess&)
-
GaussianProcess &
operator=
(const GaussianProcess&)
-
GaussianProcess
(GaussianProcess&&)
-
GaussianProcess &
operator=
(GaussianProcess&&)
This is the constructor you call if you do not wish to normalize the positions of your data points and you have only one function.
- Parameters
[in] dataIn
: an ndarray containing the data points; the ith row of datain is the ith data point[in] ff
: a one-dimensional ndarray containing the values of the scalar function associated with each data point. This is the function you are interpolating[in] covarIn
: is the input covariogram
This is the constructor you call if you want the positions of your data points normalized by the span of each dimension and you have only one function.
Note: the member variable _useMaxMin will allow the code to remember which constructor you invoked
- Parameters
[in] dataIn
: an ndarray containing the data points; the ith row of datain is the ith data point[in] mn
: a one-dimensional ndarray containing the minimum values of each dimension (for normalizing the positions of data points)[in] mx
: a one-dimensional ndarray containing the maximum values of each dimension (for normalizing the positions of data points)[in] ff
: a one-dimensional ndarray containing the values of the scalar function associated with each data point. This is the function you are interpolating[in] covarIn
: is the input covariogram
this is the constructor to use in the case of a vector of input functions and an unbounded/unnormalized parameter space
- Parameters
[in] dataIn
: contains the data points, as in other constructors[in] ff
: contains the functions. Each row of ff corresponds to a datapoint. Each column corresponds to a function (ff[i][j] is the jth function associated with the ith data point)[in] covarIn
: is the input covariogram
this is the constructor to use in the case of a vector of input functions using minima and maxima in parameter space
- Parameters
[in] dataIn
: contains the data points, as in other constructors[in] mn
: contains the minimum allowed values of the parameters in parameter space[in] mx
: contains the maximum allowed values of the parameters in parameter space[in] ff
: contains the functions. Each row of ff corresponds to a datapoint. Each column corresponds to a function (ff[i][j] is the jth function associated with the ith data point)[in] covarIn
: is the input covariogram
-
int
getNPoints
() const return the number of data points stored in the GaussianProcess
-
int
getDim
() const return the dimensionality of data points stored in the GaussianProcess
-
void
getData
(ndarray::Array<T, 2, 2> pts, ndarray::Array<T, 1, 1> fn, ndarray::Array<int, 1, 1> indices) const Return a sub-sample the data underlying the Gaussian Process
- Parameters
[out] pts
: will contain the data points from the Gaussian Process[out] fn
: will contain the function values from the Gaussian Process[in] indices
: is an array of indices indicating the points to return
-
void
getData
(ndarray::Array<T, 2, 2> pts, ndarray::Array<T, 2, 2> fn, ndarray::Array<int, 1, 1> indices) const Return a sub-sample the data underlying the Gaussian Process
- Parameters
[out] pts
: will contain the data points from the Gaussian Process[out] fn
: will contain the function values from the Gaussian Process[in] indices
: is an array of indices indicating the points to return
-
T
interpolate
(ndarray::Array<T, 1, 1> variance, ndarray::Array<T, 1, 1> const &vin, int numberOfNeighbors) const Interpolate the function value at one point using a specified number of nearest neighbors
the interpolated value of the function will be returned at the end of this method
- Parameters
[out] variance
: a one-dimensional ndarray. The value of the variance predicted by the Gaussian process will be stored in the zeroth element[in] vin
: a one-dimensional ndarray representing the point at which you want to interpolate the function[in] numberOfNeighbors
: the number of nearest neighbors to be used in the interpolation
Note: if you used a normalized parameter space, you should not normalize vin before inputting. The code will remember that you want a normalized parameter space, and will apply the normalization when you call interpolate
-
void
interpolate
(ndarray::Array<T, 1, 1> mu, ndarray::Array<T, 1, 1> variance, ndarray::Array<T, 1, 1> const &vin, int numberOfNeighbors) const This is the version of GaussianProcess::interpolate for a vector of functions.
Note: Because the variance currently only depends on the covariance function and the covariance function currently does not include any terms relating different elements of mu to each other, all of the elements of variance will be identical
- Parameters
[out] mu
: will store the vector of interpolated function values[out] variance
: will store the vector of interpolated variances on mu[in] vin
: the point at which you wish to interpolate the functions[in] numberOfNeighbors
: is the number of nearest neighbor points to use in the interpolation
-
T
selfInterpolate
(ndarray::Array<T, 1, 1> variance, int dex, int numberOfNeighbors) const This method will interpolate the function on a data point for purposes of optimizing hyper parameters.
The interpolated value of the function will be returned at the end of this method
- Parameters
[out] variance
: a one-dimensional ndarray. The value of the variance predicted by the Gaussian process will be stored in the zeroth element[in] dex
: the index of the point you wish to self interpolate[in] numberOfNeighbors
: the number of nearest neighbors to be used in the interpolation
- Exceptions
pex::exceptions::RuntimeError
: if the nearest neighbor search does not find the data point itself as the nearest neighbor
This method ignores the point on which you are interpolating when requesting nearest neighbors
-
void
selfInterpolate
(ndarray::Array<T, 1, 1> mu, ndarray::Array<T, 1, 1> variance, int dex, int numberOfNeighbors) const The version of selfInterpolate called for a vector of functions
- Parameters
[out] mu
: this is where the interpolated function values will be stored[out] variance
: the variance on mu will be stored here[in] dex
: the index of the point you wish to interpolate[in] numberOfNeighbors
: the number of nearest neighbors to use in the interpolation
- Exceptions
pex::exceptions::RuntimeError
: if the nearest neighbor search does not find the data point itself as the nearest neighbor
-
void
batchInterpolate
(ndarray::Array<T, 1, 1> mu, ndarray::Array<T, 1, 1> variance, ndarray::Array<T, 2, 2> const &queries) const Interpolate a list of query points using all of the input data (rather than nearest neighbors)
This method will attempt to construct a _npts X _npts covariance matrix C and solve the problem Cx=b. Be wary of using it in the case where _npts is very large.
- Parameters
[out] mu
: a 1-dimensional ndarray where the interpolated function values will be stored[out] variance
: a 1-dimensional ndarray where the corresponding variances in the function value will be stored[in] queries
: a 2-dimensional ndarray containing the points to be interpolated. queries[i][j] is the jth component of the ith point
This version of the method will also return variances for all of the query points. That is a very time consuming calculation relative to just returning estimates for the function. Consider calling the version of this method that does not calculate variances (below). The difference in time spent is an order of magnitude for 189 data points and 1,000,000 interpolations.
-
void
batchInterpolate
(ndarray::Array<T, 1, 1> mu, ndarray::Array<T, 2, 2> const &queries) const Interpolate a list of points using all of the data. Do not return variances for the interpolation.
This method will attempt to construct a _npts X _npts covariance matrix C and solve the problem Cx=b. Be wary of using it in the case where _npts is very large.
- Parameters
[out] mu
: a 1-dimensional ndarray where the interpolated function values will be stored[in] queries
: a 2-dimensional ndarray containing the points to be interpolated. queries[i][j] is the jth component of the ith point
This version of the method does not return variances. It is an order of magnitude faster than the version of the method that does return variances (timing done on a case with 189 data points and 1 million query points).
-
void
batchInterpolate
(ndarray::Array<T, 2, 2> mu, ndarray::Array<T, 2, 2> variance, ndarray::Array<T, 2, 2> const &queries) const This is the version of batchInterpolate (with variances) that is called for a vector of functions.
-
void
batchInterpolate
(ndarray::Array<T, 2, 2> mu, ndarray::Array<T, 2, 2> const &queries) const This is the version of batchInterpolate (without variances) that is called for a vector of functions.
-
void
addPoint
(ndarray::Array<T, 1, 1> const &vin, T f) Add a point to the pool of data used by GaussianProcess for interpolation
Note: excessive use of addPoint and removePoint can result in an unbalanced
KdTree, which will slow down nearest neighbor searches- Parameters
[in] vin
: a one-dimensional ndarray storing the point in parameter space that you are adding[in] f
: the value of the function at that point
- Exceptions
pex::exceptions::RuntimeError
: if you call this when you should have called the version taking a vector of functions (below)pex::exceptions::RuntimeError
: if the tree does not end up properly constructed (the exception is actually thrown by KdTree<T>::addPoint() )
-
void
addPoint
(ndarray::Array<T, 1, 1> const &vin, ndarray::Array<T, 1, 1> const &f) This is the version of addPoint that is called for a vector of functions
Note: excessive use of addPoint and removePoint can result in an unbalanced
KdTree, which will slow down nearest neighbor searches- Exceptions
pex::exceptions::RuntimeError
: if the tree does not end up properly constructed (the exception is actually thrown by KdTree<T>::addPoint() )
-
void
removePoint
(int dex) This will remove a point from the data set
Note: excessive use of addPoint and removePoint can result in an unbalanced
KdTree, which will slow down nearest neighbor searches- Parameters
[in] dex
: the index of the point you want to remove from your data set
- Exceptions
pex::exceptions::RuntimeError
: if the tree does not end up properly constructed (the exception is actually thrown by KdTree<T>::removePoint() )
-
void
setKrigingParameter
(T kk) Assign a value to the Kriging paramter
- Parameters
[in] kk
: the value assigned to the Kriging parameters
Assign a different covariogram to this GaussianProcess
- Parameters
[in] covar
: the Covariogram object that you wish to assign
-
void
setLambda
(T lambda) set the value of the hyperparameter _lambda
_lambda is a parameter meant to represent the characteristic variance of the function you are interpolating. Currently, it is a scalar such that all data points must have the same characteristic variance. Future iterations of the code may want to promote _lambda to an array so that different data points can have different variances.
- Parameters
[in] lambda
: the value you want assigned to _lambda
-
GaussianProcessTimer &
getTimes
() const Give the user acces to _timer, an object keeping track of the time spent on various processes within interpolate.
This will return a GaussianProcessTimer object. The user can, for example, see how much time has been spent on Eigen’s linear algebra package (see the comments on the GaussianProcessTimer class) using code like
gg=GaussianProcess(….)
ticktock=gg.getTimes()
ticktock.display()
Private Members
-
int
_npts
¶
-
int
_useMaxMin
¶
-
int
_dimensions
¶
-
int
_room
¶
-
int
_roomStep
¶
-
int
_nFunctions
¶
-
T
_krigingParameter
¶
-
T
_lambda
¶
-
ndarray::Array<T, 1, 1>
_max
¶
-
ndarray::Array<T, 1, 1>
_min
¶
-
ndarray::Array<T, 2, 2>
_function
¶
-
std::shared_ptr<Covariogram<T>>
_covariogram
¶
-
GaussianProcessTimer
_timer
¶
-
-
class
GaussianProcessTimer
- #include <GaussianProcess.h>
This is a structure for keeping track of how long the interpolation methods spend on different parts of the interpolation
_eigenTime keeps track of how much time is spent using Eigen’s linear algebra packages
_iterationTime keeps track of how much time is spent iterating over matrix indices (this is also a catch-all for time that does not obviously fit in the other categories)
_searchTime keeps track of how much time is spent on nearest neighbor searches (when applicable)
_varianceTime keeps track of how much time is spent calculating the variance of our interpolated function value (note: time spent using Eigen packages for this purpose is tallied here, not in _eigenTime)
_totalTime keeps track of how much time total is spent on interpolations
_interpolationCount keeps track of how many points have been interpolated
Public Functions
-
GaussianProcessTimer
()
-
GaussianProcessTimer
(GaussianProcessTimer const&)
-
GaussianProcessTimer
(GaussianProcessTimer&&)
-
GaussianProcessTimer &
operator=
(GaussianProcessTimer const&)
-
GaussianProcessTimer &
operator=
(GaussianProcessTimer&&)
-
~GaussianProcessTimer
()
-
void
reset
() Resets all of the data members of GaussianProcessTimer to zero.
-
void
start
() Starts the timer for an individual call to an interpolation routine
-
void
addToEigen
() Adds time to _eigenTime
-
void
addToVariance
() Adds time to _varianceTime
-
void
addToSearch
() Adds time to _searchTime
-
void
addToIteration
() Adds time to _iterationTime
-
void
addToTotal
(int i) Adds time to _totalTime and adds counts to _interpolationCount
- Parameters
[in] i
: the number of counts to add to _interpolationCount
-
void
display
() Displays the current values of all times and _interpolationCount
-
-
template<typename
T
>
classKdTree
- #include <GaussianProcess.h>
The data for GaussianProcess is stored in a KD tree to facilitate nearest-neighbor searches
Note: I have removed the ability to arbitrarily specify a distance function. The KD Tree nearest neighbor search algorithm only makes sense in the case of Euclidean distances, so I have forced KdTree to use Euclidean distances.
Public Functions
-
KdTree
(const KdTree&)
-
KdTree &
operator=
(const KdTree&)
-
KdTree
(KdTree&&)
-
KdTree &
operator=
(KdTree&&)
-
KdTree
()
-
void
Initialize
(ndarray::Array<T, 2, 2> const &dt) Build a KD Tree to store the data for GaussianProcess
- Parameters
[in] dt
: an array, the rows of which are the data points (dt[i][j] is the jth component of the ith data point)
- Exceptions
pex::exceptions::RuntimeError
: if the tree is not properly constructed
-
void
findNeighbors
(ndarray::Array<int, 1, 1> neighdex, ndarray::Array<double, 1, 1> dd, ndarray::Array<const T, 1, 1> const &v, int n_nn) const Find the nearest neighbors of a point
neighbors will be returned in ascending order of distance
- Parameters
[out] neighdex
: this is where the indices of the nearest neighbor points will be stored[out] dd
: this is where the distances to the nearest neighbors will be stored[in] v
: the point whose neighbors you want to find[in] n_nn
: the number of nearest neighbors you want to find
note that distance is forced to be the Euclidean distance
-
T
getData
(int ipt, int idim) const Return one element of one node on the tree
- Parameters
[in] ipt
: the index of the node to return[in] idim
: the index of the dimension to return
-
ndarray::Array<T, 1, 1>
getData
(int ipt) const Return an entire node from the tree
I currently have this as a return-by-value method. When I tried it as a return-by-reference, the compiler gave me
- Parameters
[in] ipt
: the index of the node to return
warning: returning reference to local temporary object
Based on my reading of Stack Overflow, this is because ndarray was implicitly creating a new ndarray::Array<T,1,1> object and passing a reference thereto. It is unclear to me whether or not this object would be destroyed once the call to getData was complete.
The code still compiled, ran, and passed the unit tests, but the above behavior seemed to me like it could be dangerous (and, because ndarray was still creating a new object, it did not seem like we were saving any time), so I reverted to return-by-value.
-
void
addPoint
(ndarray::Array<const T, 1, 1> const &v) Add a point to the tree. Allot more space in _tree and data if needed.
- Parameters
[in] v
: the point you are adding to the tree
- Exceptions
pex::exceptions::RuntimeError
: if the branch ending in the new point is not properly constructed
-
void
removePoint
(int dex) Remove a point from the tree. Reorganize what remains so that the tree remains self-consistent
- Parameters
[in] dex
: the index of the point you want to remove from the tree
- Exceptions
pex::exceptions::RuntimeError
: if the entire tree is not poperly constructed after the point has been removed
-
int
getNPoints
() const return the number of data points stored in the tree
-
void
getTreeNode
(ndarray::Array<int, 1, 1> const &v, int dex) const Return the _tree information for a given data point
- Parameters
[out] v
: the array in which to store the entry from _tree[in] dex
: the index of the node whose information you are requesting
Private Functions
-
void
_organize
(ndarray::Array<int, 1, 1> const &use, int ct, int parent, int dir)¶ Find the daughter point of a node in the tree and segregate the points around it
- Parameters
[in] use
: the indices of the data points being considered as possible daughters[in] ct
: the number of possible daughters[in] parent
: the index of the parent whose daughter we are chosing[in] dir
: which side of the parent are we on? dir==1 means that we are on the left side; dir==2 means the right side.
-
int
_findNode
(ndarray::Array<const T, 1, 1> const &v) const¶ Find the point already in the tree that would be the parent of a point not in the tree
- Parameters
[in] v
: the points whose prospective parent you want to find
-
void
_lookForNeighbors
(ndarray::Array<const T, 1, 1> const &v, int consider, int from) const¶ This method actually looks for the neighbors, determining whether or not to descend branches of the tree.
The class
KdTree keeps track of how many neighbors you want and how many neighbors you have found and what their distances from v are in the class member variables _neighborsWanted, _neighborsFound, _neighborCandidates, and _neighborDistances- Parameters
[in] v
: the point whose neighbors you are looking for[in] consider
: the index of the data point you are considering as a possible nearest neighbor[in] from
: the index of the point you last considered as a nearest neighbor (so the search does not backtrack along the tree)
-
int
_testTree
() const¶ Make sure that the tree is properly constructed. Returns 1 of it is. Return zero if not.
-
int
_walkUpTree
(int target, int dir, int root) const¶ A method to make sure that every data point in the tree is in the correct position relative to its parents.
This method returns the value of _masterParent if the branch is correctly contructed. It returns zero otherwise
- Parameters
[in] target
: is the index of the node you are looking at[in] dir
: is the direction (1,2) of the branch you ascended from root[in] root
: is the node you started walking up from
-
void
_count
(int where, int *ct) const¶ A method which counts the number of nodes descended from a given node (used by removePoint(int))
- Parameters
[in] where
: the node you are currently on[inout] *ct
: keeps track of how many nodes you have encountered as you descend the tree
-
void
_descend
(int root)¶ Descend the tree from a node which has been removed, reassigning severed nodes as you go
- Parameters
root
: the index of the node where you are currently
-
void
_reassign
(int target)¶ Reassign nodes to the tree that were severed by a call to removePoint(int)
- Parameters
target
: the node you are reassigning
-
double
_distance
(ndarray::Array<const T, 1, 1> const &p1, ndarray::Array<const T, 1, 1> const &p2) const¶ calculate the Euclidean distance between the points p1 and p2
Private Members
-
ndarray::Array<int, 2, 2>
_tree
¶
-
ndarray::Array<int, 1, 1>
_inn
¶
-
ndarray::Array<T, 2, 2>
_data
¶
-
int
_npts
¶
-
int
_dimensions
¶
-
int
_room
¶
-
int
_roomStep
¶
-
int
_masterParent
¶
-
int
_neighborsFound
¶
-
int
_neighborsWanted
¶
-
ndarray::Array<double, 1, 1>
_neighborDistances
¶
-
ndarray::Array<int, 1, 1>
_neighborCandidates
¶
-
-
template<typename
T
>
classNeuralNetCovariogram
: public lsst::afw::math::Covariogram<T> - #include <GaussianProcess.h>
a Covariogram that recreates a neural network with one hidden layer and infinite units in that layer
Contains two hyper parameters (_sigma0 and _sigma1) that characterize the expected variance of the function being interpolated
see Rasmussen and Williams (2006) http://www.gaussianprocess.org/gpml/ equation 4.29
Public Functions
-
~NeuralNetCovariogram
()
-
NeuralNetCovariogram
()
-
void
setSigma0
(double sigma0) set the _sigma0 hyper parameter
-
void
setSigma1
(double sigma1) set the _sigma1 hyper parameter
-
T
operator()
(ndarray::Array<const T, 1, 1> const &p1, ndarray::Array<const T, 1, 1> const &p2) const Actually evaluate the covariogram function relating two points you want to interpolate from
- Parameters
[in] p1
: the first point[in] p2
: the second point
-
-
template<typename
T
>
classSquaredExpCovariogram
: public lsst::afw::math::Covariogram<T> - #include <GaussianProcess.h>
a Covariogram that falls off as the negative exponent of the square of the distance between the points
Contains one hyper parameter (_ellSquared) encoding the square of the characteristic length scale of the covariogram
Public Functions
-
~SquaredExpCovariogram
()
-
SquaredExpCovariogram
()
-
void
setEllSquared
(double ellSquared) set the _ellSquared hyper parameter (the square of the characteristic length scale of the covariogram)
-
T
operator()
(ndarray::Array<const T, 1, 1> const &p1, ndarray::Array<const T, 1, 1> const &p2) const Actually evaluate the covariogram function relating two points you want to interpolate from
- Parameters
[in] p1
: the first point[in] p2
: the second point
Private Members
-
double
_ellSquared
¶
-
-
template<typename
-
namespace