Template Class GaussianProcess

Class Documentation

template<typename T>
class GaussianProcess

Stores values of a function sampled on an image and allows you to interpolate the function to unsampled points.

The data will be stored in a KD Tree for easy nearest neighbor searching when interpolating.

The array _function[] will contain the values of the function being interpolated. You can provide a two dimensional array _function[][] if you wish to interpolate a vector of functions. In this case _function[i][j] is the jth function associated with the ith data point. Note: presently, the covariance matrices do not relate elements of _function[i][] to each other, so the variances returned will be identical for all functions evaluated at the same point in parameter space.

_data[i][j] will be the jth component of the ith data point.

_max and _min contain the maximum and minimum values of each dimension in parameter space (if applicable) so that data points can be normalized by _max-_min to keep distances between points reasonable. This is an option specified by calling the relevant constructor.
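As a rough illustration of why the normalization helps, the sketch below computes a Euclidean distance after dividing each coordinate difference by that dimension's span. The helper name normalizedDistance and the use of std::vector are assumptions made for this example; they are not part of GaussianProcess, and the class may apply the normalization differently internally.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of GaussianProcess): Euclidean distance between
// two points after scaling each coordinate difference by that dimension's span.
template <typename T>
T normalizedDistance(std::vector<T> const &a, std::vector<T> const &b,
                     std::vector<T> const &mn, std::vector<T> const &mx) {
    assert(a.size() == b.size() && a.size() == mn.size() && a.size() == mx.size());
    T sum = 0;
    for (std::size_t j = 0; j < a.size(); ++j) {
        T const span = mx[j] - mn[j];  // plays the role of _max[j] - _min[j]
        assert(span > 0);
        T const d = (a[j] - b[j]) / span;
        sum += d * d;
    }
    return std::sqrt(sum);
}
```

With spans of 1000 and 0.001 in two dimensions, the raw distance would be dominated entirely by the first dimension; after scaling, both dimensions contribute on the same footing.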

Public Functions

GaussianProcess(const GaussianProcess&)
GaussianProcess &operator=(const GaussianProcess&)
GaussianProcess(GaussianProcess&&)
GaussianProcess &operator=(GaussianProcess&&)
GaussianProcess(ndarray::Array<T, 2, 2> const &dataIn, ndarray::Array<T, 1, 1> const &ff, std::shared_ptr<Covariogram<T>> const &covarIn)

This is the constructor you call if you do not wish to normalize the positions of your data points and you have only one function.

Parameters
  • [in] dataIn: an ndarray containing the data points; the ith row of dataIn is the ith data point

  • [in] ff: a one-dimensional ndarray containing the values of the scalar function associated with each data point. This is the function you are interpolating

  • [in] covarIn: the input covariogram

GaussianProcess(ndarray::Array<T, 2, 2> const &dataIn, ndarray::Array<T, 1, 1> const &mn, ndarray::Array<T, 1, 1> const &mx, ndarray::Array<T, 1, 1> const &ff, std::shared_ptr<Covariogram<T>> const &covarIn)

This is the constructor you call if you want the positions of your data points normalized by the span of each dimension and you have only one function.

Note: the member variable _useMaxMin will allow the code to remember which constructor you invoked

Parameters
  • [in] dataIn: an ndarray containing the data points; the ith row of dataIn is the ith data point

  • [in] mn: a one-dimensional ndarray containing the minimum values of each dimension (for normalizing the positions of data points)

  • [in] mx: a one-dimensional ndarray containing the maximum values of each dimension (for normalizing the positions of data points)

  • [in] ff: a one-dimensional ndarray containing the values of the scalar function associated with each data point. This is the function you are interpolating

  • [in] covarIn: the input covariogram

GaussianProcess(ndarray::Array<T, 2, 2> const &dataIn, ndarray::Array<T, 2, 2> const &ff, std::shared_ptr<Covariogram<T>> const &covarIn)

This is the constructor to use for a vector of input functions and an unbounded, unnormalized parameter space.

Parameters
  • [in] dataIn: contains the data points, as in other constructors

  • [in] ff: contains the functions. Each row of ff corresponds to a datapoint. Each column corresponds to a function (ff[i][j] is the jth function associated with the ith data point)

  • [in] covarIn: the input covariogram

GaussianProcess(ndarray::Array<T, 2, 2> const &dataIn, ndarray::Array<T, 1, 1> const &mn, ndarray::Array<T, 1, 1> const &mx, ndarray::Array<T, 2, 2> const &ff, std::shared_ptr<Covariogram<T>> const &covarIn)

This is the constructor to use for a vector of input functions with positions normalized using the minima and maxima in parameter space.

Parameters
  • [in] dataIn: contains the data points, as in other constructors

  • [in] mn: contains the minimum allowed values of the parameters in parameter space

  • [in] mx: contains the maximum allowed values of the parameters in parameter space

  • [in] ff: contains the functions. Each row of ff corresponds to a datapoint. Each column corresponds to a function (ff[i][j] is the jth function associated with the ith data point)

  • [in] covarIn: the input covariogram

int getNPoints() const

Return the number of data points stored in the GaussianProcess.

int getDim() const

Return the dimensionality of the data points stored in the GaussianProcess.

void getData(ndarray::Array<T, 2, 2> pts, ndarray::Array<T, 1, 1> fn, ndarray::Array<int, 1, 1> indices) const

Return a sub-sample of the data underlying the Gaussian Process

Parameters
  • [out] pts: will contain the data points from the Gaussian Process

  • [out] fn: will contain the function values from the Gaussian Process

  • [in] indices: is an array of indices indicating the points to return

void getData(ndarray::Array<T, 2, 2> pts, ndarray::Array<T, 2, 2> fn, ndarray::Array<int, 1, 1> indices) const

Return a sub-sample of the data underlying the Gaussian Process (vector-of-functions version)

Parameters
  • [out] pts: will contain the data points from the Gaussian Process

  • [out] fn: will contain the function values from the Gaussian Process

  • [in] indices: is an array of indices indicating the points to return

T interpolate(ndarray::Array<T, 1, 1> variance, ndarray::Array<T, 1, 1> const &vin, int numberOfNeighbors) const

Interpolate the function value at one point using a specified number of nearest neighbors

The method returns the interpolated value of the function.

Parameters
  • [out] variance: a one-dimensional ndarray. The value of the variance predicted by the Gaussian process will be stored in the zeroth element

  • [in] vin: a one-dimensional ndarray representing the point at which you want to interpolate the function

  • [in] numberOfNeighbors: the number of nearest neighbors to be used in the interpolation

Note: if you used a normalized parameter space, you should not normalize vin before inputting. The code will remember that you want a normalized parameter space, and will apply the normalization when you call interpolate
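The nearest-neighbor interpolation described above can be sketched in a self-contained way. This is a textbook Gaussian process estimate under stated assumptions, not the library's implementation: the squared-exponential covariogram with unit length scale, the subtraction of the neighbors' mean, the small diagonal term lambda, the brute-force neighbor search (instead of a KdTree), and the names interpolateSketch, covariogram, and solve are all choices made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

using Point = std::vector<double>;
using Matrix = std::vector<std::vector<double>>;

double sqDist(Point const &a, Point const &b) {
    double d2 = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) d2 += (a[j] - b[j]) * (a[j] - b[j]);
    return d2;
}

// Assumed covariogram: squared exponential with unit length scale.
double covariogram(Point const &a, Point const &b) { return std::exp(-0.5 * sqDist(a, b)); }

// Solve C x = b by Gaussian elimination with partial pivoting (works on copies).
std::vector<double> solve(Matrix C, std::vector<double> b) {
    std::size_t const n = b.size();
    for (std::size_t col = 0; col < n; ++col) {
        std::size_t piv = col;
        for (std::size_t r = col + 1; r < n; ++r)
            if (std::abs(C[r][col]) > std::abs(C[piv][col])) piv = r;
        std::swap(C[col], C[piv]);
        std::swap(b[col], b[piv]);
        for (std::size_t r = col + 1; r < n; ++r) {
            double const f = C[r][col] / C[col][col];
            for (std::size_t c = col; c < n; ++c) C[r][c] -= f * C[col][c];
            b[r] -= f * b[col];
        }
    }
    std::vector<double> x(n);
    for (std::size_t i = n; i-- > 0;) {  // back substitution
        double s = b[i];
        for (std::size_t c = i + 1; c < n; ++c) s -= C[i][c] * x[c];
        x[i] = s / C[i][i];
    }
    return x;
}

// Returns {mean, variance} at `query`, using only the k nearest data points.
std::pair<double, double> interpolateSketch(std::vector<Point> const &data,
                                            std::vector<double> const &fn, Point const &query,
                                            std::size_t k, double lambda = 1e-8) {
    assert(k > 0 && k <= data.size() && data.size() == fn.size());
    // Brute-force neighbor search: move the k closest indices to the front.
    std::vector<std::size_t> idx(data.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::partial_sort(idx.begin(), idx.begin() + static_cast<std::ptrdiff_t>(k), idx.end(),
                      [&](std::size_t a, std::size_t b) {
                          return sqDist(data[a], query) < sqDist(data[b], query);
                      });
    double fbar = 0.0;  // mean of the neighbors' function values
    for (std::size_t a = 0; a < k; ++a) fbar += fn[idx[a]];
    fbar /= static_cast<double>(k);

    Matrix C(k, std::vector<double>(k));
    std::vector<double> kq(k), resid(k);
    for (std::size_t a = 0; a < k; ++a) {
        kq[a] = covariogram(data[idx[a]], query);
        resid[a] = fn[idx[a]] - fbar;
        for (std::size_t b = 0; b < k; ++b)
            C[a][b] = covariogram(data[idx[a]], data[idx[b]]) + (a == b ? lambda : 0.0);
    }
    std::vector<double> const w = solve(C, resid);  // C^{-1} (f - fbar)
    std::vector<double> const v = solve(C, kq);     // C^{-1} k(query)
    double mean = fbar;
    double variance = covariogram(query, query) + lambda;
    for (std::size_t a = 0; a < k; ++a) {
        mean += kq[a] * w[a];
        variance -= kq[a] * v[a];
    }
    return {mean, variance};
}
```

In this sketch, a query that coincides with a data point reproduces that point's value with a variance near zero, while a query far from all data falls back to the neighbors' mean with a variance near the covariogram's maximum.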

void interpolate(ndarray::Array<T, 1, 1> mu, ndarray::Array<T, 1, 1> variance, ndarray::Array<T, 1, 1> const &vin, int numberOfNeighbors) const

This is the version of GaussianProcess::interpolate for a vector of functions.

Note: Because the variance currently only depends on the covariance function and the covariance function currently does not include any terms relating different elements of mu to each other, all of the elements of variance will be identical

Parameters
  • [out] mu: will store the vector of interpolated function values

  • [out] variance: will store the vector of interpolated variances on mu

  • [in] vin: the point at which you wish to interpolate the functions

  • [in] numberOfNeighbors: is the number of nearest neighbor points to use in the interpolation

T selfInterpolate(ndarray::Array<T, 1, 1> variance, int dex, int numberOfNeighbors) const

This method interpolates the function at one of the stored data points, for the purpose of optimizing hyperparameters.

The method returns the interpolated value of the function.

Parameters
  • [out] variance: a one-dimensional ndarray. The value of the variance predicted by the Gaussian process will be stored in the zeroth element

  • [in] dex: the index of the point you wish to self interpolate

  • [in] numberOfNeighbors: the number of nearest neighbors to be used in the interpolation

Exceptions
  • pex::exceptions::RuntimeError: if the nearest neighbor search does not find the data point itself as the nearest neighbor

This method ignores the point on which you are interpolating when requesting nearest neighbors
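The neighbor selection described for selfInterpolate can be illustrated as follows. This is a minimal sketch, assuming a brute-force search in place of the KdTree; the helper name neighborsExcludingSelf is hypothetical, and std::runtime_error stands in for pex::exceptions::RuntimeError.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

using Point = std::vector<double>;

double sqDist(Point const &a, Point const &b) {
    double d2 = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) d2 += (a[j] - b[j]) * (a[j] - b[j]);
    return d2;
}

// Hypothetical helper: indices of the k nearest neighbors of data[dex],
// with the point data[dex] itself left out.
std::vector<std::size_t> neighborsExcludingSelf(std::vector<Point> const &data,
                                                std::size_t dex, std::size_t k) {
    assert(dex < data.size() && k + 1 <= data.size());
    std::vector<std::size_t> idx(data.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    // Ask for k + 1 neighbors, because the closest one should be the point itself.
    std::partial_sort(idx.begin(), idx.begin() + static_cast<std::ptrdiff_t>(k + 1), idx.end(),
                      [&](std::size_t a, std::size_t b) {
                          return sqDist(data[a], data[dex]) < sqDist(data[b], data[dex]);
                      });
    if (idx[0] != dex) {
        // Mirrors the documented failure mode, e.g. when a duplicate point wins the tie.
        throw std::runtime_error("nearest neighbor is not the point itself");
    }
    return std::vector<std::size_t>(idx.begin() + 1,
                                    idx.begin() + static_cast<std::ptrdiff_t>(k + 1));
}
```

The returned indices would then feed the same interpolation as for an arbitrary query point, so the prediction at data[dex] can be compared with the stored function value when tuning hyperparameters.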

void selfInterpolate(ndarray::Array<T, 1, 1> mu, ndarray::Array<T, 1, 1> variance, int dex, int numberOfNeighbors) const

The version of selfInterpolate called for a vector of functions

Parameters
  • [out] mu: this is where the interpolated function values will be stored

  • [out] variance: the variance on mu will be stored here

  • [in] dex: the index of the point you wish to interpolate

  • [in] numberOfNeighbors: the number of nearest neighbors to use in the interpolation

Exceptions
  • pex::exceptions::RuntimeError: if the nearest neighbor search does not find the data point itself as the nearest neighbor

void batchInterpolate(ndarray::Array<T, 1, 1> mu, ndarray::Array<T, 1, 1> variance, ndarray::Array<T, 2, 2> const &queries) const

Interpolate a list of query points using all of the input data (rather than nearest neighbors)

This method will attempt to construct a _npts X _npts covariance matrix C and solve the problem Cx=b. Be wary of using it in the case where _npts is very large.

Parameters
  • [out] mu: a 1-dimensional ndarray where the interpolated function values will be stored

  • [out] variance: a 1-dimensional ndarray where the corresponding variances in the function value will be stored

  • [in] queries: a 2-dimensional ndarray containing the points to be interpolated. queries[i][j] is the jth component of the ith point

This version of the method will also return variances for all of the query points. That is a very time consuming calculation relative to just returning estimates for the function. Consider calling the version of this method that does not calculate variances (below). The difference in time spent is an order of magnitude for 189 data points and 1,000,000 interpolations.

void batchInterpolate(ndarray::Array<T, 1, 1> mu, ndarray::Array<T, 2, 2> const &queries) const

Interpolate a list of points using all of the data. Do not return variances for the interpolation.

This method will attempt to construct a _npts X _npts covariance matrix C and solve the problem Cx=b. Be wary of using it in the case where _npts is very large.

Parameters
  • [out] mu: a 1-dimensional ndarray where the interpolated function values will be stored

  • [in] queries: a 2-dimensional ndarray containing the points to be interpolated. queries[i][j] is the jth component of the ith point

This version of the method does not return variances. It is an order of magnitude faster than the version of the method that does return variances (timing done on a case with 189 data points and 1 million query points).
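One plausible reading of the timing difference between the two scalar batchInterpolate overloads above is that the mean estimates can share a single solve of Cx=b across all query points, whereas each variance needs its own solve against C. The sketch below shows only the shared-solve, mean-only case. It is a textbook formulation under stated assumptions, not the library's code: the squared-exponential covariogram, the mean subtraction, the small diagonal term lambda, and the names batchMeans, covariogram, and solve are all choices made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

using Point = std::vector<double>;
using Matrix = std::vector<std::vector<double>>;

// Assumed covariogram: squared exponential with unit length scale.
double covariogram(Point const &a, Point const &b) {
    double d2 = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) d2 += (a[j] - b[j]) * (a[j] - b[j]);
    return std::exp(-0.5 * d2);
}

// Solve C x = b by Gaussian elimination with partial pivoting (works on copies).
std::vector<double> solve(Matrix C, std::vector<double> b) {
    std::size_t const n = b.size();
    for (std::size_t col = 0; col < n; ++col) {
        std::size_t piv = col;
        for (std::size_t r = col + 1; r < n; ++r)
            if (std::abs(C[r][col]) > std::abs(C[piv][col])) piv = r;
        std::swap(C[col], C[piv]);
        std::swap(b[col], b[piv]);
        for (std::size_t r = col + 1; r < n; ++r) {
            double const f = C[r][col] / C[col][col];
            for (std::size_t c = col; c < n; ++c) C[r][c] -= f * C[col][c];
            b[r] -= f * b[col];
        }
    }
    std::vector<double> x(n);
    for (std::size_t i = n; i-- > 0;) {  // back substitution
        double s = b[i];
        for (std::size_t c = i + 1; c < n; ++c) s -= C[i][c] * x[c];
        x[i] = s / C[i][i];
    }
    return x;
}

// Mean-only batch interpolation using all n data points.
std::vector<double> batchMeans(std::vector<Point> const &data, std::vector<double> const &fn,
                               std::vector<Point> const &queries, double lambda = 1e-8) {
    std::size_t const n = data.size();
    assert(n > 0 && fn.size() == n);
    double const fbar = std::accumulate(fn.begin(), fn.end(), 0.0) / static_cast<double>(n);
    Matrix C(n, std::vector<double>(n));
    std::vector<double> resid(n);
    for (std::size_t i = 0; i < n; ++i) {
        resid[i] = fn[i] - fbar;
        for (std::size_t j = 0; j < n; ++j)
            C[i][j] = covariogram(data[i], data[j]) + (i == j ? lambda : 0.0);
    }
    // The n x n system is solved once and the weights are reused for every query.
    std::vector<double> const w = solve(C, resid);
    std::vector<double> mu;
    mu.reserve(queries.size());
    for (Point const &q : queries) {
        double m = fbar;
        for (std::size_t i = 0; i < n; ++i) m += covariogram(q, data[i]) * w[i];  // O(n) per query
        mu.push_back(m);
    }
    return mu;
}
```

In this formulation each additional query costs only n covariogram evaluations, which is why the expensive part (building and solving the _npts X _npts system) matters mainly when _npts itself is large.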

void batchInterpolate(ndarray::Array<T, 2, 2> mu, ndarray::Array<T, 2, 2> variance, ndarray::Array<T, 2, 2> const &queries) const

This is the version of batchInterpolate (with variances) that is called for a vector of functions.

void batchInterpolate(ndarray::Array<T, 2, 2> mu, ndarray::Array<T, 2, 2> const &queries) const

This is the version of batchInterpolate (without variances) that is called for a vector of functions.

void addPoint(ndarray::Array<T, 1, 1> const &vin, T f)

Add a point to the pool of data used by GaussianProcess for interpolation

Note: excessive use of addPoint and removePoint can result in an unbalanced KdTree, which will slow down nearest neighbor searches

Parameters
  • [in] vin: a one-dimensional ndarray storing the point in parameter space that you are adding

  • [in] f: the value of the function at that point

Exceptions
  • pex::exceptions::RuntimeError: if you call this when you should have called the version taking a vector of functions (below)

  • pex::exceptions::RuntimeError: if the tree does not end up properly constructed (the exception is actually thrown by KdTree<T>::addPoint() )

void addPoint(ndarray::Array<T, 1, 1> const &vin, ndarray::Array<T, 1, 1> const &f)

This is the version of addPoint that is called for a vector of functions

Note: excessive use of addPoint and removePoint can result in an unbalanced KdTree, which will slow down nearest neighbor searches

Exceptions
  • pex::exceptions::RuntimeError: if the tree does not end up properly constructed (the exception is actually thrown by KdTree<T>::addPoint() )

void removePoint(int dex)

This will remove a point from the data set

Note: excessive use of addPoint and removePoint can result in an unbalanced KdTree, which will slow down nearest neighbor searches

Parameters
  • [in] dex: the index of the point you want to remove from your data set

Exceptions
  • pex::exceptions::RuntimeError: if the tree does not end up properly constructed (the exception is actually thrown by KdTree<T>::removePoint() )

void setKrigingParameter(T kk)

Assign a value to the Kriging parameter

Parameters
  • [in] kk: the value to assign to the Kriging parameter

void setCovariogram(std::shared_ptr<Covariogram<T>> const &covar)

Assign a different covariogram to this GaussianProcess

Parameters
  • [in] covar: the Covariogram object that you wish to assign

void setLambda(T lambda)

Set the value of the hyperparameter _lambda

_lambda is a parameter meant to represent the characteristic variance of the function you are interpolating. Currently, it is a scalar such that all data points must have the same characteristic variance. Future iterations of the code may want to promote _lambda to an array so that different data points can have different variances.

Parameters
  • [in] lambda: the value you want assigned to _lambda

GaussianProcessTimer &getTimes() const

Give the user access to _timer, an object keeping track of the time spent on various processes within interpolate.

This will return a GaussianProcessTimer object. The user can, for example, see how much time has been spent on Eigen’s linear algebra package (see the comments on the GaussianProcessTimer class) using code like

    gg = GaussianProcess(...)
    ticktock = gg.getTimes()
    ticktock.display()