TransformDiaSourceCatalogConnections¶
- class lsst.ap.association.TransformDiaSourceCatalogConnections(*, config: PipelineTaskConfig | None = None)¶
Bases: PipelineTaskConnections
Attributes Summary
- allConnections
Mapping holding all connection attributes.
- defaultTemplates
- deprecatedTemplates
- diaSourceCat
Class used for declaring PipelineTask input connections.
- diaSourceSchema
Connection for initInput dataset.
- diaSourceTable
Connection for output dataset.
- diffIm
Class used for declaring PipelineTask input connections.
- dimensions
Set of dimension names that define the unit of work for this task.
- initInputs
Set with the names of all InitInput connection attributes.
- initOutputs
Set with the names of all InitOutput connection attributes.
- inputs
Set with the names of all connectionTypes.Input connection attributes.
- outputs
Set with the names of all Output connection attributes.
- prerequisiteInputs
Set with the names of all PrerequisiteInput connection attributes.
- reliability
Class used for declaring PipelineTask input connections.
Methods Summary
- adjustQuantum(inputs, outputs, label, data_id)
Override to make adjustments to lsst.daf.butler.DatasetRef objects in the lsst.daf.butler.Quantum during the graph generation stage of the activator.
- buildDatasetRefs(quantum)
Build QuantizedConnection corresponding to input Quantum.
- getSpatialBoundsConnections()
Return the names of regular input and output connections whose data IDs should be used to compute the spatial bounds of this task's quanta.
- getTemporalBoundsConnections()
Return the names of regular input and output connections whose data IDs should be used to compute the temporal bounds of this task's quanta.
Attributes Documentation
- allConnections: Mapping[str, BaseConnection] = {'diaSourceCat': Input(name='{fakesType}{coaddName}Diff_diaSrc', storageClass='SourceCatalog', doc='Catalog of DiaSources produced during image differencing.', multiple=False, deprecated=None, _deprecation_context='', dimensions=('instrument', 'visit', 'detector'), isCalibration=False, deferLoad=False, minimum=1, deferGraphConstraint=False), 'diaSourceSchema': InitInput(name='{fakesType}{coaddName}Diff_diaSrc_schema', storageClass='SourceCatalog', doc='Schema for DIASource catalog output by ImageDifference.', multiple=False, deprecated=None, _deprecation_context=''), 'diaSourceTable': Output(name='{fakesType}{coaddName}Diff_diaSrcTable', storageClass='DataFrame', doc='.', multiple=False, deprecated=None, _deprecation_context='', dimensions=('instrument', 'visit', 'detector'), isCalibration=False), 'diffIm': Input(name='{fakesType}{coaddName}Diff_differenceExp', storageClass='ExposureF', doc='Difference image on which the DiaSources were detected.', multiple=False, deprecated=None, _deprecation_context='', dimensions=('instrument', 'visit', 'detector'), isCalibration=False, deferLoad=False, minimum=1, deferGraphConstraint=False), 'reliability': Input(name='{fakesType}{coaddName}RealBogusSources', storageClass='Catalog', doc='Reliability (e.g. real/bogus) classificiation of diaSourceCat sources (optional).', multiple=False, deprecated=None, _deprecation_context='', dimensions=('instrument', 'visit', 'detector'), isCalibration=False, deferLoad=False, minimum=1, deferGraphConstraint=False)}¶
Mapping holding all connection attributes.
This is a read-only view that is automatically updated when connection attributes are added, removed, or replaced in __init__. It is also updated after __init__ completes to reflect changes in inputs, prerequisiteInputs, outputs, initInputs, and initOutputs.
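The connections declared on this class can be inspected without running a pipeline. Below is a minimal sketch, assuming TransformDiaSourceCatalogConfig is the matching PipelineTaskConfig exported by lsst.ap.association and following the standard lsst.pipe.base pattern; the template values and connection names come from the allConnections mapping above.

# Hedged sketch: build the connections from the task config and inspect them.
# TransformDiaSourceCatalogConfig is assumed to be the matching config class.
from lsst.ap.association import (
    TransformDiaSourceCatalogConfig,
    TransformDiaSourceCatalogConnections,
)

config = TransformDiaSourceCatalogConfig()
config.connections.coaddName = "deep"   # fills the {coaddName} name template
config.connections.fakesType = ""       # fills the {fakesType} name template

connections = TransformDiaSourceCatalogConnections(config=config)
print(connections.dimensions)             # {'instrument', 'visit', 'detector'}
print(sorted(connections.inputs))         # regular Input connection names (may vary with config)
print(connections.allConnections.keys())  # every connection, keyed by attribute name
print(connections.diaSourceCat.name)      # e.g. 'deepDiff_diaSrc' after template expansion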
- defaultTemplates = {'coaddName': 'deep', 'fakesType': ''}¶
- deprecatedTemplates = {}¶
- diaSourceCat¶
Class used for declaring PipelineTask input connections.
- Raises:
- TypeError
Raised if minimum is greater than one but multiple=False.
- NotImplementedError
Raised if minimum is zero for a regular Input connection; this is not currently supported by our QuantumGraph generation algorithm.
- Attributes:
- name
str
The default name used to identify the dataset type.
- storageClass
str
The storage class used when (un)persisting the dataset type.
- multiple
bool
Indicates if this connection should expect to contain multiple objects of the given dataset type. Tasks with more than one connection with multiple=True with the same dimensions may want to implement PipelineTaskConnections.adjustQuantum to ensure those datasets are consistent (i.e. zip-iterable) in PipelineTask.runQuantum and notify the execution system as early as possible of outputs that will not be produced because the corresponding input is missing.
- dimensions
iterable of str
The lsst.daf.butler.Butler / lsst.daf.butler.Registry dimensions used to identify the dataset type identified by the specified name.
- deferLoad
bool
Indicates that this dataset type will be loaded as a lsst.daf.butler.DeferredDatasetHandle. PipelineTasks can use this object to load the object at a later time.
- minimum
int
Minimum number of datasets required for this connection, per quantum. This is checked in the base implementation of PipelineTaskConnections.adjustQuantum, which raises NoWorkFound if the minimum is not met for Input connections (causing the quantum to be pruned, skipped, or never created, depending on the context), and FileNotFoundError for PrerequisiteInput connections (causing QuantumGraph generation to fail). PipelineTask implementations may provide custom adjustQuantum implementations for more fine-grained or configuration-driven constraints, as long as they are compatible with this minimum.
- deferGraphConstraint
bool, optional
If True, do not include this dataset type's existence in the initial query that starts the QuantumGraph generation process. This can be used to make QuantumGraph generation faster by avoiding redundant datasets, and in certain cases it can (along with careful attention to which tasks are included in the same QuantumGraph) be used to work around the QuantumGraph generation algorithm's inflexible handling of spatial overlaps. This option has no effect when the connection is not an overall input of the pipeline (or subset thereof) for which a graph is being created, and it never affects the ordering of quanta.
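For reference, a regular Input connection such as diaSourceCat is declared as a class attribute on a PipelineTaskConnections subclass using the parameters documented above. Below is a minimal sketch with values taken from the allConnections mapping; the class name ExampleConnections is illustrative, not part of lsst.ap.association.

# Hedged sketch of declaring a regular Input connection with the documented
# parameters; values are copied from the allConnections mapping above.
import lsst.pipe.base as pipeBase
import lsst.pipe.base.connectionTypes as connTypes


class ExampleConnections(
    pipeBase.PipelineTaskConnections,
    dimensions=("instrument", "visit", "detector"),
    defaultTemplates={"coaddName": "deep", "fakesType": ""},
):
    diaSourceCat = connTypes.Input(
        doc="Catalog of DiaSources produced during image differencing.",
        name="{fakesType}{coaddName}Diff_diaSrc",
        storageClass="SourceCatalog",
        dimensions=("instrument", "visit", "detector"),
        # multiple, deferLoad, minimum, and deferGraphConstraint keep their
        # defaults (False, False, 1, False), matching the values shown above.
    )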
- diaSourceSchema¶
Connection for initInput dataset.
- diaSourceTable¶
Connection for output dataset.
- diffIm¶
Class used for declaring PipelineTask input connections.
- Raises:
- TypeError
Raised if minimum is greater than one but multiple=False.
- NotImplementedError
Raised if minimum is zero for a regular Input connection; this is not currently supported by our QuantumGraph generation algorithm.
- Attributes:
- name
str
The default name used to identify the dataset type.
- storageClass
str
The storage class used when (un)persisting the dataset type.
- multiple
bool
Indicates if this connection should expect to contain multiple objects of the given dataset type. Tasks with more than one connection with multiple=True with the same dimensions may want to implement PipelineTaskConnections.adjustQuantum to ensure those datasets are consistent (i.e. zip-iterable) in PipelineTask.runQuantum and notify the execution system as early as possible of outputs that will not be produced because the corresponding input is missing.
- dimensions
iterable of str
The lsst.daf.butler.Butler / lsst.daf.butler.Registry dimensions used to identify the dataset type identified by the specified name.
- deferLoad
bool
Indicates that this dataset type will be loaded as a lsst.daf.butler.DeferredDatasetHandle. PipelineTasks can use this object to load the object at a later time.
- minimum
int
Minimum number of datasets required for this connection, per quantum. This is checked in the base implementation of PipelineTaskConnections.adjustQuantum, which raises NoWorkFound if the minimum is not met for Input connections (causing the quantum to be pruned, skipped, or never created, depending on the context), and FileNotFoundError for PrerequisiteInput connections (causing QuantumGraph generation to fail). PipelineTask implementations may provide custom adjustQuantum implementations for more fine-grained or configuration-driven constraints, as long as they are compatible with this minimum.
- deferGraphConstraint
bool, optional
If True, do not include this dataset type's existence in the initial query that starts the QuantumGraph generation process. This can be used to make QuantumGraph generation faster by avoiding redundant datasets, and in certain cases it can (along with careful attention to which tasks are included in the same QuantumGraph) be used to work around the QuantumGraph generation algorithm's inflexible handling of spatial overlaps. This option has no effect when the connection is not an overall input of the pipeline (or subset thereof) for which a graph is being created, and it never affects the ordering of quanta.
- dimensions: set[str] = {'detector', 'instrument', 'visit'}¶
Set of dimension names that define the unit of work for this task.
Required and implied dependencies will automatically be expanded later and need not be provided.
This may be replaced or modified in __init__ to change the dimensions of the task. After __init__ it will be a frozenset and may not be replaced.
- initInputs: set[str] = frozenset({'diaSourceSchema'})¶
Set with the names of all InitInput connection attributes.
See inputs for additional information.
- initOutputs: set[str] = frozenset({})¶
Set with the names of all InitOutput connection attributes.
See inputs for additional information.
- inputs: set[str] = frozenset({'diaSourceCat', 'diffIm', 'reliability'})¶
Set with the names of all connectionTypes.Input connection attributes.
This is updated automatically as class attributes are added, removed, or replaced in __init__. Removing entries from this set will cause those connections to be removed after __init__ completes, but this is supported only for backwards compatibility; new code should instead just delete the connection attribute directly. After __init__ this will be a frozenset and may not be replaced.
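As noted above, a connection can be dropped per-configuration by deleting the attribute in __init__. Below is a minimal sketch of that pattern; the doIncludeReliability config field is an assumed, illustrative flag, and the class name is a placeholder.

# Hedged sketch: remove a connection in __init__ based on configuration so it
# no longer appears in self.inputs or self.allConnections for this instance.
import lsst.pipe.base as pipeBase
import lsst.pipe.base.connectionTypes as connTypes


class ExampleConnections(
    pipeBase.PipelineTaskConnections,
    dimensions=("instrument", "visit", "detector"),
    defaultTemplates={"coaddName": "deep", "fakesType": ""},
):
    reliability = connTypes.Input(
        doc="Reliability (e.g. real/bogus) classification of diaSourceCat sources.",
        name="{fakesType}{coaddName}RealBogusSources",
        storageClass="Catalog",
        dimensions=("instrument", "visit", "detector"),
    )

    def __init__(self, *, config=None):
        super().__init__(config=config)
        if not config.doIncludeReliability:  # assumed config field, for illustration
            # Deleting the attribute drops 'reliability' from self.inputs and
            # self.allConnections, as described in the docstring above.
            del self.reliability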
- outputs: set[str] = frozenset({'diaSourceTable'})¶
Set with the names of all Output connection attributes.
See inputs for additional information.
- prerequisiteInputs: set[str] = frozenset({})¶
Set with the names of all PrerequisiteInput connection attributes.
See inputs for additional information.
- reliability¶
Class used for declaring PipelineTask input connections.
- Raises:
- TypeError
Raised if minimum is greater than one but multiple=False.
- NotImplementedError
Raised if minimum is zero for a regular Input connection; this is not currently supported by our QuantumGraph generation algorithm.
- Attributes:
- name
str
The default name used to identify the dataset type.
- storageClass
str
The storage class used when (un)persisting the dataset type.
- multiple
bool
Indicates if this connection should expect to contain multiple objects of the given dataset type. Tasks with more than one connection with multiple=True with the same dimensions may want to implement PipelineTaskConnections.adjustQuantum to ensure those datasets are consistent (i.e. zip-iterable) in PipelineTask.runQuantum and notify the execution system as early as possible of outputs that will not be produced because the corresponding input is missing.
- dimensions
iterable of str
The lsst.daf.butler.Butler / lsst.daf.butler.Registry dimensions used to identify the dataset type identified by the specified name.
- deferLoad
bool
Indicates that this dataset type will be loaded as a lsst.daf.butler.DeferredDatasetHandle. PipelineTasks can use this object to load the object at a later time.
- minimum
int
Minimum number of datasets required for this connection, per quantum. This is checked in the base implementation of PipelineTaskConnections.adjustQuantum, which raises NoWorkFound if the minimum is not met for Input connections (causing the quantum to be pruned, skipped, or never created, depending on the context), and FileNotFoundError for PrerequisiteInput connections (causing QuantumGraph generation to fail). PipelineTask implementations may provide custom adjustQuantum implementations for more fine-grained or configuration-driven constraints, as long as they are compatible with this minimum.
- deferGraphConstraint
bool, optional
If True, do not include this dataset type's existence in the initial query that starts the QuantumGraph generation process. This can be used to make QuantumGraph generation faster by avoiding redundant datasets, and in certain cases it can (along with careful attention to which tasks are included in the same QuantumGraph) be used to work around the QuantumGraph generation algorithm's inflexible handling of spatial overlaps. This option has no effect when the connection is not an overall input of the pipeline (or subset thereof) for which a graph is being created, and it never affects the ordering of quanta.
Methods Documentation
- adjustQuantum(inputs: dict[str, tuple[lsst.pipe.base.connectionTypes.BaseInput, collections.abc.Collection[lsst.daf.butler._dataset_ref.DatasetRef]]], outputs: dict[str, tuple[lsst.pipe.base.connectionTypes.Output, collections.abc.Collection[lsst.daf.butler._dataset_ref.DatasetRef]]], label: str, data_id: DataCoordinate) → tuple[collections.abc.Mapping[str, tuple[lsst.pipe.base.connectionTypes.BaseInput, collections.abc.Collection[lsst.daf.butler._dataset_ref.DatasetRef]]], collections.abc.Mapping[str, tuple[lsst.pipe.base.connectionTypes.Output, collections.abc.Collection[lsst.daf.butler._dataset_ref.DatasetRef]]]]¶
Override to make adjustments to lsst.daf.butler.DatasetRef objects in the lsst.daf.butler.Quantum during the graph generation stage of the activator.
- Parameters:
- inputs
dict
Dictionary whose keys are an input (regular or prerequisite) connection name and whose values are a tuple of the connection instance and a collection of associated DatasetRef objects. The exact type of the nested collections is unspecified; it can be assumed to be multi-pass iterable and support len and in, but it should not be mutated in place. In contrast, the outer dictionaries are guaranteed to be temporary copies that are true dict instances, and hence may be modified and even returned; this is especially useful for delegating to super (see notes below).
- outputs
Mapping
Mapping of output datasets, with the same structure as inputs.
- label
str
Label for this task in the pipeline (should be used in all diagnostic messages).
- data_id
lsst.daf.butler.DataCoordinate
Data ID for this quantum in the pipeline (should be used in all diagnostic messages).
- Returns:
- adjusted_inputs
Mapping
Mapping of the same form as inputs with updated containers of input DatasetRef objects. Connections that are not changed should not be returned at all. Datasets may only be removed, not added. Nested collections may be of any multi-pass iterable type, and the order of iteration will set the order of iteration within PipelineTask.runQuantum.
- adjusted_outputs
Mapping
Mapping of updated output datasets, with the same structure and interpretation as adjusted_inputs.
- Raises:
- ScalarError
Raised if any Input or PrerequisiteInput connection has multiple set to False, but multiple datasets are present.
- NoWorkFound
Raised to indicate that this quantum should not be run; not enough datasets were found for a regular Input connection, and the quantum should be pruned or skipped.
- FileNotFoundError
Raised to cause QuantumGraph generation to fail (with the message included in this exception); not enough datasets were found for a PrerequisiteInput connection.
Notes
The base class implementation performs important checks. It always returns an empty mapping (i.e. makes no adjustments). It should always be called via super by custom implementations, ideally at the end of the custom implementation with already-adjusted mappings when any datasets are actually dropped, e.g.:

def adjustQuantum(self, inputs, outputs, label, data_id):
    # Filter out some dataset refs for one connection.
    connection, old_refs = inputs["my_input"]
    new_refs = [ref for ref in old_refs if ...]
    adjusted_inputs = {"my_input": (connection, new_refs)}
    # Update the original inputs so we can pass them to super.
    inputs.update(adjusted_inputs)
    # Can ignore outputs from super because they are guaranteed
    # to be empty.
    super().adjustQuantum(inputs, outputs, label, data_id)
    # Return only the connections we modified.
    return adjusted_inputs, {}
Removing outputs here is guaranteed to affect what is actually passed to PipelineTask.runQuantum, but its effect on the larger graph may be deferred to execution, depending on the context in which adjustQuantum is being run: if one quantum removes an output that is needed by a second quantum as input, the second quantum may not be adjusted (and hence pruned or skipped) until that output is actually found to be missing at execution time.
Tasks that desire zip-iteration consistency between any combinations of connections that have the same data ID should generally implement adjustQuantum to achieve this, even if they could also run that logic during execution; this allows the system to see outputs that will not be produced because the corresponding input is missing as early as possible.
- buildDatasetRefs(quantum: Quantum) → tuple[lsst.pipe.base.connections.InputQuantizedConnection, lsst.pipe.base.connections.OutputQuantizedConnection]¶
Build QuantizedConnection corresponding to input Quantum.
- Parameters:
- quantum
lsst.daf.butler.Quantum
Quantum object which defines the inputs and outputs for a given unit of processing.
- Returns:
- retVal
tuple of (InputQuantizedConnection, OutputQuantizedConnection)
Namespaces mapping attribute names (identifiers of connections) to butler references defined in the input lsst.daf.butler.Quantum.
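These QuantizedConnection namespaces are what the execution framework passes to a task's runQuantum as inputRefs and outputRefs. Below is a minimal sketch of how a PipelineTask typically consumes them; the run method and its arguments are illustrative, not specific to TransformDiaSourceCatalogTask.

# Hedged sketch: the framework calls buildDatasetRefs on the connections and
# hands the resulting namespaces to runQuantum, where they are resolved
# through the butler quantum context.
def runQuantum(self, butlerQC, inputRefs, outputRefs):
    # inputRefs.diaSourceCat, inputRefs.diffIm, ... are DatasetRefs keyed by
    # the connection attribute names declared on the connections class.
    inputs = butlerQC.get(inputRefs)   # dict of loaded in-memory datasets
    outputs = self.run(**inputs)       # Struct with attributes matching output connections
    butlerQC.put(outputs, outputRefs)  # write the datasets named in outputRefs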
- getSpatialBoundsConnections() → Iterable[str]¶
Return the names of regular input and output connections whose data IDs should be used to compute the spatial bounds of this task’s quanta.
The spatial bound for a quantum is defined as the union of the regions of all data IDs of all connections returned here, along with the region of the quantum data ID (if the task has spatial dimensions).
- Returns:
- connection_names
collections.abc.Iterable [str]
Names of connections with spatial dimensions. These are the task-internal connection names, not butler dataset type names.
Notes
The spatial bound is used to search for prerequisite inputs that have skypix dimensions. The default implementation returns an empty iterable, which is usually sufficient for tasks with spatial dimensions, but if a task’s inputs or outputs are associated with spatial regions that extend beyond the quantum data ID’s region, this method may need to be overridden to expand the set of prerequisite inputs found.
Tasks that do not have spatial dimensions but do have skypix prerequisite inputs should always override this method, as the default spatial bounds otherwise cover the full sky.
- getTemporalBoundsConnections() → Iterable[str]¶
Return the names of regular input and output connections whose data IDs should be used to compute the temporal bounds of this task’s quanta.
The temporal bound for a quantum is defined as the union of the timespans of all data IDs of all connections returned here, along with the timespan of the quantum data ID (if the task has temporal dimensions).
- Returns:
- connection_names
collections.abc.Iterable [str]
Names of connections with temporal dimensions. These are the task-internal connection names, not butler dataset type names.
Notes
The temporal bound is used to search for prerequisite inputs that are calibration datasets. The default implementation returns an empty iterable, which is usually sufficient for tasks with temporal dimensions, but if a task’s inputs or outputs are associated with timespans that extend beyond the quantum data ID’s timespan, this method may need to be overridden to expand the set of prerequisite inputs found.
Tasks that do not have temporal dimensions and do not implement this method will use an infinite timespan for any calibration lookups.
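Below is a minimal sketch of overriding both hooks in a connections subclass to widen the bounds used for prerequisite-input lookups. The class name and the calexps connection are illustrative placeholders, not connections declared on TransformDiaSourceCatalogConnections.

# Hedged sketch: use a regular input connection's data IDs to grow the spatial
# and temporal bounds used when searching for prerequisite inputs.
import lsst.pipe.base as pipeBase
import lsst.pipe.base.connectionTypes as connTypes


class ExampleBoundsConnections(pipeBase.PipelineTaskConnections,
                               dimensions=("instrument", "visit")):
    calexps = connTypes.Input(
        doc="Per-detector calibrated exposures contributing to this visit.",
        name="calexp",
        storageClass="ExposureF",
        dimensions=("instrument", "visit", "detector"),
        multiple=True,
    )

    def getSpatialBoundsConnections(self):
        # Grow the spatial bound to the union of the detector regions rather
        # than relying only on the visit-level quantum data ID.
        return ("calexps",)

    def getTemporalBoundsConnections(self):
        # Use the same connection's timespans for calibration lookups.
        return ("calexps",)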