lsst-pipe-base v28.0.0 (2024-11-21)
New Features
Added support for initializing processing output runs with just a pipeline graph, not a quantum graph.
This also moves much of the logic for initializing output runs from lsst.ctrl.mpexec.PreExecInit to PipelineGraph and QuantumGraph methods. (DM-38041)
Added functionality to aggregate multiple QuantumProvenanceGraph.Summary objects into one Summary for a holistic report.
While the QuantumProvenanceGraph was designed to resolve processing over dataquery-identified groups, QuantumProvenanceGraph.aggregate is designed to combine multiple group-level reports into one which totals the successes, issues, and failures over the same section of pipeline. (DM-41605)
Created a QuantumProvenanceGraph class, which details the status of every quantum and dataset over multiple attempts at executing graphs, noting when quanta have been recovered.
The class steps through all the quantum graphs associated with certain tasks or processing steps. For each graph/attempt, the status of each quantum and dataset is recorded in QuantumProvenanceGraph.add_new_graph, and outcomes of quanta over multiple runs are resolved in QuantumProvenanceGraph.resolve_duplicates. At the end of this process, we can combine all attempts into a summary. This serves to answer the question “What happened to this data ID?” in a holistic sense. (DM-41711)
Included the number of expected instances in the pipetask report task-level summary for the QuantumGraphExecutionReport. (DM-44368)
Added mocking support for tasks that write regular datasets with config, log, or metadata storage classes. (DM-44583)
Added new show_dot functionality.
Refactored the existing pipeline graph show function, implementing a new parse_display_args function to handle the parsing of the --dot argument. The show_dot function is then implemented to display the pipeline graph as a dot file. A notable user-visible change is that output dataset types with common dimensions and storage classes will now be grouped together in dot files. This change was implemented in order to save space for otherwise very large dot files. (DM-44647)
Removed the prohibition on optional regular (i.e., non-prerequisite) input connections.
Optional inputs can be declared by passing minimum=0 in the connection definition. (DM-45457)
Storage class conversions of component dataset types are now supported in pipelines. (DM-46064)
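The aggregation described for QuantumProvenanceGraph.aggregate can be illustrated with a small, library-free sketch. The GroupSummary class, its tasks field, and the outcome names below are hypothetical stand-ins, not the real Summary API; the sketch only shows the idea of totaling per-group counts for the same task labels into one report.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class GroupSummary:
    """Hypothetical stand-in for one group-level report (not the real Summary class)."""

    # Maps task label -> Counter of outcome name -> number of quanta.
    tasks: dict = field(default_factory=dict)


def aggregate(summaries):
    """Combine group-level reports by summing outcome counts per task label."""
    combined = GroupSummary()
    for summary in summaries:
        for label, counts in summary.tasks.items():
            # A fresh Counter per label keeps the input summaries unmodified.
            combined.tasks.setdefault(label, Counter()).update(counts)
    return combined


# Two groups that ran overlapping sections of a pipeline (made-up numbers).
group_a = GroupSummary({"isr": Counter(successful=10, failed=1)})
group_b = GroupSummary(
    {"isr": Counter(successful=7), "calibrate": Counter(successful=5, blocked=2)}
)
total = aggregate([group_a, group_b])
```

Summing with Counter keeps outcomes that appear in only one group, so a task that failed in a single group still shows up in the combined report.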
API Changes
Relocated the “dot tools” from lsst.ctrl.mpexec.dotTools to lsst.pipe.base.dot_tools unchanged. (DM-45701)
Bug Fixes
pipetask report now appends every failed quantum to a list and returns that list. The previous version of the human-readable report only reported the first failed quantum by exiting the loop upon finding it. (DM-44091)
Fixed support for task metadata as inputs in the PipelineTask mocking system. (DM-45536)
Explanatory logs for “initial data ID query returned no rows” now appear as a single log message instead of one entry per line. This improves display in log aggregators, but there is no change to console behavior. (DM-45722)
Other Changes and Additions
Added pipe.base.utils.RegionTimeInfo, a container for serializing pairs of sky region and timespan. It’s intended for several specific applications when running the AP pipeline. (DM-43020)
Added an optional parameter to PipelineStepTester that lets configs be tweaked before testing. This is needed for AP pipelines, whose APDB config cannot be defaulted, and is not intended for wide adoption. (DM-43960)
Explanatory logs for “initial data ID query returned no rows” are now reported at ERROR, not CRITICAL, level. (DM-45722)
Added a DEBUG-level log message into _pipeline_graph.py to signify which task is being run. (DM-46351)
An API Removal or Deprecation
Removed deprecated code scheduled to be removed after v27:
Removed lsst.pipe.base.graphBuilder.
Removed lsst.pipe.base.pipeTools.
Removed lsst.pipe.base.BaseConnection.makeDatasetType.
Removed Pipeline.toExpandedPipeline (replaced by to_graph).
Removed PipelineDatasetTypes and TaskDatasetTypes.
Removed QuantumGraphBuilderError.
APIs no longer accept TaskDef. (DM-40443)
lsst-pipe-base 27.0.0 (2024-05-29)
New Features
Added a manifest checker which walks an executed quantum graph to generate a summary report containing information about produced dataset types, missing data, and failures. (DM-37163)
Updated the open-source license to allow for the code to be distributed with either GPLv3 or BSD 3-clause license. (DM-37231)
Rewrote quantum graph generation.
The new algorithm is much faster, more extensible, and easier to maintain (especially when storage-class conversions are present in a pipeline). It also allows PipelineTasks to raise NoWorkFound or otherwise restrict their outputs during quantum-graph generation and immediately affect the downstream graph. (DM-38498)
Added a new subpackage, lsst.pipe.base.pipeline_graph, for text-art visualization of pipeline graphs. (DM-39779)
Added an option to the interface for creating subsets of whole pipelines which allows control over how named subsets within the pipeline are modified when labels are missing from the new subsetted pipeline. The previous behavior remains the default: any named subset that contains a task label with no corresponding task in the subsetted pipeline is dropped. The new option instead edits each such named subset to remove the missing label, but otherwise leaves the subset in the new subsetted pipeline. The interface has been modified in Pipeline and also the lower-level PipelineIR, though the latter should rarely be used directly. The new argument is implemented as an enum option, and can be most easily accessed from the Pipeline class as Pipeline.PipelineSubsetCtrl.(DROP/EDIT). This interface is available through YAML pipeline specification by specifying the labeledSubsetModifyMode key when writing YAML import directives.
New Python interfaces were added for manipulating labeled subsets in a pipeline. These include: Pipeline.subsets, which is a property returning a dict of subset labels to sets of task labels; Pipeline.addLabeledSubset, to add a new labeled subset to a Pipeline; and Pipeline.removeLabeledSubset, to remove a labeled subset from a pipeline. (DM-41203)
Added QuantumGraph summary. (DM-41542)
Added human-readable option to report summary dictionaries. (DM-41606)
Added a section to pipelines which allows the explicit declaration of which subsets correspond to steps and the dimensions the step’s quanta can be sharded with. (DM-41650)
The butler transfer-from-graph command now supports a --dry-run option to allow the transfer to run without updating the target butler. (DM-42306)
Added TaskMetadata.get_dict and set_dict methods.
These provide a consistent way to assign and extract nested dictionaries from TaskMetadata, lsst.daf.base.PropertySet, and lsst.daf.base.PropertyList. (DM-42928)
Added CachingLimitedButler as a new type of LimitedButler.
A CachingLimitedButler caches on both .put() and .get(), and holds a single instance of the most recently used dataset type for that put/get.
The dataset types which will be cached on put/get are controlled via the cache_on_put and cache_on_get attributes, respectively.
By default, copies of the cached items are returned on get, so that code is free to operate on data in-place. A no_copy_on_cache attribute also exists to tell the CachingLimitedButler not to return copies when it is known that the calling code can be trusted not to change values, e.g., when passing calibs to isrTask. (DM-43060)
QuantumGraph generation now saves software stack versions in the graph’s metadata. (DM-43225)
Added support for testing transient error recovery logic to the PipelineTask mock system. (DM-43484)
Added deferBinding attribute to Input connection, which allows us to have an input connection with the same dataset type as an output. (DM-43572)
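The caching behavior described for CachingLimitedButler (one cached object per dataset type, copies returned by default, opt-out for trusted callers) can be sketched without the LSST stack. ToyCachingStore, its constructor arguments, and the backend_reads counter are hypothetical and only mimic the idea; this is not the real class or its signature.

```python
import copy


class ToyCachingStore:
    """Hypothetical illustration of cache-on-get; not the real CachingLimitedButler."""

    def __init__(self, backend, cache_on_get=(), no_copy_on_cache=()):
        self._backend = backend  # mapping of (dataset_type, data_id) -> object
        self._cache_on_get = set(cache_on_get)
        self._no_copy = set(no_copy_on_cache)
        self._cache = {}  # dataset_type -> (data_id, object); one entry per type
        self.backend_reads = 0  # counts reads that missed the cache

    def get(self, dataset_type, data_id):
        cached = self._cache.get(dataset_type)
        if cached is not None and cached[0] == data_id:
            obj = cached[1]
        else:
            obj = self._backend[(dataset_type, data_id)]
            self.backend_reads += 1
            if dataset_type not in self._cache_on_get:
                return obj  # uncached types are handed back as-is
            # Only the most recently used object of each type is kept.
            self._cache[dataset_type] = (data_id, obj)
        if dataset_type in self._no_copy:
            return obj  # caller is trusted not to modify the object
        return copy.deepcopy(obj)  # default: protect the cached object


backend = {("bias", 1): [1, 2, 3], ("bias", 2): [4, 5]}
store = ToyCachingStore(backend, cache_on_get={"bias"})
first = store.get("bias", 1)
first.append(99)  # in-place change affects only the returned copy
second = store.get("bias", 1)  # served from the cache, still unmodified
```

Returning copies by default trades some memory and time for safety; the opt-out exists for read-only consumers where that cost is not worth paying.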
API Changes
Deprecated various interfaces that have been obsoleted by PipelineGraph.
The most prominent deprecations are:
the Pipeline.toExpandedPipeline method, as well as iteration and task-label indexing for Pipeline;
the PipelineDatasetTypes and TaskDatasetTypes classes;
the old GraphBuilder interface for building QuantumGraph objects. (DM-40441)
Modified the Instrument constructors to be class methods rather than static methods. This means that when you call Subclass.from_string() the returned instrument class is checked to make sure it is a subclass of Subclass and not just a subclass of Instrument. (DM-42636)
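The practical effect of making the Instrument constructors class methods can be shown with a toy class hierarchy. Everything below is a simplified assumption: the name-based registry, the single-argument from_string, and the Camera and Spectrograph subclasses are invented for illustration and do not reproduce the real Instrument API.

```python
class Instrument:
    """Toy base class; the name registry is a hypothetical stand-in."""

    _registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Instrument._registry[cls.__name__] = cls

    @classmethod
    def from_string(cls, name):
        found = Instrument._registry[name]
        # As a class method, ``cls`` is the class the call was made on, so the
        # result can be checked against it. A static method has no ``cls`` and
        # could only check against the base class.
        if not issubclass(found, cls):
            raise TypeError(f"{name} is not a subclass of {cls.__name__}")
        return found()


class Camera(Instrument):
    pass


class Spectrograph(Instrument):
    pass


camera = Camera.from_string("Camera")  # accepted: Camera is a Camera
anything = Instrument.from_string("Spectrograph")  # accepted via the base class

try:
    Camera.from_string("Spectrograph")  # rejected: wrong branch of the hierarchy
    mismatch_rejected = False
except TypeError:
    mismatch_rejected = True
```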
Bug Fixes
Fixed bug in pipeline mocking triggered by declaring a config as an input connection. (DM-41191)
Fixed bug in QuantumGraph generation triggered by an adjustQuantum that modifies input edges when prerequisite input edges are present on that quantum. (DM-41486)
Fixed bug in meta class compatibility between Python versions for DatasetQueryConstraints. (DM-41853)
Fixed bug in DatasetTypeExecutionReport in which extra steps led to miscategorization. The “outputs” section of pipetask report should be correct now. (DM-41898)
Fixed a QG generation bug involving unusual combinations of dimensions and calibration datasets. (DM-42301)
Fixed an incorrect count of previously-successful quanta in QuantumGraphBuilder logging. (DM-42737)
Fixed component-dataset query bug in execution reports. (DM-42954)
Replaced failing QuantumGraph packages equality check with a weaker test. (DM-43538)
Propagated subsetCtrl into subset_from_labels within the subsetFromLabels pipeline method. (DM-44341)
Other Changes and Additions
An API Removal or Deprecation
Removed topLevelOnly parameter from TaskMetadata.names().
Removed the saveMetadata configuration from PipelineTask.
Removed lsst.pipe.base.cmdLineTask.profile (use lsst.utils.timer.profile instead).
Removed ButlerQuantumContext class. Use QuantumContext instead.
Removed recontitutedDimensions parameter from QuantumNode.from_simple(). (DM-40150)
lsst-pipe-base v26.0.0 (2023-09-22)
New Features
Added system for obtaining data ID packer objects from the combination of an Instrument class and configuration. (DM-31924)
Added a PipelineGraph class that represents a Pipeline with all configuration overrides applied as a graph. (DM-33027)
Added new command butler transfer-from-graph to transfer results of execution with Quantum-backed butler. (DM-33497)
buildExecutionButler method now supports input graph with all dataset references resolved. (DM-37582)
Added convenience methods to the Python API for Pipelines. These methods allow merging pipelines, adding labels to / removing labels from subsets, and finding subsets containing a specified label. (DM-37655)
An Instrument can now specify the dataset type definition that it would like to use for raw data. This can be done by setting the raw_definition class property to a tuple of the dataset type name, the dimensions to use for this dataset type, and the storage class name. (DM-37950)
Modified InMemoryDatasetHandle to allow it to be constructed with keyword arguments that will be converted to the relevant DataId. (DM-38091)
Modified InMemoryDatasetHandle to allow it to be configured to always deep copy the Python object on get(). (DM-38694)
Revived bit-rotted support for “mocked” PipelineTask execution and moved it here (from ctrl_mpexec). (DM-38952)
Formalized support for modifying connections in PipelineTaskConnections.__init__ implementations.
Connections can now be added, removed, or replaced with normal attribute syntax. Removing entries from e.g. self.inputs in __init__ still works for backwards compatibility, but deleting attributes is generally preferred. The task dimensions can also be replaced or modified in place in __init__. (DM-38953)
Added a method on PipelineTaskConfig objects named applyConfigOverrides. This method is called by the system executing PipelineTasks within a pipeline, and is passed the instrument and config overrides defined within the pipeline for that task. (DM-39100)
Added Instrument.make_default_dimension_packer to restore simple access to the default data ID packer for an instrument. (DM-39453)
The back-end to quantum graph loading has been optimized so that duplicate objects are no longer created in memory; shared references are used instead. This results in a large decrease in memory usage and a decrease in load times. (DM-39582)
A new class ExecutionResources has been created to record the number of cores and memory that has been allocated for the execution of a quantum. QuantumContext (newly renamed from ButlerQuantumContext) now has a resources property that can be queried by a task in runQuantum. This can be used to tell the task that it can use multiple cores or possibly should make a more efficient use of the available memory resources. (DM-39661)
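How a task might act on allocated resources can be sketched with a stand-in record. ToyResources, its num_cores and max_mem_bytes fields, and the plan_chunks helper are all hypothetical names chosen for this example; they are not the ExecutionResources interface, which should be checked in the package documentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyResources:
    """Hypothetical stand-in for an allocated-resources record."""

    num_cores: int = 1
    max_mem_bytes: Optional[int] = None  # None means "no known memory limit"


def plan_chunks(n_items, item_bytes, resources):
    """Pick a worker count and per-worker batch size that fit the allocation."""
    workers = max(1, min(resources.num_cores, n_items))
    even_share = -(-n_items // workers)  # ceiling division
    if resources.max_mem_bytes is None:
        return workers, even_share
    # Split the memory budget evenly and cap the batch size accordingly.
    per_worker_bytes = resources.max_mem_bytes // workers
    batch = max(1, min(even_share, per_worker_bytes // item_bytes))
    return workers, batch


unlimited = plan_chunks(100, 10, ToyResources(num_cores=4))
limited = plan_chunks(100, 10, ToyResources(num_cores=4, max_mem_bytes=400))
few_items = plan_chunks(3, 10, ToyResources(num_cores=8))
```

In this sketch the memory limit only shrinks the batch size; a real task could instead reduce the number of workers, depending on which resource is scarcer.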
Made it possible to deprecate PipelineTask connections. (DM-39902)
Parameters defined in a Pipeline can now be used within a config Python block as well as within config files loaded by a Pipeline. (DM-40198)
When looking up prerequisite inputs with skypix data IDs (e.g., reference catalogs) for a quantum whose data ID is not spatial, use the union of the spatial regions of the input and output datasets as a constraint.
This keeps global sequence-point tasks from being given all such datasets in the input collections. (DM-40243)
Added support for init-input/output datasets in PipelineTask mocking. (DM-40381)
API Changes
Several changes to API to add support for QuantumBackedButler:
Added a globalInitOutputRefs method to the QuantumGraph class which returns global per-graph output dataset references (e.g. for “packages” dataset type).
ButlerQuantumContext can work with either Butler or LimitedButler. Its __init__ method should not be used directly; instead, one of the two new class methods should be used: from_full or from_limited.
The ButlerQuantumContext.registry attribute was removed, and ButlerQuantumContext.dimensions has been added to hold the DimensionUniverse.
The abstract method TaskFactory.makeTask was updated and simplified to accept TaskDef and LimitedButler. (DM-33497)
ButlerQuantumContext was updated to only need a LimitedButler.
Factory methods from_full and from_limited were dropped; a constructor accepting a LimitedButler instance is now used to make instances. (DM-37704)
Added method QuantumGraph.updateRun. This new method updates run collection name and dataset IDs for all output and intermediate datasets in a graph, allowing the graph to be reused.
GraphBuilder.makeGraph method dropped the resolveRefs argument; the builder now always makes resolved references. The run argument is now required to be a non-empty string. (DM-38780)
Bug Fixes
Fixed a bug that led to valid storage class conversions being rejected when using execution butler. (DM-38614)
Fixed a bug related to checking component datasets in execution butler creation, introduced in DM-38614. (DM-38888)
Fixed handling of storage classes in QuantumGraph generation.
This could lead to a failure downstream in execution butler creation, and would likely have led to problems with Quantum-Backed Butler usage as well. (DM-39198)
Fixed a bug in QuantumGraph generation that could result in datasets from skip_existing_in collections being used as outputs, and another that prevented QuantumGraph generation when a skip_existing_in collection has some outputs from a failed quantum. (DM-39672)
Fixed a bug in quantum graph builder which resulted in missing datastore records for calibration datasets. This bug was causing failures for pipetask execution with quantum-backed butler. (DM-40254)
Ensured QuantumGraphs are built with datastore records for init-input datasets that might have been produced by another task in the pipeline, but will not be because all quanta for that task were skipped due to existing outputs. (DM-40381)
QuantumGraph.updateRun() method was fixed to update dataset ID in references which have their run collection changed. (DM-40392)
Other Changes and Additions
Modified the calling signature for the Task constructor such that only the config parameter can be positional. All other parameters must now be keyword parameters. (DM-15325)
The Struct class is now a subclass of SimpleNamespace. (DM-36649)
The DuplicateOutputError logger now produces a more helpful error message. (DM-38234)
Execution butler creation has been changed to use the DatasetRefs from the graph rather than creating new registry entries from the dataIDs. This is possible now that the graph is always created with resolved refs and ensures that provenance is consistent between the graph and the outputs.
This change to execution butler required that ButlerQuantumContext.put() no longer unresolves the graph DatasetRef (otherwise there would be a dataset ID mismatch). This results in the dataset always using the output run defined in the graph even if the Butler was created with a different default run. (DM-38779)
Stopped sorting Pipeline elements on read.
Ordering specified in pipeline files is now preserved instead. (DM-38953)
Loosened documentation of QuantumGraph.inputQuanta and outputQuanta. They are not guaranteed to be (and currently are not) lists, so the new documentation describes them as iterables.
Documented universe constructor parameter to QuantumGraph.
Brought QuantumGraph property docs in line with DM standards.
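The Task constructor and Struct changes listed in this section follow standard Python patterns, sketched here with toy classes. ToyTask and ToyStruct are hypothetical and much simpler than the real Task and Struct; the sketch only shows a keyword-only signature (the bare * in the parameter list) and attribute access inherited from SimpleNamespace.

```python
from types import SimpleNamespace


class ToyStruct(SimpleNamespace):
    """Toy result container; attribute access comes from SimpleNamespace."""


class ToyTask:
    """Toy task: only ``config`` may be passed positionally."""

    def __init__(self, config=None, *, name=None, log=None):
        # Everything after the bare ``*`` must be given by keyword.
        self.config = config
        self.name = name
        self.log = log

    def run(self, value):
        return ToyStruct(doubled=2 * value)


task = ToyTask({"scale": 2}, name="demo")  # config positional, the rest by keyword
result = task.run(21)

try:
    ToyTask({"scale": 2}, "demo")  # a second positional argument is rejected
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

Keyword-only parameters make call sites self-describing and let a signature grow new options without breaking existing positional calls.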
An API Removal or Deprecation
Removed deprecated kwargs parameter from in-memory equivalent dataset handle.
Removed deprecated pipe_base timer module (it was moved to utils).
Removed the warning from deprecated PipelineIR._read_imports and replaced with a raise.
Removed the warning from deprecated Pipeline._parse_file_specifier and replaced with a raise.
Removed deprecated methods from TaskMetadata. (DM-37534)
The PipelineTaskConfig.saveMetadata field is now deprecated and will be removed after v26. Its value is ignored and task metadata is always saved.
The ResourceConfig class has been removed; it was never used. (DM-39377)
Deprecated the reconstituteDimensions argument from QuantumNode.from_simple. (DM-39582)
ButlerQuantumContext has been renamed to QuantumContext. This reflects the additional functionality it now has. (DM-39661)
Removed support for reading quantum graphs in pickle format. (DM-40032)
lsst-pipe-base v25.0.0 (2023-02-28)
This is the first release without any support for the Generation 2 middleware.
New Features
Added PipelineStepTester class, to enable testing that multi-step pipelines are able to run without error. (DM-33779)
QuantumGraph now saves the DimensionUniverse it was created with when it is persisted. This removes the need to explicitly pass the DimensionUniverse when loading a saved graph. (DM-35082)
Added support for transferring files into execution butler. (DM-35494)
A new class InMemoryDatasetHandle is now available. This class provides a variant of lsst.daf.butler.DeferredDatasetHandle that does not require a butler. It lets you store in-memory objects in something that looks like a deferred handle, so they can be passed to Task.run() methods that expect to be able to do deferred loading. (DM-35741)
Added unit test to cover the new getNumberOfQuantaForTask method.
Added graph interface, getNumberOfQuantaForTask, to determine the number of quanta associated with a given taskDef.
Modified getQuantaForTask to support showing additional quanta information in the logger. (DM-36145)
Allow PipelineTasks to provide defaults for the --dataset-query-constraints option for the pipetask tool. (DM-37786)
API Changes
ButlerQuantumContext.get method can accept None as a reference and returns None as a result object. (DM-35752)
GraphBuilder.makeGraph method adds bind parameter for bind values to use with the user expression. (DM-36487)
InMemoryDatasetHandle now supports storage class conversion on get(). (DM-4551)
Bug Fixes
lsst.pipe.base.testUtils.makeQuantum no longer crashes if given a connection that is set to a dataset component. (DM-35721)
Ensure QuantumGraphs are given a DimensionUniverse at construction.
This fixes a mostly-spurious dimension universe inconsistency warning when reading QuantumGraphs, introduced on DM-35082. (DM-35681)
Fixed an error message that says that repository state has changed during QuantumGraph generation when init input datasets are just missing. (DM-37786)
Other Changes and Additions
Make diagnostic logging for empty QuantumGraphs harder to ignore.
Log messages have been upgraded from WARNING to FATAL, and an exception traceback that tends to hide them has been removed. (DM-36360)
An API Removal or Deprecation
Removed the Task.getSchemaCatalogs and Task.getAllSchemaCatalogs APIs. These were used by CmdLineTask but are no longer used in the current middleware. (DM-2850)
Relocated lsst.pipe.base.cmdLineTask.profile to lsst.utils.timer.profile. This was relocated as part of the Gen2 removal that includes the removal of CmdLineTask. (DM-35697)
The ArgumentParser, CmdLineTask, and TaskRunner classes have been removed, along with the associated Gen2 documentation.
The PipelineIR.from_file() method has been removed.
The getTaskLogger function has been removed. (DM-35917)
Replaced CmdLineTask and ArgumentParser with non-functioning stubs, disabling all Gen2 functionality. A deprecation message is now issued but the classes do nothing. (DM-35675)
lsst-pipe-base v24.0.0 (2022-08-26)
New Features
Add the ability for user control over dataset constraints in QuantumGraph creation. (DM-31769)
Builds using setuptools now calculate versions from the Git repository, including the use of alpha releases for those associated with weekly tags. (DM-32408)
Improve diagnostics for empty QuantumGraph. (DM-32459)
A new class has been written for handling Task metadata. lsst.pipe.base.TaskMetadata will in future become the default metadata class for Task, replacing lsst.daf.base.PropertySet. The new metadata class is not yet enabled by default. (DM-32682)
Add TaskMetadata.to_dict() method (this is now used by the lsst.daf.base.PropertySet.from_mapping() method and triggered by the Butler if type conversion is needed).
Use the existing metadata storage class definition if one already exists in a repository.
Switch Task to use TaskMetadata for storing task metadata, rather than lsst.daf.base.PropertySet. This removes a C++ dependency from the middleware. (DM-33155)
Added lsst.pipe.base.Instrument to represent an instrument in Butler registry.
Added butler register-instrument command (relocated from obs_base).
Bug Fixes
Fixed a bug where imported pipeline parameters were taking preference over “top-level” preferences. (DM-32080)
Other Changes and Additions
If a PipelineTask has connections that have a different storage class for a dataset type than the one defined in registry, this will now be allowed if the storage classes are compatible. The Task run() method will be given the Python type it expects and can return the Python type it has declared it returns. The Butler will do the type conversion automatically. (DM-33303)
Topological sorting of pipelines on write has been disabled; the order in which the pipeline tasks were read/added is preserved instead. This makes it unnecessary to import all tasks referenced by the pipeline in order to write it. (DM-34155)
lsst-pipe-base v23.0.1 (2022-02-02)
Miscellaneous Changes of Minor Interest
Execution butler creation time has been reduced significantly by avoiding unnecessary checks for existence of files in the datastore. (DM-33345)
lsst-pipe-base v23.0.0 (2021-12-10)
New Features
Added a new facility for creating “lightweight” (execution) butlers that pre-fills a local SQLite registry. This can allow a pipeline to be executed without talking to the main registry. (DM-28646)
Allow PipelineTasks inputs and outputs to be optional under certain conditions, so tasks with no work to do can be skipped without blocking downstream tasks from running. (DM-30649)
Log diagnostic information when QuantumGraphs are empty because the initial query yielded no results.
At present, these diagnostics only cover missing input datasets, which is a common way to get an empty QuantumGraph, but not the only way. (DM-31583)
API Changes
GraphBuilder constructor boolean argument skipExisting is replaced with skipExistingIn which accepts collections to check for existing quantum outputs. (DM-27492)
Other Changes and Additions
The logger associated with Task is now derived from a Python logging.Logger and not lsst.log.Log. This logger includes a new verbose() log method as an intermediate between INFO and DEBUG. (DM-30301)
Added metadata to QuantumGraphs. This changed the on disk save format, but is backwards compatible with graphs saved with previous versions of the QuantumGraph code. (DM-30702)
All Doxygen documentation has been removed and replaced by Sphinx. (DM-23330)
New documentation on writing pipelines has been added. (DM-27416)
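The verbose() log method mentioned in this section, sitting between INFO and DEBUG, can be reproduced with the standard logging module alone. The sketch below is an assumption-laden illustration rather than the LSST logger itself: the VerboseLogger and ListHandler classes are hypothetical, and the numeric level is assumed to be halfway between DEBUG and INFO.

```python
import logging

# Assumed numeric value: halfway between DEBUG (10) and INFO (20).
VERBOSE = (logging.INFO + logging.DEBUG) // 2
logging.addLevelName(VERBOSE, "VERBOSE")


class VerboseLogger(logging.Logger):
    """Logger subclass with an extra level between INFO and DEBUG."""

    def verbose(self, msg, *args, **kwargs):
        self.log(VERBOSE, msg, *args, **kwargs)


class ListHandler(logging.Handler):
    """Collects records in memory so the example needs no console output."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


log = VerboseLogger("demo", level=VERBOSE)
handler = ListHandler()
log.addHandler(handler)

log.info("shown: INFO is above the threshold")
log.verbose("shown: VERBOSE equals the threshold")
log.debug("hidden: DEBUG is below the threshold")

seen = [record.levelname for record in handler.records]
```

Setting the threshold to VERBOSE lets a user ask for more detail than INFO without enabling the full volume of DEBUG output.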
lsst-pipe-base v22.0 (2021-04-01)
New Features
Add ways to test a PipelineTask’s init inputs/outputs [DM-23156]
Pipelines can now support URIs [DM-28036]
Graph files can now be loaded and saved via URIs [DM-27682]
A new format for saving graphs has been developed (with a .qgraph extension). This format supports the ability to read a subset of a graph from an object store. [DM-27784]
Graph building with a pipeline that specifies an instrument no longer needs an explicit instrument to be given. [DM-27985]
A parameters section has been added to pipeline definitions. [DM-27633]