QuantumBackedButler¶
- class lsst.daf.butler.QuantumBackedButler(predicted_inputs: Iterable[UUID], predicted_outputs: Iterable[UUID], dimensions: DimensionUniverse, datastore: Datastore, storageClasses: StorageClassFactory, dataset_types: Mapping[str, DatasetType] | None = None)¶
- Bases: LimitedButler
- An implementation of LimitedButler intended to back execution of a single Quantum.
- Parameters:
- predicted_inputs : Iterable[DatasetId]
- Dataset IDs for datasets that can be read from this butler.
- predicted_outputs : Iterable[DatasetId]
- Dataset IDs for datasets that can be stored in this butler.
- dimensions : DimensionUniverse
- Object managing all dimension definitions.
- datastore : Datastore
- Datastore to use for all dataset I/O and existence checks.
- storageClasses : StorageClassFactory
- Object managing all storage class definitions.
- dataset_types : Mapping[str, DatasetType], optional
- Mapping of the dataset type name to its registry definition.
 
- Notes
- Most callers should use the initialize classmethod to construct new instances instead of calling the constructor directly.
- QuantumBackedButler uses a SQLite database internally, in order to reuse existing DatastoreRegistryBridge and OpaqueTableStorage implementations that rely on SQLAlchemy. If implementations are added in the future that don't rely on SQLAlchemy, it should be possible to swap them in by overriding the type arguments to initialize (though at present, QuantumBackedButler would still create at least an in-memory SQLite database that would then go unused).
- We imagine QuantumBackedButler being used during (at least) batch execution to capture Datastore records and save them to per-quantum files, which are also a convenient place to store provenance for eventual upload to a SQL-backed Registry (once Registry has tables to store provenance, that is). These per-quantum files can be written in two ways:
- The SQLite file used internally by QuantumBackedButler can be used directly by customizing the filename argument to initialize, and then transferring that file to the object store after execution completes (or fails; a try/finally pattern probably makes sense here).
- A JSON or YAML file can be written by calling extract_provenance_data and using pydantic methods to write the returned QuantumProvenanceData to a file, as in the sketch below.
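A minimal sketch of the JSON option, assuming an existing QuantumBackedButler instance named qbb and pydantic v1-style serialization (the .json() method); the filename is a placeholder:

```python
# Sketch: dump provenance and datastore records to a per-quantum JSON file.
provenance = qbb.extract_provenance_data()
with open("quantum_provenance.json", "w") as stream:
    # pydantic v1-style serialization; pydantic v2 would use model_dump_json().
    stream.write(provenance.json())
```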
- Note that at present, the SQLite file only contains datastore records, not provenance, but that should be easy to address (if desired) after we actually design a Registry schema for provenance. I also suspect that we'll want to explicitly close the SQLite file somehow before trying to transfer it. But I'm guessing we'd prefer to write the per-quantum files as JSON anyway.
- Attributes Summary
- GENERATION: This is a Generation 3 Butler.
- dimensions: Structure managing all dimensions recognized by this data repository (DimensionUniverse).
- Methods Summary
- datasetExistsDirect(ref): Return True if a dataset is actually present in the Datastore.
- extract_provenance_data(): Extract provenance information and datastore records from this butler.
- from_predicted(config, predicted_inputs, ...): Construct a new QuantumBackedButler from sets of input and output dataset IDs.
- get(ref, /, *[, parameters, storageClass]): Retrieve a stored dataset.
- getDeferred(ref, /, *[, parameters, ...]): Create a DeferredDatasetHandle which can later retrieve a dataset, after an immediate registry lookup.
- getDirect(ref, *[, parameters, storageClass]): Deprecated since version v26.0.
- getDirectDeferred(ref, *[, parameters, ...]): Deprecated since version v26.0.
- initialize(config, quantum, dimensions[, ...]): Construct a new QuantumBackedButler from repository configuration and helper types.
- markInputUnused(ref): Indicate that a predicted input was not actually used when processing a Quantum.
- pruneDatasets(refs, *[, disassociate, ...]): Remove one or more datasets from a collection and/or storage.
- put(obj, ref, /): Store a dataset that already has a UUID and RUN collection.
- putDirect(obj, ref, /): Store a dataset that already has a UUID and RUN collection.
- Attributes Documentation
- GENERATION: ClassVar[int] = 3¶
- This is a Generation 3 Butler. - This attribute may be removed in the future, once the Generation 2 Butler interface has been fully retired; it should only be used in transitional code. 
- dimensions¶
- Structure managing all dimensions recognized by this data repository (DimensionUniverse).
- Methods Documentation
- datasetExistsDirect(ref: DatasetRef) bool¶
- Return True if a dataset is actually present in the Datastore.
- Parameters:
- ref : DatasetRef
- Resolved reference to a dataset.
 
- Returns:
- exists : bool
- Whether the dataset exists in the Datastore.
 
 
 - extract_provenance_data() QuantumProvenanceData¶
- Extract provenance information and datastore records from this butler.
- Returns:
- provenance : QuantumProvenanceData
- A serializable struct containing input/output dataset IDs and datastore records. This assumes all dataset IDs are UUIDs (just to make it easier for pydantic to reason about the struct's types); the rest of this class makes no such assumption, but the approach to processing in which it's useful effectively requires UUIDs anyway.
 
- Notes
- QuantumBackedButler records this provenance information when its methods are used, which mostly saves PipelineTask authors from having to worry about it while still recording very detailed information. But it has two small weaknesses:
- Calling getDirectDeferred or getDirect is enough to mark a dataset as an "actual input", which may mark some datasets that aren't actually used. We rely on task authors to use markInputUnused to address this.
- We assume that the execution system will call datasetExistsDirect on all predicted inputs prior to execution, in order to populate the "available inputs" set (see the sketch below). This is what I envision SingleQuantumExecutor doing after we update it to use this class, but it feels fragile for this class to make such a strong assumption about how it will be used, even if I can't think of any other executor behavior that would make sense.
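A sketch of that envisioned executor flow (hypothetical, not a committed API contract; qbb and quantum are assumed to exist, and Quantum.inputs maps each dataset type to its list of resolved DatasetRef objects):

```python
# Probe every predicted input so the "available inputs" set is populated,
# run the task, then collect provenance for the per-quantum file.
for refs in quantum.inputs.values():
    for ref in refs:
        qbb.datasetExistsDirect(ref)  # records availability as a side effect
# ... execute the task, reading via qbb.get() and writing via qbb.put() ...
provenance = qbb.extract_provenance_data()
```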
 
- classmethod from_predicted(config: Config | str | ParseResult | ResourcePath | Path, predicted_inputs: Iterable[UUID], predicted_outputs: Iterable[UUID], dimensions: DimensionUniverse, datastore_records: Mapping[str, DatastoreRecordData], filename: str = ':memory:', OpaqueManagerClass: Type[OpaqueTableStorageManager] = ByNameOpaqueTableStorageManager, BridgeManagerClass: Type[DatastoreRegistryBridgeManager] = MonolithicDatastoreRegistryBridgeManager, search_paths: List[str] | None = None, dataset_types: Mapping[str, DatasetType] | None = None) QuantumBackedButler¶
- Construct a new QuantumBackedButler from sets of input and output dataset IDs.
- Parameters:
- config : Config or ResourcePathExpression
- A butler repository root, configuration filename, or configuration instance.
- predicted_inputs : Iterable[DatasetId]
- Dataset IDs for datasets that can be read from this butler.
- predicted_outputs : Iterable[DatasetId]
- Dataset IDs for datasets that can be stored in this butler; these must be fully resolved.
- dimensions : DimensionUniverse
- Object managing all dimension definitions.
- filename : str, optional
- Name for the SQLite database that will back this butler; defaults to an in-memory database.
- datastore_records : dict[str, DatastoreRecordData] or None
- Datastore records to import into a datastore.
- OpaqueManagerClass : type, optional
- A subclass of OpaqueTableStorageManager to use for datastore opaque records. Default is a SQL-backed implementation.
- BridgeManagerClass : type, optional
- A subclass of DatastoreRegistryBridgeManager to use for datastore location records. Default is a SQL-backed implementation.
- search_paths : list of str, optional
- Additional search paths for butler configuration.
- dataset_types : Mapping[str, DatasetType], optional
- Mapping of the dataset type name to its registry definition.
 
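A hedged construction sketch; repo_config, input_ids, output_ids, universe, and records are placeholder names for values assumed to already exist:

```python
from lsst.daf.butler import QuantumBackedButler

# Build a butler from explicit dataset ID sets rather than a Quantum.
qbb = QuantumBackedButler.from_predicted(
    config=repo_config,            # repo root, config file, or Config instance
    predicted_inputs=input_ids,    # Iterable[uuid.UUID]
    predicted_outputs=output_ids,  # Iterable[uuid.UUID]
    dimensions=universe,           # DimensionUniverse
    datastore_records=records,     # mapping of datastore name to records
)
```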
 
 - get(ref: DatasetRef, /, *, parameters: dict[str, Any] | None = None, storageClass: StorageClass | str | None = None) Any¶
- Retrieve a stored dataset.
- Parameters:
- ref : DatasetRef
- A resolved DatasetRef directly associated with a dataset.
- parameters : dict
- Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.
- storageClass : StorageClass or str, optional
- The storage class to be used to override the Python type returned by this method. By default the returned type matches the dataset type definition for this dataset. Specifying a read StorageClass can force a different type to be returned. This type must be compatible with the original type.
 
- Returns:
- obj : object
- The dataset.
 
- Raises:
- AmbiguousDatasetError
- Raised if the supplied DatasetRef is unresolved.
 
- Notes
- In a LimitedButler the only allowable way to specify a dataset is to use a resolved DatasetRef. Subclasses can support more options.
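For illustration, assuming qbb is a QuantumBackedButler and ref is a resolved DatasetRef from its predicted inputs (the parameters value is hypothetical; valid keys depend on the dataset's StorageClass):

```python
# Plain read of a predicted input.
obj = qbb.get(ref)

# Hypothetical subset read via StorageClass-defined parameters:
# cutout = qbb.get(ref, parameters={"bbox": some_bbox})
```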
 - getDeferred(ref: DatasetRef, /, *, parameters: dict[str, Any] | None = None, storageClass: StorageClass | str | None = None) DeferredDatasetHandle¶
- Create a DeferredDatasetHandle which can later retrieve a dataset, after an immediate registry lookup.
- Parameters:
- ref : DatasetRef
- For the default implementation of a LimitedButler, the only acceptable parameter is a resolved DatasetRef.
- parameters : dict
- Additional StorageClass-defined options to control reading, typically used to efficiently read only a subset of the dataset.
- storageClass : StorageClass or str, optional
- The storage class to be used to override the Python type returned by this method. By default the returned type matches the dataset type definition for this dataset. Specifying a read StorageClass can force a different type to be returned. This type must be compatible with the original type.
 
- Returns:
- obj : DeferredDatasetHandle
- A handle which can be used to retrieve a dataset at a later time.
 
- Notes
- In a LimitedButler the only allowable way to specify a dataset is to use a resolved DatasetRef. Subclasses can support more options.
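A small sketch of deferred access, assuming qbb and a resolved ref as above:

```python
# Obtain a handle now; the actual read happens only when the handle is used.
handle = qbb.getDeferred(ref)
# ... later, only if the dataset turns out to be needed ...
obj = handle.get()
```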
 - getDirect(ref: DatasetRef, *, parameters: Dict[str, Any] | None = None, storageClass: StorageClass | str | None = None) Any¶
- Deprecated since version v26.0: Butler.get() now behaves like Butler.getDirect() when given a DatasetRef. Please use Butler.get(). Will be removed after v27.0. 
 - getDirectDeferred(ref: DatasetRef, *, parameters: dict | None = None, storageClass: StorageClass | str | None = None) DeferredDatasetHandle¶
- Deprecated since version v26.0: Butler.getDeferred() now behaves like getDirectDeferred() when given a DatasetRef. Please use Butler.getDeferred(). Will be removed after v27.0. 
- classmethod initialize(config: Config | str | ParseResult | ResourcePath | Path, quantum: Quantum, dimensions: DimensionUniverse, filename: str = ':memory:', OpaqueManagerClass: Type[OpaqueTableStorageManager] = ByNameOpaqueTableStorageManager, BridgeManagerClass: Type[DatastoreRegistryBridgeManager] = MonolithicDatastoreRegistryBridgeManager, search_paths: List[str] | None = None, dataset_types: Mapping[str, DatasetType] | None = None) QuantumBackedButler¶
- Construct a new QuantumBackedButler from repository configuration and helper types.
- Parameters:
- config : Config or ResourcePathExpression
- A butler repository root, configuration filename, or configuration instance.
- quantum : Quantum
- Object describing the predicted input and output datasets relevant to this butler. This must have resolved DatasetRef instances for all inputs and outputs.
- dimensions : DimensionUniverse
- Object managing all dimension definitions.
- filename : str, optional
- Name for the SQLite database that will back this butler; defaults to an in-memory database.
- OpaqueManagerClass : type, optional
- A subclass of OpaqueTableStorageManager to use for datastore opaque records. Default is a SQL-backed implementation.
- BridgeManagerClass : type, optional
- A subclass of DatastoreRegistryBridgeManager to use for datastore location records. Default is a SQL-backed implementation.
- search_paths : list of str, optional
- Additional search paths for butler configuration.
- dataset_types : Mapping[str, DatasetType], optional
- Mapping of the dataset type name to its registry definition.
 
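A typical construction sketch for executing one quantum; quantum (with fully resolved DatasetRefs) and universe are assumed to exist, and the paths are placeholders:

```python
from lsst.daf.butler import QuantumBackedButler

qbb = QuantumBackedButler.initialize(
    config="/path/to/butler.yaml",   # hypothetical repo configuration
    quantum=quantum,
    dimensions=universe,
    filename="quantum.sqlite3",      # persist records; default is ":memory:"
)
```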
 
 - markInputUnused(ref: DatasetRef) None¶
- Indicate that a predicted input was not actually used when processing a Quantum.
- Parameters:
- ref : DatasetRef
- Reference to the unused dataset.
 
- Notes
- By default, a dataset is considered "actually used" if it is accessed via getDirect or a handle to it is obtained via getDirectDeferred (even if the handle is not used). This method must be called after one of those in order to remove the dataset from the actual input list.
- This method does nothing for butlers that do not store provenance information (which is the default implementation provided by the base class).
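For example (a sketch; need_dataset stands in for some task-level decision):

```python
# Obtaining a handle alone marks `ref` as an actual input, even if unused.
handle = qbb.getDeferred(ref)
if not need_dataset:
    qbb.markInputUnused(ref)  # drop it from the actual-input list again
```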
 - pruneDatasets(refs: Iterable[DatasetRef], *, disassociate: bool = True, unstore: bool = False, tags: Iterable[str] = (), purge: bool = False) None¶
- Remove one or more datasets from a collection and/or storage.
- Parameters:
- refs : Iterable of DatasetRef
- Datasets to prune. These must be "resolved" references (not just a DatasetType and data ID).
- disassociate : bool, optional
- Disassociate pruned datasets from tags, or from all collections if purge=True.
- unstore : bool, optional
- If True (False is default), remove these datasets from all datastores known to this butler. Note that this will make it impossible to retrieve these datasets even via other collections. Datasets that are already not stored are ignored by this option.
- tags : Iterable[str], optional
- TAGGED collections to disassociate the datasets from. Ignored if disassociate is False or purge is True.
- purge : bool, optional
- If True (False is default), completely remove the dataset from the Registry. To prevent accidental deletions, purge may only be True if additional conditions on the other arguments are met. This mode may remove provenance information from datasets other than those provided, and should be used with extreme care.
 
- Raises:
- TypeError
- Raised if the butler is read-only, if no collection was provided, or the conditions for purge=True were not met.
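A cautious usage sketch (intermediate_refs is a placeholder iterable of resolved DatasetRefs):

```python
# Remove stored intermediates from the datastore without touching
# collection associations or the registry.
qbb.pruneDatasets(intermediate_refs, disassociate=False, unstore=True)
```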
 
 
 - put(obj: Any, ref: DatasetRef, /) DatasetRef¶
- Store a dataset that already has a UUID and RUN collection.
- Parameters:
- obj : object
- The dataset.
- ref : DatasetRef
- Resolved reference for a not-yet-stored dataset.
 
- Returns:
- ref : DatasetRef
- The same ref as given, for convenience and symmetry with Butler.put.
 
- Raises:
- TypeError
- Raised if the butler is read-only. 
 
- Notes
- Whether this method inserts the given dataset into a Registry is implementation defined (some LimitedButler subclasses do not have a Registry), but it always adds the dataset to a Datastore, and the given ref.id and ref.run are always preserved.
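A short sketch, assuming result matches the dataset type's StorageClass and output_ref is a resolved DatasetRef from the quantum's predicted outputs:

```python
stored = qbb.put(result, output_ref)
assert stored.id == output_ref.id  # ref.id and ref.run are preserved
```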
 - putDirect(obj: Any, ref: DatasetRef, /) DatasetRef¶
- Store a dataset that already has a UUID and RUN collection.
- Parameters:
- obj : object
- The dataset.
- ref : DatasetRef
- Resolved reference for a not-yet-stored dataset.
 
- Returns:
- ref : DatasetRef
- The same ref as given, for convenience and symmetry with Butler.put.
 
- Raises:
- TypeError
- Raised if the butler is read-only. 
 
 - Deprecated since version v26.0: Butler.put() now behaves like Butler.putDirect() when given a DatasetRef. Please use Butler.put(). Will be removed after v27.0.