DatasetProvenance

class lsst.daf.butler.DatasetProvenance(*, inputs: list[lsst.daf.butler._dataset_ref.SerializedDatasetRef] = <factory>, quantum_id: ~uuid.UUID | None = None, extras: dict[uuid.UUID, dict[str, str | int | float | bool | None | uuid.UUID]] = <factory>)

Bases: BaseModel

Provenance of a single DatasetRef.

Attributes Summary

model_config

Configuration for the model; should be a dictionary conforming to pydantic.ConfigDict.

Methods Summary

add_extra_provenance(dataset_id, extra)

Attach extra provenance to a specific dataset.

add_input(ref)

Add an input dataset to the provenance.

from_flat_dict(prov_dict, butler)

Create a provenance object from a provenance dictionary.

model_post_init(context, /)

This function is meant to behave like a BaseModel method to initialise private attributes.

populate_cache()

strip_provenance_from_flat_dict(prov_dict)

Remove provenance keys from a mapping that had been populated by to_flat_dict.

to_flat_dict(ref, /, *[, prefix, sep, ...])

Return provenance as a flattened dictionary.

Attributes Documentation

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.ConfigDict.

Methods Documentation

add_extra_provenance(dataset_id: uuid.UUID, extra: Mapping[str, _PROV_TYPES]) → None

Attach extra provenance to a specific dataset.

Parameters:
dataset_id : uuid.UUID

The ID of the dataset to receive this provenance.

extra : collections.abc.Mapping [str, typing.Any]

The extra provenance information as a dictionary. The values must be simple Python scalars, or scalars that can be serialized by Pydantic and converted to a simple string value.

Notes

The keys in the extra provenance cannot include the reserved provenance keys run, id, or datasettype (in either upper or lower case).
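The reserved-key restriction above can be illustrated with a small standalone sketch. The `_RESERVED` set and `check_extra_keys` helper below are hypothetical names for demonstration only, not the library's actual implementation:

```python
# Reserved provenance key names, compared case-insensitively.
# Illustrative only; not the library's actual internals.
_RESERVED = {"run", "id", "datasettype"}


def check_extra_keys(extra: dict) -> None:
    """Raise if any extra-provenance key clashes with a reserved name."""
    for key in extra:
        if key.lower() in _RESERVED:
            raise ValueError(f"{key!r} is a reserved provenance key")


check_extra_keys({"exposure_time": 30.0, "seeing": 0.7})  # accepted
try:
    check_extra_keys({"RUN": "my_run"})  # rejected regardless of case
except ValueError as exc:
    print(exc)
```

Because the comparison lowercases each key first, mixed-case variants such as "DatasetType" are rejected along with the plain lowercase forms.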

add_input(ref: DatasetRef) → None

Add an input dataset to the provenance.

Parameters:
ref : DatasetRef

A dataset to register as an input.

classmethod from_flat_dict(prov_dict: Mapping[str, Any], butler: Butler) → tuple[Self, DatasetRef | None]

Create a provenance object from a provenance dictionary.

Parameters:
prov_dict : collections.abc.Mapping

Dictionary populated by to_flat_dict.

butler : lsst.daf.butler.Butler

Butler to query to find the referenced datasets.

Returns:
prov : DatasetProvenance

Provenance extracted from the dictionary.

ref : DatasetRef or None

Dataset associated with this provenance. Can be None if no dataset provenance is found.

Raises:
ValueError

Raised if no provenance values are found in the dictionary.

RuntimeError

Raised if a referenced dataset is not known to the given butler.
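Part of what from_flat_dict must do is regroup the flat input.N.field keys produced by to_flat_dict into one record per input before resolving them against the butler. The `group_input_records` helper below is a hypothetical illustration of just that grouping step, not the library's code:

```python
import re
from collections import defaultdict


def group_input_records(prov_dict: dict) -> list[dict]:
    """Group flat ``input.N.<field>`` keys into one record per input.

    Illustrative sketch only; the real classmethod also resolves each
    record against a Butler to produce DatasetRef objects.
    """
    pattern = re.compile(r"^input\.(\d+)\.(\w+)$")
    records: dict[int, dict] = defaultdict(dict)
    for key, value in prov_dict.items():
        if match := pattern.match(key):
            # match.group(1) is the input counter N; group(2) the field name.
            records[int(match.group(1))][match.group(2)] = value
    return [records[n] for n in sorted(records)]


flat = {
    "id": "ae0fa83d-cc89-41dd-9680-f97ede49f01e",
    "input.0.id": "3dfd7ba5-5e35-4565-9d87-4b33880ed06c",
    "input.0.run": "other_run",
    "input.1.id": "7a99f6e9-4035-3d68-842e-58ecce1dc935",
    "input.1.run": "other_run",
}
for record in group_input_records(flat):
    print(record)
```

Sorting on the numeric counter preserves the original input ordering even though dictionary key order for the flat mapping is not guaranteed.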

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
self : BaseModel

The BaseModel instance.

context : Any

The context.

populate_cache() → Self

classmethod strip_provenance_from_flat_dict(prov_dict: MutableMapping[str, Any]) → None

Remove provenance keys from a mapping that had been populated by to_flat_dict.

Parameters:
prov_dict : collections.abc.MutableMapping

Dictionary to modify.
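The effect of stripping provenance keys in place can be sketched with a small standalone mimic. The real classmethod detects provenance keys itself; the `strip_provenance` function below assumes a known prefix purely for demonstration and is not the library's implementation:

```python
def strip_provenance(prov_dict: dict, prefix: str = "lsst.butler",
                     sep: str = ".") -> None:
    """Delete keys under the given provenance prefix, in place (illustrative)."""
    marker = prefix + sep
    # Materialize the key list first so we can delete while iterating.
    for key in [k for k in prov_dict if k.startswith(marker)]:
        del prov_dict[key]


headers = {
    "lsst.butler.id": "ae0fa83d-cc89-41dd-9680-f97ede49f01e",
    "lsst.butler.run": "test_run",
    "exptime": 30.0,  # non-provenance metadata survives
}
strip_provenance(headers)
print(headers)  # {'exptime': 30.0}
```

Like the documented classmethod, the sketch returns nothing and mutates the given mapping, which suits the typical use case of cleaning provenance out of a FITS-style header dictionary before reuse.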

to_flat_dict(ref: DatasetRef | None, /, *, prefix: str = '', sep: str = '.', simple_types: bool = False, use_upper: bool | None = None) → dict[str, str | int | float | bool | None | uuid.UUID]

Return provenance as a flattened dictionary.

Parameters:
ref : DatasetRef or None

If given, a dataset for which this provenance is relevant and should be included.

prefix : str, optional

A prefix to use for each key in the provenance dictionary.

sep : str, optional

Separator to use to represent hierarchy. Must be a single character, and cannot be a number, letter, or underscore (to avoid confusion with provenance keys themselves).

simple_types : bool, optional

If True, only simple Python types will be used in the returned dictionary; specifically, UUIDs will be returned as str. If False, UUIDs will be returned as uuid.UUID. Complex types found in DatasetProvenance.extras will be cast to str if True.

use_upper : bool or None, optional

If None, the case of the keys matches the case of the first character of the prefix: upper case if str.isupper() returns True for that character, else lower case. If False, the keys will be lower case; if True, upper case.

Returns:
prov : dict

Dictionary representing the provenance. The keys are defined in the notes below.

Raises:
ValueError

Raised if the separator is not a single character.

Notes

Keys from the given dataset (all optional if no dataset is given):

id :

UUID of the given dataset.

run :

Run of the given dataset.

datasettype :

Dataset type of the given dataset.

dataid.x :

An entry for each required dimension, "x", in the data ID.

Each input dataset will have the id, run, and datasettype keys as defined above (but no dataid key), with an input.N prefix where N starts counting at 0.

The quantum ID, if present, will use key quantum.

Examples

>>> provenance.to_flat_dict(
...     ref, prefix="lsst.butler", sep=".", simple_types=True
... )
{
    "lsst.butler.id": "ae0fa83d-cc89-41dd-9680-f97ede49f01e",
    "lsst.butler.run": "test_run",
    "lsst.butler.datasettype": "data",
    "lsst.butler.dataid.detector": 10,
    "lsst.butler.dataid.instrument": "LSSTCam",
    "lsst.butler.quantum": "d93a735b-08f0-477d-bc95-2cc32d6d898b",
    "lsst.butler.input.0.id": "3dfd7ba5-5e35-4565-9d87-4b33880ed06c",
    "lsst.butler.input.0.run": "other_run",
    "lsst.butler.input.0.datasettype": "astropy_parquet",
    "lsst.butler.input.1.id": "7a99f6e9-4035-3d68-842e-58ecce1dc935",
    "lsst.butler.input.1.run": "other_run",
    "lsst.butler.input.1.datasettype": "astropy_parquet",
}