DatasetRef

class lsst.daf.butler.DatasetRef(datasetType: DatasetType, dataId: DataCoordinate, run: str, *, id: UUID | None = None, conform: bool = True, id_generation_mode: DatasetIdGenEnum = DatasetIdGenEnum.UNIQUE, datastore_records: Mapping[str, list[lsst.daf.butler.datastore.stored_file_info.StoredDatastoreItemInfo]] | None = None)

Bases: object

Reference to a Dataset in a Registry.

A DatasetRef may point to a Dataset that does not yet exist (e.g., because it is a predicted input for provenance).

Parameters:
datasetType : DatasetType

The DatasetType for this Dataset.

dataId : DataCoordinate

A mapping of dimensions that labels the Dataset within a Collection.

run : str

The name of the run this dataset was associated with when it was created.

id : DatasetId, optional

The unique identifier assigned when the dataset is created. If id is not specified, a new unique ID will be created.

conform : bool, optional

If True (default), call DataCoordinate.standardize to ensure that the data ID’s dimensions are consistent with the dataset type’s. DatasetRef instances for which those dimensions are not equal should not be created in new code, but are still supported for backwards compatibility. New code should only pass False if it can guarantee that the dimensions are already consistent.

id_generation_mode : DatasetIdGenEnum

ID generation option. UNIQUE makes a random UUID4-type ID. DATAID_TYPE makes a deterministic UUID5-type ID based on a dataset type name and dataId. DATAID_TYPE_RUN makes a deterministic UUID5-type ID based on a dataset type name, run collection name, and dataId.

datastore_records : DatasetDatastoreRecords or None

Datastore records to attach.

Notes

See also Organizing and identifying datasets
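Examples

A minimal construction sketch, hedged: it assumes a DatasetType object named dataset_type and a consistent DataCoordinate named data_id already exist (e.g. obtained from a Registry query), the run name is made up, and DatasetIdGenEnum is assumed to be importable from lsst.daf.butler.

from lsst.daf.butler import DatasetRef, DatasetIdGenEnum

# dataset_type and data_id are hypothetical, pre-existing objects
ref = DatasetRef(dataset_type, data_id, run="u/someone/example_run")
assert ref.id is not None  # UNIQUE mode assigned a random UUID4 by default

# Deterministic ID derived from dataset type name, run name, and data ID
ref_det = DatasetRef(
    dataset_type,
    data_id,
    run="u/someone/example_run",
    id_generation_mode=DatasetIdGenEnum.DATAID_TYPE_RUN,
)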

Attributes Summary

dataId

A mapping of Dimension primary key values that labels the dataset within a Collection (DataCoordinate).

datasetType

The definition of this dataset (DatasetType).

dimensions

Dimensions associated with the underlying DatasetType.

id

Primary key of the dataset (DatasetId).

run

The name of the run that produced the dataset.

Methods Summary

expanded(dataId)

Return a new DatasetRef with the given expanded data ID.

from_json(json_str[, universe, registry])

Convert from JSON to a pydantic model.

from_simple(simple[, universe, registry, ...])

Construct a new object from simplified form.

groupByType(refs)

Group an iterable of DatasetRef by DatasetType.

isComponent()

Indicate whether this DatasetRef refers to a component.

isComposite()

Boolean indicating whether this DatasetRef is a composite type.

is_compatible_with(other)

Determine if the given DatasetRef is compatible with this one.

iter_by_type(refs)

Group an iterable of DatasetRef by DatasetType with special hooks for custom iterables that can do this efficiently.

makeComponentRef(name)

Create a DatasetRef that corresponds to a component.

makeCompositeRef()

Create a DatasetRef of the composite from a component ref.

overrideStorageClass(storageClass)

Create a new DatasetRef from this one, but with a modified DatasetType that has a different StorageClass.

replace(*[, id, run, storage_class, ...])

Create a new DatasetRef from this one, but with some modified attributes.

to_json([minimal])

Convert this class to JSON, assuming that to_simple() returns a pydantic model.

to_simple([minimal])

Convert this class to a simple python type.

Attributes Documentation

dataId: DataCoordinate

A mapping of Dimension primary key values that labels the dataset within a Collection (DataCoordinate).

Cannot be changed after a DatasetRef is constructed.

datasetType: DatasetType

The definition of this dataset (DatasetType).

Cannot be changed after a DatasetRef is constructed.

dimensions

Dimensions associated with the underlying DatasetType.

id

Primary key of the dataset (DatasetId).

Cannot be changed after a DatasetRef is constructed.

run: str

The name of the run that produced the dataset.

Cannot be changed after a DatasetRef is constructed.

Methods Documentation

expanded(dataId: DataCoordinate) → DatasetRef

Return a new DatasetRef with the given expanded data ID.

Parameters:
dataId : DataCoordinate

Data ID for the new DatasetRef. Must compare equal to the original data ID.

Returns:
ref : DatasetRef

A new DatasetRef with the given data ID.
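Examples

A hedged sketch, assuming butler is an existing Butler and that Registry.expandDataId is used to attach dimension records to the data ID:

expanded_id = butler.registry.expandDataId(ref.dataId)
expanded_ref = ref.expanded(expanded_id)
assert expanded_ref == ref  # same identity; the data ID merely gains records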

classmethod from_json(json_str: str, universe: DimensionUniverse | None = None, registry: Registry | None = None) → SupportsSimple

Convert from JSON to a pydantic model.

Parameters:
cls_ : type of SupportsSimple

The Python type being created.

json_str : str

The JSON string representing this object.

universe : DimensionUniverse or None, optional

The universe required to instantiate some models. Required if registry is None.

registry : Registry or None, optional

Registry from which to obtain the dimension universe if an explicit universe has not been given.

Returns:
model : SupportsSimple

Pydantic model constructed from JSON and validated.
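Examples

A hedged round-trip sketch, assuming ref is an existing DatasetRef and butler.dimensions supplies the DimensionUniverse:

json_str = ref.to_json()
restored = DatasetRef.from_json(json_str, universe=butler.dimensions)
assert restored == ref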

classmethod from_simple(simple: SerializedDatasetRef, universe: DimensionUniverse | None = None, registry: Registry | None = None, datasetType: DatasetType | None = None) → DatasetRef

Construct a new object from simplified form.

Generally this is data returned from the to_simple method.

Parameters:
simple : dict of [str, Any]

The value returned by to_simple().

universe : DimensionUniverse

The special graph of all known dimensions. Can be None if a registry is provided.

registry : lsst.daf.butler.Registry, optional

Registry to use to convert simple form of a DatasetRef to a full DatasetRef. Can be None if a full description of the type is provided along with a universe.

datasetType : DatasetType, optional

If datasetType is supplied, this will be used as the datasetType object in the resulting DatasetRef instead of being read from the SerializedDatasetRef. This is useful when many refs share the same type, since the type object can then be shared and memory saved. Defaults to None.

Returns:
ref : DatasetRef

Newly-constructed object.
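Examples

A hedged sketch of the to_simple/from_simple round trip; passing datasetType explicitly is optional and shown here only to illustrate the memory-saving path described above:

simple = ref.to_simple()
restored = DatasetRef.from_simple(
    simple,
    universe=butler.dimensions,   # assumes an existing Butler
    datasetType=ref.datasetType,  # reuse the type object instead of re-reading it
)
assert restored == ref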

static groupByType(refs: Iterable[DatasetRef]) → NamedKeyDict[DatasetType, list[lsst.daf.butler._dataset_ref.DatasetRef]]

Group an iterable of DatasetRef by DatasetType.

Parameters:
refs : Iterable [ DatasetRef ]

DatasetRef instances to group.

Returns:
grouped : NamedKeyDict [ DatasetType, list [ DatasetRef ] ]

Grouped DatasetRef instances.

Notes

When lazy item-iterables are acceptable instead of a full mapping, iter_by_type can in some cases be far more efficient.
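Examples

A hedged usage sketch, assuming refs is any iterable of DatasetRef:

grouped = DatasetRef.groupByType(refs)
for dataset_type, refs_of_type in grouped.items():
    print(dataset_type.name, len(refs_of_type))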

isComponent() → bool

Indicate whether this DatasetRef refers to a component.

Returns:
isComponent : bool

True if this DatasetRef is a component, False otherwise.

isComposite() → bool

Boolean indicating whether this DatasetRef is a composite type.

Returns:
isComposite : bool

True if this DatasetRef is a composite type, False otherwise.

is_compatible_with(other: DatasetRef) → bool

Determine if the given DatasetRef is compatible with this one.

Parameters:
other : DatasetRef

Dataset ref to check.

Returns:
is_compatible : bool

True if the other dataset ref is the same as this one, or if the dataset type associated with the other ref is compatible with this one and the dataId and dataset ID match; False otherwise.

Notes

Compatibility requires that the dataId and dataset ID match and that the DatasetType is compatible. Compatibility here means that the storage class associated with the dataset type of the other ref can be converted to this ref's storage class.

Specifically this means that if you have done:

new_ref = ref.overrideStorageClass(sc)

and this is successful, then the guarantee is that:

assert ref.is_compatible_with(new_ref) is True

since we know that the Python type associated with the new ref can be converted to the original Python type. The reverse is not guaranteed and depends on whether bidirectional converters have been registered.

static iter_by_type(refs: Iterable[DatasetRef]) → Iterable[tuple[lsst.daf.butler._dataset_type.DatasetType, collections.abc.Iterable[lsst.daf.butler._dataset_ref.DatasetRef]]]

Group an iterable of DatasetRef by DatasetType with special hooks for custom iterables that can do this efficiently.

Parameters:
refs : Iterable [ DatasetRef ]

DatasetRef instances to group. If this satisfies the _DatasetRefGroupedIterable protocol, its _iter_by_dataset_type method will be called.

Returns:
grouped : Iterable [ tuple [ DatasetType, Iterable [ DatasetRef ] ] ]

Grouped DatasetRef instances.
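Examples

A hedged sketch of the lazy grouping path; handle_one_type is a hypothetical callback:

for dataset_type, refs_of_type in DatasetRef.iter_by_type(refs):
    # refs_of_type may be a lazy iterable, so consume it only once
    handle_one_type(dataset_type, refs_of_type)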

makeComponentRef(name: str) → DatasetRef

Create a DatasetRef that corresponds to a component.

Parameters:
name : str

Name of the component.

Returns:
ref : DatasetRef

A DatasetRef with a dataset type that corresponds to the given component, and the same ID and run (which may be None, if they are None in self).
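Examples

A hedged sketch, assuming ref points to a composite dataset type that defines a "wcs" component (the component name is hypothetical):

wcs_ref = ref.makeComponentRef("wcs")
assert wcs_ref.isComponent()
assert wcs_ref.datasetType.component() == "wcs"
assert wcs_ref.id == ref.id and wcs_ref.run == ref.run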

makeCompositeRef() → DatasetRef

Create a DatasetRef of the composite from a component ref.

Requires that this DatasetRef is a component.

Returns:
ref : DatasetRef

A DatasetRef with a dataset type that corresponds to the composite parent of this component, and the same ID and run (which may be None, if they are None in self).
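Examples

A hedged sketch, continuing from the hypothetical component ref above:

parent_ref = wcs_ref.makeCompositeRef()
assert not parent_ref.isComponent()
assert parent_ref.id == wcs_ref.id and parent_ref.run == wcs_ref.run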

overrideStorageClass(storageClass: str | StorageClass) → DatasetRef

Create a new DatasetRef from this one, but with a modified DatasetType that has a different StorageClass.

Parameters:
storageClass : str or StorageClass

The new storage class.

Returns:
modified : DatasetRef

A new dataset reference that is the same as the current one but with a different storage class in the DatasetType.
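Examples

A hedged sketch; "DataFrame" stands in for any storage class, defined in the repository, that the original storage class can be converted to:

df_ref = ref.overrideStorageClass("DataFrame")
assert df_ref.datasetType.storageClass.name == "DataFrame"
assert ref.is_compatible_with(df_ref)  # guaranteed when the override succeeds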

replace(*, id: DatasetId | None = None, run: str | None = None, storage_class: str | StorageClass | None = None, datastore_records: DatasetDatastoreRecords | None | Literal[False] = False) → DatasetRef

Create a new DatasetRef from this one, but with some modified attributes.

Parameters:
id : DatasetId or None

If not None then update dataset ID.

run : str or None

If not None then update the run collection name. If id is None then this will also cause a new dataset ID to be generated.

storage_class : str or StorageClass or None

The new storage class. If not None, replaces existing storage class.

datastore_records : DatasetDatastoreRecords or None

New datastore records. If None, remove all records. By default datastore records are preserved.

Returns:
modified : DatasetRef

A new dataset reference with updated attributes.
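Examples

A hedged sketch: move a ref into a different run; because id is not supplied, a new dataset ID is generated, as described above.

moved = ref.replace(run="u/someone/other_run")
assert moved.run == "u/someone/other_run"
assert moved.id != ref.id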

to_json(minimal: bool = False) → str

Convert this class to JSON, assuming that to_simple() returns a pydantic model.

Parameters:
minimal : bool

Return minimal possible representation.

to_simple(minimal: bool = False) → SerializedDatasetRef

Convert this class to a simple python type.

This makes it suitable for serialization.

Parameters:
minimal : bool, optional

Use minimal serialization. Requires Registry to convert back to a full type.

Returns:
simple : dict or int

The object converted to a dictionary.