DatasetRef

class lsst.daf.butler.DatasetRef(datasetType: DatasetType, dataId: DataCoordinate, run: str, *, id: UUID | None = None, conform: bool = True, id_generation_mode: DatasetIdGenEnum = DatasetIdGenEnum.UNIQUE, datastore_records: Mapping[str, Iterable[StoredDatastoreItemInfo]] | None = None)

Bases: object

Reference to a Dataset in a Registry.

A DatasetRef may point to a Dataset that does not yet exist (e.g., because it is a predicted input for provenance).

Parameters:

datasetType (DatasetType): The DatasetType for this Dataset.
dataId (DataCoordinate): A mapping of dimensions that labels the Dataset within a Collection.
run (str): The name of the run this dataset was associated with when it was created.
id (DatasetId, optional): The unique identifier assigned when the dataset is created. If id is not specified, a new ID will be generated according to id_generation_mode.
conform (bool, optional): If True (default), call DataCoordinate.standardize to ensure that the data ID’s dimensions are consistent with the dataset type’s. DatasetRef instances for which those dimensions are not equal should not be created in new code, but are still supported for backwards compatibility. New code should only pass False if it can guarantee that the dimensions are already consistent.
id_generation_mode (DatasetIdGenEnum, optional): ID generation option, used only when id is not specified. UNIQUE makes a random UUID4-type ID. DATAID_TYPE makes a deterministic UUID5-type ID based on the dataset type name and data ID. DATAID_TYPE_RUN makes a deterministic UUID5-type ID based on the dataset type name, run collection name, and data ID.
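The difference between the three id_generation_mode options can be sketched with Python's standard uuid module. This is a minimal illustration, not the daf_butler implementation: the IdGenMode enum, the make_dataset_id helper, the namespace UUID, and the way the hashed name string is built are all hypothetical. It only shows that UUID4 IDs are random while UUID5 IDs are reproducible from their inputs, and that including the run changes the result.

```python
import uuid
from enum import Enum

# Hypothetical namespace for this sketch only; not the value daf_butler uses.
EXAMPLE_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


class IdGenMode(Enum):
    """Stand-in mirroring the three DatasetIdGenEnum options."""

    UNIQUE = 0
    DATAID_TYPE = 1
    DATAID_TYPE_RUN = 2


def make_dataset_id(
    run: str, dataset_type_name: str, data_id: dict, mode: IdGenMode
) -> uuid.UUID:
    """Return a dataset ID for the given inputs (hypothetical helper)."""
    if mode is IdGenMode.UNIQUE:
        # Random UUID4: a new value on every call.
        return uuid.uuid4()
    # Deterministic UUID5: hash a canonical string built from the inputs.
    parts = [f"dataset_type={dataset_type_name}"]
    if mode is IdGenMode.DATAID_TYPE_RUN:
        parts.append(f"run={run}")
    # Sort the data ID keys so that dict ordering does not affect the result.
    parts.extend(f"{key}={data_id[key]}" for key in sorted(data_id))
    return uuid.uuid5(EXAMPLE_NAMESPACE, ",".join(parts))


data_id = {"instrument": "HSC", "visit": 903334, "detector": 42}
a = make_dataset_id("run/a", "calexp", data_id, IdGenMode.DATAID_TYPE)
b = make_dataset_id("run/b", "calexp", data_id, IdGenMode.DATAID_TYPE)
c = make_dataset_id("run/a", "calexp", data_id, IdGenMode.DATAID_TYPE_RUN)
d = make_dataset_id("run/b", "calexp", data_id, IdGenMode.DATAID_TYPE_RUN)
print(a == b, c == d)  # True False
```

With DATAID_TYPE the run is ignored, so the same dataset type and data ID map to the same ID in any run; DATAID_TYPE_RUN makes the ID run-specific.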

See also

Organizing and identifying datasets

Attributes Summary

`dataId`	A mapping of `Dimension` primary key values that labels the dataset within a Collection (`DataCoordinate`).
`datasetType`	The definition of this dataset (`DatasetType`).
`dimensions`	Dimensions associated with the underlying `DatasetType`.
`id`	Primary key of the dataset (`DatasetId`).
`run`	The name of the run that produced the dataset.

Methods Summary

`expanded`(dataId)	Return a new `DatasetRef` with the given expanded data ID.
`from_json`(json_str[, universe, registry])	Construct a new object from its JSON representation.
`from_simple`(simple[, universe, registry, ...])	Construct a new object from simplified form.
`groupByType`(refs)	Group an iterable of `DatasetRef` by `DatasetType`.
`isComponent`()	Indicate whether this `DatasetRef` refers to a component.
`isComposite`()	Boolean indicating whether this `DatasetRef` is a composite type.
`is_compatible_with`(ref)	Determine if the given `DatasetRef` is compatible with this one.
`iter_by_type`(refs)	Group an iterable of `DatasetRef` by `DatasetType` with special hooks for custom iterables that can do this efficiently.
`makeComponentRef`(name)	Create a `DatasetRef` that corresponds to a component.
`makeCompositeRef`()	Create a `DatasetRef` of the composite from a component ref.
`overrideStorageClass`(storageClass)	Create a new `DatasetRef` from this one, but with a modified `DatasetType` that has a different `StorageClass`.
`replace`(*[, id, run, storage_class, ...])	Create a new `DatasetRef` from this one, but with some modified attributes.
`to_json`([minimal])	Convert this object to JSON, assuming that `to_simple()` returns a pydantic model.
`to_simple`([minimal])	Convert this object to a simple Python type.

Attributes Documentation

dataId: DataCoordinate

A mapping of Dimension primary key values that labels the dataset within a Collection (DataCoordinate).

Cannot be changed after a DatasetRef is constructed.

datasetType: DatasetType

The definition of this dataset (DatasetType).

Cannot be changed after a DatasetRef is constructed.

dimensions

Dimensions associated with the underlying DatasetType.

id

Primary key of the dataset (DatasetId).

Cannot be changed after a DatasetRef is constructed.

run: str

The name of the run that produced the dataset.

Cannot be changed after a DatasetRef is constructed.

Methods Documentation

expanded(dataId: DataCoordinate) → DatasetRef

Return a new DatasetRef with the given expanded data ID.

Parameters:

dataId (DataCoordinate): Data ID for the new DatasetRef. Must compare equal to the original data ID.

Returns:

ref (DatasetRef): A new DatasetRef with the given data ID.
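The expanded method and the "cannot be changed after construction" attributes can be illustrated with frozen dataclasses. This is a minimal sketch with hypothetical ToyDataId and ToyRef classes standing in for DataCoordinate and DatasetRef; it assumes only what the text states: the new data ID must compare equal to the old one, and a new object is returned instead of the original being modified.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class ToyDataId:
    """Hypothetical stand-in for DataCoordinate.

    Equality uses only the key values; the extra ``records`` carried by an
    expanded data ID are excluded from comparison.
    """

    values: tuple
    records: tuple = field(default=(), compare=False)


@dataclass(frozen=True)
class ToyRef:
    """Hypothetical immutable stand-in for DatasetRef."""

    dataset_type: str
    data_id: ToyDataId
    run: str
    id: str

    def expanded(self, data_id: ToyDataId) -> "ToyRef":
        # The expanded data ID must identify the same dataset.
        if data_id != self.data_id:
            raise ValueError("Expanded data ID must compare equal to the original.")
        # Return a new object; the original is left untouched.
        return ToyRef(self.dataset_type, data_id, self.run, self.id)


minimal_id = ToyDataId((("instrument", "HSC"), ("visit", 903334)))
rich_id = ToyDataId(minimal_id.values, records=(("visit", {"exposure_time": 30.0}),))
ref = ToyRef("calexp", minimal_id, "run/a", "id-1")
new_ref = ref.expanded(rich_id)
```

Attempting `ref.run = "run/b"` on the frozen dataclass raises FrozenInstanceError, which mirrors the read-only attributes documented above.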

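classmethod from_json(json_str: str, universe: DimensionUniverse | None = None, registry: Registry | None = None) → SupportsSimple

Construct a new object from its JSON representation. The JSON string is parsed into the pydantic serialization model, which is then converted with from_simple.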

classmethod from_simple(simple: SerializedDatasetRef, universe: DimensionUniverse | None = None, registry: Registry | None = None, datasetType: DatasetType | None = None) → DatasetRef

Construct a new object from simplified form.

Generally this is data returned from the to_simple method.

Parameters:

simple (SerializedDatasetRef): The value returned by to_simple().
universe (DimensionUniverse, optional): The special graph of all known dimensions. Can be None if a registry is provided.
registry (lsst.daf.butler.Registry, optional): Registry to use to convert the simple form of a DatasetRef to a full DatasetRef. Can be None if a full description of the type is provided along with a universe.
datasetType (DatasetType, optional): If supplied, this is used as the datasetType of the resulting DatasetRef instead of being read from the SerializedDatasetRef. This saves memory when many refs share the same type. Defaults to None.

Returns:

ref (DatasetRef): Newly-constructed object.

static groupByType(refs: Iterable[DatasetRef]) → NamedKeyDict[DatasetType, list[lsst.daf.butler._dataset_ref.DatasetRef]]

Group an iterable of DatasetRef by DatasetType.

Parameters:

refs (Iterable [ DatasetRef ]): DatasetRef instances to group.

Returns:

grouped (NamedKeyDict [ DatasetType, list [ DatasetRef ] ]): Grouped DatasetRef instances.

Notes

When lazy item-iterables are acceptable instead of a full mapping, iter_by_type can in some cases be far more efficient.
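The grouping that groupByType performs can be sketched in plain Python. The ToyRef tuple and group_by_type function below are hypothetical stand-ins; the real method returns a NamedKeyDict keyed by DatasetType objects, not a dict keyed by name strings. This eager version always builds the complete mapping, which is the cost the Notes suggest avoiding when a lazy iterable of (type, refs) pairs would do.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class ToyRef(NamedTuple):
    """Hypothetical stand-in for DatasetRef, keyed by dataset type name."""

    dataset_type: str
    id: int


def group_by_type(refs: Iterable[ToyRef]) -> dict:
    """Eagerly group refs into a mapping of dataset type name to list of refs."""
    grouped = defaultdict(list)
    for ref in refs:
        grouped[ref.dataset_type].append(ref)
    # Return a plain dict so missing keys raise KeyError instead of auto-creating.
    return dict(grouped)


refs = [ToyRef("calexp", 1), ToyRef("src", 2), ToyRef("calexp", 3)]
grouped = group_by_type(refs)
print(grouped["calexp"])  # [ToyRef(dataset_type='calexp', id=1), ToyRef(dataset_type='calexp', id=3)]
```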

isComponent() → bool

Indicate whether this DatasetRef refers to a component.

Returns:

isComponent (bool): True if this DatasetRef is a component, False otherwise.

isComposite() → bool

Boolean indicating whether this DatasetRef is a composite type.

Returns:

isComposite (bool): True if this DatasetRef is a composite type, False otherwise.

is_compatible_with(ref: DatasetRef) → bool

Determine if the given DatasetRef is compatible with this one.

Parameters:

ref (DatasetRef): Dataset ref to check.

Returns:

is_compatible (bool): True if the given ref is either the same as this one, or its dataset type is compatible with this one's and its data ID and dataset ID match.

Notes

Compatibility requires that the data ID and dataset ID match and that the DatasetType is compatible, meaning that the storage class associated with the other ref's dataset type can be converted to this ref's storage class.

Specifically, if the following succeeds:

new_ref = ref.overrideStorageClass(sc)

then it is guaranteed that:

assert ref.is_compatible_with(new_ref) is True

because the Python type associated with the new ref is known to be convertible to the original Python type. The reverse is not guaranteed and depends on whether bidirectional converters have been registered.
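The one-way guarantee described above can be modelled with a toy converter table. In this sketch, which is not the daf_butler implementation, ToyRef, CONVERTERS and can_convert are hypothetical, and the override check is assumed to require that the new storage class convert back to the original, as the Notes imply.

```python
from dataclasses import dataclass, replace

# Hypothetical one-way converter registry of (source, destination) pairs.
# Here "Override" objects can be converted to "Original", but not the reverse.
CONVERTERS = {("Override", "Original")}


def can_convert(source: str, destination: str) -> bool:
    """Return True if Python objects of ``source`` can become ``destination``."""
    return source == destination or (source, destination) in CONVERTERS


@dataclass(frozen=True)
class ToyRef:
    """Hypothetical stand-in for DatasetRef with a storage class name."""

    id: str
    data_id: tuple
    storage_class: str

    def override_storage_class(self, storage_class: str) -> "ToyRef":
        # Only allow overrides whose type can be converted back to ours.
        if not can_convert(storage_class, self.storage_class):
            raise ValueError(f"{storage_class} cannot be converted to {self.storage_class}")
        return replace(self, storage_class=storage_class)

    def is_compatible_with(self, other: "ToyRef") -> bool:
        if self == other:
            return True
        # Same dataset, and the other ref's type converts to this ref's type.
        return (
            self.id == other.id
            and self.data_id == other.data_id
            and can_convert(other.storage_class, self.storage_class)
        )


ref = ToyRef("id-1", (("visit", 903334),), "Original")
new_ref = ref.override_storage_class("Override")
print(ref.is_compatible_with(new_ref), new_ref.is_compatible_with(ref))  # True False
```

Registering the reverse pair ("Original", "Override") in the toy table would make the check succeed in both directions, matching the remark about bidirectional converters.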

static iter_by_type(refs: Iterable[DatasetRef]) → Iterable[tuple[lsst.daf.butler._dataset_type.DatasetType, collections.abc.Iterable[lsst.daf.butler._dataset_ref.DatasetRef]]]

Group an iterable of DatasetRef by DatasetType with special hooks for custom iterables that can do this efficiently.

Parameters:

refs (Iterable [ DatasetRef ]): DatasetRef instances to group. If this satisfies the _DatasetRefGroupedIterable protocol, its _iter_by_dataset_type method will be called.

Returns:

grouped (Iterable [ tuple [ DatasetType, Iterable [ DatasetRef ] ] ]): Grouped DatasetRef instances.
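The dispatch described for the refs parameter can be sketched as a duck-typed check. The hook name _iter_by_dataset_type comes from the text above; its zero-argument signature, the ToyRef and PreGroupedRefs classes, and the fallback grouping strategy are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator, NamedTuple


class ToyRef(NamedTuple):
    """Hypothetical stand-in for DatasetRef."""

    dataset_type: str
    id: int


def iter_by_type(refs: Iterable[ToyRef]) -> Iterable:
    """Return (dataset type name, refs) pairs, delegating to a hook if present."""
    hook = getattr(refs, "_iter_by_dataset_type", None)
    if hook is not None:
        # The container already knows its grouping; no need to regroup.
        return hook()
    # Fallback for ordinary iterables: group eagerly, then hand out the items.
    grouped = defaultdict(list)
    for ref in refs:
        grouped[ref.dataset_type].append(ref)
    return grouped.items()


class PreGroupedRefs:
    """Hypothetical container that stores refs already grouped by type."""

    def __init__(self, groups: dict) -> None:
        self._groups = groups
        self.hook_calls = 0  # counts how often the fast path is used

    def __iter__(self) -> Iterator[ToyRef]:
        for group in self._groups.values():
            yield from group

    def _iter_by_dataset_type(self) -> Iterable:
        self.hook_calls += 1
        return self._groups.items()


plain = [ToyRef("calexp", 1), ToyRef("src", 2), ToyRef("calexp", 3)]
container = PreGroupedRefs(
    {"calexp": [ToyRef("calexp", 1), ToyRef("calexp", 3)], "src": [ToyRef("src", 2)]}
)
```

Both inputs produce the same groups, but the pre-grouped container skips the regrouping pass entirely.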

makeComponentRef(name: str) → DatasetRef

Create a DatasetRef that corresponds to a component.

Parameters:

name (str): Name of the component.

Returns:

ref (DatasetRef): A DatasetRef with a dataset type that corresponds to the given component, and the same ID and run (which may be None, if they are None in self).

makeCompositeRef() → DatasetRef

Create a DatasetRef of the composite from a component ref.

Requires that this DatasetRef is a component.

Returns:

ref (DatasetRef): A DatasetRef with a dataset type that corresponds to the composite parent of this component, and the same ID and run (which may be None, if they are None in self).
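The round trip between makeComponentRef and makeCompositeRef can be pictured with a toy ref that encodes components in the dataset type name as "parent.component" (for example calexp.wcs), the naming convention daf_butler uses for component dataset types. ToyRef and its methods are hypothetical; only the shape of the behaviour (same ID and run, different dataset type) follows the text above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyRef:
    """Hypothetical stand-in for DatasetRef using "parent.component" names."""

    dataset_type_name: str
    id: str
    run: str

    def is_component(self) -> bool:
        return "." in self.dataset_type_name

    def make_component_ref(self, name: str) -> "ToyRef":
        # Same ID and run; only the dataset type name changes.
        return replace(self, dataset_type_name=f"{self.dataset_type_name}.{name}")

    def make_composite_ref(self) -> "ToyRef":
        if not self.is_component():
            raise ValueError(f"{self.dataset_type_name!r} is not a component")
        # Keep everything before the first "." to recover the parent name.
        parent, _ = self.dataset_type_name.split(".", 1)
        return replace(self, dataset_type_name=parent)


composite = ToyRef("calexp", "id-1", "run/a")
wcs = composite.make_component_ref("wcs")
print(wcs.dataset_type_name)  # calexp.wcs
```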

overrideStorageClass(storageClass: str | StorageClass) → DatasetRef

Create a new DatasetRef from this one, but with a modified DatasetType that has a different StorageClass.

Parameters:

storageClass (str or StorageClass): The new storage class.

Returns:

modified (DatasetRef): A new dataset reference that is the same as the current one but with a different storage class in the DatasetType.

replace(*, id, run, storage_class, datastore_records) → DatasetRef

Create a new DatasetRef from this one, but with some modified attributes.

Parameters:

id (DatasetId or None): If not None, update the dataset ID.
run (str or None): If not None, update the run collection name. If id is None, this will also cause a new dataset ID to be generated.
storage_class (str or StorageClass or None): The new storage class. If not None, replaces the existing storage class.
datastore_records (DatasetDatastoreRecords or None): New datastore records. If None, remove all records. By default, datastore records are preserved.

Returns:

modified (DatasetRef): A new dataset reference with updated attributes.
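The interaction between the id, run and datastore_records arguments can be sketched with a frozen dataclass. ToyRef, the _KEEP sentinel, and the use of a random UUID4 for the regenerated ID are assumptions of this sketch; the text only states that a new ID is generated, not how.

```python
import uuid
from dataclasses import dataclass
from dataclasses import replace as dataclass_replace
from typing import Optional

# Hypothetical sentinel meaning "argument not given: keep the existing records".
_KEEP = object()


@dataclass(frozen=True)
class ToyRef:
    """Hypothetical stand-in for DatasetRef."""

    id: uuid.UUID
    run: str
    storage_class: str
    datastore_records: Optional[dict] = None

    def replace(self, *, id=None, run=None, storage_class=None, datastore_records=_KEEP) -> "ToyRef":
        changes = {}
        if id is not None:
            changes["id"] = id
        if run is not None:
            changes["run"] = run
            if id is None:
                # A new run without an explicit ID gets a freshly generated ID.
                changes["id"] = uuid.uuid4()
        if storage_class is not None:
            changes["storage_class"] = storage_class
        if datastore_records is not _KEEP:
            # Passing None explicitly clears the records.
            changes["datastore_records"] = datastore_records
        return dataclass_replace(self, **changes)


ref = ToyRef(uuid.uuid4(), "run/a", "Original", {"path": "a.fits"})
moved = ref.replace(run="run/b")
cleared = ref.replace(datastore_records=None)
```

A sentinel default, instead of None, is what lets "not given" (preserve) be distinguished from an explicit None (remove all records).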

to_json(minimal: bool = False) → str

Convert this object to JSON, assuming that to_simple() returns a pydantic model.

to_simple(minimal: bool = False) → SerializedDatasetRef

Convert this object to a simple Python type.

This makes it suitable for serialization.

Parameters:

minimal (bool, optional): Use minimal serialization. Requires a Registry to convert back to a full type.

Returns:

simple (SerializedDatasetRef): The object converted to a pydantic model suitable for serialization.
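The relationship between to_simple, from_simple and the minimal flag can be sketched with plain dictionaries and the json module. Everything here is hypothetical: the real methods use the SerializedDatasetRef pydantic model and a Registry, whereas ToyRef, to_simple, from_simple and the dict-based registry below are simplified stand-ins.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyRef:
    """Hypothetical stand-in for DatasetRef with JSON-friendly fields."""

    id: str
    dataset_type: str
    run: str
    data_id: dict


def to_simple(ref: ToyRef, minimal: bool = False) -> dict:
    # Minimal form keeps only the ID; everything else must be looked up later.
    if minimal:
        return {"id": ref.id}
    return asdict(ref)


def from_simple(
    simple: dict, registry: Optional[dict] = None, dataset_type: Optional[str] = None
) -> ToyRef:
    if set(simple) == {"id"}:
        # A minimal form cannot be rebuilt without an ID-to-ref lookup.
        if registry is None:
            raise ValueError("A registry is required to expand a minimal ref.")
        return registry[simple["id"]]
    fields = dict(simple)
    if dataset_type is not None:
        # A caller-supplied type wins, mirroring the datasetType argument.
        fields["dataset_type"] = dataset_type
    return ToyRef(**fields)


ref = ToyRef("id-1", "calexp", "run/a", {"visit": 903334, "detector": 42})
text = json.dumps(to_simple(ref))
restored = from_simple(json.loads(text))
```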
