CollectionManager

class lsst.daf.butler.registry.interfaces.CollectionManager(*, registry_schema_version: VersionTuple | None = None)

Bases: Generic[_Key], VersionedExtension

An interface for managing the collections (including runs) in a Registry.

Parameters:
registry_schema_version : VersionTuple or None, optional

Version of registry schema.

Notes

Each layer in a multi-layer Registry has its own record for any collection for which it has datasets (or quanta). Different layers may use different IDs for the same collection, so any use of the IDs obtained through the CollectionManager APIs is strictly internal to the Registry.

Methods Summary

addCollectionForeignKey(tableSpec, *[, ...])

Add a foreign key (field and constraint) referencing the collection table.

addRunForeignKey(tableSpec, *[, prefix, ...])

Add a foreign key (field and constraint) referencing the run table.

clone(db, caching_context)

Make an independent copy of this manager instance bound to a new Database instance.

extend_collection_chain(...)

Add children to the end of a CHAINED collection.

find(name)

Return the collection record associated with the given name.

getCollectionForeignKeyName([prefix])

Return the name of the field added by addCollectionForeignKey if called with the same prefix.

getDocumentation(key)

Retrieve the documentation string for a collection.

getParentChains(key)

Find all CHAINED collection names that directly contain the given collection.

getRunForeignKeyName([prefix])

Return the name of the field added by addRunForeignKey if called with the same prefix.

get_docs(key)

Retrieve the documentation string for multiple collections.

initialize(db, context, *, caching_context)

Construct an instance of the manager.

lookup_name_sql(sql_key, sql_from_clause)

Return a SQLAlchemy column and FROM clause that enable a query to look up a collection name from the key.

prepend_collection_chain(...)

Add children to the beginning of a CHAINED collection.

refresh()

Ensure all other operations on this manager are aware of any collections that may have been registered by other clients since it was initialized or last refreshed.

register(name, type[, doc])

Ensure that a collection of the given name and type is present in the layer this manager is associated with.

remove(name)

Completely remove a collection.

remove_from_collection_chain(...)

Remove children from a CHAINED collection.

resolve_wildcard(wildcard, *[, ...])

Iterate over collection records that match a wildcard.

setDocumentation(key, doc)

Set the documentation string for a collection.

update_chain(parent_collection_name, ...[, ...])

Replace all of the children in a chained collection with a new list.

Methods Documentation

abstract classmethod addCollectionForeignKey(tableSpec: TableSpec, *, prefix: str = 'collection', onDelete: str | None = None, constraint: bool = True, **kwargs: Any) -> FieldSpec

Add a foreign key (field and constraint) referencing the collection table.

Parameters:
tableSpec : ddl.TableSpec

Specification for the table that should reference the collection table. Will be modified in place.

prefix : str, optional

A name to use for the prefix of the new field; the full name may have a suffix (and is given in the returned ddl.FieldSpec).

onDelete : str, optional

One of “CASCADE” or “SET NULL”, indicating what should happen to the referencing row if the collection row is deleted. None indicates that this should be an integrity error.

constraint : bool, optional

If False (True is default), add a field that can be joined to the collection primary key, but do not add a foreign key constraint.

**kwargs

Additional keyword arguments are forwarded to the ddl.FieldSpec constructor (only the name and dtype arguments are otherwise provided).

Returns:
fieldSpec : ddl.FieldSpec

Specification for the field being added.
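The onDelete choices correspond to standard SQL ON DELETE actions. The sketch below illustrates their effect using Python's built-in sqlite3 module. The table names (collection, cascade_ref, set_null_ref, strict_ref) and the function demo_on_delete are hypothetical and are not part of the Registry schema or the CollectionManager API.

```python
import sqlite3


def demo_on_delete() -> dict:
    """Show what happens to a referencing row for each ON DELETE choice."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE collection (id INTEGER PRIMARY KEY, name TEXT)")
    # Hypothetical referencing tables, one per onDelete choice.
    conn.execute(
        "CREATE TABLE cascade_ref (collection_id INTEGER "
        "REFERENCES collection (id) ON DELETE CASCADE)"
    )
    conn.execute(
        "CREATE TABLE set_null_ref (collection_id INTEGER "
        "REFERENCES collection (id) ON DELETE SET NULL)"
    )
    # No ON DELETE clause: the analogue of onDelete=None.
    conn.execute(
        "CREATE TABLE strict_ref (collection_id INTEGER "
        "REFERENCES collection (id))"
    )
    conn.execute("INSERT INTO collection VALUES (1, 'a'), (2, 'b')")
    conn.execute("INSERT INTO cascade_ref VALUES (1)")
    conn.execute("INSERT INTO set_null_ref VALUES (1)")
    conn.execute("INSERT INTO strict_ref VALUES (2)")

    # Deleting collection 1 removes the CASCADE row and nulls the SET NULL row.
    conn.execute("DELETE FROM collection WHERE id = 1")
    result = {
        "cascade_rows": conn.execute(
            "SELECT COUNT(*) FROM cascade_ref"
        ).fetchone()[0],
        "set_null_value": conn.execute(
            "SELECT collection_id FROM set_null_ref"
        ).fetchone()[0],
    }
    # Deleting collection 2 is rejected because strict_ref still points at it.
    try:
        conn.execute("DELETE FROM collection WHERE id = 2")
        result["strict_error"] = False
    except sqlite3.IntegrityError:
        result["strict_error"] = True
    conn.close()
    return result
```

Calling demo_on_delete() returns a dictionary reporting that the CASCADE row is gone, the SET NULL column holds None, and the unconstrained delete raised an integrity error.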

abstract classmethod addRunForeignKey(tableSpec: TableSpec, *, prefix: str = 'run', onDelete: str | None = None, constraint: bool = True, **kwargs: Any) -> FieldSpec

Add a foreign key (field and constraint) referencing the run table.

Parameters:
tableSpec : ddl.TableSpec

Specification for the table that should reference the run table. Will be modified in place.

prefix : str, optional

A name to use for the prefix of the new field; the full name may have a suffix (and is given in the returned ddl.FieldSpec).

onDelete : str, optional

One of “CASCADE” or “SET NULL”, indicating what should happen to the referencing row if the run row is deleted. None indicates that this should be an integrity error.

constraint : bool, optional

If False (True is default), add a field that can be joined to the run primary key, but do not add a foreign key constraint.

**kwargs

Additional keyword arguments are forwarded to the ddl.FieldSpec constructor (only the name and dtype arguments are otherwise provided).

Returns:
fieldSpec : ddl.FieldSpec

Specification for the field being added.

abstract clone(db: Database, caching_context: CachingContext) -> Self

Make an independent copy of this manager instance bound to a new Database instance.

Parameters:
db : Database

New Database object to use when instantiating the manager.

caching_context : CachingContext

New CachingContext object to use when instantiating the manager.

Returns:
instance : CollectionManager

New manager instance with the same configuration as this instance, but bound to a new Database object.

abstract extend_collection_chain(parent_collection_name: str, child_collection_names: list[str]) -> None

Add children to the end of a CHAINED collection.

If any of the children already existed in the chain, they will be moved to the new position at the end of the chain.

Parameters:
parent_collection_name : str

The name of a CHAINED collection to which we will add new children.

child_collection_names : list [ str ]

A child collection name or list of child collection names to be added to the parent.

Raises:
MissingCollectionError

If any of the specified collections do not exist.

CollectionTypeError

If the parent collection is not a CHAINED collection.

CollectionCycleError

If this operation would create a collection cycle.

Notes

If this function is called within a call to Butler.transaction, it will hold a lock that prevents other processes from modifying the parent collection until the end of the transaction. Keep these transactions short.
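The "moved to the new position at the end" rule can be illustrated on a plain list. This is a minimal sketch, not the manager's implementation: extend_chain is a hypothetical helper operating on in-memory lists, and dropping repeated names within new_children is an assumption made here for tidiness.

```python
def extend_chain(children: list, new_children: list) -> list:
    """Return a new child list with ``new_children`` placed at the end.

    Names already present in ``children`` are moved rather than duplicated.
    """
    # Assumption: repeated names in the new list are collapsed, keeping order.
    new_unique = list(dict.fromkeys(new_children))
    moved = set(new_unique)
    kept = [name for name in children if name not in moved]
    return kept + new_unique
```

For example, extending ["a", "b", "c"] with ["b", "d"] moves "b" behind "c" and appends "d".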

abstract find(name: str) -> CollectionRecord[_Key]

Return the collection record associated with the given name.

Parameters:
name : str

Name of the collection.

Returns:
record : CollectionRecord

Object representing the collection, including its type and ID. If record.type is CollectionType.RUN, this will be a RunRecord instance. If record.type is CollectionType.CHAINED, this will be a ChainedCollectionRecord instance.

Raises:
MissingCollectionError

Raised if the given collection does not exist.

Notes

Collections registered by another client of the same layer since the last call to initialize or refresh may not be found.

abstract classmethod getCollectionForeignKeyName(prefix: str = 'collection') -> str

Return the name of the field added by addCollectionForeignKey if called with the same prefix.

Parameters:
prefix : str

A name to use for the prefix of the new field; the full name may have a suffix.

Returns:
name : str

The field name.

abstract getDocumentation(key: _Key) -> str | None

Retrieve the documentation string for a collection.

Parameters:
key : _Key

Internal primary key value for the collection.

Returns:
docs : str or None

Docstring for the collection with the given key.

abstract getParentChains(key: _Key) -> set[str]

Find all CHAINED collection names that directly contain the given collection.

Parameters:
key : _Key

Internal primary key value for the collection.

Returns:
names : set [ str ]

Parent collection names.
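"Directly contain" means only immediate parents are reported, not grandparents. A small sketch on a hypothetical in-memory mapping from chain name to child names (the real method takes an internal key, not a name):

```python
def parent_chains(chains: dict, child: str) -> set:
    """Return names of chains that list ``child`` as an immediate member."""
    return {parent for parent, children in chains.items() if child in children}
```

With chains = {"top": ["inner"], "inner": ["run"]}, the parents of "run" are {"inner"} only; "top" contains "run" indirectly and is not included.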

abstract classmethod getRunForeignKeyName(prefix: str = 'run') -> str

Return the name of the field added by addRunForeignKey if called with the same prefix.

Parameters:
prefix : str

A name to use for the prefix of the new field; the full name may have a suffix.

Returns:
name : str

The field name.

abstract get_docs(key: Iterable[_Key]) -> Mapping[_Key, str]

Retrieve the documentation string for multiple collections.

Parameters:
key : Iterable [ _Key ]

Internal primary key values for the collections.

Returns:
docs : Mapping [ _Key, str ]

Documentation strings indexed by collection key. Only collections with non-empty documentation strings are returned.
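The return contract (keys with missing or empty documentation are omitted) can be sketched as a simple filter. The function filter_docs and the all_docs mapping are hypothetical stand-ins for whatever storage a concrete manager uses.

```python
from collections.abc import Iterable, Mapping
from typing import Optional


def filter_docs(all_docs: Mapping[int, Optional[str]], keys: Iterable[int]) -> dict:
    """Return docs for ``keys``, skipping unknown keys and empty or None docs."""
    return {key: all_docs[key] for key in keys if all_docs.get(key)}
```

Callers should therefore use docs.get(key) rather than docs[key] when a collection may have no documentation.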

abstract classmethod initialize(db: Database, context: StaticTablesContext, *, caching_context: CachingContext, registry_schema_version: VersionTuple | None = None) -> CollectionManager

Construct an instance of the manager.

Parameters:
db : Database

Interface to the underlying database engine and namespace.

context : StaticTablesContext

Context object obtained from Database.declareStaticTables; used to declare any tables that should always be present in a layer implemented with this manager.

caching_context : CachingContext

Object controlling caching of information returned by managers.

registry_schema_version : VersionTuple or None

Schema version of this extension as defined in registry.

Returns:
manager : CollectionManager

An instance of a concrete CollectionManager subclass.

lookup_name_sql(sql_key: ColumnElement[_Key], sql_from_clause: FromClause) -> tuple[sqlalchemy.sql.elements.ColumnElement[str], sqlalchemy.sql.selectable.FromClause]

Return a SQLAlchemy column and FROM clause that enable a query to look up a collection name from the key.

Parameters:
sql_key : sqlalchemy.ColumnElement

SQL column expression that evaluates to the collection key.

sql_from_clause : sqlalchemy.FromClause

SQL FROM clause from which sql_key was obtained.

Returns:
sql_name : sqlalchemy.ColumnElement [ str ]

SQL column expression that evaluates to the collection name.

sql_from_clause : sqlalchemy.FromClause

SQL FROM clause that includes the given sql_from_clause and any table needed to provide sql_name.

abstract prepend_collection_chain(parent_collection_name: str, child_collection_names: list[str]) -> None

Add children to the beginning of a CHAINED collection.

If any of the children already existed in the chain, they will be moved to the new position at the beginning of the chain.

Parameters:
parent_collection_name : str

The name of a CHAINED collection to which we will add new children.

child_collection_names : list [ str ]

A child collection name or list of child collection names to be added to the parent.

Raises:
MissingCollectionError

If any of the specified collections do not exist.

CollectionTypeError

If the parent collection is not a CHAINED collection.

CollectionCycleError

If this operation would create a collection cycle.

Notes

If this function is called within a call to Butler.transaction, it will hold a lock that prevents other processes from modifying the parent collection until the end of the transaction. Keep these transactions short.
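As a rough illustration of the reordering rule for prepending, the sketch below works on a plain list. prepend_chain is a hypothetical helper, not the manager's implementation, and collapsing repeated names within new_children is an assumption.

```python
def prepend_chain(children: list, new_children: list) -> list:
    """Return a new child list with ``new_children`` placed at the front.

    Names already present in ``children`` are moved rather than duplicated.
    """
    # Assumption: repeated names in the new list are collapsed, keeping order.
    new_unique = list(dict.fromkeys(new_children))
    moved = set(new_unique)
    return new_unique + [name for name in children if name not in moved]
```

Prepending ["c", "d"] to ["a", "b", "c"] yields ["c", "d", "a", "b"]: "c" moves to the front, so it is now searched first.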

abstract refresh() -> None

Ensure all other operations on this manager are aware of any collections that may have been registered by other clients since it was initialized or last refreshed.
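To show why refresh matters, here is a toy manager that answers find from a snapshot taken at construction time. Everything in it is hypothetical: CachedManager, the dictionary standing in for the database, and the local MissingCollectionError class are illustrative only, and a concrete manager may cache differently or not at all.

```python
class MissingCollectionError(LookupError):
    """Hypothetical stand-in for the exception raised by find."""


class CachedManager:
    """Toy manager that serves lookups from a local snapshot."""

    def __init__(self, database: dict) -> None:
        self._database = database      # shared state, stands in for SQL tables
        self._cache = dict(database)   # snapshot taken at initialization

    def find(self, name: str) -> int:
        try:
            return self._cache[name]
        except KeyError:
            raise MissingCollectionError(name) from None

    def refresh(self) -> None:
        # Reload the snapshot so later lookups see other clients' changes.
        self._cache = dict(self._database)
```

A collection added to the shared dictionary by "another client" is invisible to find until refresh is called.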

abstract register(name: str, type: CollectionType, doc: str | None = None) -> tuple[lsst.daf.butler.registry.interfaces._collections.CollectionRecord[_Key], bool]

Ensure that a collection of the given name and type is present in the layer this manager is associated with.

Parameters:
name : str

Name of the collection.

type : CollectionType

Enumeration value indicating the type of collection.

doc : str, optional

Documentation string for the collection. Ignored if the collection already exists.

Returns:
record : CollectionRecord

Object representing the collection, including its type and ID. If type is CollectionType.RUN, this will be a RunRecord instance. If type is CollectionType.CHAINED, this will be a ChainedCollectionRecord instance.

registered : bool

True if the collection was registered, False if it already existed.

Raises:
TransactionInterruption

Raised if this operation is invoked within a Database.transaction context.

DatabaseConflictError

Raised if a collection with this name but a different type already exists.

Notes

Concurrent registrations of the same collection should be safe; nothing should happen if the types are consistent, and integrity errors due to inconsistent types should happen before any database changes are made.
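The idempotent contract (return the existing record with registered=False, or fail on a type mismatch) can be sketched with an in-memory dictionary. The CollectionType enum below is a local mirror of the four members named in this page, and ConflictError and register_collection are hypothetical names, not the real classes.

```python
from enum import Enum, auto


class CollectionType(Enum):
    """Local mirror of the collection types, for illustration only."""

    RUN = auto()
    TAGGED = auto()
    CHAINED = auto()
    CALIBRATION = auto()


class ConflictError(RuntimeError):
    """Hypothetical stand-in for DatabaseConflictError."""


def register_collection(store: dict, name: str, type_: CollectionType) -> tuple:
    """Return ``(type, registered)``, inserting ``name`` only if it is new."""
    existing = store.get(name)
    if existing is None:
        store[name] = type_
        return type_, True
    if existing is not type_:
        # Same name, different type: report the conflict, change nothing.
        raise ConflictError(f"{name!r} already exists with type {existing.name}")
    return existing, False
```

Registering the same name and type twice returns True and then False; registering it again with a different type raises the conflict error and leaves the store unchanged.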

abstract remove(name: str) -> None

Completely remove a collection.

Any existing CollectionRecord objects that correspond to the removed collection are considered invalidated.

Parameters:
name : str

Name of the collection to remove.

Notes

If this collection is referenced by foreign keys in tables managed by other objects, the ON DELETE clauses of those tables will be invoked. That will frequently delete many dependent rows automatically (via “CASCADE”), but it may also cause this operation to fail (with rollback) unless dependent rows that do not have an ON DELETE clause are removed first.

abstract remove_from_collection_chain(parent_collection_name: str, child_collection_names: list[str]) -> None

Remove children from a CHAINED collection.

Parameters:
parent_collection_name : str

The name of a CHAINED collection from which we will remove children.

child_collection_names : list [ str ]

A child collection name or list of child collection names to be removed from the parent.

Raises:
MissingCollectionError

If any of the specified collections do not exist.

CollectionTypeError

If the parent collection is not a CHAINED collection.

Notes

If this function is called within a call to Butler.transaction, it will hold a lock that prevents other processes from modifying the parent collection until the end of the transaction. Keep these transactions short.

abstract resolve_wildcard(wildcard: CollectionWildcard, *, collection_types: Set[CollectionType] = frozenset({CollectionType.RUN, CollectionType.TAGGED, CollectionType.CHAINED, CollectionType.CALIBRATION}), flatten_chains: bool = True, include_chains: bool | None = None) -> list[lsst.daf.butler.registry.interfaces._collections.CollectionRecord[_Key]]

Iterate over collection records that match a wildcard.

Parameters:
wildcard : CollectionWildcard

Names and/or patterns for collections.

collection_types : collections.abc.Set [ CollectionType ], optional

If provided, only yield collections of these types.

flatten_chains : bool, optional

If True (default) recursively yield the child collections of CHAINED collections.

include_chains : bool, optional

If True, return records for CHAINED collections themselves. The default is the opposite of flatten_chains: either return records for CHAINED collections or their children, but not both.

Returns:
records : list [ CollectionRecord ]

Matching collection records.
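The interaction between flatten_chains and include_chains is easier to see in a toy model. The sketch below resolves plain names against a hypothetical dictionary mapping chain names to their children; it ignores patterns and collection_types, and skipping names that were already emitted is an assumption rather than documented behavior.

```python
from typing import Optional


def resolve(
    names: list,
    chains: dict,
    *,
    flatten_chains: bool = True,
    include_chains: Optional[bool] = None,
) -> list:
    """Expand ``names`` depth-first; any name missing from ``chains`` is a leaf."""
    if include_chains is None:
        # Documented default: the opposite of flatten_chains.
        include_chains = not flatten_chains
    result = []

    def visit(name: str) -> None:
        if name in result:
            return  # assumption: each collection is reported once
        children = chains.get(name)
        if children is None:
            result.append(name)
            return
        if include_chains:
            result.append(name)
        if flatten_chains:
            for child in children:
                visit(child)

    for name in names:
        visit(name)
    return result
```

With chains = {"top": ["run1", "inner"], "inner": ["run2"]}, the defaults yield only the leaf collections, flatten_chains=False yields only "top", and include_chains=True with flattening yields chains and leaves interleaved in search order.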

abstract setDocumentation(key: _Key, doc: str | None) -> None

Set the documentation string for a collection.

Parameters:
key : _Key

Internal primary key value for the collection.

doc : str, optional

Docstring for the collection with the given key.

abstract update_chain(parent_collection_name: str, child_collection_names: list[str], allow_use_in_caching_context: bool = False) -> None

Replace all of the children in a chained collection with a new list.

Parameters:
parent_collection_name : str

The name of a CHAINED collection to be modified.

child_collection_names : list [ str ]

A child collection name or list of child collection names to be assigned to the parent.

allow_use_in_caching_context : bool, optional

If True, skip a check that would otherwise disallow this function from being called inside an active caching context. This option exists only for legacy use and will eventually be removed.

Raises:
MissingCollectionError

If any of the specified collections do not exist.

CollectionTypeError

If the parent collection is not a CHAINED collection.

CollectionCycleError

If this operation would create a collection cycle.

Notes

If this function is called within a call to Butler.transaction, it will hold a lock that prevents other processes from modifying the parent collection until the end of the transaction. Keep these transactions short.
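A collection cycle arises when the parent would become reachable from one of its own new children. One way such a check could work is a graph walk over the existing chain definitions, sketched below. would_create_cycle and the chains dictionary are hypothetical, and this is not necessarily how a concrete manager detects cycles.

```python
def would_create_cycle(chains: dict, parent: str, new_children: list) -> bool:
    """Return True if ``parent`` is reachable from any of ``new_children``."""
    stack = list(new_children)
    seen = set()
    while stack:
        name = stack.pop()
        if name == parent:
            return True
        if name in seen:
            continue
        seen.add(name)
        # Names without an entry in ``chains`` are leaves with no children.
        stack.extend(chains.get(name, []))
    return False
```

Given chains = {"a": ["b"], "b": ["c"]}, making "a" a child of "c" would close the loop a → b → c → a, so the check returns True, whereas making "c" a child of "a" is harmless.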