DatastoreCacheManager

class lsst.daf.butler.DatastoreCacheManager(config: Union[str, DatastoreCacheManagerConfig], universe: DimensionUniverse)

Bases: lsst.daf.butler.AbstractDatastoreCacheManager

A class for managing caching in a Datastore using local files.

Parameters:
config : str or DatastoreCacheManagerConfig

Configuration to control caching.

universe : DimensionUniverse

Set of all known dimensions, used to expand and validate any used in lookup keys.

Notes

Two environment variables can be used to override the cache directory and expiration configuration:

  • $DAF_BUTLER_CACHE_DIRECTORY
  • $DAF_BUTLER_CACHE_EXPIRATION_MODE

The expiration mode should take the form mode=threshold. For example, to limit the cache directory to 5 datasets, set the value to datasets=5.
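As a rough illustration of the mode=threshold form, the sketch below sets both environment variables and splits the expiration value into its two parts. The helper parse_expiration_mode and the directory path are hypothetical. The sketch assumes an integer threshold and does not reproduce how DatastoreCacheManager itself reads these variables.

```python
import os
from typing import Tuple


def parse_expiration_mode(value: str) -> Tuple[str, int]:
    """Split a ``mode=threshold`` string into its mode and integer threshold.

    Hypothetical helper for illustration only; not part of lsst.daf.butler.
    """
    mode, sep, threshold = value.partition("=")
    if not sep or not mode or not threshold:
        raise ValueError(f"Expected 'mode=threshold', got {value!r}")
    return mode, int(threshold)


# Example values; the directory path is a placeholder.
os.environ["DAF_BUTLER_CACHE_DIRECTORY"] = "/tmp/butler_cache"
os.environ["DAF_BUTLER_CACHE_EXPIRATION_MODE"] = "datasets=5"

mode, threshold = parse_expiration_mode(os.environ["DAF_BUTLER_CACHE_EXPIRATION_MODE"])
```

In practice the variables would normally be exported in the shell before the process starts; setting them through os.environ here only keeps the example self-contained.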

Attributes Summary

cache_directory
cache_size Size of the cache in bytes.
file_count Number of cached files tracked by the registry.

Methods Summary

find_in_cache(ref, extension) Look for a dataset in the cache and return its location.
move_to_cache(uri, ref) Move a file to the cache.
remove_from_cache(refs, …) Remove the specified datasets from the cache.
scan_cache() Scan the cache directory and record information about files.
should_be_cached(entity, DatasetType, …) Indicate whether the entity should be added to the cache.

Attributes Documentation

cache_directory
cache_size

Size of the cache in bytes.

file_count

Number of cached files tracked by the registry.

Methods Documentation

find_in_cache(ref: lsst.daf.butler.core.datasets.ref.DatasetRef, extension: str) → Iterator[Optional[lsst.daf.butler.core._butlerUri._butlerUri.ButlerURI]]

Look for a dataset in the cache and return its location.

Parameters:
ref : DatasetRef

Dataset to locate in the cache.

extension : str

File extension expected. Should include the leading “.”.

Yields:
uri : ButlerURI or None

The URI to the cached file, or None if the file has not been cached.

Notes

This method should be used as a context manager, which prevents the file from being removed from the cache for the duration of that context.
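The context-manager pattern described in the note can be sketched with a toy in-memory cache. ToyCache, its add() and expire() methods, and the "in use" set are all hypothetical stand-ins. How the real class protects a file is not stated here, so this only models the observable behavior: a file located inside the context survives expiry until the context exits, and a missing file yields None.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, Optional, Set


class ToyCache:
    """Toy in-memory stand-in for a file cache (not the real class)."""

    def __init__(self) -> None:
        self._files: Dict[str, str] = {}  # file name -> cached location
        self._in_use: Set[str] = set()    # names protected from expiry

    def add(self, name: str, location: str) -> None:
        self._files[name] = location

    def __contains__(self, name: str) -> bool:
        return name in self._files

    @contextmanager
    def find_in_cache(self, key: str, extension: str) -> Iterator[Optional[str]]:
        name = key + extension
        location = self._files.get(name)
        if location is None:
            yield None  # not cached: nothing to protect
            return
        self._in_use.add(name)
        try:
            yield location
        finally:
            self._in_use.discard(name)  # release protection on exit

    def expire(self) -> None:
        """Drop every cached entry that is not currently in use."""
        for name in list(self._files):
            if name not in self._in_use:
                del self._files[name]


cache = ToyCache()
cache.add("calexp_1.fits", "/cache/calexp_1.fits")

with cache.find_in_cache("calexp_1", ".fits") as uri:
    cache.expire()  # expiry runs while the file is still being read
    kept_during_context = "calexp_1.fits" in cache

cache.expire()  # the context has exited, so the entry may now be removed
kept_after_context = "calexp_1.fits" in cache

with cache.find_in_cache("missing", ".fits") as uri_missing:
    pass
```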

move_to_cache(uri: lsst.daf.butler.core._butlerUri._butlerUri.ButlerURI, ref: lsst.daf.butler.core.datasets.ref.DatasetRef) → Optional[lsst.daf.butler.core._butlerUri._butlerUri.ButlerURI]

Move a file to the cache.

Move the given file into the cache, using the supplied DatasetRef for naming. A call is made to should_be_cached() and if the DatasetRef should not be accepted None will be returned.

Cache expiry can occur during this call.

Parameters:
uri : ButlerURI

Location of the file to be relocated to the cache. Will be moved.

ref : DatasetRef

Ref associated with this file. Will be used to determine the name of the file within the cache.

Returns:
new : ButlerURI or None

URI to the file within the cache, or None if the dataset was not accepted by the cache.
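The accept-or-return-None flow can be sketched with plain files. The function move_to_cache_sketch, its parameters, and the acceptance predicate are hypothetical; pathlib paths stand in for ButlerURI and a string stands in for the name derived from the DatasetRef. In this sketch a rejected file is left where it was, which is an assumption rather than documented behavior.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional


def move_to_cache_sketch(
    src: Path,
    cache_dir: Path,
    cached_name: str,
    should_be_cached: Callable[[str], bool],
) -> Optional[Path]:
    """Move ``src`` into ``cache_dir`` if accepted, else return None."""
    if not should_be_cached(cached_name):
        return None  # rejected: this sketch leaves the source untouched
    cache_dir.mkdir(parents=True, exist_ok=True)
    destination = cache_dir / (cached_name + src.suffix)
    shutil.move(str(src), str(destination))  # the source no longer exists afterwards
    return destination


def accept_calexp(name: str) -> bool:
    """Hypothetical acceptance rule: cache only names starting with 'calexp'."""
    return name.startswith("calexp")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    source = root / "download.fits"
    source.write_text("pixels")
    accepted = move_to_cache_sketch(source, root / "cache", "calexp_1", accept_calexp)
    accepted_name = accepted.name if accepted is not None else None
    accepted_exists = accepted is not None and accepted.exists()
    source_gone = not source.exists()

    other = root / "other.fits"
    other.write_text("more pixels")
    rejected = move_to_cache_sketch(other, root / "cache", "raw_1", accept_calexp)
    other_kept = other.exists()
```

Because the file is moved rather than copied, callers should use the returned location, not the original one, after a successful call.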

remove_from_cache(refs: Union[lsst.daf.butler.core.datasets.ref.DatasetRef, Iterable[lsst.daf.butler.core.datasets.ref.DatasetRef]]) → None

Remove the specified datasets from the cache.

It is not an error for these datasets to be missing from the cache.

Parameters:
refs : DatasetRef or iterable of DatasetRef

The datasets to remove from the cache.
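A minimal sketch of the two behaviors described above: accepting either a single item or an iterable, and silently ignoring items that are not cached. Strings stand in for DatasetRef and a dict stands in for the cache; remove_from_cache_sketch is a hypothetical name.

```python
from typing import Dict, Iterable, Union


def remove_from_cache_sketch(cache: Dict[str, str], refs: Union[str, Iterable[str]]) -> None:
    """Remove entries from ``cache``; missing entries are not an error."""
    if isinstance(refs, str):
        refs = [refs]  # normalize a single item to a one-element list
    for ref in refs:
        cache.pop(ref, None)  # a default of None avoids KeyError for absent keys


toy_cache = {"a": "/cache/a.fits", "b": "/cache/b.fits", "c": "/cache/c.fits"}
remove_from_cache_sketch(toy_cache, "a")               # single item
remove_from_cache_sketch(toy_cache, ["b", "missing"])  # iterable with an uncached item
```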

scan_cache() → None

Scan the cache directory and record information about files.

should_be_cached(entity: Union[DatasetRef, DatasetType, StorageClass]) → bool

Indicate whether the entity should be added to the cache.

This is relevant when reading or writing.

Parameters:
entity : StorageClass or DatasetType or DatasetRef

Entity to test against the configuration. The name property is used to determine a match. For a DatasetType, its own name is checked first, followed by the name of its StorageClass. If nothing matches, the default is returned.

Returns:
should_cache : bool

Returns True if the dataset should be cached; False otherwise.
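The lookup order described for the entity parameter can be sketched as follows. The toy dataclasses, the rules mapping, and should_be_cached_sketch are hypothetical and only mirror the order stated above (own name, then StorageClass name, then default). How a DatasetRef is resolved is not covered here, so the sketch omits it.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class ToyStorageClass:
    name: str


@dataclass(frozen=True)
class ToyDatasetType:
    name: str
    storageClass: ToyStorageClass


def should_be_cached_sketch(
    entity: Union[ToyDatasetType, ToyStorageClass],
    rules: Dict[str, bool],
    default: bool,
) -> bool:
    """Return the first matching rule: own name, then storage class name, then default."""
    if entity.name in rules:
        return rules[entity.name]
    storage_class = getattr(entity, "storageClass", None)
    if storage_class is not None and storage_class.name in rules:
        return rules[storage_class.name]
    return default


exposure = ToyStorageClass("ExposureF")
rules = {"calexp": False, "ExposureF": True}

by_own_name = should_be_cached_sketch(ToyDatasetType("calexp", exposure), rules, default=True)
by_storage_class = should_be_cached_sketch(ToyDatasetType("deepCoadd", exposure), rules, default=False)
by_default = should_be_cached_sketch(ToyStorageClass("Catalog"), rules, default=False)
```

Here the "calexp" rule wins over the "ExposureF" rule for the first entity because the dataset type name is consulted before the storage class name.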