Datastore is configured in the
datastore section of the top-level Butler YAML configuration.
The only mandatory entry in the datastore configuration is the
This specifies the fully qualified class name of the Python class implementing the datastore.
The default Butler configuration uses the
All other keys depend on the specific datastore class that is selected.
The default configuration values can be inspected at
$DAF_BUTLER_DIR/python/lsst/daf/butler/configs (they can be accessed directly as Python package resources) and current values can be obtained by calling
butler config-dump on a Butler repository.
The supported datastores are:
File-Based Datastores (local POSIX along with remote datastores such as S3)
There is a single file-based datastore (
FileDatastore) that handles both local POSIX file systems and remote object stores.
This datastore uses formatters to read datasets from files and write datasets to files.
Data access is entirely mediated by the URI used to specify the datastore root and currently supports S3 and WebDAV in addition to local files.
If an absolute URI is stored directly in the datastore it can use a different URI scheme from that used to locate the root of the datastore.
The supported configurations are:
The location of the “root” of the datastore “file system”. Usually the default value of
<butlerRoot>/datastore can be left unchanged. Here
<butlerRoot> is a magic value that is replaced either with the location of the Butler configuration file or the top-level
root as set in that
This section defines the name of the registry table that should be used to hold details about datasets stored in the datastore (such as the path within the datastore and the associated formatter). This only needs to be set if multiple datastores are to be used simultaneously within one Butler repository, since the table names must not clash.
A Boolean to define whether an attempt should be made to initialize the datastore by creating the directory. Defaults to
True, and that default should normally not be changed.
The template to use to construct “files” within the datastore. The template uses data dimensions to do this. Generally the default setting will be usable although it can be tuned per
StorageClass or data ID. Changes to this template only apply to new datasets since the datastore remembers the names associated with previous datasets. Templates are formatted as
StorageClass or data ID to a specific formatter class that understands the associated Python type and will serialize it to a file artifact. The formatters section also supports the definitions of write recipes (bulk configurations that can be selected for specific formatters) and write parameters (parameters that control how the dataset is serialized; note it is required that all serialized artifacts be readable by a formatter without knowing which write parameters were used). Once a formatter is associated with a particular dataset it is permanently associated with that dataset even if the configuration is later modified to specify a different formatter.
StorageClass or data ID that will be accepted or rejected by this datastore.
Controls whether composite datasets are disassembled by the datastore. By default composites are not disassembled. Disassembly can be controlled by
StorageClass or data ID.
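The `<butlerRoot>` substitution described for the datastore root can be illustrated with a minimal sketch. This is not the Butler implementation: `expand_root` and `BUTLER_ROOT_TAG` are hypothetical names, and the repository locations are made-up values chosen only to show a local path and an S3 URI.

```python
# Hypothetical helper, not part of lsst.daf.butler.
BUTLER_ROOT_TAG = "<butlerRoot>"


def expand_root(root_value: str, butler_root: str) -> str:
    """Replace a leading <butlerRoot> placeholder with a concrete location.

    Values that do not start with the placeholder are returned unchanged.
    """
    if root_value.startswith(BUTLER_ROOT_TAG):
        suffix = root_value[len(BUTLER_ROOT_TAG):]
        # Avoid a doubled separator when the repository root ends in "/".
        return butler_root.rstrip("/") + suffix
    return root_value


# Made-up repository locations for illustration only.
local_root = expand_root("<butlerRoot>/datastore", "/repo/main")
remote_root = expand_root("<butlerRoot>/datastore", "s3://bucket/repo/")
unchanged = expand_root("/data/elsewhere", "/repo/main")
```

Because the placeholder is resolved relative to wherever the repository lives, the same default value works for a local repository and for one rooted at a remote URI.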
Templates, formatters, constraints, and composites all use a standard look up priority. The order is:
If there is an
instrument in the data ID the first look up will be for a key that matches
instrument<INSTRUMENT_NAME>. If there is a match the items within that part of the hierarchy will be matched in preference to those at the top-level.
The highest priority is then the
DatasetType corresponds to a component of a composite, the composite name will then be checked.
If there is still no match the dimensions will be used. Dimensions are specified by the presence of a
+as a separator. For example
instrument+physical_filter+visit would match any
DatasetType that uses those three dimensions.
The final match is against the
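The look up order can be sketched in plain Python. This is an illustrative model only, not the Butler code: the function `lookup`, the example dataset type names, the instrument name, and the template strings are all hypothetical. It assumes the first key tried is the dataset type name, and that dimension keys are built by joining sorted dimension names with `+`. The final fallback level in the list above is not modelled.

```python
from typing import Any, Mapping, Optional, Sequence


def lookup(
    config: Mapping[str, Any],
    dataset_type: str,
    dimensions: Sequence[str],
    instrument: Optional[str] = None,
    composite: Optional[str] = None,
) -> Optional[Any]:
    """Return the first matching entry, or None if nothing matches."""
    # Keys tried within each section, highest priority first.
    keys = [dataset_type]
    if composite is not None:
        keys.append(composite)  # parent composite of a component
    keys.append("+".join(sorted(dimensions)))  # assumed key construction

    # An instrument-specific section, if present, is searched before the top level.
    sections = []
    if instrument is not None:
        section = config.get(f"instrument<{instrument}>")
        if isinstance(section, Mapping):
            sections.append(section)
    sections.append(config)

    for section in sections:
        for key in keys:
            if key in section:
                return section[key]
    return None


# Hypothetical configuration values for illustration.
templates = {
    "calexp": "top/calexp",
    "instrument+physical_filter+visit": "top/by-dimensions",
    "instrument<HSC>": {"calexp": "hsc/calexp"},
}

by_instrument = lookup(templates, "calexp", ["instrument", "visit"], instrument="HSC")
by_name = lookup(templates, "calexp", ["instrument", "visit"], instrument="LATISS")
by_composite = lookup(templates, "calexp.wcs", ["instrument", "visit"], composite="calexp")
by_dims = lookup(templates, "summary", ["visit", "instrument", "physical_filter"])
no_match = lookup(templates, "other", ["tract"])
```

In this sketch an instrument-specific entry wins over a top-level entry with the same key, and a dimension-based key is only consulted when neither the dataset type name nor its composite name matches.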
Datastore to Datastore Transfers
transfer_from() is called between two
FileDatastore datastores, there is an optimized code path that makes it far more efficient than doing a butler export followed by an import or ingest.
If the source datastore uses absolute URIs for some datasets, whether those datasets are copied/linked into the target datastore or left as direct URIs depends on the value of the
If the default transfer mode of
"auto" is used the direct URI will be stored in the target identically to that used in the source datastore.
This can be useful if you are doing a local transfer and know that the original location will always be available.
Any other transfer mode, such as
"copy", will result in the target datastore taking ownership of the file.
InMemoryDatastore currently only supports the
This allows the datastore to accept specific dataset types.
In the future more features will be added to allow some form of cache expiry.
ChainedDatastore datastore enables multiple other datastores to be combined into one.
A dataset put to the chain is sent to every datastore in the chain, and success is reported if any of the datastores accepts the dataset.
When a dataset is retrieved each datastore is asked for the dataset in turn and the first match is sufficient.
This allows an in-memory datastore to be combined with a file-based datastore to enable simple in-memory retrieval for a dataset that has been persisted to disk.
A file-based datastore can be turned into a chained datastore after the fact, for example by adding an in-memory caching datastore.
The only constraint is that all the datasets in registry are associated with at least one of the datastores in the chain.
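The put and get behavior of a chain can be illustrated with a toy model. The classes `DictStore` and `ChainSketch` are hypothetical stand-ins written for this example and do not reflect the real `ChainedDatastore` interface; each toy store simply accepts a fixed set of dataset type names.

```python
class DictStore:
    """Toy datastore that accepts only the dataset types it is configured for."""

    def __init__(self, name, accepted_types):
        self.name = name
        self.accepted_types = set(accepted_types)
        self.data = {}

    def put(self, dataset_type, key, obj):
        """Store the object and return True, or return False if rejected."""
        if dataset_type not in self.accepted_types:
            return False
        self.data[key] = obj
        return True

    def get(self, key):
        return self.data[key]  # raises KeyError when the dataset is absent


class ChainSketch:
    """Toy chain: put goes to every store, get returns the first match."""

    def __init__(self, stores):
        self.stores = list(stores)

    def put(self, dataset_type, key, obj):
        # Offer the dataset to every store; succeed if at least one accepts it.
        accepted = [s.name for s in self.stores if s.put(dataset_type, key, obj)]
        if not accepted:
            raise RuntimeError(f"no datastore in the chain accepted {key!r}")
        return accepted

    def get(self, key):
        # Ask each store in order; the first one holding the dataset wins.
        for store in self.stores:
            try:
                return store.name, store.get(key)
            except KeyError:
                continue
        raise KeyError(key)


memory = DictStore("memory", {"metrics"})
files = DictStore("files", {"metrics", "image"})
chain = ChainSketch([memory, files])

put_metrics = chain.put("metrics", "m1", {"x": 1})
put_image = chain.put("image", "i1", b"pixels")
got_metrics = chain.get("m1")
got_image = chain.get("i1")
```

Placing the in-memory store first means a dataset it accepted is served from memory, while anything it rejected falls through to the file-backed store.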
ChainedDatastore has a
datastores key that contains a list of datastore configurations that can match the
datastore contents from other datastores.
ChainedDatastore can also support
lsst.analysis.tools package implements a special kind of datastore to facilitate uploading
MetricMeasurementBundles to a Sasquatch instance.
SasquatchDatastore is currently write only and is meant to aid dispatching
MetricMeasurementBundles anytime such a dataset is put with the butler.
Often this datastore will be used in conjunction with both a
datastores.chainedDatastore.ChainedDatastore and a
In such a setup, the
MetricMeasurementBundle will be uploaded to the Sasquatch database, and then persisted to a file-based location from which the butler can retrieve it.
The supported configurations are:
The URL where an HTTP REST API-based Kafka proxy to the Sasquatch database can be found.
An access token that is used to authenticate to the REST API server.
The namespace within a Sasquatch database where metrics will be uploaded.