lsst.daf.butler
This module provides an abstracted data access interface, known as the Butler. It can be used to read and write data without having to know the details of file formats or locations.
Contributing
lsst.daf.butler is developed at https://github.com/lsst/daf_butler.
You can find Jira issues for this module under the daf_butler component.
Using the Butler
Command Line Scripts
butler
butler [OPTIONS] COMMAND [ARGS]...
Options
--log-level <log_level>
The Python log level to use.
Options: critical|error|warning|info|debug|CRITICAL|ERROR|WARNING|INFO|DEBUG
config-dump
Dump either a subset or the full Butler configuration to standard output.
REPO is the URI or path to an existing data repository root or configuration file.
butler config-dump [OPTIONS] REPO
Options
-s, --subset <subset>
Subset of a configuration to report. This can be any key in the hierarchy, such as ‘.datastore.root’, where the leading ‘.’ specifies the delimiter for the hierarchy.
-p, --searchpath <searchpath>
Additional search paths to use for configuration overrides.
--file <outfile>
Print the (possibly-expanded) configuration for a repository to a file, or to stdout by default.
Arguments
REPO
Required argument
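The --subset key format, in which the leading character names the delimiter, can be illustrated with a small stand-alone sketch. The helper resolve_subset and the sample dictionary are hypothetical and are not part of lsst.daf.butler; the real configuration classes may treat keys differently.

```python
def resolve_subset(config, key):
    """Look up a hierarchical key in a nested dict.

    Assumption for illustration: the first character of ``key`` is the
    delimiter, as in '.datastore.root'.
    """
    delimiter, path = key[0], key[1:]
    node = config
    for part in path.split(delimiter):
        node = node[part]  # raises KeyError if a level is missing
    return node


# Hypothetical configuration contents, for illustration only.
sample = {"datastore": {"root": "/data/repo", "cls": "ExampleDatastore"}}

root = resolve_subset(sample, ".datastore.root")
same_root = resolve_subset(sample, "/datastore/root")
```

Both lookups walk the same path; only the delimiter character differs.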
config-validate
Validate the configuration files for a Gen3 Butler repository.
REPO is the URI or path to an existing data repository root or configuration file.
butler config-validate [OPTIONS] REPO
Options
-q, --quiet
Do not report individual failures.
-d, --dataset-type <dataset_type>
Specific DatasetType(s) to validate.
-i, --ignore <ignore>
DatasetType(s) to ignore for validation.
Arguments
REPO
Required argument
convert
Convert a Butler gen 2 repository into a gen 3 repository.
REPO is the URI or path to the gen3 repository. It will be created if it does not already exist.
butler convert [OPTIONS] REPO
Options
-i, --instrument <instrument>
The fully-qualified name of the gen3 Instrument subclass for this camera. [required]
--gen2root <gen2root>
Root path of the gen 2 repo to be converted. [required]
--skymap-name <skymap_name>
Name of the new gen3 skymap (e.g. ‘discrete/ci_hsc’).
--skymap-config <skymap_config>
Path to skymap config file defining the new gen3 skymap.
--calibs <calibs>
Path to the gen 2 calibration repo. It can be absolute or relative to gen2root.
--reruns <reruns>
List of gen 2 reruns to convert.
-t, --transfer <transfer>
Mode to use to transfer files into the new repository.
Options: auto|link|symlink|hardlink|copy|move|relsymlink
-C, --config-file <config_file>
Path to a ConvertRepoConfig override to be included after the Instrument config overrides are applied.
Arguments
REPO
Required argument
create
Create an empty Gen3 Butler repository.
REPO is the URI or path to the new repository. It will be created if it does not exist.
butler create [OPTIONS] [REPO]
Options
--seed-config <seed_config>
Path to an existing YAML config file to apply (on top of defaults).
--standalone
Include all defaults in the config file in the repo, insulating the repo from changes in package defaults.
--override
Allow values in the supplied config to override all repo settings.
-f, --outfile <outfile>
Name of output file to receive repository configuration. Default is to write butler.yaml into the specified repo.
Arguments
REPO
Optional argument
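When repositories are created from a script, the documented options can be assembled into an argument list. The following is a minimal sketch: the helper name butler_create_argv, the repository path, and the seed file name are hypothetical, and only the options listed above are used.

```python
import shlex


def butler_create_argv(repo, seed_config=None, standalone=False,
                       override=False, outfile=None):
    """Assemble an argument list for ``butler create``.

    Hypothetical helper; it only strings together the documented options.
    """
    argv = ["butler", "create"]
    if seed_config is not None:
        argv += ["--seed-config", seed_config]
    if standalone:
        argv.append("--standalone")
    if override:
        argv.append("--override")
    if outfile is not None:
        argv += ["--outfile", outfile]
    argv.append(repo)
    return argv


# Example values are placeholders, not real paths.
argv = butler_create_argv("/tmp/example_repo", seed_config="seed.yaml",
                          standalone=True)
command_line = shlex.join(argv)  # shell-quoted form, handy for logging
```

The list form can be handed to subprocess.run on a system where the butler command is installed; the joined string is only meant for display.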
define-visits
REPO is the URI or path to an existing data repository root or configuration file.
butler define-visits [OPTIONS] REPO
Options
-C, --config-file <config_file>
Path to a pex_config override to be included after the Instrument config overrides are applied.
--collections <collections>
The collections to be searched (in order) when reading datasets.
-i, --instrument <instrument>
The name or fully-qualified class name of an instrument. [required]
Arguments
REPO
Required argument
import
Import data into a butler repository.
REPO is the URI or path to the new repository. It will be created if it does not exist.
DIRECTORY is the folder containing dataset files.
butler import [OPTIONS] REPO DIRECTORY
Options
-t, --transfer <transfer>
The external data transfer mode.
Options: auto|link|symlink|hardlink|copy|move|relsymlink
--output-run <output_run>
The name of the run datasets should be output to. [required]
--export-file <export_file>
Name for the file that contains database information associated with the exported datasets. If this is not an absolute path, does not exist in the current working directory, and --dir is provided, it is assumed to be in that directory. Defaults to “export.yaml”.
Arguments
REPO
Required argument
DIRECTORY
Required argument
ingest-raws
REPO is the URI or path to an existing data repository root or configuration file.
butler ingest-raws [OPTIONS] REPO
Options
-c, --config <config>
Config override, as a key-value pair.
-C, --config-file <config_file>
Path to a pex_config override to be included after the Instrument config overrides are applied.
--output-run <output_run>
The name of the run datasets should be output to. [required]
-d, --dir <directory>
The path to the directory containing the raws to ingest.
-f, --file <file>
The name of a file containing raws to ingest.
-t, --transfer <transfer>
The external data transfer mode.
Options: auto|link|symlink|hardlink|copy|move|relsymlink
--ingest-task <ingest_task>
The fully qualified class name of the ingest task to use.
Arguments
REPO
Required argument
query-collections
Get the collections whose names match an expression.
REPO is the URI or path to an existing data repository root or configuration file.
butler query-collections [OPTIONS] REPO
Options
--collection-type <collection_type>
If provided, only list collections of this type.
Options: CHAINED|RUN|TAGGED
--flatten-chains, --no-flatten-chains
Recursively get the child collections of matching CHAINED collections. Default is --no-flatten-chains.
--include-chains, --no-include-chains
For --include-chains, return records for matching CHAINED collections. For --no-include-chains, do not return records for CHAINED collections. Default is the opposite of --flatten-chains: include either CHAINED collections or their children, but not both.
Arguments
REPO
Required argument
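The interaction between the two chain flags, including the default for --include-chains, can be modelled with a toy function. This is a sketch under stated assumptions: select_collections and the children mapping are hypothetical stand-ins, not the Registry query code.

```python
def select_collections(matching, children, flatten_chains=False,
                       include_chains=None):
    """Toy model of the chain flags.

    ``children`` maps the name of a CHAINED collection to its child names;
    any name absent from it is treated as a non-chained collection.
    """
    if include_chains is None:
        # Documented default: the opposite of the flatten setting.
        include_chains = not flatten_chains
    selected = []

    def add(name):
        if name not in selected:
            selected.append(name)

    def visit(name):
        kids = children.get(name)
        if kids is None:
            add(name)  # not a CHAINED collection
            return
        if include_chains:
            add(name)
        if flatten_chains:
            for kid in kids:
                visit(kid)  # recurse into nested chains

    for name in matching:
        visit(name)
    return selected


# Hypothetical collection names.
children = {"chain": ["run1", "run2"]}
matching = ["chain", "run3"]

default = select_collections(matching, children)
flattened = select_collections(matching, children, flatten_chains=True)
both = select_collections(matching, children, flatten_chains=True,
                          include_chains=True)
```

With the defaults the chain itself is reported; with flattening alone only its children are; asking for both returns the chain followed by its children.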
query-dataset-types
Get the dataset types in a repository.
REPO is the URI or path to an existing data repository root or configuration file.
GLOB is one or more strings to apply to the search.
butler query-dataset-types [OPTIONS] REPO GLOB ...
Options
-v, --verbose
Include dataset type name, dimensions, and storage class in output.
--components, --no-components
For --components, apply all expression patterns to component dataset type names as well. For --no-components, never apply patterns to components. Default (where neither is specified) is to apply patterns to components only if their parent datasets were not matched by the expression. Fully-specified component datasets (str or DatasetType instances) are always included.
Arguments
REPO
Required argument
GLOB ...
Optional argument(s)
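The three component modes can be sketched with shell-style matching from the standard library. This is illustrative only: match_dataset_types is a hypothetical helper, the dataset type names are made up, component names are assumed to take the form parent.component, and the real command may match patterns differently from fnmatch.

```python
from fnmatch import fnmatchcase


def match_dataset_types(names, globs, components=None):
    """Toy model of --components / --no-components / neither.

    components=True  -> patterns also apply to component names
    components=False -> patterns never apply to component names
    components=None  -> patterns apply to components only when the
                        parent itself did not match
    """
    def matches(name):
        return any(fnmatchcase(name, g) for g in globs)

    result = []
    for parent in (n for n in names if "." not in n):
        parent_matched = matches(parent)
        if parent_matched:
            result.append(parent)
        if components is False:
            continue
        if components is None and parent_matched:
            continue
        for name in names:
            if name.startswith(parent + ".") and matches(name):
                result.append(name)
    return result


# Hypothetical dataset type names.
names = ["calexp", "calexp.wcs", "raw", "raw.wcs"]

default_match = match_dataset_types(names, ["cal*"])
with_components = match_dataset_types(names, ["cal*"], components=True)
component_only = match_dataset_types(names, ["*.wcs"])
no_components = match_dataset_types(names, ["*.wcs"], components=False)
```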
register-instrument
Add an instrument to the data repository.
REPO is the URI or path to an existing data repository root or configuration file.
butler register-instrument [OPTIONS] REPO
Options
-i, --instrument <instrument>
The fully-qualified name of an Instrument subclass. [required]
Arguments
REPO
Required argument
write-curated-calibrations
Add an instrument’s curated calibrations to the data repository.
REPO is the URI or path to an existing data repository root or configuration file.
butler write-curated-calibrations [OPTIONS] REPO
Options
-i, --instrument <instrument>
The name or fully-qualified class name of an instrument. [required]
--output-run <output_run>
The name of the run datasets should be output to. [required]
Arguments
REPO
Required argument
The Dimensions System
Python API reference
lsst.daf.butler Package
Functions
addDimensionForeignKey (tableSpec, dimension, …) |
Add a field and possibly a foreign key to a table specification that reference the table for the given Dimension . |
makeDimensionElementTableSpec (element) |
Create a complete table specification for a DimensionElement . |
Classes
AmbiguousDatasetError |
Exception raised when a DatasetRef is not resolved (has no ID or run), but the requested operation requires one of them. |
Butler (config, str, None] = None, *, butler, …) |
Main entry point for the data access system. |
ButlerConfig ([other, searchPaths]) |
Contains the configuration for a Butler |
ButlerURI (uri, urllib.parse.ParseResult], …) |
Convenience wrapper around URI parsers. |
ButlerValidationError |
There is a problem with the Butler configuration. |
CollectionSearch (items, …) |
An ordered search path of collections and dataset type restrictions. |
CollectionType |
Enumeration used to label different types of collections. |
CompositeAssembler (storageClass) |
Class for providing assembler and disassembler support for composites. |
CompositesConfig ([other, validate, …]) |
|
CompositesMap (config, ButlerConfig, …) |
Determine whether a specific datasetType or StorageClass should be disassembled. |
Config ([other]) |
Implements a datatype that is used by Butler for configuration parameters. |
ConfigSubset ([other, validate, …]) |
Config representing a subset of a more general configuration. |
Constraints (config, str]], *, universe) |
Determine whether a DatasetRef , DatasetType , or StorageClass is allowed to be handled. |
ConstraintsConfig ([other]) |
Configuration information for Constraints |
ConstraintsValidationError |
Exception thrown when a constraints list has mutually exclusive definitions. |
DataCoordinate (graph, values, …]) |
An immutable data ID dictionary that guarantees that its key-value pairs identify all required dimensions in a DimensionGraph . |
DatasetComponent (name, storageClass, component) |
Component of a dataset and associated information. |
DatasetRef |
Reference to a Dataset in a Registry . |
DatasetType (name, dimensions, …) |
A named category of Datasets that defines how they are organized, related, and stored. |
DatasetTypeNotSupportedError |
A DatasetType is not handled by this routine. |
DatasetTypeRestriction (names, ellipsis]) |
An immutable set-like object that represents a restriction on the dataset types to search for within a collection. |
Datastore (config, str], bridgeManager, …) |
Datastore interface. |
DatastoreConfig ([other, validate, …]) |
|
DatastoreValidationError |
There is a problem with the Datastore configuration. |
DeferredDatasetHandle (butler, ref, parameters) |
Proxy class that provides deferred loading of a dataset from a butler. |
Dimension (name, *, related, uniqueKeys, **kwargs) |
A named data-organization concept that can be used as a key in a data ID. |
DimensionConfig ([other, validate, …]) |
Configuration that defines a DimensionUniverse . |
DimensionElement (name, *, related, metadata, …) |
A named data-organization concept that defines a label and/or metadata in the dimensions system. |
DimensionGraph |
An immutable, dependency-complete collection of dimensions. |
DimensionPacker (fixed, dimensions) |
An abstract base class for bidirectional mappings between a DataCoordinate and a packed integer ID. |
DimensionRecord (*args) |
Base class for the Python representation of database records for a DimensionElement . |
DimensionUniverse |
A special DimensionGraph that constructs and manages a complete set of compatible dimensions. |
ExpandedDataCoordinate (graph, values, …], …) |
A data ID that has been expanded to include all relevant metadata. |
FileDataset (path, refs, List[DatasetRef]], …) |
A struct that represents a dataset exported to a file. |
FileDescriptor (location, storageClass, …) |
Describes a particular file. |
FileTemplate (template) |
Format a path template into a fully expanded path. |
FileTemplateValidationError |
Exception thrown when a file template is not consistent with the associated DatasetType . |
FileTemplates (config, str], default, *, universe) |
Collection of FileTemplate templates. |
FileTemplatesConfig ([other]) |
Configuration information for FileTemplates |
Formatter (fileDescriptor, dataId, …) |
Interface for reading and writing Datasets with a particular StorageClass . |
FormatterFactory () |
Factory for Formatter instances. |
IndexedTupleDict (indices, int][K, int], …) |
An immutable mapping that combines a tuple of values with a (possibly shared) mapping from key to tuple index. |
Location (datastoreRootUri, str], path) |
Identifies a location within the Datastore . |
LocationFactory (datastoreRoot) |
Factory for Location instances. |
LookupKey (name, dimensions, …) |
Representation of a key that can be used to look up information based on dataset type name, storage class name, dimensions. |
MappingFactory (refType) |
Register the mapping of some key to a python type and retrieve instances. |
NamedKeyDict (*args) |
A dictionary wrapper that requires keys to have a .name attribute, and permits lookups using either key objects or their names. |
NamedKeyMapping |
An abstract base class for custom mappings whose keys are objects with a str name attribute, for which lookups on the name as well as the object are permitted. |
NamedValueSet (elements) |
A custom mutable set class that requires elements to have a .name attribute, which can then be used as keys in dict-like lookup. |
Quantum (*, taskName, taskClass, dataId, run, …) |
A discrete unit of work that may depend on one or more datasets and produces one or more datasets. |
Registry (database, universe, *, attributes, …) |
Registry interface. |
RepoExport (registry, datastore, backend, *, …) |
Public interface for exporting a subset of a data repository. |
RepoExportBackend |
An abstract interface for data repository export implementations. |
RepoImportBackend |
An abstract interface for data repository import implementations. |
RepoTransferFormatConfig ([other, validate, …]) |
The section of butler configuration that associates repo import/export backends with file formats. |
SkyPixDimension (name, pixelization) |
A special Dimension subclass for hierarchical pixelizations of the sky. |
StorageClass (name, pytype, str, …) |
Class describing how a label maps to a particular Python type. |
StorageClassConfig ([other, validate, …]) |
|
StorageClassFactory (config, str, None] = None) |
Factory for StorageClass instances. |
StoredDatastoreItemInfo |
Internal information associated with a stored dataset in a Datastore . |
StoredFileInfo (formatter, …) |
Datastore-private metadata associated with a file stored in a Datastore. |
Timespan |
A generic 2-element named tuple for time intervals. |
ValidationError |
Some sort of validation error has occurred. |
YamlRepoExportBackend (stream) |
A repository export implementation that saves to a YAML file. |
YamlRepoImportBackend (stream, registry) |
A repository import implementation that reads from a YAML file. |
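The lookup behaviour summarised for NamedKeyDict above (keys carry a .name attribute, and either the key object or its name can be used for lookup) can be illustrated with a toy mapping. Key and NameLookupDict are hypothetical names for this sketch and do not reproduce the real class or its API.

```python
from collections import namedtuple

# Hypothetical key type; anything with a .name attribute would do.
Key = namedtuple("Key", ["name"])


class NameLookupDict:
    """Toy mapping that accepts a key object or its name on lookup."""

    def __init__(self):
        self._by_name = {}

    def __setitem__(self, key, value):
        # Keys must expose .name; an AttributeError is raised otherwise.
        self._by_name[key.name] = (key, value)

    def __getitem__(self, key):
        name = key if isinstance(key, str) else key.name
        return self._by_name[name][1]

    def keys(self):
        # Return the original key objects, not their names.
        return [key for key, _ in self._by_name.values()]


visit = Key("visit")
lookup = NameLookupDict()
lookup[visit] = 42

by_object = lookup[visit]
by_name = lookup["visit"]
```

Storing entries under the name keeps both lookup styles on one dictionary access, at the cost of requiring names to be unique.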
Class Inheritance Diagram
lsst.daf.butler.registry Package
Classes
CollectionSearch (items, …) |
An ordered search path of collections and dataset type restrictions. |
CollectionType |
Enumeration used to label different types of collections. |
ConflictingDefinitionError |
Exception raised when trying to insert a database record when a conflicting record already exists. |
DatasetTypeRestriction (names, ellipsis]) |
An immutable set-like object that represents a restriction on the dataset types to search for within a collection. |
DbAuth (path, envVar, authList, str]]] = None) |
Retrieves authentication information for database connections. |
DbAuthError |
A problem has occurred retrieving database authentication information. |
DbAuthPermissionsError |
Credentials file has incorrect permissions. |
InconsistentDataIdError |
Exception raised when a data ID contains contradictory key-value pairs, according to dimension relationships. |
MissingCollectionError |
Exception raised when an operation attempts to use a collection that does not exist. |
OrphanedRecordError |
Exception raised when trying to remove or modify a database record that is still being used in some other table. |
Registry (database, universe, *, attributes, …) |
Registry interface. |
RegistryConfig ([other, validate, …]) |
Class Inheritance Diagram
lsst.daf.butler.registry.interfaces Package
Classes
ButlerAttributeExistsError |
Exception raised when trying to update existing attribute without specifying force option. |
ButlerAttributeManager |
An interface for managing butler attributes in a Registry . |
ChainedCollectionRecord (key, name) |
A subclass of CollectionRecord that adds the list of child collections in a CHAINED collection. |
CollectionManager |
An interface for managing the collections (including runs) in a Registry . |
CollectionRecord (key, name, type) |
A struct used to represent a collection in internal Registry APIs. |
Database (*, origin, connection, namespace) |
An abstract interface that represents a particular database engine’s representation of a single schema/namespace/database. |
DatabaseConflictError |
Exception raised when database content (row values or schema entities) is inconsistent with what this client expects. |
DatasetRecordStorage (datasetType) |
An interface that manages the records associated with a particular DatasetType . |
DatasetRecordStorageManager |
An interface that manages the tables that describe datasets. |
DatastoreRegistryBridge (datastoreName) |
An abstract base class that defines the interface that a Datastore uses to communicate with a Registry . |
DatastoreRegistryBridgeManager (*, opaque, …) |
An abstract base class that defines the interface between Registry and Datastore when a new Datastore is constructed. |
DimensionRecordStorage |
An abstract base class that represents a way of storing the records associated with a single DimensionElement . |
DimensionRecordStorageManager (*, universe) |
An interface for managing the dimension records in a Registry . |
FakeDatasetRef |
A fake DatasetRef that can be used internally by butler where only the dataset ID is available. |
MissingCollectionError |
Exception raised when an operation attempts to use a collection that does not exist. |
OpaqueTableStorage (name) |
An interface that manages the records associated with a particular opaque table in a Registry . |
OpaqueTableStorageManager |
An interface that manages the opaque tables in a Registry . |
ReadOnlyDatabaseError |
Exception raised when a write operation is called on a read-only Database . |
RunRecord (key, name, type) |
A subclass of CollectionRecord that adds execution information and an interface for updating it. |
StaticTablesContext (db) |
Helper class used to declare the static schema for a registry layer in a database. |
Class Inheritance Diagram
lsst.daf.butler.registry.queries Package
Classes
Query (*, sql, summary, columns, collections) |
A wrapper for a SQLAlchemy query that knows how to transform result rows into data IDs and dataset references. |
QueryBuilder (summary, *, collections, …) |
A builder for potentially complex queries that join tables based on dimension relationships. |
QuerySummary (requested, *, dataId, …) |
A struct that holds and categorizes the dimensions involved in a query. |
Class Inheritance Diagram
lsst.daf.butler.registry.wildcards Module
Classes
CategorizedWildcard (strings, patterns, …) |
The results of preprocessing a wildcard expression to separate match patterns from strings. |
CollectionQuery (search, ellipsis], patterns, …) |
An unordered query for collections and dataset type restrictions. |
CollectionSearch (items, …) |
An ordered search path of collections and dataset type restrictions. |
DatasetTypeRestriction (names, ellipsis]) |
An immutable set-like object that represents a restriction on the dataset types to search for within a collection. |
Class Inheritance Diagram
Example datastores
lsst.daf.butler.datastores.chainedDatastore Module
Classes
ChainedDatastore (config, str], …) |
Chained Datastores to allow reads and writes from multiple datastores. |
Class Inheritance Diagram
lsst.daf.butler.datastores.inMemoryDatastore Module
Classes
StoredMemoryItemInfo (timestamp, …) |
Internal InMemoryDatastore Metadata associated with a stored DatasetRef. |
InMemoryDatastore (config, str], …) |
Basic Datastore for writing to an in-memory cache. |
Class Inheritance Diagram
lsst.daf.butler.datastores.posixDatastore Module
Classes
PosixDatastore (config, str], bridgeManager, …) |
Basic POSIX filesystem backed Datastore. |
Class Inheritance Diagram
lsst.daf.butler.datastores.s3Datastore Module
Classes
S3Datastore (config, str], bridgeManager, …) |
Basic S3 Object Storage backed Datastore. |
Class Inheritance Diagram
Example formatters
lsst.daf.butler.formatters.file Module
Classes
FileFormatter (fileDescriptor, dataId, …) |
Interface for reading and writing files on a POSIX file system. |
Class Inheritance Diagram
lsst.daf.butler.formatters.json Module
Classes
JsonFormatter (fileDescriptor, dataId, …) |
Interface for reading and writing Python objects to and from JSON files. |
Class Inheritance Diagram
lsst.daf.butler.formatters.matplotlib Module
Classes
MatplotlibFormatter (fileDescriptor, dataId, …) |
Interface for writing matplotlib figures. |
Class Inheritance Diagram
lsst.daf.butler.formatters.parquet Module
Classes
ParquetFormatter (fileDescriptor, dataId, …) |
Interface for reading and writing Pandas DataFrames to and from Parquet files. |
Class Inheritance Diagram
lsst.daf.butler.formatters.pexConfig Module
Classes
PexConfigFormatter (fileDescriptor, dataId, …) |
Interface for reading and writing pex.config.Config objects from disk. |
Class Inheritance Diagram
lsst.daf.butler.formatters.pickle Module
Classes
PickleFormatter (fileDescriptor, dataId, …) |
Interface for reading and writing Python objects to and from pickle files. |
Class Inheritance Diagram
lsst.daf.butler.formatters.yaml Module
Classes
YamlFormatter (fileDescriptor, dataId, …) |
Interface for reading and writing Python objects to and from YAML files. |
Class Inheritance Diagram
Database backends
lsst.daf.butler.registry.databases.sqlite Module
Classes
SqliteDatabase (*, connection, origin, …) |
An implementation of the Database interface for SQLite3. |
Class Inheritance Diagram
lsst.daf.butler.registry.databases.postgresql Module
Classes
PostgresqlDatabase (*, connection, origin, …) |
An implementation of the Database interface for PostgreSQL. |
Class Inheritance Diagram
lsst.daf.butler.registry.databases.oracle Module
Classes
OracleDatabase (*, connection, origin, …) |
An implementation of the Database interface for Oracle. |
Class Inheritance Diagram
Support API
lsst.daf.butler.core.utils Module
Functions
allSlots (self) |
Return combined __slots__ for all classes in the object's MRO. |
getClassOf (typeOrName, str]) |
Given the type name or a type, return the python type. |
getFullTypeName (cls) |
Return full type name of the supplied entity. |
getInstanceOf (typeOrName, str], *args, **kwargs) |
Given the type name or a type, instantiate an object of that type. |
immutable (cls) |
A class decorator that simulates a simple form of immutability for the decorated class. |
iterable (a) |
Make input iterable. |
safeMakeDir (directory) |
Make a directory in a manner that avoids race conditions. |
stripIfNotNone (s) |
Strip leading and trailing whitespace if the given object is not None. |
transactional (func) |
Decorator that wraps a method and makes it transactional. |
Class Inheritance Diagram
lsst.daf.butler.core.repoRelocation Module
Functions
replaceRoot (configRoot, butlerRoot) |
Update a configuration root with the butler root location. |
Variables
BUTLER_ROOT_TAG |
The special string to be used in configuration files to indicate that the butler root location should be used. |
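The root substitution described for replaceRoot and BUTLER_ROOT_TAG can be pictured with a stand-alone sketch. The placeholder value and the function replace_root_sketch are assumptions made for illustration; consult the module itself for the actual tag string and behaviour.

```python
# Assumed placeholder value, for illustration only; the real string is
# whatever BUTLER_ROOT_TAG holds.
EXAMPLE_ROOT_TAG = "<butlerRoot>"


def replace_root_sketch(config_root, butler_root):
    """Substitute the placeholder in ``config_root`` with ``butler_root``.

    Hypothetical stand-in for the documented replaceRoot function.
    """
    if EXAMPLE_ROOT_TAG not in config_root:
        return config_root  # nothing to substitute
    return config_root.replace(EXAMPLE_ROOT_TAG, butler_root)


# Example paths are placeholders.
expanded = replace_root_sketch("<butlerRoot>/datastore", "/data/repo")
unchanged = replace_root_sketch("/absolute/path", "/data/repo")
```

A placeholder of this kind lets a configuration file refer to locations relative to the repository without hard-coding where the repository lives.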
Test utilities
lsst.daf.butler.tests Package
Functions
addDatasetType (butler, name, dimensions, …) |
Add a new dataset type to a repository. |
expandUniqueId (butler, partialId) |
Return a complete data ID matching some criterion. |
makeTestCollection (repo) |
Create a read/write Butler to a fresh collection. |
makeTestRepo (root, dataIds, *[, config]) |
Create an empty repository with dummy data IDs. |
registerMetricsExample (butler) |
Modify a repository to support reading and writing MetricsExample objects. |
Classes
BadNoWriteFormatter (fileDescriptor, dataId, …) |
A formatter that always fails without writing anything. |
BadWriteFormatter (fileDescriptor, dataId, …) |
A formatter that never works but does leave a file behind. |
DatasetTestHelper |
Helper methods for Datasets |
DatastoreTestHelper |
Helper methods for Datastore tests |
DummyRegistry () |
Dummy Registry, for Datastore test purposes. |
ListAssembler (storageClass) |
Parameter handler for list parameters |
MetricsAssembler (storageClass) |
Parameter handler for parameters using Metrics |
MetricsExample ([summary, output, data]) |
Smorgasbord of information that might be the result of some processing. |
MultiDetectorFormatter (fileDescriptor, …) |