Butler v29.1.0 (2025-06-13)

New Features

- Added support in user expressions for explicit specification of bind identifiers with a preceding colon, e.g. `:name`. The legacy format of bind identifiers without a colon is still supported but will be deprecated in the future. (DM-38497)
- Added a new `--show-dataset-types` argument (`-t`) to `butler query-collections` to list the dataset types in each collection. Also added a new `--exclude-dataset-types` argument, which accepts a comma-separated list of string globs to exclude when dataset types are shown. (DM-48975)
- Added Pydantic serialization for `DimensionRecordSet`. (DM-49622)
- Added `FileDataset.to_simple` and `FileDataset.from_simple` for serializing `FileDataset` instances. (DM-49670)
- Added a `join_data_coordinate_table` method to the `Query` class for uploading data IDs from Astropy tables. This includes support for filling in required-dimension columns using constraints from a `where` string (for example, a table with just `visit` IDs along with `instrument='LSSTCam'` in the `where` string will now work). (DM-49949)
- Added support to the user expression language for a `GLOB(expression, pattern)` function. The function performs a case-sensitive match of the string expression against the pattern. The pattern can include the `*` and `?` meta-characters, which match any number of characters or a single character, respectively. (DM-50191)
- Added the ability for Zip ingest to register any missing dimension records. Note that if any datasets use a `visit` that requires registration, the registered records will not fully define the visit and so cannot be used for graph building. For now it is recommended that visits be defined before ingest. (DM-50313)
- Added support to the user expression language for UUID literals that can be used to query dataset IDs. UUID literals are specified using function-call syntax: `UUID('hex-string')`. (DM-50451)
- Added the ability for Butler to record timing metrics, such as the amount of time spent in `get()` or `put()` and the number of times those are called. A metrics object can be given to the `Butler` constructor to record everything, or the new context manager `Butler.record_metrics` can be used to record metrics for a specific usage. (DM-50491)
- `RemoteButler` can now be used as a source Butler in `transfer_from`. (DM-51075)
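The pattern-matching semantics of the `GLOB` function described above can be sketched with Python's standard `fnmatch` module. This is a plain-Python illustration of the documented behavior, not the Butler expression-language implementation:

```python
from fnmatch import fnmatchcase


def glob_match(expression: str, pattern: str) -> bool:
    """Case-sensitive match of ``expression`` against a glob ``pattern``,
    mirroring GLOB(expression, pattern): ``*`` matches any number of
    characters and ``?`` matches exactly one character.

    Note: fnmatch also honors ``[seq]`` character classes, which the
    changelog entry does not mention for GLOB.
    """
    return fnmatchcase(expression, pattern)


print(glob_match("LSSTCam", "LSST*"))        # True
print(glob_match("LSSTCam", "lsst*"))        # False: match is case-sensitive
print(glob_match("u_run_1", "u_run_?"))      # True: '?' matches one character
```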
API Changes

- The `unstore` parameter for `Butler.removeRuns()` has been deprecated. We now always remove the file artifacts when removing the collection.
- Added an `unlink_from_chains` parameter to `Butler.removeRuns()` to allow the RUN collections to be unlinked from their parent chains automatically. (DM-50996)
Bug Fixes

- Fixed an issue where `find_first=False` dataset queries would sometimes return duplicate results if more than one collection was being searched. (DM-47201)
- Query expressions now allow float columns to be compared with `int` literals. Query expressions now allow `OVERLAPS` to be used to compare timespans with `ingest_date`. (DM-47644)
- `Butler.ingest()` will now register missing run collections instead of raising a `MissingCollectionError`. (DM-49670)
- Fixed handling of expressions like `"id=2"` in contexts where `"id"` resolves unambiguously to a dimension primary key column. (DM-50465)
- `butler remove-runs` will no longer block if a parallel process calls `butler remove-runs` on a run contained in the same collection chain. (DM-50855)
Performance Enhancement

- `Butler.transfer_from()` now uses fewer database queries when inserting datasets of multiple dataset types, when some of the dataset types have the same dimensions. (DM-49513)
- Improved the performance of certain queries by moving `WHERE` clause terms down into subqueries. (DM-50969)
- `Butler.transfer_from` now uses fewer database queries. (DM-51302)
Other Changes and Additions

- `retrieveArtifacts` and `transfer_from` now transfer their artifacts using multiple threads. (DM-31824)
- Reorganized the code for `Butler.ingest` and `Butler.ingest_zip` to share the code from `Butler.transfer_from`, in order to provide consistent error messages for incompatible dataset types and to allow a future possibility of registering dataset types and dimension records as part of `ingest_zip`.
- Removed the `use_cache` parameter from the `DimensionUniverse` constructor. The universe is always cached, and the remote butler now uses that cache and does not need to disable it. (DM-50044)
- Significantly sped up `Butler.pruneDatasets` by using parallelized artifact deletions. (DM-50724)
- Modified trash emptying in datastore such that only the datasets to be removed as part of the original butler request are trashed immediately. (DM-50727)
- Modified CLI tooling to work with both click 8.1 and click 8.2. (DM-50823)
- Fixed handling of `FileDatastore.transfer_from()` such that it now works with chained datastores. (DM-50935)
- Reorganized `Butler.removeRuns()` to remove datasets in chunks, to provide more feedback to the user and to allow restarting if the command fails for some reason. (DM-50996)
- Added a `fallback_instrument` to ObsCore configurations. This instrument is used when an ObsCore record is constructed from a dataset type that has no instrument defined. (DM-51269)
- `Butler.transfer_from()` will now raise a `FileNotFoundError` while transferring files if a file listed in the database is not actually available on the filesystem. Previously, it would silently skip the file. (DM-51302)
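The multi-threaded artifact transfer mentioned above (DM-31824) follows a common pattern that can be sketched with `concurrent.futures`. This is an illustration of the threading pattern only, not Butler's actual datastore code; the file names are made up:

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def transfer_artifacts(sources, dest_dir, max_workers=4):
    """Copy each source file into dest_dir using a pool of worker threads."""
    dest_dir.mkdir(parents=True, exist_ok=True)

    def _copy(src):
        target = dest_dir / src.name
        shutil.copy2(src, target)
        return target

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order and re-raises any worker exception.
        return list(pool.map(_copy, sources))


# Demonstrate with a few temporary files.
src_dir = Path(tempfile.mkdtemp())
files = []
for i in range(3):
    p = src_dir / f"artifact{i}.dat"
    p.write_bytes(b"x" * 10)
    files.append(p)
copied = transfer_artifacts(files, src_dir / "out")
print([p.name for p in copied])  # ['artifact0.dat', 'artifact1.dat', 'artifact2.dat']
```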
Butler v29.0.0 (2025-03-25)

New Features

- Added a new class, `DatasetProvenance`, for tracking the provenance of an individual dataset.
- Modified `Butler.put()` to accept an optional `DatasetProvenance`.
- Added `add_provenance` methods to `FormatterV2` and `StorageClassDelegate`. These methods will now be called with the provenance object during `Butler.put()` to allow the in-memory dataset to be updated prior to writing. (DM-35396)

- Added `Butler.retrieve_artifacts_zip` and `QuantumBackedButler.retrieve_artifacts_zip` methods to retrieve the dataset artifacts and store them in a zip file.
- Added `Butler.ingest_zip` to ingest the contents of a Zip file.
- Added a `SerializedDatasetRefContainerV1` class to allow a collection of `DatasetRef` to be serialized efficiently. JSON serializations made using this class will be supported.
- Added a `--zip` parameter to `butler retrieve-artifacts`.
- Changed `Butler.retrieveArtifacts` to always write a JSON index file describing where the artifacts came from.
- Added a `butler ingest-zip` command-line tool for ingesting zip files created by `butler retrieve-artifacts`. (DM-46776)

- The `DAF_BUTLER_PLUGINS` environment variable should no longer be set if packages use `pip install` and have been upgraded to use entry points. Butler can now read the subcommands from `pipe_base` and `daf_butler_migrate` automatically. Setting the environment variable for these packages will result in an error. (DM-47143)
- Added two new APIs for handling Butler dataset URIs. `Butler.parse_dataset_uri` parses a URI and returns the butler repository label and associated UUID. `Butler.get_dataset_from_uri` will parse a URI and attempt to retrieve the `DatasetRef`. URIs should be in the form of IVOA identifiers as described in DMTN-302. Deprecated `butler://` URIs are still supported but should not be used in new systems. (DM-47325)
- Added a `--chains NO-CHILDREN` mode to the `butler query-collections` CLI, which returns results without recursing into `CHAINED` collections. (DM-47768)
- Added an `lsst.daf.butler.formatters.parquet.add_pandas_index_to_astropy()` function which stores special metadata that will be used to create a pandas DataFrame index if the table is read as a `DataFrame`. (DM-48141)
- Modified the ObsCore `RecordFactory` to support per-universe subclass discovery using entry points.
- Added `RecordFactory.get_record_type_from_universe` to obtain the correct factory class.
- Renamed `ExposureRegionFactory` to `DerivedRegionFactory` to make it clearer that this class is not solely used for exposures and that the usage can change with universe.
- Added `RecordFactory.region_dimension` to return the dimension that would be needed to obtain a region for this universe. (DM-48282)

- Added new methods to `DatasetProvenance` for serializing provenance to a flat dictionary and recovering provenance from that dictionary.
- Modified `ParquetFormatter` to write provenance metadata to Astropy tables. (DM-48869)
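The label-plus-UUID structure behind `Butler.parse_dataset_uri` can be sketched for the deprecated `butler://` form with the standard library. This is an illustration only: the `butler://<label>/<uuid>` layout is an assumption for this sketch (the real API also understands the IVOA identifier form from DMTN-302), and the label `dr1` and the UUID are made up:

```python
import uuid
from urllib.parse import urlparse


def parse_butler_uri(uri):
    """Split a URI of the assumed form butler://<label>/<uuid> into the
    repository label and the dataset UUID."""
    parsed = urlparse(uri)
    if parsed.scheme != "butler":
        raise ValueError(f"Unsupported scheme in {uri!r}")
    # netloc is the repository label; the path holds the dataset UUID.
    return parsed.netloc, uuid.UUID(parsed.path.strip("/"))


label, dataset_id = parse_butler_uri(
    "butler://dr1/58c842f6-64e7-4789-9a23-0b8c9a3c6e55"
)
print(label)  # dr1
```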
 
API Changes

- Added a `QuantumBackedButler.retrieve_artifacts` method to allow dataset artifacts to be retrieved from a graph. (DM-47328)
Bug Fixes

- Fixed inserts with `replace=True` on dimensions with only primary key columns. (DM-46631)
- Fixed a bug where `DatastoreCacheManager` would raise `ValueError('badly formed hexadecimal UUID string')` if files with unexpected names were present in the cache directory when trying to load a file from the cache. (DM-46936)
- Fixed a crash in the new Butler query system which happened in some conditions when using the find-first option with multiple collections. (DM-47475)
- Fixed a bug in which projected spatial-join queries (particularly those where the dimensions of the actual regions being compared are not in the query result rows) could return additional records where there actually was no overlap. (DM-47947)
- Fixed a bug where dataset fields like `ingest_date` were raising `InvalidQueryError: Unrecognized identifier` when used in a `Butler.query_datasets` `where` clause. (DM-48094)
- Fixed a query bug that could lead to unexpectedly coarse spatial joins. When a dataset search or other join operand had some dimensions from either side of a potential spatial join (e.g. `{tract, visit}`), we had been blocking the addition of an automatic spatial join on the assumption that it would be embedded in that join operand. But this is only desirable when the spatial join that would have been added is actually the same one implied by that join operand's dimensions; if it is something more fine-grained (e.g. `{tract, patch, visit}`) we end up with result rows that relate dimensions (e.g. `patch` and `visit`) that do not actually overlap. Now automatic spatial joins are only blocked when the join operand includes all dimensions that would have participated in the automatic join. (DM-48880)
- Fixed a bug that could result in incorrectly empty query results when a data ID constraint was inconsistent with some dataset types in a collection, but not the one actually being queried for. (DM-48974)
- Added pyarrow metadata keywords for Astropy tables to fix warnings on read. (DM-49509)
Other Changes and Additions

- Now supports a type-conversion field in file template format strings. (DM-47976)
- Modified the ObsCore configuration to support a `facility_map` lookup table, allowing a facility to be associated with a specific instrument. This is important for butler repositories containing data from multiple instruments and facilities. (DM-46914)
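The `facility_map` lookup described above is essentially a per-instrument override of a default facility. A minimal sketch of that lookup logic in plain Python, with hypothetical key names and facility/instrument values (the actual ObsCore configuration schema may differ):

```python
# Hypothetical ObsCore configuration fragment expressed as a Python dict;
# the key names and values here are illustrative, not the real schema.
obscore_config = {
    "facility_name": "Rubin:Simonyi",   # default facility for records
    "facility_map": {                   # per-instrument overrides
        "LATISS": "Rubin:AuxTel",
        "LSSTCam": "Rubin:Simonyi",
    },
}


def facility_for(instrument):
    """Pick the facility for a record, falling back to the default when
    the instrument is unknown or the record has no instrument at all."""
    if instrument is not None:
        return obscore_config["facility_map"].get(
            instrument, obscore_config["facility_name"]
        )
    return obscore_config["facility_name"]


print(facility_for("LATISS"))  # Rubin:AuxTel
print(facility_for(None))      # Rubin:Simonyi
```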
Butler v28.0.0 (2024-11-20)

New Features

- Added a new formatter class, `lsst.daf.butler.FormatterV2`, that has been redesigned to be solely focused on file I/O with a much cleaner interface. This is now the recommended interface for writing a formatter. Butler continues to support the legacy `Formatter`, but you should plan to migrate to the new, simpler interface. (DM-26658)
- File templates are now allowed to define multiple alternate dimensions within a single field. Use the `|` separator to specify alternatives. For example, rather than specifying the `day_obs` for both `visit` and `exposure`, they can now be combined as `{exposure.day_obs|visit.day_obs:?}`. This can be useful if you want, say, a `group` dimension to be included but not if `exposure` is also in the dataId: `{exposure.obs_id|group}` would pick the `exposure` `obs_id` in preference to `group`, but use `group` if no `exposure` is defined. (DM-44147)
- Added `--no-track-file-attrs` to `butler import` (and the associated import API) and `butler ingest-files` commands to allow an import/ingest to disable the calculation of file sizes on ingest. This can be useful if you are importing thousands of files from an object store where the file size determination can take a significant amount of time. (DM-45237)
- The `ParquetFormatter` now declares it `can_accept` Arrow tables, Astropy tables, Numpy tables, and pandas DataFrames. This means that we have complete lossless storage of any parquet-compatible type into a datastore that has declared a different type; e.g., an astropy table with units can be persisted into a DataFrame storage class without those units being stripped. Also added `can_accept` to the `InMemoryDatastore` delegates, and now one `ArrowTableDelegate` handles all the parquet-compatible datasets. (DM-45431)
- Added a `--collections` option to `butler query-dataset-types` to allow the resultant dataset types to be constrained to those that are used by specific collections.
- Changed the `Butler.collections` property to be a `ButlerCollections` instance. This object can still act as a sequence equivalent to `ButlerCollections.defaults` but adds new APIs for querying and manipulating collections. Any methods with names starting with `x_` are deemed to be an experimental API that may change in the future. (DM-45738)

- Region overlap queries can now use points as regions. Points can be specified as `region OVERLAPS POINT(ra, dec)`, or by binding an `lsst.sphgeom.LonLat` or `astropy.coordinates.SkyCoord` value. (At the moment, this feature is only available when using the new query system.) (DM-45752)
- Added an expiration mode of "disabled" to the datastore cache manager. This allows an environment variable to be used to disable caching completely, or allows a default configuration to be disabled and environment variables to enable caching. (DM-45775)
- A new query system and interface is now available using `butler.query()` as a context manager. This new system is much more flexible and supports far more expressive queries, and no longer requires the results to be placed in a `set` to remove duplication.
- Added `butler.query_datasets()`, `butler.query_dimension_records()` and `butler.query_data_ids()` as replacements for the `butler.registry` equivalents. These use the new query system and are preferred over the old interfaces.
- The experimental collections querying interface is now public and called `butler.collections.query_info` and `butler.collections.query`.
- The command line tools `query-datasets`, `associate`, `retrieve-artifacts` and `transfer-datasets` now support a `--limit` parameter. The default for all except `associate` (which defaults to no limit) is to limit the number of results to 10,000. A warning will be issued if the cap is hit.
- The command line tools `query-datasets`, `associate`, `retrieve-artifacts` and `transfer-datasets` now support `--order-by` to control the sorting in conjunction with `--limit`. For `query-datasets` this will also control the sorting of the reported tables. (DM-45872)

- Added `Butler.clone()`, which lets you make a copy of a Butler instance, optionally overriding the default collections/run/data ID. (DM-46298)
- Updated the parquet formatter to use `fsspec`, which allows direct access to columns in S3, WebDAV, etc. (DM-46575)
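The fallback behavior of `|`-separated template fields described above (DM-44147) can be sketched in plain Python. This is an illustration of the fallback semantics only, not the real file-template parser; the data IDs are represented as plain nested dicts:

```python
def resolve_field(data_id, field):
    """Resolve a template field that may contain '|'-separated alternatives,
    e.g. 'exposure.obs_id|group', returning the value of the first
    alternative whose dimension is present in the data ID.  A trailing
    ':?' marks the whole field as optional."""
    optional = field.endswith(":?")
    if optional:
        field = field[:-2]
    for alternative in field.split("|"):
        name, _, attr = alternative.partition(".")
        if name in data_id:
            record = data_id[name]
            # 'exposure.obs_id' looks up obs_id inside the exposure record;
            # a bare name like 'group' returns the value directly.
            return record[attr] if attr else record
    if optional:
        return None
    raise KeyError(f"no alternative of {field!r} found in data ID")


# 'exposure' absent, so the 'group' alternative is used.
print(resolve_field({"group": "g1"}, "exposure.obs_id|group"))  # g1
# 'exposure' present: its obs_id wins over 'group'.
print(resolve_field({"exposure": {"obs_id": "e42"}, "group": "g1"},
                    "exposure.obs_id|group"))  # e42
```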
API Changes

- Added `DatastoreCacheManager.create_disabled()` to create a cache manager which is disabled by default but can be enabled via the environment. (DM-45775)
- The internal import backend classes, such as `YamlRepoImportBackend`, have been changed to use a butler rather than a registry. These are used by `butler.import_()`, but there should be no external impact from this change. (DM-45791)
- Added `DimensionGroup.region_dimension` and `DimensionGroup.timespan_dimension` properties to make it easy to ask which dimension in the group is the best one to use for region or time calculations. (DM-45860)
Bug Fixes

- Fixed an issue preventing dataset types with group dimensions from being put into a Butler repo. (DM-43020)
- Worked around `pandas` bugs when using non-floating-point masked columns. (DM-43925)
- Postgres database connections are now checked for liveness before they are used, significantly reducing the chance of exceptions being thrown due to stale connections. (DM-44050)
- Fixed handling of dataset types that use `healpix11` dimensions; previously they caused exceptions in many query operations. (DM-45119)
- We no longer try to create the datastore root at startup for non-POSIX filesystems, to fix an issue where this would fail on read-only repositories stored on S3/HTTP/GS. (DM-45140)
- Fixed a bug where datetime columns would serialize to parquet from pandas but not from astropy or numpy. (DM-45386)
- Fixed an issue where boolean metadata columns (like `exposure.can_see_sky` and `exposure.has_simulated`) were not usable in `where` clauses for Registry query functions. These column names can now be used as a boolean expression, for example `where="exposure.can_see_sky"` or `where="NOT exposure.can_see_sky"`. (DM-45680)
- Fixed a bug in `butler query-datasets` that incorrectly rejected a find-first query against a chained collection as having a glob. (DM-46339)
- Fixed an issue where default data IDs were not constraining query results in the new query system. (DM-46347)
- Fixed support for multiple-instrument (and multiple-skymap) `where` expressions in the new query system. (DM-46401)
- Fixed an issue where `query_datasets` would sometimes fail when searching in a single run collection. (DM-46430)
- Fixed the return type of `arrow_to_numpy` so that a masked record array is returned if any of the columns in the arrow table includes nulls. Previously the masks were ignored and fill values were visible and used in calculations. (DM-46563)
- Fixed an issue where the new query system was rejecting numpy integers used in data IDs or bind values. (DM-46711)
Performance Enhancement

- Increased the Postgres connection pool size, fixing an issue where multi-threaded services would re-create the database connection excessively. (DM-44050)
Other Changes and Additions

- Added `QPEnsemble` and `PZModel` to `datastores/formatters.yaml` and `storageClasses.yaml` to enable storage of the machine learning models used by photo-z algorithms, as well as the photo-z estimates produced by those algorithms. (DM-45541)
- Added storage classes for `lsst.daf.butler.Timespan` and `lsst.pipe.base.utils.RegionTimeInfo`. (DM-43020)
- `Butler.transfer_from()` has been modified to allow a dataset type mismatch between the source butler and the target butler. For this to work, converters must be registered for both directions, such that the source python type can be converted to the target python type and vice versa. Without supporting bidirectional conversions there will be inconsistencies in the behavior of `butler.get()` between transferred datasets and those that were stored natively. (DM-44280)
- Added helpful exception notes when Parquet serialization fails. (DM-44399)
- File ingest no longer checks that every file exists. This can take a very long time if thousands of files are being ingested from an object store. Now at most 200 files will be checked. Whether all files are subsequently checked depends on the transfer mode and whether `--no-track-file-attrs` is enabled. For `direct` or in-place ingest coupled with `--no-track-file-attrs` the file existence might never be verified. (DM-45237)
- The command-line tools have been modified to use the new query system and interface. The only user-visible changes are that the `--no-check` and `--offset` options are no longer used, since they are not supported by the new system. (DM-45556)
- Moved `CollectionType` to the top level of the package hierarchy. There should be no change visible to external users, but if you were previously using the deprecated `from lsst.daf.butler.registry import CollectionType`, please change to `from lsst.daf.butler import CollectionType` (which has always worked). (DM-45767)
- Enabled the remote butler to utilize a datastore cache. By default, clients created using a factory method will use a disabled cache that can be enabled by an environment variable, and clients created from `Butler()` will use a default cache configuration. (DM-45775)
- Updated the default version of the `datasets` manager; new Butler repositories will use TAI nanoseconds for the `ingest_date` column instead of database-native timestamps. (DM-46601)
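The capped existence check described above (DM-45237) amounts to verifying only the first N files rather than the whole ingest list. A minimal sketch, assuming a simple "check the first 200" policy (the real ingest code may choose the subset differently):

```python
import tempfile
from pathlib import Path

MAX_EXISTENCE_CHECKS = 200  # cap described in the changelog entry above


def spot_check_exists(paths, limit=MAX_EXISTENCE_CHECKS):
    """Verify that at most ``limit`` of the given files exist, raising
    FileNotFoundError on the first missing file.  Files beyond the cap
    are not checked at all."""
    for path in paths[:limit]:
        if not path.exists():
            raise FileNotFoundError(path)


# Demonstrate with a few temporary files.
tmp = Path(tempfile.mkdtemp())
files = []
for i in range(3):
    p = tmp / f"f{i}.fits"
    p.touch()
    files.append(p)
spot_check_exists(files)  # passes: all three exist and are under the cap
```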
An API Removal or Deprecation

- Removed the `components` parameter from registry APIs.
- Dropped support for regular expressions (`re.Pattern`) in dataset type expressions. Wildcard globs are still supported. (DM-36457)

- Removed `DimensionGraph` and the `Mapping` interface to `DataCoordinate`, along with most other public interfaces that utilize `DimensionElement` instances instead of just their string names.
- The `Butler.collection_chains` property is now deprecated. Please use `Butler.collections` instead. (DM-45738)
- Regular expressions in collection and dataset type patterns are now deprecated. (Shell-like globs will continue to be supported.)
- Materializing dataset queries into temporary tables is now deprecated. (Materializing data ID queries will continue to be supported.)
- The `datasetTypes` argument to `Registry.queryCollections` is now deprecated. (This parameter has never had any effect.)
- We will soon stop raising `DataIdValueError` exceptions for typos and other bad values in query expressions like `instrument='HsC'`. Instead, these queries will return an empty iterable of results.
- Using HTM and HEALPix spatial dimensions like `htm11` or `healpix10` in data ID constraints passed to queries is now deprecated. The exception is `htm7`, which will continue to work.
- The `--no-check` parameter to `butler query-dimension-records` is now deprecated.
- The `offset` argument to `limit()` for `Registry.queryDataIds` and `Registry.queryDimensionRecords` result objects is now deprecated.
- The `--offset` option for `butler query-data-ids` and `butler query-datasets` is no longer supported and will raise an exception if you attempt to use it.
- It will soon become mandatory to explicitly provide `--collections` and a dataset type search when calling `butler query-datasets`.
- Using `Butler.collections` to get the list of default collections is now deprecated. Use `Butler.collections.defaults` instead. (DM-46599)
Butler 27.0.0 (2024-05-28)

Now supports Python 3.12.

New Features

- Updated the open-source license to allow the code to be distributed under either the GPLv3 or BSD 3-clause license. (DM-37231)
- Added a new storage class and formatter for `NNModelPackagePayload` – an interface between butler and pretrained neural networks, currently implemented in pytorch. (DM-38454)
- Improved support for finding calibrations and spatially-joined datasets as follow-ups to data ID queries. (DM-38498)
- Added a storage class and associated formatter for the Spectractor `FitParameters` class, which holds the fitted `LIBRADTRAN` atmospheric parameters. (DM-38745)
- Added support for serialization and deserialization of Arrow schemas via Parquet, and added support for translation of `doc` and `units` to/from arrow/astropy schemas. (DM-40582)
- Added `DimensionElement.schema` as a less SQL-oriented way to inspect the fields of a `DimensionRecord`.
- Also added two high-level containers (`DimensionRecordSet` and `DimensionRecordTable`) for `DimensionRecord` objects, but these should be considered experimental and unstable until they are used in public `Butler` APIs. (DM-41113)
- Added new `Butler` APIs migrated from registry: `Butler.get_dataset_type()`, `Butler.get_dataset()`, and `Butler.find_dataset()`. (DM-41365)
- Butler server can now be configured to use a `ChainedDatastore`. (DM-41880)
- Added a new API, `Butler.transfer_dimension_records_from()`, to copy dimension records out of some refs and add them to the target butler.
- This and `Butler.transfer_from()` now copy related dimension records as well as the records associated directly with the refs. For example, if `visit` is being transferred, additional records such as `visit_definition` will also be copied. This requires a full Butler and not a limited Butler (such as the one backed by a quantum graph). (DM-41966)

- Added `LabeledButlerFactory`, a factory class for constructing Butler instances. This is intended for use in long-lived services that need to be able to create a Butler instance for each incoming client request. (DM-42188)
- Added a new optional dependency set, `remote`, which can be used to install the dependencies required by the client half of Butler client/server. (DM-42190)
- "Cloned" Butler instances returned from `Butler(butler=otherButler)` and `LabeledButlerFactory` no longer share internal state with their parent instance. This makes it safe to use the new instance concurrently with the original in separate threads. It is still unsafe to use a single `Butler` instance concurrently from multiple threads. (DM-42317)
- Released `DimensionUniverse` version 6:
  - `group` and `day_obs` are now true dimensions.
  - `exposure` now implies both `group` and `day_obs`, and `visit` implies `day_obs`.
  - Exported YAML files using universe version 1 and newer can be imported and converted to universe version 6. (DM-42636)

- The Butler repository index can now be configured by a new environment variable, `$DAF_BUTLER_REPOSITORIES`, which contains the configuration directly instead of requiring lookup via a URI. (DM-42660)
- Added a `can_see_sky` metadata field to the `exposure` dimension record (dimension universe v7). This field can indicate whether the detector received photons from the sky, taking into account the camera shutter and the dome and telescope alignment. (DM-43101)
- Added additional collection chain methods to the `Butler.collection_chains` interface: `extend_chain`, `remove_from_chain`, and `redefine_chain`. These methods are all "atomic" functions that can safely be used concurrently from multiple processes. (DM-43315)
- Added a `timespan` parameter to `Butler.get()` (for direct and remote butler). This parameter can be used to specify an explicit time for calibration selection without requiring a temporal coordinate to be included in the data ID. Additionally, if no time span is specified and none can be found in the data ID, a default full-range time span will be used for calibration selection. This allows a calibration to be selected if there is only one matching calibration in the collection. (DM-43499)
- Added a new method, `Butler.collection_chains.prepend_chain`. This allows you to insert collections at the beginning of a chain. It is an "atomic" operation that can safely be used concurrently from multiple processes. (DM-43671)
- Added a `MatchingKernel` storage class for persisting the PSF-matching kernel from image differencing. (DM-43736)
- Made `Timespan` a Pydantic model and added a `SerializableRegion` type alias that allows `lsst.sphgeom.Region` to be used directly as a Pydantic model field. (DM-43769)
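The chain-editing semantics of `extend_chain`, `prepend_chain`, and `remove_from_chain` described above can be sketched with a plain Python list standing in for a chained collection's ordered children. This illustrates only the ordering behavior; the real methods additionally guarantee atomicity across concurrent processes, which a bare list cannot:

```python
# A chained collection's children, in search order; names are made up.
chain = ["runs/b", "runs/c"]


def extend_chain(chain, *children):
    chain.extend(children)        # append to the end of the search order


def prepend_chain(chain, *children):
    chain[:0] = list(children)    # insert at the front: searched first


def remove_from_chain(chain, *children):
    for child in children:
        chain.remove(child)


prepend_chain(chain, "runs/a")
extend_chain(chain, "runs/d")
remove_from_chain(chain, "runs/c")
print(chain)  # ['runs/a', 'runs/b', 'runs/d']
```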
API Changes

- Deprecated most public APIs that use `Dimension` or `DimensionElement` objects. This implements RFC-834, deprecating the `DimensionGraph` class (in favor of the new, similar `DimensionGroup`) and a large number of `DataCoordinate` methods and attributes, including its `collections.abc.Mapping` interface. This includes:
  - use `DataCoordinate.dimensions` instead of `DataCoordinate.graph` (likewise for arguments to `DataCoordinate.standardize`);
  - use `dict(DataCoordinate.required)` as a drop-in replacement for `DataCoordinate.byName()`, but consider whether you want `DataCoordinate.required` (a `Mapping` view rather than a `dict`) or `DataCoordinate.mapping` (a `Mapping` with all available key-value pairs, not just the required ones);
  - also use `DataCoordinate.mapping` or `DataCoordinate.required` instead of treating `DataCoordinate` itself as a `Mapping`, except square-bracket indexing, which is still very much supported;
  - use `DataCoordinate.dimensions.required.names` or `DataCoordinate.required.keys()` as a drop-in replacement for `DataCoordinate.keys().names` or `DataCoordinate.names`, but consider whether you actually want `DataCoordinate.dimensions.names` or `DataCoordinate.mapping.keys` instead.

  `DimensionGroup` is almost identical to `DimensionGraph`, but it and its subset attributes are not directly iterable (since those iterate over `Dimension` and `DimensionElement` objects); use the `.names` attribute to iterate over names instead (just as names could be iterated over in `DimensionGraph`). `DimensionGraph` is still used in some `lsst.daf.butler` APIs (most prominently `DatasetType.dimensions`) that may be accessed without deprecation warnings being emitted, but iterating over that object or its subset attributes will yield deprecation warnings. `DimensionGraph` is still accepted along with `DimensionGroup` without warning in most public APIs. When `DimensionGraph` is removed, methods and properties that return `DimensionGraph` will start returning `DimensionGroup` instead.

  Rare code (mostly in downstream middleware packages) that does need access to `Dimension` or `DimensionElement` objects should obtain them directly from the `DimensionUniverse`. For the pattern of checking whether a dimension is a skypix level, test whether its name is in `DimensionUniverse.skypix_dimensions` or `DimensionGroup.skypix` instead of obtaining a `Dimension` instance and calling `isinstance(dimension, SkyPixDimension)`. (DM-34340)
- Added a new `transfer_option_no_short` that creates the `--transfer` option without the associated `-t` alias. (DM-35599)
- The `Butler` class became an abstract base class; the original `Butler` was renamed to `DirectButler`.
- Clients that need access to the `DirectButler` class will have to import it from `lsst.daf.butler.direct_butler`.
- `Butler.from_config(...)` should be used to make `Butler` instances. `Butler(...)` still works and is identical to `Butler.from_config(...)`, but will generate `mypy` errors. (DM-41116)

- `SqlRegistry` no longer inherits from `Registry` or any other interface, and has been moved to the `registry.sql_registry` module. (DM-41235)
- Added a `Butler._query` context manager which will support building complex queries for data in Butler. For now `Butler._query` provides access to just three convenience methods similar to the query methods in `Registry`. This new API should be considered experimental and potentially unstable; its use should be limited to downstream middleware code for now. (DM-41761)
- Added a `dry_run` parameter to `Butler.transfer_from` to allow the operation to run without actually doing the transfer. (DM-42306)

- The `Datastore` base class was changed so that subclasses are no longer required to have the same constructor parameters as the base class. Subclasses are now required to implement `_create_from_config`, for creating an instance from the `Datastore.fromConfig` static method, and `clone`, for creating a copy of an existing instance. (DM-42317)
- Added `Timespan.from_day_obs()` to construct a 24-hour time span from an observing day specified as a YYYYMMDD integer. (DM-42636)
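The YYYYMMDD-to-timespan conversion behind `Timespan.from_day_obs()` can be sketched with the standard `datetime` module. This is illustrative only: it assumes the observing-day boundary falls at UTC midnight, whereas the real method works with the observatory's definition of the day boundary, which may be offset:

```python
from datetime import datetime, timedelta, timezone


def day_obs_span(day_obs):
    """Turn a YYYYMMDD observing-day integer into a 24-hour (begin, end)
    pair of datetimes, assuming (for illustration) a UTC-midnight
    day boundary."""
    begin = datetime.strptime(str(day_obs), "%Y%m%d").replace(tzinfo=timezone.utc)
    return begin, begin + timedelta(days=1)


begin, end = day_obs_span(20240115)
print(begin.isoformat())  # 2024-01-15T00:00:00+00:00
```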
Bug Fixes

- Fixed QuantumGraph-load breakage introduced on DM-41043. (DM-41164)
- `DirectButler.transfer_from` no longer requires expanded dataset refs under certain circumstances. However, providing expanded refs in advance is still recommended for efficiency. (DM-41165)
- Fixed caching in `DatasetRef` deserialization that caused the serialized storage class to be ignored. This caused intermittent failures when running pipelines that use multiple storage classes for the same dataset type. (DM-41562)
- Stopped accepting and ignoring unrecognized keyword arguments in `DimensionRecord` constructors. Passing an invalid field to a `DimensionRecord` now raises `TypeError`. This also prevents `DimensionRecord` construction from reinterpreting `timespan=None` as `timespan=Timespan(None, None)`. (DM-41724)
- Enabled collection-information caching in several contexts, especially during dataset query result iteration. This fixed a performance and database-load regression introduced on DM-41117, in which we emitted many redundant queries for collection information. (DM-42216)
- Fixed miscellaneous thread-safety issues in `DimensionUniverse`, `DimensionGroup`, and `StorageClassFactory`. (DM-42317)
- `butler query-collections --chains=TABLE` now lists children in search order, not alphabetical order. (DM-42605)
- Fixed a problem with serialization of `exposure` dimension records with Pydantic v2. (DM-42812)
- `Butler.exists` now throws a `NoDefaultCollectionError` when attempting to query for a `DataId` without specifying any collections to search. Previously it would return `False`, hiding the user error. (DM-42945)
- Reading masked parquet columns into astropy Tables now uses appropriate fill values. In addition, floating point columns will be filled with `NaN` instead of using a masked column. This fixes discrepancies when accessing masked columns with `.filled()` or not. (DM-43187)
- Reverted/fixed part of DM-43187. Now masked floating point columns will retain their masked status on read. The underlying array value and fill value are still `NaN`, for consistency when using `filled()` or not for these masked columns. (DM-43570)
- The `flatten` flag for the `butler collection-chain` CLI command now works as documented: it only flattens the specified children instead of flattening the entire collection chain. `registry.setCollectionChain` will no longer throw unique-constraint-violation exceptions when there are concurrent calls to this function. Instead, all calls will succeed and the last write will win. As a side effect of this change, if calls to `setCollectionChain` occur within an explicit call to `Butler.transaction`, other processes attempting to modify the same chain will block until the transaction completes. (DM-43671)
- Fixed an issue where `registry.setCollectionChain` would raise a `KeyError` when assigning to a collection that was present in the collection cache. (DM-43750)
Performance Enhancement
- `FileDatastore.knows()` no longer requires database I/O if its input `DatasetRef` has datastore records attached. (DM-41880)
- Made significant performance enhancements for transferring hundreds of thousands of datasets:
  - Datastore now declares to `ResourcePath` when a resource is known to be a file.
  - Sped up file template validation.
  - Only request dimension metadata for template formatting if that metadata is needed.
  - Sped up cloning of `Location` instances.
  - No longer merge formatter `kwargs` unless there is something to merge.
  - Declared when a file location is trusted to be within the datastore. (DM-42306)
 
Other Changes and Additions
- Reorganized internal subpackages, renamed modules, and adjusted symbol lifting. This included moving some symbols that we had always intended to be private (or public only to other middleware packages) but that were not clearly marked as such (e.g., with leading underscores) before. (DM-41043)
- Dropped support for Pydantic 1.x. (DM-42302)
- Created Dimension Universe 5, which increases the size of the instrument name field in the `instrument` dimension from 16 to 32 characters. (DM-42896)
An API Removal or Deprecation
- Removed dataset type component query support from all `Registry` methods. The main `Registry.query*` methods now warn if a `components` parameter is given and raise if it has a value other than `False`. The `components` parameter will be removed completely after v27.
- Removed the `CollectionSearch` class. A simple `tuple` is now used for this. (DM-36303)
 
- Removed various already-deprecated factory methods for `DimensionPacker` objects and their support code, as well as the concrete `ObservationDimensionPacker`. While `daf_butler` still defines the `DimensionPacker` abstract interface, all construction logic has moved to downstream packages. (DM-38687)
- Removed the `Butler.datastore` property. The datastore can no longer be accessed directly.
- Removed `Butler.datasetExists` (and the "direct" variant). Please use `Butler.exists()` and `Butler.stored()` instead.
- Removed `Butler.getDirect` and related APIs. `Butler.get()` et al. now use the `DatasetRef` directly if one is given.
- Removed the `run` and `idGenerationMode` parameters from `Butler.ingest()`. They were no longer being used.
- Removed the `--reuse-ids` option from the `butler import` command line. This option was no longer needed now that UUIDs are used throughout.
- Removed the `reconstituteDimensions` parameter from `Quantum.from_simple`. (DM-40150)
 
Butler v26.0.0 (2023-09-22)
Now supports Python 3.11.
New Features
- Added the ability to remove multiple dataset types at once, including expansion of wildcards, with `Registry.removeDatasetType` and `butler remove-dataset-type`. (DM-34568)
- Added the `ArrowNumpyDict` storage class to the Parquet formatter. (DM-37279)
- Added support for columns with array values (1D and multi-dimensional) in Parquet tables accessed via arrow/astropy/numpy. Pandas does not support array-valued columns. (DM-37425)
- Integrated an experimental Butler server into the distribution. `lsst.daf.butler.server` will likely not be in this location permanently. The interface is also evolving and should be considered extremely unstable. Some testing of the remote registry code has been included. (DM-37609)
- Added support for writing/reading masked columns in astropy tables. This also adds support for masked columns in pandas dataframes, with limited support for conversion between the two. (DM-37757)
- Dimension records are now available via attribute access on `DataCoordinate` instances, allowing syntax like `data_id.exposure.day_obs`. (DM-38054)
- Added default row groups (targeting a size of <~ 1 GB) for Parquet files. (DM-38063)
- `Butler.get()` and `Butler.put()` can now be used with resolved `DatasetRef`. (DM-38210)
- `Butler.transfer_from()` can now be used in conjunction with a `ChainedDatastore`. Additionally, datastore constraints are now respected. (DM-38240)
- Modified `Butler.import_()` (and by extension the `butler import` command line) to accept URIs for the directory and export file.
- Modified `butler ingest-files` to accept a remote URI for the table file. (DM-38492)
 
- Added support for multi-index dataframes with `DataFrameDelegate` and `InMemoryDatastore`. (DM-38642)
- Added new APIs to support the deprecation of `LimitedButler.datastore`:
  - `LimitedButler.get_datastore_roots` can be used to retrieve any root URIs associated with attached datastores. If a datastore does not support the concept it will return `None` for its root URI.
  - `LimitedButler.get_datastore_names` can be used to retrieve the names of the internal datastores.
  - `LimitedButler.get_many_uris` allows for the bulk retrieval of URIs from a list of refs.
  - Also made `getURI` and `getURIs` available for `LimitedButler`. (DM-39915)
 
- Modified to fully support Pydantic version 2.x and version 1.x. (DM-40002; DM-40303)
API Changes
- Added new APIs for checking dataset existence:
  - `stored` checks whether the datastore artifact(s) exist for a single `DatasetRef`.
  - `stored_many` is a bulk version of `stored` that can be used for many `DatasetRef`.
  - `exists` checks whether registry and datastore know about a single `DatasetRef` and can optionally check for artifact existence. The results are returned in a `Flag` object (specifically `DatasetExistence`) that evaluates to `True` if the dataset is available for retrieval.
  - Additionally, `DatasetRef` now has a new method for checking whether two `DatasetRef` only differ by compatible storage classes. (DM-32940)
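The truthiness behaviour of such an existence flag can be sketched with a toy `enum.Flag`. The names and bit layout here are illustrative only, not the actual `DatasetExistence` definition:

```python
from enum import Flag

class ToyDatasetExistence(Flag):
    """Toy stand-in for an existence flag like DatasetExistence."""
    UNRECOGNIZED = 0
    RECORDED = 1     # known to registry
    DATASTORE = 2    # known to datastore
    ARTIFACT = 4     # artifact confirmed to exist
    # Composite member: everything needed for retrieval.
    VERIFIED = RECORDED | DATASTORE | ARTIFACT

    def __bool__(self) -> bool:
        # Truthy only when the dataset is actually retrievable.
        return (self & ToyDatasetExistence.VERIFIED) == ToyDatasetExistence.VERIFIED

print(bool(ToyDatasetExistence.RECORDED))  # registry-only: False
print(bool(ToyDatasetExistence.VERIFIED))  # fully available: True
```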
- The `lsst.daf.butler.Butler.transfer_from` method now accepts a `LimitedButler` as a source Butler. In cases when a full Butler is needed as a source it will try to cast it to a `Butler`. (DM-33497)
- Creating an unresolved dataset reference now issues an `UnresolvedRefWarning` and is deprecated (support was subsequently removed).
- A resolved `DatasetRef` can now be created by specifying the run without the ID; the constructor will automatically generate an ID. Previously this was an error. To support ID generation, a new optional parameter `id_generation_mode` can now be given to the constructor to allow the ID to be constructed in different ways. (DM-37703)
 
- The `DatasetRef` constructor now requires the `run` argument in all cases and always constructs a resolved reference.
- The methods `DatasetRef.resolved()`, `DatasetRef.unresolved()`, and `DatasetRef.getCheckedId()` were removed. (DM-37704)
 
- Added a `StorageClassDelegate.copy()` method. By default this method calls `copy.deepcopy()`, but subclasses can override it as needed. (DM-38694)
- The `Database.fromUri` and `Database.makeEngine` methods now accept `sqlalchemy.engine.URL` instances in addition to strings. (DM-39484)
- Added a new parameter `without_datastore` to the `Butler` and `ButlerConfig` constructors to allow a butler to be created that cannot access a datastore. This can be helpful if you want to query registry without the overhead of the datastore. (DM-40120)
Bug Fixes
- Fixed a race condition in the datastore cache involving the possibility of multiple processes trying to retrieve the same file simultaneously and one of those processes deleting the file on exit of the context manager. (DM-37092)
- Made `Registry.findDataset` respect the storage class of a `DatasetType` that is passed to it. This also makes direct `PipelineTask` execution respect storage class conversions in the same way that execution butler already did. (DM-37450)
- Can now properly retrieve astropy full table metadata with `butler.get`. (DM-37530)
- Fixed an order-of-operations bug in the query system (and as a result, `QuantumGraph` generation) that manifested as a "Custom operation find_first not supported by engine iteration" message. (DM-37625)
- `Butler.put` now raises the correct exception for duplicate put attempts for a `DatasetRef` with the same dataset ID. (DM-37704)
- Fixed parsing of order-by terms to treat direct references to dimension primary key columns as references to the dimensions. (DM-37855)
- Fixed bugs involving CALIBRATION-collection skipping and long dataset type names that were introduced on DM-31725. (DM-37868)
- Now check for big-endian arrays when serializing to Parquet. This allows astropy FITS tables to be easily serialized. (DM-37913)
- Fixed bugs in spatial query constraints introduced in DM-31725. (DM-37930)
- Fixed additional bugs in spatial query constraints introduced in DM-31725. (DM-37938)
- Fixed occasional crashes in the `Butler.refresh()` method due to a race condition in dataset types refresh. (DM-38305)
- Fixed query manipulation logic to more aggressively move operations from Python postprocessing to SQL. This fixes a bug in `QuantumGraph` generation that occurs when a dataset type that is actually present in an input collection has exactly the same dimensions as the graph as a whole, manifesting as a mismatch between `daf_relation` engines. (DM-38402)
- Added a check for `ListType` when pandas converts a list object into Parquet. (DM-38845)
- A few registry methods treated an empty collection list the same way as `None`, meaning that the Registry-default run collection was used. This has been fixed so that such queries always return an empty result set, with explicit "doomed by" messages. (DM-38915)
- Fixed a bug in `butler query-data-ids` that caused a cryptic "the query has deferred operations…" error message when a spatial join is involved. (DM-38943)
- Fixed more issues with storage class conversion. (DM-38952)
- Fixed a SQL generation bug for queries that involve the common `skypix` dimension and at least two other spatial dimensions. (DM-38954)
- Fixed bugs in storage class conversion in `FileDatastore`, as used by `QuantumBackedButler`. (DM-39198)
- Fixed a bug in initializing a PostgreSQL registry that resulted in a "password authentication failed" error. The bug appeared during the SQLAlchemy 2.0 transition, which changed the default rendering of a URL to string. (DM-39484)
- Fixed a rare bug in follow-up dataset queries involving relation commutators. This occurred when building QuantumGraphs where a "warp" dataset type was an overall input to the pipeline and present in more than one input RUN collection. (DM-40184)
- Ensured `Datastore` record exports (as used in quantum-backed butler) are deduplicated when necessary. (DM-40381)
Performance Enhancement
- When passing lazy query-results objects directly to various registry methods (`associate`, `disassociate`, `removeDatasets`, and `certify`), query and process one dataset type at a time instead of querying for all of them and grouping by type in Python. (DM-39939)
Other Changes and Additions
- Rewrote the registry query system using the new `daf_relation` package. This change should be mostly invisible to users, but there are some subtle behavior changes:
  - `Registry.findDatasets` now respects the given storage class when passed a full `DatasetType` instance, instead of replacing it with the storage class registered with that dataset type. This causes storage class overrides in `PipelineTask` input connections to be respected in more contexts as well; in at least some cases these were previously being incorrectly ignored.
  - `Registry.findDatasets` now utilizes cached summaries of which dataset types and governor dimension values are present in each collection. This should result in fewer and simpler database calls, but it does make the result vulnerable to stale caches (which, like `Registry` methods more generally, must be addressed manually via calls to `Registry.refresh`).
  - The diagnostics provided by the `explain_no_results` methods on query result objects (used prominently in the reporting on empty quantum graph builds) have been significantly improved, though they now use `daf_relation` terminology that may be unfamiliar to users.
  - `Registry` is now more consistent about raising `DataIdValueError` when given invalid governor dimension values, while not raising (but providing `explain_no_results` diagnostics) for all other invalid dimension values, as per RFC-878.
  - `Registry` methods that take a `where` argument are now typed to expect a `str` that is not `None`, with the default no-op value now an empty string (before, either an empty `str` or `None` could be passed, and meant the same thing). This should only affect downstream type checking, as the runtime code still just checks whether the argument evaluates as `False` in a boolean context. (DM-31725)
 
- Added dimensions config entries that declare that the `visit` dimension "populates" various dimension elements that define many-to-many relationships. In the future, this will be used to ensure the correct records are included in exports of dimension records. (DM-34589)
- Added converter config to allow `lsst.ip.isr.IntermediateTransmissionCurve` and subclasses to be used for `lsst.afw.image.TransmissionCurve`. (DM-36597)
- `Butler.getURIs` no longer checks the file system to see if the file exists before returning a URI if the datastore thinks it knows about the file. This does mean that if someone has removed the file from the file system without deleting it from the datastore, a URI could be retrieved for something that does not exist. (DM-37173)
- Enhanced the JSON and YAML formatters so that they can both handle dataclasses and Pydantic models (previously JSON supported Pydantic and YAML supported dataclasses).
- Rationalized the storage class conversion handling to always convert from a `dict` to the original type, even if the caller is requesting a `dict`. Without this change it was possible to have some confusion where a Pydantic model's serialization did not match the `dict`-like view it was emulating. (DM-37214)
 
- Added an `obsCoreTableManager` property to `Registry` for access to the ObsCore table manager. This will be set to `None` when the repository lacks an ObsCore table. It should only be used by a limited number of clients, e.g. `lsst.obs.base.DefineVisitsTask`, which need to update the table. (DM-38205)
- Modified `Butler.ingest()` such that it can now ingest resolved `DatasetRef`. If unresolved refs are given (previously a requirement for ingest, and no longer possible) they are resolved internally, but a warning is issued.
- Added `repr()` support for the `RegistryDefaults` class. (DM-38779)
 
- The behavior of `FileDatastore.transfer_from()` has been clarified regarding what to do when an absolute URI (from a direct ingest) is found in the source butler. If `transfer="auto"` (the default) the absolute URI will be stored in the target butler. If any other transfer mode is used the absolute URI will be copied/linked into the target butler. (DM-38870)
- Made minor modifications to the StorageClass system to support mock storage classes (in `pipe_base`) for testing. (DM-38952)
- Replaced the use of the `lsst.utils.ellipsis` mypy workaround with the native type `types.EllipsisType`, available since Python 3.10. (DM-39410)
- Moved Butler repository aliasing resolution into `ButlerConfig` so that it is available everywhere without having to do the resolving each time. (DM-39563)
- Added the ability for some butler primitives to be cached and re-used on deserialization through a special interface. (DM-39582)
- Replaced usage of `Butler.registry.dimensions` with `Butler.dimensions`.
- Modernized type annotations.
- Fixed some documentation problems.
- Made some minor modernizations to use set notation and f-strings. (DM-39605)
 
- Changed all Butler code and tests to use conforming DataIDs. Removed the fake `DataCoordinate` classes from the datastore tests. Improved type annotations in some test files. (DM-39665)
- Added various optimizations to `QuantumGraph` loading. (DM-40121)
- Fixed docs on referring to timespans in queries, and made related error messages more helpful. (DM-38084)
- Clarified that `butler prune-datasets --purge` always removes dataset entries, and clarified when the run argument is used. (DM-39086)
An API Removal or Deprecation
- Deprecated methods for constructing or using `DimensionPacker` instances. The `DimensionPacker` interface is not being removed, but all concrete implementations will now be downstream of `daf_butler` and will not satisfy the assumptions of the current interfaces for constructing them. (DM-31924)
- `Butler.datasetExists` has been deprecated and will be removed in a future release. It has been replaced by `Butler.stored()` (specifically to check if the datastore has the artifact) and `Butler.exists()`, which will check registry and datastore and optionally check whether the artifact exists. (DM-32940)
- Removed the `Spectraction` storage class. This was a temporary storage class added for convenience during development, which was a roll-up-and-pickle of all the potentially relevant parts of the extraction. All the necessary information is now stored inside the `SpectractorSpectrum` storage class. (DM-33932)
- Removed the deprecated `ButlerURI` (use `lsst.resources.ResourcePath` instead).
- Removed the deprecated `kwargs` parameter from `DeferredDatasetHandle`.
- Removed the deprecated `butler prune-collection` command.
- Removed the deprecated `checkManagerDigests` from butler registry. (DM-37534)
 
- Deprecated `Butler.getDirect()` and `Butler.putDirect()`. We have modified the `get()` and `put()` variants to recognize the presence of a resolved `DatasetRef` and use it directly. For `get()` we no longer unpack the `DatasetRef` and re-run the query, but return exactly the dataset being requested.
- Removed `Butler.pruneCollections`. This method was replaced by `Butler.removeRuns` and `Registry.removeCollections` a long time ago, and the command-line interface was removed previously. (DM-38210)
 
- Code that calculates schema digests was removed; registry will no longer store digests in the database. Previously we saved schema digests, but we have not verified them since w_2022_22 in v24.0. (DM-38235)
- Support for integer dataset IDs in registry has now been removed. All dataset IDs must now be `uuid.UUID`. (DM-38280)
- Removed support for non-UUID dataset IDs in `Butler.transfer_from()`. The `id_gen_map` parameter has been removed, and the `local_refs` parameter has been removed from `Datastore.transfer_from()`. (DM-38409)
- Deprecated the `reconstituteDimensions` argument to `Quantum.from_simple`. (DM-39582)
- The semi-public `Butler.datastore` property has now been deprecated. The `LimitedButler` API has been expanded such that there is no longer any need for anyone to access the datastore class directly. (DM-39915)
- The `lsst.daf.butler.registry.DbAuth` class has been moved to the `lsst-utils` package and can be imported from the `lsst.utils.db_auth` module. (DM-40462)
Butler v25.0.0 (2023-02-27)
This is the last release that can access data repositories using integer dataset IDs. Please either recreate these repositories or convert them to use UUIDs using the `butler migrate` tooling.
New Features
- Added a `StorageClass.is_type` method to compare a type with that of the storage class itself.
- Added keys, values, items, and iterator support for `StorageClassFactory`. (DM-29835)
 
- Updated the parquet backend to use Arrow Tables natively, and added converters to and from pandas DataFrames, Astropy Tables, and Numpy structured arrays. (DM-34874)
- `Butler.transfer_from()` can now copy dimension records as well as datasets. This significantly enhances the usability of this method when transferring between disconnected Butlers. The `butler transfer-datasets` command will transfer dimension records by default, but this can be disabled with the `--no-transfer-dimensions` option (which can be more efficient if you know that the destination Butler contains all the records). (DM-34887)
- `butler query-data-ids` will now determine default dimensions to use if a dataset type and collection is specified. The logical AND of all supplied dataset types will be used. Additionally, if no results are returned a reason will now be given in many cases. (DM-35391)
- Added `DataFrameDelegate` to allow DataFrames to be used with `lsst.pipe.base.InMemoryDatasetHandle`. (DM-35803)
- Added a `StorageClass.findStorageClass` method to find a storage class from a python type. (DM-35815)
- The optional dependencies of `lsst-resources` can be requested as optional dependencies of `lsst-daf-butler` and will be passed down to the underlying package. This allows callers of `lsst.daf.butler` to specify the type of resources they want to be able to access without being aware of the role of `lsst.resources` as an implementation detail. (DM-35886)
- Requires Python 3.10 or greater for better type annotation support. (DM-36174)
- Bind values in Registry queries can now specify a list/tuple of numbers for identifiers appearing on the right-hand side of an `IN` expression. (DM-36325)
- It is now possible to override the python type returned by `butler.get()` (if the types are compatible with each other) by using the new `readStorageClass` parameter. Deferred dataset handles can also be overridden. For example, to return an `astropy.table.Table` from something that usually returns an `lsst.afw.table.Catalog` you would do:

  ```python
  table = butler.getDirect(ref, readStorageClass="AstropyTable")
  ```

  Any parameters given to the `get()` must still refer to the native storage class. (DM-4551)
API Changes
- Deprecated support for accessing data repositories with integer dataset IDs, and disabled creation of new data repositories with integer dataset IDs, as per RFC-854. (DM-35063)
- `DimensionUniverse` now has an `isCompatibleWith()` method to check if two universes are compatible with each other. The initial test is very basic but can be improved later. (DM-35082)
- Deprecated support for components in `Registry.query*` methods, per RFC-879. (DM-36312)
- Multiple minor API changes to query methods from RFC-878 and RFC-879. This includes:
  - `CollectionSearch` is deprecated in favor of `Sequence[str]` and the new `CollectionWildcard` class.
  - `queryDatasetTypes` and `queryCollections` now return `Iterable` (representing an unspecified in-memory collection) and `Sequence`, respectively, rather than iterators.
  - `DataCoordinateQueryResults.findDatasets` now raises `MissingDatasetTypeError` when the given dataset type is not registered.
  - Passing regular expressions and other patterns as dataset types to `queryDataIds` and `queryDimensionRecords` is deprecated.
  - Passing unregistered dataset types to `queryDataIds` and `queryDimensionRecords` is deprecated; in the future this will raise `MissingDatasetTypeError` instead of returning no query results.
  - The query result class `explain_no_results` now returns `Iterable` instead of `Iterator`. (DM-36313)
 
- A method has been added to `DatasetRef` and `DatasetType`, named `overrideStorageClass`, to allow a new object to be created that has a different storage class associated with it. (DM-4551)
Bug Fixes
- Fixed a bug in the parquet reader where a single string column name would be interpreted as an iterable. (DM-35803)
- Fixed a bug in the `elements` argument to various export methods that prevented it from doing anything. (DM-36111)
- Fixed a bug in `DatastoreCacheManager` that was triggered when two processes tried to cache the same dataset simultaneously. (DM-36412)
- Fixed a bug in pandas `dataframe` to arrow conversion that would crash with some pandas object data types. (DM-36775)
- Fixed a bug in pandas `dataframe` to arrow conversion that would crash with partially nulled string columns. (DM-36795)
Other Changes and Additions
- For command-line options that split on commas, it is now possible to specify parts of the string not to split by using `[]` to indicate comma-separated list content. (DM-35917)
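The bracket rule amounts to splitting on commas only at bracket depth zero. A minimal sketch of that logic (a hypothetical helper, not the actual Click callback used by the CLI):

```python
def split_commas(value: str) -> list[str]:
    """Split on commas, but treat text inside [] as a single item.
    Illustrative sketch of the CLI rule, not daf_butler's own code."""
    parts: list[str] = []
    buf: list[str] = []
    depth = 0
    for ch in value:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth = max(0, depth - 1)
        elif ch == "," and depth == 0:
            # Top-level comma: close the current item.
            parts.append("".join(buf))
            buf = []
            continue
        buf.append(ch)
    parts.append("".join(buf))
    return parts

print(split_commas("a,b,[1,2,3],c"))  # ['a', 'b', '[1,2,3]', 'c']
```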
- Moved the typing workaround for the built-in `Ellipsis` (`...`) singleton to `lsst.utils`. (DM-36108)
- Now define regions for data IDs with multiple spatial dimensions as the intersection of those dimensions' regions. (DM-36111)
- Added support for the in-memory datastore to roll back a call to `datastore.trash()`. This required that the `bridge.moveToTrash()` method now take an additional `transaction` parameter (which can be `None`). (DM-36172)
- Restructured internal Registry query system methods to share code better and prepare for more meaningful changes. (DM-36174)
- Removed unnecessary table-locking in dimension record insertion. Prior to this change, we used explicit full-table locks to guard against a race condition that wasn't actually possible, which could lead to deadlocks in rare cases involving insertion of governor dimension records. (DM-36326)
- Chained datastore can now support the "move" transfer mode for ingest. Files are copied to each child datastore unless only one child datastore is accepting the incoming files, in which case "move" is used. (DM-36410)
- `DatastoreCacheManager` can now use an environment variable, `$DAF_BUTLER_CACHE_DIRECTORY_IF_UNSET`, to specify a cache directory to use if no explicit directory has been specified by configuration or by the `$DAF_BUTLER_CACHE_DIRECTORY` environment variable. Additionally, a `DatastoreCacheManager.set_fallback_cache_directory_if_unset()` class method has been added that will set this environment variable with a suitable value. This is useful for multiprocessing where each forked or spawned subprocess needs to share the same cache directory. (DM-36412)
- Added support for `ChainedDatastore.export()`. (DM-36517)
- Reworked transaction and connection management for compatibility with transaction-level connection pooling on the server. Butler clients still hold long-lived connections, via delegation to SQLAlchemy's connection pooling, which can handle disconnections transparently most of the time. But we now wrap all temporary table usage and cursor iteration in transactions. (DM-37249)
An API Removal or Deprecation
- Removed deprecated `filterLabel` exposure component access. (DM-27811)
Butler v24.0.0 (2022-08-26)
New Features
- Support LSST-style visit definitions where a single exposure is part of a set of related exposures all taken with the same acquisition command. Each exposure knows the "visit" it is part of:
  - Modified the `exposure` dimension record to include `seq_start` and `seq_end` metadata.
  - Modified the `visit` record to include a `seq_num` field.
  - Removed the `visit_system` dimension and added a `visit_system_membership` record to allow a visit to be associated with multiple visit systems. (DM-30948)
 
- `butler export-calibs` now takes a `--transfer` option to control how data are exported (use `direct` to do an in-place export) and a `--datasets` option to limit the dataset types to be exported. It also now takes a default collections parameter (all calibration collections). (DM-32061)
- Iterables returned from the registry methods `queryDataIds` and `queryDimensionRecords` have two new methods, `order_by` and `limit`. (DM-32403)
- Builds using `setuptools` now calculate versions from the Git repository, including the use of alpha releases for those associated with weekly tags. (DM-32408)
- Butler can now support lookup of repositories by label if the user environment is correctly configured. This is done using the new `get_repo_uri()` and `get_known_repos()` APIs. (DM-32491)
- Added a butler command-line command called `butler remove-collections` that can remove non-RUN collections. (DM-32687)
- Added a butler command-line command called `butler remove-runs` that can remove RUN collections and contained datasets. (DM-32831)
- It is now possible to register type conversion functions with storage classes. This can allow a dataset type definition to change storage class in the registry whilst allowing datasets that have already been serialized using one python type to be returned using the new python type. The `storageClasses.yaml` definitions can now look like:

  ```yaml
  TaskMetadata:
    pytype: lsst.pipe.base.TaskMetadata
    converters:
      lsst.daf.base.PropertySet: lsst.pipe.base.TaskMetadata.from_metadata
  ```

  This declares that if a `TaskMetadata` is expected, then a `PropertySet` can be converted to the correct python type. (DM-32883)
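Mechanically, a converter lookup like the one declared above can be sketched with toy classes (these stand-ins and the `coerce` helper are illustrative only, not the real StorageClass machinery):

```python
class PropertySet:
    """Toy stand-in for lsst.daf.base.PropertySet."""
    def __init__(self, data: dict):
        self.data = data

class TaskMetadata:
    """Toy stand-in for lsst.pipe.base.TaskMetadata."""
    def __init__(self, data: dict):
        self.data = data

    @classmethod
    def from_metadata(cls, ps: PropertySet) -> "TaskMetadata":
        return cls(dict(ps.data))

# Converter table keyed by the incoming python type, mirroring the
# {source type: converter callable} mapping declared in storageClasses.yaml.
CONVERTERS = {PropertySet: TaskMetadata.from_metadata}

def coerce(obj, expected_type):
    """Return obj as expected_type, converting if a converter is registered."""
    if isinstance(obj, expected_type):
        return obj
    converter = CONVERTERS.get(type(obj))
    if converter is None:
        raise TypeError(f"No converter from {type(obj).__name__}")
    return converter(obj)

result = coerce(PropertySet({"exposure": 42}), TaskMetadata)
print(type(result).__name__, result.data)  # TaskMetadata {'exposure': 42}
```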
- Dimension record imports now ignore conflicts (without checking for consistency) instead of failing. (DM-33148)
- Storage class converters can now also be used on `put`. (DM-33155)
- If a `DatasetType` has been constructed that differs from the registry definition, but in a way that is compatible through `StorageClass` conversion, then using it in a `lsst.daf.butler.Butler.get()` call will return a python type that matches the user-specified `StorageClass` instead of the internal python type. (DM-33303)
- The dataset ID can now be used in a file template for datastore (using `{id}`). (DM-33414)
- Added `Registry.getCollectionParentChains` to find the `CHAINED` collections that another collection belongs to. (DM-33643)
- Added `has_simulated` to the `exposure` record to indicate that some content of this exposure was simulated. (DM-33728)
- The command-line tooling has changed how it sets the default logger when using `--log-level`. Now only the default logger(s) (`lsst` and the colon-separated values stored in `$DAF_BUTLER_ROOT_LOGGER`) will be affected by using `--log-level` without a specific logger name. By default only this default logger will be set to `INFO` log level and all other loggers will remain at `WARNING`. Use `--log-level '.=level'` to change the root logger (this will not change the default logger level, so an additional `--log-level DEBUG` may be needed to turn on debugging for all loggers). (DM-33809)
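The `name=level` syntax can be sketched with stdlib `logging` (a simplified parser; the real CLI also handles repeated options and the `$DAF_BUTLER_ROOT_LOGGER` defaults):

```python
import logging

DEFAULT_LOGGERS = ["lsst"]  # plus any names from $DAF_BUTLER_ROOT_LOGGER

def apply_log_level(spec: str) -> None:
    """Sketch of --log-level parsing: 'LEVEL' targets the default
    logger(s); 'name=LEVEL' targets that logger; '.=LEVEL' the root."""
    if "=" in spec:
        name, level = spec.split("=", 1)
        target = "" if name == "." else name  # "" is the root logger
        logging.getLogger(target).setLevel(level.upper())
    else:
        for name in DEFAULT_LOGGERS:
            logging.getLogger(name).setLevel(spec.upper())

apply_log_level("DEBUG")          # default 'lsst' logger -> DEBUG
apply_log_level("numexpr=ERROR")  # a specific third-party logger
apply_log_level(".=WARNING")      # the root logger only
print(logging.getLogger("lsst").level == logging.DEBUG)  # True
```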
- Added `azimuth` to the `exposure` and `visit` records. (DM-33859)
- If repository aliases have been defined for the site, they can now be used in place of the Butler repository URI in both the `Butler` constructor and command-line tools. (DM-33870)
- Added `visit_system` to the `instrument` record and allowed it to be used as a tie-breaker in dataset determination if a dataId is given using `seq_num` and `day_obs` and it matches multiple visits.
- Modified the export YAML format to include the dimension universe version and namespace.
- Allow export files with older visit definitions to be read (this does not fill in the new metadata records).
- `DimensionUniverse` now supports the `in` operator to check if a dimension is part of the universe. (DM-33942)
 
- Added a definition for using healpix in skypix definitions.
- Changed dimension universe caching to support a namespace in addition to a version number. (DM-33946)
 
- Added a formatter for `lsst.utils.packages.Packages` Python types in `lsst.daf.butler.formatters.packages.PackagesFormatter`. (DM-34105)
- Added an optimization that speeds up `butler query-datasets` when using `--show-uri`. (DM-35120)
API Changes¶
- Many internal utilities from - lsst.daf.butler.core.utilshave been relocated to the- lsst.utilspackage. (DM-31722)
- The - ButlerURIclass has now been removed from this package. It now exists as- lsst.resources.ResourcePath. All code should be modified to use the new class name. (DM-31723)
- lsst.daf.butler.Registry.registerRunand- lsst.daf.butler.Registry.registerCollectionnow return a Booelan indicating whether the collection was created or already existed. (DM-31976)
- A new optional parameter, - record_validation_infohas been added to- ingest(and related datastore APIs) to allow the caller to declare that file attributes such as the file size or checksum should not be recorded. This can be useful if the file is being monitored by an external system or it is known that the file might be compressed in-place after ingestion. (DM-33086)
- Added a new - DatasetType.is_compatible_withmethod. This method determines if two dataset types are compatible with each other, taking into account whether the storage classes allow type conversion. (DM-33278)
- The - runparameter has been removed from Butler method- lsst.daf.butler.Butler.pruneDatasets. It was never used in Butler implementation, client code should simply remove it. (DM-33488)
- Registry methods now raise exceptions belonging to a class hierarchy rooted at - lsst.daf.butler.registry.RegistryError. See also Error handling with Registry methods for details. (DM-33600)
- Added - DatasetType.storageClass_nameproperty to allow the name of the storage class to be retrieved without requiring that the storage class exists. This is possible if people have used local storage class definitions or a test- DatasetTypewas created temporarily. (DM-34460)
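The new Boolean return of `registerRun`/`registerCollection` gives these methods create-if-missing semantics. A toy stand-in illustrating the contract (not the Registry implementation):

```python
class MiniCollectionRegistry:
    """Toy model of the register-collection contract described above:
    returns True when the collection was newly created, False when it
    already existed."""

    def __init__(self) -> None:
        self._collections: set[str] = set()

    def register(self, name: str) -> bool:
        created = name not in self._collections
        self._collections.add(name)
        return created


reg = MiniCollectionRegistry()
assert reg.register("u/someone/run1") is True    # newly created
assert reg.register("u/someone/run1") is False   # already existed
```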
Bug Fixes
- `butler export-calibs` can now copy files that require the use of a file template (for example if a direct URI was stored in datastore) with metadata records. File templates that use metadata records now complain if the record is not attached to the `DatasetRef`. (DM-32061)
- Make it possible to run `queryDimensionRecords` while constraining on the existence of a dataset whose dimensions are not a subset of the record element's dependencies (e.g. `raw` and `exposure`). (DM-32454)
- The Butler constructor can now take an `os.PathLike` object when `butler.yaml` is not included in the path. (DM-32467)
- In the butler presets file (used by the `--@` option), use option names that match the butler CLI command option names (without leading dashes). Fail if option names used in the presets file do not match options for the current butler command. (DM-32986)
- The butler CLI command `remove-runs` can now unlink RUN collections from parent CHAINED collections. (DM-33619)
- Improves `butler query-collections`:
  - TABLE output formatting is easier to read.
  - Adds INVERSE modes for TABLE and TREE output, to view CHAINED parent(s) of collections (non-INVERSE lists children of CHAINED collections).
  - Sorts datasets before printing them. (DM-33902)
 
- Fix garbled printing of raw-byte hashes in `query-dimension-records`. (DM-34007)
- The automatic addition of `butler.yaml` to the Butler configuration URI now also happens when a `ResourcePath` instance is given. (DM-34172)
- Fix handling of "doomed" (known to return no results even before execution) follow-up queries for datasets. This frequently manifested as a `KeyError` with a message about dataset type registration during `QuantumGraph` generation. (DM-34202)
- Fix `queryDataIds` bug involving dataset constraints with no dimensions. (DM-34247)
- The `click.Path` API changed, so it is now called with keyword arguments instead of ordered arguments. (DM-34261)
- Fix `queryCollections` bug in which children of chained collections were being alphabetically sorted instead of ordered consistently with the order in which they would be searched. (DM-34328)
- Fixes the bug introduced in DM-33489 (appeared in w_2022_15) which caused a not-NULL constraint violation for the datastore component column. (DM-34375)
- Fixes an issue where the command-line tools were caching argument and option values but not separating option names from option values correctly in some cases. (DM-34812)
Other Changes and Additions
- Add a `NOT NULL` constraint to dimension implied dependency columns. `NULL` values in these columns already cause the query system to misbehave. (DM-21840)
- Update parquet writing to use default per-column compression. (DM-31963)
- Tidy up the `remove-runs` subcommand confirmation report by sorting dataset types and filtering out those with no datasets in the collections to be deleted. (DM-33584)
- The constraints on collection names have been relaxed. Previously collection names were limited to ASCII alphanumeric characters plus a limited selection of symbols (directory separator, @-sign). Now all Unicode alphanumerics can be used, along with emoji. (DM-33999)
- File datastore now always writes a temporary file and renames it, even for local file system datastores. This minimizes the risk of a corrupt file being written if the process writing the file is killed at the wrong time. (DM-35458)
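The write-temporary-then-rename behaviour described above is a standard atomicity pattern. A stdlib sketch of the idea (illustrative only, not the FileDatastore code; `atomic_write` is a hypothetical helper name):

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write to a temporary file in the destination directory and rename
    it into place, so readers never observe a partially written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic within one filesystem on POSIX
    except BaseException:
        os.unlink(tmp)  # clean up the temporary file on any failure
        raise
```

Using the same directory as the destination matters: `os.replace` is only atomic when source and target are on the same filesystem.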
An API Removal or Deprecation
- The `butler prune-collections` command-line command is now deprecated. Please consider using `remove-collections` or `remove-runs` instead. It will be removed after v24. (DM-32499)
- All support for reading and writing `Filter` objects has been removed. The old `filter` component for exposures has been removed and replaced with a new `filter` component backed by `FilterLabel`. It functions identically to the `filterLabel` component, which has been deprecated. (DM-27177)
Butler v23.0.0 (2021-12-10)
New Features
- Add ability to cache datasets locally when using a remote file store. This can significantly improve performance when retrieving components from a dataset. (DM-13365) 
- Add a new `butler retrieve-artifacts` command to copy file artifacts from a Butler datastore. (DM-27241)
- Add `butler transfer-datasets` command-line tool and associated `Butler.transfer_from()` API. This can be used to transfer datasets between different butlers, with the caveat that dimensions and dataset types must be pre-defined in the receiving butler repository. (DM-28650)
- Add `amp` parameter to the Exposure StorageClass, allowing single-amplifier subimage reads. (DM-29370)
- Add new `butler collection-chain` subcommand for creating collection chains from the command line. (DM-30373)
- Add `butler ingest-files` subcommand to simplify ingest of any external file. (DM-30935)
- Add class representing a collection of log records (`ButlerLogRecords`).
- Allow this class to be stored and retrieved from a Butler datastore.
- Add special log handler to allow JSON log records to be stored.
- Add `--log-file` option to command lines to redirect log output to file.
- Add `--no-log-tty` to disable log output to terminal. (DM-30977)
 
- Registry methods that previously could raise an exception when searching in calibration collections now have improved logic that skips those collections if they were not given explicitly but only appeared in chained collections. (DM-31337)
- Add a confirmation step to `butler prune-collection` to help prevent accidental removal of collections. (DM-31366)
- Add `butler register-dataset-type` command to register a new dataset type. (DM-31367)
- Use cached summary information to simplify queries involving datasets and provide better diagnostics when those queries yield no results. (DM-31583)
- Add a new `butler export-calibs` command to copy calibrations and write an export.yaml document from a Butler datastore. (DM-31596)
- Support rewriting of a dataId containing dimension records such as `day_obs` and `seq_num` in `butler.put()`. This matches the behavior of `butler.get()`. (DM-31623)
- Add `--log-label` option to the `butler` command to allow extra information to be injected into the log record. (DM-31884)
- The `Butler.transfer_from` method no longer registers new dataset types by default.
- Add the related option `--register-dataset-types` to the `butler transfer-datasets` subcommand. (DM-31976)
 
- Support UUIDs as the primary keys in registry and allow for reproducible UUIDs. This change will significantly simplify transferring of data between butler repositories. (DM-29196)
- Allow registry methods such as `queryDatasets` to use a glob-style string when specifying collection or dataset type names. (DM-30200)
- Add support for updating and replacing dimension records. (DM-30866) 
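The glob-style strings accepted by registry methods such as `queryDatasets` use `*`/`?` wildcards. Their matching semantics can be illustrated with the stdlib `fnmatch` module; this is a model of the pattern behaviour only, not the registry's implementation:

```python
import fnmatch

# Dataset type names to filter (example values, not a real repository).
dataset_types = ["calexp", "calexpBackground", "raw", "src"]

# "calexp*" matches any name starting with "calexp", case-sensitively.
matches = [name for name in dataset_types
           if fnmatch.fnmatchcase(name, "calexp*")]
assert matches == ["calexp", "calexpBackground"]
```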
API Changes
- A new method `Datastore.knows()` has been added to allow a user to ask the datastore whether it knows about a specific dataset without requiring a check that the artifact itself exists. Use `Datastore.exists()` to check both that the datastore knows about a dataset and that the artifact exists. (DM-30335)
Bug Fixes
- Fix handling of `ingest_date` timestamps. Previously there was an inconsistency between the database-native UTC handling of `ingest_date` and the astropy Time used for time literals, which resulted in a 37-second difference. This update makes consistent use of database-native time functions to resolve the issue. (DM-30124)
- Fix butler repository creation when a seed config has specified a registry manager override. Previously only that manager was recorded rather than the full set. We always require a full set to be recorded to prevent breakage of a butler when a default changes. (DM-30372)
- Stop writing a temporary datastore cache directory every time a `Butler` object is instantiated. Now one is created only when it is requested. (DM-30743)
- Fix `Butler.transfer_from()` such that it registers any missing dataset types and also skips any datasets that do not have associated datastore artifacts. (DM-30784)
- Add support for click 8.0. (DM-30855) 
- Replace UNION ALL with UNION for subqueries for simpler query plans. (DM-31429) 
- Fix parquet formatter error when reading tables with no indices. Previously, this would cause `butler.get` to fail to read valid parquet tables. (DM-31700)
- Fix problem in `ButlerURI` where transferring a file from one URI to another would overwrite the existing file even if they were the same actual file (for example because of soft links in the directory hierarchy). (DM-31826)
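The same-file overwrite fix (DM-31826) amounts to a guard of the following shape. This is an illustrative stdlib sketch with a hypothetical `transfer_needed` helper, not the `ButlerURI` code:

```python
import os


def transfer_needed(src: str, dst: str) -> bool:
    """Skip the transfer when src and dst resolve to the same underlying
    file (e.g. via soft links), which would otherwise truncate the file
    while copying it onto itself."""
    return not (os.path.exists(dst) and os.path.samefile(src, dst))
```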
Performance Enhancement
- Make collection and dataset pruning significantly more efficient. (DM-30140) 
- Add indexes to make certain spatial join queries much more efficient. (DM-31548) 
- Made a 20x speed improvement for `Butler.transfer_from`. The main slowdown is asking the datastore whether a file artifact exists. This is now parallelized and the result is cached for later. (DM-31785)
- Minor efficiency improvements when accessing `lsst.daf.butler.Config` hierarchies. (DM-32305)
- FileDatastore: Improve removing of datasets from the trash by at least a factor of 10. (DM-29849) 
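The `Butler.transfer_from` speed-up (DM-31785) comes from issuing artifact-existence checks in parallel instead of serially. The idea can be sketched with a thread pool (`artifacts_exist` is a hypothetical helper, not the datastore API):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def artifacts_exist(uris: Iterable[str],
                    exists: Callable[[str], bool]) -> dict[str, bool]:
    """Check many artifact URIs concurrently; each check is typically a
    slow network round-trip, so overlapping them dominates the speed-up."""
    uris = list(uris)
    with ThreadPoolExecutor(max_workers=8) as pool:
        return dict(zip(uris, pool.map(exists, uris)))


# Example with a stand-in existence predicate instead of a real datastore.
known = {"s3://bucket/a.fits", "s3://bucket/b.fits"}
result = artifacts_exist(["s3://bucket/a.fits", "s3://bucket/c.fits"],
                         lambda uri: uri in known)
assert result == {"s3://bucket/a.fits": True, "s3://bucket/c.fits": False}
```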
Other Changes and Additions
- Enable serialization of `DatasetRef` and related classes to JSON format. (DM-28678)
- `ButlerURI` `http` schemes can now handle non-WebDAV endpoints. Warnings are only issued if WebDAV functionality is requested. (DM-29708)
- Switch logging such that all logging messages are now forwarded to Python `logging` from `lsst.log`. Previously all Python `logging` messages were being forwarded to `lsst.log`. (DM-31120)
- Add formatter and storageClass information for FocalPlaneBackground. (DM-22534) 
- Add formatter and storageClass information for IsrCalib. (DM-29531) 
- Change release note creation to use [Towncrier](https://towncrier.readthedocs.io/en/actual-freaking-docs/index.html). (DM-30291) 
- Add a Butler configuration for an execution butler that has pre-defined registry entries but no datastore records. `Butler.put()` will return the pre-existing dataset ref but will still fail if a datastore record is found. (DM-30335)
- If an unrecognized dimension is used as a look-up key in a configuration file (using the `+` syntax), a warning is issued suggesting a possible typo rather than a confusing `KeyError`. This is no longer a fatal error and the key will be treated as a name. (DM-30685)
- Add a `split` transfer mode that can be used when some files are inside the datastore and some files are outside the datastore. This is equivalent to using `None` and `direct` modes dynamically. (DM-31251)
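The per-file decision behind the `split` transfer mode can be sketched as follows. This is an illustrative function under assumed POSIX paths, not the datastore's actual logic:

```python
from pathlib import PurePosixPath


def split_transfer_mode(file_path: str, datastore_root: str):
    """Files already under the datastore root are ingested in place
    (transfer=None); files outside it keep their original absolute
    location ("direct")."""
    try:
        PurePosixPath(file_path).relative_to(PurePosixPath(datastore_root))
        return None       # inside the datastore root: ingest in place
    except ValueError:
        return "direct"   # outside: record the original location


assert split_transfer_mode("/repo/data/raw.fits", "/repo/data") is None
assert split_transfer_mode("/staging/raw.fits", "/repo/data") == "direct"
```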
Butler v22.0 (2021-04-01)
New Features
- A Butler instance can now be configured with dataId defaults such as an instrument or skymap. [DM-27153] 
- Add `butler prune-datasets` command. [DM-26689]
- Add `butler query-dimension-records` command. [DM-27344]
- Add `--unlink` option to the `butler prune-collection` command. [DM-28857]
- Add progress reporting option for long-lived commands. [DM-28964] 
- Add `butler associate` command to add existing datasets to a tagged collection. [DM-26688]
- Add officially-supported JSON serialization for core Butler classes. [DM-28314] 
- Allow `butler.get()` to support dimension record values such as exposure observing day or detector name in the dataId. [DM-27152]
- Add “direct” ingest mode to allow a file to be ingested retaining the full path to the original file. [DM-27478] 
Bug Fixes
- Fix temporal queries and clarify `Timespan` behavior. [DM-27985]
Other Changes and Additions
- Make the `ButlerURI` class immutable. [DM-29073]
- Add `ButlerURI.findFileResources` method to walk the directory tree and return matching files. [DM-29011]
- Improve infrastructure for handling test repositories. [DM-23862] 
Butler Datastores
New Features
- Implement basic file caching for use with remote datastores. [DM-29383] 
- Require that a DataId always be available to a `Formatter`. This allows formatters to do a consistency check, such as comparing the physical filter in a dataId with that read from a file. [DM-28583]
- Add special mode to datastore to instruct it to ignore registry on `get`. This is useful for execution butlers where registry knows in advance about all datasets but datastore does not. [DM-28648]
- Add `forget` method to instruct datastore to remove all knowledge of a dataset without deleting the file artifact. [DM-29106]
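The relationship between a datastore knowing about a dataset and the file artifact existing can be modeled with a toy stand-in (illustrative only, not the `Datastore` API):

```python
class MiniDatastore:
    """Toy model of the semantics above: `knows` consults only the
    datastore's own records, and `forget` drops those records without
    deleting the underlying file artifact."""

    def __init__(self) -> None:
        self._records: dict[str, str] = {}   # dataset id -> artifact path
        self.artifacts: set[str] = set()     # stand-in for files on disk

    def put(self, dataset_id: str, path: str) -> None:
        self._records[dataset_id] = path
        self.artifacts.add(path)

    def knows(self, dataset_id: str) -> bool:
        return dataset_id in self._records

    def forget(self, dataset_id: str) -> None:
        del self._records[dataset_id]        # artifact intentionally kept


store = MiniDatastore()
store.put("uuid-1234", "/store/raw.fits")
store.forget("uuid-1234")
assert not store.knows("uuid-1234")
assert "/store/raw.fits" in store.artifacts  # file artifact untouched
```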
Butler Registry
New Features
- Avoid long-lived connections to database. [DM-26302] 
- Add option to flatten when setting a collection chain. [DM-29203]