Datasets vs. Butler repositories
Datasets are organized using a specific directory structure instead of an LSST Butler repository.
This is by design: ingestion of observatory files into a repository is considered part of the pipeline system being tested by ap_verify, so ap_verify must be fed uningested data as its input.
The ingestion step creates a valid repository that is then used by the rest of the pipeline.
A secondary benefit of this approach is that dataset maintainers do not need to manually ensure that the Git repository associated with a dataset remains a valid Butler repository despite changes to the dataset. The dataset format merely requires that files be segregated into science and calibration directories, a much looser integrity constraint.
While datasets are not Butler repositories themselves, the dataset format includes a directory, repo, that serves as a template for the post-ingestion repository.
This template helps ensure that all repositories based on the dataset will be properly set up, in particular that any observatory-specific settings will be applied.
repo is never modified by ap_verify; all repositories created by the pipeline must be located elsewhere, whether or not they are backed by the file system.
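The template-then-copy idea can be illustrated with a minimal sketch. This is not ap_verify's own code: the helper name make_working_repo and the workspace layout are assumptions made for illustration, and only the repo directory name comes from the dataset format described above. The sketch copies the template to a location outside the dataset and refuses to write inside the dataset itself, so the template is left untouched.

```python
import shutil
from pathlib import Path


def make_working_repo(dataset_root, workspace):
    """Copy a dataset's template ``repo`` directory into a separate workspace.

    Hypothetical helper for illustration only; it is not part of ap_verify.
    """
    root = Path(dataset_root).resolve()
    template = root / "repo"
    target = (Path(workspace) / "repo").resolve()

    if not template.is_dir():
        raise FileNotFoundError(f"No template repository at {template}")

    # The working repository must live outside the dataset so that the
    # dataset (and its template) is never modified.
    if target == root or root in target.parents:
        raise ValueError("Working repository must be located outside the dataset")

    # copytree creates the target (and any missing parents) and fails if the
    # target already exists, which avoids silently reusing an old repository.
    shutil.copytree(template, target)
    return target
```

Calling make_working_repo("path/to/dataset", "path/to/workspace") would return the path of the new copy, which could then be handed to whatever ingestion step follows. A real pipeline may create its repositories differently, for example when they are not backed by the file system.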