Datasets vs. Butler repositories

Datasets are organized using a specific directory structure rather than as an LSST Butler repository. This is by design: ingestion of observatory files into a repository is considered part of the pipeline system being tested by ap_verify, so ap_verify must be fed uningested data as its input. The ingestion step creates a valid repository that is then used by the rest of the pipeline.
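As a rough sketch of this two-stage flow, consider the following Python outline. All names here (ingest, run_pipeline, and the datasets/example and workspaces/example paths) are hypothetical stand-ins, not ap_verify's actual API; real ingestion registers files with the Butler rather than copying them:

    import shutil
    from pathlib import Path

    def ingest(dataset_dir: Path, workspace: Path) -> Path:
        """Build a repository in the workspace from uningested files.

        Hypothetical stand-in for the real ingestion step, which would
        invoke observatory-specific ingestion tasks.
        """
        repo = workspace / "repo"
        repo.mkdir(parents=True, exist_ok=True)
        for raw in (dataset_dir / "raw").rglob("*.fits"):
            # Stand-in only: real ingestion registers each file with the
            # Butler instead of merely copying it into place.
            shutil.copy2(raw, repo)
        return repo

    def run_pipeline(repo: Path) -> None:
        """Stand-in for the pipeline stages that consume the repository."""
        print(f"processing repository at {repo}")

    # The dataset itself is never a repository; only the ingested
    # workspace is, and the pipeline runs against that.
    run_pipeline(ingest(Path("datasets/example"), Path("workspaces/example")))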

A secondary benefit of this approach is that dataset maintainers need not manually ensure that the Git repository associated with a dataset remains a valid Butler repository as the dataset changes. The dataset format merely requires that files be segregated into science and calibration directories, a much looser integrity constraint.
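Because the constraint is so loose, it can be checked in a few lines. In the sketch below the directory names raw and calib are illustrative assumptions, not the normative dataset format:

    from pathlib import Path

    def check_dataset_layout(dataset_dir: Path) -> None:
        """Check the loose integrity constraint: science and calibration
        files live in separate directories. The names raw/ and calib/
        are assumptions for illustration only.
        """
        for required in ("raw", "calib"):
            if not (dataset_dir / required).is_dir():
                raise ValueError(f"dataset is missing a {required}/ directory")

    check_dataset_layout(Path("datasets/example"))  # hypothetical path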

While datasets are not themselves Butler repositories, the dataset format includes a directory, repo, that serves as a template for repositories created by ap_verify.py. This template helps ensure that all repositories based on the dataset are properly set up, in particular that any observatory-specific settings are applied. repo is never modified by ap_verify; all repositories created by the pipeline must be located elsewhere, whether or not they are backed by the file system.
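One way to respect the never-modify rule is to copy the template before any ingestion begins. The following is a minimal sketch of that pattern with hypothetical paths; it is not ap_verify's actual implementation:

    import shutil
    from pathlib import Path

    def make_workspace_repo(dataset_dir: Path, workspace: Path) -> Path:
        """Seed a new output repository from the dataset's repo template.

        The template itself is treated as read-only; all subsequent
        ingestion and pipeline outputs go into the copy.
        """
        template = dataset_dir / "repo"
        output_repo = workspace / "repo"
        # copytree refuses to overwrite an existing directory, which
        # also guards against accidentally writing into the template.
        shutil.copytree(template, output_repo)
        return output_repo

    # Hypothetical paths; any repository derived from the dataset lives
    # outside the dataset itself.
    make_workspace_repo(Path("datasets/example"), Path("workspaces/example"))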