.. py:currentmodule:: lsst.ap.verify

.. _ap-verify-datasets-butler:

################################
Datasets vs. Butler repositories
################################

Datasets are organized using a :ref:`specific directory structure` instead of an :ref:`LSST Butler repository`.
This is by design: :ref:`ingestion of observatory files into a repository` is considered part of the pipeline system being tested by ``ap_verify``, so ``ap_verify`` must be fed uningested data as its input.
The ingestion step creates a valid repository that is then used by the rest of the pipeline.

A secondary benefit of this approach is that dataset maintainers do not need to manually ensure that the Git repository associated with a dataset remains a valid Butler repository despite changes to the dataset.
The dataset format merely requires that files be segregated into science and calibration directories, a much looser integrity constraint.

While datasets are not Butler repositories themselves, the dataset format includes a directory, :file:`repo`, that serves as a template for the post-ingestion repository.
This template helps ensure that all repositories based on the dataset will be properly set up, in particular that any observatory-specific settings will be applied.
:file:`repo` is never modified by ``ap_verify``; all repositories created by the pipeline must be located elsewhere, whether or not they are backed by the file system.
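
The template-then-copy idea can be illustrated with a minimal sketch in plain Python.
This is not how ``ap_verify`` is implemented: the function name ``make_workspace_repo`` and the directory name ``output_repo`` are hypothetical, and the sketch copies ordinary files rather than creating a real Butler repository.
It only shows the two constraints described above: the template is read but never written, and the new repository lives outside the template.

```python
import shutil
from pathlib import Path


def make_workspace_repo(template_dir: Path, workspace_dir: Path) -> Path:
    """Copy a dataset's template directory into a new location.

    Hypothetical helper for illustration only; ``output_repo`` is an
    assumed name, not one defined by the dataset format.
    """
    template_dir = Path(template_dir).resolve()
    target = (Path(workspace_dir) / "output_repo").resolve()

    # The new repository must not be the template or live inside it,
    # otherwise creating it would modify the template.
    if target == template_dir or template_dir in target.parents:
        raise ValueError(f"{target} is inside the template {template_dir}")

    # copytree only reads from the source; it raises FileExistsError if the
    # target already exists, so an existing repository is never overwritten.
    shutil.copytree(template_dir, target)
    return target
```

Under these assumptions, a call such as ``make_workspace_repo(Path("my_dataset/repo"), Path("workspace"))`` would leave ``my_dataset/repo`` untouched and return the path of the copy, which later steps could then populate.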