.. py:currentmodule:: lsst.ap.verify

.. _ap-verify-datasets-butler:

################################
Datasets vs. Butler repositories
################################

:doc:`Datasets <datasets>` are organized using a :ref:`specific directory structure <ap-verify-datasets-structure>` instead of an :ref:`LSST Butler repository <butler>`.
This is by design:
:ref:`ingestion of observatory files into a repository <ingest>` is considered part of the pipeline system being tested by ``ap_verify``, so ``ap_verify`` must be fed uningested data as its input.
The ingestion step creates a valid repository that is then used by the rest of the pipeline.

A secondary benefit of this approach is that dataset maintainers do not need to manually ensure that the Git repository associated with a dataset remains a valid Butler repository despite changes to the dataset.
The dataset format merely requires that files be segregated into science and calibration directories, a much looser integrity constraint.

While datasets are not Butler repositories themselves, the dataset format includes a directory, :file:`repo`, that serves as a template for :ref:`repositories created by ap_verify.py <ap-verify-run-output>`.
This template helps ensure that all repositories based on the dataset will be properly set up, in particular that any observatory-specific settings will be applied.
:file:`repo` is never modified by ``ap_verify``; all repositories created by the pipeline must be located elsewhere, whether or not they are backed by the file system.
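
The rule above can be illustrated with a minimal sketch; it is not the code that ``ap_verify`` itself uses.
It assumes the template is an ordinary directory on disk and that copying it is enough to start a new repository.
The function name ``make_working_repo`` and its parameters ``dataset_root`` and ``output_dir`` are hypothetical and chosen only for this example.

.. code-block:: python

   import shutil
   from pathlib import Path


   def make_working_repo(dataset_root, output_dir):
       """Copy a dataset's ``repo`` template to a new, separate location.

       Hypothetical helper for illustration only; not part of ap_verify.
       """
       template = Path(dataset_root) / "repo"
       if not template.is_dir():
           raise FileNotFoundError(f"No template directory at {template}")

       template = template.resolve()
       target = Path(output_dir).resolve()

       # The template is read-only by convention, so refuse any target
       # that is the template itself or lies inside it.
       if target == template or template in target.parents:
           raise ValueError("Working repository must be outside the template")

       # copytree raises FileExistsError if the target already exists,
       # so an earlier working repository is never overwritten.
       shutil.copytree(template, target)
       return target

Because the template is only ever read, the same dataset can seed any number of independent working repositories.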