Using the Butler in unit tests

This document describes how to use the Butler when testing Butler-dependent code in circumstances where one should not use a repository associated with a specific telescope, such as in unit tests. It covers tools for creating memory-based repositories and for creating simple mock datasets.

The goal is not to produce a completely realistic production-like environment, as in integration testing, but to provide the minimal functionality needed for higher-level code such as tasks to run in an isolated test environment.

Overview

The lsst.daf.butler.tests module provides tools for using the Butler in tests, including creating and populating minimal repositories. These tools are designed to give each unit test case its own, customized environment containing only the state needed for that particular test.

Creating and populating a repository

The lsst.daf.butler.tests.makeTestRepo function creates an in-memory repository. The repository is optimized for speed rather than for production processing, but otherwise is similar to lsst.daf.butler.Butler.makeRepo.

The addDataIdValue function defines allowed data ID values after the repository has been created. By default, related values are assigned to each other in an arbitrary one-to-one fashion. This is good enough for many tests, where a single data ID is all that’s needed.

If you need more control (or if the assignment algorithm produces inconsistent results), addDataIdValue lets you specify by keyword which values are related to which. Any unspecified relationships are still filled arbitrarily, so you need only give the ones your tests depend on.

from lsst.daf.butler.tests import addDataIdValue

addDataIdValue(butler, "skymap", "map")
# Tract requires a skymap; can assume it's "map" because no other options.
# Were there more than one skymap, this would choose one unpredictably.
addDataIdValue(butler, "tract", 42)
# Explicit specification.
addDataIdValue(butler, "tract", 43, skymap="map")
# Can map dimensions in a many-to-many relationship.
for patch in [0, 1, 2, 3, 4, 5]:
    addDataIdValue(butler, "patch", patch, tract=42)
    addDataIdValue(butler, "patch", patch, tract=43)

The addDatasetType function registers any dataset types needed by the test (e.g., “calexp”). As with registering the data IDs, this is a prerequisite for actually reading or writing any datasets of that type during the test.

Test collections

On some systems, makeTestRepo can be too slow to run for every test, so at present it should be called in the setUpClass method. Individual tests can be partly isolated using the lsst.daf.butler.tests.makeTestCollection method, which creates a new collection in an existing repository.

Note

While each collection has its own datasets, the set of valid dataset types and data IDs is repository-wide. Avoid defining these at the collection level, because they may have unpredictable effects on later tests.

Once a test collection is created, datasets can be read or written as usual:

# Assuming self is a unittest.TestCase object
butler = butlerTests.makeTestCollection(repo, uniqueId=self.id())

exposure = makeTestExposure()  # user-defined code
butler.put(exposure, "calexp", dataId)
processCalexp(dataId)  # user-defined code