Getting started

lsst.faro is part of the LSST Science Pipelines. If you are new to the LSST Science Pipelines, it may be helpful to begin with the Getting started tutorial and installation instructions.

If you are developing on Rubin computing facilities, a shared version of the software stack should be available for use.

Running faro

  • Running and building lsst.faro locally.

  • Running faro on a reprocessed Gen3 repository at NCSA.

lsst.faro can be run using pipetask.

Shared Gen3 Data Repositories

Information on shared Gen3 data repositories for data managed at NCSA can be found in /repo/README.md. See DMTN-167 for more information on the organization of Gen3 data repositories.

Warning

When developing metrics in faro, take particular care when creating a new dataset type name associated with a metric. As noted in DMTN-167, dataset type names are global with no implicit namespacing. This may change in the future; see DM-29817. When developing metrics, it is recommended to run on a local data repository rather than a shared Gen3 data repository, in case metrics need to be renamed or the dimensions associated with the metric calculation need to be changed.

Example: rc2_subset

This example runs faro on a small local dataset. The rc2_subset is the smallest CI dataset for which all faro metrics can be run without error and produce meaningful results.

  1. Set up rc2_subset following the instructions here.

  2. Set up the faro package; see Setting Up.

  3. Run an example command (adjust the repository, collections, and data query for your setup):

    pipetask run -b $RC2_SUBSET_DIR/SMALL_HSC/butler.yaml -p $FARO_DIR/pipelines/metrics_pipeline_matched.yaml -i u/$USER/single_frame -o u/$USER/faro_matched_visits_r --register-dataset-types -d "instrument='HSC' AND detector=42 AND band='r'"
    

Documentation for using the pipetask run command and various options can be found here. Briefly, the example command above uses the -b option to specify the Butler repository, -p to specify the pipeline, -i to specify the input collection, -o to specify the output collection (this should almost always be a user collection prefixed with u/username/ unless you are running in production), and -d to provide a query to select a subset of data on which to compute metrics.
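The option mapping described above can be sketched as a small helper that assembles the same command from named parts. This is only an illustration and not part of faro or pipetask: the function name build_pipetask_command and its parameter names are hypothetical, and the helper only builds the argument list from the flags shown in the example, without running anything.

```python
import shlex


def build_pipetask_command(repo, pipeline, input_collection,
                           output_collection, query,
                           register_dataset_types=False):
    """Return the argument list for a `pipetask run` invocation.

    Hypothetical helper for illustration; it uses only the options
    discussed above and does not execute the command.
    """
    args = [
        "pipetask", "run",
        "-b", repo,               # Butler repository
        "-p", pipeline,           # pipeline YAML file
        "-i", input_collection,   # input collection
        "-o", output_collection,  # output collection (normally u/<username>/...)
        "-d", query,              # data query selecting a subset of the data
    ]
    if register_dataset_types:
        # Use with caution: registered dataset types are global to the repository.
        args.append("--register-dataset-types")
    return args


if __name__ == "__main__":
    command = build_pipetask_command(
        repo="$RC2_SUBSET_DIR/SMALL_HSC/butler.yaml",
        pipeline="$FARO_DIR/pipelines/metrics_pipeline_matched.yaml",
        input_collection="u/$USER/single_frame",
        output_collection="u/$USER/faro_matched_visits_r",
        query="instrument='HSC' AND detector=42 AND band='r'",
        register_dataset_types=True,
    )
    # The printed line is for inspection only: shlex.join quotes each
    # argument, so $VAR references appear literally and would not be
    # expanded if the line were pasted into a shell.
    print(shlex.join(command))
```

Keeping the command as a list of separate arguments, rather than one long string, makes it easier to see which value belongs to which option and to change a single part (for example the data query) without retyping the rest.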

Warning

The --register-dataset-types option should be used with caution as this will allow the registration of new dataset types that are global across the repository.

Example: HSC RC2 dataset

This example runs faro on a Gen3 repository at NCSA. The HSC RC2 dataset, which is reprocessed monthly with the latest version of the Science Pipelines, is a good example; see DMTN-091. Information on the current status of HSC RC2 reprocessing and the latest runs can be found here.

  1. Set up the lsst.faro package; see Setting Up.

  2. Run an example command, in this case running metrics on the source catalog of a single visit:

    pipetask --long-log run -b /repo/main/butler.yaml --register-dataset-types -p $FARO_DIR/pipelines/measurement/measurement_detector_table.yaml -d "visit=35892 AND skymap='hsc_rings_v1' AND instrument='HSC'" --output u/$USER/faro_test -i HSC/runs/RC2/w_2021_18/DM-29973 --timeout 999999
    

Use your own username in the output collection name (the example takes it from $USER).

Example: DRP processing

lsst.faro can be run together with other processing steps in a pipeline, e.g., as part of DRP processing.

Examples of this functionality can be found in rc2_subset. Follow the steps in this tutorial for more information.

Adding a metric to faro

Before making contributions to faro, we recommend consulting the LSST DM Developers Guide as a general reference for software development in Rubin DM, and in particular, the best practices covered in the DM development workflow.

Normative Science Verification Metrics

lsst.faro is used for both science verification and scientific validation and characterization.

Normative metrics are associated with science performance requirements defined in the DMSR, OSS, and LSR that will be verified by the Rubin Observatory Construction Project. If you are intending to implement a normative metric, please read the information below; for non-normative metrics skip to the next section.

  1. Please contact the core development team by posting on the #rubinobs-science-verification Slack channel or by reaching out to one of the main developers. This will facilitate coordination and scheduling of work.

  2. Review the detailed metric specification and algorithm definition. Detailed requirement specifications and associated test cases are being developed in the LSST Verification and Validation (LVV) project in JIRA. (For more systems engineering details, see the LSST Verification & Validation Documentation and LSST Verification Architecture.)

Planning Work

  1. Create a JIRA ticket. faro development is tracked using 6-month work cycles, i.e., JIRA epics. There is also a backlog epic. When starting faro development, or making a bugfix, create a JIRA ticket. Include “faro” as a Component and set the team to “DM Science”. It is recommended to contact the faro team to help everyone stay on the same page.

Setting Up

  1. Development can be done from the Rubin Science Platform (RSP) notebook aspect, lsst-devl services, or a Docker image containing the Science Pipelines software. If using the RSP, we suggest reading the tutorial on developing Science Pipelines in the notebook aspect.

  2. Set up Science Pipelines:

    source /software/lsstsw/stack/loadLSST.bash
    setup lsst_distrib
    

The example above points to a shared version of the software stack on the GPFS file systems.

  3. Clone the faro repo:

    git clone https://github.com/lsst/faro.git
    

This gives you a local copy of the faro package for development work.

  4. Set up the local version of the faro package:

    cd faro
    setup -k -r .
    

At this point you can verify that you are using your local version:

    eups list -s | grep faro

  5. Create a development branch:

    git checkout -b tickets/DM-NNNNN
    

All development should happen on ticket branches (and should have associated JIRA tickets). User branches (e.g., u/jcarlin/) can be used for experimenting/testing.
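As a complement to the eups check in the steps above, it is also possible to ask Python which file it would import a package from. The sketch below is a generic standard-library helper, not part of faro or EUPS; the name locate_package is hypothetical. It rests on one assumption: once the local clone is set up, the reported path for lsst.faro should lie inside the cloned directory.

```python
import importlib.util


def locate_package(name):
    """Return the file path Python would import `name` from, or None.

    None means the package (or one of its parents) cannot be found, or that
    it is a namespace package without a single origin file.
    """
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when a parent package (e.g. `lsst`) is not importable.
        return None
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # With the stack and the local clone set up, this path is expected
    # (an assumption, see above) to point inside the cloned faro directory.
    # Without the stack it simply prints None.
    print(locate_package("lsst.faro"))
```

If the printed path points into the shared stack rather than your clone, the local setup step has probably not taken effect in the current shell.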

Adding a Metric

  1. Identify the analysis context. Review the associated connections, config, and task base classes for that analysis context to understand the in-memory Python objects that will be passed to the run method of the metric measurement task and the configuration options. See design concepts for more information. Currently implemented analysis contexts are listed here.

  2. Implement the measurement task. This will be a subclass of lsst.pipe.base.Task that performs the specific operations of a given metric. See NumSourcesTask defined in BaseSubTasks.py for a simple example metric that returns the number of rows in an input source/object catalog. Additional examples of measurement tasks can be found in the python/lsst/faro/measurement directory of the package.

  3. Implement unit tests. All algorithmic code used for metric computation should have associated unit tests. Examples can be found in the package tests directory.

  4. Add the metric to a pipeline YAML file. The pipeline YAML contains the configuration information to execute metrics. See measurement_visit_table.yaml for an example that uses VisitTableMeasurementTask to count the number of rows in an input source/object catalog. Additional examples of pipeline files can be found in the pipelines/measurement directory of the package.

  5. Name the metric. Currently each metric is associated with a separately named dataset type that is global (more info here). To date, metric names have followed the pattern “metricvalue_{package}_{metric}” where the “package” and “metric” are given in the YAML configuration file. Metric naming conventions are an area of active development and it is recommended to contact the faro development team for up-to-date guidance.
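The naming pattern in step 5 can be sketched in a few lines of Python. This is an illustration only and not faro code: the function names are hypothetical, the package and metric values are made up, and the list of existing names is a stand-in for whatever dataset type names are already registered in a repository (obtaining that list is outside this sketch). Because dataset type names are global, checking a proposed name against existing ones before registering it is a cheap precaution.

```python
def metric_dataset_type_name(package, metric):
    """Build a dataset type name following the pattern quoted above."""
    return f"metricvalue_{package}_{metric}"


def find_name_collisions(proposed_names, existing_names):
    """Return the proposed names that are already present in `existing_names`."""
    existing = set(existing_names)
    return sorted(name for name in proposed_names if name in existing)


if __name__ == "__main__":
    # Hypothetical package/metric values, as they might appear in a pipeline YAML.
    name = metric_dataset_type_name("demo", "nsrcMeas")
    print(name)
    # `existing` stands in for the dataset type names already registered in a
    # repository; how that list is obtained is outside this sketch.
    existing = ["metricvalue_demo_nsrcMeas", "metricvalue_demo_other"]
    print(find_name_collisions([name], existing))
```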

Review

The following is a brief summary of the steps for preparing for review.

  1. Push code.

  2. Run unit tests with scons. Run scons from the top level directory of the package.

    scons
    

  3. Build package documentation locally. From the top level package directory:

    package-docs build
    

  4. Run continuous integration tests with Jenkins. Now that the package has been tested on its own, it is time to test integration with the rest of the Science Pipelines. When running the Jenkins test, the list of EUPS packages to build should include lsst_distrib lsst_ci ci_hsc_gen3 ci_imsim. The latter two EUPS packages will run CI tests that include executing faro on DRP products.

  5. Make the Pull Request.

  6. Follow code review steps.

  7. Merge. Rebase if needed – see pushing code.
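As an illustration of the kind of unit test exercised in step 2, the sketch below tests a plain-Python stand-in for a row-counting metric. It deliberately avoids the LSST stack: count_rows and CountRowsTestCase are hypothetical names, the catalog is a list of dictionaries rather than a real source catalog, and actual faro tests (see the package tests directory) will differ in their imports and fixtures.

```python
import unittest


def count_rows(catalog):
    """Stand-in for a row-counting metric: number of rows in `catalog`."""
    return len(catalog)


class CountRowsTestCase(unittest.TestCase):
    """Checks the stand-in metric on a small and an empty catalog."""

    def test_counts_rows(self):
        catalog = [{"id": 1}, {"id": 2}, {"id": 3}]
        self.assertEqual(count_rows(catalog), 3)

    def test_empty_catalog(self):
        self.assertEqual(count_rows([]), 0)


if __name__ == "__main__":
    unittest.main()
```

Testing the algorithmic core on small hand-built inputs, including an empty one, keeps the test fast and independent of any data repository.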