Command-line task argument reference¶
This page describes the command-line arguments and environment variables common to command-line tasks.
Signature and syntax¶
The basic call signature of a command-line task is:
task.py REPOPATH [@file [@file2 ...]] [--output OUTPUTREPO | --rerun RERUN] [named arguments]
See Argument files for details on the @file syntax.
For named arguments that take multiple values, do not use = after the argument name. For example, use --config-file foo.py bar.py, not --config-file=foo bar.
Status code¶
A command-line task returns a status code of 0
if all data IDs were successfully processed.
If the command-line task failed to process one or more data IDs, the status code is equal to the number of data IDs that failed.
See also: --noExit
.
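A calling script can use the status code to decide how to proceed. The following is a minimal sketch, not part of the command-line task framework: `describe_status` and `run_and_describe` are hypothetical helpers, and a short Python child process stands in for a real command-line task so that the example runs anywhere.

```python
import subprocess
import sys


def describe_status(returncode):
    """Translate a command-line task status code into a message (hypothetical helper)."""
    if returncode == 0:
        return "all data IDs processed"
    return f"{returncode} data ID(s) failed"


def run_and_describe(cmd):
    """Run a command without raising on failure and describe its status code."""
    completed = subprocess.run(cmd, check=False)
    return describe_status(completed.returncode)


# Stand-in for a task that failed on two data IDs (assumption for illustration);
# replace with a real command-line task invocation.
fake_task = [sys.executable, "-c", "import sys; sys.exit(2)"]
print(run_and_describe(fake_task))
```

With a real task, `cmd` would be the full command line, for example the task script followed by REPOPATH and its named arguments.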
Positional arguments¶
-
REPOPATH
¶
Input Butler data repository URI or path.
The input Butler data repository is always the first argument to a command-line task. This argument is required for all command-line task runs, except when printing help (--help).
In general, this is a URI that depends on the Butler backend. For example, swift://host/path for a Swift backend or file://path for a POSIX backend.
For POSIX backends, this may also be an absolute file path or a path relative to the current working directory.
If the PIPE_INPUT_ROOT environment variable is set, then the REPOPATH is relative to that. See Path environment variable examples.
For background, see Using Butler data repositories and reruns with command-line tasks.
See also
--rerun
argument to specify input and output reruns within this Butler repository.
Named arguments¶
An output data repository must be specified with either --output
or --rerun
.
Other named arguments are optional.
-
--calib
<calib_repo>
¶ Calibration data repository URI or path.
The path may be absolute, relative to the current working directory, or relative to
PIPE_CALIB_ROOT
(when set). See Path environment variable examples.
-
-c
<name=val>
,
--config
<name=val>
¶ Task configuration overrides.
The -c/--config argument can appear multiple times.
See How to set configurations with command-line arguments for more information.
-
-C
<configfile>
,
--config-file
<configfile>
,
--configfile
<configfile>
¶ Task configuration override file(s).
The -C/--config-file/--configfile argument can appear multiple times.
See How to use configuration files for more information.
-
--clobber-config
¶
Back up and overwrite existing config files.
Normally a command-line task checks existing config files in a Butler repository to ensure that the current configurations are consistent with previous pipeline executions. This argument disables this check, which may be useful for development.
This argument is safe with -j multiprocessing, but not necessarily with other forms of parallel execution.
See How to override configuration checks with the --clobber-config argument for more information.
-
--clobber-output
¶
Remove and re-create the output repository if it already exists.
This argument is safe with
-j
multiprocessing, but not necessarily with other forms of parallel execution.
-
--clobber-versions
¶
Back up and then overwrite existing package version provenance.
Normally a command-line task checks that the Science Pipelines package versions are the same as for previous executions that wrote to an output repository or rerun. This argument disables this check, which may be useful for development.
This argument is safe with -j multiprocessing, but not necessarily with other forms of parallel execution.
See How to override software version checks with --no-versions or --clobber-versions for more information.
-
-h
,
--help
¶
Print help.
The help is equivalent to this documentation page, describing command-line arguments. This help does not describe the command-line task’s specific functionality.
-
--id
[[<dataid>] ...]
¶ Butler data IDs.
Specify data IDs to process using data ID syntax. For example, --id visit=12345 ccd=1,2^0,3. For more information, see Specifying data IDs with command-line tasks.
An --id argument without values indicates that all data available in the input repository will be processed (see How to specify all available data IDs).
For many-to-one processing tasks the --id argument specifies output data IDs, while --selectId is used for input data IDs.
The --id argument can appear multiple times. See How to use multiple --id arguments.
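To make the data ID syntax in the example above concrete, here is a minimal sketch of how key=value tokens might expand into individual data IDs. It is not the framework's parser. It assumes that values joined by ^ are alternatives and that keys combine as a cross product, it keeps every value as a string, and it ignores any other syntax the real parser may support. The function name `expand_data_ids` is hypothetical.

```python
import itertools


def expand_data_ids(tokens):
    """Expand ``key=value1^value2`` tokens into a list of data ID dicts.

    Assumptions (for illustration only): ``^`` separates alternative values,
    keys combine as a cross product, and values stay as plain strings.
    """
    keys = []
    alternatives = []
    for token in tokens:
        key, _, value = token.partition("=")
        keys.append(key)
        alternatives.append(value.split("^"))
    # One dict per combination of alternatives, in the order given.
    return [dict(zip(keys, combo)) for combo in itertools.product(*alternatives)]


for data_id in expand_data_ids(["visit=12345", "ccd=1,2^0,3"]):
    print(data_id)
```

Under these assumptions, the two ccd alternatives (1,2 and 0,3) yield two data IDs that share the same visit.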
-
-L
<level|component=level> [level|component=level...]
,
--loglevel
<level|component=level> [level|component=level...]
¶ Log level.
Supported levels are: trace, debug, info, warn, error, or fatal.
Log levels can be set globally (-L debug) or for a specific named logger (-L pipe.base=debug).
Specify multiple arguments to control the global and named logging levels simultaneously (-L warn pipe.base=debug).
The -L/--loglevel argument can appear multiple times.
For more information, see Logging with command-line tasks.
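The distinction between global and per-logger levels can be sketched as follows. This is an illustrative parser only, not the framework's implementation; the function name `parse_log_levels` is hypothetical, and the sketch assumes level names are given in lower case exactly as listed above.

```python
LEVELS = ("trace", "debug", "info", "warn", "error", "fatal")


def parse_log_levels(values):
    """Split -L values into a global level and a dict of per-logger levels."""
    global_level = None
    component_levels = {}
    for value in values:
        # "pipe.base=debug" -> ("pipe.base", "=", "debug"); "warn" -> ("", "", "warn")
        component, sep, level = value.rpartition("=")
        if level not in LEVELS:
            raise ValueError(f"unsupported log level: {level!r}")
        if sep:
            component_levels[component] = level
        else:
            global_level = level
    return global_level, component_levels


print(parse_log_levels(["warn", "pipe.base=debug"]))
```

Here the values correspond to -L warn pipe.base=debug: warn becomes the global level, while the pipe.base logger is set to debug.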
-
--longlog
¶
Enable the verbose logging format.
See Using the verbose logging format for more information.
-
--debug
¶
Enable debugging mode.
-
--doraise
¶
Raise an exception on error.
This mode causes the task to exit early if it encounters an error, rather than logging the error and continuing.
-
--no-backup-config
¶
Disable copying config to file~N backup.
-
--no-versions
¶
Disable package version consistency validation.
This mode permits data processing even if outputs exist in the output data repository or rerun from a different version of Science Pipelines packages.
This mode is useful for development but should not be used in production processing.
See also --clobber-versions.
See How to override software version checks with --no-versions or --clobber-versions for more information.
-
--output
<output_repo>
¶ Output data repository URI or path.
The output data repository will be created if it does not exist.
The path may be absolute, relative to the current working directory, or relative to PIPE_OUTPUT_ROOT (when set). See Path environment variable examples.
--output may not be used with the --rerun argument.
See Using Butler data repositories and reruns with command-line tasks for background.
-
-j
<processes>
,
--processes
<processes>
¶ Number of processes to use.
When processes is larger than 1 the task uses the Python multiprocessing module to parallelize processing of multiple datasets across multiple processors.
See also --timeout.
See Parallel processing with command-line tasks for more information.
-
--rerun
<[input:]output>
¶ Specify output rerun (and optionally the input rerun as well).
Reruns are data repositories relative to the root repository, REPOPATH. --rerun output is equivalent to --output REPOPATH/rerun/output.
An input rerun can also, optionally, be specified. --rerun input:output sets the input repository path to REPOPATH/rerun/input and the output repository path to REPOPATH/rerun/output.
If an argument to --rerun starts with a /, it will be interpreted as an absolute path rather than as being relative to the root input data repository.
The arguments supplied to --rerun may refer to symbolic links to directories. Data will be read from or written to the links' targets.
See Using Butler data repositories and reruns with command-line tasks for more information.
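The path rules for --rerun described above can be sketched in a few lines. This is a simplified illustration for POSIX-style paths, not the framework's code; `resolve_rerun` is a hypothetical name, and symbolic links and URIs are not handled.

```python
import posixpath


def resolve_rerun(repopath, rerun):
    """Return (input_path, output_path) for a --rerun value.

    Follows the rules described above: names are placed under
    REPOPATH/rerun/ unless they start with "/".
    """
    def rerun_path(name):
        if name.startswith("/"):
            return name  # absolute path: used as given
        return posixpath.join(repopath, "rerun", name)

    if ":" in rerun:
        input_name, output_name = rerun.split(":", 1)
        return rerun_path(input_name), rerun_path(output_name)
    # Only an output rerun was given, so the input stays at REPOPATH.
    return repopath, rerun_path(rerun)


print(resolve_rerun("/data/repo", "output"))
print(resolve_rerun("/data/repo", "input:output"))
```

The repository path /data/repo is a made-up example value.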
-
--show
<config|history|data|tasks|run>
¶ Print metadata without processing.
Permitted values are:
config: show configuration state; add =PATTERN to limit to configuration entries matching the glob pattern.
history=PATTERN: show where the configuration entries that match the glob pattern were set.
data: show data IDs resolved by the --id argument.
tasks: show sub-tasks run by the command-line task.
Multiple values can be shown at once. For example, --show config data.
Normally the command-line task will exit before processing any data. If you want to also run the task after showing metadata, append the run value. For example, --show config data run.
-
--selectId
¶
Input data IDs for many-to-one tasks.
For many-to-one processing tasks, such as coaddition, the --selectId argument is used to specify input data IDs, while --id is used to specify output data IDs. The syntax for --selectId is identical to that of --id.
For more information about data ID selection syntax, see Specifying data IDs with command-line tasks.
-
--noExit
¶
(Advanced) Prevent the command-line task from exiting directly to the shell with a non-zero status code if there are one or more processing failures.
If there are failures, by default, a command-line task exits directly to the shell with a status code equal to the number of data IDs that it failed to process. This means that the command-line task does not return to the run script that originally called the lsst.pipe.base.CmdLineTask.parseAndRun method if there is an error. Some command-line tasks (such as the MPI-enabled scripts in pipe_drivers) need lsst.pipe.base.CmdLineTask.parseAndRun to always return to the run script. In that case, use this --noExit argument.
When --noExit is used, the command-line task will not exit to the shell from lsst.pipe.base.CmdLineTask.parseAndRun if failures are encountered. Instead, it will return normally to the run script that called parseAndRun. In this case, it is up to the run script to set an appropriate shell status code.
See also --doraise.
Argument files¶
Arguments can be written to a plain text file and referenced with an @filepath
command-line argument.
The contents of argument files are identical to what you’d write on the command line, with these rules:
Text can be split across multiple lines. For example, you can have one argument per line.
Do not use \ as a continuation character.
Include comments with a # character. Content on a line after the # character is ignored.
Blank lines and lines starting with # are ignored.
You can mix argument files with other command-line arguments (including additional --id
and --config
arguments).
You can include multiple @filepath
references in the same command.
Example¶
For example, the file foo.txt
contains:
--id visit=54123^55523 raft=1,1^2,1 # data ID
--config someParam=someValue --config-file configOverrideFilePath
You can then reference it with @foo.txt
, along with additional command-line arguments:
task.py repo @foo.txt --config anotherParam=anotherValue --output outputPath
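The argument file rules can be illustrated with a short sketch that turns the lines of a file like foo.txt into a flat argument list. This is not the framework's own parser. It is an approximation built on the standard library's `shlex.split`, whose `comments=True` option drops everything after an unquoted # character. The function name `read_argument_lines` is hypothetical.

```python
import shlex


def read_argument_lines(lines):
    """Convert argument-file lines into a flat list of arguments."""
    args = []
    for line in lines:
        # Blank lines and comment-only lines yield no tokens;
        # trailing "# ..." comments are stripped by comments=True.
        args.extend(shlex.split(line, comments=True))
    return args


# Contents of the example foo.txt, plus a blank line and a full-line comment.
foo_txt = """\
--id visit=54123^55523 raft=1,1^2,1 # data ID

# a full-line comment
--config someParam=someValue --config-file configOverrideFilePath
"""
print(read_argument_lines(foo_txt.splitlines()))
```

The resulting list is what the command line would contain if the same arguments were typed directly in place of @foo.txt.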
Environment variables¶
The PIPE_INPUT_ROOT
, PIPE_CALIB_ROOT
, and PIPE_OUTPUT_ROOT
environment variables let you more easily specify Butler data repositories.
Each environment variable is used as a root directory for relative paths provided on the command line. If you set an absolute path on the command line, the environment variable is ignored. See Path environment variable examples.
-
PIPE_INPUT_ROOT
¶ Root directory for the input Butler data repository argument (REPOPATH).
-
PIPE_CALIB_ROOT
¶ Root directory for the calibration Butler data repository argument (--calib).
-
PIPE_OUTPUT_ROOT
¶ Root directory for the output Butler data repository argument (--output).
Path environment variable examples¶
These examples feature PIPE_INPUT_ROOT
to help specify the input data repository along with REPOPATH
, which is the first positional argument of any command.
The data repository path is $PIPE_INPUT_ROOT/DATA (or DATA if PIPE_INPUT_ROOT is undefined):
processCcd.py DATA [...]
The data repository path is $PIPE_INPUT_ROOT (or current working directory if PIPE_INPUT_ROOT is undefined):
processCcd.py . [...]
The data repository path is an absolute path; PIPE_INPUT_ROOT is ignored in this case:
processCcd.py /DATA/a [...]
The same behavior applies to the named arguments:
--calib with PIPE_CALIB_ROOT.
--output with PIPE_OUTPUT_ROOT.
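The resolution rule shown in these examples can be sketched as a small function. It is a simplified illustration for POSIX-style paths only (URIs are not handled) and is not the framework's implementation. The name `resolve_repo_path` and the /datasets root are hypothetical.

```python
import os
import posixpath


def resolve_repo_path(path, root_var, environ=None):
    """Resolve a repository path against a root environment variable.

    Absolute paths ignore the variable; relative paths are joined to it
    when it is set, and otherwise stay relative to the working directory.
    """
    environ = os.environ if environ is None else environ
    root = environ.get(root_var)
    if root is None or posixpath.isabs(path):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join(root, path))


env = {"PIPE_INPUT_ROOT": "/datasets"}  # made-up value for illustration
print(resolve_repo_path("DATA", "PIPE_INPUT_ROOT", env))
print(resolve_repo_path(".", "PIPE_INPUT_ROOT", env))
print(resolve_repo_path("/DATA/a", "PIPE_INPUT_ROOT", env))
```

Passing "PIPE_CALIB_ROOT" or "PIPE_OUTPUT_ROOT" as `root_var` would mirror the same rule for --calib and --output.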