PipelineGraph¶
- class lsst.pipe.base.pipeline_graph.PipelineGraph(*, description: str = '', universe: DimensionUniverse | None = None, data_id: DataCoordinate | Mapping[str, Any] | None = None)¶
- Bases: object

  A graph representation of a fully-configured pipeline. PipelineGraph instances are typically constructed by calling Pipeline.to_graph, but in rare cases constructing and then populating an empty one may be preferable.

  Parameters:
  - description : str, optional
    String description for this pipeline.
  - universe : lsst.daf.butler.DimensionUniverse, optional
    Definitions for all butler dimensions. If not provided, some attributes will not be available until resolve is called.
  - data_id : lsst.daf.butler.DataCoordinate or other data ID, optional
    Data ID that represents a constraint on all quanta generated by this pipeline. This typically just holds the instrument constraint included in the pipeline definition, if there was one.
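  A minimal sketch of the typical construction path (the pipeline URI and repository path below are hypothetical):

      from lsst.daf.butler import Butler
      from lsst.pipe.base import Pipeline

      # Load a pipeline definition and convert it to a graph.
      pipeline = Pipeline.from_uri("my_pipeline.yaml")  # hypothetical file
      graph = pipeline.to_graph()

      # Resolve dimensions and dataset types against a data repository.
      butler = Butler("/path/to/repo")  # hypothetical repository
      graph.resolve(registry=butler.registry)

      for task_label in graph.tasks:
          print(task_label)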
  Attributes Summary

  - data_id: Data ID that represents a constraint on all quanta generated from this pipeline.
  - dataset_types: A mapping view of the dataset types in the graph.
  - description: String description for this pipeline.
  - has_been_sorted: Whether this graph's tasks and dataset types have been topologically sorted (with unspecified but deterministic tiebreakers) since the last modification to the graph.
  - is_fully_resolved: Whether all of this graph's nodes are resolved and all have been checked for correctness.
  - packages_dataset_type: The special "packages" dataset type that records software versions.
  - steps: An ordered iterable of the labels of task subsets that must be executed separately.
  - task_subsets: A mapping of all labeled subsets of tasks.
  - tasks: A mapping view of the tasks in the graph.
  - universe: Definitions for all butler dimensions.

  Methods Summary

  - add_task(label, task_class[, config, ...]): Add a new task to the graph.
  - add_task_nodes(nodes[, parent]): Add one or more existing task nodes to the graph.
  - add_task_subset(subset_label, task_labels[, ...]): Add a label for a set of tasks that are already in the pipeline.
  - check_dataset_type_registrations(butler[, ...]): Check that dataset type registrations in a data repository match the definitions in this pipeline graph.
  - consumers_of(dataset_type_name): Return the TaskNode and/or TaskInitNode objects that read the given dataset type.
  - consuming_edges_of(dataset_type_name): Return the ReadEdge objects that link the named dataset type to the tasks that consume it.
  - copy(): Return a copy of this graph that copies all mutable state.
  - diff_tasks(other): Compare two pipeline graphs.
  - get_all_dimensions([prerequisites]): Return all dimensions used in this graph's tasks and dataset types.
  - get_task_step(task_label): Return the step that the given task belongs to.
  - group_by_dimensions([prerequisites]): Group this graph's tasks and dataset types by their dimensions.
  - init_output_run(butler): Initialize a new output RUN collection by writing init-output datasets (including configs and packages).
  - inputs_of(task_label[, init]): Return the dataset types that are inputs to a task.
  - instantiate_tasks([get_init_input, ...]): Instantiate all tasks in the pipeline.
  - iter_edges([init]): Iterate over edges in the graph.
  - iter_nodes(): Iterate over nodes in the graph.
  - iter_overall_inputs(): Iterate over all of the dataset types that are consumed but not produced by the graph.
  - make_bipartite_xgraph([init]): Return a bipartite networkx representation of just the runtime or init-time pipeline graph.
  - make_dataset_type_xgraph([init]): Return a networkx representation of just the dataset types in the pipeline.
  - make_task_xgraph([init]): Return a networkx representation of just the tasks in the pipeline.
  - make_xgraph(): Export a networkx representation of the full pipeline graph, including both init and runtime edges.
  - outputs_of(task_label[, init, ...]): Return the dataset types that are outputs of a task.
  - producer_of(dataset_type_name): Return the TaskNode or TaskInitNode that writes the given dataset type.
  - producing_edge_of(dataset_type_name): Return the WriteEdge that links the producing task to the named dataset type.
  - reconfigure_tasks(*args[, ...]): Update the configuration for one or more tasks.
  - register_dataset_types(butler[, ...]): Register all dataset types in a data repository.
  - remove_task_subset(subset_label): Remove a labeled set of tasks.
  - remove_tasks(labels[, drop_from_subsets]): Remove one or more tasks from the graph.
  - resolve([registry, dimensions, ...]): Resolve all dimensions and dataset types and check them for consistency.
  - select(expression): Return a new pipeline graph with the tasks that match an expression.
  - select_tasks(expression): Return the tasks that match an expression.
  - sort(): Sort this graph's nodes topologically with deterministic (but unspecified) tiebreakers.
  - split_independent(): Iterate over independent subgraphs that together comprise this pipeline graph.
  - write_configs(butler): Write the config datasets for all tasks in the pipeline graph.
  - write_init_outputs(butler): Write the init-output datasets for all tasks in the pipeline graph.
  - write_packages(butler): Write the 'packages' dataset for the currently-active software versions.

  Attributes Documentation

- data_id¶
- Data ID that represents a constraint on all quanta generated from this pipeline.

  This may not be available unless the graph is resolved.
 - dataset_types¶
- A mapping view of the dataset types in the graph.

  This mapping has str parent dataset type name keys, but only provides access to its DatasetTypeNode values if resolve has been called since the last modification involving a task that uses a dataset type. See DatasetTypeMappingView for details.
 - description¶
- String description for this pipeline. 
 - has_been_sorted¶
- Whether this graph's tasks and dataset types have been topologically sorted (with unspecified but deterministic tiebreakers) since the last modification to the graph.

  If the pipeline graph has step definitions, the sort order is consistent with the step order.

  This may return False if the graph happens to be sorted but sort was never called.
 - is_fully_resolved¶
- Whether all of this graph's nodes are resolved and all have been checked for correctness.

  A fully-resolved graph is always sorted as well.
 - packages_dataset_type¶
- The special “packages” dataset type that records software versions. - This is not associated with a task and hence is not considered part of the pipeline graph in other respects, but it does get written with other provenance datasets. 
 - steps¶
- An ordered iterable of the labels of task subsets that must be executed separately.

  Steps are intended to be a partition of the full pipeline graph - the steps together include all tasks, and each task belongs to exactly one step - but this is only checked when the graph is resolved.

  The order of the steps is required to be consistent with the graph's topological ordering, and when sorting a graph with steps, that sort order is guaranteed to be consistent with the order of the steps. But there may be other orderings of the steps that are still valid for execution (e.g. two steps could be runnable in parallel).
 - task_subsets¶
- A mapping of all labeled subsets of tasks.

  Keys are subset labels, values are sets of task labels. See TaskSubset for more information.

  Use add_task_subset to add a new subset. The subsets themselves may be modified in-place.
 - tasks¶
- A mapping view of the tasks in the graph.

  This mapping has str task label keys and TaskNode values. Iteration is topologically and deterministically ordered if and only if sort has been called since the last modification to the graph.
 - universe¶
- Definitions for all butler dimensions. 
  Methods Documentation

- add_task(label: str | None, task_class: type[PipelineTask], config: PipelineTaskConfig | None = None, connections: PipelineTaskConnections | None = None) TaskNode¶
- Add a new task to the graph. - Parameters:
- label : str or None
  Label for the task in the pipeline. If None, Task._DefaultName is used.
- task_class : type[PipelineTask]
  Class object for the task.
- config : PipelineTaskConfig, optional
  Configuration for the task. If not provided, a default-constructed instance of task_class.ConfigClass is used.
- connections : PipelineTaskConnections, optional
  Object that describes the dataset types used by the task. If not provided, one will be constructed from the given configuration. If provided, it is assumed that config has already been validated and frozen.
- Returns:
- node : TaskNode
  The new task node added to the graph.
- Raises:
- ValueError
- Raised if configuration validation failed when constructing connections.
- PipelineDataCycleError
- Raised if the graph is cyclic after this addition. 
- RuntimeError
- Raised if an unexpected exception (which will be chained) occurred at a stage that may have left the graph in an inconsistent state. Other exceptions should leave the graph unchanged. 
 
  Notes

  Checks for dataset type consistency and multiple producers do not occur until resolve is called, since the resolution depends on both the state of the data repository and all contributing tasks.

  Adding new tasks removes any existing resolutions of all dataset types it references and marks the graph as unsorted. It is most efficient to add all tasks up front and only then resolve and/or sort the graph.
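  A hedged sketch of that add-then-resolve workflow; ExampleTask and its import are hypothetical stand-ins for a real PipelineTask subclass:

      from lsst.pipe.base import PipelineGraph
      from my_package import ExampleTask  # hypothetical PipelineTask subclass

      graph = PipelineGraph(description="example pipeline")

      # Add all tasks first; resolution and sorting are deferred.
      graph.add_task("taskA", ExampleTask)
      graph.add_task("taskB", ExampleTask)

      # Resolve once, after all tasks are in place.
      graph.resolve()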
 - add_task_nodes(nodes: Iterable[TaskNode], parent: PipelineGraph | None = None) None¶
- Add one or more existing task nodes to the graph. - Parameters:
- nodes : Iterable[TaskNode]
  Iterable of task nodes to add. If any tasks have resolved dimensions, they must have the same dimension universe as the rest of the graph.
- parent : PipelineGraph, optional
  If provided, another PipelineGraph from which these nodes were obtained. Any dataset type nodes already present in parent that are referenced by the given tasks will be used in this graph if they are not already present, preserving any dataset type resolutions present in the parent graph. Adding nodes from a parent graph after the graph has its own nodes (e.g. from add_task) or nodes from a third graph may result in invalid dataset type resolutions. It is safest to only use this argument when populating an empty graph for the first time.
- Raises:
- PipelineDataCycleError
- Raised if the graph is cyclic after this addition. 
 
  Notes

  Checks for dataset type consistency and multiple producers do not occur until resolve is called, since the resolution depends on both the state of the data repository and all contributing tasks.

  Adding new tasks removes any existing resolutions of all dataset types it references (unless parent is not None) and marks the graph as unsorted. It is most efficient to add all tasks up front and only then resolve and/or sort the graph.
 - add_task_subset(subset_label: str, task_labels: Iterable[str], description: str = '') None¶
- Add a label for a set of tasks that are already in the pipeline.

  Parameters:
  - subset_label : str
    Label for this set of tasks.
  - task_labels : Iterable[str]
    Labels of the tasks to include in the set.
  - description : str, optional
    Description string for this subset.
 - check_dataset_type_registrations(butler: Butler, include_packages: bool = True) None¶
- Check that dataset type registrations in a data repository match the definitions in this pipeline graph. - Parameters:
- butler : Butler
  Data repository client.
- include_packages : bool, optional
  Whether to include the special "packages" dataset type that records software versions (this is not associated with a task and hence is not considered part of the pipeline graph in other respects, but it does get written with other provenance datasets).
- Raises:
- lsst.daf.butler.MissingDatasetTypeError
- Raised if one or more non-optional input or output dataset types in the pipeline are not registered at all.
- lsst.daf.butler.ConflictingDefinitionError
- Raised if the definition in the data repository is not identical to the definition in the pipeline graph. 
 
  Notes

  Note that dataset type definitions that are storage-class-conversion compatible but not identical are not permitted by these checks, because the expectation is that these differences are handled by resolve, which makes the pipeline graph use the data repository definitions. This method is intended to check that none of those definitions have changed.
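  A hedged sketch of the intended sequence, with a hypothetical repository path: resolve against the repository, register anything missing, then verify the registrations later.

      from lsst.daf.butler import Butler

      butler = Butler("/path/to/repo", writeable=True)  # hypothetical repo

      # Adopt the repository's dataset type definitions, then register
      # and check.
      graph.resolve(registry=butler.registry)
      graph.register_dataset_types(butler)
      graph.check_dataset_type_registrations(butler)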
 - consumers_of(dataset_type_name: str) list[lsst.pipe.base.pipeline_graph._tasks.TaskNode | lsst.pipe.base.pipeline_graph._tasks.TaskInitNode]¶
- Return the TaskNode and/or TaskInitNode objects that read the given dataset type.

  Parameters:
  - dataset_type_name : str
    Dataset type name. Must not be a component.

  Returns:
  - consumers : list[TaskNode or TaskInitNode]
    Task nodes and/or task init nodes that read the given dataset type.

  Notes

  On resolved graphs, it may be slightly more efficient to use:

      graph.dataset_types[dataset_type_name].consuming_edges

  but this method works on graphs with unresolved dataset types as well.
 - consuming_edges_of(dataset_type_name: str) list[lsst.pipe.base.pipeline_graph._edges.ReadEdge]¶
- Return the ReadEdge objects that link the named dataset type to the tasks that consume it.

  Parameters:
  - dataset_type_name : str
    Dataset type name. Must not be a component.

  Returns:
  - edges : list[ReadEdge]
    Edges that link this dataset type to the tasks that consume it.

  Notes

  On resolved graphs, it may be slightly more efficient to use:

      graph.dataset_types[dataset_type_name].consuming_edges

  but this method works on graphs with unresolved dataset types as well.
 - copy() PipelineGraph¶
- Return a copy of this graph that copies all mutable state. 
 - diff_tasks(other: PipelineGraph) list[str]¶
- Compare two pipeline graphs. - This only compares graph structure and task classes (including their edges). It does not compare full configuration (which is subject to spurious differences due to import-cache state), dataset type resolutions, or sort state. - Parameters:
- other : PipelineGraph
  Graph to compare to.

  Returns:
  - differences : list[str]
    Human-readable messages describing the differences between self and other.
 - get_all_dimensions(prerequisites: bool = True) DimensionGroup¶
- Return all dimensions used in this graph's tasks and dataset types.

  Parameters:
  - prerequisites : bool, optional
    Whether to include the dimensions of prerequisite-input dataset types (default True).

  Returns:
  - dimensions : DimensionGroup
    All dimensions in this pipeline.
 
 - group_by_dimensions(prerequisites: bool = False) dict[lsst.daf.butler.dimensions._group.DimensionGroup, tuple[dict[str, lsst.pipe.base.pipeline_graph._tasks.TaskNode], dict[str, lsst.pipe.base.pipeline_graph._dataset_types.DatasetTypeNode]]]¶
- Group this graph's tasks and dataset types by their dimensions.

  Parameters:
  - prerequisites : bool, optional
    Whether to include prerequisite-input dataset types in the groups (default False).

  Returns:
  - groups : dict[DimensionGroup, tuple]
    A dictionary of groups keyed by DimensionGroup, in which each value is a tuple of:
    - a dict of TaskNode instances, keyed by task label;
    - a dict of DatasetTypeNode instances, keyed by parent dataset type name;
    that have those dimensions.
  Notes

  Init inputs and outputs are always included, but always have empty dimensions and are hence all grouped together.
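  A short sketch of consuming the grouping, assuming graph is an already-resolved PipelineGraph:

      # Report how many tasks and dataset types share each dimension group.
      for dims, (tasks_by_label, types_by_name) in graph.group_by_dimensions().items():
          print(f"{dims}: {len(tasks_by_label)} task(s), {len(types_by_name)} dataset type(s)")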
 - init_output_run(butler: Butler) None¶
- Initialize a new output RUN collection by writing init-output datasets (including configs and packages). - Parameters:
- butler : lsst.daf.butler.Butler
  A full butler data repository client with its default run set to the collection where datasets should be written.
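  A hedged sketch, with a hypothetical repository path and run collection name:

      from lsst.daf.butler import Butler

      # A writeable butler whose default run is the new output collection.
      butler = Butler("/path/to/repo", run="u/user/example_run", writeable=True)
      graph.init_output_run(butler)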
 
 - inputs_of(task_label: str, init: bool = False) dict[str, lsst.pipe.base.pipeline_graph._dataset_types.DatasetTypeNode | None]¶
- Return the dataset types that are inputs to a task.

  Parameters:
  - task_label : str
    Label for the task in the pipeline.
  - init : bool, optional
    If True, return the init-input dataset types for this task instead of the runtime inputs (default False).

  Returns:
  - inputs : dict[str, DatasetTypeNode or None]
    Dictionary with parent dataset type name keys and either DatasetTypeNode values (if the dataset type has been resolved) or None values.
  Notes

  To get the input edges of a task or task init node (which provide information about storage class overrides and components) use:

      graph.tasks[task_label].iter_all_inputs()

  or

      graph.tasks[task_label].init.iter_all_inputs()

  or the various mapping attributes of the TaskNode and TaskInitNode classes.
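  For example, a minimal sketch (the task label "isr" is hypothetical):

      # Values are None until the graph has been resolved.
      for name, node in graph.inputs_of("isr").items():
          print(name, "(resolved)" if node is not None else "(unresolved)")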
 - instantiate_tasks(get_init_input: Callable[[DatasetType], Any] | None = None, init_outputs: list[tuple[Any, DatasetType]] | None = None, labels: Iterable[str] | None = None) list[PipelineTask]¶
- Instantiate all tasks in the pipeline. - Parameters:
- get_init_input : Callable, optional
  Callable that accepts a single DatasetType parameter and returns the init-input dataset associated with that dataset type. Must respect the storage class embedded in the type. This is optional if the pipeline does not have any overall init inputs. When a full butler is available, lsst.daf.butler.Butler.get can be used directly here.
- init_outputs : list, optional
  A list of (obj, dataset type) init-output dataset pairs, to be appended to in-place. Both the object and the dataset type will correspond to the storage class of the output connection, which may not be the same as the storage class on the graph's dataset type node.
- labels : Iterable[str], optional
  The labels of tasks to instantiate. If not provided, all tasks in the graph will be instantiated.

  Returns:
  - tasks : list
    Constructed PipelineTask instances.
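  A hedged sketch of instantiating tasks with a butler supplying any overall init-inputs (repository path and collection are hypothetical):

      from lsst.daf.butler import Butler

      butler = Butler("/path/to/repo", collections=["u/user/some_run"])
      tasks = graph.instantiate_tasks(get_init_input=butler.get)
      for task in tasks:
          print(type(task).__name__)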
 
 - iter_edges(init: bool = False) Iterator[Edge]¶
- Iterate over edges in the graph.

  Parameters:
  - init : bool, optional
    If True, iterate over the init edges (between task init nodes and init input/output dataset types) instead of the runtime edges (default False).

  Returns:
  - edges : Iterator[Edge]
    A lazy iterator over Edge (WriteEdge or ReadEdge) instances.
  Notes

  This method always returns either init edges or runtime edges, never both. The full (internal) graph that contains both also includes a special edge that connects each task init node to its runtime node; that is also never returned by this method, since it is never a part of the init-only or runtime-only subgraphs.
- iter_nodes() Iterator[tuple[NodeType, str, TaskNode | TaskInitNode | DatasetTypeNode | None]]¶
- Iterate over nodes in the graph. - Returns:
- nodes : Iterator[tuple]
  A lazy iterator over all of the nodes in the graph. Each yielded element is a tuple of:
  - the node type enum value (NodeType);
  - the string name for the node (task label or parent dataset type name);
  - the node value (TaskNode, TaskInitNode, DatasetTypeNode, or None for dataset type nodes that have not been resolved).
 
 - iter_overall_inputs() Iterator[tuple[str, lsst.pipe.base.pipeline_graph._dataset_types.DatasetTypeNode | None]]¶
- Iterate over all of the dataset types that are consumed but not produced by the graph. - Returns:
- dataset_types : Iterator[tuple]
  A lazy iterator over the overall-input dataset types (including overall init inputs and prerequisites). Each yielded element is a tuple of:
  - the parent dataset type name;
  - the resolved DatasetTypeNode, or None if the dataset type has not been resolved.
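  For instance, a short sketch listing what the pipeline expects to find in the data repository:

      # Overall inputs are consumed but never produced within the pipeline.
      for name, node in graph.iter_overall_inputs():
          print(name, "(resolved)" if node is not None else "(unresolved)")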
 
 - make_bipartite_xgraph(init: bool = False) MultiDiGraph¶
- Return a bipartite networkx representation of just the runtime or init-time pipeline graph.

  Parameters:
  - init : bool, optional
    If True, return the init-time graph (task init nodes and init input/output dataset types) instead of the runtime graph (default False).

  Returns:
- xgraph : networkx.MultiDiGraph
  Directed acyclic graph with parallel edges.
  Notes

  The returned graph uses NodeKey instances for nodes. Parallel edges represent the same dataset type appearing in multiple connections for the same task, and are hence rare. The connection name is used as the edge key to disambiguate those parallel edges.

  This graph is bipartite because each dataset type node only has edges that connect it to a task [init] node, and vice versa.

  See TaskNode, TaskInitNode, DatasetTypeNode, ReadEdge, and WriteEdge for the descriptive node and edge attributes added.
 - make_dataset_type_xgraph(init: bool = False) DiGraph¶
- Return a networkx representation of just the dataset types in the pipeline.

  Parameters:
  - init : bool, optional
    If True, return the graph of init input/output dataset types instead of the runtime dataset types (default False).

  Returns:
- xgraph : networkx.DiGraph
  Directed acyclic graph with no parallel edges.
  Notes

  The returned graph uses NodeKey instances for nodes. The tasks that link these dataset types are not represented at all; edges have no attributes, and there are no parallel edges.

  See DatasetTypeNode for the descriptive node attributes added.
 - make_task_xgraph(init: bool = False) DiGraph¶
- Return a networkx representation of just the tasks in the pipeline.

  Parameters:
  - init : bool, optional
    If True, return the graph of task init nodes instead of the runtime task nodes (default False).

  Returns:
- xgraph : networkx.DiGraph
  Directed acyclic graph with no parallel edges.
  Notes

  The returned graph uses NodeKey instances for nodes. The dataset types that link these tasks are not represented at all; edges have no attributes, and there are no parallel edges.

  See TaskNode and TaskInitNode for the descriptive node attributes added.
 - make_xgraph() MultiDiGraph¶
- Export a networkx representation of the full pipeline graph, including both init and runtime edges. - Returns:
- xgraph : networkx.MultiDiGraph
  Directed acyclic graph with parallel edges.
  Notes

  The returned graph uses NodeKey instances for nodes. Parallel edges represent the same dataset type appearing in multiple connections for the same task, and are hence rare. The connection name is used as the edge key to disambiguate those parallel edges.

  Almost all edges connect dataset type nodes to task or task init nodes or vice versa, but there is also a special edge that connects each task init node to its runtime node. The existence of these edges makes the graph not quite bipartite, though its init-only and runtime-only subgraphs are bipartite.

  See TaskNode, TaskInitNode, DatasetTypeNode, ReadEdge, and WriteEdge for the descriptive node and edge attributes added.
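  A hedged sketch of inspecting the exported graph with networkx, assuming graph is a resolved PipelineGraph:

      import networkx

      xgraph = graph.make_xgraph()
      print(xgraph.number_of_nodes(), "nodes,", xgraph.number_of_edges(), "edges")

      # Walk the full init+runtime graph in topological generations.
      for generation in networkx.topological_generations(xgraph):
          print([str(key) for key in generation])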
 - outputs_of(task_label: str, init: bool = False, include_automatic_connections: bool = True) dict[str, lsst.pipe.base.pipeline_graph._dataset_types.DatasetTypeNode | None]¶
- Return the dataset types that are outputs of a task.

  Parameters:
  - task_label : str
    Label for the task in the pipeline.
  - init : bool, optional
    If True, return the init-output dataset types for this task instead of the runtime outputs (default False).
  - include_automatic_connections : bool, optional
    Whether to include automatic connections such as the config, log, and metadata dataset types (default True).

  Returns:
  - outputs : dict[str, DatasetTypeNode or None]
    Dictionary with parent dataset type name keys and either DatasetTypeNode values (if the dataset type has been resolved) or None values.
  Notes

  To get the output edges of a task or task init node (which provide information about storage class overrides and components) use:

      graph.tasks[task_label].iter_all_outputs()

  or

      graph.tasks[task_label].init.iter_all_outputs()

  or the various mapping attributes of the TaskNode and TaskInitNode classes.
 - producer_of(dataset_type_name: str) TaskNode | TaskInitNode | None¶
- Return the TaskNode or TaskInitNode that writes the given dataset type.

  Parameters:
  - dataset_type_name : str
    Dataset type name. Must not be a component.

  Returns:
  - producer : TaskNode, TaskInitNode, or None
    Producing node or None if there isn't one in this graph.
- Raises:
- DuplicateOutputError
- Raised if there are multiple tasks defined to produce this dataset type. This is only possible if the graph’s dataset types are not resolved. 
 
 
 - producing_edge_of(dataset_type_name: str) WriteEdge | None¶
- Return the WriteEdge that links the producing task to the named dataset type.

  Parameters:
  - dataset_type_name : str
    Dataset type name. Must not be a component.

  Returns:
  - edge : WriteEdge or None
    Producing edge, or None if there isn't one in this graph.

  Raises:
- DuplicateOutputError
- Raised if there are multiple tasks defined to produce this dataset type. This is only possible if the graph’s dataset types are not resolved. 
 
  Notes

  On resolved graphs, it may be slightly more efficient to use:

      graph.dataset_types[dataset_type_name].producing_edge

  but this method works on graphs with unresolved dataset types as well.
 - reconfigure_tasks(*args: tuple[str, PipelineTaskConfig], check_edges_unchanged: bool = False, assume_edges_unchanged: bool = False, **kwargs: PipelineTaskConfig) None¶
- Update the configuration for one or more tasks. - Parameters:
- *args : tuple[str, PipelineTaskConfig]
  Positional arguments are each a 2-tuple of task label and new config object. Note that the same arguments may also be passed as **kwargs, which is usually more readable, but task labels in *args are not required to be valid Python identifiers.
- check_edges_unchanged : bool, optional
  If True, require the edges (connections) of the modified tasks to remain unchanged after the configuration updates, and verify that this is the case.
- assume_edges_unchanged : bool, optional
  If True, the caller declares that the edges (connections) of the modified tasks will remain unchanged after the configuration updates, and that it is unnecessary to check this.
- **kwargs : PipelineTaskConfig
  New config objects or overrides to apply to copies of the current config objects, with task labels as the keywords.
- Raises:
- ValueError
- Raised if assume_edges_unchanged and check_edges_unchanged are both True, or if the same task appears twice.
- EdgesChangedError
- Raised if check_edges_unchanged=True and the edges of a task do change.
 
 - Notes - If reconfiguring a task causes its edges to change, any dataset type nodes connected to that task (not just those whose edges have changed!) will be unresolved. 
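  A hedged usage sketch; the task label "isr" and the config field being changed are hypothetical, and the connections are assumed not to change:

      # Build a replacement config from the task's ConfigClass.
      new_config = type(graph.tasks["isr"].config)()
      new_config.doWrite = False  # hypothetical config field
      graph.reconfigure_tasks(isr=new_config, assume_edges_unchanged=True)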
 - register_dataset_types(butler: Butler, include_packages: bool = True) None¶
- Register all dataset types in a data repository. - Parameters:
- butler : Butler
  Data repository client.
- include_packages : bool, optional
  Whether to include the special "packages" dataset type that records software versions (this is not associated with a task and hence is not considered part of the pipeline graph in other respects, but it does get written with other provenance datasets).
 
 - remove_task_subset(subset_label: str) None¶
- Remove a labeled set of tasks. - Parameters:
- subset_label : str
  Label for this set of tasks.
 - Notes - If this subset is a step, it is also removed from the step definitions. 
 - remove_tasks(labels: Iterable[str], drop_from_subsets: bool = True) list[tuple[lsst.pipe.base.pipeline_graph._tasks.TaskNode, set[str]]]¶
- Remove one or more tasks from the graph.

  Parameters:
  - labels : Iterable[str]
    Labels of the tasks to remove.
  - drop_from_subsets : bool, optional
    If True, removed tasks are also dropped from any subsets that contain them (default True).

  Returns:
  - nodes_and_subsets : list[tuple[TaskNode, set[str]]]
    The removed task nodes, each paired with the labels of the task subsets that referenced it.

  Raises:
- PipelineGraphError
- Raised if drop_from_subsets is False and the task is still part of one or more subsets.
 
 - Notes - Removing a task will cause dataset nodes with no other referencing tasks to be removed. Any other dataset type nodes referenced by a removed task will be reset to an “unresolved” state. 
 - resolve(registry: Registry | None = None, dimensions: DimensionUniverse | None = None, dataset_types: Mapping[str, DatasetType] | None = None, visualization_only: bool = False) None¶
- Resolve all dimensions and dataset types and check them for consistency. - Resolving a graph also causes it to be sorted. - Parameters:
- registry : lsst.daf.butler.Registry, optional
  Client for the data repository to resolve against.
- dimensions : lsst.daf.butler.DimensionUniverse, optional
  Definitions for all dimensions. Takes precedence over registry.dimensions if both are provided. If neither is provided, defaults to the default dimension universe (lsst.daf.butler.DimensionUniverse()).
- dataset_types : Mapping[str, DatasetType], optional
  Mapping of dataset types to consider registered. Takes precedence over registry.getDatasetType() if both are provided.
- visualization_only : bool, optional
  Resolve the graph as well as possible even when dimensions and storage classes cannot really be determined. This can include using universe.commonSkyPix as the assumed dimensions of connections that use the "skypix" placeholder and using "<UNKNOWN>" as a storage class name (which will fail if the storage class itself is ever actually loaded).
- Raises:
- ConnectionTypeConsistencyError
- Raised if a prerequisite input for one task appears as a different kind of connection in any other task. 
- DuplicateOutputError
- Raised if multiple tasks have the same dataset type as an output. 
- IncompatibleDatasetTypeError
- Raised if different tasks have different definitions of a dataset type. Different but compatible storage classes are permitted. 
- MissingDatasetTypeError
- Raised if a dataset type definition is required to exist in the data repository but none was found. This should only occur for dataset types that are not produced by a task in the pipeline and are consumed with different storage classes or as components by tasks in the pipeline. 
- EdgesChangedError
- Raised if check_edges_unchanged=True and the edges of a task do change after import and reconfiguration.
 
  Notes

  The universe attribute is set to dimensions and used to set all TaskNode.dimensions attributes. Dataset type nodes are resolved by first looking for a registry definition, then using the producing task's definition, then looking for consistency between all consuming task definitions.
 - select(expression: str) PipelineGraph¶
- Return a new pipeline graph with the tasks that match an expression. - Parameters:
- expression : str
  String expression to evaluate. See Pipeline graph subset expressions.

  Returns:
  - new_graph : PipelineGraph
    New pipeline graph with just the matching tasks.
  Notes

  All resolved dataset type nodes will be preserved.

  If has_been_sorted, the new graph will be sorted as well.

  Task subsets will not be included in the returned graph.
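  A minimal hedged sketch; the expression here is just a bare task label ("makeWarp" is hypothetical), which is assumed to select that single task:

      subgraph = graph.select("makeWarp")
      print(list(subgraph.tasks))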
 - select_tasks(expression: str) set[str]¶
- Return the tasks that match an expression. - Parameters:
- expression : str
  String expression to evaluate. See Pipeline graph subset expressions.

  Returns:
  - labels : set[str]
    Labels of the tasks that match the expression.
 
 - sort() None¶
- Sort this graph’s nodes topologically with deterministic (but unspecified) tiebreakers. - This does nothing if the graph is already known to be sorted. 
 - split_independent() Iterable[PipelineGraph]¶
- Iterate over independent subgraphs that together comprise this pipeline graph. - Returns:
- subgraphs : Iterable[PipelineGraph]
  An iterable over component subgraphs that could be run independently (they have only overall inputs in common). May be a lazy iterator.
  Notes

  All resolved dataset type nodes will be preserved.

  If there is only one component, self may be returned as the only element in the iterable.

  If has_been_sorted, all subgraphs will be sorted as well.

  Task subsets will not be included in the returned graphs.
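  For example, a short sketch reporting each independently-runnable component:

      # Components share only overall inputs, so in principle they could
      # be executed in parallel.
      for i, component in enumerate(graph.split_independent()):
          print(f"component {i}: {sorted(component.tasks)}")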
 - write_configs(butler: Butler) None¶
- Write the config datasets for all tasks in the pipeline graph. - Parameters:
- butler : lsst.daf.butler.Butler
  A full butler data repository client with its default run set to the collection where datasets should be written.
- Raises:
- lsst.daf.butler.registry.ConflictingDefinitionError
- Raised if a config dataset already exists and is not consistent with the config in the pipeline graph. 
 
  Notes

  Config datasets that already exist in the butler's output run collection will be checked for consistency.

  This method writes outputs with new random dataset IDs and should hence only be used when writing init-outputs prior to building a QuantumGraph. Use QuantumGraph.write_configs if a quantum graph has already been built.
 - write_init_outputs(butler: Butler) None¶
- Write the init-output datasets for all tasks in the pipeline graph. - Parameters:
- butler : lsst.daf.butler.Butler
  A full butler data repository client with its default run set to the collection where datasets should be written.
  Notes

  Datasets that already exist in the butler's output run collection will not be written.

  This method writes outputs with new random dataset IDs and should hence only be used when writing init-outputs prior to building a QuantumGraph. Use QuantumGraph.write_init_outputs if a quantum graph has already been built.
 - write_packages(butler: Butler) None¶
- Write the ‘packages’ dataset for the currently-active software versions. - Parameters:
- butler : lsst.daf.butler.Butler
  A full butler data repository client with its default run set to the collection where datasets should be written.
- Raises:
- lsst.daf.butler.registry.ConflictingDefinitionError
- Raised if the packages dataset already exists and is not consistent with the current packages. 
 
  Notes

  If the packages dataset already exists, it will be compared to the versions in the current packages. New packages that weren't present before are not considered an inconsistency.

  This method writes outputs with new random dataset IDs and should hence only be used when writing init-outputs prior to building a QuantumGraph. Use QuantumGraph.write_packages if a quantum graph has already been built.