LeafRelation

final class lsst.daf.relation.LeafRelation(engine: Engine, columns: frozenset[ColumnTag], payload: Any, name: str = '', name_prefix: dataclasses.InitVar[str | None] = 'leaf', messages: Sequence[str] = (), parameters: Any = None, min_rows: int = 0, max_rows: int | None = None)

Bases: BaseRelation

A Relation class that represents direct storage of rows, rather than an operation on some other relation.

Attributes Summary

is_join_identity

Whether a join to this relation will result in the other relation being returned directly (bool).

is_locked

Whether this relation and those upstream of it should be considered fixed by tree-manipulation algorithms (bool).

is_trivial

Whether this relation has no real content (bool).

max_rows

The maximum number of rows this relation might have (int or None).

messages

Messages for use when processing the relation with the Diagnostics class or similar algorithms (Sequence[str]).

min_rows

The minimum number of rows this relation might have (int).

name

Name used to identify and reconstruct this relation (str).

name_prefix

Prefix used when calling Engine.get_relation_name when name is not provided (str).

parameters

Extra data used to uniquely identify and/or reconstruct this relation.

Methods Summary

attach_payload(payload)

Attach an engine-specific payload to this relation.

chain(rhs)

Return a new relation with all rows from this relation and another.

join(rhs[, predicate, backtrack, transfer])

Return a new relation that joins this one to the given one.

make_doomed(engine, columns, messages[, name])

Construct a leaf relation with no rows and one or more messages explaining why.

make_join_identity(engine[, name])

Construct a leaf relation with no columns and exactly one row.

materialized([name, name_prefix])

Return a new relation that indicates that this relation's payload should be cached after it is first processed.

sorted(terms, *[, preferred_engine, ...])

Return a new relation that sorts rows according to a sequence of column expressions.

transferred_to(destination)

Return a new relation that transfers this relation to a new engine.

with_calculated_column(tag, expression, *[, ...])

Return a new relation that adds a calculated column to this one.

with_only_columns(columns, *[, ...])

Return a new relation whose columns are a subset of this relation's.

with_rows_satisfying(predicate, *[, ...])

Return a new relation that filters out rows via a boolean expression.

without_duplicates(*[, preferred_engine, ...])

Return a new relation that removes any duplicate rows from this one.

Attributes Documentation

is_join_identity

Whether a join to this relation will result in the other relation being returned directly (bool).

Join identity relations have exactly one row and no columns.

is_locked

Whether this relation and those upstream of it should be considered fixed by tree-manipulation algorithms (bool).

See Relation.is_locked.

is_trivial

Whether this relation has no real content (bool).

A trivial relation is either a join identity with no columns and exactly one row, or a relation with an arbitrary number of columns and no rows (i.e. min_rows==max_rows==0).
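The two flags above can be restated as checks on the column set and the row bounds. The following is a toy model under that reading, not the library's implementation; `ToyLeaf` and its fields are hypothetical names that only mirror the documented attributes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ToyLeaf:
    """Hypothetical stand-in that carries only columns and row bounds."""

    columns: frozenset[str]
    min_rows: int = 0
    max_rows: int | None = None

    @property
    def is_join_identity(self) -> bool:
        # Exactly one row and no columns.
        return not self.columns and self.min_rows == 1 and self.max_rows == 1

    @property
    def is_trivial(self) -> bool:
        # Either a join identity, or any columns with min_rows == max_rows == 0.
        no_rows = self.min_rows == 0 and self.max_rows == 0
        return self.is_join_identity or no_rows


identity = ToyLeaf(frozenset(), min_rows=1, max_rows=1)
doomed = ToyLeaf(frozenset({"a", "b"}), min_rows=0, max_rows=0)
ordinary = ToyLeaf(frozenset({"a"}))  # unknown upper bound, so not trivial
```

In this sketch a relation with `max_rows=None` is never trivial, because an unknown upper bound cannot prove that the relation is empty.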

max_rows: int | None = None

The maximum number of rows this relation might have (int or None).

messages: Sequence[str] = ()

Messages for use when processing the relation with the Diagnostics class or similar algorithms (Sequence[str]).

This is typically used to explain why a leaf relation has no rows when max_rows==0; see make_doomed.

min_rows: int = 0

The minimum number of rows this relation might have (int).

name: str = ''

Name used to identify and reconstruct this relation (str).

name_prefix: dataclasses.InitVar[str | None] = 'leaf'

Prefix used when calling Engine.get_relation_name when name is not provided (str).

parameters: Any = None

Extra data used to uniquely identify and/or reconstruct this relation.

Methods Documentation

attach_payload(payload: Any) → None

Attach an engine-specific payload to this relation.

This method may be called exactly once on a Relation instance that was not initialized with a payload, despite the fact that Relation objects are otherwise considered immutable.

Parameters:
payload

Engine-specific content to attach.

Raises:
TypeError

Raised if this relation already has a payload, or can never have a payload. TypeError is used here for consistency with other attempts to assign to an attribute of an immutable object.
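The attach-once contract described above can be sketched with a frozen dataclass. This is a hypothetical illustration, not the library's code: `ToyRelation` is an invented name, and using `None` to mean "no payload yet" is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToyRelation:
    """Hypothetical frozen object whose payload slot can be filled once."""

    name: str
    payload: Any = None  # assumption: None means "no payload attached yet"

    def attach_payload(self, payload: Any) -> None:
        if self.payload is not None:
            # Same exception type the text above documents for a second attach.
            raise TypeError(f"Relation {self.name!r} already has a payload.")
        # Bypass the frozen-dataclass guard for this one sanctioned mutation.
        object.__setattr__(self, "payload", payload)


rel = ToyRelation("leaf_1")
rel.attach_payload([{"a": 1}])

try:
    rel.attach_payload([{"a": 2}])
except TypeError as err:
    second_attempt = str(err)
```

The first call succeeds and the second raises, so the payload seen by later code is always the one attached first.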

chain(rhs: Relation) → Relation

Return a new relation with all rows from this relation and another.

This is a convenience method that constructs and applies a Chain operation.

Parameters:
rhs : Relation

Other relation to chain to self. Must have the same columns and engine as self.

Returns:
relation : Relation

New relation with all rows from both relations. This method never returns an operand directly, even if the other has max_rows==0, as it is assumed that even relations with no rows are useful to preserve in the tree for diagnostics.

Raises:
ColumnError

Raised if the two relations do not have the same columns.

EngineError

Raised if the two relations do not have the same engine.

join(rhs: Relation, predicate: Predicate | None = None, *, backtrack: bool = True, transfer: bool = False) → Relation

Return a new relation that joins this one to the given one.

This is a convenience method that constructs and applies a Join operation, via PartialJoin.apply.

Parameters:
rhs : Relation

Relation to join to self.

predicate : Predicate, optional

Boolean expression that must evaluate to true in order to join a pair of rows, in addition to an implicit equality constraint on any columns in both relations.

backtrack : bool, optional

If True (default) and self.engine != rhs.engine, attempt to insert this join before a transfer upstream of self, as long as this can be done without breaking up any locked relations or changing the resulting relation content.

transfer : bool, optional

If True (False is default) and self.engine != rhs.engine, insert a new Transfer before the Join. If backtrack is also true, the transfer is added only if the backtrack attempt fails.

Returns:
relation : Relation

New relation that joins self to rhs. May be self or rhs if the other is a join identity.

Raises:
ColumnError

Raised if the given predicate requires columns not present in self or rhs.

EngineError

Raised if it was impossible to insert this operation in rhs.engine via backtracks or transfers on self, or if the predicate was not supported by the engine.

Notes

This method does not treat self and rhs symmetrically: it always considers rhs fixed, and only backtracks into or considers applying transfers to self.
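The row-level meaning of a join (an implicit equality constraint on shared columns plus an optional predicate) can be illustrated on plain lists of dictionaries. This is a toy model of the semantics only; `toy_join` is a hypothetical helper that says nothing about engines, backtracking, or transfers, and the sample column names are invented.

```python
def toy_join(lhs, rhs, predicate=None):
    """Join two lists of dict rows on their shared keys, then filter."""
    result = []
    for left in lhs:
        for right in rhs:
            common = left.keys() & right.keys()
            # Implicit equality constraint on every column present in both rows.
            if any(left[c] != right[c] for c in common):
                continue
            merged = {**left, **right}
            # The optional predicate must also hold for the pair to be kept.
            if predicate is None or predicate(merged):
                result.append(merged)
    return result


visits = [{"visit": 1, "band": "g"}, {"visit": 2, "band": "r"}]
detectors = [
    {"visit": 1, "detector": 10},
    {"visit": 2, "detector": 11},
    {"visit": 2, "detector": 12},
]

joined = toy_join(visits, detectors, predicate=lambda row: row["detector"] > 10)

# A join identity has one row and no columns, so joining to it changes nothing.
unchanged = toy_join(visits, [{}])
```

The last call shows why a join identity can be dropped from a join: a single empty row matches every row on the other side and adds no columns.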

classmethod make_doomed(engine: Engine, columns: Set[ColumnTag], messages: Sequence[str], name: str = '0') → LeafRelation

Construct a leaf relation with no rows and one or more messages explaining why.

Parameters:
engine : Engine

The engine that is responsible for interpreting this relation.

columns : Set[ColumnTag]

The columns in this relation.

messages : Sequence[str]

One or more messages explaining why the relation has no rows.

name : str, optional

Name used to identify and reconstruct this relation.

Returns:
relation : LeafRelation

Doomed leaf relation.

classmethod make_join_identity(engine: Engine, name: str = 'I') → LeafRelation

Construct a leaf relation with no columns and exactly one row.

Parameters:
engine : Engine

The engine that is responsible for interpreting this relation.

name : str, optional

Name used to identify and reconstruct this relation.

Returns:
relation : LeafRelation

Leaf relation with no columns and one row.

materialized(name: str | None = None, *, name_prefix: str = 'materialization') → Relation

Return a new relation that indicates that this relation’s payload should be cached after it is first processed.

This is a convenience method that constructs and applies a Materialization operation.

Parameters:
name : str, optional

Name to use for the cached payload within the engine (e.g. the name for a temporary table in SQL). If not provided, a name will be created via a call to Engine.get_relation_name.

name_prefix : str, optional

Prefix to pass to Engine.get_relation_name; ignored if name is provided. Unlike most operations, Materialization relations are locked by default, since they reflect user intent to mark a specific tree as cacheable.

Returns:
relation : Relation

New relation that marks its upstream tree for caching. May be self if it is already a LeafRelation or another materialization (in which case the given name or name prefix will be ignored).

sorted(terms: Sequence[SortTerm], *, preferred_engine: Engine | None = None, backtrack: bool = True, transfer: bool = False, require_preferred_engine: bool = False) → Relation

Return a new relation that sorts rows according to a sequence of column expressions.

This is a convenience method that constructs and applies a Sort operation.

Parameters:
terms : Sequence[SortTerm]

Ordered sequence of column expressions to sort on, with whether to apply them in ascending or descending order.

preferred_engine : Engine, optional

Engine that the operation would ideally be performed in. If this is not equal to self.engine, the backtrack, transfer, and require_preferred_engine arguments control the behavior.

backtrack : bool, optional

If True (default) and the current engine is not the preferred engine, attempt to insert this sort before a transfer upstream of the current relation, as long as this can be done without breaking up any locked relations or changing the resulting relation content.

transfer : bool, optional

If True (False is default) and the current engine is not the preferred engine, insert a new Transfer before the Sort. If backtrack is also true, the transfer is added only if the backtrack attempt fails.

require_preferred_engine : bool, optional

If True (False is default) and the current engine is not the preferred engine, raise EngineError. If backtrack is also true, the exception is only raised if the backtrack attempt fails. Ignored if transfer is true.

Returns:
relation : Relation

New relation with sorted rows. Will be self if terms is empty. If self is already a sort operation relation, the operations will be merged by concatenating their terms, which may result in duplicate sort terms that have no effect.

Raises:
ColumnError

Raised if any column required by a SortTerm is not present in self.columns.

EngineError

Raised if require_preferred_engine=True and it was impossible to insert this operation in the preferred engine, or if a SortTerm expression was not supported by the engine.
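The remark that merged sorts may carry duplicate terms "that have no effect" can be checked on a toy multi-key sort. The sketch below assumes that earlier terms take priority over later ones and represents each term as a hypothetical `(column, ascending)` tuple rather than a real SortTerm; it does not show how the library orders merged terms.

```python
def toy_sort(rows, terms):
    """Sort dict rows by (column, ascending) terms; earlier terms win."""
    result = list(rows)
    # Python's sort is stable, so applying the terms last-to-first gives
    # the first term the highest priority.
    for column, ascending in reversed(terms):
        result.sort(key=lambda row: row[column], reverse=not ascending)
    return result


rows = [{"a": 2, "b": 1}, {"a": 1, "b": 2}, {"a": 1, "b": 1}]
terms = [("a", True), ("b", False)]  # a ascending, then b descending

by_terms = toy_sort(rows, terms)
# Repeating an earlier term at lower priority cannot change the order:
# rows that tie on every earlier term already tie on the repeated one.
with_duplicate = toy_sort(rows, terms + [("a", True)])
```

Both calls produce the same ordering, which is why a duplicate term left behind by merging is harmless apart from the redundant work.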

transferred_to(destination: Engine) → Relation

Return a new relation that transfers this relation to a new engine.

This is a convenience method that constructs and applies a Transfer operation.

Parameters:
destination : Engine

Engine for the new relation.

Returns:
relation : Relation

New relation in the given engine. Will be self if self.engine == destination.

with_calculated_column(tag: ColumnTag, expression: ColumnExpression, *, preferred_engine: Engine | None = None, backtrack: bool = True, transfer: bool = False, require_preferred_engine: bool = False) → Relation

Return a new relation that adds a calculated column to this one.

This is a convenience method that constructs and applies a Calculation operation.

Parameters:
tag : ColumnTag

Identifier for the new column.

expression : ColumnExpression

Expression used to populate the new column.

preferred_engine : Engine, optional

Engine that the operation would ideally be performed in. If this is not equal to self.engine, the backtrack, transfer, and require_preferred_engine arguments control the behavior.

backtrack : bool, optional

If True (default) and the current engine is not the preferred engine, attempt to insert this calculation before a transfer upstream of the current relation, as long as this can be done without breaking up any locked relations or changing the resulting relation content.

transfer : bool, optional

If True (False is default) and the current engine is not the preferred engine, insert a new Transfer before the Calculation. If backtrack is also true, the transfer is added only if the backtrack attempt fails.

require_preferred_engine : bool, optional

If True (False is default) and the current engine is not the preferred engine, raise EngineError. If backtrack is also true, the exception is only raised if the backtrack attempt fails. Ignored if transfer is true.

Returns:
relation : Relation

Relation that contains the calculated column.

Raises:
ColumnError

Raised if the expression requires columns that are not present in self.columns, or if tag is already present in self.columns.

EngineError

Raised if require_preferred_engine=True and it was impossible to insert this operation in the preferred engine, or if the expression was not supported by the engine.
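The interplay of preferred_engine, backtrack, transfer, and require_preferred_engine documented above (and repeated for the other unary convenience methods) can be read as a short decision chain. The function below is one interpretation of that text, not the library's algorithm: `place_operation`, `ToyEngineError`, and the `backtrack_succeeds` flag are hypothetical, engines are plain strings, and the final fallback to the current engine is an assumption drawn from the wording of the parameters.

```python
class ToyEngineError(RuntimeError):
    """Hypothetical stand-in for the library's EngineError."""


def place_operation(
    current_engine,
    preferred_engine=None,
    *,
    backtrack=True,
    transfer=False,
    require_preferred_engine=False,
    backtrack_succeeds=False,  # whether an upstream insertion point exists
):
    """Return a label describing where a unary operation would be applied."""
    if preferred_engine is None or preferred_engine == current_engine:
        return "apply in current engine"
    if backtrack and backtrack_succeeds:
        return "insert upstream of an existing transfer"
    if transfer:
        # Tried only after backtracking was disabled or failed.
        return "transfer to preferred engine, then apply"
    if require_preferred_engine:
        # Reached only when transfer is false, matching "ignored if transfer is true".
        raise ToyEngineError("cannot apply operation in preferred engine")
    return "apply in current engine"  # assumed fallback


outcomes = {
    "same engine": place_operation("sql", "sql"),
    "backtrack works": place_operation("iteration", "sql", backtrack_succeeds=True),
    "transfer fallback": place_operation("iteration", "sql", transfer=True),
    "no constraints": place_operation("iteration", "sql"),
}

try:
    place_operation("iteration", "sql", require_preferred_engine=True)
    raised = False
except ToyEngineError:
    raised = True
```

Under this reading, backtracking is always tried first, a transfer is the second choice, and the exception is raised only when neither is available and the caller insisted on the preferred engine.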

with_only_columns(columns: Set[ColumnTag], *, preferred_engine: Engine | None = None, backtrack: bool = True, transfer: bool = False, require_preferred_engine: bool = False) → Relation

Return a new relation whose columns are a subset of this relation’s.

This is a convenience method that constructs and applies a Projection operation.

Parameters:
columns : Set[ColumnTag]

Columns to be propagated to the new relation; must be a subset of self.columns.

preferred_engine : Engine, optional

Engine that the operation would ideally be performed in. If this is not equal to self.engine, the backtrack, transfer, and require_preferred_engine arguments control the behavior.

backtrack : bool, optional

If True (default) and the current engine is not the preferred engine, attempt to insert this projection before a transfer upstream of the current relation, as long as this can be done without breaking up any locked relations or changing the resulting relation content.

transfer : bool, optional

If True (False is default) and the current engine is not the preferred engine, insert a new Transfer before the Projection. If backtrack is also true, the transfer is added only if the backtrack attempt fails.

require_preferred_engine : bool, optional

If True (False is default) and the current engine is not the preferred engine, raise EngineError. If backtrack is also true, the exception is only raised if the backtrack attempt fails. Ignored if transfer is true.

Returns:
relation : Relation

New relation with only the given columns. Will be self if columns == self.columns.

Raises:
ColumnError

Raised if columns is not a subset of self.columns.

EngineError

Raised if require_preferred_engine=True and it was impossible to insert this operation in the preferred engine.

with_rows_satisfying(predicate: Predicate, *, preferred_engine: Engine | None = None, backtrack: bool = True, transfer: bool = False, require_preferred_engine: bool = False) → Relation

Return a new relation that filters out rows via a boolean expression.

This is a convenience method that constructs and applies a Selection operation.

Parameters:
predicate : Predicate

Boolean expression that evaluates to True for rows that should be included and False for rows that should be filtered out.

preferred_engine : Engine, optional

Engine that the operation would ideally be performed in. If this is not equal to self.engine, the backtrack, transfer, and require_preferred_engine arguments control the behavior.

backtrack : bool, optional

If True (default) and the current engine is not the preferred engine, attempt to insert this selection before a transfer upstream of the current relation, as long as this can be done without breaking up any locked relations or changing the resulting relation content.

transfer : bool, optional

If True (False is default) and the current engine is not the preferred engine, insert a new Transfer before the Selection. If backtrack is also true, the transfer is added only if the backtrack attempt fails.

require_preferred_engine : bool, optional

If True (False is default) and the current engine is not the preferred engine, raise EngineError. If backtrack is also true, the exception is only raised if the backtrack attempt fails. Ignored if transfer is true.

Returns:
relation : Relation

New relation with only the rows that satisfy the given predicate. May be self if the predicate is trivially True.

Raises:
ColumnError

Raised if predicate.columns_required is not a subset of self.columns.

EngineError

Raised if require_preferred_engine=True and it was impossible to insert this operation in the preferred engine, or if the expression was not supported by the engine.

without_duplicates(*, preferred_engine: Engine | None = None, backtrack: bool = True, transfer: bool = False, require_preferred_engine: bool = False) → Relation

Return a new relation that removes any duplicate rows from this one.

This is a convenience method that constructs and applies a Deduplication operation.

Parameters:
preferred_engine : Engine, optional

Engine that the operation would ideally be performed in. If this is not equal to self.engine, the backtrack, transfer, and require_preferred_engine arguments control the behavior.

backtrack : bool, optional

If True (default) and the current engine is not the preferred engine, attempt to insert this deduplication before a transfer upstream of the current relation, as long as this can be done without breaking up any locked relations or changing the resulting relation content.

transfer : bool, optional

If True (False is default) and the current engine is not the preferred engine, insert a new Transfer before the Deduplication. If backtrack is also true, the transfer is added only if the backtrack attempt fails.

require_preferred_engine : bool, optional

If True (False is default) and the current engine is not the preferred engine, raise EngineError. If backtrack is also true, the exception is only raised if the backtrack attempt fails. Ignored if transfer is true.

Returns:
relation : Relation

Relation with no duplicate rows. This may be self if it can be determined that there is no duplication already, but this is not guaranteed.

Raises:
EngineError

Raised if require_preferred_engine=True and it was impossible to insert this operation in the preferred engine.