Select¶

class lsst.daf.relation.sql.Select(target: lsst.daf.relation._relation.Relation, payload: Optional[Any, None] = None, *, sort: lsst.daf.relation._operations._sort.Sort, projection: lsst.daf.relation._operations._projection.Projection | None[lsst.daf.relation._operations._projection.Projection, None], deduplication: lsst.daf.relation._operations._deduplication.Deduplication | None[lsst.daf.relation._operations._deduplication.Deduplication, None], slice: lsst.daf.relation._operations._slice.Slice, skip_to: lsst.daf.relation._relation.Relation)¶

Bases: lsst.daf.relation.MarkerRelation

A marker operation used by a the SQL engine to group relation trees into SELECT statements.

Select objects should not generally be added to relation trees by code outside the SQL engine itself, except via the inherited reapply interface. Use Engine.conform to insert Select markers into an arbitrary relation tree (while reordering its operations accordingly).

Notes

A conformed SQL relation tree always starts with a Select relation, immediately followed by any projection, deduplication, sort, and slice (in that order) that appear within the corresponding SQL SELECT statement. These operations are held directly by the Select itself as attributes, and the first upstream relation that isn’t one of those operations is held as the skip_to attribute. Nested Select instances correspond to sets of relations that will appear within subqueries. This practice allows the SQL engine to reduce subquery nesting and bring Sort operations downstream into the outermost SELECT statement whenever possible, since ORDER BY clauses in subqueries do not propagate to the order of the outer statement.

The SQL engine’s relation-processing algorithms typically traverse a tree that starts with a Select by recursing to skip_to rather than target, since the Select object’s attributes also fully determine the operations between skip_to and target. In this pattern, the apply_skip and reapply_skip methods are used to add possibly-modified Select markers after the upstream skip_to tree has been processed.

In contrast, general relation-processing algorithms that only see the Select as an opaque MarkerRelation recurse via target and use reapply to restore a possibly-modified Select marker.

The operations managed by the Select are always added in the same order, which is consistent with the order the equivalent SELECT statement would apply them:

Sort
Projection
Deduplication
Slice

Note that the Projection needs to follow the Sort in order to allow the ORDER BY clause to reference columns that do not appear in the SELECT clause (otherwise these operations would commute with each other and the Deduplication).

Attributes Summary

`columns`	The columns in this relation (`Set` [ `ColumnTag` ] ).
`engine`	The engine that is responsible for interpreting this relation (`Engine`).
`has_deduplication`	Whether there is a `Deduplication` between `skip_to` and `target` (`bool`).
`has_projection`	Whether there is a `Projection` between `skip_to` and `target` (`bool`).
`has_slice`	Whether there is a `Slice` between `skip_to` and `target` (`bool`).
`has_sort`	Whether there is a `Sort` between `skip_to` and `target`.
`is_join_identity`	Whether a `join` to this relation will result in the other relation being returned directly (`bool`).
`is_locked`	Whether this relation and those upstream of it should be considered fixed by tree-manipulation algorithms (`bool`).
`is_trivial`	Whether this relation has no real content (`bool`).
`max_rows`	The maximum number of rows this relation might have (`int` or `None`).
`min_rows`	The minimum number of rows this relation might have (`int`).
`payload`

Methods Summary

`apply_skip`(skip_to, sort, None] = None, …)	Wrap a relation in a `Select` and add all of the operations it manages.
`attach_payload`(payload)	Attach an engine-specific `payload` to this relation.
`chain`(rhs)	Return a new relation with all rows from this relation and another.
`join`(rhs, predicate, *, backtrack, transfer)	Return a new relation that joins this one to the given one.
`materialized`(name, None] = None, *, name_prefix)	Return a new relation that indicates that this relation’s payload should be cached after it is first processed.
`reapply`(target, payload, None] = None)	Mark a new target relation, returning a new instance of the same type.
`reapply_skip`(skip_to, None] = None, after, …)	Return a modified version of this `Select`.
`sorted`(terms, *, preferred_engine, …)	Return a new relation that sorts rows according to a sequence of column expressions.
`strip`()	Remove the `Select` marker and any preceding `Projection` from a relation if it has no other managed operations.
`transferred_to`(destination)	Return a new relation that transfers this relation to a new engine.
`with_calculated_column`(tag, expression, *, …)	Return a new relation that adds a calculated column to this one.
`with_only_columns`(columns, *, …)	Return a new relation whose columns are a subset of this relation’s.
`with_rows_satisfying`(predicate, *, …)	Return a new relation that filters out rows via a boolean expression.
`without_duplicates`(*, preferred_engine, …)	Return a new relation that removes any duplicate rows from this one.

Attributes Documentation

columns¶: The columns in this relation (Set [ ColumnTag ] ).

engine¶: The engine that is responsible for interpreting this relation (Engine).

has_deduplication¶: Whether there is a Deduplication between skip_to and target (bool).

has_projection¶: Whether there is a Projection between skip_to and target (bool).

has_slice¶: Whether there is a Slice between skip_to and target (bool).

has_sort¶: Whether there is a Sort between skip_to and target. (bool)

is_join_identity¶

Whether a join to this relation will result in the other relation being returned directly (bool).

Join identity relations have exactly one row and no columns.

See also

LeafRelation.make_join_identity

is_locked¶: Whether this relation and those upstream of it should be considered fixed by tree-manipulation algorithms (bool).

is_trivial¶

Whether this relation has no real content (bool).

A trivial relation is either a join identity with no columns and exactly one row, or a relation with an arbitrary number of columns and no rows (i.e. min_rows==max_rows==0).

max_rows¶

The maximum number of rows this relation might have (int or None).

This is None for relations whose size is not bounded from above.

min_rows¶: The minimum number of rows this relation might have (int).

payload = None¶

Methods Documentation

classmethod apply_skip(skip_to: lsst.daf.relation._relation.Relation, sort: Optional[lsst.daf.relation._operations._sort.Sort, None] = None, projection: Optional[lsst.daf.relation._operations._projection.Projection, None] = None, deduplication: Optional[lsst.daf.relation._operations._deduplication.Deduplication, None] = None, slice: Optional[lsst.daf.relation._operations._slice.Slice, None] = None) → lsst.daf.relation.sql._select.Select¶

Wrap a relation in a Select and add all of the operations it manages.

Parameters:

skip_to : Relation: The relation to add the Select to, after first adding any requested operations. This must not have any of the operation types managed by the Select class unless they are immediately upstream of another existing Select.
sort : Sort, optional: A sort to apply to skip_to and add to the new Select.
projection : Projection, optional: A projection to apply to skip_to and add to the new Select.
deduplication : Deduplication, optional: A deduplication to apply to skip_to and add to the new Select.
slice : Slice, optional: A slice to apply to skip_to and add to the new Select.

Returns:

select : Select: A relation tree terminated by a new Select.

attach_payload(payload: Any) → None¶

Attach an engine-specific payload to this relation.

This method may be called exactly once on a MarkerRelation instance that was not initialized with a payload, despite the fact that Relation objects are otherwise considered immutable.

Parameters:	payload Engine-specific content to attach.
Raises:	TypeError Raised if this relation already has a payload, or if this marker subclass can never have a payload. `TypeError` is used here for consistency with other attempts to assign to an attribute of an immutable object.

chain(rhs: lsst.daf.relation._relation.Relation) → lsst.daf.relation._relation.Relation¶

Return a new relation with all rows from this relation and another.

This is a convenience method that constructs and applies a Chain operation.

Parameters:	rhs : `Relation` Other relation to chain to `self`. Must have the same columns and engine as `self`.
Returns:	relation : `Relation` New relation with all rows from both relations. If the engine `preserves order` for chains, all rows from `self` will appear before all rows from `rhs`, in their original order. This method never returns an operand directly, even if the other has `max_rows==0`, as it is assumed that even relations with no rows are useful to preserve in the tree for `diagnostics`.
Raises:	ColumnError Raised if the two relations do not have the same columns. EngineError Raised if the two relations do not have the same engine. RowOrderError Raised if `self` or `rhs` is unnecessarily ordered; see `expect_unordered`.

join(rhs: Relation, predicate: Predicate | None = None, *, backtrack: bool = True, transfer: bool = False) → Relation¶

Return a new relation that joins this one to the given one.

This is a convenience method that constructs and applies a Join operation, via PartialJoin.apply.

Parameters:	rhs : `Relation` Relation to join to `self`. predicate : `Predicate`, optional Boolean expression that must evaluate to true in order to join a a pair of rows, in addition to an implicit equality constraint on any columns in both relations. backtrack : `bool`, optional If `True` (default) and `self.engine != rhs.engine`, attempt to insert this join before a transfer upstream of `self`, as long as this can be done without breaking up any locked relations or changing the resulting relation content. transfer : `bool`, optional If `True` (`False` is default) and `self.engine != rhs.engine`, insert a new `Transfer` before the `Join`. If `backtrack` is also true, the transfer is added only if the backtrack attempt fails.
Returns:	relation : `Relation` New relation that joins `self` to `rhs`. May be `self` or `rhs` if the other is a `join identity`.
Raises:	ColumnError Raised if the given predicate requires columns not present in `self` or `rhs`. EngineError Raised if it was impossible to insert this operation in `rhs.engine` via backtracks or transfers on `self`, or if the predicate was not supported by the engine. RowOrderError Raised if `self` or `rhs` is unnecessarily ordered; see `expect_unordered`.

Notes

This method does not treat self and rhs symmetrically: it always considers rhs fixed, and only backtracks into or considers applying transfers to self.

materialized(name: Optional[str, None] = None, *, name_prefix: str = 'materialization') → lsst.daf.relation._relation.Relation¶

Return a new relation that indicates that this relation’s payload should be cached after it is first processed.

This is a convenience method that constructs and applies a Materialization operation.

Parameters:	name : `str`, optional Name to use for the cached payload within the engine (e.g. the name for a temporary table in SQL). If not provided, a name will be created via a call to `Engine.get_relation_name`. name_prefix : `str`, optional Prefix to pass to `Engine.get_relation_name`; ignored if `name` is provided. Unlike most operations, `Materialization` relations are locked by default, since they reflect user intent to mark a specific tree as cacheable.
Returns:	relation : `Relation` New relation that marks its upstream tree for caching. May be `self` if it is already a `LeafRelation` or another materialization (in which case the given name or name prefix will be ignored).
Raises:

See also

Processor.materialize

reapply(target: lsst.daf.relation._relation.Relation, payload: Optional[Any, None] = None) → lsst.daf.relation.sql._select.Select¶

Mark a new target relation, returning a new instance of the same type.

Parameters:	target : `Relation` New relation to mark. payload, optional Payload to attach to the new relation.
Returns:	relation : `MarkerRelation` A new relation with the given target.

Notes

This method is primarily intended for use by operations that “unroll” a relation tree to perform some modification upstream and then “replay” the operations and markers that were downstream. MarkerRelation implementations with state that depends on the target will need to override this method to update that state accordingly.

reapply_skip(skip_to: Optional[lsst.daf.relation._relation.Relation, None] = None, after: Optional[lsst.daf.relation._unary_operation.UnaryOperation, None] = None, **kwargs) → lsst.daf.relation.sql._select.Select¶

Return a modified version of this Select.

Parameters:

skip_to : Relation, optional: The relation to add the Select to, after first adding any requested operations. This must not have any of the operation types managed by the Select class unless they are immediately upstream of another existing Select. If not provided, self.skip_to is used.
after : UnaryOperation, optional: A unary operation to apply to skip_to before the operations managed by Select. Must not be one of the operattion types managed by Select.
**kwargs: Operations to include in the Select, forwarded to apply_skip. Default is to apply the same operations already in self, and None can be passed to drop one of these operations.

Returns:

select : Select: Relation tree wrapped in a modified Select.

sorted(terms: Sequence[SortTerm], *, preferred_engine: Engine | None = None, backtrack: bool = True, transfer: bool = False, require_preferred_engine: bool = False) → Relation¶

Return a new relation that sorts rows according to a sequence of column expressions.

This is a convenience method that constructs and applies a Sort operation.

Parameters:	terms : `Sequence` [ `SortTerm` ] Ordered sequence of column expressions to sort on, with whether to apply them in ascending or descending order. preferred_engine : `Engine`, optional Engine that the operation would ideally be performed in. If this is not equal to `self.engine`, the `backtrack`, `transfer`, and `require_preferred_engine` arguments control the behavior. backtrack : `bool`, optional If `True` (default) and the current engine is not the preferred engine, attempt to insert this sort before a transfer upstream of the current relation, as long as this can be done without breaking up any locked relations or changing the resulting relation content. transfer : `bool`, optional If `True` (`False` is default) and the current engine is not the preferred engine, insert a new `Transfer` before the `Sort`. If `backtrack` is also true, the transfer is added only if the backtrack attempt fails. require_preferred_engine : `bool`, optional If `True` (`False` is default) and the current engine is not the preferred engine, raise `EngineError`. If `backtrack` is also true, the exception is only raised if the backtrack attempt fails. Ignored if `transfer` is true.
Returns:	relation : `Relation` New relation with sorted rows. Will be `self` if `terms` is empty. If `self` is already a sort operation relation, the operations will be merged by concatenating their terms, which may result in duplicate sort terms that have no effect.
Raises:	ColumnError Raised if any column required by a `SortTerm` is not present in `self.columns`. EngineError Raised if `require_preferred_engine=True` and it was impossible to insert this operation in the preferred engine, or if a `SortTerm` expression was not supported by the engine.

strip() → tuple¶

Remove the Select marker and any preceding Projection from a relation if it has no other managed operations.

Returns:	relation : `Relation` Upstream relation with the `Select`. removed_projection : `bool` Whether a `Projection` operation was also stripped.

transferred_to(destination: Engine) → Relation¶

Return a new relation that transfers this relation to a new engine.

This is a convenience method that constructs and applies a Transfer operation.

Parameters:	destination : `Engine` Engine for the new relation.
Returns:	relation : `Relation` New relation in the given engine. Will be `self` if `self.engine == destination`.
Raises:

with_calculated_column(tag: ColumnTag, expression: ColumnExpression, *, preferred_engine: Engine | None = None, backtrack: bool = True, transfer: bool = False, require_preferred_engine: bool = False) → Relation¶

Return a new relation that adds a calculated column to this one.

This is a convenience method chat constructs and applies a Calculation operation.

Parameters:	tag : `ColumnTag` Identifier for the new column. expression : `ColumnExpression` Expression used to populate the new column. preferred_engine : `Engine`, optional Engine that the operation would ideally be performed in. If this is not equal to `self.engine`, the `backtrack`, `transfer`, and `require_preferred_engine` arguments control the behavior. backtrack : `bool`, optional If `True` (default) and the current engine is not the preferred engine, attempt to insert this calculation before a transfer upstream of the current relation, as long as this can be done without breaking up any locked relations or changing the resulting relation content. transfer : `bool`, optional If `True` (`False` is default) and the current engine is not the preferred engine, insert a new `Transfer` before the `Calculation`. If `backtrack` is also true, the transfer is added only if the backtrack attempt fails. require_preferred_engine : `bool`, optional If `True` (`False` is default) and the current engine is not the preferred engine, raise `EngineError`. If `backtrack` is also true, the exception is only raised if the backtrack attempt fails. Ignored if `transfer` is true.
Returns:	relation : `Relation` Relation that contains the calculated column.
Raises:	ColumnError Raised if the expression requires columns that are not present in `self.columns`, or if `tag` is already present in `self.columns`. EngineError Raised if `require_preferred_engine=True` and it was impossible to insert this operation in the preferred engine, or if the expression was not supported by the engine.

with_only_columns(columns: Set[ColumnTag], *, preferred_engine: Engine | None = None, backtrack: bool = True, transfer: bool = False, require_preferred_engine: bool = False) → Relation¶

Return a new relation whose columns are a subset of this relation’s.

This is a convenience method that constructs and applies a Projection operation.

Parameters:	columns : `Set` [ `ColumnTag` ] Columns to be propagated to the new relation; must be a subset of `self.columns`. preferred_engine : `Engine`, optional Engine that the operation would ideally be performed in. If this is not equal to `self.engine`, the `backtrack`, `transfer`, and `require_preferred_engine` arguments control the behavior. backtrack : `bool`, optional If `True` (default) and the current engine is not the preferred engine, attempt to insert this projection before a transfer upstream of the current relation, as long as this can be done without breaking up any locked relations or changing the resulting relation content. transfer : `bool`, optional If `True` (`False` is default) and the current engine is not the preferred engine, insert a new `Transfer` before the `Projection`. If `backtrack` is also true, the transfer is added only if the backtrack attempt fails. require_preferred_engine : `bool`, optional If `True` (`False` is default) and the current engine is not the preferred engine, raise `EngineError`. If `backtrack` is also true, the exception is only raised if the backtrack attempt fails. Ignored if `transfer` is true.
Returns:	relation : `Relation` New relation with only the given columns. Will be `self` if `columns == self.columns`.
Raises:	ColumnError Raised if `columns` is not a subset of `self.columns`. EngineError Raised if `require_preferred_engine=True` and it was impossible to insert this operation in the preferred engine.

with_rows_satisfying(predicate: Predicate, *, preferred_engine: Engine | None = None, backtrack: bool = True, transfer: bool = False, require_preferred_engine: bool = False) → Relation¶

Return a new relation that filters out rows via a boolean expression.

This is a convenience method that constructions and applies a Selection operation.

Parameters:	predicate : `Predicate` Boolean expression that evaluates to `False` for rows that should be included and `False` for rows that should be filtered out. preferred_engine : `Engine`, optional Engine that the operation would ideally be performed in. If this is not equal to `self.engine`, the `backtrack`, `transfer`, and `require_preferred_engine` arguments control the behavior. backtrack : `bool`, optional If `True` (default) and the current engine is not the preferred engine, attempt to insert this selection before a transfer upstream of the current relation, as long as this can be done without breaking up any locked relations or changing the resulting relation content. transfer : `bool`, optional If `True` (`False` is default) and the current engine is not the preferred engine, insert a new `Transfer` before the `Selection`. If `backtrack` is also true, the transfer is added only if the backtrack attempt fails. require_preferred_engine : `bool`, optional If `True` (`False` is default) and the current engine is not the preferred engine, raise `EngineError`. If `backtrack` is also true, the exception is only raised if the backtrack attempt fails. Ignored if `transfer` is true.
Returns:	relation : `Relation` New relation with only the rows that satisfy the given predicate. May be `self` if the predicate is `trivially True`.
Raises:	ColumnError Raised if `predicate.columns_required` is not a subset of `self.columns`. EngineError Raised if `require_preferred_engine=True` and it was impossible to insert this operation in the preferred engine, or if the expression was not supported by the engine.

without_duplicates(*, preferred_engine: Engine | None = None, backtrack: bool = True, transfer: bool = False, require_preferred_engine: bool = False) → Relation¶

Return a new relation that removes any duplicate rows from this one.

This is a convenience method that constructs and applies a Deduplication operation.

Parameters:	preferred_engine : `Engine`, optional Engine that the operation would ideally be performed in. If this is not equal to `self.engine`, the `backtrack`, `transfer`, and `require_preferred_engine` arguments control the behavior. backtrack : `bool`, optional If `True` (default) and the current engine is not the preferred engine, attempt to insert this deduplication before a transfer upstream of the current relation, as long as this can be done without breaking up any locked relations or changing the resulting relation content. transfer : `bool`, optional If `True` (`False` is default) and the current engine is not the preferred engine, insert a new `Transfer` before the `Deduplication`. If `backtrack` is also true, the transfer is added only if the backtrack attempt fails. require_preferred_engine : `bool`, optional If `True` (`False` is default) and the current engine is not the preferred engine, raise `EngineError`. If `backtrack` is also true, the exception is only raised if the backtrack attempt fails. Ignored if `transfer` is true.
Returns:	relation : `Relation` Relation with no duplicate rows. This may be `self` if it can be determined that there is no duplication already, but this is not guaranteed.
Raises:	EngineError Raised if `require_preferred_engine=True` and it was impossible to insert this operation in the preferred engine.

Navigation

Select¶