Select
- class lsst.daf.relation.sql.Select(target: Relation, payload: Any | None = None, *, sort: Sort, projection: Projection | None, deduplication: Deduplication | None, slice: Slice, skip_to: Relation, is_compound: bool)
Bases: MarkerRelation
A marker operation used by the SQL engine to group relation trees into SELECT statements.
Select objects should not generally be added to relation trees by code outside the SQL engine itself, except via the inherited reapply interface. Use Engine.conform to insert Select markers into an arbitrary relation tree (while reordering its operations accordingly).
Notes
A conformed SQL relation tree always starts with a Select relation, immediately followed by any sort, projection, deduplication, and slice operations that appear within the corresponding SQL SELECT statement. These operations are held directly by the Select itself as attributes, and the first upstream relation that isn't one of those operations is held as the skip_to attribute. Nested Select instances correspond to sets of relations that will appear within subqueries. This practice allows the SQL engine to reduce subquery nesting and bring Sort operations downstream into the outermost SELECT statement whenever possible, since ORDER BY clauses in subqueries do not propagate to the order of the outer statement.
The SQL engine's relation-processing algorithms typically traverse a tree that starts with a Select by recursing to skip_to rather than target, since the Select object's attributes also fully determine the operations between skip_to and target. In this pattern, the apply_skip and reapply_skip methods are used to add possibly-modified Select markers after the upstream skip_to tree has been processed.
In contrast, general relation-processing algorithms that only see the Select as an opaque MarkerRelation recurse via target and use reapply to restore a possibly-modified Select marker.
The operations managed by the Select are always added in the same order, which is consistent with the order the equivalent SELECT statement would apply them:
- Sort
- Projection
- Deduplication
- Slice
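The fixed order can be illustrated with a toy model that renders the four managed operations as clauses of a single SQL statement. This is a minimal sketch, not the SQL engine's actual code generation: ToySelect and render_sql are hypothetical names, the Slice is simplified to a LIMIT, and the table and column names are made up. The sqlite3 query at the end shows that ORDER BY may refer to a column that the projection drops.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToySelect:
    """Hypothetical stand-in for a Select (not the lsst.daf.relation class)."""

    table: str                                    # plays the role of skip_to
    sort: tuple[str, ...] = ()                    # ORDER BY terms; empty means no Sort
    projection: Optional[tuple[str, ...]] = None  # None means keep every column
    deduplication: bool = False                   # True means SELECT DISTINCT
    limit: Optional[int] = None                   # simplified Slice


def render_sql(select: ToySelect) -> str:
    """Render Sort -> Projection -> Deduplication -> Slice as one statement."""
    columns = ", ".join(select.projection) if select.projection is not None else "*"
    distinct = "DISTINCT " if select.deduplication else ""
    sql = f"SELECT {distinct}{columns} FROM {select.table}"
    if select.sort:
        sql += " ORDER BY " + ", ".join(select.sort)
    if select.limit is not None:
        sql += f" LIMIT {select.limit}"
    return sql


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visit (visit_id INTEGER, exposure_time REAL)")
conn.executemany("INSERT INTO visit VALUES (?, ?)", [(1, 30.0), (2, 15.0), (3, 20.0)])
query = render_sql(
    ToySelect("visit", sort=("exposure_time",), projection=("visit_id",), limit=2)
)
print(query)  # SELECT visit_id FROM visit ORDER BY exposure_time LIMIT 2
print(conn.execute(query).fetchall())  # [(2,), (3,)]
```

If the projection were applied before the sort, exposure_time would no longer be available to order by, which is the constraint described in the note that follows.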
Note that the Projection needs to follow the Sort in order to allow the ORDER BY clause to reference columns that do not appear in the SELECT clause (otherwise these operations would commute with each other and the Deduplication).
Attributes Summary
- columns: The columns in this relation (Set[ColumnTag]).
- engine: The engine that is responsible for interpreting this relation (Engine).
- has_deduplication: Whether there is a Deduplication between skip_to and target (bool).
- has_projection: Whether there is a Projection between skip_to and target (bool).
- has_slice: Whether there is a Slice between skip_to and target (bool).
- has_sort: Whether there is a Sort between skip_to and target.
- is_join_identity: Whether a join to this relation will result in the other relation being returned directly (bool).
- is_locked: Whether this relation and those upstream of it should be considered fixed by tree-manipulation algorithms (bool).
- is_trivial: Whether this relation has no real content (bool).
- max_rows: The maximum number of rows this relation might have (int or None).
- min_rows: The minimum number of rows this relation might have (int).
- payload: The engine-specific contents of the relation.
Methods Summary
- apply_skip(skip_to[, sort, projection, ...]): Wrap a relation in a Select and add all of the operations it manages.
- attach_payload(payload): Attach an engine-specific payload to this relation.
- chain(rhs): Return a new relation with all rows from this relation and another.
- join(rhs[, predicate, backtrack, transfer]): Return a new relation that joins this one to the given one.
- materialized([name, name_prefix]): Return a new relation that indicates that this relation's payload should be cached after it is first processed.
- reapply(target[, payload]): Mark a new target relation, returning a new instance of the same type.
- reapply_skip([skip_to, after]): Return a modified version of this Select.
- sorted(terms, *[, preferred_engine, ...]): Return a new relation that sorts rows according to a sequence of column expressions.
- strip(): Remove the Select marker and any preceding Projection from a relation if it has no other managed operations.
- transferred_to(destination): Return a new relation that transfers this relation to a new engine.
- with_calculated_column(tag, expression, *[, ...]): Return a new relation that adds a calculated column to this one.
- with_only_columns(columns, *[, ...]): Return a new relation whose columns are a subset of this relation's.
- with_rows_satisfying(predicate, *[, ...]): Return a new relation that filters out rows via a boolean expression.
- without_duplicates(*[, preferred_engine, ...]): Return a new relation that removes any duplicate rows from this one.
Attributes Documentation
- is_join_identity
Whether a join to this relation will result in the other relation being returned directly (bool).
Join identity relations have exactly one row and no columns.
See also
LeafRelation.make_join_identity
- is_locked
Whether this relation and those upstream of it should be considered fixed by tree-manipulation algorithms (bool).
- is_trivial
Whether this relation has no real content (bool).
A trivial relation is either a join identity with no columns and exactly one row, or a relation with an arbitrary number of columns and no rows (i.e. min_rows == max_rows == 0).
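The definition above can be restated as a small predicate. This sketch assumes a relation is summarized only by its column count and its row-count bounds (with None meaning no upper bound); is_trivial here is a hypothetical free function, not the property's implementation.

```python
from typing import Optional


def is_trivial(n_columns: int, min_rows: int, max_rows: Optional[int]) -> bool:
    """Toy version of the is_trivial definition (hypothetical helper)."""
    # A join identity: no columns and exactly one row.
    is_join_identity = n_columns == 0 and min_rows == 1 and max_rows == 1
    # An empty relation: any number of columns and no rows.
    is_empty = min_rows == 0 and max_rows == 0
    return is_join_identity or is_empty


print(is_trivial(0, 1, 1))     # True: join identity
print(is_trivial(3, 0, 0))     # True: empty relation
print(is_trivial(0, 0, None))  # False: row count not known to be zero
```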
- max_rows
The maximum number of rows this relation might have (int or None).
This is None for relations whose size is not bounded from above.
Methods Documentation
- classmethod apply_skip(skip_to: Relation, sort: Sort | None = None, projection: Projection | None = None, deduplication: Deduplication | None = None, slice: Slice | None = None) → Select
Wrap a relation in a Select and add all of the operations it manages.
- Parameters:
  - skip_to : Relation
    The relation to add the Select to, after first adding any requested operations. This must not have any of the operation types managed by the Select class unless they are immediately upstream of another existing Select.
  - sort : Sort, optional
    A sort to apply to skip_to and add to the new Select.
  - projection : Projection, optional
    A projection to apply to skip_to and add to the new Select.
  - deduplication : Deduplication, optional
    A deduplication to apply to skip_to and add to the new Select.
  - slice : Slice, optional
    A slice to apply to skip_to and add to the new Select.
- Returns:
- attach_payload(payload: Any) → None
Attach an engine-specific payload to this relation.
This method may be called exactly once on a MarkerRelation instance that was not initialized with a payload, despite the fact that Relation objects are otherwise considered immutable.
- Parameters:
  - payload
    Engine-specific content to attach.
- Raises:
  - TypeError
    Raised if this relation already has a payload, or if this marker subclass can never have a payload. TypeError is used here for consistency with other attempts to assign to an attribute of an immutable object.
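The attach-once rule can be sketched with a hypothetical ToyMarker class. The sketch assumes that None means "no payload yet" and models "can never have a payload" as a class attribute; the real MarkerRelation may represent both differently.

```python
from typing import Any, Optional


class ToyMarker:
    """Hypothetical stand-in for a MarkerRelation (not the real class)."""

    can_have_payload = True  # a subclass could set this to False

    def __init__(self, payload: Optional[Any] = None) -> None:
        self._payload = payload

    @property
    def payload(self) -> Optional[Any]:
        return self._payload

    def attach_payload(self, payload: Any) -> None:
        # TypeError follows the convention described above for attempts to
        # assign to an attribute of an immutable object.
        if not self.can_have_payload:
            raise TypeError(f"{type(self).__name__} can never have a payload.")
        if self._payload is not None:
            raise TypeError("This relation already has a payload.")
        self._payload = payload


marker = ToyMarker()
marker.attach_payload("cached-table-name")
try:
    marker.attach_payload("another-payload")
except TypeError as err:
    print(err)  # This relation already has a payload.
```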
- chain(rhs: Relation) → Relation
Return a new relation with all rows from this relation and another.
This is a convenience method that constructs and applies a Chain operation.
- Parameters:
  - rhs : Relation
    Other relation to chain to self. Must have the same columns and engine as self.
- Returns:
  - relation : Relation
    New relation with all rows from both relations. If the engine preserves order for chains, all rows from self will appear before all rows from rhs, in their original order. This method never returns an operand directly, even if the other has max_rows == 0, as it is assumed that even relations with no rows are useful to preserve in the tree for diagnostics.
- Raises:
  - ColumnError
    Raised if the two relations do not have the same columns.
  - EngineError
    Raised if the two relations do not have the same engine.
  - RowOrderError
    Raised if self or rhs is unnecessarily ordered; see expect_unordered.
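The row-level behavior described above (matching columns and engines, rows from self first, and a new relation even when an operand is empty) can be modeled with plain Python containers. Everything here is hypothetical: ToyRelation, chain, and the two exception classes only stand in for Relation, Relation.chain, ColumnError, and EngineError, and the sketch assumes an engine that preserves order for chains.

```python
from dataclasses import dataclass
from typing import Any


class ToyColumnError(ValueError):
    """Hypothetical stand-in for ColumnError."""


class ToyEngineError(ValueError):
    """Hypothetical stand-in for EngineError."""


@dataclass(frozen=True)
class ToyRelation:
    engine: str
    columns: frozenset[str]
    rows: tuple[dict[str, Any], ...]


def chain(lhs: ToyRelation, rhs: ToyRelation) -> ToyRelation:
    """Concatenate rows (like SQL UNION ALL), keeping all lhs rows first."""
    if lhs.columns != rhs.columns:
        raise ToyColumnError(f"{sorted(lhs.columns)} != {sorted(rhs.columns)}")
    if lhs.engine != rhs.engine:
        raise ToyEngineError(f"{lhs.engine!r} != {rhs.engine!r}")
    # Always build a new relation, even if one operand has no rows.
    return ToyRelation(lhs.engine, lhs.columns, lhs.rows + rhs.rows)


a = ToyRelation("sql", frozenset({"visit"}), ({"visit": 1}, {"visit": 2}))
b = ToyRelation("sql", frozenset({"visit"}), ({"visit": 3},))
print([row["visit"] for row in chain(a, b).rows])  # [1, 2, 3]
```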
- join(rhs: Relation, predicate: Predicate | None = None, *, backtrack: bool = True, transfer: bool = False) → Relation
Return a new relation that joins this one to the given one.
This is a convenience method that constructs and applies a Join operation, via PartialJoin.apply.
- Parameters:
  - rhs : Relation
    Relation to join to self.
  - predicate : Predicate, optional
    Boolean expression that must evaluate to true in order to join a pair of rows, in addition to an implicit equality constraint on any columns in both relations.
  - backtrack : bool, optional
    If True (default) and self.engine != rhs.engine, attempt to insert this join before a transfer upstream of self, as long as this can be done without breaking up any locked relations or changing the resulting relation content.
  - transfer : bool, optional
    If True (False is default) and self.engine != rhs.engine, insert a new Transfer before the Join. If backtrack is also true, the transfer is added only if the backtrack attempt fails.
- Returns:
  - relation : Relation
    New relation that joins self to rhs. May be self or rhs if the other is a join identity.
- Raises:
  - ColumnError
    Raised if the given predicate requires columns not present in self or rhs.
  - EngineError
    Raised if it was impossible to insert this operation in rhs.engine via backtracks or transfers on self, or if the predicate was not supported by the engine.
  - RowOrderError
    Raised if self or rhs is unnecessarily ordered; see expect_unordered.
Notes
This method does not treat self and rhs symmetrically: it always considers rhs fixed, and only backtracks into or considers applying transfers to self.
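The join semantics (an implicit equality constraint on shared columns plus an optional predicate) can be sketched as a nested-loop join over lists of dict rows. natural_join and the sample data are hypothetical, and the predicate is a plain Python callable rather than a Predicate object. The last call joins against a single row with no columns, a join identity; the toy returns an equal list, whereas join may return the operand itself.

```python
from typing import Any, Callable, Optional

Row = dict[str, Any]


def natural_join(
    lhs: list[Row], rhs: list[Row], predicate: Optional[Callable[[Row], bool]] = None
) -> list[Row]:
    """Pair rows that agree on every shared column and satisfy `predicate`."""
    result = []
    for left in lhs:
        for right in rhs:
            shared = left.keys() & right.keys()
            if all(left[c] == right[c] for c in shared):  # implicit equality
                merged = {**left, **right}
                if predicate is None or predicate(merged):
                    result.append(merged)
    return result


# Illustrative values only.
visits = [{"visit": 1, "band": "g"}, {"visit": 2, "band": "r"}]
bands = [{"band": "g", "wavelength": 480}, {"band": "r", "wavelength": 620}]
print(natural_join(visits, bands, predicate=lambda row: row["wavelength"] > 500))
# [{'visit': 2, 'band': 'r', 'wavelength': 620}]
print(natural_join(visits, [{}]) == visits)  # True: [{}] acts as a join identity
```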
- materialized(name: str | None = None, *, name_prefix: str = 'materialization') → Relation
Return a new relation that indicates that this relation's payload should be cached after it is first processed.
This is a convenience method that constructs and applies a Materialization operation.
- Parameters:
  - name : str, optional
    Name to use for the cached payload within the engine (e.g. the name for a temporary table in SQL). If not provided, a name will be created via a call to Engine.get_relation_name.
  - name_prefix : str, optional
    Prefix to pass to Engine.get_relation_name; ignored if name is provided. Unlike most operations, Materialization relations are locked by default, since they reflect user intent to mark a specific tree as cacheable.
- Returns:
  - relation : Relation
    New relation that marks its upstream tree for caching. May be self if it is already a LeafRelation or another materialization (in which case the given name or name prefix will be ignored).
- Raises:
See also
Processor.materialize
- reapply(target: Relation, payload: Any | None = None) → Select
Mark a new target relation, returning a new instance of the same type.
- Parameters:
  - target : Relation
    New relation to mark.
  - payload, optional
    Payload to attach to the new relation.
- Returns:
  - relation : MarkerRelation
    A new relation with the given target.
Notes
This method is primarily intended for use by operations that “unroll” a relation tree to perform some modification upstream and then “replay” the operations and markers that were downstream.
MarkerRelation implementations with state that depends on the target will need to override this method to update that state accordingly.
- reapply_skip(skip_to: Relation | None = None, after: UnaryOperation | None = None, **kwargs: Any) → Select
Return a modified version of this Select.
- Parameters:
  - skip_to : Relation, optional
    The relation to add the Select to, after first adding any requested operations. This must not have any of the operation types managed by the Select class unless they are immediately upstream of another existing Select. If not provided, self.skip_to is used.
  - after : UnaryOperation, optional
    A unary operation to apply to skip_to before the operations managed by Select. Must not be one of the operation types managed by Select.
  - **kwargs
    Operations to include in the Select, forwarded to apply_skip. Default is to apply the same operations already in self, and None can be passed to drop one of these operations.
- Returns:
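The keyword-argument convention (operations that are not named are kept, while an explicit None drops one) can be sketched with dataclasses.replace. ToySelect is hypothetical, its operations are plain strings rather than Sort, Projection, Deduplication, and Slice objects, and the after argument is omitted.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional

_MANAGED = {"sort", "projection", "deduplication", "slice"}


@dataclass(frozen=True)
class ToySelect:
    """Hypothetical record of the operations a Select manages."""

    skip_to: str
    sort: Optional[str] = None
    projection: Optional[str] = None
    deduplication: Optional[str] = None
    slice: Optional[str] = None

    def reapply_skip(self, skip_to: Optional[str] = None, **kwargs: Any) -> "ToySelect":
        unknown = kwargs.keys() - _MANAGED
        if unknown:
            raise TypeError(f"Unexpected operations: {sorted(unknown)}")
        # Operations not named in kwargs keep their current value; passing
        # None for one of them drops it. The original instance is unchanged.
        new_skip_to = self.skip_to if skip_to is None else skip_to
        return replace(self, skip_to=new_skip_to, **kwargs)


original = ToySelect("visit_table", sort="ORDER BY visit", slice="LIMIT 5")
print(original.reapply_skip(slice=None))
# ToySelect(skip_to='visit_table', sort='ORDER BY visit', projection=None, deduplication=None, slice=None)
```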
- sorted(terms: Sequence[SortTerm], *, preferred_engine: Engine | None = None, backtrack: bool = True, transfer: bool = False, require_preferred_engine: bool = False) → Relation
Return a new relation that sorts rows according to a sequence of column expressions.
This is a convenience method that constructs and applies a Sort operation.
- Parameters:
  - terms : Sequence[SortTerm]
    Ordered sequence of column expressions to sort on, each indicating whether it is applied in ascending or descending order.
  - preferred_engine : Engine, optional
    Engine that the operation would ideally be performed in. If this is not equal to self.engine, the backtrack, transfer, and require_preferred_engine arguments control the behavior.
  - backtrack : bool, optional
    If True (default) and the current engine is not the preferred engine, attempt to insert this sort before a transfer upstream of the current relation, as long as this can be done without breaking up any locked relations or changing the resulting relation content.
  - transfer : bool, optional
    If True (False is default) and the current engine is not the preferred engine, insert a new Transfer before the Sort. If backtrack is also true, the transfer is added only if the backtrack attempt fails.
  - require_preferred_engine : bool, optional
    If True (False is default) and the current engine is not the preferred engine, raise EngineError. If backtrack is also true, the exception is only raised if the backtrack attempt fails. Ignored if transfer is true.
- Returns:
  - relation : Relation
    New relation with sorted rows. Will be self if terms is empty. If self is already a sort operation relation, the operations will be merged by concatenating their terms, which may result in duplicate sort terms that have no effect.
- Raises:
  - ColumnError
    Raised if any column required by a SortTerm is not present in self.columns.
  - EngineError
    Raised if require_preferred_engine=True and it was impossible to insert this operation in the preferred engine, or if a SortTerm expression was not supported by the engine.
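A multi-term sort can be sketched with Python's stable list.sort: sorting by the least significant term first and the most significant term last produces the combined ordering. ToySortTerm and sort_rows are hypothetical stand-ins for SortTerm and the Sort operation, and terms here are plain column names rather than column expressions. The final check illustrates why concatenating terms is a faithful way to merge two sorts: sorting by new terms after old ones equals one sort by the new terms followed by the old terms. The concatenation order the library itself uses is not stated above and is assumed here.

```python
from dataclasses import dataclass
from operator import itemgetter
from typing import Any

Row = dict[str, Any]


@dataclass(frozen=True)
class ToySortTerm:
    """Hypothetical stand-in for SortTerm: a column plus a direction."""

    column: str
    ascending: bool = True


def sort_rows(rows: list[Row], terms: list[ToySortTerm]) -> list[Row]:
    """Return rows ordered by `terms`, the first term being most significant."""
    result = list(rows)
    # list.sort is stable (also with reverse=True), so later passes keep the
    # relative order established by earlier, less significant passes.
    for term in reversed(terms):
        result.sort(key=itemgetter(term.column), reverse=not term.ascending)
    return result


rows = [
    {"band": "r", "visit": 2},
    {"band": "g", "visit": 3},
    {"band": "r", "visit": 1},
    {"band": "g", "visit": 1},
]
ordered = sort_rows(rows, [ToySortTerm("band"), ToySortTerm("visit", ascending=False)])
print([(row["band"], row["visit"]) for row in ordered])
# [('g', 3), ('g', 1), ('r', 2), ('r', 1)]

old, new = [ToySortTerm("visit")], [ToySortTerm("band")]
print(sort_rows(sort_rows(rows, old), new) == sort_rows(rows, new + old))  # True
```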
- strip() → tuple[lsst.daf.relation._relation.Relation, bool]
Remove the Select marker and any preceding Projection from a relation if it has no other managed operations.
- transferred_to(destination: Engine) → Relation
Return a new relation that transfers this relation to a new engine.
This is a convenience method that constructs and applies a Transfer operation.
- Parameters:
  - destination : Engine
    Engine for the new relation.
- Returns:
  - relation : Relation
    New relation in the given engine. Will be self if self.engine == destination.
- Raises:
- with_calculated_column(tag: ColumnTag, expression: ColumnExpression, *, preferred_engine: Engine | None = None, backtrack: bool = True, transfer: bool = False, require_preferred_engine: bool = False) → Relation
Return a new relation that adds a calculated column to this one.
This is a convenience method that constructs and applies a Calculation operation.
- Parameters:
  - tag : ColumnTag
    Identifier for the new column.
  - expression : ColumnExpression
    Expression used to populate the new column.
  - preferred_engine : Engine, optional
    Engine that the operation would ideally be performed in. If this is not equal to self.engine, the backtrack, transfer, and require_preferred_engine arguments control the behavior.
  - backtrack : bool, optional
    If True (default) and the current engine is not the preferred engine, attempt to insert this calculation before a transfer upstream of the current relation, as long as this can be done without breaking up any locked relations or changing the resulting relation content.
  - transfer : bool, optional
    If True (False is default) and the current engine is not the preferred engine, insert a new Transfer before the Calculation. If backtrack is also true, the transfer is added only if the backtrack attempt fails.
  - require_preferred_engine : bool, optional
    If True (False is default) and the current engine is not the preferred engine, raise EngineError. If backtrack is also true, the exception is only raised if the backtrack attempt fails. Ignored if transfer is true.
- Returns:
  - relation : Relation
    Relation that contains the calculated column.
- Raises:
  - ColumnError
    Raised if the expression requires columns that are not present in self.columns, or if tag is already present in self.columns.
  - EngineError
    Raised if require_preferred_engine=True and it was impossible to insert this operation in the preferred engine, or if the expression was not supported by the engine.
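The column checks documented above can be sketched over lists of dict rows. add_calculated_column is a hypothetical helper; the expression is a Python callable paired with an explicitly listed set of required columns (standing in for a ColumnExpression and the columns it needs), ValueError stands in for ColumnError, and the column names are illustrative.

```python
from typing import Any, Callable

Row = dict[str, Any]


def add_calculated_column(
    columns: frozenset[str],
    rows: list[Row],
    tag: str,
    expression: Callable[[Row], Any],
    columns_required: frozenset[str],
) -> tuple[frozenset[str], list[Row]]:
    """Return new columns and rows with `tag` computed from each row."""
    missing = columns_required - columns
    if missing:
        raise ValueError(f"Expression needs missing columns {sorted(missing)}.")
    if tag in columns:
        raise ValueError(f"Column {tag!r} is already present.")
    # Build new row dicts so the input rows are left untouched.
    new_rows = [{**row, tag: expression(row)} for row in rows]
    return columns | {tag}, new_rows


columns = frozenset({"visit", "exposure_time", "n_snaps"})
rows = [{"visit": 1, "exposure_time": 15.0, "n_snaps": 2}]
new_columns, new_rows = add_calculated_column(
    columns,
    rows,
    "total_time",
    lambda row: row["exposure_time"] * row["n_snaps"],
    frozenset({"exposure_time", "n_snaps"}),
)
print(new_rows)  # [{'visit': 1, 'exposure_time': 15.0, 'n_snaps': 2, 'total_time': 30.0}]
```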
- with_only_columns(columns: Set[ColumnTag], *, preferred_engine: Engine | None = None, backtrack: bool = True, transfer: bool = False, require_preferred_engine: bool = False) → Relation
Return a new relation whose columns are a subset of this relation's.
This is a convenience method that constructs and applies a Projection operation.
- Parameters:
  - columns : Set[ColumnTag]
    Columns to be propagated to the new relation; must be a subset of self.columns.
  - preferred_engine : Engine, optional
    Engine that the operation would ideally be performed in. If this is not equal to self.engine, the backtrack, transfer, and require_preferred_engine arguments control the behavior.
  - backtrack : bool, optional
    If True (default) and the current engine is not the preferred engine, attempt to insert this projection before a transfer upstream of the current relation, as long as this can be done without breaking up any locked relations or changing the resulting relation content.
  - transfer : bool, optional
    If True (False is default) and the current engine is not the preferred engine, insert a new Transfer before the Projection. If backtrack is also true, the transfer is added only if the backtrack attempt fails.
  - require_preferred_engine : bool, optional
    If True (False is default) and the current engine is not the preferred engine, raise EngineError. If backtrack is also true, the exception is only raised if the backtrack attempt fails. Ignored if transfer is true.
- Returns:
  - relation : Relation
    New relation with only the given columns. Will be self if columns == self.columns.
- Raises:
  - ColumnError
    Raised if columns is not a subset of self.columns.
  - EngineError
    Raised if require_preferred_engine=True and it was impossible to insert this operation in the preferred engine.
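A projection over dict rows can be sketched as follows. project is a hypothetical helper and ValueError stands in for ColumnError. Like a SQL SELECT without DISTINCT, the toy keeps duplicate rows, consistent with Projection and Deduplication being separate operations.

```python
from typing import Any

Row = dict[str, Any]


def project(
    columns: frozenset[str], rows: list[Row], keep: frozenset[str]
) -> tuple[frozenset[str], list[Row]]:
    """Keep only the columns in `keep`; duplicate rows are not removed."""
    if not keep <= columns:
        raise ValueError(f"Unknown columns {sorted(keep - columns)}.")
    if keep == columns:
        return columns, rows  # mirrors "Will be self if columns == self.columns"
    return keep, [{c: v for c, v in row.items() if c in keep} for row in rows]


columns = frozenset({"visit", "band"})
rows = [{"visit": 1, "band": "g"}, {"visit": 2, "band": "g"}]
print(project(columns, rows, frozenset({"band"})))
# (frozenset({'band'}), [{'band': 'g'}, {'band': 'g'}])
```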
- with_rows_satisfying(predicate: Predicate, *, preferred_engine: Engine | None = None, backtrack: bool = True, transfer: bool = False, require_preferred_engine: bool = False) → Relation
Return a new relation that filters out rows via a boolean expression.
This is a convenience method that constructs and applies a Selection operation.
- Parameters:
  - predicate : Predicate
    Boolean expression that evaluates to True for rows that should be included and False for rows that should be filtered out.
  - preferred_engine : Engine, optional
    Engine that the operation would ideally be performed in. If this is not equal to self.engine, the backtrack, transfer, and require_preferred_engine arguments control the behavior.
  - backtrack : bool, optional
    If True (default) and the current engine is not the preferred engine, attempt to insert this selection before a transfer upstream of the current relation, as long as this can be done without breaking up any locked relations or changing the resulting relation content.
  - transfer : bool, optional
    If True (False is default) and the current engine is not the preferred engine, insert a new Transfer before the Selection. If backtrack is also true, the transfer is added only if the backtrack attempt fails.
  - require_preferred_engine : bool, optional
    If True (False is default) and the current engine is not the preferred engine, raise EngineError. If backtrack is also true, the exception is only raised if the backtrack attempt fails. Ignored if transfer is true.
- Returns:
  - relation : Relation
    New relation with only the rows that satisfy the given predicate. May be self if the predicate is trivially True.
- Raises:
  - ColumnError
    Raised if predicate.columns_required is not a subset of self.columns.
  - EngineError
    Raised if require_preferred_engine=True and it was impossible to insert this operation in the preferred engine, or if the expression was not supported by the engine.
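Row filtering with a column check can be sketched as a list comprehension. filter_rows is hypothetical; the predicate is a plain Python callable paired with an explicit set of required columns (standing in for a Predicate and its columns_required), ValueError stands in for ColumnError, and the data are illustrative.

```python
from typing import Any, Callable

Row = dict[str, Any]


def filter_rows(
    columns: frozenset[str],
    rows: list[Row],
    predicate: Callable[[Row], bool],
    columns_required: frozenset[str],
) -> list[Row]:
    """Keep the rows for which `predicate` evaluates to True."""
    if not columns_required <= columns:
        missing = sorted(columns_required - columns)
        raise ValueError(f"Predicate needs missing columns {missing}.")
    return [row for row in rows if predicate(row)]


columns = frozenset({"visit", "exposure_time"})
rows = [{"visit": 1, "exposure_time": 30.0}, {"visit": 2, "exposure_time": 15.0}]
kept = filter_rows(
    columns, rows, lambda row: row["exposure_time"] > 20.0, frozenset({"exposure_time"})
)
print(kept)  # [{'visit': 1, 'exposure_time': 30.0}]
```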
- without_duplicates(*, preferred_engine: Engine | None = None, backtrack: bool = True, transfer: bool = False, require_preferred_engine: bool = False) → Relation
Return a new relation that removes any duplicate rows from this one.
This is a convenience method that constructs and applies a Deduplication operation.
- Parameters:
  - preferred_engine : Engine, optional
    Engine that the operation would ideally be performed in. If this is not equal to self.engine, the backtrack, transfer, and require_preferred_engine arguments control the behavior.
  - backtrack : bool, optional
    If True (default) and the current engine is not the preferred engine, attempt to insert this deduplication before a transfer upstream of the current relation, as long as this can be done without breaking up any locked relations or changing the resulting relation content.
  - transfer : bool, optional
    If True (False is default) and the current engine is not the preferred engine, insert a new Transfer before the Deduplication. If backtrack is also true, the transfer is added only if the backtrack attempt fails.
  - require_preferred_engine : bool, optional
    If True (False is default) and the current engine is not the preferred engine, raise EngineError. If backtrack is also true, the exception is only raised if the backtrack attempt fails. Ignored if transfer is true.
- Returns:
  - relation : Relation
    Relation with no duplicate rows. This may be self if it can be determined that there is no duplication already, but this is not guaranteed.
- Raises:
  - EngineError
    Raised if require_preferred_engine=True and it was impossible to insert this operation in the preferred engine.
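The interplay of preferred_engine, backtrack, transfer, and require_preferred_engine, as described for this and the preceding methods, can be summarized as a decision function. choose_placement and ToyEngineError are hypothetical, and the engine names are arbitrary strings. The backtrack attempt is reduced to a callable that reports success, and the final fallback (applying the operation in the current engine when nothing else applies and require_preferred_engine is false) is an assumption not spelled out above.

```python
from typing import Callable, Optional


class ToyEngineError(RuntimeError):
    """Hypothetical stand-in for EngineError."""


def choose_placement(
    current_engine: str,
    preferred_engine: Optional[str] = None,
    *,
    backtrack: bool = True,
    transfer: bool = False,
    require_preferred_engine: bool = False,
    backtrack_succeeds: Callable[[], bool] = lambda: False,
) -> str:
    """Return where a unary operation would be inserted."""
    if preferred_engine is None or preferred_engine == current_engine:
        return "current engine"
    if backtrack and backtrack_succeeds():
        return "upstream of an existing transfer (backtrack)"
    if transfer:
        # Checked before require_preferred_engine, which is then ignored.
        return "after a new Transfer to the preferred engine"
    if require_preferred_engine:
        raise ToyEngineError(f"Cannot apply in {preferred_engine!r}.")
    return "current engine"  # assumption: fall back when nothing else applies


print(choose_placement("sql", "iteration", transfer=True))
# after a new Transfer to the preferred engine
```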