Relation
- class lsst.daf.relation.Relation(*args, **kwargs)
Bases: Protocol
An abstract interface for expression trees on tabular data.
Notes
This ABC is a typing.Protocol, which means that classes that implement its interface can be recognized as such by static type checkers without actually inheriting from it, and in fact all concrete relation types inherit only from BaseRelation (which provides implementations of many Relation methods, but does not include the complete interface or inherit from Relation itself) instead. This split allows subclasses to implement attributes that are defined as properties here as dataclass attributes instead of true properties, something typing.Protocol explicitly permits and recommends that nevertheless works only if the protocol is not actually inherited from.
In almost all cases, users should use Relation instead of BaseRelation: the only exception is when writing an isinstance check to see if a type is a relation at all, rather than a particular relation subclass. BaseRelation may become an alias to Relation itself in the future if typing.Protocol inheritance interaction with properties is improved.
All concrete Relation types are frozen, equality-comparable dataclasses. They also provide a very concise str representation (in addition to the dataclass-provided repr) suitable for summarizing an entire relation tree.
Attributes Summary
engine
The engine that is responsible for interpreting this relation (Engine).
is_join_identity
Whether a join to this relation will result in the other relation being returned directly (bool).
is_locked
Whether this relation and those upstream of it should be considered fixed by tree-manipulation algorithms (bool).
is_trivial
Whether this relation has no real content (bool).
max_rows
The maximum number of rows this relation might have (int or None).
min_rows
The minimum number of rows this relation might have (int).
payload
The engine-specific contents of the relation.
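The Notes describe attributes that the protocol declares as properties but that concrete types may supply as plain dataclass fields. The following is a minimal sketch of that pattern, not the real interface: RowBounded, ToyLeaf, and describe are hypothetical names invented for illustration, and only two of the attributes listed above are modeled.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class RowBounded(Protocol):
    """Hypothetical mini-protocol (not the real Relation interface)."""

    @property
    def min_rows(self) -> int: ...

    @property
    def max_rows(self) -> Optional[int]: ...


@dataclass(frozen=True)
class ToyLeaf:
    """Hypothetical concrete type; it does NOT inherit from RowBounded.

    Plain dataclass fields satisfy the read-only properties declared by
    the protocol as far as static type checkers are concerned.
    """

    name: str
    min_rows: int
    max_rows: Optional[int]


def describe(relation: RowBounded) -> str:
    """Summarize the row bounds of anything matching the protocol."""
    upper = "unbounded" if relation.max_rows is None else str(relation.max_rows)
    return f"rows: {relation.min_rows}..{upper}"


leaf = ToyLeaf("a", 0, None)
summary = describe(leaf)
```

Because ToyLeaf is a frozen dataclass, instances compare equal field by field and reject attribute assignment, which mirrors the "frozen, equality-comparable" description of concrete relation types.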
Methods Summary
attach_payload(payload)
Attach an engine-specific payload to this relation.
chain(rhs)
Return a new relation with all rows from this relation and another.
join(rhs[, predicate, backtrack, transfer])
Return a new relation that joins this one to the given one.
materialized([name, name_prefix])
Return a new relation that indicates that this relation's payload should be cached after it is first processed.
sorted(terms, *[, preferred_engine, ...])
Return a new relation that sorts rows according to a sequence of column expressions.
transferred_to(destination)
Return a new relation that transfers this relation to a new engine.
with_calculated_column(tag, expression, *[, ...])
Return a new relation that adds a calculated column to this one.
with_only_columns(columns, *[, ...])
Return a new relation whose columns are a subset of this relation's.
with_rows_satisfying(predicate, *[, ...])
Return a new relation that filters out rows via a boolean expression.
without_duplicates(*[, preferred_engine, ...])
Return a new relation that removes any duplicate rows from this one.
Attributes Documentation
- is_join_identity
Whether a join to this relation will result in the other relation being returned directly (bool).
Join identity relations have exactly one row and no columns.
- is_locked
Whether this relation and those upstream of it should be considered fixed by tree-manipulation algorithms (bool).
- is_trivial
Whether this relation has no real content (bool).
A trivial relation is either a join identity with no columns and exactly one row, or a relation with an arbitrary number of columns and no rows (i.e. min_rows == max_rows == 0).
- max_rows
The maximum number of rows this relation might have (int or None).
This is None for relations whose size is not bounded from above.
- payload
The engine-specific contents of the relation.
This is None in the common case that engine-specific contents are to be computed on-the-fly. Relation payloads permit “deferred initialization”: while relation objects are otherwise immutable, the payload may be set (once) after construction, via attach_payload.
Methods Documentation
- abstract attach_payload(payload: Any) -> None
Attach an engine-specific payload to this relation.
This method may be called exactly once on a Relation instance that was not initialized with a payload, despite the fact that Relation objects are otherwise considered immutable.
- Parameters:
- payload
Engine-specific content to attach.
- Raises:
- TypeError
Raised if this relation already has a payload, or can never have a payload. TypeError is used here for consistency with other attempts to assign to an attribute of an immutable object.
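The attach-once behavior described above can be sketched with a frozen dataclass. This is an illustration only: ToyRelation is a hypothetical class, and excluding the payload from equality comparison is an assumption made here so that attaching a payload does not change how the object compares or hashes.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class ToyRelation:
    """Hypothetical stand-in for a concrete relation type."""

    name: str
    # Assumption: the payload is excluded from equality and repr.
    payload: Any = field(default=None, compare=False, repr=False)

    def attach_payload(self, payload: Any) -> None:
        """Set the payload once, bypassing the frozen-dataclass guard."""
        if self.payload is not None:
            raise TypeError(f"Relation {self.name!r} already has a payload.")
        # Frozen dataclasses block normal assignment, so use the
        # base-class setter explicitly for this one deferred write.
        object.__setattr__(self, "payload", payload)


relation = ToyRelation("leaf")
relation.attach_payload([{"x": 1}])
```

A second call to attach_payload on the same instance raises TypeError in this sketch, matching the documented contract.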
- abstract chain(rhs: Relation) -> Relation
Return a new relation with all rows from this relation and another.
This is a convenience method that constructs and applies a Chain operation.
- Parameters:
- rhs : Relation
Other relation to chain to self. Must have the same columns and engine as self.
- Returns:
- relation : Relation
New relation with all rows from both relations. This method never returns an operand directly, even if the other has max_rows == 0, as it is assumed that even relations with no rows are useful to preserve in the tree for diagnostics.
- Raises:
- ColumnError
Raised if the two relations do not have the same columns.
- EngineError
Raised if the two relations do not have the same engine.
- abstract join(rhs: Relation, predicate: Predicate | None = None, *, backtrack: bool = True, transfer: bool = False) -> Relation
Return a new relation that joins this one to the given one.
This is a convenience method that constructs and applies a Join operation, via PartialJoin.apply.
- Parameters:
- rhs : Relation
Relation to join to self.
- predicate : Predicate, optional
Boolean expression that must evaluate to true in order to join a pair of rows, in addition to an implicit equality constraint on any columns in both relations.
- backtrack : bool, optional
If True (default) and self.engine != rhs.engine, attempt to insert this join before a transfer upstream of self, as long as this can be done without breaking up any locked relations or changing the resulting relation content.
- transfer : bool, optional
If True (False is default) and self.engine != rhs.engine, insert a new Transfer before the Join. If backtrack is also true, the transfer is added only if the backtrack attempt fails.
- Returns:
- relation : Relation
New relation that joins self to rhs. May be self or rhs if the other is a join identity.
- Raises:
- ColumnError
Raised if the given predicate requires columns not present in self or rhs.
- EngineError
Raised if it was impossible to insert this operation in rhs.engine via backtracks or transfers on self, or if the predicate was not supported by the engine.
Notes
This method does not treat self and rhs symmetrically: it always considers rhs fixed, and only backtracks into or considers applying transfers to self.
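The row-level meaning of the join described above (implicit equality on shared columns, an optional extra predicate, and a join identity that leaves the other operand unchanged) can be sketched on plain lists of dictionaries. This is a toy model under stated assumptions: the real method builds an expression tree rather than evaluating rows, and toy_join, the Row alias, and the sample data are all hypothetical.

```python
from typing import Any, Callable, Optional

Row = dict[str, Any]  # hypothetical row model: column name -> value


def toy_join(
    lhs: list[Row],
    rhs: list[Row],
    predicate: Optional[Callable[[Row], bool]] = None,
) -> list[Row]:
    """Join two row lists on all shared columns, then apply predicate."""
    result = []
    for left in lhs:
        for right in rhs:
            # Implicit equality constraint on columns present in both rows.
            common = left.keys() & right.keys()
            if any(left[c] != right[c] for c in common):
                continue
            merged = {**left, **right}
            if predicate is None or predicate(merged):
                result.append(merged)
    return result


visits = [{"visit": 1, "band": "g"}, {"visit": 2, "band": "r"}]
detectors = [
    {"visit": 1, "detector": 10},
    {"visit": 1, "detector": 11},
    {"visit": 2, "detector": 10},
]
# A join identity has exactly one row and no columns.
join_identity: list[Row] = [{}]

joined = toy_join(visits, detectors)
filtered = toy_join(visits, detectors, lambda row: row["detector"] > 10)
unchanged = toy_join(visits, join_identity)
```

Joining against join_identity reproduces the rows of visits, which is why the documented method may return the other operand directly in that case.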
- abstract materialized(name: str | None = None, *, name_prefix: str = 'materialization') -> Relation
Return a new relation that indicates that this relation’s payload should be cached after it is first processed.
This is a convenience method that constructs and applies a Materialization operation.
- Parameters:
- name : str, optional
Name to use for the cached payload within the engine (e.g. the name for a temporary table in SQL). If not provided, a name will be created via a call to Engine.get_relation_name.
- name_prefix : str, optional
Prefix to pass to Engine.get_relation_name; ignored if name is provided. Unlike most operations, Materialization relations are locked by default, since they reflect user intent to mark a specific tree as cacheable.
- Returns:
- relation : Relation
New relation that marks its upstream tree for caching. May be self if it is already a LeafRelation or another materialization (in which case the given name or name prefix will be ignored).
- abstract sorted(terms: Sequence[SortTerm], *, preferred_engine: Engine | None = None, backtrack: bool = True, transfer: bool = False, require_preferred_engine: bool = False) -> Relation
Return a new relation that sorts rows according to a sequence of column expressions.
This is a convenience method that constructs and applies a Sort operation.
- Parameters:
- terms : Sequence[SortTerm]
Ordered sequence of column expressions to sort on, with whether to apply them in ascending or descending order.
- preferred_engine : Engine, optional
Engine that the operation would ideally be performed in. If this is not equal to self.engine, the backtrack, transfer, and require_preferred_engine arguments control the behavior.
- backtrack : bool, optional
If True (default) and the current engine is not the preferred engine, attempt to insert this sort before a transfer upstream of the current relation, as long as this can be done without breaking up any locked relations or changing the resulting relation content.
- transfer : bool, optional
If True (False is default) and the current engine is not the preferred engine, insert a new Transfer before the Sort. If backtrack is also true, the transfer is added only if the backtrack attempt fails.
- require_preferred_engine : bool, optional
If True (False is default) and the current engine is not the preferred engine, raise EngineError. If backtrack is also true, the exception is only raised if the backtrack attempt fails. Ignored if transfer is true.
- Returns:
- relation : Relation
New relation with sorted rows. Will be self if terms is empty. If self is already a sort operation relation, the operations will be merged by concatenating their terms, which may result in duplicate sort terms that have no effect.
- Raises:
- abstract transferred_to(destination: Engine) -> Relation
Return a new relation that transfers this relation to a new engine.
This is a convenience method that constructs and applies a Transfer operation.
- abstract with_calculated_column(tag: ColumnTag, expression: ColumnExpression, *, preferred_engine: Engine | None = None, backtrack: bool = True, transfer: bool = False, require_preferred_engine: bool = False) -> Relation
Return a new relation that adds a calculated column to this one.
This is a convenience method that constructs and applies a Calculation operation.
- Parameters:
- tag : ColumnTag
Identifier for the new column.
- expression : ColumnExpression
Expression used to populate the new column.
- preferred_engine : Engine, optional
Engine that the operation would ideally be performed in. If this is not equal to self.engine, the backtrack, transfer, and require_preferred_engine arguments control the behavior.
- backtrack : bool, optional
If True (default) and the current engine is not the preferred engine, attempt to insert this calculation before a transfer upstream of the current relation, as long as this can be done without breaking up any locked relations or changing the resulting relation content.
- transfer : bool, optional
If True (False is default) and the current engine is not the preferred engine, insert a new Transfer before the Calculation. If backtrack is also true, the transfer is added only if the backtrack attempt fails.
- require_preferred_engine : bool, optional
If True (False is default) and the current engine is not the preferred engine, raise EngineError. If backtrack is also true, the exception is only raised if the backtrack attempt fails. Ignored if transfer is true.
- Returns:
- relation : Relation
Relation that contains the calculated column.
- Raises:
- ColumnError
Raised if the expression requires columns that are not present in self.columns, or if tag is already present in self.columns.
- EngineError
Raised if require_preferred_engine=True and it was impossible to insert this operation in the preferred engine, or if the expression was not supported by the engine.
- abstract with_only_columns(columns: Set[ColumnTag], *, preferred_engine: Engine | None = None, backtrack: bool = True, transfer: bool = False, require_preferred_engine: bool = False) -> Relation
Return a new relation whose columns are a subset of this relation’s.
This is a convenience method that constructs and applies a Projection operation.
- Parameters:
- columns : Set[ColumnTag]
Columns to be propagated to the new relation; must be a subset of self.columns.
- preferred_engine : Engine, optional
Engine that the operation would ideally be performed in. If this is not equal to self.engine, the backtrack, transfer, and require_preferred_engine arguments control the behavior.
- backtrack : bool, optional
If True (default) and the current engine is not the preferred engine, attempt to insert this projection before a transfer upstream of the current relation, as long as this can be done without breaking up any locked relations or changing the resulting relation content.
- transfer : bool, optional
If True (False is default) and the current engine is not the preferred engine, insert a new Transfer before the Projection. If backtrack is also true, the transfer is added only if the backtrack attempt fails.
- require_preferred_engine : bool, optional
If True (False is default) and the current engine is not the preferred engine, raise EngineError. If backtrack is also true, the exception is only raised if the backtrack attempt fails. Ignored if transfer is true.
- Returns:
- relation : Relation
New relation with only the given columns. Will be self if columns == self.columns.
- Raises:
- ColumnError
Raised if columns is not a subset of self.columns.
- EngineError
Raised if require_preferred_engine=True and it was impossible to insert this operation in the preferred engine.
- abstract with_rows_satisfying(predicate: Predicate, *, preferred_engine: Engine | None = None, backtrack: bool = True, transfer: bool = False, require_preferred_engine: bool = False) -> Relation
Return a new relation that filters out rows via a boolean expression.
This is a convenience method that constructs and applies a Selection operation.
- Parameters:
- predicate : Predicate
Boolean expression that evaluates to True for rows that should be included and False for rows that should be filtered out.
- preferred_engine : Engine, optional
Engine that the operation would ideally be performed in. If this is not equal to self.engine, the backtrack, transfer, and require_preferred_engine arguments control the behavior.
- backtrack : bool, optional
If True (default) and the current engine is not the preferred engine, attempt to insert this selection before a transfer upstream of the current relation, as long as this can be done without breaking up any locked relations or changing the resulting relation content.
- transfer : bool, optional
If True (False is default) and the current engine is not the preferred engine, insert a new Transfer before the Selection. If backtrack is also true, the transfer is added only if the backtrack attempt fails.
- require_preferred_engine : bool, optional
If True (False is default) and the current engine is not the preferred engine, raise EngineError. If backtrack is also true, the exception is only raised if the backtrack attempt fails. Ignored if transfer is true.
- Returns:
- relation : Relation
New relation with only the rows that satisfy the given predicate. May be self if the predicate is trivially True.
- Raises:
- ColumnError
Raised if predicate.columns_required is not a subset of self.columns.
- EngineError
Raised if require_preferred_engine=True and it was impossible to insert this operation in the preferred engine, or if the expression was not supported by the engine.
- abstract without_duplicates(*, preferred_engine: Engine | None = None, backtrack: bool = True, transfer: bool = False, require_preferred_engine: bool = False) -> Relation
Return a new relation that removes any duplicate rows from this one.
This is a convenience method that constructs and applies a Deduplication operation.
- Parameters:
- preferred_engine : Engine, optional
Engine that the operation would ideally be performed in. If this is not equal to self.engine, the backtrack, transfer, and require_preferred_engine arguments control the behavior.
- backtrack : bool, optional
If True (default) and the current engine is not the preferred engine, attempt to insert this deduplication before a transfer upstream of the current relation, as long as this can be done without breaking up any locked relations or changing the resulting relation content.
- transfer : bool, optional
If True (False is default) and the current engine is not the preferred engine, insert a new Transfer before the Deduplication. If backtrack is also true, the transfer is added only if the backtrack attempt fails.
- require_preferred_engine : bool, optional
If True (False is default) and the current engine is not the preferred engine, raise EngineError. If backtrack is also true, the exception is only raised if the backtrack attempt fails. Ignored if transfer is true.
- Returns:
- relation : Relation
Relation with no duplicate rows. This may be self if it can be determined that there is no duplication already, but this is not guaranteed.
- Raises:
- EngineError
Raised if require_preferred_engine=True and it was impossible to insert this operation in the preferred engine.
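The preferred_engine, backtrack, transfer, and require_preferred_engine arguments shared by the unary methods above interact in a fixed order of precedence. The sketch below restates that order as a small decision function. It is not the library's implementation: plan_unary_operation, ToyEngineError, and the can_backtrack flag (standing in for "the backtrack attempt succeeds") are hypothetical, and the final fallback of applying the operation in the current engine is an assumption, since the text above does not say what happens when no flag applies.

```python
from typing import Optional


class ToyEngineError(RuntimeError):
    """Hypothetical stand-in for EngineError."""


def plan_unary_operation(
    current_engine: str,
    preferred_engine: Optional[str] = None,
    *,
    can_backtrack: bool,
    backtrack: bool = True,
    transfer: bool = False,
    require_preferred_engine: bool = False,
) -> str:
    """Return a description of where the operation would be inserted."""
    if preferred_engine is None or preferred_engine == current_engine:
        return "apply in current engine"
    # Backtracking is tried first whenever it is enabled.
    if backtrack and can_backtrack:
        return "insert before upstream transfer"
    # A transfer is added only if backtracking was disabled or failed.
    if transfer:
        return "transfer, then apply in preferred engine"
    # require_preferred_engine is ignored when transfer is true.
    if require_preferred_engine:
        raise ToyEngineError("cannot apply operation in preferred engine")
    # Assumption: otherwise fall back to the current engine.
    return "apply in current engine"


plan = plan_unary_operation("iteration", "sql", can_backtrack=False, transfer=True)
```

Engines are modeled as strings here purely to keep the example self-contained.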