AssociationTask

Bases: Task

Associate DIASources into existing DIAObjects.

This task performs the association of detected DIASources in a visit with the previous DIAObjects detected over time. It also creates new DIAObjects out of DIASources that cannot be associated with previously detected DIAObjects.

Methods Summary

`associate_sources`(dia_objects, dia_sources[, schema])	Associate the input DIASources with the catalog of DIAObjects.
`check_dia_source_radec`(dia_sources)	Check that all DiaSources have non-NaN values for RA/DEC.
`match`(dia_objects, dia_sources, score_struct[, schema])	Solve a min-cost bipartite matching between sources and objects.
`run`(diaSources, diaObjects[, schema])	Associate the new DiaSources with existing DiaObjects.
`score`(dia_objects, dia_sources, max_dist)	Build the candidate (DIASource, DIAObject) match table and score every pair.

Methods Documentation

associate_sources(dia_objects, dia_sources, schema=None)

Associate the input DIASources with the catalog of DIAObjects.

The dia_objects DataFrame must be indexed on diaObjectId.

Parameters

dia_objects : pandas.DataFrame
Catalog of DIAObjects to attempt to associate the input DIASources into.
dia_sources : pandas.DataFrame
DIASources to associate into the DIAObjectCollection.
schema : dict [str, felis.datamodel.Schema] or None, optional
Dictionary of Schemas from sdm_schemas containing the table definition to use.

Returns

result : lsst.pipe.base.Struct

Results struct with components:

diaSources : Full set of diaSources, both matched and not. (pandas.DataFrame)
nUpdatedDiaObjects : Number of DiaObjects that were associated. (int)
nUnassociatedDiaObjects : Number of DiaObjects that were not matched to a new DiaSource. (int)
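As a rough illustration of the two counters above, the following sketch derives them from a DIAObject table indexed on diaObjectId and a source table whose diaObjectId column has already been populated. The helper name `association_counts` is hypothetical, and treating 0 as the "unmatched" id is taken from the description of `match`; the task's actual bookkeeping may differ.

```python
import pandas as pd


def association_counts(dia_objects, dia_sources):
    """Count matched and unmatched DIAObjects (hypothetical helper).

    Assumes dia_objects is indexed on diaObjectId and that unmatched
    sources carry diaObjectId == 0.
    """
    matched_ids = dia_sources.loc[
        dia_sources["diaObjectId"] != 0, "diaObjectId"
    ].unique()
    # An object is "updated" if at least one source points at it.
    n_updated = int(dia_objects.index.isin(matched_ids).sum())
    n_unassociated = len(dia_objects) - n_updated
    return n_updated, n_unassociated


# Toy catalog: three existing objects, indexed on diaObjectId.
objects = pd.DataFrame(
    {"diaObjectId": [11, 12, 13], "ra": [10.0, 10.1, 10.2], "dec": [0.0, 0.0, 0.0]}
).set_index("diaObjectId", drop=False)
# Three sources: two matched to objects 11 and 13, one unmatched (0).
sources = pd.DataFrame({"diaObjectId": [11, 0, 13]})

n_updated, n_unassociated = association_counts(objects, sources)
```

Keeping the index on diaObjectId is what makes the `index.isin` lookup valid here.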

check_dia_source_radec(dia_sources)

Check that all DiaSources have non-NaN values for RA/DEC.

If one or more DiaSources have NaN values for RA/DEC, log a warning listing the ids of the offending sources and drop them from the table.

Parameters

dia_sources : pandas.DataFrame
Input DiaSources to check for NaN values.

Returns

trimmed_sources : pandas.DataFrame
DataFrame of DiaSources trimmed of all entries with NaN values for RA/DEC.
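The behaviour described above can be sketched with plain pandas. This is a minimal illustration, not the task's code: the function name `drop_nan_radec`, the logger name, and the `diaSourceId` column used to report offending rows are assumptions made for the example.

```python
import logging

import numpy as np
import pandas as pd

log = logging.getLogger("association_sketch")  # hypothetical logger name


def drop_nan_radec(dia_sources):
    """Return dia_sources without rows whose ra or dec is NaN."""
    nan_mask = dia_sources["ra"].isnull() | dia_sources["dec"].isnull()
    if nan_mask.any():
        # ASSUMPTION: sources are identified by a diaSourceId column.
        bad_ids = dia_sources.loc[nan_mask, "diaSourceId"].tolist()
        log.warning(
            "Dropping %d DiaSources with NaN ra/dec: %s", len(bad_ids), bad_ids
        )
    return dia_sources.loc[~nan_mask]


sources = pd.DataFrame(
    {
        "diaSourceId": [1, 2, 3],
        "ra": [10.0, np.nan, 10.2],
        "dec": [0.0, 0.1, 0.2],
    }
)
trimmed = drop_nan_radec(sources)
```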

match(dia_objects, dia_sources, score_struct, schema=None)

Solve a min-cost bipartite matching between sources and objects.

Each DIASource is given a synthetic ‘no-match’ alternative (a per-source ‘ghost’ column) carrying score_struct.unmatched_cost. A min-weight full bipartite matching is then solved on the sparse cost matrix made from real candidate pairs and ghost edges. Sources matched to a ghost are reported as unassociated; sources matched to a real object inherit that object’s diaObjectId.

When two sources compete for the same object, the source with a strictly worse alternative gets that object, and the other source falls back to its second-best candidate rather than creating a new DIAObject.

Parameters

dia_objects, dia_sources : pandas.DataFrame
Must contain ra and dec; raErr and decErr are used when present.
score_struct : lsst.pipe.base.Struct
Output of score: src_idx, obj_idx, scores, unmatched_cost.
schema : dict [str, felis.datamodel.Schema] or None, optional
Dictionary of Schemas from sdm_schemas containing the table definition to use.

Returns

result : lsst.pipe.base.Struct

diaSources : input source table with diaObjectId populated (0 for unmatched). (pandas.DataFrame)

nUpdatedDiaObjects : number of DIAObjects matched to a new DIASource. (int)

nUnassociatedDiaObjects : number of preloaded DIAObjects with no matching DIASource. (int)
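The ghost-column construction described above can be sketched as follows. This is an illustration under stated assumptions, not the task's implementation: it uses `scipy.sparse.csgraph.min_weight_full_bipartite_matching` as the solver (the text does not say which solver is used), the helper name `match_with_ghosts` is hypothetical, and it returns positional object indices with -1 for "no match" rather than populating diaObjectId.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import min_weight_full_bipartite_matching


def match_with_ghosts(src_idx, obj_idx, scores, unmatched_cost, n_src, n_obj):
    """Return, per source, the matched object index or -1 for 'no match'."""
    if n_src == 0:
        return np.empty(0, dtype=int)
    src_idx = np.asarray(src_idx, dtype=int)
    obj_idx = np.asarray(obj_idx, dtype=int)
    scores = np.asarray(scores, dtype=float)

    # One ghost column per source, placed after the real object columns.
    ghost_rows = np.arange(n_src)
    ghost_cols = n_obj + ghost_rows
    rows = np.concatenate([src_idx, ghost_rows])
    cols = np.concatenate([obj_idx, ghost_cols])
    costs = np.concatenate([scores, np.full(n_src, float(unmatched_cost))])

    # Shift costs to be strictly positive. Every full matching uses exactly
    # n_src edges, so a constant shift does not change the optimum; it only
    # keeps zero or negative scores from being confused with missing edges.
    costs = costs - costs.min() + 1.0

    graph = coo_matrix(
        (costs, (rows, cols)), shape=(n_src, n_obj + n_src)
    ).tocsr()
    row_ind, col_ind = min_weight_full_bipartite_matching(graph)

    assigned = np.full(n_src, -1, dtype=int)
    real = col_ind < n_obj  # matches that landed on a real object
    assigned[row_ind[real]] = col_ind[real]
    return assigned


# Source 0 can reach objects 0 and 1; source 1 can only reach object 0.
assigned = match_with_ghosts(
    src_idx=[0, 0, 1], obj_idx=[0, 1, 0], scores=[0.1, 0.5, 0.15],
    unmatched_cost=1.0, n_src=2, n_obj=2,
)
```

In this toy case source 0 is closest to object 0, but source 1 has no alternative other than its ghost, so the optimum gives object 0 to source 1 and lets source 0 fall back to object 1: `assigned` is `[1, 0]`.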

run(diaSources, diaObjects, schema=None)

Associate the new DiaSources with existing DiaObjects.

Parameters

diaSources : pandas.DataFrame
New DIASources to be associated with existing DIAObjects.
diaObjects : pandas.DataFrame
Existing diaObjects from the Apdb.
schema : dict [str, felis.datamodel.Schema] or None, optional
Dictionary of Schemas from sdm_schemas containing the table definition to use. If None, dtypes for new columns are guessed from the input tables.

Returns

result : lsst.pipe.base.Struct

Results struct with components:

matchedDiaSources : DiaSources that were matched. Matched sources have their diaObjectId set to the id of the diaObject they were matched to. (pandas.DataFrame)
unAssocDiaSources : DiaSources that were not matched. Unassociated sources have their diaObjectId set to 0, as they were not associated with any existing DiaObject. (pandas.DataFrame)
nUpdatedDiaObjects : Number of DiaObjects that were matched to new DiaSources. (int)
nUnassociatedDiaObjects : Number of DiaObjects that were not matched to a new DiaSource. (int)
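The split between the two returned tables follows from the diaObjectId convention above. A minimal pandas sketch, assuming the associated source table already carries a populated diaObjectId column (the helper name `split_by_association` is hypothetical):

```python
import pandas as pd


def split_by_association(dia_sources):
    """Split sources into (matched, unassociated) on diaObjectId != 0."""
    matched = dia_sources["diaObjectId"] != 0
    return dia_sources[matched].copy(), dia_sources[~matched].copy()


sources = pd.DataFrame(
    {"diaSourceId": [1, 2, 3, 4], "diaObjectId": [11, 0, 13, 0]}
)
matched_sources, unassoc_sources = split_by_association(sources)
```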

score(dia_objects, dia_sources, max_dist)

Build the candidate (DIASource, DIAObject) match table and score every pair.

For each DIASource, all DIAObjects within max_dist are retrieved from a kd-tree on unit vectors. Each candidate pair is then scored:

If both inputs carry usable raErr/decErr columns, the score is the 2D Gaussian negative log-likelihood of the position residual (0.5 * chi^2 + 0.5 * ln(var_ra * var_dec)), so the match prefers the most likely object, not merely the nearest or the lowest-chi^2 one (see _position_nll).
Otherwise, the chord distance (in radians) is used as the score.

No candidates are dropped by the score itself: every pair within max_dist is retained, and the score is used only to rank them.

raErr and decErr are taken to follow the LSST DPDD convention: each is the marginal uncertainty of the catalog coordinate itself in degrees (no cos(dec) factor folded into raErr). Under that convention the cos(dec) factor cancels between residual and uncertainty, and chi^2 reduces to dRA^2 / sum(raErr^2) + dDec^2 / sum(decErr^2).
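The uncertainty-based score described above can be written out in a few lines of NumPy. This is a sketch, not the `_position_nll` implementation: it assumes that var_ra and var_dec are the sums of the squared source and object errors (the same sums that appear in the chi^2 denominators), that all inputs are in degrees, and that the RA residual is wrapped across RA = 0. The function name `position_nll` is hypothetical.

```python
import numpy as np


def position_nll(src_ra, src_dec, src_ra_err, src_dec_err,
                 obj_ra, obj_dec, obj_ra_err, obj_dec_err):
    """Return (chi2, nll) for paired positions, all inputs in degrees."""
    # Wrap the RA residual into [-180, 180) so pairs straddling RA = 0 work.
    d_ra = (np.asarray(src_ra, dtype=float) - np.asarray(obj_ra, dtype=float)
            + 180.0) % 360.0 - 180.0
    d_dec = np.asarray(src_dec, dtype=float) - np.asarray(obj_dec, dtype=float)

    # ASSUMPTION: the combined variance is the sum of squared errors.
    var_ra = np.square(src_ra_err) + np.square(obj_ra_err)
    var_dec = np.square(src_dec_err) + np.square(obj_dec_err)

    chi2 = d_ra**2 / var_ra + d_dec**2 / var_dec
    nll = 0.5 * chi2 + 0.5 * np.log(var_ra * var_dec)
    return chi2, nll


# One source, two candidate objects at the same offset: the first object
# has tight errors, the second has errors 100 times larger.
chi2, nll = position_nll(
    10.00002, 0.0, 1e-5, 1e-5,
    np.array([10.0, 10.0]), np.array([0.0, 0.0]),
    np.array([1e-5, 1e-3]), np.array([1e-5, 1e-3]),
)
```

Here the loosely measured object has the lower chi^2, because its large errors absorb the residual, but the log-variance term penalises it, so the tightly measured object ends up with the lower (and negative) NLL.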

max_dist is therefore both the candidate pre-filter and the association radius for the downstream match.

Parameters

dia_objects, dia_sources : pandas.DataFrame
Must contain ra and dec; raErr and decErr are used when present.
max_dist : lsst.geom.Angle
Hard angular upper bound on candidate pairs.

Returns

result : lsst.pipe.base.Struct

Flat candidate-pair table:

src_idx : numpy.ndarray of int
Positional source index for each surviving pair.
obj_idx : numpy.ndarray of int
Positional object index for each surviving pair.
scores : numpy.ndarray of float
Cost of each pair (position negative log-likelihood if uncertainty-based, chord distance in radians otherwise). Lower is better; NLL values may be negative.
unmatched_cost : float
Cost to assign to the synthetic ‘no-match’ alternative in the linear-assignment match, set so that any surviving real candidate is preferred.
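The candidate search that produces src_idx and obj_idx can be sketched with a kd-tree on unit vectors. This is an illustration only: it uses `scipy.spatial.cKDTree` (the text does not name the kd-tree implementation), takes max_dist as a plain float in radians rather than an lsst.geom.Angle, and returns the chord distance as the fallback score. The names `radec_to_unit_vectors` and `candidate_pairs` are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def radec_to_unit_vectors(ra_deg, dec_deg):
    """Convert RA/Dec in degrees to an (N, 3) array of unit vectors."""
    ra = np.radians(np.asarray(ra_deg, dtype=float))
    dec = np.radians(np.asarray(dec_deg, dtype=float))
    return np.column_stack(
        [np.cos(dec) * np.cos(ra), np.cos(dec) * np.sin(ra), np.sin(dec)]
    )


def candidate_pairs(obj_ra, obj_dec, src_ra, src_dec, max_dist_rad):
    """Return (src_idx, obj_idx, chord) for all pairs within max_dist_rad."""
    obj_vec = radec_to_unit_vectors(obj_ra, obj_dec)
    src_vec = radec_to_unit_vectors(src_ra, src_dec)
    if len(obj_vec) == 0 or len(src_vec) == 0:
        empty = np.empty(0, dtype=int)
        return empty, empty.copy(), np.empty(0, dtype=float)

    # An angle theta on the unit sphere corresponds to a chord 2*sin(theta/2).
    max_chord = 2.0 * np.sin(0.5 * max_dist_rad)
    tree = cKDTree(obj_vec)
    neighbours = tree.query_ball_point(src_vec, r=max_chord)

    # Flatten the per-source neighbour lists into parallel index arrays.
    counts = np.array([len(n) for n in neighbours], dtype=int)
    src_idx = np.repeat(np.arange(len(src_vec)), counts)
    obj_idx = np.array([j for n in neighbours for j in n], dtype=int)
    chord = np.linalg.norm(src_vec[src_idx] - obj_vec[obj_idx], axis=1)
    return src_idx, obj_idx, chord


# Three objects, one source, and a 2 arcsecond search radius.
src_idx, obj_idx, chord = candidate_pairs(
    obj_ra=[10.0, 10.001, 50.0], obj_dec=[0.0, 0.0, 20.0],
    src_ra=[10.0002], src_dec=[0.0],
    max_dist_rad=np.radians(2.0 / 3600.0),
)
```

Only the first object (0.72 arcseconds away) survives; the second is about 2.9 arcseconds away and falls outside the radius. For separations this small the chord length is numerically indistinguishable from the angle in radians.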