ApdbSql
- class lsst.dax.apdb.ApdbSql(config: ApdbSqlConfig)
Bases: Apdb
Implementation of APDB interface based on SQL database.
The implementation is configured via standard pex_config mechanism using ApdbSqlConfig configuration class. For an example of different configurations check config/ folder.
- Parameters:
- config
ApdbSqlConfig Configuration object.
Attributes Summary
admin Object providing administrative interface for APDB (ApdbAdmin).
metadata Object controlling access to APDB metadata (ApdbMetadata).
metadataCodeVersionKey Name of the metadata key to store code version number.
metadataConfigKey Name of the metadata key to store the configuration.
metadataDedupKey Name of the metadata key to store deduplication status.
metadataReplicaVersionKey Name of the metadata key to store replica code version number.
metadataSchemaVersionKey Name of the metadata key to store schema version number.
schema APDB table schema from sdm_schemas (ApdbSchema).
Methods Summary
apdbImplementationVersion() Return version number for current APDB implementation.
containsVisitDetector(visit, detector, ...) Test whether any sources for a given visit-detector are present in the APDB.
countUnassociatedObjects() Return the number of DiaObjects that have only one DiaSource associated with them.
from_config(config) Create Apdb instance from configuration object.
from_uri(uri) Make Apdb instance from a serialized configuration.
getConfig() Return APDB configuration for this instance, including any updates that may be read from database.
getDiaForcedSources(region, object_ids, ...) Return catalog of DiaForcedSource instances from a given region.
getDiaObjects(region) Return catalog of DiaObject instances from a given region.
getDiaObjectsForDedup([since]) Return catalog of DiaObject stored in APDB since specified time.
getDiaSources(region, object_ids, visit_time) Return catalog of DiaSource instances from a given region.
getDiaSourcesForDiaObjects(objects, start_time) Return catalog of DiaSources associated with given DiaObjects.
get_replica() Return ApdbReplica instance for this database.
init_database(db_url, *[, schema_file, ...]) Initialize new APDB instance and make configuration object for it.
reassignDiaSources(idMap) Associate DiaSources with SSObjects, dis-associating them from DiaObjects.
reassignDiaSourcesToDiaObjects(idMap, *[, ...]) Re-assign DiaSources from one DiaObject to another, typically during deduplication.
resetDedup([dedup_time]) Delete deduplication-related data and remember deduplication time.
setValidityEnd(objects, validityEnd[, ...]) Close validity interval for specified DiaObjects.
store(visit_time, objects[, sources, ...]) Store all three types of catalogs in the database.
tableDef(table) Return table schema definition for a given table.
tableRowCount() Return dictionary with the table names and row counts.
Attributes Documentation
- admin
- metadata
- metadataCodeVersionKey = 'version:ApdbSql'
Name of the metadata key to store code version number.
- metadataConfigKey = 'config:apdb-sql.json'
Name of the metadata key to store the configuration.
- metadataDedupKey = 'status:deduplication.json'
Name of the metadata key to store deduplication status.
- metadataReplicaVersionKey = 'version:ApdbSqlReplica'
Name of the metadata key to store replica code version number.
- metadataSchemaVersionKey = 'version:schema'
Name of the metadata key to store schema version number.
- schema
Methods Documentation
- classmethod apdbImplementationVersion() -> VersionTuple
Return version number for current APDB implementation.
- Returns:
- version
VersionTuple Version of the code defined in implementation class.
- containsVisitDetector(visit: int, detector: int, region: Region, visit_time: Time) -> bool
Test whether any sources for a given visit-detector are present in the APDB.
- Parameters:
- visit, detector
int The ID of the visit-detector to search for.
- region
lsst.sphgeom.Region Region corresponding to the visit/detector combination.
- visit_time
astropy.time.Time Visit time (as opposed to visit processing time). This can be any timestamp in the visit timespan, e.g. its begin or end time.
- Returns:
- countUnassociatedObjects() -> int
Return the number of DiaObjects that have only one DiaSource associated with them.
Used as part of ap_verify metrics.
- Returns:
- count
int Number of DiaObjects with exactly one associated DiaSource.
Notes
This method can be very inefficient or slow in some implementations.
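The quantity being counted can be illustrated in memory. The sketch below is not how the SQL implementation computes it; it assumes the caller already has the diaObjectId value of every DiaSource as a plain iterable, and the helper name count_unassociated is hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_unassociated(source_object_ids: Iterable[int]) -> int:
    """Count DiaObject IDs that are referenced by exactly one DiaSource.

    ``source_object_ids`` holds one diaObjectId per DiaSource record
    (an assumption of this sketch, not an APDB API).
    """
    sources_per_object = Counter(source_object_ids)
    return sum(1 for n in sources_per_object.values() if n == 1)
```

For example, with DiaSources pointing at objects 1, 1, 2, 3, 3, 3 and 4, only objects 2 and 4 have a single DiaSource.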
- classmethod from_config(config: ApdbConfig) -> Apdb
Create Apdb instance from configuration object.
- Parameters:
- config
ApdbConfig Configuration object, type of this object determines type of the Apdb implementation.
- Returns:
- apdb
Apdb Instance of Apdb class.
- classmethod from_uri(uri: str | ParseResult | ResourcePath | Path) -> Apdb
Make Apdb instance from a serialized configuration.
- Parameters:
- uri
ResourcePathExpression URI or local file path pointing to a file with serialized configuration, or a string with a “label:” prefix. In the latter case, the configuration will be looked up from an APDB index file using the label name that follows the prefix. The APDB index file’s location is determined by the DAX_APDB_INDEX_URI environment variable.
- Returns:
- apdb
Apdb Instance of Apdb class, the type of the returned instance is determined by configuration.
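The two accepted forms of a string uri differ only by the “label:” prefix. The sketch below illustrates that rule and is not the library's code: the helper names split_apdb_uri and index_location are hypothetical, and it only reads the DAX_APDB_INDEX_URI variable without opening the index file.

```python
import os
from typing import Optional, Tuple


def split_apdb_uri(uri: str) -> Tuple[str, str]:
    """Classify a string passed to from_uri (hypothetical helper).

    Returns ("label", name) for "label:<name>" strings and
    ("uri", uri) for anything else.
    """
    prefix = "label:"
    if uri.startswith(prefix):
        return "label", uri[len(prefix):]
    return "uri", uri


def index_location() -> Optional[str]:
    """Return the APDB index file location, or None if the variable is unset."""
    return os.environ.get("DAX_APDB_INDEX_URI")
```

A label can only be resolved when index_location() returns a value; a plain URI or file path does not need the index at all.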
- getConfig() -> ApdbSqlConfig
Return APDB configuration for this instance, including any updates that may be read from database.
- Returns:
- config
ApdbConfig APDB configuration.
- getDiaForcedSources(region: Region, object_ids: Iterable[int] | None, visit_time: Time, start_time: Time | None = None) -> DataFrame | None
Return catalog of DiaForcedSource instances from a given region.
- Parameters:
- region
lsst.sphgeom.Region Region to search for DIASources.
- object_ids
iterable [int], optional List of DiaObject IDs to further constrain the set of returned sources. If list is empty then empty catalog is returned with a correct schema. If None then returned sources are not constrained.
- visit_time
astropy.time.Time Time of the current visit. If APDB contains records later than this time they may also be returned.
- start_time
astropy.time.Time, optional Lower bound of time window for the query. If not specified then it is calculated using visit_time and read_forced_sources_months configuration parameter.
- Returns:
- catalog
pandas.DataFrame, or None Catalog containing DiaForcedSource records. None is returned if start_time is not specified and read_forced_sources_months configuration parameter is set to 0.
- Raises:
- NotImplementedError
May be raised by some implementations if object_ids is None.
Notes
This method returns DiaForcedSource catalog for a region with additional filtering based on DiaObject IDs. Only a subset of DiaSource history is returned limited by read_forced_sources_months config parameter, w.r.t. visit_time. If object_ids is empty then an empty catalog is always returned with the correct schema (columns/types). If object_ids is None then no filtering is performed and some of the returned records may be outside the specified region.
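The three cases for object_ids (None, empty, non-empty) can be summarized with a small in-memory sketch. It assumes records are plain dictionaries with a diaObjectId key rather than a pandas.DataFrame, and the helper name select_by_object_ids is hypothetical; the real method applies the constraint in the database query.

```python
from typing import Iterable, List, Optional


def select_by_object_ids(records: List[dict], object_ids: Optional[Iterable[int]]) -> List[dict]:
    """Apply the documented object_ids rules to already-fetched records."""
    if object_ids is None:
        # No constraint: everything found by the region query is kept.
        return list(records)
    wanted = set(object_ids)
    # An empty collection selects nothing, so the result is empty.
    return [rec for rec in records if rec["diaObjectId"] in wanted]
```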
- getDiaObjects(region: Region) -> DataFrame
Return catalog of DiaObject instances from a given region.
This method returns only the last version of each DiaObject, and may return only the subset of the DiaObject columns needed for AP association. Some records in a returned catalog may be outside the specified region; it is up to the client to ignore those records or clean up the catalog before further use.
- Parameters:
- region
lsst.sphgeom.Region Region to search for DIAObjects.
- Returns:
- catalog
pandas.DataFrame Catalog containing DiaObject records for a region that may be a superset of the specified region.
- getDiaObjectsForDedup(since: Time | None = None) -> DataFrame
Return catalog of DiaObject stored in APDB since specified time.
This method should be used by deduplication algorithm to retrieve DiaObject records added to APDB since previous deduplication (typically during previous night). Returned catalog will have only a small subset of DiaObject attributes required by deduplication algorithm.
- Parameters:
- since
astropy.time.Time, optional Starting search time (time of previous deduplication). If not provided the time of the last deduplication stored in metadata by resetDedup method is used.
- Returns:
- catalog
pandas.DataFrame Catalog containing DiaObject records, only a subset of attributes will be returned.
- getDiaSources(region: Region, object_ids: Iterable[int] | None, visit_time: Time, start_time: Time | None = None) -> DataFrame | None
Return catalog of DiaSource instances from a given region.
- Parameters:
- region
lsst.sphgeom.Region Region to search for DIASources.
- object_ids
iterable [int], optional List of DiaObject IDs to further constrain the set of returned sources. If None then returned sources are not constrained. If list is empty then empty catalog is returned with a correct schema.
- visit_time
astropy.time.Time Time of the current visit. If APDB contains records later than this time they may also be returned.
- start_time
astropy.time.Time, optional Lower bound of time window for the query. If not specified then it is calculated using visit_time and read_sources_months configuration parameter.
- Returns:
- catalog
pandas.DataFrame, or None Catalog containing DiaSource records. None is returned if start_time is not specified and read_sources_months configuration parameter is set to 0.
Notes
This method returns DiaSource catalog for a region with additional filtering based on DiaObject IDs. Only a subset of DiaSource history is returned limited by read_sources_months config parameter, w.r.t. visit_time. If object_ids is empty then an empty catalog is always returned with the correct schema (columns/types). If object_ids is None then no filtering is performed and some of the returned records may be outside the specified region.
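The rules for the lower bound of the time window can be written down as a short sketch. It is an illustration under stated assumptions, not the implementation: it uses datetime in place of astropy.time.Time, treats a month as 30 days (the actual conversion is not specified here), and the helper name query_start_time is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

DAYS_PER_MONTH = 30  # assumption of this sketch; the real conversion may differ


def query_start_time(
    visit_time: datetime,
    months: int,
    start_time: Optional[datetime] = None,
) -> Optional[datetime]:
    """Return the lower bound of the query window, or None for "no query"."""
    if start_time is not None:
        # An explicit start_time always wins over the configured history length.
        return start_time
    if months == 0:
        # No explicit bound and zero months of history: the caller returns None
        # instead of a catalog.
        return None
    return visit_time - timedelta(days=months * DAYS_PER_MONTH)
```

The same logic applies to getDiaForcedSources with read_forced_sources_months as the months value.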
- getDiaSourcesForDiaObjects(objects: list[lsst.dax.apdb.recordIds.DiaObjectId], start_time: Time, max_dist_arcsec: float = 1.0) -> DataFrame
Return catalog of DiaSources associated with given DiaObjects.
- Parameters:
- objects
list[DiaObjectId] DiaObjects associated with returned DiaSources.
- start_time
astropy.time.Time Lower bound for midpointMjdTai for returned DiaSources.
- max_dist_arcsec
float Maximum expected distance in arcsec between DiaSource and DiaObject. This parameter is used to optimize spatial queries in cases when DiaObject is located near the partition boundary. If the distance from DiaObject to the boundary is smaller than max_dist_arcsec, then the neighbor partition will be included in search too.
- Returns:
- catalog
pandas.DataFrame Catalog containing DiaSource records associated with given DiaObjects.
Notes
Primary purpose of this method is to support deduplication algorithm. Its implementation is likely to be very slow and inefficient; it should not be used for regular queries.
- get_replica() -> ApdbSqlReplica
Return ApdbReplica instance for this database.
- classmethod init_database(db_url: str, *, schema_file: str | None = None, ss_schema_file: str | None = None, read_sources_months: int | None = None, read_forced_sources_months: int | None = None, enable_replica: bool = False, connection_timeout: int | None = None, dia_object_index: str | None = None, htm_level: int | None = None, htm_index_column: str | None = None, ra_dec_columns: tuple[str, str] | None = None, prefix: str | None = None, namespace: str | None = None, drop: bool = False) -> ApdbSqlConfig
Initialize new APDB instance and make configuration object for it.
- Parameters:
- db_url
str SQLAlchemy database URL.
- schema_file
str, optional Location of (YAML) configuration file with APDB schema. If not specified then default location will be used.
- ss_schema_file
str, optional Location of (YAML) configuration file with SSO schema. If not specified then default location will be used.
- read_sources_months
int, optional Number of months of history to read from DiaSource.
- read_forced_sources_months
int, optional Number of months of history to read from DiaForcedSource.
- enable_replica
bool, optional If True, make additional tables used for replication to PPDB.
- connection_timeout
int, optional Database connection timeout in seconds.
- dia_object_index
str, optional Indexing mode for DiaObject table.
- htm_level
int, optional HTM indexing level.
- htm_index_column
str, optional Name of a HTM index column for DiaObject and DiaSource tables.
- ra_dec_columns
tuple[str,str], optional Names of ra/dec columns in DiaObject table.
- prefix
str, optional Optional prefix for all table names.
- namespace
str, optional Name of the database schema for all APDB tables. If not specified then default schema is used.
- drop
bool, optional If True then drop existing tables before re-creating the schema.
- Returns:
- config
ApdbSqlConfig Resulting configuration object for a created APDB instance.
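A typical sequence is to initialize the database once and then construct an instance from the returned configuration. The sketch below uses only the signatures documented on this page; the wrapper name create_and_open_apdb and the parameter values are illustrative assumptions, and the import is done lazily so the function can be defined without the LSST stack installed.

```python
from typing import Any


def create_and_open_apdb(db_url: str) -> Any:
    """Initialize a new APDB at ``db_url`` and return an ApdbSql instance for it."""
    # Imported lazily: this requires the lsst.dax.apdb package at call time.
    from lsst.dax.apdb import ApdbSql

    config = ApdbSql.init_database(
        db_url=db_url,
        read_sources_months=12,         # illustrative value
        read_forced_sources_months=12,  # illustrative value
        enable_replica=False,           # no extra tables for replication
        drop=False,                     # do not drop existing tables
    )
    return ApdbSql(config)
```

A call such as create_and_open_apdb("sqlite:///apdb.db") passes an SQLAlchemy database URL; the file name here is hypothetical.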
- reassignDiaSources(idMap: Mapping[int, int]) -> None
Associate DiaSources with SSObjects, dis-associating them from DiaObjects.
- Parameters:
- idMap
Mapping Maps DiaSource IDs to their new SSObject IDs.
- Raises:
- ValueError
Raised if DiaSource ID does not exist in the database.
- reassignDiaSourcesToDiaObjects(idMap: Mapping[DiaSourceId, int], *, increment_nDiaSources: bool = True, decrement_nDiaSources: bool = True) -> None
Re-assign DiaSources from one DiaObject to another, typically during deduplication.
- Parameters:
- idMap
Mapping[DiaSourceId, int] Mapping from DiaSource to their new diaObjectId.
- increment_nDiaSources
bool, optional If True then increment the value of nDiaSources in DiaObjects that DiaSources are reassigned to.
- decrement_nDiaSources
bool, optional If True then decrement the value of nDiaSources in DiaObjects that DiaSources are reassigned from.
- Raises:
- LookupError
Raised if some of DiaSources or DiaObjects are not found.
Notes
DiaSources initially could be associated with SSObjects. This method needs to be called before setValidityEnd.
- resetDedup(dedup_time: Time | None = None) -> None
Delete deduplication-related data and remember deduplication time. Deduplication data generated before dedup_time will be erased.
- Parameters:
- dedup_time
astropy.time.Time, optional Time of the last deduplication, current time is used if not provided.
- setValidityEnd(objects: list[lsst.dax.apdb.recordIds.DiaObjectId], validityEnd: Time, raise_on_missing_id: bool = False) -> int
Close validity interval for specified DiaObjects.
- Parameters:
- objects
list[DiaObjectId] DiaObjects which will have their validityEnd updated, if their current validityEnd is NULL.
- validityEnd
astropy.time.Time Value for validityEnd.
- raise_on_missing_id
bool, optional If True then LookupError will be raised if any object in the list is missing from the database.
- Returns:
- count
int Actual number of records for which validityEnd was updated.
- Raises:
- LookupError
Raised if raise_on_missing_id is True and some of the specified DiaObjects could not be found in the database.
Notes
This method has to be called after reassignDiaSourcesToDiaObjects.
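The required call order for the two deduplication methods can be captured in a small wrapper. This is a sketch rather than a prescribed workflow: the function name merge_duplicates is hypothetical, apdb stands for any object exposing the two methods documented above, and the choice of raise_on_missing_id=True is an assumption made for the example.

```python
from typing import Any, Mapping, Sequence


def merge_duplicates(
    apdb: Any,
    id_map: Mapping[Any, int],
    retired: Sequence[Any],
    validity_end: Any,
) -> int:
    """Re-point DiaSources, then close validity of the retired DiaObjects.

    Returns the number of DiaObject records whose validityEnd was updated.
    """
    # Step 1: move DiaSources to their new DiaObjects (must happen first).
    apdb.reassignDiaSourcesToDiaObjects(id_map)
    # Step 2: close the validity interval of the DiaObjects being retired.
    return apdb.setValidityEnd(list(retired), validity_end, raise_on_missing_id=True)
```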
- store(visit_time: Time, objects: DataFrame, sources: DataFrame | None = None, forced_sources: DataFrame | None = None) -> None
Store all three types of catalogs in the database.
- Parameters:
- visit_time
astropy.time.Time Time of the visit.
- objects
pandas.DataFrame Catalog with DiaObject records.
- sources
pandas.DataFrame, optional Catalog with DiaSource records.
- forced_sources
pandas.DataFrame, optional Catalog with DiaForcedSource records.
Notes
This method takes DataFrame catalogs; their schema must be compatible with the schema of the APDB tables:
- column names must correspond to database table columns
- types and units of the columns must match database definitions, no unit conversion is performed presently
- columns that have default values in database schema can be omitted from catalog
- this method knows how to fill interval-related columns of DiaObject (validityStart, validityEnd); they do not need to appear in a catalog
- source catalogs have diaObjectId column associating sources with objects
This operation need not be atomic, but DiaSources and DiaForcedSources will not be stored until all DiaObjects are stored.
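The column-name rules listed above can be checked before calling store. The sketch below is a hypothetical pre-flight helper, not part of the APDB API: it works on plain column names, takes a mapping that says which table columns have defaults (an input the caller must supply), and does not verify types or units.

```python
from typing import Iterable, List, Mapping


def check_catalog_columns(
    catalog_columns: Iterable[str],
    table_columns: Mapping[str, bool],
    auto_filled: Iterable[str] = (),
) -> List[str]:
    """Return a list of problems; an empty list means the names are compatible.

    ``table_columns`` maps each table column name to True if that column
    has a default value in the schema. ``auto_filled`` names columns that
    the store method fills itself (e.g. validityStart, validityEnd).
    """
    present = set(catalog_columns)
    auto = set(auto_filled)
    # Every catalog column must exist in the table.
    problems = [f"unknown column: {name}" for name in sorted(present - set(table_columns))]
    # Table columns may be omitted only if they have a default or are auto-filled.
    for name, has_default in table_columns.items():
        if name not in present and not has_default and name not in auto:
            problems.append(f"missing column: {name}")
    return problems
```

With a pandas catalog, the first argument would be the catalog's column names; the table description used in any real check has to come from the actual schema.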
- tableDef(table: ApdbTables) -> Table | None
Return table schema definition for a given table.
- Parameters:
- table
ApdbTables One of the known APDB tables.
- Returns:
- tableSchema
schema_model.Table or None Table schema description, None is returned if table is not defined by this implementation.
- tableRowCount() -> dict[str, int]
Return dictionary with the table names and row counts.
Used by ap_proto to keep track of the size of the database tables. Depending on database technology this could be an expensive operation.
- Returns:
- row_counts
dict Dict where key is a table name and value is a row count.