ApdbCassandra

class lsst.dax.apdb.ApdbCassandra(config: ApdbCassandraConfig)

Bases: Apdb

Implementation of APDB database on to of Apache Cassandra.

The implementation is configured via standard pex_config mechanism using ApdbCassandraConfig configuration class. For an example of different configurations check config/ folder.

Parameters:
configApdbCassandraConfig

Configuration object.

Attributes Summary

metadata

Object controlling access to APDB metadata (ApdbMetadata).

metadataCodeVersionKey

Name of the metadata key to store code version number.

metadataConfigKey

Name of the metadata key to store code version number.

metadataReplicaVersionKey

Name of the metadata key to store replica code version number.

metadataSchemaVersionKey

Name of the metadata key to store schema version number.

partition_zero_epoch

Start time for partition 0, this should never be changed.

Methods Summary

apdbImplementationVersion()

Return version number for current APDB implementation.

apdbSchemaVersion()

Return schema version number as defined in config file.

containsVisitDetector(visit, detector)

Test whether data for a given visit-detector is present in the APDB.

countUnassociatedObjects()

Return the number of DiaObjects that have only one DiaSource associated with them.

dailyJob()

Implement daily activities like cleanup/vacuum.

from_config(config)

Create Ppdb instance from configuration object.

from_uri(uri)

Make Apdb instance from a serialized configuration.

getDiaForcedSources(region, object_ids, ...)

Return catalog of DiaForcedSource instances from a given region.

getDiaObjects(region)

Return catalog of DiaObject instances from a given region.

getDiaSources(region, object_ids, visit_time)

Return catalog of DiaSource instances from a given region.

getSSObjects()

Return catalog of SSObject instances.

get_replica()

Return ApdbReplica instance for this database.

init_database(hosts, keyspace, *[, ...])

Initialize new APDB instance and make configuration object for it.

makeField(doc)

Make a ConfigurableField for Apdb.

reassignDiaSources(idMap)

Associate DiaSources with SSObjects, dis-associating them from DiaObjects.

store(visit_time, objects[, sources, ...])

Store all three types of catalogs in the database.

storeSSObjects(objects)

Store or update SSObject catalog.

tableDef(table)

Return table schema definition for a given table.

Attributes Documentation

metadata
metadataCodeVersionKey = 'version:ApdbCassandra'

Name of the metadata key to store code version number.

metadataConfigKey = 'config:apdb-cassandra.json'

Name of the metadata key to store code version number.

metadataReplicaVersionKey = 'version:ApdbCassandraReplica'

Name of the metadata key to store replica code version number.

metadataSchemaVersionKey = 'version:schema'

Name of the metadata key to store schema version number.

partition_zero_epoch = <Time object: scale='tai' format='unix_tai' value=0.0>

Start time for partition 0, this should never be changed.

Methods Documentation

classmethod apdbImplementationVersion() VersionTuple

Return version number for current APDB implementation.

Returns:
versionVersionTuple

Version of the code defined in implementation class.

apdbSchemaVersion() VersionTuple

Return schema version number as defined in config file.

Returns:
versionVersionTuple

Version of the schema defined in schema config file.

containsVisitDetector(visit: int, detector: int) bool

Test whether data for a given visit-detector is present in the APDB.

Parameters:
visit, detectorint

The ID of the visit-detector to search for.

Returns:
presentbool

True if some DiaObject, DiaSource, or DiaForcedSource records exist for the specified observation, False otherwise.

countUnassociatedObjects() int

Return the number of DiaObjects that have only one DiaSource associated with them.

Used as part of ap_verify metrics.

Returns:
countint

Number of DiaObjects with exactly one associated DiaSource.

Notes

This method can be very inefficient or slow in some implementations.

dailyJob() None

Implement daily activities like cleanup/vacuum.

What should be done during daily activities is determined by specific implementation.

classmethod from_config(config: ApdbConfig) Apdb

Create Ppdb instance from configuration object.

Parameters:
configApdbConfig

Configuration object, type of this object determines type of the Apdb implementation.

Returns:
apdbapdb

Instance of Apdb class.

classmethod from_uri(uri: str | ParseResult | ResourcePath | Path) Apdb

Make Apdb instance from a serialized configuration.

Parameters:
uriResourcePathExpression

URI or local file path pointing to a file with serialized configuration, or a string with a “label:” prefix. In the latter case, the configuration will be looked up from an APDB index file using the label name that follows the prefix. The APDB index file’s location is determined by the DAX_APDB_INDEX_URI environment variable.

Returns:
apdbapdb

Instance of Apdb class, the type of the returned instance is determined by configuration.

getDiaForcedSources(region: Region, object_ids: Iterable[int] | None, visit_time: Time) DataFrame | None

Return catalog of DiaForcedSource instances from a given region.

Parameters:
regionlsst.sphgeom.Region

Region to search for DIASources.

object_idsiterable [ int ], optional

List of DiaObject IDs to further constrain the set of returned sources. If list is empty then empty catalog is returned with a correct schema. If None then returned sources are not constrained. Some implementations may not support latter case.

visit_timeastropy.time.Time

Time of the current visit.

Returns:
catalogpandas.DataFrame, or None

Catalog containing DiaSource records. None is returned if read_forced_sources_months configuration parameter is set to 0.

Raises:
NotImplementedError

May be raised by some implementations if object_ids is None.

Notes

This method returns DiaForcedSource catalog for a region with additional filtering based on DiaObject IDs. Only a subset of DiaSource history is returned limited by read_forced_sources_months config parameter, w.r.t. visit_time. If object_ids is empty then an empty catalog is always returned with the correct schema (columns/types). If object_ids is None then no filtering is performed and some of the returned records may be outside the specified region.

getDiaObjects(region: Region) DataFrame

Return catalog of DiaObject instances from a given region.

This method returns only the last version of each DiaObject. Some records in a returned catalog may be outside the specified region, it is up to a client to ignore those records or cleanup the catalog before futher use.

Parameters:
regionlsst.sphgeom.Region

Region to search for DIAObjects.

Returns:
catalogpandas.DataFrame

Catalog containing DiaObject records for a region that may be a superset of the specified region.

getDiaSources(region: Region, object_ids: Iterable[int] | None, visit_time: Time) DataFrame | None

Return catalog of DiaSource instances from a given region.

Parameters:
regionlsst.sphgeom.Region

Region to search for DIASources.

object_idsiterable [ int ], optional

List of DiaObject IDs to further constrain the set of returned sources. If None then returned sources are not constrained. If list is empty then empty catalog is returned with a correct schema.

visit_timeastropy.time.Time

Time of the current visit.

Returns:
catalogpandas.DataFrame, or None

Catalog containing DiaSource records. None is returned if read_sources_months configuration parameter is set to 0.

Notes

This method returns DiaSource catalog for a region with additional filtering based on DiaObject IDs. Only a subset of DiaSource history is returned limited by read_sources_months config parameter, w.r.t. visit_time. If object_ids is empty then an empty catalog is always returned with the correct schema (columns/types). If object_ids is None then no filtering is performed and some of the returned records may be outside the specified region.

getSSObjects() DataFrame

Return catalog of SSObject instances.

Returns:
catalogpandas.DataFrame

Catalog containing SSObject records, all existing records are returned.

get_replica() ApdbCassandraReplica

Return ApdbReplica instance for this database.

classmethod init_database(hosts: list[str], keyspace: str, *, schema_file: str | None = None, schema_name: str | None = None, read_sources_months: int | None = None, read_forced_sources_months: int | None = None, use_insert_id: bool = False, use_insert_id_skips_diaobjects: bool = False, port: int | None = None, username: str | None = None, prefix: str | None = None, part_pixelization: str | None = None, part_pix_level: int | None = None, time_partition_tables: bool = True, time_partition_start: str | None = None, time_partition_end: str | None = None, read_consistency: str | None = None, write_consistency: str | None = None, read_timeout: int | None = None, write_timeout: int | None = None, ra_dec_columns: list[str] | None = None, replication_factor: int | None = None, drop: bool = False) ApdbCassandraConfig

Initialize new APDB instance and make configuration object for it.

Parameters:
hostslist [str]

List of host names or IP addresses for Cassandra cluster.

keyspacestr

Name of the keyspace for APDB tables.

schema_filestr, optional

Location of (YAML) configuration file with APDB schema. If not specified then default location will be used.

schema_namestr, optional

Name of the schema in YAML configuration file. If not specified then default name will be used.

read_sources_monthsint, optional

Number of months of history to read from DiaSource.

read_forced_sources_monthsint, optional

Number of months of history to read from DiaForcedSource.

use_insert_idbool, optional

If True, make additional tables used for replication to PPDB.

use_insert_id_skips_diaobjectsbool, optional

If True then do not fill regular DiaObject table when use_insert_id is True.

portint, optional

Port number to use for Cassandra connections.

usernamestr, optional

User name for Cassandra connections.

prefixstr, optional

Optional prefix for all table names.

part_pixelizationstr, optional

Name of the MOC pixelization used for partitioning.

part_pix_levelint, optional

Pixelization level.

time_partition_tablesbool, optional

Create per-partition tables.

time_partition_startstr, optional

Starting time for per-partition tables, in yyyy-mm-ddThh:mm:ss format, in TAI.

time_partition_endstr, optional

Ending time for per-partition tables, in yyyy-mm-ddThh:mm:ss format, in TAI.

read_consistencystr, optional

Name of the consistency level for read operations.

write_consistencystr, optional

Name of the consistency level for write operations.

read_timeoutint, optional

Read timeout in seconds.

write_timeoutint, optional

Write timeout in seconds.

ra_dec_columnslist [str], optional

Names of ra/dec columns in DiaObject table.

replication_factorint, optional

Replication factor used when creating new keyspace, if keyspace already exists its replication factor is not changed.

dropbool, optional

If True then drop existing tables before re-creating the schema.

Returns:
configApdbCassandraConfig

Resulting configuration object for a created APDB instance.

classmethod makeField(doc: str) ConfigurableField

Make a ConfigurableField for Apdb.

Parameters:
docstr

Help text for the field.

Returns:
configurableFieldlsst.pex.config.ConfigurableField

A ConfigurableField for Apdb.

reassignDiaSources(idMap: Mapping[int, int]) None

Associate DiaSources with SSObjects, dis-associating them from DiaObjects.

Parameters:
idMapMapping

Maps DiaSource IDs to their new SSObject IDs.

Raises:
ValueError

Raised if DiaSource ID does not exist in the database.

store(visit_time: Time, objects: DataFrame, sources: DataFrame | None = None, forced_sources: DataFrame | None = None) None

Store all three types of catalogs in the database.

Parameters:
visit_timeastropy.time.Time

Time of the visit.

objectspandas.DataFrame

Catalog with DiaObject records.

sourcespandas.DataFrame, optional

Catalog with DiaSource records.

forced_sourcespandas.DataFrame, optional

Catalog with DiaForcedSource records.

Notes

This methods takes DataFrame catalogs, their schema must be compatible with the schema of APDB table:

  • column names must correspond to database table columns

  • types and units of the columns must match database definitions, no unit conversion is performed presently

  • columns that have default values in database schema can be omitted from catalog

  • this method knows how to fill interval-related columns of DiaObject (validityStart, validityEnd) they do not need to appear in a catalog

  • source catalogs have diaObjectId column associating sources with objects

storeSSObjects(objects: DataFrame) None

Store or update SSObject catalog.

Parameters:
objectspandas.DataFrame

Catalog with SSObject records.

Notes

If SSObjects with matching IDs already exist in the database, their records will be updated with the information from provided records.

tableDef(table: ApdbTables) Table | None

Return table schema definition for a given table.

Parameters:
tableApdbTables

One of the known APDB tables.

Returns:
tableSchemaschema_model.Table or None

Table schema description, None is returned if table is not defined by this implementation.