ApdbCassandra
- class lsst.dax.apdb.ApdbCassandra(config: ApdbCassandraConfig)
Bases: Apdb
Implementation of APDB database on top of Apache Cassandra.
The implementation is configured via the standard pex_config mechanism using the ApdbCassandraConfig configuration class. For examples of different configurations, check the config/ folder.
- Parameters:
- config
ApdbCassandraConfig
Configuration object.
Attributes Summary
metadata
Object controlling access to APDB metadata (ApdbMetadata).
metadataCodeVersionKey
Name of the metadata key to store code version number.
metadataConfigKey
Name of the metadata key to store the configuration.
metadataReplicaVersionKey
Name of the metadata key to store replica code version number.
metadataSchemaVersionKey
Name of the metadata key to store schema version number.
partition_zero_epoch
Start time for partition 0, this should never be changed.
Methods Summary
apdbImplementationVersion()
Return version number for current APDB implementation.
apdbSchemaVersion()
Return schema version number as defined in config file.
containsVisitDetector(visit, detector)
Test whether data for a given visit-detector is present in the APDB.
countUnassociatedObjects()
Return the number of DiaObjects that have only one DiaSource associated with them.
dailyJob()
Implement daily activities like cleanup/vacuum.
from_config(config)
Create Apdb instance from configuration object.
from_uri(uri)
Make Apdb instance from a serialized configuration.
getDiaForcedSources(region, object_ids, ...)
Return catalog of DiaForcedSource instances from a given region.
getDiaObjects(region)
Return catalog of DiaObject instances from a given region.
getDiaSources(region, object_ids, visit_time)
Return catalog of DiaSource instances from a given region.
getSSObjects()
Return catalog of SSObject instances.
get_replica()
Return ApdbReplica instance for this database.
init_database(hosts, keyspace, *[, ...])
Initialize new APDB instance and make configuration object for it.
makeField(doc)
Make a ConfigurableField for Apdb.
reassignDiaSources(idMap)
Associate DiaSources with SSObjects, dis-associating them from DiaObjects.
store(visit_time, objects[, sources, ...])
Store all three types of catalogs in the database.
storeSSObjects(objects)
Store or update SSObject catalog.
tableDef(table)
Return table schema definition for a given table.
Attributes Documentation
- metadata
Object controlling access to APDB metadata (ApdbMetadata).
- metadataCodeVersionKey = 'version:ApdbCassandra'
Name of the metadata key to store code version number.
- metadataConfigKey = 'config:apdb-cassandra.json'
Name of the metadata key to store the configuration.
- metadataReplicaVersionKey = 'version:ApdbCassandraReplica'
Name of the metadata key to store replica code version number.
- metadataSchemaVersionKey = 'version:schema'
Name of the metadata key to store schema version number.
- partition_zero_epoch = <Time object: scale='tai' format='unix_tai' value=0.0>
Start time for partition 0; this should never be changed.
Methods Documentation
- classmethod apdbImplementationVersion() -> VersionTuple
Return version number for current APDB implementation.
- Returns:
- version
VersionTuple
Version of the code defined in implementation class.
- apdbSchemaVersion() -> VersionTuple
Return schema version number as defined in config file.
- Returns:
- version
VersionTuple
Version of the schema defined in schema config file.
- containsVisitDetector(visit: int, detector: int) -> bool
Test whether data for a given visit-detector is present in the APDB.
- countUnassociatedObjects() -> int
Return the number of DiaObjects that have only one DiaSource associated with them.
Used as part of ap_verify metrics.
- Returns:
- count
int
Number of DiaObjects with exactly one associated DiaSource.
Notes
This method can be very inefficient or slow in some implementations.
- dailyJob() -> None
Implement daily activities like cleanup/vacuum.
What should be done during daily activities is determined by specific implementation.
- classmethod from_config(config: ApdbConfig) -> Apdb
Create Apdb instance from configuration object.
- Parameters:
- config
ApdbConfig
Configuration object; the type of this object determines the type of the Apdb implementation.
- Returns:
- apdb
Apdb
Instance of Apdb class.
- classmethod from_uri(uri: str | ParseResult | ResourcePath | Path) -> Apdb
Make Apdb instance from a serialized configuration.
- Parameters:
- uri
ResourcePathExpression
URI or local file path pointing to a file with serialized configuration, or a string with a “label:” prefix. In the latter case, the configuration will be looked up from an APDB index file using the label name that follows the prefix. The APDB index file’s location is determined by the DAX_APDB_INDEX_URI environment variable.
- Returns:
- apdb
Apdb
Instance of Apdb class, the type of the returned instance is determined by configuration.
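The “label:” convention described for uri can be pictured with a small sketch. This is not the dax_apdb implementation: resolve_apdb_uri is a hypothetical helper, and the in-memory index dictionary is a stand-in for the parsed index file that DAX_APDB_INDEX_URI points to, whose format is not described on this page. The label and path are made up.

```python
LABEL_PREFIX = "label:"


def resolve_apdb_uri(uri, index):
    """Return the configuration location for a URI or a "label:" string.

    Hypothetical helper: ``index`` maps label names to configuration
    locations and stands in for the APDB index file.
    """
    if uri.startswith(LABEL_PREFIX):
        label = uri[len(LABEL_PREFIX):]
        try:
            return index[label]
        except KeyError:
            raise ValueError(f"Unknown APDB label: {label!r}") from None
    # Anything without the prefix is used as a URI or local path as-is.
    return uri


# Illustrative index contents only.
index = {"dev": "/configs/apdb-dev.yaml"}
print(resolve_apdb_uri("label:dev", index))
print(resolve_apdb_uri("/tmp/apdb.yaml", index))
```

In real code the resolved location would simply be what from_uri reads; callers pass the “label:” string directly and never perform this lookup themselves.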
- getDiaForcedSources(region: Region, object_ids: Iterable[int] | None, visit_time: Time) -> DataFrame | None
Return catalog of DiaForcedSource instances from a given region.
- Parameters:
- region
lsst.sphgeom.Region
Region to search for DIASources.
- object_ids
iterable [int], optional
List of DiaObject IDs to further constrain the set of returned sources. If list is empty then empty catalog is returned with a correct schema. If None then returned sources are not constrained. Some implementations may not support latter case.
- visit_time
astropy.time.Time
Time of the current visit.
- Returns:
- catalog
pandas.DataFrame, or None
Catalog containing DiaForcedSource records. None is returned if read_forced_sources_months configuration parameter is set to 0.
- Raises:
- NotImplementedError
May be raised by some implementations if object_ids is None.
Notes
This method returns DiaForcedSource catalog for a region with additional filtering based on DiaObject IDs. Only a subset of DiaForcedSource history is returned, limited by the read_forced_sources_months config parameter, relative to visit_time. If object_ids is empty then an empty catalog is always returned with the correct schema (columns/types). If object_ids is None then no filtering is performed and some of the returned records may be outside the specified region.
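The history limit mentioned in the Notes can be thought of as a time window that ends at visit_time. The sketch below is only an illustration: history_window is a hypothetical helper, the 30-day month length is an assumption (this page does not say how months are converted to a time interval), and plain datetime objects are used where the real API takes astropy.time.Time.

```python
from datetime import datetime, timedelta

DAYS_PER_MONTH = 30  # assumption, not taken from this page


def history_window(visit_time, months):
    """Return (start, end) of the history window, or None if nothing is read.

    Hypothetical helper mirroring the documented behaviour that a value
    of 0 months means no catalog is returned at all.
    """
    if months == 0:
        return None
    start = visit_time - timedelta(days=months * DAYS_PER_MONTH)
    return start, visit_time


window = history_window(datetime(2025, 1, 31, 0, 0, 0), months=1)
print(window)
print(history_window(datetime(2025, 1, 31, 0, 0, 0), months=0))
```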
- getDiaObjects(region: Region) -> DataFrame
Return catalog of DiaObject instances from a given region.
This method returns only the last version of each DiaObject. Some records in the returned catalog may be outside the specified region; it is up to the client to ignore those records or clean up the catalog before further use.
- Parameters:
- region
lsst.sphgeom.Region
Region to search for DIAObjects.
- Returns:
- catalog
pandas.DataFrame
Catalog containing DiaObject records for a region that may be a superset of the specified region.
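Because the returned catalog may cover a superset of the requested region, a client usually applies its own cut afterwards. The following sketch shows one way to do that for a circular region using only the standard library. It is a stand-in, not the dax_apdb API: records are plain dictionaries rather than a pandas.DataFrame, the column names "ra" and "dec" (in degrees) are assumptions, and keep_inside_circle is a hypothetical helper. Real code would more likely test positions against the lsst.sphgeom.Region that was passed in.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def keep_inside_circle(records, center_ra, center_dec, radius_deg):
    """Drop records that fall outside a circle on the sky."""
    return [
        rec
        for rec in records
        if angular_separation_deg(rec["ra"], rec["dec"], center_ra, center_dec)
        <= radius_deg
    ]


# Made-up records; the third one lies about 2 degrees from the center.
records = [
    {"diaObjectId": 1, "ra": 10.0, "dec": -5.0},
    {"diaObjectId": 2, "ra": 10.5, "dec": -5.0},
    {"diaObjectId": 3, "ra": 12.0, "dec": -5.0},
]
inside = keep_inside_circle(records, center_ra=10.0, center_dec=-5.0, radius_deg=1.0)
print([rec["diaObjectId"] for rec in inside])  # prints [1, 2]
```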
- getDiaSources(region: Region, object_ids: Iterable[int] | None, visit_time: Time) -> DataFrame | None
Return catalog of DiaSource instances from a given region.
- Parameters:
- region
lsst.sphgeom.Region
Region to search for DIASources.
- object_ids
iterable [int], optional
List of DiaObject IDs to further constrain the set of returned sources. If None then returned sources are not constrained. If list is empty then empty catalog is returned with a correct schema.
- visit_time
astropy.time.Time
Time of the current visit.
- Returns:
- catalog
pandas.DataFrame, or None
Catalog containing DiaSource records. None is returned if read_sources_months configuration parameter is set to 0.
Notes
This method returns DiaSource catalog for a region with additional filtering based on DiaObject IDs. Only a subset of DiaSource history is returned, limited by the read_sources_months config parameter, relative to visit_time. If object_ids is empty then an empty catalog is always returned with the correct schema (columns/types). If object_ids is None then no filtering is performed and some of the returned records may be outside the specified region.
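The three cases for object_ids (None, empty, non-empty) are easy to confuse, so here is a minimal sketch of the documented semantics. It is not the library code: filter_by_object_ids is a hypothetical helper, and a list of dictionaries stands in for the pandas.DataFrame that the real method returns.

```python
def filter_by_object_ids(records, object_ids):
    """Apply the documented object_ids semantics to a list of records.

    None      -> no filtering by DiaObject ID.
    empty     -> empty result (the real API still returns the full schema).
    otherwise -> only records whose diaObjectId is in object_ids.
    """
    if object_ids is None:
        return list(records)
    wanted = set(object_ids)
    return [rec for rec in records if rec["diaObjectId"] in wanted]


# Made-up source records.
sources = [
    {"diaSourceId": 100, "diaObjectId": 1},
    {"diaSourceId": 101, "diaObjectId": 2},
    {"diaSourceId": 102, "diaObjectId": 1},
]
print(len(filter_by_object_ids(sources, None)))
print(len(filter_by_object_ids(sources, [])))
print([rec["diaSourceId"] for rec in filter_by_object_ids(sources, [1])])
```

Note that passing an empty collection and passing None are opposites: the first yields nothing, the second yields everything in the searched area.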
- getSSObjects() -> DataFrame
Return catalog of SSObject instances.
- Returns:
- catalog
pandas.DataFrame
Catalog containing SSObject records; all existing records are returned.
- get_replica() -> ApdbCassandraReplica
Return ApdbReplica instance for this database.
- classmethod init_database(hosts: list[str], keyspace: str, *, schema_file: str | None = None, schema_name: str | None = None, read_sources_months: int | None = None, read_forced_sources_months: int | None = None, use_insert_id: bool = False, replica_skips_diaobjects: bool = False, port: int | None = None, username: str | None = None, prefix: str | None = None, part_pixelization: str | None = None, part_pix_level: int | None = None, time_partition_tables: bool = True, time_partition_start: str | None = None, time_partition_end: str | None = None, read_consistency: str | None = None, write_consistency: str | None = None, read_timeout: int | None = None, write_timeout: int | None = None, ra_dec_columns: list[str] | None = None, replication_factor: int | None = None, drop: bool = False) -> ApdbCassandraConfig
Initialize new APDB instance and make configuration object for it.
- Parameters:
- hosts
list [str]
List of host names or IP addresses for Cassandra cluster.
- keyspace
str
Name of the keyspace for APDB tables.
- schema_file
str, optional
Location of (YAML) configuration file with APDB schema. If not specified then default location will be used.
- schema_name
str, optional
Name of the schema in YAML configuration file. If not specified then default name will be used.
- read_sources_months
int, optional
Number of months of history to read from DiaSource.
- read_forced_sources_months
int, optional
Number of months of history to read from DiaForcedSource.
- use_insert_id
bool, optional
If True, make additional tables used for replication to PPDB.
- replica_skips_diaobjects
bool, optional
If True then do not fill regular DiaObject table when use_insert_id is True.
- port
int, optional
Port number to use for Cassandra connections.
- username
str, optional
User name for Cassandra connections.
- prefix
str, optional
Optional prefix for all table names.
- part_pixelization
str, optional
Name of the MOC pixelization used for partitioning.
- part_pix_level
int, optional
Pixelization level.
- time_partition_tables
bool, optional
Create per-partition tables.
- time_partition_start
str, optional
Starting time for per-partition tables, in yyyy-mm-ddThh:mm:ss format, in TAI.
- time_partition_end
str, optional
Ending time for per-partition tables, in yyyy-mm-ddThh:mm:ss format, in TAI.
- read_consistency
str, optional
Name of the consistency level for read operations.
- write_consistency
str, optional
Name of the consistency level for write operations.
- read_timeout
int, optional
Read timeout in seconds.
- write_timeout
int, optional
Write timeout in seconds.
- ra_dec_columns
list [str], optional
Names of ra/dec columns in DiaObject table.
- replication_factor
int, optional
Replication factor used when creating new keyspace, if keyspace already exists its replication factor is not changed.
- drop
bool, optional
If True then drop existing tables before re-creating the schema.
- Returns:
- config
ApdbCassandraConfig
Resulting configuration object for a created APDB instance.
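The time_partition_start and time_partition_end values are plain strings in yyyy-mm-ddThh:mm:ss format, so it can be worth checking them before calling init_database. The sketch below does this with the standard library. Both helpers are hypothetical and are not part of dax_apdb; the requirement that the end lies after the start is an assumed sanity check, not something this page states. datetime has no notion of TAI, so the parsed values are treated as naive timestamps.

```python
from datetime import datetime

# strptime pattern matching the documented yyyy-mm-ddThh:mm:ss layout.
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_partition_time(value):
    """Parse a partition boundary string; raises ValueError on bad format."""
    return datetime.strptime(value, TIME_FORMAT)


def check_partition_range(start, end):
    """Parse both boundaries and check their order (assumed requirement)."""
    t_start = parse_partition_time(start)
    t_end = parse_partition_time(end)
    if t_end <= t_start:
        raise ValueError("time_partition_end must be later than time_partition_start")
    return t_start, t_end


# Example values are made up.
print(check_partition_range("2025-01-01T00:00:00", "2026-01-01T00:00:00"))
```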
- classmethod makeField(doc: str) -> ConfigurableField
Make a ConfigurableField for Apdb.
- Parameters:
- doc
str
Help text for the field.
- Returns:
- configurableField
lsst.pex.config.ConfigurableField
A ConfigurableField for Apdb.
- reassignDiaSources(idMap: Mapping[int, int]) -> None
Associate DiaSources with SSObjects, dis-associating them from DiaObjects.
- Parameters:
- idMap
Mapping
Maps DiaSource IDs to their new SSObject IDs.
- Raises:
- ValueError
Raised if DiaSource ID does not exist in the database.
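To make the idMap contract concrete, here is a toy in-memory model of the operation. It is a sketch under assumptions, not the Cassandra implementation: reassign_dia_sources is hypothetical, sources are kept in a dictionary keyed by DiaSource ID, and dis-association is modelled by setting diaObjectId to None while filling an ssObjectId column. Those column names and the None convention are assumptions.

```python
def reassign_dia_sources(sources, id_map):
    """Point DiaSources at SSObjects and detach them from DiaObjects.

    ``sources`` maps DiaSource ID to a record dict; ``id_map`` maps
    DiaSource ID to the new SSObject ID.
    """
    # Validate every ID first so that a bad ID leaves the data untouched.
    missing = sorted(set(id_map) - set(sources))
    if missing:
        raise ValueError(f"DiaSource IDs not found: {missing}")
    for source_id, ss_object_id in id_map.items():
        record = sources[source_id]
        record["ssObjectId"] = ss_object_id
        record["diaObjectId"] = None


# Made-up records.
sources = {
    100: {"diaObjectId": 1, "ssObjectId": None},
    101: {"diaObjectId": 2, "ssObjectId": None},
}
reassign_dia_sources(sources, {100: 5000})
print(sources[100])
```

Checking all IDs before modifying anything is a design choice of this sketch; whether the real method behaves atomically is not stated on this page.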
- store(visit_time: Time, objects: DataFrame, sources: DataFrame | None = None, forced_sources: DataFrame | None = None) -> None
Store all three types of catalogs in the database.
- Parameters:
- visit_time
astropy.time.Time
Time of the visit.
- objects
pandas.DataFrame
Catalog with DiaObject records.
- sources
pandas.DataFrame, optional
Catalog with DiaSource records.
- forced_sources
pandas.DataFrame, optional
Catalog with DiaForcedSource records.
Notes
This method takes DataFrame catalogs; their schema must be compatible with the schema of APDB tables:
- column names must correspond to database table columns
- types and units of the columns must match database definitions; no unit conversion is performed presently
- columns that have default values in database schema can be omitted from catalog
- this method knows how to fill interval-related columns of DiaObject (validityStart, validityEnd); they do not need to appear in a catalog
- source catalogs have diaObjectId column associating sources with objects
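The column-name rules above lend themselves to a quick pre-flight check before calling store. The sketch below is a hypothetical helper, not part of dax_apdb: check_catalog_columns works on plain column-name lists (for a DataFrame one would pass its column names), and the table definition shown is made up for illustration. It does not check types or units.

```python
def check_catalog_columns(catalog_columns, table_columns, omittable_columns):
    """Compare catalog column names with a table definition.

    Returns (unknown, missing): catalog columns that the table does not
    have, and table columns that are absent from the catalog and are not
    allowed to be omitted (no default, not filled automatically).
    """
    catalog = set(catalog_columns)
    table = set(table_columns)
    unknown = sorted(catalog - table)
    missing = sorted(table - catalog - set(omittable_columns))
    return unknown, missing


# Illustrative table definition; column names other than diaObjectId,
# validityStart and validityEnd are assumptions.
table_columns = ["diaObjectId", "ra", "dec", "validityStart", "validityEnd"]
omittable = {"validityStart", "validityEnd"}

print(check_catalog_columns(["diaObjectId", "ra", "dec"], table_columns, omittable))
print(check_catalog_columns(["diaObjectId", "ra", "dec_deg"], table_columns, omittable))
```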
- storeSSObjects(objects: DataFrame) -> None
Store or update SSObject catalog.
- Parameters:
- objects
pandas.DataFrame
Catalog with SSObject records.
Notes
If SSObjects with matching IDs already exist in the database, their records will be updated with the information from provided records.
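The store-or-update behaviour described in the Notes is an upsert keyed on the SSObject ID. A minimal in-memory sketch follows. It is not the Cassandra code: store_ss_objects is hypothetical, the key column name ssObjectId is an assumption, and merging provided columns over the existing record (rather than replacing the whole record) is also an assumption, since the Notes only say that records are updated.

```python
def store_ss_objects(table, objects):
    """Insert new SSObject records or update existing ones in ``table``.

    ``table`` maps SSObject ID to a record dict and is modified in place.
    """
    for record in objects:
        key = record["ssObjectId"]
        # Columns in the new record win; other existing columns are kept.
        table[key] = {**table.get(key, {}), **record}


# Made-up records and column names.
table = {1: {"ssObjectId": 1, "numObs": 3, "flags": 0}}
store_ss_objects(table, [{"ssObjectId": 1, "numObs": 4}, {"ssObjectId": 2, "numObs": 1}])
print(table)
```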
- tableDef(table: ApdbTables) -> Table | None
Return table schema definition for a given table.
- Parameters:
- table
ApdbTables
One of the known APDB tables.
- Returns:
- tableSchema
schema_model.Table or None
Table schema description, None is returned if table is not defined by this implementation.