FastEHR.database.collector
Classes
SQLiteDataCollector: A class to interface with an SQLite database to collect and collate patient records.
Module Contents
- class FastEHR.database.collector.SQLiteDataCollector(db_path: str)
A class to interface with an SQLite database to collect and collate patient records.
This class provides functionality for extracting structured patient data, aggregating medical events, and computing metadata for pre-processing from an SQLite database.
- Inherits from:
Static - Handles static patient data, such as birth year and ethnicity.
Diagnoses - Handles diagnosis-related records.
Measurements - Handles event-based measurements, which may optionally include an associated value.
Attributes
- db_path : str
Path to the SQLite database file.
- connection : sqlite3.Connection
SQLite connection object, initialized when connect() is called.
- cursor : sqlite3.Cursor
Cursor for executing SQL queries.
Methods
- connect()
Establish the SQLite database connection.
- disconnect()
Close the SQLite database connection.
- _extract_distinct()
Extracts distinct values of a given column across multiple tables.
- _extract_AGG()
Performs grouped aggregations over tables.
- _t_digest_values()
Uses the t-digest algorithm to approximate percentiles of a given measurement. <https://github.com/tdunning/t-digest>
- _generate_lazy_by_distinct()
Generates Polars LazyFrames for distinct patient or practice identifiers.
- _collate_lazy_tables()
Merges static and dynamic patient records into a single LazyFrame.
- get_meta_information()
Collects metadata from the SQLite database, including distributions of diagnoses and measurements.
Initializes the SQLiteDataCollector.
Parameters
- db_path : str
Path to the SQLite database file.
- db_path
- connection = None
- cursor = None
- connect()
Establishes a connection to the SQLite database.
If the connection is already established, this method does nothing.
Raises
- sqlite3.Error
If an error occurs while connecting to the database.
- disconnect()
Closes the SQLite database connection.
This method ensures that both the connection and cursor are properly closed.
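The connect() and disconnect() behaviour described above can be illustrated with a minimal, self-contained sketch built on the standard sqlite3 module. The class name MiniCollector is hypothetical and is not part of FastEHR. Resetting connection and cursor to None after closing is an assumption for illustration, not something the documentation confirms.

```python
import sqlite3


class MiniCollector:
    """Hypothetical stand-in for illustration; not FastEHR's SQLiteDataCollector."""

    def __init__(self, db_path: str):
        self.db_path = db_path
        self.connection = None
        self.cursor = None

    def connect(self) -> None:
        # If a connection already exists, do nothing (idempotent connect).
        if self.connection is not None:
            return
        # sqlite3.connect raises a subclass of sqlite3.Error on failure.
        self.connection = sqlite3.connect(self.db_path)
        self.cursor = self.connection.cursor()

    def disconnect(self) -> None:
        # Close the cursor first, then the connection.
        # Assumption: both attributes are reset to None afterwards.
        if self.cursor is not None:
            self.cursor.close()
            self.cursor = None
        if self.connection is not None:
            self.connection.close()
            self.connection = None


collector = MiniCollector(":memory:")
collector.connect()
first = collector.connection
collector.connect()  # second call leaves the existing connection in place
same_connection = collector.connection is first
collector.disconnect()
```

Making connect() a no-op when a connection is already open means callers can invoke it defensively before each query without leaking handles.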
- get_meta_information(practice_ids: list | None = None, static: bool = True, diagnoses: bool = True, measurement: bool = True) -> dict
Collects metadata from the SQLite database, such as distributions of diagnoses and measurements.
Parameters
- practice_ids : list, optional
List of practice IDs to filter metadata collection (default is None).
- static : bool, optional
Whether to collect static patient information (default is True).
- diagnoses : bool, optional
Whether to collect diagnosis-related metadata (default is True).
- measurement : bool, optional
Whether to collect measurement-related metadata (default is True).
Returns
- dict
A dictionary containing metadata tables.
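To make the idea of flag-controlled metadata collection concrete, the following sketch computes small grouped aggregations with the standard sqlite3 module and returns them in a dictionary. Everything here is an assumption for illustration: the function collect_meta, the table and column names, and the dictionary keys are hypothetical and may differ from what get_meta_information() actually uses or returns.

```python
import sqlite3


def collect_meta(conn, practice_ids=None, diagnoses=True, measurement=True):
    """Hypothetical sketch: return a dict of small metadata tables."""
    where, params = "", []
    if practice_ids:
        # Restrict every query to the requested practices.
        placeholders = ",".join("?" for _ in practice_ids)
        where = f" WHERE practice_id IN ({placeholders})"
        params = list(practice_ids)

    meta = {}
    if diagnoses:
        # Count how often each diagnosis event occurs.
        rows = conn.execute(
            f"SELECT event, COUNT(*) FROM diagnoses{where} GROUP BY event",
            params,
        ).fetchall()
        meta["diagnoses"] = dict(rows)
    if measurement:
        # Count and average the recorded value of each measurement event.
        rows = conn.execute(
            f"SELECT event, COUNT(*), AVG(value) FROM measurements{where} "
            "GROUP BY event",
            params,
        ).fetchall()
        meta["measurement"] = {
            event: {"count": n, "mean": mean} for event, n, mean in rows
        }
    return meta


# Hypothetical schema and toy rows, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE diagnoses (practice_id TEXT, patient_id INTEGER, event TEXT);
    CREATE TABLE measurements (
        practice_id TEXT, patient_id INTEGER, event TEXT, value REAL
    );
    INSERT INTO diagnoses VALUES
        ('p1', 1, 'ASTHMA'), ('p1', 2, 'ASTHMA'), ('p2', 3, 'T2DM');
    INSERT INTO measurements VALUES
        ('p1', 1, 'BMI', 24.0), ('p1', 2, 'BMI', 30.0), ('p2', 3, 'BMI', 27.0);
    """
)

meta_all = collect_meta(conn)
meta_p1 = collect_meta(conn, practice_ids=["p1"], measurement=False)
conn.close()
```

Disabling a flag simply omits the corresponding key in this sketch, which keeps the returned dictionary limited to what was requested.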