FastEHR.database.collector

Classes

SQLiteDataCollector

A class to interface with an SQLite database to collect and collate patient records.

Module Contents

class FastEHR.database.collector.SQLiteDataCollector(db_path: str)

A class to interface with an SQLite database to collect and collate patient records.

This class provides functionality for extracting structured patient data, aggregating medical events, and computing metadata for pre-processing from an SQLite database.

Inherits from:
  • Static - Handles static patient data, such as birth year and ethnicity.

  • Diagnoses - Handles diagnosis-related records.

  • Measurements - Handles event-based measurements, which may optionally include an associated value.

Attributes

db_path : str

Path to the SQLite database file.

connection : sqlite3.Connection

SQLite connection object, initialized when connect() is called.

cursor : sqlite3.Cursor

Cursor for executing SQL queries.

Methods

connect()

Establish the SQLite database connection.

disconnect()

Close the SQLite database connection.

_extract_distinct()

Extracts distinct values of a given column across multiple tables.

_extract_AGG()

Performs grouped aggregations over tables.

_t_digest_values()

Uses the t-digest algorithm to approximate percentiles of a given measurement. <https://github.com/tdunning/t-digest>

_generate_lazy_by_distinct()

Generates Polars LazyFrames for distinct patient or practice identifiers.

_collate_lazy_tables()

Merges static and dynamic patient records into a single LazyFrame.

get_meta_information()

Collects metadata from the SQLite database, including distributions of diagnoses and measurements.
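The distinct-value and grouped-aggregation helpers listed above are private, and their signatures are not documented here. As a rough illustration of the two query shapes they describe, the following sketch uses only the standard-library sqlite3 module. The table names (diagnosis_demo, measurement_demo) and their columns are hypothetical and are not the FastEHR schema.

```python
import sqlite3

# In-memory database with two hypothetical tables (not the FastEHR schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE diagnosis_demo (patient_id INTEGER, event TEXT);
    CREATE TABLE measurement_demo (patient_id INTEGER, event TEXT, value REAL);
    INSERT INTO diagnosis_demo VALUES (1, 'asthma'), (2, 'asthma');
    INSERT INTO measurement_demo VALUES
        (1, 'bmi', 22.0), (1, 'bmi', 24.0), (3, 'sbp', 120.0);
    """
)

# Distinct values of one column across several tables:
# UNION (without ALL) removes duplicates across both SELECTs.
distinct_ids = [
    row[0]
    for row in conn.execute(
        "SELECT patient_id FROM diagnosis_demo "
        "UNION SELECT patient_id FROM measurement_demo "
        "ORDER BY patient_id"
    )
]

# Grouped aggregation: one row per event with its count and mean value.
agg = conn.execute(
    "SELECT event, COUNT(*), AVG(value) "
    "FROM measurement_demo GROUP BY event ORDER BY event"
).fetchall()

conn.close()
```

Here distinct_ids holds each patient identifier once even though patient 1 appears in both tables, and agg holds one tuple per event.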

Initializes the SQLiteDataCollector.

Parameters

db_path : str

Path to the SQLite database file.

db_path
connection = None
cursor = None
connect()

Establishes a connection to the SQLite database.

If the connection is already established, this method does nothing.

Raises

sqlite3.Error

If an error occurs while connecting to the database.
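The behaviour described for connect() (a no-op when already connected, sqlite3.Error on failure) can be sketched with the standard library alone. ConnectionSketch below is a hypothetical stand-in written for illustration; it is not the FastEHR implementation, and only the attribute names db_path, connection, and cursor are taken from this page.

```python
import sqlite3


class ConnectionSketch:
    """Hypothetical stand-in for illustration; not the FastEHR class."""

    def __init__(self, db_path: str):
        self.db_path = db_path
        self.connection = None
        self.cursor = None

    def connect(self) -> None:
        # Do nothing if a connection already exists (idempotent call).
        if self.connection is not None:
            return
        # sqlite3.connect raises a subclass of sqlite3.Error on failure.
        self.connection = sqlite3.connect(self.db_path)
        self.cursor = self.connection.cursor()


sketch = ConnectionSketch(":memory:")
sketch.connect()
first = sketch.connection
sketch.connect()  # second call leaves the existing connection in place
same_connection = sketch.connection is first

# A file inside a directory that does not exist cannot be opened.
bad = ConnectionSketch("/no/such/directory/example.db")
try:
    bad.connect()
    failed = False
except sqlite3.Error:
    failed = True
```

Catching sqlite3.Error rather than a narrower subclass matches the exception type named in the Raises section.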

disconnect()

Closes the SQLite database connection.

This method ensures that both the connection and cursor are properly closed.
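A minimal sketch of what closing both objects can look like with the standard-library sqlite3 module follows. The free function disconnect_sketch and the SimpleNamespace holder are hypothetical, and resetting the attributes to None afterwards is an assumption; the source only states that both objects are closed.

```python
import sqlite3
from types import SimpleNamespace


def disconnect_sketch(holder) -> None:
    """Close the cursor first, then the connection (hypothetical helper)."""
    if holder.cursor is not None:
        holder.cursor.close()
        holder.cursor = None  # assumption: attribute is reset after closing
    if holder.connection is not None:
        holder.connection.close()
        holder.connection = None  # assumption: attribute is reset after closing


# Stand-in object carrying the two documented attributes.
holder = SimpleNamespace(connection=sqlite3.connect(":memory:"), cursor=None)
holder.cursor = holder.connection.cursor()
conn = holder.connection  # keep a reference to inspect after closing

disconnect_sketch(holder)
disconnect_sketch(holder)  # a second call is harmless: nothing is left to close
```

Guarding each close with a None check keeps repeated calls safe, which is convenient when the call sits in a finally block.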

get_meta_information(practice_ids: list | None = None, static: bool = True, diagnoses: bool = True, measurement: bool = True) -> dict

Collects metadata from the SQLite database, such as distributions of diagnoses and measurements.

Parameters

practice_ids : list, optional

List of practice IDs to filter metadata collection (default is None).

static : bool, optional

Whether to collect static patient information (default is True).

diagnoses : bool, optional

Whether to collect diagnosis-related metadata (default is True).

measurement : bool, optional

Whether to collect measurement-related metadata (default is True).

Whether to collect measurement-related metadata (default is True).

Returns

dict

A dictionary containing metadata tables.
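One way to call get_meta_information() is to wrap it between connect() and disconnect() so the connection is released even if collection fails. The helper collect_meta below is hypothetical and relies only on the three method names and keyword arguments documented on this page. StubCollector exists solely to make the sketch runnable without a database: its return value is a placeholder rather than the real metadata structure, and the practice ID "p1" is an invented example value.

```python
def collect_meta(collector, practice_ids=None):
    """Hypothetical helper: connect, collect metadata, always disconnect."""
    collector.connect()
    try:
        return collector.get_meta_information(
            practice_ids=practice_ids,
            static=True,
            diagnoses=True,
            measurement=False,  # skip measurement metadata in this example
        )
    finally:
        collector.disconnect()


class StubCollector:
    """Records calls; stands in for SQLiteDataCollector in this sketch."""

    def __init__(self):
        self.calls = []

    def connect(self):
        self.calls.append("connect")

    def get_meta_information(
        self, practice_ids=None, static=True, diagnoses=True, measurement=True
    ):
        self.calls.append("get_meta_information")
        # Placeholder result echoing the arguments; not the real metadata layout.
        return {"practice_ids": practice_ids, "measurement": measurement}

    def disconnect(self):
        self.calls.append("disconnect")


stub = StubCollector()
meta = collect_meta(stub, practice_ids=["p1"])
```

With the real class, the stub would be replaced by SQLiteDataCollector(db_path), where db_path points to an existing database file.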