The Population object is accessed via `db.population`. It provides methods for querying all patients, sequences, and frequencies across the entire database, that is, the full study population.
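`db.population.patients()` returns every patient in the population. By default it returns a list of `PatientInstance` objects; `as_list=True` returns plain patient ID strings, and `as_pandas=True` returns a Pandas DataFrame.

```python
# Default: list of PatientInstance objects
patients = db.population.patients()
for p in patients[:3]:
    print(p.id)

# Simple list of patient ID strings
patient_ids = db.population.patients(as_list=True)

# Pandas DataFrame with both identifiers
df = db.population.patients(as_pandas=True, with_ids=True)
df.head()
```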
`db.population.sequences()` returns all pre-calculated transitive sequences from the `sequences` table for the full population. Observation code IDs are translated to their string codes.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `as_pandas` | bool | `False` | If `True`, returns a Pandas DataFrame. |
| `as_iterator` | bool | `False` | If `True`, returns a generator that yields one dictionary per row. Useful for large result sets, to avoid loading everything into memory. |
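The difference between the default list and `as_iterator=True` can be sketched with the standard-library `sqlite3` module. This is a minimal illustration, not the library's implementation: it assumes the sequences sit in a SQL table with the return columns documented for this method, and the function `fetch_sequences` and the in-memory toy table are hypothetical.

```python
import sqlite3
from typing import Iterator, List, Union


def fetch_sequences(conn: sqlite3.Connection,
                    as_iterator: bool = False) -> Union[List[dict], Iterator[dict]]:
    """Hypothetical stand-in for sequences(): a list of dicts by default,
    or a generator yielding one dict per row when as_iterator=True."""
    cursor = conn.execute(
        "SELECT patient_id, obs_code_1, obs_code_2, time_diff, occurrence_count "
        "FROM sequences"
    )
    columns = [d[0] for d in cursor.description]

    def rows() -> Iterator[dict]:
        # Rows are pulled from the cursor and turned into dicts one at a time,
        # only as the caller iterates.
        for row in cursor:
            yield dict(zip(columns, row))

    # list(...) materializes every row; returning the generator defers the work.
    return rows() if as_iterator else list(rows())


# Toy in-memory database standing in for the real sequences table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences (patient_id TEXT, obs_code_1 TEXT, obs_code_2 TEXT, "
    "time_diff INTEGER, occurrence_count INTEGER)"
)
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?, ?, ?, ?)",
    [("p1", "428.0", "metoprolol", 3, 1), ("p2", "250.00", "428.0", 10, 2)],
)

all_rows = fetch_sequences(conn)                     # whole result held in memory
for seq in fetch_sequences(conn, as_iterator=True):  # one row at a time
    print(seq["patient_id"], seq["obs_code_1"], "→", seq["obs_code_2"])
```

The generator variant keeps peak memory roughly constant regardless of result size, at the cost of holding the cursor open until iteration finishes.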
Return Columns
| Column | Type | Description |
|---|---|---|
| `patient_id` | str | The patient's string identifier. |
| `obs_code_1` | str | The first observation code in the sequence. |
| `obs_code_2` | str | The second observation code in the sequence. |
| `time_diff` | int | Temporal distance in days (or the bucket number, if buckets were used). |
| `occurrence_count` | int | Number of times this sequence was observed for this patient. |
Examples
```python
# List of dicts
seqs = db.population.sequences()

# Pandas DataFrame
seqs_df = db.population.sequences(as_pandas=True)
seqs_df.head()

# Memory-efficient iterator for large datasets
for seq in db.population.sequences(as_iterator=True):
    print(seq["patient_id"], seq["obs_code_1"], "→", seq["obs_code_2"])
```
`db.population.frequencies()` returns population-level frequency statistics from the pre-calculated `frequencies` table. Results can be filtered by observation code.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `observation1` | str or list[str] | `None` | Filter by `obs_code_1`. A single string or a list of strings. If `None`, all values are included. |
| `observation2` | str or list[str] | `None` | Filter by `obs_code_2`. A single string or a list of strings. If `None`, all values are included. |
| `as_pandas` | bool | `False` | If `True`, returns a Pandas DataFrame. |
| `as_iterator` | bool | `False` | If `True`, returns a generator yielding one dictionary per row. |
| `with_ids` | bool | `False` | If `True`, returns raw integer `obs_code_id` values instead of translated string codes. |
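The `with_ids` switch amounts to skipping a lookup step. The sketch below shows that step with a plain dictionary standing in for `lookup_observations`; the dictionary contents, the function `translate_codes`, and the sample row are hypothetical, and the library may well perform the translation in SQL (for example with a join) rather than in Python.

```python
from typing import Dict, List

# Hypothetical contents of lookup_observations: integer obs_code_id -> string code.
LOOKUP_OBSERVATIONS: Dict[int, str] = {1: "428.0", 2: "250.00", 3: "metoprolol"}


def translate_codes(rows: List[dict], with_ids: bool = False) -> List[dict]:
    """Sketch of the with_ids behavior: return raw integer IDs unchanged,
    or (the default) replace them with their string codes."""
    if with_ids:
        return rows
    return [
        {
            **row,
            "obs_code_1": LOOKUP_OBSERVATIONS[row["obs_code_1"]],
            "obs_code_2": LOOKUP_OBSERVATIONS[row["obs_code_2"]],
        }
        for row in rows
    ]


# One frequencies row as it might be stored, with integer IDs (hypothetical data).
raw_rows = [{"obs_code_1": 1, "obs_code_2": 3, "temporal_distance": 30,
             "observation_cnt": 5, "patient_cnt": 4}]

print(translate_codes(raw_rows)[0]["obs_code_1"])                 # 428.0
print(translate_codes(raw_rows, with_ids=True)[0]["obs_code_1"])  # 1
```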
Filter Logic
- Parameters are AND-ed: passing both `observation1` and `observation2` returns only rows matching both.
- When a parameter is a list, the values within it are OR-ed (implemented as SQL `IN`).
- All observation codes are validated upfront. A `KeyError` is raised immediately if any code does not exist in `lookup_observations`, and the error message names the invalid code(s).
- If the filters are valid but no rows match, an empty list or DataFrame is returned (not an error).
Return Columns
| Column | Type | Description |
|---|---|---|
| `obs_code_1` | str or int | First observation code (string by default; integer if `with_ids=True`). |
| `obs_code_2` | str or int | Second observation code (string by default; integer if `with_ids=True`). |
| `temporal_distance` | int | Days between the two observations (or the bucket number). |
| `observation_cnt` | int | Total number of times this sequence was observed across all patients. |
| `patient_cnt` | int | Number of distinct patients who have this sequence. |
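Reading the two return-column tables side by side suggests how the population statistics relate to the per-patient sequences: `observation_cnt` would be the sum of `occurrence_count`, and `patient_cnt` the number of distinct `patient_id` values, per code pair and distance. The source does not state this, so the sketch below is an assumption; `aggregate` and the sample rows are hypothetical, and it treats `time_diff` and `temporal_distance` as the same quantity.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str, int]

# Toy per-patient rows shaped like the sequences() output (hypothetical data).
sequences = [
    {"patient_id": "p1", "obs_code_1": "428.0", "obs_code_2": "metoprolol",
     "time_diff": 30, "occurrence_count": 2},
    {"patient_id": "p2", "obs_code_1": "428.0", "obs_code_2": "metoprolol",
     "time_diff": 30, "occurrence_count": 1},
    {"patient_id": "p2", "obs_code_1": "250.00", "obs_code_2": "428.0",
     "time_diff": 90, "occurrence_count": 1},
]


def aggregate(rows: List[dict]) -> List[dict]:
    """Assumed roll-up: sum occurrence_count and count distinct patients
    for each (obs_code_1, obs_code_2, distance) key."""
    obs_cnt: Dict[Key, int] = defaultdict(int)
    patients: Dict[Key, Set[str]] = defaultdict(set)
    for r in rows:
        key = (r["obs_code_1"], r["obs_code_2"], r["time_diff"])
        obs_cnt[key] += r["occurrence_count"]
        patients[key].add(r["patient_id"])
    return [
        {"obs_code_1": k[0], "obs_code_2": k[1], "temporal_distance": k[2],
         "observation_cnt": obs_cnt[k], "patient_cnt": len(patients[k])}
        for k in obs_cnt
    ]


freqs = aggregate(sequences)
for f in freqs:
    print(f["obs_code_1"], "→", f["obs_code_2"], ":", f["patient_cnt"], "patients")
```

Because the frequencies table is pre-calculated, a query against it avoids repeating this grouping over every patient's sequences at read time.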
Examples
```python
# All frequencies as a DataFrame
df = db.population.frequencies(as_pandas=True)

# Filter by a single obs_code_1
df = db.population.frequencies(observation1="428.0", as_pandas=True)

# Filter by multiple obs_code_1 values (OR logic within the list)
df = db.population.frequencies(observation1=["428.0", "250.00"], as_pandas=True)

# Filter by both obs_code_1 AND obs_code_2
df = db.population.frequencies(observation1="428.0", observation2="metoprolol", as_pandas=True)

# Memory-efficient iterator
for freq in db.population.frequencies(as_iterator=True):
    print(freq["obs_code_1"], "→", freq["obs_code_2"], ":", freq["patient_cnt"], "patients")

# Return raw integer IDs instead of string codes
df = db.population.frequencies(with_ids=True, as_pandas=True)

# KeyError example: an invalid code raises immediately
try:
    db.population.frequencies(observation1="INVALID_CODE")
except KeyError as e:
    print(e)  # "Observation code(s) not found in observation1: INVALID_CODE"
```