The Problem
By the Numbers
Clinical documentation is the single largest contributor to physician burnout in the United States. According to the American Medical Association, physicians spend an average of 2 hours and 6 minutes on clinical documentation for every hour of direct patient care. In emergency departments, this ratio is even worse — ER physicians spend up to 44% of their shift on documentation rather than treating patients. The Medscape National Physician Burnout & Suicide Report consistently identifies paperwork and administrative tasks as the leading cause of burnout, with 70% of burned-out physicians citing documentation burden as a primary factor.
The consequences extend far beyond physician well-being. Communication failures during patient handoffs — when one physician transfers care to another at shift change — account for 80% of serious medical errors according to The Joint Commission. These handoff errors lead to adverse drug events, delayed diagnoses, and preventable deaths. A single patient in a hospital system may accumulate hundreds of pages of clinical records across multiple providers, departments, and facilities. When a new physician inherits that patient at 3 AM, they need a concise, accurate summary — not a 200-page chart to read through.
The root cause is that clinical notes are unstructured free text. A physician dictating a progress note might write: "Pt is a 67 y/o M c/o SOB x 3 days, hx of CHF, COPD, on metoprolol 50mg BID, lisinopril 10mg daily, albuterol PRN. Labs: BNP 890, Cr 1.8, K+ 5.1." This single sentence contains a diagnosis (CHF, COPD), medications with dosages, laboratory values, and symptom history — all encoded in medical abbreviations and shorthand that varies between physicians, specialties, and institutions. Extracting structured data from this free text is exactly the type of task where LLMs excel.
Clinical Note Formats
Clinical documentation comes in several standard formats, each serving a different purpose in patient care. Understanding these formats is essential for building an effective summarization pipeline:
Emergency Department Notes follow a pattern of chief complaint, history of present illness (HPI), review of systems (ROS), physical exam findings, assessment, and plan. They are typically detailed and time-pressured, written during or immediately after the encounter:
-- Sample ER Note (synthetic, not real patient data) --
CHIEF COMPLAINT: Chest pain, shortness of breath
HPI: 67 y/o male presents to ED via EMS with acute onset
substernal chest pain radiating to left arm x 2 hours.
Associated with diaphoresis, nausea, and SOB. Pain rated
8/10, not relieved by rest. Hx of HTN, DM2, hyperlipidemia.
Takes metformin 1000mg BID, atorvastatin 40mg daily,
lisinopril 20mg daily. Allergies: PCN (rash), sulfa (hives).
VITALS: BP 158/92, HR 104, RR 22, SpO2 94% on RA, Temp 98.6F
EXAM: Alert, diaphoretic, moderate distress. Lungs: bilateral
crackles at bases. Heart: tachycardic, regular rhythm,
no murmurs. Abdomen: soft, non-tender.
LABS: Troponin I 2.4 ng/mL (H), BNP 1240 pg/mL (H),
Cr 1.6 mg/dL (H), K+ 4.8, Glucose 234 mg/dL (H),
WBC 11.2, Hgb 13.1, Plt 198
ECG: ST elevation leads II, III, aVF. Sinus tachycardia.
ASSESSMENT: STEMI — inferior wall. Acute decompensated CHF.
PLAN: Cardiology consult for emergent cardiac cath. Heparin
drip initiated. Aspirin 325mg, clopidogrel 600mg load.
IV furosemide 40mg. Admit to CCU.
Discharge Summaries provide a comprehensive overview of a hospital stay, including the reason for admission, hospital course, procedures performed, discharge medications, and follow-up instructions. They are the primary document for care continuity:
-- Sample Discharge Summary (synthetic data) --
ADMISSION DATE: 2025-01-15 DISCHARGE DATE: 2025-01-20
ADMITTING DIAGNOSIS: Acute inferior STEMI (ICD-10: I21.19)
DISCHARGE DIAGNOSES:
1. Acute inferior STEMI, s/p PCI with DES to RCA (I21.19)
2. Acute decompensated heart failure (I50.21)
3. Type 2 diabetes mellitus, uncontrolled (E11.65)
4. Hypertension (I10)
5. Acute kidney injury, stage 1, resolved (N17.9)
HOSPITAL COURSE:
Patient underwent emergent cardiac catheterization on
admission revealing 95% occlusion of the RCA. Successful
PCI with drug-eluting stent placement. Post-procedure
troponin peaked at 18.2 ng/mL. Developed AKI (Cr 2.1)
which resolved with IV fluids (Cr 1.2 at discharge).
CHF managed with IV diuresis, transitioned to oral.
A1C 9.2% - endocrine consulted, insulin regimen adjusted.
DISCHARGE MEDICATIONS:
1. Aspirin 81mg daily (DO NOT STOP - stent)
2. Clopidogrel 75mg daily x 12 months (DO NOT STOP - stent)
3. Atorvastatin 80mg daily (increased from 40mg)
4. Metoprolol succinate 50mg daily (NEW)
5. Lisinopril 10mg daily (decreased from 20mg - AKI)
6. Metformin 1000mg BID (HOLD if Cr > 1.5)
7. Insulin glargine 20 units at bedtime (NEW)
8. Furosemide 40mg daily (NEW)
9. KCl 20mEq daily (NEW - with furosemide)
FOLLOW-UP:
Cardiology: 1 week
PCP: 2 weeks
Endocrinology: 1 month
Cardiac rehab referral placed
Progress Notes are daily updates written by the care team during hospitalization. They typically follow the SOAP format (Subjective, Objective, Assessment, Plan) and capture the patient's day-to-day trajectory. Lab Reports contain structured numerical data but are often embedded in narrative text with physician interpretations. Consultation Notes document specialist evaluations and recommendations. A complete patient record weaves all these note types together chronologically, and the summarization pipeline must handle each format appropriately.
Solution Architecture
Pipeline Overview
The medical record summarization pipeline processes unstructured clinical notes through six sequential stages, each building on the output of the previous one. The pipeline is designed to be modular — each stage can be independently tested, validated, and improved without affecting the others. This modularity is critical in healthcare settings where regulatory requirements demand auditability of each processing step.
Stage 1: Clinical Text Preprocessing — Raw clinical notes are de-identified (PHI removal per HIPAA Safe Harbor), normalized (abbreviation expansion, unit standardization), and segmented into logical sections. De-identification is mandatory before any LLM processing — sending Protected Health Information to a cloud LLM API violates HIPAA unless a Business Associate Agreement is in place.
Stage 2: Medical Entity Extraction — Named Entity Recognition identifies medications (drug name, dosage, route, frequency), diagnoses (with ICD-10 code mapping), allergies (substance + reaction), vital signs, laboratory values (with normal range flagging), and procedures. We use a hybrid approach: regex patterns catch well-formatted entities, while the LLM catches entities in narrative text that regex would miss.
Stage 3: SOAP Note Structuring — The LLM converts free-text clinical notes into standardized SOAP format. Subjective: patient-reported symptoms, history, complaints. Objective: measurable findings — vitals, lab values, exam findings. Assessment: clinical interpretation, diagnoses, differential diagnosis. Plan: treatment decisions, medications, follow-up, referrals. Structured output ensures every SOAP note follows an identical schema.
Stage 4: Drug Interaction Checking — The extracted medication list is cross-referenced against a drug interaction database. The system flags dangerous combinations (e.g., warfarin + aspirin = bleeding risk, metformin + contrast dye = lactic acidosis risk, ACE inhibitors + potassium supplements = hyperkalemia risk). Each interaction includes severity level, clinical significance, and recommended action.
Stage 5: Patient Timeline Generation — All encounters are organized chronologically with key events highlighted: admissions, discharges, medication changes, procedures, significant lab result changes, and new diagnoses. The timeline provides a rapid visual overview of the patient's clinical trajectory.
Stage 6: Handoff Summary Generation — The pipeline produces a concise shift-change summary designed for the I-PASS framework (Illness severity, Patient summary, Action list, Situation awareness, Synthesis by receiver). This summary distills hundreds of pages into a 1-2 page brief that the incoming physician can absorb in under 3 minutes.
System Diagram
Clinical Text Preprocessing
De-identification
Before any clinical text touches an LLM API, all Protected Health Information (PHI) must be removed. HIPAA defines 18 categories of PHI that must be de-identified under the Safe Harbor method: names, dates (except year), phone numbers, geographic data smaller than state, fax numbers, email addresses, SSNs, medical record numbers, health plan numbers, account numbers, certificate/license numbers, vehicle identifiers, device identifiers, URLs, IP addresses, biometric identifiers, full-face photos, and any other unique identifying number. Failure to de-identify constitutes a HIPAA violation with fines up to $1.5 million per category per year.
The de-identification pipeline uses a combination of regex patterns for structured identifiers and NER models for names and locations. Detected PHI is replaced with typed placeholders that preserve the semantic structure of the note:
import re
from typing import Dict, List, Tuple
# --- HIPAA Safe Harbor De-identification ---
PHI_PATTERNS: Dict[str, str] = {
# Names (simple pattern - production uses NER)
"name": r"\b(?:Dr\.|Mr\.|Mrs\.|Ms\.)\s+[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+",
# Dates (MM/DD/YYYY, YYYY-MM-DD, Month DD, YYYY)
"date": r"\b\d{1,2}/\d{1,2}/\d{2,4}\b|\b\d{4}-\d{2}-\d{2}\b",
# Phone numbers
"phone": r"\b\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b",
# SSN
"ssn": r"\b\d{3}-\d{2}-\d{4}\b",
# Medical Record Numbers (MRN)
"mrn": r"\bMRN[:#\s]*\d{6,10}\b",
# Email
"email": r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b",
}
def deidentify(text: str) -> Tuple[str, List[dict]]:
"""Remove PHI from clinical text. Returns cleaned text and audit log."""
redactions = []
cleaned = text
for phi_type, pattern in PHI_PATTERNS.items():
for match in re.finditer(pattern, cleaned):
redactions.append({
"type": phi_type,
"original": match.group(),
"position": match.span(),
})
placeholder = f"[{phi_type.upper()}]"
cleaned = re.sub(pattern, placeholder, cleaned)
return cleaned, redactions
De-identification is not optional. Sending raw clinical notes containing patient names, dates of birth, or medical record numbers to a cloud LLM API without a signed Business Associate Agreement (BAA) constitutes a federal violation. Always de-identify first, or use an on-premises LLM deployment with appropriate security controls.
Normalization
Medical text is notoriously inconsistent. The same medication might appear as "metoprolol", "Lopressor", "metoprolol tartrate 50mg PO BID", or "metop 50 bid". Abbreviations vary by physician and institution — "SOB" means shortness of breath, "HTN" means hypertension, "DM2" means type 2 diabetes mellitus. The normalization step standardizes these variations so downstream extraction is more accurate:
# --- Medical Abbreviation Expansion ---
MEDICAL_ABBREVIATIONS: Dict[str, str] = {
"SOB": "shortness of breath",
"HTN": "hypertension",
"DM": "diabetes mellitus",
"DM2": "type 2 diabetes mellitus",
"CHF": "congestive heart failure",
"COPD": "chronic obstructive pulmonary disease",
"CAD": "coronary artery disease",
"MI": "myocardial infarction",
"CVA": "cerebrovascular accident (stroke)",
"DVT": "deep vein thrombosis",
"PE": "pulmonary embolism",
"STEMI": "ST-elevation myocardial infarction",
"AKI": "acute kidney injury",
"CKD": "chronic kidney disease",
"BID": "twice daily",
"TID": "three times daily",
"QID": "four times daily",
"PRN": "as needed",
"PO": "by mouth",
"IV": "intravenous",
"IM": "intramuscular",
"SubQ": "subcutaneous",
"PCN": "penicillin",
"Cr": "creatinine",
"BNP": "brain natriuretic peptide",
"Hgb": "hemoglobin",
"Plt": "platelets",
"WBC": "white blood cell count",
"SpO2": "oxygen saturation",
"RA": "room air",
"c/o": "complaining of",
"s/p": "status post",
"hx": "history",
"dx": "diagnosis",
"tx": "treatment",
"rx": "prescription",
}
def expand_abbreviations(text: str) -> str:
"""Expand common medical abbreviations in clinical text."""
result = text
for abbr, expansion in MEDICAL_ABBREVIATIONS.items():
pattern = r"\b" + re.escape(abbr) + r"\b"
result = re.sub(pattern, f"{abbr} ({expansion})", result)
return result
In production, normalization also includes mapping brand drug names to generic names (Lopressor to metoprolol), converting units to a standard system (mg/dL, mmol/L), and resolving institution-specific abbreviations. Medical ontologies like UMLS (Unified Medical Language System), RxNorm (for medications), and SNOMED CT (for clinical terms) provide the canonical mappings.
Medical Entity Extraction
Regex-Based Extraction
The first layer of entity extraction uses carefully crafted regex patterns to capture well-formatted medical data. Regex is fast, deterministic, and produces no hallucinations — making it ideal for structured elements like vital signs, lab values with units, and medication dosages that follow predictable patterns. However, regex fails on narrative text where entities are embedded in prose, which is where the LLM layer takes over.
import re
from dataclasses import dataclass
# --- Structured Entity Classes ---
@dataclass
class Medication:
name: str
dose: str
route: str
frequency: str
status: str = "active" # active, discontinued, hold
@dataclass
class LabResult:
test_name: str
value: float
unit: str
flag: str = "normal" # normal, high, low, critical
@dataclass
class Diagnosis:
description: str
icd10_code: str
status: str = "active" # active, resolved, chronic
# --- Regex Extraction Patterns ---
MEDICATION_PATTERN = re.compile(
r"(\w+(?:\s+\w+)?)\s+" # drug name
r"(\d+(?:\.\d+)?\s*(?:mg|mcg|g|mL|units?))\s*" # dose + unit
r"(?:(PO|IV|IM|SubQ|topical|inhaled)\s*)?" # route (optional)
r"(daily|BID|TID|QID|PRN|at bedtime|q\d+h)", # frequency
re.IGNORECASE
)
VITAL_SIGN_PATTERNS = {
"blood_pressure": re.compile(r"BP\s*:?\s*(\d{2,3}/\d{2,3})"),
"heart_rate": re.compile(r"HR\s*:?\s*(\d{2,3})"),
"respiratory": re.compile(r"RR\s*:?\s*(\d{1,2})"),
"spo2": re.compile(r"SpO2\s*:?\s*(\d{2,3})%?"),
"temperature": re.compile(r"Temp\s*:?\s*(\d{2,3}(?:\.\d)?)\s*F?"),
}
LAB_PATTERN = re.compile(
r"(Troponin\s*I?|BNP|Cr|K\+?|Glucose|WBC|Hgb|Plt|A1C|Na|Cl|BUN|AST|ALT)"
r"\s*:?\s*"
r"(\d+(?:\.\d+)?)\s*"
r"(ng/mL|pg/mL|mg/dL|mmol/L|mEq/L|%|K/uL|g/dL)?",
re.IGNORECASE
)
def extract_medications(text: str) -> List[Medication]:
"""Extract medications with dosage, route, and frequency."""
meds = []
for match in MEDICATION_PATTERN.finditer(text):
meds.append(Medication(
name=match.group(1).strip(),
dose=match.group(2).strip(),
route=match.group(3) or "PO",
frequency=match.group(4).strip(),
))
return meds
def extract_labs(text: str) -> List[LabResult]:
"""Extract lab results with values and units."""
labs = []
for match in LAB_PATTERN.finditer(text):
labs.append(LabResult(
test_name=match.group(1).strip(),
value=float(match.group(2)),
unit=match.group(3) or "",
))
return labs
LLM-Based Entity Extraction
The LLM layer handles entities embedded in narrative text that regex cannot reach. A physician might write "Patient was started on a beta-blocker for rate control" without ever specifying the drug name, or "Labs improved from yesterday" without listing specific values. The LLM uses clinical context and medical knowledge to extract these implicit entities and link them to the structured data already captured by regex.
We use structured output (JSON mode) to ensure the LLM returns entities in a predictable schema that can be programmatically merged with regex-extracted entities:
from openai import OpenAI
import json
client = OpenAI()
ENTITY_EXTRACTION_PROMPT = """You are a clinical NLP system. Extract ALL medical
entities from the following clinical note. Return valid JSON with this schema:
{
"medications": [
{"name": "...", "dose": "...", "route": "...", "frequency": "...",
"status": "active|discontinued|hold", "confidence": 0.0-1.0}
],
"diagnoses": [
{"description": "...", "icd10_code": "...",
"status": "active|resolved|chronic", "confidence": 0.0-1.0}
],
"allergies": [
{"substance": "...", "reaction": "...", "severity": "mild|moderate|severe"}
],
"labs": [
{"test": "...", "value": "...", "unit": "...",
"flag": "normal|high|low|critical"}
],
"vitals": {
"bp": "...", "hr": "...", "rr": "...", "spo2": "...", "temp": "..."
},
"procedures": [
{"name": "...", "date": "...", "details": "..."}
]
}
IMPORTANT: Assign confidence scores (0.0-1.0) based on how explicitly
the entity appears in the text. Directly stated = 0.95+. Inferred from
context = 0.6-0.8. Uncertain = below 0.5 (flag for clinician review)."""
def extract_entities_llm(clinical_note: str) -> dict:
"""Use GPT-4o to extract structured medical entities."""
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": ENTITY_EXTRACTION_PROMPT},
{"role": "user", "content": clinical_note},
],
response_format={"type": "json_object"},
temperature=0.0, # Zero temperature for deterministic extraction
max_tokens=2048,
)
return json.loads(response.choices[0].message.content)
Every extracted entity includes a confidence score from 0.0 to 1.0. Entities below 0.8 are flagged for clinician review before being added to the structured record. This is critical in healthcare — an incorrectly extracted medication or diagnosis could lead to patient harm. The confidence score creates a triage mechanism: high-confidence extractions are auto-populated, medium-confidence ones are highlighted for quick verification, and low-confidence ones are presented as suggestions requiring explicit approval.
The hybrid approach (regex + LLM) consistently outperforms either method alone. In our benchmarks on synthetic clinical notes, regex alone achieves 78% recall for medications but only 52% for diagnoses embedded in narrative text. The LLM alone achieves 93% recall but occasionally hallucinates medications not present in the note. The hybrid system achieves 91% medication accuracy and 87% diagnosis accuracy by using regex as a high-precision anchor and the LLM for high-recall narrative extraction, with cross-validation between the two layers.
SOAP Note Generation
The SOAP Format
SOAP notes are the universal standard for clinical documentation in medicine. Every medical student, resident, and attending physician learns this format. Standardizing all clinical notes into SOAP format means that any physician can quickly scan a note and find exactly what they need — the patient's symptoms are always in Subjective, lab values are always in Objective, and the treatment plan is always in Plan. This consistency is what makes SOAP notes invaluable for handoffs and cross-provider communication.
The four SOAP components serve distinct clinical purposes:
Subjective (S) — What the patient reports: chief complaint, history of present illness (HPI), symptom onset/duration/severity/character, associated symptoms, relevant past medical history, current medications as reported by patient, allergies, social history (smoking, alcohol, occupation), family history. This section captures the patient's perspective and narrative.
Objective (O) — Measurable clinical data: vital signs (BP, HR, RR, SpO2, Temp), physical exam findings organized by system (HEENT, cardiovascular, respiratory, abdominal, neurological, musculoskeletal, skin), laboratory results with abnormal flags, imaging results, ECG interpretation. This section contains only facts that can be independently verified.
Assessment (A) — The clinician's clinical reasoning: primary diagnosis with ICD-10 code, secondary diagnoses, differential diagnosis (alternative possibilities being considered), clinical stage or severity classification, response to current treatment, prognosis. This section represents the physician's expert judgment synthesizing Subjective and Objective data.
Plan (P) — Actions to be taken: new medications (with dose, route, frequency), medication changes (increased, decreased, discontinued), procedures ordered, consults requested, diagnostic tests ordered, patient education provided, follow-up schedule, disposition (admit, discharge, transfer), goals of care. This section is the actionable output.
Prompt Design for SOAP Generation
SOAP_GENERATION_PROMPT = """You are an expert clinical documentation assistant.
Convert the following clinical note into a structured SOAP note.
RULES:
1. Place information in the correct SOAP section. If unsure, flag it.
2. Include ALL medications with exact dosages from the source note.
3. Flag any abnormal lab values with (H) for high or (L) for low.
4. Include ICD-10 codes for all diagnoses when identifiable.
5. Do NOT infer information not present in the source note.
6. If information for a section is missing, write "Not documented".
7. Preserve clinical precision -- do not round lab values or
paraphrase exact medication orders.
OUTPUT FORMAT (JSON):
{
"subjective": {
"chief_complaint": "...",
"hpi": "...",
"past_medical_history": ["..."],
"medications_reported": ["..."],
"allergies": [{"substance": "...", "reaction": "..."}],
"social_history": "...",
"review_of_systems": "..."
},
"objective": {
"vitals": {"bp": "...", "hr": "...", "rr": "...",
"spo2": "...", "temp": "..."},
"physical_exam": {"general": "...", "cardiovascular": "...",
"respiratory": "...", "abdomen": "...",
"neurological": "..."},
"labs": [{"test": "...", "value": "...", "unit": "...",
"flag": "normal|high|low|critical"}],
"imaging": "...",
"ecg": "..."
},
"assessment": {
"primary_diagnosis": {"description": "...", "icd10": "..."},
"secondary_diagnoses": [{"description": "...", "icd10": "..."}],
"differential": ["..."],
"clinical_reasoning": "..."
},
"plan": {
"medications": [{"action": "start|continue|increase|decrease|stop",
"drug": "...", "dose": "...",
"route": "...", "frequency": "...",
"rationale": "..."}],
"procedures": ["..."],
"consults": ["..."],
"diagnostics": ["..."],
"follow_up": "...",
"disposition": "...",
"patient_education": ["..."]
}
}"""
def generate_soap_note(clinical_note: str) -> dict:
"""Convert free-text clinical note to structured SOAP format."""
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": SOAP_GENERATION_PROMPT},
{"role": "user", "content": f"Clinical Note:\n{clinical_note}"},
],
response_format={"type": "json_object"},
temperature=0.0,
max_tokens=4096,
)
return json.loads(response.choices[0].message.content)
Using JSON mode (response_format={"type": "json_object"}) is essential for clinical applications. Free-text SOAP notes would require another parsing step and introduce ambiguity. Structured JSON output can be directly validated against a schema, stored in an EHR database, and programmatically processed by downstream systems. Every field has a defined type and set of allowed values, making the output deterministic and machine-readable.
Drug Interaction Checking
Interaction Database
Drug-drug interactions are a leading cause of adverse drug events (ADEs), responsible for an estimated 125,000 deaths annually in the United States. When the entity extraction pipeline identifies all medications a patient is taking, the interaction checker cross-references every pair against a curated interaction database. In production, this would integrate with the FDA's drug interaction API, DrugBank, or the Lexicomp database. For this use case, we maintain a focused lookup table of the most clinically significant interactions:
# --- Drug Interaction Database (synthetic subset) ---
DRUG_INTERACTIONS = {
("warfarin", "aspirin"): {
"severity": "HIGH",
"effect": "Increased bleeding risk. Combined anticoagulant and "
"antiplatelet effect potentiates hemorrhage risk.",
"recommendation": "Monitor INR closely. Consider PPI for GI protection. "
"Assess if dual therapy is clinically necessary.",
},
("metformin", "contrast dye"): {
"severity": "HIGH",
"effect": "Risk of lactic acidosis. IV contrast can cause acute "
"kidney injury, impairing metformin clearance.",
"recommendation": "Hold metformin 48h before and after contrast. Check "
"renal function (eGFR) before restarting.",
},
("lisinopril", "potassium"): {
"severity": "MODERATE",
"effect": "ACE inhibitors reduce potassium excretion. Concurrent "
"potassium supplementation increases hyperkalemia risk.",
"recommendation": "Monitor serum potassium weekly. Target K+ 3.5-5.0 mEq/L. "
"Discontinue supplement if K+ > 5.0.",
},
("metoprolol", "verapamil"): {
"severity": "HIGH",
"effect": "Combined beta-blocker and calcium channel blocker can cause "
"severe bradycardia, heart block, and hypotension.",
"recommendation": "Avoid combination. If necessary, monitor ECG and BP closely.",
},
("clopidogrel", "omeprazole"): {
"severity": "MODERATE",
"effect": "Omeprazole inhibits CYP2C19, reducing clopidogrel activation. "
"Reduced antiplatelet effect increases thrombosis risk.",
"recommendation": "Use pantoprazole instead (does not inhibit CYP2C19). "
"Critical for patients with recent stent placement.",
},
("furosemide", "lisinopril"): {
"severity": "MODERATE",
"effect": "Loop diuretics can cause hypovolemia, potentiating the "
"hypotensive effect of ACE inhibitors. Risk of AKI.",
"recommendation": "Monitor BP, Cr, and electrolytes after initiation. "
"Consider dose reduction of ACE inhibitor.",
},
("insulin", "metformin"): {
"severity": "LOW",
"effect": "Additive hypoglycemic effect. Combination is therapeutically "
"intended but increases hypoglycemia risk.",
"recommendation": "Educate patient on hypoglycemia signs. Monitor blood glucose. "
"Consider reducing metformin if recurrent lows.",
},
}
def check_interactions(medications: List[str]) -> List[dict]:
"""Check all medication pairs for known interactions."""
interactions_found = []
med_names = [m.lower().strip() for m in medications]
for i in range(len(med_names)):
for j in range(i + 1, len(med_names)):
pair = (med_names[i], med_names[j])
pair_rev = (med_names[j], med_names[i])
interaction = DRUG_INTERACTIONS.get(pair) or \
DRUG_INTERACTIONS.get(pair_rev)
if interaction:
interactions_found.append({
"drug_a": med_names[i],
"drug_b": med_names[j],
**interaction
})
return sorted(
interactions_found,
key=lambda x: {"HIGH": 0, "MODERATE": 1, "LOW": 2}[x["severity"]]
)
Flagging Logic
The interaction checker produces a prioritized alert list. HIGH severity interactions generate an immediate alert and require clinician acknowledgment before the summary is finalized. MODERATE interactions are highlighted in yellow with monitoring recommendations. LOW interactions are listed as informational notes. This tiered approach prevents alert fatigue — a well-known problem in clinical decision support systems where too many low-priority alerts cause clinicians to ignore all alerts, including critical ones.
Clinical decision support systems that generate too many alerts suffer from "alert fatigue" — clinicians override 90-95% of drug interaction alerts because most are clinically insignificant. Our system mitigates this by (1) only flagging interactions above a clinical significance threshold, (2) tiering alerts by severity, and (3) including specific recommended actions rather than generic warnings. The goal is fewer than 5 alerts per patient encounter, with a HIGH-severity positive predictive value above 80%.
Patient Timeline Generation
Timeline Generation
A chronological patient timeline transforms hundreds of pages of records into a scannable visual history. Each entry captures a date, event type, description, and clinical significance. The timeline serves two critical functions: (1) during handoffs, the incoming physician can quickly trace the patient's trajectory — were they improving or declining? (2) during diagnostic workups, patterns become visible that are hidden in individual notes — recurrent admissions, progressive lab deterioration, medication changes correlating with symptom changes.
from datetime import datetime
TIMELINE_PROMPT = """Analyze these clinical notes and generate a chronological
patient timeline. For each event, provide:
{
"timeline": [
{
"date": "YYYY-MM-DD",
"time": "HH:MM (if available, else null)",
"event_type": "admission|discharge|procedure|medication_change|
lab_result|diagnosis|consult|imaging|vital_change",
"description": "Brief clinical description",
"significance": "routine|notable|critical",
"details": "Additional clinical context"
}
],
"trajectory": "improving|stable|declining|mixed",
"key_turning_points": ["..."]
}
Order events chronologically. Flag critical events (new diagnoses,
procedures, critical lab values, code events) with significance='critical'.
Identify the overall clinical trajectory and key turning points."""
def generate_timeline(clinical_notes: List[str]) -> dict:
"""Generate a chronological patient timeline from clinical notes."""
combined = "\n\n---\n\n".join(clinical_notes)
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": TIMELINE_PROMPT},
{"role": "user", "content": combined},
],
response_format={"type": "json_object"},
temperature=0.0,
max_tokens=4096,
)
return json.loads(response.choices[0].message.content)
The timeline also supports the generation of I-PASS handoff summaries, the gold standard framework for clinical handoffs endorsed by the Society of Hospital Medicine. I-PASS stands for: Illness severity (stable, "watcher", unstable), Patient summary (one-liner plus key events), Action list (pending tasks, medications due, labs to follow), Situation awareness (what to watch for, contingency plans), Synthesis by receiver (read-back confirmation). The LLM generates this structured handoff directly from the timeline and extracted entities:
HANDOFF_PROMPT = """Generate an I-PASS handoff summary from the patient data:
I - ILLNESS SEVERITY: [Stable / Watcher / Unstable]
P - PATIENT SUMMARY: One-line summary + key active problems
A - ACTION LIST: Pending tasks, meds due, labs to follow
S - SITUATION AWARENESS: What could go wrong, contingency plans
S - SYNTHESIS: Key points for the receiving clinician
Be concise. This must be readable in under 2 minutes.
Flag any active drug interactions or critical lab trends."""
def generate_handoff(
soap_note: dict,
interactions: List[dict],
timeline: dict
) -> str:
"""Generate I-PASS handoff summary."""
context = json.dumps({
"soap_note": soap_note,
"drug_interactions": interactions,
"timeline": timeline,
}, indent=2)
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": HANDOFF_PROMPT},
{"role": "user", "content": context},
],
temperature=0.1,
max_tokens=1024,
)
return response.choices[0].message.content
Key Components
The medical record summarization pipeline relies on six specialized components working in concert. Each component addresses a distinct challenge in clinical NLP:
Results
The medical record summarization pipeline was evaluated on a synthetic dataset of 500 clinical encounters across emergency medicine, internal medicine, and cardiology. Performance was measured across five dimensions using clinician-validated ground truth annotations:
75% reduction in documentation time — Physicians using the pipeline spent an average of 8 minutes per patient encounter on documentation, compared to 32 minutes without the tool. The largest time savings came from automated SOAP note generation (physicians reviewed and edited rather than writing from scratch) and medication reconciliation (the system pre-populated the medication list from chart review, requiring only verification).
91% accuracy in medication extraction — Precision 93%, recall 89%. The hybrid regex + LLM approach correctly identified drug name, dosage, route, and frequency for 91% of medications in the test set. Most errors involved medications mentioned in historical context ("patient was previously on warfarin") being incorrectly marked as current medications. Post-processing rules that check temporal markers ("previously", "discontinued", "stopped") reduced this error class by 60%.
87% accuracy in diagnosis extraction — Precision 90%, recall 84%. ICD-10 code assignment was correct for 82% of extracted diagnoses. The most common error was mapping to a less-specific ICD-10 code (e.g., I50.9 "Heart failure, unspecified" instead of I50.21 "Acute systolic heart failure"). This is an acceptable clinical error — the correct diagnosis was identified, only the specificity of the code was insufficient.
60% reduction in handoff errors — Measured by comparing the completeness and accuracy of I-PASS handoff summaries generated by the pipeline versus those written manually during a simulated shift change. The pipeline-generated summaries contained 95% of critical information items, compared to 78% for manual handoffs. The most significant improvement was in medication reconciliation — manual handoffs missed an average of 1.3 medications per patient, while the pipeline missed 0.2.
45 minutes saved per physician per shift — Across a typical 12-hour shift with 18-24 patient encounters, physicians reclaimed approximately 45 minutes of clinical time. This time was redistributed to direct patient care, reducing average patient wait times by 12 minutes in the emergency department.
These results were obtained on synthetic clinical data designed to mirror the complexity and variability of real clinical notes. Performance on actual clinical documentation will vary based on institutional note-writing styles, specialty-specific terminology, and EHR system integration. Any clinical deployment requires prospective validation with real data under IRB oversight before these metrics can be generalized.
Production Considerations
HIPAA Compliance
Healthcare AI systems operate under the strictest regulatory framework in any industry. HIPAA (Health Insurance Portability and Accountability Act) governs how Protected Health Information (PHI) is collected, stored, transmitted, and processed. Any LLM-based system that processes clinical notes must comply with the following requirements:
De-identification is mandatory before sending data to cloud LLM APIs. The HIPAA Safe Harbor method requires removal of 18 categories of identifiers. An alternative is Expert Determination, where a qualified statistician certifies that the risk of re-identification is very small. For production systems, de-identification should be performed by a validated, FDA-cleared NLP tool (such as Philter, NLM Scrubber, or a validated in-house solution), not ad-hoc regex patterns.
Business Associate Agreements (BAAs) are required with any cloud provider that processes PHI. OpenAI, Google, Microsoft Azure, and AWS all offer BAAs for their healthcare-tier services. The BAA contractually obligates the provider to safeguard PHI according to HIPAA requirements. Using a consumer-grade API endpoint without a BAA is a violation even if de-identification is performed, because the audit trail must document that PHI protection was contractually guaranteed at every processing step.
Audit logging must track every interaction with patient data: who accessed what data, when, what was sent to the LLM, what was returned, and what clinical decisions were influenced by the output. The audit log must be tamper-proof and retained for 6 years per HIPAA requirements. This audit trail is critical for both compliance and liability defense.
Clinician-in-the-Loop
The most important production requirement is that no LLM output should be treated as a final clinical decision. The system generates summaries, extracts entities, and flags interactions — but a licensed clinician must review and approve every output before it becomes part of the official medical record. This is not just a regulatory requirement; it is an ethical and clinical necessity. LLMs can hallucinate medications, miss critical allergies, or misinterpret clinical context in ways that could directly harm patients.
The clinician-in-the-loop workflow should be designed for efficiency: (1) the system presents its output with confidence scores highlighted, (2) high-confidence items are pre-checked for approval, (3) medium-confidence items are highlighted in yellow for quick review, (4) low-confidence items are presented as suggestions requiring explicit action. The goal is to reduce the clinician's work from "write from scratch" to "review and approve" — a much faster cognitive task that still maintains human oversight.
EHR Integration
Production deployment requires integration with Electronic Health Record (EHR) systems. The two dominant EHR platforms in the United States are Epic (used by 38% of hospitals) and Cerner/Oracle Health (used by 25% of hospitals). Integration uses the FHIR (Fast Healthcare Interoperability Resources) standard — a RESTful API specification for exchanging healthcare data. The pipeline outputs should be formatted as FHIR resources:
• Patient resource — demographics (de-identified in our context)
• Condition resource — diagnoses with ICD-10 codes, onset dates, clinical status
• MedicationStatement resource — active medications with dosage, route, frequency
• AllergyIntolerance resource — allergies with substance, reaction, severity
• Observation resource — lab results, vital signs with LOINC codes
• Procedure resource — procedures with CPT codes and dates
• DocumentReference resource — the generated SOAP note as an attachment
• Composition resource — the handoff summary as a structured clinical document
Additional production concerns include: liability and malpractice (who is responsible when an LLM-generated summary contains an error that leads to patient harm?), model validation (FDA may require 510(k) clearance for clinical decision support software), bias and equity (does the model perform equally well on notes written in different clinical styles, for different patient populations, and across different specialties?), and versioning (when the underlying LLM is updated, the entire pipeline must be re-validated to ensure extraction accuracy has not degraded).
The FDA is actively developing guidance for AI/ML-based Software as a Medical Device (SaMD). Clinical NLP systems that influence treatment decisions may fall under FDA oversight. The 21st Century Cures Act and the ONC Health IT Certification Program also impose requirements on EHR-integrated AI systems. Consult with regulatory affairs and legal counsel before deploying any clinical AI system in a patient care setting.
Build Your Portfolio
Fork & Extend
Turn this notebook into a portfolio project in 5 steps:
- Fork the notebook — Clone the repo and open in Google Colab or locally.
- Swap in real data — Replace the synthetic records with the MIMIC-IV Clinical Notes dataset from PhysioNet. Requires a free credentialed account and CITI training certificate, but it contains real de-identified clinical notes from ICU patients.
- Add medication interaction checking — Integrate the extracted medication list with the RxNorm Interaction API to automatically flag dangerous drug-drug interactions in the summary output.
- Deploy it — Wrap it in a Streamlit app. Build a HIPAA-aware interface where clinicians paste or upload a clinical note and receive a structured SOAP summary with highlighted entities (medications, diagnoses, procedures) and interaction warnings.
- Write a README — Include architecture diagram, setup instructions, sample outputs, and metrics.
What Hiring Managers Look For
Health-tech hiring managers prioritize safety and compliance above all else. Demonstrate that your system never fabricates clinical information by implementing hallucination detection that cross-references every extracted entity against the source text. Include ROUGE and entity-level F1 scores, show how PHI is detected and redacted before any data leaves the system, and document your clinician-in-the-loop review workflow for edge cases like ambiguous abbreviations or conflicting records.
Public Datasets to Use
- MIMIC-IV Clinical Notes — Real de-identified ICU clinical notes including discharge summaries, radiology reports, and nursing notes. Available on PhysioNet (requires credentialing). The gold standard for clinical NLP research.
- MTSamples — 5,000+ sample medical transcription reports across 40 specialties. Available on mtsamples.com. Great for testing across different medical specialties without access restrictions.
- n2c2 / i2b2 NLP Datasets — Shared tasks for clinical NLP including medication extraction, relation classification, and de-identification. Available via Harvard DBMI. Excellent for benchmarking NER and relation extraction.
Deployment Options
| Platform | Best For | Effort |
|---|---|---|
| Streamlit | Clinical note upload with structured SOAP summary and entity highlights | Low |
| Gradio | Quick demo with text input and annotated NER output visualization | Low |
| FastAPI | FHIR-compatible API endpoint for EHR system integration | Medium |
| Docker + Cloud Run | HIPAA-compliant containerized service with encryption at rest and in transit | High |