# Data Container

The `healthchain.io.containers` module provides FHIR-native containers for healthcare data processing. These containers handle the complexities of clinical data formats while providing a clean Python interface for NLP/ML pipelines.
## DataContainer 📦

`DataContainer` is a generic base class for storing data of any type.
```python
from healthchain.io.containers import DataContainer

# Create a DataContainer with string data
container = DataContainer("Some data")

# Convert to dictionary and JSON
data_dict = container.to_dict()
data_json = container.to_json()

# Create from dictionary or JSON
container_from_dict = DataContainer.from_dict(data_dict)
container_from_json = DataContainer.from_json(data_json)
```
## Document 📄

The `Document` class is HealthChain's core container for clinical text and structured healthcare data. It handles FHIR resources natively, automatically manages validation and conversion, and integrates seamlessly with NLP models and CDS workflows.

Use `Document` containers for clinical notes, discharge summaries, patient records, and any healthcare data that combines text with structured FHIR resources.
| Attribute | Access | Primary Purpose | Key Features | Common Use Cases |
|---|---|---|---|---|
| FHIR Data | `doc.fhir` | Manage clinical data in FHIR format | • Resource bundles<br>• Clinical lists (problems, meds, allergies)<br>• Document references<br>• CDS prefetch | • Store patient records<br>• Track medical history<br>• Manage clinical documents |
| NLP | `doc.nlp` | Process and analyze text | • Tokenization<br>• Entity recognition<br>• Embeddings<br>• spaCy integration | • Extract medical terms<br>• Analyze clinical text<br>• Generate features |
| CDS | `doc.cds` | Clinical decision support | • Recommendation cards<br>• Suggested actions<br>• Clinical alerts | • Generate alerts<br>• Suggest interventions<br>• Guide clinical decisions |
| Model Outputs | `doc.models` | Store ML model results | • Multi-framework support<br>• Task-specific outputs<br>• Text generation | • Store classifications<br>• Keep predictions<br>• Track generations |
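As a quick orientation, here is a minimal sketch of how the four components are accessed on a single document. The model name below is hypothetical; every call used here reappears in the detailed examples that follow.

```python
from healthchain.io import Document

doc = Document("Patient presents with uncontrolled hypertension")

print(doc.nlp.word_count())        # NLP: text-level features
print(len(doc.fhir.problem_list))  # FHIR: clinical lists (empty until populated)
print(len(doc.cds.cards or []))    # CDS: Hooks cards (empty until populated)

# Models: store and retrieve framework-agnostic outputs
doc.models.add_output(
    model_name="demo_model",       # hypothetical name, for illustration only
    task="classification",
    output={"label": "hypertension"},
)
print(doc.models.get_output("demo_model", "classification"))
```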
### FHIR Data (`doc.fhir`)

The FHIR component provides production-ready management of FHIR resources with automatic validation, error handling, and convenient accessors for common clinical workflows:
**Storage and Management:**

- Automatic `Bundle` creation and management
- Resource type validation
- Convenient access to common clinical data lists
- Automatic extraction of `OperationOutcome` and `Provenance` resources into `doc.fhir.operation_outcomes` and `doc.fhir.provenances` (removed from the bundle)
**Convenience Accessors:**

- `patient`: First `Patient` resource in the bundle, or `None`
- `patients`: List of `Patient` resources
- `problem_list`: List of `Condition` resources (diagnoses, problems)
- `medication_list`: List of `MedicationStatement` resources
- `allergy_list`: List of `AllergyIntolerance` resources
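Once a document holds FHIR data (as in the workflow example further down), these accessors read like plain lists. A minimal sketch only:

```python
# Assumes `doc` has already been populated, e.g. as in the workflow example below
print(f"Patients in bundle: {len(doc.fhir.patients)}")
print(f"Problems: {len(doc.fhir.problem_list)}")
print(f"Medications: {len(doc.fhir.medication_list)}")
print(f"Allergies: {len(doc.fhir.allergy_list)}")

if doc.fhir.patient is not None:
    print(f"First patient resource id: {doc.fhir.patient.id}")
```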
**Document Reference Management:**

- Document relationship tracking (parent/child/sibling)
- Attachment handling with `base64` encoding
- Document family retrieval
**CDS Support:**

- Support for CDS Hooks prefetch resources
- Resource indexing by type
#### Example: Clinical Documentation Workflow

```python
from healthchain.io import Document
from healthchain.fhir import (
    create_condition,
    create_medication_statement,
    create_document_reference,
)

# Initialize with clinical text from EHR
doc = Document("Patient presents with uncontrolled hypertension and Type 2 diabetes")

# Build problem list with SNOMED CT codes
doc.fhir.problem_list = [
    create_condition(
        subject="Patient/123",
        code="38341003",
        display="Hypertension"
    ),
    create_condition(
        subject="Patient/123",
        code="44054006",
        display="Type 2 diabetes mellitus"
    )
]

# Document current medications
doc.fhir.medication_list = [
    create_medication_statement(
        subject="Patient/123",
        code="197361",
        display="Lisinopril 10 MG"
    ),
    create_medication_statement(
        subject="Patient/123",
        code="860975",
        display="Metformin 500 MG"
    )
]

# Track document versions and amendments
initial_note = create_document_reference(
    data="Initial assessment: Patient presents with chest pain",
    content_type="text/plain",
    description="Initial ED note"
)
initial_id = doc.fhir.add_document_reference(initial_note)

# Add amended note
amended_note = create_document_reference(
    data="Amended: Patient presents with chest pain, ruling out cardiac etiology",
    content_type="text/plain",
    description="Amended ED note"
)
amended_id = doc.fhir.add_document_reference(
    amended_note,
    parent_id=initial_id,
    relationship_type="replaces"
)

# Retrieve document history for audit trail
family = doc.fhir.get_document_reference_family(amended_id)
print(f"Original note: {family['parents'][0].description}")

# Prepare data for CDS Hooks integration
prefetch = {
    "Condition": doc.fhir.problem_list,
    "MedicationStatement": doc.fhir.medication_list,
}
doc.fhir.prefetch_resources = prefetch

# CDS service can query prefetch data
conditions = doc.fhir.get_prefetch_resources("Condition")
print(f"Active conditions: {len(conditions)}")

# Handle errors and track data provenance
if doc.fhir.operation_outcomes:
    for outcome in doc.fhir.operation_outcomes:
        print(f"Warning: {outcome.issue[0].diagnostics}")

# Access patient demographics
if doc.fhir.patient:
    print(f"Patient: {doc.fhir.patient.name[0].given[0]} {doc.fhir.patient.name[0].family}")
```
**Technical Notes:**

- All FHIR resources are validated using the `fhir.resources` library
- Document relationships follow the FHIR `DocumentReference.relatesTo` standard
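For context, the validation behaviour comes from the `fhir.resources` models themselves. The sketch below is not HealthChain-specific; it assumes a pydantic-v2-based `fhir.resources` release, where required fields such as `Condition.subject` are enforced at construction time.

```python
# Minimal illustration of the underlying validation layer (fhir.resources,
# pydantic v2); not a HealthChain API
from pydantic import ValidationError
from fhir.resources.condition import Condition

try:
    # Condition.subject is required, so this payload is rejected
    Condition.model_validate({"resourceType": "Condition"})
except ValidationError as exc:
    print(f"Invalid resource: {exc.errors()[0]['msg']}")
```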
### NLP Component (`doc.nlp`)

Process clinical text with medical NLP models and access extracted features:

- `get_tokens()`: Tokenized clinical text for downstream processing
- `get_entities()`: Medical entities with optional CUI codes (SNOMED CT, RxNorm)
- `get_embeddings()`: Vector representations for similarity search and clustering
- `get_spacy_doc()`: Direct access to the spaCy document for custom processing
- `word_count()`: Token-based word count
#### Example: Medical Entity Extraction

```python
from healthchain.io import Document

# Extract medical concepts from clinical note
doc = Document("Patient diagnosed with pneumonia, started on azithromycin")

# Get medical entities
entities = doc.nlp.get_entities()
for entity in entities:
    print(f"{entity.text}: {entity.label_}")  # "pneumonia: CONDITION"

# Access full spaCy document for custom processing
spacy_doc = doc.nlp.get_spacy_doc()
for ent in spacy_doc.ents:
    if hasattr(ent._, "cui"):
        print(f"{ent.text} -> SNOMED: {ent._.cui}")
```
### Clinical Decision Support (`doc.cds`)

Generate CDS Hooks cards and actions for real-time EHR integration:

- `cards`: Clinical recommendation cards displayed in EHR workflows
- `actions`: Suggested interventions (orders, referrals, documentation)
#### Example: CDS Hooks Response

```python
from healthchain.models import Card, Action

# Generate clinical alert
doc.cds.cards = [
    Card(
        summary="Drug interaction detected",
        indicator="critical",
        detail="Warfarin + NSAIDs: Increased bleeding risk",
        source={"label": "Clinical Decision Support"},
    )
]

# Suggest action
doc.cds.actions = [
    Action(
        type="create",
        description="Order CBC to monitor platelets",
        resource={
            "resourceType": "ServiceRequest",
            "code": {"text": "Complete Blood Count"}
        }
    )
]
```
### Model Outputs (`doc.models`)

Store and retrieve ML model predictions across multiple frameworks:

- `get_output(model_name, task)`: Retrieve model predictions by name and task
- `get_generated_text(model_name, task)`: Extract generated text from LLMs
- Supports Hugging Face, LangChain, spaCy, and custom models
#### Example: Multi-Model Pipeline

```python
# Store classification results
doc.models.add_output(
    model_name="clinical_classifier",
    task="diagnosis_prediction",
    output={"prediction": "diabetes", "confidence": 0.95}
)

# Store LLM summary
doc.models.add_output(
    model_name="gpt4",
    task="summarization",
    output="Patient presents with classic diabetic symptoms..."
)

# Retrieve outputs
diagnosis = doc.models.get_output("clinical_classifier", "diagnosis_prediction")
summary = doc.models.get_generated_text("gpt4", "summarization")
```
### Example: Complete Clinical Workflow

```python
from healthchain.io import Document
from healthchain.fhir import create_condition
from healthchain.models import Card, Action

# Initialize with clinical note from EHR
doc = Document("67yo M presents with acute chest pain radiating to left arm, diaphoresis")

# Process with NLP model
print(f"Clinical note length: {doc.nlp.word_count()} words")
entities = doc.nlp.get_entities()

# Extract FHIR conditions from text
spacy_doc = doc.nlp.get_spacy_doc()
for ent in spacy_doc.ents:
    if ent.label_ == "CONDITION" and hasattr(ent._, "cui"):
        doc.fhir.problem_list.append(
            create_condition(
                subject="Patient/123",
                code=ent._.cui,
                display=ent.text
            )
        )

# Or use helper method for automatic extraction
doc.update_problem_list_from_nlp()

# Generate CDS alert based on findings
doc.cds.cards = [
    Card(
        summary="STEMI Alert - Activate Cath Lab",
        indicator="critical",
        detail="Patient meets criteria for ST-elevation myocardial infarction",
        source={"label": "Cardiology Protocol"},
    )
]

# Track model predictions
doc.models.add_output(
    model_name="cardiac_risk_model",
    task="classification",
    output={"risk_level": "high", "score": 0.89}
)

# Access all components
print(f"Problem list: {len(doc.fhir.problem_list)} conditions")
print(f"CDS cards: {len(doc.cds.cards)} alerts")
print(f"Risk assessment: {doc.models.get_output('cardiac_risk_model', 'classification')}")
```
## Tabular 📊

The `Tabular` class handles structured healthcare data like lab results, patient cohorts, and claims data. It wraps a pandas `DataFrame` with healthcare-specific operations.
### Example: Patient Cohort Analysis

```python
import pandas as pd
from healthchain.io.containers import Tabular

# Load patient cohort data
df = pd.DataFrame({
    'patient_id': ['P001', 'P002', 'P003'],
    'age': [45, 62, 58],
    'diagnosis': ['diabetes', 'hypertension', 'diabetes'],
    'hba1c': [7.2, None, 8.1]
})
cohort = Tabular(df)

# Analyze cohort characteristics
print(f"Cohort size: {cohort.row_count()} patients")
print(f"Average age: {cohort.data['age'].mean():.1f} years")
print(f"\nClinical measures:\n{cohort.describe()}")

# Filter for diabetic patients
diabetic_cohort = cohort.data[cohort.data['diagnosis'] == 'diabetes']
print(f"\nDiabetic patients: {len(diabetic_cohort)}")
print(f"Mean HbA1c: {diabetic_cohort['hba1c'].mean():.1f}%")

# Export for reporting
cohort.to_csv('patient_cohort_analysis.csv')
```
### Example: Lab Results Processing

```python
from healthchain.io.containers import Tabular

# Load lab results from EHR export
labs = Tabular.from_csv('lab_results.csv')
print(f"Total lab orders: {labs.row_count()}")
print(f"Test types: {labs.data['test_name'].nunique()}")

# Identify abnormal results
abnormal = labs.data[labs.data['flag'] == 'ABNORMAL']
print(f"Abnormal results: {len(abnormal)} ({len(abnormal)/labs.row_count()*100:.1f}%)")
```
These containers provide a consistent, FHIR-aware interface for healthcare data processing throughout HealthChain pipelines, handling validation, conversion, and integration with clinical workflows automatically.