Document 📄
The Document class is a container for working with both clinical text and structured healthcare data. It natively manages FHIR resources, runs NLP over raw notes, tracks clinical document relationships, stores decision support outputs, and holds LLM model predictions.
Use Document containers for clinical notes, discharge summaries, patient records, and any healthcare data that combines text with structured FHIR resources.
Usage
The main things you'll do with Document:
- Store and update clinical notes and FHIR Bundles
- Extract and manipulate diagnoses, meds, allergies, and documents
- Run NLP to extract entities or embeddings from text
- Generate & store CDS Hooks cards (recommendations, alerts)
- Attach model predictions for downstream use
API Overview
Document has four key components (all accessible as attributes):
| Attribute | For |
|---|---|
| doc.fhir | FHIR management: clinical lists, Bundles, DocReference, patient info |
| doc.nlp | NLP features: entities, tokens, embeddings |
| doc.cds | Decision support: recommendation cards, actions |
| doc.models | ML/LLM outputs: store/retrieve predictions, generations |
FHIR Data (doc.fhir)
- Automatic FHIR Bundle creation and management
- Resource type validation
- Easy access to clinical data lists (e.g., problems, medications, allergies)
- OperationOutcome and Provenance resources automatically extracted and accessible as doc.fhir.operation_outcomes and doc.fhir.provenances (removed from main bundle)
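To make the last point concrete, here is a minimal sketch of what "extracted and removed from the main bundle" could look like on a plain-dict FHIR Bundle. It is not HealthChain's implementation: `split_bundle` and `EXTRACTED_TYPES` are hypothetical names, and the sketch assumes resources are plain dictionaries rather than validated FHIR models.

```python
from typing import Any, Dict, List, Tuple

Resource = Dict[str, Any]

# Resource types that are pulled out of the main bundle (assumption based on the list above).
EXTRACTED_TYPES = ("OperationOutcome", "Provenance")


def split_bundle(bundle: Resource) -> Tuple[Resource, Dict[str, List[Resource]]]:
    """Return (bundle without the extracted types, extracted resources grouped by type)."""
    extracted: Dict[str, List[Resource]] = {t: [] for t in EXTRACTED_TYPES}
    kept = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        resource_type = resource.get("resourceType")
        if resource_type in extracted:
            extracted[resource_type].append(resource)
        else:
            kept.append(entry)
    # Build a new bundle dict so the caller's input is not mutated.
    return {**bundle, "entry": kept}, extracted


bundle = {
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {"resource": {"resourceType": "Condition", "id": "c1"}},
        {
            "resource": {
                "resourceType": "OperationOutcome",
                "issue": [
                    {
                        "severity": "warning",
                        "code": "processing",
                        "diagnostics": "Partial result",
                    }
                ],
            }
        },
        {"resource": {"resourceType": "Provenance", "id": "p1"}},
    ],
}

main_bundle, extracted = split_bundle(bundle)
print(len(main_bundle["entry"]))
print(extracted["OperationOutcome"][0]["issue"][0]["diagnostics"])
```

Keeping the operational resources in separate lists means code that iterates over clinical data does not need to filter them out on every pass.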
Convenience Accessors
| Attribute | Description |
|---|---|
| patient | First Patient resource in the bundle (or None) |
| patients | List of Patient resources |
| problem_list | List of Condition resources (diagnoses, problems) |
| medication_list | List of MedicationStatement resources |
| allergy_list | List of AllergyIntolerance resources |
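Because patient can be None, and FHIR treats a Patient's name, given, and family fields as optional, code that reads demographics benefits from a defensive helper. The sketch below works on a plain dictionary shaped like a FHIR Patient; `display_name` is a hypothetical helper, not part of HealthChain, and real accessor results may be model objects rather than dicts.

```python
from typing import Any, Dict, Optional


def display_name(patient: Optional[Dict[str, Any]]) -> str:
    """Format a Patient's first recorded name, tolerating missing fields."""
    if not patient:
        return "Unknown patient"
    names = patient.get("name") or []
    if not names:
        return "Unnamed patient"
    first = names[0]
    given = " ".join(first.get("given") or [])
    family = first.get("family") or ""
    full = f"{given} {family}".strip()
    # Fall back to the free-text form of the name if no parts are present.
    return full or first.get("text") or "Unnamed patient"


print(display_name({"resourceType": "Patient", "name": [{"given": ["Ada"], "family": "Lovelace"}]}))
print(display_name({"resourceType": "Patient"}))
print(display_name(None))
```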
Document Reference Management
- Document relationship tracking (parent/child/sibling)
- Attachment handling with base64 encoding
- Document family retrieval
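As a rough illustration of the first two points, the sketch below builds DocumentReference-shaped dictionaries by hand: the note text is stored base64-encoded in the attachment, and an amended note points at its parent through a FHIR R4-style relatesTo entry. `make_document_reference` and `read_text` are hypothetical helpers for illustration only; they do not validate the resource and are not HealthChain's own functions.

```python
import base64
from typing import Any, Dict, Optional


def make_document_reference(
    doc_id: str,
    text: str,
    content_type: str = "text/plain",
    parent_id: Optional[str] = None,
    relationship: str = "replaces",
) -> Dict[str, Any]:
    """Build a minimal DocumentReference-shaped dict (not a validated resource)."""
    resource: Dict[str, Any] = {
        "resourceType": "DocumentReference",
        "id": doc_id,
        "status": "current",
        "content": [
            {
                "attachment": {
                    "contentType": content_type,
                    # FHIR attachments carry inline data as base64.
                    "data": base64.b64encode(text.encode("utf-8")).decode("ascii"),
                }
            }
        ],
    }
    if parent_id is not None:
        # R4-style link from this document to the one it relates to.
        resource["relatesTo"] = [
            {
                "code": relationship,
                "target": {"reference": f"DocumentReference/{parent_id}"},
            }
        ]
    return resource


def read_text(resource: Dict[str, Any]) -> str:
    """Decode the first attachment back into text."""
    data = resource["content"][0]["attachment"]["data"]
    return base64.b64decode(data).decode("utf-8")


initial = make_document_reference("note-1", "Initial assessment: chest pain")
amended = make_document_reference(
    "note-2",
    "Amended: chest pain, ruling out cardiac etiology",
    parent_id="note-1",
)
print(read_text(amended))
print(amended["relatesTo"][0]["target"]["reference"])
```

Walking the relatesTo links in both directions is one way a parent/child/sibling "family" could be assembled from a set of such resources.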
CDS Support
- Support for CDS Hooks prefetch resources
- Resource indexing by type
from healthchain.io import Document
from healthchain.fhir import (
    create_condition,
    create_document_reference,
)

# Initialize with clinical text from EHR
doc = Document("Patient presents with uncontrolled hypertension and Type 2 diabetes")

# Build problem list with SNOMED CT codes
doc.fhir.problem_list = [
    create_condition(
        subject="Patient/123",
        code="38341003",
        display="Hypertension",
    ),
    create_condition(
        subject="Patient/123",
        code="44054006",
        display="Type 2 diabetes mellitus",
    ),
]

# Track document versions and amendments
initial_note = create_document_reference(
    data="Initial assessment: Patient presents with chest pain",
    content_type="text/plain",
    description="Initial ED note",
)
initial_id = doc.fhir.add_document_reference(initial_note)

# Add amended note that replaces the initial one
amended_note = create_document_reference(
    data="Amended: Patient presents with chest pain, ruling out cardiac etiology",
    content_type="text/plain",
    description="Amended ED note",
)
amended_id = doc.fhir.add_document_reference(
    amended_note,
    parent_id=initial_id,
    relationship_type="replaces",
)

# Retrieve document history for audit trail
family = doc.fhir.get_document_reference_family(amended_id)
print(f"Original note: {family['parents'][0].description}")

# Report any OperationOutcome issues extracted from the bundle
if doc.fhir.operation_outcomes:
    for outcome in doc.fhir.operation_outcomes:
        print(f"Warning: {outcome.issue[0].diagnostics}")

# Access patient demographics
if doc.fhir.patient:
    print(f"Patient: {doc.fhir.patient.name[0].given[0]} {doc.fhir.patient.name[0].family}")

# Prepare data for CDS Hooks integration
prefetch = {
    "Condition": doc.fhir.problem_list,
    "MedicationStatement": doc.fhir.medication_list,
}
doc.fhir.prefetch_resources = prefetch

# CDS service can query prefetch data
conditions = doc.fhir.get_prefetch_resources("Condition")
print(f"Active conditions: {len(conditions)}")
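For the "resource indexing by type" point, it can help to see how a raw CDS Hooks prefetch payload might be regrouped. In the CDS Hooks specification, prefetch keys are arbitrary names, and each value may be a single resource, a Bundle of search results, or null. The sketch below flattens such a payload into lists keyed by resourceType. `index_prefetch` is a hypothetical function working on plain dicts; how HealthChain performs this step internally is not shown in this document.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional

Resource = Dict[str, Any]


def index_prefetch(prefetch: Dict[str, Optional[Resource]]) -> Dict[str, List[Resource]]:
    """Group prefetch resources by resourceType, unpacking any Bundles."""
    index: Dict[str, List[Resource]] = defaultdict(list)
    for value in prefetch.values():
        if value is None:
            # The EHR could not supply this prefetch item.
            continue
        if value.get("resourceType") == "Bundle":
            resources = [e["resource"] for e in value.get("entry", []) if "resource" in e]
        else:
            resources = [value]
        for resource in resources:
            index[resource.get("resourceType", "Unknown")].append(resource)
    return dict(index)


raw_prefetch = {
    "patient": {"resourceType": "Patient", "id": "123"},
    "conditions": {
        "resourceType": "Bundle",
        "entry": [
            {"resource": {"resourceType": "Condition", "id": "c1"}},
            {"resource": {"resourceType": "Condition", "id": "c2"}},
        ],
    },
    "meds": None,
}

by_type = index_prefetch(raw_prefetch)
print(sorted(by_type))
print(len(by_type["Condition"]))
```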
NLP (doc.nlp)
- Medical text features: tokens, entities (get_entities()), embeddings (get_embeddings())
- Direct spaCy doc access, fast word counting
# Extract medical concepts from clinical note
doc = Document("Patient diagnosed with pneumonia, started on azithromycin")

# Get medical entities
entities = doc.nlp.get_entities()
for entity in entities:
    print(f"{entity.text}: {entity.label_}")  # e.g. "pneumonia: CONDITION"

# Access full spaCy document for custom processing
spacy_doc = doc.nlp.get_spacy_doc()
for ent in spacy_doc.ents:
    # The "cui" extension is only present if the pipeline registered it
    if hasattr(ent._, "cui"):
        print(f"{ent.text} -> SNOMED: {ent._.cui}")
Clinical Decision Support (doc.cds)
- cards: Clinical recommendation cards displayed in EHR workflows
- actions: Suggested interventions (orders, referrals, documentation)
from healthchain.models import Card, Action

# Generate clinical alert
doc.cds.cards = [
    Card(
        summary="Drug interaction detected",
        indicator="critical",
        detail="Warfarin + NSAIDs: Increased bleeding risk",
        source={"label": "Clinical Decision Support"},
    )
]

# Suggest action
doc.cds.actions = [
    Action(
        type="create",
        description="Order CBC to monitor platelets",
        resource={
            "resourceType": "ServiceRequest",
            "code": {"text": "Complete Blood Count"},
        },
    )
]
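For context on what a card looks like on the wire, the CDS Hooks specification requires each card to carry a summary of under 140 characters, an indicator that is one of info, warning, or critical, and a source with a label. The sketch below builds and serializes such a response with plain dictionaries. `build_card` is a hypothetical helper for illustration; it does not reproduce the validation that the Card model above performs.

```python
import json
from typing import Any, Dict, Optional

ALLOWED_INDICATORS = {"info", "warning", "critical"}
MAX_SUMMARY_CHARS = 140  # the spec asks for a summary under 140 characters


def build_card(
    summary: str,
    indicator: str,
    source_label: str,
    detail: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a CDS Hooks card as a plain dict, checking the required fields."""
    if not summary or len(summary) >= MAX_SUMMARY_CHARS:
        raise ValueError("summary must be non-empty and under 140 characters")
    if indicator not in ALLOWED_INDICATORS:
        raise ValueError(f"indicator must be one of {sorted(ALLOWED_INDICATORS)}")
    card: Dict[str, Any] = {
        "summary": summary,
        "indicator": indicator,
        "source": {"label": source_label},
    }
    if detail is not None:
        card["detail"] = detail
    return card


response = {
    "cards": [
        build_card(
            summary="Drug interaction detected",
            indicator="critical",
            source_label="Clinical Decision Support",
            detail="Warfarin + NSAIDs: Increased bleeding risk",
        )
    ]
}
print(json.dumps(response, indent=2))
```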
LLM Model Outputs (doc.models)
- get_output(model_name, task): Retrieve model predictions by name and task
- get_generated_text(model_name, task): Extract generated text from LLMs
- Supports Hugging Face, LangChain, spaCy, and custom models
# Store classification results
doc.models.add_output(
    model_name="clinical_classifier",
    task="diagnosis_prediction",
    output={"prediction": "diabetes", "confidence": 0.95},
)

# Store LLM summary
doc.models.add_output(
    model_name="gpt4",
    task="summarization",
    output="Patient presents with classic diabetic symptoms...",
)

# Retrieve outputs
diagnosis = doc.models.get_output("clinical_classifier", "diagnosis_prediction")
summary = doc.models.get_generated_text("gpt4", "summarization")
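One way to picture the (model_name, task) addressing used above is a two-level dictionary. The sketch below is a hypothetical stand-in, not HealthChain's container: `ModelOutputStore` is an invented name, and it assumes that a second add_output call for the same model and task overwrites the first, which this document does not confirm for the real class.

```python
from typing import Any, Dict


class ModelOutputStore:
    """Hypothetical store keyed by model name, then by task."""

    def __init__(self) -> None:
        self._outputs: Dict[str, Dict[str, Any]] = {}

    def add_output(self, model_name: str, task: str, output: Any) -> None:
        # Assumption: the latest output for a (model, task) pair wins.
        self._outputs.setdefault(model_name, {})[task] = output

    def get_output(self, model_name: str, task: str, default: Any = None) -> Any:
        # Missing models or tasks return the default instead of raising.
        return self._outputs.get(model_name, {}).get(task, default)


store = ModelOutputStore()
store.add_output(
    "clinical_classifier",
    "diagnosis_prediction",
    {"prediction": "diabetes", "confidence": 0.95},
)
print(store.get_output("clinical_classifier", "diagnosis_prediction"))
print(store.get_output("clinical_classifier", "summarization"))
```

Keying by both model and task lets several models contribute outputs for the same document without colliding, for example a classifier and a summarizer run in the same pipeline.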
Properties and Methods
# FHIR access
print(doc.fhir.problem_list)
print(doc.fhir.patient)
# NLP
tokens = doc.nlp.get_tokens()
ents = doc.nlp.get_entities()
# Clinical decision support
cards = doc.cds.cards
# Model outputs
doc.models.add_output("my_model", "task", output={"foo": "bar"})
print(doc.models.get_output("my_model", "task"))
Resource Docs
API Reference
See Document API Reference for full details.