Document 📄

The Document class is a container for working with both clinical text and structured healthcare data. It natively manages FHIR resources, runs NLP over raw notes, tracks clinical document relationships, stores decision support outputs, and holds LLM model predictions.

Use Document containers for clinical notes, discharge summaries, patient records, and any healthcare data that combines text with structured FHIR resources.

Usage

The main things you'll do with Document:

Store and update clinical notes and FHIR Bundles
Extract and manipulate diagnoses, meds, allergies, and documents
Run NLP to extract entities or embeddings from text
Generate & store CDS Hooks cards (recommendations, alerts)
Attach model predictions for downstream use

API Overview

Document has four key components (all accessible as attributes):

Attribute	For
`doc.fhir`	FHIR management—Clinical lists, Bundles, DocReference, patient info
`doc.nlp`	NLP features—entities, tokens, embeddings
`doc.cds`	Decision support—recommendation cards, actions
`doc.models`	ML/LLM outputs—store/retrieve predictions, generations

FHIR Data (`doc.fhir`)

Automatic FHIR Bundle creation and management
Resource type validation
Easy access to clinical data lists (e.g., problems, medications, allergies)
OperationOutcome and Provenance resources automatically extracted and accessible as doc.fhir.operation_outcomes and doc.fhir.provenances (removed from main bundle)

Convenience Accessors

Attribute	Description
`patient`	First Patient resource in the bundle (or `None`)
`patients`	List of Patient resources
`problem_list`	List of Condition resources (diagnoses, problems)
`medication_list`	List of MedicationStatement resources
`allergy_list`	List of AllergyIntolerance resources

Document Reference Management

Document relationship tracking (parent/child/sibling)
Attachment handling with base64 encoding
Document family retrieval

CDS Support

Support for CDS Hooks prefetch resources
Resource indexing by type

from healthchain.io import Document
from healthchain.fhir import (
    create_condition,
    create_document_reference,
)

# Initialize with clinical text from EHR
doc = Document("Patient presents with uncontrolled hypertension and Type 2 diabetes")

# Build problem list with SNOMED CT codes
doc.fhir.problem_list = [
    create_condition(
        subject="Patient/123",
        code="38341003",
        display="Hypertension"
    ),
    create_condition(
        subject="Patient/123",
        code="44054006",
        display="Type 2 diabetes mellitus"
    )
]

# Track document versions and amendments
initial_note = create_document_reference(
    data="Initial assessment: Patient presents with chest pain",
    content_type="text/plain",
    description="Initial ED note"
)
initial_id = doc.fhir.add_document_reference(initial_note)

# Add amended note
amended_note = create_document_reference(
    data="Amended: Patient presents with chest pain, ruling out cardiac etiology",
    content_type="text/plain",
    description="Amended ED note"
)
amended_id = doc.fhir.add_document_reference(
    amended_note,
    parent_id=initial_id,
    relationship_type="replaces"
)

# Retrieve document history for audit trail
family = doc.fhir.get_document_reference_family(amended_id)
print(f"Original note: {family['parents'][0].description}")


# Handle errors and track data provenance
if doc.fhir.operation_outcomes:
    for outcome in doc.fhir.operation_outcomes:
        print(f"Warning: {outcome.issue[0].diagnostics}")

# Access patient demographics
if doc.fhir.patient:
    print(f"Patient: {doc.fhir.patient.name[0].given[0]} {doc.fhir.patient.name[0].family}")

# Prepare data for CDS Hooks integration
prefetch = {
    "Condition": doc.fhir.problem_list,
    "MedicationStatement": doc.fhir.medication_list,
}
doc.fhir.prefetch_resources = prefetch

# CDS service can query prefetch data
conditions = doc.fhir.get_prefetch_resources("Condition")
print(f"Active conditions: {len(conditions)}")

NLP (`doc.nlp`)

Medical text features: tokens, entities (get_entities()), embeddings (get_embeddings())
Direct spaCy doc access, fast word counting

# Extract medical concepts from clinical note
doc = Document("Patient diagnosed with pneumonia, started on azithromycin")

# Get medical entities
entities = doc.nlp.get_entities()
for entity in entities:
    print(f"{entity.text}: {entity.label_}")  # "pneumonia: CONDITION"

# Access full spaCy document for custom processing
spacy_doc = doc.nlp.get_spacy_doc()
for ent in spacy_doc.ents:
    if hasattr(ent._, "cui"):
        print(f"{ent.text} -> SNOMED: {ent._.cui}")

Clinical Decision Support (`doc.cds`)

cards: Clinical recommendation cards displayed in EHR workflows
actions: Suggested interventions (orders, referrals, documentation)

from healthchain.models import Card, Action

# Generate clinical alert
doc.cds.cards = [
    Card(
        summary="Drug interaction detected",
        indicator="critical",
        detail="Warfarin + NSAIDs: Increased bleeding risk",
        source={"label": "Clinical Decision Support"},
    )
]

# Suggest action
doc.cds.actions = [
    Action(
        type="create",
        description="Order CBC to monitor platelets",
        resource={
            "resourceType": "ServiceRequest",
            "code": {"text": "Complete Blood Count"}
        }
    )
]

LLM Model Outputs (`doc.models`)

get_output(model_name, task): Retrieve model predictions by name and task
get_generated_text(model_name, task): Extract generated text from LLMs
Supports Hugging Face, LangChain, spaCy, and custom models

# Store classification results
doc.models.add_output(
    model_name="clinical_classifier",
    task="diagnosis_prediction",
    output={"prediction": "diabetes", "confidence": 0.95}
)

# Store LLM summary
doc.models.add_output(
    model_name="gpt4",
    task="summarization",
    output="Patient presents with classic diabetic symptoms..."
)

# Retrieve outputs
diagnosis = doc.models.get_output("clinical_classifier", "diagnosis_prediction")
summary = doc.models.get_generated_text("gpt4", "summarization")

Properties and Methods

# FHIR access
print(doc.fhir.problem_list)
print(doc.fhir.patient)

# NLP
tokens = doc.nlp.get_tokens()
ents = doc.nlp.get_entities()

# Clinical decision support
cards = doc.cds.cards

# Model outputs
doc.models.add_output("my_model", "task", output={"foo": "bar"})
print(doc.models.get_output("my_model", "task"))

Resource Docs

API Reference

See Document API Reference for full details.