Data Generator
Healthcare systems use standardized data formats, but each hospital or clinic configures its data differently. This makes it hard to build applications that work across multiple healthcare systems.
The data generator creates test data that matches the structure and format expected by Electronic Health Record (EHR) systems. It's designed for testing your applications, not for research studies that need realistic patient populations.
According to the UK ONS synthetic data classification, HealthChain generates "level 1: synthetic structural data", meaning data that follows the correct format but contains fictional information.

CDS Data Generator
The .generate_prefetch() method returns a Prefetch model whose prefetch field is populated with a dictionary of FHIR resources. Each key in the dictionary corresponds to a FHIR resource type, and the value is a list of FHIR resources of that type. For more information, check out the CDS Hooks documentation.
For each workflow, a pre-configured list of FHIR resources is randomly generated and placed in the prefetch field of a CDSRequest.
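To make the shape of such a request concrete, here is a minimal sketch of a CDS Hooks request body with a populated prefetch field, built from plain dictionaries. The field names `hook`, `hookInstance`, `context`, and `prefetch` come from the CDS Hooks specification; the helper `build_cds_request`, the prefetch keys, and the placeholder resources are illustrative assumptions and do not show HealthChain's internal CDSRequest model.

```python
import json
import uuid


def build_cds_request(hook: str, context: dict, prefetch: dict) -> dict:
    """Assemble a CDS Hooks style request body (hypothetical helper)."""
    return {
        "hook": hook,
        "hookInstance": str(uuid.uuid4()),  # unique ID for this hook invocation
        "context": context,
        "prefetch": prefetch,
    }


# Placeholder resources (assumed values); generated resources carry more fields.
example_prefetch = {
    "patient": {"resourceType": "Patient", "id": "example-patient"},
    "encounter": {
        "resourceType": "Encounter",
        "id": "example-encounter",
        "status": "finished",
    },
}

request_body = build_cds_request(
    hook="encounter-discharge",
    context={
        "userId": "Practitioner/example",
        "patientId": "example-patient",
        "encounterId": "example-encounter",
    },
    prefetch=example_prefetch,
)

print(json.dumps(request_body, indent=2))
```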
Currently implemented workflows:
| Workflow | Implementation Completeness | Generated Synthetic Resources |
|---|---|---|
| patient-view | | Patient, Encounter (Future: MedicationStatement, AllergyIntolerance) |
| encounter-discharge | | Patient, Encounter, Procedure, MedicationRequest, Optional DocumentReference |
| order-sign | Partial | Future: MedicationRequest, ProcedureRequest, ServiceRequest |
| order-select | Partial | Future: MedicationRequest, ProcedureRequest, ServiceRequest |
For more information on CDS workflows, see the CDS Hooks Protocol documentation.
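The table above can be read as a mapping from a workflow to the resource types generated for it. The sketch below illustrates that idea in plain Python. `WORKFLOW_RESOURCES` and `generate_stub_prefetch` are hypothetical names, the stub resources hold only a `resourceType` and a random `id`, and nothing here reflects how `CdsDataGenerator` is implemented internally.

```python
import random
from typing import Optional

# Resource types per workflow, copied from the table above (hypothetical structure).
WORKFLOW_RESOURCES = {
    "patient-view": ["Patient", "Encounter"],
    "encounter-discharge": ["Patient", "Encounter", "Procedure", "MedicationRequest"],
}


def generate_stub_prefetch(workflow: str, seed: Optional[int] = None) -> dict:
    """Return one stub resource per configured type, keyed by resource type."""
    if workflow not in WORKFLOW_RESOURCES:
        raise ValueError(f"No resources configured for workflow: {workflow}")
    rng = random.Random(seed)  # seeding makes the generated ids reproducible
    return {
        resource_type: [
            {"resourceType": resource_type, "id": f"{rng.getrandbits(32):08x}"}
        ]
        for resource_type in WORKFLOW_RESOURCES[workflow]
    }


stub_prefetch = generate_stub_prefetch("encounter-discharge", seed=42)
print(list(stub_prefetch))
```

Passing the same seed twice yields identical stubs, which is the property that makes seeded test data useful for repeatable tests.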
You can use the data generator with SandboxClient.load_free_text() or standalone:
```python
from healthchain.sandbox import SandboxClient

# Create client
client = SandboxClient(
    api_url="http://localhost:8000",
    endpoint="/cds/cds-services/my-service",
    workflow="encounter-discharge"
)

# Generate FHIR data from clinical notes
client.load_free_text(
    csv_path="./data/discharge_notes.csv",
    column_name="text",
    workflow="encounter-discharge",
    random_seed=42
)

# Send the generated requests to the service
responses = client.send_requests()
```

```python
from healthchain.sandbox.generators import CdsDataGenerator
from healthchain.sandbox.workflows import Workflow

# Initialize data generator
data_generator = CdsDataGenerator()

# Generate FHIR resources for use case workflow
data_generator.set_workflow(Workflow.encounter_discharge)
prefetch = data_generator.generate_prefetch()

print(prefetch.model_dump())
# {
#     "prefetch": {
#         "encounter": {
#             "resourceType": ...
#         }
#     }
# }
```
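Once dumped to a dictionary, the prefetch can be inspected with ordinary Python. The following sketch counts the resources stored under each key. `summarize_prefetch` is a hypothetical helper and the sample dictionary is hand-written, not real generator output. As an assumption about the dumped shape, it accepts either a single resource or a list of resources per key.

```python
def summarize_prefetch(dumped: dict) -> dict:
    """Map each prefetch key to its resource types and resource count."""
    summary = {}
    for key, value in dumped.get("prefetch", {}).items():
        # Normalize a single resource to a one-element list
        resources = value if isinstance(value, list) else [value]
        resource_types = sorted({r.get("resourceType", "unknown") for r in resources})
        summary[key] = {"types": resource_types, "count": len(resources)}
    return summary


# Hand-written sample in the shape of a dumped prefetch (assumed values)
sample = {
    "prefetch": {
        "encounter": {"resourceType": "Encounter", "id": "enc-1"},
        "procedure": [
            {"resourceType": "Procedure", "id": "proc-1"},
            {"resourceType": "Procedure", "id": "proc-2"},
        ],
    }
}

print(summarize_prefetch(sample))
```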
Loading free-text
You can pass the free_text_csv parameter to the .generate_prefetch() method to load free-text sources, such as discharge summaries, into the data generator. The text is wrapped in a FHIR DocumentReference resource. (N.B. the text is currently placed directly in the resource attachment, although the FHIR specification expects attachment data to be base64 encoded.)
Each generation picks one text document at random from the CSV file.
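As an illustration of the steps described above, the sketch below picks a random row from a CSV source and wraps its text in a DocumentReference-shaped dictionary. It uses only the standard library. The helper names, the in-memory CSV contents, and the column name are assumptions, and unlike the current behavior noted above, it base64-encodes the attachment data.

```python
import base64
import csv
import io
import random
from typing import Optional


def pick_random_text(csv_file, column_name: str, seed: Optional[int] = None) -> str:
    """Pick the text of one random row from an open CSV file (hypothetical helper)."""
    rows = [row[column_name] for row in csv.DictReader(csv_file)]
    if not rows:
        raise ValueError("CSV contains no rows")
    return random.Random(seed).choice(rows)


def to_document_reference(text: str) -> dict:
    """Wrap text in a minimal DocumentReference-shaped dict with base64 data."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return {
        "resourceType": "DocumentReference",
        "status": "current",
        "content": [{"attachment": {"contentType": "text/plain", "data": encoded}}],
    }


# In-memory CSV standing in for a file on disk (assumed contents)
csv_data = io.StringIO(
    "text\nPatient discharged in stable condition.\nFollow up in two weeks.\n"
)
note = pick_random_text(csv_data, column_name="text", seed=42)
document = to_document_reference(note)
print(document["content"][0]["attachment"]["data"])
```

Decoding the attachment with `base64.b64decode` returns the original note text, so the round trip is lossless.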