
Parsers

Parsers are responsible for extracting structured data from various healthcare document formats. The module includes built-in parsers for common formats like CDA and HL7v2.

Available Parsers

| Parser | Description |
|---|---|
| CDAParser | Parses CDA XML documents into structured data |
| HL7v2Parser | Parses HL7v2 messages into structured data |

CDA Parser

The CDA Parser extracts data from Clinical Document Architecture (CDA) XML documents based on configured section identifiers.

Internally, it uses xmltodict to parse the XML into a dictionary, validates the dictionary with Pydantic, and then maps each entry to the section keys. See Working with xmltodict in HealthChain for more details.
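To make the '@' convention concrete, here is a minimal stdlib sketch that mimics how xmltodict maps XML to dictionaries. Attributes become '@'-prefixed keys, a child tag that appears once becomes a dict, and a repeated child tag becomes a list. This is why 'templateId' is a list while 'id' is a single dict in the parsed output later on this page. The helper element_to_dict is hypothetical and is neither HealthChain's code nor xmltodict itself. Namespace handling and xmltodict's options (such as force_list) are deliberately left out.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an element using xmltodict-like conventions (simplified sketch)."""
    # Attributes become '@'-prefixed keys
    result = {f"@{name}": value for name, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated tag: promote the existing value to a list
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    text = (elem.text or "").strip()
    if text and result:
        # Mixed content: text sits alongside attributes/children
        result["#text"] = text
    elif text:
        # Text-only element collapses to a plain string
        return text
    # Empty elements map to None
    return result or None


sample = """
<act classCode="ACT" moodCode="EVN">
  <templateId root="2.16.840.1.113883.10.20.1.27"/>
  <templateId root="1.3.6.1.4.1.19376.1.5.3.1.4.5.1"/>
  <statusCode code="active"/>
</act>
"""
root = ET.fromstring(sample.strip())
parsed = {root.tag: element_to_dict(root)}
print(parsed)
```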

Each extracted entry is mapped to the name of the corresponding section configuration file, which serves as its section_key. The configuration file contains the section identifiers used to locate the correct section entries.

The parsed data is organized as {<section_key>: [<section_entries>]}, with one list of entries per configured section.
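The following minimal sketch shows how a configuration file name can become the section_key and how entries end up grouped under it. The config paths (for example configs/cda/sections/problems.yaml) and the helpers section_key_from_config and build_sections are hypothetical and are not HealthChain functions. Because xmltodict returns a single entry as a dict rather than a list, the sketch normalizes that case.

```python
from pathlib import Path


def section_key_from_config(config_path):
    """Hypothetical helper: the config file name (minus extension) is the key."""
    return Path(config_path).stem


def build_sections(extracted):
    """Group entries as {<section_key>: [<section_entries>]}.

    `extracted` maps a (hypothetical) config file path to the entries
    found for that section.
    """
    sections = {}
    for config_path, entries in extracted.items():
        key = section_key_from_config(config_path)
        # A lone xmltodict entry arrives as a dict; normalize it to a list
        if isinstance(entries, dict):
            entries = [entries]
        sections.setdefault(key, []).extend(entries)
    return sections


extracted = {
    "configs/cda/sections/problems.yaml": {"act": {"@classCode": "ACT"}},
    "configs/cda/sections/medications.yaml": [
        {"substanceAdministration": {"@classCode": "SBADM"}},
    ],
}
sections = build_sections(extracted)
print(sections["problems"])
```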

(See the full documentation on configuration for details.)

Usage Examples

from healthchain.interop import create_engine, FormatType

# Create an engine
engine = create_engine()

# Parse a CDA document directly to FHIR
with open("tests/data/test_cda.xml", "r") as f:
    cda_xml = f.read()

fhir_resources = engine.to_fhir(cda_xml, src_format=FormatType.CDA)

# Access the CDA parser directly (advanced use case)
cda_parser = engine.cda_parser
sections = cda_parser.parse_document(cda_xml)

# Extract problems section data
problems = sections.get("problems", [])
# Each parsed CDA section entry is in xmltodict format; attribute keys are prefixed with '@'
# {
#   "act": {
#     "@classCode": "ACT",
#     "@moodCode": "EVN",
#     ...
#   }
# }
Full parsed output. Note how the original XML structure is preserved in dictionary form, with '@' marking attributes:
[{
  'act': {
    '@classCode': 'ACT',
    '@moodCode': 'EVN',
    'templateId': [
      {'@root': '2.16.840.1.113883.10.20.1.27'},
      {'@root': '1.3.6.1.4.1.19376.1.5.3.1.4.5.1'},
      {'@root': '1.3.6.1.4.1.19376.1.5.3.1.4.5.2'},
      {'@root': '2.16.840.1.113883.3.88.11.32.7'},
      {'@root': '2.16.840.1.113883.3.88.11.83.7'}
    ],
    'id': {
      '@extension': '51854-concern',
      '@root': '1.2.840.114350.1.13.525.3.7.2.768076'
    },
    'code': {
      '@nullFlavor': 'NA'
    },
    'text': {
      'reference': {'@value': '#problem12'}
    },
    'statusCode': {
      '@code': 'active'
    },
    'effectiveTime': {
      'low': {'@value': '20210317'}
    },
    'entryRelationship': {
      '@typeCode': 'SUBJ',
      '@inversionInd': False,
      'observation': {
        '@classCode': 'OBS',
        '@moodCode': 'EVN',
        'templateId': [
          {'@root': '1.3.6.1.4.1.19376.1.5.3.1.4.5'},
          {'@root': '2.16.840.1.113883.10.20.1.28'}
        ],
        'id': {
          '@extension': '51854',
          '@root': '1.2.840.114350.1.13.525.3.7.2.768076'
        },
        'code': {
          '@code': '64572001',
          '@codeSystem': '2.16.840.1.113883.6.96',
          '@codeSystemName': 'SNOMED CT'
        },
        'text': {
          'reference': {'@value': '#problem12name'}
        },
        'statusCode': {
          '@code': 'completed'
        },
        'effectiveTime': {
          'low': {'@value': '20190517'}
        },
        'value': {
          '@code': '38341003',
          '@codeSystem': '2.16.840.1.113883.6.96',
          '@codeSystemName': 'SNOMED CT',
          '@xsi:type': 'CD',
          '@xmlns:xsi': 'http://www.w3.org/2001/XMLSchema-instance',
          'originalText': {
            'reference': {'@value': '#problem12name'}
          }
        },
        'entryRelationship': {
          '@typeCode': 'REFR',
          '@inversionInd': False,
          'observation': {
            '@classCode': 'OBS',
            '@moodCode': 'EVN',
            'templateId': [
              {'@root': '2.16.840.1.113883.10.20.1.50'},
              {'@root': '2.16.840.1.113883.10.20.1.57'},
              {'@root': '1.3.6.1.4.1.19376.1.5.3.1.4.1.1'}
            ],
            'code': {
              '@code': '33999-4',
              '@codeSystem': '2.16.840.1.113883.6.1',
              '@displayName': 'Status'
            },
            'statusCode': {
              '@code': 'completed'
            },
            'effectiveTime': {
              'low': {'@value': '20190517'}
            },
            'value': {
              '@code': '55561003',
              '@codeSystem': '2.16.840.1.113883.6.96',
              '@xsi:type': 'CE',
              '@displayName': 'Active',
              '@xmlns:xsi': 'http://www.w3.org/2001/XMLSchema-instance'
            }
          }
        }
      }
    }
  }
}]
This data structure represents a problem (condition) entry from a CDA document, containing:
- A problem act with template IDs and status
- An observation with clinical details (SNOMED code 38341003 - Hypertension)
- Status information (Active)
- Dates (onset date: May 17, 2019)
The generator then processes this data structure and maps it to the configured FHIR resource.
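To show how the fields listed above can be read from the parsed dictionary, here is a hand-written sketch that walks a trimmed copy of the entry. It is not how the generator works. It assumes each entryRelationship holds a single dict, as in this example; with several relationships, xmltodict would produce a list instead. summarize_problem is a hypothetical helper.

```python
from datetime import datetime

# A trimmed copy of the parsed problem entry shown above
entry = {
    "act": {
        "statusCode": {"@code": "active"},
        "entryRelationship": {
            "observation": {
                "effectiveTime": {"low": {"@value": "20190517"}},
                "value": {
                    "@code": "38341003",
                    "@codeSystem": "2.16.840.1.113883.6.96",
                    "@codeSystemName": "SNOMED CT",
                },
                "entryRelationship": {
                    "observation": {
                        "value": {"@code": "55561003", "@displayName": "Active"},
                    }
                },
            }
        },
    }
}


def summarize_problem(entry):
    """Hypothetical helper: pull the clinical fields out of a parsed problem entry."""
    # Assumes single (dict-valued) entryRelationships, as in this example
    problem_obs = entry["act"]["entryRelationship"]["observation"]
    status_obs = problem_obs["entryRelationship"]["observation"]
    # CDA dates such as 20190517 use the YYYYMMDD layout
    onset = datetime.strptime(
        problem_obs["effectiveTime"]["low"]["@value"], "%Y%m%d"
    ).date()
    return {
        "code": problem_obs["value"]["@code"],
        "code_system": problem_obs["value"]["@codeSystemName"],
        "status": status_obs["value"]["@displayName"],
        "onset": onset.isoformat(),
    }


print(summarize_problem(entry))
```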

Section Configuration

Sections make up the structure of a CDA document. The CDA parser uses identifiers in the section configuration file to determine which sections to extract and map to FHIR resources. Each section is identified by a template ID, code, or both:

# Example section configuration
cda:
  sections:
    problems:
      identifiers:
        template_id: "2.16.840.1.113883.10.20.1.11"
        code: "11450-4"
      resource: "Condition"
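The sketch below shows one way identifiers like those above could be matched against a parsed section. It assumes that every identifier present in the config must match, so a section configured with both template_id and code needs both. That rule, and the helpers as_list and section_matches, are assumptions for illustration only and do not describe HealthChain's actual matching logic. The config is the YAML example rewritten as a Python dict.

```python
def as_list(value):
    """xmltodict yields a dict for one element and a list for several."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def section_matches(section, identifiers):
    """Hypothetical check: True if a parsed section matches the identifiers.

    Assumption: every identifier present in the config must match.
    """
    template_id = identifiers.get("template_id")
    code = identifiers.get("code")
    if template_id is None and code is None:
        return False
    if template_id is not None:
        roots = {t.get("@root") for t in as_list(section.get("templateId"))}
        if template_id not in roots:
            return False
    if code is not None:
        if (section.get("code") or {}).get("@code") != code:
            return False
    return True


# The YAML example above, expressed as a Python dict
config = {
    "problems": {
        "identifiers": {
            "template_id": "2.16.840.1.113883.10.20.1.11",
            "code": "11450-4",
        },
        "resource": "Condition",
    }
}
section = {
    "templateId": {"@root": "2.16.840.1.113883.10.20.1.11"},
    "code": {"@code": "11450-4", "@codeSystem": "2.16.840.1.113883.6.1"},
}
matched = [
    key for key, cfg in config.items()
    if section_matches(section, cfg["identifiers"])
]
print(matched)  # ['problems']
```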

Creating a Custom Parser

You can create a custom parser by implementing a class that inherits from BaseParser and registering it with the engine (this will replace the default parser for the format type):

from healthchain.interop import create_engine, FormatType
from healthchain.interop.config_manager import InteropConfigManager
from healthchain.interop.parsers.base import BaseParser

class CustomParser(BaseParser):
    def __init__(self, config: InteropConfigManager):
        super().__init__(config)

    def from_string(self, data: str) -> dict:
        # Parse the document and return structured data
        return {"structured_data": "example"}

# Register the custom parser with the engine
engine = create_engine()
engine.register_parser(FormatType.CDA, CustomParser(engine.config))
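To illustrate what "replacing the default parser" means, here is a self-contained toy model of per-format parser registration. FormatType, BaseParser and ToyEngine below are simplified, hypothetical stand-ins rather than HealthChain's classes. They only show the pattern of keeping one parser per format type, where registering a new parser overwrites the previous one.

```python
from abc import ABC, abstractmethod
from enum import Enum


class FormatType(Enum):
    # Hypothetical stand-in for healthchain.interop.FormatType
    CDA = "cda"
    HL7V2 = "hl7v2"


class BaseParser(ABC):
    # Hypothetical stand-in: parsers turn a raw string into structured data
    @abstractmethod
    def from_string(self, data: str) -> dict: ...


class DefaultCDAParser(BaseParser):
    def from_string(self, data: str) -> dict:
        return {"parser": "default"}


class CustomParser(BaseParser):
    def from_string(self, data: str) -> dict:
        return {"parser": "custom"}


class ToyEngine:
    """Keeps one parser per format; registering a new one replaces the old."""

    def __init__(self):
        self._parsers = {FormatType.CDA: DefaultCDAParser()}

    def register_parser(self, format_type: FormatType, parser: BaseParser) -> None:
        if not isinstance(parser, BaseParser):
            raise TypeError("parser must inherit from BaseParser")
        self._parsers[format_type] = parser

    def parse(self, data: str, format_type: FormatType) -> dict:
        return self._parsers[format_type].from_string(data)


engine = ToyEngine()
print(engine.parse("<ClinicalDocument/>", FormatType.CDA))  # {'parser': 'default'}
engine.register_parser(FormatType.CDA, CustomParser())
print(engine.parse("<ClinicalDocument/>", FormatType.CDA))  # {'parser': 'custom'}
```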