Working with xmltodict in HealthChain
The HealthChain interoperability engine uses xmltodict to convert between XML and Python dictionaries. This guide explains key conventions to be aware of when working with the parsed data.
Why use xmltodict?
You may ask: why not use lxml, xml.etree.ElementTree, or some other decent library so you can work on the XML tree directly?
There are two main reasons:
- HealthChain uses Pydantic models extensively for validation and type checking, which work best with JSON-able data. We wanted to keep everything in the modern Python ecosystem while still being able to work with XML, which remains a very common format in healthcare.
- Developer experience: it's just easier to work with JSON than XML trees in Python 🤷‍♀️
The flow roughly looks like this:
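As a minimal sketch of that round trip (an illustration only, not HealthChain's actual internals; the sample document and variable names are made up for this example):

```python
import json

import xmltodict

# A tiny, hypothetical CDA-like document used only for illustration.
cda_xml = (
    '<ClinicalDocument xmlns="urn:hl7-org:v3">'
    "<title>Discharge Summary</title>"
    "</ClinicalDocument>"
)

# XML -> plain Python dict
parsed = xmltodict.parse(cda_xml)

# The result is JSON-able, so it can be passed to JSON-oriented tooling.
as_json = json.dumps(parsed)

# Work on the data through ordinary dict keys.
parsed["ClinicalDocument"]["title"] = "Updated Summary"

# dict -> XML
xml_out = xmltodict.unparse(parsed)
```

Because the parsed dictionary keeps the `@xmlns` attribute on the root element, the namespace declaration is carried through to the generated XML.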
Still with me? Cool. Let's dive into the key conventions to be aware of when working with the parsed data.
Key Conventions
Attribute Prefixes
XML attributes are prefixed with @.
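A small sketch of the attribute convention, assuming xmltodict's default settings; the element and code values are illustrative:

```python
import xmltodict

# An element with two attributes and no text content.
parsed = xmltodict.parse('<code code="55607006" displayName="Problem"/>')

code = parsed["code"]

# Each attribute name is stored under a key starting with "@".
code_value = code["@code"]
display_name = code["@displayName"]
```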
Text Content
Text content of elements is represented with #text.
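The sketch below, again assuming default xmltodict settings, shows where the `#text` key appears. Note that an element with text only (no attributes) parses to a plain string rather than a dictionary with `#text`:

```python
import xmltodict

# Text plus an attribute: the text is stored under "#text".
with_attr = xmltodict.parse('<name use="L">Jane Doe</name>')
name_text = with_attr["name"]["#text"]

# Text only: the element value is just the string.
text_only = xmltodict.parse("<title>Summary</title>")
title_text = text_only["title"]
```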
Lists vs Single Items
Repeated sibling elements with the same name become a list, whereas a single element with that name stays a plain value.
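A minimal sketch of this behavior under default settings (the `section` and `entry` names are only for illustration):

```python
import xmltodict

# Two sibling elements with the same name are collected into a list.
many = xmltodict.parse("<section><entry>a</entry><entry>b</entry></section>")
many_entries = many["section"]["entry"]

# With only one such element, the value is not wrapped in a list.
one = xmltodict.parse("<section><entry>a</entry></section>")
one_entry = one["section"]["entry"]
```

The shape of the parsed data therefore depends on how many elements the document happens to contain.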
Force List Parameter
When parsing, you can use the force_list parameter to make certain elements always parse as lists, even when there is only one.
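A short sketch using the `force_list` argument of `xmltodict.parse`; which element names HealthChain actually forces is not specified here, so `entry` is an assumption for the example:

```python
import xmltodict

xml = "<section><entry>a</entry></section>"

# Without force_list, a single <entry> parses to a plain string.
default = xmltodict.parse(xml)

# With force_list, <entry> is always a list, even with one element.
forced = xmltodict.parse(xml, force_list=("entry",))
forced_entries = forced["section"]["entry"]
```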
Namespaces
Namespaces are included in element names:
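As a sketch, assuming xmltodict's defaults (namespace processing left off), prefixes stay in the key names and `xmlns` declarations appear as ordinary `@`-prefixed attributes. The document below is a made-up fragment:

```python
import xmltodict

xml = (
    '<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:sdtc="urn:hl7-org:sdtc">'
    '<sdtc:raceCode code="2106-3"/>'
    "</ClinicalDocument>"
)

parsed = xmltodict.parse(xml)
doc = parsed["ClinicalDocument"]

# Namespace declarations are stored like any other attribute.
default_ns = doc["@xmlns"]
sdtc_ns = doc["@xmlns:sdtc"]

# The prefix is part of the element's key.
race_code = doc["sdtc:raceCode"]["@code"]
```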
Tips for Working with CDA Documents
- Remember to use the @ prefix for attributes
- Always check if an element might be a list before accessing it directly
- In Liquid, use ['string'] to access attributes with @ prefixes, e.g. act.entry.code['@code']
- When generating XML, make sure to include required namespaces
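One way to follow the list-checking tip is a small normalizing helper. The `as_list` function below is a hypothetical utility written for this example, not part of xmltodict or HealthChain:

```python
import xmltodict


def as_list(value):
    """Return value as a list: [] for None, unchanged if already a list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


one = xmltodict.parse('<section><entry code="A"/></section>')
many = xmltodict.parse('<section><entry code="A"/><entry code="B"/></section>')

# The same loop works whether the document has one entry or several.
codes_one = [e["@code"] for e in as_list(one["section"].get("entry"))]
codes_many = [e["@code"] for e in as_list(many["section"].get("entry"))]
```

Using `.get("entry")` together with the helper also covers the case where the element is missing entirely.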