Component
BaseComponent
Bases: Generic[T], ABC
Abstract base class for all components in the pipeline.
This class should be subclassed to create specific components. Subclasses must implement the __call__ method.
Source code in healthchain/pipeline/components/base.py
__call__(data)
abstractmethod
Process the input data and return the processed data.
PARAMETER | DESCRIPTION |
---|---|
data | The input data to be processed. |

RETURNS | DESCRIPTION |
---|---|
DataContainer[T] | The processed data. |
Source code in healthchain/pipeline/components/base.py
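For instance, a custom component can subclass BaseComponent and implement __call__. A minimal sketch (the import paths and the container's .data attribute are assumptions, not verified API):

```python
# Minimal sketch of a custom component. The import paths and the
# container's .data attribute are assumptions and may differ.
from healthchain.pipeline.components.base import BaseComponent
from healthchain.io.containers import DataContainer

class UppercaseComponent(BaseComponent[str]):
    """Uppercases the text held in the container."""

    def __call__(self, data: DataContainer[str]) -> DataContainer[str]:
        data.data = data.data.upper()
        return data
```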
Component
Bases: BaseComponent[T]
A concrete implementation of the BaseComponent class.
This class can be used as a base for creating specific components that do not require any additional processing logic.
METHOD | DESCRIPTION |
---|---|
__call__ | __call__(data: DataContainer[T]) -> DataContainer[T]: Process the input data and return the processed data. In this implementation, the input data is returned unmodified. |
Source code in healthchain/pipeline/components/base.py
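Because it returns its input unchanged, Component can serve as a placeholder while scaffolding a pipeline. A minimal sketch:

```python
# Component passes data through unchanged, so it can stand in for a
# yet-to-be-written processing step.
noop = Component[str]()
doc = noop(doc)  # doc is returned unmodified
```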
HFTransformer
Bases: BaseComponent[str]
A component that integrates Hugging Face transformers models into the pipeline.
This component allows using any Hugging Face model and task within the pipeline by wrapping the transformers.pipeline API. The model outputs are stored in the document's model_outputs container under the "huggingface" source key.
Note that this component is only recommended for non-conversational language tasks. For chat-based tasks, consider using LangChainLLM instead.
PARAMETER | DESCRIPTION |
---|---|
pipeline | A pre-configured HuggingFace pipeline object to use for inference. Must be an instance of transformers.pipelines.base.Pipeline. |
ATTRIBUTE | DESCRIPTION |
---|---|
task | The task name of the underlying pipeline, e.g. "sentiment-analysis", "ner". Automatically extracted from the pipeline object. |
RAISES | DESCRIPTION |
---|---|
ImportError | If the transformers package is not installed |
TypeError | If pipeline is not a valid HuggingFace Pipeline instance |
Example
```python
# Initialize for sentiment analysis
from transformers import pipeline

nlp = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
component = HFTransformer(pipeline=nlp)
doc = component(doc)  # Analyzes sentiment of doc.data

# Or use the factory method
component = HFTransformer.from_model_id(
    model="facebook/bart-large-cnn",
    task="summarization",
    max_length=130,
    min_length=30,
    do_sample=False,
)
doc = component(doc)  # Generates summary of doc.data
```
Source code in healthchain/pipeline/components/integrations.py
__call__(doc)
Process the document using the Hugging Face pipeline. Adds outputs to .model_outputs['huggingface'].
Source code in healthchain/pipeline/components/integrations.py
__init__(pipeline)
Initialize with a pre-configured HuggingFace pipeline.
PARAMETER | DESCRIPTION |
---|---|
pipeline | A pre-configured HuggingFace pipeline object from transformers.pipeline(). Must be an instance of transformers.pipelines.base.Pipeline. |

RAISES | DESCRIPTION |
---|---|
ImportError | If the transformers package is not installed |
TypeError | If pipeline is not a valid HuggingFace Pipeline instance |
Source code in healthchain/pipeline/components/integrations.py
from_model_id(model, task, **kwargs)
classmethod
Create a transformer component from a model identifier.
Factory method that initializes a HuggingFace pipeline with the specified model and task, then wraps it in a HFTransformer component.
PARAMETER | DESCRIPTION |
---|---|
model | The model identifier or path to load. Can be a model ID from the HuggingFace Hub (e.g. "bert-base-uncased") or a local path to a saved model. |
task | The task to run (e.g. "text-classification", "token-classification", "summarization"). |
**kwargs | Additional configuration options passed to transformers.pipeline(). Common options include device ("cpu", "cuda", etc.), batch_size, and model_kwargs (a dict of model-specific args). |
RETURNS | DESCRIPTION |
---|---|
HFTransformer | Initialized transformer component wrapping the pipeline. |
RAISES | DESCRIPTION |
---|---|
TypeError | If invalid kwargs are passed to pipeline initialization |
ValueError | If pipeline initialization fails for any other reason |
ImportError | If the transformers package is not installed |
Source code in healthchain/pipeline/components/integrations.py
LangChainLLM
Bases: BaseComponent[str]
A component that integrates LangChain chains into the pipeline.
This component allows using any LangChain chain within the pipeline by wrapping the chain's invoke method. The chain outputs are stored in the document's model_outputs container under the "langchain" source key.
PARAMETER | DESCRIPTION |
---|---|
chain | The LangChain chain to run on the document text. Must be a Runnable object from the LangChain library. |
task | The task name to use when storing outputs, e.g. "summarization", "chat". Used as the key to organize model outputs in the document's model container. |
**kwargs | Additional parameters forwarded directly to the chain's invoke() call. |
RAISES | DESCRIPTION |
---|---|
TypeError | If chain is not a LangChain Runnable object or if invalid kwargs are passed |
ValueError | If there is an error during chain invocation |
ImportError | If the langchain-core package is not installed |
Example
```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

chain = ChatPromptTemplate.from_template("What is {input}?") | ChatOpenAI()
component = LangChainLLM(chain=chain, task="chat")
doc = component(doc)  # Runs the chain on doc.data and stores the output
```
Source code in healthchain/pipeline/components/integrations.py
__call__(doc)
Process the document using the LangChain chain. Adds outputs to .model_outputs['langchain'].
Source code in healthchain/pipeline/components/integrations.py
__init__(chain, task, **kwargs)
Initialize with a LangChain chain.
Source code in healthchain/pipeline/components/integrations.py
SpacyNLP
Bases: BaseComponent[str]
A component that integrates spaCy models into the pipeline.
This component allows using any spaCy model within the pipeline by loading and applying it to process text documents. The spaCy doc outputs are stored in the document's nlp annotations container under .spacy_docs.
PARAMETER | DESCRIPTION |
---|---|
nlp | A pre-configured spaCy Language object. |
Example
```python
# Using a pre-configured pipeline
import spacy

nlp = spacy.load("en_core_web_sm", disable=["parser"])
component = SpacyNLP(nlp)
doc = component(doc)

# Or using a model name
component = SpacyNLP.from_model_id("en_core_web_sm", disable=["parser"])
doc = component(doc)
```
Source code in healthchain/pipeline/components/integrations.py
__call__(doc)
Process the document using the spaCy pipeline. Adds outputs to nlp.spacy_docs.
Source code in healthchain/pipeline/components/integrations.py
__init__(nlp)
Initialize with a pre-configured spaCy Language object.
from_model_id(model, **kwargs)
classmethod
Create a SpacyNLP component from a model identifier.
PARAMETER | DESCRIPTION |
---|---|
model | The name or path of the spaCy model to load. Can be a model name like 'en_core_web_sm' or a path to a saved model. |
**kwargs | Additional configuration options passed to spacy.load. Common options include disable, exclude, and enable. |
RETURNS | DESCRIPTION |
---|---|
SpacyNLP | Initialized spaCy component |
RAISES | DESCRIPTION |
---|---|
ImportError | If spaCy or the specified model is not installed |
TypeError | If invalid kwargs are passed to spacy.load |
Source code in healthchain/pipeline/components/integrations.py
requires_package(package_name, import_path)
Decorator to check if an optional package is available.
PARAMETER | DESCRIPTION |
---|---|
package_name | Name of the package to install (e.g., 'langchain-core') |
import_path | Import path to check (e.g., 'langchain_core.runnables') |
Source code in healthchain/pipeline/components/integrations.py
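A sketch of how such a guard decorator can be implemented (this is an illustration, not HealthChain's actual implementation):

```python
# Illustrative sketch only; HealthChain's actual implementation may differ.
import functools
import importlib

def requires_package(package_name: str, import_path: str):
    """Raise a helpful ImportError if an optional dependency is missing."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                importlib.import_module(import_path)
            except ImportError:
                raise ImportError(
                    f"'{import_path}' could not be imported. "
                    f"Install the optional dependency with: pip install {package_name}"
                )
            return func(*args, **kwargs)
        return wrapper
    return decorator

@requires_package("langchain-core", "langchain_core.runnables")
def build_chain():
    ...
```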
TextPreProcessor
Bases: BaseComponent[Document]
A component for preprocessing text documents.
This class applies various cleaning and tokenization steps to a Document object, based on the provided configuration.
ATTRIBUTE | DESCRIPTION |
---|---|
tokenizer | The tokenizer to use. Can be "basic" or a custom tokenization function that takes a string and returns a list of tokens. Defaults to "basic". |
lowercase | Whether to convert text to lowercase. Defaults to False. |
remove_punctuation | Whether to remove punctuation. Defaults to False. |
standardize_spaces | Whether to standardize spaces. Defaults to False. |
regex | List of regex patterns and replacements. Defaults to an empty list. |
tokenizer_func | The tokenization function. |
cleaning_steps | List of text cleaning functions. |
Source code in healthchain/pipeline/components/preprocessors.py
__call__(doc)
Preprocess the given Document.
This method applies the configured cleaning steps and tokenization to the document's text (in that order).
PARAMETER | DESCRIPTION |
---|---|
doc | The document to preprocess. |

RETURNS | DESCRIPTION |
---|---|
Document | The preprocessed document with updated tokens and preprocessed text. |
Source code in healthchain/pipeline/components/preprocessors.py
__init__(tokenizer='basic', lowercase=False, remove_punctuation=False, standardize_spaces=False, regex=None)
Initialize the TextPreProcessor with the given configuration.
PARAMETER | DESCRIPTION |
---|---|
tokenizer | The tokenizer to use. Can be "basic" or a custom tokenization function that takes a string and returns a list of tokens. Defaults to "basic". |
lowercase | Whether to convert text to lowercase. Defaults to False. |
remove_punctuation | Whether to remove punctuation. Defaults to False. |
standardize_spaces | Whether to standardize spaces. Defaults to False. |
regex | List of regex patterns and replacements. Defaults to None. |
Source code in healthchain/pipeline/components/preprocessors.py
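A minimal usage sketch (the (pattern, replacement) pair format for regex is an assumption; check the source for the exact format):

```python
# Minimal usage sketch. The (pattern, replacement) pair format for
# `regex` is an assumption.
preprocessor = TextPreProcessor(
    lowercase=True,
    remove_punctuation=True,
    standardize_spaces=True,
    regex=[(r"\d+", "<NUM>")],  # replace digit runs with a placeholder
)
doc = preprocessor(doc)  # cleaning steps run first, then tokenization
```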
TextPostProcessor
Bases: BaseComponent[Document]
A component for post-processing text documents, specifically for refining entities.
This class applies post-coordination rules to entities in a Document object, replacing entities with their refined versions based on a lookup dictionary.
ATTRIBUTE | DESCRIPTION |
---|---|
entity_lookup | A dictionary for entity refinement lookups. |
Source code in healthchain/pipeline/components/postprocessors.py
__call__(doc)
Apply post-processing to the given Document.
This method refines the entities in the document based on the entity_lookup. If an entity exists in the lookup, it is replaced with its refined version.
PARAMETER | DESCRIPTION |
---|---|
doc | The document to be post-processed. |

RETURNS | DESCRIPTION |
---|---|
Document | The post-processed document with refined entities. |
Note
If the entity_lookup is empty or the document has no 'entities' attribute, the document is returned unchanged.
Source code in healthchain/pipeline/components/postprocessors.py
__init__(postcoordination_lookup=None)
Initialize the TextPostProcessor with an optional postcoordination lookup.
PARAMETER | DESCRIPTION |
---|---|
postcoordination_lookup | A dictionary for entity refinement lookups. If not provided, an empty dictionary will be used. |
Source code in healthchain/pipeline/components/postprocessors.py
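A minimal usage sketch (the lookup entries are illustrative):

```python
# Minimal usage sketch; the lookup entries are illustrative.
postprocessor = TextPostProcessor(
    postcoordination_lookup={
        "high blood pressure": "hypertension",
        "heart attack": "myocardial infarction",
    }
)
doc = postprocessor(doc)  # entities found in the lookup are replaced
```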
CdsCardCreator
Bases: BaseComponent[str]
Component that creates CDS Hooks cards from model outputs or static content.
This component formats text into CDS Hooks cards that can be displayed in an EHR system.
It can create cards from either:
1. Model-generated text stored in a document's model outputs container
2. Static content provided during initialization
The component uses Jinja2 templates to format the text into valid CDS Hooks card JSON.
The generated cards are added to the document's CDS container.
PARAMETER | DESCRIPTION |
---|---|
template | (str, optional) Jinja2 template string for card creation. If not provided, uses a default template that creates an info card. |
template_path | (Union[str, Path], optional) Path to a Jinja2 template file. |
static_content | (str, optional) Static text to use instead of model output. |
source | (str, optional) Source framework to get model output from (e.g. "huggingface"). |
task | (str, optional) Task name to get model output from (e.g. "summarization"). |
delimiter | (str, optional) String to split model output into multiple cards. |
default_source | (Dict[str, Any], optional) Default source info for cards. Defaults to {"label": "Card Generated by HealthChain"}. |
Example

```python
# Create cards from model output
creator = CdsCardCreator(source="huggingface", task="summarization")
doc = creator(doc)  # Creates cards from model output

# Create cards with static content
creator = CdsCardCreator(static_content="Static card message")
doc = creator(doc)  # Creates card with static content

# Create cards with a custom template
template = '''
{
    "summary": "{{ model_output[:140] }}",
    "indicator": "info",
    "source": {{ default_source | tojson }},
    "detail": "{{ model_output }}"
}
'''
creator = CdsCardCreator(
    template=template,
    source="langchain",
    task="chat",
    delimiter="\n",
)
doc = creator(doc)  # Creates cards split by newlines
```
Source code in healthchain/pipeline/components/cdscardcreator.py
__call__(doc)
Process a document and create CDS Hooks cards from model outputs or static content.
Creates cards in one of two ways:
1. From model-generated text stored in the document's model outputs container, accessed using the configured source and task
2. From static content provided during initialization
The generated text can optionally be split into multiple cards using a delimiter. Each piece of text is formatted using the configured template into a CDS Hooks card and added to the document's CDS container.
PARAMETER | DESCRIPTION |
---|---|
doc | Document containing model outputs and CDS container |
RETURNS | DESCRIPTION |
---|---|
Document | The input document with generated CDS cards added to its CDS container |
RAISES | DESCRIPTION |
---|---|
ValueError | If neither model configuration (source and task) nor static content is provided for card creation |
Source code in healthchain/pipeline/components/cdscardcreator.py
create_card(content)
Creates a CDS Card using the template and model output.
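A minimal sketch of calling create_card directly (the content string is illustrative, and the return type, a CDS Hooks Card model, is an assumption):

```python
# Minimal sketch; the content string is illustrative and the return
# type (a CDS Hooks Card model) is an assumption.
creator = CdsCardCreator()
card = creator.create_card("Patient is due for an annual wellness visit.")
```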