Harmonization Framework

Installation

To install the harmonization framework, clone the repository, create a virtual environment with the required dependencies, and install the package into your python environment. Use of a virtual environment is recommended.

git clone git@github.com:bmir-radx/harmonization-framework.git

python3 -m venv venv 
source venv/bin/activate
pip install -r requirements.txt

pip install .

Usage

We recommend using the harmonization framework in an interactive Python environment like a Jupyter notebook. A demonstration is provided in demo/integration.ipynb.

Sidecar (local API service)

The package exposes a small sidecar entrypoint for running the FastAPI backend as a local service (intended to be launched by an Electron app).

Required environment variables:

API_PORT (required): port to bind.
API_HOST (optional): defaults to 127.0.0.1.

Example:

API_PORT=54321 API_HOST=127.0.0.1 harmonization-sidecar

When running, the health check is available at:

GET http://127.0.0.1:54321/health/

Graceful shutdown is supported via:

POST http://127.0.0.1:54321/shutdown/

Logs are written to stdout/stderr as JSON lines. Optionally, set API_LOG_PATH to also write logs to a file.

Sidecar packaging (CI)

The repository includes a GitHub Actions workflow that builds the sidecar executable for macOS, Windows, and Linux. The workflow outputs artifacts:

harmonization-sidecar-mac (tar.gz)
harmonization-sidecar-win (zip)
harmonization-sidecar-linux (tar.gz)

Artifacts are built under:

dist/sidecar/<os_short>/

Electron consumption (high level)

Electron should consume the per-OS artifact produced by the packaging workflow, unpack it into the app's bundled resources, and launch the sidecar binary at runtime. The launcher sets API_PORT (and optionally API_HOST) and then polls /health/ before issuing API calls.

See docs/electron_sidecar.md for the full packaging and launch guide.

Serialization Format

Harmonization rules and primitives serialize to JSON-friendly dictionaries with a consistent schema:

Rule shape:
- source (string)
- target (string)
- operations (list of operation dicts)
Operation shape:
- operation (snake_case identifier)
- other fields are snake_case
- numeric values are serialized as numbers (not strings)

Example:

{
  "source": "height_in",
  "target": "height_cm",
  "operations": [
    {"operation": "convert_units", "source_unit": "inch", "target_unit": "cm"},
    {"operation": "round", "precision": 1}
  ]
}

Primitives Reference

The table below lists the available primitives, their purpose, and settings. All settings are provided in the operation dict for rule serialization.

Operation	Purpose	Settings
`bin`	Bucket numeric values into non-overlapping ranges; returns the bin label.	`bins`: list of `{label,start,end}` (ranges must not overlap; inclusive bounds)
`cast`	Convert values between primitive types.	`source`: type `target`: type (`text`, `integer`, `boolean`, `decimal`, `float`); boolean casting accepts common string/number forms
`convert_date`	Convert date/time strings between formats.	`source_format`, `target_format` (strftime patterns; raises if parsing fails)
`convert_units`	Convert numeric values between units using pint.	`source_unit`, `target_unit` (Unit enum or pint string; raises on invalid units)
`do_nothing`	No-op transform (pass-through).	None
`enum_to_enum`	Map discrete values to other values.	`mapping` (dict) `strict` (bool, default `false`) `default` (optional)
`format_number`	Format numeric values with fixed decimal places.	`precision` (int, >=0); output is text (string)
`normalize_boolean`	Normalize truthy/falsy values to booleans.	`truthy` (list, optional; defaults below) `falsy` (list, optional; defaults below) `strict` (bool, default `true`) `default` (optional; used when `strict=false`)
`normalize_text`	Apply a single text normalization.	`normalization` (`strip`, `lower`, `upper`, `remove_accents`, `remove_punctuation`, `remove_special_characters`)
`offset`	Add an offset to numeric values.	`offset` (number)
`reduce`	Reduce a list of values to one value.	`reduction` (`any`, `none`, `all`, `one-hot`, `sum`); expects a list/tuple input; one-hot returns index or None
`round`	Round numeric values to a given precision.	`precision` (int, >=0); uses Python `round` semantics
`scale`	Multiply numeric values by a factor.	`scaling_factor` (number)
`substitute`	Regex-based string substitution.	`expression` (regex; validated) `substitution` (replacement)
`threshold`	Clamp numeric values between bounds.	`lower`, `upper` (numbers; lower <= upper; output type follows numeric promotion)
`truncate`	Cut strings to a max length.	`length` (int, >=0)

Defaults for normalize_boolean (used when truthy/falsy are not provided):

truthy: ["true","t","yes","y","1",1,true,"on"]
falsy: ["false","f","no","n","0",0,false,"off",""]

Primitive examples

Each operation is represented by a JSON-friendly dict. Examples:

Operation	Example
`bin`	`{"operation":"bin","bins":[{"label":"low","start":0,"end":9},{"label":"high","start":10,"end":19}]}`
`cast`	`{"operation":"cast","source":"text","target":"integer"}`
`convert_date`	`{"operation":"convert_date","source_format":"%Y-%m-%d","target_format":"%m/%d/%Y"}`
`convert_units`	`{"operation":"convert_units","source_unit":"inch","target_unit":"cm"}`
`do_nothing`	`{"operation":"do_nothing"}`
`enum_to_enum`	`{"operation":"enum_to_enum","mapping":{"BL":"baseline","FU":"follow_up"},"strict":false,"default":"unknown"}`
`format_number`	`{"operation":"format_number","precision":2}`
`normalize_boolean`	`{"operation":"normalize_boolean","truthy":["yes","y","1"],"falsy":["no","n","0"],"strict":true}`
`normalize_text`	`{"operation":"normalize_text","normalization":"lower"}`
`offset`	`{"operation":"offset","offset":2.5}`
`reduce`	`{"operation":"reduce","reduction":"one-hot"}`
`round`	`{"operation":"round","precision":2}`
`scale`	`{"operation":"scale","scaling_factor":0.453592}`
`substitute`	`{"operation":"substitute","expression":",","substitution":" "}`
`threshold`	`{"operation":"threshold","lower":0,"upper":100}`
`truncate`	`{"operation":"truncate","length":3}`

NormalizeBoolean defaults

If you use the normalize_boolean primitive without specifying truthy or falsy lists, the following defaults are applied:

truthy: ["true", "t", "yes", "y", "1", 1, true, "on"]
falsy: ["false", "f", "no", "n", "0", 0, false, "off", ""]

Name		Name	Last commit message	Last commit date
Latest commit History 111 Commits
.github/workflows		.github/workflows
demo		demo
docs		docs
scripts		scripts
src/harmonization_framework		src/harmonization_framework
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Harmonization Framework

Installation

Usage

Sidecar (local API service)

Sidecar packaging (CI)

Electron consumption (high level)

Serialization Format

Primitives Reference

Primitive examples

NormalizeBoolean defaults

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 4

Uh oh!

Languages

License

bmir-radx/harmonization-framework

Folders and files

Latest commit

History

Repository files navigation

Harmonization Framework

Installation

Usage

Sidecar (local API service)

Sidecar packaging (CI)

Electron consumption (high level)

Serialization Format

Primitives Reference

Primitive examples

NormalizeBoolean defaults

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 4

Uh oh!

Languages

Packages