LENS is a role-aware multi-agent grading pipeline for clinical summaries. The same summary is scored in parallel by three role-specific agents:
- Physician
- Triage Nurse
- Bedside Nurse
Each role scores the summary across 8 rubric dimensions on a 1-5 scale. The system then computes a role-level weighted overall score, a cross-role overall score, and an Orchestrator Disagreement view that shows how far the three role scores differ on each dimension.
- Parallel scoring by three role-specific agents
- Shared 8-dimension LENS rubric
- Two scoring modes:
  - `llm`: OpenAI model-based scoring
  - `heuristic`: local baseline scoring without API calls
- Per-role weighted overall scoring based on questionnaire-derived role priors
- Orchestrator validation, disagreement mapping, and score aggregation
- Human-readable and JSON outputs
- Input a clinical summary.
- Load rubric definitions and role configurations.
- Run the three role agents in parallel.
- Validate each role scorecard.
- Build an `Orchestrator Disagreement` map for all 8 dimensions.
- Aggregate the role outputs into:
  - per-role scores
  - per-role overall scores
  - final overall score across roles
- `config/lens_rubric.json`: defines the 8 rubric dimensions and evaluation focus.
- `config/roles.json`: defines the three role agents, persona metadata, and `w_prior` weights.
- `grading_pipeline/cli.py`: command-line entrypoint and human-readable output formatting.
- `grading_pipeline/orchestrator.py`: runs the multi-agent pipeline, validation, disagreement mapping, and aggregation.
- `grading_pipeline/llm_scoring.py`: LLM-based scoring logic.
- `grading_pipeline/scoring.py`: heuristic baseline scoring and score utilities.
- `grading_pipeline/openai_client.py`: minimal OpenAI Responses API client.
- `tests/`: input-validation and orchestrator tests.
- Python 3.10+
- OpenAI API key for `llm` mode
This project uses the Python standard library only. There is no dependency installation step at the moment.
If you want to run the LLM pipeline, you must use your own OpenAI API key.
Create a file named `.env` in the project root.
Project root:
- same folder as `README.md`
- same folder as `config/`
- same folder as `grading_pipeline/`
Expected file location:

```
LENS Project/.env
```

Add the following line to `.env`:

```
OPENAI_API_KEY=your_openai_api_key_here
```

Optional override:

```
OPENAI_BASE_URL=https://api.openai.com/v1/responses
```

You can use `.env.example` as the template:

```
cp .env.example .env
```

Important notes:
- `.env` is already ignored by git and should not be committed.
- If you run with `--engine heuristic`, no API key is required.
- The code reads `OPENAI_API_KEY` from `.env` first, then falls back to your shell environment.
Run with the default LLM mode:

```
python -m grading_pipeline --summary "Your summary here"
```

Run with the heuristic baseline:

```
python -m grading_pipeline --engine heuristic --summary "Your summary here"
```

Use a summary file:

```
python -m grading_pipeline --summary-file path/to/summary.txt
```

Output JSON instead of the human-readable report:

```
python -m grading_pipeline --summary "Your summary here" --format json --pretty
```

Select a specific model:

```
python -m grading_pipeline --model gpt-4o-mini --summary "Your summary here"
```

Adjust the disagreement threshold:

```
python -m grading_pipeline --gap-threshold 0.5 --summary "Your summary here"
```

The CLI validates summary input before the scoring pipeline runs.
The summary must:
- be provided through `--summary` or `--summary-file`
- not be empty
- not be whitespace only
- be at least `30` characters after trimming whitespace
If the summary is invalid, the CLI exits with a non-zero code and no scoring call is made.
The human-readable output includes:
- role-by-role scores for all 8 dimensions
- a weighted `Overall` score for each role
- an `Orchestrator Disagreement` view showing score gaps per dimension
- a final `Overall Score` across all three roles
Example output shape:
```
----------------------------------------
Role-Aware Multi-Agent Grading Pipeline:
----------------------------------------
Physician:
Factual Accuracy: 5.0
Relevant Chronic Problem Coverage: 4.0
...
Overall: 4.12
----------------------------------------
Triage Nurse:
...
----------------------------------------
Bedside Nurse:
...
----------------------------------------
----------------------------------------
Orchestrator Disagreement:
----------------------------------------
Factual Accuracy: 1.0
Relevant Chronic Problem Coverage: 0.0
...
----------------------------------------
Overall Score: 4.0
```
Each role has its own prior weights in `config/roles.json`.

Role-level overall score:

```
Role Overall = weighted average of the 8 dimension scores
```

Cross-role overall score:

```
Overall Score = average of the 3 role overall scores
```

Disagreement per dimension:

```
Gap = highest agent score - lowest agent score
```
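Taken together, the three formulas can be sketched as below. The weights and scores in the test values are illustrative, not the actual priors from `config/roles.json`:

```python
def role_overall(scores: dict[str, float], weights: dict[str, float]) -> float:
    # Weighted average of one role's dimension scores, normalized by
    # the total weight so priors need not sum to 1.
    total_weight = sum(weights[d] for d in scores)
    return sum(scores[d] * weights[d] for d in scores) / total_weight

def cross_role_overall(role_overalls: list[float]) -> float:
    # Plain (unweighted) average of the per-role overall scores.
    return sum(role_overalls) / len(role_overalls)

def disagreement_gap(dimension_scores: list[float]) -> float:
    # Gap = highest agent score - lowest agent score for one dimension.
    return max(dimension_scores) - min(dimension_scores)
```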
Run the test suite:
```
pytest -q
```

Current tests cover:
- CLI summary input validation
- disagreement-map correctness
- validation and repair behavior
- conditional adjudication behavior
- weighted aggregation behavior
The current implementation includes:
- parallel three-role scoring
- role-aware weighting
- strict input validation
- orchestrator disagreement reporting
- weighted final score aggregation
- human-readable report formatting for demo and presentation use
