andyxhadji/nd

nd

Autonomous AgentField agents for processing MR comments.

Overview

nd provides two agents that work together to automate code review comment handling:

  1. Triage Agent - Polls middleman for new MR comments, classifies them as actionable or not, and creates kata tasks for items requiring attention.

  2. Worker Agent - Claims tasks from kata, analyzes complexity, executes code changes via harness, runs roborev for quality validation, and posts responses after human approval.

Quick Start

# Install
pip install -e .

# Run triage agent
python -m nd.triage

# Run worker agent
python -m nd.worker

Configuration

All configuration is done through environment variables:

| Variable | Default | Description |
| --- | --- | --- |
| AGENTFIELD_URL | http://localhost:8080 | AgentField control plane URL |
| WORKER_NODE_ID | nd-worker | Worker agent node ID to trigger when tasks are created |
| MIDDLEMAN_URL | http://localhost:8091 | Middleman API URL |
| MIDDLEMAN_DB | ~/.middleman/middleman.db | Middleman SQLite database path |
| KATA_SERVER | (empty) | Kata daemon URL. Empty → local auto-start (host runs only). For Docker, compose sets http://127.0.0.1:7878 so agents reach the in-compose kata-daemon service over the shared network namespace. |
| AGENT_PORT | 0 (auto) | Fixed port for the agent's HTTP server. Used by Docker Compose to give each agent (triage, worker-1, worker-2) a distinct port inside the shared kata-daemon netns. Empty/0 → auto-pick. |
| CONFIDENCE_THRESHOLD | 70 | Minimum confidence for auto-execution |
| ROBOREV_MAX_ITERATIONS | 3 | Max roborev-refine iterations |
| TRIAGE_MODEL | bedrock/converse/arn:aws:bedrock:us-east-1:657062785455:application-inference-profile/mj2ayeqbysnr | LLM model for triage |
| WORKER_MODEL | bedrock/converse/arn:aws:bedrock:us-east-1:657062785455:application-inference-profile/mj2ayeqbysnr | LLM model for worker |
| AGENT_INSTANCE_ID | worker-1 | Unique ID for worker instance |
| GITHUB_TOKEN | (empty) | GitHub API token for posting responses |
| GITLAB_TOKEN | (empty) | GitLab API token for posting responses |
| ND_CURRENT_USER | (empty) | Username to filter MRs |
| ND_ASSIGNED_USERNAMES | (empty) | Comma-separated usernames for poll_issues. If empty, poll_issues returns an error |
| WORKSPACE_ROOT | /var/nd | Root directory for the worker's bare git cache (<root>/repos/...) and per-task worktrees (<root>/work/...). Ephemeral by default; mount as a docker volume to persist the cache across container restarts. |
| ND_WORKSPACE_ROOT | ./.nd-workspace | Host path mounted to /var/nd by Docker Compose for durable worker worktrees and bare repo cache. |
| WORKSPACE_KEEP_ON_FAILURE | true | When a task fails or pauses, leave the worktree on disk for human inspection. Set to 0 / false to also clean up failed runs. |
| OPENROUTER_API_KEY | (required) | OpenRouter API key (or AWS creds for Bedrock models) |
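
As an illustration of how the defaults in the table could be applied, here is a minimal sketch of an environment loader. It is not the contents of nd/config.py; the Settings class and load_settings function are hypothetical names, and only a handful of the variables are covered.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for a subset of the documented variables."""
    agentfield_url: str
    middleman_url: str
    kata_server: str  # empty string means "no daemon URL configured"
    confidence_threshold: int
    roborev_max_iterations: int
    assigned_usernames: Tuple[str, ...]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping, falling back to the documented defaults."""
    env = os.environ if env is None else env
    # ND_ASSIGNED_USERNAMES is comma-separated; drop blanks and stray spaces.
    usernames = tuple(
        name.strip()
        for name in env.get("ND_ASSIGNED_USERNAMES", "").split(",")
        if name.strip()
    )
    return Settings(
        agentfield_url=env.get("AGENTFIELD_URL", "http://localhost:8080"),
        middleman_url=env.get("MIDDLEMAN_URL", "http://localhost:8091"),
        kata_server=env.get("KATA_SERVER", ""),
        confidence_threshold=int(env.get("CONFIDENCE_THRESHOLD", "70")),
        roborev_max_iterations=int(env.get("ROBOREV_MAX_ITERATIONS", "3")),
        assigned_usernames=usernames,
    )
```

Passing a plain dict instead of os.environ makes the loader easy to exercise in unit tests without touching the real environment.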

Setting environment variables

A starter template lives at .env.example. Copy it to .env.local (gitignored) and fill in real values before running anything that depends on it:

cp .env.example .env.local
# Edit .env.local and add required credentials

Required variables in .env.local:

  • GITHUB_TOKEN or GITLAB_TOKEN - for posting responses to MRs/PRs
  • ND_CURRENT_USER - your username for filtering MR comments
  • ND_ASSIGNED_USERNAMES - comma-separated list for issue polling
  • AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN - for Bedrock models
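
A quick pre-flight check for the list above can be sketched as follows. This is a hypothetical helper, not part of nd: it assumes simple KEY=VALUE lines (no quoting, no export prefix) and treats the AWS variables as required only when Bedrock models are in use.

```python
from typing import Dict, List

AWS_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_required(values: Dict[str, str], use_bedrock: bool = True) -> List[str]:
    """Return the names of required variables that are absent or empty."""
    missing: List[str] = []
    # Either platform token is enough for posting responses.
    if not (values.get("GITHUB_TOKEN") or values.get("GITLAB_TOKEN")):
        missing.append("GITHUB_TOKEN or GITLAB_TOKEN")
    for key in ("ND_CURRENT_USER", "ND_ASSIGNED_USERNAMES"):
        if not values.get(key):
            missing.append(key)
    if use_bedrock:
        missing.extend(key for key in AWS_KEYS if not values.get(key))
    return missing
```

Reading the file with pathlib.Path(".env.local").read_text() and passing the result to parse_env_text would report anything still left blank after copying the template.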

Important: docker-compose.yml loads AWS credentials and ND_CURRENT_USER from .env.local via env_file. Do NOT also list these variables under environment: in docker-compose.yml. An entry interpolated from a shell variable that is not set resolves to an empty string and overrides the .env.local value.

AWS Credentials for Bedrock

When using Bedrock models, you need AWS credentials with bedrock:InvokeModel permissions. The required AWS role depends on your organization's IAM configuration:

  • horizon-okta role (recommended): Has Bedrock permissions. Credentials are typically available via aws configure export-credentials when logged in through AWS SSO/Okta.
  • horizon role (from saml2aws): May have an explicit deny policy for Bedrock. If you see an error like "is not authorized to perform: bedrock:InvokeModel ... with an explicit deny in an identity-based policy", you're using the wrong role.

To get working credentials:

# If using AWS SSO/Okta (horizon-okta role):
aws configure export-credentials --format env-no-export

# Copy the output to .env.local:
# AWS_ACCESS_KEY_ID=ASIAZR676OGX...
# AWS_SECRET_ACCESS_KEY=...
# AWS_SESSION_TOKEN=...

If aws configure export-credentials doesn't work, check ~/.aws/credentials or contact your AWS administrator to ensure your role has bedrock:InvokeModel permissions for the inference profile ARN configured in WORKER_MODEL.

For local runs (python -m nd.triage, pytest, ./test-local.sh):

Source .env.local before running, or use ./test-local.sh which loads it automatically:

# .env.local
OPENROUTER_API_KEY=sk-or-...
ND_CURRENT_USER=your-username
ND_ASSIGNED_USERNAMES=alice,bob
KATA_SERVER=https://kata.example.com
GITHUB_TOKEN=ghp_...

For Docker Compose (docker compose up):

Both the triage and worker-* services load .env.local via env_file:. Add any required vars there:

# .env.local
ND_CURRENT_USER=your-username
ND_ASSIGNED_USERNAMES=alice,bob
KATA_SERVER=https://kata.example.com
# AWS creds if using Bedrock models
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_SESSION_TOKEN=...

After editing .env.local, recreate the containers so they pick up the new values:

docker compose up -d --force-recreate triage
docker compose up -d --force-recreate worker-1 worker-2

Verify a var made it into the container:

docker compose exec triage printenv ND_ASSIGNED_USERNAMES

Precedence note: Variables listed under environment: in docker-compose.yml take precedence over env_file. If a var is interpolated like - FOO=${FOO} and your shell doesn't export FOO, it resolves to an empty string and overrides .env.local. To avoid surprises, define the var only in .env.local (not also in environment:), or export it in the shell before running compose.
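
The precedence rule can be modelled in a few lines. This is a simplified sketch, not Compose's implementation: it handles only the plain ${VAR} form, takes interpolation values from a shell mapping alone, and the function names are hypothetical.

```python
import re
from typing import Dict

_VAR = re.compile(r"\$\{(\w+)\}")


def interpolate(value: str, shell: Dict[str, str]) -> str:
    """Replace ${VAR} with the shell value, or an empty string when unset."""
    return _VAR.sub(lambda match: shell.get(match.group(1), ""), value)


def effective_env(
    env_file: Dict[str, str],
    environment: Dict[str, str],
    shell: Dict[str, str],
) -> Dict[str, str]:
    """Merge env_file values with environment: entries; the latter win."""
    merged = dict(env_file)
    for key, raw in environment.items():
        merged[key] = interpolate(raw, shell)
    return merged
```

With env_file = {"FOO": "from-file"}, environment = {"FOO": "${FOO}"} and an empty shell, the merged value of FOO is the empty string, which is the surprise the note warns about. Dropping the environment: entry leaves "from-file" in place.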

Worker workspaces

The worker prepares a fresh git worktree for every claimed task, backed by a shared bare cache. The on-disk layout under WORKSPACE_ROOT (default /var/nd) is:

/var/nd/
├── repos/<host>/<owner>/<repo>.git/   # bare cache, fetched once per task
└── work/<task-slug>/                  # per-task worktree

Behavior:

  • MR tasks check out the MR's head_branch directly.
  • Issue tasks create nd/issue-<short_id> off the repo's default branch (resolved from origin/HEAD).
  • On successful completion the worker removes the worktree; on failure or pause it is left in place for inspection by default. Set WORKSPACE_KEEP_ON_FAILURE=0 (or false) to also tear it down on failed/paused runs.
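
The layout and cleanup rules above can be written down as small pure functions. This is a sketch under assumptions, not the code in nd/clients/workspace.py: the function names are hypothetical, and it assumes that only the values 0 and false (case-insensitive) disable WORKSPACE_KEEP_ON_FAILURE.

```python
from pathlib import PurePosixPath


def bare_cache_path(root: str, host: str, owner: str, repo: str) -> PurePosixPath:
    """<root>/repos/<host>/<owner>/<repo>.git"""
    return PurePosixPath(root) / "repos" / host / owner / f"{repo}.git"


def worktree_path(root: str, task_slug: str) -> PurePosixPath:
    """<root>/work/<task-slug>"""
    return PurePosixPath(root) / "work" / task_slug


def issue_branch(short_id: str) -> str:
    """Branch name used for issue tasks."""
    return f"nd/issue-{short_id}"


def keep_on_failure(raw: str) -> bool:
    """Assumption: anything other than 0 / false keeps failed worktrees."""
    return raw.strip().lower() not in {"0", "false"}


def should_remove_worktree(succeeded: bool, keep_failed: bool) -> bool:
    """Successful runs are always cleaned up; failed or paused ones only when keeping is disabled."""
    return succeeded or not keep_failed
```

For example, bare_cache_path("/var/nd", "github.com", "andyxhadji", "nd") gives /var/nd/repos/github.com/andyxhadji/nd.git.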

Docker Compose bind-mounts ${ND_WORKSPACE_ROOT:-./.nd-workspace} to /var/nd, so paused/failed worktrees and the bare cache are inspectable from the host. Override ND_WORKSPACE_ROOT in your shell or .env.local to put this state somewhere else.

Worker containers also mount ${HOME}/.claude and ${HOME}/.claude.json to /root, and set the Claude Code Bedrock environment variables, so the Claude Code harness can use the same provider configuration as the host.

Kata daemon for Docker

Compose runs kata's daemon as its own service (kata-daemon) listening on 127.0.0.1:7878. The agent services (triage, worker-1, worker-2) all use network_mode: "service:kata-daemon" so they share that container's network namespace and can reach the daemon on loopback — required because kata refuses to start on a non-loopback TCP listener (see internal/daemon/auth.go checkAuthStartup).

Key consequences:

  • Tasks created from compose live in the kata-data named volume, not in your host's ~/.kata/kata.db. They are not visible to the host kata CLI. This is the price of running kata fully inside docker on macOS, where Docker Desktop cannot bridge host unix sockets into containers.
  • Agents share one network namespace. Each agent binds a distinct AGENT_PORT (8001, 8002, 8003) to avoid collisions, and is reachable from agentfield as kata-daemon:<AGENT_PORT>.
  • No KATA_HOME/KATA_DB/KATA_DB_HASH plumbing is needed in .env.local — those concepts only matter to the daemon itself, which is configured by the kata-daemon service block.
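
Since the agents share one network namespace, each needs its own port. The AGENT_PORT rule (a fixed value, or empty/0 for auto-pick) could be resolved roughly like this. The sketch is an assumption about the approach, not nd's code, and resolve_agent_port is a hypothetical name.

```python
import socket
from typing import Optional


def resolve_agent_port(raw: Optional[str]) -> int:
    """Return the fixed port, or ask the OS for a free one when raw is empty or 0."""
    port = int(raw) if raw and raw.strip() else 0
    if port != 0:
        return port
    # Binding to port 0 lets the kernel choose an unused port on loopback.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]
```

The auto-picked port is released when the socket closes, so another process could claim it before the agent binds. That race is one reason to prefer fixed ports such as 8001, 8002 and 8003 when several agents share a namespace.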

To inspect tasks created from compose:

docker compose exec kata-daemon kata list
docker compose exec kata-daemon kata projects list

Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│    Middleman    │────▶│  Triage Agent   │────▶│      Kata       │
│   (MR Comments) │     │  (Classifies)   │     │    (Tasks)      │
└─────────────────┘     └─────────────────┘     └────────┬────────┘
                                                         │
                                                         ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  GitHub/GitLab  │◀────│  Worker Agent   │◀────│  Worker Agent   │
│   (Responses)   │     │ (Executes code) │     │ (Claims tasks)  │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                               │
                               ▼
                        ┌─────────────────┐
                        │    Roborev      │
                        │ (Code review)   │
                        └─────────────────┘

Human Approval Gates

The worker agent pauses for human approval at three points:

  1. Spec Review - For low-confidence tasks, a spec is generated and requires approval
  2. Roborev Failure - If roborev finds issues that can't be auto-fixed
  3. Response Approval - All responses require approval before posting
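
The three gates can be summarised as a small decision function. This is an illustrative sketch only: the Gate enum and gates_for_task are hypothetical names, and it assumes a task counts as low-confidence when its score is strictly below CONFIDENCE_THRESHOLD.

```python
from enum import Enum
from typing import List


class Gate(Enum):
    SPEC_REVIEW = "spec_review"
    ROBOREV_FAILURE = "roborev_failure"
    RESPONSE_APPROVAL = "response_approval"


def gates_for_task(confidence: int, roborev_passed: bool, threshold: int = 70) -> List[Gate]:
    """List the approval gates a task would stop at, in order."""
    gates: List[Gate] = []
    if confidence < threshold:
        # Low confidence: a spec is generated and must be approved first.
        gates.append(Gate.SPEC_REVIEW)
    if not roborev_passed:
        # Roborev still reports issues after its refine iterations.
        gates.append(Gate.ROBOREV_FAILURE)
    # Every response is approved by a human before it is posted.
    gates.append(Gate.RESPONSE_APPROVAL)
    return gates
```

A high-confidence task that passes roborev stops only at the final response approval, while a low-confidence task whose review fails stops at all three gates.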

Docker Deployment

# First time setup
cp .env.example .env.local
# Edit .env.local with your credentials

# Start all services
docker compose up -d

# Check service status
docker compose ps

# View logs
docker compose logs triage --tail=50
docker compose logs worker-1 --tail=50

# Restart services after config changes
docker compose down
docker compose up -d

# Or recreate specific services
docker compose up -d --force-recreate triage worker-1 worker-2

Verifying the deployment

# Check agent health
docker compose exec triage curl -sS http://localhost:8001/health
docker compose exec worker-1 curl -sS http://localhost:8002/health

# Verify environment variables loaded
docker compose exec triage printenv | grep -E "ND_CURRENT_USER|AWS_ACCESS_KEY_ID"

# Check kata daemon
docker compose exec kata-daemon kata projects list

# Test AgentField connectivity (should show "Connected to AgentField server")
docker compose logs triage | grep -i agentfield

Testing

# Install dev dependencies
pip install -e ".[dev]"

# Run all tests
pytest

# Run unit tests only
pytest tests/unit/ -v

# Run functional tests
pytest tests/functional/ -v

# Run the full local worker smoke (creates a real GitHub PR; not for CI)
ND_RUN_FULL_WORKER_SMOKE=1 pytest tests/local/test_full_worker_smoke.py -v -s

# Run with coverage
pytest --cov=nd

Project Structure

nd/
├── __init__.py                 # Package init, version
├── schemas.py                  # All Pydantic models (shared)
├── config.py                   # Environment config loader
├── clients/
│   ├── __init__.py
│   ├── middleman.py            # Middleman API client
│   ├── kata.py                 # Kata CLI wrapper
│   ├── platform.py             # GitHub/GitLab API posting
│   └── workspace.py            # Bare git cache + per-task worktrees
├── triage/
│   ├── __init__.py
│   ├── agent.py                # Triage agent definition
│   ├── classifier.py           # Actionable classification logic
│   └── __main__.py             # Entry point: python -m nd.triage
└── worker/
    ├── __init__.py
    ├── agent.py                # Worker agent definition
    ├── analyzer.py             # Task complexity analysis
    └── __main__.py             # Entry point: python -m nd.worker
tests/
├── unit/                       # Unit tests
├── functional/                 # Functional tests
└── local/                      # Local smoke tests (test_full_worker_smoke.py)

Troubleshooting

Agents can't reach AgentField

Symptoms: Logs show "AgentField server unavailable - running in degraded mode" or "Could not resolve host: agentfield"

Causes:

  1. Port conflict preventing agentfield from binding to port 8081
  2. agentfield container not on the Docker network

Solutions:

# Check for port conflicts
lsof -i :8081

# Stop conflicting containers
docker ps -a | grep agentfield
docker stop <container-id>

# Recreate all services
docker compose down
docker compose up -d

# Verify agentfield network connectivity
# (the container name below includes the compose project name; check docker ps and adjust)
docker inspect fire-tortellini-agentfield-1 | grep -A 10 Networks
docker compose exec worker-1 curl -sS http://agentfield:8080/health

AWS credentials expired or wrong role

Symptoms:

  • "The security token included in the request is expired"
  • "User: arn:aws:sts::657062785455:assumed-role/horizon/... is not authorized to perform: bedrock:InvokeModel ... with an explicit deny in an identity-based policy"

Root cause: The horizon role (from saml2aws) may lack Bedrock permissions, while the horizon-okta role (from AWS SSO) has them.
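
To tell which role a set of credentials belongs to, the Arn reported by aws sts get-caller-identity can be parsed. The helper below is a hypothetical sketch that relies only on the standard assumed-role ARN shape (arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>).

```python
from typing import Optional


def assumed_role_name(arn: str) -> Optional[str]:
    """Return the role name from an STS assumed-role ARN, or None for other ARNs."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sts":
        return None
    resource = parts[5].split("/")
    if len(resource) < 2 or resource[0] != "assumed-role":
        return None
    return resource[1]
```

A result of horizon-okta matches the recommended role; horizon is the one that may be denied.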

Solution:

# Option 1: Use horizon-okta credentials (recommended)
aws configure export-credentials --format env-no-export
# Copy AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN to .env.local

# Option 2: If using saml2aws, check which role has Bedrock access
aws sts get-caller-identity  # Check current role
# Look for "assumed-role/horizon-okta" (good) vs "assumed-role/horizon" (may be denied)

# After updating .env.local, recreate workers
docker compose up -d --force-recreate worker-1 worker-2

# Verify Bedrock access works
docker compose exec worker-1 python -c "
import boto3, os
client = boto3.Session(
    aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY'],
    aws_session_token=os.environ['AWS_SESSION_TOKEN'],
    region_name='us-east-1'
).client('bedrock-runtime')
response = client.invoke_model(
    modelId='arn:aws:bedrock:us-east-1:657062785455:application-inference-profile/mj2ayeqbysnr',
    body='{\"anthropic_version\":\"bedrock-2023-05-31\",\"max_tokens\":10,\"messages\":[{\"role\":\"user\",\"content\":\"test\"}]}'
)
print('✓ Bedrock access verified')
"

Workers not claiming tasks

Symptoms: claim_task returns {"claimed": false} even though tasks exist

Possible causes:

  1. Task is already owned by another worker
  2. Task doesn't have the nd label
  3. Task is in the wrong project

Debug:

# Check tasks
docker compose exec kata-daemon kata list --project <project-name>

# Check task labels and owner
docker compose exec kata-daemon kata list --project <project-name> --json | python -m json.tool

Task body format errors

Symptoms: Worker fails with "Could not parse task body" or "platform_host must be non-empty"

Cause: The task body doesn't match the format produced by KataClient.build_issue_task_body()

Solution: Use the triage agent's create_issue_task reasoner which formats tasks correctly, or manually format the task body to match the expected structure with headers like ## Issue Context.

License

MIT
