| Field | Value |
|---|---|
| version | 2.0 |
| last_updated | 2025-12-22 |
This guide covers common issues, debugging techniques, and solutions for the pytest-based evaluation system.
When something goes wrong, check:
- Exit code: What was `pytest_exit_code`? (0 = ok, 1 = failures, 2-5 = infrastructure)
- Infrastructure failure: Is `infrastructure_failure: true`?
- Tests collected: Were tests discovered? Check `pytest_collected`.
- Pytest output: Read `evaluation/stdout.txt` and `evaluation/stderr.txt`.
- CTRF report: Check `evaluation/report.json` for raw results.
| Code | Meaning | Action |
|---|---|---|
| 0 | All tests passed | Success! |
| 1 | Some tests failed | Review failure messages |
| 2 | Interrupted | Check timeout or manual cancel |
| 3 | Internal error | Pytest itself crashed - check stderr |
| 4 | Usage error | Invalid pytest options |
| 5 | No tests collected | Tests not found - check paths |
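The same table can be applied programmatically when triaging many runs. A minimal sketch, assuming a `results` object exposing the `pytest_exit_code` and `infrastructure_failure` fields named above:

```python
# Sketch only: assumes a results object with the fields named above.
def triage(results) -> str:
    if results.infrastructure_failure:
        return "infrastructure failure - read evaluation/stderr.txt"
    if results.pytest_exit_code == 0:
        return "all tests passed"
    if results.pytest_exit_code == 1:
        return "test failures - review failure messages"
    return f"pytest did not run cleanly (exit code {results.pytest_exit_code})"
```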
Infrastructure failures mean pytest itself failed to run properly.
Symptoms:
- `pytest_collected: 0`
- `infrastructure_failure: true`
- Empty or no test results

Causes and Solutions:

- Test files not copied correctly

  ```bash
  # Check test directory in workspace
  ls -la .evaluation_tests/
  ```

- Wrong checkpoint in test filename

  ```
  # Must be: test_checkpoint_1.py (not test_cp1.py)
  problems/my_problem/tests/test_checkpoint_1.py
  ```

- Import errors in test file

  ```bash
  # Check pytest output for import errors
  cat evaluation/stderr.txt
  ```

- Missing conftest.py

  ```python
  # Required in tests/conftest.py
  def pytest_addoption(parser):
      parser.addoption("--entrypoint", action="store", required=True)
      parser.addoption("--checkpoint", action="store", required=True)
  ```
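To see exactly what pytest discovers, run a collection dry run from the workspace. A sketch mirroring the manual `uvx` invocation shown later in this guide; `--collect-only` and `-q` are standard pytest flags, and the entrypoint/checkpoint values are placeholders:

```python
import subprocess

# Dry-run collection only; no tests are executed.
result = subprocess.run(
    [
        "uvx", "--with=pytest", "pytest",
        "--collect-only", "-q",
        "--entrypoint=python main.py",
        "--checkpoint=checkpoint_1",
        ".evaluation_tests/",
    ],
    capture_output=True,
    text=True,
)
print(result.stdout or result.stderr)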
Symptoms:
- `infrastructure_failure: true`
- Errors in stderr about syntax or imports

Causes and Solutions:

- Syntax error in test file

  ```bash
  # Check for Python syntax errors
  python -m py_compile tests/test_checkpoint_1.py
  ```

- Missing pytest dependency

  ```yaml
  # Add to problem config
  test_dependencies:
    - "some-package>=1.0"
  ```

- Fixture not defined

  ```python
  # Make sure conftest.py defines all required fixtures
  import shlex

  import pytest

  @pytest.fixture
  def entrypoint_argv(request):
      return shlex.split(request.config.getoption("--entrypoint"))
  ```
Check entrypoint:

```bash
# The entrypoint passed to tests
grep "entrypoint" evaluation/stdout.txt
```

Check if submission runs:

```bash
# Try running manually in workspace
cd outputs/checkpoint_1
python main.py --help
```

Get failure details:

```python
# Inspect failures on a results object (e.g. from run_checkpoint_pytest, shown below)
for test in results.tests:
    if test.status == "failed":
        print(f"{test.id}: {test.failure_message}")
```

Check CTRF report for details:

```python
import json

with open("evaluation/report.json") as f:
    ctrf = json.load(f)

for test in ctrf["results"]["tests"]:
    if test["status"] == "failed":
        print(test["name"], test.get("message", ""))
```

Symptoms:
- Tests killed after timeout
- `pytest-timeout` messages in output

Solutions:

- Increase checkpoint timeout

  ```yaml
  checkpoints:
    checkpoint_1:
      timeout: 120  # seconds
  ```

- Check for infinite loops in the submission

- Check for blocking I/O (see the per-test limit sketch below)
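If a single test is the likely culprit, `pytest-timeout` also supports per-test limits via a marker. A sketch; add `pytest-timeout` to `test_dependencies` first:

```python
import time

import pytest

@pytest.mark.timeout(30)  # fail this test if it exceeds 30 seconds
def test_slow_path():
    time.sleep(1)  # placeholder for the real work under test
```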
Symptoms:
- stderr shows pip/uv errors
- `infrastructure_failure: true`

Solutions:

- Check dependency format

  ```yaml
  test_dependencies:
    - "requests>=2.28"  # Version specifier
    - "pyyaml"          # Just package name
  ```

- Check for incompatible versions

  ```bash
  # Try installing manually
  uvx --with=pytest --with=my-package pytest --version
  ```

Check PyPI name:

```bash
# Verify the package exists (pip search is disabled; query the index instead)
pip index versions my-package
```

Symptoms:
- "PytestUnknownMarkWarning" in output
- Tests still run but with warnings
Solutions:
-
Register marker in problem config
markers: my_marker: description: "My custom marker" group: Functionality
-
Use built-in markers
@pytest.mark.error(GroupType.ERROR)@pytest.mark.functionality(GroupType.FUNCTIONALITY)@pytest.mark.regression(GroupType.REGRESSION)
Check marker precedence:
- Prior checkpoint tests → REGRESSION (regardless of markers)
@pytest.mark.error→ ERROR (current checkpoint only)@pytest.mark.regression→ REGRESSION- Custom markers from config
@pytest.mark.functionality→ FUNCTIONALITY- Default → CORE
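For reference, a test file combining a built-in marker with a registered custom marker might look like this. A sketch; `my_marker` must be registered in the problem config as shown above, and the test names are hypothetical:

```python
import pytest

@pytest.mark.error  # grouped as ERROR for the current checkpoint
def test_rejects_malformed_input():
    ...

@pytest.mark.my_marker  # grouped per its config registration (Functionality)
def test_custom_behavior():
    ...
```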
Check Docker status:

```bash
docker ps -a
docker logs <container_id>
```

Common causes:
- Port conflicts
- Resource limits
- Image not found
Check connectivity:

```bash
# Inside container
curl -v http://host.docker.internal:8080
```

Check permissions:

```bash
ls -la /workspace  # Inside container
```

Inspect the evaluation logs:

```bash
# View stdout (test output)
cat outputs/checkpoint_1/evaluation/stdout.txt

# View stderr (errors and warnings)
cat outputs/checkpoint_1/evaluation/stderr.txt
```

Parse the CTRF report:

```python
import json

with open("outputs/checkpoint_1/evaluation/report.json") as f:
    report = json.load(f)

# Summary
print(f"Passed: {report['results']['summary']['passed']}")
print(f"Failed: {report['results']['summary']['failed']}")

# Individual tests
for test in report["results"]["tests"]:
    print(f"{test['name']}: {test['status']}")
```

Run pytest manually:

```bash
# Navigate to workspace with tests
cd outputs/run_123/checkpoint_1

# Run pytest directly (similar to what PytestRunner does)
uvx \
  --with=pytest \
  --with=pytest-json-ctrf \
  --with=pytest-json-report \
  pytest \
  --entrypoint='python main.py' \
  --checkpoint='checkpoint_1' \
  -vv \
  .evaluation_tests/
```

Enable debug logging:

```python
import logging
logging.basicConfig(level=logging.DEBUG)

from slop_code.evaluation import run_checkpoint_pytest
results = run_checkpoint_pytest(...)
```

Verify tests were copied:

```bash
ls -la outputs/checkpoint_1/.evaluation_tests/

# Should contain:
# - conftest.py
# - test_checkpoint_1.py
# - (possibly test_checkpoint_0.py if include_prior_tests=true)
```

Fix: Add `__init__.py` or check test directory structure
```bash
touch tests/__init__.py  # Sometimes needed
```

Fix: Define fixture in conftest.py

```python
import shlex

import pytest

@pytest.fixture(scope="session")
def entrypoint_argv(request):
    return shlex.split(request.config.getoption("--entrypoint"))
```

Fix: Add pytest_addoption in conftest.py

```python
def pytest_addoption(parser):
    parser.addoption("--entrypoint", action="store", required=True)
    parser.addoption("--checkpoint", action="store", required=True)
```

Fix: Check that test function names start with `test_`

```python
# Wrong: pytest will not collect this
def check_something():
    ...

# Correct
def test_something():
    ...
```

Fix: Check entrypoint configuration

```yaml
# Problem config
entry_file: main.py

# Environment config
commands:
  command: python
  entry_file: "{entry_file}"
```
If evaluation runs are slow or resource-hungry:

- Reduce test count for development

  ```bash
  pytest -k "test_basic" .evaluation_tests/
  ```

- Increase parallelization (if tests are independent)

  ```yaml
  test_dependencies:
    - "pytest-xdist"
  # Then use: pytest -n auto
  ```

- Check for slow submission startup

- Profile tests

  ```bash
  pytest --memprof .evaluation_tests/  # requires a memory-profiling plugin (e.g. pytest-memprof)
  ```

- Reduce test data size

- Use fixtures with proper scope

  ```python
  @pytest.fixture(scope="session")  # Not "function"
  def expensive_data():
      return load_data()
  ```
When reporting issues, include:
- Exit code and infrastructure_failure status
- Pytest stdout and stderr (from evaluation/ directory)
- Problem config (especially test_dependencies, markers)
- Checkpoint config (timeout, include_prior_tests)
- Test file structure (list of files in tests/)
- Environment (Python version, Docker version if applicable)
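A throwaway helper can gather these artifacts into one place before filing the issue. A sketch; the paths follow the layout used throughout this guide, and the helper name is ours:

```python
import shutil
from pathlib import Path

def bundle_diagnostics(workspace: str, dest: str = "diagnostics") -> None:
    """Copy the evaluation artifacts listed above into a single folder."""
    out = Path(dest)
    out.mkdir(exist_ok=True)
    for name in ("stdout.txt", "stderr.txt", "report.json"):
        src = Path(workspace) / "evaluation" / name
        if src.exists():
            shutil.copy(src, out / name)

bundle_diagnostics("outputs/checkpoint_1")
```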
Issue: Tests not collected

```
Exit code: 5
infrastructure_failure: true
pytest_collected: 0
```

Directory structure:

```
problems/my_problem/
├── config.yaml
└── tests/
    ├── conftest.py
    └── test_checkpoint_1.py
```

stderr:

```
ModuleNotFoundError: No module named 'custom_utils'
```

config.yaml:

```yaml
test_dependencies: []  # Missing custom_utils
```
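The fix for this example is to declare the missing dependency; assuming the PyPI package name matches the imported module:

```yaml
test_dependencies:
  - "custom_utils"  # hypothetical package providing the missing module
```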
See also:
- Understand architecture: Architecture Guide
- Check configuration: Configuration Guide
- Interpret results: Reporting Guide