
Inspect-AI scorer adapter for ir.eval — DEFERRED #29

Description

@thorwhalen

Refs #12. One of the #12 backlog items. Decision: defer.

inspect-ai is not a dependency (ir declares only numpy/dol/ef/vd/argh), nothing in the codebase consumes it, and it appears only as a recommended backbone in ir_03 with a short scorer sketch. ir's native discovery/selection reports already cover the metrics an Inspect-AI scorer would expose (recall@k / NDCG@k / failure taxonomy / conditional commit rate). Wrapping ir's eval as Inspect-AI scorers now is speculative interoperability that adds a heavy optional dependency and an unused public surface.
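For reference, the ranking metrics named above are small enough that an external scorer framework adds little. The sketch below uses the standard binary-relevance definitions of recall@k and NDCG@k; it is not ir's actual implementation, and the function names and the assumption of unique document ids in `ranked` are illustrative only.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant ids found in the top-k of `ranked` (ids assumed unique)."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    # Ideal ranking puts every relevant id first, capped at k positions.
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0
```

An Inspect-AI scorer for these metrics would mostly be a thin wrapper around functions of this shape, which is part of why the adapter is deferred.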

Revisit if

there is a concrete consumer: an actual Inspect-AI eval run that needs ir as a retriever/scorer. In that case, gate the adapter behind an optional extra (ir[inspect]) and put it in a lazily imported module, so that import ir and the default test run stay offline. The adapter would wrap as_doc_retriever, a case→sample mapping, and a conditional selection scorer.
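A minimal sketch of the lazy-import gate, assuming the adapter lives in its own module and the extra is named `inspect`. The helper name `require_optional` is hypothetical, and no Inspect-AI API is called here; the point is only that nothing imports `inspect_ai` until the adapter is actually used.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency on first use, with an install hint on failure.

    Hypothetical helper: called inside adapter functions rather than at module
    top level, so that `import ir` never triggers the optional import.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this feature; "
            f"install it with: pip install 'ir[{extra}]'"
        ) from exc
```

The adapter's entry points would call `require_optional("inspect_ai", extra="inspect")` in their bodies, so the default test run never touches the dependency unless a test explicitly exercises the adapter.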
