Refs #12. One of the #12 backlog items. Decision: defer.
inspect-ai is not a dependency (ir declares only numpy/dol/ef/vd/argh), nothing in the codebase consumes it, and it appears only as a recommended backbone in ir_03 with a short scorer sketch. ir's native discovery/selection reports already cover the metrics an Inspect-AI scorer would expose (recall@k / NDCG@k / failure taxonomy / conditional commit rate). Wrapping ir's eval as Inspect-AI scorers now is speculative interoperability that adds a heavy optional dependency and an unused public surface.
Revisit if there is a concrete consumer (an actual Inspect-AI eval run that needs ir as a retriever/scorer). If so, gate it behind an optional extra (ir[inspect]) with a lazily imported module, so that importing ir and the default test run stay offline. That module would wrap as_doc_retriever and add a case→sample mapping and a conditional selection scorer.
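For reference if this is picked up later, a minimal sketch of the lazy-import gate. It assumes the extra is named inspect and that the package imports as inspect_ai; require_optional and build_inspect_scorer are hypothetical names, not existing ir functions, and the actual scorer wiring is deliberately left out.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency on first use, with an actionable error.

    Hypothetical helper: nothing is imported at `import ir` time, so the
    core package and the default test run never touch the optional dependency.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as err:
        raise ImportError(
            f"{module_name!r} is required for this feature; "
            f"install it with: pip install 'ir[{extra}]'"
        ) from err


def build_inspect_scorer() -> ModuleType:
    """Hypothetical entry point: resolves inspect_ai only when called."""
    inspect_ai = require_optional("inspect_ai", extra="inspect")
    # Assumption: the real version would build and return a scorer here.
    # This sketch returns the module so that no Inspect-AI API is guessed at.
    return inspect_ai
```

Calling build_inspect_scorer() without the extra installed raises an ImportError that names the install command, while a plain import of the package stays free of the dependency.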