Scope question: thin graphrag eval runner, or keep all evaluation in benchmark-qed?
#2338
machachlouei started this conversation in Ideas
0 comments
Question for maintainers: Would a minimal `graphrag eval` command that runs queries through existing modes and emits structured outputs (no scoring) be in scope here, or should all evaluation tooling stay in microsoft/benchmark-qed?

I'm aware BenchmarkQED exists and that it's positioned as the evaluation companion to this library. Before I propose anything more concrete, I want to check whether a thin runner inside graphrag itself would be welcome, or whether you'd prefer the boundary stay where it is today.
The narrow shape I have in mind — explicitly not including scoring:

- run a set of questions through the existing query modes (local, global, drift, basic) and emit structured outputs

The motivation is just convenience: today, users running mode comparisons end up writing the same loop-over-questions script. A first-party runner would give a consistent entry point without duplicating anything BenchmarkQED already does well.
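For context, the ad-hoc script I mean looks roughly like this. This is a minimal sketch, not graphrag's API: the `run_query` callable is a hypothetical stand-in for however one actually invokes a query mode (CLI subprocess, Python API, etc.), and the output record shape is just an illustration.

```python
import json

# The query modes being compared; mirrors graphrag's existing modes.
MODES = ["local", "global", "drift", "basic"]

def run_all(questions, run_query):
    """Run every question through every mode and collect structured records.

    `run_query(mode, question)` is a hypothetical stand-in for whatever
    actually invokes graphrag; it should return the answer text for one
    (mode, question) pair.
    """
    records = []
    for question in questions:
        for mode in MODES:
            answer = run_query(mode, question)
            records.append({
                "question": question,
                "mode": mode,
                "answer": answer,
            })
    return records

if __name__ == "__main__":
    # Stub query function so the sketch runs without a real index.
    fake = lambda mode, q: f"[{mode}] answer to: {q}"
    print(json.dumps(run_all(["What changed in v2?"], fake), indent=2))
```

Every user comparing modes ends up writing some variant of this loop plus output-serialization glue, which is the duplication a first-party runner would remove.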
Totally fine with "no, please contribute that to benchmark-qed instead" as an answer — I just want to know which side of the line this falls on before doing any work.