Scope question: thin graphrag eval runner, or keep all evaluation in benchmark-qed?
#2338
machachlouei started this conversation in Ideas
0 comments
Question for maintainers: Would a minimal `graphrag eval` command that runs queries through existing modes and emits structured outputs (no scoring) be in scope here, or should all evaluation tooling stay in microsoft/benchmark-qed?

I'm aware BenchmarkQED exists and that it's positioned as the evaluation companion to this library. Before I propose anything more concrete, I want to check whether a thin runner inside graphrag itself would be welcome, or whether you'd prefer the boundary stay where it is today.
The narrow shape I have in mind — explicitly not including scoring:

- run a set of questions through the existing query modes (local, global, drift, basic) and emit structured outputs

The motivation is just convenience: today, users running mode comparisons end up writing the same loop-over-questions script. A first-party runner would give a consistent entry point without duplicating anything BenchmarkQED already does well.
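For context, the ad-hoc script I mean looks roughly like this. This is a minimal sketch, not graphrag's API: the `run_query` callable is a hypothetical stand-in for however one actually invokes a query mode (CLI subprocess, Python API, etc.), and the output record shape is just an illustration.

```python
import json

# The query modes being compared; mirrors graphrag's existing modes.
MODES = ["local", "global", "drift", "basic"]

def run_all(questions, run_query):
    """Run every question through every mode and collect structured records.

    `run_query(mode, question)` is a hypothetical stand-in for whatever
    actually invokes graphrag; it should return the answer text for one
    (mode, question) pair.
    """
    records = []
    for question in questions:
        for mode in MODES:
            answer = run_query(mode, question)
            records.append({
                "question": question,
                "mode": mode,
                "answer": answer,
            })
    return records

if __name__ == "__main__":
    # Stub query function so the sketch runs without a real index.
    fake = lambda mode, q: f"[{mode}] answer to: {q}"
    print(json.dumps(run_all(["What changed in v2?"], fake), indent=2))
```

Every user comparing modes ends up writing some variant of this loop plus output-serialization glue, which is the duplication a first-party runner would remove.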
Totally fine with "no, please contribute that to benchmark-qed instead" as an answer — I just want to know which side of the line this falls on before doing any work.