feat: add multi-index query support (queryAll / query_all)#3
feat: add multi-index query support (queryAll / query_all)#3
Conversation
- Add MultiQueryResult class (TS + Python) with combined_context, labels, per-index results, and optional agentic answer - Add static TreeDex.queryAll() (TS, parallel) and TreeDex.query_all() (Python) — queries N indexes, merges context with [Label] separators - Export MultiQueryResult from src/index.ts and treedex/__init__.py - Add 10 new tests in test/core.test.ts and tests/test_core.py covering multi-index queries, custom labels, shared LLM, agentic mode, and validation errors - Add examples/multi_index.py and examples/node/multi-index.ts - Add docs/how-it-works.md — full phase-by-phase pipeline walkthrough - Update docs/api.md, README.md, DOCS.md with MultiQueryResult reference - Fix docs nav_order conflict (how-it-works.md=3, shifted api.md+ up by 1)
📝 WalkthroughWalkthroughThis pull request introduces a new multi-index querying API to TreeDex, enabling simultaneous queries across multiple TreeDex indexes with merged results. New Changes
Sequence DiagramsequenceDiagram
participant Client
participant TreeDex1 as TreeDex Index 1
participant TreeDex2 as TreeDex Index 2
participant TreeDex3 as TreeDex Index N
participant ContextMerge as Context Merger
participant LLM as LLM (Agentic)
Client->>TreeDex1: query(question, agentic=false)
Client->>TreeDex2: query(question, agentic=false)
Client->>TreeDex3: query(question, agentic=false)
par Concurrent Queries
TreeDex1-->>Client: QueryResult (context)
TreeDex2-->>Client: QueryResult (context)
TreeDex3-->>Client: QueryResult (context)
end
Client->>ContextMerge: merge(results, labels)
ContextMerge-->>Client: combinedContext [Label1]...[LabelN]
alt Agentic Mode Enabled
Client->>LLM: answerPrompt(combinedContext, question)
LLM-->>Client: answer
else Agentic Disabled
Client-->>Client: answer = ""
end
Client-->>Client: MultiQueryResult(results, labels, combinedContext, answer)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (2)
tests/test_core.py (1)
9-9: Cover the package-level re-export in one test.This PR changes
treedex/__init__.py, but these tests still importMultiQueryResultfromtreedex.core, so a broken root-level export would pass unnoticed. Adding one smoke test viafrom treedex import MultiQueryResultwould lock down the public API.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_core.py` at line 9, Add a smoke test that imports the public symbol from the package root to lock the public API: instead of only importing MultiQueryResult from treedex.core, add a new test (or extend tests/test_core.py) that does a simple from treedex import MultiQueryResult (and optionally TreeDex, QueryResult) and asserts the imported names are not None or usable; target the package-level re-export (MultiQueryResult) so a broken treedex/__init__.py will fail the test.README.md (1)
386-387: Consider making multi-index method signatures explicit in the API table.Current table entry is concise, but adding
llm/agentic/labelsoptions there would reduce ambiguity for first-time users.✍️ Suggested doc tweak
-| **Multi-index query** | **`TreeDex.query_all(indexes, q)`** | **`await TreeDex.queryAll(indexes, q)`** | +| **Multi-index query** | **`TreeDex.query_all(indexes, q, llm=None, agentic=False, labels=None)`** | **`await TreeDex.queryAll(indexes, q, { llm?, agentic?, labels? })`** |🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 386 - 387, The API table entry for the multi-index query is ambiguous because it doesn't show optional parameters; update the README row for the multi-index method to explicitly list the available options (llm, agentic, labels) and their types/usage for both function variants (TreeDex.query_all(indexes, q) and await TreeDex.queryAll(indexes, q)), e.g., indicate an optional options object like { llm, agentic, labels } after q and briefly note defaults/valid values so first-time users know how to pass LLM/agentic/labels settings when calling these methods.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/how-it-works.md`:
- Around line 39-45: Several fenced code blocks in the document are unlabeled
and trigger markdownlint rule MD040; update each unlabeled triple-backtick fence
(e.g., the blocks starting with "[H1] Chapter 1: Introduction", "You are a
document retrieval system...", the JSON snippet { "node_ids": ["0005"], ... },
the "[2: Safety Guidelines]" block, and the "Document (PDF / TXT / HTML / DOCX)"
/ "QueryResult ..." block) to include a language identifier such as text by
changing the opening ``` to ```text and leaving the closing ``` unchanged so
markdownlint no longer flags MD040.
In `@examples/node/multi-index.ts`:
- Line 11: Replace the non-null assertion when instantiating GeminiLLM by
explicitly guarding process.env.GEMINI_API_KEY: read it into a const (e.g.,
const apiKey = process.env.GEMINI_API_KEY), check if (!apiKey) throw a clear
Error (e.g., "GEMINI_API_KEY environment variable is required"), and then call
new GeminiLLM(apiKey); reference the GeminiLLM constructor and the
GEMINI_API_KEY env var in your change.
In `@src/core.ts`:
- Around line 538-549: Before calling Promise.all over indexes.map(idx =>
idx.query(...)), validate LLM availability and fail fast: if a global
options?.llm is provided use that, otherwise ensure each index has a non-null
llm (check idx.llm) and throw an Error if any index lacks an LLM before
dispatching any queries. Update the pre-flight logic around the llm/agentic
handling (the llm constant and the indexes.map(...) that calls idx.query) to
perform this check and only proceed to Promise.all when all required LLMs are
present.
In `@treedex/core.py`:
- Around line 445-455: Before calling idx.query for each index, validate that
every index has an effective LLM available: for each element in indexes compute
effective_llm = llm if llm is not None else idx.llm and if any effective_llm is
None raise ValueError("No LLM available for index X") (or a similar message) so
validation runs before any idx.query calls; then pass the effective_llm to
idx.query (i.e., replace idx.query(question, llm=llm, ...) with using the
precomputed effective_llm) while keeping the existing labels length check
intact.
---
Nitpick comments:
In `@README.md`:
- Around line 386-387: The API table entry for the multi-index query is
ambiguous because it doesn't show optional parameters; update the README row for
the multi-index method to explicitly list the available options (llm, agentic,
labels) and their types/usage for both function variants
(TreeDex.query_all(indexes, q) and await TreeDex.queryAll(indexes, q)), e.g.,
indicate an optional options object like { llm, agentic, labels } after q and
briefly note defaults/valid values so first-time users know how to pass
LLM/agentic/labels settings when calling these methods.
In `@tests/test_core.py`:
- Line 9: Add a smoke test that imports the public symbol from the package root
to lock the public API: instead of only importing MultiQueryResult from
treedex.core, add a new test (or extend tests/test_core.py) that does a simple
from treedex import MultiQueryResult (and optionally TreeDex, QueryResult) and
asserts the imported names are not None or usable; target the package-level
re-export (MultiQueryResult) so a broken treedex/__init__.py will fail the test.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 90e2f02a-9033-4778-a733-a324151aea79
📒 Files selected for processing (17)
DOCS.mdREADME.mddocs/api.mddocs/benchmark-report.mddocs/benchmarks.mddocs/case-studies.mddocs/configuration.mddocs/how-it-works.mddocs/llm-backends.mdexamples/multi_index.pyexamples/node/multi-index.tssrc/core.tssrc/index.tstest/core.test.tstests/test_core.pytreedex/__init__.pytreedex/core.py
| ``` | ||
| [H1] Chapter 1: Introduction | ||
| This chapter covers the background... | ||
|
|
||
| [H2] 1.1 Motivation | ||
| The problem we are solving is... | ||
| ``` |
There was a problem hiding this comment.
Add language identifiers to fenced code blocks to satisfy markdownlint.
Lines 39, 88, 100, and 141 use unlabeled fences (MD040), which can fail docs lint checks.
🧹 Suggested fix
-```
+```text
[H1] Chapter 1: Introduction
This chapter covers the background...
@@
-The problem we are solving is...
-```
+The problem we are solving is...
+```
-```
+```text
You are a document retrieval system. Given this tree structure, pick
the node_ids most relevant to the query. Return JSON only.
@@
-{ "node_ids": ["0005"], "reasoning": "Safety section covers this." }
-```
+{ "node_ids": ["0005"], "reasoning": "Safety section covers this." }
+```
-```
+```text
[2: Safety Guidelines]
Section 2 covers personal protective equipment...
-```
+```
-```
+```text
Document (PDF / TXT / HTML / DOCX)
@@
-QueryResult { context, nodeIds, pageRanges, reasoning, answer }
-```
+QueryResult { context, nodeIds, pageRanges, reasoning, answer }
+```Also applies to: 88-93, 100-103, 141-182
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 39-39: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/how-it-works.md` around lines 39 - 45, Several fenced code blocks in the
document are unlabeled and trigger markdownlint rule MD040; update each
unlabeled triple-backtick fence (e.g., the blocks starting with "[H1] Chapter 1:
Introduction", "You are a document retrieval system...", the JSON snippet {
"node_ids": ["0005"], ... }, the "[2: Safety Guidelines]" block, and the
"Document (PDF / TXT / HTML / DOCX)" / "QueryResult ..." block) to include a
language identifier such as text by changing the opening ``` to ```text and
leaving the closing ``` unchanged so markdownlint no longer flags MD040.
| import { TreeDex, GeminiLLM } from "../../src/index.js"; | ||
|
|
||
| async function main() { | ||
| const llm = new GeminiLLM(process.env.GEMINI_API_KEY!); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
fd -i 'llm-backends.ts'
rg -n -C3 'class GeminiLLM|constructor\s*\(' src/llm-backends.tsRepository: mithun50/TreeDex
Length of output: 4391
Guard GEMINI_API_KEY explicitly instead of using non-null assertion.
Line 11 uses non-null assertion on process.env.GEMINI_API_KEY, which suppresses type checks but doesn't prevent missing env vars at runtime. The GeminiLLM constructor expects apiKey: string with no undefined fallback, so an explicit guard provides clearer error handling.
💡 Suggested fix
- const llm = new GeminiLLM(process.env.GEMINI_API_KEY!);
+ const apiKey = process.env.GEMINI_API_KEY;
+ if (!apiKey) {
+ throw new Error("Missing GEMINI_API_KEY environment variable.");
+ }
+ const llm = new GeminiLLM(apiKey);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const llm = new GeminiLLM(process.env.GEMINI_API_KEY!); | |
| const apiKey = process.env.GEMINI_API_KEY; | |
| if (!apiKey) { | |
| throw new Error("Missing GEMINI_API_KEY environment variable."); | |
| } | |
| const llm = new GeminiLLM(apiKey); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/node/multi-index.ts` at line 11, Replace the non-null assertion when
instantiating GeminiLLM by explicitly guarding process.env.GEMINI_API_KEY: read
it into a const (e.g., const apiKey = process.env.GEMINI_API_KEY), check if
(!apiKey) throw a clear Error (e.g., "GEMINI_API_KEY environment variable is
required"), and then call new GeminiLLM(apiKey); reference the GeminiLLM
constructor and the GEMINI_API_KEY env var in your change.
| const llm = options?.llm; | ||
| const agentic = options?.agentic ?? false; | ||
| const labels = options?.labels ?? indexes.map((_, i) => `Document ${i + 1}`); | ||
|
|
||
| if (labels.length !== indexes.length) { | ||
| throw new Error("labels length must match indexes length."); | ||
| } | ||
|
|
||
| // Query all indexes in parallel (no agentic per-index; answer at the end) | ||
| const results = await Promise.all( | ||
| indexes.map((idx) => idx.query(question, { llm, agentic: false })), | ||
| ); |
There was a problem hiding this comment.
Validate LLM availability before firing Promise.all().
Right now a single index with idx.llm === null can make queryAll() reject only after the other queries have already been dispatched. For a paid backend, that means the call can fail after partial external work has already happened. Fail fast before starting any retrievals.
💡 Suggested fix
if (labels.length !== indexes.length) {
throw new Error("labels length must match indexes length.");
}
+ if (!llm && indexes.some((idx) => idx.llm === null)) {
+ throw new Error(
+ "No LLM provided for one or more indexes. Pass options.llm or set llm on every TreeDex.",
+ );
+ }
+
// Query all indexes in parallel (no agentic per-index; answer at the end)
const results = await Promise.all(
indexes.map((idx) => idx.query(question, { llm, agentic: false })),
);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const llm = options?.llm; | |
| const agentic = options?.agentic ?? false; | |
| const labels = options?.labels ?? indexes.map((_, i) => `Document ${i + 1}`); | |
| if (labels.length !== indexes.length) { | |
| throw new Error("labels length must match indexes length."); | |
| } | |
| // Query all indexes in parallel (no agentic per-index; answer at the end) | |
| const results = await Promise.all( | |
| indexes.map((idx) => idx.query(question, { llm, agentic: false })), | |
| ); | |
| const llm = options?.llm; | |
| const agentic = options?.agentic ?? false; | |
| const labels = options?.labels ?? indexes.map((_, i) => `Document ${i + 1}`); | |
| if (labels.length !== indexes.length) { | |
| throw new Error("labels length must match indexes length."); | |
| } | |
| if (!llm && indexes.some((idx) => idx.llm === null)) { | |
| throw new Error( | |
| "No LLM provided for one or more indexes. Pass options.llm or set llm on every TreeDex.", | |
| ); | |
| } | |
| // Query all indexes in parallel (no agentic per-index; answer at the end) | |
| const results = await Promise.all( | |
| indexes.map((idx) => idx.query(question, { llm, agentic: false })), | |
| ); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/core.ts` around lines 538 - 549, Before calling Promise.all over
indexes.map(idx => idx.query(...)), validate LLM availability and fail fast: if
a global options?.llm is provided use that, otherwise ensure each index has a
non-null llm (check idx.llm) and throw an Error if any index lacks an LLM before
dispatching any queries. Update the pre-flight logic around the llm/agentic
handling (the llm constant and the indexes.map(...) that calls idx.query) to
perform this check and only proceed to Promise.all when all required LLMs are
present.
| if labels is None: | ||
| labels = [f"Document {i + 1}" for i in range(len(indexes))] | ||
|
|
||
| if len(labels) != len(indexes): | ||
| raise ValueError("labels length must match indexes length.") | ||
|
|
||
| # Query each index (no per-index agentic; single answer at the end) | ||
| results = [ | ||
| idx.query(question, llm=llm, agentic=False) | ||
| for idx in indexes | ||
| ] |
There was a problem hiding this comment.
Fail fast when any index has no usable LLM.
If llm is omitted and one later TreeDex has idx.llm is None, earlier indexes may already have issued retrieval calls before query_all() raises. That turns a validation error into a partial, billable failure. Validate effective LLM availability for every index before starting the fan-out.
💡 Suggested fix
if len(labels) != len(indexes):
raise ValueError("labels length must match indexes length.")
+ if llm is None and any(idx.llm is None for idx in indexes):
+ raise ValueError(
+ "No LLM provided for one or more indexes. Pass llm= to "
+ "query_all() or set llm on every TreeDex instance."
+ )
+
# Query each index (no per-index agentic; single answer at the end)
results = [
idx.query(question, llm=llm, agentic=False)
for idx in indexes
]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if labels is None: | |
| labels = [f"Document {i + 1}" for i in range(len(indexes))] | |
| if len(labels) != len(indexes): | |
| raise ValueError("labels length must match indexes length.") | |
| # Query each index (no per-index agentic; single answer at the end) | |
| results = [ | |
| idx.query(question, llm=llm, agentic=False) | |
| for idx in indexes | |
| ] | |
| if labels is None: | |
| labels = [f"Document {i + 1}" for i in range(len(indexes))] | |
| if len(labels) != len(indexes): | |
| raise ValueError("labels length must match indexes length.") | |
| if llm is None and any(idx.llm is None for idx in indexes): | |
| raise ValueError( | |
| "No LLM provided for one or more indexes. Pass llm= to " | |
| "query_all() or set llm on every TreeDex instance." | |
| ) | |
| # Query each index (no per-index agentic; single answer at the end) | |
| results = [ | |
| idx.query(question, llm=llm, agentic=False) | |
| for idx in indexes | |
| ] |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@treedex/core.py` around lines 445 - 455, Before calling idx.query for each
index, validate that every index has an effective LLM available: for each
element in indexes compute effective_llm = llm if llm is not None else idx.llm
and if any effective_llm is None raise ValueError("No LLM available for index
X") (or a similar message) so validation runs before any idx.query calls; then
pass the effective_llm to idx.query (i.e., replace idx.query(question, llm=llm,
...) with using the precomputed effective_llm) while keeping the existing labels
length check intact.
|
@copilot see this once |
…ard, and markdownlint MD040 Agent-Logs-Url: https://github.com/mithun50/TreeDex/sessions/94b91517-029d-4aa9-a905-030e116be523 Co-authored-by: mithun50 <99024517+mithun50@users.noreply.github.com>
Reviewed the PR and addressed all four CodeRabbit review comments in commit
All 106 tests still pass. Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
|
@claude review this pr. |
|
Claude encountered an error —— View job I'll analyze this and get back to you. |
docs/how-it-works.md— add language identifiers (text) to unlabeled fenced code blocks (MD040)examples/node/multi-index.ts— guardGEMINI_API_KEYexplicitly instead of non-null assertionsrc/core.ts— validate LLM availability before firingPromise.all()(fail fast)treedex/core.py— validate LLM availability before fan-out (fail fast)