Skip to content

SqliteStore backend + annotation, audit, and query result cache tools#169

Draft
data-douser wants to merge 2 commits intomainfrom
dd/sqlite-annotation-cache
Draft

SqliteStore backend + annotation, audit, and query result cache tools#169
data-douser wants to merge 2 commits intomainfrom
dd/sqlite-annotation-cache

Conversation

@data-douser
Copy link
Collaborator

Summary

Replace lowdb with sql.js (asm.js build) as a zero-dependency SQLite persistence backend. Add 14 new MCP tools for annotation management, audit workflows, and query result caching. Refactor server code into smaller, focused modules.

Changes

SqliteStore persistence backend

  • server/src/lib/sqlite-store.ts — unified SQLite persistence with 3 tables:
    • sessions: session tracking (migrated from lowdb)
    • annotations: key-value annotation store with categories and full-text search
    • query_result_cache: BQRS/SARIF result caching with subset retrieval (line ranges, grep, SARIF filters)
  • server/src/types/sql-js.d.ts — type declarations for sql.js
  • server/package.json — added sql.js, removed lowdb

New tools (gated by ENABLE_ANNOTATION_TOOLS env var)

  • Annotation tools (6): annotation_create, annotation_list, annotation_search, annotation_delete
  • Audit tools (4): audit_store_findings, audit_list_findings, audit_add_notes, audit_clear_repo
  • Cache tools (4): query_results_cache_lookup, query_results_cache_retrieve, query_results_cache_clear, query_results_cache_compare

Code refactoring

  • Extracted database-resolver.ts from cli-tool-registry.ts
  • Extracted query-resolver.ts from cli-tool-registry.ts
  • Extracted result-processor.ts from cli-tool-registry.ts
  • Extracted codeql-version.ts from cli-executor.ts
  • cli-tool-registry.ts reduced from ~1060 to ~693 lines

Bug fixes

  • Fix params.output not propagated to processQueryRunResults (caused cache writes to silently skip)
  • Fix duplicate SARIF generation bypassing auto-cache pipeline
  • Fix external predicate conditions for direct query paths (CallGraphFrom/To/FromTo)

Testing

  • All 1074 server unit tests passing
  • New test: sqlite-store.test.ts (31 tests)
  • Updated: session-data-manager.test.ts, monitoring-tools.test.ts

Review order

This PR is independent and should be reviewed first (other PRs depend on it).

Closes #165
Part of #163

Replace lowdb with sql.js (asm.js build) for zero-dependency SQLite persistence.
Bundle inline with esbuild — no native modules, no external deps at runtime.

SqliteStore provides three tables:
- sessions: session tracking (migrated from lowdb)
- annotations: key-value annotation store with categories and metadata
- query_result_cache: BQRS/SARIF result caching with subset retrieval

New tools (gated by ENABLE_ANNOTATION_TOOLS env var):
- annotation_create, annotation_list, annotation_search, annotation_delete
- audit_store_findings, audit_list_findings, audit_add_notes, audit_clear_repo
- query_results_cache_lookup, query_results_cache_retrieve,
  query_results_cache_clear, query_results_cache_compare

Code refactoring for maintainability:
- Extract database-resolver.ts from cli-tool-registry.ts
- Extract query-resolver.ts from cli-tool-registry.ts
- Extract result-processor.ts from cli-tool-registry.ts
- Extract codeql-version.ts from cli-executor.ts

Bug fixes:
- Fix params.output not propagated to proce- Fix params.output not propagated to proce- Fix params.output not propagated txternal predicate conditions for direct query paths

Closes #165
Copilot AI review requested due to automatic review settings March 25, 2026 11:53
@github-actions
Copy link
Contributor

github-actions bot commented Mar 25, 2026

Dependency Review

The following issues were found:
  • ✅ 0 vulnerable package(s)
  • ✅ 0 package(s) with incompatible licenses
  • ✅ 0 package(s) with invalid SPDX license definitions
  • ⚠️ 1 package(s) with unknown licenses.
See the Details below.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA d26fe4f.
Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

License Issues

server/package.json

PackageVersionLicenseIssue Type
sql.js^1.14.1NullUnknown License

OpenSSF Scorecard

PackageVersionScoreDetails
npm/sql.js 1.14.1 🟢 3.4
Details
CheckScoreReason
Code-Review⚠️ 2Found 6/30 approved changesets -- score normalized to 2
Dangerous-Workflow🟢 10no dangerous workflow patterns detected
Packaging⚠️ -1packaging workflow not detected
Maintained🟢 43 commit(s) and 2 issue activity found in the last 90 days -- score normalized to 4
Token-Permissions⚠️ 0detected GitHub workflow tokens with excessive permissions
Binary-Artifacts🟢 10no binaries found in the repo
Pinned-Dependencies🟢 6dependency not pinned by hash detected -- score normalized to 6
CII-Best-Practices⚠️ 0no effort to earn an OpenSSF best practices badge detected
Security-Policy⚠️ 0security policy file not detected
Fuzzing⚠️ 0project is not fuzzed
License🟢 9license file detected
Branch-Protection⚠️ 0branch protection not enabled on development/release branches
Signed-Releases⚠️ 0Project has not signed or included provenance with any releases.
SAST⚠️ 0SAST tool is not run on all commits -- score normalized to 0
npm/sql.js ^1.14.1 UnknownUnknown

Scanned Files

  • package-lock.json
  • server/package.json

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR migrates the server’s persistence layer from lowdb to a unified sql.js-backed SqliteStore, and introduces opt-in MCP tools for annotations, audit workflows, and query result caching while refactoring CLI-related logic into smaller modules.

Changes:

  • Replace lowdb session persistence with SqliteStore (sessions + annotations + query result cache tables).
  • Add new opt-in MCP tools: annotation_*, audit_*, and query_results_cache_*.
  • Refactor query/database resolution and query-result processing (interpretation + auto-caching) into dedicated modules.

Reviewed changes

Copilot reviewed 18 out of 20 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
server/test/src/tools/monitoring-tools.test.ts Updates monitoring-tool tests to account for async store initialization and new config flag.
server/test/src/lib/sqlite-store.test.ts Adds unit tests for SqliteStore sessions, annotations, persistence, and cache behaviors.
server/test/src/lib/session-data-manager.test.ts Updates tests to await async initialization after migrating persistence backend.
server/src/types/sql-js.d.ts Adds minimal typings for sql.js asm.js import + core DB surface.
server/src/types/monitoring.ts Adds enableAnnotationTools config flag (default false).
server/src/tools/cache-tools.ts Introduces query_results_cache_* tools (lookup/retrieve/clear/compare).
server/src/tools/audit-tools.ts Introduces audit_* tools layered on annotations.
server/src/tools/annotation-tools.ts Introduces annotation_* tools for CRUD + search.
server/src/lib/sqlite-store.ts Implements the new unified SQLite persistence backend + cache subset retrieval.
server/src/lib/session-data-manager.ts Migrates session persistence from lowdb to SqliteStore; adds getStore().
server/src/lib/result-processor.ts Extracts query result interpretation/evaluation and auto-cache pipeline.
server/src/lib/query-results-evaluator.ts Adds query metadata caching with mtime-based invalidation.
server/src/lib/query-resolver.ts Extracts query-path resolution via codeql resolve queries.
server/src/lib/database-resolver.ts Extracts database-path resolution and caches results in-memory.
server/src/lib/codeql-version.ts Adds target/actual CodeQL version tracking and mismatch warning.
server/src/lib/cli-tool-registry.ts Integrates extracted resolvers/result processor and fixes output propagation + predicate handling.
server/src/lib/cli-executor.ts Wires version detection into startup validation; re-exports version helpers.
server/src/codeql-development-mcp-server.ts Registers new annotation/audit/cache tools at startup and initializes session manager.
server/package.json Removes lowdb, adds sql.js.
package-lock.json Updates lockfile to reflect dependency swap (remove lowdb/steno, add sql.js).
Comments suppressed due to low confidence (2)

server/src/lib/sqlite-store.ts:425

  • updateAnnotation sets updated_at = datetime('now'), which produces a different string format than the ISO timestamps written in createAnnotation. This can cause incorrect ordering when sorting by updated_at. Consider updating this to use the same format as inserts.
    setClauses.push("updated_at = datetime('now')");

    const db = this.ensureDb();
    db.run(
      `UPDATE annotations SET ${setClauses.join(', ')} WHERE id = $id`,
      params as Record<string, string | number | null>,
    );

server/src/lib/session-data-manager.ts:49

  • Docstring says “sql.js WASM init is async”, but this store uses the asm.js build. Please adjust the wording so it’s accurate (async init is still true, but it’s not specifically WASM).
  /**
   * Initialize the database and ensure it's properly set up.
   * Must be awaited before any session operations (sql.js WASM init is async).
   */

Comment on lines +5 to +16
import { describe, it, expect, beforeEach, afterEach } from 'vitest';
import { SqliteStore } from '../../../src/lib/sqlite-store';
import { existsSync, rmSync } from 'fs';

describe('SqliteStore', () => {
let store: SqliteStore;
const testDir = '.ql-mcp-sqlite-test';

beforeEach(async () => {
if (existsSync(testDir)) {
rmSync(testDir, { recursive: true, force: true });
}
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file uses a fixed directory name (.ql-mcp-sqlite-test) in the repo root for test storage. In this repo’s tests, temporary data should go under the project-local .tmp/ directory via getProjectTmpBase() / createProjectTempDir() helpers to avoid polluting the workspace and reduce the risk of collisions. Consider generating a unique temp dir per test run and cleaning it up.

Suggested change
import { describe, it, expect, beforeEach, afterEach } from 'vitest';
import { SqliteStore } from '../../../src/lib/sqlite-store';
import { existsSync, rmSync } from 'fs';
describe('SqliteStore', () => {
let store: SqliteStore;
const testDir = '.ql-mcp-sqlite-test';
beforeEach(async () => {
if (existsSync(testDir)) {
rmSync(testDir, { recursive: true, force: true });
}
import { existsSync, rmSync } from 'fs';
import { SqliteStore } from '../../../src/lib/sqlite-store';
import { createProjectTempDir } from '../../../src/utils/temp-dir';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
describe('SqliteStore', () => {
let store: SqliteStore;
let testDir: string;
beforeEach(async () => {
testDir = createProjectTempDir();

Copilot uses AI. Check for mistakes.
Comment on lines +303 to +314
const now = new Date().toISOString();
db.run(
`INSERT INTO annotations (category, entity_key, label, content, metadata, created_at, updated_at)
VALUES ($category, $entity_key, $label, $content, $metadata, $created_at, $updated_at)`,
{
$category: category,
$entity_key: entityKey,
$label: label ?? null,
$content: content ?? null,
$metadata: metadata ?? null,
$created_at: now,
$updated_at: now,
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

createAnnotation writes created_at/updated_at using new Date().toISOString(), but updateAnnotation later sets updated_at = datetime('now'). Mixing timestamp formats can break ORDER BY updated_at sorting because the strings aren’t comparable across formats. Use a single consistent timestamp format for both create and update (either SQLite datetime('now') everywhere or ISO strings everywhere).

This issue also appears on line 419 of the same file.

Suggested change
const now = new Date().toISOString();
db.run(
`INSERT INTO annotations (category, entity_key, label, content, metadata, created_at, updated_at)
VALUES ($category, $entity_key, $label, $content, $metadata, $created_at, $updated_at)`,
{
$category: category,
$entity_key: entityKey,
$label: label ?? null,
$content: content ?? null,
$metadata: metadata ?? null,
$created_at: now,
$updated_at: now,
db.run(
`INSERT INTO annotations (category, entity_key, label, content, metadata, created_at, updated_at)
VALUES ($category, $entity_key, $label, $content, $metadata, datetime('now'), datetime('now'))`,
{
$category: category,
$entity_key: entityKey,
$label: label ?? null,
$content: content ?? null,
$metadata: metadata ?? null,

Copilot uses AI. Check for mistakes.
Comment on lines +195 to +201
flush(): void {
if (!this.db) return;
const data = this.db.export();
const buffer = Buffer.from(data);
writeFileSync(this.dbPath, buffer);
this.dirty = false;
}
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flush() exports the entire in-memory sql.js database and writes it to disk synchronously. Since many write operations call flush() immediately, this can become a significant performance bottleneck (and block the event loop) as the DB grows (especially with cached query results). Consider batching/debouncing flushes, flushing only on close, or making flush-on-write configurable.

Copilot uses AI. Check for mistakes.
Comment on lines 2 to 4
* Session Data Management
* Provides unified JSON storage and session lifecycle management using lowdb
* Provides session lifecycle management backed by SqliteStore (sql.js WASM)
*/
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Header comment says sessions are backed by “sql.js WASM”, but SqliteStore imports the asm.js build (sql.js/dist/sql-asm.js). Update the comment to avoid implying a WASM dependency.

This issue also appears on line 46 of the same file.

Copilot uses AI. Check for mistakes.
Comment on lines +223 to +226
beforeEach(async () => {
// Initialize the sqlite store (required for sql.js WASM backend)
await sessionDataManager.initialize();

Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sessionDataManager.initialize() is called before mocking getConfig() to use testStorageDir, but the underlying SqliteStore’s storage directory is fixed at construction time from the real config. As a result, these tests may initialize/create a DB in the default location and never clean it up. Consider resetting modules after setting MONITORING_STORAGE_LOCATION, or using a fresh SessionDataManager instance in the tests.

Copilot uses AI. Check for mistakes.
Comment on lines +49 to 57
// Check cache with mtime validation
const stat = statSync(queryPath);
const mtime = stat.mtimeMs;
const cached = metadataCache.get(queryPath);
if (cached && cached.mtime === mtime) {
return cached.metadata;
}

const queryContent = readFileSync(queryPath, 'utf-8');
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extractQueryMetadata calls statSync(queryPath) and then readFileSync(queryPath). This introduces a TOCTOU window (file can change between the two operations) and violates the repo’s guidance to avoid stat-then-read patterns. Prefer reading via an opened file descriptor and fstatSync, or read first and then stat for cache invalidation.

Copilot uses AI. Check for mistakes.
Comment on lines +653 to +654
const [start, end] = options.resultIndices;
selected = selected.slice(start, end);
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getCacheSarifSubset treats resultIndices as [start, end] but uses slice(start, end), which excludes end. This is inconsistent with lineRange handling (inclusive end) and with the tool description that reads like an inclusive range. Either document end as exclusive, or change the implementation to include the end index (and add clamping/validation).

Suggested change
const [start, end] = options.resultIndices;
selected = selected.slice(start, end);
const [rawStart, rawEnd] = options.resultIndices;
const total = selected.length;
if (Number.isFinite(rawStart) && Number.isFinite(rawEnd) && total > 0) {
const start = Math.max(0, Math.min(total - 1, Math.floor(rawStart)));
const end = Math.max(0, Math.min(total - 1, Math.floor(rawEnd)));
if (end >= start) {
// resultIndices is treated as an inclusive [start, end] range
selected = selected.slice(start, end + 1);
} else {
selected = [];
}
} else {
selected = [];
}

Copilot uses AI. Check for mistakes.
Comment on lines +8 to +9
* Opt-in via ENABLE_AUDIT_TOOLS=true (disabled by default).
* Requires ENABLE_ANNOTATION_TOOLS=true (audit tools are layered on annotations).
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The header comment says “Opt-in via ENABLE_AUDIT_TOOLS=true”, but registerAuditTools only gates on config.enableAnnotationTools / ENABLE_ANNOTATION_TOOLS. Either add and wire up an enableAuditTools config/env flag as documented, or update the comment to reflect the actual gating behavior.

Suggested change
* Opt-in via ENABLE_AUDIT_TOOLS=true (disabled by default).
* Requires ENABLE_ANNOTATION_TOOLS=true (audit tools are layered on annotations).
* Enabled when ENABLE_ANNOTATION_TOOLS=true (disabled by default when annotation tools are off).
* Audit tools are layered on annotations; there is no separate ENABLE_AUDIT_TOOLS flag.

Copilot uses AI. Check for mistakes.
Comment on lines +85 to +107
cacheKey: z.string().describe('The cache key of the result to retrieve.'),
lineRange: z.tuple([z.number(), z.number()]).optional().describe('Line range [start, end] (1-indexed). For graphtext/CSV output.'),
resultIndices: z.tuple([z.number(), z.number()]).optional().describe('SARIF result index range [start, end] (0-indexed).'),
fileFilter: z.string().optional().describe('For SARIF: only include results whose file path contains this string.'),
grep: z.string().optional().describe('Text search filter: only include lines/results containing this term.'),
maxLines: z.number().optional().describe('Maximum number of lines to return (default: 500).'),
},
async ({ cacheKey, lineRange, resultIndices, fileFilter, grep, maxLines }) => {
const store = sessionDataManager.getStore();
const meta = store.getCacheEntryMeta(cacheKey);

if (!meta) {
return { content: [{ type: 'text' as const, text: `No cached result found for key: ${cacheKey}` }] };
}

// For SARIF format with SARIF-specific filters, use SARIF subset retrieval
const isSarif = meta.outputFormat.includes('sarif');
if (isSarif && (resultIndices || fileFilter)) {
const subset = store.getCacheSarifSubset(cacheKey, {
resultIndices,
fileFilter,
maxResults: maxLines,
});
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

query_results_cache_retrieve uses a maxLines parameter but passes it as maxResults for SARIF subset retrieval. This mixes “lines” and “results” concepts in the API and can be confusing for callers. Consider adding a separate maxResults parameter for SARIF (or renaming to a generic maxItems) and keep maxLines for line-oriented formats only.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SqliteStore backend + annotation, audit, and query result cache tools

2 participants