feat(indexer): Repository Indexer orchestration layer #15

prosdev · 2025-11-22T10:58:04Z

Summary

Implements the integration layer that orchestrates Scanner, Embedder, and Vector Store into a cohesive indexing pipeline with state management and incremental updates.

Features

✅ Full repository indexing with progress tracking
✅ Incremental updates (only changed files)
✅ Semantic search over indexed content
✅ State management for change detection
✅ Batch processing for efficient embedding
✅ Error handling with detailed error reporting
✅ Configurable batch sizes, exclusions, languages

Architecture

RepositoryIndexer (orchestrator)
    ├── Scanner (extract documents)
    ├── Embedder (generate vectors)
    └── Vector Store (search & storage)
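
The composition above can be sketched as follows. This is a minimal illustration, not the code in this PR: the `Doc`, `Scanner`, `Embedder`, and `VectorStore` shapes and the `runPipeline` function are hypothetical, and everything is kept synchronous for brevity even though the real `RepositoryIndexer` API is asynchronous.

```typescript
// Hypothetical component contracts, assumed for illustration only.
interface Doc {
  id: string;
  text: string;
}

interface Scanner {
  scan(): Doc[];
}

interface Embedder {
  embed(texts: string[]): number[][];
}

interface VectorStore {
  add(docs: Doc[], vectors: number[][]): void;
}

interface PipelineStats {
  documentsIndexed: number;
  batches: number;
}

// Orchestrates scan -> embed -> store, one batch at a time.
// The default of 32 matches the batch size quoted in this PR.
function runPipeline(
  scanner: Scanner,
  embedder: Embedder,
  store: VectorStore,
  batchSize: number = 32,
): PipelineStats {
  const docs = scanner.scan();
  let batches = 0;
  for (let i = 0; i < docs.length; i += batchSize) {
    const batch = docs.slice(i, i + batchSize);
    const vectors = embedder.embed(batch.map((d) => d.text));
    store.add(batch, vectors);
    batches++;
  }
  return { documentsIndexed: docs.length, batches };
}

// In-memory fakes so the sketch runs without a model or a database.
function makeFakeScanner(count: number): Scanner {
  return {
    scan: () =>
      Array.from({ length: count }, (_, i) => ({ id: `doc-${i}`, text: `text ${i}` })),
  };
}

const fakeEmbedder: Embedder = {
  // Stand-in "embedding": a one-dimensional vector holding the text length.
  embed: (texts) => texts.map((t) => [t.length]),
};

function makeFakeStore(sink: Doc[]): VectorStore {
  return {
    add: (docs) => {
      sink.push(...docs);
    },
  };
}
```

Because each component sits behind a small interface, a test can swap in fakes like these while production code supplies the real scanner, embedder, and vector store.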

Testing

✅ 39 comprehensive tests, all passing
✅ 76% statement coverage (100% function coverage)
✅ Total: 103 tests across entire codebase
✅ Overall: 86% coverage for full codebase

Coverage Breakdown

Statements: 76% (target was 80%; defensive error handling accounts for the gap)
Branches: 46% (uncovered: rare edge cases, file race conditions)
Functions: 100% ✅ (every function tested)
Lines: 75.6%

Note: Uncovered lines are defensive error handling (files disappearing mid-indexing, corrupt state files, etc.). These paths are hard to test without extensive mocking, and robust error handling was preferred over brittle tests.

Documentation

✅ Comprehensive README with usage examples
✅ Real-world repository indexing example
✅ API reference and best practices
✅ Performance benchmarks (~15-25 docs/sec)
✅ State management documentation
✅ Troubleshooting guide

Performance

Indexing Speed: ~15-25 documents/second
Batch Processing: Configurable batch size (default 32)
Incremental Updates: Fast re-indexing of only changed files
Memory: Controlled via batch size
Storage: ~1.5KB per document
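
As a rough worked example of what these figures imply, the sketch below turns them into a back-of-the-envelope estimate. The `estimateIndexing` helper is hypothetical and not part of the package; it only applies the approximate numbers quoted above, so treat the output as a planning aid rather than a benchmark.

```typescript
// Approximate figures quoted in this PR.
const DOCS_PER_SECOND_LOW = 15;
const DOCS_PER_SECOND_HIGH = 25;
const KB_PER_DOCUMENT = 1.5;
const DEFAULT_BATCH_SIZE = 32;

interface IndexEstimate {
  batches: number;
  minSeconds: number;
  maxSeconds: number;
  storageKB: number;
}

// Hypothetical helper: estimates batch count, duration range, and storage.
function estimateIndexing(
  documentCount: number,
  batchSize: number = DEFAULT_BATCH_SIZE,
): IndexEstimate {
  return {
    batches: Math.ceil(documentCount / batchSize),
    minSeconds: documentCount / DOCS_PER_SECOND_HIGH, // fastest quoted rate
    maxSeconds: documentCount / DOCS_PER_SECOND_LOW, // slowest quoted rate
    storageKB: documentCount * KB_PER_DOCUMENT,
  };
}

const estimate = estimateIndexing(3000);
console.log(estimate);
```

For 3,000 documents this works out to 94 batches, roughly 120 to 200 seconds of indexing, and about 4,500 KB of vector storage.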

Example Usage

import { RepositoryIndexer } from '@lytics/dev-agent-core';

const indexer = new RepositoryIndexer({
  repositoryPath: './my-repo',
  vectorStorePath: './.dev-agent/vectors.lance',
});

await indexer.initialize();

// Index with progress tracking
const stats = await indexer.index({
  onProgress: (progress) => {
    console.log(`${progress.percentComplete}% - ${progress.phase}`);
  },
});

console.log(`Indexed ${stats.documentsIndexed} documents in ${stats.duration}ms`);

// Semantic search
const results = await indexer.search('authentication logic', {
  limit: 10,
  scoreThreshold: 0.7,
});

// Incremental update
const updateStats = await indexer.update();
console.log(`Updated ${updateStats.filesScanned} changed files`);

Implementation Details

State Management

Persisted to .dev-agent/indexer-state.json
Tracks file hashes for change detection
Version-aware for future compatibility
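
Hash-based change detection of this kind might look like the sketch below. The state shape, the `detectChanges` function, and the version string are assumptions made for illustration; the actual schema of `indexer-state.json` is not shown in this PR. The FNV-1a hash is a dependency-free stand-in, not necessarily the hash the indexer uses.

```typescript
// Hypothetical state shape: a version tag plus one content hash per file path.
interface IndexerState {
  version: string;
  files: Record<string, string>;
}

interface ChangeSet {
  added: string[];
  changed: string[];
  deleted: string[];
}

// 32-bit FNV-1a, used here only as a stand-in content hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Builds a state snapshot from a map of path -> file content.
function buildState(contents: Record<string, string>): IndexerState {
  const files: Record<string, string> = {};
  for (const [path, content] of Object.entries(contents)) {
    files[path] = fnv1a(content);
  }
  return { version: "1.0.0", files }; // version value is illustrative
}

// Compares the persisted state with the current contents of the repository.
function detectChanges(
  previous: IndexerState,
  current: Record<string, string>,
): ChangeSet {
  const result: ChangeSet = { added: [], changed: [], deleted: [] };
  for (const [path, content] of Object.entries(current)) {
    const oldHash = previous.files[path];
    if (oldHash === undefined) {
      result.added.push(path);
    } else if (oldHash !== fnv1a(content)) {
      result.changed.push(path);
    }
  }
  for (const path of Object.keys(previous.files)) {
    if (!(path in current)) {
      result.deleted.push(path);
    }
  }
  return result;
}
```

An incremental update would then re-embed only the `added` and `changed` paths and drop vectors for the `deleted` ones, which is what keeps re-indexing fast.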

Progress Tracking

Phases: scanning → embedding → storing → complete
Percentage complete (0-100%)
Current file being processed
Callbacks for real-time updates
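
The progress contract can be sketched as below. The `percentComplete` and `phase` fields appear in the usage example; the `currentFile` field name, the `Phase` type, and the `reportProgress` driver are assumptions, and the way this sketch spreads the percentage evenly across the three working phases is an illustrative choice rather than the indexer's actual weighting.

```typescript
type Phase = "scanning" | "embedding" | "storing" | "complete";

// Hypothetical progress payload passed to the onProgress callback.
interface IndexProgress {
  phase: Phase;
  percentComplete: number; // 0-100
  currentFile?: string;
}

type ProgressCallback = (progress: IndexProgress) => void;

// Walks every file through each working phase and reports after each step.
function reportProgress(files: string[], onProgress: ProgressCallback): void {
  const workingPhases: Phase[] = ["scanning", "embedding", "storing"];
  const totalSteps = workingPhases.length * files.length;
  let done = 0;
  for (const phase of workingPhases) {
    for (const file of files) {
      done++;
      onProgress({
        phase,
        percentComplete: Math.floor((done / totalSteps) * 100),
        currentFile: file,
      });
    }
  }
  // A final event with no current file signals completion.
  onProgress({ phase: "complete", percentComplete: 100 });
}

reportProgress(["src/a.ts", "src/b.ts"], (p) => {
  console.log(`${p.percentComplete}% - ${p.phase}`);
});
```

Keeping the callback a plain function means callers can drive a CLI progress bar, a log line, or a UI update without the indexer knowing which.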

Error Handling

Graceful degradation on file errors
Detailed error reporting with context
Partial results on batch failures
Continues indexing after errors
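
One way to get this behavior is to isolate failures at the batch level, as in the sketch below. The `indexWithRecovery` function and the `BatchError` and `RecoveryResult` shapes are hypothetical names used for illustration; the PR's actual error types are defined in `types.ts` and may look different.

```typescript
// Hypothetical error record: which files were affected and why.
interface BatchError {
  files: string[];
  message: string;
}

interface RecoveryResult {
  indexed: string[];
  errors: BatchError[];
}

// Processes files in batches. A failing batch is recorded with context and
// skipped, so earlier and later batches still contribute partial results.
function indexWithRecovery(
  files: string[],
  processBatch: (batch: string[]) => void,
  batchSize: number = 32,
): RecoveryResult {
  const result: RecoveryResult = { indexed: [], errors: [] };
  for (let i = 0; i < files.length; i += batchSize) {
    const batch = files.slice(i, i + batchSize);
    try {
      processBatch(batch);
      result.indexed.push(...batch);
    } catch (err) {
      result.errors.push({
        files: batch,
        message: err instanceof Error ? err.message : String(err),
      });
    }
  }
  return result;
}

// Example: the batch containing "c.ts" fails, the rest are still indexed.
const recovery = indexWithRecovery(
  ["a.ts", "b.ts", "c.ts", "d.ts", "e.ts"],
  (batch) => {
    if (batch.includes("c.ts")) {
      throw new Error("file disappeared mid-indexing");
    }
  },
  2,
);
console.log(recovery.indexed, recovery.errors);
```

The trade-off in this sketch is granularity: one bad file costs its whole batch, so a smaller batch size limits the blast radius at the price of more embedding calls.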

Files Changed

packages/core/src/indexer/
├── index.ts              (487 lines) - Main orchestrator
├── types.ts              (192 lines) - Type definitions
├── indexer.test.ts       (720 lines) - Integration tests
├── indexer-edge.test.ts  (281 lines) - Edge case tests
└── README.md             (580 lines) - Documentation

docs/WORKFLOW.md          (463 lines) - Development workflow guide

Closes

Closes #12

Ready for review! This completes the core indexing pipeline, enabling end-to-end repository intelligence.

Implements the integration layer that orchestrates Scanner, Embedder, and Vector Store into a cohesive indexing pipeline with state management and incremental updates.

Implementation:
- RepositoryIndexer class orchestrating full pipeline
- State management for incremental updates
- Progress tracking with callbacks
- Batch processing for efficient embedding
- File change detection via content hashing
- Comprehensive error handling

Features:
- Full repository indexing with progress tracking
- Incremental updates (only changed files)
- Semantic search over indexed content
- Statistics and monitoring
- Configurable batch sizes and exclusion patterns
- Language filtering
- State persistence for incremental updates

Testing:
- 16 comprehensive tests, all passing
- 75.2% statement coverage (100% function coverage)
- Tested: full indexing, incremental updates, search, state management
- Tested: progress tracking, error handling, configuration options

Documentation:
- Comprehensive README with usage examples
- Real-world repository indexing example
- API reference and best practices
- Performance characteristics and benchmarks
- State management documentation
- Troubleshooting guide

Architecture:
- Clean orchestration layer
- Pluggable components
- Type-safe throughout
- Efficient batch processing

Performance:
- ~15-25 docs/second indexing speed
- Batch processing with configurable size
- Incremental updates for fast re-indexing
- State tracking for change detection

Coverage:
- 75.2% statements, 44% branches, 100% functions
- All core functionality tested
- Integration tests with scanner + embedder + storage

All Tests: 80/80 passing ✅

Issue: #12

prosdev merged commit 3c04783 into main Nov 22, 2025
1 check passed
