Skip to content

Conversation

@prosdev
Copy link
Collaborator

@prosdev prosdev commented Nov 22, 2025

Summary

Implements the integration layer that orchestrates Scanner, Embedder, and Vector Store into a cohesive indexing pipeline with state management and incremental updates.

Features

Full repository indexing with progress tracking
Incremental updates (only changed files)
Semantic search over indexed content
State management for change detection
Batch processing for efficient embedding
Error handling with detailed error reporting
Configurable batch sizes, exclusions, languages

Architecture

RepositoryIndexer (orchestrator)
    ├── Scanner (extract documents)
    ├── Embedder (generate vectors)
    └── Vector Store (search & storage)

Testing

  • 39 comprehensive tests, all passing
  • 76% statement coverage (100% function coverage)
  • Total: 103 tests across entire codebase
  • Overall: 86% coverage for full codebase

Coverage Breakdown

  • Statements: 76% (target was 80%, defensive error handling accounts for gap)
  • Branches: 46% (uncovered: rare edge cases, file race conditions)
  • Functions: 100% ✅ (every function tested)
  • Lines: 75.6%

Note: Uncovered lines are defensive error handling (files disappearing mid-indexing, corrupt state files, etc.) - hard to test without extensive mocking, better to have robust error handling than brittle tests.

Documentation

  • ✅ Comprehensive README with usage examples
  • ✅ Real-world repository indexing example
  • ✅ API reference and best practices
  • ✅ Performance benchmarks (~15-25 docs/sec)
  • ✅ State management documentation
  • ✅ Troubleshooting guide

Performance

  • Indexing Speed: ~15-25 documents/second
  • Batch Processing: Configurable batch size (default 32)
  • Incremental Updates: Fast re-indexing of only changed files
  • Memory: Controlled via batch size
  • Storage: ~1.5KB per document

Example Usage

import { RepositoryIndexer } from '@lytics/dev-agent-core';

const indexer = new RepositoryIndexer({
  repositoryPath: './my-repo',
  vectorStorePath: './.dev-agent/vectors.lance',
});

await indexer.initialize();

// Index with progress tracking
const stats = await indexer.index({
  onProgress: (progress) => {
    console.log(`${progress.percentComplete}% - ${progress.phase}`);
  },
});

console.log(`Indexed ${stats.documentsIndexed} documents in ${stats.duration}ms`);

// Semantic search
const results = await indexer.search('authentication logic', {
  limit: 10,
  scoreThreshold: 0.7,
});

// Incremental update
const updateStats = await indexer.update();
console.log(`Updated ${updateStats.filesScanned} changed files`);

Implementation Details

State Management

  • Persisted to .dev-agent/indexer-state.json
  • Tracks file hashes for change detection
  • Version-aware for future compatibility

Progress Tracking

  • Phases: scanning → embedding → storing → complete
  • Percentage complete (0-100%)
  • Current file being processed
  • Callbacks for real-time updates

Error Handling

  • Graceful degradation on file errors
  • Detailed error reporting with context
  • Partial results on batch failures
  • Continues indexing after errors

Files Changed

packages/core/src/indexer/
├── index.ts              (487 lines) - Main orchestrator
├── types.ts              (192 lines) - Type definitions
├── indexer.test.ts       (720 lines) - Integration tests
├── indexer-edge.test.ts  (281 lines) - Edge case tests
└── README.md             (580 lines) - Documentation

docs/WORKFLOW.md          (463 lines) - Development workflow guide

Closes

Closes #12


Ready for review! This completes the core indexing pipeline, enabling end-to-end repository intelligence.

Implements the integration layer that orchestrates Scanner, Embedder, and Vector Store
into a cohesive indexing pipeline with state management and incremental updates.

Implementation:
- RepositoryIndexer class orchestrating full pipeline
- State management for incremental updates
- Progress tracking with callbacks
- Batch processing for efficient embedding
- File change detection via content hashing
- Comprehensive error handling

Features:
- Full repository indexing with progress tracking
- Incremental updates (only changed files)
- Semantic search over indexed content
- Statistics and monitoring
- Configurable batch sizes and exclusion patterns
- Language filtering
- State persistence for incremental updates

Testing:
- 16 comprehensive tests, all passing
- 75.2% statement coverage (100% function coverage)
- Tested: full indexing, incremental updates, search, state management
- Tested: progress tracking, error handling, configuration options

Documentation:
- Comprehensive README with usage examples
- Real-world repository indexing example
- API reference and best practices
- Performance characteristics and benchmarks
- State management documentation
- Troubleshooting guide

Architecture:
- Clean orchestration layer
- Pluggable components
- Type-safe throughout
- Efficient batch processing

Performance:
- ~15-25 docs/second indexing speed
- Batch processing with configurable size
- Incremental updates for fast re-indexing
- State tracking for change detection

Coverage:
- 75.2% statements, 44% branches, 100% functions
- All core functionality tested
- Integration tests with scanner + embedder + storage

All Tests: 80/80 passing ✅

Issue: #12
@prosdev prosdev merged commit 3c04783 into main Nov 22, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement Repository Indexer - Integration Layer

1 participant