feat: implement Sentinel stress testing system #352

AutumnsGrove · 2026-01-15T22:26:15Z

Implements the Sentinel stress testing system for validating Grove’s infrastructure scalability (10 → 10,000 → 10,000,000 users). Based on the documented Sentinel Pattern with realistic traffic composition and three-phase testing.
Key additions:
∙ Database schema for test runs, metrics, checkpoints, baselines, and schedules (migration 021)
∙ Load profile generators following the Sentinel Pattern’s traffic composition (35% reading, 25% browsing, 10% auth, etc.)
∙ Three-phase testing model: Ramp-up → Peak → Steady-state
∙ Operation generators for D1, KV, and R2 that mimic real Grove usage
∙ API endpoints for triggering and monitoring tests
∙ Scheduler integration for automated weekly/daily stress tests
∙ Clearing service integration stubs (API calls, not direct DB)
Test plan
∙ Run migration 021 against test D1 database
∙ Verify Sentinel API endpoints return correct responses
∙ Run a smoke test profile (createSmokeTestProfile()) to validate operation generators
∙ Confirm no interference with existing Clearing service (separate DB)
∙ Test scheduled run creation and cron expression parsing

Add comprehensive infrastructure validation system for Grove: - Database migration (008_sentinel.sql): - sentinel_runs: Test run configuration and results - sentinel_metrics: Per-operation timing data - sentinel_checkpoints: Periodic snapshots during tests - sentinel_baselines: Performance baselines for comparison - sentinel_schedules: Cron-triggered test schedules - clearing_status: Public status page data - clearing_incidents: Incident tracking - Core library (src/lib/sentinel/): - types.ts: Type definitions for all Sentinel entities - profiles.ts: Load profile generators following Sentinel Pattern - Traffic composition (35% reading, 25% browsing, etc.) - Three-phase model (Ramp-up, Peak, Steady-state) - Preset profiles (spike, sustained, oscillation, ramp) - operations.ts: Operation generators for D1, KV, R2 - runner.ts: Test execution engine with metrics collection - scheduler.ts: Cron trigger handling and maintenance mode - API endpoints (src/routes/api/sentinel/): - GET/POST /api/sentinel - List and create test runs - GET/POST/DELETE /api/sentinel/[id] - Manage individual runs - The Clearing status page (src/routes/clearing/): - Public-facing system status display - Recent sentinel test results - Incident tracking and display - Component status overview Based on the Sentinel Pattern specification for realistic traffic simulation and three-phase testing methodology.

- Rename migration from 008 to 021 (following existing sequence) - Remove duplicate /clearing route (exists as separate service) - Remove clearing_status/clearing_incidents tables from migration (those tables belong in Clearing service's own database) - Update scheduler to use API calls to Clearing instead of direct DB

claude · 2026-01-15T22:27:32Z

PR Review: Sentinel Stress Testing System

Great work on implementing a comprehensive infrastructure validation system! The implementation is well-structured and follows many best practices. Here's my detailed review:

✅ Strengths

Architecture & Design

Well-organized module structure - Clear separation between types, profiles, operations, runner, and scheduler
Comprehensive type system - Excellent use of TypeScript for type safety across all components
Sentinel Pattern implementation - Good adherence to the three-phase testing model and realistic traffic composition
Flexible profile system - Support for multiple test patterns (spike, sustained, oscillation, ramp, custom)

Database Schema

Solid migration design - Well-indexed tables with appropriate foreign keys
Good data normalization - Separation of runs, metrics, checkpoints, baselines, and schedules
Audit trail - Proper tracking with timestamps and status fields

Code Quality

Excellent documentation - Comprehensive JSDoc comments throughout
Error handling patterns - Try-catch blocks with fallbacks
Modular design - Clean separation of concerns

⚠️ Issues & Concerns

1. CRITICAL: Database Query Isolation Violation

Location: packages/engine/src/lib/sentinel/operations.ts:88-127

You're violating the critical database isolation pattern from AGENT.md:274-296. Multiple independent queries share try-catch blocks:

// ❌ BAD - Current code
try {
  const result = await db.prepare(/* INSERT */).run();
  return { success: result.success, ... };
} catch (error) {
  if (error instanceof Error && error.message.includes('no such table')) {
    await db.prepare(/* CREATE TABLE */).run();
    // Retry the insert
    const result = await db.prepare(/* INSERT */).run();
    return { success: result.success, ... };
  }
  throw error;
}

This pattern is acceptable here since the CREATE depends on the INSERT failing, but throughout the file, you should ensure each independent query has isolated error handling.

2. Security: Missing Input Validation

Location: packages/engine/src/routes/api/sentinel/+server.ts:108-116

The API endpoint accepts user input but lacks validation:

// Missing validation for:
- targetOperations (could be negative or excessively large)
- durationSeconds (could cause infinite loops or timeout issues)
- concurrency (could overwhelm the system)
- spikeMultiplier (unbounded)

Recommendation:

// Add validation
if (body.targetOperations && (body.targetOperations < 1 || body.targetOperations > 1_000_000)) {
  throw error(400, 'targetOperations must be between 1 and 1,000,000');
}

if (body.durationSeconds && (body.durationSeconds < 1 || body.durationSeconds > 3600)) {
  throw error(400, 'durationSeconds must be between 1 and 3600');
}

if (body.concurrency && (body.concurrency < 1 || body.concurrency > 1000)) {
  throw error(400, 'concurrency must be between 1 and 1000');
}

3. Performance: Cloudflare Workers CPU Time Limits

Location: packages/engine/src/lib/sentinel/runner.ts:84-180

Cloudflare Workers have a 10-30 second CPU time limit. The SentinelRunner.execute() method could easily exceed this for longer tests.

Issues:

Tests with durationSeconds > 30 will timeout
Even with waitUntil(), CPU time limits apply
The runner doesn't account for these constraints

Recommendation:

Add a maximum duration check (e.g., 30 seconds)
For longer tests, use Durable Objects or cron triggers
Document the limitations clearly

if (profile.durationSeconds > 30 && !this.config.ctx) {
  throw new Error('Tests longer than 30 seconds require execution context (waitUntil)');
}

4. Bug: Typo in Function Name

Location: packages/engine/src/lib/sentinel/profiles.ts:78 and index.ts:78

export function estimateCloudflareCosat(profile: LoadProfile) {
  // Should be "Cost" not "Cosat"

This is exported in the public API and will break type safety for consumers.

5. Resource Leak: Missing Cleanup

Location: packages/engine/src/lib/sentinel/runner.ts:162-166

Cleanup only runs on success, not on failure or cancellation:

} catch (error) {
  const err = error instanceof Error ? error : new Error(String(error));
  await this.updateRunStatus(run.id, 'failed', { errorMessage: err.message });
  this.config.onError?.(err);
  throw error;
  // ❌ Cleanup is NOT called here!
} finally {
  this.isRunning = false;
}

Recommendation:

} finally {
  this.isRunning = false;
  
  // Always cleanup test data
  if (this.config.ctx) {
    this.config.ctx.waitUntil(cleanupSentinelData(db, kv, r2, tenantId));
  } else {
    await cleanupSentinelData(db, kv, r2, tenantId).catch(() => {});
  }
}

6. Potential Race Condition

Location: packages/engine/src/routes/api/sentinel/+server.ts:195-206

When startImmediately is true, the runner executes in the background but the response immediately returns with status: 'running'. However, the actual execution might not have started yet, and could fail immediately.

Recommendation:

Either wait for the runner to actually start before returning
Or return status: 'pending' and let clients poll for updates
Add error handling to update the run status if the background execution fails

7. Missing Test Coverage Mention

No tests are included in this PR. For a stress testing system, you should have:

Unit tests for profile generators
Integration tests for operation execution
Mock tests for the runner
API endpoint tests

8. Cost Estimation Incomplete

Location: packages/engine/src/lib/sentinel/index.ts:78

The estimateCloudflareCosat function is exported but I don't see its implementation in the files provided. This is critical for users to understand the financial impact of tests.

💡 Suggestions

1. Add Rate Limiting

The API endpoints should have rate limiting to prevent abuse:

// Check if there's already a running test for this tenant
const runningTests = await listSentinelRuns(db, tenantId, { status: 'running', limit: 1 });
if (runningTests.length > 0) {
  throw error(429, 'A test is already running for this tenant');
}

2. Add Metrics Persistence

Location: packages/engine/src/lib/sentinel/runner.ts:250-276

Metrics are collected in memory but never persisted to the sentinel_metrics table. This data is valuable for analysis but will be lost if the Worker times out.

Recommendation: Batch insert metrics periodically during execution.

3. Consider Checksum for Config Snapshot

Location: packages/engine/migrations/021_sentinel.sql:52

The config_snapshot field stores the full JSON configuration. Consider adding a checksum or hash field to quickly detect if configurations are identical without parsing JSON.

4. Add Monitoring/Observability

Consider integrating with Cloudflare AI Gateway (mentioned in AGENT.md:102) or add structured logging for:

Operation failures
Performance degradation
Resource usage

5. Missing Export in package.json

Ensure the engine's package.json exports the sentinel module:

{
  "exports": {
    "./sentinel": {
      "types": "./src/lib/sentinel/index.ts",
      "import": "./src/lib/sentinel/index.js"
    }
  }
}

📝 Minor Issues

Inconsistent error handling - Some functions swallow errors silently, others throw
Magic numbers - Several hardcoded values (50, 100, 500) should be constants
Missing Grove voice - The error messages and descriptions don't match the warm Grove tone from CLAUDE.md
Date handling - Mix of Date objects and Unix timestamps could be simplified

🔒 Security Assessment

✅ Authentication required for API endpoints
✅ Tenant isolation in database queries
⚠️ Missing input validation (see issue Project audit and status review - decided on auth #2)
⚠️ No rate limiting (see suggestion Review project documentation and diagrams #1)
✅ Test data cleanup to prevent data leakage

📊 Overall Assessment

This is a solid foundation for a stress testing system, but needs refinement before merging:

Must Fix Before Merge:

Fix the typo: estimateCloudflareCosat → estimateCloudflareCoast or estimateCloudflareCost
Add input validation to API endpoints
Add Cloudflare Workers time limit checks
Fix cleanup to run in finally block
Implement or remove the cost estimation function

Should Fix Before Merge:

Add rate limiting
Persist metrics to database
Add basic tests
Document limitations clearly

Nice to Have:

Grove voice in user-facing messages
Monitoring/observability integration
Performance optimizations

Recommendation: Request changes to address the critical issues above, then this will be ready to merge.

Great work overall! The architecture is sound and this will be a valuable addition to Grove's infrastructure. 🌲

Review generated at packages/engine/src/lib/sentinel/ - focusing on code quality, security, and Cloudflare Workers compatibility

- Add SentinelDO Durable Object following Loom pattern for long-running tests (solves Worker CPU timeout issues for tests >30 seconds) - Fix typo: estimateCloudflareCosat → estimateCloudflareCost - Add comprehensive input validation to API endpoints - Move cleanup to finally block for reliable resource cleanup - Export SentinelDO from index module

claude · 2026-01-16T00:39:45Z

Sentinel Stress Testing System - Code Review

Summary

This PR implements a comprehensive stress testing infrastructure for Grove (the Sentinel system) with realistic traffic simulation, three-phase testing methodology, and Durable Object integration. Overall, this is high-quality, well-architected code that follows Grove's established patterns. The implementation is thorough and production-ready with only minor suggestions for improvement.

✅ Strengths

Architecture & Design

Excellent use of Loom pattern: The SentinelDO Durable Object correctly implements batched D1 writes, hibernation-aware WebSockets, and alarm-based scheduling
Well-separated concerns: Clean separation between profiles, operations, runner, scheduler, and API layers
Realistic traffic modeling: TRAFFIC_COMPOSITION weights (35% reading, 25% browsing, etc.) align with actual user behavior
Comprehensive database schema: All necessary tables with proper indexes, foreign keys, and constraints

Code Quality

Type safety: Extensive TypeScript types with proper interfaces throughout
Error handling: Graceful fallbacks for missing tables, isolated try/catch blocks (following AGENT.md guidance)
Cleanup discipline: Proper cleanup in finally blocks and test data removal after runs
Documentation: Clear inline comments and JSDoc explanations

Security & Validation

Input validation (+server.ts:119-157): Comprehensive bounds checking prevents abuse
- Operations: 1 to 1,000,000
- Duration: 1 to 3,600 seconds
- Concurrency: 1 to 500
- Spike multiplier: 1 to 100
Authentication checks: Proper locals.user validation in API endpoints
SQL injection protection: Parameterized queries throughout

🔍 Issues & Suggestions

1. Migration Dependency Missing (Minor)

Location: 021_sentinel.sql:10
Issue: The migration references sentinel_test_data table but doesn't create it upfront. While the operations code creates it on-demand (line 113-120 in operations.ts), this creates an implicit dependency.

Suggestion: Add the table to the migration for completeness:

-- After line 233 in migration
CREATE TABLE IF NOT EXISTS sentinel_test_data (
  id TEXT PRIMARY KEY,
  tenant_id TEXT NOT NULL,
  data TEXT,
  created_at INTEGER NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_sentinel_test_data_tenant ON sentinel_test_data(tenant_id);

2. Potential Memory Issue in Runner (Medium)

Location: runner.ts:68-70
Issue: The metrics array grows unbounded during long test runs. A 1-hour test with 10 ops/sec = 36,000 metric objects in memory.

Suggestion: Implement periodic flushing similar to the DO pattern:

// After line 140 in runner.ts, add:
if (this.metrics.length >= 1000) {
  await this.flushMetrics();
}

3. WebSocket Message Handler Empty (Low)

Location: durable-object.ts:224-228
Issue: The webSocketMessage handler parses JSON but doesn't act on it. This could throw if invalid JSON is sent.

Suggestion: Add error handling:

async webSocketMessage(ws: WebSocket, message: string | ArrayBuffer): Promise<void> {
  try {
    const data = JSON.parse(message as string);
    // Handle subscription preferences, etc.
    this.log('WebSocket message', data);
  } catch (error) {
    this.log('Invalid WebSocket message', { error: String(error) });
  }
}

4. Race Condition in Alarm Scheduling (Medium)

Location: durable-object.ts:114-116
Issue: If executeBatch() takes longer than 1 second, alarms could stack up or skip intervals.

Suggestion: Calculate next alarm based on when it should occur:

// Replace line 116:
const nextInterval = Math.floor((Date.now() - this.runState.startedAt) / 1000) + 1;
await this.state.storage.setAlarm(this.runState.startedAt + (nextInterval * 1000));

5. Missing Validation for Custom Load Curves (Low)

Location: +server.ts:208-214
Issue: Custom profiles don't validate customConfig.loadCurve structure, which could cause runtime errors.

Suggestion: Add validation or document that custom profiles require direct API construction.

6. Cleanup Function Doesn't Delete Test Table (Low)

Location: operations.ts:746-822
Issue: cleanupSentinelData deletes rows but not the sentinel_test_data table itself, which accumulates over time.

Suggestion: Add table drop after successful cleanup:

// After line 764:
if (d1Deleted > 0) {
  await db.prepare('DROP TABLE IF EXISTS sentinel_test_data').run();
}

7. Typo Fixed in Latest Commit ✅

The typo estimateCloudflareCosat → estimateCloudflareCost was already fixed in commit dac137b. Great catch!

🧪 Test Plan Validation

The PR description mentions:

Run migration 021 against test D1 database

Recommendation: Add a test script or document the manual testing procedure:

# Suggested test commands
wrangler d1 execute DB --file=packages/engine/migrations/021_sentinel.sql
curl -X POST http://localhost:8787/api/sentinel -d '{"name":"Test","profileType":"smoke","startImmediately":true}'

📊 Performance Considerations

Positive

Batched D1 operations (100 metrics per flush) reduces write overhead
Weighted system selection (selectWeightedSystem) mimics realistic traffic
Checkpoint interval (30s) balances observability with overhead

Watch For

Tests with concurrency: 500 and durationSeconds: 3600 could approach Worker CPU limits
R2 operations in batch could hit rate limits at high concurrency
The DO alarm pattern is correct, but verify it handles hibernation correctly

🔒 Security Review

Passed:

✅ No secrets in code
✅ Authentication required for all endpoints
✅ Input validation prevents resource exhaustion
✅ SQL injection protection via parameterized queries
✅ Test data properly namespaced (sentinel_test_ prefix)

Minor concern: The SENTINEL_PREFIX prevents pollution but test data from failed runs could accumulate if cleanup fails. Consider a background job to purge old test data.

📝 Documentation

Strengths:

Excellent inline documentation
Clear type definitions
Good commit messages following Conventional Commits

Suggestion: Add a usage guide in docs/sentinel-usage.md showing:

How to trigger tests via API
How to interpret results
Cost estimation formulas
Best practices for production use

🎯 Recommendation

APPROVE with minor fixes

This is excellent work. The suggested fixes are all minor and non-blocking. The code is ready to merge with the understanding that:

The memory issue in runner.ts should be addressed before running very long tests
The migration should include the test table definition
Consider adding the usage documentation

The implementation correctly follows:

✅ Loom Durable Object pattern
✅ Grove's engine-first architecture
✅ Cloudflare Workers best practices
✅ Three-phase testing methodology from the Sentinel Pattern spec

Great job on this comprehensive feature! 🌲

Reviewed by Claude Sonnet 4.5 via Claude Code

claude added 2 commits January 15, 2026 21:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: implement Sentinel stress testing system #352

feat: implement Sentinel stress testing system #352

AutumnsGrove commented Jan 15, 2026 •

edited

Loading

Uh oh!

claude bot commented Jan 15, 2026

Uh oh!

claude bot commented Jan 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

feat: implement Sentinel stress testing system #352

Are you sure you want to change the base?

feat: implement Sentinel stress testing system #352

Conversation

AutumnsGrove commented Jan 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude bot commented Jan 15, 2026

PR Review: Sentinel Stress Testing System

✅ Strengths

Architecture & Design

Database Schema

Code Quality

⚠️ Issues & Concerns

1. CRITICAL: Database Query Isolation Violation

2. Security: Missing Input Validation

3. Performance: Cloudflare Workers CPU Time Limits

4. Bug: Typo in Function Name

5. Resource Leak: Missing Cleanup

6. Potential Race Condition

7. Missing Test Coverage Mention

8. Cost Estimation Incomplete

💡 Suggestions

1. Add Rate Limiting

2. Add Metrics Persistence

3. Consider Checksum for Config Snapshot

4. Add Monitoring/Observability

5. Missing Export in package.json

📝 Minor Issues

🔒 Security Assessment

📊 Overall Assessment

Uh oh!

claude bot commented Jan 16, 2026

Sentinel Stress Testing System - Code Review

Summary

✅ Strengths

Architecture & Design

Code Quality

Security & Validation

🔍 Issues & Suggestions

1. Migration Dependency Missing (Minor)

2. Potential Memory Issue in Runner (Medium)

3. WebSocket Message Handler Empty (Low)

4. Race Condition in Alarm Scheduling (Medium)

5. Missing Validation for Custom Load Curves (Low)

6. Cleanup Function Doesn't Delete Test Table (Low)

7. Typo Fixed in Latest Commit ✅

🧪 Test Plan Validation

📊 Performance Considerations

Positive

Watch For

🔒 Security Review

📝 Documentation

🎯 Recommendation

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

AutumnsGrove commented Jan 15, 2026 •

edited

Loading