feat: implement Sentinel stress testing system #352
base: main
Add comprehensive infrastructure validation system for Grove:

- Database migration (008_sentinel.sql):
  - sentinel_runs: Test run configuration and results
  - sentinel_metrics: Per-operation timing data
  - sentinel_checkpoints: Periodic snapshots during tests
  - sentinel_baselines: Performance baselines for comparison
  - sentinel_schedules: Cron-triggered test schedules
  - clearing_status: Public status page data
  - clearing_incidents: Incident tracking
- Core library (src/lib/sentinel/):
  - types.ts: Type definitions for all Sentinel entities
  - profiles.ts: Load profile generators following Sentinel Pattern
    - Traffic composition (35% reading, 25% browsing, etc.)
    - Three-phase model (Ramp-up, Peak, Steady-state)
    - Preset profiles (spike, sustained, oscillation, ramp)
  - operations.ts: Operation generators for D1, KV, R2
  - runner.ts: Test execution engine with metrics collection
  - scheduler.ts: Cron trigger handling and maintenance mode
- API endpoints (src/routes/api/sentinel/):
  - GET/POST /api/sentinel - List and create test runs
  - GET/POST/DELETE /api/sentinel/[id] - Manage individual runs
- The Clearing status page (src/routes/clearing/):
  - Public-facing system status display
  - Recent sentinel test results
  - Incident tracking and display
  - Component status overview

Based on the Sentinel Pattern specification for realistic
traffic simulation and three-phase testing methodology.
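
To illustrate the traffic composition idea, here is a minimal sketch of weighted operation selection. It is not the code in profiles.ts. The names `OperationKind`, `TRAFFIC_MIX`, and `pickOperation` are hypothetical, and only the 35% reading, 25% browsing, and 10% auth weights come from this PR; the `other` bucket is an assumed placeholder for the unspecified remainder.

```typescript
// Hypothetical operation categories. "other" stands in for the part of the
// traffic mix that the PR text abbreviates as "etc.".
type OperationKind = "reading" | "browsing" | "auth" | "other";

const TRAFFIC_MIX: ReadonlyArray<{ kind: OperationKind; weight: number }> = [
  { kind: "reading", weight: 0.35 },
  { kind: "browsing", weight: 0.25 },
  { kind: "auth", weight: 0.1 },
  { kind: "other", weight: 0.3 }, // assumption: remainder so weights sum to 1
];

// Map a uniform random number r in [0, 1) to an operation kind by walking
// the cumulative weights.
function pickOperation(r: number): OperationKind {
  let cumulative = 0;
  let last: OperationKind = "other";
  for (const { kind, weight } of TRAFFIC_MIX) {
    last = kind;
    cumulative += weight;
    if (r < cumulative) return kind;
  }
  // Floating-point rounding can leave r just above the final cumulative sum.
  return last;
}
```

Passing `Math.random()` as `r` gives a random mix; passing a seeded generator instead would make a test run reproducible.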
- Rename migration from 008 to 021 (following existing sequence)
- Remove duplicate /clearing route (exists as separate service)
- Remove clearing_status/clearing_incidents tables from migration (those tables belong in Clearing service's own database)
- Update scheduler to use API calls to Clearing instead of direct DB
PR Review: Sentinel Stress Testing System

Great work on implementing a comprehensive infrastructure validation system! The implementation is well-structured and follows many best practices. Here's my detailed review:

✅ Strengths
Architecture & Design
Database Schema
Code Quality
- Add SentinelDO Durable Object following Loom pattern for long-running tests (solves Worker CPU timeout issues for tests >30 seconds)
- Fix typo: estimateCloudflareCosat → estimateCloudflareCost
- Add comprehensive input validation to API endpoints
- Move cleanup to finally block for reliable resource cleanup
- Export SentinelDO from index module
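
As a rough illustration of the input validation item, the sketch below checks a create-run request body before it reaches the runner. It is an assumption-laden example, not the validation in this PR: the field names `name`, `profileType`, and `startImmediately` are taken from the curl example in the review, the profile names from the commit text, and the 100-character limit and the `validateCreateRun` helper are hypothetical.

```typescript
// Profile names mentioned in this PR; the request shape is an assumption.
const PROFILE_TYPES = ["smoke", "spike", "sustained", "oscillation", "ramp"] as const;
type ProfileType = (typeof PROFILE_TYPES)[number];

interface CreateRunRequest {
  name: string;
  profileType: ProfileType;
  startImmediately: boolean;
}

type ValidationResult =
  | { ok: true; value: CreateRunRequest }
  | { ok: false; errors: string[] };

// Collect every problem instead of failing on the first one, so the API
// can report all of them in a single 400 response.
function validateCreateRun(body: unknown): ValidationResult {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return { ok: false, errors: ["body must be a JSON object"] };
  }
  const input = body as Record<string, unknown>;
  const errors: string[] = [];

  const name = input["name"];
  // Assumption: 100 characters is an arbitrary illustrative limit.
  if (typeof name !== "string" || name.trim().length === 0 || name.length > 100) {
    errors.push("name must be a non-empty string of at most 100 characters");
  }

  const profileType = input["profileType"];
  if (
    typeof profileType !== "string" ||
    (PROFILE_TYPES as readonly string[]).indexOf(profileType) === -1
  ) {
    errors.push(`profileType must be one of: ${PROFILE_TYPES.join(", ")}`);
  }

  // Optional flag; defaults to false when omitted.
  const startImmediately = input["startImmediately"] ?? false;
  if (typeof startImmediately !== "boolean") {
    errors.push("startImmediately must be a boolean");
  }

  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    value: {
      name: (name as string).trim(),
      profileType: profileType as ProfileType,
      startImmediately: startImmediately as boolean,
    },
  };
}
```

Returning a tagged result rather than throwing keeps the handler's control flow explicit: it can branch on `ok` and map the error list to a response body.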
Sentinel Stress Testing System - Code Review

Summary
This PR implements a comprehensive stress testing infrastructure for Grove (the Sentinel system) with realistic traffic simulation, three-phase testing methodology, and Durable Object integration. Overall, this is high-quality, well-architected code that follows Grove's established patterns. The implementation is thorough and production-ready with only minor suggestions for improvement.

✅ Strengths
Architecture & Design
Code Quality
Security & Validation

🔍 Issues & Suggestions

1. Migration Dependency Missing (Minor)
Location:
Suggestion: Add the table to the migration for completeness:

-- After line 233 in migration
CREATE TABLE IF NOT EXISTS sentinel_test_data (
id TEXT PRIMARY KEY,
tenant_id TEXT NOT NULL,
data TEXT,
created_at INTEGER NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_sentinel_test_data_tenant ON sentinel_test_data(tenant_id);

2. Potential Memory Issue in Runner (Medium)
Location:
Suggestion: Implement periodic flushing similar to the DO pattern:

// After line 140 in runner.ts, add:
if (this.metrics.length >= 1000) {
  await this.flushMetrics();
}

3. WebSocket Message Handler Empty (Low)
Location:
Suggestion: Add error handling:

async webSocketMessage(ws: WebSocket, message: string | ArrayBuffer): Promise<void> {
  try {
    // Binary frames arrive as ArrayBuffer, so decode them before parsing.
    const text = typeof message === 'string' ? message : new TextDecoder().decode(message);
    const data = JSON.parse(text);
    // Handle subscription preferences, etc.
    this.log('WebSocket message', data);
  } catch (error) {
    this.log('Invalid WebSocket message', { error: String(error) });
  }
}

4. Race Condition in Alarm Scheduling (Medium)
Location:
Suggestion: Calculate next alarm based on when it should occur:

// Replace line 116:
const nextInterval = Math.floor((Date.now() - this.runState.startedAt) / 1000) + 1;
await this.state.storage.setAlarm(this.runState.startedAt + (nextInterval * 1000));

5. Missing Validation for Custom Load Curves (Low)
Location:
Suggestion: Add validation or document that custom profiles require direct API construction.

6. Cleanup Function Doesn't Delete Test Table (Low)
Location:
Suggestion: Add table drop after successful cleanup:

// After line 764:
if (d1Deleted > 0) {
  await db.prepare('DROP TABLE IF EXISTS sentinel_test_data').run();
}

7. Typo Fixed in Latest Commit ✅
The typo

🧪 Test Plan Validation
The PR description mentions:
Recommendation: Add a test script or document the manual testing procedure:

# Suggested test commands
wrangler d1 execute DB --file=packages/engine/migrations/021_sentinel.sql
curl -X POST http://localhost:8787/api/sentinel -H 'Content-Type: application/json' -d '{"name":"Test","profileType":"smoke","startImmediately":true}'

📊 Performance Considerations
Positive
Watch For

🔒 Security Review
Passed:
Minor concern: The

📝 Documentation
Strengths:
Suggestion: Add a usage guide in

🎯 Recommendation
APPROVE with minor fixes

This is excellent work. The suggested fixes are all minor and non-blocking. The code is ready to merge with the understanding that:
The implementation correctly follows:
Great job on this comprehensive feature! 🌲

Reviewed by Claude Sonnet 4.5 via Claude Code
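
The alarm scheduling suggestion in issue 4 of the review above anchors each alarm to the run's start time instead of to the moment the previous alarm happened to fire. A minimal sketch of that arithmetic as a pure function follows, assuming the one-second tick used in the reviewer's snippet; `TICK_MS` and `nextAlarmTime` are hypothetical names, not identifiers from the PR.

```typescript
// Assumption: a fixed one-second tick, as in the review's snippet.
const TICK_MS = 1000;

// Return the timestamp of the first tick boundary strictly after `now`,
// counted from `startedAt`. Because the result depends only on startedAt
// and the tick index, handler latency does not accumulate as drift.
function nextAlarmTime(startedAt: number, now: number): number {
  const elapsedTicks = Math.floor((now - startedAt) / TICK_MS);
  return startedAt + (elapsedTicks + 1) * TICK_MS;
}
```

For example, with `startedAt = 10_000` and `now = 12_350` the function returns `13_000`, whereas scheduling at `now + 1000` would give `13_350` and push every later alarm back by the same 350 ms. In a Durable Object the returned value would be passed to `state.storage.setAlarm(...)`.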
Implements the Sentinel stress testing system for validating Grove’s infrastructure scalability (10 → 10,000 → 10,000,000 users). Based on the documented Sentinel Pattern with realistic traffic composition and three-phase testing.
Key additions:
∙ Database schema for test runs, metrics, checkpoints, baselines, and schedules (migration 021)
∙ Load profile generators following the Sentinel Pattern’s traffic composition (35% reading, 25% browsing, 10% auth, etc.)
∙ Three-phase testing model: Ramp-up → Peak → Steady-state
∙ Operation generators for D1, KV, and R2 that mimic real Grove usage
∙ API endpoints for triggering and monitoring tests
∙ Scheduler integration for automated weekly/daily stress tests
∙ Clearing service integration stubs (API calls, not direct DB)
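
One way to picture the three-phase model is as a function from elapsed time to a target request rate. The sketch below is illustrative only: the `ThreePhaseProfile` fields, the linear ramp, and the assumption that the steady-state phase runs at its own sustained rate after the peak are all hypothetical and may differ from what the Sentinel Pattern specifies.

```typescript
// Hypothetical profile shape; every field name here is an assumption.
interface ThreePhaseProfile {
  rampUpSeconds: number;
  peakSeconds: number;
  steadySeconds: number;
  peakOpsPerSecond: number;
  steadyOpsPerSecond: number;
}

type Phase = "ramp-up" | "peak" | "steady-state" | "done";

// Return the phase and target operations per second at t seconds into a run.
function targetLoad(
  profile: ThreePhaseProfile,
  t: number,
): { phase: Phase; opsPerSecond: number } {
  const elapsed = Math.max(0, t);
  const peakStart = profile.rampUpSeconds;
  const steadyStart = peakStart + profile.peakSeconds;
  const end = steadyStart + profile.steadySeconds;

  if (elapsed < peakStart) {
    // Linear ramp from 0 up to the peak rate.
    const fraction = elapsed / profile.rampUpSeconds;
    return { phase: "ramp-up", opsPerSecond: Math.round(profile.peakOpsPerSecond * fraction) };
  }
  if (elapsed < steadyStart) {
    return { phase: "peak", opsPerSecond: profile.peakOpsPerSecond };
  }
  if (elapsed < end) {
    return { phase: "steady-state", opsPerSecond: profile.steadyOpsPerSecond };
  }
  return { phase: "done", opsPerSecond: 0 };
}
```

A runner could call `targetLoad` once per tick and issue that many operations, which keeps the load curve declarative and easy to unit test without touching D1, KV, or R2.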
Test plan
∙ Run migration 021 against test D1 database
∙ Verify Sentinel API endpoints return correct responses
∙ Run a smoke test profile (createSmokeTestProfile()) to validate operation generators
∙ Confirm no interference with existing Clearing service (separate DB)
∙ Test scheduled run creation and cron expression parsing