Decision: Use Server-Sent Events (SSE) instead of WebSockets for real-time triage updates.
Rationale:
- Unidirectional communication (server → client) is sufficient
- Built-in auto-reconnection with EventSource
- Works seamlessly with HTTP/2 multiplexing
- Simpler than WebSockets (plain HTTP, with no protocol upgrade handshake or custom framing)
- Better for read-heavy streaming scenarios
Trade-offs: No bidirectional communication, but not needed for our use case.
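A minimal sketch of the server side, assuming a plain Node `http` handler. `TriageUpdate`, `Subscribe`, and `triageStreamHandler` are hypothetical names, not the project's actual code; the sketch only shows the `text/event-stream` wire format that `EventSource` consumes.

```typescript
import type { IncomingMessage, ServerResponse } from "node:http";

// Hypothetical payload for a triage update pushed to the browser.
export interface TriageUpdate {
  alertId: string;
  status: "queued" | "in_progress" | "resolved";
}

// Serialize one event in the text/event-stream wire format: one field per
// line, multi-line data split across several "data:" lines, and a blank
// line terminating the event.
export function formatSseEvent(
  data: string,
  opts: { event?: string; id?: string } = {},
): string {
  let out = "";
  if (opts.event) out += `event: ${opts.event}\n`;
  if (opts.id) out += `id: ${opts.id}\n`;
  for (const line of data.split("\n")) out += `data: ${line}\n`;
  return out + "\n";
}

// Hypothetical pub/sub hook: registers a listener, returns an unsubscribe.
export type Subscribe = (listener: (u: TriageUpdate) => void) => () => void;

// Holds the response open and writes one event per update. EventSource
// echoes the last id back as Last-Event-ID on reconnect; replaying missed
// events from it is left out of this sketch.
export function triageStreamHandler(subscribe: Subscribe) {
  return (req: IncomingMessage, res: ServerResponse): void => {
    res.writeHead(200, {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
    });
    let seq = 0;
    const unsubscribe = subscribe((update) => {
      seq += 1;
      res.write(
        formatSseEvent(JSON.stringify(update), { event: "triage", id: String(seq) }),
      );
    });
    // Stop writing once the client disconnects.
    req.on("close", unsubscribe);
  };
}
```

On the client, `new EventSource(url)` with `addEventListener("triage", ...)` receives these events; if the connection drops, the browser reconnects on its own and sends the last `id` it saw in the `Last-Event-ID` header.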
Decision: Implement cursor-based (keyset) pagination using (customerId, timestamp) composite key.
Rationale:
- Stable results even when data changes (no page drift)
- Per-page cost does not grow with page depth (an index seek to the cursor instead of scanning skipped rows with OFFSET)
- Better for large datasets (1M+ rows)
- Prevents duplicate/missing items on concurrent updates
Trade-offs: Cannot jump to arbitrary page numbers, but forward/backward navigation works well.
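A sketch of the cursor mechanics, assuming a Postgres table named `transactions` with `customer_id`, `ts`, and `id` columns (all hypothetical). It adds `id` as a tiebreaker on the assumption that two rows can share a timestamp; without a unique final sort key, rows with equal timestamps could be skipped or repeated at a page boundary.

```typescript
// Hypothetical opaque cursor: the position of the last row already served.
export interface Cursor {
  ts: string; // ISO-8601 timestamp of the last row
  id: string; // tiebreaker for rows sharing a timestamp (assumption)
}

export function encodeCursor(c: Cursor): string {
  return Buffer.from(JSON.stringify(c)).toString("base64url");
}

export function decodeCursor(s: string): Cursor {
  return JSON.parse(Buffer.from(s, "base64url").toString("utf8")) as Cursor;
}

// Parameterized query for the next page, newest first. The row-value
// comparison lets an index on (customer_id, ts, id) seek straight to the
// cursor, so deep pages cost the same as the first one.
export function nextPageQuery(
  customerId: string,
  limit: number,
  cursor?: string,
): { text: string; values: unknown[] } {
  if (!cursor) {
    return {
      text: "SELECT * FROM transactions WHERE customer_id = $1 ORDER BY ts DESC, id DESC LIMIT $2",
      values: [customerId, limit],
    };
  }
  const c = decodeCursor(cursor);
  return {
    text:
      "SELECT * FROM transactions WHERE customer_id = $1 AND (ts, id) < ($2, $3) " +
      "ORDER BY ts DESC, id DESC LIMIT $4",
    values: [customerId, c.ts, c.id, limit],
  };
}
```

Backward navigation flips the comparison to `>` and the sort to `ASC`, then reverses the fetched rows before returning them. Prisma's `findMany` also offers `cursor`, `take`, and `skip` options for cursor-style pagination.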
Decision: Implement circuit breaker with 30s timeout after 3 consecutive failures per agent.
Rationale:
- Prevents cascading failures when risk/fraud APIs are down
- Allows system to degrade gracefully with fallbacks
- Auto-recovery after timeout period
- Protects downstream services from overload
Trade-offs: Brief service degradation during recovery window.
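A minimal sketch of the breaker using the common closed / open / half-open state machine, with the 3-failure threshold and 30s open period from the decision. The class name, the fallback signature, and the injectable clock are assumptions for illustration; keeping one instance per agent gives the per-agent isolation described above.

```typescript
export type BreakerState = "closed" | "open" | "half_open";

// Closed: calls go to the primary. Open: calls go straight to the fallback.
// Half-open: after the open period, one trial call probes the primary.
export class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly openMs: number;
  private readonly now: () => number;

  constructor(failureThreshold = 3, openMs = 30_000, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.openMs = openMs;
    this.now = now; // injectable clock, mainly for tests
  }

  get currentState(): BreakerState {
    return this.state;
  }

  async call<T>(primary: () => Promise<T>, fallback: () => T | Promise<T>): Promise<T> {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.openMs) return fallback();
      this.state = "half_open"; // this call becomes the trial
    } else if (this.state === "half_open") {
      return fallback(); // a trial is already in flight
    }
    try {
      const result = await primary();
      this.state = "closed";
      this.failures = 0; // only consecutive failures count
      return result;
    } catch {
      this.failures += 1;
      // A failed trial, or too many consecutive failures, (re)opens the breaker.
      if (this.state === "half_open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      return fallback();
    }
  }
}
```

A `Map<string, CircuitBreaker>` keyed by agent name is one simple way to hold a breaker per agent, with each agent's rule-based logic passed as the fallback.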
Decision: Every agent has rule-based fallback logic (no LLM dependency required).
Rationale:
- System works offline without external API calls
- Predictable behavior for testing and evaluation
- Faster response times (no network latency)
- Compliance-friendly (no data leaves infrastructure)
Trade-offs: Less sophisticated insights compared to LLM-powered analysis.
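A sketch of the fallback pattern for a hypothetical risk agent. The `Txn` and `RiskResult` shapes, the thresholds, and the weights are placeholders, not the project's real rules. The rule path is a pure function, which is what makes it predictable in tests, and the LLM path is optional and wrapped so any failure lands on the rules.

```typescript
// Hypothetical transaction shape used by a risk agent.
export interface Txn {
  amountCents: number;
  merchantCountry: string;
  customerCountry: string;
}

export interface RiskResult {
  score: number; // 0-100
  reasons: string[];
  source: "llm" | "rules";
}

// Deterministic rules: same input, same output, no network calls.
// Thresholds and weights are illustrative placeholders.
export function ruleBasedRisk(txns: Txn[]): RiskResult {
  let score = 0;
  const reasons: string[] = [];
  if (txns.some((t) => t.amountCents > 100_000)) {
    score += 40;
    reasons.push("transaction over $1,000");
  }
  if (txns.some((t) => t.merchantCountry !== t.customerCountry)) {
    score += 30;
    reasons.push("cross-border transaction");
  }
  if (txns.length > 5) {
    score += 20;
    reasons.push("more than 5 transactions in batch");
  }
  return { score: Math.min(score, 100), reasons, source: "rules" };
}

// Use the LLM only when one is configured, and fall back to the rules on
// any error so the agent always returns a result.
export async function analyzeRisk(
  txns: Txn[],
  llm?: (txns: Txn[]) => Promise<RiskResult>,
): Promise<RiskResult> {
  if (!llm) return ruleBasedRisk(txns);
  try {
    return await llm(txns);
  } catch {
    return ruleBasedRisk(txns);
  }
}
```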
Decision: Use TanStack Virtual for alert/transaction tables (2k+ rows).
Rationale:
- Renders only visible rows (~20-30 DOM nodes vs 2000+)
- Eliminates scroll jank and memory bloat
- Maintains 60fps scrolling performance
- Works with dynamic row heights
Trade-offs: Adds some implementation complexity, but the performance gain is large.
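The core computation behind row virtualization, sketched for fixed-height rows. This is not TanStack Virtual's code or API; `visibleWindow` and its parameters are hypothetical, and the library additionally measures rendered rows to support dynamic heights.

```typescript
export interface VisibleWindow {
  start: number; // first rendered row index (inclusive)
  end: number; // last rendered row index (exclusive)
  offsetY: number; // px offset at which the first rendered row is placed
  totalHeight: number; // px height of the full list, so the scrollbar is correct
}

// Fixed-height windowing: find the rows that intersect the viewport and pad
// by `overscan` rows on each side to avoid blank flashes while scrolling.
export function visibleWindow(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  rowCount: number,
  overscan = 5,
): VisibleWindow {
  const first = Math.min(rowCount, Math.max(0, Math.floor(scrollTop / rowHeight)));
  const visible = Math.ceil(viewportHeight / rowHeight);
  const start = Math.max(0, first - overscan);
  const end = Math.min(rowCount, first + visible + overscan);
  return { start, end, offsetY: start * rowHeight, totalHeight: rowCount * rowHeight };
}
```

With 30px rows in a 600px viewport and 5 rows of overscan, 25 to 30 rows are mounted at any scroll position out of 2000. In the React table, `useVirtualizer` from `@tanstack/react-virtual` (with `count`, `getScrollElement`, `estimateSize`, and `overscan`) does this work, and passing its `measureElement` as a row ref handles variable row heights.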
Decision: Use Prisma for type-safe database access with TypeScript.
Rationale:
- Compile-time type safety (invalid fields and mistyped query arguments are caught before runtime)
- Auto-generated types from schema
- Migration management built-in
- Developer productivity (autocomplete, refactoring)
Trade-offs: Slight performance overhead vs raw SQL, but negligible for our scale.
Decision: Implement token bucket rate limiter in Redis (5 req/sec per client).
Rationale:
- Distributed state across API instances
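- Atomic updates prevent race conditions (the refill-and-take step has to run as one Lua script; INCR + EXPIRE alone implement a fixed-window counter, not a token bucket)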
- Sub-millisecond latency for checks
- TTL-based cleanup (no manual garbage collection)
Trade-offs: Additional infrastructure dependency, but essential for multi-instance deployments.
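A sketch of the bucket logic, assuming a capacity of 5 (bursts of up to 5 requests) and a refill rate of 5 tokens per second; the capacity, key layout, and names are assumptions. `takeToken` is a plain TypeScript reference version, and `TOKEN_BUCKET_LUA` expresses the same logic as a Redis Lua script so the read, refill, and decrement happen atomically on the server.

```typescript
// Per-client bucket state (a Redis hash in the Lua version).
export interface Bucket {
  tokens: number;
  updatedMs: number;
}

// Refill by elapsed time, cap at capacity, then try to take one token.
export function takeToken(
  bucket: Bucket | undefined,
  nowMs: number,
  ratePerSec = 5,
  capacity = 5,
): { allowed: boolean; bucket: Bucket } {
  const prev = bucket ?? { tokens: capacity, updatedMs: nowMs };
  const elapsedSec = Math.max(0, nowMs - prev.updatedMs) / 1000;
  const tokens = Math.min(capacity, prev.tokens + elapsedSec * ratePerSec);
  if (tokens >= 1) {
    return { allowed: true, bucket: { tokens: tokens - 1, updatedMs: nowMs } };
  }
  return { allowed: false, bucket: { tokens, updatedMs: nowMs } };
}

// The same logic as one Lua script, which Redis executes atomically.
// KEYS[1] = bucket key; ARGV = rate (tokens/s), capacity, now (ms).
export const TOKEN_BUCKET_LUA = `
local key = KEYS[1]
local rate = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local state = redis.call("HMGET", key, "tokens", "ts")
local tokens = tonumber(state[1]) or capacity
local ts = tonumber(state[2]) or now
tokens = math.min(capacity, tokens + math.max(0, now - ts) / 1000 * rate)
local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end
redis.call("HSET", key, "tokens", tokens, "ts", now)
redis.call("PEXPIRE", key, math.ceil(capacity / rate * 1000))
return allowed
`;
```

The script would run via `EVAL` (or `EVALSHA` after `SCRIPT LOAD`) with the per-client key as `KEYS[1]`. The `PEXPIRE` equals the time a bucket takes to refill completely, so an idle key expires only once it would be full anyway. Passing `now` from the API instance assumes reasonably synchronized clocks across instances.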
Decision: Require Idempotency-Key header for all state-changing operations.
Rationale:
- Prevents duplicate actions on network retries
- Safe to retry failed requests
- Audit trail links multiple attempts to same logical action
- Industry best practice (Stripe, Twilio, etc.)
Trade-offs: Clients must generate a unique key per logical action, but this prevents costly duplicate operations.
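A minimal in-memory sketch of the server side; `IdempotencyStore` and `execute` are hypothetical names. It assumes the request body is fingerprinted to detect a key reused with a different payload (answering 422 is an assumed choice) and that concurrent retries with the same key should share a single execution.

```typescript
import { createHash } from "node:crypto";

export interface ActionResult {
  status: number;
  body: unknown;
}

interface Entry {
  fingerprint: string;
  result: Promise<ActionResult>;
}

// In-memory only; a multi-instance deployment would keep entries in shared
// storage (for example Redis or a table with a unique key) with a TTL.
export class IdempotencyStore {
  private readonly entries = new Map<string, Entry>();

  async execute(
    key: string,
    requestBody: unknown,
    action: () => Promise<ActionResult>,
  ): Promise<ActionResult & { replayed: boolean }> {
    // Fingerprint the payload so a reused key with a different body is
    // rejected. JSON.stringify assumes a stable property order.
    const fingerprint = createHash("sha256")
      .update(JSON.stringify(requestBody) ?? "")
      .digest("hex");
    const existing = this.entries.get(key);
    if (existing) {
      if (existing.fingerprint !== fingerprint) {
        return {
          status: 422,
          body: { error: "Idempotency-Key reused with a different payload" },
          replayed: false,
        };
      }
      // Completed or still in flight: the retry shares the first execution.
      return { ...(await existing.result), replayed: true };
    }
    const result = action();
    this.entries.set(key, { fingerprint, result });
    try {
      return { ...(await result), replayed: false };
    } catch (err) {
      // Forget failed attempts so the client can retry with the same key.
      this.entries.delete(key);
      throw err;
    }
  }
}
```

A handler would read the `Idempotency-Key` header, reject requests that lack it, and call `store.execute(key, body, () => applyAction(body))`, where `applyAction` stands in for the real state change.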
Decision: Redact PAN (13-19 digit sequences) and mask emails in all logs/traces/UI.
Rationale:
- PCI-DSS compliance requirement
- Defense-in-depth (multiple layers of redaction)
- Prevents accidental exposure in logs/monitoring
- Required for audit trail security
Trade-offs: Cannot reconstruct original data from logs (intentional).
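A sketch of a redaction pass that could run before anything reaches logs, traces, or the UI. The patterns are assumptions: the PAN rule also accepts single space or hyphen separators between digits, and email masking keeps the first character of the local part plus the domain.

```typescript
// 13-19 digits, optionally split by single spaces or hyphens (an assumption
// beyond bare digit runs), and not part of a longer digit run.
const PAN_RE = /(?<!\d)\d(?:[ -]?\d){12,18}(?!\d)/g;

// Keep the first character of the local part and the full domain.
const EMAIL_RE = /([A-Za-z0-9._%+-])[A-Za-z0-9._%+-]*@([A-Za-z0-9.-]+\.[A-Za-z]{2,})/g;

export function redactString(s: string): string {
  return s.replace(PAN_RE, "[PAN REDACTED]").replace(EMAIL_RE, "$1***@$2");
}

// Walk structured log/trace payloads and redact every string value.
export function redact(value: unknown): unknown {
  if (typeof value === "string") return redactString(value);
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) => [k, redact(v)]),
    );
  }
  return value;
}
```

A bare 13-19 digit rule also matches 13-digit millisecond timestamps such as `Date.now()` values, so adding a Luhn checksum test is one way to cut false positives if that becomes noisy.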
Decision: Export metrics in Prometheus format via /metrics endpoint.
Rationale:
- Industry standard for observability
- Rich ecosystem (Grafana, Alertmanager, etc.)
- Pull-based model (services only expose the endpoint; scrape targets are configured centrally in Prometheus)
- Built-in aggregation and alerting
Trade-offs: Requires running a Prometheus server to scrape and store metrics, but it is widely adopted.
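In Node the `prom-client` package is the usual way to do this; this dependency-free sketch instead shows what the endpoint actually returns in the Prometheus text exposition format. `Counter`, `metricsHandler`, and the metric name in the note that follows are hypothetical.

```typescript
import type { IncomingMessage, ServerResponse } from "node:http";

// Label values escape backslash, double quote, and newline.
function escapeLabelValue(v: string): string {
  return v.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// HELP text escapes backslash and newline.
function escapeHelp(v: string): string {
  return v.replace(/\\/g, "\\\\").replace(/\n/g, "\\n");
}

// Sort label names so the same label set always maps to the same series.
function formatLabels(labels: Record<string, string>): string {
  const parts = Object.entries(labels)
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([k, v]) => `${k}="${escapeLabelValue(v)}"`);
  return parts.length > 0 ? `{${parts.join(",")}}` : "";
}

export class Counter {
  readonly name: string;
  readonly help: string;
  private readonly series = new Map<string, number>();

  constructor(name: string, help: string) {
    this.name = name;
    this.help = help;
  }

  inc(labels: Record<string, string> = {}, by = 1): void {
    if (by < 0) throw new Error("counters can only increase");
    const key = formatLabels(labels);
    this.series.set(key, (this.series.get(key) ?? 0) + by);
  }

  render(): string {
    const lines = [`# HELP ${this.name} ${escapeHelp(this.help)}`, `# TYPE ${this.name} counter`];
    for (const [labels, value] of this.series) lines.push(`${this.name}${labels} ${value}`);
    return lines.join("\n") + "\n";
  }
}

// GET /metrics handler: Prometheus scrapes this on its own schedule.
export function metricsHandler(counters: Counter[]) {
  return (_req: IncomingMessage, res: ServerResponse): void => {
    res.writeHead(200, { "Content-Type": "text/plain; version=0.0.4; charset=utf-8" });
    res.end(counters.map((c) => c.render()).join(""));
  };
}
```

A counter named `triage_requests_total`, incremented twice with `{ agent: "fraud", outcome: "ok" }`, renders as `triage_requests_total{agent="fraud",outcome="ok"} 2` after its `# HELP` and `# TYPE` lines.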
Decision: Single docker-compose.yml brings up all services (Postgres, Redis, API, Web).
Rationale:
- One command to start entire stack
- Consistent environment across developers
- Easy cleanup and reset
- Production-like local setup
Trade-offs: Higher resource usage than native processes, but the consistency is worth it.
Decision: Keep client and server in same repository with shared types.
Rationale:
- Atomic commits across frontend/backend
- Shared TypeScript types (API contracts)
- Simplified CI/CD (single build pipeline)
- Easier code reviews (see both sides of changes)
Trade-offs: Larger repository, but better developer experience.