diff --git a/.env.example b/.env.example index 5ce377b..3ac171e 100644 --- a/.env.example +++ b/.env.example @@ -55,11 +55,17 @@ BUCKET_NAME=my-media-bucket GCS_SA_PATH=.secrets/service-account.json BUCKET_SA_PATH=.secrets/service-account.json -# S3 / S3-compatible (when BUCKET_PROVIDER=s3) — not yet implemented (sub-project 4) -# BUCKET_REGION=us-east-1 -# BUCKET_ACCESS_KEY= -# BUCKET_SECRET_KEY= -# BUCKET_ENDPOINT_URL= # optional — for MinIO or S3-compatible endpoints +# S3 / S3-compatible (when BUCKET_PROVIDER=s3, e.g. AWS S3 or MinIO) +# S3_* names are read by both the Go API and the Python worker. +# S3_BUCKET_NAME=my-media-bucket +# S3_REGION=us-east-1 +# S3_ACCESS_KEY_ID= +# S3_SECRET_ACCESS_KEY= +# S3_ENDPOINT_URL=http://localhost:9000 # internal/server-side endpoint (MinIO / S3-compatible). Leave unset for AWS S3. +# S3_PUBLIC_ENDPOINT_URL=http://localhost:9000 # optional client-facing endpoint for presigned + public URLs. +# # Set this when internal services reach the store by a private host +# # (e.g. http://minio:9000) that external clients cannot resolve. +# # Falls back to S3_ENDPOINT_URL when empty. 
# ─── OpenTelemetry ──────────────────────────────────────────────────────────── OTEL_EXPORTER_OTLP_ENDPOINT=otel-collector:4317 diff --git a/.gitignore b/.gitignore index aded70f..dc54106 100644 --- a/.gitignore +++ b/.gitignore @@ -93,7 +93,10 @@ coverage/ .vscode/ *.code-workspace # Ignore generated documentation -docs/ +docs/superpowers/ +docs/*.json +docs/ASSESSMENT.md +docs/FIX_PLAN.md doc/ *.pdf # Ignore custom files @@ -118,3 +121,5 @@ API_DOCUMENTATION.md .env.local .secrets/ docker-compose.override.yml + +.pytest_cache/ \ No newline at end of file diff --git a/CLAUDE.md b/CLAUDE.md index 6a74ad9..1e1bc95 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -73,7 +73,7 @@ Entry: `worker/__main__.py` → `consumer/main.py` - `_handle_job` takes a `SELECT … FOR UPDATE` lock, marks the row `in_progress`, calls `process_asset_dispatch`, then marks `done` + acks the stream message. On failure it re-queues (up to `MAX_JOB_ATTEMPTS`). - `_recover_stuck_pending` re-adds `pending/in_progress` jobs older than 2 min back to the stream (recovery path, called when no messages available). - `worker/processing/processor.py` — `process_asset_dispatch` routes by asset type to `images.py` or `videos.py`. -- `worker/storage/` — `StorageX` ABC; `GCSStorage` is the concrete impl. +- `worker/storage/` — `StorageX` ABC; `GCSStorage` and `S3Storage` concrete impls (selected by a factory in `worker/storage/__init__.py`). `S3Storage` mirrors the Go split-endpoint behavior: object I/O uses `endpoint_url`, persisted variant URLs use `public_endpoint_url`. - `worker/utils/metrics.py` — Prometheus metrics via `prometheus_client`. ### Shared concerns @@ -84,7 +84,7 @@ Entry: `worker/__main__.py` → `consumer/main.py` **Error types (Go):** `pkg/errors` has typed API errors (`NotFoundError`, `BadRequestError`, `UnauthorizedError`, `ConflictError`, `InternalServerErrorError`) each embedding `*ApiError` (carries `StatusCode`). Handler layer type-asserts on these to set HTTP status. 
Use `fmt.Errorf("op: %w", err)` for internal wrapping; use `errors.New*` constructors (e.g. `errors.NewNotFoundError`) at the service/handler boundary. -**Storage (`pkg/utils/storagex`):** `StorageX` interface with `PutObject`, `GetObject`, `GeneratePresignedURL`, `PublicURL`, `DeleteObject`. Current impl: `GCSStorage`. S3/MinIO provider types exist in config but are not yet implemented. +**Storage (`pkg/utils/storagex`):** `StorageX` interface with `PutObject`, `GetObject`, `GeneratePresignedURL`, `PublicURL`, `DeleteObject`. Implementations: `GCSStorage` and `s3Storage` (S3 / S3-compatible MinIO). The S3 impl supports a split endpoint — `Endpoint` (internal/server-side) for object I/O and `PublicEndpoint` (client-facing) for presigned + public URLs; presigning happens against the public endpoint because SigV4 signs the Host header. **OTel:** Full tracing + metrics on the API side. Go instruments are in `internal/metrics/metrics.go`. Collector config at `observability/otel-collector.yml`; Grafana/Loki/Tempo/Prometheus configs in `observability/`. Python side uses `prometheus_client` (not OTel). diff --git a/README.md b/README.md index 4af88f1..80a0a4e 100644 --- a/README.md +++ b/README.md @@ -78,7 +78,7 @@ Create a `.env.local` file in the project root (`development` → `.env.local`, # Server ENV=development HOST=0.0.0.0 -PORT=8080 +PORT=5010 LOG_LEVEL=DEBUG # Database @@ -108,7 +108,11 @@ S3_BUCKET_NAME=your-bucket-name S3_REGION=us-east-1 S3_ACCESS_KEY_ID=your-access-key S3_SECRET_ACCESS_KEY=your-secret-key -S3_ENDPOINT_URL=http://localhost:9000 # set for MinIO / S3-compatible stores +S3_ENDPOINT_URL=http://localhost:9000 # internal/server-side endpoint (MinIO / S3-compatible) +# Optional client-facing endpoint baked into presigned + public URLs. Set this +# when internal services reach the store by a private host (e.g. http://minio:9000) +# that external clients cannot resolve. Falls back to S3_ENDPOINT_URL when empty. 
+S3_PUBLIC_ENDPOINT_URL=http://localhost:9000 # Worker STREAM_NAME=media:jobs @@ -159,8 +163,27 @@ python -m worker # worker ### 6. Test the API +All `/api/v1` routes require a Bearer token: a user id encrypted with AES-256-GCM +under `ENCRYPTION_KEY` (see [`pkg/utils/crypt.go`](pkg/utils/crypt.go)). +Mint one for local testing: + +```bash +TOKEN=$(python3 - <<'PY' +import base64, os +from cryptography.hazmat.primitives.ciphers.aead import AESGCM +key = b"change_me_to_a_32_byte_secret___" # your 32-byte ENCRYPTION_KEY (must be exactly 32 bytes) +nonce = os.urandom(12) +ct = AESGCM(key).encrypt(nonce, b"demo-user", None) +print(base64.urlsafe_b64encode(nonce + ct).rstrip(b"=").decode()) +PY +) +``` + +Request a presigned upload URL: + ```bash -curl -X POST http://localhost:8080/api/v1/assets/upload \ +curl -X POST http://localhost:5010/api/v1/storage/presign \ + -H "Authorization: Bearer $TOKEN" \ -H "Content-Type: application/json" \ -d '{ "fileName": "image.jpg", @@ -169,6 +192,19 @@ curl -X POST http://localhost:8080/api/v1/assets/upload \ }' ``` +Upload the file to the returned `uploadUrl`, then mark the asset complete to +enqueue processing: + +```bash +curl -X PUT "<uploadUrl>" -H "Content-Type: image/jpeg" --data-binary @image.jpg + +curl "http://localhost:5010/api/v1/assets/<assetId>/complete" \ + -H "Authorization: Bearer $TOKEN" +``` + +> Prefer the scripted path? [`scripts/demo-e2e.sh`](scripts/demo-e2e.sh) runs this +> entire flow (image + video + webhooks) end-to-end — see **Run the demo** below. + ## 🐳 Docker Deployment ### Pull the published image (GHCR) @@ -199,9 +235,12 @@ kubectl apply -f deploy/k8s/ ## 📖 API Documentation -### Upload Asset +All `/api/v1` routes require an `Authorization: Bearer <token>` header (see +[Test the API](#6-test-the-api) for how to mint a token).
+ +### Request a presigned upload URL -**Endpoint:** `POST /api/v1/assets/upload` +**Endpoint:** `POST /api/v1/storage/presign` **Request:** ```json @@ -215,32 +254,86 @@ kubectl apply -f deploy/k8s/ **Response:** ```json { - "uploadUrl": "https:///...", - "assetId": "550e8400-e29b-41d4-a716-446655440000", - "method": "PUT", - "headers": { - "Content-Type": "image/jpeg" - }, - "objectPath": "media/raw/550e8400-e29b-41d4-a716-446655440000", - "publicUrl": "https:///...", - "expiresAt": 1702468800 + "status": "success", + "data": { + "uploadUrl": "http://localhost:9000/...", + "assetId": "550e8400-e29b-41d4-a716-446655440000", + "method": "PUT", + "headers": { "Content-Type": "image/jpeg" }, + "objectPath": "example.jpg", + "publicUrl": "http://localhost:9000/...", + "expiresAt": 300 + } } ``` -> The `uploadUrl` / `publicUrl` host depends on the configured storage provider (GCS, S3, or a MinIO endpoint). +> The `uploadUrl` / `publicUrl` host comes from the configured storage provider. +> For MinIO it is `S3_PUBLIC_ENDPOINT_URL` (the client-facing endpoint), so the +> URL is reachable from wherever the client runs — see [Storage Providers](#storage-providers). -### Mark Asset as Uploaded +### Mark an asset complete (enqueue processing) -**Endpoint:** `POST /api/v1/assets/{assetId}/uploaded` +**Endpoint:** `GET /api/v1/assets/{assetId}/complete` + +Verifies the raw object exists in storage, transitions the asset to `uploaded`, +creates the processing job, and enqueues it (transactionally, via the outbox). **Response:** ```json { - "message": "Asset marked as uploaded", - "assetId": "550e8400-e29b-41d4-a716-446655440000" + "status": "success", + "message": "Asset marked as uploaded" } ``` +### Webhooks + +Register an endpoint to receive processing-lifecycle events. 
+ +**Endpoints:** +- `POST /api/v1/webhooks` — register `{ "url", "secret", "events" }` +- `GET /api/v1/webhooks` — list your registrations +- `DELETE /api/v1/webhooks/{id}` — remove a registration + +**Events:** `job.starting`, `job.started`, `job.done`, `job.failed`. + +Deliveries are signed: each POST carries an `X-Webhook-Signature: sha256=<hex>` +header, an HMAC-SHA256 computed over the JSON body using your registration `secret` (stored +encrypted at rest). A background dispatcher delivers pending events with +exponential-backoff retries and tracks them in the `webhook_deliveries` table. + +```bash +curl -X POST http://localhost:5010/api/v1/webhooks \ + -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "url": "https://example.com/hooks/mpiper", + "secret": "my-signing-secret", + "events": ["job.starting", "job.started", "job.done", "job.failed"] + }' +``` + +## 🎬 Run the demo + +[`scripts/demo-e2e.sh`](scripts/demo-e2e.sh) drives the entire pipeline from the +host — exactly like a real client — for both an image and a video, including +webhook delivery. Bring the stack up **with the webhooks overlay**, then run it: + +```bash +docker compose -f docker-compose.yml -f docker-compose.webhooks.yml up -d --build + +./scripts/demo-e2e.sh +``` + +For each asset it presigns an upload, PUTs the file straight to MinIO over the +public `localhost:9000` endpoint, marks it complete, waits for the worker to +produce variants, fetches a variant back over HTTP, and asserts the +`job.starting → job.started → job.done` webhooks were delivered. It prints a +PASS/FAIL summary and exits non-zero on any failure. + +Requirements on the host: `bash`, `curl`, `jq`, `docker`, and a `python3` with +the `cryptography` package (used only to mint the auth token).
+ ## 🔧 Development ### Project Structure @@ -318,6 +411,20 @@ MPiper selects a storage backend via `BUCKET_PROVIDER`: Both the Go API and the Python worker share the same provider selection and env vars, so a single configuration drives the whole pipeline. +#### Internal vs public endpoints (`S3_PUBLIC_ENDPOINT_URL`) + +When the store is reachable by a different host internally than externally — +the classic Docker case, where services talk to `http://minio:9000` but a +browser or a host-run client must use `http://localhost:9000` — set both: + +- `S3_ENDPOINT_URL` — the **internal/server-side** endpoint used for object I/O (`http://minio:9000`) +- `S3_PUBLIC_ENDPOINT_URL` — the **client-facing** endpoint baked into presigned upload URLs and persisted variant URLs (`http://localhost:9000`) + +This matters because SigV4 signs the `Host` header: a presigned URL must be +generated against the exact host the client will connect to, so it can't simply +be rewritten afterwards. When `S3_PUBLIC_ENDPOINT_URL` is unset it falls back to +`S3_ENDPOINT_URL` (single-endpoint behavior). + ### Observability The API emits OpenTelemetry traces and metrics; the worker exposes Prometheus metrics. The `observability/` directory contains a ready-to-run collector plus Grafana, Tempo, Loki, and Prometheus configuration. 
@@ -368,9 +475,9 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file ## 📊 Roadmap - [x] Support for AWS S3 / MinIO storage -- [x] Webhook delivery tracking (schema) +- [x] Webhook delivery with HMAC signing + retry tracking +- [x] Video transcoding with FFmpeg (poster, 720p, preview) - [ ] Support for Azure Blob Storage -- [ ] Video transcoding with FFmpeg - [ ] Admin dashboard - [ ] Batch processing API - [ ] CDN integration diff --git a/cmd/server/main.go b/cmd/server/main.go index f3c835b..bb4f886 100644 --- a/cmd/server/main.go +++ b/cmd/server/main.go @@ -3,6 +3,7 @@ package main import ( "context" "errors" + "fmt" "net/http" "os" "os/signal" @@ -13,7 +14,11 @@ import ( "github.com/rndmcodeguy20/mpiper/internal/config" "github.com/rndmcodeguy20/mpiper/internal/database" "github.com/rndmcodeguy20/mpiper/internal/metrics" + "github.com/rndmcodeguy20/mpiper/internal/outbox" + "github.com/rndmcodeguy20/mpiper/internal/queue" + "github.com/rndmcodeguy20/mpiper/internal/repository" "github.com/rndmcodeguy20/mpiper/internal/server" + "github.com/rndmcodeguy20/mpiper/internal/webhook" "github.com/rndmcodeguy20/mpiper/pkg/logger" "go.uber.org/zap" ) @@ -27,6 +32,16 @@ var ( ) func main() { + // --health-check is used by the container HEALTHCHECK. It must be a + // lightweight probe against the already-running server — NOT a second + // server boot (which would fail to bind the port). Exit 0 if /healthz is OK. 
+ for _, arg := range os.Args[1:] { + if arg == "--health-check" { + runHealthCheck() + return + } + } + cfg, err := config.InitializeConfig(config.ToEnvironment(Env)) if err != nil { panic(err) @@ -91,6 +106,46 @@ func main() { baseLogger.Info("Migrations applied successfully") } + // --- Outbox relay --- + rc, err := queue.MustGetRedisClient(&cfg.Redis, baseLogger) + if err != nil { + baseLogger.Sugar().Fatalf("Failed to create Redis client: %v", err) + } + rq := queue.NewRedisQueue(serverCtx, rc, queue.RedisQueueOptions{ + QueueName: "media:jobs", + ConnectionTimeOut: 2 * time.Second, + MaxStreamLength: 10_000, + MaxRetries: 3, + RetryInterval: 2 * time.Second, + EnableMetrics: true, + }, baseLogger, m) + + outboxRepo := repository.NewOutboxRepository(db, baseLogger) + relay := outbox.NewRelay(outboxRepo, rq, baseLogger, m, cfg.Outbox.RelayInterval, cfg.Outbox.RelayBatch) + _ = m.RegisterOutboxPendingFunc(func(ctx context.Context) (int64, error) { + return outboxRepo.CountPending(ctx) + }) + go relay.Start(serverCtx) + go relay.StartCleanup(serverCtx, cfg.Outbox.Retention) + + // --- Webhook dispatcher --- + webhookDispatcher := webhook.NewDispatcher(db, baseLogger, webhook.DispatcherConfig{ + PollInterval: cfg.Webhook.PollInterval, + BatchSize: cfg.Webhook.BatchSize, + Timeout: cfg.Webhook.Timeout, + MaxAttempts: cfg.Webhook.MaxAttempts, + EncryptionKey: cfg.EncryptionKey, + Retention: cfg.Webhook.Retention, + }) + go webhookDispatcher.Start(serverCtx) + go webhookDispatcher.StartCleanup(serverCtx) + + _ = m.RegisterWebhookPendingFunc(func(ctx context.Context) (int64, error) { + var count int64 + err := db.GetContext(ctx, &count, `SELECT COUNT(*) FROM webhook_deliveries WHERE status = 'pending'`) + return count, err + }) + srv := server.NewServer(db, cfg, m) go func() { if err := srv.Start(); err != nil && !errors.Is(err, http.ErrServerClosed) { @@ -107,3 +162,26 @@ func main() { baseLogger.Error("shutdown failed", zap.Error(err)) } } + +// runHealthCheck 
performs a lightweight HTTP probe against the running server's +// /healthz endpoint and exits 0 (healthy) or 1 (unhealthy). It deliberately +// avoids the full startup path so it can run as a container HEALTHCHECK without +// contending for the listen port. +func runHealthCheck() { + port := os.Getenv("PORT") + if port == "" { + port = "5010" + } + client := &http.Client{Timeout: 3 * time.Second} + resp, err := client.Get(fmt.Sprintf("http://127.0.0.1:%s/healthz", port)) + if err != nil { + fmt.Fprintf(os.Stderr, "health check failed: %v\n", err) + os.Exit(1) + } + defer func() { _ = resp.Body.Close() }() + if resp.StatusCode != http.StatusOK { + fmt.Fprintf(os.Stderr, "health check failed: status %d\n", resp.StatusCode) + os.Exit(1) + } + os.Exit(0) +} diff --git a/deploy/docker/worker.dockerfile b/deploy/docker/worker.dockerfile index 9fd0295..dd435b3 100644 --- a/deploy/docker/worker.dockerfile +++ b/deploy/docker/worker.dockerfile @@ -1,55 +1,40 @@ -# Stage 1: Builder - Install dependencies +# Stage 1: Builder FROM python:3.11-slim AS builder -# Install system dependencies required for Python packages -RUN apt-get update && apt-get install -y \ +RUN apt-get update && apt-get install -y --no-install-recommends \ gcc \ g++ \ libpq-dev \ libmagic1 \ && rm -rf /var/lib/apt/lists/* -# Install Poetry -RUN pip install --no-cache-dir poetry==1.7.1 - WORKDIR /build -# Copy dependency files -COPY pyproject.toml poetry.lock* poetry.toml ./ - -# Configure Poetry to not create virtual environments (we're in a container) -RUN poetry config virtualenvs.create false +# requirements.txt exported from poetry.lock via: +# poetry export -f requirements.txt --only main --without-hashes -o requirements.txt +COPY requirements.txt . 
-# Install dependencies -RUN poetry install --no-dev --no-interaction --no-ansi +RUN pip install --no-cache-dir --prefix=/install -r requirements.txt # Stage 2: Runtime FROM python:3.11-slim -# Install runtime dependencies + ffmpeg -RUN apt-get update && apt-get install -y \ +RUN apt-get update && apt-get install -y --no-install-recommends \ libpq5 \ libmagic1 \ ffmpeg \ ca-certificates \ && rm -rf /var/lib/apt/lists/* -# Optional: sanity check -RUN ffmpeg -version && ffprobe -version - -# Create non-root user RUN useradd -m -u 1000 -s /bin/bash worker WORKDIR /app -# Copy Python dependencies from builder -COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages -COPY --from=builder /usr/local/bin /usr/local/bin +# Only copy what was explicitly installed, not the builder's full bin/ +COPY --from=builder /install /usr/local -# Copy application code COPY worker/ ./worker/ -# Create temp directory for processing RUN mkdir -p /tmp/mpiper && chown -R worker:worker /tmp/mpiper /app ENV PYTHONUNBUFFERED=1 \ diff --git a/docker-compose.webhooks.yml b/docker-compose.webhooks.yml new file mode 100644 index 0000000..e9b5200 --- /dev/null +++ b/docker-compose.webhooks.yml @@ -0,0 +1,17 @@ +# docker-compose.webhooks.yml +# Overlay for webhook dev-testing. Run with: +# docker compose -f docker-compose.yml -f docker-compose.webhooks.yml up +# +# Adds a lightweight webhook receiver that logs all incoming POST requests. + +services: + webhook-receiver: + image: mendhak/http-https-echo:latest + container_name: mpiper-webhook-receiver + ports: + - "8888:8080" + environment: + HTTP_PORT: 8080 + networks: + - mpiper_net + restart: unless-stopped diff --git a/docker-compose.yml b/docker-compose.yml index 6d0e3a1..8152097 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -28,7 +28,20 @@ services: environment: # Compose-internal addressing — services reach each other by name. 
DB_HOST: postgres + DB_PORT: "5432" REDIS_CONNECTION_STRING: redis://redis:6379/0 + # Storage — use the local MinIO instance. + BUCKET_PROVIDER: s3 + S3_BUCKET_NAME: mpiper + S3_REGION: us-east-1 + S3_ACCESS_KEY_ID: minioadmin + S3_SECRET_ACCESS_KEY: minioadmin + S3_ENDPOINT_URL: http://minio:9000 + # Client-facing endpoint baked into presigned + public URLs. Internal + # services reach MinIO at minio:9000, but host clients (and the demo + # script) must use the published localhost:9000. SigV4 signs the Host + # header, so presigning has to target the host the client will connect to. + S3_PUBLIC_ENDPOINT_URL: http://localhost:9000 # Always apply migrations on first run; override in a compose override file. AUTO_MIGRATE: "true" ports: @@ -38,6 +51,8 @@ services: condition: service_healthy redis: condition: service_healthy + minio: + condition: service_healthy networks: - mpiper_net restart: unless-stopped @@ -61,12 +76,28 @@ services: environment: # Compose-internal addressing — services reach each other by name. DB_HOST: postgres + DB_PORT: "5432" REDIS_CONNECTION_STRING: redis://redis:6379/0 + # Storage — use the local MinIO instance. + BUCKET_PROVIDER: s3 + S3_BUCKET_NAME: mpiper + S3_REGION: us-east-1 + S3_ACCESS_KEY_ID: minioadmin + S3_SECRET_ACCESS_KEY: minioadmin + S3_ENDPOINT_URL: http://minio:9000 + # Variant URLs persisted to the DB use this client-facing endpoint so they + # are reachable from the host/browser; actual object I/O uses minio:9000. + S3_PUBLIC_ENDPOINT_URL: http://localhost:9000 + AUTO_MIGRATE: "false" depends_on: + api: + condition: service_healthy postgres: condition: service_healthy redis: condition: service_healthy + minio: + condition: service_healthy networks: - mpiper_net restart: unless-stopped @@ -83,14 +114,12 @@ services: postgres: image: postgres:16-alpine container_name: mpiper-postgres - # Postgres init reads POSTGRES_* below. These mirror DB_USER/DB_PASSWORD/DB_NAME. 
- # Values interpolate from a `.env` file or the shell; the defaults match - # .env.example so plain `docker compose up` works out of the box. To override, - # set DB_USER / DB_PASSWORD / DB_NAME in `.env` (compose's default env file). environment: POSTGRES_USER: ${DB_USER:-mpiper} POSTGRES_PASSWORD: ${DB_PASSWORD:-changeme} POSTGRES_DB: ${DB_NAME:-mpiper} + ports: + - "5433:5432" volumes: - mpiper_postgres_data:/var/lib/postgresql/data networks: @@ -109,6 +138,8 @@ services: redis: image: redis:7-alpine container_name: mpiper-redis + ports: + - "6380:6379" networks: - mpiper_net restart: unless-stopped @@ -118,6 +149,47 @@ services: timeout: 3s retries: 5 + # ========================================================================== + # MinIO — local S3-compatible object storage + # ========================================================================== + minio: + image: minio/minio:latest + container_name: mpiper-minio + command: server /data --console-address ":9001" + environment: + MINIO_ROOT_USER: minioadmin + MINIO_ROOT_PASSWORD: minioadmin + ports: + - "9000:9000" # S3 API + - "9001:9001" # Console UI + volumes: + - mpiper_minio_data:/data + networks: + - mpiper_net + restart: unless-stopped + healthcheck: + test: ["CMD", "mc", "ready", "local"] + interval: 5s + timeout: 5s + retries: 10 + start_period: 5s + + # One-shot container that creates the default bucket on first run. 
+ minio-init: + image: minio/mc:latest + container_name: mpiper-minio-init + depends_on: + minio: + condition: service_healthy + entrypoint: > + /bin/sh -c " + mc alias set local http://minio:9000 minioadmin minioadmin && + mc mb --ignore-existing local/mpiper && + mc anonymous set download local/mpiper + " + networks: + - mpiper_net + # ============================================================================ # NETWORKS # ============================================================================ @@ -131,3 +203,4 @@ networks: # ============================================================================ volumes: mpiper_postgres_data: # Postgres data — persists across restarts + mpiper_minio_data: # MinIO object data — persists across restarts diff --git a/docs/arch/ingress-outbox-and-idempotent-consumer.md b/docs/arch/ingress-outbox-and-idempotent-consumer.md new file mode 100644 index 0000000..906dc17 --- /dev/null +++ b/docs/arch/ingress-outbox-and-idempotent-consumer.md @@ -0,0 +1,250 @@ +# Spec: Ingress Transactional Outbox + Idempotent Consumer + +> **Status:** Draft — for review & strengthening. +> **Owner:** Shantanu Mane. +> **Related:** [reliability-and-correctness.md](./reliability-and-correctness.md) (spine item #1). +> **Linear epic:** [DEV-55](https://linear.app/shans-odyssey/issue/DEV-55) (children DEV-56…DEV-60). +> **Last updated:** 2026-06-17. + +--- + +## 1. Summary + +Close the producer-side dual-write hazard in the asset-upload path by introducing a +**transactional outbox** on ingress (API → Redis stream), and make the worker +**effectively-once** by formalising the consumer's existing idempotency. Today the API +commits the job row and *then* publishes to Redis in a separate step; a crash between +the two strands the job until a slow recovery sweep notices. The outbox makes +"intent to publish" part of the same DB transaction, and a relay publishes +asynchronously with at-least-once delivery — which the idempotent consumer absorbs. 
+ +## 2. Goals / Non-goals + +**Goals** +- No job can be committed without a durable, atomic intent-to-publish. +- Bounded, observable publish latency (relay interval), not a 2-min recovery floor. +- At-least-once delivery from the relay, absorbed by an effectively-once consumer. +- Keep the change minimal and consistent with the already-designed egress outbox + (DEV-40/47), so the two halves of the system share one mental model. + +**Non-goals (this spec)** +- Trace-context propagation through the stream + OTel on the worker → **fast-follow** + (separate issue). +- Egress/webhook outbox → already covered by DEV-40 family. +- CDC-based relay (logical replication / Debezium) → documented as the "100×" option, + not built now. +- Exactly-once *delivery* → explicitly out of scope; we target effectively-once + *effects*. + +## 3. Background — current behaviour + +### Producer dual-write (`internal/service/asset.go` → `MarkAssetUploaded`, ~L179) + +``` +BEGIN TX + MarkAssetUploadedTx -- assets.status = uploaded (L226) + InsertProcessAssetJobTx -- jobs row, status = pending (L246) +COMMIT -- L260 +queue.Enqueue({job_id, asset_id, event:"asset_uploaded"}) -- L271, AFTER commit +``` + +**Hazard:** crash/restart/network failure between `COMMIT` (L260) and `Enqueue` +(L271) → job durably `pending`, no stream message ever published. + +**Current backstop:** worker `_recover_stuck_pending` (`consumer.py:344`, DEV-35) +re-queues `pending`/`in_progress` jobs whose `updated_at < now() - interval '2 +minutes'`. Correctness currently *depends* on this sweep; publish latency floor under +failure is ~2 min, and the guarantee is implicit. + +### Consumer idempotency (already partially present) + +`_handle_job` (`consumer.py:188`): +- `SELECT job_id, asset_id, status, attempts FROM jobs WHERE job_id=%s FOR UPDATE` (L199) — claims the job. +- **`if status == "done": xack and return`** (L212–214) — already acks-and-skips redelivered, completed jobs. 
+- Else `UPDATE … status='in_progress', attempts=attempts+1` (L219), dispatch, then on success `status='done'` + asset `ready` + `xack` (L266–276). +- Variants are content-addressed (`variant_hash`), so storage writes are already idempotent. + +So a `done`-fast-path exists; this spec **hardens and documents** it rather than building it. + +## 4. Design + +### 4.1 `event_outbox` table + +```sql +CREATE TABLE event_outbox ( + id BIGSERIAL PRIMARY KEY, + aggregate_id UUID NOT NULL, -- asset_id + job_id BIGINT, -- nullable; canonical when present + event TEXT NOT NULL, -- e.g. 'asset_uploaded' + payload JSONB NOT NULL, -- exact stream body to publish + traceparent TEXT, -- reserved for fast-follow; nullable now + status TEXT NOT NULL DEFAULT 'pending', -- pending | published + attempts INT NOT NULL DEFAULT 0, + last_error TEXT, + created_at TIMESTAMPTZ NOT NULL DEFAULT now(), + published_at TIMESTAMPTZ +); + +-- Relay hot path: pending rows oldest-first. +CREATE INDEX idx_event_outbox_pending + ON event_outbox (id) WHERE status = 'pending'; +``` + +Notes: +- `payload` stores the *exact* map the producer would have passed to `Enqueue`, so the + relay is a dumb pipe (no business logic). +- Partial index keeps the poller scan cheap as published rows accumulate. +- Retention/cleanup: see §11. + +### 4.2 Producer change (`MarkAssetUploaded`) + +- Inside the **existing transaction**, after `InsertProcessAssetJobTx`, `INSERT` one + `event_outbox` row with `payload = {job_id, asset_id, event:'asset_uploaded'}`. +- **Remove the post-commit `queue.Enqueue` call (L271–…).** Publishing is now the + relay's job. +- Net effect: `{asset.status, jobs row, outbox row}` commit atomically or not at all. + +### 4.3 Stream envelope / payload + +Unchanged from today's `Enqueue` body (`{"body": json(payload)}` on stream +`media:jobs`), so the worker needs **no parsing change**. `traceparent` column is +reserved but unused until the fast-follow. 
+ +### 4.4 Relay publisher (Go, in-API goroutine) + +Mirrors the DEV-47 webhook poller for consistency. + +``` +loop every RELAY_INTERVAL: + BEGIN TX + SELECT id, payload FROM event_outbox + WHERE status='pending' + ORDER BY id + LIMIT RELAY_BATCH + FOR UPDATE SKIP LOCKED -- safe under multiple API replicas + for each row: + id := RedisQueue.Enqueue(payload) -- existing retrying XADD + UPDATE event_outbox + SET status='published', published_at=now() + WHERE id = row.id + COMMIT + on Enqueue error: UPDATE attempts=attempts+1, last_error=… ; leave pending +``` + +- **At-least-once:** if `Enqueue` succeeds but the row update / commit fails, the row + re-publishes next tick → duplicate stream message → consumer dedup absorbs it. +- `SKIP LOCKED` lets multiple API replicas run the relay without coordination. +- Graceful shutdown: finish in-flight batch, stop loop on server context cancel. + +### 4.5 Idempotent consumer (harden existing) + +- Keep & document the `done`-fast-path (L212). Add an explicit test for it. +- Confirm the `FOR UPDATE` claim + `attempts` increment behave under redelivery while + `in_progress` (another worker holds the lock → blocks, then re-reads status). +- **Inbox table deferred:** when DEV-46 (worker inserts a non-idempotent + `webhook_deliveries` row on completion) lands, add a `processed_messages` inbox keyed + on stream message id so each effect fires at most once. Tracked as a dependency, not + built here. + +### 4.6 Coexistence with the recovery sweep (DEV-35) + +Keep `_recover_stuck_pending` during and after rollout — it becomes a *backstop* for +the relay (e.g. relay down for an extended window), not the primary path. Its 2-min +threshold may be revisited once the relay is the norm. + +## 5. Delivery semantics + +- **Producer → outbox:** exactly-once (single DB transaction). +- **Relay → stream:** at-least-once. +- **Stream → worker:** at-least-once (Redis consumer group). 
+- **Worker effects:** effectively-once (job-status dedup + content-addressed variants; + inbox for future non-idempotent effects). + +## 6. Failure modes + +| Failure | Before (today) | After | +|---|---|---| +| Crash after COMMIT, before publish | job `pending` until ~2-min sweep | outbox row committed; relay publishes within `RELAY_INTERVAL` | +| Relay enqueues, then crashes pre-mark | n/a | row stays `pending` → re-published → duplicate msg → consumer skips (job `done`) | +| Multiple API replicas run relay | n/a | `FOR UPDATE SKIP LOCKED` → no double-claim | +| Worker crash after side effect, pre-XACK | redelivered; variants safe, other effects may double | `done`-fast-path skips; inbox (future) guards non-idempotent effects | +| Redis stream data loss | sweep re-adds from DB | outbox rows still `pending` if unpublished; published+lost rows rely on sweep (document residual) | +| Relay down for a long window | n/a | recovery sweep backstop still re-queues | + +## 7. Ordering + +- Per-asset ordering preserved: one outbox row per asset transition, relay reads + `ORDER BY id`. Cross-asset order under `SKIP LOCKED` batching is not guaranteed and + not required. + +## 8. Configuration + +| Env var | Default | Purpose | +|---|---|---| +| `OUTBOX_RELAY_INTERVAL` | `1s` | poll cadence | +| `OUTBOX_RELAY_BATCH` | `100` | rows per tick | +| `OUTBOX_RELAY_ENABLED` | `true` | kill-switch for rollout | +| `OUTBOX_RETENTION` | `168h` | published-row cleanup age (§11) | + +## 9. Observability + +Metrics (OTel, API side): `outbox_pending_gauge`, `outbox_published_total`, +`outbox_publish_failures_total`, `outbox_relay_lag_seconds` (now − oldest pending +`created_at`). Alert on lag breaching a threshold (the new, *explicit* SLO that +replaces the implicit 2-min floor). + +## 10. Testing strategy + +- **Unit:** producer writes outbox row in-tx and no longer calls `Enqueue`; relay + marks rows published; relay leaves rows pending on enqueue error. 
+- **Integration (testcontainers: postgres + redis):** end-to-end mark-uploaded → + relay → stream → worker → asset `ready`. +- **Chaos (DEV-60):** kill the API between COMMIT and the relay tick → assert the job + still reaches the stream. Kill the worker after `PutObject`, before `XACK` → assert + exactly one variant row + one storage object after redelivery. + +## 11. Migration & rollout + +1. Ship the migration (additive; no impact on existing flow). +2. Ship producer + relay behind `OUTBOX_RELAY_ENABLED`. With the flag on, the producer + writes the outbox row and stops direct-enqueuing; the relay publishes. +3. Recovery sweep stays on throughout (backstop). +4. Cleanup job/cron deletes `status='published' AND published_at < now() - OUTBOX_RETENTION`. + +Rollback: flip `OUTBOX_RELAY_ENABLED=false` and restore the direct `Enqueue` (keep the +old call path behind the flag during the first release for safety). + +## 12. Tradeoffs + +- **Polling vs CDC:** polling chosen — no extra infra, matches DEV-47, fine at this + scale. CDC is the documented 100× path (ADR candidate). +- **In-API goroutine vs separate relay process:** in-API now; extractable later. +- **Two outboxes (ingress + egress) vs one abstraction:** kept separate (different + producers/consumers); shared pattern is an ADR candidate, not premature abstraction. +- **Dedup granularity:** job-status now; message-id inbox when non-idempotent effects + land. + +## 13. Open questions (to strengthen during review) + +1. Should the producer keep a **dual-write fallback** (direct `Enqueue`) behind the + flag for the first release, or cut straight to outbox-only? +2. Relay **interval/batch** defaults — tune against expected upload rate. +3. Should the relay run on **all API replicas** (SKIP LOCKED) or be leader-elected to + reduce DB churn? +4. **Retention**: cron vs `pg_partman` partitioned outbox for cheap drops at high + volume. +5.
Do we want a **`max_attempts` → dead-letter** state on the outbox itself (poison + payload), mirroring the egress DLQ? + +## 14. Work breakdown → Linear + +| Issue | Scope | Depends on | +|---|---|---| +| **DEV-56** | `event_outbox` migration | — | +| **DEV-57** | Producer writes outbox row in-tx + relay publisher (ship together) | DEV-56 | +| **DEV-58** | Idempotent consumer — harden + document `done`-fast-path; reserve inbox | DEV-57 | +| **DEV-59** (fast-follow) | `traceparent` through stream + OTel on worker | DEV-57 | +| **DEV-60** (optional) | Chaos tests — crash-window guarantees | DEV-57, DEV-58 | + +> DEV-57 combines producer + relay deliberately: an outbox row with no relay is a +> broken intermediate state (nothing publishes), so the two must land in one PR. diff --git a/docs/arch/reliability-and-correctness.md b/docs/arch/reliability-and-correctness.md new file mode 100644 index 0000000..6dbd020 --- /dev/null +++ b/docs/arch/reliability-and-correctness.md @@ -0,0 +1,171 @@ +# Reliability & Correctness Roadmap + +> **Living doc.** Captures the "what makes MPiper more than a toy" thinking and the +> distributed-systems work that follows from it. Revisit periodically: check what's +> shipped, what regressed, and what the next highest-leverage piece is. +> +> Last substantive update: 2026-06-17 (v1.0.0). + +--- + +## The thesis + +CRUD is reproducible by a single prompt. What isn't: reasoning about what happens +when the worker dies *after* writing to S3 but *before* acking the message. The +differentiator between a toy and a serious system — and the thing that makes an +engineer go "I'd work on this" and a hiring manager go "this person knows their +stuff" — is **demonstrated reasoning about failure modes**, plus the artifacts that +prove the reasoning happened (ADRs, a failure-modes table, chaos tests).
+ +MPiper is an unusually good canvas for this because it is: +- **async** (produce/consume across two services), +- **side-effectful** (object storage, outbound webhooks), +- doing **expensive, non-idempotent work** (transcoding), +- with **fan-out** (one asset → many variants), +- and **money attached** (compute + storage cost). + +So the strategy is: **go deep on one coherent reliability spine, and over-document +the judgment behind it.** Don't go wide (more storage backends, k8s autoscaling, +a dashboard) — depth over breadth. + +--- + +## The reliability spine + +A single end-to-end story: *delivery & correctness guarantees from API call to +webhook.* Each link references a concrete failure mode. + +1. **Transactional outbox (ingress)** — atomic "create intent + publish event". +2. **At-least-once transport** — Redis Streams (already in place). +3. **Idempotent consumer** — the other half of #1; turns at-least-once into + *effectively-once*. +4. **Webhook delivery (egress outbox)** — signed, retried, dead-lettered. +5. **End-to-end tracing across the async boundary + a reconciliation auditor** — + observability of *correctness*, not just latency. +6. **Fault-injection tests that prove each guarantee.** + +### The key insight (the senior signal) + +> "I don't chase exactly-once *delivery* — it's impossible across a network. I make +> the *effects* idempotent, so at-least-once delivery becomes effectively-once." + +Outbox alone is only half a solution; shipping it without an idempotent consumer can +even read as naive. The **pair** is what signals understanding. + +--- + +## Current state (grounded in code) + +### Ingress dual-write — the real gap (NOT yet planned) + +`internal/service/asset.go` → `MarkAssetUploaded` (line ~179): + +``` +BEGIN TX + MarkAssetUploadedTx -- assets.status = uploaded + InsertProcessAssetJobTx -- jobs row, status = pending +COMMIT -- line ~260 +queue.Enqueue(...) 
-- line ~271, AFTER commit +``` + +**Hazard:** crash between `COMMIT` and `Enqueue` → job committed as `pending` but no +stream message is ever published. The worker never sees it. + +**Current backstop:** the worker's `_recover_stuck_pending` (DEV-35) re-adds +`pending`/`in_progress` jobs older than ~2 min back to the stream. So the system is +not *broken* — but its correctness depends on a slow, implicit, undocumented polling +sweep. An ingress outbox makes that guarantee **explicit, fast, and atomic**. + +### Egress outbox — already planned (DEV-40 family) + +The webhook subsystem is, by design, a textbook outbox on the *egress* side: +- **DEV-44 (Done)** — `webhook_registrations` + `webhook_deliveries` (the outbox table). +- **DEV-46 (Backlog)** — worker inserts a `webhook_deliveries` row **in the same + transaction** as the job-completion write. +- **DEV-47 (Backlog)** — delivery poller: `SELECT … FOR UPDATE SKIP LOCKED`, POST, + exponential backoff, DLQ after N attempts. +- **DEV-45 (Backlog)** — `POST /webhooks/register`. + +So the egress half of the spine is specced. The **ingress half is the symmetric gap.** + +### Idempotency today + +- **Content-addressed variants** (`variant_hash`) → reprocessing writes the same + object to the same key and upserts the same row. Naturally idempotent on storage. +- **Retry classification** (DEV-34, DEV-52, Done) → retryable vs terminal failures + handled correctly; assets no longer stuck `failed` across retries. +- **Gap:** no explicit processed-message / inbox dedup keyed on the stream message + ID. Reprocessing is *safe* for variants but not *cheap*, and any non-idempotent + effect added later (e.g. the DEV-46 webhook insert) would double-fire on redelivery. + +--- + +## Ranked additions + +Status legend: ✅ done · 🟡 planned (Linear) · 🔴 gap (unplanned). 
+ +| # | Addition | Status | Why it's interesting | Discussion threads | +|---|----------|--------|----------------------|--------------------| +| 1 | **Ingress outbox + idempotent consumer** | 🟡 DEV-55 (spec'd) | The dual-write hazard above; effectively-once | polling relay vs CDC; per-asset ordering vs throughput; outbox retention; relay idempotency | +| 2 | **Webhook delivery done right** | 🟡 DEV-40/45/46/47 | At-least-once + HMAC signing + replay protection + circuit breaking + DLQ + replay API | ordering offered to subscribers; "you must be idempotent"; poison-endpoint isolation | +| 3 | **Trace context across the async boundary** | 🔴 gap | Propagate W3C `traceparent` *through the stream message* so one trace spans Go API → relay → worker → storage → webhook. Worker currently uses Prometheus, not OTel | context propagation across language + async boundary | +| 4 | **Reconciliation / auditor job** | 🟡 partial (DEV-35) | Generalize `_recover_stuck_pending` into an invariant scanner: stuck assets, jobs with no terminal state, variant rows whose objects are missing, unrelayed outbox rows | which invariants; alert vs auto-heal | +| 5 | **DLQ + poison handling, surfaced** | 🟡 partial | After N attempts → DLQ + inspect/requeue API + alert | dead-letter schema; replay tooling | +| 6 | **Backpressure & fair scheduling** | 🔴 gap | Stream `MAXLEN` (set to 10k today), consumer-lag metrics (`XPENDING`), scale-on-lag; spicy: weighted fair queueing across tenants | capacity model; tenant starvation | +| 7 | **Client idempotency keys** | 🔴 gap | Stripe-style `Idempotency-Key` on `POST /upload` so client retries don't dupe assets | key storage + TTL; response replay | + +**Headline recommendation:** #1 (with #3 folded in) is the highest-leverage, most +distinctive piece. Per the status column above, it is now spec'd under DEV-55. + +--- + +## Failure modes & guarantees (fill in as we build) + +The single most senior-looking artifact.
Each row = a failure; columns = outcome + +the mechanism that protects it + residual risk. + +| Failure | Today | With ingress outbox + idempotent consumer | +|---------|-------|-------------------------------------------| +| Producer crash after COMMIT, before Enqueue | job stuck `pending` until ~2 min recovery sweep | outbox row committed atomically; relay publishes; bounded by relay interval | +| Worker crash after side effect, before XACK | message redelivered → reprocessed; variants safe (hash), other effects may double | dedup/inbox skips already-processed message | +| Redis data loss | in-flight stream entries lost; recovery sweep re-adds from DB | DB outbox is source of truth; relay re-publishes | +| Partial S3 write | variant object may be incomplete | (open) checksum/verify on read; reconciler flags missing objects | +| Webhook receiver down | (n/a yet) | egress outbox retries w/ backoff → DLQ | + +--- + +## The meta-layer (what actually convinces) + +Building the features is table stakes. The reasoning trail is the part a prompt can't +reproduce: + +- **ADRs** in `docs/adr/`: "polling outbox vs CDC", "effectively-once not + exactly-once", "Redis Streams vs Kafka/SQS — and where Redis breaks for us". Short, + dated, with rejected alternatives. +- **This failure-modes table**, kept current. +- **Chaos / fault-injection tests** (testcontainers + a harness that kills the worker + at the dangerous moment). The trophy test: *"kill worker after `PutObject`, before + `XACK`; assert exactly one variant row and one storage object."* +- **An honest "where this breaks at 100×" section** — seniors signal by knowing the + edges of their own design. + +--- + +## What to avoid (so depth isn't diluted) + +- More storage backends (GCS/S3/MinIO is enough — solved). +- Kubernetes autoscaling / service-mesh theater. +- A frontend/admin dashboard (pulls focus from the systems story). 
+- "Add Kafka" without being able to defend *why Redis is insufficient* — at this + scale it isn't, so it reads as résumé-driven. + +--- + +## Revisit checklist + +When you come back to this doc, ask: +1. Which spine links are now ✅? Did any regress? +2. Is the failure-modes table still accurate against the code? +3. What's the next 🔴 gap with the highest signal-to-effort? +4. Is there an ADR for every non-obvious decision shipped since last visit? +5. Is there a chaos test proving each guarantee we *claim*? diff --git a/go.mod b/go.mod index 97919fb..f828092 100644 --- a/go.mod +++ b/go.mod @@ -15,6 +15,9 @@ require ( github.com/joho/godotenv v1.5.1 github.com/lib/pq v1.10.9 github.com/redis/go-redis/v9 v9.17.2 + github.com/testcontainers/testcontainers-go v0.43.0 + github.com/testcontainers/testcontainers-go/modules/postgres v0.43.0 + github.com/testcontainers/testcontainers-go/modules/redis v0.43.0 go.opentelemetry.io/otel v1.44.0 go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc v1.39.0 go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.39.0 @@ -23,12 +26,15 @@ require ( go.opentelemetry.io/otel/sdk/metric v1.39.0 go.opentelemetry.io/otel/trace v1.44.0 go.uber.org/zap v1.28.0 - golang.org/x/crypto v0.45.0 + golang.org/x/crypto v0.51.0 google.golang.org/api v0.256.0 google.golang.org/grpc v1.77.0 ) require ( + dario.cat/mergo v1.0.2 // indirect + github.com/Azure/go-ansiterm v0.0.0-20250102033503-faa5f7b0171c // indirect + github.com/Microsoft/go-winio v0.6.2 // indirect github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.7.13 // indirect github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.29 // indirect github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.29 // indirect @@ -43,6 +49,42 @@ require ( github.com/aws/aws-sdk-go-v2/service/ssooidc v1.36.6 // indirect github.com/aws/aws-sdk-go-v2/service/sts v1.43.3 // indirect github.com/aws/smithy-go v1.27.1 // indirect + github.com/cenkalti/backoff/v4 v4.3.0 // indirect + 
github.com/containerd/errdefs v1.0.0 // indirect + github.com/containerd/errdefs/pkg v0.3.0 // indirect + github.com/containerd/log v0.1.0 // indirect + github.com/containerd/platforms v0.2.1 // indirect + github.com/cpuguy83/dockercfg v0.3.2 // indirect + github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect + github.com/distribution/reference v0.6.0 // indirect + github.com/docker/go-connections v0.6.0 // indirect + github.com/docker/go-units v0.5.0 // indirect + github.com/ebitengine/purego v0.10.0 // indirect + github.com/go-ole/go-ole v1.2.6 // indirect + github.com/klauspost/compress v1.18.5 // indirect + github.com/lufia/plan9stats v0.0.0-20211012122336-39d0f177ccd0 // indirect + github.com/magiconair/properties v1.8.10 // indirect + github.com/mdelapenya/tlscert v0.2.0 // indirect + github.com/moby/docker-image-spec v1.3.1 // indirect + github.com/moby/go-archive v0.2.0 // indirect + github.com/moby/moby/api v1.54.2 // indirect + github.com/moby/moby/client v0.4.0 // indirect + github.com/moby/patternmatcher v0.6.1 // indirect + github.com/moby/sys/sequential v0.6.0 // indirect + github.com/moby/sys/user v0.4.0 // indirect + github.com/moby/sys/userns v0.1.0 // indirect + github.com/moby/term v0.5.2 // indirect + github.com/opencontainers/go-digest v1.0.0 // indirect + github.com/opencontainers/image-spec v1.1.1 // indirect + github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect + github.com/power-devops/perfstat v0.0.0-20240221224432-82ca36839d55 // indirect + github.com/shirou/gopsutil/v4 v4.26.5 // indirect + github.com/sirupsen/logrus v1.9.4 // indirect + github.com/stretchr/testify v1.11.1 // indirect + github.com/tklauser/go-sysconf v0.3.16 // indirect + github.com/tklauser/numcpus v0.11.0 // indirect + github.com/yusufpapurcu/wmi v1.2.4 // indirect + gopkg.in/yaml.v3 v3.0.1 // indirect ) require ( @@ -82,11 +124,11 @@ require ( go.opentelemetry.io/otel/log v0.20.0 go.opentelemetry.io/proto/otlp v1.9.0 // 
indirect go.uber.org/multierr v1.11.0 // indirect - golang.org/x/net v0.47.0 // indirect + golang.org/x/net v0.53.0 // indirect golang.org/x/oauth2 v0.33.0 // indirect - golang.org/x/sync v0.18.0 // indirect - golang.org/x/sys v0.39.0 // indirect - golang.org/x/text v0.31.0 // indirect + golang.org/x/sync v0.20.0 // indirect + golang.org/x/sys v0.45.0 // indirect + golang.org/x/text v0.37.0 // indirect golang.org/x/time v0.14.0 google.golang.org/genproto v0.0.0-20250922171735-9219d122eba9 // indirect google.golang.org/genproto/googleapis/api v0.0.0-20251202230838-ff82c1b0f217 // indirect diff --git a/go.sum b/go.sum index b24f082..10d89b6 100644 --- a/go.sum +++ b/go.sum @@ -20,10 +20,14 @@ cloud.google.com/go/storage v1.58.0 h1:PflFXlmFJjG/nBeR9B7pKddLQWaFaRWx4uUi/LyNx cloud.google.com/go/storage v1.58.0/go.mod h1:cMWbtM+anpC74gn6qjLh+exqYcfmB9Hqe5z6adx+CLI= cloud.google.com/go/trace v1.11.6 h1:2O2zjPzqPYAHrn3OKl029qlqG6W8ZdYaOWRyr8NgMT4= cloud.google.com/go/trace v1.11.6/go.mod h1:GA855OeDEBiBMzcckLPE2kDunIpC72N+Pq8WFieFjnI= +dario.cat/mergo v1.0.2 h1:85+piFYR1tMbRrLcDwR18y4UKJ3aH1Tbzi24VRW1TK8= +dario.cat/mergo v1.0.2/go.mod h1:E/hbnu0NxMFBjpMIE34DRGLWqDy0g5FuKDhCb31ngxA= filippo.io/edwards25519 v1.1.0 h1:FNf4tywRC1HmFuKW5xopWpigGjJKiJSV0Cqo0cJWDaA= filippo.io/edwards25519 v1.1.0/go.mod h1:BxyFTGdWcka3PhytdK4V28tE5sGfRvvvRV7EaN4VDT4= -github.com/Azure/go-ansiterm v0.0.0-20230124172434-306776ec8161 h1:L/gRVlceqvL25UVaW/CKtUDjefjrs0SPonmDGUVOYP0= -github.com/Azure/go-ansiterm v0.0.0-20230124172434-306776ec8161/go.mod h1:xomTg63KZ2rFqZQzSB4Vz2SUXa1BpHTVz9L5PTmPC4E= +github.com/AdaLogics/go-fuzz-headers v0.0.0-20240806141605-e8a1dd7889d6 h1:He8afgbRMd7mFxO99hRNu+6tazq8nFF9lIwo9JFroBk= +github.com/AdaLogics/go-fuzz-headers v0.0.0-20240806141605-e8a1dd7889d6/go.mod h1:8o94RPi1/7XTJvwPpRSzSUedZrtlirdB3r9Z20bi2f8= +github.com/Azure/go-ansiterm v0.0.0-20250102033503-faa5f7b0171c h1:udKWzYgxTojEKWjV8V+WSxDXJ4NFATAsZjh8iIbsQIg= +github.com/Azure/go-ansiterm 
v0.0.0-20250102033503-faa5f7b0171c/go.mod h1:xomTg63KZ2rFqZQzSB4Vz2SUXa1BpHTVz9L5PTmPC4E= github.com/GoogleCloudPlatform/opentelemetry-operations-go/detectors/gcp v1.30.0 h1:sBEjpZlNHzK1voKq9695PJSX2o5NEXl7/OL3coiIY0c= github.com/GoogleCloudPlatform/opentelemetry-operations-go/detectors/gcp v1.30.0/go.mod h1:P4WPRUkOhJC13W//jWpyfJNDAIpvRbAUIYLX/4jtlE0= github.com/GoogleCloudPlatform/opentelemetry-operations-go/exporter/metric v0.54.0 h1:lhhYARPUu3LmHysQ/igznQphfzynnqI3D75oUyw1HXk= @@ -74,6 +78,8 @@ github.com/bsm/ginkgo/v2 v2.12.0 h1:Ny8MWAHyOepLGlLKYmXG4IEkioBysk6GpaRTLC8zwWs= github.com/bsm/ginkgo/v2 v2.12.0/go.mod h1:SwYbGRRDovPVboqFv0tPTcG1sN61LM1Z4ARdbAV9g4c= github.com/bsm/gomega v1.27.10 h1:yeMWxP2pV2fG3FgAODIY8EiRE3dy0aeFYt4l7wh6yKA= github.com/bsm/gomega v1.27.10/go.mod h1:JyEr/xRbxbtgWNi8tIEVPUYZ5Dzef52k01W3YH0H+O0= +github.com/cenkalti/backoff/v4 v4.3.0 h1:MyRJ/UdXutAwSAT+s3wNd7MfTIcy71VQueUuFK343L8= +github.com/cenkalti/backoff/v4 v4.3.0/go.mod h1:Y3VNntkOUPxTVeUxJ/G5vcM//AlwfmyYozVcomhLiZE= github.com/cenkalti/backoff/v5 v5.0.3 h1:ZN+IMa753KfX5hd8vVaMixjnqRZ3y8CuJKRKj1xcsSM= github.com/cenkalti/backoff/v5 v5.0.3/go.mod h1:rkhZdG3JZukswDf7f0cwqPNk4K0sa+F97BxZthm/crw= github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UFvs= @@ -84,6 +90,14 @@ github.com/containerd/errdefs v1.0.0 h1:tg5yIfIlQIrxYtu9ajqY42W3lpS19XqdxRQeEwYG github.com/containerd/errdefs v1.0.0/go.mod h1:+YBYIdtsnF4Iw6nWZhJcqGSg/dwvV7tyJ/kCkyJ2k+M= github.com/containerd/errdefs/pkg v0.3.0 h1:9IKJ06FvyNlexW690DXuQNx2KA2cUJXx151Xdx3ZPPE= github.com/containerd/errdefs/pkg v0.3.0/go.mod h1:NJw6s9HwNuRhnjJhM7pylWwMyAkmCQvQ4GpJHEqRLVk= +github.com/containerd/log v0.1.0 h1:TCJt7ioM2cr/tfR8GPbGf9/VRAX8D2B4PjzCpfX540I= +github.com/containerd/log v0.1.0/go.mod h1:VRRf09a7mHDIRezVKTRCrOq78v577GXq3bSa3EhrzVo= +github.com/containerd/platforms v0.2.1 h1:zvwtM3rz2YHPQsF2CHYM8+KtB5dvhISiXh5ZpSBQv6A= +github.com/containerd/platforms v0.2.1/go.mod 
h1:XHCb+2/hzowdiut9rkudds9bE5yJ7npe7dG/wG+uFPw= +github.com/cpuguy83/dockercfg v0.3.2 h1:DlJTyZGBDlXqUZ2Dk2Q3xHs/FtnooJJVaad2S9GKorA= +github.com/cpuguy83/dockercfg v0.3.2/go.mod h1:sugsbF4//dDlL/i+S+rtpIWp+5h0BHJHfjj5/jFyUJc= +github.com/creack/pty v1.1.24 h1:bJrF4RRfyJnbTJqzRLHzcGaZK1NeM5kTC9jGgovnR1s= +github.com/creack/pty v1.1.24/go.mod h1:08sCNb52WyoAwi2QDyzUCTgcvVFhUzewun7wtTfvcwE= github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc h1:U9qPSI2PIWSS1VwoXQT9A3Wy9MM3WgvqSxFWenqJduM= github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f h1:lO4WD4F/rVNCu3HqELle0jiPLLBs70cWOduZpkS1E78= @@ -94,10 +108,12 @@ github.com/distribution/reference v0.6.0 h1:0IXCQ5g4/QMHHkarYzh5l+u8T3t73zM5Qvfr github.com/distribution/reference v0.6.0/go.mod h1:BbU0aIcezP1/5jX/8MP0YiH4SdvB5Y4f/wlDRiLyi3E= github.com/docker/docker v28.3.3+incompatible h1:Dypm25kh4rmk49v1eiVbsAtpAsYURjYkaKubwuBdxEI= github.com/docker/docker v28.3.3+incompatible/go.mod h1:eEKB0N0r5NX/I1kEveEz05bcu8tLC/8azJZsviup8Sk= -github.com/docker/go-connections v0.5.0 h1:USnMq7hx7gwdVZq1L49hLXaFtUdTADjXGp+uj1Br63c= -github.com/docker/go-connections v0.5.0/go.mod h1:ov60Kzw0kKElRwhNs9UlUHAE/F9Fe6GLaXnqyDdmEXc= +github.com/docker/go-connections v0.6.0 h1:LlMG9azAe1TqfR7sO+NJttz1gy6KO7VJBh+pMmjSD94= +github.com/docker/go-connections v0.6.0/go.mod h1:AahvXYshr6JgfUJGdDCs2b5EZG/vmaMAntpSFH5BFKE= github.com/docker/go-units v0.5.0 h1:69rxXcBk27SvSaaxTtLh/8llcHD8vYHT7WSdRZ/jvr4= github.com/docker/go-units v0.5.0/go.mod h1:fgPhTUdO+D/Jk86RDLlptpiXQzgHJF7gydDDbaIK4Dk= +github.com/ebitengine/purego v0.10.0 h1:QIw4xfpWT6GWTzaW5XEKy3HXoqrJGx1ijYHzTF0/ISU= +github.com/ebitengine/purego v0.10.0/go.mod h1:iIjxzd6CiRiOG0UyXP+V1+jWqUXVjPKLAI0mRfJZTmQ= github.com/envoyproxy/go-control-plane v0.13.5-0.20251024222203-75eaa193e329 h1:K+fnvUM0VZ7ZFJf0n4L/BRlnsb9pL/GuDG6FqaH+PwM= 
github.com/envoyproxy/go-control-plane v0.13.5-0.20251024222203-75eaa193e329/go.mod h1:Alz8LEClvR7xKsrq3qzoc4N0guvVNSS8KmSChGYr9hs= github.com/envoyproxy/go-control-plane/envoy v1.35.0 h1:ixjkELDE+ru6idPxcHLj8LBVc2bFP7iBytj353BoHUo= @@ -119,6 +135,8 @@ github.com/go-logr/logr v1.4.3 h1:CjnDlHq8ikf6E492q6eKboGOC0T8CDaOvkHCIg8idEI= github.com/go-logr/logr v1.4.3/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY= github.com/go-logr/stdr v1.2.2 h1:hSWxHoqTgW2S2qGc0LTAI563KZ5YKYRhT3MFKZMbjag= github.com/go-logr/stdr v1.2.2/go.mod h1:mMo/vtBO5dYbehREoey6XUKy/eSumjCCveDpRre4VKE= +github.com/go-ole/go-ole v1.2.6 h1:/Fpf6oFPoeFik9ty7siob0G6Ke8QvQEuVcuChpwXzpY= +github.com/go-ole/go-ole v1.2.6/go.mod h1:pprOEPIfldk/42T2oK7lQ4v4JSDwmV0As9GaiUsvbm0= github.com/go-sql-driver/mysql v1.8.1 h1:LedoTUt/eveggdHS9qUFC1EFSa8bU2+1pZjSRpvNJ1Y= github.com/go-sql-driver/mysql v1.8.1/go.mod h1:wEBSXgmK//2ZFJyE+qWnIsVGmvmEKlqwuVSjsCm7DZg= github.com/gogo/protobuf v1.3.2 h1:Ov1cvc58UF3b5XjBnZv7+opcTcQFZebYjWzi34vdm4Q= @@ -127,6 +145,7 @@ github.com/golang-migrate/migrate/v4 v4.19.1 h1:OCyb44lFuQfYXYLx1SCxPZQGU7mcaZ7g github.com/golang-migrate/migrate/v4 v4.19.1/go.mod h1:CTcgfjxhaUtsLipnLoQRWCrjYXycRz/g5+RWDuYgPrE= github.com/golang/protobuf v1.5.4 h1:i7eJL8qZTpSEXOPTxNKhASYpMn+8e5Q6AdndVa1dWek= github.com/golang/protobuf v1.5.4/go.mod h1:lnTiLA8Wa4RWRcIUkrtSVa5nRhsEGBg48fD6rSs7xps= +github.com/google/go-cmp v0.5.6/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= github.com/google/go-cmp v0.7.0 h1:wk8382ETsv4JYUZwIsn6YpYiWiBsYLSJiTsyBybVuN8= github.com/google/go-cmp v0.7.0/go.mod h1:pXiqmnSA92OHEEa9HXL2W4E7lf9JzCmGVUdgjX3N/iU= github.com/google/martian/v3 v3.3.3 h1:DIhPTQrbPkgs2yJYdXU/eNACCG5DVQjySNRNlflZ9Fc= @@ -141,36 +160,92 @@ github.com/googleapis/gax-go/v2 v2.15.0 h1:SyjDc1mGgZU5LncH8gimWo9lW1DtIfPibOG81 github.com/googleapis/gax-go/v2 v2.15.0/go.mod h1:zVVkkxAQHa1RQpg9z2AUCMnKhi0Qld9rcmyfL1OZhoc= github.com/grpc-ecosystem/grpc-gateway/v2 v2.27.3 
h1:NmZ1PKzSTQbuGHw9DGPFomqkkLWMC+vZCkfs+FHv1Vg= github.com/grpc-ecosystem/grpc-gateway/v2 v2.27.3/go.mod h1:zQrxl1YP88HQlA6i9c63DSVPFklWpGX4OWAc9bFuaH4= +github.com/jackc/pgpassfile v1.0.0 h1:/6Hmqy13Ss2zCq62VdNG8tM1wchn8zjSGOBJ6icpsIM= +github.com/jackc/pgpassfile v1.0.0/go.mod h1:CEx0iS5ambNFdcRtxPj5JhEz+xB6uRky5eyVu/W2HEg= +github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761 h1:iCEnooe7UlwOQYpKFhBabPMi4aNAfoODPEFNiAnClxo= +github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761/go.mod h1:5TJZWKEWniPve33vlWYSoGYefn3gLQRzjfDlhSJ9ZKM= +github.com/jackc/pgx/v5 v5.9.2 h1:3ZhOzMWnR4yJ+RW1XImIPsD1aNSz4T4fyP7zlQb56hw= +github.com/jackc/pgx/v5 v5.9.2/go.mod h1:mal1tBGAFfLHvZzaYh77YS/eC6IX9OWbRV1QIIM0Jn4= +github.com/jackc/puddle/v2 v2.2.2 h1:PR8nw+E/1w0GLuRFSmiioY6UooMp6KJv0/61nB7icHo= +github.com/jackc/puddle/v2 v2.2.2/go.mod h1:vriiEXHvEE654aYKXXjOvZM39qJ0q+azkZFrfEOc3H4= github.com/jmoiron/sqlx v1.4.0 h1:1PLqN7S1UYp5t4SrVVnt4nUVNemrDAtxlulVe+Qgm3o= github.com/jmoiron/sqlx v1.4.0/go.mod h1:ZrZ7UsYB/weZdl2Bxg6jCRO9c3YHl8r3ahlKmRT4JLY= github.com/joho/godotenv v1.5.1 h1:7eLL/+HRGLY0ldzfGMeQkb7vMd0as4CfYvUVzLqw0N0= github.com/joho/godotenv v1.5.1/go.mod h1:f4LDr5Voq0i2e/R5DDNOoa2zzDfwtkZa6DnEwAbqwq4= +github.com/klauspost/compress v1.18.5 h1:/h1gH5Ce+VWNLSWqPzOVn6XBO+vJbCNGvjoaGBFW2IE= +github.com/klauspost/compress v1.18.5/go.mod h1:cwPg85FWrGar70rWktvGQj8/hthj3wpl0PGDogxkrSQ= +github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE= +github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk= +github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY= +github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE= github.com/lib/pq v1.10.9 h1:YXG7RB+JIjhP29X+OtkiDnYaXQwpS4JEWq7dtCCRUEw= github.com/lib/pq v1.10.9/go.mod h1:AlVN5x4E4T544tWzH6hKfbfQvm3HdbOxrmggDNAPY9o= +github.com/lufia/plan9stats v0.0.0-20211012122336-39d0f177ccd0 
h1:6E+4a0GO5zZEnZ81pIr0yLvtUWk2if982qA3F3QD6H4= +github.com/lufia/plan9stats v0.0.0-20211012122336-39d0f177ccd0/go.mod h1:zJYVVT2jmtg6P3p1VtQj7WsuWi/y4VnjVBn7F8KPB3I= +github.com/magiconair/properties v1.8.10 h1:s31yESBquKXCV9a/ScB3ESkOjUYYv+X0rg8SYxI99mE= +github.com/magiconair/properties v1.8.10/go.mod h1:Dhd985XPs7jluiymwWYZ0G4Z61jb3vdS329zhj2hYo0= github.com/mattn/go-sqlite3 v1.14.22 h1:2gZY6PC6kBnID23Tichd1K+Z0oS6nE/XwU+Vz/5o4kU= github.com/mattn/go-sqlite3 v1.14.22/go.mod h1:Uh1q+B4BYcTPb+yiD3kU8Ct7aC0hY9fxUwlHK0RXw+Y= +github.com/mdelapenya/tlscert v0.2.0 h1:7H81W6Z/4weDvZBNOfQte5GpIMo0lGYEeWbkGp5LJHI= +github.com/mdelapenya/tlscert v0.2.0/go.mod h1:O4njj3ELLnJjGdkN7M/vIVCpZ+Cf0L6muqOG4tLSl8o= github.com/moby/docker-image-spec v1.3.1 h1:jMKff3w6PgbfSa69GfNg+zN/XLhfXJGnEx3Nl2EsFP0= github.com/moby/docker-image-spec v1.3.1/go.mod h1:eKmb5VW8vQEh/BAr2yvVNvuiJuY6UIocYsFu/DxxRpo= -github.com/moby/term v0.5.0 h1:xt8Q1nalod/v7BqbG21f8mQPqH+xAaC9C3N3wfWbVP0= -github.com/moby/term v0.5.0/go.mod h1:8FzsFHVUBGZdbDsJw/ot+X+d5HLUbvklYLJ9uGfcI3Y= +github.com/moby/go-archive v0.2.0 h1:zg5QDUM2mi0JIM9fdQZWC7U8+2ZfixfTYoHL7rWUcP8= +github.com/moby/go-archive v0.2.0/go.mod h1:mNeivT14o8xU+5q1YnNrkQVpK+dnNe/K6fHqnTg4qPU= +github.com/moby/moby/api v1.54.2 h1:wiat9QAhnDQjA7wk1kh/TqHz2I1uUA7M7t9SAl/JNXg= +github.com/moby/moby/api v1.54.2/go.mod h1:+RQ6wluLwtYaTd1WnPLykIDPekkuyD/ROWQClE83pzs= +github.com/moby/moby/client v0.4.0 h1:S+2XegzHQrrvTCvF6s5HFzcrywWQmuVnhOXe2kiWjIw= +github.com/moby/moby/client v0.4.0/go.mod h1:QWPbvWchQbxBNdaLSpoKpCdf5E+WxFAgNHogCWDoa7g= +github.com/moby/patternmatcher v0.6.1 h1:qlhtafmr6kgMIJjKJMDmMWq7WLkKIo23hsrpR3x084U= +github.com/moby/patternmatcher v0.6.1/go.mod h1:hDPoyOpDY7OrrMDLaYoY3hf52gNCR/YOUYxkhApJIxc= +github.com/moby/sys/sequential v0.6.0 h1:qrx7XFUd/5DxtqcoH1h438hF5TmOvzC/lspjy7zgvCU= +github.com/moby/sys/sequential v0.6.0/go.mod h1:uyv8EUTrca5PnDsdMGXhZe6CCe8U/UiTWd+lL+7b/Ko= +github.com/moby/sys/user v0.4.0 
h1:jhcMKit7SA80hivmFJcbB1vqmw//wU61Zdui2eQXuMs= +github.com/moby/sys/user v0.4.0/go.mod h1:bG+tYYYJgaMtRKgEmuueC0hJEAZWwtIbZTB+85uoHjs= +github.com/moby/sys/userns v0.1.0 h1:tVLXkFOxVu9A64/yh59slHVv9ahO9UIev4JZusOLG/g= +github.com/moby/sys/userns v0.1.0/go.mod h1:IHUYgu/kao6N8YZlp9Cf444ySSvCmDlmzUcYfDHOl28= +github.com/moby/term v0.5.2 h1:6qk3FJAFDs6i/q3W/pQ97SX192qKfZgGjCQqfCJkgzQ= +github.com/moby/term v0.5.2/go.mod h1:d3djjFCrjnB+fl8NJux+EJzu0msscUP+f8it8hPkFLc= github.com/morikuni/aec v1.0.0 h1:nP9CBfwrvYnBRgY6qfDQkygYDmYwOilePFkwzv4dU8A= github.com/morikuni/aec v1.0.0/go.mod h1:BbKIizmSmc5MMPqRYbxO4ZU0S0+P200+tUnFx7PXmsc= github.com/opencontainers/go-digest v1.0.0 h1:apOUWs51W5PlhuyGyz9FCeeBIOUDA/6nW8Oi/yOhh5U= github.com/opencontainers/go-digest v1.0.0/go.mod h1:0JzlMkj0TRzQZfJkVvzbP0HBR3IKzErnv2BNG4W4MAM= -github.com/opencontainers/image-spec v1.1.0 h1:8SG7/vwALn54lVB/0yZ/MMwhFrPYtpEHQb2IpWsCzug= -github.com/opencontainers/image-spec v1.1.0/go.mod h1:W4s4sFTMaBeK1BQLXbG4AdM2szdn85PY75RI83NrTrM= +github.com/opencontainers/image-spec v1.1.1 h1:y0fUlFfIZhPF1W537XOLg0/fcx6zcHCJwooC2xJA040= +github.com/opencontainers/image-spec v1.1.1/go.mod h1:qpqAh3Dmcf36wStyyWU+kCeDgrGnAve2nCC8+7h8Q0M= github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4= github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0= github.com/planetscale/vtprotobuf v0.6.1-0.20240319094008-0393e58bdf10 h1:GFCKgmp0tecUJ0sJuv4pzYCqS9+RGSn52M3FUwPs+uo= github.com/planetscale/vtprotobuf v0.6.1-0.20240319094008-0393e58bdf10/go.mod h1:t/avpk3KcrXxUnYOhZhMXJlSEyie6gQbtLq5NM3loB8= github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 h1:Jamvg5psRIccs7FGNTlIRMkT8wgtp5eCXdBlqhYGL6U= github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= +github.com/power-devops/perfstat v0.0.0-20240221224432-82ca36839d55 h1:o4JXh1EVt9k/+g42oCprj/FisM4qX9L3sZB3upGN2ZU= 
+github.com/power-devops/perfstat v0.0.0-20240221224432-82ca36839d55/go.mod h1:OmDBASR4679mdNQnz2pUhc2G8CO2JrUAVFDRBDP/hJE= github.com/redis/go-redis/v9 v9.17.2 h1:P2EGsA4qVIM3Pp+aPocCJ7DguDHhqrXNhVcEp4ViluI= github.com/redis/go-redis/v9 v9.17.2/go.mod h1:u410H11HMLoB+TP67dz8rL9s6QW2j76l0//kSOd3370= +github.com/rogpeppe/go-internal v1.14.1 h1:UQB4HGPB6osV0SQTLymcB4TgvyWu6ZyliaW0tI/otEQ= +github.com/rogpeppe/go-internal v1.14.1/go.mod h1:MaRKkUm5W0goXpeCfT7UZI6fk/L7L7so1lCWt35ZSgc= +github.com/shirou/gopsutil/v4 v4.26.5 h1:RPcBXkpz7kOj9PqGFQOlBPZHsyaPvPVQc098y9RmCNM= +github.com/shirou/gopsutil/v4 v4.26.5/go.mod h1:LZ6ewCSkBqUpvSOf+LsTGnRinC6iaNUNMGBtDkJBaLQ= +github.com/sirupsen/logrus v1.9.4 h1:TsZE7l11zFCLZnZ+teH4Umoq5BhEIfIzfRDZ1Uzql2w= +github.com/sirupsen/logrus v1.9.4/go.mod h1:ftWc9WdOfJ0a92nsE2jF5u5ZwH8Bv2zdeOC42RjbV2g= github.com/spiffe/go-spiffe/v2 v2.6.0 h1:l+DolpxNWYgruGQVV0xsfeya3CsC7m8iBzDnMpsbLuo= github.com/spiffe/go-spiffe/v2 v2.6.0/go.mod h1:gm2SeUoMZEtpnzPNs2Csc0D/gX33k1xIx7lEzqblHEs= +github.com/stretchr/objx v0.5.3 h1:jmXUvGomnU1o3W/V5h2VEradbpJDwGrzugQQvL0POH4= +github.com/stretchr/objx v0.5.3/go.mod h1:rDQraq+vQZU7Fde9LOZLr8Tax6zZvy4kuNKF+QYS+U0= github.com/stretchr/testify v1.11.1 h1:7s2iGBzp5EwR7/aIZr8ao5+dra3wiQyKjjFuvgVKu7U= github.com/stretchr/testify v1.11.1/go.mod h1:wZwfW3scLgRK+23gO65QZefKpKQRnfz6sD981Nm4B6U= +github.com/testcontainers/testcontainers-go v0.43.0 h1:oEQx5MW2DGd9z3AeEQfB2lPM0eLs7ztyaGRu75bFo5A= +github.com/testcontainers/testcontainers-go v0.43.0/go.mod h1:+VxkT2NQnKOZPKi6praMuMKYHYyOGXr0XSBSlSMCzFo= +github.com/testcontainers/testcontainers-go/modules/postgres v0.43.0 h1:ShNOFYAF4lKHvdIG258hi69bSxC88uXnxJkJvNs/IVs= +github.com/testcontainers/testcontainers-go/modules/postgres v0.43.0/go.mod h1:vdq5/RqmGfWeefzyfcVI/pID1rzmc1TDvqXa15bPJks= +github.com/testcontainers/testcontainers-go/modules/redis v0.43.0 h1:qzATMhrltLr07KcGl/d674ouqI0AFtf6wnQb3VnqP7M= +github.com/testcontainers/testcontainers-go/modules/redis 
v0.43.0/go.mod h1:ygEcEUIZzmIlOKpjBfnPn/lUIRNorr1kPj3XfFPTQXM= +github.com/tklauser/go-sysconf v0.3.16 h1:frioLaCQSsF5Cy1jgRBrzr6t502KIIwQ0MArYICU0nA= +github.com/tklauser/go-sysconf v0.3.16/go.mod h1:/qNL9xxDhc7tx3HSRsLWNnuzbVfh3e7gh/BmM179nYI= +github.com/tklauser/numcpus v0.11.0 h1:nSTwhKH5e1dMNsCdVBukSZrURJRoHbSEQjdEbY+9RXw= +github.com/tklauser/numcpus v0.11.0/go.mod h1:z+LwcLq54uWZTX0u/bGobaV34u6V7KNlTZejzM6/3MQ= +github.com/yusufpapurcu/wmi v1.2.4 h1:zFUKzehAFReQwLys1b/iSMl+JQGSCSjtVqQn9bBrPo0= +github.com/yusufpapurcu/wmi v1.2.4/go.mod h1:SBZ9tNy3G9/m5Oi98Zks0QjeHVDvuK0qfxQmPyzfmi0= go.opentelemetry.io/auto/sdk v1.2.1 h1:jXsnJ4Lmnqd11kwkBV2LgLoFMZKizbCi5fNZ/ipaZ64= go.opentelemetry.io/auto/sdk v1.2.1/go.mod h1:KRTj+aOaElaLi+wW1kO/DZRXwkF4C5xPbEe3ZiIhN7Y= go.opentelemetry.io/contrib/bridges/otelzap v0.19.0 h1:48Eq3xxFx2KlL/tF7lnl42kKJBDlhNTLRzv0h154JnM= @@ -213,20 +288,26 @@ go.uber.org/zap v1.28.0 h1:IZzaP1Fv73/T/pBMLk4VutPl36uNC+OSUh3JLG3FIjo= go.uber.org/zap v1.28.0/go.mod h1:rDLpOi171uODNm/mxFcuYWxDsqWSAVkFdX4XojSKg/Q= go.yaml.in/yaml/v3 v3.0.4 h1:tfq32ie2Jv2UxXFdLJdh3jXuOzWiL1fo0bu/FbuKpbc= go.yaml.in/yaml/v3 v3.0.4/go.mod h1:DhzuOOF2ATzADvBadXxruRBLzYTpT36CKvDb3+aBEFg= -golang.org/x/crypto v0.45.0 h1:jMBrvKuj23MTlT0bQEOBcAE0mjg8mK9RXFhRH6nyF3Q= -golang.org/x/crypto v0.45.0/go.mod h1:XTGrrkGJve7CYK7J8PEww4aY7gM3qMCElcJQ8n8JdX4= -golang.org/x/net v0.47.0 h1:Mx+4dIFzqraBXUugkia1OOvlD6LemFo1ALMHjrXDOhY= -golang.org/x/net v0.47.0/go.mod h1:/jNxtkgq5yWUGYkaZGqo27cfGZ1c5Nen03aYrrKpVRU= +golang.org/x/crypto v0.51.0 h1:IBPXwPfKxY7cWQZ38ZCIRPI50YLeevDLlLnyC5wRGTI= +golang.org/x/crypto v0.51.0/go.mod h1:8AdwkbraGNABw2kOX6YFPs3WM22XqI4EXEd8g+x7Oc8= +golang.org/x/net v0.53.0 h1:d+qAbo5L0orcWAr0a9JweQpjXF19LMXJE8Ey7hwOdUA= +golang.org/x/net v0.53.0/go.mod h1:JvMuJH7rrdiCfbeHoo3fCQU24Lf5JJwT9W3sJFulfgs= golang.org/x/oauth2 v0.33.0 h1:4Q+qn+E5z8gPRJfmRy7C2gGG3T4jIprK6aSYgTXGRpo= golang.org/x/oauth2 v0.33.0/go.mod h1:lzm5WQJQwKZ3nwavOZ3IS5Aulzxi68dUSgRHujetwEA= 
-golang.org/x/sync v0.18.0 h1:kr88TuHDroi+UVf+0hZnirlk8o8T+4MrK6mr60WkH/I=
-golang.org/x/sync v0.18.0/go.mod h1:9KTHXmSnoGruLpwFjVSX0lNNA75CykiMECbovNTZqGI=
-golang.org/x/sys v0.39.0 h1:CvCKL8MeisomCi6qNZ+wbb0DN9E5AATixKsvNtMoMFk=
-golang.org/x/sys v0.39.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
-golang.org/x/text v0.31.0 h1:aC8ghyu4JhP8VojJ2lEHBnochRno1sgL6nEi9WGFGMM=
-golang.org/x/text v0.31.0/go.mod h1:tKRAlv61yKIjGGHX/4tP1LTbc13YSec1pxVEWXzfoeM=
+golang.org/x/sync v0.20.0 h1:e0PTpb7pjO8GAtTs2dQ6jYa5BWYlMuX047Dco/pItO4=
+golang.org/x/sync v0.20.0/go.mod h1:9xrNwdLfx4jkKbNva9FpL6vEN7evnE43NNNJQ2LF3+0=
+golang.org/x/sys v0.0.0-20190916202348-b4ddaad3f8a3/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20201204225414-ed752295db88/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20210616094352-59db8d763f22/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
+golang.org/x/sys v0.45.0 h1:dO4czNzziLiiXplLQgBCEpCvXQ3dnkn0SdaZSYdQ+FY=
+golang.org/x/sys v0.45.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw=
+golang.org/x/term v0.43.0 h1:S4RLU2sB31O/NCl+zFN9Aru9A/Cq2aqKpTZJ6B+DwT4=
+golang.org/x/term v0.43.0/go.mod h1:lrhlHNdQJHO+1qVYiHfFKVuVioJIheAc3fBSMFYEIsk=
+golang.org/x/text v0.37.0 h1:Cqjiwd9eSg8e0QAkyCaQTNHFIIzWtidPahFWR83rTrc=
+golang.org/x/text v0.37.0/go.mod h1:a5sjxXGs9hsn/AJVwuElvCAo9v8QYLzvavO5z2PiM38=
 golang.org/x/time v0.14.0 h1:MRx4UaLrDotUKUdCIqzPC48t1Y9hANFKIRpNx+Te8PI=
 golang.org/x/time v0.14.0/go.mod h1:eL/Oa2bBBK0TkX57Fyni+NgnyQQN4LitPmob2Hjnqw4=
+golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
 gonum.org/v1/gonum v0.16.0 h1:5+ul4Swaf3ESvrOnidPp4GZbzf0mxVQpDCYUQE7OJfk=
 gonum.org/v1/gonum v0.16.0/go.mod h1:fef3am4MQ93R2HHpKnLk4/Tbh/s0+wqD5nfa6Pnwy4E=
 google.golang.org/api v0.256.0 h1:u6Khm8+F9sxbCTYNoBHg6/Hwv0N/i+V94MvkOSor6oI=
@@ -241,5 +322,12 @@ google.golang.org/grpc v1.77.0 h1:wVVY6/8cGA6vvffn+wWK5ToddbgdU3d8MNENr4evgXM=
 google.golang.org/grpc v1.77.0/go.mod h1:z0BY1iVj0q8E1uSQCjL9cppRj+gnZjzDnzV0dHhrNig=
 google.golang.org/protobuf v1.36.10 h1:AYd7cD/uASjIL6Q9LiTjz8JLcrh/88q5UObnmY3aOOE=
 google.golang.org/protobuf v1.36.10/go.mod h1:HTf+CrKn2C3g5S8VImy6tdcUvCska2kB7j23XfzDpco=
+gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
+gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=
+gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q=
 gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
 gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
+gotest.tools/v3 v3.5.2 h1:7koQfIKdy+I8UTetycgUqXWSDwpgv193Ka+qRsmBY8Q=
+gotest.tools/v3 v3.5.2/go.mod h1:LtdLGcnqToBH83WByAAi/wiwSFCArdFIUV/xxN4pcjA=
+pgregory.net/rapid v1.2.0 h1:keKAYRcjm+e1F0oAuU5F5+YPAWcyxNNRK2wud503Gnk=
+pgregory.net/rapid v1.2.0/go.mod h1:PY5XlDGj0+V1FCq0o192FdRhpKHGTRIWBgqjDBTrq04=
diff --git a/internal/config/env.go b/internal/config/env.go
index a000571..05628af 100644
--- a/internal/config/env.go
+++ b/internal/config/env.go
@@ -66,7 +66,12 @@ type S3Config struct {
 	Region          string
 	AccessKeyID     string
 	SecretAccessKey string
-	EndpointURL     string // optional — set for MinIO / S3-compatible stores
+	EndpointURL     string // optional — internal/server-side endpoint for MinIO / S3-compatible stores
+	// PublicEndpointURL is the client-facing endpoint used to sign presigned
+	// URLs and build public URLs (e.g. http://localhost:9000). When empty,
+	// EndpointURL is used for both. Set this when internal services reach the
+	// store by a private host (e.g. minio:9000) that external clients cannot.
+	PublicEndpointURL string
 }
 
 type StorageConfig struct {
@@ -76,6 +81,21 @@ type StorageConfig struct {
 	S3 S3Config
 }
 
+type OutboxConfig struct {
+	RelayInterval time.Duration
+	RelayBatch    int
+	MaxAttempts   int
+	Retention     time.Duration
+}
+
+type WebhookConfig struct {
+	PollInterval time.Duration
+	BatchSize    int
+	Timeout      time.Duration
+	MaxAttempts  int
+	Retention    time.Duration
+}
+
 type EnvConfig struct {
 	Environment string
 	Server ServerConfig
@@ -83,6 +103,8 @@ type EnvConfig struct {
 	Redis RedisConfig
 	Otel OtelConfig
 	Storage StorageConfig
+	Outbox OutboxConfig
+	Webhook WebhookConfig
 	CORSAllowedOrigins []string
 	LogLevel string
 	EncryptionKey string
@@ -181,6 +203,62 @@ func GetEnvConfig(envFile string) (EnvConfig, error) {
 		corsOrigins = strings.Split(raw, ",")
 	}
 
+	outboxRelayInterval := time.Second
+	if raw := os.Getenv("OUTBOX_RELAY_INTERVAL"); raw != "" {
+		if d, err := time.ParseDuration(raw); err == nil && d > 0 {
+			outboxRelayInterval = d
+		}
+	}
+	outboxRelayBatch := 100
+	if raw := os.Getenv("OUTBOX_RELAY_BATCH"); raw != "" {
+		if n, err := strconv.Atoi(raw); err == nil && n > 0 {
+			outboxRelayBatch = n
+		}
+	}
+	outboxMaxAttempts := 5
+	if raw := os.Getenv("OUTBOX_MAX_ATTEMPTS"); raw != "" {
+		if n, err := strconv.Atoi(raw); err == nil && n > 0 {
+			outboxMaxAttempts = n
+		}
+	}
+	outboxRetention := 168 * time.Hour
+	if raw := os.Getenv("OUTBOX_RETENTION"); raw != "" {
+		if d, err := time.ParseDuration(raw); err == nil && d > 0 {
+			outboxRetention = d
+		}
+	}
+
+	webhookPollInterval := 2 * time.Second
+	if raw := os.Getenv("WEBHOOK_POLL_INTERVAL"); raw != "" {
+		if d, err := time.ParseDuration(raw); err == nil && d > 0 {
+			webhookPollInterval = d
+		}
+	}
+	webhookBatchSize := 50
+	if raw := os.Getenv("WEBHOOK_BATCH_SIZE"); raw != "" {
+		if n, err := strconv.Atoi(raw); err == nil && n > 0 {
+			webhookBatchSize = n
+		}
+	}
+	webhookTimeout := 10 * time.Second
+	if raw := os.Getenv("WEBHOOK_TIMEOUT"); raw != "" {
+		if d, err := time.ParseDuration(raw); err == nil && d > 0 {
+			webhookTimeout = d
+		}
+	}
+	webhookMaxAttempts := 5
+	if raw := os.Getenv("WEBHOOK_MAX_ATTEMPTS"); raw != "" {
+		if n, err := strconv.Atoi(raw); err == nil && n > 0 {
+			webhookMaxAttempts = n
+		}
+	}
+	webhookRetention := 168 * time.Hour
+	if raw := os.Getenv("WEBHOOK_RETENTION"); raw != "" {
+		if d, err := time.ParseDuration(raw); err == nil && d > 0 {
+			webhookRetention = d
+		}
+	}
+
 	return EnvConfig{
 		Environment: env,
 		Server: ServerConfig{
@@ -213,11 +291,12 @@ func GetEnvConfig(envFile string) (EnvConfig, error) {
 				SAPath: os.Getenv("GCS_SA_PATH"),
 			},
 			S3: S3Config{
-				Bucket:          envOr("S3_BUCKET_NAME", envOr("BUCKET_NAME", "mpiper")),
-				Region:          os.Getenv("S3_REGION"),
-				AccessKeyID:     os.Getenv("S3_ACCESS_KEY_ID"),
-				SecretAccessKey: os.Getenv("S3_SECRET_ACCESS_KEY"),
-				EndpointURL:     os.Getenv("S3_ENDPOINT_URL"),
+				Bucket:            envOr("S3_BUCKET_NAME", envOr("BUCKET_NAME", "mpiper")),
+				Region:            os.Getenv("S3_REGION"),
+				AccessKeyID:       os.Getenv("S3_ACCESS_KEY_ID"),
+				SecretAccessKey:   os.Getenv("S3_SECRET_ACCESS_KEY"),
+				EndpointURL:       os.Getenv("S3_ENDPOINT_URL"),
+				PublicEndpointURL: os.Getenv("S3_PUBLIC_ENDPOINT_URL"),
 			},
 		},
 		CORSAllowedOrigins: corsOrigins,
@@ -225,6 +304,19 @@ func GetEnvConfig(envFile string) (EnvConfig, error) {
 		EncryptionKey: encryptionKey,
 		AutoMigrate: strings.ToLower(os.Getenv("AUTO_MIGRATE")) == "true",
 		MaxAssetSizeBytes: maxAssetSize,
+		Outbox: OutboxConfig{
+			RelayInterval: outboxRelayInterval,
+			RelayBatch:    outboxRelayBatch,
+			MaxAttempts:   outboxMaxAttempts,
+			Retention:     outboxRetention,
+		},
+		Webhook: WebhookConfig{
+			PollInterval: webhookPollInterval,
+			BatchSize:    webhookBatchSize,
+			Timeout:      webhookTimeout,
+			MaxAttempts:  webhookMaxAttempts,
+			Retention:    webhookRetention,
+		},
 	}, nil
 }
 
diff --git a/internal/database/migrations/000003_event_outbox.down.sql b/internal/database/migrations/000003_event_outbox.down.sql
new file mode 100644
index 0000000..a368c24
--- /dev/null
+++ b/internal/database/migrations/000003_event_outbox.down.sql
@@ -0,0 +1 @@
+DROP TABLE IF EXISTS event_outbox;
diff --git a/internal/database/migrations/000003_event_outbox.up.sql b/internal/database/migrations/000003_event_outbox.up.sql
new file mode 100644
index 0000000..9cb00da
--- /dev/null
+++ b/internal/database/migrations/000003_event_outbox.up.sql
@@ -0,0 +1,16 @@
+CREATE TABLE IF NOT EXISTS event_outbox (
+    id BIGSERIAL PRIMARY KEY,
+    aggregate_id UUID NOT NULL,
+    job_id BIGINT,
+    event TEXT NOT NULL,
+    payload JSONB NOT NULL,
+    traceparent TEXT,
+    status TEXT NOT NULL DEFAULT 'pending',
+    attempts INT NOT NULL DEFAULT 0,
+    max_attempts INT NOT NULL DEFAULT 5,
+    last_error TEXT,
+    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
+    published_at TIMESTAMPTZ
+);
+
+CREATE INDEX idx_event_outbox_pending ON event_outbox (id) WHERE status = 'pending';
diff --git a/internal/database/migrations/000004_assets_owner_id.down.sql b/internal/database/migrations/000004_assets_owner_id.down.sql
new file mode 100644
index 0000000..517c8b8
--- /dev/null
+++ b/internal/database/migrations/000004_assets_owner_id.down.sql
@@ -0,0 +1 @@
+ALTER TABLE assets DROP COLUMN owner_id;
diff --git a/internal/database/migrations/000004_assets_owner_id.up.sql b/internal/database/migrations/000004_assets_owner_id.up.sql
new file mode 100644
index 0000000..aa07acf
--- /dev/null
+++ b/internal/database/migrations/000004_assets_owner_id.up.sql
@@ -0,0 +1 @@
+ALTER TABLE assets ADD COLUMN owner_id TEXT;
diff --git a/internal/database/migrations/000005_webhook_user_events.down.sql b/internal/database/migrations/000005_webhook_user_events.down.sql
new file mode 100644
index 0000000..99c18ba
--- /dev/null
+++ b/internal/database/migrations/000005_webhook_user_events.down.sql
@@ -0,0 +1,3 @@
+DROP INDEX IF EXISTS idx_webhook_registrations_user;
+ALTER TABLE webhook_registrations DROP COLUMN events;
+ALTER TABLE webhook_registrations DROP COLUMN user_id;
diff --git a/internal/database/migrations/000005_webhook_user_events.up.sql b/internal/database/migrations/000005_webhook_user_events.up.sql
new file mode 100644
index 0000000..23813d6
--- /dev/null
+++ b/internal/database/migrations/000005_webhook_user_events.up.sql
@@ -0,0 +1,3 @@
+ALTER TABLE webhook_registrations ADD COLUMN user_id TEXT NOT NULL DEFAULT '';
+ALTER TABLE webhook_registrations ADD COLUMN events JSONB NOT NULL DEFAULT '[]'::jsonb;
+CREATE INDEX idx_webhook_registrations_user ON webhook_registrations (user_id);
diff --git a/internal/handler/webhook_handler.go b/internal/handler/webhook_handler.go
new file mode 100644
index 0000000..6d503af
--- /dev/null
+++ b/internal/handler/webhook_handler.go
@@ -0,0 +1,73 @@
+package handler
+
+import (
+	"net/http"
+
+	"github.com/go-chi/chi/v5"
+	"github.com/google/uuid"
+	"github.com/rndmcodeguy20/mpiper/internal/service"
+	"github.com/rndmcodeguy20/mpiper/pkg/utils"
+	"go.uber.org/zap"
+)
+
+type WebhookHandler struct {
+	svc    service.WebhookService
+	logger *zap.Logger
+}
+
+func NewWebhookHandler(svc service.WebhookService, logger *zap.Logger) *WebhookHandler {
+	return &WebhookHandler{svc: svc, logger: logger}
+}
+
+type createWebhookRequest struct {
+	URL    string   `json:"url"`
+	Secret string   `json:"secret"`
+	Events []string `json:"events"`
+}
+
+func (h *WebhookHandler) Create(w http.ResponseWriter, r *http.Request) {
+	var req createWebhookRequest
+	if err := utils.ParseJSON(r.Body, &req); err != nil {
+		utils.RespondJSON(w, map[string]string{"status": "error", "message": "invalid request"}, http.StatusBadRequest)
+		return
+	}
+
+	reg, err := h.svc.Create(r.Context(), req.URL, req.Secret, req.Events)
+	if err != nil {
+		h.logger.Warn("webhook create failed", zap.Error(err))
+		utils.RespondJSON(w, map[string]string{"status": "error", "message": err.Error()}, http.StatusBadRequest)
+		return
+	}
+
+	utils.RespondJSON(w, map[string]interface{}{"status": "success", "data": reg}, http.StatusCreated)
+}
+
+func (h *WebhookHandler) List(w http.ResponseWriter, r *http.Request) {
+	regs, err := h.svc.List(r.Context())
+	if err != nil {
+		h.logger.Error("webhook list failed", zap.Error(err))
+		utils.RespondJSON(w, map[string]string{"status": "error", "message": err.Error()}, http.StatusInternalServerError)
+		return
+	}
+	utils.RespondJSON(w, map[string]interface{}{"status": "success", "data": regs}, http.StatusOK)
+}
+
+func (h *WebhookHandler) Delete(w http.ResponseWriter, r *http.Request) {
+	idStr := chi.URLParam(r, "id")
+	id, err := uuid.Parse(idStr)
+	if err != nil {
+		utils.RespondJSON(w, map[string]string{"status": "error", "message": "invalid id"}, http.StatusBadRequest)
+		return
+	}
+
+	if err := h.svc.Delete(r.Context(), id); err != nil {
+		status := http.StatusInternalServerError
+		if err.Error() == "not found" {
+			status = http.StatusNotFound
+		}
+		utils.RespondJSON(w, map[string]string{"status": "error", "message": err.Error()}, status)
+		return
+	}
+
+	utils.RespondJSON(w, map[string]string{"status": "success", "message": "deleted"}, http.StatusOK)
+}
diff --git a/internal/metrics/metrics.go b/internal/metrics/metrics.go
index 4dbe838..054436e 100644
--- a/internal/metrics/metrics.go
+++ b/internal/metrics/metrics.go
@@ -54,6 +54,16 @@ type Metrics struct {
 	QueueDepth metric.Int64ObservableGauge
 	QueueProcessingLag metric.Float64Histogram
 
+	OutboxPublishedTotal  metric.Int64Counter
+	OutboxPublishFailures metric.Int64Counter
+	OutboxRelayLagSeconds metric.Float64Histogram
+	OutboxPendingGauge    metric.Int64ObservableGauge
+
+	WebhookDeliveryTotal    metric.Int64Counter
+	WebhookDeliveryDuration metric.Float64Histogram
+	WebhookDeliveryFailures metric.Int64Counter
+	WebhookPendingGauge     metric.Int64ObservableGauge
+
 	SystemMemoryUsage metric.Int64ObservableGauge
 	SystemGoroutineCount metric.Int64ObservableGauge
 	SystemGCPauseDuration metric.Float64Histogram
@@ -73,6 +83,19 @@ func (m *Metrics) RegisterQueueDepthFunc(fn func(context.Context) (int64, error)
 	return err
 }
 
+// RegisterOutboxPendingFunc wires a callback to the OutboxPendingGauge.
+func (m *Metrics) RegisterOutboxPendingFunc(fn func(context.Context) (int64, error)) error {
+	_, err := m.meter.RegisterCallback(func(ctx context.Context, o metric.Observer) error {
+		n, err := fn(ctx)
+		if err != nil {
+			return err
+		}
+		o.ObserveInt64(m.OutboxPendingGauge, n)
+		return nil
+	}, m.OutboxPendingGauge)
+	return err
+}
+
 func InitMetrics(ctx context.Context, logger *zap.Logger) (*Metrics, func(context.Context) error) {
 	otelCfg := config.MustGet().Otel
 	endpoint := stripURLScheme(otelCfg.Endpoint)
@@ -145,6 +168,8 @@ func InitMetrics(ctx context.Context, logger *zap.Logger) (*Metrics, func(contex
 	initStorageMetrics(m, meter, logger)
 	initDatabaseMetrics(m, meter, logger)
 	initQueueMetrics(m, meter, logger)
+	initOutboxMetrics(m, meter, logger)
+	initWebhookMetrics(m, meter, logger)
 	initSystemMetrics(m, meter, logger)
 
 	logger.Sugar().Info("OpenTelemetry metrics initialized successfully")
@@ -307,6 +332,30 @@ func initQueueMetrics(m *Metrics, meter metric.Meter, logger *zap.Logger) {
 	}
 }
 
+func initOutboxMetrics(m *Metrics, meter metric.Meter, logger *zap.Logger) {
+	var err error
+	m.OutboxPublishedTotal, err = meter.Int64Counter("outbox.published.total",
+		metric.WithDescription("Total outbox events published to stream"), metric.WithUnit("{event}"))
+	if err != nil {
+		logger.Sugar().Fatalf("Failed to create outbox published counter: %v", err)
+	}
+	m.OutboxPublishFailures, err = meter.Int64Counter("outbox.publish.failures",
+		metric.WithDescription("Total outbox publish failures"), metric.WithUnit("{event}"))
+	if err != nil {
+		logger.Sugar().Fatalf("Failed to create outbox publish failures counter: %v", err)
+	}
+	m.OutboxRelayLagSeconds, err = meter.Float64Histogram("outbox.relay.lag",
+		metric.WithDescription("Age of oldest pending outbox row in seconds"), metric.WithUnit("s"))
+	if err != nil {
+		logger.Sugar().Fatalf("Failed to create outbox relay lag histogram: %v", err)
+	}
+	m.OutboxPendingGauge, err = meter.Int64ObservableGauge("outbox.pending",
+		metric.WithDescription("Number of pending outbox events"), metric.WithUnit("{event}"))
+	if err != nil {
+		logger.Sugar().Fatalf("Failed to create outbox pending gauge: %v", err)
+	}
+}
+
 func initSystemMetrics(m *Metrics, meter metric.Meter, logger *zap.Logger) {
 	var err error
 	var memStats runtime.MemStats
@@ -340,3 +389,40 @@ func initSystemMetrics(m *Metrics, meter metric.Meter, logger *zap.Logger) {
 		logger.Sugar().Fatalf("Failed to create GC pause duration: %v", err)
 	}
 }
+
+func initWebhookMetrics(m *Metrics, meter metric.Meter, logger *zap.Logger) {
+	var err error
+	m.WebhookDeliveryTotal, err = meter.Int64Counter("webhook.delivery.total",
+		metric.WithDescription("Total webhook deliveries attempted"), metric.WithUnit("{delivery}"))
+	if err != nil {
+		logger.Sugar().Fatalf("Failed to create webhook delivery counter: %v", err)
+	}
+	m.WebhookDeliveryDuration, err = meter.Float64Histogram("webhook.delivery.duration",
+		metric.WithDescription("Duration of webhook delivery HTTP calls"), metric.WithUnit("s"))
+	if err != nil {
+		logger.Sugar().Fatalf("Failed to create webhook delivery duration: %v", err)
+	}
+	m.WebhookDeliveryFailures, err = meter.Int64Counter("webhook.delivery.failures",
+		metric.WithDescription("Total webhook delivery failures"), metric.WithUnit("{delivery}"))
+	if err != nil {
+		logger.Sugar().Fatalf("Failed to create webhook delivery failures counter: %v", err)
+	}
+	m.WebhookPendingGauge, err = meter.Int64ObservableGauge("webhook.pending",
+		metric.WithDescription("Number of pending webhook deliveries"), metric.WithUnit("{delivery}"))
+	if err != nil {
+		logger.Sugar().Fatalf("Failed to create webhook pending gauge: %v", err)
+	}
+}
+
+// RegisterWebhookPendingFunc wires a callback to the WebhookPendingGauge.
+func (m *Metrics) RegisterWebhookPendingFunc(fn func(context.Context) (int64, error)) error {
+	_, err := m.meter.RegisterCallback(func(ctx context.Context, o metric.Observer) error {
+		n, err := fn(ctx)
+		if err != nil {
+			return err
+		}
+		o.ObserveInt64(m.WebhookPendingGauge, n)
+		return nil
+	}, m.WebhookPendingGauge)
+	return err
+}
diff --git a/internal/middleware/authorization.go b/internal/middleware/authorization.go
index 4b707f7..fc16b09 100644
--- a/internal/middleware/authorization.go
+++ b/internal/middleware/authorization.go
@@ -58,3 +58,6 @@ func GetUserID(ctx context.Context) (string, bool) {
 	userID, ok := ctx.Value(userIDKey).(string)
 	return userID, ok
 }
+
+// UserIDKey returns the context key used for storing user_id. Exported for testing.
+func UserIDKey() contextKey { return userIDKey }
diff --git a/internal/models/outbox.go b/internal/models/outbox.go
new file mode 100644
index 0000000..297883c
--- /dev/null
+++ b/internal/models/outbox.go
@@ -0,0 +1,22 @@
+package models
+
+import (
+	"encoding/json"
+	"time"
+
+	"github.com/google/uuid"
+)
+
+type OutboxEvent struct {
+	ID          int64           `db:"id"`
+	AggregateID uuid.UUID       `db:"aggregate_id"`
+	JobID       *int64          `db:"job_id"`
+	Event       string          `db:"event"`
+	Payload     json.RawMessage `db:"payload"`
+	Status      string          `db:"status"`
+	Attempts    int             `db:"attempts"`
+	MaxAttempts int             `db:"max_attempts"`
+	LastError   *string         `db:"last_error"`
+	CreatedAt   time.Time       `db:"created_at"`
+	PublishedAt *time.Time      `db:"published_at"`
+}
diff --git a/internal/outbox/relay.go b/internal/outbox/relay.go
new file mode 100644
index 0000000..ae485b5
--- /dev/null
+++ b/internal/outbox/relay.go
@@ -0,0 +1,125 @@
+package outbox
+
+import (
+	"context"
+	"encoding/json"
+	"time"
+
+	"github.com/rndmcodeguy20/mpiper/internal/metrics"
+	"github.com/rndmcodeguy20/mpiper/internal/queue"
+	"github.com/rndmcodeguy20/mpiper/internal/repository"
+	"go.uber.org/zap"
+)
+
+// Relay polls the event_outbox table for pending rows and publishes them to Redis.
+type Relay struct {
+	repo     repository.OutboxRepository
+	queue    queue.Queue
+	logger   *zap.Logger
+	m        *metrics.Metrics
+	interval time.Duration
+	batch    int
+}
+
+func NewRelay(repo repository.OutboxRepository, q queue.Queue, logger *zap.Logger, m *metrics.Metrics, interval time.Duration, batch int) *Relay {
+	return &Relay{repo: repo, queue: q, logger: logger, m: m, interval: interval, batch: batch}
+}
+
+// Start runs the relay loop until ctx is cancelled. It finishes the in-flight batch before returning.
+func (r *Relay) Start(ctx context.Context) {
+	r.logger.Info("outbox relay started", zap.Duration("interval", r.interval), zap.Int("batch", r.batch))
+	ticker := time.NewTicker(r.interval)
+	defer ticker.Stop()
+
+	for {
+		select {
+		case <-ctx.Done():
+			r.logger.Info("outbox relay stopped")
+			return
+		case <-ticker.C:
+			r.tick(ctx)
+		}
+	}
+}
+
+func (r *Relay) tick(ctx context.Context) {
+	rows, err := r.repo.FetchPendingBatch(ctx, r.batch)
+	if err != nil {
+		r.logger.Error("outbox relay: fetch pending batch failed", zap.Error(err))
+		return
+	}
+	if len(rows) == 0 {
+		return
+	}
+
+	// Record relay lag from the oldest pending row.
+	if r.m != nil {
+		lag := time.Since(rows[0].CreatedAt).Seconds()
+		r.m.OutboxRelayLagSeconds.Record(ctx, lag)
+	}
+
+	var publishedIDs []int64
+
+	for _, row := range rows {
+		var payload map[string]interface{}
+		if err := json.Unmarshal(row.Payload, &payload); err != nil {
+			r.logger.Error("outbox relay: unmarshal payload failed", zap.Int64("id", row.ID), zap.Error(err))
+			_ = r.repo.MarkFailed(ctx, row.ID, err.Error())
+			if r.m != nil {
+				r.m.OutboxPublishFailures.Add(ctx, 1)
+			}
+			continue
+		}
+
+		if _, err := r.queue.Enqueue(ctx, payload); err != nil {
+			r.logger.Warn("outbox relay: enqueue failed", zap.Int64("id", row.ID), zap.Error(err))
+			_ = r.repo.IncrementAttempts(ctx, row.ID, err.Error())
+			if row.Attempts+1 >= row.MaxAttempts {
+				_ = r.repo.MarkFailed(ctx, row.ID, err.Error())
+			}
+			if r.m != nil {
+				r.m.OutboxPublishFailures.Add(ctx, 1)
+			}
+			continue
+		}
+
+		publishedIDs = append(publishedIDs, row.ID)
+	}
+
+	if len(publishedIDs) > 0 {
+		if err := r.repo.MarkPublished(ctx, publishedIDs); err != nil {
+			r.logger.Error("outbox relay: mark published failed", zap.Error(err))
+		}
+		if r.m != nil {
+			r.m.OutboxPublishedTotal.Add(ctx, int64(len(publishedIDs)))
+		}
+	}
+}
+
+// StartCleanup periodically deletes published outbox rows older than retention.
+func (r *Relay) StartCleanup(ctx context.Context, retention time.Duration) {
+	interval := retention / 24
+	if interval < time.Minute {
+		interval = time.Minute
+	}
+	r.logger.Info("outbox cleanup started", zap.Duration("retention", retention), zap.Duration("interval", interval))
+	ticker := time.NewTicker(interval)
+	defer ticker.Stop()
+
+	for {
+		select {
+		case <-ctx.Done():
+			r.logger.Info("outbox cleanup stopped")
+			return
+		case <-ticker.C:
+			deleted, err := r.repo.DeletePublishedBefore(ctx, time.Now().Add(-retention))
+			if err != nil {
+				r.logger.Error("outbox cleanup: delete failed", zap.Error(err))
+				continue
+			}
+			if deleted > 0 {
+				r.logger.Info("outbox cleanup: deleted old rows", zap.Int64("count", deleted))
+			}
+		}
+	}
+}
diff --git a/internal/outbox/relay_integration_test.go b/internal/outbox/relay_integration_test.go
new file mode 100644
index 0000000..cade7a9
--- /dev/null
+++ b/internal/outbox/relay_integration_test.go
@@ -0,0 +1,326 @@
+//go:build integration
+
+package outbox_test
+
+import (
+	"context"
+	"database/sql"
+	"encoding/json"
+	"fmt"
+	"testing"
+	"time"
+
+	"github.com/google/uuid"
+	"github.com/jmoiron/sqlx"
+	_ "github.com/lib/pq"
+	"github.com/redis/go-redis/v9"
+	"github.com/rndmcodeguy20/mpiper/internal/models"
+	"github.com/rndmcodeguy20/mpiper/internal/outbox"
+	"github.com/rndmcodeguy20/mpiper/internal/repository"
+	"github.com/testcontainers/testcontainers-go"
+	tcpostgres "github.com/testcontainers/testcontainers-go/modules/postgres"
+	tcredis "github.com/testcontainers/testcontainers-go/modules/redis"
+	"github.com/testcontainers/testcontainers-go/wait"
+	"go.uber.org/zap"
+)
+
+// --- helpers ---
+
+func setupPostgres(t *testing.T, ctx context.Context) *sqlx.DB {
+	t.Helper()
+	pg, err := tcpostgres.Run(ctx, "postgres:16-alpine",
+		tcpostgres.WithDatabase("testdb"),
+		tcpostgres.WithUsername("test"),
+		tcpostgres.WithPassword("test"),
+		testcontainers.WithWaitStrategy(wait.ForListeningPort("5432/tcp").WithStartupTimeout(30*time.Second)),
+	)
+	if err != nil {
+		t.Fatalf("start postgres container: %v", err)
+	}
+	t.Cleanup(func() { _ = pg.Terminate(ctx) })
+
+	dsn, err := pg.ConnectionString(ctx, "sslmode=disable")
+	if err != nil {
+		t.Fatalf("get connection string: %v", err)
+	}
+
+	db, err := sqlx.Connect("postgres", dsn)
+	if err != nil {
+		t.Fatalf("connect to postgres: %v", err)
+	}
+	t.Cleanup(func() { _ = db.Close() })
+
+	// Apply schema for event_outbox (and assets/jobs for the full flow test).
+	for _, ddl := range []string{
+		`CREATE EXTENSION IF NOT EXISTS "uuid-ossp"`,
+		`CREATE TABLE IF NOT EXISTS assets (
+			asset_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
+			original_url TEXT NOT NULL, type TEXT NOT NULL, status TEXT NOT NULL,
+			mime_type TEXT NOT NULL, size_bytes BIGINT NOT NULL,
+			created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW())`,
+		`CREATE TABLE IF NOT EXISTS jobs (
+			job_id BIGSERIAL PRIMARY KEY, asset_id UUID NOT NULL REFERENCES assets(asset_id),
+			type TEXT NOT NULL, status TEXT NOT NULL, attempts INT NOT NULL DEFAULT 0,
+			created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW())`,
+		`CREATE UNIQUE INDEX IF NOT EXISTS jobs_asset_type_unique ON jobs (asset_id, type)`,
+		`CREATE TABLE IF NOT EXISTS event_outbox (
+			id BIGSERIAL PRIMARY KEY, aggregate_id UUID NOT NULL, job_id BIGINT,
+			event TEXT NOT NULL, payload JSONB NOT NULL, traceparent TEXT,
+			status TEXT NOT NULL DEFAULT 'pending', attempts INT NOT NULL DEFAULT 0,
+			max_attempts INT NOT NULL DEFAULT 5, last_error TEXT,
+			created_at TIMESTAMPTZ NOT NULL DEFAULT now(), published_at TIMESTAMPTZ)`,
+		`CREATE INDEX idx_event_outbox_pending ON event_outbox (id) WHERE status = 'pending'`,
+	} {
+		if _, err := db.ExecContext(ctx, ddl); err != nil {
+			t.Fatalf("apply DDL: %v", err)
+		}
+	}
+	return db
+}
+
+func setupRedis(t *testing.T, ctx context.Context) *redis.Client {
+	t.Helper()
+	rc, err := tcredis.Run(ctx, "redis:7-alpine",
+		testcontainers.WithWaitStrategy(wait.ForListeningPort("6379/tcp").WithStartupTimeout(30*time.Second)),
+	)
+	if err != nil {
+		t.Fatalf("start redis container: %v", err)
+	}
+	t.Cleanup(func() { _ = rc.Terminate(ctx) })
+
+	ep, err := rc.Endpoint(ctx, "")
+	if err != nil {
+		t.Fatalf("get redis endpoint: %v", err)
+	}
+
+	rdb := redis.NewClient(&redis.Options{Addr: ep})
+	t.Cleanup(func() { _ = rdb.Close() })
+	return rdb
+}
+
+// testQueue implements queue.Queue for integration tests.
+type testQueue struct {
+	rdb    *redis.Client
+	stream string
+}
+
+func (q *testQueue) Enqueue(ctx context.Context, payload map[string]interface{}) (string, error) {
+	body, _ := json.Marshal(payload)
+	return q.rdb.XAdd(ctx, &redis.XAddArgs{
+		Stream: q.stream,
+		Values: map[string]interface{}{"body": string(body)},
+		ID:     "*",
+	}).Result()
+}
+
+// failQueue always returns an error on Enqueue.
+type failQueue struct{}
+
+func (q *failQueue) Enqueue(_ context.Context, _ map[string]interface{}) (string, error) {
+	return "", fmt.Errorf("simulated redis failure")
+}
+
+// --- tests ---
+
+func TestOutboxRelay_HappyPath(t *testing.T) {
+	ctx := context.Background()
+	db := setupPostgres(t, ctx)
+	rdb := setupRedis(t, ctx)
+	logger := zap.NewNop()
+	stream := "media:jobs"
+
+	repo := repository.NewOutboxRepository(db, logger)
+	q := &testQueue{rdb: rdb, stream: stream}
+
+	// Insert an asset and simulate the MarkAssetUploaded transaction.
+	assetID := uuid.New()
+	_, err := db.ExecContext(ctx,
+		`INSERT INTO assets (asset_id, original_url, type, mime_type, status, size_bytes) VALUES ($1,$2,$3,$4,$5,$6)`,
+		assetID, "http://example.com/raw", "image", "image/jpeg", "uploaded", 1024)
+	if err != nil {
+		t.Fatalf("insert asset: %v", err)
+	}
+
+	var jobID int64
+	err = db.QueryRowContext(ctx,
+		`INSERT INTO jobs (asset_id, type, status) VALUES ($1, 'process_asset', 'processing') RETURNING job_id`, assetID).Scan(&jobID)
+	if err != nil {
+		t.Fatalf("insert job: %v", err)
+	}
+
+	// Insert outbox row (what the producer does inside the transaction).
+	payload, _ := json.Marshal(map[string]interface{}{
+		"job_id":   jobID,
+		"asset_id": assetID.String(),
+		"event":    "asset_uploaded",
+	})
+	tx, _ := db.BeginTx(ctx, nil)
+	err = repo.InsertTx(ctx, tx, models.OutboxEvent{
+		AggregateID: assetID,
+		JobID:       &jobID,
+		Event:       "asset_uploaded",
+		Payload:     payload,
+	})
+	if err != nil {
+		t.Fatalf("insert outbox event: %v", err)
+	}
+	_ = tx.Commit()
+
+	// Assert: outbox row pending, no Redis message yet.
+	var status string
+	_ = db.GetContext(ctx, &status, `SELECT status FROM event_outbox WHERE aggregate_id = $1`, assetID)
+	if status != "pending" {
+		t.Fatalf("expected pending, got %s", status)
+	}
+	msgs, _ := rdb.XRange(ctx, stream, "-", "+").Result()
+	if len(msgs) != 0 {
+		t.Fatalf("expected 0 messages, got %d", len(msgs))
+	}
+
+	// Start the relay with a short interval.
+	relayCtx, cancel := context.WithCancel(ctx)
+	defer cancel()
+	relay := outbox.NewRelay(repo, q, logger, nil, 50*time.Millisecond, 100)
+	go relay.Start(relayCtx)
+
+	// Wait for message to appear on the stream.
+	deadline := time.After(2 * time.Second)
+	for {
+		msgs, _ = rdb.XRange(ctx, stream, "-", "+").Result()
+		if len(msgs) > 0 {
+			break
+		}
+		select {
+		case <-deadline:
+			t.Fatal("timeout waiting for message on Redis stream")
+		default:
+			time.Sleep(20 * time.Millisecond)
+		}
+	}
+
+	// Verify message content.
+	var body map[string]interface{}
+	_ = json.Unmarshal([]byte(msgs[0].Values["body"].(string)), &body)
+	if body["asset_id"] != assetID.String() {
+		t.Fatalf("expected asset_id %s, got %v", assetID, body["asset_id"])
+	}
+
+	// Assert outbox row is now published.
+	_ = db.GetContext(ctx, &status, `SELECT status FROM event_outbox WHERE aggregate_id = $1`, assetID)
+	if status != "published" {
+		t.Fatalf("expected published, got %s", status)
+	}
+}
+
+func TestOutboxRelay_FailureMarksRowFailed(t *testing.T) {
+	ctx := context.Background()
+	db := setupPostgres(t, ctx)
+	logger := zap.NewNop()
+
+	repo := repository.NewOutboxRepository(db, logger)
+
+	// Insert an outbox row with max_attempts=1 so first failure → failed.
+	assetID := uuid.New()
+	payload, _ := json.Marshal(map[string]interface{}{"event": "test"})
+	tx, _ := db.BeginTx(ctx, nil)
+	_, err := tx.ExecContext(ctx,
+		`INSERT INTO event_outbox (aggregate_id, event, payload, max_attempts) VALUES ($1, $2, $3, 1)`,
+		assetID, "asset_uploaded", payload)
+	if err != nil {
+		t.Fatalf("insert: %v", err)
+	}
+	_ = tx.Commit()
+
+	// Run one relay tick with a failing queue.
+	relay := outbox.NewRelay(repo, &failQueue{}, logger, nil, 50*time.Millisecond, 100)
+	tickCtx, tickCancel := context.WithCancel(ctx)
+	go relay.Start(tickCtx)
+	time.Sleep(200 * time.Millisecond)
+	tickCancel()
+
+	// Verify the row is marked failed.
+	var row struct {
+		Status    string  `db:"status"`
+		LastError *string `db:"last_error"`
+	}
+	err = db.GetContext(ctx, &row, `SELECT status, last_error FROM event_outbox WHERE aggregate_id = $1`, assetID)
+	if err != nil {
+		t.Fatalf("query: %v", err)
+	}
+	if row.Status != "failed" {
+		t.Fatalf("expected failed, got %s", row.Status)
+	}
+	if row.LastError == nil || *row.LastError == "" {
+		t.Fatal("expected last_error to be set")
+	}
+}
+
+func TestOutboxRelay_Cleanup(t *testing.T) {
+	ctx := context.Background()
+	db := setupPostgres(t, ctx)
+	logger := zap.NewNop()
+
+	repo := repository.NewOutboxRepository(db, logger)
+
+	// Insert a published row with old published_at.
+	assetID := uuid.New()
+	payload, _ := json.Marshal(map[string]interface{}{"event": "test"})
+	oldTime := time.Now().Add(-200 * time.Hour)
+	_, err := db.ExecContext(ctx,
+		`INSERT INTO event_outbox (aggregate_id, event, payload, status, published_at) VALUES ($1, $2, $3, 'published', $4)`,
+		assetID, "asset_uploaded", payload, oldTime)
+	if err != nil {
+		t.Fatalf("insert: %v", err)
+	}
+
+	// Verify row exists.
+	var count int
+	_ = db.GetContext(ctx, &count, `SELECT COUNT(*) FROM event_outbox WHERE aggregate_id = $1`, assetID)
+	if count != 1 {
+		t.Fatalf("expected 1 row, got %d", count)
+	}
+
+	// Run cleanup with 168h retention.
+	deleted, err := repo.DeletePublishedBefore(ctx, time.Now().Add(-168*time.Hour))
+	if err != nil {
+		t.Fatalf("cleanup: %v", err)
+	}
+	if deleted != 1 {
+		t.Fatalf("expected 1 deleted, got %d", deleted)
+	}
+
+	// Verify row is gone.
+	_ = db.GetContext(ctx, &count, `SELECT COUNT(*) FROM event_outbox WHERE aggregate_id = $1`, assetID)
+	if count != 0 {
+		t.Fatalf("expected 0 rows, got %d", count)
+	}
+}
+
+// Ensure InsertTx uses the provided transaction (rolls back correctly).
+func TestOutboxRepo_InsertTx_RollbackDoesNotPersist(t *testing.T) { + ctx := context.Background() + db := setupPostgres(t, ctx) + logger := zap.NewNop() + + repo := repository.NewOutboxRepository(db, logger) + + assetID := uuid.New() + payload, _ := json.Marshal(map[string]interface{}{"event": "test"}) + tx, _ := db.BeginTx(ctx, nil) + _ = repo.InsertTx(ctx, tx, models.OutboxEvent{ + AggregateID: assetID, + Event: "asset_uploaded", + Payload: payload, + }) + _ = tx.Rollback() + + // Should not exist after rollback. + var count int + _ = db.GetContext(ctx, &count, `SELECT COUNT(*) FROM event_outbox WHERE aggregate_id = $1`, assetID) + if count != 0 { + t.Fatalf("expected 0 rows after rollback, got %d", count) + } +} + +// Suppress unused import warning for database/sql. +var _ = (*sql.DB)(nil) diff --git a/internal/repository/asset_repo.go b/internal/repository/asset_repo.go index 09a1234..ed280dc 100644 --- a/internal/repository/asset_repo.go +++ b/internal/repository/asset_repo.go @@ -91,8 +91,8 @@ func ToAssetTypeFromMimeType(mimeType string) AssetType { } type AssetRepository interface { - CreateAsset(ictx context.Context, d uuid.UUID, url string, size int64, fileType AssetType, mimeType string) error - CreateAssetTx(ctx context.Context, tx *sql.Tx, id uuid.UUID, url string, size int64, fileType AssetType, mimeType string) error + CreateAsset(ctx context.Context, id uuid.UUID, url string, size int64, fileType AssetType, mimeType string, ownerID string) error + CreateAssetTx(ctx context.Context, tx *sql.Tx, id uuid.UUID, url string, size int64, fileType AssetType, mimeType string, ownerID string) error MarkAssetUploadedTx(ctx context.Context, tx *sql.Tx, id uuid.UUID) (bool, error) InsertProcessAssetJobTx(ctx context.Context, tx *sql.Tx, assetID uuid.UUID) (*int64, error) GetDB() *sqlx.DB @@ -112,9 +112,9 @@ func (r *assetRepo) GetDB() *sqlx.DB { return r.db } -func (r *assetRepo) CreateAsset(ctx context.Context, id uuid.UUID, url string, size int64, fileType 
AssetType, mimeType string) error { +func (r *assetRepo) CreateAsset(ctx context.Context, id uuid.UUID, url string, size int64, fileType AssetType, mimeType string, ownerID string) error { start := time.Now() - query := `INSERT INTO assets (asset_id, original_url, type, mime_type, status, size_bytes) VALUES ($1, $2, $3, $4, $5, $6);` + query := `INSERT INTO assets (asset_id, original_url, type, mime_type, status, size_bytes, owner_id) VALUES ($1, $2, $3, $4, $5, $6, $7);` _, err := r.db.ExecContext( ctx, query, @@ -124,6 +124,7 @@ func (r *assetRepo) CreateAsset(ctx context.Context, id uuid.UUID, url string, s mimeType, StatusUploading, size, + ownerID, ) // Record database metrics @@ -151,9 +152,9 @@ func (r *assetRepo) CreateAsset(ctx context.Context, id uuid.UUID, url string, s return nil } -func (r *assetRepo) CreateAssetTx(ctx context.Context, tx *sql.Tx, id uuid.UUID, url string, size int64, fileType AssetType, mimeType string) error { +func (r *assetRepo) CreateAssetTx(ctx context.Context, tx *sql.Tx, id uuid.UUID, url string, size int64, fileType AssetType, mimeType string, ownerID string) error { start := time.Now() - query := `INSERT INTO assets (asset_id, original_url, type, mime_type, status, size_bytes) VALUES ($1, $2, $3, $4, $5, $6);` + query := `INSERT INTO assets (asset_id, original_url, type, mime_type, status, size_bytes, owner_id) VALUES ($1, $2, $3, $4, $5, $6, $7);` _, err := tx.ExecContext( ctx, query, @@ -163,6 +164,7 @@ func (r *assetRepo) CreateAssetTx(ctx context.Context, tx *sql.Tx, id uuid.UUID, mimeType, StatusUploading, size, + ownerID, ) // Record database metrics diff --git a/internal/repository/outbox_repo.go b/internal/repository/outbox_repo.go new file mode 100644 index 0000000..2ed7d3b --- /dev/null +++ b/internal/repository/outbox_repo.go @@ -0,0 +1,83 @@ +package repository + +import ( + "context" + "database/sql" + "time" + + "github.com/jmoiron/sqlx" + "github.com/lib/pq" + "github.com/rndmcodeguy20/mpiper/internal/models" + 
"go.uber.org/zap" +) + +type OutboxRepository interface { + InsertTx(ctx context.Context, tx *sql.Tx, event models.OutboxEvent) error + FetchPendingBatch(ctx context.Context, limit int) ([]models.OutboxEvent, error) + MarkPublished(ctx context.Context, ids []int64) error + IncrementAttempts(ctx context.Context, id int64, errMsg string) error + MarkFailed(ctx context.Context, id int64, errMsg string) error + DeletePublishedBefore(ctx context.Context, before time.Time) (int64, error) + CountPending(ctx context.Context) (int64, error) +} + +type outboxRepo struct { + db *sqlx.DB + logger *zap.Logger +} + +func NewOutboxRepository(db *sqlx.DB, logger *zap.Logger) OutboxRepository { + return &outboxRepo{db: db, logger: logger} +} + +func (r *outboxRepo) InsertTx(ctx context.Context, tx *sql.Tx, event models.OutboxEvent) error { + _, err := tx.ExecContext(ctx, + `INSERT INTO event_outbox (aggregate_id, job_id, event, payload, max_attempts) VALUES ($1, $2, $3, $4, $5)`, + event.AggregateID, event.JobID, event.Event, event.Payload, event.MaxAttempts, + ) + return err +} + +func (r *outboxRepo) FetchPendingBatch(ctx context.Context, limit int) ([]models.OutboxEvent, error) { + var rows []models.OutboxEvent + err := r.db.SelectContext(ctx, &rows, + `SELECT id, aggregate_id, job_id, event, payload, status, attempts, max_attempts, last_error, created_at, published_at + FROM event_outbox WHERE status = 'pending' ORDER BY id LIMIT $1 FOR UPDATE SKIP LOCKED`, limit) + return rows, err +} + +func (r *outboxRepo) MarkPublished(ctx context.Context, ids []int64) error { + _, err := r.db.ExecContext(ctx, + `UPDATE event_outbox SET status = 'published', published_at = now() WHERE id = ANY($1)`, + pq.Array(ids)) + return err +} + +func (r *outboxRepo) IncrementAttempts(ctx context.Context, id int64, errMsg string) error { + _, err := r.db.ExecContext(ctx, + `UPDATE event_outbox SET attempts = attempts + 1, last_error = $2 WHERE id = $1`, + id, errMsg) + return err +} + +func (r 
*outboxRepo) MarkFailed(ctx context.Context, id int64, errMsg string) error { + _, err := r.db.ExecContext(ctx, + `UPDATE event_outbox SET status = 'failed', last_error = $2 WHERE id = $1`, + id, errMsg) + return err +} + +func (r *outboxRepo) DeletePublishedBefore(ctx context.Context, before time.Time) (int64, error) { + res, err := r.db.ExecContext(ctx, + `DELETE FROM event_outbox WHERE status = 'published' AND published_at < $1`, before) + if err != nil { + return 0, err + } + return res.RowsAffected() +} + +func (r *outboxRepo) CountPending(ctx context.Context) (int64, error) { + var count int64 + err := r.db.GetContext(ctx, &count, `SELECT COUNT(*) FROM event_outbox WHERE status = 'pending'`) + return count, err +} diff --git a/internal/repository/webhook_repo.go b/internal/repository/webhook_repo.go new file mode 100644 index 0000000..a3ebdd0 --- /dev/null +++ b/internal/repository/webhook_repo.go @@ -0,0 +1,77 @@ +package repository + +import ( + "context" + "encoding/json" + "time" + + "github.com/google/uuid" + "github.com/jmoiron/sqlx" + "go.uber.org/zap" +) + +type WebhookRegistration struct { + ID uuid.UUID `db:"id" json:"id"` + UserID string `db:"user_id" json:"user_id"` + URL string `db:"url" json:"url"` + Secret string `db:"secret" json:"-"` + Events []string `db:"-" json:"events"` + EventsRaw []byte `db:"events" json:"-"` + CreatedAt time.Time `db:"created_at" json:"created_at"` +} + +type WebhookRepository interface { + Create(ctx context.Context, reg WebhookRegistration) error + ListByUser(ctx context.Context, userID string) ([]WebhookRegistration, error) + Delete(ctx context.Context, id uuid.UUID, userID string) error +} + +type webhookRepo struct { + db *sqlx.DB + logger *zap.Logger +} + +func NewWebhookRepository(db *sqlx.DB, logger *zap.Logger) WebhookRepository { + return &webhookRepo{db: db, logger: logger} +} + +func (r *webhookRepo) Create(ctx context.Context, reg WebhookRegistration) error { + eventsJSON, _ := json.Marshal(reg.Events) + 
_, err := r.db.ExecContext(ctx, + `INSERT INTO webhook_registrations (id, user_id, url, secret, events) VALUES ($1, $2, $3, $4, $5)`, + reg.ID, reg.UserID, reg.URL, reg.Secret, eventsJSON, + ) + return err +} + +func (r *webhookRepo) ListByUser(ctx context.Context, userID string) ([]WebhookRegistration, error) { + var rows []WebhookRegistration + err := r.db.SelectContext(ctx, &rows, + `SELECT id, user_id, url, events, created_at FROM webhook_registrations WHERE user_id = $1 ORDER BY created_at DESC`, userID) + if err != nil { + return nil, err + } + for i := range rows { + _ = json.Unmarshal(rows[i].EventsRaw, &rows[i].Events) + } + return rows, nil +} + +func (r *webhookRepo) Delete(ctx context.Context, id uuid.UUID, userID string) error { + res, err := r.db.ExecContext(ctx, + `DELETE FROM webhook_registrations WHERE id = $1 AND user_id = $2`, id, userID) + if err != nil { + return err + } + n, _ := res.RowsAffected() + if n == 0 { + return ErrNotFound + } + return nil +} + +var ErrNotFound = &notFoundError{} + +type notFoundError struct{} + +func (e *notFoundError) Error() string { return "not found" } diff --git a/internal/router/router.go b/internal/router/router.go index 64bb72d..7d44fea 100644 --- a/internal/router/router.go +++ b/internal/router/router.go @@ -113,7 +113,8 @@ func NewRouter(cfg config.EnvConfig, db *sqlx.DB, m *metrics.Metrics) *chi.Mux { r.Use(appMiddleware.SlowRequestMiddleware(logger, 2*time.Second)) assetRepo := repository.NewAssetRepository(db, logger, m) - assetSvc := service.NewAssetService(&cfg.Redis, assetRepo, logger, m) + outboxRepo := repository.NewOutboxRepository(db, logger) + assetSvc := service.NewAssetService(assetRepo, outboxRepo, logger, m) assetHandler := handler.NewAssetHandler(assetSvc, logger, m) // Routes @@ -155,6 +156,16 @@ func NewRouter(cfg config.EnvConfig, db *sqlx.DB, m *metrics.Metrics) *chi.Mux { r.Use(appMiddleware.AuthMiddleware(logger)) r.Get("/{assetID}/complete", assetHandler.MarkAssetUploaded) }) + +
r.Route("/webhooks", func(r chi.Router) { + r.Use(appMiddleware.AuthMiddleware(logger)) + webhookRepo := repository.NewWebhookRepository(db, logger) + webhookSvc := service.NewWebhookService(webhookRepo, logger) + webhookHandler := handler.NewWebhookHandler(webhookSvc, logger) + r.Post("/", webhookHandler.Create) + r.Get("/", webhookHandler.List) + r.Delete("/{id}", webhookHandler.Delete) + }) }) return r diff --git a/internal/service/asset.go b/internal/service/asset.go index 8e1d48a..86bb684 100644 --- a/internal/service/asset.go +++ b/internal/service/asset.go @@ -2,6 +2,7 @@ package service import ( "context" + "encoding/json" "errors" "fmt" "strings" @@ -10,9 +11,8 @@ import ( "github.com/google/uuid" "github.com/rndmcodeguy20/mpiper/internal/config" "github.com/rndmcodeguy20/mpiper/internal/metrics" + "github.com/rndmcodeguy20/mpiper/internal/middleware" "github.com/rndmcodeguy20/mpiper/internal/models" - "github.com/rndmcodeguy20/mpiper/internal/queue" - lredis "github.com/rndmcodeguy20/mpiper/internal/queue" "github.com/rndmcodeguy20/mpiper/internal/repository" "github.com/rndmcodeguy20/mpiper/pkg/utils/storagex" "go.opentelemetry.io/otel" @@ -29,14 +29,14 @@ type AssetService interface { type assetService struct { assetRepo repository.AssetRepository + outboxRepo repository.OutboxRepository logger *zap.Logger storageClient storagex.StorageX bucket string - queue *queue.RedisQueue m *metrics.Metrics } -func NewAssetService(redisCfg *config.RedisConfig, assetRepo repository.AssetRepository, logger *zap.Logger, m *metrics.Metrics) AssetService { +func NewAssetService(assetRepo repository.AssetRepository, outboxRepo repository.OutboxRepository, logger *zap.Logger, m *metrics.Metrics) AssetService { ctx := context.Background() storeCfg := config.MustGet().Storage @@ -52,6 +52,7 @@ func NewAssetService(redisCfg *config.RedisConfig, assetRepo repository.AssetRep Bucket: bucket, Region: storeCfg.S3.Region, Endpoint: storeCfg.S3.EndpointURL, + PublicEndpoint: 
storeCfg.S3.PublicEndpointURL, AccessKeyID: storeCfg.S3.AccessKeyID, SecretAccessKey: storeCfg.S3.SecretAccessKey, GCPServiceAccount: storeCfg.GCS.SAPath, @@ -60,22 +61,12 @@ func NewAssetService(redisCfg *config.RedisConfig, assetRepo repository.AssetRep logger.Sugar().Fatalf("Failed to initialize storage client: %v", err) } - rc, err := lredis.MustGetRedisClient(redisCfg, logger) - rq := lredis.NewRedisQueue(ctx, rc, lredis.RedisQueueOptions{ - QueueName: "media:jobs", - ConnectionTimeOut: 2 * time.Second, - MaxStreamLength: 10_000, - MaxRetries: 3, - RetryInterval: 2 * time.Second, - EnableMetrics: true, - }, logger, m) - return &assetService{ assetRepo: assetRepo, + outboxRepo: outboxRepo, logger: logger, storageClient: storageClient, bucket: bucket, - queue: rq, m: m, } } @@ -133,7 +124,8 @@ func (s *assetService) CreateAsset(ctx context.Context, request models.UploadAss spanStorageCtx, spanStorage = tracer.Start(ctx, "AssetRepo.CreateAsset") spanStorage.SetAttributes(attribute.String("asset_id", assetID.String())) - err = s.assetRepo.CreateAsset(spanStorageCtx, assetID, publicUrl, request.Size, repository.ToAssetTypeFromMimeType(request.ContentType), request.ContentType) + ownerID, _ := middleware.GetUserID(ctx) + err = s.assetRepo.CreateAsset(spanStorageCtx, assetID, publicUrl, request.Size, repository.ToAssetTypeFromMimeType(request.ContentType), request.ContentType, ownerID) spanStorage.End() if err != nil { @@ -257,40 +249,60 @@ func (s *assetService) MarkAssetUploaded(ctx context.Context, assetID uuid.UUID) spanJob.SetAttributes(attribute.Int64("job_id", *jobID)) - err = tx.Commit() + // Insert outbox row in the same transaction — atomic with job + asset status. 
+ ctxOutbox, spanOutbox := tracer.Start(ctxTx, "OutboxRepo.InsertTx") + spanOutbox.SetAttributes(attribute.String("asset_id", assetID.String()), attribute.Int64("job_id", *jobID)) + payload, _ := json.Marshal(map[string]interface{}{ + "job_id": *jobID, + "asset_id": assetID.String(), + "event": "asset_uploaded", + "timestamp": time.Now().UTC().Format(time.RFC3339), + }) + err = s.outboxRepo.InsertTx(ctxOutbox, tx, models.OutboxEvent{ + AggregateID: assetID, + JobID: jobID, + Event: "asset_uploaded", + Payload: payload, + }) + spanOutbox.End() + if err != nil { - spanTx.RecordError(err) - spanTx.SetStatus(codes.Error, "Transaction commit failed") + spanOutbox.RecordError(err) + spanOutbox.SetStatus(codes.Error, "Failed to insert outbox event") span.RecordError(err) - span.SetStatus(codes.Error, "Failed to commit transaction") + span.SetStatus(codes.Error, "Outbox insert failed") + s.logger.Sugar().Errorf("Failed to insert outbox event: %v", err) return err } - tx = nil // Prevent deferred rollback after commit - spanTx.SetStatus(codes.Ok, "Transaction committed") - ctxQueue, spanQueue := tracer.Start(ctx, "Queue.Enqueue") - spanQueue.SetAttributes( - attribute.Int64("job_id", *jobID), - attribute.String("asset_id", assetID.String()), - attribute.String("event", "asset_uploaded"), - ) - _, err = s.queue.Enqueue(ctxQueue, map[string]interface{}{ - "job_id": *jobID, + // Insert job.starting webhook deliveries for matching registrations (same tx). 
+ webhookPayload, _ := json.Marshal(map[string]interface{}{ + "event": "job.starting", "asset_id": assetID.String(), - "event": "asset_uploaded", + "job_id": *jobID, + "status": "starting", "timestamp": time.Now().UTC().Format(time.RFC3339), }) - spanQueue.End() + _, _ = tx.ExecContext(ctxTx, + `INSERT INTO webhook_deliveries (registration_id, event, asset_id, job_id, payload) + SELECT wr.id, 'job.starting', $1, $2, $3::jsonb + FROM webhook_registrations wr + JOIN assets a ON a.owner_id = wr.user_id + WHERE a.asset_id = $1 AND wr.events @> '["job.starting"]'::jsonb`, + assetID, *jobID, webhookPayload, + ) + err = tx.Commit() if err != nil { - spanQueue.RecordError(err) - spanQueue.SetStatus(codes.Error, "Failed to enqueue job") + spanTx.RecordError(err) + spanTx.SetStatus(codes.Error, "Transaction commit failed") span.RecordError(err) - span.SetStatus(codes.Error, "Queue enqueue failed") + span.SetStatus(codes.Error, "Failed to commit transaction") return err } + tx = nil // Prevent deferred rollback after commit + spanTx.SetStatus(codes.Ok, "Transaction committed") - spanQueue.SetStatus(codes.Ok, "Job enqueued successfully") - span.SetStatus(codes.Ok, "Asset marked as uploaded and job queued") + span.SetStatus(codes.Ok, "Asset marked as uploaded and outbox event created") return nil } diff --git a/internal/service/webhook.go b/internal/service/webhook.go new file mode 100644 index 0000000..97c775d --- /dev/null +++ b/internal/service/webhook.go @@ -0,0 +1,93 @@ +package service + +import ( + "context" + "fmt" + "net/url" + + "github.com/google/uuid" + "github.com/rndmcodeguy20/mpiper/internal/config" + "github.com/rndmcodeguy20/mpiper/internal/middleware" + "github.com/rndmcodeguy20/mpiper/internal/repository" + "github.com/rndmcodeguy20/mpiper/pkg/utils" + "go.uber.org/zap" +) + +var validEvents = map[string]bool{ + "job.starting": true, + "job.started": true, + "job.done": true, + "job.failed": true, +} + +type WebhookService interface { + Create(ctx 
context.Context, reqURL, secret string, events []string) (*repository.WebhookRegistration, error) + List(ctx context.Context) ([]repository.WebhookRegistration, error) + Delete(ctx context.Context, id uuid.UUID) error +} + +type webhookService struct { + repo repository.WebhookRepository + logger *zap.Logger +} + +func NewWebhookService(repo repository.WebhookRepository, logger *zap.Logger) WebhookService { + return &webhookService{repo: repo, logger: logger} +} + +func (s *webhookService) Create(ctx context.Context, reqURL, secret string, events []string) (*repository.WebhookRegistration, error) { + if _, err := url.ParseRequestURI(reqURL); err != nil { + return nil, fmt.Errorf("invalid url: %w", err) + } + if secret == "" { + return nil, fmt.Errorf("secret is required") + } + if len(events) == 0 { + return nil, fmt.Errorf("at least one event is required") + } + for _, e := range events { + if !validEvents[e] { + return nil, fmt.Errorf("invalid event: %s", e) + } + } + + userID, ok := middleware.GetUserID(ctx) + if !ok || userID == "" { + return nil, fmt.Errorf("user_id not found in context") + } + + encryptedSecret, err := utils.GenerateToken(secret, config.MustGet().EncryptionKey) + if err != nil { + return nil, fmt.Errorf("failed to encrypt secret: %w", err) + } + + reg := repository.WebhookRegistration{ + ID: uuid.New(), + UserID: userID, + URL: reqURL, + Secret: encryptedSecret, + Events: events, + } + + if err := s.repo.Create(ctx, reg); err != nil { + return nil, err + } + + return &reg, nil +} + +func (s *webhookService) List(ctx context.Context) ([]repository.WebhookRegistration, error) { + userID, ok := middleware.GetUserID(ctx) + if !ok || userID == "" { + return nil, fmt.Errorf("user_id not found in context") + } + return s.repo.ListByUser(ctx, userID) +} + +func (s *webhookService) Delete(ctx context.Context, id uuid.UUID) error { + userID, ok := middleware.GetUserID(ctx) + if !ok || userID == "" { + return fmt.Errorf("user_id not found in context") + }
+ return s.repo.Delete(ctx, id, userID) +} diff --git a/internal/service/webhook_test.go b/internal/service/webhook_test.go new file mode 100644 index 0000000..dc4d9fc --- /dev/null +++ b/internal/service/webhook_test.go @@ -0,0 +1,115 @@ +package service + +import ( + "context" + "testing" + + "github.com/google/uuid" + "github.com/rndmcodeguy20/mpiper/internal/config" + "github.com/rndmcodeguy20/mpiper/internal/middleware" + "github.com/rndmcodeguy20/mpiper/internal/repository" + "go.uber.org/zap" +) + +// mockWebhookRepo implements repository.WebhookRepository for testing. +type mockWebhookRepo struct { + created []repository.WebhookRegistration +} + +func (m *mockWebhookRepo) Create(_ context.Context, reg repository.WebhookRegistration) error { + m.created = append(m.created, reg) + return nil +} +func (m *mockWebhookRepo) ListByUser(_ context.Context, _ string) ([]repository.WebhookRegistration, error) { + return nil, nil +} +func (m *mockWebhookRepo) Delete(_ context.Context, _ uuid.UUID, _ string) error { return nil } + +func ctxWithUser(userID string) context.Context { + return context.WithValue(context.Background(), middleware.UserIDKey(), userID) +} + +func init() { + // Initialize config singleton for tests (32-byte encryption key). 
+ config.Init(config.EnvConfig{ + EncryptionKey: "01234567890123456789012345678901", + }) +} + +func TestWebhookService_Create_ValidInput(t *testing.T) { + repo := &mockWebhookRepo{} + svc := NewWebhookService(repo, zap.NewNop()) + + ctx := ctxWithUser("user-123") + reg, err := svc.Create(ctx, "https://example.com/hook", "my-secret", []string{"job.done", "job.failed"}) + if err != nil { + t.Fatalf("unexpected error: %v", err) + } + if reg.UserID != "user-123" { + t.Errorf("expected user-123, got %s", reg.UserID) + } + if len(repo.created) != 1 { + t.Fatalf("expected 1 registration, got %d", len(repo.created)) + } + // Secret should be encrypted (not the raw value) + if repo.created[0].Secret == "my-secret" { + t.Error("secret should be encrypted, not stored as plaintext") + } + if repo.created[0].Secret == "" { + t.Error("encrypted secret should not be empty") + } +} + +func TestWebhookService_Create_InvalidURL(t *testing.T) { + repo := &mockWebhookRepo{} + svc := NewWebhookService(repo, zap.NewNop()) + + ctx := ctxWithUser("user-123") + _, err := svc.Create(ctx, "not-a-url", "secret", []string{"job.done"}) + if err == nil { + t.Fatal("expected error for invalid URL") + } +} + +func TestWebhookService_Create_InvalidEvent(t *testing.T) { + repo := &mockWebhookRepo{} + svc := NewWebhookService(repo, zap.NewNop()) + + ctx := ctxWithUser("user-123") + _, err := svc.Create(ctx, "https://example.com/hook", "secret", []string{"invalid.event"}) + if err == nil { + t.Fatal("expected error for invalid event") + } +} + +func TestWebhookService_Create_EmptyEvents(t *testing.T) { + repo := &mockWebhookRepo{} + svc := NewWebhookService(repo, zap.NewNop()) + + ctx := ctxWithUser("user-123") + _, err := svc.Create(ctx, "https://example.com/hook", "secret", []string{}) + if err == nil { + t.Fatal("expected error for empty events") + } +} + +func TestWebhookService_Create_EmptySecret(t *testing.T) { + repo := &mockWebhookRepo{} + svc := NewWebhookService(repo, zap.NewNop()) + + ctx 
:= ctxWithUser("user-123") + _, err := svc.Create(ctx, "https://example.com/hook", "", []string{"job.done"}) + if err == nil { + t.Fatal("expected error for empty secret") + } +} + +func TestWebhookService_Create_NoUserInContext(t *testing.T) { + repo := &mockWebhookRepo{} + svc := NewWebhookService(repo, zap.NewNop()) + + _, err := svc.Create(context.Background(), "https://example.com/hook", "secret", []string{"job.done"}) + if err == nil { + t.Fatal("expected error for missing user in context") + } +} diff --git a/internal/webhook/dispatcher.go b/internal/webhook/dispatcher.go new file mode 100644 index 0000000..0fb08c9 --- /dev/null +++ b/internal/webhook/dispatcher.go @@ -0,0 +1,222 @@ +package webhook + +import ( + "context" + "crypto/hmac" + "crypto/sha256" + "encoding/hex" + "encoding/json" + "fmt" + "io" + "math" + "math/rand" + "net/http" + "time" + + "github.com/google/uuid" + "github.com/jmoiron/sqlx" + "github.com/rndmcodeguy20/mpiper/pkg/utils" + "go.uber.org/zap" +) + +type DispatcherConfig struct { + PollInterval time.Duration + BatchSize int + Timeout time.Duration + MaxAttempts int + EncryptionKey string + Retention time.Duration +} + +type Dispatcher struct { + db *sqlx.DB + logger *zap.Logger + client *http.Client + cfg DispatcherConfig +} + +func NewDispatcher(db *sqlx.DB, logger *zap.Logger, cfg DispatcherConfig) *Dispatcher { + return &Dispatcher{ + db: db, + logger: logger, + client: &http.Client{Timeout: cfg.Timeout}, + cfg: cfg, + } +} + +type deliveryRow struct { + ID uuid.UUID `db:"id"` + Event string `db:"event"` + AssetID uuid.UUID `db:"asset_id"` + JobID int64 `db:"job_id"` + Payload json.RawMessage `db:"payload"` + Attempts int `db:"attempts"` + URL string `db:"url"` + Secret string `db:"secret"` +} + +func (d *Dispatcher) Start(ctx context.Context) { + d.logger.Info("webhook dispatcher started", zap.Duration("interval", d.cfg.PollInterval)) + ticker := time.NewTicker(d.cfg.PollInterval) + defer ticker.Stop() + + for { + select { + 
case <-ctx.Done(): + d.logger.Info("webhook dispatcher stopped") + return + case <-ticker.C: + d.tick(ctx) + } + } +} + +func (d *Dispatcher) tick(ctx context.Context) { + rows := make([]deliveryRow, 0, d.cfg.BatchSize) + err := d.db.SelectContext(ctx, &rows, + `SELECT wd.id, wd.event, wd.asset_id, wd.job_id, wd.payload, wd.attempts, wr.url, wr.secret + FROM webhook_deliveries wd + JOIN webhook_registrations wr ON wd.registration_id = wr.id + WHERE wd.status = 'pending' AND wd.next_attempt_at <= now() + ORDER BY wd.next_attempt_at + LIMIT $1 + FOR UPDATE OF wd SKIP LOCKED`, d.cfg.BatchSize) + if err != nil { + d.logger.Error("webhook dispatcher: fetch failed", zap.Error(err)) + return + } + + for _, row := range rows { + d.deliver(ctx, row) + } +} + +func (d *Dispatcher) deliver(ctx context.Context, row deliveryRow) { + secret, err := utils.DecryptToken(row.Secret, d.cfg.EncryptionKey) + if err != nil { + d.logger.Error("webhook: decrypt secret failed", zap.String("delivery_id", row.ID.String()), zap.Error(err)) + d.markFailed(ctx, row.ID) + return + } + + payloadBytes, _ := json.Marshal(row.Payload) + sig := computeHMAC(secret, payloadBytes) + + req, err := http.NewRequestWithContext(ctx, http.MethodPost, row.URL, io.NopCloser( + bytesReader(payloadBytes), + )) + if err != nil { + d.logger.Error("webhook: build request failed", zap.Error(err)) + d.handleFailure(ctx, row) + return + } + req.Header.Set("Content-Type", "application/json") + req.Header.Set("X-Webhook-Signature", "sha256="+sig) + + resp, err := d.client.Do(req) + if err != nil { + d.logger.Warn("webhook: request failed", zap.String("url", row.URL), zap.Error(err)) + d.handleFailure(ctx, row) + return + } + defer resp.Body.Close() + + if resp.StatusCode >= 200 && resp.StatusCode < 300 { + _, _ = d.db.ExecContext(ctx, + `UPDATE webhook_deliveries SET status = 'delivered', delivered_at = now() WHERE id = $1`, row.ID) + d.logger.Debug("webhook delivered", zap.String("id", row.ID.String()), 
zap.String("url", row.URL)) + } else { + d.logger.Warn("webhook: non-2xx response", zap.String("url", row.URL), zap.Int("status", resp.StatusCode)) + d.handleFailure(ctx, row) + } +} + +func (d *Dispatcher) handleFailure(ctx context.Context, row deliveryRow) { + newAttempts := row.Attempts + 1 + if newAttempts >= d.cfg.MaxAttempts { + d.markFailed(ctx, row.ID) + return + } + next := backoff(newAttempts) + _, _ = d.db.ExecContext(ctx, + `UPDATE webhook_deliveries SET attempts = $2, next_attempt_at = now() + $3::interval WHERE id = $1`, + row.ID, newAttempts, fmt.Sprintf("%d seconds", int(next.Seconds()))) +} + +func (d *Dispatcher) markFailed(ctx context.Context, id uuid.UUID) { + _, _ = d.db.ExecContext(ctx, + `UPDATE webhook_deliveries SET status = 'failed' WHERE id = $1`, id) +} + +// StartCleanup deletes delivered rows older than retention. +func (d *Dispatcher) StartCleanup(ctx context.Context) { + interval := d.cfg.Retention / 24 + if interval < time.Minute { + interval = time.Minute + } + d.logger.Info("webhook cleanup started", zap.Duration("retention", d.cfg.Retention)) + ticker := time.NewTicker(interval) + defer ticker.Stop() + + for { + select { + case <-ctx.Done(): + return + case <-ticker.C: + res, err := d.db.ExecContext(ctx, + `DELETE FROM webhook_deliveries WHERE status = 'delivered' AND delivered_at < now() - $1::interval`, + fmt.Sprintf("%d hours", int(d.cfg.Retention.Hours()))) + if err != nil { + d.logger.Error("webhook cleanup failed", zap.Error(err)) + continue + } + if n, _ := res.RowsAffected(); n > 0 { + d.logger.Info("webhook cleanup: deleted old rows", zap.Int64("count", n)) + } + } + } +} + +// backoff returns exponential backoff with jitter, capped at 5 minutes. 
+func backoff(attempt int) time.Duration { + base := 1 * time.Second + maxBackoff := 5 * time.Minute + b := time.Duration(float64(base) * math.Pow(2, float64(attempt))) + if b > maxBackoff { + b = maxBackoff + } + // Add jitter: ±25% + jitter := time.Duration(rand.Int63n(int64(b/2))) - (b / 4) + result := b + jitter + if result > maxBackoff { + result = maxBackoff + } + if result < 0 { + result = base + } + return result +} + +func computeHMAC(secret string, payload []byte) string { + mac := hmac.New(sha256.New, []byte(secret)) + mac.Write(payload) + return hex.EncodeToString(mac.Sum(nil)) +} + +func bytesReader(b []byte) io.Reader { + return &bytesReaderImpl{data: b} +} + +type bytesReaderImpl struct { + data []byte + pos int +} + +func (r *bytesReaderImpl) Read(p []byte) (int, error) { + if r.pos >= len(r.data) { + return 0, io.EOF + } + n := copy(p, r.data[r.pos:]) + r.pos += n + return n, nil +} diff --git a/internal/webhook/dispatcher_integration_test.go b/internal/webhook/dispatcher_integration_test.go new file mode 100644 index 0000000..81e0e8f --- /dev/null +++ b/internal/webhook/dispatcher_integration_test.go @@ -0,0 +1,249 @@ +//go:build integration + +package webhook_test + +import ( + "context" + "crypto/hmac" + "crypto/sha256" + "encoding/hex" + "encoding/json" + "io" + "net/http" + "net/http/httptest" + "sync/atomic" + "testing" + "time" + + "github.com/google/uuid" + "github.com/jmoiron/sqlx" + _ "github.com/lib/pq" + "github.com/rndmcodeguy20/mpiper/internal/webhook" + "github.com/rndmcodeguy20/mpiper/pkg/utils" + "github.com/testcontainers/testcontainers-go" + tcpostgres "github.com/testcontainers/testcontainers-go/modules/postgres" + "github.com/testcontainers/testcontainers-go/wait" + "go.uber.org/zap" +) + +const testEncryptionKey = "01234567890123456789012345678901" + +func setupDB(t *testing.T, ctx context.Context) *sqlx.DB { + t.Helper() + pg, err := tcpostgres.Run(ctx, "postgres:16-alpine", + tcpostgres.WithDatabase("testdb"), + 
tcpostgres.WithUsername("test"), + tcpostgres.WithPassword("test"), + testcontainers.WithWaitStrategy(wait.ForListeningPort("5432/tcp").WithStartupTimeout(30*time.Second)), + ) + if err != nil { + t.Fatalf("start postgres: %v", err) + } + t.Cleanup(func() { _ = pg.Terminate(ctx) }) + + dsn, _ := pg.ConnectionString(ctx, "sslmode=disable") + db, err := sqlx.Connect("postgres", dsn) + if err != nil { + t.Fatalf("connect: %v", err) + } + t.Cleanup(func() { _ = db.Close() }) + + for _, ddl := range []string{ + `CREATE EXTENSION IF NOT EXISTS "uuid-ossp"`, + `CREATE TABLE assets ( + asset_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), + original_url TEXT NOT NULL, type TEXT NOT NULL, status TEXT NOT NULL, + mime_type TEXT NOT NULL, size_bytes BIGINT NOT NULL, owner_id TEXT, + created_at TIMESTAMPTZ DEFAULT NOW(), updated_at TIMESTAMPTZ DEFAULT NOW())`, + `CREATE TABLE webhook_registrations ( + id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), + user_id TEXT NOT NULL DEFAULT '', url TEXT NOT NULL, + secret TEXT NOT NULL, events JSONB NOT NULL DEFAULT '[]'::jsonb, + created_at TIMESTAMPTZ DEFAULT now())`, + `CREATE TABLE webhook_deliveries ( + id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), + registration_id UUID NOT NULL REFERENCES webhook_registrations(id) ON DELETE CASCADE, + event TEXT NOT NULL, asset_id UUID NOT NULL, job_id BIGINT NOT NULL, + payload JSONB NOT NULL, status TEXT NOT NULL DEFAULT 'pending', + attempts INT NOT NULL DEFAULT 0, next_attempt_at TIMESTAMPTZ NOT NULL DEFAULT now(), + delivered_at TIMESTAMPTZ, created_at TIMESTAMPTZ NOT NULL DEFAULT now())`, + } { + if _, err := db.ExecContext(ctx, ddl); err != nil { + t.Fatalf("DDL: %v", err) + } + } + return db +} + +func TestDispatcher_DeliversSuccessfully(t *testing.T) { + ctx := context.Background() + db := setupDB(t, ctx) + + secret := "test-secret-value" + encSecret, _ := utils.GenerateToken(secret, testEncryptionKey) + + // Set up a test HTTP server that records calls. 
+ var received atomic.Int32 + var receivedBody []byte + var receivedSig string + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + receivedBody, _ = io.ReadAll(r.Body) + receivedSig = r.Header.Get("X-Webhook-Signature") + received.Add(1) // publish last: the polling loop below reads body and signature only after it sees this increment + w.WriteHeader(http.StatusOK) + })) + defer srv.Close() + + // Insert registration + asset + delivery. + regID := uuid.New() + assetID := uuid.New() + payload, _ := json.Marshal(map[string]interface{}{ + "event": "job.done", "asset_id": assetID.String(), "job_id": 1, "status": "done", + }) + + _, _ = db.ExecContext(ctx, + `INSERT INTO webhook_registrations (id, user_id, url, secret, events) VALUES ($1,$2,$3,$4,$5)`, + regID, "user-1", srv.URL, encSecret, `["job.done"]`) + _, _ = db.ExecContext(ctx, + `INSERT INTO webhook_deliveries (registration_id, event, asset_id, job_id, payload) VALUES ($1,$2,$3,$4,$5)`, + regID, "job.done", assetID, 1, payload) + + // Run dispatcher. + dispCtx, cancel := context.WithCancel(ctx) + defer cancel() + d := webhook.NewDispatcher(db, zap.NewNop(), webhook.DispatcherConfig{ + PollInterval: 50 * time.Millisecond, + BatchSize: 10, + Timeout: 5 * time.Second, + MaxAttempts: 5, + EncryptionKey: testEncryptionKey, + Retention: 168 * time.Hour, + }) + go d.Start(dispCtx) + + // Wait for delivery. + deadline := time.After(3 * time.Second) + for received.Load() == 0 { + select { + case <-deadline: + t.Fatal("timeout waiting for webhook delivery") + default: + time.Sleep(20 * time.Millisecond) + } + } + cancel() + + // Verify signature. + mac := hmac.New(sha256.New, []byte(secret)) + mac.Write(receivedBody) + expectedSig := "sha256=" + hex.EncodeToString(mac.Sum(nil)) + if receivedSig != expectedSig { + t.Errorf("signature mismatch: got %s, want %s", receivedSig, expectedSig) + } + + // Verify DB status. 
+ var status string + _ = db.GetContext(ctx, &status, `SELECT status FROM webhook_deliveries WHERE asset_id = $1`, assetID) + if status != "delivered" { + t.Errorf("expected delivered, got %s", status) + } +} + +func TestDispatcher_RetriesOnFailure(t *testing.T) { + ctx := context.Background() + db := setupDB(t, ctx) + + encSecret, _ := utils.GenerateToken("secret", testEncryptionKey) + + var callCount atomic.Int32 + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + callCount.Add(1) + w.WriteHeader(http.StatusInternalServerError) + })) + defer srv.Close() + + regID := uuid.New() + assetID := uuid.New() + payload, _ := json.Marshal(map[string]interface{}{"event": "job.done"}) + + _, _ = db.ExecContext(ctx, + `INSERT INTO webhook_registrations (id, user_id, url, secret, events) VALUES ($1,$2,$3,$4,$5)`, + regID, "user-1", srv.URL, encSecret, `["job.done"]`) + _, _ = db.ExecContext(ctx, + `INSERT INTO webhook_deliveries (registration_id, event, asset_id, job_id, payload) VALUES ($1,$2,$3,$4,$5)`, + regID, "job.done", assetID, 1, payload) + + // Run dispatcher briefly — first attempt should fail and schedule retry. + dispCtx, cancel := context.WithCancel(ctx) + d := webhook.NewDispatcher(db, zap.NewNop(), webhook.DispatcherConfig{ + PollInterval: 50 * time.Millisecond, + BatchSize: 10, + Timeout: 2 * time.Second, + MaxAttempts: 5, + EncryptionKey: testEncryptionKey, + Retention: 168 * time.Hour, + }) + go d.Start(dispCtx) + time.Sleep(300 * time.Millisecond) + cancel() + + if callCount.Load() < 1 { + t.Fatal("expected at least 1 call") + } + + // Delivery should still be pending with attempts incremented. 
+ var row struct { + Status string `db:"status"` + Attempts int `db:"attempts"` + } + _ = db.GetContext(ctx, &row, `SELECT status, attempts FROM webhook_deliveries WHERE asset_id = $1`, assetID) + if row.Status != "pending" { + t.Errorf("expected pending, got %s", row.Status) + } + if row.Attempts < 1 { + t.Errorf("expected attempts >= 1, got %d", row.Attempts) + } +} + +func TestDispatcher_FailsAfterMaxAttempts(t *testing.T) { + ctx := context.Background() + db := setupDB(t, ctx) + + encSecret, _ := utils.GenerateToken("secret", testEncryptionKey) + + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + w.WriteHeader(http.StatusBadGateway) + })) + defer srv.Close() + + regID := uuid.New() + assetID := uuid.New() + payload, _ := json.Marshal(map[string]interface{}{"event": "job.done"}) + + _, _ = db.ExecContext(ctx, + `INSERT INTO webhook_registrations (id, user_id, url, secret, events) VALUES ($1,$2,$3,$4,$5)`, + regID, "user-1", srv.URL, encSecret, `["job.done"]`) + // Pre-set attempts to 4 (max-1), so next failure marks it failed. 
+ _, _ = db.ExecContext(ctx, + `INSERT INTO webhook_deliveries (registration_id, event, asset_id, job_id, payload, attempts) VALUES ($1,$2,$3,$4,$5,4)`, + regID, "job.done", assetID, 1, payload) + + dispCtx, cancel := context.WithCancel(ctx) + d := webhook.NewDispatcher(db, zap.NewNop(), webhook.DispatcherConfig{ + PollInterval: 50 * time.Millisecond, + BatchSize: 10, + Timeout: 2 * time.Second, + MaxAttempts: 5, + EncryptionKey: testEncryptionKey, + Retention: 168 * time.Hour, + }) + go d.Start(dispCtx) + time.Sleep(300 * time.Millisecond) + cancel() + + var status string + _ = db.GetContext(ctx, &status, `SELECT status FROM webhook_deliveries WHERE asset_id = $1`, assetID) + if status != "failed" { + t.Errorf("expected failed, got %s", status) + } +} diff --git a/internal/webhook/dispatcher_test.go b/internal/webhook/dispatcher_test.go new file mode 100644 index 0000000..760ea52 --- /dev/null +++ b/internal/webhook/dispatcher_test.go @@ -0,0 +1,70 @@ +package webhook + +import ( + "crypto/hmac" + "crypto/sha256" + "encoding/hex" + "testing" + "time" +) + +func TestBackoff_ExponentialWithCap(t *testing.T) { + tests := []struct { + attempt int + wantMin time.Duration + wantMax time.Duration + }{ + {1, 1 * time.Second, 4 * time.Second}, // 2s base ±25% + {2, 2 * time.Second, 6 * time.Second}, // 4s base ±25% + {3, 4 * time.Second, 12 * time.Second}, // 8s base ±25% + {4, 8 * time.Second, 24 * time.Second}, // 16s base ±25% + {10, 3 * time.Minute, 5*time.Minute + 1}, // capped at 5min (hard cap after jitter) + } + + for _, tt := range tests { + // Run multiple times for randomness coverage + for range 50 { + got := backoff(tt.attempt) + if got < tt.wantMin || got > tt.wantMax { + t.Errorf("backoff(%d) = %v, want [%v, %v]", tt.attempt, got, tt.wantMin, tt.wantMax) + } + } + } +} + +func TestBackoff_NeverExceedsFiveMinutes(t *testing.T) { + for attempt := 1; attempt <= 20; attempt++ { + for range 100 { + got := backoff(attempt) + if got > 5*time.Minute { // backoff clamps the 
result to the cap after adding jitter + t.Fatalf("backoff(%d) = %v exceeds 5min cap", attempt, got) + } + } + } +} + +func TestComputeHMAC(t *testing.T) { + secret := "my-webhook-secret" + payload := []byte(`{"event":"job.done","asset_id":"abc-123"}`) + + got := computeHMAC(secret, payload) + + // Verify independently + mac := hmac.New(sha256.New, []byte(secret)) + mac.Write(payload) + want := hex.EncodeToString(mac.Sum(nil)) + + if got != want { + t.Errorf("computeHMAC = %s, want %s", got, want) + } +} + +func TestComputeHMAC_DifferentSecrets(t *testing.T) { + payload := []byte(`{"event":"job.done"}`) + sig1 := computeHMAC("secret-a", payload) + sig2 := computeHMAC("secret-b", payload) + + if sig1 == sig2 { + t.Error("different secrets should produce different signatures") + } +} diff --git a/pkg/utils/storagex/config.go b/pkg/utils/storagex/config.go index 78c4ccd..c7e29d2 100644 --- a/pkg/utils/storagex/config.go +++ b/pkg/utils/storagex/config.go @@ -14,7 +14,8 @@ type Config struct { // Common settings Region string - Endpoint string // For custom endpoints (e.g., MinIO) + Endpoint string // Internal/server-side endpoint (e.g., http://minio:9000) + PublicEndpoint string // Optional client-facing endpoint used for presigned + public URLs (e.g., http://localhost:9000). Falls back to Endpoint when empty. 
Bucket string AccessKeyID string SecretAccessKey string diff --git a/pkg/utils/storagex/s3.go b/pkg/utils/storagex/s3.go index a999ab5..332d88f 100644 --- a/pkg/utils/storagex/s3.go +++ b/pkg/utils/storagex/s3.go @@ -22,17 +22,24 @@ import ( ) type s3Storage struct { - client *s3.Client - presign *s3.PresignClient - region string - endpoint string // non-empty for MinIO / S3-compatible endpoints - logger *zap.Logger - m *metrics.Metrics + client *s3.Client + presign *s3.PresignClient + region string + endpoint string // internal/server-side endpoint for MinIO / S3-compatible stores + publicEndpoint string // client-facing endpoint used for presigned + public URLs + logger *zap.Logger + m *metrics.Metrics } // NewS3Storage builds an S3-backed StorageX. An empty cfg.Endpoint targets AWS // S3; a non-empty one (with path-style addressing) targets MinIO or any // S3-compatible store. +// +// When cfg.PublicEndpoint is set it is used to sign presigned URLs and to build +// public URLs, so clients (e.g. a browser or a host-run script) receive a host +// they can actually reach — while server-side operations keep using the +// internal cfg.Endpoint. SigV4 signs the Host header, so presigning must happen +// against the same host the client will connect to. func NewS3Storage(ctx context.Context, cfg Config, m *metrics.Metrics, logger *zap.Logger) (StorageX, error) { region := cfg.Region if region == "" { @@ -58,13 +65,26 @@ func NewS3Storage(ctx context.Context, cfg Config, m *metrics.Metrics, logger *z } }) + // Presign against the public endpoint when one is configured; otherwise + // presign against the same internal client (back-compat). 
+ publicEndpoint := cfg.PublicEndpoint + presignClient := s3.NewPresignClient(client) + if publicEndpoint != "" { + publicClient := s3.NewFromConfig(awsCfg, func(o *s3.Options) { + o.BaseEndpoint = aws.String(publicEndpoint) + o.UsePathStyle = true + }) + presignClient = s3.NewPresignClient(publicClient) + } + return &s3Storage{ - client: client, - presign: s3.NewPresignClient(client), - region: region, - endpoint: cfg.Endpoint, - logger: logger, - m: m, + client: client, + presign: presignClient, + region: region, + endpoint: cfg.Endpoint, + publicEndpoint: publicEndpoint, + logger: logger, + m: m, }, nil } @@ -220,9 +240,14 @@ func (s *s3Storage) PublicURL(ctx context.Context, bucket, key string) (string, ) var url string - if s.endpoint != "" { + // Prefer the client-facing public endpoint; fall back to the internal one. + endpoint := s.publicEndpoint + if endpoint == "" { + endpoint = s.endpoint + } + if endpoint != "" { // path-style for MinIO / S3-compatible endpoints - url = fmt.Sprintf("%s/%s/%s", strings.TrimRight(s.endpoint, "/"), bucket, key) + url = fmt.Sprintf("%s/%s/%s", strings.TrimRight(endpoint, "/"), bucket, key) } else { url = fmt.Sprintf("https://%s.s3.%s.amazonaws.com/%s", bucket, s.region, key) } diff --git a/pkg/utils/storagex/s3_test.go b/pkg/utils/storagex/s3_test.go index 2dcf6d7..3a1a95e 100644 --- a/pkg/utils/storagex/s3_test.go +++ b/pkg/utils/storagex/s3_test.go @@ -91,6 +91,88 @@ func TestS3StorageRoundTrip(t *testing.T) { } } +// TestS3PresignAndPublicURLEndpoints verifies the split-endpoint behavior +// without needing a live S3/MinIO server: presigning signs locally and +// PublicURL is pure string construction. 
+func TestS3PresignAndPublicURLEndpoints(t *testing.T) { + ctx := context.Background() + const ( + internal = "http://minio:9000" + public = "http://localhost:9000" + bucket = "mpiper" + key = "media/raw/abc" + ) + + t.Run("public endpoint set: presign + PublicURL use public host", func(t *testing.T) { + st, err := NewS3Storage(ctx, Config{ + Provider: S3Provider, + Region: "us-east-1", + Endpoint: internal, + PublicEndpoint: public, + Bucket: bucket, + AccessKeyID: "minioadmin", + SecretAccessKey: "minioadmin", + }, nil, zap.NewNop()) + if err != nil { + t.Fatalf("NewS3Storage: %v", err) + } + + url, err := st.GeneratePresignedURL(ctx, bucket, key, &PresignedURLOptions{Method: "PUT", ContentType: "image/jpeg", ExpiresInSeconds: 300}) + if err != nil { + t.Fatalf("GeneratePresignedURL: %v", err) + } + if !strings.HasPrefix(url, public) { + t.Errorf("presigned url should target public host %q, got %s", public, url) + } + if strings.Contains(url, "minio:9000") { + t.Errorf("presigned url should not contain internal host, got %s", url) + } + if !strings.Contains(url, "X-Amz-Signature") { + t.Errorf("presigned url missing signature: %s", url) + } + + pub, err := st.PublicURL(ctx, bucket, key) + if err != nil { + t.Fatalf("PublicURL: %v", err) + } + want := public + "/" + bucket + "/" + key + if pub != want { + t.Errorf("PublicURL = %q, want %q", pub, want) + } + }) + + t.Run("public endpoint unset: falls back to internal host (back-compat)", func(t *testing.T) { + st, err := NewS3Storage(ctx, Config{ + Provider: S3Provider, + Region: "us-east-1", + Endpoint: internal, + Bucket: bucket, + AccessKeyID: "minioadmin", + SecretAccessKey: "minioadmin", + }, nil, zap.NewNop()) + if err != nil { + t.Fatalf("NewS3Storage: %v", err) + } + + url, err := st.GeneratePresignedURL(ctx, bucket, key, &PresignedURLOptions{Method: "PUT", ExpiresInSeconds: 300}) + if err != nil { + t.Fatalf("GeneratePresignedURL: %v", err) + } + if !strings.HasPrefix(url, internal) { + t.Errorf("presigned 
url should target internal host %q, got %s", internal, url) + } + + pub, err := st.PublicURL(ctx, bucket, key) + if err != nil { + t.Fatalf("PublicURL: %v", err) + } + want := internal + "/" + bucket + "/" + key + if pub != want { + t.Errorf("PublicURL = %q, want %q", pub, want) + } + }) +} + func createTestBucket(t *testing.T, ctx context.Context, cfg Config, bucket string) { t.Helper() awsCfg, err := awsconfig.LoadDefaultConfig(ctx, diff --git a/requirements.txt b/requirements.txt new file mode 100644 index 0000000..dc499c8 --- /dev/null +++ b/requirements.txt @@ -0,0 +1,10 @@ +psycopg[binary]>=3.3.2 +psycopg-pool>=3.3.0 +redis>=7.1.0 +pillow>=12.0.0 +google-cloud-storage>=3.7.0 +boto3>=1.35.0 +python-dotenv>=1.2.1 +opentelemetry-api>=1.39.1 +opentelemetry-sdk>=1.39.1 +opentelemetry-exporter-otlp>=1.39.1 diff --git a/scripts/demo-e2e.sh b/scripts/demo-e2e.sh new file mode 100755 index 0000000..66ad7e6 --- /dev/null +++ b/scripts/demo-e2e.sh @@ -0,0 +1,321 @@ +#!/usr/bin/env bash +# scripts/demo-e2e.sh +# +# End-to-end demo driver for MPiper, run from the HOST machine exactly like a +# real client would: it presigns an upload, PUTs the file straight to MinIO over +# the published localhost:9000 endpoint, marks the asset complete, then waits for +# the worker to produce variants and for webhook deliveries to land. +# +# It exercises BOTH an image and a video, plus the full webhook lifecycle +# (job.starting -> job.started -> job.done). +# +# Prerequisites — bring the stack up WITH the webhooks overlay first: +# +# docker compose -f docker-compose.yml -f docker-compose.webhooks.yml up -d --build +# +# Then run: +# +# ./scripts/demo-e2e.sh +# +# Requirements on the host: bash, curl, jq, docker, and a python3 with the +# `cryptography` package (used only to mint the auth token, matching +# pkg/utils/crypt.go). Override defaults via env vars (API, ENCRYPTION_KEY, …). 
+ +set -uo pipefail + +# --------------------------------------------------------------------------- +# Configuration +# --------------------------------------------------------------------------- +API="${API:-http://localhost:5010}" +ENCRYPTION_KEY="${ENCRYPTION_KEY:-}" +USER_ID="${USER_ID:-demo-user}" +WEBHOOK_RECEIVER_URL="${WEBHOOK_RECEIVER_URL:-http://webhook-receiver:8080}" # internal docker name; reached by the in-container dispatcher +WEBHOOK_SECRET="${WEBHOOK_SECRET:-demo-webhook-secret}" +PG_CONTAINER="${PG_CONTAINER:-mpiper-postgres}" +PG_USER="${PG_USER:-mpiper}" +PG_DB="${PG_DB:-mpiper}" +RECEIVER_CONTAINER="${RECEIVER_CONTAINER:-mpiper-webhook-receiver}" + +ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" +IMAGE_FILE="${IMAGE_FILE:-$ROOT_DIR/worker/tests/test_assets/image.jpg}" +VIDEO_FILE="${VIDEO_FILE:-$ROOT_DIR/tests/test_assets/sample.mp4}" + +IMAGE_READY_TIMEOUT="${IMAGE_READY_TIMEOUT:-60}" +VIDEO_READY_TIMEOUT="${VIDEO_READY_TIMEOUT:-120}" +WEBHOOK_TIMEOUT="${WEBHOOK_TIMEOUT:-30}" + +# --------------------------------------------------------------------------- +# Output helpers +# --------------------------------------------------------------------------- +if [ -t 1 ]; then + RED=$'\033[0;31m'; GREEN=$'\033[0;32m'; BLUE=$'\033[0;34m'; BOLD=$'\033[1m'; NC=$'\033[0m' +else + RED=""; GREEN=""; BLUE=""; BOLD=""; NC="" +fi + +PASS_COUNT=0 +FAIL_COUNT=0 + +step() { printf '\n%s== %s ==%s\n' "$BLUE$BOLD" "$1" "$NC"; } +info() { printf ' %s\n' "$1"; } +pass() { PASS_COUNT=$((PASS_COUNT+1)); printf ' %s✓ PASS%s %s\n' "$GREEN" "$NC" "$1"; } +fail() { FAIL_COUNT=$((FAIL_COUNT+1)); printf ' %s✗ FAIL%s %s\n' "$RED" "$NC" "$1"; } +die() { printf '\n%sFATAL:%s %s\n' "$RED$BOLD" "$NC" "$1" >&2; exit 1; } + +# --------------------------------------------------------------------------- +# Preflight +# --------------------------------------------------------------------------- +step "Preflight checks" + +for bin in curl jq docker; do + command -v "$bin" 
>/dev/null 2>&1 || die "'$bin' is required but not installed." +done + +# Pick a python3 that can import cryptography (for token minting). +PYTHON_BIN="" +for cand in python3 python; do + if command -v "$cand" >/dev/null 2>&1 && "$cand" -c "import cryptography" >/dev/null 2>&1; then + PYTHON_BIN="$cand"; break + fi +done +[ -n "$PYTHON_BIN" ] || die "Need a python3 with the 'cryptography' package on PATH (pip install cryptography)." +info "Using python: $(command -v "$PYTHON_BIN")" + +# Resolve the encryption key. Prefer the env var; otherwise read it from .env.local. +if [ -z "$ENCRYPTION_KEY" ] && [ -f "$ROOT_DIR/.env.local" ]; then + ENCRYPTION_KEY="$(grep -E '^ENCRYPTION_KEY=' "$ROOT_DIR/.env.local" | head -1 | cut -d= -f2-)" +fi +[ -n "$ENCRYPTION_KEY" ] || die "ENCRYPTION_KEY not set and not found in .env.local." +[ "${#ENCRYPTION_KEY}" -eq 32 ] || die "ENCRYPTION_KEY must be exactly 32 bytes (got ${#ENCRYPTION_KEY})." + +[ -f "$IMAGE_FILE" ] || die "Image fixture not found: $IMAGE_FILE" +[ -f "$VIDEO_FILE" ] || die "Video fixture not found: $VIDEO_FILE (generate with ffmpeg or set VIDEO_FILE)." +info "Image fixture: $IMAGE_FILE ($(wc -c < "$IMAGE_FILE" | tr -d ' ') bytes)" +info "Video fixture: $VIDEO_FILE ($(wc -c < "$VIDEO_FILE" | tr -d ' ') bytes)" + +# API health +if curl -fsS "$API/healthz" >/dev/null 2>&1; then + pass "API healthy at $API" +else + die "API not reachable at $API/healthz — is the stack up?" +fi + +# Postgres reachable via the container +if docker exec "$PG_CONTAINER" pg_isready -U "$PG_USER" -d "$PG_DB" >/dev/null 2>&1; then + pass "Postgres healthy ($PG_CONTAINER)" +else + die "Postgres not ready in container $PG_CONTAINER." +fi + +# Webhook receiver present (overlay) +if docker ps --format '{{.Names}}' | grep -q "^${RECEIVER_CONTAINER}$"; then + pass "Webhook receiver running ($RECEIVER_CONTAINER)" +else + die "Webhook receiver $RECEIVER_CONTAINER not running. 
Start with the webhooks overlay: + docker compose -f docker-compose.yml -f docker-compose.webhooks.yml up -d" +fi + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- +psql_q() { docker exec "$PG_CONTAINER" psql -U "$PG_USER" -d "$PG_DB" -tAc "$1" 2>/dev/null; } + +mint_token() { + ENCRYPTION_KEY="$ENCRYPTION_KEY" USER_ID="$USER_ID" "$PYTHON_BIN" - <<'PY' +import base64, os +from cryptography.hazmat.primitives.ciphers.aead import AESGCM +key = os.environ["ENCRYPTION_KEY"].encode() +uid = os.environ["USER_ID"].encode() +nonce = os.urandom(12) +ct = AESGCM(key).encrypt(nonce, uid, None) +print(base64.urlsafe_b64encode(nonce + ct).rstrip(b"=").decode()) +PY +} + +# --------------------------------------------------------------------------- +# Auth token + webhook registration +# --------------------------------------------------------------------------- +step "Mint auth token (user=$USER_ID)" +TOKEN="$(mint_token)" || die "token generation failed" +[ -n "$TOKEN" ] || die "empty token" +AUTH="Authorization: Bearer $TOKEN" +pass "Token minted (${TOKEN:0:16}…)" + +step "Register webhook" +REG_RESP="$(curl -fsS -X POST "$API/api/v1/webhooks" \ + -H "$AUTH" -H "Content-Type: application/json" \ + -d "{\"url\":\"$WEBHOOK_RECEIVER_URL\",\"secret\":\"$WEBHOOK_SECRET\",\"events\":[\"job.starting\",\"job.started\",\"job.done\",\"job.failed\"]}")" \ + || die "webhook registration request failed" +WEBHOOK_ID="$(echo "$REG_RESP" | jq -r '.data.id // empty')" +if [ -n "$WEBHOOK_ID" ]; then + pass "Webhook registered (id=$WEBHOOK_ID -> $WEBHOOK_RECEIVER_URL)" +else + die "webhook registration returned no id: $REG_RESP" +fi + +# --------------------------------------------------------------------------- +# Core pipeline runner (per asset) +# --------------------------------------------------------------------------- +# run_asset