apify · mtrunkat · May 16, 2026 · May 15, 2026
diff --git a/mongodb-query-index-check/README.md b/mongodb-query-index-check/README.md
@@ -0,0 +1,113 @@
+# `mongodb-query-index-check` GitHub Action
+
+Reviews a pull request for **MongoDB queries that don't use an appropriate index**. For every changed or new MongoDB call (`.find`, `.findOne`, `.aggregate`, `.update*`, `.delete*`, `.findOneAnd*`, `.countDocuments`, `.distinct`, …) the action:
+
+1. Resolves the canonical index definitions in [`@apify-packages/mongo-indexes`](https://github.com/apify/apify-core/tree/develop/src/packages/mongo-indexes/src), either sparse-checked-out from `apify/apify-core@develop` or read straight from the caller's workspace when the action runs on `apify-core` itself.
+2. Invokes [`anthropics/claude-code-action`](https://github.com/anthropics/claude-code-action) with `claude-opus-4-7` to cross-reference each query's filter and sort fields against those indexes, apply an ESR-aware rubric (Equality → Sort → Range), and post inline review comments with severity tags (`🔴 critical`, `🟠 high`, `🟡 medium`, `🟢 low`).
+3. Fails the check whenever a finding is reported (unless `request-changes: false`) — useful as a required check in branch protection.
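+
+The ESR guideline can be sketched in TypeScript for illustration. This is not code the action runs (the rubric is applied by Claude); it assumes a query has already been reduced to hypothetical equality, sort, and range field lists, and an index to its ordered key names. An index that fails this check may still be used by MongoDB, only less efficiently.
+
+```typescript
+// Hypothetical shapes for illustration; not the types used by mongo-indexes.
+export interface QueryShape {
+  equality: string[]; // fields matched by exact value, e.g. { userId: id }
+  sort: string[]; // sort-spec fields, in sort order
+  range: string[]; // fields bounded by $gt/$gte/$lt/$lte (and similar)
+}
+
+/**
+ * Returns true when the index's leading keys follow ESR order for the query:
+ * every equality field first (in any order), then the sort fields in sort
+ * order, then the range fields (in any order). Extra trailing keys are allowed.
+ */
+export function followsEsr(query: QueryShape, indexKeys: string[]): boolean {
+  const eq = new Set(query.equality);
+  let pos = 0;
+
+  // Equality: the first |eq| index keys must be exactly the equality fields.
+  const eqPrefix = indexKeys.slice(pos, pos + eq.size);
+  if (eqPrefix.length !== eq.size || !eqPrefix.every((k) => eq.has(k))) return false;
+  pos += eq.size;
+
+  // Sort: sorting on a field pinned by an equality match is free, so skip those.
+  for (const field of query.sort.filter((f) => !eq.has(f))) {
+    if (indexKeys[pos] !== field) return false;
+    pos += 1;
+  }
+
+  // Range: the next |range| index keys must be exactly the range fields.
+  const rng = new Set(query.range);
+  const rngPrefix = indexKeys.slice(pos, pos + rng.size);
+  return rngPrefix.length === rng.size && rngPrefix.every((k) => rng.has(k));
+}
+```
+
+For a query on `userId` (equality) and `createdAt` (range) sorted by `{ name: 1 }`, the check accepts the key order `userId, name, createdAt` and rejects `userId, createdAt, name`, which would typically require a blocking in-memory sort.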
+
+The action runs a cheap pre-filter first (it lists PR files, glob-matches, and grep-checks for MongoDB call patterns in changed hunks) and only invokes Claude when something relevant changed. Repos that never touch MongoDB pay only the GitHub API cost of `pulls.listFiles`.
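+
+A minimal sketch of the grep stage, assuming it inspects only the added (`+`) lines of each file's `patch` as returned by `pulls.listFiles`. The regex mirrors the method list above but is hypothetical, as is the `patchTouchesMongo` name; the real pattern lives in `index.mts` and may differ.
+
+```typescript
+// Hypothetical pattern covering the methods listed above.
+const MONGO_CALL =
+  /\.(find|findOne|findOneAnd\w+|aggregate|update\w*|delete\w*|countDocuments|distinct)\s*\(/;
+
+/** Returns true if any added line of a unified-diff patch looks like a MongoDB call. */
+export function patchTouchesMongo(patch: string | undefined): boolean {
+  // The GitHub API may omit `patch` (e.g. for binary or very large diffs).
+  if (!patch) return false;
+  return patch
+    .split("\n")
+    // Keep added lines only; skip a "+++" file header in case one is present.
+    .filter((line) => line.startsWith("+") && !line.startsWith("+++"))
+    .some((line) => MONGO_CALL.test(line));
+}
+```
+
+As written, `items.find((x) => x.ok)` also matches, which is the false positive described under Limitations.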
+
+## Usage
+
+### `apify-core` (the action reads the indexes from the local workspace)
+
+```yaml
+# .github/workflows/mongodb_query_index_check.yaml
+name: MongoDB query index check
+
+on:
+  pull_request:
+    types: [opened, reopened, synchronize, ready_for_review]
+
+jobs:
+  check:
+    if: github.event.pull_request.draft == false
+    runs-on: ubuntu-22.04-arm64
+    permissions:
+      contents: read
+      pull-requests: write
+      id-token: write
+    steps:
+      - uses: actions/checkout@v6
+      - uses: apify/actions/mongodb-query-index-check@v1
+        with:
+          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
+```
+
+### `apify-proxy`, `apify-web`, … (the action fetches indexes from `apify-core`)
+
+```yaml
+# .github/workflows/mongodb_query_index_check.yaml
+name: MongoDB query index check
+
+on:
+  pull_request:
+    types: [opened, reopened, synchronize, ready_for_review]
+
+jobs:
+  check:
+    if: github.event.pull_request.draft == false
+    runs-on: ubuntu-22.04-arm64
+    permissions:
+      contents: read
+      pull-requests: write
+      id-token: write
+    steps:
+      - uses: actions/checkout@v6
+      - uses: apify/actions/mongodb-query-index-check@v1
+        with:
+          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
+          # PAT with `contents: read` on apify/apify-core. The default GITHUB_TOKEN only sees the
+          # current repo; without this input the action looks for the indexes in the local
+          # workspace instead and fails because they are not there.
+          apify-core-token: ${{ secrets.APIFY_CORE_RO_TOKEN }}
+```
+
+## Inputs
+
+| Name | Required | Default | Description |
+| --- | --- | --- | --- |
+| `anthropic-api-key` | yes | — | Anthropic API key passed through to `anthropics/claude-code-action`. |
+| `github-token` | no | `${{ github.token }}` | Token used to list the PR's files and post review comments. |
+| `apify-core-token` | no | _(empty)_ | When set, fetches `mongo-indexes` from `apify/apify-core@develop`. When empty, the action assumes it is running on `apify-core` and reads `src/packages/mongo-indexes/src` from the workspace. |
+| `max-turns` | no | `30` | Maximum turns Claude may take. |
+| `paths` | no | `.ts`, `.mts`, `.cts`, `.tsx`, `.js`, `.mjs`, `.cjs`, `.jsx` files | Comma-separated globs, matched against PR file paths, to include. |
+| `request-changes` | no | `true` | When `true`, fail the check on any finding. When `false`, comment only. |
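+
+To illustrate how `paths` combines with the fixed exclude list (`INPUT_PATHS_IGNORE` in `action.yaml`), here is a sketch with a deliberately tiny glob-to-RegExp converter. `globToRegExp` and `isCandidate` are hypothetical and support only `**`, `*`, and `?`; `index.mts` may rely on a full glob library with different edge-case semantics.
+
+```typescript
+/** Converts a simple glob (`**`, `*`, `?`) into an anchored RegExp. */
+export function globToRegExp(glob: string): RegExp {
+  let re = "";
+  for (let i = 0; i < glob.length; i++) {
+    const c = glob[i];
+    if (c === "*" && glob[i + 1] === "*") {
+      // "**/" matches zero or more directories; a trailing "**" matches anything.
+      if (glob[i + 2] === "/") {
+        re += "(?:.*/)?";
+        i += 2;
+      } else {
+        re += ".*";
+        i += 1;
+      }
+    } else if (c === "*") {
+      re += "[^/]*"; // within one path segment
+    } else if (c === "?") {
+      re += "[^/]";
+    } else {
+      re += c.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
+    }
+  }
+  return new RegExp(`^${re}$`);
+}
+
+const splitGlobs = (csv: string): RegExp[] =>
+  csv.split(",").map((g) => g.trim()).filter(Boolean).map(globToRegExp);
+
+/** True when `file` matches at least one include glob and no ignore glob. */
+export function isCandidate(file: string, pathsCsv: string, ignoreCsv: string): boolean {
+  const include = splitGlobs(pathsCsv);
+  const ignore = splitGlobs(ignoreCsv);
+  return include.some((r) => r.test(file)) && !ignore.some((r) => r.test(file));
+}
+```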
+
+## Outputs
+
+| Name | Description |
+| --- | --- |
+| `should-run` | `true` when the pre-filter detected MongoDB changes and Claude was invoked, `false` otherwise. |
+| `changed-files` | JSON array of files Claude reviewed. |
+| `max-severity` | Highest severity found: `none`, `low`, `medium`, `high`, or `critical`. |
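+
+`max-severity` is the maximum over an ordered five-step scale. In the action, Claude writes that single word itself, so the helper below is only a hypothetical sketch of the ordering, assuming findings are available as severity strings.
+
+```typescript
+// Ordered from least to most severe; array position defines the ranking.
+export const SEVERITIES = ["none", "low", "medium", "high", "critical"] as const;
+export type Severity = (typeof SEVERITIES)[number];
+
+/** Returns the highest severity in `findings`, or "none" when there are no findings. */
+export function maxSeverity(findings: Severity[]): Severity {
+  return findings.reduce<Severity>(
+    (max, s) => (SEVERITIES.indexOf(s) > SEVERITIES.indexOf(max) ? s : max),
+    "none",
+  );
+}
+```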
+
+## How it works
+
+1. **Validate inputs**: checks the event is `pull_request[_target]`, rejects fork PRs, validates `request-changes`, and seeds `$RESULT_PATH` for the Finalize step.
+2. **Pre-filter** (`index.mts` → `preCheck()`): pages through `pulls.listFiles`, applies the `paths` globs and a fixed exclude list (`node_modules`, `dist`, `build`, test files and directories, and the `mongo-indexes` package itself), and greps the changed hunks for MongoDB collection-method patterns. If nothing matches, it sets `should-run=false` and every later step is skipped, so no Anthropic credits are spent.
+3. **Source resolution**: when `apify-core-token` is set, sparse-checks out `src/packages/mongo-indexes/src` from `apify/apify-core@develop` into `__mongo_index_check_apify_core/` in the workspace; otherwise points at the caller's own `src/packages/mongo-indexes/src`. Fails if the resolved directory does not exist.
+4. **Prompt render**: substitutes the PR metadata (number, repository, base and head SHAs), the changed-files list path, the mongo-indexes directory, the result-file path, and the request-changes mode into `prompts/review.md` via `envsubst` with an explicit variable list.
+5. **Claude Code run**: invokes `anthropics/claude-code-action@v1` with `--model claude-opus-4-7` and a tight tool allowlist: GitHub MCP tools for reading the pull request and writing a pending review with inline comments, `Read`, `Write` (for the result file), read-only shell utilities (`ls`, `cat`, `grep`, `rg`, `find`, `head`, `tail`, …), and `gh pr diff`, `gh pr view`, `gh pr review`, and `gh api` (the last two can write to GitHub).
+6. **Finalize**: reads the single-word severity Claude wrote to `${RUNNER_TEMP}/mongo-index-result.txt` (seeded with `none`); an unrecognized value is treated as `none` with a warning. Exits non-zero when `request-changes: true` and the severity is anything other than `none`; otherwise succeeds.
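+
+The explicit variable list passed to `envsubst` matters for this prompt. Without it, `envsubst` replaces every `$NAME`-shaped token and turns unset ones into empty strings, which would silently strip literal MongoDB operators such as `$or` or `$regex` from the template. A minimal TypeScript sketch of the allowlisted behavior (`renderTemplate` is a hypothetical name):
+
+```typescript
+/**
+ * Substitutes only allowlisted variables, in `$NAME` or `${NAME}` form.
+ * Any other `$token` (e.g. a MongoDB `$or`) is left untouched.
+ */
+export function renderTemplate(
+  template: string,
+  vars: Record<string, string>,
+  allow: readonly string[],
+): string {
+  const allowed = new Set(allow);
+  return template.replace(
+    /\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))/g,
+    (match: string, braced: string | undefined, bare: string | undefined) => {
+      const name = braced ?? bare ?? "";
+      if (!allowed.has(name)) return match; // not allowlisted: keep verbatim
+      return vars[name] ?? ""; // allowlisted but unset: empty string
+    },
+  );
+}
+```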
+
+## Severity rubric
+
+| Severity | Symptom |
+| --- | --- |
+| 🔴 critical | No index supports the query's filter: collection scan (`COLLSCAN`). |
+| 🟠 high | Index exists but doesn't match: prefix missed, partial-filter incompatible, sort can't use the index, unanchored `$regex` on indexed field. |
+| 🟡 medium | Index used but inefficient: low selectivity, likely poor read/return ratio, wrong sort direction, `$or` branch without an index. |
+| 🟢 low | Stylistic: tighter partial filter, covered-query opportunity, missing index name. |
+
+Any finding turns the check red unless `request-changes` is set to `false`.
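+
+The Finalize gate can be restated as a pure function. This sketch mirrors the Bash in `action.yaml` (strip whitespace, lowercase, fall back to `none`, fail only when `request-changes` is `true` and the severity is not `none`); the `finalize` name and result shape are hypothetical.
+
+```typescript
+const KNOWN = new Set(["none", "low", "medium", "high", "critical"]);
+
+export interface FinalizeResult {
+  maxSeverity: string; // value exposed as the `max-severity` output
+  fail: boolean; // whether the step should exit non-zero
+}
+
+/** Mirrors the Finalize step: normalize the result file's contents, then gate. */
+export function finalize(resultFile: string, requestChanges: "true" | "false"): FinalizeResult {
+  // Equivalent of `tr -d '[:space:]' | tr '[:upper:]' '[:lower:]'`.
+  let sev = resultFile.replace(/\s+/g, "").toLowerCase();
+  if (!KNOWN.has(sev)) {
+    // The Bash version emits a ::warning:: here and treats the value as "none".
+    sev = "none";
+  }
+  return { maxSeverity: sev, fail: requestChanges === "true" && sev !== "none" };
+}
+```
+
+One consequence: if Claude writes an unexpected word, or never overwrites the seeded `none`, the check passes (with a warning only in the first case).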
+
+## Limitations
+
+- **Fork PRs are rejected**: the action's Validate step fails fast when `head.repo` differs from `base.repo`. On `pull_request_target` this would otherwise hand a write-capable token to Claude while it analyses attacker-controlled diff content (prompt-injection risk); on `pull_request`, GitHub does not pass repository secrets to fork-triggered workflows, so the Anthropic API key would be empty anyway. Internal PRs only.
+- **JS array methods**: the pre-filter regex matches `.find(`, `.findOne(`, etc. on any object, so `array.find(x => …)` still triggers Claude to look — Claude then disambiguates by inspecting the receiver. This errs on the side of running more often, never less.
+- **Dynamic collection access** (e.g. `db[name].findOne(...)`): Claude is instructed to skip findings where it can't determine the collection reliably.
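+
+A rough heuristic for the dynamic-access case, for illustration only: the action does not run code like this and leaves the decision to Claude. `receiverIsResolvable` is hypothetical and looks only at the receiver text in front of the method call.
+
+```typescript
+/**
+ * Hypothetical heuristic (not part of the action): can the collection behind
+ * `<receiver>.findOne(...)` be named from the source text alone?
+ */
+export function receiverIsResolvable(receiver: string): boolean {
+  const r = receiver.trim();
+  // Computed access with a string-literal key, e.g. db["users"]: resolvable.
+  if (/\[\s*(['"`])[\w.-]+\1\s*\]$/.test(r)) return true;
+  // Any other computed access, e.g. db[name] or db[getName()]: not resolvable.
+  if (/\[[^\]]*\]$/.test(r)) return false;
+  // Plain identifiers and member chains, e.g. users or this.models.users.
+  return /^[\w$.]+$/.test(r);
+}
+```
+
+A plain identifier such as `items` passes this check even when it holds an array, so receiver text alone cannot rule out the `.find` false positive; that judgment still needs Claude.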
+
+## Releasing a new version
+
+This action is published as part of the `apify/actions` repo. See the [repo README](../README.md) for the release-please flow.
diff --git a/mongodb-query-index-check/action.yaml b/mongodb-query-index-check/action.yaml
@@ -0,0 +1,186 @@
+name: 'MongoDB Query Index Check'
+description: >-
+  Reviews a pull request for MongoDB queries that don't use an appropriate index. Cross-references
+  changed queries against the canonical mongo-indexes definitions, then asks Claude Code to post
+  inline review comments and (optionally) fail the check when any finding is reported.
+
+inputs:
+  anthropic-api-key:
+    description: 'Anthropic API key passed through to anthropics/claude-code-action.'
+    required: true
+  github-token:
+    description: 'GitHub token used to list PR files and post review comments. Defaults to GITHUB_TOKEN.'
+    required: false
+    default: ${{ github.token }}
+  apify-core-token:
+    description: >-
+      GitHub token with `contents: read` on apify/apify-core. When set, the action sparse-checkouts
+      `src/packages/mongo-indexes/src` from apify/apify-core@develop. When empty (the default), the
+      action assumes it is running on apify/apify-core itself and reads the indexes from the
+      caller's already-checked-out workspace at `src/packages/mongo-indexes/src`.
+    required: false
+    default: ''
+  max-turns:
+    description: 'Maximum turns Claude may take. Default 30.'
+    required: false
+    default: '30'
+  paths:
+    description: >-
+      Comma-separated glob patterns of files to inspect (matched against PR file paths).
+      Default covers TypeScript and JavaScript source files.
+    required: false
+    default: '**/*.ts,**/*.mts,**/*.cts,**/*.tsx,**/*.js,**/*.mjs,**/*.cjs,**/*.jsx'
+  request-changes:
+    description: 'If `true`, fail the check when any finding is reported. If `false`, leave comments only.'
+    required: false
+    default: 'true'
+
+outputs:
+  should-run:
+    description: '`true` when MongoDB-related changes were detected (and Claude was invoked).'
+    value: ${{ steps.pre-check.outputs.should-run }}
+  changed-files:
+    description: 'JSON array of files Claude reviewed.'
+    value: ${{ steps.pre-check.outputs.changed-files }}
+  max-severity:
+    description: 'Highest severity in the review: one of `none`, `low`, `medium`, `high`, `critical`.'
+    value: ${{ steps.finalize.outputs.max-severity || 'none' }}
+
+runs:
+  using: composite
+  steps:
+    - name: Validate inputs
+      shell: bash
+      env:
+        EVENT_NAME: ${{ github.event_name }}
+        HEAD_REPO: ${{ github.event.pull_request.head.repo.full_name }}
+        BASE_REPO: ${{ github.event.pull_request.base.repo.full_name }}
+        REQUEST_CHANGES_INPUT: ${{ inputs.request-changes }}
+      run: |
+        set -euo pipefail
+        if [ "$EVENT_NAME" != "pull_request" ] && [ "$EVENT_NAME" != "pull_request_target" ]; then
+          echo "::error::This action only runs on 'pull_request' or 'pull_request_target' events (got '$EVENT_NAME')."
+          exit 1
+        fi
+        # Reject fork PRs: on `pull_request_target`, Claude would run with the base repo's secrets
+        # (anthropic-api-key, apify-core-token) and a write-capable github-token while analyzing
+        # attacker-controlled diff content, which is a prompt-injection vector. On `pull_request`
+        # it's mostly harmless (secrets aren't exposed to forks), but we still bail out so the
+        # action behaves predictably.
+        if [ -n "$HEAD_REPO" ] && [ -n "$BASE_REPO" ] && [ "$HEAD_REPO" != "$BASE_REPO" ]; then
+          echo "::error::This action does not support pull requests from forks ('$HEAD_REPO' → '$BASE_REPO'). Re-run from a branch in the base repository."
+          exit 1
+        fi
+        case "$REQUEST_CHANGES_INPUT" in
+          true|false) ;;
+          *) echo "::error::Invalid request-changes '$REQUEST_CHANGES_INPUT' (must be the literal string 'true' or 'false')."; exit 1 ;;
+        esac
+        # Seed the result file so Finalize (runs with `if: always()`) always sees a defined $RESULT_PATH.
+        echo "RESULT_PATH=${RUNNER_TEMP}/mongo-index-result.txt" >> "$GITHUB_ENV"
+        printf 'none' > "${RUNNER_TEMP}/mongo-index-result.txt"
+
+    - name: Pre-check PR diff
+      id: pre-check
+      uses: actions/github-script@v8
+      env:
+        INPUT_PATHS: ${{ inputs.paths }}
+        INPUT_PATHS_IGNORE: '**/node_modules/**,**/dist/**,**/build/**,**/test/**,**/__tests__/**,**/*.test.*,**/*.spec.*,**/mongo-indexes/**'
+        OUTPUT_CHANGED_FILES_PATH: ${{ runner.temp }}/mongo-index-changed-files.json
+      with:
+        github-token: ${{ inputs.github-token }}
+        script: |
+          const { preCheck } = require('${{ github.action_path }}/index.mts');
+          await preCheck({ github, context, core, env: process.env });
+
+    - name: Checkout apify-core mongo-indexes
+      if: steps.pre-check.outputs.should-run == 'true' && inputs.apify-core-token != ''
+      uses: actions/checkout@v6
+      with:
+        repository: apify/apify-core
+        ref: develop
+        token: ${{ inputs.apify-core-token }}
+        path: __mongo_index_check_apify_core
+        sparse-checkout: src/packages/mongo-indexes/src
+        sparse-checkout-cone-mode: false
+        fetch-depth: 1
+
+    - name: Resolve mongo-indexes directory
+      if: steps.pre-check.outputs.should-run == 'true'
+      shell: bash
+      env:
+        APIFY_CORE_TOKEN: ${{ inputs.apify-core-token }}
+      run: |
+        set -euo pipefail
+        if [ -n "$APIFY_CORE_TOKEN" ]; then
+          indexes_dir="${GITHUB_WORKSPACE}/__mongo_index_check_apify_core/src/packages/mongo-indexes/src"
+          origin_label="apify-core@develop"
+        else
+          indexes_dir="${GITHUB_WORKSPACE}/src/packages/mongo-indexes/src"
+          origin_label="local workspace (assuming caller is apify-core)"
+        fi
+        if [ ! -d "$indexes_dir" ]; then
+          echo "::error::Could not find mongo-indexes source directory at: $indexes_dir"
+          exit 1
+        fi
+        file_count=$(find "$indexes_dir" -maxdepth 2 -type f -name '*.ts' | wc -l)
+        echo "Resolved mongo-indexes from ${origin_label}: ${indexes_dir} (${file_count} .ts file(s))."
+        echo "MONGO_INDEXES_DIR=${indexes_dir}" >> "$GITHUB_ENV"
+
+    - name: Render Claude prompt
+      id: render
+      if: steps.pre-check.outputs.should-run == 'true'
+      shell: bash
+      env:
+        PR_NUMBER: ${{ github.event.pull_request.number }}
+        REPO: ${{ github.repository }}
+        BASE_SHA: ${{ github.event.pull_request.base.sha }}
+        HEAD_SHA: ${{ github.event.pull_request.head.sha }}
+        CHANGED_FILES_PATH: ${{ runner.temp }}/mongo-index-changed-files.json
+        REQUEST_CHANGES_MODE: ${{ inputs.request-changes }}
+      run: |
+        set -euo pipefail
+        prompt_file="${RUNNER_TEMP}/mongo-index-prompt.md"
+        envsubst '$PR_NUMBER $REPO $BASE_SHA $HEAD_SHA $MONGO_INDEXES_DIR $CHANGED_FILES_PATH $RESULT_PATH $REQUEST_CHANGES_MODE' \
+          < "${GITHUB_ACTION_PATH}/prompts/review.md" \
+          > "$prompt_file"
+        delimiter="EOF_${RANDOM}${RANDOM}${RANDOM}"
+        {
+          echo "prompt<<${delimiter}"
+          cat "$prompt_file"
+          echo
+          echo "${delimiter}"
+        } >> "$GITHUB_OUTPUT"
+
+    - name: Run Claude Code review
+      if: steps.pre-check.outputs.should-run == 'true'
+      uses: anthropics/claude-code-action@v1
+      with:
+        anthropic_api_key: ${{ inputs.anthropic-api-key }}
+        github_token: ${{ inputs.github-token }}
+        prompt: ${{ steps.render.outputs.prompt }}
+        claude_args: >-
+          --max-turns ${{ inputs.max-turns }}
+          --model claude-opus-4-7
+          --allowedTools "mcp__github__pull_request_read,mcp__github__pull_request_review_write,mcp__github__create_pending_pull_request_review,mcp__github__submit_pending_pull_request_review,mcp__github__add_comment_to_pending_review,mcp__github_inline_comment__create_inline_comment,Read,Write,Bash(ls:*),Bash(cat:*),Bash(grep:*),Bash(rg:*),Bash(find:*),Bash(test:*),Bash(echo:*),Bash(printf:*),Bash(head:*),Bash(tail:*),Bash(gh pr diff:*),Bash(gh pr view:*),Bash(gh pr review:*),Bash(gh api:*)"
+
+    - name: Finalize review result
+      id: finalize
+      if: always() && steps.pre-check.outputs.should-run == 'true'
+      shell: bash
+      env:
+        INPUT_REQUEST_CHANGES: ${{ inputs.request-changes }}
+      run: |
+        set -euo pipefail
+
+        max_severity="$(tr -d '[:space:]' < "$RESULT_PATH" | tr '[:upper:]' '[:lower:]')"
+        case "$max_severity" in
+          none|low|medium|high|critical) ;;
+          *) echo "::warning::Unexpected severity value '${max_severity}' in result file, treating as 'none'."; max_severity="none" ;;
+        esac
+
+        echo "max-severity=${max_severity}" >> "$GITHUB_OUTPUT"
+        echo "Max severity from review: ${max_severity}."
+
+        if [ "$INPUT_REQUEST_CHANGES" = "true" ] && [ "$max_severity" != "none" ]; then
+          echo "::error::MongoDB index review found '${max_severity}' issues. See the inline review comments on this PR."
+          exit 1
+        fi