Skip to content

fix(cli): support $-prefixed pagination params for Socrata-style APIs#1204

Open
youdidwhat-towho wants to merge 1 commit into
mvanhorn:mainfrom
youdidwhat-towho:fix/socrata-prefix-pagination
Open

fix(cli): support $-prefixed pagination params for Socrata-style APIs#1204
youdidwhat-towho wants to merge 1 commit into
mvanhorn:mainfrom
youdidwhat-towho:fix/socrata-prefix-pagination

Conversation

@youdidwhat-towho
Copy link
Copy Markdown

Problem

Press's generator emits paginated-GET handlers that put each Param's Name directly into the URL query string. Socrata-backed APIs (NYC OpenData and ~70 other major US municipal portals) require a literal $ prefix on pagination + SoQL params:

  • $limit, $offset (pagination)
  • $where, $select, $order, $group (SoQL passthrough)

Generated CLIs against Socrata returned HTTP 400 Unrecognized arguments [limit] on every live call.

Fix (Option 2 — per-param url_name override)

Adds an optional url_name field to spec.Param that overrides the URL query-key emission while leaving the CLI flag name, Go identifier, and JSON body emission unchanged.

params:
  - name: limit
    url_name: "$limit"   # NEW — optional
    type: integer

--limit stays the CLI flag, flagLimit stays the Go identifier, but the URL emits ?$limit=N.

Why this approach (vs auto-detect)

Auto-detecting *.socrata.com / *.cityofnewyork.us would couple the generator to URL patterns, which doesn't scale to other custom-prefix APIs. A spec-level opt-in is non-breaking and explicit — existing specs work unchanged, new Socrata specs declare the prefix where they declare every other param attribute.

What's NOT touched

  • Path substitution, JSON body keys, MCP tool bindings still use Name. url_name affects URL query keys only.
  • Pagination's limit_param / cursor_param already accepted user-supplied strings, so spec-level pagination config already worked. This fixes the remaining gap for non-pagination SoQL params ($where, $select, etc.) where the user declares them as ordinary params.
  • ArcGIS, REST APIs, GraphQL — anything without url_name declared keeps emitting plain Name.

Tests

New file internal/generator/url_name_test.go:

  • TestParamWireNameUnitspec.Param.WireName() returns URLName when set, Name when empty
  • TestParamURLNameOverridesWireKey — generated handler emits params["$limit"] for a param with url_name: "$limit", plain params["borough"] for a sibling param without override, and CLI flag identifiers stay clean (no $ leak into Go)
  • TestParamWithoutURLNameUnchanged — regression guard: specs without url_name keep plain params["limit"], params["owner"] emission

Full internal/spec/ + internal/generator/ test suite green:

ok  internal/spec      0.555s
ok  internal/generator 82.551s

Live verification

Before (working around the bug by stripping limit from the spec):

nyc-liens-pp-cli records query --borough 3 --water-debt-only YES
# HTTP 400 if any of limit / where / select supplied

After (with url_name declared):

nyc-liens-pp-cli records query --borough 3 --water-debt-only YES --limit 2
# Returns 2 valid Brooklyn water-only lien records, structured JSON

Regression checkbexar-cad-pp-cli (ArcGIS, no url_name anywhere):

bexar-cad-pp-cli parcels query --where "Owner LIKE '%PULTE%'" --result-record-count 1
# Returns live PULTE HOMES TX data, unchanged

Inspected generated handler for bexar-cad-test confirms plain emission preserved:

params["f"] = fmt.Sprintf("%v", flagF)
params["outFields"] = fmt.Sprintf("%v", flagOutFields)
params["geometry"] = fmt.Sprintf("%v", flagGeometry)

And for nyc-liens-test (mixed plain + url_name):

"borough":         fmt.Sprintf("%v", flagBorough),         // plain
"water_debt_only": fmt.Sprintf("%v", flagWaterDebtOnly),   // plain
"$where":          fmt.Sprintf("%v", flagWhere),           // url_name override
"$select":         fmt.Sprintf("%v", flagSelect),          // url_name override
"$limit":          fmt.Sprintf("%v", flagLimit),           // url_name override

Diff size

6 files, +166 / -16. Five files modified, one new test file.

Follow-up — NOT addressed in this PR

Press's id_field extraction fails on Socrata because Socrata only returns its :id row identifier when explicitly $select-ed. Generated sync runs produce all_items_failed_id_extraction warnings on Socrata datasets. The clean fix is to detect id_field: ":id" in the spec (or any field starting with :) and auto-inject $select=:id,* (or the union of explicit $select + :id) into the URL. Filing as a separate PR to keep this one scoped.

Unlocks

This change unblocks every Socrata-backed CLI Factory target:

  • NYC: ACRIS, PLUTO, DOF Property Charges Balance, HPD Violations, DOB Violations, Tax Lien Sale Lists, etc. (600+ datasets)
  • Cook County / Chicago, Boston, SF, Seattle, Austin, Denver — every major US city on Socrata

@mergify
Copy link
Copy Markdown

mergify Bot commented May 12, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟠 require-ready-label-and-ci

Waiting for

  • check-success = build-and-test
  • check-success = generated-test
  • check-success = go-lint
  • check-success = golden
  • check-success = pr-title
  • check-success = test
Waiting checks: build-and-test, generated-test, go-lint, golden, pr-title, test.
  • check-success = build-and-test
  • check-success = generated-test
  • check-success = go-lint
  • check-success = golden
  • check-success = pr-title
  • check-success = test
  • any of:
    • label = ready-to-merge
    • all of:
      • head = release-please--branches--main
      • title ~= ^chore\(main\): release
  • any of:
    • check-success = Greptile Review
    • check-neutral = Greptile Review
    • check-skipped = Greptile Review

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 12, 2026

Greptile Summary

This PR introduces an optional url_name field on spec.Param (and a corresponding WireName() method) to let spec authors override the URL query-key for a param while keeping the CLI flag name and Go identifier clean. The immediate motivation is Socrata-style APIs that require a literal $ prefix on query params ($limit, $where, etc.).

  • spec.Param.URLName / WireName() added in internal/spec/spec.go; a paramWireName template helper registered in generator.go / flag_collision.go.
  • All four GET-path code sites in both command_endpoint.go.tmpl and command_promoted.go.tmpl are updated to emit paramWireName instead of {{.Name}}.
  • Three new tests cover the WireName() unit behaviour, the generated wire-key override, and a regression guard for plain params.

Confidence Score: 4/5

Safe for Socrata GET-only use cases, but the fix is incomplete: POST, PUT, and DELETE endpoints with query params will silently ignore url_name and emit the wrong key in the URL.

All GET-based code paths are correctly updated and the spec/helper logic is sound. However, the same {{.Name}} interpolation that was fixed for GET still appears in every non-GET branch of command_endpoint.go.tmpl.

internal/generator/templates/command_endpoint.go.tmpl — seven non-GET params[{{.Name}}] sites need the same paramWireName substitution applied in this PR

Important Files Changed

Filename Overview
internal/spec/spec.go Adds URLName field to Param struct and WireName() method; clean, well-documented addition with no issues beyond the stale IdentName comment noted in prior review
internal/generator/flag_collision.go Adds paramWireName() helper and updates paramIdent() comment; correct and minimal change
internal/generator/generator.go Registers paramWireName as a template function; one-liner addition, no issues
internal/generator/templates/command_endpoint.go.tmpl GET code paths updated to paramWireName, but all POST/PUT/DELETE query-param code paths still use {{.Name}} directly — url_name is silently ignored on non-GET endpoints
internal/generator/templates/command_promoted.go.tmpl All four params-map emission sites updated to paramWireName; promoted template appears complete
internal/generator/url_name_test.go Three well-structured tests covering WireName unit behaviour, wire-key override, and regression guard; only tests GET endpoints so incomplete coverage for POST/PUT/DELETE gaps

Comments Outside Diff (1)

  1. internal/generator/templates/command_endpoint.go.tmpl, line 265-269 (link)

    P1 url_name silently ignored on POST/PUT/DELETE query params

    The PR updates every GET code path to emit paramWireName . but leaves all non-GET method branches unchanged. Lines 265/269 (POST multipart), 292/297 (POST form), 331/335 (POST JSON body + query params), 357/361 (DELETE), 386/390 (PUT multipart), 413/417 (PUT form), and 452/456 (PUT JSON) all still interpolate {{.Name}} directly into the params map that is sent as URL query string. Any spec that declares url_name on a param belonging to a non-GET endpoint will have the override silently ignored — the generated code will emit the bare Name instead of URLName in the URL, producing the same HTTP 400 class of failure this PR is fixing for GET.

    Fix in Codex Fix in Claude Code Fix in Cursor Fix in Conductor

Fix All in Codex Fix All in Claude Code Fix All in Cursor Fix All in Conductor

Reviews (2): Last reviewed commit: "fix(cli): support $-prefixed pagination ..." | Re-trigger Greptile

@tmchow tmchow added the ready-to-merge Allow Mergify to queue and merge this PR when protections pass label May 12, 2026
@mergify mergify Bot added the queued PR is in the Mergify merge queue label May 12, 2026
@mergify
Copy link
Copy Markdown

mergify Bot commented May 12, 2026

Merge Queue Status

  • Entered queue2026-05-12 15:50 UTC · Rule: default
  • 🚫 Left the queue2026-05-12 16:15 UTC · at 1e8bee05f48ef6e501b89e6690edf229d664d48f

This pull request spent 25 minutes 26 seconds in the queue, with no time running CI.

Reason

Pull request #1204 has been dequeued

The pull request rule doesn't match anymore

Hint

You should look at the reason for the failure and decide if the pull request needs to be fixed or if you want to requeue it.
If you do update this pull request, it will automatically be requeued once the queue conditions match again.
If you think this was a flaky issue instead, you can requeue the pull request, without updating it, by posting a @mergifyio queue comment.

@mergify mergify Bot added dequeued PR was removed from the Mergify merge queue and removed queued PR is in the Mergify merge queue labels May 12, 2026
@tmchow
Copy link
Copy Markdown
Collaborator

tmchow commented May 12, 2026

@youdidwhat-towho resolve conflicts

Adds an optional `url_name` field to spec.Param that, when set, overrides
the URL query-key while leaving the CLI flag, Go identifier, and JSON body
emission unchanged. Specs targeting Socrata-backed APIs (NYC OpenData and
~70 other major US municipal portals) can now declare:

    params:
      - name: limit
        url_name: "$limit"
        type: integer

and the generated handler emits params["$limit"] in the URL while keeping
--limit as the cobra flag.

Previously generated CLIs against Socrata returned HTTP 400
"Unrecognized arguments [limit]" because the wire-side key was the plain
Name. Specs without url_name are unaffected — verified by regenerating
bexar-cad-pp-cli and confirming params["f"], params["outFields"], etc.
remain plain.

Changes
- internal/spec/spec.go: add URLName field + Param.WireName() method
- internal/generator/flag_collision.go: add paramWireName helper, mirroring
  paramIdent. Updated paramIdent comment to clarify URL emission goes
  through paramWireName, not Name directly.
- internal/generator/generator.go: register paramWireName template helper
- internal/generator/templates/command_endpoint.go.tmpl: 6 emission sites
  switched from {{.Name}} to {{paramWireName .}}
- internal/generator/templates/command_promoted.go.tmpl: 8 emission sites
  switched from {{.Name}} to {{paramWireName .}}
- internal/generator/url_name_test.go: new file. Three tests cover
  WireName() unit behavior, $-prefixed URL emission via url_name, and
  the regression guard for plain specs.

Notes
- Path substitution, JSON body keys, and MCP tool bindings still use Name.
  url_name affects URL query keys only.
- Pagination's limit_param/cursor_param already accepted user-supplied
  strings (e.g. "$limit"), so the pagination loop alone was working;
  this fix closes the remaining gap for non-pagination SoQL params like
  $where, $select, $order.
- Secondary Socrata gap (id_field extraction needs $select=:id,*
  auto-injection) is NOT addressed here. Filed as follow-up.

Live-tested
- nyc-liens-pp-cli regen with url_name on limit/where/select returns
  valid Brooklyn water-only lien records from
  https://data.cityofnewyork.us/resource/9rz4-mjek.json against borough=3
  and water_debt_only=YES.
- bexar-cad-pp-cli regen unchanged — ArcGIS plain params preserved.

Signed-off-by: youdidwhat-towho <chris@lovephoenixhomes.com>
@tmchow tmchow force-pushed the fix/socrata-prefix-pagination branch from 1e8bee0 to af0731a Compare May 13, 2026 18:52
@mergify mergify Bot removed the dequeued PR was removed from the Mergify merge queue label May 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-to-merge Allow Mergify to queue and merge this PR when protections pass

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants