Skip to content

fix: don't coerce URL strings in str-typed inputs (regression #2868)#2872

Merged
markphelps merged 3 commits intomainfrom
input-url-files-regression
Mar 26, 2026
Merged

fix: don't coerce URL strings in str-typed inputs (regression #2868)#2872
markphelps merged 3 commits intomainfrom
input-url-files-regression

Conversation

@markphelps
Copy link
Contributor

Summary

  • Adds integration tests reproducing #2868: str-typed predictor inputs containing URLs (e.g. API endpoints) are incorrectly coerced into cog.Path/cog.File objects by coerce_url_strings() in coglet
  • Tests cover both str and list[str] input types, via both cog predict CLI and cog serve HTTP API
  • These tests are expected to fail on the current codebase — the fix will follow

Root Cause

coerce_url_strings() in crates/coglet-python/src/input.rs:186-234 iterates over all keys in the input payload. The file_fields set is only used to choose between File.validate() and Path.validate(), but fields not in file_fields still get coerced via Path.validate(). The function should skip fields whose declared type is neither File nor Path.

Test Plan

Two new txtar integration tests:

  • string_input_url_not_coerced.txtar — single str input with a URL value
  • string_input_url_list_not_coerced.txtarlist[str] input with URL values

Closes #2868

#2868)

str-typed inputs that contain URLs (e.g. API endpoints) are incorrectly
coerced into cog.Path/File objects by coerce_url_strings() in coglet.
These tests assert that str and list[str] inputs preserve URL strings
as-is. Both tests are expected to FAIL on the current codebase.
coerce_url_strings() was iterating over all input fields and coercing
any URL-like string via Path.validate(), even for fields declared as
plain str. This meant API endpoint URLs passed as string inputs were
incorrectly converted into cog.Path objects.

Refactor detect_file_fields() into classify_fields() which returns a
FieldClassification with separate sets for File-typed and Path-typed
fields. coerce_url_strings() now skips any field not in either set,
leaving str and other non-file types untouched.
@erayack
Copy link

erayack commented Mar 26, 2026

Hi @markphelps thanks for putting this together. I opened #2870 with a fix for the same regression; if helpful, please feel free to cherry-pick any parts that fit your approach. I’m happy to close mine if you prefer landing everything through this PR.

@markphelps markphelps marked this pull request as ready for review March 26, 2026 20:30
@markphelps markphelps requested a review from a team as a code owner March 26, 2026 20:30
@markphelps
Copy link
Contributor Author

Hey @erayack !

Thanks for the offer, and for the detailed work in #2870! We ended up arriving at essentially the same core fix, only coercing fields that are actually typed as File or Path.

We went with a slightly simpler approach here (two HashSets instead of the CoercionPlan enum), mainly to keep the change minimal and avoid the FallbackAllPath variant, which would preserve the coerce-everything behavior when type hints fail to resolve. The original code already returned an empty set in that case (meaning no coercion), which seems like the safer default.

One thing I noticed in your PR that we didn't handle: explicit support for PEP 604 unions (types.UnionType for the X | Y syntax). That's a legitimate edge case on Python 3.10+ — would you be up for opening a follow-up PR for that?

Going to close #2870 in favor of this one, but really appreciate you jumping on this quickly.

@erayack
Copy link

erayack commented Mar 26, 2026

Thanks for the thoughtful review and for calling this out @markphelps! It makes sense to keep the fix minimal and avoid fallback coercion.

I appreciate you highlighting the X | Y (types.UnionType) gap — I’ll open a small follow-up PR focused just on PEP 604 union handling.

Copy link
Member

@michaeldwan michaeldwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice fix, looks good

@markphelps markphelps added this pull request to the merge queue Mar 26, 2026
Merged via the queue into main with commit 635fffb Mar 26, 2026
35 checks passed
@markphelps markphelps deleted the input-url-files-regression branch March 26, 2026 23:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Input: Not all URLs are files, some are to APIs, unable to disable. 0.16.2 regression

3 participants