fix: don't coerce URL strings in str-typed inputs (regression #2868)#2872
fix: don't coerce URL strings in str-typed inputs (regression #2868)#2872markphelps merged 3 commits intomainfrom
Conversation
#2868) str-typed inputs that contain URLs (e.g. API endpoints) are incorrectly coerced into cog.Path/File objects by coerce_url_strings() in coglet. These tests assert that str and list[str] inputs preserve URL strings as-is. Both tests are expected to FAIL on the current codebase.
coerce_url_strings() was iterating over all input fields and coercing any URL-like string via Path.validate(), even for fields declared as plain str. This meant API endpoint URLs passed as string inputs were incorrectly converted into cog.Path objects. Refactor detect_file_fields() into classify_fields() which returns a FieldClassification with separate sets for File-typed and Path-typed fields. coerce_url_strings() now skips any field not in either set, leaving str and other non-file types untouched.
|
Hi @markphelps thanks for putting this together. I opened #2870 with a fix for the same regression; if helpful, please feel free to cherry-pick any parts that fit your approach. I’m happy to close mine if you prefer landing everything through this PR. |
|
Hey @erayack ! Thanks for the offer, and for the detailed work in #2870! We ended up arriving at essentially the same core fix, only coercing fields that are actually typed as We went with a slightly simpler approach here (two One thing I noticed in your PR that we didn't handle: explicit support for PEP 604 unions (types.UnionType for the X | Y syntax). That's a legitimate edge case on Python 3.10+ — would you be up for opening a follow-up PR for that? Going to close #2870 in favor of this one, but really appreciate you jumping on this quickly. |
|
Thanks for the thoughtful review and for calling this out @markphelps! It makes sense to keep the fix minimal and avoid fallback coercion. I appreciate you highlighting the |
Summary
str-typed predictor inputs containing URLs (e.g. API endpoints) are incorrectly coerced intocog.Path/cog.Fileobjects bycoerce_url_strings()in cogletstrandlist[str]input types, via bothcog predictCLI andcog serveHTTP APIRoot Cause
coerce_url_strings()incrates/coglet-python/src/input.rs:186-234iterates over all keys in the input payload. Thefile_fieldsset is only used to choose betweenFile.validate()andPath.validate(), but fields not infile_fieldsstill get coerced viaPath.validate(). The function should skip fields whose declared type is neitherFilenorPath.Test Plan
Two new txtar integration tests:
string_input_url_not_coerced.txtar— singlestrinput with a URL valuestring_input_url_list_not_coerced.txtar—list[str]input with URL valuesCloses #2868