Replace PaddleOCR SDK with HTTP async API due to pyyaml conflict by Rander7 · Pull Request #3247 · langgenius/dify-official-plugins

Rander7 · 2026-06-04T02:11:33Z

Why This Change Is Necessary

The official PaddleOCR SDK has a pyyaml dependency conflict with dify_plugin:

dify_plugin requires pyyaml >= 6.0.3
paddleocr (via paddlex) requires pyyaml == 6.0.2

uv cannot resolve this conflict. This PR replaces SDK calls with direct HTTP requests to the async Job API, following the exact same implementation logic as the SDK (submit → poll → fetch).
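
The conflict can be illustrated with a minimal sketch in plain Python. This is an illustration only: the helper names are hypothetical, and the comparison is simplified to dotted integers rather than full PEP 440 version handling.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '6.0.2' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies_dify_plugin(version: str) -> bool:
    # dify_plugin requires pyyaml >= 6.0.3
    return parse_version(version) >= parse_version("6.0.3")


def satisfies_paddlex(version: str) -> bool:
    # paddleocr (via paddlex) requires pyyaml == 6.0.2
    return parse_version(version) == parse_version("6.0.2")


# Sample versions chosen for illustration; no version can pass both checks,
# because the only version paddlex accepts is below dify_plugin's minimum.
candidates = ["6.0.1", "6.0.2", "6.0.3", "6.1.0"]
compatible = [
    v for v in candidates if satisfies_dify_plugin(v) and satisfies_paddlex(v)
]
print(compatible)
```

The list comes out empty, which is the situation a resolver reports as unsatisfiable.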

Breaking Changes

None for end users - the tool interface and output format remain identical. The plugin continues to accept the same credentials and file inputs.

Internal Changes

Dependencies

No change: continues to use requests for HTTP calls
Removed: paddleocr>=3.6.0 SDK dependency

HTTP API Integration (follows SDK implementation)

Implemented _submit_job() - POST to /api/v2/ocr/jobs
Implemented _poll_job() - exponential backoff polling (3s → max 15s, 600s timeout)
Implemented _parse_ocr_result() / _parse_doc_parsing_result() - JSONL parsing
Authentication: Authorization: Bearer {token} + Client-Platform: dify
Result format: compatible dict structure matching SDK's output
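
The polling behaviour listed above could look roughly like the following sketch. It is not the PR's actual `_poll_job()`: the doubling factor, the `state` field and its values, and the `fetch_status` callback are assumptions made for illustration. Only the 3s initial delay, 15s cap, 600s timeout, and the two header names come from the description.

```python
import time
from typing import Callable, Dict


def build_headers(token: str) -> Dict[str, str]:
    """Headers named in the PR description: bearer token plus client platform."""
    return {
        "Authorization": f"Bearer {token}",
        "Client-Platform": "dify",
    }


def poll_job(
    fetch_status: Callable[[], dict],
    initial_delay: float = 3.0,
    max_delay: float = 15.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the job finishes, doubling the delay up to max_delay.

    fetch_status is a caller-supplied function that performs the HTTP GET
    and returns the decoded JSON. The "state" key and the values "done"
    and "failed" are assumptions for this sketch.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = fetch_status()
        state = status.get("state")
        if state == "done":
            return status
        if state == "failed":
            raise RuntimeError(f"job failed: {status}")
        if clock() + delay > deadline:
            raise TimeoutError(f"job not finished after {timeout} seconds")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # assumed growth factor of 2


# Demo with a fake status source and a recording sleep, so nothing blocks.
responses = iter([{"state": "running"}] * 4 + [{"state": "done"}])
waits = []
result = poll_job(lambda: next(responses), sleep=waits.append)
print(waits)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; production code would leave both at their defaults.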

Code Changes

tools/utils.py: Add HTTP async Job API functions, remove SDK imports
Tool files: Replace client.ocr()/parse_document() with call_paddleocr_api()
provider/paddleocr.py: Update credential validation to use HTTP API
test_manual.py: Delete SDK-specific test script
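
For the JSONL parsing that tools/utils.py takes over from the SDK, a minimal sketch is shown here. The actual result schema is not given in this PR, so the `page` and `text` keys are made up for illustration, and `iter_jsonl` is a hypothetical name rather than one of the PR's functions.

```python
import json
from typing import Iterator


def iter_jsonl(text: str) -> Iterator[dict]:
    """Yield one decoded object per non-empty line of a JSONL payload."""
    # Split on "\n" only: str.splitlines() would also break on characters
    # such as U+2028, which may legally appear inside JSON strings.
    for line_number, line in enumerate(text.split("\n"), start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"invalid JSON on line {line_number}: {exc}") from exc


# Hypothetical payload with a blank line in the middle.
sample = '{"page": 1, "text": "hello"}\n\n{"page": 2, "text": "world"}\n'
records = list(iter_jsonl(sample))
texts = [record["text"] for record in records]
print(texts)
```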

Testing

All three tools maintain their original behavior:

Text Recognition (PP-OCRv5)
Document Parsing (PP-StructureV3)
VL Document Parsing (PaddleOCR-VL-1.6)

Module imports and utility functions verified.

gemini-code-assist

Code Review

This pull request refactors the PaddleOCR integration to use the official paddleocr SDK instead of direct HTTP requests, updating the provider and tools to utilize PaddleOCRClient and its corresponding option builders. Feedback on these changes highlights several important improvements: keeping requests in the dependencies since it is still used for image downloads, checking if a string is an existing file path before performing base64 validation to avoid false positives, handling trailing slashes in the API URL extraction, and introducing a bytes_to_temp_file helper to write raw binary data directly to temporary files, avoiding redundant base64 encoding and decoding.

gemini-code-assist · 2026-06-04T02:13:17Z

 dependencies = [
    "dify_plugin>=0.9.0",
-    "requests>=2.34.2",
+    "paddleocr>=3.6.0",
 ]


The requests library is still imported and used in tools/paddleocr/tools/utils.py inside the download_image_from_url function. Removing it from the explicit dependencies in pyproject.toml can lead to runtime errors if it is not transitively installed by other packages. Please keep requests in the dependencies list.

Suggested change

 dependencies = [
     "dify_plugin>=0.9.0",
     "paddleocr>=3.6.0",
+    "requests>=2.34.2",
 ]

gemini-code-assist · 2026-06-04T02:13:17Z

    if isinstance(file_value, str):
-        return file_value, explicit_file_type
+        # Check if it's a URL
+        if file_value.startswith(("http://", "https://")):
+            return file_value, False, explicit_file_type
+        # Check if it's base64 (data URL or raw)
+        if file_value.startswith("data:") or is_likely_base64(file_value):
+            temp_file = base64_to_temp_file(extract_base64(file_value))
+            return temp_file, True, explicit_file_type
+        # It's a file path
+        return file_value, False, explicit_file_type


The is_likely_base64 check can produce false positives for plain text strings (e.g., alphanumeric file paths or identifiers of length >= 32 with no special characters), causing them to be incorrectly treated as base64 and written to temporary files. To prevent this, check if the string is an existing local file path using os.path.exists before performing the base64 check.

Suggested change

     if isinstance(file_value, str):
         # Check if it's a URL
         if file_value.startswith(("http://", "https://")):
             return file_value, False, explicit_file_type
+        # Check if it's an existing file path
+        if os.path.exists(file_value):
+            return file_value, False, explicit_file_type
         # Check if it's base64 (data URL or raw)
         if file_value.startswith("data:") or is_likely_base64(file_value):
             temp_file = base64_to_temp_file(extract_base64(file_value))
             return temp_file, True, explicit_file_type
         # It's a file path
         return file_value, False, explicit_file_type
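
The false positive described in this comment can be reproduced with a short sketch. The plugin's real `is_likely_base64` is not shown in the review, so the version below is a hypothetical stand-in (32 or more characters from the base64 alphabet), and `classify` is an illustrative name that returns a label instead of the tuple used in the diff.

```python
import os
import re
import tempfile

# Hypothetical stand-in for the plugin's helper: 32+ base64-alphabet characters.
_BASE64_RE = re.compile(r"[A-Za-z0-9+/]{32,}={0,2}")


def is_likely_base64(value: str) -> bool:
    return _BASE64_RE.fullmatch(value) is not None


def classify(value: str) -> str:
    """Return 'url', 'path', or 'base64' using the order the review suggests."""
    if value.startswith(("http://", "https://")):
        return "url"
    if os.path.exists(value):  # checked before the base64 heuristic
        return "path"
    if value.startswith("data:") or is_likely_base64(value):
        return "base64"
    return "path"


# Create a file whose name is 40 alphanumeric characters and refer to it by
# its bare name from inside its directory, so the string has no separators.
with tempfile.TemporaryDirectory() as tmp:
    name = "a" * 40
    open(os.path.join(tmp, name), "wb").close()
    previous = os.getcwd()
    os.chdir(tmp)
    try:
        looks_like_base64 = is_likely_base64(name)  # heuristic alone says yes
        kind = classify(name)                       # existence check wins
    finally:
        os.chdir(previous)

print(looks_like_base64, kind)
```

With the existence check placed first, the string is treated as a path even though the heuristic alone would have decoded it as base64.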

gemini-code-assist · 2026-06-04T02:13:17Z

+    parsed = urlparse(api_url)
+    # Remove common PaddleOCR endpoints
+    path = parsed.path
+    if path in ("/ocr", "/layout-parsing", "/paddleocr"):
+        path = ""
+    return f"{parsed.scheme}://{parsed.netloc}{path}"


If the user-provided api_url contains a trailing slash (e.g., https://example.com/ocr/), parsed.path will be /ocr/. This won't match the exact strings in the common endpoints check, leaving the trailing slash intact and potentially causing malformed URLs when the SDK appends endpoints. Strip trailing slashes when checking for common endpoints.

Suggested change

     parsed = urlparse(api_url)
-    # Remove common PaddleOCR endpoints
+    # Remove common PaddleOCR endpoints, ignoring trailing slashes
     path = parsed.path
-    if path in ("/ocr", "/layout-parsing", "/paddleocr"):
+    if path.rstrip("/") in ("/ocr", "/layout-parsing", "/paddleocr"):
         path = ""
     return f"{parsed.scheme}://{parsed.netloc}{path}"
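
The effect of the suggested `rstrip("/")` can be checked with a self-contained sketch. The enclosing function's name is not visible in the diff, so `extract_base_url` and the `COMMON_ENDPOINTS` constant are hypothetical names used only for this example.

```python
from urllib.parse import urlparse

COMMON_ENDPOINTS = ("/ocr", "/layout-parsing", "/paddleocr")


def extract_base_url(api_url: str) -> str:
    """Drop a known PaddleOCR endpoint path, tolerating a trailing slash."""
    parsed = urlparse(api_url)
    path = parsed.path
    if path.rstrip("/") in COMMON_ENDPOINTS:
        path = ""
    return f"{parsed.scheme}://{parsed.netloc}{path}"


for url in (
    "https://example.com/ocr",
    "https://example.com/ocr/",
    "https://example.com/v1/",
):
    print(url, "->", extract_base_url(url))
```

Note that a path that is not one of the listed endpoints, such as `/v1/`, is returned unchanged, trailing slash included; only the comparison ignores the slash.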

gemini-code-assist · 2026-06-04T02:13:17Z

    if isinstance(file_value, File):
        encoded_file = base64.b64encode(file_value.blob).decode("utf-8")
-        if explicit_file_type is not None:
-            return encoded_file, explicit_file_type
-        return encoded_file, infer_file_type(file_value)
+        temp_file = base64_to_temp_file(encoded_file, infer_file_extension(file_value))
+        file_type_code = explicit_file_type if explicit_file_type is not None else infer_file_type(file_value)
+        return temp_file, True, file_type_code


When file_value is a File object, its binary content is already available in file_value.blob. Encoding it to a base64 string only to immediately decode it back to bytes in base64_to_temp_file introduces unnecessary CPU and memory overhead. Instead, write the raw bytes directly to the temporary file using a helper function like bytes_to_temp_file.

Suggested change

     if isinstance(file_value, File):
-        encoded_file = base64.b64encode(file_value.blob).decode("utf-8")
-        temp_file = base64_to_temp_file(encoded_file, infer_file_extension(file_value))
+        temp_file = bytes_to_temp_file(file_value.blob, infer_file_extension(file_value))
         file_type_code = explicit_file_type if explicit_file_type is not None else infer_file_type(file_value)
         return temp_file, True, file_type_code

gemini-code-assist · 2026-06-04T02:13:17Z

+def base64_to_temp_file(base64_str: str, suffix: str = ".png") -> str:
+    """Save base64 string to a temporary file.
+
+    Args:
+        base64_str: Base64 encoded string
+        suffix: File extension suffix
+
+    Returns:
+        Path to the temporary file
+    """
+    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as f:
+        f.write(base64.b64decode(base64_str))
+        return f.name


Introduce a bytes_to_temp_file helper to write raw binary data directly to a temporary file, and refactor base64_to_temp_file to reuse it. This avoids redundant base64 encoding/decoding for File objects.

def bytes_to_temp_file(data: bytes, suffix: str = ".png") -> str:
    """Save bytes to a temporary file.

    Args:
        data: Binary data
        suffix: File extension suffix

    Returns:
        Path to the temporary file
    """
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as f:
        f.write(data)
        return f.name


def base64_to_temp_file(base64_str: str, suffix: str = ".png") -> str:
    """Save base64 string to a temporary file.

    Args:
        base64_str: Base64 encoded string
        suffix: File extension suffix

    Returns:
        Path to the temporary file
    """
    return bytes_to_temp_file(base64.b64decode(base64_str), suffix)
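
One consequence of `delete=False` is that the caller owns cleanup. The sketch below repeats the suggested helper so it runs on its own, then shows a round trip followed by removal. The payload bytes are arbitrary example data, and how the plugin itself cleans up temporary files is not shown in this review.

```python
import os
import tempfile


def bytes_to_temp_file(data: bytes, suffix: str = ".png") -> str:
    """Same shape as the helper suggested above."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as f:
        f.write(data)
        return f.name


payload = b"\x89PNG\r\n\x1a\n example bytes"
path = bytes_to_temp_file(payload, suffix=".png")
try:
    # The file is closed and flushed once the helper returns, so it can be
    # reopened here (or handed to another library by path).
    with open(path, "rb") as f:
        round_trip = f.read()
finally:
    os.remove(path)  # delete=False leaves cleanup to the caller
```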

The official PaddleOCR SDK has a pyyaml dependency conflict with dify_plugin:

- dify_plugin requires pyyaml >= 6.0.3
- paddleocr (via paddlex) requires pyyaml == 6.0.2

This change replaces SDK calls with direct HTTP requests to the async Job API, following the exact same implementation logic as the SDK (submit → poll → fetch).

Key changes:

- tools/utils.py: Implement HTTP async Job API (submit → poll → fetch)
- Tool files: Replace SDK calls with call_paddleocr_api()
- provider/paddleocr.yaml: Simplify to single optional base_url
- Tests: Update to mock HTTP API instead of SDK

dosubot Bot added the size:XL label (This PR changes 500-999 lines, ignoring generated files.) Jun 4, 2026

Rander7 had a problem deploying to tools/paddleocr June 4, 2026 02:12 — with GitHub Actions Failure

gemini-code-assist Bot reviewed Jun 4, 2026

Rander7 had a problem deploying to tools/paddleocr June 4, 2026 02:23 — with GitHub Actions Failure

Rander7 had a problem deploying to tools/paddleocr June 4, 2026 05:46 — with GitHub Actions Failure

Rander7 temporarily deployed to tools/paddleocr June 4, 2026 05:54 — with GitHub Actions Inactive

Rander7 temporarily deployed to tools/paddleocr June 4, 2026 09:00 — with GitHub Actions Inactive

Rander7 had a problem deploying to tools/paddleocr June 5, 2026 03:56 — with GitHub Actions Failure

Rander7 had a problem deploying to tools/paddleocr June 5, 2026 04:09 — with GitHub Actions Failure

Rander7 had a problem deploying to tools/paddleocr June 5, 2026 04:14 — with GitHub Actions Failure

Rander7 force-pushed the refactor/use-paddleocr-sdk branch from c6856a5 to 1a3d077 Compare June 5, 2026 05:52

Rander7 temporarily deployed to tools/paddleocr June 5, 2026 05:53 — with GitHub Actions Inactive

Rander7 temporarily deployed to tools/paddleocr June 5, 2026 05:57 — with GitHub Actions Inactive

Rander7 force-pushed the refactor/use-paddleocr-sdk branch from fa8810d to 0f7d8c2 Compare June 5, 2026 09:14

dosubot Bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) and removed the size:XL label (This PR changes 500-999 lines, ignoring generated files.) Jun 5, 2026

Rander7 temporarily deployed to tools/paddleocr June 5, 2026 09:14 — with GitHub Actions Inactive

Rander7 mentioned this pull request Jun 5, 2026

Replace PaddleOCR SDK with HTTP async API due to pyyaml conflict Rander7/dify-official-plugins#1

Closed

5 tasks

Rander7 changed the title ~~refactor: use PaddleOCR Python SDK instead of direct API calls~~ Replace PaddleOCR SDK with HTTP async API due to pyyaml conflict Jun 5, 2026

Rander7 temporarily deployed to tools/paddleocr June 5, 2026 09:32 — with GitHub Actions Inactive

Rander7 temporarily deployed to tools/paddleocr June 5, 2026 09:46 — with GitHub Actions Inactive

Rander7 force-pushed the refactor/use-paddleocr-sdk branch from 4d44f74 to d00aa48 Compare June 5, 2026 10:11

Rander7 had a problem deploying to tools/paddleocr June 5, 2026 10:11 — with GitHub Actions Error

Merge branch 'main' into refactor/use-paddleocr-sdk

a97808c

Rander7 deployed to tools/paddleocr June 5, 2026 10:13 — with GitHub Actions Active

Rander7 wants to merge 2 commits into langgenius:main from Rander7:refactor/use-paddleocr-sdk
