Replace PaddleOCR SDK with HTTP async API due to pyyaml conflict #1

Closed
Rander7 wants to merge 8 commits into main from refactor/use-paddleocr-sdk

Rander7 (Owner) commented Jun 5, 2026

Summary

  • Replace PaddleOCR SDK with direct HTTP async Job API calls due to pyyaml dependency conflict
  • The official SDK (via paddlex) requires pyyaml==6.0.2, but dify_plugin needs pyyaml>=6.0.3
  • Implementation follows the same logic as the SDK: submit → poll → fetch
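
The conflict in the second bullet can be checked mechanically. A minimal sketch, assuming plain dotted version strings and using hand-rolled tuple comparison rather than a real dependency resolver; `parse_version` and `satisfies_both` are hypothetical helpers, not part of this PR:

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a simple dotted version such as '6.0.2' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def satisfies_both(candidate: str) -> bool:
    """Check a pyyaml version against both pins quoted in the summary."""
    version = parse_version(candidate)
    paddlex_ok = version == parse_version("6.0.2")      # pyyaml==6.0.2
    dify_plugin_ok = version >= parse_version("6.0.3")  # pyyaml>=6.0.3
    return paddlex_ok and dify_plugin_ok


# No version can equal 6.0.2 and also be at least 6.0.3.
candidates = ["6.0.1", "6.0.2", "6.0.3", "6.1.0"]
compatible = [v for v in candidates if satisfies_both(v)]
print(compatible)  # prints []
```

Because the two constraints have an empty intersection, no single environment can install both packages as pinned, which is why the SDK dependency is dropped here.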

Changes

  • tools/utils.py: Implement HTTP async Job API (_submit_job, _poll_job, _parse_*)
  • Replace client.ocr()/parse_document() with call_paddleocr_api()
  • Use same API endpoint /api/v2/ocr/jobs, Bearer token auth, exponential backoff polling
  • Keep all utility functions (file handling, camel_to_snake) unchanged
  • No changes to provider.yaml config or pyproject.toml dependencies
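
For reviewers, the submit → poll → fetch flow can be sketched roughly as follows. This is not the code in tools/utils.py: `run_job`, the injected callables, and the status strings `"done"` and `"failed"` are assumptions made for illustration, and a fixed poll interval stands in for the backoff used in the PR.

```python
import time
from typing import Any, Callable


def run_job(
    submit: Callable[[], str],
    get_status: Callable[[str], str],
    fetch_result: Callable[[str], Any],
    poll_interval: float = 3.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Submit a job, poll until it finishes, then fetch its result."""
    job_id = submit()
    waited = 0.0
    while True:
        status = get_status(job_id)
        if status == "done":      # hypothetical status name
            return fetch_result(job_id)
        if status == "failed":    # hypothetical status name
            raise RuntimeError(f"job {job_id} failed")
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} not finished after {timeout}s")
        sleep(poll_interval)
        waited += poll_interval


# Usage with in-memory fakes instead of real HTTP calls.
statuses = iter(["pending", "running", "done"])
result = run_job(
    submit=lambda: "job-1",
    get_status=lambda job_id: next(statuses),
    fetch_result=lambda job_id: {"job_id": job_id, "text": "hello"},
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(result)
```

Injecting the transport functions keeps the loop testable without a token or network access.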

Technical Details

  • API endpoint: /api/v2/ocr/jobs
  • Authentication: Authorization: Bearer {token} + Client-Platform: dify
  • Poll strategy: initial 3s, exponential backoff (1.5x), max 15s, timeout 600s
  • Result format: a dict compatible with the SDK's result structure
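
The header and polling parameters above translate into something like the sketch below. The function names are hypothetical, and the sketch assumes the 600s timeout is measured against accumulated sleep time only; the actual implementation may count elapsed wall-clock time instead.

```python
from itertools import islice
from typing import Iterator


def build_headers(token: str) -> dict[str, str]:
    """Headers listed under Technical Details."""
    return {
        "Authorization": f"Bearer {token}",
        "Client-Platform": "dify",
    }


def poll_delays(
    initial: float = 3.0,
    factor: float = 1.5,
    max_delay: float = 15.0,
    timeout: float = 600.0,
) -> Iterator[float]:
    """Yield sleep intervals until their sum would exceed the timeout."""
    delay = initial
    elapsed = 0.0
    while elapsed + delay <= timeout:
        yield delay
        elapsed += delay
        # Grow by the backoff factor, capped at max_delay.
        delay = min(delay * factor, max_delay)


first_five = list(islice(poll_delays(), 5))
print(first_five)  # prints [3.0, 4.5, 6.75, 10.125, 15.0]
```

With these values the interval reaches the 15s cap on the fifth poll and stays there for the rest of the timeout window.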

Test plan

  • Module imports work correctly
  • get_sdk_client() returns correct config dict
  • build_ocr_options() converts camelCase to snake_case
  • normalize_file_input() handles URL and file inputs
  • Manual OCR test with real token (requires environment setup)
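
As a reference for the `build_ocr_options()` check, a camelCase to snake_case conversion might look like the sketch below. This is an assumed implementation, not the one in tools/utils.py, and the option name in the example is illustrative only.

```python
import re

# Zero-width boundary between a lowercase letter or digit and a capital.
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def camel_to_snake(name: str) -> str:
    """Insert '_' at each camelCase boundary, then lowercase the result."""
    return _BOUNDARY.sub("_", name).lower()


def build_ocr_options(params: dict) -> dict:
    """Return a copy of params with every key converted to snake_case."""
    return {camel_to_snake(key): value for key, value in params.items()}


options = build_ocr_options({"useDocUnwarping": True})
print(options)  # prints {'use_doc_unwarping': True}
```

Keys that are already snake_case pass through unchanged, so the conversion is safe to apply twice.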

🤖 Generated with Claude Code

Rander7 and others added 8 commits June 4, 2026 10:09

## Why This Refactoring Is Necessary

PaddleOCR 3.6.0+ has migrated to a new async Job API architecture where
requests are submitted, then polled for completion. The legacy sync API
will be deprecated, making this refactoring critical for long-term maintenance.

Benefits of using the official SDK:
- **Future-proof**: Aligns with PaddleOCR's official API evolution
- **Better reliability**: Built-in retry logic, timeout handling, error classification
- **Reduced maintenance**: No need to manually implement poll loops and error handling
- **Consistent behavior**: Same implementation as PaddleOCR's own tools (CLI, MCP)

## Breaking Changes

**None for end users** - the tool interface and output format remain identical.
The plugin continues to accept the same credentials and file inputs.

## Internal Changes

### Dependencies
- Replaced `requests` with `paddleocr>=3.6.0`

### SDK Integration
- Added `get_sdk_client()` with `client_platform="dify"` header
- Added Base64 → temp file conversion (SDK requires file_path/file_url)
- Added result format converters to maintain legacy output structure
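
The Base64 → temp file step could look roughly like this. It is a sketch under assumptions: `base64_to_temp_file` is a hypothetical name, and tolerating a data-URI prefix is an illustrative choice rather than confirmed plugin behavior.

```python
import base64
import os
import tempfile


def base64_to_temp_file(data: str, suffix: str = ".bin") -> str:
    """Decode Base64 text into a temporary file and return its path.

    The caller is responsible for deleting the file.
    """
    # Tolerate an optional data-URI prefix such as "data:image/png;base64,".
    if data.startswith("data:") and "," in data:
        data = data.split(",", 1)[1]
    raw = base64.b64decode(data, validate=True)
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as handle:
        handle.write(raw)
    return path


encoded = base64.b64encode(b"fake image bytes").decode("ascii")
path = base64_to_temp_file(encoded, suffix=".png")
try:
    with open(path, "rb") as handle:
        roundtrip = handle.read()
finally:
    os.remove(path)  # clean up the temporary file
```

The resulting path can then be handed to an API that only accepts a file path or URL.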

### Code Simplification
- Removed manual HTTP request handling (`make_paddleocr_api_request`)
- Removed manual poll loops (SDK handles submit → poll → fetch)
- Updated credential validation to use SDK

## Testing

All three tools maintain their original behavior:
- Text Recognition (PP-OCRv5)
- Document Parsing (PP-StructureV3)
- VL Document Parsing (PaddleOCR-VL-1.6)

Use lazy imports for the paddleocr SDK so that the tests do not require it.
Tests now mock the SDK calls to avoid importing the large paddleocr package.

Key changes:
- Remove top-level imports from paddleocr in utils.py and provider.py
- Use lazy imports inside functions that need paddleocr
- Add comprehensive mocking in tests for SDK functions
- Rename project to "paddleocr-dify" to avoid name conflict
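
A minimal sketch of the lazy-import pattern and how a test can mock it. `get_sdk_module` and `sdk_version` are hypothetical helpers, and reading `__version__` from the SDK module is an assumption made only for the demo.

```python
import importlib
import sys
import types
from unittest import mock


def get_sdk_module(name: str = "paddleocr"):
    """Import the SDK only when it is first needed, not at module load."""
    return importlib.import_module(name)


def sdk_version() -> str:
    sdk = get_sdk_module()  # lazy: the import runs at call time
    return getattr(sdk, "__version__", "unknown")


# In a test, a lightweight stand-in replaces the real package.
fake_sdk = types.ModuleType("paddleocr")
fake_sdk.__version__ = "0.0-test"
with mock.patch.dict(sys.modules, {"paddleocr": fake_sdk}):
    reported = sdk_version()
print(reported)  # prints 0.0-test
```

Because the import happens inside the function, importing the plugin module itself never pulls in the large package.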

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

This commit migrates from direct HTTP API calls to the official PaddleOCR SDK (>=3.6.0), which simplifies integration and improves maintainability.

Key changes:
- Use public API imports from paddleocr package instead of internal modules
- Implement unified camelCase to snake_case parameter conversion
- Remove unnecessary result format conversion functions
- Simplify credential configuration: base_url is now optional (uses SDK default if not provided)
- Update provider validation to use SDK for testing
- Add manual test script for validation
- Update tests to mock public API instead of internal modules

User-facing changes:
- Configuration simplified: only token is required for official service
- base_url is optional (only needed for self-hosted deployments)
- All core OCR and document parsing features continue to work as before

Testing:
- All 12 unit tests pass
- Manual tests confirm OCR (URL and Base64) and document parsing work correctly

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The official PaddleOCR SDK has a pyyaml dependency conflict with dify_plugin:
- dify_plugin requires pyyaml >= 6.0.3
- paddleocr (via paddlex) requires pyyaml == 6.0.2

This change replaces SDK calls with direct HTTP requests to the async Job API,
following the exact same implementation logic as the SDK (submit → poll → fetch).

Key changes:
- tools/utils.py: Implement HTTP async Job API (_submit_job, _poll_job, _parse_*)
- Replace client.ocr()/parse_document() with call_paddleocr_api()
- Use same API endpoint /api/v2/ocr/jobs, Bearer token auth, poll strategy
- Keep all utility functions (file handling, camel_to_snake) unchanged
- No changes to provider.yaml config or pyproject.toml dependencies

Changes: +461/-480 lines, 6 files modified

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Rander7 (Owner, Author) commented Jun 5, 2026

Created by mistake; the correct PR is langgenius#3247 in langgenius/dify-official-plugins.

Rander7 closed this Jun 5, 2026