Conversation


ShivaanshGusain commented Dec 16, 2025

Summary of Changes

This PR addresses three compatibility issues encountered when setting up the environment with recent library versions (PaddleOCR v2.9.1 and transformers v4.41+).

  1. Fix PaddleOCR Initialization
    Newer versions of paddleocr have deprecated or removed constructor arguments such as max_batch_size, use_gpu, and use_dilation; passing them causes a ValueError crash on startup.
    Change: Updated the PaddleOCR() initialization to rely on default argument parsing, which correctly auto-detects GPU support.

  2. Box Sorting
    Depending on the detection results, filtered_boxes occasionally contains mixed types (dictionaries and raw lists). This causes the sorted() call (around line 435) to crash with an AttributeError, because raw lists do not have the dictionary keys the sort key expects.
    Change: Added a safety check loop (safe_boxes) to standardize all elements of filtered_boxes into dictionaries before sorting.

  3. Fix Florence-2 Inference Crash
    The current dependency resolution installs a version of transformers (v4.41+) that is incompatible with the custom modeling_florence2.py, causing an AttributeError: 'NoneType' object has no attribute 'shape'.
    Change: Pinned transformers==4.40.0 in requirements.txt to ensure stability.
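A minimal sketch of the safe_boxes check described in item 2. The 'bbox'/'type' keys, the placeholder values, and the sort key are assumptions for illustration; the actual dict shape in the repo may differ:

```python
# Hypothetical sketch of the safe_boxes standardization: elements of
# filtered_boxes may be dicts or raw coordinate lists; wrap the raw lists
# in the dict shape so the sort key can index into them uniformly.
filtered_boxes = [
    {'type': 'text', 'bbox': [10, 40, 50, 60]},
    [5, 5, 30, 20],                        # a raw list slips in
    {'type': 'icon', 'bbox': [0, 100, 20, 120]},
]

safe_boxes = []
for box in filtered_boxes:
    if isinstance(box, dict):
        safe_boxes.append(box)
    else:
        # assume a bare [x1, y1, x2, y2] list; wrap it as a dict
        safe_boxes.append({'type': 'unknown', 'bbox': list(box)})

# sorting by the top coordinate no longer raises AttributeError
safe_boxes = sorted(safe_boxes, key=lambda b: b['bbox'][1])
print([b['bbox'] for b in safe_boxes])
```

With the mixed input above, the raw list is wrapped first and the boxes come out ordered by their top edge.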

Testing

  • Verified environment setup on Windows with Python 3.10 and PaddleOCR 2.9.1.
  • Validated that the model loads and performs inference without crashing.

ShivaanshGusain (Author)

@microsoft-github-policy-service agree

ShivaanshGusain (Author)

Hi @ataymano, this is the fix for setting up the environment with the latest version of paddleocr (v2.9.1) -

Fixed PaddleOCR Initialization: Newer versions of paddleocr have deprecated or removed the max_batch_size, use_gpu, and use_dilation arguments. Passing them was causing a ValueError on startup. I’ve updated the initialization to rely on the default argument parsing, which correctly handles GPU detection automatically.
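A before/after sketch of the constructor change. The removed kwargs are the ones named above; `lang='en'` and the `max_batch_size` value are illustrative, not taken from the repo:

```diff
- ocr = PaddleOCR(lang='en', use_gpu=True, max_batch_size=1024, use_dilation=True)
+ ocr = PaddleOCR(lang='en')
```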

I noticed that filtered_boxes occasionally contained mixed types (dictionaries and raw lists) depending on the detection results, which caused the sorted() function to crash with an AttributeError. I added a safety check to standardize all elements to dictionaries before sorting.

I verified these changes on Windows with Python 3.11 and PaddleOCR 2.9.1. The model now loads correctly and performs inference without crashing.

(Attached screenshots: test_screenshot_debug, test_screenshot)

jomach commented Jan 16, 2026

See #354

ShivaanshGusain changed the title from "Fix PaddleOCR 2.9+ initialization arguments & box sorting" to "Fix PaddleOCR 2.9+ args, Box Sorting, and pin Transformers (Florence-2 fix)" on Jan 16, 2026
ShivaanshGusain (Author)

See #354

The crash in your logs is actually coming from the Florence-2 model, not Paddle. The AttributeError: 'NoneType' object has no attribute 'shape' occurs because recent versions of the Transformers library break the custom Florence-2 modeling code.

I have updated requirements.txt in the PR above to pin the correct version.

Quick fix:
Run 'pip install transformers==4.40.0' in your environment, and it should work immediately.
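If you want to fail fast instead of hitting the Florence-2 crash at inference time, a small version guard can help. The 4.41 cutoff comes from the discussion above; the helper name is hypothetical:

```python
# Hypothetical guard: transformers >= 4.41 is the range reported to break
# the custom modeling_florence2.py; 4.40.0 is the pinned known-good version.
def transformers_ok(version: str) -> bool:
    # Compare (major, minor) numerically; pre-release suffixes are ignored
    # in this sketch.
    major, minor = (int(part) for part in version.split('.')[:2])
    return (major, minor) < (4, 41)

print(transformers_ok('4.40.0'))  # the pinned version
print(transformers_ok('4.41.2'))  # the reported-broken range
```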

jomach commented Jan 21, 2026

I also had to set attn_implementation="eager" in utils.py:

if device == 'cpu':
    model = AutoModelForCausalLM.from_pretrained(model_name_or_path, torch_dtype=torch.float32, trust_remote_code=True, attn_implementation=attn_implementation)
else:
    model = AutoModelForCausalLM.from_pretrained(model_name_or_path, torch_dtype=torch.float16, trust_remote_code=True, attn_implementation=attn_implementation).to(device)
```
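The dtype/device branching above can be sketched without the heavy model load. The helper name is hypothetical, and the dtypes are spelled as strings here to keep the sketch torch-free (the real code passes torch.float32/torch.float16):

```python
# Hypothetical sketch of the load-kwargs selection: full precision on CPU,
# half precision on GPU, and eager attention to sidestep the attention-path
# incompatibility reported for Florence-2's custom modeling code.
def pick_load_kwargs(device: str) -> dict:
    kwargs = {
        'trust_remote_code': True,
        'attn_implementation': 'eager',
    }
    if device == 'cpu':
        kwargs['torch_dtype'] = 'float32'  # full precision on CPU
    else:
        kwargs['torch_dtype'] = 'float16'  # half precision on GPU
    return kwargs

print(pick_load_kwargs('cpu'))
print(pick_load_kwargs('cuda'))
```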

dashscope
groq


why groq ?

ShivaanshGusain (Author) commented Jan 21, 2026


Groq is used here because it serves the reasoning model, which produces chain-of-thought output. For example, it generates <think>...</think> tokens.
You can find its use in this file:
OmniParser/omnitool/gradio/agent/llm_utils/groqclient.py
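Downstream code typically strips those chain-of-thought tokens before using the reply. A minimal sketch, assuming the <think>...</think> tag format shown above (the helper name is an assumption, not from groqclient.py):

```python
import re

# Hypothetical helper: remove <think>...</think> chain-of-thought spans from
# a reasoning model's reply, leaving only the final answer text.
def strip_think_tokens(reply: str) -> str:
    return re.sub(r'<think>.*?</think>', '', reply, flags=re.DOTALL).strip()

print(strip_think_tokens('<think>step 1... step 2...</think> The answer is 42.'))
# -> The answer is 42.
```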
