Fix image-text-to-text stop_sequence handling#47032

Open
Sunt-ing wants to merge 1 commit into huggingface:main from Sunt-ing:21

Sunt-ing commented Jul 2, 2026

Contributor

CI

What does this PR do?

ImageTextToTextPipeline accepts stop_sequence, but the stop-sequence branch writes into generate_kwargs even when the user did not pass a generate_kwargs dict. A normal pipeline call such as pipe(image, text=..., stop_sequence=".") fails before generation with:

TypeError: 'NoneType' object does not support item assignment

This PR creates the forward generate_kwargs dict when the caller did not supply one, and passes stop sequences through generation's stop_strings support, matching the any-to-any pipeline path. It also avoids forcing an extra eos_token_id into image-text model wrappers that already provide their own decoder EOS value.
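
The failure mode and the shape of the fix can be sketched in plain Python. This is an illustrative sketch only, not the pipeline's actual code: the function names `sanitize_buggy` and `sanitize_fixed` are hypothetical, and the key written in the buggy branch is a placeholder. Only the `stop_strings` key and the error message are taken from the description above.

```python
def sanitize_buggy(stop_sequence=None, generate_kwargs=None):
    """Hypothetical stand-in for the old behaviour (not the real pipeline code)."""
    if stop_sequence is not None:
        # When the caller passed no generate_kwargs, this indexes into None
        # and raises: TypeError: 'NoneType' object does not support item assignment
        generate_kwargs["stop_strings"] = stop_sequence  # placeholder key
    return generate_kwargs


def sanitize_fixed(stop_sequence=None, generate_kwargs=None):
    """Hypothetical stand-in for the fixed behaviour (not the real pipeline code)."""
    # Create the dict on demand; copy it so the caller's dict is not mutated.
    forward_kwargs = dict(generate_kwargs) if generate_kwargs is not None else {}
    if stop_sequence is not None:
        # Assumption: the stop sequence is forwarded under the stop_strings key.
        forward_kwargs["stop_strings"] = stop_sequence
    return forward_kwargs


if __name__ == "__main__":
    try:
        sanitize_buggy(stop_sequence=".")
    except TypeError as err:
        print("buggy:", err)
    print("fixed:", sanitize_fixed(stop_sequence="."))
```

Copying the incoming dict is a design choice of this sketch; whether the PR copies or mutates in place is not stated in the description.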

Repro and output

from PIL import Image
from transformers import pipeline


model_id = "Xenova/tiny-random-LlavaForConditionalGeneration"
image = Image.new("RGB", (16, 16), color="white")
pipe = pipeline("image-text-to-text", model=model_id, device=-1)
print(pipe(image, text="<image> Describe.", stop_sequence=".", max_new_tokens=1))

# current main
loading Xenova/tiny-random-LlavaForConditionalGeneration
loaded ImageTextToTextPipeline LlavaForConditionalGeneration
ERR TypeError "'NoneType' object does not support item assignment"

# after this PR
loading Xenova/tiny-random-LlavaForConditionalGeneration
loaded ImageTextToTextPipeline LlavaForConditionalGeneration
OK [{'input_text': '<image> Describe.', 'generated_text': '<image> Describe. starb'}]

Tests

python -m pytest -q tests/pipelines/test_pipelines_image_text_to_text.py::ImageTextToTextPipelineTests::test_stop_sequence_without_generate_kwargs

tests/pipelines/test_pipelines_image_text_to_text.py::ImageTextToTextPipelineTests::test_stop_sequence_without_generate_kwargs PASSED [100%]

============================== 1 passed in 0.79s ===============================

python -m ruff check src/transformers/pipelines/image_text_to_text.py tests/pipelines/test_pipelines_image_text_to_text.py
All checks passed!

git diff --check

Code Agent Policy

  • I confirm that this is not a pure code agent PR.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline and the
    Pull Request checks?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes according to the guidelines?
  • Did you write any new necessary tests?

Who can review?

cc @zucchini-nlp @Rocketknight1

github-actions Bot commented Jul 2, 2026

Contributor

CI recap

Dashboard: View test results in Grafana
Latest run: 28625110499:2
Result: success | Jobs: 15 | Tests: 63,630 | Failures: 0 | Duration: 17h 35m
