feat: add ark multimodal polling plugins #3195

Draft
WH-2099 wants to merge 3 commits into main from feat/ark-multimodal-polling

@WH-2099 WH-2099 commented May 25, 2026

Summary

  • Add Ark multimodal polling support to the existing VolcEngine model plugin.
  • Add a separate BytePlus ModelArk plugin package instead of using multiple providers in one model plugin.
  • Add Seedream 5.0 image generation and Seedance 2.0 / 1.5 Pro video generation model definitions.
  • Add polling unit coverage for payload mapping, task status mapping, output conversion, provider registration isolation, and retryable check errors.
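
As a rough illustration of the task status mapping and retryable check errors covered by those tests, here is a minimal sketch. It is not the plugin's code: `TaskState`, `RetryableCheckError`, `map_status`, `poll`, and the provider status strings in `_STATUS_MAP` are all hypothetical names and values chosen for the example.

```python
import time
from enum import Enum
from typing import Callable


class TaskState(Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


class RetryableCheckError(Exception):
    """Hypothetical: a status check failed transiently and may be retried."""


# Assumed provider status strings; the real values live in the plugin code.
_STATUS_MAP = {
    "queued": TaskState.PENDING,
    "running": TaskState.PENDING,
    "succeeded": TaskState.SUCCEEDED,
    "failed": TaskState.FAILED,
    "cancelled": TaskState.FAILED,
}


def map_status(raw: str) -> TaskState:
    # Unknown statuses are treated as pending so that polling continues.
    return _STATUS_MAP.get(raw.lower(), TaskState.PENDING)


def poll(check: Callable[[], str], max_checks: int = 10, interval: float = 0.0) -> TaskState:
    """Call `check` until the task reaches a terminal state or checks run out."""
    for _ in range(max_checks):
        try:
            state = map_status(check())
        except RetryableCheckError:
            # A transient check failure counts as "still pending".
            state = TaskState.PENDING
        if state is not TaskState.PENDING:
            return state
        time.sleep(interval)
    return TaskState.PENDING
```

In this sketch a retryable check error does not fail the task; it only consumes one of the allowed checks, so a flaky status endpoint cannot turn a running task into a failed one.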

Blocked

This draft PR should not be merged until Dify has completed the matching graphon and dify-plugin-daemon upgrades. The end-to-end stack needs polling invocation support before these plugin changes can be released safely.

Tracked by #3194.

Validation

  • cd models/volcengine && uv run pytest
    • 24 passed, 20 skipped
  • cd models/byteplus && uv run pytest
    • 17 passed, 1 skipped

Live BytePlus generation was not completed because the supplied key returned an inactive-key 401 from the provider.

@WH-2099 WH-2099 self-assigned this May 25, 2026
@WH-2099 WH-2099 temporarily deployed to models/byteplus May 25, 2026 07:12 — with GitHub Actions Inactive
@WH-2099 WH-2099 had a problem deploying to models/volcengine May 25, 2026 07:12 — with GitHub Actions Failure

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new plugin for BytePlus ModelArk and updates the Volcengine Ark plugin to support multimodal models, specifically Seedance for video and Seedream for image generation. The implementation includes a polling mechanism for asynchronous tasks and logic for handling multimodal inputs like images, videos, and audio. Feedback identifies an issue in the get_num_tokens method for both plugins, where complex message content is not correctly handled during token estimation, and suggests using the _extract_prompt_text helper to fix this.
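
The token-estimation issue can be pictured with a small sketch. This is an assumption-heavy illustration, not the plugin's implementation: the real `_extract_prompt_text` helper is not shown in this thread, so `extract_prompt_text`, `estimate_tokens`, and the `Message`, `TextPart`, and `ImagePart` stand-in types below are hypothetical, and the whitespace word count is only a placeholder for whatever tokenizer the plugins use.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class TextPart:
    data: str
    type: str = "text"


@dataclass
class ImagePart:
    url: str
    type: str = "image"


@dataclass
class Message:
    # Content may be a plain string, a list of typed parts, or None.
    content: Union[str, list, None]


def extract_prompt_text(message: Message) -> str:
    """Return only the textual portion of a message (hypothetical helper)."""
    content = message.content
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    # Multimodal content: keep text parts, skip images, video, and audio.
    return " ".join(p.data for p in content if getattr(p, "type", None) == "text")


def estimate_tokens(messages: list[Message]) -> int:
    # Placeholder estimate: whitespace-separated words stand in for tokens.
    text = " ".join(extract_prompt_text(m) for m in messages)
    return len(text.split())
```

The point of routing every message through one extraction helper is that list-valued content no longer reaches the counter as a raw list, which is the failure mode the review describes.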

Comment thread models/byteplus/models/llm/llm.py Outdated
Comment thread models/volcengine/models/llm/llm.py Outdated
@WH-2099 WH-2099 force-pushed the feat/ark-multimodal-polling branch from 9c78f35 to 155b3cc Compare May 27, 2026 14:39
@WH-2099 WH-2099 had a problem deploying to models/volcengine May 27, 2026 14:40 — with GitHub Actions Failure
@WH-2099 WH-2099 temporarily deployed to models/byteplus May 27, 2026 14:40 — with GitHub Actions Inactive
@WH-2099 WH-2099 deployed to models/byteplus May 27, 2026 14:49 — with GitHub Actions Active
@WH-2099 WH-2099 deployed to models/volcengine May 27, 2026 14:49 — with GitHub Actions Active