Add HeyGen plugin #198
Open. jrusso1020 wants to merge 7 commits into main from codex/reopen-heygen-plugin.
Commits (7):
- 6d954ab Add HeyGen plugin (jrusso1020)
- 72ac030 Polish HeyGen skill references (jrusso1020)
- 0f3f4a5 Clarify HeyGen preview URL guidance (jrusso1020)
- e90b9c6 Refine HeyGen plugin README (jrusso1020)
- 8c6f8a4 Merge remote-tracking branch 'origin/main' into codex/reopen-heygen-p… (jrusso1020)
- 2695dcf Add HeyGen skill mobile metadata (jrusso1020)
- c78ce8d Improve HeyGen mobile starter prompts (jrusso1020)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| { | ||
| "apps": { | ||
| "heygen": { | ||
| "id": "asdk_app_69418aad55e08191aa5e437b649ca2e4" | ||
| } | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| { | ||
| "name": "heygen", | ||
| "version": "2.2.0", | ||
| "description": "Create HeyGen avatar videos and personalized video messages. Build a persistent digital identity from a photo, then generate presenter-led videos with your digital twin.", | ||
| "author": { | ||
| "name": "HeyGen", | ||
| "email": "developers@heygen.com", | ||
| "url": "https://heygen.com" | ||
| }, | ||
| "homepage": "https://heygen.com", | ||
| "repository": "https://github.com/heygen-com/skills", | ||
| "license": "MIT", | ||
| "keywords": [ | ||
| "heygen", | ||
| "avatar", | ||
| "identity", | ||
| "video", | ||
| "digital-twin", | ||
| "video-message", | ||
| "presenter", | ||
| "talking-head", | ||
| "ai-avatar", | ||
| "avatar-video" | ||
| ], | ||
| "skills": "./skills/", | ||
| "apps": "./.app.json", | ||
| "interface": { | ||
| "displayName": "HeyGen", | ||
| "shortDescription": "Avatar videos and personalized video messages", | ||
| "longDescription": "HeyGen Skills give your agent a face, a voice, and the ability to send video like a message. Use heygen-avatar to build a persistent digital identity from a photo and pick a voice, then heygen-video to generate identity-first presenter videos via the HeyGen v3 Video Agent pipeline (avatar resolution, aspect ratio correction, prompt engineering, and voice selection are handled automatically).", | ||
| "developerName": "HeyGen", | ||
| "category": "Design", | ||
| "capabilities": ["Read", "Write"], | ||
| "websiteURL": "https://heygen.com", | ||
| "defaultPrompt": [ | ||
| "Create my HeyGen avatar from this photo", | ||
| "Make a 30-second intro video of myself", | ||
| "Send a video update to my team about this week's progress" | ||
| ], | ||
| "brandColor": "#0a0a0a", | ||
| "composerIcon": "./assets/icon.png", | ||
| "logo": "./assets/logo.png" | ||
| } | ||
| } | ||
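
The manifest above points at several relative paths (`skills`, `apps`, `interface.composerIcon`, `interface.logo`). A minimal sketch of a pre-merge sanity check follows. It assumes those values resolve against the plugin root directory, which this diff does not state explicitly; the helper name `missing_manifest_paths` is hypothetical and not part of any Codex tooling.

```python
from pathlib import Path

# Assumption: these manifest keys hold paths relative to the plugin root.
TOP_LEVEL_PATH_KEYS = ("skills", "apps")
INTERFACE_PATH_KEYS = ("composerIcon", "logo")


def missing_manifest_paths(manifest: dict, plugin_root: Path) -> list[str]:
    """Return the manifest keys whose relative path does not exist on disk."""
    missing = []
    for key in TOP_LEVEL_PATH_KEYS:
        value = manifest.get(key)
        if value is not None and not (plugin_root / value).exists():
            missing.append(key)
    interface = manifest.get("interface", {})
    for key in INTERFACE_PATH_KEYS:
        value = interface.get(key)
        if value is not None and not (plugin_root / value).exists():
            missing.append(f"interface.{key}")
    return missing
```

To use it, load the manifest with `json.loads(path.read_text())` and pass the directory that contains `skills/` and `.app.json` as `plugin_root`. An empty list means every referenced path was found.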
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,25 @@ | ||
| # heygen | ||
|
|
||
|
|
||
| OpenAI Codex plugin for [HeyGen](https://heygen.com) — create AI avatar videos and personalized video messages. | ||
|
|
||
| ## What's included | ||
|
|
||
| Two skills that chain together, plus a reference to the HeyGen app: | ||
|
|
||
| - **heygen-avatar** — turn a photo into a persistent digital twin. Handles avatar lookup, instant-avatar creation, voice selection (or voice cloning), and writes an `AVATAR` file the video skill reads back. | ||
| - **heygen-video** — generate identity-first presenter videos via the HeyGen v3 Video Agent pipeline. Encodes the prompting, asset routing, aspect-ratio correction, and avatar/voice resolution that good HeyGen videos need. | ||
| - **HeyGen app reference** — `.app.json` points at the curated [HeyGen ChatGPT app](https://chatgpt.com/apps/heygen/asdk_app_69418aad55e08191aa5e437b649ca2e4). | ||
|
|
||
| ## Requirements | ||
|
|
||
| Installing the plugin connects the HeyGen ChatGPT app automatically (OAuth on first use). That is enough for the skills to work end-to-end on the user's existing HeyGen plan credits. | ||
|
|
||
| If you'd rather not use the app, the skills also support the HeyGen CLI: install it from <https://static.heygen.ai/cli/install.sh> and export `HEYGEN_API_KEY` (get one at <https://app.heygen.com/api>). | ||
|
|
||
| ## Source of truth | ||
|
|
||
| The skills are authored in [`heygen-com/skills`](https://github.com/heygen-com/skills) (under `heygen-avatar/` and `heygen-video/` at the repo root) and mirrored here. The main structural delta in this mirror is the wrapping `skills/` parent directory required by the Codex plugin convention. File issues about skill content on that repo. | ||
|
|
||
| ## License | ||
|
|
||
| MIT | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| interface: | ||
| display_name: "HeyGen" | ||
| short_description: "Create avatar videos and personalized video messages" | ||
| icon_small: "./assets/icon.png" | ||
| icon_large: "./assets/logo.png" | ||
| default_prompt: "Help me create a personalized HeyGen video message. Ask who should appear on camera, who the audience is, the key points, and the tone before generating it." |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| interface: | ||
| display_name: "HeyGen Avatar" | ||
| short_description: "Create reusable HeyGen avatar identities" | ||
| default_prompt: "Create a reusable HeyGen avatar for me from a photo or written description, then help me choose a matching voice." |
plugins/heygen/skills/heygen-avatar/references/asset-routing.md (86 additions, 0 deletions)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,86 @@ | ||
| # Asset Handling — The Classification Engine | ||
|
|
||
| When the user provides files, URLs, or references, route each asset to the right path. The user should NEVER have to think about this. | ||
|
|
||
| ## Two Paths | ||
|
|
||
| | Path | What happens | When to use | | ||
| |------|-------------|-------------| | ||
| | **A: Contextualize → Prompt** | Read/analyze the asset, extract key info, bake into script. Video Agent never sees the original. | Reference material, auth-walled content, documents where the *information* matters more than the *visual*. | | ||
| | **B: Attach to API** | Upload the raw file via `files[]`. Video Agent analyzes, extracts graphics, uses as frames/B-roll. | Screenshots, branded assets, PDFs with important visual layouts, images the viewer should literally see. | | ||
| | **A+B: Both** | Contextualize for script quality AND attach for visual use. | Long docs where you need to summarize but Video Agent should also have the full source. | | ||
|
|
||
| ## Classification Flow | ||
|
|
||
| ``` | ||
| 1. Can Video Agent access this directly? | ||
| - Public URL (no auth, no paywall) → YES | ||
| - Private/internal URL → NO | ||
| - Local file → NO (must upload first) | ||
|
|
||
| 2. Should the viewer SEE this asset? | ||
| - Screenshot, logo, product image, chart → YES → Path B | ||
| - Research doc, article, context material → NO → Path A | ||
| - Ambiguous → Path A+B | ||
|
|
||
| 3. Is the content too long for the prompt? | ||
| - Short (< 500 words) → fits in prompt | ||
| - Long (> 500 words) → summarize key points, attach full doc | ||
| ``` | ||
|
|
||
| ## Decision Matrix | ||
|
|
||
| | Asset Type | Publicly Accessible? | Show On Screen? | Route | | ||
| |-----------|---------------------|----------------|-------| | ||
| | Screenshot / image | N/A | Yes | **B: Attach** + describe in prompt as B-roll | | ||
| | Logo / brand asset | N/A | Yes | **B: Attach** + anchor to intro/outro | | ||
| | Public URL to file (PDF, image, video) | Yes | Maybe | **B: Download → upload via `/v3/assets` → pass `asset_id`** + summarize | | ||
| | Public URL to web page (HTML) | Yes | No | **A: Fetch and contextualize only.** Do NOT pass HTML URLs in `files[]`. | | ||
| | Auth-walled URL (requires login) | No | No | **A: Ask the user to paste the content.** Never fabricate. | | ||
| | PDF (short, text-heavy) | N/A | No | **A+B: Extract key points** + attach | | ||
| | PDF (long, visual-rich) | N/A | Maybe | **B: Attach** + summarize top points | | ||
| | Raw data / spreadsheet | N/A | Partially | **A: Analyze and describe** key stats. Attach if charts should appear. | | ||
|
|
||
| ## Executing Routes | ||
|
|
||
| ### Path A (Contextualize) | ||
| - URLs: retrieve publicly accessible content with the environment's standard web/content fetch capability | ||
| - For auth-walled content you cannot access: ask the user to paste the text directly | ||
| - Extract 3-5 most important points relevant to the video | ||
| - Weave naturally into the script. Don't dump. Integrate. | ||
|
|
||
| ### Path B (Attach) | ||
| Upload to HeyGen: | ||
|
|
||
| **App:** upload through the HeyGen app's asset flow when available. | ||
| **CLI:** `heygen asset create --file /path/to/file.png` | ||
|
|
||
| Max 32MB per file. Returns JSON with the new `asset_id`. | ||
|
|
||
| Or pass inline in `files[]`: | ||
| ```json | ||
| {"type": "url", "url": "https://example.com/image.png"} | ||
| {"type": "asset_id", "asset_id": "<from upload>"} | ||
| {"type": "base64", "data": "<base64>", "content_type": "image/png"} | ||
| ``` | ||
|
|
||
| ### Describe Asset Usage in Prompt | ||
| Be SPECIFIC: | ||
| - "Use the uploaded dashboard screenshot as B-roll when discussing analytics" | ||
| - "Display the company logo in the intro and end card" | ||
|
|
||
| ### Log Classification | ||
| In the learning log entry, record: | ||
| ```json | ||
| "assets_classified": [{"type": "image", "route": "attach", "accessible": true, "reason": "product screenshot"}] | ||
| ``` | ||
|
|
||
| ## Rules | ||
|
|
||
| - **Never ask the user which path unless genuinely 50/50.** You're the producer. Make the call. | ||
| - **When in doubt, do both (A+B).** Over-providing costs nothing. | ||
| - **Always describe attached assets in the prompt.** Uploading without description = ignored. | ||
| - **Auth-walled content is YOUR job.** Bridge the gap between your access and Video Agent's. | ||
| - **URLs that fail:** Try the environment's standard web/content fetch capability. If login/paywall/404 → tell the user, ask for content directly. Never silently fabricate. | ||
| - **HTML URLs cannot go in `files[]`.** Video Agent rejects `text/html`. Web pages are ALWAYS Path A only. | ||
| - **Prefer download→upload→asset_id** over `files[]{url}`. HeyGen's servers are often blocked by CDN/WAF. |
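
The classification flow in this file can be sketched as a small routing function. This is an illustration under stated assumptions, not the skill's actual implementation: the `Asset` fields and the `kind` labels are hypothetical, the sketch follows the three-step flow rather than every row of the decision matrix, and treating exactly 500 words as "short" is a guess because the source only gives `< 500` and `> 500`.

```python
from dataclasses import dataclass
from typing import Optional

WORD_LIMIT = 500  # threshold taken from the classification flow


@dataclass
class Asset:
    kind: str                       # hypothetical labels: "image", "pdf", "html", "doc"
    show_on_screen: Optional[bool]  # True, False, or None when ambiguous
    word_count: int = 0
    auth_walled: bool = False


def route_asset(asset: Asset) -> str:
    """Return "A" (contextualize), "B" (attach), or "A+B" (both)."""
    # HTML pages and auth-walled content are always Path A:
    # Video Agent rejects text/html and cannot log in.
    if asset.kind == "html" or asset.auth_walled:
        return "A"
    if asset.show_on_screen is True:
        return "B"
    if asset.show_on_screen is None:
        return "A+B"  # ambiguous: do both
    # Not shown on screen: contextualize, and attach the full doc if it is long.
    if asset.word_count > WORD_LIMIT:
        return "A+B"
    return "A"
```

For example, `route_asset(Asset("image", show_on_screen=True))` yields `"B"`, while a long research document that the viewer should not see yields `"A+B"` so the script gets a summary and Video Agent still has the full source.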
plugins/heygen/skills/heygen-avatar/references/avatar-creation.md (178 additions, 0 deletions)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,178 @@ | ||
| # Avatar Creation API Surface | ||
|
|
||
| This guide expands `heygen-avatar` Phase 2 (avatar creation) and Phase 3 | ||
| (voice selection) with the full API surface, field mappings, and file | ||
| input formats. The SKILL.md gives the high-level workflow; this file is | ||
| the reference when you need exact arguments, edge cases, or alternative | ||
| creation paths. | ||
|
|
||
| For *avatar discovery* (finding an existing avatar at video time), see | ||
| [`../../heygen-video/references/avatar-discovery.md`](../../heygen-video/references/avatar-discovery.md). | ||
|
|
||
| --- | ||
|
|
||
| ## Avatar Creation: Three Types | ||
|
|
||
| `heygen-avatar` Phase 2 supports three creation types. Pick based on what | ||
| the user provides: | ||
|
|
||
| | User input | Type | Flow | | ||
| |---|---|---| | ||
| | A photo of a real person | `photo` | Photo avatar creation | | ||
| | A description of an appearance | `prompt` | Prompt-based avatar creation | | ||
| | A short video recording of a real person | `video` | Digital-twin creation | | ||
|
|
||
| All three accept an optional `avatar_group_id`: | ||
| - **Omit it** (Mode 1) to create a new character (new group). | ||
| - **Include it** (Mode 2) to add a new look (variation) to an existing character. | ||
|
|
||
| Always use Mode 2 (with `avatar_group_id`) when the avatar already exists | ||
| and you're creating a variant (different outfit, orientation fix, bg | ||
| change). Only use Mode 1 (new character) for genuinely new identities. | ||
|
|
||
| ### Photo avatar (from user's photo) | ||
|
|
||
| **App:** use the HeyGen app flow for photo avatar creation. | ||
|
|
||
| **CLI:** | ||
| ```bash | ||
| heygen avatar create -d '{ | ||
| "type": "photo", | ||
| "name": "My Avatar", | ||
| "file": {"type": "url", "url": "https://example.com/headshot.jpg"}, | ||
| "avatar_group_id": "<optional>" | ||
| }' | ||
| ``` | ||
|
|
||
| Photo requirements: | ||
| - JPEG or PNG | ||
| - Min 512x512 | ||
| - Clear front-facing face | ||
| - Good lighting | ||
|
|
||
| ### AI-generated avatar (from text prompt) | ||
|
|
||
| **App:** use the HeyGen app flow for prompt-based avatar creation. | ||
|
|
||
| **CLI:** | ||
| ```bash | ||
| heygen avatar create -d '{ | ||
| "type": "prompt", | ||
| "name": "Tech Presenter", | ||
| "prompt": "Young professional woman, modern workspace, confident smile", | ||
| "avatar_group_id": "<optional>" | ||
| }' | ||
| ``` | ||
|
|
||
| Prompt limit: 1000 characters (the API spec says 200 but the actual | ||
| enforced limit is 1000). Be descriptive — include style, features, | ||
| expression, lighting. | ||
|
|
||
| Optional: up to 3 `reference_images` to anchor the generated appearance. | ||
|
|
||
| ### Video avatar / digital twin (from a short recording) | ||
|
|
||
| **App:** use the HeyGen app flow for digital-twin creation from video. | ||
|
|
||
| **CLI:** | ||
| ```bash | ||
| heygen avatar create -d '{ | ||
| "type": "video", | ||
| "name": "My Video Avatar", | ||
| "file": {"type": "asset_id", "asset_id": "<uploaded_asset_id>"}, | ||
| "avatar_group_id": "<optional>" | ||
| }' | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## File Input Formats | ||
|
|
||
| `file` accepts three forms: | ||
|
|
||
| ```jsonc | ||
| // Public URL (no auth, no paywall) | ||
| { "type": "url", "url": "https://example.com/headshot.jpg" } | ||
|
|
||
| // Pre-uploaded asset (from `heygen asset create --file <path>`) | ||
| { "type": "asset_id", "asset_id": "<id>" } | ||
|
|
||
| // Inline base64 | ||
| { "type": "base64", "data": "<base64>", "content_type": "image/png" } | ||
| ``` | ||
|
|
||
| For when each is appropriate, see | ||
| [`references/asset-routing.md`](asset-routing.md). | ||
|
|
||
| --- | ||
|
|
||
| ## Response Shape | ||
|
|
||
| All three types return: | ||
| ```jsonc | ||
| { | ||
| "avatar_item": { | ||
| "id": "<look_id>", // ephemeral — the specific look | ||
| "group_id": "<group_id>" // stable — the character identity | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| - `id` is the **look_id** — what you pass downstream as `avatar_id` for | ||
| HeyGen video generation. | ||
| - `group_id` is the **character identity** — stable across looks. Save | ||
| this in the AVATAR-<NAME>.md file. Always resolve fresh look_ids at | ||
| video time via the avatar-looks flow rather than caching | ||
| a specific look_id. | ||
|
|
||
| --- | ||
|
|
||
| ## Identity Field → HeyGen Enum Mapping | ||
|
|
||
| When building a prompt-based avatar, map identity attributes to these | ||
| HeyGen enums: | ||
|
|
||
| - **age**: Young Adult | Early Middle Age | Late Middle Age | Senior | Unspecified | ||
| - **gender**: Man | Woman | Unspecified | ||
| - **ethnicity**: White | Black | Asian American | East Asian | South East Asian | South Asian | Middle Eastern | Pacific | Hispanic | Unspecified | ||
| - **style**: Realistic | Pixar | Cinematic | Vintage | Noir | Cyberpunk | Unspecified | ||
| - **orientation**: square | horizontal | vertical | ||
| - **pose**: half_body | close_up | full_body | ||
|
|
||
| --- | ||
|
|
||
| ## Voice Selection (during avatar setup) | ||
|
|
||
| After the avatar look is created, pair it with a voice. Two paths: | ||
|
|
||
| ### Path A — Voice Design (preferred) | ||
|
|
||
| Find matching voices via semantic search using the Voice section from | ||
| the AVATAR file. This searches HeyGen's full voice library. No new | ||
| voices are generated and no quota is consumed. | ||
|
|
||
| **Language matching:** The voice design prompt should specify the target | ||
| language from `user_language`. Example for Japanese: `"A calm, warm | ||
| female voice. Professional but approachable. Japanese speaker."` This | ||
| ensures semantic search returns voices in the correct language. | ||
|
|
||
| ### Path B — Voice Browse (fallback) | ||
|
|
||
| For manual catalog browsing: | ||
|
|
||
| **App:** browse available voices in the HeyGen app, filtered to the target language and voice characteristics when possible. | ||
|
|
||
| **CLI:** | ||
| ```bash | ||
| heygen voice list --type private --limit 20 | ||
| heygen voice list --type public --engine starfish --language en --gender female --limit 20 | ||
| ``` | ||
|
|
||
| **ALWAYS show a playable voice preview.** Each voice response includes | ||
| `preview_audio_url` — share it before committing. | ||
|
|
||
| **Handling missing/broken previews:** Some voices do not expose a usable | ||
| preview URL, and `preview_audio_url` can be `null`. When this happens, note "(no preview available)" and | ||
| offer to generate a short TTS sample via the app or | ||
| `heygen voice speech create --text "<sample>" --voice-id <id> | ||
| --input-type plain_text --language en --locale en-US` (CLI). |
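
The three creation types in this file share one payload shape, so the JSON passed to `heygen avatar create -d` can be assembled by a single helper. The sketch below is a hedged illustration: the field names come from the CLI examples in this diff, but the function name `build_avatar_payload` and its client-side checks are hypothetical and do not describe how the HeyGen API itself validates requests.

```python
import json
from typing import Optional

PROMPT_LIMIT = 1000  # enforced limit according to this reference file


def build_avatar_payload(
    avatar_type: str,
    name: str,
    *,
    file: Optional[dict] = None,
    prompt: Optional[str] = None,
    avatar_group_id: Optional[str] = None,
) -> dict:
    """Assemble the JSON body for one of the three avatar creation types."""
    if avatar_type not in ("photo", "prompt", "video"):
        raise ValueError(f"unknown avatar type: {avatar_type!r}")
    payload = {"type": avatar_type, "name": name}
    if avatar_type == "prompt":
        if not prompt:
            raise ValueError("prompt-based avatars need a prompt")
        if len(prompt) > PROMPT_LIMIT:
            raise ValueError("prompt exceeds 1000 characters")
        payload["prompt"] = prompt
    else:
        if file is None:
            raise ValueError("photo and video avatars need a file input")
        payload["file"] = file
    # Omit the group id for a new character; include it to add a look
    # to an existing character.
    if avatar_group_id is not None:
        payload["avatar_group_id"] = avatar_group_id
    return payload
```

Serializing with `json.dumps(build_avatar_payload("photo", "My Avatar", file={"type": "url", "url": "https://example.com/headshot.jpg"}))` gives a string suitable for the `-d` argument shown in the CLI examples.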