12 changes: 12 additions & 0 deletions .agents/plugins/marketplace.json
@@ -1406,6 +1406,18 @@
},
"category": "Design"
},
{
"name": "heygen",
"source": {
"source": "local",
"path": "./plugins/heygen"
},
"policy": {
"installation": "AVAILABLE",
"authentication": "ON_INSTALL"
},
"category": "Design"
},
{
"name": "supabase",
"source": {
7 changes: 7 additions & 0 deletions plugins/heygen/.app.json
@@ -0,0 +1,7 @@
{
"apps": {
"heygen": {
"id": "asdk_app_69418aad55e08191aa5e437b649ca2e4"
}
}
}
44 changes: 44 additions & 0 deletions plugins/heygen/.codex-plugin/plugin.json
@@ -0,0 +1,44 @@
{
"name": "heygen",
"version": "2.2.0",
"description": "Create HeyGen avatar videos and personalized video messages. Build a persistent digital identity from a photo, then generate presenter-led videos with your digital twin.",
"author": {
"name": "HeyGen",
"email": "developers@heygen.com",
"url": "https://heygen.com"
},
"homepage": "https://heygen.com",
"repository": "https://github.com/heygen-com/skills",
"license": "MIT",
"keywords": [
"heygen",
"avatar",
"identity",
"video",
"digital-twin",
"video-message",
"presenter",
"talking-head",
"ai-avatar",
"avatar-video"
],
"skills": "./skills/",
"apps": "./.app.json",
"interface": {
"displayName": "HeyGen",
"shortDescription": "Avatar videos and personalized video messages",
"longDescription": "HeyGen Skills give your agent a face, a voice, and the ability to send video like a message. Use heygen-avatar to build a persistent digital identity from a photo and pick a voice, then heygen-video to generate identity-first presenter videos via the HeyGen v3 Video Agent pipeline (avatar resolution, aspect ratio correction, prompt engineering, and voice selection are handled automatically).",
"developerName": "HeyGen",
"category": "Design",
"capabilities": ["Read", "Write"],
"websiteURL": "https://heygen.com",
"defaultPrompt": [
"Create my HeyGen avatar from this photo",
"Make a 30-second intro video of myself",
"Send a video update to my team about this week's progress"
],
"brandColor": "#0a0a0a",
"composerIcon": "./assets/icon.png",
"logo": "./assets/logo.png"
}
}
25 changes: 25 additions & 0 deletions plugins/heygen/README.md
@@ -0,0 +1,25 @@
# heygen

OpenAI Codex plugin for [HeyGen](https://heygen.com) — create AI avatar videos and personalized video messages.

## What's included

Two skills that chain together, plus a reference to the HeyGen ChatGPT app:

- **heygen-avatar** — turn a photo into a persistent digital twin. Handles avatar lookup, instant-avatar creation, voice selection (or voice cloning), and writes an `AVATAR` file the video skill reads back.
- **heygen-video** — generate identity-first presenter videos via the HeyGen v3 Video Agent pipeline. Encodes the prompting, asset routing, aspect-ratio correction, and avatar/voice resolution that good HeyGen videos need.
- **HeyGen app reference** — `.app.json` points at the curated [HeyGen ChatGPT app](https://chatgpt.com/apps/heygen/asdk_app_69418aad55e08191aa5e437b649ca2e4).

## Requirements

Installing the plugin connects the HeyGen ChatGPT app automatically (OAuth on first use). That alone is enough for both skills to work end-to-end, drawing on the credits of the user's existing HeyGen plan.

If you'd rather not use the app, the skills also support the HeyGen CLI: install it from <https://static.heygen.ai/cli/install.sh> and export `HEYGEN_API_KEY` (get one at <https://app.heygen.com/api>).

## Source of truth

The skills are authored in [`heygen-com/skills`](https://github.com/heygen-com/skills) (under `heygen-avatar/` and `heygen-video/` at the repo root) and mirrored here. The main structural delta in this mirror is the wrapping `skills/` parent directory required by the Codex plugin convention. File issues about skill content on that repo.

## License

MIT
6 changes: 6 additions & 0 deletions plugins/heygen/agents/openai.yaml
@@ -0,0 +1,6 @@
interface:
display_name: "HeyGen"
short_description: "Create avatar videos and personalized video messages"
icon_small: "./assets/icon.png"
icon_large: "./assets/logo.png"
default_prompt: "Help me create a personalized HeyGen video message. Ask who should appear on camera, who the audience is, the key points, and the tone before generating it."
189 changes: 189 additions & 0 deletions plugins/heygen/assets/PRISM_ORB.svg
Binary file added plugins/heygen/assets/icon.png
Binary file added plugins/heygen/assets/logo.png
452 changes: 452 additions & 0 deletions plugins/heygen/skills/heygen-avatar/SKILL.md

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions plugins/heygen/skills/heygen-avatar/agents/openai.yaml
@@ -0,0 +1,4 @@
interface:
display_name: "HeyGen Avatar"
short_description: "Create reusable HeyGen avatar identities"
default_prompt: "Create a reusable HeyGen avatar for me from a photo or written description, then help me choose a matching voice."
86 changes: 86 additions & 0 deletions plugins/heygen/skills/heygen-avatar/references/asset-routing.md
@@ -0,0 +1,86 @@
# Asset Handling — The Classification Engine

When the user provides files, URLs, or references, route each asset to the right path. The user should NEVER have to think about this.

## Two Paths

| Path | What happens | When to use |
|------|-------------|-------------|
| **A: Contextualize → Prompt** | Read/analyze the asset, extract key info, bake into script. Video Agent never sees the original. | Reference material, auth-walled content, documents where the *information* matters more than the *visual*. |
| **B: Attach to API** | Upload the raw file via `files[]`. Video Agent analyzes, extracts graphics, uses as frames/B-roll. | Screenshots, branded assets, PDFs with important visual layouts, images the viewer should literally see. |
| **A+B: Both** | Contextualize for script quality AND attach for visual use. | Long docs where you need to summarize but Video Agent should also have the full source. |

## Classification Flow

```
1. Can Video Agent access this directly?
- Public URL (no auth, no paywall) → YES
- Private/internal URL → NO
- Local file → NO (must upload first)

2. Should the viewer SEE this asset?
- Screenshot, logo, product image, chart → YES → Path B
- Research doc, article, context material → NO → Path A
- Ambiguous → Path A+B

3. Is the content too long for the prompt?
- Short (≤ 500 words) → fits in prompt
- Long (> 500 words) → summarize key points, attach full doc
```
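The three-question flow above can be sketched as a small Python function. This is a hypothetical helper, not part of the HeyGen skills or CLI. The `Asset` fields and the route labels `"contextualize"` and `"both"` are assumptions; `"attach"` and the `type`/`route`/`accessible` keys mirror the learning-log example further down this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Asset:
    # Hypothetical description of one user-provided asset.
    kind: str                        # e.g. "screenshot", "logo", "article", "web_page"
    publicly_accessible: bool        # public URL with no auth or paywall
    show_on_screen: Optional[bool]   # True / False, or None when ambiguous
    word_count: int = 0              # 0 for purely visual assets


def classify(asset: Asset) -> dict:
    """Route one asset to Path A, B, or A+B following the flow above."""
    # Step 2: should the viewer literally see the asset?
    if asset.show_on_screen is None:
        route = "both"            # ambiguous -> Path A+B
    elif asset.show_on_screen:
        route = "attach"          # Path B
    else:
        route = "contextualize"   # Path A

    # Step 3: long text is summarized into the prompt AND attached in full.
    if asset.word_count > 500:
        route = "both"

    # Rule: Video Agent rejects text/html, so web pages are Path A only.
    if asset.kind == "web_page":
        route = "contextualize"

    return {
        "type": asset.kind,
        "route": route,
        # Step 1: only public URLs are reachable by Video Agent; anything
        # else must be uploaded (Path B) or contextualized by you (Path A).
        "accessible": asset.publicly_accessible,
    }
```

The `reason` field of the log entry is left to the caller, since it is a judgment call rather than something derivable from these flags.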

## Decision Matrix

| Asset Type | Publicly Accessible? | Show On Screen? | Route |
|-----------|---------------------|----------------|-------|
| Screenshot / image | N/A | Yes | **B: Attach** + describe in prompt as B-roll |
| Logo / brand asset | N/A | Yes | **B: Attach** + anchor to intro/outro |
| Public URL to file (PDF, image, video) | Yes | Maybe | **B: Download → upload via `/v3/assets` → pass `asset_id`** + summarize |
| Public URL to web page (HTML) | Yes | No | **A: Fetch and contextualize only.** Do NOT pass HTML URLs in `files[]`. |
| Auth-walled URL (requires login) | No | No | **A: Ask the user to paste the content.** Never fabricate. |
| PDF (short, text-heavy) | N/A | No | **A+B: Extract key points** + attach |
| PDF (long, visual-rich) | N/A | Maybe | **B: Attach** + summarize top points |
| Raw data / spreadsheet | N/A | Partially | **A: Analyze and describe** key stats. Attach if charts should appear. |

## Executing Routes

### Path A (Contextualize)
- URLs: retrieve publicly accessible content with the environment's standard web/content fetch capability
- For auth-walled content you cannot access: ask the user to paste the text directly
- Extract the 3-5 most important points relevant to the video
- Weave naturally into the script. Don't dump. Integrate.

### Path B (Attach)
Upload to HeyGen:

**App:** upload through the HeyGen app's asset flow when available.

**CLI:** `heygen asset create --file /path/to/file.png`

Max 32MB per file. Returns JSON with the new `asset_id`.

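Each `files[]` entry takes one of three forms: a public URL, an uploaded `asset_id`, or inline base64:
```jsonc
// Public URL (no auth, no paywall)
{"type": "url", "url": "https://example.com/image.png"}
// Pre-uploaded asset
{"type": "asset_id", "asset_id": "<from upload>"}
// Inline base64
{"type": "base64", "data": "<base64>", "content_type": "image/png"}
```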
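Here is a minimal Python sketch that builds one `files[]` entry in each of the three forms, assuming the field names shown above. The helper names are hypothetical. The 32MB cap is treated as 32 MiB, which is an assumption. The HTML check guesses from the URL's extension only, so it catches `.html` links but not extensionless page URLs.

```python
import base64
import mimetypes
from pathlib import Path

MAX_BYTES = 32 * 1024 * 1024  # 32MB per file, read here as MiB (assumption)


def url_entry(url: str) -> dict:
    # Public URL form. Video Agent rejects text/html, so refuse obvious pages.
    guessed, _ = mimetypes.guess_type(url)
    if guessed == "text/html":
        raise ValueError("web pages are Path A only; do not pass HTML URLs in files[]")
    return {"type": "url", "url": url}


def asset_id_entry(asset_id: str) -> dict:
    # Preferred form: reference an asset already uploaded to HeyGen.
    return {"type": "asset_id", "asset_id": asset_id}


def base64_entry(path: Path) -> dict:
    # Inline form: read the local file, enforce the size cap, then encode it.
    data = path.read_bytes()
    if len(data) > MAX_BYTES:
        raise ValueError(f"{path.name} is {len(data)} bytes; limit is {MAX_BYTES}")
    content_type, _ = mimetypes.guess_type(path.name)
    if content_type is None:
        raise ValueError(f"cannot infer content_type for {path.name}")
    return {
        "type": "base64",
        "data": base64.b64encode(data).decode("ascii"),
        "content_type": content_type,
    }
```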

### Describe Asset Usage in Prompt
Be SPECIFIC:
- "Use the uploaded dashboard screenshot as B-roll when discussing analytics"
- "Display the company logo in the intro and end card"

### Log Classification
In the learning log entry, record:
```json
"assets_classified": [{"type": "image", "route": "attach", "accessible": true, "reason": "product screenshot"}]
```

## Rules

- **Never ask the user which path unless genuinely 50/50.** You're the producer. Make the call.
- **When in doubt, do both (A+B).** Over-providing costs nothing.
- **Always describe attached assets in the prompt.** Uploading without description = ignored.
- **Auth-walled content is YOUR job.** Bridge the gap between your access and Video Agent's.
- **URLs that fail:** Try the environment's standard web/content fetch capability. If login/paywall/404 → tell the user, ask for content directly. Never silently fabricate.
- **HTML URLs cannot go in `files[]`.** Video Agent rejects `text/html`. Web pages are ALWAYS Path A only.
- **Prefer download→upload→asset_id** over `files[]{url}`. HeyGen's servers are often blocked by CDNs or WAFs when they try to fetch remote URLs.
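The preferred download→upload→asset_id flow can be sketched in Python as follows. You fetch the file yourself, refuse HTML, and save it locally, then hand the path to `heygen asset create --file`. This is a hypothetical helper. It only builds the CLI argument list rather than running it, and it does not parse the CLI's JSON output beyond what this page states (that it contains `asset_id`). A 404 surfaces as `urllib.error.HTTPError`, and a login or paywall page usually comes back as HTML, so both cases end up as errors to report to the user.

```python
import shutil
import tempfile
import urllib.parse
import urllib.request
from pathlib import Path


def download_for_upload(url: str, timeout: float = 30.0) -> Path:
    """Download a publicly accessible file to a temp path for upload."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Web pages are Path A only: Video Agent rejects text/html in files[].
        if resp.headers.get_content_type() == "text/html":
            raise ValueError(f"{url} is a web page; contextualize it instead")
        # Keep the original extension so the file type stays recognizable.
        suffix = Path(urllib.parse.urlparse(url).path).suffix
        fd, name = tempfile.mkstemp(suffix=suffix)
        with open(fd, "wb") as out:
            shutil.copyfileobj(resp, out)
    return Path(name)


def upload_command(path: Path) -> list[str]:
    # Argument list for the CLI upload command documented above.
    return ["heygen", "asset", "create", "--file", str(path)]
```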
178 changes: 178 additions & 0 deletions plugins/heygen/skills/heygen-avatar/references/avatar-creation.md
@@ -0,0 +1,178 @@
# Avatar Creation API Surface

This guide expands `heygen-avatar` Phase 2 (avatar creation) and Phase 3
(voice selection) with the full API surface, field mappings, and file
input formats. The SKILL.md gives the high-level workflow; this file is
the reference when you need exact arguments, edge cases, or alternative
creation paths.

For *avatar discovery* (finding an existing avatar at video time), see
[`../../heygen-video/references/avatar-discovery.md`](../../heygen-video/references/avatar-discovery.md).

---

## Avatar Creation: Three Types

`heygen-avatar` Phase 2 supports three creation types. Pick based on what
the user provides:

| User input | Type | Flow |
|---|---|---|
| A photo of a real person | `photo` | Photo avatar creation |
| A description of an appearance | `prompt` | Prompt-based avatar creation |
| A short video recording of a real person | `video` | Digital-twin creation |

All three accept an optional `avatar_group_id`:
- **Omit it** to create a new character (new group).
- **Include it** to add a new look (variation) to an existing character.

Always include `avatar_group_id` (Mode 2) when the avatar already exists
and you're creating a variant (different outfit, orientation fix,
background change). Omit it (Mode 1, new character) only for genuinely
new identities.
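This Python sketch assembles the `-d` payload for `heygen avatar create` and includes `avatar_group_id` only when adding a look to an existing character. The function name is hypothetical. The field names come from the three CLI examples on this page, and the command is built but not executed.

```python
import json
from typing import Optional

# Field each creation type needs, per the CLI examples on this page.
REQUIRED_FIELD = {"photo": "file", "prompt": "prompt", "video": "file"}


def avatar_create_args(avatar_type: str, name: str, *,
                       file: Optional[dict] = None,
                       prompt: Optional[str] = None,
                       avatar_group_id: Optional[str] = None) -> list[str]:
    if avatar_type not in REQUIRED_FIELD:
        raise ValueError(f"unknown avatar type: {avatar_type}")
    payload = {"type": avatar_type, "name": name}
    if REQUIRED_FIELD[avatar_type] == "file":
        if file is None:
            raise ValueError(f"{avatar_type} avatars need a file object")
        payload["file"] = file
    else:
        if not prompt:
            raise ValueError("prompt avatars need a non-empty prompt")
        payload["prompt"] = prompt
    # Mode 2 (new look for an existing character): include the group id.
    # Mode 1 (brand-new character): omit it so a new group is created.
    if avatar_group_id:
        payload["avatar_group_id"] = avatar_group_id
    return ["heygen", "avatar", "create", "-d", json.dumps(payload)]
```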

### Photo avatar (from user's photo)

**App:** use the HeyGen app flow for photo avatar creation.

**CLI:**
```bash
heygen avatar create -d '{
"type": "photo",
"name": "My Avatar",
"file": {"type": "url", "url": "https://example.com/headshot.jpg"},
"avatar_group_id": "<optional>"
}'
```

Photo requirements:
- JPEG or PNG
- Min 512x512
- Clear front-facing face
- Good lighting
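Only the format and size requirements can be checked mechanically. Face orientation and lighting still need a human or a vision model. The Python sketch below reads the dimensions straight from PNG and JPEG headers, so no imaging library is needed. It is a simplified, hypothetical parser: it ignores JPEG fill bytes and other rare marker layouts.

```python
import struct


def image_size(data: bytes) -> tuple[str, int, int]:
    """Return (format, width, height) for a PNG or JPEG byte string."""
    if data[:8] == b"\x89PNG\r\n\x1a\n":
        # IHDR is always the first chunk: width, height are big-endian u32.
        width, height = struct.unpack(">II", data[16:24])
        return "png", width, height
    if data[:2] == b"\xff\xd8":
        i = 2
        while i + 9 <= len(data):
            if data[i] != 0xFF:
                raise ValueError("corrupt JPEG marker stream")
            marker = data[i + 1]
            # SOFn markers (excluding 0xC4, 0xC8, 0xCC) carry height, width.
            if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
                height, width = struct.unpack(">HH", data[i + 5:i + 9])
                return "jpeg", width, height
            seg_len = struct.unpack(">H", data[i + 2:i + 4])[0]
            i += 2 + seg_len
        raise ValueError("no SOF marker found")
    raise ValueError("not a PNG or JPEG file")


def check_photo(data: bytes, min_side: int = 512) -> list[str]:
    """List the machine-checkable photo requirements this file fails."""
    try:
        _fmt, width, height = image_size(data)
    except ValueError as exc:
        return [str(exc)]
    if width < min_side or height < min_side:
        return [f"{width}x{height} is below {min_side}x{min_side}"]
    return []
```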

### AI-generated avatar (from text prompt)

**App:** use the HeyGen app flow for prompt-based avatar creation.

**CLI:**
```bash
heygen avatar create -d '{
"type": "prompt",
"name": "Tech Presenter",
"prompt": "Young professional woman, modern workspace, confident smile",
"avatar_group_id": "<optional>"
}'
```

Prompt limit: 1000 characters (the API spec says 200 but the actual
enforced limit is 1000). Be descriptive — include style, features,
expression, lighting.

Optional: up to 3 `reference_images` to anchor the generated appearance.
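A small Python pre-flight check for prompt-based requests follows. It is a hypothetical helper that enforces the 1000-character enforced limit and the 3-image cap stated above. It assumes the API counts characters the way Python's `len()` does (Unicode code points). Pass `limit=200` if you want to stay within the published spec instead.

```python
from typing import Optional, Sequence

PROMPT_LIMIT = 1000        # enforced limit reported above (the spec says 200)
MAX_REFERENCE_IMAGES = 3   # documented cap on reference_images


def check_prompt_request(prompt: str,
                         reference_images: Optional[Sequence] = None,
                         limit: int = PROMPT_LIMIT) -> None:
    """Raise ValueError if a prompt-avatar request breaks the documented limits."""
    if not prompt.strip():
        raise ValueError("prompt is empty")
    # len() counts code points; the API's counting rule is assumed to match.
    if len(prompt) > limit:
        raise ValueError(f"prompt is {len(prompt)} characters; limit is {limit}")
    if reference_images is not None and len(reference_images) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference_images are allowed")
```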

### Video avatar / digital twin (from a short recording)

**App:** use the HeyGen app flow for digital-twin creation from video.

**CLI:**
```bash
heygen avatar create -d '{
"type": "video",
"name": "My Video Avatar",
"file": {"type": "asset_id", "asset_id": "<uploaded_asset_id>"},
"avatar_group_id": "<optional>"
}'
```

---

## File Input Formats

`file` accepts three forms:

```jsonc
// Public URL (no auth, no paywall)
{ "type": "url", "url": "https://example.com/headshot.jpg" }

// Pre-uploaded asset (from `heygen asset create --file <path>`)
{ "type": "asset_id", "asset_id": "<id>" }

// Inline base64
{ "type": "base64", "data": "<base64>", "content_type": "image/png" }
```

For guidance on when to use each form, see
[`references/asset-routing.md`](asset-routing.md).

---

## Response Shape

All three types return:
```jsonc
{
"avatar_item": {
"id": "<look_id>", // ephemeral — the specific look
"group_id": "<group_id>" // stable — the character identity
}
}
```

- `id` is the **look_id** — what you pass downstream as `avatar_id` for
HeyGen video generation.
- `group_id` is the **character identity**, stable across looks. Save
this in the `AVATAR-<NAME>.md` file. Always resolve fresh look_ids at
video time via the avatar-looks flow rather than caching a specific
look_id.
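In Python, pulling both IDs out of the response shape above takes a couple of lines. This is a hypothetical helper. Only the `group_id` is meant to be persisted; the look id is used immediately and re-resolved later.

```python
import json


def parse_create_response(raw: str) -> tuple[str, str]:
    """Return (look_id, group_id) from an avatar-create response body."""
    item = json.loads(raw)["avatar_item"]
    # id: the specific look, passed downstream as avatar_id (don't cache it).
    # group_id: the stable character identity to record in the AVATAR file.
    return item["id"], item["group_id"]
```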

---

## Identity Field → HeyGen Enum Mapping

When building a prompt-based avatar, map identity attributes to these
HeyGen enums:

- **age**: Young Adult | Early Middle Age | Late Middle Age | Senior | Unspecified
- **gender**: Man | Woman | Unspecified
- **ethnicity**: White | Black | Asian American | East Asian | South East Asian | South Asian | Middle Eastern | Pacific | Hispanic | Unspecified
- **style**: Realistic | Pixar | Cinematic | Vintage | Noir | Cyberpunk | Unspecified
- **orientation**: square | horizontal | vertical
- **pose**: half_body | close_up | full_body
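This Python sketch normalizes free-form identity attributes onto the enums above. It matches case-insensitively against the listed spellings only. Synonyms such as "female" → `Woman` would need a hand-written table, which is left out here. The sketch falls back to `Unspecified` where that value exists and raises otherwise. Which request fields carry these values is not shown on this page, so this is a hypothetical helper.

```python
from typing import Optional

ENUMS = {
    "age": ["Young Adult", "Early Middle Age", "Late Middle Age", "Senior", "Unspecified"],
    "gender": ["Man", "Woman", "Unspecified"],
    "ethnicity": ["White", "Black", "Asian American", "East Asian", "South East Asian",
                  "South Asian", "Middle Eastern", "Pacific", "Hispanic", "Unspecified"],
    "style": ["Realistic", "Pixar", "Cinematic", "Vintage", "Noir", "Cyberpunk", "Unspecified"],
    "orientation": ["square", "horizontal", "vertical"],
    "pose": ["half_body", "close_up", "full_body"],
}


def to_enum(field: str, value: Optional[str]) -> str:
    allowed = ENUMS[field]
    if value:
        # Case-insensitive exact match against the allowed spellings.
        for option in allowed:
            if option.lower() == value.strip().lower():
                return option
    # age, gender, ethnicity, and style accept Unspecified as a fallback.
    if "Unspecified" in allowed:
        return "Unspecified"
    raise ValueError(f"{field} must be one of {allowed}, got {value!r}")
```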

---

## Voice Selection (during avatar setup)

After the avatar look is created, pair it with a voice. Two paths:

### Path A — Voice Design (preferred)

Find matching voices via semantic search using the Voice section from
the AVATAR file. This searches HeyGen's full voice library. No new
voices are generated and no quota is consumed.

**Language matching:** The voice design prompt should specify the target
language from `user_language`. Example for Japanese: `"A calm, warm
female voice. Professional but approachable. Japanese speaker."` This
ensures semantic search returns voices in the correct language.

### Path B — Voice Browse (fallback)

For manual catalog browsing:

**App:** browse available voices in the HeyGen app, filtered to the target language and voice characteristics when possible.

**CLI:**
```bash
heygen voice list --type private --limit 20
heygen voice list --type public --engine starfish --language en --gender female --limit 20
```

**ALWAYS show a playable voice preview.** Each voice response includes
`preview_audio_url` — share it before committing.

**Handling missing/broken previews:** Some voices may not expose a usable
preview URL and can return `null`. When this happens: note "(no preview available)" and
offer to generate a short TTS sample via the app or
`heygen voice speech create --text "<sample>" --voice-id <id>
--input-type plain_text --language en --locale en-US` (CLI).
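In Python, the preview rule can be applied while listing voices. This is a hypothetical sketch. Only `preview_audio_url` is documented above; the `name` and `voice_id` keys are assumptions. A URL that is present but broken cannot be detected without fetching it, so this handles only `null` or empty values. The fallback builds, but does not run, the TTS sample command shown above.

```python
from typing import Optional


def describe_voice(voice: dict) -> str:
    """One display line per voice, flagging a missing preview."""
    # "name" and "voice_id" keys are assumptions; preview_audio_url is documented.
    preview: Optional[str] = voice.get("preview_audio_url")
    label = f'{voice.get("name", "?")} ({voice.get("voice_id", "?")})'
    return f"{label}: {preview}" if preview else f"{label}: (no preview available)"


def sample_speech_args(voice_id: str, text: str,
                       language: str = "en", locale: str = "en-US") -> list[str]:
    # Fallback TTS sample via the CLI command shown above.
    return ["heygen", "voice", "speech", "create", "--text", text,
            "--voice-id", voice_id, "--input-type", "plain_text",
            "--language", language, "--locale", locale]
```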