diff --git a/src/data/nav/platform.ts b/src/data/nav/platform.ts
index ce1a3b6682..383a459861 100644
--- a/src/data/nav/platform.ts
+++ b/src/data/nav/platform.ts
@@ -124,6 +124,15 @@ export default {
         link: '/docs/platform/pricing/limits',
         name: 'Limits',
       },
+      {
+        name: 'Pricing examples',
+        pages: [
+          {
+            link: '/docs/platform/pricing/examples/ai-chatbot',
+            name: 'AI support chatbot',
+          },
+        ],
+      },
       {
         link: '/docs/platform/pricing/faqs',
         name: 'Pricing FAQs',
diff --git a/src/pages/docs/ai-transport/index.mdx b/src/pages/docs/ai-transport/index.mdx
index ceec10b83a..2742030a2d 100644
--- a/src/pages/docs/ai-transport/index.mdx
+++ b/src/pages/docs/ai-transport/index.mdx
@@ -145,3 +145,16 @@ Take a look at some example code running in-browser of the sorts of features you
   },
 ]}
 
+
+## Pricing
+
+AI Transport uses Ably's [usage-based billing model](/docs/platform/pricing) at your package rates. Your consumption costs depend on the number of messages inbound (published to Ably) and outbound (delivered to subscribers), and on how long channels and connections are active. [Contact Ably](https://ably.com/contact) to discuss options for Enterprise pricing and volume discounts.
+
+The cost of streaming token responses over Ably depends on:
+
+- The number of tokens in the LLM responses you stream. For example, a simple support chatbot response might be around 300 tokens, a coding session can be 2,000-3,000 tokens, and a deep reasoning response could be over 50,000 tokens.
+- The rate at which your agent publishes tokens to Ably, and the number of messages it uses to do so. Some LLMs output every token as a single event, while others batch multiple tokens together. Similarly, your agent may publish tokens as they are received from the LLM, or perform its own processing and batching first.
+- The number of subscribers receiving the response.
+- The [token streaming pattern](/docs/ai-transport/token-streaming#token-streaming-patterns) you choose.
+
+For example, suppose an AI support chatbot sends a response of 300 tokens, each as a discrete update, using the [message-per-response](/docs/ai-transport/token-streaming/message-per-response) pattern, with a single client subscribed to the channel. With AI Transport's [append rollup](/docs/ai-transport/messaging/token-rate-limits#per-response), those 300 token events are conflated into 100 discrete inbound messages, resulting in 100 outbound messages and 100 persisted messages. See the [AI support chatbot pricing example](/docs/platform/pricing/examples/ai-chatbot) for a full breakdown of the costs in this scenario.
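+
+To make that arithmetic concrete, here is a minimal sketch of the rollup calculation, not an SDK call. It assumes the figures from the example above (300 token events, 75 appends per second from the agent, and the 25 messages per second rollup output rate used in the linked pricing example); all variable names are illustrative.
+
+```typescript
+// Illustrative model of append rollup for one streamed response (not an Ably API).
+const tokensPerResponse = 300; // discrete token updates published by the agent
+const appendsPerSecond = 75;   // rate at which the agent publishes appends
+const rollupPerSecond = 25;    // messages emitted after rollup (assumed, from the pricing example)
+
+const streamSeconds = tokensPerResponse / appendsPerSecond;  // 4 seconds of streaming
+const inboundMessages = streamSeconds * rollupPerSecond;     // 100 inbound messages
+
+const subscribers = 1;                                       // a single client on the channel
+const outboundMessages = inboundMessages * subscribers;      // 100 outbound messages
+const persistedMessages = inboundMessages;                   // every inbound message is persisted
+
+console.log({ streamSeconds, inboundMessages, outboundMessages, persistedMessages });
+```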
diff --git a/src/pages/docs/platform/pricing/examples/ai-chatbot.mdx b/src/pages/docs/platform/pricing/examples/ai-chatbot.mdx
new file mode 100644
index 0000000000..38c849905a
--- /dev/null
+++ b/src/pages/docs/platform/pricing/examples/ai-chatbot.mdx
@@ -0,0 +1,52 @@
+---
+title: AI support chatbot pricing example
+meta_description: "Calculate AI Transport pricing for conversations with an AI chatbot. Example shows how using the message-per-response pattern and modifying the append rollup window can generate cost savings."
+meta_keywords: "chatbot, support chat, token streaming, token cost, AI Transport pricing, Ably AI Transport pricing, stream cost, Pub/Sub pricing, realtime data delivery, Ably Pub/Sub pricing"
+intro: "This example applies consumption-based pricing to an AI support chatbot use case, where a single agent publishes tokens to a user over AI Transport."
+---
+
+### Assumptions
+
+The scale and features used in this calculation:
+
+| Scale | Features |
+|-------|----------|
+| 4 user prompts to reach resolution | ✓ Message-per-response |
+| 300 token events per LLM response | |
+| 75 appends per second from the agent | |
+| 3 minute average chat duration | |
+| 1 million chats | |
+
+### Cost summary
+
+The table below gives the high-level cost breakdown for this scenario. Messages are billed both inbound (published to Ably) and outbound (delivered to subscribers). Enabling the "Message updates, deletes and appends" [channel rule](/docs/ai-transport/token-streaming/message-per-response#enable) automatically enables message persistence.
+
+| Item | Calculation | Cost |
+|------|-------------|------|
+| Messages | 1212M × $2.50/M | $3030 |
+| Connection minutes | 6M × $1.00/M | $6 |
+| Channel minutes | 3M × $1.00/M | $3 |
+| Package fee | | [See plans](/pricing) |
+| **Total** | | **~$3039 per 1M chats** |
+
+### Message usage breakdown
+
+Several factors influence the total message usage. The message-per-response pattern includes [automatic rollup of append events](/docs/ai-transport/token-streaming/token-rate-limits#per-response) to reduce consumption costs and avoid rate limits.
+
+- Agent stream time: 300 token events ÷ 75 appends per second = 4 seconds of streaming per response
+- Messages published after rollup: 4 seconds × 25 messages/s = **100 messages per response**
+
+| Type | Calculation | Inbound | Outbound | Total messages | Cost |
+|------|-------------|---------|----------|----------------|------|
+| User prompts | 1M chats × 4 prompts | 4M | 4M | 8M | $20 |
+| Agent responses | 1M chats × 4 responses × 100 messages per response | 400M | 400M | 800M | $2000 |
+| Persisted messages | Every inbound message is persisted | 404M | 0 | 404M | $1010 |
+| **Total** | | **808M** | **404M** | **1212M** | **$3030** |
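+
+As a cross-check, the sketch below reproduces the full calculation in TypeScript. It is an illustrative model only: the volumes come from the assumptions table, the per-unit prices from the cost summary, and the two-connections-per-chat figure (user plus agent) is implied by the 6M connection minutes above; none of it is an Ably API.
+
+```typescript
+// Illustrative reproduction of this example's arithmetic (not an Ably API).
+const chats = 1_000_000;
+const promptsPerChat = 4;            // prompts (and responses) per chat
+const tokensPerResponse = 300;       // token events per LLM response
+const appendsPerSecond = 75;         // agent publish rate
+const rollupPerSecond = 25;          // messages emitted after append rollup
+const chatMinutes = 3;               // average chat duration
+
+const pricePerMillionMessages = 2.5; // $ per 1M messages, from the cost summary
+const pricePerMillionMinutes = 1.0;  // $ per 1M connection or channel minutes
+
+// 300 ÷ 75 = 4 s of streaming; 4 s × 25 msg/s = 100 messages per response.
+const messagesPerResponse = (tokensPerResponse / appendsPerSecond) * rollupPerSecond;
+
+const promptMessages = chats * promptsPerChat * 2;                             // 8M in + out
+const responseMessages = chats * promptsPerChat * messagesPerResponse * 2;     // 800M in + out
+const persistedMessages = chats * promptsPerChat * (1 + messagesPerResponse);  // 404M inbound persisted
+
+const totalMessages = promptMessages + responseMessages + persistedMessages;   // 1212M
+const messageCost = (totalMessages / 1e6) * pricePerMillionMessages;           // $3030
+
+// Two connections per chat (user + agent), one channel per chat.
+const connectionCost = ((chats * chatMinutes * 2) / 1e6) * pricePerMillionMinutes; // $6
+const channelCost = ((chats * chatMinutes) / 1e6) * pricePerMillionMinutes;        // $3
+
+console.log({ totalMessages, total: messageCost + connectionCost + channelCost }); // ≈ $3039
+```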