From 87a8d3d51b7bb1e846b276f22f326b5248a9e4cb Mon Sep 17 00:00:00 2001
From: Ben Echols
Date: Wed, 28 Jan 2026 19:51:15 -0800
Subject: [PATCH 1/3] Clarify throttling behavior when exceeding APS limits

Add detailed documentation about what actually happens when Namespace rate
limits are exceeded, based on community feedback.

Key clarifications:
- Priority-based throttling (low-priority ops throttled first)
- Burst tolerance allows temporary spikes
- ResourceExhausted errors and SDK retry behavior
- Calls CAN fail if throttling persists beyond retry limits
- Best practices for handling throttling

This corrects the previous implication that "work is never lost," which was
misleading - client calls can fail if not handled properly.

Co-Authored-By: Claude Opus 4.5
---
 docs/cloud/capacity-modes.mdx           | 18 ++++++++++++++----
 docs/evaluate/temporal-cloud/limits.mdx | 15 +++++++++++++++
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/docs/cloud/capacity-modes.mdx b/docs/cloud/capacity-modes.mdx
index bfeb153e8c..475c1b2e31 100644
--- a/docs/cloud/capacity-modes.mdx
+++ b/docs/cloud/capacity-modes.mdx
@@ -67,12 +67,22 @@ RPS and OPS are lower-level measures to control and balance request rates at th
 
 ### What happens when my Actions Rate exceeds my Limit?
 
-When your Action rate exceeds your quota, Temporal Cloud throttles Actions until the rate matches your quota.
-Throttling means limiting the rate at which Actions are performed to prevent the Namespace from exceeding its APS limit.
-Your work is never lost and will continue at the limited pace until APS returns below the limit.
+When your Action rate exceeds your quota, Temporal Cloud throttles Actions.
+Throttling limits the rate at which Actions are performed to prevent the Namespace from exceeding its APS limit.
+
+**How throttling works:**
+- Low-priority operations are throttled first; higher-priority operations (like starting or signaling Workflows) continue when possible.
+- A burst factor allows temporary spikes above your limit without immediate throttling.
+- When throttled, the server returns `ResourceExhausted` errors that SDK clients automatically retry.
+- If throttling persists beyond the SDK's retry limit, client calls can fail.
+
+**To avoid data loss during throttling:**
+- Log any failed client calls (with payloads) so you can retry or backfill later.
+- Set up [limit metrics](/cloud/metrics/openmetrics/metrics-reference#limit-metrics) to alert when approaching your limits.
+
 Your rate limits can be adjusted automatically over time or provisioned manually with Capacity Modes.
-We recommend tracking your Actions Rate and Limits using Temporal metrics to assess your use cases specific needs.
+We recommend tracking your Actions Rate and Limits using Temporal metrics to assess your use case's specific needs. See [Monitoring Trends Against Limits](/cloud/service-health#rps-aps-rate-limits) to track usage trends.
 
 :::note Actions that don't count against APS
 
diff --git a/docs/evaluate/temporal-cloud/limits.mdx b/docs/evaluate/temporal-cloud/limits.mdx
index d90ff3d037..07adf47287 100644
--- a/docs/evaluate/temporal-cloud/limits.mdx
+++ b/docs/evaluate/temporal-cloud/limits.mdx
@@ -62,6 +62,7 @@ The following limits apply at the Namespace level.
 - Automatically increases (and decreases) based on the last 7 days of APS usage. Will never go below the default limit.
 - See [Capacity Modes](/cloud/capacity-modes).
 - [Contact support](/cloud/support#support-ticket).
+- What happens when you exceed the limit: See [Throttling behavior](#throttling-behavior) below.
 
 See the [Actions page](/cloud/actions) for the list of actions.
 
@@ -106,6 +107,20 @@ This approach uniformly distributes the scheduled Workflow Execution launches th
 
 All read calls are subject to the Visibility API rate limit.
 
+### Throttling behavior
+
+When you exceed your APS, RPS, or OPS limits, Temporal Cloud throttles requests. Here's what happens:
+
+1. **Priority-based throttling**: Low-priority operations are throttled first. Higher-priority operations like `StartWorkflowExecution`, `SignalWorkflowExecution`, and `UpdateWorkflowExecution` continue to go through when possible.
+2. **Burst tolerance**: A burst factor allows temporary spikes above your limit for a short duration without immediate throttling.
+3. **ResourceExhausted errors**: When throttled, the server returns a `ResourceExhausted` gRPC error. SDK clients automatically retry these based on the default gRPC retry policy.
+4. **Potential failure**: If throttling persists beyond the SDK's retry limit, client calls fail. This means work _can_ be lost if you don't handle these failures.
+
+**Best practices for handling throttling:**
+- Log any failed `StartWorkflowExecution`, `SignalWorkflowExecution`, or `UpdateWorkflowExecution` calls on the client side, including the payload, so you can retry or backfill later.
+- Set up [Cloud metrics](/cloud/metrics/openmetrics/metrics-reference#limit-metrics) to alert when throttling occurs and when you approach your limits.
+- Consider [Provisioned Capacity](/cloud/capacity-modes#provisioned-capacity) if you have predictable spikes or need guaranteed throughput.
+
 ### Nexus Rate Limit {#nexus-rate-limits}
 
 Nexus requests (such as starting a Nexus Operation or sending a Nexus completion callback) are counted as part of the overall Namespace RPS limit.

From c7f27ba6494fc53945d4e46a4cf2eccb8a6d4224 Mon Sep 17 00:00:00 2001
From: Ben Echols
Date: Wed, 28 Jan 2026 20:02:19 -0800
Subject: [PATCH 2/3] Move throttling section, add OSS code link, cross-link pages

- Move throttling behavior section above schedules rate limit
- Add link to OSS throttling priorities (quotas.go)
- Link from capacity-modes to limits#throttling-behavior

Co-Authored-By: Claude Opus 4.5
---
 docs/cloud/capacity-modes.mdx           |  2 ++
 docs/evaluate/temporal-cloud/limits.mdx | 28 ++++++++++++++--------------
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/docs/cloud/capacity-modes.mdx b/docs/cloud/capacity-modes.mdx
index 475c1b2e31..b631ec7f40 100644
--- a/docs/cloud/capacity-modes.mdx
+++ b/docs/cloud/capacity-modes.mdx
@@ -80,6 +80,8 @@ Throttling limits the rate at which Actions are performed to prevent the Namespa
 - Log any failed client calls (with payloads) so you can retry or backfill later.
 - Set up [limit metrics](/cloud/metrics/openmetrics/metrics-reference#limit-metrics) to alert when approaching your limits.
 
+See [Throttling behavior](/cloud/limits#throttling-behavior) for more details.
+
 Your rate limits can be adjusted automatically over time or provisioned manually with Capacity Modes.
 We recommend tracking your Actions Rate and Limits using Temporal metrics to assess your use case's specific needs. See [Monitoring Trends Against Limits](/cloud/service-health#rps-aps-rate-limits) to track usage trends.
diff --git a/docs/evaluate/temporal-cloud/limits.mdx b/docs/evaluate/temporal-cloud/limits.mdx
index 07adf47287..cf1b0e9a9c 100644
--- a/docs/evaluate/temporal-cloud/limits.mdx
+++ b/docs/evaluate/temporal-cloud/limits.mdx
@@ -88,6 +88,20 @@ See the [glossary](/glossary#requests-per-second-rps) for more about RPS.
 
 See the [operations list](/references/operation-list) for the list of operations.
 
+### Throttling behavior
+
+When you exceed your APS, RPS, or OPS limits, Temporal Cloud throttles requests. Here's what happens:
+
+1. **Priority-based throttling**: Low-priority operations are throttled first. Higher-priority operations like `StartWorkflowExecution`, `SignalWorkflowExecution`, and `UpdateWorkflowExecution` continue to go through when possible. Temporal Cloud uses [throttling priorities](https://github.com/temporalio/temporal/blob/main/service/frontend/configs/quotas.go#L66) similar to those in the open source server.
+2. **Burst tolerance**: A burst factor allows temporary spikes above your limit for a short duration without immediate throttling.
+3. **ResourceExhausted errors**: When throttled, the server returns a `ResourceExhausted` gRPC error. SDK clients automatically retry these based on the default gRPC retry policy.
+4. **Potential failure**: If throttling persists beyond the SDK's retry limit, client calls fail. This means work _can_ be lost if you don't handle these failures.
+
+**Best practices for handling throttling:**
+- Log any failed `StartWorkflowExecution`, `SignalWorkflowExecution`, or `UpdateWorkflowExecution` calls on the client side, including the payload, so you can retry or backfill later.
+- Set up [Cloud metrics](/cloud/metrics/openmetrics/metrics-reference#limit-metrics) to alert when throttling occurs and when you approach your limits.
+- Consider [Provisioned Capacity](/cloud/capacity-modes#provisioned-capacity) if you have predictable spikes or need guaranteed throughput.
+
 ### Schedules rate limit
 
 - Scope: Namespace
@@ -107,20 +121,6 @@ This approach uniformly distributes the scheduled Workflow Execution launches th
 
 All read calls are subject to the Visibility API rate limit.
 
-### Throttling behavior
-
-When you exceed your APS, RPS, or OPS limits, Temporal Cloud throttles requests. Here's what happens:
-
-1. **Priority-based throttling**: Low-priority operations are throttled first. Higher-priority operations like `StartWorkflowExecution`, `SignalWorkflowExecution`, and `UpdateWorkflowExecution` continue to go through when possible.
-2. **Burst tolerance**: A burst factor allows temporary spikes above your limit for a short duration without immediate throttling.
-3. **ResourceExhausted errors**: When throttled, the server returns a `ResourceExhausted` gRPC error. SDK clients automatically retry these based on the default gRPC retry policy.
-4. **Potential failure**: If throttling persists beyond the SDK's retry limit, client calls fail. This means work _can_ be lost if you don't handle these failures.
-
-**Best practices for handling throttling:**
-- Log any failed `StartWorkflowExecution`, `SignalWorkflowExecution`, or `UpdateWorkflowExecution` calls on the client side, including the payload, so you can retry or backfill later.
-- Set up [Cloud metrics](/cloud/metrics/openmetrics/metrics-reference#limit-metrics) to alert when throttling occurs and when you approach your limits.
-- Consider [Provisioned Capacity](/cloud/capacity-modes#provisioned-capacity) if you have predictable spikes or need guaranteed throughput.
-
 ### Nexus Rate Limit {#nexus-rate-limits}
 
 Nexus requests (such as starting a Nexus Operation or sending a Nexus completion callback) are counted as part of the overall Namespace RPS limit.

From 6d8c3d7fed95a4e74fd379cf93d03b298c6889d4 Mon Sep 17 00:00:00 2001
From: Ben Echols
Date: Wed, 28 Jan 2026 20:39:27 -0800
Subject: [PATCH 3/3] Replace "burst" terminology with "throttling latency"

- capacity-modes.mdx: Change bullet point from "burst factor" to describe rate limiting latency
- limits.mdx: Rename "Burst tolerance" to "Throttling latency" and update description

Co-Authored-By: Claude Opus 4.5
---
 docs/cloud/capacity-modes.mdx           | 2 +-
 docs/evaluate/temporal-cloud/limits.mdx | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/cloud/capacity-modes.mdx b/docs/cloud/capacity-modes.mdx
index b631ec7f40..9115ec4c2c 100644
--- a/docs/cloud/capacity-modes.mdx
+++ b/docs/cloud/capacity-modes.mdx
@@ -72,7 +72,7 @@ Throttling limits the rate at which Actions are performed to prevent the Namespa
 
 **How throttling works:**
 - Low-priority operations are throttled first; higher-priority operations (like starting or signaling Workflows) continue when possible.
-- A burst factor allows temporary spikes above your limit without immediate throttling.
+- Rate limiting is not instantaneous, so usage may briefly exceed your limit before throttling takes effect.
 - When throttled, the server returns `ResourceExhausted` errors that SDK clients automatically retry.
 - If throttling persists beyond the SDK's retry limit, client calls can fail.
 
diff --git a/docs/evaluate/temporal-cloud/limits.mdx b/docs/evaluate/temporal-cloud/limits.mdx
index cf1b0e9a9c..7197632ab4 100644
--- a/docs/evaluate/temporal-cloud/limits.mdx
+++ b/docs/evaluate/temporal-cloud/limits.mdx
@@ -93,7 +93,7 @@ See the [operations list](/references/operation-list) for the list of operations.
 When you exceed your APS, RPS, or OPS limits, Temporal Cloud throttles requests. Here's what happens:
 
 1. **Priority-based throttling**: Low-priority operations are throttled first. Higher-priority operations like `StartWorkflowExecution`, `SignalWorkflowExecution`, and `UpdateWorkflowExecution` continue to go through when possible. Temporal Cloud uses [throttling priorities](https://github.com/temporalio/temporal/blob/main/service/frontend/configs/quotas.go#L66) similar to those in the open source server.
-2. **Burst tolerance**: A burst factor allows temporary spikes above your limit for a short duration without immediate throttling.
+2. **Throttling latency**: Rate limiting is not instantaneous, so usage may briefly exceed your limit before throttling takes effect.
 3. **ResourceExhausted errors**: When throttled, the server returns a `ResourceExhausted` gRPC error. SDK clients automatically retry these based on the default gRPC retry policy.
 4. **Potential failure**: If throttling persists beyond the SDK's retry limit, client calls fail. This means work _can_ be lost if you don't handle these failures.
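The best-practice bullet about logging failed `StartWorkflowExecution` calls with their payloads is easier to apply with a concrete sketch. The following Go snippet is illustrative only and is not part of the patches above; it assumes the Temporal Go SDK (`go.temporal.io/sdk/client`) plus a hypothetical `OrderWorkflow` type, `orders` Task Queue, and `OrderInput` payload. It starts a Workflow and, if the call still fails with `ResourceExhausted` after the SDK's built-in retries, records the payload so the start can be retried or backfilled later.

```go
package example

import (
	"context"
	"encoding/json"
	"errors"
	"log"

	"go.temporal.io/api/serviceerror"
	"go.temporal.io/sdk/client"
)

// OrderInput is a hypothetical payload type used only for this sketch.
type OrderInput struct {
	OrderID string `json:"orderId"`
	Amount  int    `json:"amount"`
}

// startOrSaveForBackfill tries to start a Workflow and, if the request is
// still throttled after the SDK's automatic retries, saves the payload so
// the start can be replayed later instead of being silently lost.
func startOrSaveForBackfill(ctx context.Context, c client.Client, in OrderInput) error {
	_, err := c.ExecuteWorkflow(ctx, client.StartWorkflowOptions{
		ID:        "order-" + in.OrderID,
		TaskQueue: "orders", // hypothetical Task Queue name
	}, "OrderWorkflow", in) // hypothetical Workflow Type name
	if err == nil {
		return nil
	}

	// The SDK already retried ResourceExhausted responses; if the error is
	// still ResourceExhausted here, throttling outlasted the retry budget.
	var throttled *serviceerror.ResourceExhausted
	if errors.As(err, &throttled) {
		payload, marshalErr := json.Marshal(in)
		if marshalErr != nil {
			return marshalErr
		}
		// Minimal approach: log the payload so it can be retried or
		// backfilled later once throttling subsides.
		log.Printf("throttled Workflow start, saved for backfill: type=OrderWorkflow payload=%s", payload)
		return nil
	}
	return err
}
```

In practice the payload would go to a durable store (a queue or outbox table) rather than a log line, and a separate process would replay saved starts once the Namespace is back under its limit.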