diff --git a/docs/cloud/capacity-modes.mdx b/docs/cloud/capacity-modes.mdx
index bfeb153e8c..9115ec4c2c 100644
--- a/docs/cloud/capacity-modes.mdx
+++ b/docs/cloud/capacity-modes.mdx
@@ -67,12 +67,24 @@ RPS and OPS are lower-level measures to control and balance request rates at th
 
 ### What happens when my Actions Rate exceeds my Limit?
 
-When your Action rate exceeds your quota, Temporal Cloud throttles Actions until the rate matches your quota.
-Throttling means limiting the rate at which Actions are performed to prevent the Namespace from exceeding its APS limit.
-Your work is never lost and will continue at the limited pace until APS returns below the limit.
+When your Action rate exceeds your quota, Temporal Cloud throttles Actions.
+Throttling limits the rate at which Actions are performed to prevent the Namespace from exceeding its APS limit.
+
+**How throttling works:**
+- Low-priority operations are throttled first; higher-priority operations (like starting or signaling Workflows) continue when possible.
+- Rate limiting is not instantaneous, so usage may briefly exceed your limit before throttling takes effect.
+- When throttled, the server returns `ResourceExhausted` errors that SDK clients automatically retry.
+- If throttling persists beyond the SDK's retry limit, client calls can fail.
+
+**To avoid data loss during throttling:**
+- Log any failed client calls (with payloads) so you can retry or backfill later.
+- Set up [limit metrics](/cloud/metrics/openmetrics/metrics-reference#limit-metrics) to alert when approaching your limits.
+
+See [Throttling behavior](/cloud/limits#throttling-behavior) for more details.
+
 
 Your rate limits can be adjusted automatically over time or provisioned manually with Capacity Modes.
-We recommend tracking your Actions Rate and Limits using Temporal metrics to assess your use cases specific needs.
+We recommend tracking your Actions Rate and Limits using Temporal metrics to assess your use case's specific needs. See [Monitoring Trends Against Limits](/cloud/service-health#rps-aps-rate-limits) to track usage trends.
 
 :::note Actions that don't count against APS
 
diff --git a/docs/evaluate/temporal-cloud/limits.mdx b/docs/evaluate/temporal-cloud/limits.mdx
index d90ff3d037..7197632ab4 100644
--- a/docs/evaluate/temporal-cloud/limits.mdx
+++ b/docs/evaluate/temporal-cloud/limits.mdx
@@ -62,6 +62,7 @@ The following limits apply at the Namespace level.
 - Automatically increases (and decreases) based on the last 7 days of APS usage. Will never go below the default limit.
 - See [Capacity Modes](/cloud/capacity-modes).
 - [Contact support](/cloud/support#support-ticket).
+- What happens when you exceed the limit: See [Throttling behavior](#throttling-behavior) below.
 
 See the [Actions page](/cloud/actions) for the list of actions.
 
@@ -87,6 +88,20 @@ See the [glossary](/glossary#requests-per-second-rps) for more about RPS.
 
 See the [operations list](/references/operation-list) for the list of operations.
 
+### Throttling behavior
+
+When you exceed your APS, RPS, or OPS limits, Temporal Cloud throttles requests. Here's what happens:
+
+1. **Priority-based throttling**: Low-priority operations are throttled first. Higher-priority operations like `StartWorkflowExecution`, `SignalWorkflowExecution`, and `UpdateWorkflowExecution` continue to go through when possible. Temporal Cloud uses similar [throttling priorities as the open source server](https://github.com/temporalio/temporal/blob/main/service/frontend/configs/quotas.go#L66).
+2. **Throttling latency**: Rate limiting is not instantaneous, so usage may briefly exceed your limit before throttling takes effect.
+3. **ResourceExhausted errors**: When throttled, the server returns a `ResourceExhausted` gRPC error. SDK clients automatically retry these based on the default gRPC retry policy.
+4. **Potential failure**: If throttling persists beyond the SDK's retry limit, client calls fail. This means work _can_ be lost if you don't handle these failures.
+
+**Best practices for handling throttling:**
+- Log any failed `StartWorkflowExecution`, `SignalWorkflowExecution`, or `UpdateWorkflowExecution` calls on the client side, including the payload, so you can retry or backfill later.
+- Set up [Cloud metrics](/cloud/metrics/openmetrics/metrics-reference#limit-metrics) to alert when throttling occurs and when you approach your limits.
+- Consider [Provisioned Capacity](/cloud/capacity-modes#provisioned-capacity) if you have predictable spikes or need guaranteed throughput.
+
 ### Schedules rate limit
 
 - Scope: Namespace
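The best-practices bullets in the added "Throttling behavior" section recommend logging failed `StartWorkflowExecution` calls, including their payloads, so they can be retried or backfilled once throttling subsides. Below is a minimal sketch of that pattern using the Temporal Python SDK; the `OrderWorkflow` type, `orders` Task Queue, Workflow ID scheme, endpoint placeholders, and log destination are hypothetical, and the check assumes exhausted retries surface as an `RPCError` with status `RESOURCE_EXHAUSTED`.

```python
import asyncio
import json
import logging

from temporalio.client import Client
from temporalio.service import RPCError, RPCStatusCode

logger = logging.getLogger("workflow-starts")


async def start_with_backfill_log(client: Client, order_id: str, payload: dict) -> None:
    """Start a Workflow; if throttling outlasts the SDK's automatic retries,
    record the request so it can be retried or backfilled later."""
    workflow_id = f"order-{order_id}"  # hypothetical Workflow ID scheme
    try:
        await client.start_workflow(
            "OrderWorkflow",        # hypothetical Workflow type
            payload,
            id=workflow_id,
            task_queue="orders",    # hypothetical Task Queue
        )
    except RPCError as err:
        if err.status == RPCStatusCode.RESOURCE_EXHAUSTED:
            # Throttled past the retry budget: keep the payload so the start
            # can be replayed once the Namespace is back under its limit.
            logger.error(
                "Throttled StartWorkflowExecution, saved for backfill: %s",
                json.dumps({"workflow_id": workflow_id, "payload": payload}),
            )
        else:
            raise


async def main() -> None:
    client = await Client.connect(
        "<namespace>.<account>.tmprl.cloud:7233",  # Temporal Cloud gRPC endpoint (placeholder)
        namespace="<namespace>.<account>",
        # mTLS or API-key authentication omitted for brevity.
    )
    await start_with_backfill_log(client, "1234", {"sku": "ABC-1", "qty": 2})


if __name__ == "__main__":
    asyncio.run(main())
```

Pairing a log like this with the limit metrics referenced above makes it straightforward to replay throttled starts after usage drops back under the Namespace's limits.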