From b59047308919d889aaf5600b0ae9230cf1056b2f Mon Sep 17 00:00:00 2001 From: kawofong <14829553+kawofong@users.noreply.github.com> Date: Fri, 23 Jan 2026 16:22:40 -0500 Subject: [PATCH 1/4] Add knowledge hub and overview --- docs/best-practices/knowledge-hub/index.md | 39 +++++++ .../knowledge-hub/temporal-overview.md | 109 ++++++++++++++++++ sidebars.js | 12 ++ 3 files changed, 160 insertions(+) create mode 100644 docs/best-practices/knowledge-hub/index.md create mode 100644 docs/best-practices/knowledge-hub/temporal-overview.md diff --git a/docs/best-practices/knowledge-hub/index.md b/docs/best-practices/knowledge-hub/index.md new file mode 100644 index 0000000000..d2060ce937 --- /dev/null +++ b/docs/best-practices/knowledge-hub/index.md @@ -0,0 +1,39 @@ +--- +id: index +title: Temporal Knowledge Hub +sidebar_label: Knowledge Hub +description: A foundational template for organizations to create an internal knowledge base about the Temporal Platform, designed for customization by internal Temporal Platform teams. +toc_max_heading_level: 3 +keywords: + - temporal knowledge hub + - developer onboarding + - temporal best practices + - internal documentation +tags: + - Best Practices + - Knowledge Hub +--- + +The Temporal Knowledge Hub is a foundational template for organizations to create an internal knowledge base about the Temporal Platform. +It is designed for customization by internal Temporal Platform teams to facilitate structured developer onboarding and continuous education. + +For illustration, the content currently uses a hypothetical organization, "ABC Financial." +Users must follow the provided instructions (shown as `Notes`) to customize the content for their specific organization and operational needs. + +## Target audience + +The primary audience is the **Temporal Platform teams** within organizations. +These teams are responsible for owning and maintaining Temporal knowledge base for their engineering teams. 
+ +The secondary audience is the engineering teams who use or need to learn the Temporal Platform. +They will consume the technical knowledge managed by the Temporal Platform teams. + +## Goals + +- Establish a centralized, consistently maintained repository of Temporal knowledge for internal developers. +- Streamline onboarding and support continuous professional development for engineering teams on the Temporal Platform. +- Reduce the Temporal Platform team's support load by providing comprehensive self-service documentation and established best practices. + +## Table of contents + +TODO: add links to the knowledge hub pages diff --git a/docs/best-practices/knowledge-hub/temporal-overview.md b/docs/best-practices/knowledge-hub/temporal-overview.md new file mode 100644 index 0000000000..588383f005 --- /dev/null +++ b/docs/best-practices/knowledge-hub/temporal-overview.md @@ -0,0 +1,109 @@ +--- +id: temporal-overview +title: Temporal Overview +sidebar_label: Temporal Overview +description: Learn what Temporal is, why users love it, and how it delivers business value across various industries. +toc_max_heading_level: 3 +keywords: + - temporal overview + - what is temporal + - durable execution + - temporal use cases +tags: + - Best Practices + - Knowledge Hub +--- + +## What is Temporal? + +:::note +Customize this introduction to describe Temporal that resonates with your developers. Highlight pain points Temporal solves for your developers. +::: + +Temporal provides a new way to build scalable, reliable applications. + +**Temporal** is an **open-source Durable Execution** platform that abstracts away the complexity of building distributed systems. +Durable Execution ensures that your application behaves correctly despite adverse conditions by guaranteeing that it will run to completion. +If a failure or a crash happens, your business processes keep running seamlessly without interruptions. 
+ +With Temporal, engineering teams improve development velocity and deliver more reliable applications. + +Temporal is used for critical applications at enterprises like [Nvidia](https://temporal.io/blog/transforming-gpu-resource-management-with-temporal), [ANZ Bank](https://temporal.io/resources/case-studies/anz-story), [Netflix](https://temporal.io/resources/on-demand/netflix), [Snap](https://eng.snap.com/build_a_reliable_system_in_a_microservices_world_at_snap), [Yum! Brands](https://temporal.io/resources/on-demand/temporal-at-yum-brands), and AI leaders like [Replit](https://temporal.io/resources/case-studies/replit-uses-temporal-to-power-replit-agent-reliably-at-scale), [OpenAI](https://newsletter.pragmaticengineer.com/p/chatgpt-images). + +## Why users love Temporal + +:::note +Update this list to reflect why your organization chose Temporal. +::: + +1. **Durability**: your code never "forgets" where it is. If a server crashes or restarts, your function resumes exactly where it left off, ensuring no data or progress is ever lost. +2. **Easy-to-use code structure:** + * Choose between the Python and Java SDKs that best suit you and start writing your business logic. + * Integrate your favorite IDE, libraries, and tools into your development process. Temporal also supports polyglot and idiomatic programming - which enables developers to leverage the strengths of various programming languages and integrate Temporal into existing codebases. +3. **Simplicity:** You can achieve all of this without having to manage queues or complex state machines. Temporal does this all for you. +4. **Visibility:** Temporal provides a Web UI, SDK and Cloud metrics, and OpenTelemetry integration that gives developers unprecedented visibility into the current state of their applications. + +## Temporal business value + +:::note +Replace with metrics showing Temporal's impact at your organization. 
+::: + +At ABC Financial, Temporal serves as the development standard and platform for all asynchronous operations (e.g. payment, statement processing). +Since adopting Temporal, the company has saved millions of dollars. +The Temporal platform team continuously monitors the following business metrics to justify the adoption of Temporal: + +| Metric | Before Temporal | With Temporal | Result | +| ------ | --------------- | ------------- | ------ | +| **Service availability** | 99.7% (~2 hours of stalled transactions/month) | 99.99% (<5 minutes of stalled transactions/month) | $2.5M+ annual savings in operational costs | +| **On-call alert volume** | 28 actionable alerts/week | <3 alerts/week | ~90% reduction in on-call toil | +| **Feature time-to-market** | 9 months average (some projects take 12-18 months) | 3 months average | ~67% shorter time-to-market | + +## Temporal use cases at ABC Financial + +:::note +Replace with Temporal use cases for your organization. +::: + +### FinTech/Financial Services + +1. **Payment processing** - Reliable payment orchestration with automatic retries and compensation logic (ex. [Block using Temporal](https://temporal.io/resources/on-demand/block-real-world-payments) for their checkout processes) +2. **Customer onboarding** - Leverage Temporal for multi-step customer verification and account setup processes (ex. [Mollie](https://temporal.io/resources/case-studies/mollie-payments-maximizes-operational-efficiency) for their customer onboarding processes) +3. **Cryptocurrency operations** - Orchestrate blockchain payments and crypto transactions (ex. [Coinbase](https://temporal.io/resources/case-studies/coinbase) uses Temporal for reliable crypto transactions) +4. **Operational workflows** - Various operational processes requiring high reliability + +### Banking + +1. **Loan origination** - Long-running approval processes with complex decision trees and human approvals (ex.
[ANZ accelerates home loan origination](https://temporal.io/resources/case-studies/anz-story) with Temporal) +2. **Payment processing** - Core banking payment systems with high reliability requirements (ex. [JPMC uses Temporal](https://temporal.io/resources/on-demand/payments-modernization-jpmc) to handle complex transactions across multiple systems) +3. **Digital banking modernization** - Replacing legacy mainframe systems with cloud-native workflows (ex. [Will Bank](https://temporal.io/resources/on-demand/how-will-bank-leverages-temporal-to-handle-2-million-customers) modernized boleto processing and scaled to millions with Temporal) + +### Tech/Software + +1. **Data pipelines** - Orchestrate complex data processing workflows with reliability guarantees (ex. [Netflix](https://temporal.io/resources/on-demand/netflix) powers critical data pipelines on Temporal) +2. **Microservices deployment** - Coordinate deployment processes across distributed systems (ex. [Box](https://temporal.io/resources/case-studies/box) uses Temporal as a central "brain" for content operations) +3. **Workflow orchestration** - General workflow orchestration, improving development efficiency (ex. [AutoKitteh](https://temporal.io/resources/case-studies/autokitteh) increased reliability and reduced development effort with Temporal) +4. **Cloud migration** - Leverage Temporal for orchestrating complex cloud migration processes (ex. [SAP Concur](https://temporal.io/resources/case-studies/sap-concur) orchestrated a phased migration with Temporal) +5. **Infrastructure management** - Coordinate distributed operations and transactional changes reliably (ex. [DigitalOcean](https://temporal.io/resources/case-studies/digitalocean) reduced resources and developer backlog with Temporal) + +### AI + +1. **Long-running AI agents** - Durable execution for sophisticated agents requiring human-in-the-loop interactions (ex. 
[Replit uses Temporal](https://temporal.io/resources/case-studies/replit-uses-temporal-to-power-replit-agent-reliably-at-scale) to power Replit Agent reliably at scale) +2. **AI orchestration** - Coordinating multi-agent systems and LLM calls with fallback strategies (ex. [Dubber](https://temporal.io/resources/case-studies/dubber) runs conversational AI pipelines on Temporal) +3. **Data orchestration** - Managing complex AI/ML pipelines and model training workflows (ex. [Descript](https://temporal.io/resources/case-studies/descript) orchestrates applied-AI pipelines with Temporal) + +### Healthcare + +1. **Clinical assessments and diagnostics orchestration** - Orchestrate multi-step clinical assessments and diagnostic pipelines (ex. [Linus Health](https://temporal.io/resources/on-demand/transitioning-durable-workflows-cognitive-healthcare) uses Temporal to orchestrate cognitive assessments and analytics end-to-end) +2. **AI/ML inference and data processing in healthcare contexts** - Long-running AI/ML workflows for preprocessing, model inference, post-processing, and results delivery (ex. [Zebra Medical Vision](https://temporal.io/resources/case-studies/zebra-medical-vision)'s applied-AI diagnostics pipeline relies on Temporal for reliability and visibility) +3. **Medical imaging and bioinformatics pipelines** - Reliable, scalable orchestration for compute-heavy imaging workflows, transcription/feature extraction, and downstream analysis (ex. [Jackson Laboratory](https://temporal.io/resources/on-demand/imaging-workflows-temporal-cure-cancer) uses Temporal for imaging workflows and biological data science pipelines) + +### Retail + +1. **Order management and bookings** - Managing complex order fulfillment processes from payment to delivery (ex. [Yum! Brands](https://temporal.io/resources/on-demand/temporal-at-yum-brands) processes the majority of digital orders as Temporal Workflows) +2. 
**Orchestrating distributed transactions** - Coordinating multi-step e-commerce workflows (ex. [Vinted](https://temporal.io/resources/case-studies/vinted-10-12-million-worflows-daily-dev-velocity-low-cost) runs payment workflows at massive scale on Temporal) + +### Travel/Logistics + +1. **Logistics orchestration** - Managing complex shipping and delivery workflows (ex. [Maersk](https://temporal.io/resources/case-studies/maersk) built a "time machine" for logistics with Temporal to speed feature delivery) +2. **Booking management** - Long-running reservation and travel coordination processes (ex. [Turo](https://temporal.io/resources/on-demand/temporal-adoption-and-integration-at-turo) describes Temporal adoption and integration for durable, user-facing flows) diff --git a/sidebars.js b/sidebars.js index 567a21a3fd..0c3b12f164 100644 --- a/sidebars.js +++ b/sidebars.js @@ -648,6 +648,18 @@ module.exports = { 'best-practices/cloud-access-control', 'best-practices/security-controls', 'best-practices/worker', + { + type: 'category', + label: 'Knowledge Hub', + collapsed: true, + link: { + type: 'doc', + id: 'best-practices/knowledge-hub/index', + }, + items: [ + 'best-practices/knowledge-hub/temporal-overview', + ], + }, ], }, { From d7cd78786ac9b1f0e9858c0d48428e193d942938 Mon Sep 17 00:00:00 2001 From: kawofong <14829553+kawofong@users.noreply.github.com> Date: Fri, 23 Jan 2026 17:06:57 -0500 Subject: [PATCH 2/4] Add decision framework and getting started --- .../knowledge-hub/decision-framework.md | 138 +++++++++++++ .../knowledge-hub/getting-started.md | 189 ++++++++++++++++++ docs/best-practices/knowledge-hub/index.md | 6 +- .../knowledge-hub/temporal-overview.md | 4 + 4 files changed, 336 insertions(+), 1 deletion(-) create mode 100644 docs/best-practices/knowledge-hub/decision-framework.md create mode 100644 docs/best-practices/knowledge-hub/getting-started.md diff --git a/docs/best-practices/knowledge-hub/decision-framework.md 
b/docs/best-practices/knowledge-hub/decision-framework.md new file mode 100644 index 0000000000..5ef5ce1a4b --- /dev/null +++ b/docs/best-practices/knowledge-hub/decision-framework.md @@ -0,0 +1,138 @@ +--- +id: decision-framework +title: Temporal Decision Framework +sidebar_label: Decision Framework +description: A guide to help you determine whether Temporal is the right solution for your use case. +toc_max_heading_level: 3 +keywords: + - temporal decision framework + - when to use temporal + - temporal use cases + - temporal alternatives +tags: + - Best Practices + - Knowledge Hub +--- + +:::info +This page is part of the [Temporal Knowledge Hub](./index.md). +::: + +This guide helps you quickly determine whether Temporal is the right solution for your use case. + +## Temporal decision framework + +:::note +Tailor these questions to match your organization's technical landscape. +::: + +To decide whether Temporal is a suitable solution for your use case, ask yourself 3 questions: + +1. **Does your digital process have multiple steps that can fail independently?** +2. **Do you need the process to survive failures?** +3. **Does your process span multiple services, APIs, or long time periods (i.e. >10 seconds)?** + +If you answered "**yes**" to 2 or more questions, Temporal is likely a good fit. Continue reading. + +If you answered "**no**" to all three questions, consider alternatives first. Skip to [Bad use cases for Temporal](#bad-use-cases-for-temporal) to explore alternative solutions. + +## Temporal benefits + +:::note +Highlight benefits that address your developers' pain points. +::: + +1. **Durable Execution** - your code will always complete. + * Automatic retry, recovery from infrastructure failures, durable state persistence, and exactly-once execution semantics—all without custom code. +2. **Developer velocity** - ship faster with less code to maintain. 
+ * Write business logic in familiar languages, collaborate with developers across language barriers, eliminate boilerplate infrastructure code, and leverage built-in testing for rapid iteration. +3. **Audit trail** - complete visibility into your digital processes. + * Immutable execution history, self-documenting Workflow execution, and operational transparency. +4. **Priority and Fairness** - enterprise-grade multi-tenancy. + * Priority-based execution and fair distribution of Workflow Executions across your customers or tenants. +5. **Workflow fabric** - break down development silos. + * Cross-team Workflow orchestration with reusable operations, cross-namespace coordination, and a service registry for discoverability. + +## Good use cases for Temporal + +:::note +Replace with use cases from your domain. See [Customer Stories](https://temporal.io/in-use) for inspiration. +::: + +### Business transactions + +1. **Payment processing** + * **Why Temporal is perfect**: Multi-party coordination with compensation logic, audit requirements, idempotency guarantees, timeout handling for authorizations that expire, and scalability to support billions of transactions per day. +2. **Order management** + * **Why Temporal is perfect**: Long-running state machines spanning hours to days with complex state transitions, human intervention, parallel operations, differing order priorities, variable timing per order, and support for millions of orders per hour. +3. **Mortgage underwriting** + * **Why Temporal is perfect**: Weeks-long processes with complex decision trees, multiple external integrations, human approvals, strict compliance requirements, and durable state persistence. + +### Customer experience + +1. **Marketing campaign** + * **Why Temporal is perfect**: Multi-channel orchestration with time-based sequencing and long campaign durations with dynamic personalization. +2.
**Customer onboarding** + * **Why Temporal is perfect:** Great for long-running, multi-step, and sometimes human-in-the-loop processes that onboarding often requires. + +### Data engineering + +1. **Document processing** + * **Why Temporal is perfect**: Multi-stage pipelines with variable processing times, external service dependencies, rate limit requirements, and coordinated large-scale processing. +2. **Data pipeline** + * **Why Temporal is perfect**: Data orchestration with complex dependencies, incremental processing, backfill coordination, cross-system dependencies, SLA monitoring, and idempotent execution. +3. **Video processing** + * **Why Temporal is perfect**: Long-running compute, resource-intensive GPU activities, complex pipelines with parallel variant generation, failure isolation, and cost-optimized scheduling. + +### AI/ML + +1. **ML inference** + * **Why Temporal is perfect**: Multi-model orchestration with fallback logic, batch and real-time handling, feature engineering, and comprehensive audit trail. +2. **RAG** + * **Why Temporal is perfect**: Multi-step retrieval with hybrid search, context assembly from multiple sources, LLM orchestration with retries and fallbacks, and evaluation pipeline tracking. +3. **AI agents** + * **Why Temporal is perfect**: Long-running autonomous execution with tool orchestration, planning and replanning, human-in-the-loop controls, durable memory management, and safety guardrails. + +### Operational + +1. **Infrastructure management** + * **Why Temporal is perfect**: Multi-step provisioning with automatic rollback on failure, idempotent cloud operations, change management, and complete auditability. +2. **CI/CD** + * **Why Temporal is perfect**: Complex pipeline stages with environment promotion gates, parallel test execution, conditional deployment strategies, automatic rollback monitoring, and approval gates. 
+ +## Bad use cases for Temporal + +:::note +Add anti-patterns specific to your organization's domain and technology stack. +::: + +1. **Simple Request-Response APIs** + * No failure recovery needed + * Better alternative: REST / gRPC server +2. **Real-time stream processing** + * High throughput (>1M events/sec) + * Ultra-low latency requirements (<100ms) + * No durable state needed + * Better alternative: Flink, Amazon Kinesis, Google Cloud Dataflow +3. **Database triggers & stored procedures** + * Logic tightly coupled to database + * Needs transactional guarantees within single DB + * No external service calls + * Better alternative: database native features +4. **Pure Compute Workloads** + * CPU/GPU intensive calculations + * No I/O or service calls + * No state management needed + * Better alternative: AWS Lambda, Spark, Ray + +## Next steps + +:::note +Add relevant links (i.e. support channel) for your developers to explore next. +::: + +To learn more: + +* [Run your first Temporal Workflow in under 30 minutes](./getting-started.md) +* Schedule a discovery session with the Temporal platform team to validate your use case +* [See how other teams are using Temporal today](./temporal-overview.md#temporal-use-cases-at-abc-financial) diff --git a/docs/best-practices/knowledge-hub/getting-started.md b/docs/best-practices/knowledge-hub/getting-started.md new file mode 100644 index 0000000000..26de03631e --- /dev/null +++ b/docs/best-practices/knowledge-hub/getting-started.md @@ -0,0 +1,189 @@ +--- +id: getting-started +title: Getting Started with Temporal +sidebar_label: Getting Started +description: A self-service tutorial to set up your Temporal development environment and run your first Workflow. 
+toc_max_heading_level: 3 +keywords: + - temporal getting started + - temporal tutorial + - temporal development environment + - first temporal workflow +tags: + - Best Practices + - Knowledge Hub +--- + +:::info +This page is part of the [Temporal Knowledge Hub](./index.md). +::: + +:::note +Update learning objectives to match your organization's onboarding goals. +::: + +In 30 minutes, you will: + +* Set up a complete Temporal development environment. +* Write and run your first Temporal Workflow locally. +* Run your Temporal Workflow in our dev environment. + +By the end, you'll have: + +* A functional "Hello World" Workflow. +* Access to our internal Temporal Cloud namespaces. + +## Prerequisites + +* One of the following supported programming languages: + * Python 3.12+ + * Java 17+ +* [Temporal CLI](https://docs.temporal.io/cli#install) +* [Docker Desktop](https://docs.docker.com/desktop/setup/install/mac-install/) +* [Visual Studio Code](https://code.visualstudio.com/download) + * Install these extensions: [Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) + +## Development environment setup + +:::note +Replace with your organization's starter template and tooling. +::: + +You have two options for setting up your local environment. We strongly recommend using [Dev Container](https://containers.dev/) because it is 1) faster to set up and 2) maintained by the Temporal Platform team. + +### Option A: Dev Container (Recommended) + +1. Clone the [starter template](https://github.com/kawofong/temporal-python-template/tree/main) + +```shell +git clone git@github.com:kawofong/temporal-python-template.git +code temporal-python-template +``` + +2. Reopen VS Code in Dev Container. + +``` +1. In VS Code, open Command Palette (Cmd/Ctrl + Shift + P). +2. Select "Dev Containers: Reopen in Container". +3. Wait 2-3 minutes for image pull and setup. +4. 
After the Dev Container is running, open your browser and verify that you can access the Temporal UI via http://localhost:8233. +``` + +3. Verify the development environment. + +```shell +# 1. Run all unit tests; all tests should succeed. +uv run poe test + +# 2. Run pre-commit on all files; all pre-commit validations should succeed. +uv run poe pre-commit-run +``` + +**What's included in the dev container:** + +* Local Temporal development server +* Pre-configured git hooks and linters +* Debugging tools and extensions + +### Option B: From Scratch + +1. Clone the [starter-template](https://github.com/kawofong/temporal-python-template/tree/main) + +```shell +git clone git@github.com:kawofong/temporal-python-template.git +code temporal-python-template +``` + +2. Install dependencies locally. + +```shell +# Requires `uv` to be installed on your local machine. + +# 1. Install all uv dependencies. +uv sync --dev + +# 2. Install pre-commit hooks. +uv run poe pre-commit-install +``` + +3. Verify the development environment. + +```shell +# 1. Run all unit tests; all tests should succeed. +uv run poe test + +# 2. Run pre-commit on all files; all pre-commit validations should succeed. +uv run poe pre-commit-run + +# 3. Run Temporal dev server and verify UI is up via http://localhost:8233. +temporal server start-dev +``` + +## Run your first Workflow locally + +:::note +Update commands to match your starter template's Workflow examples. +::: + +Once your development environment is configured, you are ready to run your first Temporal Workflow locally. + +1. Run a Temporal Worker from the starter-template. + +```shell +uv run -m src.workflows.crawler.worker +``` + +2. Start a crawler Workflow Execution. + +```shell +uv run -m src.workflows.crawler.crawler_workflow +``` + +3. Wait ~1 minute for the Workflow Execution to complete.
+ * You can verify completion of the Workflow Execution by: + * Observing the Workflow Execution output in your terminal or + * Navigating to the Temporal UI + +## Run your first Workflow on Temporal Cloud + +:::note +Link to your internal process for Temporal Cloud access and Namespace provisioning. +::: + +To run the same Workflow on Temporal Cloud, first request access: + +* Request Temporal Cloud access via an internal service ticket. +* Request a Temporal Cloud Namespace via an internal service ticket. + +Once your user account and Namespace are ready, follow these steps to run your Workflow on Temporal Cloud: + +1. Log in to Temporal Cloud. +2. Access your Temporal Cloud Namespace via the Temporal Cloud UI. +3. Generate an [API key via Temporal Cloud UI](https://docs.temporal.io/cloud/api-keys#generate-api-keys-with-the-temporal-cloud-ui). +4. Replace the Temporal Client code in [src/workflows/crawler/worker.py](https://github.com/kawofong/temporal-python-template/blob/main/src/workflows/crawler/worker.py#L21) and [src/workflows/crawler/crawler_workflow.py](https://github.com/kawofong/temporal-python-template/blob/main/src/workflows/crawler/crawler_workflow.py#L101). + +```python +# Replace the "your-..." placeholders with your own Namespace ID, Account ID, and API key. +client = await Client.connect( + "your-namespace-id.your-account-id.tmprl.cloud:7233", + namespace="your-namespace-id.your-account-id", + api_key="your-api-key", + tls=True, # Required for Temporal Cloud +) +``` + +5. Run the Temporal Worker from the starter-template. + +```shell +uv run -m src.workflows.crawler.worker +``` + +6. Start the crawler Workflow Execution. + +```shell +uv run -m src.workflows.crawler.crawler_workflow +``` + +7. Wait ~1 minute for the Workflow Execution to complete.
+ * You can verify completion of the Workflow Execution by: + * Observing the Workflow Execution output in your terminal or + * Navigating to the Temporal Cloud UI diff --git a/docs/best-practices/knowledge-hub/index.md b/docs/best-practices/knowledge-hub/index.md index d2060ce937..8bcaf5c36d 100644 --- a/docs/best-practices/knowledge-hub/index.md +++ b/docs/best-practices/knowledge-hub/index.md @@ -18,7 +18,11 @@ The Temporal Knowledge Hub is a foundational template for organizations to creat It is designed for customization by internal Temporal Platform teams to facilitate structured developer onboarding and continuous education. For illustration, the content currently uses a hypothetical organization, "ABC Financial." -Users must follow the provided instructions (shown as `Notes`) to customize the content for their specific organization and operational needs. +Users must follow the provided instructions (see below for an example) to customize the content for their specific organization and operational needs. + +:::note +On each page, instructions will be shown in note banners, like this one. +::: ## Target audience diff --git a/docs/best-practices/knowledge-hub/temporal-overview.md b/docs/best-practices/knowledge-hub/temporal-overview.md index 588383f005..f83b7ace07 100644 --- a/docs/best-practices/knowledge-hub/temporal-overview.md +++ b/docs/best-practices/knowledge-hub/temporal-overview.md @@ -14,6 +14,10 @@ tags: - Knowledge Hub --- +:::info +This page is part of the [Temporal Knowledge Hub](./index.md). +::: + ## What is Temporal? 
:::note From d3a860e68b19e0bbb5eb1784da6bd1a75ab43bc0 Mon Sep 17 00:00:00 2001 From: kawofong <14829553+kawofong@users.noreply.github.com> Date: Tue, 27 Jan 2026 13:52:27 -0500 Subject: [PATCH 3/4] Add remaining pages --- .../knowledge-hub/architecture.md | 101 ++++++++++++ docs/best-practices/knowledge-hub/cost.md | 70 +++++++++ docs/best-practices/knowledge-hub/faqs.md | 26 ++++ docs/best-practices/knowledge-hub/index.md | 12 +- .../knowledge-hub/learning-path.md | 76 +++++++++ docs/best-practices/knowledge-hub/patterns.md | 147 ++++++++++++++++++ .../knowledge-hub/shared-responsibility.md | 139 +++++++++++++++++ docs/best-practices/knowledge-hub/support.md | 48 ++++++ .../knowledge-hub/troubleshooting.md | 146 +++++++++++++++++ sidebars.js | 10 ++ 10 files changed, 774 insertions(+), 1 deletion(-) create mode 100644 docs/best-practices/knowledge-hub/architecture.md create mode 100644 docs/best-practices/knowledge-hub/cost.md create mode 100644 docs/best-practices/knowledge-hub/faqs.md create mode 100644 docs/best-practices/knowledge-hub/learning-path.md create mode 100644 docs/best-practices/knowledge-hub/patterns.md create mode 100644 docs/best-practices/knowledge-hub/shared-responsibility.md create mode 100644 docs/best-practices/knowledge-hub/support.md create mode 100644 docs/best-practices/knowledge-hub/troubleshooting.md diff --git a/docs/best-practices/knowledge-hub/architecture.md b/docs/best-practices/knowledge-hub/architecture.md new file mode 100644 index 0000000000..f30e2a4288 --- /dev/null +++ b/docs/best-practices/knowledge-hub/architecture.md @@ -0,0 +1,101 @@ +--- +id: architecture +title: Temporal Architecture +sidebar_label: Architecture +description: Enterprise Temporal architecture covering Namespace conventions, Worker deployment patterns, network connectivity, and disaster recovery procedures. 
+toc_max_heading_level: 3 +keywords: + - temporal architecture + - temporal namespace + - temporal connectivity + - temporal worker deployment +tags: + - Best Practices + - Knowledge Hub +--- + +:::info +This page is part of the [Temporal Knowledge Hub](./index.md). +::: + +:::note +Customize this section to describe the architectural decisions and guardrails that shape how your developers build with Temporal. +::: + +This document defines our enterprise Temporal architecture, covering Namespace conventions, Worker deployment patterns, network connectivity, and disaster recovery procedures. + +## Temporal Cloud + +At ABC Financial, we use Temporal Cloud, which is a fully managed Temporal service. It offers a hassle-free way to run our Temporal Applications without the need to manage the underlying infrastructure. + +Our Workers and Temporal Applications connect to the Temporal Cloud service, which takes care of the persistence layer, scalability, and availability for us. + +## Namespace + +A Temporal Cloud [Namespace](https://docs.temporal.io/namespaces) is a unit of isolation within the Temporal platform. It ensures that Workflow Executions, Task Queues, and resources are logically separated. + +:::note +Define a Namespace naming convention based on the Temporal [Namespace Best Practices](../managing-namespace.mdx). +::: + +At ABC Financial, we adhere to the following standards for our Temporal Cloud Namespaces: + +1. The naming convention is `{business-unit}-{domain}-{environment}` + 1. Use at most 10 characters for the business unit (e.g. `consumer`, `commercial`, `investment`). + 2. Use at most 10 characters for the domain (e.g. `payment`, `mortgage`). + 3. Use one of the supported environments: `dev`, `stg`, `prd`. + +:::note +Link to your internal Namespace provisioning process so developers can self-serve. +::: + +File an internal service ticket to request a new Temporal Cloud Namespace. + +:::note +List the default features and guardrails applied to new Namespaces by environment.
+::: + +Based on the environment (i.e. `dev`, `stg`, `prd`), the following features are configured by our automation: + +| Feature | Development | Staging | Production | +| :---- | ----- | ----- | ----- | +| [Deletion Protection](https://docs.temporal.io/cloud/namespaces#delete-protection) | ✅ | ✅ | ✅ | +| [Private Connectivity](https://docs.temporal.io/cloud/connectivity) | ✅ | ✅ | ✅ | +| [Custom Encryption](https://docs.temporal.io/default-custom-data-converters) | ✅ | ✅ | ✅ | +| [Codec Server](https://docs.temporal.io/codec-server) | ✅ | ✅ | ✅ | +| [API Key](https://docs.temporal.io/cloud/api-keys) | ✅ | ✅ | ✅ | +| [API Key Rotation](https://docs.temporal.io/cloud/api-keys#rotate-an-api-key) | ✅ | ✅ | ✅ | +| [Observability](https://docs.temporal.io/evaluate/development-production-features/observability) | ✅ | ✅ | ✅ | +| [Audit Logs](https://docs.temporal.io/cloud/audit-logs) | ✅ | ✅ | ✅ | +| [Workflow History Export](https://docs.temporal.io/cloud/export) | ❌ | ❌ | ✅ | +| [Multi-Region Replication](https://docs.temporal.io/cloud/high-availability#multi-region-replication) | ❌ | ❌ | ✅ | + +## Connectivity + +:::note +Describe your network connectivity requirements so developers understand how Workers connect to Temporal Cloud. +::: + +At ABC Financial, private connectivity is required for all Temporal Cloud Namespaces for compliance reasons. [Private connectivity](https://docs.temporal.io/cloud/connectivity) keeps traffic to Temporal Cloud off the public internet. + +For reference, see the official Temporal documentation on AWS and GCP private connectivity: + +* [AWS PrivateLink Connectivity | Temporal Platform Documentation](https://docs.temporal.io/cloud/connectivity/aws-connectivity) +* [Google Private Service Connect Connectivity | Temporal Platform Documentation](https://docs.temporal.io/cloud/connectivity/gcp-connectivity) + +## Worker + +:::note +Document your Worker deployment standards so developers know where and how to deploy.
+::: + +At ABC Financial, Temporal Workers are deployed as containerized applications on Kubernetes clusters across AWS EKS and GCP GKE. + +All Worker deployments are managed through [Helm](https://helm.sh/) charts, ensuring: + +* Standardized deployment configurations across clouds +* Version-controlled infrastructure as code +* Simplified rollbacks and updates +* Environment-specific value overrides + +[KEDA](https://keda.sh/docs/2.18/scalers/) is configured to auto-scale Workers based on Temporal Task Queue backlog. diff --git a/docs/best-practices/knowledge-hub/cost.md b/docs/best-practices/knowledge-hub/cost.md new file mode 100644 index 0000000000..a007fa5908 --- /dev/null +++ b/docs/best-practices/knowledge-hub/cost.md @@ -0,0 +1,70 @@ +--- +id: cost +title: Temporal Cloud Cost +sidebar_label: Cost +description: Understanding Temporal Cloud's consumption-based pricing model and tips for building cost-effective Workflows. +toc_max_heading_level: 3 +keywords: + - temporal cloud cost + - temporal pricing + - temporal actions + - temporal storage +tags: + - Best Practices + - Knowledge Hub +--- + +:::info +This page is part of the [Temporal Knowledge Hub](./index.md). +::: + +:::note +Add cost-saving tips to help developers optimize Temporal Cloud spending. +::: + +As we scale our usage of Temporal Cloud, understanding the cost model is critical for designing cost-efficient Workflows. Temporal Cloud is consumption-based, and its pricing is based on Actions and Storage. + +Our Enterprise contract covers base fees and support, but your specific Namespace usage drives the variable costs. + +## Action + +Actions are the primary unit of consumption-based pricing for Temporal Cloud. They track billable operations within the Temporal Cloud Service. + +### What counts as an Action? + +* **Workflow Start**: Starting a Workflow Execution. +* **Activity Start and Retry**: Starting and retrying an Activity. +* **Signals**: Sending a Signal to a Workflow.
+* **Timers**: A Timer firing. +* **Child Workflows**: Starting a Child Workflow. +* **Search Attribute upsert**: Each invocation of the `UpsertSearchAttributes` command. + +For a complete list of billable Actions, see [Temporal Cloud Actions](https://docs.temporal.io/cloud/actions). + +### Cost-saving tip #1: Configure exponential backoff for Activity Retry + +Ensure your Activity Retry Policy uses a `BackoffCoefficient` > 1.0 (e.g. 2.0) and a reasonable `MaximumInterval`. + +**Why**: Each retry attempt counts as a billable Action. Aggressive, constant-interval retries during a downstream outage sharply increase Action usage and cost without progressing the Workflow. + +## Storage + +Storage is charged based on Gigabyte-Hours (GB-h). There are two tiers: + +1. **Active Storage (higher cost)**: + * This is the storage used by `Open` Workflows. + * It is 40x more expensive than Retained Storage. +2. **Retained Storage (lower cost)**: + * This is the Event History of `Closed` Workflows. + * We pay for this to keep the history available for debugging (based on the Namespace Retention policy). + +### Cost-saving tip #2: Use Continue-As-New for long-running Workflows + +Trigger `ContinueAsNew` periodically (e.g. every ~4,000 events or daily) for long-running or indefinite Workflows. + +**Why**: This closes the current run, moving its Event History from Active Storage (expensive) to Retained Storage (cheap). Because Retained Storage costs 1/40 as much, this is a ~97% reduction in storage costs for that history data.
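The ~97% figure above follows from the 40x price ratio between the two storage tiers. Below is a minimal sketch of that arithmetic, assuming only the ratio stated in this section. The function names, the 5 GB history size, and the 30-day window are hypothetical, and costs are expressed in relative units rather than actual prices.

```python
# Relative storage cost only: Retained Storage is the unit price and
# Active Storage costs 40x that (ratio taken from the section above).
ACTIVE_TO_RETAINED_RATIO = 40.0


def relative_storage_cost(gigabytes: float, hours: float, active: bool) -> float:
    """Return the cost in 'Retained GB-h price' units (not dollars)."""
    rate = ACTIVE_TO_RETAINED_RATIO if active else 1.0
    return gigabytes * hours * rate


def reduction_when_closed(gigabytes: float, hours: float) -> float:
    """Fraction saved when the same history sits in Retained rather than Active Storage."""
    active = relative_storage_cost(gigabytes, hours, active=True)
    retained = relative_storage_cost(gigabytes, hours, active=False)
    return 1.0 - retained / active


if __name__ == "__main__":
    # Hypothetical workload: 5 GB of Event History kept for 30 days.
    saving = reduction_when_closed(gigabytes=5.0, hours=30 * 24)
    print(f"{saving:.1%}")  # 1 - 1/40 = 97.5%
```

The saving depends only on the price ratio, not on the history size or duration, and it applies only to the history that moves tiers: the new run started by Continue-As-New still accrues Active Storage while it is open.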
+ +## What's next + +* [Temporal Cloud pricing](https://docs.temporal.io/cloud/pricing) +* [Temporal Cloud Actions](https://docs.temporal.io/cloud/actions) diff --git a/docs/best-practices/knowledge-hub/faqs.md b/docs/best-practices/knowledge-hub/faqs.md new file mode 100644 index 0000000000..fd82d8e2fc --- /dev/null +++ b/docs/best-practices/knowledge-hub/faqs.md @@ -0,0 +1,26 @@ +--- +id: faqs +title: Frequently Asked Questions +sidebar_label: FAQs +description: Common questions and answers about using Temporal at your organization. +toc_max_heading_level: 3 +keywords: + - temporal faqs + - temporal questions + - temporal help +tags: + - Best Practices + - Knowledge Hub +--- + +:::info +This page is part of the [Temporal Knowledge Hub](./index.md). +::: + +:::note +Capture common questions from developers to reduce repeated support requests. +::: + +## When should I use Temporal? + +There are many reasons why you should use Temporal. Use the [Temporal Decision Framework](./decision-framework.md) to help you decide. diff --git a/docs/best-practices/knowledge-hub/index.md b/docs/best-practices/knowledge-hub/index.md index 8bcaf5c36d..8fdee03e9c 100644 --- a/docs/best-practices/knowledge-hub/index.md +++ b/docs/best-practices/knowledge-hub/index.md @@ -40,4 +40,14 @@ They will consume the technical knowledge managed by the Temporal Platform teams ## Table of contents -TODO: add links to the knowledge hub pages +- [Temporal Overview](./temporal-overview.md) - Learn what Temporal is, why users love it, and how it delivers business value. +- [Decision Framework](./decision-framework.md) - Determine whether Temporal is the right solution for your use case. +- [Getting Started](./getting-started.md) - Set up your development environment and run your first Workflow. +- [Learning Paths](./learning-path.md) - Structured learning from foundational concepts to advanced patterns. 
+- [Architecture](./architecture.md) - Enterprise Temporal architecture covering Namespace conventions and Worker deployment. +- [Cost](./cost.md) - Understanding Temporal Cloud's consumption-based pricing model. +- [Shared Responsibility](./shared-responsibility.md) - Defining team responsibilities for building and managing Temporal applications. +- [Patterns](./patterns.md) - Common Temporal Workflow design patterns with code samples. +- [Troubleshooting](./troubleshooting.md) - How to observe and troubleshoot Temporal Workflows and Workers. +- [Support](./support.md) - Temporal Cloud support model and expert-led sessions. +- [FAQs](./faqs.md) - Common questions and answers about using Temporal. diff --git a/docs/best-practices/knowledge-hub/learning-path.md b/docs/best-practices/knowledge-hub/learning-path.md new file mode 100644 index 0000000000..35a455c222 --- /dev/null +++ b/docs/best-practices/knowledge-hub/learning-path.md @@ -0,0 +1,76 @@ +--- +id: learning-path +title: Learning Paths +sidebar_label: Learning Paths +description: Structured learning paths from foundational concepts to advanced patterns, tailored for Software Developers, AI Developers, and Platform Engineers. +toc_max_heading_level: 3 +keywords: + - temporal learning path + - temporal training + - temporal courses + - developer onboarding +tags: + - Best Practices + - Knowledge Hub +--- + +:::info +This page is part of the [Temporal Knowledge Hub](./index.md). +::: + +:::note +Customize learning paths for your developers to learn Temporal based on their skills and personas. +::: + +This guide provides a structured learning path from foundational concepts to advanced patterns, tailored specifically for Software Developers, AI Developers, and Platform Engineers. + +Temporal offers free, self-paced training courses that provide a solid grounding in the platform. Developers can sign up for free for these courses using their work emails at [learn.temporal.io](http://learn.temporal.io). 
+ +## Foundation + +1. [Temporal 101: Introducing the Temporal Platform](https://learn.temporal.io/courses/temporal_101/) + 1. Learn the fundamentals of Temporal, including Workflows, Activities, and the core value proposition of Durable Execution. +2. [Temporal 102: Exploring Durable Execution](https://learn.temporal.io/courses/temporal_102/) + 1. You will acquire skills necessary to use Temporal throughout the development lifecycle by learning how to test, debug, and deploy applications. + +## Intermediate + +1. [Securing Application Data](https://learn.temporal.io/courses/appdatasec/) + 1. Provides general guidance and example applications for addressing user management, encryption standards, and key rotation. +2. [Interacting with Workflows](https://learn.temporal.io/courses/interacting_with_workflows/) + 1. Learn how to interact with Workflows using Signal, Update, and Query. +3. [Crafting an Error Handling Strategy](https://learn.temporal.io/courses/errstrat/) + 1. You will explore the nature of different types of failures and investigate the support that Temporal provides for addressing them. + +## Advanced + +The Advanced learning paths are tailored to 3 distinct user personas: + +1. [Platform Engineers](#platform-engineer) +2. [Software Developers](#software-developers) +3. [AI Developers](#ai-developers) + +### Platform Engineer {#platform-engineer} + +1. [Introduction to Temporal Cloud](https://learn.temporal.io/courses/intro_to_temporal_cloud/) + 1. Learn the role of Temporal Cloud, how to log into and navigate its Web UI, and how to perform tasks that new Temporal Cloud users may do in preparation for using this service. +2. [Best practices | Temporal Platform Documentation](https://docs.temporal.io/best-practices) + 1. Learn the foundational principles and best practices for using Temporal Cloud. + +### Software Developers {#software-developers} + +1. [Versioning Workflows](https://learn.temporal.io/courses/versioning/) + 1. 
In this course, you will learn how to safely evolve your Temporal application code in production. +2. [Worker Versioning](https://learn.temporal.io/courses/worker_versioning/) + 1. You will learn the benefits of Worker Versioning and evaluate tradeoffs of various versioning approaches. + +### AI Developers {#ai-developers} + +1. [Building Durable AI Applications with Temporal](https://learn.temporal.io/tutorials/ai/building-durable-ai-applications/) + 1. Learn how to build reliable AI applications using Temporal to orchestrate LLM calls, handle retries, and manage complex AI workflows that can recover from failures. +2. [Building Durable MCP Tool with Temporal](https://learn.temporal.io/tutorials/ai/building-mcp-tools-with-temporal/) + 1. Learn how to build long-running Model Context Protocol (MCP) tools using Temporal. + +## What's next + +* Check whether [Temporal is the right technology for your use case](./decision-framework.md). diff --git a/docs/best-practices/knowledge-hub/patterns.md b/docs/best-practices/knowledge-hub/patterns.md new file mode 100644 index 0000000000..efed2cb80d --- /dev/null +++ b/docs/best-practices/knowledge-hub/patterns.md @@ -0,0 +1,147 @@ +--- +id: patterns +title: Temporal Patterns +sidebar_label: Patterns +description: Common Temporal Workflow design patterns with code samples for Python and Java. +toc_max_heading_level: 3 +keywords: + - temporal patterns + - temporal design patterns + - temporal code samples + - temporal best practices +tags: + - Best Practices + - Knowledge Hub +--- + +:::info +This page is part of the [Temporal Knowledge Hub](./index.md). +::: + +:::note +Curate Temporal Workflow patterns relevant to your use cases so developers can quickly find solutions. +::: + +## Parallel Activity + +| | | +| :---- | :---- | +| **What it does** | Executes multiple Activities concurrently. | +| **Why use it** | Improves Workflow performance when Activities are independent and don't need sequential execution.
| +| **Code samples** | [Python](https://github.com/temporalio/samples-python/blob/main/hello/hello_parallel_activity.py), [Java](https://github.com/temporalio/samples-java/blob/main/core/src/main/java/io/temporal/samples/hello/HelloParallelActivity.java) | + +## Custom Search Attributes + +| | | +| :---- | :---- | +| **What it does** | Adds custom key-value metadata to Workflow executions. | +| **Why use it** | Enables advanced filtering, sorting, and visibility of Workflows in the Web UI and CLI based on business-specific data. | +| **Code samples** | [Python](https://github.com/temporalio/samples-python/blob/main/hello/hello_search_attributes.py), [Java](https://github.com/temporalio/samples-java/blob/main/core/src/main/java/io/temporal/samples/hello/HelloSearchAttributes.java) | + +## Child Workflow + +| | | +| :---- | :---- | +| **What it does** | Spawns a new Workflow execution from within a parent Workflow. | +| **Why use it** | Partitions work into smaller chunks, encapsulates Activities into observable components, and models business entities with different lifecycles. | +| **Code samples** | [Python](https://github.com/temporalio/samples-python/blob/main/hello/hello_child_workflow.py), [Java](https://github.com/temporalio/samples-java/tree/main/core/src/main/java/io/temporal/samples/asyncchild) | + +## Continue as new + +| | | +| :---- | :---- | +| **What it does** | Atomically completes the current Workflow execution and starts a new one with the same Workflow ID. | +| **Why use it** | Prevents "Event History Limit Exceeded" errors and other [Workflow Execution limits](https://docs.temporal.io/cloud/limits#workflow-execution-event-history-limits) by clearing the history.
| +| **Code samples** | [Python](https://github.com/temporalio/samples-python/blob/main/hello/hello_continue_as_new.py), [Java](https://github.com/temporalio/samples-java/blob/main/core/src/main/java/io/temporal/samples/hello/HelloPeriodic.java) | + +## Exception handling + +| | | +| :---- | :---- | +| **What it does** | Implements logic to catch and respond to Activity or Workflow failures. | +| **Why use it** | Ensures system resilience by defining fallback logic, compensation transactions, or specific retry policies when errors occur. | +| **Code samples** | [Python](https://github.com/temporalio/samples-python/blob/main/hello/hello_exception.py), [Java](https://github.com/temporalio/samples-java/blob/main/core/src/main/java/io/temporal/samples/hello/HelloException.java) | + +## Cancellation + +| | | +| :---- | :---- | +| **What it does** | Sends a request to gracefully stop a running Workflow or a specific scope within it. | +| **Why use it** | Stops unnecessary processing and cleans up resources when a result is no longer needed or a user explicitly stops the process. | +| **Code samples** | [Python](https://github.com/temporalio/samples-python/blob/main/hello/hello_cancellation.py), [Java](https://github.com/temporalio/samples-java/blob/main/core/src/main/java/io/temporal/samples/hello/HelloCancellationScope.java) | + +## Async Activity completion + +| | | +| :---- | :---- | +| **What it does** | Enables the Activity Function to return without the Activity Execution completing. | +| **Why use it** | Essential for long-running external processes that can heartbeat and inform Temporal of their completion.
| +| **Code samples** | [Python](https://github.com/temporalio/samples-python/blob/main/hello/hello_async_activity_completion.py), [Java](https://github.com/temporalio/samples-java/blob/main/core/src/main/java/io/temporal/samples/hello/HelloAsyncActivityCompletion.java) | + +## Local Activity + +| | | +| :---- | :---- | +| **What it does** | Executes short-lived Activity logic within the same process as the Workflow Worker. | +| **Why use it** | Reduces latency and history size for short, high-throughput operations that do not require global durability guarantees. | +| **Code samples** | [Python](https://github.com/temporalio/samples-python/blob/main/hello/hello_local_activity.py), [Java](https://github.com/temporalio/samples-java/blob/main/core/src/main/java/io/temporal/samples/hello/HelloLocalActivity.java) | + +## Batch Processing (Sliding Window) + +| | | +| :---- | :---- | +| **What it does** | Processes a large stream of items in controlled, concurrent chunks. | +| **Why use it** | Manages concurrency and throughput limits while efficiently processing high volumes of data without overwhelming downstream services. | +| **Code samples** | [Python](https://github.com/temporalio/samples-python/tree/main/batch_sliding_window), [Java](https://github.com/temporalio/samples-java/tree/main/core/src/main/java/io/temporal/samples/batch/slidingwindow) | + +## Custom Metrics + +| | | +| :---- | :---- | +| **What it does** | Emits application-specific telemetry (counters, gauges, timers) from Workflows and Activities. | +| **Why use it** | Provides observability into business-level KPIs and specific Workflow performance characteristics beyond default system metrics. 
| +| **Code samples** | [Python](https://github.com/temporalio/samples-python/tree/main/custom_metric), [Java](https://github.com/temporalio/samples-java/tree/main/core/src/main/java/io/temporal/samples/metrics) | + +## Encryption + +| | | +| :---- | :---- | +| **What it does** | Encrypts Workflow and Activity payloads client-side using a custom Data Converter. | +| **Why use it** | Ensures sensitive data remains secure and opaque to the Temporal Server, satisfying strict compliance and privacy requirements. | +| **Code samples** | [Python](https://github.com/temporalio/samples-python/tree/main/encryption), [Java](https://github.com/temporalio/samples-java/tree/main/core/src/main/java/io/temporal/samples/encryptedpayloads) | + +## Polling + +| | | +| :---- | :---- | +| **What it does** | Periodically checks the state of an external system from within an Activity. | +| **Why use it** | Provides reliable integration with external APIs or systems that do not provide webhooks or asynchronous event notifications. | +| **Code samples** | [Python](https://github.com/temporalio/samples-python/tree/main/polling), [Java](https://github.com/temporalio/samples-java/tree/main/core/src/main/java/io/temporal/samples/polling) | + +## Worker routing + +| | | +| :---- | :---- | +| **What it does** | Dynamically routes Activities to specific Task Queues monitored by designated Workers. | +| **Why use it** | Targets tasks to specific hosts or environments; required for file-system affinity, local caching strategies, or hardware-specific (e.g., GPU) operations. | +| **Code samples** | [Python](https://github.com/temporalio/samples-python/tree/main/worker_specific_task_queues), [Java](https://github.com/temporalio/samples-java/tree/main/core/src/main/java/io/temporal/samples/fileprocessing) | + +## Saga + +| | | +| :---- | :---- | +| **What it does** | Manages long-running, distributed transactions by executing a sequence of steps. 
If a step fails, it triggers "compensating actions" (undo operations) in reverse order to revert the changes made by previous steps. | +| **Why use it** | Ensures data consistency across microservices (e.g., booking a flight, hotel, and car) without locking resources for long periods. It handles partial failures gracefully by rolling back the system to a known consistent state. | +| **Code samples** | [Java](https://github.com/temporalio/samples-java/blob/main/core/src/main/java/io/temporal/samples/hello/HelloSaga.java) | + +## Early Return + +| | | +| :---- | :---- | +| **What it does** | Uses "Update with Start" to begin a Workflow execution and synchronously return a result to the client (e.g., validation success) while continuing to process longer-running tasks (e.g., database updates, external API calls) in the background. | +| **Why use it** | Drastically reduces end-user latency in interactive applications. Users receive immediate feedback (like an "Order Received" confirmation) without waiting for the entire process to complete. | +| **Code samples** | [Java](https://github.com/temporalio/samples-java/tree/main/core/src/main/java/io/temporal/samples/earlyreturn) | + +## Example Temporal Applications + +See [Temporal Code Exchange](https://temporal.io/code-exchange) for example Temporal applications. diff --git a/docs/best-practices/knowledge-hub/shared-responsibility.md b/docs/best-practices/knowledge-hub/shared-responsibility.md new file mode 100644 index 0000000000..00aac33237 --- /dev/null +++ b/docs/best-practices/knowledge-hub/shared-responsibility.md @@ -0,0 +1,139 @@ +--- +id: shared-responsibility +title: Shared Responsibility Model +sidebar_label: Shared Responsibility +description: Defining team responsibilities for building and managing Temporal applications between Platform and Application teams. 
+toc_max_heading_level: 3 +keywords: + - temporal shared responsibility + - temporal platform team + - temporal application team + - temporal governance +tags: + - Best Practices + - Knowledge Hub +--- + +:::info +This page is part of the [Temporal Knowledge Hub](./index.md). +::: + +:::note +Tailor this matrix to clarify ownership boundaries so developers know who to contact. +::: + +At ABC Financial, the ownership of Temporal applications is shared between the **Temporal Platform Team** (who manages Temporal Cloud infrastructure) and **Application Teams** (who build and run Temporal Workflows). + +*Key: ✅= responsible, ❌= not responsible, 🤝🏼= shared responsibility* + +### Identity Access Management (IAM) + +| Responsibility | Platform Team | Application Team | +| :---- | ----- | ----- | +| Temporal Cloud access ([go/temporal-request](http://go/temporal-request)) | ✅ | ❌ | +| [SAML](https://docs.temporal.io/cloud/saml) and [SCIM](https://docs.temporal.io/cloud/scim) configurations | ✅ | ❌ | +| Temporal Cloud [user groups](https://docs.temporal.io/cloud/user-groups) | ✅ | ❌ | +| User principal provisioning and de-provisioning | ✅ | ❌ | +| [User principal role](https://docs.temporal.io/cloud/users) assignment | ✅ | ❌ | +| [API key](https://docs.temporal.io/cloud/api-keys) provisioning | ✅ | ❌ | + +### Network Connectivity + +| Responsibility | Platform Team | Application Team | +| :---- | ----- | ----- | +| [Private Connectivity](https://docs.temporal.io/cloud/connectivity) to Temporal Cloud | ✅ | ❌ | +| Firewall rules to Temporal Cloud | ✅ | ❌ | + +### Data Security + +| Responsibility | Platform Team | Application Team | +| :---- | ----- | ----- | +| Data compliance policy | ✅ | ❌ | +| [Data Converter](https://docs.temporal.io/evaluate/development-production-features/data-encryption) implementation | ✅ | ❌ | +| [Data Converter](https://docs.temporal.io/evaluate/development-production-features/data-encryption) usage | ❌ | ✅ | +| [Codec 
Server](https://docs.temporal.io/production-deployment/data-encryption) hosting | ✅ | ❌ | +| [Codec Server](https://docs.temporal.io/production-deployment/data-encryption) configuration (per Namespace) | ❌ | ✅ | + +### Infrastructure + +| Responsibility | Platform Team | Application Team | +| :---- | ----- | ----- | +| Temporal Cloud Namespace provisioning ([go/temporal-namespace](http://go/temporal-namespace)) | ✅ | ❌ | +| [Temporal Cloud metrics](https://docs.temporal.io/production-deployment/cloud/metrics/reference) | ✅ | ❌ | +| Temporal Cloud [Namespace rate limits](https://docs.temporal.io/cloud/limits#namespace-level) | ❌ | ✅ | +| Temporal Cloud [Namespace Capacity](https://docs.temporal.io/cloud/capacity-modes) | ❌ | ✅ | +| [Temporal Cloud audit logs](https://docs.temporal.io/cloud/audit-logs) | ✅ | ❌ | + +### Governance + +| Responsibility | Platform Team | Application Team | +| :---- | ----- | ----- | +| Temporal Platform Hub | ✅ | ❌ | +| [Temporal developer guide](#) | ✅ | ❌ | + +### Development + +| Responsibility | Platform Team | Application Team | +| :---- | ----- | ----- | +| Workflow development | ❌ | ✅ | +| Automated tests (i.e. unit, integration, [replay](https://docs.temporal.io/develop/java/testing-suite#replay)) | ❌ | ✅ | +| Workflow versioning | ❌ | ✅ | + +### Worker + +| Responsibility | Platform Team | Application Team | +| :---- | ----- | ----- | +| Worker identity authentication policy | ✅ | ❌ | +| Worker identity auth implementation | ❌ | ✅ | +| Worker identity auth rotation | ✅ | ❌ | +| Worker infrastructure health (e.g. Kubernetes health) | ✅ | ❌ | +| Worker deployment health | ❌ | ✅ | +| Worker configurations (i.e. Task Queue, Execution Slots) | 🤝🏼 (defaults) | 🤝🏼 (customization) | +| Worker auto-scaling framework (i.e. 
KEDA) | ✅ | ❌ | +| Worker auto-scaling configuration | ❌ | ✅ | + +### Temporal Application Deployment + +| Responsibility | Platform Team | Application Team | +| :---- | ----- | ----- | +| Build pipeline for Worker | ✅ | ❌ | +| Artifact management | ✅ | ❌ | +| Workflow versioning management (e.g. [Worker Versioning](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning)) policy | ✅ | ❌ | +| Worker build (i.e. Workflow and Worker Definition) | ❌ | ✅ | +| Worker build release (i.e. control which build to release and when) | ✅ | ❌ | + +### Observability + +| Responsibility | Platform Team | Application Team | +| :---- | ----- | ----- | +| Observability platform (e.g. Datadog, Dynatrace) | ✅ | ❌ | +| [Temporal SDK metrics](https://docs.temporal.io/references/sdk-metrics) collection | ✅ | ❌ | +| [Temporal SDK metrics](https://docs.temporal.io/references/sdk-metrics) configuration | ❌ | ✅ | +| Temporal custom metrics emission | ❌ | ✅ | +| [Temporal Cloud metrics](https://docs.temporal.io/cloud/metrics/openmetrics) collection | ✅ | ❌ | +| Monitoring dashboard ([go/temporal-dashboard](http://go/temporal-dashboard)) | ✅ | ❌ | +| Temporal Cloud platform alerts | ✅ | ❌ | +| Temporal Workflow alerts | ❌ | ✅ | + +### Operation + +| Responsibility | Platform Team | Application Team | +| :---- | ----- | ----- | +| Support coordination with Temporal (the company) | ✅ | ❌ | +| Load testing | ❌ | ✅ | +| Incident response | 🤝🏼 (platform incident) | 🤝🏼 (application incident) | + +### Cost + +| Responsibility | Platform Team | Application Team | +| :---- | ----- | ----- | +| Temporal Cloud platform cost | ✅ | ❌ | +| Temporal Cloud Namespace cost | ❌ | ✅ | + +## Decision framework + +When in doubt, ask yourself: + +* **Does the issue affect multiple teams or namespaces?** → Platform Team +* **Is it business logic or application-specific?** → Application Team +* **Does it require Temporal Cloud `Admin` access?** → Platform Team diff --git 
a/docs/best-practices/knowledge-hub/support.md b/docs/best-practices/knowledge-hub/support.md new file mode 100644 index 0000000000..4325522fc6 --- /dev/null +++ b/docs/best-practices/knowledge-hub/support.md @@ -0,0 +1,48 @@ +--- +id: support +title: Get Help from the Temporal Team +sidebar_label: Support +description: Temporal Cloud support model, how to submit tickets, and expert-led sessions available through Enterprise support. +toc_max_heading_level: 3 +keywords: + - temporal support + - temporal cloud support + - temporal enterprise + - temporal expert sessions +tags: + - Best Practices + - Knowledge Hub +--- + +:::info +This page is part of the [Temporal Knowledge Hub](./index.md). +::: + +## Temporal Cloud support model + +:::note +Specify your support tier so developers understand the SLAs available to them. +::: + +At ABC Financial, we have **Enterprise** support for Temporal Cloud. With Enterprise support, Temporal offers the following response time targets for support tickets: + +| | P0 | P1 | P2 | P3 | +| :---- | ----- | ----- | ----- | ----- | +| Definition | **Production impacted** - Temporal Cloud service is unavailable or degraded with a significant impact. | **Production issue** - An issue related to production workloads running on the Temporal Cloud service, or a significant project, is blocked. | **General issues** - General Temporal Cloud service or other issues where there is no production impact or a workaround exists to mitigate the impact. | **General guidance** - Questions or an issue with the Temporal Cloud service that is not impacting system availability or functionality. | +| Response time target | 30 minutes (24×7) | 1 hour | 4 hours | 1 day | + +## How to submit a support ticket + +1. Go to [support.temporal.io](http://support.temporal.io). +2. If prompted, log in to Temporal Cloud using the same method you normally use (e.g., Google, Microsoft, email-password, or other methods). +3. 
You will be presented with a screen where you can view open and closed tickets for your Temporal account, as well as submit a new ticket. + +## Temporal account team + +:::note +List your Temporal contacts so developers know who to escalate to. Request a Calendly link from your Temporal team. +::: + +* Temporal Account Executive: Person +* Temporal Solution Architect: Person +* Temporal Dedicated Support Engineer: Person diff --git a/docs/best-practices/knowledge-hub/troubleshooting.md b/docs/best-practices/knowledge-hub/troubleshooting.md new file mode 100644 index 0000000000..b5113ddafd --- /dev/null +++ b/docs/best-practices/knowledge-hub/troubleshooting.md @@ -0,0 +1,146 @@ +--- +id: troubleshooting +title: Troubleshooting +sidebar_label: Troubleshooting +description: How to observe and troubleshoot Temporal Workflows and Workers across environments. +toc_max_heading_level: 3 +keywords: + - temporal troubleshooting + - temporal debugging + - temporal observability + - temporal alerts +tags: + - Best Practices + - Knowledge Hub +--- + +:::info +This page is part of the [Temporal Knowledge Hub](./index.md). +::: + +:::note +Define the escalation path so developers know how to get help when issues arise. +::: + +This article documents how to observe and troubleshoot Temporal Workflows and Workers across environments (i.e. `dev`, `prd`). + +## Detection + +The first step to troubleshooting is collecting Temporal Workflow telemetry and understanding the issue. + +:::note +Link to your monitoring dashboard so developers can self-diagnose Workflow issues. +::: + +At ABC Financial, the following observability tools are supported for Temporal Cloud: + +| Tool | Purpose | What it answers | +| :---- | :---- | :---- | +| [Temporal Cloud UI](https://cloud.temporal.io/) | Source of truth for Temporal Workflow Event History, status, and traces. 
| *What happened to the Workflow?* *What is the current Workflow status?* | +| Grafana | Provides a single-pane-of-glass monitoring for logs, metrics, and traces across ABC Financial applications. | *Are the Workers healthy and sufficiently scaled?* *What happened to the upstream and downstream services?* | + +### Gather context + +Before troubleshooting, collect this information: + +* **Namespace:** Which Temporal Cloud namespace? +* **Workflow ID:** Specific Workflow instance(s) affected +* **Time window**: When did the issue start? Is it ongoing or intermittent? +* **Recent changes**: Any recent deployments or configuration updates? +* **Impact Scope**: Single Workflow, specific Workflow Type, or entire Namespace? + +### Quick health checks + +Perform these checks before detailed investigation: + +1. **Is Temporal Cloud healthy?** + 1. Check [status.temporal.io](https://status.temporal.io). +2. **Are Workers healthy?** + 1. Grafana → Infrastructure → Filter by `service:temporal` +3. **Are there recent deployments?** + 1. Check Slack channel. + +## Respond + +:::note +Add runbooks for common issues so developers can resolve problems independently. +::: + +### Common issues and troubleshooting steps + +#### 1. Workflow not starting + +**Symptoms**: Workflow appears in Temporal Cloud UI as `Running`, but the Workflow is not executing. + +**Troubleshooting**: + +1. **Check Worker Registration** + * Datadog → Logs → Filter: `service:temporal "Registered workflow"` + * Verify your Workflow Type appears in Worker startup logs +2. **Verify Task Queue** + * Temporal UI → Search for Workflows on your Task Queue + * Confirm Task Queue name matches exactly (case-sensitive) between Temporal Client and Worker +3. **Check Client Connection** + * Datadog → Filter by your application service name + * Search for: `"Temporal"` AND `"connection"` OR `"authentication"` + * Look for API key or connection errors + +**Fix**: + +* Redeploy Worker if Workflow not registered. 
+* Correct the Task Queue name mismatch in code. +* Contact the Temporal Platform team for API key issues. + +## Escalation + +:::note +Define escalation procedures and contact information for the platform team. +::: + +Escalate to the Temporal Platform team when the issue persists after following the troubleshooting steps above. + +Include the following information in your request: + +``` +1. Temporal Cloud Namespace +2. Workflow ID(s) and time window +3. Description of the issue +4. Context collected (from the Detection section) +5. Troubleshooting steps already attempted +6. Other helpful information (e.g. screenshots) +``` + +### Response time SLA + +:::note +Set response time expectations so developers know when to expect help. +::: + +* P1 (Production outage): 30 minutes +* P2 (Degraded performance): 4 hours +* P3 (Non-urgent issues): 1 business day + +## Alerts + +Application teams are responsible for detecting issues in their own Temporal Workflows and Workers, so create alerts that catch problems early. + +:::note +Add alert examples that developers can copy for their Workflows. +::: + +Here are some example alerts: + +| Alert name | Metric | Condition | Channel | +| :---- | :---- | :---- | :---- | +| High Workflow failure rate | `temporal.workflow.failed` | > 10% failure rate over 10 minutes | Page | +| High Activity Schedule-to-Start latency | `temporal.activity.schedule_to_start_latency` (p95) | > 30 seconds for 15 minutes | Slack | +| High Worker CPU utilization | `kubernetes.cpu.usage.pct` | > 80% for 10 minutes | Slack | + +## Need help? + +:::note +Specify the Slack channel or support portal for developers to reach the platform team. +::: + +* Learn [how the Temporal platform can support you](./support.md). +* Reach out to the Temporal Platform team via Slack.
diff --git a/sidebars.js b/sidebars.js index 0c3b12f164..db65883220 100644 --- a/sidebars.js +++ b/sidebars.js @@ -658,6 +658,16 @@ module.exports = { }, items: [ 'best-practices/knowledge-hub/temporal-overview', + 'best-practices/knowledge-hub/decision-framework', + 'best-practices/knowledge-hub/getting-started', + 'best-practices/knowledge-hub/learning-path', + 'best-practices/knowledge-hub/architecture', + 'best-practices/knowledge-hub/cost', + 'best-practices/knowledge-hub/shared-responsibility', + 'best-practices/knowledge-hub/patterns', + 'best-practices/knowledge-hub/troubleshooting', + 'best-practices/knowledge-hub/support', + 'best-practices/knowledge-hub/faqs', ], }, ], From 7affb6c12ddecfbff1d1a131623d9fb243c32394 Mon Sep 17 00:00:00 2001 From: kawofong <14829553+kawofong@users.noreply.github.com> Date: Wed, 28 Jan 2026 10:55:46 -0500 Subject: [PATCH 4/4] Remove hub and add reference to it --- docs/best-practices/knowledge-hub.mdx | 54 +++++ .../knowledge-hub/architecture.md | 101 ---------- docs/best-practices/knowledge-hub/cost.md | 70 ------- .../knowledge-hub/decision-framework.md | 138 ------------- docs/best-practices/knowledge-hub/faqs.md | 26 --- .../knowledge-hub/getting-started.md | 189 ------------------ docs/best-practices/knowledge-hub/index.md | 53 ----- .../knowledge-hub/learning-path.md | 76 ------- docs/best-practices/knowledge-hub/patterns.md | 147 -------------- .../knowledge-hub/shared-responsibility.md | 139 ------------- docs/best-practices/knowledge-hub/support.md | 48 ----- .../knowledge-hub/temporal-overview.md | 113 ----------- .../knowledge-hub/troubleshooting.md | 146 -------------- sidebars.js | 23 +-- 14 files changed, 55 insertions(+), 1268 deletions(-) create mode 100644 docs/best-practices/knowledge-hub.mdx delete mode 100644 docs/best-practices/knowledge-hub/architecture.md delete mode 100644 docs/best-practices/knowledge-hub/cost.md delete mode 100644 docs/best-practices/knowledge-hub/decision-framework.md delete mode 
100644 docs/best-practices/knowledge-hub/faqs.md delete mode 100644 docs/best-practices/knowledge-hub/getting-started.md delete mode 100644 docs/best-practices/knowledge-hub/index.md delete mode 100644 docs/best-practices/knowledge-hub/learning-path.md delete mode 100644 docs/best-practices/knowledge-hub/patterns.md delete mode 100644 docs/best-practices/knowledge-hub/shared-responsibility.md delete mode 100644 docs/best-practices/knowledge-hub/support.md delete mode 100644 docs/best-practices/knowledge-hub/temporal-overview.md delete mode 100644 docs/best-practices/knowledge-hub/troubleshooting.md diff --git a/docs/best-practices/knowledge-hub.mdx b/docs/best-practices/knowledge-hub.mdx new file mode 100644 index 0000000000..ff9d50153c --- /dev/null +++ b/docs/best-practices/knowledge-hub.mdx @@ -0,0 +1,54 @@ +--- +id: knowledge-hub +title: Knowledge Hub +sidebar_label: Knowledge Hub +description: + TBD +toc_max_heading_level: 3 +hide_table_of_contents: true +keywords: + - temporal best practices + - operational excellence +tags: + - Best Practices +--- + +As organizations scale their Temporal adoption, the need for centralized, consistent knowledge becomes critical. +Mature Temporal organizations establish internal knowledge hubs to address common challenges that emerge when multiple teams +adopt Temporal independently. + +### The problem + +Without a centralized knowledge base, organizations often experience: + +- **Fragmented expertise**: Tribal knowledge stays siloed within teams, leading to inconsistent Temporal implementations and + repeated mistakes across the organization. +- **Slow onboarding**: New developers spend weeks piecing together information from scattered sources, Slack threads, and + ad-hoc meetings. +- **Inconsistent patterns**: Teams develop their own conventions for Workflow design, error handling, and testing, + making cross-team collaboration and code reuse difficult. 
+- **Redundant support burden**: Platform teams answer the same questions repeatedly, diverting resources from strategic + work. +- **Compliance and security gaps**: Without standardized guidance, teams may inadvertently introduce security + vulnerabilities or miss compliance requirements. + +### Value of a Temporal knowledge hub + +Organizations that invest in an internal Temporal knowledge hub see measurable improvements: + +| Benefit | Impact | +| :--- | :--- | +| **Accelerated onboarding** | Reduce onboarding from weeks to days with clear learning paths and starter templates. | +| **Consistent standards** | Establish conventions for Namespaces, Workers, and error handling across all teams. | +| **Reduced support toil** | Up to 90% fewer repetitive questions with self-service documentation. | +| **Faster time-to-production** | Ship features faster with validated patterns and decision frameworks. | +| **Improved compliance** | Ensure consistent security controls and access management across teams. | + +### Next steps + +1. **Use the template**: Start with the [Temporal Platform Hub template](https://kawofong.github.io/temporal-platform-hub/) as your foundation. +It provides a proven structure covering everything from decision frameworks to escalation paths. +2. **Assign ownership**: Designate a Platform team to customize the content and keep it current as your +Temporal practice evolves. +3. **Iterate**: Track which pages developers visit most and which questions still come to the Platform + team. Use this data to continuously improve the documentation. 
diff --git a/docs/best-practices/knowledge-hub/architecture.md b/docs/best-practices/knowledge-hub/architecture.md deleted file mode 100644 index f30e2a4288..0000000000 --- a/docs/best-practices/knowledge-hub/architecture.md +++ /dev/null @@ -1,101 +0,0 @@ ---- -id: architecture -title: Temporal Architecture -sidebar_label: Architecture -description: Enterprise Temporal architecture covering Namespace conventions, Worker deployment patterns, network connectivity, and disaster recovery procedures. -toc_max_heading_level: 3 -keywords: - - temporal architecture - - temporal namespace - - temporal connectivity - - temporal worker deployment -tags: - - Best Practices - - Knowledge Hub ---- - -:::info -This page is part of the [Temporal Knowledge Hub](./index.md). -::: - -:::note -Customize this section to describe the architectural decisions and guardrails that shape how your developers build with Temporal. -::: - -This document defines our enterprise Temporal architecture, covering Namespace conventions, Worker deployment patterns, network connectivity, and disaster recovery procedures. - -## Temporal Cloud - -At ABC Financial, we use Temporal Cloud, which is a fully managed Temporal service. It offers a hassle-free way to run our Temporal Applications without the need to manage the underlying infrastructure. - -Our Workers and Temporal Applications connect to the Temporal Cloud service, which takes care of the persistence layer, scalability, and availability for you. - -## Namespace - -A Temporal Cloud [Namespace](https://docs.temporal.io/namespaces) is a unit of isolation within the Temporal platform. It ensures that Workflow executions, Task Queues, and resources are logically separated. - -:::note -Define a Namespace naming convention based on the Temporal [Namespace Best Practices](../managing-namespace.mdx). -::: - -At ABC Financial, we adhere to the following standards for our Temporal Cloud Namespaces: - -1. The naming convention is `--` - 1. 
Use at most 10 characters for business units (e.g. `consumer`, `commercial`, `investment`). - 2. Use at most 10 characters for domain (e.g. `payment`, `mortgage`). - 3. Use one of the support environments: `dev`, `stg`, `prd`. - -:::note -Link to your internal Namespace provisioning process so developers can self-serve. -::: - -File an internal service ticket to request for a new Temporal Cloud Namespace. - -:::note -List the default features and guardrails applied to new Namespaces by environment. -::: - -Based on the environment (i.e. `dev`, `stg`, `prd`), the following features are configured by our automation: - -| Feature | Development | Staging | Production | -| :---- | ----- | ----- | ----- | -| [Deletion Protection](https://docs.temporal.io/cloud/namespaces#delete-protection) | ✅ | ✅ | ✅ | -| [Private Connectivity](https://docs.temporal.io/cloud/connectivity) | ✅ | ✅ | ✅ | -| [Custom Encryption](https://docs.temporal.io/default-custom-data-converters) | ✅ | ✅ | ✅ | -| [Codec Server](https://docs.temporal.io/codec-server) | ✅ | ✅ | ✅ | -| [API Key](https://docs.temporal.io/cloud/api-keys) | ✅ | ✅ | ✅ | -| [API Key Rotation](https://docs.temporal.io/cloud/api-keys#rotate-an-api-key) | ✅ | ✅ | ✅ | -| [Observability](https://docs.temporal.io/evaluate/development-production-features/observability) | ✅ | ✅ | ✅ | -| [Audit Logs](https://docs.temporal.io/cloud/audit-logs) | ✅ | ✅ | ✅ | -| [Workflow History Export](https://docs.temporal.io/cloud/export) | ❌ | ❌ | ✅ | -| [Multi-Region Replication](https://docs.temporal.io/cloud/high-availability#multi-region-replication) | ❌ | ❌ | ✅ | - -## Connectivity - -:::note -Describe your network connectivity requirements so developers understand how Workers connect to Temporal Cloud. -::: - -At ABC Financial, private connectivity is required for all Temporal Cloud Namespaces for compliance reasons. [Private connectivity](https://docs.temporal.io/cloud/connectivity) eliminates traffic over public internet to Temporal Cloud. 
- -For reference, see below for official Temporal documentations on AWS and GCP private connectivity: - -* [AWS PrivateLink Connectivity | Temporal Platform Documentation](https://docs.temporal.io/cloud/connectivity/aws-connectivity) -* [Google Private Service Connect Connectivity | Temporal Platform Documentation](https://docs.temporal.io/cloud/connectivity/gcp-connectivity) - -## Worker - -:::note -Document your Worker deployment standards so developers know where and how to deploy. -::: - -At ABC Financial, Temporal Workers are deployed as containerized applications on Kubernetes clusters across AWS EKS and GCP GKE. - -All worker deployments are managed through [Helm](https://helm.sh/) charts, ensuring: - -* Standardized deployment configurations across clouds -* Version-controlled infrastructure as code -* Simplified rollbacks and updates -* Environment-specific value overrides - -[KEDA](https://keda.sh/docs/2.18/scalers/) is configured to auto-scale Workers based on Temporal Task Queue backlog. diff --git a/docs/best-practices/knowledge-hub/cost.md b/docs/best-practices/knowledge-hub/cost.md deleted file mode 100644 index a007fa5908..0000000000 --- a/docs/best-practices/knowledge-hub/cost.md +++ /dev/null @@ -1,70 +0,0 @@ ---- -id: cost -title: Temporal Cloud Cost -sidebar_label: Cost -description: Understanding Temporal Cloud's consumption-based pricing model and tips for building cost-effective Workflows. -toc_max_heading_level: 3 -keywords: - - temporal cloud cost - - temporal pricing - - temporal actions - - temporal storage -tags: - - Best Practices - - Knowledge Hub ---- - -:::info -This page is part of the [Temporal Knowledge Hub](./index.md). -::: - -:::note -Add cost-saving tips to help developers optimize Temporal Cloud spending. -::: - -As we scale our usage of Temporal Cloud, understanding the cost model is critical for designing cost-efficient workflows. Temporal Cloud is consumption-based, and its pricing is based on Action and Storage. 
- -Our Enterprise contract covers base fees and support, but your specific namespace usage drives the variable costs. - -## Action - -Actions are the primary unit of consumption-based pricing for Temporal Cloud. They track billable operations within the Temporal Cloud Service. - -### What counts as an Action? - -* **Workflow Start**: Starting a Workflow execution. -* **Activity Start and Retry**: Starting and retrying an Activity. -* **Signals**: Sending a signal to a Workflow. -* **Timers**: A Timer firing. -* **Child Workflows**: Starting a Child Workflow. -* **Search Attribute upsert**: occurs for each invocation of `UpsertSearchAttributes` command - -For a complete list of billable Actions, see [Temporal Cloud Actions](https://docs.temporal.io/cloud/actions). - -### Cost-saving tip #1: Configure exponential backoff for Activity Retry - -Ensure your Activity Retry Policy uses a `BackoffCoefficient` > 1.0 (e.g. 2.0) and a reasonable `MaximumInterval`. - -**Why**: Each retry attempt counts as a billable Action. Aggressive, constant-interval retries during downstream outages will skyrocket Action usage and costs without progressing the workflow. - -## Storage - -Storage is charged based on Gigabyte-Hours (GB-h). There are two tiers: - -1. **Active Storage (higher cost)**: - * This is the storage used by `Open` workflows. - * It is 40x more expensive than Retained storage. -2. **Retained Storage (lower cost)**: - * This is the Event History of `Closed` Workflows. - * We pay this to keep the history available for debugging (based on the Namespace Retention policy). - -### Cost-saving tip #2: Use Continue-As-New for long-running Workflows - -Trigger `ContinueAsNew` periodically (e.g. every ~4,000 events or daily) for long-running or indefinite workflows. - -**Why**: This closes the current run, moving its Event History from Active Storage (expensive) to Retained Storage (cheap). This creates a ~97% reduction in storage costs for that history data. 
- -## What's next - -* [Temporal Cloud pricing](https://docs.temporal.io/cloud/pricing) -* [Temporal Cloud Actions](https://docs.temporal.io/cloud/actions) diff --git a/docs/best-practices/knowledge-hub/decision-framework.md b/docs/best-practices/knowledge-hub/decision-framework.md deleted file mode 100644 index 5ef5ce1a4b..0000000000 --- a/docs/best-practices/knowledge-hub/decision-framework.md +++ /dev/null @@ -1,138 +0,0 @@ ---- -id: decision-framework -title: Temporal Decision Framework -sidebar_label: Decision Framework -description: A guide to help you determine whether Temporal is the right solution for your use case. -toc_max_heading_level: 3 -keywords: - - temporal decision framework - - when to use temporal - - temporal use cases - - temporal alternatives -tags: - - Best Practices - - Knowledge Hub ---- - -:::info -This page is part of the [Temporal Knowledge Hub](./index.md). -::: - -This guide helps you quickly determine whether Temporal is the right solution for your use case. - -## Temporal decision framework - -:::note -Tailor these questions to match your organization's technical landscape. -::: - -To decide whether Temporal is a suitable solution for your use case, ask yourself 3 questions: - -1. **Does your digital process have multiple steps that can fail independently?** -2. **Do you need the process to survive failures?** -3. **Does your process span multiple services, APIs, or long time periods (i.e. >10 seconds)?** - -If you answered "**yes**" to 2 or more questions, Temporal is likely a good fit. Continue reading. - -If you answered "**no**" to all three questions, consider alternatives first. Skip to [Bad use cases for Temporal](#bad-use-cases-for-temporal) to explore alternative solutions. - -## Temporal benefits - -:::note -Highlight benefits that address your developers' pain points. -::: - -1. **Durable Execution** - your code will always complete. 
- * Automatic retry, recovery from infrastructure failures, durable state persistence, and exactly-once execution semantics—all without custom code. -2. **Developer velocity** - ship faster with less code to maintain. - * Write business logic in familiar languages, collaborate with developers across language barriers, eliminate boilerplate infrastructure code, and leverage built-in testing for rapid iteration. -3. **Audit trail** - complete visibility in your digital process. - * Immutable execution history, self-documenting Workflow execution, and operational transparency. -4. **Priority and Fairness** - enterprise-grade multi-tenancy. - * Priority-based execution, and fair distribution of Workflow Executions across your customer base or tenant. -5. **Workflow fabric** - break down development silo. - * Cross-team Workflow orchestration with reusable operations, cross-namespace coordination, and service registry for discoverability. - -## Good use cases for Temporal - -:::note -Replace with use cases from your domain. See [Customer Stories](https://temporal.io/in-use) for inspiration. -::: - -### Business transactions - -1. **Payment processing** - * **Why Temporal is perfect**: Multi-party coordination with compensation logic, audit requirements, idempotency guarantees, timeout handling for authorizations that expire, and scalability to support more than billions of transactions per day. -2. **Order management** - * **Why Temporal is perfect**: Long-running state machines spanning hours to days with complex state transitions, human intervention, parallel operations, different order priority, variable timing per order, and support for more than millions of orders per hour. -3. **Mortgage underwriting** - * **Why Temporal is perfect**: Weeks-long processes with complex decision trees, multiple external integrations, human approvals, strict compliance requirements, and durable state persistence. - -### Customer experience - -1. 
**Marketing campaign** - * **Why Temporal is perfect**: Multi-channel orchestration with time-based sequencing and long campaign durations with dynamic personalization. -2. **Customer onboarding** - * **Why Temporal is perfect:** Great for long-running, multi-step, and sometimes human-in-the-loop processes that onboarding often requires. - -### Data engineering - -1. **Document processing** - * **Why Temporal is perfect**: Multi-stage pipelines with variable processing times, external service dependencies, rate limit requirements, and coordinated large-scale processing. -2. **Data pipeline** - * **Why Temporal is perfect**: Data orchestration with complex dependencies, incremental processing, backfill coordination, cross-system dependencies, SLA monitoring, and idempotent execution. -3. **Video processing** - * **Why Temporal is perfect**: Long-running compute, resource-intensive GPU activities, complex pipelines with parallel variant generation, failure isolation, and cost-optimized scheduling. - -### AI/ML - -1. **ML inference** - * **Why Temporal is perfect**: Multi-model orchestration with fallback logic, batch and real-time handling, feature engineering, and comprehensive audit trail. -2. **RAG** - * **Why Temporal is perfect**: Multi-step retrieval with hybrid search, context assembly from multiple sources, LLM orchestration with retries and fallbacks, and evaluation pipeline tracking. -3. **AI agents** - * **Why Temporal is perfect**: Long-running autonomous execution with tool orchestration, planning and replanning, human-in-the-loop controls, durable memory management, and safety guardrails. - -### Operational - -1. **Infrastructure management** - * **Why Temporal is perfect**: Multi-step provisioning with automatic rollback on failure, idempotent cloud operations, change management, and complete auditability. -2. 
**CI/CD** - * **Why Temporal is perfect**: Complex pipeline stages with environment promotion gates, parallel test execution, conditional deployment strategies, automatic rollback monitoring, and approval gates. - -## Bad use cases for Temporal - -:::note -Add anti-patterns specific to your organization's domain and technology stack. -::: - -1. **Simple Request-Response APIs** - * No failure recovery needed - * Better alternative: REST / gRPC server -2. **Real-time stream processing** - * High throughput (>1M events/sec) - * Ultra-low latency requirements (<100ms) - * No durable state needed - * Better alternative: Flink, Amazon Kinesis, Google Cloud Dataflow -3. **Database triggers & stored procedures** - * Logic tightly coupled to database - * Needs transactional guarantees within single DB - * No external service calls - * Better alternative: database native features -4. **Pure Compute Workloads** - * CPU/GPU intensive calculations - * No I/O or service calls - * No state management needed - * Better alternative: AWS Lambda, Spark, Ray - -## Next steps - -:::note -Add relevant links (i.e. support channel) for your developers to explore next. -::: - -To learn more: - -* [Run your first Temporal Workflow in under 30 minutes](./getting-started.md) -* Schedule a discovery session with the Temporal platform team to validate your use case -* [See how other teams are using Temporal today](./temporal-overview.md#temporal-use-cases-at-abc-financial) diff --git a/docs/best-practices/knowledge-hub/faqs.md b/docs/best-practices/knowledge-hub/faqs.md deleted file mode 100644 index fd82d8e2fc..0000000000 --- a/docs/best-practices/knowledge-hub/faqs.md +++ /dev/null @@ -1,26 +0,0 @@ ---- -id: faqs -title: Frequently Asked Questions -sidebar_label: FAQs -description: Common questions and answers about using Temporal at your organization. 
-toc_max_heading_level: 3 -keywords: - - temporal faqs - - temporal questions - - temporal help -tags: - - Best Practices - - Knowledge Hub ---- - -:::info -This page is part of the [Temporal Knowledge Hub](./index.md). -::: - -:::note -Capture common questions from developers to reduce repeated support requests. -::: - -## When should I use Temporal? - -There are many reasons why you should use Temporal. Use the [Temporal Decision Framework](./decision-framework.md) to help you decide. diff --git a/docs/best-practices/knowledge-hub/getting-started.md b/docs/best-practices/knowledge-hub/getting-started.md deleted file mode 100644 index 26de03631e..0000000000 --- a/docs/best-practices/knowledge-hub/getting-started.md +++ /dev/null @@ -1,189 +0,0 @@ ---- -id: getting-started -title: Getting Started with Temporal -sidebar_label: Getting Started -description: A self-service tutorial to set up your Temporal development environment and run your first Workflow. -toc_max_heading_level: 3 -keywords: - - temporal getting started - - temporal tutorial - - temporal development environment - - first temporal workflow -tags: - - Best Practices - - Knowledge Hub ---- - -:::info -This page is part of the [Temporal Knowledge Hub](./index.md). -::: - -:::note -Update learning objectives to match your organization's onboarding goals. -::: - -In 30 minutes, you will: - -* Set up a complete Temporal development environment. -* Write and run your first Temporal Workflow locally. -* Run your Temporal Workflow in our dev environment. - -By the end, you'll have: - -* A functional "Hello World" Workflow. -* Access to our internal Temporal Cloud namespaces. 
- -## Prerequisites - -* One of the following supported programming languages: - * Python 3.12+ - * Java 17+ -* [Temporal CLI](https://docs.temporal.io/cli#install) -* [Docker Desktop](https://docs.docker.com/desktop/setup/install/mac-install/) -* [Visual Studio Code](https://code.visualstudio.com/download) - * Install these extensions: [Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) - -## Development environment setup - -:::note -Replace with your organization's starter template and tooling. -::: - -You have two options for setting up your local environment. We strongly recommend using [Dev Container](https://containers.dev/) because it is 1) faster to set up and 2) maintained by the Temporal Platform team. - -### Option A: Dev Container (Recommended) - -1. Clone the [starter template](https://github.com/kawofong/temporal-python-template/tree/main) - -```shell -git clone git@github.com:kawofong/temporal-python-template.git -code temporal-python-template -``` - -2. Reopen VS Code in Dev Container. - -``` -1. In VS Code, open Command Palette (Cmd/Ctrl + Shift + P). -2. Select "Dev Containers: Reopen in Container". -3. Wait 2-3 minutes for image pull and setup. -4. After the Dev Container is running, open your browser and verify that you can access Temporal UI via http://localhost:8233. -``` - -3. Verify development environment. - -```shell -# 1. Run all unit tests; all tests shall succeed. -uv run poe test - -# 2. Run pre-commit on all files; all pre-commit validations shall succeed. -uv run poe pre-commit-run -``` - -**What's included in the dev container:** - -* Local Temporal development server -* Pre-configured git hooks and linters -* Debugging tools and extensions - -### Option B: From Scratch - -1. 
Clone the [starter-template](https://github.com/kawofong/temporal-python-template/tree/main) - -```shell -git clone git@github.com:kawofong/temporal-python-template.git -code temporal-python-template -``` - -2. Install dependency locally. - -```shell -# Requires `uv` to be installed in local machine. - -# 1. Install all uv dependencies. -uv sync --dev. - -# 2. Install pre-commit hooks. -uv run poe pre-commit-install -``` - -3. Verify development environment. - -```shell -# 1. Run all unit tests; all tests shall succeed. -uv run poe test - -# 2. Run pre-commit on all files; all pre-commit validations shall succeed. -uv run poe pre-commit-run - -# 3. Run Temporal dev server and verify UI is up via http://localhost:8233. -temporal server start-dev -``` - -## Run your first Workflow locally - -:::note -Update commands to match your starter template's Workflow examples. -::: - -Once your development environment is configured, you are ready to run your first Temporal Workflow locally. - -1. Run a Temporal Worker from the starter-template. - -```shell -uv run -m src.workflows.crawler.worker -``` - -2. Start a crawler Workflow Execution. - -```shell -uv run -m src.workflows.crawler.crawler_workflow -``` - -3. Wait for ~1 minute for the Workflow Execution to complete. - * You can verify completion of the Workflow Execution by: - * Observing the Workflow Execution output in your terminal or - * Navigating to the Temporal UI - -## Run your first Workflow on Temporal Cloud - -:::note -Link to your internal process for Temporal Cloud access and Namespace provisioning. -::: - -To run the same Workflow on Temporal Cloud, take the following steps: - -* Request Temporal Cloud access via an internal service ticket. -* Request a Temporal Cloud Namespace via an internal service ticket. - -Once your user account and Namespace are ready, follow these steps to run your Workflow on Temporal Cloud: - -1. Log in to Temporal Cloud. -2. 
Access your Temporal Cloud Namespace via the Temporal Cloud UI. -3. Generate an [API key via Temporal Cloud UI](https://docs.temporal.io/cloud/api-keys#generate-api-keys-with-the-temporal-cloud-ui). -4. Replace the Temporal Client code in [src/workflows/crawler/worker.py](https://github.com/kawofong/temporal-python-template/blob/main/src/workflows/crawler/worker.py#L21) and [src/workflows/crawler/crawler_workflow.py](https://github.com/kawofong/temporal-python-template/blob/main/src/workflows/crawler/crawler_workflow.py#L101). - -```python -client = await Client.connect( - "..tmprl.cloud:7233", - namespace=".", - api_key="your-api-key", - tls=True, # Required for Temporal Cloud -) -``` - -5. Run the Temporal Worker from the starter-template. - -```shell -uv run -m src.workflows.crawler.worker -``` - -6. Start the crawler Workflow Execution. - -```shell -uv run -m src.workflows.crawler.crawler_workflow -``` - -7. Wait for ~1 minute for the Workflow Execution to complete. - * You can verify completion of the Workflow Execution by: - * Observing the Workflow Execution output in your terminal or - * Navigating to the Temporal Cloud UI diff --git a/docs/best-practices/knowledge-hub/index.md b/docs/best-practices/knowledge-hub/index.md deleted file mode 100644 index 8fdee03e9c..0000000000 --- a/docs/best-practices/knowledge-hub/index.md +++ /dev/null @@ -1,53 +0,0 @@ ---- -id: index -title: Temporal Knowledge Hub -sidebar_label: Knowledge Hub -description: A foundational template for organizations to create an internal knowledge base about the Temporal Platform, designed for customization by internal Temporal Platform teams. -toc_max_heading_level: 3 -keywords: - - temporal knowledge hub - - developer onboarding - - temporal best practices - - internal documentation -tags: - - Best Practices - - Knowledge Hub ---- - -The Temporal Knowledge Hub is a foundational template for organizations to create an internal knowledge base about the Temporal Platform. 
-It is designed for customization by internal Temporal Platform teams to facilitate structured developer onboarding and continuous education. - -For illustration, the content currently uses a hypothetical organization, "ABC Financial." -Users must follow the provided instructions (see below for an example) to customize the content for their specific organization and operational needs. - -:::note -On each page, instructions will be shown in note banners, like this one. -::: - -## Target audience - -The primary audience is the **Temporal Platform teams** within organizations. -These teams are responsible for owning and maintaining Temporal knowledge base for their engineering teams. - -The secondary audience is the engineering teams who use or need to learn the Temporal Platform. -They will consume the technical knowledge managed by the Temporal Platform teams. - -## Goals - -- Establish a centralized, consistently maintained repository of Temporal knowledge for internal developers. -- Streamline onboarding and support continuous professional development for engineering teams on the Temporal Platform. -- Reduce the Temporal Platform team's support load by providing comprehensive self-service documentation and established best practices. - -## Table of contents - -- [Temporal Overview](./temporal-overview.md) - Learn what Temporal is, why users love it, and how it delivers business value. -- [Decision Framework](./decision-framework.md) - Determine whether Temporal is the right solution for your use case. -- [Getting Started](./getting-started.md) - Set up your development environment and run your first Workflow. -- [Learning Paths](./learning-path.md) - Structured learning from foundational concepts to advanced patterns. -- [Architecture](./architecture.md) - Enterprise Temporal architecture covering Namespace conventions and Worker deployment. -- [Cost](./cost.md) - Understanding Temporal Cloud's consumption-based pricing model. 
-- [Shared Responsibility](./shared-responsibility.md) - Defining team responsibilities for building and managing Temporal applications. -- [Patterns](./patterns.md) - Common Temporal Workflow design patterns with code samples. -- [Troubleshooting](./troubleshooting.md) - How to observe and troubleshoot Temporal Workflows and Workers. -- [Support](./support.md) - Temporal Cloud support model and expert-led sessions. -- [FAQs](./faqs.md) - Common questions and answers about using Temporal. diff --git a/docs/best-practices/knowledge-hub/learning-path.md b/docs/best-practices/knowledge-hub/learning-path.md deleted file mode 100644 index 35a455c222..0000000000 --- a/docs/best-practices/knowledge-hub/learning-path.md +++ /dev/null @@ -1,76 +0,0 @@ ---- -id: learning-path -title: Learning Paths -sidebar_label: Learning Paths -description: Structured learning paths from foundational concepts to advanced patterns, tailored for Software Developers, AI Developers, and Platform Engineers. -toc_max_heading_level: 3 -keywords: - - temporal learning path - - temporal training - - temporal courses - - developer onboarding -tags: - - Best Practices - - Knowledge Hub ---- - -:::info -This page is part of the [Temporal Knowledge Hub](./index.md). -::: - -:::note -Customize learning paths for your developers to learn Temporal based on their skills and personas. -::: - -This guide provides a structured learning path from foundational concepts to advanced patterns, tailored specifically for Software Developers, AI Developers, and Platform Engineers. - -Temporal offers free, self-paced training courses that provide a solid grounding in the platform. Developers can sign up for free for these courses using their work emails at [learn.temporal.io](http://learn.temporal.io). - -## Foundation - -1. [Temporal 101: Introducing the Temporal Platform](https://learn.temporal.io/courses/temporal_101/) - 1. 
Learn the fundamentals of Temporal, including Workflows, Activities, and the core value proposition of Durable Execution. -2. [Temporal 102: Exploring Durable Execution](https://learn.temporal.io/courses/temporal_102/) - 1. You will acquire skills necessary to use Temporal throughout the development lifecycle by learning how to test, debug, and deploy applications. - -## Intermediate - -1. [Securing Application Data](https://learn.temporal.io/courses/appdatasec/) - 1. Provides general guidance and example applications for addressing user management, encryption standards, and key rotation. -2. [Interacting with Workflows](https://learn.temporal.io/courses/interacting_with_workflows/) - 1. Learn how to interact with Workflows using Signal, Update, and Query. -3. [Crafting an Error Handling Strategy](https://learn.temporal.io/courses/errstrat/) - 1. You will explore the nature of different types of failures and investigate the support that Temporal provides for addressing them. - -## Advanced - -The Advanced learning paths are tailored to 3 distinct user personas: - -1. [Platform Engineers](#platform-engineer) -2. [Software Developers](#software-developers) -3. [AI Developers](#ai-developers) - -### Platform Engineer {#platform-engineer} - -1. [Introduction to Temporal Cloud](https://learn.temporal.io/courses/intro_to_temporal_cloud/) - 1. Learn the role of Temporal Cloud, how to log into and navigate its Web UI, and how to perform tasks that new Temporal Cloud users may do in preparation for using this service. -2. [Best practices | Temporal Platform Documentation](https://docs.temporal.io/best-practices) - 1. Learn the foundational principles and best practices for using Temporal Cloud. - -### Software Developers {#software-developers} - -1. [Versioning Workflows](https://learn.temporal.io/courses/versioning/) - 1. In this course, you will learn how to safely evolve your Temporal application code in production. -2. 
[Worker Versioning](https://learn.temporal.io/courses/worker_versioning/) - 1. You will learn the benefits of Worker Versioning and evaluate tradeoffs of various versioning approaches. - -### AI Developers {#ai-developers} - -1. [Building Durable AI Applications with Temporal](https://learn.temporal.io/tutorials/ai/building-durable-ai-applications/) - 1. Learn how to build reliable AI applications using Temporal to orchestrate LLM calls, handle retries, and manage complex AI workflows that can recover from failures. -2. [Building Durable MCP Tool with Temporal](https://learn.temporal.io/tutorials/ai/building-mcp-tools-with-temporal/) - 1. Learn how to build long-running Model Context Protocol (MCP) tools using Temporal. - -## What's next - -* Check whether [Temporal is the right technology for your use case](./decision-framework.md). diff --git a/docs/best-practices/knowledge-hub/patterns.md b/docs/best-practices/knowledge-hub/patterns.md deleted file mode 100644 index efed2cb80d..0000000000 --- a/docs/best-practices/knowledge-hub/patterns.md +++ /dev/null @@ -1,147 +0,0 @@ ---- -id: patterns -title: Temporal Patterns -sidebar_label: Patterns -description: Common Temporal Workflow design patterns with code samples for Python and Java. -toc_max_heading_level: 3 -keywords: - - temporal patterns - - temporal design patterns - - temporal code samples - - temporal best practices -tags: - - Best Practices - - Knowledge Hub ---- - -:::info -This page is part of the [Temporal Knowledge Hub](./index.md). -::: - -:::note -Curate Temporal Workflow patterns relevant to your use cases so developers can quickly find solutions. -::: - -## Parallel Activity - -| | | -| :---- | :---- | -| **What it does** | Execute multiple Activities concurrently. | -| **Why use it** | Improve Workflow performance when Activities are independent and don't need sequential execution. 
| -| **Code samples** | [Python](https://github.com/temporalio/samples-python/blob/main/hello/hello_parallel_activity.py), [Java](https://github.com/temporalio/samples-java/blob/main/core/src/main/java/io/temporal/samples/hello/HelloParallelActivity.java) | - -## Custom Search Attributes - -| | | -| :---- | :---- | -| **What it does** | Adds custom key-value metadata to Workflow executions. | -| **Why use it** | Enables advanced filtering, sorting, and visibility of Workflows in the Web UI and CLI based on business-specific data. | -| **Code samples** | [Python](https://github.com/temporalio/samples-python/blob/main/hello/hello_search_attributes.py), [Java](https://github.com/temporalio/samples-java/blob/main/core/src/main/java/io/temporal/samples/hello/HelloSearchAttributes.java) | - -## Child Workflow - -| | | -| :---- | :---- | -| **What it does** | Spawns a new Workflow execution from within a parent Workflow. | -| **Why use it** | Partition work into smaller chunks, encapsulate Activities into observable components, and model business entities with different lifecycles. | -| **Code samples** | [Python](https://github.com/temporalio/samples-python/blob/main/hello/hello_child_workflow.py), [Java](https://github.com/temporalio/samples-java/tree/main/core/src/main/java/io/temporal/samples/asyncchild) | - -## Continue as new - -| | | -| :---- | :---- | -| **What it does** | Atomically completes the current Workflow execution and starts a new one with the same Workflow ID. | -| **Why use it** | Prevents "Event History Limit Exceeded" errors and other [Workflow Execution limits](https://docs.temporal.io/cloud/limits#workflow-execution-event-history-limits) by clearing the history. 
| -| **Code samples** | [Python](https://github.com/temporalio/samples-python/blob/main/hello/hello_continue_as_new.py), [Java](https://github.com/temporalio/samples-java/blob/main/core/src/main/java/io/temporal/samples/hello/HelloPeriodic.java) | - -## Exception handling - -| | | -| :---- | :---- | -| **What it does** | Implements logic to catch and respond to Activity or Workflow failures. | -| **Why use it** | Ensures system resilience by defining fallback logic, compensation transactions, or specific retry policies when errors occur. | -| **Code samples** | [Python](https://github.com/temporalio/samples-python/blob/main/hello/hello_exception.py), [Java](https://github.com/temporalio/samples-java/blob/main/core/src/main/java/io/temporal/samples/hello/HelloException.java) | - -## Cancellation - -| | | -| :---- | :---- | -| **What it does** | Sends a request to gracefully terminate a running Workflow or specific scope. | -| **Why use it** | Stops unnecessary processing and cleans up resources when a result is no longer needed or a user explicitly stops the process. | -| **Code samples** | [Python](https://github.com/temporalio/samples-python/blob/main/hello/hello_cancellation.py), [Java](https://github.com/temporalio/samples-java/blob/main/core/src/main/java/io/temporal/samples/hello/HelloCancellationScope.java) | - -## Async Activity completion - -| | | -| :---- | :---- | -| **What it does** | Enables the Activity Function to return without the Activity Execution completing. | -| **Why use it** | Essential for long-running external processes that can heartbeat and inform Temporal of its completion. 
| -| **Code samples** | [Python](https://github.com/temporalio/samples-python/blob/main/hello/hello_async_activity_completion.py), [Java](https://github.com/temporalio/samples-java/blob/main/core/src/main/java/io/temporal/samples/hello/HelloAsyncActivityCompletion.java) | - -## Local Activity - -| | | -| :---- | :---- | -| **What it does** | Executes short-lived Activity logic within the same process as the Workflow Worker. | -| **Why use it** | Reduces latency and history size for short, high-throughput operations that do not require global durability guarantees. | -| **Code samples** | [Python](https://github.com/temporalio/samples-python/blob/main/hello/hello_local_activity.py), [Java](https://github.com/temporalio/samples-java/blob/main/core/src/main/java/io/temporal/samples/hello/HelloLocalActivity.java) | - -## Batch Processing (Sliding Window) - -| | | -| :---- | :---- | -| **What it does** | Processes a large stream of items in controlled, concurrent chunks. | -| **Why use it** | Manages concurrency and throughput limits while efficiently processing high volumes of data without overwhelming downstream services. | -| **Code samples** | [Python](https://github.com/temporalio/samples-python/tree/main/batch_sliding_window), [Java](https://github.com/temporalio/samples-java/tree/main/core/src/main/java/io/temporal/samples/batch/slidingwindow) | - -## Custom Metrics - -| | | -| :---- | :---- | -| **What it does** | Emits application-specific telemetry (counters, gauges, timers) from Workflows and Activities. | -| **Why use it** | Provides observability into business-level KPIs and specific Workflow performance characteristics beyond default system metrics. 
| -| **Code samples** | [Python](https://github.com/temporalio/samples-python/tree/main/custom_metric), [Java](https://github.com/temporalio/samples-java/tree/main/core/src/main/java/io/temporal/samples/metrics) | - -## Encryption - -| | | -| :---- | :---- | -| **What it does** | Encrypts Workflow and Activity payloads client-side using a custom Data Converter. | -| **Why use it** | Ensures sensitive data remains secure and opaque to the Temporal Server, satisfying strict compliance and privacy requirements. | -| **Code samples** | [Python](https://github.com/temporalio/samples-python/tree/main/encryption), [Java](https://github.com/temporalio/samples-java/tree/main/core/src/main/java/io/temporal/samples/encryptedpayloads) | - -## Polling - -| | | -| :---- | :---- | -| **What it does** | Periodically checks the state of an external system from within an Activity. | -| **Why use it** | Provides reliable integration with external APIs or systems that do not provide webhooks or asynchronous event notifications. | -| **Code samples** | [Python](https://github.com/temporalio/samples-python/tree/main/polling), [Java](https://github.com/temporalio/samples-java/tree/main/core/src/main/java/io/temporal/samples/polling) | - -## Worker routing - -| | | -| :---- | :---- | -| **What it does** | Dynamically routes Activities to specific Task Queues monitored by designated Workers. | -| **Why use it** | Targets tasks to specific hosts or environments; required for file-system affinity, local caching strategies, or hardware-specific (e.g., GPU) operations. | -| **Code samples** | [Python](https://github.com/temporalio/samples-python/tree/main/worker_specific_task_queues), [Java](https://github.com/temporalio/samples-java/tree/main/core/src/main/java/io/temporal/samples/fileprocessing) | - -## Saga - -| | | -| :---- | :---- | -| **What it does** | Manages long-running, distributed transactions by executing a sequence of steps. 
If a step fails, it triggers "compensating actions" (undo operations) in reverse order to revert the changes made by previous steps. | -| **Why use it** | Ensures data consistency across microservices (e.g., booking a flight, hotel, and car) without locking resources for long periods. It handles partial failures gracefully by rolling back the system to a known consistent state. | -| **Code samples** | [Java](https://github.com/temporalio/samples-java/blob/main/core/src/main/java/io/temporal/samples/hello/HelloSaga.java) | - -## Early Return - -| | | -| :---- | :---- | -| **What it does** | Uses "Update with Start" to begin a Workflow execution and synchronously return a result to the client (e.g., validation success) while continuing to process longer-running tasks (e.g., database updates, external API calls) in the background. | -| **Why use it** | Drastically reduces end-user latency in interactive applications. Users receive immediate feedback (like an "Order Received" confirmation) without waiting for the entire process to complete. | -| **Code samples** | [Java](https://github.com/temporalio/samples-java/tree/main/core/src/main/java/io/temporal/samples/earlyreturn) | - -## Example Temporal Applications - -See [Temporal Code Exchange](https://temporal.io/code-exchange) for example Temporal applications. diff --git a/docs/best-practices/knowledge-hub/shared-responsibility.md b/docs/best-practices/knowledge-hub/shared-responsibility.md deleted file mode 100644 index 00aac33237..0000000000 --- a/docs/best-practices/knowledge-hub/shared-responsibility.md +++ /dev/null @@ -1,139 +0,0 @@ ---- -id: shared-responsibility -title: Shared Responsibility Model -sidebar_label: Shared Responsibility -description: Defining team responsibilities for building and managing Temporal applications between Platform and Application teams. 
-toc_max_heading_level: 3 -keywords: - - temporal shared responsibility - - temporal platform team - - temporal application team - - temporal governance -tags: - - Best Practices - - Knowledge Hub ---- - -:::info -This page is part of the [Temporal Knowledge Hub](./index.md). -::: - -:::note -Tailor this matrix to clarify ownership boundaries so developers know who to contact. -::: - -At ABC Financial, the ownership of Temporal applications is shared between the **Temporal Platform Team** (who manages Temporal Cloud infrastructure) and **Application Teams** (who build and run Temporal Workflows). - -*Key: ✅= responsible, ❌= not responsible, 🤝🏼= shared responsibility* - -### Identity Access Management (IAM) - -| Responsibility | Platform Team | Application Team | -| :---- | ----- | ----- | -| Temporal Cloud access ([go/temporal-request](http://go/temporal-request)) | ✅ | ❌ | -| [SAML](https://docs.temporal.io/cloud/saml) and [SCIM](https://docs.temporal.io/cloud/scim) configurations | ✅ | ❌ | -| Temporal Cloud [user groups](https://docs.temporal.io/cloud/user-groups) | ✅ | ❌ | -| User principal provisioning and de-provisioning | ✅ | ❌ | -| [User principal role](https://docs.temporal.io/cloud/users) assignment | ✅ | ❌ | -| [API key](https://docs.temporal.io/cloud/api-keys) provisioning | ✅ | ❌ | - -### Network Connectivity - -| Responsibility | Platform Team | Application Team | -| :---- | ----- | ----- | -| [Private Connectivity](https://docs.temporal.io/cloud/connectivity) to Temporal Cloud | ✅ | ❌ | -| Firewall rules to Temporal Cloud | ✅ | ❌ | - -### Data Security - -| Responsibility | Platform Team | Application Team | -| :---- | ----- | ----- | -| Data compliance policy | ✅ | ❌ | -| [Data Converter](https://docs.temporal.io/evaluate/development-production-features/data-encryption) implementation | ✅ | ❌ | -| [Data Converter](https://docs.temporal.io/evaluate/development-production-features/data-encryption) usage | ❌ | ✅ | -| [Codec 
Server](https://docs.temporal.io/production-deployment/data-encryption) hosting | ✅ | ❌ | -| [Codec Server](https://docs.temporal.io/production-deployment/data-encryption) configuration (per Namespace) | ❌ | ✅ | - -### Infrastructure - -| Responsibility | Platform Team | Application Team | -| :---- | ----- | ----- | -| Temporal Cloud Namespace provisioning ([go/temporal-namespace](http://go/temporal-namespace)) | ✅ | ❌ | -| [Temporal Cloud metrics](https://docs.temporal.io/production-deployment/cloud/metrics/reference) | ✅ | ❌ | -| Temporal Cloud [Namespace rate limits](https://docs.temporal.io/cloud/limits#namespace-level) | ❌ | ✅ | -| Temporal Cloud [Namespace Capacity](https://docs.temporal.io/cloud/capacity-modes) | ❌ | ✅ | -| [Temporal Cloud audit logs](https://docs.temporal.io/cloud/audit-logs) | ✅ | ❌ | - -### Governance - -| Responsibility | Platform Team | Application Team | -| :---- | ----- | ----- | -| Temporal Platform Hub | ✅ | ❌ | -| [Temporal developer guide](#) | ✅ | ❌ | - -### Development - -| Responsibility | Platform Team | Application Team | -| :---- | ----- | ----- | -| Workflow development | ❌ | ✅ | -| Automated tests (i.e. unit, integration, [replay](https://docs.temporal.io/develop/java/testing-suite#replay)) | ❌ | ✅ | -| Workflow versioning | ❌ | ✅ | - -### Worker - -| Responsibility | Platform Team | Application Team | -| :---- | ----- | ----- | -| Worker identity authentication policy | ✅ | ❌ | -| Worker identity auth implementation | ❌ | ✅ | -| Worker identity auth rotation | ✅ | ❌ | -| Worker infrastructure health (e.g. Kubernetes health) | ✅ | ❌ | -| Worker deployment health | ❌ | ✅ | -| Worker configurations (i.e. Task Queue, Execution Slots) | 🤝🏼 (defaults) | 🤝🏼 (customization) | -| Worker auto-scaling framework (i.e. 
KEDA) | ✅ | ❌ | -| Worker auto-scaling configuration | ❌ | ✅ | - -### Temporal Application Deployment - -| Responsibility | Platform Team | Application Team | -| :---- | ----- | ----- | -| Build pipeline for Worker | ✅ | ❌ | -| Artifact management | ✅ | ❌ | -| Workflow versioning management (e.g. [Worker Versioning](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning)) policy | ✅ | ❌ | -| Worker build (i.e. Workflow and Worker Definition) | ❌ | ✅ | -| Worker build release (i.e. control which build to release and when) | ✅ | ❌ | - -### Observability - -| Responsibility | Platform Team | Application Team | -| :---- | ----- | ----- | -| Observability platform (e.g. Datadog, Dynatrace) | ✅ | ❌ | -| [Temporal SDK metrics](https://docs.temporal.io/references/sdk-metrics) collection | ✅ | ❌ | -| [Temporal SDK metrics](https://docs.temporal.io/references/sdk-metrics) configuration | ❌ | ✅ | -| Temporal custom metrics emission | ❌ | ✅ | -| [Temporal Cloud metrics](https://docs.temporal.io/cloud/metrics/openmetrics) collection | ✅ | ❌ | -| Monitoring dashboard ([go/temporal-dashboard](http://go/temporal-dashboard)) | ✅ | ❌ | -| Temporal Cloud platform alerts | ✅ | ❌ | -| Temporal Workflow alerts | ❌ | ✅ | - -### Operation - -| Responsibility | Platform Team | Application Team | -| :---- | ----- | ----- | -| Support coordination with Temporal (the company) | ✅ | ❌ | -| Load testing | ❌ | ✅ | -| Incident response | 🤝🏼 (platform incident) | 🤝🏼 (application incident) | - -### Cost - -| Responsibility | Platform Team | Application Team | -| :---- | ----- | ----- | -| Temporal Cloud platform cost | ✅ | ❌ | -| Temporal Cloud Namespace cost | ❌ | ✅ | - -## Decision framework - -When in doubt, ask yourself: - -* **Does the issue affect multiple teams or namespaces?** → Platform Team -* **Is it business logic or application-specific?** → Application Team -* **Does it require Temporal Cloud `Admin` access?** → Platform Team diff --git 
a/docs/best-practices/knowledge-hub/support.md b/docs/best-practices/knowledge-hub/support.md deleted file mode 100644 index 4325522fc6..0000000000 --- a/docs/best-practices/knowledge-hub/support.md +++ /dev/null @@ -1,48 +0,0 @@ ---- -id: support -title: Get Help from the Temporal Team -sidebar_label: Support -description: Temporal Cloud support model, how to submit tickets, and expert-led sessions available through Enterprise support. -toc_max_heading_level: 3 -keywords: - - temporal support - - temporal cloud support - - temporal enterprise - - temporal expert sessions -tags: - - Best Practices - - Knowledge Hub ---- - -:::info -This page is part of the [Temporal Knowledge Hub](./index.md). -::: - -## Temporal Cloud support model - -:::note -Specify your support tier so developers understand the SLAs available to them. -::: - -At ABC Financial, we have **Enterprise** support for Temporal Cloud. With Enterprise support, Temporal offers the following response time targets for support tickets: - -| | P0 | P1 | P2 | P3 | -| :---- | ----- | ----- | ----- | ----- | -| Definition | **Production impacted** - Temporal Cloud service is unavailable or degraded with a significant impact. | **Production issue** - An issue related to production workloads running on the Temporal Cloud service, or a significant project, is blocked. | **General issues** - General Temporal Cloud service or other issues where there is no production impact or a workaround exists to mitigate the impact. | **General guidance** - Questions or an issue with the Temporal Cloud service that is not impacting system availability or functionality. | -| Response time target | 30 minutes (24×7) | 1 hour | 4 hours | 1 day | - -## How to submit a support ticket - -1. Go to [support.temporal.io](http://support.temporal.io). -2. If prompted, log in to Temporal Cloud using the same method you normally use (e.g., Google, Microsoft, email-password, or other methods). -3. 
You will be presented with a screen where you can view open and closed tickets for your Temporal account, as well as submit a new ticket. - -## Temporal account team - -:::note -List your Temporal contacts so developers know who to escalate to. Request a Calendly link from your Temporal team. -::: - -* Temporal Account Executive: Person -* Temporal Solution Architect: Person -* Temporal Dedicated Support Engineer: Person diff --git a/docs/best-practices/knowledge-hub/temporal-overview.md b/docs/best-practices/knowledge-hub/temporal-overview.md deleted file mode 100644 index f83b7ace07..0000000000 --- a/docs/best-practices/knowledge-hub/temporal-overview.md +++ /dev/null @@ -1,113 +0,0 @@ ---- -id: temporal-overview -title: Temporal Overview -sidebar_label: Temporal Overview -description: Learn what Temporal is, why users love it, and how it delivers business value across various industries. -toc_max_heading_level: 3 -keywords: - - temporal overview - - what is temporal - - durable execution - - temporal use cases -tags: - - Best Practices - - Knowledge Hub ---- - -:::info -This page is part of the [Temporal Knowledge Hub](./index.md). -::: - -## What is Temporal? - -:::note -Customize this introduction to describe Temporal that resonates with your developers. Highlight pain points Temporal solves for your developers. -::: - -Temporal provides a new way to build scalable, reliable applications. - -**Temporal** is an **open-source Durable Execution** platform that abstracts away the complexity of building distributed systems. -Durable Execution ensures that your application behaves correctly despite adverse conditions by guaranteeing that it will run to completion. -If a failure or a crash happens, your business processes keep running seamlessly without interruptions. - -With Temporal, engineering teams improve development velocity and deliver more reliable applications. 
- -Temporal is used for critical applications at enterprises like [Nvidia](https://temporal.io/blog/transforming-gpu-resource-management-with-temporal), [ANZ Bank](https://temporal.io/resources/case-studies/anz-story), [Netflix](https://temporal.io/resources/on-demand/netflix), [Snap](https://eng.snap.com/build_a_reliable_system_in_a_microservices_world_at_snap), [Yum! Brands](https://temporal.io/resources/on-demand/temporal-at-yum-brands), and AI leaders like [Replit](https://temporal.io/resources/case-studies/replit-uses-temporal-to-power-replit-agent-reliably-at-scale), [OpenAI](https://newsletter.pragmaticengineer.com/p/chatgpt-images). - -## Why users love Temporal - -:::note -Update this list to reflect why your organization chose Temporal. -::: - -1. **Durability**: your code never "forgets" where it is. If a server crashes or restarts, your function resumes exactly where it left off, ensuring no data or progress is ever lost. -2. **Easy-to-use code structure:** - * Choose between the Python and Java SDKs that best suit you and start writing your business logic. - * Integrate your favorite IDE, libraries, and tools into your development process. Temporal also supports polyglot and idiomatic programming - which enables developers to leverage the strengths of various programming languages and integrate Temporal into existing codebases. -3. **Simplicity:** You can achieve all of this without having to manage queues or complex state machines. Temporal does this all for you. -4. **Visibility:** Temporal provides a Web UI, SDK and Cloud metrics, and OpenTelemetry integration that gives developers unprecedented visibility into the current state of their applications. - -## Temporal business value - -:::note -Replace with metrics showing Temporal's impact at your organization. -::: - -At ABC Financial, Temporal serves as the development standard and platform for all asynchronous operations (e.g. payment, statement processing). 
-Since adopting Temporal, the company has saved millions of dollars. -The Temporal platform team continuously monitors the following business metrics to justify the adoption of Temporal: - -| Metric | Before Temporal | With Temporal | Result | -| ------ | --------------- | ------------- | ------ | -| **Service availability** | 99.7% (~2 hours of stalled transactions/month) | 99.99% (<5 minutes of stalled transactions/month) | $2.5M+ annual savings in operational costs | -| **On-call alert volume** | 28 actionable alerts/week | <3 alerts/week | ~90% reduction in on-call toil | -| **Feature time-to-market** | 9 months average (some projects take 12-18 months) | 3 months average | 66% faster product delivery | - -## Temporal use cases at ABC Financial - -:::note -Replace with Temporal use cases for your organization. -::: - -### FinTech/Financial Services - -1. **Payment processing** - Reliable payment orchestration with automatic retries and compensation logic (ex. [Block using Temporal](https://temporal.io/resources/on-demand/block-real-world-payments) for their checkout processes) -2. **Customer onboarding** - Leverage Temporal for multi-step customer verification and account setup processes (ex. [Mollie](https://temporal.io/resources/case-studies/mollie-payments-maximizes-operational-efficiency) for their customer onboarding processes) -3. **Cryptocurrency operations** - Orchestrate blockchain payments and crypto transactions (ex. [Coinbase](https://temporal.io/resources/case-studies/coinbase) uses Temporal for reliable crypto transactions) -4. **Operational workflows** - Various operational processes requiring high reliability - -### Banking - -1. **Loan origination** - Long-running approval processes with complex decision trees and human approvals (ex. [ANZ accelerates home loan origination](https://temporal.io/resources/case-studies/anz-story) with Temporal) -2. **Payment processing** - Core banking payment systems with high reliability requirements (ex. 
[JPMC uses Temporal](https://temporal.io/resources/on-demand/payments-modernization-jpmc) to handle complex transactions across multiple systems) -3. **Digital banking modernization** - Replacing legacy mainframe systems with cloud-native workflows (ex. [Will Bank](https://temporal.io/resources/on-demand/how-will-bank-leverages-temporal-to-handle-2-million-customers) modernized boleto processing and scaled to millions with Temporal) - -### Tech/Software - -1. **Data pipelines** - Orchestrate complex data processing workflows with reliability guarantees (ex. [Netflix](https://temporal.io/resources/on-demand/netflix) powers critical data pipelines on Temporal) -2. **Microservices deployment** - Coordinate deployment processes across distributed systems (ex. [Box](https://temporal.io/resources/case-studies/box) uses Temporal as a central "brain" for content operations) -3. **Workflow orchestration** - General workflow orchestration, improving development efficiency (ex. [AutoKitteh](https://temporal.io/resources/case-studies/autokitteh) increased reliability and reduced development effort with Temporal) -4. **Cloud migration** - Leverage Temporal for orchestrating complex cloud migration processes (ex. [SAP Concur](https://temporal.io/resources/case-studies/sap-concur) orchestrated a phased migration with Temporal) -5. **Infrastructure management** - Coordinate distributed operations and transactional changes reliably (ex. [DigitalOcean](https://temporal.io/resources/case-studies/digitalocean) reduced resources and developer backlog with Temporal) - -### AI - -1. **Long-running AI agents** - Durable execution for sophisticated agents requiring human-in-the-loop interactions (ex. [Replit uses Temporal](https://temporal.io/resources/case-studies/replit-uses-temporal-to-power-replit-agent-reliably-at-scale) to power Replit Agent reliably at scale) -2. **AI orchestration** - Coordinating multi-agent systems and LLM calls with fallback strategies (ex. 
[Dubber](https://temporal.io/resources/case-studies/dubber) runs conversational AI pipelines on Temporal) -3. **Data orchestration** - Managing complex AI/ML pipelines and model training workflows (ex. [Descript](https://temporal.io/resources/case-studies/descript) orchestrates applied-AI pipelines with Temporal) - -### Healthcare - -1. **Clinical assessments and diagnostics orchestration** - Orchestrate multi-step clinical assessments and diagnostic pipelines (ex. [Linus Health](https://temporal.io/resources/on-demand/transitioning-durable-workflows-cognitive-healthcare) uses Temporal to orchestrate cognitive assessments and analytics end-to-end) -2. **AI/ML inference and data processing in healthcare contexts** - Long-running AI/ML workflows for preprocessing, model inference, post-processing, and results delivery (ex. [Zebra Medical Vision](https://temporal.io/resources/case-studies/zebra-medical-vision)'s applied-AI diagnostics pipeline relies on Temporal for reliability and visibility) -3. **Medical imaging and bioinformatics pipelines** - Reliable, scalable orchestration for compute-heavy imaging workflows, transcription/feature extraction, and downstream analysis (ex. [Jackson Laboratory](https://temporal.io/resources/on-demand/imaging-workflows-temporal-cure-cancer) uses Temporal for imaging workflows and biological data science pipelines) - -### Retail - -1. **Order management and bookings** - Managing complex order fulfillment processes from payment to delivery (ex. [Yum! Brands](https://temporal.io/resources/on-demand/temporal-at-yum-brands) processes the majority of digital orders as Temporal Workflows) -2. **Orchestrating distributed transactions** - Coordinating multi-step e-commerce workflows (ex. [Vinted](https://temporal.io/resources/case-studies/vinted-10-12-million-worflows-daily-dev-velocity-low-cost) runs payment workflows at massive scale on Temporal) - -### Travel/Logistics - -1. 
**Logistics orchestration** - Managing complex shipping and delivery workflows (ex. [Maersk](https://temporal.io/resources/case-studies/maersk) built a "time machine" for logistics with Temporal to speed feature delivery) -2. **Booking management** - Long-running reservation and travel coordination processes (ex. [Turo](https://temporal.io/resources/on-demand/temporal-adoption-and-integration-at-turo) describes Temporal adoption and integration for durable, user-facing flows) diff --git a/docs/best-practices/knowledge-hub/troubleshooting.md b/docs/best-practices/knowledge-hub/troubleshooting.md deleted file mode 100644 index b5113ddafd..0000000000 --- a/docs/best-practices/knowledge-hub/troubleshooting.md +++ /dev/null @@ -1,146 +0,0 @@ ---- -id: troubleshooting -title: Troubleshooting -sidebar_label: Troubleshooting -description: How to observe and troubleshoot Temporal Workflows and Workers across environments. -toc_max_heading_level: 3 -keywords: - - temporal troubleshooting - - temporal debugging - - temporal observability - - temporal alerts -tags: - - Best Practices - - Knowledge Hub ---- - -:::info -This page is part of the [Temporal Knowledge Hub](./index.md). -::: - -:::note -Define the escalation path so developers know how to get help when issues arise. -::: - -This article documents how to observe and troubleshoot Temporal Workflows and Workers across environments (i.e. `dev`, `prd`). - -## Detection - -The first step to troubleshooting is collecting Temporal Workflow telemetry and understanding the issue. - -:::note -Link to your monitoring dashboard so developers can self-diagnose Workflow issues. -::: - -At ABC Financial, the following observability tools are supported for Temporal Cloud: - -| Tool | Purpose | What it answers | -| :---- | :---- | :---- | -| [Temporal Cloud UI](https://cloud.temporal.io/) | Source of truth for Temporal Workflow Event History, status, and traces. 
| *What happened to the Workflow?* *What is the current Workflow status?* | -| Grafana | Provides a single-pane-of-glass monitoring for logs, metrics, and traces across ABC Financial applications. | *Are the Workers healthy and sufficiently scaled?* *What happened to the upstream and downstream services?* | - -### Gather context - -Before troubleshooting, collect this information: - -* **Namespace:** Which Temporal Cloud namespace? -* **Workflow ID:** Specific Workflow instance(s) affected -* **Time window**: When did the issue start? Is it ongoing or intermittent? -* **Recent changes**: Any recent deployments or configuration updates? -* **Impact Scope**: Single Workflow, specific Workflow Type, or entire Namespace? - -### Quick health checks - -Perform these checks before detailed investigation: - -1. **Is Temporal Cloud healthy?** - 1. Check [status.temporal.io](https://status.temporal.io). -2. **Are Workers healthy?** - 1. Grafana → Infrastructure → Filter by `service:temporal` -3. **Are there recent deployments?** - 1. Check Slack channel. - -## Respond - -:::note -Add runbooks for common issues so developers can resolve problems independently. -::: - -### Common issues and troubleshooting steps - -#### 1. Workflow not starting - -**Symptoms**: Workflow appears in Temporal Cloud UI as `Running`, but the Workflow is not executing. - -**Troubleshooting**: - -1. **Check Worker Registration** - * Datadog → Logs → Filter: `service:temporal "Registered workflow"` - * Verify your Workflow Type appears in Worker startup logs -2. **Verify Task Queue** - * Temporal UI → Search for Workflows on your Task Queue - * Confirm Task Queue name matches exactly (case-sensitive) between Temporal Client and Worker -3. **Check Client Connection** - * Datadog → Filter by your application service name - * Search for: `"Temporal"` AND `"connection"` OR `"authentication"` - * Look for API key or connection errors - -**Fix**: - -* Redeploy Worker if Workflow not registered. 
-* Correct Task Queue name mismatch in code. -* Contact Temporal Platform team for API key issues. - -## Escalation - -:::note -Define escalation procedures and contact information for the platform team. -::: - -Escalate to the Temporal platform team when the issue persists after following the troubleshooting steps above. - -Include the following information in your request: - -``` -1. Temporal Cloud Namespace -2. Workflow ID(s) and time window -3. Description of the issue -4. Context collected (from the Detection section) -5. Troubleshooting steps already attempted -6. Other helpful information (e.g. screenshots) -``` - -### Response time SLA - -:::note -Set response time expectations so developers know when to expect help. -::: - -* P1 (Production outage): 30 minutes -* P2 (Degraded performance): 4 hours -* P3 (Non-urgent issues): 1 business day - -## Alerts - -It is the application team's responsibility to detect Temporal issues. Hence, it is recommended that you create appropriate alerts to proactively catch issues early. - -:::note -Add alert examples that developers can copy for their Workflows. -::: - -Here are some example alerts: - -| Alert name | Metric | Condition | Channel | -| :---- | :---- | :---- | :---- | -| High Workflow failure rate | `temporal.workflow.failed` | > 10% failure rate over 10 minutes | Page | -| High Activity Schedule-to-Start latency | `temporal.activity.schedule_to_start_latency` (p95) | > 30 seconds for 15 minutes | Slack | -| High Worker CPU utilization | `kubernetes.cpu.usage.pct` | > 80% for 10 minutes | Slack | - -## Need help? - -:::note -Specify the Slack channel or support portal for developers to reach the platform team. -::: - -* Learn [how the Temporal platform can support you](./support.md). -* Reach out to the Temporal platform team via Slack. 
diff --git a/sidebars.js b/sidebars.js index db65883220..780ab6d75b 100644 --- a/sidebars.js +++ b/sidebars.js @@ -648,28 +648,7 @@ module.exports = { 'best-practices/cloud-access-control', 'best-practices/security-controls', 'best-practices/worker', - { - type: 'category', - label: 'Knowledge Hub', - collapsed: true, - link: { - type: 'doc', - id: 'best-practices/knowledge-hub/index', - }, - items: [ - 'best-practices/knowledge-hub/temporal-overview', - 'best-practices/knowledge-hub/decision-framework', - 'best-practices/knowledge-hub/getting-started', - 'best-practices/knowledge-hub/learning-path', - 'best-practices/knowledge-hub/architecture', - 'best-practices/knowledge-hub/cost', - 'best-practices/knowledge-hub/shared-responsibility', - 'best-practices/knowledge-hub/patterns', - 'best-practices/knowledge-hub/troubleshooting', - 'best-practices/knowledge-hub/support', - 'best-practices/knowledge-hub/faqs', - ], - }, + 'best-practices/knowledge-hub', ], }, {