diff --git a/models/track/limits.mdx b/models/track/limits.mdx
index b0ff023c8c..3b82d54ea3 100644
--- a/models/track/limits.mdx
+++ b/models/track/limits.mdx
@@ -1,20 +1,87 @@
---
-description: Keep your pages in W&B faster and more responsive by logging within these
- suggested bounds.
-title: Experiments limits and performance
+description: Learn how W&B logging scales and see recommended limits for metrics, runs, and workspace performance.
+title: Experiment limits and performance
---
-{/* ## Best Practices for Fast Pages */}
+The following sections describe recommended limits and performance considerations when logging experiment data to W&B.
-Keep your pages in W&B faster and more responsive by logging within the following suggested bounds.
+## How W&B counts logged data
-## Logging considerations
+W&B organizes logged data along three dimensions:
-Use `wandb.Run.log()` to track experiment metrics.
+* **Steps**: The number of time steps in a run, finalized by committing logged data. Each step represents a single time index (such as a training step or epoch) and is finalized when you call `wandb.Run.log()` with `commit=True`, or implicitly when you specify neither `commit` nor `step`.
+* **Metrics**: The number of distinct metric keys you log (for example, `loss`, `accuracy`, or `eval/precision`).
+* **Logged points**: The total number of metric values recorded, calculated as:
+ ```text
+ logged points = steps × metrics
+ ```
+
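As a quick illustration of the formula above, you can estimate the logged points a planned run will produce. This is a back-of-the-envelope sketch, not W&B's internal accounting, and it is an upper bound when some metrics are not logged at every step:

```python
def estimate_logged_points(steps: int, metrics: int) -> int:
    """Upper-bound estimate of logged points: steps multiplied by metrics."""
    return steps * metrics

# A run that logs 50 metrics at every one of 10,000 steps:
print(estimate_logged_points(steps=10_000, metrics=50))  # 500000
```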
+{/*
+W&B creates a new step when it finalizes the current step. A step is finalized when:
+
+* You call `wandb.Run.log(..., commit=True)`, or
+* You call `wandb.Run.log(...)` without specifying commit and you do not specify step (implicit commit).
+ */}
+
+
+These dimensions [scale differently](/models/track/limits#logging-at-scale) and have different performance considerations.
+
+### Logging at scale
+
+W&B handles logging across steps, metrics, and total logged points in the following ways at scale:
+
+* **Steps**: Can scale into the millions per run.
+* **Metrics**: For best performance, keep the number of distinct metrics under 100,000 per run.
+* **Total logged points (steps × metrics)**: Can scale into the hundreds of millions or more, depending on logging patterns and metric types.
+
+
+Most performance issues come from logging too many distinct metrics, not from logging too many steps.
+
+
+## Summary of recommended limits
+
+The following table summarizes recommended limits for logging at scale:
+
+| Dimension | Guidance at scale |
+|-----------------------|--------------------------------------------|
+| Steps per run | Millions of steps per run are common |
+| Distinct metrics | Fewer than 100,000 per run |
+| Scalars per metric | Fewer than 100,000 values |
+| Media per metric | Fewer than 50,000 values |
+| Histograms per metric | Fewer than 10,000 values |
+| Total logged points | Hundreds of millions or more possible |
+
+See [Logging limits and considerations](/models/track/limits#logging-limits-and-considerations) for details on each dimension.
+
+## Logging limits and considerations
+
+The following sections describe recommended limits for logging experiment data to W&B and how those limits impact performance.
### Distinct metric count
-For faster performance, keep the total number of distinct metrics in a project under 10,000.
+A distinct metric is a unique metric key logged in a run. Each unique key name counts as one metric, regardless of how often you log it or what values it contains.
+
+For example, the following code snippet logs two distinct metrics: `accuracy` and `loss`.
+
+```python
+import wandb
+
+with wandb.init() as run:
+ run.log(
+ {
+ "accuracy": 0.9,
+ "loss": 0.1,
+ }
+ )
+```
+
+W&B flattens nested dictionaries when you log metrics, turning nested keys into dot-separated metric names. Each key in the flattened dictionary counts as a distinct metric.
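The helper below is an illustrative sketch of that flattening behavior, not W&B's actual implementation; it shows how nested keys become dot-separated metric names:

```python
def flatten(metrics: dict, prefix: str = "") -> dict:
    """Flatten nested dictionaries into dot-separated keys."""
    flat = {}
    for key, value in metrics.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name))  # recurse into nested dicts
        else:
            flat[name] = value
    return flat

print(flatten({"a": 1, "b": {"c": 2, "d": 3}}))  # {'a': 1, 'b.c': 2, 'b.d': 3}
```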
+
+
+Metric names must follow certain naming constraints imposed by GraphQL. See [Metric naming constraints](/models/track/log/#metric-naming-constraints) for details.
+
+
+The following example logs a nested dictionary. W&B first flattens the dictionary, then logs each key as a distinct metric. In this case, W&B logs three metrics: `a`, `b.c`, and `b.d`.
```python
import wandb
@@ -31,11 +98,15 @@ with wandb.init() as run:
)
```
-
-W&B automatically flattens nested values. This means that if you pass a dictionary, W&B turns it into a dot-separated name. For config values, W&B supports 3 dots in the name. For summary values, W&B supports 4 dots.
-
+Each distinct metric is multiplied by the number of steps in a run. For example, logging 10,000 metrics over 1 million steps produces 10 billion logged points.
-Metric names must follow certain naming constraints imposed by GraphQL. See [Metric naming constraints](/models/track/log/#metric-naming-constraints) for details.
+{/* For config values, W&B supports 3 dots in the name. For summary values, W&B supports 4 dots. */}
+
+Logging too many distinct metrics can slow down your project's workspace and run pages.
+
+
+For optimal performance, keep the total number of distinct metrics in a project under 10,000. This project-level guidance is more conservative than the per-run limits described earlier, because project workspaces aggregate metrics across many runs.
+
{/* ### Log media with same metric name
Log related media to the same metric name:
@@ -49,7 +120,7 @@ for i, img in enumerate(images):
run.log({"pred_imgs": [wandb.Image(image) for image in images]})
``` */}
-If your workspace suddenly slows down, check whether recent runs have unintentionally logged thousands of new metrics. (This is easiest to spot by seeing sections with thousands of plots that have only one or two runs visible on them.) If they have, consider deleting those runs and recreating them with the desired metrics.
+If your workspace suddenly slows down, check whether recent runs have unintentionally logged thousands of new metrics. One way to check is to look at your project's workspace: sections with thousands of plots that have only one or two runs visible on them may indicate an issue. If so, consider deleting those runs and recreating them with the desired metrics.
### Value width
@@ -77,57 +148,89 @@ Data is saved and tracked even if you log values wider than the recommended amou
### Metric frequency
-Pick a logging frequency that is appropriate to the metric you are logging. As a general rule of thumb, log wider values less frequently than narrower values. W&B recommends:
+Pick a logging frequency that is appropriate to your metric.
+
+
+
+W&B recommends that you log large or complex values (for example, images, audio, or histograms) less frequently than small scalar values, such as loss or accuracy.
+
+
+
+The following recommendations apply per metric key, per run:
+
+* **Scalars**: Fewer than 100,000 logged values per metric
+* **Media**: Fewer than 50,000 logged values per metric
+* **Histograms**: Fewer than 10,000 logged values per metric
+
+These limits apply to each metric independently. For example, logging 1,000 scalar metrics for 100,000 steps produces 100 million logged points.
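To sanity-check a logging plan against this guidance, a minimal sketch (the thresholds restate the recommendations above; they are guidance, not hard limits):

```python
# Recommended per-metric, per-run value counts from this section.
RECOMMENDED_MAX_VALUES = {
    "scalar": 100_000,
    "media": 50_000,
    "histogram": 10_000,
}

def within_guidance(metric_type: str, planned_values: int) -> bool:
    """Return True if a planned value count stays under the recommendation."""
    return planned_values < RECOMMENDED_MAX_VALUES[metric_type]

print(within_guidance("histogram", 5_000))  # True
print(within_guidance("scalar", 250_000))  # False
```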
+
+Your workspace performance may degrade if you log too many values for a given metric. For example, logging 1 million histogram values for a single metric can slow down the run page for that run.
-- Scalars: \<100,000 logged points per metric
-- Media: \<50,000 logged points per metric
-- Histograms: \<10,000 logged points per metric
+To stay within recommended per-metric limits, use separate `wandb.Run.log()` calls with `commit=False` to associate less frequently logged metrics with an existing step instead of committing a new step for them.
+
+
+Calling `wandb.Run.log()` multiple times with `commit=False` adds data to the same step without increasing the step count.
+
+
+The following example shows how to log different metric types at different frequencies to stay within recommended per-metric limits.
```python
import wandb
with wandb.init(project="metric-frequency") as run:
- # Not recommended
+ # Not recommended: logs all metric types at the same frequency.
+ # This results in logging expensive values (images, histograms)
+ # at every step.
run.log(
{
- "scalar": 1, # 100,000 scalars
- "media": wandb.Image(...), # 100,000 images
- "histogram": wandb.Histogram(...), # 100,000 histograms
+ "scalar": 1, # Logged every step
+ "media": wandb.Image(...), # Logged every step (too frequent)
+ "histogram": wandb.Histogram(...), # Logged every step (too frequent)
}
)
- # Recommended
+with wandb.init(project="metric-frequency") as run:
+ # Recommended: log different metric types at different frequencies
+ # with separate `wandb.Run.log()` calls and `commit=False`.
+
+ # Log scalar metrics frequently (for example, every step).
run.log(
{
- "scalar": 1, # 100,000 scalars
+ "scalar": 1,
},
- commit=True,
- ) # Commit batched, per-step metrics together
+ commit=True, # Commit batched, per-step metrics together
+ )
+ # Log media less frequently and associate it with an existing step.
run.log(
{
- "media": wandb.Image(...), # 50,000 images
+ "media": wandb.Image(...),
},
- commit=False,
+ commit=False, # Do not commit a new step for less frequent metrics
)
+ # Log histograms less frequently to avoid large volumes of data.
run.log(
{
- "histogram": wandb.Histogram(...), # 10,000 histograms
+ "histogram": wandb.Histogram(...),
},
- commit=False,
+ commit=False, # Do not commit a new step for less frequent metrics
)
```
+{/* This scale is supported, but workspaces load faster when you plot only a focused subset of metrics. */}
+
+{/* W&B continues to accept your logged data but pages may load more slowly if you exceed guidelines. */}
+
{/* Enable batching in calls to `run.log` by passing `commit=False` to minimize the total number of API calls for a given step. See [the docs](/models/ref/python/experiments/run/#method-runlog) for `run.log` for more details. */}
-
-W&B continues to accept your logged data but pages may load more slowly if you exceed guidelines.
-
### Config size
-Limit the total size of your run config to less than 10 MB. Logging large values could slow down your project workspaces and runs table operations.
+Limit the total size of your run config to less than 10 MB. Logging config values greater than 10 MB may slow down your project workspaces and runs table operations.
```python
import wandb
@@ -161,8 +264,26 @@ with open("large_config.json", "r") as f:
wandb.init(config=large_config)
```
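To check your config against the 10 MB guidance before calling `wandb.init()`, you can approximate its serialized size with the standard library. This is a rough proxy that assumes your config is JSON-serializable, not an exact measure of what W&B stores:

```python
import json

def approx_config_size_mb(config: dict) -> float:
    """Approximate the JSON-serialized size of a config in megabytes."""
    return len(json.dumps(config).encode("utf-8")) / (1024 * 1024)

config = {"learning_rate": 0.01, "batch_size": 64, "layers": [64, 64, 10]}
print(approx_config_size_mb(config) < 10)  # True: within the 10 MB guidance
```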
-## Workspace considerations
+## Workspace performance
+
+Workspace performance depends on the number of runs, metrics, panels, and files in a project, as well as how they are organized and visualized.
+
+### Summary of performance factors
+
+The following table summarizes recommended guidance for workspace performance:
+
+| Factor | Recommended guidance | Why it matters |
+|------------------|--------------------------------------------------------|--------------------------------------------------|
+| Run count | Fewer than 100,000 runs per project (SaaS Cloud) | Large run sets slow workspace loading and queries |
+| | Fewer than 10,000 runs (Dedicated or Self-Managed) | |
+| Panel count | Use manual workspaces for large projects | Too many panels increase load and render time |
+| Section count | Avoid one section per metric | Hundreds of sections degrade workspace performance |
+| Metric count | Use manual mode for 5,000–100,000 metrics per run | Plotting many metrics slows workspace rendering |
+| File count | Fewer than 1,000 files per run | Large file lists slow run page loading |
+| Visualization type | Use workspaces for analysis, reports for presentation | Workspaces are optimized for high-density analysis |
+
+The following sections describe each factor in more detail.
### Run count
@@ -175,10 +296,8 @@ Run counts over these thresholds can slow down operations that involve project w
If your team accesses the same set of runs frequently, such as the set of recent runs, consider [moving less frequently used runs in bulk](/models/runs/manage-runs/) to a new "archive" project, leaving a smaller set of runs in your working project.
-### Workspace performance
-This section gives tips for optimizing the performance of your workspace.
-#### Panel count
+### Panel count
By default, a workspace is _automatic_, and generates standard panels for each logged key. If a workspace for a large project includes panels for many logged keys, the workspace may be slow to load and use. To improve performance, you can:
1. Reset the workspace to manual mode, which includes no panels by default.
@@ -190,9 +309,9 @@ Deleting unused panels one at a time has little impact on performance. Instead,
To learn more about configuring your workspace, refer to [Panels](/models/app/features/panels/).
-#### Section count
+### Section count
-Having hundreds of sections in a workspace can hurt performance. Consider creating sections based on high-level groupings of metrics and avoiding an anti-pattern of one section for each metric.
+Hundreds of sections in a workspace can impact performance. Consider creating sections based on high-level groupings of metrics and avoiding an anti-pattern of one section for each metric.
If you find you have too many sections and performance is slow, consider the workspace setting to create sections by prefix rather than suffix, which can result in fewer sections and better performance.
@@ -230,19 +349,21 @@ Is frequent logging slowing your training runs down? Check out [this Colab](http
W&B does not assert any limits beyond rate limiting. The W&B Python SDK automatically retries requests that exceed limits with exponential backoff, and reports a “Network failure” on the command line. For unpaid accounts, W&B may reach out in extreme cases where usage exceeds reasonable thresholds.
-## Rate limits
+## API rate limits
-W&B SaaS Cloud API implements a rate limit to maintain system integrity and ensure availability. This measure prevents any single user from monopolizing available resources in the shared infrastructure, ensuring that the service remains accessible to all users. You may encounter a lower rate limit for a variety of reasons.
+W&B applies API rate limits on SaaS Cloud to protect system reliability and ensure fair access across shared infrastructure. Rate limits prevent any single user or project from consuming a disproportionate share of resources.
+
+W&B returns a `429 Rate limit exceeded` error along with the [relevant HTTP headers](#rate-limit-http-headers) when you exceed rate limits.
+
+Depending on your usage patterns, plan, or request type, you may encounter lower rate limits.
Rate limits are subject to change.
-If you encounter a rate limit, you receive a HTTP `429` `Rate limit exceeded` error and the response includes [rate limit HTTP headers](#rate-limit-http-headers).
-
### Rate limit HTTP headers
-The preceding table describes rate limit HTTP headers:
+The following table describes the HTTP headers returned when a request is rate limited:
| Header name | Description |
| ------------------- | --------------------------------------------------------------------------------------- |
@@ -250,51 +371,57 @@ The preceding table describes rate limit HTTP headers:
| RateLimit-Remaining | The amount of quota in the current rate limit window, scaled in the range of 0 and 1000 |
| RateLimit-Reset | The number of seconds until the current quota resets |
-### Rate limits on metric logging API
+### Metric logging API rate limits
-`wandb.Run.log()` logs your training data to W&B. This API is engaged through either online or [offline syncing](/models/ref/cli/wandb-sync). In either case, it imposes a rate limit quota limit in a rolling time window. This includes limits on total request size and request rate, where latter refers to the number of requests in a time duration.
+W&B enforces rate limits in a rolling time window when you log metrics with the W&B Python SDK. Rate limits apply to both online logging (using `wandb.init()` and `wandb.Run.log()`) and offline syncing (using the `wandb sync` CLI command). The limits cover both:
-W&B applies rate limits per W&B project. So if you have 3 projects in a team, each project has its own rate limit quota. Users on [Paid plans](https://wandb.ai/site/pricing) have higher rate limits than Free plans.
+* The total size of logged requests
+* The request rate (number of requests per unit time)
-If you encounter a rate limit, you receive a HTTP `429` `Rate limit exceeded` error and the response includes [rate limit HTTP headers](#rate-limit-http-headers).
+W&B applies metric logging rate limits per project. If your team uses multiple projects, each project has its own independent quota. Users on [paid plans](https://wandb.ai/site/pricing) have higher rate limits than users on the Free plan.
-### Suggestions for staying under the metrics logging API rate limit
+#### Suggestions for staying under the metrics logging API rate limit
Exceeding the rate limit may delay `run.finish()` until the rate limit resets. To avoid this, consider the following strategies:
-- Update your W&B Python SDK version: Ensure you are using the latest version of the W&B Python SDK. The W&B Python SDK is regularly updated and includes enhanced mechanisms for gracefully retrying requests and optimizing quota usage.
-- Reduce metric logging frequency:
- Minimize the frequency of logging metrics to conserve your quota. For example, you can modify your code to log metrics every five epochs instead of every epoch:
+- Update your W&B Python SDK version: Use the latest version of the SDK, which includes improvements for request batching, retries, and quota usage.
+- Reduce metric logging frequency: Log metrics less frequently to conserve quota. For example, log metrics every five epochs instead of every epoch:
+ ```python
+ import wandb
+ import random
-```python
-import wandb
-import random
+ with wandb.init(project="basic-intro") as run:
+ for epoch in range(10):
+ # Simulate training and evaluation
+ accuracy = 1 - 2 ** -epoch - random.random() / epoch
+ loss = 2 ** -epoch + random.random() / epoch
-with wandb.init(project="basic-intro") as run:
- for epoch in range(10):
- # Simulate training and evaluation
- accuracy = 1 - 2 ** -epoch - random.random() / epoch
- loss = 2 ** -epoch + random.random() / epoch
+ # Log metrics every 5 epochs
+ if epoch % 5 == 0:
+ run.log({"acc": accuracy, "loss": loss})
+ ```
- # Log metrics every 5 epochs
- if epoch % 5 == 0:
- run.log({"acc": accuracy, "loss": loss})
-```
+- Use manual syncing: W&B stores your run data locally if you hit a rate limit. You can sync your data with the command `wandb sync `. For more details, see the [`wandb sync`](/models/ref/cli/wandb-sync) reference.
+
+### GraphQL API rate limits
+
+The W&B App and [Public API](/models/ref/python/public-api/api) make GraphQL requests to query and modify data. W&B enforces rate limits for GraphQL requests on SaaS Cloud to protect backend performance.
-- Manual data syncing: W&B store your run data locally if you are rate limited. You can manually sync your data with the command `wandb sync `. For more details, see the [`wandb sync`](/models/ref/cli/wandb-sync) reference.
+W&B applies different rate limits based on the type of request:
-### Rate limits on GraphQL API
+* Unauthorized requests: Limited per IP address
+* Authorized requests: Limited per user
+* Project-scoped SDK requests (such as reports, runs, or artifacts): Limited per project based on database query time
-The W&B Models UI and SDK’s [public API](/models/ref/python/public-api/api) make GraphQL requests to the server for querying and modifying data. For all GraphQL requests in SaaS Cloud, W&B applies rate limits per IP address for unauthorized requests and per user for authorized requests. The limit is based on request rate (request per second) within a fixed time window, where your pricing plan determines the default limits. For relevant SDK requests that specify a project path (for example, reports, runs, artifacts), W&B applies rate limits per project, measured by database query time.
+Your pricing plan determines the default rate limits. Users on [Teams and Enterprise plans](https://wandb.ai/site/pricing/) receive higher limits than users on the Free plan.
-Users on [Teams and Enterprise plans](https://wandb.ai/site/pricing) receive higher rate limits than those on the Free plan.
-When you hit the rate limit while using the W&B Models SDK's public API, you see a relevant message indicating the error in the standard output.
+{/* For all GraphQL requests in SaaS Cloud, W&B applies rate limits per IP address for unauthorized requests and per user for authorized requests. The limit is based on request rate (request per second) within a fixed time window, where your pricing plan determines the default limits. For relevant SDK requests that specify a project path (for example, reports, runs, artifacts), W&B applies rate limits per project, measured by database query time. */}
-If you encounter a rate limit, you receive a HTTP `429` `Rate limit exceeded` error and the response includes [rate limit HTTP headers](#rate-limit-http-headers).
+{/* If you encounter a rate limit, you receive a HTTP `429` `Rate limit exceeded` error and the response includes [rate limit HTTP headers](#rate-limit-http-headers). */}
#### Suggestions for staying under the GraphQL API rate limit
-If you are fetching a large volume of data using the W&B Models SDK's [public API](/models/ref/python/public-api/api), consider waiting at least one second between requests. If you receive a HTTP `429` `Rate limit exceeded` error or see `RateLimit-Remaining=0` in the response headers, wait for the number of seconds specified in `RateLimit-Reset` before retrying.
+Wait at least one second between requests if you fetch large volumes of data using the W&B Public API. If you receive an HTTP `429` error or see `RateLimit-Remaining=0` in the response headers, wait for the number of seconds specified in `RateLimit-Reset` before retrying the request.
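One way to apply that advice is to compute the pause from the response before retrying. The helper below is an illustrative sketch that reads the headers documented earlier on this page; how you issue the requests themselves is up to your HTTP client:

```python
def seconds_to_wait(status_code: int, headers: dict) -> float:
    """Return how long to pause before the next Public API request.

    Waits RateLimit-Reset seconds on HTTP 429 or when the remaining
    quota is exhausted; otherwise keeps at least one second between
    bulk requests, per the guidance above.
    """
    if status_code == 429 or headers.get("RateLimit-Remaining") == "0":
        return float(headers.get("RateLimit-Reset", 1))
    return 1.0

print(seconds_to_wait(429, {"RateLimit-Reset": "30"}))  # 30.0
print(seconds_to_wait(200, {"RateLimit-Remaining": "500"}))  # 1.0
```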
## Browser considerations