Merged
Changes from 5 commits
34 changes: 10 additions & 24 deletions docs/content/stable/additional-features/auto-analyze.md
@@ -22,40 +22,26 @@ Similar to [PostgreSQL autovacuum](https://www.postgresql.org/docs/current/routi

## Enable Auto Analyze

Before you can use the feature, you must enable it by setting `ysql_enable_auto_analyze_service` to true on all YB-Masters, and both `ysql_enable_auto_analyze_service` and `ysql_enable_table_mutation_counter` to true on all YB-TServers.
Auto analyze is enabled automatically on YugabyteDB clusters when the cost-based optimizer (CBO) is enabled. CBO is itself enabled automatically when a YugabyteDB cluster is created with version 2025.2 or later through YugabyteDB Aeon, YugabyteDB Anywhere, or yugabyted. If needed, you can explicitly enable or disable auto analyze by setting `ysql_enable_auto_analyze` on both YB-Masters and YB-TServers.
Contributor Author


Maybe we can link to some common section which explains the flags which get enabled on new 2025.2 clusters and also list such flags which get transitively enabled as a result of CBO (auto analyze, bnl, etc)


For example, to create a single-node [yugabyted](../../reference/configuration/yugabyted/) cluster with Auto Analyze enabled, use the following command:
For example, to create a single-node [yugabyted](../../reference/configuration/yugabyted/) cluster with Auto Analyze explicitly enabled, use the following command:

```sh
./bin/yugabyted start \
--master_flags "ysql_enable_auto_analyze_service=true" \
--tserver_flags "ysql_enable_auto_analyze_service=true,ysql_enable_table_mutation_counter=true"
--master_flags "ysql_enable_auto_analyze=true" \
--tserver_flags "ysql_enable_auto_analyze=true"
```

Enabling Auto Analyze on an existing cluster requires a rolling restart to set `ysql_enable_auto_analyze_service` and `ysql_enable_table_mutation_counter` to true.

## Configure Auto Analyze

You can control how frequently the service updates table statistics using the following YB-TServer flags:
The auto analyze service counts the number of mutations (INSERT, UPDATE, and DELETE) to a table and automatically triggers ANALYZE on the table when certain thresholds are reached. This behavior is controlled by the following flags.

- `ysql_auto_analyze_threshold` - the minimum number of mutations (INSERT, UPDATE, and DELETE) needed to run ANALYZE on a table. Default is 50.
- `ysql_auto_analyze_scale_factor` - a fraction that determines when enough mutations have been accumulated to run ANALYZE for a table. Default is 0.1.
A table needs to accumulate a minimum number of mutations before it is considered for ANALYZE. This minimum is the sum of the following:
* A fraction of the table size, controlled by [ysql_auto_analyze_scale_factor](../../reference/configuration/yb-tserver/#ysql-auto-analyze-scale-factor). This setting defaults to 0.1, which translates to 10% of the current table size. The current table size is determined by the [`reltuples`](https://www.postgresql.org/docs/15/catalog-pg-class.html#:~:text=CREATE%20INDEX.-,reltuples,-float4) column value stored in the `pg_class` catalog entry for that table.
* A static count of [ysql_auto_analyze_threshold](../../reference/configuration/yb-tserver/#ysql-auto-analyze-threshold) (default 50) mutations. This setting ensures that small tables are not analyzed too aggressively, because the scale factor requirement alone is easily met on such tables.

Increasing either of these flags reduces the frequency of statistics updates.

If the total number of mutations for a table is greater than its analyze threshold, then the service runs ANALYZE on the table. The analyze threshold of a table is calculated as follows:

```sh
analyze_threshold = ysql_auto_analyze_threshold + (ysql_auto_analyze_scale_factor * <table_size>)
```

where `<table_size>` is the current `reltuples` column value stored in the `pg_class` catalog.

`ysql_auto_analyze_threshold` is important for small tables. With default settings, if a table has 100 rows and 20 are mutated, ANALYZE won't run as the threshold is not met, even though 20% of the rows are mutated.

On the other hand, `ysql_auto_analyze_scale_factor` is especially important for big tables. If a table has 1,000,000,000 rows, 10% (100,000,000 rows) would have to be mutated before ANALYZE runs. Set the scale factor to a lower value to allow for more frequent statistics collection for such large tables.
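The threshold arithmetic above can be sketched in a few lines of Python. This is only an illustration of the documented formula, not the service's actual implementation: the function names `analyze_threshold` and `should_analyze` are hypothetical, and the defaults (50 and 0.1) are the ones quoted in this section.

```python
# Illustrative sketch of the documented formula; function names are hypothetical.

def analyze_threshold(table_size, threshold=50, scale_factor=0.1):
    """Return ysql_auto_analyze_threshold + ysql_auto_analyze_scale_factor * table_size.

    table_size stands in for the reltuples value of the table in pg_class.
    """
    return threshold + scale_factor * table_size


def should_analyze(mutations, table_size, threshold=50, scale_factor=0.1):
    """ANALYZE is considered only when mutations exceed the analyze threshold."""
    return mutations > analyze_threshold(table_size, threshold, scale_factor)


if __name__ == "__main__":
    # Small table: threshold is 50 + 0.1 * 100 = 60, so 20 mutations are not enough.
    print(should_analyze(20, 100))

    # Large table: about 100,000,050 mutations are needed with the default scale factor.
    print(analyze_threshold(1_000_000_000))

    # Lowering the scale factor to 0.01 (an arbitrary example value) cuts that
    # requirement to roughly 10,000,050 mutations.
    print(analyze_threshold(1_000_000_000, scale_factor=0.01))
```

Running the sketch with other table sizes shows how the static threshold dominates for small tables while the scale factor dominates for large ones.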

In addition, `ysql_auto_analyze_batch_size` controls the maximum number of tables the Auto Analyze service tries to analyze in a single ANALYZE statement. The default is 10. Setting this flag to a larger value can potentially reduce the number of YSQL catalog cache refreshes if Auto Analyze decides to ANALYZE many tables in the same database at the same time.
Separately, auto analyze enforces a per-table cooldown so that ANALYZE is not triggered too aggressively. After every run of ANALYZE on a table, a cooldown period must elapse before the next run of ANALYZE on that table, even if the mutation thresholds are met. The cooldown period starts at [ysql_auto_analyze_min_cooldown_per_table](../../reference/configuration/yb-tserver/#ysql-auto-analyze-min-cooldown-per-table) (default: 10 seconds) and increases exponentially up to [ysql_auto_analyze_max_cooldown_per_table](../../reference/configuration/yb-tserver/#ysql-auto-analyze-max-cooldown-per-table) (default: 24 hours). Cooldown values for a table do not reset, so in most cases a frequently updated table is eventually analyzed only once every `ysql_auto_analyze_max_cooldown_per_table` period (default: 24 hours).
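To get a feel for how quickly the cooldown reaches its ceiling, the following Python sketch generates a cooldown schedule. It is a rough model only: the text says the cooldown grows exponentially but does not state the growth factor, so the doubling used here (`growth=2`) is an assumption, and the function name `cooldown_schedule` is hypothetical. The minimum and maximum values are the documented defaults in milliseconds.

```python
# Rough model of the per-table cooldown; the doubling factor is an assumption.

def cooldown_schedule(runs, min_ms=10_000, max_ms=86_400_000, growth=2):
    """Return the cooldown (in ms) enforced after each of the first `runs` ANALYZE runs.

    The cooldown starts at min_ms, is multiplied by `growth` after every run,
    and is capped at max_ms. It never resets, matching the text above.
    """
    cooldowns = []
    current = min_ms
    for _ in range(runs):
        cooldowns.append(current)
        current = min(current * growth, max_ms)
    return cooldowns


if __name__ == "__main__":
    for run, cooldown_ms in enumerate(cooldown_schedule(16), start=1):
        print(f"after run {run}: wait {cooldown_ms / 1000:.0f} s")
```

Under the doubling assumption, the cooldown hits the 24-hour cap after roughly fifteen runs, which is consistent with the expectation that a frequently updated table settles into one ANALYZE per maximum cooldown period.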

For more information on flags used to configure the Auto Analyze service, refer to [Auto Analyze service flags](../../reference/configuration/yb-tserver/#auto-analyze-service-flags).

@@ -94,4 +80,4 @@ SELECT reltuples FROM pg_class WHERE relname = 'test';

## Limitations

Because ANALYZE is a DDL statement, it can cause DDL conflicts when run concurrently with other DDL statements. As Auto Analyze runs ANALYZE in the background, you should turn off Auto Analyze if you want to execute DDL statements. You can do this by setting `ysql_enable_auto_analyze_service` to false on all YB-TServers at runtime.
ANALYZE is technically considered a DDL statement (schema change) and normally conflicts with other [concurrent DDLs](../best-practices-operations/administration/#concurrent-ddl-during-a-ddl-operation). When run via the auto analyze service, however, ANALYZE can run concurrently with other DDL statements: it is preempted by the concurrent DDL and retried at a later point. The exception is when [transactional DDL](../explore/transactions/transactional-ddl/) is enabled (off by default): certain kinds of transactions that contain DDL may face a `kConflict` error when a background ANALYZE from the auto analyze service interrupts the transaction. In such cases, disable the auto analyze service explicitly and trigger ANALYZE manually. Issue {{<issue 28903>}} tracks this scenario.
8 changes: 7 additions & 1 deletion docs/content/stable/reference/configuration/yb-master.md
@@ -1062,7 +1062,13 @@ Default: `true`

See also [Auto Analyze Service TServer flags](../yb-tserver/#auto-analyze-service-flags).

##### ysql_enable_auto_analyze_service
Auto analyze is enabled by default when the [cost-based optimizer](../../../architecture/query-layer/planner-optimizer/) (CBO) is enabled through gflags. To explicitly control the service, set the [ysql_enable_auto_analyze](#ysql-enable-auto-analyze) flag.

##### ysql_enable_auto_analyze

{{<tags/feature/ea idea="590">}}Enable the Auto Analyze service, which automatically runs ANALYZE to update table statistics for tables that have changed more than a configurable threshold.

##### ysql_enable_auto_analyze_service (deprecated)

{{<tags/feature/ea idea="590">}}Enable the Auto Analyze service, which automatically runs ANALYZE to update table statistics for tables that have changed more than a configurable threshold.

69 changes: 38 additions & 31 deletions docs/content/stable/reference/configuration/yb-tserver.md
@@ -2131,7 +2131,7 @@ Default: `legacy_mode`

Enables the YugabyteDB [cost-based optimizer](../../../architecture/query-layer/planner-optimizer/) (CBO). Options are `on`, `off`, `legacy_mode`, and `legacy_stats_mode`.

When enabling CBO, you must run ANALYZE on user tables to maintain up-to-date statistics.
When CBO is enabled through this gflag, auto analyze is also enabled automatically. If you disable auto analyze explicitly, you are responsible for periodically running ANALYZE on user tables to maintain up-to-date statistics.

For information on using this parameter to configure CBO, refer to [Enable cost-based optimizer](../../../best-practices-operations/ysql-yb-enable-cbo/).

@@ -2141,54 +2141,39 @@ For information on using this parameter to configure CBO, refer to [Enable cost-

{{< note title="Note" >}}

To fully enable the Auto Analyze service, you need to enable `ysql_enable_auto_analyze_service` on all YB-Masters and YB-TServers, and `ysql_enable_table_mutation_counter` on all YB-TServers.
Auto analyze is enabled by default when the [cost-based optimizer](../../../architecture/query-layer/planner-optimizer/) (CBO) is enabled through gflags. To explicitly control the service, set the [ysql_enable_auto_analyze](#ysql-enable-auto-analyze) flag.

{{< /note >}}

See also [Auto Analyze Service Master flags](../yb-master/#auto-analyze-service-flags).

##### --ysql_enable_auto_analyze_service

{{% tags/wrap %}}
{{<tags/feature/ea idea="590">}}
{{<tags/feature/t-server>}}
{{<tags/feature/restart-needed>}}
Default: `false`
{{% /tags/wrap %}}
##### --ysql_enable_auto_analyze

Enable the Auto Analyze service, which automatically runs ANALYZE to update table statistics for tables that have changed more than a configurable threshold.

##### --ysql_enable_table_mutation_counter

{{% tags/wrap %}}

##### --ysql_auto_analyze_threshold

Default: `false`
{{% /tags/wrap %}}
Default: `50`

Enable per table mutation (INSERT, UPDATE, DELETE) counting. The Auto Analyze service runs ANALYZE when the number of mutations of a table exceeds the threshold determined by the [ysql_auto_analyze_threshold](#ysql-auto-analyze-threshold) and [ysql_auto_analyze_scale_factor](#ysql-auto-analyze-scale-factor) settings.
The minimum number of mutations needed to run ANALYZE on a table. For more details, see [Auto Analyze service](../../../additional-features/auto-analyze).

##### --ysql_auto_analyze_threshold
##### --ysql_auto_analyze_scale_factor

{{% tags/wrap %}}
Default: `0.1`

{{<tags/feature/restart-needed>}}
Default: `50`
{{% /tags/wrap %}}
The fraction defining when sufficient mutations have been accumulated to run ANALYZE for a table. For more details, see [Auto Analyze service](../../../additional-features/auto-analyze).

The minimum number of mutations needed to run ANALYZE on a table.
##### --ysql_auto_analyze_min_cooldown_per_table

##### --ysql_auto_analyze_scale_factor
Default: `10000` (10 seconds)

{{% tags/wrap %}}
The minimum duration (in milliseconds) for the cooldown period between successive runs of ANALYZE on a specific table by the auto analyze service. For more details, see [Auto Analyze service](../../../additional-features/auto-analyze).

{{<tags/feature/restart-needed>}}
Default: `0.1`
{{% /tags/wrap %}}
##### --ysql_auto_analyze_max_cooldown_per_table

The fraction defining when sufficient mutations have been accumulated to run ANALYZE for a table.
Default: `86400000` (24 hours)

ANALYZE runs when the mutation count exceeds `ysql_auto_analyze_scale_factor * <table_size> + ysql_auto_analyze_threshold`, where table_size is the value of the `reltuples` column in the `pg_class` catalog.
The maximum duration (in milliseconds) for the cooldown period between successive runs of ANALYZE on a specific table by the auto analyze service. For more details, see [Auto Analyze service](../../../additional-features/auto-analyze).

##### --ysql_auto_analyze_batch_size

@@ -2198,7 +2183,7 @@ ANALYZE runs when the mutation count exceeds `ysql_auto_analyze_scale_factor * <
Default: `10`
{{% /tags/wrap %}}

The maximum number of tables the Auto Analyze service tries to analyze in a single ANALYZE statement.
The maximum number of tables the Auto Analyze service tries to analyze in a single ANALYZE statement.

##### --ysql_cluster_level_mutation_persist_interval_ms

@@ -2240,6 +2225,28 @@ Default: `5000`

Timeout, in milliseconds, for the node-level mutation reporting RPC to the Auto Analyze service.


##### --ysql_enable_auto_analyze_service (deprecated)

{{% tags/wrap %}}
{{<tags/feature/ea idea="590">}}
{{<tags/feature/t-server>}}
{{<tags/feature/restart-needed>}}
Default: `false`
{{% /tags/wrap %}}

Enable the Auto Analyze service, which automatically runs ANALYZE to update table statistics for tables that have changed more than a configurable threshold.

##### --ysql_enable_table_mutation_counter (deprecated)

{{% tags/wrap %}}


Default: `false`
{{% /tags/wrap %}}

Enable per table mutation (INSERT, UPDATE, DELETE) counting. The Auto Analyze service runs ANALYZE when the number of mutations of a table exceeds the threshold determined by the [ysql_auto_analyze_threshold](#ysql-auto-analyze-threshold) and [ysql_auto_analyze_scale_factor](#ysql-auto-analyze-scale-factor) settings.

### Advisory lock flags

To learn about advisory locks, see [Advisory locks](../../../architecture/transactions/concurrency-control/#advisory-locks).