diff --git a/docs/en/02-configuration.md b/docs/en/02-configuration.md new file mode 100644 index 00000000..b280939c --- /dev/null +++ b/docs/en/02-configuration.md @@ -0,0 +1,177 @@ +--- +id: configuration +--- + +# Configuration + +This document describes all available configuration options for the system. + +## Address Configuration + +Network addresses for various services. + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `address.http` | string | `:9002` | HTTP listen address | +| `address.grpc` | string | `:9004` | gRPC listen address | +| `address.debug` | string | `:9200` | Debug listen address | + +## Storage Configuration + +Storage settings for data persistence. + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `storage.data_dir` | string | - | Path to a directory where fractions will be stored | +| `storage.frac_size` | Bytes | `128MiB` | Maximum size of an active fraction before it gets sealed | +| `storage.total_size` | Bytes | `1GiB` | Upper bound of how much disk space can be occupied by sealed fractions before they get deleted (or offloaded) | + +## Cluster Configuration + +Cluster topology and replication settings. + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `cluster.write_stores` | []string | - | Cold store instances which will be written to | +| `cluster.read_stores` | []string | - | Cold store instances which will be queried from | +| `cluster.hot_stores` | []string | - | Store instances which will be written to and queried from | +| `cluster.hot_read_stores` | []string | - | Store instances which will be queried from. This field is optional but if specified will take precedence over `cluster.hot_stores` | +| `cluster.replicas` | int | `1` | Number of instances that belong to one shard | +| `cluster.hot_replicas` | int | - | Number of hot instances that belong to one shard. 
If specified will take precedence over `cluster.replicas` for hot stores | +| `cluster.shuffle_replicas` | bool | `false` | Whether to shuffle replicas | +| `cluster.mirror_address` | string | - | Host to which search queries will be mirrored. Useful if you have a development cluster and want it to receive the same search pattern as the production cluster | + +## Slow Logs Configuration + +Thresholds for logging slow operations. + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `slow_logs.bulk_threshold` | Duration | `0ms` | Duration to determine slow bulks. When a bulk request exceeds this threshold, it is logged | +| `slow_logs.search_threshold` | Duration | `3s` | Duration to determine slow searches. When a search request exceeds this threshold, it is logged | +| `slow_logs.fetch_threshold` | Duration | `3s` | Duration to determine slow fetches. When a fetch request exceeds this threshold, it is logged | + +## Limits Configuration + +Rate limiting and resource constraints. + +### General Limits + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `limits.query_rate` | float64 | `2` | Maximum number of requests per second | +| `limits.search_requests` | int | `32` | Maximum number of simultaneous search requests | +| `limits.bulk_requests` | int | `32` | Maximum number of simultaneous bulk requests | +| `limits.inflight_bulks` | int | `32` | Maximum number of in-flight bulk requests that can be processed concurrently | +| `limits.fraction_hits` | int | `6000` | Maximum number of fractions that can be processed within a single search request | +| `limits.search_docs` | int | `100000` | Maximum number of documents that can be returned within a single search request | +| `limits.doc_size` | Bytes | `128KiB` | Maximum possible size of a single document. 
Documents larger than this threshold will be skipped | + +### Aggregation Limits + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `limits.aggregation.field_tokens` | int | `1000000` | Maximum number of unique field tokens that can be processed in a single aggregation request. Setting this field to 0 disables the limit | +| `limits.aggregation.group_tokens` | int | `2000` | Maximum number of unique group tokens that can be processed in a single aggregation request. Setting this field to 0 disables the limit | +| `limits.aggregation.fraction_tokens` | int | `100000` | Maximum number of unique tokens contained in a single fraction picked up by an aggregation request. Setting this field to 0 disables the limit | + +## Circuit Breaker Configuration + +Circuit breaker settings for bulk operations. See [CircuitBreaker documentation](https://github.com/ozontech/seq-db/blob/main/network/circuitbreaker/README.md) for more information. + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `circuit_breaker.bulk.shard_timeout` | Duration | `10s` | See [CircuitBreaker](https://github.com/ozontech/seq-db/blob/main/network/circuitbreaker/README.md) for more information | +| `circuit_breaker.bulk.err_percentage` | int | `50` | See [CircuitBreaker](https://github.com/ozontech/seq-db/blob/main/network/circuitbreaker/README.md) for more information | +| `circuit_breaker.bulk.bucket_width` | Duration | `1s` | See [CircuitBreaker](https://github.com/ozontech/seq-db/blob/main/network/circuitbreaker/README.md) for more information | +| `circuit_breaker.bulk.buckets_count` | int | `10` | See [CircuitBreaker](https://github.com/ozontech/seq-db/blob/main/network/circuitbreaker/README.md) for more information | +| `circuit_breaker.bulk.sleep_window` | Duration | `5s` | See [CircuitBreaker](https://github.com/ozontech/seq-db/blob/main/network/circuitbreaker/README.md) for more information | +| 
`circuit_breaker.bulk.volume_threshold` | int | `5` | See [CircuitBreaker](https://github.com/ozontech/seq-db/blob/main/network/circuitbreaker/README.md) for more information | + +## Resources Configuration + +Resource allocation settings. + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `resources.reader_workers` | int | `runtime.GOMAXPROCS` | Number of workers for the readers pool. By default this setting is equal to `runtime.GOMAXPROCS` | +| `resources.search_workers` | int | `runtime.GOMAXPROCS` | Number of workers for the searchers pool. By default this setting is equal to `runtime.GOMAXPROCS` | +| `resources.cache_size` | Bytes | 30% of available RAM | Maximum size of the cache. By default this setting is equal to 30% of available RAM | +| `resources.sort_docs_cache_size` | Bytes | - | Size of the sorted documents cache | +| `resources.skip_fsync` | bool | `false` | Whether to skip fsync operations | + +## Compression Configuration + +Compression level settings for various data types. + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `compression.docs_zstd_compression_level` | int | `1` | Zstandard compression level for documents | +| `compression.metas_zstd_compression_level` | int | `1` | Zstandard compression level for metadata | +| `compression.sealed_zstd_compression_level` | int | `3` | Zstandard compression level for sealed fractions | +| `compression.doc_block_zstd_compression_level` | int | `3` | Zstandard compression level for document blocks | + +## Indexing Configuration + +Settings for document indexing behavior. 
+ +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `indexing.max_token_size` | int | `72` | Maximum token size | +| `indexing.case_sensitive` | bool | `false` | Whether indexing is case sensitive | +| `indexing.partial_field_indexing` | bool | `false` | Whether to enable partial field indexing | +| `indexing.past_allowed_time_drift` | Duration | `24h` | How much time can elapse since the message's timestamp. If more time than this has passed since the message's timestamp, the message's timestamp gets overwritten | +| `indexing.future_allowed_time_drift` | Duration | `5m` | Maximum allowable offset for a message's timestamp into the future. If a message's timestamp is further in the future than this, it is overwritten | + +## Mapping Configuration + +Field mapping configuration. + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `mapping.path` | string | - | Path to mapping file or 'auto' to index all fields as keywords | +| `mapping.enable_updates` | bool | `false` | Will periodically check mapping file and reload configuration if there is an update | +| `mapping.update_period` | Duration | `30s` | Manages how often mapping file will be checked for updates | + +## Documents Sorting Configuration + +Settings for document sorting functionality. + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `docs_sorting.enabled` | bool | `false` | Enables/disables documents sorting | +| `docs_sorting.doc_block_size` | Bytes | - | Sets document block size. Large size consumes more RAM but improves compression ratio | + +## Async Search Configuration + +Configuration for asynchronous search operations. + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `async_search.data_dir` | string | subdirectory in storage.data_dir | Directory that contains data for asynchronous searches. 
By default will be subdirectory in `storage.data_dir` | +| `async_search.concurrency` | int | - | Concurrency level for async searches | +| `async_search.max_total_size` | Bytes | `1GiB` | - | +| `async_search.max_size_per_request` | Bytes | `100MiB` | - | + +## API Configuration + +API-related settings. + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `api.es_version` | string | `8.9.0` | Default version that will be returned in the `/` handler | + +## Tracing Configuration + +Distributed tracing settings. + +| Field | Type | Default | Description | +|-------|------|---------|-------------| +| `tracing.sampling_rate` | float64 | `0.01` | Sampling rate for distributed tracing | + +## Notes + +- **Bytes**: Size values can be specified with units like `KiB`, `MiB`, `GiB` (e.g., `128MiB`) +- **Duration**: Time values can be specified with units like `ms`, `s`, `m`, `h` (e.g., `3s`, `24h`) +- **Default Values**: Fields without explicit defaults are required unless marked as optional +- **Arrays**: Fields of type `[]string` accept multiple values diff --git a/docs/en/02-flags.md b/docs/en/02-flags.md deleted file mode 100644 index 80b8b592..00000000 --- a/docs/en/02-flags.md +++ /dev/null @@ -1,107 +0,0 @@ ---- -id: flags -position: 1 ---- - -# Flags - -## Flags for Ingestor and Store Modes - -### Basic Flags - -- **--help:** Show short help -- **--mode="ingestor":** Operation mode. You can choose between `ingestor` and `store` modes. - - In store mode, seq-db acts as a stateful replica responsible for storing its part of the data. Implements an internal API that is used to communicate between the store and ingestor. - - In ingestor mode, seq-db acts as a shard and replica coordinator. Implements a public client API. Does not have its own state. -- **--addr=":9002":** Depending on the mode: - - For ingestor mode, this is the address of the public HTTP API (bulk method). 
- - For store mode, this is the address of the internal gRPC API. By default, port 9002 is used. -- **--debug-addr=":9200":** Address for debugging requests (HTTP). Go metrics and profiling are sent to this address. By default, port 9200 is used. -- **--tracing-probability=0.01:** Tracing probability. - -### Indexing Flags - -- **--mapping=MAPPING:** Path to the file with indexing parameters or value `auto`. See the corresponding section. -- **--case-sensitive:** Token case sensitivity. By default, if not specified, the search is case-insensitive. -- **--max-token-size=72:** Maximum token size. -- **--partial-indexing:** By default, if the indexed value exceeds the maximum size, it is ignored and does not get into the index. If this parameter is set, the value will be truncated to the maximum length and indexed. - -## Flags for Ingestor Mode - -- **--proxy-grpc-addr=":9004":** Address for gRPC requests. By default, port 9004 is used. - -### Clustering Flags - -- **--write-stores=WRITE-STORES:** List of hosts for writing data. Specified as a string with values separated by commas. For example, `--write-stores=host1,host2,host3,host4`. -- **--read-stores=READ-STORES:** List of hosts for reading data. If not set, `--write-stores` is used. Can be used, for example, in case of data migration to other hosts, when we write to some stores and read from other, old stores. -- **--hot-stores=HOT-STORES:** List of `hot` storage hosts. If specified, the proxy works with 2 store clusters: cold (`--write-stores`) and hot (`--hot-stores`). And sends each write request to each of these clusters accordingly. But when reading, it first tries to get data from the `hot` cluster and in some cases from the `cold` one. -- **--hot-read-stores=HOT-READ-STORES:** List of `hot` storage hosts for reading. Can be used when migrating data to other hosts. -- **--store-mode="":** Storage operating mode. If specified, the allowed values are `hot` or `cold`. 
If the `hot` mode is selected, then when executing a search query, if the requested data range is older than the oldest store documents, the service will return a special error `query wants old data`. Ingestor will make a request to the `cold` store cluster in this case. -- **--replicas=1:** Replication factor for storages. If N is specified, the first N hosts in the list are replicas of one shard, the next N are replicas of another shard, etc. -- **--hot-replicas=HOT-REPLICAS:** Replication factor for hot storages. If not specified, the global factor is used. -- **--shuffle-replicas:** Shuffle replicas before performing a search. If not specified, then the first replica is always read, and only in case of failure, the second one comes. - -### Bulk Request Flags - -- **--bulk-shard-timeout=10s:** Timeout for processing a bulk operation by one shard. -- **--bulk-err-percentage=50:** Error percentage for triggering the overload protection mechanism. See circuitbreaker/README.md for more details. -- **--bulk-bucket-width=1s:** Window width for counting errors. See circuitbreaker/README.md for more details. -- **--bulk-err-count=10:** Number of errors required to trigger bulk protection mechanism. See circuitbreaker/README.md for more details. -- **--bulk-sleep-window=5s:** Time to wait after bulk protection mechanism is triggered. See circuitbreaker/README.md for more details. -- **--bulk-request-volume-threshold=5:** Request volume threshold for bulk protection mechanism to be triggered. See circuitbreaker/README.md for more details. -- **--max-inflight-bulks=32:** Maximum number of concurrent bulk requests that can be processed. - -### Limits Flags - -- **--query-rate-limit=2.0:** Maximum request rate per second. `Search` and `fetch` requests are counted. If the limit is exceeded, the request will return an error. - -### Compression Flags - -- **--docs-zstd-compress-level=3:** ZSTD compression level for documents. 
More information can be found in the documentation: https://facebook.github.io/zstd/zstd_manual.html. -- **--metas-zstd-compress-level=3:** ZSTD compression level for metadata. More information can be found in the documentation: https://facebook.github.io/zstd/zstd_manual.html. - -### Others Ingestor Flags - -- **--allowed-time-drift=24h:** Maximum allowed time since the document's timestamp. -- **--future-allowed-time-drift=5m:** Maximum allowed future time since the document's timestamp. -- **--es-version="8.9.0":** Elasticsearch version to return in `/` handler. -- **--mirror-addr="":** Seqproxy mirror address. Used for debugging and profiling with load mirroring. - - -## Flags For Store Mode - -### Data Configuration Flags - -- **--data-dir=DATA-DIR:** Directory where data is stored. -- **--frac-size=128MB:** Size of one fraction (minimum fragment of data on disk). The larger the fraction size, the more RAM is required for fraction buffering. -- **--total-size=1GB:** Maximum size of all data. If the data becomes larger than this limit, the oldest data is deleted until the dataset size fits within the specified value. - -### Bulk Request Flags - -- **--requests-limit=16:** Maximum number of simultaneous bulk requests. -- **--skip-fsync:** Skip fsync operations for the active fraction. Speeds up data insertion at the expense of reduced data delivery guarantee. - -### Flags Affecting Search Performance - -- **--reader-workers=128:** The size of the reader pool from the disk. For SSDs, it is recommended to set it equal to the number of cores. -- **--search-workers-count=128:** The number of worker threads that will process the search by fraction. It is recommended to set it equal to the number of cores. -- **--search-requests-limit=30:** The maximum number of simultaneous search requests. If exceeded, the request will return an error. -- **--search-fraction-limit=6000:** The maximum number of fractions used in the search. 
If the query requires reading more fractions, the query will return an error. -- **--max-search-docs=100000:** The maximum number of documents returned by the search query. -- **--cache-size=8GB:** The maximum cache size. Used when processing search and fetch requests. - -### Compression Flags - -- **--seal-zstd-compress-level=3:** ZSTD compression level for sealed data. - -### Aggregation Flags - -- **--agg-max-group-tokens=2000:** Maximum number of unique tokens for a grouping field affected by an aggregation query. Setting this value to 0 disables the limitation. -- **--agg-max-field-tokens=1000000:** Maximum number of unique tokens for an aggregation function field affected by an aggregation query. Setting this value to 0 disables the limitation. -- **--agg-max-fraction-tids=100000:** Maximum number of unique tokens per fraction for an grouping field or for an aggregation function calculation field. Setting this value to 0 disables the limitation. - -### Logging Flags - -- **--log-search-threshold-ms=3000:** `Search` query logging threshold in milliseconds. All queries that take longer than the specified time are written to the log. -- **--log-fetch-threshold-ms=3000:** `Fetch` query logging threshold in milliseconds. All queries that take longer than the specified time are written to the log. -- **--log-bulk-threshold-ms=LOG-BULK-THRESHOLD-MS:** `Bulk` query logging threshold in milliseconds. A record of such a query is written to the log. diff --git a/docs/en/05-seq-ql.md b/docs/en/05-seq-ql.md index c101d021..b95be949 100644 --- a/docs/en/05-seq-ql.md +++ b/docs/en/05-seq-ql.md @@ -13,7 +13,7 @@ the [index types](03-index-types.md) documentation. When performing a full-text search, the system automatically selects results that match the specified text. Search queries are case-insensitive by default. -To change this behavior, use the `--case-sensitive` flag, but it affects only new documents. 
+To change this behavior, use the `indexing.case_sensitive` option, but it affects only new documents. ### String Literals diff --git a/docs/en/08-rate-limiting.md b/docs/en/08-rate-limiting.md index a531bd68..31f40215 100644 --- a/docs/en/08-rate-limiting.md +++ b/docs/en/08-rate-limiting.md @@ -30,6 +30,5 @@ such requests by message ID. This is implemented in `search_proxy.go`. ## How to enable the rate limiter -The rate limiter can be enabled on launch using the `query-rate-limit` flag +The rate limiter can be enabled on launch using the `limits.query_rate` option followed by a number -- the maximum number of queries allowed per second. -The default value for this flag is `2.0`. \ No newline at end of file diff --git a/docs/en/09-troubleshooting.md b/docs/en/09-troubleshooting.md index f5d1b11a..fbbffd27 100644 --- a/docs/en/09-troubleshooting.md +++ b/docs/en/09-troubleshooting.md @@ -41,9 +41,9 @@ CPU time. ### Reduce the frac-size -The `frac-size` parameter defines the maximum amount of memory that seq-db will store before sealing the active +The `storage.frac_size` parameter defines the maximum amount of memory that seq-db will store before sealing the active fraction. Reducing this parameter can help lower memory consumption but may increase query times. See more details -about `frac-size` in the [configuration documentation](02-flags.md#data-configuration-flags). +about `storage.frac_size` in the [configuration documentation](02-configuration.md#storage-configuration). ## Issue: Slow Search Queries diff --git a/docs/en/internal/frac-cache.md b/docs/en/internal/frac-cache.md index ba7eb7d2..b4aadeef 100644 --- a/docs/en/internal/frac-cache.md +++ b/docs/en/internal/frac-cache.md @@ -6,7 +6,7 @@ SeqDB stores the indexed data in special structures named *fractions*, each frac A fraction that is currently being written to is called an **active fraction**. 
A **sealed fraction** is a read-only fraction, that is not going to be modified by the database in any kind. A sealed fraction is only going to be read from. -When an *active* fraction reaches a certain size (configured by the `frac-size` flag), the *active* fraction is turned into a *sealed* one, this process is named **sealing**. The database must have **only one active fraction** at any given time. (TBD link to sealing) +When an *active* fraction reaches a certain size (configured by the `storage.frac_size` option), the *active* fraction is turned into a *sealed* one, this process is named **sealing**. The database must have **only one active fraction** at any given time. (TBD link to sealing) Both sealed and active fractions have the following *metadata*: - `Name` - fraction name diff --git a/docs/ru/02-configuration.md b/docs/ru/02-configuration.md new file mode 100644 index 00000000..664530aa --- /dev/null +++ b/docs/ru/02-configuration.md @@ -0,0 +1,177 @@ +--- +id: configuration +--- + +# Конфигурация + +Этот документ описывает все доступные параметры конфигурации системы. + +## Конфигурация адресов + +Сетевые адреса для различных сервисов. + +| Параметр | Тип | Значение по умолчанию | Описание | +|----------|-----|----------------------|-----------| +| `address.http` | string | `:9002` | Адрес прослушивания HTTP | +| `address.grpc` | string | `:9004` | Адрес прослушивания GRPC | +| `address.debug` | string | `:9200` | Адрес прослушивания для отладки | + +## Конфигурация хранилища + +Настройки хранилища для сохранения данных. 
+ +| Параметр | Тип | Значение по умолчанию | Описание | +|----------|-----|----------------------|-----------| +| `storage.data_dir` | string | - | Путь к директории, где будут храниться фракции | +| `storage.frac_size` | Bytes | `128MiB` | Максимальный размер активной фракции перед ее запечатыванием | +| `storage.total_size` | Bytes | `1GiB` | Верхняя граница дискового пространства, которое может быть занято запечатанными фракциями перед их удалением (или отгрузкой в remote хранилище) | + +## Конфигурация кластера + +Топология кластера и настройки репликации. + +| Параметр | Тип | Значение по умолчанию | Описание | +|----------|-----|----------------------|-----------| +| `cluster.write_stores` | []string | - | Экземпляры холодного хранилища, в которые будет производиться запись | +| `cluster.read_stores` | []string | - | Экземпляры холодного хранилища, из которых будут выполняться запросы | +| `cluster.hot_stores` | []string | - | Экземпляры хранилища, в которые будет производиться запись и из которых будут выполняться запросы | +| `cluster.hot_read_stores` | []string | - | Экземпляры хранилища, из которых будут выполняться запросы. Это поле опционально, но если указано, будет иметь приоритет над `cluster.hot_stores` | +| `cluster.replicas` | int | `1` | Количество экземпляров, принадлежащих одному шарду | +| `cluster.hot_replicas` | int | - | Количество горячих экземпляров, принадлежащих одному шарду. Если указано, будет иметь приоритет над `cluster.replicas` для горячих хранилищ | +| `cluster.shuffle_replicas` | bool | `false` | Перемешивать ли реплики | +| `cluster.mirror_address` | string | - | Хост, на который будут зеркалироваться поисковые запросы. Это может быть полезно, если у вас есть dev-кластер и вы хотите иметь такой же паттерн поиска, как на production-кластере | + +## Конфигурация медленных логов + +Пороговые значения для логирования медленных операций. 
+ +| Параметр | Тип | Значение по умолчанию | Описание | +|----------|-----|----------------------|-----------| +| `slow_logs.bulk_threshold` | Duration | `0ms` | Длительность для определения медленных bulk-запросов. Когда bulk-запрос превышает этот порог, он будет залогирован | +| `slow_logs.search_threshold` | Duration | `3s` | Длительность для определения медленных поисковых запросов. Когда поисковый запрос превышает этот порог, он будет залогирован | +| `slow_logs.fetch_threshold` | Duration | `3s` | Длительность для определения медленных fetch-запросов. Когда fetch-запрос превышает этот порог, он будет залогирован | + +## Конфигурация лимитов + +Ограничение скорости и ресурсов. + +### Общие лимиты + +| Параметр | Тип | Значение по умолчанию | Описание | +|----------|-----|----------------------|-----------| +| `limits.query_rate` | float64 | `2` | Максимальное количество запросов в секунду | +| `limits.search_requests` | int | `32` | Максимальное количество одновременных поисковых запросов | +| `limits.bulk_requests` | int | `32` | Максимальное количество одновременных bulk-запросов | +| `limits.inflight_bulks` | int | `32` | Максимальное количество одновременно обрабатываемых bulk-запросов | +| `limits.fraction_hits` | int | `6000` | Максимальное количество фракций, которые могут быть обработаны в рамках одного поискового запроса | +| `limits.search_docs` | int | `100000` | Максимальное количество документов, которые могут быть возвращены в рамках одного поискового запроса | +| `limits.doc_size` | Bytes | `128KiB` | Максимально возможный размер одного документа. Документы больше этого порога будут пропущены | + +### Лимиты агрегаций + +| Параметр | Тип | Значение по умолчанию | Описание | +|----------|-----|----------------------|-----------| +| `limits.aggregation.field_tokens` | int | `1000000` | Максимальное количество уникальных токенов полей, которые могут быть обработаны в одном запросе агрегации. 
Установка этого поля в 0 отключает лимит | +| `limits.aggregation.group_tokens` | int | `2000` | Максимальное количество уникальных токенов групп, которые могут быть обработаны в одном запросе агрегации. Установка этого поля в 0 отключает лимит | +| `limits.aggregation.fraction_tokens` | int | `100000` | Максимальное количество уникальных токенов, содержащихся в одной фракции, которая была выбрана запросом агрегации. Установка этого поля в 0 отключает лимит | + +## Конфигурация CircuitBreaker + +Настройки CircuitBreaker для bulk-операций. См. [документацию CircuitBreaker](https://github.com/ozontech/seq-db/blob/main/network/circuitbreaker/README.md) для дополнительной информации. + +| Параметр | Тип | Значение по умолчанию | Описание | +|----------|-----|----------------------|-----------| +| `circuit_breaker.bulk.shard_timeout` | Duration | `10s` | См. [CircuitBreaker](https://github.com/ozontech/seq-db/blob/main/network/circuitbreaker/README.md) для дополнительной информации | +| `circuit_breaker.bulk.err_percentage` | int | `50` | См. [CircuitBreaker](https://github.com/ozontech/seq-db/blob/main/network/circuitbreaker/README.md) для дополнительной информации | +| `circuit_breaker.bulk.bucket_width` | Duration | `1s` | См. [CircuitBreaker](https://github.com/ozontech/seq-db/blob/main/network/circuitbreaker/README.md) для дополнительной информации | +| `circuit_breaker.bulk.buckets_count` | int | `10` | См. [CircuitBreaker](https://github.com/ozontech/seq-db/blob/main/network/circuitbreaker/README.md) для дополнительной информации | +| `circuit_breaker.bulk.sleep_window` | Duration | `5s` | См. [CircuitBreaker](https://github.com/ozontech/seq-db/blob/main/network/circuitbreaker/README.md) для дополнительной информации | +| `circuit_breaker.bulk.volume_threshold` | int | `5` | См. 
[CircuitBreaker](https://github.com/ozontech/seq-db/blob/main/network/circuitbreaker/README.md) для дополнительной информации | + +## Конфигурация ресурсов + +Настройки распределения ресурсов. + +| Параметр | Тип | Значение по умолчанию | Описание | +|----------|-----|----------------------|-----------| +| `resources.reader_workers` | int | runtime.GOMAXPROCS | Количество воркеров для пула чтения. По умолчанию эта настройка равна runtime.GOMAXPROCS | +| `resources.search_workers` | int | runtime.GOMAXPROCS | Количество воркеров для пула поиска. По умолчанию эта настройка равна runtime.GOMAXPROCS | +| `resources.cache_size` | Bytes | 30% доступной RAM | Максимальный размер кэша. По умолчанию эта настройка равна 30% доступной оперативной памяти | +| `resources.sort_docs_cache_size` | Bytes | - | Размер кэша отсортированных документов | +| `resources.skip_fsync` | bool | `false` | Пропускать ли операции fsync | + +## Конфигурация сжатия + +Настройки уровня сжатия для различных типов данных. + +| Параметр | Тип | Значение по умолчанию | Описание | +|----------|-----|----------------------|-----------| +| `compression.docs_zstd_compression_level` | int | `1` | Уровень сжатия для документов | +| `compression.metas_zstd_compression_level` | int | `1` | Уровень сжатия для метаданных | +| `compression.sealed_zstd_compression_level` | int | `3` | Уровень сжатия для запечатанных фракций | +| `compression.doc_block_zstd_compression_level` | int | `3` | Уровень сжатия для блоков документов | + +## Конфигурация индексирования + +Настройки поведения индексирования документов. 
+ +| Параметр | Тип | Значение по умолчанию | Описание | +|----------|-----|----------------------|-----------| +| `indexing.max_token_size` | int | `72` | Максимальный размер токена | +| `indexing.case_sensitive` | bool | `false` | Учитывает ли индексирование регистр | +| `indexing.partial_field_indexing` | bool | `false` | Включить ли частичное индексирование полей | +| `indexing.past_allowed_time_drift` | Duration | `24h` | Сколько времени может пройти с момента временной метки сообщения. Если прошло больше времени, чем это значение, временная метка сообщения перезаписывается | +| `indexing.future_allowed_time_drift` | Duration | `5m` | Максимально допустимое смещение временной метки сообщения в будущее. Если временная метка сообщения находится дальше в будущем, чем это значение, она перезаписывается | + +## Конфигурация маппинга + +Конфигурация маппинга полей. + +| Параметр | Тип | Значение по умолчанию | Описание | +|----------|-----|----------------------|-----------| +| `mapping.path` | string | - | Путь к файлу маппинга или `auto` для индексирования всех полей как `keyword` | +| `mapping.enable_updates` | bool | `false` | Периодически проверять файл маппинга и перезагружать конфигурацию при наличии обновления | +| `mapping.update_period` | Duration | `30s` | Как часто файл маппинга будет проверяться на наличие обновлений | + +## Конфигурация сортировки документов + +Настройки функциональности сортировки документов. + +| Параметр | Тип | Значение по умолчанию | Описание | +|----------|-----|----------------------|-----------| +| `docs_sorting.enabled` | bool | `false` | Включает/отключает сортировку документов | +| `docs_sorting.doc_block_size` | Bytes | - | Устанавливает размер блока документов. Большой размер потребляет больше оперативной памяти, но улучшает коэффициент сжатия | + +## Конфигурация асинхронного поиска + +Конфигурация для асинхронных поисковых операций. 
+ +| Параметр | Тип | Значение по умолчанию | Описание | +|----------|-----|----------------------|-----------| +| `async_search.data_dir` | string | поддиректория в storage.data_dir | Директория, содержащая данные для асинхронных поисков. По умолчанию будет поддиректорией в `storage.data_dir` | +| `async_search.concurrency` | int | - | Уровень параллелизма для асинхронных поисков | +| `async_search.max_total_size` | Bytes | `1GiB` | - | +| `async_search.max_size_per_request` | Bytes | `100MiB` | - | + +## Конфигурация API + +Настройки, связанные с API. + +| Параметр | Тип | Значение по умолчанию | Описание | +|----------|-----|----------------------|-----------| +| `api.es_version` | string | `8.9.0` | Версия по умолчанию, которая будет возвращаться в обработчике `/` | + +## Конфигурация трассировки + +Настройки распределенной трассировки. + +| Параметр | Тип | Значение по умолчанию | Описание | +|----------|-----|----------------------|-----------| +| `tracing.sampling_rate` | float64 | `0.01` | Частота выборки для распределенной трассировки | + +## Примечания + +- **Bytes**: Значения размера могут быть указаны с единицами измерения, такими как `KiB`, `MiB`, `GiB` (например, `128MiB`) +- **Duration**: Временные значения могут быть указаны с единицами измерения, такими как `ms`, `s`, `m`, `h` (например, `3s`, `24h`) +- **Значения по умолчанию**: Поля без явных значений по умолчанию являются обязательными, если не отмечены как опциональные +- **Массивы**: Поля типа `[]string` принимают несколько значений diff --git a/docs/ru/02-flags.md b/docs/ru/02-flags.md deleted file mode 100644 index c98f4065..00000000 --- a/docs/ru/02-flags.md +++ /dev/null @@ -1,102 +0,0 @@ -# Flags - -## Flags for Ingestor and Store Modes - -### Basic Flags - -- **--help:** Show short help -- **--mode="ingestor":** Operation mode. You can choose between `ingestor` and `store` modes. - - In store mode, seq-db acts as a stateful replica responsible for storing its part of the data. 
Implements an internal API that is used to communicate between the store and ingestor.
-  - In ingestor mode, seq-db acts as a shard and replica coordinator. Implements a public client API. Does not have its own state.
-- **--addr=":9002":** Depending on the mode:
-  - For ingestor mode, this is the address of the public HTTP API (bulk method).
-  - For store mode, this is the address of the internal gRPC API. By default, port 9002 is used.
-- **--debug-addr=":9200":** Address for debugging requests (HTTP). Go metrics and profiling are sent to this address. By default, port 9200 is used.
-- **--tracing-probability=0.01:** Tracing probability.
-
-### Indexing Flags
-
-- **--mapping=MAPPING:** Path to the file with indexing parameters or value `auto`. See the corresponding section.
-- **--case-sensitive:** Token case sensitivity. By default, if not specified, the search is case-insensitive.
-- **--max-token-size=72:** Maximum token size.
-- **--partial-indexing:** By default, if the indexed value exceeds the maximum size, it is ignored and does not get into the index. If this parameter is set, the value will be truncated to the maximum length and indexed.
-
-## Flags for Ingestor Mode
-
-- **--proxy-grpc-addr=":9004":** Address for gRPC requests. By default, port 9004 is used.
-
-### Clustering Flags
-
-- **--write-stores=WRITE-STORES:** List of hosts for writing data. Specified as a string with values separated by commas. For example, `--write-stores=host1,host2,host3,host4`.
-- **--read-stores=READ-STORES:** List of hosts for reading data. If not set, `--write-stores` is used. Can be used, for example, in case of data migration to other hosts, when we write to some stores and read from other, old stores.
-- **--hot-stores=HOT-STORES:** List of `hot` storage hosts. If specified, the proxy works with 2 store clusters: cold (`--write-stores`) and hot (`--hot-stores`). And sends each write request to each of these clusters accordingly.
But when reading, it first tries to get data from the `hot` cluster and in some cases from the `cold` one.
-- **--hot-read-stores=HOT-READ-STORES:** List of `hot` storage hosts for reading. Can be used when migrating data to other hosts.
-- **--store-mode="":** Storage operating mode. If specified, the allowed values are `hot` or `cold`. If the `hot` mode is selected, then when executing a search query, if the requested data range is older than the oldest store documents, the service will return a special error `query wants old data`. Ingestor will make a request to the `cold` store cluster in this case.
-- **--replicas=1:** Replication factor for storages. If N is specified, the first N hosts in the list are replicas of one shard, the next N are replicas of another shard, etc.
-- **--hot-replicas=HOT-REPLICAS:** Replication factor for hot storages. If not specified, the global factor is used.
-- **--shuffle-replicas:** Shuffle replicas before performing a search. If not specified, the first replica is always read first, and the second one is used only if that fails.
-
-### Bulk Request Flags
-
-- **--bulk-shard-timeout=10s:** Timeout for processing a bulk operation by one shard.
-- **--bulk-err-percentage=50:** Error percentage for triggering the overload protection mechanism. See circuitbreaker/README.md for more details.
-- **--bulk-bucket-width=1s:** Window width for counting errors. See circuitbreaker/README.md for more details.
-- **--bulk-err-count=10:** Number of errors required to trigger bulk protection mechanism. See circuitbreaker/README.md for more details.
-- **--bulk-sleep-window=5s:** Time to wait after bulk protection mechanism is triggered. See circuitbreaker/README.md for more details.
-- **--bulk-request-volume-threshold=5:** Request volume threshold for bulk protection mechanism to be triggered. See circuitbreaker/README.md for more details.
-- **--max-inflight-bulks=32:** Maximum number of concurrent bulk requests that can be processed.
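The `--replicas` rule described in the clustering flags above (the first N hosts form one shard, the next N form the following shard) can be sketched in Go as shown here. This is a minimal illustration under stated assumptions, not code from seq-db: `groupShards` is a hypothetical helper, and rejecting a host count that is not a multiple of N is an assumption that the flag description does not state.

```go
package main

import "fmt"

// groupShards splits a flat host list into shards of `replicas` hosts each:
// the first N hosts form one shard, the next N form the next shard, and so on.
// Hypothetical helper for illustration; not taken from the seq-db sources.
func groupShards(hosts []string, replicas int) ([][]string, error) {
	if replicas < 1 {
		return nil, fmt.Errorf("replicas must be >= 1, got %d", replicas)
	}
	// Assumption: the host count must divide evenly into shards.
	if len(hosts)%replicas != 0 {
		return nil, fmt.Errorf("%d hosts cannot be split into shards of %d replicas", len(hosts), replicas)
	}
	shards := make([][]string, 0, len(hosts)/replicas)
	for i := 0; i < len(hosts); i += replicas {
		// Each consecutive window of `replicas` hosts is one shard.
		shards = append(shards, hosts[i:i+replicas])
	}
	return shards, nil
}
```

For example, calling `groupShards` with `host1,host2,host3,host4` and a replication factor of 2 yields two shards: `host1, host2` and `host3, host4`.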
-
-### Limits Flags
-
-- **--query-rate-limit=2.0:** Maximum request rate per second. `Search` and `fetch` requests are counted. If the limit is exceeded, the request will return an error.
-
-### Compression Flags
-
-- **--docs-zstd-compress-level=3:** ZSTD compression level for documents. More information can be found in the documentation: https://facebook.github.io/zstd/zstd_manual.html.
-- **--metas-zstd-compress-level=3:** ZSTD compression level for metadata. More information can be found in the documentation: https://facebook.github.io/zstd/zstd_manual.html.
-
-### Other Ingestor Flags
-
-- **--allowed-time-drift=24h:** Maximum allowed time since the document's timestamp.
-- **--future-allowed-time-drift=5m:** Maximum allowed offset of the document's timestamp into the future.
-- **--es-version="8.9.0":** Elasticsearch version to return in `/` handler.
-- **--mirror-addr="":** Seqproxy mirror address. Used for debugging and profiling with load mirroring.
-
-
-## Flags For Store Mode
-
-### Data Configuration Flags
-
-- **--data-dir=DATA-DIR:** Directory where data is stored.
-- **--frac-size=128MB:** Size of one fraction (minimum fragment of data on disk). The larger the fraction size, the more RAM is required for fraction buffering.
-- **--total-size=1GB:** Maximum size of all data. If the data becomes larger than this limit, the oldest data is deleted until the dataset size fits within the specified value.
-
-### Bulk Request Flags
-
-- **--requests-limit=16:** Maximum number of simultaneous bulk requests.
-- **--skip-fsync:** Skip fsync operations for the active fraction. Speeds up data insertion at the expense of a reduced data delivery guarantee.
-
-### Flags Affecting Search Performance
-
-- **--reader-workers=128:** The size of the pool of disk readers. For SSDs, it is recommended to set it equal to the number of cores.
-- **--search-workers-count=128:** The number of worker threads that will process the search by fraction.
It is recommended to set it equal to the number of cores.
-- **--search-requests-limit=30:** The maximum number of simultaneous search requests. If exceeded, the request will return an error.
-- **--search-fraction-limit=6000:** The maximum number of fractions used in the search. If the query requires reading more fractions, the query will return an error.
-- **--max-search-docs=100000:** The maximum number of documents returned by the search query.
-- **--cache-size=8GB:** The maximum cache size. Used when processing search and fetch requests.
-
-### Compression Flags
-
-- **--seal-zstd-compress-level=3:** ZSTD compression level for sealed data.
-
-### Aggregation Flags
-
-- **--agg-max-group-tokens=2000:** Maximum number of unique tokens for a grouping field affected by an aggregation query. Setting this value to 0 disables the limitation.
-- **--agg-max-field-tokens=1000000:** Maximum number of unique tokens for an aggregation function field affected by an aggregation query. Setting this value to 0 disables the limitation.
-- **--agg-max-fraction-tids=100000:** Maximum number of unique tokens per fraction for a grouping field or for an aggregation function calculation field. Setting this value to 0 disables the limitation.
-
-### Logging Flags
-
-- **--log-search-threshold-ms=3000:** `Search` query logging threshold in milliseconds. All queries that take longer than the specified time are written to the log.
-- **--log-fetch-threshold-ms=3000:** `Fetch` query logging threshold in milliseconds. All queries that take longer than the specified time are written to the log.
-- **--log-bulk-threshold-ms=LOG-BULK-THRESHOLD-MS:** `Bulk` query logging threshold in milliseconds. A record of such a query is written to the log.
diff --git a/docs/ru/08-rate-limiting.md b/docs/ru/08-rate-limiting.md
index c9824440..98727f6f 100644
--- a/docs/ru/08-rate-limiting.md
+++ b/docs/ru/08-rate-limiting.md
@@ -26,6 +26,5 @@ such requests by message ID.
This is implemented in `search_proxy.go`.
 
 ## How to enable the rate limiter
 
-The rate limiter can be enabled on launch using the `query-rate-limit` flag
+The rate limiter can be enabled on launch using the `limits.query_rate` option
 followed by a number -- the maximum number of queries allowed per second.
-The default value for this flag is `2.0`.
\ No newline at end of file
diff --git a/docs/ru/09-troubleshooting.md b/docs/ru/09-troubleshooting.md
index d5caf7ba..7a851a0a 100644
--- a/docs/ru/09-troubleshooting.md
+++ b/docs/ru/09-troubleshooting.md
@@ -37,9 +37,9 @@ used by the GC, %" (todo: график не работает). Рекоменд
 
 ### Reduce the value of the frac-size parameter
 
-The `frac-size` parameter defines the maximum amount of memory seq-db uses to hold data before sealing
+The `storage.frac_size` parameter defines the maximum amount of memory seq-db uses to hold data before sealing
 the active fraction. Lowering this parameter helps reduce memory consumption but may increase search time.
-Details about the `frac-size` parameter are available in the [settings documentation](02-flags.md#data-configuration-flags).
+Details about the `storage.frac_size` parameter are available in the [settings documentation](02-configuration.md#storage-configuration).
 
 ## Problem: Slow search
 
diff --git a/docs/ru/internal/frac-cache.md b/docs/ru/internal/frac-cache.md
index ba7eb7d2..b4aadeef 100644
--- a/docs/ru/internal/frac-cache.md
+++ b/docs/ru/internal/frac-cache.md
@@ -6,7 +6,7 @@ SeqDB stores the indexed data in special structures named *fractions*, each frac
 
 A fraction that is currently being written to is called an **active fraction**. A **sealed fraction** is a read-only fraction that is not going to be modified by the database in any way. A sealed fraction is only going to be read from.
 
-When an *active* fraction reaches a certain size (configured by the `frac-size` flag), the *active* fraction is turned into a *sealed* one; this process is named **sealing**. The database must have **only one active fraction** at any given time. (TBD link to sealing)
+When an *active* fraction reaches a certain size (configured by the `storage.frac_size` option), the *active* fraction is turned into a *sealed* one; this process is named **sealing**. The database must have **only one active fraction** at any given time. (TBD link to sealing)
 
 Both sealed and active fractions have the following *metadata*:
 - `Name` - fraction name