diff --git a/README.md b/README.md
index 3b68525d..8c6241fc 100644
--- a/README.md
+++ b/README.md
@@ -3,6 +3,7 @@
 ## Overview
 [![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://GitHub.com/ozontech/seq-db/graphs/commit-activity)
 [![CI](https://github.com/ozontech/seq-db/actions/workflows/ci.yml/badge.svg)](https://github.com/ozontech/seq-db/actions/workflows/go.yml)
+![Telegram](https://telegram-badge.vercel.app/api/telegram-badge?channelId=@file_d_community)
 [![Code coverage](https://codecov.io/github/ozontech/seq-db/coverage.svg?branch=main)](https://codecov.io/github/ozontech/seq-db?branch=main)
 [![GitHub go.mod Go version of a Go module](https://img.shields.io/github/go-mod/go-version/ozontech/seq-db)](https://github.com/ozontech/seq-db)
 [![GoReportCard example](https://goreportcard.com/badge/github.com/ozontech/seq-db)](https://goreportcard.com/report/github.com/ozontech/seq-db)
diff --git a/docs/en/07-long-term-store.md b/docs/en/07-long-term-store.md
deleted file mode 100644
index bd160da8..00000000
--- a/docs/en/07-long-term-store.md
+++ /dev/null
@@ -1,93 +0,0 @@
----
-id: long-term-store
----
-
-# Long term stores
-
-## Problem
-Currently seq-db is using SSD storage to ensure good performance for users.
-But SSD storage is limited, so we can't store a lot of historical data.
-At the same time a small number of requests want to get historical data
-for a long period of time.
-
-## Solution
-Natural solution to this is to introduce long term (cold) stores with
-large storage (possibly HDD). So, data should be written to both types of
-stores. Most reads go to hot store, but reads for long periods should go
-to long term stores.
-
-## Stores
-Ingestor knows several types of stores:
-- hot stores (always used for write, used for search only if hot read stores are not enabled)
-- hot read stores
-- long term (cold) stores (always used for write, used for search only if cold read stores are not enabled)
-- long term (cold) read stores
-
-### Read stores
-Read mode of stores is needed for migration.
-
-Since write operation fails on single write failure it is necessary to exclude machine to be migrated from the write list.
-It is done by enabling read stores (hot/cold respectively). If read stores are set, querying is done only through them and
-regular stores continue to be used only for write.
-
-Thus to move a regular (hot/cold) store `M` to another machine the pattern is:
-- enable read stores (move all stores to be queried to the read list, including M)
-- exclude M from regular list
-- restart ingestor
-- shutdown M. Since M is excluded from write, write operations will not fail
-- migrate store
-- disable read stores, return M to regular list
-- restart ingestor
-
-### Write
-When data is written (bulk send), it is first sent to hot stores, then to cold stores. Error in writing to any of them results an overall error.
-Currently data can be saved in long term store, but not in hot (TODO: fix).
-
-### Querying
-On search hot stores are queried first.
-
-Hot stores refuse to search if `From` field is less than the oldest MID on this store
-(it means search may ask for data that is already rotated on the store). Ingestor
-receives an error and in case it has long term stores configured, queries them.
-Both hot and cold store can return partial response, and it will be considered valid.
-For now there are no error type checking because it is not trivial for GRPC,
-this will be implemented in the future.
-
-### Avoid old docs in hot store
-There is a problem that a doc with very old timestamp may be submitted to hot store.
-This doc will have very low seq.MID and sooner or later it will become the oldest
-MID in the store. This will result a behavior that hot store will answer to wider
-range of queries, when normally they should be sent to long term store.
-
-To avoid this need to make an important change to bulk process. There is now a special check of time field
-and three possible outcomes:
-- Time field exists and has a correct value (not older than 24h from now):
-  in this case no changes to doc and seq.MID calculated from this value.
-- Time field does not exist: doc is not changed, seq.MID calculated
-  from time.Now()
-- Time field holds very old value (more than 24h from now): in this
-  case doc is changed, value of time field is changed to time.Now(), seq.MID
-  is calculated from that. Original timestamp is stored in field
-  `original_timestamp`, this field is overwritten if exists.
-
-## Deploy
-As we don't have long term store right now, deploy will be done in several steps
-to avoid interruption of service.
-
-### First step
-Current stores become hot stores, but hot mode is not enabled. In this case
-behavior is the same as in older code. This removes an option to have read/write
-stores separately, but it's necessary.
-
-### Second step
-This step may be done together with the first step. We create new stores with
-large storage and add them to ingestors as write-stores. Now ingestors write
-data to both types of stores, but read queries will go to hot stores only.
-After that we need to wait for long term stores to have data at least for
-the same period as hot stores.
-
-### Third step
-Now we can add all long term stores as read stores and enable hot store-mode
-for hot stores. This effectively enables a new scheme, when hot store can return
-error and query will go to read (long term) stores. As before, write stores should
-be a subset of read stores.
diff --git a/docs/en/08-rate-limiting.md b/docs/en/08-rate-limiting.md
deleted file mode 100644
index 31f40215..00000000
--- a/docs/en/08-rate-limiting.md
+++ /dev/null
@@ -1,34 +0,0 @@
----
-id: rate-limiting
----
-
-# Rate limiting requests
-
-Obviously there is a need to rate limit some requests from users or other
-services. Right now we use simple internal implementation of RateLimiter,
-see `network/ratelimiter.go`, it is enough
-for current tasks. Following sections describe the use cases for
-rate limiter.
-
-## Rate limiting search queries
-
-Because of bugs in UI or script automation there is a possibility of
-repeating the same search query multiple times. Search query may create
-a significant load on stores, and to evade useless work, search queries
-are rate limited by stores. Two queries are considered identical if they
-have same query string, aggregation and interval. This is implemented in
-`search_store.go`.
-
-## Rate limiting document fetching
-
-There are 2 cases of document fetching, first is made after search query
-found IDs and fetching is needed to return results to user. Second is
-when document is directly requested from API on ingestor. Second way
-is vulnerable to DDOS kind of attack, because fetching by ID is not
-simple operation for now. So rate limiter is implemented to throttle
-such requests by message ID. This is implemented in
-`search_proxy.go`.
-
-## How to enable the rate limiter
-The rate limiter can be enabled on launch using the `limits.query_rate` option
-followed by a number -- the maximum number of queries allowed per second.
diff --git a/docs/ru/07-long-term-store.md b/docs/ru/07-long-term-store.md
deleted file mode 100644
index f1262af5..00000000
--- a/docs/ru/07-long-term-store.md
+++ /dev/null
@@ -1,89 +0,0 @@
-# Long term stores
-
-## Problem
-Currently seq-db is using SSD storage to ensure good performance for users.
-But SSD storage is limited, so we can't store a lot of historical data.
-At the same time a small number of requests want to get historical data
-for a long period of time.
-
-## Solution
-Natural solution to this is to introduce long term (cold) stores with
-large storage (possibly HDD). So, data should be written to both types of
-stores. Most reads go to hot store, but reads for long periods should go
-to long term stores.
-
-## Stores
-Ingestor knows several types of stores:
-- hot stores (always used for write, used for search only if hot read stores are not enabled)
-- hot read stores
-- long term (cold) stores (always used for write, used for search only if cold read stores are not enabled)
-- long term (cold) read stores
-
-### Read stores
-Read mode of stores is needed for migration.
-
-Since write operation fails on single write failure it is necessary to exclude machine to be migrated from the write list.
-It is done by enabling read stores (hot/cold respectively). If read stores are set, querying is done only through them and
-regular stores continue to be used only for write.
-
-Thus to move a regular (hot/cold) store `M` to another machine the pattern is:
-- enable read stores (move all stores to be queried to the read list, including M)
-- exclude M from regular list
-- restart ingestor
-- shutdown M. Since M is excluded from write, write operations will not fail
-- migrate store
-- disable read stores, return M to regular list
-- restart ingestor
-
-### Write
-When data is written (bulk send), it is first sent to hot stores, then to cold stores. Error in writing to any of them results an overall error.
-Currently data can be saved in long term store, but not in hot (TODO: fix).
-
-### Querying
-On search hot stores are queried first.
-
-Hot stores refuse to search if `From` field is less than the oldest MID on this store
-(it means search may ask for data that is already rotated on the store). Ingestor
-receives an error and in case it has long term stores configured, queries them.
-Both hot and cold store can return partial response, and it will be considered valid.
-For now there are no error type checking because it is not trivial for GRPC,
-this will be implemented in the future.
-
-### Avoid old docs in hot store
-There is a problem that a doc with very old timestamp may be submitted to hot store.
-This doc will have very low seq.MID and sooner or later it will become the oldest
-MID in the store. This will result a behavior that hot store will answer to wider
-range of queries, when normally they should be sent to long term store.
-
-To avoid this need to make an important change to bulk process. There is now a special check of time field
-and three possible outcomes:
-- Time field exists and has a correct value (not older than 24h from now):
-  in this case no changes to doc and seq.MID calculated from this value.
-- Time field does not exist: doc is not changed, seq.MID calculated
-  from time.Now()
-- Time field holds very old value (more than 24h from now): in this
-  case doc is changed, value of time field is changed to time.Now(), seq.MID
-  is calculated from that. Original timestamp is stored in field
-  `original_timestamp`, this field is overwritten if exists.
-
-## Deploy
-As we don't have long term store right now, deploy will be done in several steps
-to avoid interruption of service.
-
-### First step
-Current stores become hot stores, but hot mode is not enabled. In this case
-behavior is the same as in older code. This removes an option to have read/write
-stores separately, but it's necessary.
-
-### Second step
-This step may be done together with the first step. We create new stores with
-large storage and add them to ingestors as write-stores. Now ingestors write
-data to both types of stores, but read queries will go to hot stores only.
-After that we need to wait for long term stores to have data at least for
-the same period as hot stores.
-
-### Third step
-Now we can add all long term stores as read stores and enable hot store-mode
-for hot stores. This effectively enables a new scheme, when hot store can return
-error and query will go to read (long term) stores. As before, write stores should
-be a subset of read stores.
diff --git a/docs/ru/08-rate-limiting.md b/docs/ru/08-rate-limiting.md
deleted file mode 100644
index 98727f6f..00000000
--- a/docs/ru/08-rate-limiting.md
+++ /dev/null
@@ -1,30 +0,0 @@
-# Rate limiting requests
-
-Obviously there is a need to rate limit some requests from users or other
-services. Right now we use simple internal implementation of RateLimiter,
-see `network/ratelimiter.go`, it is enough
-for current tasks. Following sections describe the use cases for
-rate limiter.
-
-## Rate limiting search queries
-
-Because of bugs in UI or script automation there is a possibility of
-repeating the same search query multiple times. Search query may create
-a significant load on stores, and to evade useless work, search queries
-are rate limited by stores. Two queries are considered identical if they
-have same query string, aggregation and interval. This is implemented in
-`search_store.go`.
-
-## Rate limiting document fetching
-
-There are 2 cases of document fetching, first is made after search query
-found IDs and fetching is needed to return results to user. Second is
-when document is directly requested from API on ingestor. Second way
-is vulnerable to DDOS kind of attack, because fetching by ID is not
-simple operation for now. So rate limiter is implemented to throttle
-such requests by message ID. This is implemented in
-`search_proxy.go`.
-
-## How to enable the rate limiter
-The rate limiter can be enabled on launch using the `limits.query_rate` option
-followed by a number -- the maximum number of queries allowed per second.