Commit d718397

Clustering: Advertise recommended shard size of 5-50 GB
Before, the upper limit was advertised as 100 GB.
1 parent 9ee24ef commit d718397

2 files changed: 7 additions, 7 deletions
docs/admin/sharding-partitioning.rst

Lines changed: 6 additions & 6 deletions
@@ -111,7 +111,7 @@ cluster.
 Over-sharding and over-partitioning are common flaws leading to an overall
 poor performance.

-**As a rule of thumb, a single shard should hold somewhere between 5 - 100
+**As a rule of thumb, a single shard should hold somewhere between 5 - 50
 GB of data.**

 To avoid oversharding, CrateDB by default limits the number of shards per
@@ -129,7 +129,7 @@ benchmarks across various strategies. The following steps provide a general guid
 - Calculate the throughput

 Then, to calculate the number of shards, you should consider that the size of each
-shard should roughly be between 5 - 100 GB, and that each node can only manage
+shard should roughly be between 5 - 50 GB, and that each node can only manage
 up to 1000 shards.

 Time series example
@@ -146,12 +146,12 @@ time series data with the following assumptions:
 Given the daily throughput is around 10 GB/day, the monthly throughput is 30 times
 that (~ 300 GB). The partition column can be day, week, month, quarter, etc. So,
 assuming a monthly partition, the next step is to calculate the number of shards
-with the **shard size recommendation** (5 - 100 GB) and the **number of nodes** in
+with the **shard size recommendation** (5 - 50 GB) and the **number of nodes** in
 the cluster in mind.

-With three shards, each shard will hold 100 GB (300 GB / 3 shards), which is too
-close to the upper limit. With six shards, each shard will manage 50 GB
-(300 GB / 6 shards) of data, which is closer to the recommended size range (5 - 100 GB).
+With three shards, each shard would hold 100 GB (300 GB / 3 shards), which is above
+the upper limit. With six shards, each shard will manage 50 GB (300 GB / 6 shards)
+of data, which is right on the spot.

 .. code-block:: psql

docs/feature/cluster/index.md

Lines changed: 1 addition & 1 deletion
@@ -111,7 +111,7 @@ data loss, and to improve read performance.
 ## Synopsis
 With a monthly throughput of 300 GB, partitioning your table by month,
 and using six shards, each shard will manage 50 GB of data, which is
-within the recommended size range (5 - 100 GB).
+within the recommended size range (5 - 50 GB).

 Through replication, the table will store three copies of your data,
 in order to reduce the chance of permanent data loss.
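
The sizing arithmetic in the changed text (10 GB/day, monthly partitions of about 300 GB, 5 - 50 GB per shard) can be checked with a short sketch. The helper names and constants below are hypothetical and exist only for illustration; they are not part of CrateDB or of this commit, and the sketch ignores the per-node shard limit.

```python
import math

# Recommended shard size range from the updated docs (assumed constants).
MIN_SHARD_GB = 5
MAX_SHARD_GB = 50


def shard_size_gb(partition_gb, shards):
    """Size of each shard when a partition is split evenly across shards."""
    return partition_gb / shards


def shard_count_range(partition_gb):
    """Return (fewest, most) shards that keep every shard within 5 - 50 GB."""
    fewest = max(1, math.ceil(partition_gb / MAX_SHARD_GB))
    most = max(1, math.floor(partition_gb / MIN_SHARD_GB))
    return fewest, most


# 10 GB/day over a 30-day month gives roughly 300 GB per monthly partition.
monthly_gb = 10 * 30

print(shard_size_gb(monthly_gb, 3))   # 100.0, above the 50 GB upper limit
print(shard_size_gb(monthly_gb, 6))   # 50.0, exactly at the upper limit
print(shard_count_range(monthly_gb))  # (6, 60)
```

Under these assumptions, six shards is the smallest count that keeps a 300 GB monthly partition within the recommended range, which matches the revised wording.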
