
Commit 308b5e9

Update 00-cluster-key.md
1 parent a975ffa commit 308b5e9


1 file changed (+6 -7 lines)


docs/en/guides/55-performance/00-cluster-key.md

Lines changed: 6 additions & 7 deletions
````diff
@@ -28,7 +28,9 @@ If your primary queries involve retrieving cities based on their temperature, se
 
 Rows are sorted based on the Temperature column in each block (file). However, there can be overlapping age ranges between blocks. If a query falls precisely within the overlapping range of blocks, it requires reading multiple blocks. The number of blocks involved in this situation is referred to as the "Depth." Therefore, the smaller the depth, the better. This implies that having fewer relevant blocks to read during queries enhances query performance.
 
-To see how well a table is clustered, use the function [CLUSTERING_INFORMATION](/sql/sql-functions/system-functions/clustering_information). For example,
+To see how well a table is clustered, use the function [CLUSTERING_INFORMATION](/sql/sql-functions/system-functions/clustering_information).
+**Note**: This function works only for clustered tables.
+For example,
 
 ```sql
 SELECT * FROM clustering_information('default','T');
@@ -68,8 +70,7 @@ CREATE TABLE sales (
 region VARCHAR,
 product_category VARCHAR,
 -- Other columns...
-CLUSTER BY (order_id)
-);
+) CLUSTER BY (order_id);
 ```
 
 On the other hand, if filtering commonly occurs based on `region` and `product_category`, then clustering the table using both columns would be beneficial:
@@ -84,8 +85,7 @@ CREATE TABLE sales (
 region VARCHAR,
 product_category VARCHAR,
 -- Other columns...
-CLUSTER BY (region, product_category)
-);
+) CLUSTER BY (region, product_category);
 ```
 
 When choosing a column as the cluster key, ensure that the number of distinct values strikes a balance between being sufficient for effective query performance and being manageable for optimal storage within the system.
@@ -104,8 +104,7 @@ CREATE TABLE sales (
 region VARCHAR,
 product_category VARCHAR,
 -- Other columns...
-CLUSTER BY (SUBSTRING(order_id,7,8))
-);
+) CLUSTER BY (SUBSTRING(order_id,7,8));
 ```
 
 By clustering the table using the extracted date from the `order_id` column, transactions occurring on the same day are now grouped into the same or adjacent blocks. This frequently results in improved compression and a reduction in the volume of data that must be read from storage during query execution, contributing to enhanced overall performance.
````
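The last hunk clusters on `SUBSTRING(order_id,7,8)`. SQL `SUBSTRING` positions are 1-based, so this expression takes the 8 characters starting at the 7th. The diff does not show the real `order_id` format; assuming a hypothetical layout such as `ORDER-20230815-0001`, those 8 characters are a `YYYYMMDD` date. Fixed-width `YYYYMMDD` strings sort lexicographically in chronological order, which is why clustering on the extracted string can group orders by day. The Python sketch below only mirrors that extraction; `sql_substring` is a hypothetical helper, not a Databend API.

```python
def sql_substring(s: str, pos: int, length: int) -> str:
    """Mimic SQL SUBSTRING(s, pos, length) for a 1-based pos >= 1 (hypothetical helper)."""
    if pos < 1 or length < 0:
        raise ValueError("this sketch only handles pos >= 1 and length >= 0")
    start = pos - 1  # convert the 1-based SQL position to a 0-based Python index
    return s[start:start + length]


# Hypothetical order IDs of the assumed form ORDER-YYYYMMDD-NNNN.
orders = ["ORDER-20230816-0002", "ORDER-20230815-0007", "ORDER-20230815-0001"]
keys = [sql_substring(o, 7, 8) for o in orders]

print(keys)          # ['20230816', '20230815', '20230815']
print(sorted(keys))  # ['20230815', '20230815', '20230816']
```

Orders from the same day produce identical keys, so sorting by the extracted key places them next to each other, matching the grouping the paragraph describes.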

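The context paragraph in the first hunk defines depth as the number of blocks a query must read when its value falls where block ranges overlap. As a simplified model only (not how Databend's `CLUSTERING_INFORMATION` computes its statistics), the sketch below represents each block by a hypothetical `(min, max)` of its cluster key. `depth_at` counts the blocks containing one value, and `max_depth` finds the worst-case overlap with a sweep line over range endpoints.

```python
from typing import List, Tuple

# Each block is modeled only by the (min, max) of its cluster-key values.
Block = Tuple[int, int]


def depth_at(blocks: List[Block], value: int) -> int:
    """Number of blocks whose key range contains `value` (blocks a point query must read)."""
    return sum(1 for lo, hi in blocks if lo <= value <= hi)


def max_depth(blocks: List[Block]) -> int:
    """Largest number of block ranges overlapping at any single point (sweep line)."""
    events = []
    for lo, hi in blocks:
        events.append((lo, 0))  # range opens; 0 sorts before 1, so touching ranges overlap
        events.append((hi, 1))  # range closes
    events.sort()
    current = best = 0
    for _, kind in events:
        if kind == 0:
            current += 1
            best = max(best, current)
        else:
            current -= 1
    return best


# Hypothetical Temperature ranges per block.
well_clustered = [(0, 10), (11, 20), (21, 30)]
poorly_clustered = [(0, 30), (5, 25), (10, 20)]

print(depth_at(well_clustered, 15), max_depth(well_clustered))      # 1 1
print(depth_at(poorly_clustered, 15), max_depth(poorly_clustered))  # 3 3
```

When block ranges do not overlap, a lookup touches one block. When every block spans most of the value range, the same lookup touches all three, which is the "smaller depth is better" point the paragraph makes.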