docs/en/guides/55-performance/00-cluster-key.md
+6 -7 (6 additions & 7 deletions)
@@ -28,7 +28,9 @@ If your primary queries involve retrieving cities based on their temperature, se
Rows are sorted based on the Temperature column in each block (file). However, temperature ranges can overlap between blocks. If a query falls within the overlapping range of several blocks, it must read all of those blocks. The number of blocks involved in this situation is referred to as the "Depth." Therefore, the smaller the depth, the better: fewer relevant blocks to read means better query performance.
-To see how well a table is clustered, use the function [CLUSTERING_INFORMATION](/sql/sql-functions/system-functions/clustering_information). For example,
+To see how well a table is clustered, use the function [CLUSTERING_INFORMATION](/sql/sql-functions/system-functions/clustering_information).
+**Note**: This function works only for clustered tables.
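
As a hedged illustration of the call described above, the sketch below inspects a clustered table. The database name `default` and table name `sales` are hypothetical, and the two-argument form (database name, then table name) is an assumption to verify against the linked function reference for your version.

```sql
-- Assumes a clustered table named `sales` exists in the `default` database
-- (both names are placeholders for illustration).
SELECT * FROM clustering_information('default', 'sales');
```

Running this against a table without a cluster key is expected to fail, in line with the note above.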
On the other hand, if filtering commonly occurs based on `region` and `product_category`, then clustering the table using both columns would be beneficial:
@@ -84,8 +85,7 @@ CREATE TABLE sales (
region VARCHAR,
product_category VARCHAR,
-- Other columns...
-CLUSTER BY (region, product_category)
-);
+) CLUSTER BY (region, product_category);
```
When choosing a column as the cluster key, ensure that the number of distinct values strikes a balance between being sufficient for effective query performance and being manageable for optimal storage within the system.
@@ -104,8 +104,7 @@ CREATE TABLE sales (
region VARCHAR,
product_category VARCHAR,
-- Other columns...
-CLUSTER BY (SUBSTRING(order_id,7,8))
-);
+) CLUSTER BY (SUBSTRING(order_id,7,8));
```
By clustering the table using the extracted date from the `order_id` column, transactions occurring on the same day are now grouped into the same or adjacent blocks. This frequently results in improved compression and a reduction in the volume of data that must be read from storage during query execution, contributing to enhanced overall performance.
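
The kind of query this clustering is meant to help can be sketched as follows. This is only an illustration: it assumes that characters 7 through 14 of `order_id` encode the order date as `YYYYMMDD`, which the text implies but does not state, and the date literal is made up.

```sql
-- Hypothetical lookup of one day's transactions, filtering on the same
-- expression used as the cluster key. The date format is an assumption.
SELECT region, product_category
FROM sales
WHERE SUBSTRING(order_id, 7, 8) = '20240115';
```

Whether the engine can skip blocks for a filter written this way depends on how it matches the predicate to the cluster key expression, so treat the benefit as something to measure rather than assume.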