chalk-ai · AndyMoreland · Oct 23, 2025
diff --git a/materialized_aggregations.mdx b/materialized_aggregations.mdx
@@ -82,7 +82,7 @@ The following table lists the supported aggregations along with some notes.
 |`count` | |
 |`std` | Standard deviation. Requires at least 2 values. |
 |`var` | Variance. Same requirements as `std`. |
-|`approx_count_distinct` | An approximation of the cardinality of non-null data. |
+|`approx_count_distinct` | An approximation of the cardinality of non-null data. Uses [Apache DataSketches CPC](https://datasketches.apache.org/docs/CPC/CpcSketches.html). |
 
 These aggregations can be applied to DataFrame features that represent a [has-many](/docs/has-many) join relationship
 between two feature classes. Typically, these joins can be defined using a join key, like in our previous example:
@@ -153,6 +153,23 @@ class User:
 
 ---
 
+## Approximate Count Distinct
+
+The `approx_count_distinct` aggregation provides an efficient way to estimate the number of unique values in your data
+using the [Compressed Probabilistic Counting (CPC) sketch](https://datasketches.apache.org/docs/CPC/CpcSketches.html)
+algorithm from Apache DataSketches.
+
+### Why use approximate count distinct?
+
+Computing exact distinct counts for large datasets can be memory-intensive and slow, especially for materialized
+aggregations where you need to track uniqueness across many time buckets. The CPC sketch algorithm provides:
+
+- **Memory efficiency**: A sketch uses significantly less memory than storing every unique value.
+- **Mergeable sketches**: Partial aggregates from different buckets can be combined efficiently, without rescanning the underlying data.
+- **High accuracy**: Estimates have low relative error (typically under 2% for reasonable sketch sizes).
+
+---
+
 ## How do I use materialized aggregations with Chalk?
 
 Users can materialize a feature aggregation in Chalk by supplying the [`materialization`](/api-docs#windowed.materialization)