@AndyMoreland
Contributor

Summary

  • Documents the Apache DataSketches CPC (Compressed Probabilistic Counting) algorithm used for approx_count_distinct
  • Adds a reference link to the Apache DataSketches CPC documentation in the aggregations table
  • Adds a dedicated section explaining the implementation and its benefits, including memory efficiency, mergeability, and accuracy

This documentation helps users understand when and why to use approx_count_distinct for materialized aggregations, especially for tracking cardinality across large datasets and many time buckets.
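
To make the mergeability point concrete for reviewers, here is a minimal sketch of why a mergeable distinct-count sketch suits per-time-bucket aggregation. It does not use CPC or the DataSketches API. It swaps in a much simpler K-minimum-values (KMV) sketch, and `KMVSketch`, its parameter `k`, and the `user-N` keys are hypothetical names chosen for illustration only. The property shown (merging two bucket sketches gives the same state as sketching the combined data, with no double counting of overlapping values) is the one the new section attributes to approx_count_distinct.

```python
import hashlib


class KMVSketch:
    """K-minimum-values distinct-count sketch (illustrative stand-in, not CPC)."""

    def __init__(self, k=256):
        self.k = k
        self.mins = set()  # the k smallest normalized hash values seen so far

    @staticmethod
    def _hash(value):
        # Map a value to a pseudo-uniform float in [0, 1).
        digest = hashlib.sha256(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def _trim(self):
        # Keep only the k smallest hashes, so memory stays bounded by k.
        if len(self.mins) > self.k:
            self.mins = set(sorted(self.mins)[: self.k])

    def update(self, value):
        self.mins.add(self._hash(value))
        self._trim()

    def merge(self, other):
        # Union the retained hashes and trim; duplicates collapse naturally.
        merged = KMVSketch(self.k)
        merged.mins = self.mins | other.mins
        merged._trim()
        return merged

    def estimate(self):
        if len(self.mins) < self.k:
            return float(len(self.mins))  # exact while under k distinct values
        return (self.k - 1) / max(self.mins)


# Two hourly buckets whose user sets overlap (users 4000-5999 appear in both).
hour1, hour2 = KMVSketch(), KMVSketch()
for u in range(0, 6000):
    hour1.update(f"user-{u}")
for u in range(4000, 10000):
    hour2.update(f"user-{u}")

# Approximate distinct users across both hours, without rescanning raw rows.
both_hours = hour1.merge(hour2)
print(round(both_hours.estimate()))
```

Summing the two per-hour estimates would count the overlapping users twice; merging the sketches does not, which is what makes rollups across many time buckets cheap. CPC offers the same merge property with a more compact and more accurate representation than this stand-in.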

Test plan

  • Verify documentation renders correctly at localhost:3001/docs/materialized_aggregations
  • Confirm the Apache DataSketches link works
  • Check that the new section flows well with existing content

🤖 Generated with Claude Code

Document the Apache DataSketches CPC (Compressed Probabilistic Counting) algorithm used for the approx_count_distinct aggregation.

Changes:
- Add reference link to Apache DataSketches CPC documentation in aggregations table
- Add new section explaining the implementation and benefits of approx_count_distinct
- Clarify memory efficiency, mergeability, and accuracy characteristics

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>