-
Notifications
You must be signed in to change notification settings - Fork 18
Open
Description
Summary
Add decomposition support for statistical aggregation functions, enabling pre-aggregation of variance, standard deviation, covariance, and correlation metrics in materialized cubes.
Implementation
Variance & Standard Deviation (Single-column)
Mathematical basis: VAR_POP = E[X²] - E[X]²
| Function | Components | Combine Formula |
|---|---|---|
VAR_POP(x) |
sum, sum_sq, count | (sum_sq/n) - (sum/n)² |
VAR_SAMP(x) |
sum, sum_sq, count | (n*sum_sq - sum²) / (n*(n-1)) |
STDDEV_POP(x) |
sum, sum_sq, count | SQRT(VAR_POP) |
STDDEV_SAMP(x) |
sum, sum_sq, count | SQRT(VAR_SAMP) |
Component definitions:
ComponentDef("_sum", "SUM", "SUM") # sum of x
ComponentDef("_sum_sq", "SUM(POWER({}, 2))", "SUM") # sum of x²
ComponentDef("_count", "COUNT", "SUM") # countCovariance (Two-column)
Mathematical basis: COVAR_POP = E[XY] - E[X]*E[Y]
| Function | Components | Combine Formula |
|---|---|---|
COVAR_POP(x, y) |
sum_x, sum_y, sum_xy, count | (sum_xy/n) - (sum_x/n)*(sum_y/n) |
COVAR_SAMP(x, y) |
sum_x, sum_y, sum_xy, count | (n*sum_xy - sum_x*sum_y) / (n*(n-1)) |
Component definitions:
ComponentDef("_sum_x", "SUM({0})", "SUM", arg_index=0)
ComponentDef("_sum_y", "SUM({1})", "SUM", arg_index=1)
ComponentDef("_sum_xy", "SUM({0} * {1})", "SUM", arg_index=None) # product of args
ComponentDef("_count", "COUNT({0})", "SUM", arg_index=0)Correlation (Two-column)
Mathematical basis: CORR = COVAR(X,Y) / (STDDEV(X) * STDDEV(Y))
| Function | Components | Notes |
|---|---|---|
CORR(x, y) |
sum_x, sum_y, sum_x_sq, sum_y_sq, sum_xy, count | 6 components |
Component definitions:
ComponentDef("_sum_x", "SUM({0})", "SUM", arg_index=0)
ComponentDef("_sum_y", "SUM({1})", "SUM", arg_index=1)
ComponentDef("_sum_x_sq", "SUM(POWER({0}, 2))", "SUM", arg_index=0)
ComponentDef("_sum_y_sq", "SUM(POWER({1}, 2))", "SUM", arg_index=1)
ComponentDef("_sum_xy", "SUM({0} * {1})", "SUM", arg_index=None)
ComponentDef("_count", "COUNT({0})", "SUM", arg_index=0)Combine formula:
numerator = n * sum_xy - sum_x * sum_y
var_x = n * sum_x_sq - sum_x²
var_y = n * sum_y_sq - sum_y²
CORR = numerator / SQRT(var_x * var_y)
Metadata
Metadata
Assignees
Labels
No labels