From 277a3e11d7d43407f3d1eae3e5bea7c94a9f8c3d Mon Sep 17 00:00:00 2001
From: Mano Toth
Date: Mon, 24 Nov 2025 14:47:20 +0100
Subject: [PATCH 1/2] Explain limit behavior with summarize

---
 apl/tabular-operators/limit-operator.mdx     |  4 ++
 apl/tabular-operators/summarize-operator.mdx | 41 ++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/apl/tabular-operators/limit-operator.mdx b/apl/tabular-operators/limit-operator.mdx
index f3ce56d38..f2d95499e 100644
--- a/apl/tabular-operators/limit-operator.mdx
+++ b/apl/tabular-operators/limit-operator.mdx
@@ -62,6 +62,10 @@ SELECT * FROM sample_http_logs LIMIT 10;
 
 The `limit` operator returns the top **`N`** rows from the input dataset. If fewer than **`N`** rows are available, all rows are returned.
 
+
+When using `limit` with `summarize` where the first grouping expression is a time bin, the limit behavior differs: Axiom computes the global top **N** groups across all time buckets, then limits each time bucket to only include those groups. This can result in more than **N** total rows. For more information, see [summarize](/apl/tabular-operators/summarize-operator#limit-behavior-with-time-binning).
+
+
 ## Use case examples
 
diff --git a/apl/tabular-operators/summarize-operator.mdx b/apl/tabular-operators/summarize-operator.mdx
index ca31da1af..75c67de04 100644
--- a/apl/tabular-operators/summarize-operator.mdx
+++ b/apl/tabular-operators/summarize-operator.mdx
@@ -178,6 +178,47 @@ Returns a table that shows the heatmap in each interval [0, 30], [30, 20, 10], a
 
 [Run in Playground](https://play.axiom.co/axiom-play-qf1k/query?initForm=%7B%22apl%22%3A%22%5B%27github-push-event%27%5D%20%7C%20where%20_time%20%3E%20ago(7d)%20%7C%20where%20repo%20contains%20%5C%22axiom%5C%22%20%7C%20summarize%20count()%2C%20numCommits%3Dsum(size)%20by%20_time%3Dbin(_time%2C%203h)%2C%20repo%20%7C%20take%20100%22%2C%22queryOptions%22%3A%7B%22quickRange%22%3A%2230d%22%7D%7D)
 
+## Limit behavior with time-binning
+
+When using `limit` or `take` with `summarize` where the first grouping expression is a time bin, Axiom applies the following limit behavior:
+
+1. Compute the global top **N** groups across all time buckets, disregarding the time dimension.
+2. Limit each time bucket to only include those groups that are in the global top **N**.
+
+This means the total number of output rows can be more than **N** rows because each time bucket may contain up to **N** groups. For example, if you have 10 time buckets and limit to 5 groups, you can get up to 50 rows.
+
+### Workarounds
+
+To limit the result set to exactly **N** rows:
+
+- Apply a second `summarize` statement after the first to aggregate further and limit the results. For example:
+
+  ```kusto
+  ['sample-http-logs']
+  | summarize count() by _time=bin(_time, 1h), status
+  | summarize make_list(count_), make_list(_time) by status
+  | limit 10
+  ```
+
+- If you don’t need time as the first grouping expression, reorder your groups so time is not first. For example:
+
+  ```kusto
+  ['sample-http-logs']
+  | summarize count() by status, _time=bin(_time, 1h)
+  | limit 10
+  ```
+
+- Convert time to an integer and bin on that instead:
+
+  ```kusto
+  ['sample-http-logs']
+  | extend intTime = toint(_time)
+  | summarize count() by bin(intTime, 3600), status
+  | limit 10
+  ```
+
+  This approach makes the first group a non-time field, so the limit applies globally rather than per time bucket.
+
 ## List of related operators
 
 - [count](/apl/tabular-operators/count-operator): Use when you only need to count rows without grouping by specific fields.

From 63f4a3ff1a9a90447cbfbf32ba5dfc0e18c27e36 Mon Sep 17 00:00:00 2001
From: Mano Toth
Date: Mon, 24 Nov 2025 14:48:19 +0100
Subject: [PATCH 2/2] Update summarize-operator.mdx

---
 apl/tabular-operators/summarize-operator.mdx | 2 --
 1 file changed, 2 deletions(-)

diff --git a/apl/tabular-operators/summarize-operator.mdx b/apl/tabular-operators/summarize-operator.mdx
index 75c67de04..4a9668471 100644
--- a/apl/tabular-operators/summarize-operator.mdx
+++ b/apl/tabular-operators/summarize-operator.mdx
@@ -217,8 +217,6 @@ To limit the result set to exactly **N** rows:
   | limit 10
   ```
 
-  This approach makes the first group a non-time field, so the limit applies globally rather than per time bucket.
-
 ## List of related operators
 
 - [count](/apl/tabular-operators/count-operator): Use when you only need to count rows without grouping by specific fields.
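
The two-step limit behavior documented in this patch can be modeled with a small Python sketch. It is an illustration only, not Axiom's implementation. The function name `limit_with_time_bins`, the `(bucket, group, count)` row shape, and the sample data are all hypothetical. The patch does not say how the "top" groups are ranked, so the sketch assumes ranking by total count across all buckets.

```python
from collections import Counter


def limit_with_time_bins(rows, n):
    """Model of the documented behavior for rows of (bucket, group, count).

    ASSUMPTION: "top" groups are ranked by total count across all buckets;
    the patch does not specify the ranking criterion.
    """
    rows = list(rows)

    # Step 1: compute the global top-n groups, disregarding the time dimension.
    totals = Counter()
    for _bucket, group, count in rows:
        totals[group] += count
    top_groups = {group for group, _total in totals.most_common(n)}

    # Step 2: within every time bucket, keep only the globally top groups.
    return [row for row in rows if row[1] in top_groups]


# Hypothetical sample data: three hourly buckets, grouped by HTTP status.
rows = [
    ("00:00", "200", 50), ("00:00", "404", 5), ("00:00", "500", 1),
    ("01:00", "200", 40), ("01:00", "404", 7), ("01:00", "500", 2),
    ("02:00", "200", 60), ("02:00", "500", 3),
]

result = limit_with_time_bins(rows, 2)
for row in result:
    print(row)
```

With a limit of 2, the sketch keeps the two groups with the highest totals in every bucket where they occur, so the output has more rows than the limit. This is the effect the new section describes.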