
Conversation


@jmcarp jmcarp commented Sep 4, 2025

The metrics dashboards assume that queries return a consistent number of series
over time. For example, if a query for cpu utilization gets four timeseries at
one timepoint, it should have four timeseries at all timepoints. However, this
assumption isn't met when the cardinality of the given resource changes over
time, such as when the user scales the number of vcpus for an instance. This
assumption leads to two bugs:

  • If the number of vcpus changes within a query, we normalize by the largest
    observed vcpu count. In other words, increasing the number of vcpus makes
    utilization appear lower at timepoints prior to the scaling event.
  • If a user scales up the number of vcpus such that the 0th vcpu timeseries has
    fewer timepoints than subsequent timeseries, we truncate series after the
    0th.

This patch attempts to fix both bugs. First, we count the number of non-null
values per timeseries to normalize cpu use. Second, we consider all timestamps
from all returned series so that we don't accidentally truncate the results.
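The two fixes can be sketched roughly as follows. This is a simplified illustration with a hypothetical `Series` shape and function names, not the actual console code: it takes the union of timestamps across all series (rather than truncating to one series' timestamps), and normalizes each point by the number of series that actually reported a non-null value there (rather than by the largest observed series count).

```typescript
// Hypothetical series shape: parallel arrays of timestamps and values.
type Series = { timestamps: number[]; values: (number | null)[] }

// Fix for the truncation bug: union of timestamps across all series,
// so a short 0th series can't clip the rest of the chart.
function unionTimestamps(series: Series[]): number[] {
  const all = new Set(series.flatMap((s) => s.timestamps))
  return [...all].sort((a, b) => a - b)
}

// Fix for the normalization bug: at each timestamp, sum the values and
// count how many series reported a non-null value, then divide by that
// count instead of by the maximum number of series ever observed.
function normalizedUtilization(
  series: Series[]
): { timestamp: number; value: number }[] {
  return unionTimestamps(series).map((timestamp) => {
    let sum = 0
    let count = 0
    for (const s of series) {
      const idx = s.timestamps.indexOf(timestamp)
      if (idx !== -1 && s.values[idx] !== null) {
        sum += s.values[idx]!
        count++
      }
    }
    return { timestamp, value: count > 0 ? sum / count : 0 }
  })
}
```

With this approach, a vcpu added mid-query only affects the average at timestamps where it actually reported data, and earlier points keep their original utilization.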



const timestamps = R.pipe(
  timeseriesData,
  R.map((series) => series.points.timestamps.map((t) => new Date(t).getTime())),
  R.flat(),
Collaborator

@david-crespo david-crespo Sep 4, 2025


Not that perf matters a ton here, but R.flatMap "is identical to a map followed by a flat of depth 1 (flat(map(data, ...args))), but slightly more efficient than calling those two methods separately"

https://remedajs.com/docs/#flatMap
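The suggested equivalence also holds for the native array methods, which Remeda's `flatMap` mirrors. A tiny illustration with made-up data:

```typescript
// A map followed by flat(1) produces the same array as flatMap,
// but flatMap does it in a single pass.
const series = [{ timestamps: [1, 2] }, { timestamps: [3] }]

const viaMapFlat = series.map((s) => s.timestamps).flat()
const viaFlatMap = series.flatMap((s) => s.timestamps)
```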

@david-crespo
Collaborator

Thank you for doing this. This is a gnarly bug and the logic is gnarly so I will have to spend a little time looking through it.

const chartData = timestamps
  .map((timestamp, idx) => ({ timestamp, value: summedValues[idx] }))
  // Drop the first datapoint, which — for delta metric types — is the cumulative sum of all previous
  // datapoints (like CPU utilization). We've accounted for this by adjusting the start time earlier;
  // We could use a more elegant approach to this down the road
  .slice(1)

return { chartData, timeseriesCount }
return { chartData, valueCounts }
Collaborator

@david-crespo david-crespo Sep 4, 2025


Because the elements of valueCounts should match up one to one with the elements of chartData, it might be clearer to represent that relationship more directly by either making the count a field on the chartData element or even pre-normalizing the summed values data here by dividing by the count, so you don't have to return the count at all. I haven't worked through it enough to articulate it properly, just musing.
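The pre-normalizing variant suggested here might look something like the following. This is a rough sketch with hypothetical parameters, assuming `summedValues` and `valueCounts` are index-aligned with `timestamps`; dividing inside the map means only `{ timestamp, value }` pairs leave the function and the counts never need to be returned:

```typescript
// Normalize each summed value by its non-null count before building
// chartData, so the count arrays stay internal to this function.
function buildChartData(
  timestamps: number[],
  summedValues: number[],
  valueCounts: number[]
): { timestamp: number; value: number }[] {
  return timestamps.map((timestamp, idx) => ({
    timestamp,
    // Guard against a timestamp where no series reported a value.
    value: valueCounts[idx] > 0 ? summedValues[idx] / valueCounts[idx] : 0,
  }))
}
```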

const summedValues = sumValues(timeseriesData, timestamps.length)
const valueCounts = countValues(timeseriesData, timestamps.length)
const chartData = timestamps
  .map((timestamp, idx) => ({ timestamp, value: summedValues[idx] }))
  // Drop the first datapoint, which — for delta metric types — is the cumulative sum of all previous
  // datapoints (like CPU utilization). We've accounted for this by adjusting the start time earlier;
  // We could use a more elegant approach to this down the road
  .slice(1)
Collaborator


does valueCounts also need to get its first value dropped?

Collaborator


Oh, it's worse than that. This needs to be done per-timeseries prior to aggregation now that we realize they can have different first timestamps.
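A per-series version of that drop might look like this sketch (hypothetical `Series` shape and helper name): each series loses its own first point before any summing or counting happens, so a series that starts later doesn't get a real point discarded on its behalf.

```typescript
// Hypothetical series shape: parallel arrays of timestamps and values.
type Series = { timestamps: number[]; values: number[] }

// Drop the first (cumulative) point of each series independently,
// before aggregation, since series may start at different timestamps.
function dropFirstPointPerSeries(series: Series[]): Series[] {
  return series.map((s) => ({
    timestamps: s.timestamps.slice(1),
    values: s.values.slice(1),
  }))
}
```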

const summedValues = sumValues(timeseriesData, timestamps.length)
const valueCounts = countValues(timeseriesData, timestamps.length)
const chartData = timestamps
  .map((timestamp, idx) => ({ timestamp, value: summedValues[idx] }))
Collaborator


if you add valueCount here, maybe instead of using idx, summedValues and valuesCounts could be objects indexed by timestamp so we don't have to worry about the arrays lining up index-wise
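The timestamp-keyed idea could be sketched like this (hypothetical names, not the existing `sumValues`/`countValues` helpers): accumulate sums and counts in `Map`s keyed by timestamp, so nothing depends on separate arrays lining up index-wise.

```typescript
// Hypothetical series shape: parallel arrays of timestamps and values.
type Series = { timestamps: number[]; values: (number | null)[] }

// Accumulate per-timestamp sums and non-null counts in Maps keyed by
// timestamp, avoiding any reliance on positional alignment.
function sumAndCountByTimestamp(series: Series[]): {
  sums: Map<number, number>
  counts: Map<number, number>
} {
  const sums = new Map<number, number>()
  const counts = new Map<number, number>()
  for (const s of series) {
    s.timestamps.forEach((t, i) => {
      const v = s.values[i]
      if (v === null) return
      sums.set(t, (sums.get(t) ?? 0) + v)
      counts.set(t, (counts.get(t) ?? 0) + 1)
    })
  }
  return { sums, counts }
}
```

Building `chartData` then becomes a lookup per timestamp, and a series with missing timepoints simply contributes nothing at those keys.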
