
Conversation

@roidelapluie
Member

No description provided.

Signed-off-by: Julien Pivotto <[email protected]>
@juliusv
Member

juliusv commented Nov 3, 2025

Wow, I really like this and think it's well thought out and clearly put, thanks @roidelapluie!

The only obvious question is of course whether all those scrape-time rules should really live in the main config file, which is inconsistent with normal rules living in separate files. So I wonder if it should be scrape_rule_files instead of scrape_rules (or optionally in addition). But maybe we don't expect people to use many scrape-time rules at all, so putting them into the main config file is more doable?
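
For concreteness, a hypothetical sketch of the two shapes under discussion (field names and placement are assumptions, not taken from the proposal):

```yaml
# Option A: rules inline in the main config, per scrape config (names assumed):
scrape_configs:
  - job_name: node
    scrape_rules:
      - record: node_cpu_seconds:sum
        expr: sum without (cpu) (node_cpu_seconds_total)

# Option B: rules in separate files, mirroring how rule_files works (name assumed):
# scrape_rule_files:
#   - "scrape_rules/*.yml"
```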

* **Related Issues and PRs:**
* [Original feature request](https://github.com/prometheus/prometheus/issues/394)

> This proposal introduces the ability to evaluate PromQL expressions at scrape time against raw metrics from a single scrape, before any relabeling occurs. This enables the creation of derived metrics that combine values from the same scrape without the time skew issues inherent in recording rules or query-time calculations. Additionally, by evaluating before relabeling, this enables powerful cardinality reduction strategies where aggregated metrics can be computed and stored while dropping the original high-cardinality metrics.
Member

@bwplotka Nov 4, 2025


Suggested change (original, then suggested):
> This proposal introduces the ability to evaluate PromQL expressions at scrape time against raw metrics from a single scrape, before any relabeling occurs. This enables the creation of derived metrics that combine values from the same scrape without the time skew issues inherent in recording rules or query-time calculations. Additionally, by evaluating before relabeling, this enables powerful cardinality reduction strategies where aggregated metrics can be computed and stored while dropping the original high-cardinality metrics.
> This proposal introduces the ability to evaluate PromQL expressions at scrape time against raw metrics from a single scrape, before metric relabeling occurs. This enables the creation of derived metrics that combine values from the same scrape without the time skew issues inherent in recording rules or query-time calculations. Additionally, by evaluating before metric relabeling, this enables powerful cardinality reduction strategies where aggregated metrics can be computed and stored while dropping the original high-cardinality metrics.

Adding "metric" for relabeling mention as scrape rules (and scrapes in general) will be done AFTER (SD) relabeling

Member Author

@roidelapluie Nov 4, 2025


That is not really correct, because I want scrape rules BEFORE SD relabeling (which means that the metrics used by the rules DO NOT have the target labels). Just bare metrics from the scrape. They should look just like scraped metrics.
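
To illustrate the intended ordering (the stage breakdown below is an interpretation of this comment, not text from the proposal):

```yaml
# Assumed pipeline for a single scrape:
#   1. scrape the target        -> bare samples, e.g. node_cpu_seconds_total{cpu="0"}
#   2. scrape rules evaluate    -> they see only those bare metrics, no instance/job
#   3. target labels attached   -> instance, job, and labels from (SD) relabeling
#   4. metric_relabel_configs   -> normal metric relabeling
#   5. append to TSDB
```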

Member

@bwplotka left a comment


Amazing! It would be a great addition 👍🏽

cc @fpetkovski who started an initial experiment.

As discussed before, the main obvious challenge is the relatively limited potential of this feature versus what people actually want, e.g.:

  • Quick 5m rollups (rate/irate) and drop the rest.
  • Instance label drop + aggregation across targets (or even scrapers!) and drop the rest.
  • Drop metrics conditionally using existing TSDB data.

This means that if we accept this we might need to manage user expectations; but moreover, we will need to solve this problem eventually anyway, likely with a 3rd solution (e.g. a cross-process aggregation proxy like vmagent does, or a stateful 5m aggregation layer in the Prometheus process). That means users will likely have 3 types of rules to choose from (unless we reuse one of the existing config surfaces). A sketch of this boundary follows below.
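
A hypothetical sketch (the scrape_rules syntax is assumed): scrape rules can only combine samples from a single scrape, so anything needing history or other targets falls outside.

```yaml
# Within scope: collapse a label using only the current scrape.
scrape_rules:
  - record: node_cpu_seconds:sum
    expr: sum without (cpu) (node_cpu_seconds_total)

# Out of scope: rate() needs history, and per-job sums need samples from
# other targets, so a rule like the following would still require recording
# rules or a separate aggregation layer:
#   - record: job:http_requests:rate5m
#     expr: sum by (job) (rate(http_requests_total[5m]))
```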

I think that's acceptable and it's good to iterate here, just want to make the consequences clear (:

> The only obvious question is of course whether all those scrape-time rules should really live in the main config file, which is inconsistent with normal rules living in separate files. So I wonder if it should be scrape_rule_files instead of scrape_rules (or optionally in addition). But maybe we don't expect people to use many scrape-time rules at all, so putting them into the main config file is more doable?

Good question. Given that users are expected to DROP certain metrics based on scrape rules, it would be more difficult to see what's happening if the scrape rules lived in a remote place. Plus, as mentioned before, scrape rules have relatively limited usability, so maybe there won't be that many? Previous small discussion.
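
For example (a hypothetical config; the proposal's exact drop mechanism is not shown here), keeping the rule and the drop side by side makes the flow easy to audit:

```yaml
scrape_configs:
  - job_name: node
    # Compute the aggregate first, at scrape time (scrape_rules syntax assumed):
    scrape_rules:
      - record: node_cpu_seconds:sum
        expr: sum without (cpu) (node_cpu_seconds_total)
    # ...then drop the high-cardinality originals right next to the rule,
    # here via ordinary metric relabeling:
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: node_cpu_seconds_total
        action: drop
```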

@bwplotka changed the title from "Add Scrape Time Rule Evaluation" to "PROM-67: Scrape Time Rule Evaluation" on Nov 4, 2025
@juliusv
Member

juliusv commented Nov 4, 2025

> As discussed before, the main obvious challenge is the relatively limited potential of this feature versus what people actually want […]

Good point. The use case for aggregating across instances is probably going to be way more important than the one for aggregating within an instance.

FWIW, I have no strong opinion on whether the kind of scrape-time rule evaluation as laid out in this proposal here should be added to Prometheus or not, I can find arguments either way. I just think the proposal is well done :)

> The only obvious question is of course whether all those scrape-time rules should really live in the main config file, which is inconsistent with normal rules living in separate files. So I wonder if it should be scrape_rule_files instead of scrape_rules (or optionally in addition). But maybe we don't expect people to use many scrape-time rules at all, so putting them into the main config file is more doable?

> Good question. Given that users are expected to DROP certain metrics based on scrape rules, it would be more difficult to see what's happening if the scrape rules lived in a remote place. Plus, as mentioned before, scrape rules have relatively limited usability, so maybe there won't be that many? Previous small discussion.

Agreed with both, even though it's a bit odd :)

@fpetkovski

Thanks for writing this out @roidelapluie. Applying aggregations before SD relabeling makes perfect sense and I think it will solve the caching issue described in my POC.

@roidelapluie
Member Author

> Thanks for writing this out @roidelapluie. Applying aggregations before SD relabeling makes perfect sense and I think it will solve the caching issue described in my POC.

Ideally we would only add the "relevant" metrics to the in-memory cache, by pre-parsing rules and extracting matchers as described in this document, but that could be added in a second PR.
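
A sketch of that idea (rule syntax and caching behaviour are both assumptions):

```yaml
# Given this rule...
scrape_rules:
  - record: node_cpu_seconds:sum
    expr: sum without (cpu) (node_cpu_seconds_total)

# ...pre-parsing the expression yields the selector
#   {__name__="node_cpu_seconds_total"}
# so the scrape-time cache only needs to admit matching samples
# instead of buffering the whole scrape.
```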
