Description
Problem:
Based on last week's Trino Contributor call, we want to support a unified way of fetching cluster metrics through a REST API. Trino-Gateway will use these cluster metrics for routing and load balancing. As a short-term goal, we will focus on cluster-wide memory and CPU usage.
Existing solution:
Currently, metrics are scattered: some are exposed as MBeans, some behind /metrics, and some behind /ui/api. All of them are available only as node-level metrics; no aggregated metrics are available. Trino-Gateway would need to make network calls to each node in the cluster to fetch metrics, resulting in extra overhead.
Compatibility:
We want to support:
- REST API access
- Adding these metrics to the system catalog so they can be queried
Solution:
The coordinator will expose an API that the gateway uses to fetch cluster metrics. As a short-term goal, aggregation will happen on demand: the coordinator aggregates metrics from the workers only when the API is called. As a long-term goal, we may want to add caching or maintain internal state.
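To make the on-demand idea concrete, here is a minimal, language-neutral sketch in Python of what the aggregation step might look like. It is not Trino code: `NodeMetrics`, `ClusterMetrics`, `aggregate_on_demand`, and every field name are hypothetical, and weighting CPU utilization by processor count is just one possible choice.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class NodeMetrics:
    # Hypothetical per-node snapshot; field names are illustrative only.
    memory_used_bytes: int
    memory_total_bytes: int
    cpu_utilization: float  # fraction in [0.0, 1.0]
    processors: int


@dataclass(frozen=True)
class ClusterMetrics:
    # Hypothetical cluster-wide view returned to the gateway.
    nodes: int
    memory_used_bytes: int
    memory_total_bytes: int
    cpu_utilization: float  # processor-weighted average across nodes


def aggregate_on_demand(
    node_ids: Iterable[str],
    fetch_node_metrics: Callable[[str], NodeMetrics],
) -> ClusterMetrics:
    """Fetch every node's metrics at call time and combine them.

    Nothing is cached: each call triggers one fetch per node, which is
    the short-term behaviour described above.
    """
    snapshots = [fetch_node_metrics(node_id) for node_id in node_ids]
    total_processors = sum(s.processors for s in snapshots)
    weighted_cpu = sum(s.cpu_utilization * s.processors for s in snapshots)
    return ClusterMetrics(
        nodes=len(snapshots),
        memory_used_bytes=sum(s.memory_used_bytes for s in snapshots),
        memory_total_bytes=sum(s.memory_total_bytes for s in snapshots),
        # Guard against an empty cluster or nodes reporting zero processors.
        cpu_utilization=weighted_cpu / total_processors if total_processors else 0.0,
    )
```

With this shape, the gateway makes a single call to the coordinator instead of one call per node; adding caching later would only change how `fetch_node_metrics` is backed, not the response shape.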