Flink Agents Metrics Design #73
Closed · GreatEugenius started this discussion in Ideas · 0 replies
Introduction
In Flink Agents, we aim to introduce a metrics mechanism that allows users not only to understand their jobs through output results, but also to easily collect and analyze the usage of the different Events, Actions, Models, and Tools within an Agent job by integrating with external systems. Where necessary, users can also set up monitoring and alerting for key metrics, such as abnormal spikes in token usage over a short period. This helps users better understand the runtime status of their jobs, enabling them to refine and optimize their workflows to achieve the desired outcomes.
We plan to integrate Flink Agents Metrics into Flink’s built-in metric system. Through the Flink Web UI, users can conveniently view relevant parameters. Additionally, leveraging the rich set of Metric Reporters provided by the Flink metric system, users will be able to export Flink runtime metrics to external systems, enabling visualization and setting up monitoring alerts.
Flink Agents Metric Design
In Flink Agent jobs, the `ActionExecutionOperator` plays a central role in the entire Agent lifecycle, from event parsing to action execution. To align with Flink's native metric architecture, we design the metric system around a `ProxyMetricGroup` associated with the `ActionExecutionOperator`. This ensures that all metrics defined under the `FlinkAgentMetricGroup` are automatically registered at the operator level (`OperatorMetricGroup`), enabling seamless integration with Flink's built-in monitoring capabilities.

The Flink Agents metrics system is logically divided into two main categories:
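A minimal sketch of this proxying idea, using hypothetical Python stand-ins for the metric classes (the real classes live inside Flink and Flink Agents):

```python
class Counter:
    """Stand-in for a Flink Counter."""
    def __init__(self):
        self.count = 0

    def inc(self, n=1):
        self.count += n


class OperatorMetricGroup:
    """Stand-in for Flink's operator-level metric group."""
    def __init__(self):
        self.metrics = {}

    def counter(self, name):
        return self.metrics.setdefault(name, Counter())


class ProxyMetricGroup:
    """Forwards every registration to the operator-level group, so agent
    metrics automatically appear under the operator's metric scope."""
    def __init__(self, delegate):
        self._delegate = delegate

    def counter(self, name):
        return self._delegate.counter(name)


operator_group = OperatorMetricGroup()
agent_group = ProxyMetricGroup(operator_group)
agent_group.counter("NumOfEvent").inc()
```

Because registration is delegated rather than duplicated, anything registered through the agent-facing group is visible to Flink's reporters without extra wiring.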
Flink Agents Builtin Metrics
To provide a comprehensive and extensible metric system, the Flink Agents project defines a set of Builtin Metrics, which are automatically collected during job execution. These metrics are implemented directly within core components such as `ActionExecutionOperator` and `BuiltInAction`, ensuring minimal user configuration while offering rich visibility into agent behavior.

Survey of Metric Support in Agent-Based Frameworks
When developing metrics for Flink Agents, we analyzed the types of metrics supported by mainstream frameworks such as LlamaIndex and LangChain. Based on this analysis, metrics can be broadly categorized into three groups:
1. Performance Monitoring Metrics
These metrics track system performance characteristics such as response time and throughput. Commonly collected using tools like Prometheus or Grafana, they provide insights into runtime behavior.
2. Resource and Cost Metrics
These metrics monitor resource usage, particularly in relation to model token consumption. They are essential for cost tracking and optimization.
Common metrics include:
- `total_tokens`: total number of tokens
- `prompt_tokens`: number of prompt tokens
- `completion_tokens`: number of generated tokens

3. Model Evaluation Metrics
Model Evaluation Metrics are used to assess the quality of model input and output, such as correctness, semantic similarity, and preference alignment. These metrics are typically applied in offline evaluation scenarios rather than real-time monitoring.
Common types include accuracy checks, text similarity measurements (e.g., based on string or embedding distances), and pairwise comparisons between model outputs for preference scoring or ranking.
Proposed Scope of Builtin Metrics in Flink Agents
The MVP version of Flink Agents focuses on implementing Performance Monitoring Metrics and Resource & Cost Metrics, which are most relevant for real-time monitoring and observability. Support for Model Evaluation Metrics is planned for future versions.
We have designed the following metrics at different levels of granularity, using a two-dimensional structure based on component type (Agent, Event, Action, Model, Tool) and metric type (Count, Meter, Histogram):
(Operator builtin metrics have been implemented)
- NumOfOutput
- NumOfOutputPerSec
- NumOfActionsExecuted
- NumOfToken
- NumOfPromptToken
- NumCompletionToken
- NumOfTokenPerSec
- NumOfPromptTokenPerSec
- NumCompletionTokenPerSec
Note: Token statistics depend on the output returned by the model, as different models use different tokenizers.
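Since the token figures come from the model response itself, one plausible way to feed the token counters is to read a provider-style `usage` payload; the field names below are illustrative, as different providers and tokenizers report different shapes:

```python
class TokenCounters:
    """Sketch: accumulate builtin token metrics from model responses."""
    def __init__(self):
        self.num_of_token = 0
        self.num_of_prompt_token = 0
        self.num_of_completion_token = 0

    def record(self, usage):
        """`usage` is assumed to be a dict like the one many chat APIs
        return alongside each response."""
        prompt = usage.get("prompt_tokens", 0)
        completion = usage.get("completion_tokens", 0)
        self.num_of_prompt_token += prompt
        self.num_of_completion_token += completion
        # Fall back to prompt + completion when a total is absent.
        self.num_of_token += usage.get("total_tokens", prompt + completion)


counters = TokenCounters()
counters.record({"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20})
counters.record({"prompt_tokens": 5, "completion_tokens": 3})
```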
For Builtin Metrics, we provide metrics for Event, ModelChat, and ToolCall at a general level (without type distinction), as well as per-action metrics. For example, the `NumOfEvent` metric counts the total number of events without differentiating by event type, while the `NumOfAction` metric tracks both the total number of actions executed and the count per action type.

Implementation Example: NumOfEvent Builtin Metric
To illustrate how Builtin Metrics are implemented, consider the `NumOfEvent` metric. In the `ActionExecutionOperator`, the counter is registered once and incremented for each event the operator processes. This ensures that every event processed by the operator is counted and reported via the Flink metric system, enabling visibility through the Web UI or external monitoring tools.
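A self-contained sketch of that counting pattern, with stub classes standing in for the real operator and Flink's metric types:

```python
class Counter:
    def __init__(self):
        self.count = 0

    def inc(self, n=1):
        self.count += n


class MetricGroup:
    """Stub for the operator's metric group."""
    def __init__(self):
        self._counters = {}

    def counter(self, name):
        return self._counters.setdefault(name, Counter())


class ActionExecutionOperator:
    """Sketch: register NumOfEvent once at setup, bump it per event."""
    def __init__(self, metric_group):
        self.num_of_event = metric_group.counter("NumOfEvent")

    def process_event(self, event):
        # Every event passing through the operator is counted before
        # being dispatched to the actions that listen for it.
        self.num_of_event.inc()
        # ... dispatch event to matching actions ...


group = MetricGroup()
op = ActionExecutionOperator(group)
for e in ("input", "chat", "output"):
    op.process_event(e)
```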
Flink Agents User-Defined Metrics
In Flink Agents, users implement their logic by defining custom Actions that respond to various Events throughout the Agent lifecycle. To support user-defined metrics, we introduce two new methods in the RunnerContext: `get_metric_group()` and `get_action_metric_group()`. These methods allow users to create or update global metrics as well as independent per-action metrics.

`get_metric_group()` allows users to access the operator-level OperatorMetricGroup from any Action. With this capability, users can register and update metrics directly at the operator level while defining their Flink Agent jobs. The resulting metric identifier is `<host>.taskmanager.<tm_id>.<job_name>.<operator_name>.<subtask_index>.<metric_name>`. We also support per-Action metrics via `get_action_metric_group()`, with the identifier `<host>.taskmanager.<tm_id>.<job_name>.<operator_name>.<subtask_index>.<action_name>.<metric_name>`.

Additionally, the system supports the creation of sub-metric groups to distinguish metrics generated by different Actions. These sub-groups enable more granular tracking and use the identifier `<host>.taskmanager.<tm_id>.<job_name>.<operator_name>.<subtask_index>.<user_defined_group_name>.<metric_name>`.

APIs
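To make the naming scheme above concrete before turning to the individual interfaces, here is a hypothetical helper that composes the dotted identifiers (all parameter names are illustrative, not part of the actual API):

```python
def metric_identifier(host, tm_id, job_name, operator_name, subtask_index,
                      metric_name, group_path=()):
    """Compose a dotted metric identifier; `group_path` carries optional
    middle segments such as an action name or a user-defined sub-group."""
    parts = (host, "taskmanager", tm_id, job_name, operator_name,
             str(subtask_index), *group_path, metric_name)
    return ".".join(parts)


operator_level = metric_identifier(
    "host1", "tm-1", "myjob", "action-executor", 0, "NumOfEvent")
per_action = metric_identifier(
    "host1", "tm-1", "myjob", "action-executor", 0, "NumOfEvent",
    group_path=("review_action",))
```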
MetricGroup
The `MetricGroup` interface enables hierarchical metric management through `get_sub_group(name: str)` and provides dynamic creation/update of four core metric types: counters (e.g., action counts), meters (e.g., event rates), histograms (e.g., latency distributions), and gauges (e.g., system state). These methods allow users to organize and track metrics within custom Actions while integrating seamlessly with Flink's operator-level monitoring system.

Note: In `FlinkAgentsMetricGroup`, we maintain four internal maps keyed by metric name, one per type (`Counter`, `Meter`, `Histogram`, `Gauge`). If a metric with a given name does not exist, it is automatically created; otherwise, the existing metric is returned.

Core Metric Types
Counter
A counter is used to measure the number of occurrences of an event.
Meter
A meter is used to track the rate of events over time (throughput).
Histogram
A histogram is used to collect and analyze distributions of values (e.g., latencies).
Gauge
A gauge is used to record a single value at a point in time (e.g., current system load).
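The four core types, together with the get-or-create maps noted earlier, can be sketched as follows. This is a minimal illustration, not the actual implementation: a real meter uses a moving time window, whereas here the rate is simply events since creation, and all class and method names are stand-ins.

```python
import statistics
import time


class Counter:
    """Measures the number of occurrences of an event."""
    def __init__(self):
        self.count = 0

    def inc(self, n=1):
        self.count += n


class Meter:
    """Tracks event rate; real implementations use a moving window."""
    def __init__(self):
        self.count = 0
        self._start = time.monotonic()

    def mark(self, n=1):
        self.count += n

    def rate(self):
        return self.count / max(time.monotonic() - self._start, 1e-9)


class Histogram:
    """Collects a distribution of values, e.g. latencies."""
    def __init__(self):
        self.values = []

    def update(self, value):
        self.values.append(value)

    def mean(self):
        return statistics.mean(self.values)


class Gauge:
    """Reports a single current value supplied by a callback."""
    def __init__(self, supplier):
        self._supplier = supplier

    def value(self):
        return self._supplier()


class FlinkAgentsMetricGroup:
    """Sketch of the get-or-create behavior: one name-keyed map per type,
    plus nested sub-groups; repeated lookups return the same instance."""
    def __init__(self):
        self._counters = {}
        self._meters = {}
        self._histograms = {}
        self._gauges = {}
        self._sub_groups = {}

    def get_counter(self, name):
        return self._counters.setdefault(name, Counter())

    def get_meter(self, name):
        return self._meters.setdefault(name, Meter())

    def get_histogram(self, name):
        return self._histograms.setdefault(name, Histogram())

    def get_gauge(self, name, supplier):
        return self._gauges.setdefault(name, Gauge(supplier))

    def get_sub_group(self, name):
        return self._sub_groups.setdefault(name, FlinkAgentsMetricGroup())


group = FlinkAgentsMetricGroup()
group.get_counter("NumOfAction").inc()
group.get_counter("NumOfAction").inc()  # same Counter instance, not a new one
group.get_histogram("ActionLatencyMs").update(10)
group.get_histogram("ActionLatencyMs").update(30)
```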
RunnerContext
Examples
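As a usage illustration, a user-defined action might update both a global and a per-action metric through the context. Everything below is a self-contained sketch with stub classes; `review_action` and the metric names are hypothetical, not part of the Flink Agents API.

```python
class Counter:
    def __init__(self):
        self.count = 0

    def inc(self, n=1):
        self.count += n


class MetricGroup:
    """Stub with just enough surface for the example."""
    def __init__(self):
        self.counters = {}

    def get_counter(self, name):
        return self.counters.setdefault(name, Counter())


class RunnerContext:
    """Stub context exposing the two accessors described above."""
    def __init__(self):
        self.global_group = MetricGroup()   # shared across all actions
        self.action_group = MetricGroup()   # scoped to the current action

    def get_metric_group(self):
        return self.global_group

    def get_action_metric_group(self):
        return self.action_group


def review_action(event, ctx):
    """Hypothetical user action: count events globally and per action."""
    ctx.get_metric_group().get_counter("NumOfReviewedEvent").inc()
    ctx.get_action_metric_group().get_counter("NumOfInvocation").inc()


ctx = RunnerContext()
for event in ("e1", "e2"):
    review_action(event, ctx)
```

The global counter would surface under the operator-level identifier, while the per-action counter would carry the extra `<action_name>` segment described earlier.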