
[JMX Insight] Hadoop JMX metrics semconv alignment #14411


Open · wants to merge 14 commits into main

Conversation

@robsunday (Contributor) commented Aug 12, 2025:

Fixes #14274
Includes:

  • Hadoop JMX metrics semconv alignment
  • Moved yaml with JMX metrics mapping to the library
  • Integration tests for Hadoop 2.x and Hadoop 3.x


| Metric Name | Type | Attributes | Description |
|---------------------------------|---------------|-------------------------------------|--------------------------------------------------------|
| hadoop.dfs.capacity | UpDownCounter | hadoop.node.name | Current raw capacity of data nodes. |
@robsunday (author) commented:

[for reviewer] The naming prefix of the metrics has been changed to utilize a metric context (dfs).
Context is described in official docs: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Metrics.html

@otelbot-java-instrumentation commented:

🔧 The result from spotlessApply was committed to the PR branch.

@robsunday changed the title from "Hadoop JMX metrics semconv alignment" to "[JMX Insight] Hadoop JMX metrics semconv alignment" on Aug 13, 2025
@robsunday robsunday marked this pull request as ready for review August 14, 2025 13:39
@robsunday robsunday requested a review from a team as a code owner August 14, 2025 13:39
@@ -63,7 +63,7 @@ public class TargetSystemTest {
   private static OtlpGrpcServer otlpServer;
   private static Path agentPath;
   private static Path testAppPath;
-  private static String otlpEndpoint;
+  protected static String otlpEndpoint;
Contributor:

[minor] I would suggest using a protected getter to expose this as read-only to subclasses, since the field is not final.
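
A minimal sketch of that suggestion (the surrounding class is abbreviated; only the field and getter shown here are taken from the diff above):

public class TargetSystemTest {
  // keep the field private; subclasses read it through the getter
  private static String otlpEndpoint;

  // read-only access for subclasses, since the field is not final
  protected static String getOtlpEndpoint() {
    return otlpEndpoint;
  }
}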

# Java agent opts needed for
export JAVA_AGENT_OPTS="-javaagent:/opentelemetry-instrumentation-javaagent.jar"
export JAVA_AGENT_OPTS="$JAVA_AGENT_OPTS -Dotel.logs.exporter=none -Dotel.traces.exporter=none -Dotel.metrics.exporter=otlp"
export JAVA_AGENT_OPTS="$JAVA_AGENT_OPTS -Dotel.exporter.otlp.endpoint=<<ENDPOINT_PLACEHOLDER>> -Dotel.exporter.otlp.protocol=grpc"
Contributor:

If I understand this correctly, the reason we have to use this placeholder and replace it at runtime is that we can't use environment variables to provide it (and this applies not only to the otel env variables).
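
A minimal sketch of what that runtime substitution might look like (the helper name readAndPreprocessEnvFile appears in the test below; its body here is an assumption):

import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Hypothetical implementation: read the env file from test resources and
// substitute the endpoint placeholder before the file is copied into the container.
static String readAndPreprocessEnvFile(String resourceName) {
  try (InputStream stream =
      Objects.requireNonNull(
          TargetSystemTest.class.getClassLoader().getResourceAsStream(resourceName))) {
    String content = new String(stream.readAllBytes(), StandardCharsets.UTF_8);
    return content.replace("<<ENDPOINT_PLACEHOLDER>>", otlpEndpoint);
  } catch (IOException e) {
    throw new UncheckedIOException(e);
  }
}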

@@ -150,7 +150,7 @@ protected static Map<String, String> otelConfigProperties(List<String> yamlFiles
   // disable runtime telemetry metrics
   config.put("otel.instrumentation.runtime-telemetry.enabled", "false");
   // set yaml config files to test
-  config.put("otel.jmx.target", "tomcat");
+  config.put("otel.jmx.target", "hadoop");
Contributor:

I think this line should probably be removed, as we test with only the explicit list of yaml files here.

Comment on lines +77 to +87
// Hadoop startup script does not propagate env vars to launched hadoop daemons,
// so all the env vars need to be embedded inside the hadoop-env.sh file
GenericContainer<?> target =
new GenericContainer<>("loum/hadoop-pseudo:3.3.6")
.withExposedPorts(9870, 9000)
.withCopyToContainer(
Transferable.of(readAndPreprocessEnvFile("hadoop3-env.sh")),
"/opt/hadoop/etc/hadoop/hadoop-env.sh")
.withCreateContainerCmdModifier(cmd -> cmd.withHostName("test-host"))
.waitingFor(
Wait.forListeningPorts(9870, 9000).withStartupTimeout(Duration.ofMinutes(3)));
Contributor:

[minor] This is the only part that differs between Hadoop 2.x and 3.x; it could be worth refactoring it with a common method for the test body that delegates container creation to a Supplier<GenericContainer<?>> lambda, with a dedicated implementation for each version.
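
A minimal sketch of that refactoring (the method names are hypothetical; the Hadoop 3.x container setup is copied from the diff above):

import java.time.Duration;
import java.util.function.Supplier;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.containers.wait.strategy.Wait;
import org.testcontainers.images.builder.Transferable;

// Hypothetical shared test body: each Hadoop version supplies its own container.
void runHadoopMetricsTest(Supplier<GenericContainer<?>> containerSupplier) {
  try (GenericContainer<?> target = containerSupplier.get()) {
    target.start();
    // ... run the test app and verify the exported metrics ...
  }
}

// Dedicated container factory for Hadoop 3.x; a 2.x variant would differ only
// in the image tag, exposed ports, and env file name.
GenericContainer<?> hadoop3Container() {
  return new GenericContainer<>("loum/hadoop-pseudo:3.3.6")
      .withExposedPorts(9870, 9000)
      .withCopyToContainer(
          Transferable.of(readAndPreprocessEnvFile("hadoop3-env.sh")),
          "/opt/hadoop/etc/hadoop/hadoop-env.sh")
      .withCreateContainerCmdModifier(cmd -> cmd.withHostName("test-host"))
      .waitingFor(Wait.forListeningPorts(9870, 9000).withStartupTimeout(Duration.ofMinutes(3)));
}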


| Metric Name | Type | Attributes | Description |
|---------------------------------|---------------|-------------------------------------|--------------------------------------------------------|
| hadoop.dfs.capacity | UpDownCounter | hadoop.node.name | Current raw capacity of data nodes. |
Contributor:

The capacity is in bytes, so maybe we should start adding a dedicated column for the units (then we can do the same for other metrics as a follow-up).
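
An example row with a dedicated unit column (the UCUM code By for bytes is an assumption based on this comment):

| Metric Name         | Type          | Unit | Attributes       | Description                         |
|---------------------|---------------|------|------------------|-------------------------------------|
| hadoop.dfs.capacity | UpDownCounter | By   | hadoop.node.name | Current raw capacity of data nodes. |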

Contributor:

Also, I am not sure we should have a metric name that is a prefix of another; maybe capacity.raw could work here, to replicate the wording in the docs.

Comment on lines +58 to +71
NumLiveDataNodes:
metric: &metric data_node.count
type: &type updowncounter
unit: &unit "{node}"
desc: &desc The number of DataNodes.
metricAttribute:
hadoop.node.state: const(live)
NumDeadDataNodes:
metric: *metric
type: *type
unit: *unit
desc: *desc
metricAttribute:
hadoop.node.state: const(dead)
Contributor:

I am not very familiar with Hadoop, but there are more than 2 states for the data nodes, as we can infer from the Num.*DataNodes attributes in the docs, so I would suggest mapping these as individual metrics instead of using a constant metric attribute, since we can't guarantee that it's a partition.

For example, if a node went into a state that is not mapped here, the total number of nodes in the cluster would go down by one, even though the node is still part of the cluster. With individual metrics per state, the consumer can expect that the metric does not represent the whole cluster node count.
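
A sketch of what individual per-state metrics could look like in the mapping yaml (the metric names below are hypothetical, following this suggestion rather than the current PR):

NumLiveDataNodes:
  metric: data_node.live.count
  type: updowncounter
  unit: "{node}"
  desc: The number of live DataNodes.
NumDeadDataNodes:
  metric: data_node.dead.count
  type: updowncounter
  unit: "{node}"
  desc: The number of dead DataNodes.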

---
rules:
- bean: Hadoop:service=NameNode,name=FSNamesystem
prefix: hadoop.dfs.
@SylvainJuge (Contributor) commented Aug 19, 2025:

Do we need to always have the dfs. infix here? It's an acronym and probably implicit for Hadoop, unless we have other use cases, for example hadoop.rpc to capture the in/out bytes.

Contributor:

As discussed today, while dfs is an acronym, it maps to the "context" in the documentation, which also provides some structure and helps in mapping/linking to the MBean attributes. Adding hadoop.rpc.io with the ReceivedBytes and SentBytes MBean attributes would be a worthwhile addition that we could include directly in this PR, hence creating at least one metric that is not related to dfs.
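
A sketch of what that could look like in the mapping yaml (the bean name, the hadoop.io.direction attribute, and the metric details are assumptions, mirroring the anchor style used elsewhere in this file):

- bean: Hadoop:service=NameNode,name=RpcActivityForPort9000
  prefix: hadoop.rpc.
  mapping:
    ReceivedBytes:
      metric: &rpc_io io
      type: &rpc_io_type counter
      unit: &rpc_io_unit By
      desc: &rpc_io_desc The number of bytes exchanged by the RPC server.
      metricAttribute:
        hadoop.io.direction: const(received)
    SentBytes:
      metric: *rpc_io
      type: *rpc_io_type
      unit: *rpc_io_unit
      desc: *rpc_io_desc
      metricAttribute:
        hadoop.io.direction: const(sent)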

desc: Current number of files and directories.
# hadoop.dfs.connection.count
TotalLoad:
metric: connection.count
@SylvainJuge (Contributor) commented Aug 19, 2025:

Here the .count suffix could be removed, as there is currently no other connection-related metric besides the count in the dfs context. I really don't know what "connection count" means in this context, and whether it overlaps or relates to the number of open network connections.

With the rpc context, we have NumOpenConnections and numDroppedConnections, which could (maybe in a future improvement) be mapped respectively to:

  • hadoop.rpc.connection.count
  • hadoop.rpc.connection.dropped

So if those metrics were added in the future, then having hadoop.dfs.connection.count and hadoop.rpc.connection.count would be consistent, and keeping connection.count as a suffix would be a good option.

However, if the dfs.connection.count metric overlaps with rpc.connection.count, then maybe we could only capture rpc.connection.{count,dropped} to prevent any confusion. Given my lack of knowledge of Hadoop, when in doubt I would suggest capturing both.
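
A sketch of those two possible future entries in the mapping yaml (types, units, and descriptions are assumptions; the attribute names come from the Hadoop metrics docs):

NumOpenConnections:
  metric: connection.count
  type: updowncounter
  unit: "{connection}"
  desc: Current number of open RPC connections.
numDroppedConnections:
  metric: connection.dropped
  type: counter
  unit: "{connection}"
  desc: Total number of dropped RPC connections.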

desc: Current number of blocks with corrupt replicas.
# hadoop.dfs.volume.failure.count
VolumeFailuresTotal:
metric: volume.failure.count
Contributor:

We could map the EstimatedCapacityLostTotal attribute in the future to volume.failure.capacity, to provide the estimated capacity lost due to volume failures, so the .count suffix is relevant to keep here (see the sketch after this comment).

Also, I think it is safe to keep the volume wording here for clarity, as there are lots of different possible failure types.
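
A sketch of that hypothetical follow-up entry in the mapping yaml (type, unit, and description are assumptions):

EstimatedCapacityLostTotal:
  metric: volume.failure.capacity
  type: updowncounter
  unit: By
  desc: Estimated capacity lost due to volume failures.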

Successfully merging this pull request may close these issues: jmx hadoop metrics update and align with semconv (#14274)