Commit 5396689

Add Jaeger tracing integration to inferencepool chart
Signed-off-by: Cyclinder Kuo <[email protected]>
1 parent e4fe22d commit 5396689

File tree

4 files changed: +139 −1 lines changed

config/charts/inferencepool/Chart.yaml

Lines changed: 6 additions & 0 deletions

```diff
@@ -7,3 +7,9 @@ type: application
 version: 0.0.0

 appVersion: "0.0.0"
+
+dependencies:
+  - name: jaeger
+    version: "2.11.0"
+    repository: "https://jaegertracing.github.io/helm-charts"
+    condition: jaeger.enabled
```

config/charts/inferencepool/README.md

Lines changed: 87 additions & 0 deletions

@@ -237,6 +237,93 @@ inferenceExtension:

Make sure that the `otelExporterEndpoint` points to your OpenTelemetry collector endpoint.
Currently, only the `parentbased_traceidratio` sampler is supported. You can adjust the base sampling ratio using `samplerArg` (e.g., 0.1 means 10% of traces will be sampled).
#### Jaeger Tracing Backend

GAIE provides an opt-in Jaeger all-in-one deployment as a sub-chart for easy trace collection and visualization. This is particularly useful for development, testing, and understanding how inference requests are processed (filtered, scored) and forwarded to vLLM models.

**Quick Start with Jaeger:**

To install the InferencePool with Jaeger tracing enabled:

```bash
# Update Helm dependencies to fetch the Jaeger chart
helm dependency update ./config/charts/inferencepool

# Install with Jaeger enabled
helm install vllm-llama3-8b-instruct ./config/charts/inferencepool \
  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
  --set inferenceExtension.tracing.enabled=true \
  --set jaeger.enabled=true
```
Or using a `values.yaml` file:

```yaml
inferenceExtension:
  tracing:
    enabled: true
    sampling:
      sampler: "parentbased_traceidratio"
      samplerArg: "1.0"  # 100% sampling for development

jaeger:
  enabled: true
```

Then install:

```bash
helm dependency update ./config/charts/inferencepool
helm install vllm-llama3-8b-instruct ./config/charts/inferencepool -f values.yaml
```
**Accessing Jaeger UI:**

Once deployed, you can access the Jaeger UI to visualize traces:

```bash
# Port-forward to access the Jaeger UI
kubectl port-forward svc/vllm-llama3-8b-instruct-jaeger-query 16686:16686

# Open a browser to http://localhost:16686
```

In the Jaeger UI, you can:

- Search for traces by service name (`gateway-api-inference-extension`)
- View detailed span information showing filter and scorer execution
- Analyze request routing decisions and latency
- Understand the complete inference request flow
**Configuration Options:**

The Jaeger sub-chart supports the following configuration:

| **Parameter Name** | **Description** | **Default** |
|---|---|---|
| `jaeger.enabled` | Enable Jaeger all-in-one deployment | `false` |
| `jaeger.allInOne.enabled` | Enable all-in-one deployment mode | `true` |
| `jaeger.allInOne.image.repository` | Jaeger all-in-one image repository | `jaegertracing/all-in-one` |
| `jaeger.allInOne.image.tag` | Jaeger image tag | `2.11` |
| `jaeger.allInOne.resources.limits` | Resource limits for the Jaeger pod | `cpu: 500m, memory: 512Mi` |
| `jaeger.allInOne.resources.requests` | Resource requests for the Jaeger pod | `cpu: 100m, memory: 128Mi` |
| `jaeger.query.service.type` | Jaeger UI service type | `ClusterIP` |
| `jaeger.query.service.port` | Jaeger UI port | `16686` |
| `jaeger.collector.service.otlp.grpc.port` | OTLP gRPC collector port | `4317` |
| `jaeger.storage.type` | Storage backend type (memory, elasticsearch, cassandra, etc.) | `memory` |
**Important Notes:**

1. **Development vs Production**: The all-in-one deployment uses in-memory storage and is suitable for development and testing. For production use, consider:
   - Using a persistent storage backend (Elasticsearch, Cassandra, etc.)
   - Deploying Jaeger components separately for better scalability
   - Referring to [Jaeger Production Deployment](https://www.jaegertracing.io/docs/latest/deployment/) for best practices

2. **Automatic Configuration**: When `jaeger.enabled=true`, the OTLP exporter endpoint is automatically configured to point to the Jaeger collector. You don't need to set `inferenceExtension.tracing.otelExporterEndpoint` manually.

3. **Sampling Rate**: For development, you may want to set `samplerArg: "1.0"` to capture all traces. For production, use a lower value such as `"0.1"` (10%) to reduce overhead.

4. **Resource Requirements**: Adjust the resource limits based on your trace volume and cluster capacity.
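The ratio-based decision behind `samplerArg` can be sketched in a few lines of Python. This mirrors the general approach of OpenTelemetry SDKs' `TraceIdRatioBased` sampler (compare the low 64 bits of the trace ID against a bound derived from the ratio); it is a simplified illustration, not the extension's actual implementation, and the `parentbased_` prefix additionally means child spans inherit their parent's decision:

```python
import random

def should_sample(trace_id: int, ratio: float) -> bool:
    """Approximate a TraceIdRatioBased decision: sample when the
    lower 64 bits of the trace ID fall below ratio * 2**64."""
    bound = round(ratio * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < bound

# ratio 1.0 samples every trace; 0.0 samples none
assert should_sample(random.getrandbits(128), 1.0)
assert not should_sample(random.getrandbits(128), 0.0)

# With samplerArg "0.1", roughly 10% of root traces are kept
ids = [random.getrandbits(128) for _ in range(100_000)]
rate = sum(should_sample(t, 0.1) for t in ids) / len(ids)
print(f"observed sampling rate: {rate:.3f}")  # close to 0.1
```

Because the decision is a pure function of the trace ID, every service that sees the same trace applies the same verdict, which keeps traces complete even under partial sampling.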
## Notes

This chart will only deploy an InferencePool and its corresponding EndpointPicker extension. Before installing the chart, please make sure that the inference extension CRDs are installed in the cluster. For more details, please refer to the [getting started guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/).

config/charts/inferencepool/templates/epp-deployment.yaml

Lines changed: 4 additions & 0 deletions

```diff
@@ -114,7 +114,11 @@ spec:
         - name: OTEL_SERVICE_NAME
           value: "gateway-api-inference-extension"
         - name: OTEL_EXPORTER_OTLP_ENDPOINT
+          {{- if .Values.jaeger.enabled }}
+          value: "http://{{ .Release.Name }}-jaeger-collector:4317"
+          {{- else }}
           value: {{ .Values.inferenceExtension.tracing.otelExporterEndpoint | quote }}
+          {{- end }}
         - name: OTEL_TRACES_EXPORTER
           value: "otlp"
        - name: OTEL_RESOURCE_ATTRIBUTES_NODE_NAME
```
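The conditional above resolves the exporter endpoint at template-render time. Its effect can be mimicked in plain Python (a sketch for illustration only; the real logic lives in the Helm template):

```python
def otlp_endpoint(release_name: str,
                  jaeger_enabled: bool,
                  custom_endpoint: str = "http://localhost:4317") -> str:
    """Mirror the template's env selection: prefer the bundled Jaeger
    collector when jaeger.enabled is true, else the user-supplied endpoint."""
    if jaeger_enabled:
        return f"http://{release_name}-jaeger-collector:4317"
    return custom_endpoint

# The release name becomes part of the collector's Service DNS name
assert (otlp_endpoint("vllm-llama3-8b-instruct", True)
        == "http://vllm-llama3-8b-instruct-jaeger-collector:4317")
assert otlp_endpoint("demo", False) == "http://localhost:4317"
```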

config/charts/inferencepool/values.yaml

Lines changed: 42 additions & 1 deletion

```diff
@@ -58,6 +58,8 @@ inferenceExtension:
     enabled: false
   tracing:
     enabled: false
+    # When jaeger.enabled is true, this will automatically point to the Jaeger collector
+    # Otherwise, you can specify your own OpenTelemetry collector endpoint
     otelExporterEndpoint: "http://localhost:4317"
     sampling:
       sampler: "parentbased_traceidratio"
@@ -94,4 +96,43 @@ istio:
   trafficPolicy: {}
   #   connectionPool:
   #     http:
-  #       maxRequestsPerConnection: 256000
+  #       maxRequestsPerConnection: 256000
+
+# Jaeger tracing backend configuration
+# When enabled, deploys Jaeger all-in-one for trace collection and visualization
+jaeger:
+  enabled: false
+  # Use the all-in-one deployment mode for simplicity
+  # For production, consider using a more robust deployment with separate components
+  allInOne:
+    enabled: true
+    image:
+      repository: jaegertracing/all-in-one
+      tag: "2.11"
+      pullPolicy: IfNotPresent
+    resources:
+      limits:
+        cpu: 500m
+        memory: 512Mi
+      requests:
+        cpu: 100m
+        memory: 128Mi
+  # Expose Jaeger UI service
+  query:
+    service:
+      type: ClusterIP
+      port: 16686
+  # Collector configuration for OTLP
+  collector:
+    service:
+      otlp:
+        grpc:
+          port: 4317
+        http:
+          port: 4318
+  # Storage configuration - use in-memory for simplicity
+  storage:
+    type: memory
+  # Agent configuration
+  agent:
+    enabled: false
```
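As a sketch of the production direction the comments above point at, an override file could disable all-in-one mode in favor of a persistent backend. This is illustrative only: the top-level keys come from the defaults above, but any storage-specific settings must be verified against the upstream Jaeger chart's values schema before use.

```yaml
# values-prod.yaml -- hypothetical production-leaning overrides
jaeger:
  enabled: true
  allInOne:
    enabled: false        # run separate Jaeger components instead
  storage:
    type: elasticsearch   # persistent backend instead of in-memory
```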
