
Conversation

@codefromthecrypt commented Sep 30, 2025

Standardizes /completions spans to use llm.* prefix with indexed nested attributes, aligning with /chat/completions format:

  • llm.prompts.{i}.prompt.text for completion prompts (nested indexed format)
  • llm.choices.{i}.completion.text for completion outputs (nested indexed format)
  • Adds LLM_CHOICES constant to semantic conventions
  • Reuses existing LLM_PROMPTS constant (un-deprecated for completions use)
  • Adds OPENINFERENCE_HIDE_CHOICES environment variable
  • Reuses existing OPENINFERENCE_HIDE_PROMPTS environment variable
  • Updates spec documentation with proper indexed attribute patterns

Key benefits:

  • Consistent llm.* prefix across all LLM span types (no completion.* top-level prefix)
  • Nested discriminated union structure enables future extensibility
  • Aligns with existing llm.input_messages and llm.output_messages patterns
  • Simplifies attribute parsing and querying with uniform structure

Breaking changes:

  • Attribute names change from completion.prompt.{i} → llm.prompts.{i}.prompt.text
  • Attribute names change from completion.text.{i} → llm.choices.{i}.completion.text
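
Concretely (the attribute values below are illustrative only; the key names are what matters), the rename moves from the old flat keys to the new indexed, nested keys:

# Hypothetical span attributes for a /completions call with two prompts and one choice.
old_attributes = {
    "completion.prompt.0": "Say hello",
    "completion.prompt.1": "Say goodbye",
    "completion.text.0": "Hello!",
}

new_attributes = {
    "llm.prompts.0.prompt.text": "Say hello",
    "llm.prompts.1.prompt.text": "Say goodbye",
    "llm.choices.0.completion.text": "Hello!",
}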

Note

Migrates OpenAI completions to indexed llm.prompts.N.prompt.text and llm.choices.N.completion.text, adds LLM_CHOICES and OPENINFERENCE_HIDE_CHOICES, and updates masking, tests, and docs accordingly.

  • OpenAI Instrumentation:
    • Standardizes completions request attributes to llm.prompts.N.prompt.text (replaces list under llm.prompts).
    • Emits completion outputs as llm.choices.N.completion.text from response choices.
  • Semantic Conventions:
    • Adds SpanAttributes.LLM_CHOICES; documents indexed/nested format for llm.prompts and llm.choices.
  • Config:
    • Introduces OPENINFERENCE_HIDE_CHOICES and TraceConfig.hide_choices; extends masking to redact llm.prompts.* and llm.choices.* based on hide settings (see the sketch after this list).
  • Tests:
    • Updates OpenAI instrumentation and config tests to assert new indexed attributes and hiding behavior.
  • Docs/Specs:
    • Updates configuration and LLM span specs with indexed patterns and a completions example.
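
A rough sketch of how the hide settings could drive redaction (the helper name, sentinel value, and prefix matching here are assumptions for illustration, not the library's actual masking code):

import os

REDACTED_VALUE = "__REDACTED__"  # assumed placeholder; the real sentinel may differ

def _env_flag(name: str) -> bool:
    # Treat "true" (any casing) as enabled; anything else as disabled.
    return os.environ.get(name, "false").strip().lower() == "true"

def mask_completion_attributes(attributes: dict) -> dict:
    # Illustrative only: redact the new indexed keys by prefix, based on the
    # OPENINFERENCE_HIDE_PROMPTS / OPENINFERENCE_HIDE_CHOICES environment variables.
    hide_prompts = _env_flag("OPENINFERENCE_HIDE_PROMPTS")
    hide_choices = _env_flag("OPENINFERENCE_HIDE_CHOICES")
    masked = {}
    for key, value in attributes.items():
        if hide_prompts and key.startswith("llm.prompts."):
            masked[key] = REDACTED_VALUE
        elif hide_choices and key.startswith("llm.choices."):
            masked[key] = REDACTED_VALUE
        else:
            masked[key] = value
    return masked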

Written by Cursor Bugbot for commit 5a324c1.


@codefromthecrypt

The google-adk, haystack, and bedrock latest failures are not related to this PR; they come from drift in the corresponding latest versions of those libraries.

@codefromthecrypt force-pushed the openinference-completions branch from 4bb94eb to 463fd1e on September 30, 2025 02:55
@codefromthecrypt

bedrock ci fix: #2243

@codefromthecrypt

google-adk fix: #2244

@codefromthecrypt

haystack fix: #2245

@axiomofjoy

Thanks @codefromthecrypt. I think the llm.input_prompts change looks fine if it helps to avoid the attribute size limit. I'm concerned about the llm.output_choices attribute, since it seems like we would need to nest our existing LLM output message attributes under each choice in addition to text. I'm concerned about this resulting in a large amount of duplicate data in the span and quite a few more semantic conventions. Is choices a common pattern for different models and APIs? I have so far only seen it as an idiosyncrasy of the OpenAI API.

Looks like spec/semantic_conventions.md also needs to be updated.

@codefromthecrypt commented Sep 30, 2025

@axiomofjoy What I was told in Envoy AI Gateway is that /completions is a special case needed for LoRA use cases. So while it is specified semantically, we can clarify that this only applies to that endpoint, which is implemented in several places outside OpenAI, most notably vLLM (Bloomberg was talking about this specifically).

Where I'm going with this: I don't think we should imply that /chat/completions and the others map to choices. Rather, the existing semantics, which index on messages (a /chat/completions noun), aren't reused for the choices of the raw/legacy completions response (which has no "messages" field, only "choices"). In other words, choices is contained to the completions use case.

If we pragmatically want to map /completions attributes into a synthetic schema that merges with /chat/completions, I'm keen on that; I'd just like guidance on it. Either way, response arrays have the same size issue as the request ones.

Thoughts?

@codefromthecrypt

Concretely, in LLM spans (normal chat completions, which have a concept of role) we traverse the choices path, get to the message.role and message.content fields, and add them as indexed attributes.

    {
      "key": "output.value",
      "value": {
        "stringValue": "{\"id\":\"chatcmpl-C4Gm9xikLXbgE8He0BHWeoM03aa72\",\"choices\":[{\"finish_reason\":\"stop\",\"index\":0,\"message\":{\"content\":\"Hi there! How can I help you today? I can explain concepts, answer questions, help with writing or editing, brainstorm ideas, assist with coding or math, plan tasks, and more. Tell me what you’d like to do.\",\"refusal\":null,\"role\":\"assistant\",\"annotations\":[]}}],\"created\":1755133833,\"model\":\"gpt-5-nano-2025-08-07\",\"object\":\"chat.completion\",\"service_tier\":\"default\",\"system_fingerprint\":null,\"usage\":{\"completion_tokens\":377,\"prompt_tokens\":8,\"total_tokens\":385,\"completion_tokens_details\":{\"accepted_prediction_tokens\":0,\"audio_tokens\":0,\"reasoning_tokens\":320,\"rejected_prediction_tokens\":0},\"prompt_tokens_details\":{\"audio_tokens\":0,\"cached_tokens\":0}}}"
      }
    },
    {
      "key": "llm.output_messages.0.message.role",
      "value": {
        "stringValue": "assistant"
      }
    },
    {
      "key": "llm.output_messages.0.message.content",
      "value": {
        "stringValue": "Hi there! How can I help you today? I can explain concepts, answer questions, help with writing or editing, brainstorm ideas, assist with coding or math, plan tasks, and more. Tell me what you’d like to do."
      }
    }

In completions there is no structured field inside each choice, so it doesn't make as much sense to map it the same way as chat completions, which has structured data and therefore needs to split out the content vs. the role (completions has no role):

        "output.value": "{\"id\": \"cmpl-CKz4klHa1MMqAa4hQn3yzIMlLMZHd\", \"object\": \"text_completion\", \"created\": 1759117370, \"model\": \"babbage:2023-07-21-v2\", \"choices\": [{\"text\": \" + fib(n-3) + fib(n-4)\\n\\ndef fib(n):\\n    if n <= 1:\\n        return\", \"index\": 0, \"finish_reason\": \"length\"}], \"usage\": {\"prompt_tokens\": 31, \"completion_tokens\": 25, \"total_tokens\": 56}}",
        "output.mime_type": "application/json",
        "llm.output_choices.0.choice.text": " + fib(n-3) + fib(n-4)\n\ndef fib(n):\n    if n <= 1:\n        return",

So, I think what you are saying is that because /chat/completions has a choices field from which the existing output attributes are sourced, using "choices" in the attribute name for the completions response, even if technically valid, would be confusing.

What if, instead of "llm.output_choices.0.choice.text", we made a pragmatic change to "llm.output_completion.0.text"? While less technically accurate, it might reduce the confusion.

I'll go ahead and spike this and also update the docs as requested. If you think of something better in the meantime, let me know.

@codefromthecrypt commented Oct 1, 2025

Eek... I just realized how bad embeddings look in practice: "embedding.embeddings.0.embedding.text", "embedding.embeddings.0.embedding.vector".

What if we change both of the special APIs, embedding and completion, to be less repetitive?
"embedding.input.0.text"
"embedding.output.0.vector"

"completion.input.0.prompt"
"completion.output.0.text"

I'll spike this for completions to get feedback.

@codefromthecrypt force-pushed the openinference-completions branch from 463fd1e to 62473a2 on October 1, 2025 02:54

@codefromthecrypt force-pushed the openinference-completions branch from 62473a2 to b3fe44e on October 1, 2025 03:11
@codefromthecrypt commented Oct 1, 2025

Current Design: "completion.prompt.N" and "completion.text.N"

Rationale

JSON Path Alignment

The convention directly mirrors the actual OpenAI Completions API structure:

Completions API (simple):

  • Request: { "prompt": "text" } or { "prompt": ["text1", "text2"] }
  • Response: { "choices": [{"text": "output", "index": 0}] }
    → Attributes: "completion.prompt.0", "completion.text.0"

Chat Completions API (structured objects):

  • Request: { "messages": [{"role": "user", "content": "text"}] }
  • Response: { "choices": [{"message": {"role": "assistant", "content": "text"}}] }
    → Attributes: "llm.input_messages.0.message.role", "llm.input_messages.0.message.content"

Key Difference: Completions deals with simple indexed strings, not structured objects with multiple fields. The attribute format reflects this fundamental difference.

Index-at-the-End Convention

For simple values (strings), the index comes last: "completion.prompt.0"

  • JSON path: request.prompt[0]
  • No nested object fields to navigate after the index

For structured objects, index comes in the middle: "llm.input_messages.0.message.content"

  • JSON path: request.messages[0].content
  • Must navigate through object fields after the index
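
To make the two indexing styles concrete, here is a small, hypothetical flattening of each payload into attribute keys. It follows the naming described above; it is not the instrumentor's actual code:

# Completions: simple indexed strings, so the index comes last.
completions_request = {"prompt": ["Say hello", "Say goodbye"]}
completions_response = {"choices": [{"text": "Hello!", "index": 0}]}

attrs = {f"completion.prompt.{i}": p for i, p in enumerate(completions_request["prompt"])}
attrs.update({f"completion.text.{c['index']}": c["text"] for c in completions_response["choices"]})
# -> {"completion.prompt.0": "Say hello", "completion.prompt.1": "Say goodbye",
#     "completion.text.0": "Hello!"}

# Chat completions: structured objects, so the index sits in the middle.
chat_request = {"messages": [{"role": "user", "content": "Say hello"}]}
for i, m in enumerate(chat_request["messages"]):
    attrs[f"llm.input_messages.{i}.message.role"] = m["role"]
    attrs[f"llm.input_messages.{i}.message.content"] = m["content"]
# -> adds {"llm.input_messages.0.message.role": "user",
#          "llm.input_messages.0.message.content": "Say hello"}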

Alternative Conventions (and why they're worse)

"completion.input.N.prompt" + "completion.output.N.text"

  • ❌ Adds unnecessary input/output levels not in the JSON structure
  • JSON path is prompt[N], not input.prompt[N]
  • Creates false hierarchy where none exists

"llm.prompts.N" (old convention)

  • ❌ Uses deprecated llm.prompts attribute
  • ❌ Plural form misleading (each index is ONE prompt, not multiple)
  • ❌ Doesn't distinguish input vs output

"completion.input_prompts.N.prompt.text"

  • ❌ Mimics chat format when completions are fundamentally simpler
  • ❌ Way too verbose for a single string value
  • ❌ Creates fake nesting (prompt.text) that doesn't exist in API

"llm.input_choices.N.choice.prompt"

  • ❌ "choices" is an output concept, not input
  • ❌ Confusing mental model (choices don't have prompts)
  • ❌ Doesn't match API terminology

@codefromthecrypt

openai drift #2253

@codefromthecrypt

beeai drift: #2255

@axiomofjoy commented Oct 2, 2025

@codefromthecrypt After discussing with @mikeldking, here's what we'd like to propose.

Prompts

We'd like to keep llm.prompts rather than llm.input_prompts. While llm.input_prompts does mirror the existing llm.input_messages prefix, prompts are implicitly inputs while messages can be inputs or outputs. We'd prefer to keep the conventions shorter without sacrificing descriptiveness, which we believe is possible in this case. In terms of backward compatibility, we think it's okay to deprecate the usage of llm.prompts with a list of strings attribute value while keeping the llm.prompts prefix. This should involve minimal changes to the implementations in the various OpenInference libraries since this convention is not currently widely used.

We'd also like to propose namespacing the "text" field under a "prompt" field. Concretely, keys would look like "llm.prompts.<prompt_index>.prompt.text". There are a few reasons we advocate for this approach:

  • The second part of the key after the index (i.e., prompt.text) is more descriptive than text alone.
  • It's convenient for downstream consumers of the telemetry data, which receive a payload such as the one below and can use "prompt" as the discriminator in a discriminated union to know what type to expect after accessing "llm.prompts.<prompt_index>.prompt".
{
  "llm": {
    "prompts": [
      {
        "prompt": {
          "text": "Write a haiku"
        }
      }
    ]
  }
}
  • Along those lines, having a mechanism for a discriminated union leaves us open to including other types in addition to "prompt" in the prompts array in the future if needed.
  • It mirrors the structure of our existing conventions for messages, which were chosen for similar reasons to those described above.
{
  "llm": {
    "input_messages": [
      {
        "message": {
          "role": "user",
          "content": "Write a haiku"
        }
      }
    ]
  }
}

Choices

llm.choices as a prefix makes sense to us. The nuance here is that both legacy completions and modern chat completions APIs support multiple choices. Similarly to above, we'd like to leave room for both via discriminated unions. Concretely, we'd like to propose keys of the form "llm.choices.<choice_index>.completion.text". This shares the benefits outlined for the proposed prompt key format. It also results in a format that is consistent between prompts, choices, and messages:

{
  "llm": {
    "prompts": [
      {
        "prompt": {
          "text": "Write a haiku"
        }
      }
    ],
    "input_messages": [
      {
        "message": {
          "role": "user",
          "content": "Write a haiku"
        }
      }
    ],
    "choices": [
      {
        "completion": {
          "text": "Cherry blossoms bloom\nabove New York’s restless streets\nskyline crowned in pink"
        }
      },
      {
        "chat_completion": {
          ... //  leave for another day
        }
      }
    ]
  }
}

Notes

We'd like to avoid adding a top-level "completion" prefix. Currently, the prefixes correspond to the span kind attribute values ("chain", "llm", "embedding", etc.). We still think of legacy completions as LLM spans and don't see a need to introduce a new completion span kind.
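
As a minimal sketch of how the nested structure above could flatten into span attribute keys: the helper below is hypothetical, assumes each array item has exactly one discriminator key (e.g. "prompt", "completion", or "chat_completion"), and handles scalar fields only (a nested chat_completion object would need recursion).

nested = {
    "llm": {
        "prompts": [{"prompt": {"text": "Write a haiku"}}],
        "choices": [{"completion": {"text": "Cherry blossoms bloom..."}}],
    }
}

def flatten_llm(nested: dict) -> dict:
    # Flatten the proposed llm.prompts / llm.choices structure into flat attribute keys.
    attrs = {}
    for group in ("prompts", "choices"):
        for i, item in enumerate(nested["llm"].get(group, [])):
            # The single key of each item acts as the discriminator of the union.
            (discriminator, fields), = item.items()
            for field, value in fields.items():
                attrs[f"llm.{group}.{i}.{discriminator}.{field}"] = value
    return attrs

# flatten_llm(nested) ->
# {"llm.prompts.0.prompt.text": "Write a haiku",
#  "llm.choices.0.completion.text": "Cherry blossoms bloom..."}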

@codefromthecrypt

looks beautiful @axiomofjoy @mikeldking thanks for collaborating on this!

@codefromthecrypt force-pushed the openinference-completions branch from b3fe44e to 0cf1098 on October 3, 2025 01:07

@codefromthecrypt

beeai drift refactored here: #2255

@codefromthecrypt

Also, it seems to be a trend that if you hide_inputs you also hide things derived from the inputs, and similarly for outputs. Is that true? If so, maybe I'll do a follow-up to make that coherent in tests and docs.
