-
Notifications
You must be signed in to change notification settings - Fork 1.2k
chore(api)!: BREAKING CHANGE: remove ALL telemetry APIs #3740
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We are moving away from having our own APIs for serving telemetry, and our own logic for storing it. We want to defer that responsibility to purpose built services for handling telemetry data in production environments like jaeger, prometheus, datadog, etc. This is a best practice and a design pattern a user would recognize and expect. |
@leseb here's a thread in Discord: https://discord.com/channels/1257833999603335178/1422318785175748709 what is your opinion? I feel like a semantic opinionated API like "get me token usage" could be useful in simplifying some of the metrics experience for users (i.e., the Stack would require in its configuration the correct sinks for OTEL and appropriate providers to query for metrics) but there's also the potential danger of needing to cover a lot of territory just for this because people's logging and metrics collection setup can be very very different. |
Instead of building our own syntax, and APIs, what we should do is set a clear and consistent standard for how we name, label and capture telemetry data. Things like metrics can be queried in services like prometheus, which already have their own query language and APIs, and if you want to simplify the user experience or show people a reference architecture, I would extend what we have in the telemetry scripts to include a Grafana dashboard that shows off some of the metrics and trace data we capture. This lets us still have a "simple" experience, while still conforming to standards that our customers expect and use. |
8fc91f9
to
bd759b8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also docs/docs/building_applications/telemetry.mdx
that might need updating or deleting.
|
||
@runtime_checkable | ||
class Telemetry(Protocol): | ||
@webmethod(route="/telemetry/events", method="POST", level=LLAMA_STACK_API_V1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably follow: https://llamastack.github.io/docs/concepts/apis/api_leveling#approval-and-announcement-process-for-breaking-changes in this case since we are flat out deleting an API, some sort of announcement on discord and in the upcoming release notes would be great
Thanks for the additional context. That sounds good to me for now. Having a solid internal telemetry implementation is more important than the API surface at this point. The API is something we can revisit later. |
REQUIRED_SCOPE = "telemetry.read" | ||
|
||
|
||
@json_schema_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may need to remove from Stainless Spec as well
# What does this PR do? ## Test Plan
What does this PR do?
As discussed on discord, we do not need to reinvent the wheel for telemetry. Instead we'll lean into the canonical OTEL stack. Logs/traces/metrics will still be sent via OTEL - they just won't be stored on, queried through Stack.
This is the first of many PRs to remove telemetry API from Stack.
Test Plan