Skip to content

Conversation

habema
Copy link
Contributor

@habema habema commented Aug 14, 2025

Resolves #1385

This PR introduces an optional structured storage mode to SQLiteSession to improve the observability and query-ability of conversation histories.

When a user initializes a session with structured=True, two new tables are created and populated alongside the existing raw JSON log:

  • agent_conversation_messages: Stores distinct user, assistant, and system messages.
  • agent_tool_calls: Records tool call invocations and their corresponding outputs.

This makes it significantly easier to analyze and debug agent interactions using standard SQL, addressing the limitations of querying JSON blobs.

Key points:

  • The feature is opt-in. Existing implementations are unaffected.
  • Foreign key constraints ensure data integrity when items are popped or sessions are cleared.
  • Includes comprehensive unit tests and updated documentation.

@seratch seratch added documentation Improvements or additions to documentation enhancement New feature or request feature:sessions labels Aug 14, 2025
@habema habema marked this pull request as draft August 19, 2025 13:35
@habema
Copy link
Contributor Author

habema commented Aug 19, 2025

Switching this back to a draft as it needs some reviewing

@habema habema marked this pull request as ready for review August 20, 2025 10:49
@habema
Copy link
Contributor Author

habema commented Aug 20, 2025

In true developer fashion, I might have dived a bit too deep. Yet, this all would be very beneficial to my current project.

The problem with my earlier proposed schema is that it lacks a fundamental feature, the ability to connect raw events to each other (user message to assistant message, tool calls to the invoking user message or resulting assistant message(s), usage, etc.), very similarly to tracing functionality.

To sum up the goal, its to make mimic tracing functionality in a structured and query-able manner.

  • Added spans and linkage

    • agent_conversation_messages: added parent_raw_event_id, trace_id, span_id. User rows keep the agent span; assistant rows are now attributed to the model’s generation/response span.
    • agent_tool_calls: added trace_id, span_id.
  • Introduced agent_usage to record per‑response usage: model, requests, token counts, details, and trace_id/span_id. Indexed by (trace_id, created_at) and response_id.

  • Configuration

    • The opt‑in flag is now structured_metadata=True (formerly structured=True).
    • Existing non‑structured behavior is unchanged.
  • Necessary tests, docs, and demo are included.

This keeps the feature opt‑in while making stored conversations, tool calls, and usage easy to analyze with SQL, and ensures accurate span attribution for observability.

@habema habema changed the title Feat: Add Optional Structured Session Storage Feat: Add Optional Structured Session Metadata Aug 20, 2025
@seratch
Copy link
Member

seratch commented Aug 26, 2025

Thanks for sending this PR. I quickly checked the changes and felt the current changes in this PR make the SDK internals way more complex, so I am a bit hesitant to have this.

@habema
Copy link
Contributor Author

habema commented Aug 26, 2025

Thank you for the feedback. I understand the concern about complexity.

To clarify:

  • Is the core idea of structured session metadata (making conversations queryable via SQL) welcome, but the current implementation too complex? If so, I'd be happy to look into alternative approaches. Perhaps just the basic structured tables (messages, tool calls, usage) without the tracing ingestion, or a implement this as an extension to separate it from the SDK internals.
  • Or is the core concept itself considered too complex for the SDK? If the tracing integration is the main concern, I could remove that entirely and focus just on the basic structured storage tables.

As mentioned earlier.

To sum up the goal, its to make mimic tracing functionality in a structured and query-able manner.

For my usecase, exact trace_ids and span_ids are not important, I just want to be able to connect every full interaction and its usage.

@seratch
Copy link
Member

seratch commented Aug 26, 2025

Is the core idea of structured session metadata (making conversations queryable via SQL) welcome, but the current implementation too complex?

Yes, it is. As we discussed at the issue, providing an option to effectively use relational database schema is totally fine but I don't think we need tracing integration this time. We already have sessions feature, so I was assuming that we can have yet another solution for the same purpose.

a implement this as an extension to separate it from the SDK internals.

I haven't checked how this can be at all, but if you have an idea like this in your mind, this could be more clean, plus enhancing the layer could be easier for users too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request feature:sessions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Suggestion: Database schema for conversations could be better
2 participants