
Tool/Function calling UI #31

@gadenbuie

Description


Tool Call Current State

Tool calling is currently entirely limited to the back-end chat interface. Tools are registered with ellmer or chatlas (see below), but shinychat does not do anything in the UI to indicate that a tool call is being made.

For background, here is how ellmer and chatlas register and store tool definitions.

ellmer

library(ellmer)

get_current_time <- function(tz = "UTC") {
  format(Sys.time(), tz = tz, usetz = TRUE)
}

tool_get_current_time <- tool(
  get_current_time,
  .description = "Gets the current time in the given time zone.",
  tz = type_string(
    "The time zone to get the current time in. Defaults to `\"UTC\"`.",
    required = FALSE
  )
)

chat <- chat_openai(model = "gpt-4o", echo = "all")
chat$register_tool(tool_get_current_time)

chat$chat("How long ago exactly was the moment Neil Armstrong touched down on the moon?")
#> > How long ago exactly was the moment Neil Armstrong touched down on the moon?
#> < [tool request (call_WkRmaly9E7kgpMB5RPWVzijh)]: get_current_time(tz = "UTC")
#> < [tool request (call_0OqdPpugMw3wjX9IEz1xVwxd)]: get_current_time(tz = 
#> < "America/New_York")
#> > [tool result  (call_WkRmaly9E7kgpMB5RPWVzijh)]: 2025-02-27 17:53:17 UTC
#> > [tool result  (call_0OqdPpugMw3wjX9IEz1xVwxd)]: 2025-02-27 12:53:17 EST
#> < Neil Armstrong touched down on the moon on July 20, 1969, at 20:17 UTC.
#> < 
#> < As of now, which is February 27, 2025, at 17:53 UTC, it has been 
#> < approximately 55 years, 7 months, and 7 days since that historic moment.
#> <

Internally, tool() creates a ToolDef instance with name, description and arguments properties.
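For example, the tool definition above can be inspected directly. This is a quick sketch; the `@` property access (ellmer's objects are S7) and the exact output shown are assumptions on my part:

tool_get_current_time@name
#> [1] "get_current_time"
tool_get_current_time@description
#> [1] "Gets the current time in the given time zone."
tool_get_current_time@arguments  # the type specification for `tz`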

chatlas

import requests
from chatlas import ChatOpenAI
from dotenv import load_dotenv

load_dotenv()

def get_current_temperature(latitude: float, longitude: float):
    """
    Get the current weather given a latitude and longitude.

    Parameters
    ----------
    latitude
        The latitude of the location.
    longitude
        The longitude of the location.
    """
    lat_lng = f"latitude={latitude}&longitude={longitude}"
    url = f"https://api.open-meteo.com/v1/forecast?{lat_lng}&current=temperature_2m,wind_speed_10m&hourly=temperature_2m,relative_humidity_2m,wind_speed_10m"
    response = requests.get(url)
    json = response.json()
    return json["current"]

chat = ChatOpenAI(model="gpt-4o-mini")
chat.register_tool(get_current_temperature)
chat.chat("What's the weather like today in Duluth, MN?", echo="all")
#> 👤 User turn:
#> 
#> What's the weather like today in Duluth, MN?
#> 
#> 🤖 Assistant turn:
#> 
#>  # tool request (call_YRma1FOUHVPGHkfylqdHw886)
#>  get_current_temperature(latitude=46.7833, longitude=-92.1062)
#> 
#> << 🤖 finish reason: tool_calls >>
#> 
#> 
#> 👤 User turn:
#> 
#>  # tool result (call_YRma1FOUHVPGHkfylqdHw886)
#>  {'time': '2025-02-27T17:45', 'interval': 900, 'temperature_2m': 3.2, 'wind_speed_10m': 21.6}
#> 
#> 🤖 Assistant turn:
#> 
#> Today in Duluth, MN, the temperature is approximately 3.2°C with a wind speed of 21.6 km/h.

Chat.register_tool() has signature Chat.register_tool(func, *, model=None), i.e. it takes a function func and determines input parameters from the function docstring. For more complicated tools, you can pass a pydantic model to model.

Internally, .register_tool() creates a Tool instance with properties .func, .schema and .name. The schema stores the function description.

Tool calls in shinychat

Here are a few design sketches for what tool calling might look like:

| Step | Block | Inline |
| --- | --- | --- |
| Tool call starts | (image) | (image) |
| Tool call completes | (image) | (image) |
| Extra info¹ | (image) | (image) |

1. The extra info shown here is handled with two different mechanisms. In the block-style UI, the tool call completion hook returns custom UI with Shiny UI elements to show the results. In the inline-style UI, we could use popovers to display additional information about the call, e.g. the parameters used.
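As a rough sketch of that inline popover idea (the helper name, class names, and markup below are purely illustrative and assume bslib/htmltools):

library(bslib)
library(htmltools)

# An inline "chip" for a tool call; its popover reveals the call's arguments
inline_tool_chip <- function(label, args) {
  popover(
    tags$span(class = "badge text-bg-secondary", label),
    tags$pre(paste(names(args), args, sep = " = ", collapse = "\n")),
    title = "Tool call"
  )
}

inline_tool_chip("Get current time", list(tz = "America/New_York"))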

There are two points in the tool-calling lifecycle where we need status updates:

  1. When the tool is called
  2. When the tool call completes

I'm envisioning that tools would be registered with shinychat similarly to how they're registered with chatlas or ellmer. Internally (or alternatively), shinychat could provide classes that extend chatlas.Turn, or UI methods for ellmer::Turn.

Ideally, the minimum amount of work required would be to register the tools, from there our default methods could create the UI as needed, using only information in the tool definition. For example, the name of the get_current_weather tool above could be converted into the UI label "Get current weather".
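For instance, a default label could be derived mechanically from the tool name. The helper below is a hypothetical illustration of that transformation, not an existing shinychat function:

# "get_current_weather" -> "Get current weather"
tool_label <- function(name) {
  label <- gsub("_", " ", name, fixed = TRUE)
  paste0(toupper(substring(label, 1, 1)), substring(label, 2))
}

tool_label("get_current_weather")
#> [1] "Get current weather"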

Depending on the use case, I can also see wanting to pick between a block display and an inline display. The block display works best for larger or longer-running tasks, while the inline display could be used in situations where each turn is likely to include many tool calls.

Both of these variants would be encapsulated in functions available to users, which we'd call with sensible defaults. For example, we could default to showing "Get current weather", but a user could provide their own method that uses our block-display design but changes the title to "Get weather in Duluth, MN".

The completion method would, by default, find the tool call UI added at the start of the tool call and simply mark it as completed by updating its attributes.

That said, we also want it to be possible for the completion callback to send entirely custom UI that replaces the initial tool call UI. This is shown in the last row of the table for the block method, where instead of simply marking the flight search tool call as complete, the data received in the tool call is presented using custom UI.

Static Case

To start with the simplest and most straightforward case, we'll consider a chat with a single tool call.

User: What time is it in London?
Assistant: 
  Sure, I can look up the time.
  <tool_request id="123" name="get_current_time" tz="UK/London">
User: <tool_result id="123">2025-03-31 11:12:13</tool_result>
Assistant: It's 11am in London.

ellmer handles the tool request in the first assistant message, automatically invokes the tool, and returns the result. Note that in the chat turns, ellmer stores the tool result as a user turn, because we ran code locally to invoke the tool. Note also that, in a live session, control of the chat isn't returned to the user until after the second assistant message.

Structurally, the turns look like this at the end of this chat:

  • Turn(role = "user")
    • ContentText (user message)
  • Turn(role = "assistant")
    • ContentText (assistant message)
    • ContentToolRequest
  • Turn(role = "user")
    • ContentToolResult
  • Turn(role = "assistant")
    • ContentText (assistant message)
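For reference, those turns can be inspected on the client after the fact. get_turns() exists in ellmer today; the @contents property access below assumes ellmer's S7 objects:

chat <- chat_openai(model = "gpt-4o")
chat$register_tool(tool_get_current_time)
chat$chat("What time is it in London?")

turns <- chat$get_turns()
length(turns)        # 4: user, assistant, user (tool result), assistant
turns[[2]]@contents  # ContentText + ContentToolRequest
turns[[3]]@contents  # ContentToolResult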

Currently, when launching live_browser(client) on a chat client object, ellmer calls contents_markdown() on the turns. For contents_markdown(<ContentText>), this is a simple transformation that extracts the text entered by the user or returned from the LLM. On the other hand, contents_markdown(<ContentToolRequest>) and contents_markdown(<ContentToolResult>) are no-ops and hide the tool request/result from the chat UI.

I propose that we introduce a new generic -- contents_shinychat() -- to use instead, which can create a default display for ContentToolRequest or ContentToolResult. We'd also have a contents_shinychat(<Chat>) method that reorganizes the turns to coalesce tool results into a single assistant message, suppressing the tool request display and showing only the tool results.
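To make that concrete, here's a rough sketch of the generic and two methods. It's written S3-style for brevity; ellmer's Content classes are S7, so the actual dispatch and property access would differ, and the markup is illustrative only:

contents_shinychat <- function(content, ...) {
  UseMethod("contents_shinychat")
}

# Fall back to the existing markdown representation
contents_shinychat.default <- function(content, ...) {
  ellmer::contents_markdown(content)
}

# Minimal default display for a tool result
contents_shinychat.ContentToolResult <- function(content, ...) {
  htmltools::tags$div(
    class = "shinychat-tool-result",
    htmltools::tags$strong("Tool result"),
    htmltools::tags$pre(format(content@value))
  )
}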

sequenceDiagram
    participant R as R Session
    participant client
    participant ellmer
    participant Terminal as Terminal (emit)
    participant UI as UI (yield)
    
    
    rect rgba(255, 165, 0, 0.05)
    Note over client: ContentText (user)
    client-->>UI: contents_shinychat(turn)
    Note over UI: User message
    end
    
    rect rgba(100, 100, 255, 0.05)
    Note over client: ContentText (assistant)
    client-->>UI: contents_shinychat(turn)
    Note over UI: Assistant message
    Note over client: ContentToolRequest
    Note over client: ContentToolResult
    client-->>UI: contents_shinychat(ContentToolResult)
    Note over UI: Tool result display
    end

Live Case

When running live, the Content* objects are not directly used in the UI. In general, they're created by ellmer and recorded in the chat's turns, but the responses from the LLM API are streamed to the UI as yielded strings.

sequenceDiagram
    participant R as R Session
    participant client as  client<br>(record)
    participant ellmer
    participant Terminal as Terminal<br>(emit)
    participant UI as UI<br>(yield)
    
    UI->>client: User input submitted
    Note over client: ContentText (user)
    client->>ellmer: chat_append(client$stream())
    activate client
    activate ellmer
    ellmer-->>Terminal: emit(chunk)
    Note over Terminal: Text of assistant response
    ellmer-->>UI: yield(chunk)
    Note over UI: Display of assistant response
    ellmer->>client: 
    Note over client: ContentText (assistant)
    deactivate client
    rect rgba(100, 100, 255, 0.05)
    ellmer->>client: Record tool request
    activate ellmer
    Note over client: ContentToolRequest
    ellmer-->>Terminal: emit(request)
    Note over Terminal: echo="all"
    ellmer-->>UI: yield(request)
    Note over UI: on_tool_request(request)
    Note over UI: contents_shinychat(request)
    ellmer->>R: Invoke tool
    activate R
    R-->>UI: chat_append_stream()
    Note over UI: Text appended during tool call
    R->>ellmer: tool result 
    deactivate R
    ellmer-->>Terminal: emit(result)
    Note over Terminal: echo="all"
    ellmer-->>UI: yield(result)
    Note over UI: on_tool_result(result)
    Note over UI: contents_shinychat(result)
    deactivate ellmer
    deactivate ellmer
    ellmer->>client: 
    Note over client: ContentToolResult
    end

Here's a description of the process depicted in the sequence diagram. I've added bold to the steps that we would be modifying to make this approach work.

  1. User Input: The user provides input through the UI. This input is sent to the client as ContentText from the user.

  2. Client to ellmer: The app calls client$stream(input); ellmer makes the API request to the LLM and returns a generator as a response. Calling shinychat::chat_append(stream) directs the yielded strings to the UI (see the app-level sketch after this list).

  3. Assistant Response (Text): ellmer emits chunks of the assistant's response to the Terminal (assuming echo="all") and yields chunks to the UI for display, showing the text of the assistant's response in both the Terminal and the UI.

  4. Assistant Response (Content): ellmer records the assistant message with ContentText and ContentToolRequest objects in the assistant turn.

  5. Tool Request: ellmer currently doesn't yield the tool request, but I propose we add a yield_all option to $stream() and $stream_async(). When TRUE, we yield all non-text content at the end of the assistant turn.

  6. Tool Request Display: shinychat receives the yielded ContentToolRequest and transforms it with contents_shinychat() before appending to the current chat message.

  7. Tool Invocation: ellmer invokes the tool in the R session.

  8. Text Appended During Tool Call: During the tool invocation (inside the tool function body), tool authors can use chat_append_stream() to append content to the UI. This content is ephemeral unless it ends up recorded in the tool result.

  9. Tool Result: The tool function returns a result.

    Currently, this can be any jsonifiable object but doesn't have a special class. In addition to returning a regular R object, I propose we also allow the tool to return a ContentToolResult object, which might be a custom user-defined class that inherits from ContentToolResult.

    In addition to the properties currently used by ContentToolResult -- id, value, error -- we would add detail (additional data), call_tool (the tool def of the calling tool) and call_args (the arguments used when calling the tool).

    Currently, the tool results are converted to ContentToolResult and stored as a new user turn, but with yield_all = TRUE, ellmer would yield the contents of the turn into the generator.

  10. Tool Result Display: Again, shinychat would receive the yielded ContentToolResult, call contents_shinychat() on the ContentToolResult and append the formatted result to the chat message.

  11. On Tool Callbacks: With this approach, shinychat can also own the on_tool_request() and on_tool_result() callbacks. The immediate need is to remove, replace or hide any UI from the ContentToolRequest when we receive a ContentToolResult. These callbacks do not need to be user-facing at this point.

  12. The final goal is that, once we have a paired ContentToolRequest and ContentToolResult, the final live state in the UI is equivalent to the chat state in the static case.
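From the app author's perspective, very little should change. Here's a rough app-level sketch: the streaming pattern below already works with shinychat today, and yield_all is the proposed addition that would let tool requests/results flow through contents_shinychat() on their way to the UI:

library(shiny)
library(ellmer)
library(shinychat)

ui <- bslib::page_fillable(
  chat_ui("chat")
)

server <- function(input, output, session) {
  client <- chat_openai(model = "gpt-4o")
  client$register_tool(tool_get_current_time)

  observeEvent(input$chat_user_input, {
    # With yield_all = TRUE (proposed, not yet in ellmer), this stream would
    # also yield ContentToolRequest/ContentToolResult objects for shinychat
    # to display.
    stream <- client$stream_async(input$chat_user_input)
    chat_append("chat", stream)
  })
}

shinyApp(ui, server)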

Things we need

  • A way to list attached tools from chat, e.g. chat$get_tools() or chat.get_tools(). Would be useful if we want to take a chat client and register its tools with shinychat. Not sure if this is strictly required, but could be a nice addition regardless. This now exists in ellmer and chatlas.

  • shinychat gains contents_shinychat() generic, with methods for Chat, ContentToolRequest, ContentToolResult, etc., otherwise falling back to contents_html() or contents_markdown().

  • Expand data included in the ContentToolResult object: id, call_tool, call_args are added by ellmer when the tool is invoked. value is the value that's sent to the LLM. detail is a list that collects any other data that someone would want to add to the tool result (analogous to CustomEvent.detail in JavaScript).

  • ellmer gains support for tools to return ContentToolResult objects. If a tool returns a ContentToolResult, ellmer fills in id, call_tool and call_args. Otherwise, ellmer creates the ContentToolResult.

  • ellmer gains yield_all in Chat$stream() and Chat$stream_async(). When TRUE, we yield non-text assistant responses after the assistant turn completes (text is already yielded) and we yield the contents of the user turn added by tool invocation.

  • With the above in place, tool authors could return custom ContentToolResult objects with custom contents_shinychat() methods for formatting their display in Shiny (see the sketch after this list).

  • Additionally, ToolDef should gain an annotations property that allows tool definitions to carry additional properties that would be used in display. annotations were recently added to the MCP schema. contents_shinychat() would hook into these annotations for displaying the tool name or knowing which tools modify their environment, etc.

  • Relatedly, shinychat needs a way to append to the current chat without knowing the ID of the chat. This would allow tool authors to write tool functions that append to shinychat when it's available or do something else when used without a shinychat UI. That might look something like this:

    my_tool <- function(...) {
       chat_ui <- local_shinychat()
       chat_ui$append("Progress: 0%") # no-op if called outside `chat_append()`
    
       # ... do stuff ...
       chat_ui$replace("Progress: 50%")
       # or maybe
       chat_ui$append("Progress: 50%", operation = "replace")
    
       # finally...
       chat_ui$replace("All done!")
       result
    }
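Pulling the ContentToolResult pieces above together, a tool under this proposal might look like the sketch below. Returning a ContentToolResult from a tool and the detail property are both proposed additions (they don't exist in ellmer today), and lookup_flights() is a hypothetical helper:

search_flights <- function(from, to) {
  flights <- lookup_flights(from, to)  # hypothetical data source
  ContentToolResult(
    # `value` is what gets sent back to the LLM
    value = head(flights, 5),
    # `detail` (proposed) carries extra data for a custom contents_shinychat()
    # method, e.g. to render the full results table in the chat UI
    detail = list(flights = flights)
  )
}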
