
Tool/Function calling UI #31

@gadenbuie

Description


Tool Call Current State

Tool calling is currently entirely limited to the back-end chat interface. Tools are registered with ellmer or chatlas (see below), but shinychat does not do anything in the UI to indicate that a tool call is being made.

For background, here is how ellmer and chatlas register and store tool definitions.

ellmer

library(ellmer)

get_current_time <- function(tz = "UTC") {
  format(Sys.time(), tz = tz, usetz = TRUE)
}

tool_get_current_time <- tool(
  get_current_time,
  .description = "Gets the current time in the given time zone.",
  tz = type_string(
    "The time zone to get the current time in. Defaults to `\"UTC\"`.",
    required = FALSE
  )
)

chat <- chat_openai(model = "gpt-4o", echo = "all")
chat$register_tool(tool_get_current_time)

chat$chat("How long ago exactly was the moment Neil Armstrong touched down on the moon?")
#> > How long ago exactly was the moment Neil Armstrong touched down on the moon?
#> < [tool request (call_WkRmaly9E7kgpMB5RPWVzijh)]: get_current_time(tz = "UTC")
#> < [tool request (call_0OqdPpugMw3wjX9IEz1xVwxd)]: get_current_time(tz = 
#> < "America/New_York")
#> > [tool result  (call_WkRmaly9E7kgpMB5RPWVzijh)]: 2025-02-27 17:53:17 UTC
#> > [tool result  (call_0OqdPpugMw3wjX9IEz1xVwxd)]: 2025-02-27 12:53:17 EST
#> < Neil Armstrong touched down on the moon on July 20, 1969, at 20:17 UTC.
#> < 
#> < As of now, which is February 27, 2025, at 17:53 UTC, it has been 
#> < approximately 55 years, 7 months, and 7 days since that historic moment.
#> <

Internally, tool() creates a ToolDef instance with name, description and arguments properties.
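For example, the tool definition above can be inspected directly. This is a quick sketch; the `@` property access (ellmer's objects are S7) and the exact output shown are assumptions on my part:

tool_get_current_time@name
#> [1] "get_current_time"
tool_get_current_time@description
#> [1] "Gets the current time in the given time zone."
tool_get_current_time@arguments  # the type specification for `tz`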

chatlas

import requests
from chatlas import ChatOpenAI
from dotenv import load_dotenv

load_dotenv()

def get_current_temperature(latitude: float, longitude: float):
    """
    Get the current weather given a latitude and longitude.

    Parameters
    ----------
    latitude
        The latitude of the location.
    longitude
        The longitude of the location.
    """
    lat_lng = f"latitude={latitude}&longitude={longitude}"
    url = f"https://api.open-meteo.com/v1/forecast?{lat_lng}&current=temperature_2m,wind_speed_10m&hourly=temperature_2m,relative_humidity_2m,wind_speed_10m"
    response = requests.get(url)
    json = response.json()
    return json["current"]

chat = ChatOpenAI(model="gpt-4o-mini")
chat.register_tool(get_current_temperature)
chat.chat("What's the weather like today in Duluth, MN?", echo="all")
#> 👤 User turn:
#> 
#> What's the weather like today in Duluth, MN?
#> 
#> 🤖 Assistant turn:
#> 
#>  # tool request (call_YRma1FOUHVPGHkfylqdHw886)
#>  get_current_temperature(latitude=46.7833, longitude=-92.1062)
#> 
#> << 🤖 finish reason: tool_calls >>
#> 
#> 
#> 👤 User turn:
#> 
#>  # tool result (call_YRma1FOUHVPGHkfylqdHw886)
#>  {'time': '2025-02-27T17:45', 'interval': 900, 'temperature_2m': 3.2, 'wind_speed_10m': 21.6}
#> 
#> 🤖 Assistant turn:
#> 
#> Today in Duluth, MN, the temperature is approximately 3.2°C with a wind speed of 21.6 km/h.

Chat.register_tool() has signature Chat.register_tool(func, *, model=None), i.e. it takes a function func and determines input parameters from the function docstring. For more complicated tools, you can pass a pydantic model to model.

Internally, .register_tool() creates a Tool instance with properties .func, .schema and .name. The schema stores the function description.

Tool calls in shinychat

Here are a few design sketches for what tool calling might look like:

| Step | Block | Inline |
| --- | --- | --- |
| Tool call starts | (image) | (image) |
| Tool call completes | (image) | (image) |
| Extra info¹ | (image) | (image) |

1. The extra info shown here is handled with two different mechanisms. In the block-style UI, the tool call completion hook returns custom UI with Shiny UI elements to show the results. In the inline-style UI, we could use popovers to display additional information about the call, e.g. the parameters used.
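As a rough sketch of that inline popover idea (the helper name, class names, and markup below are purely illustrative and assume bslib/htmltools):

library(bslib)
library(htmltools)

# An inline "chip" for a tool call; its popover reveals the call's arguments
inline_tool_chip <- function(label, args) {
  popover(
    tags$span(class = "badge text-bg-secondary", label),
    tags$pre(paste(names(args), args, sep = " = ", collapse = "\n")),
    title = "Tool call"
  )
}

inline_tool_chip("Get current time", list(tz = "America/New_York"))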

There are two points in the tool-calling lifecycle where we need status updates:

  1. When the tool is called
  2. When the tool call completes

I'm envisioning that tools would be registered with shinychat similarly to how they're registered with chatlas or ellmer. Internally (or alternatively), shinychat could provide classes that extend chatlas.Turn, or UI methods for ellmer::Turn.

Ideally, the minimum amount of work required would be to register the tools, from there our default methods could create the UI as needed, using only information in the tool definition. For example, the name of the get_current_weather tool above could be converted into the UI label "Get current weather".
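For instance, a default label could be derived mechanically from the tool name. The helper below is a hypothetical illustration of that transformation, not an existing shinychat function:

# "get_current_weather" -> "Get current weather"
tool_label <- function(name) {
  label <- gsub("_", " ", name, fixed = TRUE)
  paste0(toupper(substring(label, 1, 1)), substring(label, 2))
}

tool_label("get_current_weather")
#> [1] "Get current weather"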

Depending on the use case, I can also see wanting to pick between a block display and an inline display. The block display works best for larger or longer-running tasks, while the inline display could be used in situations where each turn is likely to include many tool calls.

Both of these variants would be encapsulated in functions available to users, which we'd call with sensible defaults. For example, we could default to showing "Get current weather", but a user could provide their own method that uses our block-display design but changes the title to "Get weather in Duluth, MN".

The completion method would, by default, find the tool call UI added at the start of the tool call and simply mark it as completed by updating its attributes.

That said, we also want it to be possible for the completion callback to send entirely custom UI that replaces the initial tool call UI. This is shown in the last row of the table for the block method, where instead of simply marking the flight search tool call as complete, the data received in the tool call is presented using custom UI.

Static Case

To start with the simplest and most straightforward case, we'll consider a chat with a single tool call.

User: What time is it in London?
Assistant: 
  Sure, I can look up the time.
  <tool_request id="123" name="get_current_time" tz="UK/London">
User: <tool_result id="123">2025-03-31 11:12:13</tool_result>
Assistant: It's 11am in London.

ellmer handles the tool request in the first assistant message, automatically invokes the tool, and returns the result. Note that in the chat turns, ellmer stores the tool result as a user turn, because we ran code locally to invoke the tool. Note also that, in a live session, control of the chat isn't returned to the user until after the second assistant message.

Structurally, the turns look like this at the end of this chat:

  • Turn(role = "user")
    • ContentText (user message)
  • Turn(role = "assistant")
    • ContentText (assistant message)
    • ContentToolRequest
  • Turn(role = "user")
    • ContentToolResult
  • Turn(role = "assistant")
    • ContentText (assistant message)
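For reference, those turns can be inspected on the client after the fact. get_turns() exists in ellmer today; the @contents property access below assumes ellmer's S7 objects:

chat <- chat_openai(model = "gpt-4o")
chat$register_tool(tool_get_current_time)
chat$chat("What time is it in London?")

turns <- chat$get_turns()
length(turns)        # 4: user, assistant, user (tool result), assistant
turns[[2]]@contents  # ContentText + ContentToolRequest
turns[[3]]@contents  # ContentToolResult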

Currently, when launching live_browser(client) on a chat client object, ellmer calls contents_markdown() on the turns. For contents_markdown(<ContentText>), this is a simple transformation that extracts the text entered by the user or returned from the LLM. On the other hand, contents_markdown(<ContentToolRequest>) and contents_markdown(<ContentToolResult>) are no-ops and hide the tool request/result from the chat UI.

I propose that we introduce a new generic -- contents_shinychat() -- to use instead, which can create a default display for ContentToolRequest or ContentToolResult. We'd also have a contents_shinychat(<Chat>) method that reorganizes the turns to coalesce tool results into a single assistant message, suppressing the tool request display and showing only the tool results.
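To make that concrete, here's a rough sketch of the generic and two methods. It's written S3-style for brevity; ellmer's Content classes are S7, so the actual dispatch and property access would differ, and the markup is illustrative only:

contents_shinychat <- function(content, ...) {
  UseMethod("contents_shinychat")
}

# Fall back to the existing markdown representation
contents_shinychat.default <- function(content, ...) {
  ellmer::contents_markdown(content)
}

# Minimal default display for a tool result
contents_shinychat.ContentToolResult <- function(content, ...) {
  htmltools::tags$div(
    class = "shinychat-tool-result",
    htmltools::tags$strong("Tool result"),
    htmltools::tags$pre(format(content@value))
  )
}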

sequenceDiagram
    participant R as R Session
    participant client
    participant ellmer
    participant Terminal as Terminal (emit)
    participant UI as UI (yield)
    
    
    rect rgba(255, 165, 0, 0.05)
    Note over client: ContentText (user)
    client-->>UI: contents_shinychat(turn)
    Note over UI: User message
    end
    
    rect rgba(100, 100, 255, 0.05)
    Note over client: ContentText (assistant)
    client-->>UI: contents_shinychat(turn)
    Note over UI: Assistant message
    Note over client: ContentToolRequest
    Note over client: ContentToolResult
    client-->>UI: contents_shinychat(ContentToolResult)
    Note over UI: Tool result display
    end

Live Case

When running live, the Content* objects are not directly used in the UI. In general, they're created by ellmer and recorded in the chat's turns, but the responses from the LLM API are streamed to the UI as yielded strings.

sequenceDiagram
    participant R as R Session
    participant client as  client<br>(record)
    participant ellmer
    participant Terminal as Terminal<br>(emit)
    participant UI as UI<br>(yield)
    
    UI->>client: User input submitted
    Note over client: ContentText (user)
    client->>ellmer: chat_append(client$stream())
    activate client
    activate ellmer
    ellmer-->>Terminal: emit(chunk)
    Note over Terminal: Text of assistant response
    ellmer-->>UI: yield(chunk)
    Note over UI: Display of assistant response
    ellmer->>client: 
    Note over client: ContentText (assistant)
    deactivate client
    rect rgba(100, 100, 255, 0.05)
    ellmer->>client: Record tool request
    activate ellmer
    Note over client: ContentToolRequest
    ellmer-->>Terminal: emit(request)
    Note over Terminal: echo="all"
    ellmer-->>UI: yield(request)
    Note over UI: on_tool_request(request)
    Note over UI: contents_shinychat(request)
    ellmer->>R: Invoke tool
    activate R
    R-->>UI: chat_append_stream()
    Note over UI: Text appended during tool call
    R->>ellmer: tool result 
    deactivate R
    ellmer-->>Terminal: emit(result)
    Note over Terminal: echo="all"
    ellmer-->>UI: yield(result)
    Note over UI: on_tool_result(result)
    Note over UI: contents_shinychat(result)
    deactivate ellmer
    deactivate ellmer
    ellmer->>client: 
    Note over client: ContentToolResult
    end

Here's a description of the process depicted in the sequence diagram. I've added bold to the steps that we would be modifying to make this approach work.

  1. User Input: The user provides input through the UI. This input is sent to the client as ContentText from the user.

  2. Client to ellmer: The app calls client$stream(input); ellmer makes the API request to the LLM and returns a generator as a response. Calling shinychat::chat_append(stream) directs the yielded strings to the UI (see the app-level sketch after this list).

  3. Assistant Response (Text): ellmer emits chunks of the assistant's response to the Terminal (assuming echo="all") and yields chunks to the UI for display, showing the text of the assistant's response in both the Terminal and the UI.

  4. Assistant Response (Content): ellmer records the assistant message with ContentText and ContentToolRequest objects in the assistant turn.

  5. Tool Request: ellmer currently doesn't yield the tool request, but I propose we add a yield_all option to $stream() and $stream_async(). When TRUE, we yield all non-text content at the end of the assistant turn.

  6. Tool Request Display: shinychat receives the yielded ContentToolRequest and transforms it with contents_shinychat() before appending to the current chat message.

  7. Tool Invocation: ellmer invokes the tool in the R session.

  8. Text Appended During Tool Call: During the tool invocation (inside the tool function body), tool authors can use chat_append_stream() to append content to the UI. This content is ephemeral unless it ends up recorded in the tool result.

  9. Tool Result: The tool function returns a result.

    Currently, this can be any jsonifiable object but doesn't have a special class. In addition to returning a regular R object, I propose we also allow the tool to return a ContentToolResult object, which might be a custom user-defined class that inherits from ContentToolResult.

    In addition to the properties currently used by ContentToolResult -- id, value, error -- we would add detail (additional data), call_tool (the tool def of the calling tool) and call_args (the arguments used when calling the tool).

    Currently, the tool results are converted to ContentToolResult and stored as a new user turn, but with yield_all = TRUE, ellmer would yield the contents of the turn into the generator.

  10. Tool Result Display: Again, shinychat would receive the yielded ContentToolResult, call contents_shinychat() on the ContentToolResult and append the formatted result to the chat message.

  11. On Tool Callbacks: With this approach, shinychat can also own the on_tool_request() and on_tool_result() callbacks. The immediate need is to remove, replace or hide any UI from the ContentToolRequest when we receive a ContentToolResult. These callbacks do not need to be user-facing at this point.

  12. The final goal is that, once we have a paired ContentToolRequest and ContentToolResult, the final live state in the UI is equivalent to the chat state in the static case.
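From the app author's perspective, very little should change. Here's a rough app-level sketch: the streaming pattern below already works with shinychat today, and yield_all is the proposed addition that would let tool requests/results flow through contents_shinychat() on their way to the UI:

library(shiny)
library(ellmer)
library(shinychat)

ui <- bslib::page_fillable(
  chat_ui("chat")
)

server <- function(input, output, session) {
  client <- chat_openai(model = "gpt-4o")
  client$register_tool(tool_get_current_time)

  observeEvent(input$chat_user_input, {
    # With yield_all = TRUE (proposed, not yet in ellmer), this stream would
    # also yield ContentToolRequest/ContentToolResult objects for shinychat
    # to display.
    stream <- client$stream_async(input$chat_user_input)
    chat_append("chat", stream)
  })
}

shinyApp(ui, server)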

Things we need

  • A way to list attached tools from chat, e.g. chat$get_tools() or chat.get_tools(). Would be useful if we want to take a chat client and register its tools with shinychat. Not sure if this is strictly required, but could be a nice addition regardless. This now exists in ellmer and chatlas.

  • shinychat gains contents_shinychat() generic, with methods for Chat, ContentToolRequest, ContentToolResult, etc., otherwise falling back to contents_html() or contents_markdown().

  • Expand data included in the ContentToolResult object: id, call_tool, call_args are added by ellmer when the tool is invoked. value is the value that's sent to the LLM. detail is a list that collects any other data that someone would want to add to the tool result (analogous to CustomEvent.detail in JavaScript).

  • ellmer gains support for tools to return ContentToolResult objects. If a tool returns a ContentToolResult, ellmer fills in id, call_tool and call_args. Otherwise, ellmer creates the ContentToolResult.

  • ellmer gains yield_all in Chat$stream() and Chat$stream_async(). When TRUE, we yield non-text assistant responses after the assistant turn completes (text is already yielded) and we yield the contents of the user turn added by tool invocation.

  • With the above in place, tool authors could return custom ContentToolResult objects with custom contents_shinychat() methods for formatting their display in Shiny (see the sketch after this list).

  • Additionally, ToolDef should gain an annotations property that allows tool definitions to carry additional properties that would be used in display. annotations were recently added to the MCP schema. contents_shinychat() would hook into these annotations for displaying the tool name or knowing which tools modify their environment, etc.

  • Relatedly, shinychat needs a way to append to the current chat without knowing the ID of the chat. This would allow tool authors to write tool functions that append to shinychat when it's available or do something else when used without a shinychat UI. That might look something like this:

    my_tool <- function(...) {
       chat_ui <- local_shinychat()
       chat_ui$append("Progress: 0%") # no-op if called outside `chat_append()`
    
       # ... do stuff ...
       chat_ui$replace("Progress: 50%")
       # or maybe
       chat_ui$append("Progress: 50%", operation = "replace")
    
       # finally...
       chat_ui$replace("All done!")
       result
    }
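Pulling the ContentToolResult pieces above together, a tool under this proposal might look like the sketch below. Returning a ContentToolResult from a tool and the detail property are both proposed additions (they don't exist in ellmer today), and lookup_flights() is a hypothetical helper:

search_flights <- function(from, to) {
  flights <- lookup_flights(from, to)  # hypothetical data source
  ContentToolResult(
    # `value` is what gets sent back to the LLM
    value = head(flights, 5),
    # `detail` (proposed) carries extra data for a custom contents_shinychat()
    # method, e.g. to render the full results table in the chat UI
    detail = list(flights = flights)
  )
}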
