Tool Call Current State
Tool calling is currently entirely limited to the back-end chat interface. Tools are registered with ellmer or chatlas (see below), but shinychat does not do anything in the UI to indicate that a tool call is being made.
For background, here is how ellmer and chatlas register and store tool definitions.
ellmer
```r
library(ellmer)

get_current_time <- function(tz = "UTC") {
  format(Sys.time(), tz = tz, usetz = TRUE)
}

tool_get_current_time <- tool(
  get_current_time,
  .description = "Gets the current time in the given time zone.",
  tz = type_string(
    "The time zone to get the current time in. Defaults to `\"UTC\"`.",
    required = FALSE
  )
)

chat <- chat_openai(model = "gpt-4o", echo = "all")
chat$register_tool(tool_get_current_time)
chat$chat("How long ago exactly was the moment Neil Armstrong touched down on the moon?")
#> > How long ago exactly was the moment Neil Armstrong touched down on the moon?
#> < [tool request (call_WkRmaly9E7kgpMB5RPWVzijh)]: get_current_time(tz = "UTC")
#> < [tool request (call_0OqdPpugMw3wjX9IEz1xVwxd)]: get_current_time(tz =
#> < "America/New_York")
#> > [tool result (call_WkRmaly9E7kgpMB5RPWVzijh)]: 2025-02-27 17:53:17 UTC
#> > [tool result (call_0OqdPpugMw3wjX9IEz1xVwxd)]: 2025-02-27 12:53:17 EST
#> < Neil Armstrong touched down on the moon on July 20, 1969, at 20:17 UTC.
#> <
#> < As of now, which is February 27, 2025, at 17:53 UTC, it has been
#> < approximately 55 years, 7 months, and 7 days since that historic moment.
#> <
```
Internally, `tool()` creates a `ToolDef` instance with `name`, `description` and `arguments` properties.
chatlas
```python
import requests
from chatlas import ChatOpenAI
from dotenv import load_dotenv

load_dotenv()


def get_current_temperature(latitude: float, longitude: float):
    """
    Get the current weather given a latitude and longitude.

    Parameters
    ----------
    latitude
        The latitude of the location.
    longitude
        The longitude of the location.
    """
    lat_lng = f"latitude={latitude}&longitude={longitude}"
    url = f"https://api.open-meteo.com/v1/forecast?{lat_lng}&current=temperature_2m,wind_speed_10m&hourly=temperature_2m,relative_humidity_2m,wind_speed_10m"
    response = requests.get(url)
    json = response.json()
    return json["current"]


chat = ChatOpenAI(model="gpt-4o-mini")
chat.register_tool(get_current_temperature)
chat.chat("What's the weather like today in Duluth, MN?", echo="all")
#> 👤 User turn:
#>
#> What's the weather like today in Duluth, MN?
#>
#> 🤖 Assistant turn:
#>
#> # tool request (call_YRma1FOUHVPGHkfylqdHw886)
#> get_current_temperature(latitude=46.7833, longitude=-92.1062)
#>
#> << 🤖 finish reason: tool_calls >>
#>
#>
#> 👤 User turn:
#>
#> # tool result (call_YRma1FOUHVPGHkfylqdHw886)
#> {'time': '2025-02-27T17:45', 'interval': 900, 'temperature_2m': 3.2, 'wind_speed_10m': 21.6}
#>
#> 🤖 Assistant turn:
#>
#> Today in Duluth, MN, the temperature is approximately 3.2°C with a wind speed of 21.6 km/h.
```
`Chat.register_tool()` has the signature `Chat.register_tool(func, *, model=None)`, i.e. it takes a function `func` and determines input parameters from the function docstring. For more complicated tools, you can pass a pydantic model to `model`.
Internally, `.register_tool()` creates a `Tool` instance with properties `.func`, `.schema` and `.name`. The `schema` stores the function description.
Tool calls in shinychat
Here are a few design sketches for what tool calling might look like:
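| Step | Block | Inline |
|---|---|---|
| Tool call starts | *(image not available)* | *(image not available)* |
| Tool call completes | *(image not available)* | *(image not available)* |
| Extra info¹ | *(image not available)* | *(image not available)* |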
¹ The extra info shown here uses two different mechanisms. In the block-style UI, the tool call completion hook returns custom UI built from Shiny UI elements to show the results. In the inline-style UI, we could use popovers to display additional information about the call, e.g. the parameters used.
There are two points in the tool-calling lifecycle where we need status updates:
- When the tool is called
- When the tool call completes
I'm envisioning that tools would be registered with shinychat in a similar way to how they're registered with chatlas or ellmer. Internally (or alternatively), shinychat could provide classes that extend `chatlas.Turn` or UI methods for `ellmer::TurnDef`.
Ideally, the minimum amount of work required would be to register the tools; from there, our default methods could create the UI as needed, using only the information in the tool definition. For example, the name of a `get_current_weather` tool could be converted into the UI label "Get current weather".
Depending on the use case, I can also see wanting to choose between a block display and an inline display. The block display works best for larger tasks, and the inline display could be used in situations where each turn is likely to include many tool calls.
Both variants would be encapsulated in user-facing functions that we call with sensible defaults. For example, we could default to showing "Get current weather", but a user could provide their own method that reuses our block-display design and changes the title to "Get weather in Duluth, MN".
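A minimal Python sketch of these defaults follows (Python because it is the language of the chatlas example above). Everything in it is hypothetical: `tool_label()`, `tool_request_block()`, `tool_request_inline()`, the HTML markup and the CSS class names are illustrative and are not an existing shinychat API. The default title is derived from the tool name alone, and an explicit `title` overrides it.

```python
import html
from typing import Optional


def tool_label(name: str) -> str:
    """Hypothetical default: turn a snake_case tool name into a UI label."""
    words = [w for w in name.split("_") if w]  # drop empty pieces from "__"
    text = " ".join(words)
    # Capitalize only the first character so acronyms like "URL" survive.
    return text[:1].upper() + text[1:]


def tool_request_block(name: str, call_id: str, title: Optional[str] = None) -> str:
    """Block-style display: a card suited to larger tasks."""
    title = tool_label(name) if title is None else title
    return (
        f'<div class="tool-call tool-call-block" data-call-id="{html.escape(call_id)}"'
        f' data-status="running"><strong>{html.escape(title)}</strong></div>'
    )


def tool_request_inline(name: str, call_id: str, title: Optional[str] = None) -> str:
    """Inline-style display: a compact badge for turns with many tool calls."""
    title = tool_label(name) if title is None else title
    return (
        f'<span class="tool-call tool-call-inline" data-call-id="{html.escape(call_id)}"'
        f' data-status="running">{html.escape(title)}</span>'
    )


print(tool_label("get_current_weather"))  # Get current weather
```

A user-supplied method that keeps the block design but changes the title could then be as small as `lambda name, call_id: tool_request_block(name, call_id, title="Get weather in Duluth, MN")`.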
The completion method would, by default, find the tool call UI added at the start of the tool call and simply mark it as completed by updating its attributes.
That said, we also want it to be possible for the completion callback to send entirely custom UI that replaces the initial tool call UI. This is shown in the last row of the table for the block method, where instead of simply marking the flight search tool call as complete, the data received in the tool call is presented using custom UI.
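The default-versus-replace behaviour of the completion step can be sketched in Python, assuming a hypothetical in-memory model of a chat message. `ToolCallUI`, `ChatMessageUI` and the `on_tool_request()`/`on_tool_result()` method names are illustrative only, not shinychat's API. Tool-call UI is keyed by the tool call ID so the completion handler can find the element added at the start of the call.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToolCallUI:
    """Hypothetical model of one tool-call element in a chat message."""
    call_id: str
    title: str
    status: str = "running"
    body: Optional[str] = None   # custom UI that replaces the default display


@dataclass
class ChatMessageUI:
    tool_calls: Dict[str, ToolCallUI] = field(default_factory=dict)

    def on_tool_request(self, call_id: str, title: str) -> None:
        # Start of the call: add the default display, marked as running.
        self.tool_calls[call_id] = ToolCallUI(call_id, title)

    def on_tool_result(self, call_id: str, custom_ui: Optional[str] = None) -> None:
        ui = self.tool_calls[call_id]      # the UI added at the start of the call
        if custom_ui is None:
            ui.status = "complete"         # default: just update its attributes
        else:
            # Replace the initial UI entirely with the custom UI.
            self.tool_calls[call_id] = ToolCallUI(call_id, ui.title, "complete", custom_ui)
```

Keying on the tool call ID is what lets a result find its matching request UI even when several tool calls run within the same turn.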
Static Case
To start with the simplest, most straightforward case, consider a chat with a single tool call.
```
User: What time is it in London?

A: Sure, I can look up the time.
<tool_request id="123" name="get_current_time" tz="UK/London">

User: <tool_result id="123">2025-03-31 11:12:13</tool_result>

A: It's 11am in London.
```
ellmer handles the tool request in the first assistant message, automatically invokes the tool, and returns the result. Note that in the chat turns, ellmer stores the tool result as a user turn, because we ran code locally to invoke the tool. Note also that, in a live session, control of the chat isn't returned to the user until after the second assistant message.
Structurally, the turns look like this at the end of this chat:
- `Turn(role = "user")`
  - `ContentText` (user message)
- `Turn(role = "assistant")`
  - `ContentText` (assistant message)
  - `ContentToolRequest`
- `Turn(role = "user")`
  - `ContentToolResult`
- `Turn(role = "assistant")`
  - `ContentText` (assistant message)
Currently, when launching `live_browser(client)` on a chat client object, ellmer calls `contents_markdown()` on the turns. For `contents_markdown(<ContentText>)`, this is a simple transformation that extracts the text entered by the user or returned from the LLM. On the other hand, `contents_markdown(<ContentToolRequest>)` and `contents_markdown(<ContentToolResult>)` are no-ops and hide the tool request/result from the chat UI.
I propose introducing a new generic, `contents_shinychat()`, to use instead; it can provide a default display for `ContentToolRequest` and `ContentToolResult`. We'd also have a `contents_shinychat(<Chat>)` method that reorganizes the turns, coalescing tool results into a single assistant message. It would also suppress the tool request display and show only the tool results.
```mermaid
sequenceDiagram
    participant R as R Session
    participant client
    participant ellmer
    participant Terminal as Terminal (emit)
    participant UI as UI (yield)

    rect rgba(255, 165, 0, 0.05)
        Note over client: ContentText (user)
        client-->>UI: contents_shinychat(turn)
        Note over UI: User message
    end

    rect rgba(100, 100, 255, 0.05)
        Note over client: ContentText (assistant)
        client-->>UI: contents_shinychat(turn)
        Note over UI: Assistant message
        Note over client: ContentToolRequest
        Note over client: ContentToolResult
        client-->>UI: contents_shinychat(ContentToolResult)
        Note over UI: Tool result display
    end
```
Live Case
When running live, the `Content*` objects are not used directly in the UI. In general, they're created by ellmer and recorded in the chat's `turns`, but the response from the LLM API is streamed to the UI as yielded strings.
```mermaid
sequenceDiagram
    participant R as R Session
    participant client as client<br>(record)
    participant ellmer
    participant Terminal as Terminal<br>(emit)
    participant UI as UI<br>(yield)

    UI->>client: User input submitted
    Note over client: ContentText (user)
    client->>ellmer: chat_append(client$stream())
    activate client
    activate ellmer
    ellmer-->>Terminal: emit(chunk)
    Note over Terminal: Text of assistant response
    ellmer-->>UI: yield(chunk)
    Note over UI: Display of assistant response
    ellmer->>client:
    Note over client: ContentText (assistant)
    deactivate client

    rect rgba(100, 100, 255, 0.05)
        ellmer->>client: Record tool request
        activate ellmer
        Note over client: ContentToolRequest
        ellmer-->>Terminal: emit(request)
        Note over Terminal: echo="all"
        ellmer-->>UI: yield(request)
        Note over UI: on_tool_request(request)
        Note over UI: contents_shinychat(request)
        ellmer->>R: Invoke tool
        activate R
        R-->>UI: chat_append_stream()
        Note over UI: Text appended during tool call
        R->>ellmer: tool result
        deactivate R
        ellmer-->>Terminal: emit(result)
        Note over Terminal: echo="all"
        ellmer-->>UI: yield(result)
        Note over UI: on_tool_result(result)
        Note over UI: contents_shinychat(result)
        deactivate ellmer
        deactivate ellmer
        ellmer->>client:
        Note over client: ContentToolResult
    end
```
Here's a description of the process depicted in the sequence diagram. I've added bold to the steps that we would be modifying to make this approach work.
- User Input: The user provides input through the UI. This input is sent to the client as `ContentText` from the user.
- Client to ellmer: The client calls `client$stream(input)`, and ellmer makes the API request to the LLM and returns a generator as a response. Calling `shinychat::chat_append(stream)` directs yielded strings to the UI.
- Assistant Response (Text): ellmer emits chunks of the assistant's response to the Terminal (assuming `echo = "all"`) and yields chunks to the UI for display, so the text of the assistant's response appears in both the Terminal and the UI.
- Assistant Response (Content): ellmer records the assistant message, with its `ContentText` and `ContentToolRequest` objects, in the assistant turn.
- Tool Request: ellmer currently doesn't yield the tool request, but I propose we add a `yield_all` option to `$stream()` and `$stream_async()`. When `TRUE`, we yield all non-text content at the end of the assistant turn.
- Tool Request Display: shinychat receives the yielded `ContentToolRequest` and transforms it with `contents_shinychat()` before appending it to the current chat message.
- Tool Invocation: ellmer invokes the tool in the R session.
- Text Appended During Tool Call: During the tool invocation (inside the tool function body), tool authors can use `chat_append_stream()` to append content to the UI. This content is ephemeral unless it ends up recorded in the tool result.
- Tool Result: The tool function returns a result.

  Currently, this can be any JSON-serializable object but doesn't have a special class. In addition to returning a regular R object, I propose we also allow the tool to return a `ContentToolResult` object, which might be a custom user-defined class that inherits from `ContentToolResult`.

  In addition to the properties currently used by `ContentToolResult` (`id`, `value`, `error`), we would add `detail` (additional data), `call_tool` (the tool definition of the calling tool) and `call_args` (the arguments used when calling the tool).

  Currently, the tool results are converted to `ContentToolResult` and stored as a new user turn, but with `yield_all = TRUE`, ellmer would also yield the contents of that turn into the generator.
- Tool Result Display: Again, shinychat would receive the yielded `ContentToolResult`, call `contents_shinychat()` on it, and append the formatted result to the chat message.
- On Tool Callbacks: With this approach, shinychat can also own the `on_tool_request()` and `on_tool_result()` callbacks. The immediate need is to remove, replace or hide any UI from the `ContentToolRequest` when we receive a `ContentToolResult`. These callbacks do not need to be user-facing at this point.

The final goal is that, once we have a paired `ContentToolRequest` and `ContentToolResult`, the final live state is equivalent to the chat state in the static case.
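The proposed `yield_all` behaviour, and how shinychat could consume the resulting mixed stream, can be sketched as a Python generator. This is a simulation under stated assumptions, not ellmer's implementation: the hypothetical `stream()` replays canned text chunks for a single assistant turn, invokes tools from a plain dict, and stops after the tool results (the real loop would continue with the next assistant turn). The hypothetical `consume_stream()` plays the role of the `on_tool_request()`/`on_tool_result()` callbacks by replacing a request's placeholder when the matching result arrives.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, List, Optional, Union


@dataclass
class ContentToolRequest:
    id: str
    name: str
    arguments: Dict[str, Any]


@dataclass
class ContentToolResult:
    id: str
    value: Any = None
    error: Optional[str] = None


Chunk = Union[str, ContentToolRequest, ContentToolResult]


def stream(text_chunks: List[str], requests: List[ContentToolRequest],
           tools: Dict[str, Callable[..., Any]], yield_all: bool = False) -> Iterator[Chunk]:
    """Simulate one assistant turn plus local tool invocation (hypothetical)."""
    for chunk in text_chunks:
        yield chunk                      # text is always streamed as it arrives
    if yield_all:
        yield from requests              # non-text content, after the assistant turn
    results = []
    for req in requests:                 # invoke each requested tool locally
        try:
            results.append(ContentToolResult(req.id, value=tools[req.name](**req.arguments)))
        except Exception as e:
            results.append(ContentToolResult(req.id, error=str(e)))
    if yield_all:
        yield from results               # contents of the new tool-result user turn


def consume_stream(chunks: Iterator[Chunk]) -> List[str]:
    """Build the displayed message, swapping request UI for result UI."""
    message: List[str] = []
    pending: Dict[str, int] = {}         # tool call id -> position of its UI
    for chunk in chunks:
        if isinstance(chunk, str):
            message.append(chunk)
        elif isinstance(chunk, ContentToolRequest):
            pending[chunk.id] = len(message)
            message.append(f"[running {chunk.name}]")
        elif isinstance(chunk, ContentToolResult):
            shown = chunk.value if chunk.error is None else f"error: {chunk.error}"
            ui = f"[result {chunk.id}: {shown}]"
            idx = pending.pop(chunk.id, None)
            if idx is None:
                message.append(ui)
            else:
                message[idx] = ui        # replace the request's placeholder
    return message
```

Without `yield_all`, the consumer sees only strings, which matches today's behaviour where tool requests and results never reach the UI.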
Things we need
- A way to list attached tools from a chat, e.g. `chat$get_tools()` or `chat.get_tools()`. This would be useful if we want to take a `chat` client and register its tools with shinychat. Not sure if this is strictly required, but it could be a nice addition regardless. This now exists in ellmer and chatlas.
- shinychat gains a `contents_shinychat()` generic, with methods for `Chat`, `ContentToolRequest`, `ContentToolResult`, etc., otherwise falling back to `contents_html()` or `contents_markdown()`.
- Expand the data included in the `ContentToolResult` object: `id`, `call_tool` and `call_args` are added by ellmer when the tool is invoked. `value` is the value that's sent to the LLM. `detail` is a list that collects any other data that someone would want to add to the tool result (analogous to `CustomEvent.detail` in JavaScript).
- ellmer gains support for tools to return `ContentToolResult` objects. If a tool returns a `ContentToolResult`, ellmer fills in `id`, `call_tool` and `call_args`. Otherwise, ellmer creates the `ContentToolResult`.
- ellmer gains `yield_all` in `Chat$stream()` and `Chat$stream_async()`. When `TRUE`, we yield non-text assistant responses after the assistant turn completes (text is already yielded), and we yield the contents of the user turn added by tool invocation.
- With the above in place, tool authors could return custom `ContentToolResult` objects with custom `contents_shinychat()` methods that format them for display in Shiny.
- Additionally, `ToolDef` should gain an `annotations` property that allows tool definitions to carry additional properties used for display. `annotations` were recently added to the MCP schema. `contents_shinychat()` would hook into these annotations for displaying the tool name, knowing which tools modify their environment, etc.
- Relatedly, shinychat needs a way to append to the current chat without knowing the ID of the chat. This would allow tool authors to write tool functions that append to shinychat when it's available, or do something else when used without a shinychat UI. That might look something like this:

  ```r
  my_tool <- function(...) {
    chat_ui <- local_shinychat()
    chat_ui$append("Progress: 0%") # no-op if called outside `chat_append()`

    # ... do stuff (producing `result`) ...

    chat_ui$replace("Progress: 50%")
    # or maybe
    chat_ui$append("Progress: 50%", operation = "replace")

    # finally...
    chat_ui$replace("All done!")
    result
  }
  ```
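Several of these items (the proposed `annotations` on `ToolDef`, the expanded `ContentToolResult`, and ellmer filling in call metadata whether or not the tool returned a `ContentToolResult`) can be sketched together in Python. The classes are simplified, hypothetical stand-ins for the proposed R objects; `invoke_tool()`, the `search_flights` tool and its hard-coded flight are made-up illustrations, not ellmer behaviour.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class ToolDef:
    name: str
    fun: Callable[..., Any]
    description: str = ""
    # Proposed: extra display hints, e.g. {"title": "Get current time"}.
    annotations: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ContentToolResult:
    value: Any = None            # what gets sent to the LLM
    error: Optional[str] = None
    detail: Dict[str, Any] = field(default_factory=dict)  # display-only extras
    # Filled in by the invoker, not by the tool author:
    id: Optional[str] = None
    call_tool: Optional[ToolDef] = None
    call_args: Optional[Dict[str, Any]] = None


def invoke_tool(tool: ToolDef, call_id: str, args: Dict[str, Any]) -> ContentToolResult:
    """Hypothetical invoker: wrap plain values, then add call metadata."""
    try:
        out = tool.fun(**args)
    except Exception as e:
        out = ContentToolResult(error=str(e))
    if not isinstance(out, ContentToolResult):
        out = ContentToolResult(value=out)   # plain values get wrapped
    # Either way, the call metadata is added here.
    out.id = call_id
    out.call_tool = tool
    out.call_args = args
    return out


def search_flights(origin: str, dest: str) -> ContentToolResult:
    """Hypothetical tool that returns rich detail for a custom display."""
    flights = [{"flight": "XY123", "origin": origin, "dest": dest}]  # made-up data
    return ContentToolResult(value=f"{len(flights)} flight(s) found",
                             detail={"flights": flights})
```

Keeping `value` separate from `detail` means the LLM only ever sees the short summary, while a custom display method can render the full flight list.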