Collaborator

@centdix centdix commented Oct 7, 2025

Important

This PR adds functionality to display tool usage in flow conversations, updating both backend and frontend components to handle and display tool-related messages.

  • Backend:
    • Add step_name and success columns to flow_conversation_message table in migrations/20251007123506_update_conversation_message_types.up.sql.
    • Modify add_message_to_conversation_tx() in flow_conversations.rs to include step_name and success.
    • Update run_agent() in ai_executor.rs to handle tool messages and update conversation with tool usage.
  • Frontend:
    • Introduce FlowChatManager in FlowChatManager.svelte.ts to manage chat state and message handling.
    • Update FlowChatInterface.svelte to use FlowChatManager for managing chat interactions.
    • Modify FlowChatMessage.svelte to display tool usage with success indicators.

This description was created by Ellipsis for 12b5d7e.


cloudflare-workers-and-pages bot commented Oct 8, 2025

Deploying windmill with Cloudflare Pages

Latest commit: 12b5d7e
Status: ✅  Deploy successful!
Preview URL: https://504c67ea.windmill.pages.dev
Branch Preview URL: https://fg-toolsinchat.windmill.pages.dev


@centdix
Collaborator Author

centdix commented Oct 8, 2025

/updatesqlx

Contributor

Starting sqlx update...

@centdix
Collaborator Author

centdix commented Oct 8, 2025

/updatesqlx

Contributor

Starting sqlx update...

Contributor

Successfully ran sqlx update

@centdix
Collaborator Author

centdix commented Oct 8, 2025

/ai-fast review this PR

Contributor

claude bot commented Oct 8, 2025

Claude finished @centdix's task


PR Review: Tool Usage Display in Flow Conversations

Todo List

  • Read CLAUDE.md files for project conventions
  • Review database migration changes
  • Analyze backend Rust code changes
  • Review frontend Svelte component changes
  • Provide comprehensive review feedback

Summary

This PR adds support for displaying tool usage in flow conversations, allowing users to see when AI agents invoke tools during conversations. The implementation adds a new "tool" message type and tracks which flow step generated each message.

Database Changes ✅

The migration properly extends the conversation system:

  • Adds 'tool' to the MESSAGE_TYPE enum
  • Adds step_name (VARCHAR(255)) and success (BOOLEAN DEFAULT TRUE) columns to flow_conversation_message table
  • Includes proper down migration for rollback

Note: The migration looks clean and follows best practices.

Backend Changes

Positive Aspects ✅

  1. Centralized message handling - Good refactoring to move add_message_to_conversation_tx to windmill-common (backend/windmill-common/src/flow_conversations.rs:18-50)
  2. Non-blocking conversation updates - Tool messages are added via spawned tasks to avoid blocking job execution (backend/windmill-worker/src/ai_executor.rs:1081-1096, 1185-1200)
  3. Proper error handling - Tool execution failures are tracked with success: false flag
  4. Step name extraction - Helper function get_step_name_from_flow properly extracts step names with fallback (backend/windmill-worker/src/ai_executor.rs:1319-1327)

Areas for Improvement 🔧

  1. Performance optimization needed in ai_executor.rs:

    • Lines 1069-1072, 1167-1170: Multiple sequential database calls to get memory_id. Consider caching the memory_id at the start of the agent execution to avoid repeated lookups.
  2. Error logging consistency:

    • Line 1093, 1197: Uses tracing::warn! for conversation update failures. Consider using tracing::error! since these are database failures that may need attention.
  3. Tool message content:

    • Lines 1178-1182: The tool success message is generic ("Used X tool"). Consider including more context like execution time or result summary for better debugging.
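As a rough illustration of the caching suggested in item 1, the sketch below resolves the memory id once per agent run and reuses it afterwards. `AgentRunContext` and its `lookup` closure are hypothetical names, the closure stands in for the database query, and the id is modelled as a `String` because the real type is not shown in this thread.

```rust
use std::cell::OnceCell;

/// Hypothetical per-run context: `lookup` stands in for the database query
/// and is executed at most once, on the first call to `memory_id`.
pub struct AgentRunContext<F: Fn() -> Option<String>> {
    lookup: F,
    cached: OnceCell<Option<String>>,
}

impl<F: Fn() -> Option<String>> AgentRunContext<F> {
    pub fn new(lookup: F) -> Self {
        Self { lookup, cached: OnceCell::new() }
    }

    /// Returns the cached id, running the lookup only if nothing is cached yet.
    /// A `None` result is cached too, so a missing id is not looked up again.
    pub fn memory_id(&self) -> Option<&str> {
        self.cached.get_or_init(|| (self.lookup)()).as_deref()
    }
}
```

Each tool call would then read `ctx.memory_id()` instead of issuing its own query. In async code the same idea would need an async-aware cell or an eager fetch at the start of the run.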

Frontend Changes

Positive Aspects ✅

  1. Clean separation of concerns - New FlowChatManager.svelte.ts class properly encapsulates chat state management
  2. Tool message visualization - Nice visual indicators with CheckCircle2 (success) and AlertTriangle (failure) icons (frontend/src/lib/components/flows/conversations/FlowChatMessage.svelte:46-51)
  3. Step name display - Clean UI showing which step generated each message (lines 26-30)
  4. Error state handling - Proper border styling for failed tool messages (line 24)

Areas for Improvement 🔧

  1. Type safety in FlowChatManager.svelte.ts:

    • Line 112: Using console.log for debugging - should be removed or wrapped in a debug flag
    • Missing TypeScript strict null checks in several places
  2. Message caching logic:

    • Lines 100-103: Cache check happens before API call, but cache invalidation strategy is unclear. What happens when new messages arrive?
  3. Scroll behavior:

    • Lines 133-136: Using hardcoded timeouts (50ms) for scroll restoration. Consider using requestAnimationFrame or ResizeObserver for more reliable behavior.

Security Considerations 🔒

  • Tool execution properly respects job permissions (backend/windmill-worker/src/ai_executor.rs:888-894)
  • Database transactions are used correctly to ensure consistency
  • No SQL injection vulnerabilities found in the new queries

Recommendations

  1. Add integration tests for the new tool message flow to ensure messages are properly recorded
  2. Consider rate limiting for conversation message additions to prevent potential abuse
  3. Add metrics/monitoring for tool message creation failures
  4. Document the new message type in API documentation beyond just OpenAPI spec

Overall Assessment

The implementation is solid with proper separation of concerns and good error handling. The suggested improvements are mostly optimizations and polish items. The feature adds valuable debugging capabilities for AI agent flows.

Verdict: Approved with minor suggestions

The code is production-ready but would benefit from the performance optimizations and cleanup mentioned above.

@centdix
Collaborator Author

centdix commented Oct 8, 2025

/updatesqlx

Contributor

Starting sqlx update...

Contributor

Successfully ran sqlx update

@centdix
Collaborator Author

centdix commented Oct 8, 2025

/ai-fast review the changes since last review

Contributor

claude bot commented Oct 8, 2025

Claude finished @centdix's task


Review of Changes Since Last Review

Todo List

  • Identify changes since last review
  • Review new commits and modifications
  • Provide feedback on updates

Summary of New Changes

Since my last review, the following improvements have been made:

  1. Memory ID caching for nested agents (commits 3feaa00, 4ed468e)
  2. EventSource error suppression after stream completion (95ac812)
  3. Code cleanup - removed debug console.log statements (6cc2251)
  4. UI refinement - added prose-sm class for better message styling (12b5d7e)

Detailed Review

✅ Excellent Improvements

  1. Performance Optimization Addressed - The memory_id caching issue I raised in my previous review has been fully resolved:

    • Created a FlowChatSettings struct to cache both memory_id and chat_input_enabled (backend/windmill-worker/src/ai_executor.rs:103-107)
    • Now fetches settings once and reuses them throughout agent execution
    • This eliminates the repeated database lookups I identified as a performance concern
  2. Better Root Job Resolution for nested agents:

    • Improved logic to find the root flow job using root_job, flow_innermost_root_job, or parent_job fallback (lines 113-117)
    • This ensures chat messages work correctly even in deeply nested agent scenarios
  3. Stream Error Handling fixed:

    • Added isCompleted flag to prevent spurious error messages after successful stream completion (frontend/src/lib/components/flows/conversations/FlowChatManager.svelte.ts:347, 459)
    • This resolves potential false error notifications when EventSource closes normally
  4. Code Cleanup:

    • Removed debug console.log statement (line 112 in FlowChatManager)
    • Added proper prose sizing with prose-sm class for better readability
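For readers unfamiliar with the root job fallback mentioned in item 2, a minimal sketch of such a chain follows. `JobRefs` is a stand-in struct with string ids, and the precedence simply follows the order in which the fields are listed above, which is an assumption about the actual code.

```rust
/// Stand-in for the job fields named in the review; ids are plain strings here.
#[derive(Debug, Default, Clone)]
pub struct JobRefs {
    pub root_job: Option<String>,
    pub flow_innermost_root_job: Option<String>,
    pub parent_job: Option<String>,
}

/// Returns the first id that is set, trying root_job, then
/// flow_innermost_root_job, then parent_job.
pub fn resolve_root_flow_job(job: &JobRefs) -> Option<&str> {
    job.root_job
        .as_deref()
        .or(job.flow_innermost_root_job.as_deref())
        .or(job.parent_job.as_deref())
}
```

A job with none of the three fields set resolves to `None`, which the caller would presumably treat as "no conversation to update".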

🎯 All Previous Concerns Addressed

I'm pleased to see that all the performance and error handling issues from my previous review have been addressed:

  • Memory ID caching - No more repeated database lookups
  • Error logging - Improved with better context in warning messages
  • Debug cleanup - Console.log removed
  • Nested agent support - Proper root job resolution

💡 Minor Suggestions (Optional)

  1. Consider extracting chat settings retrieval to a separate module if this pattern grows, but current implementation is clean and appropriate for the scope.

  2. The FlowChatSettings struct is well-designed and could potentially be extended in the future if more chat-related settings need caching.


Overall Assessment

All requested changes have been implemented effectively. The code now properly caches chat settings, handles nested agents correctly, and provides a cleaner user experience with proper error suppression. The performance optimization I suggested has been implemented elegantly with the FlowChatSettings struct.

Verdict: Ready to merge

Great work addressing all the feedback! The implementation is now robust and performant.

@centdix centdix marked this pull request as ready for review October 8, 2025 21:34
Contributor

claude bot commented Oct 8, 2025

Claude finished @centdix's task


PR Review: Tool Usage Display in Flow Conversations

Todo List

  • Read CLAUDE.md files for project conventions and guidelines
  • Review database migration changes
  • Analyze backend Rust code changes
  • Review frontend Svelte component changes
  • Check OpenAPI specification updates
  • Provide comprehensive review feedback
  • Include reproduction instructions for testers

Summary

This PR implements a comprehensive feature to display tool usage in flow conversations, allowing users to see when AI agents invoke tools during conversations. The implementation adds a new "tool" message type and tracks which flow step generated each message, with success/failure indicators.

Review Findings

✅ Excellent Implementations

1. Database Design (backend/migrations/20251007123506_update_conversation_message_types.up.sql:4-8)

  • Clean migration extending MESSAGE_TYPE enum with 'tool'
  • Proper addition of step_name and success columns with appropriate defaults
  • Includes proper down migration for rollback safety

2. Code Organization (backend/windmill-common/src/flow_conversations.rs:17-50)

  • Excellent refactoring moving add_message_to_conversation_tx to windmill-common for reusability
  • Clean function signature with all necessary parameters including the new step_name and success fields
  • Proper transaction handling with conversation timestamp update

3. Performance Optimization (backend/windmill-worker/src/ai_executor.rs:103-145)

  • Smart caching with FlowChatSettings struct prevents repeated database lookups
  • Efficient root job resolution using fallback chain: root_job, then flow_innermost_root_job, then parent_job
  • This addresses performance concerns effectively

4. Non-blocking Design (backend/windmill-worker/src/ai_executor.rs:1126-1141, 1238-1252)

  • Tool messages are added via spawned async tasks to avoid blocking job execution
  • Proper error handling with warning logs for conversation update failures
  • Good separation of concerns between job execution and conversation tracking
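The fire-and-forget pattern described here can be sketched with the standard library alone. This is not the PR's code: it uses `std::thread` rather than an async task, `record_in_background` is a hypothetical helper, and `eprintln!` stands in for the warning log.

```rust
use std::thread::{self, JoinHandle};

/// Runs `record` on a separate thread. A failure is logged and never
/// propagated, so the caller's job execution is not affected by it.
pub fn record_in_background<F>(record: F) -> JoinHandle<()>
where
    F: FnOnce() -> Result<(), String> + Send + 'static,
{
    thread::spawn(move || {
        if let Err(err) = record() {
            // Stand-in for a warning-level log entry.
            eprintln!("failed to add tool message to conversation: {err}");
        }
    })
}
```

The returned handle can be dropped if the caller does not need to wait. The trade-off is that a failed write is visible only in the logs.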

5. Frontend Architecture (frontend/src/lib/components/flows/conversations/FlowChatManager.svelte.ts)

  • Clean separation with dedicated FlowChatManager class
  • Proper state management using Svelte 5 runes
  • Good caching strategy for conversation messages

6. Visual Design (frontend/src/lib/components/flows/conversations/FlowChatMessage.svelte:46-51)

  • Clear visual indicators: CheckCircle2 (success) and AlertTriangle (failure)
  • Clean step name display with proper styling
  • Conditional border styling for failed tool messages

🔧 Areas for Improvement

1. Error Message Content (backend/windmill-worker/src/ai_executor.rs:1232-1236)

let content = if success {
    format!("Used {} tool", tool_name)
} else {
    format!("Error executing {}", tool_name)
};

Issue: Generic tool success messages provide minimal debugging value.
Suggestion: Include more context like execution time, result summary, or error details for better debugging experience.
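One possible shape for richer messages, as a sketch only: the helper below folds the elapsed time or the error reason into the text. `tool_message` and its `outcome` parameter are hypothetical and are not taken from the PR.

```rust
use std::time::Duration;

/// Builds the conversation text and success flag for one tool call.
/// `outcome` carries the elapsed time on success or an error description on failure.
pub fn tool_message(tool_name: &str, outcome: Result<Duration, &str>) -> (String, bool) {
    match outcome {
        Ok(elapsed) => (
            format!("Used {} tool ({} ms)", tool_name, elapsed.as_millis()),
            true,
        ),
        Err(reason) => (format!("Error executing {}: {}", tool_name, reason), false),
    }
}
```

Error text may contain details that should not reach end users, so a real implementation would likely truncate or sanitize `reason` before storing it.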

2. OpenAPI Schema Completeness (backend/windmill-api/openapi.yaml)

The FlowConversationMessage schema includes the new fields but could be more descriptive:

step_name:
  type: string
  description: The step name that produced that message
success:
  type: boolean
  description: Whether the message is a success

Suggestion: Add examples and clarify that step_name is nullable and success is primarily relevant for tool messages.

3. Type Safety Enhancement (frontend/src/lib/components/flows/conversations/FlowChatManager.svelte.ts:100-103)

if (this.#conversationsCache[conversationIdToUse]) {
    // Cache check happens before API call
}

Issue: Cache invalidation strategy is unclear.
Suggestion: Add explicit cache invalidation methods or TTL to ensure data freshness.

4. Step Name Extraction Robustness (backend/windmill-worker/src/ai_executor.rs)

fn get_step_name_from_flow(flow_value: &FlowValue, flow_step_id: Option<&str>) -> Option<String> {
    let flow_step_id = flow_step_id?;
    let module = flow_value.modules.iter().find(|m| m.id == flow_step_id)?;
    Some(
        module
            .summary
            .clone()
            .unwrap_or_else(|| format!("AI Agent Step {}", module.id)),
    )
}

Good: Proper fallback handling with descriptive default names.
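To make the fallback behaviour easy to try in isolation, here is the same function with minimal stand-in types. `FlowValue` and `FlowModule` are reduced to the two fields the function reads; the real definitions are assumed to carry more.

```rust
/// Minimal stand-ins for the flow types; only the fields used below are modelled.
pub struct FlowModule {
    pub id: String,
    pub summary: Option<String>,
}

pub struct FlowValue {
    pub modules: Vec<FlowModule>,
}

/// Same logic as the function quoted above: look up the step by id among the
/// top-level modules and prefer its summary over a generated default name.
pub fn get_step_name_from_flow(flow_value: &FlowValue, flow_step_id: Option<&str>) -> Option<String> {
    let flow_step_id = flow_step_id?;
    let module = flow_value.modules.iter().find(|m| m.id == flow_step_id)?;
    Some(
        module
            .summary
            .clone()
            .unwrap_or_else(|| format!("AI Agent Step {}", module.id)),
    )
}
```

The function returns `None` when no step id is given or when the id does not match a top-level module. If agent steps can be nested inside other modules, a top-level scan like this would not find them.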

🔒 Security Assessment

All security practices followed:

  • Tool execution respects job permissions (ai_executor.rs:888-894)
  • Database transactions ensure consistency
  • Parameterized queries prevent SQL injection
  • No sensitive data exposure in tool messages

📊 Performance Analysis

Well optimized:

  • Conversation updates are non-blocking via spawned tasks
  • Memory ID caching prevents repeated database lookups
  • Efficient root job resolution reduces query complexity
  • Proper use of transactions for data consistency

🧪 Code Quality

Follows repository conventions:

  • ✅ Clean, readable code with proper error handling
  • ✅ Consistent naming and patterns with existing codebase
  • ✅ Proper use of Rust async patterns and Tokio primitives
  • ✅ Good separation of concerns between components
  • ✅ Svelte 5 best practices with proper state management

🚀 Recommendations

  1. Enhanced Tool Messages: Consider including execution time or result summary in tool success messages
  2. Cache Strategy: Add explicit cache invalidation for conversation messages
  3. Error Context: Include more specific error details in failed tool messages
  4. Integration Tests: Add tests for the tool message flow to ensure reliability
  5. Monitoring: Consider adding metrics for tool message creation failures

Overall Assessment

Verdict: Excellent Implementation ✅

This is a well-architected feature that demonstrates strong technical execution. The code follows best practices, handles edge cases properly, and integrates cleanly with the existing system. All previous performance and error handling concerns from earlier reviews have been addressed effectively.

The implementation provides valuable debugging capabilities for AI agent flows while maintaining good performance and reliability standards.


Testing Instructions

To test the tool usage display feature:

Navigate to any flow that contains AI Agent steps with tool usage enabled. Create or open a flow conversation by clicking the chat icon. Run the flow with agent steps that use tools (like API calls, database queries, or script executions). In the conversation interface, you should see new "tool" messages appearing with:

  • Step name badges showing which flow step generated each tool message
  • Green checkmark icons for successful tool executions
  • Red warning triangles for failed tool executions
  • Clear visual separation between user messages, agent responses, and tool usage indicators

The tool messages will appear in real-time as the flow executes, providing immediate feedback on which tools are being used and whether they succeed or fail.


Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed everything up to 12b5d7e in 2 minutes and 29 seconds.
  • Reviewed 2241 lines of code in 22 files
  • Skipped 0 files when reviewing.
  • Skipped posting 9 draft comments. View those below.
1. frontend/src/lib/components/flows/conversations/FlowChatInterface.svelte:32
  • Draft comment:
    The initialization effect cleanly sets up and later cleans up the FlowChatManager instance. Consider adding a brief comment clarifying the purpose of each $effect block (one for initialization and one for conversationId updates) to ease future maintenance.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
2. frontend/src/lib/components/flows/conversations/FlowChatInterface.svelte:47
  • Draft comment:
    The cleanup function (manager.cleanup) is properly called on unmount. It may be useful to include a comment regarding error handling inside cleanup, in case asynchronous errors occur during manager cleanup.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%.
    The comment is speculative - it suggests what "may be useful" without evidence of actual issues. We don't see the implementation of manager.cleanup() so we can't know if error handling is actually needed. The cleanup pattern shown is standard and simple. Adding error handling here would be premature optimization without evidence of problems.
    Perhaps there are known issues with cleanup functions in this codebase that would justify this suggestion? The cleanup could involve async operations that could fail.
    Without seeing the cleanup implementation or evidence of actual issues, suggesting error handling is speculative. The cleanup pattern shown is standard and working code should not be complicated without justification.
    The comment should be deleted as it makes a speculative suggestion without evidence of actual issues, violating our rule against speculative comments.
3. frontend/src/lib/components/flows/conversations/FlowChatManager.svelte.ts:21
  • Draft comment:
    The reactive state properties are initialized using $state. For improved readability and type safety, consider adding explicit type annotations for these state variables so that future developers understand their types immediately.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.
4. frontend/src/lib/components/flows/conversations/FlowChatManager.svelte.ts:95
  • Draft comment:
    In the loadMessages method, a Promise with setTimeout is used twice to delay DOM updates. Consider extracting this delay into a helper function to improve code reuse and readability.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%.
    The two setTimeout delays serve different purposes and have different timing values. One is for scrolling to bottom after reset (100ms), the other for restoring scroll position after loading more messages (50ms). Extracting them to a helper wouldn't reduce much code duplication since they're used in different contexts. The timing values are intentionally different.
    The comment has a point about code reuse, but the two timeouts serve different purposes and have different timing requirements. Combining them might actually make the code less clear by obscuring their distinct purposes.
    While code reuse is generally good, in this case the timeouts are sufficiently different in purpose and timing that combining them wouldn't meaningfully improve the code.
    The comment should be deleted as the suggested refactor would not meaningfully improve code quality or readability, and might actually reduce clarity by combining two distinct timing operations.
5. frontend/src/lib/components/flows/conversations/FlowChatManager.svelte.ts:254
  • Draft comment:
    The startPolling method uses a hardcoded interval (500 ms) and stops polling after 2 minutes. Consider making these values configurable or documenting the rationale behind these specific values for easier adjustment in the future.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%.
    The polling values seem reasonable and well-commented. Making them configurable would add complexity without clear benefit. The 500ms polling interval is a standard reasonable default that balances responsiveness with server load. The 2-minute timeout prevents infinite polling. These aren't likely to need frequent adjustment.
    The comment has a point - hardcoded values can be problematic if requirements change. Maybe some use cases need different polling frequencies or timeout periods.
    While technically valid, making these configurable would add unnecessary complexity. The current values are reasonable defaults and the comments explain their purpose. If changes are needed later, they can be modified then.
    Delete the comment. The suggestion would add complexity without clear benefit, and the current implementation with documented magic numbers is sufficient.
6. frontend/src/lib/components/flows/conversations/FlowChatManager.svelte.ts:175
  • Draft comment:
    In pollJobResult, the catch block only logs the error. For better robustness, consider if additional error handling or user notification might be needed when waiting for a job’s result fails.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%.
    The error handling seems appropriate for this context because:
      1. It's an internal polling function called by sendMessage() which has user-facing error handling
      2. The function continues execution in finally block to clean up state
      3. Adding more error handling here would be redundant with parent
      4. The error is logged for debugging
    I could be wrong about the importance of error handling in polling functions - perhaps there are edge cases where the polling error should be handled differently than other errors.
    While polling errors could be special cases, the existing error handling pattern (log error, continue cleanup) seems appropriate for an internal polling function. The parent already handles user notification.
    The comment should be deleted. The error handling is appropriate for an internal polling function, with errors logged and parent function handling user notification.
7. frontend/src/lib/components/flows/conversations/FlowChatMessage.svelte:16
  • Draft comment:
    The FlowChatMessage component conditionally applies classes based on message type and status. Consider adding ARIA roles or additional accessibility attributes to improve accessibility for screen readers.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
8. backend/migrations/20251007123506_update_conversation_message_types.up.sql:1
  • Draft comment:
    Typo: The comment on line 1 reads "Add up migration script here". Consider revising it for clarity – for example, if it is meant to indicate setting up the migration script, you might simply write "Add migration script here" or "Setup migration script here."
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%.
      1. "Up migration" is correct technical terminology in the context of database migrations.
      2. The comment is suggesting a change that would actually make the header less precise.
      3. This is a purely cosmetic comment about a comment, not about actual code.
      4. The existing comment is clear and technically accurate.
    Could there be a house style guide that prefers different migration comment formats?
    Even if there were a style guide, this kind of minor comment about comment wording is not important enough to warrant a PR comment.
    The comment should be deleted as it suggests changing correct technical terminology to less precise wording, and comments about comment formatting are not important enough for PR review.
9. frontend/src/lib/components/flows/conversations/FlowChatMessage.svelte:29
  • Draft comment:
    There's a possible typographical issue with the closing tag for the step_name block. The tag is split across two lines (line 29 and 30), which may lead to unexpected behavior. Consider combining them into a single correctly formatted closing tag, e.g., </div>.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.

Workflow ID: wflow_R6wxS8R9mNEb6hJ1

