Add architecture documentation #2165
base: main
54cb6dc to 680158a
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2165      +/-   ##
==========================================
+ Coverage   53.32%   53.36%   +0.03%
==========================================
  Files         231      231
  Lines       29529    29529
==========================================
+ Hits        15747    15757      +10
+ Misses      12649    12633      -16
- Partials     1133     1139       +6
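The coverage figures in the report can be cross-checked with a few lines of Go. This is only a sketch: the helper name `coverageBasisPoints` is hypothetical, and truncating (rather than rounding) to two decimals is an assumption chosen because it reproduces the percentages shown, not a statement about how Codecov computes them.

```go
package main

import "fmt"

// coverageBasisPoints returns hits/lines in hundredths of a percent,
// truncated rather than rounded (an assumption made for this sketch).
func coverageBasisPoints(hits, lines int) int {
	return hits * 10000 / lines
}

func main() {
	const lines = 29529

	base := coverageBasisPoints(15747, lines) // main
	head := coverageBasisPoints(15757, lines) // #2165
	fmt.Printf("base %d.%02d%%, head %d.%02d%%\n",
		base/100, base%100, head/100, head%100)

	// Hits, misses, and partials should account for every tracked line.
	fmt.Println(15747+12649+1133 == lines, 15757+12633+1139 == lines)
}
```

Integer arithmetic keeps the check free of floating-point surprises.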
4d6d2bc to 309ebb5
Wondering if @danbarr has any thoughts on this, as there seem to be overlaps in documentation between the docs website and this repo?
@ChrisJBurns These docs serve a different purpose; they are for devs.
We have some other documentation throughout the codebase that overlaps with this, for example https://github.com/stacklok/toolhive/blob/main/docs/middleware.md. Also, is there an opportunity to split up this PR? It's quite long and dense to fit into my mind at once.
@eleftherias I can split it into multiple PRs... but then I'd have broken markdown references and incomplete parts 😕 I figured it might just be worth getting something started and iterating on top of this. Regarding middleware.md, that's a good idea! We could ditch that one and absorb it into the new arch docs.
Documentation review for docs/arch/04-secrets-management.md: Found 2 technical inaccuracies with suggested fixes.
Documentation Review - Factual Accuracy
Thorough review of architecture documentation for factual accuracy against codebase. Found 14 issues across 4 files:
- 06-registry-system.md: 6 issues (file paths, annotations, phases, README reference)
- 07-groups.md: 1 issue (stale PR reference)
- 08-workloads-lifecycle.md: 3 issues (line numbers, storage paths, label names)
- 09-operator-architecture.md: 4 issues (filename, annotation, example code, missing controller)
Most issues have inline suggestions that can be applied directly.
Documentation review findings: Found 2 inaccuracies in the Groups documentation that should be corrected for accuracy.
Documentation review findings for registry system architecture doc. Found 13 issues including incorrect CLI flags, wrong CRD field names, non-existent file paths, and incomplete examples. Most have inline suggestions for fixes.
Documentation review findings for 08-workloads-lifecycle.md. Found several inaccuracies in CLI commands, file paths, and label formats. Most issues have inline suggestions for easy fixes.
This commit introduces a new architectural documentation suite in docs/arch/ that provides in-depth coverage of ToolHive's design, components, and concepts.
The documentation is organized into the following sections:
- 00-overview.md: High-level architecture overview and introduction
- 01-deployment-modes.md: Local CLI, UI, and Kubernetes deployment patterns
- 02-core-concepts.md: Core terminology, abstractions, and design patterns
- 03-transport-architecture.md: MCP transport protocols and proxy architecture
- 04-secrets-management.md: Secret handling and backend integrations
- 05-runconfig-and-permissions.md: Configuration schema and security profiles
- 06-registry-system.md: Registry architecture and distribution
- 07-groups.md: Group management and virtual MCP servers
- 08-workloads-lifecycle.md: Workload state management and operations
- 09-operator-architecture.md: Kubernetes operator design and patterns
- README.md: Navigation guide and documentation index
This documentation serves as the canonical reference for understanding ToolHive's architecture, making it easier for contributors to navigate the codebase and for users to understand deployment options.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Juan Antonio Osorio <[email protected]>
Made the following changes based on review comments:
- Fix API version references: point to actual examples instead of inline YAML
- Fix CRD names: ToolConfig → MCPToolConfig, add MCPExternalAuthConfig
- Remove all line number references from code file paths
- Fix CLI commands: registry show → info, group delete → rm
- Remove non-existent CLI commands from documentation
- Fix 1Password implementation details (uses SDK not CLI)
- Point to cmd/thv-operator/ README instead of duplicating info
- Add note that thv-registry-api is moving out of tree
These changes make the documentation more maintainable by reducing references to implementation details that change frequently and ensuring all commands and APIs referenced actually exist.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Reduces duplication and improves maintainability of architecture documentation:
- Remove duplicated Core Concepts section from overview, replace with brief summary
- Update stdio flow diagram to show independent stdin/stdout streams more clearly
- Add context for when to use exported configs (sharing, migration, version control)
- Remove Project Structure section to reduce maintenance burden
- Simplify Registry API Server section with note about out-of-tree migration
- Fix persistent volume statement in Kubernetes scaling section
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
- Change "metrics" to "telemetry" for proxy endpoints clarity
- Clarify stdio session limitations (single connection to container)
- Explain why tool filter vs tool call filter (context optimization)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Use backticks for proper code formatting in attach process documentation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Remove non-existent commands and fix interactive command documentation:
- Remove 'thv group move' (doesn't exist)
- Fix 'thv client setup' description (is interactive, doesn't take client name)
- Update group operations list to match actual CLI
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
- Soften HA scaling claim (not currently tested)
- Add stdio transport limitation for proxy scaling
- Clarify MCP server scaling applies to SSE/Streamable HTTP transports
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Add note that SSE transport is deprecated in the MCP specification, though ToolHive continues to support it with potential future transition.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Replace full struct definition with link to pkg/runner/config.go and categorized field summary to reduce maintenance burden.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Verified against source code and corrected:
- Export command syntax (requires 2 args: workload and path, no stdout)
- Cedar policy format (Client:: not User::, Action::call_tool not "tools/call")
- Group operations (thv list --group, not thv group list <name>)
- File locations (data files in ~/.local/share, state in ~/.local/state)
- Complete socket paths including macOS locations (Podman Machine, Docker Desktop, Rancher)
All changes verified against pkg/authz/cedar.go, cmd/thv/app/export.go, pkg/container/docker/sdk/client_unix.go, and pkg state/workloads code.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Verified against actual code:
- Scalar UI path is /api/doc not /scalar (pkg/api/docs.go:13, server.go:234)
- Fixed audit event types based on pkg/audit/mcp_events.go (15 total types)
- Corrected mcp_list_operation to actual types: mcp_tools_list, mcp_resources_list, mcp_prompts_list
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Project structure section was removed from overview, update index to match.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
- Clarify tool-filter and tool-call-filter middleware descriptions
- Separate tool filtering from tool overriding in documentation
- Rename "Filter" section to "Filter and Override" to reflect both operations
- Change "metrics" to "telemetry" for consistency with middleware naming
- Explain that both middlewares work together with shared configuration
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
- Replace CRD examples with references to examples/operator/mcp-servers/ directory
- Fix export command syntax (thv export requires output path)
- Fix group commands documentation (thv list --group instead of thv group list)
- Refocus groups documentation on architecture rather than CLI usage
- Remove excessive CLI usage examples to reduce maintenance burden
All changes verified against actual codebase implementation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
- Clarify token storage security in remote authentication (AES-256-GCM encryption)
- Add Kubernetes Mode section to secrets documentation explaining native K8s Secret usage
- Note that Kubernetes uses SecretKeyRef, not the provider system used in CLI mode
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
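For readers unfamiliar with the AES-256-GCM scheme named in this commit, the following is a generic sketch using only the Go standard library. It is not taken from the ToolHive codebase; the function names `seal` and `open`, and the choice to prepend the nonce to the ciphertext, are assumptions made for illustration.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AEAD from the key; a 32-byte key selects AES-256.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext and returns nonce || ciphertext || tag.
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Passing nonce as dst prepends it to the output.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open splits off the nonce, then decrypts and authenticates the rest.
func open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		fmt.Println("key generation failed:", err)
		return
	}
	sealed, err := seal(key, []byte("example-token"))
	if err != nil {
		fmt.Println("seal failed:", err)
		return
	}
	plain, err := open(key, sealed)
	fmt.Println(string(plain), err)
}
```

Because GCM is authenticated, `open` returns an error if the stored bytes were modified, which is the property that matters for stored credentials.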
Add a new section to CLAUDE.md instructing agents to update architecture documentation when making code changes. Includes a mapping table of code areas to documentation files and guidelines for keeping docs in sync with implementation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Fix all 12 unresolved review comments by improving architectural focus:
- Remove CLI command examples, focus on architectural concepts
- Update file path references to actual implementation files
- Fix middleware type name from 'authz' to 'authorization'
- Organize RunConfig fields by architectural categories
- Simplify audit events to categories instead of exhaustive list
- Simplify request flow diagram and reference middleware.md
- Correct file paths for registry, session, client, MCP, audit, monitor, healthcheck
These changes align the documentation with architectural best practices: focusing on concepts, patterns, and system design rather than CLI usage or exhaustive implementation details.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Address PR feedback by removing CLI examples and correcting technical details:
- Remove all CLI command examples (architecture docs should focus on design, not usage)
- Fix container monitor path: pkg/container/docker/monitor.go (not pkg/container/monitor.go)
- Correct OAuth token storage: tokens managed in-memory by TokenSource, not persisted
- Clarify MCP_HOST: defaults to 127.0.0.1 locally, 0.0.0.0 in Kubernetes
- Replace CLI examples with architectural descriptions of concepts
- Update port management to describe architecture, not command flags
- Document TokenSource pattern and client credential storage distinction
These changes align documentation with actual implementation and follow architecture documentation best practices.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
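The MCP_HOST defaults described in this commit amount to a small decision on the deployment mode. A minimal sketch follows, assuming a hypothetical `defaultMCPHost` helper and a boolean that says whether the workload runs in Kubernetes; neither name comes from the ToolHive source.

```go
package main

import "fmt"

// defaultMCPHost returns the bind address described in the commit message:
// loopback for local runs, all interfaces inside Kubernetes.
// Hypothetical helper for illustration only.
func defaultMCPHost(inKubernetes bool) string {
	if inKubernetes {
		return "0.0.0.0"
	}
	return "127.0.0.1"
}

func main() {
	for _, k8s := range []bool{false, true} {
		// Render the value as it might appear in a container environment.
		fmt.Printf("kubernetes=%t MCP_HOST=%s\n", k8s, defaultMCPHost(k8s))
	}
}
```

Binding to loopback locally keeps the server unreachable from other hosts, while a pod generally needs to listen on all interfaces so that other pods and services can reach it.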
Address final round of PR feedback by removing CLI examples and correcting technical details:
- Remove all CLI command examples from architecture docs
- Fix 1Password implementation: SDK not CLI (diagram and text)
- Add missing secret providers: environment and none
- Document Environment provider security: ListSecrets disabled for security
- Correct environment variable merge order with architectural reasoning
- Fix Windows path handling: allowed as host paths only, not container paths
- Replace export/import CLI examples with architectural descriptions
- Update permission auditing, network isolation, secrets management sections
- Remove CLI flags from custom profiles section
All changes verified by toolhive-expert agent. Documentation now focuses on architectural concepts and design patterns rather than CLI usage.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Fix architecture documentation inaccuracies identified in code review:
Registry System (06):
- Update file references to actual provider implementation files
- Remove reference to non-existent README
- Fix annotation keys to use correct toolhive.stacklok.dev domain
- Correct MCPRegistry phases (remove Degraded, add Terminating)
- Fix YAML examples (apiVersion, Git repository field, sync policy)
- Remove incomplete OAuth example
- Update CLI flags to match actual implementation
- Remove reference to non-existent converter command
- Simplify architecture diagram to reflect actual implementation
Groups (07):
- Clarify group move functionality is internal only
- Add note about empty default registry groups
- Remove stale PR reference, use generic description
Workloads Lifecycle (08):
- Remove all line number references per documentation guidelines
- Fix storage paths to match XDG directory structure
- Correct label format to simple prefix style
Operator Architecture (09):
- Fix MCPExternalAuthConfig filename reference
- Add missing controller reference
- Remove incorrect StatusCollector example code
- Fix sync trigger annotation key
All changes verified against actual code implementation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Juan Antonio Osorio <[email protected]>
Remove CLI-focused content and maintain architecture focus:
- Fix state transition: container exit goes to stopped (was already correct in diagram)
- Remove non-existent update command section
- Remove CLI examples from List section, describe architecture instead
- Rename 'Async Operations' to 'Batch Operations' for clarity
- Remove CLI flags from filtering, describe capability architecturally
- Expand label descriptions with purpose/meaning
Architecture docs should describe system design, not CLI usage. Verified against pkg/workloads/manager.go and pkg/container/runtime/types.go
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Summary
This PR adds a comprehensive architecture documentation suite in docs/arch/ covering ToolHive's design, components, and concepts.
Documentation Added
🤖 Generated with Claude Code