MCPGroup support in k8s #2247
Conversation
Implements MCPGroup types and a k8s controller per [this proposal](https://github.com/stacklok/toolhive/blob/73707af47db039f51b2bda07d1ef8eccb25ba819/docs/proposals/kubernetes-mcpgroup-crd.md). The MCP group controller reconciles MCPGroup resources and watches for changes to MCPServer resources; when an MCPServer changes, it requests reconciliation of the server's referenced MCPGroup. The MCP server controller checks the groupRef during reconciliation and sets status conditions accordingly (no fatal error if the groupRef is invalid).

Generated code and manifests:

```console
toolhive % task operator-generate
toolhive % task operator-manifests
```

Added unit, integration, and e2e (chainsaw) tests:

```console
toolhive % task operator-test
toolhive % task operator-test-integration
toolhive % task operator-e2e-test
```
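For illustration, a minimal sketch of how that MCPServer-to-MCPGroup fan-out could be wired with a recent controller-runtime. The mapServerToGroup name, and the assumption that spec.groupRef is a plain name resolved in the server's own namespace, are mine rather than the PR's actual code, and the fragment relies on the operator's existing imports (ctrl, handler, client, reconcile, types, mcpv1alpha1):

```go
func (r *MCPGroupReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&mcpv1alpha1.MCPGroup{}).
        // Enqueue the referenced MCPGroup whenever an MCPServer changes.
        Watches(
            &mcpv1alpha1.MCPServer{},
            handler.EnqueueRequestsFromMapFunc(r.mapServerToGroup),
        ).
        Complete(r)
}

// mapServerToGroup maps a changed MCPServer to its referenced MCPGroup (hypothetical helper).
func (r *MCPGroupReconciler) mapServerToGroup(_ context.Context, obj client.Object) []reconcile.Request {
    server, ok := obj.(*mcpv1alpha1.MCPServer)
    if !ok || server.Spec.GroupRef == "" {
        return nil
    }
    // groupRef is resolved within the server's namespace (no cross-namespace references).
    return []reconcile.Request{{
        NamespacedName: types.NamespacedName{
            Namespace: server.Namespace,
            Name:      server.Spec.GroupRef,
        },
    }}
}
```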
@claude please review this and take into account Kubernetes best practices while you're at it.
Claude finished @JAORMX's task

Code Review Complete ✅

I've completed my review of the MCPGroup implementation with a focus on Kubernetes best practices. Here's my comprehensive analysis:

Summary

This PR implements MCPGroup CRD support in the Kubernetes operator following the approved proposal. The implementation is solid overall with good adherence to Kubernetes best practices. I've identified several improvements related to best practices, edge cases, and API design.

✅ Strengths

1. Excellent Design Alignment
2. Strong Test Coverage
3. Good Controller Patterns
🔍 Issues & Improvements

Critical: Missing observedGeneration in Status Updates

Issue: Both controllers update status conditions without setting observedGeneration.

Location:
Fix:

```go
// In mcpgroup_controller.go
meta.SetStatusCondition(&mcpGroup.Status.Conditions, metav1.Condition{
    Type:               mcpv1alpha1.ConditionTypeMCPServersChecked,
    Status:             metav1.ConditionFalse,
    Reason:             mcpv1alpha1.ConditionReasonListMCPServersFailed,
    Message:            "Failed to list MCPServers in namespace",
    ObservedGeneration: mcpGroup.Generation, // ADD THIS
})
// Similar fixes needed in mcpserver_controller.go validateGroupRef()
```

Why: observedGeneration lets status consumers tell whether a condition reflects the current spec or a stale generation.

High Priority: Lack of Finalizers for Cascade Handling

Issue: The proposal mentions that when an MCPGroup is deleted, MCPServers should continue running but will have dangling references. However, there's no finalizer to clean up the group gracefully or update server conditions before deletion.

Location:

Recommendation:
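As one possible shape for that recommendation, a minimal finalizer sketch; the finalizer string and the updateDanglingServerConditions helper are hypothetical, and the fragment assumes the controller's existing imports (ctrl, controllerutil, mcpv1alpha1):

```go
const mcpGroupFinalizer = "toolhive.stacklok.dev/mcpgroup-finalizer" // hypothetical name

func (r *MCPGroupReconciler) handleFinalizer(ctx context.Context, group *mcpv1alpha1.MCPGroup) (ctrl.Result, error) {
    if group.DeletionTimestamp.IsZero() {
        // Not being deleted: make sure the finalizer is present.
        if controllerutil.AddFinalizer(group, mcpGroupFinalizer) {
            return ctrl.Result{}, r.Update(ctx, group)
        }
        return ctrl.Result{}, nil
    }

    // Being deleted: update conditions on referencing servers (hypothetical helper),
    // then release the object by removing the finalizer.
    if controllerutil.ContainsFinalizer(group, mcpGroupFinalizer) {
        if err := r.updateDanglingServerConditions(ctx, group); err != nil {
            return ctrl.Result{}, err
        }
        controllerutil.RemoveFinalizer(group, mcpGroupFinalizer)
        return ctrl.Result{}, r.Update(ctx, group)
    }
    return ctrl.Result{}, nil
}
```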
Current behavior risk: MCPServers will have dangling groupRef references once their group is deleted.

Medium: Status Update Without Conflict Handling

Issue: Status updates in the MCPGroup controller don't handle conflicts properly. If two reconciliations happen simultaneously, one could overwrite the other.

Location:

```go
// Current code
if err := r.Status().Update(ctx, mcpGroup); err != nil {
    ctxLogger.Error(err, "Failed to update MCPGroup status")
    return ctrl.Result{}, err
}
```

Better approach:

```go
if err := r.Status().Update(ctx, mcpGroup); err != nil {
    if errors.IsConflict(err) {
        // Requeue to retry with fresh data
        return ctrl.Result{Requeue: true}, nil
    }
    ctxLogger.Error(err, "Failed to update MCPGroup status")
    return ctrl.Result{}, err
}
```
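A common alternative is client-go's retry helper (k8s.io/client-go/util/retry), which re-reads the object and retries the status update on conflict. A sketch reusing the variable names from the snippet above:

```go
// Retry the status update with a fresh copy of the object when a conflict occurs.
err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
    latest := &mcpv1alpha1.MCPGroup{}
    if err := r.Get(ctx, client.ObjectKeyFromObject(mcpGroup), latest); err != nil {
        return err
    }
    latest.Status = mcpGroup.Status
    return r.Status().Update(ctx, latest)
})
if err != nil {
    ctxLogger.Error(err, "Failed to update MCPGroup status")
    return ctrl.Result{}, err
}
```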
Medium: Non-Deterministic Server List Order

Issue: The servers list in the MCPGroup status is not sorted, so its order can vary between reconciliations.

Location:

Fix:

```go
mcpGroup.Status.Servers = make([]string, len(filteredServers))
for i, server := range filteredServers {
    mcpGroup.Status.Servers[i] = server.Name
}
// Add this:
sort.Strings(mcpGroup.Status.Servers)
```

Medium: MCPServer Controller Doesn't Update Status After GroupRef Validation

Issue: The MCPServer controller sets GroupRef conditions during validation but doesn't persist them right away. Looking at line 187 in the Reconcile function, it appears validation is called but status is only updated much later. This could lead to race conditions.

Location:

Impact: The GroupRef validation conditions are lost unless another reconciliation updates the status.

Fix: Either:
Low: Missing Printer Columns in MCPGroup CRD

Issue: MCPGroup CRD has printer columns defined in the Go types but they could be improved for better UX.

Location:

Current:

```go
//+kubebuilder:printcolumn:name="Servers",type="integer",JSONPath=".status.serverCount"
//+kubebuilder:printcolumn:name="Phase",type="string",JSONPath=".status.phase"
//+kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"
```

Suggested addition:

```go
//+kubebuilder:printcolumn:name="Ready",type="string",JSONPath=".status.conditions[?(@.type=='MCPServersChecked')].status"
```

This would give users quick visibility into whether the group is healthy.

Low: Inconsistent Constant Naming

Issue: The condition type constant uses "Checked" while the reason uses "Succeeded".

Location:

```go
const (
    ConditionTypeMCPServersChecked = "MCPServersChecked"
)
const (
    ConditionReasonListMCPServersFailed    = "ListMCPServersFailed"
    ConditionReasonListMCPServersSucceeded = "ListMCPServersSucceeded" // inconsistent with type name
)
```

Minor improvement: Consider aligning the reason names with the condition type.

Low: CRD Missing Short Names

Issue: Neither CRD defines short names for kubectl convenience.

Location: CRD manifests

Suggestion:

```yaml
# In mcpgroups CRD
names:
  kind: MCPGroup
  plural: mcpgroups
  singular: mcpgroup
  shortNames: # ADD THIS
    - mcpg
    - mcpgroup
```

Users could then use kubectl get mcpg.

📝 Documentation & API Design Notes

1. GroupRef is Optional but Has Conditions

The groupRef field is optional, but the MCPServer controller still sets status conditions for it during reconciliation.

2. Cross-Namespace References Intentionally Blocked

The design correctly restricts GroupRef to same-namespace only. This follows Kubernetes security best practices. ✅

3. No Field Selector Support Yet

The proposal mentions field selectors for querying servers by group; they are not included in this PR.

🎯 Kubernetes Best Practices Assessment
🚀 Recommendations

Must Fix Before Merge:
Should Fix:
Nice to Have:
✨ Conclusion

This is a well-implemented feature that follows the proposal closely and includes comprehensive testing. The main concerns are around status handling best practices (observedGeneration, conflict handling) and missing finalizers for graceful deletion. These are important for production-readiness but don't block the feature from functioning correctly. Great work on the test coverage - the integration and e2e tests demonstrate thorough validation of the feature!
Implements MCPGroup types and a k8s controller per this proposal, tracked by #2234.
The MCP group controller reconciles MCPGroup resources and watches for changes to MCPServer resources. When the latter happens, it requests reconciliation of the server's referenced MCPGroup.
The MCP server controller checks the groupRef during reconciliation and sets status conditions accordingly (no fatal error if the groupRef is invalid).
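For illustration, a minimal sketch of what that check could look like in the MCPServer controller. The condition type and reason strings are assumptions rather than this PR's constants, spec.groupRef is assumed to be a plain name, and the fragment relies on the controller's existing imports (errors, meta, metav1, types, mcpv1alpha1):

```go
// validateGroupRef records whether the referenced MCPGroup exists; an invalid
// reference only surfaces as a status condition, it does not fail reconciliation.
func (r *MCPServerReconciler) validateGroupRef(ctx context.Context, server *mcpv1alpha1.MCPServer) {
    if server.Spec.GroupRef == "" {
        return // groupRef is optional
    }

    group := &mcpv1alpha1.MCPGroup{}
    err := r.Get(ctx, types.NamespacedName{Namespace: server.Namespace, Name: server.Spec.GroupRef}, group)

    cond := metav1.Condition{
        Type:               "GroupRefResolved", // assumed condition type
        Status:             metav1.ConditionTrue,
        Reason:             "GroupFound", // assumed reason
        Message:            "Referenced MCPGroup exists",
        ObservedGeneration: server.Generation,
    }
    switch {
    case errors.IsNotFound(err):
        cond.Status = metav1.ConditionFalse
        cond.Reason = "GroupNotFound"
        cond.Message = "Referenced MCPGroup does not exist"
    case err != nil:
        cond.Status = metav1.ConditionUnknown
        cond.Reason = "GroupLookupFailed"
        cond.Message = err.Error()
    }
    meta.SetStatusCondition(&server.Status.Conditions, cond)
}
```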
Generated code and manifests.
Added unit, integration, and e2e (chainsaw) tests.
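For a flavor of what the integration tests can assert, a sketch in the envtest/Gomega style; the MCPServerSpec fields, the status field names, and suite variables like k8sClient, ctx, and namespace are assumptions, not the exact test code:

```go
It("lists member servers in the MCPGroup status", func() {
    group := &mcpv1alpha1.MCPGroup{ObjectMeta: metav1.ObjectMeta{Name: "demo", Namespace: namespace}}
    Expect(k8sClient.Create(ctx, group)).To(Succeed())

    // MCPServerSpec fields here are assumptions; only the groupRef matters for the assertion.
    server := &mcpv1alpha1.MCPServer{
        ObjectMeta: metav1.ObjectMeta{Name: "srv-1", Namespace: namespace},
        Spec:       mcpv1alpha1.MCPServerSpec{GroupRef: "demo"},
    }
    Expect(k8sClient.Create(ctx, server)).To(Succeed())

    // The group controller should eventually reflect the member in its status.
    Eventually(func(g Gomega) {
        fresh := &mcpv1alpha1.MCPGroup{}
        g.Expect(k8sClient.Get(ctx, client.ObjectKeyFromObject(group), fresh)).To(Succeed())
        g.Expect(fresh.Status.Servers).To(ContainElement("srv-1"))
    }).Should(Succeed())
})
```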
Many added lines are generated (CRD) or test cases.