Commit 158f3c3

feat: Improve user experience of az aks agent with aks-mcp (#9132)
* feat: Improve user experience of az aks agent with aks-mcp

  Enhance the user experience of az aks agent, including:
  1. Use aks-mcp by default, offering an opt-out flag --no-aks-mcp.
  2. Disable duplicated built-in toolsets when using aks-mcp.
  3. Manage the lifecycle of aks-mcp binary, including downloading, updating, health checking and gracefully stopping.
  4. Offer status subcommand to display the system status. Refine system prompt.
  5. Smart toolset refreshment when switching between mcp and traditional mode.

* use --status instead of status
* address ai comments
* style
* add pytest-asyncio dependency
* fix unit tests
* fix(aks-agent/mcp): eliminate “Event loop is closed” shutdown error

  - Launch aks-mcp via subprocess.Popen instead of asyncio.create_subprocess_exec to avoid asyncio transport GC on a closed loop.
  - Add robust teardown: terminate → wait(timeout) → kill fallback, and explicitly close stdin/stdout/stderr pipes.
  - Make is_server_running use Popen.poll() safely.
  - Minor: update MCP prompt to prefer kubectl node listing when Azure Compute ops are blocked by read-only policy.

* {AKS} Clarify model parameter (cherry-pick PR #9145)

  Squashed cherry-pick of PR #9145 commits:
  - clarify model parameter
  - adjust command example to pretty print recommendation
  - fix disallowed html tag in deployment name
  - update examples to use model name as deployment name
  - remove redundant starting space in parameter help

  Excluded changes to HISTORY.rst and setup.py as requested.

* chore: Add nilo19 and mainerd to aks agent owners
* chore(aks-agent): fix flake8 issues (E306, E261, W291)
* chore(aks-agent): flake8 E261 fix in mcp_manager.py (two spaces before inline comment)
1 parent 49940bd commit 158f3c3
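
The teardown order described in the commit message (terminate, wait with a timeout, kill as a fallback, then close the pipes) can be sketched roughly as follows. This is a minimal illustration built only on the standard `subprocess` module; the function names `stop_process` and `is_running` and the 5-second default timeout are assumptions for the example, not the code in `mcp_manager.py`.

```python
import subprocess
import sys


def is_running(proc) -> bool:
    """True while the child exists and has not exited (poll() returns None)."""
    return proc is not None and proc.poll() is None


def stop_process(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Stop a child process: terminate, wait, kill on timeout, close pipes."""
    if proc.poll() is None:  # still running
        proc.terminate()  # polite request first
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()  # fallback when the child ignores terminate
            proc.wait()
    # Close pipes explicitly instead of leaving them to garbage collection.
    for pipe in (proc.stdin, proc.stdout, proc.stderr):
        if pipe is not None:
            pipe.close()
    return proc.returncode
```

Because everything here is synchronous, no asyncio transport is left to be finalized after the event loop has closed, which is the failure mode the commit message names.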

30 files changed: +6550 -122 lines changed

.github/CODEOWNERS

Lines changed: 2 additions & 0 deletions
@@ -62,6 +62,8 @@
 
 /src/aks-preview/ @andyzhangx @andyliuliming @fumingzhang
 
+/src/aks-agent/ @nilo19 @mainerd
+
 /src/bastion/ @aavalang
 
 /src/vm-repair/ @haagha

src/aks-agent/HISTORY.rst

Lines changed: 11 additions & 0 deletions
@@ -12,6 +12,17 @@ To release a new version, please select a new version number (usually plus 1 to
 Pending
 +++++++
 
+1.0.0b2
++++++++
+
+- Add MCP integration for `az aks agent` with aks-mcp binary management and local server lifecycle (download, version validation, start/stop, health checks).
+- Introduce dual-mode operation: MCP mode (enhanced) and Traditional mode (built-in toolsets), with mode-specific system prompts.
+- Implement smart toolset refresh strategy with persisted mode state to avoid unnecessary refresh on repeated runs.
+- Add `--no-aks-mcp` flag to force Traditional mode when desired.
+- Add `az aks agent status` command to display MCP binary availability/version, server health, and overall mode/readiness.
+- Add structured error handling with user-friendly messages and actionable suggestions for MCP/binary/server/config errors.
+- Port and adapt comprehensive unit tests covering binary manager, MCP manager, configuration generation/validation, status models/collection, error handling, user feedback, parameters, smart refresh, MCP integration, and status command.
+
 1.0.0b1
 +++++++
 * Add interactive AI-powered debugging tool `az aks agent`.
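
The "smart toolset refresh with persisted mode state" entry can be pictured with a small sketch: remember the mode used on the previous run and refresh only when it changes. The JSON state file, the `last_mode` key, and the function name `needs_toolset_refresh` are hypothetical choices for this illustration; the extension's real storage format is not shown in this diff.

```python
import json
import tempfile
from pathlib import Path


def needs_toolset_refresh(state_file: Path, use_mcp: bool) -> bool:
    """Return True when the requested mode differs from the persisted one."""
    current = "mcp" if use_mcp else "traditional"
    try:
        data = json.loads(state_file.read_text())
        previous = data.get("last_mode") if isinstance(data, dict) else None
    except (OSError, ValueError):
        previous = None  # missing or unreadable state counts as a mode change
    state_file.write_text(json.dumps({"last_mode": current}))
    return previous != current


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "mode_state.json"  # hypothetical file name
    print(needs_toolset_refresh(state, use_mcp=True))   # True: first run
    print(needs_toolset_refresh(state, use_mcp=True))   # False: same mode again
    print(needs_toolset_refresh(state, use_mcp=False))  # True: switched modes
```

In this sketch a corrupt or missing state file simply forces one extra refresh, which errs on the side of correctness over speed.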

src/aks-agent/azext_aks_agent/_consts.py

Lines changed: 8 additions & 0 deletions
@@ -8,3 +8,11 @@
 CONST_AGENT_NAME = "AKS AGENT"
 CONST_AGENT_NAME_ENV_KEY = "AGENT_NAME"
 CONST_AGENT_CONFIG_FILE_NAME = "aksAgent.yaml"
+
+# MCP Integration Constants (ported from previous change)
+CONST_MCP_BINARY_NAME = "aks-mcp"
+CONST_MCP_DEFAULT_PORT = 8003
+CONST_MCP_DEFAULT_URL = "http://localhost:8003/sse"
+CONST_MCP_MIN_VERSION = "0.0.7"
+CONST_MCP_GITHUB_REPO = "Azure/aks-mcp"
+CONST_MCP_BINARY_DIR = "bin"
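
As an illustration of how constants like these might be consumed, the sketch below compares a reported binary version against the minimum and derives the SSE URL from the port. The helpers `parse_version`, `meets_min_version`, and `sse_url` are hypothetical, and the sketch assumes plain dotted numeric versions with an optional leading `v`; it is not the extension's actual validation logic.

```python
CONST_MCP_DEFAULT_PORT = 8003
CONST_MCP_MIN_VERSION = "0.0.7"


def parse_version(text: str) -> tuple:
    """Turn '0.0.7' or 'v0.0.7' into (0, 0, 7); numeric dotted versions only."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def meets_min_version(found: str, minimum: str = CONST_MCP_MIN_VERSION) -> bool:
    """Compare as integer tuples so that 0.0.10 sorts after 0.0.7."""
    return parse_version(found) >= parse_version(minimum)


def sse_url(port: int = CONST_MCP_DEFAULT_PORT) -> str:
    """Build the local SSE endpoint URL for the given port."""
    return f"http://localhost:{port}/sse"
```

Comparing integer tuples rather than raw strings matters here: as strings, `"0.0.10"` would sort before `"0.0.7"`.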

src/aks-agent/azext_aks_agent/_help.py

Lines changed: 43 additions & 16 deletions
@@ -26,7 +26,14 @@
           short-summary: Name of the resource group.
         - name: --model
           type: string
-          short-summary: Model to use for the LLM.
+          short-summary: Specify the LLM provider and model or deployment to use for the AI assistant.
+          long-summary: |-
+            The --model parameter determines which large language model (LLM) and provider will be used to analyze your cluster.
+            For OpenAI, use the model name directly (e.g., gpt-4o).
+            For Azure OpenAI, use `azure/<deployment name>` (e.g., azure/gpt-4.1).
+            Each provider may require different environment variables and model naming conventions.
+            For a full list of supported providers, model patterns, and required environment variables, see https://docs.litellm.ai/docs/providers.
+            Note: For Azure OpenAI, it is recommended to set the deployment name as the model name until https://github.com/BerriAI/litellm/issues/13950 is resolved.
         - name: --api-key
           type: string
           short-summary: API key to use for the LLM (if not given, uses environment variables AZURE_API_KEY, OPENAI_API_KEY).
@@ -48,36 +55,30 @@
         - name: --refresh-toolsets
           type: bool
           short-summary: Refresh the toolsets status.
+        - name: --status
+          type: bool
+          short-summary: Show AKS agent configuration and status information.
+        - name: --no-aks-mcp
+          type: bool
+          short-summary: Disable AKS MCP integration and use traditional toolsets.
 
     examples:
         - name: Ask about pod issues in the cluster with Azure OpenAI
           text: |-
             export AZURE_API_BASE="https://my-azureopenai-service.openai.azure.com/"
             export AZURE_API_VERSION="2025-01-01-preview"
             export AZURE_API_KEY="sk-xxx"
-            az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model azure/my-gpt4.1-deployment
+            az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model azure/gpt-4.1
         - name: Ask about pod issues in the cluster with OpenAI
           text: |-
             export OPENAI_API_KEY="sk-xxx"
             az aks agent "Why are my pods not starting?" --name MyManagedCluster --resource-group MyResourceGroup --model gpt-4o
-        - name: Run in interactive mode without a question
-          text: az aks agent "Check the pod status in my cluster" --name MyManagedCluster --resource-group MyResourceGroup --model azure/my-gpt4.1-deployment --api-key "sk-xxx"
-        - name: Run in non-interactive batch mode
-          text: az aks agent "Diagnose networking issues" --no-interactive --max-steps 15 --model azure/my-gpt4.1-deployment
-        - name: Show detailed tool output during analysis
-          text: az aks agent "Why is my service workload unavailable in namespace workload-ns?" --show-tool-output --model azure/my-gpt4.1-deployment
-        - name: Use custom configuration file
-          text: az aks agent "Check kubernetes pod resource usage" --config-file /path/to/custom.yaml --model azure/my-gpt4.1-deployment
-        - name: Run agent with no echo of the original question
-          text: az aks agent "What is the status of my cluster?" --no-echo-request --model azure/my-gpt4.1-deployment
-        - name: Refresh toolsets to get the latest available tools
-          text: az aks agent "What is the status of my cluster?" --refresh-toolsets --model azure/my-gpt4.1-deployment
         - name: Run agent with config file
           text: |
-            az aks agent "Check kubernetes pod resource usage" --config-file /path/to/custom.yaml
+            az aks agent "Check kubernetes pod resource usage" --config-file /path/to/custom.yaml --name MyManagedCluster --resource-group MyResourceGroup
             Here is an example of config file:
             ```json
-            model: "gpt-4o"
+            model: "azure/gpt-4.1"
             api_key: "..."
             # define a list of mcp servers, mcp server can be defined
             mcp_servers:
@@ -103,4 +104,30 @@
               aks/core:
                 enabled: false
             ```
+        - name: Run in interactive mode without a question
+          text: az aks agent "Check the pod status in my cluster" --name MyManagedCluster --resource-group MyResourceGroup --model azure/gpt-4.1 --api-key "sk-xxx"
+        - name: Run in non-interactive batch mode
+          text: az aks agent "Diagnose networking issues" --no-interactive --max-steps 15 --model azure/gpt-4.1
+        - name: Show detailed tool output during analysis
+          text: az aks agent "Why is my service workload unavailable in namespace workload-ns?" --show-tool-output --model azure/gpt-4.1
+        - name: Use custom configuration file
+          text: az aks agent "Check kubernetes pod resource usage" --config-file /path/to/custom.yaml --model azure/gpt-4.1
+        - name: Run agent with no echo of the original question
+          text: az aks agent "What is the status of my cluster?" --no-echo-request --model azure/gpt-4.1
+        - name: Refresh toolsets to get the latest available tools
+          text: az aks agent "What is the status of my cluster?" --refresh-toolsets --model azure/gpt-4.1
+        - name: Show agent status (MCP readiness)
+          text: az aks agent --status
+        - name: Run in interactive mode without a question
+          text: az aks agent "Check the pod status in my cluster" --name MyManagedCluster --resource-group MyResourceGroup --model azure/my-gpt4.1-deployment --api-key "sk-xxx"
+        - name: Run in non-interactive batch mode
+          text: az aks agent "Diagnose networking issues" --no-interactive --max-steps 15 --model azure/my-gpt4.1-deployment
+        - name: Show detailed tool output during analysis
+          text: az aks agent "Why is my service workload unavailable in namespace workload-ns?" --show-tool-output --model azure/my-gpt4.1-deployment
+        - name: Use custom configuration file
+          text: az aks agent "Check kubernetes pod resource usage" --config-file /path/to/custom.yaml --model azure/my-gpt4.1-deployment
+        - name: Run agent with no echo of the original question
+          text: az aks agent "What is the status of my cluster?" --no-echo-request --model azure/my-gpt4.1-deployment
+        - name: Refresh toolsets to get the latest available tools
+          text: az aks agent "What is the status of my cluster?" --refresh-toolsets --model azure/my-gpt4.1-deployment
 """
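
The naming convention in the `--model` help text (a bare model name such as `gpt-4o`, or a `provider/deployment` form such as `azure/gpt-4.1`) can be illustrated with a small parsing sketch. The helper `split_model` is hypothetical and exists only for this example; the actual provider resolution is delegated to LiteLLM and may follow additional rules.

```python
def split_model(model: str):
    """Split '<provider>/<name>' at the first slash.

    Returns (provider, name); provider is None when there is no prefix,
    which is the form the help text uses for OpenAI models.
    """
    provider, sep, name = model.partition("/")
    if not sep:
        return None, provider  # no prefix: the whole string is the model name
    return provider, name


print(split_model("azure/gpt-4.1"))  # ('azure', 'gpt-4.1')
print(split_model("gpt-4o"))         # (None, 'gpt-4o')
```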

src/aks-agent/azext_aks_agent/_params.py

Lines changed: 16 additions & 2 deletions
@@ -17,7 +17,9 @@ def load_arguments(self, _):
     with self.argument_context("aks agent") as c:
         c.positional(
             "prompt",
+            nargs='?',
             help="Ask any question and answer using available tools.",
+            required=False,
         )
         c.argument(
             "resource_group_name",
@@ -47,12 +49,12 @@ def load_arguments(self, _):
         )
         c.argument(
             "model",
-            help="The model to use for the LLM.",
+            help=" Specify the LLM provider and model or deployment to use for the AI assistant.",
             required=False,
             type=str,
         )
         c.argument(
-            "api-key",
+            "api_key",
             help="API key to use for the LLM (if not given, uses environment variables AZURE_API_KEY, OPENAI_API_KEY)",
             required=False,
             type=str,
@@ -77,3 +79,15 @@ def load_arguments(self, _):
             help="Refresh the toolsets status.",
             action="store_true",
         )
+        c.argument(
+            "status",
+            options_list=["--status"],
+            action="store_true",
+            help="Show AKS agent configuration and status information.",
+        )
+        c.argument(
+            "no_aks_mcp",
+            options_list=["--no-aks-mcp"],
+            help="Disable AKS MCP integration and use traditional toolsets.",
+            action="store_true",
+        )
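
The effect of making `prompt` optional can be seen in a plain `argparse` sketch, on the assumption that the Azure CLI argument context passes `nargs` and `action` through to argparse-style parsing. With `nargs='?'` the positional may be omitted, which is what lets `az aks agent --status` run without a question. Plain `argparse` rejects `required` on positionals, so the sketch leaves it out; `build_parser` is a hypothetical name for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Standalone approximation of the prompt/--status/--no-aks-mcp arguments."""
    parser = argparse.ArgumentParser(prog="az aks agent")
    # nargs="?" makes the positional optional; it defaults to None when absent.
    parser.add_argument("prompt", nargs="?", default=None)
    parser.add_argument("--status", action="store_true")
    parser.add_argument("--no-aks-mcp", dest="no_aks_mcp", action="store_true")
    return parser


args = build_parser().parse_args(["--status"])
print(args.prompt, args.status, args.no_aks_mcp)  # None True False
```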
