diff --git a/playbook/README.md b/playbook/README.md
index 467c66c..2295f8c 100644
--- a/playbook/README.md
+++ b/playbook/README.md
@@ -33,6 +33,19 @@ Supports enabling `etcd Overload Protection` and `APF Flow Control` [APF Rate Li
 | `inject-stress-list-qps` | `int` | "100" | QPS per stress test Pod |
 | `inject-stress-total-duration` | `string` | "30s" | Total test duration (e.g. 30s, 5m) |
 
+**Recommended Parameters for TKE Clusters**
+
+| Cluster Level | resource-create-object-size-bytes | resource-create-object-count | resource-create-qps | inject-stress-concurrency | inject-stress-list-qps |
+|---------|----------------------------------|-----------------------------|---------------------|--------------------------|-----------------------|
+| L5 | 10000 | 100 | 10 | 6 | 200 |
+| L50 | 10000 | 300 | 10 | 6 | 200 |
+| L100 | 50000 | 500 | 20 | 6 | 200 |
+| L200 | 100000 | 1000 | 50 | 9 | 200 |
+| L500 | 100000 | 1000 | 50 | 12 | 200 |
+| L1000 | 100000 | 3000 | 50 | 12 | 300 |
+| L3000 | 100000 | 6000 | 500 | 18 | 500 |
+| L5000 | 100000 | 10000 | 500 | 21 | 500 |
+
 **etcd Overload Protection & Enhanced APF**
 
 Tencent Cloud TKE team has developed these core protection features:
@@ -56,15 +69,19 @@ Supported versions:
 
 **playbook**: `workflow/coredns-disruption-scenario.yaml`
 
 This scenario simulates coredns service disruption by:
-1. Scaling coredns Deployment replicas to 0
-2. Maintaining zero replicas for specified duration
-3. Restoring original replica count
+
+1. **Pre-check**: Verify that the `tke-chaos-test/tke-chaos-precheck-resource` ConfigMap exists in the target cluster to ensure the cluster is available for testing
+
+2. **Component Shutdown**: Log in to the Argo Web UI, open the `coredns-disruption-scenario` workflow, then click the `RESUME` button under the `SUMMARY` tab of the `suspend-1` node to scale the coredns Deployment down to 0 replicas
+
+3. **Service Validation**: While coredns is down, verify whether your services are affected by the disruption
+
+4. **Component Recovery**: Click the `RESUME` button under the `SUMMARY` tab of the `suspend-2` node to restore the coredns Deployment replicas
 
 **Parameters**
 
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
-| `disruption-duration` | `string` | `30s` | Disruption duration (e.g. 30s, 5m) |
 | `kubeconfig-secret-name` | `string` | `dest-cluster-kubeconfig` | Target cluster kubeconfig secret name |
 
 ## kubernetes-proxy Disruption
@@ -72,15 +89,19 @@
 
 **playbook**: `workflow/kubernetes-proxy-disruption-scenario.yaml`
 
 This scenario simulates kubernetes-proxy service disruption by:
-1. Scaling kubernetes-proxy Deployment replicas to 0
-2. Maintaining zero replicas for specified duration
-3. Restoring original replica count
+
+1. **Pre-check**: Verify that the `tke-chaos-test/tke-chaos-precheck-resource` ConfigMap exists in the target cluster to ensure the cluster is available for testing
+
+2. **Component Shutdown**: Log in to the Argo Web UI, open the `kubernetes-proxy-disruption-scenario` workflow, then click the `RESUME` button under the `SUMMARY` tab of the `suspend-1` node to scale the kubernetes-proxy Deployment down to 0 replicas
+
+3. **Service Validation**: While kubernetes-proxy is down, verify whether your services are affected by the disruption
+
+4. **Component Recovery**: Click the `RESUME` button under the `SUMMARY` tab of the `suspend-2` node to restore the kubernetes-proxy Deployment replicas (a CLI alternative is sketched below)
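+
+The same `RESUME` action can also be driven from the command line. A minimal sketch, assuming the `argo` CLI is installed and the workflow runs in the `tke-chaos-test` namespace (the node names follow the `suspend-1`/`suspend-2` convention above; exact CLI flags may vary by Argo version):
+
+```bash
+# Resume the first gate: scales the kubernetes-proxy Deployment down to 0 replicas
+argo resume kubernetes-proxy-disruption-scenario -n tke-chaos-test \
+  --node-field-selector displayName=suspend-1
+
+# Resume the second gate: restores the original replica count
+argo resume kubernetes-proxy-disruption-scenario -n tke-chaos-test \
+  --node-field-selector displayName=suspend-2
+```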
 
 **Parameters**
 
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
-| `disruption-duration` | `string` | `30s` | Disruption duration (e.g. 30s, 5m) |
 | `kubeconfig-secret-name` | `string` | `dest-cluster-kubeconfig` | Target cluster kubeconfig secret name |
 
 ## Namespace Deletion Protection
@@ -140,10 +161,10 @@ kubectl create -f workflow/managed-cluster-master-component/restore-apiserver.ya
 
 | Parameter | Type | Default | Description |
 |-----------|------|---------|-------------|
-| `region` | `string` | `` | Tencent Cloud region, e.g. `ap-guangzhou` [Region List](https://www.tencentcloud.com/document/product/213/6091?lang=en&pg=) |
-| `secret-id` | `string` | `` | Tencent Cloud API secret ID, obtain from [API Key Management](https://console.cloud.tencent.com/cam/capi) |
-| `secret-key` | `string` | `` | Tencent Cloud API secret key |
-| `cluster-id` | `string` | `` | Target cluster ID |
+| `region` | `string` | "" | Tencent Cloud region, e.g. `ap-guangzhou` [Region List](https://www.tencentcloud.com/document/product/213/6091?lang=en&pg=) |
+| `secret-id` | `string` | "" | Tencent Cloud API secret ID, obtain from [API Key Management](https://console.cloud.tencent.com/cam/capi) |
+| `secret-key` | `string` | "" | Tencent Cloud API secret key |
+| `cluster-id` | `string` | "" | Target cluster ID |
 | `kubeconfig-secret-name` | `string` | `dest-cluster-kubeconfig` | Secret name containing target cluster kubeconfig |
 
 **Notes**
diff --git a/playbook/README_zh.md b/playbook/README_zh.md
index dff6aaf..d77e2a8 100644
--- a/playbook/README_zh.md
+++ b/playbook/README_zh.md
@@ -33,6 +33,19 @@
 | `inject-stress-list-qps` | `int` | "100" | 每个发压`Pod`的`QPS` |
 | `inject-stress-total-duration` | `string` | "30s" | 发压执行总时长(如30s,5m等) |
 
+**TKE集群推荐压测参数**
+
+| 集群规格 | resource-create-object-size-bytes | resource-create-object-count | resource-create-qps | inject-stress-concurrency | inject-stress-list-qps |
+|---------|----------------------------------|-----------------------------|---------------------|--------------------------|-----------------------|
+| L5 | 10000 | 100 | 10 | 6 | 200 |
+| L50 | 10000 | 300 | 10 | 6 | 200 |
+| L100 | 50000 | 500 | 20 | 6 | 200 |
+| L200 | 100000 | 1000 | 50 | 9 | 200 |
+| L500 | 100000 | 1000 | 50 | 12 | 200 |
+| L1000 | 100000 | 3000 | 50 | 12 | 300 |
+| L3000 | 100000 | 6000 | 500 | 18 | 500 |
+| L5000 | 100000 | 10000 | 500 | 21 | 500 |
+
 **etcd过载保护&增强apf限流说明**
 
 腾讯云TKE团队在社区版本基础上开发了以下核心保护特性:
@@ -56,15 +69,16 @@
 **playbook**:`workflow/coredns-disruption-scenario.yaml`
 
 该场景通过以下方式构造`coredns`服务中断:
-1. 将`coredns Deployment`副本数缩容到`0`
-2. 维持指定时间副本数为`0`
-3. 恢复原有副本数
+
+1. **前置检查**:验证目标集群中存在`tke-chaos-test/tke-chaos-precheck-resource` ConfigMap,确保集群可用于演练
+2. **组件停机**:登录Argo Web UI,点击`coredns-disruption-scenario` workflow,点击`suspend-1`节点`SUMMARY`标签下的`RESUME`按钮,将`coredns Deployment`副本数缩容到`0`
+3. **业务验证**:`coredns`停服期间,您可以验证您的业务是否受到`coredns`停服的影响
+4. 
**组件恢复**:点击`suspend-2`节点`SUMMARY`标签下的`RESUME`按钮,将`coredns Deployment`副本数恢复 **参数说明** | 参数名称 | 类型 | 默认值 | 说明 | |---------|------|--------|------| -| `disruption-duration` | `string` | `30s` | 服务中断持续时间(如30s,5m等) | | `kubeconfig-secret-name` | `string` | `dest-cluster-kubeconfig` | `目标集群kubeconfig secret`名称,如为空,则演练当前集群 | ## kubernetes-proxy停服 @@ -72,15 +86,16 @@ **playbook**:`workflow/kubernetes-proxy-disruption-scenario.yaml` 该场景通过以下方式构造`kubernetes-proxy`服务中断: -1. 将`kubernetes-proxy` `Deployment`副本数缩容到0 -2. 维持指定时间副本数为`0` -3. 恢复原有副本数 + +1. **前置检查**:验证目标集群中存在`tke-chaos-test/tke-chaos-precheck-resource ConfigMap`,确保集群可用于演练 +2. **组件停机**:登录argo Web UI,点击`kubernetes-proxy-disruption-scenario workflow`,点击`suspend-1`节点`SUMMARY`标签下的`RESUME`按钮,将`kubernetes-proxy Deployment`副本数缩容到`0` +3. **业务验证**:`kubernetes-proxy`停服期间,您可以去验证您的业务是否受到`kubernetes-proxy`停服的影响 +4. **组件恢复**:点击`suspend-2`节点`SUMMARY`标签下的`RESUME`按钮,将`kubernetes-proxy Deployment`副本数恢复 **参数说明** | 参数名称 | 类型 | 默认值 | 说明 | |---------|------|--------|------| -| `disruption-duration` | `string` | `30s` | 服务中断持续时间(如30s,5m等) | | `kubeconfig-secret-name` | `string` | `dest-cluster-kubeconfig` | `目标集群kubeconfig secret`名称,如为空,则演练当前集群 | ## 命名空间删除防护 @@ -139,10 +154,10 @@ kubectl create -f workflow/managed-cluster-master-component/restore-apiserver.ya | 参数名称 | 类型 | 默认值 | 说明 | |---------|------|--------|------| -| `region` | `string` | `` | 腾讯云地域,如`ap-guangzhou` [地域查询](https://www.tencentcloud.com/zh/document/product/213/6091) | -| `secret-id` | `string` | `` | 腾讯云API密钥ID, 密钥可前往官网控制台 [API密钥管理](https://console.cloud.tencent.com/cam/capi) 进行获取 | -| `secret-key` | `string` | `` | 腾讯云API密钥 | -| `cluster-id` | `string` | `` | 演练集群ID | +| `region` | `string` | "" | 腾讯云地域,如`ap-guangzhou` [地域查询](https://www.tencentcloud.com/zh/document/product/213/6091) | +| `secret-id` | `string` | "" | 腾讯云API密钥ID, 密钥可前往官网控制台 [API密钥管理](https://console.cloud.tencent.com/cam/capi) 进行获取 | +| `secret-key` | `string` | "" | 腾讯云API密钥 | +| `cluster-id` | `string` | "" | 演练集群ID | | `kubeconfig-secret-name` | `string` | `dest-cluster-kubeconfig` | 目标集群kubeconfig secret名称 | **注意事项** diff --git a/playbook/all-in-one-template.yaml b/playbook/all-in-one-template.yaml index 2685e40..57de3e3 100644 --- a/playbook/all-in-one-template.yaml +++ b/playbook/all-in-one-template.yaml @@ -13,7 +13,7 @@ spec: parameters: # cluster-status-collect 参数 - name: image - value: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.2" + value: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.5" entrypoint: cluster-status-collect templates: - name: cluster-status-collect @@ -373,7 +373,7 @@ spec: inputs: parameters: - name: image - default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.2" + default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.5" - name: args container: image: "{{inputs.parameters.image}}" @@ -565,7 +565,7 @@ spec: inputs: parameters: - name: image - default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.2" + default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.5" description: "precheck工具镜像, 用于校验集群健康状态" - name: check-configmap-name default: "tke-chaos-precheck-resource" @@ -762,7 +762,7 @@ spec: # precheck参数 - name: precheck-cluster-image - default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.2" + default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.5" description: "前置检查工具镜像" - name: check-configmap-name default: "tke-chaos-precheck-resource" @@ -779,7 +779,7 @@ spec: # 资源创建参数 - name: resource-create-image - default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.2" + default: 
"ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.5" description: "资源创建工具镜像" - name: resource-create-namespace default: "tke-chaos-test" @@ -799,12 +799,12 @@ spec: # 集群状态采集参数 - name: cluster-status-image - default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.2" + default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.5" description: "集群状态检查工具镜像" # 压测参数 - name: inject-stress-image - default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.2" + default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.5" description: "故障注入工具镜像" - name: inject-stress-list-namespace default: "" @@ -1234,7 +1234,7 @@ spec: inputs: parameters: - name: image - default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.2" + default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.5" description: "压测工具镜像" - name: namespace default: "" @@ -1392,8 +1392,6 @@ spec: - name: main inputs: parameters: - - name: disruption-duration - description: "服务中断持续时间" - name: workload-type description: "要测试的工作负载类型, 可选值: daemonset/deployment/statefulset" - name: workload-name @@ -1409,6 +1407,8 @@ spec: default: "tke-chaos-test" description: "预检查配置configmap所在命名空间" steps: + - - name: suspend-1 + template: suspend - - name: precheck arguments: parameters: @@ -1437,7 +1437,7 @@ spec: - name: kubeconfig-secret-name value: "{{inputs.parameters.kubeconfig-secret-name}}" template: scale-down-workload - - - name: suspend + - - name: suspend-2 template: suspend - - name: scale-up-workload arguments: @@ -1549,7 +1549,7 @@ spec: inputs: parameters: - name: image - default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.2" + default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.5" description: "创建资源使用的工具镜像" - name: namespace description: "创建资源所在的命名空间" @@ -1669,6 +1669,138 @@ spec: - key: config path: config +--- +# 功能说明:在集群中删除资源,支持pods, configmaps +# 参数说明: +# resource-delete模版参数说明: +# 1. image: 删除资源使用的工具镜像 +# 2. namespace: 删除资源所在的命名空间 +# 3. object-type: 删除资源的类型, 支持pods/configmaps +# 4. object-count: 删除资源的数量 +# 5. num-clients: 删除资源的客户端数量 +# 6. qps: 删除资源的QPS +# 7. delete-all: 是否删除所有资源 +# 8. 
kubeconfig-secret-name: 用于指定接入的K8s集群的kubeconfig凭证的secret名称, 如果为空,则默认为当前集群
+apiVersion: argoproj.io/v1alpha1
+kind: ClusterWorkflowTemplate
+metadata:
+  name: resource-delete
+spec:
+  entrypoint: resource-delete
+  templates:
+  - name: resource-delete
+    inputs:
+      parameters:
+      - name: image
+        default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.5"
+        description: "删除资源使用的工具镜像"
+      - name: namespace
+        description: "删除资源所在的命名空间"
+      - name: object-type
+        default: "pods"
+        description: "删除资源的类型, 支持pods/configmaps"
+      - name: object-count
+        default: "100"
+        description: "删除资源的数量"
+      - name: num-clients
+        default: "10"
+        description: "删除资源的客户端数量"
+      - name: qps
+        default: "10"
+        description: "删除资源的QPS"
+      - name: kubeconfig-secret-name
+        description: "用于指定接入的K8s集群的kubeconfig凭证的secret名称, 如果为空,则默认为当前集群"
+    steps:
+    - - name: internal-resource-delete
+        arguments:
+          parameters:
+          - name: image
+            value: "{{inputs.parameters.image}}"
+          - name: namespace
+            value: "{{inputs.parameters.namespace}}"
+          - name: object-type
+            value: "{{inputs.parameters.object-type}}"
+          - name: object-count
+            value: "{{inputs.parameters.object-count}}"
+          - name: num-clients
+            value: "{{inputs.parameters.num-clients}}"
+          - name: qps
+            value: "{{inputs.parameters.qps}}"
+        template: internal-resource-delete
+        when: "'{{inputs.parameters.kubeconfig-secret-name}}' == ''"
+    - - name: external-resource-delete
+        arguments:
+          parameters:
+          - name: image
+            value: "{{inputs.parameters.image}}"
+          - name: namespace
+            value: "{{inputs.parameters.namespace}}"
+          - name: object-type
+            value: "{{inputs.parameters.object-type}}"
+          - name: object-count
+            value: "{{inputs.parameters.object-count}}"
+          - name: num-clients
+            value: "{{inputs.parameters.num-clients}}"
+          - name: qps
+            value: "{{inputs.parameters.qps}}"
+          - name: kubeconfig-secret-name
+            value: "{{inputs.parameters.kubeconfig-secret-name}}"
+        template: external-resource-delete
+        when: "'{{inputs.parameters.kubeconfig-secret-name}}' != ''"
+
+  - name: internal-resource-delete
+    inputs:
+      parameters:
+      - name: image
+      - name: namespace
+      - name: object-type
+      - name: object-count
+      - name: num-clients
+      - name: qps
+    container:
+      image: "{{inputs.parameters.image}}"
+      command:
+      - /kubestress
+      - delete
+      - --namespace={{inputs.parameters.namespace}}
+      - --object-type={{inputs.parameters.object-type}}
+      - --object-count={{inputs.parameters.object-count}}
+      - --num-clients={{inputs.parameters.num-clients}}
+      - --qps={{inputs.parameters.qps}}
+
+  - name: external-resource-delete
+    inputs:
+      parameters:
+      - name: image
+      - name: namespace
+      - name: object-type
+      - name: object-count
+      - name: num-clients
+      - name: qps
+      - name: kubeconfig-secret-name
+    container:
+      image: "{{inputs.parameters.image}}"
+      command:
+      - /kubestress
+      - delete
+      - --kubeconfig=/.kube/config
+      - --namespace={{inputs.parameters.namespace}}
+      - --object-type={{inputs.parameters.object-type}}
+      - --object-count={{inputs.parameters.object-count}}
+      - --num-clients={{inputs.parameters.num-clients}}
+      - --qps={{inputs.parameters.qps}}
+      volumeMounts:
+      - name: kubeconfig
+        mountPath: "/.kube"
+        readOnly: true
+    volumes:
+    - name: kubeconfig
+      secret:
+        secretName: "{{inputs.parameters.kubeconfig-secret-name}}"
+        items:
+        - key: config
+          path: config
+
 ---
 # 功能说明:创建或删除apf配置和apf限流规则, 详见
 # https://doc.weixin.qq.com/doc/w3_ACYAlwbdAFwI8ImLq0SQcqldWe71Y?scode=AJEAIQdfAAoHwUzWGHAVoAPgaKAKk
diff --git a/playbook/rbac.yaml b/playbook/rbac.yaml
index 6dcae98..36c361a 100644
--- a/playbook/rbac.yaml
+++ b/playbook/rbac.yaml
@@ -105,6 
+105,7 @@ apiVersion: v1 kind: Secret metadata: name: tke-chaos.service-account-token + namespace: tke-chaos-test annotations: kubernetes.io/service-account.name: tke-chaos type: kubernetes.io/service-account-token diff --git a/playbook/template/apiserver-overload-template.yaml b/playbook/template/apiserver-overload-template.yaml index 0c4e8bf..0cf1cfb 100644 --- a/playbook/template/apiserver-overload-template.yaml +++ b/playbook/template/apiserver-overload-template.yaml @@ -33,7 +33,7 @@ spec: # precheck参数 - name: precheck-cluster-image - default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.2" + default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.5" description: "前置检查工具镜像" - name: check-configmap-name default: "tke-chaos-precheck-resource" @@ -50,7 +50,7 @@ spec: # 资源创建参数 - name: resource-create-image - default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.2" + default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.5" description: "资源创建工具镜像" - name: resource-create-namespace default: "tke-chaos-test" @@ -70,12 +70,12 @@ spec: # 集群状态采集参数 - name: cluster-status-image - default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.2" + default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.5" description: "集群状态检查工具镜像" # 压测参数 - name: inject-stress-image - default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.2" + default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.5" description: "故障注入工具镜像" - name: inject-stress-list-namespace default: "" diff --git a/playbook/template/cluster-status-collect-template.yaml b/playbook/template/cluster-status-collect-template.yaml index 4b107dd..baf94de 100644 --- a/playbook/template/cluster-status-collect-template.yaml +++ b/playbook/template/cluster-status-collect-template.yaml @@ -13,7 +13,7 @@ spec: parameters: # cluster-status-collect 参数 - name: image - value: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.2" + value: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.5" entrypoint: cluster-status-collect templates: - name: cluster-status-collect diff --git a/playbook/template/inject-stress-template.yaml b/playbook/template/inject-stress-template.yaml index 318f355..941d601 100644 --- a/playbook/template/inject-stress-template.yaml +++ b/playbook/template/inject-stress-template.yaml @@ -23,7 +23,7 @@ spec: inputs: parameters: - name: image - default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.2" + default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.5" description: "压测工具镜像" - name: namespace default: "" diff --git a/playbook/template/precheck-template.yaml b/playbook/template/precheck-template.yaml index 42ce6ac..e919337 100644 --- a/playbook/template/precheck-template.yaml +++ b/playbook/template/precheck-template.yaml @@ -22,7 +22,7 @@ spec: inputs: parameters: - name: image - default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.2" + default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.5" description: "precheck工具镜像, 用于校验集群健康状态" - name: check-configmap-name default: "tke-chaos-precheck-resource" diff --git a/playbook/template/resource-delete-template.yaml b/playbook/template/resource-delete-template.yaml new file mode 100644 index 0000000..0be22ae --- /dev/null +++ b/playbook/template/resource-delete-template.yaml @@ -0,0 +1,131 @@ +--- +# 功能说明:在集群中删除资源,支持pods, configmaps +# 参数说明: +# resource-delete模版参数说明: +# 1. image: 删除资源使用的工具镜像 +# 2. namespace: 删除资源所在的命名空间 +# 3. object-type: 删除资源的类型, 支持pods/configmaps +# 4. object-count: 删除资源的数量 +# 5. num-clients: 删除资源的客户端数量 +# 6. qps: 删除资源的QPS +# 7. 
delete-all: 是否删除所有资源
+# 8. kubeconfig-secret-name: 用于指定接入的K8s集群的kubeconfig凭证的secret名称, 如果为空,则默认为当前集群
+apiVersion: argoproj.io/v1alpha1
+kind: ClusterWorkflowTemplate
+metadata:
+  name: resource-delete
+spec:
+  entrypoint: resource-delete
+  templates:
+  - name: resource-delete
+    inputs:
+      parameters:
+      - name: image
+        default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.5"
+        description: "删除资源使用的工具镜像"
+      - name: namespace
+        description: "删除资源所在的命名空间"
+      - name: object-type
+        default: "pods"
+        description: "删除资源的类型, 支持pods/configmaps"
+      - name: object-count
+        default: "100"
+        description: "删除资源的数量"
+      - name: num-clients
+        default: "10"
+        description: "删除资源的客户端数量"
+      - name: qps
+        default: "10"
+        description: "删除资源的QPS"
+      - name: kubeconfig-secret-name
+        description: "用于指定接入的K8s集群的kubeconfig凭证的secret名称, 如果为空,则默认为当前集群"
+    steps:
+    - - name: internal-resource-delete
+        arguments:
+          parameters:
+          - name: image
+            value: "{{inputs.parameters.image}}"
+          - name: namespace
+            value: "{{inputs.parameters.namespace}}"
+          - name: object-type
+            value: "{{inputs.parameters.object-type}}"
+          - name: object-count
+            value: "{{inputs.parameters.object-count}}"
+          - name: num-clients
+            value: "{{inputs.parameters.num-clients}}"
+          - name: qps
+            value: "{{inputs.parameters.qps}}"
+        template: internal-resource-delete
+        when: "'{{inputs.parameters.kubeconfig-secret-name}}' == ''"
+    - - name: external-resource-delete
+        arguments:
+          parameters:
+          - name: image
+            value: "{{inputs.parameters.image}}"
+          - name: namespace
+            value: "{{inputs.parameters.namespace}}"
+          - name: object-type
+            value: "{{inputs.parameters.object-type}}"
+          - name: object-count
+            value: "{{inputs.parameters.object-count}}"
+          - name: num-clients
+            value: "{{inputs.parameters.num-clients}}"
+          - name: qps
+            value: "{{inputs.parameters.qps}}"
+          - name: kubeconfig-secret-name
+            value: "{{inputs.parameters.kubeconfig-secret-name}}"
+        template: external-resource-delete
+        when: "'{{inputs.parameters.kubeconfig-secret-name}}' != ''"
+
+  - name: internal-resource-delete
+    inputs:
+      parameters:
+      - name: image
+      - name: namespace
+      - name: object-type
+      - name: object-count
+      - name: num-clients
+      - name: qps
+    container:
+      image: "{{inputs.parameters.image}}"
+      command:
+      - /kubestress
+      - delete
+      - --namespace={{inputs.parameters.namespace}}
+      - --object-type={{inputs.parameters.object-type}}
+      - --object-count={{inputs.parameters.object-count}}
+      - --num-clients={{inputs.parameters.num-clients}}
+      - --qps={{inputs.parameters.qps}}
+
+  - name: external-resource-delete
+    inputs:
+      parameters:
+      - name: image
+      - name: namespace
+      - name: object-type
+      - name: object-count
+      - name: num-clients
+      - name: qps
+      - name: kubeconfig-secret-name
+    container:
+      image: "{{inputs.parameters.image}}"
+      command:
+      - /kubestress
+      - delete
+      - --kubeconfig=/.kube/config
+      - --namespace={{inputs.parameters.namespace}}
+      - --object-type={{inputs.parameters.object-type}}
+      - --object-count={{inputs.parameters.object-count}}
+      - --num-clients={{inputs.parameters.num-clients}}
+      - --qps={{inputs.parameters.qps}}
+      volumeMounts:
+      - name: kubeconfig
+        mountPath: "/.kube"
+        readOnly: true
+    volumes:
+    - name: kubeconfig
+      secret:
+        secretName: "{{inputs.parameters.kubeconfig-secret-name}}"
+        items:
+        - key: config
+          path: config
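
Both leaf templates above wrap the same `kubestress` binary; the only difference is that the external variant passes `--kubeconfig` pointing at the mounted secret. Outside Argo, the equivalent invocations would look roughly like this (a sketch; the flags are copied from the template above, the namespace and counts are illustrative):

```bash
# In-cluster variant: relies on the pod's service account for API access
/kubestress delete --namespace=tke-chaos-test --object-type=pods \
  --object-count=100 --num-clients=10 --qps=10

# External variant: talks to the target cluster via the mounted kubeconfig
/kubestress delete --kubeconfig=/.kube/config --namespace=tke-chaos-test \
  --object-type=pods --object-count=100 --num-clients=10 --qps=10
```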
diff --git a/playbook/template/resource-orchestrate-template.yaml b/playbook/template/resource-orchestrate-template.yaml
index ff63199..e542783 100644
--- a/playbook/template/resource-orchestrate-template.yaml
+++ b/playbook/template/resource-orchestrate-template.yaml
@@ -21,7 +21,7 @@ spec:
   inputs:
     parameters:
     - name: image
-      default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.2"
+      default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.5"
       description: "创建资源使用的工具镜像"
     - name: namespace
       description: "创建资源所在的命名空间"
diff --git a/playbook/template/tke-master-manager-template.yaml b/playbook/template/tke-master-manager-template.yaml
index 7700c29..5df873f 100644
--- a/playbook/template/tke-master-manager-template.yaml
+++ b/playbook/template/tke-master-manager-template.yaml
@@ -31,7 +31,7 @@ spec:
   inputs:
     parameters:
     - name: image
-      default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.2"
+      default: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.5"
     - name: args
   container:
     image: "{{inputs.parameters.image}}"
diff --git a/playbook/template/workload-disruption-template.yaml b/playbook/template/workload-disruption-template.yaml
index fda02f6..399a2a1 100644
--- a/playbook/template/workload-disruption-template.yaml
+++ b/playbook/template/workload-disruption-template.yaml
@@ -13,8 +13,6 @@ spec:
   - name: main
     inputs:
       parameters:
-      - name: disruption-duration
-        description: "服务中断持续时间"
       - name: workload-type
         description: "要测试的工作负载类型, 可选值: daemonset/deployment/statefulset"
       - name: workload-name
@@ -30,6 +28,8 @@ spec:
         default: "tke-chaos-test"
         description: "预检查配置configmap所在命名空间"
     steps:
+    - - name: suspend-1
+        template: suspend
     - - name: precheck
         arguments:
           parameters:
@@ -58,7 +58,7 @@ spec:
           - name: kubeconfig-secret-name
             value: "{{inputs.parameters.kubeconfig-secret-name}}"
         template: scale-down-workload
-    - - name: suspend
+    - - name: suspend-2
         template: suspend
     - - name: scale-up-workload
         arguments:
diff --git a/playbook/workflow/apiserver-overload-scenario.yaml b/playbook/workflow/apiserver-overload-scenario.yaml
index f824d1a..37fbf06 100644
--- a/playbook/workflow/apiserver-overload-scenario.yaml
+++ b/playbook/workflow/apiserver-overload-scenario.yaml
@@ -20,7 +20,7 @@ spec:
     parameters:
     # 全局参数
     - name: chaos-image
-      value: ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.2
+      value: ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.5
     - name: cluster-id  # 演练集群ID
       value: "未知"
     - name: webhook-url  # 企业微信群webhook地址
@@ -71,6 +71,10 @@ spec:
       value: "{{workflow.parameters.resource-create-object-type}}"
     - name: from-cache  # 构造apiserver高负载时, 置为true, 构造etcd高负载时, 置为false
       value: "true"
+    - name: inject-stress-page-size
+      value: "0"
+    - name: inject-stress-namespace
+      value: ""
     - name: inject-stress-user-agent  # 发压端UserAgent, 如: "kubestress/1.0.0"
       value: "kubestress/v0.0.1"
     - name: inject-stress-concurrency  # 发压端并发数
@@ -153,6 +157,10 @@ spec:
   templates:
   - name: main
     steps:
+    - - name: validate-params
+        template: validate-params
+    - - name: suspend
+        template: suspend
     - - name: create-apf  # 演练开始前, 创建apf限速
         arguments:
           parameters:
@@ -228,6 +236,10 @@ spec:
             value: "{{workflow.parameters.inject-stress-object-type}}"
           - name: from-cache  # 构造apiserver高负载时, 置为true, 构造etcd高负载时, 置为false
             value: "{{workflow.parameters.from-cache}}"
+          - name: inject-stress-list-page-size
+            value: "{{workflow.parameters.inject-stress-page-size}}"
+          - name: inject-stress-list-namespace
+            value: "{{workflow.parameters.inject-stress-namespace}}"
           - name: inject-stress-user-agent  # 发压端UserAgent, 如: "kubestress/1.0.0"
             value: "{{workflow.parameters.inject-stress-user-agent}}"
           - name: inject-stress-concurrency  # 发压端并发数
@@ -272,3 +284,25 @@ spec:
         template: etcd-protect-cm-orchestrate
         clusterScope: true
         when: "'{{workflow.parameters.enable-etcd-overload-protect}}' == 'true'"
"'{{workflow.parameters.enable-etcd-overload-protect}}' == 'true'" + + - name: suspend + suspend: {} + + - name: validate-params + script: + image: bitnami/kubectl:1.32.4 + command: [bash] + source: | + #!/bin/bash + set -e + if [[ -z "{{workflow.parameters.kubeconfig-secret-name}}" ]]; then + echo "[ERROR] kubeconfig-secret-name parameter cannot be empty" > /tmp/validate_result + exit 1 + fi + echo "Parameter validation passed" + outputs: + parameters: + - name: result + valueFrom: + default: "null" + path: /tmp/validate_result diff --git a/playbook/workflow/coredns-disruption-scenario.yaml b/playbook/workflow/coredns-disruption-scenario.yaml index c5172a1..c9bca68 100644 --- a/playbook/workflow/coredns-disruption-scenario.yaml +++ b/playbook/workflow/coredns-disruption-scenario.yaml @@ -11,8 +11,6 @@ spec: serviceAccountName: tke-chaos arguments: parameters: - - name: disruption-duration - value: "30s" - name: workload-type value: "deployment" - name: workload-name @@ -21,7 +19,42 @@ spec: value: "kube-system" - name: kubeconfig-secret-name value: "dest-cluster-kubeconfig" - serviceAccountName: tke-chaos - workflowTemplateRef: - name: workload-disruption-template - clusterScope: true + templates: + - name: main + steps: + - - name: validate-params + template: validate-params + - - name: run-coredns-disruption + templateRef: + name: workload-disruption-template + template: main + clusterScope: true + arguments: + parameters: + - name: workload-type + value: "{{workflow.parameters.workload-type}}" + - name: workload-name + value: "{{workflow.parameters.workload-name}}" + - name: workload-namespace + value: "{{workflow.parameters.workload-namespace}}" + - name: kubeconfig-secret-name + value: "{{workflow.parameters.kubeconfig-secret-name}}" + + - name: validate-params + script: + image: bitnami/kubectl:1.32.4 + command: [bash] + source: | + #!/bin/bash + set -e + if [[ -z "{{workflow.parameters.kubeconfig-secret-name}}" ]]; then + echo "[ERROR] kubeconfig-secret-name parameter cannot be empty" > /tmp/validate_result + exit 1 + fi + echo "Parameter validation passed" + outputs: + parameters: + - name: result + valueFrom: + default: "null" + path: /tmp/validate_result diff --git a/playbook/workflow/create-resource.yaml b/playbook/workflow/create-resource.yaml index fcc5f5f..c3e0bf2 100644 --- a/playbook/workflow/create-resource.yaml +++ b/playbook/workflow/create-resource.yaml @@ -9,11 +9,11 @@ spec: arguments: parameters: - name: image - value: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.2" + value: "ccr.ccs.tencentyun.com/tkeimages/tke-chaos:v0.0.5" - name: namespace value: "tke-chaos-test" - name: object-type - value: "configmaps" + value: "pods" - name: object-size-bytes value: "10000" - name: object-count @@ -23,7 +23,7 @@ spec: - name: qps value: "10" - name: kubeconfig-secret-name - value: "" + value: "dest-cluster-kubeconfig" serviceAccountName: tke-chaos workflowTemplateRef: name: resource-archestrate diff --git a/playbook/workflow/delete-resource.yaml b/playbook/workflow/delete-resource.yaml new file mode 100644 index 0000000..33ac0bc --- /dev/null +++ b/playbook/workflow/delete-resource.yaml @@ -0,0 +1,28 @@ +--- +apiVersion: argoproj.io/v1alpha1 +kind: Workflow +metadata: + name: delete-resource + namespace: tke-chaos-test +spec: + entrypoint: resource-delete + arguments: + parameters: + - name: image + value: "ccr.ccs.tencentyun.com/northjhuang/tke-chaos:v0.0.5" + - name: namespace + value: "tke-chaos-test" + - name: object-type + value: "pods" + - name: object-count + 
value: "50000" + - name: num-clients + value: "10" + - name: qps + value: "250" + - name: kubeconfig-secret-name + value: "dest-cluster-kubeconfig" + serviceAccountName: tke-chaos + workflowTemplateRef: + name: resource-delete + clusterScope: true diff --git a/playbook/workflow/kubernetes-proxy-disruption-scenario.yaml b/playbook/workflow/kubernetes-proxy-disruption-scenario.yaml index ac15d4e..0d831b6 100644 --- a/playbook/workflow/kubernetes-proxy-disruption-scenario.yaml +++ b/playbook/workflow/kubernetes-proxy-disruption-scenario.yaml @@ -11,8 +11,6 @@ spec: serviceAccountName: tke-chaos arguments: parameters: - - name: disruption-duration - value: "30s" - name: workload-type value: "deployment" - name: workload-name @@ -21,7 +19,42 @@ spec: value: "default" - name: kubeconfig-secret-name value: "dest-cluster-kubeconfig" - serviceAccountName: tke-chaos - workflowTemplateRef: - name: workload-disruption-template - clusterScope: true + templates: + - name: main + steps: + - - name: validate-params + template: validate-params + - - name: run-kubernetes-proxy-disruption + templateRef: + name: workload-disruption-template + template: main + clusterScope: true + arguments: + parameters: + - name: workload-type + value: "{{workflow.parameters.workload-type}}" + - name: workload-name + value: "{{workflow.parameters.workload-name}}" + - name: workload-namespace + value: "{{workflow.parameters.workload-namespace}}" + - name: kubeconfig-secret-name + value: "{{workflow.parameters.kubeconfig-secret-name}}" + + - name: validate-params + script: + image: bitnami/kubectl:1.32.4 + command: [bash] + source: | + #!/bin/bash + set -e + if [[ -z "{{workflow.parameters.kubeconfig-secret-name}}" ]]; then + echo "[ERROR] kubeconfig-secret-name parameter cannot be empty" > /tmp/validate_result + exit 1 + fi + echo "Parameter validation passed" + outputs: + parameters: + - name: result + valueFrom: + default: "null" + path: /tmp/validate_result diff --git a/playbook/workflow/managed-cluster-apiserver-shutdown-scenario.yaml b/playbook/workflow/managed-cluster-apiserver-shutdown-scenario.yaml index 162c5a9..1d2f931 100644 --- a/playbook/workflow/managed-cluster-apiserver-shutdown-scenario.yaml +++ b/playbook/workflow/managed-cluster-apiserver-shutdown-scenario.yaml @@ -20,13 +20,13 @@ spec: arguments: parameters: - name: region # Tencent Cloud region (e.g. 
ap-qingyuan) - value: "" + value: "" - name: cluster-id # Cluster ID - value: "" + value: "" - name: secret-id # Tencent Cloud API secret ID - value: "" + value: "" - name: secret-key # Tencent Cloud API secret key - value: "" + value: "" - name: kubeconfig-secret-name # Secret name containing target cluster's kubeconfig value: "dest-cluster-kubeconfig" - name: precheck-configmap-name # ConfigMap name for pre-check validation @@ -36,6 +36,8 @@ spec: templates: - name: main steps: + - - name: validate-params + template: validate-params - - name: precheck arguments: parameters: @@ -146,3 +148,38 @@ spec: - name: duration suspend: duration: "{{inputs.parameters.duration}}" + + - name: validate-params + script: + image: bitnami/kubectl:1.32.4 + command: [bash] + source: | + #!/bin/bash + set -e + if [[ -z "{{workflow.parameters.region}}" ]]; then + echo "[ERROR] region parameter cannot be empty" > /tmp/validate_result + exit 1 + fi + if [[ -z "{{workflow.parameters.cluster-id}}" ]]; then + echo "[ERROR] cluster-id parameter cannot be empty" > /tmp/validate_result + exit 1 + fi + if [[ -z "{{workflow.parameters.secret-id}}" ]]; then + echo "[ERROR] secret-id parameter cannot be empty" > /tmp/validate_result + exit 1 + fi + if [[ -z "{{workflow.parameters.secret-key}}" ]]; then + echo "[ERROR] secret-key parameter cannot be empty" > /tmp/validate_result + exit 1 + fi + if [[ -z "{{workflow.parameters.kubeconfig-secret-name}}" ]]; then + echo "[ERROR] kubeconfig-secret-name parameter cannot be empty" > /tmp/validate_result + exit 1 + fi + echo "Parameter validation passed" + outputs: + parameters: + - name: result + valueFrom: + default: "null" + path: /tmp/validate_result diff --git a/playbook/workflow/managed-cluster-controller-manager-shutdown-scenario.yaml b/playbook/workflow/managed-cluster-controller-manager-shutdown-scenario.yaml index f74723c..e9cc8f0 100644 --- a/playbook/workflow/managed-cluster-controller-manager-shutdown-scenario.yaml +++ b/playbook/workflow/managed-cluster-controller-manager-shutdown-scenario.yaml @@ -20,13 +20,13 @@ spec: arguments: parameters: - name: region # Tencent Cloud region (e.g. 
ap-qingyuan) - value: "" + value: "" - name: cluster-id # Cluster ID - value: "" + value: "" - name: secret-id # Tencent Cloud API secret ID - value: "" + value: "" - name: secret-key # Tencent Cloud API secret key - value: "" + value: "" - name: kubeconfig-secret-name # Secret name containing target cluster's kubeconfig value: "dest-cluster-kubeconfig" - name: precheck-configmap-name # ConfigMap name for pre-check validation @@ -146,3 +146,38 @@ spec: - name: duration suspend: duration: "{{inputs.parameters.duration}}" + + - name: validate-params + script: + image: bitnami/kubectl:1.32.4 + command: [bash] + source: | + #!/bin/bash + set -e + if [[ -z "{{workflow.parameters.region}}" ]]; then + echo "[ERROR] region parameter cannot be empty" > /tmp/validate_result + exit 1 + fi + if [[ -z "{{workflow.parameters.cluster-id}}" ]]; then + echo "[ERROR] cluster-id parameter cannot be empty" > /tmp/validate_result + exit 1 + fi + if [[ -z "{{workflow.parameters.secret-id}}" ]]; then + echo "[ERROR] secret-id parameter cannot be empty" > /tmp/validate_result + exit 1 + fi + if [[ -z "{{workflow.parameters.secret-key}}" ]]; then + echo "[ERROR] secret-key parameter cannot be empty" > /tmp/validate_result + exit 1 + fi + if [[ -z "{{workflow.parameters.kubeconfig-secret-name}}" ]]; then + echo "[ERROR] kubeconfig-secret-name parameter cannot be empty" > /tmp/validate_result + exit 1 + fi + echo "Parameter validation passed" + outputs: + parameters: + - name: result + valueFrom: + default: "null" + path: /tmp/validate_result diff --git a/playbook/workflow/managed-cluster-scheduler-shutdown-scenario.yaml b/playbook/workflow/managed-cluster-scheduler-shutdown-scenario.yaml index b049535..d4af39b 100644 --- a/playbook/workflow/managed-cluster-scheduler-shutdown-scenario.yaml +++ b/playbook/workflow/managed-cluster-scheduler-shutdown-scenario.yaml @@ -20,13 +20,13 @@ spec: arguments: parameters: - name: region # Tencent Cloud region (e.g. 
ap-qingyuan) - value: "" + value: "" - name: cluster-id # Cluster ID - value: "" + value: "" - name: secret-id # Tencent Cloud API secret ID - value: "" + value: "" - name: secret-key # Tencent Cloud API secret key - value: "" + value: "" - name: kubeconfig-secret-name # Secret name containing target cluster's kubeconfig value: "dest-cluster-kubeconfig" - name: precheck-configmap-name # ConfigMap name for pre-check validation @@ -36,6 +36,8 @@ spec: templates: - name: main steps: + - - name: validate-params + template: validate-params - - name: precheck arguments: parameters: @@ -146,3 +148,38 @@ spec: - name: duration suspend: duration: "{{inputs.parameters.duration}}" + + - name: validate-params + script: + image: bitnami/kubectl:1.32.4 + command: [bash] + source: | + #!/bin/bash + set -e + if [[ -z "{{workflow.parameters.region}}" ]]; then + echo "[ERROR] region parameter cannot be empty" > /tmp/validate_result + exit 1 + fi + if [[ -z "{{workflow.parameters.cluster-id}}" ]]; then + echo "[ERROR] cluster-id parameter cannot be empty" > /tmp/validate_result + exit 1 + fi + if [[ -z "{{workflow.parameters.secret-id}}" ]]; then + echo "[ERROR] secret-id parameter cannot be empty" > /tmp/validate_result + exit 1 + fi + if [[ -z "{{workflow.parameters.secret-key}}" ]]; then + echo "[ERROR] secret-key parameter cannot be empty" > /tmp/validate_result + exit 1 + fi + if [[ -z "{{workflow.parameters.kubeconfig-secret-name}}" ]]; then + echo "[ERROR] kubeconfig-secret-name parameter cannot be empty" > /tmp/validate_result + exit 1 + fi + echo "Parameter validation passed" + outputs: + parameters: + - name: result + valueFrom: + default: "null" + path: /tmp/validate_result diff --git a/playbook/workflow/namespace-delete-scenario.yaml b/playbook/workflow/namespace-delete-scenario.yaml index c9fe40a..19211ce 100644 --- a/playbook/workflow/namespace-delete-scenario.yaml +++ b/playbook/workflow/namespace-delete-scenario.yaml @@ -49,6 +49,10 @@ spec: templates: - name: main steps: + - - name: validate-params + template: validate-params + - - name: suspend + template: suspend - - name: create-block-namespace-deletion arguments: parameters: @@ -142,3 +146,22 @@ spec: - name: suspend suspend: {} + + - name: validate-params + script: + image: bitnami/kubectl:1.32.4 + command: [bash] + source: | + #!/bin/bash + set -e + if [[ -z "{{workflow.parameters.kubeconfig-secret-name}}" ]]; then + echo "[ERROR] kubeconfig-secret-name parameter cannot be empty" > /tmp/validate_result + exit 1 + fi + echo "Parameter validation passed" + outputs: + parameters: + - name: result + valueFrom: + default: "null" + path: /tmp/validate_result
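
All of the scenarios above validate `kubeconfig-secret-name`, and the `external-*` templates mount that secret expecting the kubeconfig under the `config` key. A sketch of preparing it up front (the local path `~/.kube/dest-config` is an assumption for illustration):

```bash
# The templates mount the secret item "config" at /.kube/config,
# so the kubeconfig must be stored under exactly that key.
kubectl -n tke-chaos-test create secret generic dest-cluster-kubeconfig \
  --from-file=config="$HOME/.kube/dest-config"
```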