5. Creating Kubernetes experiment scenarios
5.0 blade create k8s
5.0.1 Introduction
Kubernetes experiment scenarios can be created with the blade command, or described in a yaml file and executed with kubectl. The following scenarios are currently supported:
- [blade create k8s node-cpu](blade create k8s node-cpu.md) Node CPU load scenarios
- [blade create k8s node-network](blade create k8s node-network.md) Node network scenarios
- [blade create k8s node-process](blade create k8s node-process.md) Node process scenarios
- [blade create k8s node-disk](blade create k8s node-disk.md) Node disk scenarios
- [blade create k8s pod-pod](blade create k8s pod-pod.md) Pod resource scenarios, such as killing a Pod
- [blade create k8s pod-network](blade create k8s pod-network.md) Pod network scenarios, such as network delay
- blade create k8s pod-IO Pod file system I/O fault scenarios
- [blade create k8s pod-fail](blade create k8s pod-fail.md) Pod unavailable scenarios
- [blade create k8s container-container](blade create k8s container-container.md) Container resource scenarios, such as killing a container
- [blade create k8s container-cpu](blade create k8s container-cpu.md) CPU load scenarios inside a container
- [blade create k8s container-network](blade create k8s container-network.md) Network scenarios inside a container
- [blade create k8s container-process](blade create k8s container-process.md) Process scenarios inside a container
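5.0.2 Deployment
Running Kubernetes experiment scenarios requires ChaosBlade Operator to be deployed first. Download the Helm package from https://github.com/chaosblade-io/chaosblade-operator/releases and install it with the following command:
helm install --namespace kube-system --name chaosblade-operator chaosblade-operator-<VERSION>.tgz
The operator is installed in the kube-system namespace. Once started, ChaosBlade Operator deploys a chaosblade-tool Pod on every node plus one chaosblade-operator Pod. Check the installation result with:
kubectl get pod -n kube-system -o wide | grep chaosblade
If the chaosblade-operator and chaosblade-tool Pods are all in the Running state, the deployment succeeded. If the deployment runs into problems, see the Q&A below.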
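The Running check described above can be scripted. The following is a minimal sketch, not part of ChaosBlade: the function name chaosblade_pods_running is hypothetical, and it assumes the default `kubectl get pod -n <namespace>` column layout, in which STATUS is the third column.

```shell
# Reads `kubectl get pod` output on stdin. Succeeds only if at least one
# chaosblade Pod is listed and every chaosblade Pod reports STATUS Running.
# Assumption: STATUS is the third column (default layout with -n <namespace>).
chaosblade_pods_running() {
  awk '
    /chaosblade/ { total++; if ($3 == "Running") running++ }
    END { if (total > 0 && running == total) exit 0; exit 1 }
  '
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pod -n kube-system -o wide | chaosblade_pods_running && echo "deployed"
```

Reading from stdin keeps the check independent of the cluster, so it can be tried against saved output first.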
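5.0.3 Creating an experiment
There are two ways to run an experiment: describe it in a yaml file and apply it with kubectl, or run the blade command from the chaosblade package directly. The example below puts an 80% CPU load on one specified node.
5.0.3.1 yaml method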
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: cpu-load
spec:
  experiments:
  - scope: node
    target: cpu
    action: fullload
    desc: "increase node cpu load by names"
    matchers:
    - name: names
      value:
      - "cn-hangzhou.192.168.0.205"
    - name: cpu-percent
      value:
      - "80"
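After the file is configured, save it as, for example, chaosblade_cpu_load.yaml and run the experiment with:
kubectl apply -f chaosblade_cpu_load.yaml
Check the execution status of each experiment with:
kubectl get blade cpu-load -o json
More example experiment configurations are available at https://github.com/chaosblade-io/chaosblade-operator/tree/master/examples
5.0.3.2 blade command method
Download the chaosblade toolkit from https://github.com/chaosblade-io/chaosblade/releases and extract it; it is then ready to use. The same example executed with the blade command looks like this:
blade create k8s node-cpu fullload --names cn-hangzhou.192.168.0.205 --cpu-percent 80 --kubeconfig ~/.kube/config
If the blade command fails, it returns a detailed error message. If it succeeds, it returns the experiment UID, which can be passed to the query command to get the detailed result:
blade query k8s create <UID>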
5.0.4 Modifying an experiment
The yaml method supports changing a scenario dynamically. For example, to lower the CPU load above to 60%, change the cpu-percent value from 80 to 60:
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: cpu-load
spec:
  experiments:
  - scope: node
    target: cpu
    action: fullload
    desc: "increase node cpu load by names"
    matchers:
    - name: names
      value:
      - "cn-hangzhou.192.168.0.205"
    - name: cpu-percent
      value:
      - "80" # change this to "60"
Then run kubectl apply -f chaosblade_cpu_load.yaml to apply the update.
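The edit described above can also be done from a script instead of by hand. This is a sketch under stated assumptions: set_cpu_percent is a hypothetical helper, and it only handles the block-style list layout used in this example, where the value is the first `- "<number>"` line after `- name: cpu-percent`.

```shell
# Prints the manifest in file $2 with the cpu-percent matcher value set to $1.
# Assumption: the value is written as a block-style list item, i.e. the first
# line of the form `- "<digits>"` that follows `- name: cpu-percent`.
set_cpu_percent() {
  awk -v pct="$1" '
    /name: cpu-percent/ { pending = 1; print; next }
    pending && /^[ \t]*- "[0-9]+"/ { sub(/"[0-9]+"/, "\"" pct "\""); pending = 0 }
    { print }
  ' "$2"
}

# Usage against a live cluster (kubectl reads the manifest from stdin with -f -):
#   set_cpu_percent 60 chaosblade_cpu_load.yaml | kubectl apply -f -
```

The original file is left untouched; the updated manifest goes to stdout, so it can be reviewed before it is applied.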
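5.0.5 Destroying an experiment
An experiment can be stopped in any of the following three ways.
Stop by experiment resource name: for the cpu-load scenario above, run the following command to stop the experiment:
kubectl delete chaosblade cpu-load
Stop through the yaml file: delete the experiment using the yaml file created above:
kubectl delete -f chaosblade_cpu_load.yaml
Stop with the blade command: this only works for experiments created with blade:
blade destroy <UID>
<UID> is the value returned by the blade create command. If you no longer have it, look it up with blade status --type create.
5.0.6 Uninstalling
Run helm del --purge chaosblade-operator to uninstall. This stops all experiments and deletes all created resources.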
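5.1 Kubernetes node CPU load experiment scenarios
blade create k8s node-cpu
5.1.1 Introduction
Kubernetes node CPU load experiment scenarios; these behave the same as the CPU scenarios for basic resources.
5.1.2 Commands
The supported CPU scenario commands are:
- blade create k8s node-cpu load: node CPU load scenario, same as [blade create cpu load](blade create cpu load.md)
5.1.3 Parameters
In addition to the parameters required by the basic scenario above, the following parameters are supported in a Kubernetes environment:
--evict-count string: limits the number of resources the experiment takes effect on
--evict-percent string: limits the percentage of resources the experiment takes effect on; do not include the % sign
--labels string: node resource labels
--names string: node resource names; separate multiple names with commas
--kubeconfig string: full path to the kubeconfig file (only used when invoking the blade command)
--waiting-time string: how long to wait for the experiment result; the default is 20s, and the value must include a unit, for example 10s or 1m
5.1.4 Examples
The following example puts an 80% CPU load on one specified node.
5.1.4.1 yaml method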
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: cpu-load
spec:
  experiments:
  - scope: node
    target: cpu
    action: fullload
    desc: "increase node cpu load by names"
    matchers:
    - name: names
      value:
      - "cn-hangzhou.192.168.0.205"
    - name: cpu-percent
      value:
      - "80"
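After the file is configured, save it as, for example, chaosblade_cpu_load.yaml and run the experiment with:
kubectl apply -f chaosblade_cpu_load.yaml
Check the execution status of each experiment with:
kubectl get blade cpu-load -o json
More example experiment configurations are available at https://github.com/chaosblade-io/chaosblade-operator/tree/v0.0.1/examples
blade command method
Download the chaosblade toolkit from https://github.com/chaosblade-io/chaosblade/releases/tag/v0.4.0-alpha and extract it; it is then ready to use. The same example executed with the blade command looks like this:
blade create k8s node-cpu fullload --names cn-hangzhou.192.168.0.205 --cpu-percent 80 --kubeconfig ~/.kube/config
If the blade command fails, it returns a detailed error message. If it succeeds, it returns the experiment UID, which can be passed to the query command to get the detailed result:
blade query k8s create <UID>
5.1.4.2 Modifying an experiment
The yaml method supports changing a scenario dynamically. For example, to lower the CPU load above to 60%, change the cpu-percent value from 80 to 60: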
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: cpu-load
spec:
  experiments:
  - scope: node
    target: cpu
    action: fullload
    desc: "increase node cpu load by names"
    matchers:
    - name: names
      value:
      - "cn-hangzhou.192.168.0.205"
    - name: cpu-percent
      value:
      - "60"
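Then run kubectl apply -f chaosblade_cpu_load.yaml to apply the update.
5.1.4.3 Destroying an experiment
An experiment can be stopped in any of the following three ways.
Stop by experiment resource name: for the cpu-load scenario above, run the following command to stop the experiment:
kubectl delete chaosblade cpu-load
Stop through the yaml file: delete the experiment using the yaml file created above:
kubectl delete -f chaosblade_cpu_load.yaml
Stop with the blade command: this only works for experiments created with blade:
blade destroy <UID>
<UID> is the value returned by the blade create command. If you no longer have it, look it up with blade status --type create.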
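5.2 Kubernetes node network scenarios
blade create k8s node-network
5.2.1 Introduction
Kubernetes node network scenarios; these behave the same as the network scenarios for basic resources.
5.2.2 Commands
The supported network scenario commands are:
- blade create k8s node-network delay: node network delay scenario, same as [blade create network delay](blade create network delay.md)
- blade create k8s node-network loss: node network packet loss scenario, same as [blade create network loss](blade create network loss.md)
- blade create k8s node-network dns: node domain access fault scenario, same as [blade create network dns](blade create network dns.md)
5.2.3 Parameters
In addition to the parameters required by each scenario above, the following parameters are supported in a Kubernetes environment:
--evict-count string: limits the number of resources the experiment takes effect on
--evict-percent string: limits the percentage of resources the experiment takes effect on; do not include the % sign
--labels string: node resource labels
--names string: node resource names; separate multiple names with commas
--kubeconfig string: full path to the kubeconfig file (only used when invoking the blade command)
--waiting-time string: how long to wait for the experiment result; the default is 20s, and the value must include a unit, for example 10s or 1m
5.2.4 Examples
Apply a 60% packet loss rate to traffic on local port 40690 of node cn-hangzhou.192.168.0.205.
5.2.4.1 yaml method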
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: loss-node-network-by-names
spec:
  experiments:
  - scope: node
    target: network
    action: loss
    desc: "node network loss"
    matchers:
    - name: names
      value: ["cn-hangzhou.192.168.0.205"]
    - name: percent
      value: ["60"]
    - name: interface
      value: ["eth0"]
    - name: local-port
      value: ["40690"]
Save this as a yaml file, for example loss-node-network-by-names.yaml, and apply it with kubectl:
kubectl apply -f loss-node-network-by-names.yaml
Query the experiment status:
kubectl get blade loss-node-network-by-names -o json
The result looks like this (partly omitted):
{
"apiVersion": "chaosblade.io/v1alpha1",
"kind": "ChaosBlade",
"metadata": {
"creationTimestamp": "2019-11-04T09:56:36Z",
"finalizers": [
"finalizer.chaosblade.io"
],
"generation": 1,
"name": "loss-node-network-by-names",
"resourceVersion": "9262302",
"selfLink": "/apis/chaosblade.io/v1alpha1/chaosblades/loss-node-network-by-names",
"uid": "63a926dd-fee9-11e9-b3be-00163e136d88"
},
"status": {
"expStatuses": [
{
"action": "loss",
"resStatuses": [
{
"id": "057acaa47ae69363",
"kind": "node",
"name": "cn-hangzhou.192.168.0.205",
"nodeName": "cn-hangzhou.192.168.0.205",
"state": "Success",
"success": true,
"uid": "e179b30d-df77-11e9-b3be-00163e136d88"
}
],
"scope": "node",
"state": "Success",
"success": true,
"target": "network"
}
],
"phase": "Running"
}
}
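When only the overall state matters, the phase field can be pulled out of the status JSON shown above. This is a minimal sketch: blade_phase is a hypothetical helper name, and it relies on the pretty-printed, one-field-per-line layout that kubectl produces with -o json.

```shell
# Reads the JSON printed by `kubectl get blade <name> -o json` on stdin and
# prints the value of the first "phase" field it finds (for example Running).
# Prints nothing if no "phase" field is present.
blade_phase() {
  sed -n 's/.*"phase": *"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage against a live cluster:
#   kubectl get blade loss-node-network-by-names -o json | blade_phase
```

kubectl can also select a single field itself through its jsonpath output format, for example -o jsonpath='{.status.phase}', which avoids text matching altogether.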
Run the following command to stop the experiment:
kubectl delete -f loss-node-network-by-names.yaml
Or delete the blade resource directly:
kubectl delete blade loss-node-network-by-names
5.2.4.3 blade method
blade create k8s node-network loss --percent 60 --interface eth0 --local-port 40690 --kubeconfig config --names cn-hangzhou.192.168.0.205
If the command fails, it returns a detailed error message. If it succeeds, it returns the experiment UID:
{"code":200,"success":true,"result":"e647064f5f20953c"}
Query the experiment status with:
blade query k8s create e647064f5f20953c --kubeconfig config
{"code":200,"success":true,"result":{"uid":"e647064f5f20953c","success":true,"error":"","statuses":[{"id":"fa471a6285ec45f5","uid":"e179b30d-df77-11e9-b3be-00163e136d88","name":"cn-hangzhou.192.168.0.205","state":"Success","kind":"node","success":true,"nodeName":"cn-hangzhou.192.168.0.205"}]}}
5.2.4.4 Destroying the experiment:
blade destroy e647064f5f20953c
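The create and destroy steps above can be chained so that the UID never has to be copied by hand. The sketch below is illustrative only: blade_uid is a hypothetical helper, and it assumes the success response has exactly the shape shown above ({"code":200,"success":true,"result":"<UID>"}). The failure response is not shown in this document, so the helper simply treats anything without "success":true as a failure.

```shell
# Prints the experiment UID from the JSON that `blade create` returns.
# Returns non-zero, printing nothing, if the response does not report success.
# Assumption: a successful response contains "success":true and "result":"<UID>".
blade_uid() {
  case $1 in
    *'"success":true'*) ;;
    *) return 1 ;;
  esac
  printf '%s\n' "$1" | sed -n 's/.*"result":"\([^"]*\)".*/\1/p'
}

# Usage against a live cluster (flags taken from the example above):
#   response=$(blade create k8s node-network loss --percent 60 --interface eth0 \
#     --local-port 40690 --kubeconfig config --names cn-hangzhou.192.168.0.205)
#   uid=$(blade_uid "$response") && blade destroy "$uid"
```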
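5.3 Kubernetes node process scenarios
blade create k8s node-process
5.3.1 Introduction
Kubernetes node process scenarios; these behave the same as the process scenarios for basic resources.
5.3.2 Commands
The supported process scenario commands are:
- blade create k8s node-process kill: kills the specified process on a node, same as [blade create process kill](blade create process kill.md)
- blade create k8s node-process stop: suspends the specified process on a node, same as [blade create process stop](blade create process stop.md)
5.3.3 Parameters
In addition to the parameters required by each basic scenario above, the following parameters are supported in a Kubernetes environment:
--evict-count string: limits the number of resources the experiment takes effect on
--evict-percent string: limits the percentage of resources the experiment takes effect on; do not include the % sign
--labels string: node resource labels
--names string: node resource names; separate multiple names with commas
--kubeconfig string: full path to the kubeconfig file (only used when invoking the blade command)
--waiting-time string: how long to wait for the experiment result; the default is 20s, and the value must include a unit, for example 10s or 1m
5.3.4 Examples
Kill the redis-server process on node cn-hangzhou.192.168.0.205.
5.3.4.1 yaml method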
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: kill-node-process-by-names
spec:
  experiments:
  - scope: node
    target: process
    action: kill
    desc: "kill node process by names"
    matchers:
    - name: names
      value: ["cn-hangzhou.192.168.0.205"]
    - name: process
      value: ["redis-server"]
The redis-server process ID differs before and after execution, which shows that the process was killed and then started again:
# ps -ef | grep redis-server
19497 root 2:05 redis-server *:6379
# ps -ef | grep redis-server
31855 root 0:00 redis-server *:6379
Run kubectl get blade kill-node-process-by-names -o json to see the detailed result (only part of the output is shown below):
{
"apiVersion": "v1",
"items": [
{
"apiVersion": "chaosblade.io/v1alpha1",
"kind": "ChaosBlade",
"metadata": {
"finalizers": [
"finalizer.chaosblade.io"
],
"generation": 1,
"name": "kill-node-process-by-names",
"resourceVersion": "9421288",
"selfLink": "/apis/chaosblade.io/v1alpha1/chaosblades/kill-node-process-by-names",
"uid": "24aed084-ff70-11e9-8883-00163e0ad0b3"
},
"status": {
"expStatuses": [
{
"action": "kill",
"resStatuses": [
{
"id": "ebe34959424fb022",
"kind": "node",
"name": "cn-hangzhou.192.168.0.205",
"nodeName": "cn-hangzhou.192.168.0.205",
"state": "Success",
"success": true,
"uid": "e179b30d-df77-11e9-b3be-00163e136d88"
}
],
"scope": "node",
"state": "Success",
"success": true,
"target": "process"
}
],
"phase": "Running"
}
}
]
}
5.3.4.2 Run the following command to stop the experiment:
kubectl delete -f kill_node_process_by_names.yaml
Or delete the blade resource directly:
kubectl delete blade kill-node-process-by-names
5.3.4.3 blade method
blade create k8s node-process kill --process redis-server --names cn-hangzhou.192.168.0.205 --kubeconfig config
If the command fails, it returns a detailed error message. If it succeeds, it returns the experiment UID:
{"code":200,"success":true,"result":"fc93e5bbe4827d4b"}
Query the experiment status with:
blade query k8s create fc93e5bbe4827d4b --kubeconfig config
{"code":200,"success":true,"result":{"uid":"fc93e5bbe4827d4b","success":true,"error":"","statuses":[{"id":"859c56e6850c1c1b","uid":"e179b30d-df77-11e9-b3be-00163e136d88","name":"cn-hangzhou.192.168.0.205","state":"Success","kind":"node","success":true,"nodeName":"cn-hangzhou.192.168.0.205"}]}}
5.3.4.4 Destroying the experiment:
blade destroy fc93e5bbe4827d4b
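5.4 Kubernetes node disk scenarios
blade create k8s node-disk
5.4.1 Introduction
Kubernetes node disk scenarios, covering disk fill and high disk read/write I/O load.
5.4.2 Commands
The supported disk scenario commands are:
- blade create k8s node-disk fill: fills the node disk, same as [blade create disk fill](blade create disk fill.md)
- blade create k8s node-disk burn: puts read/write I/O load on the node disk, same as [blade create disk burn](blade create disk burn.md)
5.4.3 Parameters
In addition to the parameters required by each basic scenario above, the following parameters are supported in a Kubernetes environment:
--evict-count string: limits the number of resources the experiment takes effect on
--evict-percent string: limits the percentage of resources the experiment takes effect on; do not include the % sign
--labels string: node resource labels
--names string: node resource names; separate multiple names with commas
--kubeconfig string: full path to the kubeconfig file (only used when invoking the blade command)
--waiting-time string: how long to wait for the experiment result; the default is 20s, and the value must include a unit, for example 10s or 1m
5.4.4 Examples
Fill the disk of a specified node to 80% usage.
5.4.4.1 blade command method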
blade c k8s node-disk fill --names cn-hangzhou.192.168.0.35 --percent 80 --kubeconfig ~/.kube/config
{"code":200,"success":true,"result":"ec322fbb977a455c"}
df -h
Filesystem Size Used Available Use% Mounted on
/dev/vda1 118.0G 89.0G 24.0G 79% /
# Recover the experiment
blade d ec322fbb977a455c
{"code":200,"success":true,"result":{"Target":"node-disk","Scope":"","ActionName":"fill","ActionFlags":{"kubeconfig":"~/.kube/config","names":"cn-hangzhou.192.168.0.35","percent":"80"}}}
df -h
Filesystem Size Used Available Use% Mounted on
/dev/vda1 118.0G 74.8G 38.1G 66% /
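If the blade command fails, it returns a detailed error message. If it succeeds, it returns the experiment UID, which can be passed to the query command to get the detailed result: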
blade query k8s create <UID>
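To confirm the fill took effect without reading the df table by eye, the Use% column can be extracted for one mount point. This is a small sketch: disk_use_percent is a hypothetical helper, and it assumes the df layout shown in this section, where the mount point is the last column and Use% is the one before it.

```shell
# Reads `df -h` output on stdin and prints the Use% value for mount point $1.
# Assumption: the mount point is the last column and Use% the second to last.
disk_use_percent() {
  awk -v mount="$1" '$NF == mount { print $(NF - 1) }'
}

# Usage on the node under test:
#   df -h | disk_use_percent /
```

Run it once after blade create and again after blade destroy to compare the two values, as the df output above does manually.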
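5.5 Kubernetes Pod resource scenarios
blade create k8s pod-pod
5.5.1 Introduction
Scenarios that act on the Kubernetes Pod resource itself, such as deleting a Pod.
5.5.2 Commands
The supported scenario commands are:
- blade create k8s pod-pod delete: deletes Pods
5.5.3 Parameters
In addition to the parameters required by each basic scenario above, the following parameters are supported in a Kubernetes environment:
--namespace string: namespace the Pod belongs to; accepts a single value; required
--evict-count string: limits the number of resources the experiment takes effect on
--evict-percent string: limits the percentage of resources the experiment takes effect on; do not include the % sign
--labels string: Pod resource labels; multiple labels are combined with OR
--names string: Pod resource names
--kubeconfig string: full path to the kubeconfig file (only used when invoking the blade command)
--waiting-time string: how long to wait for the experiment result; the default is 20s, and the value must include a unit, for example 10s or 1m
5.5.4 Examples
Delete Pods labeled app=guestbook in the default namespace, limited to two Pods.
5.5.4.1 yaml method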
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: delete-two-pod-by-labels
spec:
  experiments:
  - scope: pod
    target: pod
    action: delete
    desc: "delete pod by labels"
    matchers:
    - name: labels
      value:
      - "app=guestbook"
    - name: namespace
      value:
      - "default"
    - name: evict-count
      value:
      - "2"
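Save the file as delete_pod_by_labels.yaml and apply it with kubectl apply -f delete_pod_by_labels.yaml. Comparing the Pod list before and after execution shows that the specified number of Pods were killed and then started again.
Run kubectl get blade delete-two-pod-by-labels -o json to see the detailed result (only part of the output is shown below):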
{
"apiVersion": "chaosblade.io/v1alpha1",
"kind": "ChaosBlade",
"metadata": {
"finalizers": [
"finalizer.chaosblade.io"
],
"generation": 1,
"name": "delete-two-pod-by-labels",
"resourceVersion": "9423460",
"selfLink": "/apis/chaosblade.io/v1alpha1/chaosblades/delete-two-pod-by-labels",
"uid": "f31da567-ff71-11e9-a8e2-00163e08a39b"
},
"status": {
"expStatuses": [
{
"action": "delete",
"resStatuses": [
{
"kind": "pod",
"name": "frontend-d89756ff7-94fj6",
"nodeName": "cn-hangzhou.192.168.0.203",
"state": "Success",
"success": true,
"uid": "79cd691c-fe3a-11e9-8883-00163e0ad0b3"
},
{
"kind": "pod",
"name": "frontend-d89756ff7-dkgmd",
"nodeName": "cn-hangzhou.192.168.0.205",
"state": "Success",
"success": true,
"uid": "79d1f47e-fe3a-11e9-8883-00163e0ad0b3"
}
],
"scope": "pod",
"state": "Success",
"success": true,
"target": "pod"
}
],
"phase": "Running"
}
}
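5.5.4.3 Run the following command to stop the experiment:
kubectl delete -f delete_pod_by_labels.yaml
Or delete the blade resource directly:
kubectl delete blade delete-two-pod-by-labels
Note that stopping a Pod deletion experiment only changes the experiment status; chaosblade itself does not bring the deleted Pods back.
blade method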
blade create k8s pod-pod delete --labels app=guestbook --namespace default --evict-count 2 --kubeconfig config
If the command fails, it returns a detailed error message. If it succeeds, it returns the experiment UID:
{"code":200,"success":true,"result":"4d3caa0a99c3b2dd"}
Query the experiment status with:
blade query k8s create 4d3caa0a99c3b2dd --kubeconfig config
{"code":200,"success":true,"result":{"uid":"4d3caa0a99c3b2dd","success":true,"error":"","statuses":[{"uid":"f325d43c-ff71-11e9-8883-00163e0ad0b3","name":"frontend-d89756ff7-5wgg5","state":"Success","kind":"pod","success":true,"nodeName":"cn-hangzhou.192.168.0.203"},{"uid":"28af19dd-f987-11e9-bd30-00163e08a39b","name":"frontend-d89756ff7-dpv7h","state":"Success","kind":"pod","success":true,"nodeName":"cn-hangzhou.192.168.0.205"}]}}
5.5.4.4 Destroying the experiment:
blade destroy 4d3caa0a99c3b2dd
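5.6 Injecting file system I/O faults into Kubernetes Pods
blade create k8s pod-IO
5.6.1 Introduction
Kubernetes Pod file system I/O fault scenarios simulate read and write faults, such as delays and errors, on files under a specified path.
Note: this scenario requires the --webhook-enable parameter to be turned on. To use it, add --webhook-enable to the chaosblade-operator arguments, or set it at install time, for example with --set webhook.enable=true when installing with helm.
5.6.2 Prerequisites
chaosblade-admission-webhook is deployed in the cluster.
The volume to inject faults into has mountPropagation set to HostToContainer.
The Pod carries the following annotations:
chaosblade/inject-volume: "data" // name of the volume to inject faults into
chaosblade/inject-volume-subpath: "conf" // subdirectory of the volume mount
blade create k8s pod-pod IO
5.6.3 Parameters
In addition to the parameters required by each basic scenario above, the following parameters are supported in a Kubernetes environment:
--namespace string: namespace the Pod belongs to; accepts a single value; required
--evict-count string: limits the number of resources the experiment takes effect on
--evict-percent string: limits the percentage of resources the experiment takes effect on; do not include the % sign
--labels string: Pod resource labels; multiple labels are combined with OR
--names string: Pod resource names
--kubeconfig string: full path to the kubeconfig file (only used when invoking the blade command)
--waiting-time string: how long to wait for the experiment result; the default is 20s, and the value must include a unit, for example 10s or 1m
--methods string: I/O fault method
--delay string: I/O delay time
--errno string: specific error code to return for I/O faults
--random string: return randomly chosen I/O error codes
--percent string: percentage of I/O operations that fail [0-100]
--path string: directory or file to inject I/O faults into
5.6.4 Examples
First, deploy a test Pod through a Deployment, and specify in the Pod annotations the volume and subdirectory to inject I/O faults into.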
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: test
  name: test
  namespace: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      annotations:
        chaosblade/inject-volume: data
        chaosblade/inject-volume-subpath: conf
      labels:
        app: test
    spec:
      containers:
      - command: ["/bin/sh", "-c", "while true; do sleep 10000; done"]
        image: busybox
        imagePullPolicy: IfNotPresent
        name: test
        volumeMounts:
        - mountPath: /data
          mountPropagation: HostToContainer
          name: data
      volumes:
      - hostPath:
          path: /data/fuse
        name: data
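Based on the Pod annotations, the chaosblade webhook injects a fuse sidecar container:
chaosblade/inject-volume names the volume to inject faults into, data in this example.
chaosblade/inject-volume-subpath names the subdirectory under the volume mount path. In the example above the volume is mounted at /data and the subdirectory is conf, so inside the Pod the directory that receives I/O faults is /data/conf.
The volume to inject faults into must set mountPropagation: HostToContainer; for the meaning of this field, see the Volumes page of the official Kubernetes documentation.
After the Deployment is created from the yaml above, the chaosblade webhook inserts the sidecar container automatically: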
kubectl get pod -n test
NAME READY STATUS RESTARTS AGE
test-bc7786698-k6tb7 2/2 Running 0 3m40s
At this point the sidecar container has been inserted, but no I/O fault has been injected yet. Inject one with the following yaml:
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: inject-pod-by-labels
spec:
  experiments:
  - scope: pod
    target: pod
    action: IO
    desc: "Pod IO Exception by labels"
    matchers:
    - name: labels
      value:
      - "app=test"
    - name: namespace
      value:
      - "test"
    - name: method
      value:
      - "read"
    - name: delay
      value:
      - "1000"
    - name: path
      value:
      - ""
    - name: percent
      value:
      - "60"
    - name: errno
      value:
      - "28"
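This example injects two kinds of faults into read operations, with a fault rate of 60%:
A 1s delay is added to read operations. The supported operation types are: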
- open
- read
- write
- mkdir
- rmdir
- opendir
- fsync
- flush
- release
- truncate
- getattr
- chown
- chmod
- utimens
- allocate
- getlk
- setlk
- setlkw
- statfs
- readlink
- symlink
- create
- access
- link
- mknod
- rename
- unlink
- getxattr
- listxattr
- removexattr
- setxattr
Read operations return error 28. The supported error codes are:
1: Operation not permitted
2: No such file or directory
5: I/O error
6: No such device or address
12: Out of memory
16: Device or resource busy
17: File exists
20: Not a directory
22: Invalid argument
24: Too many open files
28: No space left on device
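The error codes listed above can be checked before a value is written into the errno matcher. The lookup below is only a sketch: io_errno_message is a hypothetical helper that mirrors the list in this section and nothing more, so it says nothing about codes the tool might accept beyond those listed here.

```shell
# Prints the message for an errno value listed in this section. Returns
# non-zero for any other value, so a caller can refuse to write a manifest
# with a code that this section does not list.
io_errno_message() {
  case $1 in
    1) echo "Operation not permitted" ;;
    2) echo "No such file or directory" ;;
    5) echo "I/O error" ;;
    6) echo "No such device or address" ;;
    12) echo "Out of memory" ;;
    16) echo "Device or resource busy" ;;
    17) echo "File exists" ;;
    20) echo "Not a directory" ;;
    22) echo "Invalid argument" ;;
    24) echo "Too many open files" ;;
    28) echo "No space left on device" ;;
    *) return 1 ;;
  esac
}

# Usage:
#   io_errno_message 28 || echo "errno not listed" >&2
```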
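After the I/O fault is injected with the yaml above, reading a file in the specified directory from inside the Pod returns No space left on device. Because the read is retried, the total delay shown is 3s.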
kubectl exec test-bc7786698-k6tb7 -c test -n test time cat /data/conf/file
cat: read error: No space left on device
Command exited with non-zero status 1
real 0m 3.00s
user 0m 0.00s
sys 0m 0.00s
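5.7 Kubernetes Pod network scenarios
blade create k8s pod-network
5.7.1 Introduction
Kubernetes Pod network scenarios; these behave the same as the network scenarios for basic resources.
5.7.2 Commands
The supported network scenario commands are:
- blade create k8s pod-network delay: Pod network delay scenario, same as [blade create network delay](blade create network delay.md)
- blade create k8s pod-network loss: Pod network packet loss scenario, same as [blade create network loss](blade create network loss.md)
- blade create k8s pod-network dns: Pod domain access fault scenario, same as [blade create network dns](blade create network dns.md)
5.7.3 Parameters
In addition to the parameters required by each scenario above, the following parameters are supported in a Kubernetes environment:
--namespace string: namespace the Pod belongs to; accepts a single value; required
--evict-count string: limits the number of resources the experiment takes effect on
--evict-percent string: limits the percentage of resources the experiment takes effect on; do not include the % sign
--labels string: Pod resource labels; multiple labels are combined with OR
--names string: Pod resource names
--kubeconfig string: full path to the kubeconfig file (only used when invoking the blade command)
--waiting-time string: how long to wait for the experiment result; the default is 20s, and the value must include a unit, for example 10s or 1m
5.7.4 Examples
In the default namespace, delay access to local port 6379 of the Pod named redis-slave-674d68586-jnf7f by 3000 milliseconds, with the delay varying by plus or minus 1000 milliseconds.
5.7.4.1 yaml method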
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: delay-pod-network-by-names
spec:
  experiments:
  - scope: pod
    target: network
    action: delay
    desc: "delay pod network by names"
    matchers:
    - name: names
      value:
      - "redis-slave-674d68586-jnf7f"
    - name: namespace
      value:
      - "default"
    - name: local-port
      value: ["6379"]
    - name: interface
      value: ["eth0"]
    - name: time
      value: ["3000"]
    - name: offset
      value: ["1000"]
Save this as a yaml file, for example delay_pod_network_by_names.yaml, and apply it with kubectl:
kubectl apply -f delay_pod_network_by_names.yaml
Query the experiment status:
kubectl get blade delay-pod-network-by-names -o json
The result looks like this (partly omitted):
{
"apiVersion": "v1",
"items": [
{
"apiVersion": "chaosblade.io/v1alpha1",
"kind": "ChaosBlade",
"metadata": {
"finalizers": [
"finalizer.chaosblade.io"
],
"generation": 1,
"name": "delay-pod-network-by-names",
"resourceVersion": "9425766",
"selfLink": "/apis/chaosblade.io/v1alpha1/chaosblades/delay-pod-network-by-names",
"uid": "cf32327c-ff73-11e9-b3be-00163e136d88"
},
"status": {
"expStatuses": [
{
"action": "delay",
"resStatuses": [
{
"id": "e28f6e3ae2732a86",
"kind": "pod",
"name": "chaosblade-tool-vv49t", // this Pod is the sidecar
"nodeName": "cn-hangzhou.192.168.0.204",
"state": "Success",
"success": true,
"uid": "4f1a28a1-fee6-11e9-8883-00163e0ad0b3"
}
],
"scope": "pod",
"state": "Success",
"success": true,
"target": "network"
}
],
"phase": "Running"
}
}
]
}
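Verify the effect by accessing the service or with the telnet command.
5.7.4.2 Run the following command to stop the experiment:
kubectl delete -f delay_pod_network_by_names.yaml
Or delete the blade resource directly:
kubectl delete blade delay-pod-network-by-names
5.7.4.3 blade method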
blade create k8s pod-network delay --time 3000 --offset 1000 --interface eth0 --local-port 6379 --names redis-slave-674d68586-jnf7f --namespace default --kubeconfig config
If the command fails, it returns a detailed error message. If it succeeds, it returns the experiment UID:
{"code":200,"success":true,"result":"127f1ee0afcd4798"}
Query the experiment status with:
blade query k8s create 127f1ee0afcd4798 --kubeconfig config
{"code":200,"success":true,"result":{"uid":"127f1ee0afcd4798","success":true,"error":"","statuses":[{"id":"b5a216dddeb3389f","uid":"4f1a28a1-fee6-11e9-8883-00163e0ad0b3","name":"chaosblade-tool-vv49t","state":"Success","kind":"pod","success":true,"nodeName":"cn-hangzhou.192.168.0.204"}]}}
5.7.4.4 Destroying the experiment:
blade destroy 127f1ee0afcd4798
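5.8 Kubernetes container CPU load experiment scenarios
blade create k8s container-cpu
5.8.1 Introduction
CPU load experiment scenarios inside containers on Kubernetes; these behave the same as the CPU scenarios for basic resources.
5.8.2 Commands
The supported CPU scenario commands are:
- blade create k8s container-cpu load: CPU load scenario inside a container, same as [blade create cpu load](blade create cpu load.md)
5.8.3 Parameters
In addition to the parameters required by each basic scenario above, the following parameters are supported in a Kubernetes environment:
--container-ids string: container IDs; multiple values are supported
--container-names string: container names; multiple values are supported
--docker-endpoint string: Docker server address; defaults to the local /var/run/docker.sock
--namespace string: namespace the Pod belongs to; accepts a single value; required
--evict-count string: limits the number of resources the experiment takes effect on
--evict-percent string: limits the percentage of resources the experiment takes effect on; do not include the % sign
--labels string: Pod resource labels; multiple labels are combined with OR
--names string: Pod resource names
--kubeconfig string: full path to the kubeconfig file (only used when invoking the blade command)
--waiting-time string: how long to wait for the experiment result; the default is 20s, and the value must include a unit, for example 10s or 1m
5.8.4 Examples
The following example puts a 100% CPU load on the container with ID 2ff814b246f86 in the Pod named frontend-d89756ff7-pbnnc in the default namespace.
5.8.4.1 yaml method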
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: increase-container-cpu-load-by-id
spec:
  experiments:
  - scope: container
    target: cpu
    action: fullload
    desc: "increase container cpu load by id"
    matchers:
    - name: container-ids
      value:
      - "2ff814b246f86"
    - name: cpu-percent
      value: ["100"]
    # pod names
    - name: names
      value: ["frontend-d89756ff7-pbnnc"]
After the file is configured, save it as, for example, increase_container_cpu_load_by_id.yaml and run the experiment with:
kubectl apply -f increase_container_cpu_load_by_id.yaml
Check the execution status of each experiment with:
kubectl get blade increase-container-cpu-load-by-id -o json
{
"apiVersion": "chaosblade.io/v1alpha1",
"kind": "ChaosBlade",
"metadata": {
"finalizers": [
"finalizer.chaosblade.io"
],
"generation": 1,
"name": "increase-container-cpu-load-by-id",
"resourceVersion": "9432486",
"selfLink": "/apis/chaosblade.io/v1alpha1/chaosblades/increase-container-cpu-load-by-id",
"uid": "737ae2e8-ff79-11e9-a8e2-00163e08a39b"
},
"status": {
"expStatuses": [
{
"action": "fullload",
"resStatuses": [
{
"id": "2bcb4178003f46fe",
"kind": "container",
"name": "php-redis",
"nodeName": "cn-hangzhou.192.168.0.204",
"state": "Success",
"success": true,
"uid": "2ff814b246f86aba2392379640e4c6b16efbfd61846fc419a24f8d8ccf0f86f0"
}
],
"scope": "container",
"state": "Success",
"success": true,
"target": "cpu"
}
],
"phase": "Running"
}
}
Resource monitoring shows the CPU usage of this Pod.
Stop the experiment with:
kubectl delete -f increase_container_cpu_load_by_id.yaml
5.8.4.2 blade command method
blade create k8s container-cpu fullload --cpu-percent 100 --container-ids 2ff814b246f86 --names frontend-d89756ff7-pbnnc --namespace default --kubeconfig config
If the command fails, it returns a detailed error message. If it succeeds, it returns the experiment UID:
{"code":200,"success":true,"result":"092e8b4d88d4f449"}
Query the experiment status with:
blade query k8s create 092e8b4d88d4f449 --kubeconfig config
{"code":200,"success":true,"result":{"uid":"092e8b4d88d4f449","success":true,"error":"","statuses":[{"id":"eab5fb70b61c9c45","uid":"2ff814b246f86aba2392379640e4c6b16efbfd61846fc419a24f8d8ccf0f86f0","name":"php-redis","state":"Success","kind":"container","success":true,"nodeName":"cn-hangzhou.192.168.0.204"}]}}
Destroying the experiment:
blade destroy 092e8b4d88d4f449
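5.9 Kubernetes container network experiment scenarios
blade create k8s container-network
5.9.1 Introduction
Network experiment scenarios inside containers on Kubernetes; these behave the same as the network scenarios for basic resources. Because containers in the same Pod share the Pod network, the effect is the same as running the experiment against the Pod network.
5.9.2 Commands
The supported network scenario commands are:
- blade create k8s container-network delay: container network delay scenario, same as [blade create network delay](blade create network delay.md)
- blade create k8s container-network loss: container network packet loss scenario, same as [blade create network loss](blade create network loss.md)
- blade create k8s container-network dns: container domain access fault scenario, same as [blade create network dns](blade create network dns.md)
5.9.3 Parameters
In addition to the parameters required by each basic scenario above, the following parameters are supported in a Kubernetes environment:
--container-ids string: container IDs; multiple values are supported
--container-names string: container names; multiple values are supported
--docker-endpoint string: Docker server address; defaults to the local /var/run/docker.sock
--namespace string: namespace the Pod belongs to; accepts a single value; required
--evict-count string: limits the number of resources the experiment takes effect on
--evict-percent string: limits the percentage of resources the experiment takes effect on; do not include the % sign
--labels string: Pod resource labels; multiple labels are combined with OR
--names string: Pod resource names
--kubeconfig string: full path to the kubeconfig file (only used when invoking the blade command)
--waiting-time string: how long to wait for the experiment result; the default is 20s, and the value must include a unit, for example 10s or 1m
5.9.4 Examples
The following example makes access to the www.baidu.com domain fail from the container with ID 4b25f66580c4 in the Pod named frontend-d89756ff7-trsxf in the default namespace.
5.9.4.1 yaml method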
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: tamper-container-dns-by-id
spec:
  experiments:
  - scope: container
    target: network
    action: dns
    desc: "tamper container dns by id"
    matchers:
    - name: container-ids
      value:
      - "4b25f66580c4"
    - name: domain
      value: ["www.baidu.com"]
    - name: ip
      value: ["10.0.0.1"]
    # pod names
    - name: names
      value: ["frontend-d89756ff7-trsxf"]
    # or use pod labels
After the file is configured, save it as, for example, tamper_container_dns_by_id.yaml and run the experiment with:
kubectl apply -f tamper_container_dns_by_id.yaml
Check the execution status of each experiment with:
kubectl get blade tamper-container-dns-by-id -o json
{
"apiVersion": "chaosblade.io/v1alpha1",
"kind": "ChaosBlade",
"metadata": {
"finalizers": [
"finalizer.chaosblade.io"
],
"generation": 1,
"name": "tamper-container-dns-by-id",
"resourceVersion": "9435600",
"selfLink": "/apis/chaosblade.io/v1alpha1/chaosblades/tamper-container-dns-by-id",
"uid": "137372c2-ff7c-11e9-8883-00163e0ad0b3"
},
"status": {
"expStatuses": [
{
"action": "dns",
"resStatuses": [
{
"id": "1141530f66869a82",
"kind": "container",
"name": "php-redis",
"nodeName": "cn-hangzhou.192.168.0.203",
"state": "Success",
"success": true,
"uid": "4b25f66580c4dbf465a1b167c4c6967e987773442e5d47f0bee5db0a5e27a12d"
}
],
"scope": "container",
"state": "Success",
"success": true,
"target": "network"
}
],
"phase": "Running"
}
}
To verify, log in to the container and access the www.baidu.com domain.
Stop the experiment with:
kubectl delete -f tamper_container_dns_by_id.yaml
5.9.4.2 blade command method
blade create k8s container-network dns --domain www.baidu.com --ip 10.0.0.1 --names frontend-d89756ff7-trsxf --namespace default --container-ids 4b25f66580c4 --kubeconfig config
If the command fails, it returns a detailed error message. If it succeeds, it returns the experiment UID:
{"code":200,"success":true,"result":"6e46a5df94e0b065"}
Query the experiment status with:
blade query k8s create 6e46a5df94e0b065 --kubeconfig config
{"code":200,"success":true,"result":{"uid":"6e46a5df94e0b065","success":true,"error":"","statuses":[{"id":"90304950e52d679e","uid":"4b25f66580c4dbf465a1b167c4c6967e987773442e5d47f0bee5db0a5e27a12d","name":"php-redis","state":"Success","kind":"container","success":true,"nodeName":"cn-hangzhou.192.168.0.203"}]}}
Destroying the experiment:
blade destroy 6e46a5df94e0b065
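5.10 Kubernetes container process scenarios
blade create k8s container-process
5.10.1 Introduction
Process scenarios inside containers on Kubernetes; these behave the same as the process scenarios for basic resources.
5.10.2 Commands
The supported process scenario commands are:
- blade create k8s container-process kill: kills the specified process inside a container, same as [blade create process kill](blade create process kill.md)
- blade create k8s container-process stop: suspends the specified process inside a container, same as [blade create process stop](blade create process stop.md)
5.10.3 Parameters
In addition to the parameters required by each basic scenario above, the following parameters are supported in a Kubernetes environment:
--container-ids string: container IDs; multiple values are supported
--container-names string: container names; multiple values are supported
--docker-endpoint string: Docker server address; defaults to the local /var/run/docker.sock
--namespace string: namespace the Pod belongs to; accepts a single value; required
--evict-count string: limits the number of resources the experiment takes effect on
--evict-percent string: limits the percentage of resources the experiment takes effect on; do not include the % sign
--labels string: Pod resource labels; multiple labels are combined with OR
--names string: Pod resource names
--kubeconfig string: full path to the kubeconfig file (only used when invoking the blade command)
--waiting-time string: how long to wait for the experiment result; the default is 20s, and the value must include a unit, for example 10s or 1m
5.10.4 Examples
Kill the process named top in the container with ID f1de335b4eeaf in the Pod named frontend-d89756ff7-tl4xl in the default namespace.
5.10.4.1 yaml method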
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: kill-container-process-by-id
spec:
  experiments:
  - scope: container
    target: process
    action: kill
    desc: "kill container process by id"
    matchers:
    - name: container-ids
      value:
      - "f1de335b4eeaf"
    - name: process
      value: ["top"]
    - name: names
      value: ["frontend-d89756ff7-tl4xl"]
After the file is configured, save it as, for example, kill_container_process_by_id.yaml and run the experiment with:
kubectl apply -f kill_container_process_by_id.yaml
Check the execution status of each experiment with:
kubectl get blade kill-container-process-by-id -o json
{
"apiVersion": "chaosblade.io/v1alpha1",
"kind": "ChaosBlade",
"metadata": {
"finalizers": [
"finalizer.chaosblade.io"
],
"generation": 1,
"name": "kill-container-process-by-id",
"resourceVersion": "9438733",
"selfLink": "/apis/chaosblade.io/v1alpha1/chaosblades/kill-container-process-by-id",
"uid": "a5a597be-ff7e-11e9-a8e2-00163e08a39b"
},
"status": {
"expStatuses": [
{
"action": "kill",
"resStatuses": [
{
"id": "10cdc57b9c80a9f0",
"kind": "container",
"name": "php-redis",
"nodeName": "cn-hangzhou.192.168.0.204",
"state": "Success",
"success": true,
"uid": "f1de335b4eeaf035b8d23a87080f3d24cebc803cbb6ad15e5fe0d8567e2e8939"
}
],
"scope": "container",
"state": "Success",
"success": true,
"target": "process"
}
],
"phase": "Running"
}
}
Stop the experiment with:
kubectl delete -f kill_container_process_by_id.yaml
Note that stopping the experiment does not restore a process that has already been killed.
5.10.4.2 blade command method
blade create k8s container-process kill --process top --names frontend-d89756ff7-tl4xl --container-ids f1de335b4eeaf --namespace default --kubeconfig config
If the command fails, it returns a detailed error message. If it succeeds, it returns the experiment UID:
{"code":200,"success":true,"result":"06d5ebae60e8fe3f"}
Query the experiment status with:
blade query k8s create 06d5ebae60e8fe3f --kubeconfig config
{"code":200,"success":true,"result":{"uid":"06d5ebae60e8fe3f","success":true,"error":"","statuses":[{"id":"1000cbd2018e2c90","uid":"f1de335b4eeaf035b8d23a87080f3d24cebc803cbb6ad15e5fe0d8567e2e8939","name":"php-redis","state":"Success","kind":"container","success":true,"nodeName":"cn-hangzhou.192.168.0.204"}]}}
Destroying the experiment:
blade destroy 06d5ebae60e8fe3f
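5.11 Kubernetes container resource scenarios
blade create k8s container-container
5.11.1 Introduction
Scenarios that act on the container resource itself on Kubernetes, such as removing a container. Note that a container scenario must first identify the Pod, so the Pod-related parameters have to be configured.
5.11.2 Commands
The supported scenario commands are:
- blade create k8s container-container remove: removes a container
5.11.3 Parameters
--container-ids string: container IDs; multiple values are supported
--container-names string: container names; multiple values are supported
--docker-endpoint string: Docker server address; defaults to the local /var/run/docker.sock
--namespace string: namespace the Pod belongs to; accepts a single value; required
--evict-count string: limits the number of resources the experiment takes effect on
--evict-percent string: limits the percentage of resources the experiment takes effect on; do not include the % sign
--labels string: Pod resource labels; multiple labels are combined with OR
--names string: Pod resource names
--kubeconfig string: full path to the kubeconfig file (only used when invoking the blade command)
--waiting-time string: how long to wait for the experiment result; the default is 20s, and the value must include a unit, for example 10s or 1m
--force: whether to force the removal
5.11.4 Examples
Remove the container with ID 072aa6bbf2e2e2 from the Pod named frontend-d89756ff7-szblb in the default namespace.
5.11.4.1 yaml method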
apiVersion: chaosblade.io/v1alpha1
kind: ChaosBlade
metadata:
  name: remove-container-by-id
spec:
  experiments:
  - scope: container
    target: container
    action: remove
    desc: "remove container by id"
    matchers:
    - name: container-ids
      value: ["072aa6bbf2e2e2"]
    # pod name
    - name: names
      value: ["frontend-d89756ff7-szblb"]
    - name: namespace
      value: ["default"]
Save this as a yaml file, for example remove_container_by_id.yaml, and apply it with kubectl:
kubectl apply -f remove_container_by_id.yaml
Query the experiment status:
kubectl get blade remove-container-by-id -o json
The result looks like this (partly omitted):
{
"apiVersion": "v1",
"items": [
{
"apiVersion": "chaosblade.io/v1alpha1",
"kind": "ChaosBlade",
"metadata": {
"finalizers": [
"finalizer.chaosblade.io"
],
"generation": 1,
"name": "remove-container-by-id",
"resourceVersion": "9429224",
"selfLink": "/apis/chaosblade.io/v1alpha1/chaosblades/remove-container-by-id",
"uid": "bb1482ea-ff76-11e9-8883-00163e0ad0b3"
},
"status": {
"expStatuses": [
{
"action": "remove",
"resStatuses": [
{
"id": "f5bfa325da504cac",
"kind": "container",
"name": "php-redis",
"nodeName": "cn-hangzhou.192.168.0.205",
"state": "Success",
"success": true,
"uid": "072aa6bbf2e2e286ec77b4b05440107b48aeebae6aea06e8e3a65b40e4f40326"
}
],
"scope": "container",
"state": "Success",
"success": true,
"target": "container"
}
],
"phase": "Running"
}
}
]
}
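Comparing the Pod before and after execution shows the change in its containers.
Run the following command to stop the experiment:
kubectl delete -f remove_container_by_id.yaml
Or delete the blade resource directly:
kubectl delete blade remove-container-by-id
After a container is removed, destroying the experiment does not restore it; the container has to be brought back by whatever manages it.
5.11.4.2 blade method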
blade create k8s container-container remove --container-ids 060833967b0a37 --names frontend-d89756ff7-szblb --namespace default --kubeconfig config
If the command fails, it returns a detailed error message. If it succeeds, it returns the experiment UID:
{"code":200,"success":true,"result":"17d7021c777b76e3"}
Query the experiment status with:
blade query k8s create 17d7021c777b76e3 --kubeconfig config
{"code":200,"success":true,"result":{"uid":"17d7021c777b76e3","success":true,"error":"","statuses":[{"id":"205515ad8fcc31da","uid":"060833967b0a3733d10f0e64d3639066b8b7fbcf371e0ace2401af150dbd9b12","name":"php-redis","state":"Success","kind":"container","success":true,"nodeName":"cn-hangzhou.192.168.0.205"}]}}
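Destroying the experiment:
blade destroy 17d7021c777b76e3
After a container is removed, destroying the experiment does not restore it; the container has to be brought back by whatever manages it.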