traas-stack / chaosmeta
A chaos engineering platform for supporting the complete fault drill lifecycle.
Home Page: https://chaosmeta.gitbook.io/chaosmeta-cn
License: Apache License 2.0
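The experiment that piles up pods in the Pending state needs a namespace that does not exist in the current cluster, but the configuration page only lets you pick existing namespaces; a non-existent namespace cannot be entered manually.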
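Version: v0.6.0
Problem description:
The official documentation currently only covers deploying chaosmeta on a K8S cluster; there is no deployment guide for plain Docker.
I pulled the chaosmeta frontend Docker image myself, but it seems to be missing some dependencies. Is there a docker compose file or a plain Docker deployment guide?
Many thanks.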
The following error occurs when injecting network-type faults:
- injectObjectName: pod/default/httpserver-8cb888b6d-klfbj/httpserver
  message: "experiment inject error: kubectl exec error: exec remote cmd error:
    command terminated with exit code 1 time=\"2023-09-18 13:02:59\" level=error
    msg=\"unknown args: [true], please add -h to get more info\"\n "
Experiment detail:
apiVersion: chaosmeta.io/v1alpha1
kind: Experiment
metadata:
  creationTimestamp: "2023-09-18T05:02:59Z"
  finalizers:
  - chaosmeta/experiment
  generation: 1
  name: inject-pod-network-170363570037876736011695013377experiment-170363570039135027211695013377node
  namespace: chaosmeta
  resourceVersion: "213682137"
  uid: a35922f5-a377-41b7-8701-41a69c09781a
spec:
  experiment:
    args:
    - key: percent
      value: "30"
      valueType: int
    - key: interface
      value: eth0
      valueType: string
    - key: mode
      value: normal
      valueType: string
    - key: force
      value: "true"
      valueType: bool
    - key: containername
      value: firstcontainer
      valueType: string
    duration: 60s
    fault: loss
    target: network
  scope: pod
  selector:
  - name:
    - httpserver-8cb888b6d-klfbj
    namespace: default
  targetPhase: inject
status:
  createTime: "2023-09-18 05:02:59"
  detail:
    inject:
    - injectObjectName: pod/default/httpserver-8cb888b6d-klfbj/httpserver
      message: "experiment inject error: kubectl exec error: exec remote cmd error:
        command terminated with exit code 1 time=\"2023-09-18 13:02:59\" level=error
        msg=\"unknown args: [true], please add -h to get more info\"\n "
      startTime: "2023-09-18 05:02:59"
      status: failed
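The message "unknown args: [true]" suggests that the value of the bool-typed `force` argument was passed as a separate token (`--force true`) and ended up as a stray positional argument. In Go's standard `flag` package, and in spf13/pflag which cobra uses, a boolean flag does not consume the following token, so the value must be attached with `=`. A minimal sketch, assuming chaosmetad parses flags this way: the helper `renderFlag` is hypothetical (not part of chaosmeta) and uses the standard `flag` package only to demonstrate the difference.

```go
package main

import (
	"flag"
	"fmt"
)

// renderFlag is a hypothetical helper that turns one experiment arg
// (key, value, valueType) into command-line tokens. Bool values are joined
// with "=" because a bool flag does not consume the next token.
func renderFlag(key, value, valueType string) []string {
	if valueType == "bool" {
		return []string{fmt.Sprintf("--%s=%s", key, value)}
	}
	return []string{"--" + key, value}
}

// parse runs tokens through Go's standard flag package and returns the
// value of --force plus any leftover positional arguments.
func parse(tokens []string) (bool, []string) {
	fs := flag.NewFlagSet("loss", flag.ContinueOnError)
	force := fs.Bool("force", false, "force injection")
	fs.Int("percent", 0, "loss percent")
	if err := fs.Parse(tokens); err != nil {
		panic(err)
	}
	return *force, fs.Args()
}

func main() {
	// "--force true": force is set, but "true" is treated as the first
	// positional argument and flag parsing stops there.
	force, rest := parse([]string{"--force", "true", "--percent", "30"})
	fmt.Println(force, rest)

	// "--force=true": everything is consumed as flags, nothing is left over.
	var tokens []string
	tokens = append(tokens, renderFlag("force", "true", "bool")...)
	tokens = append(tokens, renderFlag("percent", "30", "int")...)
	force, rest = parse(tokens)
	fmt.Println(force, rest)
}
```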
2023-12-11 06:00:24 error experiment/routine.go:128 convertToExperimentInstance:{"uuid":"17340907200016588801","name":"test","description":"","creator":1,"namespace_id":1,"create_time":"","update_time":"","status":"Running","message":"","workflow_nodes":[{"uuid":"768e691097ea11eea3611b6e38aa27a3","name":"增删Pod标签","row":0,"column":0,"duration":"60s","scope_id":3,"target_id":23,"exec_name":"增删Pod标签","exec_type":"fault","exec_id":68,"status":"","message":"","create_time":"","update_time":"","args_value":[{"args_id":251,"value":"app=demo"}],"subtasks":{"id":0,"workflow_node_instance_uuid":"","target_name":"chaosmeta-measure-controller-manager-85c4f44449-fqnvh","target_ip":"","target_hostname":"","target_label":"","target_app":"","target_namespace":"chaosmeta","range_type":"","exec_log":"","status":"","message":"","create_time":"0001-01-01T00:00:00Z","update_time":"0001-01-01T00:00:00Z"},"flow_subtasks":null,"measure_subtasks":null}]}
2023-12-11 06:00:35 error experiment/routine.go:192 fault CR get failed, err:experiments.chaosmeta.io "inject-fault-kubernetes-pod-e-173409073969391616011702274424node" not found
After the experiment is executed, observe the experiment metrics and record the results.
Since Kubernetes 1.24, containerd is the default container runtime, so we need to support it.
Add chaos engineering experiments to chaosmeta-platform: an experiment is a scenario composed of several fault nodes that can be arranged (orchestrated) within the experiment.
Simulating the Kubernetes atomic fault injection capability: deleting a pod
Error from kubectl apply -f 111.yaml:
Error from server (InternalError): error when creating "111.yaml": Internal error occurred: failed calling webhook "mexperiment.kb.io": failed to call webhook: Post "https://chaosmeta-inject-webhook-service.chaosmeta.svc:443/mutate-inject-chaosmeta-io-v1alpha1-experiment?timeout=10s": dial tcp 10.99.2.161:443: connect: connection timed out
$kubectl get pod -n obcluster
NAME READY STATUS RESTARTS AGE
sapp-ob-test-cn-zone1-0 2/2 Running 0 25h
sapp-ob-test-cn-zone2-0 2/2 Running 0 25h
sapp-ob-test-cn-zone3-0 2/2 Running 0 25h
The content of the 111.yaml configuration file is as follows:
$cat 111.yaml
apiVersion: inject.chaosmeta.io/v1alpha1
kind: Experiment
metadata:
  name: kubernetes-pod-delete-experiment
  namespace: chaosmeta
spec:
  scope: kubernetes
  targetPhase: inject
  rangeMode:
    type: count
    value: 2
  experiment:
    target: pod
    fault: delete
    duration: 10m
  selector:
    - namespace: obcluster
      name:
        - sapp-ob-test-cn-zone3-0
Downloading images from the ghcr.io registry is very slow from networks in mainland China. Could a public image registry hosted in China be supported?
Add a golangci-lint check in CI.
I created a "内存填充" (memory fill) type fault with the configuration parameters shown in the following screenshot, but an error was encountered during execution.
# k -n chaosmeta get experiments.chaosmeta.io inject-pod-mem-170359246454102835211695003069experiment-170359246457458278411695003069node -o yaml
......
recover:
- injectObjectName: pod/default/httpserver-8cb888b6d-klfbj/httpserver
  message: 'inject error: container cp from [/tmp/chaosmetad-0.2.0/tools/chaosmeta_memfill]
    to [/tmp/chaosmeta_memfill] error: task start error: OCI runtime exec failed:
    exec failed: unable to start container process: exec: "/bin/bash": stat /bin/bash:
    no such file or directory: unknown'
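The copy step fails because the target container image has no /bin/bash. A minimal sketch of one possible mitigation, assuming the copy helper could be told which shell to use: the hypothetical `pickShell` tries a list of candidate shells and returns a clear error when none exists (as in distroless or scratch images), instead of failing inside the OCI runtime. The existence check here looks at the container's filesystem from the host through `/proc/<pid>/root`; the candidate list, including `/busybox/sh` (found in distroless debug images), is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// candidateShells lists shells to try, most capable first (assumed list).
var candidateShells = []string{"/bin/bash", "/bin/sh", "/busybox/sh"}

// pickShell returns the first candidate for which exists reports true.
func pickShell(exists func(string) bool) (string, error) {
	for _, sh := range candidateShells {
		if exists(sh) {
			return sh, nil
		}
	}
	return "", errors.New("no shell found in container (distroless or scratch image?)")
}

// existsUnderRoot checks a path inside a container via the host's
// /proc/<pid>/root view of that process's root filesystem. Absolute
// symlinks inside the container resolve against the host root here, so
// treat the result as a heuristic.
func existsUnderRoot(pid int) func(string) bool {
	root := fmt.Sprintf("/proc/%d/root", pid)
	return func(p string) bool {
		_, err := os.Stat(filepath.Join(root, p))
		return err == nil
	}
}

func main() {
	// Demo against this process's own root filesystem.
	sh, err := pickShell(existsUnderRoot(os.Getpid()))
	fmt.Println(sh, err)
}
```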
......
1. When there are many parallel tasks, task status synchronization occasionally fails.
2. When the fault CR is deleted, the task stays in the in-progress state indefinitely.
When cluster resources are insufficient, all tasks fail without any prompt message.
In my opinion, a prompt message should be added in this case.
We need an English-language UI.
The configurations of these three faults all return values from runtime methods.
When I create a "parallel" experiment, it is actually executed serially. When I create a "serial" experiment, it is actually executed in parallel.
The following screenshot shows that the two experiments in row "1" are actually executed in parallel, while the experiment in row "2" is executed after the experiments in row "1" are completed.
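Hi,
I deployed chaosmeta with the v0.5.0 deployment files to try it out and found that the measurement engine and traffic injection are disabled. Is my configuration wrong, or have these features not been released yet?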
From the component level, not the application level
diskio error
Is java runtime injection persistent injection? For example, if the process is restarted, does the fault still take effect?
Does the platform layer support fault injection for machines in non-kubernetes clusters?
We need SSO, such as OIDC, SAML, or OAuth.
We could consider using Dex (https://dexidp.io/).
Thanks for providing this great tool. My environment uses MySQL, Elasticsearch, ClickHouse, Kafka, etc. I hope more of the Risk Catalog and Metric Engine will be opened up so that it can be used in production environments.
I would like the ability to inject DNS failures.
pod-->mem --> fill
Because the front end requires percent and only accepts values between 1 and 100 (0 is not allowed), and because once percent != 0 the number of bytes to inject is calculated from percent (chaosmetad source, pkg/utils/memory/mem.go, line 34), it is impossible to specify a fixed bytes value: even if one is entered, it has no effect, since percent cannot be set to 0.
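In addition, I personally think percent has little practical meaning for a pod, because it is calculated from the host's memory, not from the pod's request or limit. Should this logic be improved? I am not sure whether my understanding is correct, or whether this design targets a specific scenario.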
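A minimal sketch of the behaviour the reporter asks for, assuming percent becomes optional (0 meaning "not set") and is applied to a caller-supplied reference size such as the container's memory limit instead of the host's MemTotal. `fillKBytes` and its parameters are hypothetical and are not chaosmetad's actual `calculateFillKBytes`.

```go
package main

import (
	"errors"
	"fmt"
)

// fillKBytes returns how many KiB to fill. A non-zero percent takes
// precedence; otherwise the fixed byte count is used. baseKBytes is the
// reference size the percentage applies to (e.g. a container memory limit).
func fillKBytes(percent int, fixedBytes int64, baseKBytes int64) (int64, error) {
	switch {
	case percent < 0 || percent > 100:
		return 0, fmt.Errorf("percent %d out of range [0, 100]", percent)
	case percent > 0:
		if baseKBytes <= 0 {
			return 0, errors.New("percent given but reference memory size is unknown")
		}
		return baseKBytes * int64(percent) / 100, nil
	case fixedBytes > 0:
		return (fixedBytes + 1023) / 1024, nil // round up to whole KiB
	default:
		return 0, errors.New("either percent or bytes must be set")
	}
}

func main() {
	// 30% of a 2 GiB (2097152 KiB) limit.
	kb, _ := fillKBytes(30, 0, 2*1024*1024)
	fmt.Println(kb)
	// A fixed 512 MiB with percent left at 0.
	kb, _ = fillKBytes(0, 512*1024*1024, 2*1024*1024)
	fmt.Println(kb)
}
```

With percent allowed to be 0, the fixed-bytes path becomes reachable, and passing the limit as `baseKBytes` addresses the second point about percent being host-relative.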
Error when injecting memory faults by percentage into containers built from minimalist images (such as distroless and scratch)
chaosmetad log:
./chaosmetad inject mem fill --percent=30 --mode=ram --timeout 180s --container-runtime containerd --container-id 0737e2e63ccd4c0df86b3b5a4c287c5732d9f2b92d0d1ecba390a8e5c4ae174e --log-level debug
DEBU[2023-10-17 14:38:06] get containerd client
DEBU[2023-10-17 14:38:06] new containerd client, ns: k8s.io, socket: /run/containerd/containerd.sock
INFO[2023-10-17 14:38:06] uid: 202310171438066116
INFO[2023-10-17 14:38:06] args: {"percent":30,"mode":"ram"}
DEBU[2023-10-17 14:38:06] get containerd client
DEBU[2023-10-17 14:38:06] container exec cmd: [/bin/bash -c /root/chaosmeta-github/chaosmeta/chaosmetad/tools/chaosmeta_execns -t 1083599 -m -c "grep -m1 MemTotal /proc/meminfo | sed 's/[^0-9]*//g'"]
DEBU[2023-10-17 14:38:07] container exec result: exit code: 0, output: , err: <nil>
ERRO[2023-10-17 14:38:07] inject error: calculateFillKBytes error: get total mem error: get total mem[] error: strconv.ParseFloat: parsing "": invalid syntax
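The log shows the `grep ... | sed ...` pipeline returning empty output, consistent with grep and sed being absent from the container's filesystem, after which `strconv.ParseFloat` fails on "". A minimal sketch, assuming the injector can read the text of /proc/meminfo itself instead of shelling out to tools inside the image: the hypothetical `memTotalKB` parses the MemTotal line in Go and returns a descriptive error when the line is missing.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// memTotalKB extracts the MemTotal value (in kB) from /proc/meminfo-style
// text, e.g. "MemTotal:       16318504 kB".
func memTotalKB(meminfo string) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(meminfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "MemTotal:" {
			return strconv.ParseInt(fields[1], 10, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("MemTotal not found in meminfo")
}

func main() {
	data, err := os.ReadFile("/proc/meminfo")
	if err != nil {
		fmt.Println("read /proc/meminfo:", err)
		return
	}
	kb, err := memTotalKB(string(data))
	fmt.Println(kb, err)
}
```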
Currently only the first container in the selected pod is supported.
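A minimal sketch of how selection by name could coexist with the current behaviour, assuming the `containername` argument (set to `firstcontainer` in the network experiment spec above) keeps `firstcontainer` as the special value for the first container. `resolveContainer` is hypothetical; it takes the pod's container names in spec order and looks up any other value by name.

```go
package main

import "fmt"

// firstContainer is the assumed special value meaning "first container".
const firstContainer = "firstcontainer"

// resolveContainer returns the container to inject into, given the pod's
// container names in spec order and the requested name.
func resolveContainer(containers []string, want string) (string, error) {
	if len(containers) == 0 {
		return "", fmt.Errorf("pod has no containers")
	}
	if want == "" || want == firstContainer {
		return containers[0], nil
	}
	for _, c := range containers {
		if c == want {
			return c, nil
		}
	}
	return "", fmt.Errorf("container %q not found in pod", want)
}

func main() {
	pod := []string{"httpserver", "istio-proxy"}
	fmt.Println(resolveContainer(pod, "firstcontainer"))
	fmt.Println(resolveContainer(pod, "istio-proxy"))
}
```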
Does the platform layer support cross-kubernetes cluster injection?
We would like to perform fault injection on ARM machines, but ChaosMeta only supports the x86 architecture, whereas other chaos engineering tools also support ARM.