Comments (12)
Title: Change the restart/shutdown button to a non-forced shutdown, so that the applications in the pod have some time to shut down.
from rainbond.
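Currently, restarting or shutting down a component directly deletes the underlying Deployment resource. In fact, if you deploy from a K8s YAML file directly and then delete it, you will run into the same problem.
You are using a shutdown hook in your application, so your code exits gracefully when it receives the pod deletion signal. However, the kubelet deletes the pod almost synchronously, so the code is forcibly terminated while it is still shutting the service down gracefully.
K8s also provides a preStop hook (see the documentation). Once it is configured, the kubelet waits for a period of time after receiving the pod deletion event before forcibly deleting the pod.
In Rainbond, this setting is located in the component's K8s attributes, as shown in the figure below:
You can configure the following in Rainbond, so that shutting down the component sleeps for 10 seconds before the pod is deleted:
preStop:
  exec:
    command: ["sh", "-c", "sleep 10"]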
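For illustration only, here is a minimal Python sketch of the kind of application-side shutdown hook discussed above: the process reacts to SIGTERM (the signal a container receives when its pod is being deleted) and runs a cleanup step before exiting. The names `install_handler` and `run_until_terminated`, and the cleanup callback standing in for "deregister from the registry", are hypothetical and not taken from the reporter's application.

```python
import signal
import threading

# Set by the signal handler; the main loop polls it.
_shutdown_requested = threading.Event()


def _handle_sigterm(signum, frame):
    """Record that termination was requested; keep the handler itself tiny."""
    _shutdown_requested.set()


def install_handler():
    """Register the SIGTERM handler (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, _handle_sigterm)


def run_until_terminated(cleanup, poll_seconds=0.1):
    """Block until SIGTERM has been seen, then run the cleanup callback.

    `cleanup` is a hypothetical stand-in for whatever the application
    must finish before exiting, e.g. deregistering from a service registry.
    """
    while not _shutdown_requested.wait(poll_seconds):
        pass  # a real service would do its work elsewhere
    cleanup()
```

With a preStop hook of `sleep 10` in place, the container only receives SIGTERM after the hook has finished, so the cleanup step in a sketch like this would start roughly 10 seconds after the deletion is requested.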
from rainbond.
I have tried this before and it had no effect. The deregistration log was not printed, and the instance in Nacos was still not deregistered.
from rainbond.
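- First, confirm that this attribute has actually taken effect: while the component is running, check its YAML and see whether it contains the field you set.
- If the attribute has taken effect, how long does your program need to exit gracefully? By default, Kubernetes allows a grace period of at most 30s when deleting a container. If the program's graceful shutdown time plus the preStop hook time exceeds 30s, the container is still killed. In that case you could handle it by setting the terminationGracePeriodSeconds field, but Rainbond does not currently support that field.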
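The 30s budget described above is simple arithmetic, sketched here in Python. The function name `fits_in_grace_period` is hypothetical; the only Kubernetes fact it relies on is that the preStop hook and the application's own shutdown share one grace period, which defaults to 30 seconds.

```python
# Kubernetes default for terminationGracePeriodSeconds.
DEFAULT_GRACE_PERIOD_SECONDS = 30


def fits_in_grace_period(pre_stop_seconds, shutdown_seconds,
                         grace_period_seconds=DEFAULT_GRACE_PERIOD_SECONDS):
    """Return True if the preStop hook plus graceful shutdown finish in time.

    Both phases count against the same grace period; once it is
    exhausted the container is killed regardless of what is still running.
    """
    return pre_stop_seconds + shutdown_seconds <= grace_period_seconds
```

For example, a 10-second preStop sleep followed by a 1-second shutdown fits comfortably (`fits_in_grace_period(10, 1)` is true), while a 10-second sleep followed by a 25-second shutdown does not.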
from rainbond.
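Regarding question 1: confirmed, it has taken effect.
I switched to a component that reliably reproduces both the zombie processes and the failed deregistration, csp-gen. Its lifecycle configuration is as follows:
With kubectl I can see that it really has been configured.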
kubectl get deployment csp-csp-gen -n 8b435cdaeb984cf69af21d8fc5c4fb8c -o yaml
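Regarding question 2: shutting down the application does not take long, no more than 1 second.
See the experiment below for details.
I also ran several experiments and reached some conclusions.
Deleting the deployment with kubectl deregisters the instance normally and produces no zombie processes; the shutdown works with or without the --force flag.
Please check whether the analysis in the following steps is correct.
1. The component's startup log: it registers with Nacos normally and starts successfully.
2. The current k8s node (a cluster with a single k8s master node) has 14 zombie processes.
3. Delete the deployment above. The Rainbond UI also shows the container in the "closing" state for 10 seconds.
Counting silently, 1, 2, 3, 4, 5... 9, 10, and then the deregistration log appears. In other words, it works normally.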
kubectl delete deployment csp-csp-gen -n 8b435cdaeb984cf69af21d8fc5c4fb8c
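Meanwhile, no new zombie processes appeared on the k8s node; there are still 14.
4. The step above did not use forced deletion, so I also tried forced deletion: it also shuts down normally and deregisters from Nacos.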
kubectl delete deployment csp-csp-gen -n 8b435cdaeb984cf69af21d8fc5c4fb8c --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
deployment.apps "csp-csp-gen" force deleted
5. If I use the Rainbond shutdown button, the process is killed before deregistration finishes, and zombie processes are produced.
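To make the zombie counts in the steps above repeatable, one option is to count processes whose state starts with `Z`. The Python sketch below assumes the input is the text printed by `ps -eo pid,stat,comm` on Linux (a header line followed by one process per line); the function name `count_zombies` and the sample values are hypothetical.

```python
def count_zombies(ps_output: str) -> int:
    """Count zombie processes in the text output of `ps -eo pid,stat,comm`.

    A zombie is reported with a state code beginning with "Z"
    (for example "Z" or "Z+"). The first line is assumed to be the header.
    """
    count = 0
    for line in ps_output.splitlines()[1:]:
        fields = line.split(None, 2)  # pid, state, rest of the line
        if len(fields) >= 2 and fields[1].startswith("Z"):
            count += 1
    return count


# Made-up sample output, only to show the expected input shape.
SAMPLE_PS_OUTPUT = """  PID STAT COMMAND
    1 Ss   systemd
  412 Z    java <defunct>
  413 Sl   kubelet
  520 Z+   sh <defunct>
"""
```

Running the count before and after each shutdown method gives a direct comparison between `kubectl delete` and the Rainbond shutdown button.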
from rainbond.
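I do not currently have a K8s 1.18 environment, so I tested the example you provided on both a single-node installation and a host-based installation of Rainbond, and could not reproduce the problem. I did not set any health checks, preStop hooks, or other related fields.
The two environments are as follows:
- Single-node installation: the default K3s is v1.25.15-k3s1, Rainbond is v5.16.0-release, using the containerd runtime.
- Host-based installation: K8s is 1.23.10, Rainbond is v5.16.0-release, using the docker runtime, docker version 24.0.7.
My steps were as follows:
- Download the code you provided and build the image on the server.
- Install Nacos standalone, version 2.2.0, directly from the app market.
- Deploy from the image, with a dependency on Nacos.
- Once the code has started, the registration is visible in Nacos.
- Shut down the component: the Nacos service list is cleared automatically. Update the component: the Nacos service list is cleared first and then updated with the new address. Restarting the component gives the same result as updating. The logs are output in full throughout.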
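The "check the Nacos service list" step above can also be done without the UI. The Python sketch below assumes the Nacos v1 Open API instance-list endpoint (`/nacos/v1/ns/instance/list`) and a response whose `hosts` array carries `ip` and `port` for each instance; the base URL, service name, and port in the usage note are hypothetical, and the helper names are my own.

```python
from urllib.parse import urlencode


def instance_list_url(base_url: str, service_name: str) -> str:
    """Build the instance-list query URL for one service.

    Assumes the Nacos v1 Open API path; adjust if your deployment
    uses a different context path or API version.
    """
    query = urlencode({"serviceName": service_name})
    return f"{base_url.rstrip('/')}/nacos/v1/ns/instance/list?{query}"


def registered_addresses(payload: dict) -> list:
    """Extract "ip:port" strings from a decoded instance-list response.

    An empty list means no instance is registered, which is what a
    successful deregistration should leave behind.
    """
    return [f"{host['ip']}:{host['port']}" for host in payload.get("hosts", [])]
```

For example, `instance_list_url("http://nacos:8848", "csp-gen")` yields the URL to fetch, and passing the decoded JSON body to `registered_addresses` shows whether an address such as 10.42.0.58 is still listed after the component is shut down.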
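Screenshots of the detailed steps follow:
- The full application deployment topology is shown below. probe is the image built from the archive you provided, and nacos is version 2.2.0 installed from the app market.
- The registration log, service list, and service details are shown below; the service IP addresses match.
- Perform the shutdown: the log prints normally and the Nacos service list is empty. See the screenshots below.
- Start the component again: the registration log, service list, and service details are shown below; the service IP addresses still match.
- Now perform an update. The log shows the instance IP as 10.42.0.58. Back in Nacos, the service list and service details are shown in the screenshots below.
- Try a restart again: the registration log, service list, and service details are shown below.
- As for zombie processes: from the start of these operations to the end, the number of zombie processes on the whole server was 0, with no growth.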
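Looking at the code, deleting the deployment is handled by calling the api-server directly. The deletion logic should be no different from deleting the deployment YAML with kubectl, and this part of the logic has not changed from version 5.13 through 5.16. Code reference: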
rainbond/worker/appm/controller/stop.go
Line 171 in 41bc5ee
from rainbond.
If there are no further reproducible steps, I will close this issue when the new version is released.
from rainbond.