Comments (12)

Issues-translate-bot avatar Issues-translate-bot commented on June 2, 2024

Bot detected the issue body's language is not English, translate it automatically.


Title: Change the restart/shutdown buttons to a non-forced shutdown, leaving some time for the applications in the pod to shut down gracefully.

from rainbond.

yangkaa avatar yangkaa commented on June 2, 2024

Currently, restart and shutdown delete the underlying Deployment resource directly. In fact, if you deploy with a plain Kubernetes YAML file and then delete it, you will run into the same problem.

You are already using shutdown hooks in your application, so your code exits gracefully when the pod receives the deletion signal. However, the kubelet deletes the pod almost immediately, so the process is forcibly terminated while it is still shutting the service down gracefully.

Kubernetes also provides a preStop hook; see the documentation. Once it is configured, the kubelet waits for a period of time after receiving the pod deletion event before forcibly deleting the pod.

In Rainbond it lives under the component's Kubernetes attributes, as shown below:
image

You can add the following configuration in Rainbond, so that when the component is shut down it sleeps for 10 seconds before the pod is deleted:

preStop:
  exec:
    command: ["sh", "-c", "sleep 10"]

from rainbond.

shun634501730 avatar shun634501730 commented on June 2, 2024

I tried this before, with no effect at all. The deregistration log is not printed, and the instance in Nacos is still not deregistered.
image

from rainbond.

yangkaa avatar yangkaa commented on June 2, 2024
  1. First, confirm that this attribute actually took effect: while the component is running, check its yaml to see whether it contains the field you set.

  2. If the attribute does take effect, how long does your program need to exit gracefully? By default, Kubernetes gives a container at most a 30-second grace period when deleting it. If the program's graceful shutdown time plus the preStop hook time exceeds 30 seconds, the pod will still be killed. In that case you can handle it by setting the terminationGracePeriodSeconds field, although Rainbond does not currently support this field; see the sketch below this list.
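
A minimal sketch of how the two fields interact in a pod spec (this assumes you edit the YAML by hand outside Rainbond, since terminationGracePeriodSeconds is not exposed there yet; the 60-second value is only an example):

spec:
  terminationGracePeriodSeconds: 60   # total budget: preStop hook + graceful shutdown must finish within this window
  containers:
    - name: app                       # placeholder container name
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 10"]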

from rainbond.

shun634501730 avatar shun634501730 commented on June 2, 2024

Regarding question 1: it definitely takes effect.

Let me switch to a component that reliably reproduces both the zombie-process and the missing-deregistration problem, csp-gen. Its lifecycle configuration is as follows:
image

You can clearly see in kubectl that the configuration is applied:
kubectl get deployment csp-csp-gen -n 8b435cdaeb984cf69af21d8fc5c4fb8c -o yaml
image

Regarding question 2: the application shutdown does not take long, less than 1 second.

See the experiments below for details.

I also ran a few experiments and reached some conclusions.

Deleting the deployment with kubectl deregisters normally and produces no zombie processes; it shuts down cleanly with or without the --force flag.

Please check whether the following analysis is correct.
1. Startup log of this component: it registers with Nacos normally and starts successfully.
image

2. The current k8s node (a cluster with a single k8s master node) has 14 zombie processes.
image

3. Delete the above deployment. The Rainbond UI also shows the container in the "closing" state for about 10 seconds.
Count to ten in your head, 1, 2, 3, 4, 5 ... 9, 10, and then the deregistration log appears. In other words, it works correctly.
kubectl delete deployment csp-csp-gen -n 8b435cdaeb984cf69af21d8fc5c4fb8c
image

Meanwhile, the k8s node gained no new zombie processes; there are still 14.
image

4. The above did not use force deletion, so I also tried forcing it: the result is that it still shuts down normally and deregisters from Nacos.
kubectl delete deployment csp-csp-gen -n 8b435cdaeb984cf69af21d8fc5c4fb8c --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
deployment.apps "csp-csp-gen" force deleted

5. If the Rainbond shutdown button is used, the process is killed before deregistration finishes, and a zombie process is created.
image
image

from rainbond.

yangkaa avatar yangkaa commented on June 2, 2024

I don't currently have a K8s 1.18 environment, so I tested the example you gave on both the all-in-one install and a host-based Rainbond install, and could not reproduce the problem. I did not set any health checks, preStop, or other related fields.

The two environments are as follows:

  1. All-in-one install: the bundled K3s is v1.25.15-k3s1, Rainbond version v5.16.0-release, containerd runtime.
  2. Host-based install: K8s version 1.23.10, Rainbond version v5.16.0-release, docker runtime, docker version 24.0.7.

My steps were as follows:

  1. Download the code you provided and build the image on the server.
  2. Install the standalone nacos, version 2.2.0, directly from the application market.
  3. Deploy from the image and add a dependency on nacos.
  4. After the code finishes starting, the nacos registration information is visible.
  5. Shut the component down: the nacos service list clears automatically. Update the component: the nacos service list first clears, then is updated with the new address. Restarting the component gives the same result as updating. The logs are output in full throughout.

Screenshots of the detailed steps:

  1. The full application deployment topology is as follows; probe is the image built from the archive you provided, and nacos is the 2.2.0 version installed from the application market.
image
  2. The registration log, service list, and service details are as follows; the service IP addresses match.
image image image
  3. Perform the shutdown operation: the log prints normally and the nacos service list is empty. Detailed screenshots:
image image
  4. Restart the component. The registration log, service list, and service details are as follows; the service IP address still matches.
image image image
  5. Now perform an update. The log shows the instance IP as 10.42.0.58; going back to nacos, the service list and service details are shown below.
image image image image
  6. Try the restart operation again. The registration log, service list, and service details are as follows.
image image image image
  7. As for zombie processes, from the start of these operations to the end, the whole server had 0 zombie processes, with no growth.
image

Checking at the code level, deleting the deployment is handled by calling the api-server directly. The deletion logic should be no different from deleting the deployment yaml with kubectl, and this part of the logic has not changed at all between versions 5.13 and 5.16. Code reference:

if deployment := app.GetDeployment(); deployment != nil && deployment.Name != "" {

from rainbond.

yangkaa avatar yangkaa commented on June 2, 2024

If there are no further steps to reproduce this, I will close this issue when the new version is released.

from rainbond.
