Comments (8)
You would need to provide k3s configuration and logs during the event for us to help you. There isn't enough for us to act on.
from k3s.
which k3s configuration exactly, as in which files?
i checked k3s-agent's logs and there isn't anything meaningful, e.g. yesterday's logs stopped at midnight, when everything worked fine, but as soon as i restarted k3s-agent, this appeared:
May 02 08:41:20 n2 systemd[1]: k3s-agent.service: Found left-over process 2497 (containerd-shim) in control group while starting unit. Ignoring.
May 02 08:41:20 n2 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 02 08:41:20 n2 systemd[1]: k3s-agent.service: Found left-over process 2920 (containerd-shim) in control group while starting unit. Ignoring.
May 02 08:41:20 n2 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 02 08:41:20 n2 systemd[1]: k3s-agent.service: Found left-over process 3191 (containerd-shim) in control group while starting unit. Ignoring.
May 02 08:41:20 n2 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 02 08:41:20 n2 systemd[1]: k3s-agent.service: Found left-over process 3525 (containerd-shim) in control group while starting unit. Ignoring.
May 02 08:41:20 n2 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 02 08:41:20 n2 systemd[1]: k3s-agent.service: Found left-over process 4335 (containerd-shim) in control group while starting unit. Ignoring.
May 02 08:41:20 n2 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
May 02 08:41:20 n2 systemd[1]: k3s-agent.service: Found left-over process 79756 (containerd-shim) in control group while starting unit. Ignoring.
May 02 08:41:20 n2 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
any clue? any more logs i should inspect?
from k3s.
Please attach the complete logs from the time period in question. Those messages are all from systemd, not k3s. They are normal to see, as container processes remain running while k3s itself is stopped.
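One way to capture "the complete logs from the time period in question" on a systemd host is to export the unit's journal for a fixed window. The following is only a sketch: the function name k3s_agent_logs is hypothetical, the unit name k3s-agent is taken from the log lines quoted above, and on a server node the unit would be k3s instead.

```shell
# Hypothetical helper: dump the k3s-agent journal between two timestamps.
# Arguments use journalctl's time syntax, e.g. "2024-05-08 10:30".
k3s_agent_logs() {
  start="$1"
  end="$2"
  # -u selects the systemd unit; --no-pager makes the output safe to redirect.
  journalctl -u k3s-agent --since "$start" --until "$end" --no-pager
}
```

For example, k3s_agent_logs "2024-05-08 10:30" "2024-05-08 11:30" > agent.log would write that hour to a file that can be attached to the issue. The window here is only an illustration; pick one that brackets the moment the ingress stopped responding.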
from k3s.
can you tell me which logs specifically? as per the FAQ:
- i'm running from systemd (so rc and command line are irrelevant)
- pod logs aren't useful as the pods have no issues (i can port-forward)
- containerd logs match systemd log
i'm sort of blind right now, i am trying to connect to a specific ingress, it says 404 page not found and i can't really see any info in the logs i'm checking. the only (non-realtime) message i see is in the traefik pod logs, e.g.
time="2024-05-08T10:58:41Z" level=error msg="Skipping service: no endpoints found" serviceName=blahblah namespace=stuff servicePort="&ServiceBackendPort{Name:,Number:8000,}" providerName=kubernetes ingress=blahblah
from k3s.
OK, well "can't connect to" is not really the same as "get a 404 response from". In this case you have specific logs from traefik indicating that there are no endpoints for that service, so you'd want to check on the pods backing that service and see why they're not ready.
from k3s.
i mentioned before that pods and services are fine, i can port forward and access the service without issues. the issue isn't always the same. earlier on it was 404, now it's gateway timeout. i just restarted k3s-agent again and it's all fine.
i'll ask again. what is the correct way to debug this?
from k3s.
Pretty much just standard linux/kubernetes stuff...
- Journald logs - k3s on the servers, k3s-agent on agents
- Pod events - kubectl describe pod -n <NAMESPACE> <POD>, check for events, restarts, failed health checks, and so on
- Check service endpoints - kubectl describe service -n <NAMESPACE> <SERVICE>
Note that you will probably need to catch this very close in time to when you're unable to reach the site via the ingress.
For some reason the service's endpoints are going away at times. I get that you can port-forward to it and such, but you need to figure out why the endpoints are occasionally being removed from the service. This usually indicates that the pods are failing health-checks or are being restarted or recreated for some other reason.
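The checks listed above could be bundled into one function to run at the moment the ingress starts failing. This is a minimal sketch, not an official k3s procedure: the function name check_service is hypothetical, and it assumes kubectl is already configured for the cluster.

```shell
# Hypothetical helper: snapshot the state of a service and its pods.
# Usage: check_service <NAMESPACE> <SERVICE>
check_service() {
  ns="$1"
  svc="$2"
  # Endpoints currently registered for the service (empty means no ready pods).
  kubectl get endpoints -n "$ns" "$svc" -o wide
  # Service details, including its selector and endpoint list.
  kubectl describe service -n "$ns" "$svc"
  # Pods in the namespace with READY state, restart count, and node placement.
  kubectl get pods -n "$ns" -o wide
  # Recent events in the namespace, oldest first.
  kubectl get events -n "$ns" --sort-by=.lastTimestamp
}
```

With the names from the traefik log line earlier in the thread, this would be called as check_service stuff blahblah. Comparing the node column of the pod listing against the endpoints output shows whether the missing endpoints all belong to pods on a single node.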
from k3s.
they're not really "occasionally" removed. they always are. but it only applies to those that are on that node. once that happens they will stay that way until i restart k3s-agent on said node. anyway, thanks for the help. i'll investigate.
from k3s.