grpc-ecosystem / grpc-health-probe

A command-line tool to perform health-checks for gRPC applications in Kubernetes and elsewhere

License: Apache License 2.0

Languages: Go 97.48%, Dockerfile 2.52%
Topics: grpc, grpc-health, kubernetes

grpc-health-probe's Introduction

grpc_health_probe(1)


The grpc_health_probe utility allows you to query the health of gRPC services that expose their status through the gRPC Health Checking Protocol.

grpc_health_probe is meant to be used for health checking gRPC applications in Kubernetes, using the exec probes.

โš ๏ธ Kubernetes has now built-in gRPC health checking capability as generally available. As a result, you might no longer need to use this tool and can use the native Kubernetes feature instead.

This tool can still be useful if you are on an older version of Kubernetes, need advanced configuration (such as custom metadata, TLS, or finer timeout tuning), or are not using Kubernetes at all.

This command-line utility makes an RPC to /grpc.health.v1.Health/Check. If the service responds with a SERVING status, grpc_health_probe exits with success; otherwise it exits with a non-zero exit code (documented below).

EXAMPLES

$ grpc_health_probe -addr=localhost:5000
healthy: SERVING
$ grpc_health_probe -addr=localhost:9999 -connect-timeout 250ms -rpc-timeout 100ms
failed to connect service at "localhost:9999": context deadline exceeded
exit status 2

Installation

It is recommended to use a version-stamped binary distribution:

  • Choose a binary from the Releases page.

Installing from source (not recommended):

  • Make sure you have git and go installed.
  • Run: go install github.com/grpc-ecosystem/grpc-health-probe@latest
  • This will compile the binary into your $GOPATH/bin (or $HOME/go/bin).

Using the gRPC Health Checking Protocol

To make use of grpc_health_probe, your application must implement the gRPC Health Checking Protocol v1. This means you must register the Health service and implement the Check rpc so that it returns a SERVING status.

Since the Health Checking protocol is part of the gRPC core, it has packages/libraries available for the languages supported by gRPC:

[health.proto] [Go] [Java] [Python] [C#/NuGet] [Ruby] ...

Most of the languages listed above provide helper functions that hide the implementation details, so you do not need to implement the Check rpc yourself.
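
For example, in Go the grpc-go health package ships a ready-made implementation of this service. A minimal sketch, assuming the standard grpc-go packages (the port 5000 is arbitrary):

package main

import (
    "log"
    "net"

    "google.golang.org/grpc"
    "google.golang.org/grpc/health"
    healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
    lis, err := net.Listen("tcp", ":5000")
    if err != nil {
        log.Fatalf("failed to listen: %v", err)
    }

    srv := grpc.NewServer()

    // Register the standard health service. The empty service name ("")
    // is the convention for overall server health.
    healthServer := health.NewServer()
    healthpb.RegisterHealthServer(srv, healthServer)
    healthServer.SetServingStatus("", healthpb.HealthCheckResponse_SERVING)

    // Register your own services here, then serve.
    if err := srv.Serve(lis); err != nil {
        log.Fatalf("failed to serve: %v", err)
    }
}

With this running, grpc_health_probe -addr=localhost:5000 should report healthy: SERVING, as in the example above.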

Example: gRPC health checking on Kubernetes

Kubernetes now supports gRPC health checking. If your cluster is running a version that supports gRPC health checking, you can define a gRPC liveness probe in your Pod specification. For more information on how to define a gRPC liveness probe in Kubernetes, see the Kubernetes documentation.

However, if your Kubernetes version does not support gRPC health checking, or you need features that the native probe does not offer, you can use grpc_health_probe to health-check the gRPC servers running in your Pods.

It is recommended to use Kubernetes exec probes and define liveness and/or readiness checks for your gRPC server pods.

You can bundle the statically compiled grpc_health_probe in your container image. Choose a binary release and download it in your Dockerfile:

RUN GRPC_HEALTH_PROBE_VERSION=v0.4.13 && \
    wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64 && \
    chmod +x /bin/grpc_health_probe

In your Kubernetes Pod specification manifest, specify a livenessProbe and/or readinessProbe for the container:

spec:
  containers:
  - name: server
    image: "[YOUR-DOCKER-IMAGE]"
    ports:
    - containerPort: 5000
    readinessProbe:
      exec:
        command: ["/bin/grpc_health_probe", "-addr=:5000"]
      initialDelaySeconds: 5
    livenessProbe:
      exec:
        command: ["/bin/grpc_health_probe", "-addr=:5000"]
      initialDelaySeconds: 10

This approach provides proper readiness/liveness checking for applications that implement the gRPC Health Checking Protocol.

Health Checking TLS Servers

If a gRPC server is serving traffic over TLS, or uses TLS client authentication to authorize clients, you can still use grpc_health_probe to check health with command-line options:

Option Description
-tls use TLS (default: false)
-tls-ca-cert path to file containing CA certificates (to override system root CAs)
-tls-client-cert client certificate for authenticating to the server
-tls-client-key private key for authenticating to the server
-tls-no-verify use TLS, but do not verify the certificate presented by the server (INSECURE) (default: false)
-tls-server-name override the hostname used to verify the server certificate

Health checking TLS Servers with SPIFFE issued credentials

If your gRPC server requires authentication, you can use the following command-line option and set the SPIFFE_ENDPOINT_SOCKET environment variable.

Option Description
-spiffe use SPIFFE Workload API to retrieve TLS credentials (default: false)

Other Available Flags

Option Description
-v verbose logs (default: false)
-connect-timeout timeout for establishing connection
-rpc-timeout timeout for health check rpc
-rpc-header sends metadata in the RPC request context (default: empty map)
-user-agent user-agent header value of health check requests (default: grpc_health_probe)
-service service name to check (default: "") - the empty string is the convention for overall server health (see the sketch below)
-gzip use GZIPCompressor for requests and GZIPDecompressor for response (default: false)
-version print the probe version and exit
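
The -service flag fills the service field of the Check request, so it pairs with per-service statuses set on the server side. A minimal Go sketch, extending the registration example earlier in this README (the service name my.package.MyService is purely illustrative):

// On the health server registered earlier; the service name is illustrative.
healthServer.SetServingStatus("my.package.MyService", healthpb.HealthCheckResponse_SERVING)
// Later, e.g. while draining:
healthServer.SetServingStatus("my.package.MyService", healthpb.HealthCheckResponse_NOT_SERVING)

That status can then be probed with grpc_health_probe -addr=localhost:5000 -service=my.package.MyService.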

Example:

  1. Start the route_guide example server with TLS by running:

    go run server/server.go -tls
    
  2. Run grpc_health_probe with the CA certificate (in the testdata/ directory) and a hostname override matching what the certificate is signed for:

    $ grpc_health_probe -addr 127.0.0.1:10000 \
        -tls \
        -tls-ca-cert /path/to/testdata/ca.pem \
        -tls-server-name=example.com \
        -rpc-header=foo:bar \
        -rpc-header=foo2:bar2
    
    status: SERVING

Exit codes

It is not recommended to rely on specific exit statuses. Any failure will be a non-zero exit code.

Exit Code Description
0 success: rpc response is SERVING.
1 failure: invalid command-line arguments
2 failure: connection failed or timed out
3 failure: rpc failed or timed out
4 failure: rpc successful, but the response is not SERVING
20 failure: could not retrieve TLS credentials using the SPIFFE Workload API

This is not an official Google project.


grpc-health-probe's Issues

Adds support for additional metadata

My use case is that I want to disable tracing for health probes. For example:

    livenessProbe:
      exec:
        command: ["/bin/grpc_health_probe", "-addr=:5000", "-metadata=b3:0"]
      initialDelaySeconds: 10

There are certainly other ways to achieve the same thing, but they involve request sampling, which is not enabled in many tracing libraries.

Q: Could we add a flag -disable-tracing=true instead?

Not really; the propagation format dictates the metadata key, so we can't have just one. I could work on this if you agree.

Ping @mohit-a21 @adriancole

Service flag does not work properly

Because of the protobuf definition here: https://github.com/grpc/grpc-go/blob/87eb5b7502493f758e76c4d09430c0049a81a557/health/grpc_health_v1/health.pb.go#L149

the service flag won't ever work. It will always look for your service at "grpc.health.v1.Health", and your protobuf has to satisfy that.

We should fix this or remove the service flag altogether, so it doesn't give people (like me) false hope that you can change the package where the generated health-service protobuf lives.

Publishing docker image

Hi, do you have a plan to publish the Docker image on Docker Hub or GCR? We use grpc-health-probe as a sidecar, and it would be nice if there were an official repository.

Jfrog Xray reports security flaws on grpc_health_probe-linux-amd64 v 0.3.6

One of my organization's security scanning tools (JFrog Xray) is reporting a number of security flaws in the grpc_health_probe component, which I'm using for my gRPC service.

Some issues come from the Go version used, 1.15.6. Apparently there are flaws in this version, and the scanner asks for at least version 1.15.9 or 1.16.1.

There is also a flaw reported in golang.org/x/text:0.3.4. Version 0.3.5 should be used to fix the problem.

Any chance of having these flaws fixed any time soon?

Thanks.

Security vulnerabilities

We use Xray to detect security issues within our project; it is reporting the following vulnerabilities in grpc_health_probe:

  1. In Go before 1.13.13 and 1.14.x before 1.14.5, Certificate.Verify may lack a check on the VerifyOptions.KeyUsages EKU requirements (if VerifyOptions.Roots equals nil and the installation is on Windows). Thus, X.509 certificate verification is incomplete.
    CVE Id: CVE-2020-14039 (CVSS v2 Score : 5.0/CVSS:2.0/AV:N/AC:L/Au:N/C:N/I:P/A:N) (CVSS v3 Score : 5.3/CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:L/A:N)

  2. The x/text package before 0.3.3 for Go has a vulnerability in encoding/unicode that could lead to the UTF-16 decoder entering an infinite loop, causing the program to crash or run out of memory. An attacker could provide a single byte to a UTF16 decoder instantiated with UseBOM or ExpectBOM to trigger an infinite loop if the String function on the Decoder is called, or the Decoder is passed to golang.org/x/text/transform.String.
    CVE Id : CVE-2020-14040 (CVSS v2 Score : 5.0/CVSS:2.0/AV:N/AC:L/Au:N/C:N/I:N/A:P) (CVSS v3 Score : 7.5/CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H)

Support arm and arm64

Since this is written in Go, adding support for arm and arm64 should be pretty easy. Gox doesn't support arm64, but you can invoke go build directly to accomplish the same task.

Provide binary releases for macOS and Windows as well as an official Linux container image

In development it is often useful to use this tool for testing. It would be nice to have macOS and Windows binaries available from the releases page for download as well as a trusted container image available on Docker hub or GCR.

  • A tool like gox can be used to create the binaries pretty easily.
  • And GitHub Actions can be set up pretty easily to handle building the container image when new code is merged.

I am not sure I understand how to implement HealthCheck

Hi,

I think my issue is pretty well described in the title: I don't understand how to set this up. My gRPC service is running the server on port 8080, but based on your readme I feel like I need to set up a second server or make my rpc service inherit from the HealthCheck service.

I really don't understand how to implement it on the code side (Go currently).

Thanks for any explanation!

Better error message for "context deadline exceeded"

Go context.WithTimeout returns a generic context deadline exceeded error message when it times out.

It leaks an implementation detail (the Go context package) to the end user.

Capture and modify this error message.
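
One possible approach, sketched below (this is not the probe's actual code; the function name is illustrative), is to translate the context error before printing it:

package main

import (
    "context"
    "errors"
    "fmt"
)

// humanizeTimeout rewraps Go's generic "context deadline exceeded" error
// so the output talks about timeouts instead of the context package.
func humanizeTimeout(err error, what string) error {
    if errors.Is(err, context.DeadlineExceeded) {
        return fmt.Errorf("timeout: %s did not complete within the configured deadline", what)
    }
    return err
}

func main() {
    err := humanizeTimeout(context.DeadlineExceeded, "health rpc")
    fmt.Println(err) // timeout: health rpc did not complete within the configured deadline
}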

go should be upgraded to version 1.17

According to https://nvd.nist.gov/vuln/detail/CVE-2021-29923, Go before 1.17 has a vulnerability:

Go before 1.17 does not properly consider extraneous zero characters at the beginning of an IP address octet, which (in some situations) allows attackers to bypass access control that is based on IP addresses, because of unexpected octal interpretation. This affects net.ParseIP and net.ParseCIDR.

This vulnerability is marked in Twistlock scans as High.

Kubernetes not terminating container when health check fails

Hi,

Thanks for maintaining this utility. I came across it when going through the GCP microservices demo project on GitHub. I have been using it since v0.3.x and it has been working well until now.

I'm on Rancher kubernetes v1.20.9, Go v1.17, grpc_health_probe version v0.4.5 and using the following dependencies:

google.golang.org/genproto v0.0.0-20210831024726-fe130286e0e2
google.golang.org/grpc v1.40.0
google.golang.org/protobuf v1.27.1

I have a GRPC service that does a health check like this:

import (
  .
  .
  "google.golang.org/grpc/health"
  healthpb "google.golang.org/grpc/health/grpc_health_v1"
  .
  .
)
.
.
.
grpcServer := grpc.NewServer()

// Register health check server.
healthcheck := health.NewServer()
healthpb.RegisterHealthServer(grpcServer, healthcheck)

go func() {
    for {
        serverStatus := healthpb.HealthCheckResponse_SERVING
        healthy := instance.HealthChecks() // calls a method to do some health checks
        if !healthy {
            serverStatus = healthpb.HealthCheckResponse_NOT_SERVING
        }
        instance.logger.Info(serverStatus)
        healthcheck.SetServingStatus("service.namespace", serverStatus)
        time.Sleep(10 * time.Second)
    }
}()

When I run the program locally, and when the health check fails, I can see the status is being set to NOT_SERVING.

When the container runs on kubernetes, and when the health check fails, I can see the status is being set to NOT_SERVING. However, the container isn't restarted (terminate + start) automatically on k8s.

My Dockerfile has:

# Download tool to perform GRPC health checks for k8s.
RUN GRPC_HEALTH_PROBE_VERSION=v0.4.5 && \
    wget -qO/bin/grpc_health_probe https://github.com/grpc-ecosystem/grpc-health-probe/releases/download/${GRPC_HEALTH_PROBE_VERSION}/grpc_health_probe-linux-amd64 && \
    chmod +x /bin/grpc_health_probe

And my k8s manifest has:

readinessProbe:
  exec:
    command:
      [
        "/bin/grpc_health_probe",
        "-addr=:3000",
        "-rpc-timeout=5s",
      ]
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 5
livenessProbe:
  exec:
    command:
      [
        "/bin/grpc_health_probe",
        "-addr=:3000",
        "-rpc-timeout=5s",
      ]
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5

I downgraded Go to v1.16.7 and had the same issue.

What am I missing?

Thanks.

connection error logged for health checks

Getting the following error repeated in my logs when I enable health checks using grpc_health_probe:

transport: loopyWriter.run returning. connection error: desc = "transport is closing"

The health check succeeds. Any thoughts?

Buildpacks

Sorry if this is a way too generic issue. I'm investigating Skaffold + the Google Buildpacks for Go and found that I have no way to incorporate the health probe binary.

I'm tinkering with including the command as a subcommand of my application, so the binary can be used in health checks too. Would this be advisable? Would you recommend against it?

If done, this would require this Go module to be importable by other applications, and therefore to offer a library package next to the main package.
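
For reference, the probing logic is small enough to embed directly. A rough sketch of what such a subcommand could do with plain grpc-go, assuming a plaintext server (the address, timeout, and exit codes here are illustrative, not this project's API):

package main

import (
    "context"
    "fmt"
    "os"
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
    healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), time.Second)
    defer cancel()

    // Dial the local server and issue a grpc.health.v1.Health/Check RPC.
    conn, err := grpc.DialContext(ctx, "localhost:5000",
        grpc.WithTransportCredentials(insecure.NewCredentials()))
    if err != nil {
        fmt.Fprintln(os.Stderr, "connect error:", err)
        os.Exit(2)
    }
    defer conn.Close()

    resp, err := healthpb.NewHealthClient(conn).Check(ctx, &healthpb.HealthCheckRequest{})
    if err != nil || resp.GetStatus() != healthpb.HealthCheckResponse_SERVING {
        fmt.Fprintln(os.Stderr, "not serving:", err)
        os.Exit(1)
    }
    fmt.Println("status:", resp.GetStatus())
}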

Add unit tests

It should be possible to easily spin up gRPC servers in memory for tests and exercise the probing capabilities.

This could either exec out to the binary, or we could export the probing code to a package and invoke that directly (though that probably wouldn't exercise the CLI flags).
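
A sketch of the in-memory variant, assuming grpc-go's test/bufconn package (the test and package names are illustrative):

package main_test

import (
    "context"
    "net"
    "testing"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
    "google.golang.org/grpc/health"
    healthpb "google.golang.org/grpc/health/grpc_health_v1"
    "google.golang.org/grpc/test/bufconn"
)

func TestInMemoryHealthCheck(t *testing.T) {
    // Start a health server on an in-memory listener; no real port is opened.
    lis := bufconn.Listen(1024 * 1024)
    srv := grpc.NewServer()
    hs := health.NewServer()
    healthpb.RegisterHealthServer(srv, hs)
    hs.SetServingStatus("", healthpb.HealthCheckResponse_SERVING)
    go srv.Serve(lis)
    defer srv.Stop()

    // Dial through the bufconn instead of TCP.
    dialer := func(ctx context.Context, _ string) (net.Conn, error) { return lis.DialContext(ctx) }
    conn, err := grpc.DialContext(context.Background(), "bufnet",
        grpc.WithContextDialer(dialer),
        grpc.WithTransportCredentials(insecure.NewCredentials()))
    if err != nil {
        t.Fatalf("dial: %v", err)
    }
    defer conn.Close()

    resp, err := healthpb.NewHealthClient(conn).Check(context.Background(), &healthpb.HealthCheckRequest{})
    if err != nil || resp.GetStatus() != healthpb.HealthCheckResponse_SERVING {
        t.Fatalf("expected SERVING, got %v (err=%v)", resp.GetStatus(), err)
    }
}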

Netty server terminated - k8s readiness probe using gprc-health-probe

A Java gRPC service, when queried by a k8s readiness probe using grpc-health-probe, causes the Netty server to log an io.grpc.netty.shaded.io.grpc.netty.NettyServerTransport notifyTerminated exception. The same service runs fine outside k8s with Docker:

docker run -p 50051:50051 bikertales/publisher:v0.2
grpc-health-probe -addr=:50051 

The grpc server code (https://github.com/kubesure/publisher/blob/master/src/main/java/io/kubesure/publish/App.java)

deployment yaml (https://github.com/kubesure/helm-charts/blob/master/publisher/templates/deployment.yaml)

grpc-health-probe is built using go get.

Minikube 1.2.0 k8s 1.15

Cannot pull docker image

The docker repository doesn't seem to be public?

> docker pull ghcr.io/grpc-ecosystem/grpc-health-probe:v0.4.0
Error response from daemon: Head https://ghcr.io/v2/grpc-ecosystem/grpc-health-probe/manifests/v0.4.0: denied: denied

Health probe with rpc-header on k8s

Locally, when I run the following, it works as expected:

/bin/grpc_health_probe -rpc-header="authorization: token be69a4ef-1c32-4901-8ea2-8866f0707c24"

But in my k8s deployment:

readinessProbe:
  exec:
    command:
      - /bin/grpc_health_probe
      - -addr=:{{ .Values.service.port }}
      - -rpc-header="authorization:token be69a4ef-1c32-4901-8ea2-8866f0707c24"

I got an error:

error: health rpc failed: rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: PROTOCOL_ERROR

Verbose mode shows:

addr=:50053 conn_timeout=1s rpc_timeout=1s
headers: {map["authorization:[token be69a4ef-1c32-4901-8ea2-8866f0707c24"]]}
tls=false
spiffe=false
establishing connection
connection established (took 3.726224ms)
error: health rpc failed: rpc error: code = Internal desc = stream terminated by RST_STREAM with error code: PROTOCOL_ERROR

grpc_health_probe version: v0.4.6

Usage of TLS endpoints - stats

I'm curious whether there is any information on how often the TLS arguments are being used. With the built-in gRPC support in K8s it is not trivial to add support for secure communication options, so the question is how important it is to implement those options before promoting gRPC probe support to beta.

KEP: kubernetes/enhancements#2727

Example: gRPC health checking on Kubernetes does not work on GKE with mTLS

Using mutual TLS, the example provided does not work on GKE when deploying an ingress over a service. GKE still insists on performing an HTTP/2 health check that expects a response from /. This fails because Google's health check has no certificates and so cannot complete an mTLS connection.

My exec readiness and liveness probes are ignored when the ingress is set up.

readinessProbe:
  exec:
    command: ["/hc", "-addr=127.0.0.1:9999", "-tls", "-tls-ca-cert=/tls/ca-chain.pem", "-tls-client-cert=/tls/client.pem", "-tls-client-key=/tls/client-key.pem"]
  initialDelaySeconds: 5
livenessProbe:
  exec:
    command: ["/hc", "-addr=127.0.0.1:9999", "-tls", "-tls-ca-cert=/tls/ca-chain.pem", "-tls-client-cert=/tls/client.pem", "-tls-client-key=/tls/client-key.pem"]
  initialDelaySeconds: 10 

Only amd64 container images

I would like to have a multi-arch container image for all available architectures, so I can use the container with the same behavior on different architectures.

trivy security scan trips on CVE-2020-29652

Our CI pipeline runs trivy on our Docker images; today it started to throw an error on version 0.4.1 of grpc_health_probe:

usr/local/bin/grpc_health_probe
===============================
Total: 1 (UNKNOWN: 0, LOW: 0, MEDIUM: 0, HIGH: 1, CRITICAL: 0)
+---------------------+------------------+----------+------------------------------------+------------------------------------+---------------------------------------+
|       LIBRARY       | VULNERABILITY ID | SEVERITY |         INSTALLED VERSION          |           FIXED VERSION            |                 TITLE                 |
+---------------------+------------------+----------+------------------------------------+------------------------------------+---------------------------------------+
| golang.org/x/crypto | CVE-2020-29652   | HIGH     | v0.0.0-20200622213623-75b288015ac9 | v0.0.0-20201216223049-8b5274cf687f | golang: crypto/ssh: crafted           |
|                     |                  |          |                                    |                                    | authentication request can            |
|                     |                  |          |                                    |                                    | lead to nil pointer dereference       |
|                     |                  |          |                                    |                                    | -->avd.aquasec.com/nvd/cve-2020-29652 |
+---------------------+------------------+----------+------------------------------------+------------------------------------+---------------------------------------+

Not sure what the impact for other people might be. Also tricky as this is basically coming from another package that (inadvertently) is used.

Health probe auth

Hello!

I have some misunderstanding about the gRPC health probe. I hope someone can help me find the right way.

I have a service with a gRPC API. This service has an authorization mechanism.

I have added the health-check API (https://github.com/grpc/grpc/blob/master/doc/health-checking.md) to the same grpc.Server instance.
Now, to check health in k8s, I have to use client credentials for the probe.
Inside k8s I have to mount the server certificates and a client certificate (a special cert for local requests) into the server deployment, because without them the health probe does not work.

It looks like the health probe should work without any authorization, am I right? If not, please explain why.

Thanks!

Relatively high cpu usage

Hi,

We've seen that grpc-health-probe causes relatively high CPU usage on our low-request pods.
We have the following setting for the probe:

livenessProbe:
  exec:
    command:
      - /bin/grpc_health_probe
      - -addr=:8081
  failureThreshold: 3
  periodSeconds: 2
  successThreshold: 1
  timeoutSeconds: 1

Our go-based app in this case has the following resource requests:

resources:
  limits:
    cpu: "1"
    memory: 128Mi
  requests:
    cpu: 250m
    memory: 32Mi

This app also has an HPA set up to scale at 80% CPU usage (200m average).

We've observed the probes using 300m CPU on average, which causes the HPA to always keep too many pods running.

CPU usage over a 7-day period; the drop is where I changed periodSeconds to '10'.

Zoomed in on the last day with the old probe settings, and the drop with the new settings.

The load on our system was the same during these days, so we were essentially wasting cpu.

One last example, where the service is generally only using CPU from the probes:

The 02s interval was taking 300m cpu
The 05s interval was taking 170m cpu
The 10s interval was taking 90m cpu

I've googled and looked through the existing issues, but I've not found a clue to this issue. Is this known? I did find some issues with Kubernetes exec-type probes, but we prefer not to switch to HTTP probes since these are all gRPC systems.

Treat liveness and readiness separately

Suppose I have an application that can take itself "out of service" for various reasons (operations, too many requests, etc.). The application code will make that determination and change its health check status to NOT_SERVING.

If I run this application in a container/pod in Kubernetes, then I can set a livenessProbe that will terminate the container/pod if the status changes from SERVING. However, I'd also like to run a readinessProbe that marks the container as Not Ready if the status returns NOT_SERVING, while the livenessProbe does not kill the container/Pod.

My initial thought was to add a flag --not-serving (will gladly take input on flag name) that would return exit code 0 if the status is NOT_SERVING. This would allow the container/pod to live but not be ready.

I'm willing to contribute this if it makes sense.

working on my CLA

Liveness and readiness probes throw a protocol error.

Hello team,
I am having a strange problem. I have a Kubernetes deployment for a gRPC service. I have bundled grpc-health-probe with my service binaries in the Docker container. I included the health proto in the service and then used HealthServiceImpl (the built-in library from C#).
After I deploy to K8s, the pods never start; they fail the probes with a protocol error.

Environment variable expansion of args?

Currently, exec probes in Kubernetes do not support using environment variables in the exec command (see kubernetes/kubernetes#40846), so there is no way to use environment variables referenced in the container spec or built into the container image.

The current workaround is to use a shell, e.g. bash -c <command>, which is not feasible in containers that do not have a shell, such as distroless containers.

My proposal is to perform environment variable expansion on the args before they are parsed by the probe.

Thoughts? If you agree with the general idea, we can discuss how it can be implemented.
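
A minimal sketch of the proposed expansion, using the standard library's os.ExpandEnv (the function name is illustrative):

package main

import (
    "fmt"
    "os"
)

// expandArgs expands $VAR / ${VAR} references in the probe's arguments
// before flag parsing, so manifests could pass values like -addr=:$PORT
// without needing a shell in the image.
func expandArgs(args []string) []string {
    out := make([]string, len(args))
    for i, a := range args {
        out[i] = os.ExpandEnv(a)
    }
    return out
}

func main() {
    fmt.Println(expandArgs(os.Args[1:]))
}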

Health probe causing "broken pipe"

Hi,
I recently started using the health probe and noticed our server (Java gRPC) constantly logging an exception with 'Connection reset by peer'.
I dug into the reason, and I'm led to believe the probe is doing something funky with the connection.
Here is a demo server implementation https://gist.github.com/filaruina/1961a2f0f823bef95a23fa4b76b3e6e6
As you can see this only contains the healthcheck and reflection.
When I run this with debug level logging, I see an error from Netty like this:

11:49:24.318 [grpc-nio-worker-ELG-3-5] DEBUG i.g.n.s.i.n.h.c.h.Http2ConnectionHandler - [id: 0x0e281517, L:/0:0:0:0:0:0:0:1:5990 ! R:/0:0:0:0:0:0:0:1:51966] Sending GOAWAY failed: lastStreamId '1', errorCode '2', debugData 'Connection reset by peer'. Forcing shutdown of the connection.
java.io.IOException: Broken pipe
	at java.base/sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
	at java.base/sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
	at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:182)
	at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:130)
	at java.base/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:496)

However, if I do a grpcurl like grpcurl -max-time 1 -plaintext localhost:5990 grpc.health.v1.Health/Check I don't get any kind of exception in the logs.
Similarly, I wrote a simple Java main that calls the healthcheck (https://gist.github.com/filaruina/11e7f30b1c388874fe5f7b84c24b5d5a) and don't get any exceptions on the server side.

This exception happens consistently when using the probe binary.

This seems similar, if not the same, to this issue #15

Using a private key with a password for tls

When using the flag -tls-client-key, is it possible to pass in a password? I want to know whether it is possible to use a password-protected private key with the probe.

Health probe sends invalid host header

When called with -addr=:5000 the health probe sends a host header of :5000.

./grpc_health_probe-linux-amd64 -addr=:5000 -v

:5000 isn't valid according to the HTTP/2 spec and some servers, like the .NET Kestrel server, reject the call.

Trace id "0HM3MTR9PCIAV:00000001": HTTP/2 stream error "PROTOCOL_ERROR". A Reset is being sent to the stream. Microsoft.AspNetCore.Connections.ConnectionAbortedException: Invalid Host header: ':5000'

When this happens the grpc_health_probe returns an INTERNAL error status.

When inferring the local computer, the health probe should send a host header of localhost:5000 instead.

How to use the value of status:SERVING

Hi, I have a use case where, if grpc_health_probe does not report status: SERVING, I need to collect core dumps from the service. For this I need to check whether the status value is "SERVING" or not. How can I achieve this with grpc_health_probe?
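
One option, rather than parsing the text output, is to rely on the documented exit codes: 0 means the RPC returned SERVING, anything else is a failure. A minimal Go sketch that wraps the probe this way (the binary path and address are illustrative):

package main

import (
    "fmt"
    "os/exec"
)

func main() {
    // Exit code 0 means the health RPC returned SERVING (see "Exit codes" above).
    cmd := exec.Command("/bin/grpc_health_probe", "-addr=:5000")
    if err := cmd.Run(); err != nil {
        fmt.Println("unhealthy:", err)
        // Trigger core dump collection here.
        return
    }
    fmt.Println("healthy: SERVING")
}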

ALTS support

Are there any plans for ALTS support, via flags, similar to how TLS is currently supported?

grpc-health-probe shuts down the gRPC connection non-gracefully

The probe process sends a TCP RST at the conclusion of the gRPC session instead of gracefully shutting down with a FIN/ACK exchange. This causes a lot of log noise in k8s pods, such as:

Feb 01, 2020 2:54:05 PM io.grpc.netty.shaded.io.grpc.netty.NettyServerTransport notifyTerminated
INFO: Transport failed
io.grpc.netty.shaded.io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer

Screen-shot of the Wireshark packet exchange (image omitted).

bad sumdb sum for v0.3.5

hugh@dev:/tmp$ docker run --rm -it -e GOPROXY=direct -e GOSUMDB=sum.golang.org  golang:1.15 go mod download -json github.com/grpc-ecosystem/grpc-health-probe@v0.3.5
{
        "Path": "github.com/grpc-ecosystem/grpc-health-probe",
        "Version": "v0.3.5",
        "Error": "github.com/grpc-ecosystem/[email protected]: verifying module: checksum mismatch\n\tdownloaded: h1:AUIBOe9C5KhBIMlk7HJSu21jV24b9wV5jL81VKQOOSA=\n\tsum.golang.org: h1:YBHZT8Xa+s1n5anej0w+ktRUQ/R/9EmjPR5i8MF3Xd0=\n\nSECURITY ERROR\nThis download does NOT match the one reported by the checksum server.\nThe bits may have been replaced on the origin server, or an attacker may\nhave intercepted the download attempt.\n\nFor more information, see 'go help module-auth'.\n",
        "Info": "/go/pkg/mod/cache/download/github.com/grpc-ecosystem/grpc-health-probe/@v/v0.3.5.info",
        "GoMod": "/go/pkg/mod/cache/download/github.com/grpc-ecosystem/grpc-health-probe/@v/v0.3.5.mod",
        "GoModSum": "h1:DKHSwzDRGBH0nXsMsmJhKPW4z447cNpPVuJYJ7/Qe6I="
}

Perhaps the tag was moved to a new commit after being cached?

The proxy also serves a different copy:

hugh@dev:/tmp$ docker run --rm -it -e GOPROXY=proxy.golang.org -e GOSUMDB=sum.golang.org  golang:1.15 go mod download -json github.com/grpc-ecosystem/grpc-health-probe@v0.3.5
{
        "Path": "github.com/grpc-ecosystem/grpc-health-probe",
        "Version": "v0.3.5",
        "Info": "/go/pkg/mod/cache/download/github.com/grpc-ecosystem/grpc-health-probe/@v/v0.3.5.info",
        "GoMod": "/go/pkg/mod/cache/download/github.com/grpc-ecosystem/grpc-health-probe/@v/v0.3.5.mod",
        "Zip": "/go/pkg/mod/cache/download/github.com/grpc-ecosystem/grpc-health-probe/@v/v0.3.5.zip",
        "Dir": "/go/pkg/mod/github.com/grpc-ecosystem/[email protected]",
        "Sum": "h1:YBHZT8Xa+s1n5anej0w+ktRUQ/R/9EmjPR5i8MF3Xd0=",
        "GoModSum": "h1:DKHSwzDRGBH0nXsMsmJhKPW4z447cNpPVuJYJ7/Qe6I="
}

gosum.io seems to not have the issue

hugh@dev:/tmp$ docker run --rm -it -e GOPROXY=direct -e GOSUMDB=gosum.io+ce6e7565+AY5qEHUk/qmHc5btzW45JVoENfazw8LielDsaI+lEbq6  golang:1.15 go mod download -json github.com/grpc-ecosystem/grpc-health-probe@v0.3.5
{
        "Path": "github.com/grpc-ecosystem/grpc-health-probe",
        "Version": "v0.3.5",
        "Info": "/go/pkg/mod/cache/download/github.com/grpc-ecosystem/grpc-health-probe/@v/v0.3.5.info",
        "GoMod": "/go/pkg/mod/cache/download/github.com/grpc-ecosystem/grpc-health-probe/@v/v0.3.5.mod",
        "Zip": "/go/pkg/mod/cache/download/github.com/grpc-ecosystem/grpc-health-probe/@v/v0.3.5.zip",
        "Dir": "/go/pkg/mod/github.com/grpc-ecosystem/[email protected]",
        "Sum": "h1:AUIBOe9C5KhBIMlk7HJSu21jV24b9wV5jL81VKQOOSA=",
        "GoModSum": "h1:DKHSwzDRGBH0nXsMsmJhKPW4z447cNpPVuJYJ7/Qe6I="
}

A bump in version to 0.3.6 might be a quick fix.

Thanks

grpc: transport credentials are set for an insecure connection (grpc.WithTransportCredentials() and grpc.WithInsecure() are both called)

#63 (release v0.4.0) breaks grpc-health-probe for commands that use --tls but do not use --spiffe, e.g. grpc-health-probe -addr=:3000 -tls -tls-client-cert=/etc/tls/tls.crt -tls-client-key=/etc/tls/tls.key -tls-ca-cert=/etc/tls/tls.ca:

  Warning  Unhealthy  31s (x7 over 91s)   kubelet            Liveness probe failed: error: failed to connect service at ":3000": grpc: transport credentials are set for an insecure connection (grpc.WithTransportCredentials() and grpc.WithInsecure() are both called)

The issue is that you can't set both grpc.WithTransportCredentials() (ref) and grpc.WithInsecure() (ref).

Add ability to split out "readiness" and "liveness" checks

Hi,

I am implementing the gRPC health probe for our containers running in Kubernetes and would like to have a way of specifying whether I am checking for "liveness" or "readiness".

Specifically, if the service returns "NOT_SERVING", I would read that as "live" but not "ready".

This enables graceful startup (where the application is "live" while starting up, but not yet "ready" to receive traffic) and also graceful shutdown (where the application is in the process of shutting down).

One way of implementing the above might be to add a "-liveness-check" argument flag which returns "0" in the case that the service responds in a timely and correct manner with the "NOT_SERVING" status.

Running the command without the flag would preserve the current behaviour.
