
lua-resty-upstream-healthcheck's Introduction

Name

OpenResty - Turning Nginx into a Full-Fledged Scriptable Web Platform

Table of Contents

Description

OpenResty is a full-fledged web application server that bundles the standard nginx core, many 3rd-party nginx modules, and most of their external dependencies.

This bundle is maintained by Yichun Zhang (agentzh).

Because most of the nginx modules are developed by the bundle maintainers, we can ensure that all these modules play well together.

The bundled software components are copyrighted by the respective copyright holders.

The homepage for this project is on openresty.org.

For Users

Visit the download page on the openresty.org web site to download the latest bundle tarball, and follow the installation instructions on the installation page.

For Bundle Maintainers

The bundle's source is at the following git repository:

https://github.com/openresty/openresty

To reproduce the bundle tarball, just do

make

at the top of the bundle source tree.

Please note that you may need to install some extra dependencies, like perl, dos2unix, and mercurial. On Fedora 22, for example, installing the dependencies is as simple as running the following commands:

sudo dnf install perl dos2unix mercurial

Back to TOC

Additional Features

In addition to the standard nginx core features, this bundle also supports the following:

Back to TOC

resolv.conf parsing

syntax: resolver address ... [valid=time] [ipv6=on|off] [local=on|off|path]

default: -

context: http, stream, server, location

Similar to the resolver directive in the standard nginx core, with additional support for parsing resolvers from the resolv.conf file format.

When local=on, the standard path of /etc/resolv.conf will be used. You may also specify an arbitrary path to be used for parsing, for example: local=/tmp/test.conf.

When local=off, parsing will be disabled (this is the default).

This feature is not available on Windows platforms.
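For illustration, a minimal configuration sketch combining an explicit resolver with the parsed system resolvers (the address and timing values are placeholders, and this assumes local= may be used alongside explicit addresses as described above):

http {
    # use 8.8.8.8 plus whatever nameservers are listed in /etc/resolv.conf
    resolver 8.8.8.8 valid=30s ipv6=off local=on;
}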

Back to TOC

Mailing List

You're very welcome to join the English OpenResty mailing list hosted on Google Groups:

https://groups.google.com/group/openresty-en

The Chinese mailing list is here:

https://groups.google.com/group/openresty

Back to TOC

Report Bugs

You're very welcome to report issues on GitHub:

https://github.com/openresty/openresty/issues

Back to TOC

Copyright & License

The bundle itself is licensed under the 2-clause BSD license.

Copyright (c) 2011-2019, Yichun "agentzh" Zhang (章亦春) [email protected], OpenResty Inc.

This module is licensed under the terms of the BSD license.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Back to TOC

lua-resty-upstream-healthcheck's People

Contributors

agentzh, bungle, chipitsine, flombardi, freggy, jonasbadstuebner, membphis, saaldjormike, szelcsanyi, thibaultcha, tieske, xiaocang, yueziii, zhousoft, zhuizhuhaomeng


lua-resty-upstream-healthcheck's Issues

How about adding a function for getting a specified peer's status

  • my scenario:
    when the upstream is unhealthy, I need to get a specific peer's status (e.g. DOWN) and perform some suspension work based on it, rather than only showing the peer status via the status_page() function.

  • workaround:
    so I have added a get_peer_status() function to fetch a specific peer's status (a sketch of such a helper is shown below),
    and I'd like to share it with everybody who comes across the same scenario.
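A minimal sketch of such a helper, built only on the ngx.upstream API that this module already relies on (the get_peer_status name and its return convention are the issue author's idea, not part of the released module; only primary peers are inspected here):

local upstream = require "ngx.upstream"

-- hypothetical helper: return "DOWN" or "up" for the named peer of the
-- given upstream block, or nil plus an error message if it cannot be found
local function get_peer_status(upstream_name, peer_name)
    local peers, err = upstream.get_primary_peers(upstream_name)
    if not peers then
        return nil, err
    end
    for _, peer in ipairs(peers) do
        if peer.name == peer_name then
            return peer.down and "DOWN" or "up"
        end
    end
    return nil, "peer not found"
end

-- usage: local st, err = get_peer_status("foo.com", "127.0.0.1:8080")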

Routing requests to all peers when they are all down

Sometimes the health checker configuration gets out of sync. For example, a new version of the application may be deployed on the backend servers without updating the health checks on the balancer. In this case all peers will be switched to the 'DOWN' state and nginx will return a 502 error for incoming requests.
If the balancer detects such a situation, it could simply fall back to routing requests to all peers instead.
In theory, if all peers are really down it does not matter much who shows the error message, since we have real problems with all our backend servers anyway.

It is a kind of fool-proofing. The AWS Application Load Balancer, for example, applies this behavior by default.

I have already made a fork and added this feature. You can have a look here: https://github.com/eglinux/lua-resty-upstream-healthcheck
So, if you find this approach acceptable, I can make a pull request.
Thanks!

PR: add upstream name to log message to distinguish among multiple upstreams with same server

Would it be possible to add the upstream name to the log message, to distinguish among multiple upstreams containing the same server?
Namely, change the current message from

healthcheck: failed to receive status line from 10.0.0.1:80:

to something like

healthcheck: failed to receive status line from 10.0.0.1:80: for upstream backend

Might it be easier to add an error format string as a healthcheck field, so we can configure which fields to log?

This is a follow-up to the issue.

Thank you

Health check failure

Hello, I am using version 0.04. Below is my nginx health check configuration:

ok, err = hc.spawn_checker{
    shm = "healthcheck",
    upstream = "dlv-business-service",
    type = "http",
    http_req = "GET /dlv-business-service HTTP/1.0\r\nHost: dlv-business-service\r\n\r\n",
    interval = 5000,
    timeout = 3000,
    fall = 5,
    rise = 3,
    valid_statuses = {200, 302, 304},
    concurrency = 10,
}

One node in the upstream was abnormal, but the lua-resty-upstream-healthcheck module still reported it as healthy; only when I completely stopped the application on that node did the check result become abnormal.

In short, it looks as if the lua-resty-upstream-healthcheck module only performs an IP/port check rather than a real HTTP check. This situation occurs occasionally; what could be causing it?

Timers "leaking" ?

Hi,
On several of our servers we use the "healthcheck" module; unfortunately something is wrong, because after a while these messages start to appear:
"failed to create timer: too many pending timers".
I know it's a standard error message, but it seemed strange to me, so I decided to check what the situation was like.
It looks like a few timers appear very quickly (although I have an oddity here in that the number is negative), but the number of pending timers keeps increasing up to the configured maximum. Increasing the maximum amount doesn't help; saturation just takes a little more time.
After about an hour, the logs contain values like this:

Current running timers -2129 (??!!!)
Current pending timers 4096
failed to spawn health checker: failed to create timer: too many pending timers,

I do not think it is possible for the healthcheck to take so long that it blocks timers; unfortunately my Lua skills are a bit too limited to debug it properly.
Is this a known error? Can it be avoided somehow?
Any help / insights would be very appreciated.

Our config looks like this:

       local hc = require "resty.upstream.healthcheck"
 
        local ok, err = hc.spawn_checker{
            shm = "healthcheck1",
            upstream = "application_karaf",
            type = "http",
            http_req = "GET /tenant/health HTTP/1.0\r\nHost: localhost\r\n\r\n",
            interval = 3000, timeout = 1500, fall = 3, rise = 2,
            valid_statuses = {200, 302},
            concurrency = 10,
        }
        if not ok then
            ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
            ngx.log(ngx.ERR, "Current penging timers ", ngx.timer.pending_count())
            ngx.log(ngx.ERR, "Current runing timers ", ngx.timer.running_count())
            return
        end

Test case failures on rhel 7.6 ppc64le platform

Hi All,

I have built the nginx binary on RHEL 7.6 ppc64le (version 1.17.1.1rc0) from the source code at https://github.com/openresty/openresty.
Please note that I copied and used ppc64le-compiled LuaJIT code while building openresty (nginx).
Below is the command I used to compile openresty:

./configure --with-cc-opt="-DNGX_LUA_USE_ASSERT -DNGX_LUA_ABORT_AT_PANIC" --with-http_image_filter_module --with-http_dav_module --with-http_auth_request_module --with-poll_module --with-stream --with-stream_ssl_module --with-stream_ssl_preread_module --with-http_ssl_module --with-http_iconv_module --with-http_drizzle_module --with-http_postgres_module --with-http_addition_module --add-module=/usr/openresty/openresty_test_modules/nginx-eval-module --add-module=/usr/openresty/openresty_test_modules/replace-filter-nginx-module

And then I tried to execute the test cases for 'lua-resty-upstream-healthcheck' like below:

[root]# pwd
/usr/openresty/openresty/openresty-1.17.1.1rc0/build/lua-resty-upstream-healthcheck-0.06
[root]# prove -r t

NOTE: The 'lua-resty-upstream-healthcheck' module version used was 0.06, which was downloaded with the openresty bundle.

But I am getting the kind of repeated errors shown below:

	[root  lua-resty-upstream-healthcheck-0.06]#
	[root  lua-resty-upstream-healthcheck-0.06]# prove -r t/
	t/sanity.t .. 1/99
	#   Failed test 'TEST 7: peers version upgrade (make up peers down) - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   'warn(): healthcheck: peer 127.0.0.1:12355 is turned up after 2 success(es)
	# warn(): healthcheck: peer 127.0.0.1:12356 is turned up after 2 success(es)
	# '
	#     doesn't match '(?^:^upgrading peers version to 1
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# warn\(\): healthcheck: peer 127\.0\.0\.1:12355 is turned up after 2 success\(es\)
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# warn\(\): healthcheck: peer 127\.0\.0\.1:12356 is turned up after 2 success\(es\)
	# publishing peers version 2
	# (?:healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# ){2,4}$)'
	TEST 15: peers > concurrency - WARNING: killing the child process 12933 with force... at /usr/local/share/perl5/Test/Nginx/Util.pm line 609.
	t/sanity.t .. 8/99
	#   Failed test 'TEST 15: peers > concurrency - response_body_like - response is expected (Upstream foo.com Primary Peers 127.0.0.1:12354 up 127.0.0.1:12355 up 127.0.0.1:12356 up 127.0.0.1:12357 up 127.0.0.1:12358 up Backup Peers 127.0.0.1:12359 up)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1635.
	#                   'Upstream foo.com
	#     Primary Peers
	#         127.0.0.1:12354 up
	#         127.0.0.1:12355 up
	#         127.0.0.1:12356 up
	#         127.0.0.1:12357 up
	#         127.0.0.1:12358 up
	#     Backup Peers
	#         127.0.0.1:12359 up
	# '
	#     doesn't match '(?^s:Upstream foo.com
	#     Primary Peers
	#         127.0.0.1:12354 DOWN
	#         127.0.0.1:12355 \S+
	#         127.0.0.1:12356 \S+
	#         127.0.0.1:12357 \S+
	#         127.0.0.1:12358 \S+
	#     Backup Peers
	#         127.0.0.1:12359 \S+
	# )'
	t/sanity.t .. 10/99
	#   Failed test 'TEST 15: peers > concurrency - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   'healthcheck: failed to receive status line from 127.0.0.1:12354
	# '
	#     doesn't match '(?^:^healthcheck: peer 127\.0\.0\.1:12357 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12358 was checked to be not ok
	# healthcheck: failed to receive status line from 127\.0\.0\.1:12354
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12359 was checked to be not ok
	# $)'
	t/sanity.t .. 14/99
	#   Failed test 'TEST 9: concurrency == 2 (odd number of peers) - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   ''
	#     doesn't match '(?^:^(?:spawn a thread checking primary peers 0 to 2
	# check primary peers 3 to 4
	# check backup peer 0
	# ){4,6}$)'
	t/sanity.t .. 25/99
	#   Failed test 'TEST 3: health check (bad case), no listening port in a primary peer - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   'warn(): healthcheck: peer 127.0.0.1:12355 is turned down after 2 failure(s)
	# '
	#     doesn't match '(?^:^healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be not ok
	# warn\(\): healthcheck: peer 127\.0\.0\.1:12355 is turned down after 2 failure\(s\)
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# (?:healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# ){2,4}$)'
	t/sanity.t .. 31/99
	#   Failed test 'TEST 5: health check (bad case), timed out - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   'warn(): healthcheck: peer 127.0.0.1:12354 is turned down after 2 failure(s)
	# '
	#     doesn't match '(?^:^healthcheck: peer 127\.0\.0\.1:12354 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be not ok
	# warn\(\): healthcheck: peer 127\.0\.0\.1:12354 is turned down after 2 failure\(s\)
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# (?:healthcheck: peer 127\.0\.0\.1:12354 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# ){0,2}$)'

	#   Failed test 'TEST 8: peers version upgrade (make down peers up) - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   'warn(): healthcheck: peer 127.0.0.1:12354 is turned down after 2 failure(s)
	# '
	#     doesn't match '(?^:^upgrading peers version to 1
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be not ok
	# warn\(\): healthcheck: peer 127\.0\.0\.1:12354 is turned down after 2 failure\(s\)
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# publishing peers version 2
	# (?:healthcheck: peer 127\.0\.0\.1:12354 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# ){3,5}$)'
	t/sanity.t .. 43/99
	#   Failed test 'TEST 4: health check (bad case), bad status - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   'healthcheck: bad status code from 127.0.0.1:12355: 404
	# healthcheck: bad status code from 127.0.0.1:12355: 404
	# warn(): healthcheck: peer 127.0.0.1:12355 is turned down after 2 failure(s)
	# '
	#     doesn't match '(?^:^healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: bad status code from 127\.0\.0\.1:12355: 404
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: bad status code from 127\.0\.0\.1:12355: 404
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be not ok
	# warn\(\): healthcheck: peer 127\.0\.0\.1:12355 is turned down after 2 failure\(s\)
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# (?:healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# ){1,4}$)'
	t/sanity.t .. 48/99
	#   Failed test 'TEST 11: health check (good case), status ignored by default - tcp_query ok'
	#   at /usr/local/share/perl5/Test/Nginx/Util.pm line 188.
	#          got: ''
	#     expected: 'GET /status HTTP/1.0
	# Host: localhost
	#
	# '
	t/sanity.t .. 60/99
	#   Failed test 'TEST 1: health check (good case), status ignored by default - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   ''
	#     doesn't match '(?^:^healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# (?:healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# ){3,5}$)'

	#   Failed test 'TEST 6: health check (bad case), bad status, and then rise again - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   'healthcheck: bad status code from 127.0.0.1:12355: 403
	# warn(): healthcheck: peer 127.0.0.1:12355 is turned down after 1 failure(s)
	# warn(): healthcheck: peer 127.0.0.1:12355 is turned up after 2 success(es)
	# '
	#     doesn't match '(?^:^healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: bad status code from 127\.0\.0\.1:12355: 403
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be not ok
	# warn\(\): healthcheck: peer 127\.0\.0\.1:12355 is turned down after 1 failure\(s\)
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# publishing peers version 1
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# warn\(\): healthcheck: peer 127\.0\.0\.1:12355 is turned up after 2 success\(es\)
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# publishing peers version 2
	# (?:healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# ){1,3}$)'
	t/sanity.t .. 71/99
	#   Failed test 'TEST 10: concurrency == 3 (odd number of peers) - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   ''
	#     doesn't match '(?^:^(?:spawn a thread checking primary peer 0
	# spawn a thread checking primary peer 1
	# check primary peer 2
	# check backup peer 0
	# ){4,6}$)'
	t/sanity.t .. 86/99
	#   Failed test 'TEST 2: health check (bad case), no listening port in the backup peer - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   'warn(): healthcheck: peer 127.0.0.1:12356 is turned down after 2 failure(s)
	# '
	#     doesn't match '(?^:^healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be not ok
	# warn\(\): healthcheck: peer 127\.0\.0\.1:12356 is turned down after 2 failure\(s\)
	# publishing peers version 1
	# (?:healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be not ok
	# ){2,4}$)'

	#   Failed test 'TEST 14: health check with ipv6 backend (good case), status ignored by default - response_body - response is expected (repeated req 0, req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1589.
	# @@ -3,6 +3,6 @@
	#          127.0.0.1:12354 up
	#          [::1]:12355 up
	#      Backup Peers
	# -        [0:0::1]:12356 up
	# +        [::1]:12356 up
	#  upstream addr: 127.0.0.1:12354
	#  upstream addr: [::1]:12355
	t/sanity.t .. 94/99
	#   Failed test 'TEST 14: health check with ipv6 backend (good case), status ignored by default - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   ''
	#     doesn't match '(?^:^healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer \[::1\]:12355 was checked to be ok
	# healthcheck: peer \[0:0::1\]:12356 was checked to be ok
	# (?:healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer \[::1\]:12355 was checked to be ok
	# healthcheck: peer \[0:0::1\]:12356 was checked to be ok
	# ){3,7}$)'
	# Looks like you failed 15 tests of 99.
	t/sanity.t .. Dubious, test returned 15 (wstat 3840, 0xf00)
	Failed 15/99 subtests

	Test Summary Report
	-------------------
	t/sanity.t (Wstat: 3840 Tests: 99 Failed: 15)
	  Failed tests:  3, 9-10, 16, 27, 33, 39, 45, 57, 60, 68
					79, 88, 93-94
	  Non-zero exit status: 15
	Files=1, Tests=99, 37 wallclock secs ( 0.04 usr  0.01 sys +  0.60 cusr  0.25 csys =  0.90 CPU)
	Result: FAIL
	[root  lua-resty-upstream-healthcheck-0.06]#

Please help suggest whether I need to export any specific environment variables, set up any additional service, try any compiler flag, or somehow increase a timeout value to make these test cases pass.

nginx version (compiled with libdrizzle 1.0, and with radius, mariadb, and postgresql services set up):

# nginx -V
nginx version: openresty/1.17.1.1rc0
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC)
built with OpenSSL 1.0.2k-fips  26 Jan 2017
TLS SNI support enabled
configure arguments: --prefix=/usr/local/openresty/nginx --with-cc-opt='-O2 -DNGX_LUA_USE_ASSERT -DNGX_LUA_ABORT_AT_PANIC' --add-module=../ngx_devel_kit-0.3.1rc1 --add-module=../iconv-nginx-module-0.14 --add-module=../echo-nginx-module-0.61 --add-module=../xss-nginx-module-0.06 --add-module=../ngx_coolkit-0.2 --add-module=../set-misc-nginx-module-0.32 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.08 --add-module=../drizzle-nginx-module-0.1.11 --add-module=../ngx_postgres-1.0 --add-module=../srcache-nginx-module-0.31 --add-module=../ngx_lua-0.10.15 --add-module=../ngx_lua_upstream-0.07 --add-module=../headers-more-nginx-module-0.33 --add-module=../array-var-nginx-module-0.05 --add-module=../memc-nginx-module-0.19 --add-module=../redis2-nginx-module-0.15 --add-module=../redis-nginx-module-0.3.7 --add-module=../rds-json-nginx-module-0.15 --add-module=../rds-csv-nginx-module-0.09 --add-module=../ngx_stream_lua-0.0.7 --with-ld-opt=-Wl,-rpath,/usr/local/openresty/luajit/lib --with-http_image_filter_module --with-http_dav_module --with-http_auth_request_module --with-poll_module --with-stream --with-stream_ssl_module --with-stream_ssl_preread_module --with-http_ssl_module --with-http_addition_module --add-module=/usr/openresty/openresty_test_modules/nginx-eval-module --add-module=/usr/openresty/openresty_test_modules/replace-filter-nginx-module --with-stream --with-stream_ssl_preread_module

unix socket upstream

Hi! Kindly asking, since I failed to find any mention of unix sockets like unix:/tmp/forwarded.sock (without a port) being supported in upstreams.

(You can send plain HTTP requests to them, since they are essentially just tcp sockets.)

Can anyone confirm and possibly update the docs?

Update:

it looks like this code is exactly for unix sockets, but I'm not sure.
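For reference, referring to a unix domain socket in an upstream uses the standard nginx server syntax (the upstream name below is a placeholder):

upstream forwarded_backend {
    # no port here: nginx connects to the local unix socket instead of TCP
    server unix:/tmp/forwarded.sock;
}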

multiple stream group problem

What is the maximum number of checkers that healthcheck supports? We have 20+ upstream groups, but we have found we can only have 8 checkers; when we add the 9th checker, nginx can't start and reports the error: missing '}' character, although our config doesn't miss any '}'.

one domain returns 502

Many upstreams work, but this domain returns 502.
How can I debug this?
The conf is like this:
local ok, err = hc.spawn_checker{
shm = 'healthcheck', -- defined by 'lua_shared_dict'
upstream = 'test', -- defined by 'upstream'
type = 'http',

        http_req = 'GET /status HTTP/1.0\r\nHost: healthcheck.nxin.com\r\n\r\n',
                -- raw HTTP request for checking

        interval = 2000,  -- run the check cycle every 2 sec
        timeout = 1000,   -- 1 sec is the timeout for network operations
        fall = 3,  -- # of successive failures before turning a peer down
        rise = 2,  -- # of successive successes before turning a peer up
        valid_statuses = {200, 302, 301,401,402,403, 404},  -- a list valid HTTP status code
        concurrency = 10,  -- concurrency level for test requests
    }

The request response is below:

curl 10.21.1.11:8080/status -v

* About to connect() to 10.21.1.11 port 8080 (#0)
*   Trying 10.21.1.11...
* Connected to 10.21.1.11 (10.21.1.11) port 8080 (#0)

GET /status HTTP/1.1
User-Agent: curl/7.29.0
Host: 10.221.14.110:8080
Accept: */*

< HTTP/1.1 302
< Location: https:

attempt to index field 're' (a nil value)

error.log

2016/08/15 14:06:34 [error] 4566#0: init_worker_by_lua error: /usr/local/openresty/lualib/resty/upstream/healthcheck.lua:9: attempt to index field 're' (a nil value)
stack traceback:
    /usr/local/openresty/lualib/resty/upstream/healthcheck.lua:9: in main chunk
    [C]: in function 'require'
    init_worker_by_lua:2: in main chunk

Am I missing some lua dependencies?

Information returned by the v1/healthcheck endpoint

Usage scenario: used as a layer-7 LB in place of Nginx.
Current problem: all upstreams are configured with active checks (TCP checks), but GET v1/healthcheck only returns the status information for one of the upstream groups.
Requirement: something similar to Nginx's check_status page, to view the status of all upstreams in real time.
(screenshots of the upstream configuration pages and the endpoint response omitted)

Support for specifying custom port

Would it be possible to support the following?

type = "http",
port = 80

This would be a reasonable workaround for those needing support for SSL backends: as long as they also listen on port 80, it's a good enough health check for me, and it should be much easier to implement than adding type = "https".
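A hedged sketch of what the requested usage could look like in spawn_checker (the port field here is the proposed feature, not necessarily available in the installed version of the module):

local ok, err = hc.spawn_checker{
    shm = "healthcheck",
    upstream = "foo.com",
    type = "http",
    port = 80,  -- proposed: probe this port instead of each peer's own port
    http_req = "GET /status HTTP/1.0\r\nHost: foo.com\r\n\r\n",
    interval = 2000,
    timeout = 1000,
    fall = 3,
    rise = 2,
    valid_statuses = {200},
    concurrency = 10,
}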

openresty configuration upstream healthcheck nginx core dump

os version: Centos 6.5 x86_64 2.6.32-431.29.2.el6
openresty version : 1.9.7.1
config file path : /usr/local/openresty/nginx/conf/healthcheck.conf

lua_shared_dict healthcheck 10m;

lua_socket_log_errors off;

init_worker_by_lua_block {
    local hc = require "resty.upstream.healthcheck"

    local ok, err = hc.spawn_checker{
        shm = "healthcheck",  -- defined by "lua_shared_dict"
        upstream = "test", -- defined by "upstream"
        type = "http",

        http_req = "GET /  HTTP/1.0\r\nHost: test\r\n\r\n",

        interval = 2000,  -- run the check cycle every 2 sec
        timeout = 1000,   -- 1 sec is the timeout for network operations
        fall = 3,  -- # of successive failures before turning a peer down
        rise = 2,  -- # of successive successes before turning a peer up
        valid_statuses = {200, 302},  -- a list valid HTTP status code
        concurrency = 10,  -- concurrency level for test requests
    }
    if not ok then
        ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
        return
    end

}

The file is included from the configuration in the vhosts directory,
e.g.
include /usr/local/openresty/nginx/conf/healthcheck.conf

server {
..........
}

After reloading, nginx error.log reports core dumps:

2016/03/17 14:09:38 [alert] 32936#0: worker process 27057 exited on signal 11 (core dumped)
2016/03/17 14:09:40 [alert] 32936#0: worker process 27050 exited on signal 11 (core dumped)
2016/03/17 14:09:43 [alert] 32936#0: worker process 27059 exited on signal 11 (core dumped)
2016/03/17 14:09:44 [alert] 32936#0: worker process 27060 exited on signal 11 (core dumped)
2016/03/17 14:09:46 [alert] 32936#0: worker process 27061 exited on signal 11 (core dumped)

invalid http_req introduces 400 code

I introduced this module into my project but got a 400 error due to an invalid http_req.

    local ok, err = hc.spawn_checker{
        shm = "healthcheck",  -- defined by "lua_shared_dict"
        upstream = "foo.com", -- defined by "upstream"
        type = "http",

        -- does the http_req have a bug?
        -- the format of the http_req field may be the problem? At least here I always get a 400 code.
        http_req = "GET /status HTTP/1.0\r\nHost: foo.com\r\n\r\n",
                -- raw HTTP request for checking

        interval = 2000,  -- run the check cycle every 2 sec
        timeout = 1000,   -- 1 sec is the timeout for network operations
        fall = 3,  -- # of successive failures before turning a peer down
        rise = 2,  -- # of successive successes before turning a peer up
        valid_statuses = {200, 302},  -- a list valid HTTP status code
        concurrency = 10,  -- concurrency level for test requests
    }

Everything is OK if I update the http_req as below:
http_req = "GET /status HTTP/1.0\r\n\r\nHost: foo.com",

Please advise.
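For reference, a raw HTTP/1.0 request terminates its header section with a blank line, so the final \r\n\r\n belongs after the Host header; a minimal sketch of a well-formed http_req value built up piece by piece:

-- request line, then each header, then an empty line ending the header section
http_req = "GET /status HTTP/1.0\r\n"
        .. "Host: foo.com\r\n"
        .. "\r\n"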

lua entry thread aborted: runtime error: string length overflow stack traceback

Using curl to test:

curl http://127.0.0.1/status
curl: (52) Empty reply from server


and the log is:


2016/03/02 13:50:01 [error] 18864#0: *150 lua entry thread aborted: runtime error: string length overflow
stack traceback:
coroutine 0:
    [C]: in function 'get_primary_peers'
    /etc/nginx/lualib/resty/upstream/healthcheck.lua:682: in function 'status_page'
    content_by_lua(default.conf:102):4: in function <content_by_lua(default.conf:102):1>, client: 127.0.0.1, server: xxx, request: "GET /status HTTP/1.1", host: "127.0.0.1"


cat /etc/nginx/nginx.conf

user  nginx;
worker_processes  32;

#error_log  logs/error.log;
#error_log  logs/error.log  notice;
#error_log  logs/error.log  info;

#pid        logs/nginx.pid;
error_log  /home/ceph/log/nginx/error.log;
pid        /var/run/nginx.pid;
worker_rlimit_nofile 65535;



events {
    worker_connections  20000;
}


http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;
    vhost_traffic_status_zone;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent $request_length "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    #access_log  logs/access.log  main;
    access_log  /home/ceph/log/nginx/access.log  main;


    sendfile        on;
    #tcp_nopush     on;

    #keepalive_timeout  0;
    keepalive_timeout  30;

upstream foo.com {
    server 127.0.0.1:801;
    server 192.168.170.1:80;
}

lua_shared_dict healthcheck 1m;
lua_socket_log_errors off;
init_worker_by_lua_block {
    local hc = require "resty.upstream.healthcheck"
    local ok, err = hc.spawn_checker({
        shm = "healthcheck",
        upstream = "foo.com",
        type = "http",
        http_req = "GET / HTTP/1.1\r\nHost: 127.0.0.1\r\n\r\n",
        interval = 2000,
        timeout = 1000,
        fall = 3,
        rise = 2,
        valid_statuses = {200, 302},
        concurrency = 10,
        })
        if not ok then
            ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
            return
        end
    }

    include /etc/nginx/conf.d/*.conf;
}


cat /etc/nginx/conf.d/default.conf

server {
    listen       80 backlog=10240;
    gzip off;
    client_max_body_size 0;
    server_name  xxx;



    location  / {
    set $target '';
    proxy_buffering off;
    proxy_ignore_client_abort on ;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_pass http://foo.com;
    }

    location = /status {
        access_log off;
        default_type text/plain;
        content_by_lua_block {
            local hc = require "resty.upstream.healthcheck"
            ngx.say("Nginx Worker PID: ", ngx.worker.pid())
            ngx.print(hc.status_page())
            }
    }
}


healthcheck behind proxy

Thanks for the great module.
I'm using the newest OpenResty version, behind a proxy. I configured your module and it's working, but the status is down:

Upstream test
    Primary Peers
        10.1.11.12:222 DOWN
        10.1.11.12:222 DOWN
        10.1.11.13:222 DOWN
    Backup Peers

no request object found?

I am using openresty version 1.13.62, and I installed two openresty machines to do the testing. I followed the same test code as in lua-resty-upstream-healthcheck, but the following error occurred. Please help; this is the first time I have come into contact with openresty, thanks!

problem: (screenshot omitted)

this is what I wrote in the init by lua block phase: (screenshot omitted)

nginx.conf: (screenshots omitted)

Feature request: support dynamic upstream

Hi,

The current usage scenario assumes that the upstream configuration will not change dynamically; we can only change the upstream configuration by reloading nginx.

In some scenarios (see also #55), we want to modify the upstream configuration dynamically without reloading nginx, and the current design cannot satisfy this. So can we add a parameter to hc.spawn_checker to support a dynamic upstream mode?

Case 1: support upstream modification only

In this case, just add a parameter to hc.spawn_checker:

init_worker_by_lua_block {
    local us, err = get_upstreams()
    if not us then
        return
    end

    for _, u in ipairs(us) do
        local ok, err = hc.spawn_checker{
            shm = "healthcheck",
            type = "http",
            upstream = u,

            dynamic = true, -- enable dynamic upstream mode
        }
    end
}

Case 2: support upstream add/delete/modify

In this case, we should add a timer to watch the global upstream configuration:

init_worker_by_lua_block {
    require "resty.core"

    local upstream = require "ngx.upstream"
    local hc = require "resty.upstream.healthcheck"

    local get_upstreams = upstream.get_upstreams

    local watch
    watch = function (premature)
        if premature then
            return
        end

        local us, err = get_upstreams()
        if not us then
            return
        end

        for _, u in ipairs(us) do
            local ok, err = hc.spawn_checker{
                shm = "healthcheck",
                type = "http",
                upstream = u,

                dynamic = true, -- enable dynamic upstream mode
            }
        end

        local ok, err = ngx.timer.at(2, watch)
    end

    local ok, err = ngx.timer.at(2, watch) -- create a timer to watch global upstream configuration every 2s.
}

A dynamic upstream mode checker would behave as follows:

  • When spawn_checker is called, if no checker exists for the upstream yet, a new checker is created; otherwise the call is ignored. (case 1)
  • At any time, when an upstream is deleted, its checker exits automatically. (case 1)
  • At any time, when an upstream is modified (detected by comparing the peers' md5 digest), the checker updates its peers automatically. (case 1 + case 2)

Set a different upstream manager module

The module currently relies on the ngx.upstream module for managing the upstreams.

Unfortunately we cannot use that module for our use case. See the discussion here.

Would you accept a PR to set the module explicitly, making it configurable, so I won't have to fork the code?

If so, should it be a setting at the module level (in an upvalue, shared by all spawned checkers), or a setting per spawned checker? IMHO the latter, but please share your thoughts.
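A hedged sketch of the per-checker variant being discussed (the upstream_manager option and the my.upstream.manager module are hypothetical, shown only to illustrate the shape of the setting):

local hc = require "resty.upstream.healthcheck"

local ok, err = hc.spawn_checker{
    shm = "healthcheck",
    upstream = "foo.com",
    type = "http",
    -- hypothetical option: any module exposing the same interface as ngx.upstream
    upstream_manager = require "my.upstream.manager",
    -- ...plus the usual http_req, interval, timeout, etc.
}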

NO checkers problem?

openresty version is 1.11.2.2
Nginx A and nginx B have the same upstream contents, e.g.:
A:
upstream app_cluster {
ip_hash;
server 192.168.0.100:9082;
server 192.168.0.100:9083;
}
B:
upstream proxy_AB {
ip_hash;
server 192.168.0.100:9082;
server 192.168.0.100:9083;
}
After shutting down the application servers on ports 9082 and 9083, nginx A shows the correct check result:
Nginx Worker PID: 21127
Upstream app_cluster
Primary Peers
192.168.0.100:9082 DOWN
192.168.0.100:9083 DOWN
Backup Peers

but nginx B shows "NO checkers", as follows:
Nginx Worker PID: 31735
Upstream proxy_AB (NO checkers)
Primary Peers
192.168.0.100:9082 DOWN
192.168.0.100:9083 DOWN
Backup Peers

Is the result on nginx B correct?

Init error using config as given in synopsis section of README

The error msg is [error] 9595#0: init_worker_by_lua error: init_worker_by_lua:11: function arguments expected near '.'

And it seems like it's caused by this comment line: -- then you should write this instead: http_req = "GET /status HTTP/1.0\r\nHost: foo.com\r\n\r\n",. I think it's a trivial issue, but I failed to fix it :-(, as I'm new to both OpenResty and Lua.

checking response body as part of healthcheck?

Hello,

Is there a way to check the contents of the response body in addition to the list of valid statuses? For example, the ability to fail in a situation where the /healthCheck response code is HTTP 200 but the body contains 'failure'.

Thank you!

Support for upstreams that use https.

This doesn't seem to support upstreams that require SSL connections. Are there any plans to expand this beyond pure HTTP tests? Or any suggestions on how I could work around this? I'm fairly new to Lua/OpenResty, so I may be overlooking something obvious.

Thanks!

Test Case failures

Test case 14 fails to execute.

Error: (screenshot omitted)
The IPv6 format is inconsistent.

The Linux kernel simplifies the IPv6 address; can we modify this test case?

system: (screenshot omitted)

bad status line always returned from healthcheck

Hi, I've just tried setting up this module, and all peers are very quickly marked as down.

My init_worker code is this, basically copy/pasted from the docs:

local ok, err = hc.spawn_checker({
  shm = "healthcheck",
  upstream = "backend",
  type = "http",
  http_req = [[GET /XXX/XXX HTTP/1.0\r\nHost: www.academia.edu\r\n\r\n]],
  interval = 2000,
  timeout = 1000,
  fall = 3,
  rise = 2,
  valid_statuses = { 200 },
  concurrency = 10
})
if not ok then
  ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
  return 
end

In the logs I see:

2014/06/19 16:36:40 [error] 2623#0: [lua] healthcheck.lua:57: errlog(): healthcheck: bad status line from XX.XXX.XXX.XXX:XX: <html>, context: ngx.timer
2014/06/19 16:36:40 [error] 2623#0: lua user thread aborted: runtime error: /var/lib/academia-config/nginx/shared/healthcheck.lua:255: bad argument #2 to 'sub' (number expected, got nil)
stack traceback:
coroutine 0:
        [C]: in function 'sub'
        /var/lib/academia-config/nginx/shared/healthcheck.lua:255: in function </var/lib/academia-config/nginx/shared/healthcheck.lua:202>
coroutine 1:
        [C]: in function 'receive'
        /var/lib/academia-config/nginx/shared/healthcheck.lua:236: in function 'check_peer'
        /var/lib/academia-config/nginx/shared/healthcheck.lua:315: in function 'check_peers'
        /var/lib/academia-config/nginx/shared/healthcheck.lua:460: in function </var/lib/academia-config/nginx/shared/healthcheck.lua:454>
        [C]: in function 'pcall'
        /var/lib/academia-config/nginx/shared/healthcheck.lua:490: in function </var/lib/academia-config/nginx/shared/healthcheck.lua:485>, context: ngx.timer

This is surprising, because if I telnet to the backend and paste that request, it responds correctly:

# telnet xxx.xxx.xx 81
Trying xx.xxx.xxx.xxx...
Connected to ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com.
Escape character is '^]'.
GET /XXX/XXX HTTP/1.0
Host: www.academia.edu

HTTP/1.1 200 OK
Date: Fri, 20 Jun 2014 00:11:40 GMT
Content-Type: text/plain; charset=utf-8
Connection: close
Vary: Accept-Encoding
Status: 200 OK
Content-Language: en
X-Logged-In: false
ETag: "0e66aa8f5070dcc18fb58711fedfa4eb"
Cache-Control: max-age=0, private, must-revalidate
X-Request-Id: c451fceb4eb3f0dca7b7f97df8e8543d
X-Runtime: 0.045754

Rails up
Postgres master up
Memcached up
Persistent Redis up
Counts Redis up
Zsets redis up Connection closed by foreign host.

Is there some misconfiguration somewhere in my code?

ngx_lua 0.9.5+ required

when using openresty/openresty:1.15.8.3-2-alpine-fat image I receive an error that ngx_lua 0.9.5+ is required.

web_1  | 2020/05/22 21:49:48 [error] 8#8: init_worker_by_lua error: /usr/local/openresty/lualib/resty/upstream/healthcheck.lua:28: ngx_lua 0.9.5+ required
web_1  | stack traceback:
web_1  | 	[C]: in function 'error'
web_1  | 	/usr/local/openresty/lualib/resty/upstream/healthcheck.lua:28: in main chunk
web_1  | 	[C]: in function 'require'
web_1  | 	init_worker_by_lua:2: in main chunk
web_1  | 2020/05/22 21:49:48 [error] 6#6: init_worker_by_lua error: /usr/local/openresty/lualib/resty/upstream/healthcheck.lua:28: ngx_lua 0.9.5+ required
web_1  | stack traceback:
web_1  | 	[C]: in function 'error'
web_1  | 	/usr/local/openresty/lualib/resty/upstream/healthcheck.lua:28: in main chunk
web_1  | 	[C]: in function 'require'
web_1  | 	init_worker_by_lua:2: in main chunk
web_1  | 2020/05/22 21:49:48 [error] 9#9: init_worker_by_lua error: /usr/local/openresty/lualib/resty/upstream/healthcheck.lua:28: ngx_lua 0.9.5+ required
web_1  | stack traceback:
web_1  | 	[C]: in function 'error'
web_1  | 	/usr/local/openresty/lualib/resty/upstream/healthcheck.lua:28: in main chunk
web_1  | 	[C]: in function 'require'
web_1  | 	init_worker_by_lua:2: in main chunk
web_1  | 2020/05/22 21:49:48 [error] 7#7: init_worker_by_lua error: /usr/local/openresty/lualib/resty/upstream/healthcheck.lua:28: ngx_lua 0.9.5+ required
web_1  | stack traceback:
web_1  | 	[C]: in function 'error'
web_1  | 	/usr/local/openresty/lualib/resty/upstream/healthcheck.lua:28: in main chunk
web_1  | 	[C]: in function 'require'
web_1  | 	init_worker_by_lua:2: in main chunk
/ # nginx -V
nginx version: openresty/1.15.8.3
built by gcc 9.2.0 (Alpine 9.2.0)
built with OpenSSL 1.1.1g  21 Apr 2020
TLS SNI support enabled
configure arguments: --prefix=/usr/local/openresty/nginx --with-cc-opt='-O2 -DNGX_LUA_ABORT_AT_PANIC -I/usr/local/openresty/pcre/include -I/usr/local/openresty/openssl/include' --add-module=../ngx_devel_kit-0.3.1rc1 --add-module=../echo-nginx-module-0.61 --add-module=../xss-nginx-module-0.06 --add-module=../ngx_coolkit-0.2 --add-module=../set-misc-nginx-module-0.32 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.08 --add-module=../srcache-nginx-module-0.31 --add-module=../ngx_lua-0.10.15 --add-module=../ngx_lua_upstream-0.07 --add-module=../headers-more-nginx-module-0.33 --add-module=../array-var-nginx-module-0.05 --add-module=../memc-nginx-module-0.19 --add-module=../redis2-nginx-module-0.15 --add-module=../redis-nginx-module-0.3.7 --add-module=../rds-json-nginx-module-0.15 --add-module=../rds-csv-nginx-module-0.09 --add-module=../ngx_stream_lua-0.0.7 --with-ld-opt='-Wl,-rpath,/usr/local/openresty/luajit/lib -L/usr/local/openresty/pcre/lib -L/usr/local/openresty/openssl/lib -Wl,-rpath,/usr/local/openresty/pcre/lib:/usr/local/openresty/openssl/lib' --with-pcre --with-compat --with-file-aio --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_geoip_module=dynamic --with-http_gunzip_module --with-http_gzip_static_module --with-http_image_filter_module=dynamic --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-http_xslt_module=dynamic --with-ipv6 --with-mail --with-mail_ssl_module --with-md5-asm --with-pcre-jit --with-sha1-asm --with-stream --with-stream_ssl_module --with-threads --with-stream --with-stream_ssl_preread_module

the problem with the healthcheck result

My code in nginx.conf:

upstream swordnet.com{
		server 127.0.0.1:18080; # two tomcats have been started
		server 127.0.0.1:28080;
	}
lua_shared_dict ngx_stats 500m;
lua_shared_dict healthcheck 1m;
lua_socket_log_errors off;
init_worker_by_lua_block {
	local hc = require "resty.upstream.healthcheck"
	local ok, err = hc.spawn_checker {
		shm = "healthcheck",
		type = "http",
		upstream = "swordnet.com",
		http_req = "GET /health.txt HTTP/1.0\r\nHost: swordnet.com\r\n\r\n",
		valid_statuses = {200, 302}
	}
	if not ok then
	ngx.log(ngx.ERR, "=======> failed to spawn health checker: ", err)
	return
	end
}

.................

location /server/status {
			access_log off;
			default_type text/plain;
			allow 127.0.0.1;
			deny all;
			content_by_lua_block {
				local hc = require "resty.upstream.healthcheck"
				ngx.say("Nginx Worker PID: ", ngx.worker.pid())
				ngx.print(hc.status_page())
			}
		}

When I visited the URL http://localhost/server/status in a browser, it showed the following:

Nginx Worker PID: 12158
Upstream swordnet.com
    Primary Peers
        127.0.0.1:18080 DOWN
        127.0.0.1:28080 DOWN
    Backup Peers

All visits then resulted in 502 gateway errors, and the error.log shows:
no live upstreams while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://swordnet.com/", host: "localhost:81"
But when I changed upstream = "swordnet.com" to something other than swordnet.com,
everything became normal again; however, no matter what state the tomcats were in (started or stopped), it always showed the following:

Nginx Worker PID: 13742
Upstream swordnet.com (NO checkers)
    Primary Peers
        127.0.0.1:18080 up
        127.0.0.1:28080 up
    Backup Peers

PR offer: Making Prometheus metrics more adaptable

I am offering to make a PR for this library; I only want to ask about the preferred direction beforehand.

The goal

At the end, no matter what, we need metrics looking like

nginx_upstream_status_info{name=\"%s\",endpoint=\"%s\",role=\"%s\"} %d

instead of

nginx_upstream_status_info{name=\"%s\",endpoint=\"%s\",status=\"%s\",role=\"%s\"} 1

=> I will remove the status tag and represent the current status by the value only.
That's the way the haproxy_server_status metric is implemented, for example, and it is much easier to graph and to read the graphs if the values differ instead of the tags.
It is also easier to alert on.
So this is the end result: to have a way to get metrics looking like the above from this library.

Now the question is what the preferred way is from your side (both options would be implemented in a non-breaking manner):
A) I implement an opts parameter for the prometheus_status_page()
--> st, err = hc.prometheus_status_page{ up_value = 1, down_value = 0, unknown_value = -1, include_status_tag = false }
B) I implement a second metric method to call if the status should change by value, not by tag
--> st, err = hc.prometheus_status_page_by_value()

Both would work; A) would create more flags and more parameters on internal functions, while B) would create new functions and "duplicate" some code, I guess.
There is still C), where I just implement it the way we need as a breaking change, but I guess this is the least preferred option?

Please decide on one of these.

Best regards!

health checker does not support stream directive

The health checker can't be used in the stream directive.

The snippet of nginx.conf:

stream {
    lua_shared_dict healthcheck 1m;
    init_worker_by_lua_block {
        local hc = require "resty.upstream.healthcheck"

The error message is:
nginx: [emerg] "lua_shared_dict" directive is not allowed here in ./conf/nginx.conf:121

each worker spawns a health checker - is this ok?

It looks like each worker is calling hc.spawn_checker; is this by design?

Thanks!

Config:

worker_processes  2;
error_log logs/error.log  warn;

events {
    worker_connections 1024;
    use epoll;
}

...

    init_worker_by_lua_block {
        local hc = require "resty.upstream.healthcheck"
        ngx.log(ngx.INFO, "initialising health checker for upstreams manually defined")
...

Log:

2016/09/28 12:15:47 [notice] 4778#0: using the "epoll" event method
2016/09/28 12:15:47 [notice] 4778#0: openresty/1.9.7.3
2016/09/28 12:15:47 [notice] 4778#0: built by gcc 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
2016/09/28 12:15:47 [notice] 4778#0: OS: Linux 3.13.0-62-generic
2016/09/28 12:15:47 [notice] 4778#0: getrlimit(RLIMIT_NOFILE): 1024:4096
2016/09/28 12:15:47 [notice] 4779#0: start worker processes
2016/09/28 12:15:47 [notice] 4779#0: start worker process 4780
2016/09/28 12:15:47 [notice] 4779#0: start worker process 4781
2016/09/28 12:15:47 [notice] 4779#0: start cache manager process 4782
2016/09/28 12:15:47 [notice] 4779#0: start cache loader process 4783
2016/09/28 12:15:47 [info] 4780#0: [lua] init_worker_by_lua:3: initialising health checker for upstreams manually defined, context: init_worker_by_lua
2016/09/28 12:15:47 [error] 4780#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 95.172.249.216:81: connection refused, context: ngx.timer
2016/09/28 12:15:47 [error] 4780#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 193.72.34.102:81: connection refused, context: ngx.timer
2016/09/28 12:15:47 [info] 4782#0: [lua] init_worker_by_lua:3: initialising health checker for upstreams manually defined, context: init_worker_by_lua
2016/09/28 12:15:47 [info] 4781#0: [lua] init_worker_by_lua:3: initialising health checker for upstreams manually defined, context: init_worker_by_lua
2016/09/28 12:15:47 [info] 4783#0: [lua] init_worker_by_lua:3: initialising health checker for upstreams manually defined, context: init_worker_by_lua
2016/09/28 12:15:48 [error] 4780#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 12.121.12.21:8080: timeout, context: ngx.timer
2016/09/28 12:15:53 [info] 4781#0: *20 client 10.78.153.254 closed keepalive connection
2016/09/28 12:15:57 [error] 4780#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 91.146.149.126:81: connection refused, context: ngx.timer
2016/09/28 12:15:57 [error] 4780#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 193.701.12.222:81: connection refused, context: ngx.timer
2016/09/28 12:15:58 [error] 4781#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 19.126.12.23:8080: timeout, context: ngx.timer
2016/09/28 12:16:07 [error] 4781#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 91.276.349.426:81: connection refused, context: ngx.timer
2016/09/28 12:16:07 [warn] 4781#0: [lua] healthcheck.lua:53: warn(): healthcheck: peer 951.276.349.226:81 is turned down after 3 failure(s), context: ngx.timer
2016/09/28 12:16:07 [error] 4781#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 93.27.34.202:81: connection refused, context: ngx.timer
2016/09/28 12:16:07 [warn] 4781#0: [lua] healthcheck.lua:53: warn(): healthcheck: peer 193.727.155.22:81 is turned down after 3 failure(s), context: ngx.timer
2016/09/28 12:16:08 [error] 4783#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 110.322.47.21:8080: timeout, context: ngx.timer
2016/09/28 12:16:08 [warn] 4783#0: [lua] healthcheck.lua:53: warn(): healthcheck: peer 10.122.133.25:8080 is turned down after 3 failure(s), context: ngx.timer
2016/09/28 12:16:47 [notice] 4783#0: http file cache: /usr/local/openresty/nginx/cache 0.152M, bsize: 4096
2016/09/28 12:16:47 [notice] 4783#0: http file cache: /usr/local/openresty/nginx/cache/stream 0.000M, bsize: 4096
2016/09/28 12:16:47 [notice] 4783#0: http file cache: /usr/local/openresty/nginx/cache/re 0.000M, bsize: 4096
2016/09/28 12:16:47 [notice] 4783#0: http file cache: /usr/local/openresty/nginx/cache/events 0.000M, bsize: 4096
2016/09/28 12:16:47 [notice] 4783#0: http file cache: /usr/local/openresty/nginx/cache/twimg 0.000M, bsize: 4096
2016/09/28 12:16:47 [notice] 4779#0: signal 17 (SIGCHLD) received
2016/09/28 12:16:47 [notice] 4779#0: cache loader process 4783 exited with code 0
2016/09/28 12:16:47 [notice] 4779#0: signal 29 (SIGIO) received
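If spawning from every worker turns out to be undesirable, one hedged workaround is to guard the call with the worker id (a sketch only; it assumes ngx.worker.id() is available in the ngx_lua version in use and that a single checker per nginx instance is acceptable):

init_worker_by_lua_block {
    -- only worker 0 spawns the checker; the other workers skip it
    if ngx.worker.id() == 0 then
        local hc = require "resty.upstream.healthcheck"
        local ok, err = hc.spawn_checker{
            shm = "healthcheck",
            upstream = "foo.com",
            type = "http",
            http_req = "GET /status HTTP/1.0\r\nHost: foo.com\r\n\r\n",
        }
        if not ok then
            ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
        end
    end
}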

Log messages for multiple upstreams with same servers, but different hostnames

Guys,
I’ve followed the module's recommendation for multiple upstreams.
The error log has messages like:

[error] 193#193: *118495174 [lua] healthcheck.lua:53: errlog(): healthcheck: failed to receive status line from 10.0.0.1:80: timeout, context: ngx.timer
[error] 190#190: *118495179 [lua] healthcheck.lua:53: errlog(): healthcheck: failed to receive status line from 10.0.0.5:80: timeout, context: ngx.timer

How do I tell which upstream they belong to?

The full setup is described here; a snippet is below:

upstream one.abc.com_80 {
                server 10.0.0.1:80;
                server 10.0.0.2:80;
                ...
                server 10.0.0.8:80;
}
 
upstream two.abc.com_80 {
                server 10.0.0.1:80;
                server 10.0.0.2:80;
                ...
                server 10.0.0.8:80;
}

init_worker.lua

local servers = { 
	"one.abc.com", "two.abc.com", ...
}

local hc = require "resty.upstream.healthcheck"

local function checker(upstream, server_name)
	local ok, err = hc.spawn_checker{
		shm = "healthcheck",  -- defined by "lua_shared_dict"
		upstream = upstream,
		type = "http",

		http_req = "GET /HealthCheck/Health.ashx HTTP/1.0\r\nHost: " .. server_name .. "\r\n\r\n", -- raw HTTP request for checking
		interval = 2000, -- run the check cycle every 2 sec
		timeout = 1000, -- 1 sec is the timeout for network operations
		fall = 3, -- # of successive failures before turning a peer down
		rise = 2, -- # of successive successes before turning a peer up
		valid_statuses = {200}, -- a list valid HTTP status code
		concurrency = 10, -- concurrency level for test requests
	}

	if not ok then
		ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
	end

	return ok
end

local function main()
	for _, server in ipairs(servers) do
		checker(server .. "_80", server)
	end
end

main()


Does not compile with openresty-1.13.5.1rc0

Looked for similar issues on other revs but can't find an exact patch. Any suggestions?
openresty-1.13.5.1rc0
nginx-1.13.5

--2017-10-21 17:09:50--  https://github.com/openresty/lua-resty-upstream-healthcheck/tarball/v0.05
Resolving github.com (github.com)... 192.30.253.112, 192.30.253.113
Connecting to github.com (github.com)|192.30.253.112|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/openresty/lua-resty-upstream-healthcheck/legacy.tar.gz/v0.05 [following]
--2017-10-21 17:09:50--  https://codeload.github.com/openresty/lua-resty-upstream-healthcheck/legacy.tar.gz/v0.05
Resolving codeload.github.com (codeload.github.com)... 192.30.253.121, 192.30.253.120
Connecting to codeload.github.com (codeload.github.com)|192.30.253.121|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘lua-resty-upstream-healthcheck-0.05.tar.gz’
 
0K .......... ..                                          9.72M=0.001s
 
2017-10-21 17:09:50 (9.72 MB/s) - ‘lua-resty-upstream-healthcheck-0.05.tar.gz’ saved [12463]
 
unix2dos: converting file README-win32.txt to DOS format ...
/tmp/nginx-upstream-fair/ngx_http_upstream_fair_module.c: In function ‘ngx_http_upstream_init_fair_rr’:
/tmp/nginx-upstream-fair/ngx_http_upstream_fair_module.c:543:28: error: ‘ngx_http_upstream_srv_conf_t {aka struct ngx_http_upstream_srv_conf_s}’ has no member named ‘default_port’
if (us->port == 0 && us->default_port == 0) {
^
/tmp/nginx-upstream-fair/ngx_http_upstream_fair_module.c:553:51: error: ‘ngx_http_upstream_srv_conf_t {aka struct ngx_http_upstream_srv_conf_s}’ has no member named ‘default_port’
u.port = (in_port_t) (us->port ? us->port : us->default_port);
^
make[2]: *** [objs/addon/nginx-upstream-fair/ngx_http_upstream_fair_module.o] Error 1
make[1]: *** [build] Error 2
make: *** [all] Error 2

init_worker_by_lua core dumps

I am attempting to set up health checks using this library in combination with upstreams which are stored in redis. When trying to test the example provided in the README, the nginx worker process core dumps repeatedly.

nginx error.log:

2014/09/02 18:34:32 [alert] 4898#0: worker process 5415 exited on signal 11 (core dumped)
2014/09/02 18:34:32 [alert] 4898#0: worker process 5418 exited on signal 11 (core dumped)

I tried removing the contents of the init_worker_by_lua block (so it was empty), and experienced the same behavior. In both cases, the nginx configuration test passed. In an attempt to better understand what was happening, I straced the nginx master process, but was unable to make any inferences.

strace output:

socketpair(PF_LOCAL, SOCK_STREAM, 0, [3, 14]) = 0
ioctl(3, FIONBIO, [1])                  = 0
ioctl(14, FIONBIO, [1])                 = 0
ioctl(3, FIOASYNC, [1])                 = 0
fcntl(3, F_SETOWN, 4898)                = 0
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
fcntl(14, F_SETFD, FD_CLOEXEC)          = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f808e9eca50) = 8899
rt_sigsuspend([])                       = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_DUMPED, si_pid=8899, si_status=SIGSEGV, si_utime=0, si_stime=0} ---
gettimeofday({1409683086, 122934}, NULL) = 0
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV && WCOREDUMP(s)}], WNOHANG, NULL) = 8899
write(4, "2014/09/02 18:38:06 [alert] 4898"..., 90) = 90
wait4(-1, 0x7fff1a357a3c, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
gettimeofday({1409683086, 123357}, NULL) = 0
close(3)                                = 0
close(14)                               = 0
socketpair(PF_LOCAL, SOCK_STREAM, 0, [3, 14]) = 0
ioctl(3, FIONBIO, [1])                  = 0
ioctl(14, FIONBIO, [1])                 = 0
ioctl(3, FIOASYNC, [1])                 = 0
fcntl(3, F_SETOWN, 4898)                = 0
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
fcntl(14, F_SETFD, FD_CLOEXEC)          = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f808e9eca50) = 8901
rt_sigsuspend([])                       = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_DUMPED, si_pid=8901, si_status=SIGSEGV, si_utime=0, si_stime=0} ---
gettimeofday({1409683086, 249215}, NULL) = 0
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV && WCOREDUMP(s)}], WNOHANG, NULL) = 8901
write(4, "2014/09/02 18:38:06 [alert] 4898"..., 90) = 90
wait4(-1, 0x7fff1a357a3c, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
gettimeofday({1409683086, 249710}, NULL) = 0
close(3)                                = 0
close(14)                               = 0

From my debugging it does not seem that the issue is with this library, but rather the lua-nginx-module init_worker_by_lua function, or, more likely, my use of it. If this turns out to actually be an issue in the module, I can create an issue on that repository.

I am running openresty/1.7.2.1. Please let me know if additional information about my configuration is needed.

Why Is the Health Check Status Inherited by PeerID During Reload?

I have a cluster that contains two servers. The health check for the first server is down, and the second server is OK. When I delete the first server and keep the second server, after a reload the second server is immediately marked DOWN. This is because the peer status is keyed by the peer ID. See:

local key = gen_peer_key("d:", u, is_backup, id)

To solve this problem, can we use the name of the peer to identify a peer instead of the ID?
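A hedged sketch of the proposed change, keying the shared-dict entry by the peer's name rather than its positional id (gen_peer_key is the module's internal helper quoted above; the surrounding code would need the peer table, or at least its name field, in scope):

-- current: keyed by positional id, so the state follows the slot after a reload
-- local key = gen_peer_key("d:", u, is_backup, id)

-- proposed: keyed by the peer's "host:port" name, so the state follows the server
local key = gen_peer_key("d:", u, is_backup, peer.name)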
