
lua-resty-upstream-healthcheck's Introduction

Name

OpenResty - Turning Nginx into a Full-Fledged Scriptable Web Platform

Table of Contents

Description

OpenResty is a full-fledged web application server that bundles the standard nginx core, many 3rd-party nginx modules, and most of their external dependencies.

This bundle is maintained by Yichun Zhang (agentzh).

Because most of the nginx modules are developed by the bundle maintainers, we can ensure that all these modules play well together.

The bundled software components are copyrighted by the respective copyright holders.

The homepage for this project is on openresty.org.

For Users

Visit the download page on the openresty.org web site to download the latest bundle tarball, and follow the installation instructions on the installation page.

For Bundle Maintainers

The bundle's source is at the following git repository:

https://github.com/openresty/openresty

To reproduce the bundle tarball, just do

make

at the top of the bundle source tree.

Please note that you may need to install some extra dependencies, like perl, dos2unix, and mercurial. On Fedora 22, for example, installing the dependencies is as simple as running the following commands:

sudo dnf install perl dos2unix mercurial

Back to TOC

Additional Features

In addition to the standard nginx core features, this bundle also supports the following:

Back to TOC

resolv.conf parsing

syntax: resolver address ... [valid=time] [ipv6=on|off] [local=on|off|path]

default: -

context: http, stream, server, location

Similar to the resolver directive in the standard nginx core, with additional support for parsing resolvers from the resolv.conf file format.

When local=on, the standard path of /etc/resolv.conf will be used. You may also specify an arbitrary path to be used for parsing, for example: local=/tmp/test.conf.

When local=off, parsing will be disabled (this is the default).

This feature is not available on Windows platforms.
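For illustration, a minimal configuration sketch combining an explicit resolver with the parsed system resolvers (the address and timing values are placeholders, and this assumes local= may be used alongside explicit addresses as described above):

http {
    # use 8.8.8.8 plus whatever nameservers are listed in /etc/resolv.conf
    resolver 8.8.8.8 valid=30s ipv6=off local=on;
}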

Back to TOC

Mailing List

You're very welcome to join the English OpenResty mailing list hosted on Google Groups:

https://groups.google.com/group/openresty-en

The Chinese mailing list is here:

https://groups.google.com/group/openresty

Back to TOC

Report Bugs

You're very welcome to report issues on GitHub:

https://github.com/openresty/openresty/issues

Back to TOC

Copyright & License

The bundle itself is licensed under the 2-clause BSD license.

Copyright (c) 2011-2019, Yichun "agentzh" Zhang (章亦春) [email protected], OpenResty Inc.

This module is licensed under the terms of the BSD license.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Back to TOC

lua-resty-upstream-healthcheck's People

Contributors

agentzh, bungle, chipitsine, flombardi, freggy, jonasbadstuebner, membphis, saaldjormike, szelcsanyi, thibaultcha, tieske, xiaocang, yueziii, zhousoft, zhuizhuhaomeng


lua-resty-upstream-healthcheck's Issues

How about adding a function for getting a specified peer's status

  • my scenario:
    when the upstream is unhealthy, I need to get a specific peer's status (e.g. DOWN) and perform some suspension work based on it, rather than only showing the peer status via the status_page() function.

  • workaround:
    so I have added a get_peer_status() function to fetch a specific peer's status (a sketch of such a helper is shown below),
    and I'd like to share it with everybody who comes across the same scenario.
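A minimal sketch of such a helper, built only on the ngx.upstream API that this module already relies on (the get_peer_status name and its return convention are the issue author's idea, not part of the released module; only primary peers are inspected here):

local upstream = require "ngx.upstream"

-- hypothetical helper: return "DOWN" or "up" for the named peer of the
-- given upstream block, or nil plus an error message if it cannot be found
local function get_peer_status(upstream_name, peer_name)
    local peers, err = upstream.get_primary_peers(upstream_name)
    if not peers then
        return nil, err
    end
    for _, peer in ipairs(peers) do
        if peer.name == peer_name then
            return peer.down and "DOWN" or "up"
        end
    end
    return nil, "peer not found"
end

-- usage: local st, err = get_peer_status("foo.com", "127.0.0.1:8080")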

Routing requests to all peers when they are all down

Sometimes the health checker configuration gets out of sync. For example, a new version of the application may be deployed on the backend servers without updating the health checks on the balancer. In this case all peers will be switched to the 'DOWN' state and nginx will return a 502 error for incoming requests.
If the balancer detects such a situation, it could simply fall back to routing requests to all peers instead.
In theory, if all peers are really down it does not matter much who shows the error message, since we have real problems with all our backend servers anyway.

It is a kind of fool-proofing. The AWS Application Load Balancer, for example, applies this behavior by default.

I have already made a fork and added this feature. You can have a look here: https://github.com/eglinux/lua-resty-upstream-healthcheck
So, if you find this approach acceptable, I can make a pull request.
Thanks!

PR: add upstream name to log message to distinguish among multiple upstreams with same server

Would it be possible to add the upstream name to the log message, to distinguish among multiple upstreams containing the same server?
Namely, change the current message from

healthcheck: failed to receive status line from 10.0.0.1:80:

to something like

healthcheck: failed to receive status line from 10.0.0.1:80: for upstream backend

Might it be easier to add an error format string as a healthcheck field, so we can configure which fields to log?

This is a follow-up to the issue.

Thank you

Health check failure

Hello, I am using version 0.04. Below is my nginx health check configuration:

ok, err = hc.spawn_checker{
    shm = "healthcheck",
    upstream = "dlv-business-service",
    type = "http",
    http_req = "GET /dlv-business-service HTTP/1.0\r\nHost: dlv-business-service\r\n\r\n",
    interval = 5000,
    timeout = 3000,
    fall = 5,
    rise = 3,
    valid_statuses = {200, 302, 304},
    concurrency = 10,
}

One node in the upstream was abnormal, but the lua-resty-upstream-healthcheck module still reported it as healthy; only when I completely stopped the application on that node did the check result become abnormal.

In short, it looks as if the lua-resty-upstream-healthcheck module only performs an IP/port check rather than a real HTTP check. This situation occurs occasionally; what could be causing it?

Timers "leaking" ?

Hi,
On several of our servers we use the "healthcheck" module; unfortunately something is wrong, because after a while these messages start to appear:
"failed to create timer: too many pending timers".
I know it's a standard error message, but it seemed strange to me, so I decided to check what the situation was like.
It looks like a few timers appear very quickly (although I have an oddity here in that the number is negative), but the number of pending timers keeps increasing up to the configured maximum. Increasing the maximum amount doesn't help; saturation just takes a little more time.
After about an hour, the logs contain values like this:

Current running timers -2129 (??!!!)
Current pending timers 4096
failed to spawn health checker: failed to create timer: too many pending timers,

I do not think it is possible for the healthcheck to take so long that it blocks timers; unfortunately my Lua skills are a bit too limited to debug it properly.
Is this a known error? Can it be avoided somehow?
Any help / insights would be very appreciated.

Our config looks like this:

       local hc = require "resty.upstream.healthcheck"
 
        local ok, err = hc.spawn_checker{
            shm = "healthcheck1",
            upstream = "application_karaf",
            type = "http",
            http_req = "GET /tenant/health HTTP/1.0\r\nHost: localhost\r\n\r\n",
            interval = 3000, timeout = 1500, fall = 3, rise = 2,
            valid_statuses = {200, 302},
            concurrency = 10,
        }
        if not ok then
            ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
            ngx.log(ngx.ERR, "Current penging timers ", ngx.timer.pending_count())
            ngx.log(ngx.ERR, "Current runing timers ", ngx.timer.running_count())
            return
        end

Test case failures on rhel 7.6 ppc64le platform

Hi All,

I have built the nginx binary on RHEL 7.6 ppc64le (version 1.17.1.1rc0) from the source code at https://github.com/openresty/openresty.
Please note that I copied and used ppc64le-compiled LuaJIT code while building openresty (nginx).
Below is the command I used to compile openresty:

./configure --with-cc-opt="-DNGX_LUA_USE_ASSERT -DNGX_LUA_ABORT_AT_PANIC" --with-http_image_filter_module --with-http_dav_module --with-http_auth_request_module --with-poll_module --with-stream --with-stream_ssl_module --with-stream_ssl_preread_module --with-http_ssl_module --with-http_iconv_module --with-http_drizzle_module --with-http_postgres_module --with-http_addition_module --add-module=/usr/openresty/openresty_test_modules/nginx-eval-module --add-module=/usr/openresty/openresty_test_modules/replace-filter-nginx-module

And then I tried to execute the test cases for 'lua-resty-upstream-healthcheck' like below:

[root]# pwd
/usr/openresty/openresty/openresty-1.17.1.1rc0/build/lua-resty-upstream-healthcheck-0.06
[root]# prove -r t

NOTE: The 'lua-resty-upstream-healthcheck' module version used was 0.06, which was downloaded with the openresty bundle.

But I am getting the kind of repeated errors shown below:

	[root  lua-resty-upstream-healthcheck-0.06]#
	[root  lua-resty-upstream-healthcheck-0.06]# prove -r t/
	t/sanity.t .. 1/99
	#   Failed test 'TEST 7: peers version upgrade (make up peers down) - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   'warn(): healthcheck: peer 127.0.0.1:12355 is turned up after 2 success(es)
	# warn(): healthcheck: peer 127.0.0.1:12356 is turned up after 2 success(es)
	# '
	#     doesn't match '(?^:^upgrading peers version to 1
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# warn\(\): healthcheck: peer 127\.0\.0\.1:12355 is turned up after 2 success\(es\)
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# warn\(\): healthcheck: peer 127\.0\.0\.1:12356 is turned up after 2 success\(es\)
	# publishing peers version 2
	# (?:healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# ){2,4}$)'
	TEST 15: peers > concurrency - WARNING: killing the child process 12933 with force... at /usr/local/share/perl5/Test/Nginx/Util.pm line 609.
	t/sanity.t .. 8/99
	#   Failed test 'TEST 15: peers > concurrency - response_body_like - response is expected (Upstream foo.com Primary Peers 127.0.0.1:12354 up 127.0.0.1:12355 up 127.0.0.1:12356 up 127.0.0.1:12357 up 127.0.0.1:12358 up Backup Peers 127.0.0.1:12359 up)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1635.
	#                   'Upstream foo.com
	#     Primary Peers
	#         127.0.0.1:12354 up
	#         127.0.0.1:12355 up
	#         127.0.0.1:12356 up
	#         127.0.0.1:12357 up
	#         127.0.0.1:12358 up
	#     Backup Peers
	#         127.0.0.1:12359 up
	# '
	#     doesn't match '(?^s:Upstream foo.com
	#     Primary Peers
	#         127.0.0.1:12354 DOWN
	#         127.0.0.1:12355 \S+
	#         127.0.0.1:12356 \S+
	#         127.0.0.1:12357 \S+
	#         127.0.0.1:12358 \S+
	#     Backup Peers
	#         127.0.0.1:12359 \S+
	# )'
	t/sanity.t .. 10/99
	#   Failed test 'TEST 15: peers > concurrency - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   'healthcheck: failed to receive status line from 127.0.0.1:12354
	# '
	#     doesn't match '(?^:^healthcheck: peer 127\.0\.0\.1:12357 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12358 was checked to be not ok
	# healthcheck: failed to receive status line from 127\.0\.0\.1:12354
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12359 was checked to be not ok
	# $)'
	t/sanity.t .. 14/99
	#   Failed test 'TEST 9: concurrency == 2 (odd number of peers) - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   ''
	#     doesn't match '(?^:^(?:spawn a thread checking primary peers 0 to 2
	# check primary peers 3 to 4
	# check backup peer 0
	# ){4,6}$)'
	t/sanity.t .. 25/99
	#   Failed test 'TEST 3: health check (bad case), no listening port in a primary peer - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   'warn(): healthcheck: peer 127.0.0.1:12355 is turned down after 2 failure(s)
	# '
	#     doesn't match '(?^:^healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be not ok
	# warn\(\): healthcheck: peer 127\.0\.0\.1:12355 is turned down after 2 failure\(s\)
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# (?:healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# ){2,4}$)'
	t/sanity.t .. 31/99
	#   Failed test 'TEST 5: health check (bad case), timed out - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   'warn(): healthcheck: peer 127.0.0.1:12354 is turned down after 2 failure(s)
	# '
	#     doesn't match '(?^:^healthcheck: peer 127\.0\.0\.1:12354 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be not ok
	# warn\(\): healthcheck: peer 127\.0\.0\.1:12354 is turned down after 2 failure\(s\)
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# (?:healthcheck: peer 127\.0\.0\.1:12354 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# ){0,2}$)'

	#   Failed test 'TEST 8: peers version upgrade (make down peers up) - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   'warn(): healthcheck: peer 127.0.0.1:12354 is turned down after 2 failure(s)
	# '
	#     doesn't match '(?^:^upgrading peers version to 1
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be not ok
	# warn\(\): healthcheck: peer 127\.0\.0\.1:12354 is turned down after 2 failure\(s\)
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# publishing peers version 2
	# (?:healthcheck: peer 127\.0\.0\.1:12354 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# ){3,5}$)'
	t/sanity.t .. 43/99
	#   Failed test 'TEST 4: health check (bad case), bad status - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   'healthcheck: bad status code from 127.0.0.1:12355: 404
	# healthcheck: bad status code from 127.0.0.1:12355: 404
	# warn(): healthcheck: peer 127.0.0.1:12355 is turned down after 2 failure(s)
	# '
	#     doesn't match '(?^:^healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: bad status code from 127\.0\.0\.1:12355: 404
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: bad status code from 127\.0\.0\.1:12355: 404
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be not ok
	# warn\(\): healthcheck: peer 127\.0\.0\.1:12355 is turned down after 2 failure\(s\)
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# (?:healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# ){1,4}$)'
	t/sanity.t .. 48/99
	#   Failed test 'TEST 11: health check (good case), status ignored by default - tcp_query ok'
	#   at /usr/local/share/perl5/Test/Nginx/Util.pm line 188.
	#          got: ''
	#     expected: 'GET /status HTTP/1.0
	# Host: localhost
	#
	# '
	t/sanity.t .. 60/99
	#   Failed test 'TEST 1: health check (good case), status ignored by default - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   ''
	#     doesn't match '(?^:^healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# (?:healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# ){3,5}$)'

	#   Failed test 'TEST 6: health check (bad case), bad status, and then rise again - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   'healthcheck: bad status code from 127.0.0.1:12355: 403
	# warn(): healthcheck: peer 127.0.0.1:12355 is turned down after 1 failure(s)
	# warn(): healthcheck: peer 127.0.0.1:12355 is turned up after 2 success(es)
	# '
	#     doesn't match '(?^:^healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: bad status code from 127\.0\.0\.1:12355: 403
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be not ok
	# warn\(\): healthcheck: peer 127\.0\.0\.1:12355 is turned down after 1 failure\(s\)
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# publishing peers version 1
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# warn\(\): healthcheck: peer 127\.0\.0\.1:12355 is turned up after 2 success\(es\)
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# publishing peers version 2
	# (?:healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be ok
	# ){1,3}$)'
	t/sanity.t .. 71/99
	#   Failed test 'TEST 10: concurrency == 3 (odd number of peers) - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   ''
	#     doesn't match '(?^:^(?:spawn a thread checking primary peer 0
	# spawn a thread checking primary peer 1
	# check primary peer 2
	# check backup peer 0
	# ){4,6}$)'
	t/sanity.t .. 86/99
	#   Failed test 'TEST 2: health check (bad case), no listening port in the backup peer - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   'warn(): healthcheck: peer 127.0.0.1:12356 is turned down after 2 failure(s)
	# '
	#     doesn't match '(?^:^healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be not ok
	# healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be not ok
	# warn\(\): healthcheck: peer 127\.0\.0\.1:12356 is turned down after 2 failure\(s\)
	# publishing peers version 1
	# (?:healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12355 was checked to be ok
	# healthcheck: peer 127\.0\.0\.1:12356 was checked to be not ok
	# ){2,4}$)'

	#   Failed test 'TEST 14: health check with ipv6 backend (good case), status ignored by default - response_body - response is expected (repeated req 0, req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1589.
	# @@ -3,6 +3,6 @@
	#          127.0.0.1:12354 up
	#          [::1]:12355 up
	#      Backup Peers
	# -        [0:0::1]:12356 up
	# +        [::1]:12356 up
	#  upstream addr: 127.0.0.1:12354
	#  upstream addr: [::1]:12355
	t/sanity.t .. 94/99
	#   Failed test 'TEST 14: health check with ipv6 backend (good case), status ignored by default - grep_error_log_out (req 0)'
	#   at /usr/local/share/perl5/Test/Nginx/Socket.pm line 1145.
	#                   ''
	#     doesn't match '(?^:^healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer \[::1\]:12355 was checked to be ok
	# healthcheck: peer \[0:0::1\]:12356 was checked to be ok
	# (?:healthcheck: peer 127\.0\.0\.1:12354 was checked to be ok
	# healthcheck: peer \[::1\]:12355 was checked to be ok
	# healthcheck: peer \[0:0::1\]:12356 was checked to be ok
	# ){3,7}$)'
	# Looks like you failed 15 tests of 99.
	t/sanity.t .. Dubious, test returned 15 (wstat 3840, 0xf00)
	Failed 15/99 subtests

	Test Summary Report
	-------------------
	t/sanity.t (Wstat: 3840 Tests: 99 Failed: 15)
	  Failed tests:  3, 9-10, 16, 27, 33, 39, 45, 57, 60, 68
					79, 88, 93-94
	  Non-zero exit status: 15
	Files=1, Tests=99, 37 wallclock secs ( 0.04 usr  0.01 sys +  0.60 cusr  0.25 csys =  0.90 CPU)
	Result: FAIL
	[root  lua-resty-upstream-healthcheck-0.06]#

Please help suggest whether I need to export any specific environment variables, set up any additional service, try any compiler flag, or somehow increase a timeout value to make these test cases pass.

nginx version (compiled with libdrizzle 1.0, and with radius, mariadb, and postgresql services set up):

# nginx -V
nginx version: openresty/1.17.1.1rc0
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC)
built with OpenSSL 1.0.2k-fips  26 Jan 2017
TLS SNI support enabled
configure arguments: --prefix=/usr/local/openresty/nginx --with-cc-opt='-O2 -DNGX_LUA_USE_ASSERT -DNGX_LUA_ABORT_AT_PANIC' --add-module=../ngx_devel_kit-0.3.1rc1 --add-module=../iconv-nginx-module-0.14 --add-module=../echo-nginx-module-0.61 --add-module=../xss-nginx-module-0.06 --add-module=../ngx_coolkit-0.2 --add-module=../set-misc-nginx-module-0.32 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.08 --add-module=../drizzle-nginx-module-0.1.11 --add-module=../ngx_postgres-1.0 --add-module=../srcache-nginx-module-0.31 --add-module=../ngx_lua-0.10.15 --add-module=../ngx_lua_upstream-0.07 --add-module=../headers-more-nginx-module-0.33 --add-module=../array-var-nginx-module-0.05 --add-module=../memc-nginx-module-0.19 --add-module=../redis2-nginx-module-0.15 --add-module=../redis-nginx-module-0.3.7 --add-module=../rds-json-nginx-module-0.15 --add-module=../rds-csv-nginx-module-0.09 --add-module=../ngx_stream_lua-0.0.7 --with-ld-opt=-Wl,-rpath,/usr/local/openresty/luajit/lib --with-http_image_filter_module --with-http_dav_module --with-http_auth_request_module --with-poll_module --with-stream --with-stream_ssl_module --with-stream_ssl_preread_module --with-http_ssl_module --with-http_addition_module --add-module=/usr/openresty/openresty_test_modules/nginx-eval-module --add-module=/usr/openresty/openresty_test_modules/replace-filter-nginx-module --with-stream --with-stream_ssl_preread_module

unix socket upstream

Hi! Kindly asking, since I failed to find any mention of unix sockets like unix:/tmp/forwarded.sock (without a port) being supported in upstreams.

(You can send plain HTTP requests to them, since they are essentially just tcp sockets.)

Can anyone confirm and possibly update the docs?

Update:

it looks like this code is exactly for unix sockets, but I'm not sure.
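For reference, referring to a unix domain socket in an upstream uses the standard nginx server syntax (the upstream name below is a placeholder):

upstream forwarded_backend {
    # no port here: nginx connects to the local unix socket instead of TCP
    server unix:/tmp/forwarded.sock;
}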

multiple stream group problem

What is the maximum number of checkers that healthcheck supports? We have 20+ upstream groups, but we have found we can only have 8 checkers; when we add the 9th checker, nginx can't start and reports the error: missing '}' character, although our config doesn't miss any '}'.

one domain returns 502

Many upstreams work, but this domain returns 502.
How can I debug this?
The conf is like this:
local ok, err = hc.spawn_checker{
shm = 'healthcheck', -- defined by 'lua_shared_dict'
upstream = 'test', -- defined by 'upstream'
type = 'http',

        http_req = 'GET /status HTTP/1.0\r\nHost: healthcheck.nxin.com\r\n\r\n',
                -- raw HTTP request for checking

        interval = 2000,  -- run the check cycle every 2 sec
        timeout = 1000,   -- 1 sec is the timeout for network operations
        fall = 3,  -- # of successive failures before turning a peer down
        rise = 2,  -- # of successive successes before turning a peer up
        valid_statuses = {200, 302, 301,401,402,403, 404},  -- a list valid HTTP status code
        concurrency = 10,  -- concurrency level for test requests
    }

The request response is below:

curl 10.21.1.11:8080/status -v

* About to connect() to 10.21.1.11 port 8080 (#0)
*   Trying 10.21.1.11...
* Connected to 10.21.1.11 (10.21.1.11) port 8080 (#0)

GET /status HTTP/1.1
User-Agent: curl/7.29.0
Host: 10.221.14.110:8080
Accept: */*

< HTTP/1.1 302
< Location: https:

attempt to index field 're' (a nil value)

error.log

2016/08/15 14:06:34 [error] 4566#0: init_worker_by_lua error: /usr/local/openresty/lualib/resty/upstream/healthcheck.lua:9: attempt to index field 're' (a nil value)
stack traceback:
    /usr/local/openresty/lualib/resty/upstream/healthcheck.lua:9: in main chunk
    [C]: in function 'require'
    init_worker_by_lua:2: in main chunk

Am I missing some lua dependencies?

Information returned by the v1/healthcheck endpoint

Usage scenario: used as a layer-7 LB in place of Nginx.
Current problem: all upstreams are configured with active checks (TCP checks), but GET v1/healthcheck only returns the status information for one of the upstream groups.
Requirement: something similar to Nginx's check_status page, to view the status of all upstreams in real time.
(screenshots of the upstream configuration pages and the endpoint response omitted)

Support for specifying custom port

Would it be possible to support the following?

type = "http",
port = 80

This would be a reasonable workaround for those needing support for SSL backends: as long as they also listen on port 80, it's a good enough health check for me, and it should be much easier to implement than adding type = "https".
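A hedged sketch of what the requested usage could look like in spawn_checker (the port field here is the proposed feature, not necessarily available in the installed version of the module):

local ok, err = hc.spawn_checker{
    shm = "healthcheck",
    upstream = "foo.com",
    type = "http",
    port = 80,  -- proposed: probe this port instead of each peer's own port
    http_req = "GET /status HTTP/1.0\r\nHost: foo.com\r\n\r\n",
    interval = 2000,
    timeout = 1000,
    fall = 3,
    rise = 2,
    valid_statuses = {200},
    concurrency = 10,
}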

openresty configuration upstream healthcheck nginx core dump

os version: Centos 6.5 x86_64 2.6.32-431.29.2.el6
openresty version : 1.9.7.1
config file path : /usr/local/openresty/nginx/conf/healthcheck.conf

lua_shared_dict healthcheck 10m;

lua_socket_log_errors off;

init_worker_by_lua_block {
    local hc = require "resty.upstream.healthcheck"

    local ok, err = hc.spawn_checker{
        shm = "healthcheck",  -- defined by "lua_shared_dict"
        upstream = "test", -- defined by "upstream"
        type = "http",

        http_req = "GET /  HTTP/1.0\r\nHost: test\r\n\r\n",

        interval = 2000,  -- run the check cycle every 2 sec
        timeout = 1000,   -- 1 sec is the timeout for network operations
        fall = 3,  -- # of successive failures before turning a peer down
        rise = 2,  -- # of successive successes before turning a peer up
        valid_statuses = {200, 302},  -- a list valid HTTP status code
        concurrency = 10,  -- concurrency level for test requests
    }
    if not ok then
        ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
        return
    end

}

The file is included from the configuration in the vhosts directory,
e.g.
include /usr/local/openresty/nginx/conf/healthcheck.conf

server {
..........
}

After reloading, nginx error.log reports core dumps:

2016/03/17 14:09:38 [alert] 32936#0: worker process 27057 exited on signal 11 (core dumped)
2016/03/17 14:09:40 [alert] 32936#0: worker process 27050 exited on signal 11 (core dumped)
2016/03/17 14:09:43 [alert] 32936#0: worker process 27059 exited on signal 11 (core dumped)
2016/03/17 14:09:44 [alert] 32936#0: worker process 27060 exited on signal 11 (core dumped)
2016/03/17 14:09:46 [alert] 32936#0: worker process 27061 exited on signal 11 (core dumped)

invalid http_req introduces 400 code

I introduced this module into my project but got a 400 error due to an invalid http_req.

    local ok, err = hc.spawn_checker{
        shm = "healthcheck",  -- defined by "lua_shared_dict"
        upstream = "foo.com", -- defined by "upstream"
        type = "http",

        -- does the http_req have a bug?
        -- the format of the http_req field may be the problem? At least here I always get a 400 code.
        http_req = "GET /status HTTP/1.0\r\nHost: foo.com\r\n\r\n",
                -- raw HTTP request for checking

        interval = 2000,  -- run the check cycle every 2 sec
        timeout = 1000,   -- 1 sec is the timeout for network operations
        fall = 3,  -- # of successive failures before turning a peer down
        rise = 2,  -- # of successive successes before turning a peer up
        valid_statuses = {200, 302},  -- a list valid HTTP status code
        concurrency = 10,  -- concurrency level for test requests
    }

Everything is OK if I update the http_req as below:
http_req = "GET /status HTTP/1.0\r\n\r\nHost: foo.com",

Please advise.
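For reference, a raw HTTP/1.0 request terminates its header section with a blank line, so the final \r\n\r\n belongs after the Host header; a minimal sketch of a well-formed http_req value built up piece by piece:

-- request line, then each header, then an empty line ending the header section
http_req = "GET /status HTTP/1.0\r\n"
        .. "Host: foo.com\r\n"
        .. "\r\n"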

lua entry thread aborted: runtime error: string length overflow stack traceback

Using curl to test:

curl http://127.0.0.1/status
curl: (52) Empty reply from server


and the log is:


2016/03/02 13:50:01 [error] 18864#0: *150 lua entry thread aborted: runtime error: string length overflow
stack traceback:
coroutine 0:
    [C]: in function 'get_primary_peers'
    /etc/nginx/lualib/resty/upstream/healthcheck.lua:682: in function 'status_page'
    content_by_lua(default.conf:102):4: in function <content_by_lua(default.conf:102):1>, client: 127.0.0.1, server: xxx, request: "GET /status HTTP/1.1", host: "127.0.0.1"


cat /etc/nginx/nginx.conf

user  nginx;
worker_processes  32;

#error_log  logs/error.log;
#error_log  logs/error.log  notice;
#error_log  logs/error.log  info;

#pid        logs/nginx.pid;
error_log  /home/ceph/log/nginx/error.log;
pid        /var/run/nginx.pid;
worker_rlimit_nofile 65535;



events {
    worker_connections  20000;
}


http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;
    vhost_traffic_status_zone;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent $request_length "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    #access_log  logs/access.log  main;
    access_log  /home/ceph/log/nginx/access.log  main;


    sendfile        on;
    #tcp_nopush     on;

    #keepalive_timeout  0;
    keepalive_timeout  30;

upstream foo.com {
    server 127.0.0.1:801;
    server 192.168.170.1:80;
}

lua_shared_dict healthcheck 1m;
lua_socket_log_errors off;
init_worker_by_lua_block {
    local hc = require "resty.upstream.healthcheck"
    local ok, err = hc.spawn_checker({
        shm = "healthcheck",
        upstream = "foo.com",
        type = "http",
        http_req = "GET / HTTP/1.1\r\nHost: 127.0.0.1\r\n\r\n",
        interval = 2000,
        timeout = 1000,
        fall = 3,
        rise = 2,
        valid_statuses = {200, 302},
        concurrency = 10,
        })
        if not ok then
            ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
            return
        end
    }

    include /etc/nginx/conf.d/*.conf;
}


cat /etc/nginx/conf.d/default.conf

server {
    listen       80 backlog=10240;
    gzip off;
    client_max_body_size 0;
    server_name  xxx;



    location  / {
    set $target '';
    proxy_buffering off;
    proxy_ignore_client_abort on ;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_pass http://foo.com;
    }

    location = /status {
        access_log off;
        default_type text/plain;
        content_by_lua_block {
            local hc = require "resty.upstream.healthcheck"
            ngx.say("Nginx Worker PID: ", ngx.worker.pid())
            ngx.print(hc.status_page())
            }
    }
}


healthcheck behind proxy

Thanks for the great module.
I'm using the newest OpenResty version, behind a proxy. I configured your module and it's working, but the status is down:

Upstream test
    Primary Peers
        10.1.11.12:222 DOWN
        10.1.11.12:222 DOWN
        10.1.11.13:222 DOWN
    Backup Peers

no request object found?

I am using openresty version 1.13.62, and I installed two openresty machines to do the testing. I followed the same test code as in lua-resty-upstream-healthcheck, but the following error occurred. Please help; this is the first time I have come into contact with openresty, thanks!

problem: (screenshot omitted)

this is what I wrote in the init by lua block phase: (screenshot omitted)

nginx.conf: (screenshots omitted)

Feature request: support dynamic upstream

Hi,

The current usage scenario assumes that the upstream configuration will not change dynamically; we can only change the upstream configuration by reloading nginx.

In some scenarios (see also #55), we want to modify the upstream configuration dynamically without reloading nginx, and the current design cannot satisfy this. So can we add a parameter to hc.spawn_checker to support a dynamic upstream mode?

Case 1: support upstream modification only

In this case, just add a parameter to hc.spawn_checker:

init_worker_by_lua_block {
    local us, err = get_upstreams()
    if not us then
        return
    end

    for _, u in ipairs(us) do
        local ok, err = hc.spawn_checker{
            shm = "healthcheck",
            type = "http",
            upstream = u,

            dynamic = true, -- enable dynamic upstream mode
        }
    end
}

Case 2: support upstream add/delete/modify

In this case, we should add a timer to watch the global upstream configuration:

init_worker_by_lua_block {
    require "resty.core"

    local upstream = require "ngx.upstream"
    local hc = require "resty.upstream.healthcheck"

    local get_upstreams = upstream.get_upstreams

    local watch
    watch = function (premature)
        if premature then
            return
        end

        local us, err = get_upstreams()
        if not us then
            return
        end

        for _, u in ipairs(us) do
            local ok, err = hc.spawn_checker{
                shm = "healthcheck",
                type = "http",
                upstream = u,

                dynamic = true, -- enable dynamic upstream mode
            }
        end

        local ok, err = ngx.timer.at(2, watch)
    end

    local ok, err = ngx.timer.at(2, watch) -- create a timer to watch global upstream configuration every 2s.
}

A dynamic upstream mode checker would behave as follows:

  • When spawn_checker is called, if no checker exists for the upstream yet, a new checker is created; otherwise the call is ignored. (case 1)
  • At any time, when an upstream is deleted, its checker exits automatically. (case 1)
  • At any time, when an upstream is modified (detected by comparing the peers' md5 digest), the checker updates its peers automatically. (case 1 + case 2)

Set a different upstream manager module

The module currently relies on the ngx.upstream module for managing the upstreams.

Unfortunately we cannot use that module for our use case. See the discussion here.

Would you accept a PR to set the module explicitly, making it configurable, so I won't have to fork the code?

If so, should it be a setting at the module level (in an upvalue, shared by all spawned checkers), or a setting per spawned checker? IMHO the latter, but please share your thoughts.
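A hedged sketch of the per-checker variant being discussed (the upstream_manager option and the my.upstream.manager module are hypothetical, shown only to illustrate the shape of the setting):

local hc = require "resty.upstream.healthcheck"

local ok, err = hc.spawn_checker{
    shm = "healthcheck",
    upstream = "foo.com",
    type = "http",
    -- hypothetical option: any module exposing the same interface as ngx.upstream
    upstream_manager = require "my.upstream.manager",
    -- ...plus the usual http_req, interval, timeout, etc.
}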

NO checkers problem?

openresty version is 1.11.2.2
Nginx A and nginx B have the same upstream contents, e.g.:
A:
upstream app_cluster {
ip_hash;
server 192.168.0.100:9082;
server 192.168.0.100:9083;
}
B:
upstream proxy_AB {
ip_hash;
server 192.168.0.100:9082;
server 192.168.0.100:9083;
}
After shutting down the application servers on ports 9082 and 9083, nginx A shows the correct check result:
Nginx Worker PID: 21127
Upstream app_cluster
Primary Peers
192.168.0.100:9082 DOWN
192.168.0.100:9083 DOWN
Backup Peers

but nginx B shows "NO checkers", as follows:
Nginx Worker PID: 31735
Upstream proxy_AB (NO checkers)
Primary Peers
192.168.0.100:9082 DOWN
192.168.0.100:9083 DOWN
Backup Peers

Is the result on nginx B correct?

Init error using config as given in synopsis section of README

The error msg is [error] 9595#0: init_worker_by_lua error: init_worker_by_lua:11: function arguments expected near '.'

And it seems like it's caused by this comment line: -- then you should write this instead: http_req = "GET /status HTTP/1.0\r\nHost: foo.com\r\n\r\n",. I think it's a trivial issue, but I failed to fix it :-(, as I'm new to both OpenResty and Lua.

checking response body as part of healthcheck?

Hello,

Is there a way to check the contents of the response body in addition to the list of valid statuses? For example, the ability to fail in a situation where the /healthCheck response code is HTTP 200 but the body contains 'failure'.

Thank you!

Support for upstreams that use https.

This doesn't seem to support upstreams that require SSL connections. Are there any plans to expand this beyond pure HTTP tests? Or any suggestions on how I could work around this? I'm fairly new to Lua/OpenResty, so I may be overlooking something obvious.

Thanks!

Test Case failures

Test case 14 fails to execute.

Error: (screenshot omitted)
The IPv6 format is inconsistent.

The Linux kernel simplifies the IPv6 address; can we modify this test case?

system: (screenshot omitted)

bad status line always returned from healthcheck

Hi, I've just tried setting up this module, and all peers are very quickly marked as down.

My init_worker code is this, basically copy/pasted from the docs:

local ok, err = hc.spawn_checker({
  shm = "healthcheck",
  upstream = "backend",
  type = "http",
  http_req = [[GET /XXX/XXX HTTP/1.0\r\nHost: www.academia.edu\r\n\r\n]],
  interval = 2000,
  timeout = 1000,
  fall = 3,
  rise = 2,
  valid_statuses = { 200 },
  concurrency = 10
})
if not ok then
  ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
  return 
end

In the logs I see:

2014/06/19 16:36:40 [error] 2623#0: [lua] healthcheck.lua:57: errlog(): healthcheck: bad status line from XX.XXX.XXX.XXX:XX: <html>, context: ngx.timer
2014/06/19 16:36:40 [error] 2623#0: lua user thread aborted: runtime error: /var/lib/academia-config/nginx/shared/healthcheck.lua:255: bad argument #2 to 'sub' (number expected, got nil)
stack traceback:
coroutine 0:
        [C]: in function 'sub'
        /var/lib/academia-config/nginx/shared/healthcheck.lua:255: in function </var/lib/academia-config/nginx/shared/healthcheck.lua:202>
coroutine 1:
        [C]: in function 'receive'
        /var/lib/academia-config/nginx/shared/healthcheck.lua:236: in function 'check_peer'
        /var/lib/academia-config/nginx/shared/healthcheck.lua:315: in function 'check_peers'
        /var/lib/academia-config/nginx/shared/healthcheck.lua:460: in function </var/lib/academia-config/nginx/shared/healthcheck.lua:454>
        [C]: in function 'pcall'
        /var/lib/academia-config/nginx/shared/healthcheck.lua:490: in function </var/lib/academia-config/nginx/shared/healthcheck.lua:485>, context: ngx.timer

This is surprising, because if I telnet to the backend and paste that request, it responds correctly:

# telnet xxx.xxx.xx 81
Trying xx.xxx.xxx.xxx...
Connected to ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com.
Escape character is '^]'.
GET /XXX/XXX HTTP/1.0
Host: www.academia.edu

HTTP/1.1 200 OK
Date: Fri, 20 Jun 2014 00:11:40 GMT
Content-Type: text/plain; charset=utf-8
Connection: close
Vary: Accept-Encoding
Status: 200 OK
Content-Language: en
X-Logged-In: false
ETag: "0e66aa8f5070dcc18fb58711fedfa4eb"
Cache-Control: max-age=0, private, must-revalidate
X-Request-Id: c451fceb4eb3f0dca7b7f97df8e8543d
X-Runtime: 0.045754

Rails up
Postgres master up
Memcached up
Persistent Redis up
Counts Redis up
Zsets redis up Connection closed by foreign host.

Is there some misconfiguration somewhere in my code?

ngx_lua 0.9.5+ required

when using openresty/openresty:1.15.8.3-2-alpine-fat image I receive an error that ngx_lua 0.9.5+ is required.

web_1  | 2020/05/22 21:49:48 [error] 8#8: init_worker_by_lua error: /usr/local/openresty/lualib/resty/upstream/healthcheck.lua:28: ngx_lua 0.9.5+ required
web_1  | stack traceback:
web_1  | 	[C]: in function 'error'
web_1  | 	/usr/local/openresty/lualib/resty/upstream/healthcheck.lua:28: in main chunk
web_1  | 	[C]: in function 'require'
web_1  | 	init_worker_by_lua:2: in main chunk
web_1  | 2020/05/22 21:49:48 [error] 6#6: init_worker_by_lua error: /usr/local/openresty/lualib/resty/upstream/healthcheck.lua:28: ngx_lua 0.9.5+ required
web_1  | stack traceback:
web_1  | 	[C]: in function 'error'
web_1  | 	/usr/local/openresty/lualib/resty/upstream/healthcheck.lua:28: in main chunk
web_1  | 	[C]: in function 'require'
web_1  | 	init_worker_by_lua:2: in main chunk
web_1  | 2020/05/22 21:49:48 [error] 9#9: init_worker_by_lua error: /usr/local/openresty/lualib/resty/upstream/healthcheck.lua:28: ngx_lua 0.9.5+ required
web_1  | stack traceback:
web_1  | 	[C]: in function 'error'
web_1  | 	/usr/local/openresty/lualib/resty/upstream/healthcheck.lua:28: in main chunk
web_1  | 	[C]: in function 'require'
web_1  | 	init_worker_by_lua:2: in main chunk
web_1  | 2020/05/22 21:49:48 [error] 7#7: init_worker_by_lua error: /usr/local/openresty/lualib/resty/upstream/healthcheck.lua:28: ngx_lua 0.9.5+ required
web_1  | stack traceback:
web_1  | 	[C]: in function 'error'
web_1  | 	/usr/local/openresty/lualib/resty/upstream/healthcheck.lua:28: in main chunk
web_1  | 	[C]: in function 'require'
web_1  | 	init_worker_by_lua:2: in main chunk
/ # nginx -V
nginx version: openresty/1.15.8.3
built by gcc 9.2.0 (Alpine 9.2.0)
built with OpenSSL 1.1.1g  21 Apr 2020
TLS SNI support enabled
configure arguments: --prefix=/usr/local/openresty/nginx --with-cc-opt='-O2 -DNGX_LUA_ABORT_AT_PANIC -I/usr/local/openresty/pcre/include -I/usr/local/openresty/openssl/include' --add-module=../ngx_devel_kit-0.3.1rc1 --add-module=../echo-nginx-module-0.61 --add-module=../xss-nginx-module-0.06 --add-module=../ngx_coolkit-0.2 --add-module=../set-misc-nginx-module-0.32 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.08 --add-module=../srcache-nginx-module-0.31 --add-module=../ngx_lua-0.10.15 --add-module=../ngx_lua_upstream-0.07 --add-module=../headers-more-nginx-module-0.33 --add-module=../array-var-nginx-module-0.05 --add-module=../memc-nginx-module-0.19 --add-module=../redis2-nginx-module-0.15 --add-module=../redis-nginx-module-0.3.7 --add-module=../rds-json-nginx-module-0.15 --add-module=../rds-csv-nginx-module-0.09 --add-module=../ngx_stream_lua-0.0.7 --with-ld-opt='-Wl,-rpath,/usr/local/openresty/luajit/lib -L/usr/local/openresty/pcre/lib -L/usr/local/openresty/openssl/lib -Wl,-rpath,/usr/local/openresty/pcre/lib:/usr/local/openresty/openssl/lib' --with-pcre --with-compat --with-file-aio --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_geoip_module=dynamic --with-http_gunzip_module --with-http_gzip_static_module --with-http_image_filter_module=dynamic --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-http_xslt_module=dynamic --with-ipv6 --with-mail --with-mail_ssl_module --with-md5-asm --with-pcre-jit --with-sha1-asm --with-stream --with-stream_ssl_module --with-threads --with-stream --with-stream_ssl_preread_module

the problem with the healthcheck result

My code in nginx.conf:

upstream swordnet.com{
		server 127.0.0.1:18080; # two tomcats have been started
		server 127.0.0.1:28080;
	}
lua_shared_dict ngx_stats 500m;
lua_shared_dict healthcheck 1m;
lua_socket_log_errors off;
init_worker_by_lua_block {
	local hc = require "resty.upstream.healthcheck"
	local ok, err = hc.spawn_checker {
		shm = "healthcheck",
		type = "http",
		upstream = "swordnet.com",
		http_req = "GET /health.txt HTTP/1.0\r\nHost: swordnet.com\r\n\r\n",
		valid_statuses = {200, 302}
	}
	if not ok then
	ngx.log(ngx.ERR, "=======> failed to spawn health checker: ", err)
	return
	end
}

.................

location /server/status {
			access_log off;
			default_type text/plain;
			allow 127.0.0.1;
			deny all;
			content_by_lua_block {
				local hc = require "resty.upstream.healthcheck"
				ngx.say("Nginx Worker PID: ", ngx.worker.pid())
				ngx.print(hc.status_page())
			}
		}

When I visited the URL http://localhost/server/status in a browser, it showed the following:

Nginx Worker PID: 12158
Upstream swordnet.com
    Primary Peers
        127.0.0.1:18080 DOWN
        127.0.0.1:28080 DOWN
    Backup Peers

All visits then resulted in 502 gateway errors, and the error.log shows:
no live upstreams while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://swordnet.com/", host: "localhost:81"
But when I changed upstream = "swordnet.com" to something other than swordnet.com,
everything became normal again; however, no matter what state the tomcats were in (started or stopped), it always showed the following:

Nginx Worker PID: 13742
Upstream swordnet.com (NO checkers)
    Primary Peers
        127.0.0.1:18080 up
        127.0.0.1:28080 up
    Backup Peers

PR offer: Making Prometheus metrics more adaptable

I am offering to make a PR for this library; I only want to ask about the preferred direction beforehand.

The goal

At the end, no matter what, we need metrics looking like

nginx_upstream_status_info{name=\"%s\",endpoint=\"%s\",role=\"%s\"} %d

instead of

nginx_upstream_status_info{name=\"%s\",endpoint=\"%s\",status=\"%s\",role=\"%s\"} 1

=> I will remove the status tag and represent the current status by the value only.
That's the way the haproxy_server_status metric is implemented, for example, and it is much easier to graph and to read the graphs if the values differ instead of the tags.
It is also easier to alert on.
So this is the end result: to have a way to get metrics looking like the above from this library.

Now the question is what the preferred way is from your side (both options would be implemented in a non-breaking manner):
A) I implement an opts parameter for the prometheus_status_page()
--> st, err = hc.prometheus_status_page{ up_value = 1, down_value = 0, unknown_value = -1, include_status_tag = false }
B) I implement a second metric method to call if the status should change by value, not by tag
--> st, err = hc.prometheus_status_page_by_value()

Both would work; A) would create more flags and more parameters on internal functions, while B) would create new functions and "duplicate" some code, I guess.
There is still C), where I just implement it the way we need as a breaking change, but I guess this is the least preferred option?

Please decide on one of these.

Best regards!

health checker does not support stream directive

The health checker can't be used in the stream directive.

The snippet of nginx.conf:

stream {
    lua_shared_dict healthcheck 1m;
    init_worker_by_lua_block {
        local hc = require "resty.upstream.healthcheck"

The error message is:
nginx: [emerg] "lua_shared_dict" directive is not allowed here in ./conf/nginx.conf:121

each worker spawns a health checker - is this ok?

It looks like each worker is calling hc.spawn_checker; is this by design?

Thanks!

Config:

worker_processes  2;
error_log logs/error.log  warn;

events {
    worker_connections 1024;
    use epoll;
}

...

    init_worker_by_lua_block {
        local hc = require "resty.upstream.healthcheck"
        ngx.log(ngx.INFO, "initialising health checker for upstreams manually defined")
...

Log:

2016/09/28 12:15:47 [notice] 4778#0: using the "epoll" event method
2016/09/28 12:15:47 [notice] 4778#0: openresty/1.9.7.3
2016/09/28 12:15:47 [notice] 4778#0: built by gcc 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
2016/09/28 12:15:47 [notice] 4778#0: OS: Linux 3.13.0-62-generic
2016/09/28 12:15:47 [notice] 4778#0: getrlimit(RLIMIT_NOFILE): 1024:4096
2016/09/28 12:15:47 [notice] 4779#0: start worker processes
2016/09/28 12:15:47 [notice] 4779#0: start worker process 4780
2016/09/28 12:15:47 [notice] 4779#0: start worker process 4781
2016/09/28 12:15:47 [notice] 4779#0: start cache manager process 4782
2016/09/28 12:15:47 [notice] 4779#0: start cache loader process 4783
2016/09/28 12:15:47 [info] 4780#0: [lua] init_worker_by_lua:3: initialising health checker for upstreams manually defined, context: init_worker_by_lua
2016/09/28 12:15:47 [error] 4780#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 95.172.249.216:81: connection refused, context: ngx.timer
2016/09/28 12:15:47 [error] 4780#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 193.72.34.102:81: connection refused, context: ngx.timer
2016/09/28 12:15:47 [info] 4782#0: [lua] init_worker_by_lua:3: initialising health checker for upstreams manually defined, context: init_worker_by_lua
2016/09/28 12:15:47 [info] 4781#0: [lua] init_worker_by_lua:3: initialising health checker for upstreams manually defined, context: init_worker_by_lua
2016/09/28 12:15:47 [info] 4783#0: [lua] init_worker_by_lua:3: initialising health checker for upstreams manually defined, context: init_worker_by_lua
2016/09/28 12:15:48 [error] 4780#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 12.121.12.21:8080: timeout, context: ngx.timer
2016/09/28 12:15:53 [info] 4781#0: *20 client 10.78.153.254 closed keepalive connection
2016/09/28 12:15:57 [error] 4780#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 91.146.149.126:81: connection refused, context: ngx.timer
2016/09/28 12:15:57 [error] 4780#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 193.701.12.222:81: connection refused, context: ngx.timer
2016/09/28 12:15:58 [error] 4781#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 19.126.12.23:8080: timeout, context: ngx.timer
2016/09/28 12:16:07 [error] 4781#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 91.276.349.426:81: connection refused, context: ngx.timer
2016/09/28 12:16:07 [warn] 4781#0: [lua] healthcheck.lua:53: warn(): healthcheck: peer 951.276.349.226:81 is turned down after 3 failure(s), context: ngx.timer
2016/09/28 12:16:07 [error] 4781#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 93.27.34.202:81: connection refused, context: ngx.timer
2016/09/28 12:16:07 [warn] 4781#0: [lua] healthcheck.lua:53: warn(): healthcheck: peer 193.727.155.22:81 is turned down after 3 failure(s), context: ngx.timer
2016/09/28 12:16:08 [error] 4783#0: [lua] healthcheck.lua:57: errlog(): healthcheck: failed to connect to 110.322.47.21:8080: timeout, context: ngx.timer
2016/09/28 12:16:08 [warn] 4783#0: [lua] healthcheck.lua:53: warn(): healthcheck: peer 10.122.133.25:8080 is turned down after 3 failure(s), context: ngx.timer
2016/09/28 12:16:47 [notice] 4783#0: http file cache: /usr/local/openresty/nginx/cache 0.152M, bsize: 4096
2016/09/28 12:16:47 [notice] 4783#0: http file cache: /usr/local/openresty/nginx/cache/stream 0.000M, bsize: 4096
2016/09/28 12:16:47 [notice] 4783#0: http file cache: /usr/local/openresty/nginx/cache/re 0.000M, bsize: 4096
2016/09/28 12:16:47 [notice] 4783#0: http file cache: /usr/local/openresty/nginx/cache/events 0.000M, bsize: 4096
2016/09/28 12:16:47 [notice] 4783#0: http file cache: /usr/local/openresty/nginx/cache/twimg 0.000M, bsize: 4096
2016/09/28 12:16:47 [notice] 4779#0: signal 17 (SIGCHLD) received
2016/09/28 12:16:47 [notice] 4779#0: cache loader process 4783 exited with code 0
2016/09/28 12:16:47 [notice] 4779#0: signal 29 (SIGIO) received
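If spawning from every worker turns out to be undesirable, one hedged workaround is to guard the call with the worker id (a sketch only; it assumes ngx.worker.id() is available in the ngx_lua version in use and that a single checker per nginx instance is acceptable):

init_worker_by_lua_block {
    -- only worker 0 spawns the checker; the other workers skip it
    if ngx.worker.id() == 0 then
        local hc = require "resty.upstream.healthcheck"
        local ok, err = hc.spawn_checker{
            shm = "healthcheck",
            upstream = "foo.com",
            type = "http",
            http_req = "GET /status HTTP/1.0\r\nHost: foo.com\r\n\r\n",
        }
        if not ok then
            ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
        end
    end
}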

Log messages for multiple upstreams with same servers, but different hostnames

Guys,
I’ve followed the module's recommendation for multiple upstreams.
The error log has messages like:

[error] 193#193: *118495174 [lua] healthcheck.lua:53: errlog(): healthcheck: failed to receive status line from 10.0.0.1:80: timeout, context: ngx.timer
[error] 190#190: *118495179 [lua] healthcheck.lua:53: errlog(): healthcheck: failed to receive status line from 10.0.0.5:80: timeout, context: ngx.timer

How do I tell which upstream they belong to?

The full setup is described here; a snippet is below:

upstream one.abc.com_80 {
                server 10.0.0.1:80;
                server 10.0.0.2:80;
                ...
                server 10.0.0.8:80;
}
 
upstream two.abc.com_80 {
                server 10.0.0.1:80;
                server 10.0.0.2:80;
                ...
                server 10.0.0.8:80;
}

init_worker.lua

local servers = { 
	"one.abc.com", "two.abc.com", ...
}

local hc = require "resty.upstream.healthcheck"

local function checker(upstream, server_name)
	local ok, err = hc.spawn_checker{
		shm = "healthcheck",  -- defined by "lua_shared_dict"
		upstream = upstream,
		type = "http",

		http_req = "GET /HealthCheck/Health.ashx HTTP/1.0\r\nHost: " .. server_name .. "\r\n\r\n", -- raw HTTP request for checking
		interval = 2000, -- run the check cycle every 2 sec
		timeout = 1000, -- 1 sec is the timeout for network operations
		fall = 3, -- # of successive failures before turning a peer down
		rise = 2, -- # of successive successes before turning a peer up
		valid_statuses = {200}, -- a list valid HTTP status code
		concurrency = 10, -- concurrency level for test requests
	}

	if not ok then
		ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
	end

	return ok
end

local function main()
	for _, server in ipairs(servers) do
		checker(server .. "_80", server)
	end
end

main()


Does not compile with openresty-1.13.5.1rc0

Looked for similar issues on other revs but can't find an exact patch. Any suggestions?
openresty-1.13.5.1rc0
nginx-1.13.5

--2017-10-21 17:09:50--  https://github.com/openresty/lua-resty-upstream-healthcheck/tarball/v0.05
Resolving github.com (github.com)... 192.30.253.112, 192.30.253.113
Connecting to github.com (github.com)|192.30.253.112|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/openresty/lua-resty-upstream-healthcheck/legacy.tar.gz/v0.05 [following]
--2017-10-21 17:09:50--  https://codeload.github.com/openresty/lua-resty-upstream-healthcheck/legacy.tar.gz/v0.05
Resolving codeload.github.com (codeload.github.com)... 192.30.253.121, 192.30.253.120
Connecting to codeload.github.com (codeload.github.com)|192.30.253.121|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘lua-resty-upstream-healthcheck-0.05.tar.gz’
 
0K .......... ..                                          9.72M=0.001s
 
2017-10-21 17:09:50 (9.72 MB/s) - ‘lua-resty-upstream-healthcheck-0.05.tar.gz’ saved [12463]
 
unix2dos: converting file README-win32.txt to DOS format ...
/tmp/nginx-upstream-fair/ngx_http_upstream_fair_module.c: In function ‘ngx_http_upstream_init_fair_rr’:
/tmp/nginx-upstream-fair/ngx_http_upstream_fair_module.c:543:28: error: ‘ngx_http_upstream_srv_conf_t {aka struct ngx_http_upstream_srv_conf_s}’ has no member named ‘default_port’
if (us->port == 0 && us->default_port == 0) {
^
/tmp/nginx-upstream-fair/ngx_http_upstream_fair_module.c:553:51: error: ‘ngx_http_upstream_srv_conf_t {aka struct ngx_http_upstream_srv_conf_s}’ has no member named ‘default_port’
u.port = (in_port_t) (us->port ? us->port : us->default_port);
^
make[2]: *** [objs/addon/nginx-upstream-fair/ngx_http_upstream_fair_module.o] Error 1
make[1]: *** [build] Error 2
make: *** [all] Error 2

init_worker_by_lua core dumps

I am attempting to set up health checks using this library in combination with upstreams which are stored in redis. When trying to test the example provided in the README, the nginx worker process core dumps repeatedly.

nginx error.log:

2014/09/02 18:34:32 [alert] 4898#0: worker process 5415 exited on signal 11 (core dumped)
2014/09/02 18:34:32 [alert] 4898#0: worker process 5418 exited on signal 11 (core dumped)

I tried removing the contents of the init_worker_by_lua block (so it was empty), and experienced the same behavior. In both cases, the nginx configuration test passed. In an attempt to better understand what was happening, I straced the nginx master process, but was unable to make any inferences.

strace output:

socketpair(PF_LOCAL, SOCK_STREAM, 0, [3, 14]) = 0
ioctl(3, FIONBIO, [1])                  = 0
ioctl(14, FIONBIO, [1])                 = 0
ioctl(3, FIOASYNC, [1])                 = 0
fcntl(3, F_SETOWN, 4898)                = 0
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
fcntl(14, F_SETFD, FD_CLOEXEC)          = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f808e9eca50) = 8899
rt_sigsuspend([])                       = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_DUMPED, si_pid=8899, si_status=SIGSEGV, si_utime=0, si_stime=0} ---
gettimeofday({1409683086, 122934}, NULL) = 0
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV && WCOREDUMP(s)}], WNOHANG, NULL) = 8899
write(4, "2014/09/02 18:38:06 [alert] 4898"..., 90) = 90
wait4(-1, 0x7fff1a357a3c, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
gettimeofday({1409683086, 123357}, NULL) = 0
close(3)                                = 0
close(14)                               = 0
socketpair(PF_LOCAL, SOCK_STREAM, 0, [3, 14]) = 0
ioctl(3, FIONBIO, [1])                  = 0
ioctl(14, FIONBIO, [1])                 = 0
ioctl(3, FIOASYNC, [1])                 = 0
fcntl(3, F_SETOWN, 4898)                = 0
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
fcntl(14, F_SETFD, FD_CLOEXEC)          = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f808e9eca50) = 8901
rt_sigsuspend([])                       = ? ERESTARTNOHAND (To be restarted if no handler)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_DUMPED, si_pid=8901, si_status=SIGSEGV, si_utime=0, si_stime=0} ---
gettimeofday({1409683086, 249215}, NULL) = 0
wait4(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV && WCOREDUMP(s)}], WNOHANG, NULL) = 8901
write(4, "2014/09/02 18:38:06 [alert] 4898"..., 90) = 90
wait4(-1, 0x7fff1a357a3c, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn()                          = -1 EINTR (Interrupted system call)
gettimeofday({1409683086, 249710}, NULL) = 0
close(3)                                = 0
close(14)                               = 0

From my debugging it does not seem that the issue is with this library, but rather the lua-nginx-module init_worker_by_lua function, or, more likely, my use of it. If this turns out to actually be an issue in the module, I can create an issue on that repository.

I am running openresty/1.7.2.1. Please let me know if additional information about my configuration is needed.

Why Is the Health Check Status Inherited by PeerID During Reload?

I have a cluster that contains two servers. The health check for the first server is down, and the second server is OK. When I delete the first server and keep the second server, after a reload the second server is immediately marked DOWN. This is because the peer status is keyed by the peer ID. See:

local key = gen_peer_key("d:", u, is_backup, id)

To solve this problem, can we use the name of the peer to identify a peer instead of the ID?
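A hedged sketch of the proposed change, keying the shared-dict entry by the peer's name rather than its positional id (gen_peer_key is the module's internal helper quoted above; the surrounding code would need the peer table, or at least its name field, in scope):

-- current: keyed by positional id, so the state follows the slot after a reload
-- local key = gen_peer_key("d:", u, is_backup, id)

-- proposed: keyed by the peer's "host:port" name, so the state follows the server
local key = gen_peer_key("d:", u, is_backup, peer.name)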
