kong / lua-resty-healthcheck

Healthcheck library for OpenResty to validate upstream service status

Home Page: https://kong.github.io/lua-resty-healthcheck/topics/README.md.html

License: Apache License 2.0

Makefile 0.44% Lua 99.56%

lua-resty-healthcheck's Issues

checker:event_handler fails on a "remove" event

When the self.targets table is populated, the code uses hostname or ip in case the user did not provide a hostname. However, when the same table is accessed for a "remove" event, the code simply refers to target_found.hostname, which results in an attempt to access a nil key.

Adding or target_found.ip to that key lookup should suffice.
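A minimal sketch of the suggested change, assuming self.targets is keyed as targets[ip][port][hostname] the way the population code implies (the exact code in healthcheck.lua may differ):

-- "remove" event: fall back to the ip when the target was registered
-- without a hostname, mirroring how the entry was created
self.targets[target_found.ip][target_found.port][target_found.hostname or target_found.ip] = nil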

Exception!!! 'report_tcp_success()' unimplemented in healthcheck.lua

2019/05/10 12:23:29 [error] 12056#0: *195028 lua user thread aborted: runtime error: /usr/local/share/lua/5.1/resty/healthcheck.lua:737: attempt to call method 'report_tcp_success' (a nil value)
stack traceback:
coroutine 0:
/usr/local/share/lua/5.1/resty/healthcheck.lua: in function 'run_single_check'
/usr/local/share/lua/5.1/resty/healthcheck.lua:798: in function </usr/local/share/lua/5.1/resty/healthcheck.lua:794>
coroutine 1:
[C]: in function 'connect'
/usr/local/share/lua/5.1/resty/healthcheck.lua:726: in function 'run_single_check'
/usr/local/share/lua/5.1/resty/healthcheck.lua:798: in function 'run_work_package'
/usr/local/share/lua/5.1/resty/healthcheck.lua:826: in function 'active_check_targets'
/usr/local/share/lua/5.1/resty/healthcheck.lua:907: in function </usr/local/share/lua/5.1/resty/healthcheck.lua:871>, context: ngx.timer, client: 127.0.0.1, server: 127.0.0.1:8001
2019/05/10 12:23:29 [error] 12056#0: *195028 lua user thread aborted: runtime error: /usr/local/share/lua/5.1/resty/healthcheck.lua:737: attempt to call method 'report_tcp_success' (a nil value)
stack traceback:
coroutine 0:
/usr/local/share/lua/5.1/resty/healthcheck.lua: in function 'run_single_check'
/usr/local/share/lua/5.1/resty/healthcheck.lua:798: in function </usr/local/share/lua/5.1/resty/healthcheck.lua:794>
coroutine 1:
[C]: in function 'connect'
/usr/local/share/lua/5.1/resty/healthcheck.lua:726: in function 'run_single_check'
/usr/local/share/lua/5.1/resty/healthcheck.lua:798: in function 'run_work_package'
/usr/local/share/lua/5.1/resty/healthcheck.lua:826: in function 'active_check_targets'
/usr/local/share/lua/5.1/resty/healthcheck.lua:907: in function </usr/local/share/lua/5.1/resty/healthcheck.lua:871>, context: ngx.timer, client: 127.0.0.1, server: 127.0.0.1:8001
2019/05/10 12:23:29 [error] 12056#0: *195028 lua entry thread aborted: runtime error: /usr/local/share/lua/5.1/resty/healthcheck.lua:737: attempt to call method 'report_tcp_success' (a nil value)
stack traceback:
coroutine 0:
/usr/local/share/lua/5.1/resty/healthcheck.lua: in function 'run_single_check'
/usr/local/share/lua/5.1/resty/healthcheck.lua:798: in function 'run_work_package'
/usr/local/share/lua/5.1/resty/healthcheck.lua:826: in function 'active_check_targets'
/usr/local/share/lua/5.1/resty/healthcheck.lua:907: in function </usr/local/share/lua/5.1/resty/healthcheck.lua:871>, context: ngx.timer, client: 127.0.0.1, server: 127.0.0.1:8001

HTTP/1.0 is out of date?

Hi,
I'm just a Kong user and know little about Lua.
I'm just wondering about line 1050:

local request = ("GET %s HTTP/1.0\r\n%sHost: %s\r\n\r\n"):format(path, headers, hostheader or hostname or ip)

Could such a health check request use HTTP/1.1?

Are there any plans to use version 2.0 in Kong?

As I understand it, version 2.0 of lua-resty-healthcheck uses a shared dict to run the health checker timer in only one worker process (a system-wide timer). The previous version, 1.6.2, creates a dedicated health checker timer per worker process. In some cases this causes too much stress on upstream services when we run a large number of Kong pods.

Is there any plan for Kong to upgrade to version 2.0 so that it makes use of Kong-node-level timers?

Timeout Errors and Unhealthy Upstreams during Health Checks

We are experiencing frequent timeout errors during the health checks of our services. While the health APIs work fine when invoked directly from the nodes, we encounter issues during the health checks performed by Kong's lua-resty-healthcheck library.

The timeout errors are logged as follows:

Unhealthy TIMEOUT increment (10/3) for 'my-service.my-domain.com(10.123.321.234:443)', context: ngx.timer
Failed to receive status line from 'my-service.my-domain.com(10.123.321.234:443)': timeout, context: ngx.timer
Failed SSL handshake with 'my-service.my-domain.com(10.123.321.234:443)': handshake failed, context: ngx.timer

It is important to note that this issue affects specific upstreams, and only one or two pods at a time experience this problem. The upstreams remain in an unhealthy state and do not recover automatically. The issue is resolved temporarily by restarting the affected Kong pod, which sets the upstream to a healthy state again.

Upon investigating the code used by Kong's lua-resty-healthcheck library, it appears that the health check query is performed using HTTP/1.0. The relevant code snippet is as follows:

local request = ("GET %s HTTP/1.0\r\n%sHost: %s\r\n\r\n"):format(path, headers, hostheader or hostname or ip)

Considering this, we suspect that the timeouts might be related to the usage of HTTP/1.0 instead of HTTP/1.1. We believe that updating the health check query to use HTTP/1.1 might help mitigate these timeout errors.

We kindly request that the necessary changes be made to the lua-resty-healthcheck library so that it uses HTTP/1.1 for health checks. This update should improve the reliability of the health checks and prevent upstreams from getting stuck in an unhealthy state.
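For illustration, a hedged sketch of what the request construction could look like if the library switched to HTTP/1.1; a Connection: close header would then be needed so the probe connection is not kept alive (this is a suggestion, not the library's current code):

local request = ("GET %s HTTP/1.1\r\n%sConnection: close\r\nHost: %s\r\n\r\n"):format(
  path, headers, hostheader or hostname or ip)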

init_worker_by_lua error: /usr/local/openresty/lualib/resty/lock.lua:153: API disabled in the context of init_worker_by_lua*

Environment

OS: Ubuntu 18.04.3 LTS with all updates
Lua: 5.1.5
lua-resty-healthcheck 1.1.0-1 via luarocks
Nginx version: openresty/1.15.8.2 (./configure -j2 --with-pcre-jit --with-ipv6 --add-module=../ngx_lua_ipc --with-http_sub_module --with-http_v2_module)

Error

When restarting or reloading an instance configured with resty-healthcheck, the following error occurs most of the time:

Nov 19 19:01:09 host.tld nginx[1959]: 2019/11/19 19:01:07 [error] 1973#0: init_worker_by_lua error: /usr/local/openresty/lualib/resty/lock.lua:153: API disabled in the context of init_worker_by_lua*
Nov 19 19:01:09 host.tld nginx[1959]: stack traceback:
Nov 19 19:01:09 host.tld nginx[1959]:         [C]: in function 'sleep'
Nov 19 19:01:09 host.tld nginx[1959]:         /usr/local/openresty/lualib/resty/lock.lua:153: in function 'lock'
Nov 19 19:01:09 host.tld nginx[1959]:         /usr/local/share/lua/5.1/resty/healthcheck.lua:195: in function 'locking_target_list'
Nov 19 19:01:09 host.tld nginx[1959]:         /usr/local/share/lua/5.1/resty/healthcheck.lua:1307: in function 'new'
Nov 19 19:01:09 host.tld nginx[1959]:         /usr/local/share/lua/5.1/custommodule.lua:24293: in function 'init_worker'
Nov 19 19:01:09 host.tld nginx[1959]:         init_worker_by_lua:2: in main chunk

Snippets

nginx.conf:

http {
    [...]
    init_by_lua_block {
      cm = require 'custommodule'
      cm.init()
    }
    init_worker_by_lua_block {
      cm.init_worker()
    }
    [...]
}

custommodule.lua:

local _M = {}
local origins = {}
local healthchecker = {}

[...]

function _M.init_worker()
  -- Attention: Both modules must be required in init_worker().
  -- If one of those is required in global module space (e.g. Nginx init), resty.worker.events fails to manage its internal _callback list
  local we = require "resty.worker.events"
  local hc = require "resty.healthcheck"

  -- worker events
  -- add dummy handler
  local handler = function(target, eventname, sourcename, pid)
  end
  we.register(handler)

  -- configure worker events
  local ok, err = we.configure{ shm = "worker_events", interval = 0.1 }
  if not ok then
    ngx.log(ngx.ERR, "Failed to configure worker events: ", err)
    return
  end

-- 8< --
-- All code below is repeated for each handled domain.
-- At failing instance: 3x
-- >8 --

  -- upstream health checks
  -- domain: domain1.tld
  -- init checker
  local checker = hc.new({
    name = "domain1.tld",
    shm_name = "healthchecks",
    type = "https",
    checks = {
      active = {
        http_path = '/',
        healthy = {
          interval = 10,
          successes = 1,
        },
        unhealthy = {
          interval = 5,
          http_statuses = { 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410,
                            410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420,
                            420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430,
                            431, 451,
                            500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510,
                            511 },
          tcp_failures = 2,
          http_failures = 2,
          timeouts = 2,
        }
      },
      passive = {
        healthy  = {
          successes = 1,
        },
        unhealthy  = {
          http_statuses = { 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410,
                            410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420,
                            420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430,
                            431, 451,
                            500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510,
                            511 },
          tcp_failures = 2,
          http_failures = 2,
          timeouts = 2,
        }
      }
    }
  })
  -- clear data to avoid broken synchronisation between the health checker and the balancer
  -- on nginx reload the healthchecker keeps its state but the balancer loses it
  -- checker:clear()
  -- ^^ not necessary anymore because of workaround below

  -- add event handler for checker
  local handler = function(target, eventname, sourcename, pid)
    if not target then
      return
    end

    local domain = target.hostname
    local origin_host = target.ip
    if eventname == checker.events.remove then
      -- a target was removed
      local ok, err = pcall(function () origins[domain][1]:delete(origin_host) end)
      if not ok then
        ngx.log(ngx.WARN, "Deleting balancer node ", origin_host, " from domain: ", domain, " failed: ", err)
      else
        ngx.log(ngx.DEBUG, "Balancer node ", origin_host, " of domain: ", domain, " deleted")
      end
    elseif eventname == checker.events.healthy then
      -- target changed state, or was added
      local ok, err = pcall(function () origins[domain][1]:set(origin_host, 1) end)
      if not ok then
        ngx.log(ngx.WARN, "Setting balancer node ", origin_host, " for domain: ", domain, " failed: ", err)
      else
        ngx.log(ngx.DEBUG, "Balancer node ", origin_host, " for domain: ", domain, " added")
      end
    elseif eventname ==  checker.events.unhealthy then
      -- target changed state, or was added
      local ok, err = pcall(function () origins[domain][1]:delete(origin_host) end)
      if not ok then
        ngx.log(ngx.WARN, "Balancer delete for domain: ", domain, " failed: ", err)
      else
        ngx.log(ngx.DEBUG, "Balancer node ", origin_host, " of domain: ", domain, " deleted")
      end
    end
  end
  we.register(handler)

  -- add origin nodes
  -- special handling for nginx reload
  -- the health checker keeps its state from the previous instance, the balancer does not.
  -- we use the previously known state to resend healthy and unhealthy events to resync the balancer node states.
  -- this avoids the otherwise possible use of checker:clear to reset the internal health states. The advantage of this
  -- approach is that we don't lose the knowledge about unhealthy hosts.
  local healthy, err = checker:get_target_status("192.0.2.1", 443)
  if not err then
    checker:set_target_status("192.0.2.1", 443, healthy)
  end
  -- add new nodes in case of newly configured origins after reload or just a normal (re)start of nginx
  local ok, err = checker:add_target("192.0.2.1", 443, "domain1.tld", true)
  if err then
    ngx.log(ngx.ERR, 'Error adding target 192.0.2.1 to healthchecker domain1.tld: ', err)
  end

  -- store in module wide register
  healthchecker["domain1.tld"] = checker

  [...]
end

Question

I'm not sure if I'm just holding it wrong or if there is a bigger architectural problem buried here, because @spacewander mentioned in openresty/lua-nginx-module#1210 that:

lua-resty-lock uses ngx.sleep internally, and ngx.sleep cannot be used in the init_worker_by_lua* phase.
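One common workaround (a sketch only, not an official fix) is to defer the healthchecker creation to a zero-delay timer scheduled from init_worker_by_lua*; the timer callback runs in a context where ngx.sleep, and therefore resty.lock, is allowed. Sketched against the module above:

function _M.init_worker()
  local we = require "resty.worker.events"
  assert(we.configure{ shm = "worker_events", interval = 0.1 })

  -- defer hc.new() out of the init_worker phase
  local ok, err = ngx.timer.at(0, function(premature)
    if premature then
      return
    end
    local hc = require "resty.healthcheck"
    local checker, cerr = hc.new({
      name = "domain1.tld",
      shm_name = "healthchecks",
      type = "https",
      -- checks = { ... } as in the original configuration
    })
    if not checker then
      ngx.log(ngx.ERR, "failed to create healthchecker: ", cerr)
      return
    end
    -- store in the module-wide register from the original module
    healthchecker["domain1.tld"] = checker
  end)
  if not ok then
    ngx.log(ngx.ERR, "failed to schedule healthchecker setup: ", err)
  end
end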

failed to release lock

2020/12/15 14:44:37 [error] 123#0: *60053758 [lua] healthcheck.lua:1104: log(): [healthcheck] (1bdcdd5c-ecbc-4cb3-b271-c5c4a3e03f56:ae-app-56.www.hba.main) failed to release lock 'lua-resty-healthcheck:1bdcdd5c-ecbc-4cb3-b271-c5c4a3e03f56:ae-app-56.www.hba.main:target_list_lock': unlocked, context: ngx.timer

suggestion: make the periodic lock time configurable (make the hard-coded 0.001 configurable)

local ok, err = self.shm:add(key, true, interval - 0.001)

I tested lua-resty-healthcheck and found that sometimes only one check runs across two intervals (with only one worker starting the health checker). This is because the checker finishes too fast: the time it consumes is less than 0.001 seconds. So I suggest making the shm key expiry time configurable.
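A minimal sketch of what the suggestion could look like (the option name lock_offset is hypothetical, not an existing field):

-- hypothetical: read the offset from the checker configuration,
-- falling back to the current hard-coded 0.001
local lock_offset = self.checks.active.lock_offset or 0.001
local ok, err = self.shm:add(key, true, interval - lock_offset)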

timer failure: attempt to call a number value

Today I started deploying my OpenResty project, which uses lua-resty-healthcheck version 2.0.0-1.

I found many error logs printed by my application code and caused by this line in the GitHub project. The surrounding code snippet is shown below.

--- Get the current status of the target.
-- @param ip IP address of the target being checked.
-- @param port the port being checked against.
-- @param hostname the hostname of the target being checked.
-- @return `true` if healthy, `false` if unhealthy, or `nil + error` on failure.
function checker:get_target_status(ip, port, hostname)

  local target = get_target(self, ip, port, hostname)
  if not target then
    return nil, "target not found"
  end
  return target.internal_health == "healthy"
      or target.internal_health == "mostly_healthy"

end

It seemed the get_target function had failed, so I dug into the code and found that self.targets was nil when get_target_status was called. But the root cause was still unknown.

I spent almost the whole afternoon on it. Finally I found an abnormal error in my Nginx error logs: timer failure: attempt to call a number value.

I read the source code again and found a snippet (shown as a screenshot in the original issue) that may be related to it.

I suspect that the pcall(args[1], ...) there is a bug. When ngx.timer.at invokes the wrapped callback, args[1] will always be the delay parameter, so the call always fails; the Lua thread then fails to get the target_list from the nginx shm, and self.targets ends up empty.
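For reference, a sketch of the usual ngx.timer.at argument-passing pattern (illustrative only, not the library's code): the callback receives the premature flag first, followed by the extra arguments given to ngx.timer.at, so the function to run has to be taken from those extra arguments rather than from the delay.

local function do_check(a, b)
  ngx.log(ngx.INFO, "running check with ", a, " and ", b)
end

-- callback signature is (premature, ...), where ... are the extra
-- arguments passed to ngx.timer.at after the delay
local function run_deferred(premature, fn, ...)
  if premature then
    return
  end
  local ok, err = pcall(fn, ...)  -- fn is do_check here, not the delay
  if not ok then
    ngx.log(ngx.ERR, "deferred call failed: ", err)
  end
end

local ok, err = ngx.timer.at(0, run_deferred, do_check, "arg1", "arg2")
if not ok then
  ngx.log(ngx.ERR, "failed to create timer: ", err)
end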

By the way, I first found this error on an Nginx server with 24 workers, and then reproduced it locally on another Nginx server with 5 workers. The error seems to occur on Nginx servers with more than one worker.

Passive healthcheck bug

file: resty/healthcheck.lua
function: incr_counter

description:
The buggy code is here:

if (health_mode == "healthy" and target.healthy) or
   (health_mode == "unhealthy" and not target.healthy) then
  -- No need to count successes when healthy or failures when unhealthy
  return true
end

When I configure a passive healthcheck without an active healthcheck, if a target's failure counter is non-zero and its current status is healthy, this bug means a successful request cannot clear the failure counter.

I resolved this bug in the following way:

local nokCounter = self.shm:get(get_shm_key(self.TARGET_NOKS, ip, port))
if (health_mode == "healthy" and target.healthy and (not nokCounter or nokCounter == 0)) or
   (health_mode == "unhealthy" and not target.healthy) then
  -- No need to count successes when healthy or failures when unhealthy
  return true
end

When multiple targets have the same IP:PORT, active healthcheck results for one impact them all

When routing to multiple separate applications that share the same IP:PORT for ingress (as with HAProxy), I notice that healthcheck results for the original targets seem to be ignored as I add new ones.

This impacts us heavily, as most of our APIs are deployed to a Kubernetes cluster, all of which have the same IP:PORT for ingress (the host header is used to determine which specific service to send traffic to at the cluster's ingress router).

I request adding logic to the balancer to track each target's status keyed by hostname:port, or by hostname:ip:port, instead of just ip:port, so that separate targets can have separate statuses.
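A minimal sketch of the kind of keying this request describes (illustrative only; the library's real target bookkeeping is more involved):

-- key target state by hostname as well as ip:port so that targets sharing
-- an ingress IP:PORT keep independent health status
local function target_key(ip, port, hostname)
  return (hostname or ip) .. ":" .. ip .. ":" .. port
end

-- e.g. target_key("10.0.0.1", 443, "api-a.example.com") and
--      target_key("10.0.0.1", 443, "api-b.example.com") stay distinct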

Active health check stops working randomly

I have Kong installed on an OpenShift cluster. My upstreams are external servers (not internal OpenShift nodes).
I have configured active and passive health checks. The active health check is used only for targets that are unhealthy, and it is set to check at a 10s interval.

After some time (randomly), the active checks stop working. When targets are marked unhealthy by passive checks, they remain unhealthy until I reload/restart the entire Kong instance.

Here is my healthcheck config:

  Healthchecks:
    Active:
      Concurrency:  1
      Healthy:
        http_statuses:
          200
          302
        Interval:                0
        Successes:               1
      http_path:                 /v1/management/health/simple
      https_verify_certificate:  false
      Timeout:                   3
      Type:                      https
      Unhealthy:
        http_failures:  0
        Interval:      10
        tcp_failures:  0
        Timeouts:      0
    Passive:
      Healthy:
        Successes:  1
      Unhealthy:
        http_failures:  1
        http_statuses:
          429
          500
          503
        tcp_failures:  1
        Timeouts:      1

As per the logs below, it seems the health checker stopped and started again immediately, but it did not actually perform any active probes after this restart. It happens randomly.

2021/11/04 16:30:45 [debug] 25#0: *1125536 [lua] healthcheck.lua:1126: log(): [healthcheck] (0dc6f45b-8f8d-40d2-a504-473544ee190b:<upstream xxxxxxxxxxxxx) healthchecker stopped
2021/11/04 16:30:45 [debug] 24#0: *1125506 [lua] healthcheck.lua:1126: log(): [healthcheck] (0dc6f45b-8f8d-40d2-a504-473544ee190b:<upstream xxxxxxxxxxxxx) Got initial target list (0 targets)
2021/11/04 16:30:45 [debug] 24#0: *1125506 [lua] healthcheck.lua:1126: log(): [healthcheck] (0dc6f45b-8f8d-40d2-a504-473544ee190b:<upstream xxxxxxxxxxxxx) active check flagged as active
2021/11/04 16:30:45 [debug] 24#0: *1125506 [lua] healthcheck.lua:1126: log(): [healthcheck] (0dc6f45b-8f8d-40d2-a504-473544ee190b:<upstream xxxxxxxxxxxxx) starting timer to check active checks
2021/11/04 16:30:45 [debug] 24#0: *1125506 [lua] healthcheck.lua:1126: log(): [healthcheck] (0dc6f45b-8f8d-40d2-a504-473544ee190b:<upstream xxxxxxxxxxxxx) Healthchecker started!
2021/11/04 16:30:45 [debug] 25#0: *1125536 [lua] healthcheck.lua:1126: log(): [healthcheck] (0dc6f45b-8f8d-40d2-a504-473544ee190b:<upstream xxxxxxxxxxxxx) Got initial target list (2 targets)
2021/11/04 16:30:45 [debug] 25#0: *1125536 [lua] healthcheck.lua:1126: log(): [healthcheck] (0dc6f45b-8f8d-40d2-a504-473544ee190b:<upstream xxxxxxxxxxxxx) Got initial status healthy <ip> <ip>:<port>
2021/11/04 16:30:45 [debug] 25#0: *1125536 [lua] healthcheck.lua:1126: log(): [healthcheck] (0dc6f45b-8f8d-40d2-a504-473544ee190b:<upstream xxxxxxxxxxxxx) active check flagged as active
2021/11/04 16:30:45 [debug] 25#0: *1125536 [lua] healthcheck.lua:1126: log(): [healthcheck] (0dc6f45b-8f8d-40d2-a504-473544ee190b:<upstream xxxxxxxxxxxxx) Healthchecker started!
2021/11/04 16:30:45 [debug] 25#0: *1125536 [lua] healthcheck.lua:1126: log(): [healthcheck] (0dc6f45b-8f8d-40d2-a504-473544ee190b:<upstream xxxxxxxxxxxxx) adding an existing target: <ip> <ip>:<port> (ignoring)
2021/11/04 16:30:45 [debug] 24#0: *1125506 [lua] events.lua:211: do_event_json(): worker-events: handling event; source=lua-resty-healthcheck [0dc6f45b-8f8d-40d2-a504-473544ee190b:<upstream xxxxxxxxxxxxx], event=clear, 
pid=24, data=table: 0x7f3487367af0
2021/11/04 16:30:45 [debug] 24#0: *1125506 [lua] healthcheck.lua:1126: log(): [healthcheck] (0dc6f45b-8f8d-40d2-a504-473544ee190b:g<upstream xxxxxxxxxxxxx) event: local cache cleared

I don't have steps to reproduce since it happens randomly

too many pending timers

Hi, I'm using the master branch and encountered this error:

...
2020/04/01 16:24:46 [error] 6083#0: *417577 [lua] healthcheck.lua:18: add_target(): failed to add target: too many pending timers, context: init_worker_by_lua*
2020/04/01 16:24:46 [error] 6083#0: *417577 [lua] healthcheck.lua:18: add_target(): failed to add target: too many pending timers, context: init_worker_by_lua*
2020/04/01 16:24:46 [error] 6083#0: *417577 [lua] healthcheck.lua:18: add_target(): failed to add target: too many pending timers, context: init_worker_by_lua*
2020/04/01 16:24:46 [error] 6083#0: *417577 [lua] healthcheck.lua:18: add_target(): failed to add target: too many pending timers, context: init_worker_by_lua*
2020/04/01 16:24:46 [error] 6083#0: *417577 [lua] healthcheck.lua:18: add_target(): failed to add target: too many pending timers, context: init_worker_by_lua*
2020/04/01 16:24:46 [error] 6083#0: *417577 [lua] healthcheck.lua:18: add_target(): failed to add target: too many pending timers, context: init_worker_by_lua*
2020/04/01 16:24:46 [error] 6083#0: *417577 [lua] healthcheck.lua:18: add_target(): failed to add target: too many pending timers, context: init_worker_by_lua*
2020/04/01 16:24:46 [error] 6083#0: *417577 [lua] healthcheck.lua:18: add_target(): failed to add target: too many pending timers, context: init_worker_by_lua*
2020/04/01 16:24:46 [error] 6083#0: *417577 [lua] healthcheck.lua:18: add_target(): failed to add target: too many pending timers, context: init_worker_by_lua*
2020/04/01 16:24:48 [alert] 6083#0: 256 lua_max_running_timers are not enough
2020/04/01 16:24:48 [alert] 6083#0: 256 lua_max_running_timers are not enough
2020/04/01 16:24:48 [alert] 6083#0: 256 lua_max_running_timers are not enough
2020/04/01 16:24:48 [alert] 6083#0: 256 lua_max_running_timers are not enough
2020/04/01 16:24:48 [alert] 6083#0: 256 lua_max_running_timers are not enough
2020/04/01 16:24:48 [alert] 6083#0: 256 lua_max_running_timers are not enough
2020/04/01 16:24:48 [alert] 6083#0: 256 lua_max_running_timers are not enough
2020/04/01 16:24:48 [alert] 6083#0: 256 lua_max_running_timers are not enough
...

Is there a limit for how many targets I can add? Is it possible to add more than 2000 upstream servers?

This seems to be related to locking_target_list, which add_target goes through:

local _, terr = ngx.timer.at(0, run_fn_locked_target_list, self, fn)
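Each add_target call schedules a zero-delay timer here, so adding thousands of targets from init_worker_by_lua* can exhaust the default limits of 1024 pending and 256 running timers (matching the alerts above). One mitigation, assuming the timer limits are the actual bottleneck, is to raise them in the http block:

http {
    [...]
    lua_max_pending_timers 16384;
    lua_max_running_timers 4096;
    [...]
}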

self.targets is nil when calling get_target after add_target

At v1.2.0, I get errors like:
[error] 46#46: 14188959 failed to run balancer_by_lua: /opt/app/test_proj/deps/share/lua/5.1/resty/healthcheck.lua:247: attempt to index field 'targets' (a nil value)
stack traceback:
/opt/app/test_proj/deps/share/lua/5.1/resty/healthcheck.lua:247: in function 'get_target'
/opt/app/test_proj/deps/share/lua/5.1/resty/healthcheck.lua:424: in function 'get_target_status'
/opt/app/test_proj/lua/ins_breaker/http/balancer.lua:334: in function 'load_balancer'
/opt/app/test_proj/lua/circuit_breaker.lua:259: in function 'http_balancer_phase'
balancer_by_lua:2: in main chunk while connecting to upstream, client: 10.23.178.7

Because locking_target_list defers work to a timer, adding targets is delayed, so calling get_target_status for a node before it has actually been added results in an error.
I want to add a wait_add_target_list that stores a target, together with its default is_healthy status, before it is actually added to the target_list.
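Until something like that lands, a caller-side sketch of a defensive workaround (assuming the caller knows the default health state it passed to add_target):

local ok, healthy = pcall(checker.get_target_status, checker, ip, port, hostname)
if not ok or healthy == nil then
  -- the timer that populates self.targets has not run yet; fall back to
  -- the default health state that was passed to add_target
  healthy = true
end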

Log Level of active succeeding health checks?

Does it make sense for successful active health-checks to log with warn and not info or debug?

2019/01/13 06:03:09 [warn] 34#0: *272118 [lua] healthcheck.lua:989: log(): [healthcheck] (test_upstream) healthy SUCCESS increment (1/3) for 10.xxx.xxx.xxx:443, context: ngx.timer

I think it makes sense to log a warn/error for unhealthy targets, but when things are working correctly, is there a reason to write to the log when running Kong at notice level? (And since these messages are warn, they all flood my terminal view, because I point Kong's logs to stdout.)

Testing using the Kong 1.0 release.

Question: How could we install this package using opm?

I'm pretty new to the OpenResty/Lua world. As I understand it, this library is distributed via LuaRocks, whereas OpenResty recommends opm as its package manager. How should I install this library to try it out on OpenResty? Thanks

check return codes of lua-resty-worker-events calls

Check return codes and log any errors in worker_events calls.

Note that this requires a bump on the lua-resty-worker-events dependency, as the return codes are different between lua-resty-worker-events 0.x and 1.x.
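A minimal sketch of the kind of check meant here (the event source and return-value conventions are assumed for illustration; the exact names in healthcheck.lua and the 0.x vs 1.x semantics differ):

local worker_events = require "resty.worker.events"

-- post an event and surface any failure instead of silently ignoring it
local ok, err = worker_events.post("lua-resty-healthcheck", "healthy", {
  ip = ip, port = port, hostname = hostname,
})
if not ok then
  ngx.log(ngx.ERR, "failed to post healthcheck event: ", err)
end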

Should we check the version of lua-resty-events?

In 1.6.2 there is a line:

local RESTY_EVENTS_VER = [[^0\.1\.\d+$]]

It says that the library only supports 0.1.x versions of lua-resty-events.

Is this necessary? If lua-resty-events bumps its version, this will break.

"a success will reset all three failure counters" doesn't work

You say at the beginning that "a success will reset all three failure counters", but it doesn't work.
Suppose http_failures is set to 3: if there are two failures, then a success, and then another failure, the target will be marked as failed.
I think the problem is that the conditional statement in incr_counter, (health_mode == "healthy" and target.healthy) or (health_mode == "unhealthy" and not target.healthy), is wrong.
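A worked example of the reported sequence with http_failures = 3 (counter name is illustrative only):

-- failure  -> http failure counter = 1   (target still healthy)
-- failure  -> http failure counter = 2   (target still healthy)
-- success  -> expected: counter reset to 0; observed: incr_counter returns
--             early because the target is already healthy, so it stays at 2
-- failure  -> counter = 3, threshold reached -> target marked unhealthy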

Do you need to fill in the domain name in tcp mode?

lua-resty-healthcheck 1.0.0-1
lua-resty-worker-events 1.0.0-1

example:

if h.mode == "tcp" then
    h.http_domain = nil
    h.http_path = nil
end

error log

[error] 67293#3894067: *16 lua entry thread aborted: runtime error: /usr/local/share/lua/5.1/resty/healthcheck.lua:1310: table index is nil
stack traceback:
coroutine 0:
        /usr/local/share/lua/5.1/resty/healthcheck.lua: in function 'fn'
        /usr/local/share/lua/5.1/resty/healthcheck.lua:206: in function 'locking_target_list'
        /usr/local/share/lua/5.1/resty/healthcheck.lua:1296: in function 'new'

If the domain name is set to a placeholder, everything works normally; is this design unreasonable?

checker:get_target_status fail to get result

Thank you for providing such a great project. I encountered some problems while using it.

I tested according to the test case, with the following code:

 location = /t {
        content_by_lua_block {
            local we = require "resty.worker.events"
            assert(we.configure{ shm = "my_worker_events", interval = 0.1 })
            local healthcheck = require("resty.healthcheck")
            local checker = healthcheck.new({
                name = "testing",
                shm_name = "test_shm",
                checks = {
                    active = {
                        http_path = "/status",
                        healthy  = {
                            interval = 999, -- we don't want active checks
                            successes = 1,
                        },
                        unhealthy  = {
                            interval = 999, -- we don't want active checks
                            tcp_failures = 1,
                            http_failures = 1,
                        }
                    },
                    passive = {
                        healthy  = {
                            successes = 1,
                        },
                        unhealthy  = {
                            tcp_failures = 1,
                            http_failures = 1,
                        }
                    }
                }
            })
            ngx.sleep(0.1) -- wait for initial timers to run once
            local ok, err = checker:add_target("127.0.0.1", 8088, nil, true)
            ngx.say(checker:get_target_status("127.0.0.1", 8088))  -- true
            checker:report_tcp_failure("127.0.0.1", 8088)
            ngx.say(checker:get_target_status("127.0.0.1", 8088))  -- false
            checker:report_success("127.0.0.1", 8088)
            ngx.say(checker:get_target_status("127.0.0.1", 8088))  -- true
        }
    }

The result of execution is:

curl http://127.0.0.1:8085/t
false
false
true

Why can't I reproduce the same results as your test case?
In other words, after executing checker:add_target, the result of executing checker:get_target_status for the first time is false.
I can confirm that the corresponding endpoint exists, as follows:

curl -I   http://127.0.0.1:8088/status
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Date: Mon, 04 Mar 2024 15:53:43 GMT
Content-Length: 15

Please tell me if there is anything wrong here. Thanks.
My OpenResty version is 1.13.
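One possible cause, based on the behaviour described in other issues here (target registration goes through a zero-delay timer, so the status may not be visible immediately after add_target), is that the status is queried before the target is registered. A hedged sketch of a check for this:

local ok, err = checker:add_target("127.0.0.1", 8088, nil, true)
ngx.sleep(0.01)  -- let the timer that registers the target run first
ngx.say(checker:get_target_status("127.0.0.1", 8088))  -- expected: true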

TCP health check is not making the upstream target unhealthy

To reproduce this, I created a random service and enabled a tcp healthcheck on it.
After that I added a random IP and port that don't exist. I don't even see a timeout error in the logs, and the status of the upstream is never set to UNHEALTHY.

Kong version I am using: kong:2.1.3-centos

Below is my upstream configuration:

{
  "client_certificate": null,
  "created_at": 1598938985,
  "id": "a250bcbe-3934-4825-89ab-81411ee95969",
  "tags": null,
  "name": "upstream_javaapigw",
  "algorithm": "round-robin",
  "hash_on_header": null,
  "hash_fallback_header": null,
  "host_header": null,
  "hash_on_cookie": null,
  "healthchecks": {
    "threshold": 100,
    "active": {
      "unhealthy": { "http_statuses": [429, 404, 500, 501, 502, 503, 504, 505], "tcp_failures": 2, "timeouts": 2, "http_failures": 1, "interval": 3 },
      "type": "tcp",
      "http_path": "/",
      "timeout": 1,
      "healthy": { "successes": 5, "interval": 1, "http_statuses": [200, 302] },
      "https_sni": null,
      "https_verify_certificate": true,
      "concurrency": 10
    },
    "passive": {
      "unhealthy": { "http_failures": 1, "http_statuses": [429, 500, 503], "tcp_failures": 1, "timeouts": 1 },
      "healthy": { "http_statuses": [200, 201, 202, 203, 204, 205, 206, 207, 208, 226, 300, 301, 302, 303, 304, 305, 306, 307, 308], "successes": 0 },
      "type": "tcp"
    }
  },
  "hash_on_cookie_path": "/",
  "hash_on": "none",
  "hash_fallback": "none",
  "slots": 10000
}
