GithubHelp home page GithubHelp logo

Comments (11)

Al2Klimov avatar Al2Klimov commented on May 23, 2024 1

Monkey patched local Icinga DB a bit, so that it syncs history entries one by one.

diff --git a/pkg/icingadb/history/sync.go b/pkg/icingadb/history/sync.go
index dc8bc611..a23cbb13 100644
--- a/pkg/icingadb/history/sync.go
+++ b/pkg/icingadb/history/sync.go
@@ -20,6 +20,7 @@ import (
        "golang.org/x/sync/errgroup"
        "reflect"
        "sync"
+       "time"
 )

 // Sync specifies the source and destination of a history sync.
@@ -118,6 +119,7 @@ func (s Sync) readFromRedis(ctx context.Context, key string, output chan<- redis

                for _, stream := range streams {
                        for _, message := range stream.Messages {
+                               time.Sleep(time.Second / 3)
                                xra.Streams[1] = message.ID

                                select {

Exactly the only option I saw in the code and was afraid of:

127.0.0.1:6380> XRANGE icinga:history:stream:downtime - + COUNT 1
1) 1) "1697508001888-4"
   2)  1) "downtime_id"
       2) "1a542440ed19197da76b01788dd54cb10af7b4e6"
       3) "environment_id"
       4) "d4af2e0c3a51075336c92e0de60bd5336ca9c9bf"
       5) "host_id"
       6) "b4281fed7d790ac5b54b7481addee4413cdc7c49"
       7) "entry_time"
       8) "1697508001690"
       9) "author"
      10) "maintenance-user"
      11) "comment"
      12) "automatic downtime for customer creation"
      13) "is_flexible"
      14) "0"
      15) "flexible_duration"
      16) "0"
      17) "scheduled_start_time"
      18) "1697508001000"
      19) "scheduled_end_time"
      20) "1705475119000"
      21) "has_been_cancelled"
      22) "1"
      23) "trigger_time"
      24) "1697508001690"
      25) "event_id"
      26) "fbe4ec2afa1feca24b21c7fa510db57b75ba152f"
      27) "event_type"
      28) "downtime_start"
      29) "object_type"
      30) "service"
      31) "service_id"
      32) "9c42d91fc10ee3e6da1c0477aee3233a6d08fcc8"
      33) "start_time"
      34) "1697508001000"
      35) "end_time"
      36) "1705475119000"
      37) "endpoint_id"
      38) "7fad74b9bcc7287690d7f4bdc47ce9c133830d81"
      39) "parent_id"
      40) "8edec9935b10706f2389ff7f62b537c6d285410b"
127.0.0.1:6380>

has_been_cancelled=1, but cancel_time is missing.

from icinga2.

Al2Klimov avatar Al2Klimov commented on May 23, 2024

@log1-c Please could you share the Redis data Icinga DB can't process:

  1. Run icingadb-redis-cli
  2. Query XRANGE icinga:history:stream:downtime - +

from icinga2.

log1-c avatar log1-c commented on May 23, 2024

Running the command produces a lot of output like this

6470) 1) "1698153784788-1"
       2)  1) "downtime_id"
           2) "8763702cd5f895051473a13b1a0f6a83a86521c0"
           3) "environment_id"
           4) "d4af2e0c3a51075336c92e0de60bd5336ca9c9bf"
           5) "host_id"
           6) "192bb8b024dd83154cf545f4c646c737e67d7eae"
           7) "entry_time"
           8) "1698153784538"
           9) "author"
          10) "icingaadmin"
          11) "comment"
          12) "Diese Downtime gilt immer f\xc3\xbcr alle Develop Machinen des Virtual Desktop Teams"
          13) "is_flexible"
          14) "0"
          15) "flexible_duration"
          16) "0"
          17) "scheduled_start_time"
          18) "1698098400000"
          19) "scheduled_end_time"
          20) "1698184800000"
          21) "has_been_cancelled"
          22) "0"
          23) "trigger_time"
          24) "1698153784538"
          25) "event_id"
          26) "a79ec8a7a7a82aceb13c3c5880e84a5d66c13bf1"
          27) "event_type"
          28) "downtime_start"
          29) "object_type"
          30) "service"
          31) "service_id"
          32) "0a3cffce433d4305f0d5effa3dbf8005df53352c"
          33) "start_time"
          34) "1698098400000"
          35) "end_time"
          36) "1698184800000"
          37) "endpoint_id"
          38) "7fad74b9bcc7287690d7f4bdc47ce9c133830d81"
          39) "scheduled_by"
          40) "mvd-d9-ts1001!service-lanmanworkstation!virtual_desktop_develop_server"
 6471) 1) "1698153784788-2"
       2)  1) "downtime_id"
           2) "3d2da43046eca2490e42ae935d1afe05f80cba84"
           3) "environment_id"
           4) "d4af2e0c3a51075336c92e0de60bd5336ca9c9bf"
           5) "host_id"
           6) "192bb8b024dd83154cf545f4c646c737e67d7eae"
           7) "entry_time"
           8) "1698153784542"
           9) "author"
          10) "icingaadmin"
          11) "comment"
          12) "Diese Downtime gilt immer f\xc3\xbcr alle Develop Machinen des Virtual Desktop Teams"
          13) "is_flexible"
          14) "0"
          15) "flexible_duration"
          16) "0"
          17) "scheduled_start_time"
          18) "1698098400000"
          19) "scheduled_end_time"
          20) "1698184800000"
          21) "has_been_cancelled"
          22) "0"
          23) "trigger_time"
          24) "1698153784542"
          25) "event_id"
          26) "05f6c3ed0c3efaa52ea2ac66c2580ba14555f8bc"
          27) "event_type"
          28) "downtime_start"
          29) "object_type"
          30) "service"
          31) "service_id"
          32) "f84893b01646df8dc1ee7eec33ead3900e76d0e0"
          33) "start_time"
          34) "1698098400000"
          35) "end_time"
          36) "1698184800000"
          37) "endpoint_id"
          38) "7fad74b9bcc7287690d7f4bdc47ce9c133830d81"
          39) "scheduled_by"
          40) "mvd-d9-ts1001!service-frxsvc!virtual_desktop_develop_server"

...
15068) 1) "1698702302458-3"
       2)  1) "downtime_id"
           2) "c391e07992e08e36631d914dcf177fa9221e5f3a"
           3) "environment_id"
           4) "d4af2e0c3a51075336c92e0de60bd5336ca9c9bf"
           5) "host_id"
           6) "6426d0d5be7fde903c2c4b5ffc72b243e20c41e7"
           7) "entry_time"
           8) "1698702302348"
           9) "author"
          10) "maintenance-user"
          11) "comment"
          12) "automatic downtime for vm creation, 24h sentinelone downtime"
          13) "is_flexible"
          14) "0"
          15) "flexible_duration"
          16) "0"
          17) "scheduled_start_time"
          18) "1698702302000"
          19) "scheduled_end_time"
          20) "1698822302000"
          21) "has_been_cancelled"
          22) "0"
          23) "trigger_time"
          24) "1698702302348"
          25) "event_id"
          26) "d5746b65f3471c39d5b2928d9f1c5a7a24c9232d"
          27) "event_type"
          28) "downtime_start"
          29) "object_type"
          30) "service"
          31) "service_id"
          32) "253efc7b550fa1ed48f9b08cc35e1ddeaba5431e"
          33) "start_time"
          34) "1698702302000"
          35) "end_time"
          36) "1698822302000"
          37) "endpoint_id"
          38) "7fad74b9bcc7287690d7f4bdc47ce9c133830d81"
15069) 1) "1698702302458-4"
       2)  1) "downtime_id"
           2) "011a8cc65880cadac251a55f87c973dc4711c7e6"
           3) "environment_id"
           4) "d4af2e0c3a51075336c92e0de60bd5336ca9c9bf"
           5) "host_id"
           6) "a1d9bac76db364203e7d546b7c9511dfaa384425"
           7) "entry_time"
           8) "1698702302371"
           9) "author"
          10) "maintenance-user"
          11) "comment"
          12) "automatic downtime for vm creation"
          13) "is_flexible"
          14) "0"
          15) "flexible_duration"
          16) "0"
          17) "scheduled_start_time"
          18) "1698702302000"
          19) "scheduled_end_time"
          20) "1698735902000"
          21) "has_been_cancelled"
          22) "0"
          23) "trigger_time"
          24) "1698702302371"
          25) "event_id"
          26) "f3fabb956cb45826349b58eda4c5c642e399d637"
          27) "event_type"
          28) "downtime_start"
          29) "object_type"
          30) "host"
          31) "start_time"
          32) "1698702302000"
          33) "end_time"
          34) "1698735902000"
          35) "endpoint_id"
          36) "7fad74b9bcc7287690d7f4bdc47ce9c133830d81"
15070) 1) "1698702302458-5"
       2)  1) "downtime_id"
           2) "dc6299459d548ec1e577df385d02b6d7e6f65730"
           3) "environment_id"
           4) "d4af2e0c3a51075336c92e0de60bd5336ca9c9bf"
           5) "host_id"
           6) "a1d9bac76db364203e7d546b7c9511dfaa384425"
           7) "entry_time"
           8) "1698702302380"
           9) "author"
          10) "maintenance-user"
          11) "comment"
          12) "automatic downtime for vm creation"
          13) "is_flexible"
          14) "0"
          15) "flexible_duration"
          16) "0"
          17) "scheduled_start_time"
          18) "1698702302000"
          19) "scheduled_end_time"
          20) "1698735902000"
          21) "has_been_cancelled"
          22) "0"
          23) "trigger_time"
          24) "1698702302380"
          25) "event_id"
          26) "376ea1f29f0ca519b1200531d853a2892aaa8027"
          27) "event_type"
          28) "downtime_start"
          29) "object_type"
          30) "service"
          31) "service_id"
          32) "e4a461d6f88d4d783dca596e7015781f6634de7c"
          33) "start_time"
          34) "1698702302000"
          35) "end_time"
          36) "1698735902000"
          37) "endpoint_id"
          38) "7fad74b9bcc7287690d7f4bdc47ce9c133830d81"
          39) "parent_id"
          40) "011a8cc65880cadac251a55f87c973dc4711c7e6"
15071) 1) "1698702302458-6"
       2)  1) "downtime_id"
           2) "18da40453435eb3461437cef796599f5efcf9d1f"
           3) "environment_id"
           4) "d4af2e0c3a51075336c92e0de60bd5336ca9c9bf"
           5) "host_id"
           6) "a1d9bac76db364203e7d546b7c9511dfaa384425"
           7) "entry_time"
           8) "1698702302384"
           9) "author"
          10) "maintenance-user"
          11) "comment"
          12) "automatic downtime for vm creation"
          13) "is_flexible"
          14) "0"
          15) "flexible_duration"
          16) "0"
          17) "scheduled_start_time"
          18) "1698702302000"
          19) "scheduled_end_time"
          20) "1698735902000"
          21) "has_been_cancelled"
          22) "0"
          23) "trigger_time"
          24) "1698702302384"
          25) "event_id"
          26) "21ee7e1813b010e8ae11d72a664de7a6ccaede63"
          27) "event_type"
          28) "downtime_start"
          29) "object_type"
          30) "service"
          31) "service_id"
          32) "9595c971c11b49346ce91ab1bb7580684859bd24"
          33) "start_time"
          34) "1698702302000"
          35) "end_time"
          36) "1698735902000"
          37) "endpoint_id"
          38) "7fad74b9bcc7287690d7f4bdc47ce9c133830d81"
          39) "parent_id"
          40) "011a8cc65880cadac251a55f87c973dc4711c7e6"

but only on the node were the icingadb daemon is crashing. on the second master node (not the config master btw.) it just returns emtpy array

from icinga2.

Al2Klimov avatar Al2Klimov commented on May 23, 2024

Please could you share your file in /var/lib/icingadb-redis from the node with the long output via https://nextcloud.icinga.com/index.php/s/YnTKYNSLaYyx9d8 ?

from icinga2.

log1-c avatar log1-c commented on May 23, 2024

Sure, I uploaded the dump.rdb file

from icinga2.

Al2Klimov avatar Al2Klimov commented on May 23, 2024

From the node with the long output of XRANGE. Obviously I need to debug "the node were the icingadb daemon is crashing".

127.0.0.1:6380> XRANGE icinga:history:stream:downtime + -
(empty array)
127.0.0.1:6380>

from icinga2.

log1-c avatar log1-c commented on May 23, 2024

That is the file I uploaded

msd-ic-ma01# ls -lah
total 74M
drwxr-x---.  2 icingadb-redis icingadb-redis   22 Nov  8 14:35 .
drwxr-xr-x. 49 root           root           4.0K Oct 18 14:57 ..
-rw-r--r--.  1 icingadb-redis icingadb-redis  74M Nov  8 14:35 dump.rdb

node where the daemon runs:


msd-ic-ma02# ls -lah
total 28M
drwxr-x---.  2 icingadb-redis icingadb-redis   22 Nov  8 14:36 .
drwxr-xr-x. 46 root           root           4.0K Oct 23 06:42 ..
-rw-r--r--.  1 icingadb-redis icingadb-redis  28M Nov  8 14:36 dump.rdb

Did you maybe switch + and - in your command? Because when I do XRANGE icinga:history:stream:downtime + - I also get (emtpy array)
The other way round produces output

from icinga2.

Al2Klimov avatar Al2Klimov commented on May 23, 2024

Oops 😅

from icinga2.

Al2Klimov avatar Al2Klimov commented on May 23, 2024

Successfully reproduced it, but only on Postgres.🙈

from icinga2.

Al2Klimov avatar Al2Klimov commented on May 23, 2024

Can you discard that broken event?

XDEL icinga:history:stream:downtime 1697508001888-4

from icinga2.

log1-c avatar log1-c commented on May 23, 2024

Done.

127.0.0.1:6380> XDEL icinga:history:stream:downtime 1697508001888-4
(integer) 1

Service keeps running after that.

● icingadb.service - Icinga DB
   Loaded: loaded (/usr/lib/systemd/system/icingadb.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2023-12-12 08:42:02 CET; 2min 12s ago
 Main PID: 791376 (icingadb)
    Tasks: 8 (limit: 49030)
   Memory: 240.7M
   CGroup: /system.slice/icingadb.service
           └─791376 /usr/sbin/icingadb --config /etc/icingadb/config.yml

Dec 12 08:42:22 msd-ic-ma01 icingadb[791376]: history-sync: Synced 80 notification history items
Dec 12 08:42:42 msd-ic-ma01 icingadb[791376]: history-sync: Synced 17424 downtime history items
Dec 12 08:42:42 msd-ic-ma01 icingadb[791376]: history-sync: Synced 67328 state history items
Dec 12 08:43:02 msd-ic-ma01 icingadb[791376]: history-sync: Synced 88187 state history items
Dec 12 08:43:02 msd-ic-ma01 icingadb[791376]: history-sync: Synced 37740 downtime history items
Dec 12 08:43:02 msd-ic-ma01 icingadb[791376]: high-availability: Another instance is active
Dec 12 08:43:22 msd-ic-ma01 icingadb[791376]: history-sync: Synced 8704 downtime history items
Dec 12 08:43:22 msd-ic-ma01 icingadb[791376]: history-sync: Synced 36014 state history items
Dec 12 08:43:42 msd-ic-ma01 icingadb[791376]: history-sync: Synced 1 state history items
Dec 12 08:44:02 msd-ic-ma01 icingadb[791376]: high-availability: Another instance is active

from icinga2.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.