Comments (9)

juliusv commented on May 22, 2024

Hey, thanks! This might be a known crash, but I'm not sure. The log lines are truncated (like /home/wrouesnel/src/scripts/go/src/github.com/prometheus/alertmanager/.build/gopath/src/github.com/prometheus/alertmanager/manager/manag), so the line number of the nil pointer dereference, which would appear at the end of the line, is cut off. Do you happen to have a version of this log with non-truncated lines?

from alertmanager.

wrouesnel commented on May 22, 2024

Whoops, the OP now has the full logs.

juliusv commented on May 22, 2024

Thanks. Yeah, that crash happens when an alert is grouped without a matching aggregation_rule in the config file. It is of course a bug, but it's not on our list of urgent things to fix right now. Could you check:

  • Do you have a catch-all rule at the end of the config file which matches any alert (no label filters)? I'd recommend that in general for now.
  • Did you do the config file change completely atomically (completely write new file first, then rename)?

wrouesnel commented on May 22, 2024

Neither of those, but it looks like my config is taking down the Alertmanager because of the first one (no catch-all rule). I missed it the first time because I only had two alerts, both of which were matched by a rule. A dummy notification works for now, so it's manageable.

dan-cleinmark commented on May 22, 2024

I believe we hit the same issue (at fsnotify/fsnotify_linux.go:127) on an older version of the alert manager (1ef8e9c).

We rewrite the alertmanager.conf file in place, and that is when we see this error. I'm changing that logic to write a temporary file and then rename it, and will update here with the outcome.

Dump below.

2015-06-30 15:38:50.462684500 panic: runtime error: invalid memory address or nil pointer dereference
2015-06-30 15:38:50.462784500 [signal 0xb code=0x1 addr=0x0 pc=0x48cccd]
2015-06-30 15:38:50.462944500 
2015-06-30 15:38:50.462964500 goroutine 15 [running]:
2015-06-30 15:38:50.463054500 github.com/prometheus/alertmanager/manager.(*memoryAlertManager).removeExpiredAggregates(0xc208185f40)
2015-06-30 15:38:50.463136500   /home/ubuntu/alertmanager/.build/gopath/src/github.com/prometheus/alertmanager/manager/manager.go:282 +0x23d
2015-06-30 15:38:50.463250500 github.com/prometheus/alertmanager/manager.(*memoryAlertManager).runIteration(0xc208185f40)
2015-06-30 15:38:50.463331500   /home/ubuntu/alertmanager/.build/gopath/src/github.com/prometheus/alertmanager/manager/manager.go:395 +0x33
2015-06-30 15:38:50.463458500 github.com/prometheus/alertmanager/manager.(*memoryAlertManager).Run(0xc208185f40)
2015-06-30 15:38:50.463525500   /home/ubuntu/alertmanager/.build/gopath/src/github.com/prometheus/alertmanager/manager/manager.go:408 +0x9a
2015-06-30 15:38:50.463655500 created by main.main
2015-06-30 15:38:50.463707500   /home/ubuntu/alertmanager/main.go:73 +0x914
2015-06-30 15:38:50.463844500 
2015-06-30 15:38:50.463865500 goroutine 1 [chan receive, 12 minutes]:
2015-06-30 15:38:50.464013500 github.com/prometheus/alertmanager/manager.(*notifier).Dispatch(0xc20817ba00)
2015-06-30 15:38:50.464085500   /home/ubuntu/alertmanager/.build/gopath/src/github.com/prometheus/alertmanager/manager/notifier.go:398 +0x54
2015-06-30 15:38:50.464209500 main.main()
2015-06-30 15:38:50.464276500   /home/ubuntu/alertmanager/main.go:117 +0x101d
2015-06-30 15:38:50.464392500 
2015-06-30 15:38:50.464417500 goroutine 5 [chan receive]:
2015-06-30 15:38:50.464511500 github.com/golang/glog.(*loggingT).flushDaemon(0xad4b80)
2015-06-30 15:38:50.464582500   /home/ubuntu/alertmanager/.build/gopath/src/github.com/golang/glog/glog.go:879 +0x78
2015-06-30 15:38:50.464706500 created by github.com/golang/glog.init·1
2015-06-30 15:38:50.464771500   /home/ubuntu/alertmanager/.build/gopath/src/github.com/golang/glog/glog.go:410 +0x2a7
2015-06-30 15:38:50.464885500 
2015-06-30 15:38:50.464906500 goroutine 17 [syscall, 75 minutes, locked to thread]:
2015-06-30 15:38:50.465067500 runtime.goexit()
2015-06-30 15:38:50.465134500   /usr/local/go/src/runtime/asm_amd64.s:2232 +0x1
2015-06-30 15:38:50.465258500 
2015-06-30 15:38:50.465278500 goroutine 14 [chan receive]:
2015-06-30 15:38:50.465361500 main.func·001()
2015-06-30 15:38:50.465423500   /home/ubuntu/alertmanager/main.go:51 +0x84
2015-06-30 15:38:50.465535500 created by main.main
2015-06-30 15:38:50.465597500   /home/ubuntu/alertmanager/main.go:56 +0x42b
2015-06-30 15:38:50.465708500 
2015-06-30 15:38:50.465728500 goroutine 16 [IO wait, 3 minutes]:
2015-06-30 15:38:50.465870500 net.(*pollDesc).Wait(0xc208011e20, 0x72, 0x0, 0x0)
2015-06-30 15:38:50.466043500   /usr/local/go/src/net/fd_poll_runtime.go:84 +0x47
2015-06-30 15:38:50.466169500 net.(*pollDesc).WaitRead(0xc208011e20, 0x0, 0x0)
2015-06-30 15:38:50.466307500   /usr/local/go/src/net/fd_poll_runtime.go:89 +0x43
2015-06-30 15:38:50.466435500 net.(*netFD).accept(0xc208011dc0, 0x0, 0x7f6bde890d70, 0xc208d81eb0)
2015-06-30 15:38:50.466617500   /usr/local/go/src/net/fd_unix.go:419 +0x40b
2015-06-30 15:38:50.466746500 net.(*TCPListener).AcceptTCP(0xc2084164d0, 0x61a53e, 0x0, 0x0)
2015-06-30 15:38:50.466919500   /usr/local/go/src/net/tcpsock_posix.go:234 +0x4e
2015-06-30 15:38:50.467043500 net/http.tcpKeepAliveListener.Accept(0xc2084164d0, 0x0, 0x0, 0x0, 0x0)
2015-06-30 15:38:50.467247500   /usr/local/go/src/net/http/server.go:1976 +0x4c
2015-06-30 15:38:50.467370500 net/http.(*Server).Serve(0xc20800ad20, 0x7f6bde89fd50, 0xc2084164d0, 0x0, 0x0)
2015-06-30 15:38:50.467573500   /usr/local/go/src/net/http/server.go:1728 +0x92
2015-06-30 15:38:50.467696500 net/http.(*Server).ListenAndServe(0xc20800ad20, 0x0, 0x0)
2015-06-30 15:38:50.467833500   /usr/local/go/src/net/http/server.go:1718 +0x154
2015-06-30 15:38:50.467946500 net/http.ListenAndServe(0x874900, 0x5, 0x0, 0x0, 0x0, 0x0)
2015-06-30 15:38:50.468189500   /usr/local/go/src/net/http/server.go:1808 +0xba
2015-06-30 15:38:50.468316500 github.com/prometheus/alertmanager/web.WebService.ServeForever(0xc208503840, 0xc208503860, 0xc208416008, 0xc208576100, 0x0, 0x0)
2015-06-30 15:38:50.468546500   /home/ubuntu/alertmanager/.build/gopath/src/github.com/prometheus/alertmanager/web/web.go:73 +0x6a5
2015-06-30 15:38:50.468675500 created by main.main
2015-06-30 15:38:50.468725500   /home/ubuntu/alertmanager/main.go:105 +0xd8f
2015-06-30 15:38:50.468853500 
2015-06-30 15:38:50.468873500 goroutine 18 [select, 38 minutes]:
2015-06-30 15:38:50.469018500 github.com/prometheus/alertmanager/config.(*fileWatcher).Watch(0xc2082ade70, 0xc208573ef0)
2015-06-30 15:38:50.469131500   /home/ubuntu/alertmanager/.build/gopath/src/github.com/prometheus/alertmanager/config/watcher.go:49 +0x8fb
2015-06-30 15:38:50.469243500 created by main.main
2015-06-30 15:38:50.469313500   /home/ubuntu/alertmanager/main.go:114 +0xf0d
2015-06-30 15:38:50.469423500 
2015-06-30 15:38:50.469454500 goroutine 41 [IO wait, 3 minutes]:
2015-06-30 15:38:50.469582500 net.(*pollDesc).Wait(0xc208011e90, 0x72, 0x0, 0x0)
2015-06-30 15:38:50.469754500   /usr/local/go/src/net/fd_poll_runtime.go:84 +0x47
2015-06-30 15:38:50.469879500 net.(*pollDesc).WaitRead(0xc208011e90, 0x0, 0x0)
2015-06-30 15:38:50.470104500   /usr/local/go/src/net/fd_poll_runtime.go:89 +0x43
2015-06-30 15:38:50.470105500 net.(*netFD).Read(0xc208011e30, 0xc2084e7000, 0x1000, 0x1000, 0x0, 0x7f6bde890d70, 0xc208f583f8)
2015-06-30 15:38:50.470106500   /usr/local/go/src/net/fd_unix.go:242 +0x40f
2015-06-30 15:38:50.470107500 net.(*conn).Read(0xc2084165a0, 0xc2084e7000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
2015-06-30 15:38:50.470108500   /usr/local/go/src/net/net.go:121 +0xdc
2015-06-30 15:38:50.470119500 net/http.(*liveSwitchReader).Read(0xc2084f00e8, 0xc2084e7000, 0x1000, 0x1000, 0xc208f583f0, 0x0, 0x0)
2015-06-30 15:38:50.470120500   /usr/local/go/src/net/http/server.go:214 +0xab
2015-06-30 15:38:50.470121500 io.(*LimitedReader).Read(0xc208407b20, 0xc2084e7000, 0x1000, 0x1000, 0x800, 0x0, 0x0)
2015-06-30 15:38:50.470221500   /usr/local/go/src/io/io.go:408 +0xce
2015-06-30 15:38:50.470345500 bufio.(*Reader).fill(0xc20800ade0)
2015-06-30 15:38:50.470430500   /usr/local/go/src/bufio/bufio.go:97 +0x1ce
2015-06-30 15:38:50.470542500 bufio.(*Reader).ReadSlice(0xc20800ade0, 0x51ac0a, 0x0, 0x0, 0x0, 0x0, 0x0)
2015-06-30 15:38:50.470745500   /usr/local/go/src/bufio/bufio.go:295 +0x257
2015-06-30 15:38:50.470746500 bufio.(*Reader).ReadLine(0xc20800ade0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
2015-06-30 15:38:50.470747500   /usr/local/go/src/bufio/bufio.go:324 +0x62
2015-06-30 15:38:50.470748500 net/textproto.(*Reader).readLineSlice(0xc208d6aa20, 0x0, 0x0, 0x0, 0x0, 0x0)
2015-06-30 15:38:50.470749500   /usr/local/go/src/net/textproto/reader.go:55 +0x9e
2015-06-30 15:38:50.470754500 net/textproto.(*Reader).ReadLine(0xc208d6aa20, 0x0, 0x0, 0x0, 0x0)
2015-06-30 15:38:50.470755500   /usr/local/go/src/net/textproto/reader.go:36 +0x4f
2015-06-30 15:38:50.470756500 net/http.ReadRequest(0xc20800ade0, 0xc2088d8820, 0x0, 0x0)
2015-06-30 15:38:50.470776500   /usr/local/go/src/net/http/request.go:598 +0xcb
2015-06-30 15:38:50.470905500 net/http.(*conn).readRequest(0xc2084f00a0, 0x0, 0x0, 0x0)
2015-06-30 15:38:50.471061500   /usr/local/go/src/net/http/server.go:586 +0x26f
2015-06-30 15:38:50.471367500 net/http.(*conn).serve(0xc2084f00a0)
2015-06-30 15:38:50.471368500   /usr/local/go/src/net/http/server.go:1162 +0x69e
2015-06-30 15:38:50.471369500 created by net/http.(*Server).Serve
2015-06-30 15:38:50.471370500   /usr/local/go/src/net/http/server.go:1751 +0x35e
2015-06-30 15:38:50.471371500 
2015-06-30 15:38:50.471372500 goroutine 20 [syscall, 38 minutes]:
2015-06-30 15:38:50.471373500 syscall.Syscall(0x0, 0x3, 0xc208605ee0, 0x10000, 0x0, 0x0, 0xc2088ea700)
2015-06-30 15:38:50.471377500   /usr/local/go/src/syscall/asm_linux_amd64.s:21 +0x5
2015-06-30 15:38:50.471378500 syscall.read(0x3, 0xc208605ee0, 0x10000, 0x10000, 0x0, 0x0, 0x0)
2015-06-30 15:38:50.471379500   /usr/local/go/src/syscall/zsyscall_linux_amd64.go:867 +0x6e
2015-06-30 15:38:50.471380500 syscall.Read(0x3, 0xc208605ee0, 0x10000, 0x10000, 0xc20807eef8, 0x0, 0x0)
2015-06-30 15:38:50.471381500   /usr/local/go/src/syscall/syscall_unix.go:136 +0x58
2015-06-30 15:38:50.471385500 github.com/howeyc/fsnotify.(*Watcher).readEvents(0xc20800a5a0)
2015-06-30 15:38:50.471386500   /home/ubuntu/alertmanager/.build/gopath/src/github.com/howeyc/fsnotify/fsnotify_linux.go:219 +0x12c
2015-06-30 15:38:50.471387500 created by github.com/howeyc/fsnotify.NewWatcher
2015-06-30 15:38:50.471388500   /home/ubuntu/alertmanager/.build/gopath/src/github.com/howeyc/fsnotify/fsnotify_linux.go:126 +0x420
2015-06-30 15:38:50.471389500 
2015-06-30 15:38:50.471392500 goroutine 21 [chan receive, 38 minutes]:
2015-06-30 15:38:50.471393500 github.com/howeyc/fsnotify.(*Watcher).purgeEvents(0xc20800a5a0)
2015-06-30 15:38:50.471394500   /home/ubuntu/alertmanager/.build/gopath/src/github.com/howeyc/fsnotify/fsnotify.go:21 +0x55
2015-06-30 15:38:50.471402500 created by github.com/howeyc/fsnotify.NewWatcher
2015-06-30 15:38:50.471403500   /home/ubuntu/alertmanager/.build/gopath/src/github.com/howeyc/fsnotify/fsnotify_linux.go:127 +0x43a

juliusv commented on May 22, 2024

@dan-cleinmark Yep, not surprising if you're directly writing to the watched config file. @fabxc is currently rewriting the alertmanager completely, and the file watching will actually be gone (like Prometheus, it will only react to explicit SIGHUPs to reload the config, and the config will be YAML, like Prometheus, etc.).

dan-cleinmark commented on May 22, 2024

@juliusv we modified the config updates to do an atomic update, but are still seeing the same issue. Other possibly interesting details: we're running on an XFS-formatted EBS volume and are using xfs_freeze && ebs-snapshot to perform backups. That backup happens once a day and we were seeing the /alerts issue much more frequently than that, so I doubt they're related, but I'm dropping it here as an FYI.

You mentioned having a 'catch-all' alert rule which we don't have currently. Is that related to the fsnotify exception or just a general best practice?

juliusv commented on May 22, 2024

@dan-cleinmark Yeah, there's currently a known crash at line 282 of manager.go, which is where it crashed for you. It happens when an alert entry expires that doesn't have a matching aggregation_rule. To avoid it, make sure that all your alerts are matched by at least one rule (e.g. by adding a catch-all at the end). Again, we're completely rewriting the Alertmanager at the moment, so we're not spending much time fixing the old experimental version.
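
To illustrate why a catch-all helps, here is a simplified Go sketch of first-match rule lookup. It is not the Alertmanager's implementation: the rule type, its exact-match filters, and firstMatch are hypothetical stand-ins. The point is only that a lookup with no match yields nil, and a trailing rule without filters guarantees a match.

```go
package main

import "fmt"

// rule is a hypothetical stand-in for an aggregation rule. A rule with no
// filters matches every alert, which makes it a catch-all.
type rule struct {
	name    string
	filters map[string]string
}

// matches reports whether every filter equals the alert's label value.
func (r *rule) matches(labels map[string]string) bool {
	for k, v := range r.filters {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// firstMatch returns the first rule that matches labels, or nil if none do.
func firstMatch(rules []*rule, labels map[string]string) *rule {
	for _, r := range rules {
		if r.matches(labels) {
			return r
		}
	}
	return nil
}

func main() {
	rules := []*rule{
		{name: "db", filters: map[string]string{"service": "db"}},
	}
	alert := map[string]string{"service": "web"}

	// Without a catch-all the lookup returns nil; using the result
	// without this check would be a nil pointer dereference.
	if r := firstMatch(rules, alert); r == nil {
		fmt.Println("no rule matched")
	}

	// With a filterless rule at the end, every alert matches something.
	rules = append(rules, &rule{name: "catch-all"})
	fmt.Println("matched:", firstMatch(rules, alert).name)
}
```

The catch-all goes last so that more specific rules still win for the alerts they cover.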

juliusv commented on May 22, 2024

Closing in favor of #122.
