rabbitmq / aten
An adaptive accrual node failure detection library for Elixir and Erlang
License: Other
Hi all,
I'm trying to use aten to monitor my Erlang cluster nodes.
The first thing I did was to try monitoring the node itself:
$ rebar3 shell
[...]
===> Verifying dependencies...
===> Analyzing applications...
===> Compiling wa8_rest
===> Compiling wa8
===> Compiling wa8_perf
Erlang/OTP 26 [erts-14.0.2] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:64] [jit:ns] [dtrace]
Eshell V14.0.2 (press Ctrl+G to abort, type help(). for help)
===> Booted woo
===> Booted sasl
===> Booted aten
(woo@127.0.0.1)1> node().
'woo@127.0.0.1'
(woo@127.0.0.1)2> aten:register(node()).
ok
(woo@127.0.0.1)3> flush().
Shell got {node_event,'woo@127.0.0.1',down}
Strange that I'm getting a down event, isn't it?
I should get an up event in this case, as the node is up and running:
Shell got {node_event,'woo@127.0.0.1',up}
Currently there is no way for a watcher that receives a node_event to judge how long ago the state change was observed. Depending on the watcher process implementation, the event could have spent significant time in the mailbox. If we attach a timestamp to the node_event, receivers can at least evaluate the freshness of the event and possibly choose to ignore it, or wait a bit to see whether another state change has been emitted.
A timestamp should be good enough, given that watchers should only reside on the same Erlang node as the aten detector process.
This is a breaking change.
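As a rough sketch of how a watcher might use such a timestamp: the 4-tuple event shape below is hypothetical (aten currently emits {node_event, Node, State}), and the timestamp is assumed to be taken with erlang:monotonic_time(millisecond) on the same node as the watcher. The module and function names are illustrative only.

```erlang
-module(fresh_watcher).
-export([is_fresh/2, handle_event/2]).

%% TsMs is assumed to come from erlang:monotonic_time(millisecond)
%% on this node, so it can be compared against the local clock.
is_fresh(TsMs, MaxAgeMs) ->
    erlang:monotonic_time(millisecond) - TsMs =< MaxAgeMs.

%% Hypothetical event shape: {node_event, Node, State, TsMs}.
%% Returns a tag so the caller can decide whether to act or wait.
handle_event({node_event, Node, State, TsMs}, MaxAgeMs) ->
    case is_fresh(TsMs, MaxAgeMs) of
        true  -> {act, Node, State};
        false -> {stale, Node, State}
    end.
```

A watcher that gets {stale, Node, State} back could ignore the event or wait briefly for a newer one, as the proposal suggests.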
Now that the project is stable enough, I wonder if you could tag a release and publish aten on Hex? Unfortunately, Hex doesn't allow mixing Hex and git dependencies :/
I was looking at the code and I am wondering about the requirements.
As I understand it, aten first tries to connect on register, but there doesn't seem to be any reconnection planned when the node is down. What should we do when the node is down? Force a reconnect? Since you force the connection on the first call, you may want to reconnect as well. Thoughts?
aten_sink should probably maintain a set of node monitors for each node in its monitored set and remove them from the set when a node is disconnected.
This will probably cause nodes that are disconnected suddenly never to be detected as down by aten_detector. Additionally, the analyse function will need to be updated to report nodes that are in the previous map but missing from the current one as down.
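The map difference described above could be computed along these lines. This is a minimal sketch that assumes both arguments are maps keyed by node name; the module and function names are hypothetical and are not taken from aten's code.

```erlang
-module(analyse_diff).
-export([missing_nodes/2]).

%% Returns the nodes that are keys in Previous but not in Current.
%% Under the proposal, these would be reported as down.
missing_nodes(Previous, Current) when is_map(Previous), is_map(Current) ->
    maps:keys(maps:without(maps:keys(Current), Previous)).
```

For example, missing_nodes(#{a => 1, b => 2}, #{a => 3}) leaves only b as a candidate for a down event.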
We need to write a test that exercises aten under an increasing data load to see how well it adapts.
aten_sink:beat should use nosuspend to avoid the chance of a single partitioned node with a full TCP send buffer blocking aten_emitter from sending heartbeats to other un-partitioned nodes.
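For reference, erlang:send/3 accepts a nosuspend option and returns the atom nosuspend instead of blocking when the distribution channel is busy. The sketch below shows that call in isolation; the module name, function name, and the dropped return value are hypothetical, and this is not how aten_sink:beat is currently written.

```erlang
-module(beat_sketch).
-export([send_nosuspend/2]).

%% Sends Msg to Dest without suspending the caller.
%% If the send would block (for example a full send buffer to a
%% partitioned node), the message is not sent and we report that.
send_nosuspend(Dest, Msg) ->
    case erlang:send(Dest, Msg, [nosuspend]) of
        ok        -> ok;
        nosuspend -> dropped
    end.
```

Dropping a heartbeat to a busy node is usually acceptable here, since the next heartbeat interval will try again.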
Use monotonic time functions.
Hello, aten developers.
Thank you for your work; it is very helpful.
When registering a connected node, aten_detector first sends a down message. I expected to receive an up message after registering a connected node.
Is this by design?
If so, please close this.
aten_detector needs to monitor all processes that register and unregister them if they go down.
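One way this bookkeeping might look, as a sketch only: the state shape (a map of monitor reference to pid) and the function names are assumptions made for illustration, not aten_detector's actual internals.

```erlang
-module(watcher_monitor_sketch).
-export([register_watcher/2, handle_down/2]).

%% Watchers is a map of MonitorRef => Pid (hypothetical state shape).
%% Monitoring at registration time means a 'DOWN' message is
%% delivered to the calling process when the watcher exits.
register_watcher(Pid, Watchers) when is_pid(Pid) ->
    Ref = erlang:monitor(process, Pid),
    Watchers#{Ref => Pid}.

%% Pass in the 'DOWN' message received for a dead watcher;
%% the watcher is dropped from the registered set.
handle_down({'DOWN', Ref, process, _Pid, _Reason}, Watchers) ->
    maps:remove(Ref, Watchers).
```

In a gen_server, handle_down/2 would typically be called from handle_info/2 with the server's watcher map.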
While reviewing the sources, I noticed that the tags have not been pushed to GitHub :)
Currently node_events are only emitted on state changes; however, it is useful for newly registered monitors to be notified of a node's state at the time of registering.