hashicorp / serf

Service orchestration and management tool.

Home Page: https://www.serf.io/

License: Mozilla Public License 2.0


Introduction

Serf

Serf is a decentralized solution for service discovery and orchestration that is lightweight, highly available, and fault tolerant.

Serf runs on Linux, Mac OS X, and Windows. An efficient and lightweight gossip protocol is used to communicate with other nodes. Serf can detect node failures and notify the rest of the cluster. An event system is built on top of Serf, letting you use Serf's gossip protocol to propagate events such as deploys, configuration changes, etc. Serf is completely masterless with no single point of failure.

Here are some example use cases of Serf, though there are many others:

  • Discovering web servers and automatically adding them to a load balancer
  • Organizing many memcached or redis nodes into a cluster, perhaps with something like twemproxy or maybe just configuring an application with the address of all the nodes
  • Triggering web deploys using the event system built on top of Serf
  • Propagating configuration changes to relevant nodes
  • Updating DNS records to reflect cluster changes as they occur
  • Much, much more
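Many of these use cases reduce to a small event-handler script that Serf invokes with the event type in $SERF_EVENT and one member per line on stdin. A minimal sketch, assuming an illustrative "web" role and load-balancer wording that are not Serf conventions:

```shell
#!/bin/sh
# handle_event EVENT: reads "name address role" member lines on stdin,
# as Serf supplies them, and reacts only to members with the web role.
handle_event() {
  event=$1
  while read -r name addr role; do
    if [ "$role" = "web" ]; then
      case "$event" in
        member-join)                echo "add $addr to pool" ;;
        member-failed|member-leave) echo "remove $addr from pool" ;;
      esac
    fi
  done
}

# Serf would run the script as: handle_event "$SERF_EVENT"
printf 'web1 10.0.0.1 web\ncache1 10.0.0.2 cache\n' | handle_event member-join
```

Registered with serf agent -event-handler=/path/to/script, one script like this can serve every event type.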

Quick Start

First, download a pre-built Serf binary for your operating system, compile Serf yourself, or install using go get -u github.com/hashicorp/serf/cmd/serf.

Next, let's start a couple of Serf agents. Agents run until they're told to quit, and they handle Serf's communication and maintenance tasks. In a real Serf setup, each node in your system will run one or more Serf agents (a node can run multiple agents if you're running multiple cluster types, e.g. web servers vs. memcached servers).

Start each Serf agent in a separate terminal session so that we can see the output of each. Start the first agent:

$ serf agent -node=foo -bind=127.0.0.1:5000 -rpc-addr=127.0.0.1:7373
...

Start the second agent in another terminal session (while the first is still running):

$ serf agent -node=bar -bind=127.0.0.1:5001 -rpc-addr=127.0.0.1:7374
...

At this point two Serf agents are running independently but are still unaware of each other. Let's now tell the first agent to join an existing cluster (the second agent). To join an existing cluster, you give a Serf agent the address of at least one existing member; after that, Serf gossips and the rest of the cluster becomes aware of the join. Run the following command in a third terminal session.

$ serf join 127.0.0.1:5001
...

If you're watching your terminals, you should see both Serf agents become aware of the join. You can prove it by running serf members to see the members of the Serf cluster:

$ serf members
foo    127.0.0.1:5000    alive
bar    127.0.0.1:5001    alive
...

At this point, you can ctrl-C or force kill either Serf agent, and they'll update their membership lists appropriately. If you ctrl-C a Serf agent, it will gracefully leave by notifying the cluster of its intent to leave. If you force kill an agent, it will eventually (usually within seconds) be detected by another member of the cluster which will notify the cluster of the node failure.
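The three-column serf members output above is also easy to script against. A small sketch, assuming the column layout stays as shown:

```shell
#!/bin/sh
# alive_addrs: reads `serf members` output (name, address, status) on
# stdin and prints the address column for members whose status is alive.
alive_addrs() {
  awk '$3 == "alive" { print $2 }'
}

# Against a live cluster you would pipe the real command:
#   serf members | alive_addrs
printf 'foo    127.0.0.1:5000    alive\nbar    127.0.0.1:5001    failed\n' | alive_addrs
```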

Documentation

Full, comprehensive documentation is viewable on the Serf website:

https://www.serf.io/docs

Developing Serf

If you wish to work on Serf itself, you'll first need Go installed (version 1.10+ is required). Make sure you have Go properly installed, including setting up your GOPATH.

Next, clone this repository into $GOPATH/src/github.com/hashicorp/serf and then just type make. In a few moments, you'll have a working serf executable:

$ make
...
$ bin/serf
...

NOTE: make will also place a copy of the executable under $GOPATH/bin/

Serf is first and foremost a library with a command-line interface, serf. The Serf library is independent of the command-line agent, serf. The serf binary is located under cmd/serf and can be installed standalone by issuing the command go get -u github.com/hashicorp/serf/cmd/serf. Applications using the Serf library should only need to include github.com/hashicorp/serf.

Tests can be run by typing make test.

If you make any changes to the code, run make format in order to automatically format the code according to Go standards.

People

Contributors

alvin-huang, armon, banks, captainill, charleswhchan, dependabot[bot], derekchiang, derpston, dhiaayachi, dnephin, hanshasselberg, hashi-derek, hmrm, jefferai, jen20, kyhavlov, masteinhauser, mcroydon, mitchellh, mkeeler, nathanielc, pearkes, pmenglund, preetapan, rboyer, ryanuber, sean-, sethvargo, shore, slackpad


Issues

serf agent support for config "profiles"

Currently we ship Serf with default configs that are optimized for a LAN environment. We should introduce a "profile" flag that can be used to select more appropriate configs. Examples include: lan, wan, and local.

NAT traversal

I don't think this needs a description. It would be handy if Serf clusters could do this.

Running serf join on an unavailable node hangs

Please let me know if I'm doing this wrong ...

To automate the act of joining a cluster I cache the list of servers that have joined. In case "this" client disconnects, it can go through the list and automatically rejoin itself to the cluster, even if the original host it joined through is not available. Herein lies the problem.

If for some reason it attempts to join through a node that is not currently available, the serf join command will just hang.

Is there a way to set a timeout and fail if the join command does not succeed?

Alternatively, is there a better way to do this?
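Until a timeout option exists, one workaround is to bound each attempt with coreutils timeout(1) and walk the cached list. A sketch only; the command is passed as a parameter purely so the loop can be exercised without a cluster, and would normally be "serf join":

```shell
#!/bin/sh
# join_any CMD ADDR...: run "CMD ADDR" for each cached address with a
# 5-second deadline, stopping at the first success. timeout(1) exits
# with status 124 when the deadline expires, which counts as a failure.
join_any() {
  cmd=$1; shift
  for addr in "$@"; do
    if timeout 5 $cmd "$addr"; then
      echo "joined via $addr"
      return 0
    fi
  done
  echo "no cached member reachable" >&2
  return 1
}

# Normal usage: join_any "serf join" 10.0.0.1:7946 10.0.0.2:7946
```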

Encryption & Authentication

It would be lovely to have authentication / access permissions on keys, as well as encryption of the traffic, so that you cannot just grab information off the network.

Allow additional output options, and filtering members by -role=<role> and -state=<state>

Current behavior:

$ serf members
router    10.0.1.69    alive    uplink-router
ubuntu    10.0.5.253    alive    actor
livecd    10.0.5.252    alive    actor

New behaviors (perhaps for 0.2.1?)

Role Filter feature request:

$ serf members -role=actor
ubuntu    10.0.5.253    alive    actor
livecd    10.0.5.252    alive    actor

State Filter feature request:

$ serf members -state=alive
router    10.0.1.69    alive    uplink-router
ubuntu    10.0.5.253    alive    actor
livecd    10.0.5.252    alive    actor

Output options feature request:

JSON support is already in core for loading configuration.

$ serf members -json
{
  "members": {
    "10.0.1.69":  { "name": "router", "state": "alive", "role": "uplink-router" },
    "10.0.5.253": { "name": "ubuntu", "state": "alive", "role": "actor" },
    "10.0.5.252": { "name": "livecd", "state": "alive", "role": "actor" }
  }
}

Optionally, if it's easy to implement, CSV would also be helpful for 'cut'.

$ serf members -csv
name,ip,state,role
router,10.0.1.69,alive,uplink-router
ubuntu,10.0.5.253,alive,actor
livecd,10.0.5.252,alive,actor

And a -one option would also be quite helpful.

$ serf members -one -csv -state=alive -role=uplink-router | sed "1d" | cut -d ',' -f 2
10.0.1.69

Give me an actor, please:

$ serf members -detailed -one -json -state=alive -role=actor
{
  "members": {
    "10.0.1.252": { "name": "livecd", "state": "alive", "role": "actor", "protocol": 1, "protocols": [0, 1] }
  }
}

Feel free to skip the CSV support, as it's just as easy to

serf members | sed "s/    /\t/g" | cut -f 2

since cut won't take more than a single character as a delimiter.

But if you're using serf with something like python and the 'sh' library, boy it sure is nice to use a CSVReader or json to deal with preformatted shell input.

import json, pprint
from sh import serf
# The serf binary ^ must be somewhere in your path for sh to find it.

# Check out http://amoffat.github.io/sh/#sub-commands
my_json_iter = serf.members("-json")  # Yields lines as __iter__
members_dict = json.load(my_json_iter)  # json.load iterates; json.loads expects a string.
pprint.pprint(members_dict)  # An object comes out!

my_actor = "uplink-router"
pprint.pprint(json.load(serf.members("-json", "-role={}".format(my_actor))))  # How about a one-liner?

Having both may be overboard; however, JSON isn't exactly shell-script friendly.

serf agent should allow for `join` flag

The join flag can be used to attempt to join a node immediately after starting an agent. What is left to figure out is the semantics of a join failure: we can either exit immediately with a non-zero exit code, or add some sort of retry mechanism.
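The retry alternative is a short loop. A sketch of what -join retry semantics might look like in shell, with the attempt count and one-second delay as illustrative defaults:

```shell
#!/bin/sh
# retry N CMD...: run CMD up to N times, sleeping one second between
# failures; exits 0 on the first success, 1 if every attempt fails.
retry() {
  attempts=$1; shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# An agent wrapper could then do: retry 5 serf join 127.0.0.2
```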

Member role not updated when re-joining cluster

When a member with a particular role fails and later re-joins the cluster with a different role, the cluster state is not updated with the new role.

This will cause problems in situations where role names are changed slightly or when a name is reused by another role. I can see this happening often in situations where member names are somewhat arbitrary (e.g. based on network address), like inside an EC2 auto-scaling group.

Example

test-case.sh:

# start cluster
serf agent -node=node0 \
           -rpc-addr=:7373 \
           -bind=127.0.0.2 \
           -event-handler=handle.sh &

# node1 joins as an 'app'
sleep 1; serf agent -node=node1 \
                    -rpc-addr=:7374 \
                    -bind=127.0.0.3 \
                    -join=127.0.0.2 \
                    -role=app &
node1=$!
# node1 fails
sleep 1;  kill -9 $node1
# node1 re-joins later as a 'worker'
sleep 10; serf agent -node=node1 \
                     -rpc-addr=:7374 \
                     -bind=127.0.0.3 \
                     -join=127.0.0.2 \
                     -role=worker &

handle.sh:

while read member
do
  echo "$SERF_EVENT: $member" >> handle.out.log
done

Expected:

# handle.out.log
member-join: node0  127.0.0.2
member-join: node1  127.0.0.3   app
member-failed: node1    127.0.0.3   app
member-join: node1  127.0.0.3   worker

Actual:

# handle.out.log
member-join: node0  127.0.0.2
member-join: node1  127.0.0.3   app
member-failed: node1    127.0.0.3   app
member-join: node1  127.0.0.3   app

Checking cluster state using serf members shows the same result. I haven't dug in yet to see what might be causing this, but I thought I'd start the discussion.

user event payload not visible by handler?

I feel like I must be doing something wrong, but user event data isn't showing up for me. I am testing in Vagrant with 2 VMs called source and target.

Source looks like this:

[vagrant@source ~]$ serf version
Serf v0.1.1
[vagrant@source ~]$ serf agent -role=source -log-level=debug -bind=192.168.50.100

Target is identical except I'm specifying an event handler:

[vagrant@target ~]$ serf version
Serf v0.1.1
[vagrant@target ~]$ serf agent -event-handler="/vagrant/event.sh" -role=target -log-level=debug -bind=192.168.50.200 

The handler script itself is trivial:

#!/bin/bash

echo
echo "$0 triggered!"
echo
echo "SERF_EVENT is ${SERF_EVENT}"
echo "SERF_SELF_NAME is ${SERF_SELF_NAME}"
echo "SERF_SELF_ROLE is ${SERF_SELF_ROLE}"
echo "SERF_USER_EVENT is ${SERF_USER_EVENT}"
echo
echo "BEGIN event data"
while read line; do
  echo "$line"
done
echo "END event data"
echo "$0 finished!"
echo

I know the script is reading stdin correctly because I can see the data from the member-join event:

2013/11/01 04:46:45 [INFO] serf: EventMemberJoin: source.serf.dev 192.168.50.100
2013/11/01 04:46:45 [INFO] agent: Received event: member-join
2013/11/01 04:46:45 [DEBUG] Event 'member-join' script output:
/vagrant/event.sh triggered!

SERF_EVENT is member-join
SERF_SELF_NAME is target.serf.dev
SERF_SELF_ROLE is target
SERF_USER_EVENT is

BEGIN event data
source.serf.dev 192.168.50.100 source
END event data
/vagrant/event.sh finished!

But when I fire off my own event:

[vagrant@source vagrant]$ serf event foo bar
...
2013/11/01 04:49:27 Requesting user event send: foo "bar"

I don't see the event data:

2013/11/01 04:49:27 [DEBUG] serf-delegate: messageUserEventType: foo
2013/11/01 04:49:27 [INFO] agent: Received event: user-event: foo
2013/11/01 04:49:27 [DEBUG] Event 'user' script output:
/vagrant/event.sh triggered!

SERF_EVENT is user
SERF_SELF_NAME is target.serf.dev
SERF_SELF_ROLE is target
SERF_USER_EVENT is foo

BEGIN event data
END event data
/vagrant/event.sh finished!

I'm on CentOS 6 x86_64 if it matters. Thanks!

Ship v0.2

I'm making this issue to vote that we ship v0.2 after #50 and #51 are merged, after some integration testing with our examples, and after upgrade testing. We can close this when we ship.

@armon

trunk fails to run serf version

Same results with make in the source tree, per the README.

$ go get -u -v github.com/hashicorp/serf
$ serf version
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x38 pc=0x42a0d8]

goroutine 1 [running]:
runtime.panic(0x7042c0, 0xb09c08)
/home/kapil/projects/go-lang/go/src/pkg/runtime/panic.c:266 +0xb6
github.com/hashicorp/serf/command.(*VersionCommand).Run(0xc210052840, 0xc21000a020, 0x0, 0x0, 0xc210053818)
/home/kapil/src/github.com/hashicorp/serf/command/version.go:33 +0x398
github.com/mitchellh/cli.(*CLI).Run(0xc21004abd0, 0xc21004abd0, 0xc2100638d0, 0xc2100638a0)
/home/kapil/src/github.com/mitchellh/cli/cli.go:69 +0x252
main.realMain(0x401be5)
/home/kapil/src/github.com/hashicorp/serf/main.go:36 +0x2ab
main.main()
/home/kapil/src/github.com/hashicorp/serf/main.go:12 +0x1e

goroutine 3 [syscall]:
os/signal.loop()
/home/kapil/projects/go-lang/go/src/pkg/os/signal/signal_unix.go:21 +0x1e
created by os/signal.init·1
/home/kapil/projects/go-lang/go/src/pkg/os/signal/signal_unix.go:27 +0x31

Coalesce user events

It is possible that the rate of user events will be quite high, especially during a node join, where it may sync a large number of past user events. This causes Serf to fire many event handlers at once, which may not be desirable.

It may be nice to use event coalescing and only select the "latest" event per event name. A reasonable default is 1 second for event coalescing, since it at least resolves the Node join case.
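Per-name coalescing is essentially a last-write-wins map flushed after the quiet interval. The idea in miniature, over a stream of "name payload" lines (the stream format here is illustrative, not Serf's wire format):

```shell
#!/bin/sh
# coalesce: reads "name payload" event lines on stdin and emits only the
# most recent line per event name. Output is sorted because awk's map
# iteration order is arbitrary.
coalesce() {
  awk '{ latest[$1] = $0 } END { for (n in latest) print latest[n] }' | sort
}

printf 'deploy v1\ndeploy v2\nrestart now\ndeploy v3\n' | coalesce
```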

serf does not work on 386 hosts

This is golang.org/issue/599

uint64 values need to be aligned on 64-bit boundaries for atomic operations to work on 386 platforms.

==> Starting Serf agent...
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x1 pc=0x8107e4c]

goroutine 1 [running]:
runtime.panic(0x827f260, 0x85a5a28)
/home/dfc/go/src/pkg/runtime/panic.c:266 +0xac
sync/atomic.AddUint64(0x1880b214, 0x1, 0x0, 0x82797e0, 0x4)
/home/dfc/go/src/pkg/sync/atomic/asm_386.s:118 +0xc
github.com/hashicorp/serf/serf.(*LamportClock).Increment(0x1880b214, 0x200, 0x0)
/home/dfc/src/github.com/hashicorp/serf/serf/lamport.go:24 +0x41
github.com/hashicorp/serf/serf.Create(0x188531c0, 0x40, 0x0, 0x1882fd80)
/home/dfc/src/github.com/hashicorp/serf/serf/serf.go:183 +0x34f
github.com/hashicorp/serf/command/agent.(*Agent).Start(0x18853230, 0x0, 0x0)
/home/dfc/src/github.com/hashicorp/serf/command/agent/agent.go:160 +0x1db
github.com/hashicorp/serf/command/agent.(*Command).Run(0x1880a9f0, 0x1880a010, 0x0, 0x0, 0xb7520408, ...)
/home/dfc/src/github.com/hashicorp/serf/command/agent/command.go:171 +0xc7b
github.com/hashicorp/serf/cli.(*CLI).Run(0x188309c0, 0x827ef60, 0x85a8994, 0xb7520408)
/home/dfc/src/github.com/hashicorp/serf/cli/cli.go:52 +0x16a
main.realMain(0x8049776)
/home/dfc/src/github.com/hashicorp/serf/main.go:37 +0x242
main.main()
/home/dfc/src/github.com/hashicorp/serf/main.go:12 +0x21

Filter event scripts for certain events

It would be nice to specify event scripts that are filtered at the Serf level before being executed, so you can say "only execute this script if it is a member-join event"

serf broadcast queues can't drain if no peers

The queues can potentially back up infinitely if there are no peers to broadcast to. We probably need to add a maximum queue depth and start dropping messages to prevent unbounded growth.

ipv6 support

I can't specify an ipv6 address to bind to:

serf agent -node=agent-one -bind=::1
Invalid bind address: too many colons in address ::1
serf agent -node=agent-one -bind=[::1]
Invalid bind address: missing port in address [::1]
serf agent -node=agent-one -bind=[::1]:1234
==> Starting Serf agent...
==> Error creating Serf: Failed to start TCP listener. Err: listen tcp: too many colons in address ::1:1234

So it looks like it almost works but might need some tweaking of the parsing.

Middleman doesn't update on change

I pulled down the repo and figured I would do some quick CSS work. I had no trouble getting Middleman up and running. However, when I make a change to the source files, save, and reload my browser, the changes are not shown. I also had mixed results with stopping and starting Middleman again.

Some people in IRC noted they had similar problems with Serf and Middleman but not in other projects with Middleman. For example, someone mentioned Packer works as expected with Middleman. I cloned the Packer repo and verified that this is true - it works fine for me. There must be something wonky in this project since the problem is isolated there.

Mac OS, Ruby 1.9.3-p448

Agent startup units for Upstart and systemd

Here's a basic upstart unit.

# Serf 0.2.x Agent (Upstart unit)
description "Serf Agent"
start on (local-filesystems and net-device-up IFACE!=lo)
stop on runlevel [06]

kill signal INT    # Use SIGINT instead of SIGTERM so serf can depart the cluster.
respawn            # Restart the process if it dies and GOAL was not 'stopping'.
kill timeout 90    # Allow 90 seconds for serf to die before sending SIGKILL.

env SERF=/usr/local/bin/serf
env CFG_FOLDER=/etc/serf/

pre-start script
    install -d -g root -o root $CFG_FOLDER; # make the empty directory, owned by root.
end script

exec $SERF agent -config-dir=$CFG_FOLDER
post-stop exec sleep 10  # Wait ten seconds before respawn attempts.

Here's a basic systemd service unit.

# Serf 0.2.x Agent (systemd service unit)
[Unit]
Description=Serf Agent
After=syslog.target
After=network.target

[Service]
Type=simple
# Ensure the configuration directory exists.
ExecStartPre=/usr/bin/install -d -g root -o root /etc/serf/
ExecStart=/usr/local/bin/serf agent -config-dir=/etc/serf/
# Use SIGINT instead of SIGTERM so serf can depart the cluster.
KillSignal=SIGINT
# Restart on success, failure, and any emitted signals like HUP.
Restart=always
# Wait ten seconds before respawn attempts.
RestartSec=10

[Install]
WantedBy=multi-user.target

The behavior should be reasonably identical for both, as documented by the comments. systemd defaults to 90-second timeouts for start and stop, but does not do variable expansion like Upstart does, so a sane default configuration is hardcoded.

Join/Leave event flushing causes active agents to show as 'leaving'

To replicate:

start foo
$ serf agent -node=foo -bind=127.0.0.10 -rpc-addr=127.0.0.1:7373

start bar
$ serf agent -node=bar -bind=127.0.0.11 -rpc-addr=127.0.0.1:7374

tell foo to join bar
$ serf join -rpc-addr=127.0.0.1:7373 127.0.0.11

at this point, both show the other as alive, now control-c foo

bar now sees foo as 'left'
$ serf members -rpc-addr=127.0.0.1:7374

bar    127.0.0.11    alive
foo    127.0.0.10    left

restart foo:
$ serf agent -node=foo -bind=127.0.0.10 -rpc-addr=127.0.0.1:7373

both don't see the other, as expected -- now tell bar to join foo:
$ serf join -rpc-addr=127.0.0.1:7374 127.0.0.10

In this state, bar sees that 'foo' is alive, but foo thinks that 'foo' is leaving.
$ serf members

foo    127.0.0.10    leaving
bar    127.0.0.11    alive

Support DNS as the address gossiped out

Memberlist chooses the first private IP. If we join from outside that cluster, we can't communicate with that Serf cluster. This is by design, but we also have no way to tell memberlist to choose a public IP. We need to expose that.

Persistent cluster membership

I was surprised by the behaviour of the agent when giving it a SIGINT - it gracefully leaves the cluster by notifying other nodes before it shuts down. When restarting the agent, it is no longer a member of the cluster. It doesn't attempt to rejoin, and the other nodes don't attempt to contact it to tell it to rejoin.

When killing the agent with a SIGTERM or SIGKILL, it makes no attempt to leave the cluster gracefully. Other agents eventually notice that it disappeared, and make regular attempts to contact it. When the agent comes back up, other agents will tell it to rejoin the cluster, and it does.

I found this behaviour surprising because I expected cluster persistence to be less fragile. If I issue a serf join foo.example.com I expect that this action won't be undone unless I later issue a serf leave on that agent, or a force-leave on another agent.

So, I propose:

  • Having serf agents not attempt to leave the cluster when given a SIGINT.
  • Adding a serf leave command, to complement force-leave, that performs an orderly departure of the local agent from the cluster.
  • Having serf agents persist enough cluster state locally (/var/lib/serf/ring?) to be able to bootstrap that node back into the cluster immediately on startup, even if many other agents are down.

This would have some benefits:

  • Better operational predictability, no side-effects.
  • Faster agent recovery time - not waiting for other agents to attempt a reconnect.
  • Resilience against whole-cluster state loss, such as during a power failure that affects a cluster whose agents are all in the same rack.

This is basically the same ring/cluster persistence model used by Riak, I believe. I've used Riak in production for over a year and I've grown to love the resilience. Nodes go up, nodes go down, and the operator never has to do a thing to maintain cluster state.

In terms of implementation, we have a few options:

  1. The operator could store a member list in the config.
  2. Serf could periodically (or more likely, on change events) write a cluster state file to local disk.
  3. An init script could, before shutting down serf, query the member list, store it locally, and write this into the config so it is ready for the next startup.

Of these options, I think (1) is poor because it requires the operator stay on top of cluster membership and write it to every agent's config file. This seems like error-prone busywork to me.

Option (3) feels fragile to me - it seems like a hack. I feel like cluster membership persistence should be a first class feature, so I would be in favour of option (2) and having serf do this by default.
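A rough shell sketch of option (2), with hypothetical paths and snapshot format: an event handler snapshots member addresses on every membership change, and a startup wrapper replays them. The SERF variable is parameterized only so the sketch can be exercised without a running agent.

```shell
#!/bin/sh
# save_members: pipe `serf members` output in; stores one address per
# line in $STATE (the path is illustrative, not a proposed format).
# rejoin: after starting the agent, try each stored address until one
# join succeeds.
STATE=${STATE:-/var/lib/serf/members}
SERF=${SERF:-serf}

save_members() {
  awk '{ print $2 }' > "$STATE"
}

rejoin() {
  [ -s "$STATE" ] || return 0
  while read -r addr; do
    if $SERF join "$addr"; then
      return 0
    fi
  done < "$STATE"
  return 1
}
```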

It was suggested that this could be implemented as a plugin for use with the eventual plugin system. This feels like just a slightly tidier version of option (3), so I'm not keen on it.

While I'm obviously not in a position to dictate project goals, I feel like serf (and every other piece of potentially production software) should strive for:

  • Robustness in the face of broad network, system, and power failures.
  • Operational predictability - don't surprise the operator, don't have side effects.
  • Minimal configuration.

I think Riak's cluster membership model is perfect in this regard, and I think it should be a model for distributed system membership.

Opinions, anyone? :)

Thanks for reading!

Multiple separate clusters

Hey - I just stumbled across serf. Cool!

Use case: I want to maintain two clusters, let's say one cluster of web servers and another of memcache servers.

Question: Is there a way for me to make sure that the two Serf clusters don't converge into one giant cluster?

BUG: serf join after a SIGTERM does not ignore replay log

I worked with @mikespokefire in #serfdom earlier today to identify a possible bug in serf join.

We confirmed he is using Serf v0.2.1 and is not passing in -replay so the events should be ignored by default.

The issue:

If an agent catches a SIGTERM and then is restarted (manually, in this case) it replays all prior events even though it should not.

To replicate:

  1. serf A starts
  2. serf B starts and joins A
  3. send a bunch of user events
  4. confirm serf A/B see the events
  5. serf B gets SIGTERM
  6. serf B manually restarted
  7. serf B sees replay log of prior events
  8. serf C starts and joins A
  9. serf C does not see replay log of prior events

Example provided here with full output from each agent.

Thanks for your help @mikespokefire!

serf version/members structured output

I am curious your thoughts on adding JSON/YAML/... support for commands like version or members.

With the support for different agent protocols upgrading is now rather complicated. For things like Chef/Puppet/... it would be nice for them to be able to parse things like,

  • What version of the agent is installed?
  • What protocols are supported by my agent?
  • What other protocols are supported by the other members in my cluster?

So they can safely and correctly upgrade a serf cluster.

Serf uses loopback address on public IP address nodes by default

If you fail to pass the appropriate bind address via -bind, serf will use 127.0.0.1 as the node's address. This causes members in the cluster to see the node as 127.0.0.1.

A fix is to pass the public IP to serf agent via the -bind argument, however it should not use the loopback address by default.

Serf currently uses the first private IP address it can find... which happens to be 127.0.0.1.

Support mDNS for peer discovery

In addition to an explicit join, it would be interesting to support mDNS for discovering peers. This would only work in network environments supporting multicast, but it does enable completely zero-touch discovery.

Single node event triggering

I just found Serf and it's a perfect fit for a project I've been working on! I'm very excited to use it. Great work!

From the documentation, it appears that all events that occur reach every node in the cluster. That makes a lot of sense for events like node joins and departure where everyone needs to be on the same page. For custom events, I think it would be great if the event could be limited to a single node.

Consider a node failure. Every node needs to know about the failure, but it would be terrific if only a single node would be given the task of updating the DNS server, recovering tasks that the node had been working on, etc.

Would that be feasible? Or is there already way of accomplishing this as-is and I've just missed it?

Thanks!
