moby / swarmkit

A toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more.

License: Apache License 2.0

Makefile 0.31% Go 97.82% Shell 0.09% TLA 1.63% Dockerfile 0.13% HCL 0.02%

swarmkit's Introduction

The Moby Project

Moby Project logo

Moby is an open-source project created by Docker to enable and accelerate software containerization.

It provides a "Lego set" of toolkit components, the framework for assembling them into custom container-based systems, and a place for all container enthusiasts and professionals to experiment and exchange ideas. Components include container build tools, a container registry, orchestration tools, a runtime and more, and these can be used as building blocks in conjunction with other tools and projects.

Principles

Moby is an open project guided by strong principles, aiming to be modular, flexible and without too strong an opinion on user experience. It is open to the community to help set its direction.

  • Modular: the project includes lots of components that have well-defined functions and APIs that work together.
  • Batteries included but swappable: Moby includes enough components to build fully featured container systems, but its modular architecture ensures that most of the components can be swapped out for different implementations.
  • Usable security: Moby provides secure defaults without compromising usability.
  • Developer focused: The APIs are intended to be functional and useful for building powerful tools. They are not necessarily intended as end-user tools but as components aimed at developers. Documentation and UX are aimed at developers, not end users.

Audience

The Moby Project is intended for engineers, integrators and enthusiasts looking to modify, hack, fix, experiment, invent and build systems based on containers. It is not for people looking for a commercially supported system, but for people who want to work and learn with open source code.

Relationship with Docker

The components and tools in the Moby Project are initially the open source components that Docker and the community have built for the Docker Project. New projects can be added if they fit with the community goals. Docker is committed to using Moby as the upstream for the Docker Product. However, other projects are also encouraged to use Moby as an upstream, and to reuse the components in diverse ways, and all these uses will be treated in the same way. External maintainers and contributors are welcomed.

The Moby project is not intended as a location for support or feature requests for Docker products, but as a place for contributors to work on open source code, fix bugs, and make the code more useful. The releases are supported by the maintainers, community and users, on a best efforts basis only, and are not intended for customers who want enterprise or commercial support; Docker EE is the appropriate product for these use cases.


Legal

Brought to you courtesy of our legal counsel. For more context, please see the NOTICE document in this repo.

Use and transfer of Moby may be subject to certain restrictions by the United States and other governments.

It is your responsibility to ensure that your use and/or transfer does not violate applicable laws.

For more information, please see https://www.bis.doc.gov

Licensing

Moby is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.

swarmkit's People

Contributors

aaronlehmann, abhi, aboch, abronan, akihirosuda, allencloud, aluzzardi, anshulpundir, corhere, cpuguy83, crazy-max, cyli, diogomonica, dongluochen, dperny, errordeveloper, ijc, lk4d4, mavenugo, mrjana, nanxiao, nishanttotla, olljanat, runshenzhu, stevvooe, thajeztah, tonistiigi, vieux, wk8, yongtang


swarmkit's Issues

Discussion Proposal: Swarm networking design options

Swarm networking design options

This discusses two options we have on the table for how to do networking/service discovery/load balancing in swarmkit. I feel both options need to be understood, and the best option chosen based on our short-term and long-term goals for swarmkit, not based on how networking is designed today.

Option 1: Swarm manager manages networking state

In this option, the swarm manager manages the creation of networks and endpoints and allocates the required network and endpoint resources, including any driver-specific resources. This means drivers (both network drivers and IPAM drivers) are part of the manager for resource-allocation purposes.

The rough control flow looks like this:

  • User creates a network through swarmkit api/cli
  • Swarm manager invokes libnetwork to create Network
  • User creates a Job and attaches to a set of Networks through swarmkit api/cli
  • Swarm manager invokes libnetwork to create Service object for the Job which might allocate a VIP or an Anycast IP or whatever is required depending on the configuration and driver used.
  • Swarm manager generates the required number of Tasks for the Job
  • For each assigned Task manager invokes libnetwork to create an Endpoint for each Network that the Job belongs to.
  • Manager invokes libnetwork to update the Service object with the list of Endpoints that it just created in one shot.
  • Manager gathers all the required network state(both task specific and service specific) bound to Task and updates the Task state
  • The assigned agent node downloads the Task state and generates a set of imperative steps to achieve it: creating local-scope networks using the network IDs in the Task state, creating a container, and connecting the container to all the networks that the Task belongs to.

As it stands today, libnetwork is a collection of Go packages with the dependency graph rooted at the libnetwork package. The libnetwork cluster model is completely decentralized, without any master/slave hierarchy; all libnetwork instances are equal. In the swarmkit model, creation of resources happens in the manager, while the act of joining a container to a network happens in the engine.

Rough work items to achieve this:

  • Implement a libkv backend for the libnetwork instance in the manager to integrate with the raft store. Note that this can be a simple backend without any need for CAS or distributed watch/watch-tree capabilities.
  • Implement the necessary manager/agent network-handling code, which mostly amounts to invoking a set of libnetwork APIs. For how this looks, please take a look at the swarmkit-poc project.
  • If swarm and engine are going to be separate binaries, linking the whole of libnetwork into the swarm binary may add unnecessary memory footprint. In that case, libnetwork can be split into two top-level packages, one handling the creation part and the other handling the join part; the swarm manager links only against the creation part, while the engine links against both.
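A minimal sketch of the "simple backend" in the first work item: a plain get/put/delete store with no CAS or watch capabilities. All names here are illustrative, and an in-memory map stands in for writes proposed through the raft log:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Store is the minimal KV surface the manager-side libnetwork instance
// would need; deliberately no AtomicPut/watches, as noted above.
type Store interface {
	Put(key string, value []byte) error
	Get(key string) ([]byte, error)
	Delete(key string) error
}

// raftStore is a stand-in for a backend that would forward writes to the
// raft log; here it is just a mutex-guarded map.
type raftStore struct {
	mu   sync.Mutex
	data map[string][]byte
}

func newRaftStore() *raftStore { return &raftStore{data: map[string][]byte{}} }

func (s *raftStore) Put(key string, value []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value // real backend: propose through raft, wait for commit
	return nil
}

func (s *raftStore) Get(key string) ([]byte, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.data[key]
	if !ok {
		return nil, errors.New("key not found")
	}
	return v, nil
}

func (s *raftStore) Delete(key string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.data, key)
	return nil
}

func main() {
	var s Store = newRaftStore()
	s.Put("network/overlay1/subnet", []byte("10.0.0.0/24"))
	v, _ := s.Get("network/overlay1/subnet")
	fmt.Println(string(v)) // 10.0.0.0/24
}
```

The point of the sketch is how little surface this option needs: no compare-and-swap, no watch streams, just serialized writes through the single raft leader.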

Pros

  • Centralized state management which provides a simplified framework for distributed state management for the current set of functionality and future additions.
  • No back and forth messaging to synchronize state. State is written only by manager and agents merely consume state.
  • Overall reduces the latency in launching tasks

Cons

  • Needs plugin involvement in the manager
  • Requires a change in engine-api, or a private endpoint for the swarm agent to talk to the engine (depends entirely on the swarm/engine integration design)
  • libnetwork code would have to be refactored a bit if we need to reduce the memory footprint

Option 2: Engine manages all network state

In this model, the swarm manager works the same way as docker/swarm, in that networking state is completely opaque to the manager. All network state management is handled in an engine instance.

The rough control flow looks like this:

  • User creates a network through swarmkit cli/api
  • Swarm manager chooses an engine node and generates an engine request to create the Network
  • The libnetwork instance in that engine node creates the Network with the help of the local IPAM and network driver. The engine updates the network object into the raft store (running in the manager) with the help of the raft-store libkv backend
  • User creates a Job attaching to the created Network
  • Swarm manager again chooses a random engine node and generates an engine request to create a Service
  • The engine node creates the Service and gets back to the swarm-manager-owned raft store to update the Service object
  • Swarm manager generates the required number of Tasks and updates the Tasks in the raft store
  • The assigned agent node downloads the tasks and generates a set of imperative steps to CreateEndpoint and Join them to containers.
  • The libnetwork instance, with the help of the local IPAM and network drivers, carries out these imperative steps and updates the manager raft store using the libkv backend.
  • Each agent node with an assigned Task individually updates the Service object with its locally generated endpoints using the local libnetwork instance, which in turn updates the manager raft store using the libkv backend
  • The agent node updates Task status with all the endpoint information and sends it back to the manager, which updates it into the raft store

The rough work items to achieve this:

  • Enhance the raft store to become a full-fledged kv store, including support for AtomicPut and AtomicDelete and for remote watches
  • Implement a libkv backend client library for the raft store, which needs to support remote watches, manage manager failover, and possibly understand topology if any is configured
  • Implement the necessary manager code to forward and manage networking-state-related operations from the user.
  • Implement the necessary agent code to orchestrate the steps required in the engine side.
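To make the first work item concrete, here is a sketch of the compare-and-swap semantics an AtomicPut would need: each value carries a modification index that acts as the CAS token. The kvPair/index scheme follows the general libkv style, but all names here are illustrative, not swarmkit or libkv APIs:

```go
package main

import (
	"fmt"
	"sync"
)

// kvPair is a value plus the index at which it was last written;
// the index is the CAS token callers must present on update.
type kvPair struct {
	Value []byte
	Index uint64
}

type casStore struct {
	mu    sync.Mutex
	index uint64
	data  map[string]kvPair
}

func newCASStore() *casStore { return &casStore{data: map[string]kvPair{}} }

// AtomicPut succeeds only if the key is absent and previous is nil, or the
// caller's previous.Index matches the stored index. Otherwise a concurrent
// writer won and the caller must re-read and retry.
func (s *casStore) AtomicPut(key string, value []byte, previous *kvPair) (kvPair, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	cur, exists := s.data[key]
	if exists && (previous == nil || previous.Index != cur.Index) {
		return kvPair{}, false
	}
	if !exists && previous != nil {
		return kvPair{}, false
	}
	s.index++
	p := kvPair{Value: value, Index: s.index}
	s.data[key] = p
	return p, true
}

func main() {
	s := newCASStore()
	p, ok := s.AtomicPut("service/redis", []byte("v1"), nil)
	fmt.Println(ok) // true: first write succeeds
	_, ok = s.AtomicPut("service/redis", []byte("v2"), nil)
	fmt.Println(ok) // false: no token, a writer already exists
	_, ok = s.AtomicPut("service/redis", []byte("v2"), &p)
	fmt.Println(ok) // true: correct token
}
```

This is the per-key machinery; the hard part the con list alludes to is making these semantics hold across raft proposals and remote watch streams, not the local bookkeeping.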

Pros

  • Requires no engine-api change
  • No need to refactor libnetwork
  • No network plugins in manager

Cons

  • Need to make the raft store a full-fledged kv store, which is not a trivial task
  • Relies on decentralized state management, and so will incur the associated complexity for any new functionality addition such as Services
  • This architecture will lead to increased back-and-forth messaging because state synchronization is distributed
  • This architecture will lead to inherently increased latency and overhead when orchestrating tasks

CPU Resources

In docker/swarm, CPUs were represented by a number of cores. A container could define a reservation and limits by specifying how many cores it wanted.

This is the same approach taken by almost everyone.

To request 128MB of memory and half a CPU in Aurora one would specify:

resources = Resources(cpu = 0.5, ram = 128 * MB),
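Fractional CPU values like the 0.5 above are usually carried internally as fixed-point integers rather than floats; the 1e9 "nano-CPU" scale below is the one Docker's NanoCPUs field uses, though the function names here are illustrative:

```go
package main

import "fmt"

// nanoCPUs is the fixed-point scale: 1 CPU = 1e9 nano-CPUs.
const nanoCPUs = 1e9

// cpusToNano converts a fractional CPU count to nano-CPUs for the wire.
func cpusToNano(cpus float64) int64 { return int64(cpus * nanoCPUs) }

// nanoToCPUs converts back for display.
func nanoToCPUs(n int64) float64 { return float64(n) / nanoCPUs }

func main() {
	n := cpusToNano(0.5)
	fmt.Println(n)             // 500000000
	fmt.Println(nanoToCPUs(n)) // 0.5
}
```

Keeping the wire format integral avoids float equality and serialization headaches while still letting the CLI accept "0.5 CPUs".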

/cc @aaronlehmann @stevvooe @vieux @LK4D4 @dongluochen

Document Log Drivers

docker/swarm currently just proxies log calls back to the engine.

This is problematic since swarm is definitely not the best place to do log management.

Instead, we should do something like delegating that to log drivers. Perhaps we should simply have documentation around that, or perhaps we should have a default method.

Study performance impact of reflect.DeepEqual

While handy, reflect.DeepEqual can be slow since it uses reflection. This may be impactful when run thousands or tens of thousands of times in a tight loop. We do have the ability to generate Equal methods for protobuf, but may not want to take on this extra code. We should make a performance comparison of reflect.DeepEqual and gogo's generated equality methods.

Design Proposal: Volume support

Assumption:

  • Leverage existing Engine Volume plugin infra and plugins
  • For the initial implementation, the volume plugin(s) will be required to be preinstalled on all nodes in the cluster.
  • In the longer term, we can store the node --> "volume plugins installed" information in Raft, and use it during scheduling decisions

Cluster commands

  • swarmctl volume create
    • Create a new volume at the cluster-wide level
    • Options
      • Name - unique in the cluster [required]
      • Volume driver [required]
      • Driver options [optional]
    • Implementation
      • Insert into Raftstore
      • Insert into memory store
  • swarmctl volume rm
    • Remove a cluster level volume
    • Options
      • Name [required]
    • Implementation
      • Check if the volume is being used by any Tasks; return error if it is.
      • Delete from Raftstore
      • Delete from memory store
  • swarmctl volume ls
    • List volumes defined at the cluster level
    • Implementation
      • Query the in-memory store and return the formatted information
  • swarmctl volume inspect
    • Get information about a particular volume
    • Input: Name or ID [required]
    • Implementation
      • Query the in-memory store and return the formatted information

We may support swarmctl update to rename a cluster wide volume.


Cases to handle

  • Ephemeral vol --> 1 task {ScratchDir}
  • Bind mount --> 1 task {LocalDir}
  • StickyVol --> task {Remote storage volume}
    • 1 volume --> 1 task
    • 1 volume --> N tasks

Local Disk usage (does not need a cluster wide resource)

  • ScratchDir
  • LocalDir -- will not be defined at the cluster level; scheduling will fail if Dir does not exist

Remote volumes: need a cluster wide volume to be defined. Can use AWS etc

  • Support determined by drivers on Engine
  • Will support a swarmctl volume <name> --operation {clone, snapshot, …} --opts "k-v pairs"

Elastic volumes (automatically request new volumes as new tasks get spun up)

  • TODO

YAML

ScratchDir {tmp}

No cluster wide volume needs to be created.
Note: the volume is scoped to a single Task, and thus it need not be named.
Issue: the Agent will have to create and delete Task local volumes

services:
  redis:
    image: redis
    mount:
      - targetPath: /scratch          # Temp directory mounted to /scratch/
        type: tmp                          # tmp - cleared on task start/restart; always mounted rw

LocalDir

No cluster wide volume needs to be created.
Note: the volume is scoped to a single Task, and thus it need not be named.
Issue: the Agent will have to create and delete Task local volumes

(mapping to actual cadvisor docker run command)

sudo docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:latest
services:
  cadvisor:
    image: cadvisor
    mount:
      - targetPath: /rootfs    # path in container
        mask: ro               # ro or rw
        type: hostDir          # bind mount
        sourcePath: /          # path on host
      - targetPath: /var/run
        mask: rw
        type: hostDir
        sourcePath: /var/run
      - targetPath: /sys       
        mask: ro
        type: hostDir
        sourcePath: /sys
      - targetPath: /var/lib/docker
        mask: ro
        type: hostDir
        sourcePath: /var/lib/docker
# port mapping not shown

Remote storage volume

Cluster wide volume equivalent to the below will be created:
swarmctl volume create RedisData --driver glusterfs --opts "k1=v1" --opts "k2=v2"

services:
  redis:
    image: redis
    mount:
      name: RedisData          # logical name, points to volume below
      targetPath: /data        # path mounted - MyClusterVolume:/glustervol/ to /data/
      mask: rw                 # ro or rw
      perTaskFolder: 1         # 0/1 - 0 => all tasks share same volume folder
                               #       1 => each task gets a private folder inside volume (taskid(?) used to create a folder)
      sourcePath: /            # path in the glusterfs volume

volume:
  name: RedisData              # Cluster volume name
  driver: glusterfs
    opts:
      k1: v1
      k2: v2

Workflow

At the Manager, volume mount information is stored with each Task. When a task is scheduled on a Node, this volume mounting information is communicated to the Node.

Local Disk Usage

  1. User does not need to create a cluster wide volume
  2. User runs a Job that references a local disk volume
  3. Node uses the volume mounting information to mount the local host volume or to create a scratch dir

Remote volume

  1. User creates a cluster wide volume
    • For Dockercon, all nodes joining the cluster will need to contain all volume plugins (this will need to be fixed quickly afterwards)
    • This volume information will be stored at the Manager
  2. User runs a Job that references the cluster wide volume
  3. When Task is assigned to a Node
  4. Task information (containing volume mount info) is fetched by the Agent
  5. Agent configures the Engine to mount the remote volume when running the container
    • If perTaskFolder == 1, then the Agent creates a separate folder inside the volume for the task (perhaps named TaskID so it is portable across machines?)

When the task is rescheduled on a different node, step 3 through step 5 are repeated.
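The perTaskFolder behavior described above can be sketched as a small helper that picks the mount source for a task; the function and parameter names are illustrative, not swarmkit APIs:

```go
package main

import (
	"fmt"
	"path"
)

// mountSource returns the directory inside the remote volume to mount for a
// task. With perTaskFolder set, each task gets a TaskID-named subfolder, so
// the mapping stays stable when the task is rescheduled on another node.
func mountSource(volumeRoot, taskID string, perTaskFolder bool) string {
	if perTaskFolder {
		return path.Join(volumeRoot, taskID)
	}
	return volumeRoot // all tasks share the volume root
}

func main() {
	fmt.Println(mountSource("/glustervol", "task-abc123", true))  // /glustervol/task-abc123
	fmt.Println(mountSource("/glustervol", "task-abc123", false)) // /glustervol
}
```

Using the TaskID (rather than a node-local name) is what makes steps 3 through 5 repeatable on a different node without losing the task's data.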
CC: @aluzzardi @stevvooe @vieux

Sometimes raft deadlocks

We saw it on CI: during 3-machine cluster creation, sometimes nodes can't decide who is leader and hang at term 2 forever:

    node 1: term 2, leader 1
    node 2: term 2, leader 1
    node 3: term 2, leader 3

It appears on CI pretty often, I think because of limited CPU time; I saw it on my machine as well when using a smaller raft tick.
I dunno, maybe it will be fixed by #131

Limit transaction batch sizes

All components that write to the store should have a limit on the number of updates that happen within a single transaction. This is both to avoid hitting the limit on serialized raft message size (presently 1.5 MB), and to avoid tying up the store for a long period of time.

To make the size of serialized updates predictable, there should be a limit on the size of client-submitted objects, imposed by the API. We would need to leave enough wiggle room that even after fields are filled in on the server side, these objects stay below a predictable size threshold.
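The batching idea can be sketched as follows; batchSize and the helper names are illustrative, and in swarmkit each slice would be committed in its own store transaction / raft proposal:

```go
package main

import "fmt"

// batchSize caps how many updates go into one transaction (illustrative value).
const batchSize = 100

type update struct{ id int }

// commitBatch stands in for one bounded store transaction.
func commitBatch(batch []update) {
	fmt.Printf("committing %d updates\n", len(batch))
}

// applyAll splits the updates into bounded batches and returns the batch
// sizes so callers can see how the work was divided.
func applyAll(updates []update) []int {
	var sizes []int
	for start := 0; start < len(updates); start += batchSize {
		end := start + batchSize
		if end > len(updates) {
			end = len(updates)
		}
		commitBatch(updates[start:end])
		sizes = append(sizes, end-start)
	}
	return sizes
}

func main() {
	sizes := applyAll(make([]update, 250))
	fmt.Println(sizes) // [100 100 50]
}
```

Splitting this way keeps every serialized raft message well under the 1.5 MB cap and releases the store lock between batches, at the cost of losing atomicity across the whole set, which callers have to tolerate.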

raft: Stuck election round and cluster left without leader

on master

Reproduce steps

  • Start a 3-node cluster
  • Restart a follower
  • Stop the leader

What happens

The cluster is stuck in an election round involving the past leader: node 2 and node 3 exchange votes indefinitely. Recovering the leader ends the election loop and elects a new node as the leader.

What should happen instead

The two remaining nodes should elect a new leader without the crashed leader involved (or at least ignore its vote and go on with the election, since a majority is available). Also, it seems that one of the nodes keeps receiving MsgVotes from the old leader.

logs node 1 (node recovery ending the stuck election loop at term 13):

$ sudo swarmd manager init
raft2016/04/05 10:10:30 INFO: 77e98f5f46dec8c8 became follower at term 2
raft2016/04/05 10:10:30 INFO: newRaft 77e98f5f46dec8c8 [peers: [], term: 2, commit: 4, applied: 0, lastindex: 4, lastterm: 2]
INFO[0000] Listening for connections                     addr=[::]:4242 proto=tcp
raft2016/04/05 10:10:30 INFO: 77e98f5f46dec8c8 became follower at term 2
raft2016/04/05 10:10:30 INFO: newRaft 77e98f5f46dec8c8 [peers: [], term: 2, commit: 4, applied: 0, lastindex: 4, lastterm: 2]
raft2016/04/05 10:10:33 INFO: 77e98f5f46dec8c8 [term: 2] received a MsgVote message with higher term from 2a1a85c1d8f4b39b [term: 13]
raft2016/04/05 10:10:33 INFO: 77e98f5f46dec8c8 became follower at term 13
raft2016/04/05 10:10:33 INFO: 77e98f5f46dec8c8 [logterm: 2, index: 4, vote: 0] voted for 2a1a85c1d8f4b39b [logterm: 2, index: 4] at term 13
raft2016/04/05 10:10:33 INFO: raft.node: 77e98f5f46dec8c8 elected leader 2a1a85c1d8f4b39b at term 13

logs node 2 (elected leader at term 13):

$ sudo swarmd manager join --join-cluster "0.0.0.0:4242" --state-dir "/var/lib/docker2/cluster" --listen-addr "0.0.0.0:4243"
INFO[0000] Listening for connections                     addr=[::]:4243 proto=tcp
2016/04/05 10:09:04 Failed to dial 0.0.0.0:4242: grpc: the client connection is closing; please retry.
raft2016/04/05 10:09:04 INFO: 2a1a85c1d8f4b39b became follower at term 0
raft2016/04/05 10:09:04 INFO: newRaft 2a1a85c1d8f4b39b [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
raft2016/04/05 10:09:04 INFO: 2a1a85c1d8f4b39b became follower at term 1
raft2016/04/05 10:09:05 INFO: 2a1a85c1d8f4b39b [term: 1] received a MsgHeartbeat message with higher term from 77e98f5f46dec8c8 [term: 2]
raft2016/04/05 10:09:05 INFO: 2a1a85c1d8f4b39b became follower at term 2
raft2016/04/05 10:09:05 INFO: raft.node: 2a1a85c1d8f4b39b elected leader 77e98f5f46dec8c8 at term 2
2016/04/05 10:09:16 transport: http2Client.notifyError got notified that the client transport was broken EOF.
2016/04/05 10:09:17 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4244: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4244"
2016/04/05 10:09:19 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4244: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4244"
2016/04/05 10:09:22 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4244: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4244"
2016/04/05 10:09:25 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4244: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4244"
2016/04/05 10:09:32 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4244: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4244"
2016/04/05 10:09:39 transport: http2Client.notifyError got notified that the client transport was broken EOF.
2016/04/05 10:09:40 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:09:42 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
raft2016/04/05 10:09:44 INFO: 2a1a85c1d8f4b39b is starting a new election at term 2
raft2016/04/05 10:09:44 INFO: 2a1a85c1d8f4b39b became candidate at term 3
raft2016/04/05 10:09:44 INFO: 2a1a85c1d8f4b39b received vote from 2a1a85c1d8f4b39b at term 3
raft2016/04/05 10:09:44 INFO: 2a1a85c1d8f4b39b [logterm: 2, index: 4] sent vote request to 77e98f5f46dec8c8 at term 3
raft2016/04/05 10:09:44 INFO: 2a1a85c1d8f4b39b [logterm: 2, index: 4] sent vote request to 30fd4a114dca23df at term 3
raft2016/04/05 10:09:44 INFO: raft.node: 2a1a85c1d8f4b39b lost leader 77e98f5f46dec8c8 at term 3
2016/04/05 10:09:45 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:09:45 grpc: Conn.transportMonitor exits due to: grpc: the client connection is closing
2016/04/05 10:09:45 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:09:47 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
raft2016/04/05 10:09:48 INFO: 2a1a85c1d8f4b39b is starting a new election at term 3
raft2016/04/05 10:09:48 INFO: 2a1a85c1d8f4b39b became candidate at term 4
raft2016/04/05 10:09:48 INFO: 2a1a85c1d8f4b39b received vote from 2a1a85c1d8f4b39b at term 4
raft2016/04/05 10:09:48 INFO: 2a1a85c1d8f4b39b [logterm: 2, index: 4] sent vote request to 77e98f5f46dec8c8 at term 4
raft2016/04/05 10:09:48 INFO: 2a1a85c1d8f4b39b [logterm: 2, index: 4] sent vote request to 30fd4a114dca23df at term 4
2016/04/05 10:09:49 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:09:50 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:09:50 Failed to dial 0.0.0.0:4242: grpc: the client connection is closing; please retry.
2016/04/05 10:09:51 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
raft2016/04/05 10:09:52 INFO: 2a1a85c1d8f4b39b is starting a new election at term 4
raft2016/04/05 10:09:52 INFO: 2a1a85c1d8f4b39b became candidate at term 5
raft2016/04/05 10:09:52 INFO: 2a1a85c1d8f4b39b received vote from 2a1a85c1d8f4b39b at term 5
raft2016/04/05 10:09:52 INFO: 2a1a85c1d8f4b39b [logterm: 2, index: 4] sent vote request to 30fd4a114dca23df at term 5
raft2016/04/05 10:09:52 INFO: 2a1a85c1d8f4b39b [logterm: 2, index: 4] sent vote request to 77e98f5f46dec8c8 at term 5
2016/04/05 10:09:53 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:09:54 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:09:54 Failed to dial 0.0.0.0:4242: grpc: the client connection is closing; please retry.
2016/04/05 10:09:55 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:09:58 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
raft2016/04/05 10:09:58 INFO: 2a1a85c1d8f4b39b is starting a new election at term 5
raft2016/04/05 10:09:58 INFO: 2a1a85c1d8f4b39b became candidate at term 6
raft2016/04/05 10:09:58 INFO: 2a1a85c1d8f4b39b received vote from 2a1a85c1d8f4b39b at term 6
raft2016/04/05 10:09:58 INFO: 2a1a85c1d8f4b39b [logterm: 2, index: 4] sent vote request to 77e98f5f46dec8c8 at term 6
raft2016/04/05 10:09:58 INFO: 2a1a85c1d8f4b39b [logterm: 2, index: 4] sent vote request to 30fd4a114dca23df at term 6
2016/04/05 10:09:59 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:10:01 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:10:02 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:10:02 Failed to dial 0.0.0.0:4242: grpc: the client connection is closing; please retry.
raft2016/04/05 10:10:03 INFO: 2a1a85c1d8f4b39b is starting a new election at term 6
raft2016/04/05 10:10:03 INFO: 2a1a85c1d8f4b39b became candidate at term 7
raft2016/04/05 10:10:03 INFO: 2a1a85c1d8f4b39b received vote from 2a1a85c1d8f4b39b at term 7
raft2016/04/05 10:10:03 INFO: 2a1a85c1d8f4b39b [logterm: 2, index: 4] sent vote request to 77e98f5f46dec8c8 at term 7
raft2016/04/05 10:10:03 INFO: 2a1a85c1d8f4b39b [logterm: 2, index: 4] sent vote request to 30fd4a114dca23df at term 7
2016/04/05 10:10:04 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:10:04 Failed to dial 0.0.0.0:4242: grpc: the client connection is closing; please retry.
2016/04/05 10:10:04 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:10:06 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
raft2016/04/05 10:10:07 INFO: 2a1a85c1d8f4b39b is starting a new election at term 7
raft2016/04/05 10:10:07 INFO: 2a1a85c1d8f4b39b became candidate at term 8
raft2016/04/05 10:10:07 INFO: 2a1a85c1d8f4b39b received vote from 2a1a85c1d8f4b39b at term 8
raft2016/04/05 10:10:07 INFO: 2a1a85c1d8f4b39b [logterm: 2, index: 4] sent vote request to 77e98f5f46dec8c8 at term 8
raft2016/04/05 10:10:07 INFO: 2a1a85c1d8f4b39b [logterm: 2, index: 4] sent vote request to 30fd4a114dca23df at term 8
2016/04/05 10:10:08 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:10:08 Failed to dial 0.0.0.0:4242: grpc: the client connection is closing; please retry.
2016/04/05 10:10:08 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:10:10 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:10:12 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
raft2016/04/05 10:10:12 INFO: 2a1a85c1d8f4b39b is starting a new election at term 8
raft2016/04/05 10:10:12 INFO: 2a1a85c1d8f4b39b became candidate at term 9
raft2016/04/05 10:10:12 INFO: 2a1a85c1d8f4b39b received vote from 2a1a85c1d8f4b39b at term 9
raft2016/04/05 10:10:12 INFO: 2a1a85c1d8f4b39b [logterm: 2, index: 4] sent vote request to 30fd4a114dca23df at term 9
raft2016/04/05 10:10:12 INFO: 2a1a85c1d8f4b39b [logterm: 2, index: 4] sent vote request to 77e98f5f46dec8c8 at term 9
2016/04/05 10:10:13 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:10:15 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:10:16 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:10:16 Failed to dial 0.0.0.0:4242: grpc: the client connection is closing; please retry.
2016/04/05 10:10:18 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
raft2016/04/05 10:10:18 INFO: 2a1a85c1d8f4b39b is starting a new election at term 9
raft2016/04/05 10:10:18 INFO: 2a1a85c1d8f4b39b became candidate at term 10
raft2016/04/05 10:10:18 INFO: 2a1a85c1d8f4b39b received vote from 2a1a85c1d8f4b39b at term 10
raft2016/04/05 10:10:18 INFO: 2a1a85c1d8f4b39b [logterm: 2, index: 4] sent vote request to 77e98f5f46dec8c8 at term 10
raft2016/04/05 10:10:18 INFO: 2a1a85c1d8f4b39b [logterm: 2, index: 4] sent vote request to 30fd4a114dca23df at term 10
2016/04/05 10:10:19 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:10:21 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:10:23 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:10:23 Failed to dial 0.0.0.0:4242: grpc: the client connection is closing; please retry.
raft2016/04/05 10:10:23 INFO: 2a1a85c1d8f4b39b is starting a new election at term 10
raft2016/04/05 10:10:23 INFO: 2a1a85c1d8f4b39b became candidate at term 11
raft2016/04/05 10:10:23 INFO: 2a1a85c1d8f4b39b received vote from 2a1a85c1d8f4b39b at term 11
raft2016/04/05 10:10:23 INFO: 2a1a85c1d8f4b39b [logterm: 2, index: 4] sent vote request to 77e98f5f46dec8c8 at term 11
raft2016/04/05 10:10:23 INFO: 2a1a85c1d8f4b39b [logterm: 2, index: 4] sent vote request to 30fd4a114dca23df at term 11
2016/04/05 10:10:24 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:10:24 Failed to dial 0.0.0.0:4242: grpc: the client connection is closing; please retry.
2016/04/05 10:10:24 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:10:26 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
raft2016/04/05 10:10:27 INFO: 2a1a85c1d8f4b39b is starting a new election at term 11
raft2016/04/05 10:10:27 INFO: 2a1a85c1d8f4b39b became candidate at term 12
raft2016/04/05 10:10:27 INFO: 2a1a85c1d8f4b39b received vote from 2a1a85c1d8f4b39b at term 12
raft2016/04/05 10:10:27 INFO: 2a1a85c1d8f4b39b [logterm: 2, index: 4] sent vote request to 30fd4a114dca23df at term 12
raft2016/04/05 10:10:27 INFO: 2a1a85c1d8f4b39b [logterm: 2, index: 4] sent vote request to 77e98f5f46dec8c8 at term 12
2016/04/05 10:10:28 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:10:28 Failed to dial 0.0.0.0:4242: grpc: the client connection is closing; please retry.
2016/04/05 10:10:28 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:10:30 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:10:32 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
raft2016/04/05 10:10:33 INFO: 2a1a85c1d8f4b39b is starting a new election at term 12
raft2016/04/05 10:10:33 INFO: 2a1a85c1d8f4b39b became candidate at term 13
raft2016/04/05 10:10:33 INFO: 2a1a85c1d8f4b39b received vote from 2a1a85c1d8f4b39b at term 13
raft2016/04/05 10:10:33 INFO: 2a1a85c1d8f4b39b [logterm: 2, index: 4] sent vote request to 30fd4a114dca23df at term 13
raft2016/04/05 10:10:33 INFO: 2a1a85c1d8f4b39b [logterm: 2, index: 4] sent vote request to 77e98f5f46dec8c8 at term 13
raft2016/04/05 10:10:33 INFO: 2a1a85c1d8f4b39b received vote from 77e98f5f46dec8c8 at term 13
raft2016/04/05 10:10:33 INFO: 2a1a85c1d8f4b39b [quorum:2] has received 2 votes and 0 vote rejections
raft2016/04/05 10:10:33 INFO: 2a1a85c1d8f4b39b became leader at term 13
raft2016/04/05 10:10:33 INFO: raft.node: 2a1a85c1d8f4b39b elected leader 2a1a85c1d8f4b39b at term 13
2016/04/05 10:12:16 transport: http2Client.notifyError got notified that the client transport was broken EOF.
2016/04/05 10:12:17 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:12:17 grpc: Conn.transportMonitor exits due to: grpc: the client connection is closing
2016/04/05 10:12:17 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:12:19 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"
2016/04/05 10:12:19 Failed to dial 0.0.0.0:4242: grpc: the client connection is closing; please retry.
2016/04/05 10:12:19 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4242: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4242"

logs node 3:

$ sudo swarmd manager join --join-cluster "0.0.0.0:4242" --state-dir "/var/lib/docker3/cluster" --listen-addr "0.0.0.0:4244"
raft2016/04/05 10:09:30 INFO: 30fd4a114dca23df became follower at term 2
raft2016/04/05 10:09:30 INFO: newRaft 30fd4a114dca23df [peers: [], term: 2, commit: 4, applied: 0, lastindex: 4, lastterm: 2]
INFO[0000] Listening for connections                     addr=[::]:4244 proto=tcp
ERRO[0000] can't join cluster because cluster state already exists 
raft2016/04/05 10:09:31 INFO: raft.node: 30fd4a114dca23df elected leader 77e98f5f46dec8c8 at term 2
raft2016/04/05 10:09:44 INFO: 30fd4a114dca23df [term: 2] received a MsgVote message with higher term from 2a1a85c1d8f4b39b [term: 3]
raft2016/04/05 10:09:44 INFO: 30fd4a114dca23df became follower at term 3
raft2016/04/05 10:09:44 INFO: 30fd4a114dca23df [logterm: 2, index: 4, vote: 0] voted for 2a1a85c1d8f4b39b [logterm: 2, index: 4] at term 3
raft2016/04/05 10:09:44 INFO: raft.node: 30fd4a114dca23df lost leader 77e98f5f46dec8c8 at term 3
raft2016/04/05 10:09:48 INFO: 30fd4a114dca23df [term: 3] received a MsgVote message with higher term from 2a1a85c1d8f4b39b [term: 4]
raft2016/04/05 10:09:48 INFO: 30fd4a114dca23df became follower at term 4
raft2016/04/05 10:09:48 INFO: 30fd4a114dca23df [logterm: 2, index: 4, vote: 0] voted for 2a1a85c1d8f4b39b [logterm: 2, index: 4] at term 4
raft2016/04/05 10:09:52 INFO: 30fd4a114dca23df [term: 4] received a MsgVote message with higher term from 2a1a85c1d8f4b39b [term: 5]
raft2016/04/05 10:09:52 INFO: 30fd4a114dca23df became follower at term 5
raft2016/04/05 10:09:52 INFO: 30fd4a114dca23df [logterm: 2, index: 4, vote: 0] voted for 2a1a85c1d8f4b39b [logterm: 2, index: 4] at term 5
raft2016/04/05 10:09:58 INFO: 30fd4a114dca23df [term: 5] received a MsgVote message with higher term from 2a1a85c1d8f4b39b [term: 6]
raft2016/04/05 10:09:58 INFO: 30fd4a114dca23df became follower at term 6
raft2016/04/05 10:09:58 INFO: 30fd4a114dca23df [logterm: 2, index: 4, vote: 0] voted for 2a1a85c1d8f4b39b [logterm: 2, index: 4] at term 6
raft2016/04/05 10:10:03 INFO: 30fd4a114dca23df [term: 6] received a MsgVote message with higher term from 2a1a85c1d8f4b39b [term: 7]
raft2016/04/05 10:10:03 INFO: 30fd4a114dca23df became follower at term 7
raft2016/04/05 10:10:03 INFO: 30fd4a114dca23df [logterm: 2, index: 4, vote: 0] voted for 2a1a85c1d8f4b39b [logterm: 2, index: 4] at term 7
raft2016/04/05 10:10:07 INFO: 30fd4a114dca23df [term: 7] received a MsgVote message with higher term from 2a1a85c1d8f4b39b [term: 8]
raft2016/04/05 10:10:07 INFO: 30fd4a114dca23df became follower at term 8
raft2016/04/05 10:10:07 INFO: 30fd4a114dca23df [logterm: 2, index: 4, vote: 0] voted for 2a1a85c1d8f4b39b [logterm: 2, index: 4] at term 8
raft2016/04/05 10:10:12 INFO: 30fd4a114dca23df [term: 8] received a MsgVote message with higher term from 2a1a85c1d8f4b39b [term: 9]
raft2016/04/05 10:10:12 INFO: 30fd4a114dca23df became follower at term 9
raft2016/04/05 10:10:12 INFO: 30fd4a114dca23df [logterm: 2, index: 4, vote: 0] voted for 2a1a85c1d8f4b39b [logterm: 2, index: 4] at term 9
raft2016/04/05 10:10:18 INFO: 30fd4a114dca23df [term: 9] received a MsgVote message with higher term from 2a1a85c1d8f4b39b [term: 10]
raft2016/04/05 10:10:18 INFO: 30fd4a114dca23df became follower at term 10
raft2016/04/05 10:10:18 INFO: 30fd4a114dca23df [logterm: 2, index: 4, vote: 0] voted for 2a1a85c1d8f4b39b [logterm: 2, index: 4] at term 10
raft2016/04/05 10:10:23 INFO: 30fd4a114dca23df [term: 10] received a MsgVote message with higher term from 2a1a85c1d8f4b39b [term: 11]
raft2016/04/05 10:10:23 INFO: 30fd4a114dca23df became follower at term 11
raft2016/04/05 10:10:23 INFO: 30fd4a114dca23df [logterm: 2, index: 4, vote: 0] voted for 2a1a85c1d8f4b39b [logterm: 2, index: 4] at term 11
raft2016/04/05 10:10:27 INFO: 30fd4a114dca23df [term: 11] received a MsgVote message with higher term from 2a1a85c1d8f4b39b [term: 12]
raft2016/04/05 10:10:27 INFO: 30fd4a114dca23df became follower at term 12
raft2016/04/05 10:10:27 INFO: 30fd4a114dca23df [logterm: 2, index: 4, vote: 0] voted for 2a1a85c1d8f4b39b [logterm: 2, index: 4] at term 12
raft2016/04/05 10:10:33 INFO: 30fd4a114dca23df [term: 12] received a MsgVote message with higher term from 2a1a85c1d8f4b39b [term: 13]
raft2016/04/05 10:10:33 INFO: 30fd4a114dca23df became follower at term 13
raft2016/04/05 10:10:33 INFO: 30fd4a114dca23df [logterm: 2, index: 4, vote: 0] voted for 2a1a85c1d8f4b39b [logterm: 2, index: 4] at term 13
raft2016/04/05 10:10:33 INFO: raft.node: 30fd4a114dca23df elected leader 2a1a85c1d8f4b39b at term 13
raft2016/04/05 10:12:33 INFO: 30fd4a114dca23df [term: 13] received a MsgVote message with higher term from 77e98f5f46dec8c8 [term: 14]
raft2016/04/05 10:12:33 INFO: 30fd4a114dca23df became follower at term 14
raft2016/04/05 10:12:33 INFO: 30fd4a114dca23df [logterm: 13, index: 5, vote: 0] voted for 77e98f5f46dec8c8 [logterm: 13, index: 5] at term 14
raft2016/04/05 10:12:33 INFO: raft.node: 30fd4a114dca23df lost leader 2a1a85c1d8f4b39b at term 14
raft2016/04/05 10:12:37 INFO: 30fd4a114dca23df [term: 14] received a MsgVote message with higher term from 77e98f5f46dec8c8 [term: 15]
raft2016/04/05 10:12:37 INFO: 30fd4a114dca23df became follower at term 15
raft2016/04/05 10:12:37 INFO: 30fd4a114dca23df [logterm: 13, index: 5, vote: 0] voted for 77e98f5f46dec8c8 [logterm: 13, index: 5] at term 15
raft2016/04/05 10:12:42 INFO: 30fd4a114dca23df [term: 15] received a MsgVote message with higher term from 77e98f5f46dec8c8 [term: 16]
raft2016/04/05 10:12:42 INFO: 30fd4a114dca23df became follower at term 16
raft2016/04/05 10:12:42 INFO: 30fd4a114dca23df [logterm: 13, index: 5, vote: 0] voted for 77e98f5f46dec8c8 [logterm: 13, index: 5] at term 16
raft2016/04/05 10:12:47 INFO: 30fd4a114dca23df [term: 16] received a MsgVote message with higher term from 77e98f5f46dec8c8 [term: 17]
raft2016/04/05 10:12:47 INFO: 30fd4a114dca23df became follower at term 17
raft2016/04/05 10:12:47 INFO: 30fd4a114dca23df [logterm: 13, index: 5, vote: 0] voted for 77e98f5f46dec8c8 [logterm: 13, index: 5] at term 17
raft2016/04/05 10:12:51 INFO: 30fd4a114dca23df [term: 17] received a MsgVote message with higher term from 77e98f5f46dec8c8 [term: 18]
raft2016/04/05 10:12:51 INFO: 30fd4a114dca23df became follower at term 18
raft2016/04/05 10:12:51 INFO: 30fd4a114dca23df [logterm: 13, index: 5, vote: 0] voted for 77e98f5f46dec8c8 [logterm: 13, index: 5] at term 18

/cc @aaronlehmann

Refactor schema to leverage compositional inheritance

The current schema has worked well on paper, but union-style additions to the record types have proved complicated in practice. We'd like to refactor this to decouple the generic Job/Task chain from the more specific ServiceJob, CronJob, BatchJob, etc.

The new model would look roughly as follows:

message ServiceJobSpec {
    Meta meta = 1;

    // Template defines the base configuration for tasks created for this job.
    TaskSpec template = 2;

    // Instances specifies the number of instances of the service job that
    // should be running.
    int64 instances = 3;
}

message ServiceJob {
    string id = 1;
    ServiceJobSpec spec = 2;

    int64 instances = 3;

    repeated Slot slots = 4;
}

Effectively, we get rid of Job and JobSpec and specialize per instance. We still have an interface class, Job, which can hold our job types, but they share very little.

This has a number of advantages. First, it ensures sharing is based on what is actually needed, rather than on what was convenient to write. We also remove the complex oneof types for the different kinds of jobs, and it reduces coupling between the job types.
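To give a feel for the "interface class" idea on the Go side, here is a minimal sketch: a narrow Job interface over fully specialized concrete types that share very little. The type and field names here are illustrative, not the actual swarmkit API.

```go
package main

import "fmt"

// Job is a minimal shared interface over the specialized job types.
// Only identity and a human-readable kind are shared; everything else
// lives on the concrete types.
type Job interface {
	ID() string
	Kind() string
}

// ServiceJob is a long-running job with a desired instance count.
type ServiceJob struct {
	JobID     string
	Instances int64
}

func (j *ServiceJob) ID() string   { return j.JobID }
func (j *ServiceJob) Kind() string { return "service" }

// BatchJob runs to completion a fixed number of times.
type BatchJob struct {
	JobID       string
	Completions int64
}

func (j *BatchJob) ID() string   { return j.JobID }
func (j *BatchJob) Kind() string { return "batch" }

func main() {
	// A heterogeneous collection still works through the interface.
	jobs := []Job{
		&ServiceJob{JobID: "job0", Instances: 4},
		&BatchJob{JobID: "job1", Completions: 10},
	}
	for _, j := range jobs {
		fmt.Printf("%s: %s\n", j.ID(), j.Kind())
	}
}
```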

Attach

Attaching doesn't feel like a cluster management operation. Additionally, since agents connect to managers rather than the other way around, it wouldn't work seamlessly.

What if instead we provided an ssh service out of band from the cluster API to allow attach (debugging) operations?

It could either be:

  • Distributed, on the engine itself or in the agent. Attaching would then amount to sshing into the node's IP at our SSH port
  • Centralized, on the manager. The manager would then need to communicate with the node in one way or another
  • Both: it's distributed, but ssh'ing into the manager forwards the connection to the node's ssh endpoint

Standardize on Remove over Delete

Historically, docker has used rm or remove for the user-requested removal of resources. I would argue that remove is correct over delete in most cases, since actual deletion may be deferred. Furthermore, this will make the implementation of garbage collection much clearer: the user requests removal, but the GC deletes.

We should do this on an API level, as well as CLI and source documentation.

We can look at removal as the following flow:

  • remove - marked for removal, no longer accessible. Removal should be validated against other resources that may use it.
  • collect - ensure that any resources owned by the removed resource are also removed, progressing toward deletion (non-existence).
  • delete - perform the actual removal of the resource representation (remove the job from the backing store).

Put another way, we always go through remove->collect->delete. collect blocks until all owned resources are deleted. delete is equivalent to "does not exist".
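The flow above can be sketched as a small state machine. This is an illustrative model, not swarmkit code; the Phase values and Resource type are assumptions for the example.

```go
package main

import "fmt"

// Phase models the removal flow: remove -> collect -> delete.
type Phase int

const (
	Active  Phase = iota
	Removed       // marked for removal, no longer user-accessible
	Deleted       // representation gone from the backing store
)

type Resource struct {
	Name  string
	Phase Phase
	Owned []*Resource
}

// Remove marks the resource for removal; actual deletion is deferred.
func (r *Resource) Remove() {
	if r.Phase == Active {
		r.Phase = Removed
	}
}

// Collect drives a removed resource to Deleted, first removing and
// collecting everything it owns (collect blocks on owned deletes).
func Collect(r *Resource) {
	if r.Phase != Removed {
		return
	}
	for _, o := range r.Owned {
		o.Remove()
		Collect(o)
	}
	r.Phase = Deleted
}

func main() {
	task := &Resource{Name: "task0"}
	job := &Resource{Name: "job0", Owned: []*Resource{task}}
	job.Remove() // user-requested removal
	Collect(job) // GC performs the actual delete
	fmt.Println(job.Phase == Deleted, task.Phase == Deleted)
}
```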
@vieux @al

Conditions for closing this:

  • Achieve consensus on distinction between "remove" and "delete"
  • Document design distinction between deletion and removal
    • Consider how much of this distinction is exposed to users
    • Ensure we are clear where one or the other is used.
  • Decide whether or not we want to hook this into object state

Test failure on spec

Not sure if this is environmental or already taken care of, but it can be reproduced locally (and was spotted on the CI as well):

--- FAIL: TestParse (0.00s)
    Error Trace:    spec_test.go:69
    Error:          Not equal: "name1" (expected)
                    != "name2" (actual)

    Error Trace:    spec_test.go:70
    Error:          Not equal: 1 (expected)
                    != 2 (actual)

    Error Trace:    spec_test.go:71
    Error:          Not equal: "image1" (expected)
                    != "image2" (actual)

    Error Trace:    spec_test.go:69
    Error:          Not equal: "name2" (expected)
                    != "name1" (actual)

    Error Trace:    spec_test.go:70
    Error:          Not equal: 2 (expected)
                    != 1 (actual)

    Error Trace:    spec_test.go:71
    Error:          Not equal: "image2" (expected)
                    != "image1" (actual)

/cc @aluzzardi @stevvooe

Race condition in raft_test

Looks like some internal values are written with atomics in some places but protected only by channel ops in others:

WARNING: DATA RACE
Read by goroutine 1038:
  github.com/docker/swarm-v2/manager/state.(*Node).Start.func1()
      /Users/sday/go/src/github.com/docker/swarm-v2/manager/state/raft.go:371 +0x7e5

Previous write by goroutine 54:
  sync/atomic.StoreInt64()
      /usr/local/go/src/runtime/race_amd64.s:220 +0xb
  github.com/docker/swarm-v2/manager/state.TestRaftSnapshot()
      /Users/sday/go/src/github.com/docker/swarm-v2/manager/state/raft_test.go:550 +0x183
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:473 +0xdc

Goroutine 1038 (running) created at:
  github.com/docker/swarm-v2/manager/state.(*Node).Start()
      /Users/sday/go/src/github.com/docker/swarm-v2/manager/state/raft.go:408 +0x1bc
  github.com/docker/swarm-v2/manager/state.newInitNode()
      /Users/sday/go/src/github.com/docker/swarm-v2/manager/state/raft_test.go:130 +0xbd3
  github.com/docker/swarm-v2/manager/state.newRaftCluster()
      /Users/sday/go/src/github.com/docker/swarm-v2/manager/state/raft_test.go:198 +0x79
  github.com/docker/swarm-v2/manager/state.TestRaftSnapshot()
      /Users/sday/go/src/github.com/docker/swarm-v2/manager/state/raft_test.go:544 +0x7f
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:473 +0xdc

Goroutine 54 (running) created at:
  testing.RunTests()
      /usr/local/go/src/testing/testing.go:582 +0xae2
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:515 +0x11d
  main.main()
      github.com/docker/swarm-v2/manager/state/_test/_testmain.go:108 +0x210

cc @aaronlehmann @abronan

GRPC backoff too aggressive for our purposes

The backoff implementation ends up with too much time between retries after several failures, making recovery slow once managers come back. We need to submit a patch to gRPC that allows the backoff to be configured.

Naming and Namespace in the cluster

Requirements

Below is a list of possible requirements for namespaces. Those that are checked off have been fully accepted. Unchecked requirements are still under consideration.

  • All system resources belong to a namespace
    • removing a namespace removes all member resources
  • DNS compatible naming across the board.
  • Applications can use namespace for portability. For example, run two web services in different namespaces.
  • Resources should be able to reference each other across namespaces.
    • ACL hooks can intercept references before they are connected
    • A simple system for resolving cross-namespace references to support portability.

Considerations

We'll need to evaluate this proposal with input from other teams. The following sign-offs are proposed before proceeding:

  • Works well with compose plans (cc @aanand)
  • Works well with libnetwork (cc @mrjana)

Resources

While we have opened up the discussion of namespaces, we need to discuss
naming in general. To give a feel for the result, the following description
pretends that namespaces already exist. To open this discussion up, we
must understand the kinds of resources:

  • Cluster: A cluster is controlled by a set of managers. Most resources
    will be scoped in a cluster. Under the current model, we define this as a quorum set.
  • Namespace: A cluster is divided into several namespaces.
  • Node: A node resides in a cluster. From a user perspective, there isn't
    much access other than reporting their existence. We may want to route a
    user to a node for certain requests. We may want to hook the node into the
    DNS system.
  • Job: A job belongs to a namespace within the cluster. A job may have
    multiple tasks. The job itself may have a service endpoint associated with
    it, accessible over DNS, such as with a service job.
  • Task: A task belongs to a job and a node, when assigned.
  • Network: A network belongs to a namespace.
  • Volume: A volume belongs to a namespace.

Rules about Naming

All resources in the cluster system use the same naming conventions.

All names should be valid DNS subdomains, compliant with
RFC 1035. This allows any resource to
be expressed over DNS. It also ensures that we have a well-known, restricted
and reliable character space, compatible with existing tools.

For reference, names must comply with the following grammar:

<domain> ::= <subdomain> | " "

<subdomain> ::= <label> | <subdomain> "." <label>

<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]

<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>

<let-dig-hyp> ::= <let-dig> | "-"

<let-dig> ::= <letter> | <digit>

<letter> ::= any one of the 52 alphabetic characters A through Z in
upper case and a through z in lower case

<digit> ::= any one of the ten digits 0 through 9

Each label must be less than 64 characters and the total length must be less
than 256.

Names are case-insensitive, but stored and reported in lowercase, by
convention.

Tools interacting with names should support conversion to and from punycode.
This can be supported via golang.org/x/net/idna.

Structure

For each kind of resource, the name must be unique within its namespace. This
has the excellent property that all fully qualified names are unique within
the cluster, which means that by default we have a way to reference every
other thing.

Resource    Component     Structure                              Examples
Cluster     <cluster>     <cluster>                              local, cluster0
Namespace   <namespace>   <namespace>.<cluster>                  production.cluster0, development.local, xn--7o8h (🐳), system
Node        <node>        <node>.<cluster>                       node0.local
Job         <job>         <job>.<namespace>.<cluster>            job0.production.cluster0
Task        <task>        <task>.<job>.<namespace>.<cluster>     task0.job0.production.cluster0
Volume      <volume>      <volume>.<namespace>.<cluster>         postgres.production.cluster0
Network     <network>     <network>.<namespace>.<cluster>        frontend.production.cluster0
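Building a fully qualified name from the structure table is just joining components, most specific first. A small sketch (the fqn helper is hypothetical, named here for illustration):

```go
package main

import (
	"fmt"
	"strings"
)

// fqn joins name components per the structure table, from the most
// specific component down to the cluster.
func fqn(components ...string) string {
	return strings.Join(components, ".")
}

func main() {
	fmt.Println(fqn("node0", "local"))                          // a node
	fmt.Println(fqn("job0", "production", "cluster0"))          // a job
	fmt.Println(fqn("task0", "job0", "production", "cluster0")) // a task
}
```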

At the base, we have the <cluster>. The cluster should refer to a specific
cluster and can be named by configuration. Users will typically share a common
configuration, but doing so is not required to interoperate.

While names are generated from structure, a resource name may itself contain
one or more labels, so names cannot be parsed to recover the source structure.
For example, a node may be named a.b. When qualified, it may be
a.b.default.local. If we don't know this is a node name, we cannot infer it
from structure alone: it is impossible to tell whether this is a resource
named a on node b or a node named a.b.

Namespaces

A namespace is an area where resources can reference each other without
qualification.

Every operation has a default namespace from which it is conducted. Any
objects created in that context become a member of that namespace.

By default, we will have the following namespaces:

Namespace   Description
default     Default namespace for all resources
system      System namespace for cluster jobs

By default, all resources are created under default, unless the user modifies their configuration. The system namespace is used to run cluster tasks, such as plugins and data distribution planes. Resources in the system namespace are only shown in a special mode.

References

For most service declarations, we reference resources by a name. Typically,
this name is evaluated within a namespace, as described above. To allow access
to objects in disparate namespaces, we define a searchspace as part of an
operation context. When referencing another object, the reference only needs
to be long enough to resolve in the common parent. Two objects in the same
cluster but different namespace only need to include the namespace in the
reference but not the cluster name.

A searchspace consists of one or more namespaces, in precedence order. If a
resource is not resolved with an unqualified name, each available namespace is
tried until a match is found.

This can extend to involve resource sharing between two users. Let's say two
developers are developing an application in their own namespaces, lucy and
steve.

Let's say we have an identical service definition myapp which can be run
independently:

service:
  myapp:
    instances: 4
    requires:
      redis # leave this syntax for another discussion!
    container:
      #...

For Lucy, the fully qualified service name is myapp.lucy.local and Steve
has myapp.steve.local. However, when running the service, the requirement
of redis is not fulfilled: it is absent from the definition, so running the
service fails. Fortunately, the operations team has made a development
instance available at redis.development.cluster0. By default, neither Lucy
nor Steve can see this resource.

A few things can happen here to resolve the issue. They can both edit the
configuration file to add .development to the redis reference. While this
does work, it now makes the definition non-portable.

A better resolution is to have both developers add development to the
searchspace for the operation context. For steve, the unqualified name
would be expanded to the following fully qualified names:

redis.steve.local
redis.development.cluster0

Lucy does the same and gets the following qualified names:

redis.lucy.local
redis.development.cluster0

Note that both developers did the same thing and got the same result but have
different application environments.

With this, we get a very clear order in which resources are resolved. Each
user can set their default namespace and searchspace, controlling the order
in which resources are resolved. Once this is set up correctly, only
unqualified names need to be used in practice for most API operations.
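The resolution order described above can be sketched as follows: try each namespace in the searchspace, in precedence order, against each known cluster, and return the first fully qualified name that exists. The resolve helper and the exists callback are assumptions standing in for a real store lookup.

```go
package main

import "fmt"

// resolve expands an unqualified name against a searchspace of
// namespaces and a list of clusters, in precedence order, returning
// the first fully qualified name that exists.
func resolve(name string, searchspace, clusters []string, exists func(string) bool) (string, bool) {
	for _, ns := range searchspace {
		for _, c := range clusters {
			fq := name + "." + ns + "." + c
			if exists(fq) {
				return fq, true
			}
		}
	}
	return "", false
}

func main() {
	// Steve's view: his own namespace first, then development.
	known := map[string]bool{"redis.development.cluster0": true}
	fq, ok := resolve("redis",
		[]string{"steve", "development"},
		[]string{"local", "cluster0"},
		func(n string) bool { return known[n] })
	fmt.Println(fq, ok)
}
```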

The main complexity here is that all names from user input need to be resolved
at API request time, associating the resolution with an operation context.
Names are then written out in resolved form during the API call, to capture
the current searchspace.

Clusters

We slightly glossed over a point above. Where did cluster0 come from? This
is simply the domain name of the cluster. In the example above, both
developers have a cluster on their machines, known as local. This just has
to be one or more endpoints that are available for cluster submission.

Just as in searchspace, we can define a set of clusters that one might want
to use from an environment. These clusters combine with the search space to
create names. Let's say we have the following list of cluster domains:

local
cluster0
cluster1

We can combine this using a cross product with our searchspace ([local, development]) to get all of the possible references for a resource redis
from the point of view of the user:

redis.steve.local
redis.development.local
redis.development.cluster0
redis.development.cluster1

Let's say that Steve needs help with his application. Lucy tries to
reference it with myapp.steve.local but that won't work, since .local is
different between the two machines. To deal with this, we can define clusters
with names. A possible configuration on Lucy's machine might be the following:

<steves ip> steves-mbp

Now, she can reference his app with myapp.steve.steves-mbp or just
myapp.steve if she adds "steve" to the search space.

Access Control

Namespaces provide a tool for access control. To build this framework, we
say that every operation has a context with a namespace. Under normal
operation, all creations, updates and deletions happen within the context's
namespace.

Access control operations simply use this framework to operate within. We can
define which namespaces can access other namespaces.

TODO: Work out some examples here. This actually works well, but we need
examples.

Alternative Models

Some other possible models under consideration:

  1. Similar to the above but resources cannot reference between namespaces. Slightly inflexible in large teams that want to partition a cluster arbitrarily.
  2. Slash-based model. Not DNS compatible, but somewhat familiar from current docker projects.

Vanity

Naming is typically done out of vanity. While this specification is fairly
restrictive in naming, since we intend to use naming as an organizational
tool, we may find it necessary to introduce the concept of a vanity name.

Put whatever you like in this name.

Road Map

  • Define first-class Namespace object
    • Provides definition of the namespace
  • Lock down naming
  • Define context object for all API requests
    • namespace for owning namespace of operation
    • searchspace for reference context

@mikegoezler @aluzzardi @amitshukla @icecrime

Reorganize protobuf files

Currently, all proto files are in the api subdirectory. We may want to move to a model where proto files live in subdirectories of the Go modules that use them.

i.e.:

  • dispatcher/pb
  • manager/pb
  • manager/state/pb

etc.

cc @stevvooe

Define Manager.Info RPC API

Currently, #120 proposes a single-purpose method named Manager.NodeCount. It returns only the number of registered nodes active with a dispatcher. This is important for connection balancing, and we will go forward with it for M1.

In the future, we may find it more fitting to have a Manager.Info or Manager.DispatcherInfo call that can return other kinds of data corresponding to the manager's dispatcher state.

Find an efficient way to expose state to manager components

The current Store interface has functions that return copies of internal lists or maps; for example, Nodes(), Jobs(), and Tasks(). This copying becomes very expensive at large scale (i.e. tens of thousands of nodes, jobs, and tasks). If every scheduling or planning decision requires store queries that make copies, scaling becomes an issue.

Here are a few ideas to make this more efficient:

  • Expose internal datastructures from the store for direct reading, through a transactional interface that acts as a reader/writer lock. This avoids the cost of making a complete copy each time the data is accessed. However, it means that things like scheduling decisions would block updates to the state from Raft, and this could be a serious issue if any such decisions take significant time to compute.
  • Have each component maintain its own copy of the state, using watchers in a synchronous event loop. For example, the orchestrator would have its own memory store instance that it keeps up to date by watching the main store for changes. While it's busy making an orchestration decision, it wouldn't be reading from the channel that receives these updates, so it's guaranteed to see a consistent state. I like this approach because it's easy to reason about and doesn't involve complicated locking or transaction semantics. The potential downside is that it involves multiple "sources of truth", and there could be problems if these copies of the state somehow get out of sync or lag one another.
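The second idea can be sketched as a component that applies watch events to its own local copy in a single event loop, so reads during a decision see a consistent snapshot without locking the main store. The Event type and channel protocol here are illustrative, not the actual store watch API.

```go
package main

import "fmt"

// Event is a simplified state-change notification from the main store.
type Event struct {
	Kind string // "create", "update", or "delete"
	ID   string
	Data string
}

// component keeps its own copy of state, applied from a watch channel
// in a single event loop. While it is busy deciding, it simply does
// not read from the channel, so its view stays consistent.
func component(events <-chan Event, done chan<- map[string]string) {
	local := make(map[string]string)
	for ev := range events {
		switch ev.Kind {
		case "create", "update":
			local[ev.ID] = ev.Data
		case "delete":
			delete(local, ev.ID)
		}
	}
	done <- local
}

func main() {
	events := make(chan Event, 3)
	done := make(chan map[string]string)
	go component(events, done)
	events <- Event{"create", "task0", "assigned"}
	events <- Event{"update", "task0", "running"}
	events <- Event{"delete", "task1", ""}
	close(events)
	fmt.Println((<-done)["task0"])
}
```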

raft: start and stop nodes N/2+1 times end up in infinite rejection loop

/cc @aaronlehmann

Diagnosis

Stopping a node and joining it to an existing cluster again (using swarm manage init/join) seems to end up in an infinite rejection loop.

Reproduce

On a 4-node cluster, stop and start a node 3 times.

On a 3-node cluster, stop and start a node 2 times.

More generally, start an N-node cluster, then stop and start a node N/2+1 times.

Solution

This is related to membership: it fails after the third attempt because raft thinks the nodes that crashed may still be reachable, so they remain registered in the state machine.

Thus the new node will wait for the leader to send it a heartbeat message with a higher term, which the leader cannot do because there is no longer a majority. So the new node will trigger a new election and confuse everyone with a term index lower than the current one in the cluster.

All the nodes that are up will naturally reject those messages, and this ends up in an infinite rejection loop.

The solution is to cleanly remove nodes that crashed, so that we don't end up in this situation in the first place.

The safest approach would be to monitor unreachable nodes on the leader and propose a configuration change to remove them cleanly after a few failed connect attempts.
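
A minimal sketch of that monitoring logic, with assumed names (monitor, reportFailure, maxAttempts are illustrative): the leader counts consecutive failed connect attempts per member and, once a threshold is crossed, marks the member for removal. In the real fix, marking would mean proposing a ConfChange through raft rather than appending to a slice.

```go
package main

import "fmt"

// maxAttempts is an assumed threshold of failed connect attempts
// before the leader proposes removing a member.
const maxAttempts = 3

// monitor tracks per-member connection failures on the leader.
type monitor struct {
	failures map[uint64]int
	removed  []uint64 // members marked for clean removal
}

func newMonitor() *monitor {
	return &monitor{failures: map[uint64]int{}}
}

// reportFailure counts one failed connect attempt; at the threshold
// the member is marked for removal so a crashed node can no longer
// linger in the configuration and break quorum math.
func (m *monitor) reportFailure(id uint64) {
	m.failures[id]++
	if m.failures[id] == maxAttempts {
		// The real implementation would propose a raft
		// configuration change removing this member here.
		m.removed = append(m.removed, id)
	}
}

// reportSuccess resets the counter once the member is reachable again.
func (m *monitor) reportSuccess(id uint64) {
	delete(m.failures, id)
}

func main() {
	mon := newMonitor()
	for i := 0; i < maxAttempts; i++ {
		mon.reportFailure(0x42)
	}
	fmt.Println(len(mon.removed))
}
```

Resetting the counter on success matters: a flaky but live member should not be removed, only one that stays unreachable across consecutive attempts.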

Logs

Existing Member (reject new nodes votes and heartbeats)

raft2016/03/24 17:49:47 INFO: 3583926d0ba80881 became follower at term 0
raft2016/03/24 17:49:47 INFO: newRaft 3583926d0ba80881 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
raft2016/03/24 17:49:47 INFO: 3583926d0ba80881 became follower at term 1
INFO[0000] Listening for connections                     addr=[::]:4243 proto=tcp
raft2016/03/24 17:49:48 INFO: 3583926d0ba80881 [term: 1] received a MsgHeartbeat message with higher term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:49:48 INFO: 3583926d0ba80881 became follower at term 2
raft2016/03/24 17:49:48 INFO: raft.node: 3583926d0ba80881 elected leader 199842e2ef7c4b08 at term 2
2016/03/24 17:49:55 transport: http2Client.notifyError got notified that the client transport was broken EOF.
2016/03/24 17:49:55 transport: http2Client.notifyError got notified that the client transport was broken EOF.
2016/03/24 17:49:55 transport: http2Client.notifyError got notified that the client transport was broken EOF.
2016/03/24 17:49:56 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4244: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4244"
2016/03/24 17:49:56 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4244: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4244"
2016/03/24 17:49:56 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4244: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4244"
2016/03/24 17:49:56 grpc: Conn.transportMonitor exits due to: grpc: timed out trying to connect
2016/03/24 17:49:56 grpc: Conn.transportMonitor exits due to: grpc: timed out trying to connect
2016/03/24 17:49:56 grpc: Conn.transportMonitor exits due to: grpc: timed out trying to connect
2016/03/24 17:50:05 transport: http2Client.notifyError got notified that the client transport was broken EOF.
2016/03/24 17:50:05 transport: http2Client.notifyError got notified that the client transport was broken EOF.
2016/03/24 17:50:05 transport: http2Client.notifyError got notified that the client transport was broken EOF.
2016/03/24 17:50:05 transport: http2Client.notifyError got notified that the client transport was broken read tcp 127.0.0.1:40681->127.0.0.1:4244: read: connection reset by peer.
2016/03/24 17:50:05 transport: http2Client.notifyError got notified that the client transport was broken read tcp 127.0.0.1:40683->127.0.0.1:4244: read: connection reset by peer.
2016/03/24 17:50:05 transport: http2Client.notifyError got notified that the client transport was broken read tcp 127.0.0.1:40682->127.0.0.1:4244: read: connection reset by peer.
2016/03/24 17:50:06 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4244: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4244"
2016/03/24 17:50:06 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4244: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4244"
2016/03/24 17:50:06 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 0.0.0.0:4244: getsockopt: connection refused"; Reconnecting to "0.0.0.0:4244"
2016/03/24 17:50:06 grpc: Conn.transportMonitor exits due to: grpc: timed out trying to connect
2016/03/24 17:50:06 grpc: Conn.transportMonitor exits due to: grpc: timed out trying to connect
2016/03/24 17:50:06 grpc: Conn.transportMonitor exits due to: grpc: timed out trying to connect
raft2016/03/24 17:50:25 INFO: 3583926d0ba80881 [logterm: 2, index: 6, vote: 0] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 2
raft2016/03/24 17:50:36 INFO: 3583926d0ba80881 [term: 2] received a MsgVote message with higher term from 350c61c9b855e64e [term: 3]
raft2016/03/24 17:50:36 INFO: 3583926d0ba80881 became follower at term 3
raft2016/03/24 17:50:36 INFO: 3583926d0ba80881 [logterm: 2, index: 6, vote: 0] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 3
raft2016/03/24 17:50:36 INFO: raft.node: 3583926d0ba80881 lost leader 199842e2ef7c4b08 at term 3
raft2016/03/24 17:50:37 INFO: 3583926d0ba80881 [term: 3] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:50:38 INFO: 3583926d0ba80881 [term: 3] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:50:39 INFO: 3583926d0ba80881 [term: 3] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:50:40 INFO: 3583926d0ba80881 [term: 3] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:50:41 INFO: 3583926d0ba80881 [term: 3] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:50:42 INFO: 3583926d0ba80881 [term: 3] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:50:43 INFO: 3583926d0ba80881 [term: 3] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:50:44 INFO: 3583926d0ba80881 [term: 3] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:50:45 INFO: 3583926d0ba80881 [term: 3] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:50:46 INFO: 3583926d0ba80881 [term: 3] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:50:47 INFO: 3583926d0ba80881 [term: 3] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:50:48 INFO: 3583926d0ba80881 [term: 3] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:50:49 INFO: 3583926d0ba80881 [term: 3] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:50:49 INFO: 3583926d0ba80881 is starting a new election at term 3
raft2016/03/24 17:50:49 INFO: 3583926d0ba80881 became candidate at term 4
raft2016/03/24 17:50:49 INFO: 3583926d0ba80881 received vote from 3583926d0ba80881 at term 4
raft2016/03/24 17:50:49 INFO: 3583926d0ba80881 [logterm: 2, index: 6] sent vote request to 435f33483429a668 at term 4
raft2016/03/24 17:50:49 INFO: 3583926d0ba80881 [logterm: 2, index: 6] sent vote request to 34f93bfe72a79963 at term 4
raft2016/03/24 17:50:50 INFO: 3583926d0ba80881 [term: 4] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:50:51 INFO: 3583926d0ba80881 [term: 4] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:50:52 INFO: 3583926d0ba80881 [term: 4] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:50:53 INFO: 3583926d0ba80881 [term: 4] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:50:54 INFO: 3583926d0ba80881 [term: 4] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:50:54 INFO: 3583926d0ba80881 [logterm: 2, index: 6, vote: 3583926d0ba80881] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 4
raft2016/03/24 17:50:55 INFO: 3583926d0ba80881 [term: 4] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:50:56 INFO: 3583926d0ba80881 [term: 4] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:50:57 INFO: 3583926d0ba80881 [term: 4] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:50:58 INFO: 3583926d0ba80881 [term: 4] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:50:59 INFO: 3583926d0ba80881 [term: 4] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:00 INFO: 3583926d0ba80881 [term: 4] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:01 INFO: 3583926d0ba80881 [term: 4] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:02 INFO: 3583926d0ba80881 [term: 4] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:03 INFO: 3583926d0ba80881 [term: 4] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:04 INFO: 3583926d0ba80881 [term: 4] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:04 INFO: 3583926d0ba80881 is starting a new election at term 4
raft2016/03/24 17:51:04 INFO: 3583926d0ba80881 became candidate at term 5
raft2016/03/24 17:51:04 INFO: 3583926d0ba80881 received vote from 3583926d0ba80881 at term 5
raft2016/03/24 17:51:04 INFO: 3583926d0ba80881 [logterm: 2, index: 6] sent vote request to 435f33483429a668 at term 5
raft2016/03/24 17:51:04 INFO: 3583926d0ba80881 [logterm: 2, index: 6] sent vote request to 34f93bfe72a79963 at term 5
raft2016/03/24 17:51:05 INFO: 3583926d0ba80881 [term: 5] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:06 INFO: 3583926d0ba80881 [term: 5] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:06 INFO: 3583926d0ba80881 [logterm: 2, index: 6, vote: 3583926d0ba80881] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 5
raft2016/03/24 17:51:07 INFO: 3583926d0ba80881 [term: 5] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:08 INFO: 3583926d0ba80881 [term: 5] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:09 INFO: 3583926d0ba80881 [term: 5] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:10 INFO: 3583926d0ba80881 [term: 5] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:11 INFO: 3583926d0ba80881 [term: 5] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:12 INFO: 3583926d0ba80881 [term: 5] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:13 INFO: 3583926d0ba80881 [term: 5] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:14 INFO: 3583926d0ba80881 [term: 5] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:15 INFO: 3583926d0ba80881 [term: 5] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:16 INFO: 3583926d0ba80881 [term: 5] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:17 INFO: 3583926d0ba80881 [term: 5] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:18 INFO: 3583926d0ba80881 [term: 5] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:19 INFO: 3583926d0ba80881 [term: 5] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:19 INFO: 3583926d0ba80881 is starting a new election at term 5
raft2016/03/24 17:51:19 INFO: 3583926d0ba80881 became candidate at term 6
raft2016/03/24 17:51:19 INFO: 3583926d0ba80881 received vote from 3583926d0ba80881 at term 6
raft2016/03/24 17:51:19 INFO: 3583926d0ba80881 [logterm: 2, index: 6] sent vote request to 435f33483429a668 at term 6
raft2016/03/24 17:51:19 INFO: 3583926d0ba80881 [logterm: 2, index: 6] sent vote request to 34f93bfe72a79963 at term 6
raft2016/03/24 17:51:20 INFO: 3583926d0ba80881 [term: 6] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:21 INFO: 3583926d0ba80881 [term: 6] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:21 INFO: 3583926d0ba80881 [logterm: 2, index: 6, vote: 3583926d0ba80881] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 6
raft2016/03/24 17:51:22 INFO: 3583926d0ba80881 [term: 6] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:23 INFO: 3583926d0ba80881 [term: 6] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:24 INFO: 3583926d0ba80881 [term: 6] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:25 INFO: 3583926d0ba80881 [term: 6] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:26 INFO: 3583926d0ba80881 [term: 6] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:27 INFO: 3583926d0ba80881 [term: 6] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:28 INFO: 3583926d0ba80881 [term: 6] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:29 INFO: 3583926d0ba80881 [term: 6] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:30 INFO: 3583926d0ba80881 [term: 6] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:31 INFO: 3583926d0ba80881 [term: 6] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:31 INFO: 3583926d0ba80881 is starting a new election at term 6
raft2016/03/24 17:51:31 INFO: 3583926d0ba80881 became candidate at term 7
raft2016/03/24 17:51:31 INFO: 3583926d0ba80881 received vote from 3583926d0ba80881 at term 7
raft2016/03/24 17:51:31 INFO: 3583926d0ba80881 [logterm: 2, index: 6] sent vote request to 34f93bfe72a79963 at term 7
raft2016/03/24 17:51:31 INFO: 3583926d0ba80881 [logterm: 2, index: 6] sent vote request to 435f33483429a668 at term 7
raft2016/03/24 17:51:32 INFO: 3583926d0ba80881 [term: 7] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:33 INFO: 3583926d0ba80881 [term: 7] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:34 INFO: 3583926d0ba80881 [term: 7] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:34 INFO: 3583926d0ba80881 [logterm: 2, index: 6, vote: 3583926d0ba80881] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 7
raft2016/03/24 17:51:35 INFO: 3583926d0ba80881 [term: 7] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:36 INFO: 3583926d0ba80881 [term: 7] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:37 INFO: 3583926d0ba80881 [term: 7] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:38 INFO: 3583926d0ba80881 [term: 7] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:39 INFO: 3583926d0ba80881 [term: 7] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:40 INFO: 3583926d0ba80881 [term: 7] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:41 INFO: 3583926d0ba80881 [term: 7] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:42 INFO: 3583926d0ba80881 [term: 7] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:43 INFO: 3583926d0ba80881 [term: 7] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:43 INFO: 3583926d0ba80881 is starting a new election at term 7
raft2016/03/24 17:51:43 INFO: 3583926d0ba80881 became candidate at term 8
raft2016/03/24 17:51:43 INFO: 3583926d0ba80881 received vote from 3583926d0ba80881 at term 8
raft2016/03/24 17:51:43 INFO: 3583926d0ba80881 [logterm: 2, index: 6] sent vote request to 435f33483429a668 at term 8
raft2016/03/24 17:51:43 INFO: 3583926d0ba80881 [logterm: 2, index: 6] sent vote request to 34f93bfe72a79963 at term 8
raft2016/03/24 17:51:44 INFO: 3583926d0ba80881 [term: 8] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:45 INFO: 3583926d0ba80881 [term: 8] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:46 INFO: 3583926d0ba80881 [term: 8] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:47 INFO: 3583926d0ba80881 [term: 8] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:47 INFO: 3583926d0ba80881 [logterm: 2, index: 6, vote: 3583926d0ba80881] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 8
raft2016/03/24 17:51:48 INFO: 3583926d0ba80881 [term: 8] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:49 INFO: 3583926d0ba80881 [term: 8] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:50 INFO: 3583926d0ba80881 [term: 8] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:51 INFO: 3583926d0ba80881 [term: 8] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:52 INFO: 3583926d0ba80881 [term: 8] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:53 INFO: 3583926d0ba80881 [term: 8] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:54 INFO: 3583926d0ba80881 [term: 8] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:55 INFO: 3583926d0ba80881 [term: 8] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:55 INFO: 3583926d0ba80881 is starting a new election at term 8
raft2016/03/24 17:51:55 INFO: 3583926d0ba80881 became candidate at term 9
raft2016/03/24 17:51:55 INFO: 3583926d0ba80881 received vote from 3583926d0ba80881 at term 9
raft2016/03/24 17:51:55 INFO: 3583926d0ba80881 [logterm: 2, index: 6] sent vote request to 435f33483429a668 at term 9
raft2016/03/24 17:51:55 INFO: 3583926d0ba80881 [logterm: 2, index: 6] sent vote request to 34f93bfe72a79963 at term 9
raft2016/03/24 17:51:56 INFO: 3583926d0ba80881 [term: 9] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:57 INFO: 3583926d0ba80881 [term: 9] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:58 INFO: 3583926d0ba80881 [term: 9] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:51:59 INFO: 3583926d0ba80881 [term: 9] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:00 INFO: 3583926d0ba80881 [term: 9] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:01 INFO: 3583926d0ba80881 [term: 9] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:01 INFO: 3583926d0ba80881 [logterm: 2, index: 6, vote: 3583926d0ba80881] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 9
raft2016/03/24 17:52:02 INFO: 3583926d0ba80881 [term: 9] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:03 INFO: 3583926d0ba80881 [term: 9] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:04 INFO: 3583926d0ba80881 [term: 9] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:05 INFO: 3583926d0ba80881 [term: 9] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:06 INFO: 3583926d0ba80881 [term: 9] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:06 INFO: 3583926d0ba80881 is starting a new election at term 9
raft2016/03/24 17:52:06 INFO: 3583926d0ba80881 became candidate at term 10
raft2016/03/24 17:52:06 INFO: 3583926d0ba80881 received vote from 3583926d0ba80881 at term 10
raft2016/03/24 17:52:06 INFO: 3583926d0ba80881 [logterm: 2, index: 6] sent vote request to 435f33483429a668 at term 10
raft2016/03/24 17:52:06 INFO: 3583926d0ba80881 [logterm: 2, index: 6] sent vote request to 34f93bfe72a79963 at term 10
raft2016/03/24 17:52:07 INFO: 3583926d0ba80881 [term: 10] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:08 INFO: 3583926d0ba80881 [term: 10] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:09 INFO: 3583926d0ba80881 [term: 10] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:10 INFO: 3583926d0ba80881 [term: 10] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:11 INFO: 3583926d0ba80881 [term: 10] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:12 INFO: 3583926d0ba80881 [term: 10] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:12 INFO: 3583926d0ba80881 [logterm: 2, index: 6, vote: 3583926d0ba80881] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 10
raft2016/03/24 17:52:13 INFO: 3583926d0ba80881 [term: 10] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:14 INFO: 3583926d0ba80881 [term: 10] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:15 INFO: 3583926d0ba80881 [term: 10] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:16 INFO: 3583926d0ba80881 [term: 10] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:17 INFO: 3583926d0ba80881 [term: 10] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:18 INFO: 3583926d0ba80881 [term: 10] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:19 INFO: 3583926d0ba80881 [term: 10] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:20 INFO: 3583926d0ba80881 [term: 10] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:21 INFO: 3583926d0ba80881 [term: 10] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:21 INFO: 3583926d0ba80881 is starting a new election at term 10
raft2016/03/24 17:52:21 INFO: 3583926d0ba80881 became candidate at term 11
raft2016/03/24 17:52:21 INFO: 3583926d0ba80881 received vote from 3583926d0ba80881 at term 11
raft2016/03/24 17:52:21 INFO: 3583926d0ba80881 [logterm: 2, index: 6] sent vote request to 435f33483429a668 at term 11
raft2016/03/24 17:52:21 INFO: 3583926d0ba80881 [logterm: 2, index: 6] sent vote request to 34f93bfe72a79963 at term 11
raft2016/03/24 17:52:22 INFO: 3583926d0ba80881 [term: 11] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:23 INFO: 3583926d0ba80881 [term: 11] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:23 INFO: 3583926d0ba80881 [logterm: 2, index: 6, vote: 3583926d0ba80881] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 11
raft2016/03/24 17:52:24 INFO: 3583926d0ba80881 [term: 11] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:25 INFO: 3583926d0ba80881 [term: 11] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:26 INFO: 3583926d0ba80881 [term: 11] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:27 INFO: 3583926d0ba80881 [term: 11] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:28 INFO: 3583926d0ba80881 [term: 11] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:29 INFO: 3583926d0ba80881 [term: 11] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:30 INFO: 3583926d0ba80881 [term: 11] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:31 INFO: 3583926d0ba80881 [term: 11] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:32 INFO: 3583926d0ba80881 [term: 11] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:33 INFO: 3583926d0ba80881 [term: 11] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:34 INFO: 3583926d0ba80881 [term: 11] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:35 INFO: 3583926d0ba80881 [term: 11] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:36 INFO: 3583926d0ba80881 [term: 11] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:36 INFO: 3583926d0ba80881 [term: 11] received a MsgVote message with higher term from 350c61c9b855e64e [term: 12]
raft2016/03/24 17:52:36 INFO: 3583926d0ba80881 became follower at term 12
raft2016/03/24 17:52:36 INFO: 3583926d0ba80881 [logterm: 2, index: 6, vote: 0] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 12
raft2016/03/24 17:52:37 INFO: 3583926d0ba80881 [term: 12] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:38 INFO: 3583926d0ba80881 [term: 12] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:39 INFO: 3583926d0ba80881 [term: 12] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:40 INFO: 3583926d0ba80881 [term: 12] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]
raft2016/03/24 17:52:41 INFO: 3583926d0ba80881 [term: 12] ignored a MsgHeartbeat message with lower term from 199842e2ef7c4b08 [term: 2]

New Node (gets its Heartbeat and votes rejected)

raft2016/03/24 17:50:10 INFO: 350c61c9b855e64e became follower at term 2
raft2016/03/24 17:50:10 INFO: newRaft 350c61c9b855e64e [peers: [], term: 2, commit: 5, applied: 0, lastindex: 5, lastterm: 1]
raft2016/03/24 17:50:10 INFO: 350c61c9b855e64e became follower at term 1
INFO[0000] Listening for connections                     addr=[::]:4244 proto=tcp
raft2016/03/24 17:50:25 INFO: 350c61c9b855e64e is starting a new election at term 1
raft2016/03/24 17:50:25 INFO: 350c61c9b855e64e became candidate at term 2
raft2016/03/24 17:50:25 INFO: 350c61c9b855e64e received vote from 350c61c9b855e64e at term 2
raft2016/03/24 17:50:25 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 435f33483429a668 at term 2
raft2016/03/24 17:50:25 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 3583926d0ba80881 at term 2
raft2016/03/24 17:50:25 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 34f93bfe72a79963 at term 2
raft2016/03/24 17:50:25 INFO: 350c61c9b855e64e [logterm: 1, index: 6, vote: 350c61c9b855e64e] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 2
raft2016/03/24 17:50:25 INFO: 350c61c9b855e64e [logterm: 1, index: 6, vote: 350c61c9b855e64e] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 2
raft2016/03/24 17:50:25 INFO: 350c61c9b855e64e received vote rejection from 350c61c9b855e64e at term 2
raft2016/03/24 17:50:25 INFO: 350c61c9b855e64e [quorum:3] has received 1 votes and 0 vote rejections
raft2016/03/24 17:50:25 INFO: 350c61c9b855e64e received vote rejection from 350c61c9b855e64e at term 2
raft2016/03/24 17:50:25 INFO: 350c61c9b855e64e [quorum:3] has received 1 votes and 0 vote rejections
raft2016/03/24 17:50:36 INFO: 350c61c9b855e64e is starting a new election at term 2
raft2016/03/24 17:50:36 INFO: 350c61c9b855e64e became candidate at term 3
raft2016/03/24 17:50:36 INFO: 350c61c9b855e64e received vote from 350c61c9b855e64e at term 3
raft2016/03/24 17:50:36 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 435f33483429a668 at term 3
raft2016/03/24 17:50:36 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 3583926d0ba80881 at term 3
raft2016/03/24 17:50:36 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 34f93bfe72a79963 at term 3
raft2016/03/24 17:50:36 INFO: 350c61c9b855e64e [logterm: 1, index: 6, vote: 350c61c9b855e64e] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 3
raft2016/03/24 17:50:36 INFO: 350c61c9b855e64e [logterm: 1, index: 6, vote: 350c61c9b855e64e] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 3
raft2016/03/24 17:50:36 INFO: 350c61c9b855e64e received vote rejection from 350c61c9b855e64e at term 3
raft2016/03/24 17:50:36 INFO: 350c61c9b855e64e [quorum:3] has received 1 votes and 0 vote rejections
raft2016/03/24 17:50:36 INFO: 350c61c9b855e64e received vote rejection from 350c61c9b855e64e at term 3
raft2016/03/24 17:50:36 INFO: 350c61c9b855e64e [quorum:3] has received 1 votes and 0 vote rejections
raft2016/03/24 17:50:54 INFO: 350c61c9b855e64e is starting a new election at term 3
raft2016/03/24 17:50:54 INFO: 350c61c9b855e64e became candidate at term 4
raft2016/03/24 17:50:54 INFO: 350c61c9b855e64e received vote from 350c61c9b855e64e at term 4
raft2016/03/24 17:50:54 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 435f33483429a668 at term 4
raft2016/03/24 17:50:54 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 3583926d0ba80881 at term 4
raft2016/03/24 17:50:54 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 34f93bfe72a79963 at term 4
raft2016/03/24 17:50:54 INFO: 350c61c9b855e64e [logterm: 1, index: 6, vote: 350c61c9b855e64e] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 4
raft2016/03/24 17:50:54 INFO: 350c61c9b855e64e [logterm: 1, index: 6, vote: 350c61c9b855e64e] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 4
raft2016/03/24 17:50:54 INFO: 350c61c9b855e64e received vote rejection from 350c61c9b855e64e at term 4
raft2016/03/24 17:50:54 INFO: 350c61c9b855e64e [quorum:3] has received 1 votes and 0 vote rejections
raft2016/03/24 17:50:54 INFO: 350c61c9b855e64e received vote rejection from 350c61c9b855e64e at term 4
raft2016/03/24 17:50:54 INFO: 350c61c9b855e64e [quorum:3] has received 1 votes and 0 vote rejections
raft2016/03/24 17:51:06 INFO: 350c61c9b855e64e is starting a new election at term 4
raft2016/03/24 17:51:06 INFO: 350c61c9b855e64e became candidate at term 5
raft2016/03/24 17:51:06 INFO: 350c61c9b855e64e received vote from 350c61c9b855e64e at term 5
raft2016/03/24 17:51:06 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 435f33483429a668 at term 5
raft2016/03/24 17:51:06 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 3583926d0ba80881 at term 5
raft2016/03/24 17:51:06 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 34f93bfe72a79963 at term 5
raft2016/03/24 17:51:06 INFO: 350c61c9b855e64e [logterm: 1, index: 6, vote: 350c61c9b855e64e] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 5
raft2016/03/24 17:51:06 INFO: 350c61c9b855e64e [logterm: 1, index: 6, vote: 350c61c9b855e64e] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 5
raft2016/03/24 17:51:06 INFO: 350c61c9b855e64e received vote rejection from 350c61c9b855e64e at term 5
raft2016/03/24 17:51:06 INFO: 350c61c9b855e64e [quorum:3] has received 1 votes and 0 vote rejections
raft2016/03/24 17:51:06 INFO: 350c61c9b855e64e received vote rejection from 350c61c9b855e64e at term 5
raft2016/03/24 17:51:06 INFO: 350c61c9b855e64e [quorum:3] has received 1 votes and 0 vote rejections
raft2016/03/24 17:51:21 INFO: 350c61c9b855e64e is starting a new election at term 5
raft2016/03/24 17:51:21 INFO: 350c61c9b855e64e became candidate at term 6
raft2016/03/24 17:51:21 INFO: 350c61c9b855e64e received vote from 350c61c9b855e64e at term 6
raft2016/03/24 17:51:21 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 34f93bfe72a79963 at term 6
raft2016/03/24 17:51:21 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 435f33483429a668 at term 6
raft2016/03/24 17:51:21 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 3583926d0ba80881 at term 6
raft2016/03/24 17:51:21 INFO: 350c61c9b855e64e [logterm: 1, index: 6, vote: 350c61c9b855e64e] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 6
raft2016/03/24 17:51:21 INFO: 350c61c9b855e64e [logterm: 1, index: 6, vote: 350c61c9b855e64e] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 6
raft2016/03/24 17:51:21 INFO: 350c61c9b855e64e received vote rejection from 350c61c9b855e64e at term 6
raft2016/03/24 17:51:21 INFO: 350c61c9b855e64e [quorum:3] has received 1 votes and 0 vote rejections
raft2016/03/24 17:51:21 INFO: 350c61c9b855e64e received vote rejection from 350c61c9b855e64e at term 6
raft2016/03/24 17:51:21 INFO: 350c61c9b855e64e [quorum:3] has received 1 votes and 0 vote rejections
raft2016/03/24 17:51:34 INFO: 350c61c9b855e64e is starting a new election at term 6
raft2016/03/24 17:51:34 INFO: 350c61c9b855e64e became candidate at term 7
raft2016/03/24 17:51:34 INFO: 350c61c9b855e64e received vote from 350c61c9b855e64e at term 7
raft2016/03/24 17:51:34 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 435f33483429a668 at term 7
raft2016/03/24 17:51:34 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 3583926d0ba80881 at term 7
raft2016/03/24 17:51:34 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 34f93bfe72a79963 at term 7
raft2016/03/24 17:51:34 INFO: 350c61c9b855e64e [logterm: 1, index: 6, vote: 350c61c9b855e64e] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 7
raft2016/03/24 17:51:34 INFO: 350c61c9b855e64e [logterm: 1, index: 6, vote: 350c61c9b855e64e] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 7
raft2016/03/24 17:51:34 INFO: 350c61c9b855e64e received vote rejection from 350c61c9b855e64e at term 7
raft2016/03/24 17:51:34 INFO: 350c61c9b855e64e [quorum:3] has received 1 votes and 0 vote rejections
raft2016/03/24 17:51:34 INFO: 350c61c9b855e64e received vote rejection from 350c61c9b855e64e at term 7
raft2016/03/24 17:51:34 INFO: 350c61c9b855e64e [quorum:3] has received 1 votes and 0 vote rejections
raft2016/03/24 17:51:47 INFO: 350c61c9b855e64e is starting a new election at term 7
raft2016/03/24 17:51:47 INFO: 350c61c9b855e64e became candidate at term 8
raft2016/03/24 17:51:47 INFO: 350c61c9b855e64e received vote from 350c61c9b855e64e at term 8
raft2016/03/24 17:51:47 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 435f33483429a668 at term 8
raft2016/03/24 17:51:47 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 3583926d0ba80881 at term 8
raft2016/03/24 17:51:47 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 34f93bfe72a79963 at term 8
raft2016/03/24 17:51:47 INFO: 350c61c9b855e64e [logterm: 1, index: 6, vote: 350c61c9b855e64e] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 8
raft2016/03/24 17:51:47 INFO: 350c61c9b855e64e [logterm: 1, index: 6, vote: 350c61c9b855e64e] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 8
raft2016/03/24 17:51:47 INFO: 350c61c9b855e64e received vote rejection from 350c61c9b855e64e at term 8
raft2016/03/24 17:51:47 INFO: 350c61c9b855e64e [quorum:3] has received 1 votes and 0 vote rejections
raft2016/03/24 17:51:47 INFO: 350c61c9b855e64e received vote rejection from 350c61c9b855e64e at term 8
raft2016/03/24 17:51:47 INFO: 350c61c9b855e64e [quorum:3] has received 1 votes and 0 vote rejections
raft2016/03/24 17:52:01 INFO: 350c61c9b855e64e is starting a new election at term 8
raft2016/03/24 17:52:01 INFO: 350c61c9b855e64e became candidate at term 9
raft2016/03/24 17:52:01 INFO: 350c61c9b855e64e received vote from 350c61c9b855e64e at term 9
raft2016/03/24 17:52:01 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 435f33483429a668 at term 9
raft2016/03/24 17:52:01 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 3583926d0ba80881 at term 9
raft2016/03/24 17:52:01 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 34f93bfe72a79963 at term 9
raft2016/03/24 17:52:01 INFO: 350c61c9b855e64e [logterm: 1, index: 6, vote: 350c61c9b855e64e] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 9
raft2016/03/24 17:52:01 INFO: 350c61c9b855e64e [logterm: 1, index: 6, vote: 350c61c9b855e64e] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 9
raft2016/03/24 17:52:01 INFO: 350c61c9b855e64e received vote rejection from 350c61c9b855e64e at term 9
raft2016/03/24 17:52:01 INFO: 350c61c9b855e64e [quorum:3] has received 1 votes and 0 vote rejections
raft2016/03/24 17:52:01 INFO: 350c61c9b855e64e received vote rejection from 350c61c9b855e64e at term 9
raft2016/03/24 17:52:01 INFO: 350c61c9b855e64e [quorum:3] has received 1 votes and 0 vote rejections
raft2016/03/24 17:52:12 INFO: 350c61c9b855e64e is starting a new election at term 9
raft2016/03/24 17:52:12 INFO: 350c61c9b855e64e became candidate at term 10
raft2016/03/24 17:52:12 INFO: 350c61c9b855e64e received vote from 350c61c9b855e64e at term 10
raft2016/03/24 17:52:12 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 3583926d0ba80881 at term 10
raft2016/03/24 17:52:12 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 34f93bfe72a79963 at term 10
raft2016/03/24 17:52:12 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 435f33483429a668 at term 10
raft2016/03/24 17:52:12 INFO: 350c61c9b855e64e [logterm: 1, index: 6, vote: 350c61c9b855e64e] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 10
raft2016/03/24 17:52:12 INFO: 350c61c9b855e64e [logterm: 1, index: 6, vote: 350c61c9b855e64e] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 10
raft2016/03/24 17:52:12 INFO: 350c61c9b855e64e received vote rejection from 350c61c9b855e64e at term 10
raft2016/03/24 17:52:12 INFO: 350c61c9b855e64e [quorum:3] has received 1 votes and 0 vote rejections
raft2016/03/24 17:52:12 INFO: 350c61c9b855e64e received vote rejection from 350c61c9b855e64e at term 10
raft2016/03/24 17:52:12 INFO: 350c61c9b855e64e [quorum:3] has received 1 votes and 0 vote rejections
raft2016/03/24 17:52:23 INFO: 350c61c9b855e64e is starting a new election at term 10
raft2016/03/24 17:52:23 INFO: 350c61c9b855e64e became candidate at term 11
raft2016/03/24 17:52:23 INFO: 350c61c9b855e64e received vote from 350c61c9b855e64e at term 11
raft2016/03/24 17:52:23 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 435f33483429a668 at term 11
raft2016/03/24 17:52:23 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 3583926d0ba80881 at term 11
raft2016/03/24 17:52:23 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 34f93bfe72a79963 at term 11
raft2016/03/24 17:52:23 INFO: 350c61c9b855e64e [logterm: 1, index: 6, vote: 350c61c9b855e64e] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 11
raft2016/03/24 17:52:23 INFO: 350c61c9b855e64e [logterm: 1, index: 6, vote: 350c61c9b855e64e] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 11
raft2016/03/24 17:52:23 INFO: 350c61c9b855e64e received vote rejection from 350c61c9b855e64e at term 11
raft2016/03/24 17:52:23 INFO: 350c61c9b855e64e [quorum:3] has received 1 votes and 0 vote rejections
raft2016/03/24 17:52:23 INFO: 350c61c9b855e64e received vote rejection from 350c61c9b855e64e at term 11
raft2016/03/24 17:52:23 INFO: 350c61c9b855e64e [quorum:3] has received 1 votes and 0 vote rejections
raft2016/03/24 17:52:36 INFO: 350c61c9b855e64e is starting a new election at term 11
raft2016/03/24 17:52:36 INFO: 350c61c9b855e64e became candidate at term 12
raft2016/03/24 17:52:36 INFO: 350c61c9b855e64e received vote from 350c61c9b855e64e at term 12
raft2016/03/24 17:52:36 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 435f33483429a668 at term 12
raft2016/03/24 17:52:36 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 3583926d0ba80881 at term 12
raft2016/03/24 17:52:36 INFO: 350c61c9b855e64e [logterm: 1, index: 6] sent vote request to 34f93bfe72a79963 at term 12
raft2016/03/24 17:52:36 INFO: 350c61c9b855e64e [logterm: 1, index: 6, vote: 350c61c9b855e64e] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 12
raft2016/03/24 17:52:36 INFO: 350c61c9b855e64e [logterm: 1, index: 6, vote: 350c61c9b855e64e] rejected vote from 350c61c9b855e64e [logterm: 1, index: 6] at term 12

The Leader is inactive (it can't reach a majority that is still registered in the state machine).

Ensure raft ID uniqueness

Currently, raft IDs are randomly generated on the client side before joining the cluster. If the ID it generates collides with an ID that's already in use, this is not handled gracefully.

There are two ways I can see to handle this properly:

  • Generate raft IDs on the server side: have the Join RPC respond with a valid ID. This either requires that Join requests are forwarded to the leader, or that Join retries proposing a configuration change until one gets committed with an ID that doesn't conflict with anything else in the log. The logic around this could be a bit complex.
  • Continue to generate the ID client-side, but have the Join RPC return an error if the ID is not unique (after the ProposeConfChange raft round). Have the client retry with a different ID.

cc @abronan

Support all container options in ContainerSpec

Right now we only support args, env and a couple of others.

We should support all container options whenever they make sense in a cluster environment.

See:
https://github.com/docker/engine-api/blob/master/types/container/config.go
https://github.com/docker/engine-api/blob/master/types/container/host_config.go
https://docs.docker.com/engine/reference/api/docker_remote_api_v1.22/#create-a-container
https://docs.docker.com/compose/compose-file/

Progress

We've enumerated all of the docker container fields from engine-api. Once an item has been implemented or eliminated here, check it off and give a reason for the decision. Once they are all checked off, we can close this issue (we may want to make this a table).

Config

  • Hostname string // Hostname
  • Domainname string // Domainname
  • User string // User that will run the command(s) inside the container
  • AttachStdin bool // Attach the standard input, makes possible user interaction
  • AttachStdout bool // Attach the standard output
  • AttachStderr bool // Attach the standard error
  • ExposedPorts map[nat.Port]struct{} json:",omitempty" // List of exposed ports
  • Tty bool // Attach standard streams to a tty, including stdin if it is not closed.
  • OpenStdin bool // Open stdin
  • StdinOnce bool // If true, close stdin after the 1 attached client disconnects.
  • Env []string // List of environment variables to set in the container
  • Cmd strslice.StrSlice // Command to run when starting the container
  • ArgsEscaped bool json:",omitempty" // True if command is already escaped (Windows specific)
  • Image string // Name of the image as it was passed by the operator (eg. could be symbolic)
  • Volumes map[string]struct{} // List of volumes (mounts) used for the container
  • WorkingDir string // Current directory (PWD) in which the command will be launched
  • Entrypoint strslice.StrSlice // Entrypoint to run when starting the container
  • NetworkDisabled bool json:",omitempty" // Is network disabled
  • MacAddress string json:",omitempty" // Mac Address of the container
  • OnBuild []string // ONBUILD metadata that were defined on the image Dockerfile
  • Labels map[string]string // List of labels set to this container
  • StopSignal string json:",omitempty" // Signal to stop a container

Resources:

  • CPUShares int64 json:"CpuShares" // CPU shares (relative weight vs. other containers)
  • Memory int64 // Memory limit (in bytes)
  • CgroupParent string // Parent cgroup.
  • BlkioWeight uint16 // Block IO weight (relative weight vs. other containers)
  • BlkioWeightDevice []*blkiodev.WeightDevice
  • BlkioDeviceReadBps []*blkiodev.ThrottleDevice
  • BlkioDeviceWriteBps []*blkiodev.ThrottleDevice
  • BlkioDeviceReadIOps []*blkiodev.ThrottleDevice
  • BlkioDeviceWriteIOps []*blkiodev.ThrottleDevice
  • CPUPeriod int64 json:"CpuPeriod" // CPU CFS (Completely Fair Scheduler) period
  • CPUQuota int64 json:"CpuQuota" // CPU CFS (Completely Fair Scheduler) quota
  • CpusetCpus string // CpusetCpus 0-2, 0,1
  • CpusetMems string // CpusetMems 0-2, 0,1
  • Devices []DeviceMapping // List of devices to map inside the container
  • DiskQuota int64 // Disk limit (in bytes)
  • KernelMemory int64 // Kernel memory limit (in bytes)
  • MemoryReservation int64 // Memory soft limit (in bytes)
  • MemorySwap int64 // Total memory usage (memory + swap); set -1 to enable unlimited swap
  • MemorySwappiness *int64 // Tuning container memory swappiness behaviour
  • OomKillDisable *bool // Whether to disable OOM Killer or not
  • PidsLimit int64 // Setting pids limit for a container
  • Ulimits []*units.Ulimit // List of ulimits to be set in the container
  • CPUCount int64 json:"CpuCount" // CPU count
  • CPUPercent int64 json:"CpuPercent" // CPU percent
  • IOMaximumIOps uint64 // Maximum IOps for the container system drive
  • IOMaximumBandwidth uint64 // Maximum IO in bytes per second for the container system drive
  • NetworkMaximumBandwidth uint64 // Maximum bandwidth of the network endpoint in bytes per second

HostConfig

  • Binds []string // List of volume bindings for this container
  • ContainerIDFile string // File (path) where the containerId is written
  • LogConfig LogConfig // Configuration of the logs for this container
  • NetworkMode NetworkMode // Network mode to use for the container
  • PortBindings nat.PortMap // Port mapping between the exposed port (container) and the host
  • RestartPolicy RestartPolicy // Restart policy to be used for the container
  • AutoRemove bool // Automatically remove container when it exits
  • VolumeDriver string // Name of the volume driver used to mount volumes
  • VolumesFrom []string // List of volumes to take from other container
  • CapAdd strslice.StrSlice // List of kernel capabilities to add to the container
  • CapDrop strslice.StrSlice // List of kernel capabilities to remove from the container
  • DNS []string json:"Dns" // List of DNS servers to look up
  • DNSOptions []string json:"DnsOptions" // List of DNSOption to look for
  • DNSSearch []string json:"DnsSearch" // List of DNSSearch to look for
  • ExtraHosts []string // List of extra hosts
  • GroupAdd []string // List of additional groups that the container process will run as
  • IpcMode IpcMode // IPC namespace to use for the container
  • Cgroup CgroupSpec // Cgroup to use for the container
  • Links []string // List of links (in the name:alias form)
  • OomScoreAdj int // Container preference for OOM-killing
  • PidMode PidMode // PID namespace to use for the container
  • Privileged bool // Is the container in privileged mode
  • PublishAllPorts bool // Should docker publish all exposed port for the container
  • ReadonlyRootfs bool // Is the container root filesystem read-only
  • SecurityOpt []string // List of string values to customize labels for MLS systems, such as SELinux.
  • StorageOpt map[string]string // Storage driver options per container.
  • Tmpfs map[string]string json:",omitempty" // List of tmpfs (mounts) used for the container
  • UTSMode UTSMode // UTS namespace to use for the container
  • UsernsMode UsernsMode // The user namespace to use for the container
  • ShmSize int64 // Total shm memory usage
  • Sysctls map[string]string json:",omitempty" // List of Namespaced sysctls used for the container
  • ConsoleSize [2]int // Initial console size
  • Isolation Isolation // Isolation technology of the container (eg default, hyperv)
  • Resources

EndpointSettings

  • IPAMConfig *EndpointIPAMConfig
  • Links []string
  • Aliases []string
  • NetworkID string
  • EndpointID string
  • Gateway string
  • IPAddress string
  • IPPrefixLen int
  • IPv6Gateway string
  • GlobalIPv6Address string
  • GlobalIPv6PrefixLen int
  • MacAddress string

NetworkingConfig

  • EndpointsConfig map[string]*EndpointSettings // Endpoint configs for each connecting network

Define approach to distributed image resolution

Currently, there is a field on Task discussing the concept of image resolution:

  192     // Resolved is the source resolved by the swarm cluster. This may be
  193     // identical, depending on the name provided in the JobSpec. For example,
  194     // the name field may be "redis", whereas this field would specify the
  195     // exact hash, "redis@sha256:...".
  196     oneof resolved {
  197         ImageSpec image = 7;
  198     }

For now, we are removing this. However, we need to discuss the final approach to image resolution and ensure that we have a supportable approach that ensures all nodes are running consistent images.

cc @aluzzardi

Handle duplicate Job submissions

Right now, running job create twice with the same spec will result in two jobs being created.

We should consider using the name as a primary key and detecting duplicates when creating jobs/networks/etc.

Example:

$ swarmctl job create -f foo.yml
<ID>
$ swarmctl job create -f foo.yml
Error: Job `foo` already exists
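A minimal in-memory sketch of the name-as-primary-key idea (the `jobStore` type and its `CreateJob` method are hypothetical, not existing SwarmKit code):

```go
package main

import "fmt"

// jobStore treats the job name as a primary key and rejects duplicate
// submissions, mirroring the desired `swarmctl job create` behavior.
type jobStore struct {
	byName map[string]string // name -> ID
}

func newJobStore() *jobStore {
	return &jobStore{byName: make(map[string]string)}
}

// CreateJob returns an error if a job with the same name already exists.
func (s *jobStore) CreateJob(name, id string) error {
	if _, ok := s.byName[name]; ok {
		return fmt.Errorf("Job `%s` already exists", name)
	}
	s.byName[name] = id
	return nil
}

func main() {
	s := newJobStore()
	fmt.Println(s.CreateJob("foo", "id1")) // <nil>
	fmt.Println(s.CreateJob("foo", "id2")) // Job `foo` already exists
}
```

In the real store this check would have to happen inside the same transaction as the insert, so two concurrent creates can't both pass the lookup.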

NodeID on Dispatcher.RegisterRequest should be removed or ignored

NodeID in api.RegisterRequest is currently in place to allow node registration to work without authentication and mTLS support. Since we plan to require mTLS, the presence of this field may become a problem, since its presence means it may be used.

We need to ensure that this field is properly validated against the authentication system or via mTLS before releasing.

@diogomonica @LK4D4

Race and panic in `ProcessRaftMessage`

==================
WARNING: DATA RACE
Read by goroutine 1146:
  github.com/docker/swarm-v2/manager/state.(*Node).ProcessRaftMessage()
      /vagrant/gowork/src/github.com/docker/swarm-v2/manager/state/raft.go:415 +0x55
  github.com/docker/swarm-v2/api._Manager_ProcessRaftMessage_Handler()
      /vagrant/gowork/src/github.com/docker/swarm-v2/api/manager.pb.go:314 +0x150
  github.com/docker/swarm-v2/vendor/google.golang.org/grpc.(*Server).processUnaryRPC()
      /vagrant/gowork/src/github.com/docker/swarm-v2/vendor/google.golang.org/grpc/server.go:505 +0x105b
  github.com/docker/swarm-v2/vendor/google.golang.org/grpc.(*Server).handleStream()
      /vagrant/gowork/src/github.com/docker/swarm-v2/vendor/google.golang.org/grpc/server.go:654 +0x1418
  github.com/docker/swarm-v2/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1()
      /vagrant/gowork/src/github.com/docker/swarm-v2/vendor/google.golang.org/grpc/server.go:323 +0xad

Previous write by goroutine 64:
  github.com/docker/swarm-v2/manager/state.(*Node).Start.func1()
      /vagrant/gowork/src/github.com/docker/swarm-v2/manager/state/raft.go:324 +0x8ca

Goroutine 1146 (running) created at:
  github.com/docker/swarm-v2/vendor/google.golang.org/grpc.(*Server).serveStreams.func1()
      /vagrant/gowork/src/github.com/docker/swarm-v2/vendor/google.golang.org/grpc/server.go:324 +0xa7
  github.com/docker/swarm-v2/vendor/google.golang.org/grpc/transport.(*http2Server).operateHeaders()
      /vagrant/gowork/src/github.com/docker/swarm-v2/vendor/google.golang.org/grpc/transport/http2_server.go:212 +0x17fa
  github.com/docker/swarm-v2/vendor/google.golang.org/grpc/transport.(*http2Server).HandleStreams()
      /vagrant/gowork/src/github.com/docker/swarm-v2/vendor/google.golang.org/grpc/transport/http2_server.go:276 +0x1166
  github.com/docker/swarm-v2/vendor/google.golang.org/grpc.(*Server).serveStreams()
      /vagrant/gowork/src/github.com/docker/swarm-v2/vendor/google.golang.org/grpc/server.go:325 +0x1dc
  github.com/docker/swarm-v2/vendor/google.golang.org/grpc.(*Server).serveNewHTTP2Transport()
      /vagrant/gowork/src/github.com/docker/swarm-v2/vendor/google.golang.org/grpc/server.go:312 +0x54d
  github.com/docker/swarm-v2/vendor/google.golang.org/grpc.(*Server).handleRawConn()
      /vagrant/gowork/src/github.com/docker/swarm-v2/vendor/google.golang.org/grpc/server.go:289 +0x5cb

Goroutine 64 (finished) created at:
  github.com/docker/swarm-v2/manager/state.(*Node).Start()
      /vagrant/gowork/src/github.com/docker/swarm-v2/manager/state/raft.go:329 +0x1bc
  github.com/docker/swarm-v2/manager/state.newInitNode()
      /vagrant/gowork/src/github.com/docker/swarm-v2/manager/state/raft_test.go:53 +0xb5f
  github.com/docker/swarm-v2/manager/state.newRaftCluster()
      /vagrant/gowork/src/github.com/docker/swarm-v2/manager/state/raft_test.go:123 +0x82
  github.com/docker/swarm-v2/manager/state.TestRaftBootstrap()
      /vagrant/gowork/src/github.com/docker/swarm-v2/manager/state/raft_test.go:164 +0x51
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:473 +0xdc
==================
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x68 pc=0x4ac505]

goroutine 1303 [running]:
panic(0xda76a0, 0xc82000e160)
    /usr/local/go/src/runtime/panic.go:464 +0x3ff
github.com/docker/swarm-v2/manager/state.(*Node).ProcessRaftMessage(0xc820138c40, 0x7f7383684370, 0xc823014630, 0xc820128f20, 0x0, 0x0, 0x0)
    /vagrant/gowork/src/github.com/docker/swarm-v2/manager/state/raft.go:415 +0x135
github.com/docker/swarm-v2/api._Manager_ProcessRaftMessage_Handler(0xe82980, 0xc820138c40, 0x7f7383684370, 0xc823014630, 0xc822d31c00, 0x0, 0x0, 0x0, 0x0)
    /vagrant/gowork/src/github.com/docker/swarm-v2/api/manager.pb.go:314 +0x151
github.com/docker/swarm-v2/vendor/google.golang.org/grpc.(*Server).processUnaryRPC(0xc82012d300, 0x7f73836f18f8, 0xc8224a47e0, 0xc822ec4d20, 0xc82037de00, 0x12e9490, 0xc823014600, 0x0, 0x0)
    /vagrant/gowork/src/github.com/docker/swarm-v2/vendor/google.golang.org/grpc/server.go:505 +0x105c
github.com/docker/swarm-v2/vendor/google.golang.org/grpc.(*Server).handleStream(0xc82012d300, 0x7f73836f18f8, 0xc8224a47e0, 0xc822ec4d20, 0xc823014600)
    /vagrant/gowork/src/github.com/docker/swarm-v2/vendor/google.golang.org/grpc/server.go:654 +0x1419
github.com/docker/swarm-v2/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc8224e6690, 0xc82012d300, 0x7f73836f18f8, 0xc8224a47e0, 0xc822ec4d20)
    /vagrant/gowork/src/github.com/docker/swarm-v2/vendor/google.golang.org/grpc/server.go:323 +0xae
created by github.com/docker/swarm-v2/vendor/google.golang.org/grpc.(*Server).serveStreams.func1
    /vagrant/gowork/src/github.com/docker/swarm-v2/vendor/google.golang.org/grpc/server.go:324 +0xa8
FAIL    github.com/docker/swarm-v2/manager/state    3.277s
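The race is between the goroutine spawned in Start (the write at raft.go:324) and the gRPC handler calling ProcessRaftMessage (the read at raft.go:415). One common fix is to guard the shared state with a mutex and reject messages until initialization is complete; the sketch below is an illustration of that pattern with a hypothetical `started` flag, not the actual node fields involved.

```go
package main

import (
	"fmt"
	"sync"
)

// node sketches one way to fix a race of this shape: the state initialized
// by the goroutine started in Start is only touched under a mutex, and
// ProcessRaftMessage refuses messages until initialization has finished.
type node struct {
	mu      sync.Mutex
	started bool
}

func (n *node) Start() {
	n.mu.Lock()
	n.started = true // stands in for the write at raft.go:324
	n.mu.Unlock()
}

func (n *node) ProcessRaftMessage() error {
	n.mu.Lock()
	defer n.mu.Unlock()
	if !n.started { // stands in for the unguarded read at raft.go:415
		return fmt.Errorf("node not started")
	}
	return nil
}

func main() {
	n := &node{}
	fmt.Println(n.ProcessRaftMessage()) // node not started
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { defer wg.Done(); n.Start() }()
	wg.Wait()
	fmt.Println(n.ProcessRaftMessage()) // <nil>
}
```

Returning an error instead of panicking also fixes the nil pointer dereference seen after the race report.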

Deepcopy plugin incompatible with types that don't have Copy

Right now, the deepcopy plugin isn't compatible with types that aren't generated with deepcopy enabled.

Here is an error encountered where we reference an external type and it fails to generate valid code:

api/manager.pb.go:194: m.Msg.Copy undefined (type *raftpb.Message has no field or method Copy)

raftpb is imported from github.com/coreos/etcd/raft/raftpb.

There are a few options:

  1. Define an extension that gets set when we generate Copy for a type. If that is unset, we just generate copy code inline.
  2. Detect Copy on the target type. This may require some tricky reflection or dependency parsing.
  3. Do nothing.
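For option 2, the detection itself is straightforward at runtime via reflection (a code generator would more likely inspect the dependency's AST, which is the tricky part mentioned above). A sketch, with hypothetical example types:

```go
package main

import (
	"fmt"
	"reflect"
)

// hasCopy reports whether a value's type defines a Copy method, which is
// roughly what option 2 needs to detect before emitting `m.Field.Copy()`.
func hasCopy(v interface{}) bool {
	_, ok := reflect.TypeOf(v).MethodByName("Copy")
	return ok
}

// withCopy mimics a type generated with the deepcopy plugin enabled.
type withCopy struct{ X int }

func (w *withCopy) Copy() *withCopy { c := *w; return &c }

// withoutCopy mimics an external type like raftpb.Message.
type withoutCopy struct{ X int }

func main() {
	fmt.Println(hasCopy(&withCopy{}))    // true
	fmt.Println(hasCopy(&withoutCopy{})) // false
}
```

When the method is absent, the generator would fall back to emitting inline copy code for that field, as in option 1.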

cc @mrjana

Specify declarative yaml format

We need to determine whether the manifest format in Swarm2:

  • is identical to Compose v2 format
  • implements a subset (most commonly used features) of Compose v2
  • is a completely breaking change with Compose v2

Related issues to consider:

  • there are a lot of ServiceConfig options -- which would we support?
  • @stevvooe's idea of adding an extra containers level to the manifest:
web:
    [...]
    container:
      image: nginx
  • what team will own this file format and the parser long term
  • what do users want/expect

Follows up on conversation in PR #136

Reduce store boilerplate

There are a lot of things that need to change for each new object type that is added.

  • Interface definitions for read-only access to this object type, and for read/write access
  • A series of functions to implement these interfaces
  • Event types for updates, creations, and deletions (for watch)
  • StoreAction types for updates, creations, and deletions
  • Updates to several functions such as DeleteAll, Save, Restore, CopyFrom, applyStoreAction, newStoreAction...

To reduce code duplication, we should consider codegen'ing the functions that need to be implemented for each object type, and the functions that need to do something for each object type.

Maybe we can do something about the number of types as well. For example, StoreAction_* and Event* could possibly be merged, since they do essentially the same thing (convey a state change).

cc @aluzzardi @stevvooe @mrjana

Specify resource selectors

Several features require selecting sets of resources through selectors. See #199 for one example. We may want to use this for networking, volume selection and even load balancing. We need a common Selector type for use across the API to express this.

A few common requirements:

  1. Match over attributes, types, names, and labels, fully or partially.
  2. Additive selectors describe an intersection.

Whatever we choose should be compatible with docker's system.

  1. Docker's engine filters: https://godoc.org/github.com/docker/engine-api/types/filters
  2. Swarm's constraint filters: https://docs.docker.com/swarm/scheduler/filter/
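To make requirement 2 concrete, a selector where each populated field narrows the match behaves as an intersection. The `Selector` and `Resource` types below are a hypothetical sketch of the common type being proposed, not an existing API:

```go
package main

import (
	"fmt"
	"strings"
)

// Selector narrows matches with each populated field, so additive
// selectors describe an intersection (requirement 2 above).
type Selector struct {
	NamePrefix string            // partial match on name (requirement 1)
	Labels     map[string]string // exact match on labels
}

// Resource is a stand-in for any cluster-level object (network, volume...).
type Resource struct {
	Name   string
	Labels map[string]string
}

// Matches returns true only if every populated selector field matches.
func (s Selector) Matches(r Resource) bool {
	if s.NamePrefix != "" && !strings.HasPrefix(r.Name, s.NamePrefix) {
		return false
	}
	for k, v := range s.Labels {
		if r.Labels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	r := Resource{Name: "redis-1", Labels: map[string]string{"env": "prod"}}
	fmt.Println(Selector{NamePrefix: "redis"}.Matches(r))                     // true
	fmt.Println(Selector{Labels: map[string]string{"env": "dev"}}.Matches(r)) // false
}
```

Compatibility with docker's filters would then reduce to translating each filter term into one of these selector fields.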

@aluzzardi @vieux

Raft backend improvements and integration points

Items to take care of on the raft backend:

  • Decouple transport from the backend (include transport interface so that it could be plugged to anything)
  • Improve cluster membership management (send back the member list on Join, handle additional error cases, disallow leaving the cluster if this would result in a loss of quorum)
  • [#89] Synchronous Put (replicate processInternalRaftRequest() method in etcd)
  • Plug the memory store to the backend (after synchronous put is taken care of and #70 is merged)
  • [#98] First steps: Include snapshotting and durability mechanism for logs (see snap/snapshotter.go in etcd)

Feel free to add more if you see more items.

/cc @aaronlehmann @aluzzardi

Node management upon Manager failure

When the manager starts, it should:

  • Set all nodes (from state) status to UNKNOWN.
  • Start the heartbeat timer (with a "grace period" to give time for agents to connect)
  • If the agents come back online, they will be READY
  • If agents fail to come back within grace period, then the heartbeat will set them to DOWN, triggering the drainer.

This should handle all use cases, including leader re-election.

Nullable fields in proto

We should have a coherent strategy for setting the nullable attribute in protos: right now it's very ad-hoc and inconsistent.

I previously argued we should default to nullable = false but now I have some doubts.

In protos, nil pointers are used to check whether a field was set or not and it's the only way to know that. If we set everything to nullable=false, we have no way to know whether the field was explicitly set to the zero value or unset.

The problem gets worse with other languages: nullable=false only affects the generated Go code. If, for instance, a Python client doesn't set a particular field, we'll receive a valid (zeroed) structure in our Go server.

We also pay the price during marshalling/unmarshalling. In protos, unset fields don't get marshalled at all. Our non-nullable fields however get marshalled whether we've set them or not.

Example:

In NodeDescription::MarshalTo:

    if m.Resources != nil {
        data[i] = 0x1a
        i++

In NodeSpec::MarshalTo:

    data[i] = 0xa
    i++
    i = encodeVarintTypes(data, i, uint64(m.Meta.Size()))
    n1, err := m.Meta.MarshalTo(data[i:])

In the first case, we add bytes (i++) to the serialized message only if NodeDescription.Resources is set; in the second case, we add bytes (i++) every single time, even if NodeSpec.Meta is not set.

/cc @stevvooe @aaronlehmann @vieux

Resource Control

Summary

We have to handle resources both at the scheduler level and at the runtime level.

Swarm provided a single set of resources (CPU, Memory) which was used both for scheduling and constraining.

Several users were unhappy with this, since specifying a resource limit resulted in a reservation. For instance, if the user wanted to limit a container to 1GB of RAM (-m 1g), the scheduler assumed that gigabyte was reserved and would "take it away" from the machine.

In SwarmKit, we should let users specify the two independently:

  • Reservation: Amount of resources reserved for this task on the node. A node should not go beyond its total capacity; that is, the scheduler should only choose a node if: sumOfAllTasksReservations(node) + task.Resources.Reserved < node.Resources
  • Limits: Limit the amount (quota) of resources the task can use. This is effectively the -m and -c flags of the engine.
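The scheduling predicate above can be sketched as a small fitness check. The `Resources` struct and field names here are illustrative stand-ins (and this version uses ≤ rather than strict <); limits deliberately play no part in the decision:

```go
package main

import "fmt"

// Resources is a minimal stand-in for the node/task resource types.
type Resources struct {
	MemoryBytes int64
	NanoCPUs    int64
}

// fits implements the reservation-only scheduling predicate: a node is
// eligible only if the sum of existing reservations plus the new task's
// reservation stays within the node's capacity. Limits are not consulted.
func fits(reserved, taskReservation, capacity Resources) bool {
	return reserved.MemoryBytes+taskReservation.MemoryBytes <= capacity.MemoryBytes &&
		reserved.NanoCPUs+taskReservation.NanoCPUs <= capacity.NanoCPUs
}

func main() {
	capacity := Resources{MemoryBytes: 4 << 30, NanoCPUs: 4e9} // 4 GiB, 4 CPUs
	reserved := Resources{MemoryBytes: 3 << 30, NanoCPUs: 1e9} // existing tasks
	fmt.Println(fits(reserved, Resources{MemoryBytes: 1 << 30, NanoCPUs: 1e9}, capacity)) // true
	fmt.Println(fits(reserved, Resources{MemoryBytes: 2 << 30, NanoCPUs: 1e9}, capacity)) // false
}
```

A task with a high limit but a small (or zero) reservation would still pass this check, which is exactly the decoupling users asked for.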

CPU:

Swarm was using --cpu-shares which is very approximate. Docker later introduced --cpu-period and --cpu-quota which use CFS (Completely Fair Scheduler). We should leverage that.
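Under CFS, a fractional CPU limit maps to a quota paired with a period: roughly quota = cpus × period. A sketch of that conversion, assuming the common 100ms default period:

```go
package main

import "fmt"

const cfsPeriodUs = 100000 // assumed default CFS period: 100ms, in microseconds

// cfsQuota converts a fractional CPU limit (e.g. 1.5 CPUs) into the
// --cpu-quota value paired with the default --cpu-period, expressing a
// hard CPU limit via CFS rather than the approximate --cpu-shares.
func cfsQuota(cpus float64) int64 {
	return int64(cpus * cfsPeriodUs)
}

func main() {
	fmt.Println(cfsQuota(1.5)) // 150000
	fmt.Println(cfsQuota(0.5)) // 50000
}
```

So "1.5 CPUs" becomes --cpu-period=100000 --cpu-quota=150000, which CFS enforces as a hard ceiling regardless of how busy the other containers are.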

Memory:

Reservation is actually trickier than just scheduling.

Tasks can have higher limits than their reservation. There might also be tasks on the same node that have neither reservations nor limits. We might want to let tasks "burst" beyond their limits if no one else is using the resources.

Network:

@mrjana Is there anything we can do with network resources?

Disk:

Not sure if this is tied to volumes rather than containers.

/cc @aaronlehmann @LK4D4 @tonistiigi @icecrime

This is a combination of scheduling plus low-level system stuff; we're going to need the help of the engine to design it.

Metrics Collection

We should have a way to do metrics collection, either by documenting how to integrate an existing solution or by shipping a default.

A possible solution would be to deploy Prometheus and run collectors on every node.

Add concept of namespacing cluster-level resources

For example, let's say we have the following:

services:
  web:
    networks:
      - front
      - back
  db:
    networks:
      - back
networks:
  front: {}
  back: {}  

One should be able to run the same service on different network sets. For example, "production/back" and "development/back" could exist to differentiate between production and development.

We'd want to apply the same logic to volumes, as well.
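
The naming scheme above could be sketched as a pair of helpers (hypothetical names; the "default" fallback namespace is an assumption, not a decided design):

```go
package main

import (
	"fmt"
	"strings"
)

// scopedName prefixes a cluster-level resource (network, volume, ...)
// with its namespace, e.g. "production" + "back" -> "production/back".
func scopedName(namespace, name string) string {
	return namespace + "/" + name
}

// splitScope is the inverse; a resource referenced without a namespace
// falls back to a default scope.
func splitScope(scoped string) (namespace, name string) {
	if i := strings.IndexByte(scoped, '/'); i >= 0 {
		return scoped[:i], scoped[i+1:]
	}
	return "default", scoped
}

func main() {
	fmt.Println(scopedName("production", "back")) // production/back
	ns, n := splitScope("development/back")
	fmt.Println(ns, n) // development back
}
```

With this, the compose file above stays unchanged; only the namespace the stack is deployed into decides which "back" network the services attach to.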

Remove ID from NodeSpec

A node ID should never come from user input or from unverified reports through the dispatcher API. NodeID should be moved from NodeSpec to Node.

Clarify agent registration states and task acceptance state.

Please see #118 (comment).

After discussion, it seems the right approach is to separate the states. First, we have node membership state:

| State | Description |
| --- | --- |
| DOWN | Node heartbeat grace period timed out; the node is not available. |
| DISCONNECTED | Node is still registered but not reporting. It may reconnect and resume on another manager. |
| PENDING | Node has connected but has not yet registered. May be accompanied by an error. |
| READY | Node is ready for instruction. |

The membership state of a node is detected by the dispatchers, according to the agent's registration behavior and connection state. It would be reported by the dispatcher and would go directly on the Node object.

To complement the above, we have a user-controlled node status, known as the availability state:

| State | Description |
| --- | --- |
| DRAIN | All tasks should be removed from the node. |
| PAUSE | Assign no new tasks; leave existing tasks alone. |
| ACTIVE | Schedule tasks on this node as normal. |

This state would belong in NodeSpec, as provided by the user.

  • Define NodeMembershipState and NodeAvailabilityState in protobuf
  • Separate NodeSpec (user control) from a NodeStatus report coming from the agents.
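
How the two separated states compose can be sketched in Go (illustrative enums; the actual protobuf definitions may use different names and values):

```go
package main

import "fmt"

// NodeMembershipState is dispatcher-reported and lives on Node.
type NodeMembershipState int

const (
	Down NodeMembershipState = iota
	Disconnected
	Pending
	Ready
)

// NodeAvailabilityState is user-set and lives in NodeSpec.
type NodeAvailabilityState int

const (
	Active NodeAvailabilityState = iota
	Pause
	Drain
)

// schedulable shows why the separation matters: only a node that is
// both READY (observed) and ACTIVE (desired) should receive new tasks.
func schedulable(m NodeMembershipState, a NodeAvailabilityState) bool {
	return m == Ready && a == Active
}

func main() {
	fmt.Println(schedulable(Ready, Active))   // true
	fmt.Println(schedulable(Ready, Pause))    // false: user paused scheduling
	fmt.Println(schedulable(Pending, Active)) // false: node not yet registered
}
```

Keeping the observed state and the desired state in separate fields means neither the dispatcher nor the user can clobber the other's writes.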

Updates to this description:

  • Named the two states membership state and availability state (thanks @aaronlehmann / @aluzzardi)
  • Clarified roles of different states
  • Added PENDING state
  • Removed past tense on availability state

@aluzzardi @LK4D4
