natganker

KISS NAT takeover for Amazon AWS

High-Level Overview

┌─────────────────────────────────────────────────────────┐
│                       Amazon VPC                        │
│     ┏━━━━━━━━━━┓                           ┌ ─ ─ ─ ─ ─  │
│     ┃ nat-gw-1 ┃─────ssh/ping heartbeat────  nat-gw-2 │ │
│     ┗━━━━━━━━━━┛                           └ ─ ─ ─ ─ ─  │
│           │                                             │
│           │                                             │
│           ├────────────────────────┐                    │
│           │                        │                    │
│           │                        │                    │
│           │                        │                    │
│ ┌─── rtr-table-a ──┐     ┌─── rtr-table-b ──┐           │
│ │ ┌~~~~~~~~~~~~~~┐ │     │ ┌~~~~~~~~~~~~~~┐ │           │
│ │ │   subnet 1   │ │     │ │   subnet 3   │ │           │
│ │ └~~~~~~~~~~~~~~┘ │     │ └~~~~~~~~~~~~~~┘ │           │
│ │ ┌~~~~~~~~~~~~~~┐ │     │ ┌~~~~~~~~~~~~~~┐ │           │
│ │ │   subnet 2   │ │     │ │   subnet 4   │ │           │
│ │ └~~~~~~~~~~~~~~┘ │     │ └~~~~~~~~~~~~~~┘ │           │
│ └──────────────────┘     └──────────────────┘           │
└─────────────────────────────────────────────────────────┘
                             │
                             │heartbeat failure
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                       Amazon VPC                        │
│     ┌ ─ ─ ─ ─ ─                            ┏━━━━━━━━━━┓ │
│       nat-gw-1 │─────ssh/ping heartbeat────┃ nat-gw-2 ┃ │
│     └ ─ ─ ─ ─ ─                            ┗━━━━━━━━━━┛ │
│                                                  │      │
│                                                  │      │
│           ┌────────────────────────┬─────────────┘      │
│           │                        │                    │
│           │                        │                    │
│           │                        │                    │
│ ┌─── rtr-table-a ──┐     ┌─── rtr-table-b ──┐           │
│ │ ┌~~~~~~~~~~~~~~┐ │     │ ┌~~~~~~~~~~~~~~┐ │           │
│ │ │   subnet 1   │ │     │ │   subnet 3   │ │           │
│ │ └~~~~~~~~~~~~~~┘ │     │ └~~~~~~~~~~~~~~┘ │           │
│ │ ┌~~~~~~~~~~~~~~┐ │     │ ┌~~~~~~~~~~~~~~┐ │           │
│ │ │   subnet 2   │ │     │ │   subnet 4   │ │           │
│ │ └~~~~~~~~~~~~~~┘ │     │ └~~~~~~~~~~~~~~┘ │           │
│ └──────────────────┘     └──────────────────┘           │
└─────────────────────────────────────────────────────────┘

Created with Monodraw

Requirements

Two NAT gateways:
- Primary: actively forwards traffic and is associated with an elastic IP.
- Secondary: runs natganker to passively monitor the primary instance and takeover if a healthcheck fails.
Both gateways must be able to perform healthchecks over the internal IP. This is easily solved by allowing ping or SSH to/from the internal subnet which both NAT gateways reside.
The primary NAT gateway instance should be the default route for routing tables in the region.
Both NAT instances requires the following IAM roles:
- ec2:DescribeInstances
- ec2:ModifyInstanceAttribute
- ec2:DescribeAddresses
- ec2:AssociateAddress
- ec2:DisassociateAddress
- ec2:DescribeSubnets
- ec2:DescribeRouteTables
- ec2:CreateRoute
- ec2:ReplaceRoute

Usage

There are three ways to run natganker:

Run on secondary NAT gateway and perform ping health check:
- ./natganker.sh <primary-nat-instance-id> <primary-nat-elastic-ip>
Run on secondary NAT gateway and perform SSH health check:
- ./natganker.sh <primary-nat-instance-id> <primary-nat-elastic-ip> ssh
Force failover to secondary NAT gateway:
- ./natganker.sh <primary-nat-instance-id> <primary-nat-elastic-ip> forcefail

In (1) and (2) if a healthcheck fails, then the secondary NAT will replace itself as default route for all route tables defaulting to the primary NAT gateway. The script is idempotent so it can be rerun and will perform any necessary operations until the secondary NAT has the associated elastic IP and is the default route for all associated routing tables.

Using Monit

To ensure the script runs persistently you can use monit.

Here is an example configuration file which is placed in /etc/monit.d/natganker:

check process natganker matching bash.*natganker
   start program = "/root/natganker/natganker.sh <primary-nat-instance-id> <primary-nat-elastic-ip>" as uid "root" and gid "root"
   stop program = "/usr/bin/pkill -f bash.*natganker" as uid "root" and gid "root"

nnathan / natganker Goto Github PK

natganker's Introduction

natganker

High-Level Overview

Requirements

Usage

Using Monit

natganker's People

Contributors

Stargazers

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs