
Puppet Node Classifier backup / restore and transformation of hostnames

Home Page: https://www.openinfrastructure.co

License: MIT License


ncio's Introduction

ncio - Puppet Node Classifier backup / restore

This project implements a small command line utility to back up and restore node classification data. Its intended purpose is to back up node classification groups on a primary monolithic PE master and restore them on a secondary monolithic PE master, keeping node classification groups in sync and ready in the event the secondary master needs to take over service from the primary.

Transformation

To achieve the goal of replicating node classification groups from one PE monolithic master to a secondary monolithic master, certain values need to be transformed. For example, consider a primary named master1.puppet.vm and a secondary named master2.puppet.vm. Both are monolithic masters. When the backup is taken on the primary, the primary's hostname is embedded in the data. This is problematic because it causes misconfiguration errors when the backup is imported into the secondary, which has a different name.

To illustrate, consider the PuppetDB classification group:

{
  "name": "PE PuppetDB",
  "rule": [
    "or",
    [
      "=",
      "name",
      "master1.puppet.vm"
    ]
  ],
  "classes": {
    "puppet_enterprise::profile::puppetdb": {
    }
  }
}

Transformation from master1 to master2 is possible:

export PATH="/opt/puppetlabs/puppet/bin:$PATH"
ncio --uri https://master1.puppet.vm:4433/classifier-api/v1 backup \
 | ncio transform --hostname master1.puppet.vm:master2.puppet.vm \
 | ncio --uri https://master2.puppet.vm:4433/classifier-api/v1 restore
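The idea behind the transform step can be sketched in a few lines of Ruby. This is a hedged illustration of the concept only, not ncio's actual implementation: it walks a group's rule and substitutes one hostname for another wherever it appears as a literal value.

```ruby
require 'json'

# Illustrative sketch only -- not ncio's real transform code.
# Recursively walk a classification rule and replace one hostname
# with another wherever it appears as a literal value.
def rewrite_rule(rule, from_host, to_host)
  case rule
  when Array     then rule.map { |e| rewrite_rule(e, from_host, to_host) }
  when from_host then to_host
  else rule
  end
end

group = JSON.parse('{"name": "PE PuppetDB",
                     "rule": ["or", ["=", "name", "master1.puppet.vm"]],
                     "classes": {"puppet_enterprise::profile::puppetdb": {}}}')
group['rule'] = rewrite_rule(group['rule'], 'master1.puppet.vm', 'master2.puppet.vm')
```

After this, the group's rule matches master2.puppet.vm instead of master1.puppet.vm, which is the effect the `ncio transform --hostname` step has on the backup stream.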

This method of "replicating" node classification data has some caveats. It's only been tested on PE Monolithic masters. The method assumes master1 and master2 share the same Certificate Authority. By default, only the default puppet_enterprise classification groups are transformed.

Additional groups and classes may be processed by chaining transformation processes and getting creative with the use of the --class-matcher option.

Installation

Install this tool on the same node running the node classification service:

$ sudo /opt/puppetlabs/puppet/bin/gem install ncio
Successfully installed ncio-0.1.0
Parsing documentation for ncio-0.1.0
Installing ri documentation for ncio-0.1.0
Done installing documentation for ncio after 0 seconds
1 gem installed

Usage

Ncio will attempt to use the host certificate from /etc/puppetlabs/puppet/ssl/certs/$FQDN.pem if it exists on the same node as the Node Classifier. If this certificate has sufficient access, no configuration is necessary; the default options will work to back up and restore node classification data.

sudo -H -u pe-puppet /opt/puppetlabs/puppet/bin/ncio backup > /var/tmp/backup.json
I, [2016-06-28T19:25:55.507684 #2992]  INFO -- : Backup completed successfully!

If this file does not exist, ncio needs to use a different client certificate. It is recommended to use the same certificate used by the Puppet agent, which should be whitelisted for node classification API access. The whitelist of certificates is located at /etc/puppetlabs/console-services/rbac-certificate-whitelist.
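As a quick sanity check before running a backup, you can verify that a certname appears in the whitelist. A minimal Ruby sketch, assuming the default PE whitelist path mentioned above; the helper name is ours for illustration, not part of ncio:

```ruby
# Hypothetical helper, not part of ncio: returns true when certname
# appears on its own line in the RBAC certificate whitelist.
DEFAULT_WHITELIST = '/etc/puppetlabs/console-services/rbac-certificate-whitelist'.freeze

def whitelisted?(certname, path = DEFAULT_WHITELIST)
  File.exist?(path) && File.readlines(path).any? { |line| line.strip == certname }
end
```

If this returns false for the certificate you pass via --cert, a 401 "Route requires authentication" error from the classifier API is the expected failure mode.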

sudo -H -u pe-puppet /opt/puppetlabs/puppet/bin/ncio \
  --cert /etc/puppetlabs/puppet/ssl/certs/${HOSTNAME}.pem \
  --key  /etc/puppetlabs/puppet/ssl/private_keys/${HOSTNAME}.pem \
  backup > /var/tmp/backup.json
I, [2016-06-28T19:28:48.236257 #3148]  INFO -- : Backup completed successfully!

Logging

The status of backup and restore operations is logged to syslog by default. The daemon facility is used so that messages are written to files on a wide variety of systems that log daemon messages by default. A general exception handler logs a backtrace in JSON format to help log processors and notification systems like Splunk and Logstash.

Here's an example of a failed restore triggering the catch-all handler:

Jun 29 12:12:21 Jeff-McCune ncio[51474]: ERROR Restoring backup: {
  "error": "RuntimeError",
  "message": "Some random error",
  "backtrace": [
    "/Users/jeff/projects/puppet/ncio/lib/ncio/app.rb:94:in `restore_groups'",
    "/Users/jeff/projects/puppet/ncio/lib/ncio/app.rb:59:in `run'",
    "/Users/jeff/projects/puppet/ncio/exe/ncio:5:in `<top (required)>'"
  ]
}

Log to the console using the --no-syslog command line option.

ncio --no-syslog restore --file backup.json

The tool can only log to either syslog or the console at this time. Multiple log destinations are not currently supported.

Retrying Connections

It can take some time for the pe-console-services service to come online. To make things as robust as possible, consider using the --retry-connections global option, which allows ncio to retry API connections while the service comes online. This option addresses the following use case:

systemctl start pe-console-services.service
ncio --retry-connections backup

In this scenario ncio will retry API connections, eventually succeeding or timing out. Ideally the service will come online before the timeout expires.
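The behavior can be sketched as a simple connect-and-retry loop. This is an illustration of the idea with hypothetical helper and parameter names, not ncio's internal code:

```ruby
require 'socket'

# Illustrative retry loop (hypothetical helper, not ncio internals):
# keep attempting a TCP connection until one succeeds or the deadline
# passes, then re-raise the last connection error.
def wait_for_service(host, port, timeout: 60, interval: 5)
  deadline = Time.now + timeout
  begin
    TCPSocket.new(host, port).close
    true
  rescue Errno::ECONNREFUSED, Errno::EHOSTUNREACH
    raise if Time.now >= deadline  # give up: propagate the last error
    sleep interval
    retry
  end
end
```

The same shape (rescue, sleep, retry until a deadline) applies whether the check is a raw TCP connect, as here, or a full API request.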

Replication

Node classification data can be replicated between a primary and a secondary with the following shell script. Call it from cron on a periodic basis:

#! /bin/bash
#
# This shell script is intended to be executed from cron on a periodic basis.
# The goal is to keep a Standby PE Monolithic master in sync with an active
# Primary.  Pass the FQDN of the primary as ARG 1 and the FQDN of the secondary
# as ARG 2
#
# In a DR situation when the secondary becomes active, block replication by
# touching the lockfile.  This will prevent any changes made to the standby from
# being clobbered as soon as the primary comes back online.
#
# Prior to re-enabling replication after a DR situation, replicate back to the
# primary by reversing the direction of this script.

set -euo pipefail

PRIMARY="$1"
STANDBY="$2"

SOURCE="https://${PRIMARY}:4433/classifier-api/v1"
PATH="/opt/puppetlabs/puppet/bin:$PATH"
lockfile='/etc/ncio_do_not_replicate'

log() {
  logger -t ncio-replicate -p daemon.warn -s "$1"
}

if [[ -e "$lockfile" ]]; then
  log "WARN: Replication aborted, $lockfile exists!"
  exit 1
fi

# Capture the pipeline's exit status without tripping `set -e`.
rval=0
ncio --uri "$SOURCE" --retry-connections backup \
  | ncio transform --hostname "${PRIMARY}:${STANDBY}" \
  | ncio --retry-connections restore || rval=$?

[[ $rval -eq 0 ]] && STATUS='OK' || STATUS='ERROR'
msg="INFO: Finished replicating puppet classification groups."
log "$msg STATUS=${STATUS} EXITCODE=${rval} (touch $lockfile to disable)"
exit $rval

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/jeffmccune/ncio.

License

The gem is available as open source under the terms of the MIT License.


ncio's Issues

ncio backups hang with PE 2017.3.5

Overview

Using the latest PE 2017.3.5 on CentOS 7.4, I'm experiencing hangs 100% of the time when running ncio backup. This also occurs on PE 2017.3.2.

Environment

Running PE inside a Docker container. I've done this many times before, so I don't think this is the problem.

[root@pe-puppet /]# ncio --version
ncio 2.0.0 (c) 2016 Jeff McCune

Expected result

Expected to be able to dump the classifier database

Actual result

ncio backup hangs until killed (Ctrl+C). An equivalent curl request before and after the failure works, so the server is operational.

[root@pe-puppet /]# ncio --debug --no-syslog backup
W, [2018-04-19T05:55:51.031613 #11796]  WARN -- : Starting Node Classification Backup using GET https://pe-puppet.localdomain:4433/classifier-api/v1/groups
^CF, [2018-04-19T05:55:52.390937 #11796] FATAL -- : ERROR Obtaining backup: {
  "error": "Interrupt",
  "message": "",
  "backtrace": [
    "/opt/puppetlabs/puppet/lib/ruby/2.4.0/net/protocol.rb:176:in `wait_readable'",
    "/opt/puppetlabs/puppet/lib/ruby/2.4.0/net/protocol.rb:176:in `rbuf_fill'",
    "/opt/puppetlabs/puppet/lib/ruby/2.4.0/net/protocol.rb:154:in `readuntil'",
    "/opt/puppetlabs/puppet/lib/ruby/2.4.0/net/protocol.rb:164:in `readline'",
    "/opt/puppetlabs/puppet/lib/ruby/2.4.0/net/http/response.rb:40:in `read_status_line'",
    "/opt/puppetlabs/puppet/lib/ruby/2.4.0/net/http/response.rb:29:in `read_new'",
    "/opt/puppetlabs/puppet/lib/ruby/2.4.0/net/http.rb:1446:in `block in transport_request'",
    "/opt/puppetlabs/puppet/lib/ruby/2.4.0/net/http.rb:1443:in `catch'",
    "/opt/puppetlabs/puppet/lib/ruby/2.4.0/net/http.rb:1443:in `transport_request'",
    "/opt/puppetlabs/puppet/lib/ruby/2.4.0/net/http.rb:1416:in `request'",
    "/opt/puppetlabs/puppet/lib/ruby/2.4.0/net/http.rb:1409:in `block in request'",
    "/opt/puppetlabs/puppet/lib/ruby/2.4.0/net/http.rb:877:in `start'",
    "/opt/puppetlabs/puppet/lib/ruby/2.4.0/net/http.rb:1407:in `request'",
    "/opt/puppetlabs/puppet/lib/ruby/gems/2.4.0/gems/ncio-2.0.0/lib/ncio/http_client.rb:77:in `request'",
    "/opt/puppetlabs/puppet/lib/ruby/gems/2.4.0/gems/ncio-2.0.0/lib/ncio/api/v1.rb:77:in `request_without_timeout'",
    "/opt/puppetlabs/puppet/lib/ruby/gems/2.4.0/gems/ncio-2.0.0/lib/ncio/api/v1.rb:86:in `request'",
    "/opt/puppetlabs/puppet/lib/ruby/gems/2.4.0/gems/ncio-2.0.0/lib/ncio/api/v1.rb:101:in `groups'",
    "/opt/puppetlabs/puppet/lib/ruby/gems/2.4.0/gems/ncio-2.0.0/lib/ncio/app.rb:81:in `backup_groups'",
    "/opt/puppetlabs/puppet/lib/ruby/gems/2.4.0/gems/ncio-2.0.0/lib/ncio/app.rb:57:in `run'",
    "/opt/puppetlabs/puppet/lib/ruby/gems/2.4.0/gems/ncio-2.0.0/exe/ncio:5:in `<top (required)>'",
    "/opt/puppetlabs/puppet/bin/ncio:23:in `load'",
    "/opt/puppetlabs/puppet/bin/ncio:23:in `<main>'"
  ]
}
Analysis

Spent some time debugging this today and have verified the cause as the use of chunked transfer encoding in the REST calls.

It's possible this is related to the container-based PE instance I'm using, although I don't know why that would be the case. Is anyone else seeing this issue?

Workaround

File: /lib/ncio/api/v1.rb
Comment out line 26, e.g.:

      DEFAULT_HEADERS = {
        'Content-Type' => 'application/json',
#        'Transfer-Encoding' => 'chunked'
      }.freeze

After this change, the backup command completes instantly, as expected:

[feature] retry-wait with timeout if puppetmaster is not up yet

Overview

If puppetserver is still loading, ncio fails with an error. It would be cool if ncio could automatically retry the command while the server boots, until it either succeeds or times out.

Current error

If the puppet server is down, the error currently looks like this:

root@pe-puppet:/# ncio backup
/opt/puppetlabs/puppet/lib/ruby/2.1.0/net/http.rb:879:in `initialize': Connection refused - connect(2) for "pe-puppet.localdomain" port 4433 (Errno::ECONNREFUSED)

Improvement

We could catch Errno::ECONNREFUSED, add a switch --connect-timeout, and retry, say, every 5 seconds until we either hit the connection timeout or get a hard fail/success from the classifier.

error running on older rubies

If a user accidentally installs the gem and runs it under an older Ruby, the error below occurs. The workaround is of course to use Puppet's bundled modern version of Ruby:

/opt/puppetlabs/puppet/bin/gem install ncio
/opt/puppetlabs/puppet/bin/ncio backup
/usr/share/ruby/syslog/logger.rb:177:in `initialize': wrong number of arguments (2 for 0..1) (ArgumentError)
	from /usr/local/share/gems/gems/ncio-2.0.0/lib/ncio/support.rb:28:in `new'
	from /usr/local/share/gems/gems/ncio-2.0.0/lib/ncio/support.rb:28:in `syslog_logger'
	from /usr/local/share/gems/gems/ncio-2.0.0/lib/ncio/support.rb:20:in `reset_logging!'
	from /usr/local/share/gems/gems/ncio-2.0.0/lib/ncio/support.rb:80:in `reset_logging!'
	from /usr/local/share/gems/gems/ncio-2.0.0/lib/ncio/app.rb:41:in `reset!'
	from /usr/local/share/gems/gems/ncio-2.0.0/lib/ncio/app.rb:34:in `initialize'
	from /usr/local/share/gems/gems/ncio-2.0.0/exe/ncio:4:in `new'
	from /usr/local/share/gems/gems/ncio-2.0.0/exe/ncio:4:in `<top (required)>'
	from /usr/local/bin/ncio:23:in `load'
	from /usr/local/bin/ncio:23:in `<main>'

[improvement] half-installed puppet masters give un-authenticated error

Overview

In the current version of PE (2016.2.0), the certificate whitelist is not updated until the initial puppet run has completed. If puppet orchestrator has not been enabled on the master, then any puppet code using its new language features generates a syntax error (there is a ticket on this ...somewhere), preventing puppet from running and thus preventing us from running NCIO at all.

It took me a while to figure this out so it would be great to express the above to the user somehow.

Message

Currently, users encountering the above situation receive the message:

[root@pupper-sbxr101 vagrant]# ncio backup
/opt/puppetlabs/puppet/lib/ruby/gems/2.1.0/gems/ncio-1.0.1/lib/ncio/api/v1.rb:68:in `groups': Expected 200 response, got 401 body: {"kind":"puppetlabs.rbac/user-unauthenticated","msg":"Route requires authentication","redirect":"/classifier-api/v1/groups?inherited=false"} (Ncio::Api::V1::ApiError)
    from /opt/puppetlabs/puppet/lib/ruby/gems/2.1.0/gems/ncio-1.0.1/lib/ncio/app.rb:75:in `backup_groups'
    from /opt/puppetlabs/puppet/lib/ruby/gems/2.1.0/gems/ncio-1.0.1/lib/ncio/app.rb:56:in `run'
    from /opt/puppetlabs/puppet/lib/ruby/gems/2.1.0/gems/ncio-1.0.1/exe/ncio:5:in `<top (required)>'
    from /opt/puppetlabs/puppet/bin/ncio:23:in `load'
    from /opt/puppetlabs/puppet/bin/ncio:23:in `<main>'

Improvement

Perhaps catch the exception (being careful not to mask errors from bad file permissions) and give a message like:

Error:  Route requires authentication
Make sure the certificate from file #{CERTFILE} is in the certificate whitelist and that you are able to run puppet on the master
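One way to implement this suggestion, sketched with a stand-in exception class. ApiError below stands in for Ncio::Api::V1::ApiError, and the wrapper is illustrative, not proposed code for ncio itself:

```ruby
# Stand-in for Ncio::Api::V1::ApiError, for illustration only.
class ApiError < StandardError; end

# Hypothetical wrapper: translate a 401 response into an actionable
# hint about the certificate whitelist; re-raise anything else.
def with_auth_hint(certfile)
  yield
rescue ApiError => e
  raise unless e.message.include?('401')
  raise ApiError, "Route requires authentication. Make sure the certificate " \
                  "from #{certfile} is in the rbac-certificate-whitelist and " \
                  "that puppet has completed a run on the master."
end
```

Care would be needed to keep non-401 failures (bad file permissions, connection errors) propagating unchanged, as noted above.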

[feature/idea] customer groups

I had an idea earlier about using NCIO to upgrade between different versions of PE by dumping just the customer's rules, omitting the built-in ones.

This might let us do something like dump the customer-owned rules from, say, PE 2016.4.2 and then load them into PE 2017.3.0 when it becomes available, without having to worry about importing 'old' built-in rules and breaking things.

Would this be a useful feature? I think this might work as long as customers have another way to get any changes they need into the puppet-owned rules, say through puppet code or some other tool that deals with upgrades and installations.
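A rough sketch of the filtering this would need. The heuristic below (treating any group whose classes all live under the puppet_enterprise namespace as built-in) is purely an assumption for illustration; real built-in detection would likely need to be smarter:

```ruby
# Hypothetical filter, not ncio behavior: keep only customer-owned
# groups, dropping groups whose classes are all in the
# puppet_enterprise namespace (assumed here to mark PE built-ins).
def customer_groups(groups)
  groups.reject do |g|
    classes = (g['classes'] || {}).keys
    !classes.empty? && classes.all? { |c| c.start_with?('puppet_enterprise') }
  end
end
```

The output could then feed the normal restore path, so only customer rules land in the upgraded PE instance.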
