voxpupuli / puppet-corosync

Sets up and manages Corosync.

Home Page: https://forge.puppet.com/puppet/corosync

License: Apache License 2.0


puppet-corosync's Introduction

Puppet module for Corosync & Pacemaker


The ClusterLabs stack combines Corosync and Pacemaker into an open-source, high-availability solution for both small and large deployments.

It supports many different HA setups and is very flexible.

This puppet module is suitable for the management of both the software stack (pacemaker and corosync) and the cluster resources (via puppet types and providers).

Note: This module is the successor of puppetlabs-corosync.

Documentation

Basic usage

To install and configure Corosync

class { 'corosync':
  authkey        => '/var/lib/puppet/ssl/certs/ca.pem',
  bind_address   => $facts['networking']['ip'],
  cluster_name   => 'mycluster',
  enable_secauth => true,
}

To enable Pacemaker

corosync::service { 'pacemaker':
  version => '0',
}

To configure advanced and (very) verbose logging settings

class { 'corosync':
  log_stderr        => false,
  log_function_name => true,
  syslog_priority   => 'debug',
  debug             => true,
}

To disable Corosync and Pacemaker services

class { 'corosync':
  enable_corosync_service  => false,
  enable_pacemaker_service => false,
}

Configure Corosync Secure Authentication

By default the built-in Puppet CA certificate is used to perform this authentication; however, generating a dedicated key is a better approach.

  1. Generate a new key on a machine with Corosync installed.

    # Generate the key
    corosync-keygen -k /tmp/authkey
  2. Convert the key file to a Base64 string so it can be used in your manifest.

    # Convert it to a Base64 string
    base64 -w 0 /tmp/authkey > /tmp/authkey_base64
  3. Declare the corosync module using this string.

    class { 'corosync':
      enable_secauth => true,
      authkey_source => 'string',
      authkey        => 'MxjvpEztT3Mi+QagUO2cefhLDrP2BSFYKS3g1WXTUj2eCgGDPcSNf3uCKgzJKhoWTgJm2nYDHJv8KiFqMoW3ATuVr/9fLb/lgUVfoz0GnP10S7r77aqaIsERhJcGVQhcteHVlZl6zOo6VQz4ekH7VPmMlKJX0iQPuJTh9o6qhjg=',
    }

If the authkey is included directly in config, consider storing the value in hiera and encrypting it via hiera-eyaml.
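
For illustration, the hiera data might look roughly like this, assuming automatic class parameter lookup; the file name is hypothetical and the ENC[...] ciphertext is a placeholder produced by hiera-eyaml, not a real value:

```yaml
# hieradata/common.eaml -- hypothetical file, encrypted with hiera-eyaml
corosync::enable_secauth: true
corosync::authkey_source: 'string'
corosync::authkey: ENC[PKCS7,placeholder-ciphertext-goes-here]
```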

PCSD Authorization

The pacemaker/corosync configuration system (pcs) includes a daemon (pcsd) which can be configured to perform distributed communication across the cluster. This is accomplished by establishing token-based authorization of each cluster node via the pcs auth command.

On systems which support it, management of PCS authorization can be configured and deployed via this module as shown in the following example:

class { 'corosync':
  manage_pcsd_service          => true,
  manage_pcsd_auth             => true,
  sensitive_hacluster_password => Sensitive('this-is-the-actual-password'),
  sensitive_hacluster_hash     => Sensitive('a-hash-of-the-passwd-for-the-user-resource'),
}

Note that this must only be executed on one node; by default the 'first' node in the cluster list is used. There may be timing issues if the configuration has not yet been applied on the other nodes, as a successful execution requires the hacluster password to be appropriately set on each system.

Configure votequorum

To enable Corosync 2 votequorum and define a nodelist of nodes named n1, n2, n3 with auto-generated node IDs

class { 'corosync':
  set_votequorum => true,
  quorum_members => [ 'n1', 'n2', 'n3' ],
}

To do the same but with custom node IDs instead

class { 'corosync':
  set_votequorum     => true,
  quorum_members     => [ 'n1', 'n2', 'n3' ],
  quorum_members_ids => [ 10, 11, 12 ],
}

Note: custom IDs may be required when adding or removing nodes to a cluster on the fly. In that case each node must have a unique and persistent ID.

To have multiple rings in the nodelist

class { 'corosync':
  set_votequorum     => true,
  quorum_members     => [
    ['172.31.110.1', '172.31.111.1'],
    ['172.31.110.2', '172.31.111.2'],
  ],
}

When quorum_members is an array of arrays, each sub-array represents one host's IP addresses, one per ring.
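
Under that layout, the rendered nodelist in corosync.conf would look roughly like the following. This is a sketch of the expected shape, not the exact template output:

```
nodelist {
  node {
    ring0_addr: 172.31.110.1
    ring1_addr: 172.31.111.1
    nodeid: 1
  }
  node {
    ring0_addr: 172.31.110.2
    ring1_addr: 172.31.111.2
    nodeid: 2
  }
}
```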

Configure a Quorum Device (corosync-qdevice)

Recent versions of Corosync include support for a network-based quorum device that is external to the cluster. It provides tiebreaker functionality for clusters with even node counts, allowing a two-node or larger cluster to keep functioning with exactly half of its nodes up. There are two components to quorum device configuration:

  • A node which is not a member of any Corosync cluster hosts the corosync-qnetd daemon. This node should be outside of the network containing the cluster nodes.
  • Each member of the cluster will be authorized to communicate with the quorum node and have the corosync-qdevice service scheduled and operating.

This implementation depends entirely on PCSD authorization and will only execute with that enabled.

  1. Configure the qdevice class on the quorum node. Note that the same quorum node can be used for multiple clusters. Additionally, this node cannot be a normal cluster member!

    # In this example, the node's name is quorum1.test.org
    class { 'corosync::qdevice':
      sensitive_hacluster_hash => Sensitive('hash-of-haclusters-password-on-the-qdevice-node')
    }
  2. Configure and enable qdevice settings on the cluster members via the corosync main class.

    class { 'corosync':
      cluster_name                     => 'example',
      manage_pcsd_service              => true,
      manage_pcsd_auth                 => true,
      sensitive_hacluster_password     => Sensitive('this-is-the-actual-password'),
      sensitive_hacluster_hash         => Sensitive('a-hash-of-the-passwd-for-the-user-resource'),
      manage_quorum_device             => true,
      quorum_device_host               => 'quorum1.test.org',
      quorum_device_algorithm          => 'ffsplit',
      sensitive_quorum_device_password => Sensitive('Actual password for hacluster on quorum1.test.org'),
    }

Configuring primitives

The resources that the cluster manages are referred to as primitives. These are things like virtual IPs or services such as DRBD, Nginx, and Apache.

To assign a VIP to a network interface to be used by Nginx

cs_primitive { 'nginx_vip':
  primitive_class => 'ocf',
  primitive_type  => 'IPaddr2',
  provided_by     => 'heartbeat',
  parameters      => { 'ip' => '172.16.210.100', 'cidr_netmask' => '24' },
  operations      => { 'monitor' => { 'interval' => '10s' } },
}

Make Corosync manage and monitor the state of Nginx using a custom OCF agent

cs_primitive { 'nginx_service':
  primitive_class => 'ocf',
  primitive_type  => 'nginx_fixed',
  provided_by     => 'pacemaker',
  operations      => {
    'monitor'     => { 'interval' => '10s', 'timeout' => '30s' },
    'start'       => { 'interval' => '0', 'timeout' => '30s', 'on-fail' => 'restart' }
  },
  require         => Cs_primitive['nginx_vip'],
}

Make Corosync manage and monitor the state of Apache using an LSB agent

cs_primitive { 'apache_service':
  primitive_class => 'lsb',
  primitive_type  => 'apache2',
  provided_by     => 'heartbeat',
  operations      => {
    'monitor'     => { 'interval' => '10s', 'timeout' => '30s' },
    'start'       => { 'interval' => '0', 'timeout' => '30s', 'on-fail' => 'restart' }
  },
  require         => Cs_primitive['apache2_vip'],
}

Note: If you have multiple operations with the same names, you have to use an array. Example:

cs_primitive { 'pgsql_service':
  primitive_class => 'ocf',
  primitive_type  => 'pgsql',
  provided_by     => 'heartbeat',
  operations      => [
    { 'monitor'   => { 'interval' => '10s', 'timeout' => '30s' } },
    { 'monitor'   => { 'interval' => '5s', 'timeout' => '30s', 'role' => 'Master' } },
    { 'start'     => { 'interval' => '0', 'timeout' => '30s', 'on-fail' => 'restart' } }
  ],
}

If you do not want Puppet to interfere with manually stopped resources (e.g. not change the target-role metaparameter), you can use the unmanaged_metadata parameter:

cs_primitive { 'pgsql_service':
  primitive_class    => 'ocf',
  primitive_type     => 'pgsql',
  provided_by        => 'heartbeat',
  unmanaged_metadata => ['target-role'],
}

Configuring STONITH Resources

Special primitives can be configured to support STONITH (Shoot The Other Node In The Head) fencing. This is critical for clusters which include shared resources (typically shared disk) or are vulnerable to cluster splits. The STONITH resource provides a mechanism to restart or simply halt a rogue node, often via power fencing.

The following example performs this configuration via the fence_vmware_soap STONITH agent.

cs_primitive { 'vmfence':
  primitive_class => 'stonith',
  primitive_type  => 'fence_vmware_soap',
  operations      => {
    'monitor'     => { 'interval' => '60s'},
  },
  parameters      => {
    'ipaddr'          => 'vcenter.example.org',
    'login'           => '[email protected]',
    'passwd'          => 'some plaintext secret',
    'ssl'             => '1',
    'ssl_insecure'    => '1',
    'pcmk_host_map'   => 'host0.example.org:host0;host1.example.org:host1',
    'pcmk_delay_max'  => '10s',
  },
}

Note that currently this implementation only handles STONITH for RHEL/CentOS based clusters which utilize pcs.

Configuring locations

Locations determine on which nodes primitive resources run.

cs_location { 'nginx_service_location':
  primitive => 'nginx_service',
  node_name => 'hostname',
  score     => 'INFINITY'
}

To manage rules on a location, for example to force the location not to run on a container (VM):

cs_location { 'nginx_service_location':
  primitive => 'nginx_service',
  rules     => [
    { 'nginx-service-avoid-container-rule' => {
        'score'      => '-INFINITY',
        'expression' => [
          { 'attribute' => '#kind',
            'operation' => 'eq',
            'value'     => 'container'
          },
        ],
      },
    },
  ],
}

Example of a virtual ip location that checks ping connectivity for placement.

cs_location { 'vip-ping-connected':
  primitive => 'vip',
  rules     => [
    { 'vip-ping-exclude-rule' => {
        'score'      => '-INFINITY',
        'expression' => [
          { 'attribute' => 'pingd',
            'operation' => 'lt',
            'value'     => '100',
          },
        ],
      },
    },
    { 'vip-ping-prefer-rule' => {
        'score-attribute' => 'pingd',
        'expression'      => [
          { 'attribute' => 'pingd',
            'operation' => 'defined',
          }
        ],
      },
    },
  ],
}

Another possibility for using ping connectivity for placement:

cs_location { 'vip-ping-connected':
  primitive => 'vip',
  rules     => [
    { 'vip-ping-connected-rule' => {
        'score'      => '-INFINITY',
        'boolean-op' => 'or',
        'expression' => [
          { 'attribute' => 'pingd',
            'operation' => 'not_defined',
          },
          { 'attribute' => 'pingd',
            'operation' => 'lte',
            'value'     => '100',
          },
        ],
      },
    },
  ],
}

Configuring colocations

Colocations keep primitives together: if a VIP moves from web01 to web02 because web01 just died, it will drag the nginx service with it.

cs_colocation { 'vip_with_service':
  primitives => [ 'nginx_vip', 'nginx_service' ],
}

pcs only: Advanced colocations are also possible with colocation sets, by using arrays instead of strings in the primitives array. Additionally, a hash can be added to the inner array with the specific options for that resource set.

cs_colocation { 'mysql_and_ptheartbeat':
  primitives => [
    ['mysql', {'role' => 'master'}],
    [ 'ptheartbeat' ],
  ],
}
cs_colocation { 'mysql_apache_munin_and_ptheartbeat':
  primitives => [
    ['mysql', 'apache', {'role' => 'master'}],
    [ 'munin', 'ptheartbeat' ],
  ],
}

Configuring migration or state order

Colocation defines that a set of primitives must live together on the same node, but order definitions define the order in which each primitive is started. If Nginx is configured to listen only on our VIP, we definitely want the VIP to be migrated to the new node before Nginx comes up, or the migration will fail.

cs_order { 'vip_before_service':
  first   => 'nginx_vip',
  second  => 'nginx_service',
  require => Cs_colocation['vip_with_service'],
}

Configuring cloned resources/groups

Cloned resources should be active on multiple hosts at the same time. You can clone any existing resource provided the resource agent supports it.

cs_clone { 'nginx_service-clone' :
  ensure    => present,
  primitive => 'nginx_service',
  clone_max => 3,
  require   => Cs_primitive['nginx_service'],
}

You can also clone groups:

cs_clone { 'nginx_service-clone' :
  ensure    => present,
  group     => 'nginx_group',
  clone_max => 3,
  require   => Cs_primitive['nginx_service'],
}

Configure a Promotable (Active/Passive) resource

cs_clone { 'redis-clone':
  ensure            => present,
  primitive         => 'redis',
  clone_max         => 2,
  clone_node_max    => 1,
  promotable        => true,
  promoted_max      => 1,
  promoted_node_max => 1,
  notify_clones     => true,
}

Corosync Properties

A few global settings can be changed with the cs_property type.

Disable STONITH if required.

cs_property { 'stonith-enabled' :
  value   => 'false',
}

Change quorum policy

cs_property { 'no-quorum-policy' :
  value   => 'ignore',
}

You can use the replace parameter to create but not update some values:

cs_property { 'maintenance-mode':
  value   => 'true',
  replace => false,
}

Resource defaults

A few global settings can be changed with the cs_rsc_defaults type.

Don't move resources.

cs_rsc_defaults { 'resource-stickiness' :
  value => 'INFINITY',
}

Multiple rings

In unicast mode, you can have multiple rings by specifying unicast_address and bind_address as arrays:

class { 'corosync':
  enable_secauth    => true,
  authkey           => '/var/lib/puppet/ssl/certs/ca.pem',
  bind_address      => ['10.0.0.1', '10.0.1.1'],
  unicast_addresses => [
      [ '10.0.0.1',
        '10.0.1.1'
      ], [
        '10.0.0.2',
        '10.0.1.2'
      ],
  ],
}

unicast_addresses is an array of arrays. Each sub-array holds one host's IP addresses; in this example host2 has the IP addresses 10.0.0.2 and 10.0.1.2.

Shadow CIB

Shadow CIB allows you to apply all changes at the same time. To use it, set the cib parameter and use the cs_commit and cs_shadow types.

Shadow CIB is the recommended way to manage a large CIB with Puppet, as it applies all your changes at once, starting the cluster only when everything is in place: primitives, constraints, properties.

If you set the cib parameter on one cs_* resource, we recommend setting it on all cs_* resources.

cs_shadow { 'puppet': }
cs_primitive { 'pgsql_service':
  primitive_class => 'ocf',
  primitive_type  => 'pgsql',
  provided_by     => 'heartbeat',
  cib             => 'puppet'
}
cs_commit { 'puppet': }

Notes

Upstream documentation

We suggest you at least read the Clusters from Scratch document from ClusterLabs. It will help you understand how all the pieces fit together and point you in the right direction when Corosync/Pacemaker fails unexpectedly.

Roadmap

We maintain a roadmap for upcoming releases of this module.

Operating System support matrix

OS release     | Puppet 3.8.7  | Puppet 4 (PC1) | Puppet 5.X
CentOS/RHEL 7  | Not supported | Supported      | Supported
Debian 9       | Not supported | Supported      | Supported
Ubuntu 16.04   | Not supported | Supported      | Supported

Contributors

See Github.

Special thanks to Puppet, Inc for the initial development, and to Vox Pupuli for providing a platform that allows us to continue development of this module.

Development

See the contributing guide for details. Additionally, some general guidelines on PR structure can be found here.

Copyright and License

Copyright © 2012-2014 Puppet Inc

Copyright © 2012-2018 Multiple contributors

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

puppet-corosync's People

Contributors

alexjfisher, andreaspfaffeneder, antaflos, bastelfreak, btravouillon, cmurphy, dhollinger, dhoppe, ekohl, ffrank, hunner, igalic, joernott, juniorsysadmin, jyaworski, kbon, kengelhardt-godaddy, mkrakowitzer, nibalizer, ody, pdemonaco, roidelapluie, sathieu, spacedog, tdb, timdeluxe, towo, tphoney, wyardley, zilchms


puppet-corosync's Issues

CentOS 7 Startup issue

The failures in PCCI are actually bubbling up a real error:

A few things you could check:

Make sure that Corosync and Pacemaker start at boot (or at least start them both manually) on both nodes:
$ sudo systemctl enable corosync
$ sudo systemctl enable pacemaker

There is a known bug which appears at boot on RHEL 7 or CentOS 7; I reported a workaround in Red Hat Bugzilla bug #1030583 but it's no longer public.

The workaround is to let Corosync wait for 10s at boot, so it doesn't start before the interfaces are completely available (ugly workaround, I know :))

Change /usr/lib/systemd/system/corosync.service to include the ExecStartPre:
…
[Service]
ExecStartPre=/usr/bin/sleep 10
ExecStart=/usr/share/corosync/corosync start
…
Then, reload systemd:
$ sudo systemctl daemon-reload

You can also look in /var/log/pacemaker.log or look for something related in /var/log/messages.

In case these steps don't help, I will redo the tutorial myself and see if I missed or forgot to write something.

Keep me posted :)

http://jensd.be/156/linux/building-a-high-available-failover-cluster-with-pacemaker-corosync-pcs

Unable to find operation matching: monitor:Master

Hi all,

after #233 was merged the following code:

cs_primitive { 'pgsql':
        primitive_class         => 'ocf',
        primitive_type          => 'pgsql',
        provided_by             => 'heartbeat',
        promotable              => 'true',
        parameters              => { 'pgctl' => '/bin/pg_ctl', 'psql' => '/bin/psql', 'pgdata' => '/var/lib/pgsql/data/', 'rep_mode' => 'sync', 'node_list' => inline_template("<%= @node_list %>"), 'restore_command' => 'cp /var/lib/pgsql/pg_archive/%f %p', 'primary_conninfo_opt' => 'keepalives_idle=60 keepalives_interval=5 keepalives_count=5', 'master_ip' => inline_template("<%= @vip_slave %>"), 'restart_on_promote' => 'true' },
        operations              => {
                'start'         => { 'interval' => '0s', 'timeout' => '60s', 'on-fail' => 'restart' },
                'monitor'       => { 'interval' => '4s', 'timeout' => '60s', 'on-fail' => 'restart' },
                'monitor'       => { 'interval' => '3s', 'timeout' => '60s', 'on-fail' => 'restart', 'role' => 'Master' },
                'promote'       => { 'interval' => '0s', 'timeout' => '60s', 'on-fail' => 'restart' },
                'demote'       => { 'interval' => '0s', 'timeout' => '60s', 'on-fail' => 'stop' },
                'stop'          => { 'interval' => '0s', 'timeout' => '60s', 'on-fail' => 'block' },
                'notify'       => { 'interval' => '0s', 'timeout' => '60s' },
        },
        ms_metadata                => { 'master-max' => '1', 'master-node-max' => '1', 'clone-max' => '2', 'clone-node-max' => '1', 'notify' => 'true' },
        require                 => Cs_primitive['vip-master','vip-slave'],
}

throws the following error:

Error: /Stage[main]/Profile::Db::Bi_db/Cs_primitive[pgsql]: Could not evaluate: Execution of '/sbin/pcs resource op remove pgsql monitor:Master interval=3s on-fail=restart timeout=60s' returned 1: Error: Unable to find operation matching: monitor:Master interval=3s on-fail=restart timeout=60s

Warning after latest changes merged

Hi community,

after the latest changes were merged I got this warning:

Warning: cs_primitive.rb[operations]: Role in the operations name is now deprecated. Please use an array of hashes and put the role in the values.
(at /opt/puppetlabs/puppet/cache/lib/puppet/type/cs_primitive.rb:103:in `block (4 levels) in <top (required)>')

I know it's not a major error, just be advised

Puppet error if utilization property of cs_primitive not set.

I followed the example to set up some primitives with the current master of puppetlabs-corosync.

Unless I explicitly define the sample primitive as follows:

    cs_primitive { 'nginx_vip':
      primitive_class => 'ocf',
      primitive_type  => 'IPaddr2',
      provided_by     => 'heartbeat',
      parameters      => { 'ip' => '172.16.210.100', 'cidr_netmask' => '24' },
      operations      => { 'monitor' => { 'interval' => '10s' } },
      utilization     => {},
    }

I get the following error:

err: Cs_primitive[nginx_vip]: Could not evaluate: undefined method `empty?' for nil:NilClass

The key is to explicitly define

      utilization     => {},

The error happens here and I suppose this was introduced with #41 however I don't really understand why the same problem doesn't exist for the metadata attribute?

Using puppet 2.6.
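
The crash above boils down to calling `empty?` on `nil` when no utilization hash was given. A minimal sketch of a guard that would avoid it; `utilization_insync?` is a hypothetical helper name, not the module's actual code:

```ruby
# Treat a missing utilization hash the same as an explicitly empty one,
# so the caller never invokes nil.empty? (which raises NoMethodError).
def utilization_insync?(current)
  (current || {}).empty?
end

utilization_insync?(nil)           # => true, no crash
utilization_insync?({})            # => true
utilization_insync?('cpu' => '2')  # => false
```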

Corosync vs Pacemaker: wrong usage of "Corosync"

Corosync doesn't manage resources. Corosync provides reliable communication between nodes, manages cluster membership and determines quorum. Pacemaker is a cluster resource manager (CRM) that manages the resources that make up the cluster, such as IP addresses, mount points, file systems, DRBD devices, services such as MySQL or Apache and so on. Basically everything that can be monitored, stopped, started and moved around between nodes.

Pacemaker does not depend on Corosync; it could use Heartbeat (v3) for communication, membership and quorum instead. Corosync can also work without Pacemaker, for example with Red Hat's CMAN.

This is a documentation problem but also reflected in the names of the types this module provides. The Linux HA stack and its history as well as various cluster components are already confusing enough so it is important to not mix up terms.

I'll submit a PR for the Readme but I don't think it will be possible to rename the types this module provides at this point.

clone of a group isn't supported

When trying to clone a group I see the error:

Error: Could not prefetch cs_clone provider 'crm': undefined method `attributes' for nil:NilClass

The configuration looks like:

group g_nfs p_rpcbind p_nfs_kernel_server
clone cl_nfs g_nfs \
        meta interleave="true"

And the XML:

      <clone id="cl_nfs">
...
        <group id="g_nfs">
          <primitive id="p_rpcbind" class="lsb" type="rpcbind">

So the primitive is a level deeper than it would be with a clone of a primitive. The following code in https://github.com/puppet-community/puppet-corosync/blob/master/lib/puppet/provider/cs_clone/crm.rb#L41 expects the primitive to be directly under clone:

    doc.root.elements['configuration'].elements['resources'].each_element('clone') do |e|
      primitive_id = e.elements['primitive'].attributes['id']

It appears to be the same for pcs too, although I've not tested it.

I don't have a fix, so just documenting the issue so others will be aware.
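
A sketch of a possible fix in plain Ruby, assuming the only change needed is to also look one level deeper for a primitive wrapped in a group; the helper name is hypothetical:

```ruby
require 'rexml/document'

# Return the id of the primitive under a <clone>, whether the clone wraps
# the primitive directly or wraps a <group> containing it.
def clone_primitive_id(clone_element)
  primitive = clone_element.elements['primitive'] ||
              clone_element.elements['group/primitive']
  primitive && primitive.attributes['id']
end

xml = <<-XML
<cib><configuration><resources>
  <clone id="cl_nfs">
    <group id="g_nfs">
      <primitive id="p_rpcbind" class="lsb" type="rpcbind"/>
    </group>
  </clone>
</resources></configuration></cib>
XML

doc = REXML::Document.new(xml)
doc.root.elements['configuration'].elements['resources'].each_element('clone') do |e|
  puts clone_primitive_id(e) # prints "p_rpcbind"
end
```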

pcs: constraints, groups do not delay creation of primitives

With the pcs provider (not sure whether the crm provider does the same), cs_primitive resources are committed immediately. This usually causes the resource to be started immediately, regardless of explicit order and colocation constraints, or implicit constraints from cs_groups.

This means resources start prematurely, in the wrong order, or in the wrong place, requiring the user to run pcs resource cleanup after the fact.

This can be worked around by running exactly that via an exec at the end of each run, but that's exceedingly ugly, so I am wondering whether others have come up with better workarounds.
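
For reference, the exec-based workaround described above might look like this in a manifest. The resource names here are illustrative, and this is explicitly the ugly variant, not an endorsement:

```puppet
# Run 'pcs resource cleanup' only when the primitive changes,
# so started-too-early resources get re-evaluated.
exec { 'pcs-resource-cleanup':
  command     => '/usr/sbin/pcs resource cleanup',
  path        => ['/usr/sbin', '/usr/bin'],
  refreshonly => true,
  subscribe   => Cs_primitive['nginx_service'],
}
```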

missing autodependency

[Error] Error: /Stage[main]/Profile_corosync/Profile_corosync::Daemon[messagebus]/Cs_order[nfs_before_messagebus]: Could not evaluate: Execution of '/usr/sbin/pcs constraint order nfs-mount-clone then messagebus INFINITY kind=Mandatory id=nfs_before_messagebus symmetrical=true' returned 1: Error: Resource 'nfs-mount-clone' does not exist

RHEL 6.4: crm replaced by pcs

See https://bugzilla.redhat.com/show_bug.cgi?id=878508 .

Apparently RedHat chose to remove the crm command from their RHEL 6.4 release in favor of pcs. This Puppet module relies on crm for all operations, which means this module is currently broken on RHEL 6.4 based systems e.g. CentOS 6.4.

Solutions:

I would prefer the first option as this is probably the best supported way forward.

When adding/removing cluster nodes, use configurable node IDs in the nodelist config

Use case: when adding/removing Corosync cluster nodes given in the "quorum_members" parameter, existing node IDs should (optionally) be unique and preserved for a node's lifetime.

Because IDs are assigned by auto-increment (https://github.com/voxpupuli/puppet-corosync/blob/master/templates/corosync.conf.udpu.erb#L90-L96), the IDs of old and new nodes can shift, which is a major issue for the dynamic config feature of a Corosync 2 cluster, which relies on node ID mappings in the "runtime.totem.pg.mrp.srp.members" namespace.

The solution is to make ID generation configurable: either by auto-increment (the default), or from user-supplied "quorum_members_ids" data.

Example:
A) Source cluster: quorum_members = [ node-1, node-3, node-22 ].
The resulting corosync.conf nodelist is generated by the ERB template as follows:
nodelist {
  node {
    ring0_addr: node-1
    nodeid: 1
  }
  node {
    ring0_addr: node-3
    nodeid: 2
  }
  node {
    ring0_addr: node-22
    nodeid: 3
  }
}

B) Destination cluster: quorum_members = [ node-1, node-22, node-4 ]
Expected: the nodelist IDs are preserved for existing nodes:
node {
  ring0_addr: node-1
  nodeid: 1
}
node {
  ring0_addr: node-22
  nodeid: 3
}
node {
  ring0_addr: node-4
  nodeid: 4 (or anything else but 2)
}
Actual:
node {
  ring0_addr: node-1
  nodeid: 1
}
node {
  ring0_addr: node-22
  nodeid: 2
}
node {
  ring0_addr: node-4
  nodeid: 3
}
which maps IDs 2 and 3 to the wrong nodes.
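
The proposed behaviour can be sketched in plain Ruby, independent of the actual ERB template; the function name is hypothetical, the parameter names follow the issue:

```ruby
# Map each member to a node ID: use user-supplied quorum_members_ids when
# present, otherwise fall back to the current auto-increment behaviour.
def nodelist_ids(quorum_members, quorum_members_ids = nil)
  ids = quorum_members_ids || (1..quorum_members.size).to_a
  quorum_members.zip(ids).to_h
end

# Auto-increment: IDs shift when the member list changes.
nodelist_ids(%w[node-1 node-22 node-4])            # => {"node-1"=>1, "node-22"=>2, "node-4"=>3}
# Pinned IDs: existing nodes keep their IDs across membership changes.
nodelist_ids(%w[node-1 node-22 node-4], [1, 3, 4]) # => {"node-1"=>1, "node-22"=>3, "node-4"=>4}
```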

Incorrect/incomplete autorequire in cs_primitive type

cs_primitive has an autorequire against the corosync service.

This makes no sense except when running with corosync::service { 'pacemaker': version => 0 }; under all other circumstances the autorequire should be against Pacemaker. For context, see also #143.

Colocation constraints with pcs provider are broken, incorrectly assume bidirectionality, swap order of primitives around

https://github.com/puppet-community/puppet-corosync/blob/master/lib/puppet/provider/cs_colocation/pcs.rb#L39 says that the order of primitives in colocation constraints does not matter. It does.

<rsc_colocation id="c_rbd_volume2_on_target1" rsc="g_target1" score="INFINITY" with-rsc="g_rbd_volume2"/>
<rsc_colocation id="c_rbd_volume1_on_target1" rsc="g_rbd_volume1" score="INFINITY" with-rsc="g_target1"/>

These two constraints are not identical, but were generated from equivalent cs_colocation resources.

Confused by the Versioning on this Module

The latest version of this module on the Forge is 0.7.0, released December 2014. In the project on Github, however, there have been several releases since. It looks like the current release is 1.2.1, released in the last few days.

Why the discrepancy between the forge and the Github project?

Thanks,
Lance

puppet Error defining cs_primitive

I created a little class and would like to have a simple VIP
puppet-corosync 2.0.1 on centos 7 with puppet 3.7.4 ruby 2.0.0:

cs_primitive { 'haproxy_vip':
    primitive_class => 'ocf',
    primitive_type  => 'IPaddr2',
    provided_by     => 'heartbeat',
    parameters      => {
      'ip'           => '1.2.3.4',
      'cidr_netmask' => '24'
    },
    operations      => {
      'monitor' => {
        'interval' => '30s' }
      },
  }

But all I get on the puppet run is:

Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not autoload puppet/type/cs_primitive: /etc/puppet/modules/corosync/lib/puppet/type/cs_primitive.rb:68: syntax error, unexpected ':', expecting ')'
newparam(:unmanaged_metadata, array_matching: :all) do
^
/etc/puppet/modules/corosync/lib/puppet/type/cs_primitive.rb:104: syntax error, unexpected ':', expecting ')'
newproperty(:operations, array_matching: :all) do
^ on node foo.bar

location rules definition missing

In puppet-corosync 1.2.1 I am missing location rules configuration (for example a ping-gateway configuration). It would be great if this existed.

cs_colocation's self.instances is huge, and should be split

The following function is huge, not very Ruby-like, and could be split into at least two functions (one per branch of the if/else) plus several helpers.

  def self.instances

    block_until_ready

    instances = []

    cmd = [ command(:crm), 'configure', 'show', 'xml' ]
    if Puppet::PUPPETVERSION.to_f < 3.4
      raw, status = Puppet::Util::SUIDManager.run_and_capture(cmd)
    else
      raw = Puppet::Util::Execution.execute(cmd)
      status = raw.exitstatus
    end
    doc = REXML::Document.new(raw)

    doc.root.elements['configuration'].elements['constraints'].each_element('rsc_colocation') do |e|
      rscs = []
      items = e.attributes

      if items['rsc']
        # The colocation is defined as a single rsc_colocation element. This means
        # the format is rsc and with-rsc. In the type we chose to always deal with
        # ordering in a sequential way, which is why we reverse their order.
        if items['rsc-role']
          rsc = "#{items['rsc']}:#{items['rsc-role']}"
        else
          rsc = items['rsc']
        end

        if items ['with-rsc-role']
          with_rsc = "#{items['with-rsc']}:#{items['with-rsc-role']}"
        else
          with_rsc = items['with-rsc']
        end

        # Put primitives in chronological order, first 'with-rsc', then 'rsc'.
        primitives = [with_rsc, rsc]
      else
        # The colocation is defined as a rsc_colocation element wrapped around a single resource_set.
        # This happens automatically when you configure a colocation between more than 2 primitives.
        # Notice, we can only interpret colocations of single sets, not multiple sets combined.
        # In Pacemaker speak, this means we can support "A B C" but not e.g. "A B (C D) E".
        # Feel free to contribute a patch for this.
        primitives = []
        e.each_element('resource_set') do |rset|
          rsetitems = rset.attributes

          # If the resource set has a role, it will apply to all referenced resources.
          if rsetitems['role']
            rsetrole = rsetitems['role']
          else
            rsetrole = nil
          end

          # Add all referenced resources to the primitives array.
          rset.each_element('resource_ref') do |rref|
            rrefitems = rref.attributes
            if rsetrole
              # Make sure the reference is stripped from a possible role
              rrefprimitive = rrefitems['id'].split(':')[0]
              # Always reuse the resource set role
              primitives.push("#{rrefprimitive}:#{rsetrole}")
            else
              # No resource_set role was set: just push the complete reference.
              primitives.push(rrefitems['id'])
            end
          end
        end
      end

      colocation_instance = {
        :name       => items['id'],
        :ensure     => :present,
        :primitives => primitives,
        :score      => items['score'],
        :provider   => self.name
      }
      instances << new(colocation_instance)
    end
    instances
  end

My perception is that these helpers could also be reused in other cs_* providers.
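One possible split, as a sketch (the helper names `primitives_from_attributes` and `primitives_from_resource_sets` are my invention, not existing module code): extract each branch of the if/else into its own helper.

```ruby
require 'rexml/document'

# Sketch of the single-colocation branch: items is the attribute hash of a
# rsc_colocation element carrying rsc/with-rsc (and optional roles).
def primitives_from_attributes(items)
  rsc      = items['rsc-role']      ? "#{items['rsc']}:#{items['rsc-role']}"           : items['rsc']
  with_rsc = items['with-rsc-role'] ? "#{items['with-rsc']}:#{items['with-rsc-role']}" : items['with-rsc']
  # Keep the sequential ordering convention: 'with-rsc' first, then 'rsc'.
  [with_rsc, rsc]
end

# Sketch of the resource_set branch: collect all resource_refs, applying the
# set-level role (if any) to every referenced primitive.
def primitives_from_resource_sets(element)
  primitives = []
  element.each_element('resource_set') do |rset|
    role = rset.attributes['role']
    rset.each_element('resource_ref') do |rref|
      id = rref.attributes['id']
      # Strip any per-reference role so the set-level role always wins.
      primitives << (role ? "#{id.split(':').first}:#{role}" : id)
    end
  end
  primitives
end
```

With these in place, `self.instances` shrinks to parsing the XML, picking a branch, and building the instance hashes.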

Flush cs properties in one go

From the discussion in #170, it would probably be nice if the cs_* providers implemented the post_resource_eval hook so that changes could be batch-synced in one go, which should be faster.
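A minimal sketch of the idea in plain Ruby (no Puppet APIs; the `post_resource_eval` name mirrors the real provider class hook, everything else here, including the `sync-properties` placeholder command, is illustrative): each resource only queues its change, and the class-level hook pushes the whole batch in a single command at the end of the run.

```ruby
# Illustrative stand-in for a provider class that batches property changes.
class BatchedPropertyProvider
  @pending  = {} # property name => desired value, queued across resources
  @commands = [] # commands that would have been executed (for illustration)

  class << self
    attr_reader :pending, :commands

    # Called once per resource; only records the change instead of syncing it.
    def queue_change(name, value)
      @pending[name] = value
    end

    # Puppet calls this class hook once after evaluating all resources of a
    # provider, so all queued changes can be flushed in a single command.
    def post_resource_eval
      return if @pending.empty?
      args = @pending.map { |k, v| "#{k}=#{v}" }
      @commands << (['sync-properties'] + args).join(' ') # placeholder command
      @pending.clear
    end
  end
end

BatchedPropertyProvider.queue_change('stonith-enabled', 'false')
BatchedPropertyProvider.queue_change('no-quorum-policy', 'ignore')
BatchedPropertyProvider.post_resource_eval
```

The real change would move the shelling-out from each provider's flush into post_resource_eval; note the hook only exists on Puppet >= 3.4.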

0.7.0 release broken; corosync.conf template creates configuration file that makes it impossible for corosync daemon to start

https://github.com/puppet-community/puppet-corosync/blob/0.7.0/templates/corosync.conf.erb references threads_real, which is never set. This produces a corosync.conf in which the threads parameter has an empty value, and under these circumstances corosync refuses to start. This is fixed in master, but the 0.7.0 release should be retracted from the Puppet Forge, or at least marked as broken.

Added clone, rsc_options and location including rules

Hi,

I tried to implement types and providers for cs_clone, cs_location and cs_rsc_defaults. Some examples explain the syntax:

cs_rsc_defaults { 'migration-threshold':
  value => '2',
}

cs_clone { 'pingclone':
  ensure    => present,
  primitive => 'ping',
  metadata  => {
    'globally-unique' => 'false',
    'clone-max'       => '2',
    'target-role'     => 'Started',
  },
}

cs_location { 'monitoring_on_connected_node':
  ensure => present,
  rsc    => 'cluster-ip',
  rules  => [
    {
      'score'       => '-INFINITY',
      'operation'   => 'or',
      'expressions' => ['not_defined pingd', 'pingd lte 0'],
    },
  ],
}

cs_location { 'monitoring_on_preferred_node':
  ensure => present,
  rsc    => 'cluster-ip',
  host   => 'fcil02v231',
  score  => '1000',
}

Feedback is welcome. Contact me for the source code (lennart.betz(at)netways.de).

Ciao Lennart.

Disallowing 1-primitive groups in cs_group is pointless

The validation check in https://github.com/puppet-community/puppet-corosync/blob/master/lib/puppet/type/cs_group.rb#L30 is overzealous. It is a perfectly valid use case to have a group with just one primitive. For example, you might want to define a set of 3, 2 or even just 1 resource that you then reference from a constraint. The easiest way to do that is to use a group and to always have the constraint point to the group. If there is only one primitive in the group, so be it.

The alternative is to have manifests full of ifs and unlesses to either point to a group or to a standalone primitive, and that's just silly.

Undefined method 'first'

After last updates being merged to master this code:

cs_order { 'order-msPostgresql-master-group-INFINITY':
  first       => 'msPostgresql:promote',
  second      => 'master-group:start',
  score       => 'INFINITY',
  symmetrical => 'false',
}

throws this error

Error: /Stage[main]/Profile::Db::Bi_db/Cs_order[order-msPostgresql-master-group-INFINITY]: Could not evaluate: undefined method `first' for Cs_orderorder-msPostgresql-master-group-INFINITY:Puppet::Type::Cs_order::ProviderPcs

It didn't appear earlier; it looks like someone introduced a new bug while trying to fix an old one. :)

trying to sed file that does not exist

on RHEL 6.4:

err: /Stage[main]/Corosync/Exec[enable corosync]/returns: change from notrun to 0 failed: sed -i s/START=no/START=yes/ /etc/default/corosync returned 2 instead of one of [0] at /etc/puppetlabs/puppet/modules/corosync/manifests/init.pp:198

The file /etc/default/corosync does not exist on RHEL 6.4 after installing corosync.

I've worked around the issue for now with:

    file { '/etc/default/corosync':
      ensure  => file,
      owner   => 'root',
      group   => 'root',
      mode    => '0600',
      content => "START=yes\n",
    }

`crm configure load update` returns 0 on error

When running crm configure load update in the providers' flush methods, the command still returns 0 even when there are syntax errors (or other problems), so the resource reports that the changes were applied.

I haven't found a good way to validate the configuration other than crm_verify, but that command requires XML, while we currently emit plain crm commands. I don't have other ideas for how to fix this at the moment.

Drop self.instances

We need to drop self.instances if we plan to support the cib => parameter.

default token value of 3000 is not a supported config

The default token value of 3000 ms is not a configuration supported by Red Hat. This could result in clusters being installed that would not be supported.

As per article https://access.redhat.com/solutions/300953

Red Hat Enterprise Linux 5 and 6 using cman:

The default timeout is 10 seconds (10000 ms).

The following are the limits of the range tested by Red Hat Cluster Engineering:

Minimum: 5 seconds (5000 ms)
Maximum: 300 seconds (300000 ms)

There are known issues with values outside the tested range. If an issue is determined to be associated with values outside the tested range, Red Hat Support may require you to reproduce the issue with a token timeout within the tested range.

My cluster is running on RHEL 6.

Wrong syntax in pcs command call

Hi,

when Puppet tries to apply this piece of code

cs_order { 'order-msPostgresql-master-group-INFINITY':
  first  => 'promote msPostgresql:Master',
  second => 'start master-group',
  score  => 'INFINITY',
}

the following error appears:

Error: /Stage[main]/Profile::Db::Bi_db/Cs_order[order-msPostgresql-master-group-INFINITY]: Could not evaluate: Execution of '/sbin/pcs constraint order Master promote msPostgresql then start master-group INFINITY kind=Mandatory id=order-msPostgresql-master-group-INFINITY symmetrical=false' returned 1: Usage: pcs constraint [constraints]...

It looks like the pcs command call has the wrong syntax; it should be:

/sbin/pcs constraint order promote msPostgresql then start master-group INFINITY kind=Mandatory id=order-msPostgresql-master-group-INFINITY symmetrical=false

(the Master token is superfluous)

What should we do?

Regards,

Beaker test results are irrelevant

When PCCI runs the tests, it first reports a failure because the tests fail on CentOS, then reports a success because they pass on Ubuntu, and only the Ubuntu success is visible at the end.

Could not autoload puppet/type/cs_property

Error Message:
Could not autoload puppet/type/cs_property: Could not autoload puppet/provider/cs_property/pcs: cannot load such file -- puppet_x/voxpupuli/corosync/provider/pcs on node XXX
We use crm, not pcs.

`cs_shadow` and `cs_commit` should not create changes every run.

The cs_shadow and cs_commit resources create notice messages on every run. This shouldn't happen. The /var/lib/heartbeat/crm/shadow.* files are left after each run, so perhaps we could use those? The crm command must have a way of knowing which CIBs stick around, since trying to create a new one with the same name reports that it already exists.

root@puppet-failover-secondary:~# crm configure cib new puppet_ha
A shadow instance 'puppet_ha' already exists.
  To prevent accidental destruction of the cluster, the --force flag is required in order to proceed.
root@puppet-failover-secondary:~# wc /var/lib/heartbeat/crm/shadow.puppet_ha
  70  199 4724 /var/lib/heartbeat/crm/shadow.puppet_ha

Add support for multiple rings in node list

It is possible for each cluster node to have multiple rings; the needed config would look like this:

totem {
  rrp_mode: active
  interface {
    ringnumber:  0
    bindnetaddr: 192.168.254.1
    mcastaddr:   239.1.1.1
    mcastport:   5405
  }
  interface {
    ringnumber:  1
    bindnetaddr: 192.168.255.1
    mcastaddr:   239.2.1.1
    mcastport:   5405
  }
}
...
nodelist {
  node {
    ring0_addr: 192.168.254.1
    ring1_addr: 192.168.255.1
    nodeid: 1
  }
  node {
    ring0_addr: 192.168.254.2
    ring1_addr: 192.168.255.2
    nodeid: 2
  }
}

We already have support for multiple interfaces, but not for several rings per node. We also need to validate rrp_mode: it must be set to active or passive, since none is not allowed with multiple rings.

utilization attributes of primitives can't be managed

Pacemaker allows resources to be associated with some amount of arbitrary utilization resources. Nodes can then be said to provide so much of those resources, and Pacemaker will take care to place resources only where enough capacity is available. For example, one can declare that a resource representing a VM requires 512 MB of RAM, and that some VM host has 16 GB of RAM. See the relevant Pacemaker documentation for more.

I've started working on a branch to add the capability of managing these attributes.

Incomplete support for Corosync 2.x

This module has no proper way to enable Pacemaker without a service plugin. The documented way of enabling Pacemaker as per https://github.com/puppet-community/puppet-corosync/blob/master/README.md is to add

corosync::service { 'pacemaker':
  version => '0',
}

which is a bad idea to begin with, since version 1 had long been the preferred service plugin.

However, this configuration mode makes no sense at all on systems past Corosync 2.0, which did away with service plugins. This means that you currently have to do

service { 'pacemaker':
  ensure  => running,
  require => Class['corosync'],
}

to get Pacemaker to run with Corosync 2.x, and then have all cs_* resources require Service['pacemaker'], which really ought to be an auto-dependency.

Type cs_rsc_defaults is missing the cib parameter


Affected Puppet, Ruby, OS and module versions/distributions

Puppet 3.8.7, Ruby 2.1.5p273, Debian 8.5 Jessie, Module from master branch

What are you seeing

Using the cs_rsc_defaults type gives an error:

Error: Invalid parameter cib(:cib)
Error: /Stage[main]/Zivit::Cluster/Cs_rsc_defaults[resource-stickiness]/ensure: change from absent to present failed: Invalid parameter cib(:cib)

What behaviour did you expect instead

No error.

How did this behaviour get triggered

cs_rsc_defaults { 'resource-stickiness':
  value => '100',
}

Output log

see above

Any additional information you'd like to impart

It seems the type is missing the cib parameter, which the crm provider implementation references (line 84). The parameter should probably be added to this type.
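A sketch of the fix the report points at (untested assumption on my part; the desc text is illustrative, modeled on the cib parameter declared by the other cs_* types):

```ruby
# lib/puppet/type/cs_rsc_defaults.rb -- sketch only
newparam(:cib) do
  desc 'Corosync applies its operations onto a CIB (Cluster Information Base).
    Setting this allows changes to be queued up on a shadow CIB and committed
    in one go via cs_commit.'
end
```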
