Building RHEL7 Cluster Part-I

Hello Folks,

It has been a long time since my last post about clustering with RHEL7. RHEL7 uses Pacemaker as its high-availability cluster resource manager.

In this post, we will use a two-node cluster. Each node has been configured to resolve the cluster hostnames to their IP addresses, and each node synchronizes its time with external NTP servers. The setup is depicted in Figure-1.

Nodes:

pck1 – 192.168.122.22

pck2 – 192.168.122.23

The ntpd daemon is used instead of chronyd.

Figure-1 Two-Node Cluster with Pacemaker
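The post does not show the hostname and time-sync preparation itself. Below is a minimal sketch of it, assuming name resolution through /etc/hosts and the ntp package from the RHEL7 repositories. add_host and use_ntpd are hypothetical helper names, and the NTP servers used are whatever /etc/ntp.conf lists.

```shell
#!/bin/bash
# Hypothetical helpers for the node preparation described above (a sketch).

# add_host FILE IP NAME: append "IP<TAB>NAME" to FILE unless NAME already appears as a word
add_host() {
    local file="$1" ip="$2" name="$3"
    if ! grep -qw -- "$name" "$file" 2>/dev/null; then
        printf '%s\t%s\n' "$ip" "$name" >> "$file"
    fi
}

# use_ntpd: replace chronyd with ntpd (from the ntp package), start it now and at boot
use_ntpd() {
    systemctl stop chronyd
    systemctl disable chronyd
    yum install -y ntp
    systemctl enable ntpd
    systemctl start ntpd
}

# Usage, as root on both nodes:
#   add_host /etc/hosts 192.168.122.22 pck1
#   add_host /etc/hosts 192.168.122.23 pck2
#   use_ntpd
```

Because add_host checks for the name first, running it twice on a node does not duplicate entries.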

 

Installing Necessary Packages:

On both nodes;

[root@pck1 ~]# yum update -y
[root@pck1 ~]# yum install -y pacemaker pcs psmisc policycoreutils-python

Configuring Firewall:

On both nodes;

[root@pck1 ~]# firewall-cmd --permanent --add-service=high-availability
[root@pck1 ~]# firewall-cmd --reload

Setting SELinux to Permissive Mode:

On both nodes;

[root@pck1 ~]# setenforce 0
[root@pck1 ~]# sed -i.bak "s/SELINUX=enforcing/SELINUX=permissive/g" /etc/selinux/config
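To confirm that both the running mode and the boot-time setting changed, getenforce can be combined with a look at the config file. selinux_config_mode below is a hypothetical helper, shown only as a sketch.

```shell
#!/bin/bash
# selinux_config_mode [FILE]: print the SELINUX= value that will apply at next boot
# (hypothetical helper; defaults to /etc/selinux/config)
selinux_config_mode() {
    sed -n 's/^SELINUX=//p' "${1:-/etc/selinux/config}"
}

# Usage, on both nodes after the commands above:
#   getenforce            # runtime mode; expected: Permissive
#   selinux_config_mode   # boot-time mode; expected: permissive
```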

Starting and Enabling the pcsd Service:

On both nodes (pcs cluster auth below needs pcsd running on every node):

[root@pck1 ~]# systemctl enable pcsd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
[root@pck1 ~]# systemctl start pcsd.service 

Configuring Cluster:

On both nodes, set a password for the hacluster user (use the same password on both):

[root@pck1 ~]# passwd hacluster
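If you script this step, the interactive prompt can be skipped. The sketch below assumes RHEL's passwd, whose --stdin option reads the new password from standard input; set_hacluster_password is a hypothetical name. Reading the password with read -s keeps it out of the shell history.

```shell
#!/bin/bash
# set_hacluster_password PASSWORD: set the hacluster password without a prompt
# (hypothetical helper; relies on the RHEL-specific --stdin option of passwd)
set_hacluster_password() {
    printf '%s\n' "$1" | passwd --stdin hacluster
}

# Usage, as root on both nodes:
#   read -rs -p 'hacluster password: ' HA_PW; echo
#   set_hacluster_password "$HA_PW"
```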

On either node, authenticate the nodes:

[root@pck1 ~]# pcs cluster auth pck1 pck2
Username: hacluster
Password: 
pck2: Authorized
pck1: Authorized
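The same authentication can be done non-interactively with the -u and -p options of pcs cluster auth (pcs 0.9 as shipped with RHEL7; later pcs releases use pcs host auth instead). auth_nodes is a hypothetical wrapper, sketched here.

```shell
#!/bin/bash
# auth_nodes PASSWORD NODE...: authenticate pcs to the listed nodes as hacluster
# (hypothetical wrapper around the RHEL7 pcs 0.9 syntax)
auth_nodes() {
    local pw="$1"; shift
    pcs cluster auth "$@" -u hacluster -p "$pw"
}

# Usage, on either node:
#   auth_nodes "$HA_PW" pck1 pck2
```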

Naming the Cluster:

I named the cluster LATAM_GW. With pcs on RHEL7 the name must be given with --name; the first attempt below fails without it:

[root@pck1 ~]# pcs cluster setup LATAM_GW pck1 pck2
Error: A cluster name (--name <name>) is required to setup a cluster
[root@pck1 ~]# pcs cluster setup --name LATAM_GW pck1 pck2
Destroying cluster on nodes: pck1, pck2...
pck1: Stopping Cluster (pacemaker)...
pck2: Stopping Cluster (pacemaker)...
pck2: Successfully destroyed cluster
pck1: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'pck1', 'pck2'
pck1: successful distribution of the file 'pacemaker_remote authkey'
pck2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
pck1: Succeeded
pck2: Succeeded

Synchronizing pcsd certificates on nodes pck1, pck2...
pck2: Success
pck1: Success
Restarting pcsd on the nodes in order to reload the certificates...
pck2: Success
pck1: Success
[root@pck1 ~]#

Starting the Cluster:

On either node.

[root@pck1 ~]# pcs cluster start --all
pck1: Starting Cluster...
pck2: Starting Cluster...

You can also start a specific node instead of all nodes in the cluster:

[root@pck1 ~]# pcs cluster start pck1
pck1: Starting Cluster...
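pcs cluster start does not make the cluster start at boot; the pcs status output later in this post shows corosync and pacemaker as active/disabled. The sketch below enables them on all nodes and then checks membership. cluster_boot_and_check is a hypothetical name, and whether you want automatic start at boot is a design choice: some administrators prefer to bring a fenced node back by hand.

```shell
#!/bin/bash
# cluster_boot_and_check: enable corosync/pacemaker at boot on all nodes,
# then show corosync membership and the overall cluster state (hypothetical helper)
cluster_boot_and_check() {
    pcs cluster enable --all
    pcs status corosync
    pcs status
}

# Usage, on either node:
#   cluster_boot_and_check
```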

Viewing the Version of the Cluster Software:

[root@pck1 ~]# pacemakerd --features
Pacemaker 1.1.18-11.el7_5.3 (Build: 2b07d5c5a9)
 Supporting v3.0.14:  generated-manpages agent-manpages ncurses libqb-logging libqb-ipc systemd nagios  corosync-native atomic-attrd acls
[root@pck1 ~]# 

Checking the Current Cluster Configuration:

The cluster configuration, known as the CIB (Cluster Information Base), is stored in XML format. You can view the current configuration with “pcs cluster cib”.

[root@pck2 ~]# pcs cluster cib 
<cib crm_feature_set="3.0.14" validate-with="pacemaker-2.10" epoch="5" num_updates="4" admin_epoch="0" cib-last-written="Sat Dec  1 14:55:56 2018" update-origin="pck2" update-client="crmd" update-user="hacluster" have-quorum="1" dc-uuid="2">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.18-11.el7_5.3-2b07d5c5a9"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
        <nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="LATAM_GW"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="1" uname="pck1"/>
      <node id="2" uname="pck2"/>
    </nodes>
    <resources/>
    <constraints/>
  </configuration>
  <status>
    <node_state id="1" uname="pck1" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
      <lrm id="1">
        <lrm_resources/>
      </lrm>
    </node_state>
    <node_state id="2" uname="pck2" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
      <lrm id="2">
        <lrm_resources/>
      </lrm>
    </node_state>
  </status>
</cib>
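For a quick look at individual properties, the CIB can be saved to a file and queried. The sed-based cib_property helper below is a hypothetical sketch that assumes each nvpair sits on one line, as in the output above; pcs property list or an XML-aware tool is the more robust route.

```shell
#!/bin/bash
# cib_property FILE NAME: print the value of the nvpair called NAME in a saved CIB
# (hypothetical helper; assumes one nvpair per line and no "/" in NAME)
cib_property() {
    local file="$1" name="$2"
    sed -n "s/.*<nvpair [^>]*name=\"$name\" value=\"\([^\"]*\)\".*/\1/p" "$file"
}

# Usage, on either node:
#   pcs cluster cib /tmp/cib.xml               # save the CIB to a file
#   cib_property /tmp/cib.xml cluster-name     # LATAM_GW for the CIB above
#   pcs property list                          # pcs' own view of the properties
```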

Adding Resources to the Cluster:

In this post, we will build a highly available (active-passive) Apache web server. The first resource we need to create for this is ClusterIP, in other words a floating IP. A floating IP address supports failover in a high-availability cluster: the cluster is configured so that only the active member “owns” or responds to that IP address at any given time.

On either node.

[root@pck1 ~]# pcs resource create ClusterIP ocf:heartbeat:IPaddr2 \
    ip=192.168.122.24 cidr_netmask=32 op monitor interval=30s
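To check where the floating IP landed, you can look at the resource and at the node's interfaces. node_has_ip is a hypothetical helper; pcs resource show is the pcs 0.9 (RHEL7) syntax.

```shell
#!/bin/bash
# node_has_ip IP: succeed if IP is configured on any IPv4 interface of this node
# (hypothetical helper; -F matches the address literally, -w avoids 1.2.3.4 matching 1.2.3.45)
node_has_ip() {
    ip -o -4 addr show | grep -qwF -- "inet $1"
}

# Usage, on either node:
#   pcs resource show ClusterIP
#   node_has_ip 192.168.122.24 && echo "floating IP is on this node"
```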

Configuring Apache for the Active-Passive (HA) Cluster:

On both nodes, install httpd and wget. wget is used by the apache resource agent to request the server-status page when it checks the health of the web server.

[root@pck1 ~]# yum install -y httpd wget

Create a web page on both nodes as shown below.

On the first node, create /var/www/html/index.html:

 <html>
 <body>LATAMWEBGW running on - pck1.localdomain</body>
 </html>

On the second node, create /var/www/html/index.html:

 <html>
 <body>LATAMWEBGW running on - pck2.localdomain</body>
 </html>
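Rather than editing each page by hand, the page can be written from the node's own hostname. This sketch assumes hostname returns the name you want displayed (pck1.localdomain and pck2.localdomain in this post); write_index is a hypothetical name.

```shell
#!/bin/bash
# write_index [DOCROOT]: write index.html naming the node it is served from
# (hypothetical helper; DOCROOT defaults to /var/www/html)
write_index() {
    local docroot="${1:-/var/www/html}"
    cat > "$docroot/index.html" <<EOF
 <html>
 <body>LATAMWEBGW running on - $(hostname)</body>
 </html>
EOF
}

# Usage, as root on both nodes:
#   write_index
```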

Configuring Apache for server-status:

The cluster resource agent requests this page to check the health of Apache on each node.

On the first node, create /etc/httpd/conf.d/status.conf:

<Location /server-status>
    SetHandler server-status
    Require all denied
    Require ip 127.0.0.1
    Require ip ::1
    Require ip 192.168.122.22
</Location>

On the second node, create /etc/httpd/conf.d/status.conf:

<Location /server-status>
    SetHandler server-status
    Require all denied
    Require ip 127.0.0.1
    Require ip ::1
    Require ip 192.168.122.23
</Location>
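Since the two files differ only in the node's address, they can be generated from one template. write_status_conf is a hypothetical helper sketched below; apachectl configtest checks the syntax before the cluster starts httpd.

```shell
#!/bin/bash
# write_status_conf FILE NODE_IP: write the mod_status config shown above,
# allowing localhost (IPv4 and IPv6) and the node's own address (hypothetical helper)
write_status_conf() {
    local file="$1" node_ip="$2"
    cat > "$file" <<EOF
<Location /server-status>
    SetHandler server-status
    Require all denied
    Require ip 127.0.0.1
    Require ip ::1
    Require ip $node_ip
</Location>
EOF
}

# Usage, as root:
#   on pck1: write_status_conf /etc/httpd/conf.d/status.conf 192.168.122.22
#   on pck2: write_status_conf /etc/httpd/conf.d/status.conf 192.168.122.23
#   apachectl configtest
```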

Creating the Apache Resource:

On either node:

pcs resource create LATAMWEBGW ocf:heartbeat:apache  \
      configfile=/etc/httpd/conf/httpd.conf \
      statusurl="http://localhost/server-status" \
      op monitor interval=1min
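Once the resource is created, you can request the same statusurl yourself to see what the monitor operation sees. This is a sketch with a hypothetical status_page_ok helper, using wget as the post says the resource agent does; wget exits non-zero on HTTP errors such as 403.

```shell
#!/bin/bash
# status_page_ok [URL]: succeed if the server-status page answers without an HTTP error
# (hypothetical helper; defaults to the statusurl configured for LATAMWEBGW)
status_page_ok() {
    wget -q -O /dev/null "${1:-http://localhost/server-status}"
}

# Usage, on the node running LATAMWEBGW:
#   pcs resource show LATAMWEBGW
#   status_page_ok && echo "server-status reachable"
```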

As you may know, we now have two resources: ClusterIP and LATAMWEBGW. Unless told otherwise, the cluster manager spreads resources across the nodes, so these two could end up running on different nodes, which we do not want: Apache must run on the node that holds the floating IP. We need to add constraints to solve this problem.

Creating colocation constraint:

A colocation constraint tells the cluster manager that the location of one resource depends on the location of another.

On either node:

pcs constraint colocation add LATAMWEBGW with ClusterIP INFINITY

With the command above, we told the cluster manager that the LATAMWEBGW resource must run on the same node as the ClusterIP resource; the INFINITY score makes this mandatory.
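To check that the constraint holds, you can ask Pacemaker where each resource is running. crm_resource --locate prints a line such as "resource ClusterIP is running on: pck1"; resource_node and colocated are hypothetical helpers that parse that line.

```shell
#!/bin/bash
# resource_node NAME: print the node a resource is running on, parsed from crm_resource
# (hypothetical helper; takes the word after "on:")
resource_node() {
    crm_resource --locate --resource "$1" |
        awk '{ for (i = 1; i < NF; i++) if ($i == "on:") print $(i + 1) }'
}

# colocated A B: succeed if both resources report the same, non-empty node
colocated() {
    local a b
    a=$(resource_node "$1")
    b=$(resource_node "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage, on either node:
#   pcs constraint show        # list the configured constraints
#   colocated LATAMWEBGW ClusterIP && echo "same node"
```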

The latest status of our cluster:

[root@pck1 conf.d]# pcs status
Cluster name: LATAM_GW
Stack: corosync
Current DC: pck1 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Sun Dec  2 13:43:46 2018
Last change: Sun Dec  2 13:43:42 2018 by root via cibadmin on pck1

2 nodes configured
3 resources configured

Online: [ pck1 pck2 ]

Full list of resources:

 virsh-fencing	(stonith:fence_virsh):	Stopped
 ClusterIP	(ocf::heartbeat:IPaddr2):	Started pck1
 LATAMWEBGW	(ocf::heartbeat:apache):	Started pck1

Creating Order Constraint:

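We may still have an issue when sending requests to our web server. Besides the colocation constraint, we also need to tell the cluster software the order in which the resources must start; otherwise Apache may start before the floating IP exists, which matters if Apache is configured to listen only on specific addresses. This is called an order constraint.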

[root@pck1 conf.d]# pcs constraint order ClusterIP then LATAMWEBGW
Adding ClusterIP LATAMWEBGW (kind: Mandatory) (Options: first-action=start then-action=start)
[root@pck1 conf.d]#

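With the command above, we tell the cluster manager which resource to start first: ClusterIP, then LATAMWEBGW (and, in reverse, to stop LATAMWEBGW before ClusterIP). With both constraints configured, we can test whether our web server handles requests properly. Here, latamwebgw is the client-side hostname for the floating IP 192.168.122.24: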

tesla@otuken:~$ curl http://latamwebgw
 <html>
 <body>LATAMWEBGW running on - pck1.localdomain</body>
 </html>
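During failover tests it helps to poll the floating IP continuously and watch which node answers. This is a sketch with a hypothetical watch_vip helper, assuming curl on the client and a page in the format created earlier.

```shell
#!/bin/bash
# watch_vip [URL] [COUNT]: request URL once per second, COUNT times, and print the
# text inside <body>...</body>, which names the node that answered (hypothetical helper)
watch_vip() {
    local url="${1:-http://latamwebgw}" count="${2:-10}" i body
    for ((i = 1; i <= count; i++)); do
        if body=$(curl -s --max-time 2 "$url"); then
            printf '%s %s\n' "$(date +%T)" \
                "$(sed -n 's:.*<body>\(.*\)</body>.*:\1:p' <<<"$body")"
        else
            printf '%s no answer\n' "$(date +%T)"
        fi
        sleep 1
    done
}

# Usage, from a client while moving resources or rebooting a node:
#   watch_vip http://latamwebgw 30
```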

Relocating the Resource to Another Node:

Sometimes we need to relocate resources to the other node, for example to carry out maintenance on a node.

[root@pck2 ~]# pcs status
Cluster name: LATAM_GW
Stack: corosync
Current DC: pck1 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Wed Dec  5 23:26:49 2018
Last change: Mon Dec  3 09:27:47 2018 by root via crm_resource on pck1

2 nodes configured
3 resources configured

Online: [ pck1 pck2 ]

Full list of resources:

 virsh-fencing	(stonith:fence_virsh):	Stopped
 ClusterIP	(ocf::heartbeat:IPaddr2):	Started pck1
 LATAMWEBGW	(ocf::heartbeat:apache):	Started pck1

Failed Actions:
* virsh-fencing_start_0 on pck1 'unknown error' (1): call=14, status=Error, exitreason='',
    last-rc-change='Wed Dec  5 21:26:32 2018', queued=0ms, exec=1404ms
* virsh-fencing_start_0 on pck2 'unknown error' (1): call=14, status=Error, exitreason='',
    last-rc-change='Wed Dec  5 21:26:36 2018', queued=0ms, exec=1398ms


Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled

As you can see, the resources are currently running on pck1, the first node. Let’s relocate LATAMWEBGW to pck2, the second node:

[root@pck2 ~]# pcs resource move LATAMWEBGW pck2
[root@pck2 ~]# pcs status
Cluster name: LATAM_GW
Stack: corosync
Current DC: pck1 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Wed Dec  5 23:34:10 2018
Last change: Wed Dec  5 23:33:42 2018 by root via crm_resource on pck2

2 nodes configured
3 resources configured

Online: [ pck1 pck2 ]

Full list of resources:

 virsh-fencing	(stonith:fence_virsh):	Stopped
 ClusterIP	(ocf::heartbeat:IPaddr2):	Started pck2
 LATAMWEBGW	(ocf::heartbeat:apache):	Started pck2

As you can see, LATAMWEBGW is now running on pck2, the second node, and ClusterIP moved with it because of the colocation constraint. The web page is now served by pck2:

tesla@otuken:~$ curl http://latamwebgw
 <html>
 <body>LATAMWEBGW running on - pck2.localdomain</body>
 </html>
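pcs resource move works by adding a location constraint that prefers the target node (its id starts with cli-prefer-). It remains until removed, so LATAMWEBGW stays pinned to pck2 whenever pck2 is available. A sketch of the cleanup, with finish_move as a hypothetical name:

```shell
#!/bin/bash
# finish_move RESOURCE: list constraints (look for an id starting with cli-prefer-)
# and remove the ones left behind by "pcs resource move" (hypothetical helper)
finish_move() {
    local rsc="$1"
    pcs constraint show --full
    pcs resource clear "$rsc"
}

# Usage, on either node after the move above:
#   finish_move LATAMWEBGW
```

Clearing the constraint does not move the resource back by itself; it only lets the cluster place it freely again.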

 

This is the end of the first part of the RHEL7 clustering series. We have not yet configured fencing, which is a crucial part of clustering. In the next post, we are going to configure fencing, resource stickiness, and other cluster settings.

Happy Clustering 🙂