Building RedHat 6 Cluster

Prepare nodes

Install RH6 with a minimal configuration on two VMs (look into HOWTO align VMware Linux VMDK files). Add a (VMDK) disk to every node for the application. My configuration looks as follows (on both nodes):

/dev/sda	128m	-> First partition /dev/sda1 used for /boot
/dev/sdb	8g	-> Whole disk used as PV for rootvg
/dev/sdc	30g	-> Whole disk used as PV for orahome

Copy host SSH keys from one node to another:

vorh6t01 # scp vorh6t02:/etc/ssh/ssh_host_\* /etc/ssh/
...
vorh6t01 # service sshd restart

Generate root SSH keys and exchange them between the cluster nodes:

vorh6t01 # ssh-keygen -t rsa -b 1024 -C "root@vorh6t"
.....
vorh6t01 # cat .ssh/id_rsa.pub >> .ssh/authorized_keys
vorh6t01 # scp -pr .ssh vorh6t02:

Update the /etc/hosts file with all relevant IPs and put it on both nodes.
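A minimal sketch of such a hosts file; the public node addresses (192.168.131.10/11) and the *-drbd aliases are illustrative assumptions, while the 10.10.10.x replication addresses and the service IP are the ones used later in this article:

# cat /etc/hosts
127.0.0.1       localhost localhost.localdomain
192.168.131.10  vorh6t01.domain.com vorh6t01      # node 1 (address assumed)
192.168.131.11  vorh6t02.domain.com vorh6t02      # node 2 (address assumed)
192.168.131.12  vorh6t.domain.com   vorh6t        # cluster service IP
10.10.10.240    vorh6t01-drbd                     # DRBD replication link
10.10.10.241    vorh6t02-drbd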

Do the following LVM preparation on both nodes:

# pvcreate --dataalignment 4k /dev/sdc 
  Physical volume "/dev/sdc" successfully created
# vgcreate orahome /dev/sdc
  Volume group "orahome" successfully created
# lvcreate -n export -L25g /dev/orahome
  Logical volume "export" created

No need to run mkfs at this point, only lvcreate.

Installing DRBD software

DRBD (Distributed Replicated Block Device) will make our two /dev/sdc disks, one local to each node, behave like shared storage. This makes our VMs storage independent and allows the RH fail-over cluster to work.

There is still no binary distribution for RH6, although you can purchase it with support from the author, LINBIT. However, you are still able to compile it from source (thanks to the GPL):

# yum install make gcc kernel-devel flex rpm-build libxslt
# cd /tmp && wget -q -O - http://oss.linbit.com/drbd/8.4/drbd-8.4.4.tar.gz | tar zxvf -
# cd drbd-8.4.4/
# ./configure --with-utils --with-km --with-udev --with-rgmanager --with-bashcompletion --prefix=/usr --localstatedir=/var --sysconfdir=/etc
# make
# make install

Note: You have to recompile the kernel module every time you upgrade the kernel:

# make module
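A quick sanity check after building and installing, to confirm that the module loads and matches the userland tools (a sketch; version strings will differ per build):

# modinfo drbd | grep -i ^version
# modprobe drbd
# head -1 /proc/drbd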

Configuring DRBD

You can put everything into /etc/drbd.conf; however, the practice recommended by LINBIT is to separate the common and resource configuration using include directives:

# cat /etc/drbd.conf
# You can find an example in  /usr/share/doc/drbd.../drbd.conf.example

include "drbd.d/global_common.conf";
include "drbd.d/*.res";

Copy global_common.conf from the distribution to /etc/drbd.d and edit it to fit your needs.

# cat /etc/drbd.d/global_common.conf 
global {                                                      
        usage-count no;                                       
}                                                             

common {
        handlers { }                                                                                                                                                           

        startup {
                wfc-timeout 300;
                degr-wfc-timeout 0;
        }

        options { }

        disk { }

        net {
                protocol        C;
                cram-hmac-alg   sha1;
                shared-secret   "9szdFmSkQEoXU1s7UNVbpqYrhhIsGjhQ4MxzNeotPku3NkJEq3LovZcHB2pITRy";
                use-rle yes;
        }
}

Some security is not a bad idea; use "shared-secret".
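One way to generate a random secret for that field (a sketch, assuming openssl is installed; any random-string generator will do, just keep the value identical on both nodes):

# openssl rand -base64 48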

# cat /etc/drbd.d/export.res
resource export {
        device    /dev/drbd1;
        disk      /dev/orahome/export;
        meta-disk internal;

        disk {
		resync-rate 40M;
		fencing resource-and-stonith;
	}
        net {
		csums-alg sha1;
		after-sb-0pri discard-zero-changes;
		after-sb-1pri discard-secondary;
		after-sb-2pri disconnect;
	}
	handlers {
		fence-peer	"/usr/lib/drbd/rhcs_fence";
	}

        on vorh6t01.domain.com { address   10.10.10.240:7789; }
        on vorh6t02.domain.com { address   10.10.10.241:7789; }
}

I've added a 10.10.10/24 NIC to both VMs for replication purposes only. As you can see, DRBD runs on top of an LVM logical volume; this will help me with the backup procedure later. LVM on top of DRBD also works, but the cluster software cannot manage that configuration well, so it is not recommended.
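A sketch of the replication interface configuration on the first node, assuming the extra NIC shows up as eth1 (the device name is an assumption; use 10.10.10.241 on the second node):

# cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=static
IPADDR=10.10.10.240
NETMASK=255.255.255.0
ONBOOT=yes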

Replicate the configuration to the second node:

root@vorh6t01:~ # scp -pr /etc/drbd.* root@vorh6t02:/etc/

Initialize DRBD:

root@vorh6t01:~ # drbdadm create-md export
...
root@vorh6t02:~ # drbdadm create-md export
...
root@vorh6t01:~ # drbdadm up export
root@vorh6t02:~ # drbdadm up export
# cat /proc/drbd
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 74402fecf24da8e5438171ee8c19e28627e1c98a build by root@vorh6t01.domain.com, 2014-03-18 12:05:58

 1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:31456284

As you can see, it is in Connected state, with both sides marked Secondary and Inconsistent.

Let's help DRBD make a decision:

root@vorh6t01:~ # drbdadm primary --force export
root@vorh6t01:~ # cat /proc/drbd
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 74402fecf24da8e5438171ee8c19e28627e1c98a build by root@vorh6t01.domain.com, 2014-03-18 12:05:58

 1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:2169856 nr:0 dw:0 dr:2170520 al:0 bm:132 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:27475996
        [>...................] sync'ed:  7.4% (26832/28948)M
        finish: 0:11:03 speed: 41,416 (27,464) K/sec

OK, vorh6t01 becomes Primary and UpToDate, and synchronization begins.

Format our FS:

root@vorh6t01:~ # mkfs.ext3 -j -m0 -b4096 /dev/drbd1
...

Try mounting it:

root@vorh6t02:~ # mkdir /export
root@vorh6t01:~ # mkdir /export && mount /dev/drbd1 /export
root@vorh6t01:~ # df
Filesystem            Size  Used Avail Use% Mounted on
...
/dev/drbd1             25G  5.9G   19G  24% /export
root@vorh6t01:~ # umount /export

Fix the chkconfig line of the /etc/init.d/drbd script on both nodes. Also remove any hint lines between ### BEGIN INIT INFO and ### END INIT INFO. This fix adjusts the drbd start/stop order to the correct place (for RH6), between the network and the clusterware.

...
# chkconfig: 2345 20 80
...
### BEGIN INIT INFO
# Provides: drbd
### END INIT INFO
...

Make DRBD start at boot time on both nodes:

# chkconfig --add drbd
# chkconfig drbd on
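To double-check the resulting boot order, you can list the rc symlinks (a quick sketch; with the chkconfig header above, drbd should appear as S20, after the network script and before the cluster services added later):

# ls /etc/rc3.d/ | egrep 'network|drbd'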

Cluster software

Install these RPMs on both nodes (with all dependencies):

# yum install lvm2-cluster ccs cman rgmanager

Setting up the Cluster

vorh6t01 and vorh6t02 are the two nodes of an HA (fail-over) cluster named vorh6t. Take care to make all names resolvable by DNS and add all names to /etc/hosts on both nodes.

Define cluster:

# ccs_tool create -2 vorh6t

The command above creates the /etc/cluster/cluster.conf file. It can be edited by hand and has to be redistributed to every node in the cluster. The -2 option is required for a two-node cluster; the usual configuration assumes more than two nodes, to make quorum decisions clear.

Open the file and change the node names to the real names. The resulting file should look like:

<?xml version="1.0"?>
<cluster name="vorh6t" config_version="1">

  <cman two_node="1" expected_votes="1" transport="udpu" />
  <clusternodes>
    <clusternode name="vorh6t01.domain.com" votes="1" nodeid="1">
      <fence>
        <method name="single">
        </method>
      </fence>
    </clusternode>
    <clusternode name="vorh6t02.domain.com" votes="1" nodeid="2">
      <fence>
        <method name="single">
        </method>
      </fence>
    </clusternode>
  </clusternodes>

  <fencedevices>
  </fencedevices>

  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>

I am using transport="udpu" here because my network does not support multicast, and broadcasts are not welcome either. Without this option, my cluster behaves unpredictably. Check:

# ccs_tool lsnode

Cluster name: vorh6t, config_version: 1

Nodename                        Votes Nodeid Fencetype
vorh6t01.domain.com                1    1    
vorh6t02.domain.com                1    2    
# ccs_tool lsfence
Name             Agent

Copy /etc/cluster/cluster.conf to second node:

vorh6t01 # scp /etc/cluster/cluster.conf vorh6t02:/etc/cluster/cluster.conf

You can start the cluster services now to see them working. Start them with /etc/init.d/cman start on both nodes. Check /var/log/messages. See the clustat output:

vorh6t01 # clustat 
Cluster Status for vorh6t @ Thu Sep 27 15:04:58 2012
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 vorh6t01.domain.com                                                1 Online, Local
 vorh6t02.domain.com                                                2 Online

vorh6t02 # clustat 
Cluster Status for vorh6t @ Thu Sep 27 15:05:07 2012
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 vorh6t01.domain.com                                                1 Online
 vorh6t02.domain.com                                                2 Online, Local

Setting up Cluster resources

Stop the cluster services on both nodes with /etc/init.d/cman stop.

There are two sections related to resources: <resources/> and <service/>. The first is for "global" resources shared between services (like an IP). The second is for resources grouped into a service (like an FS plus a script). Our cluster is a single-purpose cluster, so we only fill in the <service> section.

...
  <rm>
    <failoverdomains/>
    <resources/>
    <service autostart="1" name="vorh6t" recovery="relocate">
      <ip address="192.168.131.12/24" />
    </service>
  </rm>
...

Copy config file to second node.

Start, stop, switch service

Add the cluster services to the init scripts. Start the cluster and the resource manager on both nodes:

# chkconfig --add cman
# chkconfig cman on
# chkconfig --add rgmanager
# chkconfig  rgmanager on
# /etc/init.d/cman start
# /etc/init.d/rgmanager start
# clustat
Cluster Status for vorh6t @ Tue Oct  2 12:55:38 2012
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 vorh6t01.domain.com                        1 Online, rgmanager
 vorh6t02.domain.com                        2 Online, Local, rgmanager

 Service Name                             Owner (Last)                                     State         
 ------- ----                             ----- ------                                     -----         
 service:vorh6t                           vorh6t01.domain.com                              started

Switch the service to another node:

# clusvcadm -r vorh6t -m vorh6t02  
Trying to relocate service:vorh6t...Success
service:vorh6t is now running on vorh6t02.domain.com

Freeze resources (for maintenance):

# clusvcadm -Z vorh6t
Local machine freezing service:vorh6t...Success

Resume normal operation:

# clusvcadm -U vorh6t   
Local machine unfreezing service:vorh6t...Success

Adding more resources

Resources can and should be nested to create dependencies between them:

...
  <rm>
    <failoverdomains/>
    <resources/>
        <service autostart="1" name="vorh6t" recovery="relocate">
                <drbd name="vorh6tdrdb" resource="export">
                        <fs name="vorh6tfs"
                                device="/dev/drbd/by-res/export/0"
                                mountpoint="/export"
                                fstype="ext3"
                                force_unmount="1"
                                self_fence="1"
                        />
                </drbd>
                <ip address="192.168.168.3/22" />
        </service>
  </rm>
...

Increment config_version at the beginning of /etc/cluster/cluster.conf. Distribute the updated /etc/cluster/cluster.conf and inform the cluster about the changes:

vorh6t01 # scp /etc/cluster/cluster.conf vorh6t02:/etc/cluster/cluster.conf
# cman_tool version -r -S
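Before distributing a new version, it can be worth validating the file against the cluster schema; a sketch, assuming the ccs_config_validate tool from the RH6 cluster packages is available:

# ccs_config_validate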

Check /var/log/messages for errors on both nodes. Verify the status with clustat. df should show /export mounted on one of the nodes.

Adding Fencing

An RH cluster is almost broken without well-configured fencing. You can see the available fencing agents in /usr/sbin/fence*. Here we'll use VMware fencing. Install the prerequisites:

# yum install openssl-devel

Install the VI Perl Toolkit on both nodes; sometimes VMware calls it the vSphere SDK, CLI or whatever. It should install /usr/lib/vmware-vcli/apps/ and other tools in /usr/bin. The package called "VMware-vSphere-Perl-SDK-5.5.0*" was OK for me.

/tmp # tar zxf VMware-vSphere-Perl-SDK-5.5.0-2043780.x86_64.tar.gz
/tmp # cd vmware-vsphere-cli-distrib
/tmp/vmware-vsphere-cli-distrib # ./vmware-install.pl

Check that everything works and that your user has sufficient rights to talk to the VC. Check it on both nodes:

# fence_vmware --action=status --ip="VCNAME" --username="VCUSER" --password="PASSWORD" --plug=vorh6t01

Fix /etc/cluster/cluster.conf:

<?xml version="1.0"?>
<cluster name="vorh6t" config_version="3">

  <cman two_node="1" expected_votes="1" transport="udpu" />
  <clusternodes>
    <clusternode name="vorh6t01.domain.com" votes="1" nodeid="1">
      <fence>
        <method name="single">
                <device name="vmware" port="vorh6t01" />
        </method>
      </fence>
    </clusternode>
    <clusternode name="vorh6t02.domain.com" votes="1" nodeid="2">
      <fence>
        <method name="single">
                <device name="vmware" port="vorh6t02" />
        </method>
      </fence>
    </clusternode>
  </clusternodes>

  <fencedevices>
        <fencedevice name="vmware" agent="fence_vmware" ipaddr="VCNAME" action="off" login="VCUSER" passwd="PASSWORD" /> 
  </fencedevices>
...

port is the name of the VM on the VC; ipaddr is the name or IP of the VC.

Copy to the neighbour and propagate the changes:

vorh6t01:~ # scp /etc/cluster/cluster.conf vorh6t02:/etc/cluster/cluster.conf
vorh6t01:~ # cman_tool version -r -S
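To verify that fencing works end to end, you can ask the cluster itself to fence a node (a sketch; this really powers off the victim VM, so only do it on a test cluster):

vorh6t01:~ # fence_node vorh6t02.domain.com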

Load script

# cat /export/load.sh
#!/bin/bash
# Simple I/O load generator: endlessly writes small files of random data
# into a per-node directory on the replicated filesystem.

mkdir -p /export/$(hostname -s)
cd /export/$(hostname -s) || exit 1

i=1
while true ; do
        # read random data into a variable, store it in a file named
        # after its own md5 sum, then print the iteration counter
        a=$(dd if=/dev/urandom bs=4k count=256 2>/dev/null)
        echo "$a" > $(echo "$a" | md5sum -b | awk '{print $1}')
        echo $i
        i=$(($i+1))
done
# chmod +x /export/load.sh
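To exercise a failover under load (a sketch; node names as elsewhere in this article): run the script on the node currently owning the service, relocate the service, and check that the files written so far show up on the other node:

vorh6t01 # /export/load.sh &
vorh6t01 # clusvcadm -r vorh6t -m vorh6t02
vorh6t02 # ls /export/vorh6t01 | wc -l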

Manual fencing

If you know the second node is definitely dead, but your fencing has not worked (for example, the ESX host is dead), you can acknowledge the fencing manually, like:

root@vorh6t02:~ # fence_ack_manual vorh6t01.domain.com
About to override fencing for vorh6t01.domain.com.             
Improper use of this command can cause severe file system damage.

Continue [NO/absolutely]? absolutely
Done 

Updated on Sun Dec 14 12:29:20 IST 2014