Building RedHat 6 Cluster

Prepare nodes and shared storage

Prepare one node as described in the Mounting iSCSI LUN on RedHat 6 memo. Remove the /export entry from /etc/fstab, unmount the FS and export the VG:

# vi /etc/fstab
# umount /export
# vgchange -a n datavg
# vgexport datavg

Prepare the second node like the first one. Map the same NetApp LUN to the second node too. Once multipath sees the new LUN, rescan the PVs and try to mount the FS:

# vgscan
# vgimport datavg
# vgchange -a y datavg
# mkdir /export ; mount -o discard /dev/datavg/export /export
# vi /etc/fstab

Test a server restart to see that everything starts automatically. Then remove the /export entry from /etc/fstab, unmount the FS and export the VG exactly as was done for the first node.
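
For completeness, this is the same sequence already used on the first node:

# vi /etc/fstab
# umount /export
# vgchange -a n datavg
# vgexport datavg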

Prepare LVM for the HA configuration. Edit /etc/lvm/lvm.conf with the following changes.

Filter to see only the relevant (multipathed) devices; /dev/sdb is my "rootvg" (see the HOWTO align VMware Linux VMDK files memo for the reason):

filter = [ "a|/dev/mapper/nlun|","a|/dev/sdb|","r/.*/" ]

Explicitly name the VGs activated at LVM start (this is just a list of VGs plus a tag - the heartbeat NIC's hostname):

volume_list = [ "rootvg", "@vorh6t01.domain.com" ]

The initrd has to be rebuilt to include the new lvm.conf (otherwise the cluster refuses to start):

mkinitrd -f /boot/initramfs-$(uname -r).img $(uname -r)
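
It may be worth keeping a copy of the current initramfs before overwriting it, so there is something to fall back to if the rebuild goes wrong:

# cp -p /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.orig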

Repeat the /etc/lvm/lvm.conf changes on the other node.

Generate root SSH keys and exchange them between the cluster nodes:

vorh6t01 # ssh-keygen -t rsa -b 1024 -C "root@vorh6t"
.....
vorh6t01 # cat .ssh/id_rsa.pub >> .ssh/authorized_keys
vorh6t01 # scp -pr .ssh vorh6t02:

Copy the SSH host keys between the nodes and restart sshd.
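
A minimal sketch of doing that from the first node (it makes both nodes present identical host keys to SSH clients):

vorh6t01 # scp -p /etc/ssh/ssh_host_* vorh6t02:/etc/ssh/
vorh6t01 # ssh vorh6t02 /etc/init.d/sshd restart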

Cluster software

One package requires another, so install these RPMs on both nodes:

# yum install lvm2-cluster ccs cman rgmanager

Setting up the Cluster

vorh6t01 and vorh6t02 are the two nodes of an HA (fail-over) cluster named vorh6t. Take care to make all names resolvable by DNS and add all names to /etc/hosts on both nodes.
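
For example, /etc/hosts on both nodes could contain entries along these lines (the addresses below are placeholders, adjust to your network; the last entry is the service VIP):

192.168.131.10   vorh6t01.domain.com vorh6t01
192.168.131.11   vorh6t02.domain.com vorh6t02
192.168.131.12   vorh6t.domain.com   vorh6t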

Define cluster:

# ccs_tool create -2 vorh6t

The command above creates the /etc/cluster/cluster.conf file. It can be edited by hand and has to be redistributed to every node in the cluster. The -2 option is required for a two-node cluster; the usual configuration assumes more than two nodes, so that quorum is unambiguous.

Open the file and change the node names to the real names. Check:

# ccs_tool lsnode

Cluster name: vorh6t, config_version: 1

Nodename                        Votes Nodeid Fencetype
vorh6t01.domain.com                1    1    
vorh6t02.domain.com                1    2    
# ccs_tool lsfence
Name             Agent

I do not deal with fencing right now. That section will be added later, once the cluster is installed on real physical servers.
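
For reference, after renaming the nodes the file should look roughly like this (a sketch; the exact attributes written by ccs_tool may differ slightly):

<?xml version="1.0"?>
<cluster name="vorh6t" config_version="1">
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="vorh6t01.domain.com" votes="1" nodeid="1"/>
    <clusternode name="vorh6t02.domain.com" votes="1" nodeid="2"/>
  </clusternodes>
  <fencedevices/>
  <rm/>
</cluster>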

Copy /etc/cluster/cluster.conf to second node:

vorh6t01 # scp /etc/cluster/cluster.conf vorh6t02:/etc/cluster/cluster.conf

You can start the cluster services now to see them working. Start them with /etc/init.d/cman start on both nodes. Check /var/log/messages. See the clustat output:

vorh6t01 # clustat 
Cluster Status for vorh6t @ Thu Sep 27 15:04:58 2012
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 vorh6t01.domain.com                                                1 Online, Local
 vorh6t02.domain.com                                                2 Online

vorh6t02 # clustat 
Cluster Status for vorh6t @ Thu Sep 27 15:05:07 2012
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 vorh6t01.domain.com                                                1 Online
 vorh6t02.domain.com                                                2 Online, Local

Setting up Cluster resources

Stop the cluster services on both nodes with /etc/init.d/cman stop.

There are two sections related to resources: <resources/> and <service/>. The first is for "global" resources shared between services (like an IP). The second is for resources grouped by service (like FS + script). Our cluster is a single-purpose cluster, so only the <service> section is filled in.

...
  <rm>
    <failoverdomains/>
    <resources/>
    <service autostart="1" name="vorh6t" recovery="relocate">
      <ip address="192.168.131.12/24" />
    </service>
  </rm>
...
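
Before distributing the file it is worth sanity-checking it; xmllint catches XML syntax mistakes, and ccs_config_validate (shipped with the RHEL 6 cluster packages) checks it against the cluster schema:

# xmllint --noout /etc/cluster/cluster.conf
# ccs_config_validate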

Start, stop, switch service

Add the cluster services to the init scripts. Start the cluster and the resource manager on both nodes:

# chkconfig --add cman
# chkconfig cman on
# chkconfig --add rgmanager
# chkconfig rgmanager on
# /etc/init.d/cman start
# /etc/init.d/rgmanager start
# clustat
Cluster Status for vorh6t @ Tue Oct  2 12:55:38 2012
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 vorh6t01.domain.com                        1 Online, rgmanager
 vorh6t02.domain.com                        2 Online, Local, rgmanager

 Service Name                             Owner (Last)                                     State         
 ------- ----                             ----- ------                                     -----         
 service:vorh6t                           vorh6t01.domain.com                              started

Switch the service to another node:

# clusvcadm -r vorh6t -m vorh6t02  
Trying to relocate service:vorh6t...Success
service:vorh6t is now running on vorh6t02.domain.com

Freeze the service (for maintenance):

# clusvcadm -Z vorh6t
Local machine freezing service:vorh6t...Success

Resume normal operation:

# clusvcadm -U vorh6t   
Local machine unfreezing service:vorh6t...Success
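
The service can also be stopped (disabled) and started (enabled) entirely, as opposed to frozen:

# clusvcadm -d vorh6t
# clusvcadm -e vorh6t -m vorh6t01

The -m argument is optional; without it rgmanager picks the node itself.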

Adding more resources

Resources can and should be nested to create dependencies between them:

...
  <rm>
    <failoverdomains/>
    <resources/>
    <service autostart="1" name="vorh6t" recovery="relocate">
      <ip address="10.129.131.12/22">
        <lvm name="vorh6tlv" lv_name="export" vg_name="datavg">
          <fs name="vorh6tfs"
                device="/dev/datavg/export"
                mountpoint="/export"
                fstype="ext4"
                options="discard"
                force_unmount="1"
                self_fence="1"
          />
        </lvm>
      </ip>
    </service>
  </rm>
...

Increment config_version at the beginning of /etc/cluster/cluster.conf. Distribute the updated /etc/cluster/cluster.conf and inform the cluster about the changes:

vorh6t01 # scp /etc/cluster/cluster.conf vorh6t02:/etc/cluster/cluster.conf
# cman_tool version -r -S

Check /var/log/messages for errors on both nodes. Verify the status with clustat. df should show /export mounted on one of the nodes.

Let's do something more original

Edit the /etc/ssh/sshd_config file to force sshd to listen only on the local IPs (not on the VIP). Populate the shared /export with a virtual FC16 distribution; a similar example can be found in the Installing NFS based FC15 memo (with updates for FC16). Then, still in the chrooted FC16 skeleton, install openssh-server using yum. Exit the chrooted FC16. The resulting /export listing should look similar to this:

# ls /export/
bin   dev  home  lib64       media  opt   root  sbin  sys  usr
boot  etc  lib   lost+found  mnt    proc  run   srv   tmp  var
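
For reference, a rough sketch of how the skeleton above could be bootstrapped, assuming an FC16 mirror is reachable from the node (the repository id, URL and package set below are placeholders; the referenced memo covers the full procedure):

# cat > /etc/yum.repos.d/fc16.repo <<'EOF'
[fc16]
name=Fedora 16
baseurl=http://mirror.example.com/fedora/releases/16/Everything/x86_64/os/
enabled=0
gpgcheck=0
EOF
# yum -y --installroot=/export --disablerepo='*' --enablerepo=fc16 groupinstall core
# chroot /export yum -y install openssh-server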

Edit /export/etc/ssh/sshd_config to listen only on the VIP.
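
The relevant directive in both files is ListenAddress; for example (the first address is a placeholder for the node's own IP, the second is the VIP used by the service script below):

# grep ^ListenAddress /etc/ssh/sshd_config
ListenAddress 192.168.131.10
# grep ^ListenAddress /export/etc/ssh/sshd_config
ListenAddress 192.168.131.12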

Copy SSH host keys to FC16:

# cp -va /etc/ssh/ssh_host* /export/etc/ssh/
`/etc/ssh/ssh_host_dsa_key' -> `/export/etc/ssh/ssh_host_dsa_key'
`/etc/ssh/ssh_host_dsa_key.pub' -> `/export/etc/ssh/ssh_host_dsa_key.pub'
`/etc/ssh/ssh_host_key' -> `/export/etc/ssh/ssh_host_key'
`/etc/ssh/ssh_host_key.pub' -> `/export/etc/ssh/ssh_host_key.pub'
`/etc/ssh/ssh_host_rsa_key' -> `/export/etc/ssh/ssh_host_rsa_key'
`/etc/ssh/ssh_host_rsa_key.pub' -> `/export/etc/ssh/ssh_host_rsa_key.pub'

Create the service script:

# cat /export/sshd-start-stop 
#!/bin/sh
# rgmanager script resource: start/stop/status of the chrooted FC16 sshd
case "$1" in
start)
        # bind-mount the pseudo filesystems the chrooted sshd needs
        mount -o bind /proc /export/proc
        mount -o bind /sys /export/sys
        mount -o bind /dev /export/dev
        mount -o bind /dev/pts /export/dev/pts
        chroot /export /usr/sbin/sshd
        ;;
stop)
        # kill the sshd listening on the VIP, then undo the bind mounts
        kill -9 $(netstat -tlnp | awk '/192.168.131.12:22/ {gsub("/sshd","");print $NF}')
        umount /export/dev/pts || umount -l /export/dev/pts
        umount /export/dev || umount -l /export/dev
        umount /export/sys || umount -l /export/sys
        umount /export/proc || umount -l /export/proc
        ;;
status)
        # report failure (exit 1) if nothing is listening on the VIP's port 22
        [ 'x'"$(netstat -tlnp | awk '/192.168.131.12:22/ {gsub("/sshd","");print $NF}')" = 'x' ] && exit 1
        exit 0
        ;;
esac
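
The script can be tested by hand before handing it to rgmanager; run it on the node that currently holds /export and the VIP:

# chmod +x /export/sshd-start-stop
# sh /export/sshd-start-stop start
# sh /export/sshd-start-stop status && echo running
# sh /export/sshd-start-stop stop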

Modify /etc/cluster/cluster.conf:

  <rm>
    <failoverdomains/>
    <resources/>
    <service autostart="1" name="vorh6t" recovery="relocate">
      <lvm name="vorh6tlv" lv_name="export" vg_name="datavg">
        <fs name="vorh6tfs"
                device="/dev/datavg/export" mountpoint="/export"
                fstype="ext4" options="discard"
                force_unmount="1" self_fence="1" >

          <ip address="10.129.131.12/22">
            <script name="vorh6tssh" file="/export/sshd-start-stop" />
          </ip>

        </fs>
      </lvm>
    </service>
  </rm>

Reload the configuration and see SSH started on the VIP. Connect to it; we are on FC16:

$ ssh vorh6t
Last login: Thu Oct  4 05:33:09 2012 from ovws
-bash-4.2# cat /etc/issue       
Fedora release 16 (Verne)

This is the way you can implement an HA solution for non-cluster-aware applications.

Using DRBD within the cluster

Building a RedHat 6 Cluster with DRBD is covered in a separate (but similar) document.


Updated on Tue Sep 2 10:33:52 IDT 2014