RedHat 6 (metro) HA cluster

This is an attempt to build something similar to a metro cluster capability using RedHat 6 only. The POC simulates two different sites, each with its own server and storage. Both servers see both LUNs from both sites. Linux LVM (or md) mirrors between the LUNs coming from the different sites, and the cluster software does failover between nodes (sites). FS reads should come from the closest disk, while writes should go to both.

This POC uses two CISCO UCS blades in the same chassis and two LUNs coming from the same NetApp, so it is only a simulation of a metro setup.

During this POC, the LVM way proved itself unusable: once one of the "mirrored" LUNs goes offline (which is supposed to happen by design), LVM hangs and causes everything else to hang. Therefore the "MD Way" was added.

It was then discovered that tuning multipath parameters solves the hang problem, so the LVM way was rechecked and seems to be working well.

Installing software

Two UCS servers were installed with a minimal RH6 installation, configured to boot from SAN (you can use local boot as well, I just do not have internal disks installed).

Prepare SSH

All nodes in an HA cluster have to have the same SSH host keys, so that SSH clients do not go crazy after a failover.

vorh6t01 # scp vorh6t02:/etc/ssh/ssh_host_\* /etc/ssh/
...
vorh6t01 # service sshd restart

Generate root SSH keys and exchange them between the cluster nodes:

vorh6t01 # ssh-keygen -t rsa -b 1024 -C "root@vorh6t"
.....
vorh6t01 # cat .ssh/id_rsa.pub >> .ssh/authorized_keys
vorh6t01 # scp -pr .ssh vorh6t02:

Cluster software

Install these RPMs on both nodes (with all dependencies):

# yum install ccs cman rgmanager

Initial Cluster Settings

vorh6t01 and vorh6t02 are the two nodes of an HA (fail-over) cluster named vorh6t. Take care to make all names resolvable by DNS and add all names to /etc/hosts on both nodes.
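For example, /etc/hosts on both nodes may end up with something like this (the addresses are illustrative only; the third line is the service IP configured later):

192.168.131.10   vorh6t01.domain.com vorh6t01
192.168.131.11   vorh6t02.domain.com vorh6t02
192.168.131.12   vorh6t.domain.com   vorh6t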

Define cluster (on any one node):

# ccs_tool create -2 vorh6t

The command above creates the /etc/cluster/cluster.conf file. It can be edited by hand and has to be redistributed to every node in the cluster. The -2 option is required for a two-node cluster; the usual configuration assumes more than two nodes, to make quorum unambiguous.

Open the file in an editor and change the node names to the real names. I am using transport="udpu" here, because my network does not support multicast and broadcasts are not welcome either. Without this option, my cluster behaves unpredictably. The resulting file should look like:

<?xml version="1.0"?>
<cluster name="vorh6t" config_version="1">

  <cman two_node="1" expected_votes="1" transport="udpu" />
  <clusternodes>
    <clusternode name="vorh6t01.domain.com" votes="1" nodeid="1">
      <fence>
        <method name="single">
        </method>
      </fence>
    </clusternode>
    <clusternode name="vorh6t02.domain.com" votes="1" nodeid="2">
      <fence>
        <method name="single">
        </method>
      </fence>
    </clusternode>
  </clusternodes>

  <fencedevices>
  </fencedevices>

  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>

Then check:

# ccs_tool lsnode

Cluster name: vorh6t, config_version: 1

Nodename                        Votes Nodeid Fencetype
vorh6t01.domain.com                1    1    
vorh6t02.domain.com                1    2    
# ccs_tool lsfence
Name             Agent

Copy /etc/cluster/cluster.conf to the second node:

vorh6t01 # scp /etc/cluster/cluster.conf vorh6t02:/etc/cluster/cluster.conf

You can start cluster services now to see it working. Start them with /etc/init.d/cman start on both nodes. Check /var/log/messages. See the clustat output:

vorh6t01 # clustat 
Cluster Status for vorh6t @ Thu Sep 27 15:04:58 2012
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 vorh6t01.domain.com                                                1 Online, Local
 vorh6t02.domain.com                                                2 Online

vorh6t02 # clustat 
Cluster Status for vorh6t @ Thu Sep 27 15:05:07 2012
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 vorh6t01.domain.com                                                1 Online
 vorh6t02.domain.com                                                2 Online, Local

Adding resources

Stop the cluster services on both nodes with /etc/init.d/cman stop

There are two sections related to resources: <resources/> and <service/>. The first section is for "global" resources shared between services (like an IP). The second is for resources grouped into a service (like FS + script). Our cluster is a single-purpose cluster, so we only fill in the <service> section.

...
  <rm>
    <failoverdomains/>
    <resources/>
    <service autostart="1" name="vorh6t" recovery="relocate">
      <ip address="192.168.131.12/24" />
    </service>
  </rm>
...

Start, stop, switch service

Add cluster services to init scripts. Start cluster and resource manager on both nodes:

# chkconfig --add cman
# chkconfig cman on
# chkconfig --add rgmanager
# chkconfig  rgmanager on
# /etc/init.d/cman start
# /etc/init.d/rgmanager start
# clustat
Cluster Status for vorh6t @ Tue Oct  2 12:55:38 2012
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 vorh6t01.domain.com                        1 Online, rgmanager
 vorh6t02.domain.com                        2 Online, Local, rgmanager

 Service Name                             Owner (Last)                                     State         
 ------- ----                             ----- ------                                     -----         
 service:vorh6t                           vorh6t01.domain.com                              started

Switch Service to another node:

# clusvcadm -r vorh6t -m vorh6t02  
Trying to relocate service:vorh6t...Success
service:vorh6t is now running on vorh6t02.domain.com

Freeze resources (for maintenance):

# clusvcadm -Z vorh6t
Local machine freezing service:vorh6t...Success

Resume normal operation:

# clusvcadm -U vorh6t   
Local machine unfreezing service:vorh6t...Success

Adding Fencing

RH cluster behaviour is almost broken without well-configured fencing. You can see the available fencing agents as /usr/sbin/fence*.

CISCO supplies a fencing agent for UCS, so let's use it. Create a user for fencing at the UCS (or use an existing one) and give it the "power off" and "profile change" roles. Check that you can use this user:

vorh6t02:~ # fence_cisco_ucs -z --action=status --ip=UCSNAME --username=USERNAME --password=PASSWORD --suborg=SUBORG --plug=vorh6t01
Status: ON
vorh6t02:~ # fence_cisco_ucs -z --action=off --ip=UCSNAME --username=USERNAME --password=PASSWORD --suborg=SUBORG --plug=vorh6t01
Success: Powered OFF
vorh6t02:~ # fence_cisco_ucs -z --action=on --ip=UCSNAME --username=USERNAME --password=PASSWORD --suborg=SUBORG --plug=vorh6t01
Success: Powered ON

The --suborg string is usually your "Sub-Organization" name with the prefix "org-". If you called your "Sub-Organization" "Test", the result will be --suborg=org-Test.
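For example, with a Sub-Organization called "Test", the status check from above becomes (all other parameters unchanged):

vorh6t02:~ # fence_cisco_ucs -z --action=status --ip=UCSNAME --username=USERNAME --password=PASSWORD --suborg=org-Test --plug=vorh6t01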

Once the fencing tests work, fix cluster.conf:

# cat /etc/cluster/cluster.conf                                                                                         
<?xml version="1.0"?>                                                                                                                            
<cluster name="vorh6t" config_version="2">                                                                                                       
        <logging syslog_priority="error"/>                                                                                                       

  <fence_daemon post_fail_delay="20" post_join_delay="30" clean_start="1" />
  <cman two_node="1" expected_votes="1" transport="udpu" />
  <clusternodes>                         
    <clusternode name="vorh6t01.domain.com" votes="1" nodeid="1">
      <fence>                                                      
        <method name="single">                                     
                <device name="ucsfence" port="vorh6t01" action="off" />
        </method>                                                  
      </fence>                                                     
    </clusternode>                                                 
    <clusternode name="vorh6t02.domain.com" votes="1" nodeid="2">
      <fence>
        <method name="single">
                <device name="ucsfence" port="vorh6t02" action="off" />
        </method>
      </fence>
    </clusternode>
  </clusternodes>

  <fencedevices>
        <fencedevice name="myfence" agent="fence_manual" />
	<fencedevice name="ucsfence"
		agent="fence_cisco_ucs"
		ipaddr="UCSNAME"
		login="USERNAME"
		passwd="PASSWORD"
		ssl="on"
		suborg="SUBORG"
		/>
  </fencedevices>

  <rm>
    <failoverdomains/>
    <resources/>
      <service autostart="1" name="vorh6t" recovery="relocate">
       <ip address="192.168.131.12/24" />
    </service>
  </rm>
</cluster>

Do not forget to increment the config_version number and save the changes. Verify the config file:

# ccs_config_validate
Configuration validates

Distribute file and update cluster:

vorh6t01 # scp /etc/cluster/cluster.conf vorh6t02:/etc/cluster/cluster.conf
vorh6t01 # cman_tool version -r -S

Let's stop the network (/etc/init.d/network stop) on the active node and see the cluster kill the "bad" server!

It looks like you need to restart cman to reread the config file.
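To watch what the fence daemon does during such a test, something like this can be run on the surviving node (a sketch, assuming default syslog settings):

# fence_tool ls
# grep -i fence /var/log/messages | tail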

Storage settings

Let's create two volumes with LUNs and map both of them to both cluster nodes:

netapp> igroup create -f -t linux vorh6t01 $WWN1 $WWN2
netapp> igroup set vorh6t01 alua yes
netapp> igroup create -f -t linux vorh6t02 $WWN1 $WWN2
netapp> igroup set vorh6t02 alua yes
netapp> vol create vorh6t01 -s none $AGGRNAME 600g
netapp> exportfs -z /vol/vorh6t01
netapp> vol options vorh6t01 minra on
netapp> vol autosize vorh6t01 -m 14t on
netapp> lun create -s 250g -t linux -o noreserve /vol/vorh6t01/data
netapp> lun map /vol/vorh6t01/data vorh6t01
netapp> lun map /vol/vorh6t01/data vorh6t02
netapp> vol create vorh6t02 -s none $AGGRNAME 600g
netapp> exportfs -z /vol/vorh6t02
netapp> vol options vorh6t02 minra on
netapp> vol autosize vorh6t02 -m 14t on
netapp> lun create -s 250g -t linux -o noreserve /vol/vorh6t02/data
netapp> lun map /vol/vorh6t02/data vorh6t01
netapp> lun map /vol/vorh6t02/data vorh6t02

Rescan FC for changes (on both nodes):

# for FC in /sys/class/fc_host/host?/issue_lip ; do echo "1" > $FC ; sleep 5 ; done ; sleep 20

Use a NetApp LUN serial to WWID converter to calculate the LUNs' WWIDs. Fix the LUNs' names in /etc/multipath.conf. I've chosen the names "site01" and "site02" to reflect the simulation of two storages, one for the local and one for the remote site.

multipaths {
        multipath {
                wwid 360a980004176596d6a3f447356493258
                alias site01
        }
        multipath {
                wwid 360a980004176596d6a3f44735649325a
                alias site02
        }
}

Run the multipath command on both nodes and see that both new LUNs are recognized and use multipathing.

# multipath
# multipath -ll
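As a cross-check of the WWIDs taken from the converter, they can also be read directly from the host with scsi_id (a sketch; /dev/sdc is a placeholder for one of the new SCSI paths). The output should match the wwid values used in multipath.conf above:

# /lib/udev/scsi_id --whitelisted --device=/dev/sdc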

Easy part is over.

LVM way

Fix the filter line in /etc/lvm/lvm.conf to filter out the plain SCSI disks. My line is an example of explicitly adding the used devices and ignoring everything else:

...
filter = [ "a|/dev/mapper/mroot|", "a|/dev/mapper/site|", "r/.*/" ]
...

Create a mirrored LV on one node. Make the "site01" disk preferable for reading:

vorh6t01:~ # pvcreate --dataalignment 4k /dev/mapper/site01
  Physical volume "/dev/mapper/site01" successfully created
vorh6t01:~ # pvcreate --dataalignment 4k /dev/mapper/site02
  Physical volume "/dev/mapper/site02" successfully created
vorh6t01:~ # vgcreate orahome /dev/mapper/site0?
  Volume group "orahome" successfully created
vorh6t01:~ # pvs
  PV                   VG      Fmt  Attr PSize   PFree
  /dev/mapper/mroot0p2 rootvg  lvm2 a--   39.88g  34.78g
  /dev/mapper/site01   orahome lvm2 a--  250.00g 250.00g
  /dev/mapper/site02   orahome lvm2 a--  250.00g 250.00g
vorh6t01:~ # lvcreate -n export -L 20g --type raid1 -m1 --nosync /dev/orahome
  WARNING: New raid1 won't be synchronised. Don't read what you didn't write!
  Logical volume "export" created
vorh6t01:~ # lvchange --writemostly /dev/mapper/site02:y /dev/orahome/export
  Logical volume "export" changed.
vorh6t01:~ # mkfs.ext3 -j -m0 -b4096 /dev/orahome/export
vorh6t01:~ # mkdir /export && mount /dev/orahome/export /export

Run some IO tests on /export and verify that writes go to both LUNs, while reads come only from the site01 disk.
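A trivial way to generate such IO and watch where it goes is dd plus iostat (a sketch, assuming the sysstat package is installed; the -N flag makes iostat show device-mapper names, so the site01/site02 rows are visible). Run iostat in a second terminal while dd is running:

vorh6t01:~ # dd if=/dev/zero of=/export/testfile bs=1M count=1024 oflag=direct
vorh6t01:~ # dd if=/export/testfile of=/dev/null bs=1M iflag=direct
vorh6t01:~ # iostat -dmxN 2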

Test that it works on the partner:

vorh6t01:~ # umount /export/
vorh6t01:~ # vgchange -a n orahome
  0 logical volume(s) in volume group "orahome" now active
vorh6t02:~ # vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "orahome" using metadata type lvm2
  Found volume group "rootvg" using metadata type lvm2
vorh6t02:~ # vgchange -ay orahome
  1 logical volume(s) in volume group "orahome" now active
vorh6t02:~ # mount /dev/orahome/export /export

Running the IO test shows that the volume remembers its previous settings and "site02" remains the writemostly LUN. This is not the desired situation on node (site) 02. Let's fix it:

vorh6t02:~ # lvchange --writemostly /dev/mapper/site02:n /dev/orahome/export
  Logical volume "export" changed.
vorh6t02:~ # lvchange --writemostly /dev/mapper/site01:y /dev/orahome/export
  Logical volume "export" changed.

Repeat the IO tests. Now it behaves as desired.

Adding LVM resources to cluster

Fix /etc/lvm/lvm.conf to explicitly name the VGs that may be activated at LVM start (this is just a list of VGs plus a tag - the heartbeat NIC's hostname of this node):

volume_list = [ "rootvg", "@vorh6t01.domain.com" ]

The initrd has to be rebuilt to include the new lvm.conf (otherwise the cluster refuses to start):

mkinitrd -f /boot/initramfs-$(uname -r).img $(uname -r)

Repeat the /etc/lvm/lvm.conf changes on the other node (using its own hostname in the tag) and rebuild its initrd as well.

Let's add our LV to the cluster's resources. Edit /etc/cluster/cluster.conf, and do not forget to increment config_version:

...
<cluster name="vorh6t" config_version="3">
...
  <rm>
    <failoverdomains/>
    <resources/>
      <service autostart="1" name="vorh6t" recovery="relocate">
       <ip address="192.168.131.12/24">
        <lvm name="vorh6tlv" lv_name="export" vg_name="orahome">
          <fs name="vorh6tfs"
                device="/dev/orahome/export"
                mountpoint="/export"
                fstype="ext3"
                force_unmount="1"
                self_fence="1"
          />
        </lvm>
       </ip>
    </service>
  </rm>
...

Distribute file and update cluster:

vorh6t01 # scp /etc/cluster/cluster.conf vorh6t02:/etc/cluster/cluster.conf
vorh6t01 # cman_tool version -r -S

See that the cluster takes the LV and mounts it on /export.
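A quick way to verify, on whichever node currently owns the service (a sketch):

# clustat
# lvs orahome
# mount | grep export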

Now we will create a script that tunes the "writemostly" parameter. This file should be "LSB" compliant (the scripts in /etc/init.d are). I've copied the shortest script from /etc/init.d, and this is the result:

#!/bin/sh
# /export/site-tune
# description: Adjust writemostly parameter for mirrored /dev/orahome/export
# Everything hardcoded.

case "$1" in
  start)
        if grep -q 01 /proc/sys/kernel/hostname ; then
                lvchange --writemostly /dev/mapper/site01:n /dev/orahome/export
                lvchange --writemostly /dev/mapper/site02:y /dev/orahome/export
                echo "tuned 01 read, 02 write"
        else
                lvchange --writemostly /dev/mapper/site01:y /dev/orahome/export
                lvchange --writemostly /dev/mapper/site02:n /dev/orahome/export
                echo "tuned 02 read, 01 write"
        fi
        ;;

  status|monitor) ;;
  stop) ;;
  restart|reload|force-reload|condrestart|try-restart) ;;
  *) echo "Usage: $0 start|stop|status" ;;
esac
exit 0

Make it executable and test its functionality.
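For example, on the node that currently has /export mounted (the "tuned ..." line is the script's own echo):

vorh6t01:~ # chmod +x /export/site-tune
vorh6t01:~ # /export/site-tune start
tuned 01 read, 02 write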

Now we'll add a new cluster resource of type script. It will be nested in the "fs" resource, because the script is located on this FS.

...
<cluster name="vorh6t" config_version="4">
...
        <lvm name="vorh6tlv" lv_name="export" vg_name="orahome">
          <fs name="vorh6tfs"
                device="/dev/orahome/export"
                mountpoint="/export"
                fstype="ext3"
                force_unmount="1"
                self_fence="1">
            <script name="vorh6tfstune" file="/export/site-tune" />
          </fs>
        </lvm>
...

Distribute file and update cluster:

vorh6t01 # scp /etc/cluster/cluster.conf vorh6t02:/etc/cluster/cluster.conf
vorh6t01 # cman_tool version -r -S

Do some cluster failovers and check that the "writemostly" bit changes as desired:

vorh6t01:~ # lvs -a
  LV                VG      Attr       LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  export            orahome Rwi-aor---  20.00g                               100.00
  [export_rimage_0] orahome iwi-aor---  20.00g
  [export_rimage_1] orahome iwi-aor-w-  20.00g
  [export_rmeta_0]  orahome ewi-aor---   4.00m
  [export_rmeta_1]  orahome ewi-aor---   4.00m
...

On second node, after failover:

vorh6t02:~ # lvs -a
  LV                VG      Attr       LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  export            orahome Rwi-aor---  20.00g                               100.00
  [export_rimage_0] orahome iwi-aor-w-  20.00g
  [export_rimage_1] orahome iwi-aor---  20.00g
  [export_rmeta_0]  orahome ewi-aor---   4.00m
  [export_rmeta_1]  orahome ewi-aor---   4.00m
...

Testing LVM way

Now, take one of the mirrored LUNs offline. The result is unexpected: LVM hangs, the cluster hangs, then file system IO hangs too. Googling shows that the problem has already been raised in the RedHat bugzilla, but there is no hope for a quick fix. The bug is marked as having a lack of end-user interest. Please vote for the bug fix.

MD way

If you did "LVM Way", restore all files as they was prior LVM part.

Create mirror on one node as:

vorh6t01:~ # mdadm -C /dev/md0 -n2 -l1 /dev/mapper/site0?
mdadm: Note: this array has metadata at the start and
    may not be suitable as a boot device.  If you plan to
    store '/boot' on this device please ensure that
    your boot-loader understands md/v1.x metadata, or use
    --metadata=0.90
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

If the LUN size is quite big, you can use the --assume-clean flag to eliminate the initial resynchronization. This flag is not recommended by the mdadm documentation, but I found it useful for my thin devices.
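For example, the creation command with that flag would be:

vorh6t01:~ # mdadm -C /dev/md0 -n2 -l1 --assume-clean /dev/mapper/site0?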

Stop array for now:

vorh6t01:~ # mdadm -S /dev/md0
mdadm: stopped /dev/md0

IMPORTANT: Let's assume that the "passive" node reboots for some reason. It would be dangerous for that node to try to assemble the MD RAID at boot time while the "active" partner is still using it. The cluster software takes care of split brain at run time, but not at boot time.

Disable MD assembling during initrd boot: add rd_NO_MD to the kernel boot line in /boot/grub/grub.conf. Other init scripts will try to start MD RAIDs too. There are a lot of places to patch, so I did the lazy thing: I renamed /sbin/mdadm to /sbin/mdadm.real and created a stub script in place of the original. OS updates will put the real binary back, but this is handled in the script you will see later.

# /sbin/mdadm --version | grep -q Fake || mv -f /sbin/mdadm /sbin/mdadm.real && \
echo -e '#!/bin/bash\necho "Fake stub. Use /sbin/mdadm.real"' > /sbin/mdadm && chmod +x /sbin/mdadm
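For reference, a kernel line in /boot/grub/grub.conf then looks something like this (the kernel version and the other parameters are illustrative; only rd_NO_MD is the addition):

kernel /vmlinuz-2.6.32-279.el6.x86_64 ro root=/dev/mapper/rootvg-root rd_NO_MD quiet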

Now reboot the node(s) and check that no /dev/md?* devices are created at boot time. Please do not continue until this step is verified.
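A quick check after the reboot (no md devices should appear):

# ls /dev/md* 2>/dev/null
# cat /proc/mdstat 2>/dev/null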

Creating script resource to make MD run

This is my /root/bin/mdscript script. The script should be LSB compliant in its exit codes. It starts the md0 device and sets the "writemostly" bit on the relevant member device. The longest line detects the real device name behind the dm-XX masking (mapping a DM device to its name).

#!/bin/sh                                                                                                                                                                            
# Everything hardcoded.                                                                                                                                                              

# Take care about mdadm RPM updates:
/sbin/mdadm --version | grep -q Fake || mv -f /sbin/mdadm /sbin/mdadm.real && \
echo -e '#!/bin/bash\necho "Fake stub. Use /sbin/mdadm.real"' > /sbin/mdadm && chmod +x /sbin/mdadm

MDADM="/sbin/mdadm.real"

case "$1" in
  start)    
        $0 stop
        $MDADM -A md0 /dev/mapper/site0?
        [ ! -b /dev/md0 ] && exit 1

        # Find names of devices:
        SITE01=$(grep -H site0 /sys/block/md0/md/dev-*/block/dm/name|awk '/site01/{gsub("/block/dm/name:.*","");print $1}')
        SITE02=$(grep -H site0 /sys/block/md0/md/dev-*/block/dm/name|awk '/site02/{gsub("/block/dm/name:.*","");print $1}')

        if grep -q 01 /proc/sys/kernel/hostname ; then
                echo "writemostly" > ${SITE02}/state
        else
                echo "writemostly" > ${SITE01}/state
        fi
        ;;

  status|monitor)
        [ ! -b /dev/md0 ] && exit 3
        ;;
  stop)
        for md in $( ls /dev/md?* 2>/dev/null) ; do
                $MDADM -S $md
        done
        sleep 1
        [ -b /dev/md0 ] && exit 1
        ;;
  *) echo "Usage: $0 start|stop|status" ;;
esac
exit 0

Copy the script to the partner node and add a script resource to the cluster.
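Copying can be done with something like:

vorh6t01 # ssh vorh6t02 mkdir -p /root/bin
vorh6t01 # scp -p /root/bin/mdscript vorh6t02:/root/bin/

Then add the script resource to cluster.conf: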

...
<cluster name="vorh6t" config_version="3">
...
  <rm>
    <failoverdomains/>
    <resources/>
      <service autostart="1" name="vorh6t" recovery="relocate">
       <ip address="192.168.131.12/24">
        <script name="vorh6tmd" file="/root/bin/mdscript" />
       </ip>
    </service>
  </rm>
...

Spread the config file, restart the services and check cat /proc/mdstat. You should see md0 running on the active node. If not, recheck the configuration.

If it looks like it is working, let's format it on the active node. I've decided not to use LVM on top of my md device, as it would be unnecessary overhead.

vorh6t01:~ # mkfs.ext3 -j -m0 -b4096 -E nodiscard /dev/md0
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=1 blocks, Stripe width=16 blocks
16375808 inodes, 65503184 blocks
0 blocks (0.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
1999 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 29 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

Add FS resource to cluster:

...
<cluster name="vorh6t" config_version="4">
...
  <rm>
    <failoverdomains/>
    <resources/>
      <service autostart="1" name="vorh6t" recovery="relocate">
       <ip address="192.168.131.12/24">
        <script name="vorh6tmd" file="/root/bin/mdscript">
          <fs name="vorh6tfs"
                device="/dev/md0"
                mountpoint="/export"
                fstype="ext3"
                force_unmount="1"
                self_fence="1"
          />
        </script>
       </ip>
    </service>
  </rm>
...

Distribute file and update cluster:

vorh6t01 # scp /etc/cluster/cluster.conf vorh6t02:/etc/cluster/cluster.conf
vorh6t01 # cman_tool version -r -S

Check that /export is mounted by rgmanager.
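For example, on the node that owns the service:

# clustat
# cat /proc/mdstat
# df -h /export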

Testing MD way

A simple test: turn off one of the mirrored data LUNs. Again, an unexpected result occurs. Multipath cries about lost paths, cat /proc/mdstat hangs, the sync command hangs. Bringing the missing LUN back online solves the problem. The problem is exactly the same as in the "LVM Way". Maybe it is a problem with the multipath configuration?

YES! The default feature queue_if_no_path causes IO to hang on a device whose paths are all dead. There is a command to change multipath features on-line. Let's add it to the "status" case of our script:

...
  status|monitor)
        /sbin/dmsetup message site01 0 "fail_if_no_path"
        /sbin/dmsetup message site02 0 "fail_if_no_path"
        [ ! -b /dev/md0 ] && exit 3
        ;;
...

Wow! Much better now. Everything works as expected. The big minus of this solution is the full resync of the devices and the manual intervention needed to rebuild the mirror.
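If you prefer to make the multipath change permanent rather than resetting it from the script, the same effect can probably be achieved with no_path_retry in /etc/multipath.conf (untested in this POC; a sketch extending the multipaths section shown earlier, followed by a multipathd reload):

multipaths {
        multipath {
                wwid 360a980004176596d6a3f447356493258
                alias site01
                no_path_retry fail
        }
        multipath {
                wwid 360a980004176596d6a3f44735649325a
                alias site02
                no_path_retry fail
        }
}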

Let's recheck "LVM Way" with multipath feature trick.

LVM way back

Restore everything that was done in the previous "LVM Way" chapter.

Let's add the multipath feature reset to the status case of our /export/site-tune script:

...
  status|monitor)
        /sbin/dmsetup message site01 0 "fail_if_no_path"
        /sbin/dmsetup message site02 0 "fail_if_no_path"
        sleep 1
        ;;
...

Check with multipath -ll that there is no queue_if_no_path in the features field of the "site0?" devices.

Take one of the mirrored LUNs offline. Huh! No more hangs. Great. Let's see it resynchronize when the LUN returns:

# lvs -a                                                          
  LV                VG      Attr       LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  export            orahome Rwi-aor-r-  20.00g                               100.00
  [export_rimage_0] orahome iwi-aor-w-  20.00g
  [export_rimage_1] orahome iwi-aor-r-  20.00g
  [export_rmeta_0]  orahome ewi-aor---   4.00m
  [export_rmeta_1]  orahome ewi-aor-r-   4.00m
...

According to the manual, this "r" bit means "refresh needed". The repair occurs automatically, only this bit remains. Try the lvchange --refresh command to clear the bit.

# lvchange --refresh /dev/orahome/export
# lvs -a
  LV                VG      Attr       LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  export            orahome Rwi-aor---  20.00g                               100.00
  [export_rimage_0] orahome iwi-aor-w-  20.00g
  [export_rimage_1] orahome iwi-aor---  20.00g
  [export_rmeta_0]  orahome ewi-aor---   4.00m
  [export_rmeta_1]  orahome ewi-aor---   4.00m
...

Everything is back to OK status. Another command, lvconvert --repair orahome/export, may be used for a full resync if the LUNs were out of sync for a long time.


Updated on Wed Dec 31 15:03:51 IST 2014