Setting up SAP HANA System Replication on SuSE SLE11 SP4

Two SLE11 SP4 nodes were installed at different sites connected by a stretched network. The SuSE HA solution cannot be implemented there because of the unreliable, non-redundant connection between the sites. Therefore, we will configure HANA System Replication. Fail-over will be performed manually using prepared scripts, and a virtual IP will follow the active instance using additional scripts.

Installing SAP HANA

Preparing nodes

Configure NTP and name resolution. Even if you have a reliable DNS, put everything associated with the cluster in /etc/hosts, and copy it between the nodes:

# cat /etc/hosts
127.0.0.1       localhost

192.168.80.31   vhana01.domain.local vhana01
192.168.80.32   vhana02.domain.local vhana02
192.168.80.33   vhana.domain.local vhana

Exchange root SSH keys and copy host SSH keys from one to another.
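
A minimal sketch of the key exchange (hostnames are the ones from /etc/hosts above; adapt to your environment):

root@vhana01:~ # ssh-keygen -t rsa                               # generate root's key pair
root@vhana01:~ # ssh-copy-id root@vhana02                        # authorize it on the other node
root@vhana01:~ # scp /etc/ssh/ssh_host_*key* vhana02:/etc/ssh/   # make the host keys identical
root@vhana01:~ # ssh vhana02 rcsshd restart                      # let sshd pick up the copied keys

Repeat ssh-keygen and ssh-copy-id in the opposite direction on vhana02.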

Preparing the unattended configuration file

The installation of SAP products must be performed using the hdblcm tool. You can also use this tool for other post-installation tasks, such as renaming a host.

Locate the hdblcm tool:

root@vhana01:~ # find /mnt/cdrom -type f -name hdblcm
/mnt/cdrom/HANA/DATA_UNITS/SAP HANA DATABASE 1.0 FOR B1/LINX64SUSE/SAP_HANA_DATABASE/hdblcm
root@vhana01:~ # HDBLCM=$(find /mnt/cdrom -type f -name hdblcm)

The first command found the only copy of the tool on my installation media, so I assigned the result to a variable to shorten the subsequent commands.

You can generate the configuration file template as follows:

root@vhana01:~ # "$HDBLCM" --action=install --dump_configfile_template=install.ini
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LANG = "C.utf8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").


SAP HANA Lifecycle Management - SAP HANA 1.00.122.05.1481577062
***************************************************************


Scanning Software Locations...
Detected components:
    SAP HANA Database (1.00.122.05.1481577062) in /mnt/cdrom/HANA/DATA_UNITS/SAP HANA DATABASE 1.0 FOR B1/LINX64SUSE/SAP_HANA_DATABASE/server
Config file template '/root/install.ini' written
Password file template '/root/install.ini.xml' written
Configuration file template created.

Now you can edit it. Parameters that keep their default values can be omitted. Here is an example of my file:

root@vhana01:~ # cat /root/install.ini
[General]
component_root=/mnt/cdrom/HANA/DATA_UNITS
components=all
ignore=check_signature_file

[Server]
root_password=ROOT_PASSWORD
sid=PRD
number=00
sapadm_password=SAPADM_PASSWORD
password=INSTANCE_OWNER_PASSWORD
system_user_password=SYSTEM_PASSWORD
autostart=y
isc_mode=standard

[Action]
action=install
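
The passwords are kept in install.ini in plain text. Note that hdblcm also wrote a password file template, /root/install.ini.xml; assuming your hdblcm version supports the --read_password_from_stdin=xml option, you could fill the passwords in there instead, drop them from the ini file and start the installation like this:

root@vhana01:~ # cat /root/install.ini.xml | "$HDBLCM" --configfile=install.ini --read_password_from_stdin=xml -b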

Installing SAP HANA

Install SAP HANA on both nodes with the same SID (PRD) and instance number (00):

# "$HDBLCM" --configfile=install.ini --ignore=check_signature_file -b
 ..
Note: Deployment of SAP Host Agent configurations finished with errors

I will not show the whole output here; it is very long. Check the log files for errors. Despite the error message about the Host Agent, everything seems to be working. Make sure that the users created by the installer have the same UIDs on both servers.
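
A quick way to compare the IDs (prdadm and sapadm are the users created for SID PRD in this setup):

root@vhana01:~ # getent passwd prdadm sapadm
root@vhana01:~ # ssh vhana02 getent passwd prdadm sapadm

If the UIDs or GIDs differ, align them before going any further.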

Setup System Replication

A full backup is required to initialize the replication. Create it on the node that will become the master.

root@vhana01:~ # su - prdadm
vhana01:~> hdbsql -u SYSTEM -i 00 "BACKUP DATA USING FILE ('/hana/backup/FULL')"
Single Sign-On authentication failed
Password: 
0 rows affected (overall time 6328.477 msec; server time 6327.586 msec)

Enable system replication on the master node, giving it a logical name related to the location of the site:

vhana01:~> hdbnsutil -sr_enable --name=PRODSITE
checking for active nameserver ...
nameserver is active, proceeding ...
successfully enabled system as system replication source site
done.

vhana01:~> hdbnsutil -sr_state
checking for active or inactive nameserver ...

System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
online: true

mode: primary
site id: 1
site name: PRODSITE

Host Mappings:
~~~~~~~~~~~~~~


done.

Stop the secondary database:

root@vhana02:~ # su - prdadm
vhana02:~> HDB stop
hdbdaemon will wait maximal 300 seconds for NewDB services finishing.
Stopping instance using: /usr/sap/PRD/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function Stop 400

18.12.2017 14:21:51
Stop
OK
Waiting for stopped instance using: /usr/sap/PRD/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function WaitforStopped 600 2


18.12.2017 14:22:11
WaitforStopped
OK
hdbdaemon is stopped.

Register the secondary system:

vhana02:~> hdbnsutil -sr_register \
	--remoteHost=vhana01 \
	--remoteInstance=00 \
	--replicationMode=async \
	--operationMode=logreplay \
	--name=DRSITE
adding site ...
checking for inactive nameserver ...
nameserver vhana02:30001 not responding.
collecting information ...
updating local ini files ...
done.
vhana02:~> HDB start
 ..
vhana02:~> hdbnsutil -sr_state
checking for active or inactive nameserver ...

System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
online: true

mode: async
site id: 2
site name: DRSITE
active primary site: 1


Host Mappings:
~~~~~~~~~~~~~~

vhana02 -> [PRODSITE] vhana01
vhana02 -> [DRSITE] vhana02

primary masters:vhana01

done.

The hdbnsutil -sr_state command also works on the primary site:

vhana01:~> hdbnsutil -sr_state --sapcontrol=1
checking for active or inactive nameserver ...
SAPCONTROL-OK: <begin>
online=true
mode=primary
site id=1
site name=PRODSITE
mapping/vhana01=PRODSITE/vhana01
mapping/vhana01=DRSITE/vhana02
SAPCONTROL-OK: <end>

done.

You can get much more information with another command:

vhana01:~> hdbcons -e hdbindexserver "replication info"
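
The machine-readable --sapcontrol=1 output is also convenient for simple monitoring. For example, a minimal cron check on the primary node could look like this (the expected mode=primary value is an assumption for this node):

vhana01:~> hdbnsutil -sr_state --sapcontrol=1 | grep -q '^mode=primary' \
        && echo "replication role OK" \
        || echo "WARNING: vhana01 is not the replication primary"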

Tuning System Replication

There are many interesting features for tuning the replication connection.

The first is the ability to replicate not only the data but also the settings in the "ini" files. This is configured in the [inifile_checker] section of the global.ini file: the replicate parameter should be set to "true" (the default is "false"). You can change it online on the master node using HANA Studio or hdbsql:

vhana01:~> hdbsql -u SYSTEM -i 00 \
"ALTER SYSTEM ALTER CONFIGURATION ('global.ini','SYSTEM')
 SET ('inifile_checker','replicate') = 'true'"
Single Sign-On authentication failed
Password: 
0 rows affected (overall time 1467 usec; server time 698 usec)
vhana01:~> cat /usr/sap/PRD/SYS/global/hdb/custom/config/global.ini
[inifile_checker]
replicate = true

[persistence]
basepath_datavolumes = /hana/data/PRD
basepath_logvolumes = /hana/log/PRD

[system_replication]
mode = primary
actual_mode = primary
site_id = 1
site_name = PRODSITE

Another feature is traffic compression, which can be very important for the connection between two sites. The enable_log_compression and enable_data_compression parameters belong to the [system_replication] section of the global.ini file. This section is not replicated by the ini-file replication we turned on earlier. According to the documentation, the definitions should be made on the secondary side, but if you think about it, when you switch between sites the primary node becomes the secondary. Therefore, we manually add the definitions on both sides.

Stop both instances, edit the global.ini file on each node, and start both instances again.

vhana01:~> cat /usr/sap/PRD/SYS/global/hdb/custom/config/global.ini
 ..
[system_replication]
enable_log_compression = true
enable_data_compression = true
enable_log_retention = auto
 ..

The third parameter, enable_log_retention, is good to set to auto so that unshipped logs are kept on the primary node. This can fill up the file system, but it will ensure data integrity.
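
Because of that, keep an eye on the log volume. A simple check (the path is the basepath_logvolumes from global.ini above; the 90% threshold is an arbitrary assumption):

vhana01:~> df -h /hana/log/PRD
vhana01:~> usage=$(df -P /hana/log/PRD | awk 'NR==2 {gsub("%","",$5); print $5}')
vhana01:~> [ "$usage" -ge 90 ] && echo "WARNING: /hana/log/PRD is ${usage}% full"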

Take over the system

Take-over is performed on the secondary node. This command does not attempt to connect to the main site, stop the main database, or switch it to slave mode. It only promotes the secondary node, which becomes a standalone primary server. As a result, if the previous primary is still alive, you will get two active databases and a potential split brain.

So, a correct takeover procedure should look like this:

  1. Check the connection to the master node. If it is available, shut down its database, then remove the virtual IP from it.
  2. Promote the secondary database to primary and assign the VIP to it.
  3. If the old primary node is available, register it as a secondary of the new primary.
  4. Start the database on the new secondary node to resume replication.

For now we will do this manually; later we will use the scripts.

root@vhana01:~ # su - prdadm
vhana01:/usr/sap/PRD/HDB00> HDB stop
root@vhana02:~ # su - prdadm
vhana02:/usr/sap/PRD/HDB00> hdbnsutil -sr_takeover
checking local nameserver ...
done.
root@vhana01:~ # su - prdadm
vhana01:/usr/sap/PRD/HDB00> hdbnsutil -sr_register \
	--remoteHost=vhana02 \
	--remoteInstance=00 \
	--replicationMode=async \
	--operationMode=logreplay \
	--name=PRODSITE
adding site ...
checking for inactive nameserver ...
nameserver vhana01:30001 not responding.
collecting information ...
updating local ini files ...
done.
vhana01:/usr/sap/PRD/HDB00> HDB start

A semi-automatic fail-over script will appear here later.
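
Until the script is ready, here is a rough sketch of what it could look like, following the four steps above. Everything in it (the <sid>adm user, VIP 192.168.80.33/24, interface name eth0 and peer host) reflects this particular setup and is an assumption, not a tested implementation; run it as root on the node that should become primary.

#!/bin/bash
# takeover.sh -- sketch of a semi-automatic takeover, not production-ready
ADM=prdadm                  # <sid>adm user
VIP=192.168.80.33/24        # virtual IP that follows the active instance
IFACE=eth0                  # interface carrying the VIP
PEER=vhana01                # current primary; adjust per node

# 1. if the old primary is reachable, stop its database and release the VIP
if ping -c 2 "$PEER" >/dev/null 2>&1; then
    ssh "$PEER" "su - $ADM -c 'HDB stop'"
    ssh "$PEER" "ip addr del $VIP dev $IFACE" 2>/dev/null
fi

# 2. promote the local secondary and take over the VIP
su - "$ADM" -c 'hdbnsutil -sr_takeover'
ip addr add "$VIP" dev "$IFACE"
arping -c 3 -U -I "$IFACE" "${VIP%/*}"   # update the neighbours' ARP caches

# 3. and 4. (re-registering the old primary as secondary and starting it)
# are done manually later, once the old node is known to be healthy.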

Keep it running

If the log volume becomes full because the secondary has been down for a long time and you want to drop the replication:

Unregister the secondary system:

hdbnsutil -sr_unregister

On the primary system, disable system replication:

hdbnsutil -sr_disable
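
To re-establish the replication later, repeat the earlier steps on the corresponding nodes: enable replication on the primary again, then stop, re-register and start the secondary:

vhana01:~> hdbnsutil -sr_enable --name=PRODSITE
vhana02:~> HDB stop
vhana02:~> hdbnsutil -sr_register --remoteHost=vhana01 --remoteInstance=00 \
	--replicationMode=async --operationMode=logreplay --name=DRSITE
vhana02:~> HDB start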

Updated on Tue Dec 19 00:52:31 IST 2017