Setting up SAP HANA System Replication on SuSE SLE11 SP4

Two SLE11 SP4 servers were installed at different sites connected by a stretched network. The SuSE HA solution cannot be implemented there because the single, non-redundant link between the sites is unreliable. Therefore, we will configure HANA System Replication instead. Failover will be performed manually using prepared scripts, and the virtual IP will follow the active instance with the help of additional scripts.

Installing SAP HANA

Preparing nodes

Configure NTP and name resolution. Even if you have a reliable DNS, put everything associated with the cluster in /etc/hosts, and copy it between the nodes:

# cat /etc/hosts       localhost   vhana01.domain.local vhana01   vhana02.domain.local vhana02   vhana.domain.local vhana

Exchange root SSH keys between the nodes and copy the host SSH keys from one node to the other.
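Before copying /etc/hosts to the other node, it may help to confirm that every cluster name really has an entry in it. The following is a minimal sketch, not part of the original procedure; the function name check_hosts is made up for illustration.

```shell
# check_hosts FILE NAME...
# Prints every NAME that has no entry in FILE; returns non-zero if any is missing.
check_hosts() {
    file=$1; shift
    missing=0
    for name in "$@"; do
        # Strip comments, then look for the name in the hostname columns (2..NF).
        if ! awk -v n="$name" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$file"; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Example (names taken from the hosts file above):
# check_hosts /etc/hosts vhana01 vhana02 vhana && scp /etc/hosts vhana02:/etc/hosts
```

The check only looks at the file itself, so it works the same whether or not DNS is reachable.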

Preparing the unattended configuration file

The installation of SAP products must be performed using the hdblcm tool. You can also use this tool to perform other post installation tasks, such as renaming a host.

Locate the hdblcm tool:

root@vhana01:~ # find /mnt/cdrom -type f -name hdblcm
root@vhana01:~ # HDBLCM=$(find /mnt/cdrom -type f -name hdblcm)

The first command found exactly one copy of the tool on my installation media, so I stored its path in a variable to shorten the following commands.
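If the media could contain more than one copy of the tool, the variable would silently hold several paths. A small guard like the one below (a sketch; find_single is a hypothetical helper name) fails unless find returns exactly one match.

```shell
# find_single DIR NAME
# Prints the path of the only regular file called NAME under DIR.
# Fails if there is no match or more than one.
find_single() {
    matches=$(find "$1" -type f -name "$2")
    # Count non-empty lines; "|| true" keeps grep's exit status from aborting under "set -e".
    count=$(printf '%s\n' "$matches" | grep -c . || true)
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one $2 under $1, found $count" >&2
        return 1
    fi
    printf '%s\n' "$matches"
}

# Example: HDBLCM=$(find_single /mnt/cdrom hdblcm) || exit 1
```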

You can generate the configuration file template as follows:

root@vhana01:~ # "$HDBLCM" --action=install --dump_configfile_template=install.ini
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LANG = "C.utf8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

SAP HANA Lifecycle Management - SAP HANA

Scanning Software Locations...
Detected components:
Config file template '/root/install.ini' written
Password file template '/root/install.ini.xml' written
Configuration file template created.

Now you can edit it. The default values can be omitted. Here is an example of my file:

root@vhana01:~ # cat /root/install.ini



Installing SAP HANA

Install SAP HANA on both nodes with the same SID (PRD) and instance number (00):

# "$HDBLCM" --configfile=install.ini --ignore=check_signature_file -b
Note: Deployment of SAP Host Agent configurations finished with errors

I will not show the whole output here because it is very long. Check the log files for errors. Despite the error message about the Host Agent, everything seems to be working. Make sure that the users created by the installer have the same UID on both servers.
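The UID comparison can be scripted. The sketch below assumes the root SSH keys exchanged earlier and uses the prdadm account that appears later in this article; the function names are illustrative only.

```shell
# passwd_uid: read one passwd(5)-style line on stdin and print its UID (third field).
passwd_uid() {
    awk -F: 'NR == 1 { print $3 }'
}

# same_uid USER HOST
# Compares the local UID of USER with the UID on HOST; returns non-zero on mismatch.
same_uid() {
    local_uid=$(getent passwd "$1" | passwd_uid)
    remote_uid=$(ssh "$2" getent passwd "$1" | passwd_uid)
    echo "$1: local=$local_uid remote=$remote_uid"
    [ -n "$local_uid" ] && [ "$local_uid" = "$remote_uid" ]
}

# Example: same_uid prdadm vhana02
```

Repeat the call for any other account the installer created on your system.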

Setting up System Replication

A full backup is required to initialize the replication. Create it on the node that will be the master.

root@vhana01:~ # su - prdadm
vhana01:~> hdbsql -u SYSTEM -i 00 "BACKUP DATA USING FILE ('/hana/backup/FULL')"
Single Sign-On authentication failed
0 rows affected (overall time 6328.477 msec; server time 6327.586 msec)

Enable system replication on the master node and give it a logical name related to the location of the site:

vhana01:~> hdbnsutil -sr_enable --name=PRODSITE
checking for active nameserver ...
nameserver is active, proceeding ...
successfully enabled system as system replication source site

vhana01:~> hdbnsutil -sr_state
checking for active or inactive nameserver ...

System Replication State
online: true

mode: primary
site id: 1
site name: PRODSITE

Host Mappings:


Stop the secondary database:

root@vhana02:~ # su - prdadm
vhana02:~> HDB stop
hdbdaemon will wait maximal 300 seconds for NewDB services finishing.
Stopping instance using: /usr/sap/PRD/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function Stop 400

18.12.2017 14:21:51
Waiting for stopped instance using: /usr/sap/PRD/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 00 -function WaitforStopped 600 2

18.12.2017 14:22:11
hdbdaemon is stopped.

Register the secondary system:

vhana02:~> hdbnsutil -sr_register \
	--remoteHost=vhana01 \
	--remoteInstance=00 \
	--replicationMode=async \
	--operationMode=logreplay \
	--name=DRSITE
adding site ...
checking for inactive nameserver ...
nameserver vhana02:30001 not responding.
collecting information ...
updating local ini files ...
vhana02:~> HDB start
vhana02:~> hdbnsutil -sr_state
checking for active or inactive nameserver ...

System Replication State
online: true

mode: async
site id: 2
site name: DRSITE
active primary site: 1

Host Mappings:

vhana02 -> [PRODSITE] vhana01
vhana02 -> [DRSITE] vhana02

primary masters:vhana01


The hdbnsutil -sr_state command also works on the primary site.

vhana01:~> hdbnsutil -sr_state --sapcontrol=1
checking for active or inactive nameserver ...
site id=1
site name=PRODSITE
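The key=value form printed with --sapcontrol=1 is convenient for scripts. As a sketch (sr_field is a hypothetical helper, and only the keys shown above are assumed to exist), a single value can be extracted like this:

```shell
# sr_field KEY
# Reads "hdbnsutil -sr_state --sapcontrol=1" output on stdin and prints the
# value of the first "KEY=value" line.
sr_field() {
    awk -F= -v k="$1" '$1 == k { print substr($0, length(k) + 2); exit }'
}

# Example: hdbnsutil -sr_state --sapcontrol=1 | sr_field 'site name'
```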


You can get much more information with another command:

vhana01:~> hdbcons -e hdbindexserver "replication info"

Tuning System Replication

There are many interesting features for tuning the replication connection.

The first is the ability to replicate not only the data but also the settings in the "ini" files. This is configured in the [inifile_checker] section of global.ini: set the replicate parameter to "true" (the default is "false"). You can change it online on the master node using HANA Studio or hdbsql:

vhana01:~> hdbsql -u SYSTEM -i 00 \
 "ALTER SYSTEM ALTER CONFIGURATION ('global.ini','SYSTEM') \
 SET ('inifile_checker','replicate') = 'true'"
Single Sign-On authentication failed
0 rows affected (overall time 1467 usec; server time 698 usec)
vhana01:~> cat /usr/sap/PRD/SYS/global/hdb/custom/config/global.ini
[inifile_checker]
replicate = true

[persistence]
basepath_datavolumes = /hana/data/PRD
basepath_logvolumes = /hana/log/PRD

[system_replication]
mode = primary
actual_mode = primary
site_id = 1
site_name = PRODSITE
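Instead of reading the whole file, a single setting can be pulled out of global.ini with a small helper. This is only a sketch; ini_get is a made-up name, and it assumes the usual "[section]" and "key = value" layout of the file.

```shell
# ini_get FILE SECTION KEY
# Prints the value of KEY inside [SECTION]; returns non-zero if it is not set.
ini_get() {
    awk -v s="[$2]" -v k="$3" '
        /^[ \t]*\[/ { gsub(/[ \t]/, ""); in_section = ($0 == s); next }
        in_section {
            eq = index($0, "=")
            if (eq == 0) next
            name = substr($0, 1, eq - 1); value = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == k) { print value; found = 1; exit }
        }
        END { exit !found }' "$1"
}

# Example:
# ini_get /usr/sap/PRD/SYS/global/hdb/custom/config/global.ini inifile_checker replicate
```

Running the same call on both nodes (for example through ssh) is a quick way to see whether a setting has arrived on the secondary.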

Another feature is traffic compression, which can be very important for a connection between two sites. The two parameters, enable_log_compression and enable_data_compression, belong to the [system_replication] section of global.ini. This section is not covered by the ini file replication enabled earlier. According to the documentation, the parameters should be set on the secondary side, but when you switch between sites, the primary node becomes the secondary. Therefore, we add them manually on both sides.

Stop both instances, edit global.ini on each node, and start both instances again.

vhana01:~> cat /usr/sap/PRD/SYS/global/hdb/custom/config/global.ini
[system_replication]
enable_log_compression = true
enable_data_compression = true
enable_log_retention = auto

It is a good idea to set the third parameter, enable_log_retention, to auto so that the primary node keeps the logs that have not yet been shipped to the secondary. This can fill up the file system, but it preserves data integrity.
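Because retained logs can fill the log volume, it is worth watching its usage on the primary. The sketch below parses "df -P" output; the function names and the limit of 80 percent are arbitrary choices for illustration.

```shell
# pct_used: read "df -P" output on stdin and print the Capacity column of the
# first filesystem line, without the percent sign.
pct_used() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_log_fs PATH LIMIT
# Warns and returns non-zero when PATH is more than LIMIT percent full.
check_log_fs() {
    used=$(df -P "$1" | pct_used)
    if [ "$used" -gt "$2" ]; then
        echo "WARNING: $1 is ${used}% full (limit $2%)"
        return 1
    fi
    echo "OK: $1 is ${used}% full"
}

# Example, using the log volume path from global.ini:
# check_log_fs /hana/log/PRD 80
```

Such a check could be run from cron and wired to whatever alerting is already in place.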

Take over the system

The takeover is performed on the secondary node. The command does not attempt to connect to the primary site, stop the primary database, or switch it to secondary mode. It only promotes the secondary node, which becomes a standalone primary server. As a result, if the former primary is still alive, you end up with two active databases and a potential split brain.

So a correct takeover procedure should look like this:

  1. Check the connection to the master node. If it is available, shut down its database, then remove the virtual IP from it.
  2. Promote the secondary database to primary and assign the VIP to it.
  3. If the former master node is available, register it as secondary to the new primary.
  4. Start the database on the new secondary node to resume replication.

We will first do this manually and then use the scripts.

root@vhana01:~ # su - prdadm
vhana01:/usr/sap/PRD/HDB00> HDB stop
root@vhana02:~ # su - prdadm
vhana02:/usr/sap/PRD/HDB00> hdbnsutil -sr_takeover
checking local nameserver ...
root@vhana01:~ # su - prdadm
vhana01:/usr/sap/PRD/HDB00> hdbnsutil -sr_register \
	--remoteHost=vhana02 \
	--remoteInstance=00 \
	--replicationMode=async \
	--operationMode=logreplay \
	--name=PRODSITE
adding site ...
checking for inactive nameserver ...
nameserver vhana01:30001 not responding.
collecting information ...
updating local ini files ...
vhana01:/usr/sap/PRD/HDB00> HDB start

A semi-automatic fail-over script will be here.
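Until then, the four steps of the takeover procedure could be sketched roughly as follows. This is not the author's script. The virtual IP address, the network device, the "ip addr" calls, the ping-based reachability test and the --name value are all assumptions made for illustration. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical sketch of the takeover order; run as root on the node to promote.
DRY_RUN=${DRY_RUN:-1}          # 1 = only print the commands, 0 = execute them
VIP=${VIP:-192.0.2.10/24}      # placeholder address (assumption)
VIP_DEV=${VIP_DEV:-eth0}       # placeholder device (assumption)
SIDADM=${SIDADM:-prdadm}

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

is_reachable() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# takeover OLD_PRIMARY NEW_PRIMARY OLD_SITE_NAME
takeover() {
    old=$1; new=$2; old_site=$3
    alive=0
    if is_reachable "$old"; then alive=1; fi

    # 1. Stop the old primary and take the virtual IP away from it.
    #    Abort if the stop fails, to avoid two active databases.
    if [ "$alive" = 1 ]; then
        run ssh "$old" "su - $SIDADM -c 'HDB stop'" || return 1
        run ssh "$old" "ip addr del $VIP dev $VIP_DEV"
    fi

    # 2. Promote the local database and bring the virtual IP up here.
    run su - "$SIDADM" -c "hdbnsutil -sr_takeover" || return 1
    run ip addr add "$VIP" dev "$VIP_DEV"

    # 3 and 4. Re-register the old primary as secondary and start it.
    if [ "$alive" = 1 ]; then
        run ssh "$old" "su - $SIDADM -c 'hdbnsutil -sr_register --remoteHost=$new --remoteInstance=00 --replicationMode=async --operationMode=logreplay --name=$old_site'"
        run ssh "$old" "su - $SIDADM -c 'HDB start'"
    fi
}

# Example (dry run): takeover vhana01 vhana02 PRODSITE
```

If the old primary does not answer, the sketch still promotes the local node, so the operator has to make sure the old primary is really down before running it with DRY_RUN=0.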

Keep it running

If the log volume fills up because the secondary has been down for a long time and you want to drop replication:

Unregister the secondary system:

hdbnsutil -sr_unregister

Disable system replication on the primary system:

hdbnsutil -sr_disable

Updated on Tue Dec 19 00:52:31 IST 2017