Backup SAP HANA2 using storage snapshot

This POC will demonstrate the ability to use storage snapshots as a backup of a HANA2 database. I'll take an LVM snapshot simulating a storage snapshot. Then I will try to restore the database from the resulting snapshot. This POC was performed on a test, almost inactive system and therefore must be repeated on the system under load.

The snapshot only needs to be taken on the "data" volume. When restoring, the "log" and "backup" areas will be used to achieve the point-in-time recovery target. If the "shared", "logs" and "data" volumes actually use the same storage LUN and are therefore required be recovered together, then the only recovery option is the time when the snapshot was taken.

This POC done for HANA DB version 2. Please refer to this article for HANA DB version 1

Pros:

Space efficiency. The snapshot will take storage capacity only for changes, still be FULL backup of DB.
Fast. Takes seconds for a database of any size.
There is no waste of server resources. Data does not leave storage, there is no traffic or IO.
Fast recovery. Just stop the database, revert to snapshot and start the database.

Contras:

No data copied to another location for disaster recovery.

According to the documentation, only instances with a single tenant are able to recover from a snapshot.

Make some database data

I use here, in reference, a passwordless entries, defined by hdbuserstore. This is not required at all, it is just for saving time for password typing. Here is an example of creating such entries:

ndbadm@hana2ur:/> hdbuserstore SET SYSTEMDB localhost:30013@SYSTEMDB SYSTEM
ndbadm@hana2ur:/> hdbuserstore SET SYSTEM localhost:30015 SYSTEM

Then, the key SYSTEM is match to user SYSTEM for tenant database, and the key SYSTEMDB is match to user SYSTEM for system database. Pay attention which key used later in this article.

Let's add some data to the database to check the recovery process.

ndbadm@hana2ur:/> hdbsql -U SYSTEM -A -x -j "CREATE TABLE t1(epoc VARCHAR(12), comment VARCHAR(80));"
ndbadm@hana2ur:/> hdbsql -U SYSTEM -A -x -j "INSERT INTO t1 VALUES('"$(date "+%s")"','Before any snapshot');"
ndbadm@hana2ur:/> hdbsql -U SYSTEM -A -x -j "SELECT * FROM t1"
| EPOC       | COMMENT             |
| ---------- | ------------------- |
| 1632897820 | Before any snapshot |

Taking backup

NOTE: The backup commands are executed in SYSTEMDB, not in the tenant databases.

Instruct HANA to start backup using a storage snapshot, and get resulting BACKUP_ID:

ndbadm@hana2ur:/> hdbsql -U SYSTEMDB -A -x -j "BACKUP DATA FOR FULL SYSTEM CREATE SNAPSHOT;"
ndbadm@hana2ur:/> hdbsql -U SYSTEMDB -A -x -j "SELECT BACKUP_ID FROM SYS.M_BACKUP_CATALOG WHERE ENTRY_TYPE_NAME='data snapshot' AND STATE_NAME='prepared';"
| BACKUP_ID            |
| -------------------- |
|        1632900703037 |

Database files becomes in a consistent state and a result is similar to the Oracle "begin backup" command. The last command returns a BACKUP_ID that will be used to complete the backup and resume normal database operations.

As a result of executing the CREATE SNAPSHOT command, HANA creates an additional files that includes the database metadata:

ndbadm@hana2ur:/> find /hana/data/ -name "snap*"
/hana/data/NDB/mnt00001/hdb00001/snapshot_databackup_0_1
/hana/data/NDB/mnt00001/hdb00003.00003/snapshot_databackup_0_1

The first file at hdb00001 directory contains metadata for SYSTEMDB and the next file contains metadata for tenant DB.

The next step is to flush the file system buffers to disk. This step is very important because the data files are in a consistent state only for the HANA engine, but not for the rest of the world.

ndbadm@hana2ur:/> sync ; sync ; sync

(Pardon me for tripple sync)

Now we need to take a snapshot. Here we will use an LVM snapshot. To take it, you must have some free space in your PV to be able to take a snapshot. The free space must be larger than the expected LV changes over the lifetime of the snapshot. You can automatically increase the size of a snapshot using the snapshot_autoextend_threshold and snapshot_autoextend_percent options in /etc/lvm/lvm.conf. To check for free space:

10:44:38 root@hana2ur:~ # pvs
  PV         VG     Fmt  Attr PSize   PFree  
  /dev/sda2  rootvg lvm2 a--   31.99g   4.99g
  /dev/sdb   hanavg lvm2 a--  300.00g 180.00g

Finally, create a snapshot:

10:48:32 root@hana2ur:~ # lvcreate -s -n data_backup_first -l 10%ORIGIN /dev/hanavg/data
  Logical volume "data_backup_first" created.

Now we should finish backup (similar to the Oracle's "end backup" command)

ndbadm@hana2ur:/> hdbsql -U SYSTEMDB -A -x -j "BACKUP DATA FOR FULL SYSTEM CLOSE SNAPSHOT BACKUP_ID 1632900703037 SUCCESSFUL 'data_backup_first';"
ndbadm@hana2ur:/>

Where the BACKUP_ID number is what we got at the beginning of the backup, and the last string is the description that helps us find the snapshot during the restore phase.

Thats it. The backup done.

Putting it all together into a backup script

I am assuming there is a script, named take_hana_snap in the <sid>adm/bin directory that somehow takes a snapshot and returns the name of the snapshot taken as output.

Another assumption is that there is a configured BACKUP user with privileges as described here. Then the script becomes very simple:

#!/bin/bash
# By Oleg Volkov (voleg) Refer: http://www.voleg.info/hana2-backup-snapshot.html

# Reload environment (important for cron jobs)
source $HOME/.bashrc

# This SQL return previous good full backup id from catalog:
SQL="SELECT top 1 min(to_bigint(BACKUP_ID)) FROM SYS.M_BACKUP_CATALOG
where SYS_START_TIME >= ADD_DAYS(CURRENT_TIMESTAMP,-15)
and ENTRY_TYPE_NAME='complete data backup' and STATE_NAME='successful' ;"

# Same for tenant ${SAPSYSTEMNAME}
SQLTENANT="SELECT top 1 min(to_bigint(BACKUP_ID)) FROM ${SAPSYSTEMNAME}.SYS.M_BACKUP_CATALOG
where SYS_START_TIME >= ADD_DAYS(CURRENT_TIMESTAMP,-15)
and ENTRY_TYPE_NAME='complete data backup' and STATE_NAME='successful' ;"

HDBSQL="$DIR_EXECUTABLE/hdbsql -U BACKUP -a -x -j"

# Begin Backup
$HDBSQL "BACKUP DATA FOR FULL SYSTEM CREATE SNAPSHOT;"
BACKUP_ID=$($HDBSQL "SELECT BACKUP_ID FROM SYS.M_BACKUP_CATALOG WHERE ENTRY_TYPE_NAME='data snapshot' AND STATE_NAME='prepared';")

# Take a snap
sync;sync;sync
SNAP=$(take_hana_snap)

# End Backup
if [ $? -gt 0 ] ; then
        $HDBSQL "BACKUP DATA FOR FULL SYSTEM CLOSE SNAPSHOT BACKUP_ID $BACKUP_ID UNSUCCESSFUL '"$SNAP"';"
else
        $HDBSQL "BACKUP DATA FOR FULL SYSTEM CLOSE SNAPSHOT BACKUP_ID $BACKUP_ID SUCCESSFUL '"$SNAP"';"
fi

# Clean backup catalog from outdated backups:
BACKUPID=$($HDBSQL $SQL)
[ -z $BACKUPID -o 'x'$BACKUPID = 'x?' ] || $HDBSQL "BACKUP CATALOG DELETE ALL BEFORE BACKUP_ID $BACKUPID"

# TENANT DB has its own backup catalog.
BACKUPID=$($HDBSQL $SQLTENANT)
[ -z $BACKUPID -o 'x'$BACKUPID = 'x?' ] || $HDBSQL "BACKUP CATALOG DELETE FOR ${SAPSYSTEMNAME} ALL BEFORE BACKUP_ID $BACKUPID"

Add another data and take second backup

ndbadm@hana2ur:/> hdbsql -U SYSTEM -A -x -j "INSERT INTO t1 VALUES('"$(date "+%s")"','After first snap');"
ndbadm@hana2ur:/> hdbsql -U SYSTEM -A -x -j "SELECT * FROM t1"
| EPOC       | COMMENT             |
| ---------- | ------------------- |
| 1632897820 | Before any snapshot |
| 1632902470 | After first snap    |
ndbadm@hana2ur:/> hdbsql -U SYSTEMDB -A -x -j "BACKUP DATA FOR FULL SYSTEM CREATE SNAPSHOT;"
ndbadm@hana2ur:/> hdbsql -U SYSTEMDB -A -x -j "select BACKUP_ID from SYS.M_BACKUP_CATALOG where ENTRY_TYPE_NAME='data snapshot' and STATE_NAME='prepared';"
| BACKUP_ID            |
| -------------------- |
|        1632902529436 |
ndbadm@hana2ur:/> exit
logout
11:02:45 root@hana2ur:~ # sync ; sync ; sync
11:03:11 root@hana2ur:~ # lvcreate -s -n data_backup_second -l 10%ORIGIN /dev/hanavg/data
  Logical volume "data_backup_second" created.
11:03:26 root@hana2ur:~ # su - ndbadm
ndbadm@hana2ur:/> hdbsql -U SYSTEMDB -A -x -j "BACKUP DATA FOR FULL SYSTEM CLOSE SNAPSHOT BACKUP_ID 1632902529436 SUCCESSFUL 'data_backup_second';"

Lets add some data after all backups.

ndbadm@hana2ur:/> hdbsql -U SYSTEM -A -x -j "INSERT INTO t1 VALUES('"$(date "+%s")"','After both backups');"
ndbadm@hana2ur:/> hdbsql -U SYSTEM -A -x -j "SELECT * FROM t1"
| EPOC       | COMMENT             |
| ---------- | ------------------- |
| 1632897820 | Before any snapshot |
| 1632902470 | After first snap    |
| 1632902807 | After both backups  |

Restoring HANA database using snapshot

First of all, stop the database with the HDB stop command. Then check for are still running processes named hdbrsutil. These processes hold the memory with previously loaded tables to save load time the next time the database is started. I couldn't find any information about their behavior when restoring a database from a backup, so I stopped them. A clean stop is the opposite of what is shown in the "ps" output, for example:

ndbadm@hana2ur:/> HDB stop
 ..
 OK
ndbadm@hana2ur:/> ps -ef | grep hdbrsutil
ndbadm    8639     1  0 14:42 ?        00:00:00 hdbrsutil  --start --port 30001 --volume 1 --volumesuffix mnt00001/hdb00001 --identifier 1632915727
ndbadm    8964     1  0 14:42 ?        00:00:00 hdbrsutil  --start --port 30003 --volume 3 --volumesuffix mnt00001/hdb00003.00003 --identifier 1632915768
ndbadm@hana2ur:/> hdbrsutil --stop --port 30003
ndbadm@hana2ur:/> hdbrsutil --stop --port 30001

Before reverting the snapshot, you must unmount the file system, otherwise the result will be unpredictable:

ndbadm@hana2ur:/> exit
logout
11:09:36 root@hana2ur:~ # umount /hana/data

LVM snapshots disappear after reverting to them. To save it for the next tests, right after the rollback I will take another snapshot with the same name:

11:12:24 root@hana2ur:~ # lvconvert --merge /dev/hanavg/data_backup_first
  Merging of volume hanavg/data_backup_first started.
  hanavg/data: Merged: 98.39%
  hanavg/data: Merged: 100.00%
11:13:16 root@hana2ur:~ # lvcreate -s -n data_backup_first -l 10%ORIGIN /dev/hanavg/data
  Logical volume "data_backup_first" created.
11:13:20 root@hana2ur:~ # mount /hana/data

Start the instance:

11:13:49 root@hana2ur:~ # su - ndbadm
ndbadm@hana2ur:/> HDB start
 ..
OK
ndbadm@hana2ur:/> 
ndbadm@hana2ur:/> hdbsql -U SYSTEM -A -x -j "SELECT * FROM t1"
| EPOC       | COMMENT             |
| ---------- | ------------------- |
| 1632897820 | Before any snapshot |
| 1632902470 | After first snap    |
| 1632902807 | After both backups  |

The database is up, but includes the most recent date as if no recovery occurred. Something went wrong? No, the recovery actually occurs, you can verify this by looking at the trace files. But the HANA DB engine found in the database logs that several more transactions had occurred since the snapshot, and applied them after a successful recovery.

Restore to Point-in-Time using snapshot

In most cases, point-in-time recovery is more desirable than rolling forward till the end. Let's see how to do this. Let's restore up to the time before adding the third entry to the table. The time stamp used by the restore command must be in UTC time zone, not your local time zone. The exact desired time would be:

ndbadm@hana2ur:/> date -u -d @1632902807 "+%Y-%m-%d %H:%M:%S"
2021-09-29 08:06:47

This number (1632902807) is the EPOC time we've added the third entry in table. So, the desired time will be just before that, lets say '2021-09-29 08:06:00'.

Stop the database and revert to first snapshot again:

ndbadm@hana2ur:/> HDB stop
 ..
 OK
hdbdaemon is stopped.
ndbadm@hana2ur:/> exit
logout
13:15:43 root@hana2ur:~ # umount /hana/data
13:16:00 root@hana2ur:~ # lvconvert --merge /dev/hanavg/data_backup_first
  Merging of volume hanavg/data_backup_first started.
  hanavg/data: Merged: 97.63%
  hanavg/data: Merged: 100.00%
13:16:39 root@hana2ur:~ # lvcreate -s -n data_backup_first -l 10%ORIGIN /dev/hanavg/data
  Logical volume "data_backup_first" created.
13:16:42 root@hana2ur:~ # mount /hana/data
13:16:46 root@hana2ur:~ # su - ndbadm
ndbadm@hana2ur:/>

Create recovery command file and start an instance:

ndbadm@hana2ur:/> echo "RECOVER DATABASE UNTIL TIMESTAMP '2021-09-29 08:06:00' USING SNAPSHOT" > $DIR_INSTANCE/work/recoverInstance.sql
ndbadm@hana2ur:/> HDB start
 ..
OK

This recovery command will recover only the SYSTEM database. You have to use another command to recover tenant database.


ndbadm@hana2ur:/> hdbsql -U SYSTEMDB
hdbsql SYSTEMDB=> \pa
Page by page scroll output switched OFF
hdbsql SYSTEMDB=> \al
Aligned output mode switched ON
hdbsql SYSTEMDB=> SELECT * FROM M_DATABASES;
| DATABASE | DESCRIPTION     | ACT | ACTIVE_S | OS_USER  | OS_GROUP | RESTART | F |
| -------- | --------------- | --- | -------- | -------- | -------- | ------- | - |
| SYSTEMDB | SystemDB-NDB-00 | YES |          |          |          | DEFAULT | ? |
| NDB      | NDB-00          | NO  |          |          |          | DEFAULT | ? |
2 rows selected (overall time 18.776 msec; server time 344 usec)
hdbsql SYSTEMDB=> RECOVER DATABASE FOR NDB UNTIL TIMESTAMP '2021-09-29 08:06:00' USING SNAPSHOT ;
0 rows affected (overall time 44.290199 sec; server time 44.287628 sec)

hdbsql SYSTEMDB=> SELECT * FROM M_DATABASES;
| DATABASE | DESCRIPTION     | ACT | ACTIVE_S | OS_USER  | OS_GROUP | RESTART | F |
| -------- | --------------- | --- | -------- | -------- | -------- | ------- | - |
| SYSTEMDB | SystemDB-NDB-00 | YES |          |          |          | DEFAULT | ? |
| NDB      | NDB-00          | YES |          |          |          | DEFAULT | ? |
2 rows selected (overall time 17.404 msec; server time 623 usec)

The command itself is almost the same used before.

NOTE: The UNTIL TIMESTAMP is in UTC timezone !!!!

Lets check restored data:

ndbadm@hana2ur:/> hdbsql -U SYSTEM -A -x -j "SELECT * FROM t1"
| EPOC       | COMMENT             |
| ---------- | ------------------- |
| 1632897820 | Before any snapshot |
| 1632902470 | After first snap    |

If you want to restore to exact snapshot time only, the commands are same as in next chapter

Restore from the snapshot on another server

The target server must have at least the same version of HANA DB installed as the source database. There is no need to migrate SSFS keys as they are represented in the snapshot metadata file. I have not tested recovery for another tenant, so the target server must have the same tenant SID.

Stop the database and do not forget to stop hdbrsutil, as described above. Transfer the snapshot and mount it instead /hana/data. Then create recovery command file with option CLEAR LOG:

ndbadm@hana2ur:/> echo "RECOVER DATA USING SNAPSHOT CLEAR LOG" > $DIR_INSTANCE/work/recoverInstance.sql
ndbadm@hana2ur:/> HDB start

Once system database restored, connect to it and issue command to restore tenant database:

hdbsql SYSTEMDB=> RECOVER DATA FOR NDB USING SNAPSHOT CLEAR LOG;

Again, use the CLEAR LOG option.

Updated on Wed Sep 29 19:13:41 IDT 2021 More documentations here