Linux backup

There is no single backup solution that fits every environment. The chosen method depends heavily on the size and type of data to be backed up. If we are talking about backing up any kind of database, the backup window can affect the choice of solution too.

Snapshots

Snapshots are a great way to take a crash-consistent backup, even of running databases. They do not depend on the data size, but they do depend on free space (on the disk, in the PV, in the LV, etc.), because the altered content has to be saved somewhere.

Do not forget to flush buffers with the sync command just before taking the snapshot, otherwise it will not be consistent and probably not useful.

Once the snapshot is taken, think about what to do with it: copy its content to tape or to a remote server, or just leave it in place. The last option covers human errors and allows immediate restore, but does not protect against hardware failure.
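For example, a snapshot mounted read-only can be streamed to a remote host with standard tools (the device, mount point and host name below are just placeholders):

# mount -o ro /dev/rootvg/home_snap /mnt/snap
# tar -C /mnt/snap -cf - . | ssh backupserver "cat > /backup/home_snap.tar"
# umount /mnt/snap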

LVM snapshot

I like having a lot of LVs on my servers. Splitting data across many LVs makes your system more efficient to manage. Putting everything in / is bad practice and makes things disordered.

First, check the free space in your VG and your LV sizes:

# pvs
  PV         VG     Fmt  Attr PSize PFree
  /dev/sdb   rootvg lvm2 a--  9.97g 1.53g
# lvs
  LV    VG     Attr      LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  home  rootvg -wi-ao--- 640.00m
  opt   rootvg -wi-ao--- 512.00m
  slash rootvg -wi-ao--- 512.00m
  swap  rootvg -wi-ao---   1.97g
  usr   rootvg -wi-ao---   3.59g
  var   rootvg -wi-ao---   1.25g

An LVM snapshot is COW (Copy-On-Write) in the old (_real_) meaning: the original data is copied to the snapshot area when it gets overwritten. The huge minus of that is write performance degradation for as long as the snapshot merely exists. If you have a number of snapshots (let's say 7 daily snaps), then when old data (which existed in all 7 snaps) is altered, the original content has to be copied 7 times, once to each snapshot area.

That is why you should remove an LVM snapshot immediately after use. Another reason to remove the snapshot quickly is that it becomes invalid when it runs out of space. Let's demonstrate that.

Create some data:

# dd if=/dev/urandom of=/home/150m bs=1024k count=150
# ll -h /home/150m
-rw-r--r-- 1 root root 150M Dec 18 09:10 /home/150m

Create the snapshot (I've created a snapshot smaller than the data size for demonstration purposes):

# sync;sync;sync
# lvcreate -L100m -n home_backup_20131218.1114 -s /dev/rootvg/home
  Rounding up size to full physical extent 128.00 MiB
  Logical volume "home_backup_20131218.1114" created
# lvs
  LV                        VG     Attr      LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  home                      rootvg owi-aos-- 640.00m
  home_backup_20131218.1114 rootvg swi-a-s-- 128.00m      home     0.23
  opt                       rootvg -wi-ao--- 512.00m
  slash                     rootvg -wi-ao--- 512.00m
  swap                      rootvg -wi-ao---   1.97g
  usr                       rootvg -wi-ao---   3.59g
  var                       rootvg -wi-ao---   1.25g

Note in this output: the origin LV attribute (owi-aos--), the snapshot LV attribute (swi-a-s--), the active state, the snapshot size and the name of the origin in the Origin column.

Now I overwrite the data and check again:

# dd if=/dev/urandom of=/home/150m bs=1024k count=150
# sync
# lvs
  /dev/rootvg/home_backup_20131218.1114: read failed after 0 of 4096 at 671023104: Input/output error
  /dev/rootvg/home_backup_20131218.1114: read failed after 0 of 4096 at 671080448: Input/output error
  /dev/rootvg/home_backup_20131218.1114: read failed after 0 of 4096 at 0: Input/output error
  /dev/rootvg/home_backup_20131218.1114: read failed after 0 of 4096 at 4096: Input/output error
  LV                        VG     Attr      LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  home                      rootvg owi-aos-- 640.00m
  home_backup_20131218.1114 rootvg swi-I-s-- 128.00m      home   100.00
  opt                       rootvg -wi-ao--- 512.00m
  slash                     rootvg -wi-ao--- 512.00m
  swap                      rootvg -wi-ao---   1.97g
  usr                       rootvg -wi-ao---   3.59g
  var                       rootvg -wi-ao---   1.25g

As you can see, the snapshot has become Invalid and cannot be used anymore; just lvremove it.

Therefore the best practice is to define the snapshot size equal to its origin, or at least somewhat larger than the expected amount of changes. However, you cannot be sure that the changes will stay at the expected rate, so check your snapshot state every time.
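A quick check along these lines can be added to a backup script (the snapshot name is the one from the demonstration above; the state character of lv_attr turns into a capital I when the snapshot is Invalid):

...
SNAP=/dev/rootvg/home_backup_20131218.1114
# a capital I in the lv_attr state field means the snapshot is Invalid
if /usr/sbin/lvs --noheadings -o lv_attr $SNAP | grep -q I ; then
        echo " *** Snapshot $SNAP is invalid, its backup is not usable"
        /usr/sbin/lvremove -f $SNAP
fi
...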

Here is an example of an LVM snapshot based backup script (the server has ORACLE_HOME in /export and DB files in /oradbs):

...
echo " *** Stoping Oracle Database $(hostname):$0 at $(date) ***"
/etc/init.d/oracle stop || STATUS="failure"
DEBUGINFO

/usr/sbin/lvcreate -s -n backup_export -l 8%FREE /dev/vgdata/export || STATUS="failure" ; DEBUGINFO
/usr/sbin/lvcreate -s -n backup_oradbs -l 100%FREE /dev/vgdata/oradbs || STATUS="failure" ; DEBUGINFO

echo " *** Start  Oracle  $(hostname):$0 at $(date) ***"
/etc/init.d/oracle start || STATUS="failure"
DEBUGINFO

for fs in oradbs export ; do
        mount -o ro /dev/vgdata/backup_$fs /backup/$fs || STATUS="failure" ; DEBUGINFO
        grep -q /backup/$fs /proc/mounts && backup_fs /backup/$fs || STATUS="failure" ; DEBUGINFO
        umount_force /backup/$fs # umount_force is just a wrapper to umount||kill;umount||kill -9;umount||umount -l
        /usr/sbin/lvs # check this output to see whether a snapshot became Invalid
        /usr/sbin/lvremove -f /dev/vgdata/backup_$fs || STATUS="failure" ; DEBUGINFO
done

echo " ***  Logical Volumes after run @ $(hostname):$0 at $(date) ***"
/usr/sbin/lvscan
...

The script stops Oracle, takes both snapshots, starts Oracle again, then calls the backup_fs procedure to make the real backup to the backup server and removes the snapshots.
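backup_fs and umount_force themselves are not shown here; based on the comment in the loop, umount_force could be sketched roughly like this (a sketch, not the original helper):

# Sketch: try a clean umount, kill the processes holding the mount point,
# retry, then fall back to a lazy umount as the last resort.
umount_force() {
        local mp=$1
        umount $mp && return 0
        fuser -k -TERM -m $mp ; sleep 5
        umount $mp && return 0
        fuser -k -KILL -m $mp ; sleep 5
        umount $mp || umount -l $mp
}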

btrfs snapshots

btrfs is a very interesting FS that is becoming more popular, but it is still quite experimental. For example, earlier versions of btrfs were not suitable for holding Samba content (actually, I have not checked the latest versions for that; probably it is fixed already).

btrfs snapshots are not COW in that old sense; they are more similar to how NetApp takes snapshots: old data stays frozen in the snapshot, while new data is written to free space on the same FS. Therefore you are not limited in the number of snapshots and there is no performance impact.

Here is a backup script fragment for taking snapshots on btrfs:

...
BDIR=/home/data
# Take snapshot
/usr/sbin/btrfsctl -s $BDIR/backup_$(date "+%Y%m%d_%H%M%S") $BDIR
# Remove old snapshots, keeping the 16 newest
for s in $(cd $BDIR && ls -1 | grep backup_ | sort -r | tail --lines=+17) ; do
        /usr/sbin/btrfsctl -D $s $BDIR >/dev/null
done
...
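btrfsctl comes from the old btrfs-progs; on newer versions the same fragment can be written with the btrfs command (a rough equivalent, not checked on every release):

...
BDIR=/home/data
# Take snapshot (add -r for a read-only snapshot on recent btrfs-progs)
btrfs subvolume snapshot $BDIR $BDIR/backup_$(date "+%Y%m%d_%H%M%S")
# Remove old snapshots, keeping the 16 newest
for s in $(cd $BDIR && ls -1 | grep backup_ | sort -r | tail --lines=+17) ; do
        btrfs subvolume delete $BDIR/$s >/dev/null
done
...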

As small "minus" (or plus, as you want), a snapshot appear directly in mount point, making referncing more confusing.

Using rsync

You can use rsync to replicate data locally, but the most interesting use is replication to a remote server/storage as a backup tool. rsync has a rich set of options, which have to be matched to your particular scenario.

Here is an example of rsyncing to a NetApp NFS volume with NetApp-friendly parameters:

# USAGE: rsync_fs $FS $DEST_DIR
rsync_fs() {
        local fs=$1
        local fsname=root$(echo $1 | tr "/" ".")
        local ddir=$2

        [ -x /usr/bin/rsync ] || { echo " *** NO rsync found" ; return 1 ; }
        [ "x$ddir" = "x" ] && return 1
        [ -d $fs ] || { echo " *** No $fs exists here !" ; return 1 ; }
        [ -d $ddir ] || return 1
        echo " *** rsync FS $fs started"

        OPTS=""
        #OPTS="--dry-run"
        #OPTS="$OPTS --quiet"
        OPTS="$OPTS --verbose"
        #OPTS="$OPTS --progress"
        OPTS="$OPTS -a"
        #OPTS="$OPTS -H"        # probably bad hardlink support for rsync.
        #OPTS="$OPTS --sparse"
        #OPTS="$OPTS --partial"
        OPTS="$OPTS --inplace"
        OPTS="$OPTS --block-size=4096"
        OPTS="$OPTS --no-whole-file"
        #OPTS="$OPTS --whole-file"
        OPTS="$OPTS --delete"
        OPTS="$OPTS --one-file-system"
        OPTS="$OPTS --exclude=tmp/"
        [ -r /etc/rsync.exclude ] && OPTS="$OPTS --exclude-from=/etc/rsync.exclude"
        [ -r /etc/rsync.include ] && OPTS="$OPTS --include-from=/etc/rsync.include"

        rsync $OPTS $fs/ ${ddir}/$fsname
        exitcode=$?
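        # exit code 24 means some source files vanished during transfer; treat it as success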
        [ $exitcode -eq 24 ] && exitcode=0
        return $exitcode
}
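A call to this function could look like this (the NFS mount point and the filesystem list are just an example; the volume from the backup filer is assumed to be mounted already):

...
DEST=/backup/netapp   # NFS volume from the backup filer, mounted beforehand
for fs in / /home /var ; do
        rsync_fs $fs $DEST || echo " *** rsync of $fs failed"
done
...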

When running rsync backups, swap usage alerts can arise. This is because rsync holds the metadata of the whole file tree in memory: the more files you have, the more metadata has to be processed. For now there is no solution other than increasing your swap size.
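If the box is short on swap, the quickest fix is to add a swap file (the size and path here are just an example):

# dd if=/dev/zero of=/swapfile.rsync bs=1024k count=2048
# chmod 600 /swapfile.rsync
# mkswap /swapfile.rsync
# swapon /swapfile.rsync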


Updated on Mon Dec 23 11:25:56 UTC 2013