Verifying Full Backup Files
This section applies to full backups only.
Each backup has a CRC32C checksum associated with the files that comprise the backup.
Our implementation of CRC32C has been verified against the commonly used crcmod package, but any library that implements CRC32C can be used.
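As a way to cross-check a CRC32C library before relying on it, the following is a minimal, dependency-free sketch of the algorithm. The function names (crc32c, crc32c_hex) are illustrative and are not part of SingleStore or crcmod. The sketch assumes the standard CRC32C (Castagnoli) parameters and is far slower than a native library, so it is only suitable for small inputs.

```python
def _make_table():
    """Build the 256-entry lookup table for the reflected CRC32C polynomial."""
    poly = 0x82F63B78  # bit-reversed form of the Castagnoli polynomial 0x1EDC6F41
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data, crc=0):
    """Return the CRC32C of data, optionally continuing from a previous value."""
    crc ^= 0xFFFFFFFF
    for b in bytearray(data):
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32c_hex(data):
    """Return the CRC32C of data as 8 uppercase hexadecimal characters."""
    return "%08X" % crc32c(data)


# Standard check value for CRC32C over the ASCII string "123456789".
print(crc32c_hex(b"123456789"))  # E3069283
```

Any candidate library should produce the same check value for the same input. Passing a previous result as the second argument gives the same checksum as hashing the concatenated data in one call.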
To illustrate how the checksum is composed from the discrete parts of a database backup, consider the following example, where all backup files for the cluster are stored in an NFS share named backup_
:
| backup_database
 \
  | BACKUP_COMPLETE
  | db.backup
  | db_0.backup
  | db_0.backup_columns0.tar
  | db_0.backup_columns1.tar
  | db_1.backup
  | db_2.backup
  | db_2.backup_columns0.tar
  | db_2.backup_columns1.tar
  | db_2.backup_columns2.tar
  | db_3.backup
  | db_3.backup_columns0.tar
This is a backup of a small, mixed columnstore/rowstore database with four partitions, written to an NFS drive. BACKUP_
sentinel file that was created on the master aggregator node in its relative /data/db_
SingleStore directory.
The first 8 characters are the CRC32C of the reference database, db.
, on the master aggregator.
Next, each partition, in order, contributes either 8 or 16 characters, depending on whether it has columnstore segment tar files. db_
while the second 8 characters are for the segment tar files.
If segment tar files have been created for a partition, they are concatenated in order, and a single CRC32C checksum is taken over the result.
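The layout described above can be sketched as a small helper that splits a checksum string into its 8-character fields. The function name split_checksum and the placeholder checksum are hypothetical and only illustrate the arithmetic. For the example directory, partitions 0, 2, and 3 have segment tar files and partition 1 does not, so the checksum has 8 * (1 + 4 + 3) = 64 characters.

```python
def split_checksum(checksum, db_name, partitions_with_tars):
    """Split a backup checksum into labeled 8-character CRC32C fields.

    partitions_with_tars holds one boolean per partition, in order, saying
    whether that partition has columnstore segment tar files.
    """
    # One field for the reference database, one per partition snapshot,
    # and one more for each partition that has segment tar files.
    field_count = 1 + len(partitions_with_tars) + sum(partitions_with_tars)
    if len(checksum) != 8 * field_count:
        raise ValueError(
            "expected %d characters, got %d" % (8 * field_count, len(checksum))
        )

    fields = iter(checksum[i:i + 8] for i in range(0, len(checksum), 8))
    parts = [("%s.backup" % db_name, next(fields))]
    for i, has_tars in enumerate(partitions_with_tars):
        parts.append(("%s_%d.backup" % (db_name, i), next(fields)))
        if has_tars:
            parts.append(("%s_%d.backup_columns*.tar" % (db_name, i), next(fields)))
    return parts


# Placeholder checksum (NOT from a real backup): eight made-up 8-character fields.
example = "".join("%08X" % n for n in range(8))
for label, field in split_checksum(example, "db", [True, False, True, True]):
    print("%-28s %s" % (label, field))
```

A length mismatch raises ValueError, which is a quick way to notice that the set of tar files on disk no longer matches what was present when the backup was taken.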
Note
Even if a database has a columnstore table, there may not be a corresponding tar file for a partition, because data may be cached in the hidden rowstore table for that columnstore table.
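The following Python script can be used to verify the backup checksum. verify_
After the script finishes, run echo $? to check its exit status.
A result of 0 means the verification was successful, while a nonzero result means it was not. The script returns -1 on failure, which POSIX shells report as 255.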
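When the check is driven from another program rather than from a shell, the exit status can be tested directly. The following is a minimal sketch. The helper name backup_verified is hypothetical, and the stand-in commands below only imitate a verification script that exits with 0 or -1.

```python
import subprocess
import sys


def backup_verified(command):
    """Run a verification command and report whether it exited with status 0."""
    return subprocess.call(command) == 0


# Stand-in commands that exit the same way a verification script would.
succeeds = [sys.executable, "-c", "import sys; sys.exit(0)"]
fails = [sys.executable, "-c", "import sys; sys.exit(-1)"]

print(backup_verified(succeeds))  # True
print(backup_verified(fails))     # False

# Assumed real usage, with the script saved as verify_backup.py:
# backup_verified([sys.executable, "verify_backup.py", "/absolute/path/to/backup"])
```

Comparing against 0 instead of -1 avoids depending on how the platform encodes a negative exit status.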
Note
Script completion time is dependent on the size of the backup and may take hours to complete.
# verify_backup.py
#
# Given a directory, this script verifies that the checksum recorded in the
# backup sentinel file matches the CRC32C of the backup files on disk.
#
# REQUIRES: crcmod to be installed: https://pypi.org/project/crcmod/
#
# USAGE: python verify_backup.py /absolute/path/to/backup
#
# NOTE: This script needs read privileges on all files being verified.
#
import crcmod
import json
import sys
import errno

# Read files in fixed-size chunks so large backups do not have to fit in memory.
CHUNK_SIZE = 1024 * 1024


# Feeds the contents of the file at path into crc.
# Files are opened in binary mode because the CRC is computed over raw bytes.
#
def updateCrcFromFile(crc, path):
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            crc.update(chunk)


# VerifyBackup:
# Verifies the CRC located in the backup sentinel file (BACKUP_COMPLETE)
# matches the calculated CRC of files in backupDirectory.
#
# Param backupDirectory: absolute path to directory where backup exists.
# Return: 0 on success, -1 on failure.
#
def verifyBackup(backupDirectory):
    # Strip off trailing '/' if exists.
    #
    backupDirectory = backupDirectory.rstrip("/")

    with open("%s/BACKUP_COMPLETE" % backupDirectory, "r") as f:
        backupDictionary = json.load(f)

    try:
        finalCrc = backupDictionary["Checksum"]
        dbName = backupDictionary["Database_Name"]
        numPartitions = int(backupDictionary["Num_Partitions"])
    except KeyError as e:
        print(e)
        print("Sentinel File 'BACKUP_COMPLETE' is from unsupported version of backup.")
        return -1

    # This is in the crc32c specification, crcmod also has crc32c hardcoded,
    # so either can be used.
    #
    crc = crcmod.Crc(0x11EDC6F41, rev=True, initCrc=0x00000000, xorOut=0xFFFFFFFF)
    crclist = ""

    # Process the reference snapshot.
    #
    updateCrcFromFile(crc, "%s/%s.backup" % (backupDirectory, dbName))
    crclist += crc.hexdigest()

    # Process each partition.
    #
    for i in range(numPartitions):
        crc = crc.new()

        # Process Partition snapshot.
        # Each Partition MUST have a snapshot.
        #
        updateCrcFromFile(crc, "%s/%s_%d.backup" % (backupDirectory, dbName, i))
        crclist += crc.hexdigest()

        # Snapshots and Columns are checksummed with separate CRC's.
        #
        crc = crc.new()

        # Set once at least one tar file has been read for this partition.
        # If the columnar blobs is non empty, append the crc to the list.
        columnCrc = False
        j = 0

        # Process all tarballed columnstore files.
        #
        # NOTE: Even if a database has a columnstore, this does not imply
        # each partition has columnar blobs. The data might exist in the
        # rowstore snapshot or might be skewed such that all rows exist
        # in other partitions.
        #
        while True:
            tarPath = "%s/%s_%d.backup_columns%d.tar" % (backupDirectory, dbName, i, j)
            try:
                updateCrcFromFile(crc, tarPath)
            except IOError as e:
                if e.errno == errno.ENOENT:
                    # No more tar files for this partition.
                    break
                # Any other I/O error is unexpected; do not hide it.
                raise
            j += 1
            columnCrc = True

        if columnCrc:
            crclist += crc.hexdigest()

    # CRC's will be of different case, make both uppercase.
    #
    if crclist.upper() != finalCrc.upper():
        print("Crc calculated from directory:" + crclist)
        print("Crc in backup file :" + finalCrc)
        print("Crcs do not match!")
        return -1
    return 0


if __name__ == '__main__':
    # Check the argument count before reading sys.argv[1].
    if len(sys.argv) != 2:
        print("Incorrect usage: please include just the directory where the backup is located.")
        sys.exit(-1)
    sys.exit(verifyBackup(sys.argv[1]))
Last modified: April 3, 2023