Managing HBase Snapshots
This page demonstrates how to manage HBase snapshots using either Cloudera Manager or the command line.
Managing HBase Snapshots Using Cloudera Manager
For HBase services, you can use the Table Browser tab to view the HBase tables associated with a service on your cluster. You can view the currently saved snapshots for your tables, and delete or restore them. From the HBase Table Browser tab, you can:
- View the HBase tables for which you can take snapshots.
- Initiate immediate (unscheduled) snapshots of a table.
- View the list of saved snapshots currently maintained. These can include one-off immediate snapshots, as well as scheduled policy-based snapshots.
- Delete a saved snapshot.
- Restore from a saved snapshot.
- Restore a table from a saved snapshot to a new table (Restore As).
Browsing HBase Tables
To browse the HBase tables to view snapshot activity:
- From the Clusters tab, select your HBase service.
- Go to the Table Browser tab.
Managing HBase Snapshots
Minimum Required Role: BDR Administrator (also provided by Full Administrator)
- Click a table.
- Click Take Snapshot.
- Specify the name of the snapshot, and click Take Snapshot.
To delete a snapshot, click and select Delete.
Storing HBase Snapshots on Amazon S3
- The access key ID for your Amazon S3 account.
- The secret access key for your Amazon S3 account.
- The path to the directory in Amazon S3 where you want your HBase snapshots to be stored.
Because EC2 limits throughput on a per-node basis, you can speed up the transfer of large snapshots to Amazon S3 by increasing the number of nodes.
Configuring HBase in Cloudera Manager to Store Snapshots in Amazon S3
Minimum Required Role: Cluster Administrator (also provided by Full Administrator)
Perform the following steps in Cloudera Manager:
- Open the HBase service page.
- Select .
- Select .
- Type AWS in the Search box.
- Enter your Amazon S3 access key ID in the field AWS S3 access key ID for remote snapshots.
- Enter your Amazon S3 secret access key in the field AWS S3 secret access key for remote snapshots.
Important: If AWS S3 access keys are rotated, the Cloudera Manager server must be restarted.
- Enter the path to the location in Amazon S3 where your HBase snapshots will be stored in the field Amazon S3 Path for Remote Snapshots.
Warning: Do not use the Amazon S3 location defined by the path entered in Amazon S3 Path for Remote Snapshots for any other purpose, or directly add or delete content there. Doing so risks corrupting the metadata associated with the HBase snapshots stored there. Use this path and Amazon S3 location only through Cloudera Manager, and only for managing HBase snapshots.
- In a terminal window, log in to your Cloudera Manager cluster at the command line and create a /user/hbase directory in HDFS. Change the owner of the directory to hbase. For example:
    hdfs dfs -mkdir /user/hbase
    hdfs dfs -chown hbase /user/hbase
Note: Amazon S3 applies a default request rate limit per prefix in each bucket. Throughput can be limited to 3500 requests per second. Consider using different S3 prefixes per table namespace, or per table, if any of the following applies:
- A large number of tables
- Tables with a large number of store files or regions
- A frequent snapshot policy
Configuring the Dynamic Resource Pool Used for Exporting and Importing Snapshots in Amazon S3
- Open the HBase service page.
- Select .
- Select .
- Type Scheduler in the Search box.
- Enter the name of a dynamic resource pool in the Scheduler pool for remote snapshots in AWS S3 property.
- Click Save Changes.
HBase Snapshots on Amazon S3 with Kerberos Enabled
Starting with Cloudera Manager 5.8, YARN should by default allow the hbase user to run MapReduce jobs even when Kerberos is enabled. However, this change only applies to new Cloudera Manager deployments, and not if you have upgraded from a previous version to Cloudera Manager 5.8 (or higher).
- Open the YARN service page in Cloudera Manager.
- Select .
- Select .
- In the Allowed System Users property, click the + sign and add hbase to the list of allowed system users.
- Click Save Changes.
- Restart the YARN service.
Managing HBase Snapshots on Amazon S3 in Cloudera Manager
Minimum Required Role: BDR Administrator (also provided by Full Administrator)
To take HBase snapshots and store them on Amazon S3, perform the following steps:
- On the HBase service page in Cloudera Manager, click the Table Browser tab.
- Select a table in the Table Browser. If any recent local or remote snapshots already exist, they display on the right side.
- In the dropdown for the selected table, click Take Snapshot.
- Enter a name in the Snapshot Name field of the Take Snapshot dialog box.
- If Amazon S3 storage is configured as described above, the Take Snapshot dialog box Destination section shows a choice of Local or Remote S3. Select Remote S3.
- Click Take Snapshot.
While the Take Snapshot command is running, a local copy of the snapshot with a name beginning with cm-tmp followed by an auto-generated filename is displayed in the Table Browser. This local copy is deleted as soon as the remote snapshot has been stored in Amazon S3. If the command fails before completing, the temporary local snapshot might not be deleted. This copy can be manually deleted or kept as a valid local snapshot. To store a current snapshot in Amazon S3, run the Take Snapshot command again, selecting Remote S3 as the Destination, or use the HBase command-line tools to manually export the existing temporary local snapshot to Amazon S3.
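The manual export mentioned above can be sketched with the ExportSnapshot tool described later on this page. This is a minimal sketch, not a Cloudera-provided script: the function name export_snapshot_cmd, the snapshot name, and the bucket path are hypothetical, and it assumes S3 credentials are already available to Hadoop. The function only prints the command so that you can review it before running it.

```shell
# Print an ExportSnapshot command that copies a local snapshot to S3.
# Arguments: snapshot name, destination URI, number of mappers (default 16).
# All example values are hypothetical; nothing is executed here.
export_snapshot_cmd() {
  printf 'hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot %s -copy-to %s -mappers %s\n' \
    "$1" "$2" "${3:-16}"
}

# Example (prints the command only):
#   export_snapshot_cmd cm-tmp-example s3a://my-bucket/manual-exports 4
```

Given the warning about the Amazon S3 Path for Remote Snapshots, a manual export like this should probably target a different S3 location than the one Cloudera Manager manages.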
Deleting HBase Snapshots from Amazon S3
- Select the snapshot in the Table Browser.
- Click the dropdown arrow for the snapshot.
- Click Delete.
Restoring an HBase Snapshot from Amazon S3
- Select the table in the Table Browser.
- Click Restore Table.
- Choose Remote S3 and select the table to restore.
- Click Restore.
Cloudera Manager creates a local copy of the remote snapshot with a name beginning with cm-tmp followed by an auto-generated filename, and uses that local copy to restore the table in HBase. Cloudera Manager then automatically deletes the local copy. If the Restore command fails without completing, the temporary copy might not be deleted and can be seen in the Table Browser. In that case, delete the local temporary copy manually and re-run the Restore command to restore the table from Amazon S3.
Restoring an HBase Snapshot from Amazon S3 with a New Name
By restoring an HBase snapshot stored in Amazon S3 with a new name, you clone the table without affecting the existing table in HBase. To do this, perform the following steps:
- Select the table in the Table Browser.
- Click Restore Table From Snapshot As.
- In the Restore As dialog box, enter a new name for the table in the Restore As field.
- Select Remote S3 and choose the snapshot in the list of available Amazon S3 snapshots.
Managing Policies for HBase Snapshots in Amazon S3
You can configure policies to automatically create snapshots of HBase tables on an hourly, daily, weekly, monthly or yearly basis. Snapshot policies for HBase snapshots stored in Amazon S3 are configured using the same procedures as for local HBase snapshots. These procedures are described in Cloudera Manager Snapshot Policies. For snapshots stored in Amazon S3, you must also choose Remote S3 in the Destination section of the policy management dialog boxes.
When you create a snapshot based on a snapshot policy, a local copy of the snapshot is created with a name beginning with cm-auto followed by an auto-generated filename. The temporary copy of the snapshot is displayed in the Table Browser and is deleted as soon as the remote snapshot has been stored in Amazon S3. If the snapshot procedure fails without being completed, the temporary local snapshot might not be deleted. This copy can be manually deleted or kept as a valid local snapshot. To export the HBase snapshot to Amazon S3, use the HBase command-line tools to manually export the existing temporary local snapshot to Amazon S3.
Managing HBase Snapshots Using the Command Line
- Follow these command-line instructions on systems that do not use Cloudera Manager.
- This information applies specifically to CDH 6.3.x. See Cloudera Documentation for information specific to other releases.
About HBase Snapshots
In previous HBase releases, the only way to back up or clone a table was to use CopyTable or ExportTable, or to copy all the hfiles in HDFS after disabling the table. These methods have disadvantages:
- CopyTable and ExportTable can degrade RegionServer performance.
- Disabling the table means no reads or writes; this is usually unacceptable.
HBase snapshots allow you to clone a table without making data copies, and with minimal impact on RegionServers. Exporting the table to another cluster does not have any impact on the RegionServers.
Use Cases
- Recovery from user or application errors
- Useful because it may be some time before the database administrator notices the error.
Note:
The database administrator needs to schedule the intervals at which to take and delete snapshots. Use a script or management tool; HBase does not have this functionality.
- The database administrator may want to save a snapshot before a major application upgrade or change.
Note:
Snapshots are not primarily used for system upgrade protection because they do not roll back binaries, and would not necessarily prevent bugs or errors in the system or the upgrade.
- Recovery cases:
- Roll back to previous snapshot and merge in reverted data.
- View previous snapshots and selectively merge them into production.
- Backup
- Capture a copy of the database and store it outside HBase for disaster recovery.
- Capture previous versions of data for compliance, regulation, and archiving.
- Export from a snapshot on a live system provides a more consistent view of HBase than CopyTable and ExportTable.
- Audit or report view of data at a specific time
- Capture monthly data for compliance.
- Use for end-of-day/month/quarter reports.
- Application testing
- Test schema or application changes on similar production data from a snapshot and then discard.
For example:
- Take a snapshot.
- Create a new table from the snapshot content (schema and data)
- Manipulate the new table by changing the schema, adding and removing rows, and so on. The original table, the snapshot, and the new table remain independent of each other.
- Offload work
- Capture, copy, and restore data to another site
- Export data to another cluster
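The scheduling note under the recovery use case says that taking and deleting snapshots on a schedule is left to a script or management tool. As one possible sketch of the deletion half, the helper below reads snapshot names on standard input and prints a delete_snapshot command for each name of the form snapshot-YYYYMMDD that is older than a cutoff date. The function name and the naming convention are assumptions for illustration; names that do not match the pattern are ignored.

```shell
# Print delete_snapshot commands for snapshots named snapshot-YYYYMMDD whose
# date is strictly before the cutoff (YYYYMMDD). Reads one name per line.
expired_snapshot_commands() {
  cutoff="$1"
  while IFS= read -r name; do
    case "$name" in
      snapshot-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])
        stamp="${name#snapshot-}"          # strip the prefix, keep the date
        if [ "$stamp" -lt "$cutoff" ]; then
          printf "delete_snapshot '%s'\n" "$name"
        fi
        ;;
    esac
  done
}

# Example: printf 'snapshot-20240101\n' | expired_snapshot_commands 20240201
```

After reviewing the printed commands, you could pipe them to hbase shell -n. How the cutoff is computed (for example, with a date utility) depends on your platform and is left out of the sketch.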
Where Snapshots Are Stored
Snapshot metadata is stored in the .hbase-snapshot directory under the HBase root directory (/hbase/.hbase-snapshot). Each snapshot has its own directory that includes all the references to the hfiles, logs, and metadata needed to restore the table.
hfiles required by the snapshot are in the /hbase/data/<namespace>/<tableName>/<regionName>/<familyName>/ location if the table is still using them; otherwise, they are in /hbase/.archive/<namespace>/<tableName>/<regionName>/<familyName>/.
Zero-Copy Restore and Clone Table
From a snapshot, you can create a new table (clone operation) or restore the original table. These two operations do not involve data copies; instead, a link is created to point to the original hfiles.
Changes to a cloned or restored table do not affect the snapshot or (in case of a clone) the original table.
To clone a table to another cluster, you export the snapshot to the other cluster and then run the clone operation; see Exporting a Snapshot to Another Cluster.
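As a rough illustration of the clone operation on a single cluster, the helper below prints the two HBase shell commands involved: one to take the snapshot and one to create a new table from it. The function name is hypothetical, and the table and snapshot names are placeholders. Nothing is executed until you pipe the output to the HBase shell.

```shell
# Print HBase shell commands that snapshot a table and clone the snapshot
# into a new table. Arguments: source table, snapshot name, new table.
clone_table_commands() {
  printf "snapshot '%s', '%s'\n" "$1" "$2"
  printf "clone_snapshot '%s', '%s'\n" "$2" "$3"
}

# Example: review the output, then run it non-interactively:
#   clone_table_commands tableX snapshotX tableY | hbase shell -n
```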
Reverting to a Previous HBase Version
Snapshots do not affect HBase backward compatibility if they are not used.
If you use snapshots, backward compatibility is affected as follows:
- If you only take snapshots, you can still revert to a previous HBase version.
- If you use restore or clone, you cannot revert to a previous version unless the cloned or restored tables have no links. Links cannot be detected automatically; you would need to inspect the file system manually.
Storage Considerations
Because hfiles are immutable, a snapshot consists of a reference to the files in the table at the moment the snapshot is taken. No copies of the data are made during the snapshot operation, but copies may be made when a compaction or deletion is triggered. In this case, if a snapshot has a reference to the files to be removed, the files are moved to an archive folder, instead of being deleted. This allows the snapshot to be restored in full.
Because no copies are performed, multiple snapshots share the same hfiles, but for tables with many updates and compactions, each snapshot could have a different set of hfiles.
Configuring and Enabling Snapshots
Snapshots are on by default; to disable them, set the hbase.snapshot.enabled property in hbase-site.xml to false:
    <property>
      <name>hbase.snapshot.enabled</name>
      <value>false</value>
    </property>
To enable snapshots after you have disabled them, set hbase.snapshot.enabled to true.
If you have taken snapshots and then decide to disable snapshots, you must delete the snapshots before restarting the HBase master; the HBase master will not start if snapshots are disabled and snapshots exist.
Snapshots do not affect HBase performance if they are not used.
Shell Commands
You can manage snapshots by using the HBase shell or the HBaseAdmin Java API.
The following table shows actions you can take from the shell.
Action | Shell command | Comments |
---|---|---|
Take a snapshot of tableX called snapshotX | snapshot 'tableX', 'snapshotX' | Snapshots can be taken while a table is disabled, or while a table is online and serving traffic. |
Restore snapshot snapshotX (replaces the source table content) | restore_snapshot 'snapshotX' | For emergency use only; see Restrictions. Restoring a snapshot replaces the current version of a table with a different version. To run this command, you must disable the target table. The restore command takes a snapshot of the table (appending a timestamp code), and then clones data into the original data and removes data not in the snapshot. If the operation succeeds, the target table is enabled. Warning: If you use coprocessors, the coprocessor must be available on the destination cluster before restoring the snapshot. |
List all available snapshots | list_snapshots | |
List all available snapshots starting with 'my_snapshot_' (regular expression) | list_snapshots 'my_snapshot_.*' | |
Remove a snapshot called snapshotX | delete_snapshot 'snapshotX' | |
Create a new table tableY from a snapshot snapshotX | clone_snapshot 'snapshotX', 'tableY' | Cloning a snapshot creates a new read/write table that serves the data kept at the time of the snapshot. The original table and the cloned table can be modified independently; new data written to one table does not show up on the other. |
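The restore row above requires the target table to be disabled first. A minimal sketch of that sequence follows, assuming a hypothetical helper named restore_table_commands. It prints the HBase shell commands rather than running them, and ends with is_enabled so that you can confirm the table state after the restore.

```shell
# Print HBase shell commands for an emergency restore.
# Arguments: target table, snapshot name.
restore_table_commands() {
  printf "disable '%s'\n" "$1"            # restore requires a disabled table
  printf "restore_snapshot '%s'\n" "$2"
  printf "is_enabled '%s'\n" "$1"         # check the table state afterwards
}

# Example: restore_table_commands tableX snapshotX | hbase shell -n
```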
Taking a Snapshot Using a Shell Script
    #!/bin/bash
    # Take a snapshot of the table passed as an argument
    # Usage: snapshot_script.sh table_name
    # Names the snapshot in the format snapshot-YYYYMMDD

    # Parse the arguments
    if [ -z "$1" ] || [ "$1" == '-h' ]; then
      echo "Usage: $0 <table>"
      echo "       $0 -h"
      exit 1
    fi

    # Modify to suit your environment
    export HBASE_PATH=/home/user/hbase
    export DATE=$(date +"%Y%m%d")

    # Run the snapshot command in the non-interactive HBase shell
    echo "snapshot '$1', 'snapshot-$DATE'" | $HBASE_PATH/bin/hbase shell -n
    status=$?
    if [ $status -ne 0 ]; then
      echo "Snapshot may have failed: $status"
    fi
    exit $status
HBase Shell returns an exit code of 0 on success. A non-zero exit code indicates the possibility of failure, not a definite failure. In the event of a reported failure, your script should check whether the snapshot was created before taking the snapshot again.
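One way to perform that check is to search the output of the SnapshotInfo tool's -list-snapshots option for the snapshot name. The helper below is a sketch under stated assumptions: the function name snapshot_in_listing is hypothetical, and it assumes the listing uses the pipe-separated layout SNAPSHOT | CREATION TIME | TABLE NAME. It reads the listing on standard input and succeeds only if the first column matches the name exactly.

```shell
# Succeed (exit 0) if the given snapshot name appears in the first column of
# a pipe-separated snapshot listing read from stdin; fail (exit 1) otherwise.
snapshot_in_listing() {
  awk -F'|' -v want="$1" '
    {
      name = $1
      gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim surrounding whitespace
      if (name == want) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# Example:
#   hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -list-snapshots \
#     | snapshot_in_listing "snapshot-20240101" && echo "already exists"
```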
Exporting a Snapshot to Another Cluster
The ExportSnapshot tool executes a MapReduce Job similar to distcp to copy files to the other cluster. It works at file-system level, so the HBase cluster can be offline.
Run ExportSnapshot as the hbase user or the user that owns the files. If the user, group, or permissions need to be different on the destination cluster than the source cluster, use the -chuser, -chgroup, or -chmod options as in the second example below, or be sure the destination directory has the correct permissions. In the following examples, replace the HDFS server path and port with the appropriate ones for your cluster.
To copy a snapshot called MySnapshot to an HBase cluster srv2 (hdfs://srv2:8020/hbase) using 16 mappers:
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://srv2:<hdfs_port>/hbase -mappers 16
To export the snapshot and change the ownership of the files during the copy:
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs://srv2:<hdfs_port>/hbase -chuser MyUser -chgroup MyGroup -chmod 700 -mappers 16
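To derive the number of mappers from the number of HFiles instead of setting it explicitly, set the snapshot.export.default.map.group property with the -D option:
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -Dsnapshot.export.default.map.group=10 -snapshot MY_SNAPSHOT -copy-to hdfs://cluster2/hbase
(The number of mappers is calculated as TotalNumberOfHFiles/10.)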
To export from one remote cluster to another remote cluster, specify both -copy-from and -copy-to parameters.
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshot-test -copy-from hdfs://machine1/hbase -copy-to hdfs://machine2/my-backup
To store the exported snapshot under a different name on the destination, add the -target option:
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshot-test -copy-from hdfs://machine1/hbase -copy-to hdfs://machine2/my-backup -target new-snapshot
Restrictions
Do not use merge in combination with snapshots. Merging two regions can cause data loss if snapshots or cloned tables exist for this table.
The merge is likely to corrupt the snapshot and any tables cloned from the snapshot. If the table has been restored from a snapshot, the merge may also corrupt the table. The snapshot may survive intact if the regions being merged are not in the snapshot, and clones may survive if they do not share files with the original table or snapshot. You can use the SnapshotInfo tool (see Information and Debugging) to check the status of the snapshot. If the status is BROKEN, the snapshot is unusable.
- If you have enabled the AccessController Coprocessor for HBase, only a global administrator can take, clone, or restore a snapshot, and these actions do not capture the ACL rights. This means that restoring a table preserves the ACL rights of the existing table, and cloning a table creates a new table that has no ACL rights until the administrator adds them.
- Do not take, clone, or restore a snapshot during a rolling restart. Snapshots require RegionServers to be up; otherwise, the snapshot fails.
Note: This restriction also applies to a rolling upgrade, which can be done only through Cloudera Manager.
If you are using HBase Replication and you need to restore a snapshot:
Snapshot restore is an emergency tool; you need to disable the table and table replication to get to an earlier state, and you may lose data in the process.
If you are using HBase Replication, the replicas will be out of sync when you restore a snapshot. If you need to restore a snapshot, proceed as follows:
- Disable the table that is the restore target, and stop the replication.
- Remove the table from both the master and worker clusters.
- Restore the snapshot on the master cluster.
- Create the table on the worker cluster and use CopyTable to initialize it.
If this is not an emergency (for example, if you know exactly which rows you have lost), you can create a clone from the snapshot and create a MapReduce job to copy the data that you have lost.
In this case, you do not need to stop replication or disable your main table.
Snapshot Failures
Region moves, splits, and other metadata actions that happen while a snapshot is in progress can cause the snapshot to fail. The software detects and rejects corrupted snapshot attempts.
Information and Debugging
You can use the SnapshotInfo tool to get information about a snapshot, including status, files, disk usage, and debugging information.
Examples:
Use the -h option to print usage instructions for the SnapshotInfo utility.
    $ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -h
    Usage: bin/hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo [options]
     where [options] are:
      -h|-help          Show this help and exit.
      -remote-dir       Root directory that contains the snapshots.
      -list-snapshots   List all the available snapshots and exit.
      -snapshot NAME    Snapshot to examine.
      -files            Files and logs list.
      -stats            Files and logs stats.
      -schema           Describe the snapshotted table.
Use the -list-snapshots option to list all snapshots and exit.
    $ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -list-snapshots
    SNAPSHOT | CREATION TIME | TABLE NAME
    snapshot-test | 2014-06-24T19:02:54 | test
Use the -remote-dir option with the -list-snapshots option to list snapshots located on a remote system.
    $ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -remote-dir s3a://mybucket/mysnapshot-dir -list-snapshots
    SNAPSHOT | CREATION TIME | TABLE NAME
    snapshot-test 2014-05-01 10:30 myTable
Use the -snapshot option to print information about a specific snapshot.
    $ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot test-snapshot
    Snapshot Info
    ----------------------------------------
    Name: test-snapshot
    Type: DISABLED
    Table: test-table
    Version: 0
    Created: 2012-12-30T11:21:21
    **************************************************************
Use the -stats option with the -snapshot option to print file and log statistics for a snapshot.
    $ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -stats -snapshot snapshot-test
    Snapshot Info
    ----------------------------------------
    Name: snapshot-test
    Type: FLUSH
    Table: test
    Format: 0
    Created: 2014-06-24T19:02:54
    1 HFiles (0 in archive), total size 1.0k (100.00% 1.0k shared with the source table)
Use the -schema option with the -snapshot option to describe the snapshotted table.
    $ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -schema -snapshot snapshot-test
    Snapshot Info
    ----------------------------------------
    Name: snapshot-test
    Type: FLUSH
    Table: test
    Format: 0
    Created: 2014-06-24T19:02:54
    Table Descriptor
    ----------------------------------------
    'test', {NAME => 'cf', DATA_BLOCK_ENCODING => 'FAST_DIFF', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'GZ', VERSIONS => '1', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
Use the -files option with the -snapshot option to list information about files contained in a snapshot.
    $ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot test-snapshot -files
    Snapshot Info
    ----------------------------------------
    Name: test-snapshot
    Type: DISABLED
    Table: test-table
    Version: 0
    Created: 2012-12-30T11:21:21
    Snapshot Files
    ----------------------------------------
    52.4k test-table/02ba3a0f8964669520cf96bb4e314c60/cf/bdf29c39da2a4f2b81889eb4f7b18107 (archive)
    52.4k test-table/02ba3a0f8964669520cf96bb4e314c60/cf/1e06029d0a2a4a709051b417aec88291 (archive)
    86.8k test-table/02ba3a0f8964669520cf96bb4e314c60/cf/506f601e14dc4c74a058be5843b99577 (archive)
    52.4k test-table/02ba3a0f8964669520cf96bb4e314c60/cf/5c7f6916ab724eacbcea218a713941c4 (archive)
    293.4k test-table/02ba3a0f8964669520cf96bb4e314c60/cf/aec5e33a6564441d9bd423e31fc93abb (archive)
    52.4k test-table/02ba3a0f8964669520cf96bb4e314c60/cf/97782b2fbf0743edaacd8fef06ba51e4 (archive)
    6 HFiles (6 in archive), total size 589.7k (0.00% 0.0 shared with the source table)
    0 Logs, total size 0.0
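The Restrictions section notes that a snapshot may survive a merge if the regions being merged are not in the snapshot. As a rough aid for checking that, the sketch below extracts the distinct region names from -files output. It assumes each file line has the layout shown in the example, that is, a size followed by a path of the form table/region/family/hfile. The function name snapshot_regions is hypothetical, and other path layouts are not handled.

```shell
# Print the distinct region names referenced in SnapshotInfo -files output
# read from stdin. Only lines whose second field is a four-part path
# (table/region/family/hfile) are considered.
snapshot_regions() {
  awk '{ n = split($2, p, "/"); if (n == 4) print p[2] }' | sort -u
}

# Example:
#   hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot test-snapshot -files \
#     | snapshot_regions
```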