Cloudera Enterprise 6.3.x

Backup and Disaster Recovery

Cloudera Manager provides an integrated, easy-to-use management solution for enabling data protection on the Hadoop platform. Cloudera Manager enables you to replicate data across data centers for disaster recovery scenarios. Replications can include data stored in HDFS, data stored in Hive tables, Hive metastore data, and Impala metadata (catalog server metadata) associated with Impala tables registered in the Hive metastore. When critical data is stored in HDFS, Cloudera Manager helps ensure that the data is available at all times, even in the case of a complete shutdown of a data center.

You can also replicate HDFS data to and from Amazon S3.
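HDFS replication is DistCp-based, so a roughly equivalent manual copy to S3 can be sketched with the s3a connector (the bucket name, paths, and credential properties below are placeholders; in practice, credentials are usually supplied through a credential provider rather than on the command line):

```shell
# One-off copy of an HDFS directory to an S3 bucket via DistCp.
# ACCESS_KEY/SECRET_KEY and the bucket name are placeholders.
hadoop distcp \
  -Dfs.s3a.access.key=ACCESS_KEY \
  -Dfs.s3a.secret.key=SECRET_KEY \
  hdfs:///user/data \
  s3a://example-bucket/backups/user-data
```

Cloudera Manager's scheduled replications add monitoring, alerting, and incremental runs on top of this kind of copy.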

You can also use the HBase shell to replicate HBase data. (Cloudera Manager does not manage HBase replications.)
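As a sketch, HBase replication is configured directly in the HBase shell by registering the destination cluster as a peer and enabling replication per table (the peer ID, ZooKeeper quorum, and table name below are placeholders):

```shell
# In the HBase shell on the source cluster:
# register the destination cluster (its ZooKeeper quorum) as peer '1'
hbase> add_peer '1', CLUSTER_KEY => "zk1.example.com,zk2.example.com,zk3.example.com:2181:/hbase"
# turn on replication for an existing table
hbase> enable_table_replication 'my_table'
```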

  Important: This feature requires a Cloudera Enterprise license. It is not available in Cloudera Express. See Managing Licenses for more information.

You can also use Cloudera Manager to schedule, save, and restore snapshots of HDFS directories and HBase tables.
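For reference, the low-level HDFS operations that such snapshots build on look roughly like the following (the path and snapshot name are placeholders; Cloudera Manager automates these steps on the schedule you define):

```shell
# Mark a directory as snapshottable (HDFS superuser operation)
hdfs dfsadmin -allowSnapshot /user/data

# Create a named snapshot; it appears under /user/data/.snapshot/
hdfs dfs -createSnapshot /user/data daily-0829

# Restore a single file by copying it back out of the snapshot
hdfs dfs -cp /user/data/.snapshot/daily-0829/file.csv /user/data/file.csv
```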

Cloudera Manager provides key functionality in the Cloudera Manager Admin Console:
  • Select - Choose datasets that are critical for your business operations.
  • Schedule - Create an appropriate schedule for data replication and snapshots. Trigger replication and snapshots as required for your business needs.
  • Monitor - Track progress of your snapshots and replication jobs through a central console and easily identify issues or files that failed to be transferred.
  • Alert - Issue alerts when a snapshot or replication job fails or is aborted so that the problem can be diagnosed quickly.
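The same functionality is also exposed through the Cloudera Manager REST API. As a sketch, the replication schedules for an HDFS service can be listed with a call like the following (the host, port, credentials, cluster and service names are placeholders, and the API version may differ by release):

```shell
# List replication schedules configured for the 'hdfs' service
curl -u admin:admin \
  "http://cm-host.example.com:7180/api/v33/clusters/Cluster1/services/hdfs/replications"
```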

Replication works seamlessly across Hive and HDFS—you can set it up on files or directories in HDFS and on tables in Hive—without manual translation of Hive datasets to HDFS datasets, or vice versa. Hive metastore information is also replicated, so applications that depend on table definitions stored in Hive will work correctly on both the replica side and the source side as table definitions are updated.

You can also perform a “dry run” to verify configuration and understand the cost of the overall operation before actually copying the entire dataset.


Page generated August 29, 2019.