Cloudera Enterprise 6.3.x

Cloudera Navigator support for Virtual Private Clusters

Cloudera Manager supports deploying workloads in virtual private Compute clusters, which lets administrators access resources during periods of high demand or isolate workloads. In this environment, Cloudera Navigator continues to extract metadata and track audit events from services running on the Base cluster. For services running on a Compute cluster, Navigator tracks audit events but does not extract metadata.

  Attention: Navigator does not extract metadata from services running on the Compute cluster.

When you use Compute clusters, you define a data context to control how data is shared between a Compute cluster and the Base cluster. Because the two clusters interact through the data context, some of the activity that occurs on a Compute cluster affects the metadata collected in Navigator. For example, if you create Hive data assets using HiveServer2 or SparkSQL on the Compute cluster and Hive is in your data context, entities for the new Hive data assets appear in Navigator. Lineage showing how these assets were created does not appear, because operations on the Compute cluster are not extracted. Audits for the events that created the assets do appear. The following tables describe the behavior of Navigator metadata and audit collection in Base and Compute clusters for the services Navigator supports.

Navigator Auditing in Virtual Private Compute Clusters

Audits appear in Navigator for events that occur on a Compute cluster. Note that there can be latency between an event occurring and its audit reaching Navigator on the Base cluster. Be aware of this possible delay when terminating a Compute cluster: audit events from the Compute cluster may be lost if they have not been processed when the cluster is terminated.
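One way to allow for this delay is to postpone termination until the newest audit visible in Navigator is at least as recent as the last activity on the Compute cluster. The following Python sketch shows that check. Everything in it is hypothetical and not part of any Cloudera API: `latest_audit_time` stands in for whatever query you use to find the newest Compute cluster audit that has reached Navigator, the timeout and polling interval are arbitrary, and the check assumes that audits arrive in roughly the order the events occurred.

```python
import time
from datetime import datetime
from typing import Callable, Optional


def wait_for_audits(
    last_activity: datetime,
    latest_audit_time: Callable[[], Optional[datetime]],
    timeout_s: float = 600.0,
    poll_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the newest audit seen is at or after last_activity.

    last_activity:     time of the last known event on the Compute cluster.
    latest_audit_time: hypothetical caller-supplied function returning the
                       timestamp of the newest audit seen so far, or None
                       if no audit has arrived yet.
    Returns True when the audits have caught up, or False if timeout_s
    elapses first, in which case audits may still be in flight.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        seen = latest_audit_time()
        if seen is not None and seen >= last_activity:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

A caller would terminate the Compute cluster only when `wait_for_audits(...)` returns `True`, and otherwise decide whether losing the outstanding audit events is acceptable.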

Table 1. Audit Behavior in Virtual Private Clusters
Audited Service | Base Cluster | Compute Cluster
HBase
HDFS
HiveServer2
Hue
Impala
Sentry
Solr

Navigator Metadata and Lineage Extraction in Virtual Private Compute Clusters

No metadata is extracted from services running on a Compute cluster. However, if HDFS or Hive is included in the data context for a Compute cluster, Navigator shows entities that are created or updated from the Compute cluster and stored in HDFS or the Hive Metastore (HMS) on the Base cluster. For example, when directories or files are created by actions on a Compute cluster with HDFS in its data context, the directories and files are stored in HDFS on the Base cluster. Navigator collects the metadata from the Base cluster HDFS and creates entities for the directories and files. Similarly, when Hive databases, tables, views, or partitions are created or modified by HiveServer2, Impala, or SparkSQL operations on a Compute cluster and Hive is included in the data context for that cluster, the updated metadata is extracted from HMS on the Base cluster and collected by Navigator. Because Navigator does not extract metadata directly from the Compute cluster, the operations and operation executions that created the data assets are not collected. As a result, Navigator does not calculate lineage for these data assets.
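The rules in this section can be summarized as a small decision function. The following Python sketch is illustrative only: the names `ComputeAssetVisibility`, `compute_asset_visibility`, and `BASE_CLUSTER_SOURCE` are hypothetical and are not part of any Cloudera API. It also assumes that an asset whose store is not in the data context is not visible in Navigator, which is inferred from the statement that no metadata is extracted from a Compute cluster.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical labels for this sketch, not Cloudera identifiers.
# Maps a service in the data context to the Base cluster store that
# Navigator reads the resulting metadata from.
BASE_CLUSTER_SOURCE = {
    "HDFS": "HDFS on the Base cluster",
    "Hive": "Hive Metastore (HMS) on the Base cluster",
}


@dataclass(frozen=True)
class ComputeAssetVisibility:
    entities_visible: bool         # does Navigator show an entity for the asset?
    lineage_available: bool        # can Navigator calculate lineage for it?
    extracted_from: Optional[str]  # where the metadata is read from, if anywhere


def compute_asset_visibility(asset_store: str, data_context: Set[str]) -> ComputeAssetVisibility:
    """Model what Navigator shows for an asset created from a Compute cluster.

    asset_store is "HDFS" for files and directories, or "Hive" for
    databases, tables, views, and partitions.
    """
    if asset_store not in BASE_CLUSTER_SOURCE:
        raise ValueError(f"unsupported asset store: {asset_store!r}")
    in_context = asset_store in data_context
    return ComputeAssetVisibility(
        # Entities appear only because the Base cluster store is extracted.
        entities_visible=in_context,
        # Operations on the Compute cluster are never extracted, so no lineage.
        lineage_available=False,
        extracted_from=BASE_CLUSTER_SOURCE[asset_store] if in_context else None,
    )
```

For example, `compute_asset_visibility("Hive", {"Hive", "HDFS"})` models a table created through HiveServer2 on a Compute cluster: the entity is visible and read from HMS on the Base cluster, but no lineage is available.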

Table 2. Metadata and Lineage Behavior in Virtual Private Clusters
Service Providing Metadata | Metadata (Base Cluster, Compute Cluster) | Lineage (Base Cluster, Compute Cluster) | Notes
HDFS HDFS in the Data Context
HiveServer2  
HMS Hive in the Data Context
Impala  
MapReduce (v1 and v2)  
Oozie  
Pig  
Spark (v1 and v2)  
Sqoop (v1)  
YARN  
Cluster  
S3 Extraction occurs outside the Base or Compute clusters
Page generated August 29, 2019.