Configuring the Amazon S3 Connector
You can securely configure your cluster to authenticate with Amazon Simple Storage Service (S3) using the Cloudera S3 Connector Service. This configuration enables Hive and Impala queries to access data in S3 and also enables the Hue S3 Browser. Hive, Impala, and Hue are automatically configured to authenticate with S3, but applications such as YARN, MapReduce, or Spark must provide their own AWS credentials when submitting jobs. You can define only one Amazon S3 service for each cluster.
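Because YARN, MapReduce, and Spark jobs are not covered by the connector, the submitting application has to pass credentials itself. The following is a minimal sketch of one way to do that for Spark, assuming the job reads S3 through the Hadoop S3A file system and that the standard S3A properties fs.s3a.access.key and fs.s3a.secret.key apply to your version. The function names, the application file name, and the placeholder key values are hypothetical.

```python
import os
import shlex


def build_spark_submit_args(app_path, access_key, secret_key):
    """Return a spark-submit argument list that passes S3A credentials as job configuration."""
    if not access_key or not secret_key:
        raise ValueError("both an access key and a secret key are required")
    return [
        "spark-submit",
        # The spark.hadoop. prefix copies the property into the job's Hadoop configuration.
        "--conf", f"spark.hadoop.fs.s3a.access.key={access_key}",
        "--conf", f"spark.hadoop.fs.s3a.secret.key={secret_key}",
        app_path,
    ]


def redact(args):
    """Mask credential values so the command line can be logged safely."""
    masked = []
    for arg in args:
        key, sep, _ = arg.partition("=")
        if sep and key.endswith((".access.key", ".secret.key")):
            masked.append(f"{key}=********")
        else:
            masked.append(arg)
    return masked


if __name__ == "__main__":
    args = build_spark_submit_args(
        "my_job.py",  # hypothetical application file
        os.environ.get("AWS_ACCESS_KEY_ID", "EXAMPLE_ACCESS_KEY"),
        os.environ.get("AWS_SECRET_ACCESS_KEY", "EXAMPLE_SECRET_KEY"),
    )
    # Print only the redacted form; the real values stay in memory.
    print(shlex.join(redact(args)))
```

Values passed on the command line may be visible to other users on the submitting host, so treat this as an illustration of which properties are involved rather than a recommended way to handle secrets.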
Cloudera Manager stores AWS credentials securely and does not store them in world-readable locations. The credentials are masked in the Cloudera Manager Admin console, encrypted in the configurations passed to processes managed by Cloudera Manager, and redacted from the logs.
To access this storage, you define AWS Credentials in Cloudera Manager, and then you add the S3 Connector Service and configure it to use the AWS credentials.
Consider using the S3Guard feature to address possible issues with the "eventual consistency" guarantee provided by Amazon for data stored in S3. To use the S3Guard feature, you provision an Amazon DynamoDB database for use as an additional metadata store to improve performance and guarantee that your queries return the most current data. See Configuring and Managing S3Guard.
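For orientation, the sketch below shows the kind of settings that tie S3Guard to a DynamoDB table. It assumes the property names used by the upstream Apache Hadoop S3A connector; whether Cloudera Manager sets exactly these properties for your release is an assumption, and the table name and region are hypothetical. See Configuring and Managing S3Guard for the supported procedure.

```python
import xml.etree.ElementTree as ET


def s3guard_properties(table, region, create_table=False):
    """Return S3Guard settings as a dict of Hadoop property names to values."""
    return {
        # Use DynamoDB as the additional metadata store.
        "fs.s3a.metadatastore.impl":
            "org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore",
        "fs.s3a.s3guard.ddb.table": table,
        "fs.s3a.s3guard.ddb.region": region,
        # Whether the table should be created if it does not exist yet.
        "fs.s3a.s3guard.ddb.table.create": "true" if create_table else "false",
    }


def to_hadoop_xml(props):
    """Render the properties in the <configuration> layout used by Hadoop XML files."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "example-s3guard-table" and "us-west-2" are placeholder values.
    print(to_hadoop_xml(s3guard_properties("example-s3guard-table", "us-west-2")))
```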
Adding AWS Credentials
Minimum Required Role: User Administrator (also provided by Full Administrator)
To connect to Amazon S3, obtain an Access Key and Secret Key from Amazon Web Services, and then add AWS credentials in Cloudera Manager. These keys should permit access to all data in S3 that you want to query with Hive and Impala or browse with Hue.
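What "permit access to all data" means in practice depends on the IAM policy attached to the user that owns the keys. The following is a minimal sketch of such a policy for a single bucket, built as a Python dictionary and printed as JSON. The bucket name and the function name are hypothetical, and the set of actions your workloads need is an assumption; adjust it to your own requirements.

```python
import json


def s3_access_policy(bucket, read_only=False):
    """Build an IAM policy document that allows listing a bucket and accessing its objects."""
    object_actions = ["s3:GetObject"]
    if not read_only:
        # Writing query results or tables to S3 also needs put and delete.
        object_actions += ["s3:PutObject", "s3:DeleteObject"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level actions apply to the keys inside the bucket.
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "example-bucket" is a placeholder bucket name.
    print(json.dumps(s3_access_policy("example-bucket"), indent=2))
```

Repeat the two statements, or list several resources, if Hive, Impala, or Hue need to reach more than one bucket.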
Managing AWS Credentials in Cloudera Manager
- Open Cloudera Manager and go to .
- Select the AWS Credentials tab.
- To remove a credential, in the row for the credential you want to remove, click .
You cannot remove a credential that is currently being used by the S3 Connector Service; you must first remove the Connector Service.
- To edit a credential, in the row for the credential you want to edit, click .
- Edit the fields of the credential as needed and click Save.
Adding the S3 Connector Service
Minimum Required Role: Cluster Administrator (also provided by Full Administrator)
- If all hosts are configured with IAM Role-based Authentication that allows access to S3 and you do not want to use S3Guard, you do not need to add the S3 Connector Service.
- When using the More Secure mode, you must have the Sentry service and Kerberos enabled for the cluster in order to add the S3 Connector Service. For secure operation, Cloudera also recommends that you enable TLS for Cloudera Manager.
- A cluster cannot use the S3 Connector Service and the ADLS Connector Service at the same time. You must remove the old connector service before adding a new one. See Removing the ADLS Connector Service or Removing the S3 Connector Service.
- If you have not defined AWS Credentials, add AWS credentials in Cloudera Manager.
- Go to the cluster where you want to add the Amazon S3 Connector Service.
- Click .
- Select S3 Connector.
- Click Continue.
The Add S3 Connector Service to Cluster Name wizard displays.
The wizard checks your configuration for compatibility with S3 and reports any issues. The wizard does not allow you to continue if you have an invalid configuration. Fix any issues, and then repeat these steps to add the S3 Connector Service.
- Select a Credentials Protection Policy. (Not applicable when IAM Role-Based Authentication is used.)
Choose one of the following:
- Less Secure
Credentials can be stored in plain text in some configuration files for specific services (currently Hive, Impala, and Hue) in the cluster.
This configuration is appropriate for unsecure, single-tenant clusters that provide fine-grained access control for data stored in S3.
- More Secure
Cloudera Manager distributes secrets to a limited set of services (currently Hive, Impala, and Hue) and enables those services to access S3. It does not distribute these credentials to any other clients or services. See S3 Credentials Security.
Other configurations that are not sensitive, such as the S3Guard configuration, are included in the configuration of all services and clients as needed.
- Click Continue.
- Select previously-defined AWS credentials from the Name drop-down list.
- Click Continue.
The Restart Dependent Services page displays and indicates the dependent services that need to be restarted.
- Select Restart Now to restart these services. You can also restart these services later. Hive, Impala, and Hue will not be able to authenticate with S3 until you restart the services.
- Click Continue to complete the addition of the Amazon S3 service. If Restart Now is selected, the dependent services are restarted.
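The prerequisites stated in this section can be summarized as a small checklist. The sketch below is only an illustration of those rules, not the logic the wizard actually runs: the ClusterState class, its fields, the service names used as strings, and the function name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterState:
    """Hypothetical snapshot of the facts the prerequisites depend on."""
    services: set = field(default_factory=set)
    kerberos_enabled: bool = False
    has_aws_credentials: bool = False


def s3_connector_issues(cluster, policy):
    """Return a list of problems that would block adding the S3 Connector Service."""
    if policy not in ("Less Secure", "More Secure"):
        raise ValueError(f"unknown credentials protection policy: {policy!r}")
    issues = []
    if not cluster.has_aws_credentials:
        issues.append("Add AWS credentials in Cloudera Manager first.")
    if "S3 Connector" in cluster.services:
        issues.append("Only one Amazon S3 service can be defined per cluster.")
    if "ADLS Connector" in cluster.services:
        issues.append("Remove the ADLS Connector Service before adding S3.")
    if policy == "More Secure":
        # More Secure mode requires both Sentry and Kerberos.
        if "Sentry" not in cluster.services:
            issues.append("More Secure mode requires the Sentry service.")
        if not cluster.kerberos_enabled:
            issues.append("More Secure mode requires Kerberos.")
    return issues


if __name__ == "__main__":
    cluster = ClusterState(services={"Hive", "Impala", "Hue"}, has_aws_credentials=True)
    for issue in s3_connector_issues(cluster, "More Secure"):
        print(issue)
```

An empty list means that none of the conditions described above is violated; it does not replace the compatibility check that the wizard performs.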
Removing the S3 Connector Service
- Open Cloudera Manager and go to .
- Select the AWS Credentials tab.
- In the row for the credential used for the service, click .
The Connect to Amazon Web Services dialog box displays.
- Click Disable for Cluster_name.
- Click OK.
A message displays saying "The configuration has been updated". You will need to restart any stale services. Click the View Stale Configurations link to open the Stale Configurations page. Click Restart Stale Services.
©2016 Cloudera, Inc. All rights reserved