
Hive-DB connections using Kerberos Authentication from Mule 4

Apache Hive is a distributed, fault-tolerant data warehouse system that enables analytics at a massive scale. Like all data warehouses, it provides users with a centralized place to store information for easy analysis and data-driven decision-making. Hive specifically allows users to read, write, and manage petabytes of data using SQL.

Connecting to Hive-DB from MuleSoft unlocks its data for other systems within an organization's application network, and does so faster and more securely than ad hoc integration methods.

This blog explains how to read Apache Hive-DB data with the MuleSoft Database Connector, using Kerberos authentication to authenticate with Hive-DB.

 

Prerequisites

You’ll need the following prerequisites to complete this process:

  • Anypoint Studio 7.0
  • MuleSoft Database Connector
  • Cloudera JDBC Driver (ClouderaHiveJDBC42-2.6.18.1021)
  • The following config files:
    • jaas.conf
    • krb5.conf
    • {MulesoftUser}.keytab

 

Note:
The krb5.conf and {MulesoftUser}.keytab files should be provided by the Hadoop/Hive team.

What the krb5.conf file looks like:

[Image: Hive-DB connections pic 1]
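As an illustration, a minimal krb5.conf typically has the shape below. The realm is taken from the JDBC URL later in this post; the KDC host name is a placeholder and must match whatever your Hadoop/Hive team provides:

```
[libdefaults]
    default_realm = ABCDATA.COM
    dns_lookup_kdc = false
    dns_lookup_realm = false

[realms]
    ABCDATA.COM = {
        kdc = kdc.abcdata.com
        admin_server = kdc.abcdata.com
    }
```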


jaas.conf

[Image: Hive-DB connections pic 2]
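For reference, a typical jaas.conf for this setup is sketched below using the standard Krb5LoginModule. The keytab path and principal are placeholders; the root node must be named Client:

```
Client {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    keyTab="/opt/mule/conf/MulesoftUser.keytab"
    principal="MulesoftUser@ABCDATA.COM"
    storeKey=true
    doNotPrompt=true;
};
```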

 

Note:
Kerberos authentication won’t work if the above “Client” root node is changed or missing.

{MulesoftUser}.keytab

The {MulesoftUser}.keytab file is generated after a service user is created in AD for Hadoop Hive-DB access. It serves as the credential file for that user and is essential for authenticating with Kerberos.

Use this command to check the principals stored in the keytab:

klist -t -k MulesoftUser.keytab
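A sketch of typical klist output (the KVNO, timestamp, and principal shown are illustrative):

```
Keytab name: FILE:MulesoftUser.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------
   2 01/01/2023 00:00:00 MulesoftUser@ABCDATA.COM
```

If the listed principal does not match the one issued by the Hadoop/Hive team, Kerberos authentication will fail.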

 

Workflow

The flow presented below is API-based and receives database queries via POST requests. It routes each request to the appropriate database cluster based on the query parameters; the response is then transformed and returned to the caller.

[Image: Hive-DB connections pic 3]

[Image: Hive-DB connections pic 4]

Steps:

1. Download the Database Connector from the Mule palette.

[Image: Hive-DB connections pic 5]

2. This will add the following dependency to pom.xml:

[Image: Hive-DB connections pic 6]
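For reference, the Database Connector dependency in pom.xml typically looks like this; the version shown is only an example, so use whatever version Studio adds:

```xml
<dependency>
    <groupId>org.mule.connectors</groupId>
    <artifactId>mule-db-connector</artifactId>
    <version>1.14.5</version>
    <classifier>mule-plugin</classifier>
</dependency>
```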

3. Add the ClouderaHiveJDBC42-2.6.18.1021 driver to your organization's repository manager (for example, Artifactory or Nexus).

Ref: https://stackoverflow.com/questions/24122382/how-to-configure-maven2-to-publish-to-artifactory

4. Add ClouderaHiveJDBC42-2.6.18.1021 as a POM dependency and a shared library.

[Image: Hive-DB connections pic 7]

[Image: Hive-DB connections pic 8]
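A sketch of what steps 3 and 4 produce in pom.xml. The Maven coordinates for the driver are an assumption that depends on how you published it to your repository manager, so adjust groupId/artifactId/version to match your deployment:

```xml
<!-- JDBC driver dependency (coordinates are an assumption; match your repository) -->
<dependency>
    <groupId>com.cloudera</groupId>
    <artifactId>HiveJDBC42</artifactId>
    <version>2.6.18.1021</version>
</dependency>

<!-- Expose the driver to the Mule app as a shared library -->
<plugin>
    <groupId>org.mule.tools.maven</groupId>
    <artifactId>mule-maven-plugin</artifactId>
    <extensions>true</extensions>
    <configuration>
        <sharedLibraries>
            <sharedLibrary>
                <groupId>com.cloudera</groupId>
                <artifactId>HiveJDBC42</artifactId>
            </sharedLibrary>
        </sharedLibraries>
    </configuration>
</plugin>
```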

5. Add the following database configuration in global.xml:

[Image: Hive-DB connections pic 9]

URL pattern:

jdbc:hive2://hadoop-lr4-hiveserver1.anycast.prod.abc.com:10000;AuthMech=1;KrbRealm=ABCDATA.COM;KrbHostFQDN=hadoop-lr4-hiveserver1.anycast.prod.abc.com;KrbServiceName=hive;KrbAuthType=1

Note:
Please also add your reconnection strategy at this step.

 

The URL parameters break down as follows:

  • Host: hadoop-lr4-hiveserver1.anycast.prod.abc.com
  • Port: 10000
  • AuthMech=1: Kerberos authentication
  • KrbRealm=ABCDATA.COM: the domain over which a Kerberos authentication server has the authority to authenticate a user, host, or service
  • KrbHostFQDN=hadoop-lr4-hiveserver1.anycast.prod.abc.com: the fully qualified domain name of the Kerberos host to connect to
  • KrbServiceName=hive: the Kerberos service principal name of the Hive server
  • KrbAuthType=1: creates a LoginContext from the JAAS configuration and then uses the Subject associated with it
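Putting it together, the database configuration in global.xml can be sketched as a generic connection. The driver class name shown is the one commonly documented for the Cloudera Hive JDBC 4.2 driver; verify it against your driver version:

```xml
<db:config name="Hive_Database_Config">
    <db:generic-connection
        url="jdbc:hive2://hadoop-lr4-hiveserver1.anycast.prod.abc.com:10000;AuthMech=1;KrbRealm=ABCDATA.COM;KrbHostFQDN=hadoop-lr4-hiveserver1.anycast.prod.abc.com;KrbServiceName=hive;KrbAuthType=1"
        driverClassName="com.cloudera.hive.jdbc.HS2Driver" />
</db:config>
```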

6. Set up your APIkit router and bind it to the appropriate sub-flow and database config per the workflow above.

[Image: Hive-DB connections pic 10]
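Inside each sub-flow, the query received in the POST body can be passed to a Database select operation and the result transformed to JSON. A minimal sketch, assuming the config name above and a payload of the form {"query": "..."}:

```xml
<flow name="query-hive-flow">
    <db:select config-ref="Hive_Database_Config">
        <db:sql>#[payload.query]</db:sql>
    </db:select>
    <ee:transform>
        <ee:message>
            <ee:set-payload><![CDATA[%dw 2.0
output application/json
---
payload]]></ee:set-payload>
        </ee:message>
    </ee:transform>
</flow>
```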

7. Once the flow is complete, add the following properties to the run configuration.

[Image: Hive-DB connections pic 11]

[Image: Hive-DB connections pic 12]

Note:
Be sure to provide the appropriate krb5.conf and jaas.conf paths as per the above prerequisites.
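These paths are typically supplied as JVM arguments in the run configuration, using the standard Java system properties below; the file locations are placeholders for wherever you store the files:

```
-Djava.security.krb5.conf=/opt/mule/conf/krb5.conf
-Djava.security.auth.login.config=/opt/mule/conf/jaas.conf
```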

8. Fire a query at your Hive-DB using Postman.

[Image: Hive-DB connections pic 14]
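For a quick test outside Postman, the same request can be sketched with curl. The listener port, path, and body shape are assumptions based on the workflow above; substitute your API's actual endpoint and a query valid for your Hive tables:

```
curl -X POST "http://localhost:8081/api/query" \
  -H "Content-Type: application/json" \
  -d '{"query": "SELECT * FROM sample_table LIMIT 10"}'
```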

 


To find out more email us at: salesforce@coforge.com
