Enabling Wire Encryption and Kerberos on HDFS Pipelines
In advanced HDFS Pipelines mode, you can encrypt your pipeline’s connection to HDFS and you can authenticate your pipeline using Kerberos.
This topic assumes you have already set up your HDFS cluster to use wire encryption and/or Kerberos.
To create an advanced HDFS pipeline, first set the advanced_hdfs_pipelines engine variable to true on the master aggregator. Then run a CREATE PIPELINE statement and pass in JSON attributes in the CONFIG clause.
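For example, a minimal sketch of enabling the variable (depending on your SingleStoreDB version, the variable may instead need to be set in the node configuration file):

SET GLOBAL advanced_hdfs_pipelines = true; -- run on the master aggregator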
Note
With advanced HDFS pipelines, you can enable debug logging. To do so, set the pipelines_extractor_debug_logging engine sync variable to true.
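A minimal sketch of enabling debug logging, assuming the sync variable accepts SET GLOBAL in your version:

SET GLOBAL pipelines_extractor_debug_logging = true;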
Wire Encryption
If encrypted Data Transfer Protocol (DTP) is enabled in your HDFS cluster, you can encrypt your pipeline's connection to HDFS. To do so, set up the CONFIG JSON that you will use in CREATE PIPELINE as follows:
- Set dfs.encrypt.data.transfer to true.
- Set the attributes dfs.encrypt.data.transfer.cipher.key.bitlength, dfs.encrypt.data.transfer.algorithm, and dfs.data.transfer.protection. Set these attributes' values as they are specified in your hdfs-site.xml file. Find a copy of this file on each node in your HDFS cluster.
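For reference, these attributes mirror entries in hdfs-site.xml. A typical excerpt looks like the following; the values shown are illustrative and should be replaced with the ones in your cluster's file:

<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>
<property>
  <name>dfs.encrypt.data.transfer.cipher.key.bitlength</name>
  <value>256</value>
</property>
<property>
  <name>dfs.encrypt.data.transfer.algorithm</name>
  <value>rc4</value>
</property>
<property>
  <name>dfs.data.transfer.protection</name>
  <value>authentication</value>
</property>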
The following example creates a pipeline that uses encrypted DTP to communicate with HDFS.
CREATE PIPELINE my_pipeline
AS LOAD DATA HDFS 'hdfs://hadoop-namenode:8020/path/to/files'
CONFIG '{"dfs.encrypt.data.transfer": true,
         "dfs.encrypt.data.transfer.cipher.key.bitlength": 256,
         "dfs.encrypt.data.transfer.algorithm": "rc4",
         "dfs.data.transfer.protection": "authentication"}'
INTO TABLE `my_table`
FIELDS TERMINATED BY '\t';
Authenticating with Kerberos
You can create an HDFS pipeline that authenticates with Kerberos. The following instructions use EXAMPLE.COM as the default realm and host.example.com as the fully qualified domain name (FQDN) of the KDC server.
Note
Perform the following steps on every SingleStoreDB leaf node (referred to below as the “node”).
- Install version 1.8 or later of the Java Runtime Environment (JRE). The JRE version installed should match the JRE version installed on the HDFS nodes.
- Tell SingleStoreDB the path where the JRE binary files have been installed. An example path is /usr/bin/java/jre1.8.2_12/bin. Specify the path using one of the two following methods (see the sketch after this step):
Method 1: Add the path to your operating system's PATH environment variable.
Method 2: Set the engine variables java_pipelines_java_path and java_pipelines_java_home to the path.
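A minimal sketch of Method 2, assuming these engine variables can be set with SET GLOBAL in your version (otherwise, set them in the node configuration file); the exact values depend on your JRE layout:

SET GLOBAL java_pipelines_java_path = '/usr/bin/java/jre1.8.2_12/bin'; -- illustrative JRE binary path
SET GLOBAL java_pipelines_java_home = '/usr/bin/java/jre1.8.2_12';     -- JRE root; adjust to your layout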
- On the KDC server, create a SingleStoreDB service principal (e.g. memsql/host.example.com@EXAMPLE.COM) and a keytab file containing the SingleStoreDB service principal.
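On an MIT Kerberos KDC, a minimal sketch of this step might look as follows (the principal name and keytab path are illustrative):

# Run on the KDC server as a Kerberos administrator
sudo kadmin.local -q "addprinc -randkey memsql/host.example.com@EXAMPLE.COM"
sudo kadmin.local -q "ktadd -k /tmp/memsql.keytab memsql/host.example.com@EXAMPLE.COM"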
- Securely copy the keytab file containing the SingleStoreDB service principal from the KDC server to the node. You should use a secure file transfer method, such as scp, to copy the keytab file to your node. The file location on your node should be consistent across all nodes in the cluster.
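For example, using scp (hostname and paths are illustrative; use the same destination path on every node):

scp /tmp/memsql.keytab admin@node1.example.com:/etc/memsql/kerberos.keytab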
- Ensure that the Linux service account used to run SingleStoreDB on the node can access the copied keytab file. This can be accomplished by changing file ownership or permissions. If this account cannot access the keytab file, you will not be able to complete the next step because your master aggregator will not be able to restart after applying configuration updates.
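For example, if SingleStoreDB runs as a memsql service account (the account name and keytab path are illustrative):

sudo chown memsql:memsql /etc/memsql/kerberos.keytab
sudo chmod 600 /etc/memsql/kerberos.keytab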
- When authenticating with Kerberos, SingleStoreDB needs to authenticate as a client, which means you must also install a Kerberos client on your node.
The following command installs the client on Debian-based Linux distributions:
sudo apt-get update && sudo apt-get install krb5-user
The following command installs the client on RHEL/CentOS:
sudo yum install krb5-workstation
- Configure your Kerberos client to connect to the KDC server. In your node's /etc/krb5.conf file, set your default realm, Kerberos admin server, and other options to those defined by your KDC server.
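A minimal /etc/krb5.conf sketch for the example realm (values are illustrative and must match your KDC):

[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = host.example.com
        admin_server = host.example.com
    }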
- Make sure your node can connect to the KDC server using the fully qualified domain name (FQDN) of the KDC server. This FQDN is found in the /etc/krb5.conf file. This might require configuring network settings or updating /etc/hosts on your node.
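One way to verify that the node can reach the KDC and that the keytab works is to request a ticket manually (the principal and keytab path are illustrative):

kinit -kt /etc/memsql/kerberos.keytab memsql/host.example.com@EXAMPLE.COM
klist   # should show a ticket-granting ticket for EXAMPLE.COM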
- Ensure that your node can access every HDFS datanode, using the FQDN or IP by which the HDFS namenode accesses the datanode. The FQDN is typically used.
- Specify the path of your keytab file in the kerberos.keytab attribute of the CONFIG JSON that you will pass to your CREATE PIPELINE statement.
- In your CONFIG JSON, add the attributes dfs.datanode.kerberos.principal and dfs.namenode.kerberos.principal. Set these attributes' values as they are specified in your hdfs-site.xml file. Find a copy of this file on each node in your HDFS cluster.
Example CREATE PIPELINE Statement Using Kerberos
The following example demonstrates how to create an HDFS pipeline that authenticates using Kerberos.
CREATE PIPELINE my_pipeline
AS LOAD DATA HDFS 'hdfs://hadoop-namenode:8020/path/to/files'
CONFIG '{"hadoop.security.authentication": "kerberos",
         "kerberos.user": "memsql/host.example.com@EXAMPLE.COM",
         "kerberos.keytab": "/path/to/kerberos.keytab",
         "dfs.client.use.datanode.hostname": true,
         "dfs.datanode.kerberos.principal": "datanode_principal/_HOST@EXAMPLE.COM",
         "dfs.namenode.kerberos.principal": "namenode_principal/_HOST@EXAMPLE.COM"}'
INTO TABLE `my_table`
FIELDS TERMINATED BY '\t';
Last modified: April 6, 2023