Load Data from Kafka

Securely Connect to Kafka from SingleStore

Overview

When running a CREATE PIPELINE ... KAFKA ... statement, you may need to make a secure connection to Kafka.

Use Secure Socket Layer (SSL) for the connection and Simple Authentication and Security Layer (SASL) to authenticate. Using SASL for authentication is optional.

GSSAPI (Kerberos)
PLAIN
SCRAM-SHA-256
SCRAM-SHA-512
OAUTHBEARER SASL

Kerberos may used for authentication.

This topic assumes SSL and/or Kerberos have been set up, configured, and enabled on the Kafka brokers. For information on how to enable this functionality, see the SSL and SASL sections in the Kafka documentation.

Convert Java Keystore (JKS) to Privacy Enhanced Mail (PEM) Key

To use SSL encryption for SingleStore pipelines, JKS keys need to be converted to PEM keys.

Note

In the steps below, the < > symbols indicate a variable and any information provided between these symbols is an example.

Create the key and keystore. You will be prompted to enter details such as name, organizational unit, city, state, etc.

Shell

keytool -genkey -keyalg RSA -keystore <keystore-name>.jks -storepass <password> -alias "<keystore-alias-name>"

What is your first and last name?
 [Unknown]:- 
What is the name of your organizational unit?
 [Unknown]:-
What is the name of your organization?
 [Unknown]:-

Export the client certificate from the keystore using the same password as in step 1.

Shell

keytool -exportcert -rfc -file <pem-name>.pem -alias <pem-name-alias> -keystore <keystore-name>.jks

Enter keystore password:  <password>
Certificate stored in file <pem-name>.pem

Import the client certificate to the truststore located on your Apache Server. Enter a new password for the keystore.

Shell

keytool -keystore <truststore-name>.jks -alias <truststore-name-alias> -import -file <pem-name>.pem

Enter keystore password:  
Re-enter new password: 

Trust this certificate? [no]:  yes
Certificate was added to keystore

Convert the client keystore to Public-Key Cryptography Standards (PKCS12) format. The <key name>.jks and password are the same as in step 1.

Shell

keytool -v -importkeystore -srckeystore <keystore-name>.jks -srcalias <keystore alias> -destkeystore <new-keystore-name>.p12 -deststoretype PKCS12

Importing keystore <keyname>.jks to <new-keyname>.p12...
Enter destination keystore password:  
Re-enter new password: 
Enter source keystore password:  
[Storing new-keystore-name.p12]

Extract the client certificate key into a .pem file. Use the import password created in step 4.
Shell
```
openssl pkcs12 -in <new-keyname>.p12 -nocerts -nodes > <keyname>.pem
```
```
Enter Import Password:
```

Note

Refer to Generating SSL Certificates for more information.

Steps for Creating a Secure Connection

To create a secure connection from SingleStore to a Kafka cluster, follow these steps in order.

Copy Security Files to the SingleStoreCluster

Securely copy the CA certificate, SSL certificate, and SSL key used for connections between the SingleStorecluster and Kafka brokers from the Kafka cluster to each SingleStore node. You should use a secure file transfer method, such as scp, to copy the files to your SingleStore nodes. The file locations on your SingleStore nodes should be consistent across the cluster.

Setup Your SingleStoreCluster for Kerberos Authentication

To configure a Kafka pipeline to authenticate with Kerberos, you must configure all of your nodes in your SingleStorecluster as clients for Kerberos authentication. To do this, perform the following steps:

Securely copy the keytab file containing the SingleStore service principal (e.g. memsql/host.domain.com@REALM.NAME) from the Kerberos server to each node in your SingleStorecluster. You should use a secure file transfer method, such as scp, to copy the keytab file to your SingleStore nodes. The file location on your SingleStore nodes should be consistent across the cluster.
Make sure your SingleStore nodes can connect to the KDC server using the fully-qualified domain name (FQDN) of the KDC server. This might require configuring network settings or updating /etc/hosts on your nodes.
Also ensure that the memsql service account on each node can access the copied keytab file. This can be accomplished by changing file ownership or permissions. If the memsql account cannot access the keytab file, you will not be able to complete the next step because your master aggregator will not be able to restart after applying configuration updates.
When authenticating with Kerberos, SingleStore needs to authenticate as a client, which means you must also install a Kerberos client onto each node in your cluster. The following installs the krb5-user package for Debian-based Linux distributions.
Shell
```
sudo apt-get update && apt-get install krb5-user
```
When setting up your Kerberos configuration settings, set your default realm, Kerberos admin server, and other options to those defined by your KDC server. In the examples used in this topic, the default realm is EXAMPLE.COM, and the Kerberos server settings are set to the FQDN of the KDC server host.example.com.

Build a String with the Connection Settings

Using the following settings, create a string containing the CONFIG clause and optionally the CREDENTIALS clause of the CREATE PIPELINE ... KAFKA ... or SELECT ... INTO KAFKA ... statement that you will be running.

SSL Connection Settings

In your CONFIG JSON, if you want to enable SSL encryption only, set "security.protocol": "ssl". If you want to enable Kerberos with SSL, or otherwise want to use SASL, set "security.protocol": "sasl_ssl".
Set the remaining SSL configuration in the CONFIG JSON:
- ssl.ca.location: Path to the CA certificate on the SingleStore node.
- ssl.certificate.location: Path to the SSL certificate on the SingleStore node.
- ssl.key.location: Path to the SSL certificate key on the SingleStore node.
If your SSL certificate key is using a password, set it in your CREDENTIALS JSON.
- ssl.key.password: Password for the SSL certificate key.

SASL Connection Settings

In your CONFIG JSON, set "security.protocol": "sasl_ssl" for SSL connections, or "security.protocol": "sasl_plaintext" if you want to authenticate with Kafka without SSL encryption.
If your Kafka brokers do not use SCRAM for authentication, set "sasl.mechanism": "PLAIN" in your CONFIG JSON. Otherwise, set "sasl.mechanism": "SCRAM-SHA-256" or "sasl.mechanism": "SCRAM-SHA-512".
In your CONFIG JSON, provide the username, "sasl.username": "<kafka_credential_username>".
In your CREDENTIALS JSON, provide the password, "sasl.password": "<kafka_credential_password>".

Note

SASL_PLAINTEXT/PLAIN authentication mode with Kafka sends your credentials unencrypted over the network. It is therefore not secure and susceptible to being sniffed.

SASL_PLAINTEXT/SCRAM authentication mode with Kafka will encrypt the credentials information sent over the network, but transport of Kafka messages themselves is not secure.

Kerberos Connection Settings

In your CONFIG JSON, set "sasl.mechanism": "GSSAPI".
Set "security.protocol": "sasl_ssl" for Kerberos and SSL connections, or "security.protocol": "sasl_plaintext" if you want to authenticate with Kerberos without SSL encryption.
Set the remaining Kerberos configuration in CONFIG JSON:
- sasl.kerberos.service.name: The Kerberos principal name that Kafka runs as. For example, "kafka".
- sasl.kerberos.keytab: The local file path on the SingleStore node to the authenticating keytab.
- sasl.kerberos.principal: The service principal name for the SingleStorecluster. For example, "memsql/host.example.com@EXAMPLE.COM".

Configuring OAUTHBEARER Authentication Mechanism

OAUTHBEARER is used to secure access to resources on a server by requiring clients to obtain a bearer token. The client presents this token with each request to the server, and the server verifies the token before granting access to the requested resource.

To use SASL OAUTHBEARER authentication with Kafka, the following information is required:

"sasl.mechanism":"OAUTHBEARER" - Specifies the client will authenticate using an OAuth 2.0 Bearer Token.
"sasl.oauthbearer.client.id":"<CLIENT_ID>" - The client ID is usually provided by the OAuth provider when the client is registered. It is a unique ID that is associated with the OAuth 2.0 Bearer Token.
"sasl.oauthbearer.client.secret":"<CLIENT_SECRET>" The client secret is usually assigned by the OAuth provider when a client is registered and is used to authenticate the client.
"sasl.oauthbearer.token.endpoint.url":"<ENDPOINT_URL>" - This is the endpoint URL on the authorization server that is used to obtain an OAuth 2.0 Bearer Token. The client sends a request to this endpoint to get a token, which is then used to authenticate subsequent requests to the server.

Optional configurations are:

"sasl.oauthbearer.scope":"<SCOPE>" - Determines the permissions and resources that are available to an authorized client. This extension is optional.
"sasl.outhbearer.extensions":"<EXTENSION>" - Can be included in the SASL/OAUTHBEARER mechanism to provide additional data or parameters for authentication. Consult RFC-7628 for further information on SASL extensions.
See "OAUTHBEARER Pipelines Configuration Details" section for additional details regarding "sasl.oauthbearer.ssl.ca.location" "sasl.oauthbearer.config" fields.

Prerequisites

To use SASL OAUTHBEARER authentication the following prerequisites are required:

A Kafka broker with listeners configured using OAUTHBEARER authentication.
A connection to a database where Kafka brokers are reachable to create a pipeline to pull data from the Kafka queue.
An identity service was selected (e.g., Okta, Google OAuth, Facebook OAuth, Azure AD, Keycloak, etc.).
An Oauthbearer client was created and configured with the client_credentials grant type on the identity service.

Note

The instructions for setting up a client on the Identity Service will vary depending on the server chosen and the type of application being built.

Syntax for SASL OAUTHBEARER Pipeline

Below is the syntax for creating a Kafka pipeline using OAUTHBEARER authentication.

SQL

CREATE or REPLACE PIPELINE <pipeline_name>   
  AS LOAD DATA KAFKA "<Kafka cluster location>"   
  CONFIG '{"security.protocol":"SASL_SSL",  
         "sasl.mechanism":"OAUTHBEARER",
         "sasl.oauthbearer.client.id":"<CLIENT_ID>",    
         "sasl.oauthbearer.client.secret":<"CLIENT_SECRET>",
         "sasl.oauthbearer.token.endpoint.url":"<ENDPOINT_URL>",
         "sasl.oauthbearer.scope":"<SCOPE>"}'  
  INTO TABLE <table_name>;

OAUTHBEARER Pipelines Configuration Details

To review the current pipeline configuration use the command:

SQL

SELECT * FROM information_schema.pipelines

Most token providers are assumed to work out-of-the-box. However, the implementation details below will allow troubleshooting and configuring pipelines for specific cases.

Token Request Details:

SingleStore implements client_credentials grant type for OAUTHBEARER token requests. Ensure the OAuth client is created with support for the client_credentials grant type.
By default, SingleStore treats Oauthbearer tokens as opaque to address privacy concerns and uses the expires_in field from the JSON token response to determine the token's validity time. If expires_in is not present or "sasl.oauthbearer.config”:"use_expires_in=false” is set in the pipeline configuration, we fall back to decoding the Oauth token JWT and use the exp claim to determine its validity.
The token refresh is scheduled at 80% of the token’s lifetime and handled by SingleStore in the background.
For https Oauth token requests, by default, SingleStore will use the CA bundle specified at pipeline creation or try to find one of the system paths if none are specified. This behavior can be changed by including additional configuration settings in the pipeline CONFIG on creation. This change will affect token request logic only, not the total SSL Kafka communication.
- "sasl.oauthbearer.ssl.ca.location":"system" - to use the system default path;
- "sasl.oauthbearer.ssl.ca.location":"/usr/lib/ssl/certs/ca-certificates.crt" - to use a specific CA path location;
- "sasl.oauthbearer.ssl.ca.location":"" (empty) to disable SSL verification;

Token Cache Details:

The OAUTHBEARER tokens are cached in the extractor pools for some time (along with the extractors) and the pipeline cookies (per pipeline). The Oauth server must allow multiple simultaneous active tokens due to the several nodes or extractor pools used in the SingleStore architecture.
"sasl.oauthbearer.cookie.cache":"on" - by default, tokens are cached in the pipeline cookies to minimize the number of token requests. To disable this feature set "sasl.oauthbearer.cookie.cache":"off". To reset the pipeline cookies, use alter pipeline … set offset cursor ''.
The pipeline may fail because the OAUTHBEARER client config on the identity server is changed or the pipeline config is altered in a way which causes previously requested cached tokens to become invalid. This issue may be avoided by resetting the respective pipeline cache.
- alter pipeline … set offset cursor '' - to reset cookies cache;
- flush extractor pools - to reset extractor pools;

Kafka Version Setting

Warning

Using SSL and SASL with Kafka requires Kafka protocol version 0.9 or later; therefore, CREATE PIPELINE ... KAFKA ... and SELECT ... INTO KAFKA ... statements using SSL and SASL with Kafka also need to adhere to that version requirement. The Kafka protocol version can be passed in through JSON through the CONFIG clause, similar to this CONFIG '{"kafka_version":"0.10.0.0"}'. Alternatively, the pipelines_kafka_version engine variable controls this parameter for any pipeline without using a Kafka version configuration value in a CREATE PIPELINE ... KAFKA ... statement.

Final Step: Use the Connection String in a SQL Statement

Create your CREATE PIPELINE ... KAFKA ... or SELECT ... INTO KAFKA ... statement, using the string containing the connection settings that you created in the previous steps.

Examples

The following examples make the following assumptions:

Port 9092 is a plaintext endpoint
Port 9093 is an SSL endpoint
Port 9094 is a plaintext SASL endpoint
Port 9095 is an SSL SASL endpoint

Note

The examples use CREATE PIPELINE, but the CONFIG and CREDENTIALS clauses shown can be used with SELECT ... INTO ... KAFKA also. SASL OAUTHBEARER is not supported with SELECT ... INTO ... KAFKA.

Plaintext

The following CREATE PIPELINE statements are equivalent:

SQL

CREATE PIPELINE `kafka_plaintext`
AS LOAD DATA KAFKA 'host.example.com:9092/test'
CONFIG '{"security.protocol": "plaintext"}'
INTO table t;

SQL

CREATE PIPELINE `kafka_no_creds`
AS LOAD DATA KAFKA 'host.example.com:9092/test'
INTO table t;

SSL

SQL

CREATE PIPELINE `kafka_ssl`
AS LOAD DATA KAFKA 'host.example.com:9093/test'
CONFIG '{"security.protocol": "ssl",
"ssl.certificate.location": "/var/private/ssl/client_memsql_client.pem",
"ssl.key.location": "/var/private/ssl/client_memsql_client.key",
"ssl.ca.location": "/var/private/ssl/ca-cert.pem"}'
CREDENTIALS '{"ssl.key.password": "abcdefgh"}'
INTO table t;

Kerberos with no SSL

SQL

CREATE PIPELINE `kafka_kerberos_no_ssl`
AS LOAD DATA KAFKA 'host.example.com:9094/test'
CONFIG '{""security.protocol": "sasl_plaintext",
"sasl.mechanism": "GSSAPI",
"sasl.kerberos.service.name": "kafka",
"sasl.kerberos.principal": "memsql/host.example.com@EXAMPLE.COM",
"sasl.kerberos.keytab": "/etc/krb5.keytab"}'
INTO table t

Kerberos with SSL

SQL

CREATE PIPELINE `kafka_kerberos_ssl`
AS LOAD DATA KAFKA 'host.example.com:9095/test'
CONFIG '{"security.protocol": "sasl_ssl",
"sasl.mechanism": "GSSAPI",
"ssl.certificate.location": "/var/private/ssl/client_memsql_client.pem",
"ssl.key.location": "/var/private/ssl/client_memsql_client.key",
"ssl.ca.location": "/var/private/ssl/ca-cert.pem",
"sasl.kerberos.service.name": "kafka",
"sasl.kerberos.principal": "memsql/host.example.com@EXAMPLE.COM",
"sasl.kerberos.keytab": "/etc/krb5.keytab"}'
CREDENTIALS '{"ssl.key.password": "abcdefgh"}'
INTO table t

SASL/PLAIN with SSL

SQL

CREATE PIPELINE `kafka_sasl_ssl_plain`
AS LOAD DATA KAFKA 'host.example.com:9095/test'
CONFIG '{"security.protocol": "sasl_ssl",
"sasl.mechanism": "PLAIN",
"ssl.certificate.location": "/var/private/ssl/client_memsql_client.pem",
"ssl.key.location": "/var/private/ssl/client_memsql_client.key",
"ssl.ca.location": "/var/private/ssl/ca-cert.pem",
"sasl.username": "kafka"}'
CREDENTIALS '{"ssl.key.password": "abcdefgh", "sasl.password": "metamorphosis"}'
INTO table t;

SASL/PLAIN without SSL

SQL

CREATE PIPELINE `kafka_sasl_plaintext_plain`
AS LOAD DATA KAFKA 'host.example.com:9094/test'
CONFIG '{"security.protocol": "sasl_plaintext",
"sasl.mechanism": "PLAIN",
"sasl.username": "kafka"}'
CREDENTIALS '{"sasl.password": "metamorphosis"}'
INTO table t;

SASL/SCRAM with SSL

SQL

CREATE PIPELINE `kafka_sasl_ssl_scram`
AS LOAD DATA KAFKA 'host.example.com:9095/test'
CONFIG '{"security.protocol": "sasl_ssl",
"sasl.mechanism": "SCRAM-SHA-512",
"ssl.certificate.location": "/var/private/ssl/client_memsql_client.pem",
"ssl.key.location": "/var/private/ssl/client_memsql_client.key",
"ssl.ca.location": "/var/private/ssl/ca-cert.pem",
"sasl.username": "kafka"}'
CREDENTIALS '{"ssl.key.password": "abcdefgh", "sasl.password": "metamorphosis"}'
INTO table t;

SASL/SCRAM without SSL

SQL

CREATE PIPELINE `kafka_sasl_plaintext_plain`
AS LOAD DATA KAFKA 'host.example.com:9094/test'
CONFIG '{"security.protocol": "sasl_plaintext",
"sasl.mechanism": "SCRAM-SHA-512",
"sasl.username": "kafka"}'
CREDENTIALS '{"sasl.password": "metamorphosis"}'
INTO table t;

Configure SASL OAUTHBEARER for Use with Kafka Pipelines

Note

SASL OAUTHBEARER is not supported with SELECT INTO ...

SQL

CREATE or REPLACE PIPELINE <pipeline_name>   
AS LOAD DATA KAFKA "<Kafka cluster location>"  
CONFIG '{"security.protocol":"SASL_SSL",  
         "sasl.mechanism":"OAUTHBEARER",
         "sasl.oauthbearer.client.id":"<CLIENT_ID>",    
         "sasl.oauthbearer.client.secret":<"CLIENT_SECRET>",
         "sasl.oauthbearer.token.endpoint.url":"<ENDPOINT_URL>",
         "sasl.oauthbearer.scope":"<SCOPE>"}'  
INTO TABLE <table_name>;

Connecting to Confluent Schema Registry over SSL

To enable Pipelines to connect to Confluent Schema Registry over SSL, install the registry’s certificate (ca-cert) on all nodes in your SingleStore cluster. For example:

Shell

cat ca-cert >> /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
update-ca-trust

Note

You only need to install the registry’s certificate (ca-cert) in your SingleStore cluster if you are using a self-signed certificate or a certificate signed by an internal CA.

After installing the registry’s certificate on all nodes, existing Pipelines that you already created (or new Pipelines that you create) with the host name/IP address and port of Confluent Schema Registry will use SSL automatically.

Schema Registry Configuration Options

Supported configuration options include:

schema.registry.ssl.ca.location
schema.registry.ssl.ca.directory
schema.registry.ssl.verifypeer
schema.registry.ssl.certificate.location
schema.registry.ssl.key.location
schema.registry.ssl.key.password
schema.registry.curl.timeout
schema.registry.curl.tcp.keepalive
schema.registry.curl.tcp.keepidle
schema.registry.curl.keepintvl
schema.registry.curl.connecttimeout
schema.registry.username
schema.registry.password
schema.registry.curl.verbose

Load Data using the SingleStore Kafka Sink Connector

The SingleStore Kafka Sink Connector is a Kafka Connect connector that enables you to ingest AVRO, JSON, and CSV messages from Kafka topics into SingleStore. It is a Sink (target) connector designed to read data from Kafka topics and write the data to SingleStore databases.

Refer to Load Data using the SingleStore Kafka Sink Connector for more information.

Load Data from Kafka

On this page

Securely Connect to Kafka from SingleStore

Overview

Convert Java Keystore (JKS) to Privacy Enhanced Mail (PEM) Key

Steps for Creating a Secure Connection

Copy Security Files to the SingleStoreCluster

Setup Your SingleStoreCluster for Kerberos Authentication

Build a String with the Connection Settings

SSL Connection Settings

SASL Connection Settings

Kerberos Connection Settings

Configuring OAUTHBEARER Authentication Mechanism

Kafka Version Setting

Final Step: Use the Connection String in a SQL Statement

Examples

Plaintext

SSL

Kerberos with no SSL

Kerberos with SSL

SASL/PLAIN with SSL

SASL/PLAIN without SSL

SASL/SCRAM with SSL

SASL/SCRAM without SSL

Configure SASL OAUTHBEARER for Use with Kafka Pipelines

Connecting to Confluent Schema Registry over SSL

Schema Registry Configuration Options

Load Data using the SingleStore Kafka Sink Connector

In this section

Was this article helpful?

On this page

Was this article helpful?