Configuration Options for Different Sources

Kafka Configurations

The following table shows the SingleStore-specific configurations for a Kafka environment.

Parameter	Description
`spoof.dns`	Used while connecting to Kafka via a proxy, for example, when connecting across multiple cloud services. Use `spoof.dns` to re-route the connections to the proxy without modifying the Kafka broker configuration.
`operation.timeout.ms`	Specifies a timeout for operations such as metadata requests and message consumption/production. This value can be adjusted based on the size of the consumed/produced dataset. Default: 10 seconds `CONFIG '{"operation.timeout.ms" : "10000"}'`
`sasl.kerberos.cache`	Used with Kerberos authentication to specify where to cache Kerberos tickets. When this value is not specified, the cache location is the value of `sasl.tmpdir` followed by `/pipeline_digest`. The default value of `sasl.tmpdir` is `/tmp`. `CONFIG '{"sasl.kerberos.cache" : "/tmp"}'`
`sasl.kerberos.disable.kinit`	Use this parameter if the client does not support using `kinit` to refresh tickets with SingleStore. Running `kinit` is not required if a background process keeps the Kerberos ticket cache up to date. `CONFIG '{"sasl.kerberos.disable.kinit" : true}'`

The CONFIG clause of a Kafka pipeline can accept a spoof.dns element as an alternative to configuring Kafka brokers. The spoof.dns element must be a JSON object consisting of an arbitrary number of key-value pairs with URL string values. When the pipeline attempts to connect to a Kafka broker whose URL matches one of the keys, the pipeline will connect to the corresponding URL value, effectively remapping the broker URLs inside the pipeline Kafka client.

The following CREATE PIPELINE command sets the AWS PrivateLink configuration for Kafka brokers with AWS MSK.

SQL

CREATE PIPELINE <pipeline_name> AS LOAD DATA KAFKA '<Kafka bootstrap server endpoint>:<port>/<topic name>'
CONFIG '{
  "spoof.dns": {
    "<broker 1 endpoint>:<port>":"<SingleStore shared endpoint (outbound)>:<NLB listener port for broker 1>",
    "<broker 2 endpoint>:<port>":"<SingleStore shared endpoint (outbound)>:<NLB listener port for broker 2>",
    "<broker 3 endpoint>:<port>":"<SingleStore shared endpoint (outbound)>:<NLB listener port for broker 3>"
  }
}'
INTO TABLE <table_name>;
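
As a hedged illustration of the template above, the sketch below fills the placeholders with made-up values. The pipeline name, topic, table name, broker host names, proxy host name, and all ports are hypothetical and only show the shape of the mapping: each key is a broker address as the Kafka client would see it, and each value is the address the pipeline connects to instead.

```sql
-- Hypothetical values throughout; replace them with your own endpoints and ports.
CREATE PIPELINE msk_events AS LOAD DATA KAFKA 'b-1.msk.example.com:9092/events'
CONFIG '{
  "spoof.dns": {
    "b-1.msk.example.com:9092":"privatelink.example.com:6001",
    "b-2.msk.example.com:9092":"privatelink.example.com:6002",
    "b-3.msk.example.com:9092":"privatelink.example.com:6003"
  }
}'
INTO TABLE events;
```

The last entry of the `spoof.dns` object has no trailing comma, because the CONFIG value must be valid JSON.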

Kafka pipelines support additional configuration options. Consult the CONFIGURATION.md file in the librdkafka project on GitHub to see the full list.

Note

Some of the configuration options are not supported in SingleStore. The client will receive a "Forbidden Key" error when accessing unsupported configuration options.

The configuration below controls several aspects of the consumer's behavior (e.g., timeouts, fetching behavior, and message handling). These parameters can be adjusted to optimize the performance and reliability of the Kafka consumer based on your environment and requirements.

SQL

CREATE PIPELINE p AS LOAD DATA kafka 'host.example.com:9092/whatever'
  CONFIG '{"fetch.max.bytes": "52428800", "topic.metadata.refresh.interval.ms": "300000", "message.max.bytes": "1000000",
           "fetch.wait.max.ms": "500", "session.timeout.ms": "45000", "topic.metadata.refresh.fast.interval.ms": "100",
           "fetch.min.bytes": "1", "max.partition.fetch.bytes": "1048576", "fetch.message.max.bytes": "1048576",
           "socket.keepalive.enable": "true", "fetch.error.backoff.ms": "500", "socket.timeout.ms": "60000"}'
  INTO TABLE t format CSV;

The following configuration sets several of the communication options that are used with Kafka brokers (e.g., timeouts, batching behavior, and resource usage). Choose values for these parameters based on your application requirements and your specific Kafka deployment environment.

SQL

CREATE PIPELINE p AS LOAD DATA kafka 'host.example.com:9092/whatever2'
  CONFIG '{"connections.max.idle.ms": "230000", "client.id": "<client_id>", "fetch.max.bytes": "1000000",
  "operation.timeout.ms": "30000", "batch.num.messages": "1000", "socket.keepalive.enable": "false",
  "socket.timeout.ms": "60000"}'
INTO TABLE t format CSV;

S3 Configurations

The following table shows the SingleStore-specific configurations for S3.

Parameter	Description
`disable_gunzip`	When this parameter is set to `true`, files with the `.gz` extension are not decompressed. When this parameter is disabled or missing, files with the `.gz` extension are decompressed. `CONFIG '{"disable_gunzip" : true}'`
`request_payer`	Specifies who is responsible for paying for the data transfer and request costs associated with accessing an S3 bucket. By default, the owner of an S3 bucket is responsible for paying these costs. However, when the `request_payer` parameter is used, the requester is responsible for covering the costs associated with the request. This can include costs such as `GET`, `PUT`, and `LIST` requests, as well as data transfer charges. `CONFIG '{"request_payer" : "requester"}'`
`endpoint_url`	Specifies the URL of the S3-compatible storage provider. This parameter can be used to direct requests to a non-standard endpoint, such as an S3-compatible service other than AWS. Examples include MinIO, which is an S3-compatible storage provider, or a private cloud object store that exposes an S3-like interface. `CONFIG '{"endpoint_url" : "sample_url"}'`
`compatibility_mode`	Instructs the downloader to use S3 API calls that are better supported by third parties. `CONFIG '{"compatibility_mode" : true}'`

No CONFIG clause is required to create an S3 pipeline. This clause is used to specify things like the Amazon S3 region where the source bucket is located or an endpoint for an S3-compatible object store. If no CONFIG clause is specified, SingleStore automatically uses the us-east-1 region, also known as US Standard in the Amazon S3 console. To specify a different region, such as us-west-1, include a CONFIG clause as shown in the example below.

The CONFIG clause can also be used to specify the suffixes of the files to load. These suffixes are a JSON array of strings. When specified, CREATE PIPELINE only loads files that have the specified suffix. Suffixes in the CONFIG clause can be specified without a . before them, for example, CONFIG '{"suffixes": ["csv"]}'.

SQL

CREATE OR REPLACE PIPELINE <pipeline_name>
   AS LOAD DATA S3 'data-test-bucket'
   CONFIG '{"region": "us-east-1","request_payer": "requester", "endpoint_url": "https://storage.googleapis.com", "compatibility_mode": true}'
   CREDENTIALS '{"aws_access_key_id": "ANIAVX7U2LM9QVJMK2ZT",
                 "aws_secret_access_key": "xxxxxxxxxxxxxxxxxxxxxxx"}'
   INTO TABLE market_data
     (ts, timestamp, event_type, ticker, price, quantity, exchange, conditions);
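
The region and suffix settings described earlier in this section can be sketched together in one statement. This is a minimal sketch, not a definitive setup: the pipeline name, bucket path, and table name are hypothetical, and the credentials are placeholders.

```sql
-- Hypothetical names: s3_csv_only, example-bucket/daily, example_table.
-- Reads from a bucket in us-west-1 and loads only files whose names end in "csv".
CREATE PIPELINE s3_csv_only AS
LOAD DATA S3 'example-bucket/daily'
CONFIG '{"region": "us-west-1", "suffixes": ["csv"]}'
CREDENTIALS '{"aws_access_key_id": "<access key id>",
              "aws_secret_access_key": "<secret access key>"}'
INTO TABLE example_table
FIELDS TERMINATED BY ',';
```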

Azure Blob Configurations

The following table shows the SingleStore-specific configurations for Azure Blobs.

Parameter	Description
`disable_gunzip`	When this parameter is set to `true`, files with the `.gz` extension are not decompressed. When this parameter is disabled or missing, files with the `.gz` extension are decompressed. `CONFIG '{"disable_gunzip" : true}'`

Note that no CONFIG clause is required to create an Azure pipeline unless you need to specify the suffixes for files to load. These suffixes are a JSON array of strings. When specified, CREATE PIPELINE only loads files that have the specified suffix. Suffixes in the CONFIG clause can be specified without a . before them, for example, CONFIG '{"suffixes": ["csv"]}'.
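
A minimal sketch of an Azure pipeline that restricts loading by suffix might look like the following. The pipeline name, container path, and table name are hypothetical, and the credentials are placeholders.

```sql
-- Hypothetical names: azure_csv_only, example-container/exports, example_table.
-- Only files whose names end in "csv" are loaded.
CREATE PIPELINE azure_csv_only AS
LOAD DATA AZURE 'example-container/exports'
CONFIG '{"suffixes": ["csv"]}'
CREDENTIALS '{"account_name": "<account name>", "account_key": "<account key>"}'
INTO TABLE example_table
FIELDS TERMINATED BY ',';
```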

GCS Configurations

The following table shows the SingleStore-specific configurations for GCS.

Parameter	Description
`disable_gunzip`	When this parameter is set to `true`, files with the `.gz` extension are not decompressed. When this parameter is disabled or missing, files with the `.gz` extension are decompressed. `CONFIG '{"disable_gunzip" : true}'`
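
To show where this option sits in a complete statement, here is a hedged sketch of a GCS pipeline. The pipeline name, bucket path, and table name are hypothetical, and the credentials are placeholders.

```sql
-- Hypothetical names: gcs_no_gunzip, example-bucket/archives, example_table.
-- Files ending in .gz are read as-is instead of being decompressed.
CREATE PIPELINE gcs_no_gunzip AS
LOAD DATA GCS 'example-bucket/archives'
CONFIG '{"disable_gunzip": true}'
CREDENTIALS '{"access_id": "<access id>", "secret_key": "<secret key>"}'
INTO TABLE example_table;
```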

HDFS Configurations

The following table shows the SingleStore-specific configurations for HDFS.

Parameter	Description
`disable_partial_check`	When this parameter is set to `true`, a pipeline is created that imports Hive output files. When the pipeline runs, the extractor imports files, but does not check for additional files in the directory. `CONFIG '{"disable_partial_check" : true}'`
`disable_gunzip`	When this parameter is set to `true`, files with the `.gz` extension are not decompressed. When this parameter is disabled or missing, files with the `.gz` extension are decompressed. `CONFIG '{"disable_gunzip" : true}'`
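
The two HDFS options above can appear in the same CONFIG clause. The sketch below assumes a hypothetical NameNode host, port, directory, pipeline name, and table name.

```sql
-- Hypothetical names: hdfs_hive_output, hadoop-namenode:8020, example_table.
-- Imports Hive output files without checking for additional files,
-- and leaves .gz files compressed.
CREATE PIPELINE hdfs_hive_output AS
LOAD DATA HDFS 'hdfs://hadoop-namenode:8020/path/to/hive/output'
CONFIG '{"disable_partial_check": true, "disable_gunzip": true}'
INTO TABLE example_table
FIELDS TERMINATED BY ',';
```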

Filesystem Configurations

The following table shows the SingleStore-specific configurations for the filesystem.

Parameter	Description
`disable_gunzip`	When this parameter is set to `true`, files with the `.gz` extension are not decompressed. When this parameter is disabled or missing, files with the `.gz` extension are decompressed. `CONFIG '{"disable_gunzip" : true}'`
`process_zero_byte_files`	When this parameter is set to `true`, zero-byte files are processed. When this parameter is disabled or missing, zero-byte files are not processed. `CONFIG '{"process_zero_byte_files" : true}'`
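
As a rough sketch, a filesystem pipeline that processes zero-byte files could be written as follows. The pipeline name, directory path, and table name are hypothetical.

```sql
-- Hypothetical names: fs_with_empty_files, /data/incoming, example_table.
-- Zero-byte files in the directory are processed rather than skipped.
CREATE PIPELINE fs_with_empty_files AS
LOAD DATA FS '/data/incoming/*'
CONFIG '{"process_zero_byte_files": true}'
INTO TABLE example_table
FIELDS TERMINATED BY ',';
```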
