CREATE EXTERNAL FUNCTION
On this page
Creates or replaces an external function, which is a function that calls code that is executed outside of a SingleStore database.
A CREATE EXTERNAL FUNCTION
statement does not specify the function body.
This topic uses the following terms:
-
user-defined function (UDF): A function that returns a scalar value.
-
table-valued function (TVF): A function that returns a
TABLE
value (a rowset).
An external function may be either a UDF or TVF.
Syntax
CREATE [OR REPLACE] EXTERNAL FUNCTION [<database_name>.]function_name ([<parameter_list>])
RETURNS <return_type>
[DEFINER = CURRENT_USER]
AS [COLLOCATED|REMOTE SERVICE] "<service_endpoint>"
FORMAT <data_format> [LINK [database_name.]<connection_link_name>];
parameter_list: <variable_name> <data_type> [<data_type_modifier> [...n]] [, ...]
data_type_modifier:
DEFAULT <default_value> | NOT NULL | NULL | COLLATE collation_name
AS [COLLOCATED|REMOTE SERVICE] "<service_endpoint>"
FORMAT <data_format> [USING DELIMITED BATCHES] [LINK [database_name.]<connection_link_name>];
parameter_list:
<variable_name> <data_type> [<data_type_modifier> [...n]] [, ...]
data_type_modifier:
DEFAULT <default_value> | NOT NULL | NULL | COLLATE collation_name
Return Type
-
return_
: A scalar value or atype TABLE
value.If a TABLE
value, includes a list of<column name> <data_
values such astype> TABLE(a TEXT, b INT)
.
Arguments
-
function_
:name The name of the function.
Function names must be unique within a database, and cannot be duplicates of names for other tables, views, UDFs, TVFs, user-defined aggregate functions (UDAFs), or stored procedures. You can also specify
<database_
andname> <function_
together by replacingname> <function_
withname> <database_
instead of defaulting to the current context database.name>. <function_ name> For example, you can write the following: CREATE EXTERNAL FUNCTION db.some_func(a int)... -
data_
: One of the following:format JSON
orROWDAT_
.1 For more information, refer to Data Formats. -
connection_
: The name of a connection link, which is a secure link that stores connection details (credentials and configurations) for an HTTP connection.link_ name Refer to Configuring and Using Connection Links for more details. -
DEFINER = current_
: (Optional) When specified, a special flag is stored in the metadata table which indicates if a different user is required to run the function.user
Remarks
Note
SingleStore does not start, stop, or otherwise maintain the service that hosts the external function.
-
A call to an external UDF can be made anywhere in a SQL statement where an expression is allowed.
For example: -
SELECT example_external_udf(100,'foo');
-
SELECT * FROM tWHERE example_external_udf(100,'foo') = 'bar';
-
SELECT a, COUNT(*) FROM t GROUP BY a HAVING a > example_external_udf(100,'foo');
-
-
A call to an external TVF can be made anywhere in a SQL statement where a table reference is allowed, including joining to a table.
For example: -
SELECT * FROM example_external_tvf(100,'foo');
-
SELECT * FROM t, example_external_tvf(100,'foo') AS e WHERE t.id = e.id;
-
Enable External Functions
External functions must be enabled before they can be used.enable_
engine variable to either of the following as required:
Value |
Description |
---|---|
|
Enables the use of all external functions. |
|
Enables the use of all the external functions in a specified list. For example, to enable the use of any external function located at
|
By default, enable_
is set to OFF
.
Batching of Data Sent from SingleStore to an External Function
A batch contains multiple rows of data (as defined in the external function declaration).row_
is added to each row of data in the batch.
Batching of data sent from SingleStore to an external TVF
SingleStore always sends data in batches to an external TVF.
Batching of data sent from SingleStore to an external UDF
External UDFs are rewritten by SingleStore to external TVFs to allow external UDFs to be pushed down to a table scan to support batch processing of inputs.
However, not all external UDFs can be rewritten as TVFs.WHERE
clause of a SELECT
can be rewritten as external TVFs.SELECT
statements as input.
The value of the engine variable batch_
controls the rewrites of external UDFs to external TVFs for batching.
-
ALWAYS
: Any query, which contains an external UDF that SingleStore cannot convert to an external TVF, will not be executed.External UDFs used in all other types of queries will run and will be rewritten to TVFs. -
AUTO
: SingleStore rewrites all external UDFs that it can to external TVFs for batching.Any external UDFs that cannot be rewritten will be called one row at a time. -
NEVER
: All external UDFs will be called without batching (i.e. , one row at a time).
Send Delimited Batch Sets
Specify the USING DELIMITED BATCHES
clause to keep the connection open after all the batches have been processed by SingleStore and accept a final batch from the external function.
CREATE EXTERNAL FUNCTION extFunc(a TEXT)RETURNS TEXT AS COLLOCATED SERVICE '/xfunc_pipe'FORMAT ROWDAT_1 USING DELIMITED BATCHES;
By default, once SingleStore processes all the rows in the batch, it sends a final batch of zero size and closes the connection to the external function.
Implement Delimited Batch Sets in External Functions
When USING DELIMITED BATCHES
is specified in CREATE EXTERNAL FUNCTION
statements, SingleStore appends the batchdelim=true
parameter to the function name in the URI, which indicates that delimited batch sets are enabled for this function.
If delimited batch sets are enabled for an external function, SingleStore waits for a response after sending the final zero-size batch.
-
If there are no additional rows to send, the external function must respond with a HTTP status message and a response size of 0.
-
If additional rows exist, the
row_
of the rows sent in the response must match theid row_
of the original rows in the target SingleStore table.id SingleStore matches the rows in the last batch with the corresponding rows in the table using the row_
.id
Benefits of Batching
-
Batching reduces network and CPU utilization.
-
Batching does increase memory utilization.
However, the benefits of the decrease in network and CPU utilization far outweigh the increase in memory utilization. -
A global variable which assists with batching is
external_
.functions_ batch_ size It controls the batch size, in rows, for calls to the external process that services external functions. It has a default size of 512 rows. Most applications should simply use the default, but if the overhead of calling the external process becomes significant, you may wish to increase the batch size.
Data Formats
External functions support two data formats, JSON
and ROWDAT_
.CREATE [OR REPLACE] EXTERNAL FUNCTION
statement.
JSON
If batching is used, the data format is serialized as a JSON array of arrays as follows:
{"data":
[[<row id>, <argument 1>, <argument 2>, …],
[<row id>, <argument 1>, <argument 2>, …],
...
]}
If batching is not used, the data format is serialized as a JSON array of one array.
{"data":
[[<row id>, <argument 1>, <argument 2>, …]]
}
-
<row id>
: A row id, which is an integer greater than or equal to0
.The row id is generated for the rows sent as input to the external function. The output rows are listed in ascending order. There can be repeating row ids. -
<argument n>
: The serialization format supports three types:-
integer
-
double
-
variable-length string
All other types are serialized as strings.
-
Following is a description of the fields, in order, of the JSON sub-array:
Note
Because a single row or multiple rows (a batch) may be sent as input to an external function, the function should be written to handle both cases.
ROWDAT_ 1
ROWDAT_
is a SingleStore binary format.
Following is a description of the fields, in order, in each row:
-
A row id, which is an integer greater than or equal to
0
.The row id in a given row must be larger than the row id in the previous row. Multiple rows may contain the same row id. -
The remaining fields contain the arguments to the external function.
The serialization format supports three types: -
64-bit signed integer
-
64-bit double (IEEE 754)
-
length-encoded variable-length string
The serialization format is little-endian.
A single byte denoting whether a value is NULL
(1:NULL
, 0: not-NULL
) precedes all type values (except the row id which cannot beNULL
).All other types are serialized as string.
-
Note
Because a single row or multiple rows (a batch) may be sent as input to an external function, the function should be written to handle both cases.
CREATE [OR REPLACE] EXTERNAL FUNCTION . . . AS REMOTE
Use the AS REMOTE
option to create an external function, which is a function that is accessible through an HTTP service.
HTTP request
Request method
A standard HTTP POST request is used.
Standard HTTP request headers
The following standard HTTP headers are used:
-
accept/content-type
:application/json
orapplication/octet-stream
-
user-agent
:SingleStore/<S2 version>
Custom HTTP request headers
-
s2-ef-version
: <data format version> -
s2-ef-name
: <name of the external function>
The following custom HTTP headers are used:
HTTP response
When a remote function is successful, it returns a HTTP response with status code 200
.200
is treated as error, and the response body contains the error message.
SingleStore Transaction Semantics Are Not Enforced Inside of External Functions
Caution
Any update to state made by an external function is not part of the calling SingleStore transaction.
For example, suppose an application runs a SQL query which calls an external function.v
held in an external process by 1
, from 100
to 101
, and returns v
.ROLLBACK
.v
in the external process remains at 101
; it is not rolled back to 100
.
Using HTTP Connection Links
An HTTP connection link can be used when creating an external function.
HTTP connections have the following limitations:
-
Only external functions can use them.
-
They cannot contain the following header keys:
s2-ef-.
,. . Accept
,Connection
,Content-Length
,Content-Type
,Upgrade
,User-Agent
.
Example: Create a HTTP Link Using the CREDENTIALS and CONFIG Clauses
The following example shows how to create an HTTP link, which you can use in a CREATE [OR REPLACE] EXTERNAL FUNCTION
statement.
CREATE LINK HTTP_link AS HTTPCREDENTIALS '{"headers":{"authorization":"Bearer <token>"}}'CONFIG '{"headers":{"version":"2.3"}}';
Example: Create an HTTP Link Using Only the CREDENTIALS Clause
The following example shows how to create an HTTP link, which you can use in a CREATE [OR REPLACE] EXTERNAL FUNCTION
statement.
CREATE LINK HTTP_link AS HTTPCREDENTIALS '{"headers":{"authorization":"Bearer <token>", "version":"2.3"}}';
Encrypting HTTP Connection Links
To create an encrypted HTTP connection (encrypts the traffic between the database and the remote endpoint), set the following security-specific CONFIG/CREDENTIALS JSON:
{
"security": {
"security.protocol": "ssl",
"ssl.key.location": "/path/to/key.pem",
"ssl.key.password": "<password_for_private_key>",
"ssl.certificate.location": "/path/to/cert.pem"
"ssl.ca.location": "/path/to/ca.pem"
}
}
Where:
-
security.
: must be "ssl".protocol No other protocols are supported/implemented. -
ssl.
: required.key. location Specifies the path to the private key that will be used during SSL negotiation with a remote server. -
ssl.
: optional.key. password Used if ssl.
points to a password-protected private key.key. location -
ssl.
: required.certificate. location Specifies the path to the certificate with the signature of the public part of the key during the SSL handshake. -
ssl.
: optional.ca. location If specified, the SingleStore Helios engine will verify the peer’s certificate against the CA bundle loaded from the provided path.
You can omit the headers
object in the JSON when configuring encrypted connections.
Note
When configuring encrypted HTTP connections with your SingleStore Heliosworkspaces you can use the default keys and certificates, or you can upload your own.
If the remote endpoint needs to uniquely identify each workspace/entity that tries to connect, upload your own certificates.
If the remote endpoint is just checking whether a connecting workspace is a SingleStore Heliosworkspace (no matter what user) use the default settings.
If the remote endpoint does not care about the client's origin, use the simplified configuration, with CA only (as shown in "l_
-- Default Values
{
"security": {
"security.protocol": "ssl",
"ssl.key.location": "/etc/memsql/ssl/server-key.pem",
"ssl.key.password": "<password_for_private_key>",
"ssl.certificate.location": "/etc/memsql/ssl/server-cert.pem"
"ssl.ca.location": "/etc/memsql/extra/<.crt file>"
}
}
-- Locations for certificates manually uploaded via the Cloud Portal
{
"security": {
"security.protocol": "ssl",
"ssl.key.location": "/etc/memsql/extra/<.crt file>",
"ssl.key.password": "<password_for_private_key>",
"ssl.certificate.location": "/etc/memsql/extra/<.crt file>"
"ssl.ca.location": "/etc/memsql/extra/<.crt file>"
}
}
See the SingleStore Heliosexamples.
Example: Encrypt Traffic Between Database and Remote Endpoint
CREATE LINK link_1 AS HTTPCREDENTIALS '{"security":{"ssl.key.location":"<path_to_private_key>/server-key.pem"}}'CONFIG '{"security":{"security.protocol":"ssl","ssl.certificate.location":"<path_to_pem>/server-cert.pem","ssl.ca.location":"/<path_to_ca_bundle>/ca-cert.pem"}}';CREATE OR REPLACE EXTERNAL FUNCTION 'ef1' (input INT) RETURNS TEXT AS REMOTE SERVICE'https://localhost:8123/func1' FORMAT JSON LINK link_1;SELECT ef1(42);
Example: Create a Basic Encrypted HTTP Link Using Only the CONFIG Clause
The following example shows how to create a basic encrypted HTTP link using only the CA bundle path in the security
settings in the CONFIG clause."security.
in this case.
CREATE LINK link_2 AS HTTPCREDENTIALS ''CONFIG '{"security":{"ssl.ca.location":"<path_to_pe>/ca-cert.pem"}}';
Examples: SingleStore Helios
For the following examples, assume there are 4 HTTP servers running at "example.
8881: Server uses a private key and a legitimate certificate from a well-known CA for the example.
8882: Server uses a private key and a legitimate certificate from a well-known CA for the example.singlestore_
-- which SingleStore publicly distributes and is already included in SingleStore Helios.
8883: Server is running in full private mutual auth mode.
8884: Server uses a private key and legitimate certificate from a well-known CA for the example.
The links:
l_/etc/memsql/extra/ca-93057.
.
CREATE LINK l_public AS HTTPcredentials ''config '{"security":{"ssl.ca.location":"/etc/memsql/extra/ca-93057.crt"}}';CREATE OR REPLACE EXTERNAL FUNCTION 'ef_public'(input int) RETURNS text AS REMOTE SERVICE'https://example.com:8881/func1' FORMAT JSON LINK l_public;
l_
CREATE LINK l_public_s2 AS HTTPcredentials '{"security":{"ssl.key.location":"/etc/memsql/ssl/server-key.pem"}}'config '{"security":{"security.protocol":"ssl","ssl.certificate.location":"/etc/memsql/ssl/server-cert.pem","ssl.ca.location":"/etc/memsql/extra/ca-93057.crt"}}';CREATE OR REPLACE EXTERNAL FUNCTION 'ef_public_s2'(input int) RETURNS text AS REMOTE SERVICE'https://example.com:8882/func1' FORMAT JSON LINK l_public_s2;
l_/etc/memsql/extra/ca-12c5d.
.
-----BEGIN CERTIFICATE-----
MIIDODCCAiACAQEwDQYJKoZIhvcNAQELBQAwYDELMAkGA1UEBhMCVVMxCzAJBgN
[...]
vp9IHlbbNgAk4DpWvP8zYlAraVAClS0/sWAwhwkyU4OkB3VB8ti6KPBlc0V8ZoO
-----END CERTIFICATE-----
-----BEGIN RSA PRIVATE KEY-----
Proc-Type: 4,ENCRYPTED
DEK-Info: AES-128-CBC,3EB18A38696F0637E720801EA2A1FAEF
azQ5ApyXYoL4SsUFjPJEld4qKYl4p8rVKyiZ7RHihhmtct6jgFOD8h6R3GvifdD
[...]
peT4q+YCrys36Z7ZH/aYJbGzQ305XYvstlL5BK5pKVxTxRucgIGRMa56gbMGdCK
-----END RSA PRIVATE KEY-----
Warning
Please note that the private key is protected (encrypted) with a passphrase in the example above.
The certificate is signed with a private "CA for database".ssl.
and ssl.
./etc/memsql/extra/ca-4860e.
and used as ssl.
.
CREATE LINK l_private AS HTTPcredentials '{"security":{"ssl.key.location":"/etc/memsql/extra/ca-12c5d.crt","ssl.key.password":"muchsecret"}}'config '{"security":{"security.protocol":"ssl","ssl.certificate.location":"/etc/memsql/extra/ca-12c5d.crt","ssl.ca.location":"/etc/memsql/extra/ca-4860e.crt"}}';CREATE OR REPLACE EXTERNAL FUNCTION 'ef_private'(input int) RETURNS text AS REMOTE SERVICE'https://example.com:8883/func1' FORMAT JSON LINK l_private;
l_
CREATE LINK l_private_le AS HTTP
credentials '{"security":{
"ssl.key.location":"/etc/memsql/extra/ca-12c5d.crt",
"ssl.key.password":"muchsecret"}
}'
config '{"security":{
"security.protocol":"ssl",
"ssl.certificate.location":"/etc/memsql/extra/ca-12c5d.crt",
"ssl.ca.location":"/etc/memsql/extra/ca-93057.crt"}
}';
CREATE OR REPLACE EXTERNAL FUNCTION 'ef_private_le'(input int) RETURNS text AS REMOTE SERVICE
'https://example.com:8884/func1' FORMAT JSON LINK l_private_le;
Examples
Example: An External UDF
Suppose you are writing an external function that implements an elementary grammar checker.
Create the paragraphs
table which stores the paragraphs to be checked and whether to check the digits in each paragraph.
CREATE TABLE paragraphs(paragraph TEXT, check_digits BOOL);
The table contains the following data.
+---------------------------------------------------------------+----------------+
| paragraph | check_digits |
+---------------------------------------------------------------+----------------+
| this paragraph has 1 sentence | 1 |
| this paragraph has 2 sentences. this is the second sentence | 0 |
| This paragraph has one sentence and doesn't need corrections. | 1 |
+-------------------------------------------------------------+------------------+
Create the definition for an external function that checks a paragraph for grammar mistakes.1
is sent as input to the check_
parameter, then digits are included as part of the grammar check.
CREATE EXTERNAL FUNCTION check_paragraph(paragraph TEXT, check_digits BOOL)RETURNS TEXTAS REMOTE SERVICE '<endpoint_URL>'FORMAT JSON;
Implement the body of the external function and store the implementation on the HTTP server.convert_
, capitalize_
and add_
is not shown.
import jsondef lambda_handler(event, context):request = json.loads(event['body'])response = []for row_id, paragraph, check_digits in request["data"]:if check_digits:paragraph = convert_numbers_to_words(paragraph)paragraph = capitalize_sentences(paragraph)paragraph = add_missing_periods_to_sentences(paragraph)response.append([row_id, paragraph])return { 'statusCode': 200, 'body': json.dumps({"data": response}) }
Query the paragraphs
table and call the external function as part of the query.
SELECT check_paragraph(paragraph, check_digits) as corrected_paragraph,check_digits FROM paragraphs;
The external UDF is rewritten to an external TVF; the query calls the external function only once with the following input:
{"data":
[[0, "this paragraph has 1 sentence", 1],
[1, "this paragraph has 2 sentences. this is the second sentence", 0],
[2, "This paragraph has one sentence and doesn't need corrections.", 1]]
}
After processing the input data, the external function returns the following data to the calling query.
{"data":
[[0, "This paragraph has one sentence."],
[1, "This paragraph has 2 sentences. This is the second sentence."],
[2, "This paragraph has one sentence and doesn't need corrections."]]
}
The query returns the following output.
+----------------------------------------------------------------+--------------+
| corrected_paragraph | check_digits |
+----------------------------------------------------------------+--------------+
| This paragraph has one sentence. | 1 |
| This paragraph has 2 sentences. This is the second sentence. | 0 |
| This paragraph has one sentence and doesn't need corrections. | 1 |
+----------------------------------------------------------------+--------------+
Example: External UDF using DEFINER
CREATE EXTERNAL FUNCTION ext_func()RETURNS TEXT DEFINER ='root'@'%'AS REMOTE SERVICE '...' FORMAT JSON;
To view a list of the functions and see the definer used:
SHOW FUNCTIONS;
+------------------------+------------------+----------+-------------+-----------------+------+
| Functions_in_func_proc | Function Type | Definer | Data Format | Runtime Type | Link |
+------------------------+------------------+----------+-------------+-----------------+------+
| ext_func | External User | root@% | JSON | Remote Service | |
| | Defined Function | | | | |
+------------------------+------------------+----------+-------------+-----------------+------+
Example: An External TVF
Suppose you want to read the contents of an external file and store each line in a row in a table.
Create the definition for the external function, which gets the contents of a file that is passed as input.
CREATE EXTERNAL FUNCTION get_file_lines(filename TEXT)RETURNS TABLE(line TEXT)AS REMOTE SERVICE '<endpoint_URL>'FORMAT JSON;
Implement the body of the external function and store the implementation on the HTTP server.
Query the external function:
SELECT line FROM get_file_lines("simple_file.txt");
When the external function runs, it returns a JSON array of arrays, which contains the data in simple_
:
{"data":
[[0, "This is the first sentence."],
[0, "This is the second sentence."],
[0, "This is the third sentence."]]
}
When the query that calls the external function completes, the query returns the following output.
+------------------------------+| line |+------------------------------+| This is the first sentence. || This is the second sentence. || This is the third sentence. |+------------------------------+
Implementing High-Performance Parallel External Functions
Fast, large-batch processing can be accomplished, when in an on-premises configuration, by implementing high-performance parallel external functions.
By using this approach, rows are batched when sent to a UDF, cross-process calls are not done for every row, and data is not sent across the network.
Example: Using Remote Service
The default service type is REMOTE and the endpoint is expected to be the URL of a remote HTTP service.
CREATE OR REPLACE EXTERNAL FUNCTION uppercase_string(string_col TEXT)RETURNS TEXTAS REMOTE SERVICE'https://fnc.execute-api.us-east-1.amazonaws.com/prod/upcase_string'FORMAT JSON;
Example: Using localhost as Remote Service
CREATE EXTERNAL FUNCTION ext_udf2(col_a INT, col_b INT)RETURNS TABLE(col_1 TEXT, col_2 BOOL)AS REMOTE SERVICE 'http://localhost:3305/cool_table_func'FORMAT JSON;
Using Collocated Service
When the COLLOCATED option is specified, commands are sent via Unix domain socket while input/output is sent via shared memory files.
For each collocated external function, the engine will first connect to the Unix domain socket listened to by the collocated service.memfd_
, both sealed with F_
so the files can only grow in size but not shrink and thus is safe to mmap
.
For each input batch, the input and output will be written to their corresponding shared files using either ROWDAT_
Example:
CREATE EXTERNAL FUNCTION lowercase_string(string_col TEXT)
RETURNS TEXT
AS COLLOCATED SERVICE '/usr/bin/fd'
FORMAT ROWDAT_1;
Last modified: April 15, 2025