SingleStore DB

CREATE [OR REPLACE] EXTERNAL FUNCTION

Creates or replaces an external function, which is a function that calls code that is executed outside of a SingleStore database. A CREATE [OR REPLACE] EXTERNAL FUNCTION statement does not specify the function body.

Note

CREATE [OR REPLACE] EXTERNAL FUNCTION is currently a preview feature.

Note

This topic uses the following terms:

  • user-defined function (UDF): A function that returns a scalar value.

  • table-valued function (TVF): A function that returns a TABLE value (a rowset).

An external function may be either a UDF or TVF.

Syntax
CREATE [OR REPLACE] EXTERNAL FUNCTION [<database_name>.]function_name ([<parameter_list>])
RETURNS <return_type>
AS REMOTE SERVICE "<service_endpoint>"
FORMAT <data_format> [LINK <connection_link_name>];

    parameter_list: <variable_name> <data_type> [<data_type_modifier> [...n]] [, ...]

    data_type_modifier:
        DEFAULT <default_value> | NOT NULL | NULL | COLLATE collation_name
Return Type
  • return_type: A scalar value or a TABLE value. If a TABLE value, includes a list of <column name> <data_type> values such as TABLE(a TEXT, b INT).

Arguments
  • function_name:

    The name of the function. Function names must be unique within a database, and cannot be duplicates of names for other tables, views, UDFs, TVFs, user-defined aggregate functions (UDAFs), or stored procedures.

    You can also specify <database_name> and <function_name>  together by replacing  <function_name> with <database_name>.<function_name> instead of defaulting to the current context database. For example, you can write the following:

    CREATE EXTERNAL FUNCTION db.some_func(a int)
    ...
  • data_format: One of the following: JSON or ROWDAT_1. For more details, see the section Data Formats, below.

  • connection_link_name: The name of a connection link, which is a secure link that stores connection details (credentials and configurations) for an HTTP connection. See Configuring and Using Connection Links for more details.

Remarks
  • Note

    SingleStore does not start, stop, or otherwise maintain the service that hosts the external function.

  • A call to an external UDF can be made anywhere in a SQL statement where an expression is allowed. For example:

    • SELECT example_external_udf(100,'foo');

    • SELECT * FROM t WHERE example_external_udf(100,'foo') = 'bar';

    • SELECT a, COUNT(*) FROM t
      GROUP BY a
      HAVING a > example_external_udf(100,'foo');
  • A call to an external TVF can be made anywhere in a SQL statement where a table reference is allowed. For example:

    • SELECT * FROM example_external_tvf(100,'foo');

    • SELECT * FROM t, example_external_tvf(100,'foo') AS e
      WHERE t.id = e.id;
Enabling external functions

External functions must be enabled before they can be used.

  • To enable the use of all external functions, set the engine variable enable_external_functions to ON.

  • To enable the use of all external functions in a list that you specify, set enable_external_functions to ALLOWLIST and the external_functions_allowlist engine variable to the names of the external functions that you want to enable. You can use wildcards in the names.

    For example, to enable the use of any external function located at http://host1.example.com and the functions lowercase and uppercase located at http://host2.example.com, set the engine variable external_functions_allowlist to:

    "{'endpoints' : ['http://host1.example.com/*', 'http://host2.example.com/uppercase','http://host2.example.com/lowercase']}"

By default, enable_external_functions is set to OFF.

Batching of data sent from SingleStore to an external function

A batch contains multiple rows of data and a row id.

Batching of data sent from SingleStore to an external TVF

SingleStore always sends data in batches to an external TVF.

Batching of data sent from SingleStore to an external UDF

External UDFs are rewritten by SingleStore to external TVFs. This is done to allow external UDFs to be pushed down to a table scan to support batch processing of inputs.

However, not all external UDFs can be rewritten as TVFs. Only UDFs contained in the projection list (the columns that are selected by a query) or the WHERE clause of a SELECT can be rewritten as external TVFs. Furthermore, these external UDFs must not receive aggregate functions, window functions, or nested SELECT statements as input.

The value of the engine variable batch_external_function controls the rewrites of external UDFs to external TVFs for batching. The possible values are as follows:

  • ALWAYS: Any query, which contains an external UDF that SingleStore cannot convert to an external TVF, will not be executed. External UDFs used in all other types of queries will run and will be rewritten to TVFs.

  • AUTO: SingleStore will rewrite all external UDFs that it can to external TVFs for batching. Any external UDFs that cannot be rewritten will be called one row at a time.

  • NEVER: All external UDFs will be called without batching (i.e., one row at a time).

The benefits of batching

Batching reduces network and CPU utilization.

Batching does increase memory utilization. However, the benefits of the decrease in network and CPU utilization far outweigh the increase in memory utilization.

Data formats

External functions support two data formats, JSON and ROWDAT_1. You specify which format to use when you write a CREATE [OR REPLACE] EXTERNAL FUNCTION statement. The format you select specifies how the input row(s) that are sent to the external function are serialized and how the output row(s) that are returned from the external function are serialized.

JSON

If batching is used, the data format is serialized as a JSON array of arrays as follows:

{"data":
    [[<row id>, <argument 1>, <argument 2>, …],
    [<row id>, <argument 1>, <argument 2>, …],
    ...
]}

If batching is not used, the data format is serialized as a JSON array of one array. The one array contains the row.

{"data":
    [[<row id>, <argument 1>, <argument 2>, …]]
]}

Following is a description of the fields, in order, of the JSON sub-array:

  • <row id>: A row id, which is an integer greater than or equal to 0. The row id in a given row must be larger than the row id in the previous row. Multiple rows may contain the same row id.

  • <argument n>: The serialization format supports three types:

    • integer

    • double

    • variable-length string

    All other types are serialized as strings.

Note

Because a single row or multiple rows (a batch) may be sent as input to an external function, the function should be written to handle both cases.

ROWDAT_1

ROWDAT_1 is a SingleStore binary format used for serializing data that is the input to and the output from an external function. This data format contains one row, or multiple rows if batching is used.

Following is a description of the fields, in order, in each row:

  • A row id, which is an integer greater than or equal to 0. The row id in a given row must be larger than the row id in the previous row. Multiple rows may contain the same row id.

  • The remaining fields contain the arguments to the external function. The serialization format supports three types:

    • 64-bit signed integer

    • 64-bit double (IEEE 754)

    • length-encoded variable-length string

    The serialization format is little-endian.  A single byte denoting whether a value is NULL (1: NULL, 0: not-NULL) precedes all type values (except the row id which cannot be NULL).

    All other types are serialized as string.

Note

Because a single row or multiple rows (a batch) may be sent as input to an external function, the function should be written to handle both cases.

CREATE [OR REPLACE] EXTERNAL FUNCTION ... AS REMOTE ...

Use the AS REMOTE option to create an external function, which is a function that is accessible through an HTTP service. Following are the specifics of the HTTP request and response format.

HTTP request
Request method

A standard HTTP POST request is used.

Standard HTTP request headers

The following standard HTTP headers are used:

  • accept/content-type: application/json or application/octet-stream

  • user-agent: SingleStore/<S2 version>

Custom HTTP request headers

The following custom HTTP headers are used:

  • s2-ef-version: <data format version>

  • s2-ef-name: <name of the external function>

HTTP response

When a remote function is successful, it returns a HTTP response with status code 200. A HTTP response with status code other than 200 is treated as error, and the response body contains the error message.

SingleStore transaction semantics are not enforced inside of external functions

Caution

Any update to state made by an external function is not part of the calling SingleStore transaction. If the calling transaction rolls back, that will not roll back updates made by an external function.

For example, suppose an application runs a SQL query which calls an external function. The function runs and increments a variable v held in an external process by 1, from 100 to 101, and returns v. After the function completes, but before the transaction commits, the application running the transaction calls ROLLBACK. The state of the variable v in the external process remains at 101; it is not rolled back to 100.

Examples
Example: An external UDF

Suppose you are writing an external function that implements an elementary grammar checker. The checker scans one paragraph at a time. The checker corrects each paragraph by adding missing periods at the end of sentences and capitalizes the first letter of sentences (if the first letter is lowercase). Also, the checker optionally replaces digits, that are standalone words, in the sentence with the corresponding numbers spelled out. If no corrections to a paragraph are needed, the checker outputs the paragraph, unchanged.

Create the paragraphs table which stores the paragraphs to be checked and whether to check the digits in each paragraph.

CREATE TABLE paragraphs(paragraph TEXT, check_digits BOOL);

The table contains the following data.

+---------------------------------------------------------------+----------------+
| paragraph                                                     | check_digits   |
+---------------------------------------------------------------+----------------+
| this paragraph has 1 sentence                                 |              1 |
| this paragraph has 2 sentences. this is the second sentence   |              0 |
| This paragraph has one sentence and doesn't need corrections. |              1 |
+-------------------------------------------------------------+------------------+

Create the definition for an external function that checks a paragraph for grammar mistakes. If 1 is sent as input to the check_digits parameter, then digits are included as part of the grammar check.

CREATE EXTERNAL FUNCTION check_paragraph(paragraph TEXT, check_digits BOOL)
RETURNS TEXT
AS REMOTE SERVICE '<endpoint_URL>'
FORMAT JSON;

Implement the body of the external function and store the implementation on the HTTP server. An example implementation using an AWS Lambda function follows. Note that the implementation of the helper functions convert_numbers_to_words, capitalize_sentences and add_missing_periods_to_sentences is not shown.

import json

def lambda_handler(event, context):
    request = json.loads(event['body'])
    response = []
    for row_id, paragraph, check_digits in request["data"]:
        if check_digits:
            paragraph = convert_numbers_to_words(paragraph)
        paragraph = capitalize_sentences(paragraph)
        paragraph = add_missing_periods_to_sentences(paragraph)
        response.append([row_id, paragraph])
    return { 'statusCode': 200, 'body': json.dumps({"data": response}) }

Query the paragraphs table and call the external function as part of the query.

SELECT check_paragraph(paragraph, check_digits) as corrected_paragraph,
check_digits FROM paragraphs;

The external UDF is rewritten to an external TVF; the query calls the external function only once with the following input:

{"data":
    [[0, "this paragraph has 1 sentence", 1],
    [1, "this paragraph has 2 sentences. this is the second sentence", 0],
    [2, "This paragraph has one sentence and doesn't need corrections.", 1]]
}

After processing the input data, the external function returns the following data to the calling query. The data contains the corrected paragraphs. Paragraphs not needing any corrections are also included.

{"data":
    [[0, "This paragraph has one sentence."],
    [1, "This paragraph has 2 sentences. This is the second sentence."],
    [2, "This paragraph has one sentence and doesn't need corrections."]]
}

The query returns the following output.

+----------------------------------------------------------------+--------------+
| corrected_paragraph                                            | check_digits |
+----------------------------------------------------------------+--------------+
| This paragraph has one sentence.                               |            1 |
| This paragraph has 2 sentences. This is the second sentence.   |            0 |
| This paragraph has one sentence and doesn't need corrections.  |            1 |
+----------------------------------------------------------------+--------------+
Example: An external TVF

Suppose you want to read the contents of an external file and store each line in a row in a table.

Create the definition for the external function, which gets the contents of a file that is passed as input.

CREATE EXTERNAL FUNCTION get_file_lines(filename TEXT)
RETURNS TABLE(line TEXT)
AS REMOTE SERVICE '<endpoint_URL>'
FORMAT JSON;

Implement the body of the external function and store the implementation on the HTTP server. The implementation is not shown here.

Query the external function:

SELECT line FROM get_file_lines("simple_file.txt");

When the external function runs, it returns a JSON array of arrays, which contains the data in simple_file.txt:

{"data":
    [[0, "This is the first sentence."],
    [1, "This is the second sentence."],
    [2, "This is the third sentence."]]
}

When the query that calls the external function completes, the query returns the following output.

+------------------------------+
| line                         |
+------------------------------+
| This is the first sentence.  |
| This is the second sentence. |
| This is the third sentence.  |
+------------------------------+