Python UDFs

A Python User-Defined Function (UDF) is an external function that runs Python code outside the SingleStore engine's process. It lets you extend SingleStore with custom Python logic written in a SingleStore Notebook. Python UDFs are especially useful for integrating with AI applications and machine learning (ML) models, performing vector operations with libraries such as NumPy, pandas, or Polars, and calling external APIs.

Prerequisites

To enable Python UDFs in the SingleStore deployment, ensure the following:

SingleStore Version: SingleStore version 8.9 or later.
Environment: The SingleStore deployment must run in an AWS EKS IRSA-supported environment.
Engine Global Variable: enable_managed_functions must be enabled.
SQL
```
SET GLOBAL enable_managed_functions = 1;
```
Once this engine variable is enabled, you can create and deploy Python UDFs and table-valued functions (TVFs).

Publish a Python UDF

Create a Python UDF

Python UDFs can be created only from shared notebooks. To create a new Python UDF, perform the following steps:

In the left navigation, select Editor > Shared.
Select Publish (on the top right).

New Python UDF

After selecting Publish, a new dialog box appears.

Publish Settings

Publish as	Select Python UDF.
Name	Enter a name for the Python UDF.
Description	Enter a description of the Python UDF.
Notebook	Select a shared notebook to publish as a Python UDF. The shared notebook is pre-selected when the Python UDF is published from within a notebook.
Deployment	Select the SingleStore deployment (workspace) that the notebook connects to. Selecting a workspace allows the notebook to connect natively to the SingleStore databases it references.
Runtime	Select one of the following runtimes: Small, Medium, or GPU-T4. Note: This field is in preview.
Region	Select a region.
Idle Timeout	Select an idle timeout. Note: This field is in preview.

Select Next.

Select Publish to publish the notebook as a Python UDF. Once the Python UDF is published, call the function from the SQL Editor.

Note

Creating a new Python UDF with the same name replaces the existing function. If no function with that name exists, a new one is created.

Manage an Existing Python UDF

To view an existing Python UDF, select Python UDFs in the left navigation. Existing Python UDFs can be managed by performing the following actions:

View
Update
Delete

View an Existing Python UDF

To view an existing Python UDF, select the Python UDF from the Name column. The following actions can be performed for the Python UDF from this page:

View Live Logs
Update
Delete

View Live Logs

To view live logs of the selected Python UDF, select View Live Logs from the ellipsis menu on the right side. A new window appears, showing the Timestamp of each log entry and its message in the Body column. To view the log JSON, select the eye icon.

Update an Existing Python UDF

To update an existing Python UDF, select the ellipsis in the Actions column of the Python UDF, and select Update.

Delete an Existing Python UDF

To delete an existing Python UDF, select the ellipsis in the Actions column of the Python UDF, and select Delete.

Defining Python UDFs

Each Python UDF must meet these requirements:

The function's parameters and return types must be annotated.
The function must be wrapped with the @udf decorator, which is located in singlestoredb.functions.

The @udf decorator is a critical component, as it automatically analyzes the type annotations to map Python data types to SingleStore data types. It then uses this mapping to generate the necessary CREATE EXTERNAL FUNCTION statement in the SingleStore database, ensuring a reliable connection between the Python code and the SQL queries. Refer to Equivalent Data Types for related information.
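To make the annotation-to-SQL mapping more concrete, the following is a minimal sketch that uses only the Python standard library. It is not the implementation of the @udf decorator: the TYPE_MAP table and the build_signature helper are hypothetical names introduced for illustration, and the mapping covers only the types that appear in the examples on this page.

```python
import inspect
from typing import get_type_hints

# Hypothetical mapping for illustration only; the real decorator handles more types.
TYPE_MAP = {
    int: "BIGINT NOT NULL",
    float: "DOUBLE NOT NULL",
    str: "TEXT NOT NULL",
}


def build_signature(func):
    """Build the signature part of a CREATE EXTERNAL FUNCTION statement."""
    hints = get_type_hints(func)
    if "return" not in hints:
        raise TypeError(f"{func.__name__}: return type must be annotated")
    params = []
    for name in inspect.signature(func).parameters:
        if name not in hints:
            raise TypeError(f"{func.__name__}: parameter {name!r} must be annotated")
        params.append(f"{name} {TYPE_MAP[hints[name]]}")
    returns = TYPE_MAP[hints["return"]]
    return (
        f"CREATE EXTERNAL FUNCTION {func.__name__}({', '.join(params)}) "
        f"RETURNS {returns}"
    )


def multiply(x: float, y: float) -> float:
    return x * y


print(build_signature(multiply))
```

The sketch also shows why both requirements above matter: without annotations on every parameter and on the return value, there is nothing to derive the SQL types from, so the helper raises an error instead of guessing.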

There are two main types of Python UDFs, defined by the type annotations:

Scalar
Vectorized

Scalar Python UDFs

Scalar Python UDFs are defined with standard Python type annotations, such as int, float, or str. When called from the database, the Python UDF server receives a batch of rows, but the Python UDF itself is invoked once for each individual row of data. This is useful for complex logic that operates on individual records.

The following example demonstrates a scalar Python UDF:

Python

from singlestoredb.functions import udf
import singlestoredb.apps as apps

@udf
async def multiply(x: float, y: float) -> float:
    return x * y

# Start Python UDF server
connection_info = await apps.run_udf_app()
print("UDF server running. Connection info:", connection_info)

This creates the following external function:

SQL

CREATE EXTERNAL FUNCTION multiply(x DOUBLE NOT NULL, y DOUBLE NOT NULL)
RETURNS DOUBLE NOT NULL
AS REMOTE SERVICE "http://<svchost>/<endpoint>/invoke" FORMAT ROWDAT_1;

Use async def for improved cancellation handling.
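The benefit of async def can be illustrated with the standard asyncio library alone. The sketch below is not SingleStore-specific: slow_udf is a hypothetical stand-in for a long-running UDF, and the cancellation is triggered by hand rather than by the UDF server. It shows that a coroutine can be interrupted at an await point, whereas a plain blocking function offers no such point.

```python
import asyncio


async def slow_udf(x: float) -> float:
    # Hypothetical long-running UDF; the await is where a cancellation can land.
    await asyncio.sleep(10)
    return x * 2


async def run_and_cancel() -> str:
    task = asyncio.create_task(slow_udf(1.0))
    await asyncio.sleep(0.01)  # give the task a chance to start
    task.cancel()              # stands in for an aborted query
    try:
        await task
    except asyncio.CancelledError:
        return "cancelled"
    return "completed"


result = asyncio.run(run_and_cancel())
print(result)
```

In a notebook, which already runs an event loop, call `await run_and_cancel()` directly instead of using asyncio.run.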

Vectorized Python UDFs

Vectorized Python UDFs are defined with vector type annotations, such as numpy.ndarray, pandas.Series, polars.Series, or pyarrow.Array. The Python UDF is called only once for each batch of rows received from the SingleStore database. The entire batch is converted into vectorized inputs, where each column of data corresponds to a single vector object passed as a function parameter. This is useful for high-performance numerical processing.

The following example demonstrates a vectorized Python UDF:

Python

import numpy as np
import numpy.typing as npt
from singlestoredb.functions import udf

@udf
async def vec_multiply(
    x: npt.NDArray[np.float64],
    y: npt.NDArray[np.float64],
) -> npt.NDArray[np.float64]:
    return x * y

# Start Python UDF server

import singlestoredb.apps as apps
connection_info = await apps.run_udf_app()

This creates the following external function:

SQL

CREATE EXTERNAL FUNCTION vec_multiply(x DOUBLE NOT NULL, y DOUBLE NOT NULL)
RETURNS DOUBLE NOT NULL
AS REMOTE SERVICE "http://<svchost>/<endpoint>/invoke" FORMAT ROWDAT_1;
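The difference between the scalar and vectorized calling conventions can be sketched in plain Python. The helpers call_scalar and call_vectorized below are hypothetical and only imitate how a batch of rows reaches each kind of function. Plain lists stand in for the NumPy arrays that a real vectorized UDF would receive, and this is not how the UDF server itself is implemented.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[float, float]

# Counts how often each function body runs for the same batch.
calls: Dict[str, int] = {"scalar": 0, "vectorized": 0}


def call_scalar(func: Callable[[float, float], float], batch: Sequence[Row]) -> List[float]:
    # Scalar convention: one call per row in the batch.
    return [func(*row) for row in batch]


def call_vectorized(func: Callable[..., List[float]], batch: Sequence[Row]) -> List[float]:
    # Vectorized convention: transpose rows into columns, then call once.
    columns = [list(col) for col in zip(*batch)]
    return list(func(*columns))


def multiply(x: float, y: float) -> float:
    calls["scalar"] += 1
    return x * y


def vec_multiply(xs: List[float], ys: List[float]) -> List[float]:
    calls["vectorized"] += 1
    return [x * y for x, y in zip(xs, ys)]


batch = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
scalar_result = call_scalar(multiply, batch)
vector_result = call_vectorized(vec_multiply, batch)
print(scalar_result, vector_result, calls)
```

Both conventions produce the same values for the batch, but the scalar function body runs three times and the vectorized one runs once, which is why the generated SQL signature is identical for the two kinds of UDF.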

Scalar Python TVFs

Scalar Python TVFs are defined in the same way as scalar Python UDFs, except that a scalar TVF uses a Table annotation to indicate that the function returns a table. The function must also return the final result wrapped in a Table object.

The following example demonstrates a scalar Python TVF:

Python

import numpy as np
import numpy.typing as npt
from singlestoredb.functions import udf, Table

@udf
async def number_stats(
    n: npt.NDArray[np.int_],
) -> Table[npt.NDArray[np.int_], npt.NDArray[np.int_], npt.NDArray[np.float64]]:
    numbers = np.arange(1, n[0] + 1, dtype=np.int_)
    squares = numbers ** 2
    roots = np.sqrt(numbers).round(2)
    return Table(numbers, squares, roots)

# Start Python UDF server
import singlestoredb.apps as apps
connection_info = await apps.run_udf_app()

This creates the following external function:

SQL

CREATE EXTERNAL FUNCTION
    `number_stats`(`n` BIGINT NOT NULL)
    RETURNS TABLE(`a` BIGINT NOT NULL)
    AS REMOTE SERVICE "http://<svchost>/<endpoint>/invoke" FORMAT ROWDAT_1;

SingleStore automatically generates generic column names if no names are associated with the return fields (a in this example). Explicitly name result columns using overrides or schema classes. Refer to Overriding Parameters and Return Value Types for related information.

Vectorized Python TVFs

Vectorized Python TVFs operate in a way similar to vectorized Python UDFs. The main difference is that the vectorized Python TVFs can return multiple columns as output. In vectorized Python TVFs, each returned vector corresponds to a column.

The following example demonstrates a vectorized Python TVF:

Python

import numpy as np
import numpy.typing as npt
from singlestoredb.functions import udf, Table

@udf
def vec_table_function(
    n: npt.NDArray[np.int_],
) -> Table[npt.NDArray[np.int_], npt.NDArray[np.float64], npt.NDArray[np.str_]]:
    x = np.array([10] * n[0], dtype=np.int_)
    y = np.array([10.0] * n[0], dtype=np.float64)
    z = np.array(['ten'] * n[0], dtype=np.str_)

    # Return a Table of vectors, one per output column
    return Table(x, y, z)

# Start Python UDF server

import singlestoredb.apps as apps
connection_info = await apps.run_udf_app()

This creates the following external function:

SQL

CREATE EXTERNAL FUNCTION
    `vec_table_function`(`n` BIGINT NOT NULL)
    RETURNS TABLE(
        `a` BIGINT NOT NULL,
        `b` DOUBLE NOT NULL,
        `c` TEXT NOT NULL
    )
    AS REMOTE SERVICE "http://<svchost>/<endpoint>/invoke" FORMAT ROWDAT_1;

SingleStore automatically generates generic column names if no names are associated with the return fields (a, b, c). Explicitly name result columns using overrides or schema classes. Refer to Overriding Parameters and Return Value Types for related information.
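How returned vectors become table columns can be sketched without any SingleStore code. In the sketch below, to_rows is a hypothetical helper, plain lists stand in for NumPy arrays, and the a, b, c naming simply mirrors the generated names shown above. The actual naming logic in the library may differ.

```python
import string
from typing import Dict, List, Optional, Sequence


def to_rows(
    columns: Sequence[Sequence[object]],
    names: Optional[Sequence[str]] = None,
) -> List[Dict[str, object]]:
    """Turn equal-length column vectors into a list of row dictionaries."""
    if len({len(col) for col in columns}) > 1:
        raise ValueError("all columns must have the same length")
    if names is None:
        # Generic names a, b, c, ... when none are supplied (illustrative only).
        names = string.ascii_lowercase[: len(columns)]
    return [dict(zip(names, values)) for values in zip(*columns)]


n = 2
rows = to_rows([[10] * n, [10.0] * n, ["ten"] * n])
print(rows)

named_rows = to_rows([[1], [2]], names=["id", "score"])
print(named_rows)
```

Each input vector contributes one column, and position i of every vector forms row i, so all returned vectors need the same length.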
