Working with Python UDFs

Equivalent Data Types

When defining Python UDFs, input parameters and return values must be mapped between Python and SingleStore data types. The framework automatically infers these mappings from type annotations, but they can be overridden if necessary.

Following are the default Python to SingleStore mappings:

Python Type    SingleStore Type
int            BIGINT
float          DOUBLE
str            TEXT
bytes          BLOB
bool           BOOL

For finer control, use the singlestoredb.functions.dtypes module to specify types such as SMALLINT, VARCHAR, DECIMAL, or JSON.
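As a sketch of the default inference (the names are illustrative, and the @udf decorator is omitted so the snippet runs standalone): the annotations below map to BIGINT, DOUBLE, and TEXT; substituting types from the dtypes module in the decorator changes the inferred SQL signature.

```python
# Illustrative function: with the default mappings, these annotations are
# inferred as BIGINT (int), DOUBLE (float), and TEXT (str). To get SMALLINT
# or VARCHAR instead, the corresponding values from
# singlestoredb.functions.dtypes would be supplied to the @udf decorator
# (decorator omitted here so the sketch runs standalone).
def describe_item(item_id: int, weight: float) -> str:
    return f'item {item_id}: {weight} kg'
```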

Handling NULL Values

By default, Python UDFs and TVFs do not allow NULL values. The method for enabling NULL support depends on whether the Python function is scalar or vectorized.

Scalar Python Functions (UDFs and TVFs)

  • Use Optional[...] (or type | None in Python 3.10+) to allow parameters and return values to accept NULL.

  • This makes the entire parameter or return value nullable.

  • If any argument is NULL, the function can return NULL as needed.
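The points above can be sketched as follows (the function is shown without its @udf decorator so it runs standalone; the names are illustrative):

```python
from typing import Optional

# Nullable scalar UDF sketch: a SQL NULL arrives as Python None, and
# returning None emits a SQL NULL. On Python 3.10+, the annotations could
# equally be written as float | None.
def discounted_price(price: Optional[float],
                     discount: Optional[float]) -> Optional[float]:
    if price is None or discount is None:
        return None  # propagate NULL when any input is NULL
    return price * (1.0 - discount)
```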

Vectorized Python Functions (UDFs and TVFs)

  • Optional[...] cannot be used for element-level NULLs inside vectors.

  • Instead, use the Masked annotation for parameters and return values that must support NULL.

  • A Masked value consists of:

    • A data vector containing the non-NULL values.

    • A boolean mask vector, where True indicates a NULL element.

  • This ensures that NULL values are preserved and propagated correctly across inputs and outputs.

  • In Python TVFs, each output column can independently use Masked to indicate nullability.
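The Masked convention can be pictured with a small stdlib-only sketch (the real Masked annotation comes from the singlestoredb package; the function name here is illustrative):

```python
# Each Masked input is a (data, mask) pair: data holds the values and mask is
# a boolean vector where True marks a NULL element. The output mask is the
# element-wise OR of the input masks, so NULLs propagate across inputs.
def masked_add(a_data, a_mask, b_data, b_mask):
    out_data = [x + y for x, y in zip(a_data, b_data)]
    out_mask = [ma or mb for ma, mb in zip(a_mask, b_mask)]
    return out_data, out_mask

data, mask = masked_add([1, 2, 3], [False, True, False],
                        [10, 20, 30], [False, False, True])
# positions 1 and 2 are NULL in the result, regardless of the data values
```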

Overriding Parameter and Return Value Types

You can specify an output schema using the returns= parameter in the @udf decorator. This schema can be defined as:

  • A list of SQL types with name= specified, or

  • A NamedTuple, TypedDict, or pydantic.BaseModel.

The schema class is only used to define the output schema; the column layout is inferred from it automatically, and instances of the class do not need to be returned explicitly.
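A hypothetical schema class for this pattern (shown standalone; with singlestoredb installed it would be passed as @udf(returns=OrderSummary)):

```python
import typing

# The class only describes the output columns and their types; the function
# body can return plain rows (or a DataFrame) matching this layout rather
# than instances of the class itself.
class OrderSummary(typing.NamedTuple):
    order_id: int   # maps to BIGINT by default
    total: float    # maps to DOUBLE by default

rows = [(1, 19.99), (2, 5.00)]  # rows matching the OrderSummary layout
```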

Cancelling Running Python UDFs

When a query that uses a Python UDF is cancelled or when the connection between the database engine and the Python UDF server is broken, the Python UDF execution continues until it can be safely interrupted.

  • Synchronous Python UDFs

    • Scalar Python UDFs: Cancellation can only occur between row function calls.

    • Vectorized Python UDFs: Cancellation occurs only after all rows in the current batch are processed.

  • Asynchronous Python UDFs

    • Cancellation is detected more quickly.

    • When a disconnect is detected, an asyncio.CancelledError is raised the next time the Python UDF becomes active.

    • If synchronous operations are used inside an async Python UDF, they must complete before cancellation occurs.

    • Nested async calls are cancelled as soon as they activate again.

SingleStore recommends defining Python UDFs with async def for better cancellation handling.

Note

Use asynchronous Python UDFs and async libraries wherever possible to ensure that Python UDFs can be cancelled promptly.
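The reason async improves cancellation can be seen in a stdlib-only sketch: asyncio.CancelledError can only be delivered while a coroutine is suspended at an await, so UDFs built from awaitable operations are interrupted promptly.

```python
import asyncio

async def slow_udf():
    # Cancellation is delivered at this await point; a long synchronous call
    # here would have to finish before the task could be cancelled.
    await asyncio.sleep(10)
    return 'done'

async def main():
    task = asyncio.create_task(slow_udf())
    await asyncio.sleep(0)   # give the task a chance to start
    task.cancel()            # analogous to the engine dropping the connection
    try:
        await task
    except asyncio.CancelledError:
        return 'cancelled'
    return 'finished'

print(asyncio.run(main()))   # prints: cancelled
```

Synchronous code between await points cannot be interrupted this way, which is why async UDFs built on async libraries cancel more promptly.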

Timeouts

A timeout can be applied directly in the @udf decorator using the timeout= parameter.

  • The value must be specified in seconds.

  • If the timeout period is exceeded, the Python UDF is cancelled automatically.
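The timeout acts as a deadline around the call. A stdlib sketch of the same idea using asyncio.wait_for (the actual enforcement is done by the UDF framework; with the decorator it would be written as, for example, @udf(timeout=30)):

```python
import asyncio

async def slow_udf():
    await asyncio.sleep(5)   # stands in for a long-running UDF body
    return 'done'

async def call_with_deadline():
    try:
        # comparable to @udf(timeout=...) with a very short limit
        return await asyncio.wait_for(slow_udf(), timeout=0.05)
    except asyncio.TimeoutError:
        return 'timed out'

print(asyncio.run(call_with_deadline()))   # prints: timed out
```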

User Permissions

To allow users other than the organization owner to execute Python UDFs or TVFs, the organization owner must grant the appropriate permissions. These permissions are applied within the customer’s cluster using the SQL Editor.

Required Permissions

  1. EXECUTE (Object-Level Permission)

    • Required to execute a function in a specific database.

    • Can be granted on all functions in a database or a specific function.

    • The following example demonstrates the syntax:

      GRANT EXECUTE ON <db_name>.<function_name> TO 'user'@'%';
  2. OUTBOUND (Global Permission)

    • Required to allow functions to make external network requests.

    • Must be granted globally.

    • The following example demonstrates the syntax:

      GRANT OUTBOUND ON *.* TO 'user'@'%';

Note

Both permissions must be granted to ensure the user can execute Python UDFs or TVFs successfully. Without these permissions, execution may fail due to insufficient privileges.

Examples

Example 1: Calculate Sales Metrics with Vectorized Python TVFs

This example demonstrates a vectorized Python TVF that computes total and average sales per row and returns a table with named columns.

import typing

import numpy as np
import numpy.typing as npt
import pandas as pd
from singlestoredb.functions import udf, Table

# Define output schema using NamedTuple
class SalesOutput(typing.NamedTuple):
    total_sales: float
    average_sales: float
    category: str

@udf(returns=SalesOutput)
async def vector_sales(
    units_sold: npt.NDArray[np.float64],  # Vector of units sold
    unit_price: npt.NDArray[np.float64],  # Vector of unit prices
    category: npt.NDArray[np.str_],       # Vector of categories
) -> Table[pd.DataFrame]:
    """
    Calculate total and average sales per row with category.

    Parameters
    ----------
    units_sold : np.ndarray[np.float64]
        Number of units sold
    unit_price : np.ndarray[np.float64]
        Price per unit
    category : np.ndarray[str]
        Category name for each row

    Returns
    -------
    pd.DataFrame
        Table with columns:
        - total_sales
        - average_sales
        - category
    """
    # Compute vectorized total and average
    total_sales = units_sold * unit_price
    average_sales = total_sales / np.maximum(units_sold, 1)  # Avoid divide by zero

    # Return as a DataFrame with named columns
    df = pd.DataFrame({
        'total_sales': total_sales,
        'average_sales': average_sales,
        'category': category,
    })
    return Table(df)

# Start Python UDF server
import singlestoredb.apps as apps
connection_info = await apps.run_udf_app()

This results in the following external function:

CREATE EXTERNAL FUNCTION `vector_sales`(
`units_sold` DOUBLE NOT NULL,
`unit_price` DOUBLE NOT NULL,
`category` TEXT NOT NULL
)
RETURNS TABLE(
`total_sales` DOUBLE NOT NULL,
`average_sales` DOUBLE NOT NULL,
`category` TEXT NOT NULL
)
AS REMOTE SERVICE "http://<svchost>/<endpoint>/invoke" FORMAT ROWDAT_1;

Example 2: Vector Embeddings

This example demonstrates a Python UDF that can connect to any deployed embedding service in SingleStore Aura.

import base64
import sys

import numpy as np
import numpy.typing as npt
import requests
from singlestoredb.functions import udf

# Configuration for the embedding service
MODEL_NAME = 'name-of-embedding-service'
MODEL_URL = 'url-of-embedding-server'
TOKEN = 'api-token-of-embedding-service'

HEADERS = {
    'accept': 'application/json',
    'Content-Type': 'application/json',
    'Authorization': f'Bearer {TOKEN}',
}

@udf
async def mixedbread_embeddings(
    text: npt.NDArray[np.str_],   # Input: array of strings
) -> npt.NDArray[np.bytes_]:      # Output: array of bytes
    """
    Generate vector embeddings for an array of text inputs.

    Parameters
    ----------
    text : numpy.ndarray
        Array of input strings

    Returns
    -------
    numpy.ndarray
        Array of byte embeddings
    """
    # Send request to embedding service
    res = requests.post(
        MODEL_URL,
        headers=HEADERS,
        json=dict(
            model=MODEL_NAME,
            input=text.tolist(),
            encoding_format='base64',
        ),
    )

    # Raise an error if the request failed
    if res.status_code >= 400:
        print(res.content.decode('utf8'), file=sys.stderr)
        raise RuntimeError(res.content.decode('utf8'))

    # Decode embeddings and return as a NumPy array of bytes
    return np.array(
        [base64.b64decode(x['embedding']) for x in res.json()['data']],
        dtype=object,  # Preserve byte arrays correctly
    )

# Start Python UDF server
import singlestoredb.apps as apps
connection_info = await apps.run_udf_app()

This results in the following external function:

CREATE EXTERNAL FUNCTION `mixedbread_embeddings`(`text` TEXT NOT NULL)
RETURNS BLOB NOT NULL
AS REMOTE SERVICE "base-url-of-udf-server-including-port"
FORMAT ROWDAT_1;

Example 3: OpenAI Integration with Structured Output

This example demonstrates a Python table-valued function (TVF) that returns structured output using a pydantic.BaseModel. The function takes a word as input and returns 10 synonyms along with a score indicating how close each synonym is to the original word.

import os
from typing import List

import instructor
from openai import OpenAI
from pydantic import BaseModel, Field
from singlestoredb.functions import udf, Table

# Create OpenAI client via instructor wrapper
client = instructor.from_openai(OpenAI(api_key=os.getenv("OPENAI_API_KEY")))

# Define output schema
class Synonym(BaseModel):
    word: str = Field(description='Synonym of the given word')
    score: float = Field(
        description='Closeness score from 0.0 to 1.0'
    )

# Define the TVF
@udf
async def get_synonyms(word: str) -> Table[List[Synonym]]:
    """Return a list of synonyms of the given word and a score."""
    return Table(client.create(
        model='gpt-4o-mini',
        messages=[
            dict(role='system', content='You are a helpful assistant'),
            dict(
                role='user',
                content=f'''
                * Get a list of synonyms of the word "{word}"
                * Limit the number of results to 10
                ''',
            ),
        ],
        response_model=List[Synonym],
        max_retries=0,
    ))

# Start Python UDF server
import singlestoredb.apps as apps
connection_info = await apps.run_udf_app()

This results in the following external function:

CREATE EXTERNAL FUNCTION
`get_synonyms`(`word` TEXT NOT NULL)
RETURNS TABLE(
`word` TEXT NOT NULL,
`score` DOUBLE NOT NULL
)
AS REMOTE SERVICE "http://<svchost>/<endpoint>/invoke" FORMAT ROWDAT_1;

Run the following command to retrieve the synonyms of the word “danger”.

SELECT * FROM get_synonyms('danger');

The command returns the following output:

    word                    score
0   hazard                  0.90
1   risk                    0.85
2   threat                  0.88
3   peril                   0.90
4   jeopardy                0.87
5   threatening             0.80
6   threatening situation   0.70
7   threatened              0.75
8   insecurities            0.60
9   unsafe                  0.65

Example 4: Machine Learning Model Scoring

This example demonstrates how to use a pre-trained Keras model stored in Stage to score a collection of input parameters.

import numpy as np
import numpy.typing as npt
import pandas as pd
import singlestoredb.notebook as nb
from tensorflow.keras.models import load_model
from singlestoredb.functions import udf

# Download model from Stage
nb.stage.download_file('my_model.keras', local_path='my_model.keras')

# Load the model
model = load_model('my_model.keras')

@udf
async def keras_score(
    param_1: npt.NDArray[np.float32],
    param_2: npt.NDArray[np.float32],
    param_3: npt.NDArray[np.float32],
) -> npt.NDArray[np.float32]:
    """
    Score rows using a Keras model.

    Parameters
    ----------
    param_1 : np.ndarray[np.float32]
        First input parameter.
    param_2 : np.ndarray[np.float32]
        Second input parameter.
    param_3 : np.ndarray[np.float32]
        Third input parameter.

    Returns
    -------
    np.ndarray[np.float32]
        Predicted scores from the model.
    """
    X_test = pd.DataFrame({
        'param_1': param_1,
        'param_2': param_2,
        'param_3': param_3,
    })
    return model.predict(X_test).reshape((-1,))

# Start Python UDF server
import singlestoredb.apps as apps
connection_info = await apps.run_udf_app()

Last modified: September 18, 2025

