Working with Python UDFs
Equivalent Data Types
When defining Python UDFs, input parameters and return values must be mapped between Python and SingleStore data types.
The following are the default Python to SingleStore mappings:

Python Type | SingleStore Type
---|---
bool | BOOL
int | BIGINT
float | DOUBLE
str | TEXT
bytes | BLOB
For finer control, use the singlestoredb.functions.dtypes module to specify types such as SMALLINT, VARCHAR, DECIMAL, or JSON.
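For instance, a scalar UDF that relies only on the default mappings can be declared with plain Python type annotations. The function below is a hypothetical sketch; only the @udf decorator and the annotations matter here, and the exact SQL type chosen for int is an assumption based on the default mapping table above.

from singlestoredb.functions import udf

@udf
def describe_order(order_id: int, total: float, note: str) -> str:
    # Default mappings apply: float -> DOUBLE, str -> TEXT, int -> a SQL integer type (typically BIGINT).
    return f'Order {order_id}: {total:.2f} ({note})'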
Handling NULL Values
By default, Python UDFs and TVFs do not allow NULL values. NULL support depends on whether the Python function is scalar or vectorized.
Scalar Python Functions (UDFs and TVFs)
- Use Optional[...] (or the ... | None type in Python 3.10+) to allow parameters and return values to accept NULL (see the sketch after this list).
- This makes the entire parameter or return value nullable.
- If any argument is NULL, the function can return NULL as needed.
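For example, the following scalar UDF sketch (the function name and logic are illustrative) accepts NULL inputs and returns NULL when appropriate:

from typing import Optional

from singlestoredb.functions import udf

@udf
def safe_ratio(numerator: Optional[float], denominator: Optional[float]) -> Optional[float]:
    # A NULL argument arrives as None; returning None produces a NULL result.
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator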
Vectorized Python Functions (UDFs and TVFs)
- Optional[...] cannot be used for element-level NULLs inside vectors.
- Instead, use the Masked annotation for parameters and return values that must support NULL (see the sketch after this list).
- A Masked value consists of:
  - A data vector containing the non-NULL values.
  - A boolean mask vector, where True indicates a NULL element.
- This ensures that NULL values are preserved and propagated correctly across inputs and outputs.
- In Python TVFs, each output column can independently use Masked to indicate nullability.
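The following is a rough sketch of this pattern. It assumes Masked can be imported from singlestoredb.functions, that a Masked parameter unpacks into a (data, mask) pair, and that a Masked return value is constructed from the same pair; confirm the exact annotation syntax against the singlestoredb SDK reference.

import numpy as np
import numpy.typing as npt

from singlestoredb.functions import udf, Masked

@udf
async def masked_double(x: Masked[npt.NDArray[np.float64]]) -> Masked[npt.NDArray[np.float64]]:
    values, nulls = x                  # data vector plus boolean mask (True = NULL)
    return Masked(values * 2, nulls)   # NULL elements remain NULL in the output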
Overriding Parameter and Return Value Types
You can specify an output schema using the returns= parameter in the @udf decorator. The schema can be specified as either:

- A list of SQL types with name= specified, or
- A NamedTuple, TypedDict, or pydantic.BaseModel

The output schema is inferred automatically from the schema class; the class does not need to be returned explicitly and is used only to define the output schema.
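As an illustration, following the same pattern as Example 1 below, a hypothetical vectorized TVF can declare its output columns through a NamedTuple passed to returns= while returning a DataFrame (the function name and the 20% tax rate are made up for this sketch):

import typing

import numpy as np
import numpy.typing as npt
import pandas as pd

from singlestoredb.functions import udf, Table

# Schema class: defines output column names and types only.
class PriceWithTax(typing.NamedTuple):
    price: float
    tax: float

@udf(returns=PriceWithTax)
async def price_with_tax(price: npt.NDArray[np.float64]) -> Table[pd.DataFrame]:
    # The function returns a DataFrame; the NamedTuple itself is never instantiated.
    return Table(pd.DataFrame({'price': price, 'tax': price * 0.2}))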
Cancelling Running Python UDFs
When a query that uses a Python UDF is cancelled or when the connection between the database engine and the Python UDF server is broken, the Python UDF execution continues until it can be safely interrupted.
- Synchronous Python UDFs
  - Scalar Python UDFs: Cancellation can only occur between row function calls.
  - Vectorized Python UDFs: Cancellation occurs only after all rows in the current batch are processed.
- Asynchronous Python UDFs
  - Cancellation is detected more quickly.
  - When a disconnect is detected, an asyncio.CancelledError is raised the next time the Python UDF becomes active.
  - If synchronous operations are used inside an async Python UDF, they must complete before cancellation occurs.
  - Nested async calls are cancelled as soon as they activate again.

SingleStore recommends using async before the function definition for better cancellation handling.
Note
Use asynchronous Python UDFs and async
libraries wherever possible to ensure that Python UDFs can be cancelled promptly.
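As a rough illustration of this recommendation, an async UDF can await regularly and, if needed, catch asyncio.CancelledError to release resources before re-raising. The function below is a hypothetical sketch, not part of the SingleStore API:

import asyncio

import numpy as np
import numpy.typing as npt

from singlestoredb.functions import udf

@udf
async def slow_score(values: npt.NDArray[np.float64]) -> npt.NDArray[np.float64]:
    try:
        # Awaiting gives the event loop a chance to deliver a cancellation promptly.
        await asyncio.sleep(0)
        return values * 2
    except asyncio.CancelledError:
        # Release any resources held by this call, then let the cancellation propagate.
        raise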
Timeouts
A timeout can be applied directly in the @udf
decorator using the timeout=
parameter.
- The value must be specified in seconds.
- If the timeout period is exceeded, the Python UDF is cancelled automatically.
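For example, the following hypothetical UDF is cancelled automatically if it runs longer than 30 seconds (the function body is a stand-in for a slow external call):

import asyncio

from singlestoredb.functions import udf

@udf(timeout=30)
async def slow_lookup(key: str) -> str:
    await asyncio.sleep(1)  # stand-in for a slow external call
    return key.upper()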
User Permissions
To allow users other than the organization owner to execute Python UDFs or TVFs, the organization owner must grant the appropriate permissions.
Required Permissions
- EXECUTE (Object-Level Permission)
  - Required to execute a function in a specific database.
  - Can be granted on all functions in a database or on a specific function.
  - The following example demonstrates the syntax:

    GRANT EXECUTE ON <db_name>.<function_name> TO 'user'@'%';

- OUTBOUND (Global Permission)
  - Required to allow functions to make external network requests.
  - Must be granted globally.
  - The following example demonstrates the syntax:

    GRANT OUTBOUND ON *.* TO 'user'@'%';
Note
Both permissions must be granted to ensure the user can execute Python UDFs or TVFs successfully.
Examples
Example 1: Calculate Sales Metrics with Vectorized Python TVFs
This example demonstrates a vectorized Python TVF that computes total and average sales per row and returns a table with named columns.
import typing

import numpy as np
import pandas as pd
from singlestoredb.functions import udf, Table
import numpy.typing as npt

# Define output schema using NamedTuple
class SalesOutput(typing.NamedTuple):
    total_sales: float
    average_sales: float
    category: str

@udf(returns=SalesOutput)
async def vector_sales(
    units_sold: npt.NDArray[np.float64],   # Vector of units sold
    unit_price: npt.NDArray[np.float64],   # Vector of unit prices
    category: npt.NDArray[np.str_]         # Vector of categories
) -> Table[pd.DataFrame]:
    """Calculate total and average sales per row with category.

    Parameters
    ----------
    units_sold : np.ndarray[np.float64]
        Number of units sold
    unit_price : np.ndarray[np.float64]
        Price per unit
    category : np.ndarray[str]
        Category name for each row

    Returns
    -------
    pd.DataFrame
        Table with columns:
        - total_sales
        - average_sales
        - category
    """
    # Compute vectorized total and average
    total_sales = units_sold * unit_price
    average_sales = total_sales / np.maximum(units_sold, 1)  # Avoid divide by zero

    # Return as a DataFrame with named columns
    df = pd.DataFrame({
        'total_sales': total_sales,
        'average_sales': average_sales,
        'category': category
    })
    return Table(df)

# Start Python UDF server
import singlestoredb.apps as apps
connection_info = await apps.run_udf_app()
This results in the following external function:
CREATE EXTERNAL FUNCTION `vector_sales`(
  `units_sold` DOUBLE NOT NULL,
  `unit_price` DOUBLE NOT NULL,
  `category` TEXT NOT NULL)
RETURNS TABLE(
  `total_sales` DOUBLE NOT NULL,
  `average_sales` DOUBLE NOT NULL,
  `category` TEXT NOT NULL)
AS REMOTE SERVICE "http://<svchost>/<endpoint>/invoke" FORMAT ROWDAT_1;
Example 2: Vector Embeddings
This example demonstrates a Python UDF that can connect to any deployed embedding service in SingleStore Aura.
import base64
import sys

import numpy as np
import requests
import numpy.typing as npt
from singlestoredb.functions import udf

# Configuration for the embedding service
MODEL_NAME = 'name-of-embedding-service'
MODEL_URL = 'url-of-embedding-server'
TOKEN = 'api-token-of-embedding-service'

HEADERS = {
    'accept': 'application/json',
    'Content-Type': 'application/json',
    'Authorization': f'Bearer {TOKEN}',
}

@udf
async def mixedbread_embeddings(
    text: npt.NDArray[np.str_],      # Input: array of strings
) -> npt.NDArray[np.bytes_]:         # Output: array of bytes
    """Generate vector embeddings for an array of text inputs.

    Parameters
    ----------
    text : numpy.ndarray
        Array of input strings

    Returns
    -------
    numpy.ndarray
        Array of byte embeddings
    """
    # Send request to embedding service
    res = requests.post(
        MODEL_URL,
        headers=HEADERS,
        json=dict(
            model=MODEL_NAME,
            input=text.tolist(),
            encoding_format='base64',
        ),
    )

    # Raise an error if the request failed
    if res.status_code >= 400:
        print(res.content.decode('utf8'), file=sys.stderr)
        raise RuntimeError(res.content.decode('utf8'))

    # Decode embeddings and return as a NumPy array of bytes
    return np.array(
        [base64.b64decode(x['embedding']) for x in res.json()['data']],
        dtype=object,  # Preserve byte arrays correctly
    )

# Start Python UDF server
import singlestoredb.apps as apps
connection_info = await apps.run_udf_app()
This results in the following external function:
CREATE EXTERNAL FUNCTION `mixedbread_embeddings`(`text` TEXT NOT NULL)
RETURNS BLOB NOT NULL
AS REMOTE SERVICE "base-url-of-udf-server-including-port"
FORMAT ROWDAT_1;
Example 3: OpenAI Integration with Structured Output
This example demonstrates a Python table-valued function (TVF) that returns structured output using a pydantic.BaseModel.
import os
from typing import List

from pydantic import BaseModel, Field
from singlestoredb.functions import udf, Table
import instructor
from openai import OpenAI

# Create OpenAI client via instructor wrapper
client = instructor.from_openai(OpenAI(api_key=os.getenv("OPENAI_API_KEY")))

# Define output schema
class Synonym(BaseModel):
    word: str = Field(description='Synonym of the given word')
    score: float = Field(description='Closeness score from 0.0 to 1.0')

# Define the TVF
@udf
async def get_synonyms(word: str) -> Table[List[Synonym]]:
    """Return a list of synonyms of the given word and a score."""
    return Table(client.create(
        model='gpt-4o-mini',
        messages=[
            dict(role='system', content='You are a helpful assistant'),
            dict(
                role='user',
                content=f'''
                    * Get a list of synonyms of the word "{word}"
                    * Limit the number of results to 10
                ''',
            ),
        ],
        response_model=List[Synonym],
        max_retries=0,
    ))

# Start Python UDF server
import singlestoredb.apps as apps
connection_info = await apps.run_udf_app()
This results in the following external function:
CREATE EXTERNAL FUNCTION `get_synonyms`(`word` TEXT NOT NULL)
RETURNS TABLE(
  `word` TEXT NOT NULL,
  `score` DOUBLE NOT NULL)
AS REMOTE SERVICE "http://<svchost>/<endpoint>/invoke" FORMAT ROWDAT_1;
Run the following command to retrieve the synonyms of the word “danger”.
SELECT * FROM get_synonyms('danger');
The command returns the following:
word | score
---|---
hazard | 0.90
risk | 0.85
threat | 0.88
peril | 0.90
jeopardy | 0.87
threatening | 0.80
threatening situation | 0.70
threatened | 0.75
insecurities | 0.60
unsafe | 0.65
Example 4: Machine Learning Model Scoring
This example demonstrates how to use a pre-trained Keras model stored in Stage to score a collection of input parameters.
import numpy as np
import pandas as pd
import numpy.typing as npt
import singlestoredb.notebook as nb
from tensorflow.keras.models import load_model
from singlestoredb.functions import udf

# Download model from Stage
nb.stage.download_file('my_model.keras', local_path='my_model.keras')

# Load the model
model = load_model('my_model.keras')

@udf
async def keras_score(
    param_1: npt.NDArray[np.float32],
    param_2: npt.NDArray[np.float32],
    param_3: npt.NDArray[np.float32],
) -> npt.NDArray[np.float32]:
    """Score rows using a Keras model.

    Parameters
    ----------
    param_1 : np.ndarray[np.float32]
        First input parameter.
    param_2 : np.ndarray[np.float32]
        Second input parameter.
    param_3 : np.ndarray[np.float32]
        Third input parameter.

    Returns
    -------
    np.ndarray[np.float32]
        Predicted scores from the model.
    """
    X_test = pd.DataFrame({
        'param_1': param_1,
        'param_2': param_2,
        'param_3': param_3
    })
    return model.predict(X_test).reshape((-1,))

# Start Python UDF server
import singlestoredb.apps as apps
connection_info = await apps.run_udf_app()
Last modified: September 18, 2025