# Working with Python UDFs

## Equivalent Data Types

When defining Python UDFs, input parameters and return values must be mapped between Python and SingleStore data types. The framework automatically infers these mappings from type annotations, but they can be overridden if necessary.

Following are the default Python to SingleStore mappings:

| **Python Type** | **SingleStoreType** |
| --------------- | ------------------- |
| `int`           | `BIGINT`            |
| `float`         | `DOUBLE`            |
| `str`           | `TEXT`              |
| `bytes`         | `BLOB`              |
| `bool`          | `BOOL`              |

For finer control, use the `singlestoredb.functions.dtypes` module to specify types such as `SMALLINT`, `VARCHAR`, `DECIMAL`, or `JSON`.

## Handling NULL Values

By default, Python UDFs and TVFs do not allow `NULL` values. The method for enabling `NULL` support depends on whether the Python function is scalar or vectorized.

## Scalar Python Functions (UDFs and TVFs)

* Use `Optional[...] `(or `type` | `None` in Python 3.10+) to allow parameters and return values to accept `NULL`.
* This makes the entire parameter or return value nullable.
* If any argument is `NULL`, the function can return `NULL` as needed.

## Vectorized Python Functions (UDFs and TVFs)

* `Optional[...]` cannot be used for element-level NULLs inside vectors.
* Instead, use the `Masked` annotation for parameters and return values that must support `NULL`.
* A `Masked` value consists of:

  * A data vector containing the non-NULL values.
  * A boolean mask vector, where `True` indicates a `NULL` element.
* This ensures that `NULL` values are preserved and propagated correctly across inputs and outputs.
* In Python TVFs, each output column can independently use `Masked` to indicate nullability.

## Overriding Parameter and Return Value Types

You can specify an output schema using the `returns=` parameter in the `@udf` decorator. This schema can be defined as:

* A list of SQL types with `name=` specified, or
* A `NamedTuple`, `TypedDict`, or `pydantic.BaseModel`.

The schema class is automatically inferred and it does not need to be returned explicitly, it is only used to define the output schema.

## Cancelling Running Python UDFs

When a query that uses a Python UDF is cancelled or when the connection between the database engine and the Python UDF server is broken, the Python UDF execution continues until it can be safely interrupted.

* **Synchronous Python UDFs**

  * **Scalar Python UDFs**: Cancellation can only occur between row function calls.
  * **Vectorized Python UDFs**: Cancellation occurs only after all rows in the current batch are processed.
* **Asynchronous Python UDFs**

  * Cancellation is detected more quickly.
  * When a disconnect is detected, an `asyncio.CancelledError` is raised the next time the Python UDF becomes active.
  * If synchronous operations are used inside an `async` Python UDF, they must complete before cancellation occurs.
  * Nested async calls are cancelled as soon as they activate again.

SingleStore recommends using `async` before the function definition for better cancellation handling.

> **📝 Note**: Use asynchronous Python UDFs and `async` libraries wherever possible to ensure that Python UDFs can be cancelled promptly.

## Timeouts

A timeout can be applied directly in the `@udf`decorator using the `timeout=` parameter.

* The value must be specified in seconds.
* If the timeout period is exceeded, the Python UDF is cancelled automatically.

## User Permissions

To allow users other than the organization owner to execute Python UDFs or TVFs, the organization owner must grant the appropriate permissions. These permissions are applied within the customer’s workspace using the SQL Editor.

## Required Permissions

1. `EXECUTE` (Object-Level Permission)

   * Required to execute a function in a specific database.
   * Can be granted on all functions in a database or a specific function.
   * Following example demonstrates the syntax:
     ```sql
     GRANT EXECUTE ON <db_name>.<function_name> TO 'user'@'%';

     ```

2. `OUTBOUND` (Global Permission)

   * Required to allow functions to make external network requests.
   * Must be granted globally.
   * Following example demonstrates the syntax:
     ```sql
     GRANT OUTBOUND ON *.* TO 'user'@'%';

     ```

> **📝 Note**: Both permissions must be granted to ensure the user can execute Python UDFs or TVFs successfully. Without these permissions, execution may fail due to insufficient privileges.

## Examples

## Example 1: Calculate Sales Metrics with Vectorized Python TVFs

This example demonstrates a vectorized Python TVF that computes total and average sales per row and returns a table with named columns.

```python
import typing
import numpy as np
import pandas as pd
from singlestoredb.functions import udf, Table
import numpy.typing as npt

# Define output schema using NamedTuple
class SalesOutput(typing.NamedTuple):
    total_sales: float
    average_sales: float
    category: str

@udf(returns=SalesOutput)
async def vector_sales(
    units_sold: npt.NDArray[np.float64],  # Vector of units sold
    unit_price: npt.NDArray[np.float64],  # Vector of unit prices
    category: npt.NDArray[np.str_]          # Vector of categories
) -> Table[pd.DataFrame]:
    """
    Calculate total and average sales per row with category.
    
    Parameters
    ----------
    units_sold : np.ndarray[np.float64]
        Number of units sold
    unit_price : np.ndarray[np.float64]
        Price per unit
    category : np.ndarray[str]
        Category name for each row

    Returns
    -------
    pd.DataFrame
        Table with columns:
        - total_sales
        - average_sales
        - category
    """
    # Compute vectorized total and average
    total_sales = units_sold * unit_price
    average_sales = total_sales / np.maximum(units_sold, 1)  # Avoid divide by zero
    
    # Return as a DataFrame with named columns
    df = pd.DataFrame({
        'total_sales': total_sales,
        'average_sales': average_sales,
        'category': category
    })
    
    return Table(df)

# Start Python UDF server

import singlestoredb.apps as apps
connection_info = await apps.run_udf_app()
```

This results in the following external function:

```sql
CREATE EXTERNAL FUNCTION `vector_sales`(
    `units_sold` DOUBLE NOT NULL,
    `unit_price` DOUBLE NOT NULL,
    `category` TEXT NOT NULL
)
RETURNS TABLE(
    `total_sales` DOUBLE NOT NULL,
    `average_sales` DOUBLE NOT NULL,
    `category` TEXT NOT NULL
)
AS REMOTE SERVICE "http://<svchost>/<endpoint>/invoke" FORMAT ROWDAT_1;
```

## Example 2: Vector Embeddings

This example demonstrates a Python UDF that can connect to any deployed embedding service in SingleStore Aura.

```python
import base64
import sys
import numpy as np
import requests
import numpy.typing as npt
from singlestoredb.functions import udf

# Configuration for the embedding service
MODEL_NAME = 'name-of-embedding-service'
MODEL_URL = 'url-of-embedding-server'
TOKEN = 'api-token-of-embedding-service'

HEADERS = {
    'accept': 'application/json',
    'Content-Type': 'application/json',
    'Authorization': f'Bearer {TOKEN}',
}

@udf
async def mixedbread_embeddings(
    text: npt.NDArray[np.str_],  # Input: array of strings
) -> npt.NDArray[np.bytes_]:     # Output: array of bytes
    """
    Generate vector embeddings for an array of text inputs.

    Parameters
    ----------
    text : numpy.ndarray
        Array of input strings

    Returns
    -------
    numpy.ndarray
        Array of byte embeddings
    """
    # Send request to embedding service
    res = requests.post(
        MODEL_URL,
        headers=HEADERS,
        json=dict(
            model=MODEL_NAME,
            input=text.tolist(),
            encoding_format='base64',
        ),
    )

    # Raise an error if the request failed
    if res.status_code >= 400:
        print(res.content.decode('utf8'), file=sys.stderr)
        raise RuntimeError(res.content.decode('utf8'))

    # Decode embeddings and return as a NumPy array of bytes
    return np.array(
        [base64.b64decode(x['embedding']) for x in res.json()['data']],
        dtype=object,  # Preserve byte arrays correctly
    )

# Start Python UDF server

import singlestoredb.apps as apps
connection_info = await apps.run_udf_app()

```

This results in the following external function:

```sql
CREATE EXTERNAL FUNCTION `mixedbread_embeddings`(`text` TEXT NOT NULL)
    RETURNS BLOB NOT NULL
    AS REMOTE SERVICE "base-url-of-udf-server-including-port"
    FORMAT ROWDAT_1;
```

## Example 3: OpenAI Integration with Structured Output

This example demonstrates a Python table-valued function (TVF) that returns structured output using a `pydantic.BaseModel`. The function takes a word as input and returns 10 synonyms along with a score indicating how close each synonym is to the original word.

```python
import os
from typing import List
from pydantic import BaseModel, Field
from singlestoredb.functions import udf, Table
import instructor
from openai import OpenAI

# Create OpenAI client via instructor wrapper
client = instructor.from_openai(OpenAI(api_key=os.getenv("OPENAI_API_KEY")))

# Define output schema
class Synonym(BaseModel):
    word: str = Field(description='Synonym of the given word')
    score: float = Field(
        description='Closeness score from 0.0 to 1.0'
    )

# Define the TVF
@udf
async def get_synonyms(word: str) -> Table[List[Synonym]]:
    """Return a list of synonyms of the given word and a score."""
    return Table(client.create(
        model='gpt-4o-mini',
        messages=[
            dict(role='system', content='You are a helpful assistant'),
            dict(
                role='user',
                content=f'''
                    * Get a list of synonyms of the word "{word}"
                    * Limit the number of results to 10
                '''
            )
        ],
        response_model=List[Synonym],
        max_retries=0,
    ))

# Start Python UDF server

import singlestoredb.apps as apps
connection_info = await apps.run_udf_app()
```

This results in the following external function:

```sql
CREATE EXTERNAL FUNCTION 
    `get_synonyms`(`word` TEXT NOT NULL) 
    RETURNS TABLE(
        `word` TEXT NOT NULL,
        `score` DOUBLE NOT NULL
    )
    AS REMOTE SERVICE "http://<svchost>/<endpoint>/invoke" FORMAT ROWDAT_1;
```

Run the following command to retrieve the synonyms of the word “danger”.

```sql
SELECT * FROM get_synonyms('danger');
```

The command results the following:

```sql
                    word   score
0                 hazard   0.90
1                   risk   0.85
2                 threat   0.88
3                  peril   0.90
4               jeopardy   0.87
5            threatening   0.80
6  threatening situation   0.70
7             threatened   0.75
8           insecurities   0.60
9                 unsafe   0.65
```

## Example 4: Machine Learning Model Scoring

This example demonstrates how to use a pre-trained Keras model stored in [Stage](https://docs.singlestore.com/cloud/load-data/load-data-from-files/stage.md) to score a collection of input parameters.

```python
import numpy as np
import pandas as pd
import numpy.typing as npt
import singlestoredb.notebook as nb
from tensorflow.keras.models import load_model
from singlestoredb.functions import udf

# Download model from Stage
nb.stage.download_file('my_model.keras', local_path='my_model.keras') 

# Load the model
model = load_model('my_model.keras')

@udf
async def keras_score(
    param_1: npt.NDArray[np.float32],
    param_2: npt.NDArray[np.float32],
    param_3: npt.NDArray[np.float32],
) -> npt.NDArray[np.float32]:
    """
    Score rows using a Keras model.

    Parameters
    ----------
    param_1 : np.ndarray[np.float32]
        First input parameter.
    param_2 : np.ndarray[np.float32]
        Second input parameter.
    param_3 : np.ndarray[np.float32]
        Third input parameter.

    Returns
    -------
    np.ndarray[np.float32]
        Predicted scores from the model.
    """
    X_test = pd.DataFrame({
        'param_1': param_1,
        'param_2': param_2,
        'param_3': param_3
    })
    return model.predict(X_test).reshape((-1,))

# Start Python UDF server

import singlestoredb.apps as apps
connection_info = await apps.run_udf_app()

```

## Example 5: Overriding Types with dtypes

By default, Python types map to general SQL types (for example, `int` maps to `BIGINT`). Refer to [Equivalent Data Types](https://docs.singlestore.com/#section-id235163203492027.md) for related information. Use types from `singlestoredb.functions.dtypes` to override these defaults and define precise SQL signatures.

Pass these types to the `@udf(args=[...], returns=...)` decorator. The following examples show how to override default type mappings:

* Use `SMALLINT` for integer inputs and outputs

  This example overrides the default `BIGINT` mapping for Python `int` by explicitly using `SMALLINT`. Use this approach when it is required to match an existing schema that uses smaller integer types.
  ```python
  from singlestoredb.functions import udf
  from singlestoredb.functions.dtypes import SMALLINT

  @udf(
     args=[SMALLINT(nullable=False)],
     returns=SMALLINT(nullable=False),
  )
  def double_small(x: int) -> int:
     return x * 2

  ```
* Control string length and return structured JSON data

  This example defines a bounded string input using `VARCHAR(n)` and returns structured data as JSON. Use `VARCHAR(n)` to enforce input size limits and return a JSON-encoded string when using the `JSON` type.
  ```python
  from singlestoredb.functions import udf
  from singlestoredb.functions.dtypes import JSON, VARCHAR

  @udf(
     args=[VARCHAR(128, nullable=False)],
     returns=JSON(nullable=False),
  )
  def wrap_as_json(label: str) -> str:
     import json
     return json.dumps({"label": label})
  ```
  > **📝 Note**: - Ensure `VARCHAR(n)` length matches the expected input size. Values longer than this limit may be truncated or rejected.
  > - When using `JSON` as the return type, return a JSON-encoded string.
  > - Set `nullable` to match whether your Python types allow `None`.

***

Modified at: May 22, 2026

Source: [/cloud/container-services/python-udfs/working-with-python-udfs/](https://docs.singlestore.com/cloud/container-services/python-udfs/working-with-python-udfs/)

(An index of the documentation is available at /llms.txt)
