Code Engine - Powered by Wasm

The Code Engine feature in SingleStore supports creating functions (UDFs and TVFs) using code compiled to WebAssembly (Wasm). The Code Engine uses the Wasmtime runtime to compile and run WebAssembly code. See Wasmtime for more information.

This feature supports any language that can compile to the Wasm core specification, which allows you to create UDFs/TVFs in a language of your choice using existing code libraries and run them in a sandboxed environment for enhanced security. Each Wasm function instance runs in its own in-process sandbox. It uses a linear memory model and provides hard memory protection boundaries. By default, each Wasm function sandbox is allocated 16MB of memory. You can use the GROW TO clause to modify the memory allocated to a Wasm function sandbox.

The Wasm UDF/TVF should not require runtime capabilities that the database does not provide out of the box. For example, a UDF/TVF cannot make system calls, open files, open sockets, send network messages, create processes, or create threads. SingleStore does not support these operations, in order to protect the security and integrity of its service process.

Wasm modules are serialized to the plancache. Because Wasm modules can be large, compiled images are evicted from memory when the value of Alloc_compiled_unit_sections exceeds Effective_compiled_images_eviction_memory_limit_mb. The effective limit is derived from the compiled_images_eviction_memory_limit_mb and compiled_images_eviction_memory_limit_percent variables, both of which can be set at startup and at runtime. The eviction task runs in the background once per second. Refer to Managing Plancache Memory and Disk Usage for more information.

SingleStore supports both the Basic and Canonical Wasm Application Binary Interfaces (ABIs). See Select the Wasm ABI for more information.

  • Using the Basic ABI, SingleStore supports any language that can compile to Wasm code, if the UDF accepts and returns only numeric data types.

  • Using the Canonical ABI, you can pass and return complex data types from UDF/TVF. SingleStore currently supports the following languages for Canonical ABI:

    • C/C++

    • Rust

Once a Wasm function is added to SingleStore, it becomes a part of the database. Hence, it is also included in the database backup and restore operations.

Refer to Create Wasm UDFs for information on creating a Wasm UDF.

Refer to Create Wasm TVFs for information on creating a Wasm TVF.

If your logic and data structures can be implemented in PSQL with reasonable effort in a way that performs well, use PSQL. For information on creating PSQL functions or procedures, refer to Procedural Extensions. For integrating more complex logic and data structures, or using existing C/C++ or Rust code, SingleStore recommends using Wasm-based functions.

Prerequisites

You can build Wasm extensions for SingleStore on Mac, Windows, or Linux. To build Wasm extensions, you'll need:

Select the Wasm ABI

When you create a Wasm function, you can specify one of the following Application Binary Interfaces (ABIs): Basic or Canonical. If you do not specify an ABI, Canonical is used by default.

  • Basic: This is the bare-bones Wasm ABI. It supports only 32-bit and 64-bit integers and floating point numbers.

  • Canonical: This ABI type is defined by the Canonical ABI specification (a part of the Interface Types proposal). It is a superset of Basic ABI, and it allows usage of structured and complex interface types.

Note

The user must ensure that the specified ABI type matches the Wasm module implementation. A mismatch may lead to unexpected results at runtime.
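As a rough illustration of the Basic ABI constraint, the following Rust sketch exports a function that accepts and returns only 32-bit integers. The function name power_of and its behavior are hypothetical and chosen only for illustration. The export attribute shown assumes Rust 1.82 or later; older toolchains write #[no_mangle] instead.

```rust
/// Hypothetical function suited to the Basic ABI: i32 in, i32 out.
/// Raises `base` to the power `exp`; negative exponents are treated as 0.
#[unsafe(no_mangle)]
pub extern "C" fn power_of(base: i32, exp: i32) -> i32 {
    let mut result: i32 = 1;
    for _ in 0..exp.max(0) {
        // wrapping_mul avoids a panic on overflow.
        result = result.wrapping_mul(base);
    }
    result
}
```

A string or record parameter would not fit this kind of signature; as described above, such types require the Canonical ABI.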

Canonical ABI

The Canonical ABI is a proposed part of WASI. To use the Canonical ABI, you must specify additional metadata in the form of a WIT IDL string.

The wit-bindgen tool, which generates bindings for the Canonical ABI, declares function and structure names as hyphenated strings. Using hyphenated names for UDFs/TVFs in SQL would require enclosing them in backticks (`). To avoid this, all function, parameter, and record field names loaded from Wasm modules are renamed to use underscores instead of hyphens. For example, consider the following Wasm UDF specified by the WIT definition:

user-function-name: func() -> string;

To call this Wasm UDF, we use the following syntax:

SELECT user_function_name();
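The renaming rule can be sketched in a few lines of Rust. The helper below is hypothetical and only mirrors the hyphen-to-underscore mapping described above; it is not part of SingleStore or wit-bindgen.

```rust
/// Hypothetical helper mirroring the renaming rule:
/// every hyphen in a WIT identifier becomes an underscore.
pub fn wit_name_to_sql(name: &str) -> String {
    name.replace('-', "_")
}
```

For example, wit_name_to_sql("user-function-name") yields the name used in the SELECT statement above.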

Configure Wasm

Use the following global variables to configure the Code Engine for Wasm extensibility:

| Variable Name | Description | Default Value |
|---|---|---|
| enable_wasm | Specifies if a user can create or call Wasm UDFs/TVFs. | ON |
| wasm_max_compiled_module_size | Specifies the maximum size (in bytes) that a compiled Wasm module can use. | 26214400 (25MB) |
| wasm_max_linear_memory_size | Specifies the maximum linear memory (in bytes) that an individual Wasm module can use. This will further constrain UDF/TVF-specific GROW TO values. | 16777216 (16MB or 256 Wasm pages) |
| wasm_max_raw_module_size | Specifies the maximum size (in bytes) of Wasm modules that may be loaded. The size is defined as the size of raw, uncompiled data passed in the CREATE FUNCTION statement. | 26214400 (25MB) |
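The default wasm_max_linear_memory_size of 16777216 bytes corresponds to 256 pages because a Wasm linear-memory page is 64 KiB. The following sketch shows the arithmetic; the constant and helper names are hypothetical and not part of any SingleStore API.

```rust
/// Size of one WebAssembly linear-memory page: 64 KiB.
pub const WASM_PAGE_BYTES: u64 = 64 * 1024;

/// Number of whole pages needed to hold `bytes` bytes (rounds up).
pub fn bytes_to_pages(bytes: u64) -> u64 {
    (bytes + WASM_PAGE_BYTES - 1) / WASM_PAGE_BYTES
}

/// Size in bytes of `pages` Wasm pages.
pub fn pages_to_bytes(pages: u64) -> u64 {
    pages * WASM_PAGE_BYTES
}
```

With these helpers, bytes_to_pages(16_777_216) gives 256 and pages_to_bytes(256) gives 16_777_216.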

Wasm Data Type Coercions

The following table describes how database types are coerced to and from Wasm ABI types when using explicitly-typed UDFs/TVFs. Empty cells indicate that automatic coercion is not available currently, and you must cast that SingleStore database type to another data type for which data type conversion is available:

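| SingleStore Data Type | Wasm Basic ABI Type | Wasm Canonical ABI Type |
|---|---|---|
| ARRAY | | list<...> |
| BIGINT | i64 | i64 |
| BINARY | u32 | u32 |
| BINARY(...) | | list<u8> |
| BIT | | |
| BLOB | | list<u8> |
| BOOL | i32 | u8 |
| CHAR | i32 | char |
| CHAR(...) | | string |
| DATE | | |
| DATETIME | | |
| DATETIME(6) | | |
| DECIMAL | | |
| DOUBLE | f64 | float64 |
| ENUM | | |
| FLOAT | f32 | float32 |
| GEOGRAPHY | | string (WKT format) |
| GEOGRAPHYPOINT | | string (WKT format) |
| INT | i32 | i32 |
| JSON | | string |
| LONGBLOB | | list<u8> |
| LONGTEXT | | string |
| MEDIUMBLOB | | list<u8> |
| MEDIUMINT | i32 | i32 |
| MEDIUMTEXT | | string |
| RECORD | | record |
| SET | | |
| SMALLINT | i32 | i16 |
| TEXT | | string |
| TIME | | |
| TIME(6) | | |
| TIMESTAMP | i64 | i64 |
| TIMESTAMP(6) | i64 | i64 |
| TINYBLOB | | list<u8> |
| TINYINT | i32 | i8 |
| TINYTEXT | | string |
| VARBINARY | | list<u8> |
| VARCHAR | | string |
| YEAR | i32 | i32 |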

Remarks

  • Wasm does not support Base-10 numeric or date-time data types. You must specify/cast these data types to ABI-supported data types.

  • The GEOGRAPHY and GEOGRAPHYPOINT data types are converted to strings using the WKT representation.

  • SingleStore recommends casting the ENUM data type to strings.

  • Coercions to/from canonical ABI string types require that the data be encoded using multi-byte UTF-8 (utf8mb3 or utf8mb4 collations).
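Because GEOGRAPHYPOINT values reach a Canonical ABI function as WKT strings, the function has to parse the text itself. The following is a minimal sketch, assuming the POINT(x y) form with no space before the parenthesis; the helper name is hypothetical.

```rust
/// Hypothetical helper: parses a WKT point such as "POINT(1.5 -2.25)"
/// into its two coordinates. Returns None if the text is not in that form.
pub fn parse_wkt_point(wkt: &str) -> Option<(f64, f64)> {
    let inner = wkt
        .trim()
        .strip_prefix("POINT(")?
        .strip_suffix(')')?;
    let mut parts = inner.split_whitespace();
    let x = parts.next()?.parse::<f64>().ok()?;
    let y = parts.next()?.parse::<f64>().ok()?;
    // Reject trailing tokens such as a third coordinate.
    if parts.next().is_some() {
        return None;
    }
    Some((x, y))
}
```

Returning an Option keeps malformed input from panicking inside the function; the caller decides how to handle None.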

Improve Wasm Function Performance

Here are some ways to speed up your Wasm code:

  • Compile the Wasm function in release mode, not debug mode. To compile in release mode:

    • In Rust, pass the --release flag to cargo.

    • In C/C++, pass the -O3 option to the compiler.

  • If your program can benefit by building data structures in advance, you may improve performance by maintaining a cache in your Wasm program. For example, you can cache reusable data between function calls.

    The following example caches compiled regular expressions so it does not have to compile the regular expression argument again for every row passed to the function:

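    wit_bindgen_rust::export!("s2regex.wit");

    use regex::Regex;
    use std::cell::RefCell;
    use std::collections::HashMap;

    struct S2regex;

    // Cache of compiled patterns, keyed by the pattern string and
    // reused across calls to the function.
    thread_local! {
        static COMPILED_RGXS: RefCell<HashMap<String, Regex>> = RefCell::new(HashMap::new());
    }

    impl s2regex::S2regex for S2regex {
        fn capture(input: String, pattern: String) -> String {
            COMPILED_RGXS.with(|c| {
                let mut map = c.borrow_mut();
                // Compile the pattern only the first time it is seen.
                // Note: unwrap() panics if the pattern is not a valid regular expression.
                let re = map
                    .entry(pattern)
                    .or_insert_with_key(|pattern| Regex::new(pattern).unwrap());
                // Return the first capture group, or an empty string if there is no match.
                re.captures(&input)
                    .and_then(|c| c.get(1))
                    .map(|c| c.as_str())
                    .unwrap_or_default()
                    .to_string()
            })
        }
    }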

    The following query has to compile the regular expression only for the first row:

    SELECT * FROM tbl WHERE s2regex(tbl.col, '<reg_exp_string>') = "val";

Troubleshooting

How do I find bugs in Wasm code that is called from SQL?

Debug the code outside of the database. You may use a test harness, or single-step through the code in a debugger.
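One way to follow this advice in Rust is to keep the function logic in plain Rust and exercise it with native unit tests before compiling to Wasm. The following is a minimal sketch; the first_word function is hypothetical and stands in for your own UDF logic.

```rust
/// Hypothetical UDF logic, kept free of database and Wasm dependencies
/// so it can be compiled and tested natively.
pub fn first_word(input: &str) -> String {
    input.split_whitespace().next().unwrap_or("").to_string()
}

#[cfg(test)]
mod tests {
    use super::first_word;

    #[test]
    fn returns_first_word() {
        assert_eq!(first_word("hello wasm world"), "hello");
    }

    #[test]
    fn blank_input_gives_empty_string() {
        assert_eq!(first_word("   "), "");
    }
}
```

Running cargo test on the host executes these checks natively, where a debugger can also single-step through the code.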

FAQs

  • Can I use a Wasm binary outside of the SingleStore database once it is imported into the database?

    No. Once a Wasm binary is imported into a SingleStore database, you cannot use it for applications outside the database. However, you can make a copy of the binary and use it instead.

  • What happens if there is a mismatch between the WIT file and the generated code?

    A mismatch may cause a Wasm trap or unexpected results. The exact behavior depends on how the Wasm program interprets the arguments.

  • Can I reverse engineer a Wasm binary once it is imported into the database?

    No. Once a Wasm binary is imported into the database, the binary data cannot be viewed; it can only be executed. Hence, you can reverse engineer the Wasm binary only behaviorally, not by disassembling it.

  • What happens when a Wasm function crashes or runs out of memory?

    If a Wasm function crashes or runs out of memory, the query fails and returns an error. Where possible, the error message includes a stack trace into the Wasm function.
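In Rust compiled to Wasm, a panic typically surfaces as a trap, so one defensive option is to handle failure cases explicitly rather than letting them panic. The following is a minimal sketch; the function name and the choice of 0 as a fallback value are assumptions made for illustration.

```rust
/// Hypothetical UDF body: integer division that never panics.
/// Division by zero and the i32::MIN / -1 overflow both fall back to 0
/// instead of aborting the function.
pub fn safe_div(a: i32, b: i32) -> i32 {
    a.checked_div(b).unwrap_or(0)
}
```

Whether a fallback value or a failed query is preferable depends on the application; a silent default can hide bad input.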


Last modified: August 22, 2024
