Code Engine - Powered by Wasm
The Code Engine feature in SingleStoreDB supports creating functions (UDFs and TVFs) using code compiled to WebAssembly (Wasm). The Code Engine uses the wasmtime runtime to compile and run WebAssembly code. See Wasmtime for more information.
This feature supports any language that can compile to the Wasm core specification, which allows you to create UDFs/TVFs in a language of your choice using existing code libraries and run them in a sandboxed environment for enhanced security. Each Wasm function instance runs in its own in-process sandbox. It uses a linear memory model and provides hard memory protection boundaries. By default, each Wasm function sandbox is allocated 16 MB of memory. You can use the GROW TO clause to modify the memory allocated to a Wasm function sandbox.
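As a rough illustration of the linear memory model, the sketch below converts a byte budget into Wasm memory pages. The 64 KiB page size comes from the Wasm core specification. The helper name `pages_for` is hypothetical, and the example assumes that the default of 16 MB means 16 MiB.

```rust
/// Size of one WebAssembly linear-memory page, fixed by the Wasm core specification.
const WASM_PAGE_BYTES: u64 = 64 * 1024;

/// Number of whole Wasm pages needed to hold `bytes` of linear memory (rounds up).
/// Hypothetical helper for illustration; it is not part of SingleStoreDB.
fn pages_for(bytes: u64) -> u64 {
    (bytes + WASM_PAGE_BYTES - 1) / WASM_PAGE_BYTES
}
```

Under that assumption, `pages_for(16 * 1024 * 1024)` evaluates to 256, so the default sandbox corresponds to 256 pages of linear memory.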
A Wasm UDF/TVF should not require runtime capabilities that the database does not provide. For example, a UDF/TVF cannot make system calls, open files, open sockets, send network messages, create processes, or create threads. SingleStoreDB does not support these operations in order to protect the security and integrity of its service process.
Wasm modules are serialized to the plancache. Because Wasm modules can be large, compiled images are evicted from memory if the value of Alloc_compiled_unit_sections exceeds Effective_compiled_images_eviction_memory_limit_mb. The value of Effective_compiled_images_eviction_memory_limit_mb is derived from the compiled_images_eviction_memory_limit_mb and compiled_images_eviction_memory_limit_percent variables. Both variables can be set at startup and at runtime. Refer to In-Depth Variable Definitions for more information. The eviction task runs in the background once per second.
SingleStoreDB supports both the Basic and Canonical Wasm Application Binary Interfaces (ABIs). See Select the Wasm ABI for more information.
Using the Basic ABI, SingleStoreDB supports any language that can compile to Wasm code, if the UDF accepts and returns only numeric data types.
Using the Canonical ABI, you can pass complex data types to UDFs/TVFs and return complex data types from them. SingleStoreDB currently supports the following languages for the Canonical ABI:
C/C++
Rust
Once a Wasm function is added to SingleStoreDB, it becomes a part of the database. Hence, it is also included in the database backup and restore operations.
See Create Wasm UDFs for information on creating a Wasm UDF.
See Create Wasm TVFs for information on creating a Wasm TVF.
Prerequisites
You can build Wasm extensions for SingleStoreDB on Mac, Windows, or Linux. To build Wasm extensions, you'll need:
(For Mac) Homebrew package manager.
Select the Wasm ABI
When you create a Wasm function, you can specify one of the following Application Binary Interfaces (ABIs): Basic or Canonical. If you do not specify an ABI explicitly, Canonical is used.
Basic: This is the bare-bones Wasm ABI. It supports only 32-bit and 64-bit integers and floating point numbers.
Canonical: This ABI type is defined by the Canonical ABI specification (a part of the Interface Types proposal). It is a superset of Basic ABI, and it allows usage of structured and complex interface types.
Note
The user must ensure that the specified ABI type matches the Wasm module implementation. A mismatch may lead to unexpected results at runtime.
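As a minimal sketch of a function whose signature fits the Basic ABI, the Rust code below takes and returns only 32-bit integers. The function name `power_of` is hypothetical. `#[no_mangle]` and `extern "C"` are the usual way to export a symbol with a plain C-style signature from a Rust crate compiled to Wasm. On the Rust 2024 edition, the attribute is written `#[unsafe(no_mangle)]`.

```rust
/// Raises `base` to the power `exp` using only 32-bit integers,
/// so both the arguments and the result are plain Wasm numeric types.
/// A zero or negative `exp` yields 1 because the loop body never runs.
#[no_mangle]
pub extern "C" fn power_of(base: i32, exp: i32) -> i32 {
    let mut result: i32 = 1;
    for _ in 0..exp {
        // wrapping_mul wraps on overflow instead of panicking,
        // which would otherwise surface as a Wasm trap in debug builds.
        result = result.wrapping_mul(base);
    }
    result
}
```

Because the signature uses only `i32`, no WIT definition or generated bindings are needed on the Rust side for this sketch.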
Canonical ABI
The canonical ABI is a proposed part of WASI. To use the canonical ABI, you must specify additional metadata in the form of a WIT IDL string.
The wit-bindgen tool, which generates bindings for the Canonical ABI, declares function and structure names as hyphenated strings. Because hyphenated names would have to be enclosed in backticks (`) in SQL, all function, parameter, and record field names loaded from Wasm modules are renamed to use underscores instead of hyphens. For example, consider the following Wasm UDF specified by the WIT definition:
user-function-name: func() -> string;
To call this Wasm UDF, we use the following syntax:
SELECT user_function_name();
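The renaming described above amounts to a simple character substitution. The following sketch shows the mapping in Rust. The helper name `sql_name_for` is hypothetical and only illustrates the rule. It is not how SingleStoreDB implements the renaming internally.

```rust
/// Maps a WIT identifier to its SQL-visible name by replacing
/// every hyphen with an underscore.
/// Hypothetical helper for illustration only.
fn sql_name_for(wit_name: &str) -> String {
    wit_name.replace('-', "_")
}
```

For the WIT definition above, `sql_name_for("user-function-name")` returns `user_function_name`, which matches the name used in the SELECT statement.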
Configure Wasm
Use the following global variables to configure the Code Engine for Wasm extensibility:
Variable Name | Description | Default Value |
---|---|---|
| Specifies if a user can create or call Wasm UDFs/TVFs. |
|
| Specifies the maximum size (in bytes) that a compiled Wasm module can use. |
|
| Specifies the maximum linear memory (in bytes) that an individual Wasm module can use. This will further constrain UDF/TVF-specific |
|
| Specifies the maximum size (in bytes) of Wasm modules that may be loaded. The size is defined as the size of raw, uncompiled data passed in the |
|
Wasm Data Type Coercions
The following table describes how database types are coerced to and from Wasm ABI types when using explicitly-typed UDFs/TVFs. Empty cells indicate that automatic coercion is not available currently, and you must cast that SingleStoreDB database type to another data type for which data type conversion is available:
SingleStoreDB Data Type | Wasm Basic ABI Type | Wasm Canonical ABI Type |
---|---|---|
ARRAY | | list<...> |
BIGINT | i64 | i64 |
BINARY | u32 | u32 |
BINARY(...) | | list<u8> |
BIT | | |
BLOB | | list<u8> |
BOOL | i32 | u8 |
CHAR | i32 | char |
CHAR(...) | | string |
DATE | | |
DATETIME | | |
DATETIME(6) | | |
DECIMAL | | |
DOUBLE | f64 | float64 |
ENUM | | |
FLOAT | f32 | float32 |
GEOGRAPHY | | string (WKT format) |
GEOGRAPHYPOINT | | string (WKT format) |
INT | i32 | i32 |
JSON | | string |
LONGBLOB | | list<u8> |
LONGTEXT | | string |
MEDIUMBLOB | | list<u8> |
MEDIUMINT | i32 | i32 |
MEDIUMTEXT | | string |
RECORD | | record |
SET | | |
SMALLINT | i32 | i16 |
TEXT | | string |
TIME | | |
TIME(6) | | |
TIMESTAMP | i64 | i64 |
TIMESTAMP(6) | i64 | i64 |
TINYBLOB | | list<u8> |
TINYINT | i32 | i8 |
TINYTEXT | | string |
VARBINARY | | list<u8> |
VARCHAR | | string |
YEAR | i32 | i32 |
Remarks
Wasm does not support Base-10 numeric or date-time data types. You must specify/cast these data types to ABI-supported data types.
The GEOGRAPHY and GEOGRAPHYPOINT data types are converted to strings using the WKT representation.
SingleStoreDB recommends casting the ENUM data type to strings.
Coercions to/from canonical ABI string types require that the data be encoded using multi-byte UTF-8 (utf8mb3 or utf8mb4 collations).
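Because Base-10 numeric types have no Wasm counterpart, one possible approach is to scale a DECIMAL value to an integer in SQL and pass it as a BIGINT. The sketch below assumes that convention: amounts arrive as whole cents and the rate as basis points. The function name `apply_rate_cents` and the scaling scheme are hypothetical and not prescribed by SingleStoreDB.

```rust
/// Applies a rate to an amount using integer arithmetic only.
/// `amount_cents` is the amount scaled by 100 (assumed convention).
/// `rate_bps` is the rate in basis points, where 10_000 bps equals 100%.
/// The result is truncated toward zero.
fn apply_rate_cents(amount_cents: i64, rate_bps: i64) -> i64 {
    // Widen to i128 so the intermediate product cannot overflow.
    let product = amount_cents as i128 * rate_bps as i128;
    (product / 10_000) as i64
}
```

With this scheme the caller divides the result by 100 in SQL to recover a DECIMAL value. Whether truncation is the right rounding rule depends on the application.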
Improve Wasm Function Performance
Here are some examples of ways to increase the speed of your Wasm code:
Compile the Wasm function in release mode rather than debug mode. To compile in release mode in Rust, pass the --release flag to cargo. In C/C++, pass the -O3 option to the compiler.
If your program can benefit from building data structures in advance, you may improve performance by maintaining a cache in your Wasm program. For example, you can cache reusable data between function calls.
The following example caches compiled regular expressions so it does not have to compile the regular expression argument again for every row passed to the function:
wit_bindgen_rust::export!("s2regex.wit");

use regex::Regex;
use std::cell::RefCell;
use std::collections::HashMap;

struct S2regex;

thread_local! {
    // Cache of compiled regular expressions, keyed by pattern string.
    static COMPILED_RGXS: RefCell<HashMap<String, Regex>> = RefCell::new(HashMap::new());
}

impl s2regex::S2regex for S2regex {
    fn capture(input: String, pattern: String) -> String {
        COMPILED_RGXS.with(|c| {
            let mut map = c.borrow_mut();
            // Compile the pattern only the first time it is seen, then reuse it.
            let re = map
                .entry(pattern)
                .or_insert_with_key(|pattern| Regex::new(pattern).unwrap());
            // Return the first capture group, or an empty string if there is none.
            re.captures(&input)
                .and_then(|c| c.get(1))
                .map(|c| c.as_str())
                .unwrap_or_default()
                .to_string()
        })
    }
}
The following query has to compile the regular expression only for the first row:
SELECT * FROM tbl WHERE s2regex(tbl.col, '<reg_exp_string>') = "val";
Troubleshooting
How do I find bugs in Wasm code that is called from SQL?
Debug the code outside of the database. You may use a test harness, or single-step through the code in a debugger.
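One way to set up such a test harness in Rust is to keep the core logic in a plain function with no database or binding code, and exercise it with ordinary unit tests on the host machine. The sketch below assumes a hypothetical function `first_word`. The names and test cases are illustrative only.

```rust
/// Core logic kept free of database and binding code so it can be
/// compiled and debugged natively. Returns the first
/// whitespace-delimited word, or an empty string if there is none.
fn first_word(input: &str) -> &str {
    input.split_whitespace().next().unwrap_or("")
}

#[cfg(test)]
mod tests {
    use super::first_word;

    #[test]
    fn returns_first_word() {
        assert_eq!(first_word("  hello world"), "hello");
    }

    #[test]
    fn empty_input_yields_empty_string() {
        assert_eq!(first_word(""), "");
    }
}
```

Running `cargo test` on the host executes these tests without involving the database, and the same function can be single-stepped in a native debugger.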
FAQs
Can I use a Wasm binary outside of the SingleStoreDB database once it is imported into the database?
No. Once a Wasm binary is imported into a SingleStoreDB database, you cannot use it for applications outside the database. However, you can make a copy of the binary and use it instead.
What happens if there is a mismatch between the WIT file and the generated code?
A mismatch may cause a Wasm trap or unexpected results. The exact behavior depends on how the Wasm program interprets the arguments.
Can I reverse engineer a Wasm binary once it is imported into the database?
No. Once a Wasm binary is imported into the database, the binary data cannot be viewed. It can only be executed. Hence, you can reverse engineer the Wasm binary only behaviorally, not by disassembling it.
What happens when a Wasm function crashes or runs out of memory?
If a Wasm function crashes or runs out of memory, the query fails and returns an error. Where possible, the error message includes a stack trace into the Wasm function.