ML Functions
Overview
Machine learning (ML) functions enable trained models to be run directly within SQL queries.
Introduction to Machine Learning
Machine learning is a field of study in artificial intelligence that develops and applies methods for learning patterns from historical data and using those patterns to make predictions or decisions on new data.
Classification
Classification is a supervised learning technique that assigns each input to one of a predefined set of classes or labels.
Following are the types of classification:

- Binary Classification: Predicts one of two possible outcomes (for example, fraud vs. non-fraud).
- Multiclass Classification: Predicts one label from multiple possible categories (for example, product type A, B, or C).
- Multilabel Classification: Assigns multiple labels to a single data point (for example, tagging an image with "beach" and "sunset").

Examples:

- A credit-card transaction classified as "fraudulent" or "legitimate."
- Customer support tickets categorized as "billing," "technical issue," or "account upgrade."
The ML_CLASSIFY function is a supervised machine learning function for classification tasks. Pass rows of data to the ML_CLASSIFY function to return predicted class labels.
Anomaly Detection
Anomaly detection identifies data points that deviate significantly from expected patterns. Anomalies can indicate:

- Performance issues (for example, server overload)
- System faults (for example, failed jobs or memory leaks)
- Opportunities (for example, traffic spikes caused by a marketing campaign)
For example, if a cluster's CPU usage normally stays between 20–60% and suddenly rises to 95%, the spike is an anomaly.
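As a minimal illustration of the statistical approach (this is a toy sketch, not the ML_ANOMALY_DETECT implementation), a simple z-score check shows why a 95% reading stands out against a 20–60% baseline:

```python
# Toy z-score anomaly check for the CPU example above.
# Readings are made up; the last one is the spike.
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]

cpu_usage = [35, 42, 28, 55, 40, 33, 47, 95]  # the last reading is the spike
print(zscore_outliers(cpu_usage))  # [95]
```

The normal readings sit well within two standard deviations of the mean, so only the 95% spike is flagged.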
Time-Series Anomaly Detection
Time-series anomaly detection analyzes data collected over time.
Following are the types of time-series anomaly detection:
- Supervised: Supervised models use labeled anomalies to learn failure patterns.
- Unsupervised: Unsupervised models learn normal behavior from historical data and flag deviations without labels.
The ML_ANOMALY_DETECT function currently supports unsupervised time-series anomaly detection.
Install ML Functions
To install ML Functions, navigate to AI > AI & ML Functions and select the deployment on which to install ML Functions.
Once the ML Functions are installed, query them in the SQL Editor or SingleStore Notebooks.
| Category | Functions |
|---|---|
| Statistical and Predictive Functions | ML_CLASSIFY, ML_ANOMALY_DETECT |
Statistical and Predictive Functions
ML_CLASSIFY
Performs binary and multi-class classification on a dataset using standard machine learning algorithms, such as:

- Logistic Regression
- Random Forest
- Gradient Boosting
Syntax
ML_CLASSIFY(model_name, TO_JSON(selected_data.*))
Arguments
- model_name: Name of the trained ML model to use.
- selected_data: A row or set of rows selected for prediction.
Return Type
string
Usage
Usage examples:

- Basic usage
- Basic usage with
- Insert predictions into a table
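ML_CLASSIFY returns a string containing JSON. The example notebook below extracts the `predicted_label` and `confidence` keys from it with JSON_EXTRACT_STRING and JSON_EXTRACT_DOUBLE; a client-side sketch of the same parsing in Python (the literal values here are invented for illustration):

```python
# Hypothetical shape of an ML_CLASSIFY result string: a JSON object with
# the predicted class label and a confidence score, matching the keys the
# example notebook extracts in SQL.
import json

raw = '{"predicted_label": "Survived", "confidence": 0.87}'

result = json.loads(raw)
print(result["predicted_label"])  # Survived
print(result["confidence"])       # 0.87
```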
ML_ANOMALY_DETECT
Detects outliers and anomalies in datasets using statistical or machine learning-based methods, such as:

- Statistical: z-score, interquartile range (IQR)
- ML-based: Isolation Forest, One-Class SVM
Syntax
ML_ANOMALY_DETECT(model_name, TO_JSON(selected_data.*))
Arguments
- model_name: Name of the trained ML model to use.
- selected_data: A row or set of rows selected for prediction.
Return Type
string
Usage
Usage examples:

- Basic usage
- Basic usage with
- Insert predictions into a table
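ML_ANOMALY_DETECT also returns a string containing JSON. A sketch of one result row in Python, using the keys (`TS`, `Series`, `Actual`, `Forecast`, `Lower bound`, `Upper bound`, `is_Anomaly`) that the anomaly-detection notebook below expands with pandas; the literal values here are invented for illustration:

```python
# Hypothetical shape of one ML_ANOMALY_DETECT result row.
import json

raw = ('{"TS": "2024-01-01 10:00:00", "Series": "Online", "Actual": 950.0, '
       '"Forecast": 120.0, "Lower bound": 20.0, "Upper bound": 220.0, '
       '"is_Anomaly": true}')

row = json.loads(raw)
# A point is flagged when the actual value falls outside the forecast bounds.
outside = not (row["Lower bound"] <= row["Actual"] <= row["Upper bound"])
print(row["is_Anomaly"], outside)  # True True
```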
Train a New ML Model
To train a new ML model, follow these steps:
- Navigate to AI > Models.
- Select the ML Models tab and then select Train New ML Model.
- In the Select Function dialog, select one of the following ML functions:
  - ML_CLASSIFY
  - ML_ANOMALY_DETECT
- Select Next to configure the model.
Configure Model

| Field | Description |
|---|---|
| Model Name | Enter the name of the ML model. |
| Training Description | Enter the training description. |
| Workspace | Select the SingleStore deployment (workspace) the notebook connects to. Specifying a workspace allows natively connecting to the SingleStore databases referenced in the notebook. |
| Compute Size | Select one of the available compute sizes. |
| Run as | Run the notebook for training a model with or without personal credentials. |

Select Next.
Select Training Data
| Field | Description |
|---|---|
| Database | Select the database that contains the training data. |
| Table | Select the table from the selected database to train the machine learning model. |
| Target Column | Select the column that represents the prediction target for the model. |
| Feature Selection Mode | Specify how feature columns are selected. |
| Feature Column | Select one or more columns to be used as input features for training the model. |

Preview the data and select Next.
Review the Summary and generated Fusion SQL syntax in the Generated SQL Script.
The generated script:

- Creates and trains an ML model
- Uses data from the selected table in the selected database
- Predicts values of the target column
- Runs on the selected compute instance
- Uses all available features by default
Following is the syntax of Fusion SQL script:
```
%s2ml train <machine_learning_algorithm> --model <model_name> --db <database_name> --input_table <table_name> --target_column <target_column> --description <training_description> --runtime <compute_instance> --selected_features { \"mode\": <feature_selection_mode>, \"features\": <feature_column> }
```
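As a concrete illustration, the classification model trained in the example notebook below corresponds to a command of roughly this form (the model, database, table, and column names all come from that example; treat the exact flag values as a sketch):

```
%s2ml train classification --model titanic_survival_predictor --db temp --input_table titanic_training_data --target_column survival_status --description "Titanic passenger survival prediction" --runtime cpu-small --selected_features { \"mode\": \"*\", \"features\": null }
```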
Select Start Training to train the ML model.
Example Notebooks
The following notebooks demonstrate how to use ML Functions:
ML Functions: Classification
Demonstrate ML function Classify
Note
You can use your existing Standard or Premium workspace with this Notebook.
This feature is currently in Private Preview. Please reach out to support@singlestore.com to confirm if this feature can be enabled in your org.
This Jupyter notebook will help you:
- Load the titanic dataset
- Store the data in a SingleStore table
- Use ML Functions for training and predictions
- Run some common Data Analysis tasks
Prerequisites: Ensure ML Functions are installed on your deployment (AI > AI & ML Functions).
```sql
%%sql
-- Ensure that ML_CLASSIFY is listed in the Functions_in_cluster column
show functions in cluster;
```
!pip install -q httplib2 seaborn pandas numpy scikit-learn
Load and Prepare the Titanic Dataset
We'll use the famous Titanic dataset from seaborn, which contains passenger information from the RMS Titanic. The goal is to predict whether a passenger survived based on features like age, sex, ticket class, and fare.
```sql
%%sql
CREATE DATABASE IF NOT EXISTS temp;
USE temp;
```
```python
import seaborn as sns
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

# Load the Titanic dataset
titanic_df = sns.load_dataset('titanic')

# Display basic information
print(f"Dataset shape: {titanic_df.shape}")
print(f"\nColumn names: {list(titanic_df.columns)}")
print(f"\nFirst 5 rows:")
print(titanic_df.head())

# Check survival distribution
print(f"\nSurvival Distribution:")
print(titanic_df['survived'].value_counts())
```
Clean and Prepare Features
We'll select the most important features and handle missing values to create a clean dataset for training.
```python
# Select relevant columns for prediction
columns_to_use = ['survived', 'pclass', 'sex', 'age', 'sibsp', 'parch', 'fare', 'embarked']
titanic_clean = titanic_df[columns_to_use].copy()

# Fill missing values
titanic_clean['age'] = titanic_clean['age'].fillna(titanic_clean['age'].median())
titanic_clean['fare'] = titanic_clean['fare'].fillna(titanic_clean['fare'].median())
titanic_clean['embarked'] = titanic_clean['embarked'].fillna('S')  # Most common port

# Drop any remaining rows with missing values
titanic_clean = titanic_clean.dropna()

# Convert survived to text labels for classification
titanic_clean['survival_status'] = titanic_clean['survived'].map({0: 'Died', 1: 'Survived'})

# Drop the original numeric survived column
titanic_clean = titanic_clean.drop('survived', axis=1)

print(f"Clean dataset shape: {titanic_clean.shape}")
print(f"\nMissing values per column:")
print(titanic_clean.isnull().sum())
print(f"\nSurvival status distribution:")
print(titanic_clean['survival_status'].value_counts())
print(f"\nFirst 5 rows of clean data:")
print(titanic_clean.head())
```
Split Data into Training and Test Sets
We'll split the data into 80% training and 20% test sets to evaluate model performance.
```python
# Split into train (80%) and test (20%) sets
train_df, test_df = train_test_split(
    titanic_clean,
    test_size=0.2,
    random_state=42,
    stratify=titanic_clean['survival_status']
)

print(f"Training set size: {len(train_df)} passengers")
print(f"Test set size: {len(test_df)} passengers")
print(f"\nTraining set survival distribution:")
print(train_df['survival_status'].value_counts())
print(f"\nTest set survival distribution:")
print(test_df['survival_status'].value_counts())
```
```sql
%%sql
DROP TABLE IF EXISTS titanic_training_data;
DROP TABLE IF EXISTS titanic_test_data;
DROP TABLE IF EXISTS titanic_predictions;

CREATE TABLE titanic_training_data (
    pclass INT,
    sex VARCHAR(10),
    age FLOAT,
    sibsp INT,
    parch INT,
    fare FLOAT,
    embarked VARCHAR(1),
    survival_status VARCHAR(10)
);

CREATE TABLE titanic_test_data (
    pclass INT,
    sex VARCHAR(10),
    age FLOAT,
    sibsp INT,
    parch INT,
    fare FLOAT,
    embarked VARCHAR(1),
    survival_status VARCHAR(10)
);

CREATE TABLE titanic_predictions (
    pclass INT,
    sex VARCHAR(10),
    age FLOAT,
    sibsp INT,
    parch INT,
    fare FLOAT,
    embarked VARCHAR(1),
    actual_status VARCHAR(10),
    predicted_status JSON
);
```
Load Data into SingleStore Tables
We'll use pandas to insert the training and test data into our SingleStore tables.
```python
import singlestoredb as s2

# Create engine with database specified
engine = s2.create_engine(database='temp')

# Insert training data
train_df.to_sql('titanic_training_data', con=engine, if_exists='append', index=False, method='multi')

# Insert test data
test_df.to_sql('titanic_test_data', con=engine, if_exists='append', index=False, method='multi')

print(f"Inserted {len(train_df)} rows into titanic_training_data")
print(f"Inserted {len(test_df)} rows into titanic_test_data")
```
Verify Data Load
Let's verify that our data was loaded correctly and review the passenger demographics.
```sql
%%sql
SELECT COUNT(*) as training_count FROM titanic_training_data;
```

```sql
%%sql
SELECT COUNT(*) as test_count FROM titanic_test_data;
```

```sql
%%sql
SELECT
    survival_status,
    COUNT(*) as passenger_count,
    ROUND(AVG(age), 1) as avg_age,
    ROUND(AVG(fare), 2) as avg_fare
FROM titanic_training_data
GROUP BY survival_status;

SELECT * FROM titanic_training_data LIMIT 5;
```
Train the ML Classification Model
Now we'll train an ML model using the %s2ml train magic command. This will use SingleStore's ML Functions to train a classification model that predicts passenger survival.
Note: Training may take several minutes depending on the compute size selected. The model will learn patterns like "women and children first" and the impact of ticket class on survival.
```
%%s2ml train as training_result
task: classification
model: titanic_survival_predictor
db: temp
input_table: titanic_training_data
target_column: survival_status
description: "Titanic passenger survival prediction based on demographics and ticket info"
runtime: cpu-small
selected_features: {"mode":"*","features":null}
```
Check Training Results
The training result is assigned to the variable training_result. Let's examine the training details.
```python
# Display the training result
training_result
```
Monitor Training Status
Use the %s2ml status command to view the model details and training status. The status will be one of: Pre-processing, Training, Completed, or Error.
%s2ml status --model titanic_survival_predictor
Note
Wait for training to complete before proceeding to the next section
You can re-run the cell above to check the status. Once the pipeline_status shows "Ready", you can proceed with predictions.
Run Sample Predictions
Once training is complete, let's run predictions on a few sample passengers from our test dataset to see how the model performs. Ensure that the correct database is still selected.
```sql
%%sql
USE temp;
```
```sql
%%sql
SELECT
    cluster.ML_CLASSIFY('titanic_survival_predictor', TO_JSON(passenger.*)) as predicted_status,
    passenger.survival_status as actual_status,
    passenger.pclass as ticket_class,
    passenger.sex,
    passenger.age,
    passenger.fare
FROM (SELECT * FROM titanic_test_data LIMIT 10) AS passenger;
```
Run Predictions on Full Test Dataset
Now let's run predictions on the entire test dataset and store the results in our predictions table.
```sql
%%sql
INSERT INTO titanic_predictions (
    pclass, sex, age, sibsp, parch, fare, embarked,
    actual_status, predicted_status
)
SELECT
    passenger.pclass,
    passenger.sex,
    passenger.age,
    passenger.sibsp,
    passenger.parch,
    passenger.fare,
    passenger.embarked,
    passenger.survival_status as actual_status,
    cluster.ML_CLASSIFY('titanic_survival_predictor', TO_JSON(passenger.*)) as predicted_status
FROM titanic_test_data AS passenger;
```
Evaluate Model Performance
Let's analyze the prediction accuracy by comparing actual vs predicted survival status.
```sql
%%sql
SELECT
    COUNT(*) as total_predictions,
    SUM(CASE WHEN actual_status = JSON_EXTRACT_STRING(predicted_status, 'predicted_label') THEN 1 ELSE 0 END) as correct_predictions,
    ROUND(100.0 * SUM(CASE WHEN actual_status = JSON_EXTRACT_STRING(predicted_status, 'predicted_label') THEN 1 ELSE 0 END) / COUNT(*), 2) as accuracy_percentage
FROM titanic_predictions;
```
Analyze Survival Factors
Let's examine how different passenger characteristics influenced survival predictions.
```sql
%%sql
-- Survival rate by sex
SELECT
    sex,
    COUNT(*) as total_passengers,
    SUM(CASE WHEN actual_status = 'Survived' THEN 1 ELSE 0 END) as actual_survivors,
    ROUND(100.0 * SUM(CASE WHEN actual_status = 'Survived' THEN 1 ELSE 0 END) / COUNT(*), 1) as survival_rate_pct
FROM titanic_predictions
GROUP BY sex
ORDER BY survival_rate_pct DESC;
```

```sql
%%sql
-- Survival rate by passenger class
SELECT
    pclass as ticket_class,
    COUNT(*) as total_passengers,
    SUM(CASE WHEN actual_status = 'Survived' THEN 1 ELSE 0 END) as actual_survivors,
    ROUND(100.0 * SUM(CASE WHEN actual_status = 'Survived' THEN 1 ELSE 0 END) / COUNT(*), 1) as survival_rate_pct,
    ROUND(AVG(fare), 2) as avg_fare_paid
FROM titanic_predictions
GROUP BY pclass
ORDER BY pclass;
```
Examine Misclassified Passengers
Let's look at passengers where the model made incorrect predictions to understand potential model limitations.
```sql
%%sql
SELECT
    actual_status,
    JSON_EXTRACT_STRING(predicted_status, 'predicted_label') as predicted_label,
    JSON_EXTRACT_DOUBLE(predicted_status, 'confidence') as confidence,
    pclass as ticket_class,
    sex,
    age,
    sibsp as siblings_spouses,
    parch as parents_children,
    fare,
    embarked
FROM titanic_predictions
WHERE actual_status != JSON_EXTRACT_STRING(predicted_status, 'predicted_label')
LIMIT 15;
```
Cleanup
```sql
%%sql
DROP TABLE IF EXISTS titanic_training_data;
DROP TABLE IF EXISTS titanic_test_data;
DROP TABLE IF EXISTS titanic_predictions;
DROP DATABASE IF EXISTS temp;
```

ML Functions: Anomaly Detection
Demonstrate ML function Anomaly Detect
Note
You can use your existing Standard or Premium workspace with this Notebook.
This feature is currently in Private Preview. Please reach out to support@singlestore.com to confirm if this feature can be enabled in your org.
This Jupyter notebook will help you:
- Download the bank transaction dataset from Kaggle for anomaly detection
- Store the data in a SingleStore table
- Use ML Functions for training and predictions
- Visualize the results of anomaly detection on test dataset
Prerequisites: Ensure ML Functions are installed on your deployment (AI > AI & ML Functions).
Step 1: Import necessary Libraries
pip install -q kagglehub
```python
import os
import pandas as pd
import singlestoredb as s2
import json
import kagglehub
import getpass
from singlestoredb import create_engine
from IPython.display import display
import plotly.graph_objects as go
from plotly.subplots import make_subplots
```
Step 2: Test Connection to SingleStore
```python
# Ensure that you have selected a SingleStore connection (workspace) before running this cell
try:
    conn = s2.connect()
    conn.autocommit(True)  # Set autocommit for notebook simplicity
    print("Connection successful!")
except Exception as e:
    print(f"Connection failed: {e}")
```
Step 3: Create Database and Tables
```python
# Create the database
db_name = "temp"
try:
    conn.cursor().execute(f"CREATE DATABASE IF NOT EXISTS {db_name};")
    conn.cursor().execute(f"USE {db_name};")
    print(f"Database '{db_name}' is ready.")
except Exception as e:
    print(e)
```
Step 4: Prepare and Load Data
We will use a publicly available Kaggle Bank Transactions dataset. We'll download the data into Pandas DataFrames and then load them into SingleStore.
Note
You may need to update your firewall allowlist so that this notebook instance can access the Kaggle server URL. If the notebook doesn't have access, a toast message appears when you run the following code cell, guiding you to add the URL.
```python
print("Downloading dataset from Kaggle...")
path = kagglehub.dataset_download("valakhorasani/bank-transaction-dataset-for-fraud-detection")
print(f"Dataset downloaded to: {path}")

# Read the CSV file
df = pd.read_csv(f"{path}/bank_transactions_data_2.csv")
df["TransactionDate"] = pd.to_datetime(df["TransactionDate"], format="%Y-%m-%d %H:%M:%S", errors="coerce")
df["PreviousTransactionDate"] = pd.to_datetime(df["PreviousTransactionDate"], format="%Y-%m-%d %H:%M:%S", errors="coerce")

# Display dataset info
print(f"\nDataset shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
print("\nFirst few rows:")
df.head()
```
```python
try:
    # Create a SQLAlchemy engine
    engine = s2.create_engine(database=db_name)
    # Use pandas.to_sql to load the complete data to a table
    df.to_sql("metrics", con=engine, if_exists='append', index=False)
    print("Data loading complete.")
except Exception as e:
    print(f"Error loading data: {e}")
```
Create Train Test splits from complete data
Ensure that rows belonging to the same sequence are grouped into the same split. The following SQL query takes care of this split:
```sql
%%sql
USE "temp";
DROP TABLE IF EXISTS training_data;
DROP TABLE IF EXISTS test_data;

-- Training set
CREATE TABLE training_data AS
SELECT *
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (ORDER BY TransactionDate) AS rn,
        COUNT(*) OVER () AS total_rows
    FROM temp.metrics
) t
WHERE rn <= 0.8 * total_rows
ORDER BY TransactionDate;

-- Test set
CREATE TABLE test_data AS
SELECT *
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (ORDER BY TransactionDate) AS rn,
        COUNT(*) OVER () AS total_rows
    FROM temp.metrics
) t
WHERE rn > 0.8 * total_rows
ORDER BY TransactionDate;
```
Pre-process the train data set to include recent context rows
- Combine the last N (for example, 20) most recent rows from training_data with all rows from test_data, forming a continuous chronological dataset.
- The last N rows help provide recent context for model evaluation.
- The most recent 20 rows are then excluded from the final result since they are used only for observing short-term trends.
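The same combination step can be sketched client-side in pandas (a hypothetical equivalent; the notebook performs it in SQL, and the frame and column names here merely mirror the notebook with made-up data):

```python
# Prepend the last N training rows to the test rows so the model has recent
# context. Data below is synthetic, for illustration only.
import pandas as pd

train = pd.DataFrame({
    "TransactionDate": pd.date_range("2024-01-01", periods=50, freq="h"),
    "TransactionAmount": range(50),
})
test = pd.DataFrame({
    "TransactionDate": pd.date_range("2024-01-03 02:00", periods=10, freq="h"),
    "TransactionAmount": range(50, 60),
})

N = 20
recent_context = train.sort_values("TransactionDate").tail(N)
test_with_history = (
    pd.concat([recent_context, test])
    .sort_values("TransactionDate")
    .reset_index(drop=True)
)
print(len(test_with_history))  # 30: N context rows + all test rows
```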
```sql
%%sql
USE temp;

-- Drop old context table if it exists
DROP TABLE IF EXISTS test_data_with_history;

-- Create a new test_data_with_history table:
-- includes the last 20 rows from training_data + all rows from test_data
CREATE TABLE test_data_with_history AS
SELECT *
FROM (
    SELECT *
    FROM (
        SELECT *
        FROM training_data
        ORDER BY TransactionDate DESC
        LIMIT 20
    ) recent_training
    UNION ALL
    SELECT *
    FROM test_data
)
ORDER BY TransactionDate ASC;
```
Step 5: Train the Anomaly Detection Model
Now we'll use the %%s2ml train cell magic to train our AnomalyDetection model.
%s2ml list
Optionally delete any previous model
%s2ml delete --model cc_anomaly_model_v1 --f
```
%%s2ml train as anomaly_model
task: AnomalyDetection
model: cc_anomaly_model_v1
db: temp
input_table: training_data
target_column: TransactionAmount
target_time_column: TransactionDate
target_series_column: Channel
description: "Training an anomaly detection model"
runtime: cpu-small
selected_features: {"mode":"*","features":null}
force: True
```
Step 6: Check Model Status
Model training is an asynchronous job. We can use the %s2ml status magic to check on its progress. The model is ready to use when the status shows Completed.
(This may take a minute or two. Re-run the cell to refresh the status.)
%s2ml status --model cc_anomaly_model_v1
print(json.dumps(anomaly_model, indent=4))
Step 7: Run Predictions
Once the model is COMPLETED, you can use it directly in SQL via the ML_ANOMALY_DETECT() function.
Here is an example prediction for one row
```sql
%%sql
show tables in temp;
```
Run the model on a subset of data to view the results format.
```sql
%%sql
SELECT COUNT(*) as test_count FROM test_data_with_history;
```
```sql
%%sql
SELECT * FROM test_data_with_history ORDER BY TransactionDate ASC LIMIT 10;
```
```sql
%%sql
SELECT cluster.ML_ANOMALY_DETECT('cc_anomaly_model_v1', TO_JSON(selected_data.*)) AS is_anomaly
FROM (SELECT * FROM temp.test_data_with_history ORDER BY TransactionDate ASC) AS selected_data;
```
Step 8: Visualize Anomalies
```python
# Count the test records to determine how many rows to analyze
cursor = conn.cursor()
test_data_count = f"""SELECT count(*) FROM temp.test_data;"""
cursor.execute(test_data_count)
QUERY_LIMIT = cursor.fetchall()[0][0]  # Number of records to analyze
```
```python
# Run predictions
predictions = f"""
SELECT cluster.ML_ANOMALY_DETECT('cc_anomaly_model_v1', TO_JSON(selected_data.*)) AS is_anomaly
FROM (SELECT * FROM temp.test_data_with_history ORDER BY TransactionDate ASC) AS selected_data;
"""
cursor.execute(predictions)

# Parse and prepare data for visualization
column_names = [desc[0] for desc in cursor.description]
df = pd.DataFrame(list(cursor.fetchall()), columns=column_names)
print(f"✓ Retrieved {len(df)} records from database")
df.head()
```
```python
# ============================================
# AI generated code to visualize Predictions
# ============================================
SERIES_COLORS = {
    'ATM': '#1f77b4',
    'Branch': '#2ca02c',
    'Online': '#9467bd',
    'POS': '#d62728',
    'Mobile': '#ff7f0e'
}
DEFAULT_COLOR = '#8c564b'

# Parse JSON and filter out error rows
def safe_parse_json(x):
    try:
        parsed = json.loads(x)
        # Check if it's an error response
        if 'error' in parsed:
            return None
        return parsed
    except:
        return None

df['is_anomaly_parsed'] = df['is_anomaly'].apply(safe_parse_json)

# Filter out rows that failed to parse or had errors
df_valid = df[df['is_anomaly_parsed'].notna()].copy()
print(f"✓ Filtered out {len(df) - len(df_valid)} error/invalid rows")
print(f"✓ Processing {len(df_valid)} valid prediction rows")

if len(df_valid) == 0:
    print("❌ No valid predictions found. All rows contain errors.")
    print("\nSample error:")
    print(df['is_anomaly'].iloc[0])
else:
    # Check what keys are in the JSON
    print(f"✓ JSON keys found: {list(df_valid['is_anomaly_parsed'].iloc[0].keys())}")

    # Expand JSON into separate columns
    df_expanded = pd.json_normalize(df_valid['is_anomaly_parsed'])
    print(f"✓ Expanded columns: {list(df_expanded.columns)}")

    # Combine dataframes
    df_final = pd.concat([df_valid.reset_index(drop=True), df_expanded], axis=1)

    # Handle potential case sensitivity in column names
    # Check for different possible timestamp column names
    timestamp_col = None
    for possible_name in ['TS', 'ts', 'Ts', 'timestamp', 'Timestamp']:
        if possible_name in df_final.columns:
            timestamp_col = possible_name
            break
    if timestamp_col is None:
        print("Available columns:", list(df_final.columns))
        raise ValueError("Could not find timestamp column. Please check the column names above.")

    # Convert timestamp to datetime
    df_final['TS'] = pd.to_datetime(df_final[timestamp_col])
    print("Before drop:", len(df_final))

    # --- DROP WARM-UP CONTEXT ROWS ---
    warmup_count = 20
    df_final = df_final.sort_values('TS').reset_index(drop=True)
    if len(df_final) > warmup_count:
        df_final = df_final.iloc[warmup_count:].reset_index(drop=True)
        print(f"✓ Ignored the first {warmup_count} warm-up rows used for context.")
    print("After drop:", len(df_final))

    # Handle potential case sensitivity for Series column
    series_col = None
    for possible_name in ['Series', 'series', 'SERIES']:
        if possible_name in df_final.columns:
            series_col = possible_name
            break
    if series_col is None:
        raise ValueError("Could not find Series column")

    # Standardize column name
    if series_col != 'Series':
        df_final['Series'] = df_final[series_col]

    # Filter out any null Series values before sorting
    df_final = df_final[df_final['Series'].notna()].copy()

    # Handle potential case sensitivity for other columns
    column_mapping = {
        'Actual': ['Actual', 'actual'],
        'Forecast': ['Forecast', 'forecast'],
        'Lower bound': ['Lower bound', 'lower bound', 'lower_bound'],
        'Upper bound': ['Upper bound', 'upper bound', 'upper_bound'],
        'is_Anomaly': ['is_Anomaly', 'is_anomaly', 'isAnomaly']
    }
    for standard_name, possible_names in column_mapping.items():
        for possible_name in possible_names:
            if possible_name in df_final.columns:
                if possible_name != standard_name:
                    df_final[standard_name] = df_final[possible_name]
                break
    print(f"✓ Data preparation complete")

    # STEP 3: Dynamic series configuration
    series_types = sorted([s for s in df_final['Series'].unique() if isinstance(s, str)])
    num_series = len(series_types)
    print(f"✓ Found {num_series} unique series types: {', '.join(series_types)}")
    subplot_titles = [f"{series} Transactions" for series in series_types]
    row_heights = [1.0 / num_series] * num_series

    # STEP 4: Create optimized Plotly visualization
    fig = make_subplots(
        rows=num_series,
        cols=1,
        subplot_titles=tuple(subplot_titles),
        vertical_spacing=0.08,
        row_heights=row_heights
    )

    for idx, series_type in enumerate(series_types, start=1):
        series_data = df_final[df_final['Series'] == series_type].sort_values('TS')
        series_color = SERIES_COLORS.get(series_type, DEFAULT_COLOR)

        # Add confidence interval (upper bound)
        fig.add_trace(go.Scatter(
            x=series_data['TS'], y=series_data['Upper bound'],
            mode='lines', line=dict(width=0),
            showlegend=False, hoverinfo='skip', name='Upper Bound'
        ), row=idx, col=1)

        # Add confidence interval (lower bound with fill)
        fig.add_trace(go.Scatter(
            x=series_data['TS'], y=series_data['Lower bound'],
            mode='lines', line=dict(width=0),
            fillcolor='rgba(68, 168, 129, 0.12)', fill='tonexty',
            showlegend=False, hoverinfo='skip', name='Lower Bound'
        ), row=idx, col=1)

        # Add forecast line
        fig.add_trace(go.Scatter(
            x=series_data['TS'], y=series_data['Forecast'],
            mode='lines', name='Forecast',
            line=dict(color='gray', dash='dash', width=1.5),
            showlegend=(idx == 1),
            hovertemplate='<b>Forecast</b><br>%{x}<br>$%{y:.2f}<extra></extra>'
        ), row=idx, col=1)

        # Add actual values - Use Scattergl for performance
        fig.add_trace(go.Scattergl(
            x=series_data['TS'], y=series_data['Actual'],
            mode='lines', name='Actual',
            line=dict(color=series_color, width=2),
            showlegend=(idx == 1),
            hovertemplate='<b>Actual</b><br>%{x}<br>$%{y:.2f}<extra></extra>'
        ), row=idx, col=1)

        # Highlight anomalies
        anomalies = series_data[series_data['is_Anomaly'] == True]
        if len(anomalies) > 0:
            fig.add_trace(go.Scatter(
                x=anomalies['TS'], y=anomalies['Actual'],
                mode='markers', name='Anomaly',
                marker=dict(size=12, color='red', symbol='x',
                            line=dict(width=2, color='darkred')),
                showlegend=(idx == 1),
                hovertemplate='<b>⚠️ ANOMALY</b><br>%{x}<br>$%{y:.2f}<extra></extra>'
            ), row=idx, col=1)

        # Update axes
        fig.update_xaxes(title_text="Time", row=idx, col=1, showgrid=True,
                         gridwidth=1, gridcolor='rgba(128,128,128,0.1)')
        fig.update_yaxes(title_text="Amount ($)", row=idx, col=1, showgrid=True,
                         gridwidth=1, gridcolor='rgba(128,128,128,0.1)')

    # STEP 5: Optimized layout
    plot_height = max(900, 280 * num_series)
    fig.update_layout(
        height=plot_height,
        width=1400,
        title={
            'text': f"Credit Card Transaction Anomaly Detection<br><sub>ML_ANOMALY_DETECT - {len(df_final):,} Records | {df_final['is_Anomaly'].sum()} Anomalies Detected</sub>",
            'x': 0.5, 'xanchor': 'center',
            'y': 0.99, 'yanchor': 'top',
            'font': {'size': 18}
        },
        hovermode='closest',
        showlegend=True,
        legend=dict(
            orientation="h", yanchor="top", y=-0.02,
            xanchor="center", x=0.5,
            bgcolor='rgba(255,255,255,0.9)',
            bordercolor='rgba(0,0,0,0.2)', borderwidth=1,
            font=dict(size=12)
        ),
        template='plotly_white',
        margin=dict(l=80, r=40, t=100, b=80),
    )
    fig.show()

    # STEP 6: Enhanced summary statistics
    print("\n" + "=" * 75)
    print(f"{'ANOMALY DETECTION SUMMARY':^75}")
    print("=" * 75)
    print(f"Total records analyzed: {len(df_final):,}")
    print(f"Date range: {df_final['TS'].min().strftime('%Y-%m-%d')} to {df_final['TS'].max().strftime('%Y-%m-%d')}")
    print(f"Anomalies detected: {df_final['is_Anomaly'].sum()}")
    print(f"Overall anomaly rate: {(df_final['is_Anomaly'].sum() / len(df_final) * 100):.2f}%")
    print(f"\nUnique series types: {num_series}")
    print("\nBreakdown by Series:")
    print("-" * 75)
    for series in series_types:
        series_df = df_final[df_final['Series'] == series]
        series_count = len(series_df)
        anomaly_count = series_df['is_Anomaly'].sum()
        anomaly_rate = (anomaly_count / series_count * 100) if series_count > 0 else 0
        series_avg = series_df['Actual'].mean()
        series_max = series_df['Actual'].max()
        print(f"  {series:12s}: {anomaly_count:3d}/{series_count:4d} anomalies ({anomaly_rate:5.2f}%) | "
              f"Avg: ${series_avg:7.2f} | Max: ${series_max:8.2f}")
    print("=" * 75)
```
Step 9: Cleanup
Run the following commands to clean up all the created resources:
```sql
%%sql
USE temp;

-- Drop all tables created during the demo
DROP TABLE IF EXISTS results_table;
DROP TABLE IF EXISTS predict_table;
DROP TABLE IF EXISTS train_table;
DROP TABLE IF EXISTS transactions;
DROP TABLE IF EXISTS test_data_with_history;

-- Drop the database
DROP DATABASE IF EXISTS temp;
```
Manage an Existing ML Model
Existing ML models can be managed by performing the following actions:
- View details
- Run prediction
- Share
- Delete
View Details of an Existing ML Model
To view details of an existing ML model, select the ellipsis under the Actions column of the trained ML model, and select View Details.
Run Prediction on an Existing ML Model
Run batch prediction on the existing ML model.
Run a Batch Prediction
To run a batch prediction on the existing ML model, select the ellipsis under the Actions column of the trained ML model, and select Run Prediction.
Select Prediction Data
| Field | Description |
|---|---|
| Database | Select the database. |
| Target Table | Select the target table on which the prediction will be run. |
| Target Column | Select the target column on which the prediction will focus. |
| Timestamp Column | Select the column having timestamp data. |
Preview the data and select Next.
Configure Destination
| Field | Description |
|---|---|
| Prediction Interval Width | Select the interval width of the prediction. |
| Destination Table Name | Select the destination table in which the prediction results will be stored. |
| Destination Column | Select the destination column in which the prediction data will be saved. |
| Run as | Run the notebook with or without personal credentials. |
Review the Summary and generated Fusion SQL syntax in the Generated SQL Script.
View Predictions of an Existing ML Model
To view the predictions of the trained ML model, select the ML model in the Name column.
Share an Existing ML Model
To share an existing ML model, select the ellipsis under the Actions column of the trained ML model, and select Share.
Delete an Existing ML Model
To delete an existing ML model, select the ellipsis under the Actions column of the trained ML model, and select Delete.
Status of ML Models
| Status | Description |
|---|---|
| Pre-processing | The system is preparing data for ML model training. |
| Training | The ML model is currently being trained but results are not yet available. |
| Done | The ML model has been successfully trained and is ready for use. |
| Error | The ML model training or processing failed due to an error. |